
The theory of the foundations of mathematics

- 1870 to 1940 -
Mark Scheffer
(Version 1.0)
Mark Scheffer, id. 415968, e-mail: zax@chello.nl. Last changes:
March 22, 2002. This report is part of a practical component of the
Computing Science study at the Eindhoven University of Technology.
To work on the foundations of mathematics, two things are needed:
Love and Blood.
- Anonymous quote, 2001.
Contents
1 Introduction
2 Cantor’s paradise
2.1 The beginning of set-theory
2.2 Basic concepts
3 Mathematical constructs in set-theory
3.1 Some mathematical concepts
3.2 Relations
3.3 Functions
3.4 Induction Methods
3.4.1 Induction
3.4.2 Deduction
3.4.3 The principle of induction
3.5 Real numbers
3.5.1 Dedekind’s cuts
3.5.2 Cantor’s chains of segments
3.5.3 Cauchy-sequences
3.5.4 Properties of the three definitions
3.6 Infinite sets
3.7 The Continuum Hypothesis
3.8 Cardinal and Ordinal numbers and Paradoxes
3.8.1 Cardinal numbers and Cantor’s Paradox
3.8.2 Ordinal numbers and Burali-Forti’s Paradox
4 Peano and Frege
4.1 Peano’s arithmetic
4.2 Frege’s work
5 Russell
5.1 Russell’s paradox
5.2 Consequences and philosophies
5.3 Zermelo Fraenkel
5.3.1 Axiomatic set theory
5.3.2 Zermelo Fraenkel (ZF) Axioms
6 Hilbert
6.1 Hilbert’s proof theory
6.2 Hilbert’s 23 problems
7 Types
7.1 Russell and Whitehead’s Principia Mathematica
7.2 Ramsey, Hilbert and Ackermann
7.3 Quine
8 Gödel
8.1 Informally: Gödel’s incompleteness theorems
8.2 Formally: Gödel’s Incompleteness Theorems
8.2.1 On formally undecidable propositions
8.2.2 The impossibility of an ‘internal’ proof of consistency
8.2.3 Gödel numbering and a concrete proof of G_1, G_2 and G_3
8.3 Gödel’s theorem and Peano Arithmetic
8.4 Consequences
8.5 Neumann-Bernays-Gödel axioms
9 Church and Turing
9.1 Turing and Turing Machine
9.2 Church and the Lambda Calculus
9.3 The Church-Turing thesis
10 Conclusion
A Timeline and Images
Mathematical Notations
Many different notations have been developed for set theory and logic.
Most notations that we have used are standard today; other notations that
we have used are introduced in the text.
Mathematical Logic
symbol   meaning                               also described as
∧        conjunction                           and
∨        disjunction                           (inclusive) or
¬        negation                              not
ϕ(x)     propositional function
→        implication                           if . . . then
↔        bi-implication                        if and only if, iff
≡        equivalence                           is equivalent to
∀        universal quantifier                  for all
∃        existential quantifier                exists
∃!       one-element existential quantifier    exists a unique
In most places we have chosen to use the following notation¹ to denote
quantifications:
(relation : range : term)
denotes the relationship over a set of terms ranging over range.
Consider a general pattern (Q x : ϕ(x_0, . . . , x_n) : t(x_0, . . . , x_n)),
with Q a quantifier, ϕ a boolean expression in terms of the dummies
x_0, . . . , x_n, and t(x_0, . . . , x_n) the term of the quantification. The
quantification is the accumulation of values t(x_0, . . . , x_n) using an
operator or relation indicated by Q, over all values (x_0, . . . , x_n) for
which ϕ(x_0, . . . , x_n) holds.
¹ Notation originally due to E.W. Dijkstra.
This notation is suitable for formal manipulation and unambiguous in the
sense that it explicitly indicates the quantifier Q, the dummies and the range
of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly
determines the domain of the quantification). This allows us to reason about
general properties of quantifications, in a way in which the (scopes of the)
bound variables are clearly identified. Note that this type of quantification
is only suitable for binary operations that are symmetric and associative.
Example:
(Σ x : 0 ≤ x ≤ 5 : x^2)
=
0^2 + 1^2 + 2^2 + 3^2 + 4^2 + 5^2
=
Σ_{x=0}^{5} x^2
Example:
(∃x : x ∈ N : x^3 − x^2 = 18)
≡
‘there exists a natural number x such that x^3 − x^2 = 18’
If the range covers all possible values of the variable (here: x), or if
it is clear what the range of a variable is, we can omit it.
Example:
(∀x : true : x ∈ A → x ∈ B)
≡
(∀x :: x ∈ A → x ∈ B)
≡
‘all elements of A are also elements of B’
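The three-part pattern (Q x : range : term) can be read as a fold: filter the
dummies by the range, map them through the term, and accumulate with the
operator belonging to Q. The following is only a sketch of that reading in
Python; the helper name quantify and the finite candidate pool range(100)
are assumptions made for illustration, since a program cannot enumerate all
of N.

```python
from functools import reduce
import operator

def quantify(op, unit, dummies, rng, term):
    """Accumulate term(x) with op over all x in dummies for which rng(x) holds.

    `unit` is the value of the quantification over an empty range
    (0 for sums, False for 'exists', True for 'for all').
    """
    return reduce(op, (term(x) for x in dummies if rng(x)), unit)

POOL = range(100)  # assumption: a finite stand-in for the naturals

# (Σ x : 0 ≤ x ≤ 5 : x^2)
sum_of_squares = quantify(operator.add, 0, POOL,
                          lambda x: 0 <= x <= 5, lambda x: x ** 2)

# (∃x : x ∈ N : x^3 − x^2 = 18), searched only inside POOL
exists_root = quantify(operator.or_, False, POOL,
                       lambda x: True, lambda x: x ** 3 - x ** 2 == 18)

# (∀x :: x ∈ A → x ∈ B), with the implication written as (not p) or q
A, B = {1, 2}, {1, 2, 3}
all_in_b = quantify(operator.and_, True, POOL,
                    lambda x: True, lambda x: (x not in A) or (x in B))
```

The operators used here (+, or, and) are symmetric and associative, so the
order in which the pool is traversed does not influence the result.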
Chapter 1
Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
- Albert Einstein
This report covers the most important developments and theory of the
foundations of mathematics in the period from 1870 to 1940. The tale of the
foundations is fairly familiar in general terms and for its philosophical
content; here the main emphasis is laid on the mathematical theory. The
history of the foundations of mathematics is a complicated and many-sided
story; with this article I do not aim to give a definitive or complete
version, but to capture what I consider the essence of the theoretical
developments, and to present them in a clear and modern setting. Some basic
mathematical knowledge of set-theory and logic is presupposed.
By the middle of the nineteenth century, certain logical problems (for
example paradoxes around the notions of infinity, the infinitesimal and
continuity) at the heart of mathematics had inspired a movement, led by
German mathematicians, to provide mathematics with more rigorous foundations.
This is where the theory of this report begins, with the emergence of set
theory by the German mathematician Cantor. In section 2.1 we informally
describe how work on a problem concerning trigonometric series gradually
led Cantor to his theory of sets (section 2.2). As a result of the work of
Weierstrass, Dedekind and Cantor, pure mathematics had been provided
with much more sophisticated foundations. The notion of infinitesimal had
been banished, ‘real’ numbers had been provided with a logically consistent
definition (section 3.5), continuity had been redefined and, more
controversially, a whole new branch of arithmetic had been invented which
addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6,
3.7).
In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish
but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly
different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could
not resolve this paradox, but it was not taken so seriously, partly because
the paradoxes appeared in a rather technical region.
The Italian mathematician Peano (section 4.1) was able to show that the
whole of arithmetic could be founded upon a system that uses three basic
notions and five initial axioms. At the same time the German mathematician
Frege (section 4.2) worked on developing a logical basis for mathematics.
Like Peano, Frege wanted to put mathematics on firm grounds. But Frege’s
grounds were strictly logical; he followed a development later called
logicism, also known as the development of so-called mathematical logic.
The British mathematician Russell noted Peano’s work and later that
of Frege. Soon thereafter he showed (section 5.1) how finite descriptions
like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and
pointed out the difficulties that arose with self-referential terms. The
paradox that Russell found existed not only in specific technical regions
but in all of the axiomatic systems underlying mathematics at the same time
(section 5.1). But since the paradoxes could be avoided in most practical
applications of set theory, the belief in set theory as a proper foundation
of mathematics remained. Axiomatic set theory (section 5.3.1) was an attempt
to come to a theory without paradoxes. Various responses to the paradox
(section 5.2) led to new sets of axioms for set theory. The two main
approaches are by the German mathematicians Zermelo and Fraenkel (section
5.3), and by the Hungarian von Neumann, the Hungarian-Austrian Gödel and the
Briton Bernays (section 8.5). It also led to the emergence of the
‘intuitionistic’ philosophy of mathematics by the Dutch mathematician
Brouwer (not covered here) and to a theory of types, proposed by Russell
himself with the help of his former teacher, the English mathematician
Whitehead. Despite the paradox, Russell and Whitehead still claimed that all
mathematics could be founded on a mathematical logic; this belief was given
a definite presentation in their work ‘Principia Mathematica’ (section 7.1).
Various consequences followed (section 7.3) and new conceptions of logic
arose (by Wittgenstein and Ramsey, see section 7.2).
At the turn of the century, the German mathematician David Hilbert
listed certain important problems concerning the foundations of mathematics
and mathematics in general (section 6.2). To overcome paradoxes and
other problems that arose in existing systems, Hilbert developed a theory of
axiomatic systems (section 6.1). He then encouraged his student Zermelo to
use this axiomatic method to develop a first set of axioms for set theory
(section 5.3.2). Hilbert subsequently made more precise demands on any
proposed set of axioms for mathematics (section 6.1) in terms of consistency,
completeness and decidability.
In 1931 Gödel showed that consistency and completeness could not
both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s
question of decidability. The English mathematician Turing proved in 1936
that there are undecidable problems, by giving the so-called halting problem
that cannot be solved by any algorithm (section 9.1), after formalizing the
notion of algorithm with his concept of the Turing Machine. The American
mathematician Church (independently) obtained the same result but with
another formalization of the notion of an algorithm, using his computational
model of lambda calculus (section 9.2). In section 9.3 we state that these two
notions are equivalent and correspond to the intuitive notion of algorithm or
computability. In chapter 10 I summarize the theory of the foundations of
mathematics, before giving my own opinion and making some suggestions for
future work.
This article is part of the practical component of my study of computing
science, and was written for a large part in 8 weeks at the Heriot-Watt
University in Edinburgh under supervision of prof. F. Kamareddine. I want to
thank Rob Nederpelt and the formal methods section of the computing science
department of the Eindhoven University of Technology for making this
possible. Rob Nederpelt always inspired me to continue working on this report
and was patient in explaining difficult proofs to me. And last but not least,
I want to thank Fairouz Kamareddine for her support and positive motivation,
and Boukje Nouwen (as she breathes a sigh of relief that this is (I think)
the last revision) for the typesetting and editing of large parts of this
document and for helping me in many ways to finish this article in such a
small period of time.
Chapter 2
Cantor’s paradise
2.1 The beginning of set-theory
Perhaps the most surprising thing about mathematics is that it
is so surprising. The rules which we make up at the beginning
seem ordinary and inevitable, but it is impossible to foresee their
consequences. These have only been found out by long study,
extending over many centuries. Much of our knowledge is due to a
comparatively few great mathematicians such as Newton, Euler,
Gauss, or Riemann; few careers can have been more satisfying
than theirs. They have contributed something to human thought
even more lasting than great literature, since it is independent of
language.
- Titchmarsh, E. C. in [88]
By the late 19th century the discussions about the foundations of geometry
had become the focus for a running debate about the nature of the branches
of mathematics ([23, last paragraph of section 35, page 69/70]). Although
there had been no conscious plan leading in that direction, the stage was set
for a consideration of questions about the fundamental nature of mathematics.
In the study of logic, the work of the English mathematician George Boole
in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders
Peirce around 1880 ([49, page 187]), had contributed to the development of a
symbolism to explore logical deductions and in Germany the logician Gottlob
Frege (see [98]) had directed keen attention to fundamental questions.
All of these debates came together through the pioneering work of the
German mathematician Georg Cantor on the concept of a set. Cantor had
begun work in this area because of his interest in Riemann’s theory of
trigonometric series.
In Germany at the university of Halle, the direction of Cantor’s research
turned away from number theory and towards analysis. This was due to
Heine, one of his senior colleagues at Halle, who challenged Cantor to prove
the open problem on the uniqueness of representation of a function as a
trigonometric series (see [30, section 5.2, page 182]). Starting from the work
on trigonometric series and on the function of a complex variable done by
the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in
1870 showed ([30, page 182]) that such a function can be represented in only
one way by a trigonometric series. Consideration of the collection of numbers
(originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not
conflict with such a representation led him, first, in 1872, to define irrational
numbers in terms of convergent sequences of rational numbers (or quotients
of integers, see section 3.5.2) and then to begin his major lifework, the theory
of sets and the concept of transfinite numbers.
2.2 Basic concepts
The essence of mathematics lies in its freedom.
- Georg Cantor, quoted in [58]
In 1874 Cantor published his first article on set-theory. A set, wrote
Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published
in [20, page 261-281]), is “a collection of definite, distinguishable objects
of perception or thought conceived as a whole”. In this report we use a
similar description of the concept of a set.
What is a set? A (finite or infinite) collection of objects that is
considered as a single, abstract object.
A set is sometimes also called aggregate, class or (as it was first called by
Riemann (see [31, page 88]) and later by the mathematician Russell) manifold.
The objects are also called elements or members of the set.
We denote a set by listing its elements between the brackets ‘{’ and ‘}’, and
membership of an element to a set by the membership relation ∈.
Example: If we consider a set that contains natural numbers, we write 4 ∈
{2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We
write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.
In a mathematical context we mostly consider sets of numbers and functions.
We denote the well-known set of natural numbers by N (this set is also called
the naturals), the integers by Z, the fractional numbers by Q (this set is
also called the rationals) and the reals by R (this set is also called the
continuum). The objects of a set themselves can also be sets.
What is set theory? A branch of mathematics that deals with the properties
of well-defined collections of objects, which may be of a mathematical
nature, such as numbers or functions, or not.
Cantor defined ([49, page 288]) two sets A and B to be identical (equal),
notation A = B, if and only if A and B have the same elements. When
set-theory was later axiomatized, this definition also became known as the
Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))
Example: {3, 3, 7} = {7, 3} and {2, {3, 4}} ≠ {{2, 3}, 4}
The relation ‘is a subset of’, notation ⊆, indicates that one set is
contained in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := (A ⊆ B ∧ A ≠ B)
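For finite sets these definitions can be tried out directly. The sketch
below writes the subset test as the universal quantification from the
definition and derives equality and proper subset from it; the function
names are our own, and frozenset is used only because Python requires
nested sets to be immutable.

```python
def subset(a, b):
    """A ⊆ B: every element of a is also an element of b."""
    return all(x in b for x in a)

def equal(a, b):
    """Extensional equality: a and b have exactly the same elements."""
    return subset(a, b) and subset(b, a)

def proper_subset(a, b):
    """A ⊂ B: a is contained in b, but the two sets differ."""
    return subset(a, b) and not equal(a, b)

# Repetition and order of elements do not matter.
same = equal({3, 3, 7}, {7, 3})

# Nesting does matter: these two sets have different elements.
left = frozenset({2, frozenset({3, 4})})
right = frozenset({frozenset({2, 3}), 4})
nested_same = equal(left, right)
```

Python's built-in comparisons on sets (==, <= and <) follow the same
definitions, so the helpers are only there to make the quantifiers visible.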
We often want to create a new set from a given set by selecting elements
that have certain properties. For example we take the set of powers of three
or the set of all even numbers (to be exact: the set containing those
elements of the set of natural numbers that have the property to be divisible
by 2). This principle was used by Cantor, and we also call it the unrestricted
or naive comprehension principle because it later (see sections 3.8 and 5.1)
turned out to be untenable.
Comprehension principle: For all properties ϕ there is precisely one set,
denoted by {x | ϕ(x)}, whose elements are exactly those objects which have
the property ϕ.
We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking
for all x, ϕ(x) = false), there is at least one set that has no elements: the
empty set, denoted by ∅.
Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there
is precisely one set whose elements are exactly those objects which have the
property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)).
This is equivalent to saying there is no element y that can be a member of
x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and
call it the empty set.
Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the
subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know
that (∀y :: y ∉ ∅). This yields us (∀x :: false → x ∈ a), which is true.
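A program can only imitate the comprehension principle relative to some
given finite collection of candidates, so the following sketch is in fact a
restricted form of comprehension. The universe of the numbers 0 to 19 and
the helper names are assumptions chosen for illustration.

```python
UNIVERSE = frozenset(range(20))  # assumption: a small finite universe

def comprehension(phi, universe=UNIVERSE):
    """Return {x | phi(x)}, with x restricted to the given universe."""
    return frozenset(x for x in universe if phi(x))

def is_power_of_three(x):
    """True for 1, 3, 9, 27, ... and False otherwise."""
    while x > 1 and x % 3 == 0:
        x //= 3
    return x == 1

evens = comprehension(lambda x: x % 2 == 0)
powers_of_three = comprehension(is_power_of_three)

# Taking the property 'false' yields the set without elements.
empty = comprehension(lambda x: False)

# y ∈ {x | phi(x)} ↔ phi(y), checked for every y in the universe
membership_matches = all((y in evens) == (y % 2 == 0) for y in UNIVERSE)

# Corollary: the empty set is a subset of every set (checked on our examples)
empty_below_all = all(empty <= a for a in (evens, powers_of_three, empty))
```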
Using the comprehension principle we can create new sets from given sets.
So now we can introduce some operations on sets, by applying the
comprehension principle. But before we do that, we first introduce some
general (i.e. regardless of whether the operations are set-theoretic or not)
properties of operations: idempotence, commutativity, associativity and
distributivity. Although Cantor did not formulate these properties as such,
they are used in the branch of calculus and are useful in the set theory that
follows in this chapter.
Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and
G are elements of that domain (for example sets), on which we have defined
the equality relation ‘=’.
Definition of idempotence:
⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity:
⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity:
⊕ is associative := (∀E, F, G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity:
⊕ is distributive² over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))
¹ These properties can also be generalized for operations of arbitrary arity,
but this will not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed
to right-distributivity.
⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G))
In ordinary mathematics this distinction is often left out for commutative
operations, and we for example simply say that × is distributive over +
(when in fact it is both left- and right-distributive).
The symbol ∪ is employed to denote the union of two sets. Thus, the set
A ∪ B is defined as the set that consists of all elements belonging to
set A or to set B (or to both).
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
The intersection operation is denoted by the symbol ∩. A ∩ B is defined
as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets the intersection of which is the empty set are said to be
disjoint. A collection of sets is called (pairwise) disjoint or mutually
exclusive if any two distinct sets in it are disjoint.
Example: The operations union and intersection on sets are both idempotent,
commutative and associative.
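The general properties of operations can be checked by brute force on a
small domain. The sketch below does this for union and intersection on all
subsets of {0, 1, 2}; it is an exhaustive test on one finite domain, not a
proof, and the helper names are our own.

```python
from itertools import combinations, product

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

DOMAIN = subsets({0, 1, 2})  # assumption: a small test domain

def idempotent(op):
    return all(op(e, e) == e for e in DOMAIN)

def commutative(op):
    return all(op(e, f) == op(f, e) for e, f in product(DOMAIN, repeat=2))

def associative(op):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e, f, g in product(DOMAIN, repeat=3))

def distributive(op, over):
    """Left-distributivity of op over the second operation."""
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e, f, g in product(DOMAIN, repeat=3))

union = lambda a, b: a | b
intersection = lambda a, b: a & b
difference = lambda a, b: a - b
```

On this domain union and intersection pass all three checks and each
distributes over the other, while set difference is neither idempotent nor
commutative.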
The difference of sets B and A, denoted B − A, contains those elements
of B that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
If A ⊆ B we often call the difference B − A the relative complement of A
in B. We then call B the universe, and if it is clear what the universe is we
often denote the relative complement of A by A^c. From the definitions that
we have introduced so far, we can deduce three properties that are known as
the laws of reciprocity. The second and third law are also known as the laws
of de Morgan, named after the English mathematician Augustus de Morgan:
First law of reciprocity: A ⊆ B ↔ A^c ⊇ B^c
Second law of reciprocity: (A ∪ B)^c = A^c ∩ B^c
Third law of reciprocity: (A ∩ B)^c = A^c ∪ B^c
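The three laws of reciprocity can be verified exhaustively once a finite
universe is fixed. In the sketch below the universe {0, 1, 2, 3} and the
names are assumptions; the check runs over all 16 × 16 pairs of subsets.

```python
from itertools import combinations

UNIVERSE = frozenset(range(4))  # assumption: a small fixed universe

def complement(a):
    """Relative complement of a in UNIVERSE."""
    return UNIVERSE - a

def subsets(s):
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

PAIRS = [(a, b) for a in subsets(UNIVERSE) for b in subsets(UNIVERSE)]

# First law: A ⊆ B holds exactly when the complement of A contains that of B
first_law = all((a <= b) == (complement(a) >= complement(b)) for a, b in PAIRS)

# Second and third law (de Morgan)
second_law = all(complement(a | b) == complement(a) & complement(b)
                 for a, b in PAIRS)
third_law = all(complement(a & b) == complement(a) | complement(b)
                for a, b in PAIRS)
```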
We define the power set of V, denoted by 𝒫(V), as the set of all subsets
of V. Note that if V ≠ ∅, this operation creates a larger set from a given set
V.
Definition of powerset: 𝒫(V) := {A | A ⊆ V}
Given a set V, we thus have that (∀y :: y ∈ 𝒫(V) ↔ y ⊆ V)
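For a finite set the power set can be enumerated by collecting the subsets
of each size. The following sketch does so with the standard itertools
module; the function name power_set is our own.

```python
from itertools import combinations

def power_set(v):
    """Return the set of all subsets of the finite set v."""
    items = list(v)
    return frozenset(frozenset(c)
                     for r in range(len(items) + 1)
                     for c in combinations(items, r))

V = {1, 2, 3}
PV = power_set(V)

# y ∈ P(V) ↔ y ⊆ V, checked for the members of P(V) and one non-subset
members_are_subsets = all(y <= V for y in PV)
non_subset_excluded = frozenset({1, 4}) not in PV
```

A set with n elements has 2^n subsets, so the power set of {1, 2, 3} has 8
elements, and the power set of the empty set has exactly one element,
namely the empty set itself.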
We can extend the union of a pair of sets to any finite collection of sets;
the union is then defined as the set of all objects which belong to at least
one set in the collection A. We can do the same for the intersection.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
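The two definitions translate almost literally into comprehensions over a
finite collection. In this sketch the names big_union and big_intersection
are our own, and refusing an empty collection is our choice: read literally,
the definition would then admit every object x, which no finite program can
return.

```python
def big_union(collection):
    """All x that belong to at least one set y in the collection."""
    return frozenset(x for y in collection for x in y)

def big_intersection(collection):
    """All x that belong to every set y in the collection."""
    sets = list(collection)
    if not sets:
        # assumption: we simply refuse the degenerate case
        raise ValueError("intersection of an empty collection is not supported")
    first, rest = sets[0], sets[1:]
    return frozenset(x for x in first if all(x in y for y in rest))

A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
union_of_A = big_union(A)
intersection_of_A = big_intersection(A)
```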
We can divide a set of objects into a partition, that is a family of subsets
that are mutually exclusive and jointly exhaustive. Assume P is a set of
subsets of X.
Definition of partition: P is a partition of X :=
X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
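The definition of a partition consists of two conditions that can be tested
separately on finite sets. The sketch below follows the definition as
stated; the function name is our own. Many texts additionally require every
member of P to be nonempty, which is not checked here.

```python
def is_partition(p, x):
    """Check that the sets in p are jointly exhaustive and mutually exclusive."""
    blocks = [frozenset(b) for b in p]
    # X equals the union of all members of P
    covers = frozenset().union(*blocks) == frozenset(x)
    # any two members are equal or disjoint
    disjoint = all(a == b or not (a & b) for a in blocks for b in blocks)
    return covers and disjoint

good = is_partition([{1, 2}, {3}, {4, 5}], {1, 2, 3, 4, 5})
overlapping = is_partition([{1, 2}, {2, 3}], {1, 2, 3})
incomplete = is_partition([{1}, {2}], {1, 2, 3})
```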
In this chapter I have made extensive use of [30] in section 2.1 and [17]
in section 2.2.
Chapter 3
Mathematical constructs in
set-theory
3.1 Some mathematical concepts
The mathematician is entirely free, within the limits of his
imagination, to construct what world he pleases. What he is to imagine
is a matter for his own caprice; he is not thereby discovering the
fundamental principles of the universe nor becoming acquainted
with the ideas of God. If he can find, in experience, sets of entities
which obey the same logical scheme as his mathematical entities,
then he has applied his mathematics to the external world; he has
created a branch of science.
- J.W.N. Sullivan in Aspects of Science, 1925
Now that we have this apparatus of set-theory available, we will see that
it is not just a separate branch of mathematics, but that we can define some
basic mathematical constructs in set-theory. In this section we will consider
pairs and the cartesian product, necessary before we can treat relations (in
section 3.2) and functions (in section 3.3).
First we consider the mathematical concept of an ordered pair <a, b>.
Compared to a ‘normal’ pair, where two pairs are considered equal if they
have the same elements, we want an ordered pair to also have the property
that the elements appear in the same order:
(∀c, d :: <a, b> = <c, d> ↔ a = c ∧ b = d)
We can now easily verify that the following definition (see [17, chapter
8]) in set-theory satisfies the desired property.
Definition of ordered pair¹: <a, b> := {a, {a, b}}
As the cartesian product A × B is by definition the set of all ordered
pairs <a, b> with a ∈ A and b ∈ B, we can now use the same definition in
set-theory:
Definition of cartesian product: A × B := {<a, b> | a ∈ A ∧ b ∈ B}
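The characteristic property of the ordered pair can be tested mechanically
for a set-theoretic encoding. The sketch below encodes <a, b> as
{a, {a, b}}, under the assumption that a and b are plain numbers and not
themselves sets; many texts use the variant {{a}, {a, b}} instead. The
function names are our own.

```python
from itertools import product

def pair(a, b):
    """Encode the ordered pair <a, b> as the set {a, {a, b}}."""
    return frozenset({a, frozenset({a, b})})

def cartesian_product(A, B):
    """The set of all encoded pairs <a, b> with a in A and b in B."""
    return frozenset(pair(a, b) for a in A for b in B)

# <a, b> = <c, d> ↔ a = c ∧ b = d, checked for all numbers 0, 1, 2
characteristic = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(range(3), repeat=4)
)

AxB = cartesian_product({1, 2}, {3, 4, 5})
```

Unlike the unordered set {1, 2}, the encoded pairs <1, 2> and <2, 1> are
different sets, which is exactly what the definition is meant to achieve.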
Let V = {V_i | i ∈ I} be a set of sets. We now define the cartesian product
of a set of sets, denoted by ∏V or ∏_{i∈I} V_i. The definition uses the
concept of a function, that will be introduced in section 3.3.
Definition of cartesian product of a set of sets:
∏V := {f : I → ⋃_{i∈I} V_i | (∀i : i ∈ I : f(i) ∈ V_i)}
¹ Representation originally by Kuratowski, see [49, page 294].
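For a finite index set, the cartesian product of a set of sets can be
enumerated as the collection of all choice functions. In the sketch below a
function f : I → ⋃ V_i is represented by a Python dictionary, which is an
assumption of this illustration, as are the names used.

```python
from itertools import product

def family_product(family):
    """All functions f (as dicts) with f[i] in family[i] for every index i."""
    indices = list(family)
    return [dict(zip(indices, choice))
            for choice in product(*(family[i] for i in indices))]

# A family V_i indexed by I = {0, 1}
family = {0: {"a", "b"}, 1: {1, 2, 3}}
functions = family_product(family)

# Every f picks, for each index i, an element of V_i.
well_typed = all(f[i] in family[i] for f in functions for i in family)
```

If one of the V_i is empty there is nothing to choose from, and the product
is empty as well.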
3.2 Relations
Mathematicians do not study objects, but relations between
objects. Thus, they are free to replace some objects by others as
long as the relations remain unchanged. Content to them is
irrelevant: they are interested in form only.
- J.H. Poincaré
In mathematics, a relation maps each element from an input set (called
domain) to either true or false. We formalize this notion in set-theory.
Definition of binary relation:
R is a binary relation between X and Y := R ⊆ X × Y
Note: We can easily generalize this definition for n-ary relations: R is an
n-ary relation on X_1, . . . , X_n := R ⊆ X_1 × X_2 × . . . × X_n, for n ∈ N.
We call n the arity of the relation.
Example: We have already seen the definitions of the subset and proper
subset relations in section 2.2. There we defined the set R ⊆ X × Y
implicitly by using a statement; only those pairs <x, y> are in R for which
the statement holds (here we are using in fact the comprehension principle of
section 2.2). We will continue to use statements to define relations.
We define the following shorthand notation (sometimes also written in
infix notation as xRy): R(x, y) := <x, y> ∈ R.
The mathematical expression ‘x < y’ is now equivalent to the set theoretic
expression ‘<x, y> ∈ R’, with R representing the ‘less than’ relation.
Example: The relation < on the naturals (i.e. between N and N) can be
defined as the set containing the pairs:
<0, 1>, <1, 2>, <2, 3>, . . .
<0, 2>, <1, 3>, <2, 4>, . . .
<0, 3>, <1, 4>, <2, 5>, . . .
. . .
On a relation R we can define the concepts of domain and range.
Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}
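A relation given as a set of pairs can be stored and queried directly. The
sketch below builds the ‘less than’ relation on the finite set {0, ..., 4}
(an assumed cut-off, since N itself is infinite) and computes its domain
and range; Python tuples stand in for ordered pairs, and the function names
are our own.

```python
X = range(5)  # assumption: a finite part of the naturals

# R ⊆ X × X, defined by a statement: the pair (x, y) is in R when x < y
less_than = {(x, y) for x in X for y in X if x < y}

def holds(R, x, y):
    """The shorthand R(x, y) := <x, y> ∈ R."""
    return (x, y) in R

def dom(R):
    """All x that are related to at least one y."""
    return {x for (x, y) in R}

def ran(R):
    """All y to which at least one x is related."""
    return {y for (x, y) in R}
```

Here 4 is missing from the domain because no element of X is larger than 4,
and 0 is missing from the range because no element is smaller than 0.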
If we define the identity relation of X, we want it to have the usual
property that id_X(x) = x for all x ∈ X (see for example [3, section
1.9.5.b, page 30]). In set-theory, we denote the identity relation on V by
I_V.
Definition of identity relation: I_V := {<x, y> ∈ V × V | x = y}
Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did
for operations in section 2.2, we can also define some general properties of
relations. Note that we have already defined an equality relation ‘=’ on X in
section 2.2. Here we can explicitly state on which domain the property holds
(e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).
Definition of reflexivity:
R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of symmetry:
R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of anti-symmetry:
R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity:
R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity:
R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence:
R is an equivalence relation := R is reflexive, symmetric and transitive
Note: Asymmetric means not symmetric, and is not the same as anti-
symmetric.
Example: The subset relation is reflexive, anti-symmetric (note that the proof
of anti-symmetry uses the axiom of extensionality of page 16) and transitive,
but not connective.
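The properties of relations can be tested on a finite relation stored as a
set of pairs. The sketch below checks the subset relation on the subsets of
{0, 1}; the function names and the choice of test set are assumptions made
for illustration, and R is assumed to contain only pairs of elements of X.

```python
from itertools import combinations

def reflexive(R, X):
    return all((x, x) in R for x in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def anti_symmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def connective(R, X):
    return all((x, y) in R or x == y or (y, x) in R for x in X for y in X)

def is_equivalence(R, X):
    return reflexive(R, X) and symmetric(R) and transitive(R)

# X: all subsets of {0, 1}; R: the subset relation on X
X = [frozenset(c) for r in range(3) for c in combinations([0, 1], r)]
R = {(a, b) for a in X for b in X if a <= b}
```

The sets {0} and {1} are not related in either direction, which is why the
subset relation fails to be connective.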
If R is an equivalence relation on a set X, we denote the equivalence class
of x with respect to R as [x]_R.
Definition of equivalence class: [x]_R := {y ∈ X | R(x, y)}
If R is an equivalence relation on X, the quotient set X/R of X modulo
R is the set of equivalence classes [x]_R for all x ∈ X.
Definition of quotient set: X/R := {[x]_R | x ∈ X}
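As an illustration of equivalence classes and the quotient set, the sketch
below takes X = {0, ..., 9} and relates two numbers when they leave the same
remainder on division by 3. Both the set and the relation are examples of
our own choosing.

```python
X = frozenset(range(10))  # assumption: a small example set

def R(x, y):
    """x and y are equivalent when their difference is divisible by 3."""
    return (x - y) % 3 == 0

def equivalence_class(x):
    """[x]_R: all y in X that are related to x."""
    return frozenset(y for y in X if R(x, y))

# X/R: the set of all equivalence classes
quotient = frozenset(equivalence_class(x) for x in X)
```

Although ten classes are computed, only three distinct sets remain in the
quotient, and together they form a partition of X.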
We now continue to build on the concept of relations, by categorizing
them based on the properties they have. An important property of relations
is the ability to compare and order elements. Suppose X and Y are sets, and
R is a relation on X.
Definition of (weak) partial ordering: R is a (weak) partial ordering :=
R is reflexive, anti-symmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and
transitive
Definition of strict partial ordering: R is a strict partial ordering :=
R is irreflexive, anti-symmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering
:= R is irreflexive, anti-symmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on
X and each nonempty subset of X has a least element
Definition of well-foundedness: A set S is well-founded by a relation R
:= S is partially ordered by R and contains no infinite descending chains
A set S contains a set C that is an infinite descending chain iff
C ⊂ S ∧ C is nonempty ∧ C has no minimal element.
Theorem: (without proof) Any subset of a well-founded set is also well-
founded.
Now that we can speak of a set of which the elements are ordered by a
relation R, we define the well-known concepts of (immediate) successor and
predecessor.
Definition of (immediate) predecessor: An element x_1 ∈ X is a
predecessor of an element x_2 ∈ X (with respect to an ordering R on X) :=
R(x_1, x_2) ∧ ¬R(x_2, x_1). x_1 is an immediate predecessor of x_2 if in
addition
¬(∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))
Definition of (immediate) successor: An element x_2 ∈ X is a
successor of an element x_1 ∈ X (with respect to an ordering R on X) :=
R(x_1, x_2) ∧ ¬R(x_2, x_1). x_2 is an immediate successor of x_1 if in
addition
¬(∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))
Note that with these definitions it can be easily proved that if a relation
R on X is an ordering, then each element has at most one immediate
predecessor and at most one immediate successor; the smallest element has
no predecessor and the largest element has no successor. The notions of
smallest and largest elements will be introduced hereafter. In the literature
the immediate successor or predecessor is sometimes called just successor or
predecessor. Sometimes we also see that the term ‘direct’ is used instead of
‘immediate’, or we simply speak of the ‘next’ or ‘previous’ value.
When R is a partial ordering we often denote it by the symbol _, and
when it is a quasi ordering by ≺. Now we can distinguish elements based on
their order. Let X be a set, partially ordered by _ and let Y be a subset of X.
Definition of minimal element:
x is a minimal element of X := x ∈ X ∧ (∃y : y ∈ X ∧ y = x : y _ x)
3.2. RELATIONS 27
Definition of maximal element:
x is a maximal element of X := x ∈ X ∧ ¬(∃y : y ∈ X ∧ y ≠ x : x ⪯ y)
Definition of least element:
x is a least (also called smallest or first) element of X :=
x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)
Definition of greatest element:
x is a greatest (also called largest, last or maximum) element of X :=
x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)
Definition of lowerbound:
x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)
Definition of upperbound:
x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)
Definition of infimum:
x is an infimum for Y in X := x is the greatest lowerbound for Y in X
Definition of supremum:
x is a supremum for Y in X := x is the smallest upperbound for Y in X
Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then
R is a partial order (but not strict) and also a quasi order, but not a (total)
order. 4 and 6 are minimal elements of X, but X has no least element.
Viewing X as a subset of N ordered by divisibility, 1 is a lowerbound for X,
and 2 is the infimum of X.
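The claims in this example can be checked mechanically. The following is a minimal sketch, assuming the ambient set is the finite slice {1, ..., 36} of the naturals; the helper names (`divides`, `minimal_elements`, `least_elements`, `lower_bounds`, `infimum`) are illustrative and not part of the report.

```python
def divides(x, y):
    """R(x, y): x is a divisor of y."""
    return y % x == 0

def minimal_elements(S, leq):
    # x is minimal if no other element y of S satisfies leq(y, x).
    return {x for x in S if not any(y != x and leq(y, x) for y in S)}

def least_elements(S, leq):
    # x is least if leq(x, y) holds for every y in S (there is at most one).
    return {x for x in S if all(leq(x, y) for y in S)}

def lower_bounds(Y, ambient, leq):
    # Elements of the ambient set that are below every element of Y.
    return {x for x in ambient if all(leq(x, y) for y in Y)}

def infimum(Y, ambient, leq):
    # Greatest lower bound: a lower bound that lies above all lower bounds.
    lbs = lower_bounds(Y, ambient, leq)
    greatest = [x for x in lbs if all(leq(z, x) for z in lbs)]
    return greatest[0] if greatest else None

X = {4, 6, 12, 24, 36}
ambient = range(1, 37)  # assumption: a finite slice of N that contains X

print(minimal_elements(X, divides))
print(least_elements(X, divides))
print(lower_bounds(X, ambient, divides))
print(infimum(X, ambient, divides))
```

The infimum only exists relative to the ambient set: inside X itself there is no lower bound for X at all, because 4 and 6 are incomparable.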
The so-called least number principle says that any non-empty subset of
the natural numbers has a least element. This principle can be shown (a
proof can be found in [59, page 7]) to be equivalent to the principles of weak
and strong induction, that will be introduced in section 3.4.
Example: The relation < on the naturals is an example of a total ordering
on N. From the so-called least number principle we can conclude that N is
also well-ordered by <. We prove the latter.
Proof: We know that < is an ordering on N. We show by induction on n that
every non-empty A ⊆ {0, . . . , n} has a least element. This suffices: a
non-empty A ⊆ N contains some n, and the least element of A ∩ {0, . . . , n}
is then also the least element of A.
For n = 0 we have A = {0}, with least element 0. For the step from n to
n + 1, let A ⊆ {0, . . . , n + 1} be non-empty. If A ∩ {0, . . . , n} = ∅, then
A = {n + 1} and n + 1 is the least element of A. If A ∩ {0, . . . , n} ≠ ∅, the
induction hypothesis gives a least element of A ∩ {0, . . . , n}, which is also
the least element of A.
3.3 Functions
In mathematics, a function maps each element of an input set to exactly one
element of an output set; in other words it is a special kind of relation
that indicates for each pair ⟨x, y⟩ of elements of the input and output set
whether it belongs to the function or not. More precisely, f is a function or
mapping from X to Y means that f assigns to each x ∈ X a uniquely determined
y ∈ Y, notation f(x) = y. We can define this notion in set-theory by using a
relation between X and Y such that for each x ∈ X there is a unique y ∈ Y
such that ⟨x, y⟩ ∈ f.
Definition of function: f is a function from a set X to a set Y, notation
f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : ⟨x, y⟩ ∈ f))
The definitions of domain and range as given in the subsection about
relations can now also be used for functions. We now introduce a notation
for the set of all functions f : X → Y .
Definition of $Y^X$: $Y^X := \{ f \in \mathcal{P}(X \times Y) \mid f \text{ is a function from } X \text{ to } Y \}$
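To make the set-theoretic definition concrete, the sketch below represents a function as a set of pairs and enumerates the set of all functions from X to Y for two small sets. It is only an illustration: `is_function` and `all_functions` are hypothetical helper names, and the sets are chosen arbitrarily.

```python
from itertools import product

def is_function(f, X, Y):
    """f is a set of pairs (x, y); check f ⊆ X × Y and that each x has exactly one y."""
    if not all(x in X and y in Y for (x, y) in f):
        return False
    return all(sum(1 for (a, _) in f if a == x) == 1 for x in X)

def all_functions(X, Y):
    """Enumerate every function from X to Y, each as a frozenset of pairs."""
    xs = sorted(X)
    return [frozenset(zip(xs, ys)) for ys in product(sorted(Y), repeat=len(xs))]

X = {0, 1, 2}
Y = {"a", "b"}
funcs = all_functions(X, Y)

print(len(funcs))  # number of functions from X to Y
print(is_function({(0, "a"), (0, "b"), (1, "a"), (2, "a")}, X, Y))  # 0 has two images
print(is_function({(0, "a")}, X, Y))  # 1 and 2 have no image
```

For finite sets the enumeration yields |Y| to the power |X| functions, which motivates the exponent notation.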
As we did before for relations and operations, we now define some general
properties for functions.
Definition of injective: f : X → Y is injective or an injection :=
$(\forall x_1, x_2 : x_1, x_2 \in X : x_1 \neq x_2 \rightarrow f(x_1) \neq f(x_2))$
Definition of surjective: f : X → Y is surjective or a surjection :=
(∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))
Definition of bijective:
f : X → Y is bijective or a bijection := f is surjective and f is injective
If f is bijective, f is also called a (one-to-one) correspondence between
X and Y .
Example: We have the following property:
f : X → Y is surjective ↔ Ran(f) = Y .
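For finite functions the three properties can be tested directly. The sketch below stores a function as a Python dict from arguments to values; the function names and the sample data are assumptions made for illustration only.

```python
def is_injective(f):
    """No two distinct arguments share a value."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, Y):
    """Every element of the codomain Y is hit, i.e. Ran(f) = Y."""
    return set(f.values()) == set(Y)

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

f = {0: "a", 1: "b", 2: "a"}   # onto {"a", "b"}, but 0 and 2 collide
g = {0: "a", 1: "b"}           # injective, but misses "c"
h = {0: "b", 1: "a"}           # a one-to-one correspondence

print(is_injective(f), is_surjective(f, {"a", "b"}))
print(is_injective(g), is_surjective(g, {"a", "b", "c"}))
print(is_bijective(h, {"a", "b"}))
```

Note that surjectivity depends on the chosen codomain Y, whereas injectivity depends on f alone.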
Example: f : N → [−2π, 2π], with f(x) = sin(x) is a function and a relation.
g : [−2π, 2π] →N, with g(x) = y iff x = sin(y) is a relation, not a function.
We will now consider two special kinds of functions: the identity function
and the sequence.
Definition of sequence:
s is a sequence of X := s is a function from N to X (i.e. $s \in X^{\mathbb{N}}$)
Definition of identity function:
The identity function $\mathrm{id}_X$ := $\mathrm{id}_X : X \to X$ and $(\forall x : x \in X : \mathrm{id}_X(x) = x)$
We now introduce some operations on functions in set-theory. We can
easily check that these definitions correspond to mathematical operations.
Definition of composition: The composition g◦f of two functions f : A →
B and g : B → C := the function g ◦ f : A → C with g ◦ f(x) = g(f(x)), for
all x ∈ A
Definition of inverse function: The inverse of a bijection f : X → Y :=
the function $f^{-1} : Y \to X$ with $(\forall y : y \in Y : f^{-1}(y) = x \leftrightarrow y = f(x))$
Definition of restricted function: The restriction of a function f :
X → Y to $X_0$, with $X_0 \subseteq X$ := the function $f|_{X_0} : X_0 \to Y$ with
$(\forall x : x \in X_0 : f|_{X_0}(x) = f(x))$
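The three operations are easy to try out on finite functions. The following sketch again uses dicts as functions; `compose`, `inverse` and `restrict` are illustrative names, and `inverse` assumes that the codomain is exactly the set of values of f.

```python
def compose(g, f):
    """(g ∘ f)(x) = g(f(x)) for every x in the domain of f."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of a bijection onto its range; fails if f is not injective."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv

def restrict(f, X0):
    """Restriction of f to a subset X0 of its domain."""
    return {x: f[x] for x in X0}

f = {1: "a", 2: "b", 3: "c"}
g = {"a": 10, "b": 20, "c": 30}

print(compose(g, f))
print(inverse(f))
print(compose(inverse(f), f))  # the identity function on {1, 2, 3}
print(restrict(f, {1, 2}))
```

Composing a bijection with its inverse gives the identity function, which is a quick sanity check on both definitions.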
Just as in algebra, we can now combine a set and relations on that set
into a structure.
Definition of (relational) structure: $\langle X, R_0, \ldots, R_p \rangle$ is a (relational)
structure := X is a set and $R_0, \ldots, R_p$ are relations on X
The concept of a structure enables us to abstract from the exact set and
relations, and reason about sets of structures instead. There also is a useful
definition for equivalence of structures, called isomorphism.
Let $\mathcal{R} = \langle X, R_0, \ldots, R_p \rangle$ and $\mathcal{S} = \langle Y, S_0, \ldots, S_p \rangle$ be two structures, such
that $(\forall i : 0 \le i \le p : \text{the arity of } R_i \text{ and } S_i \text{ is } n_i + 1)$.
Definition of isomorphism: f is an isomorphism between $\mathcal{R}$ and $\mathcal{S}$ := f
is a bijection from X to Y and $(\forall i : 0 \le i \le p : (\forall x_0, \ldots, x_{n_i} : x_0, \ldots, x_{n_i} \in X : R_i(x_0, \ldots, x_{n_i}) \leftrightarrow S_i(f(x_0), \ldots, f(x_{n_i}))))$
With the notion of isomorphism, we can now abstract over structures.
When two structures are similar (the sets are of the same size and the rela-
tionships between the elements in one structure are retained between images
of those elements in the other structure), we call them isomorphic.
Definition of isomorphic: Two structures R and S are isomorphic, notation
R ≅ S := there exists an isomorphism from R to S
Definition of automorphism:
f is an automorphism of R := f is an isomorphism from R to R
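Example: An isomorphism from structure $\langle \mathbb{N}, < \rangle$ to $\langle \mathbb{N}_{even}, < \rangle$ is given by
$f : \mathbb{N} \to \mathbb{N}_{even}$, with $f(n) = 2n$. f is not an isomorphism from $\langle \mathbb{N}, \oplus \rangle$ to
$\langle \mathbb{N}, < \rangle$, with $a \oplus b$ := b divides a.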
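Example: The function $g : \mathbb{R}^+ \to \mathbb{R}$ with $g(x) = \log(x)$ is an isomorphism
between $\langle \mathbb{R}^+, * \rangle$ and $\langle \mathbb{R}, + \rangle$, because g is a bijection from the positive
reals onto the reals and for all $r_1, r_2 \in \mathbb{R}^+$, $\log(r_1 * r_2) = \log(r_1) + \log(r_2)$.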
Example: An automorphism of $\langle A, R_0, \ldots, R_p \rangle$ is the identity function $\mathrm{id}_A : A \to A$,
so $\mathrm{id}_A = \{ \langle a, a \rangle \mid a \in A \}$. Also, the function $f(x) = 2x^3$ is an
automorphism of $\langle \mathbb{R}, < \rangle$.
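On finite structures the isomorphism condition can be checked by brute force. The sketch below tests the doubling map on a finite slice of the naturals; restricting to {1, ..., 5} is an assumption made so that the check terminates, and `is_isomorphism` is a hypothetical helper.

```python
from itertools import product
from math import isclose, log

def is_isomorphism(f, X, Y, relations_X, relations_Y):
    """f: dict from X to Y; relations are (predicate, arity) pairs matched by position."""
    values = list(f.values())
    if set(f) != set(X) or set(values) != set(Y) or len(set(values)) != len(values):
        return False  # f is not a bijection from X to Y
    for (r, arity), (s, _) in zip(relations_X, relations_Y):
        for xs in product(X, repeat=arity):
            # The relation must hold for xs exactly when it holds for the images.
            if r(*xs) != s(*(f[x] for x in xs)):
                return False
    return True

X = set(range(1, 6))
Y = {2 * n for n in X}
f = {n: 2 * n for n in X}

less_than = lambda a, b: a < b
divided_by = lambda a, b: a % b == 0   # a ⊕ b: b divides a

print(is_isomorphism(f, X, Y, [(less_than, 2)], [(less_than, 2)]))
print(is_isomorphism(f, X, Y, [(divided_by, 2)], [(less_than, 2)]))
print(isclose(log(3.0 * 7.0), log(3.0) + log(7.0)))  # log turns * into +
```

The second check fails already for the pair (1, 1): 1 divides 1, but 2 < 2 does not hold.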
3.4 Induction Methods
There is a tradition of opposition between adherents of induction
and deduction. In my view it would be just as sensible for the two
ends of a worm to quarrel.
- A. Whitehead, quoted in [76]
3.4.1 Induction
Induction is a method of reasoning from a part to a whole, from particu-
lars to generals, or from the individual to the universal. It should not be
confused with the mathematical principle of induction (treated in section
3.4.3). In ordinary induction we examine a certain number of cases and
then generalize. Reasoning by analogy, where a conclusion is drawn from
an analogous situation, is also a primitive form of induction (see [23, page 6]).
Example of inductive reasoning (see footnote 2):
Coffee shop burger no. 1 was greasy . . .
Coffee shop burger no. 2 was greasy . . .
. . .
Coffee shop burger no. 100 was greasy . . .
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger
will be greasy).
So in induction the conclusion contains information that was not con-
tained in the premisses. This is the source of uncertainty in inductions:
inductions are strengthened as confirming instances pile up, but they can
never bring certainty (unless every possible case is actually examined, in
which case they become deductions). As said in [49, page 366], the broad
difference between deductive and inductive reasoning is that in deduction
the conclusion asserts less than the premisses, whereas in induction it asserts
more. In chapter 14, section 3 of [49] there is a more detailed treatment of
inductive reasoning, including a distinction between determinative and
conceptual induction. In both these kinds of induction, the conclusion goes
beyond the premisses (or the evidence).
Footnote 2: Example from: Peter Suber, Philosophy department, Earlham College.
3.4.2 Deduction
Mathematics, in its widest significance, is the development of all
types of formal, necessary, deductive reasoning.
- A. Whitehead, quoted in [100]
In contrast to induction, deduction is a method of reasoning that is based
on a rigorous proof: a derivation (using fixed rules called a system of logic), of
one statement (the conclusion) from one or more statements (the premisses)
- i.e. a chain of statements, each of which is either a premise or a consequence
of a statement occurring earlier in the proof. In deductive reasoning, we are
not directly concerned with the truth of the conclusion but rather whether
the conclusion does or does not follow from the premisses. If the conclusion
follows from the premisses, we say that our reasoning is valid; if it does not
we say that our reasoning is invalid.
The Greeks regarded deductive reasoning, not empirical procedure, as the
method to establish mathematical facts. This usage is a generalization of what
the Greek philosopher Aristotle called the syllogism (see [49, chapter 1,
sections 5 and 6]), but a syllogism is now recognized as merely a special case
of a deduction. Also, the traditional view that deduction proceeds from the
general to the specific has been abandoned as incorrect by most logicians. Some
experts regard all valid inferences as deductive in form and for this and other
reasons reject the supposed contrast between deduction and induction. The
German mathematician Hilbert greatly contributed to deductive reasoning as
we will see when we introduce his proof theory (also known as the axiomatic
method) in chapter 6. Logic, in mathematical context, can be seen as the
theory of the formal structure of deductive reasoning. The logic of Hilbert’s
metamathematics (see section 6.1) and Russell’s Principia Mathematica (see
section 7.1) are a form of reasoning with deductive certainty, although others
have proposed different formalizations of deductive logic (see [49, page 121]).
Originally based on Aristotle’s logic, the deductive argument has become
more subtle and complex and is now based on modern symbolic logic.
3.4.3 The principle of induction
Informal
The principle of induction, also known as mathematical induction, is an
important process for proving theorems. It was even used by Peano to define
the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical
induction’ is unfortunately named, for it is unambiguously a form of deduc-
tion. The name was probably inspired by the fact that, just like induction,
it generalizes to a whole set from a smaller sample. But, as we will see,
mathematical induction concludes with deductive certainty.
The informal structure of the proof of a theorem by mathematical induc-
tion is fairly simple:
1) Basis. Prove that the theorem holds for a specific case (which often is
minimal for a given ordering of the elements). This case is also called
base case.
2) Induction step. Prove a rule that says that if the theorem holds for an
arbitrary element, it also holds for the next case. This often is a rule of
heredity that tells us that the theorem is true for the immediate successor
of an arbitrary element if it is true for the arbitrary element itself.
The claim that the theorem is true for an arbitrary element is called
the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all
cases starting with the base case. If you didn’t use the minimal case in
step 1, then you have proven only that the theorem holds for that case
and its successors, not for all possible cases.
The induction step can take two forms which correspond to two forms of
mathematical induction. Again we assume there is an ordering of the ele-
ments with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it
holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary
element n, then it holds for the element n + 1
We will now formally state the principle of induction. This is important,
since many mistakes are made in applying the principle. It is worth stating
explicitly that if we are to use mathematical induction to prove that
some theorem applies to ‘all possible cases’, then those cases must somehow
be enumerable and in some way linked to the integers. And we have to be
able to speak about the minimal case, the n-th case, the successor of a given
case, etc.
Formal
Suppose that we want to prove that a property ϕ(s) holds for all s ∈
S. The induction principle assumes that S is a well-founded set and every
element except for the smallest has an immediate predecessor. This condition
is also known as S is inductive. The structure of an inductive set in fact
resembles that of the naturals, i.e. if we have the axioms (see Peano axioms
in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is
inductive. In case the set S is the naturals, we also refer to the principle as
natural induction.
The principle presupposes the following two conditions:
(A) S is a set, well-founded by relation R (such that ‘+’ denotes the
immediate successor of an element with respect to the relation R) and
with smallest element e
(B) Every element except e has a (unique) immediate predecessor and ϕ
is a property of elements of S
If (A) and (B) hold, we can use the induction principle.
Definition of the (weak) (mathematical) induction principle:
if
(C) ϕ(e) (i.e. e has a property ϕ)
(D) (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the
(unique) immediate successor of s also has property ϕ)
then the property ϕ holds for every element in S
Step (C) is also called the base of a proof by induction, step (D) is also
called the induction step, and ϕ(s) is called the induction hypothesis.
Proof: Suppose S is a well-founded set and every element except the smallest,
denoted e, has an immediate predecessor, and suppose that a property
ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for
s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose
that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which
ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem stated in section 3.2
(any subset of a well-founded set is also well-founded), N is well-founded,
and since N is non-empty it contains a smallest element n. If n = e, we have
a contradiction with (C). If n > e, n has an immediate predecessor, denoted
n−. Since n is the smallest element for which ϕ doesn’t hold, ϕ must hold
for n−. But then by (D), ϕ must also hold for the immediate successor of n−,
that is n: contradiction. Thus ϕ must be true for all s ∈ S.
As we mentioned before, this principle can be generalized in several ways.
One way is to prove in step (C) that ϕ holds for a (possibly non-minimal) case
b ∈ S. In step (D) we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)).
The conclusion then is that the property ϕ holds for all elements in S that
are greater than or equal to b.
We now show (with proof by contradiction) why the additional property (B)
that every element except the smallest must have an immediate predecessor
is necessary for the induction principle.
Consider the natural numbers with the ordering ≺ defined as follows:
• if n and m are both even, then n ≺ m if n < m
• if n and m are both odd, then n ≺ m if n < m
• if n is even and m is odd, we always define n ≺ m
We can check that N is well-founded by ≺, but not every element other than
the smallest (for example 1) has an immediate predecessor. We take for ϕ(s)
the property that s is even. The smallest element in the ordering is 0, which
is even. Also, if s has property ϕ then so does the successor of s. That is
because in our ordering, the successor of an even number is always the next
even number, never an odd number, and if s has property ϕ, then s must be even.
Therefore (with only conditions (A), (C) and (D) holding) every natural
number is even: contradiction!
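The counterexample can be explored on a finite slice of the naturals. The sketch below is an illustration under that assumption: `precedes`, `successor` and `between` are hypothetical names, and a finite check can of course only suggest, not prove, the statement about all of N.

```python
def precedes(n, m):
    """True if n comes strictly before m: all evens first, then all odds."""
    if n % 2 == m % 2:
        return n < m
    return n % 2 == 0

def successor(n):
    """Immediate successor in this ordering: the next number of the same parity."""
    return n + 2

def between(n, m):
    """Return the successor of n if it lies strictly between n and m, else None."""
    z = successor(n)
    return z if precedes(n, z) and precedes(z, m) else None

def is_even(n):
    return n % 2 == 0

# A finite slice listed in the new order: evens first, then odds.
listed = sorted(range(10), key=lambda n: (n % 2, n))
print(listed)

# Every even n precedes 1, but n + 2 always sits strictly between n and 1,
# so no even number is an immediate predecessor of 1.
print(all(between(n, 1) == n + 2 for n in range(0, 100, 2)))

# The induction step holds for "is even", yet 1 is not even.
print(all(is_even(successor(n)) for n in range(0, 100, 2)), is_even(1))
```

Starting from 0 and repeatedly taking successors only ever visits even numbers, which is exactly why the base case and the induction step alone do not reach the odd part of the ordering.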
There is however a weaker principle, called transfinite induction which -
suitably stated - does apply to every well-ordered set. But first we regard a
stronger principle, that is based on the same assumptions ((A) and (B)) as the
weak induction principle.
Principle of strong (mathematical) induction: The same as for (weak)
induction, but instead of (C) and (D) with
(E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S
we have ϕ(x) if all R-predecessors y of x have property ϕ)
Sometimes this is also informally stated using the infamous three dots as
(∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)).
Proof: Suppose ⟨X, R⟩ is a structure such that (A), (B) and (E) hold. Again
we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus
{x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e′ (since ⟨X, R⟩
is well-founded). We now have ¬ϕ(e′) ∧ (∀z : z ∈ X : R(z, e′) → ϕ(z)).
According to (E) (substitute z for y, X for S, and take e′ for x) we then have
ϕ(e′): contradiction.
Note that the base case is not really left out, since it is implicitly present
in the quantification (take e for x). This form of induction, when applied
to ordinals (ordinals form a well-ordered and hence well-founded set and are
introduced in section 3.8.2) is called transfinite induction.
Principle of transfinite induction (see footnote 3): The same as for strong
induction, but instead of (A) and (B) as assumptions, it can be applied to any
set S that is well-ordered by a relation R, and with smallest element e.
Footnote 3: Sometimes this principle is called the Principle of Complete
Induction, for example in [4], but this is less common.
Examples of such a set are the ordinals or cardinals, or even the class
of all ordinals. A proof by transfinite induction typically needs to distinguish
three cases:
1. s is a minimal element
2. s has an immediate predecessor (i.e. the set of elements which are
smaller than s has a largest element)
In this case we can apply normal induction.
3. s has no immediate predecessor (i.e. s is a so-called limit-ordinal, see
also section 3.8.2)
The case for limit ordinals is typically approached by noting that a limit
ordinal b is (by definition) the union of all ordinals a < b and using this
fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.
Proof: The proof of the principle of transfinite induction is similar to the
proof of the strong induction principle.
Clearly, all three given principles are equivalent, since we proved them to
be true. These proofs however are based on an underlying set of axioms (the
so-called ZF axioms and the Peano axioms, that will be introduced in section
5.3 and chapter 4 respectively). Without these conditions (to be exact,
without Peano’s induction axiom), we cannot directly prove the principles to be
true from the ZF axioms alone (see footnote 4). In that case we can prove the
equivalence of the principles by showing that they imply each other. As an
example, we now prove that (mathematical) induction is a special case of
transfinite induction, for the set of natural numbers. To prove this it
suffices to show that ((C) and (D)) ↔ (E).
Footnote 4: With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to
prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice,
or one of its equivalent forms. The four statements known as ‘Axiom of Choice’, ‘Zorn’s
Lemma’, ‘Well-Ordering principle’ (also known as well-ordering theorem, see section 3.8.2)
and ‘Mathematical Induction Principle’ are all equivalent, meaning that if you assume one
of them to be true, the others follow as consequences, but none of them can be proven
from the other fundamental axioms in ZF set theory alone. There are also other equivalent
statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise
to prove the equivalence of these statements.
Normal induction (IND):
(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))
Transfinite induction (TFIND):
(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀m : m ∈ N : ψ(m)))
We can prove the equivalence of IND and TFIND in two ways: in a con-
structive way or with a proof by contradiction. We give both proofs.
Proof by Contradiction: (from: [17])
It suffices to prove that IND’ ≡ TFIND’, with
IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))
TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))
Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’,
and instantiate ψ with the property ϕ. We now want to prove IND’. If we
take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have
ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)).
Assume k ∈ N, ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND’
(∀p : p ∈ N : p < q → ϕ(p)), with q = k + 1 must not be true: ¬(∀p :
p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ϕ(p)).
Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ϕ(s), that is
(∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to
TFIND’ (namely if we take s for q and r for p), contradiction. Now we have
proved that (∀ϕ :: (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))), and since we already
have proven (∀ϕ :: ϕ(0)), we have IND’.
Proof of IND’ → TFIND’: Assume IND’, instantiate ϕ with ψ. For all
properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q →
ψ(p)) → ψ(q)). First we prove this for q = 0. If we take q = 0, we have
(∀p : p ∈ N : p < 0 → ψ(p)) → ψ(0), which reduces to ψ(0) since the
antecedent is trivially true. This is true by the assumption of IND’.
Now we prove this for q > 0. Suppose we have (∀p : p ∈ N : p < q → ψ(p)).
Then in particular ψ(q − 1) holds. By IND’ we also know that
(∀k : k ∈ N : ψ(k) → ψ(k + 1)), and thus ψ(q) also holds for all q > 0.
Hereby we have proved TFIND’.
Constructive Proof:
Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We
now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n :
n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We
want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). TFIND gives us:
(∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let k ∈ N. We now have
that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k). If k = 0, (∀l : l ∈ N : l < k → ϕ(l))
is trivially true since the range of l is empty. Thus ϕ(k) holds for k = 0.
Assume k > 0, and (∀l : l ∈ N : l < k → ϕ(l)). This means ϕ(k − 1) holds
(since k −1 ∈ N). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k +1)).
Thus ϕ(k) holds also for k > 0.
Proof of IND → TFIND: Assume ϕ is a property. Also assume that
(i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let s(k) :=
(∀l : l ∈ N : l < k → ϕ(l)), for all k ∈ N. We prove (∀n : n ∈ N : ϕ(n)) by
first proving that (∀n : n ∈ N : s(n)) by using IND, and subsequently that
(∀n : n ∈ N : s(n) → ϕ(n)). Clearly, s(0) holds trivially since the range of
l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k),
we can conclude s(k + 1) because ϕ(k) follows from (i) and the definition of
s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k +1)), and thus (by using
IND) that (∀n : n ∈ N : s(n)). And, by the definition of s, (i) gives us that
(∀n : n ∈ N : ϕ(n)).
Structural Induction
In many cases we do not want to prove properties about the integers or
similar well-ordered sets. In such cases straight induction is not always useful.
However, forms of induction can also be appropriate when trying to prove
properties about structures defined recursively. This generalized induction
principle is known as structural induction. It is useful when objects are built
up from more primitive objects: if we can show the primitive objects have
the desired property, and that the act of building preserves that property,
then we have shown that all objects must have the property. The induc-
tive hypothesis (i.e., the assumption) is to assume that something is true for
‘simpler’ forms of an object and then prove that it holds for ‘more complex’
forms. ‘Complexity’ can be defined in several ways: the most common way
is to say that one object is more complex than another if it includes that
other object as a subpart, but this need not always be the case.
A general treatment of recursively defined structures (formal definition
of structural induction over recursive datatypes) will be presented in a later
version of this report.
Example: We show that mathematical induction is an instance of the general
notion of structural induction over values of recursively defined types, in a
later version of this report.
Example: As an example of the use of mathematical induction we prove the
binomial theorem. The binomial theorem states that for all x, y ∈ R, and
n ∈ N we have
$$\mathrm{EQ} \equiv (x + y)^{n} = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}$$
We call the left-hand side of this equality LHS, and the right-hand side
RHS, and abbreviate the equality by EQ. We assume two real numbers x
and y and prove EQ by induction on n.
Basis case: For n = 0, EQ clearly is correct, since both sides are 1. For
some reason, most textbooks take n = 1 as the basis, in which case LHS is
simply $x + y$, and RHS is
$$\binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y$$
Induction case: We assume EQ is true for n = k and have to show that it is
then also true for n = k + 1:
$$(x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j}$$
First, we rewrite the left side of this equation (the second step uses the
induction hypothesis):
$$\mathrm{LHS} = (x + y)^{k+1} = (x + y)^{k} (x + y) = \left( \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j} \right) (x + y)$$
$$= \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$
In rewriting the right side of the equation, we use Pascal’s identity:
$$(\forall k, n : k, n \in \mathbb{N} \land 0 < k \le n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k})$$
We first prove the latter:
$$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!}$$
$$= \frac{n!\,(k + (n-k+1))}{k!\,(n-k+1)!} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$$
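Now we rewrite RHS. We split out the j = 0 and j = k + 1 terms before
applying Pascal’s identity:
$$\mathrm{RHS} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^{j}$$
$$= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left( \binom{k}{j} + \binom{k}{j-1} \right) x^{k+1-j} y^{j}$$
$$= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^{j}$$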
We can now bring $x^{k+1}$ into the first sum (as the j = 0 term), and $y^{k+1}$
into the second sum (as the j = k + 1 term). This gives
$$\mathrm{RHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j}$$
and
$$\mathrm{LHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$
The first sums of LHS and RHS are the same, and we can see that the
second sums are also equal, by doing a dummy transformation (let i = j − 1):
$$\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}$$
So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and
n ∈ N.
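A numerical spot check is no substitute for the proof, but it is a cheap way to catch slips in the index manipulations. The sketch below evaluates both sides of EQ with exact rational arithmetic and also tests Pascal’s identity for 0 < k ≤ n; the function names and the sample values of x and y are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_rhs(x, y, n):
    """Right-hand side of EQ: sum over j of C(n, j) * x^(n-j) * y^j."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))

def pascal_holds(n, k):
    """Pascal's identity: C(n+1, k) = C(n, k-1) + C(n, k)."""
    return comb(n + 1, k) == comb(n, k - 1) + comb(n, k)

# Exact rationals avoid floating-point rounding in the comparison.
x, y = Fraction(3, 2), Fraction(-5, 7)

print(all((x + y) ** n == binomial_rhs(x, y, n) for n in range(10)))
print(all(pascal_holds(n, k) for n in range(1, 20) for k in range(1, n + 1)))
```

The case k = n matters here: the induction step applies the identity to the term j = k of a sum whose upper index is also k.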
Example: We give an example of a proof about binary trees using structural
induction. First we define a data structure for binary trees. For this example
we will use a definition in the notation of the language Z to describe recur-
sive data structures. The structure of a binary tree is well known and says
that a tree is either a leaf or made up of two subtrees glued together by a node.
TREE ::= leaf | node ⟨⟨TREE × TREE⟩⟩
An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)). We
now define the size of a tree, by counting both the leaves and the nodes. The
basic idea of the definition is that we define the size of a tree inductively over
the structure, saying how the size of a given tree is calculated from the sizes
of its parts. Again we define the size in the language Z, by first declaring its
type and then saying how it is defined in each of the two cases:
size : TREE → N
∀ $t_1$, $t_2$ : TREE •
size(leaf) = 1 ∧
size(node($t_1$, $t_2$)) = 1 + size($t_1$) + size($t_2$)
Similarly, we make two new definitions about trees:
leaves : TREE → N
nodes : TREE → N
∀ $t_1$, $t_2$ : TREE •
leaves(leaf) = 1 ∧
leaves(node($t_1$, $t_2$)) = leaves($t_1$) + leaves($t_2$) ∧
nodes(leaf) = 0 ∧
nodes(node($t_1$, $t_2$)) = 1 + nodes($t_1$) + nodes($t_2$)
We now want to prove the following theorem by structural induction on the
size of the tree t.
Theorem: For all trees t, size(t) = leaves(t) + nodes(t).
Proof: Let $t$, $t'$, $t_1$ and $t_2$ be of type TREE. We prove the theorem by
induction on the size of t.
Base case: Assume t = leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) +
nodes(t) = leaves(t) + 0 = 1 + 0 = 1.
Induction case: Assume t = node($t_1$, $t_2$). The induction hypothesis says that
the theorem holds for all $t'$ with size($t'$) < size(t), which includes $t_1$ and $t_2$.
Then size(t) = size(node($t_1$, $t_2$)) = 1 + size($t_1$) + size($t_2$) = (apply induction
hypothesis to $t_1$ and $t_2$) 1 + (leaves($t_1$) + nodes($t_1$)) + (leaves($t_2$) + nodes($t_2$)).
And leaves(t) + nodes(t) = leaves(node($t_1$, $t_2$)) + nodes(node($t_1$, $t_2$)) =
(leaves($t_1$) + leaves($t_2$)) + (1 + nodes($t_1$) + nodes($t_2$)) = (commutativity and
associativity of +) 1 + (leaves($t_1$) + nodes($t_1$)) + (leaves($t_2$) + nodes($t_2$)).
Both expressions are equal, hence size(t) = leaves(t) + nodes(t).
3.5 Real numbers
What do we mean when we say ‘continuum’ ? Here is a description Albert
Einstein gave on page 83 of [21]:
The surface of a marble table is spread out in front of me. I can
get from any point on this table to any other point by passing
continuously from one point to a ‘neighboring’ one, and repeating
this process a (large) number of times, or, in other words, by
going from point to point without executing ‘jumps’. I am sure
the reader will appreciate with sufficient clearness what I mean
here by ‘neighboring’ and by ‘jumps’ (if he is not too pedantic).
We express this property of the surface by describing the latter as
a continuum.
People have been using the concept of real numbers for a long time (the
Babylonians for example already calculated with roots long B.C., see [12]).
In order for set theory to cover the fundamental structures of analysis, a
precise and formal basis for the real numbers was needed. Even simple
equations have no solutions if all we know are the rational numbers (for
example, there is no rational number x such that $x^2 = x \cdot x = 2$).
When Cantor developed his set theory, it was well known that each type of
number could be constructed as the limit of a sequence of numbers of another
type. But it became clear that, especially in connection with theorems as-
serting the existence of some limit relations, (see [30, page 182]) the proof
might require irrational numbers to be defined in terms of rational ones, in
order to avoid begging the question of existence involved in the theorem.
Cauchy and Heine tried to define the irrational or real numbers in the second
half of the 19th century. In 1872 Cantor and Dedekind followed with their
precise definitions of the real numbers. We first present the three methods
(of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals
and then show that they are identifiable.
3.5.1 Dedekind’s cuts
As a professor in the Polytechnic School in Zürich I found myself
for the first time obliged to lecture upon the elements of the
differential calculus and felt more keenly than ever before the lack
of a really scientific foundation for arithmetic.
- Richard Dedekind, in the opening of the paper in which Dedekind’s
cuts were introduced.
Dedekind defined a cut to determine a real number. A cut is a partition
of a sequence into two disjoint nonempty subsequences, all the members of
one of which are less than all the members of the other. Dedekind used the
point at which the sequence is partitioned (see footnote 5) to define a real number.
Definition of a (Dedekind) cut:
Given an ordering $<$ on a set $V$, a subset $C \subseteq V$ is a cut in $V$ :=
1) $C \neq \emptyset \wedge C \neq V$
2) $(\forall a, b : a, b \in V : a \in C \wedge b < a \rightarrow b \in C)$
3) $C$ does not have a greatest element
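Example: $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ is a cut in $\mathbb{Q}$ (the condition
$x < 0$ is needed to satisfy 2), since for example $-2$ must belong to the cut).
Notice that we can also define the same cut as $\{x \in \mathbb{Q} \mid x < 0 \vee x^4 < 4\}$.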
Each real number $r$ can now be defined by a cut $C$ in $\mathbb{Q}$ if $r$ is the
supremum of $C$. Each cut then determines a unique real number (see paragraph
3.5.4). We want to identify cuts that define the same real number, such as
for example $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ and $\{x \in \mathbb{Q} \mid x < 0 \vee x^4 < 4\}$.
Definition of (Dedekind) cut equivalence: A cut $C_1$ is equivalent to a
cut $C_2$, notation $C_1 \sim C_2$ := there is an $r$ that is the supremum of both $C_1$ and $C_2$
We can now define $\mathbb{R}_{\text{Dedekind}}$ as the set of all equivalence classes of all cuts
in $\mathbb{Q}$: $\mathbb{R}_{\text{Dedekind}} := \{C \subseteq \mathbb{Q} \mid C \text{ is a cut in } \mathbb{Q}\}/\sim$.
5
Actually, Dedekind’s original definition did not use a partition but a slightly more
complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.
Example: $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ has $\sqrt{2}$ as supremum. We can identify the real
number $\sqrt{2}$ with the equivalence class of all cuts that have $\sqrt{2}$ as supremum.
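The three conditions of a cut can be spot-checked on a finite sample of rationals. The following is only a sketch: the names `in_sqrt2_cut`, `samples` and `larger_member` are illustrative and not from the text, and the cut is taken in the downward-closed form $\{x \in \mathbb{Q} \mid x < 0 \vee x^2 < 2\}$ so that condition 2) holds. A finite sample can of course only illustrate the conditions, not prove them.

```python
from fractions import Fraction

def in_sqrt2_cut(x: Fraction) -> bool:
    """Membership test for the cut {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2

# A finite grid of rationals between -3 and 3 (an arbitrary choice).
samples = [Fraction(n, 8) for n in range(-24, 25)]
members = [x for x in samples if in_sqrt2_cut(x)]
non_members = [x for x in samples if not in_sqrt2_cut(x)]

# 1) The cut is neither empty nor all of Q.
nonempty_and_proper = bool(members) and bool(non_members)

# 2) Downward closed: on the grid, every member lies below every non-member.
downward_closed = max(members) < min(non_members)

# 3) No greatest element: for a member x >= 0 the rational
#    y = (2x + 2) / (x + 2) satisfies x < y and y^2 < 2.
def larger_member(x: Fraction) -> Fraction:
    return (2 * x + 2) / (x + 2)
```

For a negative member, 0 itself is a larger member, so `larger_member` is only needed for $x \geq 0$.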
3.5.2 Cantor’s chains of segments
In mathematics the art of proposing a question must be held of
higher value than solving it.
- A thesis defended in Cantor’s doctoral examination.
Cantor defined a chain of segments to determine a real number (see also
[17, chapter 12]). This is a sequence of ever decreasing intervals in Q, the
limit of which determines a unique real number.
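Definition of chain of segments:
$\langle a_n, b_n \rangle^{V}_{n \in \mathbb{N}}$ is a chain of segments (in $V$) :=
1) $(\forall n : n \in \mathbb{N} : a_n \in V \wedge b_n \in V)$
2) $(\forall n : n \in \mathbb{N} : a_n \leq a_{n+1} \leq b_{n+1} \leq b_n)$
3) $(\forall n : n \in \mathbb{N} : b_n - a_n \leq 2^{-n})$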
Example: Consider the following chain of segments in $\mathbb{Q}$:
$\langle \langle 1, 2 \rangle, \langle 1.4, 1.5 \rangle, \langle 1.41, 1.42 \rangle, \langle 1.414, 1.415 \rangle, \ldots \rangle$.
Each segment ‘includes’ $\sqrt{2}$.
Note that $\langle a_n, b_n \rangle^{V}_{n \in \mathbb{N}}$ (notation $\langle a_n, b_n \rangle^{V}$ or $\langle a_n, b_n \rangle$ when it
is clear which set $V$ is meant) is actually a sequence, and that 3) puts a lower
bound on the speed of convergence. We now want to be able to say
when two chains are equivalent.
Definition of chain equivalence: The chains of segments $\langle a_n, b_n \rangle$ and
$\langle c_n, d_n \rangle$ are equivalent, notation $\langle a_n, b_n \rangle \sim \langle c_n, d_n \rangle$ :=
$(\forall k : k \in \mathbb{N} : b_k \geq c_k \wedge d_k \geq a_k)$
Theorem: ∼ is an equivalence relation on the set of all chains of segments
of Q
Each equivalence class of chains of segments in $\mathbb{Q}$ now uniquely determines
a real number $r$. To be precise, $r$ is determined by $\langle a_n, b_n \rangle_{\sim}$ if
$(\forall n : n \in \mathbb{N} : a_n \leq r \leq b_n)$; $r$ is then the only real number with this property
(see also paragraph 3.5.4).
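We can now define $\mathbb{R}_{\text{Cantor}}$ as the set of all equivalence classes of chains
of segments in $\mathbb{Q}$: $\mathbb{R}_{\text{Cantor}} := \{\langle a_n, b_n \rangle^{\mathbb{Q}}_{n \in \mathbb{N}} \mid \langle a_n, b_n \rangle \text{ is a chain of segments in } \mathbb{Q}\}/\sim$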
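As an illustration, a chain of segments for $\sqrt{2}$ can be produced by repeated bisection. This is a sketch under assumptions: bisection is just one convenient way to obtain such a chain (the text does not prescribe it), and the names `sqrt2_chain`, `overlap_on_prefix` and `decimal_chain` are made up for this example. Only a finite prefix of each chain is computed and compared.

```python
from fractions import Fraction

def sqrt2_chain(length: int):
    """Return the first `length` segments (a_n, b_n) of a chain for sqrt(2).

    Start with <1, 2> and always keep the half with a^2 <= 2 <= b^2,
    so that b_n - a_n = 2^-n.
    """
    a, b = Fraction(1), Fraction(2)
    chain = []
    for _ in range(length):
        chain.append((a, b))
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid
        else:
            b = mid
    return chain

def overlap_on_prefix(chain1, chain2):
    """Finite spot check of the equivalence condition b_k >= c_k and d_k >= a_k."""
    return all(b >= c and d >= a for (a, b), (c, d) in zip(chain1, chain2))

chain = sqrt2_chain(20)

# The decimal chain from the example above, as exact fractions.
decimal_chain = [(Fraction(1), Fraction(2)),
                 (Fraction(14, 10), Fraction(15, 10)),
                 (Fraction(141, 100), Fraction(142, 100)),
                 (Fraction(1414, 1000), Fraction(1415, 1000))]
```

Both chains close in on the same number, so `overlap_on_prefix(chain, decimal_chain)` holds on the computed prefix, in line with the definition of chain equivalence.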
3.5.3 Cauchy-sequences
Men pass away, but their deeds abide.
- Louis Cauchy, his last words quoted in [22].
Cauchy defined a Cauchy sequence to determine a real number. His sequence
of numbers defines a real by letting the numbers come closer to the real num-
ber in every step.
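Definition of Cauchy Sequence: With $\preceq$ a partial order on a set⁶ $V$,
$\{a_n\}^{V}_{n \in \mathbb{N}}$ is a Cauchy sequence in $V$ :=
1) $(\forall n : n \in \mathbb{N} : a_n \in V)$
2) $(\forall k : k \in \mathbb{N} : (\exists p : p \in \mathbb{N} : (\forall n, m : n, m \in \mathbb{N} : n, m > p \rightarrow |a_n - a_m| \leq 2^{-k})))$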
Example: The informally (using ‘$\ldots$’ to indicate an infinite continuation)
defined sequences $\{1, 1.4, 1.414, 1.4142, 1.41421, 1.414213, \ldots\}$ and
$\{1, 1.414, 1.41421, \ldots\}$ are both Cauchy sequences. For each $n \in \mathbb{N}$, $a_{n+1}$ lies
at least as close to $\sqrt{2}$ as $a_n$.
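Condition 2) of the definition can be spot-checked for the decimal truncations of $\sqrt{2}$. The sketch below assumes the sequence $a_n = \lfloor \sqrt{2} \cdot 10^n \rfloor / 10^n$ and the choice $p = k$, neither of which is taken from the text; the function names are illustrative. Only finitely many pairs $(n, m)$ are tested, so this illustrates the condition rather than proving it.

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    """n-digit decimal truncation of sqrt(2): 1, 1.4, 1.41, 1.414, ..."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def satisfies_bound(k: int, p: int, upto: int) -> bool:
    """Check |a_n - a_m| <= 2^-k for all p < n, m <= upto (a finite spot check)."""
    bound = Fraction(1, 2 ** k)
    return all(abs(a(n) - a(m)) <= bound
               for n in range(p + 1, upto + 1)
               for m in range(p + 1, upto + 1))
```

Since two truncations with more than $p$ digits differ by less than $10^{-(p+1)}$, taking $p = k$ is always enough here; a too small $p$, such as $p = 0$ for $k = 8$, fails the check.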
We also denote a Cauchy sequence $\{a_n\}_{n \in \mathbb{N}}$ simply by $a_n$. We now want
to be able to say when two Cauchy sequences are equivalent.
6
V is in general an ordered, commutative ring. We will not further discuss this here,
and for the rest of this paragraph take V = Q.
Definition of Cauchy sequence equivalence: The sequences $a_n$ and $b_n$
are equivalent, notation $a_n \sim b_n$ := $\lim_{n \to \infty}(a_n) = \lim_{n \to \infty}(b_n)$
Note that in the definition of equivalence the hitherto undefined notion
of a limit is used. With the following definition we can formalize the notion
of a limit.
Definition of sequence convergence: A sequence $\{a_n\}_{n \in \mathbb{N}}$ of elements of
a set $V$ is said to converge to a sequence $\{b_n\}_{n \in \mathbb{N}}$, notation
$\lim_{n \to \infty}(a_n) = \lim_{n \to \infty}(b_n)$ :=
$(\forall k : k \in \mathbb{N} : (\exists p, q : p, q \in \mathbb{N} : (\forall n, m : n, m \in \mathbb{N} \wedge n > p \wedge m > q : |a_n - b_m| < 2^{-k})))$
Note: convergence is usually defined in terms of real numbers, but we cannot
use such a definition here because we have yet to define the reals. Once the
reals are available, the number $r$ is called the limit of the sequence $a_n$,
notation $\lim_{n \to \infty}(a_n) = r$,
if $(\forall k : k \in \mathbb{N} : (\exists p : p \in \mathbb{N} : (\forall n : n \in \mathbb{N} \wedge n > p : |a_n - r| < 2^{-k})))$.
A sequence is said to diverge if it does not converge.
Theorem: Any convergent sequence $\{a_n\}_{n \in \mathbb{N}}$ is bounded and has a unique
limit.
Proof: First we prove (by contradiction) the uniqueness. Suppose the sequence
has two limits, $c$ and $c'$. Take any $k \in \mathbb{N}$. Then from the definition of
convergence there is an integer $p$ such that $|a_n - c| < 2^{-k}$ if $n > p$. Also, there
is an integer $p'$ such that $|a_n - c'| < 2^{-k}$ if $n > p'$. Adding the two
inequalities for an $n$ with $n > p \wedge n > p'$ we get (using the triangle
inequality: $(\forall a, b :: |a + b| \leq |a| + |b|)$):
$|c' - c| = |(a_n - c) + (c' - a_n)| \leq |a_n - c| + |a_n - c'| < 2 \cdot 2^{-k}$.
Hence $|c' - c| < 2 \cdot 2^{-k}$ for all $k \in \mathbb{N}$. This means $c = c'$,
thus the limit is indeed unique. Now we prove boundedness. The sequence
converges, so we can take, for example, $k = 1$. Then there is a $p$ such that
$|a_j - c| < 2^{-1}$ for $j > p$. We then have, again using the triangle inequality,
that $|a_j| \leq |a_j - c| + |c| < 2^{-1} + |c|$. Then the sequence can be
bounded by $M = \max\{|a_0|, |a_1|, \ldots, |a_p|, 1 + |c|\}$.
Each real number can now be defined by an equivalence class of Cauchy
sequences: $r$ is determined by $(a_n)_{\sim}$ if $r = \lim_{n \to \infty}(a_n)$, for each sequence $a_n$
from the equivalence class $(a_n)_{\sim}$.
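We can now define $\mathbb{R}_{\text{Cauchy}}$ as the set of all equivalence classes of Cauchy
sequences in $\mathbb{Q}$: $\mathbb{R}_{\text{Cauchy}} := \{\{a_n\}^{\mathbb{Q}}_{n \in \mathbb{N}} \mid \{a_n\} \text{ is a Cauchy sequence in } \mathbb{Q}\}/\sim$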
3.5.4 Properties of the three definitions
Before these definitions for real numbers were given, we intuitively thought of
the reals as infinite sequences of (decimal) digits. In the rest of this section
we assume that by $\mathbb{R}$ we mean this set of reals, i.e. all infinite sequences
of decimal digits. We can now check whether the three new definitions
indeed are correct ways to identify real numbers:
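1) $\langle a_n, b_n \rangle^{\mathbb{Q}}$ is a chain of segments $\rightarrow (\exists! c : c \in \mathbb{R} : (\forall n : n \in \mathbb{N} : a_n \leq c \leq b_n))$
2) $C$ is a cut in $\mathbb{Q}$ $\rightarrow (\exists! c : c \in \mathbb{R} : c = \text{supremum}(C))$
3) $\{a_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence $\rightarrow (\exists! c : c \in \mathbb{R} : \lim_{n \to \infty}(a_n) = c)$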
Then we can check for every newly defined set X of reals that:
a) it contains a countable, densely ordered (i.e. $(\forall r_1, r_2 : r_1, r_2 \in D \wedge r_1 < r_2 : (\exists q : q \in D : r_1 < q < r_2))$)
set $D$ without endpoints, which is dense in $X$.
b) every Dedekind cut has a supremum in $X$.
Every set for which a) and b) hold is isomorphic with R. If a definition
satisfies a) and b) it possesses the properties we intuitively want the real
numbers to have. It can be proven that if these two properties hold we have
defined the reals successfully such that there is a total ordering on the reals,
the reals are densely ordered and the ordering is continuous.
3.6 Infinite sets
Our minds are infinite, and yet even in these circumstances of
finitude we are surrounded by possibilities that are infinite, and
the purpose of life is to grasp as much as we can out of that in-
finitude.
- A.N. Whitehead in [76]
The size of a finite set $V$, notation $|V|$, can be defined by the number of
elements that it has. But counting the elements does not end for infinite sets.
Cantor was concerned with the problem of measuring the sizes of infinite sets
(because he was investigating questions about singularities of Fourier series,
see [30, chapter 4]) and proposed a rather nice solution to this problem. He
observed that two finite sets have the same size if the elements of one set
can be paired with the elements of the other set; this method compares sets
without resorting to counting and can be extended to infinite sets.
This is the concept of an equivalence relation between sets (the relation is
also referred to as ‘are of the same cardinality’, ‘equipotent’ or ‘equipollent’
(see [30, page 229])).
Definition of set equivalence: A set $V$ is equivalent to a set $W$, notation
$V \sim W$ := there is a bijection $f : V \rightarrow W$
It is simple to check that $\sim$ has the properties of an equivalence relation,
i.e. it is reflexive, symmetric and transitive. But if we consider $\sim$ to be a
true relation, we need the concept of $\mathcal{V}$, the set of all sets: $\sim \; \subseteq \mathcal{V} \times \mathcal{V}$. But
the existence of $\mathcal{V}$ is paradoxical, see section 3.8.
This new method to measure the number of elements of a set is reflected
in the notion of cardinality of a set, and led to the surprising result that
there are many levels of infinity. Before we present a proof of this result,
using Cantor’s famous diagonalization method, we first introduce some more
definitions.
Postulate for Cardinal numbers:
With every set $V$ is associated a well-defined abstract entity $\overline{\overline{V}}$, called the
cardinal number of $V$, such that $V \sim W \leftrightarrow \overline{\overline{V}} = \overline{\overline{W}}$. We can think of $\overline{\overline{V}}$
as denoting the common property of set equivalence (as defined above) of all
sets in the equivalence class of $V$.
It proved difficult, however, to come to an exact definition of cardinality
from this postulate. Cantor regarded cardinals as special abstract entities
of a new kind. In 1884, the German mathematician Frege came up with his
own definition of cardinal numbers. He discussed it with the mathematician
Russell and they proposed the idea of defining $\overline{\overline{V}}$ as $V/\sim$, the equivalence
class of $V$ modulo $\sim$. The postulate for cardinal numbers then follows at
once. Frege also denoted finite cardinal numbers as natural numbers: $\overline{\overline{\emptyset}} = 0$,
$\overline{\overline{\{\emptyset\}}} = 1$, $\overline{\overline{\{\emptyset, \{\emptyset\}\}}} = 2$, $\ldots$. This Frege-Russell definition would become
standard, until - as we will later see in section 3.8 - it became known that this
definition could also lead to a paradox.
Cantor used the Hebrew letter aleph to name the different levels of infinity.
The cardinality of the set of natural numbers is by definition called
aleph-null or aleph-naught, notation $\aleph_0$. The ‘next levels’ of infinity are called
$\aleph_1, \aleph_2, \ldots$. Since it was unknown which of these is the cardinality of the
set of reals, Cantor denoted it by $c$. If we assume the continuum hypothesis
(see section 3.7), which says there is no level of infinity between the
cardinality of $\mathbb{N}$ and $\mathbb{R}$, the cardinality of the set of reals can also be
denoted by aleph-one, notation $\aleph_1$.
Property of cardinality: Given the cardinality $\overline{\overline{V}}$ of a set $V$, we have
• If $V$ is finite: $\overline{\overline{V}}$ = the number of elements of $V$
• If $V$ is infinite: $\overline{\overline{V}} = \aleph_i$, when there exists a bijection between $V$ and
the set $\mathcal{P}^i(\mathbb{N})$ (this assumes the generalized continuum hypothesis, see section 3.7)
Sometimes the cardinality of a set $V$ is also denoted by $|V|$, after the size
of a set $V$. A more rigorous treatment of cardinal numbers will be given in
section 3.8.1. This new concept enabled Cantor to define more concepts for
the analysis of infinite sets. It also inspired others to analyze the properties
of infinite sets.
No other question has ever moved so profoundly the spirit of man,
no other idea has so fruitfully stimulated his intellect; yet no other
concept stands in greater need of clarification than that of the in-
finite.
- D. Hilbert, quoted in [96]
In the rest of this section we will present some of the results of the research
of infinite sets.
Definition of finite: A set $V$ is finite := $(\exists n : n \in \mathbb{N} : V \sim \{x \in \mathbb{N} \mid x < n\})$
Definition of infinite: A set $V$ is infinite := $V$ is not finite
Definition of Dedekind infinite:
A set $V$ is Dedekind infinite := $(\exists W : W \subset V : V \sim W)$
Theorem: $V$ is Dedekind infinite $\leftrightarrow$ $V$ is infinite (from [17])
Proof: We use that $V$ is infinite iff $\mathbb{N} \leq_1 V$, and prove the two implications
of the theorem separately:
$V$ is Dedekind infinite $\rightarrow$ $V$ is infinite: $V$ is Dedekind infinite, i.e. there
exists a $W \subset V$ such that $V \sim W$, i.e. there exists a bijection $f : V \rightarrow W$.
Because $W$ is a proper subset of $V$ there also exists an $a \in V$ such that
$a \notin W$. Consider the function $g : \mathbb{N} \rightarrow V$, defined recursively by $g(0) = a$
and $g(k+1) = f(g(k))$. We now have to show that $g$ is an injection, i.e. for
all $i, j \in \mathbb{N}$: $i \neq j \rightarrow g(i) \neq g(j)$. We use induction on $i$:
$i = 0$: if $0 \neq j$ then $g(0) = a \notin W$ and $g(j) \in W$, so $g(0) \neq g(j)$.
$i = k + 1$: assume $k + 1 \neq j$, then we can prove $g(k+1) \neq g(j)$ by
induction on $j$:
$j = 0$: $g(0) = a \notin W$ and $g(k+1) \in W$, so $g(k+1) \neq g(0)$.
$j = l + 1$: we know $k + 1 \neq j = l + 1$, so $k \neq l$. By the induction
hypothesis $g(k) \neq g(l)$. Since $f$ is a bijection we also have that
$f(g(k)) \neq f(g(l))$, i.e. $g(k+1) \neq g(l+1)$ or $g(i) \neq g(j)$.
$V$ is Dedekind infinite $\leftarrow$ $V$ is infinite: $\mathbb{N} \leq_1 V$, so there exists an
injection $f : \mathbb{N} \rightarrow V$. We show that $W := V - \{f(0)\}$, clearly a proper subset
of $V$ ($W \subset V$), is equivalent to $V$ ($W \sim V$). The following function $g$ is a
bijection from $V$ to $W$: $g(f(i)) = f(i+1)$ for all $i \in \mathbb{N}$, and $g(x) = x$ if
$x \neq f(i)$ for all $i \in \mathbb{N}$.
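The shift function $g$ can be sketched in code. The example instance below ($V = \mathbb{Z}$ with the injection $f(i) = i$) and the helper names `make_g` and `f_index` are assumptions made for illustration only; the text does not fix a particular $V$. Only a finite window of $V$ is inspected.

```python
from typing import Callable, Optional

def make_g(f: Callable[[int], object],
           f_index: Callable[[object], Optional[int]]) -> Callable[[object], object]:
    """Build g with g(f(i)) = f(i + 1) and g(x) = x outside the range of f.

    f_index(x) returns the i with f(i) = x, or None if x is not in the range of f.
    """
    def g(x):
        i = f_index(x)
        return x if i is None else f(i + 1)
    return g

# Hypothetical instance: V = Z, with the injection f(i) = i of N into Z.
f = lambda i: i
f_index = lambda x: x if x >= 0 else None
g = make_g(f, f_index)

window = range(-5, 6)
images = [g(x) for x in window]   # negatives stay put, n >= 0 moves to n + 1
```

On the window, the images are pairwise different and $f(0) = 0$ is never hit, as expected of a bijection from $V$ to $V - \{f(0)\}$.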
Definition of countable:
A set V is countable, also called denumerable := V is finite or V ∼ N
Definition of uncountable: A set V is uncountable := V is not countable
Definition of denumeration: A denumeration of a set V is a bijection
f : N → V
Cantor then proved that N, Z and Q all have the same cardinality and
also called these sets countably infinite.
Theorem: $\mathbb{Q}$ is countable
Proof: We give a bijection from $\mathbb{N}$ to $\mathbb{Q}$, by listing all elements of $\mathbb{Q}$. Consider
a table with all fractions $\frac{a}{b}$ ($a \in \mathbb{N}$, $b \in \mathbb{N}^+$), with fraction $\frac{a}{b}$ on the $a$th
row and the $b$th column. If we list all elements row by row, we would not
obtain a correspondence between $\mathbb{N}$ and $\mathbb{Q}$, since the list would never get
to the second row. By listing the elements along the diagonals (south-west to
north-east), starting from the north-west corner, we obtain a correspondence
between $\mathbb{N}$ and $\mathbb{Q}$. Because $\frac{2}{2} = \frac{1}{1}$, etc., we skip an element when it
would cause a repetition. The table only contains the non-negative rationals;
the negative ones can be included by listing $-q$ directly after each $q > 0$.
We can also give a bijection from the non-negative rationals to an infinite
subset of $\mathbb{N}$, which is equivalent to $\mathbb{N}$: for each fraction $\frac{a}{b} \in \mathbb{Q}$ with $a$ and $b$
relatively prime, let $f(\langle a, b \rangle) := \frac{1}{2}(a+b)(a+b+1) + b$.
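The diagonal listing can be written as a generator. This sketch makes two simplifying assumptions that are not in the text: it lists only the positive rationals (rows $a \geq 1$), and it detects repetitions with a set of already listed values. The name `positive_rationals` is illustrative.

```python
from fractions import Fraction
from itertools import count, islice

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    seen = set()
    for s in count(2):                  # s = a + b is constant on a diagonal
        for a in range(s - 1, 0, -1):   # south-west (large a) to north-east
            q = Fraction(a, s - a)
            if q not in seen:           # skip 2/2 = 1/1 and similar repeats
                seen.add(q)
                yield q

first = list(islice(positive_rationals(), 10))
```

The position of a rational in this listing is its image under the denumeration; every fraction $\frac{a}{b}$ is reached on diagonal $a + b$.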
An example of an uncountable set is the set of real numbers, $\mathbb{R}$. Cantor
first proved in 1873 that $\mathbb{R}$ is uncountable; the proof below uses a technique
he published later, called diagonalization (also known as the diagonal
method), see [17, page 99].
Theorem: $\mathbb{R}$ is uncountable
Proof: Suppose there is a bijection $f$ between $\mathbb{N}$ and $\mathbb{R}$. We contradict this
by finding an $x$ in $\mathbb{R}$ that is not paired with anything in $\mathbb{N}$. We construct
this $x$ (counting the elements of $\mathbb{N}$ from 1) by taking the first fractional digit
of $x$ arbitrarily but never 0 or 9 or the first fractional digit of $f(1)$, the
second fractional digit of $x$ also different from 0, 9, and the second
fractional digit of $f(2)$, etc. Continuing this way
down the diagonal of the table of digits, we obtain all digits of $x$. $x$ is not
$f(n)$ for any $n$ because the $n$th fractional digit of $x$ differs from the $n$th
fractional digit of $f(n)$.
Note that we avoid the problem of certain numbers such as $2.3999\ldots$ and
$2.4000\ldots$ being equal by never selecting a 9 or a 0. Similarly, we can use
this diagonalization method to show that $\mathbb{N} \not\sim \{0,1\}^{\mathbb{N}}$.
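The diagonal construction can be tried on a finite piece of a listing. In this sketch the rows are made-up digit strings (not values from the text), rows are indexed from 0 as is usual in Python, and the rule "use 5 unless the diagonal digit is 5, then use 4" is just one admissible way to pick a digit that differs from the diagonal digit, 0 and 9.

```python
def diagonal_digits(rows):
    """Given rows of fractional digits, return digits of an x that differs
    from row n in position n and never uses the digits 0 or 9."""
    out = []
    for n, row in enumerate(rows):
        d = int(row[n])
        out.append('5' if d != 5 else '4')   # any digit other than 0, 9 and d works
    return ''.join(out)

# Hypothetical start of a listing (fractional digits only).
listing = ["14159", "71828", "41521", "57721", "69314"]
x = diagonal_digits(listing)
```

Here `x` is `"55455"`, which differs from row $n$ of the listing in digit $n$ for every row.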
Theorem: $(\forall V :: \mathcal{P}(V) \sim \{0,1\}^{V})$. (see [17, page 98])
Proof: We show that there is a bijection $K$ from $\mathcal{P}(V)$ to $\{0,1\}^{V}$. For
$W \subseteq V$, define $K(W)$ (also denoted $K_W$), the characteristic function of $W$,
as:
$K_W(v) = 1$ if $v \in W$
$K_W(v) = 0$ if $v \notin W$.
We now show that $K$ is a bijection from $\mathcal{P}(V)$ to $\{0,1\}^{V}$:
1) $K$ is injective: let $W_1, W_2 \subseteq V$ and suppose $W_1 \neq W_2$, that means there
is an element $w \in V$ such that $(w \in W_1 \wedge w \notin W_2) \vee (w \notin W_1 \wedge w \in W_2)$.
Then we have that $(K_{W_1}(w) = 1 \wedge K_{W_2}(w) = 0) \vee (K_{W_1}(w) = 0 \wedge K_{W_2}(w) = 1)$,
and thus $(\exists w : w \in V : K_{W_1}(w) \neq K_{W_2}(w))$, i.e. $K_{W_1} \neq K_{W_2}$.
2) $K$ is surjective: suppose $g \in \{0,1\}^{V}$. Let $W_g = \{v \in V \mid g(v) = 1\}$.
Then $(\forall v : v \in V : K_{W_g}(v) = 1 \leftrightarrow g(v) = 1)$, thus
$(\forall v : v \in V : K_{W_g}(v) = g(v))$, and $g = K_{W_g}$.
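For a finite set the bijection $K$ can be checked exhaustively. The following sketch uses a small made-up set `V` and represents a function in $\{0,1\}^V$ as a tuple of bits in the order of `V`; the names `K_inverse`, `subsets` and `functions` are illustrative.

```python
from itertools import combinations, product

V = ['a', 'b', 'c']  # a small hypothetical set, in a fixed order

def K(W):
    """Characteristic function of W as a tuple (K_W(v) for v in V)."""
    return tuple(1 if v in W else 0 for v in V)

def K_inverse(g):
    """Recover W_g = {v in V | g(v) = 1} from a 0/1 tuple g."""
    return frozenset(v for v, bit in zip(V, g) if bit == 1)

# All subsets of V and all functions from V to {0, 1}.
subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
functions = list(product([0, 1], repeat=len(V)))
```

`K_inverse` corresponds to the set $W_g$ used in the surjectivity step of the proof.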
We can define an ordering relation $\leq_1$ on the cardinalities of sets. We
say that $V \leq_1 W$ if there is an injection from $V$ to $W$.
Then $V <_1 W$ of course means that $V \leq_1 W$ holds but not $V \sim W$. This
relation on the set of cardinals only depends on the cardinals themselves and
not on the choice of the particular sets $V$ and $W$. The relation $\leq_1$ is reflexive
and transitive. Cantor also conjectured that $\leq_1$ is a partial order. This was
later proven independently by the two mathematicians F. Bernstein and E.
Schröder (see [59, page 39]).
We give two theorems that are based on the relation $<_1$:
Theorem: (without proof) $(\forall V : V \text{ is a non-empty set} : V <_1 \mathcal{P}(V))$
Theorem: $V$ is Dedekind infinite $\leftrightarrow \mathbb{N} \leq_1 V$
Proof: This theorem follows directly from the theorem on page 53 and the
definition of infinite.
Although we have seen that N is countable but R is not, we might still
think that there is some smaller interval of the reals that can be paired to
the naturals.
Theorem: $\mathbb{N} \not\sim [0,1]$
Proof of Poincaré (see [17]): We show there is no bijection $f : \mathbb{N} \rightarrow [0,1]$,
in particular $(\forall f : (f : \mathbb{N} \rightarrow [0,1]) : f \text{ is not surjective})$. We do this
by constructing for every function $f : \mathbb{N} \rightarrow [0,1]$ a $y \in [0,1]$ such that
$(\forall n : n \in \mathbb{N} : f(n) \neq y)$. We construct this $y$ by means of a chain of
segments (see paragraph 3.5.2).
Let $f : \mathbb{N} \rightarrow [0,1]$. Let $S_n$ be an infinite chain of segments such that
1) $(\forall i : i \in \mathbb{N} : f(i) \notin S_i)$
2) $(\forall i : i \in \mathbb{N} : S_{i+1} \subseteq S_i)$
3) $(\forall i : i \in \mathbb{N} : |S_i| = 3^{-i-1})$,
with $|S_i|$ being the length of segment $S_i$.
We can construct such a chain of segments, for if we divide a segment
$S_n = [p_n, q_n]$ in three equal parts (i.e. each part has length $3^{-n-2}$), at least
one of these parts does not contain $f(n+1)$. We take this part for $S_{n+1}$.
In the same way, $S_0$ is a third of $[0,1]$ that does not contain $f(0)$.
The constructed chain of segments determines (see paragraph 3.5.2) a real
number $y$, with $(\forall n : n \in \mathbb{N} : y \in S_n)$, and thus certainly $y \in [0,1]$. We also
have that $(\forall n : n \in \mathbb{N} : f(n) \notin S_n \wedge y \in S_n)$, so $(\forall n : n \in \mathbb{N} : y \neq f(n))$.
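The construction of the segments $S_i$ can be followed step by step for a finite prefix of $f$. The values in `f_values` below are made up for illustration, as are the function and variable names; the rule "take the first third that misses $f(i)$" is one possible way to make the choice that the proof leaves open.

```python
from fractions import Fraction

def avoiding_segments(values):
    """For values f(0), f(1), ... in [0, 1], return segments S_0, S_1, ...
    with f(i) not in S_i, S_{i+1} inside S_i and |S_i| = 3^-(i+1)."""
    lo, hi = Fraction(0), Fraction(1)
    segments = []
    for v in values:
        third = (hi - lo) / 3
        for j in range(3):
            a, b = lo + j * third, lo + (j + 1) * third
            if not (a <= v <= b):     # this closed third misses f(i)
                lo, hi = a, b
                break
        segments.append((lo, hi))
    return segments

# Hypothetical finite prefix of some f : N -> [0, 1].
f_values = [Fraction(1, 2), Fraction(1, 3), Fraction(0), Fraction(1, 9), Fraction(2, 27)]
segments = avoiding_segments(f_values)
```

A single point can lie in at most two of the three closed thirds, so the inner loop always finds a third to continue with.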
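The following theorem gives a way to prove the equivalence of sets:
Theorem of Cantor-Bernstein: $V \leq_1 W \wedge W \leq_1 V \rightarrow V \sim W$
Proof: Assume $V \leq_1 W$ and $W \leq_1 V$. Then there are injections $f : V \rightarrow W$
and $g : W \rightarrow V$. We know that $\text{Dom}(g) = W$ and that $g$ is injective, so $g$ is a
bijection from $W$ to $\text{Ran}(g)$, i.e. $W \sim \text{Ran}(g)$; it therefore suffices to prove
$\text{Ran}(g) \sim V$. Since $\text{Ran}(g) \subseteq V$ and $g \circ f$ is an injection
from $V$ to $\text{Ran}(g)$, we have $V \leq_1 \text{Ran}(g)$. And since for all $W$ and $V$,
$W \subseteq V \wedge V \leq_1 W \rightarrow V \sim W$ (see the lemma below), we have $\text{Ran}(g) \sim V$,
and hence $V \sim W$.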
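Lemma: $W \subseteq V \wedge V \leq_1 W \rightarrow V \sim W$
Proof: Suppose $W \subseteq V$ and $V \leq_1 W$. There is an injection $h : V \rightarrow W$. Let
$A_0 := V - W$, and $(\forall n : n \in \mathbb{N} : A_{n+1} := h(A_n))$. We now give the desired
bijection $k : V \rightarrow W$.
• $k(a) := a$ if $a \notin \bigcup_n A_n$
• $k(a) := h(a)$ if $a \in \bigcup_n A_n$
We show that $k$ is a bijection:
• $k$ is injective: Suppose $a \neq b$, then $k(a) \neq k(b)$ by using a case analysis
on the four cases $a \notin \bigcup_n A_n \wedge b \notin \bigcup_n A_n$, $a \notin \bigcup_n A_n \wedge b \in \bigcup_n A_n$,
$a \in \bigcup_n A_n \wedge b \notin \bigcup_n A_n$, $a \in \bigcup_n A_n \wedge b \in \bigcup_n A_n$. For all cases, it follows
that $k(a) \neq k(b)$ by the definition of $k$ and the injectivity of $h$; in the two
mixed cases, one of the images lies in $\bigcup_n A_n$ and the other does not.
• $k$ is surjective: Suppose $w \in W$, thus $w \notin A_0$. Again we use case
analysis:
– if $w \notin \bigcup_n A_n$ then $w = k(w)$.
– if $w \in \bigcup_n A_n$, assume $w \in A_p$. Since $w \notin A_0$, $p \geq 1$. Thus there is
a $w' \in A_{p-1}$ such that $w = h(w') = k(w')$.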
Example: We prove that $(a, b) \sim [0, 1]$ for all $a, b \in \mathbb{R}$ with $a < b$ by using the
theorem of Cantor-Bernstein. We first prove that $(0,1) \sim [0,1]$ and subsequently
that $(0,1) \sim (a,b)$. Then, by the transitivity of $\sim$ we can conclude that
$(a,b) \sim [0,1]$.
Proof of $(0,1) \sim [0,1]$: The identity function $\text{id}_{(0,1)} : (0,1) \rightarrow [0,1]$
is an injection from $(0,1)$ to $[0,1]$, so $(0,1) \leq_1 [0,1]$. The function
$f(x) = \frac{1}{3}(x+1)$ is an injection from $[0,1]$ to $(0,1)$, so $[0,1] \leq_1 (0,1)$.
By the theorem of Cantor-Bernstein we now know that $(0,1) \sim [0,1]$.
Proof of $(0,1) \sim (a,b)$: The function $f(x) = (b-a)x + a$ is a bijection
from $(0,1)$ to $(a,b)$.
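The bijection $k$ from the lemma can be made explicit for $V = [0,1]$, $W = (0,1)$ and the injection $h(x) = \frac{1}{3}(x+1)$ used above. This is a sketch for rational inputs only (exact `Fraction` arithmetic), and the function names are illustrative. Here $A_0 = V - W = \{0, 1\}$, and membership in $\bigcup_n A_n$ is decided by applying $h^{-1}(y) = 3y - 1$ repeatedly.

```python
from fractions import Fraction

def h(x):
    """The injection h(x) = (x + 1) / 3 from [0, 1] into (0, 1)."""
    return (x + 1) / 3

def in_union_A(x):
    """Is x in some A_n, where A_0 = {0, 1} and A_{n+1} = h(A_n)?

    Pull x back along h^-1(y) = 3y - 1 until it leaves (0, 1). The fixed
    point 1/2 of h never reaches 0 or 1, so the loop stops there as well.
    """
    while 0 < x < 1 and x != Fraction(1, 2):
        x = 3 * x - 1
    return x == 0 or x == 1

def k(x):
    """The bijection from [0, 1] onto (0, 1) built as in the lemma."""
    return h(x) if in_union_A(x) else x
```

For example, $k(0) = \frac{1}{3}$, $k(\frac{1}{3}) = \frac{4}{9}$ and $k(\frac{1}{4}) = \frac{1}{4}$: only the points on the two orbits of $0$ and $1$ under $h$ are moved. The loop terminates for every rational in $[0,1]$, because the distance to $\frac{1}{2}$ triples in each step.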
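Using the Cantor-Bernstein theorem we can also prove that
$(a,b) \sim (0,1) \sim \mathbb{R} \sim \mathbb{R}^n \sim \{0,1\}^{\mathbb{N}} \sim \mathcal{P}(\mathbb{N}) \sim \mathbb{N}^{\mathbb{N}}$, for all $n \in \mathbb{N}$, $n \geq 1$.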
Theorem: $V$ is infinite $\rightarrow \mathbb{N} \leq_1 V$
Proof: $V$ is infinite and thus not empty. We take one element $x_0 \in V$. Next,
we take an element $x_1 \in V - \{x_0\}$. We can repeat this infinitely (i.e. for all
$n$ we can select an $x_{n+1} \in V - \{x_0, \ldots, x_n\}$), if we assume that it is possible
to always select an element from any non-empty set (see the axiom of choice
below). In this way we get a countable subset of $V$, namely $\{x_0, x_1, x_2, \ldots\}$.
The only assumption we have made here is the so-called axiom of choice.
Axiom of choice (AC): Given any set W of non-empty sets V , there is a
function f which assigns to each member V of W an element f(V ) of V .
This definition was proposed first in an article by Zermelo in 1908 (trans-
lated in [93, pages 199-215]). Such a function f is called a choice function
for W. The axiom can be restricted by limiting to those families W of a par-
ticular cardinality. Since for any finite W the axiom is provable, the weakest
non-trivial case occurs when W is denumerable (see page 54 for the definition
of denumerable). This case is known as the Denumerable axiom. Zermelo
regarded the AC as already implicitly used by mathematicians. In response
some people asked when this assumption developed from mathematics, when
it is implicitly used, and when exactly it can or cannot be avoided. Zermelo
attempted to prove AC, but the controversy over his proof of 1904 (see [63,
page 310]) led Zermelo to axiomatize set theory (see section 5.3.1). We can
add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see
section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom
of Choice). For more details on the role of the AC, we refer to section 5.3
and [63]. See http://zax.mine.nu/stage and click on ‘links’ for some quotes
about the AC.
An instance of the following theorem (without proof) of the British
mathematician F.P. Ramsey is often used in graph theory. The notation $[V]^n$ in
this theorem is defined as the set of all subsets of $V$ with $n$ elements, i.e.
$[V]^n := \{X \subseteq V \mid X \text{ has } n \text{ elements}\}$.
Theorem of Ramsey: If $V$ is a denumerable set and $f : [V]^n \rightarrow \{0, 1, \ldots, m-1\}$
with $n, m \in \mathbb{N}$ and $n, m \geq 1$ then $(\exists W : W \subseteq V : W \text{ is denumerable and }
f \text{ is constant on } [W]^n)$.
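Theorem: $\mathbb{R}^2 \sim \mathbb{R} \sim (0,1)$
Proof: We can say that $\mathbb{R} \sim (0,1)$ if there is a bijection between $(0,1)$
and $\mathbb{R}$. Indeed, there exists a bijection $f : (0,1) \rightarrow \mathbb{R}$, defined as
$f(x) = \tan(\frac{\pi}{2}(2x-1))$. Thus: $\mathbb{R} \sim (0,1)$. Because of this it suffices to
consider an element of $(0,1)^2$, that is two real numbers between 0 and 1. We
can map these numbers to an element $r \in (0,1)$ by alternately taking the
next digit of each of the two numbers.
For example, we map $(0.76584\ldots, 0.13275\ldots)$ uniquely to $0.71635\ldots$.
This map is an injection (if we never use expansions that end in infinitely
many 9s), so with the theorem of Cantor-Bernstein we get $(0,1)^2 \sim (0,1)$.
Thus: $\mathbb{R}^2 \sim \mathbb{R}$. Since $\sim$ is transitive, we know that $\mathbb{R}^2 \sim \mathbb{R} \sim (0,1)$.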
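The interleaving of digits is easy to try out on finite digit strings. The sketch below works on a finite prefix of the fractional digits only, and the names `interleave` and `split` are illustrative.

```python
def interleave(x_digits: str, y_digits: str) -> str:
    """Alternate the fractional digits of two numbers in (0, 1)."""
    return ''.join(a + b for a, b in zip(x_digits, y_digits))

def split(z_digits: str):
    """Recover both digit strings from an interleaved one."""
    return z_digits[0::2], z_digits[1::2]

z = interleave("76584", "13275")   # digits of 0.76584... and 0.13275...
```

Here `z` is `"7163528745"`, matching the $0.71635\ldots$ of the example, and `split` shows why the map is injective: both original digit strings can be read back from the result.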
Theorem: $\mathcal{P}(\mathbb{N}) \sim (0,1)$
Proof: First we show that $\mathcal{P}(\mathbb{N}) \leq_1 \mathbb{R}$. Suppose $V \in \mathcal{P}(\mathbb{N})$, map $V$ to
the decimal $0.a_0a_1a_2\ldots$, with $a_i = 1$ if $i \in V$ and $a_i = 0$ otherwise. This
injection proves that $\mathcal{P}(\mathbb{N}) \leq_1 \mathbb{R}$. Now we give an injection from $(0,1)$ to
$\mathcal{P}(\mathbb{N})$: assume $r \in (0,1)$, i.e. $r = 0.a_1a_2\ldots$ with $0 \leq a_i \leq 9$. We want
to identify numbers such as $0.3999\ldots$ and $0.4000\ldots$. Therefore we assume
there is not an $i \in \mathbb{N}$ such that for all $n > i$, $n \in \mathbb{N}$, $a_n = 9$. Then we
map $r$ to the set $\{1a_1, 1a_1a_2, \ldots\}$ of natural numbers. Clearly, this
mapping is well-defined. For example, $r = 0.17803\ldots$ is mapped to the set
$\{11, 117, 1178, 11780, 117803, \ldots\}$. Thus $(0,1) \leq_1 \mathcal{P}(\mathbb{N})$. Since $\mathbb{R} \sim (0,1)$,
the theorem of Cantor-Bernstein now gives $\mathcal{P}(\mathbb{N}) \sim (0,1)$.
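The mapping from a decimal expansion to a set of natural numbers can be sketched for a finite prefix of the digits. The names `prefix_set` and `digits_from` are illustrative; the leading 1 in every element keeps leading zeros of the expansion from being lost.

```python
def prefix_set(digits: str) -> set:
    """Map the digits a_1 a_2 ... a_n of r = 0.a_1a_2... to {1a_1, 1a_1a_2, ...}."""
    return {int('1' + digits[:i]) for i in range(1, len(digits) + 1)}

def digits_from(prefixes: set) -> str:
    """Recover the digits from the largest element, dropping the leading 1."""
    return str(max(prefixes))[1:]

s = prefix_set("17803")
```

For the digits of $0.17803$ this gives $\{11, 117, 1178, 11780, 117803\}$, and `digits_from` recovers the digits again, which illustrates why the mapping is injective.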
Corollary: $\mathcal{P}(\mathbb{N}) \sim \mathbb{R}$
Proof: This directly follows from $\mathcal{P}(\mathbb{N}) \sim (0,1)$ and $(0,1) \sim \mathbb{R}$, and the
transitivity of $\sim$.
3.7 The Continuum Hypothesis
We still think that the study of the size of the continuum should
be our guiding light for further research in set theory.
- Judah Haim in [33]
After showing that the real numbers cannot be put into one-to-one
correspondence with the natural numbers (see section 3.6), Cantor hypothesized
in 1877 that each infinite subset of $\mathbb{R}$ is either denumerable or equivalent
to the continuum. This hypothesis was first published in 1878 in [13] and
later became known as:
The Continuum Hypothesis (CH): $(\mathbb{N} \leq_1 A \leq_1 \mathbb{R}) \rightarrow (A \sim \mathbb{N} \vee A \sim \mathbb{R})$
This hypothesis (as given in [17, page 128]) is also known in many other
forms, of which we will mention and explain the most important. We can
immediately see that the following version of CH is equivalent to the given
definition: ‘any set of real numbers is either finite, countably infinite or has
the same cardinality as the entire set of reals’. This means that ‘the num-
ber of real numbers is the next level of infinity above the number of natural
numbers’ (see also [30, page 197]).
As we saw in section 3.6, Cantor defined the cardinality of the natural
numbers to be $\aleph_0$, and the next levels of infinity to be $\aleph_1, \aleph_2, \aleph_3$, etc. He also
named the cardinality of the reals $c$, for continuum. Cantor’s original
formulation of CH was: (B) $c = \aleph_1$. Since Cantor also proved that $\mathcal{P}(\mathbb{N}) \sim \mathbb{R}$
(see page 59), we can also state CH as: (C) $\mathcal{P}(\mathbb{N}) \sim \aleph_1$. The cardinality of
the power set of any set $X$ is equal to the cardinality of $\{0,1\}^X$ (see page
55), often denoted as $2^X$, so another formulation⁷ of CH is: (D) $2^{\aleph_0} = \aleph_1$
(see [31]). These formulations, although (B) leads us to think about sizes
of reals, (C) about subsets and (D) about cardinal exponentiations, are all
equivalent in ZFC. We will not go into details of less precise or more
dependent formulations such as ‘what is the cardinality of the set of points on
a geometrical line?’.
7
Actually in this formulation we have identified the cardinalities $\aleph_0$ and $\aleph_1$ with the
sets that have these cardinalities.
Some of the theory that is needed in the remaining part of this section, for
the generalized continuum hypothesis, will be introduced in later chapters.
If you are not familiar with the notations that are used, you might want to
skip the remaining part of this section and get back to it later.
In 1908 the German mathematician Felix Hausdorff proposed the following
generalization of CH (that is also called the aleph-hypothesis):
The Generalized Continuum Hypothesis (GCH):
$(\forall r : r \text{ is an ordinal} : 2^{\aleph_r} = \aleph_{r+1})$
For a definition and the notation of ordinal numbers, we refer to section
3.8.2. Obviously, (see section 5.3) we have that $ZF + GCH \vdash CH$. Note
that $ZF + GCH \vdash AC$ (so we don’t need ZFC once we have GCH).
Cantor and many other great mathematicians spent years trying to prove
CH or its negation (Cantor tried to prove his hypothesis by using a
decomposition theorem; for details see [31, page 117]), but did not succeed. This
problem was so important that Hilbert (see section 6.2) put it first in his list
of 23 problems.
In 1938 significant progress was made when the mathematician Gödel
proved (a result he later discussed in his article ‘What is Cantor’s continuum
problem?’) that CH is consistent with ZFC (see section 5.3.2) by constructing
a model of ZFC + CH. Since in the same period Gödel had proved his famous
incompleteness theorem (see chapter 8), people suspected that CH was one of
the statements (of ZFC) that can neither be proved nor disproved. It took
until 1963 before this undecidability of CH in ZFC was
proved by Paul Cohen in [15].
To do that he used a new technique called forcing. Forcing is a combinatorial
technique for proving statements consistent with the axioms of set
theory. Cohen used it in order to prove that the negation of AC and the
negation of CH are consistent with the axioms of set theory (AC and CH
were already known to be consistent). Essentially it consists of a method
of performing the following algorithm: start with a model of set theory $\mathcal{M}$.
Construct an object $X$ not in $\mathcal{M}$ with certain properties. Consider the smallest
model $\mathcal{M}'$ with $X$ an element of $\mathcal{M}'$ and $\mathcal{M}$ a subset of $\mathcal{M}'$ (this is done
in a way such that the construction of $\mathcal{M}'$ is implicit in the construction of
$X$). For more details on forcing, see [51] and [81].
Thus Cohen constructed a model of ZFC + $\neg$CH and this, along with
Gödel’s model of ZFC + CH, showed that CH is undecidable in ZFC. So
this means that either CH or $\neg$CH could be added as an axiom of ZFC.
But since neither of these axioms seems axiomatic or ‘self-evident’ they have,
unlike AC, not been adopted as axioms of set theory. Mathematicians either
accept this incompleteness in set theory or try to find more intuitive axioms
that will help decide it. In other words, the question remains what intuitive
axiom of set theory we need to make it more complete, and whether, with
some axiom system for set theory, the continuum hypothesis is true.
3.8 Cardinal and Ordinal numbers and Paradoxes
Every transfinite consistent multiplicity, that is, every transfinite
set, must have a definite aleph as its cardinal number.
- Georg Cantor
3.8.1 Cardinal numbers and Cantor’s Paradox
In section 3.6 we already encountered cardinal numbers and the notion of
set equivalence. After defining the equivalence of sets (see page 51), Cantor
realized that all sets that are equivalent to a given set $V$ have a common
property. He identified this property with the cardinal number $\overline{\overline{V}}$ of a set $V$,
a property that abstracts from the nature and order of the elements of a set.
Example: Consider the following sets: $A = \{1,2,3\}$, $B = \{3,2,1\}$,
$C = \{\{4\}, 7, \{a,b\}\}$, $D = \{1, \{4\}\}$. We can say that $A \sim B \sim C$, or
(equivalently) $\overline{\overline{A}} = \overline{\overline{B}} = \overline{\overline{C}}$. We also have $A \not\sim D$, or $\overline{\overline{A}} \neq \overline{\overline{D}}$. Note that in this
example the equality ‘=’ between cardinal numbers is a new type of equality
that is defined as $\overline{\overline{A}} = \overline{\overline{B}} \leftrightarrow A \sim B$.
We can see that cardinality abstracts from the order and nature of the
elements, and for finite sets the cardinal number can be identified with the
ordinary ‘number of elements’. Therefore we identify the cardinal number of
a finite set of $n$ elements with the natural number $n$. We denote the smallest
infinite (or transfinite) cardinal number by $\aleph_0$. As we have already seen
on page 52, this is the cardinal number of $\mathbb{N}$ or any denumerable infinite set.
Cantor defined the ‘next’ levels of infinity by $\aleph_1, \aleph_2, \ldots$.
The next question was how to pass from the abstract notion of cardinal
numbers to real cardinal numbers, i.e. one wanted to regard cardinal numbers
as objects of the mathematical system. It turned out to be quite a problem
to define the cardinal $\overline{\overline{V}}$ of a set $V$ as an object of set theory. In naive set
theory, as well as in Quine’s ‘New Foundations’ (see section 7.3), the
definition of the cardinal $\overline{\overline{V}}$ of $V$ poses no problem: $\overline{\overline{V}}$ can be defined as the set
of all sets equivalent to $V$. But this definition of cardinal numbers (first
given by Frege, see section 3.6) can lead to a paradox that
was first found by Cantor.
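Cantor’s paradox: The power set of the set of all sets is itself a set of sets,
and thus a subset of the set of all sets. Since every set is strictly smaller
than its power set, the cardinality of the set of all sets must be bigger than itself.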
In axiomatic set theory however (e.g. in ZF, see section 5.3), without the
unrestricted comprehension axiom, there is no set which contains all sets
equivalent to V . With this paradox the need arose to find a new definition of
cardinals in a context without the unrestricted comprehension axiom, such
that traditional paradoxes could no longer be derived.
Several new definitions of cardinal numbers were then proposed, based
on ordinal numbers (for which we refer to the next section⁸). The following
definition that comes from the mathematician von Neumann is now the
standard definition for cardinal numbers.
Definition of Cardinal number (or initial number):
A cardinal number α := an ordinal number α with property (∀γ :: α ∼ γ →
α ≤ γ)
For each set V we can prove (see [17, section 2.10]) that there exists
exactly one cardinal number α satisfying V ∼ α (the proof uses AC). We call
this unique α the cardinality or cardinal number of the set V ; it is also
denoted by $\overline{\overline{V}}$.
In other words, with the axiom of choice we can develop the theory of
ordinals in the von Neumann way and define $\overline{\overline{V}}$ to be the least ordinal α
equivalent to V . The existence of such an α is guaranteed by the well-ordering
theorem. If we have the axiom of foundation among our axioms, then even if the
axiom of choice is absent we can define $\overline{\overline{V}}$ as the set of all sets W of least
rank among those equivalent to V (see [1]). In the absence of both the axiom
of choice and the axiom of foundation the operation $\overline{\overline{V}}$ is undefinable (see [1]).
For more information on the definition and calculus of cardinal numbers,
we refer to [59, chapter 6], [25] and [34].
⁸ The rest of this section depends on concepts that are defined in later chapters.
3.8.2 Ordinal numbers and Burali-Forti’s Paradox
We already introduced Cantor’s concept of cardinal number in section 3.6,
and saw in the previous section that it abstracts from both the order and the
nature of the elements of a set. Cantor also defined a property of sets, the ordinal
number, that abstracts only from the nature of the elements of a set, but
retains the order in which they are given.
Here we consider sets with a total ordering (see page 25). Recall that in
a well-ordered set, in addition, each non-empty subset has a first member
in the given ordering (see also section 3.2). In the case of ordered sets, the
concept of equivalence is replaced by the sharper concept of similarity.
We consider two ordered sets V and W similar, notation V ≃ W, if there is
a bijection between V and W that preserves all order relations. Note that we
have already seen this relation as the concept of isomorphism (‘is isomorphic
to’, see page 31), and note that ≃ is an equivalence relation. Instead
of saying two sets are similar, we can also say they are of the same order type.
Definition of an Order Type: An equivalence class under the ≃ (isomorphism)
relation
The equivalence class to which an ordered set V belongs is called the
order type of V . All well-ordered sets that are similar to a given well-ordered
set V have a common property. Cantor identified this property with the ordinal
number $\overline{V}$ of a well-ordered set V , a property that abstracts only from the
nature of the elements of a set. And just as for cardinals (see section 3.8.1)
the question was posed how to define ordinal numbers as part of set theory.
In 1883 Cantor defined in [13] an ordinal number as the order type of a
well-ordered set.
Definition of Ordinal Number (Cantor): A well-ordered set V has or-
dinal number o := o is the order type of V
If a set is finite and simply ordered, it is well-ordered and it has an ordinal
number. The ordinal number of that set is the same, regardless of the order
of the elements. For each finite and simply ordered set, we can therefore
identify the (finite) cardinal number with the ordinal number.
Example: 0 = ∅; 1 = {0}; 2 = {0, 1}; 3 = {0, 1, 2} are ordinal numbers.
The smallest infinite ordinal number is called ω. This is the ordinal number
of the sequence {0, 1, 2, 3, . . .}, which can be seen as N or as the sequence
of finite cardinal numbers in their ‘natural’ order. We introduce some other
transfinite ordinals by example (from [10, page 66]).
Example:
If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union
of all the sets in {0, 1, 2, . . .}. This is another ordinal, called ω, and it is the
first non-finite ordinal. It has a successor: ω ∪ {ω}, called ω + 1. More
ordinals can be obtained by continuing this succession, and taking the
union of all these ordinals yields an ordinal we call ω ∗ 2, etc. The natural
numbers in reverse order are denoted *ω.
V₁ = {2, 3, 4, . . . , 1} ; V₂ = {3, 4, 5, . . . , 1, 2}
V₃ = {1, 3, 5, . . . , 2, 4, 6, . . .} ; V₄ = {. . . , 3, 2, 1}
V₅ = {1, 3, 5, . . . , 6, 4, 2} ; V₆ = {1, 11, 21, . . . , 2, 12, 22, . . .}
$\overline{N}$ = ω ; $\overline{V_1}$ = ω + 1 ; $\overline{V_2}$ = ω + 2 ; $\overline{V_3}$ = ω + ω = ω ∗ 2
$\overline{V_4}$ = *ω ; $\overline{V_5}$ = ω + *ω ; $\overline{V_6}$ = ω ∗ 10
For ordinal numbers n of N and m of M we say that n < m if the well-ordered
set N is similar to a proper initial segment of M.
Unfortunately, a situation similar to that for cardinal numbers was found
for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the
mathematician Peano, found that this definition can give rise to a
paradox (see [18, page 259]).
The Burali-Forti Paradox: The set of all ordinal numbers, taken in their
natural order, forms a well-ordered series, and therefore also has an ordinal
number Ω. But the ordinal number of any initial segment of the ordinals
exceeds every ordinal in that segment, and therefore Ω exceeds every ordinal
number whatsoever, including Ω itself.
This led to new proposals for definitions of ordinal numbers. Below we
present another definition, given by John von Neumann in [61].
In 1923 he pointed out that among all well-ordered sets having a given Cantorian
ordinal as their order type, there is a particular one with some very special
properties. Von Neumann defined this particular set as the ordinal of that
order type.
Definition of ordinal number: A set α is an ordinal number :=
1) α is a well-ordered set with the binary relation ∈ as its ordering
2) (∀β :: β ∈ α → β ⊂ α)
With this definition of ordinal numbers, the Burali-Forti paradox can
no longer be derived: the collection of all ordinals is well-ordered by ∈ and
2) also holds for it (a proof is given in [59, section 4.2]), so if it were a set it
would itself be an ordinal and hence a member of itself, which the well-ordering
by ∈ excludes. The collection of all ordinals is therefore not a set. According
to this definition, the empty set is an ordinal number. This ordinal number is also
denoted by 0. Similarly we denote the ordinal number {0} by 1, {0, 1}
by 2, {0, 1, 2} by 3, etc. In other words: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, . . .
These ordinal numbers, which are finite sets, are called finite ordinal numbers.
The finite ordinal numbers are identified with the natural numbers.
The set ω = {0, 1, 2, . . .} of all natural numbers is also an ordinal number.
An ordinal number that is an infinite set, like ω, is called a transfinite ordinal
number. For every well-ordered set V , there exists exactly one ordinal
number isomorphic to V .
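The finite von Neumann ordinals can be built and inspected directly. The following Python sketch is only an illustration for hereditarily finite sets. The function names are our own assumptions, and the test `is_ordinal` uses the common formulation 'transitive and linearly ordered by ∈', which for finite sets amounts to being well-ordered by ∈.

```python
def successor(a):
    """The successor of the ordinal a is a ∪ {a}."""
    return a | frozenset({a})


def ordinal(n):
    """Build the finite ordinal n = {0, 1, ..., n-1}, starting from 0 = ∅."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


def is_transitive(a):
    """Every element of a is also a subset of a (all elements must be frozensets)."""
    return all(b <= a for b in a)


def is_ordinal(a):
    """Finite case only: a is transitive and ∈ is a strict linear order on a."""
    members = list(a)
    linear = all(x == y or x in y or y in x for x in members for y in members)
    order_transitive = all(
        x in z
        for x in members for y in members for z in members
        if x in y and y in z
    )
    return is_transitive(a) and linear and order_transitive
```

For example, `ordinal(3)` equals the set {0, 1, 2}, while the set {1} fails the test because its element 1 = {0} is not a subset of {1}.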
Definition of ordinal number of a well-ordered set V :
The ordinal number of a well-ordered set V := the ordinal number isomorphic
to V
A detailed treatment of ordinal calculus that is based on this definition
of ordinal numbers is outside the scope of this report. In the remainder
of this section we will only define the most common concepts.
As we saw in 3.2 we also write α ∈ β (we denote ordinals by lower-case
Greek letters) as α < β, which defines an ordering on the ordinal numbers.
The least ordinal number is of course 0, and the ordering of the finite ordi-
nal numbers coincides with the usual ordering of the natural numbers. The
least transfinite ordinal is ω (see also 5.3.2). The ordering ≤, defined by
α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the
ordinal numbers. Therefore we can apply transfinite induction (see page 37)
on ordinal numbers.
For any ordinal number α, the set α′ = {γ | γ ≤ α} (an initial segment
of the ordinal numbers) also is an ordinal number, the successor of α, and α is
the unique predecessor of α′. A transfinite ordinal without a predecessor is
called a limit ordinal number, and all the other ordinal numbers are called
isolated ordinal numbers. The first limit ordinal number is ω. For any set V
of ordinal numbers, {γ | (∃η : η ∈ V : γ < η)} is an ordinal number, the
supremum of V .
A full treatment of the theory of ordinal numbers is omitted here. Ri-
gorous study has produced a complete calculus of ordinal numbers and pro-
duced significant results. We only mention here the so-called well-ordering
theorem, which Cantor had accepted as true (see [18, page 257]) but that
was first proved rigorously by Zermelo in 1904.
Well-Ordering Theorem: Every set can be well-ordered.
This means that ordinals give us a way of ‘counting’ any set, even if it is
not finite. The particular significance of the well-ordering theorem lies in the
possibility of applying the principle of mathematical induction (which
is well known for denumerable sets, see section 3.4.3) to any arbitrary
well-ordered set. Ordinal numbers form the basis of transfinite induction, which
is a generalization of the principle of induction.
We now have the following properties (given without proof):
• Two finite and ordered sets have the same order type if and only if they
have the same cardinal number
• Cantor’s theorem : the cardinality of any set is lower than the cardi-
nality of the set of all its subsets (i.e. there is no highest aleph)
• If two sets have the same ordinal number, they have the same cardinal
number, but not necessarily vice versa
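Cantor's theorem can be checked exhaustively for a small finite set. The Python sketch below is an illustration under our own naming assumptions, not a proof. It enumerates every function f from a set S to its set of subsets and confirms that the 'diagonal' set {x ∈ S | x ∉ f(x)} is never a value of f, so no such f is surjective.

```python
from itertools import combinations, product


def power_set(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def diagonal_set(s, f):
    """The set {x in s | x not in f(x)} from Cantor's diagonal argument."""
    return frozenset(x for x in s if x not in f[x])


def no_surjection_onto_power_set(s):
    """Check every f: s -> P(s); the diagonal set must never be in the image."""
    items = list(s)
    subsets = power_set(s)
    for images in product(subsets, repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(s, f) in f.values():
            return False
    return True
```

For a set of 3 elements there are 8 subsets and 8³ = 512 candidate functions, all of which miss their own diagonal set.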
For more information and theory on cardinal numbers, ordinal calculus
and set theory we refer to two classical books on set-theory: [25] and [34].
The former gives a good introduction to set theory and presupposes little
mathematical knowledge; the latter is more suitable for readers with experience
in set theory.
Chapter 4
Peano and Frege
4.1 Peano’s arithmetic
Questions that pertain to the foundations of mathematics, al-
though treated by many in recent times, still lack a satisfactory
solution. The difficulty has its main source in the ambiguity of
language.
- Peano in the opening of the paper ‘Arithmetices principia, nova
methodo exposita’, in which he introduces axioms for the integers
The Italian mathematician Giuseppe Peano (1858-1932) devoted his career
successively to the infinitesimal calculus, the foundations of mathematics
and linguistic studies. After his work on calculus (see Peano’s first
publication [65]) and geometry (see [66] [67]), Peano took a particular interest
in the field of number theory, also known as arithmetic. Like Dedekind
(see quote on page 46), Peano became aware of the lack of rigour in mathematics
through his experience in teaching infinitesimal calculus.
What is number theory? The field of mathematics consisting of the study
of the properties of the natural numbers
From then on, Peano strove for rigour, for an abstract mathematics. He came
to the conclusion that mathematics must be constructed, independently of
intuition or common sense, in a way that absolutely guarantees the validity
of its theorems.
In order to satisfy this requirement he devoted himself to the transformation
of mathematics into a self-contained system, and rewrote mathematics in
symbolic form as an axiomatic system (see section 6.1), based exclusively on
postulated primitive notions and primitive propositions. To discard intuition,
he first renounced ordinary language (because it is often insufficient and
imprecise) and sought a new mathematical symbolism, consisting entirely
of neutral symbols. Second, he formalized the logic of the mathematical
argument, replacing intuitive inference by the application of a limited number of
stated logical rules.
So Peano formalized both the language of mathematics and the logic
of the mathematical argument, and to that end first developed parts of symbolic
logic and formalized propositional and predicate calculus. This
development was rudimentary and would later be worked out in full detail
by the mathematicians Russell and Whitehead in ‘Principia Mathematica’
(1910, see section 7.1). He introduced letters to denote propositions and
propositional functions (Peano’s logic notation) and the symbol ∈ for the
membership relation of a set.
The work of formalization of mathematics was published in the journal
‘Rivista di Matematica’ (a journal he had founded himself)
and in ‘Formulario Mathematico’, a series of 5 books that is also known as
‘Formulaire de Mathématique’¹. In 1899 he axiomatized the arithmetic of
cardinal numbers, to be published in the third volume of ‘Formulario
Mathematico’ in 1901. Peano based the foundations of arithmetic on 5 axioms
(see [31, page 227]), which are formulated with the help of three (undefined)
terms, acquaintance with which is assumed:
a) N (the set of natural numbers)
b) 0 (the particular natural number zero)
c) a+ (the immediate successor of the natural number a)
¹ The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when
the final version appeared in 1908, because Peano at that time consistently used
Interlingua, his simplified dialect of Latin, for all his mathematical publications.
Definition of the Peano axioms for the natural numbers:
1) 0 ∈ N
(zero is a natural number)
2) a ∈ N → a+ ∈ N
(the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊂ S
(if a set S contains zero and if it contains any number x it also contains
the immediate successor x+ of that number, then S includes the whole
of N)
4) a, b ∈ N ∧ a+ = b+ → a = b
(no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0
(zero is not the immediate successor of a number)
Axiom 3 formalizes the principle known as mathematical
induction. The five axioms of Peano can be derived in ZF (see section 5.3).
For more information on the Peano axioms, I refer
to [31, chapter 5], [49, page 146-147] and [64, appendix A].
After defining the natural numbers, Peano used a recursive definition to
define the arithmetical sum, product and other operators, and he derived
much of the elementary number theory.
Example: Peano defined the sum a + b by recursion with respect to b:
a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product
a ∗ b: a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
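The recursive definitions above translate almost literally into code. The following Python sketch is a minimal illustration. The representation of numbers as nested tuples and the names `ZERO`, `succ`, `add`, `mul`, `from_int` and `to_int` are our own assumptions, not Peano's notation.

```python
ZERO = ()  # the primitive term 0


def succ(a):
    """The immediate successor a+ of a, represented by wrapping a in a tuple."""
    return (a,)


def add(a, b):
    """Sum by recursion on b: a + 0 = a, a + (b+) = (a + b)+."""
    if b == ZERO:
        return a
    return succ(add(a, b[0]))  # b[0] is the number whose successor is b


def mul(a, b):
    """Product by recursion on b: a * 0 = 0, a * (b+) = (a * b) + a."""
    if b == ZERO:
        return ZERO
    return add(mul(a, b[0]), a)


def from_int(n):
    """Build the Peano numeral for a non-negative Python integer."""
    a = ZERO
    for _ in range(n):
        a = succ(a)
    return a


def to_int(a):
    """Count how many times the successor was applied to 0."""
    n = 0
    while a != ZERO:
        a = a[0]
        n += 1
    return n
```

In this representation axioms 4 and 5 hold by construction: `succ(a)` is never equal to `ZERO`, and two tuples `(a,)` and `(b,)` are equal only if a and b are.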
Peano then showed how rationals and reals can be formally obtained from
naturals, and further considered elementary analysis and geometry. In later
years, Peano turned away from the foundations of mathematics and devoted
most of his time to his new international auxiliary language Interlingua. He
invented this language (see [49, page 148-150]) in an attempt to reduce the
grammatical structure of languages and create a universal language. His
mathematical work was to have a profound influence on mathematical
thought, but his language Interlingua received little response.
4.2 Frege’s work
As I think about acts of integrity and grace, I realize that there
is nothing in my knowledge to compare with Frege’s dedication to
truth. His entire life was on the verge of completion, much of his
work had been ignored to the benefit of men infinitely less capa-
ble, his second volume was about to be published, and upon finding
that his fundamental assumption was in error, he responded with
intellectual pleasure clearly submerging any feelings of disappoint-
ment. It was almost superhuman and a telling indication of that
of which men are capable if their dedication is to creative work and
knowledge instead of cruder efforts to dominate and be known.
- B. Russell about Frege, in [93, page 127]
The German mathematician and philosopher Gottlob Frege (1848-1925)
was one of the founders of modern symbolic logic, putting forward the (logistic)
view that mathematics is reducible to logic. He wrote many
important papers on philosophy. Frege once said ‘every good mathematician
is at least half a philosopher, and every good philosopher is at least half a
mathematician’. Famous is his ontological argument for the existence of god,
but we will not discuss his philosophical writings here. We will mention his
three most important works on the foundations of mathematics: Begriffsschrift,
Grundlagen der Arithmetik and Grundgesetze der Arithmetik.
Begriffsschrift
Just like Peano, the German mathematician Gottlob Frege invented a logical
symbolism, to which he gave the name ‘Begriffsschrift’ (in English known
as ‘Concept script’). We will not treat the symbolism of Begriffsschrift
in full detail here (it can be found in [49, page 175-182] and in
[31, page 177-199]), but give a few examples of his new logic and describe
the rest of his work in general terms.
Frege rejected the subject/predicate regimentation on which Aristotelian
logic depends, and recognized (not as the first) that the patterns of Aris-
totle cannot always be used to evaluate inferences correctly.
Example: Certain obvious inferences, such as:
If Joe doesn’t wear a kilt, then Joe is not Scottish.
Joe doesn’t wear a kilt.
Therefore, Joe is not Scottish.
do not fall under the patterns of traditional logic (also called syllogisms). Ac-
tually this is another kind of inference that contains a conditional expression
of the form:
if B then A
B
Therefore, A.
Frege adopted this new rule in the system of logic of his Begriffsschrift.
With arbitrary expressions for A and B, the rule became later known as
modus ponens. A logic that evaluates these sorts of expressions is called a
propositional logic.
What is propositional calculus (or sentential calculus)?
A symbolic system for treating compound propositions and their logical
relationships. Compound propositions are formed via a set of derivation rules
using standard symbols: ∧, ∨, →, ¬. Basic propositions consist of simple,
unanalyzed propositions.
Frege based his propositional calculus on 6 axioms: for all x, y and z:
1 x → (y → x)
2 (x → (y → z)) → ((x → y) → (x → z))
3 (x → (y → z)) → (y → (x → z))
4 (x → y) → (¬y → ¬x)
5 ¬¬x → x
6 x → ¬¬x
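That each of these six axioms is a tautology, and that modus ponens preserves truth, can be verified with truth tables. The Python sketch below is an illustration only. It reads axioms 4 to 6 with negation, as in Frege's Begriffsschrift: (x → y) → (¬y → ¬x), ¬¬x → x and x → ¬¬x. The helper names are our own assumptions.

```python
from itertools import product


def implies(p, q):
    """Material implication p → q."""
    return (not p) or q


# The six propositional axioms as Boolean functions of x, y and z.
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(
        implies(x, implies(y, z)),
        implies(implies(x, y), implies(x, z)),
    ),
    3: lambda x, y, z: implies(
        implies(x, implies(y, z)),
        implies(y, implies(x, z)),
    ),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula):
    """True if the formula holds under all 8 truth assignments to x, y, z."""
    return all(formula(x, y, z) for x, y, z in product([False, True], repeat=3))


def modus_ponens_is_sound():
    """Whenever B and (B → A) are both true, A is true as well."""
    return all(
        a
        for a, b in product([False, True], repeat=2)
        if b and implies(b, a)
    )
```

A truth-table check only shows that the axioms are valid. It says nothing about whether every valid formula can be derived from them.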
Derivations in the propositional calculus were based on two procedures of
substitution and the rule of modus ponens. For the full calculus of predi-
cates, three additional axioms were needed. For all x, y and (propositional
functions) F:
7 (x = y) → (F(x) → F(y))
8 x = x
9 (∀x :: F(x)) → F(y)
Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists
of three parts. In the first part he provides a list of inferences from which,
he believes, all truths of logic can be derived. Then Frege demonstrates in
the second part the completeness of his logic (i.e. all inferences that can be
shown to be valid inferences using the techniques of Aristotelian or proposi-
tional logic can also be shown to be valid using only Frege’s laws and rules
of inference). The third part of Begriffsschrift shows that logic alone suffices
to show the validity of certain inferences (about properties that are heredi-
tary in so-called ‘ancestral sequences’). He also showed that mathematical
induction (see section 3.4.3) can be replaced by a principle about ancestral
sequences that depends only on logical laws.
Grundlagen der Arithmetik
Throughout his work Frege developed (as the first) the main thesis of logicism,
that mathematics is reducible to logic. But to that end he had to do more
than develop a new logical symbolism. His next book, ‘Die Grundlagen
der Arithmetik’ (1884), was devoted to the foundations of arithmetic, based
on the concept of (cardinal) numbers. He put forward the logicist philosophy
that arithmetic could be founded upon logic alone, and he discussed the work
of others in detail (see [49, page 184-185]). In [31, page 183] we learn more
about Frege’s philosophy.
In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the sub-
jective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the
context of a proposition
3) Never to lose sight of the distinction between concept and object
In his book he presented his own theory of numbers, and wanted to show
that all the truths in arithmetic are derivable from logical laws and definitions
alone. He did this by sketching the proofs, without giving the official
Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that
he needed a new version of Begriffsschrift, to accommodate the new requirements
of his formalization of the concept of number, but also to fill
in pieces that were simply missing.
Grundgesetze der Arithmetik
In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’
and ‘On Concept and Object’ (1892), he introduced all the modifications that he
was to make to his language, Begriffsschrift, and his logical system. During
that period he also completed his definitions of the natural numbers and some
of the proofs of simple truths of arithmetic from these definitions and logical
laws. His new logical calculus included a symbolic representation of the truth
value of any given proposition, which provided a shorter notation for many
Begriffsschrift propositions. The calculus also had several other new logical
and arithmetical symbols, one of the most important of them being a notation
for what Frege called the ‘course-of-values’ of a propositional function. The
course-of-values of a propositional function ϕ, written by Frege as ἐϕ(ε),
denoted the truth value for all possible values of the argument (here ε). We
denote it as cov and define equal course-of-values by cov(f) = cov(g) ↔ (∀a ::
f(a) = g(a)). In 1893, Frege published the first volume of his ‘Grundgesetze
der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of
logic and began the proofs that were to make the project successful. In the
second part Frege wanted to define the natural numbers and some basic laws
governing them and, in the third part, he would define the real numbers and
lay the foundations for expressing analysis in terms of logic. In 1902, when
volume 2 was in press, he received a now famous letter from the English
mathematician and logician Russell (see chapter 5), who pointed out, with
great modesty, that a contradiction could be derived in Frege’s system (see
section 5.1). This contradiction would later be named after Russell and become
known as ‘Russell’s paradox’.
Hardly anything more unwelcome can befall a scientific writer
than one of the foundations of his edifice be shaken after his work
is finished. I have been placed in this position by a letter of Mr.
Bertrand Russell just as printing of the second volume was nearing
completion . . . .
- The first paragraph of the appendix from Frege’s ‘Grundgesetze
der Arithmetik’
After many letters between the two (see for example [93, pages 124-128]),
Frege modified one of his axioms and explained in an appendix to the book
that this was done to restore the consistency of the system. However, with
this modified axiom many of the theorems of volume 1 do not go through,
and Frege must have known this. He probably never realized that even with
the modified axiom the system is inconsistent, since this was not shown until
after Frege’s death in 1925, by Leśniewski (see [85]).
The scope of Frege’s Grundgesetze is similar to that of Principia Mathe-
matica (to be discussed in section 7.1), and both aimed at a logistic basis
for mathematics, but with Russell’s theory of types Principia Mathematica
did not contain the paradox. Frege’s contribution to the foundations of ma-
thematics was therefore largely indirect (through Principia Mathematica,
see [49, page 181]). Although Frege attracted only a small audience in his
lifetime, he was a major influence on Peano and Russell, and in the years
thereafter his influence on contemporary philosophy, especially on thought
about language and logic, has become ubiquitous.
In this text I have made extensive use of the excellent books [98] and [97]
about Frege that contain many more references about Frege and his work,
and chapter 4.5 from [31] and chapter 6, section 4 from [49].
Chapter 5
Russell
The fact that all Mathematics is Symbolic Logic is one of the
greatest discoveries of our age; and when this fact has been esta-
blished, the remainder of the principles of mathematics consists
in the analysis of Symbolic Logic itself.
- B. Russell in Principles of Mathematics, 1903
The English logician and philosopher Bertrand Russell (1872-1970) published
in his long life an incredible number of books on logic, the theory of
knowledge and many other topics. He certainly was one of the most important
logicians and philosophers of the 20th century.
Russell’s private life, affairs, imprisonment, his social and political cam-
paigns and advocacy of both pacifism and nuclear disarmament are certainly
interesting, but we will not discuss these subjects here (see for more informa-
tion and references on Russell’s life and work [62], [80] and [31, chapter 6, 7,
11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following
assessment from [73]: “Bertrand Russell had one of the most widely varied
and persistently influential intellects of the 20th century. During most of his
active life, a span of three generations, Russell had at any time more than
40 books in print ranging over philosophy, mathematics, science, ethics, so-
ciology, education, history, religion, politics and polemic. The extent of his
influence resulted partly from his amazing efficiency in applying his intellect
(he normally wrote at the rate of 3,000 largely unaltered words a day) and
partly from the deep humanitarian feeling that was the mainspring of his ac-
tions. This feeling expressed itself consistently at the frontier of social change
through what he himself would have called a liberal anarchistic, left-wing,
and skeptical atheist temperament.”
Here, we will focus on Russell’s mathematical contributions to the foun-
dations of mathematics. His contributions relating to mathematics include
his discovery of Russell’s paradox, his defense of logicism (the view that
mathematics is, in some significant sense, reducible to formal logic), his in-
troduction of the theory of types, and his refining and popularizing of the
first-order predicate calculus. Along with Kurt Gödel (see chapter 8), he is
usually credited with being one of the two most important logicians of the
twentieth century. We will look at each of these contributions in more detail.
Russell discovered the paradox which bears his name in 1901, while
working on his ‘Principles of Mathematics’ (1903). The paradox and the
closely related vicious circle principle are discussed in section 5.1. Russell’s
own response to the paradox came with the introduction of types (see chap-
ter 7). Using the vicious circle principle also adopted by Henri Poincaré,
together with Russell’s so-called ‘no-class’ theory of classes, Russell was then
able to explain why the unrestricted comprehension axiom (see section 2.1)
fails: propositional functions, such as ‘x is a set’, should not be applied to
themselves since self-application would involve a vicious circle. On this view,
it follows that it is possible to refer to a collection of objects for which a
given condition (or predicate) holds only if they are all at the same level or
‘type’.
Although first introduced by Russell in 1903 in the Principles, his theory
of types finds its mature expression in his 1908 article ‘Mathematical Logic as
Based on the Theory of Types’ and in the monumental work he co-authored
with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913).
Principia Mathematica and the theory of types will be treated in detail in
chapter 7. The theory admits of two versions, the ‘simple theory’ and the
‘ramified theory’. Both versions of the theory later came under attack. For
some, they were too weak since they failed to resolve all of the known para-
doxes. For others, they were too strong since they disallowed many ma-
thematical definitions which, although consistent, violated the vicious circle
principle. Russell’s response to the second of these objections was to intro-
duce, within the ramified theory, the axiom of reducibility. Although the
axiom successfully lessened the vicious circle principle’s scope of application,
many claimed that it was simply too ad hoc to be justified philosophically.
Of equal significance during this same period was Russell’s defense of logi-
cism, the theory that mathematics was in some important sense reducible to
logic. First defended in his Principles, and later in more detail in ‘Principia
Mathematica’, Russell’s logicism consisted of two main theses. The first
is that all mathematical truths can be translated into logical truths or, in
other words, that the vocabulary of mathematics constitutes a proper subset
of that of logic. The second is that all mathematical proofs can be recast as
logical proofs or, in other words, that the theorems of mathematics consti-
tute a proper subset of those of logic.
Like Gottlob Frege, Russell’s basic idea for defending logicism was that
numbers may be identified with sets of sets and that number-theoretic state-
ments may be explained in terms of quantifiers and identity. It followed
that number-theoretic operations could be explained in terms of set-theoretic
operations such as intersection, union, and the like. In ‘Principia Mathema-
tica’ Whitehead and Russell were able to provide detailed derivations of many
major theorems in set theory, finite and transfinite arithmetic, and elemen-
tary measure theory. A fourth volume on geometry was planned but never
completed.
For more information on Russell’s theory of types and about Principia
Mathematica, we refer to chapter 7. In this chapter we used parts of [73]
and [39].
5.1 Russell’s paradox
I hoped sooner or later to arrive at a perfect mathematics which
should leave no room for doubts, and bit by bit to extend the sphere
of certainty from mathematics to other sciences.
- Russell, in [78]
Paradoxes have been known for a long time, but in particular with the
introduction of more formal systems at the end of the 19th century paradoxes
became more influential on the foundations of mathematics. Before we describe
the most famous paradox, that of Russell, we first define the notion of a
paradox.
What is a paradox? A paradox is a statement which appears self-contradictory
or contrary to expectations, and is also known as an antinomy
In an axiomatic system (see section 6.1) a paradox is a derivation that
leads to a contradictory statement.
A paradox is properly something which is contradictory to ge-
neral opinion; but is frequently used to signify something self-
contradictory [...] Paralogism, by its etymology, is best fitted to
signify an offence against the formal rules of inference.
- De Morgan, in [31, page 310]
In [86], three ‘paradox threats’ are identified: when systems are complex,
formal or designed for computers, there often is not enough intuition to notice
inconsistencies. With the previously described formalizations, the systems
of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2),
not to mention that of Russell himself, were at risk. And indeed, in 1902 Russell
discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox
turned out to lie at the very basis of mathematics, since it could be formulated
in all the systems mentioned above. We first formulate the paradox in Cantor’s
set theory:
Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R
Russell in 1901 studied Cantor’s work [31, section 6.6.1] and after noting
that some sets belonged to themselves while the rest did not do so, Russell
showed that the set of all sets which do not belong to themselves belongs to
itself if and only if it does not do so - and, by repetition of the argument,
vice versa also. Russell also expressed this paradox in terms of predicates,
and as such first presented his discovery in a letter to Frege (see [93, page
124] and see also the quote on page 78).
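The core of the argument can be checked mechanically on small finite
‘universes’. The following Python sketch is an illustration only and not part
of the original development: it models an arbitrary membership relation on n
objects as a Boolean matrix (the function names are hypothetical) and searches
for an object r such that x ∈ r holds exactly when x ∉ x. Taking x = r already
makes this condition unsatisfiable, so the search finds nothing, whatever the
relation is.

```python
from itertools import product

def all_relations(n):
    """Yield every membership relation on {0..n-1} as a matrix m; m[x][y] means x ∈ y."""
    for bits in product([False, True], repeat=n * n):
        yield [list(bits[row * n:(row + 1) * n]) for row in range(n)]

def russell_elements(m):
    """Return all r such that, for every x, x ∈ r holds exactly when x ∉ x."""
    n = len(m)
    return [r for r in range(n)
            if all(m[x][r] == (not m[x][x]) for x in range(n))]

def universes_with_russell_set(n):
    """Count the membership relations on n objects that contain a 'Russell set'."""
    return sum(1 for m in all_relations(n) if russell_elements(m))
```

For n = 1, 2, 3 the count is 0: no membership relation at all, however it is
chosen, admits an element playing the role of R.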
Since Peano’s system was based on the set theory of Cantor, Peano’s
work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik)
self-application was not possible, so R ∈ R was not allowed, but the paradox
could still be expressed by using Frege’s notion (see page 77) of the
course-of-values of a function. If we characterize equality of
courses-of-values cov by
cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s
work as follows (see also [86, page 7] for a slightly different proof):
Define f(x) := (∀ϕ :: (cov(ϕ) = x) → ¬ϕ(x)), and let K := cov(f).
f(K)
≡ {def. f}
(∀ϕ :: cov(ϕ) = K → ¬ϕ(K))
≡ {instantiate ϕ with f; conversely, cov(ϕ) = cov(f) gives ϕ(K) = f(K)}
cov(f) = K → ¬f(K)
≡ {def. K, elim. →}
¬f(K)
Hence f(K) ≡ ¬f(K), a contradiction.
The paradox had a big influence, since it could be formulated in all these
systems, and in classical logic a contradiction entails every statement.
In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore
appeared that no proof could be trusted once it was discovered that the
logic underlying all mathematics was inconsistent. Russell’s paradox arises
as a result of naive set-theory’s so-called unrestricted or naive comprehension
axiom (see page 16). Cantor created this axiom with the intuition that any
coherent condition may be used to determine a set. But that means that the
condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole
set V , i.e. it allows impredicative definitions (see below for the definition of
impredicative). Most attempts at resolving Russell’s paradox have therefore
concentrated on various ways of restricting or abandoning this axiom.
Before we consider the consequences of the discovery of the paradox,
we first take a further look at the nature of the paradox, hereby following
Russell’s own analysis. While writing ‘The Principles’, Russell’s attention
was attracted by what is now known as Cantor’s paradox and (according to
a letter he wrote to the English mathematician Jourdain) he found that there
was something wrong with his earlier refutation of Cantor’s paradox (see [29,
section 7]). He removed his earlier refutation from ‘The Principles’ and his
revised diagnosis uncovered a true paradox. As we have already seen, he
summarized this discovery and the reasoning that led thereto in a second
letter to Frege.
After discovering his famous paradox, Russell traced the fallacy back to
a violation of what he called the ‘vicious circle principle’. The ‘vicious
circle’ that his principle is named after arises from the assumption that a
set of objects may contain members which can only be defined by means of the
set as a whole. Therefore, Russell said that statements are illegitimate and
meaningless if they involve a set of objects that contains members which
presuppose this (total or whole) set of objects. That means a statement is only
legitimate if all propositions it contains refer to already defined sets.
Definition of impredicative: A definition is impredicative if it involves a
set V that has a member v ∈ V whose definition depends on V (see footnote 1).
In a sense those impredicative definitions are thus circular, and they were
considered the cause of antinomies. For more information about impredicativity,
see [57, section 15.3].
Definition of Vicious Circle Principle (see footnote 2): Definitions,
assumptions or statements involving all of a set must not be a part or an
element of that set. In other words, impredicative definitions should be
avoided.
In terms of set theory we can formulate the principle as: No set V is
allowed to contain members v definable only in terms of V, or members v
involving or presupposing V.
Vicious circle fallacies are arguments that are condemned by the vicious
circle principle. Such arguments do not necessarily lead to contradictions
(since fallacious arguments can lead to true conclusions).
Footnote 1: Note that a direct implementation of this definition as a new
axiom of set theory is not possible. We might rephrase the definition as
‘whatever set contains an apparent element, that element must not be dependent
on that set’. This might be implemented by fixing ‘an apparent element’ of a
set and then expressing its independence of the other elements of that set.
This independence means that, regardless of the nature of the elements of the
set, the nature of the apparent element remains the same. The ‘nature’ of the
elements can be seen as all the members of that element (or, in case the
element is an individual, the nature of the apparent element can be seen as
that individual). This leads us to the following axiom:
(∀X :: (∀x : x ∈ X : x = a → (∀x′ : x′ ∈ X ∧ x ≠ x′ : x′ = b(x′) → a ∈ X))).
Clearly this does not avoid the paradox of Russell. We consider a set
X := R ≡ {x | x ∉ x} and an element x ∈ R, i.e. we have x ∉ x. Despite the
fact that the set X is ‘too large’, the axiom does not prohibit the existence
of the set X. The axiom tells us
x = a → (∀x′ : x′ ∈ R ∧ x ≠ x′ : x′ = b(x′) → a ∈ R).
In other words, we can change each element in R except x, and the nature of x
should not depend on it. The only thing we know about x is that x ∉ x and
x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R. Now we
can change all x′ into any value b(x′), but still we will have x ∉ x and
x ∈ R. So unfortunately this most ‘direct’ attempt to solve the paradox fails.
Footnote 2: Russell formulated it originally as ‘Whatever involves all of a
collection must not be one of the collection’. Or, as formulated in [49, page
113]: ‘If, provided a certain collection had a total, it would have members
only definable in terms of that total, then the said collection has no total’.
Another formulation of [87] says ‘No entity can be defined in terms of a
totality of which it is itself a possible member’.
In Principia Mathematica (see [31, section 7.2]), Russell assembles a
collection of seven different paradoxes, all of which are based on the same
circular type of reasoning, and then resolves them by making their circularity
explicit. We will now mention eight of the most well-known paradoxes,
most of which arise from a violation of the vicious circle principle.
1 Russell’s paradox (1903), which we have discussed in this section. The
impredicativity is clear in the definition of the set that contains all sets
that are not members of themselves. There are many popularizations of
this paradox; one of them is from Russell himself (1919) and concerns
the plight of the barber of a certain village who has enunciated the
principle that he shaves all those persons of the village, and only those,
who do not shave themselves. The paradox is then formed by the question
‘Does the barber shave himself?’.
2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2.
The impredicativity comes from the ordinal number of the naturally
ordered set of all ordinal numbers.
3 Cantor’s paradox, which we have discussed in section 3.8.1. The im-
predicativity comes from the cardinal number of the set of all sets.
4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I
am lying’, his utterance is self-contradictory, and it cannot be either
true or false. The oldest form of this particular paradox, in the words
of Principia Mathematica, is that of Epimenides the Cretan, ‘who said
that all Cretans were liars, and all other statements made by Cretans
were certainly lies’.”.
5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956)
published a paradox in [74] in 1905. He considered the set V of
all non-terminating decimals that can be defined in a finite number of
words. By arranging V as a sequence and applying Cantor’s diagonal
argument to the members of V, a non-terminating decimal is produced that
differs from every member of V, yet is itself defined in a finite number
of words.
6 Paradox of definitions. Again we quote from [49]: “The possible defi-
nitions of specific ordinal numbers can be arranged in a sequence, and
there are therefore at most ℵ_0 of them. But the totality of ordinal
numbers is not denumerable, and so there exist ordinal numbers which
cannot be individually defined. Among such indefinable ordinals there
is a least, and thus it appears that the description ‘the least indefinable
ordinal’ yields a definition of an entity that cannot be defined.”.
7 Berry’s paradox: “The least integer not nameable in fewer than nine-
teen syllables” is itself a name that contains only eighteen syllables.
8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling
(1886-1942) published a paradox in 1908 together with his friend Leonard
Nelson (1882-1927). As described in [31, page 336]: “Some words can be
predicated of themselves: in English, ‘word’ is a word, ‘noun’ is a noun,
and so on. This property is called ‘autological’, and is obviously itself
autological. Other English words are not autological; ‘German’, say, or
‘verb’. They are called ‘heterological’ - but this word is heterological if
and only if it is not so.”.
The first three paradoxes are logical paradoxes that can be formulated
within Cantor’s set theory. The remaining five are mainly paradoxes of
naming; they are of a semantic kind. All these paradoxes have stimulated
fundamental research, especially Russell’s paradox, which led to the vicious
circle principle and first showed the need for a theory of types or another
restriction of the power of the comprehension axiom.
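The diagonal step used in Richard’s paradox (item 5) is easy to make concrete.
The sketch below is illustrative and not taken from the source: it works on a
finite list of digit strings rather than on an infinite enumeration, and the
function name and sample data are hypothetical. It produces a digit string that
differs from the i-th row in its i-th position.

```python
def diagonal(rows):
    """Return a digit string that differs from rows[i] in position i, for every i."""
    # Only the digits 1 and 2 are produced, which sidesteps the
    # 0.1999... = 0.2000... ambiguity of decimal expansions.
    return "".join("2" if row[i] == "1" else "1" for i, row in enumerate(rows))

# First four digits of four 'finitely defined' decimals (illustrative data).
rows = ["1415", "7182", "4142", "3333"]
d = diagonal(rows)  # "2211"
```

The paradox arises because the recipe ‘apply the diagonal construction to the
sequence V’ is itself a definition in finitely many words, so the resulting
decimal ought to occur in V although it differs from every member of V.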
5.2 Consequences and philosophies
Perhaps the greatest paradox of all is that there are paradoxes in
mathematics.
- E. Kasner and J. Newman quoted in [46]
The various proposals to overcome this paradox led to various theories.
One proposal was to reconstruct set theory on an axiomatic basis (this
axiomatic method was first suggested by Hilbert, see section 6.1) sufficiently
restrictive to exclude the paradoxes. Hilbert and other formalists had the
basic idea to allow the use of only well-defined and finitely constructible
objects, together with rules of inference that were deemed to be absolutely
certain.
In 1908 the mathematician Zermelo was the first to attempt to formulate
proper axioms for set theory such that the paradox is not deducible, but
most other parts of set theory are. This attempt was successful and, after
a refinement by the mathematician Fraenkel, led to the ZF axiom system
(see section 5.3), which is still the most widely accepted basis today.
Subsequent refinements to ZF have been made by Skolem, and later by the three
mathematicians von Neumann, Bernays and Gödel (see section 8.5).
Russell’s own response to the paradox came with the introduction of his
theory of types in his Principia Mathematica (see section 5.4). Russell had
already laid out a first version of his theory to eliminate the paradoxes in
1908. Since self-application (R ∈ R) caused a contradiction, he decided to
suppress it. In this approach he assigned types to variables (as types
he took sets) and allowed expressions such as x ∈ y only if the type of x
is one less (in some order) than the type of y. The outlawing of
impredicative definitions seemed a solution to the known paradoxes in set
theory.
But it turned out there are essential and accepted parts of mathematics that
contain impredicative definitions. This was a serious problem to Russell’s
solution, despite the fact that many instances of impredicative definitions in
mathematics could be circumvented. We quote from [22, page 265]: “In 1918,
the German mathematician Hermann Weyl (1885-1955) tried to construct as
much parts of analysis as possible from the natural number system without
the use of impredicative definitions. Although he succeeded in obtaining a
considerable part of analysis, he was unable to derive the important theorem
that every nonempty set of real numbers having an upperbound has a least
upperbound”.
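The typing restriction described above can be sketched as a simple
well-formedness check. The following Python fragment is a simplification made
for illustration: it assumes that types are natural-number levels (0 for
individuals, 1 for sets of individuals, and so on), which is one possible
reading of ‘one less in some order’; the class and function names are
hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    level: int  # assumed encoding: 0 = individuals, 1 = sets of individuals, ...

def membership_well_typed(x: Var, y: Var) -> bool:
    """Admit the expression 'x ∈ y' only when y sits exactly one level above x."""
    return y.level == x.level + 1

a = Var("a", 0)
s = Var("s", 1)
R = Var("R", 2)
```

Under this check `membership_well_typed(a, s)` holds, whereas
`membership_well_typed(R, R)` fails for every choice of level, so the
expression R ∈ R needed for the paradox cannot even be written down.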
Other attempts towards a solution for the paradoxes of set theory focus on
the foundations of logic. Luitzen Brouwer and the intuitionists took this
approach and tried to prevent the paradoxes by denying the principle of the
excluded middle (which states that any mathematical statement is either
true or false). Brouwer first attacked the logical foundations of mathematics
in his doctoral thesis in 1907; this formed the beginning of the Intuitionist
School. The intuitionists had the basic idea that one cannot assert the
existence of a mathematical object unless one can also indicate how to go about
constructing it.
In the period after the discovery of the paradoxes, we distinguish three
main philosophies of mathematics: logicism, intuitionism and formalism.
What is Logicism? A school of mathematical thought which holds the
thesis that mathematics is a part of (or a branch of) logic.
Logicists contend that all of mathematics can be deduced from pure logic,
without the use of any specifically mathematical concepts, such as number or
set. The first ideas date back to Leibniz (1666) and the actual reduction of
mathematics to logic was started by Dedekind (1888) and Frege (1884-1903)
and later continued by Peano, and by Whitehead and Russell (in Principia
Mathematica, 1910-1913).
What is Intuitionism? A school of mathematical thought founded by the
20th-century Dutch mathematician L.E.J. Brouwer (1881-1966) that contends that
the primary objects of mathematical discourse are mental constructions
governed by self-evident laws.
Intuitionists have challenged many of the oldest principles of mathematics
as being non-constructive (and hence meaningless). They proposed that
a proof in mathematics should be accepted only if it constructs the
mathematical entity it talks about, and not if it merely shows that the entity
‘could’ be constructed or that supposing its non-existence would result in
contradiction.
Brouwer had the fundamental insight that such nonconstructive arguments
will be avoided if one abandons a principle of classical logic (which
lies for example behind De Morgan’s laws). This is the principle of the
excluded third (or excluded middle), which asserts that for every proposition
ϕ, either ϕ or ¬ϕ; or equivalently that, for every ϕ, ¬¬ϕ implies ϕ. This
principle is basic to classical logic and had already been enunciated by
Aristotle, though with some reservations, as he pointed out that the statement
“there will be a sea battle tomorrow” is neither true nor false.
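That the two principles above are not forced by the remaining logical laws can
be illustrated with a small model. The sketch below is an addition for
illustration, not something used in the source: it assumes the three-element
chain 0 < 1/2 < 1 as a Heyting algebra (a standard kind of model for
intuitionistic propositional logic), in which a formula counts as valid only if
it always evaluates to 1. All function names are hypothetical.

```python
from fractions import Fraction

ZERO, HALF, ONE = Fraction(0), Fraction(1, 2), Fraction(1)
VALUES = [ZERO, HALF, ONE]

def h_or(a, b):
    """Disjunction: the larger of the two truth values."""
    return max(a, b)

def h_implies(a, b):
    """Implication on a chain: 1 if a <= b, otherwise b."""
    return ONE if a <= b else b

def h_not(a):
    """Negation is defined as 'a implies 0'."""
    return h_implies(a, ZERO)

def excluded_middle(a):
    return h_or(a, h_not(a))

def double_negation_elimination(a):
    return h_implies(h_not(h_not(a)), a)
```

On the classical values 0 and 1 both formulas evaluate to 1, but for the
intermediate value 1/2 both evaluate to 1/2. Hence neither ϕ ∨ ¬ϕ nor ¬¬ϕ → ϕ
is valid in this model.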
Because of the weight it places on mental apprehension through construc-
tion of purported mathematical entities, intuitionism is sometimes also called
constructivism. A still more severe form of constructivism which we will not
further discuss is strict finitism, in which one rejects infinite sets. More in-
formation on intuitionism can be found in [60].
What is Formalism? A school of mathematical thought introduced by the
20th-century mathematician David Hilbert, which holds that all mathematics
can be reduced to rules for manipulating formulas without any reference to
the meanings of the formulas.
Formalists contend that it is the mathematical symbols themselves, and
not any meaning that might be ascribed to them, that are the basic objects
of mathematical thought. Hilbert’s program, called formalism, was to con-
centrate on the formal language of mathematics and to study its syntax. A
statement should be a metatheorem, that is a theorem provable within the
syntax of mathematics.
These three philosophies do not necessarily contradict each other, and
all three are still advocated today. Whether the logicist thesis has
been established seems to be a matter of opinion. Though successful, it can
be questioned on the ground that the systematic development of logic
presupposes mathematical ideas in its formulation. The intuitionists succeeded
in rebuilding large parts of present-day mathematics, but a large part is still
wanting, making intuitionist mathematics less powerful and in many respects
much more complicated than classical mathematics. These are serious
objections to the intuitionistic approach, but it is generally conceded that
its methods do not lead to contradictions, and some hope for a new intuitionist
reconstruction of mathematics carried out in a different and more successful
way. Unfortunately for the formalists, a consequence of Gödel’s incompleteness
theorem (see chapter 8) is that the consistency of mathematics can be
proved only in a language which is stronger than the language of mathematics
itself. Yet formalism is not dead: most pure mathematicians are tacit
formalists, but the naive attempt to prove the consistency of mathematics in
a weaker system had to be abandoned. From [11, item from Paul Bernays]
we learn that most mathematicians of all three philosophies are also philo-
sophical realists: “While no one, except an extremist intuitionist, will deny
the importance of the language of mathematics, most mathematicians are
also philosophical realists who believe that the words of this language denote
entities in the real world. Following the Swiss mathematician Paul Bernays
(1888-1977), this position is also called Platonism, since Plato believed that
mathematical entities really exist.”. For more information about realism, see
[57].
5.3 Zermelo Fraenkel
5.3.1 Axiomatic set theory
After the discovery of Russell’s paradox, it became clear that set theory
needed a new and more rigorous basis. Hilbert’s proof theory, which will be
treated in more detail in section 6.1, offered a way to put set theory on firm
and hopefully consistent grounds. The so-called ideal calculus was a first
formalization of Cantor’s set theory, but it lacked the precision of Hilbert’s
later theories and was inconsistent because it still contained in some form the
(naive) comprehension principle (see page 16). The first real axiomatization
of set theory was given in 1908 by the German mathematician Ernst Zermelo
in [101]. The attitude adopted in his axiomatic development of set theory
is that it is not necessary to know what ‘sets’ are and the ‘things’ that are
its elements, nor what the ‘membership relation’ means [49, see page 288,
paragraph 1]. Zermelo instead postulated a domain B of abstract objects and
represented the elements or ‘things’ of this domain by the letters a, b, c, . . ..
He then introduced the primitive notions of equality and membership: a = b
states that ‘a’ and ‘b’ designate the same ‘thing’. a ∈ b is defined on the
domain B and if a ∈ b holds, we call b a set and a an element of this set. Thus
some, but not necessarily all, objects of B are sets. The assumptions adopted
about these notions are called the axioms of the theory. Its theorems are the
axioms together with the statements that can be deduced from the axioms
using the rules of inference (see also section 6), for example by a system of
logic. Criteria for the choice of axioms have been identified by several people
(see Hilbert’s theory in section 6, or [49, last sentence of page 287]). The
most accepted criteria (more formally defined in chapter 6) include:
1. Consistency of the system (it should be impossible to derive both a
statement and its negation, in other words the paradoxes should be
avoided).
2. Plausibility (the axioms should be in accord with intuitive beliefs about
sets, see [60]).
3. Completeness (richness of the theory: the desirable results of Cantorian
set theory ought to be derived as theorems).
In the next paragraph we will present the set of axioms that Zermelo
chose and that formed the basis for all future axiomatizations of set theory
(see also section 8.5).
5.3.2 Zermelo Fraenkel (ZF) Axioms
Zermelo formulated his axiomatic system in 1908; the extensions of Fraenkel
date from 1922. In the same year (1922) the Norwegian mathematician Skolem
(1887-1963) proposed a formal language for formulating the theory.
Zermelo noted that the sets involved in a derivation of the paradoxes are
very large (see footnote 3): for Cantor’s paradox it is the set of all sets
(see section 3.8.1), for Russell’s paradox it is the set of all sets which are
not members of themselves (see section 5.1), and for the Burali-Forti paradox
(see section 3.8.2) it is the set of all well-orderings. Therefore he wanted
to restrict the size of sets, and he changed the (naive) comprehension
principle into his separation axiom, such that the paradox could no longer be
derived:
Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))
For every set z and definite (see footnote 4) property ϕ of sets there exists
a set whose elements are exactly those of z having the property ϕ.
There are also certain limitations on the property ϕ (i.e. it should be de-
finite) that we will mention later in section 8.5. We show that the standard
derivation of Russell’s paradox cannot be applied when the naive compre-
hension axiom is replaced by the separation axiom.
Let R = {x | x ∈ Z ∧ x ∉ x}
R ∈ R ↔ R ∈ Z ∧ R ∉ R
      → R ∉ R, contradiction.
R ∉ R ↔ R ∉ Z ∨ R ∈ R
      ← R ∉ Z
In both derivations above we can conclude R ∈ R ↔ R ∉ R only if
we know that R ∈ Z. Since R ∈ Z is not given, Russell’s derivation of his
paradox does not apply; the argument merely shows that R ∉ Z.
Footnote 3: The term proper class is sometimes used to refer to these
‘excessively large’ sets; all other sets are then referred to as improper
classes. This means all sets are classes but not every class is a set. A class
that is not a set is called a proper class.
Footnote 4: See section 8.5 for the definition of the concept of definiteness.
However, this fact alone does not guarantee that there does not exist a
paradox, as claimed in some articles, but merely that the separation axiom
does not permit the construction of paradoxical sets with elements defined
in terms of the sets themselves. But until consistency is proved, there might
be other less obvious ways to construct a paradox.
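The effect of the separation axiom can be tried out on hereditarily finite
sets. The sketch below is illustrative only: it models sets by Python
`frozenset` objects, which can never contain themselves, so x ∉ x holds for
every x in this model; the function name `separate` and the sample set Z are
hypothetical.

```python
def separate(z, phi):
    """Separation: the subset of z consisting of the elements that satisfy phi."""
    return frozenset(x for x in z if phi(x))

EMPTY = frozenset()
ONE = frozenset({EMPTY})          # {∅}
Z = frozenset({EMPTY, ONE})       # {∅, {∅}}

# R = {x | x ∈ Z ∧ x ∉ x}
R = separate(Z, lambda x: x not in x)
```

Here R coincides with Z, and R is neither an element of itself nor an element
of Z. No contradiction arises: the construction merely yields a set that lies
outside Z.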
We now give all of the ZF axioms that constitute set theory. The first
seven axioms are those that were originally formulated by Zermelo. Axioms
8 and 9 were later added by Fraenkel and von Neumann respectively.
Axioms 1 through 8 are the original set of Zermelo-Fraenkel axioms.
In the definitions below we use several shorthand notations. If we wish
however we can express these definitions in full detail, such that the notation
of each expression does not depend on previous axioms. For example, in
axiom 8 we used the ∃! to denote that there is exactly one y, and in axiom
9 we used the symbols ∩ and ∅, and in axiom 6 we used ⊆ to express x ⊆ z
as a shorthand for (∀y :: y ∈ x → y ∈ z). The separation and substitution
axioms are actually axiom schemes.
The Zermelo-Fraenkel axioms:
1. Extensionality axiom (or axiom of determination):
(∀x, y :: (∀z :: z ∈ x ↔ z ∈ y) → x = y)
Sets are uniquely determined by their members, or to be exact: if every
element of a set x is at the same time an element of y, and conversely,
then x = y.
2. Axiom of the empty set:
(∃x∀y :: y / ∈ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or
‘empty’ set, which contains no elements at all.
3. Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ϕ(x)), ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of
z having the property ϕ.
4. Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
5. Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those
objects occurring in at least one element of z.
6. Power set axiom:
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7. Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8. Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation ϕ (functional property) is again
a set.
9. Axiom of foundation or axiom of regularity (by von Neumann):
(∀a :: a ≠ ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅))
Every non-empty set is disjoint from at least one of its elements.
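Several of the axioms above assert that certain sets exist, and for
hereditarily finite sets these can be computed directly. The sketch below is
an illustration under the assumption that sets are modelled as Python
`frozenset` objects; the function names are hypothetical, and the axiom of
infinity has of course no counterpart in such a finite model.

```python
from itertools import combinations

EMPTY = frozenset()

def pair(a, b):
    """Pairing (axiom 4): the set whose elements are exactly a and b."""
    return frozenset({a, b})

def union(z):
    """Union (axiom 5): the elements occurring in at least one element of z."""
    return frozenset(x for w in z for x in w)

def power_set(z):
    """Power set (axiom 6): the set of all subsets of z."""
    elems = list(z)
    return frozenset(frozenset(c)
                     for k in range(len(elems) + 1)
                     for c in combinations(elems, k))

def replace(a, op):
    """Replacement (axiom 8): the image of a under a functional operation op."""
    return frozenset(op(x) for x in a)

one = pair(EMPTY, EMPTY)   # {∅}
two = pair(EMPTY, one)     # {∅, {∅}}
```

For example, `union(pair(one, two))` equals `two`, `power_set(two)` has four
elements, and `replace(two, lambda x: pair(x, x))` is the set {{∅}, {{∅}}}.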
Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not
a set.
Proof: Suppose V is any given set. Then (see footnote 5) V has a subset W
that consists of those elements of V that are not members of themselves. But
then W is not an element of itself (because in that case we would have W ∈ W,
while W consists of elements that are not members of themselves). But if W
were an element of V, then, since W ∉ W, we would also have W ∈ W. This means
that W is not a member of V. But W is certainly in B, and therefore B is not
the same as V. Thus B cannot coincide with any set at all.
Footnote 5: Since the property x ∉ x is definite. See section 8.5 for the
definition of the concept of definiteness.
The theory is not complete, since many statements are independent of
ZF. Independent of the previous axioms, the following two statements have
a more dubious status (and are not part of standard ZF):
10. Axiom of choice (AC):
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ Ran(f) ⊂ ⋃x ∧
(∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.
Definition of choice function: A function f is called a choice function for
the set V := Dom(f) = V − {∅} ∧ (∀v : v ∈ Dom(f) : f(v) ∈ v)
11. Generalized Continuum Hypothesis (GCH):
For any cardinal ℵ_r, {0, 1}^{ℵ_r} = ℵ_{r+1}
In 1908 Felix Hausdorff proposed this generalization of CH. Another
formulation of this axiom and more information are given in section 3.6. In
the remainder of this section, we will give a short explanation of the nature
of the other axioms. For more detailed information, we refer to section 8.5
and to the rich literature on set theory that is available (for example [17],
[24], [49, chapter 11], [28]).
The axioms are not minimal. For example, as we have already seen in
section 2.2 (see footnote 6), the axiom of the empty set can be deduced from
the separation axiom. We also have empty set axiom + substitution axiom ⊢
separation axiom. We have also seen in section 2.2 how we can define basic
operations with the extensionality and separation axioms. The pairing, sum and
powerset axioms, together with the extensionality axiom, ensure uniqueness of
the pairs, sums and powersets of sets. With these axioms alone we can
already create an infinite number of sets. However, each set constructed
with axioms 1 to 6 only has a finite number of elements. It is the infinity
axiom that we need to create infinite sets. These sets are not unique, but the
smallest successor set, denoted ω, is unique. We call its elements the natural
numbers. With this axiom we can now also prove the principle of induction
for ω (see section 3.4.3). The substitution axiom says that whenever ϕ is a
property of sets such that to every x there is exactly one y for which ϕ(x, y),
and a is a set, then there exists a set, the elements of which are exactly
those y for which an x ∈ a exists such that ϕ(x, y). The foundation axiom
says that each non-empty set has epsilon-minimal elements (see below). An
implication of this axiom is that there is no function f defined on ω such
that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). For a motivation and analysis of the role
of the foundation axiom we refer to [17, section 2.1].
Footnote 6: The existence of the empty set in section 2.2 was actually derived
from the comprehension principle but the result can similarly be obtained from
the separation axiom.
Definition of epsilon-minimal:
An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅
Another corollary of the foundation axiom is that there is no set which
has itself as its only element. Note that to prevent the paradoxes we need
the separation axiom, not the foundation axiom.
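Both the successor operation of axiom 7 and the notion of an epsilon-minimal
element can be made concrete for finite sets. The following sketch is
illustrative and again assumes that sets are modelled as Python `frozenset`
objects; the function names are hypothetical. It builds the first few elements
∅, {∅}, {{∅}}, ... of a successor set and lists the epsilon-minimal elements
of a given set.

```python
EMPTY = frozenset()

def successor(a):
    """The successor a ↦ {a} used in the axiom of infinity (axiom 7)."""
    return frozenset({a})

def numeral(n):
    """The n-th element of the sequence ∅, {∅}, {{∅}}, ..."""
    s = EMPTY
    for _ in range(n):
        s = successor(s)
    return s

def epsilon_minimal(a):
    """All elements b of a with b ∩ a = ∅."""
    return [b for b in a if not (b & a)]

a = frozenset({numeral(0), numeral(1), numeral(2)})
```

For the set a = {∅, {∅}, {{∅}}} the only epsilon-minimal element is ∅, since
each of the other two elements contains a member of a.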
The origin of the axiom of choice was Cantor’s recognition of the impor-
tance of being able to well-order arbitrary sets; i.e., to define an ordering
relation for a given set such that each nonempty subset has a least element.
The virtue of a well-ordering for a set is that it offers a means of proving
that a property holds for each of its elements by a process (transfinite in-
duction) similar to mathematical induction. Zermelo (1904) gave the first
proof that any set can be well-ordered. His proof employed a set-theoretic
principle that he called the axiom of choice, which, shortly thereafter, was
shown to be equivalent to the so-called well-ordering theorem. A choice
function for a set A ‘chooses’ an element from each non-empty member of A. One
form of the principle reads: if x is a nonempty set the elements of which are
nonempty sets, then there exists a function f with domain x such that for each
member a of x, f(a) ∈ a. For a more detailed
discussion of the axiom of choice we refer to [17, section 2.9].
Intuitively, the axiom asserts the possibility of making a simultaneous choice
of an element in every nonempty member of any set; this guarantee accounts
for its name. The assumption is significant only when the set has infinitely
many members. Zermelo was the first to state explicitly the axiom, although
it had been used but essentially unnoticed earlier. It soon became the subject
of vigorous controversy because of its unconstructive nature. There are a few
mathematicians who feel that the use of the axiom of choice is improper, but
to the vast majority it, or an equivalent assertion, has become an indispens-
able and commonplace tool. For this discussion of the axiom of choice we
have used [63], [77] and [11].
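For a finite family of sets a choice function can simply be written down, as
the following sketch shows. It is an illustration only and assumes a family of
finite sets of integers, so that ‘take the least element’ is an explicit rule;
the function name and the sample family are hypothetical. The axiom of choice
is needed precisely when infinitely many choices have to be made and no such
uniform rule is available.

```python
def choice_function(family):
    """Map every non-empty member a of the family to one of its elements."""
    # The rule 'least element' is explicit, so no appeal to AC is involved here.
    return {a: min(a) for a in family if a}

family = [frozenset({3, 1, 2}), frozenset({7}), frozenset(), frozenset({5, 4})]
f = choice_function(family)
```

The domain of f is the family without the empty set, and f(a) ∈ a holds for
every a in that domain, which is the condition stated in the definition of a
choice function.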
A discussion of the Generalized Continuum Hypothesis can be found in
section 3.7.
Chapter 6
Hilbert
The further a mathematical theory is developed, the more harmo-
niously and uniformly does its construction proceed, and unsus-
pected relations are disclosed between hitherto separated branches
of science.
- Hilbert, quoted in [76]
David Hilbert (1862-1943) was a German mathematician who reduced
geometry to a series of axioms and contributed substantially to the esta-
blishment of the formalistic foundations of mathematics. His first work was
on invariant theory and in 1888 he proved his famous Basis theorem (see
[5]). After that he did significant work in the areas of algebraic number
theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’
in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (which appeared
in English as ‘The foundations of Geometry’ in 1902), which contained (see
[31, section 4.7.2]) what would become a widely accepted set of 21 axioms
for Euclidean geometry and an analysis of their significance. This axiomatic
method that Hilbert used (for geometry, but its application and concept
is more general and can be used far beyond the domain of geometry, see
also [57, section 14.7]) will be treated in section 6.1. A substantial part of
Hilbert’s fame rests on a list of 23 mathematical problems he outlined in
1900, and posed as a challenge for the next century. Some of these problems
were related to the foundations of mathematics (see section 6.2). In 1905
Hilbert attempted to lay a firm foundation of mathematics by proving its
consistency, resulting in two volumes of ‘Grundlagen der Mathematik’ that
99
100 CHAPTER 6. HILBERT
were intended to lead to a proof theory. Despite that in 1931 Kurt G¨ odel
showed this goal to be unattainable (see chapter 8), the work Hilbert had
done on the foundations of mathematics nevertheless remained influential to
the development of logic. Hilbert’s work on integral equations in about 1909,
(see [45]) led to research in functional analysis and established the basis for
his work on infinite-dimensional space, later called Hilbert space (see [22,
page 232]). When Hilbert was made an honorary citizen of Königsberg in 1930, he
gave an address which ended with six famous words, showing his enthusiasm
for mathematics and optimism for solving mathematical problems: “There
are absolutely no unsolvable problems. Instead of the foolish ignorabimus
[Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen,
Wir werden wissen” [We must know, We shall know].
6.1 Hilbert’s proof theory
Hilbert formalized mathematical theories in order to turn them into well-
defined objects of discussion, thus making possible the new kind of
investigation to which he gave the name metamathematics. Hilbert was the
first to emphasize that strict formalization of a theory involves total
abstraction from meaning, the result being called a formal system or
formalism. In its structure, a formalized theory is no longer a system of
meaningful propositions but one of sentences, each a sequence of words, which
in turn are sequences of letters (a symbolic language). Hilbert’s method of
making the formal system as a whole the object of mathematical study is
called metamathematics or proof theory.
What is metamathematics? It is the study of mathematics itself (with
respect to formalized mathematical systems, metamathematics thus consists
of statements about the signs and formulas occurring within axiomatic
systems). One of the primary goals of metamathematics is to determine the
nature of mathematical reasoning.
After Hilbert presented an axiomatic development of geometry in ‘Grund-
lagen der Geometrie’ (1899), he devoted himself to the much greater task of
applying his new metamathematical method to pure mathematics as a whole.
Or, as Hilbert wrote in 1917: “Since the examination of the consistency is a
task that cannot be avoided, it appears necessary to axiomatize logic itself
and to prove that number theory and set theory are only parts of logic”.
Hilbert took a formal(istic) approach to achieve this logicist goal (logicism
takes logic as the basis of mathematics, whereas formalists attempt to
axiomatize mathematics completely; see also the philosophies in section 5.2).
To this end Hilbert identified three properties that an axiomatic system
should have: it should be decidable, complete and consistent. In order to
define these notions, we first have to make some other concepts precise.
Definition of an axiom:
A proposition that is regarded as true without proof
Definition of free variable:
A variable that is not bound within the scope of a quantifier
An axiom that does not contain any variables is also called an axiom
statement, an axiom with free variables is called an axiom scheme and each
free variable is to be quantified over all well-formed formulas.
Definition of statement (or sentence): A well-formed formula with no
free variables
Of the systems that Hilbert’s proof theory applies to, we here consider
those susceptible to Gödel’s incompleteness theorem (that will be presented
in chapter 8).
Definition of an STGA language: A language¹ L is Susceptible to
Gödel’s argument (STGA) if it consists of:
1 $\mathcal{E}$, a denumerable set of (well-formed) expressions (also called
formulas) of L
2 $\mathcal{S} \subseteq \mathcal{E}$, sentences of L (i.e. with no free variables)
3 $\mathcal{P} \subseteq \mathcal{S}$, provable sentences of L
4 $\mathcal{R} \subseteq \mathcal{S}$, refutable sentences of L
5 $\mathcal{H} \subseteq \mathcal{E}$, predicates of L (i.e. with free variables,
$\mathcal{H} \cap \mathcal{S} = \emptyset$). For convenience, we here assume
predicates to have exactly one variable.
6 A function $\varphi : \mathcal{E} \times \mathbb{N} \to \mathcal{E}$ that assigns
to every $E \in \mathcal{E}$ and $n \in \mathbb{N}$ an expression E(n), such that
for every $H \in \mathcal{H}$ taken for E and every $n \in \mathbb{N}$, H(n) is a
sentence ($H(n) \in \mathcal{E}$, hence $H(n) \in \mathcal{S}$).
We can think of such a function ϕ as a substitution function. Informally,
the sentence H(n) expresses the proposition that the number n
belongs to the set named by H.
The following set is the only one that depends on a semantic
interpretation of the expressions, and is normally determined by a
model that we accept as representing the truth. The model should
be distinguished from the set of derivation rules that (syntactically
or mechanically) determines whether sentences are provable or
refutable. It is important to realize that the truth of a sentence
is not the same as the provability of that sentence.
7 $\mathcal{T} \subseteq \mathcal{S}$, true sentences of L. This set can be
determined by a model (defined later in this section)
¹ Sometimes also called system, since it not only defines a language but also
includes the (dis)provability and truth of expressions.
First, we give an intuitive explanation of this definition: In most parts of
mathematics, not every sequence of symbols is meaningful or useful. Therefore
we only consider the so-called well-formed formulas $\mathcal{E}$. Some of these
formulas (also called propositions) do not contain free variables; we name
them sentences ($\mathcal{S}$). Some of them are provable from the axiomatic system
(i.e. they can be derived from the axioms and derivation rules of the axiomatic
system), and are elements of $\mathcal{P}$. Others are refutable, also called
disprovable (i.e. their negation can be derived from the axioms and derivation
rules of the axiomatic system) and are elements of $\mathcal{R}$. These notions only
depend on whether the sentence is derivable from the axiomatic system and are
independent of the truth of the sentence. We call the set of true sentences
$\mathcal{T}$ (the other sentences are false). Other formulas have free variables,
i.e. they are functions. We call them predicates ($\mathcal{H}$). We also assume
there exists a function ϕ that assigns to every expression $H \in \mathcal{H}$ and
natural number n a sentence H(n).
What is an Axiomatic System? An axiomatic system (sometimes also
called formal axiomatic system) is a logical system that gives rise to an STGA
language and has an explicitly stated finite set of axioms from which provable
sentences can be derived (using a finite set of derivation rules).
The set of axioms and derivation rules determines which sentences of L
are provable and which are not. The axiomatic system also contains a syntax
definition that determines the well-formedness of expressions of L. Normally,
the syntax definition of an axiomatic system consists of an alphabet of
symbols and a set of rules. We show that this notion of an axiomatic system
gives rise to a language that falls under the category of STGA languages.
Such an axiomatic system A is often defined as follows:
Definition of axiomatic system: An axiomatic system A consists of:
• An alphabet Σ, consisting of a finite number of constants (with their
arities) and variables.
• A recursive definition of a syntax, determining which formulas are well-
formed formulas.
• An initially determined and fixed set of axioms and derivation rules
(also called transformation rules or rules of inference).
The recursive definition over the given alphabet gives us the set of ex-
pressions. The variables enable us to form predicates. The set of axioms and
derivation rules let us prove or refute sentences. Ideally, we want the
provable sentences to coincide with the sentences we intuitively consider
true ($\mathcal{P} = \mathcal{T}$) and the refutable sentences to coincide with
those we consider false. We call a system with this property correct. We now
give an example
of a definition of a simple axiomatic system.
Example: axiomatic system $A_1$
• $\Sigma = \{\lor^2, \lnot^1, (^0, )^0, \forall^2, x^0, y^0, R_0^2, \mathit{true}^0, \mathit{false}^0\}$
The numbers that are written in superscript denote the arity of the
relations; a constant or variable is a 0-ary relation.
• ϕ is a well-formed formula if it
0. is one of the constants true and false.
1. is an atomic formula $R_i(x_1, \ldots, x_j)$, with $R_i$ a relation with arity
j, and $x_1, \ldots, x_j$ variables or constants.
2. has the form of $\varphi_1 \lor \varphi_2$, $\varphi_1 \land \varphi_2$,
$(\varphi_1)$, $\lnot\varphi_1$, $\forall x_i(\varphi_1)$, where $\varphi_1$ and
$\varphi_2$ are smaller formulas and $x_i$ is some variable from Σ.
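• For all variables x, variables or constants c and d and well-formed
formula ϕ, the derivation rules (premise above the line, conclusion below it):
$$\frac{R_0(c, d)}{\mathit{true}} \qquad \frac{\forall x(\varphi)}{\mathit{false}} \qquad \frac{\lnot\mathit{false}}{\mathit{true}} \qquad \frac{\lnot\mathit{true}}{\mathit{false}}$$
$$\frac{\mathit{true} \land \varphi}{\varphi} \qquad \frac{\mathit{false} \land \varphi}{\mathit{false}} \qquad \frac{\mathit{true} \lor \varphi}{\mathit{true}} \qquad \frac{\varphi \lor \mathit{true}}{\varphi}$$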
The STGA language L that can be constructed² on the basis of $A_1$,
denoted by $L_{A_1}$, consists of the following parts:
1. $\mathcal{E}$ is the set of usual mathematical expressions formed by the symbols
of the given alphabet (so $\mathcal{E}$ includes the binary relation $R_0$).
2. $\mathcal{S}$ is the set of those expressions without free variables (i.e.
propositions).
3. The provable sentences $\mathcal{P}$ are those that reduce to true by the
derivation rules. For example,
$\lnot\mathit{false} \land R_0(\mathit{false}, \mathit{true}) \to \mathit{true} \land R_0(\mathit{false}, \mathit{true}) \to \mathit{true} \land \mathit{true} \to \mathit{true}$.
4. The refutable sentences $\mathcal{R}$ are those that reduce to false by the
derivation rules. For example,
$\forall y(\mathit{false} \lor y) \land \mathit{true} \to \mathit{false} \land \mathit{true} \to \mathit{false}$.
5. The predicates $\mathcal{H}$ are those expressions with one free variable.
6. For each such predicate we can replace the free variable by a formula
that is represented³ by a natural number, and obtain a proposition.
7. The definition of an axiomatic system does not include a model. If we
think of the standard logic that is used in practice, we can see that for
all formulas except those with an ∀-symbol, the formulas are derivable
if and only if they are true.
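The two reductions quoted in items 3 and 4 can be replayed mechanically. The following is a minimal sketch, not part of the original definition: it assumes that formulas are encoded as nested Python tuples (the tags "not", "and", "or", "forall" and "R0" are hypothetical names), and it reads the two rules with premises false and true as the negation rules ¬false ⟹ true and ¬true ⟹ false. Subformulas are rewritten before the root, so the order of the steps may differ from the order shown in the text, while the end result is the same.

```python
# Formulas are nested tuples; constants and variables are plain strings.
# The tuple tags ("not", "and", "or", "forall", "R0") are hypothetical names.

def step(f):
    """Apply one derivation rule at the root of f; return None if none fits."""
    if isinstance(f, str):
        return None
    op = f[0]
    if op == "R0":                         # R0(c, d)        =>  true
        return "true"
    if op == "forall":                     # forall x (phi)  =>  false
        return "false"
    if op == "not" and f[1] == "false":    # not false  =>  true   (assumed reading)
        return "true"
    if op == "not" and f[1] == "true":     # not true   =>  false  (assumed reading)
        return "false"
    if op == "and" and f[1] == "true":     # true and phi   =>  phi
        return f[2]
    if op == "and" and f[1] == "false":    # false and phi  =>  false
        return "false"
    if op == "or" and f[1] == "true":      # true or phi    =>  true
        return "true"
    if op == "or" and f[2] == "true":      # phi or true    =>  phi  (as listed)
        return f[1]
    return None


def normalize(f):
    """Rewrite the subformulas first, then the root, until no rule applies."""
    if isinstance(f, str):
        return f
    f = (f[0],) + tuple(normalize(arg) for arg in f[1:])
    reduced = step(f)
    return f if reduced is None else normalize(reduced)


# not false  and  R0(false, true)
provable_example = ("and", ("not", "false"), ("R0", "false", "true"))
# forall y (false or y)  and  true
refutable_example = ("and", ("forall", "y", ("or", "false", "y")), "true")

print(normalize(provable_example))   # true
print(normalize(refutable_example))  # false
```

Under this reading, a sentence belongs to the provable set when `normalize` returns "true" and to the refutable set when it returns "false".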
We now introduce some concepts related to STGA languages and axiomatic
systems. We assume that A is an axiomatic system that gives rise to an
STGA language L.
Definition of derivable: A formula ϕ is derivable in L := $\varphi \in \mathcal{P}$.
A formula ϕ is derivable from an axiomatic system A, notation $A \vdash \varphi$ :=
there is an axiom $a_i$ of A and a sequence of formulas $\varphi_1, \ldots, \varphi_l$
such that $\varphi_1 = a_i$ and $\varphi_l = \varphi$ and each $\varphi_i$ follows
from the preceding formulas and the axioms of A by the derivation rules of A.
² Sometimes it is also said that an axiomatic system $A_1$ gives rise to a
language $L_A$.
³ An example of such a bijective function between a predicate and a set of
natural numbers will be given in section 8.2.
We call the sequence of formulas $\varphi_1, \ldots, \varphi_l$ in a derivation of
the statement ϕ a formal proof π of the statement ϕ. When $A \vdash \varphi$, we
also write ϕ ∈ A.
Example:
$A_1 \vdash \lnot\mathit{false} \land R_0(\mathit{false}, \mathit{true})$
$A_1 \nvdash \forall x)x$ (since the formula is not well-formed, i.e. it cannot
be built from the syntax definition)
$A_1 \nvdash \forall y(\mathit{false} \lor y) \land \mathit{true}$ (since it does
not follow from the derivation rules, i.e. is a refutable sentence)
Hilbert proposed a program to reformulate all mathematics as a formal
axiomatic theory, and this theory has to be proved to be consistent, i.e. free
from contradiction. The standard method that was used to prove the consis-
tency of axiomatic systems was to give a ‘model’. A model for an axiomatic
theory is simply a system of objects, chosen from some other theory and
satisfying the axioms.
This means we can relate axiomatic systems to existing systems by means
of a model, also called interpretation or structure. A model of a formal
axiomatic theory is a well-defined mathematical system with the particular
structure that is characterized by the theory.
Definition of universe: Set of values that variables of an axiomatic system
may take
Definition of a model: A universe together with an assignment of n-ary
relations to n-ary constants, and a corresponding assignment of the variables.
We define a model $\mathcal{M}$ for an axiomatic system A by:
$\mathcal{M} = (U, P_1, \ldots, P_k)$ with U a universe for A and
$P_1, \ldots, P_k$ the relations corresponding to symbols $R_1, \ldots, R_k$ of A.
If a formula ϕ is true in the model $\mathcal{M}$ (i.e. by interpretation of the
relation symbols by the corresponding relations), notation
$\mathcal{M} \models \varphi$, we say that $\mathcal{M}$ is a model of ϕ.
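Example: Let $\mathcal{M}_1 = (\mathbb{N}, \leq)$ be a model for axiomatic system $A_1$.
$\mathcal{M}_1 \models \forall x \forall y (x \leq y \lor y \leq x)$
$\mathcal{M}_1 \not\models \forall x \forall y (x \leq y \land y \leq x)$
Note that instead of using $R_1$ for the relation symbol, we immediately took
the interpretation ≤.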
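The two formulas of this example can be checked on a finite part of the model. The sketch below is only an illustration under a stated assumption: it replaces the universe N by the initial segment 0, ..., 49, so it can refute a universally quantified formula (by finding a counterexample) but it cannot prove one for all of N. The helper name holds_for_all_pairs is hypothetical.

```python
from itertools import product


def holds_for_all_pairs(universe, relation, connective):
    """Check 'for all x, y: connective(R(x, y), R(y, x))' over a finite universe."""
    return all(
        connective(relation(x, y), relation(y, x))
        for x, y in product(universe, repeat=2)
    )


# Assumption: a finite initial segment stands in for the universe N.
universe = range(50)


def leq(x, y):
    """Interpretation of the binary relation symbol as the usual order on N."""
    return x <= y


# forall x forall y (x <= y  or  y <= x): no counterexample in the segment.
disjunctive = holds_for_all_pairs(universe, leq, lambda a, b: a or b)

# forall x forall y (x <= y  and  y <= x): fails, e.g. for x = 0, y = 1.
conjunctive = holds_for_all_pairs(universe, leq, lambda a, b: a and b)

print(disjunctive)  # True
print(conjunctive)  # False
```

The counterexample x = 0, y = 1 already lies in N, so the conjunctive formula is false in the full model as well; the disjunctive formula holds in N because ≤ is a total order.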
A theory Th of a model $\mathcal{M}$, notation $Th(\mathcal{M})$, is the set of true
statements in the language of that model.
Definition of a theory: $Th(\mathcal{M}) := \{\varphi \mid \varphi \text{ is a statement and } \mathcal{M} \models \varphi\}$
So now we can say that Hilbert was looking for an axiomatic system for
which logic can be a model. Hilbert proposed such an axiomatic system to
have the properties of consistency, completeness and decidability. We will
now introduce these concepts, along with some other properties of axiomatic
systems. Since the properties of an axiomatic system A give rise to
corresponding properties in the language $L_A$, we here distinguish in each
definition between the property of a language and of an axiomatic system.
Definition of decidability:
A language L is decidable := $(\forall \varphi :: (\varphi \in \mathcal{P} \lor \varphi \in \mathcal{R}))$.
An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that
decides in a finite number of steps whether (or not) $A \vdash \varphi$) (see also
[49, page 270])
Definition of consistency:
A language L is consistent :=
$\lnot(\exists s : s \in \mathcal{S} : s \in \mathcal{P} \land s \in \mathcal{R})$, i.e.
$\mathcal{P} \cap \mathcal{R} = \emptyset$ or no sentence is both provable and
refutable in L.
An axiomatic system A is consistent :=
$\lnot(\exists \varphi :: A \vdash \varphi \land A \vdash \lnot\varphi)$ (i.e. it is
not possible for any formula ϕ to derive both ϕ and ¬ϕ) (see also [49, page
240])
A language L is inconsistent if it is not consistent. Clearly, L is inconsistent
if $\mathcal{P}$ and $\mathcal{R}$ are not disjoint. Note that consistency and
decidability do not refer to $\mathcal{T}$, but only concern $\mathcal{P}$ and
$\mathcal{R}$. The following definitions of completeness, soundness and
correctness also depend on the truth set $\mathcal{T}$ (and therefore on the
model that determines that truth set).
Definition of completeness:
A language L is complete for a model $\mathcal{M}$ :=
$(\forall \varphi :: \mathcal{M} \models \varphi \to \varphi \in \mathcal{P})$.
An axiomatic system A is complete for model $\mathcal{M}$ :=
$(\forall \varphi :: \mathcal{M} \models \varphi \to A \vdash \varphi)$ (i.e. all true
statements in the model are derivable/provable)
A language L is incomplete if it is not complete. Note that the statement
$(\forall \varphi :: \mathcal{M} \models \varphi \to A \vdash \varphi)$ is equivalent
with $(\forall \varphi :: A \nvdash \varphi \to \mathcal{M} \not\models \varphi)$, i.e.
all statements ϕ that are not derivable/provable are also not true in the model.
Definition of soundness:
A language L is sound for a model $\mathcal{M}$ :=
$(\forall \varphi :: \varphi \in \mathcal{P} \to \mathcal{M} \models \varphi)$.
An axiomatic system A is a sound axiomatization for a model $\mathcal{M}$ :=
$(\forall \varphi :: A \vdash \varphi \to \mathcal{M} \models \varphi)$ (i.e. if a
statement ϕ is derivable/provable, it is true in the model)
Definition of correctness:
A language L is correct for a model $\mathcal{M}$ :=
$\mathcal{P} \subseteq \mathcal{T} \land \mathcal{R} \cap \mathcal{T} = \emptyset$
(i.e. every provable sentence is true and every refutable sentence is false
(not true)).
An axiomatic system A is correct for a model $\mathcal{M}$ := A is sound for
$\mathcal{M}$ and A is complete for $\mathcal{M}$
Theorem: If L is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and
consistency because if $\mathcal{P}$ is a subset of $\mathcal{T}$ and $\mathcal{T}$
is disjoint from $\mathcal{R}$, then $\mathcal{P}$ must be disjoint from
$\mathcal{R}$.
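The theorem can also be checked exhaustively on a small finite example. The sketch below is an illustration only, under the assumption that the sets of provable, refutable and true sentences are modelled as Python frozensets over three hypothetical sentences s1, s2, s3; the function names are not from the text. It enumerates every choice of the three sets and confirms that no correct choice is inconsistent.

```python
from itertools import combinations


def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]


def is_consistent(provable, refutable):
    """No sentence is both provable and refutable."""
    return not (provable & refutable)


def is_correct(provable, refutable, true_sentences):
    """Every provable sentence is true and no refutable sentence is true."""
    return provable <= true_sentences and not (refutable & true_sentences)


# Hypothetical finite set of sentences, used only for the exhaustive check.
sentences = frozenset({"s1", "s2", "s3"})
candidates = subsets(sentences)

# Correctness implies consistency for every choice of the three sets.
theorem_holds = all(
    is_consistent(p, r)
    for p in candidates
    for r in candidates
    for t in candidates
    if is_correct(p, r, t)
)

print(theorem_holds)  # True
```

The converse does not hold: with no true sentences at all, a language that proves s1 and refutes nothing is consistent but not correct.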
6.2 Hilbert’s 23 problems
Who of us would not be glad to lift the veil behind which the future
lies hidden: to cast a glance at the next level of our science and
at the secrets of its development during future centuries? What
particular goals will there be toward which the leading mathema-
tical spirits of coming generations will strive? What new methods
and new facts in the wide and rich field of mathematical thought
will the next centuries disclose?
- D. Hilbert, in the opening of his speech to the 1900 Congress
in Paris
In 1900 Hilbert outlined his list of 23 mathematical problems to the
International Congress of Mathematicians in Paris, and urged them upon the
attention of his contemporaries. His famous address still influences and
stimulates mathematical research all over the world.
It was not only a collection of problems, but it was also his philosophy of
mathematics (see also the formalist viewpoint in section 5.2) and a collec-
tion of problems important to that philosophy. Many of the problems have
since been solved, and each solution was a noted event (or even a mathema-
tical breakthrough). Some of these problems however remain unsolved till
this day. In 2000, in the footsteps of Hilbert, the Clay Mathematics Insti-
tute (see http://zax.mine.nu/interests/questions/clay.htm) has made a new
list of 7 (for a large part mathematical) problems to be solved in this century.
Among those problems is one of the original problems (number 8) of
Hilbert. It requires a solution to the Riemann hypothesis, which is usually
considered to be the most important unsolved problem in mathematics. We
mention some of the original problems that are related to the foundations
of mathematics. For a complete source of information on the 23 (or 25?,
see [32]) original publications of Hilbert, see the articles [41] and [40], also
available online [42].
• Problem 1: Cantor’s problem of the cardinal number of the continuum.
This problem is also known as the Continuum Hypothesis and exten-
sively covered in section 3.7.
• Problem 2: The consistency of the axioms of arithmetic. The question
is whether it can be shown that the axioms on which arithmetic is based
are consistent. Gödel later showed that no consistent formal system that
contains arithmetic (see chapter 8) can prove its own consistency. A
metamathematical argument that cannot be expressed in the system might
nevertheless exist that proves its consistency.
• Problem 6: Mathematical treatment of the axioms of physics, asks to
treat in the same manner, by means of axioms, those physical sciences
in which mathematics plays an important part; in the first rank are the
theory of probabilities and mechanics. So far no complete axiomatiza-
tion of physics has been found.
• Problem 9: Proof of the most general law of reciprocity in algebraic
number theory. For any field of numbers, the law of reciprocity (for
more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html) is to be
proved for the residues of the l-th power,
when l denotes a prime, and further when l is a power of 2 or a power
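of an odd prime. This problem has been solved only in part: Artin’s
reciprocity law settles the abelian case, while the non-abelian case is
still open.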
• Problem 10: Decidability of solvability of diophantine equations. This
question asks if, ‘given a diophantine equation with any number of un-
known quantities and with rational integral numerical coefficients, to
devise a process according to which it can be determined by a finite
number of operations whether the equation is solvable in rational inte-
gers’. In modern terminology the problem asks to devise an algorithm
that tests whether a polynomial has an integral root. A root of a poly-
nomial is an assignment of values to its variables so that the value of the
polynomial is 0. A root is an integral root if all variables are assigned
integer values. Some polynomials have an integral root (for example
$6x^3yz^2 + 3xy^2 - x^3 - 10$ has an integral root at x = 5, y = 3 and z = 0)
and some do not.
Hilbert did not use the term algorithm but rather ‘a process according
to which it can be determined by a finite number of operations’. In
order to solve this problem this notion had to be made more precise
(this was done by Turing, see section 9.1). Also, Hilbert asked that an
algorithm be devised. Thus he apparently assumed that such an algorithm
exists, but we now know that this problem is algorithmically unsolvable.
In 1970, the young Russian Yuri Matiyasevich, building on the
work of Martin Davis, Hilary Putnam and Julia Robinson, showed that
no algorithm exists for testing whether a polynomial has integral roots.
• Problem 23: Further development of the methods of the calculus of
variations. Of the 23 problems Hilbert posed, this one is the least defi-
nite, since it involves the general question of extending the calculus of
variations, which basically is the theory of the variation of functions.
With some examples that we will not treat here, Hilbert gave a
justification of the necessity for an extension of the differential and
integral calculus (for more references see
http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).
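The integral-root question of Problem 10 above can be made concrete with a small sketch. It evaluates the example polynomial at the root given in the text and runs a bounded brute-force search. The helper name find_integral_root and the search bound are illustrative assumptions. Such a search can confirm that a root exists, but when it finds nothing it shows only that there is no root within the bound, which is in line with the unsolvability result mentioned above.

```python
from itertools import product


def example_polynomial(x, y, z):
    """The polynomial 6x^3yz^2 + 3xy^2 - x^3 - 10 from the text."""
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10


def find_integral_root(polynomial, variable_count, bound):
    """Search all integer assignments with |value| <= bound for a root.

    Returns the first root found, or None if there is none within the bound.
    A result of None says nothing about roots outside the bound.
    """
    values = range(-bound, bound + 1)
    for candidate in product(values, repeat=variable_count):
        if polynomial(*candidate) == 0:
            return candidate
    return None


# The root stated in the text: 0 + 135 - 125 - 10 = 0.
print(example_polynomial(5, 3, 0))  # 0

# A bounded search finds some root of the example (not necessarily (5, 3, 0)).
root = find_integral_root(example_polynomial, 3, 5)
print(root is not None and example_polynomial(*root) == 0)  # True

# x^2 + 1 has no integral root; the search can only report "none up to 100".
print(find_integral_root(lambda x: x * x + 1, 1, 100))  # None
```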
At the end of his article, Hilbert says that he does not believe mathema-
tics will, like other sciences, split into separate branches whose connection
becomes ever more loose, but that the organic unity of mathematics is in-
herent in the nature of this science, for mathematics is the foundation of all
exact knowledge of natural phenomena. For a more detailed assessment of
Hilbert’s view, see [49, section 12.4] and [31, section 4.7].
Chapter 7
Types
7.1 Russell and Whitehead’s Principia Mathematica
Logic has become more mathematical and mathematics has be-
come more logical. The consequence is that it has now become
wholly impossible to draw a line between the two; in fact, the two
are one. They differ as boy and man; logic is the youth of ma-
thematics and mathematics is the manhood of logic.
- B. Russell in [79, page 194]
In section 4.1 we saw that with the postulates he presented, Peano stated
and organized the fundamental laws of number theory, the core of
mathematics. If statements satisfying these postulates could be derived in
pure logic, it would show that (at least part of) mathematics is founded in
logic. As we have seen in section 4.2, Frege adhered to the goal of logicism,
namely that all of mathematics could be derived from logic alone. But
unfortunately the
language that he created was inconsistent, as we have learned from Russell’s
paradox in section 5.1. In his 1908 paper, ‘Mathematical Logic as Based on
the Theory of Types’, Russell laid out a theory to eliminate the paradoxes.
With Principia Mathematica, Bertrand Russell and his teacher, the mathe-
matician Alfred Whitehead, presented this theory to prevent the paradoxes
while at the same time allowing many of the operations Frege considered de-
sirable. The theory of types basically says that all sets and other entities have
a logical ‘type’, these types can be ordered and sets are always constructed
from specified members with lower types. We will look at the theory of types
in more detail in section 7.2.
Principia Mathematica (sometimes also called ‘the Principia’) consisted of
three volumes and was named after the ‘Philosophiae naturalis principia
mathematica’ of the English physicist Isaac Newton. But unlike Newton’s book
it dealt not with the application of mathematical techniques to physics, but
to logic and mathematics itself. With their mathematical treatment of the
principles of the mathematicians, Russell and Whitehead intended to summarize
the recent work in logic as well as to give a revolutionary and systematical
development of mathematical logic and derive basic mathematical principles
from the principles of logic alone.
Their collaboration began in 1903 when Whitehead and Russell were both
in the initial stages of preparing second volumes to earlier books on related
topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s
1903 ‘The Principles of Mathematics’. Their work overlapped considerably
and they began collaborating on what would become ‘Principia Mathema-
tica’. The approach of Russell and Whitehead was essentially that of Frege,
to define mathematical entities (like numbers) in pure logic and then derive
their fundamental properties. Indeed, their definition of natural numbers was
basically the same as the one of Frege, but unlike him, they opted to avoid
the philosophical aspects and justifications. Although ‘Principia’ was largely
successful, the axiom of infinity and the axiom of reducibility were still
criticized: they were considered too ad hoc to be justified philosophically.
In 1919 Russell published the philosophy behind his work in his
‘Introduction to Mathematical Philosophy’, which was accessible to a broad
audience and therefore has been the main source through which Russell’s
logicist view of mathematics has become known.
I quote the following assessment about Principia Mathematica from [91]:
“In addition to its notation (much of it borrowed from Peano), its mas-
terful development of logical systems for propositional and predicate logic,
and its overcoming of difficulties that had beset earlier logical theories and
logistic conceptions, the Principia offered discussions of functions, definite
descriptions, truth, and logical laws that had a deep influence on discus-
sions in analytical philosophy and logic throughout the 20th century. What
is perhaps missing is any hesitation or perplexity about the limits of logic:
whether this logic is, for example, provably consistent, complete, or decida-
ble, or whether there are concepts expressible in natural languages but not
in this logical notation. This is somewhat odd, given the well-known list of
problems posed by Hilbert in 1900 that came to animate 20th-century logic,
especially German logic. The Principia is a work of confidence and mastery
and not of open problems and possible difficulties and shortcomings; it is a
work closer to the naive progressive elements of the Jahrhundertwende than
to the agonizing fin de siecle.” We would like to add that with their very
formal and accurate build-up of mathematics, Russell and Whitehead not only
managed to avoid the paradoxes but also created one of the most impressive
and complicated works of all time, one that is considered, next to
Aristotle’s Organon, to be the most influential book on logic ever written.
In the next section we will further investigate Russell’s theory of types.
The English mathematician Frank Plumpton Ramsey (1903-1930) offered
criticism to the theory of types that was accommodated in later editions of
Principia Mathematica. The result of this is the ‘deramified theory of types’
that will be treated in subsequent sections, together with a later simplification
to this theory by the mathematicians Hilbert and Wilhelm Ackermann (1896-
1962) from Germany.
The mathematician Alonzo Church also published articles on type systems,
but did not develop his typed version of lambda calculus before the 1940’s,
and his typed lambda calculus thereby falls outside the scope of this article
(1870-1940). We will only summarize his work in this paragraph. The main
difference between the type structure of Russell and that of Church is that
the former is set-based with linear ordering of types and the latter is function
based with a non-linear order of types. The type theory that emerged from
Church’s lambda calculus (see section 9.2) was extended with simple types
in 1940 to prevent paradoxes, similar to the extension of logical set theory
with simple types by Russell in 1910 to avoid the paradoxes. Church also
proposed another logical set theory in 1974.
[..] in the simple theory of types it is well known that the indi-
viduals may be dispensed with if classes and relations of all types
are retained; or one may abandon also classes and relations of the
lowest type, retaining only those of higher type. In fact any finite
number of levels at the bottom of the hierarchy of types may be
deleted. But this is no reduction in the variety of entities, because
the truncated theory of types, by appropriate deletions of entities
in each type, can be made isomorphic to the original hierarchy -
and indeed the continued adequacy of the truncated hierarchy to
the original purposes depends on this isomorphism.
- A. Church in ‘The need for abstract entities’.
Organization of Principia Mathematica
The nearly 2,000-page Principia Mathematica starts with a short preface
that explains what it wants to demonstrate, namely that pure mathematics
can be based on logic alone and requires no other primitive notions. Russell
classifies statements that involve logical constants only (such as the laws of
reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and
other mathematical assertions that also refer to non-logical contents (such as
the statement that (perceptual) space is three-dimensional) as part of applied
mathematics. The belief was then expressed that pure mathematics was suf-
ficient to include all traditional mathematics. Then, after an introduction,
the first volume introduces a symbolic logic that is based on a small set
of axioms, and then lays out the propositional and predicate calculi. Built
upon these, Whitehead and Russell define types, sets, relations and their
properties, and basic operations on sets. The second volume continues with
a purely logical theory of cardinal and ordinal arithmetic. This allowed them
to introduce basic arithmetic, including addition, multiplication and expo-
nentiation of both finite cardinals and of relations.
The volume ends with a general theory of simply ordered sets (series), which
is followed by a logical basis for fundamental mathematical analysis,
including subjects such as convergent sequences, continuity, limits and
derivatives.
The third volume was meant to prepare the ground for the fourth and con-
cluding volume on geometry (which was never completed), and contained a
theory of numbers that was called ‘measurement’. It starts with a theory of
well-ordered sets, finite, infinite and continuous series, the negative integers,
ratios and the real numbers, and finally vectors, coordinates and basic geo-
metric notions such as angles.
More details about the organization of Principia Mathematica and a critical
assessment of its work can be found in [31, chapter 7, and specifically section
7.8].
The symbolic logic and notation of Principia Mathematica
Russell and Whitehead opted for a more modern notation of Peano in-
stead of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated
functions as first-class citizens. A good introduction to the logical calculus
and the specific notation that was used in Principia Mathematica can be
found in [49, section 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and 7.8].
Russell’s theory of types
Russell’s 1908 paper included a categorization of most of the important
contradictions of that time, and an analysis of their common characteristics.
To prevent the paradoxes he catalogued, Russell formulated the vicious circle
principle (see page 85) and implemented it using types in Principia Mathe-
matica (see for details [31, section 7.9] and [49, section 3.2 and 3.3]).
What is a type?
A type is the range of significance of a propositional function, that is, the
collection of arguments for which the said function is significant and has val-
ues.
The type of a variable in a proposition is fixed by all the values the func-
tion is concerned with, i.e. by the totality over which the variable ranges.
This division of objects into types (the type of an object can be seen as a
property of that object) is necessary to conform to the vicious circle principle,
i.e. to make sure that ‘whatever contains an apparent variable must not be
a possible value of that variable’. This can be established by making sure
that whatever contains an apparent variable is of a different and higher
type than the possible values of that variable. This linear order of types
prevents vicious circles, since the variables contained in an object
determine the type of that object.
Russell then defined an individual as being not a proposition but a con-
stant, destitute of complexity. We can now categorize propositions by their
types. First order propositions are elementary propositions that only con-
tain individuals, second order propositions are propositions with first-order
propositions as variables and possibly propositions of lower than first order
types. This can be continued, such that the (n + 1)-th order propositions
contain propositions of order n and possibly others of order smaller than n.
We now also restrict relations like ∈ so that x ∈ y is only significant when
y is of a type one level higher than x, and we confine quantifiers always
to a single level. As can be proved however, this way of restricting propo-
sitions prevents the paradoxes but can in some cases be needlessly restrictive.
For more information about types in Principia Mathematica, see [31,
section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of
Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. In
turn, this reference is partly based on [52], [53], [54] and [43], all of
which discuss RTT in a certain context.
A detailed introduction to the (symbolic) logic and notation of Principia
Mathematica, as well as a formal introduction to RTT, STT, NF and
ML (see section 7.3), is to be included in a later version of this report.
7.2 Ramsey, Hilbert and Ackermann
Suppose a contradiction were to be found in the axioms of set
theory. Do you seriously believe that a bridge would fall down?
- F.P. Ramsey, quoted in [58]
Ramsey published his first major work ‘The Foundations of Mathematics’
(see [69, page 105-142]) in 1925. In this publication he attempted to improve
Principia Mathematica in two ways. First he proposed dropping the axiom
of reducibility which, he writes, is “[...] certainly not self-evident and there
is no reason to suppose it true; and if it were true, this would be a happy
accident and not a logical necessity, for it is not a tautology.” Second, he
proposed simplifying Russell’s theory of types by regarding
certain semantic paradoxes as linguistic. He accepted Russell’s solution to
remove the logical paradoxes of set theory arising from, for example, ‘the
set of all sets which are not members of themselves’. However, the seman-
tic paradoxes such as ‘this is a lie’ are, Ramsey claims, quite different and
depend on the meaning of the word ‘lie’. These he removed with his reinter-
pretation of the axiom of reducibility.
After his suggestions, Russell’s theory became known as the ramified theory
of types (RTT), and Ramsey’s modification of the theory as the deramified
theory of types.
For more detailed information about the history of deramification, we refer
to [86, chapter 4].
Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of
types by removing the orders into what has become known as the ‘simple
theory of types’ (STT). We quote from page 115 of [49]: “[In the simple
theory of types,] every individual or individual variable is said to be of type
i; and if a predicate or predicate variable ϕ(x₁, . . . , xₙ) has arguments
x₁, . . . , xₙ of types τ₁, . . . , τₙ respectively, then ϕ(x₁, . . . , xₙ) is said to be of
type (τ₁, . . . , τₙ). Thus, for example, any predicate with two individual ar-
guments is of type (i, i), while a predicate with a single argument that is
itself a predicate with two individual arguments is of type ((i, i)). Having
introduced the hierarchy of types in this way, we shall now require bound
variables to be of some definite type. Every quantifier will then range over
the totality of all entities of the same type as the bound variable. When
this is done, we have a very comprehensive logical calculus which is secure
against vicious circularity”.
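The type assignment described in this quotation can be sketched in a few lines of Python. This is only an illustration under our own encoding assumptions: a type is represented either by the string "i" or by a tuple of argument types, and the names INDIVIDUAL, predicate_type, show and applicable are hypothetical and do not occur in [49]. Under the quoted definition, a predicate with a single argument of type (i, i) receives the type ((i, i)).

```python
# Sketch of the simple theory of types: a type is either the string "i"
# (individuals) or a tuple holding the types of a predicate's arguments.
# The encoding and all names are illustrative assumptions.

INDIVIDUAL = "i"


def predicate_type(*argument_types):
    """Type of a predicate whose arguments have the given types."""
    return tuple(argument_types)


def show(t):
    """Render a type in the notation of the quotation, e.g. (i, i)."""
    if t == INDIVIDUAL:
        return "i"
    return "(" + ", ".join(show(a) for a in t) + ")"


def applicable(pred_type, *argument_types):
    """A predicate may only be applied to arguments of exactly its argument types."""
    return pred_type != INDIVIDUAL and pred_type == tuple(argument_types)


binary = predicate_type(INDIVIDUAL, INDIVIDUAL)  # two individual arguments
higher = predicate_type(binary)                  # one argument of type (i, i)

print(show(binary))                # (i, i)
print(show(higher))                # ((i, i))
print(applicable(higher, binary))  # True
print(applicable(binary, binary))  # False: wrong number and type of arguments
```

Because every type is built from strictly simpler types, a predicate can never be applied to an argument of its own type, which is how this encoding mirrors the security against vicious circularity claimed in the quotation.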
A further discussion and formalization (in the form of Church’s simply
typed lambda calculus λ → c) of the simple theory of types can be found in
[86].
7.3 Quine
Just as the introduction of the irrational numbers . . . is a conve-
nient myth [which] simplifies the laws of arithmetic . . . so physical
objects are postulated entities which round out and simplify our
account of the flux of existence . . . The conceptional scheme of
physical objects is [likewise] a convenient myth, simpler than the
literal truth and yet containing that literal truth as a scattered part
- Quine, quoted in [50]
Willard Van Orman Quine (1908-2000) was an American philosopher and logician
who became interested in the work of Russell. An alternative to Russell’s sys-
tem is one that allows a single universe of all types (or all sets). In Russell’s
theory such an object is too big but according to others, including Quine,
having a set of all sets or a type of all types is legitimate as long as we do not
permit forming all subsets. If there is some restriction on which subsets can
be formed, for example by requiring a stratified predicate to define the sub-
set, then no contradiction will result. Quine proposed in [94, pages 80-101]
a system called New Foundations, NF, based on this idea. To restrict the
way subsets are formed, Quine further restricted the comprehension axiom to:
NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not
free in ϕ(y) and ϕ(y) is stratified
In [86, footnote 4], we find two definitions of stratification.
Definition of heterogeneous stratification: A well-formed formula ϕ
is heterogeneously stratified := there is a function f from the variables and
constants of ϕ to the natural numbers such that for each atomic well-formed
formula F(x₁, . . . , xₙ) of ϕ, f(F) = 1 + (max i : 1 ≤ i ≤ n : f(xᵢ))
Definition of homogeneous stratification: A well-formed formula ϕ is
homogeneously stratified := ϕ is heterogeneously stratified and for the corre-
sponding function f we also have that f(xᵢ) = f(xⱼ) for 1 ≤ i, j ≤ n
With the NFC axiom Russell’s paradox is prevented, since the sen-
tence ϕ ≡ x ∉ x is not stratified.
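A small Python sketch can make the stratification test concrete. It covers only a simplified special case that we assume here for illustration: all atomic formulas are membership statements x ∈ y, and a level function f is required to satisfy f(y) = f(x) + 1 for each of them. This is not the formulation of [86], and the function name stratification is hypothetical.

```python
from collections import deque


def stratification(atoms):
    """Try to assign a level to every variable so that each atom
    x ∈ y satisfies level(y) = level(x) + 1.

    `atoms` is a list of pairs (x, y), each read as 'x ∈ y'.
    Returns a dict of levels, or None if no such assignment exists.
    """
    # Each atom x ∈ y yields two constraints: y = x + 1 and x = y - 1.
    neighbours = {}
    for x, y in atoms:
        neighbours.setdefault(x, []).append((y, 1))
        neighbours.setdefault(y, []).append((x, -1))

    level = {}
    for start in neighbours:
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, delta in neighbours[v]:
                if w not in level:
                    level[w] = level[v] + delta
                    queue.append(w)
                elif level[w] != level[v] + delta:
                    return None  # conflicting requirements on w
    # Shift all levels so that the lowest one is 0 (natural numbers only).
    lowest = min(level.values(), default=0)
    return {v: n - lowest for v, n in level.items()}


print(stratification([("x", "y"), ("y", "z")]))  # {'x': 0, 'y': 1, 'z': 2}
print(stratification([("x", "x")]))              # None
```

The atom x ∈ x would require f(x) = f(x) + 1, so no level function exists, which is why the defining condition of Russell's set is rejected.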
We quote from [86, page 3]: “NF is weak for mathematical induction and
the axiom of choice is not compatible with NF. We cannot prove Peano’s
axiom[s] in it, unless we assume the existence of a class with m + 1 ele-
ments. Also, NF is said to lack motivation because its axiom of compre-
hension is justified only on technical grounds and one’s mental image of set
theory does not lead to such an axiom. To overcome some of the difficulties,
Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see section
8.5) set theory[, and developed another non-iterative set theory called
ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains
a bifurcation of classes into elements and non-elements. Sets can enjoy the
property of being full objects whereas classes cannot. ML was obtained from
NF by replacing (NFC) by two axioms, one for class existence and one for
elementhood. The rule of class existence provides [. . . ] the existence of the
classes of all elements satisfying any condition ϕ, stratified or not. The rule
of elementhood is such as to provide the elementhood of just those classes
which exist for NF. Therefore, the two axioms of comprehension for ML [are]:
Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified
with set variables only in which y does not occur free.
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x)
is any formula in which y does not occur free.
ML was liked both for the manipulative convenience we regain in it and
the symmetrical universe it furnishes. It was however proved subject to the
Burali-Forti paradox”.
For more information, we refer to [70], [71], [72] and the website
http://diamond.boisestate.edu/~holmes/holmes/nf.html.
Chapter 8
Gödel
The development of mathematics towards greater precision has
led, as is well known, to the formalization of large tracts of it,
so that one can prove any theorem using nothing but a few me-
chanical rules. [. . .] It will be shown below that this is not the
case, that on the contrary there are in the two systems mentioned
[viz. Principia Mathematica and ZF] relatively simple problems
in the theory of integers that cannot be decided on the basis of the
axioms.
- K. Gödel, in the opening of the paper introducing the incom-
pleteness theorem (1931)
8.1 Informally: Gödel’s incompleteness theorems
No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see
section 4.1) can be defined can be both consistent and complete. At the
time this seemed unreal, but in 1931 Kurt Gödel (born in 1906 in Brünn,
Austria-Hungary, now Brno, Czech Republic) presented mathema-
ticians with the astounding and melancholy conclusion that the axiomatic
method has certain limitations, which rule out the possibility that even
ordinary arithmetic (as axiomatized by Peano) can ever be fully axiomatized.
As a corollary of this theorem, he proved that it is impossible to establish the
internal logical consistency of a very large class of deductive systems. It
provoked a reappraisal of philosophies of mathematics.
Gödel’s famous incompleteness theorem and the corresponding corollary
are also called the first and the second incompleteness theorem. Gödel was
able to show that, if an axiomatic system of formalized arithmetic is wide
enough, then
1. If the system is consistent, it is necessarily incomplete, in the sense that
there exists a formula ϕ of the system such that neither ϕ nor its negation
is derivable (see also section 8.2 for the definition of incompleteness), and
2. If the system is consistent, then no proof of its consistency is possible
which can be formalized within it (see also section 8.2 for the definition
of consistency).
We first indicate (in 8 steps, following the lines of the original proof of
Gödel) the main lines of both theorems in this section, and provide a more
rigorous and exact proof of the theorems in section 8.2 and further sections.
1 The (syntax of) formulas of an axiomatic system are precisely defined
and built up from a finite alphabet of symbols. Proofs are noth-
ing but finite series of formulas and can be replaced by numbers.
With such a representation, the Gödel numbering, Gödel gave a well-
ordering of all well-formed formulae of an axiomatic system S (to be
precise, of ω-consistent systems, see section 8.2 for more details). Gödel
then showed how to represent metamathematical concepts such as ‘formula’,
‘proof-schema’ and ‘provable formula’ by a series of natural numbers.
We define gn(ϕ) to be the Gödel number corresponding to the well-formed
formula ϕ of S.
2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable for-
mula. Precisely, we define prov(ϕ) := ‘ϕ is a provable formula’. A class
sign is a formula with just one free variable. We suppose that the class
signs are ordered by a function R with domain N, such that R(n) is
defined as the nth class sign. By [R(n); q] we denote the formula which
is obtained by replacing the free variable in R(n) by q.
3 We now define a set K of natural numbers by n ∈ K ↔
¬prov([R(n); n]). Since the symbols that are used in this formula are
all definable in S, there also is a formula with one free variable (i.e. a
class sign) that denotes n ∈ K, for some natural number n. We call
this class sign C. So there is a natural number q such that C = R(q).
We now show that the proposition G ≡ [R(q); q] is unprovable in S.
Since¹ this formula says that q ∈ K, that is ¬prov([R(q); q]), we can
say that G is a proposition that asserts of itself that it is not provable.
4 We show that G is provable ↔ ¬G is provable, and hence that G is undecid-
able:
• Suppose G is provable. This means [R(q); q] is provable, that is (by replac-
ing the variable in the class sign C by q) q ∈ K, i.e. ¬prov([R(q); q]),
and this says ¬prov(G): G is not provable.
• Suppose ¬G is provable. This means the negation ¬[R(q); q] is
provable, that is (by replacing the variable in the class sign C by q)
q ∉ K, i.e. ¬¬prov([R(q); q]), and this is equivalent with
prov([R(q); q]) or prov(G): G is provable.
A proof of G leads to a proof of ¬G and vice versa, in which case the system S
is inconsistent. So if we assume that S is consistent, then both G and
¬G must not be provable: G is undecidable in S.
5 By a metamathematical consideration we know however that G is true.
Because from the remark that G asserts its own unprovability, it follows
at once that G is true, since G is unprovable (because undecidable).
So there is a true statement in S (namely G) that is not provable: the
system S is incomplete!
6 If we add G as an axiom, we can again apply the argument given
in the previous five steps in the same way. Basically we then create
another formula G′, since in step 3 a proposition is defined that states
‘this formula is not provable’, or in other words ‘this formula does not
follow from the axioms’. That means, the proposition depends on the
set of axioms. Therefore, as I. Grattan-Guinness cleverly calls it in [31,
page 510], the system S is ‘essentially incompleteable’.
7 Gödel then showed that ‘if arithmetic is consistent, it is incomplete’.
We want to prove this conditional statement as a whole. We define
the condition of the statement by A: ‘arithmetic is consistent’. We
already have seen in section 6.1 that this means that there is at least
one formula ϕ of arithmetic that is not provable. So we can express A ≡
(∃y :: (∀x :: ¬(x is a proof of y))). A system is incomplete if
there is a true statement that is not provable. Thus we can represent
the conclusion of the conditional statement by G.
8 We can now formally prove A → G (see section 8.2 for the proof). This
means that if A is provable, we know (by modus ponens or the rule of
detachment) that G is provable. But we already saw that (unless S
is inconsistent), G is not provable; thus if S is consistent, A is not
provable! That means that if arithmetic is consistent, its consistency cannot
be established by metamathematical reasoning within the formalism
of arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as
expressed in [31, page 510], ‘any set S of consistent formulae of PM
cannot include the formula F asserting its consistency’.
¹ By replacing in the class sign C, which expresses that n ∈ K for some natural number
n, the free variable by q.
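The idea of step 1, replacing a formula by a single number, can be sketched in Python using the familiar prime-power encoding. The alphabet SYMBOLS and its codes below are hypothetical choices made only for this illustration and are not Gödel's own assignment; the function is called gn after the notation of step 1.

```python
# Toy Gödel numbering: the k-th symbol of a formula with code c contributes
# the factor (k-th prime) ** c. Alphabet and codes are assumptions.

SYMBOLS = ["0", "s", "+", "*", "=", "(", ")", "x"]
CODE = {symbol: index + 1 for index, symbol in enumerate(SYMBOLS)}


def primes():
    """Generate 2, 3, 5, 7, ... by trial division (fine for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def gn(formula):
    """Gödel number of a string over SYMBOLS."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** CODE[symbol]
    return number


def decode(number):
    """Recover the formula from its Gödel number by factorization."""
    if number < 1:
        raise ValueError("Gödel numbers are positive")
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if exponent == 0 or exponent > len(SYMBOLS):
            raise ValueError("not the Gödel number of a formula")
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)


print(gn("0=0"))     # 2430, i.e. 2**1 * 3**5 * 5**1
print(decode(2430))  # 0=0
```

Because prime factorization is unique, distinct formulas receive distinct numbers, so statements about formulas can be rephrased as statements about natural numbers. Unlike the abstract treatment of section 8.2.1, this toy numbering does not make every natural number the Gödel number of some expression.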
8.2 Formally: Gödel’s Incompleteness Theorems
The first incompleteness theorem says that Principia Mathematica or any
other system in which arithmetic can be developed, is essentially incomplete,
that is in any consistent set of arithmetical axioms there are statements that
are true but cannot be derived from the set.
The second theorem says that it is impossible to give a metamathemat-
ical proof of the consistency of a system comprehensive enough to contain
the whole of arithmetic - unless the proof itself employs rules of inference
in certain essential respects different from the derivation rules identifying
theorems within the systems.
In the following two paragraphs, we will first give an abstract version of
Gödel’s first and second incompleteness theorem, investigate the set of lan-
guages that the theorem applies to, and then in the third paragraph fill in
the details by giving a specific Gödel numbering for arithmetic. Then in the
next sections we will apply the theorem to the system of Peano Arithmetic
and that of Principia Mathematica, and discuss the consequences of the in-
completeness theorem.
8.2.1 On formally undecidable propositions
We assume there is an STGA language L and investigate the conditions under
which Gödel showed that there is a true sentence of L that is not
provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts:
A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A)
A is expressible in L if A is expressed by some predicate of L. Note that
expressibility in L only concerns T and not P and R.
Theorem: Not every set of numbers is expressible.
Proof: (from [84]) Since L is built up of a finite number of symbols and
derivation rules, there are only denumerably many expressions or predicates
of L. But (by Cantor’s theorem, see page 69) there are non-denumerably
many sets of natural numbers. Therefore, not every set of numbers is ex-
pressible in L.
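The appeal to Cantor's theorem in this proof rests on a diagonal construction, which the following Python sketch illustrates. Sets of natural numbers are represented by membership tests; the enumeration multiples is a hypothetical example of a countable list of sets, and the names diagonal and multiples are our own.

```python
def diagonal(enumeration):
    """Given enumeration(n) = membership test of the n-th set, return the
    membership test of a set that differs from the n-th set at the number n."""
    return lambda n: not enumeration(n)(n)


# Hypothetical enumeration: the n-th set is the set of multiples of n + 1.
def multiples(n):
    return lambda k: k % (n + 1) == 0


d = diagonal(multiples)

# d disagrees with every enumerated set on the diagonal,
# so d cannot occur anywhere in the enumeration.
print(all(d(n) != multiples(n)(n) for n in range(1000)))  # True
```

The same construction applies to any enumeration whatsoever, and in particular to the denumerable list of sets expressed by the predicates of L: the diagonal set is a set of numbers that no predicate in the list expresses.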
Let gn be a function that assigns to each expression a unique natural
number (just as in step 1 in section 8.1, i.e. gn is a bijection between ℰ and
N). For any E ∈ ℰ, we also call gn(E) the Gödel number of E. We will
give a specific numbering in section 8.2.3. For this abstract treatment the
only assumption² we make is that every number is the Gödel number of some
expression.
We define Eₙ by means of the inverse of gn, i.e. Eₙ is the expression with
gn(Eₙ) = n. The diagonalization of Eₙ, for Eₙ ∈ H, is defined by Eₙ(n). We
define d(n) to be the Gödel number of the diagonalization of Eₙ, that is:
d(n) := gn(Eₙ(n)), and call d the diagonal function of the system. For each
set of natural numbers A, we define A* to be the set of all numbers n such
that d(n) ∈ A, i.e. we have n ∈ A* ↔ d(n) ∈ A. For any set of natural
numbers A, we define its complement Ā to be the set of all natural numbers
not in A. The complement operation binds stronger than the *, i.e. Ā* is to
be read as (Ā)*.
Abstract form of Gödel’s first theorem: Let P be the set of Gödel num-
bers of all the provable sentences. If the set P̄* is expressible in L and L is
correct, then there is a true sentence of L not provable in L.
Proof: (based on [84]) Suppose L is correct and P̄* is expressible in L by
a predicate H with Gödel number h. Let G be the diagonalization of H
(i.e. the sentence H(h)). We show that G is true but not provable in L. H
expresses P̄* in L, i.e. H(n) is true ↔ n ∈ P̄* for all n ∈ N. In particular,
H(h) is true ↔ h ∈ P̄*. We have that h ∈ P̄* ↔ d(h) ∈ P̄ ↔ d(h) ∉ P.
But since h is the Gödel number of H and by the definition of d, d(h) is
the Gödel number of H(h) and so d(h) ∈ P ↔ H(h) is provable in L and
d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h)
is not provable in L. This means that H(h) is either true and not provable
in L or false but provable in L. The latter alternative violates the hypothe-
sis that L is correct. Hence it must be that H(h) is true but not provable in L.
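The shape of this argument can be tried out on a toy system, in the spirit of the puzzle-style presentations of the theorem. The sketch below is not the language L of the text: we assume a machine that prints strings, and four hypothetical sentence forms P(X), PN(X), ~P(X) and ~PN(X), where P(X) is true if X is printable and PN(X) is true if the ‘norm’ X(X) of X is printable. The set printable plays the role of the provable sentences, and norm plays the role of diagonalization.

```python
def norm(x):
    """The 'norm' of an expression X is X(X), a toy form of diagonalization."""
    return f"{x}({x})"


# prefix -> (is the sentence negated?, does it speak about the norm of X?)
PREFIXES = {
    "~PN(": (True, True),
    "~P(": (True, False),
    "PN(": (False, True),
    "P(": (False, False),
}


def is_true(sentence, printable):
    """Truth of a toy sentence, relative to the set of printable strings."""
    for prefix, (negated, use_norm) in PREFIXES.items():
        if sentence.startswith(prefix) and sentence.endswith(")"):
            x = sentence[len(prefix):-1]
            target = norm(x) if use_norm else x
            return (target in printable) != negated
    raise ValueError(f"not a sentence: {sentence!r}")


def is_correct(printable):
    """A machine is correct if every sentence it prints is true.
    We assume that `printable` contains sentences only."""
    return all(is_true(s, printable) for s in printable)


G = norm("~PN")  # the sentence ~PN(~PN): 'my own norm is not printable'

print(G)                        # ~PN(~PN)
print(is_true(G, set()))        # True
print(is_correct({G}))          # False: printing G would make G false
print(is_correct({"~P(x)"}))    # True, and G is true but not printed
```

For every choice of printable, G is true exactly when G is not printable, which is the analogue of ‘H(h) is true ↔ H(h) is not provable in L’; a correct machine therefore never prints the true sentence G.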
Note that in this proof we have not defined the set T by a model but
determined the truth of G by a metamathematical argument, just as we have
seen in step 5 of section 8.1; this kind of argument is nevertheless commonly
accepted by mathematicians. Note also that the proposition G corresponds to
the proposition G of point 3 of section 8.1, since H(h) is a proposition that
expresses of itself that it is not provable.
² This assumption is for technical reasons that make the proof simpler; Gödel’s
original numbering did not have this restriction.
Theorem: If L is correct and if the set P̄* is expressible in L, then L is
incomplete.
Proof: A system L that is correct and for which the set P̄* is expressible in
L contains a sentence G that is true but neither provable nor refutable (by the
previous theorem and the assumption of correctness). Hence G is true but
undecidable in L, and hence L is incomplete.
That is where the name incompleteness theorem comes from. By this
theorem, it follows immediately that if a system is consistent, and the set
P̄* is expressible in that system (which we will later see is true for a system
of basic arithmetic) then it is incomplete. Note that this is the statement
A → G of point 8 in section 8.1.
When we study a particular language L, such as a system containing Peano’s
arithmetic or the system of Principia Mathematica, we have to verify the
assumption that P̄* is expressible in L. We can do this by separately verifying
the following conditions.
G₁: For any set A expressible in L, the set A* is expressible in L.
G₂: For any set A expressible in L, the set Ā is expressible in L.
G₃: The set P is expressible in L.
Theorem: G₁ ∧ G₂ ∧ G₃ → P̄* is expressible in L.
Proof: G₁ and G₂ imply that for any expressible set A, Ā* is expressible in
L. In particular we then have that if P is expressible in L (i.e. G₃ holds),
P̄* is expressible in L.
Before we prove a general form of Gödel’s second incompleteness theo-
rem, we introduce some more definitions.
A sentence Eₙ is a Gödel sentence for a set A of natural numbers if either
Eₙ is true and its Gödel number lies in A, or Eₙ is false and its Gödel number
lies outside A, i.e. Eₙ is a Gödel sentence for A if and only if Eₙ ∈ T ↔ n ∈ A.
Diagonal Lemma: For any set A, if A* is expressible in L, then there is a
Gödel sentence for A.
Proof: Suppose H is a predicate that expresses A* in L; let h be its Gödel
number. Then d(h) is the Gödel number of H(h). For any number n, H(n)
is true ↔ n ∈ A*; therefore H(h) is true ↔ h ∈ A* ↔ d(h) ∈ A, and since
d(h) is the Gödel number of H(h), H(h) is a Gödel sentence for A.
Lemma: If L satisfies G₁, then for any set A expressible in L, there is a
Gödel sentence for A.
Proof: L satisfies G₁, thus for any expressible set A, A* is expressible in
L. Now we can apply the previous lemma to conclude that there is a Gödel
sentence for A.
With the diagonal lemma we can also prove the first theorem as follows:
Since P̄* is expressible in L, by the diagonal lemma, there is a Gödel sentence
G for P̄. A Gödel sentence for P̄ is a sentence which is (by the definition
of a Gödel sentence) true if and only if it is not provable in L. So for any
correct system L, a Gödel sentence for P̄ is a sentence which is true but not
provable in L.
8.2.2 The impossibility of an ‘internal’ proof of consistency
With the diagonal lemma we can also prove a general form of Gödel’s second
theorem, that was first formulated in this form by the Polish mathematician
Alfred Tarski (the result is usually known as Tarski’s theorem on the
undefinability of truth).
A general form of Gödel’s second theorem (by Tarski)
1. The set T̄* is not expressible in L
2. If condition G₁ holds, then T̄ is not expressible in L
3. If conditions G₁ and G₂ both hold, then the set T is not expressible in
L (i.e. for systems for which G₁ and G₂ hold, truth within the system
is not definable within the system.)
Proof: To begin with, there cannot possibly be a Gödel sentence for the set
T̄ because such a sentence would be true if and only if its Gödel number was
not the Gödel number of a true sentence, and this is absurd.
1. If T̄* were expressible in L, then by the diagonal lemma, there would be
a Gödel sentence for the set T̄, which we have just shown is impossible.
Therefore, T̄* is not expressible in L.
2. Suppose condition G₁ holds. Then if T̄ were expressible in L, the set
T̄* would be expressible in L, violating (1).
3. If G₂ also holds, then if T were expressible in L, then T̄ would also be
expressible in L, violating (2).
Now that we have seen both theorems in a general form, we will consider particular
mathematical languages, starting with first order arithmetic, which we can
build on in section 8.3 to prove the incompleteness of systems based on
Peano’s arithmetic and other systems.
8.2.3 Gödel numbering and a concrete proof of G₁, G₂ and G₃
This section will be completed in a later version of this document. For the
moment we refer to Gödel’s original work that can be found in [93].
8.3 Gödel’s theorem and Peano Arithmetic
The classification of the various modes of syllogisms, when they
are exact, has little importance in mathematics. In the mathema-
tical sciences are found numerous forms of reasoning irreducible
to syllogisms.
- G. Peano in [68, page 379]
There are various incompleteness proofs of Peano Arithmetic
(with and without exponentiation). We mention three of them. The sim-
plest uses a truth set defined by Tarski and shows that every axiomatizable
subsystem of 𝒩 (the complete theory of arithmetic) is incomplete. This
proof of Gödel’s first theorem however cannot be formalized in arithmetic
(since the truth set is not expressible in arithmetic), and is based on the
underlying assumption that Peano Arithmetic is correct, i.e. that every
sentence provable in Peano Arithmetic is a true sentence. Gödel’s original
incompleteness proof involves the much weaker assumption of ω-consistency.
Definition of simple consistency: An axiomatic system A is
simply consistent := no sentence is both provable and refutable in A
Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent
:= there is a predicate F(w) (in one free variable w) such that the sentence
(∃w :: F(w)) is provable but all the sentences F(0), F(1), . . . are refutable
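Definition of ω-incomplete: An axiomatic system A is ω-incomplete := A
is simply consistent and there is a predicate F(w) (in one free variable w)
such that all the sentences F(0), F(1), . . . are provable but the sentence
(∀w :: F(w)) is not provable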
Gödel’s original proof was based on the assumption of ω-consistency and
shows that every axiomatizable ω-consistent system in which all true Σ₀-
sentences are provable is incomplete. This proof is of course formalizable in
Peano Arithmetic (and this is necessary for Gödel’s second theorem) and also
shows that any axiomatic system A that is simply consistent and in which
all true Σ₀-sentences are provable, is ω-incomplete.
The third proof (1936) is due to Rosser and uses the even weaker assumption
of simple consistency. It is based on an axiomatic system by the American
mathematician Raphael Robinson (1911-1995), that we refer to as R. It
shows that every axiomatizable simply consistent extension of R is incom-
plete, but to do so it uses a more elaborate sentence than the Gödel sentence
‘G is undecidable’.
We intend to include the three proofs in a later version of this document.
They can be found in [84], but in a particular presentation that does not use
the concept of a model for axiomatic systems and that sometimes attaches
different meanings to established definitions. Nevertheless it contains, in our
opinion, one of the best discussions of Gödel’s incompleteness theorems.
In a later version of this document we will also show how, given the proof
of incompleteness of Peano Arithmetic, Gödel’s theorems apply to Principia
Mathematica.
We quote K. Gödel on the first page of [27]:
“The most comprehensive formal systems that have been set up
hitherto are the system of Principia Mathematica on the one hand
and the Zermelo-Fraenkel axiom system of set theory (further de-
veloped by J. von Neumann) on the other. These two systems are
so comprehensive that in them all methods of proof today used
in mathematics are formalized, that is, reduced to a few axioms
and rules of inference. One might therefore conjecture that these
axioms and rules of inference are sufficient to decide any ma-
thematical question that can at all be formally expressed in these
systems. It will be shown that this is not the case, that on the
contrary there are in the two systems mentioned relatively simple
problems in the theory of integers that cannot be decided on the
basis of the axioms”.
8.4 Consequences
I had a lot of conversations with him [Gödel] and a lot of dis-
agreements. Like most others, I was hard to convince about the
incompleteness theorem. There was at the time a tendency, which
I shared, to think that it was special to a certain type of formali-
zation of logic and that a radical reformalization might have the
effect that the Gödel argument did not apply. I persisted in that
longer than I should have, and he was always trying to convince
me otherwise.
- A. Church in an interview at Princeton University (1985)
In a later version of this document we will discuss the implications of
Gödel’s theorem and show the reactions that followed the publication of his
paper [27] in 1931.
8.5 Neumann-Bernays-Gödel axioms
There is an infinite set A that is not too big.
There’s no sense in being precise when you don’t even know what
you’re talking about.
- John von Neumann (sources unknown)
Let us recapitulate the situation of the axiomatic theory of sets before we
introduce the Neumann-Bernays-Gödel theory.
When Cantor introduced his set theory, he gave the informal definition
(see page 16) of a set being ‘any comprehension into a whole M of definite
and separate objects m of our intuition or thought’. After Hilbert proposed
his proof theory, set theory was given a more rigorous basis, and axiomatic
theories for Cantor’s sets were proposed. Cantor’s definition was replaced by
the principle of comprehension (see page 16), which was adopted by Frege
and Russell. Based on this principle a first formal theory of sets, called ‘ideal
calculus’ was developed (not treated in detail here, see for example [36]). The
antinomies of Burali-Forti and Russell however showed that this theory was
inconsistent, and one way to restore consistency was to incorporate in the
system a theory of types, as was done by Russell. At the same time, intu-
itionists tried to do mathematics without Cantor’s set theory at all. Others
tried to overcome the inconsistencies by making Cantor’s set theory more
rigidly axiomatic, and the most successful axiomatization of set theory was
presented by Zermelo in 1908.
The task for him was to axiomatize set theory in such
a way that all contradictions are excluded while the theory remains sufficiently
wide for all that is valuable in it to be preserved. As we have seen in section
5.3, Zermelo postulated a domain of abstract objects (sets) and elements of
this domain, defined the primitive notions of ‘equality’ and ‘is element of’
relation, and introduced 7 axioms. The comprehension axiom was replaced
by the weaker separation axiom, that only allows new sets to be created
from existing sets and with definite predicates. Before we describe why
the Hungarian mathematician von Neumann opposed this solution and came
up with his own solution to the paradoxes, we will look at this separation axiom
in more detail. Zermelo defined the separation axiom as follows:
Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y. For
every set z there exists a set y whose elements are exactly those of z having
the property ϕ.
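The difference between separation and unrestricted comprehension can be illustrated with a short Python sketch. It rests on a strong simplifying assumption: sets are modelled by Python frozensets, which can only represent hereditarily finite, well-founded sets, and the function name separate is our own.

```python
def separate(z, phi):
    """Separation: the subset of an already existing set z
    consisting of the elements that satisfy phi."""
    return frozenset(x for x in z if phi(x))


empty = frozenset()
one = frozenset({empty})
z = frozenset({empty, one})

# Russell's defining condition, now restricted to the elements of z.
r = separate(z, lambda x: x not in x)

print(r == z)  # True: no set in this model is a member of itself
print(r in z)  # False: r is a perfectly good set, but it lies outside z
```

Since r is only required to collect elements of z, asking whether r ∈ r leads to the harmless conclusion r ∉ z instead of a contradiction.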
The concept of definiteness in this axiom was defined by Zermelo as fol-
lows: “A question or assertion ϕ, the validity or invalidity of which is decided
without arbitrariness by the basic laws of logic, is said to be ‘definite’ ”.
We have already seen on page 93 that this axiom excludes the paradoxes of
Russell and Burali-Forti, and as Kneebone remarks³ in [49, page 263] also
the semantic paradoxes.
In [83], the Norwegian mathematician Skolem pointed out that the defi-
nition of ‘definiteness’ was rather vague and he made precise the formulation
of ‘by the basic laws of logic’. Fraenkel used Skolem’s idea to formulate
the separation axiom in a new way (for details, see [49, page 290, 291]). In
1922 Fraenkel proposed the introduction of another axiom that allows the
existence of larger cardinal numbers than hitherto possible. The foundation
axiom of von Neumann makes the occurrence of so-called extraordinary sets im-
possible. A set V₁ is extraordinary if there is a sequence of sets V₁, V₂, V₃, . . .
such that V₂ ∈ V₁, V₃ ∈ V₂, etc. Von Neumann’s subsequent interest in set
theory led to the second major axiomatization of set theory in the 1920s.
His formulation differed considerably from that of Zermelo and Fraenkel (see
section 5.3) because the notion of function, rather than that of set, was taken
as primitive. In a series of papers beginning in 1937, however, the Swiss
logician Paul Bernays, a collaborator of the formalist David Hilbert, mod-
ified the von Neumann approach in a way that put it in much closer contact
with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for
his incompleteness proof (see chapter 8), further simplified the theory. This
version is known as the Neumann-Bernays-Gödel (NBG) axioms.
3
We quote: “since a definite property is one that is decidable by the basic relations of
the domain B [of sets, the abstract objects postulated by Zermelo], no such property as
that of being definable in a finite number of words can be used in the definition of a set,
and the semantic paradoxes are thus also excluded”.
Before we give the axioms, it is convenient to adopt the undefined notions
of class and the membership relation (though, as is also true in Zermelo and
Fraenkel, ∈ suffices). In the axioms we distinguish between the use of capital
Latin letters and lowercase Latin letters for the variables. The capital letters
stand for variables that take classes (the totalities corresponding to certain
properties) as values. A class is defined to be a set if it is a member of some
class; those classes that are not sets are called proper classes. The lowercase
letters are used as special restricted variables for sets.
Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’;
i.e. the condition holds for all sets. Intuitively, sets are intended to be those
classes that are adequate for mathematics, and proper classes are thought
of as those collections that are ‘so big’ that, if they were permitted to be
sets, contradictions would follow. In the Neumann-Bernays-Gödel axioms,
the classical paradoxes are avoided. This can be proven by showing in each
case that the collection on which the paradox is based is a proper class, i.e. is
not a set.
Theorem: With the Neumann-Bernays-Gödel axioms, the derivation of
Russell’s paradox does not apply.
Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not
a set. R is a class by the axiom for class formation (axiom 3 below), so
for all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by
contradiction that R is not a set.
Suppose R is a set. Suppose R ∈ R. Then (take R for y in the above
statement) we have R is a set ∧ R ∉ R, so R ∉ R: contradiction. So we must
have R ∉ R. Then by our assumption we have R is a set ∧ R ∉ R, and
thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get
a contradiction, our assumption that R is a set must be wrong.
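The argument can also be checked mechanically. The following Lean 4 fragment is a minimal sketch, not a formalization of NBG: the names mem, isSet and R are our own stand-ins for the membership relation, the property of being a set, and the Russell class, and the defining equivalence of R is simply taken as a hypothesis.

```lean
-- Assumption: `mem y X` reads "y ∈ X" and `isSet y` reads "y is a set".
-- `hR` is the defining property of the Russell class R.
theorem russell_class_not_set {α : Type} (mem : α → α → Prop)
    (isSet : α → Prop) (R : α)
    (hR : ∀ y, mem y R ↔ (isSet y ∧ ¬ mem y y)) : ¬ isSet R :=
  fun hs =>
    -- First case of the proof: R ∈ R would give R ∉ R.
    have hn : ¬ mem R R := fun hm => ((hR R).mp hm).2 hm
    -- Second case: R is a set and R ∉ R would give R ∈ R.
    hn ((hR R).mpr ⟨hs, hn⟩)
```

Note that the hypothesis that R is a set is used only in the second case, exactly as in the proof above.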
The Neumann-Bernays-Gödel axioms (NBG):
1 Extensionality axiom (or axiom of determination):
(∀X, Y :: (∀z :: z ∈ X ↔ z ∈ Y ) → X = Y )
Classes are uniquely determined by their members, to be exact: if every
element (that is a set) of a class X is at the same time an element of
Y , and conversely, then X = Y .
2 Axiom of the empty set:
(∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or
‘empty’ set, which contains no elements at all.
3 Axiom for class formation:
(∃Y ∀x :: x ∈ Y ↔ ϕ(x)), where ϕ is a proposition in which only set
variables are bound by existential and universal quantifiers.
There exists a class Y whose elements are exactly those sets x having
the property ϕ.
4 Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a
and b.
5 Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those
objects occurring in at least one element of z.
6 Power set axiom
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.
7 Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.
8 Axiom of choice:
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) :
f(a) ∈ a)))
Every set x has a choice function.
9 Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y : ϕ is a class : ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a :
ϕ(x, y))))
The image of a set under an operation (functional property) is again a
set.
10 Axiom of restriction:
X ≠ ∅ → (∃y : y ∈ X ∧ X ∩ y = ∅)
Every non-empty class is disjoint from one of its elements.
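To get a feeling for axioms 4, 5 and 6, the following minimal Python sketch models hereditarily finite sets as nested frozensets. The names EMPTY, pair, union and power_set are our own and not part of NBG, and a finite model of this kind says nothing about the axioms of infinity, choice or replacement.

```python
from itertools import chain, combinations

EMPTY = frozenset()  # axiom 2: the empty set


def pair(a, b):
    """Axiom 4: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z):
    """Axiom 5: the set of all objects occurring in at least one element of z."""
    return frozenset(chain.from_iterable(z))


def power_set(z):
    """Axiom 6: the set whose elements are exactly the subsets of z."""
    items = list(z)
    return frozenset(
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    )


one = pair(EMPTY, EMPTY)   # {∅}
two = pair(EMPTY, one)     # {∅, {∅}}
print(union(two) == one)   # the union of ∅ and {∅} is {∅}
print(len(power_set(two)))
```

A set with n elements has 2^n subsets, so the power set of the two-element set above has four elements.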
The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme)
is presented in a form to facilitate a comparison with the third axiom (scheme)
of ZF. In a detailed development of NBG, however, there appears, instead,
a list of seven axioms (not schemes) stating that for each of certain conditions
there exists a corresponding class of all those sets satisfying the condition. From
this finite set of axioms, each instance of the above scheme can be obtained
as a theorem. When obtained in this way, the third axiom scheme of NBG
is called the class existence theorem.
In contrast to the ninth axiom scheme of ZF (see section 5.3.2), that of
NBG is not an axiom scheme but an axiom. Thus, with the comments above
about the third axiom in mind, it follows that NBG has only a finite number
of axioms. On the other hand, since the ninth axiom or scheme of ZF provides
an axiom for each formula, ZF has infinitely many axioms. The finiteness of
the axioms for NBG makes the logical study of the system simpler.
The relationship between the theories may be summarized by the state-
ment that ZF is essentially the part of NBG that refers only to sets. We give
the following theorems without proof:
Theorem: Every theorem of ZF is a theorem of NBG
Theorem: Any theorem of NBG that speaks only about sets is a theorem
of ZF
Theorem: ZF is consistent if and only if NBG is consistent
Note that the fact that NBG avoids the classical paradoxes and that
there is no apparent way to derive any one of them in ZF does not settle the
question of the consistency of either theory. All we know from the last theorem
is that either both axiom systems are consistent, or both are inconsistent.
Chapter 9
Church and Turing
9.1 Turing and Turing Machine
We may hope that machines will eventually compete with men in
all purely intellectual fields.
- Alan Turing in [38, page 46]
Alan Mathison Turing (1912-1954) was an English mathematician and
logician who pioneered in the field of computer theory and who contributed
important logical analyses of computer processes. Turing studied in Cam-
bridge, worked there on probability theory and (independently of de Moivre)
discovered the central limit theorem. In 1936 he won the Smith’s Prize. As
we have seen in the previous chapters, many mathematicians had attempted
to eliminate all possible error from mathematics by establishing a formal,
or purely algorithmic, procedure for establishing truth (the so-called formalist
program). With his incompleteness theorem (see section 8.1), Kurt
Gödel threw up an obstacle to this effort, for he showed that any useful
mathematical axiom system is incomplete in the sense that there must exist
propositions whose truth can never be decided (within the system). Turing
was motivated by Gödel’s work to seek an algorithmic method of determining
whether any given propositions were undecidable, with the ultimate goal of
eliminating them from mathematics. Instead, he proved in his seminal paper
‘On Computable Numbers, with an Application to the Entscheidungsproblem’
(reprinted in [19]) that there cannot exist any such universal method
of determination. We now regard this decision problem, or Entscheidungsproblem,
in more detail.
Decidability was one of Hilbert’s requirements for an axiomatic system
(see section 6.1). The problem of decidability asks if, given a mathematical
proposition, one could find an algorithm which would decide if the proposition
is true or false. Given an algorithm, it is easy to see that it
can decide certain propositions. But it is more difficult to prove that there is no
algorithm that can decide certain propositions. To this end Turing introduced a
hypothetical computing device (later called the Turing machine). The Turing
Machine and the proof of undecidability are given later in this section.
After this important publication Turing completed his Ph.D. in 1938 on
systems of logic based on ordinals, under direction of Alonzo Church (see
section 9.2). During the war Turing worked on breaking German Enigma
codes, and in 1948 he worked in Manchester on the construction of a new
digital computer. He described a modern computer before technology had
reached the point where the construction was a realistic possibility. His ef-
forts in the construction of early computers and the development of early
programming techniques were of prime importance. He also championed the
theory that computers eventually could be constructed that would be capable
of human thought, and he proposed a simple test, now known as the Tur-
ing test, to assess this capability. Turing’s papers on the subject are widely
acknowledged as the foundation of research in artificial intelligence. In 1952
Turing published the first part of his theoretical study of morphogenesis, the
development of pattern and form in living organisms.
The Turing Machine
Turing introduced his hypothetical computing device in 1936. He originally
conceived the machine as a mathematical tool that could infallibly recognize
undecidable propositions - i.e., those mathematical statements that,
within a given formal axiomatic system (that includes at least arithmetic),
cannot be proved either true or false. Gödel had demonstrated that such
propositions exist in any such system. Turing instead proved there can never exist
any universal algorithmic method for determining whether a proposition is
undecidable. This was left open by Gödel, since the incompleteness theorem
(see section 8.1) only stated that consistency and completeness could not at
the same time be attained; that means there were statements (in consistent
systems) about numbers, indubitably true, which could not be proved from
finitely many rules. But the decidability of mathematical statements was
not settled by Gödel’s theorem because it needs a formal definition of (algorithmic)
method in the formulation of the problem (or a definition of the
notion of algorithm in the definition of decidability in section 6.1). To this end
Turing introduced a machine that was later to be called the Turing machine,
an idealized mathematical model that reduces the logical structure of any
computing device to its essentials. By extrapolating the essential features of
information processing, Turing was instrumental in the development of the
modern digital computer. His model served as a basis for all subsequent digital
computers, which share his basic scheme of an input/output device (tape
and head), memory (tape) and central processing unit (head and transition
function).
Nowadays there are many models of computing devices available in the
theory of computation (complexity). We will not cover restricted models
such as finite automata and pushdown automata (and corresponding notions
such as regular languages and context-free grammars). We now directly
introduce the much more powerful model of Turing that we need to investigate all
mathematical problems.
The Turing Machine model uses an infinite tape as its unlimited memory,
and has a tape head that can read and write symbols (of a set Γ) and move
around a tape (to the L(eft) or R(ight)). We here assume the tape is right-
infinite; this means the tape continues infinitely to the right side but it has
a left-most position. Initially the tape contains an input string of symbols
from an input alphabet Σ and is blank (i.e. filled with a special blank symbol
⊔) everywhere else. The Turing Machine is in a state q of a set of states Q,
and starts in an initial state q_0. It uses a transition function δ that determines
how it gets from one configuration (that is the current state, the tape
contents and the head location) to the next. This transition can consist of
writing a new symbol of the tape alphabet Γ to the tape and moving the tape
head either Left or Right, and depends on the current state and the current
symbol on tape. This computation (i.e. sequence of transitions) continues
until the Turing Machine enters either the (final) state q_accept or the (final)
state q_reject. We can define a Turing Machine (sometimes called deterministic,
since each transition is determined uniquely given the configuration)
formally as a septuple:
Definition of a Turing Machine (TM):
A Turing Machine (TM) := (Q, Σ, Γ, δ, q_0, q_accept, q_reject) with:
1 Q is a finite set of states.
2 Σ is a finite input alphabet not containing the special blank symbol ⊔.
3 Γ is a finite tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ.
4 δ is the transition function, where δ is finite and
δ : Q × Γ → Q × Γ × {L, R}.
5 q_0 is the start state, where q_0 ∈ Q.
6 q_accept is the accept state, where q_accept ∈ Q.
7 q_reject is the reject state, where q_reject ∈ Q and q_reject ≠ q_accept.
We call configurations accepting configurations if the state is q_accept,
rejecting configurations if the state is q_reject, and halting configurations if the
state is either q_accept or q_reject. A start configuration C on input w is a
configuration with state q_0 and the head on the leftmost position of the tape
with just w on it.
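The definitions of configuration and transition can be made concrete with a small simulator. The following Python sketch is only an illustration under stated assumptions: a configuration is represented as a triple (state, tape, head), the transition function δ is a dictionary, and a move to the left on the leftmost cell leaves the head where it is (a convention the text above does not fix). The names step, run and EVEN are our own; EVEN is a toy machine that accepts strings of ‘1’s of even length.

```python
BLANK = "⊔"


def step(delta, config):
    """Return the configuration that the given configuration yields."""
    state, tape, head = config
    cells = list(tape)
    if head == len(cells):          # extend the right-infinite tape on demand
        cells.append(BLANK)
    new_state, write, move = delta[(state, cells[head])]
    cells[head] = write
    # Assumption: moving left on the leftmost cell leaves the head in place.
    head = head + 1 if move == "R" else max(head - 1, 0)
    return new_state, "".join(cells), head


def run(delta, w, q0, q_accept, q_reject, max_steps=10_000):
    """Run from the start configuration on w until a halting configuration."""
    config = (q0, w, 0)
    for _ in range(max_steps):
        if config[0] in (q_accept, q_reject):
            return config
        config = step(delta, config)
    raise RuntimeError("no halting configuration reached within max_steps")


# Toy machine (hypothetical example): accept 1^n exactly when n is even.
EVEN = {
    ("q0", "1"): ("q1", "1", "R"),
    ("q1", "1"): ("q0", "1", "R"),
    ("q0", BLANK): ("qa", BLANK, "R"),
    ("q1", BLANK): ("qr", BLANK, "R"),
}

print(run(EVEN, "11", "q0", "qa", "qr")[0])
print(run(EVEN, "111", "q0", "qa", "qr")[0])
```

The step bound max_steps is needed because a Turing Machine may loop; the simulator itself cannot tell whether a longer run would eventually halt.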
After defining the Turing Machine, Turing made his famous proposal
(known as Turing’s thesis, see also section 9.3) for the concept of ‘com-
putability by a Turing machine’. The proposal says that whenever there
is an effective method for obtaining the values of a mathematical function
(i.e. it is intuitively or effectively computable), the function can be computed
by a Turing Machine. The converse claim is trivial, and if the thesis is correct
we can reduce problems of the (non-)existence of effective methods to problems
of the (non-)existence of Turing Machines. We quote one of Turing’s
formulations from [90]:
Turing’s Thesis: LCM’s [Logical Computing Machines, Turing’s expres-
sion for Turing Machines] can do anything that could be described as “rule
of thumb” or “purely mechanical”.
We now introduce more of Turing’s theory of Turing Machines before we
present his proof of undecidability.
We define a language to be a set of strings, a string being a series of
alphabet symbols (i.e. w ∈ Σ*, for all strings w). We say that a TM M
accepts input string w if a sequence of configurations C_1, . . . , C_k exists where
1 C_1 is the start configuration of M on input w.
2 Each C_i yields C_{i+1} via the transition function δ of M.
3 C_k is an accepting configuration.
The set of strings that M accepts is called the language of M.
Definition of the language of a TM: The language of a TM M, notation
L(M) := {w | w is a string that M accepts}.
Let w ∈ Σ*. We now define a notion that covers the ability of a TM to
end in the accept state when started with any string of a certain language.
Definition of Turing-recognizable: A language L is recognized by a TM
M := for all strings w:
1 with input w, M stops in q_accept if w ∈ L and
2 with input w, M stops in q_reject or does not stop (loops) if w ∉ L.
A language L is Turing-recognizable if there exists a TM M that recognizes L.
If language L is recognized by a TM M we say that M is an acceptor for
L. We distinguish between recognizing and deciding capabilities.
Definition of Turing-decidable (or decidable): A language L is decided
by a TM M := for all strings w:
1 with input w, M halts in q_accept if w ∈ L, and
2 with input w, M halts in q_reject if w ∉ L.
A language L is Turing-decidable if there exists a (deterministic) TM M that
decides L.
If a language L is decided by a TM M we say that M is a decider for L.
There are several variants on Turing Machines such as double-sided infinite
Turing Machines, multitape Turing Machines, non-deterministic Turing
Machines and certain types of so-called enumerators. Most variants are
equivalent in the sense that they can recognize the same set of languages
(but not necessarily equally efficiently).
Example: We now give an example of a Turing Machine solving a mathematical
problem by first defining it as a language problem. The problem (idea
from [56]) is to design a Turing Machine that computes the function
f(x, y) = x + y if x ≥ y
f(x, y) = 0 if x < y
For simplicity, we assume x and y to be positive integers. First we have to
choose a convention for representing positive integers, and decide what the
initial situation of the tape is. We choose a unary notation in which any
positive integer x is represented by w(x) ∈ {1}^+, such that |w(x)| = x. We
assume that w(x) and w(y) are on the tape in unary notation, separated by
a single ‘0’ and with the read-write head on the left-most symbol of w(x).
We first describe how the sum of x and y can be calculated, then how the
comparison x ≥ y can be made and finally how to combine those two machines
into a Turing Machine that computes the desired function.
Calculating the sum
To add the two numbers a and b, we only have to move the separating
0 to the right end, so addition amounts to the concatenation of two strings.
The following Turing Machine, called Adder, adds a and b and is relatively
simple to construct:
Adder = (Q, Σ, Γ, δ, q_0, q_A, q_R), with
Q = {q_0, q_1, . . . , q_4}
Σ = {0, 1}
Γ = {0, 1, ⊔}
q_0 = {q_0}
q_A = {q_4}
q_R = {}
δ(q_0, 1) = (q_0, 1, R)
δ(q_0, 0) = (q_1, 1, R)
δ(q_1, 1) = (q_1, 1, R)
δ(q_1, ⊔) = (q_2, ⊔, L)
δ(q_2, 1) = (q_3, 0, L)
δ(q_3, 1) = (q_3, 1, L)
δ(q_3, ⊔) = (q_4, ⊔, R)
Note that we remove the ‘0’ by temporarily creating an extra ‘1’, a fact
that is remembered by putting the machine into state q_1. The transition
δ(q_2, 1) = (q_3, 0, L) is needed to remove this ‘1’ at the end of the computation.
Finally, we move the read-write head back to the leftmost ‘1’. This
is not strictly necessary in this example, because the machine is designed
such that it will terminate right after any addition, but it is not harmful and
normally a good habit to let any action terminate in a state from which it is
easy to take further transitions.
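The Adder can be run step by step with a few lines of Python. This is a minimal sketch using the seven transitions listed above; the function name run_adder is our own. We assume one blank cell to the left of the input (as on a double-sided infinite tape), because the transition δ(q_3, ⊔) looks for a blank to the left of the leftmost ‘1’.

```python
BLANK = "⊔"

# The transition function of Adder, copied from the list above.
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", "0", "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}


def run_adder(a, b):
    """Run Adder on 1^a 0 1^b; return the final tape contents and head position."""
    # Assumption: one blank cell to the left of the input, one to the right.
    tape = [BLANK] + list("1" * a + "0" + "1" * b) + [BLANK]
    state, head = "q0", 1               # head on the leftmost symbol of the input
    while state != "q4":                # q4 is the accept state of Adder
        state, tape[head], move = ADDER[(state, tape[head])]
        head += 1 if move == "R" else -1
    return "".join(tape).strip(BLANK), head


print(run_adder(2, 3))
```

With these transitions the machine halts with a + b symbols ‘1’ followed by a single ‘0’ on the tape, and with the head back on the leftmost ‘1’.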
Comparison
To compare two numbers a and b, we again assume they are written in the
notation that we used before and divided by a single ‘0’. We will construct
a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting
state if a < b. To this end we can match each ‘1’ on the left of the dividing
‘0’ with a ‘1’ on the right. We can do this by starting at the leftmost ‘1’
(of the number a) and alternately checking off the leftmost symbols of the
numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively.
The matching will stop when one of the two sequences of ‘1’s is completely
checked off. If a < b then the right sequence will still contain ‘1’s, and
if a ≥ b either the left sequence contains ‘1’s or neither sequence contains
‘1’s. In the first case, we still find a ‘1’ on the right when all ‘1’s on the left
have been replaced. We use this to get, via the state q_5, into the rejecting
state. In the second case, if a ≥ b, when we attempt to match another ‘1’, we
encounter a blank at the right of the working space, which can be used as a
signal to enter the accepting state. If we work this out in detail, we get the
following Turing Machine called Comparer :=
(Q, Σ, Γ, δ, q_0, q_A, q_R), with:
Q = {q_0, q_1, q_2, q_3, q_4, q_5, q_6, q_7}
Σ = {0, 1}
Γ = {0, 1, x, y, ⊔}
q_0 = {q_0}
q_A = {q_6}
q_R = {q_7}
The transitions of δ can be grouped in several parts.
δ(q_0, 1) = (q_1, x, R)
δ(q_1, 1) = (q_1, 1, R)
δ(q_1, 0) = (q_2, 0, R)
δ(q_2, y) = (q_2, y, R)
δ(q_2, 1) = (q_3, y, L)
This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write
head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’.
When the dividing ‘0’ is passed, the machine enters state q_2, indicating that it
is now dealing with the number b. When the symbol ‘y’ has been written, the
machine enters a state q_3, indicating that one ‘1’ of b has been successfully
paired with a ‘1’ of a. The next group of transitions reverses the direction
and repositions the read-write head over the leftmost ‘1’ of a, and returns
control to the initial state,
δ(q_3, y) = (q_3, y, L)
δ(q_3, 0) = (q_4, 0, L)
δ(q_4, 1) = (q_4, 1, L)
δ(q_4, x) = (q_0, x, R)
The rewriting continues this way when the input is a string 1^a 0 1^b, stopping
only when on one side no more ‘1’s can be replaced. In that case either the
left side will not contain any more ‘1’s (a ≤ b), or the right side has run out of
‘1’s (a > b). In case the left side does not contain any more ‘1’s, the transition
δ(q_4, x) = (q_0, x, R) will leave the read-write head on a ‘0’ instead of a ‘1’.
δ(q_0, 0) = (q_5, x, L) (a ≤ b)
δ(q_2, ⊔) = (q_6, ⊔, L) (a > b)
In the first case we still have to check whether the right side has any ‘1’s left,
to determine whether a = b. This is done in the state q_5.
δ(q_5, x) = (q_5, x, R)
δ(q_5, 0) = (q_5, 0, R)
δ(q_5, y) = (q_5, y, R)
δ(q_5, 1) = (q_7, y, R) (a < b)
δ(q_5, ⊔) = (q_6, ⊔, L) (a = b)
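The transitions of Comparer can be checked in the same way. The following Python sketch copies the sixteen transitions listed above; the function name compare is our own. Following the annotations (a > b), (a = b) and (a < b) on the transitions, we assume that q_6 is the state in which the machine halts when a ≥ b and q_7 the state in which it halts when a < b.

```python
BLANK = "⊔"

# The transition function of Comparer, copied from the lists above.
COMPARER = {
    ("q0", "1"): ("q1", "x", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),
    ("q0", "0"): ("q5", "x", "L"),
    ("q2", BLANK): ("q6", BLANK, "L"),
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
    ("q5", "1"): ("q7", "y", "R"),
    ("q5", BLANK): ("q6", BLANK, "L"),
}


def compare(a, b):
    """Run Comparer on 1^a 0 1^b (a, b positive); True means it halts in q6."""
    tape = list("1" * a + "0" + "1" * b) + [BLANK]
    state, head = "q0", 0
    while state not in ("q6", "q7"):    # assumed halting states, see lead-in
        state, tape[head], move = COMPARER[(state, tape[head])]
        head += 1 if move == "R" else -1
    return state == "q6"


print(compare(3, 2), compare(2, 2), compare(2, 3))
```

For positive a and b the head never leaves the part of the tape that holds the input and one trailing blank, so no extra blank cell on the left is needed here.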
Combining Turing Machines for complicated tasks
We now have to put together the Turing Machines Adder and Comparer
to obtain the desired Turing Machine that computes the given function. We
can do this by starting with the input a and b in the previously described
notation and starting position, and using Comparer to determine whether
or not a ≥ b. We index all states with a C, i.e. the last transition will be
δ(q_C,5, ⊔) = (q_C,6, ⊔, L) or δ(q_C,2, ⊔) = (q_C,6, ⊔, L) in the first case (a ≥ b),
and δ(q_C,5, 1) = (q_C,7, y, R) in the second case (a < b). In the first case,
the Comparer should send a ‘start signal’ to the Adder, to give a + b as output.
In the second case, the Comparer should send a ‘start signal’
to a Turing Machine (called Eraser) that simply replaces all ‘1’s by ‘0’s to
output the value 0 in the desired format.
We show how we can let the Comparer send these ‘start signals’. We first
index all states of the Adder by A and of the Eraser by E. Now in case of
a ≥ b, Comparer ends in state q_C,6, and we can add a transition δ(q_C,6, ∗) =
δ(q_A,0, ∗). The star ‘∗’ stands for any possible symbol, so actually this transition
is a shorthand notation for a set of transitions. Similarly, we can let
δ(q_C,7, ∗) = δ(q_E,0, ∗) bring the Eraser in the initial state. The Adder and the
Eraser, respectively, will then give the desired output because their behavior on the
input does not change as a result of the renaming of the states by Comparer
(to be exact: the state in which the Comparer terminates is suitable as an
initial position for Adder or Eraser). The only thing we have not taken care
of is that when the Comparer enters a final state, it does not have the initial
representation of the numbers a and b on tape, but has replaced the ‘1’s by
‘x’s and ‘y’s. We can easily (it is just some extra work, you can try it as an
exercise if you want) fix this by letting Comparer, as the last action before
entering a final state, replace all ‘x’s and ‘y’s by ‘1’s. The result is a Turing
Machine that combines Comparer, Adder and Eraser to compute the function
f. In a similar way we can, for example, multiply two numbers
a and b, and we can also translate macro-instructions like ‘if p then q_j else
q_k’ (meaning that when we read ‘p’ on tape, then the Turing Machine goes
into a state q_j and otherwise into a state q_k), and even combine them into
complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)
The Entscheidungsproblem
After introducing the notion of a TM in [89], Turing answered Hilbert’s
decision problem for mathematical logic (in German called ‘Entscheidungsproblem’)
in the negative. The Entscheidungsproblem asks whether there
exists a definite method or algorithm which (at least in principle) can be applied
to any given mathematical proposition to decide whether that proposition
is provable. We now define the notion of an algorithm by the notion of a
Turing Machine, and the set of provable propositions by the set of languages
that can be decided by some TM. If we look at the definition of decidability
in section 6.1, we have that for all formulas ϕ an algorithm, i.e. a TM, exists
that decides whether ϕ is true or not. If we code ϕ by means of a language,
and this is always possible (see the previous example for a demonstration),
we can reformulate the problem as: for every language L, there exists a TM
M that decides L. We now show that this is not possible for all problems
(i.e. languages) by giving a specific problem, the Halting problem, that is not
decidable.
The Halting problem is the problem of testing whether a TM accepts a
given input string. We define the problem by stating it as a language problem,
and asking whether that language is decidable.
Definition of the Halting problem:
H := {< M, w > | M is a TM, w is a string and M accepts w}. Is H
decidable? (i.e. is there for each language a TM that decides for all strings
w if they belong to the language or not, that is (using Turing’s thesis, see
section 9.3): is there for each problem an algorithm that can decide it?).
Theorem: H is recognizable
Proof (by Turing): The following TM U, also called Universal Turing Ma-
chine because it is capable of simulating any other Turing Machine, recog-
nizes H. We informally define U, because a detailed definition of the septuple
such a TM consists of (see the definition of a TM) is a lot of work.
Description of Universal Turing Machine: U =
“On the input < M, w > where M is a TM and w is a string:
1 simulate M on input w
2 if M ever enters its accept state, accept”
Note that this TM loops on input < M, w > if M loops on w, which is
why this machine does not decide H. If the algorithm had some way to de-
termine that M was not halting on w, it could reject. Hence H is sometimes
called the Halting problem. As Turing demonstrated, an algorithm has no
way to make this determination.
Theorem: H is undecidable (see also [82, page 165]).
Proof (by Turing): We assume H is decidable and obtain a contradiction.
Suppose D is a decider for H, and defined by
D(< M, w >) :=“
• accept if M accepts w
• reject if M does not accept w”
Now we construct a new TM O with D as a subroutine. This new TM
calls D to determine what M does when the input to M is its own description
< M >. Once O has determined this information, it does the opposite. That
is, it rejects if M accepts and accepts if M does not accept. The following is
a description of O: O = “On input < M >, where M is a TM:
1 run D on input < M, < M >>,
2 output the opposite of what D outputs; that is if D accepts, reject and
if D rejects, accept”
We summarize the behavior of O as follows:
O(< M >) = “
• accept if M does not accept < M >
• reject if M accepts < M > ”
Now we obtain the contradiction by running O with its own description
< O > as input. In that case we get:
O(< O >) = “
• accept if O does not accept < O >
• reject if O does accept < O > ”
Whatever O does on input < O >, it is forced to do the opposite, which is
impossible. Thus neither O nor D can exist.
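The diagonal construction in this proof can be imitated in Python. This is only an illustration under simplifying assumptions, not Turing’s proof: Python functions stand in for Turing Machines, a function is passed directly instead of its description < M >, ‘accept’ is modelled as returning True, and the candidate deciders we try are assumed to terminate. The names make_contrarian and refutes are our own.

```python
def make_contrarian(decider):
    """Build the machine O from a claimed decider D.

    decider(f, x) is supposed to predict whether f accepts x,
    i.e. whether f(x) returns True.
    """
    def contrarian(f):
        # Do the opposite of what D predicts for f run on itself.
        return not decider(f, f)
    return contrarian


def refutes(decider):
    """True if the decider mispredicts O run on its own description."""
    o = make_contrarian(decider)
    return decider(o, o) != o(o)


# Three hypothetical candidates for D; each one is wrong about O on input O.
candidates = [
    lambda f, x: True,
    lambda f, x: False,
    lambda f, x: f is x,
]
print([refutes(d) for d in candidates])
```

Since O returns the negation of D’s prediction, every terminating candidate is refuted on the input (O, O), whatever its internal logic is.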
Turing wrote in his last publication about the interpretation of unsolvable
problems, such as the Halting problem for Turing machines:
These . . . may be regarded as going some way towards a demon-
stration, within mathematics itself, of the inadequacy of ‘reason’
unsupported by common sense.
- Alan Turing
In this section I have made extensive use of [38] [92] for information on
the life and work of Turing and [89] [82] [19] for the theory of TM’s and the
Halting problem. Another valuable source of information on Turing’s life and
work is the website http://www.turing.org.uk/
9.2 Church and the Lambda Calculus
Alonzo Church (1903-1995) was an American mathematician, whose work is
of major importance in mathematical logic, recursion theory and in theoretical
computer science. One of his most important contributions to logic is
his invention in the 1930s of the lambda calculus. He is also remembered
for Church’s thesis, published in 1936 in [14, page 345-363], stating that
the lambda calculus can be used to embody a correct formalization of the
notion of computability (see section 9.3). The notion of lambda definability
is conceptually the basis for the discipline of functional programming, and
the lambda calculus is also the basis for type theory. Church also founded
the Journal of Symbolic Logic in 1936. He had 31 doctoral students including
famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We
now introduce the lambda calculus (Church’s formalization of the notion of
effective calculability) in a modern setting, using [9, chapter 4].
Application and abstraction
First we introduce the basic concepts of λ-calculus. A formalization fol-
lows thereafter. The lambda calculus has only two basic operations, abstrac-
tion and application.
• Abstraction is for constructing functions: For an expression E we in-
troduce λx.E to denote the abstraction of E over x, i.e. ‘the function
of x which computes E’.
Example (see footnote 1): λx . x + 1, λn . n × n, etc.
We will later see how to define a recursive function; this is not so easy
since we do not have function names.
• (Function) application: The expression F A denotes that F is consid-
ered as a function (an algorithm) applied to input A. The original
lambda calculus theory is type-free so we also consider F F, that is, F
applied to itself.
Example: (λx . x + 1) 4, (λn . n × n) 7, etc.
1
Note that in some examples we have simplified the notation for the clarity of the
example, since in pure lambda calculus we do not have arithmetic symbols, like + and ×,
but we can encode these operations in the pure lambda calculus, as we will later see.
These two notions can be very powerful if we introduce the rule of beta
reduction which allows us to apply an abstraction to an expression, and for
example, rewrite (λx . x + 1) 4 to 4 + 1. Similarly (λn . n × n) 7 can be reduced
to 7 × 7. It is also allowed to use arbitrary nesting: ((λn . λx . (x + 1) × n) 7) 4
can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7.
Similar to ordinary mathematics, the names of the variables are irrele-
vant to the rules that can be applied, which allows a transformation of the
names (also known as dummy transformation). This rule in lambda calculus
is called alpha conversion. For example, alpha conversion allows us to rewrite
λn . n × n to λx . x × x, since they are essentially the same function.
Note that we also want to use functions as variables and arguments:
((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4 should reduce to the earlier
expression.
But above we only have functions of one argument; we now introduce functions
with more arguments, while avoiding new notations. We can solve this pro-
blem by using iteration of applications, often called currying after the Amer-
ican mathematician H.B. Curry who made it popular.
Example: f(x, y) = 3 × x + y can be written as F_1 ≡ λx . (λy . 3 × x + y).
Then f(4, 5) is written (F_1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which
can be reduced to (by using beta reduction): 3 × 4 + 5.
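Abstraction, application and currying can be tried out directly in a programming language with anonymous functions. The following Python lines are a minimal sketch: Python evaluates the applications to numbers instead of rewriting terms symbolically, and it uses its built-in arithmetic where the pure lambda calculus would use an encoding. The names F1, nested and higher_order are our own.

```python
# Currying: f(x, y) = 3 * x + y as a function of x that returns a function of y.
F1 = lambda x: (lambda y: 3 * x + y)

# Nested abstraction, applied first to 7 and then to 4.
nested = (lambda n: lambda x: (x + 1) * n)(7)(4)

# A function passed as an argument: f is bound to the successor function.
higher_order = (lambda f: (lambda n: lambda x: f(x) * n)(7))(lambda y: y + 1)(4)

print(F1(4)(5))              # (F1 4) 5, i.e. 3 * 4 + 5
print(nested, higher_order)  # both are (4 + 1) * 7
```

Renaming the bound variables, for example writing F1 as lambda u: (lambda v: 3 * u + v), gives the same function; this corresponds to alpha conversion.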
The above explanation and examples give an idea of what lambda calcu-
lus is. We will now work towards a more formal definition of lambda calculus.
The system of lambda calculus is based on the structure of Abstract Reduc-
tion Systems (ARS). The terms of the ARS then coincide with the inductively
defined lambda terms and the reduction relation will be β-reduction. So before
we formally define the lambda calculus, we introduce the most relevant
theory of abstract reduction systems.
Abstract Reduction Systems
Definition of Abstract Reduction System (ARS): An abstract reduction
system 𝒜 := a structure ⟨A, →⟩ consisting of a set A and a binary
relation → on A (i.e. → ⊆ A × A).
The relation is also called reduction or rewrite relation. If for a, b ∈ A, we
have a → b, we call b a one-step reduct of a.
9.2. CHURCH AND THE LAMBDA CALCULUS 155
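The transitive and reflexive closure of $\rightarrow$ is written as $\twoheadrightarrow$ (or alternatively $\rightarrow^{*}$). This means $\twoheadrightarrow$ is the smallest relation on $A$ satisfying, for all $a, b, c \in A$,
(closure of $\rightarrow$) if $a \rightarrow b$ then $a \twoheadrightarrow b$,
(reflexive) $a \twoheadrightarrow a$, and
(transitive) if $a \twoheadrightarrow b$ and $b \twoheadrightarrow c$ then $a \twoheadrightarrow c$.
Thus $a \twoheadrightarrow b$ if and only if there exists a finite sequence of reduction steps
$a \equiv a_0 \rightarrow a_1 \rightarrow \ldots \rightarrow a_n \equiv b$. This sequence may be empty, in which case
$a \equiv b$. Here $\equiv$ denotes (the syntactic) identity of elements of $A$, i.e. $a \equiv b$ if
and only if $a$ and $b$ are the same element of $A$.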
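For a finite ARS the closure just described can be computed by a simple graph search. The sketch below assumes a representation that is not part of this report: the relation is stored as a dictionary `step` that maps each element to the set of its one-step reducts, and the four-element system is a made-up example.

```python
from collections import deque

def reducts(step, a):
    """Return the set of all b reachable from a in zero or more steps."""
    seen = {a}                       # zero steps: a is a reduct of itself
    queue = deque([a])
    while queue:
        x = queue.popleft()
        for y in step.get(x, ()):    # the one-step reducts of x
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

# A hypothetical four-element ARS: a → b, a → c, b → d, c → d.
step = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}

# The transitive and reflexive closure as a set of pairs.
closure = {(x, y) for x in step for y in reducts(step, x)}
print(sorted(closure))
```

The search terminates because each element is added to `seen` at most once; for an infinite set such as the lambda terms no such exhaustive computation is available.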
Definition of Normal Form: A term $a \in A$ of an ARS $\langle A, \rightarrow \rangle$ is a
normal form := there is no $b \in A$ such that $a \rightarrow b$. Furthermore, $b \in A$ has
a normal form if and only if $b \twoheadrightarrow a$ for some normal form $a \in A$.
Definition of Weakly Normalizing: The reduction relation $\rightarrow$ of an
ARS $\langle A, \rightarrow \rangle$ is weakly normalizing (or weakly terminating) := every $a \in A$
has a normal form. In this case we also say that $A$ is weakly normalizing.
Definition of Strongly Normalizing: The reduction relation $\rightarrow$ of an
ARS $\langle A, \rightarrow \rangle$ is strongly normalizing (also called terminating, well-founded
or noetherian) := there exists no infinite reduction $a_0 \rightarrow a_1 \rightarrow a_2 \rightarrow \ldots$,
with $a_n \in A$ for all $n \in \mathbb{N}$.
Lemma If an ARS is strongly normalizing, it is weakly normalizing.
Proof: We prove this by proving the contraposition: if $\langle A, \rightarrow \rangle$ is not weakly
normalizing then $\langle A, \rightarrow \rangle$ is not strongly normalizing. Suppose $\langle A, \rightarrow \rangle$ is not
weakly normalizing. Then there is an $a_0 \in A$ without a normal form. Since $a_0$
has no normal form, certainly $a_0$ is not a normal form itself, so there is
an $a_1 \in A$ such that $a_0 \rightarrow a_1$. Now $a_0$ has no normal form, so $a_1$ cannot be a
normal form, nor can it have one. Thus we get an element $a_2 \in A$ such that $a_1 \rightarrow a_2$. Repeating
this process yields an infinite reduction $a_0 \rightarrow a_1 \rightarrow a_2 \rightarrow \ldots$.
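On a finite ARS both normalization properties can be decided mechanically, which makes it easy to experiment with the lemma. The sketch below assumes the relation is given as a dictionary from elements to their sets of one-step reducts (a representation chosen here, not taken from this report); the three example systems are invented. It uses the observation that in a finite system an infinite reduction must pass through some element twice.

```python
def reducts(step, a):
    """All b reachable from a in zero or more steps (depth-first search)."""
    seen, stack = {a}, [a]
    while stack:
        for y in step[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def is_normal_form(step, a):
    return not step[a]               # no one-step reduct

def weakly_normalizing(step):
    # Every element can reach at least one normal form.
    return all(any(is_normal_form(step, b) for b in reducts(step, a))
               for a in step)

def strongly_normalizing(step):
    # Finite case only: an infinite reduction exists exactly when some
    # element can reach itself in one or more steps.
    return not any(a in reducts(step, b) for a in step for b in step[a])

terminating = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
loop_with_exit = {"a": {"a", "b"}, "b": set()}   # a → a and a → b
pure_loop = {"a": {"a"}}                         # a → a only

for system in (terminating, loop_with_exit, pure_loop):
    print(strongly_normalizing(system), weakly_normalizing(system))
```

The second system shows that the converse of the lemma fails: it is weakly normalizing (every element reaches b) but not strongly normalizing.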
Definition of Unique Normal Form: The reduction relation $\rightarrow$ of an
ARS $\langle A, \rightarrow \rangle$ has the unique normal form property := for all $a, b, c \in A$
such that $a \twoheadrightarrow b$, $a \twoheadrightarrow c$, and $b$, $c$ are normal forms, we have $b \equiv c$.
Lemma An ARS < A, →> with the unique normal form property is not
always weakly normalizing.
Proof: For instance, the abstract reduction system with only element a ∈ A
and rewrite rule a → a has no normal forms, so it trivially has the unique
normal form property and is not weakly normalizing.
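The one-element system used in this proof is small enough to check by machine. The following sketch assumes a dictionary representation of the relation (element to set of one-step reducts), which is a choice made here for illustration; `fork` is an additional invented system that fails the property.

```python
def reducts(step, a):
    """All b reachable from a in zero or more steps."""
    seen, stack = {a}, [a]
    while stack:
        for y in step[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def normal_forms_of(step, a):
    """The normal forms reachable from a."""
    return {b for b in reducts(step, a) if not step[b]}

def has_unique_normal_forms(step):
    return all(len(normal_forms_of(step, a)) <= 1 for a in step)

def weakly_normalizing(step):
    return all(len(normal_forms_of(step, a)) >= 1 for a in step)

single_loop = {"a": {"a"}}                        # the system of the proof
fork = {"a": {"b", "c"}, "b": set(), "c": set()}  # two distinct normal forms

print(has_unique_normal_forms(single_loop), weakly_normalizing(single_loop))
print(has_unique_normal_forms(fork), weakly_normalizing(fork))
```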
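Definition of Local Confluence: A reduction relation $\rightarrow$ of an ARS
$\langle A, \rightarrow \rangle$ is called locally confluent or weakly confluent (also weakly Church-
Rosser) := for all $a, b, c \in A$ with $a \rightarrow b$ and $a \rightarrow c$ there exists a $d \in A$ such
that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$.
Definition of Confluence: A reduction relation $\rightarrow$ of an ARS $\langle A, \rightarrow \rangle$
is called confluent (or has the Church-Rosser property, or is Church-Rosser)
:= for all $a, b, c \in A$ with $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$ there exists a $d \in A$ such that
$b \twoheadrightarrow d$ and $c \twoheadrightarrow d$.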
Lemma If a reduction relation has the unique normal form property and is
weakly normalizing then it is confluent.
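Proof: Suppose we have $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$. Since $\rightarrow$ is weakly normalizing,
there are normal forms $b'$ and $c'$ such that $b \twoheadrightarrow b'$ and $c \twoheadrightarrow c'$. By transitivity
we also have $a \twoheadrightarrow b'$ and $a \twoheadrightarrow c'$, and thus by the unique normal form property
$b' \equiv c'$. Hence $b \twoheadrightarrow b'$ and $c \twoheadrightarrow b'$.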
Lemma If → is confluent then → has the unique normal form property.
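Proof: Suppose $a \twoheadrightarrow b$, $a \twoheadrightarrow c$, and $b$, $c$ are normal forms. By confluence,
there exists a $d$ such that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$. Since $b$ and $c$ are normal forms,
we must have $b \equiv d$ and $c \equiv d$, thus $b \equiv c$.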
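The difference between local confluence and confluence is easy to overlook, so a brute-force check on finite systems may help. The sketch assumes a dictionary representation (element to set of one-step reducts). The system `zigzag` is a well-known textbook counterexample that does not appear in this report: it is locally confluent but not confluent.

```python
def reducts(step, a):
    """All b reachable from a in zero or more steps."""
    seen, stack = {a}, [a]
    while stack:
        for y in step[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def joinable(step, b, c):
    """True if b and c have a common reduct d."""
    return bool(reducts(step, b) & reducts(step, c))

def locally_confluent(step):
    # Only one-step divergences a → b, a → c are considered.
    return all(joinable(step, b, c)
               for a in step for b in step[a] for c in step[a])

def confluent(step):
    # Divergences of arbitrary length are considered.
    return all(joinable(step, b, c)
               for a in step
               for b in reducts(step, a)
               for c in reducts(step, a))

diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
# b → a, b → c, c → b, c → d: a and d are distinct normal forms of b.
zigzag = {"a": set(), "b": {"a", "c"}, "c": {"b", "d"}, "d": set()}

print(locally_confluent(diamond), confluent(diamond))
print(locally_confluent(zigzag), confluent(zigzag))
```

In `zigzag` every one-step divergence can be joined, but b reduces to both normal forms a and d, so the unique normal form property fails and, by the last lemma, so does confluence.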
Syntax
Now that we have seen the basic principle of lambda calculus, we will give a
more formal definition. We formally define the syntax of the lambda calculus
by giving its grammar.
Definition of the Syntax of Lambda Terms:
Lambda Term $E := C \mid v \mid (E_1\ E_2) \mid (\lambda v . E)$, where
• $C$ ranges over a set of constants
(we will use the constant names $a, b, c, \ldots$ for elements of $C$)
• $v$ ranges over a (denumerable) set of variables (using $v, w, x, \ldots$)
• $(E_1\ E_2)$ denotes a combination involving the application of one expression
($E_1$) to another ($E_2$). The subexpression $E_1$ is referred to as the
operator and $E_2$ is referred to as the operand
• $(\lambda v . E)$ denotes an abstraction. Informally it denotes a function of $v$
which produces result $E$. The subexpression $E$ is referred to as the body
of the abstraction and $v$ is called the bound variable of the abstraction
We also call lambda terms simply ‘terms’ or ‘expressions’.
Notational conventions: to achieve a minimal notation, we drop parentheses
whenever possible, and assume:
• Association to the left for iterated application:
$F\ E_1\ E_2 \ldots E_n$ denotes $(\ldots((F\ E_1)\ E_2) \ldots E_n)$,
• Association to the right for iterated abstraction:
$\lambda x_1 . \lambda x_2 . \ldots \lambda x_n . E$ or shortly $\lambda x_1 x_2 \ldots x_n . E$
denotes $\lambda x_1 . (\lambda x_2 . (\ldots (\lambda x_n . E) \ldots))$.
Example: We can write the expression $F_1$ of the previous example as
$\lambda x y . 3x + y$, and $\lambda v . E_1\ E_2$ means $(\lambda v . (E_1\ E_2))$.
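To make the grammar and the two association conventions concrete, here is a minimal sketch of one possible machine representation. The tagged-tuple encoding and the helper names `const`, `var`, `app`, `lam` and `show` are assumptions made for this illustration; they are not part of the formal definition.

```python
def const(name):
    return ("const", name)

def var(name):
    return ("var", name)

def app(f, *args):
    """F E1 ... En stands for (...((F E1) E2) ... En): association to the left."""
    for a in args:
        f = ("app", f, a)
    return f

def lam(*names_and_body):
    """λx1 ... xn . E stands for λx1 . (λx2 . (... (λxn . E))): association to the right."""
    *names, body = names_and_body
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def show(t):
    """Render a term with all parentheses written out."""
    tag = t[0]
    if tag in ("const", "var"):
        return t[1]
    if tag == "app":
        return f"({show(t[1])} {show(t[2])})"
    return f"(λ{t[1]} . {show(t[2])})"

F, E1, E2 = var("F"), var("E1"), var("E2")
left_assoc = app(F, E1, E2)
right_assoc = lam("x", "y", app(var("x"), var("y")))

print(show(left_assoc))
print(show(right_assoc))
```

The helpers accept the abbreviated notation, while `show` prints the fully parenthesized term that the abbreviation denotes.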
Free/Bound Variables and α-conversion
We distinguish between free and bound occurrences of variables in an expression.
An occurrence of $v$ in $E$ is said to be bound if it occurs within a
subexpression of $E$ with the form $\lambda v . E_1$, and the occurrence is said to be
free otherwise.
Example: n occurs free in λx . (x + 1) n, whereas x occurs bound in this
expression. Both n and x occur bound in λn . (λx . x + 1) n. Further x
occurs both bound and free in (λx . x +1) x (the second occurrence of ‘x’
in this expression is bound, the third occurrence is free).
Definition of free variables: The free variables of a term $E$, denoted by
$FV(E)$, form a set of variables defined recursively by:
• $FV(C) = \emptyset$,
• $FV(v) = \{v\}$,
• $FV(E_1\ E_2) = FV(E_1) \cup FV(E_2)$,
• $FV(\lambda v . E) = FV(E) \setminus \{v\}$.
An expression E is said to be closed if FV (E) = ∅.
Example: The expression λz . (λx . z +x)(λy . y z) is closed.
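The recursive definition of FV translates almost literally into code. The sketch below assumes that terms are encoded as tagged tuples `("const", c)`, `("var", v)`, `("app", E1, E2)` and `("lam", v, E)`, and that `x + y` abbreviates the application of a constant `+` to `x` and `y`; both are conventions chosen here for illustration.

```python
def fv(t):
    """Free variables of a tuple-encoded lambda term."""
    tag = t[0]
    if tag == "const":
        return set()                      # FV(C) = ∅
    if tag == "var":
        return {t[1]}                     # FV(v) = {v}
    if tag == "app":
        return fv(t[1]) | fv(t[2])        # FV(E1 E2) = FV(E1) ∪ FV(E2)
    if tag == "lam":
        return fv(t[2]) - {t[1]}          # FV(λv . E) = FV(E) \ {v}
    raise ValueError(f"not a lambda term: {t!r}")

def plus(a, b):
    """a + b, read as the constant + applied to a and then to b."""
    return ("app", ("app", ("const", "+"), a), b)

x, y, z, n = (("var", name) for name in "xyzn")
one = ("const", "1")

t1 = ("lam", "x", ("app", plus(x, one), n))       # λx . (x + 1) n
t2 = ("app", ("lam", "x", plus(x, one)), x)       # (λx . x + 1) x
t3 = ("lam", "z", ("app", ("lam", "x", plus(z, x)),
                   ("lam", "y", ("app", y, z))))  # λz . (λx . z + x)(λy . y z)

print(fv(t1), fv(t2), fv(t3))
```

The last term has no free variables, so it is closed, in agreement with the example above.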
α-conversion
We consider two terms as ‘equivalent’ if they differ only in the names of their bound
variables. So λx . x and λy . y are considered equivalent. But we must
distinguish λx . y +x and λy . y +y, since one has a free occurrence of y and
the other not. Note also that λxy . xy and λxy . yx are not equivalent. The
renaming process is called α-conversion, and allows us to change the name
of a bound variable, as long as we do so consistently. It is formally defined
as the equivalence relation generated by the following reduction:
Definition of α-reduction: $\lambda x . E \rightarrow_\alpha \lambda y . E'$, where $E'$ is obtained from
$E$ by replacing all free occurrences of $x$ in $E$ by $y$, provided $y$ is fresh, that is,
$y$ occurs neither as a free variable nor as a bound variable in the expression
$E$ (i.e. it does not occur in $E$).
Expressions that can be made textually equivalent by renaming bound
variables are called α-convertible or alpha(betically) equivalent. When two
lambda terms $E_1$ and $E_2$ are α-convertible in this sense we write $E_1 \equiv_\alpha E_2$,
and often also $E_1 \equiv E_2$.
Example: Some α-conversions:
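$\lambda x . x + 1 \equiv_\alpha \lambda y . y + 1$
$\lambda x . (\lambda y . y\,x) \not\equiv_\alpha \lambda y . (\lambda y . y\,y)$ (because after renaming $x$ to $y$, the renamed occurrence in $(\lambda y . y\,x)$ would get
bound by the inner $\lambda y$)
$\lambda x . (\lambda y . x\,y)\,y \equiv_\alpha \lambda x . (\lambda z . x\,z)\,y$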
From now on, two λ-terms are considered (syntactically) equal if they are
α-convertible to each other.
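One way to decide α-convertibility mechanically is to compare two terms while remembering, for every bound variable, which enclosing λ binds it. The sketch below does this under assumptions of our own: terms are tagged tuples `("const", c)`, `("var", v)`, `("app", E1, E2)`, `("lam", v, E)`, and the function names are hypothetical. It is a different technique from the renaming-based definition above, but it identifies the same terms.

```python
def binder_index(name, bound):
    """Position of the innermost enclosing binder of `name`, or None if free."""
    for i in range(len(bound) - 1, -1, -1):
        if bound[i] == name:
            return i
    return None

def alpha_eq(s, t, bs=(), bt=()):
    """True if s and t differ only in the names of their bound variables."""
    if s[0] != t[0]:
        return False
    if s[0] == "const":
        return s[1] == t[1]
    if s[0] == "var":
        i, j = binder_index(s[1], bs), binder_index(t[1], bt)
        if i is None and j is None:
            return s[1] == t[1]          # both free: names must agree
        return i == j                    # both bound by the same binder
    if s[0] == "app":
        return alpha_eq(s[1], t[1], bs, bt) and alpha_eq(s[2], t[2], bs, bt)
    # Abstraction: record the two binder names side by side.
    return alpha_eq(s[2], t[2], bs + (s[1],), bt + (t[1],))

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")

identity_x = ("lam", "x", x)
identity_y = ("lam", "y", y)
inner_free = ("lam", "x", ("lam", "y", ("app", y, x)))    # λx . (λy . y x)
captured = ("lam", "y", ("lam", "y", ("app", y, y)))      # λy . (λy . y y)
left = ("lam", "x", ("app", ("lam", "y", ("app", x, y)), y))   # λx . (λy . x y) y
right = ("lam", "x", ("app", ("lam", "z", ("app", x, z)), y))  # λx . (λz . x z) y

print(alpha_eq(identity_x, identity_y),
      alpha_eq(inner_free, captured),
      alpha_eq(left, right))
```

The second comparison fails because in `captured` both occurrences of y refer to the inner binder, whereas in `inner_free` the variable x refers to the outer one.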
Substitution
We now formally define the concept of substitution of a variable in lambda
terms.
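Definition of Substitution: The substitution of expression $E$ for each free
occurrence of $v$ in expression $E_0$, denoted by $E_0[E/v]$, is defined by induction
on the structure of $E_0$ as:
• $C[E/v] \equiv C$
• $x[E/v] \equiv \begin{cases} E & \text{if } x \equiv v \\ x & \text{if } x \not\equiv v \end{cases}$
• $(E_1\ E_2)[E/v] \equiv (E_1[E/v])(E_2[E/v])$
• $(\lambda x . E_1)[E/v] \equiv \begin{cases} \lambda x . E_1 & \text{if } x \equiv v \\ \lambda x . (E_1[E/v]) & \text{if } x \not\equiv v \text{ and } x \notin FV(E) \\ \lambda y . ((E_1[y/x])[E/v]) & \text{if } x \not\equiv v \text{ and } x \in FV(E) \text{ and } y \notin FV(E_1\ E) \end{cases}$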
Example: (λx . z+7x)[x+3/z] ≡ λy . (z+7y)[x+3/z] ≡ λy . (x+3)+7y.
The following lemma tells us that substitution behaves well; it can be
proven by induction on the structure of λ-terms.
Lemma For all terms $E_0$, $E_1$, $E_2$ and variables $x$, $y$ such that $x \not\equiv y$ and $x \notin FV(E_2)$:
$E_0[E_1/x][E_2/y] \equiv E_0[E_2/y][E_1[E_2/y]/x]$.
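The definition of substitution can be turned into a short recursive function. The sketch below assumes a tagged-tuple encoding of terms (`("const", c)`, `("var", v)`, `("app", E1, E2)`, `("lam", v, E)`) and generates fresh names of the form `v0`, `v1`, ...; both are choices made here for illustration. As a small precaution beyond the definition, the fresh name is also required to differ from the substituted variable.

```python
from itertools import count

def fv(t):
    """Free variables of a tuple-encoded term."""
    tag = t[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}

def subst(t, e, v):
    """Return t[e/v]: e replaces every free occurrence of v in t."""
    tag = t[0]
    if tag == "const":
        return t
    if tag == "var":
        return e if t[1] == v else t
    if tag == "app":
        return ("app", subst(t[1], e, v), subst(t[2], e, v))
    x, body = t[1], t[2]
    if x == v:                      # v is bound here: nothing to replace
        return t
    if x not in fv(e):              # no capture possible
        return ("lam", x, subst(body, e, v))
    # x occurs free in e: rename the bound variable to a fresh y first.
    avoid = fv(body) | fv(e) | {v}
    y = next(f"v{i}" for i in count() if f"v{i}" not in avoid)
    return ("lam", y, subst(subst(body, ("var", y), x), e, v))

def binop(op, a, b):
    """a op b, read as the constant op applied to a and then to b."""
    return ("app", ("app", ("const", op), a), b)

x, z = ("var", "x"), ("var", "z")
three, seven = ("const", "3"), ("const", "7")

# (λx . z + 7x)[x + 3 / z]
term = ("lam", "x", binop("+", z, binop("*", seven, x)))
result = subst(term, binop("+", x, three), "z")
print(result)
```

The bound x is renamed (here to `v0`, where the text uses y), so the free x of the substituted expression x + 3 is not captured.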
Reduction System for the Lambda Calculus
As we have seen with an example at the beginning of this section, the
main rule for the lambda calculus is the beta reduction rule, that we can now
formally define.
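Definition of β-reduction: β-reduction is the compatible relation generated by
$(\lambda v . E_1)\,E_2 \rightarrow_\beta E_1[E_2/v]$, with the rules:
$$\frac{E_1 \rightarrow_\beta E_2}{E_1\,E \rightarrow_\beta E_2\,E} \qquad \frac{E_1 \rightarrow_\beta E_2}{E\,E_1 \rightarrow_\beta E\,E_2} \qquad \frac{E_1 \rightarrow_\beta E_2}{\lambda v . E_1 \rightarrow_\beta \lambda v . E_2}$$
As before, any term matching the left-hand side of the rule is called a redex
and thus any expression of the form $(\lambda v . E_1)\,E_2$ is called a β-redex.
β-reduction is a reduction relation $\rightarrow_\beta$ of the pure lambda calculus. We often
write $\rightarrow$ resp. $\twoheadrightarrow$ instead of $\rightarrow_\beta$ and $\twoheadrightarrow_\beta$. We use $=_\beta$ (or sometimes simply
$=$) to denote the equivalence relation generated by $\rightarrow_\beta$. Note the difference
between $\equiv_{(\alpha)}$ and $=_{(\beta)}$.
Example: $(\lambda n x . (x + 1)\,n)\ 7\ 4 \rightarrow_\beta (\lambda x . (x + 1)\,7)\ 4 \rightarrow_\beta (4 + 1)\,7$.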
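Example: This example illustrates the need for α-conversion during β-reduction,
even if distinct names are chosen from the start. Define TWICE $\equiv \lambda f . \lambda x . f(f x)$, then
$(\lambda y . y\,y)\ \mathrm{TWICE}$
$\rightarrow_\beta \mathrm{TWICE}\ \mathrm{TWICE}$
$\equiv (\lambda f . \lambda x . f(f x))\ \mathrm{TWICE}$
$\rightarrow_\beta \lambda x . \mathrm{TWICE}\ (\mathrm{TWICE}\ x)$
$\equiv \lambda x . \mathrm{TWICE}\ ((\lambda f . \lambda x . f(f x))\ x)$
$\rightarrow_\beta \lambda x . \mathrm{TWICE}\ ((\lambda x . f(f x))[x/f])$ (Note the name clash)
$\equiv_\alpha \lambda x . \mathrm{TWICE}\ ((\lambda y . f(f y))[x/f])$
$\equiv \lambda x . \mathrm{TWICE}\ (\lambda y . x(x y))$
$\rightarrow_\beta \ldots$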
Example:
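1. $(\lambda x . x + 1)\ ((\lambda y . y\,y)\ 3) \twoheadrightarrow_\beta$ (two possibilities) $(3\ 3) + 1$,
so different reduction paths are possible.
2. $\Omega \equiv (\lambda x . x x)(\lambda x . x x) \rightarrow_\beta (\lambda x . x x)(\lambda x . x x) \rightarrow_\beta \ldots$, thus infinite
sequences of steps are possible: β-reduction is not always terminating.
This corresponds to ‘self-reproducing programs’.
3. $(\lambda x . x x x)(\lambda x . x x x) \rightarrow_\beta (\lambda x . x x x)(\lambda x . x x x)(\lambda x . x x x) \rightarrow_\beta \ldots$, so
terms can even become arbitrarily large.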
4. (λy . c)((λx . xxx)(λx . xxx)) → c, but also
(λy . c)((λx . xxx)(λx . xxx)) → (λy . c)((λx . xxx)(λx . xxx)(λx . xxx))
and the latter term can be reduced to c or again to a longer term, etc.
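The examples above can be replayed with a small reducer. The sketch below assumes a tagged-tuple encoding of terms (`("const", c)`, `("var", v)`, `("app", E1, E2)`, `("lam", v, E)`) and always contracts the leftmost outermost redex; this strategy and the step limit in `normalize` are choices made for this illustration, since the definition of β-reduction itself does not prescribe any order.

```python
from itertools import count

def fv(t):
    tag = t[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}

def subst(t, e, v):
    """t[e/v], renaming bound variables where capture would occur."""
    tag = t[0]
    if tag == "const":
        return t
    if tag == "var":
        return e if t[1] == v else t
    if tag == "app":
        return ("app", subst(t[1], e, v), subst(t[2], e, v))
    x, body = t[1], t[2]
    if x == v:
        return t
    if x not in fv(e):
        return ("lam", x, subst(body, e, v))
    avoid = fv(body) | fv(e) | {v}
    y = next(f"v{i}" for i in count() if f"v{i}" not in avoid)
    return ("lam", y, subst(subst(body, ("var", y), x), e, v))

def step(t):
    """One leftmost outermost β-step, or None if t is a β-normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                     # t is a β-redex (λv . E1) E2
            return subst(f[2], a, f[1])
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t, max_steps=50):
    """Reduce until a normal form appears; give up (None) after max_steps."""
    for _ in range(max_steps):
        r = step(t)
        if r is None:
            return t
        t = r
    return None

x = ("var", "x")
delta = ("lam", "x", ("app", x, x))                  # λx . xx
omega = ("app", delta, delta)                        # Ω
triple = ("lam", "x", ("app", ("app", x, x), x))     # λx . xxx
growing = ("app", triple, triple)                    # example 3
k_c = ("lam", "y", ("const", "c"))                   # λy . c

print(step(omega) == omega)
print(normalize(("app", k_c, growing)))
```

With this strategy the term of example 4 reaches c in one step, because the outermost redex discards its argument; a strategy that reduced the argument first would never terminate on it.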
Although we already saw that λ-calculus is neither weakly nor strongly normalizing,
it does have the important confluence property. First we introduce
the following definition of the diamond property that we use to prove that
$\rightarrow_\beta$ is confluent. To prevent confusion in the notation we will from now on
also use the implication symbol ⇒.
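Definition of the Diamond Property: A binary relation $\rightarrow$ on the
lambda terms $\Lambda$ satisfies the diamond property, notation $\rightarrow \models \Diamond$ :=
$(\forall M, M_1, M_2 : M, M_1, M_2 \in \Lambda : (M \rightarrow M_1 \wedge M \rightarrow M_2) \Rightarrow (\exists M_3 : M_3 \in \Lambda : M_1 \rightarrow M_3 \wedge M_2 \rightarrow M_3))$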
Note that a reduction $\rightarrow_\beta$ has the Church-Rosser property precisely when its
transitive, reflexive closure $\twoheadrightarrow_\beta$ satisfies the diamond property.
Lemma: Let $\rightarrow$ be a binary relation on a set $\Lambda$ with $\twoheadrightarrow$ its transitive,
reflexive closure and let $\rightarrow \models \Diamond$. Then $\twoheadrightarrow \models \Diamond$.
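Proof: Assume $\rightarrow$ is a binary relation on a set $\Lambda$ with $\twoheadrightarrow$ its transitive,
reflexive closure, and $\rightarrow \models \Diamond$. We now have to prove that $\twoheadrightarrow \models \Diamond$. Suppose
$M, L, K \in \Lambda$, $M \twoheadrightarrow L$ and $M \twoheadrightarrow K$. We then have to prove $(\exists N : N \in \Lambda : L \twoheadrightarrow N \wedge K \twoheadrightarrow N)$. Let
(*) $M \equiv M_0 \rightarrow M_1 \rightarrow \ldots \rightarrow M_n \equiv L$, for some $n \in \mathbb{N}$
(**) $M \equiv K_0 \rightarrow K_1 \rightarrow \ldots \rightarrow K_m \equiv K$, for some $m \in \mathbb{N}$
We now need to apply a technique called induction loading (for more
information see the links on http://zax.mine.nu/stage/) to prove that $K$ and $L$
have a common reduct $N$. To be precise, we show that $l(m, n)$ holds for all
$m, n \in \mathbb{N}$, with
$l(m, n)$ := there exist $N(i, j) \in \Lambda$, with $i, j \in \mathbb{N}$ and $0 \le i \le n \wedge 0 \le j \le m$, such that:
(a) $N(i, 0) \equiv M_i$ if $0 \le i \le n$
(b) $N(0, j) \equiv K_j$ if $0 \le j \le m$
(c) $N(i, j) \rightarrow N(i, j + 1)$ if $0 \le i \le n \wedge 0 \le j < m$
(d) $N(i, j) \rightarrow N(i + 1, j)$ if $0 \le i < n \wedge 0 \le j \le m$
Clearly, when $l(m, n)$ is true for all $m, n \in \mathbb{N}$, we know that $K$ and $L$ have
a common reduct, namely $N(n, m)$: by (a) and (c) we have $L \equiv N(n, 0) \twoheadrightarrow N(n, m)$, and by (b) and (d) we have $K \equiv N(0, m) \twoheadrightarrow N(n, m)$. So the only remaining proof obligation is to show that
$l(m, n)$ holds for all $m, n \in \mathbb{N}$. We prove this by induction on $n$.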
Base case (n): $n = 0$
(a) let $N(0, 0)$ be $M_0$, then (a) holds trivially by reflexivity of ‘$\equiv$’.
(b) let $N(0, j)$ be $K_j$ for $0 \le j \le m$, then (b) also holds trivially.
Note that this is valid in combination with the definition under (a)
since $N(0, 0) \equiv M_0 \equiv M \equiv K_0$.
(c) $N(i, j) \rightarrow N(i, j + 1)$ holds because $i = 0$ and (**).
(d) $N(i, j) \rightarrow N(i + 1, j)$ holds trivially because $n = 0$ yields an empty range for $i$.
Induction case (n): Induction hypothesis (i.h.-n): suppose that for $n = k$,
$k \in \mathbb{N}$, for all $m \in \mathbb{N}$ the statement $l(m, n)$ is true. We now prove the
statement for $n = k + 1$. We do this by induction on $m$.
Base case (m): $m = 0$
(a) let $N(k + 1, 0)$ be $M_{k+1}$ for $0 \le k \le m$, then (a) holds trivially.
(b) since $j = 0$ this amounts to $N(0, 0) \equiv K_0$.
This is true because of our previous definition of $N(0, 0) \equiv M_0$
and the fact that $M_0 \equiv M \equiv K_0$.
(c) holds trivially, because $m = 0$ yields an empty range for $j$.
(d) $N(i, j) \rightarrow N(i + 1, j)$ because $j = 0$ and (*).
Induction case (m): Induction hypothesis (i.h.-m): suppose that for $m = r$
and $n = k + 1$, $r \in \mathbb{N}$, the statement $l(m, n)$ is true. We now prove the
statement for $m = r + 1$.
(a) $N(i, 0) \equiv M_i$ for $0 \le i \le k + 1$ follows from i.h.-n.
(b) $N(0, j) \equiv K_j$ for $0 \le j \le r + 1$ follows from i.h.-m.
(c) and (d)
We already know from the induction hypotheses that $N(i, j) \rightarrow N(i, j + 1)$ holds for $(0 \le i \le k + 1 \wedge 0 \le j < r) \vee (0 \le i < k \wedge 0 \le j < r + 1)$. What we now have to show is that this is also true for
$i = k + 1$ and $j = r + 1$. We know by (c) of i.h.-m that there exists an $N(k, r)$
such that $N(k, r) \rightarrow N(k, r + 1)$. We also know by (d) of i.h.-n that
there exists an $N(k, r)$ such that $N(k, r) \rightarrow N(k + 1, r)$. Then by the
diamond property of $\rightarrow$ we know $(\exists N(k + 1, r + 1) : N(k + 1, r + 1) \in \Lambda : N(k, r + 1) \rightarrow N(k + 1, r + 1) \wedge N(k + 1, r) \rightarrow N(k + 1, r + 1))$.
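The statement of the lemma can be tested by brute force on small finite relations. The sketch below assumes that a relation is given as a set of pairs over an explicit set of elements; the element names and the two example relations are invented for this illustration and have nothing to do with lambda terms.

```python
def successors(rel, x):
    return {b for (a, b) in rel if a == x}

def has_diamond(rel, elements):
    """M → M1 and M → M2 imply a common M3 with M1 → M3 and M2 → M3."""
    return all(bool(successors(rel, m1) & successors(rel, m2))
               for m in elements
               for m1 in successors(rel, m)
               for m2 in successors(rel, m))

def refl_trans_closure(rel, elements):
    """Smallest reflexive and transitive relation containing rel."""
    closure = set(rel) | {(x, x) for x in elements}
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        changed = not new <= closure
        closure |= new
    return closure

elements = {"M", "L", "K", "N"}
# M → L, M → K, L → N, K → N, N → N
tile = {("M", "L"), ("M", "K"), ("L", "N"), ("K", "N"), ("N", "N")}
# M → L, M → K and nothing else: L and K cannot be joined.
fork = {("M", "L"), ("M", "K")}

print(has_diamond(tile, elements),
      has_diamond(refl_trans_closure(tile, elements), elements))
print(has_diamond(fork, elements),
      has_diamond(refl_trans_closure(fork, elements), elements))
```

Note that the diamond property also applies to the case where M1 and M2 are the same element, which is why `tile` needs the step N → N.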
We can now sketch the proof of the following fundamental theorem of
the untyped lambda calculus (the lines of the proof are due to W. Tait and P. Martin-Löf, see [6, section 3.2], but
as far as I know this is the first proof that formalizes the above lemma to a reasonable
extent):
Theorem (Church, Rosser): $\rightarrow_\beta$ is confluent.
Proof: By the previous lemma, we know that if any binary relation on a set
satisfies the diamond property, its transitive reflexive closure also satisfies the
diamond property. Suppose we have a binary relation $\rightarrow_{\text{partial-}\beta}$ on the set $\Lambda$
such that $\twoheadrightarrow_\beta$ is the transitive reflexive closure of $\rightarrow_{\text{partial-}\beta}$. So if we prove
that $\rightarrow_{\text{partial-}\beta}$ satisfies the diamond property, by application of the previous
lemma we have proved that $\twoheadrightarrow_\beta$ satisfies the diamond property, i.e. $\rightarrow_\beta$ is
confluent.
A concrete definition of $\rightarrow_{\text{partial-}\beta}$, a proof that its transitive reflexive closure
is indeed $\twoheadrightarrow_\beta$, and a proof that $\rightarrow_{\text{partial-}\beta}$ satisfies the diamond property can
be found on pages 60-62 of [6].
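Theorem: λ-calculus has the unique normal form property.
Proof: Suppose that a term $a$ of $\langle \Lambda, \rightarrow \rangle$ has two normal forms, $n_1 \in \Lambda$
and $n_2 \in \Lambda$. This means there is no $b \in \Lambda$ such that $n_1 \rightarrow b$ or $n_2 \rightarrow b$.
But $a \twoheadrightarrow n_1 \wedge a \twoheadrightarrow n_2$, and then by the Church-Rosser property we know
$(\exists c : c \in \Lambda : n_1 \twoheadrightarrow c \wedge n_2 \twoheadrightarrow c)$. Since $n_1$ and $n_2$ cannot be reduced any further, we must have $n_1 \equiv c \equiv n_2$.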
Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . .
Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form.
Confluence is a fundamental property for functional programming; we rely
on this when we evaluate programs by rewriting, knowing that we never have
to backtrack an evaluation (this is also one of the main differences with logic
programming).
In the λ-calculus we have defined in this section, we can represent natural
numbers and basic operations on the natural numbers. We will not show
this here; in most books on the lambda calculus there are some examples of
how to do basic arithmetic in lambda calculus. The λ-calculus represents a
certain class of (partial) functions on the integers. By a classical result of the
American mathematician Stephen C. Kleene (1909-1994) this is exactly the
set of (partial) recursive functions. The proof can be found in [6, theorem
9.2.16]. Church also thought of the set of functions that could be calculated
in his λ-calculus, and conjectured the following thesis:
Church’s thesis (1936) The set of effectively computable functions, i.e. functions
that intuitively (effectively) can be computed, is the same as the set of
functions that can be defined in λ-calculus.
A more formal version and detailed treatment of Church’s thesis can be
found in section 9.3.
Alan Turing proved in 1937 that the class of Turing computable functions is
the same as the class of functions definable in λ-calculus.
So the power of Turing Machines is the same as the power of λ-calculus.
Both models capture the intuitive idea of computation. This important thesis
is the subject of the next section.
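As an illustration of the remark earlier in this section that natural numbers and basic operations can be represented in the λ-calculus, the following sketch uses the standard Church numeral encoding, written with Python's own lambda expressions. The encoding is the one found in most textbooks and is not developed in this report; the helper names `church` and `to_int` are ours and only serve to convert to and from ordinary integers.

```python
# Church numerals: the number n is the function that applies f n times.
zero = lambda f: lambda x: x                          # λf . λx . x
succ = lambda n: lambda f: lambda x: f(n(f)(x))       # λn . λf . λx . f (n f x)
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def church(k):
    """Build the Church numeral for the ordinary integer k >= 0."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    """Read a Church numeral back by counting applications of f."""
    return n(lambda k: k + 1)(0)

two, three = church(2), church(3)
print(to_int(add(two)(three)), to_int(mul(two)(three)))
```

Python evaluates these expressions eagerly rather than by β-reduction on terms, so this shows the encoding at work but is not itself a lambda calculus reducer.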
9.3 The Church-Turing thesis
The Church-Turing thesis concerns the intuitive notion of algorithm (or ef-
fective or mechanical method) in logic and mathematics. The notion of an
algorithm or an effective method is an informal one, and attempts to char-
acterize this effectiveness lacked rigor, mainly because the key requirement
that the method demands no insight or ingenuity is left unexplicated.
One of Turing’s achievements in his paper of 1936 (reprinted in [19]
and available online at http://www.abelard.org/turpap2/tp2-ie.asp) was to
present a formally exact predicate with which the informal predicate ‘can be
calculated by means of an algorithm or effective method’ may be replaced.
The formal concept proposed by Turing is that of computability by a Turing
Machine (see section 9.1). He introduced this thesis in [90] in the course of
arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolv-
able.
Turing’s thesis: TMs can do anything that could be described as intuitively
computable.
Church also presented in [14] a formally exact way to express this no-
tion of intuitively computable. Turing’s method was however more obvious
and more general than Church’s, since the latter only considered functions
of positive integers. In order to calculate the values of the function Church
introduced his lambda calculus and specified the notion of a recursive func-
tion (see section 9.2).
Church’s thesis: A function of positive integers is effectively computable
only if it is recursive
The reverse implication is also referred to as the converse of Church’s
thesis. The class of lambda-definable functions and the class of recursive
functions were later shown to be identical. This was established in the case
of functions of positive integers by Church and the American mathematician
Kleene (see [47], [14]). After learning of Church’s proposal, Turing quickly
established that the apparatus of lambda-definability and his own apparatus
of computability were equivalent ([89], page 263).
Theorem: Lambda-definability and Turing Machine-computability
are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s
lambda calculus can compute the same set of functions.
Although Turing and Church had chosen different ways to formalize the
intuitive notion of effective computability, respectively by identifying the no-
tion with that of computability by a Turing Machine and in the lambda cal-
culus, both methods are equivalent. After this proof of equivalence, Kleene
introduced the term ‘Church-Turing thesis’ to refer to any of the two equiv-
alent theses ([48], page 232).
Church-Turing thesis: The intuitive notion of an algorithm equals the
Turing Machine algorithm or (equivalently) the calculable functions of lambda-
calculus.
There are a number of misunderstandings of the Church-Turing thesis,
collected in [16]; Turing did not show that
• Any problem can be solved ‘by instructions, explicitly stated rules or
procedures’
• A universal TM ‘can compute any function that any computer, with
any architecture can compute’ (Turing said nothing about the limits of
what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in
accordance with a finite program of instructions) is Turing-machine-
computable (this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description
(or a ‘precise enough characterization of a set of steps’, or that is
‘scientifically describable’ or ‘scientifically explicable’) can be simulated
by a TM (this is known as Thesis S, see [16])
Since the word ‘computable’ is often tied by definition to effective calcu-
lability, the Church-Turing thesis is often stated as ‘All computable functions
are computable by a Turing Machine’ (a function is said to be computable if
and only if there is an effective procedure for determining its values).
If we summarize the above, we can say that to define the concept of an
algorithm, Church used a notational system, the lambda calculus. Turing did
the same with his theoretical computing device, the Turing Machine. On the
face of it they seemed very different from one another, but these two definitions
turned out to be equivalent, in the sense that each picks out the same set
of mathematical functions. The Church-Turing thesis is the assertion that
this set contains every function whose values can be obtained by a method
or algorithm corresponding to our intuitive notion of effectively computable.
Clearly, if there were functions of which an informal (intuitive) statement,
but not the formal statement, were true, then the latter would be less general
than the former and so could not reasonably be employed to replace
it. When the thesis is expressed in terms of the formal concept by Turing,
it is appropriate to refer to the thesis also as the Turing thesis, and likewise
for the case of Church. It is widely agreed amongst mathematicians and logicians
that ‘computable by means of a TM’ is the correct rendering of the
informal notion in question.
Chapter 10
Conclusion
It is a profoundly erroneous truism, repeated by all copy books
and by eminent people, when they are making speeches, that we
should cultivate the habit of thinking of what we are doing. The
precise opposite is the case. Civilization advances by extending
the number of important operations which we can perform with-
out thinking about them. . . . The study of mathematics is apt to
commence in disappointment . . . We are told that by its aid the
stars are weighed and the billions of molecules in a drop of water
are counted. Yet, like the ghost of Hamlet’s father, this greatest
science eludes the efforts of our mental weapons to grasp it.
- A. Whitehead, in [99]
When I started my study on the foundations of mathematics, I did not
quite know what to expect. By now I’ve learned that the foundations of
mathematics can be a fascinating and important subject. Learning this new
subject was an interesting challenge, but sometimes hard work when I had
to go through numerous books that were full of details or too vague and
philosophical. Most books that I found on the foundations of mathematics
were either very detailed and descriptive (the book [31] of I. Grattan-Guinness
has an unmatched level of detail and exactness) or treat only a part
of the theory that was developed from 1890 to 1940 (for example, [17] gives
an excellent introduction to set theory). One of the better, though relatively
unknown, books is that of G.T. Kneebone [49], which is quite complete and still
considerably theoretical. One of the motivations to write this article was to
present the theory properly. Hopefully that makes it more clear and enjoy-
able. Some of the good literature used, such as the books just mentioned,
will be found in the references at the end of this report.
At the same time, I also tried to briefly introduce the reader to the historical
context of the most important developments. Most undergraduate
courses I have taken gave little or no information about the history that
lies behind the theory. Emphasis was laid on the accumulation of mathematical
knowledge. I believe that the history of mathematics in education
can not only make the study of mathematics more interesting, but also help
in the growth of mathematical understanding and appreciation of the current
form of the theory.
I want to conclude this report with a summary of the theory and my own
view on the project, and with some ideas for future work.
The project
At the beginning of the 20th century Hilbert said we should formalize all
of mathematics and mathematical reasoning. This ‘project’ (from now on I will
refer to it as the project) has been the central theme of this report. When
reading about the work and biographies of all those brilliant men who have
applied themselves to this problem, you can (at least that’s what happened to
me) get caught up in this fascinating philosophical question.
To most people however, this all seems very impractical. We all know
you can make a popular operating system or start your own business on the
web and in one year make a million dollars if you’re lucky. And when it
comes to verifying mathematical proofs and making reliable software, a for-
mal basis is rarely used, the human mind is still the most important, and
other techniques, such as model-checking, are preferred. It might be worth
writing another article, on how and why in that respect the more practical,
working mathematicians and more theoretical logicians (or formalists, if you
prefer) grew apart. But let’s first go back to the project.
The attempt to formalize mathematical reasoning is not new - the Greeks
already thought rationality was the supreme goal. We can think of Plato
and Reason, or as Russell (see footnote 1) would say - think of Pythagoras and Rationality!
Aristotle made a big step in formalizing reasoning, with his patterns
of reasoning that are known as syllogisms. Ever since, logic was further
developed, and important contributions came from De Morgan, Leibniz and
especially Boole. Because he was interested in theology and God (see [31,
chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with
the notion of the infinite, and developed his theory of infinite sets. With Cantor
mathematics got more abstract, and some people regarded his set theory
as a disease. Poincaré, the great French mathematician, said (see footnote 2; quoted from [95]):
“Later generations will regard Mengenlehre (set theory) as a disease from
which one has recovered.” Peano and Frege, as we have learned in chapter
4, brought mathematical reasoning to an even higher level of formalization.
So far, so good. But there turned out to be some problems, and although
Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was
Russell who spread the bad news to everyone, by stating his Russell paradox.
At that point Hilbert proposed to use a formal axiomatic method to solve
these problems, and he gave his famous three requirements of consistency,
completeness and decidability.
This proposal of Hilbert to formalize mathematics led to the development
of several axiomatic systems, such as those of Zermelo and Fraenkel, and of
Gödel, Bernays and von Neumann. Russell and Whitehead made their own attempt
to formalize mathematics, with their theory of types. But although
all of these attempts were fruitful to a certain extent, in total they all failed,
and it took Gödel and Turing to show that in fact ‘the project’ couldn’t
be done. Formalizing mathematics so that we have absolute truth is not
possible! But these works of Gödel and Turing were new and complicated,
and not everyone clearly recognized their importance. And even nowadays, few
people are familiar with the details of their work, and we often see confusion
between notions like ‘checking the proof of a statement’ and ‘checking
whether a statement is true (or not)’. There is also much confusion about the
exact implications of Gödel’s and Turing’s work. Gödel constructed, for any
consistent axiomatic system that contains arithmetic, a statement of arithmetic
that is not provable within that system. Turing
later formalized the notion of computability to show there is no mechanical
procedure to decide if a statement is correct or not.
Footnote 1: Although rationality is more commonly associated with Plato, Russell always insisted
on attributing it to Pythagoras (see [62]).
Footnote 2: Whether or not he actually said this is a matter of debate amongst historians of
mathematics.
At first this was a shock, but then mathematicians were saying (and
again it would be nice to write an article about the different responses of
mathematicians and logicians): so what - we should do mathematics exactly
the same way as we’ve always done it, this does not apply to the problems
I care about. Indeed mathematicians continued with their work, and the
theorems of Gödel and Turing had little or no impact in practice on how
we (should) do mathematics. The only effect the project might have had on
working mathematicians, is that they have become a bit more precise in the
use of language and in writing their proofs. Some of course were inspired
by problems like the 23 of Hilbert. But there has been another consequence
of all this theoretical work, that I was made aware of through a videotaped
lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt
to formalize all mathematics after the publications of the theorems of Gödel
and Turing: “It failed in that precise technical sense. But in fact it succeeded
magnificently, not formalization of reasoning, but formalization of algorithms
has been the great technological success of our time - computer programming
languages! So if you look at the history of the beginning of this century you’ll
see papers by logicians studying the foundations of mathematics in which
they had predicate calculi. Now you look back and you say this is clearly
a programming language! [...] If you look at Turing’s paper of course there
is a machine language [...]. Or, as von Neumann said: the universal Turing
Machine is really the notion of a general purpose programmable computer -
and that’s the idea of software. [...] If you look at papers by Alonzo Church
you see the lambda calculus, which is a functional programming language.
If you look at Gödel’s original paper you see what looks like LISP, it’s very
close to LISP”. As he showed there are numerous examples of unexpected
offspring of theoretical research, and all of the foundational work is not so
impractical after all! As G.J. Chaitin concluded in his speech, this is the
way “we’re all benefiting from the glorious failure of this project!”. Now
this is not entirely true, but it is true that theoretical studies, as he says
“don’t have spin-off in dollars right away, but sometimes they have vastly
unexpected consequences”. Formal methods/studies have not always done a
good job promoting themselves - maybe we can emphasize this aspect and
show that technology often advances through fascinating impractical ideas.
Status of the project
That brings us to ask if the question of the foundation of mathematics,
more than a decade after Hilbert formulated it, is now settled once and for
all. The short answer is: it is not. Even from the amount of interesting
resources on current research that are available on the internet alone, we can
conclude there is still a lot of work to do on the foundations of mathematics.
I consider creating an online version of this document with more background
information and links.
Although Gödel and Turing showed that it is impossible to totally formalize
even basic arithmetic, let alone the whole of mathematics, it is still
possible to formalize parts of mathematics (for example, geometry) success-
fully. As P. Andrews says in [4], “attempts to understand the nature of rea-
soning and to build sophisticated information systems which can draw logical
conclusions may be regarded as part of an endeavor to fashion more powerful
intellectual tools for coping with the increasingly complex problems which
confront mankind.” In that respect the formalization is not restricted to ma-
thematical reasoning, and it can also be applied to other disciplines (such
as physics, chemistry or even social sciences). Especially the development
of software and computer systems will be facilitated by a formalization of
theories. Although total formalization of parts of mathematics is very
useful, this is not the focus of most current research: (most people believe
that) the human mind will (at least for the near future) be the one to prove
whether a given mathematical statement is true or not.
Ideas for future work and
distinction between mathematics and software
Although no machine can determine whether an arbitrary mathematical
statement is true, we can try to develop an axiomatic system in which as many
as possible of the interesting statements³ can be proved. This is useful
because, even though every such axiomatic system is incomplete and has
undecidable statements, for any statement of the system for which we provide
a concrete and completely formalized (dis)proof within that system, we can
still decide mechanically whether or not the proof is correct for the given
statement. The question then is whether the set of statements for which we
can do this still forms a part of mathematics that is interesting enough.
This has to be a part of our investigation: to find out how many of the
practical mathematical proofs contain ‘meta-arguments’, in other words which
classes of proofs will fall outside our system. Although we want to change as
little as possible on the side of mathematics itself, this also might be a
necessary option⁴. As P. Andrews calls his book [4], we get: ‘to truth
through proof’. This should be the first goal for the near future:
(1) Investigate which parts of mathematics can(not) be formalized (i.e.
contain ‘meta-arguments’), which formalization is most usable and allows most
parts of (practical) mathematics to be formalized, and totally formalize
proof checking for as many parts of mathematics as possible.
³ As interesting statements, we consider all statements in the (everyday)
work of practicing mathematicians. These ‘practical’ statements do not
include the specific purely theoretical statements that Gödel invented for
his incompleteness theorem.
Formalization is not only important to check the correctness of mathema-
tical theories that are becoming ever more complex. Many models in physics
and chemistry depend on underlying mathematical theorems, and the suc-
cess of the model depends on the correctness of the mathematical theorems.
Also, we are becoming more and more dependent on automated systems, in
particular computers and software. There is a growing need for reliable (that
is, correctly specified and working according to the specifications) software,
not only for (safety) critical systems, but also in everyday applications. A
formal approach can not only be used to prove correctness of mathematical
statements but also of computer programs. This is an important point:
Distinction between mathematics and software construction.
Instead of the proofs of mathematical statements, we are then checking
the derivation steps of program derivations. I want to emphasize this
difference, since it is often unclear or left implicit which of the two is
meant when arguments for or against formalistic studies are given. We have to
realize that we can never obtain a 100% guarantee of correctness of any
algorithm, since we also depend on the correctness of the proof checker. That
is why we have to try to keep the proof checker as simple, small and
intuitive as possible (see also the ‘de Bruijn criterion’ in [26, pages 4 and
26]). Analogously, we can never obtain a 100% guarantee of correctness of any
mathematical statement, since we learned from Gödel that the consistency of
an axiomatic system (one that is consistent and strong enough to express
arithmetic) cannot be proved within that system; therefore we had better also
try to keep the axiomatic system as simple, small and intuitive as possible
(we could see this as the variant of the de Bruijn criterion for axiomatic
systems). Nevertheless, any such implementation of a proof checker would give
us the highest degree of certainty possible.
⁴ For a successful formalization of parts of mathematics we therefore look
not only at the axiomatic system; it might also require us to limit certain
parts of mathematics so that they contain fewer undecidable proofs, or to
rewrite certain existing proofs into a form that is permitted by the system.
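To make the idea of a small, simple and intuitive proof checker concrete, here is a minimal sketch in Python. It is not taken from [26] or from any existing proof checker. The formula encoding, the step format and the name check_proof are assumptions made purely for illustration, and the only inference rule is modus ponens.

```python
# Formulas: an atom is a string; an implication A -> B is the tuple ("->", A, B).
# Proof steps (hypothetical format chosen for this sketch):
#   ("assume", F)  cite the assumption F
#   ("mp", i, j)   modus ponens: step i proved A, step j proved A -> B, conclude B


def check_proof(assumptions, steps, goal):
    """Return True iff `steps` is a valid derivation of `goal` from `assumptions`."""
    derived = []  # derived[k] is the formula established by step k
    for step in steps:
        if len(step) == 2 and step[0] == "assume":
            formula = step[1]
            if formula not in assumptions:
                return False  # cited something that was never assumed
        elif len(step) == 3 and step[0] == "mp":
            _, i, j = step
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False  # a step may only refer to earlier steps
            premise, implication = derived[i], derived[j]
            is_implication = (
                isinstance(implication, tuple)
                and len(implication) == 3
                and implication[0] == "->"
            )
            if not is_implication or implication[1] != premise:
                return False  # modus ponens does not apply
            formula = implication[2]
        else:
            return False  # unknown rule
        derived.append(formula)
    return bool(derived) and derived[-1] == goal


assumptions = {"p", ("->", "p", "q"), ("->", "q", "r")}
proof = [
    ("assume", "p"),
    ("assume", ("->", "p", "q")),
    ("mp", 0, 1),                  # yields q
    ("assume", ("->", "q", "r")),
    ("mp", 2, 3),                  # yields r
]
valid = check_proof(assumptions, proof, "r")
flawed = check_proof(assumptions, [("assume", "p"), ("assume", "q")], "q")
```

The whole trusted part fits in one short function that can be read and understood by hand. Finding a proof may be arbitrarily hard, but checking a fully written-out proof stays mechanical.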
Software and Proof Checking
I would also like to remark that proof checking for programs can only give
us a way to verify the correctness of programs. At least as important (to ob-
tain correct programs) is the correct construction of programs. This is the
focus of the work in the area of programming methodology. At the Eindhoven
University of Technology for example, the techniques of E.W. Dijkstra are
used to derive correct programs from their specification. Unfortunately the
proponents of both areas (proof checking/verification vs.
construction/derivation) mainly advocate their own approach, while a
combination of both could give the best results. Although there has been some
minor work on formalizing these proof techniques and combining formal methods
and program derivations (see for example [26]), cooperation is still minimal.
If we go one step further back in the process of creating correct software,
the success of any piece of software depends on the correctness of its
specification. These first phases of software engineering (indicating user
requirements/specifications) can also be adapted to comply with the methods
of program derivation and formal proof checkers (note that we use the term
‘proof checker’ not only for mathematics, i.e. to check mathematical
statements, but also for the software variant: for checking derivations of
algorithms/programs). And since we can never obtain a 100% guarantee of
correctness of software (it depends for example on the correctness of the
specifications and of the proof checker itself), model checking techniques
can also be used as a verification method to improve reliability even
further. Therefore I argue for an integrated approach, for only the
combination of all of the mentioned methods can give us the highest
reliability (i.e. highest chance of correctness of software). Such an
integrated approach requires research and cooperation between the various
branches representing the methods I mentioned before and ultimately
incorporation in the software engineering process.
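The difference between verifying a finished program and constructing it from its specification can be hinted at with a small sketch. The example below is only in the spirit of the derivation style mentioned above and is not taken from this report or from Dijkstra's texts; the function name divide and the choice of problem are assumptions. The loop is built around an invariant chosen from the postcondition, and the assertions record the proof obligations.

```python
def divide(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder of a by b, for a >= 0 and b > 0.

    Specification (postcondition): a == q * b + r and 0 <= r < b.
    """
    assert a >= 0 and b > 0, "precondition"
    # Invariant, obtained by dropping the conjunct r < b from the postcondition:
    #   a == q * b + r and r >= 0
    q, r = 0, a                      # establishes the invariant
    while r >= b:                    # on exit, not (r >= b) supplies r < b
        assert a == q * b + r and r >= 0
        q, r = q + 1, r - b          # keeps the invariant; r decreases, so the loop ends
    assert a == q * b + r and 0 <= r < b   # postcondition
    return q, r


result = divide(17, 5)
```

The assertions only test the invariant at run time for the given inputs. A derivation in the proper sense proves them once and for all, and a proof checker for program derivations would check exactly those proof steps.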
Mathematics and Proof Checking
Let’s go back to proof checking of mathematical statements. We men-
tioned the first goal of investigating and formalizing proof checking. As a
next step (2) we can think of building proof assistants. Proof assistants not
only check the proofs for us, but also help us in making the proofs: they
are tools that are a combination of a proof development system and a proof
checker. A good article about proof assistants using dependent type systems
can be found in [8]. Also an interesting article on computer assisted mathe-
matics (for computer algebra) is [7] with an abstract history of computations
versus proofs in mathematics. The notion of ‘helping’ or ‘assisting’ in making
proofs might be considered vague. For complicated statements, we can think
of tools that keep track of the context of the proof, of the remaining proof
obligations and even fill in part of the proofs for us automatically.
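As an illustration of what ‘assisting’ can mean, here is a small sketch in Lean 4. Lean is not discussed in this report (which mentions Coq, Mizar and NuPRL), so the choice of system is an assumption made for the example only. The system keeps track of the remaining proof obligations after each step, and a decision procedure fills in a small proof automatically.

```lean
-- The statement: from a proof of p ∧ q we can build a proof of q ∧ p.
example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  -- The current goal is q ∧ p; `constructor` splits it into two
  -- remaining proof obligations, which the system lists for us.
  constructor
  · exact h.2   -- first obligation: q
  · exact h.1   -- second obligation: p

-- A small goal that is filled in automatically by a decision procedure.
example : 2 + 2 = 4 := by decide
```

In both cases the resulting proof term is still passed to the proof checker, so the assistance does not enlarge the part that has to be trusted.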
Proof assistants should make it easier for us to prove mathematical theorems.
Then (3) we can think of building a standard library of proved mathematics.
After a proof checker has confirmed the correctness of a given mathematical
statement and its corresponding proof, they can be stored in a database. It
can be accessible to everyone via the internet and even be used for previously
mentioned automated proving methods by proof assistants. And although we
cannot judge the quality of mathematical work as readily as the quality of
physical products, this could be the long awaited ‘quality stamp’ for
mathematics.
There have already been attempts to build standard libraries of mathematics
(see the Mizar project at http://www.mizar.org/ and the PRL project at
http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html), but they lack
the formal basis that has to be provided by (1) and (2). Barendregt and
his group have formalized parts of algebra using the theorem prover Coq.
This shows that it is possible to formalize large parts of mathematics, but
the process of formalizing mathematics itself is too direct and informal and
needs to be further developed. Many valuable experiences have come out of
attempts at what are here called phases (2), (3) and (4), but for a
successful result these are premature: we first have to start thoroughly at
the beginning, with (1). Work in this direction was done in [44], where a
syntax-driven derivation system is presented for a formal language of
mathematics called Weak Type Theory. This is a start of a more rigorous
approach to the translation of mathematical texts (statements and proofs).
We see the extension of proof assistants with more intelligent and
sophisticated automated proving methods as the final phase (4) of future
work. Part of the branch of automated proving consists of classical theorem
proving methods (such as automated induction). Newer methods come from areas
such as neural networks, fuzzy logic, genetic and DNA computing and, in the
future, possibly even quantum computing.
I want to end these ideas by summarizing the steps that lie ahead of us, in
a new project.
The new project (for mathematics):
1 investigate which parts of mathematics can(not) be formalized (i.e.
contain ‘meta-arguments’), which formalization is most usable and allows
most parts of (practical) mathematics to be formalized, and totally
formalize proof checking for as many parts of mathematics as possible
2 build a proof assistant (probably based on some form of Weak Type Theory
(WTT) and some form of type theory (TT))
3 build a standard library (archive) of proved mathematics
4 further develop automated proving techniques (to build into the proof
assistant)
And similarly we can formulate the new project for computer systems:
The new project (for software construction):
1 formalize as much of program derivation checking as possible
2 build a programming assistant (environment) based on a suitable (and
preferably popular) programming language
3 build a standard library of reusable correct software (i.e. suitable for
component based software engineering) and its specification
4 further develop automated proving and program derivation techniques
One of the most important questions, part of step (1), has so far been
avoided in this conclusion: what to take as the basis of mathematics? This is
one of the most difficult questions and, as we have seen, many great
scientists have thought about it. There is currently no consensus on what the
best approach is, and I am not in a position to give a well-argued opinion. A
thorough study of the alternatives will have to yield the best approach and
show which choice of foundational system is most usable in practice.
The only thing I can say is that recently most people seem to favor type
theory over category theory, relational calculi and also over set theory.
P.J. Scott for example favors type theory over category theory in the
introduction of [55]. H. Barendregt gives arguments for the use of type
theory over set theory in [7], and we quote from [4, the second page of the
preface]: “[People prefer the approach they are most familiar with.] However,
those familiar with both type theory and axiomatic set theory recognize that
in some ways the former provides a more natural vehicle than the latter for
formalizing what mathematicians actually do”. In contrast, at
http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment
of the choice of a foundational system, with advantages of set theory over
type theory. Also, several new types of logic have been proposed, such as IF
logic (see [37]) and several types of so-called ‘fuzzy logics’, but so far
they seem to lack the precision, formalization and proofs needed to support
claims that they can be used successfully as a foundation for mathematics.
A final remark on the debate between type theory and axiomatic set theory
as a foundational basis: if there is a mapping from the axioms of (some form
of) set theory into (some form of) type theory and vice versa, then type
theoretic expressions have their counterparts in set theory. It is
interesting to investigate whether among such mappings there is indeed a
bijection. That would show the equivalence of both theories in expressive
power, so that the debate can turn to the question which theory is more
intuitive and useful.
Some do not really believe in a successful formalization of mathematics but
rather see the indeterminacies in mathematical representations and the
undecidabilities in any formal system as the source of problem solving and
creative power (see [87, page 174]). This standpoint was already expressed
in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen
muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist,
die fertig gegeben und so eingestrichen werden kann” (roughly: “Against this
it must be maintained that truth is not a minted coin that can be handed
over ready-made and simply pocketed”).
I am aware of the limitations of this report. Many chapters are still
informal, such as the work of Frege in chapter 4. The theory of types in
chapter 7 and Gödel’s incompleteness theorem in chapter 8 are not completely
covered, and certain subjects closer to logic (such as intuitionism) are
treated very minimally. The only excuse I have is that it is simply not
possible to study all the original works in such a short period of time and
include all theory in this report. I hope to complete this work at a later
stage. It might also be worthwhile to extend (on both sides) the period whose
theory is treated in this report. Recently we have seen interesting new
theories on category and type theory and even on the foundations of
mathematics, for instance Chaitin’s results on randomness; it seems that he
continued where Gödel and Turing left off. Finally I would like to remark
that the ‘new project’, consisting of the four steps mentioned in this
conclusion, is just my own view of the work that lies ahead of us. To end
with a concluding remark by Alan Turing, from his paper on the Turing test:
“We can only see a short distance ahead, but we can see plenty there that
needs to be done”.
Mark Scheffer, August 2001⁵
⁵ P.S. To those who wonder what the turtle and the elephant are doing on the cover of
this report, I refer to the website http://zax.mine.nu/stage/.
Appendix A
Timeline and Images
Figure A.1: Luitzen Brouwer
Figure A.2: Georg Cantor
Drawings by Soshichi Uchii, suchii@bun.kyoto-u.ac.jp;
Photo Quine by Kelly Wise;
Photo Ramsey due to Harcourt, Brace, Jovanovich.
Figure A.3: Richard Dedekind
Figure A.4: Gottlob Frege
Figure A.5: Kurt Gödel
Figure A.6: David Hilbert
Figure A.7: John von Neumann
Figure A.8: Giuseppe Peano
Figure A.9: Henri Poincaré
Figure A.10: Willard Van Orman Quine
Figure A.11: Frank Plumpton Ramsey
Figure A.12: Bertrand Russell
Figure A.13: Alan Turing
Bibliography
[1] A.A. Fraenkel, Y. Bar-Hillel and A. Levy. Foundations of set theory.
North-Holland Press, Amsterdam, 2nd edition, 1973. First edition 1958.
[2] W. Ackermann and D. Hilbert. Grundzüge der Theoretischen Logik,
volume Band XXVII of Die Grundlehren der Mathematischen Wis-
senschaften in Einzeldarstellungen. Springer-Verlag, first edition, 1928.
Berlin.
[3] J.H.J. Almering. Analyse. Delftse Uitgevers Maatschappij, 1993.
[4] P. Andrews. An introduction to mathematical logic and type theory: to
truth through proof. Academic press, 1986.
[5] J. Backer and P. Rudnicki. Hilbert’s basis theorem. Association of
Mizar Users, University of Bialystok, 12, 2000, 2000. Published in
Journal of Formalised Mathematics.
[6] H. Barendregt. The Lambda Calculus - Its Syntax and Semantics, vol-
ume 103. Elsevier Science Publishing Company, Inc., 1984.
[7] H. Barendregt and A.M. Cohen. Electronic Communication of Ma-
thematics and the Interaction of Computer Algebra Systems and Proof
Assistants. J. Symbolic Computation. Academic Press, 2001.
[8] H. Barendregt and H. Geuvers. Proof-checking using Dependent Type
Systems, volume 2, chapter 18, pages 1149-1240 of Handbook of Artifi-
cial Reasoning. Oxford Press, 2001.
[9] C.J. Bloo. Computational Models. TU/e Press, 2001. Manuscript
originally started by H. Geuvers and J. Hooman.
[10] J. Breuer. Introduction to the Theory of Sets. Prentice-Hall, August
1958.
[11] Encyclopedia Britannica. P. Bernays. EB, 2000.
[12] K.S. Brown. Mathematics. Seanet, 1991.
[13] G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. Journal f. reine und
angew. Math., Gesammelte Abhandlungen., 84, pages 119-133, 1878.
Translated in ‘Contributions to the foundation of the theory of transfinite
numbers’ (translation from German) by Philip E. Jourdain, Dover
Publishing, 1952.
[14] A. Church. An unsolvable problem in elementary number theory, vol-
ume 58. American journal of Mathematics, 1936.
[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin,
1966.
[16] B.J. Copeland. The Church-Turing Thesis. Springer-Verlag, 1997. Item
in Stanford Encyclopedia of Philosophy.
[17] H.C. Doets D. van Dalen and H. de Swart. Sets: Naive, Axiomatic and
Applied. Pergamon Press, 1978.
[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the
Infinite. Harvard University Press, 1979.
[19] M. Davis. The Undecidable: Basic Papers on Undecidable Propositions,
Unsolvable Problems and Computable Functions. Raven Press, New
York, 1965.
[20] Diverse. Mathematische Annalen, 65. Springer-Verlag, Berlin, 1908.
[21] A. Einstein. Relativity: the special and general theory. Methuen Press,
London, 1970.
[22] H. Eves. Mathematical Circles Revisited. Boston Press, 1971.
[23] H. Eves. Foundations and fundamental concepts of mathematics. Dover
publications inc., Mineola, New York, third edition, 1997.
[24] A. Fraenkel. Einleitung in die Mengenlehre. Springer-Verlag, third
edition, 1928.
[25] A.A. Fraenkel. Abstract Set Theory. North-Holland Press, Amsterdam,
3rd edition, 1966. First edition in 1953.
[26] M. Franssen. Cocktail. Eindhoven University Press, 2000. Doctoral
thesis.
[27] K. Gödel. On formally undecidable propositions of Principia Mathematica
and related systems. Dover publications, New York, 1992. English
translation of Gödel’s original 1931 publication of the incompleteness
theorem. First published in 1962 by Basic Books, inc., New York.
[28] D. Goldrei. Classic set theory, a guided independent study. Chapman
and Hall, 1996.
[29] I. Grattan-Guinness. How did Russell write the principles of mathema-
tics (1903). McMaster University Library Press, 1997. In the Journal
of the Bertrand Russell Archive.
[30] I. Grattan-Guinness. From the Calculus to Set theory 1630-1910.
Princeton University Press, 2000. First published in 1980 by G. Duck-
worth & Co, London.
[31] I. Grattan-Guinness. The Search for Mathematical Roots 1870-1940.
Princeton University Press, 2000.
[32] I. Grattan-Guinness. A sideways look at Hilbert’s Twenty-three Pro-
blems of 1900. Middlesex University Press, 2000.
[33] J. Haim. Introduction of the Israel Mathematical Conference Proceedings,
volume 6. Bar-Ilan University Press, 1993.
[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.
[35] G.W.F. Hegel. Phänomenologie des Geistes. Reprint: Meiner, Hbg.,
1807. English translation ‘The Phenomenology of Mind’ by J.B. Baillie
in 1910, London.
[36] H. Hermes and H. Schulz. Mathematische Logik. Unknown, 1952. In
Encyklopedia Mathematische Wissenschaften, I1, 1, I, page 58.
[37] J. Hintikka. The Principles of Mathematics Revisited. Cambridge Uni-
versity Press, 1996.
[38] A. Hodges. Turing. The Great Philosophers. Phoenix, 1997.
[39] A.D. Irvine. Bertrand Arthur William Russell. Stanford University
Press, 2000.
[40] D. Joyce. Hilbert’s 1900 Address. Clark University, Worcester, 1997.
[41] D. Joyce. A list of Hilbert’s problems. Clark University, Worcester,
1997.
[42] D. Joyce. The Mathematical Problems of David Hilbert,
http://aleph0.clarku.edu/~djoyce/hilbert/. Clark University, Worcester, 1997.
[43] F. Kamareddine and T. Laan. A reflection on Russell’s ramified types
and Kripke’s hierarchy of truths. Journal of the Interest Group in Pure
and Applied Logic, 4 (2):195–213, 1996.
[44] F. Kamareddine and R. Nederpelt. A derivation system for a formal
language of mathematics. To be published, July 2001.
[45] I. Kaplansky. Encyclopedia Britannica, item on David Hilbert. EB,
1990.
[46] E. Kasner and J. Newman. Mathematicians and the imagination. New
York Publishing, 1940.
[47] S.C. Kleene. Lambda-definability and recursiveness. Duke Mathemati-
cal Journal 2:340-353, 1936.
[48] S.C. Kleene. Mathematical Logic. New York, 1967.
[49] G.T. Kneebone. Mathematical logic and the foundations of mathema-
tics. D. van Nostrand Company, 1963. Reprint 2001.
[50] J. Koenderink. Solid Shape. Cambridge, 1990.
[51] K. Kunen. Set theory: an introduction to independence proofs. New
York Press, 1980.
[52] T. Laan. A formalization of the ramified type theory. TUE Computing
Science Reports, 1994. Technical Report 94-33.
[53] T. Laan. The Evolution of Type Theory in Logic and Mathematics.
PhD thesis, Eindhoven University of Technology, 1997.
[54] T. Laan and R.P. Nederpelt. A modern elaboration of the ramified
theory of types. Studia Logica, 57(2/3):243–278, 1996.
[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge
University Press, 2001.
[56] P. Linz. An introduction to formal languages and automata. D.C. Heath
and Company, 1990.
[57] J.R. Lucas. The conceptual roots of mathematics. Routledge Press,
2000.
[58] D. MacHale. Comic Sections. Dublin, 1993.
[59] Moshé Machover. Set theory, logic and their limitations. Cambridge
University Press, 1996.
[60] P. Mancosu. From Brouwer to Hilbert, the debate on the foundations
of mathematics in the 1920s. Oxford University Press, 1998.
[61] E. Maor. To infinity and beyond. Boston Press, 1987.
[62] R. Monk. Russell. The Great Philosophers. Routledge, 1999. First
published in 1997 by Phoenix.
[63] G.H. Moore. Zermelo’s axiom of choice: its origins, development and
influence. Springer-Verlag, 1982.
[64] E. Nagel and J. R. Newman. Gödel’s proof. New York University Press,
1986. First published in 1958.
[65] G. Peano. Calcolo differenziale e principii di calcolo integrale. Turin
Press, 1884.
[66] G. Peano. Applicazioni geometriche del calcolo infinitesimale. Turin
Press, 1887.
[67] G. Peano. Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann
e preceduto dalle operazioni della logica deduttiva. Fratelli Bocca,
Torino, 1888. English translation ‘Geometric Calculus: According to the
Ausdehnungslehre of H. Grassmann’ by Lloyd Kannenberg,
November 1999, Publisher Birkhäuser.
[68] G. Peano. Dizionario di matematica. Parte prima. Logica matematica.
Unknown, 1901. In Rivista di matematica, edited by Peano.
[69] L.J.J. Wittgenstein P.M. Sullivan. The foundations of mathematics.
Unknown, June 1927. Reprinted by F. P. Ramsey, June 1927, Theoria
61 (2) (1995), pages 105-142.
[70] W. Van Orman Quine. Mathematical Logic. Harvard University Press,
1951. Revised edition of Norton, New York 1940.
[71] W. Van Orman Quine. From a Logical Point of View: 9 Logico-Philosophical
Essays. Harvard University Press, 2nd edition, 1961. Cambridge,
Massachusetts.
[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University
Press, 1963. Cambridge, Massachusetts.
[73] R.C.W. Bertrand Russell entry in Encyclopedia Britannica. EB, 2000.
[74] J. Richard. Les principes des mathématiques et le problème des ensembles.
Revue générale des sciences pures et appliquées, 16, 1905. Published
also in Acta Mathematica 30 (1906), pages 295-296.
[75] B. Riemann. Über die Hypothesen, welche der Geometrie zu grunde
liegen. Göttingen Press, 1854.
[76] N. Rose. Mathematical Maxims and Minims. Raleigh NC, 1988.
[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-
Holland Press, Amsterdam, 1963.
[78] B. Russell. My philosophical development. London: George Allen and
Unwin, New York: Simon and Schuster, 1959.
[79] B. Russell. Introduction to Mathematical Philosophy. The Great
Philosophers. London: George Allen and Unwin; New York: The
Macmillan Company, 1999. First published in 1997 by Phoenix.
[80] B. Russell. The autobiography of Bertrand Russell. Routledge, 2000.
[81] S. Shelah. Proper forcing, lecture notes in mathematics. Springer-
Verlag, 1982.
[82] M. Sipser. Introduction to the theory of computation. PWS Publishing
Company, Boston, 1997.
[83] A.T. Skolem. Einige Bemerkungen zur axiomatischen Begründung der
Mengenlehre. Akademiska Bokhandeln, Helsinki, 1922. In
‘Matematikerkongressen i Helsingfors 4-7 juli 1922, Den femte skandinaviska
matematikerkongressen’, pages 217-232. Reprinted in ‘Selected Works
in Logic’, by A.T. Skolem, edited by Jens E. Fenstad, 1970, Publisher
Universitetsforlaget, Oslo.
[84] R.M. Smullyan. Gödel’s incompleteness theorems. Oxford Logic
Guides. Oxford University Press, 1992.
[85] B. Sobocinski. L’analyse de l’antinomie Russellienne par Lesniewski.
Unknown, 1950. Methodus I, pages 94-107, 220-228, 308-316; Metho-
dus II, pages 237-257.
[86] F. Kamareddine, T. Laan and R. Nederpelt. Types in Logic and
Mathematics before 1940, volume 8. Bulletin of Symbolic Logic, January
2002. To be published.
[87] M. Tiles. Mathematics and the image of reason. Routledge, 1991.
[88] E.C. Titchmarsh. Mathematical Maxims and Minims. Rome Press,
1988.
[89] A.M. Turing. On computable numbers, with an application to the Ent-
scheidungsproblem, volume 42, pages 230-265 of 2. London Mathe-
matical Society, 1936. With corrections from Proceedings of the Lon-
don Mathematical Society, Series 2, Vol.43 (1937) pages 544 to 546.
Reprinted with some annotations in ‘The Undecidable: Basic Papers
on Undecidable Propositions, Unsolvable Problems and Computable
Functions’, ed. Martin Davis, 1965, Raven Press, New York.
[90] A.M. Turing. Intelligent Machinery. National Physical Laboratory,
1948. National Physical Laboratory Report in ‘Machine Intelligence 5’
by Meltzer, B. and Michie, P., 1969, Edinburgh University Press.
[91] Unknown. Encyclopedia Britannica; Item on Principia Mathematica.
EB, 2000.
[92] Unknown. Encyclopedia Britannica; Item on Turing. EB, 2000.
[93] J. van Heijenoort. From Frege to Gödel: source book in mathematical
logic 1879-1931. Harvard University Press, 1967.
[94] W. van Orman Quine. New foundations for Mathematical Logic. The
American Mathematical Monthly, February 1937. 44(2), pages 70-80.
[95] Various. The Mathematical Intelligencer, volume 13. Springer-Verlag,
Berlin, 1991.
[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged.
1:199-208 [I, 3], 1923.
[97] J. Weiner. Frege in Perspective. Cornell, 1990.
[98] J. Weiner. Frege. Past Masters. Oxford University Press, 1999.
[99] A. Whitehead. An introduction to Mathematics. Williams and Norgate,
London, 1911.
[100] A. Whitehead. A treatise on universal algebra. New York, 1960.
[101] E. Zermelo. Untersuchungen über die Grundlagen der Mengenlehre,
I. Springer-Verlag, 1908. In Mathematische Annalen 65, 1908, pages
261-281.



. and t(x0 . . . 1 Notation originally due to E. . .CONTENTS 7 Mathematical Notations Many different notations have been developed for set theory and logic. with Q a quantifier. . xn . . . xn ) the term of the quantification. xn ) using an operator or relation indicated by Q. . .W. . xn ) : t(x0 . Mathematical Logic symbol meaning ∧ ∨ ¬ ϕ(x) → ↔ ≡ ∀ ∃ ∃! conjuction disjunction (inclusive) negation propositional function implication bi-implication equivalence universal quantifier existential quantifier one-element existential quantifier also described as and or not if . . . . xn ) for which ϕ(x0 . other notations that we have used are introduced in the text. . . . xn ) holds. over all values (x0 . . The quantification is the accumulation of values t(x0 . . iff is equivalent to for all exists exists a unique In most places we have chosen to use the following notation1 to denote quantifications: (relation : range : term) denotes the relationship over a set of terms ranging over range Consider a general pattern (Q x : ϕ(x0 . . . . Most notations that we have used are standard today. Dijkstra. . . xn )).. . . . . ϕ a boolean expression in terms of the dummies x0 . then if and only if. . .

This notation is suitable for formal manipulation and unambiguous in the sense that it explicitly indicates the quantifier Q, the dummies and the range of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly determines the domain of the quantification). This allows us to reason about general properties of quantifications, in a way in which the (scopes of the) bound variables are clearly identified. Note that this type of quantification is only suitable for binary operations that are symmetric and associative.

Example: (Σ x : 0 ≤ x ≤ 5 : x²) = 0² + 1² + 2² + 3² + 4² + 5² = Σ_{x=0}^{5} x²

Example: (∃x : x ∈ N : x³ − x² = 18) ≡ ‘there exists a natural number x such that x³ − x² = 18’

If the variable (here: x) ranges over all possible values, or if it is clear what the range of a variable is, we can omit the range.

Example: (∀x : true : x ∈ A → x ∈ B) ≡ (∀x :: x ∈ A → x ∈ B) ≡ ‘all elements of A are also elements of B’
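The pattern (Q x : range : term) can be read as a fold: the operator indicated by Q accumulates the terms of all dummies that satisfy the range. The following is a minimal sketch of that reading in Python, not part of the original notation; the helper name `quantify` and its parameters are hypothetical, and the infinite set N is replaced by a finite list of candidates as an explicit assumption.

```python
from functools import reduce
import operator


def quantify(op, identity, candidates, in_range, term):
    """Accumulate term(x) with op over every candidate x that satisfies in_range."""
    return reduce(op, (term(x) for x in candidates if in_range(x)), identity)


# (Σ x : 0 ≤ x ≤ 5 : x²)
sum_of_squares = quantify(
    operator.add, 0, range(100), lambda x: 0 <= x <= 5, lambda x: x * x
)

# (∃ x : x ∈ N : x³ − x² = 18), searched only among the first 100 naturals (assumption)
has_solution = quantify(
    operator.or_, False, range(100), lambda x: True, lambda x: x**3 - x**2 == 18
)

# (∀ x : x ∈ A : x ∈ B), i.e. 'all elements of A are also elements of B'
A, B = {1, 2}, {1, 2, 3}
a_subset_of_b = quantify(operator.and_, True, A, lambda x: True, lambda x: x in B)
```

The identity element passed for each operator (0 for +, false for ∨, true for ∧) is what makes a quantification over an empty range well defined.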

Chapter 1 Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
- Albert Einstein

This report covers the most important developments and theory of the foundations of mathematics in the period of 1870 to 1940. The tale of the foundations is fairly familiar in general terms and for its philosophical content; here the main emphasis is laid on the mathematical theory. The history of the foundations of mathematics is a complicated and many-sided story; with this article I do not aim to give a definitive or complete version, but to capture what I consider the essence of the theoretical developments, and to present them in a clear and modern setting. Some basic mathematical knowledge of set-theory and logic is presupposed.

By the middle of the nineteenth century, certain logical problems (for example paradoxes around the notions of infinity, the infinitesimal and continuity) at the heart of mathematics had inspired a movement, led by German mathematicians, to provide mathematics with more rigorous foundations. This is where the theory of this report begins, with the emergence of set theory by the German mathematician Cantor. In section 2.1 we informally describe how work on a problem concerning trigonometric series gradually led Cantor to his theory of sets (section 2.2). As a result of the work of Weierstrass, Dedekind and Cantor, pure mathematics had been provided with much more sophisticated foundations. The notion of infinitesimal had been banished, ‘real’ numbers had been provided with a logically consistent


definition (section 3.5), continuity had been redefined and, more controversially, a whole new branch of arithmetic had been invented which addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7). In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could not resolve this paradox, but it was not taken very seriously, partly because the paradoxes appeared in a rather technical region.

The Italian mathematician Peano (section 4.1) was able to show that the whole of arithmetic could be founded upon a system that uses three basic notions and five initial axioms. At the same time the German mathematician Frege (section 4.2) worked on developing a logical basis for mathematics. Just like Peano, Frege wanted to put mathematics on firm grounds. But Frege’s grounds were strictly logical; he followed a development later called logicism, also known as the development of so-called mathematical logic.

The British mathematician Russell noted Peano’s work and later that of Frege. Soon thereafter he showed (section 5.1) how finite descriptions like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed out the difficulties that arose with self-referential terms. This paradox that Russell found existed not only in specific technical regions but in all of the axiomatic systems underlying mathematics at that time (section 5.1). But since the paradoxes could be avoided in most practical applications of set theory, the belief in set theory as a proper foundation of mathematics remained. Axiomatic set theory (section 5.3.1) was an attempt to come to a theory without paradoxes. Various responses to the paradox (section 5.2) led to new sets of axioms for set theory.
The two main approaches are by the German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian von Neumann, the Austrian Gödel and the Swiss Bernays (section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of mathematics by the Dutch mathematician Brouwer (not covered here) and to a theory of types, proposed by Russell himself with the help of his former teacher, the English mathematician Whitehead. Despite the paradox Russell and Whitehead still claimed that all mathematics could be founded on a mathematical logic; this belief was given a definite presentation in their work ‘Principia Mathematica’ (section 7.1). Various consequences followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and

Ramsey, see section 7.2).

At the turn of the century, the German mathematician David Hilbert listed certain important problems concerning the foundations of mathematics and mathematics in general (section 6.2). To overcome paradoxes and other problems that arose in existing systems, Hilbert developed a theory of axiomatic systems (section 6.1). He then stimulated his student Zermelo in using this axiomatic method to be the first to develop a set of axioms for set theory (section 5.3.2). Hilbert had since then made more precise demands on any proposed set of axioms for mathematics (section 6.1) in terms of consistency, completeness and decidability. In 1931 Gödel showed that consistency and completeness could not both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s question of decidability. The English mathematician Turing proved in 1936 that there are undecidable problems, by giving the so-called halting problem that cannot be solved by any algorithm (section 9.1), after formalizing the notion of algorithm with his concept of the Turing Machine. The American mathematician Church (independently) obtained the same result but with another formalization of the notion of an algorithm, using his computational model of lambda calculus (section 9.2). In section 9.3 we state that these two notions are equivalent and correspond to the intuitive notion of algorithm or computability.

In chapter 10 I summarize the theory of the foundations of mathematics, before giving my own opinion and making some suggestions for future work.

This article is part of the practical component of my study of computing science, and written for a large part in 8 weeks at the Heriot-Watt university in Edinburgh under supervision of prof. F. Kamareddine. I want to thank Rob Nederpelt and the formal methods section of the computing science department of the Eindhoven University of Technology for making this possible.
Rob Nederpelt always inspired me to continue working on this report and was patient in explaining difficult proofs to me. And last but not least, I want to thank Fairouz Kamareddine for her support and positive motivation, and Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last revision) for the typesetting and editing of large parts of this document and for helping me in many ways to finish this article in such a short period of time.


Chapter 2 Cantor’s paradise

2.1 The beginning of set-theory

Perhaps the most surprising thing about mathematics is that it is so surprising. The rules which we make up at the beginning seem ordinary and inevitable, but it is impossible to foresee their consequences. These have only been found out by long study, extending over many centuries. Much of our knowledge is due to a comparatively few great mathematicians such as Newton, Euler, Gauss, or Riemann; few careers can have been more satisfying than theirs. They have contributed something to human thought even more lasting than great literature, since it is independent of language.
- E.C. Titchmarsh, in [88]

By the late 19th century the discussions about the foundations of geometry had become the focus for a running debate about the nature of the branches of mathematics ([23, chapter 2, S4, page 51]). Although there had been no conscious plan leading in that direction, the stage was set for a consideration of questions about the fundamental nature of mathematics. In the study of logic, the work of the English mathematician George Boole in the 1850s ([49, last paragraph of section 35, page 69/70]), and the American Charles Sanders Peirce around 1880 ([49, page 187]), had contributed to the development of a symbolism to explore logical deductions and in Germany the logician Gottlob Frege (see [98]) had directed keen attention to fundamental questions.

All of these debates came together through the pioneering work of the German mathematician Georg Cantor on the concept of a set. Cantor had begun work in this area because of his interest in Riemann’s theory of trigonometric series. In Germany at the university of Halle, the direction of Cantor’s research turned away from number theory and towards analysis. This was due to Heine, one of his senior colleagues at Halle, who challenged Cantor to prove the open problem on the uniqueness of representation of a function as a trigonometric series (see [30, section 5.2, page 182]). Starting from the work on trigonometric series and on the function of a complex variable done by the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in 1870 showed ([30, page 182]) that such a function can be represented in only one way by a trigonometric series. Consideration of the collection of numbers (originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not conflict with such a representation led him, first, in 1872, to define irrational numbers in terms of convergent sequences of rational numbers (or quotients of integers, see section 3.5.2) and then to begin his major lifework, the theory of sets and the concept of transfinite numbers.

2.2 Basic concepts

The essence of mathematics lies in its freedom.
- Georg Cantor, quoted in [58]

In 1874 Cantor published his first article on set-theory. What is set theory? A branch of mathematics that deals with the properties of well-defined collections of objects, which may be of a mathematical nature, such as numbers or functions, or not. What is a set? A (finite or infinite) collection of objects, that is considered as a single, abstract object. A set, wrote Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published in [20, page 261-281]), is “a collection of definite, distinguishable objects of perception or thought conceived as a whole”. In this report we use a similar description of the concept of a set. A set is sometimes also called aggregate, class or (as it was first called by Riemann (see [31, page 88]) and later by the mathematician Russell:) manifold. The objects are also called elements or members of the set. The objects of a set themselves can also be sets. In a mathematical context we mostly consider sets of numbers and functions. We denote a set of elements between brackets ‘{’, ‘}’, and membership of an element to a set by the membership relation ∈.

Example: If we consider a set that contains natural numbers, we write 4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.

We denote the well-known sets of natural numbers by N (this set is also called the naturals), the integers by Z, the fractional numbers by Q (this set is also called the rationals) and the reals by R (this set is also called the continuum).

Cantor defined ([49, page 288]) two sets A and B to be identical (equal), notation A = B, if and only if A and B have the same elements. When later set-theory was axiomatized, this definition became also known as the Axiom of extensionality:
A = B := (∀x :: (x ∈ A ↔ x ∈ B))

Example: {3, 7} = {7, 3} and {2, {3, 4}} ≠ {{2, 3}, 4}

The relation ‘is a subset of’, notation ⊆, indicates that one set is contained in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := A ⊆ B ∧ A ≠ B

We often want to create a new set from a given set by selecting elements that have certain properties. For example we take the set of powers of three or the set of all even numbers (to be exact: the set containing those elements of the set of natural numbers that have the property to be divisible by 2).

Comprehension principle: For all properties ϕ there is precisely one set, denoted by {x | ϕ(x)}, whose elements are exactly those objects which have the property ϕ.

We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). This principle was used by Cantor, and we also call it the unrestricted or naive comprehension principle because it later (see sections 3.8 and 5.1) turned out to be untenable. As a consequence (by taking for all x, ϕ(x) = false), there is at least one set that has no elements: the empty set, denoted by ∅.

Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there is precisely one set whose elements are exactly those objects which have the property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)). This is equivalent to saying there is no element y that can be a member of x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and call it the empty set.

Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know that (∀y :: y ∉ ∅). This yields us (∀x :: false → x ∈ a), which is true.

Using the comprehension principle we can create new sets from given sets. So now we can introduce some operations on sets, by applying the comprehension principle. But before we do that, we first introduce some general (i.e. regardless whether the operations are set-theoretic or not) properties of operations: idempotence, commutativity, associativity and distributivity. Although Cantor did not formulate these properties as such, they are used in the branch of calculus and useful in the set theory that follows in this chapter. Suppose ⊕ and ⊗ are binary1 operations on a certain domain, on which we have defined the equality relation ‘=’, and E, F and G are elements on that domain (for example sets).

Definition of idempotence: ⊕ is idempotent := (∀E :: E ⊕ E = E)
Definition of commutativity: ⊕ is commutative := (∀E, F :: E ⊕ F = F ⊕ E)
Definition of associativity: ⊕ is associative := (∀E, F, G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))
Definition of distributivity: ⊕ is distributive2 over ⊗ := (∀E, F, G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))

1 These properties can also be generalized for operations of arbitrary arity, but this will not be necessary for our discussion.
2 This form of distributivity is also called left-distributivity, as opposed to right-distributivity: ⊕ is right-distributive over ⊗ := (∀E, F, G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G)). In ordinary mathematics this distinction is often left out for commutative operations, and we for example simply say that × is distributive over + (when in fact it is both left- and right-distributive).
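The four properties above quantify over a whole domain, so on a finite sample they can be checked by brute force. The sketch below does this for ordinary addition, multiplication and max on a small range of integers; the function names and the sample domain are hypothetical choices for illustration, and a check on a finite sample can refute a property but does not prove it for an infinite domain.

```python
from itertools import product


def is_idempotent(op, domain):
    return all(op(e, e) == e for e in domain)


def is_commutative(op, domain):
    return all(op(e, f) == op(f, e) for e, f in product(domain, repeat=2))


def is_associative(op, domain):
    return all(
        op(op(e, f), g) == op(e, op(f, g)) for e, f, g in product(domain, repeat=3)
    )


def is_left_distributive(op, over, domain):
    # E op (F over G) == (E op F) over (E op G)
    return all(
        op(e, over(f, g)) == over(op(e, f), op(e, g))
        for e, f, g in product(domain, repeat=3)
    )


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


numbers = range(-3, 4)  # small sample domain (assumption)
```

On this sample, × is distributive over + but + is not distributive over ×, and max is idempotent while + is not.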

The symbol ∪ is employed to denote the union of two sets. Thus, the set A ∪ B is defined as the set that consists of all elements belonging either to set A or set B.
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}

The intersection operation is denoted by the symbol ∩. A ∩ B is defined as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}

Example: The operations union and intersection on sets are both idempotent, commutative and associative.

Any two sets the intersection of which is the empty set are said to be disjoint. A collection of sets is called (pairwise) disjoint or mutually exclusive if any two distinct sets in it are disjoint.

The difference of sets B and A, denoted B − A, contains those elements of B that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}

If A ⊆ B we often call the difference B − A the relative complement of A in B. We then call B the universe, and if it is clear what the universe is we often denote the relative complement of A by A^C.

From the definitions that we have introduced so far, we can deduce three properties that are known as the laws of reciprocity. The second and third law are also known as the laws of de Morgan, named after the English mathematician Augustus de Morgan:
First law of reciprocity: A ⊆ B ↔ A^C ⊇ B^C
Second law of reciprocity: (A ∪ B)^C = A^C ∩ B^C
Third law of reciprocity: (A ∩ B)^C = A^C ∪ B^C

We define the power set of V, denoted by P(V), as the set of all subsets of V; this operation creates a larger set from a given set V. Note that if V = ∅, then P(V) = {∅}.
Definition of powerset: P(V) := {A | A ⊆ V}
Given a set V, we thus have that (∀y :: y ∈ P(V) ↔ y ⊆ V)

We can extend the union of a pair of sets to any finite collection of sets; the union is then defined as the set of all objects which belong to at least one set in the collection A. We can do the same for the intersection.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}

We can divide a set of objects into a partition, that is a family of subsets that are mutually exclusive and jointly exhaustive. Assume P is a set of subsets of X.
Definition of partition: P is a partition of X := X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)

In this chapter I have made extensive use of [30] in section 2.1 and [17] in section 2.2.
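For finite sets, the operations of this section can be tried out directly. The following sketch uses Python's built-in sets for ∪, ∩ and −, and adds small helpers for the power set, the union of a collection and the partition test; the helper names and the sample universe are hypothetical, and only finite sets are covered.

```python
from itertools import chain, combinations


def power_set(v):
    """P(V): the set of all subsets of V, each represented as a frozenset."""
    items = list(v)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def big_union(collection):
    """Union of a collection of sets: everything that lies in at least one member."""
    return set().union(*collection)


def is_partition(p, x):
    """The members of p jointly cover x and are pairwise equal or disjoint."""
    parts = list(p)
    covers = big_union(parts) == set(x)
    disjoint = all(a == b or not (a & b) for a in parts for b in parts)
    return covers and disjoint


universe = {1, 2, 3, 4, 5, 6}
A, B = {1, 2, 3}, {3, 4}


def complement(s):
    # relative complement with respect to the sample universe
    return universe - s
```

With these sample sets both laws of de Morgan hold, a set of three elements has a power set of eight elements, and the power set of the empty set has exactly one element.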


Chapter 3 Mathematical constructs in set-theory

3.1 Some mathematical concepts

The mathematician is entirely free, within the limits of his imagination, to construct what world he pleases. What he is to imagine is a matter for his own caprice; he is not thereby discovering the fundamental principles of the universe nor becoming acquainted with the ideas of God. If he can find, in experience, sets of entities which obey the same logical scheme as his mathematical entities, then he has applied his mathematics to the external world; he has created a branch of science.
- J.W.N. Sullivan in Aspects of Science, 1925

Now that we have this apparatus of set-theory available, we will see that it is not just a separate branch of mathematics, but that we can define some basic mathematical constructs in set-theory. In this section we will consider pairs and the cartesian product, necessary before we can treat relations (in section 3.2) and functions (in section 3.3).

First we consider the mathematical concept of an ordered pair < a, b >. Compared to a ‘normal’ pair, where two pairs are considered equal if they have the same elements, we want an ordered pair to also have the property that the elements appear in the same order:
(∀c, d :: < a, b > = < c, d > ↔ a = c ∧ b = d)

We can now easily verify that the following definition (see [17, chapter 8]) in set-theory satisfies the desired property.
Definition of ordered pair1: < a, b > := {{a}, {a, b}}

As the cartesian product A × B is by definition the set of all ordered pairs < a, b > with a ∈ A and b ∈ B, we can now use the same definition in set-theory:
Definition of cartesian product: A × B := {< a, b > | a ∈ A ∧ b ∈ B}

We now define the cartesian product of a set of sets. The definition uses the concept of a function, that will be introduced in section 3.3. Let V = {Vi | i ∈ I} be a set of sets.
Definition of cartesian product of a set of sets, denoted by ×V or ×i∈I Vi: ×V := {f : I → ⋃i∈I Vi | (∀i : i ∈ I : f(i) ∈ Vi)}

1 Representation originally by Kuratowski, see [49, page 294].
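That the set-theoretic definition of an ordered pair really distinguishes the order of its elements can be checked on small examples. The sketch below encodes < a, b > as {{a}, {a, b}} with Python frozensets; the function names are hypothetical and the encoding is only meant as an illustration for hashable elements.

```python
def pair(a, b):
    """Kuratowski encoding <a, b> := {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian_product(A, B):
    """A × B as the set of all encoded pairs <a, b> with a in A and b in B."""
    return {pair(a, b) for a in A for b in B}
```

An unordered set does not see the order, {1, 2} = {2, 1}, while the encoded pairs < 1, 2 > and < 2, 1 > are different sets; the pair < 1, 1 > collapses to {{1}} but is still distinct from every other pair.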

5 >. < 2.2. Xn := R ⊆ X1 . < 0. . X2 × . < 1. . We define the following shorthand notation (sometimes also written in infix notation as xR y): R(x. but relations between objects. Thus. . .2 Relations Mathematicians do not study objects. 2 >. for n ∈ N. × Xn . 1 >. Definition of binary relation: R is a binary relation between X and Y := R ⊆ X × Y Note: We can easily generalize this definition for n-ary relations: R is an n-ary relation on X1 . . We formalize this notion in set-theory. 3 >. . only those pairs < x. Content to them is irrelevant: they are interested in form only. Poincar´ e In mathematics. . < 1. There we defined the set R ⊆ X ×Y implicitly by using a statement. y > ∈ R’. . between N and N) can be defined as: < 0. < 2. . 4 >. 3 >. they are free to replace some objects by others as long as the relations remain unchanged. 4 >. Example: The relation < on the naturals (i.e. < 0. The mathematical expression ‘x < y’ is now equivalent to the set theoretic expression ‘< x. y) := < x. We call n the arity of the function. y > ∈ R. a relation maps each element from an input set (called domain) to either true or false. . < 1. . Example: We have already seen the definitions of the subset and proper subset relations in section 2.H. . . . We will continue to use statements to define relations. . with R representing the ‘less than’ relation. RELATIONS 23 3. . . . 3 >. y > are in R for which the statement holds (here we are using in fact the comprehension principle of page 16). < 2.1.J.3. 2 >. .

On a relation R we can define the concepts of domain and range.
Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}

If we define the identity relation of X, we want it to have the usual property that idX(x) = x for all x ∈ X (see for example [3, page 30]). In set-theory, we denote the identity relation on V by IV.
Definition of identity relation: IV := {< x, y > ∈ V × V | x = y}
Note that we have already defined an equality relation ‘=’ on X in section 2.2.

As we did for operations in section 2.2, we can also define some general properties of relations. Assume R is a binary relation on a set X (i.e. R ⊆ X × X). Hereby we can explicitly state on which domain the property holds (e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).

Definition of reflexivity: R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of symmetry: R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of anti-symmetry: R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity: R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity: R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence: R is an equivalence relation := R is reflexive, symmetric and transitive

Note: Asymmetric means not symmetric, and is not the same as antisymmetric.

Example: The subset relation is reflexive, anti-symmetric (note that the proof of anti-symmetry uses the axiom of extensionality of section 2.2) and transitive, but not connective.

If R is an equivalence relation on a set X, we denote the equivalence class of x with respect to R as [x]R.
Definition of equivalence class: [x]R := {y ∈ X | R(x, y)}

If R is an equivalence relation on X, the quotient set X/R of X modulo R is the set of equivalence classes [x]R for all x ∈ X.
Definition of quotient set: X/R := {[x]R | x ∈ X}

We now continue to build on the concept of relations, by categorizing them based on the properties they have. An important property of relations is the ability to compare and order elements. Suppose X and Y are sets, and R is a relation on X.

Definition of (weak) partial ordering: R is a (weak) partial ordering := R is reflexive, anti-symmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and transitive
Definition of strict partial ordering: R is a strict partial ordering := R is irreflexive, anti-symmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering := R is irreflexive, anti-symmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on X and each nonempty subset of X has a least element
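Since a relation on a finite set is just a finite set of pairs, the properties defined above can be tested mechanically. In the sketch below Python tuples stand in for ordered pairs (an assumption made for readability), and the function names and the two sample relations on {0, ..., 5} are hypothetical.

```python
def is_reflexive(r, x):
    return all((a, a) in r for a in x)


def is_symmetric(r):
    return all((b, a) in r for (a, b) in r)


def is_antisymmetric(r):
    return all(a == b for (a, b) in r if (b, a) in r)


def is_transitive(r):
    return all((a, d) in r for (a, b) in r for (c, d) in r if b == c)


def is_connective(r, x):
    return all((a, b) in r or a == b or (b, a) in r for a in x for b in x)


def equivalence_class(r, x, a):
    """[a]R: all elements of x that are related to a."""
    return frozenset(b for b in x if (a, b) in r)


def quotient_set(r, x):
    """X/R: the set of all equivalence classes."""
    return {equivalence_class(r, x, a) for a in x}


X = range(6)
same_mod_3 = {(a, b) for a in X for b in X if a % 3 == b % 3}
less_than = {(a, b) for a in X for b in X if a < b}
```

Here `same_mod_3` is an equivalence relation whose quotient set consists of the three classes {0, 3}, {1, 4} and {2, 5}, while `less_than` is irreflexive, anti-symmetric, transitive and connective, i.e. a total ordering in the sense defined above.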

Definition of well-foundedness: A set S is well-founded by a relation R := S is partially ordered by R and contains no infinite descending chains
A set S contains a set C that is an infinite descending chain iff C ⊂ S ∧ C has no minimal element.

Theorem: (without proof) Any subset of a well-founded set is also well-founded.

Now we can speak of a set of which the elements are ordered by a relation R. When R is a partial ordering we often denote it by the symbol ⪯, and when it is a quasi ordering by ≺. Now we can distinguish elements based on their order; we define the well-known concepts of (immediate) successor and predecessor.

Definition of (immediate) predecessor: An element x1 ∈ X is a predecessor of an element x2 ∈ X (with respect to an ordering R on X) := R(x1, x2) ∧ ¬R(x2, x1). x1 is an immediate predecessor of x2 if in addition (¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1, x3) ∧ R(x3, x2))

Definition of (immediate) successor: An element x2 ∈ X is a successor of an element x1 ∈ X (with respect to an ordering R on X) := R(x1, x2) ∧ ¬R(x2, x1). x2 is an immediate successor of x1 if in addition (¬∃x3 : x3 ∈ X ∧ x3 ≠ x1 ∧ x3 ≠ x2 : R(x1, x3) ∧ R(x3, x2))

In the literature the immediate successor or predecessor is sometimes called just successor or predecessor, or we simply speak of the ‘next’ or ‘previous’ value. Sometimes we also see that the term ‘direct’ is used instead of ‘immediate’.

Note that with these definitions it can be easily proved that if a relation R on X is an ordering, then an immediate predecessor or immediate successor of an element, if it exists, is unique (it need not exist: in the ordering < on Q no element has an immediate successor). If R is a well-ordering, each element except the largest has a unique immediate successor. The notions of smallest and largest elements will be introduced hereafter.

Let X be a set, partially ordered by ⪯ and let Y be a subset of X.

Definition of minimal element: x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ⪯ x)

Definition of maximal element: x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ⪯ y)
Definition of least element: x is a least (also called smallest or first) element of X := x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)
Definition of greatest element: x is a greatest (also called largest, last or maximum) element of X := x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)
Definition of lowerbound: x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)
Definition of upperbound: x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)
Definition of infimum: x is an infimum for Y in X := x is the greatest lowerbound for Y in X
Definition of supremum: x is a supremum for Y in X := x is the smallest upperbound for Y in X

Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a (weak) partial order, but it is neither a strict partial order nor a (total) order. 4 and 6 are minimal elements of X, but X has no least element. Taken in N, 1 is a lowerbound for X, and 2 is the infimum of X.
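The divisor example can be verified by computing the notions above directly from their definitions. The sketch below does so for X = {4, 6, 12, 24, 36}; the function names are hypothetical, and the naturals are restricted to the finite sample 1, ..., 36, which is an assumption that suffices here because every lowerbound of X must divide 4.

```python
def divides(a, b):
    """R(a, b) := a is a divisor of b."""
    return b % a == 0


def minimal_elements(x, leq):
    return {a for a in x if not any(b != a and leq(b, a) for b in x)}


def maximal_elements(x, leq):
    return {a for a in x if not any(b != a and leq(a, b) for b in x)}


def least_element(x, leq):
    candidates = [a for a in x if all(leq(a, b) for b in x)]
    return candidates[0] if candidates else None


def lower_bounds(y, universe, leq):
    return {a for a in universe if all(leq(a, b) for b in y)}


def infimum(y, universe, leq):
    """Greatest lower bound of y within universe, or None if there is none."""
    bounds = lower_bounds(y, universe, leq)
    greatest = [a for a in bounds if all(leq(b, a) for b in bounds)]
    return greatest[0] if greatest else None


X = {4, 6, 12, 24, 36}
N_SAMPLE = range(1, 37)  # finite stand-in for the naturals (assumption)
```

The computation reproduces the example: two minimal elements but no least element, and an infimum that lies outside X itself.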

The so-called least number principle says that any non-empty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, that will be introduced in section 3.4.

Example: The relation < on the naturals is an example of a total ordering on N. From the so-called least number principle we can conclude that N is also well-ordered by <. We prove the latter.

Proof: We know that < is an ordering on N. It remains to show that (∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element). We first show, by induction on n, that every non-empty A ⊆ {0, ..., n} has a least element. For n = 0 this is trivial, since then A = {0}. For n + 1, let A ⊆ {0, ..., n + 1} be non-empty. If A ∩ {0, ..., n} = ∅, then A = {n + 1} and n + 1 is a least element of A. If A ∩ {0, ..., n} ≠ ∅, we can apply the induction hypothesis to conclude that A ∩ {0, ..., n} has a least element; this least element of A ∩ {0, ..., n} is also a least element of A. Finally, let A ⊆ N be non-empty and choose some m ∈ A. Then A ∩ {0, ..., m} is non-empty, and its least element is a least element of A.

3.3 Functions

In mathematics, a function maps each element from an input set to an element of an output set; in other words it is a special kind of relation that indicates for each pair < x, y > of the input and output set if it belongs to the function or not. More precisely, f is a function or mapping from X to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y, notation f(x) = y. We can define this notion in set-theory by using a relation between X and Y such that for each x ∈ X there is a unique y ∈ Y such that < x, y > ∈ f.

Definition of function: f is a function from a set X to a set Y, notation f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : < x, y > ∈ f))

The definitions of domain and range as given in the subsection about relations can now also be used for functions. We now introduce a notation for the set of all functions f : X → Y.

Definition of Y^X: Y^X := {f ∈ P(X × Y) | f is a function from X to Y}

As we did before for relations and operations, we now define some general properties for functions.

Definition of injective: f : X → Y is injective or an injection := (∀x1, x2 : x1, x2 ∈ X : x1 ≠ x2 → f(x1) ≠ f(x2))
Definition of surjective: f : X → Y is surjective or a surjection := (∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))
Definition of bijective: f : X → Y is bijective or a bijection := f is surjective and f is injective

If f is bijective, f is also called a (one-to-one) correspondence between X and Y.

Example: We have the following property: f : X → Y is surjective ↔ ran(f) = Y.

Example: f : N → [−2π, 2π], with f(x) = sin(x) is a function and a relation. g : [−2π, 2π] → N, with g(x) = y iff x = sin(y) is a relation, not a function.

We will now consider two special kinds of functions: the identity function and the sequence.

Definition of sequence: s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)
Definition of identity function: The identity function idX := idX : X → X and (∀x : x ∈ X : idX(x) = x)

We now introduce some operations on functions in set-theory. We can easily check that these definitions correspond to mathematical operations.

Definition of composition: The composition g ◦ f of two functions f : A → B and g : B → C := the function g ◦ f : A → C with g ◦ f(x) = g(f(x)), for all x ∈ A
Definition of inverse function: The inverse of a bijection f : X → Y := the function f⁻¹ : Y → X with (∀x, y : x ∈ X ∧ y ∈ Y : f⁻¹(y) = x ↔ y = f(x))
Definition of restricted function: The restriction of a function f : X → Y to X0, with X0 ⊆ X := the function f↾X0 : X0 → Y with (∀x : x ∈ X0 : f↾X0(x) = f(x))

Just as in algebra, we can now combine a set and relations on that set into a structure.

Definition of (relational) structure: < X, R0, ..., Rp > is a (relational) structure := X is a set and R0, ..., Rp are relations on X

The concept of a structure enables us to abstract from the exact set and relations, and reason about sets of structures instead. There also is a useful definition for equivalence of structures, called isomorphism.
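For finite sets, the view of a function as a set of pairs can be made concrete. The sketch below checks the defining condition of a function and the properties injective and surjective, and builds composition and inverse as operations on sets of pairs; tuples again stand in for ordered pairs, and all names and sample sets are hypothetical.

```python
def is_function(f, x, y):
    """f ⊆ X × Y and every element of X occurs in exactly one pair of f."""
    in_product = all(a in x and b in y for (a, b) in f)
    return in_product and all(sum(1 for (a, _) in f if a == e) == 1 for e in x)


def is_injective(f):
    # assumes f is a function: no two pairs may share their second component
    images = [b for (_, b) in f]
    return len(images) == len(set(images))


def is_surjective(f, y):
    return {b for (_, b) in f} == set(y)


def compose(g, f):
    """g ◦ f as a set of pairs: first apply f, then g."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}


def inverse(f):
    return {(b, a) for (a, b) in f}


X, Y = {0, 1, 2}, {"a", "b", "c"}
f = {(0, "a"), (1, "b"), (2, "c")}
not_a_function = {(0, "a"), (0, "b"), (1, "c")}
```

For the bijection `f`, composing the inverse with `f` yields the identity relation on X, which matches the definition of the inverse function above.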

When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between images of those elements in the other structure), we call them isomorphic.

Let R = < X, R0, ..., Rp > and S = < Y, S0, ..., Sp > be two structures, such that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is ni + 1).

Definition of isomorphism: f is an isomorphism between R and S := f is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x0, ..., xni : x0, ..., xni ∈ X : Ri(x0, ..., xni) ↔ Si(f(x0), ..., f(xni))))

With the notion of isomorphism, we can now abstract over structures.

Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S
Definition of automorphism: f is an automorphism of R := f is an isomorphism from R to R

Example: An isomorphism from structure < N, < > to < Neven, < > is given by f : N → Neven, with f(n) = 2n. f is not an isomorphism from < N, ⊕ > to < N, < >, with a ⊕ b := b divides a.

Example: An automorphism of < A, R0, ..., Rp > is the identity function idA : A → A, so idA = {< a, a > | a ∈ A}. Also, the function f(x) = 2x³ is an automorphism of < R, < >.

Example: The function g : R+ → R with g(x) = log(x) is an isomorphism between < R+, ∗ > and < R, + >, because for all r1, r2 ∈ R+, log(r1 ∗ r2) = log(r1) + log(r2).
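The relation-preserving half of the definition of isomorphism can be tested on a finite sample. The following sketch checks R(a, b) ↔ S(f(a), f(b)) for the doubling function and for the logarithm; the names and sample values are hypothetical, and passing such a check on a sample neither proves the property for all elements nor establishes that f is a bijection.

```python
import math


def preserves_relation(f, sample, r, s):
    """Check R(a, b) <-> S(f(a), f(b)) for all a, b in a finite sample."""
    return all(r(a, b) == s(f(a), f(b)) for a in sample for b in sample)


def double(n):
    return 2 * n


def less(a, b):
    return a < b


def divided_by(a, b):
    # a ⊕ b := b divides a
    return a % b == 0


sample = range(1, 50)
order_preserved = preserves_relation(double, sample, less, less)
divisibility_to_order = preserves_relation(double, sample, divided_by, less)

# log turns multiplication on the positive reals into addition on the reals
log_is_additive = all(
    math.isclose(math.log(r1 * r2), math.log(r1) + math.log(r2), abs_tol=1e-12)
    for r1 in (0.5, 1.0, 2.0, 10.0)
    for r2 in (0.25, 3.0, 7.5)
)
```

Doubling preserves the order <, but it does not carry the divisibility relation ⊕ to <: already for a = b = 1 we have 1 ⊕ 1, whereas 2 < 2 is false.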

3.4 Induction Methods

There is a tradition of opposition between adherents of induction and deduction. In my view it would be just as sensible for the two ends of a worm to quarrel.
- A.N. Whitehead, quoted in [76]

3.4.1 Induction

Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. It should not be confused with the mathematical principle of induction (treated in section 3.4.3).

Example of inductive reasoning:²
Coffee shop burger no. 1 was greasy.
Coffee shop burger no. 2 was greasy.
. . .
Coffee shop burger no. 100 was greasy.
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).

As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. So in induction the conclusion contains information that was not contained in the premisses. In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is made based on an analogous situation, is also a primitive form of induction (see [23, page 6]). In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence). This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction.

² Example from: Peter Suber, Philosophy department, Earlham College.

3.4.2 Deduction

Mathematics, in its widest significance, is the development of all types of formal, necessary, deductive reasoning.
- A.N. Whitehead, quoted in [100]

In contrast to induction, deduction is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premise or a consequence of a statement occurring earlier in the proof. This usage is a generalization of what the Greek philosopher Aristotle called the syllogism (see [49, page 121]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction.

The Greek found deductive reasoning, not empirical procedures, the method to establish mathematical facts. Originally based on Aristotle's logic, the deductive argument has become more subtle and complex and is now based on modern symbolic logic. Logic, in mathematical context, can be seen as the theory of the formal structure of deductive reasoning. The German mathematician Hilbert greatly contributed to deductive reasoning as we will see when we introduce his proof theory (also known as the axiomatic method) in chapter 6. The logic of Hilbert's metamathematics (see section 6.1) and Russell's Principia Mathematica (see section 7.1) are a form of reasoning with deductive certainty, although others have proposed different formalizations of deductive logic (see [49, chapter 1, section 5 and 6]).

In deductive reasoning, we are not directly concerned with the truth of the conclusion but rather whether the conclusion does or does not follow from the premisses. If the conclusion follows from the premisses, we say that our reasoning is valid; if it does not, we say that our reasoning is invalid.

3.4.3 The principle of induction

Informal

The principle of induction, also known as mathematical induction, is an important process for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). 'Mathematical induction' is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.

The informal structure of the proof of a theorem by mathematical induction is fairly simple:

1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called base case.

2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it is true for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor case of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.

3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn't use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.

The induction step can take two forms which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements with +1 the immediate successor relation.

Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1

Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1

It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to 'all possible cases', then those cases must somehow be enumerable and in some way linked to the integers. And we have to be able to speak about the minimal case, the nth case, the successor of a given case, etc. This is important, since many mistakes are being made in applying the principle.

The induction principle assumes that S is a well-founded set and every element except for the smallest has an immediate predecessor. This condition is also known as S is inductive. The structure of an inductive set in fact resembles that of the naturals, i.e. if we have the axioms (see Peano axioms in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction.

Formal

We will now formally state the principle of induction. Suppose that we want to prove a property ϕ(s) that holds for all s ∈ S. The principle presupposes the following two conditions:

(A) S is a set, well-founded by relation R (such that '+' denotes the immediate successor of an element with respect to the relation R) and with smallest element e

(B) Every element except e has a (unique) immediate predecessor and ϕ is a property of elements of S

If (A) and (B) hold, we can use the induction principle.

Definition of the (weak) (mathematical) induction principle: if

(C) ϕ(e) (i.e. e has a property ϕ)

(D) (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)

then the property ϕ holds for every element in S.
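As a small worked instance of conditions (C) and (D), take S = N with e = 0 and s+ = s + 1. The property ϕ used here is an illustrative choice and is not taken from the text: ϕ(n) states that 0 + 1 + . . . + n = n(n + 1)/2.

```latex
% phi(n): 0 + 1 + ... + n = n(n+1)/2   (illustrative property, assumed for this sketch)
\begin{align*}
\textbf{(C)}\quad & \varphi(0): \quad \sum_{i=0}^{0} i = 0 = \frac{0 \cdot 1}{2} \\
\textbf{(D)}\quad & \text{assume } \varphi(s): \quad \sum_{i=0}^{s} i = \frac{s(s+1)}{2} \\
 & \sum_{i=0}^{s+1} i = \frac{s(s+1)}{2} + (s+1) = \frac{(s+1)(s+2)}{2},
   \quad \text{which is } \varphi(s+1)
\end{align*}
```

With (C) and (D) established, the induction principle yields ϕ(n) for every n ∈ N.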

Step (C) is also called the base of a proof by induction, step (D) is also called the induction step, and ϕ(s) is called the induction hypothesis.

Proof: Suppose S is a well-founded set and every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also know that if S is well-founded, any subset of S is also well-founded, thus N contains a smallest element n. If n = e, we have a contradiction. If n > e, n has an immediate predecessor, denoted n−. Since n is the smallest element for which ϕ doesn't hold, ϕ must hold for n−. But then by (D), ϕ must also hold for the immediate successor of n−, that is n: contradiction. Thus ϕ must be true for all s ∈ S.

As we mentioned before, this principle can be generalized in several ways. One way is to prove in step (C) that ϕ holds for a (possibly non-minimal) case b ∈ S. In step (D) we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)). The conclusion then is that the property ϕ holds for all elements in S that are ordered larger or equal to b.

We now show (with proof by contradiction) why the additional property (B) that every element except the smallest must have an immediate predecessor is necessary for the induction principle. Consider the natural numbers with the ordering ≺ defined as follows:

• if n and m are both even, then n ≺ m if n < m
• if n and m are both odd, then n ≺ m if n < m
• if n is even and m is odd, we always define n ≺ m

We can check that N is well-founded by ≺, but not every element (for example 1) has an immediate predecessor. We take the property ϕ that every element is even. The smallest element in the ordering is 0, which is even. Also, if s has property ϕ then so does the successor of s. That is because in our ordering, the successor of an even number is always the next even number, never an odd number, and if s has property ϕ, then s must be even.

Therefore (with only conditions (A), (C) and (D) holding) every natural number is even: contradiction!

There is however a weaker principle, called transfinite induction, which (suitably stated) does apply to every well-ordered set. But first we regard a stronger principle, that is based on the same assumptions ((A) and (B)) as the weak induction principle.

Principle of strong (mathematical) induction: The same as for (weak) induction, but instead of (C) and (D) with

(E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)

Sometimes this is also informally stated using the infamous three dots as (∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)). Note that the base case is not really left out, since it is implicitly present in the quantification (take e for x).

Proof: Suppose < X, R > is a structure such that (A), (B) and (E) hold. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e' (since < X, R > is well-founded). We now have ¬ϕ(e') ∧ (∀z : z ∈ X : R(z, e') → ϕ(z)). According to (E) (substitute z for y, X for S, and take e' for x) we then have ϕ(e'): contradiction.

This form of induction, when applied to ordinals (ordinals form a well-ordered and hence well-founded set and are introduced in section 3.8.2), is called transfinite induction.

Principle of transfinite induction³: The same as for strong induction, but instead of (A) and (B) as assumptions, it can be applied to any set S that is well-ordered by a relation R, and with smallest element e.

³ Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.

An example of such a set are the ordinals or cardinals, or even the class of all ordinals.

Proof: The proof of the principle of transfinite induction is similar to the proof of the strong induction principle.

A proof by transfinite induction typically needs to distinguish three cases:

1. s is a minimal element

2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.

3. s has no immediate predecessor (i.e. s is a so-called limit-ordinal, see also section 3.8.2). The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b and using this fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.

Clearly, all three given principles are equivalent, since we proved them to be true. These proofs however are based on an underlying set of axioms (the so-called ZF axioms and the Peano axioms, that will be introduced in section 5.3 and chapter 4 respectively). Without these conditions (to be exact, without Peano's induction axiom), we cannot directly prove the principles to be true from the ZF axioms alone⁴. In that case we can prove the equivalence of the principles by showing that they imply each other. As an example, we now prove that (mathematical) induction is a special case of transfinite induction, for the set of natural numbers. To prove this it suffices to show that ((C) and (D)) ↔ (E).

⁴ With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice, or one of its equivalent forms. The four statements known as 'Axiom of Choice', 'Zorn's Lemma', 'Well-Ordering principle' (also known as well-ordering theorem, see section 3.8.2) and 'Mathematical Induction Principle' are all equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other fundamental axioms in ZF set theory alone. There are also other equivalent statements that are sometimes used (such as Zermelo's postulate), and it is a nice exercise to prove the equivalence of these statements.

Normal induction (IND):
(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))

Transfinite induction (TFIND):
(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀m : m ∈ N : ψ(m)))

We can prove the equivalence of IND and TFIND in two ways: in a constructive way or with a proof by contradiction. We give both proofs.

Proof by Contradiction: (from: [17]) It suffices to prove that IND' ≡ TFIND', with

IND' ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))
TFIND' ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))

Proof of TFIND' → IND': Assume ϕ is a property. We assume TFIND', and instantiate ψ with the property ϕ. We now want to prove IND'. Suppose we have (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). If we take q = 0, we have (∀p : p ∈ N : p < 0 → ψ(p)) → ψ(0), i.e. ψ(0). Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N, ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND', (∀p : p ∈ N : p < q → ϕ(p)), with q = k + 1 must not be true: ¬(∀p : p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ϕ(p)). Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ϕ(s), that is (∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to TFIND' (namely if we take s for q and r for p), contradiction. Now we have proved that (∀ϕ :: (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))), and since we already have proven (∀ϕ :: ϕ(0)), we have IND'.

Proof of IND' → TFIND': Assume IND', instantiate ϕ with ψ. For all properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). First we prove this for q = 0. If we take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true, so we have to show ϕ(0). This is true by the assumption of IND'. Now we prove this for q > 0. By IND' we also know that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)), and thus ϕ(q) also holds for all q > 0. Hereby we have proved TFIND'.

Constructive Proof:

Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). By TFIND it suffices to show: (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let k ∈ N, and assume (∀l : l ∈ N : l < k → ϕ(l)). If k = 0, (∀l : l ∈ N : l < k → ϕ(l)) is trivially true since the range of l is empty, and ϕ(0) holds by assumption. Thus ϕ(k) holds for k = 0. Assume k > 0. This means ϕ(k − 1) holds (since k − 1 ∈ N and k − 1 < k). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Thus ϕ(k) holds also for k > 0. We now have that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k), for all k ∈ N, and TFIND yields (∀n : n ∈ N : ϕ(n)).

Proof of IND → TFIND: Assume IND, and let ϕ be a property. Also assume that (i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let s(k) := (∀l : l ∈ N : l < k → ϕ(l)). We prove (∀n : n ∈ N : ϕ(n)) by first proving that (∀n : n ∈ N : s(n)) by using IND, and subsequently that (∀n : n ∈ N : s(n) → ϕ(n)). Clearly, s(0) holds trivially since the range of l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k), we can conclude s(k + 1) because ϕ(k) follows from (i) and the definition of s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k + 1)), and thus (by using IND) that (∀n : n ∈ N : s(n)). And, by the definition of s, (i) gives us that (∀n : n ∈ N : ϕ(n)).

Structural Induction

In many cases we do not want to prove properties about the integers or similar well-ordered sets. However, forms of induction can also be appropriate when trying to prove properties about structures defined recursively. In such cases straight induction is not always useful. This generalized induction principle is known as structural induction. It is useful when objects are built up from more primitive objects: if we can show the primitive objects have the desired property, and that the act of building preserves that property, then we have shown that all objects must have the property. The inductive hypothesis (i.e. the assumption) is to assume that something is true for 'simpler' forms of an object and then prove that it holds for 'more complex' forms. 'Complexity' can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that

other object as a subpart, but this need not always be the case.

A general treatment of recursively defined structures (formal definition of structural induction over recursive datatypes) will be presented in a later version of this report.

Example: We show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types, in a later version of this report.

Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R, and n ∈ N we have

$$\mathrm{EQ} \equiv (x+y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}$$

We call the left-hand side of this equality LHS, and the right-hand side RHS, and abbreviate the equality by EQ. We assume two real numbers x and y and prove EQ by induction on n.

Basis case: For n = 0 the EQ clearly is correct, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is

$$\binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y$$

Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:

$$(x+y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j}$$

First, we rewrite the left side of this equation (in the third step we are in fact using the induction hypothesis):

$$\mathrm{LHS} = (x+y)^{k+1} = (x+y)^{k}(x+y) = \left(\sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j}\right)(x+y) = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

In rewriting the right side of the equation, we use Pascal's identity:

$$(\forall k, n : k, n \in \mathbb{N} \wedge 0 < k \le n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k})$$

We first prove the latter:

$$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!} = \frac{n!\,(k + (n-k+1))}{k!\,(n-k+1)!} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$$

Now we rewrite RHS. We split out the j = 0 and j = k + 1 terms before applying Pascal's identity.

$$\mathrm{RHS} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^{j}$$

$$= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left(\binom{k}{j} + \binom{k}{j-1}\right) x^{k+1-j} y^{j} = x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^{j}$$

We can now bring x^{k+1} into the first sum (as the j = 0 term), and y^{k+1} into the second sum (as the j = k + 1 term). This gives

$$\mathrm{RHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j}$$

and

$$\mathrm{LHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

The first sums of LHS and RHS are the same, and we can see that the second sums are also equal, by doing a dummy transformation (let i = j − 1):

$$\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}$$

So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and n ∈ N.

Example: We give an example of a proof about binary trees using structural induction. For this example we will use a definition in the notation of the language Z to describe recursive data structures. First we define a data structure for binary trees. The structure of a binary tree is well known and says that a tree is either a leaf or made up of two subtrees glued together by a node.

TREE ::= leaf | node < TREE × TREE >

An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)).

We now define the size of a tree, by counting both the leaves and the nodes. The basic idea of the definition is that we define the size of a tree inductively over the structure, saying how the size of a given tree is calculated from the sizes of its parts. Again we define the size in the language Z, by first declaring its type and then saying how it is defined in each of the two cases:

size : TREE → N
∀ t1, t2 : TREE •
size(leaf) = 1 ∧
size(node(t1, t2)) = 1 + size(t1) + size(t2)

Similarly, we make two new definitions about trees:

leaves : TREE → N
nodes : TREE → N
∀ t1, t2 : TREE •
leaves(leaf) = 1 ∧
leaves(node(t1, t2)) = leaves(t1) + leaves(t2) ∧
nodes(leaf) = 0 ∧
nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)

We now want to prove the following theorem by structural induction on the size of the tree t.

Theorem: For all trees t, size(t) = leaves(t) + nodes(t).

Proof: Let t, t1 and t2 be of type TREE. We prove the theorem by induction on the size of t.

Base case: Assume t = leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) + nodes(t) = leaves(t) + 0 = 1 + 0 = 1.

Induction case: Assume t = node(t1, t2). The induction hypothesis says that the theorem holds for all t' with size(t') < size(t). Then
size(t) = size(node(t1, t2))
= 1 + size(t1) + size(t2)
= (apply induction hypothesis to t1 and t2)
1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
And
leaves(t) + nodes(t) = leaves(node(t1, t2)) + nodes(node(t1, t2))
= (leaves(t1) + leaves(t2)) + (1 + nodes(t1) + nodes(t2))
= (commutativity and associativity of +)
1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
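The recursive definitions of size, leaves and nodes translate directly into code. The following sketch mirrors them in Python instead of Z; the class names and the enumeration helper all_trees are illustrative assumptions. Testing the theorem on all small trees is a sanity check, not a replacement for the structural induction proof.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    pass

@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def size(t: Tree) -> int:
    """size(leaf) = 1, size(node(t1, t2)) = 1 + size(t1) + size(t2)."""
    if isinstance(t, Leaf):
        return 1
    return 1 + size(t.left) + size(t.right)

def leaves(t: Tree) -> int:
    """leaves(leaf) = 1, leaves(node(t1, t2)) = leaves(t1) + leaves(t2)."""
    if isinstance(t, Leaf):
        return 1
    return leaves(t.left) + leaves(t.right)

def nodes(t: Tree) -> int:
    """nodes(leaf) = 0, nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)."""
    if isinstance(t, Leaf):
        return 0
    return 1 + nodes(t.left) + nodes(t.right)

def all_trees(n_nodes: int) -> list:
    """Enumerate every binary tree with exactly n_nodes internal nodes."""
    if n_nodes == 0:
        return [Leaf()]
    result = []
    for k in range(n_nodes):
        for left in all_trees(k):
            for right in all_trees(n_nodes - 1 - k):
                result.append(Node(left, right))
    return result

# The example tree node(leaf, node(node(leaf, leaf), leaf)).
example = Node(Leaf(), Node(Node(Leaf(), Leaf()), Leaf()))
print(size(example), leaves(example), nodes(example))

# Check size(t) = leaves(t) + nodes(t) on all trees with up to 5 nodes.
print(all(size(t) == leaves(t) + nodes(t) for n in range(6) for t in all_trees(n)))
```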

3.5 Real numbers

What do we mean when we say 'continuum'? Here is a description Albert Einstein gave on page 83 of [21]:

The surface of a marble table is spread out in front of me. I can get from any point on this table to any other point by passing continuously from one point to a 'neighboring' one, and repeating this process a (large) number of times, or, in other words, by going from point to point without executing 'jumps'. I am sure the reader will appreciate with sufficient clearness what I mean here by 'neighboring' and by 'jumps' (if he is not too pedantic). We express this property of the surface by describing the latter as a continuum.

People have been using the concept of real numbers for a long time (the Babylonians for example already calculated with roots long B.C., see [12]). Even simple equations have no solutions if all we knew were rational numbers (for example, there is no rational number x such that x^2 = x ∗ x = 2). In order for set theory to cover the fundamental structures of analysis, a precise and formal basis for the real numbers was needed. Cauchy and Heine tried to define the irrational or real numbers in the second half of the 19th century. In 1872 Cantor and Dedekind followed with their precise definition of the real numbers.

When Cantor developed his set theory, it was well known that each type of number could be constructed as the limit of a sequence of numbers of another type. But it became clear that, especially in connection with theorems asserting the existence of some limit relations (see [30, page 182]), the proof might require irrational numbers to be defined in terms of rational ones, in order to avoid begging the question of existence involved in the theorem.

We first present the three methods (of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals and then show that they are identifiable.

3.5.1 Dedekind's cuts

As a professor in the Polytechnic School in Zürich I found myself for the first time obliged to lecture upon the elements of the differential calculus and felt more keenly than ever before the lack of a really scientific foundation for arithmetic.
- Richard Dedekind, in the opening of the paper in which Dedekind's cuts were introduced.

Dedekind defined a cut to determine a real number. A cut is a partition of a sequence into two disjoint nonempty subsequences, all the members of one of which are less than all the members of the other. Dedekind used the point at which the sequence is partitioned⁵ to define a real number.

Definition of a (Dedekind) cut: Given an ordering < on a set V, a subset C ⊆ V is a cut in V :=
1) C ≠ ∅ ∧ C ≠ V
2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)
3) C does not have a greatest element

Example: {x ∈ Q | x < 0 ∨ x^2 < 2} is a cut in Q. Notice that we can also define the same cut as {x ∈ Q | x < 0 ∨ x^4 < 4}.

Each real number r can now be defined by a cut C in Q if r is the supremum for C. Each cut then determines a unique real number (see paragraph 3.5.4). We want to identify cuts that define the same real number, such as for example {x ∈ Q | x < 0 ∨ x^2 < 2} and {x ∈ Q | x < 0 ∨ x^4 < 4}.

Definition of (Dedekind) cut equivalence: A cut C1 is equivalent to a cut C2, notation C1 ∼ C2 := there is a supremum r for C1 and for C2

We can now define R_Dedekind as the set of all equivalence classes of all cuts in Q:
R_Dedekind := {C ⊆ Q | C is a cut in Q} / ∼

⁵ Actually, Dedekind's original definition did not use a partition but a slightly more complex division. For details see the link 'Dedekind cuts' at http://zax.mine.nu/stage.
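A cut in Q can be represented by its membership test alone, without ever mentioning a real number. The sketch below assumes the cut for √2 is taken as the downward-closed set {x ∈ Q | x < 0 ∨ x^2 < 2}; the function names and the formula used in larger_member are illustrative choices, not part of the text.

```python
from fractions import Fraction

def in_sqrt2_cut(x: Fraction) -> bool:
    """Membership in the cut {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2

def larger_member(x: Fraction) -> Fraction:
    """Given x in the cut, return a strictly larger rational that is
    still in the cut, illustrating that the cut has no greatest element."""
    if x < 0:
        return Fraction(0)
    # For x >= 0 with x^2 < 2: y = (2x + 2)/(x + 2) satisfies
    # y - x = (2 - x^2)/(x + 2) > 0 and y^2 - 2 = 2(x^2 - 2)/(x + 2)^2 < 0.
    return (2 * x + 2) / (x + 2)

x = Fraction(1)
climb = [x]
for _ in range(5):
    x = larger_member(x)
    climb.append(x)

print(in_sqrt2_cut(Fraction(7, 5)), in_sqrt2_cut(Fraction(3, 2)))
print([str(c) for c in climb])
```

Every element of climb lies in the cut and each is larger than the one before, so no rational in the cut can be its greatest element (condition 3 of the definition).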

Example: {x ∈ Q | x^2 < 2} has √2 as supremum. We can identify the real number √2 with the equivalence class of all sets that have √2 as supremum.

3.5.2 Cantor's chains of segments

In mathematics the art of proposing a question must be held of higher value than solving it.
- A thesis defended in Cantor's doctoral examination.

Cantor defined a chain of segments to determine a real number (see also [17, chapter 12]). This is a sequence of ever decreasing intervals in Q, the limit of which determines a unique real number.

Definition of chain segments: < a_n, b_n >^V_{n∈N} is a chain of segments (in V) :=
1) (∀n : n ∈ N : a_n ∈ V ∧ b_n ∈ V)
2) (∀n : n ∈ N : a_n ≤ a_(n+1) ≤ b_(n+1) ≤ b_n)
3) (∀n : n ∈ N : b_n − a_n ≤ 2^(−n))

Note that < a_n, b_n >^V_{n∈N} (notation < a_n, b_n >^V or < a_n, b_n > when it is clear which set V is meant) is actually a sequence, and in 3) a minimum bound is put on the speed of convergence.

Example: Consider the following chain of segments in Q:
<< 1, 2 >, < 1.4, 1.5 >, < 1.41, 1.42 >, < 1.414, 1.415 >, . . . >.
Each segment 'includes' √2.

We now want to be able to say when two chains are equivalent.

Definition of chain equivalence: The chains of segments < a_n, b_n > and < c_n, d_n > are equivalent, notation < a_n, b_n > ∼ < c_n, d_n > := (∀k : k ∈ N : b_k ≥ c_k ∧ d_k ≥ a_k)

Theorem: ∼ is an equivalence relation on the set of all chains of segments of Q.
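A chain of segments for √2 can be generated by repeated bisection, which halves the segment length at each step and therefore meets the bound b_n − a_n ≤ 2^(−n) exactly. This is a sketch under assumptions: bisection is one possible construction (not necessarily Cantor's), only finite prefixes of chains can be checked, and all function names are hypothetical.

```python
from fractions import Fraction

def sqrt2_chain(n_segments):
    """First n_segments segments <a_n, b_n> of a bisection chain in Q
    with a_n^2 <= 2 <= b_n^2 and b_n - a_n = 2^(-n)."""
    a, b = Fraction(1), Fraction(2)
    chain = [(a, b)]
    for _ in range(n_segments - 1):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid
        else:
            b = mid
        chain.append((a, b))
    return chain

def is_chain_prefix(chain):
    """Check conditions 2) and 3) of the definition on a finite prefix."""
    nested = all(a0 <= a1 <= b1 <= b0
                 for (a0, b0), (a1, b1) in zip(chain, chain[1:]))
    fast = all(b - a <= Fraction(1, 2 ** n) for n, (a, b) in enumerate(chain))
    return nested and fast

def equivalent_prefix(c1, c2):
    """Check b_k >= c_k and d_k >= a_k for every index k present in both prefixes."""
    return all(b >= c and d >= a for (a, b), (c, d) in zip(c1, c2))

bisection = sqrt2_chain(10)
decimal = [(Fraction(1), Fraction(2)),
           (Fraction("1.4"), Fraction("1.5")),
           (Fraction("1.41"), Fraction("1.42")),
           (Fraction("1.414"), Fraction("1.415"))]

print(is_chain_prefix(bisection), is_chain_prefix(decimal))
print(equivalent_prefix(bisection, decimal))
```

The bisection chain and the decimal chain overlap at every index checked, which is what chain equivalence demands of two chains that close in on the same real number.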

Each equivalence class of chains of segments in Q now determines uniquely a real number r. To be precise, r is determined by < a_n, b_n >∼ if (∀n : n ∈ N : a_n < r < b_n). r then is the only real number with this property (see also paragraph 3.5.4). We can now define R_Cantor as the set of all equivalence classes of chains of segments in Q:
R_Cantor := < a_n, b_n >^Q_{n∈N} / ∼

3.5.3 Cauchy-sequences

Men pass away, but their deeds abide.
- Louis Cauchy, his last words quoted in [22].

Cauchy defined a Cauchy sequence to determine a real number. His sequence of numbers defines a real by letting the numbers come closer to the real number in every step.

Definition of Cauchy Sequence: With a partial order on a set⁶ V, {a_n}^V_{n∈N} is a Cauchy sequence in V :=
1) (∀n : n ∈ N : a_n ∈ V)
2) (∀k : k ∈ N : (∃p : p ∈ N : (∀n, m : n, m ∈ N : n, m > p → | a_n − a_m | ≤ 2^(−k))))

We also denote a Cauchy sequence {a_n}_{n∈N} simply by a_n, and for the rest of this paragraph take V = Q.

Example: The informally (using '. . .' to informally indicate an infinite continuation) defined set {1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, . . .} is a Cauchy sequence. For each n ∈ N, a_(n+1) lies closer to √2 than a_n.

We now want to be able to say when two Cauchy sequences are equivalent.

⁶ V is in general an ordered, commutative ring. We will not further discuss this here.

Definition of Cauchy sequence equivalence: The sequences a_n and b_n are equivalent, notation a_n ∼ b_n := lim_{n→∞}(a_n) = lim_{n→∞}(b_n)

Note that in the definition of equivalence the hitherto undefined notion of a limit is used. With the following definition we can formalize the notion of a limit.

Definition of sequence convergence: A sequence {a_n}_{n∈N} of elements of a set V is said to converge to a sequence {b_n}_{n∈N}, notation lim_{n→∞}(a_n) = lim_{n→∞}(b_n) := (∀k : k ∈ N : (∃p, q : p, q ∈ N : (∀n, m : n, m ∈ N ∧ n > p ∧ m > q : | a_n − b_m | < 2^(−k))))

Note: convergence is usually defined in terms of real numbers, but we cannot use such a definition here because we have yet to define the reals. A sequence is said to diverge if it does not converge.

Theorem: Any convergent sequence {a_n}_{n∈N} is bounded and has a unique limit.

Proof: First we prove (by contradiction) the uniqueness. Suppose the sequence has 2 limits, c and c'. Take any k ∈ N. Then from the definition of convergence there is an integer p such that | a_n − c | < 2^(−k) if n > p. Also, there is an integer p' such that | a_n − c' | < 2^(−k), if n > p'. Adding the two inequalities we get (using the triangle inequality: (∀a, b :: | a + b | ≤ | a | + | b |)):
| c − c' | = | (a_n − c') + (c − a_n) | ≤ | a_n − c | + | a_n − c' | < 2^(−k) ∗ 2, if n > p ∧ n > p'.
Hence, for all k ∈ N, | c − c' | < 2 ∗ 2^(−k). This means c = c', thus the limit is indeed unique.

Now we prove boundedness. The sequence converges, so we can take, for example, k = 1. Then there is a p such that | a_j − c | < 2^(−1) for j > p. We then have, again using the triangle inequality, that | a_j | ≤ | a_j − c | + | c | < 2^(−1) + | c |, if j > p. Then the sequence can be bounded by
M = max{| a_1 |, | a_2 |, . . . , | a_p |, (1 + | c |)}

Each real number can now be defined by an equivalence class of Cauchy sequences: r is determined by a_n∼ if r = lim_{n→∞}(a_n), for each sequence a_n from the equivalence class a_n∼. The number r is then called the limit of the sequence a_n, notation lim_{n→∞}(a_n) = r, if (∀k : k ∈ N : (∃p : p ∈ N : (∀n : n ∈ N ∧ n > p : | a_n − r | < 2^(−k)))).
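The Cauchy condition speaks only about distances between terms of the sequence, so it can be examined with rational arithmetic alone. As an illustration, the sketch below uses Heron's iteration a_(n+1) = (a_n + 2/a_n)/2 starting at 2 to produce rationals approaching √2. The choice of this iteration and the helper names are assumptions, and the check only covers a finite prefix of the sequence.

```python
from fractions import Fraction

def heron_sqrt2(n_terms):
    """First n_terms of a_0 = 2, a_(n+1) = (a_n + 2/a_n)/2, all in Q."""
    a = Fraction(2)
    terms = []
    for _ in range(n_terms):
        terms.append(a)
        a = (a + 2 / a) / 2
    return terms

def cauchy_index(terms, k):
    """Smallest p such that |a_n - a_m| <= 2^(-k) for all n, m > p
    within the given finite prefix; None if the prefix is too short."""
    eps = Fraction(1, 2 ** k)
    for p in range(len(terms)):
        tail = terms[p + 1:]
        if tail and max(tail) - min(tail) <= eps:
            return p
    return None

terms = heron_sqrt2(6)
print([str(t) for t in terms[:4]])
print(cauchy_index(terms, 3), cauchy_index(terms, 10))
```

For each k the function reports an index p beyond which all computed terms lie within 2^(−k) of each other, which is the witness required by condition 2) of the definition of a Cauchy sequence, restricted to the terms that were computed.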

We can now define R_Cauchy as the set of all equivalence classes of Cauchy sequences in Q:
R_Cauchy := < a_n >^Q_{n∈N} / ∼

3.5.4 Properties of the three definitions

Before these definitions for real numbers were given, we intuitively thought of the reals as infinite sequences of (decimal) digits, i.e. all infinite sequences of decimal numbers. In the rest of this section we assume that by R we mean this set of reals. We can now check whether the three new definitions indeed are correct ways to identify real numbers:

1) < a_n, b_n >^Q is a chain of segments → (∃!c : c ∈ R : (∀n : n ∈ N : a_n ≤ c ≤ b_n))
2) C is a cut in Q → (∃!c : c ∈ R : c = supremum(C))
3) {a_n}_{n∈N} is a Cauchy sequence → (∃!c : c ∈ R : lim_{n→∞}(a_n) = c)

Then we can check for every newly defined set X of reals that:

a) it contains a countable, densely ordered (i.e. (∀r1, r2 : r1, r2 ∈ D ∧ r1 < r2 : (∃q : q ∈ Q : r1 < q < r2))) set D without endpoint, which is dense in X.
b) every Dedekind cut has a supremum in X.

If a definition satisfies a) and b) it possesses the properties we intuitively want the real numbers to have. It can be proven that if these two properties hold we have defined the reals successfully such that there is a total ordering on the reals, the reals are densely ordered and the ordering is continuous. Every set for which a) and b) hold is isomorphic with R.

3.6 Infinite sets

Our minds are finite, and yet even in these circumstances of finitude we are surrounded by possibilities that are infinite, and the purpose of life is to grasp as much as we can out of that infinitude.
- A.N. Whitehead in [76]

The size of a finite set V, notation | V |, can be defined by the number of elements that it has. But counting the elements does not end for infinite sets. Cantor was concerned with the problem of measuring the sizes of infinite sets (because he was investigating questions about singularities of Fourier series, see [30, chapter 4]) and proposed a rather nice solution to this problem. He observed that two finite sets have the same size if the elements of one set can be paired with the elements of the other set, i.e. this method compares sets without resorting to counting and can be extended to infinite sets. This is the concept of an equivalence relation between sets (the relation is also referred to as ‘are of the same cardinality’, ‘equipotent’ or ‘equipollent’ (see [30, page 229])).

Definition of set equivalence: A set V is equivalent to a set W, notation V ∼ W := there is a bijection f : V → W

It is simple to check that ∼ has the properties of an equivalence relation, i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a true relation, we need the concept of 𝒱, the set of all sets: ∼ ⊆ 𝒱 × 𝒱. But the existence of 𝒱 is paradoxical, see section 3.8.

This new method to measure the number of elements of a set is reflected in the notion of cardinality of a set, and led to the surprising result that there are many levels of infinity. Before we present a proof of this result, using Cantor’s famous diagonalization method, we first introduce some more definitions.

Postulate for Cardinal numbers: With every set V is associated a well-defined abstract entity V̿ (V with a double overbar), called the cardinal number of V, such that V ∼ W ↔ V̿ = W̿.

We can think of V̿ as denoting the common property of set equivalence (as defined above) of all sets in the equivalence class of V. Cantor regarded cardinals as special abstract entities of a new kind. It proved difficult however, to come to an exact definition of cardinality from this postulate. In 1884, the German mathematician Frege came with his own definition of cardinal numbers. He discussed it with the mathematician Russell and they proposed the idea of defining V̿ as V / ∼, the equivalence class of V modulo ∼. The postulate for cardinal numbers then follows at once. This Frege-Russell definition would become standard, until (as we will later see in section 3.8.1) it became known that this definition could also lead to a paradox.

This new concept enabled Cantor to define more concepts for the analysis of infinite sets. It also inspired others to analyze the properties of infinite sets. Frege also denoted finite cardinal numbers as natural numbers: ∅ = 0, {∅} = 1, {∅, {∅}} = 2, ...

Cantor used the Hebrew letter aleph to name the different levels of infinity. The cardinality of the set of natural numbers is by definition called aleph-null or aleph-nought, notation ℵ0. The ‘next levels’ of infinity are called ℵ1, ℵ2, ... Since the cardinality of the set of reals was unknown, Cantor defined it as c. If we assume the continuum hypothesis (see section 3.7), that says there is no level of infinity between the cardinality of N and R, the cardinality of the set of reals can also be denoted by aleph-one, notation ℵ1.

Property of cardinality: Given the cardinality V̿ of a set V, we have
• If V is finite: V̿ = the number of elements of V
• If V is infinite: V̿ = ℵi, when there exists a bijection between V and the set P^i(N) (identifying ℵi with the cardinality of P^i(N) in this way presupposes the generalized continuum hypothesis, see section 3.7)

Sometimes the cardinality of a set V is also denoted by | V |, after the size of a set V.

No other question has ever moved so profoundly the spirit of man; no other idea has so fruitfully stimulated his intellect; yet no other concept stands in greater need of clarification than that of the infinite.
- D. Hilbert, quoted in [96]

In the rest of this section we will present some of the results of the research of infinite sets.

Definition of finite: A set V is finite := (∃n : n ∈ N : V ∼ {x ∈ N | x < n})

Definition of infinite: A set V is infinite := V is not finite

Definition of Dedekind infinite: A set V is Dedekind infinite := (∃W : W ⊂ V : V ∼ W)

Theorem: V is Dedekind infinite ↔ V is infinite (from [17])

Proof: We show that V is infinite iff N ≤1 V. We prove the two implications of the theorem separately:

V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there exists a W ⊂ V such that V ∼ W, i.e. there exists a bijection f : V → W. Because W is a proper subset of V there also exists an a ∈ V such that a ∉ W. Consider the function g : N → V, defined recursively by g(0) = a and g(k + 1) = f(g(k)). We now have to show that g is an injection, i.e. for all i, j ∈ N : i ≠ j → g(i) ≠ g(j). We use induction on i:

i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) ∈ W, so g(0) ≠ g(j).

i = k + 1: assume k + 1 ≠ j, then we can prove g(k + 1) ≠ g(j) by induction on j:

j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0).

j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction hypothesis g(k) ≠ g(l). Since f is a bijection we also have that f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1) or g(i) ≠ g(j).

V is Dedekind infinite ← V is infinite: N ≤1 V, so there exists an injection f : N → V. We show that W := V − {f(0)}, clearly a proper subset of V (W ⊂ V), is equivalent to V (W ∼ V). The following function g is a bijection from V to W: g(f(i)) = f(i + 1), for all i ∈ N, and g(x) = x if x ≠ f(i) for all i ∈ N.

Definition of countable: A set V is countable, also called denumerable := V is finite or V ∼ N

Definition of uncountable: A set V is uncountable := V is not countable

Definition of denumeration: A denumeration of a set V is a bijection f : N → V

Cantor then proved that N, Z and Q all have the same cardinality and also called these sets countably infinite.

Theorem: Q is countable

Proof: We give a bijection from N to Q, by listing all elements of Q. Consider a table with all fractionals a/b (a ∈ N, b ∈ N+), with fractional a/b on the a-th row and the b-th column. If we list all elements row by row, we would not obtain a correspondence between N and Q, since the list would never get to the second row. By listing the elements at the diagonals (south-west to north-east), starting from the north-west corner, we obtain a correspondence between N and Q; we hereby skip an element when it would cause a repetition, because 2/2 = 1/1, etc. We can also give a bijection from Q to an infinite subset of N which is equivalent to N: for each fractional a/b ∈ Q with a and b relatively prime, let f(<a, b>) := 1/2 (a + b)(a + b + 1) + a, see [17, page 99].

An example of an uncountable set is the set of real numbers, R. In 1873 Cantor proved that R is uncountable. The proof given here uses a technique called diagonalization (also known as the diagonal method), which Cantor published in 1891.

Theorem: R is uncountable

Proof: Suppose there is a bijection f between N and R. We contradict this by finding an x in R that is not paired with anything in N. We construct this x by taking the first fractional digit of x arbitrarily but never 0 or 9 or the first fractional digit of f(1), the second fractional digit of x also different from 0, 9, and the second fractional digit of f(2), etc. Continuing this way

down the diagonal of the table of digits, we obtain all digits of x. x is not f(n) for any n because the nth fractional digit of x differs from the nth fractional digit of f(n). Note that we avoid the problem of certain numbers such as 2.3999... and 2.4000... being equal by never selecting a 9 or a 0. Similarly, we can use this diagonalization method to show that N ≁ {0, 1}^N.

Theorem: (∀V :: P(V) ∼ {0, 1}^V), (see [17, page 98])

Proof: We show that there is a bijection K from P(V) to {0, 1}^V. For W ⊆ V, define K(W) (also denoted K_W), the characteristic function of W, as:
K_W(v) = 1 if v ∈ W
K_W(v) = 0 if v ∉ W.
We now show that K is a bijection from P(V) to {0, 1}^V:

1) K is injective: let W1, W2 ⊆ V and suppose W1 ≠ W2, that means there is an element w ∈ V, such that (w ∈ W1 ∧ w ∉ W2) ∨ (w ∉ W1 ∧ w ∈ W2). Then we have that (K_W1(w) = 1 ∧ K_W2(w) = 0) ∨ (K_W1(w) = 0 ∧ K_W2(w) = 1), and thus (∃w : w ∈ V : K_W1(w) ≠ K_W2(w)), i.e. K_W1 ≠ K_W2.

2) K is surjective: suppose g ∈ {0, 1}^V. Let W_g = {v ∈ V | g(v) = 1}. Then (∀v : v ∈ V : K_Wg(v) = 1 ↔ g(v) = 1), thus (∀v : v ∈ V : K_Wg(v) = g(v)), and g = K_Wg.

We can define an ordering relation ≤1 on the cardinalities of sets. We say that V ≤1 W if there is an injection from V to W. Then V <1 W of course means that V ≤1 W holds but not V ∼ W. This relation on the set of cardinals only depends on the cardinals themselves and not on the choice of the particular sets V and W. The relation ≤1 is reflexive and transitive. Cantor also conjectured that ≤1 is a partial order. This was later proven independently by the two mathematicians F. Bernstein and E. Schröder (see [59, page 39]).

We give two theorems that are based on the relation <1:

Theorem: (without proof) (∀V : V is a non-empty set : V <1 P(V))
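The diagonal construction used in the proof that R is uncountable can be sketched for a finite list of digit strings. This is only an illustration under stated assumptions: the function name `diagonal_digits` and the choice of the replacement digits 5 and 4 are not from the text, which merely requires a digit other than 0, 9 and the digit on the diagonal.

```python
def diagonal_digits(rows):
    """Return a digit string whose n-th digit differs from the n-th digit
    of rows[n], never using 0 or 9 (so no 2.3999... = 2.4000... ambiguity)."""
    out = []
    for n, row in enumerate(rows):
        d = row[n]                      # digit on the diagonal
        out.append('4' if d == '5' else '5')
    return ''.join(out)

# Fractional digits of four listed numbers (hypothetical sample data).
listed = ["1415", "7182", "4142", "5555"]
x = diagonal_digits(listed)
print(x)
# x differs from every listed row in at least one position:
print(all(x[n] != listed[n][n] for n in range(len(listed))))
```

For an infinite list the same rule is applied to every row, which is why the resulting x cannot equal f(n) for any n.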

Theorem: V is Dedekind infinite ↔ N ≤1 V

Proof: This theorem follows directly from the theorem on page 53 and the definition of infinite.

Although we have seen that N is countable but R is not, we might still think that there is some smaller interval of the reals that can be paired to the naturals.

Theorem: N ≁ [0, 1]

Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1], in particular (∀f : (f : N → [0, 1]) : f is not surjective). We do this by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that (∀n : n ∈ N : f(n) ≠ y). We construct this y by means of a chain of segments (see paragraph 3.5.2). Let f : N → [0, 1]. Let S_n be an infinite chain of segments such that
1) (∀i : i ∈ N : f(i) ∉ S_i)
2) (∀i : i ∈ N : S_{i+1} ⊆ S_i)
3) (∀i : i ∈ N : | S_i | = 3^{-i-1}), with | S_i | being the length of segment S_i.
We can construct such a chain of segments, for if we divide a segment S_n = [p_n, q_n] in three equal parts (i.e. each part has length 3^{-n-2}), at least one of these parts does not contain f(n + 1). We take this part for S_{n+1}. The constructed chain of segments determines (see paragraph 3.5.2) a real number y, with (∀n : n ∈ N : y ∈ S_n), and thus certainly y ∈ [0, 1]. We also have that (∀n : n ∈ N : f(n) ∉ S_n ∧ y ∈ S_n), so (∀n : n ∈ N : y ≠ f(n)).

The following theorem gives a way to prove the equivalence of sets:

Theorem of Cantor-Bernstein: V ≤1 W ∧ W ≤1 V → V ∼ W

Proof: Assume V ≤1 W and W ≤1 V. Then there are injections f : V → W and g : W → V. We know that Dom(g) = W and that g is injective, so W ∼ Ran(g); to prove V ∼ W we therefore have to prove Ran(g) ∼ V. Since Ran(g) ⊆ V and g ◦ f is an injection from V to Ran(g), we have V ≤1 Ran(g). And since for all W and V, W ⊆ V ∧ V ≤1 W → V ∼ W (see the lemma below), we have Ran(g) ∼ V.
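The chain of segments in Poincaré's proof can be computed step by step for any finite initial part of f. The sketch below is a minimal illustration, assuming the values f(0), f(1), ... are given as exact rationals; the function name `avoiding_segments` and the sample values are hypothetical.

```python
from fractions import Fraction

def avoiding_segments(values):
    """Given f(0), f(1), ... in [0, 1], return closed segments S_0, S_1, ...
    with S_{i+1} inside S_i, f(i) not in S_i and |S_i| = 3^(-i-1)."""
    lo, hi = Fraction(0), Fraction(1)
    segments = []
    for v in values:
        third = (hi - lo) / 3
        for j in range(3):
            a, b = lo + j * third, lo + (j + 1) * third
            if not (a <= v <= b):       # a point lies in at most two thirds,
                lo, hi = a, b           # so one of the three always qualifies
                break
        segments.append((lo, hi))
    return segments

f_values = [Fraction(1, 2), Fraction(1, 3), Fraction(0), Fraction(1, 9)]
for i, (lo, hi) in enumerate(avoiding_segments(f_values)):
    print(i, lo, hi, hi - lo)
```

Every point of the last segment already differs from all listed values, which mirrors how the limit point y of the infinite chain differs from every f(n).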

Lemma: W ⊆ V ∧ V ≤1 W → V ∼ W

Proof: Suppose W ⊆ V and V ≤1 W. There is an injection h : V → W. Let A_0 := V − W, and (∀n : n ∈ N : A_{n+1} := h(A_n)). We now give the desired bijection k : V → W.
• k(a) := a if a ∉ ∪_n A_n
• k(a) := h(a) if a ∈ ∪_n A_n

We show that k is a bijection:
• k is injective: Suppose a ≠ b, then k(a) ≠ k(b) by using a case analysis: a ∉ ∪_n A_n ∧ b ∉ ∪_n A_n; a ∈ ∪_n A_n ∧ b ∉ ∪_n A_n; a ∉ ∪_n A_n ∧ b ∈ ∪_n A_n; a ∈ ∪_n A_n ∧ b ∈ ∪_n A_n. For all cases, it follows that k(a) ≠ k(b) by the definition of k and the injectivity of h.
• k is surjective: Suppose w ∈ W. Again we use case analysis:
– if w ∉ ∪_n A_n then w = k(w).
– if w ∈ ∪_n A_n, assume w ∈ A_p. Since w ∈ W, w ∉ A_0, thus p ≥ 1. Thus there is a w' ∈ A_{p−1} such that w = h(w') = k(w').

Using the Cantor-Bernstein theorem we can also prove that (0, 1) ∼ R ∼ R^n ∼ {0, 1}^N ∼ P(N) ∼ N^N, for all n ∈ N, n ≥ 1.

Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and subsequently that (0, 1) ∼ (a, b).

Proof of (0, 1) ∼ [0, 1]: The identity function id_(0,1) : (0, 1) → [0, 1] is an injection from (0, 1) to [0, 1], so (0, 1) ≤1 [0, 1]. The function f(x) = 1/3 (x + 1) is an injection from [0, 1] to (0, 1), so [0, 1] ≤1 (0, 1). By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].

Proof of (0, 1) ∼ (a, b): The function f(x) = (b − a)x + a is a bijection from (0, 1) to (a, b).

Then, by the transitivity of ∼ we can conclude that (a, b) ∼ [0, 1].
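The bijection k from the lemma can be made explicit for V = [0, 1] and W = (0, 1), taking the injection h(x) = (x + 1)/3 from the example. This is a sketch under assumptions: it works on exact rationals only, and the membership test for the union of the A_n (repeatedly applying the inverse 3y − 1 of h) is an illustrative device that is not part of the text.

```python
from fractions import Fraction

HALF = Fraction(1, 2)                   # the fixed point of h

def h(x):
    """Injection from [0, 1] into (0, 1), as in the example."""
    return (x + 1) / 3

def in_union_of_A(a):
    """True iff a lies in some A_n, where A_0 = {0, 1} and A_{n+1} = h(A_n).
    Undo h until we reach 0 or 1 (member) or leave (0, 1) (not a member)."""
    y = Fraction(a)
    while 0 < y < 1 and y != HALF:      # distance to 1/2 triples each step
        y = 3 * y - 1                   # inverse of h
    return y == 0 or y == 1

def k(a):
    """The bijection of the lemma: shift the points of the A_n by h, fix the rest."""
    a = Fraction(a)
    return h(a) if in_union_of_A(a) else a

for a in (0, 1, Fraction(1, 3), Fraction(1, 4), HALF):
    print(a, "->", k(a))
```

The endpoints 0 and 1 are pushed into the open interval and each later A_n is shifted one step further, which is how k avoids ever needing a value outside (0, 1).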

Theorem: V is infinite → N ≤1 V

Proof: V is infinite and thus not empty. We take one element x_0 ∈ V. Next, we take an element x_1 ∈ V − {x_0}. We can repeat this infinitely (i.e. for all n we can select an x_{n+1} ∈ V − {x_0, ..., x_n}), if we assume that it is possible to always select an element from any non-empty set (see the axiom of choice below). In this way we get a countable subset of V, namely {x_0, x_1, x_2, ...}. The only assumption we have made here is the so-called axiom of choice.

Axiom of choice (AC): Given any set W of non-empty sets V, there is a function f which assigns to each member V of W an element f(V) of V. Such a function f is called a choice function for W.

This definition was proposed first in an article by Zermelo in 1908 (translated in [93, pages 199-215]). Zermelo regarded the AC as already implicitly used by mathematicians. In response some people asked when this assumption developed from mathematics. Zermelo used AC to prove the well-ordering theorem, but the controversy over his proof of 1904 (see [63, page 310]) led Zermelo to axiomatize set theory (see section 5.1). We can add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom of Choice). For more details on the role of the AC, when it is implicitly used, and when exactly it can or cannot be avoided, we refer to section 5.3 and [63]. See http://zax.mine.nu/stage and click on ‘links’ for some quotes about the AC.

The axiom can be restricted by limiting it to those families W of a particular cardinality. Since for any finite W the axiom is provable, the weakest non-trivial case occurs when W is denumerable (see page 54 for the definition of denumerable). This case is known as the Denumerable axiom.

An instance of the following theorem (without proof) of the British mathematician F.P. Ramsey is often used in graph theory. The notation V^n in this theorem is defined as the set of all subsets of V with n elements, i.e. V^n := {X ⊆ V | X has n elements}.

Theorem of Ramsey: If V is a denumerable set and f : V^n → {0, 1, ..., m − 1} with n, m ∈ N and n, m ≥ 1 then (∃W : W ⊆ V : W is denumerable and f is constant on W^n).

Theorem: R^2 ∼ R ∼ (0, 1)

Proof: We can say that R ∼ (0, 1) if there is a bijection between (0, 1) and R. Indeed, there exists a bijection f : (0, 1) → R, defined as f(x) = tan(π/2 (2x − 1)). Thus: R ∼ (0, 1). If we consider an element of (0, 1)^2, that is two real numbers between 0 and 1, then we can map these numbers to an element r ∈ (0, 1) by alternately taking the next digit of each of the two numbers. For example, we map (0.76584..., 0.13275...) uniquely to 0.71635.... Clearly, this mapping is well-defined. Thus: R^2 ∼ R. Since ∼ is transitive, we know that R^2 ∼ R ∼ (0, 1).

Theorem: P(N) ∼ (0, 1)

Proof: First we show that P(N) ≤1 R. Suppose V ∈ P(N), map V to the decimal 0.a_1 a_2 ..., with a_i = 1 if i ∈ V and a_i = 0 otherwise. This injection proves that P(N) ≤1 R. Now we give an injection from (0, 1) to P(N): assume r ∈ (0, 1), i.e. r = 0.a_1 a_2 ..., with 0 ≤ a_i ≤ 9. We want to identify numbers such as 0.3999... and 0.4000.... Therefore we assume there is not an i ∈ N such that for all n > i, n ∈ N, a_n = 9. Then we map r to the set {1a_1, 1a_1 a_2, ...} of natural numbers. For example, r = 0.17803... is mapped to the set {11, 117, 1178, 11780, 117803, ...}. Thus (0, 1) ≤1 P(N), hence P(N) ∼ (0, 1).

Corollary: P(N) ∼ R

Proof: This directly follows from P(N) ∼ (0, 1), (0, 1) ∼ R, and the transitivity of ∼.
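Both digit manipulations in these proofs are easy to try on finite prefixes of decimal expansions. The sketch below is only illustrative: the function names are assumptions, and the inputs are strings holding the first few fractional digits rather than full infinite expansions.

```python
def interleave(x_digits, y_digits):
    """Alternate the fractional digits of two numbers: x1 y1 x2 y2 ..."""
    return ''.join(a + b for a, b in zip(x_digits, y_digits))

def prefix_set(digits):
    """Map 0.a1a2a3... to the naturals 1a1, 1a1a2, 1a1a2a3, ...
    The leading 1 keeps leading zeros of the expansion visible."""
    return {int('1' + digits[:i]) for i in range(1, len(digits) + 1)}

print(interleave("76584", "13275"))     # digits of the pair from the example
print(sorted(prefix_set("17803")))      # [11, 117, 1178, 11780, 117803]
```

Two different expansions differ at some digit, so their prefix sets differ from that position on, which is the reason the second map is an injection.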

3.7 The Continuum Hypothesis

We still think that the study of the size of the continuum should be our guiding light for further research in set theory.
- Judah Haim in [33]

After showing that the real numbers cannot be put into one-to-one correspondence with the natural numbers (see section 3.6), Cantor hypothesized in 1877 that each infinite subset of R is either denumerable or equivalent to the continuum. This hypothesis was first published in 1878 in [13] and became later known as:

The Continuum Hypothesis (CH): (N ≤1 A ≤1 R) → (A ∼ N ∨ A ∼ R)

This hypothesis (as given in [17, page 128]) is also known in many other forms, of which we will mention and explain the most important. We will not go into details of less precise or more dependent formulations such as ‘what is the cardinality of the set of points on a geometrical line?’. We can immediately see that the following version of CH is equivalent to the given definition: ‘any set of real numbers is either finite, countably infinite or has the same cardinality as the entire set of reals’.

As we saw in section 3.6, Cantor defined the cardinality of the natural numbers to be ℵ0, and the next levels of infinity to be ℵ1, ℵ2, ℵ3, etc. He also named the cardinality of the reals c, for continuum. Cantor’s original formulation of CH was:

(B) c = ℵ1.

This means that ‘the number of real numbers is the next level of infinity above the number of natural numbers’ (see also [30, page 197]). Since Cantor also proved that P(N) ∼ R (see page 59), we can also state CH as:

(C) P(N) ∼ ℵ1.

Actually in this formulation we have identified the cardinalities ℵ0 and ℵ1 with the sets that have these cardinalities. The cardinality of the power set of any set X is equal to the cardinality of {0, 1}^X (see page 55), often denoted as 2^X, so another formulation 7 of CH is:

(D) 2^ℵ0 = ℵ1 (see [31]).

7 These formulations, although (B) leads us to think about sizes of reals, (C) about subsets and (D) about cardinal exponentiations, are all equivalent in ZFC.

In 1908 the German mathematician Felix Hausdorff proposed the following generalization of CH (that is also called aleph-hypothesis):

The Generalized Continuum Hypothesis (GCH): (∀r : r is an ordinal : 2^ℵr = ℵr+1)

For a definition and the notation of ordinal numbers, we refer to section 3.8.2. Obviously, for the generalized continuum hypothesis we have that ZF + GCH ⊢ CH (for ZF, see section 5.3). Note that ZF + GCH ⊢ AC (so we don’t need ZFC once we have GCH; for details see [31, page 117]).

Some of the theory that is needed in the remaining part of this section will be introduced in later chapters. If you are not familiar with the notations that are used, you might want to skip the remaining part of this section and get back to it later.

Cantor and many other great mathematicians spent years trying to prove CH or its negation (Cantor tried to prove his hypothesis by using a decomposition theorem, but did not succeed). This problem was so important that Hilbert (see section 6.2) put it first in his list of 23 problems. In 1938 significant progress was made when the mathematician Gödel proved that CH is consistent with ZFC (see section 5.3.2) by constructing a model of ZFC + CH; he later discussed the problem in his article ‘What is Cantor’s continuum problem?’. Since in the same period Gödel had proved his famous incompleteness theorem (see chapter 8), people suspected that CH was one of the statements (of ZFC) that can neither be proved nor disproved. Mathematicians suspected that CH was undecidable in ZFC but it took until 1963 until this was proved by Paul Cohen in [15]. To do that he used a new technique called forcing.

Forcing is a combinatorial technique for proving statements consistent with the axioms of set theory. Cohen used it in order to prove that the negation of AC and the negation of CH are consistent with the axioms of set theory (AC and CH were already known to be consistent). Essentially it consists of a method of performing the following algorithm: start with a model of set theory M. Construct an object X not in M with certain properties. Consider the smallest model M' with X an element of M' and M a subset of M' (this is done in a way such that the construction of M' is implicit in the construction of

X). For more details on forcing, see [51] and [81]. Thus Cohen constructed a model of ZFC + ¬CH and this, along with Gödel’s model of ZFC + CH, showed that CH is undecidable in ZFC.

So this means that either CH or ¬CH could be added as an axiom of ZFC. But since neither of these axioms seems axiomatic or ‘self-evident’ they have, unlike AC, not been adopted as axioms of set theory. Mathematicians either accept this incompleteness in set theory or try to find more intuitive axioms that will help decide it. In other words, the question remains what intuitive axiom of set theory we need to make it more complete, and whether, with some axiom system for set theory, the continuum hypothesis is true.

3.8 Cardinal and Ordinal numbers and Paradoxes

Every transfinite consistent multiplicity, that is, every transfinite set, must have a definite aleph as its cardinal number.
- Georg Cantor

3.8.1 Cardinal numbers and Cantor’s Paradox

In section 3.6 we already encountered cardinal numbers and the notion of set equivalence. After defining the equivalence of sets (see page 51), Cantor realized that all sets that are equivalent to a given set V have a common property, a property that abstracts from the nature and order of the elements of a set. He identified this property with the cardinal number V̿ of a set V.

Example: Consider the following sets: A = {1, 2, 3}, B = {3, 7, 1}, C = {{4}, 2, {a, b}}, D = {1, {4}}. We can say that A ∼ B ∼ C, or (equivalently) A̿ = B̿ = C̿. We also have A ≁ D, or A̿ ≠ D̿. Note that in this example the equality ‘=’ between cardinal numbers is a new type of equality that is defined as A̿ = B̿ ↔ A ∼ B.

We can see that cardinality abstracts from the order and nature of the elements, and for finite sets the cardinal number can be identified with the ordinary ‘number of elements’. Therefore we identify the cardinal number of a finite set of n elements with the natural number n. We denote the smallest infinite (or transfinite) cardinal number by ℵ0; this is the cardinal number of N or any denumerable infinite set. As we have already seen on page 52, Cantor defined the ‘next’ levels of infinity by ℵ1, ℵ2, ...

The next question was how to pass from the abstract notion of cardinal numbers to real cardinal numbers, i.e. one wanted to regard cardinal numbers as objects of the mathematical system. It turned out to be quite a problem to define the cardinal V̿ of a set V as an object of set theory. In naive set theory, as well as in Quine’s ‘New Foundations’ (see section 7.3), the definition of the cardinal V̿ of V poses no problem: V̿ can be defined as the set of all sets equivalent to V. But this definition (first given by Frege, see page …) of cardinal numbers as given in section 3.6 can lead to a paradox that was first found by Cantor.

Cantor’s paradox: The set of all sets is its own power set. Therefore, the cardinality of the set of all sets must be bigger than itself.

With this paradox the need arose to find a new definition of cardinals in a context without the unrestricted comprehension axiom, such that traditional paradoxes could no longer be derived. In axiomatic set theory however (e.g. in ZF, see section 5.3), without the unrestricted comprehension axiom, there is no set which contains all sets equivalent to V. Several new definitions of cardinal numbers were then proposed. The following definition that comes from the mathematician von Neumann is now the standard definition for cardinal numbers, based on ordinal numbers (for which we refer to the next section 8).

Definition of Cardinal number (or initial number): A cardinal number α := an ordinal number α with property (∀γ :: α ∼ γ → α ≤ γ)

For each set V we can prove (see [17, section 2.10]) that there exists exactly one cardinal number α satisfying V ∼ α (proof uses AC). The existence of such an α is guaranteed by the well-ordering theorem. We call this unique α the cardinality or cardinal number of the set V, and it is also denoted by V̿. In other words, with the axiom of choice we can develop the theory of ordinals in the von Neumann way and define V̿ to be the least ordinal α equivalent to V. If we have the axiom of foundation among our axioms, even if the axiom of choice is absent we can define V̿ as the set of all sets W of least rank among those equivalent with V (see [1]). In the absence of the axioms of choice and foundation the operation V̿ is undefinable (see [1]).

For more information on the definition and calculus of cardinal numbers, we refer to [59, chapter 6], [25] and [34].

8 The rest of this section depends on concepts that are defined in later chapters.

3.8.2 Ordinal numbers and Burali-Forti’s Paradox

We already introduced Cantor’s concept of cardinal number in section 3.6, and saw in the previous paragraph that it abstracts from the order and nature of the elements of a set. Cantor also defined a property of sets that only abstracts from the nature of the elements of a set, but retains the order in which they are given. Here we consider sets with a total ordering (see page 25).

In the case of ordered sets, the concept of equivalence is now replaced by the sharper concept of similarity. We consider two ordered sets V and W similar, notation V ≃ W, if there is a bijection between V and W that retains all order relations. Note that we have already seen this relation with the concept of isomorphism (‘is isomorphic to’, see page 31), and note that ≃ is an equivalence relation. Instead of saying two sets are similar, we also can say they are of the same order type.

Definition of an Order Type: An equivalence class under the ≃ (isomorphism) relation

The equivalence class to which an ordered set V belongs is called the order type of V. In 1883 Cantor defined in [13] an ordinal number as the order type of a well-ordered set. Recall that in addition for a well-ordered set, each non-empty subset also has a first member in the given ordering (see also section 3.2). All well-ordered sets that are as such similar to a given set V have a common property, a property that only abstracts from the nature of the elements of a set. Cantor identified this property with the ordinal number V̄ of a well-ordered set V.

Definition of Ordinal Number (Cantor): A well-ordered set V has ordinal number o := o is the order type of V

And just as for cardinals (see section 3.8.1) the question was posed how to define ordinal numbers as part of set theory.

If a set is finite and simply ordered, it is well-ordered and it has an ordinal number. The ordinal number of that set is the same, regardless of the order of the elements. For each finite and simply ordered set, we can therefore

identify the (finite) cardinal number with the ordinal number.

The smallest infinite ordinal number is called ω. This is the ordinal number of the sequence {0, 1, 2, 3, ...}, which can be seen as N or as the sequence of finite cardinal numbers in their ‘natural’ order. The natural numbers in reverse order are denoted ∗ω.

Example: 0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2} are ordinal numbers.

Example: If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union of all the sets 0, 1, 2, .... This is another ordinal called ω and is the first non-finite ordinal. It has a successor: ω ∪ {ω}, called ω + 1. More ordinals can be obtained by continuing this succession, and taking the union of all these ordinals yields an ordinal we call ω ∗ 2.

We introduce some other transfinite ordinals by example (from [10, page 66]).

Example:
N = {1, 2, 3, ...}, N̄ = ω
V1 = {2, 3, 4, ..., 1}, V̄1 = ω + 1
V2 = {3, 4, 5, ..., 1, 2}, V̄2 = ω + 2
V3 = {1, 3, 5, ..., 2, 4, 6, ...}, V̄3 = ω + ω = ω ∗ 2
V4 = {..., 3, 2, 1}, V̄4 = ∗ω
V5 = {1, 2, 3, ..., ..., 3, 2, 1}, V̄5 = ω + ∗ω
V6 = {1, 11, 21, ..., 2, 12, 22, ..., ...}, V̄6 = ω ∗ 10

For ordinal numbers n of N and m of M we say that n < m if the well-ordered set N is similar to a proper initial segment of M.

Unfortunately, a similar situation as for cardinal numbers was found for ordinal numbers. In 1897 it was found by the Italian assistant of the mathematician Peano, Burali-Forti, that this definition can give rise to a paradox (see [18, page 259]).

The Burali-Forti Paradox: The set of all ordinal numbers, taken in their natural order, forms a well-ordered series, and therefore also has an ordinal number Ω. But the ordinal number of any initial segment of the set of all ordinals exceeds every number of that segment, and therefore Ω exceeds any ordinal number whatsoever.

This led to new proposals for definitions of ordinal numbers. Hence we hereunder present another definition, given by John von Neumann in [61]. In 1923 he pointed out that among all well-ordered sets having a Cantorian ordinal as their order type, there is a particular one with some very special properties. Von Neumann defined this particular set as the ordinal of that order type.

Definition of ordinal number: A set α is an ordinal number :=
1) α is a well-ordered set with the binary relation ∈ as its ordering
2) (∀β :: β ∈ α → β ⊂ α)

With this definition of ordinal numbers, the Burali-Forti paradox can no longer be applied: if the ordinals formed a set, this set would be well-ordered by ∈ and 2) would also hold (a proof is given in [59, section 4.2]), so it would itself be an ordinal number and thus a member of itself, which is impossible. Hence the ordinal numbers do not form a set.

According to this definition, the empty set is an ordinal number. This ordinal number is also denoted by 0. Similarly we also denote the ordinal numbers {0} by 1, {0, 1} by 2, {0, 1, 2} by 3, etc. Otherwise said: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, .... These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers. The set ω = {0, 1, 2, ...} of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like ω, is called a transfinite ordinal number.

For every well-ordered set V, there exists exactly one ordinal number isomorphic to V.

Definition of ordinal number of a well-ordered set V: The ordinal number of a well-ordered set V := the ordinal number isomorphic to V.
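The finite von Neumann ordinals can be built directly as nested sets. The following is a small sketch using Python's `frozenset`; the helper names `successor`, `finite_ordinal` and `is_transitive` are illustrative assumptions, and only finite ordinals can be represented this way.

```python
def successor(alpha):
    """The successor ordinal: alpha together with alpha itself as an element."""
    return alpha | frozenset({alpha})

def finite_ordinal(n):
    """Build n as the set {0, 1, ..., n-1}, starting from 0 = the empty set."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

def is_transitive(alpha):
    """Condition 2) of the definition: every element is also a subset."""
    return all(beta <= alpha for beta in alpha)

two, three = finite_ordinal(2), finite_ordinal(3)
print(len(three))                        # 3 has exactly three elements
print(two in three, two < three)         # 2 is both an element and a proper subset of 3
print(is_transitive(finite_ordinal(4)))
```

For these sets membership and the usual order on the natural numbers coincide, which is the identification of finite ordinals with natural numbers described above.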

A detailed treatment of ordinal calculus that is based on this definition of ordinal numbers is outside the scope of this report. In the remainder of this section we will only define the most common concepts.

As we saw in 3.2 we also write α ∈ β (we denote ordinals by lower-case Greek letters) as α < β, which defines an ordering on the ordinal numbers. The ordering ≤, defined by α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the ordinal numbers. The least ordinal number is of course 0, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. The least transfinite ordinal is ω (see also 5.3.2).

For any ordinal number α, the set α' = {γ | γ ≤ α} (called a segment of α) also is an ordinal number, and α is the unique predecessor of α'. A transfinite ordinal without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is ω. For any set V of ordinal numbers, {γ | (∃η : η ∈ V : γ < η)} is an ordinal number, the supremum of V.

Ordinal numbers form the basis of transfinite induction which is a generalization of the principle of induction. Therefore we can apply transfinite induction (see page 37) on ordinal numbers. This means that ordinals give us a way of ‘counting’ any set, even if it is not finite.

Rigorous study has produced a complete calculus of ordinal numbers and produced significant results. A full treatment of the theory of ordinal numbers is omitted here. We only mention here the so-called well-ordering theorem, which Cantor had accepted as true (see [18, page 257]) but that was first proved rigorously by Zermelo in 1904.

Well-Ordering Theorem: Every set can be well-ordered.

The particular significance of the well-ordering theorem lies in the possibility that we can apply the principle of mathematical induction (which is well known for denumerable sets, see section 3.4.3) to any arbitrary well-ordered set.

We now have the following properties (given without proof):

• Two finite and ordered sets have the same order type if and only if they have the same cardinal number
• Cantor’s theorem: the cardinality of any set is lower than the cardinality of the set of all its subsets (i.e. there is no highest aleph)
• If two sets have the same ordinal number, they have the same cardinal number, but not necessarily vice versa

For more information and theory on cardinal numbers, ordinal calculus and set theory we refer to two classical books on set-theory: [25] and [34]. The first one gives a good introduction to set theory and presupposes little mathematical knowledge; the latter is more suitable for readers with experience in set theory.


Chapter 4 Peano and Frege

4.1 Peano’s arithmetic

Questions that pertain to the foundations of mathematics, although treated by many in recent times, still lack a satisfactory solution. The difficulty has its main source in the ambiguity of language.
- Peano in the opening of the paper ‘Arithmetices Principia, novo methodo exposita’, in which he introduces axioms for the integers

The Italian mathematician Giuseppe Peano (1858-1932) spent most of his career successively in the infinitesimal calculus, in foundations of mathematics and in linguistic studies. After his work on calculus (see Peano’s first publication [65]) and geometry (see [66] [67]), Peano gained particular interest in the field of number theory, also known as arithmetic.

What is number theory? The field of mathematics consisting of the study of the properties of the natural numbers.

Like Dedekind (see quote on page 46), Peano became aware of the lack of rigour in mathematics by his experience in teaching infinitesimal calculus. Since then, Peano strove for rigour, for an abstract mathematics. He came to the conclusion that mathematics must be constructed, independently of intuition or common sense, in a way that absolutely guarantees the validity of its theorems.

In order to satisfy this requirement he devoted himself to the transformation of mathematics into a self-contained system, based exclusively on postulated primitive notions and primitive propositions. To discard intuition, he first renounced ordinary language (because it is often insufficient and imprecise) and desired a new mathematical symbolism, consisting entirely of neutral symbols, and rewrote mathematics in symbolic form as an axiomatic system (see section 6.1). Second, he formalized the logic of the mathematical argument to replace intuitive inference by application of a limited number of stated logical rules, and thereto first developed parts of symbolic logic and first formalized propositional and predicate calculus. So Peano formalized both the language of mathematics and the logic of the mathematical argument. He introduced letters to denote propositions and propositional functions (Peano’s logic notation) and the symbol ∈ for the membership relation of a set. This development was rudimentary and would later be worked out in full detail by the mathematicians Russell and Whitehead in ‘Principia Mathematica’ (1910, see section 7.1).

The work of formalization of mathematics was published in the journal ‘Rivista di Matematica’ (this journal was previously founded by himself) and ‘Formulario Mathematico’, a series of 5 books that is also known as ‘Formulaire de Mathématique’¹. In 1899 he axiomatized the arithmetic of cardinal numbers, to be published in the third volume of ‘Formulario Mathematico’ in 1901.

Peano based the foundations of arithmetic on 5 axioms (see [31, page 227]), that are formulated with the help of three (undefined) terms, the acquaintance with the latter being assumed:
a) N (the set of natural numbers)
b) 0 (the particular natural number zero)
c) a+ (the immediate successor of the natural number a)

¹ The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when the first final version appeared in 1908, because Peano at that time consistently used Interlingua, his simplified dialect of Latin, for all his mathematical publications.

Definition of the Peano axioms for the natural numbers:
1) 0 ∈ N (zero is a natural number)
2) a ∈ N → a+ ∈ N (the immediate successor of any number is a number)
3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊂ S (if a set S contains zero and if it contains any number x it also contains the immediate successor x+ of that number, then S includes the whole of N)
4) a, b ∈ N ∧ a+ = b+ → a = b (no two different numbers have the same immediate successor)
5) a ∈ N → a+ ≠ 0 (zero is not the immediate successor of a number)

Axiom three has the function to formalize the principle known as mathematical induction. We can show that in ZF (see section 5.3) we can derive the five axioms of Peano.

After defining the natural numbers, Peano used a recursive definition to define the arithmetical sum, product and other operators, and he derived much of the elementary number theory.

Example: Peano defined the sum a + b by recursion with respect to b:
a + 0 = a,
a + (b+) = (a + b)+.
Similarly we can define the product a ∗ b:
a ∗ 0 = 0,
a ∗ (b+) = (a ∗ b) + a.

Peano then showed how rationals and reals can be formally obtained from naturals, and further considered elementary analysis and geometry.

In later years, Peano turned away from the foundations of mathematics and devoted most of his time to his new international auxiliary language Interlingua. He invented this language (see [49, page 148-150]) in an attempt to reduce the grammatical structure of languages and create a universal language. His mathematical work was to have a profound influence on mathematical thought, but his language Interlingua received little response.

For more information on the Peano axioms, I refer to [31, chapter 5], [49, page 146-147] and [64, appendix A].
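The recursive definitions of sum and product given above translate directly into code. The following is a minimal sketch, assuming a Python encoding in which a natural number is either zero or the successor of another natural number; the names Nat, ZERO, succ, add, mul, from_int and to_int are illustrative and do not come from Peano or from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A natural number: pred is None for 0, otherwise this number is pred+."""
    pred: Optional["Nat"] = None


ZERO = Nat()


def succ(a: Nat) -> Nat:
    """The immediate successor a+."""
    return Nat(a)


def add(a: Nat, b: Nat) -> Nat:
    """Sum by recursion on b: a + 0 = a, a + (b+) = (a + b)+."""
    if b.pred is None:
        return a
    return succ(add(a, b.pred))


def mul(a: Nat, b: Nat) -> Nat:
    """Product by recursion on b: a * 0 = 0, a * (b+) = (a * b) + a."""
    if b.pred is None:
        return ZERO
    return add(mul(a, b.pred), a)


def from_int(n: int) -> Nat:
    """Helper for the example only: apply succ n times to ZERO."""
    result = ZERO
    for _ in range(n):
        result = succ(result)
    return result


def to_int(a: Nat) -> int:
    """Helper for the example only: count the successor steps down to ZERO."""
    count = 0
    while a.pred is not None:
        a = a.pred
        count += 1
    return count


print(to_int(add(from_int(2), from_int(3))), to_int(mul(from_int(3), from_int(4))))
```

In this encoding axioms 4) and 5) hold by construction: two successors are equal only if their predecessors are equal, and no successor equals ZERO. The induction axiom 3) is not expressed in the code; it corresponds to the fact that every value of Nat is reached from ZERO by finitely many applications of succ.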

4.2 Frege’s work

As I think about acts of integrity and grace, I realize that there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life’s work was on the verge of completion, much of his work had been ignored to the benefit of men infinitely less capable, his second volume was about to be published, and upon finding that his fundamental assumption was in error, he responded with intellectual pleasure clearly submerging any feelings of personal disappointment. It was almost superhuman and a telling indication of that of which men are capable if their dedication is to creative work and knowledge instead of cruder efforts to dominate and be known.
- B. Russell about Frege, in [93, page 127]

The German mathematician and philosopher Gottlob Frege (1848-1925) was one of the founders of modern symbolic logic, putting forward the (logistic) view that mathematics is reducible to logic. Frege once said ‘every good mathematician is at least half a philosopher, and every good philosopher is at least half a mathematician’. He has written many important papers on philosophy, but we will not discuss his philosophical writings here. Famous is his ontological argument for the existence of god. We will mention his three most important works on the foundations of mathematics: Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.

Begriffsschrift

Just as Peano, the German mathematician Gottlob Frege invented a logical symbolism to which he gave the name ‘Begriffsschrift’ (in English known as ‘Concept script’). We will not treat the symbolism that was used in Begriffsschrift here in full detail (it can be found in [49, page 175-182] and in [31, page 177-199]), but give a few examples of his new logic and describe the rest of his work in general terms. Frege rejected the subject/predicate regimentation on which Aristotelian logic depends, and recognized (not as the first) that the patterns of Aristotle cannot always be used to evaluate inferences correctly.

Example: Certain obvious inferences, such as:

If Joe doesn’t wear a kilt, then Joe is not Scottish.
Joe doesn’t wear a kilt.
Therefore, Joe is not Scottish.

do not fall under the patterns of traditional logic (also called syllogisms). Actually this is another kind of inference that contains a conditional expression of the form:

if B then A
B
Therefore, A.

With arbitrary expressions for A and B, the rule became later known as modus ponens. Frege adopted this new rule in the system of logic of his Begriffsschrift. A logic that evaluates these sorts of expressions is called a propositional logic.

What is propositional calculus (or sentential calculus)? A symbolic system of treating compound propositions and their logical relationships. Basic propositions consist of simple, unanalyzed propositions. Compound propositions are formed via a set of derivation rules using standard symbols: ∧, ∨, →, ¬.

Frege based his propositional calculus on 6 axioms: for all x, y and z:
1 x → (y → x)
2 (x → (y → z)) → ((x → y) → (x → z))
3 (x → (y → z)) → (y → (x → z))
4 (x → y) → (¬y → ¬x)
5 ¬¬x → x
6 x → ¬¬x
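That the six axioms above are valid under the usual two-valued reading of → and ¬ can be verified by enumerating truth tables. The following is a small sketch, assuming that semantic reading; the names implies, AXIOMS, is_tautology and modus_ponens_is_sound are illustrative. It checks validity only and is not a reconstruction of Frege’s own derivations.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth-functional reading of p → q."""
    return (not p) or q


# The six axioms, numbered as in the text, as functions of x, y and z.
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula) -> bool:
    """True when the formula holds for all eight assignments to x, y, z."""
    return all(formula(*values) for values in product([False, True], repeat=3))


def modus_ponens_is_sound() -> bool:
    """Whenever 'if B then A' and B are both true, A is true as well."""
    return all(a
               for a, b in product([False, True], repeat=2)
               if implies(b, a) and b)


for number, formula in AXIOMS.items():
    print(number, is_tautology(formula))
print("modus ponens sound:", modus_ponens_is_sound())
```

Since every axiom is a tautology and modus ponens preserves truth, everything derivable from the axioms by this rule is a tautology too. The check says nothing about the converse, i.e. whether every tautology is derivable.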

Derivations in the propositional calculus were based on two procedures of substitution and the rule of modus ponens. For the full calculus of predicates, three additional axioms were needed. For all x, y and (propositional functions) F:
7 (x = y) → (F(x) → F(y))
8 x = x
9 (∀x :: F(x)) → F(y)

Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists of three parts. In the first part he provides a list of inferences from which, he believes, all truths of logic can be derived. Then Frege demonstrates in the second part the completeness of his logic (i.e. all inferences that can be shown to be valid inferences using the techniques of Aristotelian or propositional logic can also be shown to be valid using only Frege’s laws and rules of inference). The third part of Begriffsschrift shows that logic alone suffices to show the validity of certain inferences (about properties that are hereditary in so-called ‘ancestral sequences’). He also showed that mathematical induction (see section 3.4.3) can be replaced by a principle about ancestral sequences that depends only on logical laws.

Grundlagen der Arithmetik

Throughout his work Frege developed (as the first to do so) the main thesis of logicism, that mathematics is reducible to logic. But thereto, he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’. In this book, Frege treated the foundations of arithmetic, based on the concept of (cardinal) numbers, and he discussed work of others in detail (see [49, 184-185]). He put forward the logicist philosophy that arithmetic could be founded upon logic alone. In [31, page 183] we learn more about Frege’s philosophy. In the introduction of his book Frege announced his three guiding principles:
1) Always to separate sharply the psychological from the logical, the subjective from the objective
2) Never to ask for the meaning of a word in isolation, but only in the context of a proposition

3) Never to lose sight of the distinction between concept and object

In his book he presented his own theory of numbers, and wanted to show that all the truths in arithmetic are derivable from logical laws and definitions alone. He did this by sketching the proof, but not giving the official Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that he needed a new version of Begriffsschrift, to accommodate the new requirements that his formalization of the concept of numbers had, but also to fill in pieces that were simply missing.

Grundgesetze der Arithmetik

In his next three papers ‘Function and Concept’, ‘On Sense and Meaning’, and ‘On Concept and Object’ (1892), he introduced all modifications that he was to make to his language, Begriffsschrift, and his logical system. During that period he also completed his definitions of the natural numbers and some of the proofs of simple truths of arithmetic from these definitions and logical laws.

In 1893, Frege published the first volume of his ‘Grundgesetze der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of logic and began the proofs that were to make the project successful. In the second part Frege wanted to define the natural numbers and some basic laws governing them and, in the third part, he would define the real numbers and lay the foundations for expressing analysis in terms of logic.

His new logical calculus included a symbolic representation of the truth value of any given proposition, which provided a shorter notation for many Begriffsschrift propositions. The calculus also had several other new logical and arithmetical symbols, one of the most important of them being a notation for what Frege called the ‘course-of-values’ of a propositional function. The course-of-values of a propositional function ϕ, denoted by Frege as ἐϕ(ε), denoted the truth value for all possible values of the argument (here ε). We denote it as cov and define equal course-of-values by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)).

In 1902, when volume 2 was in press, he received a now famous letter from the English mathematician and logician Russell (see chapter 5), who pointed out, with great modesty, that a contradiction could be derived in Frege’s system (see section 5.1). This contradiction would later be named after Russell and become known as ‘Russell’s paradox’.

Hardly anything more unwelcome can befall a scientific writer than one of the foundations of his edifice be shaken after his work is finished. I have been placed in this position by a letter of mr. Bertrand Russell just as printing of the second volume was nearing completion . . .
- The first paragraph of the appendix from Frege’s ‘Grundgesetze der Arithmetik’

After many letters between the two (see for example [93, pages 124-128]), Frege modified one of his axioms and explained in an appendix to the book that this was done to restore the consistency of the system. However with this modified axiom, many of the theorems of volume 1 do not go through and Frege must have known this. He probably never realized that even with the modified axiom the system is inconsistent, since this was not shown until after Frege’s death in 1925, by Leśniewski (see [85]).

The scope of Frege’s Grundgesetze is similar to that of Principia Mathematica (to be discussed in section 7.1), and both aimed at a logistic basis for mathematics, but with Russell’s theory of types Principia Mathematica did not contain the paradox. Although Frege attracted only a small audience in his lifetime, he was a major influence on Peano and Russell. Frege’s contribution to the foundations of mathematics was therefore largely indirect (through Principia Mathematica, see [49, page 181]), and in the years thereafter his influence on contemporary philosophy, especially on thought about language and logic, has become ubiquitous.

In this text I have made extensive use of the excellent books [98] and [97] about Frege that contain many more references about Frege and his work, and chapter 4.5 from [31] and chapter 6, section 4 from [49].

Chapter 5 Russell

The fact that all Mathematics is Symbolic Logic is one of the greatest discoveries of our age; and when this fact has been established, the remainder of the principles of mathematics consists in the analysis of Symbolic Logic itself.
- B. Russell in Principles of Mathematics, 1903

The English logician and philosopher Bertrand Russell (1872-1970) published in his long life an incredible number of books on logic, the theory of knowledge and many other topics. He certainly was one of the most important logicians and philosophers of the 20th century. Russell’s private life, his social and political campaigns and advocacy of both pacifism and nuclear disarmament are certainly interesting, but we will not discuss these subjects here (for more information and references on Russell’s life and work see [62], [80] and [31, chapters 6, 7, 10 and 11 and several sections of chapter 8]). I quote the following assessment from [73]: “Bertrand Russell had one of the most widely varied and persistently influential intellects of the 20th century. During most of his active life, a span of three generations, Russell had at any time more than 40 books in print ranging over philosophy, mathematics, science, ethics, sociology, education, history, religion, politics and polemic. The extent of his influence resulted partly from his amazing efficiency in applying his intellect (he normally wrote at the rate of 3,000 largely unaltered words a day) and partly from the deep humanitarian feeling that was the mainspring of his actions. This feeling expressed itself consistently at the frontier of social change through what he himself would have called a liberal anarchistic, left-wing, and skeptical atheist temperament.”

Here, we will focus on Russell’s mathematical contributions to the foundations of mathematics. Along with Kurt Gödel (see chapter 8), he is usually credited with being one of the two most important logicians of the twentieth century. His contributions relating to mathematics include his discovery of Russell’s paradox, his defense of logicism (the view that mathematics is, in some significant sense, reducible to formal logic), his introduction of the theory of types, and his refining and popularizing of the first-order predicate calculus. We will look at each of these contributions in more detail.

Russell discovered the paradox which bears his name in 1901, while working on his ‘Principles of Mathematics’ (1903). The paradox and the closely related vicious circle principle are discussed in section 5.1. Russell’s own response to the paradox came with the introduction of types (see chapter 7). Using the vicious circle principle also adopted by Henri Poincaré, together with Russell’s so-called ‘no-class’ theory of classes, Russell was then able to explain why the unrestricted comprehension axiom (see section 2.1) fails: propositional functions, such as ‘x is a set’, should not be applied to themselves since self-application would involve a vicious circle. On this view, it follows that it is possible to refer to a collection of objects for which a given condition (or predicate) holds only if they are all at the same level or ‘type’.

Although first introduced by Russell in 1903 in the Principles, his theory of types finds its mature expression in his 1908 article ‘Mathematical Logic as Based on the Theory of Types’ and in the monumental work he co-authored with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913). Principia Mathematica and the theory of types will be treated in detail in chapter 7. The theory admits of two versions, the ‘simple theory’ and the ‘ramified theory’. Both versions of the theory later came under attack. For some, they were too weak since they failed to resolve all of the known paradoxes. For others, they were too strong since they disallowed many mathematical definitions which, although consistent, violated the vicious circle principle. Russell’s response to the second of these objections was to introduce, within the ramified theory, the axiom of reducibility. Although the axiom successfully lessened the vicious circle principle’s scope of application, many claimed that it was simply too ad hoc to be justified philosophically.

Of equal significance during this same period was Russell’s defense of logicism, the theory that mathematics was in some important sense reducible to logic. First defended in his Principles, and later in more detail in ‘Principia Mathematica’, Russell’s logicism consisted of two main theses. The first is that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of that of logic. The second is that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of those of logic.

Like Gottlob Frege, Russell’s basic idea for defending logicism was that numbers may be identified with sets of sets and that number-theoretic statements may be explained in terms of quantifiers and identity. It followed that number-theoretic operations could be explained in terms of set-theoretic operations such as intersection, union, and the like. In ‘Principia Mathematica’ Whitehead and Russell were able to provide detailed derivations of many major theorems in set theory, finite and transfinite arithmetic, and elementary measure theory. A fourth volume on geometry was planned but never completed.

For more information on Russell’s theory of types and about Principia Mathematica, we refer to chapter 7. In this chapter we used parts of [73] and [39].

5.1 Russell’s paradox

I hoped sooner or later to arrive at a perfect mathematics which should leave no room for doubts, and bit by bit to extend the sphere of certainty from mathematics to other sciences.
- Russell, in [78]

Before we describe the most famous paradox of Russell, we first define the notion of a paradox.

What is a paradox? A paradox is a statement which appears self-contradictory or contrary to expectations, and is also known as an antinomy. In an axiomatic system (see section 6.1) a paradox is a derivation that leads to a contradictory statement.

A paradox is properly something which is contradictory to general opinion, but is frequently used to signify something self-contradictory [...] Paralogism, by its etymology, is best fitted to signify an offence against the formal rules of inference.
- De Morgan, in [31, page 310]

Paradoxes have been known for a long time, but in particular with the introduction of more formal systems at the end of the 19th century paradoxes became more influential on the foundations of mathematics. In [86], three ‘paradox threats’ are identified: when systems are complex, formal or designed for computers, there often is not enough intuition to notice inconsistencies. With the previously described formalizations, the systems of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2), not to mention Russell himself, were at risk. And indeed, in 1902 Russell discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox turned out to be at the basis of mathematics, since it could be formulated in all the systems mentioned above. We first formulate the paradox in Cantor’s set theory:

Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R.

Russell in 1901 studied Cantor’s work [31, section 6.6.1] and after noting that some sets belonged to themselves while the rest did not do so, Russell showed that the set of all sets which do not belong to themselves belongs to itself if and only if it does not do so - and, by repetition of the argument, vice versa also. Russell also expressed this paradox in terms of predicates, and as such first presented his discovery in a letter to Frege (see [93, page 124] and see also the quote on page 78).

Since Peano’s system was based on the set theory of Cantor, also Peano’s work contained the paradox. In Frege’s work (Grundgesetze der Arithmetik) self-application was not possible, so R ∈ R was not allowed, but the paradox could still be expressed by using Frege’s notion (see page 77) of the course-of-values of a function. If we define equal course-of-values cov by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s work as follows (see also [86, page 7] for a slightly different proof): Define f(x) := (¬∀ϕ :: (cov(ϕ) = x) → ϕ(x)), and let K := cov(f).

    ¬f(K)
≡ {def. f}
    ¬(¬∀ϕ :: cov(ϕ) = K → ϕ(K))
≡ {elim. ¬¬}
    (∀ϕ :: cov(ϕ) = K → ϕ(K))
⇒ {instantiate ϕ with f}
    cov(f) = K → f(K)
≡ {def. K, elim. →}
    f(K)

So ¬f(K) implies f(K), and hence f(K) holds. Conversely, f(K) states that there is a ϕ with cov(ϕ) = K = cov(f) and ¬ϕ(K); by the definition of equal course-of-values ϕ(K) = f(K), so ¬f(K) follows as well, which is a contradiction.

The paradox had a big influence, since it could be formulated in all systems, and in classical logic every statement is entailed by a contradiction.
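The predicate form of the paradox can be illustrated with a few lines of code. The following is only an analogy, assuming that Python functions returning booleans stand in for propositional functions; the names consistent_truth_values, russell and self_application_diverges are illustrative. The first function shows that no truth value satisfies v ↔ ¬v; the second defines the predicate ‘does not apply to itself’, and applying it to itself never settles on a value.

```python
def consistent_truth_values() -> list:
    """Truth values v with v == (not v), i.e. candidate values for 'R ∈ R'."""
    return [v for v in (False, True) if v == (not v)]


def russell(predicate) -> bool:
    """Holds of a predicate exactly when that predicate does not hold of itself."""
    return not predicate(predicate)


def self_application_diverges() -> bool:
    """Try to evaluate russell(russell); report whether it fails to terminate.

    The call would have to equal its own negation, so the evaluation
    recurses until Python raises RecursionError.
    """
    try:
        russell(russell)
    except RecursionError:
        return True
    return False


print(consistent_truth_values())
print(self_application_diverges())
```

The analogy has limits: in a programming language the self-application simply fails to produce a value, whereas in Cantor’s or Frege’s system the corresponding statement is derivable together with its negation.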

In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore appeared that no proof could be trusted once it was discovered that the logic underlying all mathematics was inconsistent.

Before we consider the consequences of the discovery of the paradox, we first take a further look at the nature of the paradox, hereby following Russell’s own analysis. While writing ‘The Principles’, Russell’s attention was attracted by what is now known as Cantor’s paradox and (according to a letter he wrote to the French mathematician Jourdain) he found that there was something wrong with his earlier refutation of Cantor’s paradox (see [29, section 7]). He removed his earlier refutation from ‘The Principles’ and his revised diagnosis uncovered a true paradox. After discovering his famous paradox, he summarized this discovery and the reasoning that led thereto in a second letter to Frege.

As we have already seen, Russell’s paradox arises as a result of naive set-theory’s so-called unrestricted or naive comprehension axiom (see page 16). Cantor created this axiom with the intuition that any coherent condition may be used to determine a set. But that means that the condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole set V, i.e. it allows impredicative definitions (see below for the definition of impredicative). Most attempts at resolving Russell’s paradox have therefore concentrated on various ways of restricting or abandoning this axiom.

Russell traced the fallacy back to what he called the ‘vicious circle principle’. The ‘vicious circle’ that his principle is named after arises from the assumption that a set of objects may contain members which can only be defined by means of the set as a whole. Therefore, Russell said that statements are not legitimate and meaningless if they contain a set of objects such that it will contain members which presuppose this (total or whole) set of objects. That means a statement is only legitimate if all propositions it contains refer to already defined sets.

Definition of impredicative: A definition is impredicative if it involves a set V that has a member v ∈ V whose definition depends on V.

We might rephrase the definition as ‘whatever set contains an apparent element, that element must not be dependent on that set’.¹ In a sense those impredicative definitions are thus circular. In other words, impredicative definitions should be avoided.

Definition of Vicious Circle Principle²: Definitions, assumptions or statements involving all of a set must not be a part or an element of that set.

In terms of set theory we can formulate the principle as: No set V is allowed to contain members v definable only in terms of V, or members v involving or presupposing V.

Vicious circle fallacies are arguments that are condemned by the vicious circle principle, and were considered the cause of antinomies. Such arguments may not necessarily lead to contradictions (since fallacious arguments can lead to true conclusions). For more information about impredicativity, see [57, section 15.3].

In Principia Mathematica (see [31, section 7.2]), Russell assembles a collection of seven different paradoxes, all of which were based on the same circular type of reasoning, and then he resolved them by making their circularity explicit. We will now mention eight of the most well-known paradoxes, most of which originate from the vicious circle principle.

¹ Note that a direct implementation of this definition as a new axiom of set theory is not possible. This might be implemented by fixing ‘an apparent element’ of a set and then expressing its independency of other elements of that set. This independency means that, regardless of the nature of the elements of the set, the nature of the apparent element remains the same. The ‘nature’ of the elements can be seen as all the members of that element (or in case the element is an individual, the nature of the apparent element can be seen as that individual). This leads us to the following axiom: (∀X :: (∀x : x ∈ X : x = a → (∀x′ : x′ ∈ X ∧ x′ ≠ x : x′ = b(x′) → a ∈ X))). Clearly this does not avoid the paradox of Russell. We consider a set X := R ≡ {x | x ∉ x} and an element x ∈ R, i.e. we have x ∉ x. The axiom tells us x = a → (∀x′ : x′ ∈ R ∧ x′ ≠ x : x′ = b(x′) → a ∈ R). In other words, we can change each element in R except x and the nature of x should not depend on it. The only thing we know about x is that x ∉ x and x ∈ R. Now we can change all x′ into any value b(x′), but still we will have x ∉ x and x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R. Despite the fact that the set X is ‘too large’, the axiom does not prohibit the existence of the set X. So unfortunately this most ‘direct’ attempt to solve the paradox fails.

² Russell formulated it originally as ‘Whatever involves all of a collection must not be one of the collection’. Or, as formulated in [49, page 113]: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total’. Another formulation of [87] says ‘No entity can be defined in terms of a totality of which it is itself a possible member’.

1 Russell’s paradox (1903), which we have discussed in this section. The impredicativity is clear in the definition of the set that contains all sets that are not members of themselves. There are many popularizations of this paradox; one of them is from Russell himself (1919) and concerns the plight of the barber of a certain village who has enunciated the principle that he shaves all those persons of the village, and only those, who do not shave themselves. The paradox is then formed by the question ‘Does the barber shave himself?’.

2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2. The impredicativity comes from the ordinal number of the naturally ordered set of all ordinal numbers.

3 Cantor’s paradox, which we have discussed in section 3.8.1. The impredicativity comes from the cardinal number of the set of all sets.

4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I am lying’, his utterance is self-contradictory, and it cannot be either true or false.”. The oldest form of this particular paradox, in the words of Principia Mathematica, is that of Epimenides the Cretan, ‘who said that all Cretans were liars, and all other statements made by Cretans were certainly lies’.

5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956) published a paradox in [74] in 1905. He considered a set V of all non-terminating decimals that can be defined in a finite number of words. By arranging V as a sequence, and applying Cantor’s diagonal argument to the members of V, a different but non-terminating decimal was produced, defined in a finite number of words.

6 Paradox of definitions. Again we quote from [49]: “The possible definitions of specific ordinal numbers can be arranged in a sequence, and there are therefore at most ℵ0 of them. But the totality of ordinal numbers is not denumerable, and so there exist ordinal numbers which cannot be individually defined. Among such indefinable ordinals there is a least, and thus it appears that the description ‘the least indefinable ordinal’ yields a definition of an entity that cannot be defined.”.

7 Berry’s paradox: “The least integer not nameable in fewer than nineteen syllables” is itself a name that contains only eighteen syllables.

8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling (1886-1942) published with his friend Leonard Nelson (1882-1927) in 1908 a paradox. As described in [31, page 336]: “Some words can be predicated of themselves: in English, say, ‘word’ is a word, ‘noun’ is a noun, and so on. This property is called ‘autological’, and is obviously itself autological. Other English words are not autological, ‘German’, or ‘verb’. They are called ‘heterological’ - but this word is heterological if and only if it is not so.”.

The first three paradoxes are logical paradoxes that can be formulated within Cantor’s set theory. The remaining five are mainly paradoxes of naming; they are of a semantic kind. All these paradoxes have stimulated fundamental research, especially Russell’s paradox, which revealed the vicious circle principle and first showed the need for a theory of types or other restriction of the power of the comprehension axiom.
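The barber popularization of Russell’s paradox, listed under the first paradox, can be checked by brute force for a small village. The following is a minimal sketch, assuming villagers are numbered from 0 and a ‘shaves’ relation is a table of booleans; the names barber_rule_holds and count_consistent_villages are illustrative. It enumerates every possible shaving relation and counts those in which the barber shaves exactly the villagers who do not shave themselves.

```python
from itertools import product


def barber_rule_holds(shaves: dict, barber: int, villagers: list) -> bool:
    """The barber shaves v exactly when v does not shave himself, for every villager v.

    shaves[(a, b)] is True when person a shaves person b.
    """
    return all(shaves[(barber, v)] == (not shaves[(v, v)]) for v in villagers)


def count_consistent_villages(n: int, barber_in_village: bool = True) -> int:
    """Count the shaving relations on n villagers that satisfy the barber's rule.

    If barber_in_village is False, the barber is an extra person (index n)
    who is not himself subject to the rule.
    """
    villagers = list(range(n))
    barber = 0 if barber_in_village else n
    people = villagers if barber_in_village else villagers + [barber]
    pairs = [(a, b) for a in people for b in people]
    count = 0
    for values in product([False, True], repeat=len(pairs)):
        shaves = dict(zip(pairs, values))
        if barber_rule_holds(shaves, barber, villagers):
            count += 1
    return count


print(count_consistent_villages(2))
print(count_consistent_villages(2, barber_in_village=False))
```

With the barber inside the village no relation satisfies the rule, because the rule applied to the barber himself demands shaves[(barber, barber)] == not shaves[(barber, barber)]. With the barber outside the village the rule is satisfiable, which mirrors the idea that restricting what a defining condition may range over removes the contradiction.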

... but most other parts of set-theory are.

5.2 Consequences and philosophies

Perhaps the greatest paradox of all is that there are paradoxes in mathematics.
- E. Kasner and J. Newman quoted in [46]

The various proposals to overcome this paradox led to various theories. One proposal was to reconstruct set theory on an axiomatic basis (this axiomatic method was first suggested by Hilbert, see section 6.1) sufficiently restrictive to exclude the paradoxes. Hilbert and other formalists had the basic idea to allow the use of only well-defined and finitely constructible objects, together with rules of inference that were deemed to be absolutely certain. In 1908 the mathematician Zermelo made the first attempt to formulate proper axioms for set theory from which the paradox cannot be deduced. This attempt was successful and, after a refinement by the mathematician Fraenkel, led to the ZF axiom system (see section 5.3) which is still the most accepted basis today. Subsequent refinements to ZF have been made by Skolem, and later by the three mathematicians von Neumann, Bernays and Gödel (see section 8.5).

Russell’s own response to the paradox came with the introduction of his theory of types in his Principia Mathematica (see section 5.4). Russell already laid out a first version of his theory to eliminate the paradoxes in 1908. Since self-application (R ∈ R) caused a contradiction, he decided to suppress this. With this approach he assigned types to variables (as types he took sets) and allowed expressions such as x ∈ y only if the type of x is one less (in some order) than the type of y.

The outlawing of impredicative definitions seemed a solution to the known paradoxes in set theory. But it turned out there are essential and accepted parts of mathematics that contain impredicative definitions. This was a serious problem to Russell’s solution, despite the fact that many instances of impredicative definitions in mathematics could be circumvented. We quote from [22, page 265]: “In 1918, the German mathematician Hermann Weyl (1885-1955) tried to construct as much parts of analysis as possible from the natural number system without the use of impredicative definitions. Although he succeeded in obtaining a considerable part of analysis, he was unable to derive the important theorem that every nonempty set of real numbers having an upperbound has a least upperbound”.

Other attempts towards a solution for the paradoxes of set theory focus on the foundations of logic. Luitzen Brouwer and the intuitionists took this approach and tried to prevent the paradoxes by denying the principle of the excluded middle (which states that any mathematical statement is either true or false). The intuitionists had the basic idea that one cannot assert the existence of a mathematical object unless one can also indicate how to go about constructing it.

In the period after the discovery of the paradoxes, we distinguish three main philosophies of mathematics: logicism, intuitionism and formalism.

What is Logicism? A school of mathematical thought which holds the thesis that mathematics is a part of (or a branch of) logic. Logicists contend that all of mathematics can be deduced from pure logic, without the use of any specifically mathematical concepts, such as number or set. The first ideas date back to Leibniz (1666) and the actual reduction of mathematics to logic was started by Dedekind (1888) and Frege (1884-1903) and later by Peano, and Whitehead and Russell (in Principia Mathematica 1910-1913).

What is Intuitionism? A school of mathematical thought by the 20th century Dutch mathematician L.E.J. Brouwer (1881-1966) that contends that the primary objects of mathematical discourse are mental constructions governed by self-evident laws. Brouwer first attacked the logical foundations of mathematics in his doctoral thesis in 1907. This formed the beginning of the Intuitionist School. They proposed that a proof in mathematics should be accepted only if it constructed the mathematical entity it talked about, and not if it merely showed that the entity ‘could’ be constructed or that supposing its non-existence would result in contradiction. Intuitionists have challenged many of the oldest principles of mathematics as being non-constructive (and hence meaningless).

Brouwer had the fundamental insight that such nonconstructive arguments will be avoided if one abandons a principle of classical logic (which lies for example behind De Morgan’s laws). This is the principle of the excluded third (or excluded middle), which asserts that for every proposition ϕ, either ϕ or ¬ϕ, or equivalently that, for every ϕ, ¬¬ϕ implies ϕ. This principle is basic to classical logic and had already been enunciated by Aristotle, though with some reservations, as he pointed out that the statement “there will be a sea battle tomorrow” is neither true nor false.

Because of the weight it places on mental apprehension through construction of purported mathematical entities, intuitionism is sometimes also called constructivism. A still more severe form of constructivism which we will not further discuss is strict finitism, in which one rejects infinite sets. More information on intuitionism can be found in [60].

What is Formalism? A school of mathematical thought introduced by the 20th century mathematician David Hilbert, which holds that all mathematics can be reduced to rules for manipulating formulas without any reference to the meanings of the formulas. Formalists contend that it is the mathematical symbols themselves, and not any meaning that might be ascribed to them, that are the basic objects of mathematical thought. Hilbert’s program, called formalism, was to concentrate on the formal language of mathematics and to study its syntax. A statement should be a metatheorem, that is a theorem provable within the syntax of mathematics.

These three philosophies do not necessarily contradict each other, and all philosophies are still advocated today. Whether the logicist thesis has been established seems to be matter of opinion. Though successful, it can be questioned on the ground that the systematic development of logic presupposes mathematical ideas in its formulation. The intuitionists succeeded in rebuilding large parts of present-day mathematics, but a large part is still wanting, making intuitionist mathematics less powerful and in many respects much more complicated than classical mathematics. These are serious objections to the intuitionistic approach, but it is generally conceded that its methods do not lead to contradictions, and some hope for a new intuitionist reconstruction of mathematics carried out in a different and more successful way. Unfortunately for the formalists, a consequence of Gödel’s incompleteness theorem (see chapter 8) is that the consistency of mathematics can be proved only in a language which is stronger than the language of mathematics itself. Yet, formalism is not dead - most pure mathematicians are tacit formalists, but the naive attempt to prove the consistency of mathematics in a weaker system had to be abandoned.

From [11, item from Paul Bernays] we learn that most mathematicians of all three philosophies are also philosophical realists: “While no one, except an extremist intuitionist, will deny the importance of the language of mathematics, most mathematicians are also philosophical realists who believe that the words of this language denote entities in the real world.”. Following the Swiss mathematician Paul Bernays (1888-1977), this position is also called Platonism, since Plato believed that mathematical entities really exist. For more information about realism, see [57].
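The equivalence stated earlier in this section between the excluded middle and the rule that ¬¬ϕ implies ϕ can be spelled out in a few lines. The derivation below is a standard textbook argument added here as an illustration; it treats both principles as schemes (holding for every ϕ), and the numbering (1), (2) is only for this sketch.

```latex
\begin{align*}
\textbf{(1)}\;& \text{Assume } \varphi \lor \neg\varphi \text{ and } \neg\neg\varphi.
  \text{ In the case } \varphi \text{ we are done; the case } \neg\varphi
  \text{ contradicts } \neg\neg\varphi. \\
& \text{Hence } \neg\neg\varphi \to \varphi. \\[4pt]
\textbf{(2)}\;& \text{Assume } \neg(\varphi \lor \neg\varphi).
  \text{ If } \varphi \text{ held, then } \varphi \lor \neg\varphi
  \text{ would hold, so } \neg\varphi. \\
& \text{But } \neg\varphi \text{ gives } \varphi \lor \neg\varphi
  \text{ again, a contradiction. Hence } \neg\neg(\varphi \lor \neg\varphi). \\
& \text{Applying } \neg\neg\psi \to \psi \text{ with }
  \psi := \varphi \lor \neg\varphi \text{ yields } \varphi \lor \neg\varphi.
\end{align*}
```

Step (2) shows why the intuitionist cannot keep one principle while dropping the other: the double negation of the excluded middle is derivable without any classical principle, so accepting ¬¬ψ → ψ for all ψ restores the excluded middle.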

5.3 Zermelo Fraenkel

5.3.1 Axiomatic set theory

After the discovery of Russell’s paradox, it became clear that set theory needed a new and more rigorous basis. Hilbert’s proof theory, that will be treated in more detail in chapter 6, offered a way to put set theory on firm and hopefully consistent grounds, for example by a system of logic. The so-called ideal calculus was a first formalization of Cantor’s set theory, but it lacked the preciseness of Hilbert’s later theories and was inconsistent because it still contained in some form the (naive) comprehension principle (see page 16).

Criteria for the choice of axioms have been identified by several people (see Hilbert’s theory in section 6.1, or [49, last sentence of page 287]). The most accepted criteria (more formally defined in chapter 6) include:
1. Consistency of the system (it should be impossible to derive both a statement and its negation, in other words the paradoxes should be avoided).
2. Plausibility (the axioms should be in accord with intuitive beliefs about sets, see [60]).
3. Completeness (richness of the theory: the desirable results of Cantorian set theory ought to be derived as theorems).

The first real axiomatization of set theory was given in 1908 by the German mathematician Ernst Zermelo in [101]. The attitude adopted in his axiomatic development of set theory is that it is not necessary to know what ‘sets’ are and the ‘things’ that are its elements, nor what the ‘membership relation’ means [49, see page 288, paragraph 1]. Zermelo instead postulated a domain B of abstract objects and represented the elements or ‘things’ of this domain by the letters a, b, c, . . . . He then defined the primitive notions of equality and membership: a = b states that ‘a’ and ‘b’ designate the same ‘thing’; a ∈ b is defined on the domain B and if a ∈ b holds, we call b a set and a an element of this set. Thus some, but not necessarily all, objects of B are sets. The assumptions adopted about these notions are called the axioms of the theory. Its theorems are the axioms together with the statements that can be deduced from the axioms using the rules of inference (see also section 6).

In the next paragraph we will present the set of axioms that Zermelo has chosen and that formed the basis for all future axiomatizations of set theory (see also section 8.5).

5.3.2 Zermelo Fraenkel (ZF) Axioms

Zermelo formulated his axiomatic system in 1908, the extensions of Fraenkel are from 1922. In the same year (1922) the Norwegian mathematician Skolem (1887-1963) proposed a formal language for formulating the theory.

Zermelo noted that the sets involved in a derivation of the paradoxes are very large³ (for Cantor’s paradox it is the set of all sets (see section 3.8.1), for Russell’s paradox it is the set of all sets which are not members of themselves (see section 5.1), and for the Burali-Forti paradox (see section 3.8.2) it is the set of all well-orderings). Therefore he wanted to restrict the size of sets, and he changed the (naive) comprehension principle into his separation axiom, such that the paradox could no longer be derived:

Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))
For every set z and definite⁴ property ϕ of sets there exists a set whose elements are exactly those of z having the property ϕ.

There are also certain limitations on the property ϕ (i.e. it should be definite) that we will mention later in section 8.5.

³ The term proper class is sometimes used to refer to these ‘excessively large’ sets. A class that is not a set is called a proper class, all other sets are then referred to as improper classes. This means all sets are classes but not every class is a set.
⁴ See section 8.5 for the definition of the concept of definiteness.

We show that the standard derivation of Russell’s paradox cannot be applied when the naive comprehension axiom is replaced by the separation axiom. Let R = {x | x ∈ Z ∧ x ∉ x}

  R ∈ R ↔ R ∈ Z ∧ R ∉ R
        → R ∉ R, contradiction.

  R ∉ R ↔ R ∉ Z ∨ R ∈ R
        ← R ∉ Z

In both equations above we can only conclude that R ∈ R ↔ R ∉ R if we know that R ∈ Z. Since we cannot directly conclude (or refute) R ∈ Z, Russell’s derivation of his paradox does not apply. However, this fact alone does not guarantee that there does not exist a paradox, as claimed in some articles, but merely that the separation axiom does not permit the construction of paradoxical sets with elements defined in terms of the sets themselves. But until consistency is proved, there might be other less obvious ways to construct a paradox.

We now give all of the ZF axioms that constitute set theory. The first seven axioms are those that were originally formulated by Zermelo. Axiom 8 and 9 were later added by Fraenkel and von Neumann respectively. The axioms 1 through 8 are the original set of the Zermelo-Fraenkel axioms. The separation and substitution axioms are actually axiom schemes. In the definitions below we use several shorthand notations. For example, in axiom 6 we used ⊆ to express x ⊆ z as a shorthand for (∀y :: y ∈ x → y ∈ z), in axiom 8 we used the ∃! to denote that there is exactly one y, and in axiom 9 we used the symbols ∩ and ∅. If we wish however we can express these definitions in full detail, such that the notation of each expression does not depend on previous axioms.

The Zermelo-Fraenkel axioms:

1. Extensionality axiom (or axiom of determination): (∀x, y :: (∀z :: z ∈ x ↔ z ∈ y) → x = y)
Sets are uniquely determined by their members - or to be exact: if every element of a set x is at the same time an element of y, and conversely, then x = y.

2. Axiom of the empty set: (∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.

4. See section 8. Then5 . Sum-set axiom or Union axiom: (∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w)) For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z. y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x. 5. while W Since the property x ∈ x is definite. ZERMELO FRAENKEL 95 3.3. For every set z there exists a set y whose elements are exactly those of z having the property ϕ. 8. ϕ is definite and does not contain y. Separation axiom: (∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)). Axiom of foundation or axiom of regularity (by von Neumann): (∀a ::= ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅)) Every non-empty set is disjoint from at least one of its elements. Axiom of replacement or axiom of substitution (by Fraenkel): (∀x∃!y :: ϕ(x. Pairing axiom: (∀a. 7. Proof: Suppose V is any given set. Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z)) There exists a successor set. y)))) The image of a set under an operation ϕ (functional property) is again a set.5. 9. b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b)) Given two sets a and b there exists a set whose elements are exactly a and b. Theorem: (from [49. 6.5 for the definition of the concept / of definiteness. V has a subset W that consists of those elements of V that are not members of themselves. Power set axiom (∀z∃y∀x :: x ∈ y ↔ x ⊆ z) For every set z there is a set y that includes every subset of x. chapter 11]) The domain B itself (see page 92) is not a set. But then W is not an element of itself (because in that case we would have W ∈ W . 5 .

each set constructed The existence of the empty set in section 2. chapter 11]. RUSSELL consists of elements that are not members of themselves). But V is certainly in B. For more detailed information. The axioms are not minimal. The theory is not complete. we will give a short explanation of the nature of the other axioms.2 was actually derived from the comprehension principle but the result can similarly be obtained from the separation axiom. 6 . the following two statements have a more dubious status (and are not part of standard ZF ): 10. together with the extensionality axiom. the axiom of the empty set can be deduced from the separation axiom. 1}ℵr = ℵr+1 In 1908 Felix Haussdorf proposed this generalization of CH.96 CHAPTER 5. we refer to section 8. Independent of the previous axioms. ensure uniqueness of the pairs. With these axioms alone we can already create an infinite number of axioms.6. The pairing. [24]. [28]).5 and to the rich literature on set theory that is available (for example [17]. This means that W is not a member of V . Generalized Continuum Hypothesis(GCH): For any cardinal ℵr . [49. {0. and therefore B is not the same as V .2 how we can define basic operations with the extensionality and separation axioms. A ∧ (∀a : Definition of choice function: A function f is called choice function for the set V := Dom(f ) = V − {∅} ∧ (∀v : v ∈ Dom(f ) : f (v) ∈ V ) 11. since many statements are independent of ZF. Thus B cannot coincide with any set at all. as we have already seen in section 2. But if W would be an element of V − W . Another formulation of this axiom and more information are given in section 3. sums and powersets of sets. we would also have W ∈ W . For example. sum and powerset axioms. We have also seen in section 2.26 . Axiom of choice (AC): (∀x :: (∃f : f is a function : Dom(f ) = x − {∅} ∧ Ran(f ) ⊂ a ∈ Dom(f ) : f (a) ∈ a))) Every set x has a choice function. However. We also have empty set axiom + substitution axiom separation axiom. 
In the remainder of this section.

which. to define an ordering relation for a given set such that each nonempty subset has a least element. ZERMELO FRAENKEL 97 with axioms 1 to 6 only has a finite number of elements.9]. y). then there exists a function f with domain y such that for member a of y. section 2.4. With this axiom we can now also prove the principle of induction for ω (see section 3. An implication of this axiom is that there is no function f defined on ω such that (∀i : i ∈ ω : f (i + 1) ∈ f (i)). denoted ω. One form of this principle is expressed as the axiom of choice. the elements of which are exactly those y for which an x ∈ a exists such that ϕ(x. the axiom asserts the possibility of making a simultaneous choice of an element in every nonempty member of any set. y).. and a is a set. Intuitively. not the foundation axiom. Zermelo (1904) gave the first proof that any set can be well-ordered. this guarantee accounts for its name. The origin of the axiom of choice was Cantor’s recognition of the importance of being able to well-order arbitrary sets. These sets are not unique. The substitution axiom says that whenever ϕ is a property of sets.3).1]. The foundation axiom says that each non-empty set has epsilon-minimal elements (see below). For a motivation and analysis of the role of the foundation axiom we refer to [17.e. f (a) ∈ a. If x is a nonempty set the elements of which are nonempty sets. section 2. although . We call its elements the natural numbers. The assumption is significant only when the set has infinitely many members. then there exists a set. The virtue of a well-ordering for a set is that it offers a means of proving that a property holds for each of its elements by a process (transfinite induction) similar to mathematical induction. His proof employed a set-theoretic principle that he called the axiom of choice. Note that to prevent the paradoxes we need the separation axiom. was shown to be equivalent to the so-called well-ordering theorem. 
It is the infinity axiom that we need to create infinite sets. Definition of epsilon-minimal: An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅ Another corollary of the foundation axiom is that there is no set which has itself as its only element. Zermelo was the first to state explicitly the axiom.5. A choice function for a set A ‘chooses’ an element from each non-empty subset in A. such that to every x there is exactly one y for which ϕ(x. For a more detailed discussion of the axiom of choice we refer to [17. i. is unique. but the smallest successor set.3. shortly thereafter.

. [77] and [11].7. A discussion of the Generalized Continuum Hypothesis can be found in section 3. There are a few mathematicians who feel that the use of the axiom of choice is improper. has become an indispensable and commonplace tool. but to the vast majority it. It soon became the subject of vigorous controversy because of its unconstructive nature. or an equivalent assertion. For this discussion of the axiom of choice we have used [63].98 CHAPTER 5. RUSSELL it had been used but essentially unnoticed earlier.

Chapter 6
Hilbert

The further a mathematical theory is developed, the more harmoniously and uniformly does its construction proceed, and unsuspected relations are disclosed between hitherto separated branches of science.
- Hilbert, quoted in [76]

David Hilbert (1862-1943) was a German mathematician who reduced geometry to a series of axioms and contributed substantially to the establishment of the formalistic foundations of mathematics. His first work was on invariant theory and in 1888 he proved his famous Basis theorem (see [5]). After that he did significant work in the areas of algebraic number theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’ in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (to appear in English as ‘The foundations of Geometry’ in 1902), which contained (see [31, section 4.7.2]) what would become a widely accepted set of 21 axioms for Euclidean geometry and an analysis of their significance. This axiomatic method that Hilbert used (for geometry, but its application and concept is more general and can be used far beyond the domain of geometry, see also [57, section 14.7]) will be treated in section 6.1.

A substantial part of Hilbert’s fame rests on a list of 23 mathematical problems he outlined in 1900, and posed as a challenge for the next century. Some of these problems were related to the foundations of mathematics (see section 6.2). In 1905 Hilbert attempted to lay a firm foundation of mathematics by proving its consistency, resulting in two volumes of ‘Grundlagen der Mathematik’ that were intended to lead to a proof theory. Despite that in 1931 Kurt Gödel showed this goal to be unattainable (see chapter 8), the work Hilbert had done on the foundations of mathematics nevertheless remained influential to the development of logic. Hilbert’s work on integral equations in about 1909 (see [45]) led to research in functional analysis and established the basis for his work on infinite-dimensional space, later called Hilbert space (see [22, page 232]).

When Hilbert was made an honorary citizen of Göttingen he gave an address which ended with six famous words, showing his enthusiasm for mathematics and optimism for solving mathematical problems: “There are absolutely no unsolvable problems. Instead of the foolish ignorabimus [Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen, Wir werden wissen” [We must know, We shall know].

6.1 Hilbert’s proof theory

Hilbert formalized mathematical theories in order to turn them into well-defined objects of discussion, thus making possible the new kind of investigation to which he gave the new name meta-mathematics. Hilbert was the first who emphasized that strict formalization of a theory involves the total abstraction from the meaning, the result being called a formal system or formalism. In its structure, a formalized theory is no longer a system of meaningful propositions but one of sentences as sequence of words, which in turn are sequences of letters (a symbolic language). Hilbert’s method of making the formal system as a whole the object of mathematical study is called metamathematics or proof theory.

What is metamathematics? The study about mathematics itself (with respect to formalized mathematical systems, metamathematics thus consists of statements about the signs and formulas occurring within axiomatic systems). One of the primary goals of metamathematics is to determine the nature of mathematical reasoning.

After Hilbert presented an axiomatic development of geometry in ‘Grundlagen der Geometrie’ (1899), he devoted himself to the much greater task of applying his new metamathematic method to pure mathematics as a whole. Or, as Hilbert wrote in 1917: “Since the examination of the consistency is a task that cannot be avoided, it appears necessary to axiomatize logic itself and to prove that number theory and set theory are only parts of logic”. Hilbert took a formal(istic) approach to achieve this logistic goal (logicism is the study that uses logic as the basis of mathematics and formalists attempted to successfully axiomatize mathematics, see also the philosophies in section 5.2). Thereto Hilbert identified three properties that an axiomatic system should have: it should be decidable, complete and consistent. In order to define these notions, we first have to make precise some other concepts.

Definition of an axiom: A proposition that is regarded as true without proof

Definition of free variable: A variable that is not bound within the scope of a quantifier

An axiom that does not contain any variables is also called an axiom statement; an axiom with free variables is called an axiom scheme and each free variable is to be quantified over all well-formed formulas.

Definition of statement (or sentence): A well-formed formula with no free variables

Of the systems that Hilbert’s proof theory applies to, we here consider those susceptible to Gödel’s incompleteness theorem (that will be presented in chapter 8).

Definition of an STGA language: A language¹ L is Susceptible to Gödel’s argument (STGA) if it consists of:
1 E, a denumerable set of (well-formed) expressions (also called formulas) of L
2 S ⊆ E, sentences of L (i.e. with no free variables)
3 P ⊆ S, provable sentences of L
4 R ⊆ S, refutable sentences of L
5 H ⊆ E, predicates of L (i.e. with free variables, H ∩ S = ∅). For convenience, we here assume predicates to have exactly one variable.
6 A function ϕ : E × N → E. ϕ assigns to every E ∈ E and n ∈ N an expression E(n) such that for every H ∈ H we take for E and every n ∈ N, H(n) is a sentence (H(n) ∈ E hence, H(n) ∈ S). Informally, the sentence H(n) expresses the proposition that the number n belongs to the set named by H. We can think of such a function ϕ as a substitution function.
The following set is the only one that depends on a semantic interpretation of the expressions, and is normally determined by a model that we accept as representing the truth. The model should be distinguished from the set of derivation rules that (syntactically or mechanically) determines whether sentences are provable or refutable.
7 T ⊆ S, true sentences of L. This set can be determined by a model (see page 107)

¹ Sometimes also called system, since it not only defines a language but also includes the (dis)provability and truth of expressions.

First, we give an intuitive explanation of this definition: In most parts of mathematics, not every sequence of symbols is meaningful or useful. Therefore we only consider the so-called well-formed formulas E. Some of these formulas (also called propositions) do not contain free variables; we name them sentences (S). Some of them are provable from the axiomatic system (i.e. they can be derived from the axioms and derivation rules of the axiomatic system) and are elements of P. Others are refutable, also called disprovable (i.e. their negation can be derived from the axioms and derivation rules of the axiomatic system) and are elements of R. Other formulas have free variables, i.e. they are functions. We call them predicates (H). We also assume there exists a function ϕ that assigns to every expression H ∈ H and natural number n a sentence H(n). We call the set of true sentences T (the other sentences are false). It is important to realize that the truth of a sentence is not the same as the provability of that sentence.

What is an Axiomatic System? An axiomatic system (sometimes also called formal axiomatic system) is a logical system that gives rise to an STGA language and has an explicitly stated finite set of axioms from which provable sentences can be derived (using a finite set of derivation rules)

The set of axioms and derivation rules determines which sentences of L are provable or not. These notions only depend on whether the sentence is derivable from the axiomatic system and are independent from the truth of the sentence. The axiomatic system also contains a syntax definition that determines the well-formedness of expressions of L. Normally, the syntax definition of an axiomatic system consists of an alphabet of symbols and a set of rules. We show that this notion of an axiomatic system gives rise to a language that falls under the category of STGA languages. Such an axiomatic system A is often defined as follows:

Definition of axiomatic system: An axiomatic system A consists of:
• An alphabet Σ, consisting of a finite number of constants (with their arities) and variables.
• A recursive definition of a syntax, determining which formulas are well-formed formulas.
• An initially determined and fixed set of axioms and derivation rules (also called transformation rules or rules of inference).

The variables enable us to form predicates. The recursive definition over the given alphabet gives us the set of expressions. The set of axioms and derivation rules let us prove or refute sentences. Ideally, we want all sentences that are provable to coincide with the sentences we intuitively consider true (P = T) and the refutable sentences to coincide with those we consider false. We call a system with this property correct.

We now give an example of a definition of a simple axiomatic system.

Example: axiomatic system A1
• Σ = {∨², ∧², ¬¹, ∀², (⁰, )⁰, R0², x⁰, y⁰, true⁰, false⁰}
The numbers that are written in superscript denote the arity of the relations; a constant or variable is a 0-ary relation.
• ϕ is a well-formed formula if it
  0. is one of the constants true and false,
  1. is an atomic formula Ri(x1, . . . , xj), with Ri a relation with arity j, and x1, . . . , xj variables or constants,
  2. has the form of ϕ1 ∨ ϕ2, ϕ1 ∧ ϕ2, ¬ϕ1, ∀xi(ϕ1), (ϕ1), where ϕ1 and ϕ2 are smaller formulas and xi is some variable from Σ.
• For all variables x, variables or constants c and d and well-formed formula ϕ, the derivation rules (written premise / conclusion) are:
  ∀x(ϕ) / false
  ¬true / false
  false ∧ ϕ / false
  ϕ ∨ true / ϕ
  R0(c, d) / true
  ¬false / true
  true ∧ ϕ / ϕ
  true ∨ ϕ / true

The STGA language L that can be constructed² on the basis of A1, denoted by LA1, consists of the following parts:
1. E is the set of usual mathematical predicates formed by the symbols of the given alphabet (so E includes the binary relation R0).
2. S is the set of those expressions without free variables (i.e. propositions).
3. The provable sentences P are those that are true from the derivation rules. For example, ¬ false ∧ R0(false, true) → true ∧ R0(false, true) → true ∧ true → true.
4. The refutable sentences R are those that are false from the derivation rules. For example, ∀y (false ∨ y) ∧ true → false ∧ true → false.
5. The predicates are those expressions with one free variable.
6. For each such predicate we can replace the free variable by a formula that is represented³ by a natural number, and obtain a proposition.
7. The definition of an axiomatic system does not include a model. If we think of the standard logic that is used in practice, we can see that for all formulas except those with an ∀-symbol, the formulas are derivable if and only if they are true.

² Sometimes it is also said that an axiomatic system A1 gives rise to a language LA1.
³ An example of such a bijective function between a predicate and a set of natural numbers will be given in section 8.2.

We now introduce some concepts related to STGA languages and axiomatic systems. We assume that A is an axiomatic system that gives rise to an STGA language L.

Definition of derivable: A formula ϕ is derivable in L := ϕ ∈ P. A formula ϕ is derivable from an axiomatic system A, notation A ⊢ ϕ := there is an axiom ai of A and a sequence of formulas ϕ1, . . . , ϕl such that ϕ1 = ai and ϕl = ϕ and each ϕi follows from the preceding formulas and the axioms of A by the derivation rules of A.
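The two worked reductions for A1 given above (one ending in true, one ending in false) can be replayed mechanically. The sketch below is an illustration only: formulas are encoded as nested tuples, the operator tags `"not"`, `"and"`, `"or"`, `"forall"` and `"R0"` and the function name `reduce_formula` are assumptions of this example, and only the rules exercised by those two reductions, together with the rules for ¬true and true ∨ ϕ, are encoded.

```python
def reduce_formula(formula):
    """Rewrite a formula bottom-up with a subset of the A1 derivation rules."""
    if isinstance(formula, str):        # constant or variable: nothing to rewrite
        return formula
    op, *args = formula
    if op == "forall":                  # rule: ∀x(ϕ) / false
        return "false"
    if op == "R0":                      # rule: R0(c, d) / true
        return "true"
    args = [reduce_formula(a) for a in args]
    if op == "not" and args[0] in ("true", "false"):
        return "true" if args[0] == "false" else "false"  # ¬false / true, ¬true / false
    if op == "and" and args[0] == "true":
        return args[1]                  # rule: true ∧ ϕ / ϕ
    if op == "and" and args[0] == "false":
        return "false"                  # rule: false ∧ ϕ / false
    if op == "or" and args[0] == "true":
        return "true"                   # rule: true ∨ ϕ / true
    return (op, *args)                  # no encoded rule applies


# ¬false ∧ R0(false, true)
PROVABLE_EXAMPLE = ("and", ("not", "false"), ("R0", "false", "true"))
# ∀y(false ∨ y) ∧ true
REFUTABLE_EXAMPLE = ("and", ("forall", "y", ("or", "false", "y")), "true")
```

Reducing `PROVABLE_EXAMPLE` yields `"true"` and reducing `REFUTABLE_EXAMPLE` yields `"false"`, matching the membership of the two sentences in P and R respectively. The intermediate steps may be taken in a different order than in the text, since the function always rewrites subformulas first.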

We call the sequence of formulas ϕ1, ..., ϕl in a derivation of the statement ϕ a formal proof π of the statement ϕ. When A ⊢ ϕ, we also write ϕ ∈ A.

Example:
A1 ⊢ ¬false ∧ R0(false, true)
A1 ⊬ ∀x)x¬ (since the formula is not well-formed, i.e. does not follow to be true from the syntax definition)
A1 ⊬ ∀y (false ∨ y) ∧ true (since it does not follow from the derivation rules, i.e. is a refutable sentence)

Hilbert proposed a program to reformulate all mathematics as a formal axiomatic theory, and this theory has to be proved to be consistent, i.e. free from contradiction. The standard method that was used to prove the consistency of axiomatic systems was to give a 'model'. A model for an axiomatic theory is simply a system of objects, chosen from some other theory and satisfying the axioms. A model of a formal axiomatic theory is a well-defined mathematical system with the particular structure that is characterized by the theory. This means we can relate axiomatic systems to existing systems by means of a model.

Definition of universe: Set of values that variables of an axiomatic system may take.

Definition of a model: A universe together with an assignment of n-ary relations to n-ary constants, also called interpretation or structure. We define a model M for an axiomatic system A by: M = (U, P1, ..., Pk) with U a universe for A and P1, ..., Pk the relations corresponding to symbols R1, ..., Rk of A. If a formula ϕ is true in the model M (i.e. by interpretation of the relation symbols by the corresponding relations, and a corresponding assignment of the variables), we say that M is a model of ϕ, notation M |= ϕ.

Example: Let M1 = (N, ≤) be a model for axiomatic system A1.
M1 |= ∀x∀y(x ≤ y ∨ y ≤ x)
M1 ⊭ ∀x∀y(x ≤ y ∧ y ≤ x)
Note that instead of using R1 for the relation symbol, we immediately took the interpretation ≤.

Definition of a theory: A theory Th of a model M, notation Th(M), is the set of true statements in the language of that model. Th(M) := {ϕ | ϕ is a statement and M |= ϕ}

So now we can say that Hilbert was looking for an axiomatic system for which logic can be a model. Hilbert proposed such an axiomatic system to have the properties of consistency, completeness and decidability. We will now introduce these concepts, along with some other properties of axiomatic systems. Since the properties of an axiomatic system A give rise to corresponding properties in the language LA, we here distinguish in each definition between the property of a language and of an axiomatic system.

Definition of consistency: A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅ or no sentence is both provable and refutable in L. An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ) (i.e. it is not possible, for any formula ϕ, to derive both ϕ and ¬ϕ) (see also [49, page 270]). A language L is inconsistent if it is not consistent. Clearly, L is inconsistent if P and R are not disjoint.

Definition of decidability: A language L is decidable := (∀ϕ :: (ϕ ∈ P ∨ ϕ ∈ R)). An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that decides in a finite number of steps whether (or not) A ⊢ ϕ) (see also [49, page 240]).

Note that consistency and decidability do not refer to T, but only concern P and R. The following definitions of completeness, soundness and correctness also depend on the truth set T (and therefore on the model that determines that truth set).

Definition of completeness: A language L is complete for a model M := (∀ϕ :: M |= ϕ → ϕ ∈ P). An axiomatic system A is complete for model M := (∀ϕ :: M |= ϕ → A ⊢ ϕ) (i.e. all true statements in the model are derivable/provable). Note that the statement (∀ϕ :: M |= ϕ → A ⊢ ϕ) is equivalent with (∀ϕ :: A ⊬ ϕ → M ⊭ ϕ), i.e. all statements ϕ that are not derivable/provable, are also not true in the model. A language L is incomplete if it is not complete.

Definition of soundness: A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M |= ϕ). An axiomatic system A is a sound axiomatization for a model M := (∀ϕ :: A ⊢ ϕ → M |= ϕ) (i.e. if a statement ϕ is derivable/provable, it is true in the model).

Definition of correctness: A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every provable sentence is true and every refutable sentence is false (not true)). An axiomatic system A is correct for a model M := A is sound for M and A is complete for M.

Theorem: If L is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and consistency because if P is a subset of T and T is disjoint from R, then P must be disjoint from R.
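The definitions above can be tried out on a toy language. The following is a minimal sketch, assuming that the sets S, P, R and T are small finite sets of sentence names (the names s1, s2, s3 and all function names are hypothetical and chosen only for illustration). It evaluates the set-based definitions and checks the theorem "correct implies consistent" by brute force over all choices of P, R and T inside a three-element S.

```python
from itertools import chain, combinations

def is_consistent(provable, refutable):
    # Consistency: no sentence is both provable and refutable (P and R disjoint).
    return provable.isdisjoint(refutable)

def is_complete(provable, true_sentences):
    # Completeness for a model: every true sentence is provable (T subset of P).
    return true_sentences <= provable

def is_sound(provable, true_sentences):
    # Soundness for a model: every provable sentence is true (P subset of T).
    return provable <= true_sentences

def is_correct(provable, refutable, true_sentences):
    # Correctness for a model: P subset of T, and R disjoint from T.
    return provable <= true_sentences and refutable.isdisjoint(true_sentences)

def subsets(universe):
    # All subsets of a finite set, as a list of sets.
    items = list(universe)
    picks = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [set(pick) for pick in picks]

# Hypothetical toy language: three sentences, two of them true.
S = {"s1", "s2", "s3"}
T = {"s1", "s2"}
P = {"s1"}
R = {"s3"}

print(is_correct(P, R, T))   # True
print(is_consistent(P, R))   # True
print(is_complete(P, T))     # False: s2 is true but not provable

# Brute-force check of the theorem over every P, R, T contained in S.
theorem_holds = all(
    is_consistent(p_set, r_set)
    for p_set in subsets(S)
    for r_set in subsets(S)
    for t_set in subsets(S)
    if is_correct(p_set, r_set, t_set)
)
print(theorem_holds)         # True
```

The toy language is correct and consistent but incomplete, which shows that these properties are independent of each other in the way the definitions suggest.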

6.2 Hilbert's 23 problems

Who of us would not be glad to lift the veil behind which the future lies hidden: to cast a glance at the next level of our science and at the secrets of its development during future centuries? What particular goals will there be toward which the leading mathematical spirits of coming generations will strive? What new methods and new facts in the wide and rich field of mathematical thought will the next centuries disclose?
- D. Hilbert, in the opening of his speech to the 1900 Congress in Paris

In 1900 Hilbert outlined his list of 23 mathematical problems to the International Congress of Mathematicians in Paris, which he urged upon the attention of his contemporaries. His famous address was important and still today influences and stimulates mathematical research all over the world. It was not only a collection of problems, but it was also his philosophy of mathematics (see also the formalist viewpoint in section 5.2) and a collection of problems important to that philosophy. Many of the problems have since been solved, and each solution was a noted event (or even a mathematical breakthrough). Some of these problems however remain unsolved till this day.

In 2000, in the footsteps of Hilbert, the Clay Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm) has made a new list of 7 (for a large part mathematical) problems to be solved in this century. Among those problems is one of the original problems (number 8) of Hilbert, which is usually considered to be the most important unsolved problem in mathematics. It requires a solution to the Riemann hypothesis.

For a complete source of information on the 23 (or 25?, see [32]) original publications of Hilbert, see the articles [41] and [40], also available online [42]. We mention some of the original problems that are related to the foundations of mathematics.

• Problem 1: Cantor's problem of the cardinal number of the continuum. This problem is also known as the Continuum Hypothesis and extensively covered in section 3.7.

• Problem 2: The consistency of the axioms of arithmetic. The question is whether it can be shown that the axioms on which arithmetic is based are consistent. Gödel later showed that any formal system that contains arithmetic (see chapter 8) can never prove its own consistency. Another metamathematical argument might exist, that cannot be expressed in the system, but can prove its consistency.

• Problem 6: Mathematical treatment of the axioms of physics. This question asks to treat in the same manner, by means of axioms, those physical sciences in which mathematics plays an important part; in the first rank are the theory of probabilities and mechanics. So far no complete axiomatization of physics has been found.

• Problem 9: Proof of the most general law of reciprocity in algebraic number theory. For any field of numbers, the law of reciprocity (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html) is to be proved for the residues of the lth power, when l denotes a prime, and further when l is a power of 2 or a power of an odd prime. This problem is still unsolved.

• Problem 10: Decidability of solvability of diophantine equations. 'Given a diophantine equation with any number of unknown quantities and with rational integral numerical coefficients, to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers'. In modern terminology the problem asks to devise an algorithm that tests whether a polynomial has an integral root. A root of a polynomial is an assignment of values to its variables so that the value of the polynomial is 0. A root is an integral root if all variables are assigned integer values. Some polynomials have an integral root (for example 6x³yz² + 3xy² − x³ − 10 has an integral root at x = 5, y = 3 and z = 0) and some do not. Hilbert did not use the term algorithm but rather 'a process according to which it can be determined by a finite number of operations'. In order to solve this problem this notion had to be made more precise (this was done by Turing, see section 9.1). Hilbert asked that an algorithm be devised. Thus he apparently assumed such an algorithm exists, but now we know that this problem is algorithmically unsolvable. In 1970, the young Russian Yuri Matijasevic, building on the work of Martin Davis, Hilary Putnam and Julia Robinson, showed that no algorithm exists for testing whether a polynomial has integral roots.

• Problem 23: Further development of the methods of the calculus of variations. Of the 23 problems Hilbert posed, this one is the least definite, since it involves the general question of extending the calculus of variations, which basically is the theory of the variation of functions. With some examples that we will not treat here, Hilbert gave a justification of the necessity for an extension of the differential and integral calculus (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).

At the end of his article, Hilbert says that he does not believe mathematics will, like other sciences, split into separate branches whose connection becomes ever more loose, but that the organic unity of mathematics is inherent in the nature of this science, for mathematics is the foundation of all exact knowledge of natural phenomena. For a more detailed assessment of Hilbert's view, see [49] and [31].
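The notion of an integral root from Problem 10 can be illustrated with a few lines of code. The sketch below checks the example polynomial 6x³yz² + 3xy² − x³ − 10 at x = 5, y = 3, z = 0 and then performs a bounded search for integral roots. The function names and the search bound are assumptions made for this illustration. A bounded search can only confirm that a root exists; by Matijasevic's result no general procedure can decide that a polynomial has no integral root at all.

```python
from itertools import product

def example_polynomial(x, y, z):
    # The polynomial 6x^3 y z^2 + 3xy^2 - x^3 - 10 from the text.
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10

def find_integral_root(polynomial, arity, bound):
    # Try every integer assignment with all values in [-bound, bound].
    # Returns a root as a tuple, or None if none lies inside the box.
    values = range(-bound, bound + 1)
    for candidate in product(values, repeat=arity):
        if polynomial(*candidate) == 0:
            return candidate
    return None

print(example_polynomial(5, 3, 0))  # 0, so (5, 3, 0) is an integral root

root = find_integral_root(example_polynomial, arity=3, bound=5)
print(root is not None)             # True: at least one root lies in the box

# x^2 - 2 has no integral root; the bounded search finds none.
print(find_integral_root(lambda x: x * x - 2, arity=1, bound=100))  # None
```

A result of None from the bounded search says nothing about assignments outside the searched box, which is exactly why the search does not solve Hilbert's problem.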

Chapter 7 Types

7.1 Russell and Whitehead's Principia Mathematica

Logic has become more mathematical and mathematics has become more logical. The consequence is that it has now become wholly impossible to draw a line between the two; in fact, the two are one. They differ as boy and man: logic is the youth of mathematics and mathematics is the manhood of logic.
- B. Russell in [79, page 194]

In section 4.1 we saw that with the postulates he presented, Peano stated and organized the fundamental laws of number theory, the core of mathematics. If statements satisfying these conditions could be derived in this logic, it would show that (at least part of) mathematics was founded in pure logic. As we have seen in section 4.2, Frege was adherent to the goal of logicism that all of mathematics could be derived from logic alone. But unfortunately the language that he created was inconsistent, as we have learned from Russell's paradox in section 5.1.

In his 1908 paper, 'Mathematical Logic as Based on the Theory of Types', Russell laid out a theory to eliminate the paradoxes. With Principia Mathematica, Bertrand Russell and his teacher, the mathematician Alfred Whitehead, presented this theory to prevent the paradoxes while at the same time allowing many of the operations Frege considered desirable. The theory of types basically says that all sets and other entities have

a logical 'type'; these types can be ordered and sets are always constructed from specified members with lower types. We will look at the theory of types in more detail in section 7.2.

Principia Mathematica consisted of three volumes (sometimes also called 'the Principia') and was named after the 'Philosophiae naturalis principia mathematica' of the English physicist Isaac Newton. But unlike Newton's book it dealt not with the application of mathematical techniques to physics, but to logic and mathematics itself. Their collaboration began in 1903 when Whitehead and Russell were both in the initial stages of preparing second volumes to earlier books on related topics: Whitehead's 1898 'A Treatise on Universal Algebra' and Russell's 1903 'The Principles of Mathematics'. Their work overlapped considerably and they began collaborating on what would become 'Principia Mathematica'.

With their mathematical treatment of the principles of the mathematicians, Russell and Whitehead intended to summarize the recent work in logic as well as to give a revolutionary and systematical development of mathematical logic and derive basic mathematical principles from the principles of logic alone. The approach of Russell and Whitehead was essentially that of Frege, to define mathematical entities (like numbers) in pure logic and then derive their fundamental properties. Indeed, their definition of natural numbers was basically the same as the one of Frege, but unlike him, they opted to avoid the philosophical aspects and justifications. In 1919 Russell published about the philosophy behind his work in an 'Introduction to Mathematical Philosophy' which was accessible to a broad audience and therefore has been the main source through which Russell's logicist view of mathematics has become known.

Although 'Principia' was largely successful there still was critique on the axiom of infinity and the axiom of reducibility: they were considered to be too ad hoc solutions to be justified philosophically. I quote the following assessment about Principia Mathematica from [91]:

"In addition to its notation (much of it borrowed from Peano), its masterful development of logical systems for propositional and predicate logic, and its overcoming of difficulties that had beset earlier logical theories and logistic conceptions, the Principia offered discussions of functions, definite descriptions, truth, and logical laws that had a deep influence on discussions in analytical philosophy and logic throughout the 20th century. What is perhaps missing is any hesitation or perplexity about the limits of logic: whether this logic is, for example, provably consistent, complete, or decidable, or whether there are concepts expressible in natural languages but not in this logical notation. This is somewhat odd, given the well-known list of problems posed by Hilbert in 1900 that came to animate 20th-century logic, especially German logic. The Principia is a work of confidence and mastery and not of open problems and possible difficulties and shortcomings; it is a work closer to the naive progressive elements of the Jahrhundertwende than to the agonizing fin de siecle.".

We would like to add that with the very formal and accurate build-up of mathematics, Russell and Whitehead not only managed to avoid the paradoxes but also created one of the most impressive and complicated works of all times and that is, next to Aristotle's Organon, considered to be the most influential book on logic that was ever written. In the next section we will further investigate Russell's theory of types.

The English mathematician Frank Plumpton Ramsey (1903-1930) offered criticism to the theory of types that was accommodated in later editions of Principia Mathematica, together with a later simplification to this theory by the mathematicians Hilbert and Wilhelm Ackermann (1896-1962) from Germany. The result of this is the 'deramified theory of types' that will be treated in subsequent sections.

The mathematician Alonzo Church also published articles on type systems, but did not develop his typed version of lambda calculus before the 1940's, and his typed lambda calculus thereby falls outside the scope of this article (1870-1940). We will only summarize his work in this paragraph. The type theory that emerged from Church's lambda calculus (see section 9.2) was extended with simple types in 1940 to prevent paradoxes, similar to the extension of logical set theory with simple types by Russell in 1910 to avoid the paradoxes. The main difference between the type structure of Russell and that of Church is that the former is set-based with linear ordering of types and the latter is function based with a non-linear order of types. Church also proposed another logical set theory in 1974.

[...] in the simple theory of types it is well known that the individuals may be dispensed with if classes and relations of all types are retained; or one may abandon also classes and relations of the lowest type, retaining only those of higher type. In fact any finite number of levels at the bottom of the hierarchy of types may be deleted. But this is no reduction in the variety of entities, because the truncated theory of types, by appropriate deletions of entities

in each type, can be made isomorphic to the original hierarchy and indeed the continued adequacy of the truncated hierarchy to the original purposes depends on this isomorphism.
- A. Church in 'The need for abstract entities'

Organization of Principia Mathematica

The nearly 2,000 pages Principia Mathematica starts with a short preface that explains what it wants to demonstrate, namely that pure mathematics can be based on logic alone and requires no other primitive notions. Russell classifies statements that involve logical constants only (such as the laws of reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and other mathematical assertions that also refer to non-logical contents (such as the statement that (perceptual) space is three-dimensional) as part of applied mathematics. The belief was then expressed that pure mathematics was sufficient to include all traditional mathematics.

Then, after an introduction, the first volume introduces a symbolic logic that is based on a small set of axioms, and then lays out the propositional and predicate calculi. Built upon these, Whitehead and Russell define types, sets, relations and their properties, and basic operations on sets.

The second volume continues with a purely logical theory of cardinal and ordinal arithmetic, including addition, multiplication and exponentiation of both finite cardinals and of relations. This allowed them to introduce basic arithmetic. The volume ends with a general theory of simply ordered sets (series) which is followed by a logical base of fundamental mathematical analysis, including subjects as convergent sequences, continuity, limits and derivatives.

The third volume was meant to prepare the ground for the fourth and concluding volume on geometry (which was never completed). It starts with a theory of well-ordered sets, finite, infinite and continuous series, and contained a theory of numbers that was called 'measurement', the negative integers, ratios and the real numbers, and finally vectors, coordinates and basic geometric notions such as angles.

More details about the organization of Principia Mathematica and a critical assessment of its work can be found in [31, chapter 7, and specifically section 7.8].

The symbolic logic and notation of Principia Mathematica

Russell and Whitehead opted for a more modern notation of Peano instead of Frege's Begriffsschrift. Unlike Frege, Russell and Whitehead treated functions as first-class citizens. A good introduction to the logical calculus and the specific notation that was used in Principia Mathematica can be found in [49, sections 3.2 and 3.3] and [31, sections 7.7 and 7.8].

Russell's theory of types

Russell's 1908 book included a categorization of most of the important contradictions of that time, and an analysis of their common characteristics. To prevent the paradoxes he catalogued, Russell formulated the vicious circle principle (see page 85) and implemented it using types in Principia Mathematica (see for details [31, section 7.9] and [49, section 3.3]), to make sure that 'whatever contains an apparent variable must not be a possible variable of that variable'. This can be established by making sure that 'an apparent variable' is of a different and higher type than the possible values of that type.

What is a type? A type is the range of significance of a propositional function, that is, the collection of arguments for which the said function is significant and has values. The type of a variable in a proposition is fixed by all the values the function is concerned with, i.e. by the totality over which the variable ranges. This division of objects into types (the type of an object can be seen as a property of that object) is necessary to conform to the vicious circle principle, since the variables contained in an object determine the type of that object.

Russell then defined an individual as being not a proposition but a constant, i.e. destitute of complexity. We can now categorize propositions by their types. First order propositions are elementary propositions that only contain individuals; second order propositions are propositions with first-order propositions as variables and possibly propositions of lower than first order types. This can be continued, such that the n + 1th order propositions contain propositions of order n and possibly others of order smaller than n. This linear order of types prevents vicious circles.

We now also restrict relations like ∈ so that x ∈ y is only significant when y is of a type one level higher than x, and we confine quantifiers always to a single level. As can be proved however, this way of restricting propositions prevents the paradoxes but can in some cases be needlessly restrictive.

For more information about types in Principia Mathematica, see [31, section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of Russell's Ramified Theory of Types (RTT), we refer to [86, chapter 3]. On its turn, this reference is again partly based on [52], [53], [54] and [43], all of which in a certain context discuss RTT, STT and NF and MP (see section 7.3). A detailed introduction to the (symbolic) logic and notation of Principia Mathematica, as well as a formal introduction to RTT, is to be included in a later version of this report.

7.2 Ramsey, Hilbert and Ackermann

Suppose a contradiction were to be found in the axioms of set theory. Do you seriously believe that a bridge would fall down?
- F.P. Ramsey, quoted in [58]

Ramsey published his first major work 'The Foundations of Mathematics' (see [69, page 105-142]) in 1925. In this publication he attempted to improve Principia Mathematica in two ways. First he proposed dropping the axiom of reducibility which, he writes, is "[...] certainly not self-evident and there is no reason to suppose it true; and if it were true, this would be a happy accident and not a logical necessity, for it is not a tautology.". His second simplification is to suggest simplifying Russell's theory of types by regarding certain semantic paradoxes as linguistic. He accepted Russell's solution to remove the logical paradoxes of set theory arising from, for example, 'the set of all sets which are not members of themselves'. However, the semantic paradoxes such as 'this is a lie' are, Ramsey claims, quite different and depend on the meaning of the word 'lie'. These he removed with his reinterpretation of the axiom of reducibility.

After his suggestions, Hilbert, together with Ackermann (see [2]), simplified Russell's theory of types by removing the orders into what has become known as the 'simple theory of types' (STT). Russell's theory became known as the ramified theory of types (RTT), and Ramsey's modification of the theory as the deramified theory of types. For more detailed information about the history of deramification, we refer to [86, chapter 4].

We quote from page 115 of [49]: "[In the simple theory of types, ...] every individual or individual variable is said to be of type i, and if a predicate or predicate variable ϕ(x1, ..., xn) has arguments x1, ..., xn, of types τ1, ..., τn respectively, then ϕ(x1, ..., xn) is said to be of type (τ1, ..., τn). Thus, for example, any predicate with two individual arguments is of type (i, i), while a predicate with a single argument that is itself a predicate with two individual arguments is of type ((i, i)). Having introduced the hierarchy of types in this way, we shall now require bound variables to be of some definite type. Every quantifier will then range over the totality of all entities of the same type as the bound variable. When

this is done, we have a very comprehensive logical calculus which is secure against vicious circularity".

A further discussion and formalization (in the form of Church's simply typed lambda calculus λ→c) of the simple theory of types can be found in [86].
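The type assignment of the simple theory of types can be mimicked with nested tuples. The following is a minimal sketch, assuming that the type of individuals is written as the string "i" and that a predicate type is the tuple of its argument types (this encoding and all function names are our own illustration, not notation from [49] or [86]). It shows that a predicate can only be applied to arguments of exactly the types listed in its own type, so that applying a predicate to itself is never well-typed.

```python
INDIVIDUAL = "i"

def predicate_type(*argument_types):
    # The type of a predicate is the tuple of the types of its arguments.
    return tuple(argument_types)

def is_type(candidate):
    # A type is either the individual type or a tuple of types.
    if candidate == INDIVIDUAL:
        return True
    return isinstance(candidate, tuple) and all(is_type(part) for part in candidate)

def application_is_well_typed(pred_type, argument_types):
    # A predicate of type (t1, ..., tn) accepts exactly n arguments of types t1, ..., tn.
    return isinstance(pred_type, tuple) and pred_type == tuple(argument_types)

# A predicate with two individual arguments has type (i, i).
relation = predicate_type(INDIVIDUAL, INDIVIDUAL)
# A predicate whose single argument is such a relation has type ((i, i)).
higher = predicate_type(relation)

print(relation)                                                        # ('i', 'i')
print(higher)                                                          # (('i', 'i'),)
print(application_is_well_typed(relation, [INDIVIDUAL, INDIVIDUAL]))   # True
print(application_is_well_typed(higher, [relation]))                   # True
print(application_is_well_typed(higher, [higher]))                     # False
```

The last line reflects the point of the hierarchy: a one-place predicate would need a type t equal to (t,) to accept itself as argument, which no finitely nested tuple satisfies.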

7.3 Quine

Just as the introduction of the irrational numbers ... is a convenient myth [which] simplifies the laws of arithmetic ..., so physical objects are postulated entities which round out and simplify our account of the flux of existence ... The conceptional scheme of physical objects is [likewise] a convenient myth, simpler than the literal truth and yet containing that literal truth as a scattered part.
- Quine, quoted in [50]

Willard Van Orman Quine (1908-2000) was an American mathematician who became interested in the work of Russell. An alternative to Russell's system is one that allows a single universe of all types (or all sets). In Russell's theory such an object is too big but according to others, including Quine, having a set of all sets or a type of all types is legitimate as long as we do not permit forming all subsets. If there is some restriction on which subsets can be formed, for example by requiring a stratified predicate to define the subset, then no contradiction will result. Quine proposed in [94, pages 80-101] a system called New Foundations, NF, based on this idea. To restrict the way subsets are formed, Quine further restricted the comprehension axiom to:

NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not free in ϕ(y) and ϕ(y) is stratified.

In [86, footnote 4], we find two definitions of stratification.

Definition of heterogeneous stratification: A well-formed formula ϕ is heterogeneously stratified := there is a function f from the variables and constants of ϕ to the natural numbers such that for each atomic well-formed formula F(x1, ..., xn) of ϕ, f(F) = 1 + (max : 1 ≤ i ≤ n : f(xi)).

Definition of homogeneous stratification: A well-formed formula ϕ is homogeneously stratified := ϕ is heterogeneously stratified and for the corresponding function f we also have that f(xi) = f(xj) for 1 ≤ i, j ≤ n.

With the NFC axiom the paradox is obviously prevented, since the sentence ϕ ≡ x ∉ x is not stratified.
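To see why x ∉ x is not stratified, the stratification test can be sketched in code. The sketch below uses the usual membership-based formulation: a formula whose atomic parts all have the form x ∈ y is stratified when natural numbers can be assigned to its variables such that the number of y is one more than the number of x in every atom. This formulation and the function name are assumptions made for the illustration; it is a simplification of the two definitions quoted from [86], which are stated for general atomic formulas F(x1, ..., xn).

```python
from collections import deque

def stratification(atoms):
    # atoms: iterable of pairs (x, y), each read as the atomic formula "x in y".
    # Returns a dict variable -> level with level[y] == level[x] + 1 for every
    # atom, or None when no such assignment exists.
    graph = {}
    for x, y in atoms:
        graph.setdefault(x, []).append((y, 1))    # y sits one level above x
        graph.setdefault(y, []).append((x, -1))   # x sits one level below y
    level = {}
    for start in graph:
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour, delta in graph[current]:
                expected = level[current] + delta
                if neighbour not in level:
                    level[neighbour] = expected
                    queue.append(neighbour)
                elif level[neighbour] != expected:
                    return None                   # conflicting level requirements
    lowest = min(level.values(), default=0)
    return {variable: n - lowest for variable, n in level.items()}

print(stratification([("x", "y"), ("y", "z")]))  # {'x': 0, 'y': 1, 'z': 2}
print(stratification([("x", "x")]))              # None: x in x is not stratified
print(stratification([("x", "y"), ("y", "x")]))  # None
```

Negation does not change the variables of an atom, so the result for x ∈ x carries over to x ∉ x: the Russell predicate fails the test and cannot be used in the NFC axiom.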

We quote from [86, page 3]: "NF is weak for mathematical induction and the axiom of choice is not compatible with NF. We cannot prove Peano's axiom[s] in it, unless we assume the existence of a class with m + 1 elements. Also, NF is said to lack motivation because its axiom of comprehension is justified only on technical grounds and one's mental image of set theory does not lead to such an axiom. To overcome some of the difficulties, Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see section 8.5) set theory[, first presented in [70]], and developed another non-iterative set theory called ML (Mathematical Logic). Like NBG, ML contains a bifurcation of classes into elements and non-elements. Sets can enjoy the property of being full objects whereas classes cannot. ML was obtained from NF by replacing (NFC) by two axioms, one for class existence and one for elementhood. The rule of class existence provides [...] the existence of the classes of all elements satisfying any condition ϕ, stratified or not. The rule of elementhood is such as to provide the elementhood of just those classes which exist for NF. Therefore, the two axioms of comprehension for ML [are]:

Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified with set variables only in which y does not occur free.

Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is any formula in which y does not occur free.

ML was liked both for the manipulative convenience we regain in it and the symmetrical universe it furnishes. It was however proved subject to the Burali-Forti paradox".

For more information, we refer to [70], [71], [72] and the website http://diamond.boisestate.edu/~holmes/holmes/nf.html.

Chapter 8 Gödel

The development of mathematics towards greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules. [...] It will be shown below that this is not the case, that on the contrary there are in the two systems mentioned [viz. Principia Mathematica and ZF] relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.
- K. Gödel, in the opening of the paper introducing the incompleteness theorem (1931)

8.1 Informally: Gödel's incompleteness theorems

No system of Hilbert's type in which the integers (or Peano's arithmetic, see section 4.1) can be defined can be both consistent and complete. At the time this seemed unreal, but in 1931 Kurt Gödel (born in 1906 in Brünn, Austria-Hungary, what is now Brno, Czech Republic) presented mathematicians with the astounding and melancholy conclusion that the axiomatic method has certain limitations, which rule out the possibility that even the ordinary arithmetic (as by Peano) can ever be fully axiomatized. As a corollary of this theorem, he proved that it is impossible to establish the internal logic consistency of a very large class of deductive systems. It provoked a reappraisal of philosophies of mathematics.

Gödel's famous incompleteness theorem and the corresponding corollary are also called the first and the second incompleteness theorem. Precisely, Gödel was able to show that, if an axiomatic system of formalized arithmetic is wide enough, then
1. The system is necessarily incomplete, in the sense that there exists a formula ϕ of the system such that neither ϕ nor its negation is derivable (see also section 8.2 for the definition of incompleteness), and
2. If the system is consistent, then no proof of its consistency is possible which can be formalized within it (see also section 8.2 for the definition of consistency).

We first indicate (in 8 steps, following the lines of the original proof of Gödel) the main lines of both theorems in this section, and provide a more rigorous and exact proof of the theorems in section 8.2 and further sections.

1 The (syntax of) formulas of an axiomatic system are precisely defined and built up from a finite alphabet of symbols. Proofs are nothing but a finite series of formulas and can be replaced by numbers. Gödel gave a well-ordering of all well-formed formulae of an axiomatic system S (to be precise, of ω-complete systems, see section 8.2 for more details). Gödel then showed how to represent metamathematical concepts as 'formula', 'proof-schema' and 'provable formula' by a series of natural numbers, the Gödel numbering. We define gn(ϕ) to be the Gödel number corresponding to well-formed formula ϕ of S.

2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable formula. With such a representation, we define prov(ϕ) := 'ϕ is a provable formula'. A class sign is a formula with just one free variable. We suppose that the class signs are ordered by a function R with domain N, such that R(n) is defined as the nth class sign. By [R(n), q] we denote the formula which is denoted by replacing the free variable in R(n) by q.

3 We now define a set K of natural Gödel numbers by n ∈ K ↔ ¬prov([R(n), n]). Since the symbols that are used in this formula are all definable in S, there also is a formula with one free variable (i.e. a class sign) that denotes n ∈ K, for some natural number n. We call

We By replacing in the class sign C. thus the system S is inconsistent. 7 G¨del then showed that ‘if arithmetic is consistent. i.1. and hence is undecidable: • Suppose G is provable. the system S is ‘essentially incompleteable’. and this says ¬prov(G) : G is not provable. then both G and ¬G must not be provable: G is undecidable in S. 1 . So if we assume that S is consistent. So there is a natural number q such that C = R(q). we can say that G is a property that asserts of itself that it is not provable. Since1 this formula says that q ∈ K. page 510]. Grattan-Guinness cleverly calls it in [31.e. is unprovable in S. 4 We show that G is provable ↔ ¬G is provable. the proposition depends on the set of axioms.e. since in step 3 a proposition is defined that states ‘this formula is not provable’. q]) or prov(G) : G is provable. it is incomplete’. That means. q]). that is ¬prov([R(q). q]). ¬prov([R(q). and this is equivalent with / prov([R(q). (by replacing the variable in the class sign C by q) that is q ∈ K.¨ 8. which expresses that n ∈ K for some natural number n. Because from the remark that G asserts its own unprovability. We define the condition of the statement by A: ‘arithmetic is consistent’. (by replacing the variable in the class sign by q) that is q ∈ K. this means its negation ¬[R(q). We now show that the proposition G ≡ [R(q). Therefore. it follows at once that G is true. Basically we then create another formula G . o We want to prove this conditional statement as a whole. i. the free variable by q. we can again apply the argument given in the previous five steps in the same way. INFORMALLY: GODEL’S INCOMPLETENESS THEOREMS 125 this class sign C. q] is provable. as I. • Suppose G is not provable. A proof of G leads to a proof of ¬G and vice versa. q] is provable. or in other words ‘this formula does not follow from the axioms’. q]. 5 By a metamathematical consideration we know however that G is true. this means [R(q). 
since G is unprovable (because undecidable). q]). So there is a true statement in S (namely G) that is not provable: the system S is incomplete! 6 If we add G as an axiom. (¬¬prov([R(q).

thus if S is consistent. 8 We can now formally prove A → G (see section 8. But we already saw that (unless S is inconsistent).2 for the proof). So we can express A ≡ (∃y :: (∀x :: ¬prov (x is a proof of y))). This means that if A is provable. . see [93. A system is incomplete if there is a true statement that is not provable. Thus we can represent the conclusion of the conditional statement by G.126 ¨ CHAPTER 8. G is not provable. as o expressed in [31. page 614]). we know (by modus ponens or the role of detachment) that G is provable.1 that this means that there is at least one formula ϕ of arithmetic that is not true. Or. ‘any set S of consistent formulae of P M cannot include the formula F asserting its consistency’. GODEL already have seen in section 6. A is not provable! That means if arithmetic is consistent its consistency cannot be established by metamathematical reasoning within the formalism of arithmetic (this is G¨del’s theorem 11. page 510].
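Step 1 above, replacing formulas by numbers, can be made concrete with a small sketch. The symbol table and the coding by prime powers below are assumptions chosen for illustration; they are not the numbering of Gödel's paper, and the names SYMBOLS, gn and decode are hypothetical.

```python
# Illustrative sketch only: SYMBOLS and the prime-power coding are assumptions,
# not the numbering used in Goedel's original paper.
SYMBOLS = "0s+*=()x~>A"  # hypothetical finite alphabet; symbol i gets code i + 1


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short formulas)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def gn(formula):
    """Code a string as 2^c1 * 3^c2 * 5^c3 * ..., with ci the code of symbol i."""
    number = 1
    for p, ch in zip(primes(), formula):
        number *= p ** (SYMBOLS.index(ch) + 1)
    return number


def decode(number):
    """Recover the string from its code by reading off the prime exponents."""
    chars = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if exponent == 0:
            raise ValueError("not a code produced by gn in this sketch")
        chars.append(SYMBOLS[exponent - 1])
    return "".join(chars)
```

Because prime factorizations are unique, gn is injective and decode inverts it, which is all that step 1 requires of a numbering.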

8.2 Formally: Gödel's Incompleteness Theorems

The first incompleteness theorem says that Principia Mathematica or any other system in which arithmetic can be developed, is essentially incomplete, that is in any consistent set of arithmetical axioms there are statements that are true but cannot be derived from the set. The second theorem says that it is impossible to give a metamathematical proof of the consistency of a system comprehensive enough to contain the whole of arithmetic - unless the proof itself employs rules of inference in certain essential respects different from the derivation rules identifying theorems within the systems.

In the following two paragraphs, we will first give an abstract version of Gödel's first and second incompleteness theorem, investigate the set of languages that the theorem applies to, and then in the third paragraph fill in the details by giving a specific Gödel numbering for arithmetic. Then in the next sections we will apply the theorem to the system of Peano Arithmetic and that of Principia Mathematica, and discuss the consequences of the incompleteness theorem.

8.2.1 On formally undecidable propositions

We assume there is an STGA language L and investigate the conditions for a system L for which Gödel showed that there is a true sentence that is not provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts:
A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A)
A is expressible in L if A is expressed by some predicate of L.
Note that expressibility in L is only concerned with T and not with P and R.

Theorem: Not every set of numbers is expressible.
Proof: (from [84]) Since L is built up of a finite number of symbols and derivation rules, there are only denumerably many expressions or predicates of L. But (by Cantor's theorem, see page 69) there are non-denumerably many sets of natural numbers. Therefore, not every set of numbers is expressible in L.
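The counting argument in this proof rests on Cantor's diagonal construction, which can be sketched as follows. The sketch assumes that sets of natural numbers are represented by membership predicates and that a listing A_0, A_1, ... is given as a function; the names diagonal_set and example_enumeration are hypothetical.

```python
# Minimal sketch of the diagonal construction behind Cantor's theorem.
# Assumption: a set of naturals is a predicate int -> bool, and a listing
# of sets is a function k -> predicate.

def diagonal_set(enumeration):
    """Return the set D with n in D exactly when n is not in A_n."""
    return lambda n: not enumeration(n)(n)


def example_enumeration(k):
    """Hypothetical listing: A_k is the set of multiples of k + 1."""
    return lambda n: n % (k + 1) == 0


D = diagonal_set(example_enumeration)

# D disagrees with A_k about the number k, so D is none of the listed sets.
differs_everywhere = all(D(k) != example_enumeration(k)(k) for k in range(100))
```

Whatever listing is supplied, the set D built from it differs from the k-th listed set at k. Applied to the denumerable list of sets expressed by the predicates of L, this yields a set of numbers that no predicate of L expresses.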

Let gn be a function that assigns to each expression a unique natural number (just as in step 1 in section 8.1). For any E ∈ E, we also call gn(E) the Gödel number of E. For this abstract treatment the only assumption (footnote 2) we make is that every number is the Gödel number of some expression (i.e. gn is a bijection between E and N). We will give a specific numbering in section 8.2.3. We define En to be the inverse of gn, i.e. gn(En) = n. The diagonalization of En, for En ∈ H, is defined by En(n). We define d(n) to be the Gödel number of the diagonalization of En, that is: d(n) := gn(En(n)), and call d the diagonal function of the system. For any set of natural numbers A, we define A∗ to be the set of all numbers n such that d(n) ∈ A, i.e. we have n ∈ A∗ ↔ d(n) ∈ A. For each set of natural numbers A, we define its complement Ã to be the set of all natural numbers not in A. The complement operation ∼ binds stronger than the ∗, i.e. Ã∗ is to be read as (Ã)∗.

Footnote 2: This assumption is for technical reasons that make the proof more simple. Gödel's original numbering did not have this restriction.

Abstract form of Gödel's first theorem: Let P be the set of Gödel numbers of all the provable sentences. If the set P̃∗ is expressible in L and L is correct, then there is a true sentence of L not provable in L.

Proof: (based on [84]) Suppose L is correct and P̃∗ is expressible in L by a predicate H with Gödel number h. Let G be the diagonalization of H (i.e. the sentence H(h)). We show that G is true but not provable in L. H expresses P̃∗ in L, i.e. H(n) is true ↔ n ∈ P̃∗ for all n ∈ N. In particular, H(h) is true ↔ h ∈ P̃∗. We have that h ∈ P̃∗ ↔ d(h) ∈ P̃ ↔ d(h) ∉ P. But since h is the Gödel number of H and by the definition of d, d(h) is the Gödel number of H(h) and so d(h) ∈ P ↔ H(h) is provable in L and d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h) is not provable in L. This means that H(h) is either true and not provable in L or false but provable in L. The latter alternative violates the hypothesis that L is correct. Hence it must be that H(h) is true but not provable in L.

Note that in this proof we have not defined the set T by a model but determined the truth of G by a metamathematical argument, that is nevertheless commonly accepted by all mathematicians, just as we have seen in step 5 of section 8.1. Note also that the proposition G corresponds to the proposition G of point 3 of section 8.1, since H(h) is a proposition that expresses of itself that it is not provable.

Theorem: If L is correct and if the set P̃∗ is expressible in L, then L is incomplete.
Proof: A system L that is correct and for which the set P̃∗ is expressible in L contains a sentence G that is true but not provable or refutable (by the previous theorem and the assumption of correctness). Hence G is true, but undecidable in L, and hence L is also incomplete.

By this theorem, it follows immediately that if a system is correct, such as a system containing Peano's arithmetic or the system of Principia Mathematica, and the set P̃∗ is expressible in that system (which we will later see is true for a system of basic arithmetic) then it is incomplete. That is where the name incompleteness theorem comes from. Note that this is the statement A → G of point 8 in section 8.1.

When we study a particular language L, we have to verify the assumption that P̃∗ is expressible in L. We can do this by separately verifying the following conditions.
G1: For any set A expressible in L, the set A∗ is expressible in L.
G2: For any set A expressible in L, the set Ã is expressible in L.
G3: The set P is expressible in L.

Theorem: G1 ∧ G2 ∧ G3 → P̃∗ is expressible in L.
Proof: G1 and G2 imply that for any expressible set A, Ã∗ is expressible in L. In particular we then have that if P is expressible in L (i.e. G3 holds), P̃∗ is expressible in L.

Before we prove a general form of Gödel's second incompleteness theorem, we introduce some more definitions.
A sentence En is a Gödel sentence for a set A of natural numbers if either En is true and its Gödel number lies in A, or En is false and its Gödel number lies outside A, i.e. En is a Gödel sentence for A if and only if En ∈ T ↔ n ∈ A.

Diagonal Lemma: For any set A, if A∗ is expressible in L, then there is a Gödel sentence for A.

Proof: Suppose H is a predicate that expresses A∗ in L, and let h be its Gödel number. Then d(h) is the Gödel number of H(h). For any number n, H(n) is true ↔ n ∈ A∗, therefore, H(h) is true ↔ h ∈ A∗ ↔ d(h) ∈ A, and since d(h) is the Gödel number of H(h), H(h) is a Gödel sentence for A.

Lemma: If L satisfies G1, then for any set A expressible in L, there is a Gödel sentence for A.
Proof: L satisfies G1, thus for any expressible set A, A∗ is expressible in L. Now we can apply the previous lemma to conclude that there is a Gödel sentence for A.

With the diagonal lemma we can also prove the first theorem as follows: Since P̃∗ is expressible in L, by the diagonal lemma, there is a Gödel sentence G for P̃. A Gödel sentence for P̃ is a sentence which is (by the definition of a Gödel sentence) true if and only if it is not provable in L. So for any correct system L, a Gödel sentence for P̃ is a sentence which is true but not provable in L.

8.2.2 The impossibility of an 'internal' proof of consistency

With the diagonal lemma we can also prove a general form of Gödel's second theorem, for systems for which G1 and G2 hold, that was first formulated in this form by the Polish mathematician Alfred Tarski.

A general form of Gödel's second theorem (by Tarski)
1. The set T̃∗ is not expressible in L.
2. If condition G1 holds, then T̃ is not expressible in L.
3. If conditions G1 and G2 both hold, then the set T is not expressible in L (i.e. truth within the system is not definable within the system).

Proof: To begin with, there cannot possibly be a Gödel sentence for the set T̃ because such a sentence would be true if and only if its Gödel number was not the Gödel number of a true sentence, and this is absurd.
1. If T̃∗ were expressible in L, then by the diagonal lemma, there would be a Gödel sentence for the set T̃, which we have just shown is impossible. Therefore, T̃∗ is not expressible in L.
2. Suppose condition G1 holds. Then if T̃ were expressible in L, the set T̃∗ would be expressible in L, violating (1).
3. If G2 also holds, then if T were expressible in L, then T̃ would also be expressible in L, violating (2).

Now that we have seen both theorems in a general form, we will consider particular mathematical languages, starting with first order arithmetic, which we can build on in section 8.3 to prove the incompleteness of systems based on Peano's arithmetic and other systems.

8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3

This section will be completed in a later version of this document. For the moment we refer to Gödel's original work that can be found in [93].
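The diagonal function d(n) := gn(En(n)) of section 8.2.1 can be illustrated on a toy system. Everything in the sketch below is an assumption made for illustration: the four-symbol alphabet, the reading of E(n) as the string E followed by a unary numeral in brackets, and the names gn, expr, apply_to, d and star. Bijective base-4 numeration is used so that every natural number is the Gödel number of exactly one expression, as the abstract treatment requires.

```python
# Toy illustration of the diagonal function; not a system of arithmetic.
ALPHABET = "a()1"  # hypothetical four-symbol alphabet


def gn(expression):
    """Bijective base-4 numeration: a bijection from strings to naturals."""
    number = 0
    for ch in expression:
        number = number * len(ALPHABET) + ALPHABET.index(ch) + 1
    return number


def expr(n):
    """Inverse of gn: the expression E_n with Goedel number n."""
    chars = []
    while n > 0:
        n -= 1
        chars.append(ALPHABET[n % len(ALPHABET)])
        n //= len(ALPHABET)
    return "".join(reversed(chars))


def apply_to(expression, n):
    """Assumed reading of E(n): E followed by the unary numeral of n."""
    return expression + "(" + "1" * n + ")"


def d(n):
    """Diagonal function: the Goedel number of E_n(n)."""
    return gn(apply_to(expr(n), n))


def star(A):
    """Given a set A as a predicate, return A* = {n | d(n) in A}."""
    return lambda n: A(d(n))
```

In this toy system the diagonal lemma reduces to the observation that star(A)(h) and A(d(h)) are the same truth value, where d(h) is the number of the sentence obtained by applying the expression with number h to h.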

8.3 Gödel's theorem and Peano Arithmetic

The classification of the various modes of syllogisms, when they are exact, has little importance in mathematics. In the mathematical sciences are found numerous forms of reasoning irreducible to syllogisms.
- G. Peano in [68, page 379]

There are various different incompleteness proofs of Peano Arithmetic (with and without exponentiation). We mention three of them.

The simplest uses a truth set defined by Tarski and shows that every axiomatizable subsystem of N (the complete theory of arithmetic) is incomplete. This proof of Gödel's first theorem however cannot be formalized in arithmetic (since the truth set is not expressible in arithmetic), and was based on the underlying assumption that Peano Arithmetic is correct, implying that every sentence provable in Peano Arithmetic is a true sentence. Gödel's original incompleteness proof involves the much weaker assumption of ω-consistency.

Definition of simple consistency: An axiomatic system A is simply consistent := no sentence is both provable and refutable in A
Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent := there is a predicate F(w) (in one free variable w) such that the sentence (∃w :: F(w)) is provable but all the sentences F(0), F(1), ... are refutable
Definition of ω-incomplete: An axiomatic system A is ω-incomplete := A is a simply consistent axiomatic system in which all true Σ0-sentences are provable

Gödel's original proof was based on the assumption of ω-consistency and shows that every axiomatizable ω-consistent system in which all true Σ0-sentences are provable is incomplete. This proof is of course formalizable in Peano Arithmetic (and this is necessary for Gödel's second theorem) and also shows that any axiomatic system A that is simply consistent and in which all true Σ0-sentences are provable, is ω-incomplete.

The third proof (1936) is due to Rosser and uses the even weaker assumption of simple consistency, but thereto uses a more elaborate sentence than the Gödel sentence 'G is undecidable'. It is based on an axiomatic system by the American mathematician Raphael Robinson (1911-1995), that we refer to as R. It shows that every axiomatizable simply consistent extension of R is incomplete.

We intend to include the three proofs in a later version of this document. They can be found in [84], but in a particular presentation that does not use the concept of a model for axiomatic systems, and that sometimes attaches different meanings to established definitions; nevertheless it contains in our opinion one of the best discussions of Gödel's incompleteness theorems.

In a later version of this document we will also show how, given the proof of incompleteness of Peano Arithmetic, Gödel's theorems apply to Principia Mathematica. We quote K. Gödel on the first page of [27]:

"The most comprehensive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Neumann) on the other. These two systems are so comprehensive that in them all methods of proof today used in mathematics are formalized, that is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any mathematical question that can at all be formally expressed in these systems. It will be shown that this is not the case, that on the contrary there are in the two systems mentioned relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms".

8.4 Consequences

I had a lot of conversations with him [Gödel] and a lot of disagreements. I was hard to convince about the incompleteness theorem, and he was always trying to convince me otherwise. There was at the time a tendency, which I shared, to think that it was special to a certain type of formalization of logic and that a radical reformalization might have the effect that the Gödel argument did not apply. Like most others, I persisted in that longer than I should have.
- A. Church in an interview at Princeton University (1985)

In a later version of this document we will discuss the implications of Gödel's theorem and show the reactions that followed the publication of his paper [27] in 1931.

8.5 Neumann-Bernays-Gödel axioms

There is an infinite set A that is not too big.
There's no sense in being precise when you don't even know what you're talking about.
- John von Neumann (sources unknown)

Let us recapture the situation of the axiomatic theory of sets before we introduce the Neumann-Bernays-Gödel theory. When Cantor introduced his set theory, he gave the informal definition (see page 16) of a set being 'any comprehension into a whole M of definite and separate objects m of our intuition or thought'. Cantor's definition was replaced by the principle of comprehension (see page 16), which was adopted by Frege and Russell. Based on this principle a first formal theory of sets, called 'ideal calculus', was developed (not treated in detail here, see for example [36]). The antinomies of Burali-Forti and Russell however showed that this theory was inconsistent, and one way to restore consistency was to incorporate in the system a theory of types, as was done by Russell. Others tried to overcome the inconsistencies by making Cantor's set theory more rigidly axiomatic. At the same time, intuitionists tried to do mathematics without Cantor's set theory at all.

After Hilbert proposed his proof theory, set theory was given a more rigorous basis, and axiomatic theories for Cantor's sets were proposed, and the most successful axiomatization of set theory was presented by Zermelo in 1908. The problem for him was to solve the problem of axiomatization in such a way that it excludes all contradictions but still is sufficiently wide for all that is valuable in this theory to be preserved. As we have seen in section 5.3, Zermelo postulated a domain of abstract objects (sets) and elements of this domain, defined the primitive notions of 'equality' and the 'is element of' relation, and introduced 7 axioms. The comprehension axiom was replaced by the weaker separation axiom, that only allows new sets to be created from existing sets and with definite predicates. Before we describe why the Hungarian mathematician von Neumann opposed this solution and came with his own solution to the paradoxes, we will look at this separation axiom in more detail. Zermelo defined the separation axiom as follows:

Separation axiom: (∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

The concept of definiteness in this axiom was defined by Zermelo as follows: "A question or assertion ϕ, the validity or invalidity of which is decided without arbitrariness by the basic laws of logic, is said to be 'definite'". We have already seen on page 93 that this axiom excludes the paradoxes of Russell and Burali-Forti, and as Kneebone remarks (footnote 3) in [49, page 263] also the semantic paradoxes. In [83], the Norwegian mathematician Skolem pointed out that the definition of 'definiteness' was rather vague and he made precise the formulation of 'by the basic laws of logic'. Fraenkel used Skolem's idea to formulate the separation axiom in a new way (for details, see [49, page 290, 291]). In 1922 Fraenkel proposed the introduction of another axiom that allows the existence of larger cardinal numbers than hitherto possible.

Footnote 3: We quote: "since a definite property is one that is decidable by the basic relations of the domain B [of sets, the abstract objects postulated by Zermelo], no such property as that of being definable in a finite number of words can be used in the definition of a set, and the semantic paradoxes are thus also excluded".

Von Neumann's subsequent interest in set theory led to the second major axiomatization of set theory in the 1920s. His formulation differed considerably from Zermelo and Fraenkel (see section 5.3) because the notion of function, rather than that of set, was taken as primitive. The foundation axiom of von Neumann makes occurrence of so-called extraordinary sets impossible. A set is extraordinary if there is a sequence of sets V1, V2, V3, ... such that V2 ∈ V1, V3 ∈ V2, etc. In a series of papers beginning in 1937, however, the Swiss logician Paul Bernays, a collaborator with the formalist David Hilbert, modified the von Neumann approach in a way that put it in much closer contact with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for his incompleteness proof (see chapter 8), further simplified the theory. This version is known as the Neumann-Bernays-Gödel (NBG) axioms.

Before we give the axioms, it is convenient to adopt the undefined notions of class and the membership relation (though, as is also true in Zermelo and Fraenkel, ∈ suffices). In the Neumann-Bernays-Gödel axioms, a class is defined to be a set if it is a member of some class; those classes that are not sets are called proper classes. Intuitively, sets are intended to be those classes that are adequate for mathematics, and proper classes are thought of as those collections that are 'so big' that, if they were permitted to be sets, contradictions would follow.

Theorem: With the Neumann-Bernays-Gödel axioms, the classical paradoxes are avoided.
This can be proven by showing in each case that the collection on which the paradox is based is a proper class, i.e. is not a set.

Example: the derivation of Russell's paradox does not apply.
Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not a set. We prove by contradiction that R is not a set. Suppose R is a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. But then we have (take R for y in the above statement) R ∈ R ↔ R is a set ∧ R ∉ R. Suppose R ∈ R. Then R is a set ∧ R ∉ R, and thus R ∉ R: contradiction. So we must have R ∉ R. Then by our assumption we have R is a set ∧ R ∉ R, and thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get a contradiction, our assumption that R is a set must be wrong.

In the axioms we distinguish between the use of capital Latin letters and lowercase Latin letters for the variables. The capital letters stand for variables that take classes (the totalities corresponding to certain properties) as values. The lowercase letters are used as special restricted variables for sets. Example: 'for all x, A(x)' stands for 'for all X, if X is a set, then A(X)', i.e. the condition holds for all sets.

The Neumann-Bernays-Gödel axioms (NBG):

1 Extensionality axiom (or axiom of determination): (∀X, Y :: (∀z :: z ∈ X ↔ z ∈ Y) → X = Y)
Classes are uniquely determined by their members, to be exact: if every element (that is a set) of a class X is at the same time an element of Y, and conversely, then X = Y.

2 Axiom of the empty set: (∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the 'null' or 'empty' set, which contains no elements at all.

3 Axiom for class formation: (∃Y ∀x :: x ∈ Y ↔ ϕ(x)), ϕ is a proposition in which set variables are only introduced by existential and universal quantifiers.
There exists a class Y whose elements are exactly those sets x having the property ϕ.

4 Pairing axiom: (∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a and b.

5 Sum-set axiom or Union axiom: (∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.

6 Power set axiom: (∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.

7 Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.

8 Axiom of choice: (∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.

9 Axiom of replacement or axiom of substitution (by Fraenkel): (∀x∃!y : ϕ is a class : ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation (functional property) is again a set.

10 Axiom of restriction: X ≠ ∅ → (∃y : y ∈ X ∧ X ∩ y = ∅)
Every non-empty class is disjoint from one of its elements.
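The set-building axioms 4, 5 and 6, together with Zermelo's separation axiom, can be tried out on hereditarily finite sets. The sketch below is a finite illustration only, under the assumption that sets are modelled by Python frozensets; it says nothing about proper classes or the axiom of infinity, and the function names are hypothetical.

```python
from itertools import combinations

# Finite illustration: hereditarily finite sets modelled as frozensets.
EMPTY = frozenset()  # the empty set of axiom 2


def pair(a, b):
    """Pairing: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z):
    """Sum-set: the elements of the elements of z."""
    return frozenset(x for w in z for x in w)


def power_set(z):
    """Power set: the set of all subsets of z."""
    items = list(z)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def separation(z, phi):
    """Separation: the elements of z that satisfy the predicate phi."""
    return frozenset(x for x in z if phi(x))


one = pair(EMPTY, EMPTY)  # {∅}
two = pair(EMPTY, one)    # {∅, {∅}}
singletons = separation(power_set(two), lambda x: len(x) == 1)
```

Note that separation only ever selects from a set that already exists, which is the restriction that blocks the construction of Russell's collection as a set.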

The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme) is presented in a form to facilitate a comparison with the third axiom (scheme) of ZF. In a detailed development of NBG, however, there appears, instead, a list of seven axioms (not schemes) stating that for each of certain conditions there exists a corresponding class of all those sets satisfying the condition. From this finite set of axioms, each instance of the above scheme can be obtained as a theorem. When obtained in this way, the third axiom scheme of NBG is called the class existence theorem.

In contrast to the ninth axiom scheme of ZF (see section 5.3.2), that of NBG is not an axiom scheme but an axiom. Thus, with the comments above about the third axiom in mind, it follows that NBG has only a finite number of axioms. On the other hand, since the ninth axiom or scheme of ZF provides an axiom for each formula, ZF has infinitely many axioms. The finiteness of the axioms for NBG makes the logical study of the system simpler. The relationship between the theories may be summarized by the statement that ZF is essentially the part of NBG that refers only to sets.

We give the following theorems without proof:
Theorem: Every theorem of ZF is a theorem of NBG
Theorem: Any theorem of NBG that speaks only about sets is a theorem of ZF
Theorem: ZF is consistent if and only if NBG is consistent

Note that the fact that NBG avoids the classical paradoxes and that there is no apparent way to derive any one of them in ZF does not settle the question of the consistency of either theory. All we know from the last theorem is that either both axiom systems are consistent, or both are inconsistent.


Chapter 9
Church and Turing

9.1 Turing and Turing Machine

We may hope that machines will eventually compete with men in all purely intellectual fields.
- Alan Turing in [38, page 46]

Alan Mathison Turing (1912-1954) was an English mathematician and logician who pioneered in the field of computer theory and who contributed important logical analyses of computer processes. Turing studied in Cambridge, worked there on probability theory and (independently of de Moivre) discovered the central limit theorem. In 1936 he won the Smith's Prize.

As we have seen in the previous chapters, many mathematicians had attempted to eliminate all possible error from mathematics by establishing a formal, or purely algorithmic, procedure for establishing truth (the so-called formalist program). With his incompleteness theorem (see section 8.1), Kurt Gödel threw up an obstacle to this effort, for he showed that any useful mathematical axiom system is incomplete in the sense that there must exist propositions whose truth can never be decided (within the system). Turing was motivated by Gödel's work to seek an algorithmic method of determining whether any given propositions were undecidable, with the ultimate goal of eliminating them from mathematics. Instead, he proved in his seminal paper 'On Computable Numbers, with an Application to the Entscheidungsproblem' (reprinted in [19]) that there cannot exist any such universal method of determination. We now regard this decision problem, or Entscheidungsproblem, in more detail.

Decidability was one of Hilbert's requirements for an axiomatic system (see section 6.1). The problem of decidability asks if, given a mathematical proposition, one could find an algorithm which would decide if the proposition is true or false. When given an algorithm, it is easy to see that it can solve certain propositions. But it is more difficult to prove there is no algorithm that can solve certain propositions. Thereto Turing introduced a hypothetical computing device (later called Turing machine). The Turing Machine and proof of undecidability are given later in the section.

After this important publication Turing completed his Ph.D. in 1938 on systems of logic based on ordinals, under direction of Alonzo Church (see section 9.2). During the war Turing worked on breaking German Enigma codes, and in 1948 he worked in Manchester on the construction of a new digital computer. His efforts in the construction of early computers and the development of early programming techniques were of prime importance. He described a modern computer before technology had reached the point where the construction was a realistic possibility. He also championed the theory that computers eventually could be constructed that would be capable of human thought, and he proposed a simple test, now known as the Turing test, to assess this capability. Turing's papers on the subject are widely acknowledged as the foundation of research in artificial intelligence. In 1952 Turing published the first part of his theoretical study of morphogenesis, the development of pattern and form in living organisms.

The Turing Machine

Turing introduced his hypothetical computing device in 1936. He originally conceived the machine as a mathematical tool that could infallibly recognize undecidable propositions - i.e., those mathematical statements that, within a given formal axiomatic system (that includes at least arithmetic), cannot be shown to be either true or false. Gödel had demonstrated that such propositions exist in any such system. Turing instead proved there can never exist any universal algorithmic method for determining whether a proposition is undecidable. This was left open by Gödel, since the incompleteness theorem (see section 8.1) only stated that consistency and completeness could not at the same time be attained, that means there were statements (in consistent systems) about numbers, indubitably true, which could not be proved from finitely many rules. But the decidability of mathematical statements was not settled by Gödel's theorem because it needs a formal definition of (algorithmic) method in the formulation of the problem (or a definition of the notion of algorithm in the definition of decidability in section 6.1). Thereto Turing introduced a machine that was later to be called the Turing machine, an idealized mathematical model that reduces the logical structure of any computing device to its essentials. By extrapolating the essential features of information processing, Turing was instrumental in the development of the modern digital computer. His model served as a basis for all subsequent digital computers, which share his basic scheme of an input/output device (tape and head), memory (tape) and central processing unit (head and transition function).

Nowadays there are many models of computing devices available in the theory of computation (complexity). We will not cover restricted models such as finite automata and pushdown automata (and corresponding notions such as regular languages and context-free grammars). We now directly introduce the much more powerful model of Turing that we need to investigate all mathematical problems.

The Turing Machine model uses an infinite tape as its unlimited memory, and has a tape head that can read and write symbols (of a set Γ) and move around a tape (to the L(eft) or R(ight)). We here assume the tape is right-infinite, this means the tape continues infinitely to the right side but it has a left-most position. Initially the tape contains an input string of symbols from an input alphabet Σ and is blank (i.e. filled with a special blank symbol ⊔) everywhere else. The Turing Machine is in a state q of a set of states Q, and starts in an initial state q0. It uses a transition function δ that determines how it gets from one configuration (that is the current state, the tape contents and the head location) to the next. This transition can consist of writing a new symbol of the tape alphabet Γ to the tape and moving the tape head either Left or Right, and depends on the current state and the current symbol on tape. This computation (i.e. sequence of transitions) continues until the Turing Machine enters either the (final) state qaccept or the (final) state qreject.

We can define a Turing Machine (sometimes called deterministic, since each transition is determined uniquely given the configuration) formally as a septuple:

Definition of a Turing Machine (TM): A Turing Machine (TM) := (Q, Σ, Γ, δ, q0, qaccept, qreject) with:
1 Q is a finite set of states.
2 Σ is a finite input alphabet not containing the special blank symbol ⊔.
3 Γ is a finite tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ.
4 δ is the transition function, where δ is finite and δ : Q × Γ → Q × Γ × {L, R}.
5 q0 is the start state, where q0 ∈ Q.
6 qaccept is the accept state, where qaccept ∈ Q.
7 qreject is the reject state, where qreject ∈ Q and qreject ≠ qaccept.

A start configuration C on input w is a configuration with state q0 and the head on the leftmost position of the tape, with just w on the tape. We call configurations accepting configurations if the state is qaccept, rejecting configurations if the state is qreject, and halting configurations if the state is either qaccept or qreject.

After defining the Turing Machine, Turing made his famous proposal (known as Turing's thesis, see also section 9.3) for the concept of 'computability by a Turing machine'. We quote one of Turing's formulations from [90]:

Turing's Thesis: LCM's [Logical Computing Machines, Turing's expression for Turing Machines] can do anything that could be described as "rule of thumb" or "purely mechanical".

The proposal says that whenever there is an effective method for obtaining the values of a mathematical function (i.e. it is intuitively or effectively computable), the function can be computed by a Turing Machine. The converse claim is trivial, and if the thesis is correct we can reduce problems of the (non-)existence of effective methods to problems of the (non-)existence of Turing Machines. We now introduce more of Turing's theory of Turing Machines before we give his proof of undecidability.
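The septuple above can be turned into a small simulator. The following is a minimal sketch, not part of the original report: the name run_tm, the use of "_" as a stand-in for the blank symbol, the step budget, and the convention that a missing transition leads to the reject state are all assumptions made for illustration. The demo machine (hypothetical) accepts exactly the strings that consist of '1's only.

```python
from collections import defaultdict

BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)

def run_tm(delta, start, accept, reject, tape_input, max_steps=10_000):
    """Simulate a deterministic TM on a right-infinite tape.

    delta maps (state, symbol) -> (new_state, written_symbol, 'L' or 'R').
    A missing entry is treated as a move to the reject state (assumption).
    Returns (final_state, tape_contents); raises if the step budget runs out.
    """
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, head = start, 0
    for _ in range(max_steps):
        if state in (accept, reject):
            cells = [tape[i] for i in range(max(tape, default=-1) + 1)]
            return state, "".join(cells).rstrip(BLANK)
        key = (state, tape[head])
        if key not in delta:
            state = reject
            continue
        state, tape[head], move = delta[key]
        # On the leftmost cell a move to the left leaves the head in place.
        head = head + 1 if move == "R" else max(head - 1, 0)
    raise RuntimeError("step budget exhausted; the machine may loop")

# Hypothetical demo machine: accept iff the input contains no '0'.
delta = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q_reject", "0", "R"),
    ("q0", BLANK): ("q_accept", BLANK, "R"),
}

print(run_tm(delta, "q0", "q_accept", "q_reject", "111")[0])  # q_accept
print(run_tm(delta, "q0", "q_accept", "q_reject", "101")[0])  # q_reject
```

The step budget is only a practical guard: a real Turing Machine may run forever, and the simulator cannot tell in general whether it will.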

We define a language to be a set of strings, a string being a series of alphabet symbols (i.e. w ∈ Σ∗). Let w ∈ Σ∗. We say that a TM M accepts input string w if a sequence of configurations C1, ..., Ck exists where
1 C1 is the start configuration of M on input w,
2 Each Ci yields Ci+1 via the transition function δ of M,
3 Ck is an accepting configuration.
The set of strings that M accepts is called the language of M.

Definition of the language of a TM: The language of a TM M, notation L(M) := {w | w is a string that M accepts}.

We now define a notion that covers the ability of a TM to end in the accept state when started with any string of a certain language. We distinguish between recognizing and deciding capabilities.

Definition of Turing-recognizable: A language L is recognized by a TM M := there exists a TM M such that for all strings w
1 with input w, M stops in qaccept if w ∈ L and
2 with input w, M stops in qreject or does not stop (loops) if w ∉ L.
If language L is recognized by a TM M we say that M is an acceptor for L.

Definition of Turing-decidable (or decidable): A language L is decided by a TM M := there exists a deterministic TM M such that (for all strings w):
1 with input w, M halts in qaccept if w ∈ L, and
2 with input w, M halts in qreject if w ∉ L.
If a language L is decided by a TM M we say that M is a decider for L.

There are several variants on Turing Machines such as double-sided infinite Turing Machines, multitape Turing Machines, non-deterministic Turing Machines and certain types of so-called enumerators. Most variants are equivalent in the sense that they can recognize the same set of languages

(but not necessarily equally efficiently).

Example: We now give an example of a Turing Machine solving a mathematical problem by first defining it as a language problem. The problem (idea from [56]) is to design a Turing Machine that computes the function

f(x, y) = x + y if x ≥ y
f(x, y) = 0 if x < y

For simplicity, we assume x and y to be positive integers. First we have to choose a convention for representing positive integers, and decide what the initial situation of the tape is. We choose a unary notation in which any positive integer x is represented by w(x) ∈ {1}+, such that |w(x)| = x. We assume that w(x) and w(y) are on the tape in unary notation, separated by a single '0' and with the read-write head on the left-most symbol of w(x). We first describe how the sum of x and y can be calculated, then how the comparison x ≥ y can be made and finally how to combine those two machines into a Turing Machine that computes the desired function.

Calculating the sum
To add the two numbers a and b, we only have to remove the separating '0', so addition amounts to the concatenation of two strings. The following Turing Machine, called Adder, adds a and b and is relatively simple to construct: Adder = (Q, Σ, Γ, δ, q0, qA, qR), with

Q = {q0, q1, ..., q4}
Σ = {0, 1}
Γ = {0, 1, ⊔}
q0 = {q0}
qA = {q4}
qR = {}
δ(q0, 1) = (q0, 1, R)

) = (q4 . qR ). We use this to get into the state q5 . R) is needed to remove this ‘1’ at the end of the computation. 0. In the second case. Finally. R) δ(q1 . L) δ(q3 . we again assume they are written in the notation that we used before and divided by a single ‘0’. 0) = (q1 . 1. The matching will stop when one of the two sequences of ‘1’s is completely checked off. qA . we still find a ‘1’ on the right when all ‘1’s on the left have been replaced. We will construct a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting state if a < b. Thereto we can match each ‘1’ on the left of the dividing ‘0’ with a ‘1’ on the right. Comparison To compare two numbers a and b. 1) = (q3 . which can be used as a signal to enter the accepting state. if a ≥ b. . 1. 1) = (q0 . 1) = (q1 . R) 147 Note that we remove the ‘0’ by temporarily creating an extra ‘1’. q0 . and if x ≥ y either the left sequence contains ‘1’s or neither sequence contains ‘1’s. L) δ(q2 . If x < y then the right sequences will still contain ‘1’s. L) δ(q3 . The transition δ(q2 . . If we work this out in detail. In the first case. Σ. we encounter a blank at the right of the working space. we move the read-write head back to the leftmost ‘1’. R) δ(q1 . 1. This is not strictly necessary in this example.9. but it is not harmful and normally a good habit to let any action terminate in a state from which it is easy to take further transitions. We can do this by starting at the leftmost ‘1’ (of the number a) and interchangeably check off the leftmost symbols of the numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively. TURING AND TURING MACHINE δ(q0 . because the machine is designed such that it will terminate right after any addition. with: .1. ) = (q2 . Γ. a fact that is remembered by putting the machine into state q1 . δ. we get the following Turing Machine called Comparer := (Q. 1) = (q3 . 0. when we attempt to match another ‘1’.

Q = {q0, q1, ..., q7}
Σ = {0, 1}
Γ = {0, 1, x, y, ⊔}
q0 = {q0}
qA = {q6}
qR = {q7}

The transitions of δ can be grouped in several parts.

δ(q0, 1) = (q1, x, R)
δ(q1, 1) = (q1, 1, R)
δ(q1, 0) = (q2, 0, R)
δ(q2, y) = (q2, y, R)
δ(q2, 1) = (q3, y, L)

This set replaces the leftmost '1' of a with 'x', then causes the read-write head to travel right to the first '1' of b and replace it with the symbol 'y'. When the dividing '0' is passed, the machine enters state q2, indicating that it is now dealing with the number b. When the symbol 'y' has been written, the machine enters a state q3, indicating that one '1' of b has been successfully paired with a '1' of a. The next group of transitions reverses the direction, repositions the read-write head over the leftmost '1' of a, and returns control to the initial state.

δ(q3, y) = (q3, y, L)
δ(q3, 0) = (q4, 0, L)
δ(q4, 1) = (q4, 1, L)
δ(q4, x) = (q0, x, R)
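Looking back at the Adder defined earlier in this example, its transition table can be traced mechanically. The sketch below is illustrative only: the helper names run and add and the "_" stand-in for the blank symbol are assumptions. It also assumes a tape that is blank on both sides of the input, because the Adder searches for a blank to the left of w(a) when it returns to the leftmost '1'; on a strictly right-infinite tape the input would have to be preceded by a blank cell.

```python
BLANK = "_"  # stand-in for the blank symbol (assumption of this sketch)

# Transition table of the Adder: (state, read) -> (state, write, move)
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),      # skip over w(a)
    ("q0", "0"): ("q1", "1", "R"),      # overwrite the separator with '1'
    ("q1", "1"): ("q1", "1", "R"),      # skip over w(b)
    ("q1", BLANK): ("q2", BLANK, "L"),  # right end reached, step back
    ("q2", "1"): ("q3", "0", "L"),      # cancel the extra '1'
    ("q3", "1"): ("q3", "1", "L"),      # walk back to the left end
    ("q3", BLANK): ("q4", BLANK, "R"),  # halt on the leftmost '1'
}

def run(delta, tape_input, start="q0", halting=("q4",), max_steps=10_000):
    """Run a machine on a tape that is blank outside the input.

    Returns (final_state, non-blank tape contents, head position)."""
    tape = dict(enumerate(tape_input))
    state, head = start, 0
    for _ in range(max_steps):
        if state in halting:
            cells = "".join(tape.get(i, BLANK)
                            for i in range(min(tape), max(tape) + 1))
            return state, cells.strip(BLANK), head
        state, symbol, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += 1 if move == "R" else -1
    raise RuntimeError("step budget exhausted")

def add(a, b):
    """Add two positive integers by running the Adder on w(a) 0 w(b)."""
    _, tape, _ = run(ADDER, "1" * a + "0" + "1" * b)
    return tape.count("1")

print(run(ADDER, "110111"))  # ('q4', '111110', 0)
print(add(2, 3))             # 5
```

The result is a block of a + b '1's followed by a single '0', with the head back on the leftmost '1'.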


The rewriting continues this way when the input is a string 1^a 0 1^b, stopping only when on one side no more '1's can be replaced. In that case either the left side does not contain any more '1's (a ≤ b), or the right side has run out of '1's (a > b). In case the left side does not contain any more '1's, the transition δ(q4, x) = (q0, x, R) will leave the read-write head on a '0' instead of a '1'.

δ(q0, 0) = (q5, x, L) (a ≤ b)
δ(q2, ⊔) = (q6, ⊔, L) (a > b)

In the first case we still have to check whether the right side has any '1's left, to determine whether a = b. This is done in the state q5.

δ(q5, x) = (q5, x, R)
δ(q5, 0) = (q5, 0, R)
δ(q5, y) = (q5, y, R)
δ(q5, 1) = (q7, y, R) (a < b)
δ(q5, ⊔) = (q6, ⊔, L) (a = b)

Combining Turing Machines for complicated tasks
We now have to put together the Turing Machines Adder and Comparer to obtain the desired Turing Machine that computes the given function. We can do this by starting with the input a and b in the previously described notation and starting position, and using Comparer to determine whether or not a ≥ b. We index all states of the Comparer with a C, i.e. the last transition will be δ(qC,2, ⊔) = (qC,6, ⊔, L) or δ(qC,5, ⊔) = (qC,6, ⊔, L) if a ≥ b, and δ(qC,5, 1) = (qC,7, y, R) if a < b. In the first case (a ≥ b), the Comparer should send a 'start signal' to the Adder, to give a + b as output. In the second case (a < b), the Comparer should send a 'start signal' to a Turing Machine (called Eraser) that simply replaces all '1's by '0's to output the value 0 in the desired format. We show how we can let the Comparer send these 'start signals'. We first index all states of the Adder by A and of the Eraser by E. Now in case of a ≥ b, Comparer ends in state qC,6, and we can add a transition δ(qC,6, ∗) = δ(qA,0, ∗). The star '∗' stands for any possible symbol, so actually this transition is a shorthand notation for a set of transitions. Similarly, we can let


δ(qC,7, ∗) = δ(qE,0, ∗) bring the Eraser in the initial state. The Adder respectively Eraser will then give the desired output, because their behavior on the input does not change as a result of the renaming of the states (to be exact: the state in which the Comparer terminates is suitable as an initial position for Adder or Eraser). The only thing we have not taken care of is that when the Comparer enters a final state, it does not have the initial representation of the numbers a and b on tape, but has replaced the '1's by 'x's and 'y's. We can easily fix this (it is just some extra work, which you can try as an exercise) by letting Comparer, as the last action before entering a final state, replace all 'x's and 'y's by '1's. The result is a Turing Machine that combines Comparer, Adder and Eraser to compute the function f. In a similar way we can, for example, multiply two numbers a and b, and we can also translate macro-instructions like 'if p then qj else qk' (meaning that when we read 'p' on tape, the Turing Machine goes into a state qj and otherwise into a state qk), and even combine them into complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)

The Entscheidungsproblem
After introducing the notion of a TM in [89], Turing answered Hilbert's decision problem for mathematical logic (in German called 'Entscheidungsproblem') in the negative. The Entscheidungsproblem asks whether there exists a definite method or algorithm which (at least in principle) can be applied to any given mathematical proposition to decide whether that proposition is provable. We now define the notion of an algorithm with the notion of a Turing Machine, and the set of provable propositions by the set of languages that can be decided by some TM. If we look at the definition of decidability in section 6.1, we have that for all formulas ϕ an algorithm, i.e. a TM, exists that decides whether ϕ is true or not.
If we code ϕ by means of a language, and this is always possible (see the previous example for a demonstration), we can reformulate the problem as: for all strings w ∈ L, there exists a TM M that decides ϕ. We now show that this is not possible for all problems (i.e. languages) by giving a specific problem, the Halting problem, that is not decidable. The Halting problem is the problem of testing whether a TM accepts a given input string. We define the problem by stating it as a language problem, and asking whether that language is decidable.

Definition of the Halting problem: For all strings w, H := {<M, w> | M is a TM and M accepts w}. Is H decidable? (i.e. is there for each language a TM that decides for all strings w if they belong to the language or not, that is (using Turing's thesis, see section 9.3): is there for each problem an algorithm that can decide it?).

Theorem: H is recognizable.
Proof (by Turing): The following TM U, also called Universal Turing Machine because it is capable of simulating any other Turing Machine, recognizes H. We define U informally, because a detailed definition of the septuple that such a TM consists of (see the definition of a TM) is a lot of work.
Description of Universal Turing Machine: U = "On the input <M, w> where M is a TM and w is a string:
1 simulate M on input w
2 if M ever enters its accept state, accept"
Note that this TM loops on input <M, w> if M loops on w, which is why this machine does not decide H. If the algorithm had some way to determine that M was not halting on w, it could reject. Hence H is sometimes called the Halting problem. As Turing demonstrated, an algorithm has no way to make this determination.

Theorem: H is undecidable (see also [82, page 165]).
Proof (by Turing): We assume H is decidable and obtain a contradiction. Suppose D is a decider for H, defined by
D(<M, w>) := "
• accept if M accepts w
• reject if M does not accept w"
Now we construct a new TM O with D as a subroutine. This new TM calls D to determine what M does when the input to M is its own description <M>. Once O has determined this information, it does the opposite. That is, it rejects if M accepts and accepts if M does not accept. The following is a description of O:
O = "On input <M>, where M is a TM:

1 run D on input <M, <M>>,
2 output the opposite of what D outputs, that is if D accepts, reject and if D rejects, accept"

We summarize the behavior of O as follows:
O(<M>) = "
• accept if M does not accept <M>
• reject if M accepts <M>"
Now we obtain the contradiction by running O with its own description <O> as input. In that case we get:
O(<O>) = "
• accept if O does not accept <O>
• reject if O does accept <O>"
Thus neither O nor D can exist.

Turing wrote in his last publication about the interpretation of unsolvable problems, such as the Halting problem for Turing machines:

These . . . may be regarded as going some way towards a demonstration, within mathematics itself, of the inadequacy of 'reason' unsupported by common sense.
- Alan Turing

In this section I have made extensive use of [38] [92] for information on the life and work of Turing and [89] [82] [19] for the theory of TM's and the Halting problem. Another valuable source of information on Turing's life and work is the website http://www.turing.org.uk/.
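The diagonal construction in the proof that H is undecidable can be imitated in a programming language. In the hedged sketch below, Python functions play the role of Turing Machines, passing a function object plays the role of passing its description, and returning True stands for 'accept'. The three candidate deciders are hypothetical examples; the point is only that for any total candidate D, the machine O built from it behaves on input O differently from what D claims.

```python
def make_opposite(decider):
    """Build O from a claimed decider D: O accepts M iff D rejects <M, <M>>."""
    def opposite(machine):
        return not decider(machine, machine)
    return opposite

# Three hypothetical candidates for D. Each is total, none can be correct.
def always_accepts(machine, argument):
    return True

def always_rejects(machine, argument):
    return False

def guesses_by_name(machine, argument):
    return machine.__name__.startswith("o")

for candidate in (always_accepts, always_rejects, guesses_by_name):
    o = make_opposite(candidate)
    claimed = candidate(o, o)  # what D claims O does on input <O>
    actual = o(o)              # what O really does on input <O>
    print(candidate.__name__, claimed, actual)
```

In every row the two values differ, because the second value is by construction the negation of the first. This only illustrates the argument; it is not a proof, since the candidates here are ordinary total functions rather than arbitrary Turing Machines.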


9.2 Church and the Lambda Calculus

Alonzo Church (1903-1995) was an American mathematician, whose work is of major importance in mathematical logic, recursion theory and theoretical computer science. One of his most important contributions to logic is his invention in the 1930s of the lambda calculus. He is also remembered for Church's thesis, published in 1936 in [14, page 345-363], stating that the lambda calculus can be used to embody a correct formalization of the notion of computability (see section 9.3). The notion of lambda definability is conceptually the basis for the discipline of functional programming, and the lambda calculus is also the basis for type theory. Church also founded the Journal of Symbolic Logic in 1936. He had 31 doctoral students including famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We now introduce the lambda calculus (Church's formalization of the notion of effective calculability) in a modern setting, using [9, chapter 4].

Application and abstraction
First we introduce the basic concepts of λ-calculus. A formalization follows thereafter. The lambda calculus has only two basic operations, abstraction and application.
• Abstraction is for constructing functions: For an expression E we introduce λx.E to denote the abstraction of E over x, i.e. 'the function of x which computes E'. Example (see footnote 1): λx . x + 1, λn . n × n, etc. We will later see how to define a recursive function; this is not so easy since we do not have function names.
• (Function) application: The expression F A denotes that F is considered as a function (an algorithm) applied to input A. The original lambda calculus theory is type-free so we also consider F F, that is, F applied to itself. Example: (λx . x + 1) 4, (λn . n × n) 7, etc.

1 Note that in some examples we have simplified the notation for the clarity of the example, since in pure lambda calculus we do not have arithmetic symbols, like + and ×, but we can encode these operations in the pure lambda calculus, as we will later see.


These two notions can be very powerful if we introduce the rule of beta reduction which allows us to apply an expression over an abstraction, and for example, rewrite (λx . x+1)4 to 4+1. Similarly (λn . n×n) 7 can be reduced to 7×7. It is also allowed to use arbitrary nesting: ((λn . λx . (x+1)×n) 7) 4 can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7. Similar to ordinary mathematics, the names of the variables are irrelevant to the rules that can be applied, which allows a transformation of the names (also known as dummy transformation). This rule in lambda calculus is called alpha conversion. For example, alpha conversion allows us to rewrite λn . nn to λx . xx, since they are essentially the same function. Note that we also want to use functions as variables and arguments: ((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4 should reduce to the earlier expression. But above we only have functions of one argument; we now introduce functions with more arguments, while avoiding new notations. We can solve this problem by using iteration of applications, often called currying after the American mathematician H.B. Curry who made it popular. Example: f (x, y) = 3 × x + y can be written as F1 ≡ λx . (λy . 3 × x + y). Then f (4, 5) is written (F1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which can be reduced to (by using beta reduction): 3 × 4 + 5. The above explanation and examples give an idea of what lambda calculus is. We will now work towards a more formal definition of lambda calculus. The system of lambda calculus is based on the structure of Abstract Reduction Systems (ARS). The terms of the ARS then coincide with the inductively defined lambda terms and the reduction relation will be β−reduction. So before we formally define the lambda calculus, we introduce the most relevant theory of abstract reduction systems. 
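The examples above can be tried out with the anonymous functions of an ordinary programming language. The sketch below uses Python lambdas; the names F1, nested and higher are chosen here for illustration. Note that Python evaluates arguments before applying a function, so it shows the results of the reductions, not the individual β-steps.

```python
# f(x, y) = 3 * x + y as a curried function: one argument at a time
F1 = lambda x: (lambda y: 3 * x + y)
print(F1(4)(5))  # 17, i.e. 3 * 4 + 5

# ((λn . λx . (x + 1) × n) 7) 4
nested = lambda n: (lambda x: (x + 1) * n)
print(nested(7)(4))  # 35, i.e. (4 + 1) * 7

# Functions as arguments: ((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4
higher = lambda f: (lambda n: (lambda x: f(x) * n))(7)
print(higher(lambda y: y + 1)(4))  # 35 again
```

Partial application comes for free with currying: F1(4) is itself a function, namely the function that maps y to 12 + y.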
Abstract Reduction Systems
Definition of Abstract Reduction System (ARS): An abstract reduction system A := a structure <A, →> consisting of a set A and a binary relation → on A (i.e. → ⊆ A × A). The relation is also called reduction or rewrite relation. If for a, b ∈ A, we have a → b, we call b a one-step reduct of a.
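A finite ARS can be written down directly as a set together with a set of pairs. The following sketch is illustrative only: the four-element system and the function names are made up for this example. It lists the one-step reducts of an element and everything that can be reached from it by a finite, possibly empty, sequence of reduction steps.

```python
def one_step_reducts(arrows, a):
    """All b with a -> b, for a finite relation given as a set of pairs."""
    return {y for (x, y) in arrows if x == a}

def reducts(arrows, a):
    """All b reachable from a in zero or more reduction steps."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        for y in one_step_reducts(arrows, x):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

# A hypothetical four-element ARS: a -> b, a -> c, b -> d, c -> d
A = {"a", "b", "c", "d"}
ARROWS = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

print(sorted(one_step_reducts(ARROWS, "a")))  # ['b', 'c']
print(sorted(reducts(ARROWS, "a")))           # ['a', 'b', 'c', 'd']
print(sorted(reducts(ARROWS, "d")))           # ['d']
```

For infinite systems such as the lambda calculus this exhaustive search is of course not available; there the relation is given by rules instead of a finite table.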


The transitive and reflexive closure of → is written as ↠ (or alternatively →∗). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A,

(closure of →) if a → b then a ↠ b,
(reflexive) a ↠ a, and
(transitive) if a ↠ b and b ↠ c then a ↠ c.

Thus a ↠ b if and only if there exists a finite sequence of reduction steps a ≡ a0 → a1 → . . . → an ≡ b. This sequence may be empty, in which case a ≡ b. Here ≡ denotes (the syntactic) identity of elements of A, i.e. a ≡ b if and only if a and b are the same element of A.

Definition of Normal Form: A term a ∈ A of an ARS <A, →> is a normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has a normal form if and only if b ↠ a for some normal form a ∈ A.

Definition of Weakly Normalizing: The reduction relation → of an ARS <A, →> is weakly normalizing (or weakly terminating) := every a ∈ A has a normal form. In this case we also say that A is weakly normalizing.

Definition of Strongly Normalizing: The reduction relation → of an ARS <A, →> is strongly normalizing (also called terminating, well-founded or noetherian) := there exists no infinite reduction a0 → a1 → a2 → . . ., with for all n ∈ N, an ∈ A.

Lemma If an ARS is strongly normalizing, it is weakly normalizing.
Proof: We prove this by proving the contraposition: if <A, →> is not weakly normalizing then <A, →> is not strongly normalizing. Suppose <A, →> is not weakly normalizing. Then there is a0 ∈ A without a normal form. Since a0 has no normal form, certainly a0 is not a normal form itself, so there is a1 ∈ A such that a0 → a1. Now a0 has no normal form, so a1 cannot be a normal form. Thus we get an element a2 ∈ A such that a1 → a2. Repeating this process yields an infinite reduction a0 → a1 → a2 → . . ..

Definition of Unique Normal Form: The reduction relation → of an ARS <A, →> has the unique normal form property := for all a, b, c ∈ A

such that a ↠ b, a ↠ c, and b, c are normal forms, we have b ≡ c.

Lemma An ARS <A, →> with the unique normal form property is not always weakly normalizing.
Proof: For instance, the abstract reduction system with only element a ∈ A and rewrite rule a → a has no normal forms, so it trivially has the unique normal form property and is not weakly normalizing.

Definition of Local Confluence: A reduction relation → of an ARS <A, →> is called locally confluent or weakly confluent (also weakly Church-Rosser) := for all a, b, c ∈ A with a → b and a → c there exists a d ∈ A such that b ↠ d and c ↠ d.

Definition of Confluence: A reduction relation → of an ARS <A, →> is called confluent (or has the Church-Rosser property, or is Church-Rosser) := for all a, b, c ∈ A with a ↠ b and a ↠ c there exists a d ∈ A such that b ↠ d and c ↠ d.

Lemma If a reduction relation has the unique normal form property and is weakly normalizing then it is confluent.
Proof: Suppose a ↠ b and a ↠ c. Since → is weakly normalizing, there are normal forms b' and c' such that b ↠ b' and c ↠ c'. By transitivity a ↠ b' and a ↠ c', and thus by the unique normal form property we also have b' ≡ c'. Hence b ↠ b' and c ↠ b'.

Lemma If → is confluent then → has the unique normal form property.
Proof: Suppose we have a ↠ b and a ↠ c, and b, c are normal forms. By confluence, there exists a d such that b ↠ d and c ↠ d. Since b and c are normal forms, we must have b ≡ d and c ≡ d, thus b ≡ c.

Syntax
Now that we have seen the basic principle of lambda calculus, we will give a more formal definition. We formally define the syntax of the lambda calculus by giving its grammar.

Definition of the Syntax of Lambda Terms: Lambda Term E := C | v | (E1 E2) | (λv . E), with
• C ranges over a set of constants (we will use the constant names a, b, c, . . . for elements of C)
• v ranges over a (denumerable) set of variables (using v, w, x, . . .)
• (E1 E2) denotes a combination involving the application of one expression (E1) to another (E2). The subexpression E1 is referred to as the operator and E2 is referred to as the operand
• (λv . E) denotes an abstraction. Informally it denotes a function of v which produces result E. The subexpression E is referred to as the body of the abstraction and v is called the bound variable of the abstraction
We also call lambda terms simply 'terms' or 'expressions'.

Notational conventions: to achieve a minimal notation, we drop parentheses whenever possible, and assume:
• Association to the left for iterated application: F E1 E2 . . . En denotes (. . . ((F E1) E2) . . . En).
• Association to the right for iterated abstraction: λx1 . λx2 . . . . λxn . E denotes λx1 . (λx2 . (. . . (λxn . E) . . .)), or shortly λx1 x2 . . . xn . E, and λv . E1 E2 means (λv . (E1 E2)).
Example: We can write the expression F1 of the previous example as λx y . 3 × x + y.

Free/Bound Variables and α-conversion
We distinguish between free and bound occurrences of variables in an expression. An occurrence of v in E is said to be bound if it occurs within a subexpression of E with the form λv . E1, and the occurrence is said to be free otherwise.
Example: n occurs free in λx . (x + 1) × n, whereas x occurs bound in this expression. Both n and x occur bound in λn . (λx . x + 1) × n. Further x occurs both bound and free in (λx . x + 1) × x (the second occurrence of 'x'

E . • F V (λv . The renaming process is called α-conversion.e. . Expressions that can be made textually equivalent by renaming bound variables are called α-convertible or alpha(betically) equivalent. it does not occur in E). y + y. Definition of free variables: The free variables of a term E. y are considered being equivalent. xy and λxy . since one has a free occurrence of y and the other not. So λx . An expression E is said to be closed if F V (E) = ∅. y × z) is closed. z + x)(λy . that is. Example: The expression λz . • F V (v) = {v}. CHURCH AND TURING in this expression is bound. provided y is fresh. α-conversion We consider two terms as ‘equivalent’ if they only differ in their bound variables. as long as we do so consistently. E →α λy . E) = F V (E) − {v}.158 CHAPTER 9. yx are not equivalent. (λx . and often also E1 ≡ E2 . y neither occurs as a free variable nor as a bound variable in the expression E (i. But we must distinguish λx . where E is obtained from E by replacing all free occurrences of x in E by y. • F V (E1 E2 ) = F V (E1 ) ∪ F V (E2 ). y + x and λy . When two lambda terms E1 and E2 are α-convertible in this sense we write E1 ≡α E2 . the third occurrence is free). and allows us to change the name of a bound variable. Note also that λxy . It is formally defined as the equivalence relation generated by the following reduction: Definition of α-reduction: λx . x and λy . denoted by F V (E). is a set of variables defined recursively by: • F V (C) = ∅.

Example: Some α-conversions:
λx . x + 1 ≡α λy . y + 1
λx . (λy . x × y)y ≡α λx . (λz . x × z)y
λx . (λy . y × x) ≢α λy . (λy . y × y) (because the y's in (λy . y × x) will get bound)
From now on, two λ-terms are considered (syntactically) equal if they are α-convertible to each other.

Substitution
We now formally define the concept of substitution of a variable in lambda terms.

Definition of Substitution: The substitution of expression E for each free occurrence of v in expression E0, denoted by E0[E/v], is defined by induction on the structure of E0 as:
• C[E/v] ≡ C
• x[E/v] ≡ E if x ≡ v, and x[E/v] ≡ x if x ≢ v
• (E1 E2)[E/v] ≡ (E1[E/v])(E2[E/v])
• (λx . E1)[E/v] ≡
  λx . E1 if x ≡ v
  λx . (E1[E/v]) if x ≢ v and x ∉ FV(E)
  λy . ((E1[y/x])[E/v]) if x ≢ v and x ∈ FV(E) and y ∉ FV(E1 E)
Example: (λx . z+7×x)[x+3/z] ≡ λy . (z+7×y)[x+3/z] ≡ λy . (x+3)+7×y.

The following lemma tells us that substitution behaves well; it can be proven by induction on the structure of λ-terms.
Lemma For all terms E0, E1, E2 and variables x, y such that x ≢ y and x ∉ FV(E2): E0[E1/x][E2/y] ≡ E0[E2/y][E1[E2/y]/x].
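The definition of substitution translates almost line by line into a recursive function. The sketch below is an illustration under assumptions: terms are encoded as nested tuples ("const", c), ("var", x), ("app", E1, E2) and ("lam", v, E), arithmetic symbols are treated as constants, and fresh names are drawn from v0, v1, . . . . When picking the fresh variable the sketch also avoids v itself, which is slightly stricter than the side condition stated in the definition.

```python
from itertools import count

def var(name): return ("var", name)
def const(c): return ("const", c)
def app(f, *args):
    """Left-associated application: app(f, a, b) is ((f a) b)."""
    for a in args:
        f = ("app", f, a)
    return f
def plus(a, b): return app(const("+"), a, b)
def times(a, b): return app(const("*"), a, b)

def fv(term):
    """Free variables of a term, following the recursive definition."""
    kind = term[0]
    if kind == "const":
        return set()
    if kind == "var":
        return {term[1]}
    if kind == "app":
        return fv(term[1]) | fv(term[2])
    _, v, body = term
    return fv(body) - {v}

def fresh(avoid):
    """First name v0, v1, ... that is not in `avoid`."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def subst(term, e, v):
    """term[e/v]: replace the free occurrences of v in term by e."""
    kind = term[0]
    if kind == "const":
        return term
    if kind == "var":
        return e if term[1] == v else term
    if kind == "app":
        return ("app", subst(term[1], e, v), subst(term[2], e, v))
    _, x, body = term
    if x == v:                      # v is rebound here, nothing to replace
        return term
    if x not in fv(e):              # no capture possible
        return ("lam", x, subst(body, e, v))
    y = fresh(fv(body) | fv(e) | {v})   # rename the bound variable first
    return ("lam", y, subst(subst(body, var(y), x), e, v))

# The example from the text: (λx . z + 7 × x)[x + 3 / z]
term = ("lam", "x", plus(var("z"), times(const(7), var("x"))))
result = subst(term, plus(var("x"), const(3)), "z")
print(result[1], sorted(fv(result)))  # v0 ['x']
```

The bound variable is renamed to v0 before the substitution is carried out, so the x in x + 3 stays free in the result, exactly as in the example.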

Reduction System for the Lambda Calculus
As we have seen with an example at the beginning of this section, the main rule for the lambda calculus is the beta reduction rule, that we can now formally define.

Definition of β-reduction: β-reduction is a reduction relation →β of the pure lambda calculus. β-reduction is the compatible relation generated by (λv . E1)E2 →β E1[E2/v], with the rules:
if E1 →β E2 then E1 E →β E2 E,
if E1 →β E2 then E E1 →β E E2,
if E1 →β E2 then λv.E1 →β λv.E2.
As before, any term matching the left-hand side of the rule is called a redex and thus any expression of the form (λv . E1)E2 is called a β-redex. We use =β (or sometimes simply =) to denote the equivalence relation generated by →β. Note the difference between ≡(α) and =(β). We often write → resp. ↠ instead of →β and ↠β.

Example: (λnx . (x + 1) × n) 7 4 →β (λx . (x + 1) × 7) 4 →β (4 + 1) × 7.

Example: This example illustrates the need of α-conversion during β-reduction, even if distinct names are chosen from the start. Define TWICE ≡ λf . λx . f (f x), then
(λy . yy)TWICE →β TWICE TWICE
≡ (λf . λx . f (f x)) TWICE
→β λx . TWICE (TWICE x)
≡ λx . TWICE ((λf . λx . f (f x))x)
→β λx . TWICE ((λx . f (f x))[x/f]) (Note the name clash)
≡α λx . TWICE ((λy . f (f y))[x/f])

xxx)(λx . Then |= ♦. xxx) →β · · ·. Definition of the Diamond Property: A binary relation → on the lambda terms Λ satisfies the diamond property. xx) →β · · ·. xxx)(λx . xxx)(λx . notation → |= ♦ := (∀M. it does have the important confluence property. xxx) →β (λx . xxx)(λx . x(xy)) . M2 ∈ Λ : (M → M1 ∧ M → M2 ) ⇒ (∃M3 : M3 ∈ Λ : M1 → M3 ∧ M2 → M3 )) Note that a reduction →β has the Church-Rosser property if it satisfies the diamond property. 3. TWICE (λy . xxx)) and the latter term can be reduced to c or again to a longer term. CHURCH AND THE LAMBDA CALCULUS ≡ →β Example: 1. M1 . (λy . its transitive. but also (λy .2. 4. (λx . 161 2. . This corresponds to ‘self-reproducing programs’. c)((λx . c)((λx . so different reduction paths are possible.9. so terms can even become arbitrarily large. xx)(λx .. Ω ≡ (λx . xx) →β (λx . xxx)(λx . Although we already saw that λ-calculus is neither weakly nor strongly normalizing. xx)(λx . xxx)) → (λy . M1 . M2 : M. thus infinite sequences of steps are possible: β-reduction is not always terminating. λx . x + 1) ((λy . xxx)(λx . To prevent confusion in the notation we will from now on also use the implication symbol ⇒. (λx . etc. xxx)(λx .. c)((λx . y × y) 3) β (two possibilities) (3 × 3) + 1. Lemma: Let → be a binary relation on a set Λ with reflexive closure and let → |= ♦. xxx)) → c. First we introduce the following definition of the diamond property that we use to prove that →β is confluent.


Proof: Assume → is a binary relation on a set Λ with ↠ its transitive, reflexive closure, and → |= ♦. We now have to prove that ↠ |= ♦. Suppose M, L, K ∈ Λ, M ↠ L and M ↠ K. We then have to prove (∃N : N ∈ Λ : L ↠ N ∧ K ↠ N). Let
(*) M ≡ M0 → M1 → . . . → Mn ≡ L, for some n ∈ N
(**) M ≡ K0 → K1 → . . . → Km ≡ K, for some m ∈ N
We now need to apply a technique called induction loading (see for more information the links on http://zax.mine.nu/stage/) to prove that K and L have a common reduct N. To be precise, we show that l(m, n) holds for all m, n ∈ N, with l(m, n) := there exists an N(i, j) ∈ Λ, with i, j ∈ N and 0 ≤ i ≤ n ∧ 0 ≤ j ≤ m such that:
(a) N(i, 0) ≡ Mi if 0 ≤ i ≤ n
(b) N(0, j) ≡ Kj if 0 ≤ j ≤ m
(c) N(i, j) → N(i, j + 1) if 0 ≤ i ≤ n ∧ 0 ≤ j < m
(d) N(i, j) → N(i + 1, j) if 0 ≤ i < n ∧ 0 ≤ j ≤ m

Clearly, when l(m, n) is true for all m, n ∈ N, we know that K and L have a common reduct. So the only remaining proof obligation is to show that l(m, n) holds for all m, n ∈ N. We prove this by induction on n.

Base case (n): n = 0
(a) let N(0, 0) be M0, then (a) holds trivially by reflexivity of '≡'.
(b) let N(0, j) be Kj for 0 ≤ j ≤ m, then (b) also holds trivially. Note that this is valid in combination with the definition under (a) since N(0, 0) ≡ M0 ≡ M ≡ K0.
(c) N(i, j) → N(i, j + 1) holds because i = 0 and (**).
(d) N(i, j) → N(i + 1, j) holds trivially because n = 0 yields an empty range for i.


Induction case (n): Induction hypothesis (i.h.-n): suppose that for n = k, k ∈ ℕ, for all m ∈ ℕ the statement l(m, n) is true. We now prove the statement for n = k + 1. We do this by induction on m.

Base case (m): m = 0
(a) For 0 ≤ i ≤ k, N(i, 0) ≡ M_i holds by i.h.-n. Let N(k + 1, 0) be M_{k+1}, then (a) holds trivially.
(b) Since j = 0 this amounts to N(0, 0) ≡ K_0. This is true because of our previous definition of N(0, 0) ≡ M_0 and the fact that M_0 ≡ M ≡ K_0.
(c) Holds trivially, because m = 0 yields an empty range for j.
(d) N(i, j) → N(i + 1, j) holds because j = 0 and (*).

Induction case (m): Induction hypothesis (i.h.-m): suppose that for m = r and n = k + 1, r ∈ ℕ, the statement l(m, n) is true. We now prove the statement for m = r + 1.
(a) N(i, 0) ≡ M_i for 0 ≤ i ≤ k + 1 follows from i.h.-n.
(b) N(0, j) ≡ K_j for 0 ≤ j ≤ r + 1 follows from i.h.-m.
(c) and (d) We already know from the induction hypotheses that N(i, j) → N(i, j + 1) holds for (0 ≤ i ≤ k + 1 ∧ 0 ≤ j < r) ∨ (0 ≤ i < k ∧ 0 ≤ j < r + 1). What we now have to show is that this is also true for i = k + 1 and j = r + 1. We know by (c) of i.h.-m that there exists a N(k, r) such that N(k, r) → N(k, r + 1). We also know by (d) of i.h.-n that there exists a N(k, r) such that N(k, r) → N(k + 1, r). Then by the diamond property of → we know (∃N(k + 1, r + 1) : N(k + 1, r + 1) ∈ Λ : N(k, r + 1) → N(k + 1, r + 1) ∧ N(k + 1, r) → N(k + 1, r + 1)).

This completes the proof of the lemma. With it we can sketch the proof (see footnote 2) of the Church-Rosser theorem, a fundamental theorem of the untyped lambda calculus, which is stated below.

Footnote 2: The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but as far as I know this is the first proof that formalized the above lemma to a reasonable extent.
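The grid N(i, j) used in the proof of the lemma can also be built mechanically. The following Python function is a minimal sketch and not part of the original proof: it assumes that the one-step relation is given as a finite dictionary `succ` (a hypothetical representation chosen for illustration) and that this relation satisfies the diamond property. It fills the grid according to conditions (a)-(d) and returns N(n, m) as the common reduct. The four-element relation `SUCC` is an invented example.

```python
def common_reduct(succ, m_path, k_path):
    """Fill the grid N(i, j) from the lemma's proof and return N(n, m).

    succ   : dict mapping each element to the set of its one-step successors;
             the relation is assumed to satisfy the diamond property.
    m_path : [M_0, ..., M_n] with M_0 -> M_1 -> ... -> M_n   (sequence (*))
    k_path : [K_0, ..., K_m] with K_0 -> K_1 -> ... -> K_m   (sequence (**))
    """
    if m_path[0] != k_path[0]:
        raise ValueError("both paths must start at the same element M")
    n, m = len(m_path) - 1, len(k_path) - 1
    grid = {}
    for i in range(n + 1):      # condition (a): N(i, 0) = M_i
        grid[i, 0] = m_path[i]
    for j in range(m + 1):      # condition (b): N(0, j) = K_j
        grid[0, j] = k_path[j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # N(i-1, j-1) steps to both N(i-1, j) and N(i, j-1), so the
            # diamond property promises a common one-step successor.
            common = succ[grid[i - 1, j]] & succ[grid[i, j - 1]]
            if not common:
                raise ValueError("relation does not satisfy the diamond property")
            grid[i, j] = min(common)    # any element of `common` would do
    return grid[n, m], grid


# Invented example: a reflexive relation with a -> b, a -> c, b -> d, c -> d.
SUCC = {
    "a": {"a", "b", "c"},
    "b": {"b", "d"},
    "c": {"c", "d"},
    "d": {"d"},
}

reduct, grid = common_reduct(SUCC, ["a", "b", "d"], ["a", "c"])
print(reduct)
```

Choosing `min(common)` only makes the result deterministic; the proof needs nothing more than the existence of some common successor.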


Theorem (Church, Rosser): →β is confluent.
Proof: By the previous lemma, we know that if any binary relation on a set satisfies the diamond property, its transitive reflexive closure also satisfies the diamond property. Suppose we have a binary relation →partial−β on the set Λ such that ↠β is the transitive reflexive closure of →partial−β. So if we prove that →partial−β satisfies the diamond property, by application of the previous lemma we have proved that ↠β satisfies the diamond property, i.e. →β is confluent. A concrete definition of →partial−β, a proof that its transitive reflexive closure is indeed ↠β, and a proof that →partial−β satisfies the diamond property can be found on pages 60-62 of [6].

Theorem: λ-calculus has the unique normal form property.
Proof: Suppose that a term a of (Λ, →) has two normal forms, n1 ∈ Λ and n2 ∈ Λ. This means there is no b ∈ Λ such that n1 → b or n2 → b. But a ↠ n1 ∧ a ↠ n2, and then by the Church-Rosser property we know (∃c : c ∈ Λ : n1 ↠ c ∧ n2 ↠ c). Since neither n1 nor n2 admits a reduction step, both must already be identical to c, so we must have n1 ≡ n2.

Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . .. Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form.

Confluence is a fundamental property for functional programming; we rely on this when we evaluate programs by rewriting, knowing that we never have to backtrack an evaluation (this is also one of the main differences with logic programming).

In the λ-calculus we have defined in this section, we can represent natural numbers and basic operations on the natural numbers. We will not show this here; in most books on the lambda calculus there are some examples of how to do basic arithmetic in lambda calculus. The λ-calculus represents a certain class of (partial) functions on the integers. By a classical result of the American mathematician Stephen C. Kleene (1909-1994) this is exactly the set of (partial) recursive functions. The proof can be found in [6, theorem 9.2.16].
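The arithmetic on natural numbers mentioned above is usually done with Church numerals. As a rough illustration (the report itself does not work this out), the sketch below writes the standard encoding with Python lambdas. The helper names `to_int` and `from_int` are assumptions, introduced only to convert to and from ordinary integers. Python evaluates these functions eagerly rather than by β-reduction on terms, so the sketch shows the encoding and not the reduction relation.

```python
# Church numerals: the number n is the function that applies f n times to x.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition applies f "b times" and then "a times"; multiplication composes.
add = lambda a: lambda b: lambda f: lambda x: a(f)(b(f)(x))
mul = lambda a: lambda b: lambda f: a(b(f))


def to_int(n):
    """Decode a Church numeral by counting how often f is applied."""
    return n(lambda k: k + 1)(0)


def from_int(k):
    """Encode a non-negative Python int as a Church numeral."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


two, three = from_int(2), from_int(3)
print(to_int(add(two)(three)))  # 5
print(to_int(mul(two)(three)))  # 6
```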
Church also thought of the set of functions that could be calculated in his λ-calculus, and conjectured the following thesis:

Church’s thesis (1936): The set of effectively computable functions, i.e. functions that intuitively (effectively) can be computed, is the same as the set of functions that can be defined in λ-calculus.

A more formal version and detailed treatment of Church’s thesis can be found in section 9.3. Alan Turing proved in 1937 that the class of Turing computable functions is the same as the class of functions definable in λ-calculus. So the power of Turing Machines is the same as the power of λ-calculus. Both models capture the intuitive idea of computation. This important thesis is the subject of the next section.

9.3 The Church-Turing thesis

The Church-Turing thesis concerns the intuitive notion of algorithm (or effective or mechanical method) in logic and mathematics. The notion of an algorithm or an effective method is an informal one, and attempts to characterize this effectiveness lacked rigor, mainly because the key requirement that the method demands no insight or ingenuity is left unexplicated. One of Turing’s achievements in his paper of 1936 (reprinted in [19] and online available at http://www.abelard.org/turpap2/tp2-ie.asp) was to present a formally exact predicate with which the informal predicate ‘can be calculated by means of an algorithm or effective method’ may be replaced. The formal concept proposed by Turing is that of computability by a Turing Machine (see section 9.1).

Turing’s thesis: TM’s can do anything that could be described as intuitively computable

He introduced this thesis in [90] in the course of arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolvable.

Church also presented in [14] a formally exact way to express this notion of intuitively computable. In order to calculate the values of the function Church introduced his lambda calculus and specified the notion of a recursive function (see section 9.2). The class of lambda-definable functions and the class of recursive functions were later shown to be identical. This was established in the case of functions of positive integers by Church and the American mathematician Kleene (see [47], [14]).

Church’s thesis: A function of positive integers is effectively computable only if it is recursive

The reverse implication is also referred to as the converse of Church’s thesis. After learning of Church’s proposal, Turing quickly established that the apparatus of lambda-definability and his own apparatus of computability were equivalent ([89], page 263). Turing’s method was however more obvious and more general than Church’s, since the latter only considered functions of positive integers.

Theorem: Lambda-definability and Turing Machine-computability are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s lambda calculus can compute the same set of functions.

Although Turing and Church had chosen different ways to formalize the intuitive notion of effective computability, respectively by identifying the notion with that of computability by a Turing Machine and in the lambda calculus, both methods are equivalent. After this proof of equivalence, Kleene introduced the term ‘Church-Turing thesis’ to refer to any of the two equivalent theses ([48], page 232).

Church-Turing thesis: The intuitive notion of an algorithm equals the Turing Machine algorithm or (equivalent) the calculable functions of lambda-calculus

There are a number of misunderstandings of the Church-Turing thesis, collected in [16]. Turing did not show that
• Any problem can be solved ‘by instructions, explicitly stated rules or procedures’
• A universal TM ‘can compute any function that any computer, with any architecture can compute’ (Turing said nothing about the limits of what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in accordance with a finite program of instructions) is Turing-machine-computable (this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description (or a ‘precise enough characterization of a set of steps’, or that is ‘scientifically describable’ or ‘scientifically explicable’) can be simulated by a TM (this is known as Thesis S, see [16])

Since the word ‘computable’ is often tied by definition to effective calculability, the Church-Turing thesis is often stated as ‘All computable functions are computable by a Turing Machine’ (a function is said to be computable if and only if there is an effective procedure for determining its values).

If we summarize the above, we can say that to define the concept of an algorithm, Church used a notational system, the lambda calculus. Turing did the same with his theoretical computing device, the Turing Machine. On the face of it they seemed very different from one another, but these two definitions turned out to be equivalent, in the sense that each picks out the same set of mathematical functions. The Church-Turing thesis is the assertion that this set contains every function whose values can be obtained by a method or algorithm corresponding to our intuitive notion of effectively computable. Clearly, if there were functions of which an informal (intuitive) statement, but not the formal statement, were true, then the latter would be less general than the former and so could not reasonably be employed to replace it. It is agreed amongst mathematicians and logicians that ‘computable by means of a TM’ is the correct, accurate rendering of the informal notion in question. When the thesis is expressed in terms of the formal concept by Turing, it is appropriate to refer to the thesis also as the Turing thesis, and likewise for the case of Church.
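To make this summary concrete, the following sketch computes one function, the successor on natural numbers, in both formalisms: with a two-rule Turing machine on a unary tape, and with a Church-numeral λ-term written as Python lambdas. The simulator `run_tm`, the state names and the unary encoding are assumptions chosen for illustration. Agreement on a few inputs only illustrates the equivalence; it does not prove it.

```python
def run_tm(delta, tape, state="q0", halt="halt", blank="_", max_steps=10_000):
    """Run a one-tape Turing machine and return the final tape contents.

    delta maps (state, symbol) to (new_state, symbol_to_write, move),
    where move is -1 (left), 0 (stay) or +1 (right).
    """
    cells = dict(enumerate(tape))
    pos, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        symbol = cells.get(pos, blank)
        state, written, move = delta[state, symbol]
        cells[pos] = written
        pos += move
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# Successor on unary numbers: skip over the 1s, append one more 1, halt.
SUCC_TM = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("halt", "1", 0),
}

# Successor on Church numerals: the numeral n means "apply f n times".
zero = lambda f: lambda x: x
church_succ = lambda n: lambda f: lambda x: f(n(f)(x))


def church(k):
    """Encode a non-negative int as a Church numeral."""
    n = zero
    for _ in range(k):
        n = church_succ(n)
    return n


def to_int(n):
    """Decode a Church numeral by counting applications of f."""
    return n(lambda v: v + 1)(0)


for k in range(5):
    by_machine = len(run_tm(SUCC_TM, "1" * k))
    by_lambda = to_int(church_succ(church(k)))
    assert by_machine == by_lambda == k + 1
```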

Chapter 10
Conclusion

It is a profoundly erroneous truism, repeated by all copy books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. . . . The study of mathematics is apt to commence in disappointment . . . We are told that by its aid the stars are weighed and the billions of molecules in a drop of water are counted. Yet, like the ghost of Hamlet’s father, this greatest science eludes the efforts of our mental weapons to grasp it. . . .
- A. Whitehead, in [99]

When I started my study on the foundations of mathematics, I did not quite know what to expect. By now I’ve learned that the foundations of mathematics can be a fascinating and important subject. Learning this new subject was an interesting challenge, but sometimes hard work when I had to go through numerous books that were full of details or too vague and philosophical. Most books that I found on the foundations of mathematics were either very detailed and descriptive (with an unmatched level of detail and exactness is the book [31] of I. Grattan-Guinness) or treat only a part of the theory that was developed from 1890 to 1940 (for example [17] gives an excellent introduction to set theory). One of the better, though relatively unknown, is the book of G.T. Kneebone [49] that is quite complete and still considerably theoretic. One of the motivations to write this article was to present the theory properly. I also tried to briefly introduce the reader to the historical context of the most important developments. Hopefully that makes it more clear and enjoyable. Most undergraduate courses I have taken gave little or no information about the history that lies behind the theory. Emphasis was laid on the accumulation of mathematical knowledge. I believe that the history of mathematics in education can not only make the study of mathematics more interesting, but also help in the growth of mathematical understanding and appreciation of the current form of the theory. Some of the good literature used, such as the books just mentioned, will be found in the references at the end of this report.

I want to conclude this report with a summary of the theory and my own view on the project, and with some ideas for future work. But let’s first go back to the project.

The project

In the beginning of the 20th century Hilbert said we should formalize all of mathematics, mathematical reasoning. This ‘project’ (from now on I will refer to it as the project) has been the central theme of this report. When reading about the work and biographies of all those brilliant men that have put themselves on this problem, you can (at least that’s what happened to me) get caught up into this fascinating philosophical question. To most people however, this all seems very impractical. We all know you can make a popular operating system or start your own business on the web and in one year make a million dollars if you’re lucky. And when it comes to verifying mathematical proofs and making reliable software, a formal basis is rarely used, and other techniques, such as model-checking, are preferred. At the same time, the human mind is still the most important. It might be worth writing another article, on how and why in that respect the more practical, working mathematicians and more theoretical logicians (or formalists, if you prefer) grew apart.

The attempt to formalize mathematical reasoning is not new - the Greeks already thought rationality was the supreme goal. We can think of Plato and Reason, or as Russell (see footnote 1) would say - think of Pythagoras and Rationality! Aristotle made a big step in formalizing the reasoning, with his patterns of reasoning that are known as syllogisms. Ever since, logic was further developed and important contributions come from De Morgan, Leibniz and especially Boole. With Cantor mathematics got more abstract, and some people regarded his set theory as a disease. Because he was interested in theology and God (see [31, chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with the notion of the infinite, and developed his theory of infinite sets. Poincaré, the great French mathematician, said (see footnote 2; from [95]): “Later generations will regard Mengenlehre (set theory) as a disease from which one has recovered.” Peano and Frege, as we have learned in chapter 4, brought mathematical reasoning to an even higher level of formalization. So far, so good.

But there turned out to be some problems, and although Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was Russell who spread the bad news to everyone, by stating his Russell paradox. At that point Hilbert proposed to use a formal axiomatic method to solve these problems, and he gave his famous three requirements of consistency, completeness and decidability. This proposal of Hilbert to formalize mathematics led to the development of several axiomatic systems, such as those of Zermelo and Fraenkel, and of Gödel, Bernays and von Neumann. Russell and Whitehead made their own attempt to formalize mathematics, with their theory of types. But although all of these attempts were fruitful to a certain extent, in total they all failed, and it took Gödel and Turing to show that in fact ‘the project’ couldn’t be done. But these works of Gödel and Turing were new and complicated, and not everyone clearly recognized their importance. And even nowadays, few people are familiar with the details of their work, and we often see confusion between notions like ‘checking the proof of a statement’ and ‘checking whether a statement is true (or not)’. There is also much confusion about the exact implications of Gödel’s and Turing’s work. Gödel created a statement within arithmetic that cannot be proved in the axiomatic system in which it is formulated. Formalizing mathematics so that we have absolute truth is not possible! Turing later formalized the notion of computability to show there is no mechanical procedure to decide if a statement is correct or not.

Footnote 1: Although rationality is more commonly associated with Plato, Russell always insisted on attributing it to Pythagoras (see [62]).
Footnote 2: Whether or not he actually said this is a matter of debate amongst historians of mathematics.

At first this was a shock, but then mathematicians were saying (and again it would be nice to write an article about the different responses of mathematicians and logicians): so what - we should do mathematics exactly the same way as we’ve always done it, this does not apply to the problems I care about. Indeed mathematicians continued with their work, and the theorems of Gödel and Turing had no or little impact in practice on how we (should) do mathematics. The only effect the project might have had on working mathematicians is that they have become a bit more precise in the use of language and in writing their proofs. Some of course were inspired by problems like the 23 of Hilbert.

But there has been another consequence of all this theoretical work, that I was made aware of through a videotaped lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt to formalize all mathematics after the publications of the theorems of Gödel and Turing: “It failed in that precise technical sense. But in fact it succeeded magnificently, not formalization of reasoning, but formalization of algorithms has been the great technological success of our time - computer programming languages! So if you look at the history of the beginning of this century you’ll see papers by logicians studying the foundations of mathematics in which they had predicate calculi. Now you look back and you say this is clearly a programming language! [. . .] If you look at Turing’s paper of course there is a machine language [. . .]. If you look at papers by Alonzo Church you see the lambda calculus, which is a functional programming language. If you look at Gödel’s original paper you see what looks like LISP, it’s very close to LISP”. Or, as von Neumann said: the universal Turing Machine is really the notion of a general purpose programmable computer and that’s the idea of software.

As G.J. Chaitin concluded in his speech, this is the way “we’re all benefiting from the glorious failure of this project!”, and all of the foundational work is not so impractical after all! Now this is not entirely true, but it is true that theoretical studies, as he says, “don’t have spin-off in dollars right away, but sometimes they have vastly unexpected consequences”. As he showed there are numerous examples of unexpected offspring of theoretical research. Formal methods/studies have not always done a good job promoting themselves - maybe we can emphasize this aspect and show that technology often advances through fascinating impractical ideas.

Status of the project

That brings us to ask if the question of the foundation of mathematics, more than a decade after Hilbert formulated it, is now settled once and for all. The short answer is: it is not. Even from the amount of interesting resources on current research that are available on the internet alone, we can conclude there is still a lot of work to do on the foundations of mathematics. I consider creating an online version of this document with more background information and links.

Although Gödel and Turing showed that it is impossible to totally formalize even basic arithmetic, let alone the whole of mathematics, it is still possible to formalize parts of mathematics (for example, geometry) successfully. As P. Andrews says in [4], “attempts to understand the nature of reasoning and to build sophisticated information systems which can draw logical conclusions may be regarded as part of an endeavor to fashion more powerful intellectual tools for coping with the increasingly complex problems which confront mankind.” In that respect the formalization is not restricted to mathematical reasoning, and it can also be applied to other disciplines (such as physics, chemistry or even social sciences). Especially the development of software and computer systems will be facilitated by a formalization of theories.

Ideas for future work and distinction between mathematics and software

And although it cannot be determined by a machine whether any given mathematical statement is true, even when all axiomatic systems are incomplete and there are undecidable statements, we can try to develop an axiomatic system such that as many as possible of the interesting statements (see footnote 3) can be proved within that system. This is useful because, if we provide one of the statements that the system does contain, and which we claim to be decidable by providing a concrete and completely formalized (dis)proof of it within that system, we still have a way to decide mechanically whether or not the proof is correct for the given statement. Although total formalization of parts of mathematics is very useful, this is not the focus of most current research: (most people believe that) the human mind will (at least for the near future) be the one to prove whether a given mathematical statement is true or not.

Footnote 3: As interesting statements, we consider all statements in the (everyday) work of practicing mathematicians. These ‘practical’ statements do not include the specific purely theoretical statements that Gödel invented for his incompleteness theorem.

The question then is if the set of statements for which we can do this still forms a part of mathematics that is interesting enough. This should be the first goal for the near future: (1) Investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is best usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as many parts of mathematics as possible. This has to be a part of our investigation: to find out how many of the practical mathematical proofs contain ‘meta-arguments’, in other words which classes will fall outside our system. Although we want to change as little as possible to the (side of) mathematics itself, this also might be a necessary option (see footnote 4).

Footnote 4: For a successful formalization of parts of mathematics we therefore do not only look at the axiomatic system, but it also might require us to limit certain parts of mathematics so that they contain less undecidable proofs or require us to rewrite certain existing proofs to a form that is permitted by the system.

As P. Andrews calls his book [4], we get: ‘to truth through proof’. Formalization is not only important to check the correctness of mathematical theories that are becoming ever more complex. Many models in physics and chemistry depend on underlying mathematical theorems, and the success of the model depends on the correctness of the mathematical theorems. Also, we are becoming more and more dependent on automated systems, in particular computers and software. There is a growing need for reliable (that is, correctly specified and working according to the specifications) software, not only for (safety) critical systems, but also in everyday applications. A formal approach can not only be used to prove correctness of mathematical statements but also of computer programs. Instead of the proofs of mathematical statements, we are then checking the derivation steps of program derivations. This is an important point: the distinction between mathematics and software construction. I want to emphasize this difference, since it is often unclear or left implicit which of the two is meant when arguments for/against formalistic studies are given.

We have to realize that we can never obtain a 100% guarantee of correctness of any algorithm, since we also are dependent on the correctness of the proof-checker. That is why we have to try to keep the proof-checker as simple, small and intuitive as possible (see also the ‘Bruijn criterion’ in [26, pages 4 and 26]). Analogously, we can never obtain a 100% guarantee of correctness of any mathematical statement, since we learned from Gödel that the consistency of any axiomatic system cannot be proved within that system, and therefore we better also try to keep the axiomatic system as simple, small and intuitive as possible (we could see all this as the Bruijn criterion variant for axiomatic systems). But nevertheless, any such implementation of a proof checker would give us the highest degree of certainty possible to check mathematical statements.

Software and Proof Checking

I would also like to remark that proof checking for programs can only give us a way to verify the correctness of programs. At least as important (to obtain correct programs) is the correct construction of programs. This is the focus of the work in the area of programming methodology. At the Eindhoven University of Technology for example, the techniques of E.W. Dijkstra are used to derive correct programs from their specification. Although there has been some minor work on formalizing these proof techniques and combining formal methods and program derivations (see for example [26]), cooperation is still minimal. Unfortunately both areas (proof checking/verification vs. construction/derivation) are merely advocates of their own approach, while a combination of both could give the best results.

If we go one step further back in the process of creating correct software, the success of any piece of software depends on the correctness of its specification. These first phases of software engineering (indicating user requirements/specifications) can also be adapted to comply with the methods of program derivation and formal proof checkers (note that we not only use the term ‘proof checker’ for mathematics, but also for the software variant: for checking algorithms/programs derivations). And since we can never obtain a 100% guarantee of correctness of software (it depends for example on the correctness of the specifications and the proof checker itself), model checking techniques can also be used as a verification method to improve reliability even further. Therefore I argue for an integrated approach, for only the combination of all of the mentioned methods can give us the highest reliability (i.e. highest chance of correctness of software). Such an integrated approach requires research and cooperation between the various branches representing the methods I mentioned before and ultimately incorporation in the software engineering process.

Mathematics and Proof Checking

Let’s go back to proof checking of mathematical statements. We mentioned the first goal of investigating and formalizing proof checking. As a next step (2) we can think of building proof assistants. Proof assistants should make it easier for us to prove mathematical theorems. Proof assistants not only check the proofs for us, but also help us in making the proofs: they are tools that are a combination of a proof development system and a proof checker. The notion of ‘helping’ or ‘assisting’ in making proofs might be considered vague. For complicated statements, we can think of tools that keep track of the context of the proof, of the remaining proof obligations and even fill in part of the proofs for us automatically. A good article about proof assistants using dependent type systems can be found in [8]. Also an interesting article on computer assisted mathematics (for computer algebra) is [7] with an abstract history of computations versus proofs in mathematics.

Then (3) we can think of building a standard library of proved mathematics. After a proof checker has confirmed the correctness of a given mathematical statement and its corresponding proof, they can be stored in a database. It can be accessible to everyone via the internet and even be used for previously mentioned automated proving methods by proof assistants. And although we can not see the quality of mathematical work as evident as the quality of physical products, this could be the long awaited ‘quality stamp’ for mathematics. There have already been attempts to build standard libraries of mathematics (see the Mizar project at http://www.mizar.org/ and the PRL project, see http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html, but they lack the formal basis that has to be provided by (1) and (2)). Barendregt and his group have formalized parts of algebra using the theorem prover COQ. This shows that it is possible to formalize large parts of mathematics, but the process itself of formalizing mathematics is too direct and informal and needs to be further developed. Many valuable experiences have come out of attempts on what are here called phase (2), (3) and (4), but for a successful result this is premature and we first have to start thoroughly at the beginning (1). Work in this direction was done in [44], where a syntax-driven derivation system is presented for a formal language of mathematics called Weak Type Theory. This is a start of a more rigorous approach to the translation of mathematical texts (statements and proofs).

We see the extension of proof assistants with more intelligent and sophisticated automated proving methods as the last and final phase (4) of future work. Part of the branch of automated proving are classical theorem proving methods (such as for example automated induction, etc.). New methods are from areas such as neural networks, fuzzy logic and genetic and DNA computing and in the future possibly even quantum computing.

I want to end these ideas by summarizing the steps that are lying ahead of us, in a new project.

The new project (for mathematics):
1 investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is best usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as many parts of mathematics as possible
2 build a proof assistant (probably based on some form of WTT and some form of TT)
3 build a standard library (archive) of proved mathematics
4 further develop automated proving techniques (to build in the proof assistant)

And similarly we can formulate the new project for computer systems:

The new project (for software construction):
1 formalize as much of program derivation checking as possible
2 build a programming assistant (environment) based on a suited (and preferably popular) programming language
3 build a standard library of reusable correct software (i.e. suitable for component based software engineering) and its specification
4 further develop automated proving and program derivation techniques

One of the most important questions, part of step (1), has so far in this conclusion been avoided: What to take for the basis of mathematics? This is one of the most difficult questions and as we have seen many great scientists have thought about this. There is currently no consensus of what is the best approach, and I am not in the position to give a well-argued opinion. The only thing I can say is that it seems that recently most people seem to favor type theory over category theory, relational calculi and also over set theory. Scott for example favors type theory over category theory in the introduction of [55]. H. P. Barendregt gives arguments for the use of type theory over set theory in [7], and we quote from [4, the second page of the preface]: “[People prefer the approach they are most familiar with.] However, those familiar with both type theory and axiomatic set theory recognize that in some ways the former provides a more natural vehicle than the latter for formalizing what mathematicians actually do”. On the contrary, on http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment on the choice for a foundational system, with advantages of set theory over type theory. Also, several new types of logic have been proposed, such as IF logic (see [37]) and several types of so-called ‘fuzzy logics’, but so far it seems they lack the precision, formalization and proofs to support claims that they can be used successfully as a foundation for mathematics. A thorough research of the alternatives will have to yield the best approach and will show which choice of foundational system is best usable in practice.

A final remark on the debate between type theory and axiomatic set theory as a foundational basis is that if there is a mapping from the axioms of (some form of) set theory in (some form of) type theory and vice versa, type theoretic expressions have their counterparts in set theory. It is interesting to investigate if among such mappings there is indeed a bijection. That would show the equivalence of both theories in expressive power, so that the debate can turn onto the question which theory is more intuitive and useful.

Some do not really believe in a successful formalization of mathematics but rather see the indeterminacies in mathematical representations and the undecidabilities in any formal system as the source of problem solving and creative power (see [87, page 174]). This standpoint was already mentioned in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist, die fertig gegeben und so eingestrichen werden kann” (“Against this it must be maintained that truth is not a minted coin that can be handed over ready-made and simply pocketed”).

I am aware of the limitations of this report. Many chapters are still informal. The theory of types in chapter 7 and Gödel's incompleteness theorem in chapter 8 are not completely covered, and certain subjects closer to logic (such as intuitionism) are treated very minimally. The only excuse I have is that it is simply not possible to study all the original works, such as the work of Frege in chapter 4, in such a short period of time. I hope to complete this work at a later stage, and include all theory in this report. It might also be worthwhile to extend (on both sides) the period covered in this report. Recently we have seen interesting new theories on category and type theory and even on the foundations of mathematics; looking at Chaitin's results on randomness, it seems that he went further where Gödel and Turing left off.

Finally I would like to remark that the ‘new project’, consisting of the four steps mentioned in this conclusion, is just my own view of the work that lies ahead of us. To end with a concluding remark by Alan Turing, from his paper on the Turing test: “We can only see a short distance ahead, but we can see plenty there that needs to be done”.

Mark Scheffer, August 2001

P.S. To those who wonder what the turtle and the elephant are doing on the cover of this report, I refer to the website http://zax.mine.nu/stage/.


Appendix A
Timeline and Images

Figure A.1: Luitzen Brouwer
Figure A.2: Georg Cantor
Figure A.3: Richard Dedekind
Figure A.4: Gottlob Frege
Figure A.5: Kurt Gödel
Figure A.6: David Hilbert
Figure A.7: John von Neumann
Figure A.8: Giuseppe Peano
Figure A.9: Henri Poincaré
Figure A.10: Willard Van Orman Quine
Figure A.11: Frank Plumpton Ramsey
Figure A.12: Bertrand Russell
Figure A.13: Alan Turing

Drawings by Soshichi Uchii, suchii@bun.kyoto-u.ac.jp. Photo Quine by Kelly Wise. Photo Ramsey due to Harcourt, Brace, Jovanovich.

J. The Lambda Calculus . [9] C. Hilbert. volume 2. [2] W.A. North-Holland Press. An introduction to mathematical logic and type theory: to truth through proof. Almering. Levy. Grundz¨ge der Theoretischen Logik. [5] J. Ackermann and D. TU/e Press. chapter 18. 12. Amsterdam. Andrews. Geuvers and J. 2 edition. Backer and P. Bar-Hillel A..M. Cohen. Berlin. 1986. 1973. 1993. Computational Models. 187 . 2001. Academic Press. [8] H. [6] H. Hooman. Symbolic Computation. First edition 1958. Hilbert’s basis theorem. Barendregt. 2000. University of Bialystok. 1928. Bloo.Bibliography [1] Y. u volume Band XXVII of Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen. Foundations of set theory. Inc.J. Academic press. Oxford Press. pages 1149-1240 of Handbook of Artificial Reasoning. volume 103. first edition. Proof-checking using Dependent Type Systems. Delftse Uitgevers Maatschappij. Manuscript originally started by H. 2001.H. 2000. Fraenkel and A. Barendregt and A. 2001. Association of Mizar Users. Springer-Verlag. [4] P. Geuvers. Electronic Communication of Mathematics and the Interaction of Computer Algebra Systems and Proof Assistants. Published in Journal of Formalised Mathematics. [7] H. Analyse. Elsevier Science Publishing Company.Its Syntax and Semantics. Barendregt and H. 1984. [3] J. Rudnicki. J.

1971. de Swart. . 65. Mathematics. Doets D. Introduction to the Theory of Sets. Set Theory and the Continuum Hypothesis. An unsolvable problem in elementary number theory. 1997. Benjamin. Mathematical Circles Revisited. 1878.188 BIBLIOGRAPHY [10] J.C. 1978. Math. Seanet. New York. Methuen Press. New York. Einstein.S. Foundations and fundamental concepts of mathematics. Brown. The Undecidable: Basic Papers on Undecidable Propositions. Item in Stanford Encyclopedia of Philosophy. Berlin. reine und angew. [22] H. [16] B. [11] Encyclopedia Brittanica. Church. Dover publications inc.. [13] G. 1936. [21] A. Prentice-Hall. Dauben. Breuer. Georg Cantor. Axiomatic and Applied. Pergamon Press.. Mineola. Davis. [15] P. P. [17] H. 2000. EB. Gesammelte Abhandlungen. Translated in ‘Contributions to the foundation of the theory of transfinite numbers (translation from German’. The Church-Turing Thesis. Eves. August 1958. Eves. 1997.W. 1966. [18] J. 1965. Harvard University Press. volume 58.J. Springer-Verlag. Cohen. American journal of Mathematics. Jourdain. 1908. by Philip E. Ein beitrag zur mannigfaltigkeitslehre.. 1952. 84. Mathematische Annalen. Sets: Naive. Springer-Verlag. His Mathematics and Philosophy of the Infinite. Dover Publishing. third edition edition. [20] Diverse. Boston Press. 1991. Copeland. Unsolvable Problems and Computable Functions. Raven Press. Journal f. [12] K. [19] M. Cantor. 1979. [23] H.J. [14] A. van Dalen and H. Relativity: the special and general theory. 1970. pages 119-133. Bernays. London.

[33] J. 3 edition.R. Abstract Set Theory. 1993.. [36] H. Van Nostrand Press. Einleitung in die Mengenlehre. [34] P. volume 6. Fraenkel. 1992. Doctoral thesis. 2000. [28] D. The Search for Mathematical Roots 1870-1940. I1.BIBLIOGRAPHY 189 [24] A. First edition in 1953. English translation ‘The Phemenology of Mind’ by J. Grattan-Guinness. Reprint: Meiner. 1996. Duckworth & Co. a guided independant study. Middlesex University Press. [29] I. Hegel. inc. Bar-llan University Press. In the Journal of the Bertrand Russell Archive. [32] I.. London. [30] I. A sideways look at Hilbert’s Twenty-three Problems of 1900. Fraenkel.F. Goldrei. First published in 1980 by G. Introduction of the Israel Mathematical Conference Proceedings. Halmos. Grattan-Guinness. third edition. 2000. Cocktail. McMaster University Library Press. [26] M. Eindhoven University Press. First published in 1962 by Basic Books. I.B. Haim. Schulz. Franssen. Classic set theory. 1966. page 58. 1807. North-Holland Press. English translation of G¨del’s original 1931 publication of the incompleteness o theorem. 2000. Grattan-Guinness. Chapman and Hall. Princeton University Press. Mathematische Logik. Amsterdam. London.W. Grattan-Guinness. Naive Set Theory. 1. Springer-Verlag. 1990. Dover publications. From the Calculus to Set theory 1630-1910. [27] K. Unknown. Hermes and H. G¨del. New York. a [35] G. 2000. In Encyklopedia Mathematische Wissenschaften. London. Baillie in 1910. New York. Princeton University Press. Hbg. [25] A.A. Ph¨nomenologie des Geistes. 1952. On formally undecidable propositions of Principia Mathemao tica and related systems. . [31] I. How did Russell write the principles of mathematics (1903). 1997. 1928.

Clark University. Kleene. 1997. 2000. 1936. Reprint 2001. [38] A. 1967. Encyclopedia Brittanica. Clark University. Hintikka. Newman. New York Press. Phoenix. Journal of the Interest Group in Pure and Applied Logic. To be published. Koendrink. [51] K. [50] J. Kunen. [42] D. 1996. 1980. The Principles of Mathematics Revisited. Stanford University Press. 1963. [46] E. A derivation system for a formal language of mathematics. 1997. 1996. Clark University. Worcester.clarku. D. 1990. Cambridge University Press. [48] S. EB.T. Kneebone. Kaplansky. 1997. http://alepho. Mathematical logic and the foundations of mathematics. Irvine. Kamareddine and R.190 BIBLIOGRAPHY [37] J. 1940. [44] F. Joyce. Lambda-definability and recursiveness. item on David Hilbert. Hodges. The Great Philosophers.D. [39] A. Cambridge. Worcester. July 2001. Mathematical Logic. Duke Mathematical Journal 2:340-353. 1990. 4 (2):195–213. van Nostrand Company. [43] F. Bertrand Arthur William Russell. . [49] G. The Mathematical Problems of David Hilbert.C. Laan.edu/ djoyce/hilbert/. Kasner and J. Joyce.C. Joyce. Nederpelt. [41] D. New York Publishing. [40] D. Mathematicians and the imagination. New York. 1997. Kamareddine and T. Solid Shape. Set theory: an introduction of independence proofs. Worcester. Hilbert’s 1900 Address. A list of Hilbert’s problems. [45] I. [47] S. Kleene. A reflection on russell’s ramified types and kripke’s hierarchy of truths. Turing.

[65] G. The Evolution of Type Theory in Logic and Mathematics. Applicazioni geometriche del calcolo infinitesimale. An introduction to formal languages and automata. [54] T. Calcolo differenziale e principii di calcolo integrale.BIBLIOGRAPHY 191 [52] T. Monk. [55] J. First published in 1997 by Phoenix. Peano. . Peano. [57] J. Technical Report 94-33. To infinity and beyond. TUE Computing Science Reports. 57(2/3):243–278. development and influence. Eindhoven University of Technology. Routledge. [64] E. [53] T. D. 2001. First published in 1958. A modern elaboration of the ramified theory of types. From Brouwer to Hilbert. Laan and R. [60] P. Heath and Company. [63] G. Laan. Laan. 1996. MacHale. Newman. 1998. Nagel and J. 1987.R.C. Cambridge e University Press.H. Linz. G¨del’s proof. 1999. 2000. Cambridge University Press. Rootledge Press. o 1986. The Great Philosophers. Moore. [66] G. Comic Sections. 1982. Nederpelt. Lucas. 1997. Turin Press. [56] P. [62] R. 1994. [61] E. Russell. Springer-Verlag. logic and their limitations. Set theory. Turin Press. [58] D. 1884. Boston Press. PhD thesis. Dublin. Scott. New York University Press. Oxford University Press.J. Zermelo’s axiom of choice: it’s origins. Introduction to higher order logic. Lambek and P. 1993. Studia Logica. 1990. Maor.P. [59] Mosch´ Machover. A formalization of the ramified type theory. the debate on the foundations of mathematics in the 1920s. R. 1887. Mancosu. 1996. The conceptual roots of mathematics.

pages 105-142. June 1927. [68] G.M. The foundations of mathematics. G¨ttingen Press. Revised edition of Norton. New York 1940. Published also in Acta Mathematica 30 (1906). Richard. Massachusetts. Russell. 1901. pages 295-296. Harvard University Press. Wittgenstein P. Bertrand Russell entry in Encyclopedia Brittanica.J. [72] W. Les principes de math´matiques et le probl`me des enseme e bles. Harvard University Press. 2 edition. Rubin and J. Publisher Birkhauser.C. June 1927. London: George Allen and Unwin. edited by Peano. Ramsey. New York: Simon and Schuster. 1888. 2000. P. 1905. [78] B.192 BIBLIOGRAPHY [67] G. From a Logical Point of View: 9 LogicoPhilisophical Essays. Van Orman Quine.W. Uber die Hypothesen. Translation in German ‘Geometric Calculus : According to the Ausdehnungslehre of H. [70] W. Grassmann e precedutto dalle operazioni della logica deduttiva. Mathematical Logic. [74] J. 1951. 1988. Torino. [75] B. Harvard University Press. 16. Sullivan. Cambridge. Rose. Logica matematica. Amsterdam. In Ri(e)vista di mathematica. Grassmann’ by Lloyd Kannenberg. My philosophical development. Parte prima. EB. Peano. Unknown. Set Theory and its Logic. . Dizionario di matematica. Reprinted by F. 1854. [77] H. Raleigh NC.E. Fratelli Bocca. Van Orman Quine. Mathematical Maxims and Minims. [71] W. [69] L. o [76] N. Cambridge. 1963. Revue gnrale des sciences pures et appliques. Peano.J. Equivalents of the axiom of choice. november 1999. NorthHolland Press. welche der Geometrie zu grunde liegen. Calcolo geometrico secundo lAusdehnungslehre di H. Unknown. Massachusetts. Rubin. 1961. 1959. 1963. Riemann. [73] R. Van Orman Quine. Theoria 61 (2) (1995).

Routledge. Turing. Sobocinski. Oslo. 1991. 220-228. First published in 1997 by Phoenix. by A. Tiles. [83] A. .M. 1922. Rome Press. edited by Jens E. 2000. 308-316. January 2002. New York: The Macmillan Company. Helsinki. Skolem. Reprinted with some annotations in ‘The Undecidable: Basic Papers on Undecidable Propositions. 1936. [82] M. Reprinted in ‘Selected Works in Logic’. ed. 217-232. 1965. The Great Philosophers. Skolem. SpringerVerlag. pages. pages 94-107. Boston.T. Nederpelt”. Sipser. London: George Allen and Unwin. Titchmarsh. [89] A. Mathematical Maxims and Minims. Introduction to Mathematical Philosophy. Kamareddine T. Martin Davis. Mathematics and the image of reason. The autobiography of Bertrand Russell. Vol. Smullyan. On computable numbers. London Mathematical Society. 1997. Publisher Universitetsforlaget. Unknown. Akademiska Bokhandeln. lecture notes in mathematics. 1950. 1988. volume 8. [88] E.C. Fenstad.43 (1937) pages 544 to 546. With corrections from Proceedings of the London Mathematical Society. Einige bemerkungen zur axiomatischen begr¨ndung der u mengenlehre. Methodus II. [87] M. Unsolvable Problems and Computable Functions’.BIBLIOGRAPHY 193 [79] B. [80] B. In ‘Matematikerkongressen i Helsingfors 4-7 juli 1922. Russell. volume 42. 1982. L’analyse de l’antinomie Russellienne par Lesniewski. Raven Press. Series 2. Russell. To be published. 1992. pages 230-265 of 2. Oxford Logic [85] B.T. [81] S. Proper forcing. [84] R. PWS Publishing Company. pages 237-257. G¨del’s incompleteness theorems. with an application to the Entscheidungsproblem. Bulletin of Symbolic Logic. Methodus I. Laan and R. o Guides. Shelah. 1999. [86] F. Oxford University Press. Types in Logic and Mathematics before 1940. Den femte skandinaviska matematikerkongressen’.M. Routledge. 1970. New York. Introduction to the theory of computation.

Oxford University Press. Encyclopedia Brittanica. An introduction to Mathematics. Springer-Verlag. pages 70-80. New foundations for Mathematical Logic.. Whitehead.194 BIBLIOGRAPHY [90] A. Zur Einfurung der transfiniten Zahlen. ¨ I. Berlin. Item on Turing. [100] A. New York. In Mathematische Annalen 65. volume 13. National Physical Labatory. London. and Michie. [97] J. Turing. The Mathematical Intelligencer. Weiner. Past Masters. [93] J. Intelligent Machinery. Item on Principia Mathematica. 1908. P. B. National Physical Labatory Report in ‘Machine Intelligence 5’ by Meltzer. Encyclopedia Brittanica. Cornell. 1999. February 1937. van Orman Quine. 3]. von Neumann. [101] E. A treatise on universal algebra. Springer-Verlag. 1967. Weiner. 1991. 1908. [99] A.M. From Frege to G¨del: source book in mathematical o logic 1879-1931. 1911. Frege in Perspective. EB. 44(2). [98] J. [91] Unknown. 2000. 1:199-208 [I. 1948. 1960. EB. Zermelo. . 1969. [92] Unknown. Frege. pages 261-281. Edinburgh University Press. 2000. Whitehead. [94] W. Harvard University Press. Acta Szeged. The American Monthly. Williams and Norgate. Untersuchungen uber die Grundlagen der Mengenlehre. [96] J. [95] Various. van Heijenoort. 1990. 1923.