
NOTES ON LOGIC

18.510, FALL 2015

HENRY COHN

Contents
1. Syntax
2. Semantics
3. Boolean algebra
4. Classification
5. Quantum logic
6. Formal proofs
7. First-order languages and syntax
8. First-order semantics
9. Examples of first-order languages
10. Completeness and compactness
11. Consequences of compactness
12. Downwards Löwenheim–Skolem
13. Filters and ultrafilters
14. Voting
15. Ultraproducts
16. Formal proofs in first-order logic
17. Introduction to incompleteness
18. Intuitive Background
19. Proofs of the Incompleteness Theorems
20. Constructing a Gödel sentence

1. Syntax
The propositional calculus is the mathematical system that deals with logical
relationships between propositions, i.e., between assertions. The word calculus
simply means a method of calculating. It is by no means limited to the differential
or integral calculus, although those are certainly the most famous examples.
The propositional calculus involves manipulating strings of symbols. Some of the
symbols will be variables, which stand for propositions; then we can combine them
with the logical symbols ¬ (not), ∧ (and), ∨ (or), → (implies), and ↔ (is equivalent
to). For example, (p ∨ ¬p) means that p is true or p is not true.
This framework is quite limited, because it does not allow us to create interesting
statements from scratch. Using the propositional calculus, we can make assertions
like (p ∧ q), but we cannot explain what p and q actually mean, so we are limited
to talking about pre-existing statements. Thus, to do serious mathematics we will
eventually need a more powerful framework, but the propositional calculus is an
important part of more general systems.
We begin by formalizing the intuitive idea of a string of symbols. It's convenient
to formulate it in terms of the von Neumann definition of the natural numbers, but
that is of course not essential.
Definition 1.1. Given a set Σ, a string with alphabet Σ (or string over Σ) is a
function from a natural number to Σ. We write Σ* for the set of all strings over Σ,
and say that a function from n to Σ has length n.
Given a function σ : n → Σ, we typically write the string as σ(0)σ(1) . . . σ(n − 1).
For example, if Σ = {0, 1}, then 01001 denotes the function σ : 5 → Σ defined
by σ(0) = 0, σ(1) = 1, σ(2) = 0, σ(3) = 0, and σ(4) = 1. However, this is just
shorthand for the formal definition. We also typically denote the symbols in a
string with subscripts rather than parentheses: given a string σ, we write it as
σ0 σ1 . . . σn−1 rather than σ(0)σ(1) . . . σ(n − 1). We will not use a special notation
for the empty string (the unique string with zero length), because we will not talk
about it often enough to justify reserving a letter for it.
Definition 1.2. If σ and τ are strings of lengths m and n, respectively, then their
concatenation στ is the string
σ0 σ1 . . . σm−1 τ0 τ1 . . . τn−1
of length m + n.
Note that we have given up on carefully formulating basic concepts axiomatically
in terms of set theory. Hopefully the previous examples have made it clear that
one can do that, and at this point there is little insight to be gained by defining
concatenation inductively to avoid writing the ellipses.
We identify the symbols in Σ with the strings of length one. For example, if σ
is a string and s is a symbol, then sσ means the concatenation of the string s of
length one with the string σ.
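To make the formal definition concrete, here is a small illustration (a sketch in Python, not part of the notes; all names are my own) of a string over Σ = {0, 1} as a function from a natural number, with concatenation as in Definition 1.2.

# A string over an alphabet is a function from {0, 1, ..., n-1} to the alphabet.
# We model such a function as a Python dict; the names are purely illustrative.

def make_string(symbols):
    """Build the function i -> symbols[i], represented as a dict."""
    return {i: s for i, s in enumerate(symbols)}

def length(sigma):
    return len(sigma)

def concatenate(sigma, tau):
    """The concatenation sends i to sigma(i) for i < m and to tau(i - m) for i >= m."""
    m = length(sigma)
    return {i: (sigma[i] if i < m else tau[i - m]) for i in range(m + length(tau))}

sigma = make_string("01001")          # the string 01001 over {0, 1}
tau = make_string("11")
assert concatenate(sigma, tau) == make_string("0100111")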
The alphabet of the propositional calculus consists of a countably infinite set
of variables together with nine other symbols (distinct from any of the variables),
namely ¬, ∧, ∨, →, ↔, (, ), ⊤, and ⊥. The logical connectives were explained
above, while ⊤ denotes truth and ⊥ falsehood (truth upside down).
We will use lowercase letters to denote variables, but because the English language
is equipped with only a finite alphabet, we must allow some sort of decoration to
distinguish extra variables. For example, if we run out of letters, we can generate
more variables by appending primes: p, p′, p′′, etc. will all be distinct variables. We
treat these as representing indivisible symbols (i.e., the prime is meaningless by
itself and cannot be used in isolation). Thus, we have a countably infinite set of
variable symbols.1
As a general convention, we will use Roman letters to denote variables and Greek
letters to denote strings.
For the rest of this section, let Σ denote the alphabet of the propositional calculus.
We begin by giving an inductive definition of which strings are syntactically correct.

1Alternatively, we could have used a finite set of symbols, including the prime. That would
avoid the infinite alphabet, at the cost of slightly wordier rules about variable naming. Which
approach to take makes little difference mathematically and is a matter of taste.
Definition 1.3. A subset S of Σ* is admissible if
(1) S contains all the variables in Σ, as well as the symbols ⊤ and ⊥,
(2) if S contains φ, then it also contains ¬φ, and
(3) if S contains φ and ψ, then it also contains (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), and
(φ ↔ ψ).
A well-formed formula (or wff ) is a string that is an element of all admissible subsets
of Σ*.
The word wff is pronounced like woof, or sometimes wiff.
We can prove that a string is a wff by showing how to derive it from (1) through
(3) above. For example, since p and q are variables, they are both wffs by (1)
(because they must be in every admissible set), and ¬q is then a wff by (2), so
(p ∧ ¬q) is a wff by (3). It is easy to build up much more elaborate wffs, such as
((p ∧ ¬q) ∨ (r′ → s)).
To prove that something is not a wff, we must take the opposite approach of
finding an admissible set that does not contain it. For example, (p ∧ q is not a wff,
because the parentheses are unbalanced. Specifically, let S be the set of strings in
Σ* with equal numbers of left and right parentheses. This is an admissible set, since
it satisfies (1) through (3), but it does not contain (p ∧ q, so that string is not a wff.
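This kind of check is easy to mechanize. The sketch below (Python, with an ASCII stand-in alphabet of my own choosing, not the notation of the notes) decides whether a string is a wff by attempting to read it according to rules (1) through (3); unique readability, proved next, is what guarantees that such a parse is unambiguous.

VARIABLES = set("pqrs")             # a finite stand-in for the countably many variables
CONSTANTS = {"T", "F"}              # ASCII stand-ins for ⊤ and ⊥
CONNECTIVES = {"&", "|", ">", "="}  # stand-ins for ∧, ∨, →, ↔

def parse(s, i=0):
    """Try to read one wff starting at position i; return the position after it, or None."""
    if i >= len(s):
        return None
    c = s[i]
    if c in VARIABLES or c in CONSTANTS:
        return i + 1
    if c == "~":                          # stand-in for ¬
        return parse(s, i + 1)
    if c == "(":
        j = parse(s, i + 1)               # left subformula
        if j is None or j >= len(s) or s[j] not in CONNECTIVES:
            return None
        k = parse(s, j + 1)               # right subformula
        if k is None or k >= len(s) or s[k] != ")":
            return None
        return k + 1
    return None

def is_wff(s):
    return parse(s) == len(s)

assert is_wff("(p&~q)")
assert not is_wff("(p&q")    # unbalanced parentheses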
In fact, it takes no real ingenuity to determine whether a given string is a wff:
either the string is obviously a wff or there is something obvious that is wrong with
it. We will not formalize or prove this statement here, but we will prove one closely
related lemma, called unique readability. It says that every wff can be built up
from (1) through (3) in a unique way, so we needn't worry about ambiguity (for
example, that a single wff might simultaneously be the conjunction of two wffs and
the disjunction of two different wffs):
Lemma 1.4. Every wff φ satisfies exactly one of the following:
(1) φ is a variable, ⊤, or ⊥,
(2) there exists a unique wff ψ such that φ = ¬ψ,
(3) there exist unique wffs ψ1 and ψ2 such that φ = (ψ1 ∧ ψ2),
(4) there exist unique wffs ψ1 and ψ2 such that φ = (ψ1 ∨ ψ2),
(5) there exist unique wffs ψ1 and ψ2 such that φ = (ψ1 → ψ2), or
(6) there exist unique wffs ψ1 and ψ2 such that φ = (ψ1 ↔ ψ2).
To prove this lemma, we will use the idea of a prefix of a string: σ is a prefix of
τ if τ = σρ for some string ρ. This allows σ = τ, with ρ the empty string; we say
σ is a proper prefix of τ if it is a prefix and σ ≠ τ.
Proof. We begin by showing that no wff can be a proper prefix of another wff.
Every proper prefix of a wff is either a string of ¬ symbols or has strictly more left
parentheses than right parentheses, because the set of strings whose proper prefixes
are of these types is admissible. However, no wff can consist entirely of ¬ symbols
or have unbalanced parentheses.
Now we turn to the proof of the lemma. The only interesting part is cases (3)
through (6). We must prove that there is no overlap among these cases, and that
ψ1 and ψ2 are uniquely determined in each case. To do so, let ∗ and ⋆ be symbols
chosen from ∧, ∨, →, or ↔, and suppose we have a wff that is of the form (ψ1 ∗ ψ2)
and (ψ1′ ⋆ ψ2′), where ψ1, ψ2, ψ1′, and ψ2′ are wffs.
Because (ψ1 ∗ ψ2) = (ψ1′ ⋆ ψ2′), one of ψ1 and ψ1′ must be a prefix of the other,
and therefore ψ1 = ψ1′. Then (ψ1 ∗ ψ2) = (ψ1′ ⋆ ψ2′) implies ∗ = ⋆ and ψ2 = ψ2′ as
well. □

The only reason why unique readability holds is that we have fully parenthesized
everything. If we allowed strings like p ∧ q ∨ r, it would be ambiguous whether the
string was built from p and q ∨ r or from p ∧ q and r. In practice, writing many
parentheses can be tedious, and it is often convenient to omit some of them, but the
resulting formula should be viewed as shorthand for the real, fully parenthesized
wff.

2. Semantics
We have specified the syntax of the propositional calculus, i.e., the grammar
that specifies which formulas are well formed. Now we turn to the semantics, the
meaning of the formulas.
The symbol ⊤ stands for truth, and ⊥ for falsehood. If we substitute ⊤ and ⊥
for the variables in a wff, then the following table shows how to assign a truth value
of ⊤ or ⊥ inductively to the entire wff:

φ  ψ   ¬φ  (φ ∧ ψ)  (φ ∨ ψ)  (φ → ψ)  (φ ↔ ψ)
⊤  ⊤   ⊥      ⊤        ⊤        ⊤        ⊤
⊤  ⊥   ⊥      ⊥        ⊤        ⊥        ⊥
⊥  ⊤   ⊤      ⊥        ⊤        ⊤        ⊥
⊥  ⊥   ⊤      ⊥        ⊥        ⊤        ⊤
Under these rules, each wff involving variables x1, . . . , xn determines a function
from the variable assignments {⊤, ⊥}^n to {⊤, ⊥}. Note that this process depends
crucially on Lemma 1.4, since unique readability guarantees that we cannot evaluate
a wff in two different ways.
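The inductive evaluation described by the table is short to implement. The following sketch (Python; the representation of wffs as nested tuples is my own convention, not anything from the notes) evaluates a wff under a given assignment of truth values to its variables.

# A wff is represented as: a variable name (string), True/False for ⊤/⊥,
# ("not", φ), or (op, φ, ψ) with op in {"and", "or", "implies", "iff"}.

def evaluate(wff, assignment):
    if isinstance(wff, bool):
        return wff
    if isinstance(wff, str):
        return assignment[wff]
    if wff[0] == "not":
        return not evaluate(wff[1], assignment)
    op, left, right = wff
    a, b = evaluate(left, assignment), evaluate(right, assignment)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "implies":
        return (not a) or b
    if op == "iff":
        return a == b
    raise ValueError(f"unknown connective {op!r}")

# (p → q) is true whenever p is false, regardless of q: material implication.
assert evaluate(("implies", "p", "q"), {"p": False, "q": False})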
In the table shown above, each symbol's behavior corresponds to its name (not,
and, or, implication, and equivalence). Two cases are worthy of note. One is that
∨ always denotes inclusive or, so (φ ∨ ψ) is true if φ is true, ψ is true, or both are
true. The other is that → denotes material implication. In other words, (φ → ψ)
depends only on the truth values of φ and ψ, and does not require any conceptual
relationship between the statements. It simply means that if φ is true, then ψ is
true; equivalently, ψ is true or φ is not true.
For example, if p stands for 1 + 1 = 2 and q stands for Fermat's last theorem, then
(p → q) is true, because p and q are both true, even though there is no obvious way
to deduce the truth of Fermat's last theorem from 1 + 1 = 2. Material implication
does not require such a deduction. Part of its beauty is that we can deal with
implications without ever having to formalize the tricky intuitive notion of what it
means for q to be a consequence of p; instead, all we need is the property that if p
and (p → q) are both true, then q is true. Material implication has this property.
Similarly, a false statement vacuously implies everything: (⊥ → p) is true
regardless of whether p is true or false. This can be slightly counterintuitive, since
it disagrees with the usual English interpretation of the word if. Suppose it isn't
raining, and someone says "if it were raining, I would take my umbrella." This
person is presumably imagining a counterfactual scenario and reasoning about how
that imaginary world differs from the real world. It's much less plausible that
someone would say "if it were raining, then it wouldn't be raining," even though
material implication allows you to deduce anything from a falsehood. (The use of
the subjunctive suggests counterfactual reasoning. The sentence if it is raining,
then it is not raining avoids that issue, at the cost of sounding even weirder.)
Counterfactual reasoning is much richer and deeper than material implication, too
much so to be captured by the propositional calculus. Instead, material implication
depends only on truth values. We know a false statement can sometimes imply a
true statement (if −1 = 1, then (−1)^2 = 1^2) and can sometimes imply another false
statement (if −1 = 1, then −1 + 1 = 1 + 1). If we have just truth values to work with,
then we can capture this behavior only by deciding that (⊥ → p) is always true,
regardless of whether p is true. One might fear that this is an oversimplification
compared with counterfactual reasoning, and indeed it is, but it is not a harmful
oversimplification, and material implication provides a perfectly good foundation
for mathematics. It has the property that if you know p and (p → q), then you can
deduce q, and that is good enough.
The logical symbols used in the propositional calculus are redundant, because we
can express some of them in terms of the others. We could replace (φ ↔ ψ) with
((φ → ψ) ∧ (ψ → φ)), and we could replace (φ → ψ) with (¬φ ∨ ψ). We could even
eliminate one of ∧ and ∨ by expressing it in terms of the other, i.e., replacing (φ ∨ ψ)
with ¬(¬φ ∧ ¬ψ) or replacing (φ ∧ ψ) with ¬(¬φ ∨ ¬ψ). These sorts of reductions
are sometimes convenient, but they make long formulas much harder to read, so we
will cheerfully use a redundant system.
Definition 2.1. A wff φ in the propositional calculus is satisfiable if there exist
truth values for its variables such that φ evaluates to ⊤. It is a tautology if it always
evaluates to ⊤.
For example, (p ∨ ¬p) is a tautology, as is (q → (p → q)). The wff (p ∧ q) is
satisfiable but not a tautology, and (p ∧ ¬p) is not even satisfiable. Note that φ is a
tautology if and only if ¬φ is not satisfiable.
It is generally easy to tell whether a short wff is a tautology. However, it can be
a little counterintuitive. For example,
(((p ∧ q) → r) → ((p → r) ∨ (q → r)))
is a tautology, because both ((p ∧ q) → r) and ((p → r) ∨ (q → r)) are equivalent to
((r ∨ ¬p) ∨ ¬q). At first, this seems plainly absurd: it seems to be saying that if two
hypotheses together imply a conclusion, then only one of the hypotheses is actually
needed to reach the conclusion. However, that reformulation is misleading, because
it is subtly changing the problem. When people think about implication, they often
allow an implicit universal quantifier to slip in: they imagine that propositions p
and q depend on some circumstances x, and they think of (p → q) as meaning
∀x (p(x) → q(x)) (which is of course not a wff in the propositional calculus). This is
not what the propositional calculus studies; it deals with single, isolated propositions,
not families of them depending on other variables. That distinction clears up the
intuitive problem:
∀x ((p(x) ∧ q(x)) → r(x))
is not equivalent to
(∀x (p(x) → r(x))) ∨ (∀x (q(x) → r(x))),
but it is equivalent to
∀x ((p(x) → r(x)) ∨ (q(x) → r(x))).
The problem isn't that
(((p ∧ q) → r) → ((p → r) ∨ (q → r)))
is not a tautology. Instead, it is that you can't distribute ∀ over ∨.
As this example shows, it is not always obvious at first glance whether a wff
is a tautology. It is easy to test it by brute force, by substituting all possible
combinations of truth values for the variables, but that is extremely time-consuming,
because there are 2^n possibilities for n variables. For example, in practice it is
impossible to check all 2^100 cases for 100 variables.
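For small formulas the brute-force test is easy to carry out. The sketch below (Python, reusing the tuple representation and the evaluate function from the earlier sketch, both my own conventions) enumerates all 2^n assignments.

from itertools import product

def variables_of(wff):
    """Collect the variable names appearing in a wff (tuples as in the earlier sketch)."""
    if isinstance(wff, bool):
        return set()
    if isinstance(wff, str):
        return {wff}
    return set().union(*(variables_of(part) for part in wff[1:]))

def is_tautology(wff):
    vs = sorted(variables_of(wff))
    return all(evaluate(wff, dict(zip(vs, values)))
               for values in product([True, False], repeat=len(vs)))

# (((p ∧ q) → r) → ((p → r) ∨ (q → r))) is a tautology, as discussed above.
lhs = ("implies", ("and", "p", "q"), "r")
rhs = ("or", ("implies", "p", "r"), ("implies", "q", "r"))
assert is_tautology(("implies", lhs, rhs))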
In fact, one of the deepest open problems in theoretical computer science, namely
whether P = NP, asks whether one can do better: P = NP if and only if there is
a polynomial-time algorithm for testing whether a wff is a tautology. The time
bound means the number of steps required to test a wff of length n is bounded by
a polynomial in n. Of course, such an algorithm could not check all the variable
assignments, so it would have to be based on a more efficient reformulation of what
it means to be a tautology. It is widely believed that P ≠ NP, and in fact that no
tautology-testing algorithm runs in subexponential time, but no proof is known.

3. Boolean algebra
Classical logic is based on the law of the excluded middle: a proposition that is
not true must be false, with no third possibility. However, it is sometimes convenient
to extend the notion of logic to other truth values. For example, we could use a
three-valued logic, with a truth value ? (meaning maybe) in addition to ⊤ and
⊥. There's a natural way to extend ¬, ∧, ∨, →, and ↔ to this setting, based on
interpreting ? as a state of ignorance about whether the truth value is ⊤ or ⊥:
φ  ψ   ¬φ  (φ ∧ ψ)  (φ ∨ ψ)  (φ → ψ)  (φ ↔ ψ)
⊤  ⊤   ⊥      ⊤        ⊤        ⊤        ⊤
⊤  ?   ⊥      ?        ⊤        ?        ?
⊤  ⊥   ⊥      ⊥        ⊤        ⊥        ⊥
?  ⊤   ?      ?        ⊤        ⊤        ?
?  ?   ?      ?        ?        ?        ?
?  ⊥   ?      ⊥        ?        ?        ?
⊥  ⊤   ⊤      ⊥        ⊤        ⊤        ⊥
⊥  ?   ⊤      ⊥        ?        ⊤        ?
⊥  ⊥   ⊤      ⊥        ⊥        ⊤        ⊤
However, this system has some unfortunate properties. For example, we would like
(p ↔ p) to be a tautology, because every proposition should be equivalent to itself,
but in the three-valued logic described above, it is not a tautology. The problem is
that (? ↔ ?) evaluates to ?, since it has no way of knowing whether the two question
marks describe the same uncertain proposition or different ones.
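To see the failure concretely, here is a sketch (Python, my own encoding with None playing the role of ?) of a few of the three-valued connectives from the table, treating ? as ignorance about the classical value.

# Truth values: True (⊤), False (⊥), None (?). Each connective returns None whenever
# the classical result would depend on how the unknown inputs are resolved.

def and3(a, b):
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return None

def implies3(a, b):
    if a is False or b is True:
        return True
    if a is True and b is False:
        return False
    return None

def iff3(a, b):
    if a is None or b is None:
        return None
    return a == b

assert and3(True, None) is None
assert implies3(False, None) is True
# (? ↔ ?) is ?, so (p ↔ p) fails to be a tautology in this three-valued logic.
assert iff3(None, None) is None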
In this section, we will develop the concept of a Boolean algebra, which is arguably
the best-behaved way to extend two-valued logic. The three-valued logic described
above is not a Boolean algebra. We will stick with classical two-valued logic in the
rest of this course, but before we proceed it's worth putting it in a larger context.
We will build up to Boolean algebras in several steps. To begin, we will have
a partially ordered set B of truth values. The ordering relation p ≤ q means q is
at least as true as p; for example, this may hold if q is true on weekends and p is
true on Saturdays. The reason it is only a partial ordering is that there may be
Figure 3.1. A poset that is not a lattice (diagram omitted)

incomparable truth values, for example things that are true on Mondays and things
that are true on Tuesdays.
In principle, we could use any poset of truth values, but it is far from clear how
to define the operations ¬, ∧, ∨, →, and ↔ in an arbitrary poset. Thus, we will
impose additional structure on B. We will define → and ↔ in terms of the other
three operations, just as in the two-valued case, but that still leaves three operations
to go.
Definition 3.1. A lattice is a poset in which every pair of elements p and q has a
least upper bound p ∨ q and a greatest lower bound p ∧ q.
In a lattice, p ∨ q is read "p join q," and p ∧ q is read "p meet q." Recall that the
least upper bound property means
(1) p ∨ q ≥ p and p ∨ q ≥ q, and
(2) for all r such that r ≥ p and r ≥ q,
r ≥ p ∨ q.
I.e., it is an upper bound, and it is the smallest of the upper bounds. The greatest
lower bound has exactly the same property, but with all the inequalities reversed.
Two elements p and q in a poset have at most one least upper bound: given two
of them, each would have to be less than or equal to the other. Similarly, they have
at most one greatest lower bound. However, they needn't have either a least upper
bound or a greatest lower bound. For example, in a poset consisting of just two
incomparable elements, those elements have no upper or lower bounds at all. Thus,
not every poset is a lattice. For a less trivial example, see Figure 3.1. In this poset,
every pair of elements has upper and lower bounds, but not always a least upper
bound and a greatest lower bound.
For an example of a poset that is a lattice, let S be a set, and consider P(S)
ordered by ⊆. The least upper bound is then the union, and the greatest lower
bound is the intersection.
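The power-set lattice is easy to experiment with. The sketch below (Python; the example set and names are my own) uses frozensets ordered by inclusion, with union as join and intersection as meet, and spot-checks the complement and distributive laws that appear later in this section.

from itertools import chain, combinations

S = frozenset({"Mon", "Sat", "Sun"})

def power_set(s):
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))]

join = frozenset.union           # least upper bound in P(S)
meet = frozenset.intersection    # greatest lower bound in P(S)
complement = lambda p: S - p     # the complement making P(S) a Boolean algebra

B = power_set(S)
assert all(join(p, complement(p)) == S and meet(p, complement(p)) == frozenset() for p in B)
# Spot-check the distributive law p ∧ (q ∨ r) = (p ∧ q) ∨ (p ∧ r) on all triples.
assert all(meet(p, join(q, r)) == join(meet(p, q), meet(p, r))
           for p in B for q in B for r in B)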
It is not difficult to check that if p1, . . . , pk are elements of a lattice, then
p1 ∨ · · · ∨ pk is their least upper bound, and p1 ∧ · · · ∧ pk is their greatest lower
bound. (No parentheses are needed, because these operations are associative.) In
particular, every finite lattice contains elements 0 and 1 such that 0 ≤ x ≤ 1 for all
x in the lattice. However, that is not necessarily true in infinite lattices.
The lattice perspective on ∨ and ∧ fits naturally with the idea of truth values:
p or q should be at least as true as p and at least as true as q, but no truer than
this forces it to be, so we take it to be the least upper bound p ∨ q of p and q, and
Figure 3.2. A complemented lattice in which complements are not unique (diagram omitted)

we deal similarly with and. Thus, we will assume that our poset B of truth values
is a lattice. However, we have not yet seen how to deal with ¬.

Definition 3.2. A complemented lattice is a lattice L such that
(1) there exist elements 0, 1 ∈ L such that 0 ≤ p ≤ 1 for all p ∈ L, and
(2) for every p ∈ L, there exists some q ∈ L such that p ∨ q = 1 and p ∧ q = 0.
Such a q is called a complement of p.

In a complemented lattice, 0 and 1 behave like ⊥ and ⊤, and q behaves like ¬p.
The lattice P(S) is complemented: 0 = ∅, 1 = S, and the complement of p is S \ p.
However, not every lattice is complemented. For example, every totally ordered
set is a lattice, with p ∨ q = max(p, q) and p ∧ q = min(p, q). Even if there are
greatest and least elements 1 and 0, no other element p can have a complement,
since max(p, q) = 1 and min(p, q) = 0 imply that {p, q} = {0, 1}. Thus, totally
ordered sets with more than two elements are never complemented lattices.
Complemented lattices almost allow us to define ¬, but there is one major issue:
an element may have several complements. (By contrast, 0 and 1 are uniquely
determined by property (1) in the definition.) For example, in Figure 3.2, each
of the three incomparable elements in the middle is a complement of each of the
others.

Definition 3.3. A distributive lattice is a lattice in which p ∧ (q ∨ r) = (p ∧ q) ∨ (p ∧ r)
and p ∨ (q ∧ r) = (p ∨ q) ∧ (p ∨ r) for all p, q, and r.

In other words, each of ∧ and ∨ distributes over the other, just as in classical
two-valued logic. The lattice P(S) is always distributive, but the lattice drawn
above is not. If we let p, q, and r be the elements in the middle row (and let 0 and
1 be the least and greatest elements), then

p ∧ (q ∨ r) = p ∧ 1 = p,

but

(p ∧ q) ∨ (p ∧ r) = 0 ∨ 0 = 0.

Lemma 3.4. In a complemented, distributive lattice, the complement of each
element is unique.
Proof. Suppose q and q′ are complements of p in a complemented, distributive
lattice. Then
q = q ∧ 1
  = q ∧ (p ∨ q′)
  = (q ∧ p) ∨ (q ∧ q′)
  = 0 ∨ (q ∧ q′)
  = q ∧ q′.
Similarly, q′ = q ∧ q′, and thus q = q′. □

In a complemented, distributive lattice, we can define ¬p to be the unique
complement of p. This gives a suitable setting for doing propositional logic.
Definition 3.5. A Boolean algebra is a complemented, distributive lattice.
The power set P(S) is an example of a Boolean algebra (ordered by ⊆). We can
think of S as the set of states the world can be in. The subsets of S represent
truth values by specifying which states of the world a given proposition is true in.
For example, if S is the set of days of the week, then the truth value of "It's the
weekend" is {Saturday, Sunday}.
We have defined Boolean algebras to be a special type of partially ordered set,
but one can also characterize them algebraically, in terms of identities satisfied by
∧, ∨, and ¬, as follows.
Proposition 3.6. Suppose B is a set with a unary operation ¬, binary operations
∧ and ∨, and specified elements 0 and 1. Define p ≤ q to mean p = p ∧ q. Then B
is a lattice under ≤ with meet ∧ and join ∨ if and only if it satisfies (1) through (3)
below, and it is a Boolean algebra with complement ¬ if and only if it satisfies (1)
through (5).
(1) Associativity: p ∨ (q ∨ r) = (p ∨ q) ∨ r and p ∧ (q ∧ r) = (p ∧ q) ∧ r for all
p, q, r ∈ B.
(2) Commutativity: p ∨ q = q ∨ p and p ∧ q = q ∧ p for all p, q ∈ B.
(3) Absorption: p ∧ (p ∨ q) = p and p ∨ (p ∧ q) = p for all p, q ∈ B.
(4) Complements: p ∧ ¬p = 0 and p ∨ ¬p = 1 for all p ∈ B.
(5) Distributivity: p ∧ (q ∨ r) = (p ∧ q) ∨ (p ∧ r) and p ∨ (q ∧ r) = (p ∨ q) ∧ (p ∨ r)
for all p, q, r ∈ B.
Proof. One direction is easy: it is straightforward to show that every lattice satisfies
(1) through (3) (for example, p ∨ (q ∨ r) = (p ∨ q) ∨ r is proved by showing that
both sides are the least upper bound of p, q, and r), while (4) and (5) are part of
the definition of a Boolean algebra.
For the other direction, we begin by showing that if (1) through (3) hold, then B
is partially ordered by ≤. To show that ≤ is reflexive, we must prove that p = p ∧ p
for all p. That follows from the two absorption laws: p = p ∨ (p ∧ p) implies
p ∧ p = p ∧ (p ∨ (p ∧ p)),
and the right side simplifies to p by absorption. For antisymmetry, p ≤ q means
p = p ∧ q, and q ≤ p means q = q ∧ p, so p = q follows from commutativity. Finally,
for transitivity, if p = p ∧ q and q = q ∧ r, then
p ∧ r = (p ∧ q) ∧ r
      = p ∧ (q ∧ r)
      = p ∧ q
      = p,
so p ≤ q and q ≤ r imply p ≤ r. Thus, ≤ is a partial ordering of B.
We can characterize p ≤ q not just in terms of ∧, but also ∨. Specifically, p = p ∧ q
if and only if q = p ∨ q, by the absorption and commutative laws: if p = p ∧ q, then
p ∨ q = (p ∧ q) ∨ q = q,
and vice versa.
Now we will verify that p ∨ q is the least upper bound of p and q in this poset, and
that p ∧ q is their greatest lower bound. We have p ≤ p ∨ q because p = p ∧ (p ∨ q)
by absorption, and q ≤ p ∨ q now follows by commutativity. Thus, p ∨ q is an upper
bound for p and q; to show that it is the least upper bound, we must prove that if
p = p ∧ r and q = q ∧ r, then p ∨ q = (p ∨ q) ∧ r. That would follow immediately
from the distributive law, but we would like to prove it using only (1) through (3).
To do so, we will use r = p ∨ r and r = q ∨ r, as justified by the previous paragraph.
Then
(p ∨ q) ∧ r = (p ∨ q) ∧ (p ∨ r)
            = (p ∨ q) ∧ (p ∨ (q ∨ r))
            = (p ∨ q) ∧ ((p ∨ q) ∨ r)
            = p ∨ q,
where the last equality follows from absorption. Thus, p ∨ q is the least upper
bound of p and q. The proof that p ∧ q is the greatest lower bound is identical, with
all inequalities reversed and with ∧ and ∨ interchanged. (The fact that ≤ can be
characterized in terms of either ∧ or ∨ is crucial for this symmetry.)
So far, we have shown that B is a lattice if (1) through (3) hold. All that remains
is to check that 0 ≤ p ≤ 1 for all p ∈ B, since (4) and (5) then amount to the rest of
the definition of a complemented, distributive lattice. We have p ∧ 1 = p ∧ (p ∨ ¬p) = p
by the complement and absorption laws, so p ≤ 1, and p ∨ 0 = p ∨ (p ∧ ¬p) = p, so
p ≥ 0. This completes the proof. □

These axioms highlight the duality symmetry of Boolean algebras: switching
∧ with ∨ and 0 with 1 still yields a Boolean algebra. Thus, all Boolean algebra
identities come in dual pairs. However, this symmetry can be understood equally
well from the poset perspective, in which it amounts to switching ≤ and ≥.

4. Classification
What are all the Boolean algebras? The obvious examples are the lattices P(S)
of all subsets of a given set S. In this section, we will show that there are no other
finite examples, up to isomorphism. An isomorphism of Boolean algebras is the
same as an isomorphism of the underlying poset. Equivalently, it is a bijective map
that preserves ∧, ∨, and ¬ (equivalence holds because the poset structure can be
defined in terms of these operations, and vice versa).
There exist infinite Boolean algebras that are not of this form. For example, take
any countably infinite subset S of an infinite Boolean algebra, and let B be the
closure of S under ∧, ∨, and ¬. Then B is a Boolean algebra (by Proposition 3.6).
However, B is countably infinite, and it is therefore not even in bijection with a
power set, let alone isomorphic to one as a poset.
The Stone representation theorem characterizes all Boolean algebras (finite or
infinite) in terms of topology: for every Boolean algebra B, there is a compact,
totally disconnected, Hausdorff topological space T such that B is isomorphic to
the poset of subsets of T that are both closed and open. In fact, one can recover T
from B as the space of homomorphisms from B to the two-element Boolean algebra.
However, that theorem is beyond the scope of these notes. Instead, we will focus on
the finite version.
Definition 4.1. An atom in a Boolean algebra is a nonzero element p such that
there exists no q satisfying 0 < q < p.
In other words, an atom is an element as close to the bottom as possible, without
actually being at the bottom. The dual notion is a coatom, but we will focus on
atoms.
For our purposes, the importance of atoms is that they recover the elements of
the underlying set S from P(S): an atom in P(S) is a nonempty subset of S that
has no nonempty proper subsets, and that means an atom must be a single-element
subset. Thus, if we want to show that a Boolean algebra is of the form P(S), then
the elements of S must correspond to the atoms in the Boolean algebra.
Lemma 4.2. If p is an atom in a Boolean algebra and p ≤ q ∨ r, then p ≤ q or
p ≤ r.
This is obvious for P(S) (if a single element is in the union of two sets, then it
must be in one or the other), but we must prove it for a general Boolean algebra.

Proof. Suppose p ≤ q ∨ r. Then
p = p ∧ (q ∨ r) = (p ∧ q) ∨ (p ∧ r).
Each of p ∧ q and p ∧ r is less than or equal to p, and thus equals either 0 or p. They
cannot both be 0, since then p = 0 ∨ 0 = 0. Thus, p ∧ q = p or p ∧ r = p, so p ≤ q
or p ≤ r, as desired. □

Lemma 4.3. If p and q are elements of a Boolean algebra such that p < q and
every r < q satisfies r ≤ p, then p = 0 and q is an atom.
In other words, p cannot be a bottleneck between q and the elements less than
q, except in the degenerate case in which p = 0. To make use of the hypothesis in
proving this lemma, we must choose an r. We would like to choose one that is at
most q and unlikely to be at most p, and taking r = q ∧ ¬p is a natural choice.

Proof. Let r = q ∧ ¬p. Then r ≤ q, so either r = q or r ≤ p.
In the first case, q ∧ ¬p = q, and then combining this with p = p ∧ q (from p < q)
yields
0 = q ∧ 0 = q ∧ (p ∧ ¬p) = (q ∧ p) ∧ (q ∧ ¬p) = q ∧ p = p.
Thus, p = 0, and now the hypotheses of the lemma also tell us that q is an atom.
In the second case, we have q ∧ ¬p ≤ p. This means p = p ∨ (¬p ∧ q), and applying
the distributive law yields
p = (p ∨ ¬p) ∧ (p ∨ q) = 1 ∧ (p ∨ q) = p ∨ q.
Thus, p ≥ q, which contradicts p < q, so this case cannot occur. □
Lemma 4.4. If p is a nonzero element of a finite Boolean algebra, and q1, . . . , qk
are the atoms that are less than or equal to p, then
p = q1 ∨ · · · ∨ qk.
Note that every nonzero element of a finite Boolean algebra has some atom beneath
it (otherwise, one could produce an infinite descending sequence of elements), and
of course only finitely many because the Boolean algebra is finite. The lemma
statement still works even if p = 0, as long as we interpret an empty join to equal 0
(as it should, because 0 is the identity element for ∨).
Proof. The join q1 ∨ · · · ∨ qk is the least upper bound of q1, . . . , qk, and thus
q1 ∨ · · · ∨ qk ≤ p. Suppose p is a minimal counterexample to the lemma; in other
words, suppose q1 ∨ · · · ∨ qk < p but the lemma holds for every element below p.
Note that there must be a minimal counterexample, since there cannot be an infinite
descending sequence of counterexamples in a finite Boolean algebra.
For every r < p, the minimality of p tells us that r is the join of the atoms below it,
and those atoms form a subset of q1, . . . , qk. Thus, r ≤ q1 ∨ · · · ∨ qk. By Lemma 4.3,
q1 ∨ · · · ∨ qk = 0 and p is an atom. However, no atom can be a counterexample,
since every atom equals the join of the atoms less than or equal to it (namely, itself).
Thus, there cannot be a minimal counterexample, or any counterexample at all. □
Theorem 4.5. Let B be a finite Boolean algebra, and let A be the set of atoms in
B. Then the map f : B → P(A) defined by
f(p) = {a ∈ A : a ≤ p}
is an isomorphism.
Proof. Injectivity follows from Lemma 4.4, because one can recover p from f(p) as
the join of its elements. For surjectivity, given any set {a1, . . . , ak} of atoms, each of
them is at most a1 ∨ · · · ∨ ak, and Lemma 4.2 implies that no other atom is at most
a1 ∨ · · · ∨ ak: if an atom a satisfies a ≤ a1 ∨ · · · ∨ ak, then a ≤ ai for some i and
hence a = ai (because both are atoms). Thus, f(a1 ∨ · · · ∨ ak) = {a1, . . . , ak}.
If p ≤ q, then every atom below p is also below q, and hence f(p) ⊆ f(q).
Conversely, if f(p) ⊆ f(q), then the join of the elements of f(p) is less than or
equal to the join of the elements of f(q), and thus p ≤ q. It follows that f is an
isomorphism. □
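Theorem 4.5 is easy to illustrate on a small example. The sketch below (Python, my own example: the eight divisors of 30 ordered by divisibility form a Boolean algebra) computes the atoms and checks that f(p) = {a : a ≤ p} identifies the lattice with the power set of its atoms.

B = [d for d in range(1, 31) if 30 % d == 0]     # divisors of 30: an 8-element Boolean algebra
leq = lambda p, q: q % p == 0                    # p ≤ q means p divides q

atoms = [p for p in B if p != 1 and all(not (leq(q, p) and q not in (1, p)) for q in B)]
assert sorted(atoms) == [2, 3, 5]                # the atoms are the prime divisors

def f(p):
    return frozenset(a for a in atoms if leq(a, p))

# f is a bijection onto the power set of the atoms, and it respects the ordering.
assert len({f(p) for p in B}) == len(B) == 2 ** len(atoms)
assert all(leq(p, q) == (f(p) <= f(q)) for p in B for q in B)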
The intuitive picture behind the Boolean algebra P(S) is that S is the set of
possible states of the world. The truth value of a proposition is the set of states in
which that proposition is true. Theorem 4.5 tells us that if we want a finite set of
truth values, and if we want our logical operations to obey the laws enumerated in
Proposition 3.6, then this construction is the only possibility. Even in this more
general setting, classical two-valued logic plays a fundamental role: it is hiding in
the fact that each element of S is either completely in or completely out of a subset.
Aside from a digression in the next section, from this point on we will restrict
our attention to classical logic. It is an appropriate foundation for mathematics,
and we would gain little by casting everything in more general terms. However,
other Boolean algebras do play a fundamental role in certain areas within logic. For
example, Cohen's forcing technique for proving independence results in set theory
can be understood in terms of constructing Boolean-valued models of ZFC, i.e.,
models of set theory in which the truth values are drawn from a Boolean algebra.
Of course the details are subtle and important, but this is a very plausible way to
arrive at independence proofs. At the risk of oversimplification, if one can construct
a model in which all the axioms of ZFC have truth value 1, but the continuum
hypothesis has truth value strictly between 0 and 1, then it can be neither proved
nor disproved using the axioms.

5. Quantum logic
Is it possible that the real world is governed by non-classical logic at the level
of fundamental physics? Surprisingly, the answer is yes: quantum mechanics can
naturally be described in terms of a strange logic called quantum logic, which is not
a Boolean algebra.
The fundamental construction amounts to replacing set theory with linear algebra.
The set of states of our quantum system will be described by a complex vector space
V of possible wave functions. Typically, V will be an infinite-dimensional Hilbert
space, but we can imagine a finite-dimensional vector space, and that does indeed
occur for very simple quantum systems. The state space V will play the same role
as the set S in the Boolean algebra P(S).
Truth values of propositions in quantum logic are vector subspaces of V (closed
subspaces in the infinite-dimensional case, but this distinction is irrelevant for
finite-dimensional spaces). The key difference from classical logic is the restriction
to subspaces: a proposition cannot be true for an arbitrary subset of the state space,
but rather just for a vector subspace.
Let Q be the set of all subspaces of V. Then Q is a lattice under ⊆. Specifically,
p ∧ q is the intersection of p and q, and p ∨ q is their span (i.e., the smallest subspace
containing both). The identity element 0 for ∨ is the zero-dimensional subspace,
and the identity 1 for ∧ is the full space V. Furthermore, we can define ¬p to be
the orthogonal complement of p (provided we have an inner product on V). Thus,
Q is a complemented lattice, so we can carry out all of our logical operations in Q.
However, Q is not a Boolean algebra, because it is not distributive, assuming
dim V ≥ 2. Let p, q, and r be distinct lines in the same plane. Then q ∨ r equals
that plane, so p ∧ (q ∨ r) = p. However, p ∧ q = 0 and p ∧ r = 0, so (p ∧ q) ∨ (p ∧ r) = 0.
Thus, p ∧ (q ∨ r) ≠ (p ∧ q) ∨ (p ∧ r).
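The failure of distributivity can also be checked numerically. The sketch below (Python with NumPy; the three lines and all names are my own choices) represents subspaces of R^2 by matrices whose columns span them and compares dimensions, using the standard identity dim(p ∧ q) = dim p + dim q − dim(p ∨ q) for subspaces.

import numpy as np

def dim(M):
    """Dimension of the column space (0 if M has no columns)."""
    return 0 if M.size == 0 else np.linalg.matrix_rank(M)

def join(P, Q):
    """p ∨ q: the span of both subspaces, as a matrix whose columns span it."""
    return np.hstack([P, Q])

def meet_dim(P, Q):
    """dim(p ∧ q), from dim(p) + dim(q) = dim(p ∨ q) + dim(p ∧ q)."""
    return dim(P) + dim(Q) - dim(join(P, Q))

# Three distinct lines in the plane R^2.
p = np.array([[1.0], [0.0]])
q = np.array([[0.0], [1.0]])
r = np.array([[1.0], [1.0]])

# p ∧ (q ∨ r) is all of p (dimension 1), but (p ∧ q) ∨ (p ∧ r) is {0} (dimension 0).
assert meet_dim(p, join(q, r)) == 1
assert meet_dim(p, q) == 0 and meet_dim(p, r) == 0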
This is extremely disconcerting if one takes it literally, because the real world is
governed by quantum mechanics and hence does not have a distributive lattice of
truth values. In fact, this perspective helps explain (or at least formalize) some of
the weird aspects of quantum mechanics, such as the uncertainty principle.
For a toy model, imagine a particle that has two possible positions q0 and q1, and
two possible momenta p0 and p1. We know it is definitely at position q0 or q1; in
other words, q0 ∨ q1 = 1. Similarly, p0 ∨ p1 = 1. It follows that (q0 ∨ q1) ∧ (p0 ∨ p1) = 1.
Thus, we know it has a definite position, and independently a definite momentum.
However, we cannot apply the distributive law to conclude that

(q0 ∧ p0) ∨ (q0 ∧ p1) ∨ (q1 ∧ p0) ∨ (q1 ∧ p1) = 1.



In fact, the uncertainty principle for position and momentum tells us that this is
false! In other words, even though the particle has a definite position and a definite
momentum, we cannot conclude that we can specify both at once.
For a concrete realization of this possibility, let p and q be distinct lines in a
two-dimensional space V, such that p and q are not orthogonal. Then

(p ∨ ¬p) ∧ (q ∨ ¬q) = 1,

so every state satisfies p or ¬p and satisfies q or ¬q, but

(p ∧ q) ∨ (p ∧ ¬q) ∨ (¬p ∧ q) ∨ (¬p ∧ ¬q) = 0,

so there is no state in which we can pin down the status of both p and q.
It is difficult to understand what this means, because it is natural to apply the
distributive law without thinking, and this difficulty is responsible for much of
the confusing nature of quantum mechanics. Fortunately, we do not have to use
quantum logic to understand quantum mechanics. There are two ways to approach
quantum mechanics: we can either think inside the quantum system and apply
quantum logic, or think outside the system and apply classical logic to reason about
the state space V . The second option implicitly deals with quantum logic in the
form of linear algebra, but it is much easier to take this approach. In principle, one
might hope to achieve deeper insight by training one's brain to use quantum logic
directly, but in practice that does not seem to be fruitful.
Quantum logic is far from an arbitrary complemented lattice. First, it is ortho-
complemented. In other words, we can choose a complement ¬p for each p such that
¬¬p = p and the map p ↦ ¬p is order-reversing. (For comparison, it is not hard to
check that the complemented lattice shown in Figure 3.2 is not orthocomplemented.)
Furthermore, quantum logic is orthomodular: if p ≤ q, then p ∨ (¬p ∧ q) = q. Note
that this is a special case of the distributive law p ∨ (¬p ∧ q) = (p ∨ ¬p) ∧ (p ∨ q).
Orthomodularity is more symmetrical than it looks, because one can prove that
the orthomodular law is equivalent to its dual, which says that if r ≥ s, then
r ∧ (¬r ∨ s) = s. (Take r = ¬p and s = ¬q.)
As we can see from the uncertainty principle, orthomodularity is a poor substitute
for distributivity, but it is better than nothing. It is remarkable that it describes
quantum mechanics while distributivity does not.

6. Formal proofs
In this section, we will develop a notion of formal proof for tautologies. Of course,
it is straightforward to test whether a wff is a tautology using a truth table, so there
is no real need to use formal proofs. However, they are interesting in their own right
as well as good preparation for the much deeper topic of formal proofs in first-order
logic, which we will turn to next.
Note that we are setting aside general Boolean algebras and returning to classical
logic, the two-element Boolean algebra {⊤, ⊥}.
For simplicity, we will use just a subset of the propositional calculus, with
variables, parentheses, →, and ⊥. Everything else can be defined in terms of them,
according to the rows of the following table:

¬φ         (φ → ⊥)
⊤          (⊥ → ⊥)
(φ ∧ ψ)    ¬(φ → ¬ψ)
(φ ∨ ψ)    (¬φ → ψ)
(φ ↔ ψ)    ((φ → ψ) ∧ (ψ → φ))

We will view ¬, ⊤, ∧, ∨, and ↔ simply as abbreviations, rather than part of our formal
language. Alternately, one could keep them as part of the language and have special
rules for going back and forth between the two forms listed above.
The purpose of simplifying the language is to minimize case analysis in our study
of the proof system. Nobody would want to use such a restricted set of symbols
for mathematics in practice, but it is pleasant to streamline the theory as much as
possible.
Our proof system will be based on five axioms. Specifically, for all wffs φ, ψ, and
χ, each of the following wffs is an axiom:
(1) (φ → φ)
(2) (φ → (ψ → φ))
(3) ((φ → ψ) → ((φ → (ψ → χ)) → (φ → χ)))
(4) ((φ → ⊥) → (φ → ψ))
(5) (((φ → ⊥) → ⊥) → φ)
Note that these axioms are tautologies (they are true regardless of what φ, ψ, and
χ are). Axioms 1 and 3 are obvious properties of implication, as is 2 if one keeps
in mind that → is material implication. Axioms 4 and 5 are best understood by
reading (φ → ⊥) as ¬φ.
Strictly speaking, these are axiom schemes: they are infinite families of axioms,
one for each choice of φ, ψ, and χ. However, this distinction won't be important for
us right now.
Axiom 5 is the law of the excluded middle: a proposition that is not false must
be true. Axioms 1 through 4 define what is known as intuitionistic logic.
It is natural to wonder where these axioms came from. What would lead someone
to choose them, as opposed to all the other true axioms one could choose instead?
The answer is that they come from reverse engineering proofs. If you try to prove
the results from this section, and add a new axiom every time you get stuck, then
you'll end up with a perfectly serviceable list of axioms.
In addition to the axioms, our system has one rule of inference, called modus
ponens (Latin for the method of affirming or establishing). Modus ponens says that
if we have proved wffs φ and (φ → ψ), then we can deduce ψ.
Note that a rule of inference tells how we can manipulate wffs, which is different
from what an axiom does. See What the Tortoise said to Achilles by Lewis Carroll
(Mind 4 (1895), 278–280; reprinted in Hofstadter's book Gödel, Escher, Bach: An
Eternal Golden Braid). See also A counterexample to modus ponens by Vann McGee
(Journal of Philosophy 82 (1985), 462–471) for the only objection I've ever seen to
modus ponens.
We can now define a formal proof of a wff φ from a set Γ of hypotheses (i.e., any
wffs other than axioms that we would like to assume).

Definition 6.1. Let φ be a wff and Γ a set of wffs. A formal proof of φ given Γ
is a finite sequence φ1, . . . , φn of wffs with φn = φ, such that each of them either
follows from two previous wffs in the sequence via modus ponens or is an axiom or
element of Γ. To be more precise, we say that φi follows from φj and φk by modus
ponens if j, k < i and φk = (φj → φi).
We write Γ ⊢ φ if there is a formal proof of φ given Γ, and we write ⊢ φ if there
is a formal proof of φ given ∅.
Note that Γ ⊢ φ if and only if there is a finite subset Γ0 of Γ such that Γ0 ⊢ φ,
because every formal proof involves only finitely many wffs.
As an example of a formal proof, we will show that Axiom 1 is redundant, given
the other axioms. Let φ be any wff. Then the following five wffs are a formal proof
of (φ → φ) without using Axiom 1:
(1) (φ → (φ → φ))
(2) ((φ → (φ → φ)) → ((φ → ((φ → φ) → φ)) → (φ → φ)))
(3) ((φ → ((φ → φ) → φ)) → (φ → φ))
(4) (φ → ((φ → φ) → φ))
(5) (φ → φ)
Specifically, line 1 of the proof takes φ = φ and ψ = φ in Axiom 2. Line 2 takes
φ = φ, ψ = (φ → φ), and χ = φ in Axiom 3. Line 3 follows from modus ponens
applied to lines 1 and 2. Line 4 takes ψ = (φ → φ) and φ = φ in Axiom 2. Finally,
line 5 follows from modus ponens applied to lines 4 and 3.
As we can see from this example, formal proofs are typically cumbersome and
unenlightening. The purpose isn't to give intuitive insight, but rather to capture
the idea of provability in a formal system, and we will see that they accomplish this
goal.
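Because Definition 6.1 is purely syntactic, checking a purported formal proof is mechanical. Here is a sketch (Python, my own representation: wffs as nested tuples ("implies", φ, ψ) and variables as strings) that verifies the modus ponens steps of a candidate proof; for brevity it takes the allowed axiom instances and hypotheses as an explicit collection rather than recognizing the axiom schemes.

def checks(proof, allowed):
    """Return True if every step is in `allowed` (an axiom instance or hypothesis)
    or follows from two earlier steps by modus ponens."""
    for i, step in enumerate(proof):
        if step in allowed:
            continue
        if any(earlier == ("implies", other, step)
               for earlier in proof[:i] for other in proof[:i]):
            continue
        return False
    return True

phi, psi = "p", "q"
hypotheses = {phi, ("implies", phi, psi)}
# A three-step proof of ψ from the hypotheses φ and (φ → ψ).
assert checks([phi, ("implies", phi, psi), psi], hypotheses)
assert not checks([psi], hypotheses)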
Definition 6.2. A set Γ of wffs is inconsistent if Γ ⊢ ⊥, and consistent otherwise.
It is satisfiable if there is an assignment of truth values to the variables such that
every wff in Γ evaluates to ⊤ (such an assignment is called a satisfying assignment).
If we think of ⊥ as denoting a contradiction, then a set of wffs is consistent if and
only if it cannot prove a contradiction. For example, {φ, (φ → ⊥)} is inconsistent.
Lemma 6.3 (Soundness). If Γ ⊢ φ, then φ evaluates to ⊤ under every satisfying
assignment for Γ.
Proof. Consider any formal proof of φ given Γ. The axioms are tautologies, and the
elements of Γ are by assumption true under every satisfying assignment. Furthermore,
modus ponens preserves truth: if ψ and (ψ → χ) both evaluate to ⊤, then so does
χ. Thus, the conclusion of the proof must evaluate to ⊤. Formally, we can prove
it by induction on the proof length. □

Corollary 6.4. Every satisfiable set of wffs is consistent.


The converse of soundness is completeness, which is a much deeper result (Theo-
rem 6.8).
Lemma 6.5 (Deduction). If Γ ∪ {φ} ⊢ ψ, then Γ ⊢ (φ → ψ).
The intuition is obvious: a proof of ψ assuming φ as a hypothesis amounts to a
proof of (φ → ψ). However, it is far from obvious that our formal proof system has
this property. We prove below that it does. Note that in the proof, we are working
outside of our formal system to establish properties of the system.
Proof. Let ψ1, . . . , ψn be a formal proof of ψ given Γ ∪ {φ}. We will turn the
sequence (φ → ψ1), . . . , (φ → ψn) into a formal proof of (φ → ψ) given Γ, by
inserting some additional steps as justification.
Each step ψi in the original proof is either φ, an element of Γ, an axiom, or the
result of modus ponens. If ψi is φ, then (φ → ψi) follows from Axiom 1.
If ψi is an element of Γ or an axiom, then we justify (φ → ψi) by inserting two
additional steps before it: first we assert ψi itself (as an axiom or element of Γ), and
then (ψi → (φ → ψi)) via Axiom 2, before deducing (φ → ψi) by modus ponens.
Finally, suppose j, k < i and ψk = (ψj → ψi), so ψi follows from ψj and
ψk by modus ponens. We must justify deducing (φ → ψi) from (φ → ψj) and
(φ → (ψj → ψi)). By Axiom 3, we can add
((φ → ψj) → ((φ → (ψj → ψi)) → (φ → ψi)))
to the proof, after which applying modus ponens twice lets us deduce ((φ → (ψj →
ψi)) → (φ → ψi)) and then (φ → ψi), as desired. □

The primary purpose of Axioms 1 through 3 is to justify the deduction lemma.
The lemma itself is an important tool for constructing formal proofs. For example,
suppose φ and ψ are wffs, and we want to show that
⊢ (φ → ((φ → ψ) → ψ)).
One can give a direct proof, but the deduction lemma makes it completely straightfor-
ward. Specifically, {φ, (φ → ψ)} ⊢ ψ has the three-step proof φ, (φ → ψ), ψ, and now
we can apply the deduction lemma twice, to conclude that {φ} ⊢ ((φ → ψ) → ψ) and
then ⊢ (φ → ((φ → ψ) → ψ)). Note that the proof of the deduction lemma explains
how to unfold this argument into a complete formal proof of (φ → ((φ → ψ) → ψ)).
Lemma 6.6. Let φ be a wff and Γ a set of wffs.
(1) If Γ is consistent and Γ ⊢ φ, then Γ ∪ {φ} is consistent.
(2) If Γ ∪ {φ} is inconsistent, then Γ ⊢ (φ → ⊥).
(3) If Γ ∪ {(φ → ⊥)} is inconsistent, then Γ ⊢ φ.
Proof. For (1), if Γ ⊢ φ and Γ ∪ {φ} ⊢ ⊥, then consider any formal proof of ⊥ from
Γ ∪ {φ}. If φ appears in the proof, then one can insert a proof of it from Γ, and
this shows that Γ ⊢ ⊥.
For (2), since Γ ∪ {φ} ⊢ ⊥, the deduction lemma implies that Γ ⊢ (φ → ⊥).
Finally, for (3), the deduction lemma implies that Γ ⊢ ((φ → ⊥) → ⊥). Combin-
ing this with Axiom 5 leads to Γ ⊢ φ. □

Corollary 6.7. Let φ be a wff and Γ a set of wffs. If Γ is consistent, then at least
one of Γ ∪ {φ} and Γ ∪ {(φ → ⊥)} is consistent (possibly both).
Proof. If both Γ ∪ {φ} and Γ ∪ {(φ → ⊥)} are inconsistent, then Γ ⊢ (φ → ⊥) and
Γ ⊢ φ, by (2) and (3) in Lemma 6.6. Then Γ ⊢ ⊥ by modus ponens, and hence Γ is
inconsistent, as desired. □

We can now prove the converse of soundness. It says that our proof system is
powerful enough to prove everything that could possibly be proved.
Theorem 6.8 (Completeness). If φ evaluates to ⊤ under every satisfying assign-
ment for Γ, then Γ ⊢ φ.
For example, φ is a tautology if and only if ⊢ φ, because every assignment is a
satisfying assignment for the empty set.

Corollary 6.9. If a set of wffs is consistent, then it is satisfiable.

To see why the corollary follows from the completeness theorem, note that if Γ is
not satisfiable, then every satisfying assignment for Γ makes ⊥ true (because Γ has
no satisfying assignments), and thus Γ ⊢ ⊥. In fact, the theorem also follows from
the corollary: if φ evaluates to ⊤ under every satisfying assignment for Γ, then
Γ ∪ {(φ → ⊥)} is not satisfiable. By the corollary, it is inconsistent, so part (3) of
Lemma 6.6 implies that Γ ⊢ φ.
Thus, it suffices to prove the corollary. To do so, we must somehow use consistency
to produce a satisfying assignment. There are usually many satisfying assignments
for a given set of wffs (if it is consistent), which makes it difficult to single one out.
We will get around this difficulty by making the consistent set as large as possible,
so that it will have a unique satisfying assignment, which can easily be described.

Proof. Suppose Γ is a consistent set of wffs. We begin by finding a maximal
consistent set of wffs containing Γ. Specifically, there are only countably many
wffs, so we can number them φ1, φ2, . . . . Let Γ0 = Γ, and for each i ≥ 1, let
Γi = Γi−1 ∪ {φi} if Γi−1 ∪ {φi} is consistent, and let Γi = Γi−1 ∪ {(φi → ⊥)}
otherwise. By Corollary 6.7, Γi is consistent for all i.
Now let Γ′ = ⋃i≥0 Γi. This set is also consistent, since if Γ′ ⊢ ⊥, then some
finite subset of Γ′ also proves ⊥. Each element of the finite subset is in Γi for some
i, and because the sets Γi are nested, the entire finite subset is in one of them.
However, that contradicts the consistency of Γi.
The set Γ′ is a maximal consistent set of wffs, because for each wff φ, either
φ ∈ Γ′ or (φ → ⊥) ∈ Γ′ (while they cannot both be in any consistent set). If φ is
consistent with Γ′, then φ ∈ Γ′.
Now we can define an assignment of truth values as follows. For each variable x,
if x ∈ Γ′ then we set x to be ⊤, and if (x → ⊥) ∈ Γ′ then we set x to be ⊥. Let
ev(φ) denote the evaluation of φ under this assignment.
We will prove that for every wff φ, ev(φ) = ⊤ if and only if φ ∈ Γ′. Thus,
because Γ ⊆ Γ′, we have found a satisfying assignment for Γ. We will prove this
equivalence by induction on the length of φ.
We begin with the cases when φ is a variable or ⊥. By definition, each variable
x satisfies ev(x) = ⊤ iff x ∈ Γ′. Furthermore, ev(⊥) = ⊥, which corresponds with
⊥ ∉ Γ′ (since Γ′ is consistent). Thus, all we need to verify is that the equivalence
between ev(φ) = ⊤ and φ ∈ Γ′ holds when φ = (ψ → χ), assuming it holds for ψ
and χ because they are shorter wffs.
Specifically, we must prove that φ ∈ Γ′ when ev(χ) = ⊤ or ev(ψ) = ⊥, and that
φ ∉ Γ′ when ev(χ) = ⊥ and ev(ψ) = ⊤. In terms of membership in Γ′, we must
prove that if χ ∈ Γ′ or (ψ → ⊥) ∈ Γ′, then (ψ → χ) ∈ Γ′, while if (χ → ⊥) ∈ Γ′
and ψ ∈ Γ′, then (ψ → χ) ∉ Γ′.
First, suppose χ ∈ Γ′. By Axiom 2, Γ′ ⊢ (χ → (ψ → χ)), and hence Γ′ ⊢ (ψ → χ)
by modus ponens. Thus, (ψ → χ) is consistent with Γ′ by part (1) of Lemma 6.6,
so (ψ → χ) ∈ Γ′.
When (ψ → ⊥) ∈ Γ′, we can use the same argument with Axiom 4 instead of
Axiom 2. Recall that Axiom 4 tells us that Γ′ ⊢ ((ψ → ⊥) → (ψ → χ)).
Finally, if (χ → ⊥) ∈ Γ′ and ψ ∈ Γ′, then (ψ → χ) cannot be in Γ′, since if it
were then Γ′ would be inconsistent. This completes the case analysis and shows
that we have found a satisfying assignment. □

7. First-order languages and syntax


Propositional calculus is a useful tool for exploring the relationships between
propositions, but it is not expressive enough to serve as the foundation for mathe-
matics. Instead, we will use an extension called first-order logic, which will suffice
for all of mathematics. It's the logic of variables and quantifiers (∀ and ∃).
We will begin by specifying a formal language. As in the propositional calculus,
we will define well-formed formulas as abstract strings of symbols, with no pre-
defined meanings. In order to assign meanings and truth values to them, we will
have to specify a mathematical structure to which they are supposed to apply.

Definition 7.1. A first-order language consists of six disjoint sets of symbols:


(1) a countably infinite set of variable symbols,
(2) the seven logical symbols ¬, ∧, ∨, →, ↔, ∀, and ∃,
(3) the left and right parentheses, comma, and equals sign,
(4) a set of constant symbols,
(5) a set of relation symbols, with a natural number called the arity associated
with each of them, and
(6) a set of function symbols, each with an arity.

Definition 7.1 is stated a little informally. Strictly speaking, we should define a


first-order language to be a 6-tuple, we should consider ordered pairs consisting of a
relation symbol and its arity, etc. This sort of formalization is straightforward and
unexciting.
We will interpret the arity of a function/relation as the number of inputs it
takes. A function/relation with arity k is also called a k-ary function/relation (this
generalizes the terms unary, binary, ternary, etc.).
The symbols in a first-order language will form the alphabet for well-formed
formulas. In other words, specifying a first-order language amounts to specifying a
template for what we are allowed to talk about (i.e., for which constants, relations,
and functions have names in our languages). For example, if we want to talk about
groups, we will need a binary function symbol to represent group multiplication,
and it may also be convenient to have a constant symbol for the identity and a
unary function symbol for the inverse map. If we want to talk about an ordered set,
we will need a binary relation symbol.
Note that symbol means nothing by itself: anything can be a symbol. In
practice, we will represent symbols with letters. Similarly, we have not given a
precise definition to symbols like ∀ or ), but it doesn't matter (they can be anything,
and the only important thing is that they are unique symbols we choose to represent
with this notation). As in the propositional calculus, we will sometimes use primes
to generate additional variables if we ever run out of letters.
There are many different first-order languages, depending on which constant,
relation, and function symbols are included. A universal language might sound
attractive, but instead it is often convenient to use languages tailored to particular
applications. On the other hand, the ZFC axioms of set theory can be formulated
within first-order logic, using just a single binary relation symbol ∈. The first-order
language of set theory is nearly universal within mathematics.
It may seem circular to base our analysis of logic on sets and then formulate
set theory within a framework of mathematical logic. From a strictly deductive
perspective, this is correct: we cant justify each one using the other without
circularity. However, we should be making a distinction between mathematics and
metamathematics (the mathematical study of mathematics itself). Our perspective
will be that we know what a proof is and how to judge an informal argument. The
point of formalizing proofs is generally not to increase our confidence or justify our
reasoning. Instead, it's to get a deeper understanding of the very idea of proof. In
order to even talk about questions such as the limits of proof, we need to analyze
the concept of proof carefully and give precise definitions. When we carry out this
metamathematics, we won't be working within a formal system, but rather analyzing
a formal system using the ordinary tools of mathematics. Once we have built a
powerful enough system, we could use it to prove formal versions of the informal
theorems we proved about it. Of course it would be silly to believe those theorems
just because the formal system said so. Instead, we believe them because we were
convinced by the informal proofs. However, it's nice to know that the formal system
agrees with us, and it would be worrisome if we were unable to formalize the proofs.
As in propositional calculus, we begin by specifying the syntax of first-order logic.
Definition 7.2. The terms in a first-order language are the strings specified
recursively as follows:
(1) All variable and constant symbols are terms.
(2) If f is an n-ary function symbol and t1, . . . , tn are terms, then f(t1, . . . , tn)
is a term.
In other words, the terms are the expressions obtained by applying function
symbols repeatedly, starting with variables and constants. This definition is stated
quite a bit less formally than the definition of wffs in the propositional calculus
(Definition 1.3), but the reader should have no difficulty translating between these
degrees of formality.
It is sometimes convenient to use other notation. For example, we could use
infix notation to write a + b instead of +(a, b) if + is a binary function symbol.
We will sometimes do this, but we'll treat it as an informal abbreviation for the
sake of human readability, with +(a, b) being the true string in the formal language.
An alternative approach would be to formalize several different types of function
symbols, but it's not worth complicating the formal language for the sake of human
stylistic conventions.
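The recursive definition of terms (and of wffs below) translates directly into tree-shaped data. Here is a sketch (Python dataclasses; the class and field names, and the example language with a constant e and a binary function symbol f, are my own, not the notes') of terms represented as syntax trees.

from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str                 # a variable symbol

@dataclass(frozen=True)
class Const:
    name: str                 # a constant symbol

@dataclass(frozen=True)
class Apply:
    func: str                 # a function symbol
    args: tuple               # its arguments, themselves terms (length = arity)

# The term f(x, f(y, e)), written in the official prefix notation.
t = Apply("f", (Var("x"), Apply("f", (Var("y"), Const("e")))))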
Definition 7.3. The wffs (well-formed formulas) in a first-order language are the
strings specified recursively as follows:
(1) If R is an n-ary relation symbol and t1 , . . . , tn are terms, then R(t1 , . . . , tn )
is a wff.
(2) If t1 and t2 are terms, then =(t1 , t2 ) is a wff.
(3) If φ and ψ are wffs, then so are ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), and (φ ↔ ψ).
(4) If φ is a wff and x is a variable, then ∀x φ and ∃x φ are wffs.
The first two sorts of wffs (R(t1 , . . . , tn ) and =(t1 , t2 )) are called atomic wffs, because
they cannot be broken down into smaller wffs.
Note that wffs in the propositional calculus are not first-order wffs (for example,
(p ∨ ¬p) is not a first-order wff, because the variable p is not itself a wff). From now
on, the term wff will by default refer to first-order wffs.
In our formalization, first-order wffs must be fully parenthesized. As in the case
of the propositional calculus, this implies unique readability: no wff can be formed
in two different ways using the defining rules.
In practice we will always write (τ1 = τ2) instead of =(τ1, τ2), because it looks
too weird otherwise. As in the case of infix function symbols, we'll view this as an
abbreviation for the true wff. (Similarly, we may omit parentheses for readability,
but the formal system itself does not allow this.) The reason why the equals sign
is handled separately from the relation symbols is that it will have a predefined
meaning.
By default, we will use lowercase letters for variables, unless they are already
taken for constants or functions. For example, ∀x (x = x) is a wff.
The definition of wff may seem overly permissive. For example, ∀x (x = y) is
a wff, although it doesn't specify the role of y. Worse yet, ∀x ∀x (x = x) is a wff,
despite quantifying twice over the same variable. Examples such as ∀x (x = y) are
actually useful, but ∀x ∀x (x = x) is not. Still, it won't do any harm once we specify
first-order semantics, so there is no need to rule it out.
Definition 7.4. Let φ be a wff, let x be a variable, and let Q denote either ∀ or ∃.
If Qx φ occurs as a (consecutive) substring of a wff ψ, then this substring φ is the
scope of the quantifier Q. Every occurrence of x in φ is bound, and a variable is free
if it is not bound by any quantifier.
For example, in ∃x (x = y), the x in x = y is bound while the y is free. In
∀y ∃x (x = y), both x and y are bound. In
(∃x (x = y) ∧ ∃z (x = z)),
the x in x = y is bound, as is the z in x = z, while the y in x = y and the x in x = z
are free. In particular, one occurrence of a variable may be bound while another is
free.
Note that fully parenthesizing wffs can be crucial for avoiding ambiguity. For
example, (∃x (x = y) ∧ ∃z (x = z)) is different from ∃x ((x = y) ∧ ∃z (x = z)),
and ∃x (x = y) ∧ ∃z (x = z) is in principle ambiguous. We will sometimes drop
the outermost parentheses, in which case ∃x (x = y) ∧ ∃z (x = z) will mean
(∃x (x = y) ∧ ∃z (x = z)).
Note that Definition 7.4 depends on unique readability to make sense.
Definition 7.5. A sentence is a wff with no free variables.

8. First-order semantics
Without some additional information, a wff has no meaning. Even if we try to
interpret the logical symbols, we don't know what the other symbols (variables,
constants, functions, relations) are referring to, and so we can't make any sense of
the wff. In order to assign meaning to a wff, we need to specify the mathematical
structure we are trying to apply it to. Then we can decide whether it is true or false
for that particular structure.
Definition 8.1. A structure A for a first-order language L consists of a nonempty
set A called the domain of A, an element denoted c^A ∈ A for each constant symbol
c in L, a function f^A : A^n → A for each n-ary function symbol f in L, and a relation
R^A ⊆ A^n for each n-ary relation symbol R.
When we try to interpret a wff in a structure, variables will be assumed to range
over the domain. Constant, function, and relation symbols will refer to the actual
constants, functions, and relations in the structure.
To decide whether a wff is true, it is not quite enough to specify a structure. We
will also need to specify values for all the free variables, since they are otherwise
undetermined. It will be convenient to specify the free variables via an assignment
of values to all the variables, as follows. (This is not crucial, but assigning a value
to every variable means we don't have to keep track of which variables are free. We
will just ignore the assignment for the bound variables.)
Definition 8.2. An assignment for a structure A is a function from the set of
variable symbols to the domain A of A. Given an assignment s, variable x, and
element a ∈ A, the substitution s_{x,a} is the assignment defined by s_{x,a}(y) = s(y) for
each variable y ≠ x and s_{x,a}(x) = a. (In other words, we keep the values s(y) for
y ≠ x, but overwrite the value at x with a.)
Given a structure A and an assignment s, for each term t we will inductively
define t^{A,s} ∈ A as follows. If t is a variable, then t^{A,s} = s(t). If t is a constant, then
t^{A,s} = t^A. Otherwise, if t = f(t1, . . . , tn), then
t^{A,s} = f^A(t1^{A,s}, . . . , tn^{A,s}).
The point here is that specifying a structure and an assignment provides enough
information to evaluate every term.
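As an illustration of this inductive definition, here is a sketch of term evaluation in Python, reusing the term classes from the earlier sketch and assuming, purely for concreteness, that a structure is given as dictionaries of constants and functions.

def eval_term(t, structure, s):
    # Inductively compute t^{A,s}, following the three cases in the text.
    if isinstance(t, Var):
        return s[t.name]                       # t^{A,s} = s(t)
    if isinstance(t, Const):
        return structure["constants"][t.name]  # t^{A,s} = t^A
    f = structure["functions"][t.symbol]       # the function f^A
    return f(*[eval_term(a, structure, s) for a in t.args])

# Example: natural numbers with + interpreted as addition, and s(x) = 1, s(y) = 2.
A = {"constants": {"0": 0}, "functions": {"+": lambda a, b: a + b}}
s = {"x": 1, "y": 2}
print(eval_term(Fn("+", [Var("x"), Var("y")]), A, s))   # prints 3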
Definition 8.3. Let A be a structure for a first-order language, and let φ be a wff.
We say A satisfies φ under assignment s, written A |=_s φ, based on the following
rules:
(1) If φ is (t1 = t2), then A |=_s φ means t1^{A,s} = t2^{A,s}.
(2) If φ is R(t1, . . . , tn), then A |=_s φ means (t1^{A,s}, . . . , tn^{A,s}) ∈ R^A.
(3) If φ is ¬ψ, then A |=_s φ means A ̸|=_s ψ. If φ is (ψ1 ∧ ψ2), then A |=_s φ
means A |=_s ψ1 and A |=_s ψ2. And the analogous definitions hold for ∨, →,
and ↔.
(4) If φ is ∃x ψ, then A |=_s φ means there exists a ∈ A such that A |=_{s_{x,a}} ψ.
(5) If φ is ∀x ψ, then A |=_s φ means for all a ∈ A, A |=_{s_{x,a}} ψ.
In other words, given a structure, the symbols mean exactly what you'd think
they mean. This definition may at first look almost vacuous, but it has real content:
it associates a precise meaning in the structure to formal strings of symbols.
For a simple example, suppose the domain A of our structure is the set of natural
numbers, and s assigns the variable x the value s(x) = 1 and the variable y the
value s(y) = 2. Then A |=_s ¬(x = y), because s(x) = 1 ≠ 2 = s(y). Similarly,
A |=_{s_{x,2}} (x = y), because s_{x,2}(x) = 2 = s_{x,2}(y).
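The quantifier clauses of Definition 8.3 can be read as an algorithm whenever the domain happens to be finite (an assumption the definition itself does not make). The helper names below are mine; the substitution s_{x,a} is simply a copied dictionary with the value at x overwritten.

def substitute(s, x, a):
    # The substitution s_{x,a}: copy s and overwrite the value at x with a.
    t = dict(s)
    t[x] = a
    return t

def sat_exists(domain, x, check, s):
    # A |=_s "∃x ψ": some a in the domain makes ψ true under s_{x,a}.
    return any(check(substitute(s, x, a)) for a in domain)

def sat_forall(domain, x, check, s):
    # A |=_s "∀x ψ": every a in the domain makes ψ true under s_{x,a}.
    return all(check(substitute(s, x, a)) for a in domain)

# Example with domain {0, 1, 2}: "∀x ∃y (y = x + 1 mod 3)" comes out true.
domain = range(3)
print(sat_forall(domain, "x",
                 lambda s1: sat_exists(domain, "y",
                                       lambda s2: s2["y"] == (s2["x"] + 1) % 3,
                                       s1),
                 {}))   # prints True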
Let's look at a tricky case involving nested quantifiers, to see how the definition
works. What, if anything, does ∀x ∀y ∃x R(x, y) mean? Given an assignment s,
A |=_s ∀x ∀y ∃x R(x, y) means
for all a ∈ A, A |=_{s_{x,a}} ∀y ∃x R(x, y).
This means
for all a ∈ A, for all b ∈ A, A |=_{(s_{x,a})_{y,b}} ∃x R(x, y),

which in turn means
for all a ∈ A, for all b ∈ A, there exists c ∈ A such that A |=_{((s_{x,a})_{y,b})_{x,c}} R(x, y).
On the other hand, ((s_{x,a})_{y,b})_{x,c} maps x to c and y to b, so this comes down to
for all a ∈ A, for all b ∈ A, there exists c ∈ A such that R^A(c, b).
(We will often write R^A(c, b) to mean (c, b) ∈ R^A.) Now a is irrelevant, and we see
that when two quantifiers conflict, it's the innermost one that counts. I don't want
to overemphasize this point, because there's no good reason to write a formula like
∀x ∀y ∃x R(x, y). However, it's nice to know our definitions can make sense of it.
Lemma 8.4. If s and s′ are assignments that agree on the free variables in φ, then
A |=_s φ if and only if A |=_{s′} φ.
It is straightforward to prove this lemma inductively. It captures the intuition
that the only point of the assignment s is to specify the free variables (the bound
variables overwrite s via substitution).
Definition 8.5. If σ is a sentence in a first-order language and A is a structure for
this language, then A |= σ means A |=_s σ for some assignment s (equivalently, all
of them, because σ has no free variables). If Σ is a set of sentences, then A |= Σ
means A |= σ for all σ ∈ Σ. In that case, we say that A is a model of the set Σ
(which we view as axioms).
It is also convenient to have a more pleasant notation for substitution in free
variables. Suppose φ(x1, . . . , xk) is a wff with free variables x1, . . . , xk; this means
that φ is a wff in which x1, . . . , xk are the only free variables. Given a structure A
and elements a1, . . . , ak ∈ A, we write
A |= φ(a1, . . . , ak)
to mean
A |=_s φ
holds when s is any assignment satisfying s(xi) = ai. This notation is a little
sloppy. For example, it depends on the specified ordering of the variables x1, . . . , xk,
so it makes no sense to use it without first saying "suppose φ(x1, . . . , xk) is a wff
with free variables x1, . . . , xk" or the like. Furthermore, the notation makes it look
like we are literally substituting a1, . . . , ak into the wff, which doesn't make sense in
our framework. (We have no notion of a wff with elements of a structure substituted
into it.) However, this notation is clearer and more suggestive than talking about
substitution functions, so we will use it whenever possible.

9. Examples of first-order languages


In the first-order language of graph theory, there is a single binary relation called
adjacency and denoted u ∼ v using infix notation, and there are no constant or
function symbols. Variables stand for vertices in a graph, and u ∼ v means vertices
u and v are connected by an edge. A simple, undirected graph is a structure that
satisfies the wffs ∀x ¬(x ∼ x) and ∀x ∀y ((x ∼ y) → (y ∼ x)). More generally, a
structure is a directed graph with possible self-loops.
There are some interesting things one can say about first-order graph theory,
but the first thing to observe is that this is a weak language. We can say some
simple things in this language. For example, that there exists a triangle, or that
every triangle is contained in a complete graph on seven vertices. (Of course, these
sentences may or may not be true, depending on the structure.) However, there does
not exist a wff φ(x, y) with two free variables x and y such that φ(x, y) holds if and
only if x and y are connected by a path. The basic reason is that we cannot use an
ellipsis in a first-order formula. (We also cannot quantify over wffs or over natural
numbers.) For each n, there is a wff ψn(x, y) that says x and y are connected by a
path of length at most n. For example, we can take ψ2(x, y) to be

(x = y) ∨ (x ∼ y) ∨ ∃z ((x ∼ z) ∧ (z ∼ y))

(leaving out some parentheses). However, as n increases the wffs ψn become longer
and longer. We will prove later that connectivity is not a first-order property.
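For concreteness, here is a hypothetical helper (my own construction, not from the notes) that writes out a formula expressing the same thing as ψn(x, y); it makes visible how the formulas grow with n.

def psi(n, x="x", y="y"):
    # Path of length at most n: either a path of length at most n-1 already,
    # or one edge from x to a fresh vertex z followed by a shorter path to y.
    if n == 0:
        return f"({x} = {y})"
    z = f"z{n}"
    return f"({psi(n - 1, x, y)} ∨ ∃{z} (({x} ~ {z}) ∧ {psi(n - 1, z, y)}))"

print(psi(2))   # a (more heavily parenthesized) variant of the wff above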
By contrast, set theory will provide a far more expressive language. Strictly
speaking, the underlying first-order language is the same as for graph theory, just
with different notation. Instead of the adjacency relation ∼, we will have the set
membership symbol ∈. What's different is how we interpret the language. Variables
will stand for sets, and x ∈ y will mean that x is an element of y.
The ZFC axioms can all be formulated as first-order sentences. What's even
better is that doing so clears up the remaining ambiguity from our informal approach
to ZFC. In a few of the axioms, such as separation, we needed to refer to a property
of sets, and we left it somewhat vague what that meant. To make the axiom precise,
we can interpret a property to be any first-order wff:

Axiom (Separation). For each first-order wff φ(x, y1, . . . , yn) in the language of
set theory and with free variables x, y1, . . . , yn, ZFC includes the axiom

∀y1 . . . ∀yn ∀S ∃T ∀x ((x ∈ T) ↔ ((x ∈ S) ∧ φ(x, y1, . . . , yn))).

The ellipses are simply an abbreviation: for each specific wff φ, the corresponding
axiom requires no ellipses to state.
Note that separation has become an axiom scheme consisting of infinitely many
analogous axioms. This is because first-order logic cannot quantify over wffs, so we
need to assume a separate axiom for each wff. Fortunately, there's no harm in doing
so. ZFC is therefore an infinite set of first-order wffs.
A third example is Peano arithmetic (sometimes abbreviated PA), which is the
first-order theory of elementary number theory. Variables stand for natural numbers,
there is a constant symbol 0, there is a unary function S (standing for successor),
and there are binary functions + and · (written with infix notation). The axioms of
Peano arithmetic are as follows:
(1) ∀n ¬(S(n) = 0).
(2) ∀m ∀n ((S(m) = S(n)) → (m = n)).
(3) ∀n ((n + 0) = n).
(4) ∀n ((n · 0) = 0).
(5) ∀n ∀m ((n + S(m)) = S(n + m)).
(6) ∀n ∀m ((n · S(m)) = ((n · m) + n)).
(7) The first-order induction scheme, defined below.

The induction scheme consists of infinitely many axioms. For each wff φ(n, m1, . . . , mk)
in the language of PA with free variables n, m1, . . . , mk, we assume an axiom stating

∀m1 . . . ∀mk ((φ(0, m1, . . . , mk) ∧ ∀n (φ(n, m1, . . . , mk) → φ(S(n), m1, . . . , mk)))
→ ∀n φ(n, m1, . . . , mk)).

In other words, it says you can prove an assertion φ(n, m1, . . . , mk) by induction on
n. The extra free variables make the statement more cumbersome, but they allow
us to deal with assertions that depend on outside parameters. This is the same
issue as in the Axiom of Separation.
Our three examples so far (graph theory, set theory, and number theory) exhibit
seemingly different behavior regarding models. Every simple, undirected graph is
a model of our axioms of graph theory. The natural numbers are a model of the
Peano axioms, and it is not obvious whether there are any others. As for set theory,
it is not clear whether there are any models of ZFC at all. (Note that the class of
all sets does not yield a model of ZFC, because it is not a set.) We will see that
Peano arithmetic has many models, and that although ZFC surely does as well,
ZFC cannot prove that ZFC has a model. More precisely, if ZFC can prove that it
has a model, then it is inconsistent.

10. Completeness and compactness


The most fundamental theorems about first-order logic are the completeness and
compactness theorems.

Definition 10.1. A set Σ of first-order sentences is called satisfiable if it has a model,
and unsatisfiable otherwise. Given such a set Σ, we write Σ |= σ to mean A |= σ
for every model A of Σ.

Theorem 10.2 (Compactness). Let Σ be a set of first-order sentences. If every
finite subset of Σ is satisfiable, then so is Σ.

Another way of stating compactness is that every finitely satisfiable set of


sentences is satisfiable. The compactness theorem is far from obvious, and it is a
profound fact about first-order logic. We will prove it later, after we look at some
examples and applications.
The name compactness might be a little puzzling, but it refers to a beautiful
topological interpretation. Given a first-order language, consider the class S of all
structures for this language. (It's a proper class, but we won't let that trouble us.2)
Define a topology on S by saying that for every first-order sentence σ, the class
S_σ of structures satisfying σ is closed in S. (It is also open, since its complement
is S_{¬σ}.) In other words, we are looking at the topology generated by the subbase
{S_σ : σ is a sentence}.

2This issue is actually a red herring. When defining the topology on S, we care only about the
first-order sentences satisfied by each structure, so we might as well identify structures that satisfy
the same sentences. Equivalently, we can replace each structure with the set of first-order sentences
satisfied by it; the resulting sets are exactly the maximal satisfiable sets of sentences. They form a
set S 0 (a subset of the power set of the set of sentences), and we might as well topologize it instead
of S. Using S 0 lets us avoid talking about proper classes, but I feel S is more natural than S 0 is.

In this language, Theorem 10.2 says that if {S_σ : σ ∈ Σ} has the finite intersection
property, then

⋂_{σ∈Σ} S_σ ≠ ∅.

That is equivalent to saying that S is compact. To see why, note that it is a special
case of the compactness of S, and in fact the Alexander subbase theorem says that
this special case is equivalent to the general case.
Another way of thinking about compactness is via formal proofs. In Section 16,
we will set up a formal proof theory for first-order logic, at which point we will
define what Σ ⊢ σ means. Then we will prove completeness:
Theorem 10.3 (Completeness). Let Σ be a set of first-order sentences and σ a
sentence. Then Σ ⊢ σ if and only if Σ |= σ.
This theorem says that formal proofs capture exactly what we want them to:
a sentence has a formal proof using some axioms if and only if it holds in every
structure satisfying those axioms. This is the ideal notion of proof. As in the
propositional calculus, one direction is easy (namely, soundness: Σ ⊢ σ implies
Σ |= σ), while the other is much deeper.
Completeness might sound contrary to Gödel's incompleteness theorems, but
it is not. What's going on is that the proof system is complete, while the axioms are
incomplete. The completeness of the proof system means that the sentences that
hold in every model of ZFC are provable, while those whose negations hold in every
model are disprovable. By contrast, the independence of the continuum hypothesis
amounts to saying that it holds in some models of ZFC but not others. Gödel's first
incompleteness theorem says that not only are the ZFC axioms incomplete, but they
cannot be completed in any reasonable way. We will discuss the incompleteness
phenomenon later.
We can reformulate completeness in terms of consistency, much like Corollary 6.9.
Theorem 10.4. Let Σ be a set of first-order sentences. If Σ is consistent, then Σ
has a model.
Note that the converse of this theorem is obvious (or, rather, will be obvious
once we have defined formal proof): anything you can prove using Σ must hold
in every model, so if Σ has a model, then you had better not be able to prove a
contradiction.
It is not hard to show that Theorem 10.4 is equivalent to completeness. Suppose
we assume the completeness theorem. If Σ has no model, then Σ |= ⊥ and hence, by
completeness, Σ ⊢ ⊥. Conversely, suppose Theorem 10.4 holds, and Σ |= σ. Then
Σ ∪ {¬σ} has no model and is thus inconsistent, which means Σ ⊢ σ.
What Theorem 10.4 tells us is that formal proofs are the only obstacle to the
existence of a model. If you can't derive a contradiction, then a model actually exists.
This is analogous to Corollary 6.9, but much deeper. Instead of creating just a
satisfying assignment, we need to come up with a first-order structure. Nevertheless,
the proof will follow similar lines, with one additional idea (Henkinization).
One striking application is ZFC. If ZFC is consistent, then it has a model. However,
Gödel's second incompleteness theorem will imply that ZFC cannot prove its own
consistency (assuming it is indeed consistent). Thus, although ZFC presumably
does have a model, it cannot prove this fact.

Theorem 10.4 makes the compactness theorem obvious. All proofs will have
finite length, so if there's a contradiction, then the contradiction can be derived
using only a finite number of sentences. Thus, if every finite subset of a set of
sentences is consistent, then the whole set is consistent. This assertion becomes
compactness when we replace consistency by satisfiability, and Theorem 10.4 tells
us that consistency and satisfiability are the same thing.

11. Consequences of compactness


Before proving the compactness theorem, let's examine some of its consequences.
In this section we will assume its truth.
How well can a set of first-order axioms determine a model? For example, can
they determine the model's cardinality? For finite models this is easy. For example,
the sentence

∃x ∃y ∃z (¬(x = y) ∧ ¬(y = z) ∧ ¬(z = x) ∧ ∀w ((w = x) ∨ (w = y) ∨ (w = z)))

forces a model to consist of exactly three elements, and one can write a similar
sentence for any natural number. Furthermore, one can specify any finite set of
natural numbers, for example that a model must have cardinality one, four, or
eight. However, as soon as you allow arbitrarily large finite models, infinite models
necessarily exist:
Proposition 11.1. Let Σ be a set of first-order sentences. If Σ has arbitrarily large
finite models, then it has an infinite model.
Proof. Consider the set
Σ′ = Σ ∪ {"at least one element", "at least two elements", "at least three elements", . . . }.
Here, "at least two elements" is an abbreviation for the sentence
∃x ∃y ¬(x = y),
etc. By assumption, every finite subset of Σ′ is satisfiable, using a large enough
model of Σ. Thus, by compactness Σ′ is satisfiable, and a model of Σ′ is an infinite
model of Σ. □
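As a small illustration (with a function name of my own), the abbreviations used in this proof can be generated mechanically for any n.

def at_least(n):
    # "at least n elements": there exist n pairwise distinct elements.
    xs = [f"x{i}" for i in range(1, n + 1)]
    quantifiers = " ".join(f"∃{x}" for x in xs)
    if n == 1:
        return f"{quantifiers} ({xs[0]} = {xs[0]})"
    distinct = " ∧ ".join(f"¬({xs[i]} = {xs[j]})"
                          for i in range(n) for j in range(i + 1, n))
    return f"{quantifiers} ({distinct})"

print(at_least(2))   # ∃x1 ∃x2 (¬(x1 = x2)), as in the abbreviation above
print(at_least(3))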
For example, in the first-order language of graph theory, let Σ be the set of
sentences that are true in every finite simple, undirected graph. Such graphs can be
arbitrarily large, and thus there exists an infinite graph G such that G |= Σ. This
graph is indistinguishable from a finite graph using first-order sentences. Note that
to get this conclusion, we do not even need to know which sentences are in Σ.
Similarly, we can consider the first-order language of fields from abstract algebra.
Then there exist infinite fields that are indistinguishable from finite fields using
first-order sentences. Such fields are known as pseudo-finite fields.
Of course, it's easy to make a set of sentences that force a model to be infinite,
as in the proof of Proposition 11.1: we can just take
{at least one element, at least two elements, at least three elements, . . . }.
This construction uses infinitely many axioms, which is unavoidable at this level of
generality:
Proposition 11.2. There is no finite set Σ of first-order sentences whose models
are exactly the infinite structures (i.e., every infinite structure is a model and no
finite structure is).

Proof. By taking the conjunction of the elements of Σ, we might as well assume Σ
consists of a single sentence σ. Because ¬σ is satisfied for every finite structure, it
must be satisfied for some infinite structure as well, by Proposition 11.1. □
By contrast, we can easily force a model to be infinite if we are willing to put
nontrivial restrictions on it. For example, let f be a unary function symbol. Then
we can write a first-order sentence that says f is injective but not surjective. This
sentence forces a structure to be infinite, but not every infinite structure satisfies
the sentence (in accordance with the previous proof).
Note that we can form the negation of a single sentence, but not of a set of
sentences. Using an infinite set of axioms is tantamount to allowing an infinite
conjunction of sentences, but we cannot form an infinite disjunction in this way.
If we could, then the same proof idea as in Proposition 11.2 would disprove the
compactness theorem.
Using the compactness theorem, we can prove the assertion from Section 9 that
connectedness is not definable in first-order graph theory:
Proposition 11.3. There is no wff φ(x, y) in the first-order language of graph
theory with free variables x and y such that G |= φ(v1, v2) iff there is a path in G
from v1 to v2.
Proof. Suppose such a wff φ(x, y) exists. Recall that for each n there is another wff
ψn(x, y) that says there is a path of length at most n from x to y. Now we extend
the language to include two constant symbols c and d. Consider the set
Σ = {φ(c, d), ¬ψ1(c, d), ¬ψ2(c, d), ¬ψ3(c, d), . . . }.
Every finite subset of Σ is satisfiable, because there exist graphs in which c and d
are connected but only by long paths. Thus, Σ is satisfiable. In any model of Σ, c
and d must be connected, but not by any finite-length path. This contradicts the
existence of φ. □
Definition 11.4. Given a first-order language, an isomorphism between structures
A and B (written A ≅ B) is a bijection g : A → B between their domains A and B,
such that all the constants, relations, and functions correspond under this bijection.
In other words, for every constant symbol c, g(c^A) = c^B. For every n-ary function
symbol f and a1, . . . , an ∈ A, g(f^A(a1, . . . , an)) = f^B(g(a1), . . . , g(an)). For every
n-ary relation symbol R and a1, . . . , an ∈ A, R^A(a1, . . . , an) holds if and only if
R^B(g(a1), . . . , g(an)) holds.
The standard model of Peano arithmetic has domain N, with the usual successor,
addition, and multiplication functions and with the constant symbol 0 having its
usual meaning. A nonstandard model is one that is not isomorphic to the standard
model. It is a remarkable fact that such models exist, but it follows easily from
compactness:
Theorem 11.5. There exists a nonstandard model of Peano arithmetic.
Proof. Extend the language to include a new constant symbol c, and consider the
set
Σ = PA ∪ {c ≥ 0, c ≥ 1, c ≥ 2, . . . },
where PA denotes the set of Peano axioms and n ≥ m means ∃ℓ (n = (m + ℓ)).
Every finite subset of Σ is certainly satisfiable, and hence all of Σ is. In any model

of Σ, c must be a number that is greater than all of 0, 1, 2, . . . . In other words, it is
an infinite number, and so the model must be nonstandard. □
In fact, every nonstandard model of Peano arithmetic contains infinite numbers.
Specifically, it's not hard to show that for each natural number k, the Peano axioms
completely determine what can happen up to the k-th successor of 0. All you get is
the standard model. The only loophole is that there may be numbers bigger than
any natural number, and as we have seen, that can occur.
Just about every theorem from elementary number theory can be proved in Peano
arithmetic, and so they remain true in nonstandard models. For example, given
a prime p, the natural numbers modulo p form a field, and the same is true for
nonstandard models and infinite primes. The resulting field is no longer finite, but
it is pseudo-finite. (This is not obvious.)
It is difficult to describe a nonstandard model of Peano arithmetic concretely. For
example, Tennenbaum proved that a nonstandard model can never be computable.
(A computable model is one with a countable domain and for which the functions
and relations are algorithmically computable.) This means that they are necessarily
complicated and exotic in a way the natural numbers are not.
We will prove one more consequence of compactness, namely that any set of
sentences with an infinite model has arbitrarily large models:
Theorem 11.6 (Upwards Löwenheim-Skolem). Let Σ be a set of first-order sentences
such that Σ has an infinite model. Then for every infinite cardinal κ, there
exists a model of Σ of cardinality at least κ.
The proof is a beautiful example of how languages with uncountably many
symbols can be useful.
Proof. Introduce κ new constant symbols, and for each pair of them add a new
sentence to Σ saying these constants are distinct. The resulting extension of Σ is
finitely satisfiable, because each finite subset can be satisfied by using an infinite
model of Σ and choosing distinct values for the constant symbols that occur in the
finite subset. Thus, the entire set of sentences is satisfiable, and every model has
cardinality at least κ. □
In fact, we can attain cardinality exactly κ (as long as there are at most κ symbols
in the language), but this proof does not justify that conclusion. We will prove it in
the next section, as a corollary of the downwards Löwenheim-Skolem theorem.

12. Downwards Löwenheim-Skolem
Definition 12.1. Let A and B be structures for some first-order language. We
say A is a substructure of B (written A ⊆ B) if their domains satisfy A ⊆ B, the
constants agree, and the functions and relations for A are the restrictions to A of
those for B. We say A is an elementary substructure of B (written A ≼e B) if A ⊆ B
and for every wff φ(x1, . . . , xk) with free variables x1, . . . , xk and all a1, . . . , ak ∈ A,
A |= φ(a1, . . . , ak) iff B |= φ(a1, . . . , ak).
We also call B an elementary extension of A if A ≼e B.
Note that a substructure can satisfy different sentences from its extension. In
fact, the sentences satisfied by a substructure are usually totally different, and it
deserves an explanation when they do agree.

In the definition of an elementary substructure, it is not enough to consider just
sentences (with no free variables). For example, consider the language with a
single unary function symbol f. Let A and B be the structures with A = {1, 2, . . . }
and B = {0, 1, 2, 3, . . . }, in both cases with the function representing i ↦ i + 1.
Then A ≅ B, so they satisfy the same sentences, and A ⊆ B. However, B is not an
elementary extension of A, because if φ(x) = ¬∃y (f(y) = x), then A |= φ(1) but
B ̸|= φ(1).
The proof of upwards Löwenheim-Skolem using compactness generalizes to produce
an elementary extension:
Theorem 12.2 (Upwards Löwenheim-Skolem). Let A be an infinite structure. Then
for every infinite cardinal κ, A has an elementary extension of cardinality at least κ.
Proof. We extend the language by including a new constant symbol for each element
of the domain A. Then A can be viewed as a structure for the new language by
assigning each new constant symbol to the corresponding element of A. Let Σ
be the set of first-order sentences satisfied by A in the extended language. By
Theorem 11.6, there exist arbitrarily large models of Σ. All of these models are
elementary extensions of A, if we identify A with the substructure formed by the
constant symbols. □

Downwards Löwenheim-Skolem says we can decrease the size of any structure
down to the cardinality of the language. Furthermore, we get an elementary
substructure:
Theorem 12.3 (Downwards Löwenheim-Skolem). Let A be a structure for a first-order
language L, and let S ⊆ A be any subset of the domain of A. Then there
exists an elementary substructure B of A such that S ⊆ B and
|B| ≤ max(|S|, number of symbols in L).
Note that there are always at least ℵ0 symbols in every first-order language,
because there are infinitely many variables. By choosing S appropriately, we can
force |B| to be anything from the number of symbols in L to |A|.
Skolem's paradox is that if ZFC has a model, then downwards Löwenheim-Skolem
says it has a countable model. At first this seems impossible, because ZFC proves
that there are uncountable sets. How is that possible when there are only countably
many elements available to make them from?
To resolve this paradox, we must distinguish between two notions of countability,
namely the notion internal to the model and the one we can see from the outside.
Let A be a model of ZFC. There is an element ω ∈ A that is the minimal inductive
set in A, and that defines countability within A: an element a ∈ A is A-countable
if there exists f ∈ A whose elements are ordered pairs describing an injection from
a to ω. Being A-countable is very different from just being a countable subset of A.
Looking at the model from the outside, we can see that a has no more elements than
ω does. However, if we try to build an injection from a to ω for an A-uncountable
set a, we'll find that there is no element of the model corresponding to our injection.
From within the model itself, a behaves like an uncountable set, even though we
can count its elements by going outside the model.
It might sound at first like it's obvious that the external perspective is right, but
that's just because we see ourselves as judging the model from the outside. Instead,

we can imagine how it feels inside the model.3 Suppose one day a booming voice
from the sky announces "How silly you are to think R is uncountable! I can easily
enumerate all the pitiful things you call real numbers." Upon being asked how, the
voice replies "For heaven's sake, you just do it. There's nothing special about it.
Really, your whole world is countable, so you can enumerate whatever you want."
When you complain that this doesn't seem possible, the voice says "I keep forgetting
how limited your stock of functions is. Indeed, your world has only countably many
of them! The enumeration is admittedly not a function as you understand the term,
but trust me: I can see functions that would make your head spin. If only you could
join me out here in the real world!"
A model of ZFC is a self-contained set-theoretic universe, and the usual definitions
and theorems apply only within the model. Unsolicited opinions from the universe
containing the model are neither relevant nor helpful. Furthermore, the external
universe may itself be just a model within some larger universe, so who is to say
which level counts as reality?
We will prove downwards Löwenheim-Skolem directly, with no need for the
compactness theorem.
Lemma 12.4 (Tarski-Vaught test). Let A and B be structures for a first-order
language. Then A ≼e B if and only if A ⊆ B and the following equivalence holds:
for all wffs φ(x0, . . . , xk) with free variables x0, . . . , xk and all a1, . . . , ak ∈ A,
B |= ∃x0 φ(x0, a1, . . . , ak) iff there exists a0 ∈ A such that B |= φ(a0, a1, . . . , ak).
This condition says that given a wff with an assignment in A of all but one of its
free variables, it can be satisfied in B by choosing a value for the remaining variable
from B iff it can be satisfied in B by choosing a value in A. Note that this condition
talks only about B, and says nothing directly about which sentences A satisfies.

Proof. For the easy direction, suppose A ≼e B. Then B |= ∃x0 φ(x0, a1, . . . , ak) iff
A |= ∃x0 φ(x0, a1, . . . , ak), because A is an elementary substructure. The latter
condition holds iff A |= φ(a0, . . . , ak) for some a0 ∈ A, and this is equivalent to
B |= φ(a0, . . . , ak) because A ≼e B.
For the more subtle direction, suppose that A ⊆ B and the condition from
the lemma statement holds. We wish to prove that for all φ(x1, . . . , xk) and
a1, . . . , ak ∈ A,
A |= φ(a1, . . . , ak) iff B |= φ(a1, . . . , ak).
We will do so by induction on the length of φ. When φ is atomic, this is true by the
definition of a substructure. When φ is built up using ¬, ∧, ∨, →, or ↔, it is trivial.
The one remaining case is quantifiers, and we can reduce to the existential case
by replacing ∀ with ¬∃¬. Thus, suppose φ(x1, . . . , xk) = ∃x0 ψ(x0, . . . , xk). Then
A |= φ(a1, . . . , ak) iff there exists a0 ∈ A such that A |= ψ(a0, . . . , ak). By induction,
this is equivalent to B |= ψ(a0, . . . , ak), and by the Tarski-Vaught condition this
assertion is equivalent to B |= ∃x0 ψ(x0, a1, . . . , ak). Thus,
A |= φ(a1, . . . , ak) iff B |= φ(a1, . . . , ak),
as desired. □

3Please forgive me for anthropomorphizing sets.



Proof of downwards Löwenheim-Skolem. For each wff φ(x0, . . . , xk) define a function
e_φ : A^k → A such that
A |= ∃x0 φ(x0, a1, . . . , ak) iff A |= φ(e_φ(a1, . . . , ak), a1, . . . , ak).
In other words, e_φ(a1, . . . , ak) is a suitable choice of x0 if there is any, and we can
choose it arbitrarily if no x0 will work. (Of course this construction uses the axiom
of choice heavily.)
Let B0 = S ∪ {constants in A}, and for each natural number i define
B_{i+1} = B_i ∪ {e_φ(b1, . . . , bk) : φ(x0, . . . , xk) is a wff and b1, . . . , bk ∈ B_i}.
Let B = ⋃_{i≥0} B_i. In other words, we start with S and the constants in A, and then
we take the closure under all the functions e_φ.
Now B is the domain of a substructure B: it contains all the constants, we can
define the relations by restriction, and it is closed under all the functions in the
language. To see why it is closed under functions, note that if f is a function symbol
and we set φ(x0, . . . , xk) = (x0 = f(x1, . . . , xk)), then e_φ = f^A.
By the Tarski-Vaught test, B ≼e A, because closure under e_φ ensures that the
Tarski-Vaught condition is satisfied. To compute the cardinality |B|, note that
|B_{i+1}| ≤ |B_i| + Σ_{wffs φ(x0, . . . , xk)} |B_i|^k
≤ max(|B_i|, |{wffs φ(x0, . . . , xk)}|)
= max(|B_i|, number of symbols in L).
Thus, |B| ≤ max(|S|, number of symbols in L). □

13. Filters and ultrafilters


A filter on a set is a way of specifying which subsets should be considered large.
Definition 13.1. A filter F on a set S is a subset F ⊆ P(S) such that
(1) ∅ ∉ F and S ∈ F,
(2) if X ∈ F and X ⊆ Y ⊆ S, then Y ∈ F, and
(3) if X ∈ F and Y ∈ F, then X ∩ Y ∈ F.
In other words, the empty set is not large but the full set is, a superset of a large
set must be large, and the intersection of two large sets is still large. Note that the
last property is the most distinctive: there are many ways of characterizing large
sets in mathematics, but most are not closed under intersection.
One obvious way to form a filter is to look at all the subsets of S that contain a
given non-empty set T . This is called a principal filter:
Definition 13.2. If T is a non-empty subset of S, then the principal filter generated
by T is {X ∈ P(S) : T ⊆ X}.
Lemma 13.3. Every filter on a finite set is principal.
Proof. Let F be a filter on a finite set. Then F must be finite, and thus

⋂_{X∈F} X ∈ F

by iterated closure under intersection. If we let T = ⋂_{X∈F} X, then T is contained
in every element of F, and every superset of T is an element of F. Thus, F is the
principal filter generated by T. □

Lemma 13.4. If S is an infinite set, then

{X ∈ P(S) : S \ X is finite}

is a non-principal filter on S.
Proof. Let F = {X ∈ P(S) : S \ X is finite}. Then S ∈ F, and ∅ ∉ F because S is
infinite. For the other two properties, it is convenient to reformulate the definition
as F = {S \ X : X ∈ P(S) and X is finite}. Every superset of S \ X is of the form
S \ Y with Y ⊆ X, and (S \ X) ∩ (S \ Y) = S \ (X ∪ Y). Thus, F is a filter. However,
it is not principal, because no element of S is in every set in the filter. □

A subset whose complement is finite is often called a cofinite subset.


Lemma 13.5. A subset B of P(S) is contained in a filter if and only if it satisfies
the finite intersection property: for all X1, . . . , Xn ∈ B,

X1 ∩ · · · ∩ Xn ≠ ∅.

Proof. This condition is obviously necessary, because if F is a filter containing B,
then X1 ∩ · · · ∩ Xn ∈ F and thus X1 ∩ · · · ∩ Xn ≠ ∅.
To see that the finite intersection property is sufficient to ensure that B is
contained in a filter, let
F = {X ∈ P(S) : Y1 ∩ · · · ∩ Yn ⊆ X for some Y1, . . . , Yn ∈ B}.
Certainly ∅ ∉ F and S ∈ F, and F is by definition closed under taking supersets.
Furthermore, if X ∈ F and X′ ∈ F, then Y1 ∩ · · · ∩ Yn ⊆ X and Y′1 ∩ · · · ∩ Y′m ⊆ X′
for some Y1, . . . , Yn, Y′1, . . . , Y′m ∈ B, and then
Y1 ∩ · · · ∩ Yn ∩ Y′1 ∩ · · · ∩ Y′m ⊆ X ∩ X′,
so X ∩ X′ ∈ F. Thus, F is a filter containing B. □
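The filter F constructed in this proof can also be phrased as a membership test. Here is a sketch for a finite collection B, assuming everything involved is finite so that all subcollections can be enumerated; the code and its names are mine, not part of the notes.

from functools import reduce
from itertools import combinations

def in_generated_filter(B, X):
    # X is in the filter generated by B iff some finite intersection of
    # members of B is contained in X (the construction in the proof above).
    B = list(B)
    return any(reduce(frozenset.__and__, Ys) <= X
               for r in range(1, len(B) + 1)
               for Ys in combinations(B, r))

B = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
print(in_generated_filter(B, frozenset({2, 3})))   # True: {1,2,3} ∩ {2,3,4} ⊆ {2,3}
print(in_generated_filter(B, frozenset({1, 4})))   # False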

Definition 13.6. An ultrafilter on a set S is a filter F on S such that for all X ⊆ S,
either X ∈ F or S \ X ∈ F.
Note that the sets X and S \ X cannot both be in F , because their intersection
is empty. If we view the sets in the filter as being large, then the ultrafilter property
means every set either is large or has a large complement. This is a very strong
property.
Lemma 13.7. A principal filter is an ultrafilter if and only if it is generated by a
one-element subset.
Proof. Suppose T is the generator of a principal filter on a set S. It is an ultrafilter
if and only if every subset of S contains either all or none of T , which is true exactly
when |T | = 1. 

It is far from obvious that any other ultrafilters exist, but we will see shortly that
they do.
Lemma 13.8. If F is a filter on S and X is a subset of S such that S \ X ∉ F,
then there is a filter F′ on S such that F ⊆ F′ and X ∈ F′.

Proof. The set X cannot have empty intersection with any element Y of F, because
S \ X would then contain Y and therefore be an element of F. It follows that
{X ∩ Y : Y ∈ F}
has the finite intersection property, because
(X ∩ Y1) ∩ · · · ∩ (X ∩ Yn) = X ∩ (Y1 ∩ · · · ∩ Yn)
and Y1 ∩ · · · ∩ Yn ∈ F if Y1, . . . , Yn ∈ F. Thus, there exists a filter F′ on S that contains
{X ∩ Y : Y ∈ F}. Because F′ is closed under taking supersets, X ∈ F′ and F ⊆ F′,
as desired. □
Lemma 13.9. A filter F on a set S is an ultrafilter if and only if it is a maximal
filter on S (i.e., there is no filter F′ on S such that F ⊊ F′).
Proof. Suppose F is an ultrafilter. Then for every subset X of S, either X or S \ X
is contained in F , but no filter can contain both. Thus, F cannot be enlarged and
must be a maximal filter.
For the other direction, if F is a filter on S and X is a subset of S such that
neither X nor S \ X is in F , then Lemma 13.8 yields a strictly larger filter on S. 
Proposition 13.10. Every filter is contained in a maximal filter (on the same set).
Proof. This proposition is a straightforward consequence of Zorn's lemma. Let F
be a filter on a set S, and let P be the poset of filters on S that contain F, ordered
by ⊆. If C is a chain in P, then

⋃_{F′∈C} F′

is a filter on S: it contains S but not ∅ and is closed under taking supersets
because every element of C has these properties, and if X1 ∈ F′1 and X2 ∈ F′2 with
F′1, F′2 ∈ C, then without loss of generality F′1 ⊆ F′2 because C is a chain, and hence
X1 ∩ X2 ∈ F′2.
Thus, every chain in P has an upper bound, so P has a maximal element, as
desired. □
It follows from Lemmas 13.4 and 13.9 and Proposition 13.10 that there is a
non-principal ultrafilter on every infinite set. However, non-principal ultrafilters
are mysterious objects and it is far from clear how to describe one. In particular,
it is consistent with ZF that all ultrafilters are principal. One might guess that
Proposition 13.10 would be equivalent to the Axiom of Choice, but in fact it is
strictly weaker.
We built non-principal ultrafilters by extending the filter given by Lemma 13.4,
which is not contained in any principal ultrafilter. In fact, this is the only possible
way to construct them:
Proposition 13.11. If U is a non-principal ultrafilter on a set S, then every cofinite
subset of S is in U .
Proof. Let X be a cofinite subset of S, and suppose X ∉ U. Then S \ X ∈ U, and
S \ X is finite. Let Y be the intersection of all the subsets of S \ X that are in U,
so Y is the smallest subset of S \ X that is in U. Then every element Z of U must
satisfy Y ⊆ Z, since otherwise Z ∩ Y would be a smaller element of U than Y is.

In particular, U must be the principal filter generated by Y , which contradicts the


hypothesis that U is non-principal. Thus, X must be an element of U after all. 

The theory of filters and ultrafilters generalizes beautifully to the context of
Boolean algebras: a subset F of a Boolean algebra B is a filter if it is upwards
closed, contains 1 but not 0, and is closed under ∧. It is an ultrafilter if for every
p ∈ B, either p ∈ F or ¬p ∈ F. Thus, a filter on a set S is the same as a filter in the
Boolean algebra P(S). All of the results of this section generalize straightforwardly
to filters in Boolean algebras.
A Boolean algebra homomorphism is a map between two Boolean algebras that
preserves 0, 1, ∧, ∨, and ¬. The following lemma is straightforward:
Lemma 13.12. A subset F of a Boolean algebra B is a filter if and only if there
is a Boolean algebra homomorphism f : B → B′ such that F = {p ∈ B : f(p) = 1},
and it is an ultrafilter if and only if f[B] = {0, 1}.
In terms of P(S), principal ultrafilters correspond to the obvious homomorphisms
to {0, 1}. Namely, fix some s ∈ S and map a subset X ⊆ S to 1 if s ∈ X and to 0
otherwise. Thus, when S is finite there are only the obvious homomorphisms, while
there are non-obvious ones when S is infinite.

14. Voting
There is a remarkable connection between principal ultrafilters and the theory
of voting. Before proceeding to ultraproducts, we will use ultrafilters to prove
Arrow's impossibility theorem. This proof is due to Bengt Hansson (The existence
of group preference functions, Public Choice 28 (1976), 89-98). It's not the shortest
or simplest proof, but it makes the connection with mathematical logic especially
vivid.
How can a group of voters rank several options (such as candidates for election
or bills)? Let's assume each voter has a personal ordering for these options. We
won't even ask about strength of preferences, since it's unclear how to normalize
preference strength between people. Instead, we want a voting system in the form
of a function that takes the voters' rankings and somehow aggregates them to form
a collective ranking.
One compelling approach is Condorcet's: rank A > B if and only if a majority of
the voters thought so. Unfortunately, this approach breaks down when there are
more than two options being voted on. For example, what if a third of the voters
think A > B > C, a third think B > C > A, and a third think C > A > B? Then
two thirds think A > B, two thirds think B > C, and two thirds think C > A. This
sounds paradoxical, but it isn't. Even if every voter is individually rational, pairwise
voting can create inconsistent rankings. For this particular case, by symmetry the
only reasonable option is to declare a tie.4
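A quick computation (illustrative code only, not part of Hansson's argument) confirms the cyclic pairwise majorities described above.

ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]  # one bloc each

def majority_prefers(p, q):
    # Does a strict majority of ballots rank p above q?
    wins = sum(1 for b in ballots if b.index(p) < b.index(q))
    return wins > len(ballots) / 2

for p, q in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(p, ">", q, ":", majority_prefers(p, q))   # all three lines print True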
One might hope to salvage voting by using more than just pairwise vote totals.
Maybe if you look at the patterns of who voted in which way, you can discern

4Note that if you have the power to schedule pairwise votes, you can choose who you want to
win. If you like C, you first call for a vote on A vs. B, and A will win. Then you call for a vote
on A vs. C, and C will win, at which point you declare C the overall winner by transitivity. By
contrast, if you want B to win, you first call for a vote on A vs. C. This illustrates the potentially
enormous power of being in charge of what gets voted on when.

societal preferences. Arrow's impossibility theorem says this cannot be done, at


least not in the ideal way one might hope for.
Definition 14.1. A pre-ordering of a set S is a binary relation ≤ on S that is
reflexive (x ≤ x) and transitive (x ≤ y and y ≤ z implies x ≤ z). I.e., it's an
ordering that allows ties, where x ≤ y and y ≤ x but x ≠ y. We will write x < y to
indicate that x ≤ y but x and y are not tied. A pre-ordering is total if for all x and
y, x ≤ y or y ≤ x. Let O(S) be the set of total pre-orderings of S.
Definition 14.2. Given a set V of voters and a set S of options, a social choice
function is a function from F(V, O(S)) to O(S). (Recall that F(V, O(S)) is the set
of functions from V to O(S).)
In other words, a social choice function takes as input the ordered preferences
of each voter, allowing ties, and somehow combines them to output the societal
preferences. Arrow proposed5 two axioms for a social choice function:
Unanimity: if every voter thinks A > B, then so does society.
Independence of irrelevant alternatives: society's ranking of A vs. B does not
change if the voters change their rankings of C, where C ≠ A, B, while keeping the
same relative rankings of A and B.
Unanimity is an obvious requirement, meant only to rule out ridiculous social
choice functions. Independence of irrelevant alternatives is much more subtle. If
someone offers you chocolate or vanilla ice cream and you choose chocolate, it would
be unreasonable to change your choice to vanilla upon being told that strawberry is
also an option, since the alternative of strawberry is irrelevant for the choice between
chocolate and vanilla. It would certainly be desirable to extend this independence
from the individual to the collective level, since it would avoid situations in which
a third candidate flips the election between two other candidates without actually
being electable him/herself.
A dictatorship is a social choice function such that if one particular voter, called
the dictator, thinks A > B, then the collective ranking agrees that A > B, regardless
of what the other voters think. Note that if the dictator feels A and B are tied,
then the social choice function is not restricted in how it can break the tie, but it
cannot overrule a clear preference from the dictator.
Theorem 14.3 (Arrows impossibility theorem). For a finite set of voters and
at least three options, every social choice function that satisfies unanimity and
independence of irrelevant alternatives is a dictatorship.
Giving up on unanimity would be ridiculous, and a dictatorship would be equally
unacceptable. This means any reasonable voting system must not satisfy independence
of irrelevant alternatives.
Definition 14.4. A set of voters is decisive for A > B if whenever they all rank
A > B, so does society. They are decisive if they are decisive for A > B for all
options A and B.
Note that being decisive for A > B does not imply being decisive for B > A.
(I.e., being decisive for A > B is specific to not just these two options, but also their
ordering.)

5This version is not exactly the same as Arrows, but similar in spirit.

Figure 14.1. The voting pattern in the proof of Lemma 14.5: X votes A > C, X ∩ Y votes A > C > B, and Y votes C > B.

Figure 14.2. Because V \ X is not decisive for A > B, there is some voting pattern for X (shown as A ? B) that leads to B ≥ A overall; here X votes A ? B and V \ X votes A > B.

In a dictatorship, there is a decisive set of size one. More generally, having a


small decisive set means the voting system is an oligarchy.
Let V be the set of voters, and suppose there are at least three options. We will
prove Arrows theorem via a sequence of lemmas. Suppose the social choice function
satisfies unanimity and independence of irrelevant alternatives.
Lemma 14.5. The decisive sets form a filter on V .
In the following proofs, when X is a set of voters we will say X votes for A > B
to mean that every voter in X does so.
Proof. By unanimity, V is decisive and ∅ is not. Furthermore, decisive sets are
clearly upwards closed. All that remains is to check closure under finite intersections.
Suppose X and Y are decisive, and that X ∩ Y votes for A > B. We would like to
show that society therefore ranks A > B.
To check this, we will change how X and Y rank a third option C. By indepen-
dence of irrelevant alternatives, moving C around is harmless, so we can do this
without loss of generality. Suppose X votes for A > C and Y votes for C > B; their
intersection X ∩ Y must therefore vote for A > C > B. (See Figure 14.1.)
Now because X is decisive and votes for A > C, society ranks A > C. Similarly,
because Y is decisive and votes for C > B, society ranks C > B. It follows that
society ranks A > B, as desired, and this must happen whenever X ∩ Y ranks
A > B because C is irrelevant. This holds for all pairs, so X ∩ Y is decisive. □

Lemma 14.6. If V \ X is not decisive for A > B, then X is decisive for B > C
(for all C ≠ A) and for C > A (for all C ≠ B).
Proof. We will prove just the first case (the second is completely analogous). Suppose
C ≠ A. Because V \ X is not decisive for A > B, there exists a scenario in which
V \ X votes for A > B but society thinks B ≥ A (see Figure 14.2). Now imagine
changing the rankings of C, which will not change the outcome for A vs. B by
independence of irrelevant alternatives. Suppose V \ X votes for A > C, while X

Figure 14.3. Now A > C by unanimity and B ≥ A as before; here X votes A ? B, A > C, and B > C, while V \ X votes A > B and A > C.

votes for A > C and B > C (see Figure 14.3). Society still ranks B ≥ A, and now
A > C by unanimity, and hence society ranks B > C. On the other hand, the
position of A is irrelevant for this ranking, and the only part that did not refer to
A was that X votes for B > C. Thus, X must be decisive for B > C whenever
C ≠ A. □

Lemma 14.7. If X is decisive for A > B, then V \ X is not decisive for B > A.
Proof. What would happen if the voters in X all voted for A > B and those in
V \ X all voted for B > A? They can't both win, so at most one of X and V \ X
can be decisive in this situation. □

Lemma 14.8. For all X ⊆ V, either X or V \ X is decisive.


Proof. If V \ X is decisive, then we are done. Otherwise, V \ X is not decisive
for A > B for some A and B. Then by Lemma 14.6, X is decisive for B > (≠A)
and (≠B) > A, where we use the shorthand ≠A to stand for any option other
than A. By Lemma 14.7, V \ X is not decisive for (≠A) > B and A > (≠B).
Finally, Lemma 14.6 tells us that X is decisive for B > (≠≠A), (≠B) > (≠A),
and (≠≠B) > A. Here ≠≠A means anything that is different from something that
is different from A, so it includes all options. Thus, these three possibilities cover
all comparisons, and X is decisive. □

It follows that the decisive subsets of V form an ultrafilter. Now Arrow's
impossibility theorem follows immediately from the fact that every ultrafilter on
a finite set is principal. By contrast, infinite sets have non-principal ultrafilters,
and we will see in the next section how to use such ultrafilters to decide first-order
sentences by voting.

15. Ultraproducts
Fix a first-order language, and consider structures A_i for i ∈ I. How can we
combine them to form a single structure? One obvious way is to try to form a product
A := ∏_{i∈I} A_i. We can take it to have domain ∏_{i∈I} A_i, and we can define constants
and functions in the obvious way: c^A = (c^{A_i})_{i∈I}, f^A((a_i)_{i∈I}) = (f^{A_i}(a_i))_{i∈I} for
unary functions, etc. However, it is not clear how to handle relations. Furthermore,
products have poor first-order properties: given a set of first-order sentences, its
models are generally not closed under taking products.
Łoś's theory of ultraproducts resolves all these difficulties. The idea is that if
we want to tell what's true in an ultraproduct, we let the factors vote using an
ultrafilter.

Definition 15.1. Let A_i be a structure for a fixed first-order language, for i ∈ I,
and let U be an ultrafilter on I. The ultraproduct B := ∏_{i∈I} A_i/U has as its domain
the equivalence classes of ∏_{i∈I} A_i under ∼, where x ∼ y means {i ∈ I : x_i = y_i} ∈ U.
Let [x] denote the equivalence class of x. For each constant symbol c, c^B is defined
to be [(c^{A_i})_{i∈I}]. We apply functions pointwise via
f^B([x^1], . . . , [x^n]) = [(f^{A_i}(x^1_i, . . . , x^n_i))_{i∈I}],
and we define the relations by R^B([x^1], . . . , [x^n]) iff {i ∈ I : R^{A_i}(x^1_i, . . . , x^n_i)} ∈ U.
Note that the superscripts on x^1, . . . , x^n are intended to avoid confusion with the
subscripts that specify a factor in the product.
For the definition of the ultraproduct to make sense, we need to check several
things. First, that ∼ is an equivalence relation. This follows immediately
from the definition of an ultrafilter: it is reflexive because I ∈ U, it is symmetric
because equality is symmetric, and it is transitive because U is closed under intersection
and is upwards closed.6 Next, we must check that the functions are well
defined. If x^1 ∼ y^1, . . . , x^n ∼ y^n, then we must show that (f^{A_i}(x^1_i, . . . , x^n_i))_{i∈I} ∼
(f^{A_i}(y^1_i, . . . , y^n_i))_{i∈I}. To see why, note that

{i ∈ I : f^{A_i}(x^1_i, . . . , x^n_i) = f^{A_i}(y^1_i, . . . , y^n_i)} ⊇ ⋂_{j=1}^{n} {i ∈ I : x^j_i = y^j_i} ∈ U.

Finally, relations are well defined for essentially the same reason.
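To see the shape of these definitions in code, here is a toy sketch. Any ultrafilter we can write down explicitly on a finite index set is principal, so the sketch uses the principal ultrafilter generated by a single index k, which (as remarked below) simply recovers the k-th factor; all names are my own.

I = range(5)
k = 3

def in_U(winners):
    # The principal ultrafilter generated by {k}: a set of indices "wins"
    # exactly when it contains k.
    return k in winners

def equivalent(x, y):
    # x ~ y: the factors vote on equality of the coordinates.
    return in_U({i for i in I if x[i] == y[i]})

def relation_holds(R_per_factor, x, y):
    # R^B([x], [y]): the factors vote on whether R_i(x_i, y_i) holds.
    return in_U({i for i in I if R_per_factor[i](x[i], y[i])})

x = [0, 1, 2, 3, 4]
y = [9, 9, 9, 3, 9]
print(equivalent(x, y))                                  # True: x and y agree at index k
print(relation_holds([lambda a, b: a <= b] * 5, x, y))   # True: 3 <= 3 at index k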
The intuition behind ultraproducts is that for every statement, we let the factors
vote on whether it should be true. Whichever side forms a set in the ultrafilter
wins. Note that having an ultrafilter, as opposed to a filter, is crucial: it means that
exactly one of and will win, since they correspond to complementary sets of
voters.
Note that the equivalence relation ∼ is just voting on equality: to decide whether
[x] = [y], let the factors vote. Similarly, the definition of R^B also amounts to taking
a vote. The fundamental theorem of ultraproducts says that once we have set up
the definition, the same is true for all wffs:
Theorem 15.2 (Fundamental theorem of ultraproducts). Let B = ∏_{i∈I} A_i/U be
an ultraproduct. Suppose φ(x^1, . . . , x^n) is a wff with free variables x^1, . . . , x^n and
a^1, . . . , a^n ∈ ∏_{i∈I} A_i. Then
B |= φ([a^1], . . . , [a^n]) iff {i ∈ I : A_i |= φ(a^1_i, . . . , a^n_i)} ∈ U.
Proof. This assertion is true by construction for atomic wffs, and we will prove it for
general wffs by induction on formula length. Let S_φ = {i ∈ I : A_i |= φ(a^1_i, . . . , a^n_i)}.
We can reduce everything to the case of the logical symbols ¬, ∧, and ∃ by writing
everything else in terms of them. For ¬, note that S_{¬ψ} = I \ S_ψ, while X ∈ U if
and only if I \ X ∉ U. For ∧, note that S_{ψ1∧ψ2} = S_{ψ1} ∩ S_{ψ2}, which is in U if and
only if both S_{ψ1} and S_{ψ2} are, since U is both closed under intersection and upwards
closed. Finally, suppose φ = ∃x^0 ψ(x^0, . . . , x^n). Then B |= φ([a^1], . . . , [a^n]) iff there
exists a^0 ∈ ∏_{i∈I} A_i such that B |= ψ([a^0], . . . , [a^n]). By induction, the latter holds
iff there exists a^0 ∈ ∏_{i∈I} A_i such that {i ∈ I : A_i |= ψ(a^0_i, . . . , a^n_i)} ∈ U. Finally,
that is equivalent to {i ∈ I : A_i |= ∃x^0 ψ(x^0, a^1_i, . . . , a^n_i)} ∈ U, as desired. □

6If it is not obvious why upwards closure is relevant, you should pause and think about this.

Corollary 15.3. Let Σ be a set of first-order sentences. Then the ultraproduct of
any models of Σ is again a model of Σ.

Ultraproducts with respect to principal ultrafilters are boring: they simply


recover one of the factors (much as a dictatorship merely reproduces the dictator's
opinions). Thus, ultraproducts do not give an interesting way to combine finitely
many structures. However, ultraproducts with respect to a non-principal ultrafilter
are remarkably interesting.
As a first example, consider the standard model N of Peano arithmetic. Let I
be the set of natural numbers (it will be clearer to use different notation for the
index set and the model itself), and let U be a non-principal ultrafilter on I. We
can form the ultraproduct

∏_{i∈I} N/U.

Is it just N again? No, it is a non-standard model of Peano arithmetic. To see
why, note that it contains a standard model, namely the constant elements [(n)_{i∈I}]
for n ∈ N. On the other hand, [(i)_{i∈I}] is an infinite element of the ultraproduct.
Specifically, for each n ∈ N, {i ∈ I : i > n} is cofinite and hence in the ultrafilter
(by Proposition 13.11), so [(i)_{i∈I}] > [(n)_{i∈I}].
We can carry out a similar construction for the real numbers. Fix a non-principal
ultrafilter U on the index set I of natural numbers, and consider the field R of
real numbers with constants 0 and 1, functions + and ·, and binary relation ≤. We
call ∗R = ∏_{i∈I} R/U the set of hyperreal numbers. We identify the real number x
with the equivalence class [(x)_{i∈I}] of the constant sequence, so ∗R is an extension
of R. This field has exactly the same first-order properties as R, but it contains
infinite elements, such as [(i)_{i∈I}], and infinitesimal elements, such as the reciprocal
of [(i)_{i∈I}].
In fact, we can extend subsets and functions from R to ∗R much more broadly.
For each subset S ⊆ R, the corresponding internal set ∗S ⊆ ∗R is defined by
[(x_i)_{i∈I}] ∈ ∗S iff {i ∈ I : x_i ∈ S} ∈ U. A function f : R → R extends to
∗f : ∗R → ∗R via ∗f([(x_i)_{i∈I}]) = [(f(x_i))_{i∈I}]. It is not hard to check that ∗S and ∗f
are well defined. Thus, we automatically get hyperreal versions of familiar functions.
For example, we have a hyperreal sine function ∗sin and exponential function ∗exp.
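As a sketch of how the pointwise extension acts on representatives (we cannot compute with the equivalence classes themselves, since comparing them would require consulting the ultrafilter U explicitly), the hyperreal extension of a function just applies it coordinate by coordinate; the code and names here are illustrative only.

import math

def star(f):
    # Apply f coordinate by coordinate to a representative sequence.
    return lambda xs: [f(x) for x in xs]

star_sin = star(math.sin)
omega_rep = list(range(1, 11))            # first few terms of a representative of [(i)]
epsilon_rep = [1 / n for n in omega_rep]  # a representative of its infinitesimal reciprocal
print(star_sin(epsilon_rep)[:3])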
The beauty of the hyperreal numbers is that we can now do calculus using
infinitesimals. This is called non-standard analysis. For example, let's write
x ≈ y if x − y is infinitesimal (i.e., −ε < x − y < ε for all positive real numbers
ε). Then one can show that a function f : R → R is continuous at a real number x
if and only if its hyperreal extension ∗f has the following property: for all y ∈ ∗R,
if y ≈ x then ∗f(y) ≈ ∗f(x). In other words, continuity amounts to saying if y
is infinitely close to x, then f(y) is infinitely close to f(x). This justifies the
naive calculations Newton and Leibniz did with infinitesimals when setting up the
foundations of calculus, but it was not until the work of Abraham Robinson in the
1960s that anyone realized it was possible to do this rigorously (rather than taking
the usual ε-δ approach).
In these examples of non-standard models of N or R, we have taken an ultraproduct
of copies of the same structure (i.e., an ultrapower ). Of course we can also take
ultraproducts of different structures. For example, in first-order graph theory let
Cn denote a chain of n vertices linearly connected. What can we say about the

structure
∏_{n≥1} C_n/U

with respect to a non-principal ultrafilter U on N?


(1) All but finitely many of these chains have size at least k, and therefore the
ultraproduct has size at least k, by the fundamental theorem of ultraproducts
(a sentence expressing this is written out after this list). Thus, it is infinite.
(2) For all n ≥ 2, C_n has two vertices of degree 1 and all others of degree 2.
Hence the same is true for the ultraproduct.
(3) For each k, all but finitely many chains Cn have their degree 1 vertices
at distance greater than k apart. For each fixed k, having the degree 1
vertices be at distance greater than k can be expressed by a first-order
sentence. Thus, there is no finite path between the vertices of degree 1
in the ultraproduct (since for each k, there is no path of length k). In
particular, the ultraproduct is not connected.
(4) The ultraproduct contains no cycles, because for each k none of its factors
contains a cycle of length k.
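To make observation (1) concrete, "there are at least k elements" is expressed by the first-order sentence
∃x_1 · · · ∃x_k (¬(x_1 = x_2) ∧ ¬(x_1 = x_3) ∧ · · · ∧ ¬(x_{k−1} = x_k)),
which holds in C_n whenever n ≥ k, hence in all but finitely many of the factors, hence in the ultraproduct. Observation (3) is similar: for a fixed k, "the two vertices of degree 1 are joined by a path of length at most k" is a finite disjunction (over path lengths) of finite conjunctions of edge relations, so it is a first-order sentence, and its negation holds in all but finitely many of the C_n.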
Note that we are not using any special properties of the ultrafilter beyond the
fact that it contains all cofinite sets. More sophisticated arguments sometimes make
use of special ultrafilters, but the only time we will do so will be in the next proof.
Using ultraproducts, we can finally prove the compactness theorem:

Theorem 15.4 (Compactness). Let Σ be a set of first-order sentences. If every
finite subset of Σ has a model, then so does Σ itself.

Proof. Let S be the set of finite subsets of Σ, and for each T ∈ S, let A_T be a model
of T. We will choose an ultrafilter U on S so that the ultraproduct
∏_{T∈S} A_T/U
is a model of Σ.
Given a suitable ultrafilter U on S (with properties to be specified below), let
A = ∏_{T∈S} A_T/U. We wish to ensure that A ⊨ φ for all φ ∈ Σ. By the fundamental
theorem of ultraproducts, A ⊨ φ if and only if {T ∈ S : A_T ⊨ φ} ∈ U. Thus, we
need the sets {T ∈ S : A_T ⊨ φ} to be in U for each φ ∈ Σ.
Let S_φ = {T ∈ S : A_T ⊨ φ}. These sets have the finite intersection property,
because {φ_1, . . . , φ_n} ∈ S_{φ_1} ∩ · · · ∩ S_{φ_n} and hence S_{φ_1} ∩ · · · ∩ S_{φ_n} ≠ ∅. Thus, there
exists an ultrafilter on S that contains S_φ for each φ ∈ Σ (by Lemmas 13.5 and 13.9
and Proposition 13.10). If U is such an ultrafilter, then ∏_{T∈S} A_T/U is a model of
Σ, as desired. ∎

I find this construction absolutely remarkable. We begin with potentially different


models for all the finite subsets of Σ, and we need to piece them together. To do so,
we simply set up a voting scheme and let them decide democratically on the model
of Σ. All we need to ensure is that they vote in favor of all the sentences in Σ, and
that is arranged by our choice of ultrafilter.

16. Formal proofs in first-order logic


In this section, we will develop a formal proof system for first-order logic. It will
be similar to the proof system for the propositional calculus, but much deeper and
more useful (because in first-order logic we can't simply fall back on truth tables).
As in Section 6, we will throw out many symbols. In particular, we will replace
¬, ∧, ∨, ↔, and ∃ with equivalent expressions in terms of →, ⊥, and ∀:
¬φ ≡ (φ → ⊥)
(φ ∧ ψ) ≡ ¬(φ → ¬ψ)
(φ ∨ ψ) ≡ (¬φ → ψ)
(φ ↔ ψ) ≡ ((φ → ψ) ∧ (ψ → φ))
∃x φ ≡ ¬∀x ¬φ
Strictly speaking, we haven't used ⊥ as part of first-order logic, but we can add it
as a special atomic wff that is never satisfied.
Proofs will consist of certain finite sequences of wffs. We will allow free variables,
with the interpretation that there are implicit universal quantifiers on them. This is
much like informal proofs, in which you can introduce a variable to stand for an
arbitrary element, manipulate it freely, and then announce that you have proved a
universally quantified sentence about that variable.
We will keep the same five axioms as before. For all first-order wffs φ, ψ, and χ
(possibly with free variables), each of the following will be an axiom:
(1) (φ → φ)
(2) (φ → (ψ → φ))
(3) ((φ → ψ) → ((φ → (ψ → χ)) → (φ → χ)))
(4) ((φ → ⊥) → (φ → ψ))
(5) (((φ → ⊥) → ⊥) → φ)
In addition, we will add four new axioms:
(6) (∀x (φ → ψ) → (φ → ∀x ψ)), whenever φ and ψ are wffs, x is a variable,
and x is not free in φ
(7) ∀x φ(x) → φ(y), whenever φ(x) is a wff with free variable x (and possibly
others), y is a variable or constant, and no free occurrence of x in φ(x) is
within the scope of a ∀y
(8) (t = t), whenever t is a term
(9) ((x = y) → (φ → ψ)), whenever x and y are variables or constants, φ and ψ
are wffs, and ψ is obtained by substituting y for some free occurrences of
x in φ
Note that in Axiom 7, the φ(y) in ∀x φ(x) → φ(y) is outside the scope of the ∀x.
Otherwise, it would have been written ∀x (φ(x) → φ(y)).
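As an aside, here is a standard illustration (not spelled out in the notes) of why the restriction in Axiom 7 is needed. Take φ(x) to be ∃y ¬(x = y), which holds of every x in any structure with at least two elements. Substituting the variable y for x without the restriction would produce ∀x ∃y ¬(x = y) → ∃y ¬(y = y), whose conclusion is never satisfied. The restriction rules this out: once ∃ is unfolded as ¬∀¬, the free occurrence of x in φ(x) lies within the scope of a ∀y, so y is not a permissible substitute.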
In addition to modus ponens, we will have a rule of inference called generalization:
if we have proved a wff φ, then we can deduce ∀x φ for any variable x.
Definition 16.1. Let φ be a sentence and Σ a set of sentences. A formal proof
of φ given Σ is a finite sequence φ_1, . . . , φ_n of wffs with φ_n = φ, such that each
of φ_1, . . . , φ_n is an axiom, is an element of Σ, or follows from previous wffs in the
sequence using modus ponens or generalization. We write Σ ⊢ φ if there is a formal
proof of φ given Σ, and we write ⊢ φ if there is a formal proof of φ given ∅.
Definition 16.2. A set Σ of sentences is inconsistent if Σ ⊢ ⊥, and consistent
otherwise.

Lemma 16.3 (Soundness). If Σ ⊢ φ, then Σ ⊨ φ.
Proof. We just need to prove by induction that for every wff ψ in a proof using
hypotheses Σ, every model of Σ satisfies the universal quantification of ψ. (I.e.,
if there are any free variables, insert corresponding universal quantifiers at the
beginning of the formula.) Modus ponens and generalization preserve this property,
and the axioms all have the property; this is enough to complete the proof. ∎
Corollary 16.4. Every satisfiable set of wffs is consistent.
Lemma 16.5 (Deduction). Let φ and ψ be sentences and Σ a set of sentences. If
Σ ∪ {φ} ⊢ ψ, then Σ ⊢ (φ → ψ).
Proof. Practically the same proof as for Lemma 6.5 works. Recall that we take a
formal proof ψ_1, . . . , ψ_n of ψ and insert "φ →" before each step. Of course we need to
insert a few additional steps of justification for each step of the new proof. Exactly
the same justification as before works for all the cases analyzed in the proof of
Lemma 6.5, and the only new case we need to deal with is generalization. Suppose
i < j, and ψ_j is derived from ψ_i by generalization (so ψ_j = ∀x ψ_i). Then we want
to derive (φ → ψ_j) from (φ → ψ_i). By generalization, we can derive ∀x (φ → ψ_i)
from (φ → ψ_i). Then Axiom 6 tells us that ∀x (φ → ψ_i) → (φ → ∀x ψ_i), because
φ is a sentence and thus has no free variables. By modus ponens, we can deduce
(φ → ∀x ψ_i), which is what we wanted. ∎
Let's give a couple of examples of formal proofs. First, (∀x ¬φ → ¬∃x φ). When
we remove ¬ and ∃, this becomes
(∀x (φ → ⊥) → ((∀x (φ → ⊥) → ⊥) → ⊥)).
In other words, it's a special case of
(ψ → ((ψ → ⊥) → ⊥)),
with ψ = ∀x (φ → ⊥). One way to see that this is provable is to use the completeness
theorem for propositional calculus (which says we can prove any propositional
tautology; note that we can imitate the propositional calculus proofs in first-order
logic, since all the same axioms still apply). We could also see it using the deduction
lemma: {ψ, (ψ → ⊥)} ⊢ ⊥, and then we can apply the deduction lemma twice.
For a more subtle example, let's prove (¬∀x φ → ∃x ¬φ), which amounts to
((∀x φ → ⊥) → (∀x ((φ → ⊥) → ⊥) → ⊥)).
We will show that {∀x φ → ⊥, ∀x ((φ → ⊥) → ⊥)} ⊢ ⊥, after which applying the
deduction lemma twice implies
⊢ ((∀x φ → ⊥) → (∀x ((φ → ⊥) → ⊥) → ⊥)).
The formal proof of {∀x φ → ⊥, ∀x ((φ → ⊥) → ⊥)} ⊢ ⊥ is as follows:
(1) ∀x φ → ⊥
(2) ∀x ((φ → ⊥) → ⊥)
(3) ∀x ((φ → ⊥) → ⊥) → ((φ → ⊥) → ⊥)
(4) ((φ → ⊥) → ⊥)
(5) ((φ → ⊥) → ⊥) → φ
(6) φ
(7) ∀x φ
(8) ⊥

Lines 1 and 2 are hypotheses, and line 3 is a case of Axiom 7. Now line 4 follows
by modus ponens, line 5 is Axiom 5, and line 6 again uses modus ponens. Finally,
line 7 follows by generalization and line 8 by modus ponens.
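For readers who like to see this bookkeeping mechanized, here is a minimal sketch (in Python, not part of the notes; all names are ad hoc) of a checker for the two rules of inference. Wffs are coded as nested tuples, with ('->', a, b) for an implication and ('forall', x, a) for a universal quantification; instances of the axioms and the hypotheses are simply passed in as a set of wffs that are accepted outright, since recognizing axiom instances is routine pattern matching omitted here.

def check_proof(steps, accepted):
    proved = []
    for wff in steps:
        ok = wff in accepted
        # modus ponens: from a and ('->', a, wff), infer wff
        ok = ok or any(('->', a, wff) in proved for a in proved)
        # generalization: from a, infer ('forall', x, a) for a variable x
        ok = ok or (len(wff) == 3 and wff[0] == 'forall' and wff[2] in proved)
        if not ok:
            return False
        proved.append(wff)
    return True

def neg(a):                  # "not a" abbreviates (a -> bot)
    return ('->', a, ('bot',))

# The eight-step proof above, with phi taken to be an atomic wff P(x).
phi = ('P', 'x')
hyp1 = neg(('forall', 'x', phi))               # step (1)
hyp2 = ('forall', 'x', neg(neg(phi)))          # step (2)
ax7 = ('->', hyp2, neg(neg(phi)))              # step (3), an instance of Axiom 7
ax5 = ('->', neg(neg(phi)), phi)               # step (5), an instance of Axiom 5
steps = [hyp1, hyp2, ax7, neg(neg(phi)), ax5, phi, ('forall', 'x', phi), ('bot',)]
print(check_proof(steps, {hyp1, hyp2, ax7, ax5}))   # prints True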
Lemma 16.6. Let φ be a sentence and Σ a set of sentences.
(1) If Σ is consistent and Σ ⊢ φ, then Σ ∪ {φ} is consistent.
(2) If Σ ∪ {φ} is inconsistent, then Σ ⊢ (φ → ⊥).
(3) If Σ ∪ {(φ → ⊥)} is inconsistent, then Σ ⊢ φ.
Corollary 16.7. Let φ be a sentence and Σ a set of sentences. If Σ is consistent,
then at least one of Σ ∪ {φ} and Σ ∪ {(φ → ⊥)} is consistent (possibly both).
The proofs are the same as those of Lemma 6.6 and Corollary 6.7.
Theorem 16.8 (Completeness). Let Σ be a set of sentences and φ a sentence. If
Σ ⊨ φ, then Σ ⊢ φ.
Corollary 16.9. If a set of wffs is consistent, then it is satisfiable.
To see why the corollary follows from the completeness theorem, note that if Σ is
not satisfiable, then every model for Σ satisfies ⊥ (because Σ has no models), and
thus Σ ⊢ ⊥. In fact, the theorem also follows from the corollary: if every model of Σ
satisfies φ, then Σ ∪ {(φ → ⊥)} is not satisfiable. By the corollary, it is inconsistent,
so part (3) of Lemma 16.6 implies that Σ ⊢ φ.
Thus, it suffices to prove the corollary. To do so, we must somehow use consistency
to produce a model. We will face the same difficulties as in the proof of the
completeness theorem for propositional calculus, and we will deal with them in the
same way, with one additional trick known as Henkinization.
Lemma 16.10. Let Σ be a consistent set of sentences, φ(x) a wff with one free
variable, and c a constant symbol that does not occur anywhere in Σ or φ. If
Σ ⊢ ∃x φ(x), then Σ ∪ {φ(c)} is consistent.
Proof. If Σ ∪ {φ(c)} is inconsistent, then Σ ⊢ (φ(c) → ⊥). Suppose y is a variable
that does not occur anywhere in φ(x) or the formal proof of (φ(c) → ⊥) from
Σ. If we replace c with y everywhere it occurs in the proof, then we find that
Σ ⊢ (φ(y) → ⊥) and hence, by generalization, Σ ⊢ ∀y (φ(y) → ⊥).
Now by Axiom 7, Σ ⊢ ∀y (φ(y) → ⊥) → (φ(x) → ⊥), because no occurrence of
y in φ(y) can be within the scope of a ∀x. (Recall that every occurrence of y in
φ(y) was a free occurrence of x in φ(x) and hence was not bound by any ∀x.) Thus,
Σ ⊢ (φ(x) → ⊥) by modus ponens, and hence Σ ⊢ ∀x (φ(x) → ⊥) by generalization.
Finally, one of our hypotheses is that Σ ⊢ ∃x φ(x). In other words, Σ ⊢
(∀x (φ(x) → ⊥) → ⊥). When combined with Σ ⊢ ∀x (φ(x) → ⊥), we find that
Σ ⊢ ⊥. Thus, if Σ is consistent, then Σ ∪ {φ(c)} is also consistent. ∎
Proof of the completeness theorem. Suppose Σ is a consistent set of wffs. We would
like to build a model of Σ, but we have nothing to work with except syntax. Henkin's
brilliant idea was that we can build the model out of pure syntax, if we expand our
language enough through a process now called Henkinization.
We would like to expand Σ to a new set Σ′, in an expanded language, with the
following two properties:
(1) The set Σ′ is a maximal consistent set of wffs. I.e., Σ′ is consistent but not
contained in any larger consistent set (in the same language).

(2) Whenever Σ′ ⊢ ∃x φ(x), there is some constant symbol c such that Σ′ ⊢ φ(c).
In other words, Σ′ assigns a name to everything it proves exists.
Either of these alone is easy. To get a maximal consistent set, we can just test
each wff sequentially and add it to the set if it is consistent with what came before.
This is exactly what we did in the proof of Theorem 6.8. One minor difference is
that the language might not be countable, but that's not a problem (we can just
well-order the language and use transfinite induction).
Similarly, we can also ensure that the second property holds. For each existential
statement ∃x φ(x) implied by Σ, we introduce a new constant symbol c (never before
used). By Lemma 16.10, Σ ∪ {φ(c)} is consistent. Henkinization means extending
the language and set of sentences in this way. If we cycle through all the existential
statements, we can deal with all of them this way. Of course, by enlarging the
language we have created new existential statements, so there's an infinite regress
here, but after infinitely many cycles of Henkinization we arrive at a fully Henkinized
set of sentences.
Either of these two processes can disrupt the other. If we extend the set of
sentences to make a maximal consistent set, then we may increase its consequences,
leaving it no longer Henkinized. Furthermore, Henkinizing enlarges the language,
after which the set of sentences may no longer be maximal. However, we can achieve
both properties by iteration: let Σ_0 be Σ, let Σ_{2i+1} be a maximal consistent set
containing Σ_{2i}, and let Σ_{2i} be the Henkinization of Σ_{2i−1} when i ≥ 1. Then the
union Σ′ of all these sets is a maximal consistent, fully Henkinized set of sentences.
To see why, let L_i denote the language of Σ_i, and L′ that of Σ′ (i.e., the set of
constant symbols in L′ is the union of those for the L_i). Any wff for the language L′
involves only finitely many constant symbols and must thus be a wff for L_i for some
i. Thus, every wff for L′ was already dealt with at some finite stage of this process,
which means Σ′ is both maximal consistent and fully Henkinized.
Now we can make a structure A out of our language. Its domain A will be the
set of equivalence classes [c] of constant symbols c under the equivalence relation
c ∼ d, meaning (c = d) ∈ Σ′. (Note that φ ∈ Σ′ if and only if Σ′ ⊢ φ, because Σ′ is
a maximal consistent set.) In other words, having given names to everything that
might exist, we take them to be the elements of our domain, after setting them
equal if Σ′ thinks they are equal. Note that Axioms 8 and 9 imply that ∼ is indeed
an equivalence relation.
We set c^A = [c], and we define the relations R^A in A by
R^A([c_1], . . . , [c_n]) iff R(c_1, . . . , c_n) ∈ Σ′.
Note that Axiom 9 implies that these relations are well defined. To get functions,
we can use the following trick: given an n-ary function symbol f and constants
c_1, . . . , c_n, Σ′ ⊢ ∃x (x = f(c_1, . . . , c_n)). (Of course such an x should exist, namely
x = f(c_1, . . . , c_n), and one can check that Σ′ proves this.) Thus, Σ′ ⊢ (c_0 =
f(c_1, . . . , c_n)) for some constant c_0, and we can set [c_0] to be f^A([c_1], . . . , [c_n]). It is
not hard to check that this is well defined.
So far, we have built a structure A. We can now prove by induction on φ that
A ⊨ φ iff φ ∈ Σ′.
The base case is atomic formulas, which follow from the construction of A. The cases
φ = ⊥ and φ = (ψ_1 → ψ_2) are handled exactly as in the proof of Theorem 6.8.
Finally, suppose φ = ∀x ψ(x). Because Axiom 7 says Σ′ ⊢ (∀x ψ(x) → ψ(c)) for

every constant symbol c, we see that Σ′ ⊢ ∀x ψ(x) implies ψ(c) ∈ Σ′ for all constants
c, as desired. By contrast, if Σ′ ⊢ ¬∀x ψ(x), then Σ′ ⊢ ∃x ¬ψ(x) (recall that we
proved (¬∀x φ → ∃x ¬φ) as an example of a formal proof), and hence Σ′ ⊢ ¬ψ(c)
for some constant c. ∎
As discussed in Section 10, compactness is an immediate corollary of completeness:
any proof of a contradiction will have finite length and thus involve only finitely
many hypotheses. However, completeness is of fundamental importance in its own
right. It gives us a precise analysis of what provability means, which opens the
door to limitative results such as the incompleteness theorems. (Without a precise
definition of provability, there is no way to establish unprovability.)

17. Introduction to incompleteness


Gödel's incompleteness theorems are two of the most famous and striking theorems
in mathematical logic. The first says, roughly, that no axiomatization of mathematics
(of set theory, or some similarly comprehensive theory) can be complete. The second
says that if an axiomatization of mathematics is consistent, then it cannot prove its
own consistency.
From a naive point of view, the first incompleteness theorem sounds shocking,
and the second silly. It's surprising and distressing that formalized mathematics
will always be incomplete, but why would one care whether it can prove its own
consistency? After all, even if it could prove its own consistency, why would one
believe it? An inconsistent system can prove its own consistency just as easily as
a consistent one could. One doesn't trust someone just because he or she claims
not to be lying, and using mathematics to prove its own consistency sounds equally
useless.
In fact, the second incompleteness theorem is even deeper and more surprising
than the first, and is of substantial interest. From a modern point of view, the
first incompleteness theorem is exactly what one would hope for, and even expect:
mathematics is a broad and rich subject, and it would be surprising and distressing
if one could distill it to a collection of formal rules; complete formal theories lack
depth. The incompleteness of mathematics ensures that we will never finish doing
mathematics. Of course, even if one believes that formalized mathematics must be
incomplete, it is still far from obvious how one would prove it, and it took deep
insight on Gödel's part to do so. Nevertheless, the fact of incompleteness should
not be surprising.
The second incompleteness theorem is more subtle. Its historical importance
comes from Hilbert's program, an ambitious approach to mathematical logic laid
out by Hilbert. In the early 20th century, some mathematicians were upset by the
emergence of non-constructive existence proofs, proofs of an object's existence
that give no hint as to how to construct such an object. The most famous example is
Hilbert's basis theorem. It had long been an open problem whether certain algebraic
systems always had a basis. Hilbert proposed a brilliant proof that a basis always
exists, but his proof gave no method for actually finding one. Gordan, another
researcher in invariant theory, announced that this was not mathematics, but rather
theology.
It did not take long for most mathematicians to become comfortable with non-
constructive proofs. For example, Gordan eventually admitted that theology also
had its merits. Nowadays essentially all mathematicians believe non-constructive

proofs. Some have a preference for constructive proofs, because they give more
information, but almost nobody is skeptical about the correctness of non-constructive
proofs. However, earlier in the century there were prominent doubters. Hilbert
proposed the following plan (Hilbert's program) for convincing everyone that
non-constructive proofs were harmless:
(1) Formalize all of mathematics.
(2) Identify an absolutely uncontroversial part of mathematics, the most finite
and constructive part. Nobody should doubt any of its techniques.
(3) Prove the consistency of all of mathematics (including non-constructive
methods) using only the uncontroversial part of mathematics.
If this plan could have been carried out, it would have shown that non-constructive
methods were harmless: they could not possibly contradict the results of construc-
tive mathematics. However, Gödel's incompleteness theorems show that Hilbert's
program cannot be carried out. The first rules out Step 1, but that is not a devastating
objection: we could simply replace "all of mathematics" with "all of known
mathematics." However, the second incompleteness theorem shows that Step 3
is impossible. Assuming our formalization is consistent, it cannot prove its own
consistency at all, let alone using only constructive techniques.

18. Intuitive Background


All Cretans are liars.
Epimenides the Cretan
"yields falsehood when appended to its own quotation" yields
falsehood when appended to its own quotation.
W.V.O. Quine

18.1. The Liar. The paradox of the liar, attributed to Epimenides, is best phrased
in the form "This sentence is false." That sentence cannot be true (since it claims to
be untrue), and therefore must be false. However, if it is false, then since it claims
to be false, it is false that it is false, and hence it is true after all. Logic is helpless
when confronted with this paradox.
Of course, the moral of this story is that one must never say "This sentence
is false," or at least if one does say it, one mustn't really mean it. It is neither
a true nor a false sentence, and is therefore meaningless. It is important to bar
self-contradictory sentences from mathematics, to keep from having to deal with
the resulting contradictions.

18.2. Russell's Paradox. At the turn of the century a contradiction did in fact
slip into the foundations of mathematics. Frege was a pioneer in mathematical logic,
and had devoted many years to formalizing mathematical reasoning. Just after
the second volume of his magnum opus was completed, he received a letter from
Bertrand Russell, pointing out the following paradox. Consider the set S defined by
S = {X | X ∉ X}.
In other words, S is the set of all sets that are not elements of themselves. That
is not as weird a property as it might seem at first. Most sets are not elements
of themselvese.g., the set of real numbers is not a real number. Unless one has
eccentric tastes in set theory, S should in fact consist of all sets. Russell asked

whether S was an element of itself. By the definition of S, we have S ∈ S if and
only if S ∉ S, which is a contradiction and cannot be tolerated in mathematics.
From a modern point of view, Russell's paradox simply amounts to a proof that
the set S cannot exist. However, Frege's formalization of mathematics contained
naive axioms implying that S did exist, which was a problem. It is very likely that
all such problems have been ironed out from the current foundations of mathematics,
but the possibility of a contradiction lurking somewhere deserves some thought and
attention.7
18.3. Implicit Self Reference. How could one rule out the liar paradox as a
mathematical statement? The most obvious way is to ban self reference, so that
one cannot say "this sentence." That seems like overkill, because there are plenty
of innocent self-referential sentences. ("This sentence refers to itself.") However,
better safe than sorry.
Modern mathematics does in fact rule out explicit self reference, but the startling
fact is that implicit self reference is possible. A good way for readers familiar with
computer programming to come to terms with implicit self-reference is to consider
the following puzzle:
Problem 18.1. Is there a self-contained computer program that prints out its own
source code?
The self-containment requirement is meant to rule out reading the source code
from disk or otherwise obtaining it externally. In typical programming languages it
at first seems impossible, as naive attempts lead to the infinite regress of
Print Print Print Print . . . .
Explicit self-reference ("Print myself.") is not allowed in ordinary programming
languages. Nevertheless, the problem is solvable, as we will see shortly.
18.4. Gödel's First Incompleteness Theorem. Implicit self reference was introduced
by Gödel in the proof of his first incompleteness theorem. He wanted to
prove that formalized mathematics cannot be complete, assuming it is consistent.
To do so, given any formalization F of mathematics, Gödel formulated a sentence
G saying, in effect, "G is not provable in F." This sentence cannot be provable,
since that would prove its unprovability, yet the existence of a proof of G would
prove its provability as well, which is a contradiction. Hence, G really is unprovable,
and it is therefore an unprovable truth. (One might wonder what it means to assert
that G is true; we will deal with that point later when we give a more careful proof
of the incompleteness theorem.) Of course, we could add G to the axioms of F to
obtain a new system F′, but even that would not suffice to prove the sentence "This
sentence is not provable in F′." Nothing we do can complete the formalization.
It is easy to say "This sentence is not provable in F," but it is not so easy to
come up with an equivalent statement of formalized mathematics. There are two
difficulties, namely formalizing "this sentence" and "is not provable in F." The
7Of course, technical flaws of this sort generally have little to do with the correctness of
important theorems. The reason why a theorem is true remains unchanged. For example, by
modern standards Euclid's proofs omit certain details, but his theorems are all true. However,
even if flaws in the foundations of mathematics would probably not lead to the collapse of all
human knowledge, they would still be quite annoying, because they would indicate that we had
not thought clearly enough about exactly what we mean when we talk about mathematics.

second difficulty, that of coming up with a precise description within F of what


it means to be provable, is one we have already dealt with. When we proved the
completeness theorem, we gave a precise mathematical definition of proof and we
showed that it captures exactly what we wanted it to.8
The first difficulty amounts to achieving implicit self reference. Explicit self-
reference is not allowed, so how could Gödel formulate his sentence? His construction
is best appreciated by studying the following version of the paradox of the liar,
which Quine used to elucidate Gödel's construction:
"yields falsehood when appended to its own quotation" yields
falsehood when appended to its own quotation.
Taking the quoted phrase and appending it to its own quotation reproduces Quine's
sentence, which asserts that this process yields a falsehood. In other words, Quine's
sentence claims to be false, which is equivalent to the liar paradox. The surprise
is that Quine's sentence never explicitly refers to itself, so it gets around the ban
on self reference. Gödel's construction is analogous, except of course that "yields
falsehood" is replaced by "is unprovable" (and everything is described in formal
mathematical terms).
At this point, one might worry that Gödel's trick of implicit self-reference undermines
mathematics. Why can't one use it to formalize the liar paradox, instead
of Gödel's innocuous variant? The subtlety does not lie in the self reference, but
rather in the other half of the sentence. In order to state "This sentence is false" in
formal mathematics, one would have to formalize the notion of falsehood. That's
quite different from formalizing the notion of unprovability. We know the form a
mathematical definition of provability takes: a statement is provable if it is the
conclusion of a proof, a finite list of formal statements that either are axioms or
follow from earlier statements using certain rules of inference. The details are
formidable, but we have examined them carefully. By contrast, how would one give
a formal mathematical definition of truth? No similar definition comes to mind, and
in fact Tarski observed that no formal definition can exist (precisely because one
would then be able to formulate the liar paradox).
Implicit self reference has many applications. For example, it gives a simple
solution to the puzzle of writing a program to print its own source code:
Print the following string followed by its quotation: "Print the
following string followed by its quotation:"
This English text effectively instructs the reader to copy it. The same idea works
in any programming language. For example, one implementation in Scheme (an
especially mathematically elegant programming language) is as follows:
((lambda (body) (list body (list (quote quote) body)))
(quote (lambda (body) (list body (list (quote quote) body)))))
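The same idea works in other languages too; for instance, here is one possible Python analogue (not from the notes), in which a string plays the role of the quotation and is printed applied to its own representation:

s = 's = {!r}\nprint(s.format(s))'
print(s.format(s))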

8The names "completeness theorem" and "incompleteness theorem" may seem contradictory,
but they refer to different aspects of formal mathematics. The underlying system of logic is complete:
given a set Σ of axioms and a sentence φ, one can prove φ from Σ if and only if φ is satisfied
by all models of Σ. In other words, everything that really follows from Σ is provable, and vice
versa. However, if we are trying to deal with a large enough branch of mathematics, then the set
of axioms will necessarily be incomplete: there will be sentences that can neither be proved nor
disproved using Σ.

19. Proofs of the Incompleteness Theorems


Our intuitive discussion of the first incompleteness theorem omitted certain details.
For example, what if one were to fix a structure A and take all true statements in
A as axioms? That would certainly be a complete theory, in apparent contradiction
to the incompleteness theorem. Of course, the problem with this approach is that
nobody can tell what is or isn't an axiom, so one cannot define provability because
one cannot even define what the axioms are. To understand Gödel's incompleteness
theorems fully, we need to clarify the exact hypotheses needed.
Suppose we have a set Σ of first-order sentences, which we take as our axioms.
We will use the notation of modal logic, and write □φ to mean "sentence φ is
provable." This operator has the highest precedence; thus, □φ → □□φ means "if φ
is provable, then it can be proved that φ is provable." We view □ not as an
addition to our language, but rather as an abbreviation for a formal definition
of provability. This means our language must be expressive enough to make this
definition. In practice, this means the language must be capable of describing string
manipulation, and the set Σ of axioms must be systematic enough that we can
specify the corresponding strings by a first-order formula. In other words, □φ will
be a first-order sentence asserting that the string φ has a formal proof.
Elementary plane geometry is not expressive enough to do this, and indeed it has
a complete set of axioms. By contrast, set theory can define just about anything
in mathematics, and it is certainly able to deal with wffs and provability. It turns
out that even Peano arithmetic is sufficiently expressive, but that requires some
coding tricks based on elementary number theory. For simplicity, we will focus on
set theory.
All our arguments will be based on the following three properties, Löb's elegant
formulation of the Hilbert-Bernays provability axioms:
(1) For all sentences φ, if Σ ⊢ φ, then Σ ⊢ □φ.
(2) For all sentences φ, Σ ⊢ (□φ → □□φ).
(3) For all sentences φ and ψ, Σ ⊢ (□(φ → ψ) → (□φ → □ψ)).
The first axiom is intuitively clear: to prove □φ, it suffices to exhibit a proof
of φ. The second axiom tells us that we can in fact prove this within our formal
system. (Notice that neither axiom implies the other: the first could be true but not
provable using Σ, or it could be provable yet false.) The third axiom is an obvious
property of implication.
If we take Σ to be the ZFC axioms and define □ using our definition of formal
proof, then all three Hilbert-Bernays axioms can be proved. The first and third are
not especially difficult, but the second is somewhat painful to check. Fortunately, it
is not needed until the second incompleteness theorem.
Note that Σ ⊢ φ and □φ are very different statements. The former is a statement
about Σ from the outside, while the latter is a first-order sentence in our language
(i.e., the system is talking about itself).

Definition 19.1. A Gödel sentence for Σ is a sentence ρ such that Σ ⊢ (ρ ↔ ¬□ρ).
In other words, a Gödel sentence is a sentence that is (provably) equivalent to its
own unprovability, so it captures the idea of saying "This sentence is unprovable." We
assume for now that such a sentence can be constructed using indirect self-reference.
In the next section, we will do so explicitly.

Definition 19.2. The axiom set Σ is complete if for every sentence φ, either Σ ⊢ φ
or Σ ⊢ ¬φ. It is consistent if Σ ⊬ ⊥, and it is ω-consistent if Σ ⊬ □⊥.
In other words, Σ is complete if everything can be proved or disproved, it is
consistent if it cannot prove a contradiction, and it is ω-consistent if it cannot
prove that a contradiction can be proved. (This is only a special case of the standard
definition of ω-consistency, but it will work for our purposes here.) Notice that if Σ
is ω-consistent, then it is consistent, by the first Hilbert-Bernays axiom. However,
the converse is false: just because one can prove that a contradiction is provable,
it does not follow that a contradiction actually is provable, as we will see once we
have proved the second incompleteness theorem.
Lemma 19.3. For every sentence φ, Σ ⊢ ((□φ ∧ □¬φ) → □⊥).
Proof. It is a tautology that ¬(φ ∧ ¬φ), so Σ ⊢ (φ → (¬φ → ⊥)). Then the first
Hilbert-Bernays axiom implies Σ ⊢ □(φ → (¬φ → ⊥)), and the third then yields Σ ⊢
(□φ → □(¬φ → ⊥)). When combined with Σ ⊢ (□(¬φ → ⊥) → (□¬φ → □⊥)), we
conclude that Σ ⊢ (□φ → (□¬φ → □⊥)). However, the wff (□φ → (□¬φ → □⊥))
is equivalent to ((□φ ∧ □¬φ) → □⊥), so Σ ⊢ ((□φ ∧ □¬φ) → □⊥), as desired. ∎

Theorem 19.4 (First incompleteness theorem). Suppose ρ is a Gödel sentence for
Σ. If Σ ⊢ ρ, then Σ ⊢ ⊥. If Σ ⊢ ¬ρ, then Σ ⊢ □⊥. Thus, if Σ is ω-consistent, then
ρ is independent of Σ and hence Σ is incomplete.
Proof. If Σ ⊢ ρ, then Σ ⊢ ¬□ρ because Σ ⊢ (ρ ↔ ¬□ρ) (by the definition of
a Gödel sentence), and the first Hilbert-Bernays axiom says that Σ ⊢ ρ implies
Σ ⊢ □ρ. However, using both □ρ and ¬□ρ we can deduce that Σ ⊢ ⊥.
If Σ ⊢ ¬ρ, then Σ ⊢ □ρ by the definition of a Gödel sentence, and Σ ⊢ □¬ρ by
the first Hilbert-Bernays axiom. It then follows from Lemma 19.3 that Σ ⊢ □⊥, as
desired. ∎

Note that we have phrased the first incompleteness theorem in terms of ω-consistency,
and have not discussed whether the Gödel sentence is true (the way
we did in the intuitive background section). One reason is that it is meaningless to
assert the truth of ρ except in the context of a particular model of Σ, and we may
not have a particular model in mind. For example, how could one specify a model
of ZFC? When expressed in terms of ω-consistency, the incompleteness theorems
are purely syntactic, and there is no need to use a semantic notion such as truth.
One might hope to weaken the hypotheses of the first incompleteness theorem,
so that we assume only consistency for Σ, rather than ω-consistency. That cannot
be done with Gödel sentences, because they are automatically disprovable in an ω-
inconsistent system. Rosser constructed a related type of sentence that requires
only consistency, as we will see later.
We can formulate and prove the content of the first incompleteness theorem
inside our formal system as follows:
Lemma 19.5. Suppose ρ is a Gödel sentence for Σ. Then Σ ⊢ (□ρ → □⊥) and
Σ ⊢ (□¬ρ → □□⊥).
Note that □ρ → □⊥ is a first-order sentence that expresses the idea that if
Σ ⊢ ρ, then Σ ⊢ ⊥. Similarly, □¬ρ → □□⊥ expresses the other half of the first
incompleteness theorem.

Proof. The proof simply amounts to examining the proof of the first incompleteness
theorem in the light of the Hilbert-Bernays axioms. We can prove Σ ⊢ (□ρ → □⊥)
as follows: by the definition of ρ, Σ ⊢ (ρ → ¬□ρ). It follows from the first and third
axioms that Σ ⊢ (□ρ → □¬□ρ). On the other hand, Σ ⊢ (□ρ → □□ρ) by the
second axiom. Thus, Σ ⊢ (□ρ → (□□ρ ∧ □¬□ρ)), so it follows from Lemma 19.3
that Σ ⊢ (□ρ → □⊥).
We conclude by proving Σ ⊢ (□¬ρ → □□⊥). By the definition of ρ, Σ ⊢
(¬ρ → □ρ), so Σ ⊢ (□¬ρ → □□ρ). On the other hand, we have already proved
Σ ⊢ (□ρ → □⊥), so Σ ⊢ (□□ρ → □□⊥). Thus, Σ ⊢ (□¬ρ → □□⊥). ∎
Theorem 19.6 (Second incompleteness theorem). If Σ is consistent and has a
Gödel sentence, then Σ ⊬ ¬□⊥.
Note that ¬□⊥ expresses the unprovability of a contradiction, i.e., the consistency
of our system. Thus, the second incompleteness theorem says a consistent system
cannot prove its own consistency.
Proof. Let ρ be a Gödel sentence, and suppose Σ ⊢ ¬□⊥. By Lemma 19.5,
Σ ⊢ (□ρ → □⊥), so Σ ⊢ (¬□⊥ → ¬□ρ). Thus, Σ ⊢ ¬□ρ, so Σ ⊢ ρ because ρ is a
Gödel sentence. However, by the first incompleteness theorem, Σ ⊢ ρ implies Σ ⊢ ⊥,
so Σ is inconsistent. ∎
The proof of the second incompleteness theorem basically amounts to Lemma 19.5.
In other words, it comes down to the fact that Σ is itself strong enough to prove
the first incompleteness theorem.
It is natural to wonder whether there are ω-inconsistent axiom sets that are
nevertheless consistent. One minor consequence of the second incompleteness
theorem is that there are. Suppose Σ is consistent.9 Then Σ ⊬ ¬□⊥, so Σ ∪ {□⊥}
is consistent. However, this enlarged system is automatically ω-inconsistent, since
□⊥ asserts the provability of a contradiction. (Strictly speaking, it asserts the
provability using just Σ as axioms, but that implies the provability using Σ ∪ {□⊥}.)
This construction of ω-inconsistent systems is somewhat silly, since it simply
adds the provability of a contradiction as an axiom. One doesn't expect to run into
many other examples, and the concept of ω-inconsistency is more of a theoretical
worry than a practical danger. However, we must deal with it if we are to study the
incompleteness theorems rigorously (without using Rosser's trick).
The same way Gödel sentences assert their own unprovability, one can construct
sentences that assert their own provability. Henkin asked what one could say about
such sentences. A Henkin sentence is a sentence φ such that Σ ⊢ (φ ↔ □φ).
It might at first sound like nothing could be said about Henkin sentences, but
Löb realized that they are always in fact provable. In fact, something even more
general is true:
Theorem 19.7 (Löb). For all sentences φ, if Σ ⊢ (□φ → φ), then Σ ⊢ φ.
The intuitive motivation for the proof of Löb's theorem is the following paradoxical
argument for the existence of Santa Claus:
9One might wonder whether there is any consistent axiom system capable of formalizing the
concept of proof. For set theory, the answer is presumably yes, but it is not so clear how we could
know for sure. However, this formalization can also be carried out in much humbler systems, such
as elementary number theory. Ultimately, in order to prove that a system is consistent, we have to
assume something, but first-order number theory seems like a safe bet.

Consider the sentence "If this sentence is true, then Santa Claus
exists." Suppose it is a true sentence; then its hypothesis is true
by assumption, as is the implication asserted by the sentence, so
Santa Claus exists. Thus, we have shown that if the sentence is
true, then Santa Claus exists. That is precisely what the sentence
asserts, so we have proved the sentence. Therefore, it is true, and
consequently Santa Claus does exist.

Proof. Suppose Σ ⊢ (□φ → φ). Using implicit self reference, construct a sentence ψ
such that
Σ ⊢ (ψ ↔ (□ψ → φ)).
(Notice the analogy with the Santa Claus sentence.) As a special case,
Σ ⊢ (ψ → (□ψ → φ)).
It follows that
Σ ⊢ □(ψ → (□ψ → φ)),
and hence
Σ ⊢ (□ψ → □(□ψ → φ)).
We know that
Σ ⊢ (□(□ψ → φ) → (□□ψ → □φ)),
so
Σ ⊢ (□ψ → (□□ψ → □φ)).
However, we also know
Σ ⊢ (□ψ → □□ψ).
Thus,
Σ ⊢ (□ψ → □φ).
Because Σ ⊢ (□φ → φ), we end up with
Σ ⊢ (□ψ → φ).
By the defining property of ψ, this is equivalent to ψ, so Σ ⊢ ψ. It follows that
Σ ⊢ □ψ, and thus Σ ⊢ φ, as desired. ∎

Löb's theorem is extraordinary, since it says the only way Σ can prove □φ → φ
is if it actually proves φ (in which case □φ → φ, or indeed ψ → φ for any ψ, follows
trivially). This is the exact opposite of naive intuition. After all, □φ → φ means
that if φ is provable, then it is true. Surely, it should be possible to prove that
anything that's provable is true. However, Löb's theorem says you can't prove this
in general, and in fact that you can only prove □φ → φ in the narrowest possible
case, namely when φ itself is provable.
Using Löb's theorem, we can give another proof of the second incompleteness
theorem:

Proof. For every sentence φ, its negation ¬φ is equivalent to φ → ⊥. If we take
φ = ⊥ in Theorem 19.7, we find that if Σ ⊢ ¬□⊥, then Σ ⊢ ⊥. ∎

In fact, one can also deduce Löb's theorem from the second incompleteness
theorem.

20. Constructing a Gödel sentence
In the previous section, we did not actually construct a Gödel sentence, but
instead referred vaguely to indirect self-reference. One can carry out the construction
more formally as follows.
First, we assign a Gödel number to each wff and to each finite sequence of
wffs. In other words, we encode them as natural numbers.10 We can do so in any
reasonable way (using prime factorizations, decimal expansions, or whatever we
like). The Gödel numbers for finite sequences of wffs will be used to describe formal
proofs.
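For concreteness, here is one possible such encoding, sketched in Python (the notes deliberately leave the choice open, and the names below are ad hoc): view each wff as a string over a fixed finite alphabet and read it as a number in base one more than the alphabet size, so that distinct strings receive distinct numbers and the string is recoverable from its number.

ALPHABET = "()=, x0123456789->Pforalbt"   # any fixed finite alphabet will do
BASE = len(ALPHABET) + 1

def godel_number(s):
    # Read the string as digits 1..len(ALPHABET) in base BASE; no digit is 0,
    # so the encoding is injective and decodable.
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n

def decode(n):
    chars = []
    while n > 0:
        n, d = divmod(n, BASE)
        chars.append(ALPHABET[d - 1])
    return "".join(reversed(chars))

g = godel_number("forall x (x = x)")
print(g, decode(g) == "forall x (x = x)")   # a large number, then True

A finite sequence of wffs can then be encoded by applying a similar trick to the resulting list of numbers.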
Now let P(m, n) be a wff with two free variables m and n, which says "m is
the Gödel number of a proof of the sentence with Gödel number n." Of course,
expressing this idea as a first-order wff is rather cumbersome, but it can certainly
be done in the language of set theory, for example. (Doing it for first-order number
theory is trickier but also possible.) The precise definition of P will depend on our
axiom set Σ.
Let Q(m, n) be a wff saying "m is the Gödel number of a wff with one free
variable, and n is the Gödel number of that wff with m substituted for the free
variable." In other words, if we let W_k(x) denote the wff with Gödel number k and
free variable x (if any, since some Gödel numbers won't correspond to a wff with
one free variable), then Q(m, n) means that n is the Gödel number of W_m(m).
Plugging m into W_m should bring to mind diagonalization, and that is exactly
what is going on here.
Finally, let G(k) = ∀n (Q(k, n) → ¬∃m P(m, n)). In other words, G(k) says that
W_k(k) is not provable.
If g is the Gödel number of the wff G(k) (which has one free variable k), then
G(g) is a Gödel sentence: it says that W_g(g) is not provable, but G = W_g, so G(g)
asserts its own unprovability.
This construction is by no means limited to Gödel sentences, since it depends on
no property of provability except that it can be defined by a first-order wff. Given
any wff R(n) that specifies a property of the sentence with Gödel number n, we
could define G(k) = ∀n (Q(k, n) → R(n)). If g is the Gödel number of G(k), then
G(g) asserts that it itself has property R. For example, taking R(n) = ¬∃m P(m, n)
gives a Gödel sentence. We can use this approach to produce the auxiliary sentence
ψ in the proof of Löb's theorem (which asserts that its own provability implies φ).

10It's traditional to use natural numbers, but if we are doing this in set theory we could just as
easily use strings.
