HENRY COHN
Contents
1. Syntax
2. Semantics
3. Boolean algebra
4. Classification
5. Quantum logic
6. Formal proofs
7. First-order languages and syntax
8. First-order semantics
9. Examples of first-order languages
10. Completeness and compactness
11. Consequences of compactness
12. Downwards Löwenheim–Skolem
13. Filters and ultrafilters
14. Voting
15. Ultraproducts
16. Formal proofs in first-order logic
17. Introduction to incompleteness
18. Intuitive Background
19. Proofs of the Incompleteness Theorems
20. Constructing a Gödel sentence
1. Syntax
The propositional calculus is the mathematical system that deals with logical
relationships between propositions, i.e., between assertions. The word calculus
simply means a method of calculating. It is by no means limited to the differential
or integral calculus, although those are certainly the most famous examples.
The propositional calculus involves manipulating strings of symbols. Some of the
symbols will be variables, which stand for propositions; then we can combine them
with the logical symbols ¬ (not), ∧ (and), ∨ (or), → (implies), and ↔ (is equivalent
to). For example, (p ∨ ¬p) means that p is true or p is not true.
This framework is quite limited, because it does not allow us to create interesting
statements from scratch. Using the propositional calculus, we can make assertions
like (p → q), but we cannot explain what p and q actually mean, so we are limited
to talking about pre-existing statements. Thus, to do serious mathematics we will
[1] Alternatively, we could have used a finite set of symbols, including the prime. That would
avoid the infinite alphabet, at the cost of slightly wordier rules about variable naming. Which
approach to take makes little difference mathematically and is a matter of taste.
NOTES ON LOGIC
The only reason why unique readability holds is that we have fully parenthesized
everything. If we allowed strings like p → q → r, it would be ambiguous whether the
string was built from p and q → r or from p → q and r. In practice, writing many
parentheses can be tedious, and it is often convenient to omit some of them, but the
resulting formula should be viewed as shorthand for the real, fully parenthesized
wff.
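Unique readability can be made concrete: because every binary connective carries its own pair of parentheses, a fully parenthesized wff can be read by a recursive-descent parser that never has to make a choice. The sketch below is our own illustration (the tuple representation and function names are not notation from these notes); it accepts exactly the fully parenthesized strings and rejects ambiguous ones.

```python
BINARY = {"∧", "∨", "→", "↔"}

def parse(s):
    """Parse one wff from the front of s; return (tree, unconsumed rest)."""
    if s[0] == "¬":
        tree, rest = parse(s[1:])
        return ("¬", tree), rest
    if s[0] == "(":
        left, rest = parse(s[1:])
        op, rest = rest[0], rest[1:]
        assert op in BINARY, f"expected a connective, got {op!r}"
        right, rest = parse(rest)
        assert rest and rest[0] == ")", "missing closing parenthesis"
        return (op, left, right), rest[1:]
    return s[0], s[1:]          # a variable: a single letter

def read_wff(s):
    tree, rest = parse(s.replace(" ", ""))
    assert rest == "", f"not fully parenthesized: {rest!r} left over"
    return tree

print(read_wff("(p ∨ ¬p)"))   # ('∨', 'p', ('¬', 'p'))
```

Because each step of the parse is forced, the tree returned is the unique one; an unparenthesized string such as p → q → r is rejected rather than guessed at.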
2. Semantics
We have specified the syntax of the propositional calculus, i.e., the grammar
that specifies which formulas are well formed. Now we turn to the semantics, the
meaning of the formulas.
The symbol ⊤ stands for truth, and ⊥ for falsehood. If we substitute ⊤ and ⊥
for the variables in a wff, then the following table shows how to assign a truth value
of ⊤ or ⊥ inductively to the entire wff:

φ  ψ    ¬φ    (φ ∧ ψ)    (φ ∨ ψ)    (φ → ψ)    (φ ↔ ψ)
⊤  ⊤    ⊥       ⊤          ⊤          ⊤          ⊤
⊤  ⊥    ⊥       ⊥          ⊤          ⊥          ⊥
⊥  ⊤    ⊤       ⊥          ⊤          ⊤          ⊥
⊥  ⊥    ⊤       ⊥          ⊥          ⊤          ⊤
Under these rules, each wff involving variables x1, . . . , xn determines a function
from the variable assignments {⊤, ⊥}^n to {⊤, ⊥}. Note that this process depends
crucially on Lemma 1.4, since unique readability guarantees that we cannot evaluate
a wff in two different ways.
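The inductive evaluation can be sketched in code: a wff is represented as a nested tuple, and its truth value is computed by recursion on the structure, exactly as the table prescribes. The encoding (True for ⊤, False for ⊥, and the name `ev`) is our own illustration, not notation from the notes.

```python
from itertools import product

def ev(wff, assignment):
    """Evaluate a wff (nested tuples, leaves are variable names) inductively."""
    if isinstance(wff, str):                 # a variable
        return assignment[wff]
    if wff[0] == "¬":
        return not ev(wff[1], assignment)
    a, b = ev(wff[1], assignment), ev(wff[2], assignment)
    return {"∧": a and b, "∨": a or b, "→": (not a) or b, "↔": a == b}[wff[0]]

# Each wff in variables p, q determines a function {⊤, ⊥}² → {⊤, ⊥}:
wff = ("→", "p", "q")
table = {vals: ev(wff, dict(zip("pq", vals)))
         for vals in product([True, False], repeat=2)}
print(table)   # (p → q) is ⊥ only at p = ⊤, q = ⊥
```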
In the table shown above, each symbol's behavior corresponds to its name (not,
and, or, implication, and equivalence). Two cases are worthy of note. One is that
∨ always denotes inclusive or, so (φ ∨ ψ) is true if φ is true, ψ is true, or both are
true. The other is that → denotes material implication. In other words, (φ → ψ)
depends only on the truth values of φ and ψ, and does not require any conceptual
relationship between the statements. It simply means that if φ is true, then ψ is
true; equivalently, ψ is true or φ is not true.
For example, if p stands for 1 + 1 = 2 and q stands for Fermat's last theorem, then
(p → q) is true, because p and q are both true, even though there is no obvious way
to deduce the truth of Fermat's last theorem from 1 + 1 = 2. Material implication
does not require such a deduction. Part of its beauty is that we can deal with
implications without ever having to formalize the tricky intuitive notion of what it
means for q to be a consequence of p; instead, all we need is the property that if p
and (p → q) are both true, then q is true. Material implication has this property.
Similarly, a false statement vacuously implies everything: (⊥ → p) is true
regardless of whether p is true or false. This can be slightly counterintuitive, since
it disagrees with the usual English interpretation of the word "if." Suppose it isn't
raining, and someone says "if it were raining, I would take my umbrella." This
person is presumably imagining a counterfactual scenario and reasoning about how
that imaginary world differs from the real world. It's much less plausible that
someone would say "if it were raining, then it wouldn't be raining," even though
material implication allows you to deduce anything from a falsehood. (The use of
the subjunctive suggests counterfactual reasoning. The sentence "if it is raining,
then it is not raining" avoids that issue, at the cost of sounding even weirder.)
Counterfactual reasoning is much richer and deeper than material implication, too
much so to be captured by the propositional calculus. Instead, material implication
depends only on truth values. We know a false statement can sometimes imply a
true statement (if −1 = 1, then (−1)² = 1²) and can sometimes imply another false
statement (if −1 = 1, then −1 + 1 = 1 + 1). If we have just truth values to work with,
then we can capture this behavior only by deciding that (⊥ → p) is always true,
regardless of whether p is true. One might fear that this is an oversimplification
compared with counterfactual reasoning, and indeed it is, but it is not a harmful
oversimplification, and material implication provides a perfectly good foundation
for mathematics. It has the property that if you know p and (p q), then you can
deduce q, and that is good enough.
The logical symbols used in the propositional calculus are redundant, because we
can express some of them in terms of the others. We could replace (φ ↔ ψ) with
((φ → ψ) ∧ (ψ → φ)), and we could replace (φ → ψ) with (¬φ ∨ ψ). We could even
eliminate one of ∧ and ∨ by expressing it in terms of the other, i.e., replacing (φ ∨ ψ)
with ¬(¬φ ∧ ¬ψ) or replacing (φ ∧ ψ) with ¬(¬φ ∨ ¬ψ). These sorts of reductions
are sometimes convenient, but they make long formulas much harder to read, so we
will cheerfully use a redundant system.
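The reductions above can be verified by brute force: each abbreviation agrees with its expansion on all four truth assignments. A minimal sketch; the helper names `IMP` and `IFF` are our own.

```python
from itertools import product

def IMP(a, b): return (not a) or b   # material implication
def IFF(a, b): return a == b          # equivalence

for a, b in product([True, False], repeat=2):
    assert (a or b) == (not ((not a) and (not b)))     # ∨ via ¬ and ∧
    assert (a and b) == (not ((not a) or (not b)))     # ∧ via ¬ and ∨
    assert IMP(a, b) == ((not a) or b)                 # → via ¬ and ∨
    assert IFF(a, b) == (IMP(a, b) and IMP(b, a))      # ↔ via → and ∧
print("all reductions check out")
```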
Definition 2.1. A wff φ in the propositional calculus is satisfiable if there exist
truth values for its variables such that φ evaluates to ⊤. It is a tautology if it always
evaluates to ⊤.
For example, (p ∨ ¬p) is a tautology, as is (q → (p → q)). The wff (p ∧ q) is
satisfiable but not a tautology, and (p ∧ ¬p) is not even satisfiable. Note that φ is a
tautology if and only if ¬φ is not satisfiable.
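Definition 2.1 suggests a brute-force procedure: enumerate all 2^n assignments to the n variables. A sketch under our own tuple encoding (True for ⊤, False for ⊥):

```python
from itertools import product

def ev(wff, assign):
    """Evaluate a wff represented as nested tuples."""
    if isinstance(wff, str):
        return assign[wff]
    if wff[0] == "¬":
        return not ev(wff[1], assign)
    a, b = ev(wff[1], assign), ev(wff[2], assign)
    return {"∧": a and b, "∨": a or b, "→": (not a) or b, "↔": a == b}[wff[0]]

def variables(wff):
    if isinstance(wff, str):
        return {wff}
    return set().union(*(variables(sub) for sub in wff[1:]))

def is_tautology(wff):
    vs = sorted(variables(wff))
    return all(ev(wff, dict(zip(vs, vals)))
               for vals in product([True, False], repeat=len(vs)))

def is_satisfiable(wff):
    return not is_tautology(("¬", wff))   # φ satisfiable iff ¬φ not a tautology

assert is_tautology(("∨", "p", ("¬", "p")))
assert is_tautology(("→", "q", ("→", "p", "q")))
assert is_satisfiable(("∧", "p", "q")) and not is_tautology(("∧", "p", "q"))
assert not is_satisfiable(("∧", "p", ("¬", "p")))
```

The last helper uses exactly the duality stated in the text: φ is a tautology if and only if ¬φ is not satisfiable.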
It is generally easy to tell whether a short wff is a tautology. However, it can be
a little counterintuitive. For example,
(((p ∧ q) → r) → ((p → r) ∨ (q → r)))
is a tautology, because both ((p ∧ q) → r) and ((p → r) ∨ (q → r)) are equivalent to
((r ∨ ¬p) ∨ ¬q). At first, this seems plainly absurd: it seems to be saying that if two
hypotheses together imply a conclusion, then only one of the hypotheses is actually
needed to reach the conclusion. However, that reformulation is misleading, because
it is subtly changing the problem. When people think about implication, they often
allow an implicit universal quantifier to slip in: they imagine that propositions p
and q depend on some circumstances x, and they think of (p → q) as meaning
∀x (p(x) → q(x)) (which is of course not a wff in the propositional calculus). This is
not what the propositional calculus studies; it deals with single, isolated propositions,
not families of them depending on other variables. That distinction clears up the
intuitive problem:
∀x ((p(x) ∧ q(x)) → r(x))
is not equivalent to
(∀x (p(x) → r(x))) ∨ (∀x (q(x) → r(x))),
but it is equivalent to
∀x ((p(x) → r(x)) ∨ (q(x) → r(x))).
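These quantified claims can be checked exhaustively on a two-element domain, where a predicate is just a pair of truth values. This is our own illustration of the distinction (the helper names are not from the notes):

```python
from itertools import product

def imp(a, b): return (not a) or b

domain = [0, 1]
lhs_eq_rhs2 = True                      # ∀x((p∧q)→r)  vs  ∀x((p→r)∨(q→r))
found_counterexample_to_rhs1 = False    # vs (∀x(p→r)) ∨ (∀x(q→r))
for bits in product([True, False], repeat=6):
    p, q, r = bits[0:2], bits[2:4], bits[4:6]   # each predicate: value at 0 and 1
    lhs  = all(imp(p[x] and q[x], r[x]) for x in domain)
    rhs1 = all(imp(p[x], r[x]) for x in domain) or all(imp(q[x], r[x]) for x in domain)
    rhs2 = all(imp(p[x], r[x]) or imp(q[x], r[x]) for x in domain)
    lhs_eq_rhs2 = lhs_eq_rhs2 and (lhs == rhs2)
    if lhs != rhs1:
        found_counterexample_to_rhs1 = True
print(lhs_eq_rhs2, found_counterexample_to_rhs1)   # → True True
```

The equivalence with the third formula holds pointwise (both sides of the implication rewrite to ¬p(x) ∨ ¬q(x) ∨ r(x)), which is why the exhaustive check finds no discrepancy.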
3. Boolean algebra
Classical logic is based on the law of the excluded middle: a proposition that is
not true must be false, with no third possibility. However, it is sometimes convenient
to extend the notion of logic to other truth values. For example, we could use a
three-valued logic, with a truth value ? (meaning "maybe") in addition to ⊤ and
⊥. There's a natural way to extend ¬, ∧, ∨, →, and ↔ to this setting, based on
interpreting ? as a state of ignorance about whether the truth value is ⊤ or ⊥:
φ  ψ    ¬φ    (φ ∧ ψ)    (φ ∨ ψ)    (φ → ψ)    (φ ↔ ψ)
⊤  ⊤    ⊥       ⊤          ⊤          ⊤          ⊤
⊤  ?    ⊥       ?          ⊤          ?          ?
⊤  ⊥    ⊥       ⊥          ⊤          ⊥          ⊥
?  ⊤    ?       ?          ⊤          ⊤          ?
?  ?    ?       ?          ?          ?          ?
?  ⊥    ?       ⊥          ?          ?          ?
⊥  ⊤    ⊤       ⊥          ⊤          ⊤          ⊥
⊥  ?    ⊤       ⊥          ?          ⊤          ?
⊥  ⊥    ⊤       ⊥          ⊥          ⊤          ⊤
However, this system has some unfortunate properties. For example, we would like
(p ↔ p) to be a tautology, because every proposition should be equivalent to itself,
but in the three-valued logic described above, it is not a tautology. The problem is
that (? ↔ ?) evaluates to ?, since it has no way of knowing whether the two question
marks describe the same uncertain proposition or different ones.
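One convenient way to compute with this three-valued logic is to encode ⊤, ?, ⊥ as 1, 1/2, 0, with min, max, and 1 − x as meet, join, and negation. The encoding is our own; its outputs match the table above.

```python
T, U, F = 1.0, 0.5, 0.0   # ⊤, ?, ⊥

def NOT(a): return 1 - a
def AND(a, b): return min(a, b)
def OR(a, b): return max(a, b)
def IMP(a, b): return max(1 - a, b)
def IFF(a, b): return min(IMP(a, b), IMP(b, a))

# (p ↔ p) fails to be a tautology: with p = ?, it evaluates to ?, not ⊤.
print(IFF(U, U))   # 0.5
```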
In this section, we will develop the concept of a Boolean algebra, which is arguably
the best-behaved way to extend two-valued logic. The three-valued logic described
above is not a Boolean algebra. We will stick with classical two-valued logic in the
rest of this course, but before we proceed it's worth putting it in a larger context.
We will build up to Boolean algebras in several steps. To begin, we will have
a partially ordered set B of truth values. The ordering relation p ≤ q means q is
at least as true as p; for example, this may hold if q is true on weekends and p is
true on Saturdays. The reason it is only a partial ordering is that there may be
[Figure 3.1: a poset in which every pair of elements has upper and lower bounds, but not always a least upper bound or a greatest lower bound.]
incomparable truth values, for example things that are true on Mondays and things
that are true on Tuesdays.
In principle, we could use any poset of truth values, but it is far from clear how
to define the operations ¬, ∧, ∨, →, and ↔ in an arbitrary poset. Thus, we will
impose additional structure on B. We will define → and ↔ in terms of the other
three operations, just as in the two-valued case, but that still leaves three operations
to go.
Definition 3.1. A lattice is a poset in which every pair of elements p and q has a
least upper bound p ∨ q and a greatest lower bound p ∧ q.
In a lattice, p ∨ q is read "p join q", and p ∧ q is read "p meet q". Recall that the
least upper bound property means
(1) p ∨ q ≥ p and p ∨ q ≥ q, and
(2) for all r such that r ≥ p and r ≥ q,
r ≥ p ∨ q.
I.e., it is an upper bound, and it is the smallest of the upper bounds. The greatest
lower bound has exactly the same property, but with all the inequalities reversed.
Two elements p and q in a poset have at most one least upper bound: given two
of them, each would have to be less than or equal to the other. Similarly, they have
at most one greatest lower bound. However, they needn't have either a least upper
bound or a greatest lower bound. For example, in a poset consisting of just two
incomparable elements, those elements have no upper or lower bounds at all. Thus,
not every poset is a lattice. For a less trivial example, see Figure 3.1. In this poset,
every pair of elements has upper and lower bounds, but not always a least upper
bound and a greatest lower bound.
For an example of a poset that is a lattice, let S be a set, and consider P(S)
ordered by ⊆. The least upper bound is then the union, and the greatest lower
bound is the intersection.
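For a small S, the least-upper-bound property of unions can be checked directly over every pair of subsets. A sketch with our own variable names:

```python
from itertools import chain, combinations

S = {0, 1, 2}
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(S), k) for k in range(len(S) + 1))]

for p in subsets:
    for q in subsets:
        join = p | q                              # candidate least upper bound
        assert p <= join and q <= join            # it is an upper bound
        assert all(join <= r for r in subsets
                   if p <= r and q <= r)          # and the least one
print(f"checked {len(subsets)**2} pairs")
```

The symmetric check with intersections and reversed inclusions verifies the greatest-lower-bound property.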
It is not difficult to check that if p1, . . . , pk are elements of a lattice, then
p1 ∨ · · · ∨ pk is their least upper bound, and p1 ∧ · · · ∧ pk is their greatest lower
bound. (No parentheses are needed, because these operations are associative.) In
particular, every finite lattice contains elements 0 and 1 such that 0 ≤ x ≤ 1 for all
x in the lattice. However, that is not necessarily true in infinite lattices.
The lattice perspective on ∨ and ∧ fits naturally with the idea of truth values:
"p or q" should be at least as true as p and at least as true as q, but no truer than
this forces it to be, so we take it to be the least upper bound p ∨ q of p and q, and
[Figure 3.2: a five-element lattice with three incomparable elements in the middle row; each middle element is a complement of each of the others.]
we deal similarly with "and." Thus, we will assume that our poset B of truth values
is a lattice. However, we have not yet seen how to deal with ¬.
In a complemented lattice, 0 and 1 behave like ⊥ and ⊤, and a complement q of p behaves like ¬p.
The lattice P(S) is complemented: 0 = ∅, 1 = S, and the complement of p is S \ p.
However, not every lattice is complemented. For example, every totally ordered
set is a lattice, with p ∨ q = max(p, q) and p ∧ q = min(p, q). Even if there are
greatest and least elements 1 and 0, no other element p can have a complement,
since max(p, q) = 1 and min(p, q) = 0 imply that {p, q} = {0, 1}. Thus, totally
ordered sets with more than two elements are never complemented lattices.
Complemented lattices almost allow us to define ¬, but there is one major issue:
an element may have several complements. (By contrast, 0 and 1 are uniquely
determined by property (1) in the definition.) For example, in Figure 3.2, each
of the three incomparable elements in the middle is a complement of each of the
others.
In other words, each of ∧ and ∨ distributes over the other, just as in classical
two-valued logic. The lattice P(S) is always distributive, but the lattice drawn
above is not. If we let p, q, and r be the elements in the middle row (and let 0 and
1 be the least and greatest elements), then
p ∧ (q ∨ r) = p ∧ 1 = p,
but
(p ∧ q) ∨ (p ∧ r) = 0 ∨ 0 = 0.
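The five-element lattice of Figure 3.2 is small enough to check by hand or in code: hardcoding its join and meet (our own encoding, with "0", "1", and middle elements "p", "q", "r") confirms both the multiple complements and the failure of distributivity computed above.

```python
def join(a, b):
    if a == b: return a
    if "0" in (a, b): return a if b == "0" else b   # 0 is the least element
    return "1"           # two distinct middles, or anything with 1, join to 1

def meet(a, b):
    if a == b: return a
    if "1" in (a, b): return a if b == "1" else b   # 1 is the greatest element
    return "0"           # two distinct middles, or anything with 0, meet to 0

# each middle element is a complement of each of the others
for a in "pqr":
    for b in "pqr":
        if a != b:
            assert join(a, b) == "1" and meet(a, b) == "0"

# distributivity fails: p ∧ (q ∨ r) = p, but (p ∧ q) ∨ (p ∧ r) = 0
assert meet("p", join("q", "r")) == "p"
assert join(meet("p", "q"), meet("p", "r")) == "0"
print("Figure 3.2 lattice: complemented but not distributive")
```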
4. Classification
What are all the Boolean algebras? The obvious examples are the lattices P(S)
of all subsets of a given set S. In this section, we will show that there are no other
finite examples, up to isomorphism. An isomorphism of Boolean algebras is the
same as an isomorphism of the underlying poset. Equivalently, it is a bijective map
that preserves ∧, ∨, and ¬ (equivalence holds because the poset structure can be
defined in terms of these operations, and vice versa).
There exist infinite Boolean algebras that are not of this form. For example, take
any countably infinite subset S of an infinite Boolean algebra, and let B be the
closure of S under ∧, ∨, and ¬. Then B is a Boolean algebra (by Proposition 3.6).
However, B is countably infinite, and it is therefore not even in bijection with a
power set, let alone isomorphic to one as a poset.
The Stone representation theorem characterizes all Boolean algebras (finite or
infinite) in terms of topology: for every Boolean algebra B, there is a compact,
totally disconnected, Hausdorff topological space T such that B is isomorphic to
the poset of subsets of T that are both closed and open. In fact, one can recover T
from B as the space of homomorphisms from B to the two-element Boolean algebra.
However, that theorem is beyond the scope of these notes. Instead, we will focus on
the finite version.
Definition 4.1. An atom in a Boolean algebra is a nonzero element p such that
there exists no q satisfying 0 < q < p.
In other words, an atom is an element as close to the bottom as possible, without
actually being at the bottom. The dual notion is a coatom, but we will focus on
atoms.
For our purposes, the importance of atoms is that they recover the elements of
the underlying set S from P(S): an atom in P(S) is a nonempty subset of S that
has no nonempty proper subsets, and that means an atom must be a single-element
subset. Thus, if we want to show that a Boolean algebra is of the form P(S), then
the elements of S must correspond to the atoms in the Boolean algebra.
Lemma 4.2. If p is an atom in a Boolean algebra and p ≤ q ∨ r, then p ≤ q or
p ≤ r.
This is obvious for P(S) (if a single element is in the union of two sets, then it
must be in one or the other), but we must prove it for a general Boolean algebra.
Lemma 4.3. If p and q are elements of a Boolean algebra such that p < q and
every r < q satisfies r ≤ p, then p = 0 and q is an atom.
In other words, p cannot be a bottleneck between q and the elements less than
q, except in the degenerate case in which p = 0. To make use of the hypothesis in
proving this lemma, we must choose an r. We would like to choose one that is at
most q and unlikely to be at most p, and taking r = q ∧ ¬p is a natural choice.
and we would gain little by casting everything in more general terms. However,
other Boolean algebras do play a fundamental role in certain areas within logic. For
example, Cohen's forcing technique for proving independence results in set theory
can be understood in terms of constructing Boolean-valued models of ZFC, i.e.,
models of set theory in which the truth values are drawn from a Boolean algebra.
Of course the details are subtle and important, but this is a very plausible way to
arrive at independence proofs. At the risk of oversimplification, if one can construct
a model in which all the axioms of ZFC have truth value 1, but the continuum
hypothesis has truth value strictly between 0 and 1, then it can be neither proved
nor disproved using the axioms.
5. Quantum logic
Is it possible that the real world is governed by non-classical logic at the level
of fundamental physics? Surprisingly, the answer is yes: quantum mechanics can
naturally be described in terms of a strange logic called quantum logic, which is not
a Boolean algebra.
The fundamental construction amounts to replacing set theory with linear algebra.
The set of states of our quantum system will be described by a complex vector space
V of possible wave functions. Typically, V will be an infinite-dimensional Hilbert
space, but we can imagine a finite-dimensional vector space, and that does indeed
occur for very simple quantum systems. The state space V will play the same role
as the set S in the Boolean algebra P(S).
Truth values of propositions in quantum logic are vector subspaces of V (closed
subspaces in the infinite-dimensional case, but this distinction is irrelevant for
finite-dimensional spaces). The key difference from classical logic is the restriction
to subspaces: a proposition cannot be true for an arbitrary subset of the state space,
but rather just for a vector subspace.
Let Q be the set of all subspaces of V . Then Q is a lattice under ⊆. Specifically,
p ∧ q is the intersection of p and q, and p ∨ q is their span (i.e., the smallest subspace
containing both). The identity element 0 for ∨ is the zero-dimensional subspace,
and the identity 1 for ∧ is the full space V . Furthermore, we can define ¬p to be
the orthogonal complement of p (provided we have an inner product on V ). Thus,
Q is a complemented lattice, so we can carry out all of our logical operations in Q.
However, Q is not a Boolean algebra, because it is not distributive, assuming
dim V ≥ 2. Let p, q, and r be distinct lines in the same plane. Then q ∨ r equals
that plane, so p ∧ (q ∨ r) = p. However, p ∧ q = 0 and p ∧ r = 0, so (p ∧ q) ∨ (p ∧ r) = 0.
Thus, p ∧ (q ∨ r) ≠ (p ∧ q) ∨ (p ∧ r).
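The subspace lattice of R² is simple enough to model directly: a subspace is the origin, a line (identified by a canonical direction), or the whole plane, and the meet and join of distinct lines follow from dimension counting. The representation and helper names below are our own sketch.

```python
from fractions import Fraction

def line(a, b):
    """Canonical form of the line through the origin spanned by (a, b) ≠ (0, 0)."""
    if a != 0:
        return ("line", 1, Fraction(b, a))   # slope determines the line
    return ("line", 0, 1)                    # the vertical line

def meet(p, q):        # intersection of subspaces
    if p == "V": return q
    if q == "V": return p
    return p if p == q else "0"              # distinct lines meet only at the origin

def join(p, q):        # span of subspaces
    if p == "0": return q
    if q == "0": return p
    return p if p == q else "V"              # distinct lines span the whole plane

p, q, r = line(1, 0), line(0, 1), line(1, 1)
assert meet(p, join(q, r)) == p                 # p ∧ (q ∨ r) = p ∧ V = p
assert join(meet(p, q), meet(p, r)) == "0"      # (p ∧ q) ∨ (p ∧ r) = 0 ∨ 0 = 0
print("subspace lattice of R² is not distributive")
```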
This is extremely disconcerting if one takes it literally, because the real world is
governed by quantum mechanics and hence does not have a distributive lattice of
truth values. In fact, this perspective helps explain (or at least formalize) some of
the weird aspects of quantum mechanics, such as the uncertainty principle.
For a toy model, imagine a particle that has two possible positions q0 and q1, and
two possible momenta p0 and p1. We know it is definitely at position q0 or q1; in
other words, q0 ∨ q1 = 1. Similarly, p0 ∨ p1 = 1. It follows that (q0 ∨ q1) ∧ (p0 ∨ p1) = 1.
Thus, we know it has a definite position, and independently a definite momentum.
However, we cannot apply the distributive law to conclude that
(q0 ∧ p0) ∨ (q0 ∧ p1) ∨ (q1 ∧ p0) ∨ (q1 ∧ p1) = 1.
In fact, the uncertainty principle for position and momentum tells us that this is
false! In other words, even though the particle has a definite position and a definite
momentum, we cannot conclude that we can specify both at once.
For a concrete realization of this possibility, let p and q be distinct lines in a
two-dimensional space V , such that p and q are not orthogonal. Then
(p ∨ ¬p) ∧ (q ∨ ¬q) = 1,
(p ∧ q) ∨ (p ∧ ¬q) ∨ (¬p ∧ q) ∨ (¬p ∧ ¬q) = 0,
so there is no state in which we can pin down the status of both p and q.
It is difficult to understand what this means, because it is natural to apply the
distributive law without thinking, and this difficulty is responsible for much of
the confusing nature of quantum mechanics. Fortunately, we do not have to use
quantum logic to understand quantum mechanics. There are two ways to approach
quantum mechanics: we can either think inside the quantum system and apply
quantum logic, or think outside the system and apply classical logic to reason about
the state space V . The second option implicitly deals with quantum logic in the
form of linear algebra, but it is much easier to take this approach. In principle, one
might hope to achieve deeper insight by training one's brain to use quantum logic
directly, but in practice that does not seem to be fruitful.
Quantum logic is far from an arbitrary complemented lattice. First, it is
orthocomplemented. In other words, we can choose a complement ¬p for each p such that
¬¬p = p and the map p ↦ ¬p is order-reversing. (For comparison, it is not hard to
check that the complemented lattice shown in Figure 3.2 is not orthocomplemented.)
Furthermore, quantum logic is orthomodular: if p ≤ q, then p ∨ (¬p ∧ q) = q. Note
that this is a special case of the distributive law p ∨ (¬p ∧ q) = (p ∨ ¬p) ∧ (p ∨ q).
Orthomodularity is more symmetrical than it looks, because one can prove that
the orthomodular law is equivalent to its dual, which says that if r ≥ s, then
r ∧ (¬r ∨ s) = s. (Take r = ¬p and s = ¬q.)
As we can see from the uncertainty principle, orthomodularity is a poor substitute
for distributivity, but it is better than nothing. It is remarkable that it describes
quantum mechanics while distributivity does not.
6. Formal proofs
In this section, we will develop a notion of formal proof for tautologies. Of course,
it is straightforward to test whether a wff is a tautology using a truth table, so there
is no real need to use formal proofs. However, they are interesting in their own right
as well as good preparation for the much deeper topic of formal proofs in first-order
logic, which we will turn to next.
Note that we are setting aside general Boolean algebras and returning to classical
logic, the two-element Boolean algebra {⊤, ⊥}.
For simplicity, we will use just a subset of the propositional calculus, with
variables, parentheses, →, and ⊥. Everything else can be defined in terms of them,
according to the rows of the following table:
¬φ         (φ → ⊥)
⊤          (⊥ → ⊥)
(φ ∨ ψ)    (¬φ → ψ)
(φ ∧ ψ)    ¬(φ → ¬ψ)
(φ ↔ ψ)    ((φ → ψ) ∧ (ψ → φ))
We will view ¬, ⊤, ∨, ∧, and ↔ simply as abbreviations, rather than part of our formal
language. Alternately, one could keep them as part of the language and have special
rules for going back and forth between the two forms listed above.
The purpose of simplifying the language is to minimize case analysis in our study
of the proof system. Nobody would want to use such a restricted set of symbols
for mathematics in practice, but it is pleasant to streamline the theory as much as
possible.
Our proof system will be based on five axioms. Specifically, for all wffs φ, ψ, and
χ, each of the following wffs is an axiom:
(1) (φ → φ)
(2) (φ → (ψ → φ))
(3) ((φ → ψ) → ((φ → (ψ → χ)) → (φ → χ)))
(4) ((φ → ⊥) → (φ → ψ))
(5) (((φ → ⊥) → ⊥) → φ)
Note that these axioms are tautologies (they are true regardless of what φ, ψ, and
χ are). Axioms 1 and 3 are obvious properties of implication, as is 2 if one keeps
in mind that → is material implication. Axioms 4 and 5 are best understood by
reading (φ → ⊥) as ¬φ.
Strictly speaking, these are axiom schemes: they are infinite families of axioms,
one for each choice of φ, ψ, and χ. However, this distinction won't be important for
us right now.
Axiom 5 is the law of the excluded middle: a proposition that is not false must
be true. Axioms 1 through 4 define what is known as intuitionistic logic.
It is natural to wonder where these axioms came from. What would lead someone
to choose them, as opposed to all the other true axioms one could choose instead?
The answer is that they come from reverse engineering proofs. If you try to prove
the results from this section, and add a new axiom every time you get stuck, then
you'll end up with a perfectly serviceable list of axioms.
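The claim that the five schemes (as written above with → and ⊥) are tautologies can be confirmed by a truth-table check over all assignments to φ, ψ, and χ. The encoding below, with True/False for ⊤/⊥ and our own helper `imp`, is an illustrative sketch.

```python
from itertools import product

def imp(a, b): return (not a) or b   # material implication

BOT = False                          # ⊥
for phi, psi, chi in product([True, False], repeat=3):
    ax1 = imp(phi, phi)
    ax2 = imp(phi, imp(psi, phi))
    ax3 = imp(imp(phi, psi), imp(imp(phi, imp(psi, chi)), imp(phi, chi)))
    ax4 = imp(imp(phi, BOT), imp(phi, psi))       # ¬φ → (φ → ψ)
    ax5 = imp(imp(imp(phi, BOT), BOT), phi)       # ¬¬φ → φ
    assert ax1 and ax2 and ax3 and ax4 and ax5
print("all five axiom schemes are tautologies")
```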
In addition to the axioms, our system has one rule of inference, called modus
ponens (Latin for the method of affirming or establishing). Modus ponens says that
if we have proved wffs φ and (φ → ψ), then we can deduce ψ.
Note that a rule of inference tells us how we can manipulate wffs, which is different
from what an axiom does. See What the Tortoise said to Achilles by Lewis Carroll
(Mind 4 (1895), 278–280; reprinted in Hofstadter's book Gödel, Escher, Bach: an
Eternal Golden Braid). See also "A counterexample to modus ponens" by Vann McGee
(Journal of Philosophy 82 (1985), 462–471) for the only objection I've ever seen to
modus ponens.
We can now define a formal proof of a wff φ from a set Γ of hypotheses (i.e., any
wffs other than axioms that we would like to assume).
Definition 6.1. Let φ be a wff and Γ a set of wffs. A formal proof of φ given Γ
is a finite sequence φ1, . . . , φn of wffs with φn = φ, such that each of them either
follows from two previous wffs in the sequence via modus ponens or is an axiom or
an element of Γ.
Corollary 6.7. Let φ be a wff and Γ a set of wffs. If Γ is consistent, then at least
one of Γ ∪ {φ} and Γ ∪ {(φ → ⊥)} is consistent (possibly both).
Proof. If both Γ ∪ {φ} and Γ ∪ {(φ → ⊥)} are inconsistent, then Γ ⊢ (φ → ⊥) and
Γ ⊢ φ, by (2) and (3) in Lemma 6.6. Then Γ ⊢ ⊥ by modus ponens, and hence Γ is
inconsistent, as desired.
We can now prove the converse of soundness. It says that our proof system is
powerful enough to prove everything that could possibly be proved.
Theorem 6.8 (Completeness). If φ evaluates to ⊤ under every satisfying assignment
for Γ, then Γ ⊢ φ.
To see why the corollary follows from the completeness theorem, note that if Γ is
not satisfiable, then every satisfying assignment for Γ makes ⊥ true (because Γ has
no satisfying assignments), and thus Γ ⊢ ⊥. In fact, the theorem also follows from
the corollary: if φ evaluates to ⊤ under every satisfying assignment for Γ, then
Γ ∪ {(φ → ⊥)} is not satisfiable. By the corollary, it is inconsistent, so part (3) of
Lemma 6.6 implies that Γ ⊢ φ.
Thus, it suffices to prove the corollary. To do so, we must somehow use consistency
to produce a satisfying assignment. There are usually many satisfying assignments
for a given set of wffs (if it is consistent), which makes it difficult to single one out.
We will get around this difficulty by making the consistent set as large as possible,
so that it will have a unique satisfying assignment, which can easily be described.
finite subset of Γ′ also proves ⊥. Each element of the finite subset is in Γi for some
i, and because the sets Γi are nested, the entire finite subset is in one of them.
However, that contradicts the consistency of Γi.
The set Γ′ is a maximal consistent set of wffs, because for each wff φ, either
φ ∈ Γ′ or (φ → ⊥) ∈ Γ′ (while they cannot both be in any consistent set). If φ is
consistent with Γ′, then φ ∈ Γ′.
Now we can define an assignment of truth values as follows. For each variable x,
if x ∈ Γ′ then we set x to be ⊤, and if (x → ⊥) ∈ Γ′ then we set x to be ⊥. Let
ev(φ) denote the evaluation of φ under this assignment.
We will prove that for every wff φ, ev(φ) = ⊤ if and only if φ ∈ Γ′. Thus,
because Γ ⊆ Γ′, we have found a satisfying assignment for Γ. We will prove this
equivalence by induction on the length of φ.
We begin with the cases when φ is a variable or ⊥. By definition, each variable
x satisfies ev(x) = ⊤ iff x ∈ Γ′. Furthermore, ev(⊥) = ⊥, which corresponds with
⊥ ∉ Γ′ (since Γ′ is consistent). Thus, all we need to verify is that the equivalence
between ev(σ) = ⊤ and σ ∈ Γ′ holds when σ = (φ → ψ), assuming it holds for φ
and ψ because they are shorter wffs.
Specifically, we must prove that σ ∈ Γ′ when ev(ψ) = ⊤ or ev(φ) = ⊥, and that
σ ∉ Γ′ when ev(ψ) = ⊥ and ev(φ) = ⊤. In terms of membership in Γ′, we must
prove that if ψ ∈ Γ′ or (φ → ⊥) ∈ Γ′, then (φ → ψ) ∈ Γ′, while if (ψ → ⊥) ∈ Γ′
and φ ∈ Γ′, then (φ → ψ) ∉ Γ′.
First, suppose ψ ∈ Γ′. By Axiom 2, Γ′ ⊢ (ψ → (φ → ψ)), and hence Γ′ ⊢ (φ → ψ)
by modus ponens. Thus, (φ → ψ) is consistent with Γ′ by part (1) of Lemma 6.6,
so (φ → ψ) ∈ Γ′.
When (φ → ⊥) ∈ Γ′, we can use the same argument with Axiom 4 instead of
Axiom 2. Recall that Axiom 4 tells us that Γ′ ⊢ ((φ → ⊥) → (φ → ψ)).
within first-order logic, using just a single binary relation symbol ∈. The first-order
language of set theory is nearly universal within mathematics.
It may seem circular to base our analysis of logic on sets and then formulate
set theory within a framework of mathematical logic. From a strictly deductive
perspective, this is correct: we can't justify each one using the other without
circularity. However, we should distinguish between mathematics and
metamathematics (the mathematical study of mathematics itself). Our perspective
will be that we know what a proof is and how to judge an informal argument. The
point of formalizing proofs is generally not to increase our confidence or justify our
reasoning. Instead, it's to get a deeper understanding of the very idea of proof. In
order to even talk about questions such as the limits of proof, we need to analyze
the concept of proof carefully and give precise definitions. When we carry out this
metamathematics, we won't be working within a formal system, but rather analyzing
a formal system using the ordinary tools of mathematics. Once we have built a
powerful enough system, we could use it to prove formal versions of the informal
theorems we proved about it. Of course it would be silly to believe those theorems
just because the formal system said so. Instead, we believe them because we were
convinced by the informal proofs. However, it's nice to know that the formal system
agrees with us, and it would be worrisome if we were unable to formalize the proofs.
As in propositional calculus, we begin by specifying the syntax of first-order logic.
Definition 7.2. The terms in a first-order language are the strings specified
recursively as follows:
(1) All variable and constant symbols are terms.
(2) If f is an n-ary function symbol and t1, . . . , tn are terms, then f (t1, . . . , tn)
is a term.
In other words, the terms are the expressions obtained by applying function
symbols repeatedly, starting with variables and constants. This definition is stated
quite a bit less formally than the definition of wffs in the propositional calculus
(Definition 1.3), but the reader should have no difficulty translating between these
degrees of formality.
It is sometimes convenient to use other notation. For example, we could use
infix notation to write a + b instead of +(a, b) if + is a binary function symbol.
We will sometimes do this, but we'll treat it as an informal abbreviation for the
sake of human readability, with +(a, b) being the true string in the formal language.
An alternative approach would be to formalize several different types of function
symbols, but it's not worth complicating the formal language for the sake of human
stylistic conventions.
Definition 7.3. The wffs (well-formed formulas) in a first-order language are the
strings specified recursively as follows:
(1) If R is an n-ary relation symbol and t1 , . . . , tn are terms, then R(t1 , . . . , tn )
is a wff.
(2) If t1 and t2 are terms, then =(t1 , t2 ) is a wff.
(3) If φ and ψ are wffs, then so are ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), and (φ ↔ ψ).
(4) If φ is a wff and x is a variable, then ∀x φ and ∃x φ are wffs.
The first two sorts of wffs (R(t1 , . . . , tn ) and =(t1 , t2 )) are called atomic wffs, because
they cannot be broken down into smaller wffs.
Note that wffs in the propositional calculus are not first-order wffs (for example,
(p ∨ ¬p) is not a first-order wff, because the variable p is not itself a wff). From now
on, the term wff will by default refer to first-order wffs.
In our formalization, first-order wffs must be fully parenthesized. As in the case
of the propositional calculus, this implies unique readability: no wff can be formed
in two different ways using the defining rules.
In practice we will always write (t1 = t2) instead of =(t1, t2), because it looks
too weird otherwise. As in the case of infix function symbols, we'll view this as an
abbreviation for the true wff. (Similarly, we may omit parentheses for readability,
but the formal system itself does not allow this.) The reason why the equals sign
is handled separately from the relation symbols is that it will have a predefined
meaning.
By default, we will use lowercase letters for variables, unless they are already
taken for constants or functions. For example, ∀x (x = x) is a wff.
The definition of wff may seem overly permissive. For example, ∀x (x = y) is
a wff, although it doesn't specify the role of y. Worse yet, ∀x ∀x (x = x) is a wff,
despite quantifying twice over the same variable. Examples such as ∀x (x = y) are
actually useful, but ∀x ∀x (x = x) is not. Still, it won't do any harm once we specify
first-order semantics, so there is no need to rule it out.
Definition 7.4. Let φ be a wff, let x be a variable, and let Q denote either ∀ or ∃.
If Qx φ occurs as a (consecutive) substring of a wff ψ, then this substring is the
scope of the quantifier Q. Every occurrence of x in Qx φ is bound, and a variable is free
if it is not bound by any quantifier.
For example, in ∀x (x = y), the x in x = y is bound while the y is free. In
∀y ∃x (x = y), both x and y are bound. In
(∀x (x = y) ∧ ∃z (x = z)),
the x in x = y is bound, as is the z in x = z, while the y in x = y and the x in x = z
are free. In particular, one occurrence of a variable may be bound while another is
free.
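Definition 7.4 translates into a short recursion on the shape of a wff. Here is a sketch (the representation and names are mine, not the notes') that computes the free variables of a wff built from equalities, connectives, and quantifiers.

```python
# A sketch (representation mine): the free variables of a wff are the variable
# occurrences not captured by an enclosing ∀x or ∃x.

def free_vars(wff, bound=frozenset()):
    tag = wff[0]
    if tag == "eq":                                  # =(t1, t2), variable terms
        return {t for t in wff[1:] if t not in bound}
    if tag == "not":
        return free_vars(wff[1], bound)
    if tag in ("and", "or", "implies", "iff"):
        return free_vars(wff[1], bound) | free_vars(wff[2], bound)
    if tag in ("forall", "exists"):                  # Qx binds x in its scope
        return free_vars(wff[2], bound | {wff[1]})
    raise ValueError(tag)

# (∀x (x = y) ∧ ∃z (x = z)): y is free, and so is the x in x = z.
phi = ("and", ("forall", "x", ("eq", "x", "y")),
              ("exists", "z", ("eq", "x", "z")))
print(sorted(free_vars(phi)))  # ['x', 'y']
```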
Note that fully parenthesizing wffs can be crucial for avoiding ambiguity. For
example, (∀x (x = y) ∧ ∃z (x = z)) is different from ∀x ((x = y) ∧ ∃z (x = z)),
and ∀x (x = y) ∧ ∃z (x = z) is in principle ambiguous. We will sometimes drop
the outermost parentheses, in which case ∀x (x = y) ∧ ∃z (x = z) will mean
(∀x (x = y) ∧ ∃z (x = z)).
Note that Definition 7.4 depends on unique readability to make sense.
Definition 7.5. A sentence is a wff with no free variables.
8. First-order semantics
Without some additional information, a wff has no meaning. Even if we try to
interpret the logical symbols, we don't know what the other symbols (variables,
constants, functions, relations) are referring to, and so we can't make any sense of
the wff. In order to assign meaning to a wff, we need to specify the mathematical
structure we are trying to apply it to. Then we can decide whether it is true or false
for that particular structure.
Definition 8.1. A structure A for a first-order language L consists of a nonempty
set A called the domain of A, an element denoted cA ∈ A for each constant symbol
simple things in this language. For example, that there exists a triangle, or that
every triangle is contained in a complete graph on seven vertices. (Of course, these
sentences may or may not be true, depending on the structure.) However, there does
not exist a wff φ(x, y) with two free variables x and y such that φ(x, y) holds if and
only if x and y are connected by a path. The basic reason is that we cannot use an
ellipsis in a first-order formula. (We also cannot quantify over wffs or over natural
numbers.) For each n, there is a wff φn (x, y) that says x and y are connected by a
path of length at most n. For example, we can take φ2 (x, y) to be
(x = y) ∨ (x ∼ y) ∨ ∃z ((x ∼ z) ∧ (z ∼ y))
(leaving out some parentheses). However, as n increases the wffs φn become longer
and longer. We will prove later that connectivity is not a first-order property.
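To see the blow-up concretely, here is one way (the construction is mine; the φ2 in the text is an equivalent, shorter form) to generate a wff asserting that x and y are joined by a path of length at most n.

```python
# One way (construction mine) to write a wff saying "x and y are connected by
# a path of length at most n", for n >= 1: quantify over a chain of n + 1
# vertices and allow repeated vertices via equality.

def phi(n):
    quantifiers = " ".join(f"∃z{i}" for i in range(n + 1))
    steps = " ∧ ".join(f"((z{i} ~ z{i+1}) ∨ (z{i} = z{i+1}))" for i in range(n))
    return f"{quantifiers} ((z0 = x) ∧ (z{n} = y) ∧ {steps})"

for n in (2, 4, 8, 16):
    print(n, len(phi(n)))       # the wffs get longer and longer as n grows
```

No single first-order wff can replace the whole family, which is the point of the claim that connectivity is not a first-order property.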
By contrast, set theory will provide a far more expressive language. Strictly
speaking, the underlying first-order language is the same as for graph theory, just
with different notation. Instead of the adjacency relation ∼, we will have the set
membership symbol ∈. What's different is how we interpret the language. Variables
will stand for sets, and x ∈ y will mean that x is an element of y.
The ZFC axioms can all be formulated as first-order sentences. What's even
better is that doing so clears up the remaining ambiguity from our informal approach
to ZFC. In a few of the axioms, such as separation, we needed to refer to a property
of sets, and we left it somewhat vague what that meant. To make the axiom precise,
we can interpret a property to be any first-order wff. For example, separation
becomes the scheme
∀m1 . . . ∀mk ∀z ∃y ∀x (x ∈ y ↔ ((x ∈ z) ∧ φ(x, m1 , . . . , mk ))).
The ellipses are simply an abbreviation: for each specific wff φ, the corresponding
axiom requires no ellipses to state.
Note that separation has become an axiom scheme consisting of infinitely many
analogous axioms. This is because first-order logic cannot quantify over wffs, so we
need to assume a separate axiom for each wff. Fortunately, there's no harm in doing
so. ZFC is therefore an infinite set of first-order wffs.
A third example is Peano arithmetic (sometimes abbreviated PA), which is the
first-order theory of elementary number theory. Variables stand for natural numbers,
there is a constant symbol 0, there is a unary function S (standing for successor),
and there are binary functions + and · (written with infix notation). The axioms of
Peano arithmetic are as follows:
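A standard formulation of the basic axioms (everything except induction) is the following; any particular presentation, including the one intended here, may differ in inessential details:

```latex
\begin{align*}
& \forall n\, \neg (S(n) = 0)\\
& \forall n\, \forall m\, ((S(n) = S(m)) \to (n = m))\\
& \forall n\, (n + 0 = n)\\
& \forall n\, \forall m\, (n + S(m) = S(n + m))\\
& \forall n\, (n \cdot 0 = 0)\\
& \forall n\, \forall m\, (n \cdot S(m) = ((n \cdot m) + n))
\end{align*}
```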
The induction scheme consists of infinitely many axioms. For each wff φ(n, m1 , . . . , mk )
in the language of PA with free variables n, m1 , . . . , mk , we assume an axiom stating
∀m1 . . . ∀mk ((φ(0, m1 , . . . , mk ) ∧ ∀n (φ(n, m1 , . . . , mk ) → φ(S(n), m1 , . . . , mk )))
→ ∀n φ(n, m1 , . . . , mk )).
2This issue is actually a red herring. When defining the topology on S, we care only about the
first-order sentences satisfied by each structure, so we might as well identify structures that satisfy
the same sentences. Equivalently, we can replace each structure with the set of first-order sentences
satisfied by it; the resulting sets are exactly the maximal satisfiable sets of sentences. They form a
set S′ (a subset of the power set of the set of sentences), and we might as well topologize it instead
of S. Using S′ lets us avoid talking about proper classes, but I feel S is more natural than S′ is.
In this language, Theorem 10.2 says that if {Sφ : φ ∈ Σ} has the finite intersection
property, then
⋂φ∈Σ Sφ ≠ ∅.
That is equivalent to saying that S is compact. To see why, note that it is a special
case of the compactness of S, and in fact the Alexander subbase theorem says that
this special case is equivalent to the general case.
Another way of thinking about compactness is via formal proofs. In Section 16,
we will set up a formal proof theory for first-order logic, at which point we will
define what ⊢ means. Then we will prove completeness:
Theorem 10.3 (Completeness). Let Σ be a set of first-order sentences and φ a
sentence. Then Σ ⊢ φ if and only if Σ ⊨ φ.
This theorem says that formal proofs capture exactly what we want them to:
a sentence has a formal proof using some axioms if and only if it holds in every
structure satisfying those axioms. This is the ideal notion of proof. As in the
propositional calculus, one direction is easy (namely, soundness: Σ ⊢ φ implies
Σ ⊨ φ), while the other is much deeper.
Completeness might sound contrary to Gödel's incompleteness theorems, but
it is not. What's going on is that the proof system is complete, while the axioms are
incomplete. The completeness of the proof system means that the sentences that
hold in every model of ZFC are provable, while those whose negations hold in every
model are disprovable. By contrast, the independence of the continuum hypothesis
amounts to saying that it holds in some models of ZFC but not others. Gödel's first
incompleteness theorem says that not only are the ZFC axioms incomplete, but they
cannot be completed in any reasonable way. We will discuss the incompleteness
phenomenon later.
We can reformulate completeness in terms of consistency, much like Corollary 6.9.
Theorem 10.4. Let Σ be a set of first-order sentences. If Σ is consistent, then Σ
has a model.
Note that the converse of this theorem is obvious (or, rather, will be obvious
once we have defined formal proof): anything you can prove using Σ must hold
in every model, so if Σ has a model, then you had better not be able to prove a
contradiction.
It is not hard to show that Theorem 10.4 is equivalent to completeness. Suppose
we assume the completeness theorem. If Σ has no model, then Σ ⊨ ⊥ and hence, by
completeness, Σ ⊢ ⊥. Conversely, suppose Theorem 10.4 holds, and Σ ⊨ φ. Then
Σ ∪ {¬φ} has no model and is thus inconsistent, which means Σ ⊢ φ.
What Theorem 10.4 tells us is that formal proofs are the only obstacle to the
existence of a model. If you can't derive a contradiction, then a model actually exists.
This is analogous to Corollary 6.9, but much deeper. Instead of creating just a
satisfying assignment, we need to come up with a first-order structure. Nevertheless,
the proof will follow similar lines, with one additional idea (Henkinization).
One striking application is ZFC. If ZFC is consistent, then it has a model. However,
Gödel's second incompleteness theorem will imply that ZFC cannot prove its own
consistency (assuming it is indeed consistent). Thus, although ZFC presumably
does have a model, it cannot prove this fact.
Theorem 10.4 makes the compactness theorem obvious. All proofs will have
finite length, so if there's a contradiction, then the contradiction can be derived
using only a finite number of sentences. Thus, if every finite subset of a set of
sentences is consistent, then the whole set is consistent. This assertion becomes
compactness when we replace consistency by satisfiability, and Theorem 10.4 tells
us that consistency and satisfiability are the same thing.
12. Downwards Löwenheim-Skolem
Definition 12.1. Let A and B be structures for some first-order language. We
say A is a substructure of B (written A ⊆ B) if their domains satisfy A ⊆ B, the
constants agree, and the functions and relations for A are the restrictions to A of
those for B. We say A is an elementary substructure of B (written A ⊆e B) if A ⊆ B
and for every wff φ(x1 , . . . , xk ) with free variables x1 , . . . , xk and all a1 , . . . , ak ∈ A,
A ⊨ φ(a1 , . . . , ak ) iff B ⊨ φ(a1 , . . . , ak ).
We also call B an elementary extension of A if A ⊆e B.
Note that a substructure can satisfy different sentences from its extension. In
fact, the sentences satisfied by a substructure are usually totally different, and it
deserves an explanation when they do agree.
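A tiny example (mine, not from the notes) makes the distinction concrete: a substructure of a linear order that is missing the top element of its extension fails to be elementary, because "x is maximal" changes truth value.

```python
# A sketch (example mine): A = ({0,1}, <) is a substructure of B = ({0,1,2}, <)
# but not an elementary one. The wff φ(x) = ¬∃y (x < y) ("x is maximal")
# holds of 1 in A but fails of 1 in B.

A, B = {0, 1}, {0, 1, 2}

def maximal(a, dom):
    """Evaluate ¬∃y (a < y) in the linear order with domain dom."""
    return not any(a < y for y in dom)

print(maximal(1, A), maximal(1, B))  # True False
```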
we can imagine how it feels inside the model.3 Suppose one day a booming voice
from the sky announces "How silly you are to think R is uncountable! I can easily
enumerate all the pitiful things you call real numbers." Upon being asked how, the
voice replies "For heaven's sake, you just do it. There's nothing special about it.
Really, your whole world is countable, so you can enumerate whatever you want."
When you complain that this doesn't seem possible, the voice says "I keep forgetting
how limited your stock of functions is. Indeed, your world has only countably many
of them! The enumeration is admittedly not a function as you understand the term,
but trust me: I can see functions that would make your head spin. If only you could
join me out here in the real world!"
A model of ZFC is a self-contained set-theoretic universe, and the usual definitions
and theorems apply only within the model. Unsolicited opinions from the universe
containing the model are neither relevant nor helpful. Furthermore, the external
universe may itself be just a model within some larger universe, so who is to say
which level counts as reality?
We will prove downwards Löwenheim-Skolem directly, with no need for the
compactness theorem.
Lemma 12.4 (Tarski-Vaught test). Let A and B be structures for a first-order
language. Then A ⊆e B if and only if A ⊆ B and the following equivalence holds:
for all wffs φ(x0 , . . . , xn ) with free variables x0 , . . . , xn and all a1 , . . . , an ∈ A,
B ⊨ ∃x0 φ(x0 , a1 , . . . , an ) iff there exists a0 ∈ A such that B ⊨ φ(a0 , a1 , . . . , an ).
This condition says that given a wff with an assignment in A of all but one of its
free variables, it can be satisfied in B by choosing a value for the remaining variable
from B iff it can be satisfied in B by choosing a value in A. Note that this condition
talks only about B, and says nothing directly about which sentences A satisfies.
It is far from obvious that any other ultrafilters exist, but we will see shortly that
they do.
Lemma 13.8. If F is a filter on S and X is a subset of S such that S \ X ∉ F ,
then there is a filter F ′ on S such that F ⊆ F ′ and X ∈ F ′.
Proof. The set X cannot have empty intersection with any element Y of F , because
S \ X would then contain Y and therefore be an element of F . It follows that
{X ∩ Y : Y ∈ F }
has the finite intersection property, because
⋂_{i=1}^n (X ∩ Yi ) = X ∩ ⋂_{i=1}^n Yi
and ⋂_{i=1}^n Yi ∈ F if Y1 , . . . , Yn ∈ F . Thus, there exists a filter F ′ on S that contains
{X ∩ Y : Y ∈ F }. Because F ′ is closed under taking supersets, X ∈ F ′ and F ⊆ F ′,
as desired.
Lemma 13.9. A filter F on a set S is an ultrafilter if and only if it is a maximal
filter on S (i.e., there is no filter F ′ on S such that F ⊊ F ′).
Proof. Suppose F is an ultrafilter. Then for every subset X of S, either X or S \ X
is contained in F , but no filter can contain both. Thus, F cannot be enlarged and
must be a maximal filter.
For the other direction, if F is a filter on S and X is a subset of S such that
neither X nor S \ X is in F , then Lemma 13.8 yields a strictly larger filter on S.
Proposition 13.10. Every filter is contained in a maximal filter (on the same set).
Proof. This proposition is a straightforward consequence of Zorn's lemma. Let F
be a filter on a set S, and let P be the poset of filters on S that contain F , ordered
by ⊆. If C is a chain in P , then
⋃_{F ′ ∈ C} F ′
is a filter on S: it contains S but not ∅ and is closed under taking supersets
because every element of C has these properties, and if X1 ∈ F1′ and X2 ∈ F2′ with
F1′ , F2′ ∈ C, then without loss of generality F1′ ⊆ F2′ because C is a chain, and hence
X1 ∩ X2 ∈ F2′ .
Thus, every chain in P has an upper bound, so P has a maximal element, as
desired.
It follows from Lemmas 13.4 and 13.9 and Proposition 13.10 that there is a
non-principal ultrafilter on every infinite set. However, non-principal ultrafilters
are mysterious objects and it is far from clear how to describe one. In particular,
it is consistent with ZF that all ultrafilters are principal. One might guess that
Proposition 13.10 would be equivalent to the Axiom of Choice, but in fact it is
strictly weaker.
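The restriction to infinite sets is essential: on a finite set, every ultrafilter is principal, which the following brute-force sketch (code mine) verifies for a 3-element set.

```python
# A brute-force sketch (code mine): on a finite set, every ultrafilter is
# principal. We enumerate all families of subsets of S = {0, 1, 2} and keep
# the ones that are ultrafilters.
from itertools import combinations

S = frozenset(range(3))
subsets = [frozenset(c) for r in range(4) for c in combinations(sorted(S), r)]

def is_filter(F):
    return (S in F and frozenset() not in F
            and all(A & B in F for A in F for B in F)               # closed under ∩
            and all(B in F for A in F for B in subsets if A <= B))  # upward closed

def is_ultrafilter(F):
    return is_filter(F) and all((X in F) != (S - X in F) for X in subsets)

ultras = [F for bits in range(1 << len(subsets))
          if is_ultrafilter(F := {X for i, X in enumerate(subsets) if bits >> i & 1})]

print(len(ultras))                                                  # 3
print(all(any(F == {X for X in subsets if s in X} for s in S) for F in ultras))  # True
```

There are exactly three ultrafilters, one principal ultrafilter per point of S.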
We built non-principal ultrafilters by extending the filter given by Lemma 13.4,
which is not contained in any principal ultrafilter. In fact, this is the only possible
way to construct them:
Proposition 13.11. If U is a non-principal ultrafilter on a set S, then every cofinite
subset of S is in U .
Proof. Let X be a cofinite subset of S, and suppose X ∉ U . Then S \ X ∈ U , and
S \ X is finite. Let Y be the intersection of all the subsets of S \ X that are in U ,
so Y is the smallest subset of S \ X that is in U . Then every element Z of U must
satisfy Y ⊆ Z, since otherwise Z ∩ Y would be a smaller element of U than Y is.
14. Voting
There is a remarkable connection between principal ultrafilters and the theory
of voting. Before proceeding to ultraproducts, we will use ultrafilters to prove
Arrow's impossibility theorem. This proof is due to Bengt Hansson ("The existence
of group preference functions," Public Choice 28 (1976), 89-98). It's not the shortest
or simplest proof, but it makes the connection with mathematical logic especially
vivid.
How can a group of voters rank several options (such as candidates for election
or bills)? Let's assume each voter has a personal ordering for these options. We
won't even ask about strength of preferences, since it's unclear how to normalize
preference strength between people. Instead, we want a voting system in the form
of a function that takes the voters' rankings and somehow aggregates them to form
a collective ranking.
One compelling approach is Condorcet's: rank A > B if and only if a majority of
the voters thought so. Unfortunately, this approach breaks down when there are
more than two options being voted on. For example, what if a third of the voters
think A > B > C, a third think B > C > A, and a third think C > A > B? Then
two thirds think A > B, two thirds think B > C, and two thirds think C > A. This
sounds paradoxical, but it isn't. Even if every voter is individually rational, pairwise
voting can create inconsistent rankings. For this particular case, by symmetry the
only reasonable option is to declare a tie.4
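The cycle is easy to verify by direct count; the following sketch (code mine) tallies the pairwise majorities for the three blocs above.

```python
# A sketch (code mine) tallying the pairwise majorities for three equal blocs
# with rankings A > B > C, B > C > A, and C > A > B.

blocs = ["ABC", "BCA", "CAB"]    # each string is a bloc's ranking, best first

def majority_prefers(x, y):
    """Does a majority of the (equal-sized) blocs rank x above y?"""
    votes = sum(1 for r in blocs if r.index(x) < r.index(y))
    return votes > len(blocs) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} > {y}:", majority_prefers(x, y))   # all three print True
```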
One might hope to salvage voting by using more than just pairwise vote totals.
Maybe if you look at the patterns of who voted in which way, you can discern
4Note that if you have the power to schedule pairwise votes, you can choose who you want to
win. If you like C, you first call for a vote on A vs. B, and A will win. Then you call for a vote
on A vs. C, and C will win, at which point you declare C the overall winner by transitivity. By
contrast, if you want B to win, you first call for a vote on A vs. C. This illustrates the potentially
enormous power of being in charge of what gets voted on when.
5This version is not exactly the same as Arrow's, but similar in spirit.
[Figure 14.2: the voters in X rank A ? B, while those in V \ X rank A > B.]
Lemma 14.6. If V \ X is not decisive for A > B, then X is decisive for B > C
(for all C ≠ A) and for C > A (for all C ≠ B).
Proof. We will prove just the first case (the second is completely analogous). Suppose
C ≠ A. Because V \ X is not decisive for A > B, there exists a scenario in which
V \ X votes for A > B but society thinks B ≥ A (see Figure 14.2). Now imagine
changing the rankings of C, which will not change the outcome for A vs. B by
independence of irrelevant alternatives. Suppose V \ X votes for A > C, while X
[Figure 14.3: X votes A > C and B > C, while V \ X votes A > B and A > C.]
votes for A > C and B > C (see Figure 14.3). Society still ranks B ≥ A, and now
A > C by unanimity, and hence society ranks B > C. On the other hand, the
position of A is irrelevant for this ranking, and the only part that did not refer to
A was that X votes for B > C. Thus, X must be decisive for B > C whenever
C ≠ A.
Lemma 14.7. If X is decisive for A > B, then V \ X is not decisive for B > A.
Proof. What would happen if the voters in X all voted for A > B and those in
V \ X all voted for B > A? They can't both win, so at most one of X and V \ X
can be decisive in this situation.
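The connection this section began with can be made concrete: aggregating by a fixed ultrafilter on the voters means ranking x above y exactly when the coalition voting that way lies in the ultrafilter. With finitely many voters the ultrafilter is principal, so the rule is a dictatorship, as in this sketch (code mine).

```python
# A sketch (code mine) of aggregation by an ultrafilter on the voters: society
# ranks x above y iff the coalition voting that way lies in the ultrafilter.
# With finitely many voters every ultrafilter is principal, so the rule is a
# dictatorship; here we use the principal ultrafilter at voter 1.

voters = ["ABC", "BCA", "CAB"]    # each voter's ranking, best first
dictator = 1                      # U = {coalitions containing voter 1}

def in_U(coalition):
    return dictator in coalition

def social_prefers(x, y):
    coalition = {i for i, r in enumerate(voters) if r.index(x) < r.index(y)}
    return in_U(coalition)

# The social ranking is exactly voter 1's ranking B > C > A:
print(social_prefers("B", "C"), social_prefers("C", "A"), social_prefers("B", "A"))
# True True True
```

Because the dictator's personal ranking is transitive, so is the social ranking, which is how the ultrafilter rule escapes the Condorcet cycle.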
15. Ultraproducts
Fix a first-order language, and consider structures Ai for i ∈ I. How can we
combine them to form a single structure? One obvious way is to try to form a product
A := ∏i∈I Ai . We can take it to have domain ∏i∈I Ai , and we can define constants
and functions in the obvious way: cA = (cAi )i∈I and f A ((ai )i∈I ) = (f Ai (ai ))i∈I for
unary functions, etc. However, it is not clear how to handle relations. Furthermore,
products have poor first-order properties: given a set of first-order sentences, its
models are generally not closed under taking products.
Łoś's theory of ultraproducts resolves all these difficulties. The idea is that if
we want to tell what's true in an ultraproduct, we let the factors vote using an
ultrafilter.
Finally, relations are well defined for essentially the same reason.
The intuition behind ultraproducts is that for every statement, we let the factors
vote on whether it should be true. Whichever side forms a set in the ultrafilter
wins. Note that having an ultrafilter, as opposed to a filter, is crucial: it means that
exactly one of φ and ¬φ will win, since they correspond to complementary sets of
voters.
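When I is finite, every ultrafilter on I is principal, and the voting picture collapses to a single factor. The following sketch (code mine) lets three finite graphs vote on atomic facts using the principal ultrafilter at one factor.

```python
# A sketch (code mine) of factors voting when I is finite: three graphs on
# {0, 1} vote on each atomic fact, with the principal ultrafilter at factor 1
# deciding the winner. The result agrees with A_1, reflecting the fact that a
# principal ultrafilter just gives back one factor.

edges = [{(0, 1)}, {(1, 0)}, {(0, 1), (1, 0)}]   # edge sets of A_0, A_1, A_2

def in_U(votes):
    return 1 in votes                            # principal ultrafilter at 1

def product_edge(x, y):
    votes = {i for i, E in enumerate(edges) if (x, y) in E}
    return in_U(votes)                           # the side in U wins the vote

print(product_edge(0, 1), product_edge(1, 0))    # False True, same as A_1
```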
Note that the equivalence relation is just voting on equality: to decide whether
[x] = [y], let the factors vote. Similarly, the definition of RB also amounts to taking
a vote. The fundamental theorem of ultraproducts says that once we have set up
the definition, the same is true for all wffs:
Theorem 15.2 (Fundamental theorem of ultraproducts). Let B = ∏i∈I Ai /U be
an ultraproduct. Suppose φ(x1 , . . . , xn ) is a wff with free variables x1 , . . . , xn and
a1 , . . . , an ∈ ∏i∈I Ai . Then
B ⊨ φ([a1 ], . . . , [an ]) iff {i ∈ I : Ai ⊨ φ(a1i , . . . , ani )} ∈ U .
Proof. This assertion is true by construction for atomic wffs, and we will prove it for
general wffs by induction on formula length. Let Sφ = {i ∈ I : Ai ⊨ φ(a1i , . . . , ani )}.
We can reduce everything to the case of logical symbols ¬, ∧, and ∃ by writing
everything else in terms of them. For ¬, note that S¬φ = I \ Sφ , while X ∈ U if
and only if I \ X ∉ U . For ∧, note that S(φ∧ψ) = Sφ ∩ Sψ , which is in U if and
only if both Sφ and Sψ are, since U is both closed under intersection and upwards
closed.6 Finally, suppose ψ = ∃x0 φ(x0 , . . . , xn ). Then B ⊨ ψ([a1 ], . . . , [an ]) iff there
exists a0 ∈ ∏i∈I Ai such that B ⊨ φ([a0 ], . . . , [an ]). By induction, the latter holds
iff there exists a0 ∈ ∏i∈I Ai such that {i ∈ I : Ai ⊨ φ(a0i , . . . , ani )} ∈ U . Finally,
6If it is not obvious why upwards closure is relevant, you should pause and think about this.
structure
∏n≥1 Cn /U
Proof. Let S be the set of finite subsets of Σ, and for each T ∈ S, let AT be a model
of T . We will choose an ultrafilter U on S so that the ultraproduct
∏T ∈S AT /U
is a model of Σ.
Given a suitable ultrafilter U on S (with properties to be specified below), let
A = ∏T ∈S AT /U . We wish to ensure that A ⊨ φ for all φ ∈ Σ. By the fundamental
theorem of ultraproducts, A ⊨ φ if and only if {T ∈ S : AT ⊨ φ} ∈ U . Thus, we
need the sets {T ∈ S : AT ⊨ φ} to be in U for each φ ∈ Σ.
Let Sφ = {T ∈ S : AT ⊨ φ}. These sets have the finite intersection property,
because {φ1 , . . . , φn } ∈ Sφ1 ∩ · · · ∩ Sφn and hence Sφ1 ∩ · · · ∩ Sφn ≠ ∅. Thus, there
exists an ultrafilter on S that contains Sφ for each φ ∈ Σ (by Lemmas 13.5 and 13.9
and Proposition 13.10). If U is such an ultrafilter, then ∏T ∈S AT /U is a model of
Σ, as desired.
Lines 1 and 2 are hypotheses, and line 3 is a case of Axiom 7. Now line 4 follows
by modus ponens, line 5 is Axiom 5, and line 6 again uses modus ponens. Finally,
line 7 follows by generalization and line 8 by modus ponens.
Lemma 16.6. Let φ be a sentence and Σ a set of sentences.
(1) If Σ is consistent and Σ ⊢ φ, then Σ ∪ {φ} is consistent.
(2) If Σ ∪ {φ} is inconsistent, then Σ ⊢ (¬φ).
(3) If Σ ∪ {(¬φ)} is inconsistent, then Σ ⊢ φ.
Corollary 16.7. Let φ be a sentence and Σ a set of sentences. If Σ is consistent,
then at least one of Σ ∪ {φ} and Σ ∪ {(¬φ)} is consistent (possibly both).
The proofs are the same as those of Lemma 6.6 and Corollary 6.7.
Theorem 16.8 (Completeness). Let Σ be a set of sentences and φ a sentence. If
Σ ⊨ φ, then Σ ⊢ φ.
Corollary 16.9. If a set of wffs is consistent, then it is satisfiable.
To see why the corollary follows from the completeness theorem, note that if Σ is
not satisfiable, then every model for Σ satisfies ⊥ (because Σ has no models), and
thus Σ ⊢ ⊥. In fact, the theorem also follows from the corollary: if every model of Σ
satisfies φ, then Σ ∪ {(¬φ)} is not satisfiable. By the corollary, it is inconsistent,
so part (3) of Lemma 16.6 implies that Σ ⊢ φ.
Thus, it suffices to prove the corollary. To do so, we must somehow use consistency
to produce a model. We will face the same difficulties as in the proof of the
completeness theorem for propositional calculus, and we will deal with them in the
same way, with one additional trick known as Henkinization.
Lemma 16.10. Let Σ be a consistent set of sentences, φ(x) a wff with one free
variable, and c a constant symbol that does not occur anywhere in Σ or φ. If
Σ ⊢ ∃x φ(x), then Σ ∪ {φ(c)} is consistent.
Proof. If Σ ∪ {φ(c)} is inconsistent, then Σ ⊢ (φ(c) → ⊥). Suppose y is a variable
that does not occur anywhere in φ(x) or the formal proof of (φ(c) → ⊥) from
Σ. If we replace c with y everywhere it occurs in the proof, then we find that
Σ ⊢ (φ(y) → ⊥) and hence, by generalization, Σ ⊢ ∀y (φ(y) → ⊥).
Now by Axiom 7, Σ ⊢ (∀y (φ(y) → ⊥) → (φ(x) → ⊥)), because no occurrence of
y in φ(y) can be within the scope of a ∀x. (Recall that every occurrence of y in
φ(y) was a free occurrence of x in φ(x) and hence was not bound by any ∀x.) Thus,
Σ ⊢ (φ(x) → ⊥) by modus ponens, and hence Σ ⊢ ∀x (φ(x) → ⊥) by generalization.
Finally, one of our hypotheses is that Σ ⊢ ∃x φ(x). In other words, Σ ⊢
(∀x (φ(x) → ⊥) → ⊥). When combined with Σ ⊢ ∀x (φ(x) → ⊥), we find that
Σ ⊢ ⊥. Thus, if Σ is consistent, then Σ ∪ {φ(c)} is also consistent.
Proof of the completeness theorem. Suppose Σ is a consistent set of wffs. We would
like to build a model of Σ, but we have nothing to work with except syntax. Henkin's
brilliant idea was that we can build the model out of pure syntax, if we expand our
language enough through a process now called Henkinization.
We would like to expand Σ to a new set Σ′, in an expanded language, with the
following two properties:
(1) The set Σ′ is a maximal consistent set of wffs. I.e., Σ′ is consistent but not
contained in any larger consistent set (in the same language).
(2) Whenever Σ′ ⊢ ∃x φ(x), there is some constant symbol c such that Σ′ ⊢ φ(c).
In other words, Σ′ assigns a name to everything it proves exists.
Either of these alone is easy. To get a maximal consistent set, we can just test
each wff sequentially and add it to the set if it is consistent with what came before.
This is exactly what we did in the proof of Theorem 6.8. One minor difference is
that the language might not be countable, but that's not a problem (we can just
well-order the language and use transfinite induction).
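The sequential-testing step is easy to simulate in the propositional setting, where consistency can be checked by brute force over truth assignments. A sketch (code mine):

```python
# A sketch (code mine) of the sequential-testing step in the propositional
# setting: wffs are boolean functions of a truth assignment, consistency is a
# brute-force satisfiability check, and we keep each wff that stays consistent.
from itertools import product

VARS = ("p", "q")
wffs = [lambda a: a["p"],             # p
        lambda a: not a["p"],         # ¬p  (will be rejected)
        lambda a: a["p"] or a["q"],   # (p ∨ q)
        lambda a: a["q"]]             # q

def consistent(gamma):
    """Is there a truth assignment satisfying every wff in gamma?"""
    return any(all(w(dict(zip(VARS, bits))) for w in gamma)
               for bits in product([False, True], repeat=len(VARS)))

gamma = []
for w in wffs:                         # test each wff and keep it if possible
    if consistent(gamma + [w]):
        gamma.append(w)

print(len(gamma))  # 3: p, (p ∨ q), and q were kept; ¬p was rejected
```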
Similarly, we can also ensure that the second property holds. For each existential
statement ∃x φ(x) implied by Σ, we introduce a new constant symbol c (never before
used). By Lemma 16.10, Σ ∪ {φ(c)} is consistent. Henkinization means extending
the language and set of sentences in this way. If we cycle through all the existential
statements, we can deal with all of them this way. Of course, by enlarging the
language we have created new existential statements, so there's an infinite regress
here, but after infinitely many cycles of Henkinization we arrive at a fully Henkinized
set of sentences.
Either of these two processes can disrupt the other. If we extend the set of
sentences to make a maximal consistent set, then we may increase its consequences,
leaving it no longer Henkinized. Furthermore, Henkinizing enlarges the language,
after which the set of sentences may no longer be maximal. However, we can achieve
both properties by iteration: let Σ0 be Σ, let Σ2i+1 be a maximal consistent set
containing Σ2i , and let Σ2i be the Henkinization of Σ2i−1 when i ≥ 1. Then the
union Σ′ of all these sets is a maximal consistent, fully Henkinized set of sentences.
To see why, let Li denote the language of Σi , and L′ that of Σ′ (i.e., the set of
constant symbols in L′ is the union of those for the Li ). Any wff for the language L′
involves only finitely many constant symbols and must thus be a wff for Li for some
i. Thus, every wff for L′ was already dealt with at some finite stage of this process,
which means Σ′ is both maximal consistent and fully Henkinized.
Now we can make a structure A out of our language. Its domain A will be the
set of equivalence classes [c] of constant symbols c under the equivalence relation
c ∼ d, meaning (c = d) ∈ Σ′. (Note that φ ∈ Σ′ if and only if Σ′ ⊢ φ, because Σ′ is
a maximal consistent set.) In other words, having given names to everything that
might exist, we take them to be the elements of our domain, after setting them
equal if Σ′ thinks they are equal. Note that Axioms 8 and 9 imply that ∼ is indeed
an equivalence relation.
We set cA = [c], and we define the relations RA in A by
RA ([c1 ], . . . , [cn ]) iff R(c1 , . . . , cn ) ∈ Σ′.
Note that Axiom 9 implies that these relations are well defined. To get functions,
we can use the following trick: given an n-ary function symbol f and constants
c1 , . . . , cn , Σ′ ⊢ ∃x (x = f (c1 , . . . , cn )). (Of course such an x should exist, namely
x = f (c1 , . . . , cn ), and one can check that Σ′ proves this.) Thus, Σ′ ⊢ (c0 =
f (c1 , . . . , cn )) for some constant c0 , and we can set f A ([c1 ], . . . , [cn ]) to be [c0 ]. It is
not hard to check that this is well defined.
So far, we have built a structure A. We can now prove by induction on φ that
A ⊨ φ iff φ ∈ Σ′.
The base case is atomic formulas, which follow from the construction of A. The cases
φ = ¬ψ and φ = (ψ1 → ψ2 ) are handled exactly as in the proof of Theorem 6.8.
Finally, suppose φ = ∀x ψ(x). Because Axiom 7 says Σ′ ⊢ (∀x ψ(x) → ψ(c)) for
every constant symbol c, we see that Σ′ ⊢ ∀x ψ(x) implies ψ(c) ∈ Σ′ for all constants
c, as desired. By contrast, if Σ′ ⊢ ¬∀x ψ(x), then Σ′ ⊢ ∃x ¬ψ(x) (recall that we
proved (¬∀x ψ(x) → ∃x ¬ψ(x)) as an example of a formal proof), and hence
Σ′ ⊢ ¬ψ(c) for some constant c.
As discussed in Section 10, compactness is an immediate corollary of completeness:
any proof of a contradiction will have finite length and thus involve only finitely
many hypotheses. However, completeness is of fundamental importance in its own
right. It gives us a precise analysis of what provability means, which opens the
door to limitative results such as the incompleteness theorems. (Without a precise
definition of provability, there is no way to establish unprovability.)
proofs. Some have a preference for constructive proofs, because they give more
information, but almost nobody is skeptical about the correctness of non-constructive
proofs. However, earlier in the century there were prominent doubters. Hilbert
proposed the following plan (Hilbert's program) for convincing everyone that
non-constructive proofs were harmless:
(1) Formalize all of mathematics.
(2) Identify an absolutely uncontroversial part of mathematics, the most finite
and constructive part. Nobody should doubt any of its techniques.
(3) Prove the consistency of all of mathematics (including non-constructive
methods) using only the uncontroversial part of mathematics.
If this plan could have been carried out, it would have shown that non-constructive
methods were harmless: they could not possibly contradict the results of constructive
mathematics. However, Gödel's incompleteness theorems show that Hilbert's
program cannot be carried out. The first rules out Step 1, but that is not a
devastating objection: we could simply replace "all of mathematics" with "all of
known mathematics". However, the second incompleteness theorem shows that
Step 3 is impossible. Assuming our formalization is consistent, it cannot prove its
own consistency at all, let alone using only constructive techniques.
18.1. The Liar. The paradox of the liar, attributed to Epimenides, is best phrased
in the form "This sentence is false." That sentence cannot be true (since it claims to
be untrue), and therefore must be false. However, if it is false, then since it claims
to be false, it is false that it is false, and hence it is true after all. Logic is helpless
when confronted with this paradox.
Of course, the moral of this story is that one must never say "This sentence
is false," or at least if one does say it, one mustn't really mean it. It is neither
a true nor a false sentence, and is therefore meaningless. It is important to bar
self-contradictory sentences from mathematics, to keep from having to deal with
the resulting contradictions.
18.2. Russell's Paradox. At the turn of the century a contradiction did in fact
slip into the foundations of mathematics. Frege was a pioneer in mathematical logic,
and had devoted many years to formalizing mathematical reasoning. Just after
the second volume of his magnum opus was completed, he received a letter from
Bertrand Russell, pointing out the following paradox. Consider the set S defined by
S = {X | X ∉ X}.
In other words, S is the set of all sets that are not elements of themselves. That
is not as weird a property as it might seem at first. Most sets are not elements
of themselves; e.g., the set of real numbers is not a real number. Unless one has
eccentric tastes in set theory, S should in fact consist of all sets. Russell asked
8The names "completeness theorem" and "incompleteness theorem" may seem contradictory,
but they refer to different aspects of formal mathematics. The underlying system of logic is complete:
given a set Σ of axioms and a sentence φ, one can prove φ from Σ if and only if φ is satisfied
by all models of Σ. In other words, everything that really follows from Σ is provable, and vice
versa. However, if we are trying to deal with a large enough branch of mathematics, then the set
Σ of axioms will necessarily be incomplete: there will be sentences φ that can neither be proved nor
disproved using Σ.
Definition 19.2. The axiom set Σ is complete if for every sentence φ, either Σ ⊢ φ
or Σ ⊢ ¬φ. It is consistent if Σ ⊬ ⊥, and it is ω-consistent if Σ ⊬ □⊥.
In other words, Σ is complete if everything can be proved or disproved, it is
consistent if it cannot prove a contradiction, and it is ω-consistent if it cannot
prove that a contradiction can be proved. (This is only a special case of the standard
definition of ω-consistency, but it will work for our purposes here.) Notice that if
Σ is ω-consistent, then it is consistent, by the first Hilbert-Bernays axiom. However,
the converse is false: just because one can prove that a contradiction is provable,
it does not follow that a contradiction actually is provable, as we will see once we
have proved the second incompleteness theorem.
Lemma 19.3. For every sentence φ, Σ ⊢ ((□φ ∧ □¬φ) → □⊥).

Proof. It is a tautology that φ → (¬φ → ⊥), so Σ ⊢ φ → (¬φ → ⊥). Then the first
Hilbert-Bernays axiom implies Σ ⊢ □(φ → (¬φ → ⊥)), and the third then yields Σ ⊢
□φ → □(¬φ → ⊥). When combined with Σ ⊢ □(¬φ → ⊥) → (□¬φ → □⊥), we
conclude that Σ ⊢ □φ → (□¬φ → □⊥). However, the wff □φ → (□¬φ → □⊥)
is equivalent to (□φ ∧ □¬φ) → □⊥, so Σ ⊢ ((□φ ∧ □¬φ) → □⊥), as desired.
Lemma 19.5. If γ is a Gödel sentence, then Σ ⊢ □γ → □⊥ and Σ ⊢ ¬□⊥ → γ.

Proof. The proof simply amounts to examining the proof of the first incompleteness
theorem in the light of the Hilbert-Bernays axioms. We can prove Σ ⊢ □γ → □⊥
as follows: by the definition of γ, Σ ⊢ γ → ¬□γ. It follows from the first and third
axioms that Σ ⊢ □γ → □¬□γ. On the other hand, Σ ⊢ □γ → □□γ by the
second axiom. Thus, Σ ⊢ □γ → (□□γ ∧ □¬□γ), so it follows from Lemma 19.3
that Σ ⊢ □γ → □⊥.
We conclude by proving Σ ⊢ ¬□⊥ → γ. By the definition of γ, Σ ⊢
γ ↔ ¬□γ, so Σ ⊢ ¬□γ → γ. On the other hand, we have already proved
Σ ⊢ □γ → □⊥, so Σ ⊢ ¬□⊥ → ¬□γ. Thus, Σ ⊢ ¬□⊥ → γ.
Theorem 19.6 (Second incompleteness theorem). If Σ is consistent and has a
Gödel sentence, then Σ ⊬ ¬□⊥.
Note that ¬□⊥ expresses the unprovability of a contradiction, i.e., the consistency
of our system. Thus, the second incompleteness theorem says a consistent system
cannot prove its own consistency.
Proof. Let γ be a Gödel sentence, and suppose Σ ⊢ ¬□⊥. By Lemma 19.5,
Σ ⊢ □γ → □⊥, so Σ ⊢ ¬□⊥ → ¬□γ. Thus, Σ ⊢ ¬□γ, so Σ ⊢ γ because γ is a
Gödel sentence. However, by the first incompleteness theorem, Σ ⊢ γ implies Σ ⊢ ⊥,
so Σ is inconsistent.
The proof of the second incompleteness theorem basically amounts to Lemma 19.5.
In other words, it comes down to the fact that Σ is itself strong enough to prove
the first incompleteness theorem.
It is natural to wonder whether there are ω-inconsistent axiom sets that are
nevertheless consistent. One minor consequence of the second incompleteness
theorem is that there are. Suppose Σ is consistent.9 Then Σ ⊬ ¬□⊥, so Σ ∪ {□⊥}
is consistent. However, this enlarged system is automatically ω-inconsistent, since
□⊥ asserts the provability of a contradiction. (Strictly speaking, it asserts the
provability using just Σ as axioms, but that implies the provability using Σ ∪ {□⊥}.)
This construction of ω-inconsistent systems is somewhat silly, since it simply
adds the provability of a contradiction as an axiom. One doesn't expect to run into
many other examples, and the concept of ω-inconsistency is more of a theoretical
worry than a practical danger. However, we must deal with it if we are to study the
incompleteness theorems rigorously (without using Rosser's trick).
In the same way that Gödel sentences assert their own unprovability, one can construct
sentences that assert their own provability. Henkin asked what one could say about
such sentences. A Henkin sentence is a sentence η such that Σ ⊢ η ↔ □η.
It might at first sound like nothing could be said about Henkin sentences, but
Löb realized that they are in fact always provable. In fact, something even more
general is true:

Theorem 19.7 (Löb). For all sentences φ, if Σ ⊢ □φ → φ, then Σ ⊢ φ.
The intuitive motivation for the proof of Löb's theorem is the following paradoxical
argument for the existence of Santa Claus:
9One might wonder whether there is any consistent axiom system capable of formalizing the
concept of proof. For set theory, the answer is presumably yes, but it is not so clear how we could
know for sure. However, this formalization can also be carried out in much humbler systems, such
as elementary number theory. Ultimately, in order to prove that a system is consistent, we have to
assume something, but first-order number theory seems like a safe bet.
Löb's theorem is extraordinary, since it says the only way Σ can prove □φ → φ
is if it actually proves φ (in which case □φ → φ, or indeed ψ → φ for any ψ, follows
trivially). This is the exact opposite of naive intuition. After all, □φ → φ means
that if φ is provable, then it is true. Surely, it should be possible to prove that
anything that's provable is true. However, Löb's theorem says you can't prove this
in general, and in fact that you can prove □φ → φ only in the narrowest possible
case, namely when φ itself is provable.
Using Löb's theorem, we can give another proof of the second incompleteness
theorem: suppose Σ ⊢ ¬□⊥, i.e., Σ ⊢ □⊥ → ⊥. Then Löb's theorem, applied with
φ = ⊥, yields Σ ⊢ ⊥, so Σ is inconsistent.
In fact, one can also deduce Löb's theorem from the second incompleteness
theorem.
20. Constructing a Gödel sentence
In the previous section, we did not actually construct a Gödel sentence, but
instead referred vaguely to indirect self-reference. One can carry out the construction
more formally as follows.
First, we assign a Gödel number to each wff and to each finite sequence of
wffs. In other words, we encode them as natural numbers.10 We can do so in any
reasonable way (using prime factorizations, decimal expansions, or whatever we
like). The Gödel numbers for finite sequences of wffs will be used to describe formal
proofs.
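As a concrete toy illustration of one such encoding (the alphabet and scheme below are illustrative choices, not the particular one the notes have in mind), here is a prime-factorization Gödel numbering of strings over a small alphabet, together with its decoder:

```python
# A toy Gödel numbering via prime factorizations: distinct strings get
# distinct numbers, and decoding is mechanical. Any injective, decodable
# encoding would serve equally well.

SYMBOLS = "()¬∧∨→pq"  # a tiny illustrative alphabet

def primes():
    """Yield 2, 3, 5, 7, ... (trial division suffices for a toy)."""
    n = 2
    while True:
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def godel_number(wff):
    """Encode symbols s1 s2 ... sk as 2^c1 * 3^c2 * ... * p_k^ck,
    where ci is the 1-based index of si in SYMBOLS."""
    n = 1
    for p, ch in zip(primes(), wff):
        n *= p ** (SYMBOLS.index(ch) + 1)
    return n

def decode(n):
    """Recover the string from its Gödel number by factoring."""
    out = []
    gen = primes()
    while n > 1:
        p = next(gen)
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        out.append(SYMBOLS[e - 1])
    return "".join(out)

assert godel_number("p") == 2 ** 7          # 'p' is symbol number 7
assert decode(godel_number("(p∧q)")) == "(p∧q)"
```

A second layer of the same trick (encoding a sequence of wff numbers as exponents of primes) would then number finite sequences of wffs, i.e., formal proofs.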
Now let P(m, n) be a wff with two free variables m and n, which says "m is
the Gödel number of a proof of the sentence with Gödel number n." Of course,
expressing this idea as a first-order wff is rather cumbersome, but it can certainly
be done in the language of set theory, for example. (Doing it for first-order number
theory is trickier but also possible.) The precise definition of P will depend on our
axiom set Σ.
Let Q(m, n) be a wff saying "m is the Gödel number of a wff with one free
variable, and n is the Gödel number of that wff with m substituted for the free
variable." In other words, if we let Wk(x) denote the wff with Gödel number k and
free variable x (if any, since some Gödel numbers won't correspond to a wff with
one free variable), then Q(m, n) means that n is the Gödel number of Wm(m).
Plugging m into Wm should bring to mind diagonalization, and that is exactly
what is going on here.
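Readers who know some programming may find the following analogy helpful (it is only an analogy, not part of the formal construction): substituting a description into itself, as Q(m, n) does with Wm(m), is exactly the trick that lets a program print its own source code. A minimal Python sketch:

```python
import contextlib
import io

# Diagonalization as self-substitution: `template` plays the role of a
# wff W with one free slot, and `template % template` is W applied to
# its own description, i.e. the analogue of W_m(m).

template = 'template = %r\nprint(template %% template)'
source = template % template  # plug the template into itself

# Executing `source` as a program prints exactly `source` back out:
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(source)
assert buf.getvalue() == source + "\n"
```

The program `source` is a quine: it contains a description of itself (the repr of `template`) plus the instruction to substitute that description into itself, just as Wm(m) plugs the number m into the wff numbered m.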
Finally, let G(k) = ∀n (Q(k, n) → ¬∃m P(m, n)). In other words, G(k) says that
Wk(k) is not provable.
If g is the Gödel number of the wff G(k) (which has one free variable k), then
G(g) is a Gödel sentence: it says that Wg(g) is not provable, but G = Wg, so G(g)
asserts its own unprovability.
This construction is by no means limited to Gödel sentences, since it depends on
no property of provability except that it can be defined by a first-order wff. Given
any wff R(n) that specifies a property of the sentence with Gödel number n, we
could define G(k) = ∀n (Q(k, n) → R(n)). If g is the Gödel number of G(k), then
G(g) asserts that it itself has property R. For example, taking R(n) = ¬∃m P(m, n)
gives a Gödel sentence. We can use this approach to produce the auxiliary sentence
in the proof of Löb's theorem (which asserts that its own provability implies φ).
10It's traditional to use natural numbers, but if we are doing this in set theory we could just as
easily use strings.