NUNC COGNOSCO EX PARTE
TRENT UNIVERSITY
LIBRARY
Digitized by the Internet Archive
in 2019 with funding from
Kahle/Austin Foundation
https://archive.org/details/numbersystemsfouOOOOfefe
THE
Foundations of Algebra
and Analysis
This book is in the
ADDISON-WESLEY SERIES IN MATHEMATICS
NUMBER SYSTEMS
Foundations of Algebra
and Analysis
by
SOLOMON FEFERMAN
Department of Mathematics
Stanford University
Dedicated to my mother and father.
PREFACE
1.1 Introduction
The mathematical method
1.2 Logic
Mathematical statements and their structure • Existence • Logical connectives
2.1 Sets
Sets as abstractions from conditions • Extensions of the concept of set • Identity and inclusion • Some peculiar sets
2.2 An algebra of sets
Intersection, union, and complement • Basic laws of the algebra of sets • Extended intersections and unions
3.3 Order
Simply ordered systems • Well-ordered systems • Ordering and the arithmetical operations
Chapter 5 Polynomials
Bibliography
Index
CHAPTER 1
One can go quite far on the basis of an uncritical use of the various
number systems. Much of the differential and integral calculus that we
know today, as well as the physical theories which are expressed in these
mathematical terms, was developed in just such a way. For example,
in the calculus we are asked to consider computations of infinite length,
such as
$1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \cdots$
In this particular case, we are easily convinced, on the basis of the formula
$(1 + r)(1 - r + r^2 - r^3 + \cdots) = 1,$
$1 - 1 + 1 - 1 + \cdots$
$(1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots = 0$
It is not that the uncritical approach necessarily gives wrong answers, but
rather that there are certain questions for which it provides no coherent
answer at all.
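The difference between a well-behaved infinite computation and a troublesome one can be seen by looking at partial sums. The following Python sketch is only an illustration: the helper name `partial_sums` and the sample value r = 1/3 are our assumptions, not part of the text. For r = 1/3 the partial sums of 1 − r + r² − r³ + ⋯ settle toward 1/(1 + r) = 3/4, while for r = 1 they alternate between 1 and 0 and never settle.

```python
from fractions import Fraction

def partial_sums(r, n):
    """Return the first n partial sums of 1 - r + r^2 - r^3 + ... as exact fractions."""
    sums, total, term = [], Fraction(0), Fraction(1)
    for _ in range(n):
        total += term
        sums.append(total)
        term *= -r  # next term of the alternating series
    return sums

convergent = partial_sums(Fraction(1, 3), 8)   # illustrative choice r = 1/3
oscillating = partial_sums(Fraction(1), 8)     # the series 1 - 1 + 1 - 1 + ...

print([float(s) for s in convergent])
print([int(s) for s in oscillating])
```

Exact fractions are used so that the comparison with 3/4 is not blurred by rounding.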
In the study of Fourier series (which have many applications in engineering
and physics) in the latter half of the nineteenth century there
arose certain questions which could not be adequately answered on the
basis of an uncritical approach to the number systems and which, at the
same time, could not be avoided. In response, a number of workers in
mathematics and logic embarked on a critical program to clarify the
concepts which were involved. The result of their work gradually resolved
itself into a systematic theory which could be used to settle the
troublesome questions to the satisfaction of most mathematicians. An
understanding of this theory is now an essential prerequisite to the study
of modern mathematics.
We confine ourselves in this book to that part of this theory which has
most to do with the number systems themselves, and to those matters
in mathematics which are most directly related to the number systems.
Those which are closest at hand are first—in the field of algebra—the
determination of the potentialities and limitations on solving algebraic
1.1] INTRODUCTION 3
For the reader who is interested in finding out more about symbolic logic
we recommend the textbooks listed in the Bibliography.
1.2] LOGIC 5
(1:2-1) x + y = 0,
(1:2-2) 3 + (-2) = 0,
which is, of course, false. Another way is provided by the use of the
words “all” (every, any) and “some” (there is, there exists). Some examples
of statements which can be formed from the condition (1:2-1)
using these words are:
(1:2-3) for all integers x and y, x + y = 0;
(1:2-4) for some integers x and y, x + y = 0;
(1:2-5) for any integer x there is an integer y such that x + y = 0;
(1:2-6) for any positive integer x there exists a positive integer y such
that x + y = 0.
(1:2-7) x + y = 5
namely the integers 1, 2, 3, and 4; however, (1:2-9) is still false, since the
condition (1:2-9) is not true of all positive integers x, in particular not
true of the number 5.
It is seen that variables serve roughly the same purpose in mathematical
statements as do the pronouns “it,” “this,” “that” in ordinary language.
If we did not use some such device as variables, even the simplest mathematical
statements, such as
(1:2-12) for any two integers (not necessarily distinct), the result of
squaring the first and subtracting the square of the second is
the same as forming the product of two terms, the first of which
is the result of subtracting the second given integer from the
first, while the second term of the product is the result of adding
the two integers together.
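Written with variables, the verbal statement (1:2-12) is presumably the familiar identity x² − y² = (x − y)(x + y); this symbolic form is our reading, since it is not displayed here. The short Python sketch below checks that reading on a sample range of integers. It is evidence for the identity, not a proof, and the function name is hypothetical.

```python
def difference_of_squares_holds(x, y):
    """Check x^2 - y^2 == (x - y)(x + y) for one pair of integers."""
    return x * x - y * y == (x - y) * (x + y)

# Test every pair in a small sample range, including pairs with x == y
# ("not necessarily distinct").
all_hold = all(
    difference_of_squares_holds(x, y)
    for x in range(-20, 21)
    for y in range(-20, 21)
)
print(all_hold)
```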
One note of warning should be sounded about the use in many mathematical
texts of the words “variable” and “constant.” For example, one
may read such a phrase as:
(1:2-15) x + y = y + x
(1:2-18) for any real number x, if $x < -2$, then $4 < x^2$.
(1:2-20) for every prime number x there exists a prime number y with
x < y.
The intuitive reason for the truth of (1:2-21) is simple. The polynomial
$x^5 - 7x^2 + 2$ has the value 2 at x = 0 and −4 at x = 1; since its value
varies continuously between 2 and −4 as x varies between 0 and 1, there
must exist an x between 0 and 1 for which the value is 0. (This is not a
precise proof; we shall be able to give precise proofs of statements of this
sort in Chapter 7.) Again there is no known formula which will exhibit a
solution of the given equation (and, in a precise sense which is given in
courses of advanced algebra, there is probably no hope of finding one; this
will be discussed in more detail in Chapters 8, 9). Moreover, no finite
computation of predetermined length will end with exhibiting a specific
real number x as a solution of (1:2-21). However, the following infinite
sequence of computations will bring one closer and closer to a solution,
if it does not end in a finite number of steps with an exact solution. First
compute the value of $x^5 - 7x^2 + 2$ at 0.0, 0.1, 0.2, . . . , 0.9, 1.0. It is
possible that one of these rational numbers is an exact root, and our
computation is ended. Otherwise, in one of these intervals the value at
the left endpoint must be positive and at the right endpoint negative.
Suppose, for example, that $(0.5)^5 - 7(0.5)^2 + 2 > 0$ and
$(0.6)^5 - 7(0.6)^2 + 2 < 0$. Thus a root of the polynomial lies between 0.5 and 0.6.
Then we compute the value of the polynomial at 0.50, 0.51, 0.52, . . . ,
0.59, 0.60, testing in each case to see whether this value is zero, positive,
or negative. By continuing in this way, we can find the decimal value of
a root to any desired number of places. The necessity of performing a
* The precise statement and proof of this is based on the advanced theory
of recursive functions developed in recent research papers.
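The decimal-by-decimal procedure described above for the polynomial x⁵ − 7x² + 2 on the interval from 0 to 1 can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the names `p` and `refine_root` are hypothetical, and exact fractions stand in for hand computation. Each pass splits the current interval into ten parts and keeps the part on which the value changes from positive to negative.

```python
from fractions import Fraction

def p(x):
    """The polynomial x^5 - 7x^2 + 2 discussed in the text."""
    return x**5 - 7 * x**2 + 2

def refine_root(places):
    """Return the left endpoint of an interval of width 10**-places inside [0, 1]
    on which p goes from positive to negative, or an exact root if one is hit."""
    left, width = Fraction(0), Fraction(1)
    for _ in range(places):
        step = width / 10
        points = [left + k * step for k in range(11)]
        for a, b in zip(points, points[1:]):
            if p(b) == 0:
                return b  # the computation ends with an exact root
            if p(a) > 0 and p(b) < 0:
                left, width = a, step  # keep the subinterval with the sign change
                break
    return left

approx = refine_root(6)
print(float(approx))
```

Running `refine_root` with a larger argument yields more decimal places, which mirrors the point of the passage: no single finite computation exhibits the root, but each further step narrows it down.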
10 THE LOGICAL BACKGROUND [CHAP. 1
To summarize, we see that there are four situations which can accompany
a proof of existence. First we may extract from the proof a simple
formula for the solution, such as with the solution of the quadratic equation.
Second, the proof may lead us to a finite systematic computation
procedure (often called an algorithm), such as in case (1:2-20). Third,
the proof may lead us to an infinite systematic computation procedure,
as in case (1:2-21); this is especially true in connection with various
problems whose solutions are real or complex numbers. Finally, the
proof may lead us to no computation procedure at all, and must rest as
a pure statement of existence; this is the case in (1:2-22). The student will
find these different kinds of situations intertwined throughout this book
and his further work in mathematics; he should always keep a sharp eye
out for the distinctions, if they are not explicitly mentioned. If he becomes
worried about the small amount of attention that is paid to computation
in his further courses of mathematics, he should remember that this is
much more often out of necessity than out of perversity.
One more point to be made about statements of existence, which is
away from this main issue, is that “there exists” is to be interpreted as
“there is at least one.” Thus in (1:2-21) one can see that there must be
another root of $x^5 - 7x^2 + 2$ in the interval from 1 to 2 (compute the
value of the polynomial at 2). Similarly, in (1:2-22) there may be many
values of c in the given interval at which the function attains its maximum.
In order to say that there is just one object satisfying a given condition,
we usually use the words “there is exactly one” or “there is a unique.”
For example, the following is true:
is counted as true, but would not be so counted under the exclusive interpretation
of “or.” It may be that we do not give enough credit to the non-
is called an implication. The first part of it, $x < -1$, is called the hypothesis
or antecedent and the second part, $9 < x^2$, is called the conclusion
or consequent. The use of implication in mathematics differs in certain
respects from ordinary usage. Very often we intend to convey, in everyday
language, that there is some sort of cause and effect relationship
between the hypothesis and conclusion of an implication. Examples of
such are “If you eat that green apple (then) you will get sick,” and “If
you do that again (then) I’ll spank you.” It seems very difficult to try to
give this sort of relationship a precise meaning, especially in connection
with mathematical statements. The simplest precise way to provide a
uniform treatment of implication is to demand that the truth or falsity
of an implication depend only on the truth or falsity of its components,
and should not necessarily depend on the sense of the components. We
see only one way to show that an implication is false, namely by showing
that the hypothesis is true while the conclusion is false. In all the other
(three) cases, under this understanding of implication, the implication
should be counted as being true. Consider, for example, the following
instances of (1:2-25):
Of these four, there is only one in which the conclusion is false and the
hypothesis true, namely (d); hence in all other cases the implication is
true. This may go slightly against the grain, especially in cases (b) and
(c), but only if we try to think of the conclusion as “necessarily following”
from the hypothesis. The hypothesis in both these cases is false, and one
often says, in such cases, that the whole implication is vacuously true.
In case the reader still has doubts he should compare these with such
statements as “If you high jump seven feet then I’ll eat my hat.” [Which
of (1:2-25′)(b), (c) is this like?] Only (1:2-25′)(d), of the four cases given,
shows that (1:2-25) is not true for all integers x.
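The truth-functional reading of implication described above can be tabulated mechanically. In the Python sketch below, `implies` is a hypothetical helper, and the condition tested is our reading of the one quoted in the text (hypothesis x < −1, conclusion 9 < x²); the specific instances (a) through (d) are not reproduced here, so the sample range of integers is an assumption.

```python
def implies(hypothesis, conclusion):
    """Material implication: false only when the hypothesis is true
    and the conclusion is false."""
    return (not hypothesis) or conclusion

# All four combinations of truth values for hypothesis and conclusion.
table = [(h, c, implies(h, c)) for h in (True, False) for c in (True, False)]
for row in table:
    print(row)

# Integers in a sample range for which "if x < -1 then 9 < x^2" fails.
counterexamples = [x for x in range(-10, 11) if not implies(x < -1, 9 < x * x)]
print(counterexamples)
```

A single counterexample is enough to show that the condition does not hold for all integers x, while every integer with a false hypothesis makes the implication vacuously true.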
The reader may feel that it is an academic matter to discuss implications
which have a false hypothesis. This is not the case, since a number of
arguments in mathematics, most notably proofs by contradiction (or
reductio ad absurdum), involve just such situations. In these proofs we
show that a certain statement $\mathcal{A}$ is not true as follows: we imagine that
$\mathcal{A}$ is true and we infer from this another statement, $\mathcal{B}$, which is known to
be false. In other words, we prove
Assuming that the inference is correct, i.e., that (1:2-26) is true, it follows
from the falsity of $\mathcal{B}$ that $\mathcal{A}$ cannot be true. Of course, if we knew in advance
that $\mathcal{A}$ is not true, we would not be interested in the implication
(1:2-26), since it is vacuously true. We shall give no examples of such
proofs by contradiction now; several of these, which we shall point out
explicitly, will be found later in the text.
There is one sense in which the notion of implication is used in mathematics
which is more closely related to the everyday sense of “necessary
consequence.” That is when we say that one statement $\mathcal{A}$ implies another
statement $\mathcal{B}$, by which we mean that $\mathcal{B}$ can be logically inferred from $\mathcal{A}$
on the basis of the initial axioms. From the point of view of formal logic,
this will happen only in the case when the statement “if $\mathcal{A}$ then $\mathcal{B}$” can
be inferred from the axioms, so that again there is no necessary connection
between $\mathcal{A}$ and $\mathcal{B}$. However, from the informal point of view, we usually
concern ourselves only with implications involving statements whose
contents are somehow related.
We now turn to the use of the words “if and only if.” A condition such as
1. Tell, in each of the following cases, whether the statement is true or false
and give brief informal reasons for each answer.
(a) There exists an integer x such that $x^2 - 4 = 0$.
(b) There exists a unique positive integer x such that $x^2 - 4 = 0$.
(c) There exists a rational number x such that $x^2 + x + 1 = 0$.
(d) For any positive real number x there exists a real number y such that
$y^2 = x$.
(e) For any real number x there exists a real number y such that $y^2 = x$.
(f) For every integer x there exists an integer y such that x = 2y or
x = 2y + 1.
(g) For all positive integers x, if x > 3 then $x^2 > 9$.
(h) For all positive integers x, if x > 2 then $x^2 > 9$.
(i) For all real numbers x and y, if x < y and $y \neq 0$ then (x/y) < 1.
(j) For all integers x, $x^2 < 16$ if and only if −4 < x and x < 4.
(k) For all integers x, $x^3 < 27$ if and only if x < 3.
(l) For all integers x, $(x^3 - 1)/13 < 2$ if and only if x < 2.
2. Which of the four possible combinations of truth and falsity in the hypothesis
and conclusion can be realized by substituting particular integers for
x in the following condition? (Give examples of each.)
It is easily seen that these conditions are equivalent for all values of x.
This suggests that the two conditions are in some sense identical, although
the senses which they express are, on the face of it, different. The matter
can be put more exactly in the following way: the totality of objects which
satisfy condition (2:1-1) is exactly the same as the totality of objects which
satisfy condition (2:1-2). Among the objects which are contained in this
totality are the numbers 3, 0, •§, − f, $\sqrt{2} + 3$, $\pi + 1$, etc., and among
those which are not contained in the totality are 5, −f, $\sqrt{-1}$, the moon,
etc. We cannot possibly list all members of this totality, but this should
not prevent us from ascribing an existence to this totality which is independent
of any particular way of describing the same totality.
When we shift attention in this way from particular ways of singling
out a certain group of objects, to the totality of objects itself, we come to
the notion of set (also called class or collection). Each condition involving
one variable determines a set. More generally, this holds for certain kinds
of conditions whose formulation involves several variables. Some of these
variables may be tied down by the use of words such as “all” or “some”;
these are said to be bound. The rest are called free variables. For example,
the condition
has one free variable, x, and one bound variable, y, while the condition
(in fact, statement)
where in the first two the free variable is understood to be x, in the third y,
and in the fourth n. Similarly,
indicate conditions with two free variables. Given a condition with one
free variable, say $\mathcal{A}(x)$, we denote by
(2:1-6) $\{x : \mathcal{A}(x)\}$
the set consisting of all those objects c which satisfy the condition. (2:1-6)
may be read: the set of all x such that $\mathcal{A}(x)$. In effect, for any specific condition
$\mathcal{A}(x)$, (2:1-6) provides us with a symbolic means of denoting a specific
set, which is analogous to the use of constants to denote specific members.
Moreover, just as with constants denoting numbers (“2 + 3” and “5”
denote the same number), different expressions, for example, $\{x : \mathcal{A}(x)\}$ and
$\{x : \mathcal{B}(x)\}$, can be used to denote the same set. The condition for this is
simple:
(2:1-7) $\{x : \mathcal{A}(x)\} = \{x : \mathcal{B}(x)\}$ if and only if for all x, $\mathcal{A}(x)$ is equivalent
to $\mathcal{B}(x)$.
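Principle (2:1-7) can be imitated with Python set comprehensions over a finite universe. The sketch below is illustrative only: the universe and the two conditions `condition_a` and `condition_b` are our own hypothetical choices, picked because they are worded differently yet hold of exactly the same integers.

```python
universe = range(-10, 11)  # a finite stand-in for "all objects"

def condition_a(x):
    """Hypothetical condition: x is a root of x^2 - 5x + 4."""
    return x * x - 5 * x + 4 == 0

def condition_b(x):
    """Hypothetical condition: x is 1 or x is 4."""
    return x == 1 or x == 4

set_a = {x for x in universe if condition_a(x)}
set_b = {x for x in universe if condition_b(x)}

# Equivalent conditions determine the same set, and conversely.
equivalent = all(condition_a(x) == condition_b(x) for x in universe)
print(set_a == set_b, equivalent)
```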
(2:1-9) $c \in S$,
(2:1-10) $c \notin S$.
Thus, for the particular set S introduced in (2:1-8) the following are true
statements:
denotes the set whose only elements are −5, $\pi + 2$, $\sqrt{2}$. As another
example, we have
With this notation for a finite set it is immaterial in what order the elements
are listed or even whether they are listed more than once. Thus
all denote the same set, which is the same as the set denoted in (2:1-12).
A notation similar to this is often used informally to denote some infinite
2.1] SETS 17
(2:1-14) {1,4,9,...}
This “dot” notation is also often used to indicate certain finite sets of any
general class. For example, we might write
as a notation for the finite set consisting of the first n squares of positive
integers, when n is a positive integer.
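Python's built-in sets behave the same way as the listing notation described above: order and repetition do not matter. In this small sketch the function name `first_squares` is hypothetical, and it builds the finite set of the first n squares that the “dot” notation abbreviates.

```python
def first_squares(n):
    """The finite set {1, 4, 9, ..., n^2} for a positive integer n."""
    return {k * k for k in range(1, n + 1)}

listed_once = {1, 4, 9}
listed_with_repeats = {9, 1, 4, 4, 1}  # different order, repeated elements

print(first_squares(3) == listed_once == listed_with_repeats)
```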
Extensions of the concept of set. With every finite set can be associated
in a direct manner a certain condition which defines the set in the form
(2:1-6). For example, the set (2:1-12) is the same as the set
such examples.) This example is concerned with the idea that certain real
numbers, such as $\sqrt{2}$, $\sqrt{2}$ + §, $\sqrt{2}$ − %7-, are like each other in that their
differences are rational numbers; others, such as $\sqrt{2}$ + f, $\pi + 2$, are
unlike in that their differences are not rational. This suggests classifying
all the real numbers into different sets, where two real numbers x and y
belong to the same set S if and only if x − y is rational. Let the collection
of all such sets S be called A. Thus
^ ^2 - , ^2, . . . , ^2 +
S2 = {• . . , 7T — 91, . . . , 7T — Jf, . . . , 7T + 2, . . .}
Figure 2.1
Imagine each of the circles to represent one of the sets S of A; from each
of these one element is chosen, shown as a darker point, and the set B
is the set of all the chosen points. Unfortunately, there seems to be no
rule which one could give in advance for selecting a particular element
from each set S of A. In other words, there is no condition $\mathcal{B}(x)$ that
we can think of (which does not assume prior knowledge of the set B)
such that $B = \{x : \mathcal{B}(x)\}$. (Convince yourself of this by trying, to the
contrary, to formulate such a rule or condition.) The statement that
there exists such a set B is a typical consequence of a general statement
which is called the principle or axiom of choice. We shall have several
occasions to apply this statement in the development of the properties
of the real number system; we trust that the student will find each such
application plausible in its own right.
From the initial consideration of sets defined by conditions we have
thus widened our conception to include sets which may not be defined by
any condition at all. The example in the preceding paragraph also suggests
another way for widening our conception. Sets can be formed by collecting
objects of any kind. In particular, once we conceive of sets themselves
as being objects, we must allow sets whose elements are in turn sets.
Indeed, the collection A of the preceding paragraph is just such a set.
As another example, consider the set
Then
where P is the set of positive integers. Still another example is the set
then
where Re is the set of real numbers and Ra the set of rational numbers.
(2:1-24)(a) $B \subseteq A$,
(Some writers use the symbol $\subset$ instead of $\subseteq$, while others prefer to write
20 THE SET-THEORETICAL BACKGROUND [CHAP. 2
In particular,
(2:1-26) $A \subseteq A$
(2:1-28) $\{X : X \subseteq A\}$,
i.e., if they have exactly the same elements, then for all sets S, $A \in S$ if
and only if $B \in S$.
The following familiar statements are immediate consequences of
(2:1-29):
The first of these is called the reflexive law for identity, the second the
symmetric law, and the third the transitive law. There are many relations
between objects, other than the relation of identity, which also satisfy
these conditions. For example, let us call two real numbers x, y equivalent,
and write $x \equiv y$, if x − y is rational. Then it is seen that (2:1-30)(a)
through (c) hold true of any real numbers x, y, z when we replace the
symbol “=” by the symbol “$\equiv$.” Such “identity-like” relations are very
interesting and will be used at a number of places in our work. The point
to be realized from this is that the laws in (2:1-30) are not sufficient to
characterize identity, but that rather some much stronger condition, such
as (2:1-29), must be used. Certain other statements about identity,
with which the student is familiar from his school courses, such as “if
equals are added to equals the results are equal,” will be discussed in
Section 2.3.
since to this day we do not know whether this set contains any elements
(i.e., we do not know the truth or falsity of Fermat’s “last theorem”).
For example,
(2:1-34) {x: x is a prime number and 23 < x and x < 29} = {z: z is
a unicorn}.
(2:1-35) $\emptyset$.
(2:1-36) $\emptyset \subseteq A$,
(2:1-37) A = {X: X is a set}.
(2:1-39) for any set S and condition $\mathcal{A}(x)$ there exists a set A such that
for all x, $x \in A$ if and only if $x \in S$ and $\mathcal{A}(x)$.
In other words, if we already have a set S, we can form any smaller set
$\{x : x \in S \text{ and } \mathcal{A}(x)\}$. (If we now try to carry through the Russell paradox,
by forming $\{X : X \in S \text{ and } X \notin X\}$, we see that no contradiction is
obtained, since the resulting set A need not belong to S.) If the student
checks back over all the examples discussed in this section he will see
that in every instance we have formed a set by restricting a presumably
pre-existing set, for example the set P of positive integers, I of integers,
or Re of real numbers. On the other hand, there is a second principle which
we can use which will allow us to proceed from a given set A to a moderately
larger set:
(2:1-40) for any set A there exists a set B such that for all X, $X \in B$
if and only if $X \subseteq A$.
In other words, given a set A, we are assured that we can form the set
$\{X : X \subseteq A\}$ of all subsets of A. By this means one can proceed from the
assumption only of the existence of certain simple infinite sets, such as
the set P of positive integers, to obtain the existence of somewhat larger
sets. In particular, we shall show how the existence of the set Re of real
numbers can be obtained by a combination of principles like (2:1-39)
and (2:1-40). The student may want to keep these principles in mind in
the further work; however, he should realize that they are by no means a
full set of axioms for set theory, and that therefore a number of our
set-theoretical arguments will have to appeal to his intuitive understanding
of the subject.
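For finite sets, the two principles (2:1-39) and (2:1-40) correspond to two simple constructions. The Python sketch below is a finite illustration only, with hypothetical function names `separate` and `power_set`; it says nothing about the infinite sets that the text is ultimately concerned with.

```python
from itertools import combinations

def separate(s, condition):
    """Finite analogue of (2:1-39): the elements of s that satisfy condition."""
    return {x for x in s if condition(x)}

def power_set(a):
    """Finite analogue of (2:1-40): the set of all subsets of a.
    Subsets are frozensets so that they can themselves be elements of a set."""
    items = list(a)
    return {
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    }

evens = separate({1, 2, 3, 4, 5, 6}, lambda x: x % 2 == 0)
subsets = power_set({1, 2, 3})
print(evens, len(subsets))
```

The first construction never leaves the given set, while the second produces a set with 2ⁿ elements from a set with n elements, which is the sense in which it is "moderately larger."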
3. Which of the following are true and which false? In each case, give a
brief explanation of your answer.
(a) $\{x : x \in Re \text{ and } x^2 - 5x + 4 = 0\} \subseteq \{x : x \in P \text{ and } x^2 - 5x + 4 > 0\}$.
(b) $\{x : x \in Re \text{ and } x^2 - 5x + 4 = 0\} \neq \emptyset$.
(c) {x: x G Re and x2 + x + 2 = 0} 0.
(d) $\{x : x \in Re \text{ and } x - 4 > 0\} \subseteq \{x : x \in Re \text{ and } x^2 - 5x + 4 > 0\}$.
(e) $\{x : x \in Re \text{ and } x - 2 > 0\} \subseteq \{x : x \in Re \text{ and } x^2 - 5x + 4 > 0\}$.
4. Let $A = \{S : S \subseteq I \text{ and for some } x \in I,\ S = \{y : y \in I \text{ and } x - y \text{ is a multiple of } 5\}\}$.
One member of A is the set {. . . , −4, 1, 6, 11, . . .}.
What are the other members of A? Find a set B which contains exactly
one element in common with each member of A.
Of course, for particular sets S and A this diagram may not correspond
at all to the natural way of visualizing the relationship between A and S.
For example, if S is the set I of integers and
Figure 2.3
Thus, when using diagrams such as the first, we should keep in mind that
this may represent a considerable deformation of actual geometrical
relationships. A further simplifying step in the use of such diagrams is
accomplished by omitting picturing the points entirely, leaving them to
be supplied by our imagination (Fig. 2.4).
Figure 2.4
This has the double advantage of being less tiresome to draw and of allowing
us to imagine that S may be infinite. Such pictures and their combinations,
to be described in the following, are often called Venn diagrams.
From given conditions $\mathcal{A}(x)$ and $\mathcal{B}(x)$ we can form the new conditions
(We must use the restriction to S in the last, since the use of sets $\{x : x \notin A\}$,
which are too large, could lead to paradoxical conclusions.) We shall
assume that S is some fixed but arbitrary set in the following; then we can
write $\bar{A}$ instead of $\bar{A}^{(S)}$.
The sets formed in (2:2-4) can be visualized as follows:
Figure 2.5.
(2:2-5) $A - B = A \cap \bar{B}$,
$(A \cap \bar{B}) \cup (B \cap \bar{A}) = (A \cup B) - (A \cap B)$.
If we wish, a proof of this last statement can easily be given from the
definitions (2:2-4) and (2:2-5).
The following diagrams represent combinations involving three sets
A, B, C:
Figure 2.10.
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
and
The student should draw diagrams to pick out other combinations, such
as $A \cap (B \cap C)$, $(A \cup B) \cap C$, $A \cup (B \cap C)$, $(A \cap B) \cup (B \cap C)$, etc.
The relation of inclusion, $B \subseteq A$, is diagrammatically represented by
Thus if $B \subseteq A$, we have
$A \cap B = B$
and
$A \cup B = A$.
Conversely, it appears from Figs. 2.5 and 2.6 that if either of these latter
relationships holds we must have $B \subseteq A$. We further have that for any
subset A of S, $\emptyset \subseteq A$, and $A \subseteq S$.
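The relationship between inclusion, intersection, and union stated above can be checked exhaustively on a small example. In this Python sketch the set S and the helper `subsets` are our own hypothetical choices; for every pair of subsets of S it compares the three conditions that B is included in A, that the intersection of A and B equals B, and that the union of A and B equals A.

```python
from itertools import combinations

def subsets(s):
    """All subsets of a finite set s, as a list of sets."""
    items = sorted(s)
    return [
        set(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]

S = {1, 2, 3, 4}

# For every pair (a, b), the three conditions are all true or all false.
agree = all(
    (b <= a) == ((a & b) == b) == ((a | b) == a)
    for a in subsets(S)
    for b in subsets(S)
)
print(agree)
```

A check on one small S is not a proof, but it does cover the cases of inclusion and disjointness that a single "typical" Venn diagram leaves out.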
A relation which is in a sense opposite to that of inclusion holds when
A and B have no elements in common. We say that A and B are disjoint
if $A \cap B = \emptyset$. Under this definition the empty set is disjoint from every
set, including itself. In general, the relation of disjointness can be pictured
as follows:
Figure 2.15
Basic laws of the algebra of sets. If we look back over Figs. 2.5 through
2.12, we see that in some sense only “typical” relationships between the
sets are shown, in that in no case does the relation of inclusion or disjointness
hold. For this reason an argument using Venn diagrams may be considered
slightly unreliable (just as with arguments by figures in plane or
solid geometry). However, any such argument can be made definitive by
returning to the basic definitions of $\emptyset$, $\subseteq$, $\cap$, $\cup$, and $\bar{\ }$. There is also another
possibility (again analogous to geometry), namely, to try to find some basic
properties of these notions from which other properties can be deduced.
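$(A \cup B) \cup C$;
(iv) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;
(v) $A \cap \emptyset = \emptyset$ and $A \cup S = S$;
(vi) $A \cap S = A$ and $A \cup \emptyset = A$;
(vii) $A \cap \bar{A} = \emptyset$ and $A \cup \bar{A} = S$;
(viii) $\overline{(A \cap B)} = \bar{A} \cup \bar{B}$ and $\overline{(A \cup B)} = \bar{A} \cap \bar{B}$;
(ix) $\bar{\bar{A}} = A$;
(x) $B \subseteq A$ if and only if $A \cap B = B$;
(xi) $B \subseteq A$ if and only if $A \cup B = A$.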
It is known that the set of statements (i)-(ix) is complete in the sense that
any equation formed using $\emptyset$, $S$, $\cap$, $\cup$, and $\bar{\ }$ (relative to S) and any
number of variables A, B, C, . . . which is true for any set S and subsets
A, B, C, . . . of S can be deduced from (i)-(ix). As an example, consider
the statement following (2:2-5),
(2:2-7) $(A \cap \bar{B}) \cup (B \cap \bar{A}) = (A \cup B) \cap \overline{(A \cap B)}$,
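$$
\begin{aligned}
(A \cup B) \cap \overline{(A \cap B)} &= (A \cup B) \cap (\bar{A} \cup \bar{B}) && \text{by (viii)} \\
&= [(A \cup B) \cap \bar{A}] \cup [(A \cup B) \cap \bar{B}] && \text{by (iv)} \\
&= [\bar{A} \cap (A \cup B)] \cup [\bar{B} \cap (A \cup B)] && \text{by (ii)} \\
&= [(\bar{A} \cap A) \cup (\bar{A} \cap B)] \cup [(\bar{B} \cap A) \cup (\bar{B} \cap B)] && \text{by (iv)} \\
&= [\emptyset \cup (\bar{A} \cap B)] \cup [(\bar{B} \cap A) \cup \emptyset] && \text{by (vii)} \\
&= [(\bar{A} \cap B) \cup \emptyset] \cup [(\bar{B} \cap A) \cup \emptyset] && \text{by (ii)} \\
&= (\bar{A} \cap B) \cup (\bar{B} \cap A) && \text{by (vi)} \\
&= (A \cap \bar{B}) \cup (B \cap \bar{A}) && \text{by (ii)}
\end{aligned}
$$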
After a little practice it is very easy to develop skill in such deductions
and to combine several applications of the basic laws into single steps.
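The equation (2:2-7) deduced above can also be confirmed by brute force on a small universe. The Python sketch below is an illustration under our own assumptions: the universe S, the helper names `subsets` and `complement`, and the reading of the bars in (2:2-7) as complements relative to S. It tests the equation for every pair of subsets.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4})  # a small fixed universe

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]

def complement(a):
    """Complement of a relative to S."""
    return S - a

# Left side: (A and not-B) or (B and not-A); right side: (A or B) and not-(A and B).
identity_holds = all(
    (a & complement(b)) | (b & complement(a)) == (a | b) & complement(a & b)
    for a in subsets(S)
    for b in subsets(S)
)
print(identity_holds)
```

Such a check verifies the equation for one universe only; the deduction from the basic laws is what shows it for every S.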
For clearly, the right side of this equivalence follows from the left side.
2.2] AN ALGEBRA OF SETS 31
By a similar argument
$B \cup C = A \cup B$.
Hence
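So far we have not used conditions (2:2-6)(x), (xi). These show that
$\subseteq$ can be defined either in terms of $\cap$ or $\cup$. They are not independent,
since we can deduce from (2:2-6)(i)-(ix) that
$A \cup B = A$,
$(A \cup B) \cap \bar{A} = \emptyset$ and $A \cap \overline{(A \cup B)} = \emptyset$.
$(A \cup B) \cap \bar{A} = (A \cap \bar{A}) \cup (B \cap \bar{A}) = \emptyset \cup (B \cap \bar{A})$
$= B \cap \bar{A} = (A \cap B) \cap \bar{A} = (A \cap \bar{A}) \cap B = \emptyset \cap B = \emptyset$;
$A \cap \overline{(A \cup B)} = A \cap (\bar{A} \cap \bar{B}) = (A \cap \bar{A}) \cap \bar{B} = \emptyset \cap \bar{B} = \emptyset$.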
Then from (2:2-6)(i)-(ix) and either (x), (xi) or (2:2-10) we can deduce
the properties of $\subseteq$, the following of which are the most basic:
(i) $\emptyset \subseteq A$,
(ii) $A \subseteq S$,
(iii) $A \subseteq A$,
(iv) if $A \subseteq B$ and $B \subseteq A$, then $A = B$,
(v) if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.
We have already verified most of these directly from our basic definition
of $\subseteq$ in the preceding section. Note that (2:2-11)(iv) is just a restatement
of (2:2-8), using (2:2-10).
We have not yet considered the question of the correctness of the statements
in (2:2-6) under our basic definitions. These can be proved using
the condition (2:1-23) of the preceding section, according to which two
sets are equal if they have the same elements. Thus, for example, to verify
the second part of (2:2-6)(iv), suppose $x \in A \cup (B \cap C)$. Then by
(2:2-4), $x \in A$ or $x \in B \cap C$, hence $x \in A$ or ($x \in B$ and $x \in C$). Suppose
$x \in A$; then it is true that $x \in A$ or $x \in B$, i.e., that $x \in A \cup B$, and
similarly it is true that $x \in A \cup C$. Thus in this case we see that
$x \in (A \cup B) \cap (A \cup C)$. On the other hand, if $x \in B$ and $x \in C$, we see
first that $x \in A \cup B$ and also that $x \in A \cup C$, hence again that
$x \in (A \cup B) \cap (A \cup C)$. Thus in either case we obtain
$x \in (A \cup B) \cap (A \cup C)$. In other words, we have shown that if $x \in A \cup (B \cap C)$,
then $x \in (A \cup B) \cap (A \cup C)$. By establishing the converse implication
in a similar way, we would obtain $x \in A \cup (B \cap C)$ if and only if
$x \in (A \cup B) \cap (A \cup C)$; in other words, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
By this procedure we can, with little exercise of imagination
and a great deal of writing, convince ourselves of the truth of each of the
statements in (2:2-6).
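The "great deal of writing" can be delegated to a machine when the universe is small. The Python sketch below checks several of the laws in (2:2-6), as we read them, for every choice of subsets A, B, C of a three-element universe. The universe, the helper names, and the dictionary of laws are our own hypothetical choices, and the check is a finite illustration rather than a proof for arbitrary S.

```python
from itertools import combinations, product

S = frozenset(range(3))   # a small fixed universe
EMPTY = frozenset()

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]

def comp(a):
    """Complement relative to S."""
    return S - a

# Each law takes three subsets; laws that mention fewer simply ignore the rest.
laws = {
    "(iv) first part": lambda a, b, c: a & (b | c) == (a & b) | (a & c),
    "(iv) second part": lambda a, b, c: a | (b & c) == (a | b) & (a | c),
    "(vii)": lambda a, b, c: a & comp(a) == EMPTY and a | comp(a) == S,
    "(viii)": lambda a, b, c: comp(a & b) == comp(a) | comp(b)
    and comp(a | b) == comp(a) & comp(b),
    "(ix)": lambda a, b, c: comp(comp(a)) == a,
}

results = {
    name: all(law(a, b, c) for a, b, c in product(subsets(S), repeat=3))
    for name, law in laws.items()
}
print(results)
```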
The proof of the completeness of (2:2-6), to which we have referred
earlier, is quite another matter. It is, in contrast, a metamathematical
statement, for it is concerned with the possibility of certain deductions
rather than with a mathematical realization of particular deductions. The
formal study of these laws, which goes back to G. Boole, forms the initial
part of what is usually called Boolean algebra. The proof of completeness,
which we shall not give here, can be found in various modern treatments
of Boolean algebra.
The reader has no doubt already recognized a striking resemblance
between some of the statements in (2:2-6) and the laws of ordinary
algebra. Indeed, if we replace the symbols $\cap$, $\cup$, $\emptyset$, $S$ by $\cdot$, $+$, $0$, $1$ re-
(i) $A \cdot A = A$ and $A + A = A$;
(ii) $A \cdot B = B \cdot A$ and $A + B = B + A$;
(iii) $A \cdot (B \cdot C) = (A \cdot B) \cdot C$ and
$A + (B + C) = (A + B) + C$;
(iv) $A \cdot (B + C) = (A \cdot B) + (A \cdot C)$ and
$A + (B \cdot C) = (A + B) \cdot (A + C)$;
(v) $A \cdot 0 = 0$ and $A + 1 = 1$;
(vi) $A \cdot 1 = A$ and $A + 0 = A$.
Of course, only (i), the second part of (iv), and the second part of (v) are
not met in ordinary algebra. Because of other formal similarities, many
writers often refer to the intersection of A and B as being the product
of A and B (written either A ■ B or AB) and to the union of A and B
as being the sum of A and B (written as A + B). Also, $\subseteq$ satisfies many of
the conditions met by the ordinary $\leq$ relationship among numbers; in
particular this is true of (2:2-8)(iii)-(v). However, for any two numbers
a, b we have $a \leq b$ or $b \leq a$; the corresponding is not true of sets, i.e.,
there exist sets A, B such that neither $A \subseteq B$ nor $B \subseteq A$. Because of the
many similarities between the laws for sets and those for numbers, the
use of the word “algebra” in dealing with sets is quite appropriate. Further,
various names used to describe ordinary algebraic laws are naturally
extended to the corresponding laws for sets. In particular, (2:2-6)(ii),
(iii), and (iv) are referred to respectively as commutative, associative, and
distributive laws. (2:2-6)(vi) shows that S and $\emptyset$ act as identity elements
for $\cap$ and $\cup$, respectively [cf. (2:2-6′)(vi)]. (2:2-8)(iii), (iv), and (v) are
referred to, respectively, as reflexive, antisymmetric, and transitive laws. We
shall have occasion to refer to such statements again in a variety of
algebraic contexts.
Another interesting aspect of the statements in (2:2-6) is their symmetry
or duality. Namely, if we interchange $\cap$ and $\cup$ and interchange S and $\emptyset$
while leaving $\bar{\ }$ unchanged, each part of the statements (2:2-6)(i)-(viii)
is converted into the other part. Further, (2:2-6)(ix) is self-dual, since
it involves only $\bar{\ }$. Since any true equation in these symbols can be derived
from (2:2-6)(i)-(ix), the dual equation obtained by such an interchange
must also be derivable. For example, we have shown that
$(A \cap \bar{B}) \cup (B \cap \bar{A}) = (A \cup B) \cap \overline{(A \cap B)}$
$(A \cup \bar{B}) \cap (B \cup \bar{A}) = (A \cap B) \cup \overline{(A \cup B)}$
A ∩ B ∩ C ∩ · · · ∩ X
= {x: x ∈ A and x ∈ B and x ∈ C and . . . and x ∈ X}
and
A ∪ B ∪ C ∪ · · · ∪ X
= {x: x ∈ A or x ∈ B or x ∈ C or . . . or x ∈ X}.
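For a finite list of sets, these extended intersections and unions amount to repeated application of the two-set operations. A minimal sketch, in which the function names and the sample family are our own choices for illustration:

```python
from functools import reduce

def big_intersection(sets):
    """Intersection of a nonempty list of sets: x belongs to every member."""
    return reduce(lambda acc, s: acc & s, sets)

def big_union(sets):
    """Union of a list of sets: x belongs to at least one member."""
    return reduce(lambda acc, s: acc | s, sets, set())

family = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]  # illustrative sets only
common = big_intersection(family)
anywhere = big_union(family)
print(common, anywhere)
```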
The algebra of sets gives us greater insight into sets and their possible
relationships. It can also be used, to great effect, for the more compact
and precise expression of various statements from set theory. Consider,
for example, the axiom of choice, which we described by a kind of diagram
(Fig. 2.1) in the preceding section. Now it can be expressed as follows:
(2:2-16) Let M be any nonempty collection of sets such that for each
X ∈ M, we have X ≠ ∅, and for each X, Y ∈ M if X ≠ Y
then X ∩ Y = ∅; then there exists a set A such that A ∩ X
contains a single element for each X ∈ M.
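For a finite collection M no axiom is needed, since a choice set can simply be constructed; the axiom matters for arbitrary collections. Still, a finite sketch shows what the statement asks for. The rule "take the smallest element" and the sample collection below are assumptions made only for the example.

```python
def is_disjoint_family(M):
    """True if every member of M is nonempty and distinct members are disjoint."""
    sets = list(M)
    if any(len(X) == 0 for X in sets):
        return False
    return all(sets[i].isdisjoint(sets[j])
               for i in range(len(sets)) for j in range(i + 1, len(sets)))

def choice_set(M):
    """Build a set A by picking one element (here the smallest) from each X in M."""
    return {min(X) for X in M}

M = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]  # illustrative
A = choice_set(M)
print(A, all(len(A & X) == 1 for X in M))
```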
(a) (X U Y) = X n Y.
(b) (X u Y) nz = (Xuz) n w
(c) (x u Y) nz = (X uz)nf.
(d) If X ⊆ Y and X ⊆ Z then X ⊆ Y ∩ Z.
(e) (X n Y) U (Y n X) c Y.
(f) If Y = (X n 7) U (Y Cl X) then X = 0.
36 THE SET-THEORETICAL BACKGROUND [CHAP. 2
It is not difficult to see (by factoring 2x² − 3xy + y² and considering the
different possibilities for the factors) that for any x, y, (2:3-1) is equivalent
to the following condition ℬ(x, y):
(2:3-2) x, y are positive integers and y ≤ x or 2x ≤ y.
This second condition makes it easier to see which numbers x, y are “solutions”
of (2:3-1), in the sense that they make 𝒜(x, y) true. For example,
for x = 1 we have solutions y = 1 and y = 2, 3, . . . ; for x = 2 we have
solutions y = 1, 2 and y = 4, 5, . . . ; for x = 5 we have solutions
y = 1, . . . , 4, 5 and y = 10, 11, 12, . . . ; etc. We cannot possibly list all
solutions x, y, since there are infinitely many of these; but we can imagine
a kind of infinite list which one could look into, to see whether or not a
given pair a, b is a solution. Let us, for the moment, use the notion of a
list in this extended sense. Schematically, such a list could be indicated
as in the table:
x | 1  1  1  . . .  2  2  2  . . .  5  5  5   . . .
y | 1  2  3  . . .  1  2  4  . . .  4  5  10  . . .
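The equivalence of the two conditions can be spot-checked on a finite grid. In the sketch below we assume that (2:3-1) is the inequality 2x² − 3xy + y² ≥ 0 for positive integers x, y (this reading is our assumption), and we take the second condition to be "y ≤ x or 2x ≤ y"; the bound N = 30 is arbitrary.

```python
def cond_231(x, y):
    # assumed reading of (2:3-1): 2x^2 - 3xy + y^2 >= 0, i.e. (2x - y)(x - y) >= 0
    return 2 * x * x - 3 * x * y + y * y >= 0

def cond_232(x, y):
    # second condition: y <= x or 2x <= y
    return y <= x or 2 * x <= y

N = 30  # arbitrary bound for the finite check
equivalent = all(cond_231(x, y) == cond_232(x, y)
                 for x in range(1, N + 1) for y in range(1, N + 1))
solutions_for_2 = [y for y in range(1, 9) if cond_232(2, y)]
print(equivalent, solutions_for_2)
```

The list `solutions_for_2` reproduces the pattern described for x = 2: the values 1, 2 and then everything from 4 on.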
Now since (2:3-1) and (2:3-2) are equivalent for all values of x, y, the
list of values associated with condition (2:3-1) is exactly the same as that
associated with condition (2:3-2). In other words, such a list serves the
2.3] RELATIONS AND FUNCTIONS 37
(2:3-3) (_), (. . .) are positive integers and (. . .) ≤ (_) or 2(_) ≤ (. . .).
Now what does it mean that a given pair of integers satisfies this condition?
If we speak of the pair 1, 2 it evidently doesn’t matter whether
we place 1 for (_) and 2 for (. . .), or if we do just the opposite. On the
other hand, if we speak of the pair 2, 3 we get different results according
as we place 2 for (_) and 3 for (. . .) or conversely, for in the first case the
condition is not satisfied, while in the second it is. Hence the order in which
a given pair of integers a, b is presented and the manner in which these are
to be associated with the free variables (“empty places”) of a condition
must be specified. This leads to the concept of an ordered pair of objects
a, b; we shall denote such by (a, b).
Ordered pairs and cartesian products. The ordered pair (a, b) stands in
contrast with the unordered pair {a, b} which we have already discussed.
For though we have {a, b} = {b, a}, it is essential to the concept of ordered
pair that we have (a, b) ≠ (b, a), unless a = b. More generally, we have
(2:3-5) (a, b) = (c, d) if and only if a = c and b = d.
We trust that it is no more difficult for the student to grant the existence
of objects (a, b) with this property than it is to grant the existence of sets;
in other words, we take the idea of ordered pair here as being a primitive
undefined notion. However, it is possible by a slightly sophisticated trick
to define it in terms of more basic notions (compare the first exercise at the
end of this section).
Having ordered pairs, the next step is easy. Instead of talking about
(possibly infinite) lists, we simply talk about sets of ordered pairs. For
example, associated with the condition (2:3-2) is
(2:3-6) the set of all ordered pairs (x, y) such that x, y are positive
integers and y ≤ x or 2x ≤ y.
Among members of this set we find the pairs (1, 1), (1, 2), (1, 3), . . . ,
(2, 1), (2, 2), (2, 4), . . . , (5, 4), (5, 5), (5, 10), . . . ; among nonmembers
we find (2, 3), (3, 4), (3, 5), . . . , also ( 1, 2), (t, 1), etc. More generally,
given any condition 𝒜(x, y) it seems that we can associate with it
(2:3-7) the set of all ordered pairs (x, y) such that 𝒜(x, y).
(2:3-8) for any sets A, B there exists a set C such that for all z, z ∈ C
if and only if for some x, y we have z = (x, y) and x ∈ A and
y ∈ B.
In other words, C has as members those, and only those, ordered pairs
(x, y) for which x ∈ A and y ∈ B. This set C is denoted by
(2:3-9) A × B
We call x the first term of z and y the second term of z. We can now apply
the principle (2:2-39) of the preceding section to see that
(2:3-11) for any sets A, B and condition 𝒜(x, y) there exists a set W such
that for all z, z ∈ W if and only if z ∈ A × B and, for the
unique x, y such that z = (x, y), we have 𝒜(x, y).
This is the set W = {z: for some x ∈ A, y ∈ B, 𝒜(x, y) and z = (x, y)}
or, as we shall write more economically,
If A, B are finite sets then A × B is also finite, for we can list completely
all possible combinations (x, y) of elements x of A with elements
and
It may be that, for a set W, W contains no ordered pairs or contains some
elements which are not ordered pairs. However, we have
Further, we have
Figure 2.18
Since relations are just special kinds of sets, it follows that the condition
(2:1-23) for the identity of two sets can be applied equally well to
relations. However, every element of a relation is an ordered pair, so we
can replace the condition in this case by the following more special one:
In this respect relations play a role for conditions with two free variables
which is completely analogous to the role played by sets for conditions with
one free variable; they serve to identify equivalent conditions. We must,
however, be cautious about one point in the analogy. Whereas there is
at most one set associated with each condition 𝒜(x), there are in general
two relations associated with conditions 𝒜(x, y). To see this, return to
the form (2:3-3) in which we expressed a certain condition using symbols
_ and . . . instead of variables x and y. In this form there is no reason to
prefer one symbol to vary over the domain of the relation and the other
to vary over the range. Associated with the given condition are two relations
W and W̆, one consisting of all pairs (_, . . .) satisfying the condition,
while the other consists of all pairs (. . ., _) satisfying the condition.
We can say that the relations W, W̆ are connected in the following way:
In such a case W̆ is said to be the converse of W; hence also W is the converse
of W̆. Consider, for example, the relation W = {(1, 1), (2, 1), (3, 2)}
Figure 2.19
It turns out in this case that we do not need to take this as a new primitive
notion so long as all we demand of this notion is that it fulfill (2:3-23). If
we define
(2:3-24) (a, b, c) = ((a, b), c),
then we can deduce (2:3-23) from the basic property (2:3-5) of ordered
pairs. This also leads us to define
(2:3-25) A × B × C = (A × B) × C,
so that A × B × C is the set of all triples (x, y, z), in the sense of (2:3-24),
such that x ∈ A, y ∈ B, and z ∈ C. For example, for A = {−2, 0, 5},
B = {3, 5}, and C = {0, 3},
This is not the same as the set A × (B × C). For example, (−2, 3, 0) =
((−2, 3), 0) by definition, which is distinct from (−2, (3, 0)). However,
there is a clear one-to-one correspondence between the elements of the two
sets. It is now easily seen how one would define the notion of ordered
quadruple (a, b, c, d) and the product A × B × C × D, and so on, for
larger numbers of factors, and in this way see how to treat conditions with
arbitrarily many free variables. We thus have for any specified positive
integer n a notion of ordered n-tuple, which agrees with that of ordered
pair for n = 2 and of ordered triple for n = 3.
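The difference between (A × B) × C and A × (B × C) can be made concrete with the sets A = {−2, 0, 5}, B = {3, 5}, C = {0, 3}. The sketch below represents ordered pairs by Python tuples, which is our modeling choice, and reads a triple (x, y, z) as the nested pair ((x, y), z).

```python
from itertools import product

def cartesian(A, B):
    """A x B as a set of ordered pairs (Python tuples)."""
    return {(x, y) for x, y in product(A, B)}

A, B, C = {-2, 0, 5}, {3, 5}, {0, 3}

left = cartesian(cartesian(A, B), C)    # (A x B) x C: elements ((x, y), z)
right = cartesian(A, cartesian(B, C))   # A x (B x C): elements (x, (y, z))

# the evident one-to-one correspondence ((x, y), z) -> (x, (y, z))
regrouped = {(x, (y, z)) for ((x, y), z) in left}

print(len(left), len(right))
print(((-2, 3), 0) in left, ((-2, 3), 0) in right)
print(regrouped == right)
```

Each product has 3 · 2 · 2 = 12 elements, the two sets share no element, and regrouping the parentheses carries one onto the other.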
We have defined a relation as being a subset of A × B for some A, B or,
equivalently, (2:3-17) as being any set of ordered pairs. (Then ∅ is a
relation, since ∅ ⊆ A × B for any A, B.) Under our definition (2:3-24),
every ordered triple (a, b, c) is at the same time an ordered pair, although,
of course, the converse is not true. Thus every set W of ordered triples is
a set of ordered pairs ((a, b), c) and hence is a relation. It is, however, a
relation of a more special kind, which we call a ternary relation. (More
generally, using the notion of ordered n-tuple, we could single out for
any specified positive integer n, the n-ary relations.) We could, if we wished,
refer to an arbitrary relation as being a binary relation, but this only
serves to re-emphasize the fact that it is a set of ordered pairs.
A nonmathematical example of a ternary relation is provided by the
set W = {(x, y, z): x, y, z are people and z is a son of x and y}. It is seen
that there are many a, b for which there is no c with (a, b, c) ∈ W; for
example a, b may not be married or may be married but have no son. On
the other hand, every human male c is the son of some a, b, so that
ℛ(W) = the set of human males. W also has the property that if
(a, b, c) ∈ W then (b, a, c) ∈ W; it does not have the property that if
(a, b, c) ∈ W and (a, b, c') ∈ W then c = c'. A mathematical example of
a ternary relation is provided by the set W' = {(x, y, z): x and y are odd
prime numbers and z = x + y}. Let O be the set of odd prime numbers
3, 5, 7, 11, . . . , and let E₆ be the set of even numbers z ≥ 6. Then
𝒟(W') = O × O and ℛ(W') ⊆ E₆; it is a famous open question (Goldbach’s
problem) whether ℛ(W') = E₆. W' has the property that if
(a, b, c) ∈ W' then (b, a, c) ∈ W'; it also has the property that if
(a, b, c) ∈ W' and (a, b, c') ∈ W' then c = c'.
W; W' is called the composition of W and W'. (Some writers use the
symbol W ∘ W' for this.) For example, if W = {(x, y): x is a son of y}
and W' = {(x, y): x is a child of y} then W; W' = {(x, y): x is a grandson
of y}. If W = {(x, y): x, y ∈ I and x < y} then W; W = {(x, y): x, y ∈ I
and x + 1 < y}.
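The second example can be tried out on a finite window of the integers. We take the composition W; W' to consist of all pairs (x, y) for which some z has (x, z) ∈ W and (z, y) ∈ W'; the window −5, . . . , 5 and the function name `compose` are assumptions of this sketch.

```python
def compose(W1, W2):
    """W1 ; W2 = all (x, y) with (x, z) in W1 and (z, y) in W2 for some z."""
    return {(x, y) for (x, z) in W1 for (z2, y) in W2 if z == z2}

I = range(-5, 6)                       # finite stand-in for the integers
less = {(x, y) for x in I for y in I if x < y}
less_less = compose(less, less)
expected = {(x, y) for x in I for y in I if x + 1 < y}
print(less_less == expected)
```

The window causes no loss here, since any z strictly between two members of the window is itself in the window.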
A similar operation on relations using the words “for all” can also be
defined, but we would find no use for it here.
Equivalence relations are very much like the identity relation. Consider
the following two relations: W = {(x, y): x, y ∈ I and x − y is a multiple
of 3}, W' = {(x, y): x, y ∈ Re and x − y ∈ Ra}. The first of these is
an equivalence relation in the integers, the second in the real numbers.
If we write a ≡ b instead of (a, b) ∈ W, we have
. . . −6 ≡ −3 ≡ 0 ≡ 3 ≡ 6 ≡ 9 ≡ . . .
. . . −5 ≡ −2 ≡ 1 ≡ 4 ≡ 7 ≡ 10 ≡ . . .
. . . −4 ≡ −1 ≡ 2 ≡ 5 ≡ 8 ≡ 11 ≡ . . . ,
The sets in M are called the equivalence sets associated with W. One often
writes [a] instead of W_a when working with some fixed relation W. (2:3-29)
shows that the equivalence relation W in S corresponds directly to the
identity relation in M.
Then we have:
We leave it to the student to verify this. (2:3-29) and (2:3-31) show that
we have a direct correspondence between equivalence relations and partitions.
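The passage from an equivalence relation to a partition can be watched on a finite stretch of the integers, using the relation "x − y is a multiple of 3". In this sketch the range −6, . . . , 11 and the helper name `equivalence_sets` are chosen only for illustration.

```python
def equivalence_sets(S, related):
    """Group the elements of S into the sets W_a = {b in S: related(a, b)}."""
    classes = []
    for a in S:
        cls = frozenset(b for b in S if related(a, b))
        if cls not in classes:
            classes.append(cls)
    return classes

S = range(-6, 12)                               # finite stand-in for I
mod3 = lambda a, b: (a - b) % 3 == 0            # x - y is a multiple of 3
M = equivalence_sets(S, mod3)

covers = set().union(*M) == set(S)              # every element lies in some set
disjoint = all(X.isdisjoint(Y) for X in M for Y in M if X != Y)
print(len(M), covers, disjoint)
```

The three sets obtained are nonempty, pairwise disjoint, and together exhaust S, which is what is required of a partition.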
The identity relation has other interesting mathematical properties. For
example, if a, b, c are integers and a = b then a + c = b + c and
a · c = b · c. To what extent are these properties shared by other equivalence
relations? For example, if ≡ is the relation defined above, so that
a ≡ b if and only if there is a u such that u ∈ I, with a − b = 3 · u,
we see that a ≡ b implies a + c ≡ b + c [compute (a + c) − (b + c)]
and a · c ≡ b · c [compute (a · c) − (b · c)]. For the relation W' between
reals, which we write now ≡', so that a ≡' b if and only if a − b ∈ Ra,
we have again a ≡' b implies a + c ≡' b + c, but we cannot in general
infer that a · c ≡' b · c (for 1 ≡' 0 but 1 · √2 ≢' 0 · √2). Equivalence
relations which do have such additional algebraic properties will prove
to be very useful in our development.
(2:3-33) a relation F is a function if and only if for any x, y₁, and y₂,
(x, y₁) ∈ F and (x, y₂) ∈ F implies y₁ = y₂.
or, equivalently, by
(2:3-34) (c) F is the function with domain Re such that for each x ∈ Re,
F(x) = x³ − 1.
Figure 2.21
The union of the first two graphs is just the set of all (x, y) such that
x² + y² = 1; however, that set is clearly not a function.
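The single-valuedness test for a relation (no x paired with two different y) is easy to apply to a finite set of pairs. As an illustration we take only the integer points on the circle x² + y² = 1; this restriction to a finite grid, and the name `is_function`, are assumptions of the sketch.

```python
def is_function(F):
    """(x, y1) in F and (x, y2) in F implies y1 = y2."""
    seen = {}
    for x, y in F:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

pts = range(-1, 2)
circle = {(x, y) for x in pts for y in pts if x * x + y * y == 1}
upper = {(x, y) for (x, y) in circle if y >= 0}   # upper half only
print(is_function(circle), is_function(upper))
```

The full circle fails because x = 0 is paired with both 1 and −1, while the upper half passes.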
A precise definition of what it means for F to be an implicit function
associated with a given condition 𝒜(x, y), within a preassigned domain S,
might run as follows: the domain of F is {x: x ∈ S and for some y ∈ S,
𝒜(x, y)} and for each x ∈ 𝒟(F), F(x) ∈ S and 𝒜(x, F(x)) holds. Given a
condition 𝒜(x, y) and set S, for each x ∈ S let W_x = {(x, y): y ∈ S and
𝒜(x, y)}; then let D = {x: W_x ≠ ∅}. For any x₁, x₂, if W_x₁ ≠ W_x₂
then x₁ ≠ x₂, hence W_x₁ ∩ W_x₂ = ∅. Let M be the collection of all
sets W_x for x ∈ D. By the axiom of choice there is a set F such that
F ∩ W_x contains exactly one element for each x ∈ D. Hence for each
x ∈ D there is a unique y ∈ S with (x, y) ∈ F; moreover, (x, y) ∈ W_x,
so that 𝒜(x, y). Thus there always exists at least one implicit function
associated with 𝒜(x, y) and S. The problem of implicit functions in calculus
goes deeper: in which cases can we prove the existence of at least one implicit
function satisfying additional conditions of continuity, differentiability,
etc.? It is with respect to such functions that, say, rules of “implicit
differentiation” are supposed to have significance.
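For a finite domain S the construction needs no appeal to the axiom of choice, since a definite selection rule is available. The sketch below follows the same steps (form W_x, keep the x for which it is nonempty, pick one pair from each); the rule "smallest y" and the sample condition x² + y² = 25 are our own illustrative assumptions.

```python
def implicit_function(S, cond):
    """Pick, for each x in S having some y in S with cond(x, y), one such y."""
    F = {}
    for x in S:
        W_x = [(x, y) for y in S if cond(x, y)]   # the set W_x of the text
        if W_x:                                   # x belongs to D
            F[x] = min(W_x)[1]                    # one fixed choice from W_x
    return F

S = range(-5, 6)
cond = lambda x, y: x * x + y * y == 25           # illustrative condition
F = implicit_function(S, cond)
print(F)
```

A different selection rule (say, the largest y) would give a different implicit function for the same condition, which is why uniqueness is not claimed.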
Closely related to the implicit functions are the so-called multivalued
functions. Authors who use this term often refer to the notion of function
presented here as being that of a single-valued function. For example, it
is known that with every complex number z ≠ 0 are associated exactly
two complex numbers w with w² = z. The question of distinguishing
between these two square roots of z is not as simple as in the case of real
numbers, since we cannot speak of positive or negative complex numbers.
The equation F(z) = √z does not define a function in our sense of the
word. There are two approaches to this problem in the theory of complex
numbers. One is to speak of the branches of the “function” √z, i.e., of
certain single-valued functions which together provide both square roots
for every number z. The second is to expand the notion of complex
number by the use of Riemann surfaces. For the square-root function,
in place of a single complex number z ≠ 0 there will now be two numbers
z₁, z₂ on the associated surface, and one single-valued function F such
that F(z₁) is one square root of z and F(z₂) is the other. In this book
we shall always use the word “function” in its single-valued sense, i.e.,
according to (2:3-32) or (2:3-33), and we shall treat situations which
lead to multivalued functions in terms of these.
A ternary relation, i.e., a set of ordered triples ((x, y), z), can also be
a function. The condition for F to be such is given by (2:3-33): for any
x, y, z₁, z₂, if ((x, y), z₁) ∈ F and ((x, y), z₂) ∈ F then z₁ = z₂. According
to our definitions, the unique z associated with (x, y), if there is any,
is denoted by F((x, y)); for simplicity, we shall instead denote it by
F(x, y). If a function F is a ternary relation we shall call it a binary function
(function of two arguments or variables). Although it is not necessary
to qualify it in this way, we can refer to an arbitrary function (which is
not otherwise specified to be binary) as being a unary function (function
of one variable). Functions of more than two variables can be treated
similarly, so that for any specified positive integer n, we can talk of
functions of n variables, or, simply, of n-ary functions. The clearest way
to express that a function is, say, binary is to describe its domain, for
example 𝒟(F) = S × S.
In algebra it is customary to use the word operation instead of function,
but these have exactly the same meaning. Thus when we speak of the
operation of multiplication on real numbers we mean the function F, with
domain Re × Re, such that for all a, b ∈ Re, F(a, b) = a · b. An operation
is called unary, binary, etc., under the same conditions as a function.
Thus multiplication is a binary operation.
and b₁ = b₂ then F(a₁, b₁) = F(a₂, b₂) whenever (a₁, b₁) ∈ 𝒟(F); for
therefore F(a₁, b₁) is a z for which ((a₂, b₂), z) ∈ F, but there is only
one such z, which we have called F(a₂, b₂). Similar statements hold for
functions with other numbers of arguments. On the other hand, if ≡ is
an equivalence relation in a set S closed under an operation F, we have
seen that it need not be true that if a₁ ≡ a₂ and b₁ ≡ b₂ then F(a₁, b₁) ≡
F(a₂, b₂). The cases in which this is true are of special interest:
We need only see that the relation consisting of all triples (([a], [b]),
[F(a, b)]) is a function; if ([a'], [b']) = ([a], [b]) then a ≡ a', b ≡ b' and
hence F(a, b) ≡ F(a', b'), i.e., [F(a, b)] = [F(a', b')]. Consider, for example,
the equivalence relation in the integers, W = {(x, y): x, y ∈ I and
x − y is a multiple of 3}. We have three equivalence sets, [0], [1], [2].
It is easily seen that the operation F(a, b) = a + b is well defined with
respect to this equivalence relation. For this we have by (2:3-37) an associated
operation F(X, Y) on equivalence sets, which we denote by
X ⊕ Y. Then it can be seen that [0] ⊕ [1] = [1], [1] ⊕ [2] = [3] = [0],
[2] ⊕ [2] = [4] = [1], etc. More compactly:
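The values of ⊕ on the three equivalence sets can be tabulated mechanically. In the sketch below each equivalence set [a] is represented by the remainder of a on division by 3 (a representation we adopt for convenience), and the well-definedness of addition is spot-checked on a finite sample rather than proved.

```python
def cls(a):
    """Representative in {0, 1, 2} of the equivalence set [a]."""
    return a % 3

def well_defined(op, sample=range(-9, 10)):
    """Check on the sample: a ~ a' and b ~ b' imply op(a, b) ~ op(a', b')."""
    return all(cls(op(a, b)) == cls(op(a2, b2))
               for a in sample for a2 in sample if cls(a) == cls(a2)
               for b in sample for b2 in sample if cls(b) == cls(b2))

add = lambda a, b: a + b
table = {(x, y): cls(add(x, y)) for x in range(3) for y in range(3)}

print(well_defined(add))
for x in range(3):
    print([table[(x, y)] for y in range(3)])
```

Row x, column y of the printed table gives the representative of [x] ⊕ [y].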
For F⁻¹(b) is the unique x such that (b, x) ∈ F⁻¹, i.e., such that F(x) = b;
hence F⁻¹(F(a)) is the unique x such that F(x) = F(a), i.e., is a itself.
The notion of one-to-one function applies directly to functions of more
than one argument, for then we merely view the domain as a set of ordered
pairs, triples, etc. For example, the function G(x, y) = x + y with domain
P × P is not one-to-one, while the function H(x, y) = 2^x · 3^y is. (Why?)
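Both claims can be tested on a finite portion of P × P. This is only a spot check under our own choice of a 12 × 12 grid; the general fact for H rests on unique factorization into primes, which the code does not prove.

```python
def is_one_to_one(f, domain):
    """True if no two distinct arguments in domain share a value under f."""
    seen = {}
    for arg in domain:
        val = f(*arg)
        if val in seen and seen[val] != arg:
            return False
        seen[val] = arg
    return True

grid = [(x, y) for x in range(1, 13) for y in range(1, 13)]  # finite part of P x P
G = lambda x, y: x + y
H = lambda x, y: 2 ** x * 3 ** y
print(is_one_to_one(G, grid), is_one_to_one(H, grid))
```

G already fails on the pairs (1, 2) and (2, 1), which have the same sum.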
The composition F; G of two functions F, G is the set of all ordered
pairs (x, y) such that for some z, (x, z) ∈ F and (z, y) ∈ G. Given x, y,
if there is any such z then x ∈ 𝒟(F) and z must be F(x); then (z, y) ∈ G
implies z ∈ 𝒟(G), i.e., F(x) ∈ 𝒟(G), and y = G(z), i.e., y = G(F(x)).
Hence it is seen that
(S, F₁, F₂, . . . , F_k, W₁, W₂, . . . , W_l, a₁, a₂, . . . , a_m)
and
(S', F'₁, F'₂, . . . , F'_k', W'₁, W'₂, . . . , W'_l', a'₁, a'₂, . . . , a'_m')
As examples, (Re, <) and (Re, +) are not of the same type, (Re, ·) and
(Re, Sq) are not of the same type, where Sq(x) = x², while (Re, +, ·, √2)
and (Re', ·, +, −3) are of the same type.
Now the algebraic indistinguishability of two systems can be explained
as follows.
where Re' = {x: x ∈ Re and x > 0}. Here the symbol + is used to denote
the binary operation F(x, y) = x + y on Re × Re, and · denotes the
operation F'(x, y) = x · y on Re' × Re'. On the right side, < denotes the
relation < restricted to elements of Re'. A suitable one-to-one function
which will establish the isomorphism is G(x) = 2^x for x ∈ Re. To verify
that this works, we would have to show that:
(i) 𝒟(G) = Re and ℛ(G) = Re' (i.e., for each y ∈ Re', with y > 0,
there is an x ∈ Re with 2^x = y), and G is one-to-one (i.e., if
2^x₁ = 2^x₂ then x₁ = x₂);
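A numerical spot check can make the role of G(x) = 2^x more tangible. In this sketch we assume that the remaining requirements are that G carries + to · and preserves <, i.e., G(x + y) = G(x) · G(y) and x < y exactly when G(x) < G(y); the sample values are arbitrary, and floating-point comparison is done with a tolerance.

```python
import math

def G(x):
    """The proposed isomorphism x -> 2^x from the reals to the positive reals."""
    return 2.0 ** x

samples = [-3.5, -1.0, 0.0, 0.25, 2.0, 7.5]   # illustrative reals only

# assumed requirement: G(x + y) = G(x) * G(y)
sum_to_product = all(math.isclose(G(x + y), G(x) * G(y))
                     for x in samples for y in samples)
# assumed requirement: x < y if and only if G(x) < G(y)
order_preserved = all((x < y) == (G(x) < G(y))
                      for x in samples for y in samples)
# log2 recovers x from G(x), so distinct samples get distinct values
inverse_ok = all(math.isclose(math.log2(G(x)), x, abs_tol=1e-12)
                 for x in samples)
print(sum_to_product, order_preserved, inverse_ok)
```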
2.4] MATHEMATICAL SYSTEMS OF RELATIONS AND FUNCTIONS 57
(2:4-4) the statement that (S) is isomorphic to (S') reduces to the statement
that there is a one-to-one correspondence G between S and S'.
If this holds, we say that S and S' are set-theoretically equivalent
or equinumerous.
The latter term is used because we can pair off the elements a, b, . . . of
S with the elements a', b', . . . of S' by the rule a' = G(a), b' = G(b), . . . .
This is just an abstract version of the most primitive form of counting
(to see how many sheep one has, tie each to a tree). However, if we say
that two sets have the same number of elements whenever they are
set-theoretically equivalent, we open the way for some apparent paradoxes.
For note that the sets S = {1, 2, 3, . . .} (= P) and S' = {2, 4, 6, . . .}
are set-theoretically equivalent by the function G(x) = 2x. Thus a set
can have the same number of elements as a proper subset of itself. However,
there is no real contradiction here unless we should also try to
insist that a set S cannot have the same number of elements as any proper
subset S' of itself, as we know to be the case with finite sets S. In fact,
we see now, from these intuitive judgments, how to give purely
set-theoretical definitions of the notions of finiteness and infinity:
It is also often said that S and S' have the same cardinal number if they
are set-theoretically equivalent. Note that this does not say what a
cardinal number is; it merely defines a relation between sets, namely,
being of the same cardinal number. Now it turns out that if we could
form a set
(2:4-8) Suppose that (S, F₁, F₂, . . . , F_k, W₁, W₂, . . . , W_l, a₁, a₂, . . . ,
a_m) and (S', F'₁, F'₂, . . . , F'_k, W'₁, W'₂, . . . , W'_l, a'₁, a'₂, . . . , a'_m)
are systems of the same type. We say that the second system is
a subsystem of the first, or that the first is an extension of the
second, if the following conditions hold:
(i) S' ⊆ S;
(ii) a'₁ = a₁, a'₂ = a₂, . . . , a'_m = a_m;
It is very common to use the same symbols for the operations, relations,
and constants of two systems when one knows or wishes to indicate that
one is a subsystem of another. Thus, for example, we write (Re', +, ·),
(Re, +, ·). This implicitly involves a statement that the + in the first
case denotes a binary operation F' whose domain is Re' × Re' and whose
range is contained in Re', such that for any x, y ∈ Re', F'(x, y) = x + y,
where + is the operation given in the second system; similarly, for the use
of · . Among other things, implicit in this is the fact that for the operation
+ given on Re, we have x + y ∈ Re' whenever x, y ∈ Re', i.e., that Re'
is closed under the operation + from the larger system. In contrast,
starting with the system (Re, −), we could not speak of a subsystem
(Re', −), despite the fact that Re' ⊆ Re, since Re' is not closed under
subtraction and hence (Re', −) does not even form a mathematical system.
Because of the conditions (iii), (iv) of our definition (2:4-8), such
ambiguity in denoting distinct operations and relations will not in general
lead to confusion. The only case where we must be careful is when we are
dealing with more than one extension of the same system, where the
extensions themselves may not be related to each other by ⊆. In such
cases we may continue to use the same symbols for one extension, while
introducing new symbols for the second, e.g. (Re', +, ·), (Re, +, ·),
(S, ⊕, ∘). Here the fact that the first system is intended to be a subsystem
of the third is given by the statement: Re' ⊆ S and for all
x, y ∈ Re', x + y = x ⊕ y and x · y = x ∘ y; while, by this symbolism,
nothing need be said to indicate that the first is a subsystem of the second.
Systems related by the conditions of (2:4-8) share many interesting
algebraic properties. For example, if F₁ is commutative, i.e., F₁(x, y) =
F₁(y, x) for all x, y ∈ S, then so is F'₁: F'₁(x, y) = F'₁(y, x) for all x, y ∈ S'.
On the other hand, and in contrast to the case of ≅, they do not in general
share all algebraic properties. For example, in (Re, +) we know that there
is an x (∈ Re) such that for all y (∈ Re), x + y = y, namely x = 0.
But in (Re', +) the corresponding statement is false, i.e., there is no
x (∈ Re') such that for all y (∈ Re'), x + y = y. There are cases where
we have a proper subsystem which is, at the same time, isomorphic to
the extended system; for example, (E, +) with (I, +), where E is the set
of even integers. However, such cases are rarely met in algebra. We shall
have more to say later concerning the connection between properties of
a system and some subsystems.
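The closure requirement behind the notion of subsystem, and the loss of the identity element in passing from (Re, +) to (Re', +), can be illustrated on a few sample values. The sketch uses exact rational numbers from Python's `fractions` module as stand-ins for reals; a finite sample can refute closure (as for subtraction) but can only suggest it (as for addition).

```python
from fractions import Fraction

def closed_on_sample(member, op, sample):
    """Check that op(x, y) stays in the subset for all sample pairs x, y."""
    return all(member(op(x, y)) for x in sample for y in sample)

positive = lambda x: x > 0                       # membership test for Re'
sample = [Fraction(1, 3), Fraction(1), Fraction(5, 2), Fraction(7)]

closed_add = closed_on_sample(positive, lambda x, y: x + y, sample)
closed_sub = closed_on_sample(positive, lambda x, y: x - y, sample)
has_identity = any(all(x + y == y for y in sample) for x in sample)
print(closed_add, closed_sub, has_identity)
```

Subtraction fails at once (for instance 1/3 − 1 is not positive), and no positive sample value acts as an identity for addition.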
We shall conclude this section with a general result on isomorphism and
extension systems which will have several applications in our work to come.
The situation usually faced in these applications is the following. We have
a system
(Ŝ, . . .) = (Ŝ, F̂₁, F̂₂, . . . , F̂_k, Ŵ₁, Ŵ₂, . . . , Ŵ_l, â₁, â₂, . . . , â_m)
Figure 2.22
(2:4-9) Suppose that (Ŝ, . . .), (S, . . .), and (S*, . . .) are all mathematical
systems of the same type and that (Ŝ, . . .) ≅ (S, . . .) and
(S, . . .) is a subsystem of (S*, . . .). Then we can find a system
(Ŝ*, . . .) such that (Ŝ*, . . .) ≅ (S*, . . .) and (Ŝ, . . .) is a subsystem
of (Ŝ*, . . .).
Figure 2.23
(5) S ⊆ S*;
(6) a = a*;
(7) (u, v) ∈ W if and only if (u, v) ∈ W*, for all u, v ∈ S;
(8) F(u, v) = F*(u, v) for all u, v ∈ S.
The first step in finding the required system (Ŝ*, F̂*, Ŵ*, â*) is to find
a set Ŝ* and a function H such that the following conditions hold:
(9) Ŝ ⊆ Ŝ*;
(10) H is one-to-one, with 𝒟(H) = Ŝ* and ℛ(H) = S*;
(11) H(x) = G(x) for all x ∈ Ŝ.
(12) â* = â.
Then to see, for x, y ∈ Ŝ*, whether to put (x, y) in Ŵ*, we first determine
whether the corresponding pair (H(x), H(y)) belongs to W*, i.e., we define
(13) (x, y) ∈ Ŵ* if and only if (H(x), H(y)) ∈ W*, for all x, y ∈ Ŝ*.
Finally, to find, for x, y ∈ Ŝ*, what value z to ascribe to F̂*(x, y), we first
see what value w is given to F*(H(x), H(y)) and then choose z so that w
corresponds to it under H. In other words, we define
It follows that
Finally, we have
by (4) and (11). But by (1) and (11), H(x), H(y) ∈ S, so that
by (8). Hence
H(F̂(x, y)) = F*(H(x), H(y)),
and
Thus F̂(x, y) = F̂*(x, y) by the definition (14) of F̂*. The proof of the
condition on Ŵ, Ŵ* makes similar use of (1) through (8), (11), and (13).
Since â* is chosen equal to â by (12), this completes the proof.
CHAPTER 3
3.1 Basic properties. The positive integers have two basic uses, counting
and ordering. The simplest concrete representatives of these numbers
are the series of tallies
Peano systems and inductive proofs. We are now led to consider systems
with these three basic properties.
3.1] BASIC PROPERTIES 65
Other than 3.2 and the axioms of set theory referred to in Chapter 2, no
other basic assumptions will be needed in this book.
One of our first main objects will be to show that there is essentially
only one Peano system, i.e., that any two Peano systems (P, Sc, 1) and
(P', Sc', 1') are isomorphic. Let us first see that this would not be the case
if any one of the conditions 3.1(i)-(iii) were omitted. The argument here
is informal.
The simplest system which satisfies conditions 3.1(ii) and (iii) but not
3.1(i) consists of a single element, which we denote by 1'. We put P' = {1'}
and Sc'(1') = 1'. A system which satisfies 3.1(i) and (iii) but not 3.1(ii)
must have at least two different elements. Here let 1', 2' be any distinct
objects, set P' = {1', 2'}, Sc'(1') = 2' and Sc'(2') = 2'. Finally, to construct
a system which satisfies 3.1(i) and (ii) but not 3.1(iii), it is seen
that we must take a domain with infinitely many objects. For in this
domain we must have at least the elements 1', Sc'(1'), Sc'(Sc'(1')), . . . ;
by 3.1(i), each of the elements past 1' must be distinct from 1'. Further,
we have by 3.1(ii) that if Sc'(Sc'(1')) = Sc'(1'), then Sc'(1') = 1', contradicting
3.1(i), so Sc'(Sc'(1')) is also distinct from Sc'(1'). Similarly it
is seen that each of the elements in this sequence is distinct from all other
* Although the study of positive integers is ancient and the axiomatic method
itself is a couple of thousand years old, an explicit axiomatic treatment of the
positive integers dates only to the late nineteenth century, beginning with the
work of the mathematicians R. Dedekind and G. Peano. The conditions set
down in 3.1 correspond directly to the axioms given by Peano.
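The two finite systems used to show that conditions 3.1(i) and 3.1(ii) cannot be omitted can be checked mechanically. The sketch below rests on our reading of the three conditions, inferred from how they are used in the discussion: (i) 1 is not of the form Sc(x); (ii) Sc is one-to-one; (iii) the only subset of P containing 1 and closed under Sc is P itself. The function name and the dictionary encoding of Sc are hypothetical.

```python
def peano_conditions(P, Sc, one):
    """Return (cond_i, cond_ii, cond_iii) for a finite system (P, Sc, one)."""
    no_pred_of_one = all(Sc[x] != one for x in P)     # reading of 3.1(i)
    injective = len({Sc[x] for x in P}) == len(P)     # reading of 3.1(ii)
    reach, frontier = {one}, [one]                    # closure of {1} under Sc
    while frontier:
        nxt = Sc[frontier.pop()]
        if nxt not in reach:
            reach.add(nxt)
            frontier.append(nxt)
    induction = reach == set(P)                       # reading of 3.1(iii)
    return no_pred_of_one, injective, induction

one_point = peano_conditions({"1'"}, {"1'": "1'"}, "1'")
two_point = peano_conditions({"1'", "2'"}, {"1'": "2'", "2'": "2'"}, "1'")
print(one_point, two_point)
```

Condition (iii) is tested through the smallest subset containing 1 and closed under Sc, which equals P exactly when induction holds. No finite system can pass all three tests, in line with the remark that a domain with infinitely many objects is required.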
66 THE POSITIVE INTEGERS [CHAP. 3
3.3 Theorem. Let (P, Sc, 1) be a Peano system. Then for any x ∈ P,
either x = 1, or there is a y ∈ P with x = Sc(y); moreover, in the
latter case, y is unique.
(3:1-1) 2^1 = 2,
2^Sc(x) = 2^x · 2, for all x ∈ P,
would completely define 2^a for any a ∈ P. Note that these rules do not
constitute an explicit definition of 2^a, but only provide us with a systematic
procedure to calculate 2^a. Intuitively, we must get a unique value from
this calculation for any a. This can be put in more general terms as follows.
We suppose that we have a known function G available, in this case one
which gives us multiplication by 2, G(x) = x · 2 for any x ∈ P. We wish
to find a function F(x) (= 2^x) which has a given value c at x = 1, in this
case c = 2. Further, F(Sc(x)) is related to F(x) for any x ∈ P by the use
of the function G, F(Sc(x)) = F(x) · 2 = G(F(x)). The question is
then, given any number c ∈ P, and any function G on P with values in
P, does there exist a function F with domain P such that:
(3:1-2) F(1) = c,
F(Sc(x)) = G(F(x)) for all x ∈ P?
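The calculating procedure described here (start from the value c at 1 and apply G once for each step of the successor) can be sketched directly. In this illustration the positive integers are modeled by Python integers with Sc(x) = x + 1, which is an assumption of the sketch and not part of the axiomatic development; the name `define_by_recursion` is ours.

```python
def define_by_recursion(c, G):
    """Return F with F(1) = c and F(Sc(x)) = G(F(x)), where Sc(x) = x + 1."""
    def F(x):
        value = c
        for _ in range(x - 1):      # one application of G per successor step
            value = G(value)
        return value
    return F

power_of_two = define_by_recursion(2, lambda v: v * 2)   # c = 2, G(v) = v * 2
print([power_of_two(x) for x in range(1, 7)])
```

The code presupposes what the theorem has to establish, namely that the procedure yields exactly one value for every argument.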
Proof. The proof falls into two parts; we first show that there exists
at least one function F satisfying the conditions (i)-(iii), and we then
show that any two such functions are identical.
[The latter is just another way of expressing (iii), that if F(x) = y then
F(Sc(x)) = G(y).] However, there are many relations which satisfy this
condition; one such is P × S. What distinguishes the desired function
from all these other relations is that we want (a, b) to be in it only as
required by (ii)', (iii)'. In other words, it is to be the smallest relation
satisfying (ii)', (iii)'. This can be described precisely as follows:
We shall now show that we can derive from (1) that F is also one of the
relations in M.
(3) (1, c) ∈ F.
This follows immediately from the definition of ⋂ and the fact that
(1, c) ∈ W for all W ∈ M.
(6) 1 ∈ A.
Suppose that x ∈ A, so that whenever (x, z₁) ∈ F and (x, z₂) ∈ F then
z₁ = z₂. We must show that whenever (Sc(x), w₁) ∈ F and (Sc(x), w₂) ∈ F
then w₁ = w₂. To prove this, it suffices to show that
(8) if (Sc(x), w) ∈ F then there is a z with (x, z) ∈ F and w = G(z).
For if (8) is true, we would have for the given w₁, w₂ some z₁, z₂ with
w₁ = G(z₁), w₂ = G(z₂), (x, z₁) ∈ F and (x, z₂) ∈ F. Then, since
x ∈ A, z₁ = z₂ and hence G(z₁) = G(z₂), that is, w₁ = w₂. Now to
prove (8) suppose, to the contrary, that it is not true; in other words,
suppose that we have some w with (Sc(x), w) ∈ F but such that for all
z for which (x, z) ∈ F we have w ≠ G(z). Consider the relation W =
F − {(Sc(x), w)}. We shall show that W ∈ M. First of all (1, c) ∈ F
and (1, c) ≠ (Sc(x), w); hence (1, c) ∈ W. Suppose that (u, y) ∈ W;
then (u, y) ∈ F and (Sc(u), G(y)) ∈ F. Clearly if u ≠ x then (Sc(u),
G(y)) ≠ (Sc(x), w) [by 3.1(ii)], so that in this case (Sc(u), G(y)) ∈ W.
On the other hand, if u = x and (Sc(u), G(y)) = (Sc(x), w), then
w = G(y), where (x, y) ∈ F, contrary to the choice of w; hence (Sc(u),
G(y)) ≠ (Sc(x), w), so again (Sc(u), G(y)) ∈ W. Thus whenever
(u, y) ∈ W, also (Sc(u), G(y)) ∈ W. Now that we have shown W ∈ M
we see by (2) that F ⊆ W; but this is false since (Sc(x), w) ∈ F and
(Sc(x), w) ∉ W. Thus our hypothesis that (8) is incorrect has led to a
contradiction, and now (8) is proved. Since (7) follows from (8), we have
by induction from (6) that A = P. Hence
(9) F is a function.
We have still to prove that F satisfies condition (i); we must show that
for each x ∈ P there is a y with (x, y) ∈ F. Since F ⊆ P × S, it will
then follow that 𝒟(F) = P and ℛ(F) ⊆ S. Let B = 𝒟(F), that is,
Part 2. It is much easier to prove that there cannot be more than one
such function. Suppose that F₁, F₂ both satisfy the conditions (i)-(iii);
we wish to show F₁ = F₂, i.e., that for all x ∈ P, F₁(x) = F₂(x). This
is proved by induction on x. By (ii), F₁(1) = c and F₂(1) = c, so
F₁(1) = F₂(1). Suppose that F₁(x) = F₂(x); then F₁(Sc(x)) = G(F₁(x))
and F₂(Sc(x)) = G(F₂(x)), so
F₁(Sc(x)) = F₂(Sc(x)).
3.4' Theorem. Let (P, Sc, 1) be a Peano system. Suppose that S is a set,
c ∈ S, and that G is a binary function with 𝒟(G) = P × S and
ℛ(G) ⊆ S. Then there is a unique function F satisfying the following
conditions:
Proof. Suppose that (P, Sc, 1) and (P', Sc', 1') are Peano systems.
We want to find a function F with the following properties:
(1) F is one-to-one;
(2) 𝒟(F) = P and ℛ(F) = P';
(3) F(1) = 1';
(4) for any x ∈ P, F(Sc(x)) = Sc'(F(x)).
[That these are the requirements for ≅ is seen by taking F for G in (2:4-3).]
Apply 3.4 with P' as S, 1' as c, and Sc' as G. Then the function F so
obtained already satisfies (3) and (4) and the first part of (2). Further,
ℛ(F) ⊆ P'. To prove that ℛ(F) = P', we must show that for any y ∈ P'
there exists an x ∈ P with F(x) = y. Let
We shall show that A = P' by induction in P' [i.e., we apply 3.1(iii) to
(P', Sc', 1')]. Clearly 1' ∈ A. Suppose that y ∈ A, so that y = F(x),
where x ∈ P. Then Sc'(y) = Sc'(F(x)) = F(Sc(x)) by (4), and therefore
also Sc'(y) ∈ A. Thus the induction, and hence (2), is proved. Now let
3.6 Convention. Throughout the following we shall assume that (P, Sc, 1)
is some fixed Peano system. We shall call P the set of positive integers.
Thus when we use the symbols P, Sc, 1 in the following, we can omit the
explicit statement that these form a Peano system.
For the purpose of this set of exercises, we temporarily put ourselves prior to
the proof of 3.4. Thus assume below that (P, Sc, 1) is any Peano system.
Prove:
(e) For each x ∈ P there is at least one suitable partial function for [1, x].
(f) For each x ∈ P there is at most one suitable partial function for [1, x].
3.2] THE ARITHMETIC OF POSITIVE INTEGERS 73
By (e), (f) there is for each x ∈ P a unique suitable partial function for
[1, x]; denote this function by H_x. Prove:
(g) For any x ∈ P and y ∈ [1, x], H_{Sc(x)}(y) = H_x(y).
Finally, define a function F with domain P such that F(x) = H_x(x) for
each x ∈ P. Then prove that
(h) F is a function satisfying 3.4(i)-(iii).
(3:2-1) x + 1 = Sc(x),
x + Sc(y) = Sc(x + y).
(3:2-3) x · 1 = x,
x · Sc(y) = (x · y) + x,
(3:2-4) F₂(x, 1) = x,
F₂(x, Sc(y)) = F₂(x, y) + x.
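The recursion equations (3:2-1) and (3:2-3) can be traced mechanically. The following is only a sketch: it models the Peano system by Python's integers ≥ 1, and the names sc, pred, add, and mul are chosen here for illustration and do not come from the text.

```python
# Illustrative model (an assumption): P is the Python integers >= 1.
ONE = 1

def sc(x):
    """Successor operation Sc."""
    return x + 1

def pred(y):
    """Hypothetical helper: the unique w with Sc(w) = y, for y other than 1."""
    if y == ONE:
        raise ValueError("1 is not a successor")
    return y - 1

def add(x, y):
    """(3:2-1): x + 1 = Sc(x);  x + Sc(y) = Sc(x + y)."""
    if y == ONE:
        return sc(x)
    return sc(add(x, pred(y)))

def mul(x, y):
    """(3:2-3): x * 1 = x;  x * Sc(y) = (x * y) + x."""
    if y == ONE:
        return x
    return add(mul(x, pred(y)), x)
```

Each call unwinds the second argument down to 1, exactly as the recursion on y prescribes; the built-in + appears only inside sc.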
(1) c_x = H(x),
Then define
(5) for all x, y ∈ P, F(x, y) = F_x(y), where F_x is the unique function
satisfying (4).
Similarly, for each fixed positive integer m, we can prove the existence
and uniqueness of functions of m + 1 variables satisfying conditions
such as:
(iii)' for all x₁, x₂, ..., x_m, y ∈ P, F(x₁, x₂, ..., x_m, Sc(y))
= G(x₁, x₂, ..., x_m, y, F(x₁, x₂, ..., x_m, y)).
x + (y + z) = (x + y) + z.
Then
(2) 1 ∈ A,
since
x + (y + 1) = x + Sc(y) by 3.8(ii)
= Sc(x + y) by 3.8(iii)
= (x + y) + 1 by 3.8(ii).
Also
Both proofs are equally acceptable, but sometimes a proof with parameters
will not work while the other proof will (cf. again the proofs of 3.4, 3.5).
The student can easily justify the separate steps in the following proofs
in the same style as in 3.9.
Proof. Let
Obviously
(2) 1 ∈ A.
Also
Sc(x) + 1 = (x + 1) + 1 = (1 + x) + 1
= 1 + (x + 1) by 3.9
= 1 + Sc(x).
Hence A = P.
x + y = y + x.
Then
(2) 1 ∈ A
by 3.10. Also
if x + z = y + z then x = y.
(i) x = y;
(ii) for some u ∈ P, x = y + u;
(iii) for some v ∈ P, y = x + v.
Sc(x), so that case (i) holds for Sc(x), y. Otherwise, v = Sc(w) for some
w; then
so that in this case, (iii) holds for Sc(x), y. This proves (3) and the
induction is complete.
3.16 Theorem. (Left distributive law for · over +) For any x, y, z ∈ P,
x · (y + z) = (x · y) + (x · z).
(1) x · (y + 1) = (x · y) + (x · 1).
(2) If x · (y + z) = (x · y) + (x · z)
then
x · (y + Sc(z)) = (x · y) + (x · Sc(z)).
This is easily converted into an inductive proof of the form 3.1(iii).
(x + y) · z = (x · z) + (y · z).
(1) (x + y) · 1 = (x · 1) + (y · 1).
(2) If (x + y) · z = (x · z) + (y · z)
then
x • y = y • x.
(1) x • 1 = 1 • x.
x · (y · z) = (x · y) · z.
if x · z = y · z then x = y.
x · z = (y + u) · z = (y · z) + (u · z)
(i) 1^y = 1,
(ii) x^y · x^z = x^(y+z),
(iii) (x^y)^z = x^(y·z),
(iv) (x · y)^z = x^z · y^z.
Proof. We prove only (ii), leaving the others for the student. The proof
is by induction on z.
(1) x^y · x^1 = x^(y+1),
1! = 1,
(Sc(x))! = Sc(x) · (x!).
This is a unary function whose existence can be derived either from the
existence of a certain binary function in 3.7 (which will be F(x, y) = y!)
or directly from 3.4 when we take S = P.
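The factorial and the powers are both instances of one recursion scheme. As a rough sketch, we may write that scheme once, in the form F(1) = c, F(Sc(n)) = G(n, F(n)) of Theorem 3.4', and obtain the two functions by choosing c and G. The names recurse, factorial, and power are illustrative, and P is modeled by Python integers ≥ 1.

```python
def recurse(c, G):
    """Return the function F with F(1) = c and F(n + 1) = G(n, F(n))."""
    def F(n):
        if n < 1:
            raise ValueError("F is defined on positive integers only")
        value = c
        for k in range(1, n):
            value = G(k, value)   # passes from F(k) to F(k + 1)
        return value
    return F

# 1! = 1,  (Sc(n))! = Sc(n) * (n!)
factorial = recurse(1, lambda n, z: (n + 1) * z)

def power(x):
    """y -> x^y, from x^1 = x and x^(y + 1) = x^y * x."""
    return recurse(x, lambda n, z: z * x)
```

With these definitions, a law such as x^y · x^z = x^(y+z) can be spot-checked for small arguments, although only the inductive proof establishes it for all of P.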
2. Check that Theorem 3.16 was not needed in the proofs of 3.17-3.19.
Give a direct proof of 3.16 from these latter theorems.
3. Prove Theorem 3.20.
4. Prove Theorem 3.23(iii).
3.3 Order. The basic notion which led to our formulation of Peano
systems, and hence to the characterization of the positive integers, was
that of a immediately preceding b, or equivalently of b immediately
succeeding a, when b = Sc(a). Generally, we should say that a precedes b
(a < b), or equivalently, that b follows a (b > a), if b succeeds a by a
number of steps, i.e., if starting with a we will eventually reach b by
forming Sc(a), Sc(Sc(a)), Sc(Sc(Sc(a))), ... Using our usual notations
2 = 1 + 1, 3 = 2 + 1, etc., we have
Sc(a) = a + 1;
Sc(Sc(a)) = Sc(a) + 1 = (a + 1) + 1 = a + (1 + 1) = a + 2;
(i) (Trichotomy law for <). Exactly one of the three cases
x < y, x = y, y < x
is true.
(ii) (Transitive law for <). If x < y and y < z then x < z.
Proof, (i) follows directly from 3.14 and the definition 3.24. (ii) follows
from the fact that if y = x + v, z = y + w, where v, w ∈ P, then
z = (x + v) + w = x + (v + w), and v + w ∈ P.
We shall write x < y < z as an abbreviation for “x < y and y < z,”
x ≤ y < z as an abbreviation for “x ≤ y and y < z”; similarly for
x < y ≤ z and x ≤ y ≤ z. We do not generally write such expressions
as x < y > z, although a suitable convention could be made about this.
We write x ≮ y for “not x < y,” which is equivalent to y ≤ x by trichotomy;
similarly, we write x ≰ y for “not x ≤ y,” which is equivalent to
y < x.
(3:3-1) 2, 4, 6, . . . , 1, 3, 5, . . . ;
it is seen that the transitive law holds for <, but not the trichotomy law
(since none of 2 < 1,2 = 1,1 < 2 holds). For an example in which the
trichotomy law holds but not the transitive law, we turn to a set with three
elements, S= {1,2,3}. Let 1 <' 2, 2 <' 3, 3 <' 1, and let <' hold
in no other cases. The student can use this as a basis to construct such a
relation in the set of all positive integers.
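The three-element example can be checked exhaustively. The sketch below encodes the relation <' as a set of pairs and tests the two laws directly; the function names are chosen here for illustration.

```python
from itertools import product

S = {1, 2, 3}
LESS = {(1, 2), (2, 3), (3, 1)}   # 1 <' 2, 2 <' 3, 3 <' 1, and no other cases

def trichotomy(S, R):
    """Exactly one of x R y, x = y, y R x holds for every x, y in S."""
    return all(
        [(x, y) in R, x == y, (y, x) in R].count(True) == 1
        for x, y in product(S, repeat=2)
    )

def transitive(S, R):
    """Whenever x R y and y R z, also x R z."""
    return all(
        (x, z) in R
        for x, y, z in product(S, repeat=3)
        if (x, y) in R and (y, z) in R
    )
```

Here trichotomy(S, LESS) holds, while transitive(S, LESS) fails because 1 <' 2 and 2 <' 3 but not 1 <' 3.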
In the next chapter we shall turn to the set of all integers, which has the
natural ordering
(It is easy to check that this relation actually provides a simple ordering
of P.) One characteristic difference between this ordering and the natural
ordering is that there is no first element in the ordering -<
3.28 Definition. Let (S, <) be a simply ordered system and let A ⊆ S.
An element c is said to be a first or least element of A if c ∈ A and if
for all x ∈ A with x ≠ c we have c < x. The element c is said to be
a last or largest element of A if c ∈ A and if for all x ∈ A with x ≠ c
we have x < c.
3.29 Lemma. Suppose that (S, <) is a simply ordered system and A ⊆ S.
If A has a first element then it is unique; similarly for last elements.
Proof. Suppose that both c, c' are first elements of A and suppose,
contrary to the desired conclusion, that c ≠ c'. By the trichotomy law, either
c' < c or c < c', but not both. Suppose first that c' < c. Since c is a
first element of A and c' ≠ c we have c < c', which gives us a
contradiction. Similarly c < c' would lead to a contradiction. Hence we must have
c = c'.
Thus if a set A has at least one first element, we can speak of the first
element of A. It may be, however, that A has no first element at all; such
is the case with the set P in the ordering of (3:3-5).
Well-ordered systems.
Thus the system (P, <) of (3:3-5) is not well-ordered. On the other hand,
we shall show that (P, <) is a well-ordered system. [So also is the system
(P, <) of (3:3-2); cf. the exercises.] In order to do this, we must first
establish some properties which connect < with 1 and Sc. These properties
may be thought of as the recursive characterization of < in P, i.e., they
show under what conditions x < 1 holds and under what conditions
x < Sc (y) holds. These conditions are given in the first two parts of the
next theorem. The remaining parts are simple consequences of these
conditions.
(i) x ≮ 1;
(ii) x < Sc(y) if and only if x ≤ y;
(iii) 1 ≤ x;
(iv) Sc(y) ≤ x if and only if y < x;
(v) y < Sc(y) and there is no z ∈ P with y < z < Sc(y).
Proof. We have already established (3.25) that (P, <) is simply ordered.
Consider any set A ⊆ P with A ≠ ∅, but suppose, to the contrary, that
A has no first element. We shall show that this leads to a contradiction.
Let
Then
(2) Pci.
(3) 1 ∈ B.
(5) P c A.
(3:3-6) 2, 3, . . . , 1.
Thus for any well-ordered system (S, <) we can introduce an operation
Sc' whose domain is {x: x ∈ S and for some w ∈ S, x < w} such that for
any x in its domain Sc'(x) is the direct successor of x. If we denote by 1'
the first element of S, we see that properties corresponding to the recursive
description of < in 3.31 (i), (ii) hold in any well-ordered system. We
need add only two simple properties now to characterize the positive in¬
tegers through its ordering.
3.35 Theorem. Suppose that P' ≠ ∅, and that (P', <') is a system with
the following properties:
(iii) every element x of P', other than the first element 1' of P', has a
direct predecessor.
For if Sc'(x) = 1', 1' is a successor of x, that is, x <' 1', contrary to the
choice of 1' as the first element of P'.
(3) If A ⊆ P' and 1' ∈ A and A has the property that whenever
x ∈ A then Sc'(x) ∈ A, then A = P'.
3.36 Theorem. The system (P, <) satisfies 3.35(i)-(iii), with 1 being the
first element of P under < and Sc(x) being the direct successor of any
element x with respect to <. Further, suppose that P' ≠ ∅, and that
(P', <') is a system with the properties 3.35(i)—(iii). Then
Proof. That (P, <) satisfies the stated conditions is now seen directly
from 3.23, 3.31 (iii), (v), and 3.3. Suppose that (P', <') satisfies 3.35(i)-
(iii). We wish to construct a one-to-one function from P to P' which pre¬
serves the relation <. Let Sc', 1' be defined as in 3.35. Since (P', Sc', 1')
is a Peano system, we know by 3.5 that
(2) F is one-to-one;
This is true since both sides of the equivalence are false, F(1) = 1' being
the first element of P' with respect to <'. Now suppose that the condi¬
tion in (6) is true for y and the given x. Then
Ordering and the arithmetical operations. We have thus far not connected
the ordering < with the arithmetical operations in any extended way.
The basic results here are the following.
y • z = (x + u) ■ z = (x • z) + (u • z)
(i) if x^z = y^z then x = y;
(ii) if 1 < z and z^x = z^y then x = y.
3.4] SEQUENCES, SUMS AND PRODUCTS 91
1. Let (S, <) be given and S' c S. Let <' be the relation < restricted to
elements of S'. Show that if (S, <) is simply ordered, then so is (S', < ').
Do the same for well-ordering.
2. Let (S₁, <₁) and (S₂, <₂) be given, with S₁ ∩ S₂ = ∅. Let S = S₁ ∪ S₂
and define < on S by the condition:
Show that if (S₁, <₁) and (S₂, <₂) are simply ordered, then so is (S, <).
Do the same for well-ordering.
3. Show that the ordering corresponding to
2, 3, 4, . . . , 1
2, 4, 6, . . . , 1, 3, 5, . . .
3.40 Definition. Let n ∈ P, and let S be any set. We say that S has n
elements if S is set-theoretically equivalent to {k: k ∈ P and k ≤ n}.
3.41 Definition.
Two sequences ⟨x₁, x₂, ..., xₙ⟩ and ⟨y₁, y₂, ..., yₙ⟩ are identical if
and only if xₖ = yₖ for every k ≤ n. In particular ⟨x₁, x₂⟩ = ⟨y₁, y₂⟩
if and only if x₁ = y₁ and x₂ = y₂, so that two-termed sequences behave
just like ordered pairs (x₁, x₂); however, ⟨x₁, x₂⟩ is defined in terms of
ordered pairs as {(1, x₁), (2, x₂)}. Similarly, three-termed sequences
⟨x₁, x₂, x₃⟩ behave like ordered triples. Thus, in general, n-termed
sequences ⟨x₁, x₂, ..., xₙ⟩ can be used in contexts where one might use
ordered n-tuples (x₁, x₂, ..., xₙ), and infinite sequences ⟨x₁, x₂, ...,
xₖ, ...⟩ can be used to explain the notion of an ordered infinite-tuple.
As with ordered pairs, it is clear why we must use symbols ⟨ ⟩ which are
different from those { } for the formation of sets. Although {1, 2, 3} =
{1, 3, 2}, we have ⟨1, 2, 3⟩ ≠ ⟨1, 3, 2⟩. However, some authors write
{xₖ}_{1≤k≤n} where we write ⟨xₖ⟩_{1≤k≤n}, so that the reader should watch
for the intended meaning. We shall use {x₁, x₂, ..., xₙ} only for the set
of objects associated with a sequence, i.e., for {y: for some k ∈ P, k ≤ n,
y = xₖ}, and similarly with {x₁, x₂, ..., xₖ, ...}. Thus the set associated
with the sequence ⟨1, 3, 2, 1, 3⟩ is {1, 3, 2, 1, 3}, i.e., is the set {1, 2, 3}.
It is not difficult to see that {x₁, x₂, ..., xₙ} has m elements for some
m ≤ n (in the sense of Definition 3.40).
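The distinction between a sequence and its associated set can be made concrete. In the sketch below a finite sequence is represented, as in the text, by the set of pairs (k, xₖ); the names sequence and associated_set are illustrative only.

```python
def sequence(*terms):
    """The n-termed sequence with the given terms, as the set {(1, x1), ..., (n, xn)}."""
    return frozenset((k, x) for k, x in enumerate(terms, start=1))

def associated_set(seq):
    """The set of objects occurring as terms of the sequence."""
    return {x for _, x in seq}
```

Two sequences with the same terms in a different order are different sets of pairs, whereas their associated sets coincide; and the associated set of a five-termed sequence may well have fewer than five elements.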
(i) $\sum_{k=1}^{1} x_k = x_1$;
(ii) $\sum_{k=1}^{n+1} x_k = \left(\sum_{k=1}^{n} x_k\right) + x_{n+1}$, for every n ∈ P.
Let
(3:4-3) c = H(1);
G(n, z) = z + H(Sc(n)), for every n ∈ P and z ∈ S.
(3:4-4) (i)'' F(1) = c;
(ii)'' F(Sc(n)) = G(n, F(n)), for every n ∈ P.
Given the function H, the element c of S and the function G are
well-defined by (3:4-3). Then by Theorem 3.4' we see that there is a unique
function F satisfying (3:4-4) with 𝒟(F) = P and ℛ(F) ⊆ S. Thus 3.42
is just another form of recursive definition. (Hence it should be expected
that various properties of $\sum_{k=1}^{n} x_k$ will have to be verified by induction
on n.) We also need a notation to be associated with “product” operations
·; the following definition of this is simply obtained from 3.42 by
changing + to · and Σ to Π.
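The passage from a sequence H to its sums via (3:4-3) and (3:4-4) can be followed step by step. This is a sketch only: S is taken to be the Python integers with ordinary + and ·, and the names sigma and pi are chosen here for illustration.

```python
def sigma(H, n):
    """Sum of H(1), ..., H(n), built as in (3:4-3) and (3:4-4)."""
    c = H(1)                            # (3:4-3): c = H(1)
    G = lambda k, z: z + H(k + 1)       # (3:4-3): G(n, z) = z + H(Sc(n))
    value = c                           # (3:4-4)(i)'':  F(1) = c
    for k in range(1, n):
        value = G(k, value)             # (3:4-4)(ii)'': F(Sc(k)) = G(k, F(k))
    return value

def pi(H, n):
    """Product of H(1), ..., H(n): the same scheme with + changed to *."""
    value = H(1)
    for k in range(1, n):
        value = value * H(k + 1)
    return value
```

For example, with H(k) = k², sigma(H, 4) evaluates ((1 + 4) + 9) + 16, grouping the terms to the left exactly as the recursion dictates.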
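(i) $\prod_{k=1}^{1} x_k = x_1$;
(ii) $\prod_{k=1}^{n+1} x_k = \left(\prod_{k=1}^{n} x_k\right) \cdot x_{n+1}$, for every n ∈ P.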
(3:4-5) $\sum_{k=1}^{4} x_k = \left(\sum_{k=1}^{3} x_k\right) + x_4 = ((x_1 + x_2) + x_3) + x_4$
sequence ⟨x₁, x₂, ..., xₖ, ...⟩ for which k ≤ n; cf. Exercise 1 below.) If
x₁, x₂, x₃, x₄ are positive integers and the operation we are dealing with
is the ordinary operation +, we know by associativity that
(3:4-7)(b) (yu y2) = (x2, x3 + x4), {zt, z2, z3) = (xu x2 + x3, x4).
(i) $\sum_{k=1}^{n+m} x_k = \left(\sum_{k=1}^{n} x_k\right) + \left(\sum_{k=1}^{m} x_{n+k}\right)$
(1) $\sum_{k=1}^{n+1} x_k = \left(\sum_{k=1}^{n} x_k\right) + \left(\sum_{k=1}^{1} x_{n+k}\right)$.
For, by 3.42,
$\sum_{k=1}^{1} x_{n+k} = x_{n+1}$.
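Suppose (i) is true for m. We show that it is true for Sc(m) = m + 1, i.e.,
that
(2) $\sum_{k=1}^{n+(m+1)} x_k = \left(\sum_{k=1}^{n} x_k\right) + \left(\sum_{k=1}^{m+1} x_{n+k}\right)$.
For
$$
\begin{aligned}
\sum_{k=1}^{n+(m+1)} x_k &= \sum_{k=1}^{(n+m)+1} x_k, && \text{by associativity of + on P,} \\
&= \left(\sum_{k=1}^{n+m} x_k\right) + x_{(n+m)+1} \\
&= \left(\left(\sum_{k=1}^{n} x_k\right) + \left(\sum_{k=1}^{m} x_{n+k}\right)\right) + x_{n+(m+1)} \\
&= \left(\sum_{k=1}^{n} x_k\right) + \left(\left(\sum_{k=1}^{m} x_{n+k}\right) + x_{n+(m+1)}\right), && \text{by associativity of + on S,} \\
&= \left(\sum_{k=1}^{n} x_k\right) + \sum_{k=1}^{m+1} x_{n+k}, && \text{by 3.42(ii).}
\end{aligned}
$$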
(ii)
(x + y) · z = (x · z) + (y · z).
Some special sums and products. The reader is familiar with the values
of various special sums, for example: the sum of the first n positive integers
1, 2, 3, ..., n; more generally the sum of any arithmetic progression
a, a + d, a + (2 · d), ..., a + (n · d); the sum of the first n squares
1, 4, 9, ..., n²; the sum of a geometric progression a, a · r, a · r², ...,
a · rⁿ; etc. Expressed in the Σ notation we have, for the first of these,
(3:4-8) $\sum_{k=1}^{n} k = \frac{n \cdot (n + 1)}{2}$.
Since we do not formally presuppose fractions, this can for the moment be
expressed as follows:
(3:4-9) $2 \cdot \sum_{k=1}^{n} k = n \cdot (n + 1)$.
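The fraction-free form of the identity, namely that twice the sum of the first n positive integers equals n · (n + 1), lends itself to a direct numerical check. The sketch below is only a spot check for small n and not a substitute for the inductive proof; the function names are illustrative.

```python
def sum_first(n):
    """1 + 2 + ... + n, accumulated term by term."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

def doubled_sum_identity_holds(n):
    """Checks 2 * (1 + 2 + ... + n) == n * (n + 1) without using division."""
    return 2 * sum_first(n) == n * (n + 1)
```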
(3:4-10) $n! = \prod_{k=1}^{n} k$,
(3:4-11) 1! = 1,
(n + 1)! = (n + 1) · (n!).
(3:4-12) $\sum_{k=1}^{n} x = n \cdot x$
and
(3:4-13) $\prod_{k=1}^{n} x = x^{n}$.
(3:4-14) $\sum_{k=1}^{n} x_k = \sum_{i=1}^{n} x_i = \sum_{j=1}^{n} x_j = \sum_{l=1}^{n} x_l$.
(3:4-15) UnH
Indeed we have done essentially this in showing that the definition 3.42
is justified. This sort of notation may be preferred on various theoretical
grounds, but for practical questions it is more awkward to deal with. For
$\sum_{k=1}^{n} x_k = \sum_{k=1}^{n} y_k$ and $\prod_{k=1}^{n} x_k = \prod_{k=1}^{n} y_k$
I ? II? | I I ? * * •
is the simplest and most obvious of these. Its practical disadvantages are
also immediately apparent. It is a time-consuming job to denote the
number of sheep in a moderate-sized flock or the number of bushels of
wheat in a crop by means of such a system. Even more laborious are
arithmetical computations which would be associated with various business
transactions. For example, we might agree to pay
II
bushels of wheat. It would be a long time before we discovered that
perhaps we did not want to pay that much, after all.
It thus became a matter of practical necessity, long ago, to develop a
compact systematic notation for dealing with large numbers. This was
provided by selecting larger basic units into which large numbers can be
decomposed into a small number of more easily recognizable parts. For
example, if ||||| is taken as such a larger unit, the above number of
bushels is more readily apprehended as
mm'Mt m m mi,
where the diagonal indicates a completed unit of |||||. Still larger num¬
bers would be analyzed in terms of the number of 4Hff' groups of
etc. If we abbreviated by /, and / groups of / by /, etc., we can
describe the above number also as consisting of /:/’s plus |||| : I’s, i.e.,
of a single/, no/’s, and |||| : |’s. To carry this through, we need a way
of indicating that there are none of a certain type of unit; we may do this,
for example, by using the symbol 0. Thus the above number consists
102 THE INTEGERS AND INTEGRAL DOMAINS [CHAP. 4
(4:1-1) 0 • a = 0, 0 + a = a.
Also implicitly involved in this use of 0 is that the associative law continues
to hold when the positive integers are extended by including 0 as a new
“number. ”
It would be natural to hope that the positional notation described above
would also lead to an economical means of carrying out arithmetical com¬
putations. In order to make this possible, still further algebraic properties
of 0 are needed. Consider, for example, what is involved in the
computations of 14 · 29 in the base 10 and the base 5:

(4:1-2)
         29                    104
       × 14                   × 24
       ----                   ----
        116                    431
        29                    213
       ----                   ----
        406  (base 10)        3111  (base 5)
must make extensive use of the distributive law for • over +, and of
commutative and associative laws for both + and • , when the positive
integers are extended by 0.
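The two computations of (4:1-2) can be reproduced mechanically, which makes visible where the distributive law enters: each digit of the multiplier contributes one partial product, and the partial products are added after shifting. The sketch below is illustrative; the names to_base and long_multiply are not from the text.

```python
def to_base(n, base):
    """Digits of the positive integer n in the given base, as a string."""
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))

def long_multiply(a, b, base):
    """Partial products a * d for each digit d of b (lowest digit first), and their shifted sum."""
    partials = []
    total = 0
    shift = 1
    while b > 0:
        b, d = divmod(b, base)
        partials.append(a * d)
        total += a * d * shift    # distributivity: a * (d0 + d1 * base + ...)
        shift *= base
    return partials, total
```

For 29 · 14 the partial products are 116 and 29 in base 10, and the same two numbers 116 and 58 are written 431 and 213 in base 5.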
It is seen then that much of value in problems solely concerning positive
integers could be obtained if we could be sure that it is consistent to as¬
sume that there is a new number 0 satisfying (4:1-1) such that most of
the usual algebraic laws holding for +, • in the set of positive integers
continue to hold in the new system of numbers. It should not be expected
that all laws will remain true. Thus, if we demand (4:1-1), we must have
0 ■ a = 0 • b for all numbers a, b, and hence we cannot consistently demand
that the cancellation law for • also continue to hold. The analysis of the
problem of justifying the positional notation and its use in arithmetical
computations provides us with a minimum requirement as to which laws
we should like to be able to extend to cover the adjunction of the number 0.
The brute-force approach to this problem is to test all pairs (x, y) of
positive integers in some succession, say (1, 1), (1, 2), (2, 1), (1, 3), (2, 2),
(3, 1), ... in the hope that we will eventually reach a solution.
Unfortunately, if a particular finite series of such tests fails to give us a
solution, the question will not be conclusively settled by this approach, since
we may not have gone far enough in our testing or there may be no
solution at all. With a little sophistication this can be remedied; we observe
that if there is a solution we must have 4x < 10 and 3x < 12, also
y < 10 and 2y < 12. From this we easily conclude that a solution, if
any exists, must be among the ten pairs (x, y) for which x ≤ 2 and y ≤ 5.
Although we are now in a position to decide whether there is a solution,
the number of computations we have to make (unless we are lucky) is
still slightly burdensome. The step of enlightenment consists in attempting
instead to eliminate one of the variables from this problem. We multiply
both sides of the first equation by 2 to obtain 8x + 2y = 20. Here 2y
must be a number which when added to 8x yields 20; let us write
2y = 20 − 8x as an abbreviation for this statement. Similarly, 2y =
12 − 3x. Thus, if there is a solution we must have 20 − 8x = 12 − 3x.
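The bounded search described here can be carried out in a few lines. The sketch assumes that the two equations under discussion are 4x + y = 10 and 3x + 2y = 12; this reading is inferred from the bounds and from the doubled equation 8x + 2y = 20, since the statement of the problem itself is not reproduced in this passage. The name solutions is illustrative.

```python
def solutions(x_max, y_max):
    """All pairs of positive integers (x, y) within the bounds solving both equations."""
    return [
        (x, y)
        for x in range(1, x_max + 1)
        for y in range(1, y_max + 1)
        if 4 * x + y == 10 and 3 * x + 2 * y == 12   # assumed form of the system
    ]

def elimination_candidates(x_max):
    """Values of x in the bounds with 20 - 8x = 12 - 3x, the condition after eliminating y."""
    return [x for x in range(1, x_max + 1) if 20 - 8 * x == 12 - 3 * x]
```

Under this reading the ten candidate pairs yield no solution, and the eliminated condition is met by no positive integer x, which is the kind of outcome that motivates enlarging the number system.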
Our approach to this end will consist of two steps. We shall first set down
those properties of a system of objects D with certain operations +, •
which we would like to see fulfilled in an extension of the positive integers
and we will investigate the consequences of these properties. We will
then prove the existence of at least one such extension of P. There are
many such extensions; we shall be able to single out the set of integers
as providing, in some sense, the least such extension.
Some remarks about these conditions are in order here. First of all, the
conditions (ii)-(iv) are dual with respect to +, 0 and ·, 1, i.e., each part
of these conditions is obtained from the other by replacing the one
operation and constant by the other. Thus any deduction of a theorem
concerning +, 0 from the first parts of (ii)-(iv) can be used to obtain a
deduction of the dual theorem from the second parts. (ii) gives us commutative
laws, (iii) associative laws, and (iv) describes 0 and 1 as being identity
elements for +, ·, respectively. (v) is a distributive law for · over +.
With the exception of the first part of (iv), we have seen in Chapter 3
that these conditions are all satisfied by P with +, ·, 1. (vi) fulfills the
possibility of subtraction that we wished to obtain. It should be desired
that the result is unique, i.e., if x + u = y and x + v = y, then u = v;
more simply, we want x + u = x + v to imply u = v. This corresponds
to the cancellation law 3.12 for P. We shall be able to derive this law for D
in 4.3 below.
In the modern study of algebra it has proved useful to consider systems
which satisfy some but not necessarily all of the conditions of 4.1. Most
prominently, those systems which are merely assumed to satisfy, among
the conditions of 4.1, the first part of (ii), both parts of (iii), the first part
of (iv), and (v) and (vi), are called rings. If, in addition, the second part
of 4.1 (ii) is satisfied, the ring is said to be commutative. If, finally, there is
an element 1 satisfying 4.1 (i) and the second part of (iv), the ring is said
to have a unit or unity element. It is for this reason that we have used
the given designation for the systems satisfying all conditions of 4.1.
It is thus seen that the notion of a commutative ring with unity embodies
those properties which, on first glance, we would like to see extended from
P, together with the general possibility of subtraction. We shall see that
many different such systems besides the integers can be constructed. Since
various of these will be useful in our development, it is worthwhile having
a list of results which apply to any commutative ring with unity. These
will be taken up in this section; in the next section we shall deal with
certain elaborations of the basic definition 4.1 which will bring us still
closer to the integers.
Throughout the remainder of this section we assume that (D, +, ·, 0, 1)
is an arbitrary commutative ring with unity.
We begin by establishing the uniqueness of 0, 1 as identity elements.
Proof. (i) is immediate from 4.5(i) and (ii), and (ii) then follows by 4.4.
To prove (iii) it suffices, by 4.4, to show that x + (y + (−x)) = y. By
associativity and commutativity this is equivalent to y + (x + (−x)) = y,
which in turn follows from (i) and y + 0 = y.
Parts (i) and (ii) may be said to characterize —x as the additive inverse
of x.
Proof.
x · (y − z) = x · (y + (−z)) by 4.6(iii)
= x · y + x · (−z) by distributivity
= x · y + (−(x · z)) by 4.8(ii)
= x · y − x · z by 4.6(iii).
(i) $nx = \sum_{k=1}^{n} x$
and
(ii) $x^{n} = \prod_{k=1}^{n} x$.
(i) x^1 = x;
(ii) x^(m+n) = x^m · x^n;
(iii) x^(m·n) = (x^m)^n;
(iv) . . .
(v) 1^n = 1;
(vi) (x · y)^n = x^n · y^n.
4.2 Integral domains. There are other concepts and laws associated
with the positive integers which we should see if we can consistently extend
in whole or in part to systems in which subtraction is available. For
example, we have the cancellation law 3.21 for multiplication, according
to which if x, y, z E P and x • y = x • z, then y = z. This law cannot be
consistently extended to commutative rings with unity as it stands; for
0-1 = 0-0 but 1 0. However, if we exclude the possibility that
x = 0, we obtain a useful concept.
Theorem 4.14 provides the first step to the study of the solutions of
algebraic equations in an integral domain. Let D be such a domain and
let n = n1 for any n ∈ P. By the results of Section 4.1 we have, for
example, x² − 2 · x − 3 = (x − 3) · (x + 1). Hence, for any x ∈ D,
x² − 2 · x − 3 = 0 if and only if (x − 3) · (x + 1) = 0. By 4.7 and
4.14, the latter is equivalent to x − 3 = 0 or x + 1 = 0, i.e., to x = 3
or x = −1. It would be true in any commutative ring with unity that
both 3 and −1 are solutions of the equation x² − 2 · x − 3 = 0.
However, we would be unable to establish, without the hypothesis that we have
an integral domain, that these are the only solutions of this equation.
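That the hypothesis of an integral domain cannot simply be dropped is seen from a small computation. The sketch below uses arithmetic modulo m as an example of a commutative ring with unity; this choice of example is ours and is not made in the text. Modulo 12 there are products of nonzero elements equal to 0, and the equation acquires further solutions besides 3 and −1.

```python
def roots_mod(m):
    """All x in {0, ..., m - 1} with x^2 - 2x - 3 = 0 in arithmetic modulo m."""
    return [x for x in range(m) if (x * x - 2 * x - 3) % m == 0]

def zero_divisor_pairs(m):
    """Pairs of nonzero residues whose product is 0 modulo m."""
    return [(a, b) for a in range(1, m) for b in range(1, m) if (a * b) % m == 0]
```

Modulo 12 the roots include 5 and 9 in addition to 3 and 11 (the latter playing the role of −1), whereas modulo 7, where no zero divisors exist, only 3 and 6 remain.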
We have not yet proved the existence of commutative rings with unity,
let alone of integral domains. Since the conditions for these do not
explicitly demand the existence of any elements other than 0, 1 where
0 ≠ 1, it is conceivable that such systems can be found using only these
two elements. If there were such a commutative ring with unity, it would
necessarily have the following addition and multiplication tables:

    + | 0  1          · | 0  1
    --+------         --+------
    0 | 0  1          0 | 0  0
    1 | 1  0          1 | 0  1

the · table are determined by the conditions x · 0 = 0 = 0 · x and x · 1 = x.
Now it can be verified, by checking each possible choice of x, y, z in each
condition of 4.1 and 4.13 (or 4.14), that the two-element system thus
obtained is actually an integral domain. This is a tedious matter; for
example, to verify the distributive law alone requires checking eight cases.
However, we shall be able to obtain verification of this result as an easy
consequence of a theorem allowing us to construct a large number of
integral domains, which will be proved at the end of this chapter.
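The tedious case-by-case verification is easily delegated to a machine. The following sketch encodes the two tables and tests the conditions as they are described in the remarks on 4.1 (commutative, associative, identity, and distributive laws, and the solvability of x + u = y), together with the absence of zero divisors. The exact wording of 4.1, 4.13, and 4.14 is not reproduced here, so the encoding of the conditions is an assumption.

```python
from itertools import product

D = (0, 1)
ADD = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
MUL = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def add(x, y):
    return ADD[(x, y)]

def mul(x, y):
    return MUL[(x, y)]

def is_commutative_ring_with_unity():
    pairs = list(product(D, repeat=2))
    triples = list(product(D, repeat=3))   # eight cases for each three-variable law
    return (
        all(add(x, y) == add(y, x) and mul(x, y) == mul(y, x) for x, y in pairs)
        and all(add(x, add(y, z)) == add(add(x, y), z) for x, y, z in triples)
        and all(mul(x, mul(y, z)) == mul(mul(x, y), z) for x, y, z in triples)
        and all(add(x, 0) == x and mul(x, 1) == x for x in D)
        and all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z)) for x, y, z in triples)
        and all(any(add(x, u) == y for u in D) for x, y in pairs)
    )

def has_no_zero_divisors():
    return all(mul(x, y) != 0 for x, y in product(D, repeat=2) if x != 0 and y != 0)
```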
(iv) for all x, y, z ∈ D, if x < y and 0 < z, then x · z < y · z.
x ∈ D⁺, x = 0, −x ∈ D⁺;
x ∈ Pos, x = 0, −x ∈ Pos.
Hence
(2) F is one-to-one;
and
We obtain (3) from 4.11(ii) in the form (n + m)1 = n1 + m1. We
obtain (4) from 4.11(iii), (vi) in the form (n · m)1 = n(m1) = n(1 · m1) =
n1 · m1.
Proof. (i) is obvious if 0 ≤ x and follows from Exercise 1(c) below
when x < 0. Hence −|x| ≤ 0 ≤ |x|. Since x = −|x| when x < 0, this
leads to (ii). To prove (iii), the conclusion is obvious if 0 ≤ x. If x < 0,
then from −u ≤ x follows −x ≤ u by Exercise 1(c), and hence again
|x| ≤ u. (iv) By adding the inequalities −|x| ≤ x ≤ |x| and −|y| ≤
y ≤ |y| we obtain −(|x| + |y|) ≤ x + y ≤ |x| + |y|. Applying (iii)
with x + y instead of x and |x| + |y| instead of u gives the desired
conclusion. (v) is easily proved by considering the four possible cases.
Part (iv) of 4.20 is the familiar triangle inequality.
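The two steps of the argument for the triangle inequality can be spot-checked in the ordered integral domain of ordinary integers. In the sketch below the statement of 4.20(iii) is read off from its use in the proof (from −u ≤ x ≤ u one concludes |x| ≤ u); this reading, like the function names, is an assumption.

```python
def satisfies_iii(x, u):
    """If -u <= x <= u, then |x| <= u (4.20(iii) as used in the proof)."""
    return not (-u <= x <= u) or abs(x) <= u

def satisfies_triangle(x, y):
    """|x + y| <= |x| + |y|."""
    return abs(x + y) <= abs(x) + abs(y)
```

Such a check over a finite range proves nothing in general, but it is a quick guard against misstating the direction of an inequality.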
We would like to show now that there exists at least one domain with this
property and that any two domains with this property are isomorphic.
Then we will be justified in choosing one of these and calling it the system
of integers.
(4:3-2) x + y =
    x + y       if x ∈ P, y ∈ P
    x           if x ∈ P, y = 0
    x − z       if x ∈ P, y ∈ P*, y = z* and z < x
    (z − x)*    if x ∈ P, y ∈ P*, y = z* and x < z
    0           if x ∈ P and y = x*.
Proof. Define
(1) W = {((n, m), (n', m')) : (n, m), (n', m') ∈ P × P and
n + m' = n' + m}.
This follows from the following statements, which hold true for all n, m,
n', m', n'', m'' ∈ P:
(a) (n, m) ≡ (n, m);
(b) if (n, m) ≡ (n', m'), then (n', m') ≡ (n, m);
(c) if (n, m) ≡ (n', m') and (n', m') ≡ (n'', m''), then
(n, m) ≡ (n'', m'').
Of these, (a) and (b) are obvious from the definition of ≡. To prove (c),
we have, by hypothesis, n + m' = n' + m and n' + m'' = n'' + m'; we
wish to conclude that n + m'' = n'' + m. We add the first two equations
together to obtain
which is the desired result. Conditions 7(a), (b) express, in the terminology
of (2-3.36), that ≡ is a congruence relation with respect to the operations ⊕, ∘
and the relation <.
Now let
Hence also
In order to do this we must show that all the conditions 4.1 (i)—(vi) for a
commutative ring with unity are met, that the additional conditions 4.13
(or 4.14) and 4.15 for an ordered integral domain are met, and that the
new condition stated in our theorem is satisfied. We shall content ourselves
here with a sampling of the proofs involved.
Consider first the commutative law for +,
Let X = [n, m]. Then X · 1 = [(n, m) ∘ (2, 1)] by (9b) and (10b). Hence
X · 1 = [2n + m, 2m + n]. Thus it suffices to prove that
Assuming that the earlier conditions for a commutative ring with unity
have been established, it suffices to find V ∈ D such that X + V = 0;
for then we can take U = V + Y. Let X = [n, m]; this corresponds to
n − m. Clearly its negative should correspond to m − n. Thus we take
V = [m, n]. Now
For let X = [n, m], Y = [p, q] and suppose that X ≠ Y. Then (n, m) ≢
(p, q), that is, n + q ≠ m + p. Hence n + q < m + p or m + p < n + q.
4.3] CONSTRUCTION AND CHARACTERIZATION OF INTEGERS 119
In the first case, (n, m) < (p, q) and in the second case (p, q) < (n, m)
by (6). Hence [n, m] < [p, q] or [p, q] < [n, m] by (9c). As another
example, we have
For let X = [n, m], Y = [p, q], Z = [r, s]. By hypothesis, (n, m) < (p, q),
that is, n + q < p + m. We wish to show that (n + r, m + s) <
(p + r, q + s), that is, n + r + q + s < m + s + p + r. This follows
by adding r + s to both sides of the given inequality.
Finally, we shall prove the special property mentioned in our theorem.
To do this we must first compute n1 and −n1 for every n ∈ P. By
definition $n1 = \sum_{k=1}^{n} 1$. It is easily seen by induction on n that
(20) n1 = [n + 1, 1].
Further it follows from the proof of (16) that −n1 = [1, n + 1]. We wish
to show that
We can now obtain from 4.21 the existence of a system of the desired
kind which actually contains P.
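The construction used in the proof of 4.21 can be imitated directly with pairs of positive integers. The sketch below follows the formulas visible in the proof: (n, m) and (n', m') are equivalent when n + m' = n' + m; the unit is represented by (2, 1); the negative of [n, m] is [m, n]; and n1 = [n + 1, 1]. The componentwise sum, the product rule, and the representative (1, 1) chosen for 0 are inferred from these formulas and should be read as assumptions, as should all function names.

```python
def equivalent(a, b):
    """(n, m) is equivalent to (n', m') when n + m' = n' + m."""
    (n, m), (n2, m2) = a, b
    return n + m2 == n2 + m

def add(a, b):
    """Sum of pairs; (n, m) stands for the difference n - m."""
    (n, m), (p, q) = a, b
    return (n + p, m + q)

def mul(a, b):
    """Product of pairs, consistent with (n, m) o (2, 1) = (2n + m, 2m + n)."""
    (n, m), (p, q) = a, b
    return (n * p + m * q, n * q + m * p)

def less(a, b):
    """(n, m) < (p, q) when n + q < p + m."""
    (n, m), (p, q) = a, b
    return n + q < p + m

def neg(a):
    """The negative of [n, m] is [m, n]."""
    n, m = a
    return (m, n)

ZERO = (1, 1)   # assumed representative of 0
ONE = (2, 1)    # representative of 1, as in the proof

def multiple_of_one(n):
    """n1 = [n + 1, 1], as in (20)."""
    return (n + 1, 1)
```

Since the operations act on representatives, results must be compared with equivalent rather than with ==; for instance mul((3, 1), ONE) is (7, 5), which is equivalent to (3, 1).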
Figure 4.1
That is, we have exactly the situation of (2:4-9), as illustrated in Figs. 2.22
and 2.23. According to the result there, we can choose a set I containing P
and define operations +, • on I and a relation < on I such that
(I, +, ·, <, 1) contains (P, +, ·, <, 1) as a subsystem, thus satisfying (i).
Further we can extend G to an isomorphism H of this new system onto the
system (D, +, ·, <, 1). If we take 0 to be the unique element z in I with
H(z) = 0, H also establishes that
Proof. It is also more convenient for this proof to deal with differences
of positive integer multiples instead of just elements and their negatives.
First note that
(3) G is one-to-one.
For suppose that x, y ∈ I and G(x) = G(y). By (1) we can find n, m, p,
q ∈ P with x = n − m and y = p − q. Then by (2), n1 − m1 =
p1 − q1; hence n1 + q1 = m1 + p1 and (n + q)1 = (m + p)1 by 4.18
[or 4.11(ii)]. But then n + q = m + p by 4.18 and hence x = n − m =
p − q = y. We show next that
G(x + y) = (n + p)1 − (m + q)1
= (n1 + p1) − (m1 + q1)
= (n1 − m1) + (p1 − q1)
= G(x) + G(y).
The proof of the second part makes similar use of the algebra of integral
domains. Since G(0) = 0 and G(1) = 1, we need only show
in order to complete the proof of the first part of our theorem. We have
n − m < p − q if and only if n + q < m + p, since we have an ordered
integral domain. This condition is in turn equivalent to n1 + q1 <
m1 + p1 by 4.18, and hence to n1 − m1 < p1 − q1, again by the
results on ordering.
To prove (ii), suppose that for any u ∈ D, either u = n1 with n ∈ P,
or u = 0 or u = −(n1) with n ∈ P. Then u = G(k) for k ∈ I, with
k = n, k = 0, or k = −n, respectively. Hence D ⊆ ℛ(G) ≅ I and
therefore D ≅ I.
It follows from this theorem that 4.11 can be extended to hold for
arbitrary integer multiples of elements of any ordered integral domain.
In fact it can be seen, by additional argument, that if (D, +, ·, 0, 1) is
any commutative ring with unity then 4.11(i)-(vii) remains true for any
x, y ∈ D and m, n ∈ I. However, in the following text, we have no further
need for the more general multiples.
The results 4.22 and 4.23 now permit us to adopt the following con¬
vention.
1. Using the notations of the proof of 4.21, show that for any n, m, p, q, p',
q' ∈ P, if (p, q) ≡ (p', q'), then (n, m) ∘ (p, q) ≡ (n, m) ∘ (p', q'). Use
this result to establish (7b) of the proof.
4.4] THE INTEGERS AS AN INDEXING SYSTEM 123
2. Using the notations of the proof of 4.21 show the following for any
X, Y, Z ∈ I.
(a) X + 0 = X
(b) X·(Y·Z) = (X·Y)·Z
(c) X·(Y + Z) = X·Y + X·Z
(d) if X < Y and 0 < Z, then X·Z < Y·Z.
3. Prove that for any x ∈ I we have x ∈ P if and only if 0 < x.
4. Prove that for any x, y ∈ I we have y < x if and only if y + 1 ≤ x.
(1) F(x) = (x - a) + 1
(3) F is one-to-one.
This follows immediately from Exercise 1(a) of Exercise Group 4.2. Thus
F establishes the following isomorphism:
For any n > 0 we can construct from this the n-termed sequence
(4:4-2) ⟨x_1, . . . , x_n⟩
and
if m < n.
The condition (i) can be viewed as defining the sum and product of the
empty sequence. The choice of values 0 and 1 in (i) is arbitrary, so far
as producing a well-defined notion is concerned, but is not arbitrary if we
wish these special sums and products to share the general properties of
usual sums and products. We have, for example,
Σ_{k=1}^{1} x_k = (Σ_{k=1}^{0} x_k) + x_1 = 0 + x_1 = x_1,
and similarly
∏_{k=1}^{1} x_k = 1·x_1 = x_1,
so that the conditions of 4.26 accord with 3.42(i) and 3.43(i) in this case.
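The role of the empty sum and empty product can be illustrated with a short Python sketch. The function names `seq_sum` and `seq_prod` are hypothetical, and the list `xs` (with an unused entry at index 0 so that indices start at 1) is an assumption made only for this illustration; it is not the book's formal definition 4.26.

```python
def seq_sum(xs, m, n):
    """Sum of xs[m] + ... + xs[n]; the empty sum (n < m) is taken to be 0."""
    if n < m:
        return 0
    return seq_sum(xs, m, n - 1) + xs[n]


def seq_prod(xs, m, n):
    """Product of xs[m] * ... * xs[n]; the empty product (n < m) is taken to be 1."""
    if n < m:
        return 1
    return seq_prod(xs, m, n - 1) * xs[n]


# Index 0 is a placeholder so that xs[1], xs[2], xs[3] play the role of x_1, x_2, x_3.
xs = [None, 5, 7, 2]
one_term_sum = seq_sum(xs, 1, 1)    # (empty sum) + x_1
one_term_prod = seq_prod(xs, 1, 1)  # (empty product) * x_1
```

With any other choice of value for the empty sum or product, the one-term cases would no longer return x_1 itself.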
The condition (ii) of the above definition is justified by recursive definition
on Pm for each m E I, m > 0. Further, inductive proof on Pm easily
serves to establish the following.
4.27 Theorem. Let ⟨x_0, . . . , x_k, . . .⟩ be an infinite sequence of elements of
D, and let n, m, q ∈ I, m > 0, q > 0. Then we have
(i) Σ_{k=1}^{n} x_k and ∏_{k=1}^{n} x_k, as defined in 4.26,
(iii) z·Σ_{k=m}^{n} x_k = Σ_{k=m}^{n} (z·x_k) for any z ∈ D.
4.27(iii) follows immediately from (ii) for q = 1 and the general dis¬
tributive law 3.46. The condition 4.27(ii) allows us to choose the initial
value of k at our convenience.
The reader should convince himself that this is the precise formulation of
the desired result. The proof is left to him. For a formulation of a general
commutative law we need the notion of a change in the order of factors.
This is provided by the following.
Thus the sequence ⟨x_3, x_1, x_2, x_4⟩ is obtained by a permutation F from the
sequence ⟨x_1, x_2, x_3, x_4⟩ where F(1) = 3, F(2) = 1, F(3) = 2, F(4) = 4.
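As an informal check of this example, the following Python sketch applies a permutation F to a sequence and then tests, on one small numeric sequence, that every rearrangement has the same sum. The helper `rearrange` and the dictionary representation of F are assumptions of this sketch, not notation from the text.

```python
from itertools import permutations


def rearrange(xs, F):
    """Return <x_F(1), ..., x_F(n)>, where F is a dict on {1, ..., n}."""
    n = len(xs)
    if sorted(F.values()) != list(range(1, n + 1)):
        raise ValueError("F must be a permutation of {1, ..., n}")
    return [xs[F[k] - 1] for k in range(1, n + 1)]


xs = ["x1", "x2", "x3", "x4"]
F = {1: 3, 2: 1, 3: 2, 4: 4}
rearranged = rearrange(xs, F)

# Numeric spot check that rearranging terms does not change a sum.
values = [5, -2, 7, 11]
all_sums_equal = all(sum(p) == sum(values) for p in permutations(values))
```

A finite check of this kind only illustrates the generalized commutative law; it does not replace the inductive proof.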
(Note that one or the other of the sums on the right-hand side of the
equation might be empty.) To reduce this to the inductive hypothesis,
we wish to write the sum of the first two terms as %F(k) for suitable F.
Define
We claim that
= Σ_{k=1}^{n+1} x_k.
This completes the inductive step and hence the proof of the theorem.
x^n = ∏_{k=1}^{n} x.
(D, +, ·, 0, 1). These are the geometric series and the binomial expansion.
The first of these is trivial.
etc., we expect that the general binomial expansion should take the form
(4:4-7) (x + y)^n = Σ_{k=0}^{n} c_k^{(n)} x^{n−k} y^k
n times
and consider each factor as being numbered, say first factor, second factor,
etc., we see that the coefficient of x^{n−k} y^k is the same as the number of
ways we can make distinct choices of k factors. In other words, c_k^{(n)} is
the number of distinct subsets containing exactly k elements chosen from
a set consisting of n elements. Clearly, under this description,
              1   1
            1   2   1
(4:4-11)  1   3   3   1
        1   4   6   4   1
where each number in each row, other than the first and last 1, is obtained
as the sum of the two closest numbers in the row directly above it. Thus
(4:4-9) and (4:4-10), as reflected in the Pascal scheme (4:4-11), provide
a simple recursive calculation procedure by which we can obtain any c_k^{(n)}.
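The recursive procedure suggested by the Pascal scheme can be sketched in Python as follows. The function name `pascal_row` is hypothetical; the sketch assumes the row for n = 0 consists of the single entry 1.

```python
def pascal_row(n):
    """Return [c_0, ..., c_n], the coefficients of (x + y)^n, built row by row."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two closest entries in the row above.
        inner = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        row = [1] + inner + [1]
    return row


row_three = pascal_row(3)
row_four = pascal_row(4)
```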
There is another approach to the calculation of the c_k^{(n)} which involves
counting permutations. The number of permutations of a set consisting
of n distinct elements is the same as the number of permutations of
{1, . . . , n}. Every function on this set is alternatively described by 3.40,
as a sequence ⟨b_1, . . . , b_n⟩. The condition that such a sequence then be a
permutation can simply be written as {b_1, . . . , b_n} = {1, . . . , n}. The
number of such sequences is determined as follows: b_1 can be chosen in n
different ways; once b_1 is fixed, b_2 can be chosen in n − 1 different ways,
. . . ; once b_1, b_2, . . . , b_{n−2} have been chosen, b_{n−1} can be chosen in two
ways, and b_n is then completely determined. Thus the number of distinct
permutations ⟨b_1, . . . , b_n⟩ is n · (n − 1) · . . . · 2 · 1, i.e., is simply n!.
Now let us look at the number c_k^{(n)} of k-element subsets of the set
{1, . . . , n}. With each such subset X there are associated certain permutations.
We can permute the elements of X in k! ways and we can permute
the elements of X̄ (the complement of X in {1, . . . , n}) in (n − k)! ways.
Each of the first is a sequence
⟨b_1, . . . , b_k⟩ with range {b_1, . . . , b_k} = X; each of the second is a sequence
⟨b′_1, . . . , b′_{n−k}⟩ with range X̄. Together these determine a permutation of
{1, . . . , n}, namely ⟨b_1, . . . , b_n⟩ where b_{k+i} = b′_i for 1 ≤ i ≤ n − k.
The number of permutations of {1, . . . , n} determined in this way by a
given X is k!(n − k)!, since distinct permutations of X or of X̄ lead to
distinct such permutations. Further, if X_1 and X_2 are distinct k-element
subsets and ⟨b_1^{(1)}, . . . , b_n^{(1)}⟩, ⟨b_1^{(2)}, . . . , b_n^{(2)}⟩ are permutations associated in
this way with X_1 and X_2, then they must also be distinct; for otherwise
(4:4-12) c_k^{(n)} = n! / (k!(n − k)!)
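Formula (4:4-12) can be compared against a direct count of subsets. The sketch below is only a finite sanity check for small n; the names `count_subsets` and `by_formula` are hypothetical.

```python
from itertools import combinations
from math import factorial


def count_subsets(n, k):
    """Count the k-element subsets of {1, ..., n} by listing them."""
    return sum(1 for _ in combinations(range(1, n + 1), k))


def by_formula(n, k):
    """Evaluate n! / (k! (n - k)!); the division is exact for 0 <= k <= n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


agree = all(
    count_subsets(n, k) == by_formula(n, k)
    for n in range(0, 8)
    for k in range(0, n + 1)
)
```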
These informal arguments are the basis of the following definition and
theorems. As is customary, we will now use the symbol \binom{n}{k} for c_k^{(n)}.
130 THE INTEGERS AND INTEGRAL DOMAINS [CHAP. 4
4.34 Definition. Let n, k ∈ I, with 0 ≤ n in (i), (ii) below, and 1 ≤ k ≤ n
in (iii). We define
(i) n! = ∏_{i=1}^{n} i
(f) Use (c) to give another proof of the generalized commutative law 4.30.
3. Prove Theorem 4.35.
4. Prove Theorem 4.36.
5. Using the interpretation of \binom{n}{k} in terms of sets, prove that the number of
subsets of a set with n elements is 2^n.
6. Observe (in integers) that Σ_{k=1}^{n} (a_k − a_{k−1}) = a_n − a_0. Thus
n² = Σ_{k=1}^{n} [k² − (k − 1)²] = Σ_{k=1}^{n} (2k − 1) = 2 Σ_{k=1}^{n} k − n.
This gives 2 Σ_{k=1}^{n} k = n² + n.
Use this method to find Σ_{k=1}^{n} k², Σ_{k=1}^{n} k³. Frame a general recursive
procedure for finding Σ_{k=1}^{n} k^m where m is any fixed positive integer.
7. Calculate (1 − x) ∏_{i=0}^{n} (1 + x^{2^i}) (n ∈ I).
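The telescoping idea of Exercise 6 can be explored numerically. The sketch below is one possible recursive procedure, not necessarily the one intended by the exercise: it expands k^{m+1} − (k − 1)^{m+1} by the binomial theorem, sums over k, and solves for the power sum of order m in terms of the lower ones. The name `power_sum` is hypothetical.

```python
from math import comb


def power_sum(m, n):
    """Return 1^m + 2^m + ... + n^m (m >= 0, n >= 0) by the telescoping recursion.

    Summing k^(m+1) - (k-1)^(m+1) over k = 1..n gives n^(m+1), and the
    binomial theorem writes each difference as
    sum_{j=0}^{m} (-1)^(m-j) * C(m+1, j) * k^j.
    """
    if m == 0:
        return n
    acc = n ** (m + 1)
    for j in range(m):
        acc -= (-1) ** (m - j) * comb(m + 1, j) * power_sum(j, n)
    return acc // (m + 1)  # exact: acc equals (m + 1) times the power sum


matches_brute_force = all(
    power_sum(m, n) == sum(k ** m for k in range(1, n + 1))
    for m in range(0, 5)
    for n in range(0, 10)
)
```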
or equivalently if
The study of such “near” division in the integers has many interesting
consequences. Among questions which should be answered about (4:5-1),
(4:5-2) are whether such a representation is always possible for any a
and b ≠ 0, and if so whether the quotient and remainder are uniquely
determined in such a representation. The answers to these questions are
provided by the next theorem.
4.37 Theorem. For any a, b ∈ I with b > 0 there exist q, r ∈ I such that
a = bq + r and 0 ≤ r < b. Further, if q′, r′ ∈ I and a = bq′ + r′,
0 ≤ r′ < b, then q = q′ and r = r′.
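For a concrete feel for the theorem, Python's built-in `divmod` returns, for b > 0, a pair (q, r) with a = bq + r and 0 ≤ r < b, including for negative a. The wrapper `divide` below is a hypothetical name used only in this sketch.

```python
def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for integers a and b > 0."""
    if b <= 0:
        raise ValueError("this sketch assumes b > 0")
    q, r = divmod(a, b)  # floor division, so r is never negative when b > 0
    return q, r


positive_case = divide(7, 3)
negative_case = divide(-7, 3)
```

Note that for a = −7, b = 3 the quotient is −3 and not −2, since the remainder is required to be nonnegative.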
(2) for any a ∈ I there exists q ∈ I with bq ≤ a < b(q + 1).
4.5] MATHEMATICAL PROPERTIES OF THE INTEGERS 133
Suppose that r ≠ r′. Then either r > r′ or r′ > r. In the first case we
write
Since r − r′ > 0 and b > 0 we must also have q′ − q > 0, and hence
q′ − q ≥ 1. It follows that the right-hand side of (4) is ≥ b. On the
other hand, the left-hand side of (4) is ≤ r < b, giving a contradiction.
Similarly, if r′ > r we obtain a contradiction. Thus r = r′ and
b(q′ − q) = 0. But since we have an integral domain and b ≠ 0, we must
have q′ − q = 0, i.e., q = q′.
The divisibility relation and the primes. The cases in which the remainder
after division is 0 are of special interest.
It can be mechanically checked that the first few positive primes are
Often it is only the positive primes which are referred to as primes. How¬
ever, from the algebraic point of view the present definition is more
natural (this will be brought out in our discussion, later in the book, of
divisibility questions for polynomials). We shall prove below that there
are infinitely many positive primes (and hence infinitely many negative
ones). The following useful property of nonprimes is easily verified.
4.41 Lemma. If a ∈ I, a > 1, and a is not prime, then there exist a_1, a_2 ∈ I
with 1 < a_1 < a, 1 < a_2 < a, and a = a_1 · a_2.
Given a number a > 1 which is not prime, one or both of these numbers
a_1, a_2 could be prime. However if, say, a_1 is not prime it can be factored
further as a_1 = a′_1 · a′_2 where both a′_1, a′_2 are greater than 1. We proceed
similarly with a_2, if a_2 is not prime. By continuing this process we expect
that we will reach, in a finite number of steps, a representation
a = p_1 · p_2 · . . . · p_n, where p_1, p_2, . . . , p_n are all positive primes. It is
conceivable that, by performing these computations in a different order,
we would reach a representation a = p′_1 · p′_2 · . . . · p′_m where n ≠ m or
where n = m, but the representations are essentially different in that the
sequence p′_1, p′_2, . . . , p′_m is not a permutation of the sequence p_1, p_2, . . . ,
p_n. This will be shown not to be possible if we can demonstrate that every
prime p has the following property: if p|bc, then p|b or p|c. For from
p_1|(p′_1 · p′_2 · . . . · p′_m) will then follow p_1|p′_1 or p_1|(p′_2 · . . . · p′_m), hence
p_1 = p′_1 or p_1|(p′_2 · . . . · p′_m); by repeating this procedure, we would
eventually conclude that p_1 = p′_1 or p_1 = p′_2 or · · · or p_1 = p′_m.
Then cancelling p_1 from both sides of the equation p_1 · p_2 · . . . · p_n =
p′_1 · p′_2 · . . . · p′_m and repeating the argument for p_2, p_3, . . . , we could
eventually realize that the representations must be the same, except
possibly for the order of the factors. It is the object of the next group of
theorems to make these ideas precise, ending with a proof of the existence
and uniqueness of such representations. The following development,
while not the most direct to gain this end, is more informative and more
readily generalizable to questions of divisibility in other systems.
It follows that if d_1, d_2 are both gcd’s of a and b, then d_1|d_2 and d_2|d_1;
hence d_1 = ±d_2 by 4.39(vii). Further, if d_1 is a gcd of a, b, then so is
−d_1. Evidently “greatest” here does not refer to magnitude in the usual
sense.
These considerations do not yet yield the existence of a gcd for any pair
a, b. This is easily established in certain special cases. If a = 0, then any
number divides a; since b|b this shows that b is a common divisor of a and b.
Moreover, b is obviously a gcd in this case. A gcd also evidently exists in
the case a = b. The nontrivial cases are given by the restriction a ≠ 0,
b ≠ 0, and a ≠ b. For simplicity, let us consider first the case a > b > 0.
The following argument provides a proof of existence of the gcd of a, b
in this case and at the same time gives what is known as the Euclidean
r_{n−2} = r_{n−1} q_{n−1} + r_n, 0 < r_n < r_{n−1}.
Then
r_1 = a − bq,
r_2 = b − r_1 q_1 = b − (a − bq) q_1 = −a q_1 + b(1 + q q_1),
and
r_{k−1} = a x_{k−1} + b y_{k−1},
so we can also write r_k = a x_k + b y_k, with suitable x_k, y_k. It follows that
r_n, which is the positive gcd of a, b, can be written in the form ax + by,
for certain x, y ∈ I. Consider any number ax′ + by′ where x′, y′ are integers.
If c ∈ I and c|a and c|b, then c|(ax′ + by′) by 4.39(viii), (ix). In
particular, r_n|(ax′ + by′). If ax′ + by′ is positive, this implies r_n ≤ ax′ +
by′. Hence the positive gcd of a, b is the least positive number of the
form ax′ + by′. This characterization suggests another proof of the
existence of gcd’s which is slightly more sophisticated than the foregoing
but is in certain respects more informative.
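The successive divisions, together with the bookkeeping that expresses each remainder r_k as a x_k + b y_k, can be sketched in Python. The function name `extended_gcd` is hypothetical, and the sketch assumes a > b > 0 as in the case treated above.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = a*x + b*y, d the positive gcd of a and b.

    Assumes a > b > 0. Throughout the loop, the current pair of remainders
    equals a*x0 + b*y0 and a*x1 + b*y1 for the original a and b.
    """
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


d, x, y = extended_gcd(252, 198)
```

For 252 and 198 the remainders are 54, 36, 18, 0, so the last nonzero remainder, 18, is the gcd.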
We will make use of the following two distinctive properties of the set L
of all linear combinations ax + by, where x, y ∈ I:
It is seen that if L′ is any other set which satisfies (4:5-8)(a), (b) and we
have a, b ∈ L′, then L ⊆ L′. Clearly a = a · 1 + b · 0 and b = a · 0 + b · 1
are in L. Hence L is the smallest set L′ satisfying all these conditions.
Now we can prove a theorem about any set S satisfying the conditions
of (4:5-8) in place of L, which will give us the desired result about L as a
special case.
(1) d ∈ S;
(2) d > 0;
Steps (4) and (6) together give the desired equality. Thus d ∈ S and is
a divisor of every element of S. Suppose that d′ were any other number
with this property for which d′ > 0. Then d|d′ and d′|d; hence d = ±d′
by 4.39(vii). Clearly we cannot have d = −d′, so d = d′.
There is a possibility of confusing this notation for the gcd with that for
the ordered pair. Both these notations are in standard use. The context
will always determine which meaning is intended.
(1) 1 = as + ct
We leave the proof of this to the reader. Although it is now easily obtained,
4.47(iii) establishes the important property of prime numbers which led us
to the consideration of gcd’s. We also leave the following for the reader
to prove.
4.49 Theorem. Suppose that (S, <) is a well-ordered system and that
A ⊆ S. For any a ∈ S, let S_a = {x : x ∈ S and x < a}. Suppose
A has the property that whenever S_a ⊆ A, then a ∈ A; then A = S.
Conversely, it can be shown that any simply ordered system (S, <)
must be well ordered if it has the following property for all A ⊆ S: if
for every a ∈ S, S_a ⊆ A implies a ∈ A, then A = S.
A proof making use of the property 4.49 of well-ordered systems is sometimes
called a course-of-values induction since it refers to the behavior of
all values preceding a given element a. In particular applications it should
be observed that when trying to verify that the condition, S_a ⊆ A implies
a ∈ A, holds for all a ∈ S, one will encounter one case in which the
hypothesis S_a ⊆ A is trivially satisfied, namely when a is the first element
of S, for then S_a = ∅. It may be necessary to give a direct proof that
a ∈ A in this case.
(i) there exists a sequence ⟨p_1, . . . , p_n⟩ of positive primes such that
a = p_1 · . . . · p_n;
(ii) if ⟨q_1, . . . , q_m⟩ is any other sequence of positive primes such that
a = q_1 · . . . · q_m, then n = m and for some permutation F of
{1, . . . , n}, q_k = p_{F(k)} for all k ≤ n.
Assume the hypothesis of (1). We first show that (i) is also true for a.
If a is prime, this is immediate. Otherwise, we have, by 4.42,
By hypothesis, there exist two sequences, ⟨p_1, . . . , p_{n_1}⟩ and ⟨p′_1, . . . , p′_{n_2}⟩,
of positive primes such that
a_1 = ∏_{k=1}^{n_1} p_k, a_2 = ∏_{k=1}^{n_2} p′_k.
Let n = n_1 + n_2 and define p_{n_1+k} for 1 ≤ k ≤ n_2 as p′_k. Then by 4.27(ii)
and 4.28,
a = (∏_{k=1}^{n_1} p_k) · (∏_{k=n_1+1}^{n} p_k) = ∏_{k=1}^{n} p_k.
(5) a n
k= 1
(6) a Pn-
Hence p_n | ∏_{k=1}^{m} q_k. Then by 4.47 there is an i, 1 ≤ i ≤ m, such that
p_n|q_i. But since 1 < p_n, this can only happen when p_n = q_i. Let G be
any permutation of {1, . . . , m} such that G(m) = i. Then by the generalized
commutative law 4.30
ii ?* = n
uv rrc
(7) a =
A:=l fc=l
?G(fc) (if item)• a Pn-
Define
are ≠ i, and k_1 = G(j_1), k_2 = G(j_2) where j_1, j_2 are < n, then
F(k_1) = H(j_1), F(k_2) = H(j_2); hence j_1 = j_2 by one-to-oneness of H,
and then k_1 = k_2. The only other case to consider is k_1 ≠ i, k_2 = i.
But in this case F(k_1) ≠ F(k_2). To conclude the induction we need only
observe from (9) and (10) and the fact q_i = p_n that
From this theorem the following can be obtained without much trouble;
the proof is left to the reader.
4.50 and 4.51 are generally referred to as different specific forms of the
unique factorization theorem for integers. Given a representation of a as in
4.50, we can find all positive divisors d of a very easily, as all products
of subsequences, i.e., sequences ⟨p_{k_1}, . . . , p_{k_m}⟩, where 1 ≤ k_1 < · · ·
< k_m ≤ n. This leads to a convenient way of finding the gcd (a, b) when
representations for a, b are available. (The details will be left to the
exercises.) Further, inspection of the proof of part (i) in 4.50 reveals that
such representations can always be found in a finite number of steps.
This can be done in the following particular routine. Given a > 1, list
all positive primes 2, 3, 5, . . . which are ≤ a, until one comes to the first
which is a divisor of a. Call this prime p_1 and write a = p_1 · a_2. If
a_2 > 1, we repeat this process, to obtain a_2 = p_2 · a_3. Then the sequence
a_1, a_2, . . . of numbers thus obtained (with a_1 = a) must eventually reach 1,
since whenever a_k > 1, we have a_k = p_k · a_{k+1} and hence a_k > a_{k+1}. When
a_n > 1 and a_{n+1} = 1 we have a = p_1 · . . . · p_n.
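The routine just described can be sketched in Python. One simplification is assumed here: instead of listing primes, the sketch searches for the smallest divisor greater than 1 of the current number, which is necessarily prime and therefore coincides with the first prime divisor found by the routine. The name `prime_factors` is hypothetical.

```python
def prime_factors(a):
    """Return [p_1, ..., p_n] with a == p_1 * ... * p_n, for an integer a > 1."""
    if a <= 1:
        raise ValueError("this sketch assumes a > 1")
    factors = []
    while a > 1:
        p = 2
        while a % p != 0:  # the smallest divisor > 1 of a is a prime
            p += 1
        factors.append(p)
        a //= p            # pass from a_k to a_{k+1}, which is strictly smaller
    return factors


factors_of_360 = prime_factors(360)
```

The loop terminates for the reason given in the text: each step replaces a_k by a strictly smaller positive integer.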
(i) a = Σ_{k=0}^{n} c_k b^{n−k};
(iii) c_0 ≠ 0.
(2) q ∈ P.
(3) q < a.
By (2), (3) we can apply the inductive hypothesis to q. Hence there exist
unique m ∈ I, m ≥ 0, and integers ⟨d_0, . . . , d_m⟩ with
(4) (i) q = Σ_{k=0}^{m} d_k b^{m−k};
(iii) d_0 ≠ 0.
(5) a = £ + r.
k=0
Define
(7) (i) a = £
k=0
(iii) c'0 ^ 0.
k=0 k=0
Thus if we define
(9) 5' = E
fc=o
we have
But 0 < c'n, < b, so that by the unicity condition of the division
algorithm,
= (b − 1) Σ_{k=0}^{n} b^k = b^{n+1} − 1.
For any other m with b^m ≤ a < b^{m+1}, if say m < n, we would have
m + 1 ≤ n, hence b^{m+1} ≤ b^n ≤ a, which is a contradiction. Similarly,
we cannot have n < m. Hence n = m.
ni n2
n—A:
«i = X dl’kb
and «2 = X rf2,fcfrn fc,
k—0 k—0
where
d_{1,k} = 0 if 0 ≤ k < n − n_1,
Then
a_1 + a_2 = Σ_{k=0}^{n} (d_{1,k} + d_{2,k}) b^{n−k}.
Let d_k = d_{1,k} + d_{2,k} for 0 ≤ k ≤ n. Then d_0 ≠ 0. However, we do not
necessarily have 0 ≤ d_k < b, but can only conclude that 0 ≤ d_k ≤
2b − 2. This forces us to consider the so-called “carry-over.” For example,
if 0 ≤ d_n < b, then we will take for the nth digit c_n in the expansion of
a_1 + a_2 the number d_n. Otherwise we can write d_n = b + r, where
0 ≤ r < b. In this case we take c_n = r, and carry over 1 to the coefficient
of the b^1 term. As the new coefficient of the b^1 term we have d_{n−1} + 1,
which is < 2b. Let d′_{n−1} be d_{n−1} + 1 or d_{n−1} according as there is or is not
a carry-over. Since d_{n−1} ≤ 2b − 2, we certainly have d′_{n−1} < 2b. Hence
we can now repeat for d′_{n−1} the procedure which we applied to d_n. If
b ≤ d′_{n−1}, we will have d′_{n−1} = b + c_{n−1}, 0 ≤ c_{n−1} < b. Then d′_{n−1} b =
1 · b² + c_{n−1} b, so that we must now carry over 1 to the b²-term.
Continuing in this manner we will eventually reach the representation
for a_1 + a_2. The student should analyze in detail the algebraic laws which
make this procedure possible, and should carry out a similar analysis for
multiplication.
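The representation to the base b and the carry-over procedure for addition can be sketched in Python. The helpers `to_digits`, `from_digits`, and `add_digits` are hypothetical names; digits are stored most significant first, matching the form Σ c_k b^{n−k}, and the sketch assumes a > 0 and b > 1.

```python
def to_digits(a, b):
    """Digits <c_0, ..., c_n> of a > 0 to the base b > 1, most significant first."""
    digits = []
    while a > 0:
        a, r = divmod(a, b)  # repeated use of the division algorithm
        digits.append(r)
    return digits[::-1]


def from_digits(digits, b):
    """Evaluate the sum of c_k * b^(n-k) by Horner's scheme."""
    value = 0
    for c in digits:
        value = value * b + c
    return value


def add_digits(d1, d2, b):
    """Add two digit sequences to the base b, carrying 1 whenever a column reaches b."""
    n = max(len(d1), len(d2))
    d1 = [0] * (n - len(d1)) + d1  # pad the shorter sequence with leading zeros
    d2 = [0] * (n - len(d2)) + d2
    out, carry = [], 0
    for x, y in zip(reversed(d1), reversed(d2)):
        carry, digit = divmod(x + y + carry, b)  # column total is below 2b, so carry is 0 or 1
        out.append(digit)
    if carry:
        out.append(carry)
    return out[::-1]


thirteen = to_digits(13, 2)
twenty = add_digits(to_digits(13, 2), to_digits(7, 2), 2)
```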
Although we have not yet considered any integral domains besides the
integers, it should be remarked that a number of our notions and results
about the integers can be extended to a wide variety of other domains.
This is true of the division algorithm, the notion of divisibility, existence
of gcd, notion of prime, and representation as products of primes. Since
all our proofs made use of the well-ordering of the positive part of the
integers, some variants of these arguments must be made in order to carry
4.6] CONGRUENCE RELATIONS IN THE INTEGERS 147
these theorems over to other domains. We shall consider this in only one
case later on in the book, where we take up divisibility questions concern¬
ing the polynomials. There we will be able to use induction on the degree
of the polynomials to take the place of the induction arguments in the
integers. In contrast to these theorems, the general representation theorem
4.53 is peculiar to the integers. For if every positive element of an ordered
domain could be represented to the base 2 (i.e., 1 + 1) of the domain, the
set of positive elements would be isomorphic to P, and the domain itself
would be isomorphic to the integers.
and any congruence relation ≡ for this system the corresponding system
(4:6-3) a = [a]
Now we shall see that congruence relations give essentially all the
homomorphic images of a system. For suppose that G is a homomorphic
mapping of (S, F_1, . . .) onto (S′, F′_1, . . .). Define the relation ≡ by
(4:6-5). (It may be observed that if G is one-to-one, i.e., is an isomorphism,
then ≡ is just the relation = of identity.) Then (4:6-5) shows that ≡ is
an equivalence relation in S. Suppose that x_1 ≡ y_1, x_2 ≡ y_2, that is,
G(x_1) = G(y_1), G(x_2) = G(y_2). If W_1 is, say, binary, then by the condition
of the homomorphism (x_1, x_2) ∈ W_1 if and only if (G(x_1), G(x_2)) ∈ W′_1
and (y_1, y_2) ∈ W_1 if and only if (G(y_1), G(y_2)) ∈ W′_1. But G(x_1) = G(y_1),
G(x_2) = G(y_2), so (x_1, x_2) ∈ W_1 if and only if (y_1, y_2) ∈ W_1. Further,
if F_1 is, say, binary, then we claim F_1(x_1, x_2) ≡ F_1(y_1, y_2), i.e., that
G(F_1(x_1, x_2)) = G(F_1(y_1, y_2)). This follows from the fact that
for every x ∈ S and which will provide the desired isomorphism. (To see
that (4:6-7) uniquely determines H, it is only necessary to check that if
G(x) = G(y), then [x] = [y].) Thus by means of congruence relations in
the original system (S, F_1, . . .) we obtain every homomorphic image, at
least up to isomorphism. The choice of whether, in a given discussion, we
should speak about congruence relations or about homomorphisms is thus
a matter of convenience.
Here we have passed from each line to the next by using the properties
of G as a homomorphism, except when going from the third to the fourth
line, where we used the assumption that F_1(F_2(x), F_1(a_2, x)) = F_2(a_1)
is true in the original system.
Incidentally, we can now expand on our remarks in Chapter 2 in con¬
nection with properties shared by a system and its subsystems. It is easily
seen that every equation which is true in a system is also true in every one of
its subsystems. This answers part of the question: what algebraic properties
are preserved when we pass from a system to any of its subsystems?
The complete answer to this question is also known by recent results
in logic.
In contrast to the above, an inequality is not in general preserved by
homomorphism. For we may have particular elements of S, say a_1, a_2,
such that a_1 ≠ a_2, but G(a_1) = G(a_2), since G is not required to be
one-to-one. An inequality will be preserved when passing to subsystems, but
other, slightly more complicated, properties will fail to be preserved, for
example, “existential” properties (cf. below).
The question of finding a large class of properties of relations preserved
by homomorphism or by passing to subsystems is, from the logical point
of view, no more difficult than for those involving functions and constants,
but it would be even more difficult for us to explain in the present frame¬
work. The reader can check that various properties of a system (S, W),
where W is binary, such as W being reflexive, symmetric, transitive in S,
are preserved under homomorphism or by passing to subsystems. Also
the property of being simply ordered is inherited by subsystems. We can
ask the same question with respect to homomorphism. However, there
are no nontrivial homomorphisms of ordered systems, so that this question
loses interest. For suppose that (S, <) is a simply ordered system and
that ≡ is a congruence relation in this system. Then for any x_1, x_2, y_1, y_2,
if x_1 ≡ y_1, x_2 ≡ y_2, and x_1 < x_2, then y_1 < y_2. If the corresponding
homomorphism G is not trivial, i.e., is not an isomorphism, then x_1 ≡ y_1
for some x_1, y_1 where x_1 ≠ y_1. But then x_1 < y_1 or y_1 < x_1. Suppose
the former. Apply our condition to x_1 ≡ y_1, y_1 ≡ y_1; then from x_1 < y_1
we would have to conclude that y_1 < y_1. We similarly reach a contradiction
if y_1 < x_1. Thus ≡ must be the identity relation in this case. As
we shall see in a moment, we can construct many nontrivial homomor¬
phisms of the system (I, +, •, 0, 1). The above shows that the congruence
relations corresponding to these homomorphisms cannot also be a congru¬
ence relation with respect to <. (Because of such results as the above, the
notion of homomorphism for systems with relations only is sometimes
defined in a slightly different way than is done here. Under this other
definition, ordered systems can have nontrivial homomorphic images which
are again ordered.)
i.e., x' +' G(u) = y'. Thus u' = G(u) is a suitable choice.
4.57 Theorem.
We leave the proof of this to the reader. The sets [0]_m, [1]_m, . . . ,
[m − 1]_m are also often called the congruence classes mod m. We have,
for example, for m = 3,
4.58 Theorem. The relations ≡_m, for m ∈ I, are the only congruence relations
for the system (I, +, ·, 0, 1).
(1) S = {u : u ≡ 0},
since
(3) S ⊆ I, S ≠ ∅;
(5) if u, v ∈ S, then u + v ∈ S.
In other words, S is a nonempty subset of I closed under linear combinations.
The properties (4), (5) follow directly from the assumption that ≡
is a congruence relation. By Theorem 4.43,
For m = 4 we have the four congruence classes, [0]_4, [1]_4, [2]_4, [3]_4,
which we also (as is common practice, but with slight danger of ambiguity)
denote by 0, 1, 2, 3. We then have the following tables for +
and · (mod 4).
+ | 0 1 2 3      · | 0 1 2 3
--+--------      --+--------
0 | 0 1 2 3      0 | 0 0 0 0
1 | 1 2 3 0      1 | 0 1 2 3
2 | 2 3 0 1      2 | 0 2 0 2
3 | 3 0 1 2      3 | 0 3 2 1
Note that in this last we have [2]_4 · [3]_4 = [2]_4 · [1]_4, but [3]_4 ≠ [1]_4; i.e.,
the cancellation law for multiplication does not hold in this system, hence
it is not an integral domain.
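The two tables can be regenerated mechanically. In the Python sketch below each congruence class [i]_m is represented by its least nonnegative representative i, which is an assumption of the sketch; the name `tables` is hypothetical.

```python
def tables(m):
    """Return the addition and multiplication tables mod m as lists of rows."""
    add = [[(i + j) % m for j in range(m)] for i in range(m)]
    mul = [[(i * j) % m for j in range(m)] for i in range(m)]
    return add, mul


add4, mul4 = tables(4)
# Failure of cancellation: 2 * 3 and 2 * 1 fall in the same class mod 4.
cancellation_fails = mul4[2][3] == mul4[2][1] and 3 != 1
```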
4.60 Theorem. Let m ∈ I, m > 1. Then (I_m, +_m, ·_m, [0]_m, [1]_m) is a
commutative ring with unity. It is an integral domain if and only if
m is a prime.
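The second assertion of the theorem can be spot-checked for small m by searching for zero divisors among the nonzero classes. This is only a finite experiment under the same representation of classes by 0, . . . , m − 1; the names `has_zero_divisors` and `is_prime` are hypothetical.

```python
def has_zero_divisors(m):
    """True if some nonzero classes x, y satisfy x * y == 0 (mod m)."""
    return any((x * y) % m == 0 for x in range(1, m) for y in range(1, m))


def is_prime(m):
    """Naive primality test by trial division, adequate for small m."""
    return m > 1 and all(m % d != 0 for d in range(2, m))


agrees_with_theorem = all(
    has_zero_divisors(m) == (not is_prime(m)) for m in range(2, 40)
)
```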
Proof. The first part is immediate from 4.55 and the fact that
[0]_m ≠ [1]_m (otherwise 1 ≡ 0 (mod m), that is, m|1). Suppose that m is
prime and that [x]_m ·_m [y]_m = [0]_m, i.e., by 4.59, [x · y]_m = [0]_m, so
x · y ≡ 0 (mod m). Then m|(x · y). By 4.47(ii), m|x or m|y, hence x ≡ 0
(mod m) or y ≡ 0 (mod m), so [x]_m = [0]_m or [y]_m = [0]_m. Conversely,
suppose that the system is an integral domain. If m is not prime, we can
write m = a·b where 0 < a < m, 0 < b < m (4.41). Then [0]_m =
Actually, the systems I_m, for m a prime, have an even more interesting
property, which we shall take up in the next chapter.
(4:6-9) if (x, y, z) is a primitive solution with x even, y odd, then for some
u, v ∈ P, x = 2uv, y = u² − v², z = u² + v², (u, v) = 1,
u > v, and u, v have opposite parity.
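The parametrization in (4:6-9) is easy to try out. The Python sketch below builds (x, y, z) from admissible u, v and is meant only as an illustration that such values do satisfy x² + y² = z²; the name `primitive_triple` is hypothetical.

```python
from math import gcd


def primitive_triple(u, v):
    """Return (2uv, u^2 - v^2, u^2 + v^2) for u > v > 0 coprime of opposite parity."""
    if not (u > v > 0 and gcd(u, v) == 1 and (u - v) % 2 == 1):
        raise ValueError("need u > v > 0, (u, v) = 1, and opposite parity")
    return 2 * u * v, u * u - v * v, u * u + v * v


smallest = primitive_triple(2, 1)
another = primitive_triple(3, 2)
```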
that any u, v with u > v thus lead to a solution. (4:6—9), together with
POLYNOMIALS
(i) 𝒟(f) = D;
(ii) for a certain sequence ⟨a_0, a_1, . . . , a_n⟩ of elements of D, where
a_n ≠ 0 if n > 0, and all x ∈ D,
f(x) = Σ_{i=0}^{n} a_i x^i.
Here ⟨a_0, a_1, . . . , a_n⟩ is called the sequence of coefficients of f.
For if a_i ≠ 0 for some i, consider the largest such i ≤ n; call it k. Thus
a_i = 0 for k < i ≤ n. If k = 0, presuming (as is natural) ξ^0 = 1, we
have a_0 · 1 = 0 also. If k > 0, we apply (5:1-1) with k instead of n and
m = 0, b_0 = 0, giving a contradiction. Conversely, it can be seen that
(5:1-2) implies (5:1-1). For example, if n > m, we define b_i = 0,
for m < i ≤ n (if necessary). Then Σ_{i=0}^{n} a_i ξ^i = Σ_{i=0}^{n} b_i ξ^i, hence
Σ_{i=0}^{n} (a_i − b_i) ξ^i = 0. Application of (5:1-2) then gives the desired
result. Of course, all these manipulations with polynomial forms implicitly
160 POLYNOMIALS [CHAP. 5
Proof.
Z «-f‘ =
;=o
t *<?> £ ^ i
i—0 i—0
=
i=0
and
2=0
£ 2=0
= £ (a# + biP),
i=0
by 3.45(i),
(5:1-3)
where, in particular,
In fact, this follows from 5.4(ii). We extend the given finite sequences
⟨a_0, . . . , a_n⟩, ⟨b_0, . . . , b_m⟩ to essentially infinite sequences by making
a_i = 0 for i > n, b_i = 0 for i > m. Then we see that if i > n + m
then d_i = Σ_{j=0}^{i} a_j b_{i−j} = 0, for if 0 ≤ j ≤ i, either n < j or m < i − j,
so that in any case a_j b_{i−j} = 0. On the other hand,
n+rre
Proof. If the two sums are equal, then Σ_{i=0}^{∞} a_i ξ^i + (−1) Σ_{i=0}^{∞} b_i ξ^i = 0.
Applying 5.4(ii) to this special case shows that (−1) Σ_{i=0}^{∞} b_i ξ^i =
Σ_{i=0}^{∞} (−b_i) ξ^i. Then by 5.4(i) Σ_{i=0}^{∞} (a_i − b_i) ξ^i = 0; furthermore, there
is an n with a_i − b_i = 0 and hence a_i = b_i for all i > n. It follows
that Σ_{i=0}^{n} (a_i − b_i) ξ^i = Σ_{i=0}^{∞} (a_i − b_i) ξ^i = 0. Then by our definition
5.3 of transcendental extension, also (a_i − b_i) = 0 for all i ≤ n.
Hence a_i = b_i for all i.
5.6 Theorem. Suppose that E = D[ξ], E′ = D[ξ′] are two simple transcendental
extensions of D. Then E ≅ E′. We can choose the isomorphic
mapping F so that F(a) = a for each a ∈ D and F(ξ) = ξ′.
(1) v = eqf\
i=0
because the same rules 5.4(i), (ii) for calculating +, · on infinite sums hold
in E as hold for ⊕, ∘ in E′. For example, in the first case we have
(5) F(a) = a,
in particular
Thus our theorem is proved. It can be seen that the conditions that F
be an isomorphism satisfying (5) and (7) in fact uniquely determine
F(η) for all η ∈ E to be given as in (2).
Having the unicity of simple transcendental extensions up to ≅, the
only thing we need prove now, in order to make these play the same role
as symbolic polynomial expressions, is an existence theorem. The answer
to the question as to what should serve as the elements of such an extension
is suggested immediately by 5.5: ordinary infinite sequences
⟨a_0, . . . , a_i, . . .⟩, ⟨b_0, . . . , b_i, . . .⟩ are objects such that if ⟨a_0, . . . , a_i, . . .⟩ =
⟨b_0, . . . , b_i, . . .⟩ then a_i = b_i for all i. A definition of sum and product
of two such sequences is simply obtained by imitating 5.4(i), (ii). Finally,
we can identify each a ∈ D with the sequence ⟨a, 0, 0, . . . , 0, . . .⟩ and ξ
with the sequence ⟨0, 1, 0, . . . , 0, . . .⟩.
5.7 Theorem. For each integral domain D there exists a simple transcendental
extension E = D[ξ].
d_i = Σ_{j=0}^{i} a_j b_{i−j} for each i.
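The rule d_i = Σ_{j=0}^{i} a_j b_{i−j} for the product of two coefficient sequences can be sketched in Python. Finite lists stand in for sequences that are 0 from some point on, with index i holding the coefficient of ξ^i; this representation and the names `poly_add` and `poly_mul` are assumptions of the sketch.

```python
def poly_add(a, b):
    """Termwise sum of two coefficient lists, padding the shorter with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Product with d_i = sum over j <= i of a_j * b_(i-j); missing entries count as 0."""
    if not a or not b:
        return []
    d = []
    for i in range(len(a) + len(b) - 1):
        d.append(sum(a[j] * b[i - j]
                     for j in range(i + 1)
                     if j < len(a) and i - j < len(b)))
    return d


xi = [0, 1]                          # the sequence <0, 1, 0, ...>
product = poly_mul([1, 1], [1, -1])  # (1 + xi)(1 - xi)
```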
The basic laws 4.1(i), (ii), (iv), (vi) and the first half of (iii), for a commutative
ring with unity, are readily verified. Consider the remaining
conditions, the associative law for · and the (left) distributive law for ·
over +. Given ⟨a_0, . . . , a_i, . . .⟩, ⟨b_0, . . . , b_i, . . .⟩, ⟨c_0, . . . , c_i, . . .⟩ ∈ E,
their product, associating to the right, is a sequence whose kth term is
Consider any k. The terms of the first sum giving the kth term are thus
a_i b_j c_{k−i−j} where (i, j) is a pair for which i ≤ k, j ≤ k − i. The terms
of the second are a_t b_{s−t} c_{k−s} where (s, t) is a pair for which s ≤ k, t ≤ s.
There is a one-to-one correspondence between these pairs (i, j) and (s, t)
under which a_i b_j c_{k−i−j} = a_t b_{s−t} c_{k−s}; namely, set t = i and s = i + j
when given (i, j), or equivalently, i = t and j = s − t when given (s, t).
Hence the two sequences have identical kth terms for each k. Less
precisely, as following 5.4, we can think of the first sum as
i\ +(i2-\-i^)=k
(i\ 4-^2)-\-i%=k
We leave the proof of the distributive law to the reader. To complete the
5.1] POLYNOMIAL FUNCTIONS AND POLYNOMIAL FORMS 165
where we now use (7), (8), and (2) (say, by induction on n). It follows
immediately from (9) that
£ «*(£)* = 0
;=o
Theorems 5.6 and 5.7 thus provide us with existence and uniqueness,
up to isomorphism, of a simple transcendental extension D[£] of any
domain D. For the purposes of algebra, it makes no difference how any
particular such extension is chosen.
We shall very often have to compare the behavior of the same polynomial
in several integral domains. If D and E form integral domains with D
a subdomain of E we would expect that the polynomials over D can be
regarded as polynomials over E. Indeed if we form a simple transcendental
extension E[ξ], it is clear that the set of all polynomial forms Σ_{i=0}^{n} a_i ξ^i
in E[ξ] such that a_0, . . . , a_n ∈ D is itself a simple transcendental extension
D[ξ] of D, and hence ≅ to the extension chosen by the convention
5.8. For simplicity, since there are only a specific number of integral
domains that we shall have to compare in this way, we can assume that
the £ used in all cases is the same, so that we have not only D[£] ^ D[£],
but in fact £ = £, hence D[£] = D[£],
We now return to the relationship of polynomial forms to polynomial
functions.
(i) deg (f(ξ) + g(ξ)) ≤ max (deg (f(ξ)), deg (g(ξ)));
(ii) if f(ξ) ≠ 0 and g(ξ) ≠ 0 then deg (f(ξ) · g(ξ)) = deg (f(ξ)) +
deg (g(ξ)).
The proof of this is left to the reader. We can have < in (i), for example,
with f(ξ) = 1 + ξ, g(ξ) = 1 − ξ. Also the hypotheses in (ii) can
obviously not be eliminated in general.
5.12 Definition. Given f(ξ), g(ξ) ∈ D[ξ], we say that g(ξ) divides f(ξ) in
D[ξ], in symbols g(ξ)|f(ξ), if for some h(ξ) ∈ D[ξ], f(ξ) = g(ξ)h(ξ).
It turns out that the relation | between elements of D[ξ] has surprisingly
many properties in common with the corresponding relation in I. However
it is technically simpler to describe this in the case that division is
trivial in D, or as we shall say in the next chapter, in the case that D is a
field. For the moment, we consider only a simple special case, namely
when $g(\xi) = (-a) + \xi$ or, as is more usually written, $\xi - a$.
5.13 Theorem. Suppose that $f(\xi) \in D[\xi]$, $f(\xi) \neq 0$ and suppose
that $a \in D$. Then $(\xi - a) \mid f(\xi)$ if and only if $f(a) = 0$.
$$= \sum_{i=0}^{n} b_i(\xi^i - a^i) = \sum_{i=1}^{n} b_i(\xi - a)\Bigl(\sum_{j=0}^{i-1} \xi^{j} a^{\,i-1-j}\Bigr)$$
[the last by standard algebra (cf. 4.33)]. Using the general distributive
law we can factor $(\xi - a)$ out of the sum to give the desired result.
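As a hedged illustration of Theorem 5.13 over the integers, the following sketch divides a polynomial by $\xi - a$ using Horner's scheme and checks that the remainder is $f(a)$. The function name `divide_by_linear` and the representation of a polynomial as a list of coefficients (constant term first) are assumptions made for this example, not notation from the text.

```python
def divide_by_linear(coeffs, a):
    """Divide f(xi) = sum(coeffs[i] * xi**i) by (xi - a).

    Returns (quotient_coeffs, remainder); the remainder equals f(a).
    Coefficients are listed from the constant term upward.
    """
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    # Horner's scheme, working down from the leading coefficient.
    for i in range(len(coeffs) - 1, 0, -1):
        carry = coeffs[i] + carry * a
        quotient[i - 1] = carry
    remainder = coeffs[0] + carry * a
    return quotient, remainder


# f(xi) = xi^3 - 6 xi^2 + 11 xi - 6 = (xi - 1)(xi - 2)(xi - 3)
f = [-6, 11, -6, 1]
q2, r2 = divide_by_linear(f, 2)   # 2 is a root, so the remainder is 0
q5, r5 = divide_by_linear(f, 5)   # 5 is not a root, so the remainder is f(5)
```

Here `r2` is 0 and `q2` holds the coefficients of $\xi^2 - 4\xi + 3$, so $(\xi - 2)$ divides $f(\xi)$; for $a = 5$ the remainder is nonzero and no such factorization exists.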
5.16 Definition. Given $f(\xi) \in D[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$,
we define the formal derivative $f'(\xi)$, also written $(f(\xi))'$, by
$$f'(\xi) = \sum_{i=1}^{n} i\,a_i\,\xi^{i-1}$$
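The formal derivative is a purely algebraic operation on coefficients, which a short sketch can make concrete. The function name `formal_derivative`, the coefficient-list representation (constant term first), and the optional `modulus` argument for working in the integers modulo a prime are all assumptions introduced for illustration.

```python
def formal_derivative(coeffs, modulus=None):
    """Formal derivative of sum(coeffs[i] * xi**i).

    The i-th coefficient is multiplied by i and shifted down one place.
    If `modulus` is given, coefficients are reduced modulo it.
    """
    derived = [i * c for i, c in enumerate(coeffs)][1:]
    if modulus is not None:
        derived = [c % modulus for c in derived]
    return derived


# Over the integers: (1 + 3 xi + 5 xi^3)' = 3 + 15 xi^2
d_int = formal_derivative([1, 3, 0, 5])

# Modulo 5: (xi^5 - 2)' has every coefficient equal to 0
d_mod = formal_derivative([-2, 0, 0, 0, 0, 1], modulus=5)
```

The second call shows that a nonconstant polynomial can have a formal derivative whose coefficients all vanish when the coefficients are taken modulo a prime.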
1. Prove 5.4(ii).
2. Prove that the left distributive law for + over • holds for the system
E defined in 5.7.
3. Prove Lemma 5.11.
4. Let $\bar{D}$ be the collection of all polynomial functions f with
coefficients in D. For any $f, g \in \bar{D}$ define $f \mathbin{\bar{+}} g$
to be the function $h_1$ with $h_1(x) = f(x) + g(x)$ for all $x \in D$, and
$f \mathbin{\bar{\cdot}} g$ to be the function $h_2$ with
$h_2(x) = f(x) \cdot g(x)$ for all $x \in D$. For any $a \in D$, define
$\bar{a}$ to be the constant function $f \in \bar{D}$ with $f(x) = a$ for
$x \in D$. Define a function F on $D[\xi]$ as follows: for each
$f(\xi) \in D[\xi]$, $F(f(\xi))$ is the polynomial function f with
domain D associated with $f(\xi)$ by 5.9(i). Prove the following:
(a) $\bar{D}$ is closed under $\bar{+}$, $\bar{\cdot}$, and $\bar{a} \in \bar{D}$ for each $a \in D$.
(b) F is a homomorphic mapping of $(D[\xi], +, \cdot, 0, 1)$ onto $(\bar{D}, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$.
(c) If D is infinite, the mapping F is an isomorphism.
Where have we implicitly used part of (b) in the proof of 5.13?
where $a_0, a_1, \ldots, a_5 \in D$. It may at first sight seem that a new notion
is involved here. However, if we write
$$f(\xi_1, \xi_2) = \sum_{i=0}^{m}\Bigl(\sum_{j=0}^{n} a_{ij}\,\xi_1^{i}\xi_2^{j}\Bigr) = \sum_{i=0}^{m}\Bigl(\sum_{j=0}^{n} a_{ij}\,\xi_2^{j}\Bigr)\xi_1^{i}$$
5.20 Theorem. Suppose that k > 0 and that $D[\xi_1, \ldots, \xi_k]$ is a k-fold
transcendental extension of D. Suppose that H is any permutation
of $\{1, \ldots, k\}$. Then
Proof. This can be seen from 5.18, using the fact (Exercise 2 of Exercise
Group 4.4) that every permutation is a product of transpositions, in
particular of simple interchanges of successive elements.
Next, in generalization of 5.6, we have the following.
To justify and generalize the use of such notation as $f(\xi_1, \xi_2)$ in 5.18, we
make the following definition.
(ii) In particular, if $D[\xi_1, \ldots, \xi_k] \subseteq E$ then the value of the function
f associated with any polynomial y is determined at $\xi_1, \ldots, \xi_k$,
and we have $f(\xi_1, \ldots, \xi_k) = y$.
(iii) $(x_1, \ldots, x_k)$ is said to be a root of the polynomial $f(\xi_1, \ldots, \xi_k)$
in D if $x_1, \ldots, x_k \in D$ and $f(x_1, \ldots, x_k) = 0$.
for any $x_1, \ldots, x_{k-1}, x_k$. To make the distinction clear, we always
explicitly mention which kind of polynomial we are dealing with, for example,
by such notations as $f(\xi_1, \ldots, \xi_k)$, $g(\xi_1, \ldots, \xi_{k-1})$, etc.; in the use of
$f(\xi_1, \ldots, \xi_k)$, we do not exclude the possibility that this polynomial does
not depend on all of $\xi_1, \ldots, \xi_k$.
As one special case of 5.20, we see that for each $l \le k$, any polynomial
$f(\xi_1, \ldots, \xi_k)$ can be written as a polynomial in $\xi_l$ by
we have
n
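Then for any $f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ we can find $n \ge 0$, and
for each $i \le n$ and $(j_1, \ldots, j_k) \in S_i$, elements $a_{j_1 \ldots j_k} \in D$ such that
$$f(\xi_1, \ldots, \xi_k) = \sum_{i=0}^{n}\Bigl(\sum_{(j_1, \ldots, j_k) \in S_i} a_{j_1 \ldots j_k}\,\xi_1^{j_1} \cdots \xi_k^{j_k}\Bigr)$$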
m /
m, ...,&)=£( I ./* # • • • &
i=o \(ii.jfc)est-
5.25 Definition. Suppose that $f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$. We say
that $f(\xi_1, \ldots, \xi_k)$ has degree n in $\xi_1, \ldots, \xi_k$, and
/(ii, • • •, h) E D
we see that
and
63 = — (XiX2X3).
5.26 Definition. (i) We call a polynomial $f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$
symmetric in $\xi_1, \ldots, \xi_k$ if for any permutation H of $\{1, \ldots, k\}$ we
have $f(\xi_1, \ldots, \xi_k) = f(\xi_{H(1)}, \ldots, \xi_{H(k)})$.
(ii) Let $0 < n \le k$. By the nth elementary symmetric polynomial in
$\xi_1, \ldots, \xi_k$ we mean
$$\sigma_n(\xi_1, \ldots, \xi_k) = \sum_{1 \le l_1 < l_2 < \cdots < l_n \le k} \xi_{l_1} \cdots \xi_{l_n},$$
= ij n ^ = s n
XGM„ Z'GH(X) I£M„ i'GF
Proof. We leave the proof of (i) to the reader. In (ii) it is seen from the
notation that we are dealing with the symmetric polynomial of degree n
in $\xi_1, \ldots, \xi_k$ on the left-hand side and with the corresponding polynomial
in $\xi_1, \ldots, \xi_{k-1}$ on the right. The proof of (ii) can be seen in several ways.
The first would be by direct inspection of the definition 5.26(ii), or its
alternative form (5:2-1). From the latter we see that
where $X \in M_n'$ if and only if $X = \{l_1, l_2, \ldots, l_n\}$ with
$1 \le l_1 < l_2 < \cdots < l_n \le k - 1$, since all $X \in M_n$ such that $k \in X$
contribute 0 to the sum. Then the conclusion is immediate from (1). Another
argument is as follows. Consider a (k + 1)-fold transcendental extension
$D[\xi, \xi_1, \ldots, \xi_k]$, and the k-fold transcendental extension
$D[\xi, \xi_1, \ldots, \xi_{k-1}]$. Applying (i) of this theorem to these gives
$$5\xi_1 + 5\xi_2 + 5\xi_3 + \xi_1^2 + \xi_2^2 + \xi_3^2 = 5\sigma_1(\xi_1, \xi_2, \xi_3) + \sigma_1(\xi_1, \xi_2, \xi_3)^2 - 2\sigma_2(\xi_1, \xi_2, \xi_3) = g(\sigma_1(\xi_1, \xi_2, \xi_3), \sigma_2(\xi_1, \xi_2, \xi_3)),$$
where $g(\xi_1, \xi_2) = 5\xi_1 + \xi_1^2 - 2\xi_2$. We can also write this as a polynomial
$h(\sigma_1(\xi_1, \xi_2, \xi_3), \sigma_2(\xi_1, \xi_2, \xi_3), \sigma_3(\xi_1, \xi_2, \xi_3))$ in all the (nontrivial)
symmetric polynomials in $\xi_1, \xi_2, \xi_3$ by taking
$h(\xi_1, \xi_2, \xi_3) = 5\xi_1 + \xi_1^2 - 2\xi_2 + 0 \cdot \xi_3$. The general statement here is the following.
Note that each <n is homogeneous of degree 7, so that for any j (and no
matter what A is),
[by Exercise 2(b) below]. In the terms of (1), the statement of our theorem
takes the following form:
we see that
Note that $f_0(\xi_1, \ldots, \xi_{k-1})$ is just the first coefficient in the
representation of $f(\xi_1, \ldots, \xi_k)$ as a polynomial in $\xi_k$. Since
$f_0(\xi_1, \ldots, \xi_{k-1})$ is only symmetric in $\xi_1, \ldots, \xi_{k-1}$, we cannot
expect that $f(\xi_1, \ldots, \xi_k) - f_0(\xi_1, \ldots, \xi_{k-1})$ is symmetric in
$\xi_1, \ldots, \xi_k$. However, the representation
(7) suggests that we consider the following closely related difference.
(8) Let $f_1(\xi_1, \ldots, \xi_k) = f(\xi_1, \ldots, \xi_k) - g_0(\sigma_1, \ldots, \sigma_{k-1})$. Then
$f_1(\xi_1, \ldots, \xi_k)$ is symmetric in $\xi_1, \ldots, \xi_k$ and
\.• • • ^-ib
Then
/l(£l> • • • j £fc) = ■ ■ ■ , %H(k))
= kk—\hk{^H(l), ■ ■ ■ , ^H(k-l), £hOc)),
(11) We can find $h(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ such that
$f_1(\xi_1, \ldots, \xi_k) = \sigma_k \cdot h(\xi_1, \ldots, \xi_k)$. $h(\xi_1, \ldots, \xi_k)$ is
symmetric in $\xi_1, \ldots, \xi_k$ and $\deg(h(\xi_1, \ldots, \xi_k)) \le n - k$.
5.2] POLYNOMIALS IN SEVERAL VARIABLES 181
The second part of this conclusion is seen from (8) and deg (crk) = k.
Also we must have symmetry because for any permutation H of {1, . . . , k},
We can now conclude the secondary induction step, for by (8) we have
(a) $\deg(f(\xi_1, \ldots, \xi_k) + g(\xi_1, \ldots, \xi_k)) \le \max(\deg(f(\xi_1, \ldots, \xi_k)), \deg(g(\xi_1, \ldots, \xi_k)))$;
(b) if $f(\xi_1, \ldots, \xi_k) \neq 0$ and $g(\xi_1, \ldots, \xi_k) \neq 0$ then
3. (a) Show that 5.14 does not generalize to polynomials in several variables
by giving an example of $f(\xi_1, \xi_2) \in I[\xi_1, \xi_2]$ such that $f(\xi_1, \xi_2) \neq 0$
but $f(\xi_1, \xi_2)$ has infinitely many roots in I.
(b) Show that, nevertheless, 5.15 does generalize to polynomials in several
variables (even though 5.14 is essential to the proof of 5.15): if D is
infinite, $f(\xi_1, \ldots, \xi_k), g(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ and
then
5. Represent £?G + £iG + £lG + £lG + dsG + £§G as g/op, cr2, 0-3),
where we write 07 for <rz(G, G> G)-
6. Let A;, m be arbitrary positive integers and set 7m = £y=1 Put
To = m. Show that
£ (-l)X_i<r, = 0,
;=0
where n = min (k, m), and ay is oy(G> • • • , G)- Use this to represent
£j=i £y as a polynomial in the elementary symmetric polynomials in
G> G) G-
7. Verify that
where $a_1, a_2, b_1, b_2, c_1, c_2$ are given elements of some integral domain D.
If we multiply the first equation by $b_2$, the second by $b_1$, and then
subtract one equation from the other we see that any $x \in D$ for which there
exists a $y \in D$ satisfying (6:1-1) must also satisfy
Similarly we reach a condition which y must satisfy. Both these new equa¬
tions have the general form
(6:1-3) bx = a.
may well be that $b \neq 0$ and yet there is no solution at all to (6:1-3) (as
9
One of the main objects of this chapter will be to show that given any
integral domain D, we can construct another domain K such that D forms
a subdomain of K and K satisfies (6:1-4) and (6:1-5) or, equivalently, as
we have just seen, (6:1-6) and (6:1-5). Further we shall see that such a
K is uniquely determined up to isomorphism. In particular, any system
thus associated with the integers will serve the purpose that we have in
mind for the rational numbers. Thus we can speak of the quotient a/b of
two integers a, b ($b \neq 0$) in an algebraically well-defined and consistent
sense, as being the result of division in such an extension.
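(6:1-7) (i) $\dfrac{a}{b} = \dfrac{c}{b}$ if and only if $a = c$;
(ii) $\dfrac{a}{b} = \dfrac{a \cdot c}{b \cdot c}$;
(iii) $\dfrac{a}{b} + \dfrac{c}{b} = \dfrac{a + c}{b}$;
(iv) $\dfrac{a}{b} < \dfrac{c}{b}$ if and only if $a < c$.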
The first of these is intuitively evident. The second is seen to hold by
considering a further subdivision of the smaller unit (which makes up the given
unit of length b times) into c equal parts. The third condition should hold
for any operation + on lengths which, if $l_1$ is the length of a given segment
$P_1P_2$ and $l_2$ is the length of $P_2P_3$, where $P_2$ is between $P_1$ and $P_3$ on a
straight line, is to give $l_1 + l_2$ as the length of $P_1P_3$. The last condition
should hold if < is to be a relation between lengths such that if $l_1, l_2$
are lengths then $l_1 < l_2$ if and only if there are points $P_1, P_2, P_3$ on a
straight line with $P_2$ between $P_1$ and $P_3$, $l_1$ is the length of $P_1P_2$, and $l_2$
is the length of $P_1P_3$. It follows from (6:1-7) that (for $a, b, c, d \in P$)
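(6:1-8) (i) $\dfrac{a}{b} = \dfrac{c}{d}$ if and only if $ad = bc$;
(ii) $\dfrac{a}{b} + \dfrac{c}{d} = \dfrac{ad + bc}{bd}$;
(iii) $\dfrac{a}{b} < \dfrac{c}{d}$ if and only if $ad < bc$.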
For if a/b = c/d, then ad/bd = bc/bd by two applications of (6:1-7)(ii);
then ad = bc by (6:1-7)(i). Since these steps can be retraced by the same
conditions, we see that the equivalence in (i) above holds. For (ii) we
write
$$\frac{a}{b} + \frac{c}{d} = \frac{ad}{bd} + \frac{bc}{bd} = \frac{ad + bc}{bd}$$
by (6:1-7)(ii) and (iii). Similarly, we can obtain (iii) from (6:1-7)(i), (ii),
and (iv). If we apply (6:1-8) to ratios a/1, c/1 we see that the system of
ratios a/1 under +, < is isomorphic to the positive integers under +, <.
Further properties of + on arbitrary ratios a/b, such as commutativity,
associativity, cancellation, etc., can be seen to follow from (6:1-8).
186 THE RATIONAL NUMBERS AND FIELDS [CHAP. 6
Here the guiding ideas are that units of area measurement should be
squares, and that the area of a rectangle with lengths $l_1, l_2$ as sides should
be $l_1 \cdot l_2$. If this were pursued in detail, it would be seen that we obtain
a system (Rax, +, •, <, 1) from (P, +, •, <, 1) such that
a/b, or ^>
(i) $\dfrac{a}{b} = a \cdot (b^{-1})$;
(iii) $\dfrac{a}{b} = \dfrac{c}{d}$ if and only if $a \cdot d = b \cdot c$;
(iv) $\dfrac{a}{b} = \dfrac{a \cdot e}{b \cdot e}$;
(viii) if $a \neq 0$ then
(ix) $1^{-1} = 1$.
Proof. The proofs are straightforward from 6.2. To see (i), we show
$b \cdot (a \cdot b^{-1}) = a$; but this follows directly from the fact that $b \cdot b^{-1} = 1$.
Part (ii) follows from the basic property of integral domains. For (iv),
if $e \neq 0$ also $b \cdot e \neq 0$. Let $a/b = x$; then $b \cdot x = a$, hence
$(b \cdot e) \cdot x = a \cdot e$, so that also $x = (a \cdot e)/(b \cdot e)$. Now from (i), (ii), (iv),
and simplified versions of (iii), (v), namely:
we can obtain (iii), (v) in the same way as we derived the parts of (6:1-8)
from (6:1-7). For (iii)' we note from (ii) that $b^{-1} \neq 0$; hence by
cancellation, $a \cdot b^{-1} = c \cdot b^{-1}$ if and only if $a = c$. On the other hand, (v)' is
a form of distributivity,
$$(a \cdot b^{-1}) + (c \cdot b^{-1}) = (a + c) \cdot b^{-1}.$$
Also, if $c \neq 0$,
Proof. Note that our first statement is required to hold for all $m \in I$,
not only $m \in P$ as in the definition. To extend $x^{-m} = (x^{-1})^m$ to negative
m (it is trivial for m = 0), suppose that $m = -k$ where $k \in P$. Then
$(x^{-1})^m = (x^{-1})^{-k} = ((x^{-1})^{-1})^k = x^k = x^{-m}$. To show that
$x^{-m} = (x^m)^{-1}$ for $m \in P$, note that
$x^m \cdot (x^{-1})^m = (x \cdot x^{-1})^m = 1^m = 1$. Again
for m = 0 this is clear. For $m = -k$, where $k \in P$, we have
The remainder of this theorem is reduced to this first part and to 4.32
(the version of 4.12 allowing m = 0 or n = 0) by consideration of cases.
For example, to prove that $x^{m+n} = x^m \cdot x^n$, it can be seen that it is
sufficient to treat the cases $m + n \ge 0$. We can restrict ourselves further
to the case n = 0 or $-n \in P$. Then $x^{m+n} \cdot x^{-n} = x^{(m+n)+(-n)} = x^m$;
hence, by $x^{-n} \cdot x^n = 1$, $x^{m+n} = x^m \cdot x^n$. The verification of the
remaining parts is equally direct (and mildly tedious).
ing the left inequality by $b^2$, the right inequality by $1/b^2$. The proofs of
the remaining parts are direct and are left to the reader.
Some finite fields. So far we do not have a proof of the existence of any
fields whatever. We shall prove in the next section a general theorem which
tells us that any integral domain can be extended to a field. However,
before doing that we can verify that certain integral domains that we have
already come across are in fact fields, namely the domains Ip of integers
modulo a prime p (4.60). More generally, we have the following.
x with $b \cdot x = a$. Since b was chosen arbitrarily, this shows that K forms
a field.
This theorem is not an empty generalization of the statement that all
$I_p$ are fields, since it can be shown that there are many other finite integral
domains. However, it may be of interest to give another proof of 6.9
for these specific domains. Recall that the elements of $I_p$ are the
equivalence sets $[k]_p$ for $k \in I$ with respect to the congruence relation $\equiv_p$.
Then for $a, b \in I_p$ we have $a = [k]_p$, $b = [m]_p$ for some k, m; further if
$b \neq [0]_p$ in $I_p$, $m \not\equiv 0 \pmod{p}$. To show that there is an $x \in I_p$ with
$b \cdot x = a$, it is sufficient to find $l \in I$ with $m \cdot l \equiv k \pmod{p}$. By 4.44
and 4.47(i), $(m, p) = 1$, hence there are s, t with $ms + pt = 1$. Then
$m(sk) + p(tk) = k$, so that $m(sk) \equiv k \pmod{p}$.
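The argument just given is constructive, and the steps can be sketched in code. The following is a minimal illustration, assuming the standard extended Euclidean algorithm to produce s and t; the function names `extended_gcd` and `solve_mod_p` are chosen for this example and do not come from the text.

```python
def extended_gcd(m, p):
    """Return (d, s, t) with d = gcd(m, p) and m*s + p*t == d."""
    old_r, r = m, p
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_mod_p(k, m, p):
    """Find l with m*l congruent to k modulo the prime p (p must not divide m)."""
    d, s, _t = extended_gcd(m % p, p)
    if d != 1:
        raise ValueError("m must not be divisible by p")
    # From m*s + p*t = 1 we get m*(s*k) + p*(t*k) = k.
    return (s * k) % p


l = solve_mod_p(3, 2, 7)      # solves 2 * l = 3 (mod 7)
inv2 = solve_mod_p(1, 2, 7)   # the inverse of 2 modulo 7
```

Taking k = 1 yields the multiplicative inverse of $[m]_p$, which is all that is needed for $I_p$ to be a field.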
An even easier proof makes use of Exercise 5 of Exercise Group 4.6.
It can be seen from there that if $m \not\equiv 0 \pmod{p}$ then $m^{p-1} \equiv 1 \pmod{p}$,
hence $m \cdot m^{p-2} \equiv 1 \pmod{p}$. In other words, the equivalence set of
$m^{p-2}$ is an inverse for that of m in $I_p$. This is sufficient to show that $I_p$
is a field. As an example, the inverse of $[2]_7$ in $I_7$ is $[2^5]_7 = [32]_7 = [4]_7$.
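The inverse $m^{p-2}$ can be computed directly with modular exponentiation. This is a small sketch using Python's built-in three-argument `pow`; the function name `inverse_mod_prime` is an assumption made for the example.

```python
def inverse_mod_prime(m, p):
    """Inverse of m modulo the prime p, computed as m**(p-2) mod p."""
    if m % p == 0:
        raise ValueError("m is congruent to 0 modulo p and has no inverse")
    return pow(m, p - 2, p)


inv = inverse_mod_prime(2, 7)   # 2**5 = 32, and 32 mod 7 = 4

# Every nonzero class modulo 7 has an inverse.
all_invertible = all((m * inverse_mod_prime(m, 7)) % 7 == 1 for m in range(1, 7))
```

This reproduces the example in the text: the inverse of $[2]_7$ is $[4]_7$.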
It might be suspected that, just as we constructed new integral domains
from old ones by forming homomorphic images, so could we also form new
fields from old fields. As it turns out, however, nothing essentially new can
be gained in this way. This is the content of the following theorem, whose
proof is left as an exercise.
6.2 Fields of quotients. The existence theorem. The idea for constructing
a field K from an integral domain D is very similar to that used in 4.21
to construct the integers from the positive integers. To any two elements
$a, b \in D$ with $b \neq 0$ should correspond a quotient a/b in K. As we have
already indicated in the discussion of (6:1-4)-(6:1-6), K need contain no
other elements. In other words, if we can extend D to a field K at all
then the set Q of all such quotients already forms a field. Thus we can
restrict attention to obtaining such a field Q each element of which
corresponds to (various) pairs (a, b) of elements of D, with $b \neq 0$, of which
it is the quotient. The condition that two such pairs (a, b), (c, d) thus
correspond to the same element of Q is given by 6.3(iii), namely,
$a \cdot d = b \cdot c$. This shows us how to define the appropriate equivalence relation
W on pairs. Then the elements of Q can be taken, at first, to be the
equivalence sets of this relation. Finally, to define appropriate operations on
the equivalence sets, we first define operations on pairs, with respect to
which we show that W is a congruence relation. How these latter operations
are to be defined is suggested directly by the various parts of 6.3.
For example, 6.3(v) shows that we should take
$(a, b) \oplus (c, d) = (a \cdot d + b \cdot c,\ b \cdot d)$. These ideas are now carried out in detail in the
following theorem.
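Before the formal treatment, the outline above can be sketched concretely for D the integers. This is only an illustrative model under stated assumptions: the class name `Quotient` is hypothetical, pairs are stored unreduced, and equality is the relation $a \cdot d = b \cdot c$ rather than an explicit equivalence set.

```python
class Quotient:
    """A pair (a, b) of integers with b != 0, compared up to a*d == b*c."""

    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("second component must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) and (c, d) name the same quotient iff a*d == b*c.
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # Sum of pairs: (a, b) + (c, d) = (a*d + b*c, b*d).
        return Quotient(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # Product of pairs: (a, b) * (c, d) = (a*c, b*d).
        return Quotient(self.a * other.a, self.b * other.b)

    def inverse(self):
        # Swapping components inverts; raises if the first component is 0.
        return Quotient(self.b, self.a)


half, third = Quotient(1, 2), Quotient(1, 3)
same_half = Quotient(2, 4)   # a different pair naming the same quotient
```

The point of the congruence property is visible here: replacing `half` by `same_half` in a sum or product changes the resulting pair but not the quotient it represents.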
Proof. Define
(1) $W = \{((a, b), (a', b')) : (a, b), (a', b') \in D \times (D - \{0\})$ and
$a \cdot b' = a' \cdot b\}$.
Then
$(a, b) \equiv (a', b')$, $(a', b') \equiv (a'', b'')$ we have $ab' = a'b$, $a'b'' = a''b'$.
Multiply the first equation by $b'b''$, the second by $bb'$. Then we see that
$ab'b'b'' = a''b'bb'$. But $(b')^2 \neq 0$, hence $ab'' = a''b$, showing that
$(a, b) \equiv (a'', b'')$.
By (3), we can deal with the equivalence sets $W_{(a,b)}$ of W. As in the proof
of 4.21, we shall also write [(a, b)] for these sets and [a, b] where possible.
To define appropriate operations on the sets [a, b], we first make the
following definitions of operations $\oplus$ and $\odot$ on any elements (a, b), (c, d) of
$D \times (D - \{0\})$:
Before proving (6), observe first that $\oplus$ and $\odot$ are commutative. Thus it
is sufficient in (6) to prove that if $(a, b) \equiv (a', b')$ then
Two such sets [a, b], [a', b'] are identical if and only if $(a, b) \equiv (a', b')$.
As we know from the general considerations of (2:3-37), it follows from
(6a and b) that
(8) there are operations +, · defined on Q such that for any [a, b],
$[c, d] \in Q$,
The verification of this result by means of (4), (5), (7)-(10) is now a com¬
pletely routine matter. We shall check only a few of the less obvious laws:
associativity of +, distributivity, and additive and multiplicative inverses.
Representing these as [a, b], [c, d], [e,f], we are reduced to showing
in fact, we have =.
This has the form (g, g) = (bg, bg), which is true by (1).
Given X = [a, b], the obvious candidate for Y is [-a, b]. Then
so long as $a \neq 0$.
We now wish to show that the field Q contains a subsystem isomorphic
to D. The natural choice is the set of elements $\bar{a}$ where $a \in D$. Define a
function G by the conditions:
and set
(18) $\bar{D} = \mathcal{R}(G)$.
In fact, if X = [a, b], with $b \neq 0$, then we also have $X = [a, 1] \cdot [1, b] = (\bar{a}) \cdot (\bar{b})^{-1}$, by (16).
Since (Q, . . .) is a field and (Q, . . .) = (Q, . . .), also (Q, . .^) is a field.
Since every element X of Q is a quotient of elements A, B of D, whenever
i6Q and H(x) = X, we will have x the quotient of the corresponding
elements H~X(A) and H~1(B), i.e., the elements G~l(A) and G~X(B)
of D.
If D has defined on it a relation < which makes it into an ordered
integral domain, we can extend this relation to Q by first defining the set
of positive elements in this relation as follows:
(22) let Pos consist of all quotients a/b in Q for which $a, b \in D$ and
$0 < a \cdot b$.
[This, of course, is taken in light of 6.6(i).] Note that according to (22),
$x \in \mathrm{Pos}$ if and only if there is some pair $(a, b) \in D \times D$ for which
$0 < a \cdot b$ and $x = a/b$. It does not follow directly from this that if $x \in \mathrm{Pos}$
then for any pair $(c, d) \in D \times D$ for which $x = c/d$ we have $0 < c \cdot d$,
i.e., that the determination, by (22), of whether an element $x \in \mathrm{Pos}$ is
independent of the particular representation as a quotient of elements
of D. This is, however, true:
(23) if $(a, b), (c, d) \in D \times (D - \{0\})$ and $a/b = c/d$ then $0 < a \cdot b$
if and only if $0 < c \cdot d$.
and
$x \in \mathrm{Pos}$, $x = 0$, $-x \in \mathrm{Pos}$.
We need to see that this defines a unique value G(x) for each $x \in Q$,
independent of any particular representation $x = a \cdot b^{-1}$. If we have
another such representation $x = c \cdot d^{-1}$, we know that $a \cdot d = b \cdot c$.
Hence^ by definition of subsystem, a ° d = b ° c so that, in K, a ° frO =
c . d&. Clearly
for any x, y E Q, as is seen from the fact that the same rules 6.3(v), (vi)
for calculating sum and product of quotients hold in Q as in Q'. Clearly
In view of the unicity of such a system, we can now adopt the following.
Although the rational numbers form one (in fact the smallest) field
containing the integers, they are by no means the only such field. Two
especially interesting larger fields are formed by the real and the complex
numbers, which we shall deal with in the next two chapters. These will
then lead us to a number of other algebraically interesting intermediate
fields.
Besides the integral domains of the integers I and of the integers
modulo p we have constructed a number of new integral domains in Chapter
5, namely those consisting of the polynomial forms over any given domain.
6.2] FIELDS OF QUOTIENTS 199
, h) g2(G, • • •, h)
hitti, . . . , &) h2(£i, ■ ■■,&)’
/(G, • • •, &)
9l(G; • • • j £k)h2(i;i, . . . , £&) 92(^1, • • ■ , %k)h 1(^1, . . . , %k),
6.3] SOLUTIONS OF ALGEBRAIC EQUATIONS IN FIELDS 201
the problem reduces to determining whether there exist $x_1, \ldots, x_k$ with
$f(x_1, \ldots, x_k) = 0$,
6.17 Definition. Suppose that $k, m \in P$, and that for each $i = 1, \ldots, m$,
$f_i(\xi_1, \ldots, \xi_k) = \sum_{j=1}^{k} a_{ij}\xi_j$ where $(a_{i1}, \ldots, a_{ik})$ is a k-termed
sequence of elements of K. Suppose that $(b_1, \ldots, b_m)$ is an m-termed
3=1
for each i = 1, . . . , m.
(2) Oi1
&1 _
—
7
0i for each i = 2, . . . , m.
an
Again we assume that this has at least one solution with $x_1, x_2, \ldots, x_k \in L$.
If all $a_{ij} = 0$, it follows that all $b_i = 0$. But then any $x_1', x_2', \ldots, x_k' \in L$
provide a solution, in particular, any elements of K. Clearly there is no
unique solution in this case. Suppose now that some $a_{ij} \neq 0$; assume,
for simplicity, that $a_{11} \neq 0$. (The proof in general follows the same lines
as below.) Then any solution $x_1', x_2', \ldots, x_k' \in L$ satisfies
Let us define
and
In other words, (6) [or, what is the same, (8)] is a system of m equations
in k - 1 unknowns. Conversely, we see that if $(x_2', \ldots, x_k')$ is any
solution of the system (8), and we define $x_1'$ in terms of $x_2', \ldots, x_k'$ by (4),
then $(x_1', x_2', \ldots, x_k')$ is a solution of (5). Now the system (8) has at least
one solution, namely $(x_2, \ldots, x_k)$. Hence, by induction hypothesis, we
can find a solution $(x_2', \ldots, x_k')$ with all $x_j' \in K$. But then $x_1'$ as determined
by (4) is also in K. The final case to consider is that some $(x_1', x_2', \ldots, x_k')$
is the unique solution of (5) in K. We leave as an exercise the proof that
it is then the unique solution of (5) in L.
Note that the above proof actually provides much more than what is
explicitly stated in the theorem. Inspection of the proof shows that we
have an algorithm for determining, in terms of the coefficients, whether or not
a given system of linear equations has at least one solution in a field K, and
if so, for determining all such solutions. [In the simplest case, k = 1, this
was given by the conditions (2).] Linear algebra is devoted, in part, to
studying such algorithms in more perspicuous forms, through the use of
determinants and matrices. We shall not pursue the general question any
further in this direction, and suggest that the reader consult any of the
several excellent texts now available on this subject. We turn, instead, to
a few simple examples for illustrative purposes.
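One way to make such an algorithm concrete is elimination with exact rational arithmetic. The sketch below is not the inductive procedure of the proof itself but the closely related row-reduction method from linear algebra, taking K to be the rationals via Python's `fractions.Fraction`; the function name `solve_linear_system` is an assumption for this example, and it returns only one solution (free unknowns set to 0) rather than all of them.

```python
from fractions import Fraction


def solve_linear_system(A, b):
    """Return one solution of A x = b over the rationals, or None if none exists."""
    m, k = len(A), len(A[0])
    # Augmented matrix with exact rational entries.
    rows = [[Fraction(v) for v in A[i]] + [Fraction(b[i])] for i in range(m)]
    pivot_cols = []
    r = 0
    for c in range(k):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        # Eliminate column c from every other row.
        for i in range(m):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    # A row reading 0 = nonzero means the system has no solution.
    if any(all(v == 0 for v in row[:k]) and row[k] != 0 for row in rows):
        return None
    x = [Fraction(0)] * k          # free unknowns are set to 0
    for i, c in enumerate(pivot_cols):
        x[c] = rows[i][k]
    return x


unique = solve_linear_system([[1, 2], [3, 4]], [5, 6])
none = solve_linear_system([[1, 1], [2, 2]], [1, 3])
```

Because only field operations on the coefficients are used, any solution found lies in the field containing the coefficients, which is the content of the theorem.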
The simplest nontrivial case involves two variables, k = 2. Consider
the case m = 2.
and
$$a_{21}a_{11}^{-1}(b_1 - a_{12}x_2) + a_{22}x_2 = b_2.$$
or, equivalently,
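(6:3-4)' $$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}$$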
there would have been no solution; this essentially comes from looking at
the denominator $a_{22}a_{11} - a_{21}a_{12}$ in (6:3-4). This must be $\neq 0$, unless
$b_2a_{11} - a_{21}b_1 = 0$. In this case, we have
over the integers. For if we had a solution $x_1, x_2 \in I$, the left-hand side
of (6:3-8) would be divisible by 2, while the right-hand side would not.
On the other hand, there are infinitely many solutions in rationals, inter¬
related by x\ = | — 2x2. Clearly, again, none of these pairs (x1; x2) has
both xi and x2 in I.
i.e., that the gcd of $a_1, a_2$ divides b. The argument that this is necessary
follows the same line as above. On the other hand, if $d = (a_1, a_2)$ and
$d \mid b$, and $a_1 = a_1'd$, $a_2 = a_2'd$, $b = b'd$, we see that (6:3-9) is equivalent to
But now $(a_1', a_2') = 1$. Hence there are $s, t \in I$ (by 4.44) with
$a_1's + a_2't = 1$;
hence
$$a_1'(sb') + a_2'(tb') = b'.$$
Multiplying through by d shows that $(x_1, x_2) = (sb', tb')$ is one solution
of (6:3-9). This is by no means the only solution of (6:3-9) [assuming that
(6:3-10) holds]. If $(x_1', x_2')$ is any solution, and we set $y_1 = x_1' - x_1$,
$y_2 = x_2' - x_2$, we have
Our previous argument shows that this has solutions, since $(a_1, a_2) \mid 0$, but
(6:3-12) exhibits only the solution (0, 0), for which $x_1' = x_1$, $x_2' = x_2$.
However, there are many others, namely, all those $y_1, y_2 \in I$ with
(6:3-14) $$y_1 = -\frac{a_2}{a_1}\,y_2 = -\frac{a_2'}{a_1'}\,y_2.$$
Since $(a_1', a_2') = 1$, this has a solution with $y_1, y_2 \in I$ if and only if
$a_1' \mid y_2$, that is, $y_2 = a_1'k$ for some $k \in I$; then $y_1 = -a_2'k$. Clearly any
choice of $k \neq 0$ gives a new solution $(x_1', x_2')$ in I of (6:3-9). We conclude:
(6:3-15) If $(a_1, a_2) \mid b$ then $a_1x_1 + a_2x_2 = b$, where $x_1, x_2 \in I$, if and only
if for some $k \in I$, $x_1 = sb' - a_2'k$, $x_2 = tb' + a_1'k$, where
$d = (a_1, a_2)$, $a_1 = a_1'd$, $a_2 = a_2'd$, $b = b'd$, and $a_1's + a_2't = 1$.
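The conclusion (6:3-15) translates directly into a procedure. The following sketch assumes the extended Euclidean algorithm as the source of s and t and that $a_1, a_2$ are not both zero; the names `extended_gcd` and `solve_diophantine` are illustrative choices, not the book's.

```python
def extended_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) >= 0 and a*s + b*t == d."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def solve_diophantine(a1, a2, b):
    """Return a function k -> (x1, x2) giving all integer solutions of
    a1*x1 + a2*x2 = b, or None if gcd(a1, a2) does not divide b."""
    d, s, t = extended_gcd(a1, a2)
    if d == 0 or b % d != 0:
        return None
    a1p, a2p, bp = a1 // d, a2 // d, b // d   # a1p*s + a2p*t == 1

    def solution(k):
        return (s * bp - a2p * k, t * bp + a1p * k)

    return solution


family = solve_diophantine(6, 10, 8)
no_solution = solve_diophantine(2, 4, 3)
```

For example, `family(0)` gives the particular solution $(sb', tb')$, and each other integer k shifts it along the solutions of the homogeneous equation.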
m ftp
b0 ‘
I ci 1
yy where all ai} b{ e I, ^ 0.
and
6.21 Corollary. Suppose that n > 1, $a \in I$, $|a| > 1$, and that
$|a| = p_1^{l_1} \cdots p_m^{l_m}$ is the unique prime power representation of $|a|$ with
$p_1 < p_2 < \cdots < p_m$. Suppose that for some i, $n \nmid l_i$. Then
$\xi^n - a$ has no rational roots.
6.23 Theorem. Suppose that $f(\xi), g(\xi), h(\xi), f_1(\xi), \ldots, f_n(\xi) \in K[\xi]$.
Then:
(i) $f(\xi) \mid 0$;
(ii) $0 \mid f(\xi)$ if and only if $f(\xi) = 0$;
(iii) if $a \in K$ and $a \neq 0$ then $a \mid f(\xi)$;
(iv) if $h(\xi) \mid g(\xi)$ and $g(\xi) \mid f(\xi)$ then $h(\xi) \mid f(\xi)$;
(v) if $g(\xi) \mid f_i(\xi)$ for each i then $g(\xi) \mid \sum_{k=1}^{n} f_k(\xi)$;
(vi) if $g(\xi) \mid f_i(\xi)$ for some i then $g(\xi) \mid \prod_{k=1}^{n} f_k(\xi)$;
(vii) $f(\xi) \mid 1$ if and only if $f(\xi) \sim 1$;
(viii) $f(\xi) \mid 1$ if and only if for some $a \in K$, $a \neq 0$, we have $f(\xi) = a$;
(ix) $\sim$ is an equivalence relation between elements of $K[\xi]$;
(x) $f(\xi) \sim g(\xi)$ if and only if for some $a \in K$, $a \neq 0$, we have
$f(\xi) = a\,g(\xi)$;
(xi) if $f(\xi) \neq 0$, $\deg(g(\xi)) = \deg(f(\xi))$ and $g(\xi) \mid f(\xi)$ then
$g(\xi) \sim f(\xi)$;
(xii) if $f(\xi), g(\xi)$ are monic then $f(\xi) \sim g(\xi)$ if and only if $f(\xi) = g(\xi)$;
(xiii) if $f(\xi) \neq 0$ then there is a unique $a \in K$, $a \neq 0$ and a unique
monic $g(\xi) \in K[\xi]$ with $f(\xi) = a\,g(\xi)$; hence there is a unique
monic $g(\xi) \in K[\xi]$ with $f(\xi) \sim g(\xi)$.
Proof. The proofs are quite straightforward. The only essentially new
points are in (viii)-(xi). Consider first (viii). As we have already seen,
if $a \in K$, $a \neq 0$ then $a \mid 1$. Suppose that $f(\xi) \mid 1$. Then clearly $f(\xi) \neq 0$.
If we show $\deg(f(\xi)) = 0$, we are through. Suppose otherwise, say
$\deg(f(\xi)) = n > 0$, for some n. Then since $f(\xi) \mid 1$, there is $g(\xi)$ with
$1 = f(\xi)g(\xi)$. Also $g(\xi) \neq 0$. Hence by 5.11(ii),
Prime polynomials.
This is now the correct analogue of the notion of prime integer. Note that
by 6.23(xiii) with each prime $p(\xi)$ is associated a unique monic $q(\xi)$ with
$p(\xi) \sim q(\xi)$. It follows by 6.23(ix) that also $q(\xi)$ is prime. Further, if
$f(\xi) \mid q(\xi)$ and $f(\xi)$ is monic then $f(\xi) = 1$ or $f(\xi) = q(\xi)$. Thus the monic
prime polynomials provide unique representatives for arbitrary prime
polynomials just as the positive prime integers provide unique representatives
for arbitrary prime integers.
Note that by 6.23(viii), a simpler way of expressing the condition
$f(\xi) \neq 0$, $f(\xi) \not\sim 1$ is that $\deg(f(\xi)) > 0$.
6.25 Theorem. If $f(\xi) \in K[\xi]$, $\deg(f(\xi)) > 0$, and $f(\xi)$ is not prime in
$K[\xi]$ then there exist $g(\xi), h(\xi)$ with $f(\xi) = g(\xi)h(\xi)$ and
$0 < \deg(g(\xi)) < \deg(f(\xi))$ and $0 < \deg(h(\xi)) < \deg(f(\xi))$.
(6:4-2) $(5\xi^4 + \xi^2 - 1)$
i.e., we have quotient $5\xi^2 + 10\xi + 11$, remainder $2\xi - 23$. The method
is simply this: we start out with
6.26 Theorem. Suppose that $f(\xi), h(\xi) \in K[\xi]$ where $\deg(h(\xi)) > 0$.
Then there exist $q(\xi), r(\xi) \in K[\xi]$ such that $f(\xi) = h(\xi)q(\xi) + r(\xi)$
and $0 \le \deg(r(\xi)) < \deg(h(\xi))$. Further if $q'(\xi), r'(\xi) \in K[\xi]$,
$f(\xi) = h(\xi)q'(\xi) + r'(\xi)$, and $0 \le \deg(r'(\xi)) < \deg(h(\xi))$, then
$q'(\xi) = q(\xi)$ and $r'(\xi) = r(\xi)$.
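The existence half of this theorem is the familiar long-division procedure, sketched below over the rationals. The function name `poly_divmod` and the coefficient-list representation (constant term first) are assumptions for this example. The divisor $\xi^2 - 2\xi + 2$ used in the usage lines is also an assumption: it is the polynomial that, together with the dividend $5\xi^4 + \xi^2 - 1$, yields the quotient $5\xi^2 + 10\xi + 11$ and remainder $2\xi - 23$ quoted in the text, whose own division display is not fully legible here.

```python
from fractions import Fraction


def poly_divmod(f, h):
    """Divide f by h over the rationals; return (q, r) with f = h*q + r
    and r of lower degree than h.

    Polynomials are coefficient lists from the constant term upward;
    h must have a nonzero leading coefficient.
    """
    f = [Fraction(c) for c in f]
    h = [Fraction(c) for c in h]
    if len(f) < len(h):
        return [Fraction(0)], f
    q = [Fraction(0)] * (len(f) - len(h) + 1)
    r = f[:]
    # Cancel the leading term of the running remainder at each step.
    for shift in range(len(f) - len(h), -1, -1):
        coeff = r[shift + len(h) - 1] / h[-1]
        q[shift] = coeff
        for j, hc in enumerate(h):
            r[shift + j] -= coeff * hc
    r = r[:len(h) - 1] or [Fraction(0)]
    return q, r


# 5 xi^4 + xi^2 - 1 divided by the assumed divisor xi^2 - 2 xi + 2
q, r = poly_divmod([-1, 0, 1, 0, 5], [2, -2, 1])
```

Each pass of the loop removes the current leading term, so the degree of the running remainder drops until it is below that of the divisor, mirroring the induction in the proof.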
Proof. The special case, $\deg(h(\xi)) = 0$, is trivial. Otherwise we have
$q'(\xi) \in L[\xi]$ with $f(\xi) = h(\xi)q'(\xi)$. We also have $q(\xi), r(\xi) \in K[\xi]$ with
$f(\xi) = h(\xi)q(\xi) + r(\xi)$ and $0 \le \deg(r(\xi)) < \deg(h(\xi))$. By uniqueness
in 6.26 applied in $L[\xi]$, $q'(\xi) = q(\xi)$, $r(\xi) = 0$.
Thus we can now speak of $h(\xi) \mid f(\xi)$ without specifying what field this
should take place in. The reader should not jump to conclusions about
6.4] POLYNOMIALS OVER A FIELD 215
6.28 Definition. Suppose that $f(\xi), g(\xi), d(\xi) \in K[\xi]$. We call $d(\xi)$ a
greatest common divisor (or gcd) of $f(\xi)$ and $g(\xi)$ in $K[\xi]$ if $d(\xi)$ has
the following properties:
(i) $d(\xi) \mid f(\xi)$ and $d(\xi) \mid g(\xi)$;
(ii) if $h(\xi) \in K[\xi]$ and $h(\xi) \mid f(\xi)$, $h(\xi) \mid g(\xi)$, then $h(\xi) \mid d(\xi)$.
Our method to prove the existence and uniqueness (up to ~) of gcd for
polynomials now follows the same lines as for integers.
Proof. Consider
$S^* = \{m : m \ge 0$ and for some $s(\xi) \in S$, $s(\xi) \neq 0$ and $\deg(s(\xi)) = m\}$.
$f(\xi), g(\xi)$ have a unique monic gcd $d(\xi)$ in $K[\xi]$. For suitable
$s(\xi), t(\xi) \in K[\xi]$ we have $d(\xi) = f(\xi)s(\xi) + g(\xi)t(\xi)$.
Thus also the notion of gcd, for polynomials in a given field, does not
change if we enlarge the field.
6.32 Definition. We define $(f(\xi), g(\xi))$ for all $f(\xi), g(\xi) \in K[\xi]$ with
$f(\xi) \neq 0$ or $g(\xi) \neq 0$ to be the unique monic gcd $d(\xi)$ of $f(\xi)$ and
$g(\xi)$; $f(\xi)$ and $g(\xi)$ are called relatively prime if $(f(\xi), g(\xi)) = 1$.
Although the proof of 6.30 does not exhibit it directly, we can actually
provide an algorithm for determining (/(f), 0(f)). The same formal
procedure as given by the Euclidean algorithm for integers, which we de¬
scribed following 4.42, can be applied here. We leave it to the reader to
work out examples of this in the exercises.
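As a hedged sketch of that procedure over the rationals, the code below repeatedly replaces the pair (a, b) by (b, remainder of a on division by b) and then normalizes the last nonzero remainder to be monic. The helper names `trim`, `poly_rem`, and `poly_gcd`, and the coefficient-list representation (constant term first), are assumptions for this example.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients (lists run from the constant term upward)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_rem(f, h):
    """Remainder of f on division by a nonzero polynomial h."""
    r = trim(f)
    h = trim(h)
    while len(r) >= len(h):
        coeff = r[-1] / h[-1]
        shift = len(r) - len(h)
        for j, hc in enumerate(h):
            r[shift + j] -= coeff * hc
        r = trim(r)
    return r


def poly_gcd(f, g):
    """Monic gcd of f and g (not both zero) over the rationals."""
    a = trim(Fraction(c) for c in f)
    b = trim(Fraction(c) for c in g)
    while b:
        a, b = b, poly_rem(a, b)
    return [c / a[-1] for c in a]


# (xi - 1)(xi - 2) and (xi - 1)(xi + 3) share the factor xi - 1
common = poly_gcd([2, -3, 1], [-3, 2, 1])
# xi^2 + 1 and xi are relatively prime
coprime = poly_gcd([1, 0, 1], [0, 1])
```

Since every step uses only field operations on the coefficients, the result does not change if the field is enlarged, in line with the remark preceding Definition 6.32.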
The next two theorems are direct analogues of 4.40-4.48. The second
is derived quite easily from the first and earlier results.
Proof. (i) Under the hypothesis and by 6.30, for some $s(\xi), t(\xi) \in K[\xi]$,
$f(\xi)s(\xi) + h(\xi)t(\xi) = 1$. Hence $f(\xi)g(\xi)s(\xi) + h(\xi)g(\xi)t(\xi) = g(\xi)$.
Since $h(\xi)$ divides the left-hand side, $h(\xi) \mid g(\xi)$.
(ii) follows directly when we write $h(\xi) = f(\xi)q(\xi)$ and apply part (i) to
$g(\xi) \mid f(\xi)q(\xi)$.
6.34 Theorem. Suppose that $f(\xi), g(\xi), f_1(\xi), \ldots, f_n(\xi), p(\xi) \in K[\xi]$,
and suppose that $p(\xi)$ is prime in $K[\xi]$. Then we have:
(i) $p(\xi) \nmid f(\xi)$ if and only if $(f(\xi), p(\xi)) = 1$;
(ii) if $p(\xi) \mid f(\xi)g(\xi)$ then $p(\xi) \mid f(\xi)$ or $p(\xi) \mid g(\xi)$;
(iii) if $p(\xi) \mid \prod_{i=1}^{n} f_i(\xi)$ then $p(\xi) \mid f_i(\xi)$ for some i.
6.35 Theorem. Suppose that $f(\xi) \in K[\xi]$, $\deg(f(\xi)) > 0$. Then
(i) there exists a sequence $(p_1(\xi), \ldots, p_k(\xi))$ of polynomials prime
in $K[\xi]$ such that $f(\xi) = p_1(\xi) \cdots p_k(\xi)$, and
(ii) if $(q_1(\xi), \ldots, q_l(\xi))$ is any other sequence of polynomials prime
in $K[\xi]$ such that $f(\xi) = q_1(\xi) \cdots q_l(\xi)$, then k = l and for
some permutation F of $\{1, \ldots, k\}$, $q_i(\xi) \sim p_{F(i)}(\xi)$ for all $i \le k$.
Proof. This proceeds in a way quite similar to the proof of 4.50. Here
it is by course-of-values induction (4.49) on the degree of $f(\xi)$. That is,
we let
(1) $A = \{n : n > 0$ and for all $f(\xi) \in K[\xi]$ with $\deg(f(\xi)) = n$, we
have (i) and (ii)$\}$.
We wish to conclude
(3) $n \in A$.
Suppose it is given that $f(\xi) \in K[\xi]$ with $\deg(f(\xi)) = n$. To show (i)
holds for $f(\xi)$, we consider two possibilities. If $f(\xi)$ is prime, we are through.
Otherwise, we know by 6.25 that
(4) there exist $g(\xi), h(\xi) \in K[\xi]$ with $f(\xi) = g(\xi)h(\xi)$ and
$0 < \deg(g(\xi)) < n$ and $0 < \deg(h(\xi)) < n$.
Hence by (1) and (2), both $g(\xi)$ and $h(\xi)$ can be written as products of
prime polynomials; but then so also can $f(\xi)$. To prove that (ii) holds for
$f(\xi)$, suppose that we have
6.36 Theorem. Suppose that $f(\xi) \in K[\xi]$, $\deg(f(\xi)) > 0$. Then
(i) there exists an $a \in K$ with $a \neq 0$ and a sequence $(p_1(\xi), \ldots, p_k(\xi))$
of monic polynomials prime in $K[\xi]$ such that
$$f(\xi) = a\,p_1(\xi) \cdots p_k(\xi),$$
and
$$f(\xi) = b\,q_1(\xi) \cdots q_l(\xi),$$
we have k = l, a = b, and for some permutation F of $\{1, \ldots, k\}$,
$q_i(\xi) = p_{F(i)}(\xi)$ for all $i \le k$.
Proof. (i) is obvious. Consider (ii). If $f(\xi)$ is prime in $K[\xi]$, but $f(\xi)$
has a root a in K, then $(\xi - a) \mid f(\xi)$ by 5.13. Clearly, $(\xi - a)$ is not $\sim 1$
and not $\sim f(\xi)$. Conversely, suppose that $f(\xi)$ has no root in K, but
$f(\xi) = g(\xi)h(\xi)$ is a proper factorization. Then $\deg(g(\xi))$ or $\deg(h(\xi)) = 1$.
Hence one of these has the form $a_1\xi + a_0$, with $a_1 \neq 0$. But then
$-a_0/a_1$ would be a root of $f(\xi)$.
6.38 Definition. Suppose that $f(\xi) \in K[\xi]$, $f(\xi) \neq 0$, and that $f(c) = 0$
where $c \in K$.
(i) By the order of c in $f(\xi)$ we understand the largest positive integer
m such that $(\xi - c)^m \mid f(\xi)$.
(ii) We call c a simple root of $f(\xi)$ if its order in $f(\xi)$ is 1; otherwise
we shall call c a multiple root of $f(\xi)$.
It may happen that $d(\xi) \sim f(\xi)$, and then $f_1(\xi)$ is constant. This will
be the case if $f'(\xi) = 0$; such can happen, e.g., with $f(\xi) = \xi^p - a$ in
the field $I_p$. However, we claim that
Note that the hypothesis is satisfied in any ordered field. Note also
that the choice of $f_1(\xi)$ is independent of K, since $f'(\xi)$ and $d(\xi)$ are
independent of K; for the gcd and $f_1(\xi)$ can be found by Euclid's algorithm
followed by the division algorithm.
If $d(\xi) = 1$ then $f(\xi) = f_1(\xi)$ and $f(\xi)$ has only simple roots in any
field $K$ which contains its coefficients (so long as $m1 \neq 0$ for all $m \in P$).
A partial converse is true for the case that the only polynomials which
are prime in $K[\xi]$ are of degree 1; in this case, if $f(\xi)$ has only simple roots
in $K$ then $d(\xi) = 1$. This can be proved by induction on the degree of
$f(\xi)$ using (6:4-6) and (6:4-7). The added condition on $K$ is essential;
for example, the polynomial $f(\xi) = (\xi^2 - 2)^2$ in $\mathrm{Ra}[\xi]$ has only simple
roots in Ra, since it has no roots in Ra, but $(f(\xi), f'(\xi)) = (\xi^2 - 2)$.
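The remark that $d(\xi) = (f(\xi), f'(\xi))$ and $f_1(\xi) = f(\xi)/d(\xi)$ are obtained by Euclid's algorithm followed by the division algorithm can be tried out mechanically. What follows is a minimal sketch over Ra only, assuming polynomials are given as coefficient lists (highest degree first) and that $f$ has positive degree; the helper names are hypothetical and not taken from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_divmod(f, g):
    """Division algorithm in Ra[x]; g must be nonzero. Returns (q, r)."""
    f, g = trim(list(f)), trim(list(g))
    q = []
    while len(f) >= len(g):
        factor = f[0] / g[0]
        q.append(factor)
        for i in range(len(g)):      # subtract factor * g, aligned at the top
            f[i] -= factor * g[i]
        f = f[1:]                    # the leading coefficient is now zero
    return q, trim(f)

def derivative(f):
    n = len(f) - 1
    return [a * (n - i) for i, a in enumerate(f[:-1])]

def monic_gcd(f, g):
    """Euclid's algorithm, normalised to a monic result."""
    f, g = trim(list(f)), trim(list(g))
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [a / f[0] for a in f]

def simple_root_part(f):
    """Return (d, f1) with d = gcd(f, f') and f1 = f / d."""
    f = [Fraction(a) for a in f]
    d = monic_gcd(f, derivative(f))
    f1, _ = poly_divmod(f, d)
    return d, f1

# f = (x^2 - 2)^2 = x^4 - 4x^2 + 4, the example of the text
d, f1 = simple_root_part([1, 0, -4, 0, 4])
print(d, f1)
```

For the example of the text both `d` and `f1` come out as the coefficients of $\xi^2 - 2$, in agreement with $(f(\xi), f'(\xi)) = (\xi^2 - 2)$.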
5. Find the gcd $(f(\xi), g(\xi))$, where $f(\xi)$, $g(\xi)$ in 4(c) are regarded as
polynomials in $I_2[\xi]$.
6. Which of the following polynomials is prime in $\mathrm{Ra}[\xi]$? Prove your
assertion.
(a) $\xi^2 + \xi - 1$ (b) $\xi^2 + 3\xi - 4$
(c) $\xi^3 - 12$ (d) $\xi^3 + \xi - 2$
7. Factor the polynomial $f(\xi) = \xi^6 - 1$ into polynomials prime in $\mathrm{Ra}[\xi]$.
Prove that your result is correct.
8. Find all polynomials prime in $I_3[\xi]$ and of degree $< 3$.
9. Let $K$ be any field in which the only prime polynomials are those of degree 1.
Suppose that $f(\xi) \in K[\xi]$ with $f(\xi) \neq 0$. Show that if $f(\xi)$ has no multiple
roots in $K$ then $(f(\xi), f'(\xi)) = 1$.
10. Suppose that $p(\xi)$ is prime in $K[\xi]$. For any $f(\xi), g(\xi) \in K[\xi]$, define
$f(\xi) \equiv g(\xi) \pmod{p(\xi)}$ to mean $p(\xi) \mid (f(\xi) - g(\xi))$, and $f(\xi) \not\equiv g(\xi)
\pmod{p(\xi)}$ to mean that this is not the case. Show that if $f(\xi) \not\equiv 0
\pmod{p(\xi)}$, then there exists $f_1(\xi) \in K[\xi]$ with $f(\xi)f_1(\xi) \equiv 1 \pmod{p(\xi)}$.
What ideas does this suggest?
11. Show that any $f(\xi), g(\xi) \in K[\xi]$ have a least common multiple (lcm),
i.e., a polynomial $m(\xi)$ such that $f(\xi) \mid m(\xi)$, $g(\xi) \mid m(\xi)$ and whenever
$f(\xi) \mid h(\xi)$, $g(\xi) \mid h(\xi)$, then $m(\xi) \mid h(\xi)$.
12. We consider elements p of the field $K(\xi)$ of rational forms over $K$.
(a) Show that if $g(\xi), h(\xi) \in K[\xi]$, $g(\xi) \neq 0$, $h(\xi) \neq 0$, and $(g(\xi), h(\xi)) = 1$
then there exist $s(\xi), t(\xi) \in K[\xi]$ with
$$\frac{1}{g(\xi)h(\xi)} = \frac{s(\xi)}{g(\xi)} + \frac{t(\xi)}{h(\xi)}.$$
$$\frac{1}{a + bs} = \frac{1 \cdot (a - bs)}{(a + bs)(a - bs)} = \frac{a - bs}{a^2 - b^2s^2} = \frac{a - bs}{a^2 - 2b^2}.$$
The interesting aspect of this is that from any ordered field K which con¬
tains a square root s of 2, we can construct a subfield S which contains
the same root, and which contains nothing more than what is demanded
by these conditions; we say that s generates S (over Ra).
Now (7:1-2) and our proofs of closure suggest how to construct a field S
which satisfies these conditions. We first construct a system S which will
be ≅ to S:
Finally we put $a = (a, 0)$ for each $a \in \mathrm{Ra}$ and $s = (0, 1)$. It can then be
shown that under these definitions, $(S, +, \cdot, 0, 1)$ is a field in which the
224 THE REAL NUMBERS [CHAP. 7
Figure 7.1
We have brought the unit length to coincide at one end with the origin
of this line, which we label 0, the other end falling at a point labeled 1.
We also call the corresponding points 0 and 1, respectively. Given any
line segment, we can attempt to measure it by first laying off an equal seg¬
ment on the above line so that one end coincides with the origin. Let us
denote the other end point resulting from this transfer by P; we shall
also use P to denote any number that we succeed in assigning as length to
OP. We then lay off the segment 01 end to end a number of times until
we either reach P exactly as end point or obtain P between successive such
end points. The resulting end points are labeled by the positive integers
1, 2, 3, . . . In the above picture, the length P (regarded as a number)
of OP is not exactly an integer; we have 3 < P < 4. Our next step is
to refine the measurement. This is in effect a choice of a new unit of length,
but notationally it preserves the previous assignment of numbers by
introducing fractional quantities.
If our unit of measurement is the inch, we would usually refine further
by dividing the unit segment into two equal parts, then each of those parts
into two equal parts, and so on. If our unit of measurement is in the metric
system, say the meter, we would refine further by dividing the unit
segment into ten equal parts, then each of those parts into ten equal parts,
and so on. In general, given any positive integer $b$, we can divide the unit
segment into $b$ equal parts by the ruler and compass construction indicated
in the following figure for the special case $b = 5$.
Figure 7.2
$a$ times. (In particular, $Q_2, Q_3, \ldots$ are also labeled $2/b, 3/b, \ldots$,
respectively.) By this means a definite point $P$ on the original line is assigned
(with respect to the given unit of measurement) to any nonnegative
rational number $a/b$.
Now given any straight line, infinite in both directions, a point chosen
arbitrarily on the line as the origin, and a unit of measurement, we can
assign to any rational number $a/b$ a definite point on the line by assigning
nonnegative rationals as above to points on one (the "right") side of the origin
and to negative rationals $-a/b$, where $a, b > 0$, the point distant $a/b$
from the origin on the other ("left") side.
Figure 7.3
ordering (which already holds for the rational points). We wish to formulate
this property in set-theoretical terms. Consider, for example, the statement
that there is a gap in the rational numbers at "where $\sqrt{2}$ ought to
be." Let
$A = \{r : r \in \mathrm{Ra} \text{ and } r > 0 \text{ and } r^2 > 2\}$,
i.e., $A$ consists of all rationals to the right of, or above, $\sqrt{2}$, and $B$ of all
those to the left of, or below, $\sqrt{2}$. Intuitively, $A$ and $B$ have the following
properties, for $S = \mathrm{Ra}$:
We might call such a pair $A$, $B$ a cut in the set $S$, with $A$ the upper section
and $B$ the lower section of the cut. Let us indicate a lower section with no
largest element [i.e., (i), (iv), (vi) hold] by ←──), and a lower section
with a largest element [i.e., (i), (iv) hold, but not (vi)] by ←──].
Similarly, we use (──→ and [──→ for upper sections with no, or
some, smallest element, respectively. Then we can combine lower and
upper sections in the rationals, i.e., to satisfy (7:1-5)(i)-(v), in only one
of three ways:
[Figure 7.4: diagrams (a), (b), (c) of the three possible combinations of a lower and an upper section]
7.2 Definition. Let $(S, <)$ be a simply ordered system. By a lower section
in $(S, <)$ we mean a set $B \subseteq S$ satisfying the following conditions:
(a) $B \neq \emptyset$ and $S - B \neq \emptyset$;
(b) if $y \in B$ and $y' < y$ then $y' \in B$;
(c) if $y \in B$ then there exists $y'$ with $y < y'$ and $y' \in B$.
We denote by $L(S)$ the collection of all lower sections in $S$.
$x' < x$ with $x' \in A$; for otherwise, $x$ would be the least element of $A$,
contrary to hypothesis. Thus all the conditions for $A \in U(S)$ are satisfied.
Hence by 7.1(ii), $B = S - A$ contains a largest element. But this
contradicts 7.2(c).
7.5 Theorem. Let $(S, <)$ be a densely ordered system without first or last
element. For $X, Y \in U(S)$, put $X \prec Y$ if $Y \subseteq X$ and $Y \neq X$. Then:
(i) $(U(S), \prec)$ is a continuously ordered system which contains a
subsystem ≅ to $(S, <)$;
(ii) if (S, <) is already continuously ordered, then
i-e
Figure 7.5
then that $Y \prec X$ in the second case. Suppose that $y \in Y$; either $u < y$
or $y < u$. In the latter case, $u \in Y$ by 7.1(b), which contradicts the choice
of $u$. Hence $u < y$ and then $y \in X$, again by 7.1(b). In other words,
$Y \subseteq X$; but $Y \neq X$, so $X \prec Y$.
Before completing the proof of the continuous ordering of $(U(S), \prec)$,
let us show the second part of (i). With each $a \in S$ is naturally associated
the set
Then
Figure 7.7
The fact that $U(S) - A \neq \emptyset$ provides some $Y \in U(S)$ with $Y \prec X$ for
all $X \in A$. But also $S - Y \neq \emptyset$, so there is some $a \in S - Y$. Hence
$a < y$ for all $y \in Y$ and then $a < x$ for all $X \in A$ and $x \in X$. This shows
why we have indicated $a$ in Fig. 7.7. Thus, intuitively, the elements of $A$
must "bunch up." The set that they "bunch up to" is just the set
(9) $Z = \bigcup X \; [X \in A]$;
Greatest lower bounds and least upper bounds. The property of an ordered
system to be continuously ordered is a special case of an apparently more
general condition concerning existence of greatest lower bounds and least
upper bounds, which is also more frequently used in practice. However,
as we shall see, the conditions are really equivalent.
7.6 Definition. Let $(S, <)$ be a simply ordered system and suppose that
$A \subseteq S$ and $A \neq \emptyset$.
(i) If $b \in S$ and $b \le x$ for all $x \in A$ then $b$ is said to be a lower
bound for $A$; if $A$ has at least one lower bound then $A$ is said to
be bounded from below (in $S$).
(ii) If $a$ is a lower bound for $A$ and $b \le a$ for every lower bound $b$
for $A$, then $a$ is said to be a greatest lower bound for $A$ (in $S$).
(iii) If $b \in S$ and $x \le b$ for all $x \in A$ then $b$ is said to be an upper
bound for $A$; if $A$ has at least one upper bound then $A$ is said to
be bounded from above (in $S$).
(iv) If $a$ is an upper bound for $A$ and $a \le b$ for every upper bound $b$
for $A$ then $a$ is said to be a least upper bound for $A$ (in $S$).
7.7 Lemma. Suppose that $(S, <)$ and $A$ are as in 7.6. Then:
(i) there is at most one $a \in S$ such that $a$ is a greatest lower bound for $A$;
(ii) there is at most one $a \in S$ such that $a$ is a least upper bound for $A$.
7.8 Definition. Suppose that $(S, <)$ and $A$ are as in 7.6. Then:
(i) if $A$ has a greatest lower bound $a \in S$, we call the unique such
element the infimum of $A$ (in $S$) and write $a = \inf A$, and we say
$\inf A$ exists (in $S$);
(ii) if $A$ has a least upper bound $a \in S$, we call the unique such element
the supremum of $A$ (in $S$) and write $a = \sup A$, and we say
$\sup A$ exists (in $S$).
Note that $\inf A$ exists and $\inf A \in A$ if and only if $A$ has a least element,
namely $\inf A$; similarly for $\sup A$ and largest element.
Properly speaking, we should write "$\inf_S A$" and "$\sup_S A$." For example,
if we take $A = \{x : x \in \mathrm{Ra},\ 0 < x \text{ and } x^2 > 2\}$, $\inf_{\mathrm{Ra}} A$ does not exist
(under the usual ordering of Ra) although $A$ is bounded from below in
Ra. On the other hand, when we define the real numbers and extend
7.1] TOWARD EXTENDING THE RATIONALES 233
$(\mathrm{Ra}, <)$ to $(\mathrm{Re}, <)$, $\inf_{\mathrm{Re}} A$ will exist and $= \sqrt{2}$. Where necessary, we
will write "$\inf_S A$" or "$\inf A$ in $S$." It is also common practice to write
"$\inf_x (x \in A)$," "glb $A$," "$\mathrm{glb}_x (x \in A)$" for $\inf A$, and "$\sup_x (x \in A)$,"
"lub $A$," "$\mathrm{lub}_x (x \in A)$" for $\sup A$.
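The claim that $\inf_{\mathrm{Ra}} A$ does not exist for $A = \{x : x \in \mathrm{Ra},\ 0 < x,\ x^2 > 2\}$ can be made concrete. The sketch below assumes a particular map $x \mapsto (2x + 2)/(x + 2)$, which is our own choice and not part of the text; for rational $x > 0$ it moves $x$ strictly toward $\sqrt{2}$ without crossing it, since $x' - x = (2 - x^2)/(x + 2)$ and $x'^2 - 2 = 2(x^2 - 2)/(x + 2)^2$. Hence every positive rational lower bound of $A$ is exceeded by another lower bound, and every element of $A$ by a smaller element of $A$.

```python
from fractions import Fraction

def in_A(x):
    """Membership in A = {x in Ra : 0 < x and x^2 > 2}."""
    return x > 0 and x * x > 2

def toward_sqrt2(x):
    """For rational x > 0, return a rational strictly between x and sqrt(2)."""
    return (2 * x + 2) / (x + 2)

b = Fraction(1)              # a lower bound of A, since b > 0 and b^2 < 2
a = Fraction(2)              # an element of A
for _ in range(4):
    b, a = toward_sqrt2(b), toward_sqrt2(a)
    print(b, a)              # b increases, a decreases, and b < a throughout
```

No rational $b$ can therefore be a greatest lower bound of $A$: a lower bound with $b^2 < 2$ is improved by `toward_sqrt2(b)`, while a $b$ with $b^2 > 2$ is not a lower bound at all.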
7.9 Theorem. Suppose that $(S, <)$ is a continuously ordered system, and
suppose that $A \subseteq S$ and $A \neq \emptyset$. Then we have:
(i) if $A$ is bounded from below then $\inf A$ exists;
(ii) if $A$ is bounded from above then $\sup A$ exists.
then $a = \sup A$ in $(S, <')$, etc. Thus we see that we could deduce (ii)
from (i) if we are able to show that whenever $(S, <)$ is continuously
ordered so also is $(S, <')$. In fact, we see from 7.1 and 7.2 that if $B \subseteq S$
then $B$ is an upper section in $(S, <')$ if and only if it is a lower section in
$(S, <)$. But we saw in 7.3 that in this case $S - B$ contains a least element
in $(S, <)$, that is, $S - B$ contains a largest element in $(S, <')$. Hence
the latter is also continuously ordered. [A "direct" proof of Theorem
7.9(ii) would implicitly involve such ideas.]
Note that 7.9(i) applied to $A \in U(S)$ gives $\inf A$ as the largest element
of $S - A$: for $A$ has no least element, so $\inf A \notin A$; on the other hand,
if $b \in S - A$, we know that $b < x$ for all $x \in A$, so $b$ is a lower bound for
$A$ and thus $b \le \inf A$ by definition. Thus any system $(S, <)$ which
satisfies 7.9(i) for all $A \subseteq S$ and which is densely ordered without first or
last element is necessarily continuously ordered. Similarly for 7.9(ii).
The notion of continuous ordering could be modified slightly so as to
allow for systems with first or last element (or both). Then we could easily
prove an existence theorem like 7.5(i) for any densely ordered system
(S, <) from the given Theorem 7.5. However, for the main case in which
we are interested, the set of real numbers, we do not need the modified
notions.
It may be thought that having proved the existence of a continuously
ordered extension of (Ra, <) we are now in a satisfactory situation re¬
garding the real numbers. However, there is as yet no guarantee that we
can suitably define algebraic operations on such an extension. Furthermore,
the property of being a continuously ordered extension of (Ra, <), while
an essential property of the real numbers, cannot be said to characterize
the real numbers from an algebraic point of view. For it can be shown
that there are two systems, $(S_1, <_1)$ and $(S_2, <_2)$, with this property
such that $(S_1, <_1) \not\cong (S_2, <_2)$. (In fact, there are many such systems.)
The proof of this is not difficult but giving it here would involve introduc¬
ing various notions from the theory of ordering relations that are not
useful to us otherwise in this text. In contrast, we shall see in the next
section that the real numbers can be completely characterized when the
algebraic structure is considered along with the ordering.
(This is a generalization of the sets $C_a$ used in the proof of 6.5 for $a \in \mathrm{Ra}$.)
Proof. (i) Suppose for all $n \in P$, $n \le a$. Then $a$ is an upper bound for $P$.
But $(K, <)$ is continuously ordered, so $\sup P$ exists in $K$ by 7.9; we put
$b = \sup P$. Since $n \le b$ for all $n \in P$, also $n + 1 \le b$ and then $n \le b - 1$
for all $n \in P$. But $b - 1 < b$, and $b - 1$ is a bound for $P$, contradicting
$b = \sup P$. (ii) follows immediately by finding $n \in P$ with $1/a < n$.
To prove (iii), we need to find $m, n \in I$ such that $a < m/n < b$. This
7.13 Theorem.
(i) If $a \in K$ then $C_a \in U(\mathrm{Ra})$.
(ii) If $a, b \in K$ and $a < b$ then $C_a \prec C_b$.
(iii) If $A \in U(\mathrm{Ra})$ then $A = C_a$ for a unique $a \in K$.
Proof. (i) Using 7.14(i) and the definition of $C_a$, we see that we must
show for any $a, b \in K$ and any $u \in \mathrm{Ra}$:
(1) $a + b < u$ if and only if there exist $x, y \in \mathrm{Ra}$ such that $a < x$,
$b < y$, and $x + y = u$.
Clearly if the right side of this equivalence holds, so does the left. In the
other direction, we first find an $x \in \mathrm{Ra}$ such that $a < x$ and $x + b < u$,
that is, $a < x < u - b$; this is possible by 7.12(iii). If we then take
$y = u - x$, we have $b < y$ and $x + y = u$. Before proving (ii), consider
the intuitive picture, let us say for $C_0 \prec A$:
Figure 7.8
If the right side holds, we have $ab < xy$, so $ab < u$. Suppose that $ab < u$.
If $b = 0$, we need only find $x \in \mathrm{Ra}$ with $a < x$, which is possible by
7.12(i). For then if we take $y = u/x$, $0 < y$ since both $x$, $u$ are positive
and $xy = u$. Suppose that $0 < b$. Then there is an $x \in \mathrm{Ra}$ such that
$a < x$ and $xb < u$, that is, $x < u/b$, by 7.12(iii). Then again if we take
$y = u/x$, we have $xy = u > xb$, so that from $x > 0$ we conclude $y > b$.
The proof of the remaining parts is now direct. For example in (v),
writing $A = C_a$, $B = C_b$, we have by definition $A \circ B = C_{ab}$, $\ominus A = C_{-a}$,
$(\ominus A) \circ B = C_{(-a)b}$, and $\ominus[(\ominus A) \circ B] = C_{-((-a)b)} = C_{ab}$. Similarly for
(vi), (vii).
Proof. We can assume, without loss of generality, that both fields contain
Ra as a subfield, by 6.12. We call one $(K, +, \cdot, <, 0, 1)$, the other
$(K', +', \cdot', <', 0', 1')$. But both fields are ≅ to $(U(\mathrm{Ra}), \oplus, \circ, \prec, C_0, C_1)$,
by 7.16. We are justified in using this same field in both cases, since 7.15
shows how $A \oplus B$, $A \circ B$ can be defined for all $A, B \in U(\mathrm{Ra})$ in terms of
Ra only. For we know that $U(\mathrm{Ra})$ is simply ordered by $\prec$, so that for
any $A \in U(\mathrm{Ra})$, $C_0 \preceq A$ or $A \prec C_0$.
We have now achieved our first main object, and we turn to the second,
a proof that there exists at least one continuously ordered field. From
the preceding this should seem rather a straightforward matter. For we
have already proved in 7.5(i) that $(U(\mathrm{Ra}), \prec)$ is a continuously ordered
system. Thus all that would be left is to show that if we take 7.15(i), (ii),
(iv)-(vii) as constituting a definition of $\oplus$, $\ominus$, $\circ$ on $U(\mathrm{Ra})$, $(U(\mathrm{Ra}), \oplus,
\circ, \prec, C_0, C_1)$ is an ordered field. We know this must be so by 7.16 if there
exists any continuously ordered field $(K, +, \cdot, <, 0, 1)$ at all. However,
in a proof of this statement we would not be able to take advantage, as
we did earlier, of the presumed properties of such a $K$, but must rather
work out from scratch the fact that $U(\mathrm{Ra})$ possesses all the desired
properties. This is a long and tedious task which provides as reward
little further insight into the matter at hand. We shall, instead, now turn
to another approach to the existence proof which is based on the ideas of
Cauchy.
Limits. Up to now our guiding idea has been to think of real numbers
as greatest lower bounds of sets A of rational numbers, in particular of
upper sections—or, dually, as least upper bounds of lower sections. If
we return to our discussion of measurement in Section 7.1, it is equally
natural to think of real numbers as limits of sequences (a0, . . . , an, . . .)
of rational numbers. For intuitively, even if a given length is not rational,
we can approximate it “as closely as we please ” by rational lengths, simply
by taking finer and finer subdivisions of the unit of measurement. The
decimal representation of a real number, such as $\sqrt{2} = 1.4142\ldots$, is
intended to signify that the terms of a certain sequence of rational numbers
1, 1.4, 1.41, 1.414, 1.4142, . . . approximate $\sqrt{2}$ in this sense. Of
course, this is by no means the only sequence which has this property;
this particular sequence is obtained by approximating the number from
below using successive subdivisions by tenths. We could also consider
sequences which approximate from above, or which alternate above and
below the number, or which are obtained by other types of subdivision,
etc. Thus, to begin with, we will think of real numbers as being limits of
arbitrary kinds of sequences of rational numbers, and only later inquire
as to what special kinds of sequences already give all real numbers. The
main questions that we shall have to answer are: first, what is a necessary
and sufficient condition for a sequence of rational numbers to have a real
number as limit and, second, what is a necessary and sufficient condition
for two sequences to be equivalent in the sense that they have the same
limit? Then to construct the real numbers, we will be able to abstractly
identify them with equivalence sets of such sequences.
As is usual in analysis, we use such letters as $\epsilon$, $\delta$ in connection with
statements concerning approximation by smaller and smaller numbers.
We continue to assume in the following that (K, +, •, <, 0, 1) is any
continuously ordered field which extends the rationals. Before turning to
notions connected with limits of sequences, we want to re-express the
conditions for a number a to be inf A or sup A in “approximation language. ”
a b
Figure 7.9
the distance between the two points. (This is first realized for $a > 0$
and then extended to arbitrary $a$, $b$ by considerations of signs.) In general,
if the relation of inequality is not specified, we use $|b - a|$ to measure the
distance; for $|b - a| = b - a$ if $a < b$, and $|b - a| = -(b - a) =
a - b$ if $b < a$. We have defined the absolute-value function for any
ordered integral domain in 4.19 and verified its usual properties in 4.20.
In particular we can apply these to $K$.
7.20 Lemma. Suppose that $x_k \in K$ for all $k \ge 0$. Then the sequence
$\langle x_k \rangle_{k \ge 0}$ has at most one limit in $K$.
Proof. Suppose that $a$, $b$ are both limits of $\langle x_k \rangle_{k \ge 0}$. For any $\epsilon > 0$
there exist $m_1$, $m_2$ such that $|x_k - a| < \epsilon$ for $k \ge m_1$ and $|x_k - b| < \epsilon$
for $k \ge m_2$. Let $k \ge \max(m_1, m_2)$. Then
lim(1 + (-l)‘i)= !■
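7.24 Definition. Suppose that $\langle x_k \rangle$, $\langle y_k \rangle$ are sequences in $K$. Then $\langle y_k \rangle$
is said to be a subsequence of $\langle x_k \rangle$ if for a certain sequence $\langle j_k \rangle$ of
integers, $0 \le j_0 < j_1 < \cdots < j_k < \cdots$, we have $y_k = x_{j_k}$ for each $k \ge 0$.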
7.26 Theorem. Every bounded sequence (xk) in K has at least one con¬
vergent subsequence.
and call $[b, c]$ a (closed) interval. The mid-point of this interval, in geometric
(3) if there are infinitely many $k$ such that $x_k \in [b, c]$ then the same
holds for at least one of the intervals $[b, (b + c)/2]$ and $[(b + c)/2, c]$.
(4) Suppose that $b_n$, $c_n$ are given. We take $b_{n+1} = b_n$ and $c_{n+1} =
(b_n + c_n)/2$ if there are infinitely many $k$ such that $x_k \in [b_n,
(b_n + c_n)/2]$. Otherwise we take $b_{n+1} = (b_n + c_n)/2$ and
$c_{n+1} = c_n$.
For, we first see from (ii) that if $n \le m$ then $b_n \le b_m$ and $c_m \le c_n$.
Hence for any $m$ we have $b_n \le b_m < c_m$ if $n \le m$ and $b_n < c_n \le c_m$
if $n > m$, using (5)(i). Thus in any case, given $m$ this shows $b_n < c_m$
for all $n$, that is, $c_m$ is an upper bound for $B$. The last statement is proved
similarly. It follows from (6) and 7.9 that $\sup B$ and $\inf C$ exist. In fact,
since we have chosen smaller and smaller intervals $[b_n, c_n]$, we have
For an exact proof, let $b = \sup B$, $c = \inf C$. Then $b \le c_n$ for all $n$
by (6) and definition of sup. But then, by definition of inf, $b \le c$. Since
7.2] CONTINUOUSLY ORDERED FIELDS 245
(8) Let $j_0 = 0$. Given $j_n$, let $j_{n+1}$ be the least $k$ such that $j_n < k$ and
$x_k \in [b_{n+1}, c_{n+1}]$. Let $y_n = x_{j_n}$ for all $n \ge 0$.
That there exists at least one such $k$ larger than any given $j$ is immediate
from (5)(iv). Thus this is a well-defined recursive procedure. By choice,
$j_n < j_{n+1}$ for all $n$, so that
Finally, let
(10) $a = \sup B$.
Then
(11) $\lim_{k \to \infty} y_k = a$.
For take $\epsilon = 1$. Then for some $m$, $|x_k - x_l| < 1$ for all $k, l \ge m$. But
$|x_k| \le |x_k - x_l| + |x_l|$ (by the triangle inequality applied to $|x_k| =
|(x_k - x_l) + x_l|$), so $|x_k| < 1 + |x_l|$ for all $k, l \ge m$. In particular,
Now consider any $\epsilon > 0$. By (2) there is an $m_1$ such that $|x_{j_k} - a| < \epsilon/2$
if $k \ge m_1$. Also there is an $m_2$ with $|x_l - x_j| < \epsilon/2$ if $j, l \ge m_2$. Let
$m = \max(m_1, m_2)$. Then, since $j_m \ge m$, if also $l \ge m$ we have
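7.28 Definition.
(i) For each $a \in K$ we put $\bar{a} = \langle x_k \rangle$ where $x_k = a$ for all $k \ge 0$.
For any $\langle x_k \rangle$, $\langle y_k \rangle$ we put:
(ii) $\langle x_k \rangle + \langle y_k \rangle = \langle x_k + y_k \rangle$,
(iii) $\langle x_k \rangle - \langle y_k \rangle = \langle x_k - y_k \rangle$,
(iv) $\langle x_k \rangle \cdot \langle y_k \rangle = \langle x_k \cdot y_k \rangle$,
(v) $\langle x_k \rangle / \langle y_k \rangle = \langle x_k / y_k \rangle$ if $y_k \neq 0$ for all $k$, and
(vi) $\langle y_k \rangle < \langle x_k \rangle$ if for some $m$ and $\epsilon > 0$, $\epsilon < x_k - y_k$ for all $k \ge m$.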
Proof. (i) By definition, given $\epsilon_1 > 0$, we can find $m$ such that for all
$k, l \ge m$, $|x_k - x_l| < \epsilon_1$ and $|y_k - y_l| < \epsilon_1$. Given $\epsilon > 0$, take
$\epsilon_1 = \epsilon/2$ and $m$ given by this $\epsilon_1$. Then for $k, l \ge m$,
Also,
$|x_k y_k - x_l y_l| = |(x_k y_k - x_l y_k) + (x_l y_k - x_l y_l)| \le |x_k - x_l|\,|y_k| + |y_k - y_l|\,|x_l|$.
Again pick $c > 0$ with $|x_i| < c$, $|y_i| < c$ for all $i$. Further, since $\lim_{k \to \infty}
y_k \neq 0$, there must be some $d > 0$ such that $|y_k| \ge d$ for all $k$. (This is
seen as follows. First, there is $\epsilon > 0$ such that for all $m$, there is some
$k \ge m$ with $|y_k| \ge \epsilon$. But $\langle y_k \rangle$ is fundamental, so there is some $m_2$ such
that for all $k, l \ge m_2$, $|y_k - y_l| < \epsilon/2$. Thus if $k$ is chosen so that
$|y_k| \ge \epsilon$, we must have $|y_l| > \epsilon/2$ for all $l \ge m_2$. Now take $d = \min
(|y_0|, \ldots, |y_{m_2 - 1}|, \epsilon/2)$. Then $d > 0$ since all $y_k \neq 0$, and $|y_k| \ge d$ for
all $k$.) But then for any $\epsilon_1$ and $m$ with $|x_k - x_l| < \epsilon_1$, $|y_k - y_l| < d$
for all $k, l \ge m$, we have
$|x_k y_k - ab| = |(x_k y_k - a y_k) + (a y_k - ab)| \le |x_k - a|\,|y_k| + |a|\,|y_k - b|$.
Then, using $c > 0$ with $|y_k| < c$ for all $k$ and $|a| < c$, we obtain the
desired result for given $\epsilon > 0$ by taking $\epsilon_1 = \epsilon/2c$. In general, $a$ and $b$
take over the role of $x_l$ and $y_l$, respectively, in the proof of (i).
We leave the proof of (iii) and (iv) to the student.
$\lim_{k \to \infty} x_k = a$.
The proof that the various conditions of 4.1 hold for this system is quite
straightforward on the basis of the definition 7.28 of the various opera¬
tions. For example, to prove distributivity, we have
then $\langle x_k \rangle + (-\langle x_k \rangle) = \bar{0}$. The system Fd(Ra) also has the following
property:
Then
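Hence $\langle x_k \rangle \equiv \langle z_k \rangle$ and we see that $\equiv$ is transitive, thus that it is an
equivalence relation. Suppose now that $\langle x_k \rangle \equiv \langle x'_k \rangle$, $\langle y_k \rangle \equiv \langle y'_k \rangle$. Then
and
$\lim_{k \to \infty} ((x_k + y_k) - (x'_k + y'_k)) = \lim_{k \to \infty} ((x_k - x'_k) + (y_k - y'_k)) = 0$,
since $\lim_{k \to \infty} (x_k - x'_k) = \lim_{k \to \infty} (y_k - y'_k) = 0$. Thus $\langle x_k \rangle + \langle y_k \rangle \equiv
\langle x'_k \rangle + \langle y'_k \rangle$. Also $\langle x_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y_k \rangle$ and $\langle x'_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y'_k \rangle$ so
that $\langle x_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y'_k \rangle$. To see the first of these, we consider
$\lim_{k \to \infty} (x_k y_k - x'_k y_k) = \lim_{k \to \infty} (x_k - x'_k) \cdot y_k$. We cannot apply 7.28(ii)
directly, since $\lim_{k \to \infty} y_k$ may not exist in Ra. However, since $\langle y_k \rangle$ is
bounded, if we choose $c \in \mathrm{Ra}$ with $c > 0$ and $|y_k| < c$ for all $k$, then for
any $\epsilon > 0$, we can find an $m$ such that $|x_k - x'_k| \cdot |y_k| < \epsilon$ for all $k \ge m$,
namely an $m$ such that $|x_k - x'_k| < \epsilon/c$ for all $k \ge m$. Using commutativity
of $\cdot$, the other statements follow directly.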
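(6) For each $\langle x_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$ we put $[\langle x_k \rangle]$ = the equivalence set to
which $\langle x_k \rangle$ belongs, with respect to the relation $\equiv$. We denote by
Fd*(Ra) the collection of all such equivalence sets. For $\langle x_k \rangle$,
$\langle y_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$, we put
(a) $[\langle x_k \rangle] \oplus [\langle y_k \rangle] = [\langle x_k \rangle + \langle y_k \rangle]$,
(b) $[\langle x_k \rangle] \circ [\langle y_k \rangle] = [\langle x_k \rangle \cdot \langle y_k \rangle]$, and
(c) $a^* = [\bar{a}]$ for each $a \in \mathrm{Ra}$.
(8) if $X \in \mathrm{Fd}^*(\mathrm{Ra})$ and $X \neq 0^*$ then there exists $\langle x_k \rangle$ such that
$x_k \neq 0$ for all $k$, $\lim_{k \to \infty} x_k \neq 0$ and $X = [\langle x_k \rangle]$.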
Here we take $<$ as defined in 7.29(vi), $\langle y_k \rangle < \langle x_k \rangle$ if and only if for some
$\epsilon > 0$ and $m$, $\epsilon < x_k - y_k$ for all $k \ge m$. It must then be shown that if
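$\langle x_k \rangle \equiv \langle x'_k \rangle$, $\langle y_k \rangle \equiv \langle y'_k \rangle$ then $\langle y_k \rangle < \langle x_k \rangle$ if and only if $\langle y'_k \rangle < \langle x'_k \rangle$. We
leave the proof of this to the reader. This allows us to define, unambiguously,
(11) for $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$, we put $[\langle y_k \rangle] <^* [\langle x_k \rangle]$ if $\langle y_k \rangle < \langle x_k \rangle$.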
Then
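For let $X = [\langle x_k \rangle]$, $Y = [\langle y_k \rangle]$, where $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$ and for some
$\epsilon > 0$ and $m$ we have $\epsilon < x_k - y_k$ for all $k \ge m$. We can also choose $m$
so large that $|x_k - x_m| < \epsilon/4$ and $|y_k - y_m| < \epsilon/4$ for any $k \ge m$. Then
for all $k \ge m$. Similarly $x_k - a > \epsilon/4$ for all $k \ge m$. This shows that
$Y <^* a^* <^* X$.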
The notion of fundamental sequence can be applied to Ra*, leading to
the set Fd(Ra*); note the difference in meaning between this and Fd*(Ra).
Members of the former are sequences of elements from Ra*; members of
the latter are equivalence sets of fundamental sequences from Ra. How¬
ever, we have the following relationship:
For, by (14), the absolute value of the difference, in Fd*(Ra), of any two
elements $x^*_k$ and $x^*_l$ is just $(|x_k - x_l|)^*$. Thus for $\langle x^*_k \rangle$ to be in Fd(Ra*) we
must have that for any $\epsilon^*$ in Ra* with $0^* <^* \epsilon^*$, there exists $m$ such that
$(|x_k - x_l|)^* <^* \epsilon^*$ for all $k, l \ge m$. By (14) this is equivalent to $\langle x_k \rangle \in
\mathrm{Fd}(\mathrm{Ra})$.
We can further consider the question of whether, for $\langle x^*_k \rangle \in \mathrm{Fd}(\mathrm{Ra}^*)$,
$\lim_{k \to \infty} x^*_k$ exists in Fd*(Ra). In fact we have the following:
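Let $X = [\langle x_k \rangle]$. What must be checked here is that for any $\epsilon^*$ in Ra*
with $0^* <^* \epsilon^*$ there exists an $m$ such that the difference between $x^*_l$
and $X$ in absolute value is $<^* \epsilon^*$ for all $l \ge m$, or equivalently, that for
any $\epsilon$ in Ra with $0 < \epsilon$ there exists an $m$ such that $|\bar{x}_l - \langle x_k \rangle| < \bar{\epsilon}$ for
all $l \ge m$. Since $|\langle y_k \rangle| = \langle |y_k| \rangle$ for any sequence, we have $|\bar{x}_l - \langle x_k \rangle| =$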
$\langle |x_l - x_k| \rangle_{k \ge 0}$ for any given $l$. By the definition of $<$, we must thus show
that, given $\epsilon > 0$, there exists $m$ such that for any $l \ge m$ there exist
$\epsilon_1 > 0$ and $m_1$ such that $\epsilon_1 < \epsilon - |x_l - x_k|$, that is, $|x_k - x_l| < \epsilon - \epsilon_1$,
for all $k \ge m_1$. In fact, since $\langle x_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$ by (16), we can find $m$ such
that $|x_k - x_l| < \epsilon/2$ for all $k, l \ge m$; hence we can choose $\epsilon_1 = \epsilon/2$ and
$m_1 = m$ to satisfy the preceding.
Now by the general result (2:4-9) we can find a set K, define operations
+, • on K and a relation < in K, as well as a function F on K, satisfying
the following conditions
From these properties we shall now deduce our theorem, by showing that
$0 \le m \le 2^k$. Put $z_{k,m} = b + m(a - b)/2^k$. Whenever $z_{k,m} \in A$ and
$m < m' \le 2^k$, also $z_{k,m'} \in A$, since $z_{k,m} < z_{k,m'}$. But $z_{k,0} \notin A$. Hence
if $m_k$ is the least $m$ such that $z_{k,m} \in A$, we have $m_k > 0$ and $z_{k,m_k - 1} \notin A$.
Thus, if we set $x_k = z_{k,m_k - 1}$ and $y_k = z_{k,m_k}$ for each $k$, we have $x_k \notin A$,
$y_k \in A$ and $y_k - x_k = 1/2^k$ for all $k$. Furthermore, since we are performing
successive divisions by halves, we see that if $k > l$, then $x_l \le x_k$
and $y_k \le y_l$. Then $\langle x_k \rangle \in \mathrm{Fd}(\mathrm{Ra})$. For, first, each $x_k \in \mathrm{Ra}$ since we
chose $a, b \in \mathrm{Ra}$. Next, consider any $\epsilon > 0$, and pick $m$ with $1/2^m < \epsilon$.
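The choice of $x_k$ and $y_k$ by successive halving can be carried out mechanically when membership in $A$ is decidable. The sketch below is an illustration only: it assumes $A$ is given as a Python predicate that is upward closed on the rationals, with $a \in A$ and $b \notin A$, and the function name `dyadic_pair` is ours. It takes $A = \{r : r > 0,\ r^2 > 2\}$, $a = 2$, $b = 1$ (so that $a - b = 1$), and finds the least $m$ with $z_{k,m} \in A$ by bisection on $m$.

```python
from fractions import Fraction

def dyadic_pair(in_A, a, b, k):
    """Return (x_k, y_k) = (z_{k, m_k - 1}, z_{k, m_k}), where
    z_{k, m} = b + m (a - b) / 2^k and m_k is the least m with z_{k, m} in A.

    Assumes b is not in A, a is in A, and A is upward closed.
    """
    step = (a - b) / 2**k
    lo, hi = 0, 2**k              # invariant: z_{k,lo} not in A, z_{k,hi} in A
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if in_A(b + mid * step):
            hi = mid
        else:
            lo = mid
    return b + lo * step, b + hi * step

def in_A(r):
    return r > 0 and r * r > 2

a, b = Fraction(2), Fraction(1)
for k in range(5):
    x_k, y_k = dyadic_pair(in_A, a, b, k)
    print(k, x_k, y_k, y_k - x_k)
```

The printed differences are $1/2^k$, and the $x_k$ never decrease while the $y_k$ never increase, as stated above.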
Of course, from now on we can apply any of our preceding results about
arbitrary continuously ordered systems and fields to the real numbers.
Calculate $\limsup_{k \to \infty} x_k$ and $\liminf_{k \to \infty} x_k$ for $x_k = (-1)^k [1 + 1/(k + 1)]$.
7. Prove the following, for any bounded sequence $\langle x_k \rangle$ of real numbers.
(i) If for each $k \ge 0$ we put $z_k = \sup \{x_l : l \ge k\}$, then $\lim_{k \to \infty} z_k$ exists
and $= \limsup_{k \to \infty} x_k$.
(ii) $\limsup_{k \to \infty} x_k = a$ if and only if
(a) for each $\epsilon > 0$ there exists $m$ such that $x_k < a + \epsilon$ for all $k \ge m$;
(b) for each $\epsilon > 0$ and $m$ there exists $k \ge m$ such that $x_k > a - \epsilon$.
Formulate corresponding results for $\liminf_{k \to \infty} x_k$.
8. Prove that if $\langle x_k \rangle$ is a bounded sequence of real numbers, then $\lim_{k \to \infty} x_k$
exists if and only if $\limsup_{k \to \infty} x_k = \liminf_{k \to \infty} x_k$.
9. Prove that if $\langle x_k \rangle$ is a bounded sequence of real numbers which is
nondecreasing, i.e., $x_k \le x_{k+1}$ for all $k$, then $\lim_{k \to \infty} x_k$ exists.
10. Show that for each $d \in \mathrm{Re}$, $d > 0$, there exists $a \in \mathrm{Re}$ with $a^2 = d$.
[Hint: Consider $\sup A$ where $A = \{x : x \in \mathrm{Re} \text{ and } x^2 < d\}$.]
$$\sum_{i=0}^{n-1} x_i + \sum_{i=n}^{\infty} x_i = \sum_{i=0}^{\infty} x_i.$$
7.36 Theorem. Suppose that $b \in P$, $b > 1$. Then for any $a \in \mathrm{Re}$, $a \ge 0$,
there exists a sequence $\langle m_i \rangle$ of integers with $m_0 \ge 0$, $0 \le m_i < b$
for all $i > 0$ and $a = \sum_{i=0}^{\infty} m_i \cdot b^{-i}$.
Proof. One way to find such a sequence is to "creep up on $a$" from the
left. Thus, for example for $b = 10$, we first take $m_0$ as the largest integer
$\le a$, then $m_1/10$ as the largest tenth in $a - m_0$, then $m_2/10^2$ as the
(1) $\sum_{i=0}^{k} m_i b^{-i} \le a$.
and
$m_{k+1} b^{-(k+1)} \le a - \sum_{i=0}^{k} m_i b^{-i}$.
This completes the induction. Since (1) holds for all k, we also have by 7.35
Hence
for this also holds for $k = 0$ by (2). Again by (3) we have $m_k b^{-k} \le
a - \sum_{i=0}^{k-1} m_i b^{-i}$ for any $k > 0$; but then by (6) applied to $k - 1$,
$m_k b^{-k} < b^{-(k-1)}$; so $m_k < b$. This proves (5). To prove that $a =
\sum_{i=0}^{\infty} m_i b^{-i}$ we need to show that
(7) for any $\epsilon > 0$ there exists $n$ such that $(a - \sum_{i=0}^{k} m_i b^{-i}) < \epsilon$
for all $k \ge n$.
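The "creeping up from the left" used in this proof is easy to imitate for rational $a$, where every step is exact. The following is a minimal sketch under that assumption (the theorem itself concerns arbitrary nonnegative reals); the names `base_b_digits` and `partial_sum` are ours.

```python
from fractions import Fraction
from math import floor

def base_b_digits(a, b, n):
    """Greedy digits [m_0, m_1, ..., m_n] of a rational a >= 0 in base b:
    m_0 is the largest integer <= a, and each later m_i is the largest
    integer with m_i * b^(-i) <= what remains."""
    a = Fraction(a)
    m0 = floor(a)
    digits = [m0]
    rest = a - m0                 # 0 <= rest < 1
    for _ in range(n):
        rest *= b
        d = floor(rest)           # 0 <= d < b
        digits.append(d)
        rest -= d
    return digits

def partial_sum(digits, b):
    """Sum of m_i * b^(-i) over the given digits."""
    return sum(Fraction(m, b**i) for i, m in enumerate(digits))

a = Fraction(22, 7)
digits = base_b_digits(a, 10, 3)
print(digits, a - partial_sum(digits, 10))
```

For each $k$ the difference $a - \sum_{i=0}^{k} m_i b^{-i}$ computed this way is nonnegative and smaller than $b^{-k}$, which is the content of (1) and (6).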
7.38 Theorem. If $0 \le y_i \le x_i$ for all $i \ge 0$, where $x_i$, $y_i$ are real, and if
$\sum_{i=0}^{\infty} x_i$ converges then $\sum_{i=0}^{\infty} y_i$ converges and $0 \le \sum_{i=0}^{\infty} y_i \le
\sum_{i=0}^{\infty} x_i$.
$$a - \sum_{i=0}^{k} m_i b^{-i} = \sum_{i=k+1}^{\infty} m_i b^{-i}$$
for any $k \ge 0$. Since $0 \le m_i < b$ for each $i > 0$, we have
using (among other facts) 7.37, 7.38. (On what grounds can we factor out
$(b - 1)b^{-(k+1)}$?) Thus for any $k \ge 0$, $a - \sum_{i=0}^{k} m_i b^{-i} \le b^{-k}$. [Note
that we verified this as (6) in the proof of 7.36, but only for the special
sequence considered there.] Now if $a - \sum_{i=0}^{k} m_i b^{-i} = b^{-k}$ for any $k$,
we would have $a$ rational and of the form $a = c/b^k$, contrary to hypothesis.
Hence $a - \sum_{i=0}^{k} m_i b^{-i} < b^{-k}$ for all $k$; of course, the same holds if we
replace $m_i$ by $m'_i$. Now suppose that there is some $k$ with $m_k \neq m'_k$;
let $n$ be the least such $k$, so that $m_i = m'_i$ for $i < n$ and, say, $m'_n < m_n$.
Then $\sum_{i=0}^{n} m'_i b^{-i} < \sum_{i=0}^{n} m_i b^{-i} \le a$, and
But then $0 < m_n - m'_n < 1$, which is impossible for distinct integers.
7.3] INFINITE SERIES; REPRESENTATIONS OF REAL NUMBERS 261
We leave it to the reader to verify that there are exactly two representa¬
tions of the form 7.36 for each a of the form c/bk.
The reader is probably familiar with the fact, for the base b = 10,
that every rational number has an eventually repeating or periodic repre¬
sentation. This is, in fact, a characteristic of the representation of rational
numbers in any base $b > 1$. By this we mean that $a = \sum_{i=0}^{\infty} m_i b^{-i}$ for
some $m_i$ which satisfy the conditions of 7.36, and which, for a certain
$n$ and $q$, satisfy $m_{i+q} = m_i$ for all $i \ge n$ (period $q$), if and only if $a \in \mathrm{Ra}$.
The details of this are left for the exercises.
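One direction of this statement, that a rational $a$ has an eventually periodic representation, can be seen from long division: only finitely many remainders are possible, so one of them must recur. The sketch below is our own illustration, not the exercise's intended proof; the name `period_of_expansion` is hypothetical, and $a \ge 0$ is assumed. It returns a pair $(n, q)$ with $m_{i+q} = m_i$ for all $i \ge n$.

```python
from fractions import Fraction

def period_of_expansion(a, b):
    """For rational a >= 0 and base b > 1, return (n, q) such that the
    greedy base-b digits satisfy m_{i+q} = m_i for all i >= n."""
    a = Fraction(a)
    den = a.denominator
    r = a.numerator % den         # remainder left after removing m_0
    seen = {}                     # remainder -> number of digits produced so far
    step = 0
    while r not in seen:
        seen[r] = step
        r = (r * b) % den         # the digit m_{step+1} depends only on the old r
        step += 1
    s = seen[r]                   # the remainder after s digits has recurred
    return s + 1, step - s

print(period_of_expansion(Fraction(1, 7), 10))
print(period_of_expansion(Fraction(1, 6), 10))
```

Since the digit $m_{i+1}$ is determined by the remainder after $i$ digits, a repeated remainder forces the digits to repeat from that point on.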
There are other kinds of representations of real numbers which are also
interesting. One is the so-called continued-fractions representation. These
are representations of real numbers as limits of sequences
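$$m_0,\quad m_0 + \cfrac{1}{m_1},\quad m_0 + \cfrac{1}{m_1 + \cfrac{1}{m_2}},\quad m_0 + \cfrac{1}{m_1 + \cfrac{1}{m_2 + \cfrac{1}{m_3}}},\ \ldots$$
$$m_0,\quad m_0 + \frac{1}{m_1},\quad m_0 + \frac{1}{m_1+}\,\frac{1}{m_2},\quad m_0 + \frac{1}{m_1+}\,\frac{1}{m_2+}\,\frac{1}{m_3},\ \ldots$$
$$m_0 + \frac{1}{m_1+}\,\frac{1}{m_2+} \cdots \frac{1}{m_k+} \cdots$$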
For example (but only for very special reasons), the following suggests
such a representation of $\sqrt{5}$, i.e., of a solution $x$ of $x^2 - 5 = 0$. We
write $x^2 - 4 = 1$, so $(x - 2) = 1/(2 + x)$, $x = 2 + 1/(2 + x)$. Then
by successively substituting this expression for $x$ we get
$$x = 2 + \cfrac{1}{4 + \cfrac{1}{2 + x}}, \qquad x = 2 + \cfrac{1}{4 + \cfrac{1}{4 + \cfrac{1}{2 + x}}}.$$
$$2 + \frac{1}{4+}\,\frac{1}{4+}\,\frac{1}{4+} \cdots$$
as taken in the above sense, but the statement is indeed correct. Again
we leave to the reader the study of some of the introductory ideas dealing
with continued fractions.
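As a numerical check on the suggested representation of $\sqrt{5}$, one can evaluate the finite continued fractions $2,\ 2 + 1/4,\ 2 + 1/(4 + 1/4), \ldots$ exactly. This is a sketch only; the function name `convergent` is ours, and the computation proves nothing about the limit, although each value $p/q$ obtained satisfies $|p^2 - 5q^2| = 1$.

```python
from fractions import Fraction

def convergent(terms):
    """Value of m_0 + 1/(m_1 + 1/(m_2 + ... + 1/m_k)) for terms [m_0, ..., m_k]."""
    value = Fraction(terms[-1])
    for m in reversed(terms[:-1]):
        value = m + 1 / value
    return value

for k in range(5):
    c = convergent([2] + [4] * k)
    print(c, c * c - 5)           # the squares approach 5 from alternate sides
```

The alternating sign of $c^2 - 5$ reflects the fact that these approximations lie alternately below and above $\sqrt{5}$.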
Power series. In particular, infinite series play their main role in analysis
as a means of representing and investigating a large class of functions.
Algebraically, the simplest functions (of one argument) to deal with are
the polynomial functions (5.1, 5.9). It is natural to generalize these, in
the real numbers, to functions definable in the form $F(x) = \sum_{i=0}^{\infty} a_i x^i$ or,
as is usually said, by means of a power series. However, in contrast to
polynomial functions, such a representation need not be meaningful for
all values of $x$, simply because the series need not converge for every value
of $x$. For example, we have already seen in 7.37 that the power series
$\sum_{i=0}^{\infty} x^i$ converges if and only if $|x| < 1$. In the cases that it converges
it has the same value as the function $F(x) = 1/(1 - x)$. But the behaviors
of the function and the series are otherwise quite different, for the function
is defined for all values of $x \neq 1$. (Recall the remarks concerning an
uncritical use of the relationship between this series and function which
we made in Chapter 1.)
The following theorem will provide us with a condition for testing the
convergence of a power series which will be sufficient for our purposes,
though stronger results can be obtained. (We leave the proof of one of
these to the exercises.)
Proof. In (i), $0 \le c$. Choose d with c < d < 1 and then let e = d - c.
By hypothesis there is an m such that $\bigl|\,|b_{k+1}/b_k| - c\,\bigr| < e$ for all $k \ge m$;
hence $|b_{k+1}/b_k| < d$ for all $k \ge m$. Then $|b_{k+1}| < d|b_k|$ for all such k.
Thus $|b_{m+1}| < d|b_m|$, $|b_{m+2}| < d|b_{m+1}| < d^2|b_m|$, and we see in general,
by induction on i, that
$$|b_{m+i}| \le d^i |b_m|.$$
Now the series $\sum_{i=0}^{\infty} d^i$ converges by 7.37 since |d| < 1. Hence so does
the series $|b_m| \sum_{i=0}^{\infty} d^i = \sum_{i=0}^{\infty} |b_m| d^i$. It follows by 7.38 that
$\sum_{i=m}^{\infty} |b_i| = \sum_{i=0}^{\infty} |b_{m+i}|$ also converges. Hence we see that
$\sum_{i=0}^{\infty} |b_i|$ converges, so that $\sum_{i=0}^{\infty} b_i$ converges by 7.34(iv).
To prove (ii) we find m such that $|b_{k+1}/b_k| > 1$ for all $k \ge m$. But
then it is seen that $|b_{m+i}| \ge |b_m|$ for all i. Thus for $e = |b_m|$ we have
e > 0 and $|b_k| \ge e$ for all $k \ge m$. But if $\sum_{i=0}^{\infty} b_i$ converged we should
have $\lim_{k\to\infty} b_k = 0$ by 7.34(ii), in contradiction to the preceding.
It is seen from the proof that when c < 1, we have the (in general)
stronger result that $\sum_{i=0}^{\infty} |b_i|$ converges. The reason is simply that
$$\left|\frac{|b_{k+1}|}{|b_k|}\right| = \left|\frac{b_{k+1}}{b_k}\right|.$$
Note that the test gives no information when c = 1. There are many
examples of series $\sum_{i=0}^{\infty} b_i$ with $\lim_{k\to\infty} |b_{k+1}/b_k| = 1$, some of which
converge while others diverge. For example, $\sum_{i=0}^{\infty} 1$ diverges, and it can
be shown that $\sum_{i=1}^{\infty} (1/i)$ also diverges while $\sum_{i=1}^{\infty} (1/i^2)$ converges.
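A small numerical experiment makes the inconclusive case c = 1 concrete. The Python sketch below is illustrative only (the helper name is ours): for both $\sum 1/i$ and $\sum 1/i^2$ the ratio of successive terms is close to 1, yet the partial sums behave quite differently.

```python
def ratio_and_partial_sum(term, n):
    """Return (|b_{n+1} / b_n|, b_1 + ... + b_n) for the series with terms term(i), i >= 1."""
    ratio = abs(term(n + 1) / term(n))
    partial = sum(term(i) for i in range(1, n + 1))
    return ratio, partial

harmonic = ratio_and_partial_sum(lambda i: 1 / i, 10_000)
squares = ratio_and_partial_sum(lambda i: 1 / i**2, 10_000)
```

The partial sums of the first series keep growing as n increases, while those of the second stay below 2. Such a computation suggests the behavior but does not prove divergence or convergence.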
Proof.
as given in 5.4 (ii). Of course in that case we were dealing with essentially
finite (eventually zero) sequences, and no questions of convergence were
involved. Examples can be given to show that the corresponding operation
on infinite series does not always lead from convergent series to convergent
series. However we can obtain the following result.
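7.44 Theorem. Suppose that $\sum_{i=0}^{\infty} |a_i|$ and $\sum_{i=0}^{\infty} b_i$ converge. Then so
also does $\sum_{i=0}^{\infty} \bigl(\sum_{j=0}^{i} a_j b_{i-j}\bigr)$ and we have
$$\Bigl(\sum_{i=0}^{\infty} a_i\Bigr)\Bigl(\sum_{i=0}^{\infty} b_i\Bigr) = \sum_{i=0}^{\infty}\Bigl(\sum_{j=0}^{i} a_j b_{i-j}\Bigr).$$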
Proof. Let $s_k = \sum_{i=0}^{k} a_i$, $t_k = \sum_{i=0}^{k} b_i$, $u_k = \sum_{i=0}^{k} \bigl(\sum_{j=0}^{i} a_j b_{i-j}\bigr)$.
Further, let $\bar a = \sum_{i=0}^{\infty} |a_i|$, $a = \sum_{i=0}^{\infty} a_i$ [using 7.34(iv)], and $b = \sum_{i=0}^{\infty} b_i$.
We can assume $\bar a > 0$, for otherwise $a_i = 0$ for all i and the desired
conclusion is obvious. The conclusion of the theorem can now be stated as
We wish to prove this result by using the facts that $\lim_{k\to\infty} d_k = 0$ and
$\lim_{k\to\infty} \bigl(\sum_{i=0}^{k} |a_i|\bigr) = \bar a$. Suppose we are given an e > 0. We know that
the sequence of $|d_k|$ is bounded, say $|d_k| < d$ for all k, where d > 0.
Choose m so that $|d_k| < e/2\bar a$ for all k > m. We can also choose m large
enough to satisfy $\sum_{i=l}^{k} |a_i| < e/2d$ for all k > l > m. Then for any
k > 2m,
$$\Bigl|\sum_{i=0}^{k} a_i d_{k-i}\Bigr| \le \sum_{i=0}^{m} |a_i|\,|d_{k-i}| + \sum_{i=m+1}^{k} |a_i|\,|d_{k-i}| \le \frac{e}{2\bar a}\sum_{i=0}^{m} |a_i| + d \sum_{i=m+1}^{k} |a_i| < \frac{e}{2\bar a}\cdot \bar a + d \cdot \frac{e}{2d} = \frac{e}{2} + \frac{e}{2} = e.$$
From this we can conclude that (3), and hence the theorem, is proved.
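$$E(x) \cdot E(y) = \sum_{i=0}^{\infty}\Bigl(\sum_{j=0}^{i} \frac{x^j y^{i-j}}{j!\,(i-j)!}\Bigr) = \sum_{i=0}^{\infty} \frac{1}{i!}\Bigl(\sum_{j=0}^{i} \frac{i!}{j!\,(i-j)!}\, x^j y^{i-j}\Bigr) = \sum_{i=0}^{\infty} \frac{(x+y)^i}{i!},$$
so that here the right-hand side is just E(x + y). We leave the proofs of
the remaining parts to the reader.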
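The product formula of 7.44 and the identity for E can be checked numerically on truncated series. The following Python sketch is an illustration under the assumption that E(x) denotes $\sum_{i=0}^{\infty} x^i/i!$, as the display above indicates; the function names and the sample values of x and y are our own.

```python
from math import exp, factorial

def cauchy_product(a, b):
    """Coefficients c_i = sum over j <= i of a_j * b_{i-j}, for i below the shorter length."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n)]

def exp_terms(x, n):
    """First n terms x**i / i! of the series E(x)."""
    return [x**i / factorial(i) for i in range(n)]

x, y, n = 0.5, 1.25, 30
product_sum = sum(cauchy_product(exp_terms(x, n), exp_terms(y, n)))
```

Up to rounding, `product_sum` agrees with the sum of the first n terms of E(x + y), since the i-th product coefficient is exactly $(x+y)^i/i!$ by the binomial theorem.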
1. Show that the following two conditions are equivalent for a, b with
$a \in Re$, a > 0, and $b \in P$, b > 1:
(i) a = c/d for some $c, d \in I$, with (c, d) = 1, such that whenever p is
prime and $p \mid d$ then $p \mid b$;
(ii) $a = e/b^k$ for some $e, k \in I$.
7.4] POLYNOMIALS AND CONTINUOUS FUNCTIONS 267
2. Prove that each real number of the form $c/b^k$, $c, k \in I$, has exactly two
representations of the form 7.36.
3. Prove Theorem 7.37(i), (ii).
4. (i) Show that if a real number a has a periodic representation to base b,
that is, $a = \sum_{i=0}^{\infty} m_i b^{-i}$ where for some n, q, $m_{i+q} = m_i$ for all i > n,
then $a \in Ra$. [Hint: Consider $(b^q a) - a$.]
(ii) Conversely, to show that every positive rational number a = c/d
($c, d \in P$) has a periodic representation to the base b, let
$$a = m_0 + \frac{1}{m_1+}\,\frac{1}{m_2+}\cdots\frac{1}{m_k}$$
$$a = m_0 + \frac{1}{m_1+}\,\frac{1}{m_2+}\cdots\frac{1}{m_k}\ ?$$
Then if any such curve is above the horizontal axis at one point, below at
another, it should cross the axis somewhere between the two points. More
precisely: if for some a, b, f(a) > 0 and f(b) < 0, there should be some x
between a and b with f(x) = 0. Of course, to the uninstructed eye, this
looks as though it should already occur when we consider the graph of
f in Ra × Ra. However, as we have realized, there are (despite the
density of Ra) “gaps” in Ra through which such a curve could pass.
On the other hand, all such “gaps” are filled in Re, at least when it is re¬
garded as a continuously ordered system. We turn now to a discussion of
the general notion of continuous function (on Re) and to a verification
of the above facts.
Proof. Suppose that a < b. [The proof for a > b proceeds similarly
or can be obtained from the present proof for a < b, using the continuous
function G(x) = —F(x).] We have in mind the graph of F:
It is not excluded that F may have several roots between a and b. We shall
prove the existence of the rightmost root. We define
and then let c = sup A. Then sup A is well defined since $a \in A$ and b
is certainly a bound for A. Clearly $a \le c \le b$. If F(c) = 0 we are
through. Suppose next that F(c) > 0. Let e = F(c). Then by the
continuity of F at c, we can find $\delta > 0$ such that if $|x - c| < \delta$ then
$|F(x) - F(c)| < e$. Since c = sup A, there exists at least one $x \in A$ with
$|x - c| < \delta$ by 7.18(ii). For such x, $F(x) \le 0$, so that
$|F(x) - F(c)| = (F(c) - F(x)) \ge F(c) = e$, contradicting the preceding.
Suppose finally that F(c) < 0. Let e = |F(c)|. Again we find $\delta > 0$ such
that $|F(x) - F(c)| < e$ whenever $|x - c| < \delta$. In this case, consider any x with
$c < x < c + \delta$. Then $x \notin A$ by c = sup A. Hence F(x) > 0. But then
7.49 Theorem. Suppose that F is continuous and that a, b are real numbers
with a < b. Let $A = \{F(x): a \le x \le b\}$; in other words, A is
the range of F restricted to $\{x: a \le x \le b\}$. Then A is bounded
above and below.
7.50 Theorem. Suppose that F is continuous and that a, b are real numbers
with a < b. Then:
(i) there exists at least one number c such that $a \le c \le b$ and
$F(x) \le F(c)$ for all x with $a \le x \le b$, and
(ii) there exists at least one number c such that $a \le c \le b$ and
$F(c) \le F(x)$ for all x with $a \le x \le b$.
Proof. (i) Let $A = \{F(x): a \le x \le b\}$. Since A is bounded above
by the preceding theorem, it has a least upper bound, call it d. Then by
7.18(ii), for each $n \in P$ we can find an $x_n$ such that $a \le x_n \le b$ and
$d - F(x_n) < 1/n$. We apply the Bolzano-Weierstrass theorem again to
find a convergent subsequence $(y_k) = (x_{n_k})$, $n_0 < n_1 < \cdots < n_k < \cdots$,
of the sequence $(x_n)$. Let $\lim_{k\to\infty} y_k = c$. Then $\lim_{k\to\infty} F(y_k) = F(c)$
by 7.47. Clearly, $a \le c \le b$. We show now that d = F(c). Given any
e > 0 we can find $n \in P$ with 1/n < e/2 by the Archimedean property
7.12(ii). Then if k is any integer with $n < n_k$, we have $|d - F(y_k)| =
d - F(x_{n_k}) < 1/n_k < 1/n < e/2$. Since $\lim_{k\to\infty} F(y_k) = F(c)$, we know
that there is an m with $|F(y_k) - F(c)| < e/2$ for all k > m. Hence by
choosing large enough k we obtain
$$|d - F(c)| \le |d - F(y_k)| + |F(y_k) - F(c)| < \frac{e}{2} + \frac{e}{2} = e.$$
Since this is true for any e > 0, it follows that d = F(c) and hence F(c)
is the supremum of A. The proof of (ii) is quite similar.
We say of (i) that F attains or takes on its absolute maximum at c, with
respect to the interval $a \le x \le b$. Similarly in (ii) we say that F attains
its absolute minimum at c for this interval.
As they stand, 7.48 and 7.50 are pure existence results. That is, no
statement is made as to how to locate the numbers c with the given
properties, say by means of some fundamental sequence converging to c.
We shall have somewhat more to say about this question farther on in
this section.
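One familiar way of actually locating a root whose existence 7.48 guarantees is repeated halving of the interval. The Python sketch below is offered only as an illustration and is not the procedure developed in the text; the function name, the tolerance parameter, and the use of floating-point numbers are our own assumptions.

```python
def bisect_root(F, a, b, tol=1e-12):
    """Approximate a root of F in [a, b], assuming a < b and F(a), F(b) of opposite sign."""
    fa, fb = F(a), F(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa > 0) == (fb > 0):
        raise ValueError("F(a) and F(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = F(mid)
        if fm == 0:
            return mid
        if (fm > 0) == (fa > 0):
            a, fa = mid, fm     # the sign change lies in the right half
        else:
            b = mid             # the sign change lies in the left half
    return (a + b) / 2

root = bisect_root(lambda x: x**2 - 2, 1.0, 2.0)
```

The successive midpoints form a sequence converging to some root in the interval; when there are several roots, which one is found depends on the signs encountered.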
Real polynomials and their roots. We now turn to applying these results
to polynomials. This is done by means of the corresponding functions,
which we shall show to be continuous via the next theorem. Its statement
and proof are related to 7.29.
7.51 Theorem. Suppose that $c \in Re$, and that G, H are continuous
functions. Then the function F defined by any one of the following conditions,
for all $x \in Re$, is continuous:
(i) F(x) = c;
(ii) F(x) = x;
(iii) F(x) = G(x) + H(x);
(iv) F(x) = G(x) · H(x).
Proof. (i) and (ii) are trivial. In (iii)-(iv) we consider any real number a.
(iii) Given e > 0, we can find $\delta_1 > 0$, $\delta_2 > 0$ such that
if $|x - a| < \delta_1$ then $|G(x) - G(a)| < e/2$,
if $|x - a| < \delta_2$ then $|H(x) - H(a)| < e/2$.
$$|F(x) - F(a)| = |(G(x) - G(a)) + (H(x) - H(a))| \le |G(x) - G(a)| + |H(x) - H(a)|,$$
Let $M_1 = \max(|G(a)|, 1)$. Then $M_1 > 0$. We can find $\delta_2 > 0$ such that
By 7.49 the values H(x) for $|x - a| < \delta_2$ are bounded from above and
below so that we can find an $M_2$ such that
Let $\delta$ be the smaller of $\delta_1$, $\delta_2$. Thus whenever $|x - a| < \delta$ we also have
$|H(x)| < M_2$ and hence $|G(x) - G(a)|\,|H(x)| < e/2$ and
$|G(a)|\,|H(x) - H(a)| < e/2$, so that, finally, $|F(x) - F(a)| < e$.
We could also obtain continuity of F(x) = G(x)/H(x), provided
$H(x) \ne 0$ for all x. However, this is not needed here.
7.52 Corollary. If $f(\xi) \in Re[\xi]$, that is, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where each
$a_i$ is real, then the associated polynomial function $f(x) = \sum_{i=0}^{n} a_i x^i$
is continuous.
Proof. By induction on i from 7.51 (ii), (iv) we see that each of the
functions $g_i(x) = x^i$ is continuous; then so is $f_i(x) = a_i x^i$ by 7.51 (i), (iv).
Then the result here follows by induction on n using 7.51 (iii).
This theorem suggests that also functions defined by power series,
$F(x) = \sum_{i=0}^{\infty} a_i x^i$, should be continuous (at all points for which they are
defined). However, 7.51 cannot be applied directly to obtain this result,
and some additional considerations are needed. These are pursued in the
exercises (3, 4 below). In particular, it will be seen that the function E
defined in 7.43 is continuous.
In order to apply 7.48 and 7.52, we next prove the following inequalities.
Proof. Using the fact that $a \ge -|a|$ we see that for any x
Now if x > M then certainly x > 1 so that for any i, $x^i < x^{i+1}$. Hence
if x > M and $0 \le i \le j \le n$ then $x^i \le x^j$. It follows that if x > M then
a Lx 1
tool V.W-
\<hi-l\Xn
Thus
proving the first part of (i). To prove the second part we write
and hence
(4) if x < -M then $(-1)^n f(x) \ge y^n - M y^{n-1} > 0$, for y = -x.
7.55 Theorem. Suppose that $n \in P$ and a > 0. Then there exists a unique
real number x such that x > 0 and $x^n = a$.
Proof. Consider $f(\xi) = \xi^n - a$. Then f(0) < 0 and f(b) > 0 for any
b > a + 1. Hence there is at least one x > 0 with $x^n - a = 0$. Suppose
that $x^n = a$ and $y^n = a$ where 0 < x, 0 < y. If x < y then (by 4.15)
$x^n < y^n$, which is contradictory. Similarly we cannot have y < x, so
x = y.
In general, it can be seen that for a > 0, $x^n = a$ has exactly two
solutions x if n is even, while it has just one solution x if n is odd. If a = 0,
it has only x = 0 as solution. If a < 0 it has no solutions if n is even, and
exactly one solution if n is odd.
Comparison of these results with those of 6.20 and 6.21 already indicates
the substantial advantage which the real numbers give us in determining
the existence of roots of polynomial equations. We do not yet have enough
We shall have more to say at the end of this section about the relation
of these kinds of powers to the ones so far treated. As usual, we write
$\sqrt{a}$ for $\sqrt[2]{a}$. Note that whenever n is even, $a^n \ge 0$ no matter what the
sign of a is, and hence $\sqrt[n]{a^n}$ is defined; we have $\sqrt[n]{a^n} = |a|$.
that is,
$$\Bigl(x + \frac{b}{2a}\Bigr)^2 = \frac{b^2 - 4ac}{4a^2}.$$
Since $(x + b/2a)^2 \ge 0$, $4a^2 > 0$, if the polynomial has any roots we must
have $b^2 - 4ac \ge 0$. Conversely, if $b^2 - 4ac \ge 0$, we can exhibit the
roots as in (ii) and (iii).
As we shall see from our study of the complex numbers, the only prime
polynomials in the real numbers are the linear $a\xi + b$ and the quadratic
$$x_{k+1} = x_k - \frac{x_k^n - a}{n x_k^{n-1}} = \frac{n-1}{n}\, x_k + \frac{a}{n x_k^{n-1}}.$$
It appears from Fig. 7.12 that $\lim_{k\to\infty} x_k = \sqrt[n]{a}$. However, the proof
of this is quite another matter. We illustrate such for n = 2. In this case
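$$x_{k+1} - \sqrt{a} = \frac{1}{2}\Bigl(x_k + \frac{a}{x_k}\Bigr) - \sqrt{a} = \frac{x_k}{2} + \frac{a}{2x_k} - \sqrt{a} = \frac{x_k - \sqrt{a}}{2} - \frac{\sqrt{a}}{2x_k}\,(x_k - \sqrt{a}) = \frac{x_k - \sqrt{a}}{2}\Bigl(1 - \frac{\sqrt{a}}{x_k}\Bigr) = \frac{(x_k - \sqrt{a})^2}{2x_k}.$$
Using this, we can prove by induction on k that $\sqrt{a} \le x_k$. For it is true
for $x_0$ by hypothesis. Suppose that it is true for k. Then $(x_k - \sqrt{a})^2/2x_k \ge 0$,
so $x_{k+1} - \sqrt{a} \ge 0$, thus proving it for k + 1. It then also follows that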
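$$\frac{x_{k+1} - \sqrt{a}}{b} \le \Bigl(\frac{x_k - \sqrt{a}}{b}\Bigr)^2, \qquad \frac{x_k - \sqrt{a}}{b} \le \Bigl(\frac{x_0 - \sqrt{a}}{b}\Bigr)^{2^k},$$
that is,
$$x_k - \sqrt{a} \le \frac{1}{b^{2^k - 1}}\,(x_0 - \sqrt{a})^{2^k}.$$
$$x_{k+1} - \sqrt{a} \le \frac{x_k - \sqrt{a}}{2} \quad\text{and hence}\quad x_k - \sqrt{a} \le \frac{x_0 - \sqrt{a}}{2^k}$$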
from the above and assuming $x_1 - \sqrt{2} < 0.1$, we see that $x_3 - \sqrt{2} <
(0.1)^5 = 0.00001$. The method is as good, or better, than the usual
Figure 7.13
If in this figure $c_2$ is irrational, then we will never find out by such
calculations that there are any solutions x of F(x) = 0 between a and b, other
than $c_1$, since the sign of F at the end points of each subinterval which
contains $c_2$, but not $c_1$, will be positive.
The situation pictured in Fig. 7.13 could not occur if the function we
are dealing with is a polynomial function f which has only simple real
roots in the sense of 6.38. For if c is any real root of $f(\xi)$ we have
$f(\xi) = (\xi - c)g(\xi)$ with $g(\xi) \in Re[\xi]$ and $g(c) \ne 0$. Say, for example, that
g(c) > 0. Then by continuity of g we can find an e > 0 such that g(x) > 0
whenever |x - c| < e. Hence f(x) has the same sign as x - c if c - e < x <
c + e and thus f(x) < 0 if c - e < x < c and f(x) > 0 if c < x < c + e.
Similarly, in the case that g(c) < 0, f(x) has the same sign as -(x - c)
in a suitably small interval around c.
This leads to an algorithm for isolating all the real roots of an arbitrary
polynomial $f(\xi) \in Re[\xi]$. For by 6.39, if we take $d(\xi) = (f(\xi), f'(\xi))$ and
$f_1(\xi) = f(\xi)/d(\xi)$, $f_1(\xi)$ has exactly the same real roots as $f(\xi)$ and $f_1(\xi)$
has only simple roots. Furthermore, we can find by 7.53(ii), a and b between
which all roots of $f(\xi)$ lie. Thus, because of the remarks of the preceding
paragraph, the general procedure described above can be successfully
carried through for $f_1$, giving the desired result for f.
Even if a continuous function F has roots of the sort pictured in Fig. 7.13,
we could still apply the general method for isolating all of these if, first
of all, it has only finitely many roots in all, and, second, we have some
algorithm which tells us exactly how many roots F has in any given in¬
terval. Such an algorithm can in fact be given for arbitrary polynomial
functions f by a judicious use of Euclid’s algorithm for computing a
greatest common divisor of $f(\xi)$ and $f'(\xi)$. The statement of this new
algorithm is known as Sturm’s theorem. While it is not needed in our further
work, it is presented here for its own sake and because of its distinctive
character relative to the algebra of real numbers. Before proving the
theorem, we introduce two concepts which play an important role in it.
The condition $y_{i-1} \cdot y_i > 0$ means simply that $y_{i-1}$ and $y_i$ have the
same sign, i.e., either $y_{i-1} > 0$ and $y_i > 0$ or $y_{i-1} < 0$ and $y_i < 0$.
The condition $y_{i-1} \cdot y_i < 0$ means that they have opposite sign, i.e.,
either $y_{i-1} > 0$ and $y_i < 0$ or $y_{i-1} < 0$ and $y_i > 0$. As an example of
the above, consider the computation of V((0, 2, 0, 1)). The associated
That is, ignoring zero values, we have three changes of sign in the original
sequence, namely from 2 to —1, from —1 to 3, and then finally from
2 to —2.
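Counting variations of sign while ignoring zero values is easily mechanized. The Python sketch below is an illustration only; the function name is ours, and the sample sequence in the usage note is a hypothetical one chosen to show the three changes just described, not necessarily the sequence of the text.

```python
def variations(values):
    """Number of sign changes in a finite sequence of numbers, ignoring zero entries."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

For instance, `variations([2, 0, -1, 3, 2, 0, -2])` counts the changes from 2 to -1, from -1 to 3, and from 2 to -2.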
One way of finding a greatest common divisor $d(\xi)$ of $f(\xi)$, $f'(\xi)$ is by
repeated use of the division algorithm 6.26 in just the same way as one
uses Euclid’s algorithm for finding a gcd in integers (which we discussed
following 4.42). That is, we write $f(\xi) = q_0(\xi)f'(\xi) + r_1(\xi)$, where
$\deg(r_1(\xi)) < \deg(f'(\xi))$, then apply the division algorithm to $f'(\xi)$,
$r_1(\xi)$, and so on. For this discussion it is more convenient to determine
$r_1(\xi)$ by $f(\xi) = q_0(\xi)f'(\xi) - r_1(\xi)$. We then write
where $0 \le \deg(r_{i+1}(\xi)) < \deg(r_i(\xi))$. We continue this procedure
until we reach the first m with $\deg(r_{m+1}(\xi)) = 0$. If $r_{m+1}(\xi) = 0$, we
have $r_{m-1}(\xi) = q_m(\xi)r_m(\xi)$, and $r_m(\xi)$ is the desired gcd. Otherwise,
$r_{m+1}(\xi)$ is a constant, $r_{m+1} \ne 0$, and $r_m(\xi) = q_{m+1}(\xi)r_{m+1}$, since any
7.59 Definition. Let $f(\xi) \in Re[\xi]$, $\deg(f(\xi)) > 0$. By the Sturm sequence
associated with $f(\xi)$ we mean the sequence $(f_0(\xi), f_1(\xi), \ldots, f_m(\xi))$
determined by the following recursive conditions:
(i) $f_0(\xi) = f(\xi)$, $f_1(\xi) = f'(\xi)$;
(ii) for each i with 0 < i < m, $f_{i-1}(\xi) = q_i(\xi)f_i(\xi) - f_{i+1}(\xi)$,
where $0 \le \deg(f_{i+1}(\xi)) < \deg(f_i(\xi))$ and $f_{i+1}(\xi) \ne 0$;
(iii) $f_{m-1}(\xi) = q_m(\xi)f_m(\xi)$.
For any real number c, we take $V_f(c) = V((f_0(c), f_1(c), \ldots, f_m(c)))$.
7.60 Theorem. Suppose that $f(\xi) \in Re[\xi]$, $\deg(f(\xi)) > 0$. Suppose that
a < b and $f(a) \ne 0$, $f(b) \ne 0$. Then the number of distinct roots
c of $f(\xi)$ with a < c < b is equal to $V_f(a) - V_f(b)$.
(2) whenever f(c) = 0 there are k > 0 and $g(\xi)$ with $g(c) \ne 0$ and
$\bar f_0(\xi) = (\xi - c)g(\xi)$,
$\bar f_1(\xi) = kg(\xi) + (\xi - c)g'(\xi)$.
For $f_m(\xi)$ has no real roots other than those of $f(\xi)$. Then for any such x,
division of each term of the sequence $(f_0(x), f_1(x), \ldots, f_m(x))$ by the
nonzero constant $f_m(x)$ does not change the number of variations of sign
(although it will change individual signs if it is negative). We put
Since each polynomial has only finitely many roots, we can write
Pick $d_0$, $d_{t+1}$ arbitrarily with $d_0 < d_1$ and $d_t < d_{t+1}$. Thus
(7) if $i \le m$ and $0 < j \le t + 1$ then $\bar f_i(x)$ has a fixed sign for all
x with $d_{j-1} < x < d_j$, i.e., either $\bar f_i(x) > 0$ for all such x or
$\bar f_i(x) < 0$ for all such x. If further $\bar f_i(d_j) \ne 0$, then $\bar f_i(x)$ has the
same sign as $\bar f_i(d_j)$ for all such x.
To see the first part, suppose there were $x_1$, $x_2$ between $d_{j-1}$ and $d_j$
with $\bar f_i(x_1) > 0$ and $\bar f_i(x_2) < 0$. Then by Weierstrass’ theorem 7.48 there
would exist an x between $x_1$ and $x_2$ such that $\bar f_i(x) = 0$. This contradicts
(5) and (6). The proof of the second part is similar. It follows that
(8) if $0 < j \le t + 1$ and $d_{j-1} < x_1, x_2 < d_j$ then $V_{\bar f}(x_1) = V_{\bar f}(x_2)$.
Thus we need only investigate how $V_{\bar f}(x)$ changes at the $d_j$. Note that
(9) there is no i with $0 < i \le m$ and x such that $\bar f_{i-1}(x) = 0 = \bar f_i(x)$.
We write c for $d_j$. To prove (11) let us compare the signs in the sequences
If for given i with 0 < i < m, we have $\bar f_i(c) = 0$ then by (9) $\bar f_{i-1}(c) \ne 0$
and by (10) $\bar f_{i-1}(c) = -\bar f_{i+1}(c)$. Hence we have one variation of sign
at this position for x = c. Again by (9) $\bar f_{i-1}(x)$ has the same sign as
$\bar f_{i-1}(c)$ and $\bar f_{i+1}(x)$ has the same sign as $\bar f_{i+1}(c)$ for $d_{j-1} < x < d_j$.
Thus no matter what the value of $\bar f_i(x)$ is, there will continue to be only
one variation of sign at this position in the sequence. Hence, if $V_{\bar f}(x)$ is
different from $V_{\bar f}(c)$ it can only be that $\bar f_0(c) = 0$. In this case we have
by (2) certain k > 0 and $g(\xi)$ with $\bar f_0(x) = (x - c)g(x)$, $\bar f_1(x) = kg(x) +
(x - c)g'(x)$ for all x, where $g(c) \ne 0$. Thus $\bar f_0(c) = 0$, $\bar f_1(c) = kg(c)$.
On the other hand, if $d_{j-1} < x < d_j = c$, $g(x) \ne 0$ and we see by (9)
that g(x) has the same sign as g(c), and $\bar f_1(x)$ has the same sign as $\bar f_1(c)$,
hence as g(c). But $\bar f_0(x)$ has opposite sign to that of g(x) for x < c; hence
$\bar f_0(x)$ and $\bar f_1(x)$ have opposite signs for $d_{j-1} < x < d_j$. In other words,
as we move to x = c, we lose one variation in sign at this position in the
sequence. It may well happen that, in this case as well, there is some
i > 1 with $\bar f_i(c) = 0$. But the same argument as before shows that this
can cause no change in the existence of a variation at such a position.
Hence (11) is proved.
Now, given a, b with a < b and $f(a) \ne 0$, $f(b) \ne 0$, we can further
assume of $d_0$, $d_{t+1}$ that $d_0 < a$, $b < d_{t+1}$. There is a unique $l \le t + 1$
with $d_{l-1} < a < d_l$. Since the roots of $\bar f_0$ are exactly the same as those of f
by (2), we can apply (11) to prove by induction on j that if $l \le j$ then
$V_{\bar f}(a) - V_{\bar f}(d_j)$ is equal to the number of distinct roots c of $f(\xi)$ such that
$a < c \le d_j$. In particular
we see that for suitably large values of b, $V_f(b)$ has the sequence of signs
+, +, -, -, hence $V_f(b) = 1$. On the other hand, $V_f(a)$ for suitably small
(i.e., large negative) values of a has the sequence of signs +, -, -, +,
hence $V_f(a) = 2$. Thus $f(\xi)$ has exactly one real root c [which, in this case,
we already know from the decomposition $f(\xi) = (\xi - 1)^2(\xi^2 + 1)$].
Once we know exactly how many real roots a polynomial $f(\xi)$ has and
we have a rational interval $a_0$, $b_0$ such that all roots c lie between $a_0$ and
$b_0$, the Sturm procedure will lead in a finite number of steps to finding
disjoint rational intervals $a_i$, $b_i$ each of which contains exactly one root
$c_i$ with $a_i < c_i < b_i$. Then for each such root we can find a fundamental
sequence of rationals which approaches it and which we can calculate to
any desired degree of accuracy by the method described earlier. Although
this is a tedious matter for “hand” computation, it is quite routine once
the Sturm sequence for the given polynomial is found and is quite suitable
for machine computation. A more general algorithm is known which will
not only determine the number and location of the roots c but also their
multiplicities. We shall not develop this procedure here.
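The remark about machine computation can be illustrated by a short program. The following Python sketch builds a Sturm sequence by repeated division with negated remainders and counts roots as in 7.60. It is a minimal illustration under our own conventions (coefficient lists with the highest degree first, exact `Fraction` arithmetic, and function names of our choosing), and it assumes, as the theorem does, that f(a) and f(b) are nonzero.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (highest degree first), keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def remainder(f, g):
    """Remainder of f on division by the nonzero polynomial g."""
    f, g = trim(f), trim(g)
    while len(f) >= len(g) and any(f):
        factor = f[0] / g[0]
        padded = g + [Fraction(0)] * (len(f) - len(g))
        # The leading terms cancel, so the first entry is dropped.
        f = trim([fc - factor * gc for fc, gc in zip(f, padded)][1:] or [Fraction(0)])
    return f

def derivative(f):
    n = len(f) - 1
    return [c * (n - i) for i, c in enumerate(f[:-1])] or [Fraction(0)]

def sturm_sequence(coefficients):
    """f_0 = f, f_1 = f', and f_{i+1} = -(remainder of f_{i-1} on division by f_i)."""
    f0 = trim([Fraction(c) for c in coefficients])
    seq = [f0, derivative(f0)]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not any(r):
            return seq
        seq.append([-c for c in r])

def evaluate(p, x):
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value

def variations(values):
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def count_roots(coefficients, a, b):
    """V_f(a) - V_f(b): the number of distinct real roots c with a < c < b."""
    seq = sturm_sequence(coefficients)
    return (variations([evaluate(p, a) for p in seq])
            - variations([evaluate(p, b) for p in seq]))
```

For the example above, `count_roots([1, -2, 2, -2, 1], -10, 10)` treats $(\xi - 1)^2(\xi^2 + 1) = \xi^4 - 2\xi^3 + 2\xi^2 - 2\xi + 1$ and reports the single real root. Halving an interval and recounting then isolates each root.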
It is natural to ask whether we can develop, as we did with the Null-
stellensatz, an algorithm to do the following: for any continuous function
F whose values F(x) can be computed for all x by some procedure, the
algorithm will compute for us a (fundamental sequence converging to a)
number c between a and b at which F attains its maximum in the interval
[a, b]. A partial answer from analysis is that this is possible for a
restricted class of functions, namely among those whose derivative function
F' is continuous. In this case it can be shown that if F attains its
maximum at c in [a, b], then either c = a or c = b or a < c < b, F'(c) = 0
and F(c) > F(a), F(c) > F(b). This provides us with the required
algorithm for any function F for which we have an algorithm for finding
all solutions c of F'(c) = 0. In particular, this applies to all polynomial
functions on the real numbers. A similar result holds for locating minima.
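For a cubic polynomial the roots of the derivative are given by the quadratic formula, so the procedure just described can be carried out directly. The Python sketch below is an illustration only, with a function name and argument order of our own choosing. It compares the values of F at the end points and at the roots of F' lying inside the interval.

```python
from math import sqrt

def maximize_cubic(a3, a2, a1, a0, lo, hi):
    """Return (c, F(c)) with F(c) maximal on [lo, hi], for F(x) = a3*x^3 + a2*x^2 + a1*x + a0, a3 != 0."""
    def F(x):
        return ((a3 * x + a2) * x + a1) * x + a0

    candidates = [lo, hi]
    # F'(x) = 3*a3*x^2 + 2*a2*x + a1; its real roots come from the quadratic formula.
    A, B, C = 3 * a3, 2 * a2, a1
    disc = B * B - 4 * A * C
    if disc >= 0:
        for sign in (1, -1):
            c = (-B + sign * sqrt(disc)) / (2 * A)
            if lo < c < hi:
                candidates.append(c)
    best = max(candidates, key=F)
    return best, F(best)
```

With $F(x) = x^3 - 3x$ the maximum on [-2, 3] is attained at the end point 3, while on [-2, 1.5] it is attained at the interior point -1 where F' vanishes.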
rational exponents we can see, by taking appropriate limits, that the same
properties continue to hold for arbitrary real exponents.
Now it can be shown that for any a > 0 the function Fa is a continuous
function on Re. Essentially this reduces to showing that Fa is continuous
at 0. For if we wish to show it continuous at any real number b, we are
considering
b
(a + h)
$$x_{k+1} = \frac{n-1}{n}\, x_k + \frac{a}{n x_k^{n-1}}$$
Show that $\sqrt[n]{a} \le x_k$ for all k (so that $x_k \ne 0$ and $x_k$ is well defined by
the preceding for all k), and that $\lim_{k\to\infty} x_k = \sqrt[n]{a}$.
8. How many real roots does/(£) = — ^£2 + £ — 2 have? Find the
smallest interval [a, b] with $a, b \in I$ which contains all these roots.
9. Suppose that $f(\xi) \in Re[\xi]$ is of degree n > 0 and that a < b. Prove the
following:
(a) If f(a) = f(b) = 0 then $f'(\xi)$ has at least one root c with a < c < b.
This is known as Rolle’s theorem. [Hint: Consider first the case that
$f(\xi)$ has no roots between a and b, and use the factorization $f(\xi) =
(\xi - a)^k(\xi - b)^l g(\xi)$, where $g(a) \ne 0$, $g(b) \ne 0$.]
(b) No matter what f(a), f(b) are, there exists c with a < c < b and
f'(c) = [f(b) - f(a)]/(b - a) (law of the mean for the differential
calculus).
(c) If f'(x) > 0 for all x with a < x < b then f is increasing in [a, b],
that is, f(x) < f(y) whenever $a \le x < y \le b$. Similarly, if f'(x) < 0
for all x with a < x < b then f is decreasing in [a, b].
(d) If f attains a maximum at c in the interval [a, b], that is, $a \le c \le b$
and $f(x) \le f(c)$ whenever $a \le x \le b$, then either c = a or c = b or
f'(c) = 0. The same holds if f attains a minimum at c in the interval
[a, b].
10. Develop the properties of the functions $G_a(x) = a^x$ ($x \in Ra$) and $F_a(x) =
a^x$ ($x \in Re$), as indicated at the end of this section.
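The iteration for n-th roots discussed earlier in this section is easy to try out. The following Python sketch is an illustration only: it assumes the recursion $x_{k+1} = \frac{n-1}{n}x_k + \frac{a}{n x_k^{n-1}}$ with a positive starting value, uses floating-point arithmetic, and runs a fixed number of steps; the function name and default step count are ours.

```python
def nth_root(a, n, x0, steps=50):
    """Iterate x_{k+1} = ((n - 1) * x_k + a / x_k**(n - 1)) / n, starting from x0 > 0."""
    x = x0
    for _ in range(steps):
        x = ((n - 1) * x + a / x ** (n - 1)) / n
    return x
```

Starting from $x_0 = 2$ with a = 2 and n = 2, a handful of steps already agree with $\sqrt{2}$ to many decimal places, in line with the error estimate obtained for n = 2.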
then by the preceding definition $\deg(f(\xi)) = 0$, $f(\xi) = a_0$, and hence
$a_0 = 0$; thus $a_i = 0$ for each $i \le n$. In other words, if Ra[x] is defined
as above, it is a simple transcendental extension of Ra. It follows from
5.6 that $Ra[\xi] \cong Ra[x]$, under the natural correspondence F(a) = a for
each $a \in Ra$ and $F(\xi) = x$. Note, however, that the statement that there
exist transcendental real numbers, in the sense of 7.61, is a much more
special statement than the general existence theorem 5.7 for simple trans¬
cendental extensions, even when the latter is restricted to K = Ra.
Once the existence of transcendental numbers is proved, one can go on
to ask whether particular real numbers, such as e, $\pi$, $\sqrt{2}^{\sqrt{5}}$, the solution
x of $2^x = 3$, etc., are transcendental. As we shall see, the first method we
shall use to obtain the existence of transcendental numbers will, in principle,
permit us to “exhibit” a specific number of this kind; however, this
approach would be impractical. In the remaining part of the section we
will take up a second proof which does show how to construct some
particular transcendental numbers in a simple explicit way. Concerning
the specific numbers just mentioned, the matter is more difficult; we shall
say more of this later.
Cantor's method. The first method of proof is due to Cantor, the founder
of the mathematical development of set theory, who lived toward the end
of the 1800’s. In the specific problem we have before us, we are dealing
with two infinite sets A, B with $B \subseteq A$, of which we are trying to show
that $A \ne B$, that is, A - B is nonempty; namely, A = Re, B = the
set of algebraic real numbers. If we do not have any simple or direct
method to exhibit an element x of A - B, we may ask whether there might
not be some other way of showing that A has more elements than B.
For the case that $B \subseteq A$, this would simply amount to showing that A
and B do not have the same number of elements. Speaking precisely, we
have in mind here using the notion of set-theoretical equivalence introduced
in (2:4-4). This was a special case of isomorphism of systems; sets A
and B are set-theoretically equivalent if the systems (A) and (B) are $\cong$.
We now introduce a special symbol for this relation.
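7.63 Theorem. Suppose that A is any set and S is the set of all subsets
of A, $S = \{X: X \subseteq A\}$. Then:
(i) there exists $S_1 \subseteq S$ such that $A \sim S_1$, but
(ii) $A \not\sim S$; in fact, there is no function F with $\mathcal{D}(F) = A$ and
$\mathcal{R}(F) = S$.
$$m_k = \begin{cases} 1 & \text{if } k \in X, \\ 0 & \text{if } k \notin X. \end{cases}$$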
At first sight this seems to show that Re has at least as many elements
as S, and hence $Re \not\sim P$. We shall see that this is true, but there is still
a slight complication which must be overcome. The function F just
defined is not one-to-one because of the failure, in certain cases, of the
uniqueness of the representation. Thus, for example, the sets $X_1 = \{1\}$
and $X_2 = \{2, 3, 4, \ldots\}$ will have $F(X_1) = F(X_2)$. Realizing this, we
need only modify our function F slightly to guarantee uniqueness, namely
by taking, say, $F(X) = \sum_{i=0}^{\infty} m_i/3^i$ where the $m_k$ are determined as
before. To see that if $F(X_1) = F(X_2)$ then $X_1 = X_2$, we make use of
the uniqueness result 7.39 for numbers a not of the form $b/3^k$ and of
Exercise 2, Exercise Group 7.3. Each real number a of the form $b/3^k$
has just two representations, one ending eventually in 0’s and the other
eventually in 2's; however, this last case never arises with the given choice
of the $m_i$’s.
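The failure of uniqueness in base 2 and its repair in base 3 can be checked exactly for sets that are finite or contain every integer from some point on. The Python sketch below is an illustration only; the function name and the way a set is described (a finite part plus an optional tail starting at `tail_from`, assumed disjoint from the finite part) are our own devices.

```python
from fractions import Fraction

def encode(finite_part, base, tail_from=None):
    """Sum of base**(-k) over k in X, where X is finite_part together with
    every integer k >= tail_from (when tail_from is given)."""
    total = sum(Fraction(1, base**k) for k in finite_part)
    if tail_from is not None:
        # Geometric tail: sum over k >= t of base**(-k) = base**(1 - t) / (base - 1).
        total += Fraction(1, base ** (tail_from - 1) * (base - 1))
    return total
```

In base 2 the sets $X_1 = \{1\}$ and $X_2 = \{2, 3, 4, \ldots\}$ receive the same value 1/2, whereas in base 3 they receive 1/3 and 1/6.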
This argument thus shows that there is a subset $S'$ of Re with $S' \sim S$.
From this we will easily conclude that $P \not\sim Re$ and that, in fact, as in
7.63(ii), there is no function F with $\mathcal{D}(F) = P$ and $\mathcal{R}(F) = Re$. Since
this is a relationship that will keep recurring we introduce the following
terminology.
$G(n)\} = \{F(1), \ldots, F(m)\}$. Consider the least k > m with $F(k) \notin
\{F(1), \ldots, F(m)\}$; we take G(n + 1) = F(k). Then we can prove by
induction that for each n there exists an $m \ge n$ with $\{G(1), \ldots, G(n)\} =
\{F(1), \ldots, F(m)\}$ and that the values G(1), ..., G(n) are distinct. Thus
G is seen to establish $P \sim A$. Conversely, if $P \sim A$ it is clear by
definition that A is denumerable. If A is finite and nonempty then $A \sim
\{1, \ldots, n\}$ for some $n \in P$. But then A is denumerable by (i) and (ii).
$$a^{(1)} = 0.m_1^{(1)} m_2^{(1)} m_3^{(1)} \ldots$$
$$a^{(2)} = 0.m_1^{(2)} m_2^{(2)} m_3^{(2)} \ldots$$
$$a^{(3)} = 0.m_1^{(3)} m_2^{(3)} m_3^{(3)} \ldots$$
1 if m = 0,
0 if m = 1.
Then we take
(7:5-2) a = . . .
We can now begin to see where the hope lies in proving the existence of
transcendental numbers—we should try to show that the set of algebraic
real numbers is denumerable. We shall do this in a series of steps, by
showing that I, Ra, $Ra[\xi]$, and finally the set of all roots of members of $Ra[\xi]$,
are all denumerable. This will be possible by the following three general
theorems.
7.67 Theorem.
(i) $P \times P \sim P$.
(ii) If A, B are denumerable so is $A \times B$.
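One explicit correspondence between P × P and P lists the pairs along successive diagonals k + l = 2, 3, 4, and so on. The Python sketch below illustrates this; it is not necessarily the correspondence used in the proof of 7.67, and the names `pair` and `unpair` are ours.

```python
def pair(k, l):
    """Position of (k, l), with k, l >= 1, when pairs are listed diagonal by diagonal."""
    s = k + l - 2                      # index of the diagonal containing (k, l)
    return s * (s + 1) // 2 + k

def unpair(n):
    """Inverse of pair: the n-th pair in the diagonal listing, for n >= 1."""
    s = 0
    while (s + 1) * (s + 2) // 2 < n:  # skip the diagonals that end before position n
        s += 1
    k = n - s * (s + 1) // 2
    return k, s + 2 - k
```

The listing begins (1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), and every pair of positive integers occurs exactly once.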
7.68 Theorem. Suppose that M is any denumerable class of sets and that
for each $X \in M$, X is denumerable. Then $\bigcup X[X \in M]$ is denumerable.
Proof. For the proof we can assume, without loss of generality, that
$M \ne \emptyset$ and that each $X \in M$ is nonempty, since the empty set contributes
nothing to the union considered. By hypothesis, there exists a function
F with $\mathcal{D}(F) = P$ and $\mathcal{R}(F) = M$. We denote by $X_k$ the value F(k), so
that $M = \{X_1, X_2, \ldots, X_k, \ldots\}$. For each k there exists by hypothesis
a function $G_k$ with $\mathcal{D}(G_k) = P$ and $\mathcal{R}(G_k) = X_k$. If $x \in \bigcup X[X \in M] =
\bigcup X_k[k \in P]$ then for some $k \in P$, $x \in X_k$ and hence for some $l \in P$
$x = G_k(l)$. By 7.67(i), $P \sim P \times P$, so that there exists a function H
with $\mathcal{D}(H) = P$, $\mathcal{R}(H) = P \times P$. Given $n \in P$ there are thus unique
$k, l \in P$ with H(n) = (k, l). Hence there exist two functions $H_1$, $H_2$ with
$\mathcal{D}(H_1) = \mathcal{D}(H_2) = P$ and $H(n) = (H_1(n), H_2(n))$ for every n; these
have the property that for any $k, l \in P$ we can find $n \in P$ with $H_1(n) = k$,
$H_2(n) = l$. We can thus define a function $G^*$ with $\mathcal{D}(G^*) = P$ and
$\mathcal{R}(G^*) = \bigcup X_k[k \in P]$ by $G^*(n) = G_{H_1(n)}(H_2(n))$ for every n.
[Note that this proof uses the axiom of choice, in that for each k we must
choose one of many possible functions G with $\mathcal{D}(G) = P$, $\mathcal{R}(G) = X_k$.
However, in most applications the functions Gk can be given explicitly
in advance.]
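When the functions $G_k$ are given explicitly, the function $G^*$ of the proof can be computed. The Python sketch below is an illustration only: it takes for H a diagonal listing of P × P (our choice, with the hypothetical helper `unpair`), and the sample family $X_k$ = the positive multiples of k is likewise our own example.

```python
def unpair(n):
    """The n-th pair (k, l) of positive integers in a diagonal listing of P x P."""
    s = 0
    while (s + 1) * (s + 2) // 2 < n:
        s += 1
    k = n - s * (s + 1) // 2
    return k, s + 2 - k

def enumerate_union(G, count):
    """First `count` values G*(n) = G(H1(n))(H2(n)), where G(k) enumerates the set X_k."""
    return [G(k)(l) for k, l in (unpair(n) for n in range(1, count + 1))]

def multiples(k):
    """Hypothetical example: G_k(l) = k * l enumerates the positive multiples of k."""
    return lambda l: k * l

values = enumerate_union(multiples, 6)
```

Repetitions occur in the listing, which is harmless, since a denumerable set is only required to be the range of a function with domain P.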
(7:5-3)
Proof. For each $n \in P$ let $S_n$ be the set of all sequences $(x_1, \ldots, x_n)$ of
length exactly n, where each $x_i \in A$. We prove by induction on n that $S_n$
is denumerable. Clearly, $A \sim S_1$. Note that $S_{n+1} \sim S_n \times A$, with a
suitable one-to-one mapping given by the function which associates with
any sequence $(x_1, \ldots, x_n, x_{n+1})$ the ordered pair $((x_1, \ldots, x_n), x_{n+1})$.
Hence if $S_n$ is denumerable, so is $S_{n+1}$ by 7.67(ii). The theorem now follows
from 7.68, since $S = \bigcup S_n[n \in P]$.
(1) I is denumerable.
(2) Ra is denumerable.
For let S be the set of all finite sequences $(a_0, \ldots, a_n)$ where each $a_i \in Ra$
and n is an arbitrary integer $\ge 0$. Then S is denumerable by 7.69 so that
there is a function F with $\mathcal{D}(F) = P$, $\mathcal{R}(F) = S$. We can define a function
G with $\mathcal{D}(G) = S$ and $\mathcal{R}(G) = Ra[\xi]$ by
$$G((a_0, \ldots, a_n)) = \sum_{i=0}^{n} a_i \xi^i.$$
(4) A is denumerable.
Here we associate with each $f(\xi) \in Ra[\xi]$ the set $X_{f(\xi)} = \{x: x \in Re$ and
$f(x) = 0\}$ of all real roots of $f(\xi)$ (possibly empty). Let M be the class of
all these sets; $X \in M$ if and only if $X = X_{f(\xi)}$ for some $f(\xi) \in Ra[\xi]$.
By definition, a real number is algebraic if and only if it belongs to some
member of M, $A = \bigcup X[X \in M]$. Here M is seen to be denumerable by
(3); each member $X_{f(\xi)}$ of M is finite. Hence (4) is proved by 7.68.
By going over the proofs again it can be seen that an enumeration of
Ra[£] can be explicitly described. Furthermore, Sturm’s theorem provides
us with a method to explicitly find the number and location, to any desired
degree of accuracy, of all the roots of any given $f(\xi) \in Ra[\xi]$. But then
we can describe an explicit enumeration $S = \{a^{(1)}, a^{(2)}, a^{(3)}, \ldots\}$ by the
expansions of $a^{(1)}, a^{(2)}, a^{(3)}, \ldots$ in any chosen base b. Hence by a variant
of Cantor’s diagonal method (cf. 7:5-1), we can give a procedure which
will exhibit the representation to the base b of a certain number $a \notin S$.
It is in this sense that we can, in principle, use the above method of proof
to exhibit a transcendental number. Since we shall now present a method,
due to Liouville, for exhibiting such numbers in a much more perspicuous
form, there is no reason to try to pursue the former approach.
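The diagonal procedure itself can be sketched in a few lines. This is an illustration under stated assumptions, not the enumeration of algebraic numbers described in the text: `sample_digit` is a hypothetical stand-in enumeration (the decimal digits of 1/2, 1/3, 1/4, ...), and the base is assumed to be at least 4, so that the digits 1 and 2 are neither 0 nor b - 1 and dual expansions cause no ambiguity.

```python
def diagonal_digits(digit, count):
    """Return `count` digits of a number that differs from the i-th listed
    number in its i-th digit; `digit(i, j)` is the j-th digit of the i-th number."""
    out = []
    for i in range(count):
        # Use only the digits 1 and 2, so the constructed expansion is the
        # unique expansion of the number it represents.
        out.append(1 if digit(i, i) != 1 else 2)
    return out

def sample_digit(i, j):
    """Hypothetical enumeration: j-th decimal digit (from 0) of 1/(i + 2)."""
    return (10 ** (j + 1) // (i + 2)) % 10

diagonal = diagonal_digits(sample_digit, 8)
```

The constructed number cannot occur anywhere in the list, since it disagrees with the i-th entry at position i.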
Cantor’s method does not make any special use of the algebraic properties
of the real numbers. For this reason it is adaptable to a wide variety
of situations in mathematics when one wants to compare various sets. It is
also for this reason that we have presented it here. In contrast, Liouville’s
method makes essential use of the algebraic properties of real numbers
and is thus limited in scope. However, it has more of particular interest
to tell us about algebraic and transcendental numbers.
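\[ f\!\left(\frac{k}{l}\right) = \frac{ak^2 + bkl + cl^2}{l^2} = a\left(\frac{k}{l} - x\right)\left(\frac{k}{l} - y\right). \]
Since $ak^2 + bkl + cl^2 \in I$, $|ak^2 + bkl + cl^2| \ge 1$. Hence
(7:5-4)
\[ \left|\frac{k}{l} - x\right| \ge \frac{1}{l^2\,|a|\,\left|\frac{k}{l} - y\right|}. \]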
This suggests that, given $x$ and the quadratic polynomial $f(\xi)$ of which
it is a root, we can find $M > 0$ such that
In fact, let $d = \max(|x|, |y|)$, so $d > 0$. Then it is seen that if $|k/l| > 2d$
we have $|k/l - x| > d \ge d/l^2$ (since $1 \le l$). On the other hand, if
$|k/l| \le 2d$ we have
\[ \left|\frac{k}{l} - y\right| \le \left|\frac{k}{l}\right| + |y| \le 3d, \]
298 THE REAL NUMBERS [CHAP. 7
so that by (7:5-4)
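\[ \frac{1}{\left|\frac{k}{l} - y\right|} \ge \frac{1}{3d} \qquad\text{and}\qquad \left|\frac{k}{l} - x\right| \ge \frac{1}{l^2 \cdot 3d\,|a|}. \]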
7.71 Theorem. Suppose that $f(\xi) \in \mathrm{Ra}[\xi]$, $\deg(f(\xi)) = n > 1$, and
suppose that $f(x) = 0$ where $x \notin \mathrm{Ra}$. Then we can find a real number
$M > 0$, depending on $f(\xi)$ and $x$, such that whenever $k \in I$ and
$l \in P$ then
Proof. If $f(\xi)$ has any rational root $y$ then $f(\xi) = (\xi - y)g(\xi)$ where
$\deg(g(\xi)) = n - 1$ and $g(x) = 0$. By successively dividing out all
rational roots we will eventually reach a polynomial $f_1(\xi)$ of degree
$k \le n$ with no rational roots and such that $f_1(x) = 0$. If the result is
proved for such polynomials, then from $M/l^k \ge M/l^n$ we get the stated
result. Hence we can assume at the start, with no loss of generality, that
We can write $f(\xi) = \sum_{i=0}^{n} r_i \xi^i$ where $r_i = c_i/d_i$, $c_i, d_i \in I$. Then by
multiplying $f(\xi)$ by $d_0 \cdot d_1 \cdots d_n$, we again have a polynomial satisfying
(1) with $x$ as a root and with integer coefficients. We can thus also assume
that
n
(3) there exists a real number N > 0 which depends only on x andf(£)
such that
This can be seen in several ways. For example, if we use Exercise 9(b) of
the preceding section, for any x, y we can find u between x and y with
\[ \frac{f(x) - f(y)}{x - y} = f'(u). \]
k
< N x
l
2 eI
1=0
Thus if we take $M = \min(1, 1/N)$, we have the desired result, since
$|x - (k/l)| > 1$ implies
\[ \left|x - \frac{k}{l}\right| > M \ge \frac{M}{l^n}. \]
7.72 Theorem. (i) Suppose that $x$ is a real irrational number such that for
every $n \in P$ there exist $k \in I$, $l \in P$ with $|x - (k/l)| < 1/(n\,l^n)$. Then
$x$ is transcendental.
(ii) Suppose that $b \in P$, $b > 1$, and $x = \sum_{i=1}^{\infty} b^{-(i!)}$. Then $x$ is
transcendental.
Note that, no matter what b is, the representation to the base b of the
number exhibited in (ii) is 0.11000100000000000000000100 . . . Numbers
x which satisfy the hypothesis of (i) are often called Liouville numbers,
of which one can produce many examples in the spirit of (ii). It is by no
means true that every transcendental number is a Liouville number.
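The digit pattern and the quality of the rational approximations can be checked with exact arithmetic. The sketch below is illustrative only: the function names are not from the text, and a longer partial sum is used as a stand-in for the full series, so the computed gap slightly understates the true distance $|x - k/l|$.

```python
from fractions import Fraction
from math import factorial

def liouville_partial(b, terms):
    """Exact partial sum of sum_{i>=1} b**(-i!) with `terms` terms."""
    return sum(Fraction(1, b ** factorial(i)) for i in range(1, terms + 1))

def digits(x, b, count):
    """First `count` base-b digits of a fraction 0 <= x < 1, as a string (b <= 10)."""
    out = []
    for _ in range(count):
        x *= b
        d = int(x)            # integer part; x is nonnegative here
        out.append(str(d))
        x -= d
    return "".join(out)

def approximation_gap(b, n, extra=3):
    """Distance between the n-term partial sum k/l, with l = b**(n!),
    and a longer partial sum standing in for x."""
    return liouville_partial(b, n + extra) - liouville_partial(b, n)

def bound(b, n):
    """The quantity 1/(n * l**n) from 7.72(i), with l = b**(n!)."""
    l = b ** factorial(n)
    return Fraction(1, n * l ** n)
```

With `l = b**(n!)` the omitted tail of the series is far smaller than `1/(n * l**n)`, which is how a number of this form comes to satisfy the hypothesis of 7.72(i).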
It is quite another (and, in general, much more difficult) matter to show
that certain specific “interesting” numbers are transcendental. For example,
it is known that $e$, $\pi$, and $\sqrt{2}^{\sqrt{5}}$ are transcendental. On the other
hand, it is not known whether the same is true of $e + \pi$, $e\pi$. Even proofs
that such numbers are irrational are not always simple. One example of
such a proof is suggested in the exercises.
1. Let A be the set of all real numbers x of the form ^'f=o mi/3i where each
$m_i = 0$ or $2$.
(a) Is A denumerable? Prove your result.
This set A (called Cantor’s ternary set) can be pictured as the intersection
of the following sets:
1 2 1 2 7 8 ,
0 9 9 3 3 9 9
order, $y_0 < y_1 < \cdots < y_m$, at least one of the differences $y_1 - y_0$,
$y_2 - y_1, \ldots, y_m - y_{m-1}$, $(1 - y_m) + y_0$ is $\le 1/(m + 1)$. By examining
the differences, obtain the desired conclusion. What does this result
show about possible improvements of Liouville’s theorem 7.71?
5. Show that if $x \in \mathrm{Re} - \mathrm{Ra}$, then there are infinitely many $(k, l)$ with
$k \in I$, $l \in P$ and $|x - k/l| < 1/l^2$.
6. (a) Show that if $x \in \mathrm{Re}$ and $i$ is a nonnegative integer, then there exists
$N > 0$ such that $|x^i - y^i| \le N|x - y|$ whenever $|x - y| \le 1$.
(b) Use the result of (a) to give a direct proof of step (3) in the argument
for Theorem 7.71.
7. (a) Prove Theorem 7.72(i), (ii).
(b) Give another example of a Liouville number.
8. (a) Suppose a = (—1 )®a» exists, where each at- > 0. Show that
(b) Using
from 7.45, show that $e^{-1}$, and hence also $e$, is irrational. [Hint: Consider
$n!\,\bigl(e^{-1} - \sum_{i=0}^{n} (-1)^i/i!\bigr)$.] This can be generalized to show that
$e^{-k}$, and hence also $e^{k}$, is irrational for each $k \in P$.
CHAPTER 8
\[ \frac{1}{x + uy} = \frac{x}{x^2 + y^2} + u\left(\frac{-y}{x^2 + y^2}\right), \]
(1) K = Re X Re
and
(4) u2 = -1,
We pick a few cases of the statements that must be checked against the
definitions 4.1, 4.13, and 6.1. For example, for distributivity:
For the condition 4.14 for an integral domain, if $(x, y) \cdot (x_1, y_1) = 0$,
that is, $(xx_1 - yy_1, xy_1 + x_1y) = (0, 0)$, we have $xx_1 - yy_1 = 0 =
xy_1 + x_1y$. From the first of these equations, $xx_1yy_1 = y^2y_1^2 \ge 0$ and
from the second $xx_1yy_1 = -x_1^2y^2 \le 0$. Thus $xx_1yy_1 = 0$ and $xy = 0$
or $x_1y_1 = 0$. Suppose that the first of these holds, so that $x = 0$ or $y = 0$.
Say $x = 0$; if also $y = 0$ then $(x, y) = 0$ and we are through. Otherwise
$xx_1 = yy_1$ and $-xy_1 = x_1y$ shows that $yy_1 = 0 = x_1y$; hence
$x_1 = y_1 = 0$ and $(x_1, y_1) = 0$. The argument is similar in the other possible
cases. For the condition 6.1 for a field, it is sufficient to show that if
$(x, y) \ne 0$ then $(x, y) \cdot (x, y)^{-1} = 1$. Since either $x \ne 0$ or $y \ne 0$,
certainly $x^2 + y^2 > 0$ and $(x, y)^{-1}$ is well defined by (3d). The condition
is easily checked from (3c). We thus take it that (5) is established.
For any $x \in \mathrm{Re}$, let
for each $z \in K$.
Imaginary and complex numbers have been used for hundreds of years,
beginning with their use in the solution of polynomial equations. Such
applications were “formal” and came long before geometrical interpretations
of these numbers and of the basic operations on them. It was
implicitly assumed in such formal manipulations that the same “laws of
algebra” could be applied to complex numbers as to rational and real
numbers. The existence theorem 8.2 explicitly states in what sense this is
possible and the uniqueness theorem 8.3 shows that, for algebraic purposes,
the three conditions 8.2(i)—(iii) are the only ones we need set down to govern
the use of these numbers. The fact that there is no relation < under which
C will become an ordered field also shows precisely what limitations are
imposed in extending “the laws of algebra.”
We thus have (i) and (ii) of the following as a Corollary to 8.5, since
$F(z) = \bar{z}$ for all $z \in C$.
Parts (iii)-(iv) follow directly from the definition 8.6. Of course, (i)
and (ii) can also be checked directly in this way. This leads now to a general
result about complex roots of real and complex polynomials.
8.8 Theorem. Suppose that $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$, $f(\xi) \in C[\xi]$, and $z \in C$.
Then:
(i) $\overline{f(z)} = \sum_{i=0}^{n} \bar{a}_i \bar{z}^i$;
(ii) if $f(\xi) \in \mathrm{Re}[\xi]$ then $\overline{f(z)} = f(\bar{z})$;
(iii) if $f(\xi) \in \mathrm{Re}[\xi]$ and $z$ is a root of $f(\xi)$ so is $\bar{z}$.
Proof. Part (i) follows from 8.7(i), (ii) by induction, first for the polynomials
$\xi^i$ and then in general. Then (ii) is immediate from 8.7(iv).
Thus if $f(\xi) \in \mathrm{Re}[\xi]$ and $f(z) = 0$, also $\overline{f(z)} = 0$, so $f(\bar{z}) = 0$, proving (iii).
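A small numerical check of the conjugation rules for polynomials may help fix ideas. It is a sketch only: the helper names and the sample polynomials are chosen for illustration and do not come from the text.

```python
def evaluate(coeffs, z):
    """Horner evaluation of sum(coeffs[i] * z**i)."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result

def conjugate_coeffs(coeffs):
    """Replace every coefficient by its complex conjugate."""
    return [complex(a).conjugate() for a in coeffs]

# Real coefficients: f(xi) = xi^2 - 2 xi + 5 has the nonreal root 1 + 2i.
real_coeffs = [5, -2, 1]
root = 1 + 2j

# Complex coefficients: conj(f(w)) equals the conjugated polynomial at conj(w).
complex_coeffs = [2 + 1j, -1j, 3]
w = 0.5 - 1.5j
lhs = evaluate(complex_coeffs, w).conjugate()
rhs = evaluate(conjugate_coeffs(complex_coeffs), w.conjugate())
```

For the real polynomial the conjugate `1 - 2j` is again a root, which is the content of part (iii) in this instance.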
( . s VvS2 + t2 + S . V\/ s2 + t2 — s
\ 2 x/2
2 = V^2 + y2 (~ X + i - V
\V X2 + IJ2 Vx2 + y2
Figure 8.1
Then the quantity $\sqrt{x^2 + y^2}$, which we have already had to deal with
several times, is the distance of $P$ from the origin $O$. Furthermore, in the
representation
$i(y_1 + y_2)$ has associated point $(x_1 + x_2, y_1 + y_2)$. The corresponding
figure is as follows.
8.12 Definition. For any $z \in C$ we put $|z| = \sqrt{z\bar{z}}$; $|z|$ is called the
absolute value or modulus of $z$.
Proof. Parts (i), (ii) are obvious from 8.7, and from 8.7(ii) we obtain
$|zw|^2 = (zw)(\overline{zw}) = (z\bar{z})(w\bar{w}) = |z|^2|w|^2$, which gives us (iii) by (i). Part
(iv) is then immediate by application of (iii) to $w = z^{-1}$. To prove (v),
8.1] BASIC PROPERTIES 311
We wish to show that this result is $\le (|z| + 1)^2 = |z|^2 + 2|z| + 1$. This
reduces to showing that $z + \bar{z} \le 2|z|$. If $z = x + iy$ with $x, y$ real, then
we have $2x \le 2\sqrt{x^2 + y^2}$, which is obviously true. In general, if $w = 0$
the result is trivial and if $w \ne 0$ we have
By (v), $|z| = |w + (z - w)| \le |w| + |z - w|$, giving $|z| - |w| \le |z - w|$.
If $|z| \ge |w|$, this gives (vi). If $|w| > |z|$ we have $\bigl||z| - |w|\bigr| = |w| -
|z| \le |w - z| = |z - w|$.
Thus, despite the fact that C cannot be made an ordered field, we are
still able to introduce an absolute value function which not only extends
the absolute value as given by the ordering of the reals, but also shares its
main properties.
We now turn to the relationship between the algebraic operation $z_1 \cdot z_2$
and its geometrical interpretation. Formally, if we write $z_1 = r_1(\cos\theta_1 +
i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$ then $r_1 = |z_1|$, $r_2 = |z_2|$, and
If we set $z = z_1z_2 = r(\cos\theta + i\sin\theta)$, then $r = |z| = |z_1|\,|z_2| = r_1r_2$.
Thus
\[ \cos\theta = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2, \]
\[ \sin\theta = \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2. \]
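The rule that moduli multiply and angles add can be checked numerically. The sketch below relies on Python's standard `cmath` module for the polar form; `polar_product` is an illustrative name, and the floating-point cosine and sine stand in for the functions the text goes on to characterize.

```python
import cmath

def polar_product(z1, z2):
    """Multiply two complex numbers by multiplying moduli and adding angles."""
    r1, theta1 = cmath.polar(z1)
    r2, theta2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, theta1 + theta2)

direct = (1 + 2j) * (3 - 1j)          # ordinary complex multiplication
via_angles = polar_product(1 + 2j, 3 - 1j)
```

Up to rounding error the two results agree, which is the geometric content of the two addition formulas.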
We now examine more closely the notion of angle and the definitions of the
trigonometric functions of angles.
The most convenient system for measuring angles in mathematical
analysis is by radians. If we consider the unit circle with center the origin
and radius 1 then the circumference of the circle is equal to $2\pi$. The arc
length subtended by any angle $\theta$ is in the same ratio to $2\pi$ as $\theta$ is to the full
circular angle. In the radian system of measurement, each angle is measured
by the same number of radians as there are units in the associated
312 THE COMPLEX NUMBERS [chap. 8
(b)
Figure 8.3
$C(0) = 1$, $S(0) = 0$,
The proofs that we can find such functions (Arg, $C$ and $S$) would take
us too far afield from our main interests here. These proofs can be accomplished
after a modest development of the calculus, which only depends
on the treatment of the real number system given in the preceding chapter.
In Appendix II we sketch the main points that are involved and also give
references to complete proofs which lead to the following theorem. For
our purposes here, it is never really necessary to know exactly how the
number $\pi$ is determined. Thus one can equally well read throughout the
following “where $\pi$ is any given positive real number.”
8.15 Theorem. Suppose that $C$, $S$ are any two functions satisfying the
conditions of 8.14. Then for any real number $\theta$ and $n \in I$:
(i) $-1 \le C(\theta) \le 1$, $-1 \le S(\theta) \le 1$;
(ii) $C(\theta + \pi/2) = -S(\theta)$ and $S(\theta + \pi/2) = C(\theta)$;
(iii) $C(-\theta) = C(\theta)$ and $S(-\theta) = -S(\theta)$;
(iv) $C(\theta + 2n\pi) = C(\theta)$ and $S(\theta + 2n\pi) = S(\theta)$;
(v) $C(2\theta) = C^2(\theta) - S^2(\theta) = 2C^2(\theta) - 1 = 1 - 2S^2(\theta)$;
(vi) $S(2\theta) = 2S(\theta)C(\theta)$;
Proof. Part (i) is immediate from 8.14(iv), and (ii) is evident from
8.14(ii), (v), (vi). To prove (iii) we use 8.14(v), (vi) with $\theta_1 = \theta$,
$\theta_2 = -\theta$ to obtain
hence (iv) is seen to hold for $n > 0$. But by 8.14(v) and (iii),
and
\[ S(\theta_1 - \theta_2) = S(\theta_1)C(\theta_2) - C(\theta_1)S(\theta_2). \]
Then we see that $C(\theta - \pi/2) = S(\theta)$ and $S(\theta - \pi/2) = -C(\theta)$, hence
we can obtain $C(\theta - 2\pi) = C(\theta)$ and $S(\theta - 2\pi) = S(\theta)$. From this
we can get (iv) in general. Parts (v) and (vi) are immediate from 8.14(v),
(vi), and part (vii) is obtained from (v) by substituting $\theta/2$ for $\theta$.
From (vii) we can write
(7(0)
±
To determine which signs are taken here, one must know the location of
$\theta/2$. In general, the signs of $C(\theta)$, $S(\theta)$ for $0 \le \theta < 2\pi$ [and hence for all
$\theta$ by (iv) above] are completely determined by 8.14(i)–(iii) and repeated
application of 8.15(ii). We consider the values of these for $\theta = 0$, $\pi/2$, $\pi$,
and $3\pi/2$ and for the intermediate ranges (I) $0 < \theta < \pi/2$, (II) $\pi/2 <
\theta < \pi$, (III) $\pi < \theta < 3\pi/2$, and (IV) $3\pi/2 < \theta < 2\pi$. Each $\theta$ in
range (II) is $\pi/2 + \theta_1$ where $\theta_1$ is in range (I), etc. Thus we see that
$C(\theta) = 0$ if and only if $\theta = \pi/2$ or $\theta = 3\pi/2$ [with $S(\theta) = 1$ or $-1$,
respectively], and $S(\theta) = 0$ if and only if $\theta = 0$ or $\theta = \pi$ [with $C(\theta) = 1$
or $-1$, respectively]. For $\theta$ in range (I): $C(\theta) > 0$, $S(\theta) > 0$; in range
(II): $C(\theta) < 0$, $S(\theta) > 0$; in range (III): $C(\theta) < 0$, $S(\theta) < 0$; and in
range (IV): $C(\theta) > 0$, $S(\theta) < 0$. This is summarized in the figure below.
Thus, for example, if $0 < \theta < \pi$ then $0 < \theta/2 < \pi/2$ and
and
f C(6)
)
and again
Proof. Suppose that $C$, $S$ and $C_1$, $S_1$ are two such pairs of functions.
Then both of these also satisfy the conditions of 8.15. By 8.14(i), (ii) and
(1) if $0 < \theta < \pi/2$ then $C(\theta) = C_1(\theta)$ and $S(\theta) = S_1(\theta)$.
and
s T) = Qk T
Next we obtain:
(3) if $\theta$ is any real number and $m \in P$ and if $C(\theta) = C_1(\theta)$ and
$S(\theta) = S_1(\theta)$ then
and
Consider now any $\theta$ with $0 < \theta < \pi/2$. We can write $\theta = a(\pi/2)$ where
$0 < a < 1$. By the representation 7.36 of real numbers to the base 2,
we can write $a = \sum_{i=1}^{\infty} (m_i/2^i)$, where each $m_i$ is 0 or 1. Hence for any
$\delta > 0$ we can find $m, k \in P$ with $|a - (m/2^k)| < \delta$; namely $m/2^k =
\sum_{i=1}^{n} (m_i/2^i)$ for suitably large $n$. Now we apply the assumed continuity
of the functions $C$, $C_1$. It is seen from this that given any $\epsilon > 0$ we can
find $\delta > 0$ such that $0 < a - \delta < a + \delta < 1$ and
and
|CX (a iPj - C\ I < e.
Given any such $\epsilon$ and associated $\delta$, choose $m$, $k$ as above, and let $b = m/2^k$.
Then by (4)
S (a = S! (a ,
8.17 Definition. We write $C(\theta) = \cos\theta$ and $S(\theta) = \sin\theta$ for any real
number $\theta$, where $C$, $S$ are the unique functions satisfying the conditions
of 8.14.
+ cos(9 + l) + (* + !)’
—I — i = cos (6 + 7r) + i sin (9 + 7r), etc.
z z
On the other hand, to represent $(-1/2) + i(\sqrt{3}/2)$, we first find $\phi$ with
$\cos\phi = \sqrt{3}/2$, $\sin\phi = 1/2$.]
8.19 Theorem.
(i) If $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$
where $\theta_1, \theta_2$ are any real numbers, then
$n$th roots of complex numbers. Substituting $\theta/n$ for $\theta$ and $r^{1/n}$ for $r$ in
8.19(ii) now gives us a solution of $z^n = r(\cos\theta + i\sin\theta)$. More generally,
we have the following.
\[ z_k = |d|^{1/n}\left(\cos\frac{\theta + 2k\pi}{n} + i\sin\frac{\theta + 2k\pi}{n}\right) \]
for $k = 0, 1, \ldots, n - 1$.
for $0 \le k \le n - 1$, hence $0 \le (\theta + 2k\pi)/n < 2\pi$ for each such $k$. By
the uniqueness result 8.18, it follows that if $z_k = z_l$ where $0 \le k$,
$l \le n - 1$ then $k = l$. Finally, each $z_k$ is a root of the polynomial
$\xi^n - d$. Since this can have at most $n$ distinct roots, the $z_k$ provide all
the solutions of $z^n = d$.
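The trigonometric recipe for the solutions of $z^n = d$ can be tried out directly. The following is a sketch that assumes $d \ne 0$ and relies on Python's `cmath` module; note that `cmath.polar` returns an angle in the range from $-\pi$ to $\pi$ rather than from 0 to $2\pi$, which changes the order in which the roots are listed but not the set of roots.

```python
import cmath
import math

def nth_roots(d, n):
    """All n complex solutions of z**n == d for d != 0, via the trigonometric form."""
    r, theta = cmath.polar(d)              # d = r (cos theta + i sin theta)
    root_r = r ** (1.0 / n)                # the real n-th root of |d|
    return [cmath.rect(root_r, (theta + 2 * k * math.pi) / n) for k in range(n)]

cube_roots = nth_roots(-8, 3)
```

For $d = -8$ and $n = 3$ one of the three values is the real root $-2$, and the other two are a pair of nonreal conjugates.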
Thus the trigonometric representation very elegantly supplies us with
the general existence theorem that we had obtained previously only for
$n = 2$ (8.9) and already found troublesome to deal with algebraically
for $n = 3$. To see how the above compares with the solution for
$n = 2$, let
Note that this definition is also in accord with 7.56 for d real, d > 0.
As a practical matter, the computation of the $n$th roots of a complex
number $d$ is broken up into two parts: the computation of the real $n$th
roots $|d|^{1/n}$ and the computation of the $\cos((\theta + 2k\pi)/n)$ and
$\sin((\theta + 2k\pi)/n)$. The second of these is achieved, to any desired degree of accuracy,
by the series representations of $\cos\theta$ and $\sin\theta$ discussed in Appendix II.
The results of such computations for specified degrees of accuracy and for
a large number of values of $\theta$ are compiled in tables of trigonometric
functions.
(8:1-1)
\[ \cos^3\frac{\theta}{3} - 3\cos\frac{\theta}{3}\sin^2\frac{\theta}{3} = \cos\theta \]
and then to
(8:1-2)
\[ 4\cos^3\frac{\theta}{3} - 3\cos\frac{\theta}{3} = \cos\theta. \]
Figure 8.6
Figure 8.7
8.2] POLYNOMIALS AND CONTINUOUS FUNCTIONS 323
Then the set of $z$ with $|z - d| < r$ corresponds to the set of points interior
to the circle $|z - d| = r$.
We shall show in this section that in terms of this notion of distance
between complex numbers, not only the notions of limit and continuity but
also the basic results concerning these can be extended in a straightforward
way from the real to the complex numbers. Our purpose in doing this is
to provide us with the following approach to finding roots of complex
polynomials $f(\xi) \in C[\xi]$. We shall show that for each such the corresponding
function $f(z)$ is a continuous function on $C$. Now we cannot in general
speak of maximum or minimum values of such a function, since $C$ is not
ordered. However, the function $|f(z)|$ will also be seen to be continuous,
and this function takes on only nonnegative real values. Then $f(\xi)$ has
a root if and only if the function $|f(z)|$ attains the minimum value 0 for
some $z$. Our first main step will be to show that $|f(z)|$ does always attain
some minimum value, generalizing 7.50. After that, we shall have to make
some arguments, which hold for the complex numbers, to show that this
minimum value cannot be other than 0. In the following we cover only
those parts of complex analysis needed to reach this result.
If we look back at the proof of 7.50, we see that essential use was made
of the Bolzano-Weierstrass Theorem 7.26, according to which every
bounded sequence contains a convergent subsequence. We also need here
the analogue of this. We thus begin by generalizing some of the material
of Section 7.2 on limits of sequences.
\[ \lim_{k \to \infty} z_k = d \]
if for each real $\epsilon > 0$ there exists an $m$ such that $|z_k - d| < \epsilon$
for all $k > m$.
(ii) We say $\langle z_k \rangle$ is convergent if there exists some $d \in C$ which is a
limit of $\langle z_k \rangle$.
(iii) We say $\langle z_k \rangle$ is bounded if for some real number $M$ and all $k$,
$|z_k| < M$.
(iv) We say $\langle w_k \rangle$ is a subsequence of $\langle z_k \rangle$ if for some sequence of
integers $\langle j_k \rangle$, $j_0 < j_1 < \cdots < j_k < \cdots$ and $w_k = z_{j_k}$ for
each $k$.
Proof. In the proof of 7.26 the basic idea was that if an interval
$[b, c] = \{x : b \le x \le c\}$ contains $x_k$ for infinitely many $k$, the same holds
of one of the two subintervals $[b, (b + c)/2]$, $[(b + c)/2, c]$ obtained by
dividing the original interval in half. In this proof we consider rectangles
in the plane, instead of intervals, and subdivide such into four equal subrectangles.
By a (closed) rectangle here we could mean simply
$[b, c] \times [b', c'] = \{(x, y) : b \le x \le c \text{ and } b' \le y \le c'\}$. We modify this slightly,
so that we are dealing with complex numbers directly. We define
(1) $[b, c] \otimes [b', c'] = \{x + iy : b \le x \le c \text{ and } b' \le y \le c'\}$
(4) if there are infinitely many $k$ with $z_k \in [b, c] \otimes [b', c']$ then the
same holds for at least one of the sets
i, b + c b + c b'
h’
v 'GtA
V 1 o b, > c
c'
1 1
S2 ; s4
Si ’T'T"”
V
Figure 8.9
If $c - b = c' - b'$ then the original set $[b, c] \otimes [b', c']$ and each of its
subsets $S_i(b, c, b', c')$ will be squares. We can now define recursively the
sequences $b_n$, $c_n$, $b'_n$, $c'_n$ as follows.
(5) We take $b_0$, $c_0$ as given in (3) and $b'_0 = b_0$, $c'_0 = c_0$. Given
$b_n$, $c_n$, $b'_n$, $c'_n$ we take $b_{n+1}$, $c_{n+1}$, $b'_{n+1}$, $c'_{n+1}$ to be such that
$[b_{n+1}, c_{n+1}] \otimes [b'_{n+1}, c'_{n+1}] = S_i(b_n, c_n, b'_n, c'_n)$ for the first $i$ ($= 1, 2, 3, 4$)
such that $z_k \in S_i(b_n, c_n, b'_n, c'_n)$ for infinitely many $k$.
Thus, for example, if the first such i is equal to 2, for given bn, cn, b'n, c'n,
we have
bn “h Cn if _ bn ~T Cn
bn A-1 b n, Cn-\-1 7) f ^ Ln-\-1 — Cr
(iv) there are infinitely many $k$ such that $z_k \in [b_n, c_n] \otimes [b'_n, c'_n]$.
Then each $c_n$ ($c'_n$) is an upper bound for $B$ ($B'$) and each $b_n$ ($b'_n$) is
a lower bound for $C$ ($C'$).
Hence $\sup B$, $\sup B'$, $\inf C$, $\inf C'$ exist by 7.9 and we can prove, in the
same way as in 7.26, that
Let
(10) Let $j_0 = 0$. Given $j_n$, let $j_{n+1}$ be the least $k$ such that $j_n < k$
and $z_k \in [b_{n+1}, c_{n+1}] \otimes [b'_{n+1}, c'_{n+1}]$. Let $w_n = z_{j_n}$ for all
$n \ge 0$.
(11) $\langle w_k \rangle$ is a subsequence of $\langle z_k \rangle$, with $w_n \in [b_n, c_n] \otimes [b'_n, c'_n]$ for
all $n$.
Since also $d \in [b_n, c_n] \otimes [b'_n, c'_n]$ for all $n$, in order to show that
(12) $\lim_{n \to \infty} w_n = d$,
(13) for any real $\epsilon > 0$ we can find an $m$ such that $|w - u| < \epsilon$
whenever $w, u \in [b_n, c_n] \otimes [b'_n, c'_n]$ and $n > m$.
This is realized by computing, for any $n$ and for $w, u \in [b_n, c_n] \otimes [b'_n, c'_n]$,
the absolute value $|w - u|^2$ as follows: if $w = w_1 + iw_2$, $u = u_1 + iu_2$,
then
^2'(0(c«-»o)2,
This gives (13) and thus concludes the proof of the theorem.
Essentially all that we have used about the complex numbers in this
theorem comes from their representation in the plane Re X Re and from
the basic properties of the distance function |z — w\ in this representation.
It could equally well have been expressed and proved as a theorem concerning
Re × Re, without any mention of C. In the same way one can
prove a corresponding theorem for the three-dimensional Euclidean space
Re × Re × Re, and for higher-dimensional Euclidean spaces. In recent
years the theorem has undergone considerable generalization in the subject
of (point-set) topology. We can also generalize the notion of fundamental
sequence (7.22) to two and higher dimensions, and prove the basic Cauchy
condition for convergence of a sequence, namely that it be a fundamental
sequence (7.23, 7.27). We shall not need this here, and leave it to the
reader to pursue.
Proof. This follows the same lines as the proof of 7.50. Let
8.29 Theorem. If $f(\xi) \in C[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where the $a_i$ are complex,
then the associated polynomial function $f(z) = \sum_{i=0}^{n} a_i z^i$ is continuous.
Then for any real number $M > 0$ we can find a real number $r > 0$
such that $|f(z)| > M$ whenever $|z| > r$.
\[ |f(z)| \ge |a_n z^n| - \left|\sum_{i=0}^{n-1} a_i z^i\right| \ge |a_n z^n| - \sum_{i=0}^{n-1} |a_i z^i|. \]
Let
n— 1
|Oi
(2) r>= E I Ojn.
1=0
8.31 Theorem. Suppose that $f(\xi) \in C[\xi]$ with $\deg(f(\xi)) > 0$. Then there
exists at least one complex number $d$ with $|f(d)| \le |f(z)|$ for all $z \in C$.
Proof. By 8.28, we can find $d_1 \in C$ with $|d_1| \le 1$ and $|f(d_1)| \le |f(z)|$
whenever $|z| \le 1$. Let $M = |f(d_1)|$. By 8.30 we can find $r > 1$ such that
$|f(z)| > M$ whenever $|z| > r$. Now if we apply 8.28 again, we can find
$d \in C$ with $|f(d)| \le |f(z)|$ for all $z$ with $|z| \le r$. This $d$ satisfies the
conclusion, for if $|z| > r$ we have $|f(z)| > |f(d_1)| \ge |f(d)|$, since $|d_1| \le r$.
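The radius whose existence 8.30 asserts can be made explicit. The sketch below uses one admissible choice, $r = \max(1, (M + \sum_{i<n}|a_i|)/|a_n|)$, which is an assumption of this illustration and not necessarily the bound used in the text: for $|z| > r \ge 1$ each lower term satisfies $|a_i z^i| \le |a_i|\,|z|^{n-1}$, so $|f(z)| \ge |z|^{n-1}(|a_n|\,|z| - \sum_{i<n}|a_i|) > M$.

```python
import cmath
import math

def evaluate(coeffs, z):
    """Horner evaluation of sum(coeffs[i] * z**i)."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result

def growth_radius(coeffs, M):
    """Return r >= 1 such that |f(z)| > M whenever |z| > r (degree at least 1)."""
    *lower, lead = coeffs
    lower_sum = sum(abs(a) for a in lower)
    return max(1.0, (M + lower_sum) / abs(lead))

coeffs = [1, -3, 0, 2]                 # f(xi) = 2 xi^3 - 3 xi + 1
r = growth_radius(coeffs, 10)
samples = [cmath.rect(1.01 * r, 2 * math.pi * j / 12) for j in range(12)]
```

The bound is crude, but it only has to guarantee that the minimum of $|f(z)|$ over the whole plane is already attained inside a bounded disc.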
8.32 Theorem. Suppose that $f(\xi) \in C[\xi]$ with $\deg(f(\xi)) > 0$. Then for
each $d \in C$ with $f(d) \ne 0$ there exists some $z \in C$ with $|f(z)| < |f(d)|$.
ff(£) = ff (£ + d)1 = ^
i—0 i— 0
then the theorem is proved, for $|f(d + z)| < |c| = |f(d)|$ in this case. Let
where $\phi$, $\theta$ are real numbers and where $0 < r$. Although $|b_k|$ and $\phi$ are
taken to be fixed, we are still free to determine $r$, $\theta$. By De Moivre’s theorem,
Our strategy is now as follows: we shall choose $\theta$ in such a way that $b_k z^k$
is a negative real number, and $r$ is so small that $|1 + b_k z^k| < 1$; we then
choose $r$ still smaller, if necessary, so that when $|b_{k+1}z^{k+1} + \cdots + b_n z^n|$
is added to $|1 + b_k z^k|$, the result is still smaller than 1. For the first purpose
we simply take
Then $\phi + k\theta = \pi$ and
1 ^ \bk\
(8) Choose r < min
■\P\bk\ + ' ‘ ' + I bn
Then
8.33 Theorem. Suppose that $f(\xi) \in C[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$, with
$\deg(f(\xi)) = n > 0$. Then:
(i) there exists at least one complex root of $f(\xi)$, i.e., some $z \in C$ with
$f(z) = 0$;
(ii) for some $z_1, \ldots, z_n \in C$, $f(\xi) = a_n(\xi - z_1) \cdots (\xi - z_n)$.
Proof. (i) By 8.31 we can find $d \in C$ with $|f(d)| \le |f(z)|$ for all $z \in C$.
Then $f(d) = 0$, for otherwise by 8.32 we could find some $z \in C$ with
$|f(z)| < |f(d)|$. (ii) follows by induction on $n$, using 5.13.
preceding. The Theorem 8.33 can be restated to the effect that, as far as
roots of polynomial equations are concerned, C is algebraically complete.
Thus, in this respect, our work has come to an end, and any constructions
of larger number systems than C must be based on other considerations.
(There are, in fact, a number of such extensions which are of interest, but
they do not fall within the framework of this book because of the nature
of these other considerations.) On the other hand, we shall see that if it
is just this property of algebraic completeness or closure that we are interested
in, a certain subfield of C already serves the purpose, namely the
field Alg of algebraic complex numbers. This notion will be defined in the
next section in obvious generalization of 7.61.
The construction of C from Re is algebraic in spirit, but that of Re
from Ra is not. The first of these can be said to proceed by formally adjoining
a root of $\xi^2 + 1$ to Re, in a sense which will be defined precisely in the
next chapter. However, the construction of Re from Ra involved some
essentially nonalgebraic concepts, either the use of arbitrary Dedekind
sections in Ra or of arbitrary fundamental sequences in Ra, and the
pervading idea is the geometric-analytic notion of continuity. That this
can be avoided if we wish to be content with Alg is, compared to our intuitive
conception of Re and C, a sophisticated development. This is
realized by a general treatment of the process of formal adjunction of
roots of polynomials to given fields. It is this procedure which takes on
central importance in modern algebra, and to which we shall devote attention
in the next chapter.
Thus, when it is said that the fundamental theorem of algebra is not
really a theorem of algebra, what is suggested in part is the nature of the
setting, C, of the theorem. What is also suggested is the nature of the proof
of the theorem. From this point of view it may rightly be said that it is
“really” a theorem of complex analysis or, even, “really” a theorem of
topology. This is not to say that one cannot give algebraic proofs of it.
In fact, one of Gauss’ proofs is very much in the spirit of modern algebra,
but again involves more sophisticated work.
The proof which we have given here makes use of a minimum amount of
information from analysis. By various minor modifications, one could
do with even less, in particular without the use of the trigonometric
functions. However, this is also less enlightening; to do without the
trigonometric representation would also mean doing without various informative
results such as 8.20, 8.22 on the roots of $\xi^n - d$.
In the opposite direction, we want to indicate what a fuller use of complex
analysis would provide. As with the real numbers, it is natural to
consider functions $F$ defined on $C$ by power series, $F(z) = \sum_{i=0}^{\infty} a_i z^i$
where the $a_i \in C$. Questions of convergence for such series can often be
determined in much the same way as 7.41 is used for real power series.
so that $e^{iy} = \cos y + i\sin y$. Then $e^{x+iy} = e^x e^{iy} = e^x(\cos y + i\sin y)$;
this is what suggested the definition of Exercise 13 of the preceding section.
With a modest development of complex analysis, the above power series
definitions can be used to develop all of the basic properties 8.14 of the
trigonometric functions.
To return to the main matter at hand, a function $F$ on $C$ is said to be
constant if for some complex number $c$, $F(z) = c$ for all $z$. Then the general
theorem of Weierstrass, to which we have referred in connection with
8.32, is that if $F$ is an entire function but is not constant and if $F(d) \ne 0$
then there exists $z \in C$ with $|F(z)| < |F(d)|$. There also exists $z \in C$ with
$|F(z)| > |F(d)|$, and in both cases $z$ can be chosen arbitrarily close to $d$.
Closely related to the latter statement is Liouville’s theorem: a bounded
entire function ($|F(z)| \le M$ for some $M$ and all $z$) must be constant.
This leads to another proof of the fundamental theorem. For it can be
shown that if $f(\xi)$ has no roots, the function $1/f(z)$ is entire; but also
$1/f(z)$ can be seen to be bounded by using 8.30.
The function $E(z) = e^z$ is entire and not constant; thus $e^z$ is not bounded,
by Liouville’s theorem. Furthermore $e^z = 0$ has no solutions, for $|e^z| =$
Then we can look upon finding a root of $f(\xi)$ as either finding $(x, y)$ where
$|f(x + iy)|$ or, equivalently, where $|f(x + iy)|^2 = g^2(x, y) + h^2(x, y)$ is a
minimum, or finding $(x, y)$ such that $g(x, y) = 0$ and $h(x, y) = 0$. In
either case we have a more difficult problem than that for real polynomials
with one variable.
It turns out from advanced work in analysis that there is an algorithm
for calculating to any degree of accuracy all of the roots of a polynomial
$f(\xi) \in C[\xi]$ whose coefficients themselves can be computed to any degree
of accuracy. This depends on the notion of complex integration, and the
theorem that if $S$ is a simple closed curve, then $\frac{1}{2\pi i}\int_S \frac{f'(z)}{f(z)}\,dz$ is
equal to the number of roots of $f(\xi)$, each counted as often as its multiplicity
in $f(\xi)$, which are within the interior of $S$; here $S$ must be such that
$f(z) \ne 0$ for all $z$ on $S$. In general, this integration must be replaced by an
approximate integration in order to carry out the algorithm. One trouble
concerning this procedure is the following. If $\deg(f(\xi)) = n$ and $f(\xi)$
has n distinct roots, the computation procedure will eventually isolate
each of these. However, if there are multiple roots, the procedure will not
uncover this for us, so that at a given stage of the process we may not be
sure whether we are isolating a multiple root or two roots extremely close
together. Thus it is preferable to deal instead with the polynomial //£)
given by 6.39, which has the same complex roots as /(£), but all of which
are simple for //£).
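The root-counting integral can be approximated numerically when the curve is a circle. The following is a minimal sketch under stated assumptions: the curve is the circle with the given center and radius, no root lies on it, and the integral is replaced by an equally spaced sum with a sample count chosen ad hoc. The function names are illustrative and this is not the full algorithm alluded to in the text.

```python
import cmath
import math

def poly_and_derivative(coeffs, z):
    """Evaluate f(z) and f'(z) together by Horner's rule; coeffs[i] multiplies z**i."""
    f, df = 0j, 0j
    for a in reversed(coeffs):
        df = df * z + f
        f = f * z + a
    return f, df

def count_roots_in_circle(coeffs, center, radius, samples=2000):
    """Approximate (1/(2 pi i)) * integral of f'/f around the circle and round it."""
    total = 0j
    for j in range(samples):
        w = cmath.rect(radius, 2 * math.pi * j / samples)   # radius * e^{it}
        f, df = poly_and_derivative(coeffs, center + w)
        # dz = i * w * dt, and the factor i cancels against 1/(2 pi i).
        total += df / f * w
    return round((total / samples).real)

cubic = [-1, 0, 0, 1]      # xi^3 - 1, three simple roots on the unit circle
double = [1, -2, 1]        # (xi - 1)^2, one root of multiplicity 2
```

The count for `double` around the point 1 comes out as 2, which illustrates the difficulty mentioned above: the integral alone does not distinguish a double root from two nearby simple roots.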
On the other hand, by a more detailed algebraic analysis, it has been
shown by Tarski that Sturm’s procedure can be generalized to polynomials
of several variables with real coefficients. In particular, if $g(\xi_1, \xi_2)$, $h(\xi_1, \xi_2)$
are two such polynomials, his work provides a systematic procedure for
calculating the exact number of pairs $(x, y)$ satisfying $g(x, y) = 0$,
$h(x, y) = 0$, $a < x < b$, $a' < y < b'$, by certain calculations on $a$, $b$,
$a'$, $b'$ and the coefficients of $g(\xi_1, \xi_2)$ and $h(\xi_1, \xi_2)$. These computations
can be carried out effectively if all these numbers are rational, or more
generally, algebraic. Tarski’s procedure can thus be used to decide exactly
how many roots (and even with what multiplicities) a given polynomial
$f(\xi)$ with algebraic coefficients has within a given (algebraically defined)
region of the complex plane, and then to compute these roots to any
desired degree of accuracy.
In both cases not only the theoretical justification but also the procedure
to be followed is rather involved. It is thus outside of the range of this
book to try to describe them any more closely.
8.34 Theorem. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$ with $\deg(f(\xi)) = n > 0$.
Then:
(i) there exist $k, l \in I$ with $0 \le l \le n$, $2k + l = n$, and $a, b_1, \ldots, b_k$,
$c_1, \ldots, c_k$, $d_1, \ldots, d_l \in \mathrm{Re}$ such that for each $i = 1, \ldots, k$,
$b_i^2 - 4c_i < 0$ and such that
Note that the set of algebraic real numbers, as defined in 7.61, is just
the set $\mathrm{Alg} \cap \mathrm{Re}$. The following is just a restatement of part (i) of the
fundamental theorem 8.33. The theorem immediately following it lists
elementary consequences of the preceding definition.
We leave the proof of this to the reader; it is seen that the algebraic
closure of C enters essentially into the proof of (iii).
It can be shown that if $K$ is denumerable then so also is $\mathrm{Alg}(K)$; this
follows by essentially the same lines of argument as for 7.70. More generally
it can be shown that $K \sim \mathrm{Alg}(K)$.
Given $z, w \in A$ we can find polynomials $f(\xi), g(\xi) \in K[\xi]$ with $\deg(f(\xi)) =
n > 0$, $\deg(g(\xi)) = m > 0$, and $f(z) = 0$, $g(w) = 0$.
That $z^{-1} \in A$ if $z \ne 0$ is easiest to prove. If we write $f(\xi) = a_0 +
a_1\xi + \cdots + a_n\xi^n$ with all $a_i \in K$ then $a_0 + a_1z + \cdots + a_nz^n = 0$ so
$a_0(z^{-1})^n + a_1(z^{-1})^{n-1} + \cdots + a_n = 0$ and $z^{-1}$ is a root of the polynomial
$\sum_{i=0}^{n} a_i \xi^{n-i}$.
We now prove that $z + w \in A$. We can assume, without loss of
generality, that both $f(\xi)$, $g(\xi)$ are monic. Hence, by 8.33(ii), we can
find $z_1, \ldots, z_n, w_1, \ldots, w_m \in C$ with
n m
We wish to show that $h(\xi) \in K[\xi]$. First note that for any $i$,
\[ \prod_{j=1}^{m} \bigl(\xi - (z_i + w_j)\bigr) = \prod_{j=1}^{m} \bigl((\xi - z_i) - w_j\bigr) = g(\xi - z_i). \]
Hence
It is clear that any change in the order of the $z_i$ does not affect the value
of $h(\xi)$. This reminds us of the fundamental theorem on symmetric polynomials,
5.29. To put the matter in the proper form for application of
5.29 we consider polynomials in an $(n + 1)$-fold transcendental extension
$K[\xi, \xi_1, \ldots, \xi_n]$ of $K$. We wish to consider the element $\prod_{i=1}^{n} g(\xi - \xi_i)$
of $K[\xi, \xi_1, \ldots, \xi_n]$ as a polynomial in $\xi_1, \ldots, \xi_n$ with coefficients in
$K[\xi]$. This is possible by 5.19. We put $D = K[\xi]$ and then $D[\xi_1, \ldots, \xi_n] =
K[\xi, \xi_1, \ldots, \xi_n]$. Then in $D[\xi_1, \ldots, \xi_n]$ we have a polynomial $p(\xi_1, \ldots, \xi_n)$
with
Now by (4), (5), (7), (8), we have h(|) = p(zi, . . . , z„) and then
Now recall the close connection between roots, coefficients, and the or,;,
stated in 5.28(i): we have
Put . . . ,zn) = c*; then we see that A(|) = r(|, Ci, . . . , cTO), hence
(11) $h(\xi) \in K[\xi]$,
the proof that $z \cdot w \in A$. With this the proof of (1) is completed and we
now know that $A$ is a subfield of $C$.
To prove that $A$ is algebraically closed it is sufficient by 8.37(iii) to
show that $\mathrm{Alg}(A) \subseteq A$, i.e., that
Now
Also
$(\xi - z_1)^3 + (\xi - z_2)^3$
$= 2\xi^3 - 3\xi^2(z_1 + z_2) + 3\xi(z_1^2 + z_2^2) - (z_1^3 + z_2^3) = 2\xi^3 + 18\xi$.
Here h(£) has the eight roots given by all possible combinations of signs
in (±\/2 zt \/2 zb 4\/3)/2.
We shall return to a more intensive study of the algebraically closed
field Alg and its subfields in the next chapter.
8.39 Definition. Suppose that $f(\xi) \in C[\xi]$, $\deg(f(\xi)) = n > 0$ and
$f(\xi) = (\xi - z_1) \cdots (\xi - z_n)$. By the discriminant of $f(\xi)$, which we
denote by $\mathrm{Dis}(f(\xi))$, we mean the number $\prod_{1 \le i < j \le n} (z_i - z_j)^2$.
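Definition 8.39 translates directly into a short computation when the roots are known. The sketch below is illustrative only; the function name `discriminant_from_roots` is an assumption, not notation from the text.

```python
from itertools import combinations


def discriminant_from_roots(roots):
    """Product of (z_i - z_j)^2 over all pairs i < j, as in Definition 8.39."""
    d = 1
    for zi, zj in combinations(roots, 2):
        d *= (zi - zj) ** 2
    return d


print(discriminant_from_roots([1, 2, 3]))   # 4
print(discriminant_from_roots([1, 1, 2]))   # 0, since a root is repeated
print(discriminant_from_roots([1j, -1j]))   # (2i)^2 = -4 for the roots of x^2 + 1
```

The value is zero exactly when two of the listed roots coincide, which is why the discriminant detects multiple roots.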
$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Recall that by 8.8(iii) if $a$, $b$, $c$ are real then either both roots $z_1$, $z_2$ of
$f(\xi)$ are real, in which case $\mathrm{Dis}(f(\xi)) \ge 0$, or the roots $z_1$, $z_2$ are nonreal
complex conjugates, say $z_1 = x + iy$, $z_2 = x - iy$ with $y \neq 0$, in which
case $\mathrm{Dis}(f(\xi)) = (2iy)^2 = -4y^2 < 0$. Thus by computing the discriminant we are also able to tell whether a given real quadratic polynomial
has two distinct real roots, one multiple root, or two distinct nonreal
(conjugate) roots, according as $\mathrm{Dis}(f(\xi)) > 0$, $= 0$, or $< 0$. Of course,
this again agrees with the result of using the quadratic formula.
$27b_3^2 + 18b_1b_2b_3$.
For fourth and higher degree polynomials the computation becomes even
more involved. The following simplifying device is moderately helpful.
Its proof is left to the student.
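8.41 Theorem. Suppose that $n \ge 2$ and that $f(\xi) = \sum_{i=0}^{n} b_{n-i}\xi^i$ is in
$C[\xi]$ with $b_0 = 1$. Then for some $c_2, \ldots, c_n$,
$$g(\xi) = f\!\left(\xi - \frac{b_1}{n}\right) = \xi^n + c_2\xi^{n-2} + \cdots + c_n,$$
and
$$\mathrm{Dis}(f(\xi)) = \mathrm{Dis}(g(\xi)).$$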
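The substitution of 8.41 can be carried out mechanically. The following sketch, with hypothetical helper names `shift` and `depress`, uses exact rational arithmetic and assumes the polynomial is monic with coefficients listed from the highest degree down.

```python
from fractions import Fraction


def shift(coeffs, s):
    """Coefficients (highest degree first) of f(x - s), built by Horner's scheme."""
    result = []
    for c in coeffs:
        nxt = result + [Fraction(0)]   # multiply the running polynomial by x
        for i, r in enumerate(result):
            nxt[i + 1] -= s * r        # and subtract s times the running polynomial
        nxt[-1] += c                   # then add the next coefficient of f
        result = nxt
    return result


def depress(coeffs):
    """Apply the substitution x -> x - b_1/n to a monic polynomial of degree n >= 2."""
    if coeffs[0] != 1:
        raise ValueError("this sketch assumes a monic polynomial")
    n = len(coeffs) - 1
    return shift([Fraction(c) for c in coeffs], Fraction(coeffs[1]) / n)


# f(x) = x^3 + 6x^2 + 11x + 6 = (x + 1)(x + 2)(x + 3)
print(depress([1, 6, 11, 6]))   # coefficients of x^3 - x: the x^2 term has vanished
```

In the example every root is moved by the same amount $b_1/n = 2$, so the differences $z_i - z_j$ and hence the discriminant are unchanged.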
The latter part of this holds since $z$ is a root of $f(\xi)$ if and only if
$z + (b_1/n)$ is a root of $g(\xi) = f(\xi - b_1/n)$. Thus it suffices now to
restrict attention to $f(\xi)$ of the form $\xi^n + c_2\xi^{n-2} + \cdots + c_n$. From the
above we thus obtain the following.
8.43 Theorem. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$, $f(\xi)$ is monic, $\deg(f(\xi)) = 3$
and $f(\xi) = (\xi - z_1)(\xi - z_2)(\xi - z_3)$. Let $d = \mathrm{Dis}(f(\xi))$. Then
$d$ is real and we have:
(i) $d > 0$ if and only if $z_1$, $z_2$, $z_3$ are real and all distinct;
(ii) $d = 0$ if and only if $z_1$, $z_2$, $z_3$ are real and not all distinct;
(iii) $d < 0$ if and only if one of $z_1$, $z_2$, $z_3$ is real and the remaining two
numbers are nonreal and complex conjugates.
Clearly if $d \neq 0$ and $z_1$, $z_2$, $z_3$ are real then $d > 0$. From these facts we
can establish that the equivalences hold in (i)-(iii).
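For a cubic already in the form $\xi^3 + p\xi + q$ with real $p$, $q$, the classification of 8.43 can be applied using the discriminant value $-4p^3 - 27q^2$ that this section quotes from 8.42. The sketch below assumes that formula and exact (integer or rational) inputs; the function name is hypothetical.

```python
def classify_depressed_cubic(p, q):
    """Classify the roots of x^3 + p x + q (p, q real) by the sign of d = -4p^3 - 27q^2."""
    d = -4 * p ** 3 - 27 * q ** 2
    if d > 0:
        return d, "three distinct real roots"
    if d == 0:
        return d, "real roots, not all distinct"
    return d, "one real root and two nonreal conjugate roots"


print(classify_depressed_cubic(-4, 1))   # x^3 - 4x + 1
print(classify_depressed_cubic(-3, 2))   # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(classify_depressed_cubic(1, 1))    # x^3 + x + 1
```

A general monic cubic would first be brought to this form by the substitution of 8.41, which leaves the discriminant unchanged.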
Even with the simplification 8.41, the computation of $\mathrm{Dis}(f(\xi))$ for
$f(\xi)$ of degree 4, $f(\xi) = \xi^4 + p\xi^2 + q\xi + r$ ($f \in C[\xi]$), becomes quite
involved. It turns out in this case that we get $\mathrm{Dis}(f(\xi)) = 16p^4r - 4p^3q^2 - 128p^2r^2 + 144pq^2r - 27q^4 + 256r^3$. There is no value to us
here in going through the details of verifying this. Also a theorem analogous
to 8.43 can be established for real polynomials of degree 4. However the
possibilities for the roots are greater for each of the various cases.
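Although the general verification is tedious, the quartic formula is easy to spot-check against Definition 8.39 on polynomials with known roots. The sketch below is such a check under stated assumptions: the roots are integers summing to zero (so that the $\xi^3$ term vanishes), and all function names are hypothetical.

```python
from itertools import combinations


def poly_from_roots(roots):
    """Coefficients (highest degree first) of the monic polynomial with the given roots."""
    coeffs = [1]
    for z in roots:
        # multiply the current polynomial by (x - z)
        coeffs = [a - z * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def discriminant_from_roots(roots):
    d = 1
    for zi, zj in combinations(roots, 2):
        d *= (zi - zj) ** 2
    return d


def quartic_discriminant(p, q, r):
    """Discriminant of x^4 + p x^2 + q x + r by the closed formula quoted in the text."""
    return (16 * p**4 * r - 4 * p**3 * q**2 - 128 * p**2 * r**2
            + 144 * p * q**2 * r - 27 * q**4 + 256 * r**3)


for roots in ([1, -1, 2, -2], [1, 2, 3, -6]):
    one, cubic_term, p, q, r = poly_from_roots(roots)
    assert cubic_term == 0  # roots sum to zero, so the polynomial has the depressed form
    print(quartic_discriminant(p, q, r), discriminant_from_roots(roots))
```

Both columns agree for each example, which is evidence for (not a proof of) the formula.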
As we obtained in 8.11 a general representation of the complex roots of
an arbitrary quadratic polynomial in terms of its coefficients, we might
also hope to obtain the same sort of representation for the roots of cubic
and higher degree polynomials $f(\xi)$. Furthermore, we might hope that as
with the quadratic formula, such a representation could be given using the
rational operations $+$, $-$, $\cdot$, $^{-1}$ and the operation of taking the $n$th roots
of a complex number for various $n$. If we can find such a representation for
at least one root of $f(\xi)$, we say that $f(z) = 0$ is solvable by radicals.
The problem of thus describing the roots of higher degree equations
was one that occupied and baffled mathematicians until the beginning of
For $p = 0$ the roots $z$ of $\xi^3 + q$ are the three cube roots of $-q$, namely
$\sqrt[3]{-q}$, $\zeta\sqrt[3]{-q}$, $\zeta^2\sqrt[3]{-q}$ where $\zeta = \cos 2\pi/3 + i \sin 2\pi/3$ (these roots are
distinct if $q \neq 0$). We can assume now that $p \neq 0$. In view of the
history of the problem it may be expected that some ingenuity is now
involved which cannot be motivated beforehand. This is seen in the next
step: we seek to find roots $z$ of (8:3-1) which have the form
(8:3-2) $z = w - \dfrac{p}{3w}$.
Suppose that $f(z) = 0$; then we can find at least one such $w$ as a solution
of $w^2 - zw - p/3 = 0$, and $w \neq 0$ for otherwise $p = 0$. For any such
$w$ we have, by direct substitution and expansion,
(8:3-3) $w^3 - \dfrac{p^3}{27w^3} + q = 0$,
Then
(8:3-5) $u^2 + qu - \dfrac{p^3}{27} = 0$.
Proceeding formally for the moment, we write the solution of this last
equation by the quadratic formula as
$$u = -\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}},$$
where $\sqrt{\ }$ in this case indicates only one of the square roots. Then
(8:3-6) $w = \sqrt[3]{-\dfrac{q}{2} + \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}}$,
where $\sqrt[3]{\ }$ indicates only one of three cube roots. Thus we have the following conclusion: if $f(z) = 0$ then we can find $w$ satisfying (8:3-2) and
(8:3-6), the latter under some definite choice of the square and cube roots
involved. Conversely, consider any one of the (in general, six) numbers $w$
denoted by (8:3-6). Then if we set $u = w^3$ we have
since (8:3-6) provides us with six numbers to choose from. This is settled
as follows. Let
wi
V J_ V
w /1*
3 wi 3 w.
W2 P J_ V J_
3 w2 3 w'
similarly
V J_ V J_ w2.
™3
3 w3 3 W2
then
$\{z_1, z_2, z_3\} = \{z_1', z_2', z_3'\}$
though, of course, the order is not necessarily preserved. The same conclusion can be seen to hold if $-p/3$ is $\zeta(w_1w_1')$ or $\zeta^2(w_1w_1')$. Since we have
seen that $z$ is a root of $f(\xi)$ if and only if it is one of the numbers $z_1$, $z_2$,
$z_3$, $z_1'$, $z_2'$, $z_3'$, it follows that $z$ is a root of $f(\xi)$ if and only if it is one of
$z_1$, $z_2$, $z_3$ or, equivalently, one of $z_1'$, $z_2'$, $z_3'$.
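The procedure just described (choose one square root, one cube root $w$, and then use $w$, $\zeta w$, $\zeta^2 w$) can be sketched numerically. This is an illustration in floating-point complex arithmetic, not the author's formulation; the function name is hypothetical and the case $p = 0$ is excluded as in the text.

```python
import cmath


def depressed_cubic_roots(p, q):
    """Approximate roots of x^3 + p x + q for p != 0, via z = w - p/(3w)."""
    if p == 0:
        raise ValueError("this sketch assumes p != 0")
    # One root u of u^2 + q u - p^3/27 = 0 (one definite choice of square root).
    u = -q / 2 + cmath.sqrt(q * q / 4 + p ** 3 / 27)
    w = u ** (1 / 3)                        # one definite cube root of u
    zeta = complex(-0.5, 3 ** 0.5 / 2)      # cos(2*pi/3) + i sin(2*pi/3)
    roots = []
    for k in range(3):
        wk = w * zeta ** k                  # the three cube roots of u
        roots.append(wk - p / (3 * wk))
    return roots


# x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)
for z in depressed_cubic_roots(-7, 6):
    print(z, abs(z ** 3 - 7 * z + 6))       # each residual is close to 0
```

Choosing the other square root would produce the same three roots in a possibly different order, which is the content of the set equality above.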
Recall that by 8.42 the discriminant of $\xi^3 + p\xi + q$ is $-4p^3 - 27q^2 = -108(q^2/4 + p^3/27)$. This leads us to summarize the above arguments
in the following form.
The roots of the general cubic $\xi^3 + b_1\xi^2 + b_2\xi + b_3$ can be found from
this theorem by applying the transformation of 8.41. Consider the special
case that all the coefficients of this polynomial are real. Then the polynomial $f(\xi) = \xi^3 + p\xi + q$ associated with it by 8.41 also has $p$, $q$ real.
We know in this case how to classify the roots according as $d > 0$, $d = 0$,
(8:3-9) $b + c - a^2 = p$, $a(c - b) = q$, $bc = r$.
(8:3-11) u = a2
we have
Now by 8.41 and 8.44 we can solve this equation for u by radicals. There
are in general three such solutions and by (8:3-11) a can be taken to be
one square root of one of them; then b, c can be determined from a by
(8:3-10). The question now is, which of the six possible choices of a should
we take? Tracing back our steps shows that any one of these will do—
in other words, no matter what root u we take of (8:3-12) and which
square root a we take of u, if we define b, c from such a by (8:3-10) then
the relationships (8:3-9) will hold and hence also the decomposition (8:3-8).
(If p, q, r are real we can also choose a, b, c to be real by 8.34 or directly
from here, since it can be seen in this case that (8:3-12) has at least one
real root u > 0.) Since we can find the roots of each of the factors of
(8:3-8) by the quadratic formula, we thus reach the following conclusion:
the general fourth degree polynomial equation over C can be solved by radicals.
In fact, by the use of ambiguous radicals $\sqrt{\ }$, $\sqrt[3]{\ }$ as in (8:3-6), one can
give in terms of the coefficients a single formula which denotes, under the
different particular interpretations of the radical signs, all of the complex
roots of any fourth degree polynomial.
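The relations (8:3-9) can be checked by expanding a product of two quadratics. The sketch below assumes the decomposition has the form $(\xi^2 + a\xi + b)(\xi^2 - a\xi + c)$, which is what (8:3-9) implies, and it assumes a cubic in $u = a^2$ obtained by eliminating $b$ and $c$ from (8:3-9); the displayed equations (8:3-10) and (8:3-12) themselves are not reproduced here, so the `resolvent` function is a derived stand-in rather than a quotation.

```python
from fractions import Fraction


def multiply(f, g):
    """Product of two polynomials, coefficients listed highest degree first."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += Fraction(x) * Fraction(y)
    return out


def quartic_from_factors(a, b, c):
    """Expand (x^2 + a x + b)(x^2 - a x + c)."""
    return multiply([1, a, b], [1, -a, c])


def resolvent(p, q, r, u):
    """u^3 + 2p u^2 + (p^2 - 4r) u - q^2, derived from (8:3-9) with u = a^2 (an assumption)."""
    return u ** 3 + 2 * p * u ** 2 + (p * p - 4 * r) * u - q * q


a, b, c = 1, 2, 3
one, zero, p, q, r = quartic_from_factors(a, b, c)
print(p, q, r)                       # b + c - a^2, a(c - b), bc
print(resolvent(p, q, r, a * a))     # 0: u = a^2 satisfies the cubic
```

Once a suitable $a$ is known, each quadratic factor is solved by the quadratic formula, as the text concludes.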
Thus $f(\xi)$, $g(\xi)$ have at least one root in common if and only if $\mathrm{Res}(f(\xi), g(\xi)) = 0$.
(a) State and prove a theorem analogous to 8.40 for resultants.
(b) What is the relationship between Dis (/(£)) and Res (/(£),
for n > 1 ?
6. Prove Theorem 8.41.
7. (a) Classify the roots of the following polynomials according to 8.43(i)-
(iii):
(i) $\xi^3 - 4\xi + 1$,
(ii) $2\xi^3 - 6\xi^2 - 1$.
(9:1-1) $1 \in K$
and
$z + w \in K$, $z - w \in K$, $z \cdot w \in K$
Proof. Part (i) is immediately obtained by using 9.1, since the collection $M$ of all subfields $L$ of $C$ with $Z \subseteq L$ has at least $C$ as one member.
Parts (ii), (iii) follow directly from the definition 9.2. To prove (iv), note
first that $Z_n \subseteq Z_m$ whenever $n \le m$, since $Z_n \subseteq Z_{n+1}$ for all $n$. Let
$K = \bigcup Z_n\,[n \in P]$. Then $1 \in K$. If $z, w \in K$ then for some $n$, $m$ we have
$z \in Z_n$ and $w \in Z_m$. If $n \le m$ then both $z, w \in Z_m$ and $z + w$, $z \cdot w$,
and $z/w$ (in case $w \neq 0$) belong to $Z_{m+1}$ and hence to $K$; similarly if
$m \le n$. Thus $K$ is a subfield of $C$ and $Z \subseteq K$. Hence by (iii), $\mathrm{Gen}(Z) \subseteq K$.
To show that $K \subseteq \mathrm{Gen}(Z)$, it is sufficient to prove that each $Z_n \subseteq \mathrm{Gen}(Z)$;
this is easily proved by induction on $n$, using (i).
Part (i) of the above justifies the terminology in 9.2. Parts (i)-(iii)
also justify our referring to Gen (Z) as the smallest subfield of C which
contains Z. Part (iv) provides us with an alternative (inductive) way of
regarding Gen (Z), which could just as well have been taken as the basic
definition. It corresponds to our intuitive idea that Gen (Z) can be con¬
structed by starting with 1 and the elements of Z and repeatedly applying
the rational operations any finite number of times.
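The inductive description can be made concrete for rational generators. In this sketch the starting stage is assumed to consist of 1 together with the elements of $Z$, and each later stage adds all sums, differences, products, and quotients of earlier elements; the exact stage definition of the text is not reproduced, so these conventions and the function names are assumptions.

```python
from fractions import Fraction


def next_stage(stage):
    """One round of closure under the rational operations."""
    new = set(stage)                       # each stage contains the previous one
    for z in stage:
        for w in stage:
            new.update((z + w, z - w, z * w))
            if w != 0:
                new.add(z / w)
    return new


def stages(generators, steps):
    """Return the list of stages Z_0, Z_1, ..., Z_steps for the given generators."""
    current = {Fraction(1)} | {Fraction(g) for g in generators}
    result = [current]
    for _ in range(steps):
        current = next_stage(current)
        result.append(current)
    return result


z0, z1 = stages([2], 1)
print(sorted(z1))   # already contains 0, 1/2, 3, 4, ...
```

Every element of the generated subfield appears at some finite stage, although no single finite stage is closed under the operations.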
There is nothing special about the role of C here. We could also define
for any field and any set of elements in the field, the subfield generated
by that set. We would then obtain a theorem just like 9.3 for this notion.
Similarly, we can define the notion of the subdomain generated by a set
of elements in an integral domain, and prove a similar theorem, and so on
for other kinds of algebraic systems like rings, etc. A number of the follow¬
ing results also hold when adapted to these other contexts.
The proof of the following is left to the reader.
9.4 Theorem.
Proof. The only part of the preceding which is not a direct adaptation
of the corresponding parts of 9.3 and 9.4 is (vii). It is seen as follows.
By (v), $K(Z) \subseteq K(Z \cup Z')$, and by (ii), $Z' \subseteq Z \cup Z' \subseteq K(Z \cup Z')$. Since
$K(Z \cup Z')$ is a subfield of $C$ by (i), it follows by (iii) that $(K(Z))(Z') \subseteq$
Proof. Parts (i) and (ii) are obvious; in particular, $G(0) = 0$, $G(1) = 1$
by (i); (iii) follows by our previous remarks. Clearly $D$ is closed under
$+$ and $\cdot$, with $G(f_1(\xi) + f_2(\xi)) = f_1(z) + f_2(z)$, $G(f_1(\xi) \cdot f_2(\xi)) = f_1(z) \cdot f_2(z)$. This proves (iv) and (v).
9.9 Theorem. Suppose that $K$ is a subfield of $C$ and $z \in C$ but $z \notin \mathrm{Alg}(K)$.
Then the function $G$ defined in 9.8 can be extended to an isomorphic
mapping of $K(\xi)$ onto $K(z)$, so $K(\xi) \cong K(z)$.
/i(g) = ffiO)
/2(g) ^2(2)
m) _ gitt)
fi{k) 02(0
Clearly the extended mapping $G'$ also keeps fixed each element of $K$ and
preserves $-$, $\cdot$, and $^{-1}$. Hence all that remains to be proved is that
$K(z)$ is the same as the range of $G'$. Let $L = \mathcal{R}(G')$. Then $K \subseteq L$, $z \in L$
and $L$ is a subfield of $C$. Hence $K(z) \subseteq L$ by 9.6(iii). On the other hand,
it is clear that each element $f_1(z)/f_2(z)$ of $L$ is already in $K(z)$, since the
latter is closed under all the rational operations.
Essentially what is involved in this proof is that if $z$ is not algebraic
over $K$ then $K[z]$, i.e., the range $D$ of the mapping $G$ of 9.8, is a simple
transcendental extension of $K$. But then $K[\xi] \cong K[z]$ by 5.6. Hence the
corresponding fields of quotients are $\cong$ by 6.12. Since $K(z)$ can be seen
to be the field of quotients of $K[z]$, we have $K(\xi) \cong K(z)$.
This theorem completely determines the structure of simple transcendental extensions $K(z)$ in $C$, i.e., extensions where $z$ is not algebraic over
$K$. These are all isomorphic to $K(\xi)$ and hence to each other. Thus,
for example, $\pi$ and $e$ are algebraically indistinguishable over the rationals,
$\mathrm{Ra}(\pi) \cong \mathrm{Ra}(e)$.
case that $z$ is algebraic over $K$. Now it can be seen directly that $K(z)$
consists of all elements $f_1(z)/f_2(z)$ for which $f_1(\xi), f_2(\xi) \in K[\xi]$ and
$f_2(z) \neq 0$. However, as we saw earlier, we are blocked from establishing
the same sort of connection between $K(\xi)$ and $K(z)$ in this case as for the
nonalgebraic case 9.9.
Our first step in treating the case that $z$ is algebraic over $K$ is to obtain
a survey of the set of all polynomials $f(\xi)$ over $K$ of which $z$ is a root.
This is accomplished by the next theorem.
Proof. First let $p(\xi)$ be any monic polynomial in $K[\xi]$ of degree $n$ such
that $p(z) = 0$; by definition of $\mathrm{Alg}(K)$, there is at least one such polynomial. Let $f(\xi) \in K[\xi]$ be arbitrary with $f(\xi) \neq 0$. By the division
algorithm 6.26 there exist (unique) $h(\xi)$, $r(\xi)$ in $K[\xi]$ with
Then
9.11 Definition. For any $p(\xi)$ in $K[\xi]$ and $f_1(\xi), f_2(\xi) \in K[\xi]$, we let
$f_1(\xi) \equiv f_2(\xi) \pmod{p(\xi)}$ hold if and only if $p(\xi) \mid (f_1(\xi) - f_2(\xi))$.
9.12 Theorem. Suppose that $K$ is a field and $p(\xi)$ is prime in $K[\xi]$. Then
(i) $\equiv \pmod{p(\xi)}$ is a congruence relation in $(K[\xi], +, \cdot, 0, 1)$;
(ii) the system of equivalence sets $[f(\xi)]$ of this relation under the
associated operations forms a field;
(iii) for each $f(\xi) \in K[\xi]$ there is a unique $r(\xi) \in K[\xi]$ with $f(\xi) \equiv r(\xi) \pmod{p(\xi)}$ and $0 \le \deg(r(\xi)) < \deg(p(\xi))$.
forms a commutative ring with unity, so long as we know that $[1] \neq [0]$.
This holds, of course, from the assumption that $p(\xi)$ is prime in $K[\xi]$.
It remains only to prove that
(2) if $f(\xi) \not\equiv 0 \pmod{p(\xi)}$ then for some $g(\xi) \in K[\xi]$, we have
$f(\xi) \cdot g(\xi) \equiv 1 \pmod{p(\xi)}$.
For then each nonzero equivalence set $[f(\xi)]$ has an inverse $[g(\xi)]$,
$[f(\xi)] \odot [g(\xi)] = [1]$. Since $p(\xi) \nmid f(\xi)$ by hypothesis here, we must have
$(p(\xi), f(\xi)) = 1$. Hence by the representation 6.30 of gcd, there are
polynomials $h(\xi)$, $g(\xi)$ in $K[\xi]$ with $1 = h(\xi)p(\xi) + f(\xi)g(\xi)$. But this is
just the conclusion desired in (2), and thus (ii) is proved. (iii) is just a
restatement of the division algorithm for division by $p(\xi)$.
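The two ingredients of this proof, division with remainder and the gcd representation, can be sketched for $K = \mathrm{Ra}$ with exact fractions. The code below is an illustrative implementation, not the book's: polynomials are lists of coefficients with the lowest degree first, and all function names are hypothetical.

```python
from fractions import Fraction


def trim(f):
    """Convert to Fractions and drop zero leading coefficients (lowest degree first)."""
    f = [Fraction(c) for c in f]
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def add(f, g):
    n = max(len(f), len(g))
    f = list(f) + [Fraction(0)] * (n - len(f))
    g = list(g) + [Fraction(0)] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])


def mul(f, g):
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(f, p):
    """Return (h, r) with f = h*p + r and r = 0 or deg r < deg p; p must be nonzero."""
    f, p = trim(f), trim(p)
    quotient = [Fraction(0)] * max(len(f) - len(p) + 1, 1)
    r = list(f)
    while len(r) >= len(p) and r != [0]:
        shift = len(r) - len(p)
        factor = r[-1] / p[-1]
        quotient[shift] = factor
        for i, c in enumerate(p):
            r[i + shift] -= factor * c       # cancel the leading term of r
        r = trim(r)
    return trim(quotient), r


def inverse_mod(f, p):
    """Return g with f*g congruent to 1 mod p, assuming gcd(f, p) = 1."""
    # Invariant: r0 = s0*f and r1 = s1*f modulo p.
    r0, r1 = trim(p), divmod_poly(f, p)[1]
    s0, s1 = [Fraction(0)], [Fraction(1)]
    while r1 != [0]:
        h, r2 = divmod_poly(r0, r1)
        r0, r1 = r1, r2
        s0, s1 = s1, add(s0, mul([-c for c in h], s1))
    if len(r0) != 1:
        raise ValueError("f and p are not relatively prime")
    return divmod_poly([c / r0[0] for c in s0], p)[1]


p = [-2, 0, 1]                  # x^2 - 2, prime in Ra[x]
g = inverse_mod([1, 1], p)      # inverse of the class of 1 + x
print(g)                        # the class of -1 + x
print(divmod_poly(mul([1, 1], g), p)[1])   # remainder 1
```

With $p(\xi) = \xi^2 - 2$ this recovers the familiar identity $(1 + \sqrt{2})(-1 + \sqrt{2}) = 1$.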
We can now bring these results together to obtain the following descrip¬
tion of simple algebraic extensions.
and w = r(z).
The Theorems 9.12 and 9.13 are quite important initial results in the
algebraic theory of fields. We wish to illustrate them with a few examples.
Consider first the case $K = \mathrm{Ra}$, $z = \sqrt{2}$. The unique monic prime
$p(\xi) \in \mathrm{Ra}[\xi]$ with $\sqrt{2}$ as root is $p(\xi) = \xi^2 - 2$. According to 9.13(ii),
each element $w$ of $\mathrm{Ra}(\sqrt{2})$ has a unique representation in the form
$w = r(\sqrt{2})$ where $r(\xi) \in \mathrm{Ra}[\xi]$ has degree less than 2. Hence for each such
$w$ there are unique rational numbers $a$, $b$ with $w = a + b\sqrt{2}$. That the
set of these numbers forms a subfield of $C$ is the result of 9.13(i). To see
this directly, we need only verify closure under the rational operations
$+$, $-$, $\cdot$, $^{-1}$. Verification is trivial for $+$ and $-$:
$$\frac{1}{a + b\sqrt{2}} = \frac{a - b\sqrt{2}}{(a + b\sqrt{2})(a - b\sqrt{2})} = \frac{a}{a^2 - 2b^2} - \frac{b}{a^2 - 2b^2}\sqrt{2}.$$
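The closure of the numbers $a + b\sqrt{2}$ under the rational operations can be exercised with a small exact model. The class below is a hypothetical sketch (the name `QSqrt2` and its methods are not from the text) that stores the rational pair $(a, b)$.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """The number a + b*sqrt(2) with rational a, b."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return QSqrt2(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, since r^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        norm = self.a ** 2 - 2 * self.b ** 2   # nonzero unless a = b = 0
        if norm == 0:
            raise ZeroDivisionError("zero has no inverse")
        return QSqrt2(self.a / norm, -self.b / norm)


w = QSqrt2(Fraction(1), Fraction(1))       # 1 + sqrt(2)
print(w.inverse())                          # -1 + sqrt(2)
print(w * w.inverse())                      # 1 + 0*sqrt(2)
```

The norm $a^2 - 2b^2$ cannot vanish for rational $a$, $b$ not both zero, because $\sqrt{2}$ is irrational.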
From the point of view of polynomials modulo $\xi^2 + 1$, this last takes the
form $(a + b\xi) \cdot (a_1 + b_1\xi) = aa_1 + (ab_1 + ba_1)\xi + bb_1\xi^2$, and
$(a + b\xi) \cdot (a_1 + b_1\xi) \equiv (aa_1 - bb_1) + (ab_1 + ba_1)\xi \pmod{\xi^2 + 1}$,
Adjoining roots to arbitrary fields. Theorems 9.12 and 9.13 show that if
$K$ is a subfield of $C$, $p(\xi)$ is prime in $K[\xi]$ and $z$ is a root of $p(\xi)$, then we
can construct a field isomorphic to $K(z)$ in the system of equivalence
sets modulo $p(\xi)$. From the purely algebraic point of view the set $C$ of
complex numbers is superfluous here, for we can already verify directly
that the polynomial $p(\xi)$ (or more precisely, the polynomial with corresponding coefficients in the system of equivalence sets) has a root in the
system of equivalence sets. This leads us to the following general theorem,
which we have alluded to several times earlier.
9.15 Theorem. Suppose that $(K, +, \cdot, 0, 1)$ is any field and $f(\xi) \in K[\xi]$
is of degree $m > 0$.
(i) We can construct a field $(L, +, \cdot, 0, 1)$ which contains $K$ as a
subfield and at least one element $x$ with $f(x) = 0$.
(ii) We can construct such a field $L$ which contains elements $x_1, \ldots, x_m$
with $f(\xi) = (\xi - x_1) \cdots (\xi - x_m)$ in $L[\xi]$.
But $x^2 = -x - 1 = x + 1$ so
$(a + bx) \cdot (a_1 + b_1x) = (aa_1 + bb_1) + (ab_1 + ba_1 + bb_1)x$.
These conditions for sum and product consistently determine the desired
field $L$. Its elements are 0, 1, $x$ and $1 + x$, and the above rule leads to the
following product table:
  ·     |  0    1      x      1 + x
  0     |  0    0      0      0
  1     |  0    1      x      1 + x
  x     |  0    x      1 + x  1
  1 + x |  0    1 + x  1      x
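The product table can be regenerated from the multiplication rule. In this sketch an element $a + bx$ is represented by the pair `(a, b)` with entries 0 or 1 and all arithmetic is taken modulo 2; the representation and function names are illustrative assumptions.

```python
def multiply(u, v):
    """(a + b x)(a1 + b1 x) = (a a1 + b b1) + (a b1 + b a1 + b b1) x, coefficients mod 2."""
    a, b = u
    a1, b1 = v
    return ((a * a1 + b * b1) % 2, (a * b1 + b * a1 + b * b1) % 2)


NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "x", (1, 1): "1+x"}
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]

for u in ELEMENTS:
    row = [NAMES[multiply(u, v)] for v in ELEMENTS]
    print(NAMES[u].ljust(4), " ".join(name.ljust(4) for name in row))
```

Each nonzero row contains 1 exactly once, so every nonzero element has an inverse, as required of a field.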
$$\zeta = \cos\frac{2\pi}{3} + i \sin\frac{2\pi}{3}$$
with coefficients in $K$, and so on for $K(z_1, z_2, z_3)$, etc. Moreover, each of
these representations is unique.
This suggests studying extensions of the form $L = K(v_1, \ldots, v_m)$ with
the property that every element of $L$ can already be uniquely expressed
as a linear combination of $v_1, \ldots, v_m$ with coefficients in $K$. In particular,
we would consider $K(z_1)$ in the form $K(1, z_1, \ldots, z_1^{n_1 - 1})$ and $K(z_1, z_2)$
in the form $K(1, z_1, z_2, \ldots, z_1^{n_1 - 1}z_2^{n_2 - 1})$ when looking at the
matter from this point of view. The main first result which we wish to
obtain about such extensions is that if such $v_1, \ldots, v_m$ can be found at
all satisfying the above conditions with $K$ and $L$ then the number $m$ of
these is uniquely determined by $K$ and $L$. For those readers familiar with
linear algebra, this is a direct consequence of a well-known result about
vector spaces. For if $K \subseteq L$, $L$ can be regarded as a vector space over the
field $K$; in this we are concerned only with the operations of addition
applied to elements of $L$ and of multiplication of elements of $L$ by elements
of $K$. If $v_1, \ldots, v_m$ can be found so that every element of $L$ is a linear
combination of $v_1, \ldots, v_m$ with coefficients in $K$, then $L$ is a finite-dimensional vector space over $K$. If, moreover, this representation is unique
then $v_1, \ldots, v_m$ forms a linearly independent basis for $L$ over $K$. It is shown
in vector space theory that any two linearly independent bases have the same
number of elements, which gives the desired result here.
The choice $K^*(\emptyset) = \{0\}$ here has a slight technical advantage. Clearly
$K^*(V) \subseteq K(V)$ but, in general, these are distinct. For example, $\mathrm{Ra}^*(\sqrt{2})$
consists only of numbers of the form $a\sqrt{2}$ for $a \in \mathrm{Ra}$, hence does not contain any rational other than 0. However, as we know from 9.13, $\mathrm{Ra}(\sqrt{2}) = \mathrm{Ra}^*(1, \sqrt{2})$.
For $V \neq \emptyset$, $K^*(V)$ can be defined in a manner similar to $K(V)$ via 9.2:
it is the smallest set (of elements of C) which contains V and which is
closed under addition and under multiplication by elements of K. (In
terms of vector space theory, we would call K*(V) the space generated or
spanned by V over K.) This definition leads to the following properties,
which are similar to those of K(V) and are easy to verify. Here, and
throughout the following, U, V, W, etc., are taken to be arbitrary subsets
of C.
9.17 Theorem.
(i) $V \subseteq K^*(V)$.
(ii) If $U \subseteq V$ then $K^*(U) \subseteq K^*(V)$.
(iii) $K^*(K^*(V)) = K^*(V)$.
(iv) $K^*(V) = \bigcup K^*(W)\,[W \subseteq V$ and $W$ is finite$]$.
(v) $V$ is independent over $K$ if and only if for each finite $W \subseteq V$,
$W$ is independent over $K$.
9.18 Theorem.
(i) If $w \in K^*(V \cup \{u\})$ and $w \notin K^*(V)$ then $u \in K^*(V \cup \{w\})$.
(ii) If $U$ is finite then for some $V \subseteq U$, $V$ is independent over $K$ and
$K^*(V) = K^*(U)$.
(iii) If $V = \{v_1, \ldots, v_m\}$ with $m \ge 1$ then $V$ is independent over $K$
if and only if for each $w \in K^*(V)$ there are unique $a_1, \ldots, a_m \in K$
with $w = a_1v_1 + \cdots + a_mv_m$.
The result (i), often called the exchange condition, is one of the main
properties needed to prove (ii) above, as well as the next two important
results.
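Part (ii) of 9.18 has a familiar computational counterpart when the elements of $U$ are given by rational coordinates with respect to some fixed independent set. The sketch below assumes exactly that representation (for instance, $a + b\sqrt{2}$ as the pair $(a, b)$ over Ra) and selects an independent subset with the same span by elimination; the function names are hypothetical.

```python
from fractions import Fraction


def reduce_against(vec, rows):
    """Subtract multiples of the stored rows so that vec vanishes at every pivot position."""
    v = list(vec)
    for pivot, row in rows:
        if v[pivot] != 0:
            factor = v[pivot] / row[pivot]
            v = [a - factor * b for a, b in zip(v, row)]
    return v


def independent_subset(vectors):
    """Return a sublist of `vectors` that is independent and spans the same space."""
    chosen, rows = [], []
    for vec in vectors:
        v = reduce_against([Fraction(x) for x in vec], rows)
        pivot = next((i for i, x in enumerate(v) if x != 0), None)
        if pivot is not None:          # vec is not a combination of the vectors chosen so far
            rows.append((pivot, v))
            chosen.append(vec)
    return chosen


# Coordinates with respect to {1, sqrt(2)}: 1, 2, sqrt(2), 1 + sqrt(2)
print(independent_subset([(1, 0), (2, 0), (0, 1), (1, 1)]))   # [(1, 0), (0, 1)]
```

Each stored row is already reduced against the earlier ones, so later subtractions never disturb an earlier pivot position.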
Proof. For the purpose of the proof we will verify a more general statement. We prove by induction on $m \ge 0$ that whenever $U$, $V$ satisfy the
hypotheses of the theorem then there exists $W \subseteq U$ where $W$ contains
exactly $m$ elements and where $K^*(U) = K^*(V \cup (U - W))$. In other
words, we can exchange the elements of $V$ for certain elements of $U$ without affecting $K^*(U)$. The proof of this is trivial for $m = 0$. Suppose that
it is true for $m - 1$, where $m > 0$, and suppose that $V$, $U$ satisfy the
hypotheses of the theorem for $m$. Then for distinct $v_1, \ldots, v_m$ we have
$V = \{v_1, \ldots, v_m\}$. Let $V_1 = \{v_1, \ldots, v_{m-1}\} = V - \{v_m\}$. By hypothesis we can find $W_1 \subseteq U$, such that $W_1$ contains exactly $m - 1$
elements and such that $K^*(U) = K^*(V_1 \cup (U - W_1))$. Since $v_m \in K^*(V) \subseteq K^*(U)$, there is a finite subset $U_1$ of $(U - W_1)$ with $v_m \in K^*(V_1 \cup U_1)$. Among such $U_1$ there is one with a least number of elements. It is impossible that $U_1 = \emptyset$, for otherwise $v_m \in K^*(V_1) = K^*(V - \{v_m\})$, contradicting the independence of $V$. Choose any $u \in U_1$.
Since $U_1 - \{u\}$ has fewer elements than $U_1$, we have
Thus, by exchange,
$= K^*(V \cup (U - W))$.
Hence
and then
9.20 Theorem. Suppose that $U$, $V$ are both finite and independent over $K$
and that $K^*(V) = K^*(U)$. Then $V \sim U$, that is, $V$ and $U$ have
exactly the same number of elements.
m = [K*(U):K].
Finite field extensions. We can now connect these results with field
extensions, as initially suggested.
When dealing with finite extensions we can make use of the number
$[L : K]$.
Z = 2
1=0, jV*
aPJ-
(1) $w = \sum_{i=1}^{m} b_i u_i$.
Moreover, for each $b_i$, $i = 1, \ldots, m$, we can find $a_{i,1}, \ldots, a_{i,n} \in K$ with
(2) $b_i = \sum_{j=1}^{n} a_{i,j} v_j$.
Hence
(3) $w = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j} u_i v_j$.
This shows that the set $W$ of all numbers $u_iv_j$ for $i = 1, \ldots, m$, $j = 1, \ldots, n$ generates $M$ over $K$, $M = K^*(W)$. Thus $M$ is a finite extension
of $K$. If we show that the elements $u_iv_j$ are distinct for the distinct pairs
$(i, j)$ and that $W$ is independent over $K$ then $[M : K] = mn$. The following
argument proves both of these. Suppose that $a_{i,j}, a'_{i,j} \in K$, and
Then by (4),
rrt, nv
(G) 23 23
i—X i= 1
n n
[K{z):K] [L:K\.
$$\zeta = \cos\frac{2\pi}{3} + i \sin\frac{2\pi}{3}.$$
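To exhibit a basis for $L$ over Ra we use 9.25 and the proof of 9.26. A basis
for $\mathrm{Ra}(\sqrt[3]{2})$ over Ra is $\{1, \sqrt[3]{2}, (\sqrt[3]{2})^2\}$ and one for $\mathrm{Ra}(\sqrt[3]{2})(\zeta)$ over
$\mathrm{Ra}(\sqrt[3]{2})$ is $\{1, \zeta\}$. Hence $\{1, \sqrt[3]{2}, \sqrt[3]{4}, \zeta, \zeta\sqrt[3]{2}, \zeta\sqrt[3]{4}\}$ is a basis for $L$
over Ra.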
Corollary 9.28 can also be used to give a new and, on the basis of the
techniques developed in this chapter, simpler proof of Theorem 8.38:
that whenever $K$ is a subfield of $C$ then $\mathrm{Alg}(K)$ is an algebraically closed
subfield of $C$. For suppose that $z_1, z_2 \in \mathrm{Alg}(K)$. Then also $z_2$ is algebraic
over $K(z_1)$. Hence, by the preceding, $K(z_1, z_2)$ is a finite extension of $K$.
But then every element of $K(z_1, z_2)$ is algebraic over $K$ by 9.24. In
particular, this is true of $z_1 + z_2$, $z_1 - z_2$, $z_1 \cdot z_2$ and, in case $z_2 \neq 0$,
$z_1/z_2$. To prove algebraic closure, suppose that $z \in \mathrm{Alg}(\mathrm{Alg}(K))$, i.e.
for some $w_0, \ldots, w_m \in \mathrm{Alg}(K)$ with $w_m \neq 0$ we have
$w_0 + w_1z + \cdots + w_mz^m = 0$.
$L = K(w_0, \ldots, w_m, z)$
(9:3-1) $ax + by = c$
for some fixed point $(a, b)$ and fixed $c \in \mathrm{Re}$ with $c > 0$. To say that
$(x, y)$ lies on the line $\mathcal{L}$ determined by (9:3-1) we mean simply that $(x, y)$
satisfies (9:3-1). Similarly we define what is meant by: $(x, y)$ lies on the
circle $\mathcal{C}$. It is seen that if $\mathcal{L}_1$, $\mathcal{L}_2$ are any two lines then $\mathcal{L}_1 = \mathcal{L}_2$ or $\mathcal{L}_1$ is
disjoint from $\mathcal{L}_2$ (in other words, $\mathcal{L}_1$ is parallel to $\mathcal{L}_2$) or $\mathcal{L}_1 \cap \mathcal{L}_2 = \{(x, y)\}$ for a unique point $(x, y)$. In the last case we say that $\mathcal{L}_1$, $\mathcal{L}_2$
meet or intersect at $(x, y)$. Similarly we can speak of the points of intersection, if any, of a line and a circle or of two circles.
Algebraically, such points of intersection can be determined as follows.
To find the intersection of $\mathcal{L}_1 = \{(x, y) : a_1x + b_1y = c_1\}$ and $\mathcal{L}_2 = \{(x, y) : a_2x + b_2y = c_2\}$, it is first seen that $\mathcal{L}_1 = \mathcal{L}_2$ if and only if there
is some $h \neq 0$ with $a_1 = ha_2$, $b_1 = hb_2$ and $c_1 = hc_2$, and that $\mathcal{L}_1 \cap \mathcal{L}_2 = \emptyset$ if and only if there is some $h \neq 0$ with $a_1 = ha_2$, $b_1 = hb_2$,
and $c_1 \neq hc_2$. Then if $\mathcal{L}_1 \neq \mathcal{L}_2$ and $\mathcal{L}_1 \cap \mathcal{L}_2 \neq \emptyset$, it can be seen that
there is a unique solution $(x, y)$ of
(The hypotheses ensure that $a_1/a_2 \neq b_1/b_2$, hence $a_1b_2 - a_2b_1 \neq 0$.)
Hence the coordinates $x$, $y$ of the point of intersection of two lines can be found
by rational operations from the coefficients of the equations determining those
lines.
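That only rational operations are needed can be seen from Cramer's rule. The following sketch represents a line $ax + by = c$ by the hypothetical triple `(a, b, c)` and works with exact fractions; it returns `None` when $a_1b_2 - a_2b_1 = 0$, that is, when the lines are identical or parallel.

```python
from fractions import Fraction


def intersect_lines(line1, line2):
    """Intersection of a1 x + b1 y = c1 and a2 x + b2 y = c2, or None if there is no unique point."""
    a1, b1, c1 = map(Fraction, line1)
    a2, b2, c2 = map(Fraction, line2)
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None                        # identical or parallel lines
    x = (c1 * b2 - c2 * b1) / det          # Cramer's rule
    y = (a1 * c2 - a2 * c1) / det
    return x, y


print(intersect_lines((1, 1, 3), (1, -1, 1)))    # the point (2, 1)
print(intersect_lines((1, 2, 3), (2, 4, 7)))     # None: parallel lines
```

If the six coefficients lie in a subfield $K$ of Re, then so do the coordinates of the intersection point.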
Similarly to find the intersections of a line $\mathcal{L} = \{(x, y) : a_1x + b_1y = c_1\}$
and a circle $\mathcal{C} = \{(x, y) : (x - a_2)^2 + (y - b_2)^2 = c_2^2\}$, we seek to solve
simultaneously
By subtracting one equation from the other, this problem can be reduced
to one of solving simultaneously either one of these equations with a linear
equation, namely
$(2a_1 - 2a_2)x + (2b_1 - 2b_2)y = (c_2^2 - c_1^2) + (a_1^2 - a_2^2) + (b_1^2 - b_2^2)$
(for circles with equations $(x - a_i)^2 + (y - b_i)^2 = c_i^2$, $i = 1, 2$).
If the circles have the same center they will be disjoint or identical.
Otherwise $(a_1, b_1) \neq (a_2, b_2)$ and we see again that there are at most two
solutions for the intersection of two circles, any one of which can be expressed
by rational operations and real square roots from the coefficients of the equations of the given circles.
Any ruler and compass construction starts from a finite set of initial
data, which consist of specific points, lines, and circles. The construction
proceeds in a finite number of steps to produce new points, lines, and
circles. These can be found only by means of one of the following operations:
(9:3-7) (a) Given distinct points $P_1$, $P_2$ we can construct the unique
line $\mathcal{L}$ on which these lie.
(b) Given a point $P_1$ and distinct points $Q_1$, $Q_2$, we can construct
the circle $\mathcal{C}$ with center $P_1$, any point $P_2$ of which has the
same distance from $P_1$ as $Q_2$ has from $Q_1$.
(c) Given distinct lines $\mathcal{L}_1$, $\mathcal{L}_2$, we can construct the point of
intersection, if there is any.
(d) Given a line $\mathcal{L}$ and a circle $\mathcal{C}$, we can construct the point or
points of intersection, if there are any.
(e) Given distinct circles $\mathcal{C}_1$, $\mathcal{C}_2$, we can construct the point or
points of intersection, if there are any.
done for the constructions (9:3-7c, d and e). For the construction (9:3-7a)
we are given $x_1$, $y_1$, $x_2$, $y_2$ with $(x_1, y_1) \neq (x_2, y_2)$ and we want an equation of the line through $(x_1, y_1)$ and $(x_2, y_2)$. As is well known, this equation is $x = x_1$, for the case that $x_2 = x_1$, and is $(y - y_1)/(x - x_1) = (y_2 - y_1)/(x_2 - x_1)$ for the case that $x_2 \neq x_1$. In either case this can
be brought to the form $ax + by = c$, where $a$, $b$, $c$ are obtained by rational
operations on $x_1$, $y_1$, $x_2$, $y_2$. For (9:3-7b) we are given $a$, $b$, $x_1$, $y_1$, $x_2$, $y_2$
and the desired equation is $(x - a)^2 + (y - b)^2 = c^2$ where
(9:3-8) Suppose that the numbers or coefficients for the initial points,
lines, and circles of a construction lie in a certain set $A$ of real
numbers. Let $K = \mathrm{Gen}(A)$. Suppose that a point $(x, y)$ is
constructed by ruler and compass from the original data. Then
there exist subfields $K_1 \subseteq K_2 \subseteq \cdots \subseteq K_{q+1} = L$ of Re such
that (i) $K_1 = K$, (ii) for each $j = 1, \ldots, q$, $K_{j+1} = K_j(\sqrt{u_j})$
for some $u_j \in K_j$ with $u_j > 0$, and (iii) $x, y \in L$.
For if we can show that each point $(x, 0)$ for $x \in L$ and each point $(0, y)$
for $y \in L$ can be so constructed, we can find $(x, y)$ as the intersection of
the vertical through $(x, 0)$ and the horizontal through $(0, y)$. We shall
indicate why this works for the points $(x, 0)$ with $x \in L$, the proof for the
points $(0, y)$ being similar. It is sufficient to show that for each $x \in K$
this is so, which is trivial, and that the set of $x$ with $(x, 0)$ so constructible
is closed under $+$, $-$, $\cdot$, and $^{-1}$, as well as the operation of taking square
roots of positive numbers. For $+$, $-$ this is standard. For $x_1 \cdot x_2$, we can
restrict ourselves to the case $x_1 > 0$, $x_2 > 0$. Then we use the following
figure:
Here $P$ is the point $(x, 0)$, $A$ is a point whose distance from $P$ is the same
as that of $(1, 0)$ from $(0, 0)$, $Q$ is the point of bisection of $OA$, $\mathcal{C}$ is the
circle with center $Q$ through $A$, and $BP$ is drawn perpendicular to $OA$
through $P$. Then $OBA$ is a right triangle, and it is seen that triangles
$OPB$ and $BPA$ are similar. Hence corresponding proportions are equal,
$x/w = w/1$, so that $w^2 = x$ and $w = \sqrt{x}$. Then by constructing $B'$ on
the positive $x$-axis with distance to $O$ the same as that of $B$ to $P$, we obtain
$(\sqrt{x}, 0)$.
The algebraic equivalent of constructibility.
Since we have not given a completely explicit description of geometry,
the foregoing can only be taken as a sketch of what to expect on the basis
of such a description. However, we believe the reader should accept the
following nongeometric definition as providing an adequate equivalent,
on the basis of the foregoing, of the notion of construction by ruler and
compass in the cartesian plane.
and
(iv) $x, y \in K_{q+1}$.
Then by 8.9,
Vv^fTT2 + 8j + tf - Sj
(±) 2 r 2
Now let
(4) if $u + iv$ is in $K_l$
then
$u, v \in L_{3(l-1)+2}$ for $l = 1, \ldots, q + 1$.
Since always
this shows that the sequence of subfields $L_1, L_2, \ldots, L_{3q+2}$ has the form,
$L_1 = \mathrm{Gen}(A)$, $L_{k+1} = L_k(\sqrt{r_k})$ for $k = 1, \ldots, 3q + 1$, with $r_k \in L_k$ and
$r_k \ge 0$. Since we can delete all repetitions in this sequence with $r_k = 0$,
we thus have a sequence of subfields of Re satisfying the conditions
(i)-(iii) of 9.29. But by (4), since $w = x + iy$ is in $K_{q+1}$, we have $x, y \in L_{3q+2}$ and hence $(x, y)$ is constructible from $A$.
From the algebraic point of view it is often more convenient to deal with
C-constructibility of numbers $x + iy$ than with constructibility of "points"
$(x, y)$. We can now derive a simple algebraic criterion for C-constructibility, which is the main tool in settling the classical construction problems.
Note that there is no loss of generality in starting here with an arbitrary
subfield K of C, since C-constructibility from a set B is the same as that
from Gen (B).
Suppose that the given angle has radian measure $\theta$. The supposed construction will end with the following figure.
Figure 9.4
But then by 9.32, the primitive $n$th root of unity $\zeta_n = \cos 2\pi/n + i \sin 2\pi/n$ will be C-constructible. Let $q_n(\xi)$ be the unique monic polynomial prime in $\mathrm{Ra}[\xi]$ of which $\zeta_n$ is a root. Thus we are led to finding
this polynomial and its degree, in order to compute $[\mathrm{Ra}(\zeta_n) : \mathrm{Ra}] = \varphi(n)$.
It can be shown that $\varphi(n)$ is the number of integers $m$ with $1 \le m \le n$
and $(m, n) = 1$, that is, $m$ relatively prime to $n$ ($\varphi$ is often called Euler's
function). Furthermore, a general computation of $\varphi(n)$ is available in
terms of the prime power representation of $n$:
(We exclude $2^{m_0 - 1}$ here if $m_0 = 0$.) Thus $[\mathrm{Ra}(\zeta_n) : \mathrm{Ra}]$ is a power of 2
if and only if $n$ has the form $n = 2^{m_0}p_1 \cdots p_l$, where $p_1, \ldots, p_l$ are distinct odd primes, for each of which $p_i - 1$ is a power of 2. This leads us
to consider those prime numbers $p$ which can be represented in the form
$p = 2^l + 1$. A further consideration shows that $l$ must itself be a power
of 2 in this case, and $p = 2^{2^k} + 1$. These numbers are called the Fermat
primes. For $k = 0, 1, 2, 3, 4$ we obtain as values of $2^{2^k} + 1$ the numbers
with Fermat primes $p_1, \ldots, p_l$, then a regular $n$-sided polygon cannot be
constructed by ruler and compass. In particular, for $n = 7, 9, 11, 13, 14, 18, 19, \ldots$ we cannot thus construct a regular $n$-sided polygon. On the
other hand, it has been known since the time of the Greek geometers how
to construct regular 3-sided (triangles), 5-sided (pentagons), and 15-sided
polygons, and hence polygons having $2^m \cdot 3$, $2^m \cdot 5$, and $2^m \cdot 3 \cdot 5$ sides,
for any $m$. In particular, we can construct regular $n$-sided polygons for
$n = 3, 4, 5, 6, 8, 10, 12, 15, 16$. What was unsuspected until the time of
Gauss was that a regular 17-sided polygon can also be constructed by ruler
and compass. In fact, Gauss showed that the above is a complete description; if $n = 2^m p_1 \cdots p_l$ ($m \ge 0$, $l \ge 0$) with Fermat primes $p_1, \ldots, p_l$,
then a regular $n$-sided polygon can be constructed by ruler and compass.
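The criterion can be tabulated for small $n$. The sketch below takes as given the statement of this section that a regular $n$-sided polygon is constructible exactly when $\varphi(n)$ is a power of 2; it computes $\varphi$ by direct counting (not by the prime power formula), and the function names are hypothetical.

```python
from math import gcd


def euler_phi(n):
    """Number of integers m with 1 <= m <= n and gcd(m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def is_power_of_two(k):
    return k >= 1 and k & (k - 1) == 0


def polygon_constructible(n):
    """Criterion quoted in the text: phi(n) is a power of 2."""
    return is_power_of_two(euler_phi(n))


print([n for n in range(3, 21) if polygon_constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]

print([2 ** (2 ** k) + 1 for k in range(5)])
# [3, 5, 17, 257, 65537]
```

The excluded values 7, 9, 11, 13, 14, 18, 19 are exactly those listed in the text as not constructible.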
A full exposition of these results would require a fair amount of addi¬
tional work beyond the material of this book. Thus we shall content
ourselves with an indication of how these results can be obtained for the
special case where n is just a prime number, which we now denote instead
by p. Further details for this are suggested in the Exercise Group 9.3.
Thus given a prime number, $p \ge 3$, and $\zeta = \cos 2\pi/p + i \sin 2\pi/p$,
the first task is to find the unique monic polynomial $q(\xi)$ which is prime
in $\mathrm{Ra}[\xi]$ and which has $\zeta$ as a root. We know that $\zeta$ is a root of $\xi^{p-1} + \xi^{p-2} + \cdots + \xi + 1$. It can be proved that this is the desired polynomial
(cf. Exercise 5 below). Hence $[\mathrm{Ra}(\zeta) : \mathrm{Ra}] = p - 1$ and $p - 1$ is, indeed, the number of integers $m$ with $1 \le m \le p$ and $(m, p) = 1$. Then
if we prove that every prime $p$ of the form $p = 2^l + 1$ must be a Fermat
prime, i.e., of the form $p = 2^{2^k} + 1$ (cf. Exercise 6 below), we reach the
first special conclusion (by 9.31 and 9.32) that if $p$ is prime and a regular
$p$-sided polygon can be constructed by ruler and compass, then $p$ is a
Fermat prime.
To prove the converse, we use the following result, established in
Exercise 8 below. Suppose that K is a subfield of C, $z_1$ is a root of a polynomial
$q(\xi) \in K[\xi]$, of which the other roots are $z_2, \ldots, z_n$. Suppose that
$K(z_i) = K(z_j)$ for every i, j and that $[K(z_1) : K] = 2^m$ for some $m \ge 0$.
Then $z_1$ is C-constructible from K. (This is a partial converse of 9.32.)
In particular, for the polynomial $g(\xi) = \xi^{p-1} + \xi^{p-2} + \cdots + \xi + 1$,
the roots are $\zeta, \zeta^2, \ldots, \zeta^{p-1}$. We know (by Exercise 12 of Exercise
Group 8.1) that $Ra(\zeta) = Ra(\zeta^i)$ for any $i = 1, \ldots, p - 1$. Hence we
can apply the preceding result directly to obtain: if p is a Fermat prime
then a regular p-sided polygon can be constructed by ruler and compass.
Given a Fermat prime we can, in principle, analyze the foregoing proofs
to obtain an actual construction of a regular p-sided polygon. In practice
this becomes somewhat involved. Gauss developed a systematic procedure
for obtaining the required constructions. Even this becomes quite
involved for larger values of p, but it was used successfully to give an explicit
9.3] APPLICATIONS TO GEOMETRIC CONSTRUCTION PROBLEMS 385
construction for the first interesting case, p = 17. The much simpler
case, p = 5, can be handled as follows. Here $\zeta = \cos 2\pi/5 + i \sin 2\pi/5$.
From $1 = \zeta^5$ we obtain $\zeta^{-1} = \zeta^4$ and $\zeta^{-2} = \zeta^3$. Hence
(8:3-11) (i) $K_1 = K$,
(ii) $z_j^{n_j} \in K_j$ for $j = 1, \ldots, q$,
(iii) $K_{j+1} = K_j(z_j)$ for $j = 1, \ldots, q$,
(iv) $w \in K_{q+1}$.
$$\lim_{n \to \infty} \frac{\pi(n)}{n/\log n} = 1$$
(where the logarithm is taken to the base e). This was first conjectured
by Gauss, but it was not finally proved until the end of the 19th century,
concluding with the work of Hadamard and de la Vallée Poussin.
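The limit can be observed numerically, although the convergence is slow. The sketch below assumes that $\pi(n)$ denotes the number of primes not exceeding n; the helper names `prime_count` and `pnt_ratio` are illustrative only.

```python
import math


def prime_count(limit: int) -> int:
    """Count the primes <= limit with a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return sum(sieve)


def pnt_ratio(n: int) -> float:
    """Return pi(n) / (n / log n), with the natural logarithm."""
    return prime_count(n) / (n / math.log(n))


if __name__ == "__main__":
    for exponent in range(3, 7):
        n = 10 ** exponent
        print(n, prime_count(n), pnt_ratio(n))
```

The printed ratios stay above 1 in this range and decrease slowly toward it, which is consistent with the statement but of course proves nothing about the limit.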
In recent years there has been some success in obtaining elementary,
i.e., nonanalytic, proofs of several such results. This has been done by
Erdős and Selberg for the prime number theorem by some rather difficult
arguments. However, the methods of analytic number theory are still
among the most powerful and penetrating for the solution of various
number-theoretical problems whose statements are quite elementary.
Thus the extensions of the basic number systems provide much more
than a merely formally satisfying edifice. The search for a satisfactory
solution of various elementary problems necessarily led to their successive
development, with rewards that could hardly have been expected initially.
As with Euclid’s systematization of the geometry of his time, the account
presented here of this development followed the discovery of most of the
significant results which it comprehends. This work of our century is part
of the new view which has been reached of mathematics as an integrated
whole. As with Euclid’s geometry, the reader should take it not as an
end but as a new beginning—in this case leading him into the study of the
surprising interrelationships between the various branches of mathematics
which have been uncovered under this modern view.
APPENDIXES
APPENDIX I
The basic notions of the theory are two relations between sets, that of
identity or equality, $=$, and membership, $\in$. As usual we denote nonequality
by $\ne$ and nonmembership by $\notin$. The basic axioms concerning identity and
its relationships to membership are the following.
The first four of these treat the logical aspects of identity, for which corresponding
statements would hold in any mathematical context. Axiom 5,
however, is specifically set-theoretical. It is called the axiom of extensionality
and shows that a set is completely determined by its members
[cf. (2:1-23)].
We wish now to give some axioms of set-existence. To begin with, we
have an axiom guaranteeing the existence of at least one set. (In most
treatments of logic this would be logically derivable.) The simplest such
set is the empty set.
In general, the notion “the unique x such that $\mathcal{A}(x)$” can be introduced
whenever $\mathcal{A}(x)$ is a condition for which we have proved that (i) there
exists an x such that $\mathcal{A}(x)$, and (ii) for any x, y if $\mathcal{A}(x)$ and $\mathcal{A}(y)$ then $x = y$.
Next we have an axiom allowing the construction of unordered pairs.
Axiom 7. For any a, b there exists an A such that for all x, $x \in A$ if and only
if $x = a$ or $x = b$.
Axiom 8. For any M there exists an S such that for all x, $x \in S$ if and only
if for some $X \in M$ we have $x \in X$.
Definition 4. For any M, $\bigcup X[X \in M]$ is the unique S satisfying the condition
of Axiom 8. In particular, we take
The next axiom is that for the set of all subsets of A, often called the
power-set of A.
Axiom 9. For any A there exists an S such that for all X, $X \in S$ if and only
if $X \subseteq A$.
Axiom 10. For any S there exists an A such that for all x, $x \in A$ if and only
if $x \in S$ and $\mathcal{A}(x)$.
Definition 7. For any S, $\{x : x \in S \text{ and } \mathcal{A}(x)\}$ is the unique set A satisfying
the condition of Axiom 10.
We shall now sketch how one can prove the existence of intersection,
difference, and cartesian product from these axioms. First note that if
we can define $\bigcap X[X \in M]$ and if $M \ne 0$, then for any $S \in M$ we have
$\bigcap X[X \in M] \subseteq S$. This suggests the following.
Thus here we are using Axioms 7-9 and Axiom 10 with $S = \mathcal{P}(\mathcal{P}(A \cup B))$.
It can then be proved that for any $a \in A$ and $b \in B$ we have $(a, b) \in A \times B$.
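For finite sets this construction can be imitated directly. The sketch below assumes the usual encoding of the ordered pair as $\{\{a\}, \{a, b\}\}$, which is consistent with the remark that $\{a, b\}$ is a member of $(a, b)$ but is not spelled out in this passage; finite sets are modelled by Python `frozenset` values, and all function names are illustrative.

```python
from itertools import combinations


def ordered_pair(a, b):
    """Ordered pair encoded as {{a}, {a, b}} (assumed encoding)."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def power_set(s):
    """The set of all subsets of s, in the spirit of Axiom 9."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def cartesian_product(A, B):
    """Carve A x B out of P(P(A u B)) by a separation condition, as in Axiom 10."""
    big = power_set(power_set(A | B))
    return frozenset(
        x for x in big if any(x == ordered_pair(a, b) for a in A for b in B)
    )


if __name__ == "__main__":
    A, B = frozenset({1, 2}), frozenset({2, 3})
    print(len(cartesian_product(A, B)))
```

The double power set grows very quickly, so this is only practical when the union of A and B has three or four elements; it is meant to mirror the existence argument, not to be an efficient way of forming products.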
Also note how the existence of domain and range (2:3-15), (2:3-16) can be
realized. Each element a of $\mathcal{D}(W)$ is a member of some $\{a, b\}$, which is in
turn a member of the element $(a, b)$ of W. In other words, if we put
$W_1 = \bigcup X[X \in W]$ and $S = \bigcup Y[Y \in W_1]$, we should have $\mathcal{D}(W) \subseteq S$,
and similarly $\mathcal{R}(W) \subseteq S$. In fact, if we take
$\mathcal{D}(W) = \{x : x \in \bigcup Y[Y \in \bigcup X[X \in W]] \text{ and for some } y,\ (x, y) \in W\}$,
we are guaranteed the existence of $\mathcal{D}(W)$ by Axioms 7, 8, and 10, and we can then prove that
SOME AXIOMS FOR SET THEORY 395
Definition 11. Let S be any set satisfying the conditions (i), (ii) of Axiom
11. Let $M = \{X : X \in \mathcal{P}(S) \text{ and } 0 \in X \text{ and whenever } x \in X \text{ then } \{x\} \in X\}$.
Let $P = \bigcap X[X \in M]$.
Axiom 12. Suppose that M is such that for all $X \in M$, $X \ne 0$, and for
all $X, Y \in M$, if $X \ne Y$ then $X \cap Y = 0$. Then there exists
an A such that for each $X \in M$, there is a y with $A \cap X = \{y\}$.
Axiom 12'. Suppose M is such that for all $X \in M$, $X \ne 0$, and for all
$X, Y \in M$, if $X \ne Y$ then $X \cap Y = 0$. Then there exists
a function F with $\mathcal{D}(F) = M$ and such that for each $X \in M$
we have $F(X) \in X$.
Axiom 12". For any M there exists a function G such that $\mathcal{D}(G) =
\{X : X \in M \text{ and } X \ne 0\}$ and such that whenever $X \in \mathcal{D}(G)$
then $G(X) \in X$.
To see that Axioms 12, 12', 12" are equivalent under the preceding axioms
(in fact under 1-10), suppose first that 12 holds. Given M satisfying the
hypothesis of 12' it also satisfies that of 12. Hence we can find a set A
satisfying the conclusion of 12. Now put $F = \{(X, y) : X \in M \text{ and } y \in A \cap X\}$.
The existence of F is seen from Axiom 10 since it is a subset
of $M \times A$ and we have already seen how the existence of this set
can be proved. Clearly F is the desired function to satisfy the conclusion
of Axiom 12'. Suppose that 12' is true. Consider M satisfying the hypothesis
of 12". Its elements are not necessarily disjoint or nonempty.
Let $M_1 = \{X : X \in M \text{ and } X \ne 0\}$. This exists by Axiom 10. Now we
“disjoint” the elements of $M_1$. Let $M_2 = \{X \times \{X\} : X \in M_1\}$; this
exists since $M_2 \subseteq \mathcal{P}(\bigcup X[X \in M_1] \times M_1)$. Then $M_2$ satisfies the hypothesis of 12'.
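The “disjointing” step can be illustrated for a finite family. In the sketch below, Python tuples `(x, X)` stand in for the ordered pairs in $X \times \{X\}$, and `choice_from_disjoint` is a hypothetical stand-in for the function supplied by Axiom 12'; for finite sets such a function can of course be written down without any axiom of choice.

```python
def disjointify(M):
    """Map each nonempty X in M to X x {X}, the set of tagged pairs (x, X)."""
    M1 = frozenset(X for X in M if X)  # discard the empty set, as in M_1
    return {X: frozenset((x, X) for x in X) for X in M1}


def choice_from_disjoint(M2):
    """Stand-in for Axiom 12': pick one element from each member of M2."""
    return {Y: next(iter(Y)) for Y in M2}


def choice_function(M):
    """Build G as in Axiom 12'': G(X) is in X for every nonempty X in M."""
    tagged = disjointify(M)
    F = choice_from_disjoint(frozenset(tagged.values()))
    # F picks a pair (x, X) from X x {X}; its first coordinate is G(X).
    return {X: F[tagged[X]][0] for X in tagged}


if __name__ == "__main__":
    M = frozenset({frozenset({1, 2}), frozenset({2, 3}), frozenset()})
    print(choice_function(M))
```

The members of the original family overlap in the element 2, but the tagged copies cannot overlap, because each pair records the set it came from.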
In radian measure, the angle $\angle NOP$ should have the same value as twice
the area of the sector NOP. The latter is seen to be the area of the triangle
OQP, where Q is (x, 0), together with the area under the circle and above
the x-axis between Q and N. This leads us by the calculus to define the
function A, with domain $\{x : -1 \le x \le 1\}$, by the following condition:
(1) $A(x) = x\sqrt{1 - x^2} + 2\int_x^1 \sqrt{1 - t^2}\, dt$.
When $-1 \le x < 0$, the quantity $\int_x^1 \sqrt{1 - t^2}\, dt$ gives the total area
under the circle and above the x-axis between Q and N, while the quantity
$\tfrac{1}{2} x\sqrt{1 - x^2}$ gives the negative of the area of the triangle OQP, so (1)
still gives A(x) correctly as twice the area of the sector NOP.
Taking (1) as the starting point, we can prove the following:
(2) (a) The derivative $A'(x)$ is defined for $-1 < x < 1$ and $A'(x) = -1/\sqrt{1 - x^2}$ there;
(b) A is continuous and decreasing for $-1 \le x \le 1$;
(c) $A(1) = 0$ and $A(-1) = \pi$ where $\pi = 2\int_{-1}^{1} \sqrt{1 - t^2}\, dt$;
(d) $A(-x) = \pi - A(x)$ and $A(0) = \pi/2$;
(e) if $0 \le \theta \le \pi$ then there is a unique x with $-1 \le x \le 1$
and $A(x) = \theta$; further, $0 \le x \le 1$ if $0 \le \theta \le \pi/2$.
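Properties (2c) and (2d) can be checked numerically. The sketch below assumes that condition (1) has the form $A(x) = x\sqrt{1 - x^2} + 2\int_x^1 \sqrt{1 - t^2}\, dt$, which is what the description of twice the sector area suggests, and approximates the integral by a midpoint rule; the step count and tolerances are arbitrary choices. By (2a) and (2c), A has the same derivative and the same value at 1 as the arccosine, so `math.acos` is used only as an independent comparison.

```python
import math


def A(x: float, steps: int = 20000) -> float:
    """Assumed form of (1): x*sqrt(1 - x^2) + 2 * integral from x to 1 of sqrt(1 - t^2)."""
    h = (1.0 - x) / steps
    # Midpoint rule; max(0.0, .) guards against tiny negative rounding errors.
    integral = h * sum(
        math.sqrt(max(0.0, 1.0 - (x + (i + 0.5) * h) ** 2)) for i in range(steps)
    )
    return x * math.sqrt(max(0.0, 1.0 - x * x)) + 2.0 * integral


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(x, A(x), math.acos(x))
```

The two printed columns agree to several decimal places, with A(0) close to $\pi/2$ and A(-1) close to $\pi$.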
The continuity of A and the formula for $A'(x)$ are seen from the representation
$A(x) = x\sqrt{1 - x^2} + 2\int_0^1 \sqrt{1 - t^2}\, dt - 2\int_0^x \sqrt{1 - t^2}\, dt$,
the standard method for differentiating $x\sqrt{1 - x^2}$, and the fundamental
theorem of calculus, by which the derivative of $\int_0^x \sqrt{1 - t^2}\, dt$ is
$\sqrt{1 - x^2}$. By (a), we have $A'(x) < 0$ for $-1 < x < 1$, and hence A
is decreasing for $-1 \le x \le 1$. Parts (c)-(e) are straightforward; in (e)
we can use Weierstrass' Nullstellensatz 7.48 to find at least one x with
$A(x) = \theta$ and $-1 \le x \le 1$, given that $0 \le \theta \le \pi$; similarly if
The next step, on the basis of (2e), is to define functions C and S by:
(3) (a) $C(\theta) = x$ and $S(\theta) = \sqrt{1 - x^2}$ whenever $0 \le \theta \le \pi$ and
x is the unique number such that $-1 \le x \le 1$ and $A(x) = \theta$;
(b) $C(-\theta) = C(\theta)$ and $S(-\theta) = -S(\theta)$ whenever $-\pi < \theta < 0$;
(c) $C(\theta + 2n\pi) = C(\theta)$ and $S(\theta + 2n\pi) = S(\theta)$ for any $\theta$ and
$n \in I$.
Since for each $\theta$ there is a unique $n \in I$ with $-\pi < \theta + 2n\pi \le \pi$,
these conditions determine $C(\theta)$, $S(\theta)$ in a definite way for all $\theta$. We then
have from (2c, d) and the preceding definition:
These are a little easier to obtain from (1)-(4), but an analytic verification
without appeal to intuitive geometrical considerations is still fairly
troublesome. We have here to speak of the angles of a triangle.
for any $(x_1, y_1)$, $(x_2, y_2)$. What must be proved here is that given distinct
points $M(x_M, y_M)$ and $N(x_N, y_N)$ we can find a rigid motion F with
ANALYTICAL BASIS OF TRIGONOMETRIC FUNCTIONS 403
$F(x_M, y_M) = (0, 0)$ and $F(x_N, y_N) = (d, 0)$ with $d > 0$. We suppose
that this can be done.
Consider the special case where M is already the origin O and N, Q
are points on the unit circle with xN > xQ. The figure below shows the
given triangle and its transformed version.
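In this case the image of N is $N'(1, 0)$ and that of Q is some $P(x, y)$ on the
unit circle, since both OQ, ON have unit length. Let $\theta_1 = A(x_N)$,
$\theta_2 = A(x_Q)$, and let $\theta$ equal the angle $\angle NOQ$ as determined by the rigid
motion, hence $\theta = A(x)$. Thus, $x_N = C(\theta_1)$, $y_N = S(\theta_1)$, $x_Q = C(\theta_2)$,
$y_Q = S(\theta_2)$, $x = C(\theta)$, $y = S(\theta)$. It can be shown that rigid motions
also preserve areas. The area of the sector ONQ is $\tfrac{1}{2}(\theta_2 - \theta_1)$ and that
of ON'P is $\tfrac{1}{2}\theta$; hence $\theta = \theta_2 - \theta_1$. Now the square of the distance of N'
to P is $(x - 1)^2 + y^2 = 2 - 2C(\theta)$, while the square of the distance of N to Q is
$(x_Q - x_N)^2 + (y_Q - y_N)^2 = 2 - 2(C(\theta_2)C(\theta_1) + S(\theta_2)S(\theta_1))$.
Thus, by comparison, $C(\theta_2 - \theta_1) = C(\theta_2)C(\theta_1) + S(\theta_2)S(\theta_1)$;
that is, we have (5a) for this special case of $\theta_1$, $\theta_2$. Now (5a, b) can be
proved in general by a systematic use of (2)-(4).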
To conclude this approach, the continuity of the functions C, S can be
obtained by using a general theorem of the differential calculus, according
to which if F is a function on Re and $F'(a)$ is defined then F is continuous
at a. In this case we show that $C'(\theta)$, $S'(\theta)$ are defined for every $\theta$, and
$$S'(\theta) = \lim_{h \to 0} \frac{S(\theta + h) - S(\theta)}{h}$$
The desired result (6) is then seen from the special results
The area given by h is equal to twice the area of the sector ONP, which
lies between twice the areas of the triangles OMP and ONQ, hence
$C(h)S(h) < h < S(h)/C(h)$.
Then $C(h) < h/S(h) < 1/C(h)$ and we get $\lim_{h \to 0} h/S(h) = 1$ from
$\lim_{h \to 0} C(h) = 1$. The second part of (7) is obtained by writing
text cited above. A related approach can be taken based on the definition
of angle in terms of arc length rather than area; we shall not pursue this.
The second approach to the trigonometric functions is more sophisticated
and formal, but the verification of the basic properties is also
smoother going. It is based on the power series representations of these
functions. We follow here the treatment given by Rudin in his book,
Principles of Mathematical Analysis, pp. 150-152.
What lies behind the basic definitions taken here is that if F is a function
on Re such that the nth derivative $F^{(n)}$ (taking $F^{(0)} = F$) is defined
on Re for each n then we have
$$F(x) = \sum_{n=0}^{\infty} \frac{F^{(n)}(0)}{n!}\, x^n$$
for any x. With this in mind, we can take the following as definitions of
the functions C, S:
the functions C, S:
That these functions are defined for all $\theta$ can be proved from 7.41. Further,
we see immediately from (8) that $C(0) = 1$, $S(0) = 0$, $C(-\theta) = C(\theta)$
and $S(-\theta) = -S(\theta)$ for any $\theta$.
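These properties are easy to observe on partial sums. The sketch below assumes that (8) consists of the standard series $C(\theta) = \sum_{n \ge 0} (-1)^n \theta^{2n}/(2n)!$ and $S(\theta) = \sum_{n \ge 0} (-1)^n \theta^{2n+1}/(2n+1)!$, which agrees with the coefficients that survive in the text; the number of terms is an arbitrary choice, and `math.cos` and `math.sin` serve only as a comparison.

```python
import math


def C(theta: float, terms: int = 30) -> float:
    """Partial sum of the assumed series for C: sum of (-1)^n theta^(2n) / (2n)!."""
    return sum(
        (-1) ** n * theta ** (2 * n) / math.factorial(2 * n) for n in range(terms)
    )


def S(theta: float, terms: int = 30) -> float:
    """Partial sum of the assumed series for S: sum of (-1)^n theta^(2n+1) / (2n+1)!."""
    return sum(
        (-1) ** n * theta ** (2 * n + 1) / math.factorial(2 * n + 1)
        for n in range(terms)
    )


if __name__ == "__main__":
    for theta in (0.0, 0.7, 1.2, 2.0):
        print(theta, C(theta) - math.cos(theta), S(theta) - math.sin(theta))
```

For moderate values of $\theta$ thirty terms are far more than enough; for large $|\theta|$ the partial sums suffer from cancellation, so the sketch is not a substitute for a library routine.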
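Now it is shown in analysis that if a function F is defined by
$F(x) = \sum_{n=0}^{\infty} a_n x^n$ for all x, then F is continuous on Re, all derivatives $F^{(k)}$ are
defined for all x, and we have
$$F^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, a_n x^{n-k}.$$
In particular,
$$C'(\theta) = \sum_{n=0}^{\infty} \frac{(-1)^{n+1}}{(2n+1)!}\, \theta^{2n+1} = -S(\theta),$$
and $S'(\theta) = C(\theta)$ (by a similar computation).
Consider any fixed real number $\varphi$ and let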
BIBLIOGRAPHY
Logic:
Quine, W. V., Mathematical Logic (rev. ed.). Cambridge: Harvard University
Press, 1951.
Suppes, P., Introduction to Logic. Princeton: Van Nostrand, 1957.
Set Theory:
Kamke, E., Theory of Sets. F. Bagemihl, translator. New York: Dover, 1950.
Suppes, P., Axiomatic Set Theory. Princeton: Van Nostrand, 1960.
Algebra:
Birkhoff, G. and MacLane, S., A Survey of Modern Algebra (rev. ed.).
New York: Macmillan, 1953.
Paige, L. J. and Swift, J. D., Elements of Linear Algebra. Boston: Ginn, 1961.
Van der Waerden, B. L., Modern Algebra, Vols. I and II. F. Blum, translator.
New York: Ungar, 1949.
Number Theory:
LeVeque, W. J., Topics in Number Theory, Vols. I and II. Reading: Addison-Wesley,
1956.
Pollard, H., The Theory of Algebraic Numbers (Carus Mathematical Monographs,
Number 9). New York: Wiley, 1950.
Analysis:
Apostol, T. M., Mathematical Analysis. Reading: Addison-Wesley, 1957.
Hille, E., Analytic Function Theory, Vol. I. Boston: Ginn, 1959.
Rudin, W., Principles of Mathematical Analysis. New York: McGraw-Hill,
1953.
INDEX