
SOLOMON FEFERMAN


This book is in the
ADDISON-WESLEY SERIES IN MATHEMATICS

Lynn Loomis, Consulting Editor


THE

NUMBER SYSTEMS
Foundations of Algebra
and Analysis

by

SOLOMON FEFERMAN
Department of Mathematics
Stanford University

ADDISON-WESLEY PUBLISHING COMPANY, INC.


READING, MASS. • PALO ALTO • LONDON
Copyright © 1964
ADDISON-WESLEY PUBLISHING COMPANY, INC.

Printed in the United States of America

ALL RIGHTS RESERVED. THIS BOOK, OR PARTS THEREOF, MAY NOT BE
REPRODUCED IN ANY FORM WITHOUT WRITTEN PERMISSION OF THE PUBLISHERS.

Library of Congress Catalog Card No. 63-12470

Dedicated to my mother and father.

PREFACE

The subject matter of this book is the successive construction and
development of the basic number systems of mathematics, namely the
positive integers, integers, rational numbers, real numbers, and complex
numbers. It is a subject that many mathematicians feel should be learned
by every serious student in this field. Preferably, he should do this as
soon as possible after his first course in mathematical analysis (calculus)—
either before or during his introduction to more rigorous treatments of
analysis and algebra.
Despite the significance of this subject in a mathematical education,
there does not seem to be any special provision for its study in most
American universities. Sometimes a hasty review of the material is given
in intermediate courses on algebra or analysis. Another approach often
taken in these courses is to begin with the real number system as
axiomatically given, rather than to develop its properties from more basic
notions and results.
We believe this situation has come about for several reasons. First of
all, the (now) classical presentations of this material have a curious
isolation from the rest of mathematics. The ideas and methods employed
seem to have a “once only” character and lack the sense of
interrelatedness of most other important mathematical concepts. Second, the rate
at which knowledge is increasing makes it imperative that the student
of mathematics hasten his mastery of the main parts of his field. Finally,
and in tune with the “abstractness” of modern mathematics, there is a
growing tendency to present all its parts axiomatically.
As a result of these circumstances, there is often a gap in the student’s
education between his “concrete” computational work in the calculus and
his more advanced work. It is true that modern abstract analysis and
algebra have developed as the proper means to encompass, and then to
advance beyond, the particular notions and results concerning the classical
number systems uncovered before this century. However, a firm grasp of
the significant particular cases provides the best basis for an appreciation
of the newer developments.
It thus seems to us that the subject of this book provides the most
appropriate material for this transition period in the student’s
mathematical education. We have tried to give here a presentation which is on
the one hand up to date, complete, and rigorous, and on the other hand
constantly motivated with reference to both the student’s background and
the needs of modern mathematics.

We believe the approach taken here makes the text adaptable to a
variety of teaching situations. It can be used for a one-quarter or a
one-semester course specifically set aside for this material and demanding no
prerequisites at this level. It can be the text for the first part of a
conventional intermediate course in algebra or analysis, with certain sections
omitted or merely sketched, according to the specifications of the course
and the taste of the instructor. It might also be used as a basic reference
work for such a course or as the text for a reading course, which the student
would master by independent study. It is with this last possibility
especially in mind that we have chosen to make this book self-contained and
to pursue clarity and completeness, rather than conciseness.
S. F.
Stanford, California
October 1963
CONTENTS

Chapter 1 The Logical Background

1.1 Introduction
The mathematical method

1.2 Logic
Mathematical statements and their structure • Existence • Logical
connectives

Chapter 2 The Set-Theoretical Background

2.1 Sets
Sets as abstractions from conditions • Extensions of the concept of
set • Identity and inclusion • Some peculiar sets

2.2 An algebra of sets
Intersection, union, and complement • Basic laws of the algebra of
sets • Extended intersections and unions

2.3 Relations and functions
Relations as abstractions from conditions • Ordered pairs and
cartesian products • Domain, range, and converse • Ternary (etc.)
relations • Operations on relations; composition • Special kinds of
relations • Equivalence relations and partitions • Functions •
Congruence relations • Converse and composition of functions

2.4 Mathematical systems of relations and functions
Isomorphism • Set-theoretical equivalence • Subsystems

Chapter 3 The Positive Integers

3.1 Basic properties
Peano systems and inductive proofs • Functions on Peano systems •
Isomorphism of Peano systems

3.2 The arithmetic of positive integers
Recursive definitions • Addition of positive integers •
Multiplication of positive integers • Exponentiation and other
operations

3.3 Order
Simply ordered systems • Well-ordered systems • Ordering and the
arithmetical operations

3.4 Sequences, sums and products
Finite and infinite sequences • Extended sums and products •
Generalized associative and commutative laws • Some special sums and
products

Chapter 4 The Integers and Integral Domains

4.1 Toward extending the positive integers
Practical motivations • Algebraic motivations • Commutative rings
with unity

4.2 Integral domains
Ordered integral domains • Absolute value

4.3 Construction and characterization of the integers
The existence theorem • Uniqueness of the characterization

4.4 The integers as an indexing system
More general associative and commutative laws • Geometric series;
binomial expansion

4.5 Mathematical properties of the integers
The division algorithm • The divisibility relation and the primes •
Greatest common divisors • Factorization of integers into primes •
Positional notations for integers

4.6 Congruence relations in the integers
Homomorphism • Properties preserved under homomorphism • Congruence
modulo an integer • Applications to a Diophantine problem

Chapter 5 Polynomials

5.1 Polynomial functions and polynomial forms
Existence and uniqueness of simple transcendental extensions •
Divisibility and roots of polynomials • Formal derivatives

5.2 Polynomials in several variables
k-fold transcendental extensions • Symmetric polynomials • The
fundamental theorem on symmetric polynomials

Chapter 6 The Rational Numbers and Fields

6.1 Toward extending integral domains
Algebraic motivations • Geometric motivations • Fields • Ordered
fields; dense orderings • Some finite fields

6.2 Fields of quotients
The existence theorem • Isomorphism of fields of quotients • The
rational numbers; fields of rational forms

6.3 Solutions of algebraic equations in fields
Systems of linear equations • Linear equations in integral domains •
Polynomial equations in the rationals

6.4 Polynomials over a field
Basic properties of divisibility • Prime polynomials • The division
algorithm for polynomials • Greatest common divisors • Unique
factorization theorem for polynomials

Chapter 7 The Real Numbers

7.1 Toward extending the rationals
Algebraic motivations • Geometric motivations • Upper and lower
sections; continuously ordered systems • Existence of continuously
ordered systems • Greatest lower bounds and least upper bounds

7.2 Continuously ordered fields
The Archimedean property • Isomorphism of continuously ordered
fields • Fundamental sequences • The Bolzano-Weierstrass Theorem •
Construction of a continuously ordered field

7.3 Infinite series and representations of real numbers
Positional notations for real numbers • Power series • The
exponential function

7.4 Polynomials and continuous functions on the real numbers
Weierstrass’ Nullstellensatz • Real polynomials and their roots •
Computations of roots • Location of all roots: Sturm’s theorem •
Rational and real powers of real numbers

7.5 Algebraic and transcendental numbers
Cantor’s method • Denumerable and nondenumerable sets • The existence
of transcendental real numbers • Liouville’s method

Chapter 8 The Complex Numbers

8.1 Basic properties
Characterization of the complex numbers • Complex conjugates • Square
roots of complex numbers • A geometric interpretation • Absolute
value • Basic properties of the trigonometric functions • The
trigonometric representation; De Moivre’s theorem

8.2 Polynomials and continuous functions in the complex numbers
Limits and the Bolzano-Weierstrass theorem extended • Continuity
extended • Polynomial functions; growth and minimum of the modulus •
The fundamental theorem of complex algebra • On computing roots of
complex polynomials • Decomposition of real polynomials

8.3 Roots of complex polynomials
Roots of polynomials over a subfield • Algebraically closed
subfields • Multiple roots; discriminants • Roots of cubic
equations • Roots of fourth degree equations • On equations of higher
degree

Chapter 9 Algebraic Number Fields and Field Extensions

9.1 Generation of subfields
The general extension process • Simple extensions • Simple
transcendental extensions • Simple algebraic extensions • Adjoining
roots to arbitrary fields

9.2 Algebraic extensions
Linearly generated extensions; bases and dimension • Finite field
extensions • Iterated finite extensions

9.3 Applications to geometric construction problems
Basic geometric notions • The realization in the cartesian plane •
Ruler and compass constructions • The algebraic equivalent of
constructibility • Some classical construction problems • Regular
polygons; Gauss’ solution

9.4 Conclusion

Appendix I Some Axioms for Set Theory

Appendix II The Analytical Basis of the Trigonometric Functions

Bibliography

Index
CHAPTER 1

THE LOGICAL BACKGROUND

1.1 Introduction. The basic number systems of mathematics are the
following:
(1) the collection P of positive integers 1, 2, 3, ...;
(2) the collection I of integers ..., −3, −2, −1, 0, 1, 2, 3, ...;
(3) the collection Ra of rational numbers, consisting of all fractions
a/b, where a, b are integers and $b \neq 0$ (such as 2/3, −8/7, 2/−4);
(4) the collection Re of real numbers consisting of the rational
numbers and of the irrational numbers (such as $\sqrt{2}$, $\pi$, $\sqrt{3}/\pi$,
$e^{\sqrt{2}}$);
(5) the collection C of complex numbers, consisting of the real
numbers and the imaginary numbers and their combinations (such as
$\sqrt{-1}$).
Historically, the understanding and use of these number systems have
evolved over a period of several thousand years, more or less in the order
presented. There are a number of persuasive reasons why this development
took place and why the development came to a certain completion with
the complex numbers. In this book, we shall give a systematic
exposition of the same growth of ideas. We hope at the same time to convince
the reader that there is nothing capricious in this evolution. The only
accidental aspect of the subject is the use of certain words such as
“rational,” “irrational,” “real,” “imaginary,” and “complex” (indicating the
initial resistance which the introduction of new numbers met at each
stage). The force of long usage prevents us from replacing these by more
appropriate words.
The student has learned to understand something of the nature of the
different kinds of numbers and to perform various arithmetical
computations with them in grade school. However, at a certain point his ideas
about these may become clouded. How do we know that $\sqrt{2}$ is not
rational? What do we mean when we say that $\pi$ is approximately 3.1416,
and even better that it is approximately 3.14159? We are accustomed to
thinking of real numbers
as measuring certain lengths, and of the product of real numbers, such
as $\sqrt{2} \cdot \pi$, as measuring the area of a rectangle, in this case with sides
of length $\sqrt{2}$ and $\pi$; on what basis do we assign another real number to
that area, i.e., give a length equivalent to that area? The student is now
at a point in his education where he can expect to get and is able to
assimilate clear (though not necessarily simple) answers to such questions.

One can go quite far on the basis of an uncritical use of the various
number systems. Much of the differential and integral calculus that we
know today, as well as the physical theories which are expressed in these
mathematical terms, was developed in just such a way. For example,
in the calculus we are asked to consider computations of infinite length,
such as

$$1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \cdots$$

In this particular case, we are easily convinced, on the basis of the formula

$$(1 + r)(1 - r + r^2 - r^3 + \cdots) = 1$$

(which is verified by “multiplying out”), that the result of the computation
should be $1/(1 + \frac{1}{3}) = \frac{3}{4}$. However, if we are uncritical we should avoid
asking what the result of

$$1 - 1 + 1 - 1 + \cdots$$

is. For if we apply our formula, we obtain as answer $1/(1 + 1) = \frac{1}{2}$,
while it is equally evident that the answer should be

$$(1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots = 0$$

and also that it should be

$$1 + (-1 + 1) + (-1 + 1) + \cdots = 1 + 0 + 0 + \cdots = 1.$$

It is not that the uncritical approach necessarily gives wrong answers, but
rather that there are certain questions for which it provides no coherent
answer at all.
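
The contrast between the two series can be made concrete by computing partial sums. What follows is only an illustrative sketch in Python, not part of the development in the text; the name `partial_sums` is our own choice, and exact rational arithmetic is assumed so that no rounding enters the picture.

```python
from fractions import Fraction


def partial_sums(r, n_terms):
    """Return the first n_terms partial sums of 1 - r + r^2 - r^3 + ..."""
    sums = []
    total = Fraction(0)
    term = Fraction(1)
    for _ in range(n_terms):
        total += term      # add the current term (-r)^k
        sums.append(total)
        term *= -r         # pass to the next term of the series
    return sums


third = partial_sums(Fraction(1, 3), 8)  # the series with r = 1/3
ones = partial_sums(Fraction(1), 8)      # the series 1 - 1 + 1 - 1 + ...

print([str(s) for s in third])
print([str(s) for s in ones])
```

For r = 1/3 the partial sums draw steadily closer to 3/4, while for r = 1 they alternate between 1 and 0 and settle on no value. This is one way of seeing why the formula gives a coherent answer in the first case and none in the second.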
In the study of Fourier series (which have many applications in
engineering and physics) in the latter half of the nineteenth century there
arose certain questions which could not be adequately answered on the
basis of an uncritical approach to the number systems and which, at the
same time, could not be avoided. In response, a number of workers in
mathematics and logic embarked on a critical program to clarify the
concepts which were involved. The result of their work gradually
resolved itself into a systematic theory which could be used to settle the
troublesome questions to the satisfaction of most mathematicians. An
understanding of this theory is now an essential prerequisite to the study
of modern mathematics.
We confine ourselves in this book to that part of this theory which has
most to do with the number systems themselves, and to those matters
in mathematics which are most directly related to the number systems.
Those which are closest at hand are first, in the field of algebra, the
determination of the potentialities and limitations on solving algebraic
equations in various settings, and second, in the field of analysis, the
development of the limit concept and of its basic properties. (For various
reasons the algebraic questions will receive somewhat heavier emphasis
in this book.) Although our book is subtitled “Foundations of algebra
and analysis,” it should not be thought that the subject can be
meaningfully separated into two stages, the first entirely occupied with the study
of the number systems and the second with the applications of this study
to the critical treatment of mathematics. For it is the demands of the
already informally understood concepts and results of algebra and analysis
which shape the particular development that is taken. To ignore this would
be to deliberately place our critical understanding at a disadvantage.
Thus our attempt throughout is to gain the advantages of both intuition
and rigor by intertwining motivation, precise development, and
appropriate applications.

The mathematical method. The objects with which mathematics deals,
such as numbers and geometrical figures, are abstract in nature and are
usually, in any given study, infinite in number. Although our ideas about
these objects are closely related to our perceptions of various groupings
of material objects, it is very rarely that we can settle a mathematical
question by direct appeal to reality. Thus a certain amount of
experimentation with pencil, paper, ruler, and compass may lead us to guess that the
medians of any triangle all meet in a single point, but no amount of
experimentation could verify that this statement is true of the infinitely
many conceivable triangles. The specifically mathematical method used
to settle such questions is as follows. Certain statements regarding the
objects we have in mind are regarded as evident, as being part and parcel
of our conceptions of the objects themselves; these statements are generally
called axioms or postulates. Once the axioms and basic concepts are
granted all else in mathematics is obtained by logical argument in which
new concepts, if they appear, are defined only in terms of earlier ones. Now
it is conceivable that someone is unwilling to grant a given group of
axioms. He may do this on the ground that he cannot conceive of any
objects to which the axioms correctly apply or on the ground that the
axioms do not correctly apply to the notions to which he thinks they are
intended to apply. There is no logical method by which such a person
can be persuaded to believe otherwise. To such a person, mathematics
(at least, as developed from that particular group of axioms) is a
contentless game; it may, nevertheless, be a game which he enjoys playing.
However, the true value and power of the mathematical method are that
it leads those who do grant the axioms, as expressing simple and
intuitively clear truths about certain objects, incontrovertibly to complicated
and often surprising truths about the same objects.
The student is no doubt familiar with this “axioms-definitions-theorems”
description of mathematical activity from his course in plane geometry;
he may have even been brought to the conclusion that such an approach
to mathematics is sterile and barren. Indeed, he is much more apt to be
convinced of a statement in geometry or calculus by a few diagrams of
“typical” cases, or by a manipulation with infinite series which appears
as if it ought to be right, than by a careful logical argument. But this
approach is strictly limited and can provide only a thin appreciation and
understanding of mathematics. Thus, in our presentation here, exact
definitions of concepts and careful arguments in proofs will be in the
forefront. This does not mean that intuition must be abandoned. On the
contrary, and in contrast to mechanical experimentation, the finding of a
correct proof often demands great ingenuity combined with intuitive
understanding. The student will have the opportunity to develop such
understanding both in following the proofs given here and in carrying
out proofs of his own.
For purposes of illustration of certain basic “pre-number” concepts we
will assume some familiarity with the number systems in this chapter
and the next. However, after that point we will proceed to carry out the
development proposed above with only the simplest prior assumptions
concerning numbers as a basis.

1.2 Logic. It is possible to give a completely exact description of the
notion of logical deduction; this has been accomplished in the last half
century in the field of mathematics devoted to symbolic or formal logic.
We do not assume that the reader is familiar with symbolic logic, nor
shall we attempt to describe this subject to him here, since logical
thinking in mathematics can be learned only by observation and experience.
(In fact, the ability to reason correctly and to understand correct
reasoning is itself a prerequisite to the study of formal logic.) Nevertheless,
there are logical aspects of our study of the number systems which are
worth approaching informally before we embark on our subject matter
proper.*

Mathematical statements and their structure. In mathematics we are
concerned solely with affirmative or declarative statements (also called
propositions) which must either be true or false. Thus such statements as
“Goldbach’s conjecture is probably true” or questions such as “Can every
map be colored with only four colors?” though they play an important
role in the doing of mathematics, are not part of mathematics proper.
When we use the word “statement” in the following, we have in mind only
affirmative statements.

* For the reader who is interested in finding out more about symbolic logic
we recommend the textbooks listed in the Bibliography.

The transition from arithmetic to more advanced subjects in mathematics
corresponds to the transition from particular statements, such as
12 + 7 = 19, 12 · 7 = 86 (the first of which is true, the second false),
to statements involving references to arbitrary objects of a certain kind.
The most economical means for formulating statements of the latter sort
is by the use of variables. These are certain letters, such as a, b, x, y, z,
m, n, etc., which in a given statement can refer to some or all of these objects.
Other symbols, such as 12, e, $\pi$, $\sqrt{2}$, etc., which are intended to refer to
certain fixed objects, are usually called constants.
The following expression,

(1:2-1) x + y = 0,

which involves variables, is not regarded as a statement, since it is neither
true nor false; it is called a condition (on x and y). However, it can be used
to form a statement in several ways. One way is to substitute constants
for the variables, e.g., 3 for x and —2 for y, thus yielding the particular
statement

(1:2-2) 3 + (-2) = 0,

which is, of course, false. Another way is provided by the use of the
words “all” (every, any) and “some” (there is, there exists). Some
examples of statements which can be formed from the condition (1:2-1)
using these words are:
(1:2-3) for all integers x and y, x + y = 0;
(1:2-4) for some integers x and y, x + y = 0;
(1:2-5) for any integer x there is an integer y such that x + y = 0;
(1:2-6) for any positive integer x there exists a positive integer y such
that x + y = 0.

Clearly, statement (1:2-3) is false and (1:2-4) is true (in particular,
0 + 0 = 0). (1:2-5) is true since for any integer x the integer −x is an
instance of an integer y which satisfies the condition; on the other hand,
(1:2-6) is false.
Consider now the condition

(1:2-7) x + y = 5

and the statements

(1:2-8) for any integer x there is an integer y such that x + y = 5,

and

(1:2-9) for any positive integer x there is a positive integer y such that
x + y = 5.

Again (1:2-8) is true since, given an integer x, the integer 5 − x is an
integer y for which the condition (1:2-7) is true. Concerning (1:2-9),
we see that there are some positive integers x satisfying the condition,

(1:2-10) there is a positive integer y such that x + y = 5,

namely the integers 1, 2, 3, and 4; however, (1:2-9) is still false, since the
condition (1:2-10) is not true of all positive integers x, in particular not
true of the number 5.
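
The distinction between a condition on x and a statement about all x can be imitated in a few lines of Python. This is a rough sketch only: the function name `exists_positive_y` and the sample range 1, ..., 10 are our own assumptions. A finite sample can refute a statement beginning “for any positive integer x” by producing a single counterexample, but it could never establish such a statement.

```python
def exists_positive_y(x, total=5):
    """Condition (1:2-10): there is a positive integer y with x + y = total."""
    # For positive x, any positive y with x + y = total satisfies y < total,
    # so it is enough to search y = 1, ..., total - 1.
    return any(x + y == total for y in range(1, total))


# Positive integers x in the sample that satisfy the condition.
satisfying = [x for x in range(1, 11) if exists_positive_y(x)]

# Statement (1:2-9), restricted to the sample: does the condition hold for every x?
statement_on_sample = all(exists_positive_y(x) for x in range(1, 11))

print(satisfying)
print(statement_on_sample)
```

Here `any` plays the part of “there is” and `all` the part of “for any”; the condition holds of 1, 2, 3, and 4 and fails at 5, so the universally quantified statement fails already on the sample.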
It is seen that variables serve roughly the same purpose in mathematical
statements as do the pronouns “it,” “this,” “that” in ordinary language.
If we did not use some such device as variables, even the simplest
mathematical statements, such as

(1:2-11) for all integers x and y, $x^2 - y^2 = (x - y)(x + y)$,

would demand unnecessarily complicated expression. For example:

(1:2-12) for any two integers (not necessarily distinct), the result of
squaring the first and subtracting the square of the second is
the same as forming the product of two terms, the first of which
is the result of subtracting the second given integer from the
first, while the second term of the product is the result of adding
the two integers together.

It is apparent that mathematics without the use of variables could hardly
be advanced beyond arithmetic and would be much more difficult to master.
However, the use of letters as variables is not an essential feature, since
one could use other kinds of simple symbols almost as well:

(1:2-13) for all integers ___ and ..., (___)² − (...)² = (___ − ...)(___ + ...).

One note of warning should be sounded about the use in many
mathematical texts of the words “variable” and “constant.” For example, one
may read such a phrase as:

(1:2-14) consider the polynomial $ax^2 + bx + c$, where a, b, c are
constants.

In actuality, all the letters a, b, c, and x in (1:2-14) are variables in our
sense. What would be intended, in a discussion launched by (1:2-14),
would be to obtain some information regarding the behavior of
$ax^2 + bx + c$ for any given a, b, c when x varies; in other words, there is a
difference in interest in the respective roles of a, b, c, and x. It would perhaps
be better to distinguish these roles by referring to a, b, c as parameters,
as is often done.
A second point to be noted regarding normal mathematical writing,
and which again seems in conflict with our discussion, is connected with
the practice of referring to conditions such as

(1:2-15) x + y = y + x

as being true statements. What is intended is that the condition is asserted
to hold true for all values of the variables. To be properly stated, one must
first determine from the context what types of objects are under
consideration. For example, in this particular case it may be real numbers.
Then, stated in full, we would have

(1:2-16) for all real numbers x and y, x + y = y + x.

Similarly the following “statement,”

(1:2-17) let $x < -2$; then $4 < x^2$,

would (in a discussion of real numbers) be properly translated into the
statement

(1:2-18) for any real number x, if $x < -2$, then $4 < x^2$.

When there is no ambiguity, we may also follow such practices as are
indicated in (1:2-15) and (1:2-17).

Existence. A very important idea connected with the type of statements
we have been considering concerns the meaning in mathematics of
existence. The student is used, from his first courses in mathematics, to
dealing with problems for which there is a solution. Furthermore, he
expects to explicitly obtain that solution or to find a formula or rule for
obtaining the solution in any particular case. A typical example is the
formula for the solution of the quadratic equation $ax^2 + bx + c = 0$.
Consider now the following situation. In the study of questions of division
among integers, for example to find greatest common divisors or least
common multiples, the prime numbers are very useful. These are the positive
integers, other than 1, which have no integer divisors other than
themselves and 1. The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19,
23, ... Now it can be proved that

(1:2-19) there exist infinitely many prime numbers.



This is a consequence of the following statement:

(1:2-20) for every prime number x there exists a prime number y with
x < y.

However, no simple formula is known which will associate with every
prime number x the next larger prime number. On the other hand, there
is a simple (though tedious) method for computing this number. For
example, given that 89 is prime, we need only compute all possible divisors
of each of the numbers in the list 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, ... Eventually, according to (1:2-20), we must come upon a
prime number; in this case, it can be verified that the next prime after 89
is 97. Thus, even though we may be able to prove the existence of a
solution to a certain problem, we must in general be satisfied with some
systematic method for computing the solution, in contrast to finding a
formula which will “exhibit” the solution.
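
The systematic method just described is easily written out as a program. The following Python sketch is offered only as an illustration; the names `is_prime` and `next_prime` are our own, and as a shortcut not mentioned in the text the search for divisors of n stops once the trial divisor exceeds the square root of n.

```python
def is_prime(n):
    """Decide whether n is prime by trying possible divisors in turn."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # a divisor larger than sqrt(n) pairs with a smaller one
        if n % d == 0:
            return False   # d divides n, so n is not prime
        d += 1
    return True


def next_prime(p):
    """Return the least prime greater than p by testing p + 1, p + 2, ... in order."""
    candidate = p + 1
    # Statement (1:2-20) guarantees that this loop ends when p is prime.
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(next_prime(89))
```

The program gives no formula for the answer. It searches, and it is the existence statement (1:2-20) that assures us the search terminates.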
As a second example, consider the following statement, which can be
proved:

(1:2-21) there exists a real number x such that $x^5 - 7x^2 + 2 = 0$.

The intuitive reason for the truth of (1:2-21) is simple. The polynomial
$x^5 - 7x^2 + 2$ has the value 2 at x = 0 and −4 at x = 1; since its value
varies continuously between 2 and −4 as x varies between 0 and 1, there
must exist an x between 0 and 1 for which the value is 0. (This is not a
precise proof; we shall be able to give precise proofs of statements of this
sort in Chapter 7.) Again there is no known formula which will exhibit a
solution of the given equation (and, in a precise sense which is given in
courses of advanced algebra, there is probably no hope of finding one; this
will be discussed in more detail in Chapters 8, 9). Moreover, no finite
computation of predetermined length will end with exhibiting a specific
real number x as a solution of (1:2-21). However, the following infinite
sequence of computations will bring one closer and closer to a solution,
if it does not end in a finite number of steps with an exact solution. First
compute the value of $x^5 - 7x^2 + 2$ at 0.0, 0.1, 0.2, ..., 0.9, 1.0. It is
possible that one of these rational numbers is an exact root, and our
computation is ended. Otherwise, in one of these intervals the value at
the left endpoint must be positive and at the right endpoint negative.
Suppose, for example, that $(0.5)^5 - 7(0.5)^2 + 2 > 0$ and
$(0.6)^5 - 7(0.6)^2 + 2 < 0$. Thus a root of the polynomial lies between 0.5 and 0.6.
Then we compute the value of the polynomial at 0.50, 0.51, 0.52, ...,
0.59, 0.60, testing in each case to see whether this value is zero, positive,
or negative. By continuing in this way, we can find the decimal value of
a root to any desired number of places. The necessity of performing a
potentially infinite sequence of computations is less satisfactory than the
case where a finite sequence will suffice; however, when there is no
alternative one must be content with this method of computing a solution, in
contrast to finding a formula which will “exhibit” the solution.
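
The digit-by-digit search can likewise be sketched in Python. The names `p` and `refine_root` are our own, and exact rational arithmetic (the standard `fractions` module) is assumed so that the sign tests are not disturbed by rounding. The sketch relies on the fact that the value of the polynomial is positive at 0 and negative at 1.

```python
from fractions import Fraction


def p(x):
    """The polynomial x^5 - 7x^2 + 2 of statement (1:2-21)."""
    return x**5 - 7 * x**2 + 2


def refine_root(places):
    """Return a decimal with `places` digits lying just to the left of a root in (0, 1).

    Throughout, p(left) > 0 and p(left + 10 * step) < 0 at the start of each
    round, so a change of sign must occur among the ten subintervals tested.
    """
    left, step = Fraction(0), Fraction(1)
    for _ in range(places):
        step /= 10
        for k in range(1, 11):
            value = p(left + k * step)
            if value == 0:
                return left + k * step   # an exact root; the computation ends
            if value < 0:
                left += (k - 1) * step   # sign changes on [left + (k-1)step, left + k*step]
                break
    return left


print(float(refine_root(1)))
print(float(refine_root(6)))
```

Each call with a larger value of `places` carries the procedure one stage further; no single call completes it, which is precisely the sense in which the sequence of computations is potentially infinite.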
For a final example concerning the meaning of “existence” we turn to
a question in analysis. In many problems it is important to find a relative
or absolute maximum value for a function. Simple examples show that
a function f(x) can be defined for all values of x such that $a \le x \le b$
and yet have no absolute maximum in that interval. However, the
following theorem can be proved (we give it in Chapter 7):

(1:2-22) if a, b are real numbers with $a < b$ and f(x) is a continuous
function for all values of x such that $a \le x \le b$, then there
exists a number c with $a \le c \le b$ for which f(c) is an
absolute maximum.

If one follows a proof of this theorem, no way is seen of extracting from
it a systematic procedure for calculating, say, the decimal expansion of c.
In fact, it is known that there are continuous functions such that one can
systematically calculate (to any degree of accuracy) the decimal expansion
of f(x), given a procedure for calculating the decimal expansion of x,
but such that there is no systematic procedure for calculating any value
of c at which f attains its absolute maximum value.*
The reader may wonder what value, or even meaning, such a statement
as (1:2-22) has if we cannot be sure that in any given application we shall
be able to compute the solution. In this he would be supported by a small,
but hardy, group of logicians and mathematicians (known as
constructivists or intuitionists). The usual position of most mathematicians on
this question might be summarized by the following points:
(a) there is no foundation for the optimistic belief that humans can
solve all problems which they set themselves, but it is still meaningful
to pose such problems;
(b) a statement of existence gives us a minimum guarantee of
information, from which we can try to proceed for more information in more
specific cases [for example, in the case (1:2-22), one proceeds to
investigate added conditions on the function, such as differentiability, which
would permit computability of the location of the maximum];
(c) in this way we may be led to proofs of the correctness of some
computational procedures via a long chain of “pure” existential results,
yet there may be no way to eliminate such noncomputational statements
from our arguments.

* The precise statement and proof of this is based on the advanced theory
of recursive functions developed in recent research papers.

To summarize, we see that there are four situations which can
accompany a proof of existence. First we may extract from the proof a simple
formula for the solution, such as with the solution of the quadratic
equation. Second, the proof may lead us to a finite systematic computation
procedure (often called an algorithm), such as in case (1:2-20). Third,
the proof may lead us to an infinite systematic computation procedure,
as in case (1:2-21); this is especially true in connection with various
problems whose solutions are real or complex numbers. Finally, the
proof may lead us to no computation procedure at all, and must rest as
a pure statement of existence; this is the case in (1:2-22). The student will
find these different kinds of situations intertwined throughout this book
and his further work in mathematics; he should always keep a sharp eye
out for the distinctions, if they are not explicitly mentioned. If he becomes
worried about the small amount of attention that is paid to computation
in his further courses of mathematics, he should remember that this is
much more often out of necessity than out of perversity.
One more point to be made about statements of existence, which is
away from this main issue, is that “there exists” is to be interpreted as
“there is at least one.” Thus in (1:2-21) one can see that there must be
another root of x⁵ − 7x² + 2 in the interval from 1 to 2 (compute the
value of the polynomial at 2). Similarly, in (1:2-22) there may be many
values of c in the given interval at which the function attains its maximum.
In order to say that there is just one object satisfying a given condition,
we usually use the words “there is exactly one” or “there is a unique.”
For example, the following is true:

(1:2-23) there is a unique real number x such that x⁵ − 7x² + 2 = 0
         and 0 < x < 1.

Logical connectives. We return now to a further analysis of the logical
structure of mathematical statements. Certain other words, besides “all”
and “some,” serve a purely logical function, i.e., can be used to build more
complicated conditions or statements from simpler ones in such a way that
the meaning and truth of the more complicated conditions are completely
determined by the simpler ones. Examples of such words are “not,” “and,”
“or.” However, the meaning of “or” in mathematics is not completely
determined by normal usage; the mathematical convention is that it be
used in the nonexclusive sense, i.e., the resulting statement is true if either
of its parts is true, and also if both of them are true. Thus the statement

(1:2-24) for all integers x, either x ≤ 2 or 2 ≤ x,

is counted as true, but would not be so counted under the exclusive
interpretation of “or.” It may be that we do not give enough credit to the
nonexclusive sense of “or” in ordinary language. Thus if we grant the
statement “We must have much bigger rocket engines or we will not be able
to put a man on Mars,” we should keep in mind the possibility that we
may have much bigger rocket engines and still be unable to put a man
on Mars.
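
The difference between the two readings of “or” can be checked mechanically. The following sketch is ours, not the text's: it takes the two parts of (1:2-24) to be x ≤ 2 and 2 ≤ x, uses a finite range of integers as a stand-in for “all integers,” and introduces the helper names `inclusive_or` and `exclusive_or` purely for illustration.

```python
def inclusive_or(p, q):
    """Mathematical 'or': true if either part is true, or both."""
    return p or q

def exclusive_or(p, q):
    """Exclusive 'or': true if exactly one part is true."""
    return p != q

integers = range(-10, 11)  # a finite sample standing in for "all integers"

holds_inclusive = all(inclusive_or(x <= 2, 2 <= x) for x in integers)
fails_exclusive = [x for x in integers if not exclusive_or(x <= 2, 2 <= x)]

print(holds_inclusive)   # True
print(fails_exclusive)   # [2]
```

The only integer at which the exclusive reading fails is x = 2, where both parts are true at once.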
Other statement connectives which are quite commonly used in logic
and mathematics, much more than in ordinary discourse, are provided by
the words “if ... then ...” and “if and only if.” A condition which is formed
using the first of these, such as

(1:2-25) if x < −1 then 9 < x²

is called an implication. The first part of it, x < −1, is called the
hypothesis or antecedent and the second part, 9 < x², is called the conclusion
or consequent. The use of implication in mathematics differs in certain
respects from ordinary usage. Very often we intend to convey, in
everyday language, that there is some sort of cause and effect relationship
between the hypothesis and conclusion of an implication. Examples of
such are “If you eat that green apple (then) you will get sick,” and “If
you do that again (then) I’ll spank you.” It seems very difficult to try to
give this sort of relationship a precise meaning, especially in connection
with mathematical statements. The simplest precise way to provide a
uniform treatment of implication is to demand that the truth or falsity
of an implication depend only on the truth or falsity of its components,
and should not necessarily depend on the sense of the components. We
see only one way to show that an implication is false, namely by showing
that the hypothesis is true while the conclusion is false. In all the other
(three) cases, under this understanding of implication, the implication
should be counted as being true. Consider, for example, the following
instances of (1:2-25):

(1:2-25′) (a) if −4 < −1 then 9 < (−4)²;
          (b) if 4 < −1 then 9 < 4²;
          (c) if 3 < −1 then 9 < 3²;
          (d) if −3 < −1 then 9 < (−3)².

Of these four, there is only one in which the conclusion is false and the
hypothesis true, namely (d); hence in all other cases the implication is
true. This may go slightly against the grain, especially in cases (b) and
(c), but only if we try to think of the conclusion as “necessarily following”
from the hypothesis. The hypothesis in both these cases is false, and one
often says, in such cases, that the whole implication is vacuously true.
In case the reader still has doubts he should compare these with such
statements as “If you high jump seven feet then I’ll eat my hat.” [Which
of (1:2-25′)(b), (c) is this like?] Only (1:2-25′)(d), of the four cases given,
shows that (1:2-25) is not true for all integers x.
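
The truth-table reading of implication is easy to tabulate. In the sketch below, which is our own illustration, the helper `implies` encodes the convention that an implication is false only when the hypothesis is true and the conclusion false; the four values of x are those of the four instances discussed above.

```python
def implies(hypothesis, conclusion):
    """False only when the hypothesis is true and the conclusion is false."""
    return (not hypothesis) or conclusion

# Rows: (x, hypothesis x < -1, conclusion 9 < x^2, truth of the implication)
rows = [(x, x < -1, 9 < x**2, implies(x < -1, 9 < x**2)) for x in (-4, 4, 3, -3)]
for row in rows:
    print(row)

# Integers in a finite sample at which "if x < -1 then 9 < x^2" fails.
counterexamples = [x for x in range(-10, 11) if not implies(x < -1, 9 < x**2)]
print(counterexamples)   # [-3, -2]
```

The two rows with a false hypothesis (x = 4 and x = 3) come out true regardless of the conclusion, which is exactly the vacuous case.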
The reader may feel that it is an academic matter to discuss
implications which have a false hypothesis. This is not the case, since a number of
arguments in mathematics, most notably proofs by contradiction (or
reductio ad absurdum) involve just such situations. In these proofs we
show that a certain statement 𝒜 is not true as follows: we imagine that
𝒜 is true and we infer from this another statement, ℬ, which is known to
be false. In other words, we prove

(1:2-26) if 𝒜 then ℬ.

Assuming that the inference is correct, i.e., that (1:2-26) is true, it follows
from the falsity of ℬ that 𝒜 cannot be true. Of course, if we knew in
advance that 𝒜 is not true, we would not be interested in the implication
(1:2-26), since it is vacuously true. We shall give no examples of such
proofs by contradiction now; several of these, which we shall point out
explicitly, will be found later in the text.
There is one sense in which the notion of implication is used in
mathematics which is more closely related to the everyday sense of “necessary
consequence.” That is when we say that one statement 𝒜 implies another
statement ℬ, by which we mean that ℬ can be logically inferred from 𝒜
on the basis of the initial axioms. From the point of view of formal logic,
this will happen only in the case when the statement “if 𝒜 then ℬ” can
be inferred from the axioms, so that again there is no necessary connection
between 𝒜 and ℬ. However, from the informal point of view, we usually
concern ourselves only with implications involving statements whose
contents are somehow related.
We now turn to the use of the words “if and only if.” A condition such as

(1:2-27) |x − 2| < 3 if and only if −1 < x and x < 5

is called an equivalence. It carries the same sense as the two implications

(1:2-27) (a) if −1 < x and x < 5 then |x − 2| < 3;
         (b) if |x − 2| < 3 then −1 < x and x < 5;

the second of these corresponds to the “only if” of (1:2-27). We also
say, with the same sense, that

(1:2-28) −1 < x and x < 5 is a necessary and sufficient condition for
         |x − 2| < 3;

here “sufficient” corresponds to the “if” part of (1:2-27). It follows from
the conditions for truth of an implication that an equivalence is true if
both parts are true [as in (1:2-27), when x = 3] or both parts are false
[as in (1:2-27), when x = 5]; it is false in both other cases. Again there is
no necessary relationship assumed between the content of the components
of an equivalence. We also speak of two statements 𝒜 and ℬ as being
equivalent if the equivalence “𝒜 if and only if ℬ” can be proved on the
basis of the given axioms. Here we usually have in mind that the contents
of 𝒜 and ℬ are somehow related.
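
An equivalence can be tested in the same mechanical way as an implication: it is true exactly when its two parts have the same truth value. The sketch below is our own illustration; the helper name `iff` and the finite sample of rational points are assumptions made for the example, and a finite check of this kind is evidence for (1:2-27), not a proof of it.

```python
from fractions import Fraction

def iff(p, q):
    """True when both parts are true or both parts are false."""
    return p == q

def left_part(x):
    return abs(x - 2) < 3

def right_part(x):
    return -1 < x and x < 5

# Quarter-steps from -10 to 10, standing in for "all real numbers".
sample = [Fraction(n, 4) for n in range(-40, 41)]
equivalence_holds = all(iff(left_part(x), right_part(x)) for x in sample)

print(equivalence_holds)                # True
print(left_part(3), right_part(3))      # True True   (both parts true)
print(left_part(5), right_part(5))      # False False (both parts false)
```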

Exercise Group 1.2

1. Tell, in each of the following cases, whether the statement is true or false
and give brief informal reasons for each answer.
(a) There exists an integer x such that x² − 4 = 0.
(b) There exists a unique positive integer x such that x² − 4 = 0.
(c) There exists a rational number x such that x² + x + 1 = 0.
(d) For any positive real number x there exists a real number y such that
y² = x.
(e) For any real number x there exists a real number y such that y² = x.
(f) For every integer x there exists an integer y such that x = 2y or
x = 2y + 1.
(g) For all positive integers x, if x > 3 then x² > 9.
(h) For all positive integers x, if x > 2 then x² > 9.
(i) For all real numbers x and y, if x < y and y ≠ 0 then (x/y) < 1.
(j) For all integers x, x² < 16 if and only if −4 < x and x < 4.
(k) For all integers x, x³ < 27 if and only if x < 3.
(l) For all integers x, (x³ − 1)/13 < 2 if and only if x < 2.

2. Which of the four possible combinations of truth and falsity in the
hypothesis and conclusion can be realized by substituting particular integers for
x in the following condition? (Give examples of each.)

If x² > 9 then x < 0 or x > 2.

Is this condition true for all integers x?


CHAPTER 2

THE SET-THEORETICAL BACKGROUND

2.1 Sets. Sets as abstractions from conditions. Consider the following
two conditions:

(2:1-1) x is a real number and |x − 2| < 3,

(2:1-2) x is a real number and −1 < x and x < 5.

It is easily seen that these conditions are equivalent for all values of x.
This suggests that the two conditions are in some sense identical, although
the senses which they express are, on the face of it, different. The matter
can be put more exactly in the following way: the totality of objects which
satisfy condition (2:1-1) is exactly the same as the totality of objects which
satisfy condition (2:1-2). Among the objects which are contained in this
totality are the numbers 3, 0, •§, − f, √2 + 3, π + 1, etc., and among
those which are not contained in the totality are 5, −f, √−1, the moon,
etc. We cannot possibly list all members of this totality, but this should
not prevent us from ascribing an existence to this totality which is
independent of any particular way of describing the same totality.
When we shift attention in this way from particular ways of singling
out a certain group of objects, to the totality of objects itself, we come to
the notion of set (also called class or collection). Each condition involving
one variable determines a set. More generally, this holds for certain kinds
of conditions whose formulation involves several variables. Some of these
variables may be tied down by the use of words such as “all” or “some”;
these are said to be bound. The rest are called free variables. For example,
the condition

(2:1-3) x is an integer and y is an integer and x = 3y or x = 3y + 1

has two free variables, x and y, and the condition

(2:1-4) x is an integer and there exists an integer y such that x = 3y or
        x = 3y + 1

has one free variable, x, and one bound variable, y, while the condition
(in fact, statement)

(2:1-5) for every integer x there exists an integer y such that x = 3y or
        x = 3y + 1

has no free variables. The second of these conditions, although it involves
two variables, still determines a set (which includes the numbers −3, −2,
0, 1, 3, 4, etc.) because only one of these variables is free. Thus what we
wish to say is that every condition which has just one free variable determines
a set which is the totality of all those objects satisfying the condition. (We
shall have to qualify this statement at the end of this section; however,
it will be adequate for our purposes in the meantime.)
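
The roles of free and bound variables have a close analogue in a set comprehension, where the free variable becomes the element being collected and the bound variable is consumed by a quantifier. The sketch below is ours: the finite ranges `universe` and `witnesses` are assumptions that stand in for the set of all integers, so it computes only a finite portion of the set determined by (2:1-4).

```python
universe = range(-3, 5)     # candidate values of the free variable x
witnesses = range(-5, 6)    # candidate values of the bound variable y

def condition(x, y):
    """The two-variable condition underlying (2:1-3), (2:1-4), and (2:1-5)."""
    return x == 3 * y or x == 3 * y + 1

# (2:1-4): y is bound by "there exists", x stays free, so a set of x's results.
S = {x for x in universe if any(condition(x, y) for y in witnesses)}
print(sorted(S))            # [-3, -2, 0, 1, 3, 4]

# (2:1-5): both variables are bound, so the result is a single truth value.
statement = all(any(condition(x, y) for y in witnesses) for x in universe)
print(statement)            # False, since x = -1 has no suitable y
```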
To indicate that we are dealing with a condition which has one free
variable we shall write such expressions as

𝒜(x), ℬ(x), 𝒫(y), 𝒬(n), etc.,

where in the first two the free variable is understood to be x, in the third y,
and in the fourth n. Similarly,

𝒜(x, y), ℬ(x, n), 𝒫(y, z), 𝒬(n, p), etc.

indicate conditions with two free variables. Given a condition with one
free variable, say 𝒜(x), we denote by

(2:1-6) {x: 𝒜(x)}

the set consisting of all those objects c which satisfy the condition. (2:1-6)
may be read: the set of all x such that 𝒜(x). In effect, for any specific
condition 𝒜(x), (2:1-6) provides us with a symbolic means of denoting a specific
set, which is analogous to the use of constants to denote specific numbers.
Moreover, just as with constants denoting numbers (“2 + 3” and “5”
denote the same number), different expressions, for example, {x: 𝒜(x)} and
{x: ℬ(x)}, can be used to denote the same set. The condition for this is
simple:

(2:1-7) {x: 𝒜(x)} = {x: ℬ(x)} if and only if for all x, 𝒜(x) is
        equivalent to ℬ(x).
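
Principle (2:1-7) can be watched at work on the two conditions (2:1-1) and (2:1-2). The following sketch is our own: since a program cannot range over all real numbers, a finite sample of rational points is assumed in their place, and the names `cond_a` and `cond_b` are hypothetical labels for the two conditions.

```python
from fractions import Fraction

def cond_a(x):
    """Condition (2:1-1): |x - 2| < 3."""
    return abs(x - 2) < 3

def cond_b(x):
    """Condition (2:1-2): -1 < x and x < 5."""
    return -1 < x and x < 5

# Quarter-steps from -10 to 10, a finite stand-in for the real numbers.
sample = [Fraction(n, 4) for n in range(-40, 41)]

set_a = {x for x in sample if cond_a(x)}
set_b = {x for x in sample if cond_b(x)}

conditions_equivalent = all(cond_a(x) == cond_b(x) for x in sample)
print(conditions_equivalent, set_a == set_b)   # True True
```

Two differently worded conditions thus pick out one and the same collection, which is what it means to pass from the condition to the set.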

We shall also need to make statements and formulate conditions
involving all or some sets of a certain type and so, just as with statements
about numbers, we shall need variables ranging over sets. For this purpose
we shall use capital letters, A, B, C, D, E, M, N, S, X, Y, Z, etc.,
occasionally also in boldface, A, B, C, etc. The use of such variables is helpful even
in dealing with specific sets. Suppose, for example, that in a certain
discussion we shall have to make several references to the set {x: x is a real
number and |x − 2| < 3}. Since this gets unwieldy, it is preferable to
introduce a temporary notation for the set, as follows:

(2:1-8) Let S = {x: x is a real number and |x − 2| < 3}.


As a companion to this, we introduce a simple notation to express that
an object c satisfies the condition which we have used to define S. We write

(2:1-9) c ∈ S,

which we read: c is in S, or c is a member of S, or c is an element of S, or
c belongs to S. If we wish to say that c is not in S, we write

(2:1-10) c ∉ S.

Thus, for the particular set S introduced in (2:1-8) the following are true
statements:

(2:1-11) 3 ∈ S, −§ ∈ S, √2 + 3 ∈ S, 5 ∉ S, √−1 ∉ S; for all real
         numbers x, if x > 5 then x ∉ S.

For most sets it is humanly impossible to explicitly list all elements of
the set; these are the infinite sets. Those for which it is (at least in principle)
possible to completely list the elements, given enough time and space,
are called finite. This explanation of the word “finite” is, of course, very
vague. Most people would regard the notions of finiteness and infinitude
as being intuitively clear and acceptable as basic undefined concepts of
our development. However, we shall see in Section 2.4 that it is possible
to define these notions in terms of simpler undefined concepts in such a
way as to accord with our intuitive understanding of them.
For the finite sets we have another natural notation besides the basic
notation of (2:1-6), namely, that obtained by writing down descriptions
of each of the elements of the set, and enclosing the result in braces. Thus

(2:1-12) {−5, π + 2, √2}

denotes the set whose only elements are −5, π + 2, √2. As another
example, we have

(2:1-13) {x: x is an integer and |x − 2| < 3} = {0, 1, 2, 3, 4}.

With this notation for a finite set it is immaterial in what order the
elements are listed or even whether they are listed more than once. Thus

{π + 2, −5, √2},   {√2, −5, −5, π + 2},

{−5, √2, π + 2, (2 · √2 + 1)/2 − 1/2}

all denote the same set, which is the same as the set denoted in (2:1-12).
A notation similar to this is often used informally to denote some infinite
sets. For example, we may use

(2:1-14) {1, 4, 9, . . .}

to denote the set of squares of positive integers, i.e., the set

{m: for some n ∈ P, m = n²}.

Unfortunately, this notation is ambiguous and should be used with care.
For example, (2:1-14) could be used equally well to denote the set

{m: for some n ∈ P, m = n² + (n − 1)(n − 2)(n − 3)}.

It is better, where possible, to indicate the “general element” of the set,
such as

(2:1-15) {1, 4, 9, . . . , n², . . .}.

This “dot” notation is also often used to indicate certain finite sets of any
general class. For example, we might write

(2:1-16) {1, 4, 9, . . . , n²}

as a notation for the finite set consisting of the first n squares of positive
integers, when n is a positive integer.
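
Two points made above about listing notation lend themselves to a quick check: order and repetition do not matter, and the first few listed elements do not determine the rule. The sketch below is our own illustration; the function names `squares` and `rival` are hypothetical labels for the two rules that both begin 1, 4, 9.

```python
# Order and repetition are immaterial in a listed set.
print({1, 4, 9} == {9, 1, 4, 4})    # True

def squares(n):
    """General element n^2."""
    return n**2

def rival(n):
    """General element n^2 + (n - 1)(n - 2)(n - 3)."""
    return n**2 + (n - 1) * (n - 2) * (n - 3)

first_squares = [squares(n) for n in range(1, 5)]
first_rival = [rival(n) for n in range(1, 5)]

print(first_squares)   # [1, 4, 9, 16]
print(first_rival)     # [1, 4, 9, 22]
```

The two rules agree for n = 1, 2, 3, where the added product vanishes, and part company at n = 4, which is why writing out the general element removes the ambiguity.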

Extensions of the concept of set. With every finite set can be associated
in a direct manner a certain condition which defines the set in the form
(2:1-6). For example, the set (2:1-12) is the same as the set

(2:1-17) {x: x = −5 or x = π + 2 or x = √2}.

No such direct possibility is available with infinite sets. As we have seen,
many conditions give rise to infinite sets. However, we should conceive
of the possibility that there exist sets for which we can find no corresponding
condition. It is difficult to see at first how any use could ever be made of
such sets. However, the experience of mathematicians in developing
analysis has shown that they are practically indispensable for certain
arguments. The situation here is very closely related to the questions of
existence discussed in the preceding section, especially to the case where one
can state the existence of a solution to a problem without being able to
provide any systematic procedure for computing it.
The following example will give a clearer idea of what kinds of
statements of set existence we have in mind. (What we cannot provide at the
moment is the full motivation which would lead to the consideration of
such examples.) This example is concerned with the idea that certain real
numbers, such as √2, √2 + §, √2 − %7-, are like each other in that their
differences are rational numbers; others, such as √2 + f, π + 2, are
unlike in that their differences are not rational. This suggests classifying
all the real numbers into different sets, where two real numbers x and y
belong to the same set S if and only if x − y is rational. Let the
collection of all such sets S be called A. Thus

^ ^2 - , ^2, . . . , ^2 +
S2 = {• . . , 7T — 91, . . . , 7T — Jf, . . . , 7T + 2, . . .}

indicate two particular sets in A. Clearly, each set S in A is infinite.
Now we claim that there exists a set B which has exactly one element in
common with each of the sets S of A. For example, B may contain
√2 + 1 but no other element from S₁, and may contain π − but
no other element from S₂. In other words the set B has the property:
if x₁, x₂ are distinct elements of B then x₁ − x₂ is not rational
(otherwise they would come from the same set), and for each real number y
there is an x ∈ B such that x − y is rational (since at least one element
is chosen from each set). The existence of this set can be visualized in a
certain sense:

Figure 2.1

Imagine each of the circles to represent one of the sets S of A; from each
of these one element is chosen, shown as a darker point, and the set B
is the set of all the chosen points. Unfortunately, there seems to be no
rule which one could give in advance for selecting a particular element
from each set S of A. In other words, there is no condition ℬ(x) that
we can think of (which does not assume prior knowledge of the set B)
such that B = {x: ℬ(x)}. (Convince yourself of this by trying, to the
contrary, to formulate such a rule or condition.) The statement that
there exists such a set B is a typical consequence of a general statement
which is called the principle or axiom of choice. We shall have several
occasions to apply this statement in the development of the properties
of the real number system; we trust that the student will find each such
application plausible in its own right.
From the initial consideration of sets defined by conditions we have
thus widened our conception to include sets which may not be defined by
any condition at all. The example in the preceding paragraph also suggests
another way for widening our conception. Sets can be formed by
collecting objects of any kind. In particular, once we conceive of sets themselves
as being objects, we must allow sets whose elements are in turn sets.
Indeed, the collection A of the preceding paragraph is just such a set.
As another example, consider the set

(2:1-18) B = {X: X is a finite set of positive integers}.

Then

(2:1-19) {1, 5, 6} ∈ B, {2, 5, 7, 11} ∈ B, {−1, 5, 7, 11} ∉ B,
         {m: for some n ∈ P, m = n²} ∉ B,

where P is the set of positive integers. Still another example is the set

(2:1-20) C = {X: X is any set of positive integers};

then

(2:1-21) {m: for some n ∈ P, m = n²} ∈ C,
         {m: m = 0 or for some n ∈ P, m = n²} ∉ C.

The set A of the preceding paragraph can be defined as

(2:1-22) A = {S: S is a set of real numbers such that for some
         x ∈ Re, S = {y: y ∈ Re and x − y ∈ Ra}},

where Re is the set of real numbers and Ra the set of rational numbers.

Identity and inclusion. Since not every set is determined by a condition,
we cannot in general apply (2:1-7) to find out whether two sets are
identical. However, our intuitive conception of set demands that a set
be completely determined by its elements. This can be expressed
precisely by saying that the following condition holds for arbitrary sets A, B:

(2:1-23) A = B if and only if for all x, x ∈ A if and only if x ∈ B.

A more general relation between sets is that of inclusion. We say that a
set B is included in a set A, or that B is a subset of A; in symbols,

(2:1-24)(a) B ⊆ A,

only in the case where

(2:1-24)(b) for all x, if x ∈ B then x ∈ A.

(Some writers use the symbol ⊂ instead of ⊆, while others prefer to write
B ⊂ A only when B ⊆ A and B ≠ A; watch for such discrepancies.)
It is clear, then, that for any sets A, B

(2:1-25) A = B if and only if B ⊆ A and A ⊆ B.

In particular,

(2:1-26) A ⊆ A

for any set A. It is further clear that for any sets A, B, C

(2:1-27) if C ⊆ B and B ⊆ A then C ⊆ A.

The inclusion notation can be used to simplify the defining conditions of
the sets described in the preceding paragraph. For example, the set C
of (2:1-20) is {X: X ⊆ P}; and the set B of (2:1-18) is {X: X is finite and
X ⊆ P}; further, B ⊆ C. In general, if A is any set we can form

(2:1-28) {X: X ⊆ A},

which is called the set of all subsets of A.
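
For a small finite set the set of all subsets can be written out in full. The sketch below is our own illustration: the helper `power_set` is a hypothetical name, and `frozenset` is used only because Python requires the elements of a set to be immutable. It also checks (2:1-25) and (2:1-27) on examples.

```python
from itertools import combinations

def power_set(a):
    """Return the set of all subsets of the finite set a."""
    items = list(a)
    return {frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)}

A = frozenset({1, 2, 3})
subsets = power_set(A)

print(len(subsets))                      # 8
print(all(X <= A for X in subsets))      # True: each member is included in A

# (2:1-25): identity amounts to inclusion in both directions.
B = frozenset({3, 2, 1})
print((A == B) == (B <= A and A <= B))   # True

# (2:1-27): inclusion is transitive, checked over all triples of subsets.
transitive = all(X <= Z
                 for X in subsets for Y in subsets for Z in subsets
                 if X <= Y and Y <= Z)
print(transitive)                        # True
```

Here `<=` is Python's notation for inclusion in the nonstrict sense, so that A <= A holds, in agreement with (2:1-26).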


This is a good moment to discuss (what may at first seem to be a
digression) the general treatment of identity or equality in mathematics. Some
students find this notion paradoxical. They argue: if we say two objects
are identical then they are not two objects but one; but if there is only
one object to begin with, there is no point in saying that it is identical with
itself, since that is trivial. The point here is a confusion in the use of the
word “two.” What we are really concerned with is two descriptions of
objects. When we write 5 = 2 + 3, we are saying that the object denoted
by ‘5’ is the same as the object denoted by ‘2 + 3,’ and this is a true and
nontrivial bit of information. On the other hand, we recognize the
statement 5 = 2 + 2 to be meaningful but false, since the objects denoted by
‘5’ and by ‘2 + 2’ are distinct.
The use of the words “is the same as” or “is identical with” belongs
to the most basic part of our reasoning and cannot be explained in terms
of any prior notions. We can, however, put down certain statements which
make their use more explicit. The starting point is that objects x and y
are identical, x = y, just in the case that x and y share exactly the same
properties. More precisely,

(2:1-29) x = y if and only if for all sets S, x ∈ S if and only if y ∈ S.

This condition is known as the identity of indiscernibles; its explicit
recognition goes back to the philosopher-mathematician Leibniz. Here x and y
can be objects of any kind: pebbles, numbers, or even sets. In particular
it follows from (2:1-23) and (2:1-29) that if two sets A and B are identical,
i.e., if they have exactly the same elements, then for all sets S, A ∈ S if
and only if B ∈ S.
The following familiar statements are immediate consequences of
(2:1-29):

(2:1-30) For any objects x, y, z we have
         (a) x = x;
         (b) if x = y then y = x;
         (c) if x = y and y = z then x = z.

The first of these is called the reflexive law for identity, the second the
symmetric law, and the third the transitive law. There are many relations
between objects, other than the relation of identity, which also satisfy
these conditions. For example, let us call two real numbers x, y equivalent,
and write x ≡ y, if x − y is rational. Then it is seen that (2:1-30)(a)
through (c) hold true of any real numbers x, y, z when we replace the
symbol “=” by the symbol “≡.” Such “identity-like” relations are very
interesting and will be used at a number of places in our work. The point
to be realized from this is that the laws in (2:1-30) are not sufficient to
characterize identity, but that rather some much stronger condition, such
as (2:1-29), must be used. Certain other statements about identity,
with which the student is familiar from his school courses, such as “if
equals are added to equals the results are equal,” will be discussed in
Section 2.3.
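
The three laws can be tested on a sample for the relation “the difference is rational.” Arbitrary real numbers cannot be represented in a program, so the sketch below makes a simplifying assumption of our own: it considers only numbers of the form a + b√2 with a, b rational, stored as pairs (a, b). Because √2 is irrational, two such numbers differ by a rational exactly when their b-parts agree.

```python
from fractions import Fraction
from itertools import product

def equivalent(x, y):
    """x and y are pairs (a, b) standing for a + b*sqrt(2).

    Their difference is rational exactly when the b-parts coincide.
    """
    return x[1] == y[1]

sample = [(Fraction(a, 2), Fraction(b)) for a in range(-2, 3) for b in range(-1, 2)]

reflexive = all(equivalent(x, x) for x in sample)
symmetric = all(equivalent(y, x)
                for x, y in product(sample, repeat=2) if equivalent(x, y))
transitive = all(equivalent(x, z)
                 for x, y, z in product(sample, repeat=3)
                 if equivalent(x, y) and equivalent(y, z))
print(reflexive, symmetric, transitive)    # True True True

# The relation is identity-like but is not identity:
x, y = (Fraction(0), Fraction(1)), (Fraction(1, 2), Fraction(1))
print(equivalent(x, y), x == y)            # True False
```

The last line shows two distinct numbers, √2 and 1/2 + √2, that are nevertheless equivalent, so the three laws alone do not single out identity.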

Some peculiar sets. If the original method we described of constructing
sets from given conditions 𝒜(x) by forming {x: 𝒜(x)} is pursued to its
logical conclusion, certain peculiarities arise. For example, it is perfectly
sensible to form the set {x: x is a prime number and 17 < x and x < 29},
which we realize is the same as the set {19, 23}. It should be equally
sensible to form the set

(2:1-31) {x: x is a prime number and 23 < x and x < 29};

but this would have to be a set with no elements whatever. We can, if
we wish, avoid the use of such sets. However, in order to do so we would
have to know in advance of any condition, whether or not it is satisfied
by at least one object. Thus we would be restricted from forming

(2:1-32) {n: n ∈ P and n > 2 and there exist x ∈ P, y ∈ P, z ∈ P
         such that xⁿ + yⁿ = zⁿ},

since to this day we do not know whether this set contains any elements
(i.e., we do not know the truth or falsity of Fermat’s “last theorem”).
It is much easier, and perfectly consistent with our other conceptions of
sets, to make no restrictions of this sort. We shall thus say that a set A
is empty if for all objects x, x ∉ A. It then follows immediately from
(2:1-23) that

(2:1-33) if A, B are both empty sets then A = B.

For example,

(2:1-34) {x: x is a prime number and 23 < x and x < 29} = {z: z is
         a unicorn}.

By (2:1-33) we can speak of the empty set, which we denote by

(2:1-35) ∅.

(Some writers use “0” or “Λ.”) By convention, ∅ is counted as a finite
set (its elements can be listed in no time). It follows from our definition
of the empty set and the definition (2:1-24) of inclusion that for any set A

(2:1-36) ∅ ⊆ A,

since the implication in (2:1-24)(b) is in this case vacuously true.
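
The behavior of the empty set, including the vacuous truth behind (2:1-36), can be observed directly. The sketch below is our own illustration; the helper `is_prime` is a deliberately naive test, and the empty collection of unicorns is of course an assumption about the world rather than about Python.

```python
def is_prime(n):
    """Naive primality test, adequate for the small numbers used here."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

primes_17_to_29 = {x for x in range(18, 29) if is_prime(x)}   # 17 < x < 29
primes_23_to_29 = {x for x in range(24, 29) if is_prime(x)}   # 23 < x < 29
unicorns = set()                                              # no elements

print(primes_17_to_29 == {19, 23})      # True
print(primes_23_to_29 == unicorns)      # True: two descriptions, one empty set

# (2:1-36): the empty set is included in any set A, because the condition
# "for all x, if x is in the empty set then x is in A" has nothing to check.
A = {19, 23}
print(set() <= A)                       # True
print(all(x in A for x in set()))       # True (vacuously)
```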


On the other end of the scale it would seem that we could form sets
that are inconceivably large, such as {X: X is an infinite set} or even the
set

(2:1-37) A = {X: X is a set}.

If we allow this, then we must allow, of the set A in (2:1-37), that A ∈ A.
However, it was discovered by Bertrand Russell that such unrestricted
formation of sets can lead to a contradiction, which has in consequence
come to be known as Russell’s paradox. Namely, form the set

(2:1-38) A = {X: X is a set and X ∉ X}.

Now either A ∈ A or A ∉ A, but we cannot have both true. In the
first case it follows that A must be one of the sets X described in the
condition, hence we would necessarily have A ∉ A. On the other hand,
if A ∉ A it satisfies the condition which determines which sets are
elements of A, so A ∈ A. Thus in either case we are led to a contradiction.
The contradiction can be avoided only by saying that either there is no
such object as A to begin with or, if there is, it cannot be a set.
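
A loose computational analogy may help the reader feel the circularity. It is only an analogy and an assumption of ours, not a formalization of the paradox: a “set” is modeled as a membership test, that is, a function answering whether a given object belongs to it, and `russell` is the hypothetical test “X does not belong to X.”

```python
def russell(s):
    """Membership test for 'the set of all sets that are not members of themselves'."""
    return not s(s)

# Asking whether this "set" belongs to itself requires the answer to the very
# same question first, so the computation never settles on True or False.
try:
    russell(russell)
    outcome = "answered"
except RecursionError:
    outcome = "no consistent answer"

print(outcome)   # no consistent answer
```

The question “is it a member of itself?” has no stable answer here either: each attempt to answer it asks the question again with the opposite sign.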
That such contradictions arise from unrestricted formation of sets
{x: 𝒜(x)} does not mean that we must abandon such constructions
altogether. The situation is analogous with the manipulation of infinite
series; the fact that we can get contradictory answers to the sum 1 −
1 + 1 − · · · does not force us to abandon the use of infinite series.
In both cases we have proceeded from an understanding of manipulations
or groupings of finite collections of objects to a corresponding treatment
of infinite collections of objects. The contradictions show that our
intuitions are not a completely reliable guide in the latter case. It is
necessary, therefore, to clarify our intuition by stating precise conditions under
sary, therefore, to clarify our intuition by stating precise conditions under
which such manipulations or groupings can be carried out. This should
be done in such a way that our intuitions of the finite are consistently
extended to the infinite. As we discussed in the first section of this chapter,
it is the purpose of this book to explicitly carry out such a program for the
number systems, especially for the real number system, where the first
real difficulties arise.
In fact, our development will proceed from the assumption of the
existence of the set of positive integers together with certain simple
properties of these numbers, successively to the “construction” of the
integers, rational numbers, real numbers, and complex numbers. The only
assumptions we shall use other than those concerning the positive integers
will be ones concerning the notion of set. Hence, to complete the picture,
we should try to explicitly list all assumptions that would be needed for
this development in a way that would consistently extend our conception
of sets to the realm of infinite sets. Such a systematization has actually
been carried through within the past fifty years, and has reached a form
in axiomatic set theory which has proved acceptable to most mathematicians.
Unfortunately, it would take us too far afield to fully discuss axioms for
set theory. Although that subject is logically prior to the one with which
this book is concerned and is not more difficult to deal with, we believe it
demands a slightly higher level of sophistication and a background in the
needs of mathematics for the proper motivation. The interested student
would do well to follow his work here with a study of axiomatic set theory.
There are now several excellent introductory texts which he could use for
this purpose and which are listed in the Bibliography.
We can, however, briefly indicate some of the principles which are
used in axiomatic set theory, which will show how our ideas about sets
can be carried through without contradiction. We shall not present all
these principles at once, but will introduce them where appropriate
throughout the remainder of this chapter. For convenience of comparison,
we bring these together in Appendix I with some additional comments.
However, the reading of the Appendix is best delayed until the student
has completed at least this chapter and Section 3.1.
The difficulty in the Russell paradox can be loosely expressed by saying
that we have allowed for the existence of sets which are too large. Consider
now the following restricted principle:

(2:1-39) for any set $S$ and condition $\mathcal{A}(x)$ there exists a set $A$ such that
for all $x$, $x \in A$ if and only if $x \in S$ and $\mathcal{A}(x)$.

In other words, if we already have a set S, we can form any smaller set
$\{x : x \in S \text{ and } \mathcal{A}(x)\}$. (If we now try to carry through the Russell paradox,
by forming $\{X : X \in S \text{ and } X \notin X\}$, we see that no contradiction is
obtained, since the resulting set A need not belong to S.) If the student
checks back over all the examples discussed in this section he will see
that in every instance we have formed a set by restricting a presumably
pre-existing set, for example the set P of positive integers, I of integers,
or Re of real numbers. On the other hand, there is a second principle which
we can use which will allow us to proceed from a given set A to a
moderately larger set:

(2:1-40) for any set $A$ there exists a set $B$ such that for all $X$, $X \in B$
if and only if $X \subseteq A$.

In other words, given a set A, we are assured that we can form the set
$\{X : X \subseteq A\}$ of all subsets of A. By this means one can proceed from the
assumption only of the existence of certain simple infinite sets, such as
the set P of positive integers, to obtain the existence of somewhat larger
sets. In particular, we shall show how the existence of the set Re of real
numbers can be obtained by a combination of principles like (2:1-39)
and (2:1-40). The student may want to keep these principles in mind in
the further work; however, he should realize that they are by no means a
full set of axioms for set theory, and that therefore a number of our
set-theoretical arguments will have to appeal to his intuitive understanding
of the subject.
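For finite sets, the two principles (2:1-39) and (2:1-40) can be imitated directly on a computer. The following is only an illustrative sketch: the function names `separate` and `power_set` are our own, and a finite range stands in for the infinite set P.

```python
from itertools import combinations

def separate(S, condition):
    """Principle (2:1-39): the subset of S whose elements satisfy the condition."""
    return {x for x in S if condition(x)}

def power_set(A):
    """Principle (2:1-40): the set of all subsets of A (as hashable frozensets)."""
    elems = list(A)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

S = set(range(1, 11))                       # a finite stand-in for P (assumption)
evens = separate(S, lambda x: x % 2 == 0)   # restrict an existing set by a condition
subsets = power_set({"a", "b"})             # all subsets of a two-element set
```

Note that `separate` can only return a part of a set that is already given, which is exactly the restriction that blocks the Russell construction.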

Exercise Group 2.1

In the following, P = the set of positive integers, I = the set of integers,
and Re = the set of real numbers.
1. Which of the following sets is finite and which is infinite? In each finite
case, list all the elements.
(a) $\{x : x \in \mathrm{Re} \text{ and } x^2 - 5x + 4 = 0\}$.
(b) $\{x : x \in P \text{ and } x^2 - 5x + 4 > 0\}$.
(c) $\{x : x \in P \text{ and } x \text{ is a divisor of } 24\}$.
(d) $\{x : x \in P \text{ and for some } y \in P,\ 2x + 3y = 24\}$.
(e) $\{x : x \in P \text{ and for some } y \in I,\ 2x + 3y = 24\}$.
(f) $\{X : X \subseteq \{1, 4, 5\} \text{ and } X \text{ has at most two elements}\}$.
(g) $\{X : X \subseteq P \text{ and } X \text{ has at most two elements}\}$.

2. List all elements of each of the following finite sets:

$A_1 = \{X : X \subseteq \{1\}\}$, $A_2 = \{X : X \subseteq \{1, 2\}\}$, $A_3 = \{X : X \subseteq \{1, 2, 3\}\}$.

How many elements do you expect will be in

$A_n = \{X : X \subseteq \{1, 2, 3, \ldots, n\}\}$?

3. Which of the following are true and which false? In each case, give a
brief explanation of your answer.
(a) $\{x : x \in \mathrm{Re} \text{ and } x^2 - 5x + 4 = 0\} \subseteq \{x : x \in P \text{ and } x^2 - 5x + 4 > 0\}$.
(b) $\{x : x \in \mathrm{Re} \text{ and } x^2 - 5x + 4 = 0\} \neq \emptyset$.
(c) {x: x G Re and x2 + x + 2 = 0} 0.
(d) $\{x : x \in \mathrm{Re} \text{ and } x - 4 > 0\} \subseteq \{x : x \in \mathrm{Re} \text{ and } x^2 - 5x + 4 > 0\}$.
(e) $\{x : x \in \mathrm{Re} \text{ and } x - 2 > 0\} \subseteq \{x : x \in \mathrm{Re} \text{ and } x^2 - 5x + 4 > 0\}$.
4. Let $A = \{S : S \subseteq I \text{ and for some } x \in I,\ S = \{y : y \in I \text{ and } x - y \text{ is a multiple of } 5\}\}$.
One member of A is the set $\{\ldots, -4, 1, 6, 11, \ldots\}$.
What are the other members of A? Find a set B which contains exactly
one element in common with each member of A.

2.2 An algebra of sets. Intersection, union, and complement. Let S be
any set. We know by (2:1-39) that for any condition $\mathcal{A}(x)$ with one free
variable the set $A = \{x : x \in S \text{ and } \mathcal{A}(x)\}$ exists; clearly $A \subseteq S$. As an
intuitive guide for the following considerations, imagine the elements of
the set S sprinkled before us as in the following figure, with the set A
consisting of those points contained in the shaded portion.

Of course, for particular sets S and A this diagram may not correspond
at all to the natural way of visualizing the relationship between A and S.
For example, if S is the set I of integers and

$A = \{x : x \in I \text{ and } (x + 1)(x - 2)(x - 5) > 0\}$,

a more natural representation would be:

[Figure 2.3: the integers from -3 to 7 marked as points on a line, with the points belonging to A indicated.]

Thus, when using diagrams such as the first, we should keep in mind that
this may represent a considerable deformation of actual geometrical
relationships. A further simplifying step in the use of such diagrams is
accomplished by omitting picturing the points entirely, leaving them to
be supplied by our imagination (Fig. 2.4).

Figure 2.4

This has the double advantage of being less tiresome to draw and of allowing
us to imagine that S may be infinite. Such pictures and their combinations,
to be described in the following, are often called Venn diagrams.
From given conditions $\mathcal{A}(x)$ and $\mathcal{B}(x)$ we can form the new conditions

(2:2-1) $\mathcal{A}(x)$ and $\mathcal{B}(x)$, $\mathcal{A}(x)$ or $\mathcal{B}(x)$, not $\mathcal{A}(x)$

by using the connectives “and,” “or,” and “not.” If $A = \{x : x \in S \text{ and } \mathcal{A}(x)\}$
and $B = \{x : x \in S \text{ and } \mathcal{B}(x)\}$, then we can form new sets
from A and B corresponding to the conditions in (2:2-1):

(2:2-2) $C_1 = \{x : x \in S \text{ and } \mathcal{A}(x) \text{ and } \mathcal{B}(x)\}$,
        $C_2 = \{x : x \in S \text{ and } (\mathcal{A}(x) \text{ or } \mathcal{B}(x))\}$,
        $C_3 = \{x : x \in S \text{ and not } \mathcal{A}(x)\}$.

These have the properties, for all $x \in S$,

(2:2-3) $x \in C_1$ if and only if $x \in A$ and $x \in B$,
        $x \in C_2$ if and only if $x \in A$ or $x \in B$,
        $x \in C_3$ if and only if $x \notin A$.

In fact, corresponding to any sets A and B, whether or not they are
defined by some conditions, we can form the sets $C_1$ and $C_2$ and, relative
to any given set S, the set $C_3$. We define

(2:2-4) $A \cap B = \{x : x \in A \text{ and } x \in B\}$,
        $A \cup B = \{x : x \in A \text{ or } x \in B\}$,
        $\overline{A}^{(S)} = \{x : x \in S \text{ and } x \notin A\}$.

$A \cap B$ is called the intersection or meet of A and B, $A \cup B$ is called the
union or join of A and B, and $\overline{A}^{(S)}$ the complement of A relative to S.
(We must use the restriction to S in the last, since the use of sets $\{x : x \notin A\}$,
which are too large, could lead to paradoxical conclusions.) We shall
assume that S is some fixed but arbitrary set in the following; then we can
write $\overline{A}$ instead of $\overline{A}^{(S)}$.
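The three operations of (2:2-4) correspond closely to the built-in set operations of Python. The sketch below is only an illustration under stated assumptions: the universe S is cut down to the integers from -3 to 7, the set B (the even members of S) is our own choice, and `complement` is a helper name introduced here.

```python
S = set(range(-3, 8))                                   # finite universe: integers -3..7 (assumption)
A = {x for x in S if (x + 1) * (x - 2) * (x - 5) > 0}   # the set A of the example, restricted to S
B = {x for x in S if x % 2 == 0}                        # an illustrative second subset of S

def complement(X, universe):
    """Complement of X relative to the given universe, as in (2:2-4)."""
    return universe - X

meet = A & B                 # intersection
join = A | B                 # union
comp = complement(A, S)      # complement of A relative to S
```

The complement is always taken relative to an explicit universe; Python, like the text, offers no "set of everything not in A".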
The sets formed in (2:2-4) can be visualized as follows:

Figure 2.5. $A \cap B$ is the crosshatched area.

Figure 2.6. Figure 2.7.

Further combinations can be visualized in the same way:

Figure 2.8. $A \cap \overline{B}$ is the crosshatched area.
Figure 2.9. $(A \cap \overline{B}) \cup (B \cap \overline{A})$ is the crosshatched area.

It is seen from Fig. 2.9 that $(A \cap \overline{B}) \cup (B \cap \overline{A})$ is the set of elements
which are in A or in B but not in both; it thus corresponds to the use of
the word “or” in the exclusive sense, in contrast to the nonexclusive use
of “or” in the formation of $A \cup B$. If we define

(2:2-5) $A - B = A \cap \overline{B}$,

we see from Figs. 2.5, 2.6, and 2.9 that

$(A \cap \overline{B}) \cup (B \cap \overline{A}) = (A \cup B) - (A \cap B)$.

If we wish, a proof of this last statement can easily be given from the
definitions (2:2-4) and (2:2-5).
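As a concrete instance of this identity, the following sketch evaluates both sides for one particular choice of sets. The sets S, A, B are our own illustrative choices, and a single instance is of course no substitute for the proof from the definitions.

```python
S = {1, 2, 3, 4, 5}      # illustrative universe
A = {1, 2, 3}
B = {3, 4}

left = (A & (S - B)) | (B & (S - A))   # elements in A but not B, or in B but not A
right = (A | B) - (A & B)              # union minus intersection
builtin = A ^ B                        # Python's symmetric difference operator
```

All three values are the same set, the "exclusive or" of A and B.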
The following diagrams represent combinations involving three sets
A, B, C:

Figure 2.10. $(A \cap B) \cap C$ is the crosshatched area.

It is seen from this diagram that $(A \cap B) \cap C = A \cap (B \cap C)$.

Figure 2.11. $A \cap (B \cup C)$ is the crosshatched area.
Figure 2.12. $A \cup (B \cap C)$ is the crosshatched area.

Then we can verify from the diagrams that

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
and
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

The student should draw diagrams to pick out other combinations, such
as $A \cap (B \cap C)$, $(A \cup B) \cap C$, $A \cup (B \cap C)$, $(A \cap B) \cup (B \cap C)$, etc.
The relation of inclusion, $B \subseteq A$, is diagrammatically represented by

Figure 2.13. B is the crosshatched area.

Thus if $B \subseteq A$, we have
$A \cap B = B$
and
$A \cup B = A$.

Conversely, it appears from Figs. 2.5 and 2.6 that if either of these latter
relationships holds we must have $B \subseteq A$. We further have that for any
subset A of S, $\emptyset \subseteq A$, and $A \subseteq S$.
A relation which is in a sense opposite to that of inclusion holds when
A and B have no elements in common. We say that A and B are disjoint
if $A \cap B = \emptyset$. Under this definition the empty set is disjoint from every
set, including itself. In general, the relation of disjointness can be pictured
as follows:

From Figs. 2.7 and 2.13 we see that if $A \cap B = \emptyset$ then $A \subseteq \overline{B}$ and
$B \subseteq \overline{A}$. Conversely, it appears that if either of these inclusions holds, then
A and B must be disjoint. Three (or more) sets A, B, C are said to be
pairwise disjoint if any two of them are disjoint. This is evidently not
equivalent to the statement that $A \cap B \cap C = \emptyset$, as the following
diagram shows:

Figure 2.15

Basic laws of the algebra of sets. If we look back over Figs. 2.5 through
2.12, we see that in some sense only “typical” relationships between the
sets are shown, in that in no case does the relation of inclusion or
disjointness hold. For this reason an argument using Venn diagrams may be
considered slightly unreliable (just as with arguments by figures in plane or
solid geometry). However, any such argument can be made definitive by
returning to the basic definitions of $\emptyset$, $\subseteq$, $\cap$, $\cup$, and the complement $\overline{\phantom{A}}$. There is also another
possibility (again analogous to geometry), namely, to try to find some basic
properties of these notions from which other properties can be deduced.

The following is just such a list.

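(2:2-6) For any set S and subsets A, B, C of S we have:

(i) $A \cap A = A$ and $A \cup A = A$;
(ii) $A \cap B = B \cap A$ and $A \cup B = B \cup A$;
(iii) $A \cap (B \cap C) = (A \cap B) \cap C$ and $A \cup (B \cup C) = (A \cup B) \cup C$;
(iv) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and
     $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;
(v) $A \cap \emptyset = \emptyset$ and $A \cup S = S$;
(vi) $A \cap S = A$ and $A \cup \emptyset = A$;
(vii) $A \cap \overline{A} = \emptyset$ and $A \cup \overline{A} = S$;
(viii) $\overline{(A \cap B)} = \overline{A} \cup \overline{B}$ and $\overline{(A \cup B)} = \overline{A} \cap \overline{B}$;
(ix) $\overline{\overline{A}} = A$;
(x) $B \subseteq A$ if and only if $A \cap B = B$;
(xi) $B \subseteq A$ if and only if $A \cup B = A$.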
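For a small finite set S, each of the laws in (2:2-6) can be checked mechanically by running through all subsets. The sketch below does this for a three-element S of our own choosing; the names `SUBSETS`, `comp`, and `LAWS` are introduced here for illustration. Such a check is evidence for one particular S only and does not replace a proof from the definitions.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})                 # illustrative universe
EMPTY = frozenset()
SUBSETS = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]

def comp(X):
    """Complement relative to S."""
    return S - X

# Each entry mirrors one line of (2:2-6); & is intersection, | is union, <= is inclusion.
LAWS = {
    "i":    lambda A, B, C: A & A == A and A | A == A,
    "ii":   lambda A, B, C: A & B == B & A and A | B == B | A,
    "iii":  lambda A, B, C: A & (B & C) == (A & B) & C and A | (B | C) == (A | B) | C,
    "iv":   lambda A, B, C: A & (B | C) == (A & B) | (A & C)
                            and A | (B & C) == (A | B) & (A | C),
    "v":    lambda A, B, C: A & EMPTY == EMPTY and A | S == S,
    "vi":   lambda A, B, C: A & S == A and A | EMPTY == A,
    "vii":  lambda A, B, C: A & comp(A) == EMPTY and A | comp(A) == S,
    "viii": lambda A, B, C: comp(A & B) == comp(A) | comp(B)
                            and comp(A | B) == comp(A) & comp(B),
    "ix":   lambda A, B, C: comp(comp(A)) == A,
    "x":    lambda A, B, C: (B <= A) == (A & B == B),
    "xi":   lambda A, B, C: (B <= A) == (A | B == A),
}

results = {name: all(law(A, B, C) for A, B, C in product(SUBSETS, repeat=3))
           for name, law in LAWS.items()}
```

Every entry of `results` comes out true for this S.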

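It is known that the set of statements (i)-(ix) is complete in the sense that
any equation formed using $\emptyset$, $S$, $\cap$, $\cup$, and $\overline{\phantom{A}}$ (relative to S) and any
number of variables A, B, C, . . . which is true for any set S and subsets
A, B, C, . . . of S can be deduced from (i)-(ix). As an example, consider
the statement following (2:2-5),

(2:2-7) $(A \cap \overline{B}) \cup (B \cap \overline{A}) = (A \cup B) \cap \overline{(A \cap B)}$,

which we obtained by a diagram argument. This can be deduced as follows:

$(A \cup B) \cap \overline{(A \cap B)}$
  $= (A \cup B) \cap (\overline{A} \cup \overline{B})$   by (viii)
  $= [(A \cup B) \cap \overline{A}] \cup [(A \cup B) \cap \overline{B}]$   by (iv)
  $= [\overline{A} \cap (A \cup B)] \cup [\overline{B} \cap (A \cup B)]$   by (ii)
  $= [(\overline{A} \cap A) \cup (\overline{A} \cap B)] \cup [(\overline{B} \cap A) \cup (\overline{B} \cap B)]$   by (iv)
  $= [\emptyset \cup (\overline{A} \cap B)] \cup [(\overline{B} \cap A) \cup \emptyset]$   by (vii)
  $= [(\overline{A} \cap B) \cup \emptyset] \cup [(\overline{B} \cap A) \cup \emptyset]$   by (ii)
  $= (\overline{A} \cap B) \cup (\overline{B} \cap A)$   by (vi)
  $= (A \cap \overline{B}) \cup (B \cap \overline{A})$   by (ii)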
After a little practice it is very easy to develop skill in such deductions
and to combine several applications of the basic laws into single steps.

It follows from (2:2-7) that

(2:2-8) $A = B$ if and only if $A \cap \overline{B} = \emptyset$ and $B \cap \overline{A} = \emptyset$.

For clearly, the right side of this equivalence follows from the left side.
Now suppose the right side is true. Let $C = (A \cup B) \cap \overline{(A \cap B)}$.
According to (2:2-7), $C = (A \cap \overline{B}) \cup (B \cap \overline{A}) = \emptyset \cup \emptyset = \emptyset$. Hence
$A \cup C = A$ by (2:2-6)(vi). But also,

$A \cup C = [A \cup (A \cup B)] \cap [A \cup \overline{(A \cap B)}]$   by (iv)
  $= [A \cup B] \cap [A \cup (\overline{A} \cup \overline{B})]$   by (iii), (i), (viii)
  $= [A \cup B] \cap [S \cup \overline{B}]$   by (iii), (vii)
  $= [A \cup B] \cap S$   by (v)
  $= A \cup B$   by (vi)

By a similar argument

$B \cup C = A \cup B$.

Hence $A \cup C = B \cup C$, which, since $C = \emptyset$, shows us that $A = B$.

So far we have not used conditions (2:2-6)(x), (xi). These show that
$\subseteq$ can be defined either in terms of $\cap$ or $\cup$. They are not independent,
since we can deduce from (2:2-6)(i)-(ix) that

(2:2-9) $A \cap B = B$ if and only if $A \cup B = A$.

To see this in one direction, suppose $A \cap B = B$. To show that

$A \cup B = A$,

it suffices by (2:2-8) to show that

$(A \cup B) \cap \overline{A} = \emptyset$ and $A \cap \overline{(A \cup B)} = \emptyset$.

The first follows from

$(A \cup B) \cap \overline{A} = (A \cap \overline{A}) \cup (B \cap \overline{A}) = \emptyset \cup (B \cap \overline{A})$
  $= B \cap \overline{A} = (A \cap B) \cap \overline{A} = (A \cap \overline{A}) \cap B = \emptyset \cap B = \emptyset$;

the second follows from

$A \cap \overline{(A \cup B)} = A \cap (\overline{A} \cap \overline{B}) = (A \cap \overline{A}) \cap \overline{B} = \emptyset \cap \overline{B} = \emptyset$.

(Trace the conditions used.) In addition to either (2:2-6)(x) or (2:2-6)(xi),
$\subseteq$ is also conveniently characterized by

(2:2-10) $B \subseteq A$ if and only if $B \cap \overline{A} = \emptyset$.

[Prove this from (2:2-6)(x), using $B = (B \cap A) \cup (B \cap \overline{A})$.]

Then from (2:2-6)(i)-(ix) and either (x), (xi) or (2:2-10) we can deduce
the properties of $\subseteq$, the following of which are the most basic:

(2:2-11) For any set S and subsets A, B, C of S we have

(i) $\emptyset \subseteq A$,
(ii) $A \subseteq S$,
(iii) $A \subseteq A$,
(iv) if $A \subseteq B$ and $B \subseteq A$, then $A = B$,
(v) if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.

We have already verified most of these directly from our basic definition
of $\subseteq$ in the preceding section. Note that (2:2-11)(iv) is just a restatement
of (2:2-8), using (2:2-10).
We have not yet considered the question of the correctness of the
statements in (2:2-6) under our basic definitions. These can be proved using
the condition (2:1-23) of the preceding section, according to which two
sets are equal if they have the same elements. Thus, for example, to verify
the second part of (2:2-6)(iv), suppose $x \in A \cup (B \cap C)$. Then by
(2:2-4), $x \in A$ or $x \in B \cap C$, hence $x \in A$ or ($x \in B$ and $x \in C$). Suppose
$x \in A$; then it is true that $x \in A$ or $x \in B$, i.e., that $x \in A \cup B$, and
similarly it is true that $x \in A \cup C$. Thus in this case we see that
$x \in (A \cup B) \cap (A \cup C)$. On the other hand, if $x \in B$ and $x \in C$, we see
first that $x \in A \cup B$ and also that $x \in A \cup C$, hence again that
$x \in (A \cup B) \cap (A \cup C)$. Thus in either case we obtain
$x \in (A \cup B) \cap (A \cup C)$. In other words, we have shown that if $x \in A \cup (B \cap C)$,
then $x \in (A \cup B) \cap (A \cup C)$. By establishing the converse implication
in a similar way, we would obtain $x \in A \cup (B \cap C)$ if and only if
$x \in (A \cup B) \cap (A \cup C)$; in other words, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
By this procedure we can, with little exercise of imagination
and a great deal of writing, convince ourselves of the truth of each of the
statements in (2:2-6).
The proof of the completeness of (2:2-6), to which we have referred
earlier, is quite another matter. It is, in contrast, a metamathematical
statement, for it is concerned with the possibility of certain deductions
rather than with a mathematical realization of particular deductions. The
formal study of these laws, which goes back to G. Boole, forms the initial
part of what is usually called Boolean algebra. The proof of completeness,
which we shall not give here, can be found in various modern treatments
of Boolean algebra.
The reader has no doubt already recognized a striking resemblance
between some of the statements in (2:2-6) and the laws of ordinary
algebra. Indeed, if we replace the symbols $\cap$, $\cup$, $\emptyset$, $S$ by $\cdot$, $+$, $0$, $1$
respectively, the statements of (2:2-6)(i)-(vi) take the following form:

(i) $A \cdot A = A$ and $A + A = A$;
(ii) $A \cdot B = B \cdot A$ and $A + B = B + A$;
(iii) $A \cdot (B \cdot C) = (A \cdot B) \cdot C$ and
      $A + (B + C) = (A + B) + C$;
(iv) $A \cdot (B + C) = (A \cdot B) + (A \cdot C)$ and
     $A + (B \cdot C) = (A + B) \cdot (A + C)$;
(v) $A \cdot 0 = 0$ and $A + 1 = 1$;
(vi) $A \cdot 1 = A$ and $A + 0 = A$.

Of course, only (i), the second part of (iv), and the second part of (v) are
not met in ordinary algebra. Because of other formal similarities, many
writers often refer to the intersection of A and B as being the product
of A and B (written either $A \cdot B$ or $AB$) and to the union of A and B
as being the sum of A and B (written as $A + B$). Also, $\subseteq$ satisfies many of
the conditions met by the ordinary $\leq$ relationship among numbers; in
particular this is true of (2:2-11)(iii)-(v). However, for any two numbers
a, b we have $a \leq b$ or $b \leq a$; the corresponding statement is not true of sets, i.e.,
there exist sets A, B such that neither $A \subseteq B$ nor $B \subseteq A$. Because of the
many similarities between the laws for sets and those for numbers, the
use of the word “algebra” in dealing with sets is quite appropriate. Further,
various names used to describe ordinary algebraic laws are naturally
extended to the corresponding laws for sets. In particular, (2:2-6)(ii),
(iii), and (iv) are referred to respectively as commutative, associative, and
distributive laws. (2:2-6)(vi) shows that S and $\emptyset$ act as identity elements
for $\cap$ and $\cup$, respectively [cf. (2:2-6)(vi)]. (2:2-11)(iii), (iv), and (v) are
referred to, respectively, as reflexive, antisymmetric, and transitive laws. We
shall have occasion to refer to such statements again in a variety of
algebraic contexts.
Another interesting aspect of the statements in (2:2-6) is their symmetry
or duality. Namely, if we interchange $\cap$ and $\cup$ and interchange $S$ and $\emptyset$
while leaving the complement $\overline{\phantom{A}}$ unchanged, each part of the statements (2:2-6)(i)-(viii)
is converted into the other part. Further, (2:2-6)(ix) is self-dual, since
it involves only the complement. Since any true equation in these symbols can be derived
from (2:2-6)(i)-(ix), the dual equation obtained by such an interchange
must also be derivable. For example, we have shown that

$(A \cap \overline{B}) \cup (B \cap \overline{A}) = (A \cup B) \cap \overline{(A \cap B)}$

is derivable. It follows that

$(A \cup \overline{B}) \cap (B \cup \overline{A}) = (A \cap B) \cup \overline{(A \cup B)}$

must also be derivable, which can be checked independently.
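The passage to the dual equation is a purely mechanical rewriting, and can be sketched as a small program. In the following illustration an expression is represented as a nested tuple such as `("and", "A", ("not", "B"))`; this representation and the names `dual`, `value`, and `valid` are assumptions made here for the example. The check of validity runs over all subsets of one small S, so it illustrates rather than proves the claim.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})                 # illustrative universe
SUBSETS = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]

def dual(e):
    """Interchange 'and'/'or' and 'S'/'0'; leave 'not' and variables unchanged."""
    if isinstance(e, str):
        return {"S": "0", "0": "S"}.get(e, e)
    op, *args = e
    op = {"and": "or", "or": "and"}.get(op, op)
    return (op, *[dual(a) for a in args])

def value(e, env):
    """Evaluate an expression, with variables taken from the dictionary env."""
    if isinstance(e, str):
        if e == "S":
            return S
        if e == "0":
            return frozenset()
        return env[e]
    op, *args = e
    vals = [value(a, env) for a in args]
    if op == "and":
        return vals[0] & vals[1]
    if op == "or":
        return vals[0] | vals[1]
    return S - vals[0]                   # op == "not": complement relative to S

def valid(lhs, rhs):
    """True if lhs = rhs for all subsets A, B of S."""
    return all(value(lhs, {"A": A, "B": B}) == value(rhs, {"A": A, "B": B})
               for A, B in product(SUBSETS, repeat=2))

lhs = ("or", ("and", "A", ("not", "B")), ("and", "B", ("not", "A")))
rhs = ("and", ("or", "A", "B"), ("not", ("and", "A", "B")))
original_ok = valid(lhs, rhs)
dual_ok = valid(dual(lhs), dual(rhs))
```

Both the equation and its dual hold for every pair of subsets of this S.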



Extended intersections and unions. The associative laws $A \cap (B \cap C) = (A \cap B) \cap C$
and $A \cup (B \cup C) = (A \cup B) \cup C$ allow us to use the
expressions $A \cap B \cap C$ and $A \cup B \cup C$ without ambiguity. [In contrast,
the use of parentheses in (2:2-6)(iv) is essential; for in general,
$A \cap (B \cup C) \neq (A \cap B) \cup C$.] Similarly, we could use $A \cap B \cap C \cap D$ to
represent any of the following identical sets: $A \cap (B \cap (C \cap D))$,
$A \cap ((B \cap C) \cap D)$, $(A \cap B) \cap (C \cap D)$, $((A \cap B) \cap C) \cap D$,
$(A \cap (B \cap C)) \cap D$. In this way we can extend the notion of intersection to
any finite collection of sets A, B, C, . . ., X. These can be defined
independently by:

$A \cap B \cap C \cap \cdots \cap X$
  $= \{x : x \in A \text{ and } x \in B \text{ and } x \in C \text{ and } \ldots \text{ and } x \in X\}$
and
$A \cup B \cup C \cup \cdots \cup X$
  $= \{x : x \in A \text{ or } x \in B \text{ or } x \in C \text{ or } \ldots \text{ or } x \in X\}$.

More generally, consider any nonempty collection M of subsets of S,
i.e., if $X \in M$ then $X \subseteq S$. We define

(2:2-12) $\bigcap X\,[X \in M] = \{x : x \in X \text{ for every } X \in M\}$,
         $\bigcup X\,[X \in M] = \{x : x \in X \text{ for some } X \in M\}$.

That this actually generalizes the notions of $\cap$, $\cup$ is seen from

(2:2-13) if $M = \{A, B\}$ then $\bigcap X\,[X \in M] = A \cap B$ and
         $\bigcup X\,[X \in M] = A \cup B$.

A corresponding statement is true for any finite collection M. Many of the
laws in (2:2-6) can be generalized to arbitrary union and intersection.
For example, the distributive laws take the following form:

(2:2-14) $A \cap (\bigcup X\,[X \in M]) = \bigcup (A \cap X)\,[X \in M]$,
         $A \cup (\bigcap X\,[X \in M]) = \bigcap (A \cup X)\,[X \in M]$.

(More precisely, if N is the collection of all sets Y of the form $A \cap X$, for
$X \in M$, we have, for example, $A \cap (\bigcup X\,[X \in M]) = \bigcup Y\,[Y \in N]$.)
Another example is drawn from the statement (2:2-6)(viii), usually
referred to as DeMorgan’s laws. These now take the form

(2:2-15) $\overline{\bigcap X\,[X \in M]} = \bigcup \overline{X}\,[X \in M]$, $\overline{\bigcup X\,[X \in M]} = \bigcap \overline{X}\,[X \in M]$.
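For a finite collection M the extended operations of (2:2-12) are easily computed, and the generalized distributive and DeMorgan laws can be tried out on an example. In the sketch below the universe S, the collection M, and the set A are our own illustrative choices.

```python
from functools import reduce

S = set(range(10))                            # illustrative universe
M = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 8}]   # a nonempty collection of subsets of S
A = {0, 3, 5}

big_union = set().union(*M)                   # x belongs to some X in M
big_meet = reduce(set.intersection, M)        # x belongs to every X in M (M must be nonempty)

# Generalized distributive laws (2:2-14)
dist_1 = A & big_union == set().union(*(A & X for X in M))
dist_2 = A | big_meet == reduce(set.intersection, [A | X for X in M])

# Generalized DeMorgan laws (2:2-15), complements taken relative to S
demorgan_1 = S - big_meet == set().union(*(S - X for X in M))
demorgan_2 = S - big_union == reduce(set.intersection, [S - X for X in M])
```

The requirement that M be nonempty shows up in the code as well: `reduce` without an initial value fails on an empty list, just as an intersection over an empty collection is left undefined in (2:2-12).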



The algebra of sets gives us greater insight into sets and their possible
relationships. It can also be used, to great effect, for the more compact
and precise expression of various statements from set theory. Consider,
for example, the axiom of choice, which we described by a kind of diagram
(Fig. 2.1) in the preceding section. Now it can be expressed as follows:

(2:2-16) Let M be any nonempty collection of sets such that for each
$X \in M$, we have $X \neq \emptyset$, and for each $X, Y \in M$ if $X \neq Y$
then $X \cap Y = \emptyset$; then there exists a set A such that $A \cap X$
contains a single element for each $X \in M$.

Exercise Group 2.2

1. (An algebra of intervals.) Let Re be the set of real numbers. For $a, b \in \mathrm{Re}$
let $[a, b] = \{x : x \in \mathrm{Re} \text{ and } a \leq x \leq b\}$, $[a, b) = \{x : x \in \mathrm{Re} \text{ and } a \leq x < b\}$,
$(a, b] = \{x : x \in \mathrm{Re} \text{ and } a < x \leq b\}$, $(a, b) = \{x : x \in \mathrm{Re} \text{ and } a < x < b\}$.
The sets $[a, b]$, $[a, b)$, $(a, b]$, $(a, b)$ are called (bounded)
intervals. These can be pictured on a line; for example, the interval
$[a, b)$, with $a < b$, is shown as:

[Figure: the interval [a, b) marked on a line between the points a and b.]

(a) In which cases is an interval empty? Which intervals contain exactly
one element?
(b) Compute $[1, 3) \cap [2, 6]$, $[1, 3) \cup [2, 6]$, $[1, 3) - [2, 6]$, $[2, 6] - [1, 3)$.
Diagram the results.
(c) Is the intersection of any two intervals again an interval? Consider
the same question for union, difference. Is the complement of an
interval again an interval? (Prove your statements.)
(d) Suppose that $a < b$ and $c < d$. Give a necessary and sufficient
condition for $[a, b] \subseteq [c, d)$.
(e) Let M be the collection of all intervals $[0, 1/n)$ for n a positive integer.
Find $\bigcup X\,[X \in M]$, $\bigcap X\,[X \in M]$, $\bigcup ([0, 2] - X)\,[X \in M]$.

2. Let S be an arbitrary set and X, Y, Z be subsets of S. For each of the
following, either prove the statement from the basic laws (2:2-6) or show
by means of Venn diagrams that it is not generally true.

(a) (X U Y) = X n Y.
(b) (X u Y) nz = (Xuz) n w
(c) (x u Y) nz = (X uz)nf.
(d) If X c Y and X c Z then X c Y n Z.
(e) (X n Y) U (Y n X) c Y.
(f) If Y = (X n 7) U (Y Cl X) then X = 0.

3. Let $N(X)$ denote the number of elements in X when X is a finite set.
Show that

$N(X \cup Y) = N(X) + N(Y) - N(X \cap Y)$.

Develop a formula for $N(X \cup Y \cup Z)$ in terms of $N(X)$, $N(Y)$, $N(Z)$,
$N(X \cap Y)$, $N(X \cap Z)$, $N(Y \cap Z)$, $N(X \cap Y \cap Z)$.

2.3 Relations and functions. Relations as abstractions from conditions.
In Section 2.1 we eliminated the accidental aspect of treating particular
conditions $\mathcal{A}(x)$, with one free variable, by passing to the associated sets.
Then two conditions $\mathcal{A}(x)$, $\mathcal{B}(x)$ were abstractly identified if the associated
sets were the same, i.e., if $\mathcal{A}(x)$ and $\mathcal{B}(x)$ were equivalent for all values
of x. We shall now show that the same sort of identification can be applied
to conditions with more than one free variable.
Let us begin by considering conditions $\mathcal{A}(x, y)$ which involve two free
variables, such as

(2:3-1) x, y are positive integers and $2x^2 - 3xy + y^2 \geq 0$.

It is not difficult to see (by factoring $2x^2 - 3xy + y^2$ and considering the
different possibilities for the factors) that for any x, y, (2:3-1) is equivalent
to the following condition $\mathcal{B}(x, y)$:

(2:3-2) x, y are positive integers and $y \leq x$ or $2x \leq y$.

This second condition makes it easier to see which numbers x, y are
“solutions” of (2:3-1), in the sense that they make $\mathcal{A}(x, y)$ true. For example,
for x = 1 we have solutions y = 1 and y = 2, 3, . . . ; for x = 2 we have
solutions y = 1, 2 and y = 4, 5, . . . ; for x = 5 we have solutions
y = 1, . . . , 4, 5 and y = 10, 11, 12, . . . ; etc. We cannot possibly list all
solutions x, y, since there are infinitely many of these; but we can imagine
a kind of infinite list which one could look into, to see whether or not a
given pair a, b is a solution. Let us, for the moment, use the notion of a
list in this extended sense. Schematically, such a list could be indicated
as in the table:

x | 1  1  1  . . .  2  2  2  . . .  5  5  5   . . .
y | 1  2  3  . . .  1  2  4  . . .  4  5  10  . . .

Now since (2:3-1) and (2:3-2) are equivalent for all values of x, y, the
list of values associated with condition (2:3-1) is exactly the same as that
associated with condition (2:3-2). In other words, such a list serves the
same purpose with respect to conditions involving two free variables as
does a set for conditions with one free variable. Unfortunately, the notion
of a list carries with it some connotations (such as “can be written down
on paper” and “given in a certain order”) which should be avoided.
Thus it is necessary to carry our abstraction one step further.
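The equivalence of (2:3-1) and (2:3-2) rests on the factorization $2x^2 - 3xy + y^2 = (2x - y)(x - y)$, and it can be spot-checked over a finite window of positive integers. The sketch below reads both inequalities as non-strict, which the listed solutions such as x = 1, y = 1 require; the window size N and the function names are our own choices, and a finite check is not a proof.

```python
N = 60   # size of the finite window to check (an arbitrary choice)

def cond_1(x, y):
    """Condition (2:3-1), for positive integers x, y."""
    return 2 * x * x - 3 * x * y + y * y >= 0

def cond_2(x, y):
    """Condition (2:3-2), for positive integers x, y."""
    return y <= x or 2 * x <= y

agree = all(cond_1(x, y) == cond_2(x, y)
            for x in range(1, N + 1)
            for y in range(1, N + 1))

# A finite piece of the "list" of solutions: those with x = 2 and y below 8.
solutions_x2 = [y for y in range(1, 8) if cond_1(2, y)]
```

The gap at y = 3 in `solutions_x2` corresponds to the pair 2, 3 discussed in the text.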
Suppose we were presented with the condition (2:3-2) in the following
form:

(2:3-3) (_), (. . .) are positive integers and (. . .) $\leq$ (_) or 2(_) $\leq$ (. . .).

Now what does it mean that a given pair of integers satisfies this
condition? If we speak of the pair 1, 2 it evidently doesn’t matter whether
we place 1 for (_) and 2 for (. . .), or if we do just the opposite. On the
other hand, if we speak of the pair 2, 3 we get different results according
as we place 2 for (_) and 3 for (. . .) or conversely, for in the first case the
condition is not satisfied, while in the second it is. Hence the order in which
a given pair of integers a, b is presented and the manner in which these are
to be associated with the free variables (“empty places”) of a condition
must be specified. This leads to the concept of an ordered pair of objects
a, b; we shall denote such by

(2:3-4) (a, b).

Ordered pairs and cartesian products. The ordered pair (a, b) stands in
contrast with the unordered pair {a, b} which we have already discussed.
For though we have $\{a, b\} = \{b, a\}$, it is essential to the concept of ordered
pair that we have $(a, b) \neq (b, a)$, unless $a = b$. More generally, we have

(2:3-5) $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$.

We trust that it is no more difficult for the student to grant the existence
of objects (a, b) with this property than it is to grant the existence of sets;
in other words, we take the idea of ordered pair here as being a primitive
undefined notion. However, it is possible by a slightly sophisticated trick
to define it in terms of more basic notions (compare the first exercise at the
end of this section).
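The contrast between ordered and unordered pairs has a direct counterpart in most programming languages. In the sketch below, Python tuples play the role of ordered pairs and frozensets the role of unordered pairs; this identification, and the helper name `pairs_equal`, are assumptions made for the illustration.

```python
ordered_same = (1, 2) == (2, 1)                          # False: order matters
unordered_same = frozenset({1, 2}) == frozenset({2, 1})  # True: order is ignored

def pairs_equal(p, q):
    """Condition (2:3-5): (a, b) = (c, d) if and only if a = c and b = d."""
    (a, b), (c, d) = p, q
    return a == c and b == d

sample = [(1, 2), (2, 1), (3, 3), (1, 1)]
agrees_with_builtin = all(pairs_equal(p, q) == (p == q)
                          for p in sample for q in sample)
```

Here the tuple is taken as primitive, much as the ordered pair is taken as a primitive notion in the text.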
Having ordered pairs, the next step is easy. Instead of talking about
(possibly infinite) lists, we simply talk about sets of ordered pairs. For
example, associated with the condition (2:3-2) is

(2:3-6) the set of all ordered pairs (x, y) such that x, y are positive
integers and $y \leq x$ or $2x \leq y$.

Among members of this set we find the pairs (1, 1), (1, 2), (1, 3), . . . ,
(2, 1), (2, 2), (2, 4), . . . , (5, 4), (5, 5), (5, 10), . . . ; among nonmembers
we find (2, 3), (3, 4), (3, 5), . . . , also (-1, 2), (t, 1), etc. More generally,
given any condition $\mathcal{A}(x, y)$ it seems that we can associate with it

(2:3-7) the set of all ordered pairs (x, y) such that $\mathcal{A}(x, y)$.

In Section 2.1 we made certain reservations about the unrestricted
formation of sets; presumably, similar considerations should apply here.
It is not clear whether it makes sense to form such “large” sets of ordered
pairs as $\{(X, Y) : X, Y \text{ are sets and } X \subseteq Y\}$ or $\{(X, Y) : X, Y \text{ are sets and } X \in Y\}$.
To avoid the possibility of paradoxes and yet provide sufficient
freedom in forming such sets as desired in (2:3-7), the following
statement is provided in axiomatic set theory:

(2:3-8) for any sets A, B there exists a set C such that for all z, $z \in C$
if and only if for some x, y we have $z = (x, y)$ and $x \in A$ and
$y \in B$.

In other words, C has as members those, and only those, ordered pairs
(x, y) for which $x \in A$ and $y \in B$. This set C is denoted by

(2:3-9) $A \times B$

and is called the cartesian product of A and B (after the philosopher-mathematician
Descartes). It is seen that

(2:3-10) if $z \in A \times B$ then there are unique x, y such that $x \in A$,
$y \in B$, and $z = (x, y)$.

We call x the first term of z and y the second term of z. We can now apply
the principle (2:1-39) of Section 2.1 to see that

(2:3-11) for any sets A, B and condition $\mathcal{A}(x, y)$ there exists a set W such
that for all z, $z \in W$ if and only if $z \in A \times B$ and, for the
unique x, y such that $z = (x, y)$, we have $\mathcal{A}(x, y)$.

This is the set $W = \{z : \text{for some } x \in A,\ y \in B,\ \mathcal{A}(x, y) \text{ and } z = (x, y)\}$
or, as we shall write more economically,

(2:3-12) $W = \{(x, y) : x \in A,\ y \in B \text{ and } \mathcal{A}(x, y)\}$.

In particular, the set of (2:3-6) is denoted by

(2:3-13) $\{(x, y) : x \in P,\ y \in P \text{ and } y \leq x \text{ or } 2x \leq y\}$.

If A, B are finite sets then $A \times B$ is also finite, for we can list
completely all possible combinations (x, y) of elements x of A with elements
y of B. (In case $A = \emptyset$ or $B = \emptyset$ then $A \times B = \emptyset$.) For example, if
$A = \{-2, 0, 5\}$, $B = \{3, 5\}$ then

$A \times B = \{(-2, 3), (-2, 5), (0, 3), (0, 5), (5, 3), (5, 5)\}$;

in this case $A \times B$ has six (distinct) elements. In general, if A is a finite
set with n distinct elements $a_1, \ldots, a_n$ and B is a finite set with m distinct
elements $b_1, \ldots, b_m$ then $A \times B$ has the $n \cdot m$ distinct elements

$(a_1, b_1), \ldots, (a_1, b_m), (a_2, b_1), \ldots, (a_2, b_m), \ldots, (a_n, b_1), \ldots, (a_n, b_m)$.

Figure 2.16   Figure 2.17
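For finite sets the cartesian product can be computed outright. The sketch below reproduces the example with A = {-2, 0, 5} and B = {3, 5}, and then cuts out a relation from the product by a condition, in the manner of (2:3-12). The function name `cartesian` and the particular condition x < y are our own choices for the illustration.

```python
from itertools import product

def cartesian(A, B):
    """The set of all ordered pairs (x, y) with x in A and y in B."""
    return set(product(A, B))

A = {-2, 0, 5}
B = {3, 5}
AxB = cartesian(A, B)

# A subset of A x B singled out by a condition on x and y, as in (2:3-12).
W = {(x, y) for (x, y) in AxB if x < y}
```

The count of pairs is the product of the counts of A and B, and the product with an empty set is empty.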

A simple geometrical interpretation is given in Fig. 2.16. The points
indicated by the dots correspond to the elements of the sets A, B, while
those indicated by the crosses correspond to the elements of $A \times B$. For
example, the second point directly above $a_3$ corresponds to the element
$(a_3, b_2)$. The same sort of geometrical interpretation can be visualized for
infinite sets, for example for $P \times P$. Every subset of $P \times P$ then
corresponds to a certain subset of the set of intersections of the vertical and
horizontal lines. This is indicated in Fig. 2.17 for the set $\{(x, y) : x \in P,\ y \in P \text{ and } 2x < y + 1 \text{ or } y < x\}$.

Domain, range, and converse. In general, if A, B are sets and
$W \subseteq A \times B$ then W is said to be a relation between elements of A and elements
of B. If $(a, b) \in W$ we say that the relation holds between a and b; in
some cases this is also written $aWb$ or $Wab$. For example, if
$W = \{(x, y) : x \in P,\ y \in P, \text{ and } x < y\}$, then we usually refer to W as the less-than
relation; we have here the choice of writing $(a, b) \in W$ or $a < b$, and it
would not be out of the way to use the symbol < to denote the relation
itself and write $(a, b) \in {<}$. Most often we are concerned with relations
between elements of the same set, i.e., subsets of a set $A \times A$, in which
case we say that the relation holds between elements of A. Actually,
there would be no loss of generality if we restricted ourselves to such cases,
for if C is the union of A and B, then $A \times B \subseteq C \times C$. This is just a
special instance of the following:

(2:3-14) if $A \subseteq A'$ and $B \subseteq B'$ then $A \times B \subseteq A' \times B'$.

Hence, we cannot say of a relation W that there are unique sets A, B
between whose elements it holds. On the other hand, there are unique
“smallest” sets for which this is true. We define, for any set W (whether
or not it contains ordered pairs),

(2:3-15) the domain of W, in symbols $\mathcal{D}(W)$, is the set
$\{x : \text{for some } y,\ (x, y) \in W\}$

and

(2:3-16) the range of W, in symbols $\mathcal{R}(W)$, is the set
$\{y : \text{for some } x,\ (x, y) \in W\}$.

It may be that a set W contains no ordered pairs or contains some
elements which are not ordered pairs. However, we have

(2:3-17) if W is a set of ordered pairs then $W \subseteq \mathcal{D}(W) \times \mathcal{R}(W)$, and
hence W is a relation.

Further, we have

(2:3-18) for any sets A, B, if $W \subseteq A \times B$ then $\mathcal{D}(W) \subseteq A$ and $\mathcal{R}(W) \subseteq B$.

These properties are easily verified.


In some cases a relation is described by giving its domain and range
outright. For example, if Pt is the set of points of a plane and L is the set
of lines of a plane, the relation of incidence is the relation
$\{(x, y) : x \in Pt,\ y \in L, \text{ and } x \text{ lies on } y\}$ with domain Pt and range L. More often the
domain and range are not explicitly given in some definition of a relation
but must be deduced from it. For example, the relation
$W = \{(x, y) : x \in I,\ y \in I, \text{ and } x^2 + 4y^2 \leq 16\}$ is seen to have the domain
$\{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$ and range $\{-2, -1, 0, 1, 2\}$. Of course, a relation
does not in general hold between all elements of its domain and of its range.
There are many pairs in the preceding example which do not belong to W
but which do belong to $\mathcal{D}(W) \times \mathcal{R}(W)$. A way to geometrically visualize
the domain and range of a relation in $A \times B$ is given in Fig. 2.18. The
singlehatched area corresponds to $A \times B$, the crosshatched area to W,
and the heavy lines to the domain and range of W. The rectangular area
bounded by the two pairs of dashed lines corresponds to $\mathcal{D}(W) \times \mathcal{R}(W)$.

Figure 2.18
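The domain and range of a finite relation can be computed directly from the definitions (2:3-15) and (2:3-16). The sketch below does this for the example with $x^2 + 4y^2 \leq 16$, reading the inequality as non-strict, which the stated domain and range require. The function names `domain` and `range_of` and the search window from -10 to 10 are our own choices; the window is large enough because any solution must have x between -4 and 4.

```python
def domain(W):
    """All x such that (x, y) belongs to W for some y; non-pairs in W are ignored."""
    return {p[0] for p in W if isinstance(p, tuple) and len(p) == 2}

def range_of(W):
    """All y such that (x, y) belongs to W for some x; non-pairs in W are ignored."""
    return {p[1] for p in W if isinstance(p, tuple) and len(p) == 2}

window = range(-10, 11)     # finite search window for integer solutions (assumption)
W = {(x, y) for x in window for y in window if x * x + 4 * y * y <= 16}

D = domain(W)
R = range_of(W)
box = {(x, y) for x in D for y in R}    # the product of the domain and the range
```

The pair (4, 2) lies in `box` but not in W, which illustrates that a relation need not hold between all elements of its domain and of its range.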

Since relations are just special kinds of sets, it follows that the
condition (2:1-23) for the identity of two sets can be applied equally well to
relations. However, every element of a relation is an ordered pair, so we
can replace the condition in this case by the following more special one:

(2:3-19) if U, W are relations then U = W if and only if, for all x, y,
(x, y) ∈ U if and only if (x, y) ∈ W.

In particular, if we consider relations defined by certain conditions, we
have

(2:3-20) {(x, y): x ∈ A, y ∈ B, and 𝒜(x, y)} = {(x, y): x ∈ A, y ∈ B,
and ℬ(x, y)} if and only if for all x ∈ A and y ∈ B, 𝒜(x, y)
is equivalent to ℬ(x, y).

In this respect relations play a role for conditions with two free variables
which is completely analogous to the role played by sets for conditions with
one free variable; they serve to identify equivalent conditions. We must,
however, be cautious about one point in the analogy. Whereas there is
at most one set associated with each condition 𝒜(x), there are in general
two relations associated with conditions 𝒜(x, y). To see this, return to
the form (2:3-3) in which we expressed a certain condition using symbols
_ and . . . instead of variables x and y. In this form there is no reason to
prefer one symbol to vary over the domain of the relation and the other
to vary over the range. Associated with the given condition are two
relations W and W̆, one consisting of all pairs (_, . . .) satisfying the
condition, while the other consists of all pairs (. . ., _) satisfying the
condition. We can say that the relations W, W̆ are connected in the
following way:

(2:3-21) for all x, y, (x, y) ∈ W̆ if and only if (y, x) ∈ W.

In such a case W̆ is said to be the converse of W; hence also W is the
converse of W̆. Consider, for example, the relation W = {(1, 1), (2, 1), (3, 2)}
with 𝒟(W) = {1, 2, 3}, ℛ(W) = {1, 2}. The converse of W is W̆ =
{(1, 1), (1, 2), (2, 3)} with 𝒟(W̆) = {1, 2}, ℛ(W̆) = {1, 2, 3}.
Geometrically, the two relations are compared in the following figure.

[Figure 2.19]

Although a relation W and its converse W̆ are in general distinct, it is
possible to deduce from any property of W a corresponding property of W̆
by means of the equivalence (2:3-21). Hence in a discussion of the set of
solutions of a condition it makes little difference which of the two associated
relations one considers. The important thing is to make clear in advance
which is being studied.
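
The passage from a relation to its converse is a one-line computation. A minimal sketch, assuming relations are stored as Python sets of pairs (the function name converse is ours), applied to the three-element example above:

```python
def converse(w):
    """Return the converse of w: (x, y) is in it exactly when (y, x) is in w."""
    return {(y, x) for (x, y) in w}


W = {(1, 1), (2, 1), (3, 2)}
W_conv = converse(W)

print(sorted(W_conv))                  # the pairs of the converse
print({x for (x, _) in W_conv})        # its domain, which is the range of W
print({y for (_, y) in W_conv})        # its range, which is the domain of W
print(converse(W_conv) == W)           # W is in turn the converse of its converse
```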

Ternary (etc.) relations. The step to the treatment of conditions with
more than two free variables is now clear. For conditions with three
free variables we should use ordered triples

(2:3-22) (a, b, c);

these should have the basic property

(2:3-23) (a, b, c) = (e,f, g) if and only if a = e, b = f, and c = g.

It turns out in this case that we do not need to take this as a new primitive
notion so long as all we demand of this notion is that it fulfill (2:3-23). If
we define

(2:3-24) (a, b, c) = ((a, b), c),

then we can deduce (2:3-23) from the basic property (2:3-5) of ordered
pairs. This also leads us to define

(2:3-25) A X B X C = (A X B) X C,

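so that A × B × C is the set of all triples (x, y, z), in the sense of (2:3-24),
such that x ∈ A, y ∈ B, and z ∈ C. For example, for A = {−2, 0, 5},
B = {3, 5} and C = {0, 3}, A × B × C consists of the twelve triples
(−2, 3, 0), (−2, 3, 3), (−2, 5, 0), (−2, 5, 3), (0, 3, 0), (0, 3, 3),
(0, 5, 0), (0, 5, 3), (5, 3, 0), (5, 3, 3), (5, 5, 0), (5, 5, 3).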

This is not the same as the set A × (B × C). For example, (−2, 3, 0) =
((−2, 3), 0) by definition, which is distinct from (−2, (3, 0)). However,
there is a clear one-to-one correspondence between the elements of the two
sets. It is now easily seen how one would define the notion of ordered
quadruple (a, b, c, d) and the product A × B × C × D, and so on, for
larger numbers of factors, and in this way see how to treat conditions with
arbitrarily many free variables. We thus have for any specified positive
integer n a notion of ordered n-tuple, which agrees with that of ordered
pair for n = 2 and of ordered triple for n = 3.
We have defined a relation as being a subset of A × B for some A, B or,
equivalently, by (2:3-17), as being any set of ordered pairs. (Then ∅ is a
relation, since ∅ ⊆ A × B for any A, B.) Under our definition (2:3-24),
every ordered triple (a, b, c) is at the same time an ordered pair, although,
of course, the converse is not true. Thus every set W of ordered triples is
a set of ordered pairs ((a, b), c) and hence is a relation. It is, however, a
relation of a more special kind, which we call a ternary relation. (More
generally, using the notion of ordered n-tuple, we could single out for
any specified positive integer n, the n-ary relations.) We could, if we wished,
refer to an arbitrary relation as being a binary relation, but this only
serves to re-emphasize the fact that it is a set of ordered pairs.
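
The difference between (A × B) × C and A × (B × C), and the correspondence between them, can be made concrete. The sketch below assumes Python tuples play the role of ordered pairs; the names triple, AB_C and A_BC are ours.

```python
from itertools import product


def triple(a, b, c):
    """Ordered triple defined as the ordered pair ((a, b), c), as in (2:3-24)."""
    return ((a, b), c)


A, B, C = {-2, 0, 5}, {3, 5}, {0, 3}

AB_C = set(product(product(A, B), C))   # (A x B) x C, pairs ((x, y), z)
A_BC = set(product(A, product(B, C)))   # A x (B x C), pairs (x, (y, z))

print(len(AB_C), len(A_BC))             # both have 3 * 2 * 2 elements
print(triple(-2, 3, 0) in AB_C)         # a triple in the sense of (2:3-24)
print(triple(-2, 3, 0) in A_BC)         # but not an element of A x (B x C)

# The one-to-one correspondence ((x, y), z) <-> (x, (y, z)).
regrouped = {(x, (y, z)) for ((x, y), z) in AB_C}
print(regrouped == A_BC)
```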
A nonmathematical example of a ternary relation is provided by the
set W = {(x, y, z): x, y, z are people and z is a son of x and y}. It is seen
that there are many a, b for which there is no c with (a, b, c) ∈ W; for
example a, b may not be married or may be married but have no son. On
the other hand, every human male c is the son of some a, b, so that
ℛ(W) = the set of human males. W also has the property that if
(a, b, c) ∈ W then (b, a, c) ∈ W; it does not have the property that if
(a, b, c) ∈ W and (a, b, c′) ∈ W then c = c′. A mathematical example of
a ternary relation is provided by the set W′ = {(x, y, z): x and y are odd
prime numbers and z = x + y}. Let O be the set of odd prime numbers
3, 5, 7, 11, . . . , and let E₆ be the set of even numbers z ≥ 6. Then
𝒟(W′) = O × O and ℛ(W′) ⊆ E₆; it is a famous open question
(Goldbach's problem) whether ℛ(W′) = E₆. W′ has the property that if
(a, b, c) ∈ W′ then (b, a, c) ∈ W′; it also has the property that if
(a, b, c) ∈ W′ and (a, b, c′) ∈ W′ then c = c′.
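
A finite portion of the relation W′ can be listed explicitly. The sketch below assumes a cutoff LIMIT of our own choosing and stores triples as flat 3-tuples for readability; it checks only the even numbers up to that cutoff and so says nothing about Goldbach's problem in general.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


LIMIT = 200  # assumption: an arbitrary small cutoff
odd_primes = [p for p in range(3, LIMIT) if is_prime(p)]

# All (x, y, z) with x, y odd primes and z = x + y not exceeding the cutoff.
W_prime = {(x, y, x + y) for x in odd_primes for y in odd_primes
           if x + y <= LIMIT}

range_W = {z for (_, _, z) in W_prime}
E6 = set(range(6, LIMIT + 1, 2))

print(range_W <= E6)     # every sum of two odd primes is an even number >= 6
print(range_W == E6)     # whether all of E6 up to the cutoff is attained
print(all((y, x, z) in W_prime for (x, y, z) in W_prime))  # symmetry in x, y
```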

Operations on relations; composition. In mathematics we are often
concerned with various binary relations in a set S (between elements of S),
i.e., with subsets of S × S. There is an algebra attached to such relations
analogous to the algebra of subsets of an arbitrary set. It makes sense to
ask of any two relations W, W′ in S whether W ⊆ W′. For example,
{(x, y): x, y ∈ I and x < y} ⊆ {(x, y): x, y ∈ I and x ≤ y}, but the ⊆
does not hold in the reverse direction. Under ⊆, ∅ is the smallest relation
in S and S × S is the largest. If W, W′ are relations in S then W ∩ W′,
W ∪ W′, and the complement W̄ (= (S × S) − W) are again relations in S.
For example,

{(x, y): x, y ∈ I and x < y} ∪ {(x, y): x, y ∈ I and x = y}
= {(x, y): x, y ∈ I and x ≤ y}

and (in I × I) the complement of {(x, y): x, y ∈ I and x ≤ y} is
{(x, y): x, y ∈ I and y < x}.

Now the operations ∩, ∪, ¯ correspond to the use of the words “and,”
“or,” and “not” as applied to defining conditions of sets (2:2-2). The words
“for every” and “for some” have also been used in connection with
operations on sets, namely ⋂ and ⋃ (2:2-12), but in this case the variables
were sets, not elements. An operation on relations which uses the words
“for some” attached to elements is that of forming the domain, (2:3-15).
However, 𝒟(W) is not, in general, a relation, even if W is. It has been
found that the following is an appropriate and useful operation on relations,
to lead again to relations.

(2:3-26) W; W′ = {(x, y): for some z, (x, z) ∈ W and (z, y) ∈ W′}.

W; W′ is called the composition of W and W′. (Some writers use the
symbol W ∘ W′ for this.) For example, if W = {(x, y): x is a son of y}
and W′ = {(x, y): x is a child of y} then W; W′ = {(x, y): x is a grandson
of y}. If W = {(x, y): x, y ∈ I and x < y} then W; W = {(x, y): x, y ∈ I
and x + 1 < y}.
A similar operation on relations using the words “for all” can also be
defined, but we would find no use for it here.
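
Composition as defined in (2:3-26) can be computed directly for finite relations. A minimal sketch, assuming a finite window of integers in place of I; the function name compose and the people named in the second example are hypothetical.

```python
def compose(w, w2):
    """W; W': all (x, y) with (x, z) in w and (z, y) in w2 for some z."""
    return {(x, y) for (x, z) in w for (z2, y) in w2 if z == z2}


window = range(0, 8)  # assumption: finite stand-in for I
less = {(x, y) for x in window for y in window if x < y}

less_less = compose(less, less)
expected = {(x, y) for x in window for y in window if x + 1 < y}
print(less_less == expected)   # W; W is the relation x + 1 < y

# The family example, with made-up names: Carl is a son of Bob,
# and Bob is a child of Ann, so Carl is a grandson of Ann.
son_of = {("Carl", "Bob")}
child_of = {("Bob", "Ann")}
print(compose(son_of, child_of))
```

The order of the arguments matters: compose(child_of, son_of) is empty in this example.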

Special kinds of relations. The following are some interesting
mathematical relations in the set I of integers: the identity relation
{(x, y): x, y ∈ I and x = y}; the less-than relation {(x, y): x, y ∈ I and
x < y}; the less-than-or-equal-to relation {(x, y): x, y ∈ I and x ≤ y}; the
divisibility relation {(x, y): x, y ∈ I and x is a divisor of y}. We also have
a relation between subsets of I, the inclusion relation {(X, Y): X ⊆ I,
Y ⊆ I, and X ⊆ Y}. These have various characteristic properties, which
we now describe for an arbitrary relation W in a set S.

(2:3-27) W is said to be reflexive (in S) if for all a ∈ S, (a, a) ∈ W;
W is said to be irreflexive (in S) if for all a ∈ S, (a, a) ∉ W;
W is said to be symmetric if whenever (a, b) ∈ W then (b, a) ∈ W;
W is said to be antisymmetric if whenever (a, b) ∈ W and
(b, a) ∈ W then a = b;
W is said to be transitive if whenever (a, b) ∈ W and (b, c) ∈ W
then (a, c) ∈ W.

If no set S is specified, we assume S = 𝒟(W) ∪ ℛ(W). For the relations
in the integers described above we have that the identity relation is
reflexive, symmetric, and transitive (it is also antisymmetric); the
less-than-or-equal-to relation is reflexive, antisymmetric, and transitive
(but not symmetric). The student should also classify the other relations
with respect to these properties.
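
For a finite relation each property in (2:3-27) is a finite check. The sketch below assumes a small window of integers in place of I, so it illustrates the definitions rather than proving anything about I itself; the function names are ours. Only the two relations classified in the text are tested, leaving the others to the reader.

```python
def is_reflexive(w, s):
    return all((a, a) in w for a in s)


def is_irreflexive(w, s):
    return all((a, a) not in w for a in s)


def is_symmetric(w):
    return all((b, a) in w for (a, b) in w)


def is_antisymmetric(w):
    return all(a == b for (a, b) in w if (b, a) in w)


def is_transitive(w):
    return all((a, d) in w for (a, b) in w for (c, d) in w if b == c)


S = range(-5, 6)  # assumption: finite stand-in for I
identity = {(x, x) for x in S}
leq = {(x, y) for x in S for y in S if x <= y}

for name, w in [("identity", identity), ("less-than-or-equal", leq)]:
    print(name, is_reflexive(w, S), is_irreflexive(w, S), is_symmetric(w),
          is_antisymmetric(w), is_transitive(w))
```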

Equivalence relations and partitions. We now define:

(2:3-28) W is said to be an equivalence relation (in S) if W is reflexive
(in S), symmetric, and transitive.

Equivalence relations are very much like the identity relation. Consider
the following two relations: W = {(x, y): x, y ∈ I and x − y is a multiple
of 3}, W′ = {(x, y): x, y ∈ Re and x − y ∈ Ra}. The first of these is
an equivalence relation in the integers, the second in the real numbers.
If we write a ≡ b instead of (a, b) ∈ W, we have

. . . ≡ −6 ≡ −3 ≡ 0 ≡ 3 ≡ 6 ≡ 9 ≡ . . .
. . . ≡ −5 ≡ −2 ≡ 1 ≡ 4 ≡ 7 ≡ 10 ≡ . . .
. . . ≡ −4 ≡ −1 ≡ 2 ≡ 5 ≡ 8 ≡ 11 ≡ . . . ,

where, by transitivity, the relation ≡ holds between any two elements
in the same row, while, on the other hand, it never holds between two
elements from different rows. Thus if we put X₀ = {. . . , −6, −3, 0, 3, 6,
9, . . .}, X₁ = {. . . , −5, −2, 1, 4, 7, 10, . . .}, X₂ = {. . . , −4, −1, 2, 5,
8, 11, . . .}, we see that X₀, X₁, X₂ are pairwise disjoint sets such that
every integer is in one of the sets, hence X₀ ∪ X₁ ∪ X₂ = I, and such
that for any a, b, a ≡ b if and only if a, b belong to the same set Xᵢ.
Moreover, if a is any element of I, the set W_a = {x: x ∈ I and x ≡ a} is
one of the sets X₀, X₁, X₂; hence for any a, b either W_a = W_b or
W_a ∩ W_b = ∅. Clearly, a ≡ b if and only if W_a = W_b. Much the same
situation holds with the relation W′, except in this case we cannot
conveniently list all the associated sets. However, we can describe the
process in general terms as follows:

(2:3-29) Suppose that W is an equivalence relation (in S). Let W_a =
{x: x ∈ S and (x, a) ∈ W} for each a ∈ S. Let M be the
collection of all sets W_a for a ∈ S. Then if X, Y ∈ M either
X = Y or X ∩ Y = ∅; further, ⋃X[X ∈ M] = S. Finally,
(a, b) ∈ W holds if and only if there is an X ∈ M such that
a, b ∈ X, and also if and only if W_a = W_b.

The sets in M are called the equivalence sets associated with W. One often
writes [a] instead of W_a when working with some fixed relation W. (2:3-29)
shows that the equivalence relation W in S corresponds directly to the
identity relation in M.
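
The construction of the equivalence sets in (2:3-29) can be carried out literally on a finite set. A minimal sketch, assuming a finite stretch of integers in place of I; the name equivalence_sets is ours.

```python
def equivalence_sets(w, s):
    """The collection M of all sets W_a = {x in s : (x, a) in w}, a in s."""
    return {frozenset(x for x in s if (x, a) in w) for a in s}


S = range(-6, 12)  # assumption: finite stand-in for I
W = {(x, y) for x in S for y in S if (x - y) % 3 == 0}

M = equivalence_sets(W, S)
for block in sorted(M, key=min):
    print(sorted(block))

# Distinct members of M are disjoint, and together they exhaust S.
print(all(X == Y or not (X & Y) for X in M for Y in M))
print(set().union(*M) == set(S))
```

Using frozenset makes equal sets W_a and W_b collapse to a single member of M, which is the content of the last clause of (2:3-29).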

(2:3-30) A collection M of subsets of S is called a partition of S
if, first, for any X ∈ M we have X ≠ ∅ and, second, for
any X, Y ∈ M either X = Y or X ∩ Y = ∅ and, finally,
⋃X[X ∈ M] = S.

Then we have:

(2:3-31) Suppose that M is a partition of S. Define a relation W in S
by: (a, b) ∈ W if and only if there is an X ∈ M such that
a, b ∈ X. Then W is an equivalence relation in S whose
associated equivalence sets are just the members of M.

We leave it to the student to verify this. (2:3-29) and (2:3-31) show that
we have a direct correspondence between equivalence relations and
partitions.
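
The passage from a partition back to a relation, as in (2:3-31), can be tried on a small case. The sketch below assumes a hand-picked partition of {0, . . . , 8}; it checks one instance and is not a substitute for the verification left to the student.

```python
def relation_from_partition(m):
    """(a, b) is in the relation exactly when some block of m contains both."""
    return {(a, b) for block in m for a in block for b in block}


M = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}]  # assumption: a sample partition
S = set().union(*M)
W = relation_from_partition(M)

reflexive = all((a, a) in W for a in S)
symmetric = all((b, a) in W for (a, b) in W)
transitive = all((a, d) in W for (a, b) in W for (c, d) in W if b == c)
print(reflexive, symmetric, transitive)

# Recover the equivalence sets of W and compare with the blocks of M.
classes = {frozenset(x for x in S if (x, a) in W) for a in S}
print(classes == {frozenset(block) for block in M})
```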
The identity relation has other interesting mathematical properties. For
example, if a, b, c are integers and a = b then a + c = b + c and
a · c = b · c. To what extent are these properties shared by other
equivalence relations? For example, if ≡ is the relation defined above, so
that a ≡ b if and only if there is a u such that u ∈ I, with a − b = 3 · u,
we see that a ≡ b implies a + c ≡ b + c [compute (a + c) − (b + c)]
and a · c ≡ b · c [compute (a · c) − (b · c)]. For the relation W′ between
reals, which we write now ≡′, so that a ≡′ b if and only if a − b ∈ Ra,
we have again a ≡′ b implies a + c ≡′ b + c, but we cannot in general
infer that a · c ≡′ b · c (for 1 ≡′ 0 but 1 · √2 ≢′ 0 · √2). Equivalence
relations which do have such additional algebraic properties will prove
to be very useful in our development.
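
Both observations can be checked on small cases. The sketch below makes two assumptions of ours: a finite window of integers stands in for I, and real numbers of the form p + q√2 with rational p, q are stored as pairs (p, q), such a number being rational exactly when q = 0.

```python
from fractions import Fraction

window = range(-6, 7)


def equiv3(a, b):
    """a is equivalent to b when a - b is a multiple of 3."""
    return (a - b) % 3 == 0


add_ok = all(equiv3(a + c, b + c)
             for a in window for b in window if equiv3(a, b) for c in window)
mul_ok = all(equiv3(a * c, b * c)
             for a in window for b in window if equiv3(a, b) for c in window)
print(add_ok, mul_ok)


# Numbers p + q*sqrt(2), stored as (p, q) with rational p and q.
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])


def mul(u, v):
    return (u[0] * v[0] + 2 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def equiv_ra(u, v):
    """u is equivalent to v when u - v is rational, i.e. has no sqrt(2) part."""
    return sub(u, v)[1] == 0


one = (Fraction(1), Fraction(0))
zero = (Fraction(0), Fraction(0))
root2 = (Fraction(0), Fraction(1))

print(equiv_ra(one, zero))                          # 1 and 0 are equivalent
print(equiv_ra(mul(one, root2), mul(zero, root2)))  # their multiples by sqrt(2)
```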

Functions. We turn now to the study of another very important class
of relations, the functions. The definition of these given in set theory is
intended to make certain intuitive notions precise. One of these intuitive
concepts has a physical source, namely that one physical quantity is
strictly determined once other related quantities are fixed. For example,
the distance y which a dropped body falls during a given period of time x
depends (in the simplest physical analysis) only on that period of time, and
does not depend, say, on the mass of the body. It was natural for physicists
to hope that such relationships could be characterized exactly by
mathematical “laws.” This is related to the mathematical notion of a
calculation procedure which associates with every value of some quantity x
another strictly determined quantity y. Thus for the above physical
situation, experiment suggests that y = 16x², when x represents seconds
and y feet. We can easily conceive, though, of such situations of regularity
in nature for which we can find no “law” which will accurately reflect
the relationship between the quantities. This general notion of a
determinate or functional relationship which is not necessarily tied to any
particular way of expressing it can be explained precisely in the language
of relations as follows:

(2:3-32) a relation F is said to be a function if for each x ∈ 𝒟(F) there
is a unique y with (x, y) ∈ F; this unique y is denoted by F(x).

Since whenever F is a relation and x ∈ 𝒟(F) there is at least one y with
(x, y) ∈ F, the property of being a function can be re-expressed:

(2:3-33) a relation F is a function if and only if for any x, y₁, and y₂,
(x, y₁) ∈ F and (x, y₂) ∈ F implies y₁ = y₂.

The notion of a function as provided by a law is contained in the above
notion by means of the construction of relations from conditions 𝒜(x, y).
Thus, for example, the function associated with the relationship y = x³ − 1
between real numbers is defined by

(2:3-34)(a) F = {(x, y): x, y ∈ Re and y = x³ − 1},

or, equivalently, by

(2:3-34)(b) F = {(x, x³ − 1): x ∈ Re}

(since whenever x ∈ Re, also x³ − 1 ∈ Re). In greater accordance with
usual practice, we could also write

(2:3-34)(c) F is the function with domain Re such that for each x ∈ Re,
F(x) = x³ − 1.

It should be noted that this particular function F is distinct from the
function G = {(x, y): x, y ∈ I and y = x³ − 1}. As relations, we have
G ⊆ F, since whenever (x, y) ∈ G, also (x, y) ∈ F; but G ≠ F, since
𝒟(G) = I and 𝒟(F) = Re. Thus the concept (2:3-32) of function is
sharper than the vague concept of law. This is as it should be, for F and G
have many different properties. For example, for each y ∈ Re there is
an x ∈ 𝒟(F) with F(x) = y; in other words ℛ(F) = Re. But it is not
true that for each y ∈ I there is an x ∈ 𝒟(G) with G(x) = y (5 is not
x³ − 1 for any x ∈ I).
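
Criterion (2:3-33) translates into a short test on finite relations. A minimal sketch, assuming a finite window of integers in place of I; the name is_function and the second relation (integer points on a circle of radius 5) are our own illustrations.

```python
def is_function(rel):
    """True when no x is paired with two different y, as in (2:3-33)."""
    seen = {}
    for x, y in rel:
        if seen.setdefault(x, y) != y:
            return False
    return True


window = range(-5, 6)  # assumption: finite stand-in for I

# A finite part of G = {(x, y): x, y in I and y = x^3 - 1}.
G = {(x, x ** 3 - 1) for x in window}

# Integer points on the circle x^2 + y^2 = 25: a relation, not a function.
circle = {(x, y) for x in window for y in window if x * x + y * y == 25}

print(is_function(G))
print(is_function(circle))              # (3, 4) and (3, -4) both belong
print(5 in {y for (_, y) in G})         # 5 is not a value of G on the window
```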

Even if one is primarily interested in functions defined by particular
conditions, the above general notion of function is of great usefulness.
For example, we may obtain some result about all continuous functions;
then whenever we recognize that a particular condition defines a
function of this type we know immediately that the result applies. In this
respect, arbitrary functions stand to particular laws as variables stand
to constants.
The graphical interpretation of relations is very naturally applied to
functions. Usually, we have to deal with a function whose domain and
range are contained in some preassigned set, for example the set Re of
real numbers. Then the function is a subset of Re × Re. We picture
certain reference sets in Re × Re, the “x-axis” consisting of all pairs
(x, 0) for x ∈ Re and the “y-axis” consisting of all pairs (0, y) for y ∈ Re;
these intersect in (0, 0). (Thus a point known to be on the x-axis can be
uniquely labeled by the number x.) The graph of a function in Re × Re
might then look as follows.

[Figure 2.20]

The condition for a relation F to be a function is simply interpreted by
the statement that no point in the graph of F lies directly above another
point of the graph, for otherwise we would have (x, y₁) ∈ F, (x, y₂) ∈ F
with y₁ ≠ y₂. On the other hand, to any given y there may correspond
no value or many values of x such that (x, y) ∈ F.
A notion which appears frequently in mathematics courses is that of an
implicit function. For example, the condition x³ − y = 1 implicitly
determines y as a function of x; explicitly, y = x³ − 1. It is also said
that x² + y² = 1 implicitly determines y as a function of x; formally,
y = √(1 − x²). Here, however, the situation is more subtle. We must
first determine which kinds of numbers are to be considered. If these are
to be real numbers, we cannot allow x > 1 or x < −1, i.e., we must have
−1 ≤ x ≤ 1. Second, if we settle, for the sake of definiteness, that
√a signifies the (unique) positive square root of a when a > 0, we see
that there are at least two functions determined by the condition: y =
√(1 − x²) for −1 ≤ x ≤ 1, or y = −√(1 − x²). Actually, there are
many such functions; for example, another is y = (−1)ⁿ√(1 − x²) when
−1 + (n/2) ≤ x < −1 + (n + 1)/2, n = 0, 1, 2, 3. The graphs of
these functions are shown in Fig. 2.21.

[Figure 2.21]

The union of the first two graphs is just the set of all (x, y) such that
x² + y² = 1; however, that set is clearly not a function.
A precise definition of what it means for F to be an implicit function
associated with a given condition 𝒜(x, y), within a preassigned domain S,
might run as follows: the domain of F is {x: x ∈ S and for some y ∈ S,
𝒜(x, y)} and for each x ∈ 𝒟(F), F(x) ∈ S and 𝒜(x, F(x)) holds. Given a
condition 𝒜(x, y) and set S, for each x ∈ S let W_x = {(x, y): y ∈ S and
𝒜(x, y)}; then let D = {x: W_x ≠ ∅}. For any x₁, x₂, if W_{x₁} ≠ W_{x₂}
then x₁ ≠ x₂, hence W_{x₁} ∩ W_{x₂} = ∅. Let M be the collection of all
sets W_x for x ∈ D. By the axiom of choice there is a set F such that
F ∩ W_x contains exactly one element for each x ∈ D. Hence for each
x ∈ D there is a unique y ∈ S with (x, y) ∈ F; moreover, (x, y) ∈ W_x,
so that 𝒜(x, y). Thus there always exists at least one implicit function
associated with 𝒜(x, y) and S. The problem of implicit functions in calculus
goes deeper: in which cases can we prove the existence of at least one
implicit function satisfying additional conditions of continuity,
differentiability, etc.? It is with respect to such functions that, say, rules
of “implicit differentiation” are supposed to have significance.
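
When S is finite the selection of one element from each W_x needs no appeal to the axiom of choice; any explicit rule will do. The sketch below assumes S is a small window of integers and uses the condition x² + y² = 25 as a stand-in for 𝒜(x, y); the names implicit_function and choose are ours.

```python
def implicit_function(condition, s, choose=min):
    """Pick, for each x in s having some y in s with condition(x, y),
    one such y; the result is stored as a dict from x to the chosen y."""
    f = {}
    for x in s:
        candidates = [y for y in s if condition(x, y)]
        if candidates:                 # x belongs to D exactly when W_x is nonempty
            f[x] = choose(candidates)
    return f


def on_circle(x, y):
    return x * x + y * y == 25


S = range(-5, 6)
lower = implicit_function(on_circle, S)               # always the smaller y
upper = implicit_function(on_circle, S, choose=max)   # always the larger y

print(sorted(lower))            # the common domain D
print(lower[3], upper[3])       # two different implicit functions at x = 3
print(all(on_circle(x, y) for x, y in lower.items()))
```

Different choice rules give different implicit functions with the same domain, which mirrors the remark that the condition x² + y² = 1 determines many functions.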
Closely related to the implicit functions are the so-called multivalued
functions. Authors who use this term often refer to the notion of function
presented here as being that of a single-valued function. For example, it
is known that with every complex number z ≠ 0 are associated exactly
two complex numbers w with w² = z. The question of distinguishing
between these two square roots of z is not as simple as in the case of real
numbers, since we cannot speak of positive or negative complex numbers.
The equation F(z) = √z does not define a function in our sense of the
word. There are two approaches to this problem in the theory of complex
numbers. One is to speak of the branches of the “function” √z, i.e., of
certain single-valued functions which together provide both square roots
for every number z. The second is to expand the notion of complex
number by the use of Riemann surfaces. For the square-root function,
in place of a single complex number z ≠ 0 there will now be two numbers
z₁, z₂ on the associated surface, and one single-valued function F such
that F(z₁) is one square root of z and F(z₂) is the other. In this book
we shall always use the word “function” in its single-valued sense, i.e.,
according to (2:3-32) or (2:3-33), and we shall treat situations which
lead to multivalued functions in terms of these.
A ternary relation, i.e., a set of ordered triples ((x, y), z), can also be
a function. The condition for F to be such is given by (2:3-33): for any
x, y, z₁, z₂, if ((x, y), z₁) ∈ F and ((x, y), z₂) ∈ F then z₁ = z₂.
According to our definitions, the unique z associated with (x, y), if there
is any, is denoted by F((x, y)); for simplicity, we shall instead denote it by
F(x, y). If a function F is a ternary relation we shall call it a binary
function (function of two arguments or variables). Although it is not
necessary to qualify it in this way, we can refer to an arbitrary function
(which is not otherwise specified to be binary) as being a unary function
(function of one variable). Functions of more than two variables can be
treated similarly, so that for any specified positive integer n, we can talk of
functions of n variables, or, simply, of n-ary functions. The clearest way
to express that a function is, say, binary is to describe its domain, for
example 𝒟(F) = S × S.
In algebra it is customary to use the word operation instead of function,
but these have exactly the same meaning. Thus when we speak of the
operation of multiplication on real numbers we mean the function F, with
domain Re × Re, such that for all a, b ∈ Re, F(a, b) = a · b. An
operation is called unary, binary, etc., under the same conditions as a
function. Thus multiplication is a binary operation.

(2:3-35) Suppose that G is an operation with domain S and that A ⊆ S;
A is said to be closed under G if whenever x ∈ A also G(x) ∈ A.
Suppose that F is an operation with domain S × S and that
A ⊆ S; A is said to be closed under F if whenever x, y ∈ A
also F(x, y) ∈ A.

Consider, for example, the operation of inversion of nonzero real numbers:
𝒟(G) = Re − {0}, G(x) = 1/x. The set Ra − {0} of nonzero rationals
is closed under inversion, but the set I − {0} of nonzero integers is not.
The operation of subtraction on real numbers is the function F with
𝒟(F) = Re × Re, F(x, y) = x − y. The set I of integers is closed under
subtraction, but the set P of positive integers is not.
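
A failure of closure is exhibited by a single counterexample, which a search over a finite sample can find. The sketch below uses exact rational arithmetic; the helper closure_counterexample and the membership tests are our own, and a result of None on a finite sample does not by itself prove closure.

```python
from fractions import Fraction


def closure_counterexample(sample, op, member):
    """Return some x in sample with op(x) outside the set, or None."""
    for x in sample:
        if not member(op(x)):
            return x
    return None


def invert(x):
    return 1 / Fraction(x)


def is_nonzero_rational(v):
    return isinstance(v, Fraction) and v != 0


def is_nonzero_integer(v):
    return v != 0 and Fraction(v).denominator == 1


nonzero_rationals = [Fraction(p, q) for p in range(-4, 5) if p
                     for q in range(1, 5)]
nonzero_integers = [n for n in range(-4, 5) if n]

print(closure_counterexample(nonzero_rationals, invert, is_nonzero_rational))
print(closure_counterexample(nonzero_integers, invert, is_nonzero_integer))

# Subtraction on the positive integers: 1 - 2 is not positive.
positives = range(1, 6)
print([(x, y) for x in positives for y in positives if x - y <= 0][:1])
```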

Congruence relations. The statements one finds in geometry connecting
equality with operations, such as “if equals are added to equals, the results
are equal,” are seen to hold trivially when translated into our language of
relations and functions. In fact if F is any binary operation and a₁ = a₂
and b₁ = b₂ then F(a₁, b₁) = F(a₂, b₂) whenever (a₁, b₁) ∈ 𝒟(F); for

((a₁, b₁), F(a₁, b₁)) ∈ F,

hence by (a₁, b₁) = (a₂, b₂), also

((a₂, b₂), F(a₁, b₁)) ∈ F,

therefore F(a₁, b₁) is a z for which ((a₂, b₂), z) ∈ F; but there is only
one such z, which we have called F(a₂, b₂). Similar statements hold for
functions with other numbers of arguments. On the other hand, if ≡ is
an equivalence relation in a set S closed under an operation F, we have
seen that it need not be true that if a₁ ≡ a₂ and b₁ ≡ b₂ then F(a₁, b₁) ≡
F(a₂, b₂). The cases in which this is true are of special interest:

(2:3-36) Suppose that W is an equivalence relation in a set S; we put
a ≡ b if (a, b) ∈ W.
(i) If G is a unary operation with 𝒟(G) = S, ℛ(G) ⊆ S,
then W is said to be a congruence relation with respect to
G if whenever a₁ ≡ a₂ we have G(a₁) ≡ G(a₂).
(ii) If F is a binary operation with 𝒟(F) = S × S, ℛ(F) ⊆ S,
then W is said to be a congruence relation with respect to
F if whenever a₁ ≡ a₂ and b₁ ≡ b₂ we have

F(a₁, b₁) ≡ F(a₂, b₂).

We apply the words “congruence relation” to a W only if we already know
that W is an equivalence relation. It is clear how the notion of congruence
relation can be applied also to functions of more than two arguments;
however, we shall have no occasion to use such. Sometimes it is also
convenient to define the notion: W is a congruence relation with respect to
the relation U. If U is, for example, binary, and ≡ is taken as above, this
holds if whenever a₁ ≡ a₂ and b₁ ≡ b₂ and (a₁, b₁) ∈ U then (a₂, b₂) ∈ U.
The property of being a congruence relation W with respect to, say,
a binary function F can be rephrased by saying that the equivalence set
which contains F(a, b) is uniquely determined by the equivalence sets of a, b,
respectively; more briefly, that F is well defined with respect to W. In
other words, we are led to a function on equivalence sets:

(2:3-37) Suppose that W is a congruence relation in a set S with respect
to a binary operation F on S. Let M be the collection of
equivalence sets [a] = W_a, for a ∈ S. Then there is a binary
operation F̄ with domain M × M such that for each a, b ∈ S,

F̄([a], [b]) = [F(a, b)].

We need only see that the relation consisting of all triples (([a], [b]),
[F(a, b)]) is a function; if ([a′], [b′]) = ([a], [b]) then a ≡ a′, b ≡ b′ and
hence F(a, b) ≡ F(a′, b′), i.e., [F(a, b)] = [F(a′, b′)]. Consider, for
example, the equivalence relation in the integers, W = {(x, y): x, y ∈ I and
x − y is a multiple of 3}. We have three equivalence sets, [0], [1], [2].
It is easily seen that the operation F(a, b) = a + b is well defined with
respect to this equivalence relation. For this we have by (2:3-37) an
associated operation F̄(X, Y) on equivalence sets, which we denote by
X ⊕ Y. Then it can be seen that [0] ⊕ [1] = [1], [1] ⊕ [2] = [3] = [0],
[2] ⊕ [2] = [4] = [1], etc. More compactly:

 ⊕  | [0] [1] [2]
----+------------
[0] | [0] [1] [2]
[1] | [1] [2] [0]
[2] | [2] [0] [1]

The same sort of construction of functions on equivalence sets as given in
(2:3-37) can evidently be carried out for functions of one or any number
of arguments, whenever the equivalence relation is a congruence relation
with respect to such functions. Similarly, if W is a congruence relation
with respect to a binary relation U, we can unambiguously define an
associated relation Ū between equivalence sets by: ([a], [b]) ∈ Ū if and
only if (a, b) ∈ U.
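
Well-definedness of addition with respect to the multiples-of-3 relation, and the resulting table, can be checked by brute force. The sketch below assumes a finite window of integers and labels each equivalence set [a] by the remainder of a on division by 3; the names cls, well_defined and table are ours.

```python
S = range(-9, 10)  # assumption: finite stand-in for I


def cls(a):
    """Label of the equivalence set [a]: one of 0, 1, 2."""
    return a % 3


# F(a, b) = a + b is well defined if equivalent arguments always give
# equivalent results.
well_defined = all(cls(a + b) == cls(a2 + b2)
                   for a in S for a2 in S if cls(a) == cls(a2)
                   for b in S for b2 in S if cls(b) == cls(b2))
print(well_defined)

# The induced operation on equivalence sets, computed from representatives.
table = {(i, j): cls(i + j) for i in range(3) for j in range(3)}
for i in range(3):
    print(f"[{i}]", " ".join(f"[{table[(i, j)]}]" for j in range(3)))
```

Because the operation is well defined, any representatives could have been used to fill the table; the representatives 0, 1, 2 are merely the most convenient.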

Converse and composition of functions. Since functions are special kinds
of relations, we can also apply to them the notions of converse and
composition. Then the converse of a (unary) function F is the set of all pairs
(y, x) such that (x, y) ∈ F, i.e., is the set of all pairs (F(x), x) for x ∈ 𝒟(F).
It is seen that the converse of a function need not be a function. For
example, for the function F(x) = x² with domain I, both (2, 4) and (−2, 4)
are in F, hence (4, 2), (4, −2) are in its converse. On the other hand, the
converse Ğ of the function G(x) = 2x + 1 with domain I is a function;
for suppose (y, x₁), (y, x₂) ∈ Ğ, then y = 2x₁ + 1 and y = 2x₂ + 1,
hence 2x₁ + 1 = 2x₂ + 1, and x₁ = x₂. The domain of this function
Ğ is the same as the range of the function G, namely the set of all odd
integers. Formally, we could define Ğ(x) = (x − 1)/2 on this domain.
It is seen that

(2:3-38) the converse of a function F is again a function if and only if
we have for any x₁, x₂ ∈ 𝒟(F) that if F(x₁) = F(x₂) then
x₁ = x₂ [equivalently, if x₁ ≠ x₂ then F(x₁) ≠ F(x₂)].

If a function F has as its converse F̆ a function, then F̆ is called the
inverse of F, and is denoted in this case by F⁻¹; also F is said to be a
one-to-one or bi-unique function or correspondence [from 𝒟(F) onto ℛ(F)].

(2:3-39) If F is a one-to-one function then 𝒟(F⁻¹) = ℛ(F) and
ℛ(F⁻¹) = 𝒟(F); for each a ∈ 𝒟(F), F⁻¹(F(a)) = a and
for each b ∈ 𝒟(F⁻¹), F(F⁻¹(b)) = b.

For F⁻¹(b) is the unique x such that (b, x) ∈ F̆, i.e., such that F(x) = b;
hence F⁻¹(F(a)) is the unique x such that F(x) = F(a), i.e., is a itself.
The notion of one-to-one function applies directly to functions of more
than one argument, for then we merely view the domain as a set of ordered
pairs, triples, etc. For example, the function G(x, y) = x + y with domain
P × P is not one-to-one, while the function H(x, y) = 2^x · 3^y is. (Why?)
The composition F; G of two functions F, G is the set of all ordered
pairs (x, y) such that for some z, (x, z) ∈ F and (z, y) ∈ G. Given x, y,
if there is any such z then x ∈ 𝒟(F) and z must be F(x); then (z, y) ∈ G
implies z ∈ 𝒟(G), i.e., F(x) ∈ 𝒟(G), and y = G(z), i.e., y = G(F(x)).
Hence it is seen that

(2:3-40) if F, G are functions then so also is H = F; G. The domain of
H consists of all x such that F(x) ∈ 𝒟(G), and for each x ∈ 𝒟(H),
H(x) = G(F(x)).
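
Criterion (2:3-38) and the two examples above can be checked on finite pieces of the functions. A minimal sketch, assuming a finite window of integers in place of I and functions stored as sets of pairs; the helper names are ours.

```python
def converse(f):
    return {(y, x) for (x, y) in f}


def is_function(rel):
    """A set of pairs is a function when no first component repeats."""
    return len({x for (x, _) in rel}) == len(rel)


window = range(-4, 5)  # assumption: finite stand-in for I
square = {(x, x * x) for x in window}
G = {(x, 2 * x + 1) for x in window}

print(is_function(converse(square)))   # (4, 2) and (4, -2) both belong
print(is_function(converse(G)))        # G is one-to-one

G_inv = dict(converse(G))              # the inverse as a lookup table
print(all(G_inv[2 * x + 1] == x for x in window))
print(all(G_inv[y] == (y - 1) // 2 for y in G_inv))   # formula (x - 1)/2
print(sorted(G_inv))                   # its domain: odd integers in range
```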

For example, if F(x) = x² with domain Re and G(x) = 2x + 1 with
domain I, then (F; G)(x) = 2x² + 1 with domain {x: x ∈ Re and x² ∈ I}.
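
The rule of (2:3-40) for the domain of a composite can be seen on finite tables. The sketch below assumes functions are stored as dicts, that a set of half-integers stands in for Re, and that a window of integers stands in for I. Over these rational stand-ins x² is an integer only when x is, whereas over all of Re the same rule would also admit numbers such as √2.

```python
from fractions import Fraction


def compose(f, g):
    """F; G for functions stored as dicts: apply f first, then g.
    The domain is every x in the domain of f with f[x] in the domain of g."""
    return {x: g[f[x]] for x in f if f[x] in g}


halves = [Fraction(k, 2) for k in range(-6, 7)]   # stand-in for Re
integers = range(-10, 11)                          # stand-in for I

F = {x: x * x for x in halves}
G = {n: 2 * n + 1 for n in integers}
H = compose(F, G)

print(sorted(H))                           # the domain of H
print(all(H[x] == 2 * x * x + 1 for x in H))
print(Fraction(1, 2) in H)                 # F(1/2) = 1/4 is not in the domain of G
```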

Exercise Group 2.3

1. (a) Show that for any (not necessarily distinct) elements a, b, c, d,
{a, b} = {c, d} if and only if a = c and b = d or a = d and b = c.
(b) Show that for any elements a, b, c, d, {{a}, {a, b}} = {{c}, {c, d}}
if and only if a = c and b = d. Thus we can define the ordered
pair in set-theoretical terms by (a, b) = {{a}, {a, b}}.
(c) Show that the existence of the cartesian product A × B (2:3-8) can
be proved from the existence of A ∪ B on the basis of (2:1-39),
(2:1-40), if we define ordered pair as in (b).
2. Find the domain and range of each of the following relations. Give a
geometric interpretation in each case.
(a) {(x, y): x, y ∈ Re and x² + 4y² < 1}
(b) {(x, y): x, y ∈ Re and y² = x}
(c) {(x, y): x, y ∈ Re and y = x²}
(d) {(x, y): x, y ∈ Re and y² = x²}
(e) {(x, y, z): x, y, z ∈ Re and x − 2y + z = 3}
(f) {(x, y, z): x, y, z ∈ Re and x² + y² + z² < 1 and x + y = 1}
3. Classify each of the following relations according to whether they do
or do not have the properties of being reflexive, irreflexive, symmetric,
antisymmetric, transitive, an equivalence relation. Give your reasons.
(a) {(x, y): x, y ∈ I and x < y + 1}
(b) {(x, y): x, y ∈ I and x² = y²}
(c) {(x, y): x, y ∈ I and |x| < |y|}
(d) {(x, y): x, y ∈ I and x, y are both even and x < y or x, y are both
odd and x < y or x is even and y is odd}
(e) {(X, Y): X ⊆ I, Y ⊆ I and X ∩ Y = ∅}
4. Show that the following are true for any relations U, V, W; here W̆
represents the converse of a relation W.
(a) U; (V; W) = (U; V); W
(b) (Ŭ)˘ = U
(c) (U; V)˘ = V̆; Ŭ
(d) U; (V ∪ W) = (U; V) ∪ (U; W)
Show that in general U; (V ∩ W) ≠ (U; V) ∩ (U; W).
5. Show that if W is a relation with 𝒟(W) = S then W is an equivalence
relation in S if and only if W̆ = W and W; W ⊆ W.
6. Find the composite functions F; G and G; F in each of the following cases,
and find the domain in each case.
(a) F(x) = x² − 9 (x ∈ Re), G(x) = √x (x ∈ Re, x > 0)
(b) F(x) = 1/x (x ∈ Re, x ≠ 0), G = F
(c) F(x) = x² (x ∈ Re), G(x) = 0 if −1 ≤ x ≤ 1,
G(x) = 1 if x < −1 or 1 < x
7. Which of the following functions is one-to-one? In each such case describe
the inverse function and find its domain.
(a) F(x) = x² + 1, x ∈ Re
(b) F(x) = x³, x ∈ Re
(c) F(x) = √(x² + 1), x ∈ Re, x > 0
(d) F(x, y) = 2^x · y, x, y ∈ Re
8. For x, y ∈ I, let (x, y) ∈ W if x − y is a multiple of 4; write x ≡ y if
(x, y) ∈ W.
(a) Show that ≡ is an equivalence relation with four equivalence sets
[0], [1], [2], [3].
(b) Define operations ⊕, ⊙ on equivalence sets so that [a] ⊕ [b] =
[a + b], [a] ⊙ [b] = [a · b]. Make a table for each of these operations.
Which of the following are true and which false, for the class M of
equivalence sets? Prove your statements.
(i) For each X ∈ M, X ⊕ [0] = X;
(ii) For each X ∈ M, X ⊙ [1] = X;
(iii) For each X ∈ M there is a Y ∈ M with X ⊕ Y = [0];
(iv) For each X ∈ M, if X ≠ [0] there is a Y ∈ M with X ⊙ Y = [1];
(v) If X, Y, Z ∈ M, X ≠ [0] and X ⊙ Y = X ⊙ Z then Y = Z;
(vi) If X, Y, Z ∈ M then X ⊙ (Y ⊕ Z) = (X ⊙ Y) ⊕ (X ⊙ Z).

2.4 Mathematical systems of relations and functions. In this book we
will be studying the properties of various sets of numbers S with respect
to various operations F_1, F_2, ... on S and relations W_1, W_2, ... in S.
Further, we often single out some particular elements a_1, a_2, ... of S
when these have some special properties which distinguish them from other
elements. Different such systems of functions, relations, and distinguished
elements may have closely related properties. For example, the set of
real numbers Re, under addition +, with distinguished element 0, is very
similar to the set of positive real numbers Re', under multiplication ·,
with distinguished element 1. Thus x + y = y + x for all x, y ∈ Re
and x · y = y · x for all x, y ∈ Re'; x + 0 = x for all x ∈ Re, and
x · 1 = x for all x ∈ Re'; for each x ∈ Re there is a y ∈ Re with x + y = 0
(namely, −x) and for each x ∈ Re' there is a y ∈ Re' with x · y = 1
(namely, 1/x); and so on. It seems reasonable, therefore, to speak of the
properties of the systems (ordered triples) (Re, +, 0) and (Re', ·, 1). We
are thus led to the following general notion:

(2:4-1) A mathematical or algebraic system is an ordered
(k + l + m + 1)-tuple

(S, F_1, F_2, ..., F_k, W_1, W_2, ..., W_l, a_1, a_2, ..., a_m)

in which F_1, F_2, ..., F_k are operations on S under which S is
closed, W_1, W_2, ..., W_l are relations in S, and a_1, a_2, ..., a_m
are certain specified elements of S.

Isomorphism. From the algebraic point of view, the particular way in
which the elements of S and the operations, relations, etc., on S are defined
is not as important as the properties of S under these operations and
relations. Thus we would say we are dealing with essentially the same
system if we have another system (S', F'_1, F'_2, ..., F'_k, W'_1, W'_2, ...,
W'_l, a'_1, a'_2, ..., a'_m) with exactly the same properties. To express
this more precisely, we must first limit the kinds of systems to be compared.

(2:4-2) Two mathematical systems

(S, F_1, F_2, ..., F_k, W_1, W_2, ..., W_l, a_1, a_2, ..., a_m)

and

(S', F'_1, F'_2, ..., F'_{k'}, W'_1, W'_2, ..., W'_{l'}, a'_1, a'_2, ..., a'_{m'})

are said to be of the same type if the following conditions hold:

(i) k = k', l = l', and m = m';
(ii) corresponding functions F_i and F'_i have the same number
of arguments, so that, e.g., if F_i is unary on S, 𝒟(F_i) = S,
then F'_i is unary on S', 𝒟(F'_i) = S', and if F_i is binary
on S, 𝒟(F_i) = S × S, then F'_i is binary on S', 𝒟(F'_i) =
S' × S';
(iii) corresponding relations W_i and W'_i apply to the same
number of arguments, so that, e.g., if W_i is binary in S,
W_i ⊆ S × S, then W'_i is binary in S', W'_i ⊆ S' × S'.

As examples, (Re, <) and (Re, +) are not of the same type, (Re, ·) and
(Re, Sq) are not of the same type, where Sq(x) = x^2, while (Re, +, ·, √2)
and (Re', ·, +, −3) are of the same type.
Now the algebraic indistinguishability of two systems can be explained
as follows.

(2:4-3) Two mathematical systems (S, ...) and (S', ...), as in (2:4-2),
are said to be isomorphic if they are of the same type and if
there is a function G satisfying the following conditions:
(i) G is a one-to-one correspondence from S onto S' [i.e., with
𝒟(G) = S, ℛ(G) = S'];
(ii) G(a_1) = a'_1, G(a_2) = a'_2, ..., G(a_m) = a'_m;
(iii) if W_1 is, say, binary then for any x, y ∈ S, (x, y) ∈ W_1
if and only if (G(x), G(y)) ∈ W'_1; and similarly for
W_2, W'_2, ..., W_l, W'_l (and any other number of arguments);
(iv) if F_1 is, say, unary then for any x, y ∈ S, (x, y) ∈ F_1
if and only if (G(x), G(y)) ∈ F'_1, i.e., G(F_1(x)) =
F'_1(G(x)); and if F_1 is, say, binary then for any x, y,
z ∈ S, (x, y, z) ∈ F_1 if and only if (G(x), G(y), G(z)) ∈ F'_1,
i.e., G(F_1(x, y)) = F'_1(G(x), G(y)); and similarly for
F_2, F'_2, ..., F_k, F'_k (and any other number of arguments).
If (i)-(iv) hold we say that (S, ...) and (S', ...) are isomorphic
under G and write (S, ...) ≅ (S', ...).
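
For finite systems the conditions of (2:4-3) can be checked by brute force.
The following Python sketch does this for systems with one binary operation
and one distinguished element. The helper name is_isomorphism and the two
systems used (the integers 0, 1, 2, 3 under addition modulo 4, and 1, 2, 3, 4
under multiplication modulo 5) are illustrative assumptions, not examples
taken from the text.

```python
from itertools import product

def is_isomorphism(G, S, F, a, S2, F2, a2):
    """Check (2:4-3) for finite systems (S, F, a) and (S2, F2, a2).

    G is a dict from S to S2; F and F2 are binary operations."""
    # (i) G is a one-to-one correspondence from S onto S2
    if set(G) != set(S) or set(G.values()) != set(S2):
        return False
    if len(set(G.values())) != len(S):
        return False
    # (ii) the distinguished elements correspond
    if G[a] != a2:
        return False
    # (iv) G(F(x, y)) = F2(G(x), G(y)) for all x, y in S
    return all(G[F(x, y)] == F2(G[x], G[y]) for x, y in product(S, repeat=2))

S = {0, 1, 2, 3}
S2 = {1, 2, 3, 4}
add4 = lambda x, y: (x + y) % 4
mul5 = lambda x, y: (x * y) % 5
G = {k: pow(2, k, 5) for k in S}     # a finite analogue of G(x) = 2^x

print(is_isomorphism(G, S, add4, 0, S2, mul5, 1))   # True
```

A relation W could be handled in the same way by adding a loop for
condition (iii).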

As an example, it can be seen now that

(Re, +, <, 0) ≅ (Re', ·, <, 1),

when Re' = {x: x ∈ Re and x > 0}. Here the symbol + is used to denote
the binary operation F(x, y) = x + y on Re × Re, and · denotes the
operation F'(x, y) = x · y on Re' × Re'. On the right side, < denotes the
relation < restricted to elements of Re'. A suitable one-to-one function
which will establish the isomorphism is G(x) = 2^x for x ∈ Re. To verify
that this works, we would have to show that:

(i) 𝒟(G) = Re and ℛ(G) = Re' (i.e., for each y ∈ Re with y > 0,
there is an x ∈ Re with 2^x = y), and G is one-to-one (i.e., if
2^{x_1} = 2^{x_2} then x_1 = x_2);
(ii) G(0) = 1 (i.e., 2^0 = 1);
(iii) x < y if and only if G(x) < G(y) (i.e., if and only if 2^x < 2^y);
(iv) G(F(x, y)) = F'(G(x), G(y)) [i.e., G(x + y) = G(x) · G(y), or
2^{x+y} = 2^x · 2^y].

Such properties of exponentiation on reals are familiar to the reader (the
x to be chosen in (i) such that 2^x = y is usually called log_2 y); we shall
discuss these later in this book.
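
Conditions (i)-(iv) for G(x) = 2^x can be spot-checked numerically. The
Python sketch below is only a sanity check on a grid of sample points, using
floating-point arithmetic with a tolerance; it is not a proof, and the grid
and tolerance are arbitrary choices made for illustration.

```python
import math
from itertools import product

def G(x):
    """The candidate isomorphism G(x) = 2^x from Re to the positive reals."""
    return 2.0 ** x

xs = [k / 4 for k in range(-40, 41)]      # sample points -10, -9.75, ..., 10
pairs = list(product(xs, repeat=2))

# (i) onto: x = log2(y) is a preimage of each sampled y > 0
cond_i = all(math.isclose(G(math.log2(y)), y, rel_tol=1e-12)
             for y in (0.001, 0.5, 1.0, 7.0, 1e6))
# (ii) G(0) = 1
cond_ii = G(0) == 1.0
# (iii) x < y if and only if G(x) < G(y)
cond_iii = all((x < y) == (G(x) < G(y)) for x, y in pairs)
# (iv) G(x + y) = G(x) * G(y), up to rounding
cond_iv = all(math.isclose(G(x + y), G(x) * G(y), rel_tol=1e-12)
              for x, y in pairs)

print(cond_i, cond_ii, cond_iii, cond_iv)
```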

Set-theoretical equivalence. One very special case of the notion of
mathematical system and of isomorphism is that when the systems under
consideration have no operations, relations, or distinguished elements; in
other words, we are just dealing with "1-tuples" (S) of sets S. In this case,

(2:4-4) the statement that (S) is isomorphic to (S') reduces to the
statement that there is a one-to-one correspondence G between S and S'.
If this holds, we say that S and S' are set-theoretically
equivalent or equinumerous.

The latter term is used because we can pair off the elements a, b, ... of
S with the elements a', b', ... of S' by the rule a' = G(a), b' = G(b), ... .
This is just an abstract version of the most primitive form of counting
(to see how many sheep one has, tie each to a tree). However, if we say
that two sets have the same number of elements whenever they are
set-theoretically equivalent, we open the way for some apparent paradoxes.
For note that the sets S = {1, 2, 3, ...} (= P) and S' = {2, 4, 6, ...}
are set-theoretically equivalent by the function G(x) = 2x. Thus a set
can have the same number of elements as a proper subset of itself. However,
there is no real contradiction here unless we should also try to
insist that a set S cannot have the same number of elements as any proper
subset S' of itself, as we know to be the case with finite sets S. In fact,
we see now, from these intuitive judgments, how to give purely
set-theoretical definitions of the notions of finiteness and infinity:

(2:4-5) S is finite if and only if S is not set-theoretically equivalent to


any proper subset of itself. S is infinite if and only if it is not
finite.
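
The pairing G(x) = 2x and the contrast with finite sets can be illustrated
by a short Python sketch. Since a program cannot enumerate all of P, the
first part works only on an initial segment of P (an assumption made for
illustration); the second part checks exhaustively, for one small finite
set, that no one-to-one map of the set into itself omits an element.

```python
from itertools import product

def G(x):
    return 2 * x

P_segment = range(1, 1001)               # an initial segment of P = {1, 2, 3, ...}
images = [G(x) for x in P_segment]

one_to_one = len(set(images)) == len(images)
all_even = all(y % 2 == 0 for y in images)
# every even y has the preimage y // 2, so G maps P onto {2, 4, 6, ...}
onto_evens = all(G(y // 2) == y for y in range(2, 2001, 2))

# For a finite S, every one-to-one map of S into S has all of S as its range,
# so S is not equivalent to any proper subset of itself.
S = (0, 1, 2, 3)
injections = [f for f in product(S, repeat=len(S)) if len(set(f)) == len(S)]
finite_ok = all(set(f) == set(S) for f in injections)

print(one_to_one, all_even, onto_evens, finite_ok)
```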

It is also often said that S and S' have the same cardinal number if they
are set-theoretically equivalent. Note that this does not say what a
cardinal number is; it merely defines a relation between sets, namely,
being of the same cardinal number. Now it turns out that if we could
form a set

(2:4-6) W = {(S, S'): S and S' are set-theoretically equivalent},

then we would obtain

(2:4-7) W is an equivalence relation in the sense of (2:3-28).

Then if we could go further and form the class of equivalence sets of W,
it would be natural to say that a cardinal number is just one of these
equivalence sets. However, in the usual formulations of set theory, the
formation of W as in (2:4-6) cannot be justified. Nevertheless, with
certain modifications this approach can be pursued to reduce our usual
notions of numbers to purely set-theoretical terms. We do not plan to
follow this approach here, but shall instead begin in the next chapter
by framing our most basic and intuitively clear conceptions of the finite
numbers in axiomatic form.
We shall have occasion to return to the notion of set-theoretical
equivalence at various points in the following. In particular, we shall see in
Chapter 7 another of the "paradoxes of the infinite"; contrary to an
uninformed guess, it is not true that any two infinite sets have the same
cardinal number. Much of modern set theory is closely connected with the
different "sizes of infinity" which this remark suggests exist.

Subsystems. Another important relation between general algebraic
systems that we shall frequently encounter is that in which one is a
subsystem of the other. Perhaps the force of the definition which we shall
take for this is best understood by first considering an example in which
the relation does not hold. The systems (Re, +, <) and (Re', ·, <) are,
as we have seen, isomorphic when Re' is the set of positive real numbers.
Furthermore, Re' ⊆ Re. However, we would not say that the system (Re', ·, <)
is a subsystem of (Re, +, <), for, in general, for x, y ∈ Re', the value
x · y of the operation in the first system is not the same as the value
x + y given by the operation in the second system. (It is true that for
some x, y ∈ Re', x · y = x + y, for example x = 2, y = 2, or x = 3/2,
y = 3, etc.; however, the main point here is that it is not true that for all
x, y ∈ Re', x · y = x + y, for example 1 · 1 ≠ 1 + 1.) On the other hand,
we would regard (Re', <) as a subsystem of (Re, <), since, for x, y ∈ Re',
the relation x < y holds in the first system if and only if it holds in the
second system. This leads to the following definition:

(2:4-8) Suppose that (S, F_1, F_2, ..., F_k, W_1, W_2, ..., W_l, a_1, a_2, ...,
a_m) and (S', F'_1, F'_2, ..., F'_k, W'_1, W'_2, ..., W'_l, a'_1, a'_2, ..., a'_m)
are systems of the same type. We say that the second system is
a subsystem of the first, or that the first is an extension of the
second, if the following conditions hold:
(i) S' ⊆ S;
(ii) a'_1 = a_1, a'_2 = a_2, ..., a'_m = a_m;
(iii) if W_1 is, say, binary, then for any x, y ∈ S', (x, y) ∈ W'_1
if and only if (x, y) ∈ W_1, and similarly for W_2, W'_2, ...,
W_l, W'_l (and any other number of arguments);
(iv) if F_1 is, say, binary, then for any x, y ∈ S', F'_1(x, y) =
F_1(x, y), and similarly for F_2, F'_2, ..., F_k, F'_k (and any
other number of arguments).
We shall also say that S' is a subsystem of the first system, or
simply that S' is a subsystem of S, with respect to the given
operations, relations, and constants, if these conditions hold.
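
For finite systems the conditions of (2:4-8) can again be tested directly.
The Python sketch below handles one binary operation and one binary relation;
the helper name is_subsystem and the example (the even residues 0, 2, 4 inside
the integers modulo 6) are hypothetical choices made for illustration. The
second call mirrors the remark above that a subset carrying a different
operation is not a subsystem.

```python
from itertools import product

def is_subsystem(S2, F2, W2, S, F, W):
    """Check (2:4-8)(i), (iii), (iv) for finite systems (S2, F2, W2), (S, F, W).

    F2 and F are binary operations; W2 and W are sets of ordered pairs."""
    if not set(S2) <= set(S):                              # (i)
        return False
    for x, y in product(S2, repeat=2):
        if ((x, y) in W2) != ((x, y) in W):                # (iii)
            return False
        # S2 must be closed under F2, and F2 must agree with F on S2   (iv)
        if F2(x, y) not in S2 or F2(x, y) != F(x, y):
            return False
    return True

S = set(range(6))
add6 = lambda x, y: (x + y) % 6
mul6 = lambda x, y: (x * y) % 6
less = {(x, y) for x in S for y in S if x < y}

E = {0, 2, 4}
less_E = {(x, y) for x in E for y in E if x < y}

print(is_subsystem(E, add6, less_E, S, add6, less))   # True
print(is_subsystem(E, mul6, less_E, S, add6, less))   # False
```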

It is very common to use the same symbols for the operations, relations,
and constants of two systems when one knows or wishes to indicate that
one is a subsystem of another. Thus, for example, we write (Re', +, ·),
(Re, +, ·). This implicitly involves a statement that the + in the first
case denotes a binary operation F' whose domain is Re' × Re' and whose
range is contained in Re', such that for any x, y ∈ Re', F'(x, y) = x + y,
where + is the operation given in the second system; similarly for the use
of ·. Among other things, implicit in this is the fact that for the operation
+ given on Re, we have x + y ∈ Re' whenever x, y ∈ Re', i.e., that Re'
is closed under the operation + from the larger system. In contrast,
starting with the system (Re, −), we could not speak of a subsystem
(Re', −), despite the fact that Re' ⊆ Re, since Re' is not closed under
subtraction and hence (Re', −) does not even form a mathematical system.
Because of the conditions (iii), (iv) of our definition (2:4-8), such
ambiguity in denoting distinct operations and relations will not in general
lead to confusion. The only case where we must be careful is when we are
dealing with more than one extension of the same system, where the
extensions themselves may not be related to each other by ⊆. In such
cases we may continue to use the same symbols for one extension, while
introducing new symbols for the second, e.g. (Re', +, ·), (Re, +, ·),
(S, ⊕, ⊙). Here the fact that the first system is intended to be a
subsystem of the third is given by the statement: Re' ⊆ S and for all
x, y ∈ Re', x + y = x ⊕ y and x · y = x ⊙ y; while, by this symbolism,
nothing need be said to indicate that the first is a subsystem of the second.
Systems related by the conditions of (2:4-8) share many interesting
algebraic properties. For example, if F_1 is commutative, i.e., F_1(x, y) =
F_1(y, x) for all x, y ∈ S, then so is F'_1, i.e., F'_1(x, y) = F'_1(y, x) for
all x, y ∈ S'. On the other hand, and in contrast to the case of ≅, they do
not in general share all algebraic properties. For example, in (Re, +) we
know that there is an x (∈ Re) such that for all y (∈ Re), x + y = y, namely
x = 0. But in (Re', +) the corresponding statement is false, i.e., there is no
x (∈ Re') such that for all y (∈ Re'), x + y = y. There are cases where
we have a proper subsystem which is, at the same time, isomorphic to
the extended system, for example, (E, +) with (I, +), where E is the set
of even integers. However, such cases are rarely met in algebra. We shall
have more to say later concerning the connection between properties of
a system and some subsystems.
We shall conclude this section with a general result on isomorphism and
extension systems which will have several applications in our work to come.
The situation usually faced in these applications is the following. We have
a system

(S, ...) = (S, F_1, F_2, ..., F_k, W_1, W_2, ..., W_l, a_1, a_2, ..., a_m)

which we find deficient in certain respects, and which we wish to extend to
a new system (S*, ...) where these deficiencies are overcome. For example,
the system (P, +) of positive integers is deficient in that we cannot in
general perform in it the operation of subtraction. This is the basic
motivation for extending P to the set of integers I, and extending + on P
to an operation + defined on I, with respect to which subtraction is always
possible.
It turns out in practice that one can usually obtain, in some way, a
system (S'*, ...) (of the same type as (S, ...)) which satisfies the newly
desired properties, but which does not directly contain (S, ...); it contains,
instead, a subsystem (S', ...) isomorphic to (S, ...). Diagrammatically,
we have the following relationship:

Figure 2.22

Here G is taken to be a function which establishes the isomorphism
(S, ...) ≅ (S', ...). The question is then whether we can find a system
(S*, ...) which is an extension of (S, ...) and which has the same
properties as the constructed system (S'*, ...). This is guaranteed by the
following result.

(2:4-9) Suppose that (S, ...), (S', ...), and (S'*, ...) are all mathematical
systems of the same type and that (S, ...) ≅ (S', ...) and
(S', ...) is a subsystem of (S'*, ...). Then we can find a system
(S*, ...) such that (S*, ...) ≅ (S'*, ...) and (S, ...) is a
subsystem of (S*, ...).

Diagrammatically, this means we can complete Fig. 2.22 as follows:

Figure 2.23

Here H is a function which will establish an isomorphism between the
system (S*, ...) to be found and the system (S'*, ...) which we have.
Moreover, H will be chosen to agree with G on S, that is, H(x) = G(x)
for each x ∈ S.
We illustrate the proof of (2:4-9) for a completely typical case, that of
a system (S, F, W, a) where F is a binary operation on S and W is a binary
relation on S; (S', F', W', a') and (S'*, F'*, W'*, a'*) are of the same type.
By the assumption (S, ...) ≅ (S', ...) of (2:4-9) we have a function G
satisfying the following conditions:

(1) G is one-to-one, with 𝒟(G) = S and ℛ(G) = S';
(2) G(a) = a';
(3) (x, y) ∈ W if and only if (G(x), G(y)) ∈ W', for all x, y ∈ S;
(4) G(F(x, y)) = F'(G(x), G(y)) for all x, y ∈ S.

By the assumption of (2:4-9) that (S', ...) is a subsystem of (S'*, ...)
we have:

(5) S' ⊆ S'*;
(6) a' = a'*;
(7) (u, v) ∈ W' if and only if (u, v) ∈ W'*, for all u, v ∈ S';
(8) F'(u, v) = F'*(u, v) for all u, v ∈ S'.

The first step in finding the required system (S*, F*, W*, a*) is to find
a set S* and a function H such that the following conditions hold:

(9) S ⊆ S*;
(10) H is one-to-one, with 𝒟(H) = S* and ℛ(H) = S'*;
(11) H(x) = G(x) for all x ∈ S.

To do this, we must determine the elements of S* − S in one-to-one
correspondence with those of S'* − S'. Note that it is not excluded that there
may be u ∈ S'* with u ∈ S as well. To ensure disjointness, consider the
elements (S, u) for u ∈ S'* − S', i.e., the elements of the cartesian product
{S} × (S'* − S'). Under the definition (S, u) = {{S}, {S, u}} of ordered
pair in Exercise 1 of the preceding section, it is seen that we never have
(S, u) ∈ S. For otherwise this would give us sets X, Y with S ∈ X,
X ∈ Y, and Y ∈ S, and such a circularity is not possible under our
conception of set (and is specifically excluded in axiomatic set theory). In
other words, S ∩ ({S} × (S'* − S')) = 0. We now take S* = S ∪
({S} × (S'* − S')), and by the disjointness of these sets, we can define
H unambiguously as follows. If x ∈ S, we take H(x) = G(x). If x ∈
{S} × (S'* − S'), there is a unique u ∈ S'* − S' with x = (S, u), by
definition of ordered pair; in this case we take H(x) = u. It is then a direct
matter to see that (9), (10), and (11) are fulfilled under these definitions.
Having (9)-(11), we find that the rest is now fairly straightforward.
We first take

(12) a* = a.

Then to see, for x, y ∈ S*, whether to put (x, y) in W*, we first determine
whether the corresponding pair (H(x), H(y)) belongs to W'*, i.e., we define

(13) (x, y) ∈ W* if and only if (H(x), H(y)) ∈ W'*, for all x, y ∈ S*.

Finally, to find, for x, y ∈ S*, what value z to ascribe to F*(x, y), we first
see what value w is given to F'*(H(x), H(y)) and then choose z so that w
corresponds to it under H. In other words, we define

(14) F*(x, y) = H^{-1}(F'*(H(x), H(y))) for all x, y ∈ S*.

It follows that

(15) H(F*(x, y)) = F'*(H(x), H(y)) for all x, y ∈ S*.

It is thus seen from (10) through (15), together with (2) and (6), that

(16) H establishes an isomorphism between (S*, F*, W*, a*) and
(S'*, F'*, W'*, a'*).

Finally, we have

(17) (S, F, W, a) is a subsystem of (S*, F*, W*, a*).

For example, to check the condition on F and F*, consider any x, y ∈ S.
Since F(x, y) ∈ S, we have

H(F(x, y)) = F'(H(x), H(y))

by (4) and (11). But by (1) and (11), H(x), H(y) ∈ S', so that

F'(H(x), H(y)) = F'*(H(x), H(y))

by (8). Hence

H(F(x, y)) = F'*(H(x), H(y)),

and

F(x, y) = H^{-1}(F'*(H(x), H(y))).

Thus F(x, y) = F*(x, y) by the definition (14) of F*. The proof of the
condition on W, W* makes similar use of (1) through (8), (11), and (13).
Since a* is chosen equal to a by (12), this completes the proof.
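
The construction in this proof can be carried out concretely for small finite
systems. The Python sketch below is a hypothetical instance, not one from
the text: the given system is {1, 2, 3} with the operation max and the
relation <, its isomorphic copy is {10, 20, 30}, and the copy sits inside
{0, 10, 20, 30, 40}. The names S1 and S1star stand for the copy and its
extension, and a frozenset plays the role of the set S used as a tag.

```python
from itertools import product

# Given system (S, F, W) and an isomorphic copy S1 inside (S1star, F1star, W1star).
S = {1, 2, 3}
F = max
W = {(x, y) for x in S for y in S if x < y}
G = {x: 10 * x for x in S}                  # isomorphism of S onto S1
S1 = set(G.values())
S1star = {0, 10, 20, 30, 40}
F1star = max
W1star = {(u, v) for u in S1star for v in S1star if u < v}

# New elements are the pairs (S, u) for u in S1star - S1, tagged by S itself.
tag = frozenset(S)
Sstar = S | {(tag, u) for u in S1star - S1}

# H agrees with G on S and sends each pair (S, u) to u.
H = dict(G)
H.update({(tag, u): u for u in S1star - S1})
H_inv = {v: k for k, v in H.items()}

def Fstar(x, y):
    """The operation on Sstar obtained by transporting F1star back along H."""
    return H_inv[F1star(H[x], H[y])]

# The relation on Sstar obtained by transporting W1star back along H.
Wstar = {(x, y) for x, y in product(Sstar, repeat=2) if (H[x], H[y]) in W1star}

H_is_bijection = (set(H) == Sstar and set(H.values()) == S1star
                  and len(H_inv) == len(H))
extends_F = all(Fstar(x, y) == F(x, y) for x, y in product(S, repeat=2))
extends_W = all(((x, y) in Wstar) == ((x, y) in W)
                for x, y in product(S, repeat=2))
print(H_is_bijection, extends_F, extends_W)
```

Tagging with the set S itself follows the proof; in a program any tag that
cannot occur as an element of S would serve the same purpose.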
CHAPTER 3

THE POSITIVE INTEGERS

3.1 Basic properties. The positive integers have two basic uses,
counting and ordering. The simplest concrete representatives of these numbers
are the series of tallies

|, ||, |||, ||||, ...

When counting a (finite) set of objects we place it in one-to-one
correspondence with one of these tallies; when ordering the set we enumerate
its elements in correspondence with the sequence of tallies, as first, second,
third, etc. The basic properties of the positive integers can be described
and developed from either of these two points of view. From the first
point of view they fall under the theory of cardinals in set theory, while
from the second they fall under the theory of ordinals. We shall begin with
the second approach and then return to the first later on.
We thus conceive of the positive integers as being compared by a certain
relation < of ordering. We say a precedes b in the ordering, or that b
succeeds or follows a if a < b. We might seek to characterize the positive
integers in terms of properties of this relation <. An even more primitive
relation is that where a immediately precedes b or b immediately succeeds a;
in the interpretation by tallies this holds when b is obtained from a by
adjoining a single tally to a. In this case we shall say that b is the successor
of a, and write b = Sc (a). Evidently, Sc is a function which can be
applied to every positive integer. What are the basic properties of this
function?
First of all, we have a positive integer, which we denote by 1, which is
not the successor of any other positive integer. Second, it is evident
that if b is a positive integer different from 1, then it is the successor of a
unique integer a. Hence if b = Sc(a_1) and b = Sc(a_2), then a_1 = a_2;
in other words, the successor function is one-to-one. Finally, we see that
every positive integer can be obtained from 1 by applying the successor
function suitably often. In more precise set-theoretical terms, this can
be expressed as follows: if a set A contains 1, and if A contains Sc(x)
whenever it contains x, then A contains all positive integers.

Peano systems and inductive proofs. We are now led to consider systems
with these three basic properties.

3.1 Definition. By a Peano system we understand a system (P, Sc, 1),
where Sc is a function on P under which P is closed and where 1 ∈ P,
such that:
(i) for all x ∈ P, Sc(x) ≠ 1;
(ii) for all x, y ∈ P, if Sc(x) = Sc(y) then x = y;
(iii) if A ⊆ P and 1 ∈ A and A has the property that whenever x ∈ A
then Sc(x) ∈ A, then A = P.

These properties are satisfied in our intuitive conception of the positive
integers.* Whoever agrees with this agrees then that there exists at least
one Peano system. However, this statement of existence cannot be
inferred from any of the general principles of set theory described in the
preceding chapter. We must thus take the position that in a mathematical
development the assumption of the existence of Peano systems has to be
taken as a basic initial hypothesis. (A simple alternative and equivalent
hypothesis, called the axiom of infinity, is described in Appendix I.)

3.2 Axiom. There exists at least one Peano system.

Other than 3.2 and the axioms of set theory referred to in Chapter 2, no
other basic assumptions will be needed in this book.
One of our first main objects will be to show that there is essentially
only one Peano system, i.e., that any two Peano systems (P, Sc, 1) and
(P', Sc', 1') are isomorphic. Let us first see that this would not be the case
if any one of the conditions 3.1 (i)—(iii) were omitted. The argument here
is informal.
* Although the study of positive integers is ancient and the axiomatic method
itself is a couple of thousand years old, an explicit axiomatic treatment of the
positive integers dates only to the late nineteenth century, beginning with the
work of the mathematicians R. Dedekind and G. Peano. The conditions set
down in 3.1 correspond directly to the axioms given by Peano.

The simplest system which satisfies conditions 3.1(ii) and (iii) but not
3.1(i) consists of a single element, which we denote by 1'. We put P' = {1'}
and Sc'(1') = 1'. A system which satisfies 3.1(i) and (iii) but not 3.1(ii)
must have at least two different elements. Here let 1', 2' be any distinct
objects, set P' = {1', 2'}, Sc'(1') = 2' and Sc'(2') = 2'. Finally, to
construct a system which satisfies 3.1(i) and (ii) but not 3.1(iii), it is seen
that we must take a domain with infinitely many objects. For in this
domain we must have at least the elements 1', Sc'(1'), Sc'(Sc'(1')), ...;
by 3.1(i), each of the elements past 1' must be distinct from 1'. Further,
we have by 3.1(ii) that if Sc'(Sc'(1')) = Sc'(1'), then Sc'(1') = 1',
contradicting 3.1(i), so Sc'(Sc'(1')) is also distinct from Sc'(1'). Similarly it
is seen that each of the elements in this sequence is distinct from all other
terms of the sequence. However, nothing prevents there being other
elements in the domain. Let us introduce a new element b distinct from all
the elements 1', Sc'(1'), Sc'(Sc'(1')), ...; further introduce new distinct
elements Sc'(b), Sc'(Sc'(b)), ... Let P' consist of all elements obtained
from 1', b by repeated application of Sc'. That this system (P', Sc', 1')
does not satisfy 3.1(iii) is seen by taking A to consist only of the elements
1', Sc'(1'), Sc'(Sc'(1')), ...
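
The two finite counterexamples can be checked mechanically. In the Python
sketch below, the helper check_peano_conditions is a hypothetical name
introduced for illustration; it reports which of 3.1(i)-(iii) hold in a
finite system. For (iii) it uses the observation that the condition holds
exactly when the elements reachable from the distinguished element by
repeated application of Sc make up all of P. The third counterexample is
necessarily infinite and so is not represented here.

```python
def check_peano_conditions(P, Sc, one):
    """Report which of 3.1(i), (ii), (iii) hold for a finite system (P, Sc, one).

    Sc is given as a dict defined on every element of P."""
    cond_i = all(Sc[x] != one for x in P)
    cond_ii = all(x == y for x in P for y in P if Sc[x] == Sc[y])
    # Smallest subset containing `one` and closed under Sc.
    A, current = {one}, one
    while Sc[current] not in A:
        current = Sc[current]
        A.add(current)
    cond_iii = A == set(P)
    return cond_i, cond_ii, cond_iii

# One element with Sc'(1') = 1': fails (i) only.
print(check_peano_conditions({"1'"}, {"1'": "1'"}, "1'"))
# (False, True, True)

# Two elements with Sc'(1') = 2', Sc'(2') = 2': fails (ii) only.
print(check_peano_conditions({"1'", "2'"}, {"1'": "2'", "2'": "2'"}, "1'"))
# (True, False, True)
```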
Observe that in this last example of a system which is not a Peano
system we have an element b which is not the successor of any element c.
That this cannot happen in a Peano system, except when b = 1, is seen
from the following theorem.

3.3 Theorem. Let (P, Sc, 1) be a Peano system. Then for any x ∈ P,
either x = 1, or there is a y ∈ P with x = Sc(y); moreover, in the
latter case, y is unique.

Proof. By the example of the preceding paragraph it is clear that in
order to prove this theorem we must make essential use of 3.1(iii). Let

A = {x: x ∈ P and x = 1 or for some y ∈ P, x = Sc(y)}.

We wish to show that A = P. Clearly A ⊆ P and 1 ∈ A. Further,
suppose that x ∈ A; we wish to show that Sc(x) ∈ A, i.e., that Sc(x) = 1
or for some z ∈ P, Sc(x) = Sc(z). The first possibility is excluded by
3.1(i), but the second possibility is trivially satisfied by z = x. Hence
by 3.1(iii) we obtain the desired conclusion. To prove the uniqueness,
suppose that x = Sc(y) and x = Sc(z); then y = z by 3.1(ii).
The first part of the proof of 3.3 is the simplest example of what is
called a proof by induction. The typical situation is one in which we wish
to show that a certain condition a(x) is satisfied by all x ∈ P, where
(P, Sc, 1) is a Peano system. Form the set A = {x: x ∈ P and a(x)}.
Then we wish to show P ⊆ A; since A ⊆ P by definition, this amounts to
showing that A = P. If the proof of the latter is given by 3.1(iii), we
have an example of a proof by induction. What is essentially involved in
such a proof are the following two steps:
(i) show a(1) holds;
(ii) show that whenever x ∈ P and a(x) holds then a(Sc(x)) holds.
We shall make use of such proofs quite often in connection with the
positive integers.

Functions on Peano systems. We have not so far discussed the operations
of addition, multiplication, exponentiation, etc., on the positive integers.
We could try to search out basic properties of these to be adjoined to those
for any Peano system. However, there is a uniform procedure for
characterizing these operations and reducing their properties to the properties
of Peano systems. Suppose, for example, that we already understand
the operation of multiplication. How do we go about computing the result
of exponentiation, 2^a, for any a ∈ P? Intuitively this is to be a product of
2 by itself a times. Thus 2^1 = 2, 2^2 = 2 · 2, 2^3 = (2 · 2) · 2 = 2^2 · 2,
2^4 = ((2 · 2) · 2) · 2 = 2^3 · 2; in general, if a is the successor of a number
b, a = Sc(b), we compute 2^a by first computing 2^b and then multiplying
by 2, i.e.,

2^{Sc(b)} = 2^b · 2.

By repeating this procedure we will eventually reduce the computation
of 2^a to the computation of 2^1, which we know, by definition, to be 2.
Thus it would seem that the rules

(3:1-1) 2^1 = 2,
        2^{Sc(x)} = 2^x · 2, for all x ∈ P,

would completely define 2^a for any a ∈ P. Note that these rules do not
constitute an explicit definition of 2^a, but only provide us with a systematic
procedure to calculate 2^a. Intuitively, we must get a unique value from
this calculation for any a. This can be put in more general terms as follows.
We suppose that we have a known function G available, in this case one
which gives us multiplication by 2, G(x) = x · 2 for any x ∈ P. We wish
to find a function F(x) (= 2^x) which has a given value c at x = 1, in this
case c = 2. Further, F(Sc(x)) is related to F(x) for any x ∈ P by the use
of the function G, F(Sc(x)) = F(x) · 2 = G(F(x)). The question is
then, given any number c ∈ P, and any function G on P with values in
P, does there exist a function F with domain P such that:

(3:1-2) F(1) = c,
        F(Sc(x)) = G(F(x)), for all x ∈ P.

Further, can there be more than one such function, or is F uniquely
determined? The intuitive reasons for the existence and uniqueness of
such a function are clear. However, a formal proof, such as the one we
shall give in the next theorem, is somewhat more involved. (The student
may do well to go over this proof twice: first to get the general idea and
again to see the necessity of the various details. He may, alternatively,
find it better to skip at this point and then return to it after studying the
applications of the theorem in the remainder of this chapter.) In this
theorem we shall consider also a slightly more general situation: we seek
a function F with domain P and range a subset of a certain set S; in order
that G(F(x)) be defined, G must then be a function with domain S.
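
The calculating procedure described by rules of this kind is easy to express
as a program. The Python sketch below assumes that the positive integers are
represented by Python integers with Sc(x) = x + 1; the name
define_by_recursion is hypothetical. It illustrates only the intuitive
procedure, not the set-theoretical existence proof.

```python
def define_by_recursion(c, G):
    """Return F with F(1) = c and F(Sc(x)) = G(F(x)), where Sc(x) = x + 1."""
    def F(x):
        if x < 1:
            raise ValueError("F is defined only on the positive integers")
        value = c                     # F(1) = c
        for _ in range(x - 1):        # one application of G per successor step
            value = G(value)
        return value
    return F

# c = 2 and G(y) = y * 2 give F(x) = 2^x.
power_of_two = define_by_recursion(2, lambda y: y * 2)
print([power_of_two(a) for a in range(1, 6)])   # [2, 4, 8, 16, 32]
```

Other choices of c and G give other functions; for instance c = 7 with
G(y) = y + 3 yields 7, 10, 13, 16, ...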

3.4 Theorem. Suppose that S is a set, c ∈ S, and that G is any function
with 𝒟(G) = S, ℛ(G) ⊆ S. Suppose further that (P, Sc, 1) is a Peano
system. Then there exists a unique function F satisfying the following
conditions:

(i) 𝒟(F) = P and ℛ(F) ⊆ S;
(ii) F(1) = c;
(iii) for all x ∈ P, F(Sc(x)) = G(F(x)).

Proof. The proof falls into two parts; we first show that there exists
at least one function F satisfying the conditions (i)-(iii), and we then
show that any two such functions are identical.

Part 1. The desired function is a certain relation W ⊆ P × S. It is
to have the properties
(ii)' (1, c) ∈ W,
(iii)' if (x, y) ∈ W then (Sc(x), G(y)) ∈ W.

[The latter is just another way of expressing (iii), that if F(x) = y then
F(Sc(x)) = G(y).] However, there are many relations which satisfy these
conditions; one such is P × S. What distinguishes the desired function
from all these other relations is that we want (a, b) to be in it only as
required by (ii)', (iii)'. In other words, it is to be the smallest relation
satisfying (ii)', (iii)'. This can be described precisely as follows:

(1) Let M be the collection of all relations W satisfying (ii)', (iii)';
then we define
F = ∩ W [W ∈ M].
Hence

(2) whenever W ∈ M then F ⊆ W.

We shall now show that we can derive from (1) that F is also one of the
relations in M.

(3) (1, c) ∈ F.

This follows immediately from the definition of ∩ and the fact that
(1, c) ∈ W for all W ∈ M.

(4) If (x, y) ∈ F then (Sc(x), G(y)) ∈ F.

For if (x, y) ∈ F then (x, y) ∈ W for all W ∈ M; hence by (iii)',
(Sc(x), G(y)) ∈ W for all W ∈ M, so that (Sc(x), G(y)) ∈ F by (1).
We must now verify that F is actually a function, i.e., we wish to show
that for any x ∈ P and z_1, z_2 ∈ S, if (x, z_1) ∈ F and (x, z_2) ∈ F, then
z_1 = z_2.

We shall prove this by induction on x. Let

(5) A = {a:: x G P and for all zlt z2 G P, if (x, zi) G F and


(x, z2) G F then zx = z2}.

We shall show A = P by applying 3.1 (iii). First we have

(6) IgA.

To prove (6), it suffices to show that for any z, if (1, z) G F then z = c.


We prove this by contradiction; in other words, suppose to the contrary
that there is some 2 with (1, z) G F but z ^ c. Consider the relation
W = F — {(1, 2)}. Since (1, c) G F and (1, c) 7^ (1, 2), it follows that
(1, c) G IF. Moreover, whenever {u, y) G W then (u, y) G F and hence
(Sc(u), G(y)) G F; but Sc(u) 1, so (Sc(it), G(y)) 7^ (1,2), and hence
(Sc(it), G(y)) G W. Thus W satisfies both conditions (ii)', (iii)'; in other
words, W G M. But then it follows from (2) that F c W; however this
is clearly false since (1, 2) G F and (1,2) £ W. Thus our hypothesis has
led us to a contradiction, and hence (6) is proved. Next we show that

(7) if x ∈ A then Sc(x) ∈ A.

Suppose that x ∈ A, so that whenever (x, z1) ∈ F and (x, z2) ∈ F then
z1 = z2. We must show that whenever (Sc(x), w1) ∈ F and (Sc(x), w2) ∈ F
then w1 = w2. To prove this, it suffices to show that

(8) whenever (Sc(x), w) ∈ F then there exists a z with w = G(z) and
(x, z) ∈ F.

For if (8) is true, we would have for the given w1, w2 some z1, z2 with
w1 = G(z1), w2 = G(z2), (x, z1) ∈ F and (x, z2) ∈ F. Then, since
x ∈ A, z1 = z2 and hence G(z1) = G(z2), that is, w1 = w2. Now to
prove (8) suppose, to the contrary, that it is not true; in other words,
suppose that we have some w with (Sc(x), w) ∈ F but such that for all
z for which (x, z) ∈ F we have w ≠ G(z). Consider the relation W =
F - {(Sc(x), w)}. We shall show that W ∈ M. First of all (1, c) ∈ F
and (1, c) ≠ (Sc(x), w); hence (1, c) ∈ W. Suppose that (u, y) ∈ W;
then (u, y) ∈ F and (Sc(u), G(y)) ∈ F. Clearly if u ≠ x then (Sc(u),
G(y)) ≠ (Sc(x), w) [by 3.1(ii)], so that in this case (Sc(u), G(y)) ∈ W.
On the other hand, if u = x and (Sc(u), G(y)) = (Sc(x), w), then
w = G(y), where (x, y) ∈ F, contrary to the choice of w; hence (Sc(u),
G(y)) ≠ (Sc(x), w), so again (Sc(u), G(y)) ∈ W. Thus whenever
(u, y) ∈ W, also (Sc(u), G(y)) ∈ W. Now that we have shown W ∈ M
we see by (2) that F ⊆ W; but this is false since (Sc(x), w) ∈ F and
(Sc(x), w) ∉ W. Thus our hypothesis that (8) is incorrect has led to a
contradiction, and now (8) is proved. Since (7) follows from (8), we have
by induction from (6) that A = P. Hence

(9) F is a function.

We have still to prove that F satisfies condition (i); we must show that
for each x ∈ P there is a y with (x, y) ∈ F. Since F ⊆ P × S, it will
then follow that 𝒟(F) = P and ℛ(F) ⊆ S. Let B = 𝒟(F), that is,

(10) B = {x: x ∈ P and for some y, (x, y) ∈ F}.

We prove by induction that B = P. First, 1 ∈ B, since (1, c) ∈ F by (3).
Next, if x ∈ B, pick some y with (x, y) ∈ F; then by (4), (Sc(x), G(y)) ∈ F,
and hence Sc(x) ∈ B.
This concludes the first part of the proof, that there is at least one
function F satisfying conditions (i)-(iii).

Part 2. It is much easier to prove that there cannot be more than one
such function. Suppose that F1, F2 both satisfy the conditions (i)-(iii);
we wish to show F1 = F2, i.e., that for all x ∈ P, F1(x) = F2(x). This
is proved by induction on x. By (ii), F1(1) = c and F2(1) = c, so
F1(1) = F2(1). Suppose that F1(x) = F2(x); then F1(Sc(x)) = G(F1(x))
and F2(Sc(x)) = G(F2(x)), so

F1(Sc(x)) = F2(Sc(x)).

As we shall see, this is a fundamental theorem in developing the
properties of the positive integers. It is occasionally useful to consider the
following slightly more general formulation of 3.4. We omit the proof
of it, for it can be given by simple modifications of the proof of 3.4.

3.4' Theorem. Let (P, Sc, 1) be a Peano system. Suppose that S is a set,
c ∈ S, and that G is a binary function with 𝒟(G) = P × S and
ℛ(G) ⊆ S. Then there is a unique function F satisfying the following
conditions:

(i) 𝒟(F) = P and ℛ(F) ⊆ S;
(ii) F(1) = c;
(iii) for all x ∈ P, F(Sc(x)) = G(x, F(x)).
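
The content of 3.4 and 3.4' can be made concrete by a small computation. The following Python sketch is an illustration only and is not part of the formal development. It assumes the model in which P is the set of Python integers 1, 2, 3, ... with Sc(x) = x + 1; the names succ, recurse, and recurse_with_argument are hypothetical. Each returned F starts from c at 1 and applies G once for each successor step, which is how conditions (ii) and (iii) determine its values.

```python
def succ(x):
    """Sc in the assumed model: P is the Python ints 1, 2, 3, ..."""
    return x + 1


def recurse(c, G):
    """Return F with F(1) = c and F(Sc(x)) = G(F(x)), as in 3.4."""
    def F(x):
        if not isinstance(x, int) or x < 1:
            raise ValueError("x must be a positive integer")
        value, k = c, 1
        while k != x:          # climb from 1 to x by successor steps
            value = G(value)   # condition (iii) of 3.4
            k = succ(k)
        return value
    return F


def recurse_with_argument(c, G):
    """Return F with F(1) = c and F(Sc(x)) = G(x, F(x)), as in 3.4'."""
    def F(x):
        if not isinstance(x, int) or x < 1:
            raise ValueError("x must be a positive integer")
        value, k = c, 1
        while k != x:
            value = G(k, value)  # G also receives the current argument k
            k = succ(k)
        return value
    return F


# S need not be P: here S is the set of strings and c = "|".
tally = recurse("|", lambda s: s + "|")

# 3.4' with S = P: T(1) = 1 and T(Sc(x)) = T(x) + Sc(x).
triangular = recurse_with_argument(1, lambda x, t: t + succ(x))

print(tally(4))       # ||||
print(triangular(4))  # 10
```

The first example takes S to be a set of strings, to stress that the theorem places no restriction on S beyond c ∈ S and ℛ(G) ⊆ S.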

Isomorphism of Peano systems. We are now in a position to prove the
second fundamental theorem, which shows that there is essentially only
one Peano system.

3.5 Theorem. Any two Peano systems are isomorphic.

Proof. Suppose that (P, Sc, 1) and (P', Sc', 1') are Peano systems.
We want to find a function F with the following properties:

(1) F is one-to-one;
(2) 𝒟(F) = P and ℛ(F) = P';
(3) F(1) = 1';
(4) for any x ∈ P, F(Sc(x)) = Sc'(F(x)).

[That these are the requirements for ≅ is seen by taking F for G in (2:4-3).]
Apply 3.4 with P' as S, 1' as c, and Sc' as G. Then the function F so
obtained already satisfies (3) and (4) and the first part of (2). Further,
ℛ(F) ⊆ P'. To prove that ℛ(F) = P', we must show that for any y ∈ P'
there exists an x ∈ P with F(x) = y. Let

(5) A = {y: y ∈ P' and for some x ∈ P, F(x) = y}.

We shall show that A = P' by induction in P' [i.e., we apply 3.1(iii) to
(P', Sc', 1')]. Clearly 1' ∈ A. Suppose that y ∈ A, so that y = F(x),
where x ∈ P. Then Sc'(y) = Sc'(F(x)) = F(Sc(x)) by (4), and therefore
also Sc'(y) ∈ A. Thus the induction, and hence (2), is proved. Now let

(6) B = {x: x ∈ P and for all z ∈ P, if F(z) = F(x) then z = x}.

If we can show that B = P then (1) will be established. We again apply
induction, this time in P. First we must show that

(7) for all z ∈ P, if F(z) = 1' then z = 1.

Suppose, to the contrary, that z ∈ P, F(z) = 1' but z ≠ 1. By 3.3
there is a w ∈ P with z = Sc(w), hence F(z) = Sc'(F(w)) by (4). But
then 1' is Sc' of an element of P', contrary to the fact that (P', Sc', 1') is
a Peano system. Now suppose that x ∈ B, i.e., that

(8) for all z ∈ P, if F(z) = F(x) then z = x.

We wish to show that Sc(x) ∈ B, i.e., that

(9) for all w ∈ P, if F(w) = F(Sc(x)) then w = Sc(x).

Let w ∈ P, F(w) = F(Sc(x)) = Sc'(F(x)). Here w cannot be 1, by the
same argument as before. Hence for some z ∈ P, w = Sc(z) and F(w) =
Sc'(F(z)). Then since (P', Sc', 1') is a Peano system and Sc'(F(z)) =
Sc'(F(x)), we have F(z) = F(x). Hence, by the hypothesis (8), also
z = x. This implies that Sc(z) = Sc(x), that is, w = Sc(x).
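
The construction in this proof can be observed on two concrete systems. The sketch below is only an illustration under stated assumptions: the first system is taken to be the Python integers 1, 2, 3, ... with Sc(x) = x + 1, and the second is taken to be the strings of tally marks "|", "||", ... with Sc' appending one mark. Both choices, and all names in the code, are hypothetical. The function F is the one supplied by 3.4 with S = P', c = 1', G = Sc', and the properties (1), (3), (4) are checked on a finite sample only, which of course proves nothing about all of P.

```python
def sc(x):
    """Sc in the first system: Python ints 1, 2, 3, ..."""
    return x + 1


def sc_prime(s):
    """Sc' in the second system: tally strings "|", "||", ..."""
    return s + "|"


ONE, ONE_PRIME = 1, "|"


def F(x):
    """The map given by 3.4 with S = P', c = 1', G = Sc'."""
    value, k = ONE_PRIME, ONE
    while k != x:
        value, k = sc_prime(value), sc(k)
    return value


sample = range(1, 51)
images = [F(x) for x in sample]

# (3) and (4): F(1) = 1' and F(Sc(x)) = Sc'(F(x)).
preserves_one = F(ONE) == ONE_PRIME
preserves_sc = all(F(sc(x)) == sc_prime(F(x)) for x in sample)
# (1) on the sample: distinct arguments have distinct images.
one_to_one = len(set(images)) == len(images)

print(preserves_one, preserves_sc, one_to_one)  # True True True
```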
We have already argued that the properties which define a Peano
system are all correct in our intuitive conception of the positive integers.
Theorem 3.5 now shows that these properties completely characterize the
positive integers, at least up to isomorphism. Thus when dealing with the
algebraic properties of the positive integers, we need only consider some
particular Peano system. By formal hypothesis, 3.2, there exists at least
one such system.

3.6 Convention. Throughout the following we shall assume that (P, Sc, 1)
is some fixed Peano system. We shall call P the set of positive integers.

Thus when we use the symbols P, Sc, 1 in the following, we can omit the
explicit statement that these form a Peano system.

Exercise Group 3.1

For the purpose of this set of exercises, we temporarily put ourselves prior to
the proof of 3.4. Thus assume below that (P, Sc, 1) is any Peano system.

1. Prove that for all x ∈ P, Sc(x) ≠ x.

2. Alternative proof of 3.4. Call B a segment of P if
(i) B ⊆ P;
(ii) 1 ∈ B;
(iii) whenever Sc(y) ∈ B then y ∈ B.

Let, for each x ∈ P, M_x = {B: B is a segment of P and x ∈ B}. Then
define [1, x] = ∩B [B ∈ M_x]. (Intuitively, [1, x] is the set of all y ∈ P
with y ≤ x.) Prove the following statements.
(a) If x ∈ P then [1, x] is a segment of P and x ∈ [1, x].
(b) [1, 1] = {1}.
(c) If x ∈ P then [1, Sc(x)] = [1, x] ∪ {Sc(x)}.
(d) If x ∈ P then Sc(x) ∉ [1, x].

Suppose that we are given S, c, G satisfying the hypothesis of 3.4. Call
H a suitable partial function for [1, x] if the following conditions are satisfied:
(i) 𝒟(H) = [1, x] and ℛ(H) ⊆ S;
(ii) H(1) = c;
(iii) if Sc(y) ∈ [1, x] then H(Sc(y)) = G(H(y)).

Prove:

(e) For each x ∈ P there is at least one suitable partial function for [1, x].
(f) For each x ∈ P there is at most one suitable partial function for [1, x].

By (e), (f) there is for each x ∈ P a unique suitable partial function for
[1, x]; denote this function by H_x. Prove:
(g) For any x ∈ P and y ∈ [1, x], H_Sc(x)(y) = H_x(y).

Finally, define a function F with domain P such that F(x) = H_x(x) for
each x ∈ P. Then prove that
(h) F is a function satisfying 3.4(i)-(iii).

We thus obtain an alternative proof of part 1 of the proof of 3.4. Part 2
remains the same.

3.2 The arithmetic of positive integers. Recursive definitions. We have
already indicated that the operations of addition, multiplication,
exponentiation, etc., can be reduced in some sense to the basic operation of
successor. Consider first the sum a + b of two positive integers. If b is 1
this is found simply as Sc(a). If b is not 1, b = Sc(c) for a unique c; if
we already know the value of a + c, we can compute that of a + b by
simply adding 1, that is, a + b = Sc(a + c). In general, then, we want
for any x, y ∈ P,

(3:2-1) x + 1 = Sc(x),
        x + Sc(y) = Sc(x + y).

Thus we are seeking a function F1(x, y) such that for any x, y ∈ P

(3:2-2) F1(x, 1) = Sc(x),
        F1(x, Sc(y)) = Sc(F1(x, y)).

Similarly, once we have addition, we would want to define multiplication,
x · y, in such a way that

(3:2-3) x · 1 = x,
        x · Sc(y) = (x · y) + x,

since x · Sc(y) = x · (y + 1). Here we seek a function F2 such that for
any x, y ∈ P

(3:2-4) F2(x, 1) = x,
        F2(x, Sc(y)) = F2(x, y) + x.

The question of existence of such functions as F1, F2 is clearly related to
the theorem 3.4. The only difference is that here we seek functions of two
variables, i.e., with domain P × P. For simplicity of formulation, we
restrict ourselves to the case that the range of the desired function is also
contained in P. The following theorem will give us a general existence
statement under which the arithmetical operations can be subsumed.

3.7 Theorem. Suppose that G, H are functions with 𝒟(G) = P × P,
ℛ(G) ⊆ P, 𝒟(H) = P, ℛ(H) ⊆ P. Then there is a unique function F
satisfying the following conditions:
(i) 𝒟(F) = P × P and ℛ(F) ⊆ P;
(ii) for any x ∈ P, F(x, 1) = H(x);
(iii) for any x, y ∈ P, F(x, Sc(y)) = G(x, F(x, y)).

Proof. For any x ∈ P put

(1) c_x = H(x),

(2) G_x(z) = G(x, z),

i.e., G_x is a function with

(3) 𝒟(G_x) = P, ℛ(G_x) ⊆ P.

By 3.4 it follows that for each x ∈ P there is a unique function, which we
denote by F_x, satisfying the following conditions:

(4) (i) 𝒟(F_x) = P, ℛ(F_x) ⊆ P;
    (ii) F_x(1) = c_x;
    (iii) for all y ∈ P, F_x(Sc(y)) = G_x(F_x(y)).

Then define

(5) for all x, y ∈ P, F(x, y) = F_x(y), where F_x is the unique function
satisfying (4).

It is then seen that F is a function satisfying the desired conditions (i)-(iii)
of our theorem. To prove that F is unique, suppose that F' is any other
function satisfying these conditions. Then it is easily proved by
induction on y that for all y ∈ P and for all x ∈ P, F(x, y) = F'(x, y).
Alternatively, if we denote the value F'(x, y) by F'_x(y), for each x ∈ P, we see
that F'_x satisfies 3.4(i)-(iii) with c_x, G_x in place of c, G. Since there is
only one such function by 3.4, F'_x(y) = F_x(y) for all y ∈ P, hence
F'(x, y) = F(x, y) for all x, y ∈ P.
As with the modification of 3.4 to 3.4', we can formulate a slightly
more general statement here, whose proof is easily obtained from 3.4' as
3.7 was just obtained from 3.4.

3.7' Theorem. Suppose that G, H are functions with 𝒟(G) = P × P × P,
ℛ(G) ⊆ P, 𝒟(H) = P, ℛ(H) ⊆ P. Then there is a unique function F
satisfying the following conditions:

(i) 𝒟(F) = P × P and ℛ(F) ⊆ P;
(ii) for all x ∈ P, F(x, 1) = H(x);
(iii) for all x, y ∈ P, F(x, Sc(y)) = G(x, y, F(x, y)).

Similarly, for each fixed positive integer m, we can prove the existence
and uniqueness of functions of m + 1 variables satisfying conditions
such as:

(ii)' for all x1, x2, ..., xm ∈ P,
      F(x1, x2, ..., xm, 1) = H(x1, x2, ..., xm);

(iii)' for all x1, x2, ..., xm, y ∈ P,
      F(x1, x2, ..., xm, Sc(y)) = G(x1, x2, ..., xm, y, F(x1, x2, ..., xm, y)).

Any operation which is introduced as the unique function satisfying such
conditions is said to be given by a recursive definition. Since these are not
explicit definitions, without the proof of either the existence or the
uniqueness of such a function, this could not be regarded as a definition at all.
There are many other schemes besides those described here for which we
can prove the existence of unique functions on the positive integers
satisfying the associated conditions. The ones we have described are the
simplest and for this reason are often called primitive recursive definitions.
The even simpler class of definitions of the form 3.7 will be sufficient for
most of our purposes.
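
The two schemes 3.7 and 3.7' can be sketched as procedures that build F from given H and G. This is an illustration under the assumption that P is modelled by the Python integers 1, 2, 3, ... with Sc(x) = x + 1; the names recursion_3_7 and recursion_3_7_prime are hypothetical, and the use of Python's own + inside the second example is a convenience of the sketch, not something the schemes provide.

```python
def succ(x):
    """Sc in the assumed model of P."""
    return x + 1


def recursion_3_7(H, G):
    """Return F with F(x, 1) = H(x) and F(x, Sc(y)) = G(x, F(x, y))."""
    def F(x, y):
        if x < 1 or y < 1:
            raise ValueError("arguments must be positive integers")
        value, k = H(x), 1
        while k != y:
            value = G(x, value)
            k = succ(k)
        return value
    return F


def recursion_3_7_prime(H, G):
    """Return F with F(x, 1) = H(x) and F(x, Sc(y)) = G(x, y, F(x, y))."""
    def F(x, y):
        if x < 1 or y < 1:
            raise ValueError("arguments must be positive integers")
        value, k = H(x), 1
        while k != y:
            value = G(x, k, value)  # the step may depend on k as well
            k = succ(k)
        return value
    return F


# Scheme 3.7 with H(x) = Sc(x), G(x, z) = Sc(z): the conditions (3:2-2).
F1 = recursion_3_7(succ, lambda x, z: succ(z))

# Scheme 3.7' with H(x) = x, G(x, y, z) = z + y: the step depends on y.
F_sum = recursion_3_7_prime(lambda x: x, lambda x, y, z: z + y)

print(F1(3, 4))      # 7
print(F_sum(10, 4))  # 16
```

In the second example F_sum(10, 4) unfolds as 10, 10 + 1, 11 + 2, 13 + 3, which is a recursion that the simpler scheme 3.7 does not express directly, since there G is not given the argument y.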

Addition of positive integers.

3.8 Definition. We denote by x + y the value F(x, y) of the unique
binary function F determined by 3.7 when we take H(x) = Sc(x) and
G(x, z) = Sc(z). Thus:

(i) + is a binary operation on P under which P is closed;
(ii) for any x ∈ P, x + 1 = Sc(x);
(iii) for any x, y ∈ P, x + Sc(y) = Sc(x + y).

Various familiar properties of addition, such as commutativity,
associativity, etc., can be derived using this definition, i.e., require no additional
axioms concerning positive integers. A few attempts show the proper
order in which these should be obtained.

3.9 Theorem. (Associative law for +) For any x, y, z ∈ P,

x + (y + z) = (x + y) + z.

Proof. By induction on z. Consider any x, y ∈ P. Let

(1) A = {z: z ∈ P and x + (y + z) = (x + y) + z}.

Then

(2) 1 ∈ A,

since
x + (y + 1) = x + Sc(y)     by 3.8(ii)
            = Sc(x + y)     by 3.8(iii)
            = (x + y) + 1   by 3.8(ii).
Also

(3) if z ∈ A then Sc(z) ∈ A.

For suppose that z ∈ A. Then

x + (y + Sc(z)) = x + Sc(y + z)     by 3.8(iii)
                = Sc(x + (y + z))   by 3.8(iii)
                = Sc((x + y) + z)   since z ∈ A
                = (x + y) + Sc(z)   by 3.8(iii).

By (2), (3), and 3.1(iii), A = P, and the theorem is proved.

Remark. In this inductive proof, the variables x, y acted as parameters,
i.e., were fixed throughout the proof. The proof could also be made without
parameters, by considering the set

B = {z: z ∈ P and for all x, y ∈ P, x + (y + z) = (x + y) + z}.

Both proofs are equally acceptable, but sometimes a proof with parameters
will not work while the other proof will (cf. again the proofs of 3.4, 3.5).
The student can easily justify the separate steps in the following proofs
in the same style as in 3.9.

3.10 Lemma. For any x ∈ P, x + 1 = 1 + x.

Proof. Let

(1) A = {x: x ∈ P and x + 1 = 1 + x}.

Obviously

(2) 1 ∈ A.

Also

(3) if x ∈ A then Sc(x) ∈ A.

For suppose that x ∈ A. Then

Sc(x) + 1 = (x + 1) + 1 = (1 + x) + 1
          = 1 + (x + 1)   by 3.9
          = 1 + Sc(x).

Hence A = P.

3.11 Theorem. (Commutative law for +) For any x, y ∈ P,

x + y = y + x.

Proof. Let x ∈ P. Let

(1) A = {y: y ∈ P and x + y = y + x}.

Then

(2) 1 ∈ A

by 3.10. Also

(3) if y ∈ A then Sc(y) ∈ A.

For suppose that y ∈ A. Then

x + Sc(y) = Sc(x + y) = Sc(y + x)   since y ∈ A
          = y + Sc(x)
          = y + (x + 1)
          = y + (1 + x)   by 3.10
          = (y + 1) + x   by associative law
          = Sc(y) + x.

3.12 Theorem. (Cancellation law for +) For any x, y, z ∈ P, if x + z =
y + z then x = y.

Proof. Let x, y ∈ P. Let

(1) A = {z: z ∈ P and if x + z = y + z then x = y};

(2) 1 ∈ A.

For suppose that x + 1 = y + 1, that is, Sc(x) = Sc(y). Then x = y
by 3.1(ii).

(3) If z ∈ A then Sc(z) ∈ A.

For suppose that z ∈ A, and suppose that x + Sc(z) = y + Sc(z), that
is, Sc(x + z) = Sc(y + z). Then x + z = y + z by 3.1(ii). Hence
x = y by inductive hypothesis (z ∈ A).

3.13 Theorem. For any x, y ∈ P, y ≠ x + y.

The proof of this is left as an exercise.

3.14 Theorem. (Trichotomy law for +) For any x, y ∈ P, one of the
following three cases holds:

(i) x = y;
(ii) for some u ∈ P, x = y + u;
(iii) for some v ∈ P, y = x + v.

Moreover, no two of these cases can hold simultaneously. Finally, the
u in case (ii) and the v in case (iii) are unique.

Proof. We begin with a proof of the last statements. Suppose that
(i), (ii) were both true. Then we would have x = x + u, which contradicts
3.13. Similarly (i), (iii) cannot both be true. Suppose that (ii), (iii) were
both true. Then we would have x = (x + v) + u = x + (v + u), by
associativity, which would again contradict 3.13. The uniqueness of u in
(ii) and of v in (iii) follows from the cancellation law for +.
Now to prove that at least one of the three possibilities always holds,
consider any y ∈ P; we shall proceed by induction on x. That is, let

(1) A = {x: x ∈ P and x = y or for some u ∈ P, x = y + u
or for some v ∈ P, y = x + v};
(2) 1 ∈ A.

For either 1 = y or 1 ≠ y. If 1 ≠ y then y = Sc(v) for some v, by 3.3;
i.e., y = v + 1 = 1 + v. Hence either the first or third case holds for 1, y.

(3) If x ∈ A then Sc(x) ∈ A.

Suppose that x ∈ A. It is possible that x = y; then Sc(x) = Sc(y) =
y + 1, so case (ii) holds for Sc(x), y. It is possible that x = y + u for some
u; then Sc(x) = Sc(y + u) = y + Sc(u), so again case (ii) holds for
Sc(x), y. Finally, it may be that y = x + v. Here, if v = 1 then y =
Sc(x), so that case (i) holds for Sc(x), y. Otherwise, v = Sc(w) for some
w; then

y = x + Sc(w) = x + (w + 1) = (x + 1) + w = Sc(x) + w,

so that in this case, (iii) holds for Sc(x), y. This proves (3) and the
induction is complete.
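
The inductive argument just given is constructive: it tells us how to pass from the case holding for x, y to the case holding for Sc(x), y. The following Python sketch follows steps (2) and (3) literally. It assumes the model of P as the Python integers 1, 2, 3, ... with Sc(x) = x + 1, and the names trichotomy and pred are hypothetical. The returned pair names the case of 3.14 together with the witness u or v.

```python
def succ(x):
    """Sc in the assumed model of P."""
    return x + 1


def pred(x):
    """The w with x = Sc(w); it exists for x != 1 by 3.3."""
    if x == 1:
        raise ValueError("1 is not a successor")
    return x - 1


def trichotomy(x, y):
    """Return ("i", None) if x = y, ("ii", u) if x = y + u,
    ("iii", v) if y = x + v."""
    # Step (2): decide the case for 1, y.
    case = ("i", None) if y == 1 else ("iii", pred(y))
    k = 1
    # Step (3): pass from the case for k, y to the case for Sc(k), y.
    while k != x:
        kind, witness = case
        if kind == "i":
            case = ("ii", 1)               # Sc(k) = y + 1
        elif kind == "ii":
            case = ("ii", succ(witness))   # Sc(k) = y + Sc(u)
        elif witness == 1:
            case = ("i", None)             # y = k + 1 = Sc(k)
        else:
            case = ("iii", pred(witness))  # v = Sc(w) and y = Sc(k) + w
        k = succ(k)
    return case


print(trichotomy(3, 3))  # ('i', None)
print(trichotomy(5, 2))  # ('ii', 3)
print(trichotomy(2, 7))  # ('iii', 5)
```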

Multiplication of positive integers. We now turn to the definition of
multiplication and the verification of various of its elementary properties.

3.15 Definition. We denote by x · y the value F(x, y) of the unique binary
function F determined by 3.7 when we take H(x) = x, G(x, z) = z + x.
Thus:
(i) · is a binary operation on P under which P is closed;
(ii) for any x ∈ P, x · 1 = x;
(iii) for any x, y ∈ P, x · Sc(y) = (x · y) + x.

3.16 Theorem. (Left distributive law for · over +) For any x, y, z ∈ P,

x · (y + z) = (x · y) + (x · z).

Proof. Let x, y ∈ P. We proceed by induction on z.

(1) x · (y + 1) = (x · y) + (x · 1).

For x · (y + 1) = x · Sc(y) = (x · y) + x = (x · y) + (x · 1).

(2) If x · (y + z) = (x · y) + (x · z)

then

x · (y + Sc(z)) = (x · y) + (x · Sc(z)).

For, under the hypothesis,

x · (y + Sc(z)) = x · Sc(y + z) = [x · (y + z)] + x
                = [(x · y) + (x · z)] + x
                = (x · y) + [(x · z) + x]
                = (x · y) + (x · Sc(z)).

This is easily converted into an inductive proof of the form 3.1(iii).

3.17 Theorem. (Right distributive law for · over +) For any x, y, z ∈ P,

(x + y) · z = (x · z) + (y · z).

Proof. Let x, y ∈ P. We proceed by induction on z. Clearly

(1) (x + y) · 1 = (x · 1) + (y · 1).
(2) If (x + y) · z = (x · z) + (y · z)

then

(x + y) · Sc(z) = (x · Sc(z)) + (y · Sc(z)).

For, under the hypothesis,

(x + y) · Sc(z) = [(x + y) · z] + (x + y) = [(x · z) + (y · z)] + (x + y)
                = [(x · z) + x] + [(y · z) + y]   by associative, commutative laws for +
                = (x · Sc(z)) + (y · Sc(z)).

3.18 Lemma. For all x ∈ P, 1 · x = x.

Proof. By induction on x. Clearly, 1 · 1 = 1. Suppose that 1 · x = x;
then 1 · Sc(x) = (1 · x) + 1 = x + 1 = Sc(x).

3.19 Theorem. (Commutative law for ·) For all x, y ∈ P,

x · y = y · x.

Proof. Let x ∈ P. We proceed by induction on y:

(1) x · 1 = 1 · x.

For x · 1 = x by the definition of · and 1 · x = x by 3.18.

(2) If x · y = y · x then x · Sc(y) = Sc(y) · x.

For, under the hypothesis,

x · Sc(y) = (x · y) + x = (y · x) + x
          = (y · x) + (1 · x)
          = (y + 1) · x   by 3.17
          = Sc(y) · x.

3.20 Theorem. (Associative law for ·) For any x, y, z ∈ P,

x · (y · z) = (x · y) · z.

The proof of this is left as an exercise.

3.21 Theorem. (Cancellation law for ·) For any x, y, z ∈ P,

if x · z = y · z then x = y.

Proof. Suppose that x ≠ y. We shall show that x · z ≠ y · z. By the
trichotomy law 3.14, either for some u ∈ P, x = y + u or for some
v ∈ P, y = x + v. In the first case,

x · z = (y + u) · z = (y · z) + (u · z)

by the distributive law. Since u · z ∈ P, it follows again by the trichotomy
law that x · z ≠ y · z. We argue similarly for the case y = x + v.

Exponentiation and other operations. We begin to see now in how
relatively simple and straightforward a manner the familiar arithmetical
properties of addition and multiplication flow from their most basic
properties, as given in the recursive definitions. We now turn to a similar
but much briefer treatment of exponentiation.

3.22 Definition. We denote by x^y the value F(x, y) of the unique binary
function determined by 3.7 when we take H(x) = x, G(x, z) = z · x.
Thus:
(i) exponentiation is a binary operation on P under which P is closed;
(ii) for any x ∈ P, x^1 = x;
(iii) for any x, y ∈ P, x^Sc(y) = (x^y) · x.
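
The three recursive definitions 3.8, 3.15, and 3.22 form a tower, each operation being obtained from the one before by the same scheme. The sketch below is an illustration only: it assumes the model of P as the Python integers 1, 2, 3, ... with Sc(x) = x + 1, the function names are hypothetical, and Python's range(y - 1) is used merely to count the y - 1 applications of clause (iii) needed to reach y from 1.

```python
def succ(x):
    """Sc in the assumed model of P."""
    return x + 1


def add(x, y):
    """x + y by 3.8: H(x) = Sc(x), G(x, z) = Sc(z)."""
    value = succ(x)               # x + 1 = Sc(x)
    for _ in range(y - 1):        # x + Sc(y) = Sc(x + y)
        value = succ(value)
    return value


def mul(x, y):
    """x . y by 3.15: H(x) = x, G(x, z) = z + x."""
    value = x                     # x . 1 = x
    for _ in range(y - 1):        # x . Sc(y) = (x . y) + x
        value = add(value, x)
    return value


def power(x, y):
    """x^y by 3.22: H(x) = x, G(x, z) = z . x."""
    value = x                     # x^1 = x
    for _ in range(y - 1):        # x^Sc(y) = (x^y) . x
        value = mul(value, x)
    return value


print(add(3, 4), mul(3, 4), power(2, 5))  # 7 12 32
```

Only succ is applied to numbers directly; every other arithmetical step is routed through the operation defined just before it.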

3.23 Theorem. For any x, y, z ∈ P we have:

(i) 1^y = 1,
(ii) x^y · x^z = x^(y+z),
(iii) (x^y)^z = x^(y·z),
(iv) (x · y)^z = x^z · y^z.

Proof. We prove only (ii), leaving the others for the student. The proof
is by induction on z.

(1) x^y · x^1 = x^(y+1),

for x^(y+1) = x^Sc(y) = x^y · x = x^y · x^1.

(2) If x^y · x^z = x^(y+z) then x^y · x^Sc(z) = x^(y+Sc(z)).

For, under the hypothesis,

x^y · x^Sc(z) = x^y · (x^z · x) = (x^y · x^z) · x = x^(y+z) · x = x^Sc(y+z) = x^(y+Sc(z)).

Other familiar functions can be defined recursively. For example, for
x!, we have the conditions:

1! = 1,
(Sc(x))! = Sc(x) · (x!).

This is a unary function whose existence can be derived either from the
existence of a certain binary function in 3.7 (which will be F(x, y) = y!)
or directly from 3.4 when we take S = P.

Exercise Group 3.2

1. Prove Theorem 3.13.

2. Check that Theorem 3.16 was not needed in the proofs of 3.17-3.19.
Give a direct proof of 3.16 from these latter theorems.
3. Prove Theorem 3.20.
4. Prove Theorem 3.23(iii).

5. Prove that for each y ∈ P either y = 1 or there is a (unique) x ∈ P with
y = 2 · x or there is a (unique) x ∈ P with y = (2 · x) + 1.
6. Find a recursive definition for a function F with domain P such that
F(1) = 1 and for all x ∈ P, F(2 · x) = 2 and F((2 · x) + 1) = 1.

3.3 Order. The basic notion which led to our formulation of Peano
systems, and hence to the characterization of the positive integers, was
that of a immediately preceding b, or equivalently of b immediately
succeeding a, when b = Sc(a). Generally, we should say that a precedes b
(a < b), or equivalently, that b follows a (b > a), if b succeeds a by a
number of steps, i.e., if starting with a we will eventually reach b by
forming Sc(a), Sc(Sc(a)), Sc(Sc(Sc(a))), ... Using our usual notations
2 = 1 + 1, 3 = 2 + 1, etc., we have

Sc(a) = a + 1,
Sc(Sc(a)) = Sc(a) + 1 = (a + 1) + 1 = a + (1 + 1) = a + 2;

similarly Sc(Sc(Sc(a))) = a + 3, ... Thus it seems that the appropriate
definition of the order relation < should be as follows.

3.24 Definition. For any x, y ∈ P, we put:

(i) x < y if and only if there is some v ∈ P with y = x + v;
(ii) x > y if and only if y < x;
(iii) x ≤ y if and only if x < y or x = y;
(iv) x ≥ y if and only if y ≤ x.

Properly speaking, the relation considered in 3.24(i) is the set

W = {(x, y): x, y ∈ P and for some v ∈ P, y = x + v}.
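
Definition 3.24(i) can be tested directly by searching for the required v. The sketch below is a hedged illustration in the assumed model of P as the Python integers 1, 2, 3, ... with the usual addition; the names less and less_eq are hypothetical. The search is restricted to v among 1, ..., y - 1, which loses nothing in this model because y = x + v with x ≥ 1 forces v to be smaller than y. The final lines list a finite part of the set W.

```python
def less(x, y):
    """x < y in the sense of 3.24(i): some v in P has y = x + v."""
    # In the assumed model any such v is below y, so a finite search suffices.
    return any(x + v == y for v in range(1, y))


def less_eq(x, y):
    """x <= y in the sense of 3.24(iii)."""
    return less(x, y) or x == y


# The part of the relation W lying inside {1, ..., 5} x {1, ..., 5}.
W = {(x, y) for x in range(1, 6) for y in range(1, 6) if less(x, y)}

print(less(2, 5), less(5, 2), less(3, 3))  # True False False
print(sorted(W)[:4])  # [(1, 2), (1, 3), (1, 4), (1, 5)]
```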

3.25 Theorem. For any x, y, z ∈ P we have:

(i) (Trichotomy law for <). Exactly one of the three cases

x < y, x = y, y < x

is true.

(ii) (Transitive law for <). If x < y and y < z then x < z.

Proof. (i) follows directly from 3.14 and the definition 3.24. (ii) follows
from the fact that if y = x + v, z = y + w, where v, w ∈ P, then
z = (x + v) + w = x + (v + w), and v + w ∈ P.

3.26 Corollary. For any x, y, z ∈ P we have:

(i) (Reflexive law for ≤). x ≤ x.
(ii) (Connectivity law for ≤). Either x ≤ y or y ≤ x.
(iii) (Antisymmetric law for ≤). If x ≤ y and y ≤ x then x = y.
(iv) (Transitive law for ≤). If x ≤ y and y ≤ z then x ≤ z.

We shall write x < y < z as an abbreviation for “x < y and y < z,”
x ≤ y < z as an abbreviation for “x ≤ y and y < z”; similarly for
x < y ≤ z and x ≤ y ≤ z. We do not generally write such expressions
as x < y > z, although a suitable convention could be made about this.
We write x ≮ y for “not x < y,” which is equivalent to y ≤ x by
trichotomy; similarly, we write x ≰ y for “not x ≤ y,” which is equivalent to
y < x.

Simply ordered systems.

3.27 Definition. A binary relation ≺ in a set S is said to be a simple
(or total) ordering of S if the trichotomy law [3.25(i)] and the transitive
law [3.25(ii)] hold for any x, y, z ∈ S, when we take the relation ≺
instead of <. If these conditions hold then (S, ≺) is said to be a simply
ordered system.

Thus the ordering < is a simple ordering of P. It is by no means the
only simple ordering of P. Consider, for example, the ordering

(3:3-1) 2, 4, 6, ..., 1, 3, 5, ...;

i.e., for x, y ∈ P define

(3:3-2) x ≺ y if and only if x is even and y is even and x < y,
        or x is odd and y is odd and x < y,
        or x is even and y is odd.

(Here “x is even” means that there is a u ∈ P with x = 2 · u, and “x is odd”
means that x = 1 or there is a u ∈ P with x = 2 · u + 1.) The student
can verify that the relation ≺ defined in (3:3-2) is a simple ordering of P.
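
To see the ordering (3:3-1) at work, the relation of (3:3-2) can be written as a test on pairs and used to arrange a finite set of positive integers. This is a sketch only, assuming the model of P as Python integers; the names precedes and compare are hypothetical, and evenness is tested with Python's remainder operator rather than by exhibiting the u of the definition.

```python
from functools import cmp_to_key


def precedes(x, y):
    """The relation of (3:3-2): evens first, then odds, each in natural order."""
    x_even, y_even = x % 2 == 0, y % 2 == 0
    if x_even == y_even:
        return x < y
    return x_even  # an even number precedes every odd number


def compare(x, y):
    """Three-way comparison derived from precedes, for use in sorting."""
    if precedes(x, y):
        return -1
    if precedes(y, x):
        return 1
    return 0


print(sorted(range(1, 11), key=cmp_to_key(compare)))
# [2, 4, 6, 8, 10, 1, 3, 5, 7, 9]
```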
We shall seek to characterize the relation < on P (up to isomorphism)
by means of certain statements phrased entirely in terms of this relation,
and independent of its original definition in terms of +. The example
(3:3-2) shows that the statement that < is a simple ordering of P is
certainly insufficient to characterize it.
Before we proceed to finding a correct characterization, it should be
noted that the conditions to be a simple ordering are independent. Thus
if we define for x, y ∈ P

(3:3-3) x ≺ y if and only if x is even and y is even and x < y
        or x is odd and y is odd and x < y,

it is seen that the transitive law holds for ≺, but not the trichotomy law
(since none of 2 ≺ 1, 2 = 1, 1 ≺ 2 holds). For an example in which the
trichotomy law holds but not the transitive law, we turn to a set with three
elements, S = {1, 2, 3}. Let 1 ≺' 2, 2 ≺' 3, 3 ≺' 1, and let ≺' hold
in no other cases. The student can use this as a basis to construct such a
relation in the set of all positive integers.
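
Both independence examples can be checked mechanically on finite sets. The sketch below is an illustration under the assumption that relations are given as Python predicates on pairs; the names trichotomy_holds, transitive, same_parity_less, and cycle are hypothetical. For the relation of (3:3-3) the check covers only the sample 1, ..., 8 and is therefore evidence rather than proof; for the three-element example the check is exhaustive.

```python
from itertools import product


def trichotomy_holds(rel, S):
    """Exactly one of x rel y, x = y, y rel x holds for all x, y in S."""
    return all(
        [rel(x, y), x == y, rel(y, x)].count(True) == 1
        for x, y in product(S, repeat=2)
    )


def transitive(rel, S):
    """Whenever x rel y and y rel z, also x rel z, for all x, y, z in S."""
    return all(
        rel(x, z)
        for x, y, z in product(S, repeat=3)
        if rel(x, y) and rel(y, z)
    )


def same_parity_less(x, y):
    """The relation of (3:3-3)."""
    return x % 2 == y % 2 and x < y


def cycle(x, y):
    """1 precedes 2, 2 precedes 3, 3 precedes 1, and nothing else."""
    return (x, y) in {(1, 2), (2, 3), (3, 1)}


sample = range(1, 9)
print(transitive(same_parity_less, sample),
      trichotomy_holds(same_parity_less, sample))   # True False
print(trichotomy_holds(cycle, [1, 2, 3]),
      transitive(cycle, [1, 2, 3]))                  # True False
```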
In the next chapter we shall turn to the set of all integers, which has the
natural ordering

..., -3, -2, -1, 0, 1, 2, 3, ...

An isomorphic ordering of the positive integers is

(3:3-4) ..., 6, 4, 2, 1, 3, 5, ...;

i.e., define for x, y ∈ P

(3:3-5) x ≺ y if and only if x is even and y is even and x > y,
        or x is odd and y is odd and x < y,
        or x is even and y is odd.

(It is easy to check that this relation actually provides a simple ordering
of P.) One characteristic difference between this ordering and the natural
ordering is that there is no first element in the ordering ≺.

3.28 Definition. Let (S, ≺) be a simply ordered system and let A ⊆ S.
An element c is said to be a first or least element of A if c ∈ A and if
for all x ∈ A with x ≠ c we have c ≺ x. The element c is said to be
a last or largest element of A if c ∈ A and if for all x ∈ A with x ≠ c
we have x ≺ c.

3.29 Lemma. Suppose that (S, ≺) is a simply ordered system and A ⊆ S.
If A has a first element then it is unique; similarly for last elements.

Proof. Suppose that both c, c' are first elements of A and suppose,
contrary to the desired conclusion, that c ≠ c'. By the trichotomy law, either
c' ≺ c or c ≺ c', but not both. Suppose first that c' ≺ c. Since c is a
first element of A and c' ≠ c we have c ≺ c', which gives us a
contradiction. Similarly c ≺ c' would lead to a contradiction. Hence we must have
c = c'.
Thus if a set A has at least one first element, we can speak of the first
element of A. It may be, however, that A has no first element at all; such
is the case with the set P in the ordering of (3:3-5).

Well-ordered systems.

3.30 Definition. A set S is said to be well-ordered by a relation ≺, and
(S, ≺) is said to be a well-ordered system, if the following conditions
hold:

(i) (S, ≺) is a simply ordered system;
(ii) for any A ⊆ S, if A ≠ ∅ then A has a first element.

Thus the system (P, ≺) of (3:3-5) is not well-ordered. On the other hand,
we shall show that (P, <) is a well-ordered system. [So also is the system
(P, ≺) of (3:3-2); cf. the exercises.] In order to do this, we must first
establish some properties which connect < with 1 and Sc. These properties
may be thought of as the recursive characterization of < in P, i.e., they
show under what conditions x < 1 holds and under what conditions
x < Sc(y) holds. These conditions are given in the first two parts of the
next theorem. The remaining parts are simple consequences of these
conditions.

3.31 Theorem. For any x, y ∈ P we have:

(i) x ≮ 1;
(ii) x < Sc(y) if and only if x ≤ y;
(iii) 1 ≤ x;
(iv) Sc(y) ≤ x if and only if y < x;
(v) y < Sc(y) and there is no z ∈ P with y < z < Sc(y).

Proof. (i) Suppose that x < 1. Then by definition [3.24(i)], 1 = x + v
for some v ∈ P. Now either v = 1 or v = w + 1 for some w ∈ P. In
either case we obtain z ∈ P with 1 = z + 1 = Sc(z), which is impossible.
(ii) Suppose first that x ≤ y, that is, x < y or x = y. In the first
case y = x + v for some v ∈ P and then Sc(y) = x + Sc(v), so x < Sc(y).
In the second case Sc(y) = Sc(x) = x + 1, so again x < Sc(y).
Conversely, suppose that x < Sc(y). Then Sc(y) = x + v for some v ∈ P.
If v = 1 then Sc(y) = Sc(x) and hence y = x. Otherwise v = Sc(w)
for some w ∈ P and Sc(y) = Sc(x + w), hence y = x + w and x < y.
Thus, in any case, x ≤ y.
Parts (iii) and (iv) follow directly from (i) and (ii), respectively, by
application of the trichotomy law (in the form, z ≮ w if and only if
w ≤ z). Part (v) follows directly from (ii).

3.32 Theorem. (P, <) is a well-ordered system.

Proof. We have already established (3.25) that (P, <) is simply ordered.
Consider any set A ⊆ P with A ≠ ∅, but suppose, to the contrary, that
A has no first element. We shall show that this leads to a contradiction.
Let

(1) B = {x: x ∈ P and for every y ∈ A, x < y}.

Then

(2) B ⊆ P - A.

For suppose that x ∈ B but x ∉ P - A. Then x ∈ A. Hence we would have
x < x, which contradicts the trichotomy law. We shall now prove by
induction that B = P:

(3) 1 ∈ B.

By 3.31(iii), 1 ≤ y for all y ∈ A. If 1 ∈ A this would show that 1 is a
first element of A, contrary to hypothesis. Hence 1 ∉ A and then 1 < y
for all y ∈ A.

(4) If x ∈ B then Sc(x) ∈ B.

For suppose that x ∈ B. Consider any y ∈ A; then x < y. Hence by
3.31(iv), Sc(x) ≤ y. If Sc(x) ∈ A this would show that Sc(x) is a first
element of A, contrary to hypothesis. Hence Sc(x) ∉ A and then
Sc(x) < y for all y ∈ A, that is, Sc(x) ∈ B. Since B = P follows from
(3) and (4) we see from (2) that

(5) P ⊆ P - A.

But then P ∩ A = ∅ which, since A ⊆ P, shows that A = ∅. This
contradicts our original hypothesis.
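
Read positively, the proof describes a search: starting from 1 and passing to successors, each x met before the first member of A lies in the set B, and the search must stop because A is not empty. The following sketch is an illustration under the assumption that A is given as a finite nonempty Python set of positive integers; the name first_element is hypothetical.

```python
def first_element(A):
    """Return the first element of a nonempty set A of positive integers."""
    if not A:
        raise ValueError("A must be nonempty")
    if any(not isinstance(y, int) or y < 1 for y in A):
        raise ValueError("A must consist of positive integers")
    x = 1
    while x not in A:  # here x < y for every y in A, i.e. x belongs to B
        x = x + 1      # step (4): Sc(x) is in B unless Sc(x) is in A
    return x


print(first_element({9, 4, 17}))  # 4
```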
This theorem provides us with a very important property of the positive
integers. It allows us to give a second kind of inductive proof which is
often more convenient to apply than the standard kind of induction we
have used so far. Examples of such proofs will be found in the next chapter.
However, the property of well-ordering still does not characterize the
ordering of P. An example of another well-ordering which is not isomorphic
to the natural ordering is provided by

(3:3-6) 2, 3, ..., 1.

Still another example is provided by the relation ≺ defined in (3:3-2)
above. We shall leave it to the student to verify these facts. The point
is that we can have well-ordered systems (S, ≺) in which there are
elements other than the very first which have no direct predecessor.

3.33 Definition. Let ≺ be a binary relation in a set S and let x, y ∈ S.
Then x is said to be a direct predecessor of y and y is said to be a direct
successor of x [with respect to (S, ≺)] if x ≺ y and if there is no
z ∈ S with x ≺ z and z ≺ y.

In the well-ordering provided by (3:3-6) above, the element 1 has neither
a direct predecessor nor a direct successor. However, we have the
following, the proof of which we leave to the student.

3.34 Lemma. Suppose that (S, ≺) is a well-ordered system.

(i) If x ∈ S and for some w ∈ S, x ≺ w, then x has a (unique)
direct successor in S.
(ii) If x ∈ S then x has at most one direct predecessor.

Thus for any well-ordered system (S, ≺) we can introduce an operation
Sc' whose domain is {x: x ∈ S and for some w ∈ S, x ≺ w} such that for
any x in its domain Sc'(x) is the direct successor of x. If we denote by 1'
the first element of S, we see that properties corresponding to the recursive
description of < in 3.31(i), (ii) hold in any well-ordered system. We
need add only two simple properties now to characterize the positive
integers through its ordering.

3.35 Theorem. Suppose that P' ≠ ∅, and that (P', <') is a system with
the following properties:

(i) (P', <') is well-ordered;
(ii) every element x of P' has a direct successor, Sc'(x);
(iii) every element x of P', other than the first element 1' of P', has a
direct predecessor.

Then (P', Sc', 1') is a Peano system.

Proof. We must verify the three properties of 3.1.

(1) For all x ∈ P', Sc'(x) ≠ 1'.

For if Sc'(x) = 1', 1' is a successor of x, that is, x <' 1', contrary to the
choice of 1' as the first element of P'.

(2) For all x, y ∈ P', if Sc'(x) = Sc'(y) then x = y.

Let z = Sc'(x) = Sc'(y). Thus both x and y are direct predecessors of z.
Hence x = y by 3.34(ii).

(3) If A ⊆ P' and 1' ∈ A and A has the property that whenever
x ∈ A then Sc'(x) ∈ A, then A = P'.

For suppose that A ≠ P'. Let B = P' - A. Thus B ⊆ P', B ≠ ∅.
Hence, by well-ordering, B has a first element b in the ordering <'. By
(iii) either b = 1' or for some x ∈ P', b = Sc'(x). But 1' ∈ A and hence
1' ∉ B, so b ≠ 1'. On the other hand, if b = Sc'(x), then x <' b; hence
x ∉ B, since b ≤' y for all y ∈ B. Thus x ∈ A; hence also Sc'(x) ∈ A,
that is, b ∈ A, which contradicts the fact that b ∉ A. Thus the hypothesis
A ≠ P' leads to a contradiction.
This theorem now leads directly to the following.

3.36 Theorem. The system (P, <) satisfies 3.35(i)-(iii), with 1 being the
first element of P under < and Sc(x) being the direct successor of any
element x with respect to <. Further, suppose that P' ≠ ∅, and that
(P', <') is a system with the properties 3.35(i)-(iii). Then

(P, <) ≅ (P', <').

Proof. That (P, <) satisfies the stated conditions is now seen directly
from 3.23, 3.31 (iii), (v), and 3.3. Suppose that (P', <') satisfies
3.35(i)-(iii). We wish to construct a one-to-one function from P to P' which
preserves the relation <. Let Sc', 1' be defined as in 3.35. Since (P', Sc', 1')
is a Peano system, we know by 3.5 that

(1) (P, Sc, 1) ≅ (P', Sc', 1').

Hence there is a function F with the following properties:

(2) F is one-to-one;

(3) 𝒟(F) = P and ℛ(F) = P';

(4) F(1) = 1';

(5) for any x ∈ P, F(Sc(x)) = Sc'(F(x)).

We wish to show that for this function F

(6) for any x, y ∈ P, x < y if and only if F(x) <' F(y).

Let x ∈ P be fixed. We shall prove (6) by induction on y.

(7) x < 1 if and only if F(x) <' F(1).

This is true since both sides of the equivalence are false, F(1) = 1' being
the first element of P' with respect to <'. Now suppose that the
condition in (6) is true for y and the given x. Then

(8) x < Sc(y) if and only if F(x) <' F(Sc(y)).

For by 3.31(ii), x < Sc(y) if and only if x < y or x = y. Also since
F(Sc(y)) = Sc'(F(y)), it follows that F(Sc(y)) is the direct successor
of F(y) in the ordering <'. Hence F(x) <' F(Sc(y)) if and only if
F(x) <' F(y) or F(x) = F(y). By inductive hypothesis, x < y if and
only if F(x) <' F(y). By the one-to-one property (2), x = y if and
only if F(x) = F(y). Hence we can conclude that (8) holds. This
completes the inductive proof of (6) and thus also the proof of our theorem.
The significance of the two theorems 3.35 and 3.36 can be seen as
follows. Theorem 3.36 shows that the ordering of the positive integers
can be completely characterized by the properties 3.35(i)-(iii). These
properties are expressed entirely in terms of the notions of ordering, since
Sc' has the meaning of direct successor in the ordering <' (3.33) and 1'
has the meaning of being the first element in the ordering <'. Theorem
3.35 shows that if we started with the assumption that (P, <) is a system
satisfying (i)-(iii) and defined Sc, 1 in this way, we could obtain all the
arithmetical properties of the positive integers as we have done so far.
Hence we could have taken instead of the axiom 3.2 an alternative axiom
which states that there exists at least one system (P, <) satisfying 3.35(i)-
(iii). As to their intuitive correctness there is not much to choose between
accepting the one or the other of these as an axiom. The choice here has
been made on the basis of taste and convenience.

Ordering and the arithmetical operations. We have thus far not connected
the ordering < with the arithmetical operations in any extended way.
The basic results here are the following.

3.37 Theorem. For any x, y, z ∈ P we have:

(i) x < y if and only if x + z < y + z;
(ii) x < y if and only if x · z < y · z;
(iii) x < y if and only if x^z < y^z;
(iv) if 1 < z then 1 < z^x;
(v) if 1 < z then x < y if and only if z^x < z^y;
(vi) the above all hold true if we replace < by ≤ throughout, except in
the condition 1 < z of (v).
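
A brute-force check over a small range is no substitute for the proofs, but it is a quick way to catch a misreading of the statements. The sketch below assumes Python's built-in integers as a stand-in for P, with `**` for exponentiation; the function name `check_order_laws` and the bound are illustrative.

```python
from itertools import product


def check_order_laws(bound):
    """Check 3.37(i)-(v) for all x, y, z in {1, ..., bound}."""
    P = range(1, bound + 1)
    for x, y, z in product(P, repeat=3):
        assert (x < y) == (x + z < y + z)          # (i)
        assert (x < y) == (x * z < y * z)          # (ii)
        assert (x < y) == (x ** z < y ** z)        # (iii)
        if 1 < z:
            assert 1 < z ** x                      # (iv)
            assert (x < y) == (z ** x < z ** y)    # (v)
    return True
```

The hypothesis 1 < z in (v) cannot be dropped: with z = 1 both powers equal 1 whatever x and y are.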

Proof. (i) x + z < y + z if and only if there is u ∈ P with y + z =
(x + z) + u, that is, y + z = (x + u) + z. By cancellation (3.12), this
is equivalent to the existence of a u ∈ P with y = x + u.
(ii) Suppose that x < y. Then y = x + u for a certain u ∈ P. Hence

y · z = (x + u) · z = (x · z) + (u · z)

by distributivity, and thus x · z < y · z. Now suppose that x · z < y · z.
If x < y does not hold, then x = y or y < x by trichotomy. In the first case
x · z = y · z, contradicting our hypothesis. In the second case it follows by
the first half of our proof of (ii) that y · z < x · z, again contradicting the
hypothesis. Hence x < y.
(iii) Suppose that x < y. We prove that x^z < y^z by induction on z.
This is clear for z = 1. Suppose that it is true for z. Then x^{Sc(z)} = x^z · x
and y^{Sc(z)} = y^z · y. By hypothesis, x^z < y^z, hence x^z · x < y^z · x by (ii);
also y^z · x < y^z · y by (ii) and commutativity of ·. Then by transitivity,
x^z · x < y^z · y, completing the induction. To prove the converse, we argue
by contradiction as in (ii).
We leave the proofs of (iv) and (v) to the student.

3.38 Corollary. For any x, y, z, w ∈ P we have:

(i) if x < y and z < w then x + z < y + w;
(ii) if x < y and z < w then x · z < y · w;
(iii) if x < y and z < w then x^z < y^w;
(iv) if 1 < x < y and z < w then x^z < y^w.

These are proved directly from 3.37 using transitivity.

3.39 Corollary. For any x, y, z ∈ P we have:

(i) if x^z = y^z then x = y;
(ii) if 1 < z and z^x = z^y then x = y.

Corollary 3.39 provides us with cancellation-type laws for exponentiation.
Expressed in another way, let z ∈ P be fixed. Then 3.39(i) shows that
the function F(x) = x^z is one-to-one. Also 3.39(ii) shows that the function
F(x) = z^x is one-to-one, if 1 < z. These are proved from 3.37(iii) and (v)
by using trichotomy.

Exercise Group 3.3

1. Let (S, <) be given and S' ⊆ S. Let <' be the relation < restricted to
elements of S'. Show that if (S, <) is simply ordered, then so is (S', <').
Do the same for well-ordering.
2. Let (S_1, <_1) and (S_2, <_2) be given, with S_1 ∩ S_2 = ∅. Let S = S_1 ∪ S_2
and define < on S by the condition:

x < y if and only if x, y ∈ S_1 and x <_1 y,
or x, y ∈ S_2 and x <_2 y,
or x ∈ S_1 and y ∈ S_2.

Show that if (S_1, <_1) and (S_2, <_2) are simply ordered, then so is (S, <).
Do the same for well-ordering.
3. Show that the ordering corresponding to

2, 3, 4, . . . , 1

is a well-ordering. Do the same for

2, 4, 6, . . . , 1, 3, 5, . . .

4. Give an example of a simple ordering of P in which every nonempty set
A has a last element.
5. Prove Lemma 3.34.
6. Prove Theorem 3.37(iv) and (v).

3.4 Sequences, sums and products. Finite and infinite sequences. Up
to this point we have used letters such as a, b, c, x, y, z, . . . in our discussion
of the positive integers in order to emphasize the algebraic character of
these numbers which, in this respect, have much in common with other
number systems. However, when the positive integers are used to count
or order the elements of a set S, it is common to use such letters as i, j,
k, l, m, n, p, q as variables for elements of P. We shall make no special
convention to this effect, but shall also tend to use these letters in various
such contexts.
The notion of counting by positive integers is a special case of the notion
of set-theoretical equivalence discussed at the end of Section 2.3.

3.40 Definition. Let n ∈ P, and let S be any set. We say that S has n
elements if S is set-theoretically equivalent to {k: k ∈ P and k ≤ n}.

In other words, S has n elements if there is a one-to-one function H with
domain {k: k ∈ P and k ≤ n} and range S. If we denote H(k) by x_k,
we would also write S = {x_1, x_2, . . . , x_k, . . . , x_n}. Now it seems
intuitively clear that any such set S is finite and that, conversely, any finite
set S (other than the empty set) has some definite number n of elements,
for some n ∈ P. However, recall that the set-theoretical definition of
finiteness of a set S in (2:4-5) was that S not be set-theoretically
equivalent to any proper subset of itself. Curiously the fact that finite sets, as
defined in that way, are related to P in the way just described apparently
cannot be proved without the axiom of choice applied to infinite sets.
It is not essential to our development that we prove this connection.
We shall content ourselves, instead, with an exact statement. The
interested student can try to construct a proof of it (by no means simple) by
following the lines sketched in the exercises, or by referring to texts on
axiomatic set theory (cf. Bibliography).

(3:4-1) A nonempty set S is finite if and only if there is an n ∈ P such
that S has n elements.

It is convenient for many applications to liberalize the use of functions
H from subsets of P to sets S in two ways. First we drop the restriction
that H be one-to-one; then we obtain a sequence of objects x_1, x_2, . . . ,
x_k, . . . , x_n, some or all of which may be identical. Second we can extend
this notion to the enumeration of a (possibly) infinite set of objects
x_1, x_2, . . . , x_k, . . . , some or all of which may be identical.

3.41 Definition.

(i) Let n ∈ P. By a finite sequence of n terms in a set S we understand
a function H whose domain is {k: k ∈ P and k ≤ n} and whose
range is contained in S. This sequence is also denoted by ⟨H(1),
H(2), . . . , H(n)⟩ or by ⟨H(k)⟩_{1≤k≤n}. By the kth term of the
sequence we understand H(k); if this is also denoted by x_k, we denote
the sequence by ⟨x_1, x_2, . . . , x_n⟩ or by ⟨x_k⟩_{1≤k≤n}.

(ii) By an infinite sequence of terms in a set S we understand a function
H whose domain is P and whose range is contained in S. This
sequence is also denoted by ⟨H(1), H(2), . . . , H(k), . . .⟩ or by
⟨H(k)⟩_{k∈P}. If the kth term H(k) of this sequence is denoted by x_k,
we also denote the sequence by ⟨x_1, x_2, . . . , x_k, . . .⟩ or by ⟨x_k⟩_{k∈P}.

Two sequences ⟨x_1, x_2, . . . , x_n⟩ and ⟨y_1, y_2, . . . , y_n⟩ are identical if
and only if x_k = y_k for every k ≤ n. In particular ⟨x_1, x_2⟩ = ⟨y_1, y_2⟩
if and only if x_1 = y_1 and x_2 = y_2, so that two-termed sequences behave
just like ordered pairs (x_1, x_2); however, ⟨x_1, x_2⟩ is defined in terms of
ordered pairs as {(1, x_1), (2, x_2)}. Similarly, three-termed sequences
⟨x_1, x_2, x_3⟩ behave like ordered triples. Thus, in general, n-termed
sequences ⟨x_1, x_2, . . . , x_n⟩ can be used in contexts where one might use
ordered n-tuples (x_1, x_2, . . . , x_n), and infinite sequences ⟨x_1, x_2, . . . ,
x_k, . . .⟩ can be used to explain the notion of an ordered infinite-tuple.
As with ordered pairs, it is clear why we must use symbols ⟨ ⟩ which are
different from those { } for the formation of sets. Although {1, 2, 3} =
{1, 3, 2}, we have ⟨1, 2, 3⟩ ≠ ⟨1, 3, 2⟩. However, some authors write
{x_k}_{1≤k≤n} where we write ⟨x_k⟩_{1≤k≤n}, so that the reader should watch
for the intended meaning. We shall use {x_1, x_2, . . . , x_n} only for the set
of objects associated with a sequence, i.e., for {y: for some k ∈ P, k ≤ n,
y = x_k}, and similarly with {x_1, x_2, . . . , x_k, . . .}. Thus the set associated
with the sequence ⟨1, 3, 2, 1, 3⟩ is {1, 3, 2, 1, 3}, i.e., is the set {1, 2, 3}.
It is not difficult to see that {x_1, x_2, . . . , x_n} has m elements for some
m ≤ n (in the sense of Definition 3.40).
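
The distinction between a sequence and its associated set can be made concrete. In the sketch below a finite sequence is stored, as in Definition 3.41(i), as a function with domain {1, . . . , n}, represented by a Python dict; the helper names `make_sequence` and `associated_set` are illustrative assumptions.

```python
def make_sequence(*terms):
    """A finite sequence as a function H with domain {1, ..., n}, stored as a dict."""
    return {k: term for k, term in enumerate(terms, start=1)}


def associated_set(sequence):
    """The set of objects occurring as terms of the sequence."""
    return set(sequence.values())


s = make_sequence(1, 3, 2, 1, 3)
t = make_sequence(1, 2, 3)
u = make_sequence(1, 3, 2)
```

Here `t` and `u` are different sequences with the same associated set, and the associated set of `s` has fewer elements than `s` has terms.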

Extended sums and products. The notion of sequence is useful, among
other things, in extending operations originally defined for a fixed number
of objects to operations on “arbitrarily many” objects.

3.42 Definition. Suppose that S is a set on which a binary operation +
is defined and under which S is closed. Let ⟨x_1, x_2, . . . , x_k, . . .⟩ be
an infinite sequence of terms of S. For every n ∈ P we denote by
\sum_{k=1}^{n} x_k the element of S uniquely determined by the following conditions:

(i) \sum_{k=1}^{1} x_k = x_1;
(ii) \sum_{k=1}^{n+1} x_k = (\sum_{k=1}^{n} x_k) + x_{n+1}, for every n ∈ P.

This definition is justified on the following grounds. The sequence
⟨x_1, x_2, . . . , x_k, . . .⟩ is a given function H with domain P, H(k) = x_k
for every k. We seek a function F with domain P whose value F(n) is to
be \sum_{k=1}^{n} x_k. Then the conditions (i), (ii) above correspond to the
following conditions on F:

(3:4-2) (i)' F(1) = H(1);
(ii)' F(Sc(n)) = F(n) + H(Sc(n)), for every n ∈ P.

Let

(3:4-3) c = H(1);
G(n, z) = z + H(Sc(n)), for every n ∈ P and z ∈ S.

Thus the conditions in (3:4-2) are equivalent to

(3:4-4) (i)" F(1) = c;
(ii)" F(Sc(n)) = G(n, F(n)), for every n ∈ P.

Given the function H, the element c of S and the function G are
well-defined by (3:4-3). Then by Theorem 3.4' we see that there is a unique
function F satisfying (3:4-4) with 𝒟(F) = P and ℛ(F) ⊆ S. Thus 3.42
is just another form of recursive definition. (Hence it should be expected
that various properties of \sum_{k=1}^{n} x_k will have to be verified by induction
on n.) We also need a notation to be associated with “product” operations
·; the following definition of this is simply obtained from 3.42 by
changing + to · and Σ to Π.
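
The reduction of the extended sum to a recursion of the form F(1) = c, F(Sc(n)) = G(n, F(n)) can be sketched directly. The code below is an illustration only: `define_by_recursion` and `extended_sum` are hypothetical names, Python integers stand in for P, and the recursion is unwound as a loop.

```python
def define_by_recursion(c, G):
    """Return the function F with F(1) = c and F(n + 1) = G(n, F(n))."""
    def F(n):
        value = c
        for m in range(1, n):
            value = G(m, value)
        return value
    return F


def extended_sum(H, add):
    """Build F(n) = H(1) + ... + H(n) from c = H(1) and G(n, z) = z + H(n + 1)."""
    c = H(1)

    def G(n, z):
        return add(z, H(n + 1))

    return define_by_recursion(c, G)


def squares(k):
    """Hypothetical sample sequence: the k-th term is k squared."""
    return k * k


F = extended_sum(squares, lambda a, b: a + b)
```

With this choice of sequence, `F(4)` is built as ((1 + 4) + 9) + 16.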

3.43 Definition. Suppose that S is a set on which a binary operation
· is defined and under which S is closed. Let ⟨x_1, x_2, . . . , x_k, . . .⟩
be an infinite sequence of terms of S. For every n ∈ P we denote by
\prod_{k=1}^{n} x_k the element of S uniquely determined by the following conditions:

(i) \prod_{k=1}^{1} x_k = x_1;
(ii) \prod_{k=1}^{n+1} x_k = (\prod_{k=1}^{n} x_k) · x_{n+1}, for every n ∈ P.

Where the basic operations are denoted by +, ·, as with the positive
integers and their extensions to be defined in the following chapters, the
corresponding operations on sequences will be denoted by Σ and Π,
respectively.

Generalized associative and commutative laws. Given a sequence ⟨x_1, x_2,
x_3, x_4, . . .⟩, we can compute \sum_{k=1}^{4} x_k from 3.42 as follows.

(3:4-5) \sum_{k=1}^{4} x_k = (\sum_{k=1}^{3} x_k) + x_4 = ((\sum_{k=1}^{2} x_k) + x_3) + x_4
= ((x_1 + x_2) + x_3) + x_4.


(Clearly, the value of \sum_{k=1}^{n} x_k depends only on the terms x_k of an infinite
sequence ⟨x_1, x_2, . . . , x_k, . . .⟩ for which k ≤ n; cf. Exercise 1 below.) If
x_1, x_2, x_3, x_4 are positive integers and the operation we are dealing with
is the ordinary operation +, we know by associativity that

(3:4-6) ((x_1 + x_2) + x_3) + x_4 = (x_1 + x_2) + (x_3 + x_4)
= x_1 + (x_2 + (x_3 + x_4))
= x_1 + ((x_2 + x_3) + x_4)
= (x_1 + (x_2 + x_3)) + x_4.

In terms of the Σ notation, these equalities can be represented as follows:

(3:4-7)(a) \sum_{k=1}^{4} x_k = \sum_{k=1}^{2} x_k + \sum_{k=1}^{2} x_{k+2}
= \sum_{k=1}^{1} x_k + \sum_{k=1}^{2} y_k
= \sum_{k=1}^{1} x_k + \sum_{k=1}^{3} x_{k+1}
= \sum_{k=1}^{3} z_k,

where

(3:4-7)(b) ⟨y_1, y_2⟩ = ⟨x_2, x_3 + x_4⟩, ⟨z_1, z_2, z_3⟩ = ⟨x_1, x_2 + x_3, x_4⟩.

Here we interpret, for example,

(3:4-7)(c) \sum_{k=1}^{3} x_{k+1} = \sum_{k=1}^{3} w_k, where ⟨w_1, w_2, w_3⟩ = ⟨x_2, x_3, x_4⟩.

In general, if x_1, x_2, x_3, x_4 ∈ S and + is an operation on S, we cannot
expect the equalities in (3:4-7) to hold for Σ unless we know that the
associative law for + also holds in S. It is possible to formulate a general
associative law for Σ, given that + is associative, which would cover all
the cases shown in (3:4-7). We will do this in the next chapter. For the
moment we shall give only an associative law which covers the equalities
between the first and second and between the first and fourth terms in
(3:4-7)(a).

3.44 Theorem. Suppose that S is a set closed under a binary operation +
and that + is associative on S, i.e., for all x, y, z ∈ S, x + (y + z) =
(x + y) + z. Let ⟨x_k⟩_{k∈P} be any infinite sequence of terms in S.
Then for any n, m ∈ P we have

(i) \sum_{k=1}^{n+m} x_k = (\sum_{k=1}^{n} x_k) + (\sum_{k=1}^{m} x_{n+k}).

Similarly, if · is an associative operation on S then

(ii) \prod_{k=1}^{n+m} x_k = (\prod_{k=1}^{n} x_k) · (\prod_{k=1}^{m} x_{n+k}).

Proof. We prove (i); the proof of (ii) is completely similar. Let n be
fixed; we proceed by induction on m.

(1) \sum_{k=1}^{n+1} x_k = (\sum_{k=1}^{n} x_k) + (\sum_{k=1}^{1} x_{n+k}).

For, by 3.42, \sum_{k=1}^{1} x_{n+k} = x_{n+1}.

Suppose (i) is true for m. We show that it is true for Sc(m) = m + 1, i.e.,
that

(2) \sum_{k=1}^{n+(m+1)} x_k = (\sum_{k=1}^{n} x_k) + (\sum_{k=1}^{m+1} x_{n+k}).

For

\sum_{k=1}^{n+(m+1)} x_k = \sum_{k=1}^{(n+m)+1} x_k, by associativity of + on P,
= (\sum_{k=1}^{n+m} x_k) + x_{(n+m)+1}
= (\sum_{k=1}^{n+m} x_k) + x_{n+(m+1)}
= (\sum_{k=1}^{n} x_k + \sum_{k=1}^{m} x_{n+k}) + x_{n+(m+1)}, by induction hypothesis,
= \sum_{k=1}^{n} x_k + (\sum_{k=1}^{m} x_{n+k} + x_{n+(m+1)}), by associativity of + on S,
= \sum_{k=1}^{n} x_k + \sum_{k=1}^{m+1} x_{n+k}, by 3.42(ii).
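
The role of associativity in 3.44 can be seen by experiment. The sketch below reads 3.44(i) as: the sum of the first n + m terms equals the sum of the first n terms plus the sum of the m terms x_{n+1}, . . . , x_{n+m}. String concatenation serves as an associative (but not commutative) operation and subtraction of Python integers as a non-associative one; both choices, and the names `ext_sum`, `split_law_holds`, `letters` and `identity`, are assumptions made for illustration.

```python
from functools import reduce
import operator


def ext_sum(x, n, add):
    """Left-nested sum x(1) + ... + x(n), built as in Definition 3.42."""
    return reduce(add, (x(k) for k in range(2, n + 1)), x(1))


def split_law_holds(x, n, m, add):
    """Compare the sum of the first n + m terms with the split form of 3.44(i)."""
    whole = ext_sum(x, n + m, add)
    head = ext_sum(x, n, add)
    tail = ext_sum(lambda k: x(n + k), m, add)
    return whole == add(head, tail)


def letters(k):
    """Hypothetical sequence of one-letter strings: x(1) = 'a', x(2) = 'b', ..."""
    return chr(ord("a") + k - 1)


def identity(k):
    """Hypothetical sequence with x(k) = k."""
    return k
```

For concatenation the split law holds for every n and m tried, while for subtraction it already fails at n = m = 2.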

When associativity holds for + or for ·, we can unambiguously use
x_1 + x_2 + · · · + x_n, x_1 · x_2 · . . . · x_n to denote \sum_{k=1}^{n} x_k,
\prod_{k=1}^{n} x_k, respectively. We further agree, for the purposes of
simplification, to drop parentheses in particular sums of products. Thus
3 · x^2 · y + 5 · y^3 + x · z^2 · y when written out in full would be, first,
(3 · x^2 · y) + (5 · y^3) + (x · z^2 · y) and then
(((3 · x^2) · y) + (5 · y^3)) + ((x · z^2) · y). Where
there is no ambiguity, the symbol · is often omitted entirely, so that this
particular sum would be written 3x^2y + 5y^3 + xz^2y. (The ambiguity
arises, for example, with 23. Is this 2 times 3, or the number 23? In the
first case it should be written 2 · 3.) Parentheses are essential to
distinguish, for example, 2(x + 3) from 2x + 3, that is, (2x) + 3.
From a sequence ⟨x_1, x_2, x_3, x_4⟩ we can form many other sequences of
the same length with the same set of terms {x_1, x_2, x_3, x_4} by permuting
the terms, for example, ⟨x_1, x_3, x_2, x_4⟩, ⟨x_4, x_1, x_3, x_2⟩, etc. Then the sum
of the sequence ⟨x_1, x_3, x_2, x_4⟩ is ((x_1 + x_3) + x_2) + x_4. If the associative law
for + holds in S, this is the same as (x_1 + (x_3 + x_2)) + x_4. If, further,
+ is commutative on S, this is equal to (x_1 + (x_2 + x_3)) + x_4 and then,
again by associativity, to ((x_1 + x_2) + x_3) + x_4. Thus, given
associativity of +, a general formulation of commutativity would be the
statement that \sum_{k=1}^{n} x_k = \sum_{k=1}^{n} y_k, whenever the sequence
⟨y_1, . . . , y_n⟩ is obtained by permuting the terms of the sequence
⟨x_1, . . . , x_n⟩. We shall prove such a statement in the next chapter. For
the moment we shall consider only a special consequence of associativity
and commutativity. In the particular case that we have been considering,
this takes the form

(x_1 + x_2) + (x_3 + x_4) = (x_1 + x_3) + (x_2 + x_4),

that is, if we set u_1 = x_3, u_2 = x_4, then

(x_1 + x_2) + (u_1 + u_2) = (x_1 + u_1) + (x_2 + u_2).

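3.45 Theorem. Suppose that S is a set closed under a binary operation +
and that + is associative and commutative on S (for all x, y ∈ S,
x + y = y + x). Let ⟨x_k⟩_{k∈P} and ⟨y_k⟩_{k∈P} be any infinite sequences of
terms in S. Then for any n ∈ P

(i) \sum_{k=1}^{n} (x_k + y_k) = (\sum_{k=1}^{n} x_k) + (\sum_{k=1}^{n} y_k).

Similarly, if · is associative and commutative on S then

(ii) \prod_{k=1}^{n} (x_k · y_k) = (\prod_{k=1}^{n} x_k) · (\prod_{k=1}^{n} y_k).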

The proof of this is left to the student. A generalized distributive law is
easily stated as follows; the proof of this is also left to the student.

3.46 Theorem. Suppose S is a set closed under binary operations +, ·.
Let · be right distributive over + on S, that is, for any x, y, z ∈ S,

(x + y) · z = (x · z) + (y · z).

Let ⟨x_k⟩_{k∈P} be any infinite sequence of terms in S and let z ∈ S. Then
for any n ∈ P

(\sum_{k=1}^{n} x_k) · z = \sum_{k=1}^{n} (x_k · z).

A similar result holds for left distributivity, i.e., when z · (x + y) =
(z · x) + (z · y) for any x, y, z ∈ S. Of course if · is commutative on S
these two results are equivalent.
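
A numerical spot check of the generalized laws is easy to set up. The sketch below assumes the natural readings of the two theorems: the sum of the termwise sums x_k + y_k equals the sum of the x_k plus the sum of the y_k (and likewise for products), and the sum of the x_k times z equals the sum of the products x_k · z. The names `ext`, `termwise_law_holds` and `distributive_law_holds` are illustrative.

```python
from functools import reduce
import operator


def ext(op, terms):
    """Left-nested combination ((t1 op t2) op t3) ... of a nonempty list."""
    return reduce(op, terms)


def termwise_law_holds(xs, ys):
    """Check the termwise law for both + and * on two lists of equal length."""
    sum_ok = ext(operator.add, [x + y for x, y in zip(xs, ys)]) == (
        ext(operator.add, xs) + ext(operator.add, ys)
    )
    prod_ok = ext(operator.mul, [x * y for x, y in zip(xs, ys)]) == (
        ext(operator.mul, xs) * ext(operator.mul, ys)
    )
    return sum_ok and prod_ok


def distributive_law_holds(xs, z):
    """Check (x_1 + ... + x_n) * z == x_1 * z + ... + x_n * z."""
    return ext(operator.add, xs) * z == ext(operator.add, [x * z for x in xs])
```

Such checks only sample the statements; the proofs asked for in the exercises proceed by induction on n.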

Some special sums and products. The reader is familiar with the values
of various special sums, for example: the sum of the first n positive integers
1, 2, 3, . . . , n; more generally the sum of any arithmetic progression
a, a + d, a + (2 · d), . . . , a + (n · d); the sum of the first n squares
1, 4, 9, . . . , n^2; the sum of a geometric progression a, a · r, a · r^2, . . . ,
a · r^n; etc. Expressed in the Σ notation we have, for the first of these,

(3:4-8) \sum_{k=1}^{n} k = \frac{n(n + 1)}{2}.

Since we do not formally presuppose fractions, this can for the moment be
expressed as follows:

(3:4-9) 2 · \sum_{k=1}^{n} k = n(n + 1).

A proof of (3:4-9) is easily given by induction on n. We will delay
consideration of the general arithmetic and geometric progressions until we
have negative numbers and fractions available.
The product of the first n positive integers is what we usually denote
by n! In fact if we define

(3:4-10) n! = \prod_{k=1}^{n} k,

the recursive characterization of n! which we gave earlier is an immediate
consequence of 3.43:

(3:4-11) 1! = 1,
(n + 1)! = (n + 1) · (n!).
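
Both the fraction-free identity for the sum of the first n positive integers and the recursive characterization of n! can be checked mechanically for small n. This is a sketch with Python integers standing in for P; `sum_first` and `factorial` are illustrative names.

```python
def sum_first(n):
    """The sum 1 + 2 + ... + n, accumulated term by term."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def factorial(n):
    """The product 1 * 2 * ... * n, accumulated term by term."""
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result
```

The identity 2 · (1 + 2 + · · · + n) = n(n + 1) is stated without division, so it stays within the positive integers.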
Actually, even the ordinary product and exponentiation can be
recaptured from the generalized sums and products. Consider a sequence
⟨x, x, . . . , x, . . .⟩ all terms of which are identical to a given x ∈ P. Then
we have for n ∈ P

(3:4-12) \sum_{k=1}^{n} x = n · x

and

(3:4-13) \prod_{k=1}^{n} x = x^n.

These correspond to our intuitive understanding of n · x as a sum
x + x + · · · + x of x, n-times, and of x^n as a product x · x · . . . · x of
x, n-times. (3:4-12) and (3:4-13) are easily proved by induction on n.
We wish to make a final remark concerning the use of the Σ and Π
notation. The variable k in \sum_{k=1}^{n} x_k is what is often called a “dummy
index,” i.e., we can also write, for example,

(3:4-14) \sum_{k=1}^{n} x_k = \sum_{i=1}^{n} x_i = \sum_{j=1}^{n} x_j = \sum_{l=1}^{n} x_l.

On the other hand, \sum_{k=1}^{n} x_k, \sum_{k=1}^{m} x_k will in general denote different
numbers, depending on what n, m denote. Thus k and n in \sum_{k=1}^{n} x_k behave,
respectively, like bound variables and free variables in the formulation of
conditions. The situation is analogous with integration in the calculus.
Thus \sum_{k=1}^{n} k^2 = \sum_{i=1}^{n} i^2 and \int_1^n x^2 dx = \int_1^n y^2 dy; here k, i are to be
compared with the “variables of integration” x and y. It is possible to
develop the theory of sums, products, integrals, etc., without the use of
such bound variables. For example, instead of dealing with an infinite
sequence ⟨x_k⟩_{k∈P}, we would deal with the function H, H(k) = x_k for all
k ∈ P. Then we would write

(3:4-15) Σ_n H

instead of \sum_{k=1}^{n} x_k. This has the recursive properties (corresponding to
3.42)

(3:4-16) Σ_1 H = H(1),
Σ_{Sc(n)} H = (Σ_n H) + H(Sc(n)).

Indeed we have done essentially this in showing that the definition 3.42
is justified. This sort of notation may be preferred on various theoretical
grounds, but for practical questions it is more awkward to deal with. For
example, in order to express (3:4-9) we would have to write

(3:4-17) Let H be a function with domain P, H(k) = k for all k ∈ P;
then 2 · Σ_n H = n(n + 1) for every n ∈ P.

Exercise Group 3.4

1. Let ⟨x_k⟩_{k∈P}, ⟨y_k⟩_{k∈P} be two infinite sequences of terms from a set S.
Prove that for any n ∈ P, if x_k = y_k for every k ≤ n then

\sum_{k=1}^{n} x_k = \sum_{k=1}^{n} y_k and \prod_{k=1}^{n} x_k = \prod_{k=1}^{n} y_k.

2. Prove Theorem 3.45.
3. Prove Theorem 3.46.
4. Show that for any n ∈ P, 6 · \sum_{k=1}^{n} k^2 = n(n + 1)(2n + 1).
5. The following sequence of statements is designed to lead to a proof of
(3:4-1), that a nonempty set S is finite in the sense of (2:4-5) if and only
if there is an n ∈ P such that S has n elements. We use [1, n] to denote
{k: k ∈ P and k ≤ n} below. Prove the following:
(a) If x is any element then {x} is finite.
(b) If S is finite and x is any element then S ∪ {x} is finite.
(c) If n ∈ P then [1, n] is finite.
(d) If S is finite and S' is set-theoretically equivalent to S then S' is finite.
(e) Let M be any nonempty collection of nonempty sets. Then there
exists a function G with 𝒟(G) = M and G(X) ∈ X for each X ∈ M.
[Hint: This is a consequence of the axiom of choice (2:2-16).
Consider the collection N of all sets X' of the form X' = {(X, x): x ∈ X},
where X ∈ M. Apply (2:2-16) to N.]
(f) Let S be a set such that for every n ∈ P, [1, n] is not set-theoretically
equivalent to S. Then there exists a one-to-one function F with
𝒟(F) = P and ℛ(F) ⊆ S. [Hint: Let M be the collection of all
nonempty subsets of S and let G be a function satisfying the
conclusion of (e). Define the desired function recursively, taking F(n + 1)
to be an element of S not in {F(1), . . . , F(n)} as given by

G(S − {F(1), . . . , F(n)}).]

(g) If there exists a one-to-one function F with 𝒟(F) = P and ℛ(F) ⊆ S
then S is infinite.
Now from these statements prove (3:4-1).
CHAPTER 4

THE INTEGERS AND INTEGRAL DOMAINS

4.1 Toward extending the positive integers. Practical motivations. For
the practical purposes of transmitting information about various integral
quantities it is necessary to have a system of notation which can be used
to denote arbitrarily large positive integers. The system of tallies

|, ||, |||, . . .

is the simplest and most obvious of these. Its practical disadvantages are
also immediately apparent. It is a time-consuming job to denote the
number of sheep in a moderate-sized flock or the number of bushels of
wheat in a crop by means of such a system. Even more laborious are
arithmetical computations which would be associated with various business
transactions. For example, we might agree to pay

||||||||||||||

pieces of gold for each of

|||||||||||||||||||||||||||||

bushels of wheat. It would be a long time before we discovered that
perhaps we did not want to pay that much, after all.
It thus became a matter of practical necessity, long ago, to develop a
compact systematic notation for dealing with large numbers. This was
provided by selecting larger basic units into which large numbers can be
decomposed into a small number of more easily recognizable parts. For
example, if ||||| is taken as such a larger unit, the above number of
bushels is more readily apprehended as

|||||/ |||||/ |||||/ |||||/ |||||/ ||||,

where the diagonal indicates a completed unit of |||||. Still larger
numbers would be analyzed in terms of the number of |||||/ groups of |||||/,
etc. If we abbreviated |||||/ by /, and ||||| groups of / by //, etc., we can
describe the above number also as consisting of ||||| : /’s plus |||| : |’s, i.e.,
of a single //, no /’s, and |||| : |’s. To carry this through, we need a way
of indicating that there are none of a certain type of unit; we may do this,
for example, by using the symbol 0. Thus the above number consists
of | : //, 0 : /, |||| : |. The number of pieces of gold per bushel is
|||||/ |||||/ |||| and hence consists of || : /, |||| : |. Still further economy
is achieved by special denotations for |, ||, |||, ||||, say 1, 2, 3, 4.
Then the numbers considered are 1 : //, 0 : /, 4 : |, and 2 : /, 4 : |. If
this sort of decomposition is always given in a fixed order, with the
number of the smallest unit last, the position of the digits is sufficient
for a description of the number: 104, 24. With another choice of basic
grouping we would get another notation for these numbers. For example,
in our commonly used base of ||||||||||, with notations for preceding
numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, we would denote the above numbers by 29
and 14. We would say that 104 is a notation for 29 to the base 5. (In
modern times we have found it convenient again to work with very small
bases; the base 2 is especially useful in electronic computing machines.
The notation 11101 denotes 29 to the base 2.)
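
The passage from a number to its positional notation in a given base amounts to repeated grouping, which can be sketched as repeated division with remainder. The function name `to_base` is an illustrative assumption, and the sketch is restricted to positive integers and bases from 2 to 10 so that each digit is a single decimal character.

```python
def to_base(n, base):
    """Positional notation for a positive integer n in a base between 2 and 10."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # peel off the count of the smallest unit
        digits.append(str(remainder))
    return "".join(reversed(digits))     # smallest unit is written last
```

For instance, `to_base(29, 5)` produces the notation 104 and `to_base(14, 5)` produces 24, in agreement with the decompositions above.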
Thus the practical question of economically denoting positive integers
led to the use of a symbol 0 which by itself denotes no quantity. This
could be understood and accepted in a purely formal way. However, the
justification that such systems of notation are correct (in the sense that
every positive integer receives one and only one notation) is best realized
if, in addition, 0 has certain algebraic properties assigned to it. Thus, in
the base 5, we wish to analyze numbers into groups of units 1, 5, 5^2, . . .
We can represent the number 29 in this system as 1 · 5^2 + 4 · 1; to reach
the representation 104, we would like to write 104 as 1 · 5^2 + 0 · 5 + 4 · 1.
This is possible if the following laws hold for 0:

(4:1-1) 0 · a = 0, 0 + a = a.

Also implicitly involved in this use of 0 is that the associative law continues
to hold when the positive integers are extended by including 0 as a new
“number.”
It would be natural to hope that the positional notation described above
would also lead to an economical means of carrying out arithmetical
computations. In order to make this possible, still further algebraic properties
of 0 are needed. Consider, for example, what is involved in the
computations of 14 · 29 in the base 10 and the base 5:

(4:1-2)      29             104
             14              24
            ---            ----
            116             431
            29             213
            ---            ----
            406            3111
         (base 10)       (base 5)

In the one case we are computing (2 · 10 + 9) · (1 · 10 + 4), in the
other case (1 · 5^2 + 0 · 5 + 4) · (2 · 5 + 4). To follow this through we
must make extensive use of the distributive law for · over +, and of
commutative and associative laws for both + and ·, when the positive
integers are extended by 0.
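
The schoolbook layout for 14 · 29 can be followed digit by digit in any base. The sketch below is one possible arrangement, not a prescribed algorithm: numerals are strings of decimal digit characters, the base is assumed to be at most 10, and the helper names are all illustrative.

```python
def to_digits(numeral):
    """Digits of a numeral string, least significant first."""
    return [int(ch) for ch in reversed(numeral)]


def to_numeral(digits):
    """Inverse of to_digits, dropping leading zeros."""
    while len(digits) > 1 and digits[-1] == 0:
        digits = digits[:-1]
    return "".join(str(d) for d in reversed(digits))


def times_digit(digits, d, base):
    """One row of the layout: a numeral times a single digit, with carries."""
    row, carry = [], 0
    for a in digits:
        carry, r = divmod(a * d + carry, base)
        row.append(r)
    if carry:
        row.append(carry)
    return row


def add_digits(xs, ys, base):
    """Column-by-column addition with carries."""
    out, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        s = (xs[i] if i < len(xs) else 0) + (ys[i] if i < len(ys) else 0) + carry
        carry, r = divmod(s, base)
        out.append(r)
    if carry:
        out.append(carry)
    return out


def long_multiply(a, b, base):
    """Return (partial product rows, final product) as numeral strings."""
    a_digits = to_digits(a)
    rows, total = [], [0]
    for shift, d in enumerate(to_digits(b)):
        row = times_digit(a_digits, d, base)
        rows.append(to_numeral(row))
        # Shifting a row one place left multiplies it by the base.
        total = add_digits(total, [0] * shift + row, base)
    return rows, to_numeral(total)
```

Each row uses the distributive law to multiply one digit of the second factor through the first factor, and the shifted addition uses the rule 0 + a = a for the padded places.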
It is seen then that much of value in problems solely concerning positive
integers could be obtained if we could be sure that it is consistent to
assume that there is a new number 0 satisfying (4:1-1) such that most of
the usual algebraic laws holding for +, · in the set of positive integers
continue to hold in the new system of numbers. It should not be expected
that all laws will remain true. Thus, if we demand (4:1-1), we must have
0 · a = 0 · b for all numbers a, b, and hence we cannot consistently demand
that the cancellation law for · also continue to hold. The analysis of the
problem of justifying the positional notation and its use in arithmetical
computations provides us with a minimum requirement as to which laws
we should like to be able to extend to cover the adjunction of the number 0.

Algebraic motivations. The historical step from arithmetic to algebra
came with the realization that the quantity or quantities to be determined
in many problems could not be obtained by an immediate arithmetical
computation but instead had to be deduced from some condition or
conditions which they were required to satisfy. For example, a particular
problem may lead to the question of what positive integers x and y, if any,
satisfy the conditions

(4:1-3) 4x + y = 10 and 3x + 2y = 12.

The brute-force approach to this problem is to test all pairs (x, y) of
positive integers in some succession, say (1, 1), (1, 2), (2, 1), (1, 3), (2, 2),
(3, 1), . . . with the hopes that we will eventually reach a solution.
Unfortunately, if a particular finite series of such tests fails to give us a
solution, the question will not be conclusively settled by this approach, since
we may not have gone far enough in our testing or there may be no
solution at all. With a little sophistication this can be remedied; we observe
that if there is a solution we must have 4x < 10 and 3x < 12, also
y < 10 and 2y < 12. From this we easily conclude that a solution, if
any exists, must be among the ten pairs (x, y) for which x ≤ 2 and y ≤ 5.
Although we are now in a position to decide whether there is a solution,
the number of computations we have to make (unless we are lucky) is
still slightly burdensome. The step of enlightenment consists in attempting
instead to eliminate one of the variables from this problem. We multiply
both sides of the first equation by 2 to obtain 8x + 2y = 20. Here 2y
must be a number which when added to 8x yields 20; let us write
2y = 20 − 8x as an abbreviation for this statement. Similarly, 2y =
12 − 3x. Thus, if there is a solution we must have 20 − 8x = 12 − 3x.
Returning to the meaning of −, we see that x must be a number such
that 20 + 3x = 12 + 8x, i.e., 8 + 3x = (5 + 3)x = 5x + 3x, so that
we must have 8 = 5x. Since there is no positive integer x satisfying this
last equation, we see that (4:1-3) has no solution in positive integers.
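
Both routes to the conclusion can be checked mechanically. The sketch below takes the conditions to be 4x + y = 10 and 3x + 2y = 12, as the bounds and the elimination step in the text indicate; the function name `solutions` is an illustrative assumption.

```python
def solutions(max_x, max_y):
    """All pairs (x, y) of positive integers within the bounds satisfying both conditions."""
    return [
        (x, y)
        for x in range(1, max_x + 1)
        for y in range(1, max_y + 1)
        if 4 * x + y == 10 and 3 * x + 2 * y == 12
    ]


# The ten candidate pairs with x <= 2 and y <= 5.
candidates = [(x, y) for x in range(1, 3) for y in range(1, 6)]

# The eliminated equation 8 = 5x, tested for every positive integer x up to 8.
eliminated = [x for x in range(1, 9) if 5 * x == 8]
```

The bounded search examines exactly the ten candidate pairs, while the eliminated equation needs only a handful of trials in one variable.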
This suggests that the use of an operation b − a, which is taken to be
that number z, if such exists, for which a + z = b, could have great use
in the formal simplification and manipulation of various problems, even
when we are only concerned with positive integers. We know by 3.24 that
this operation would be defined for positive integers a, b only when a < b.
However, to interrupt a problem in which we wanted to make use of this
operation with the imposition of various conditions of order would again
increase the burden which we are trying to avoid. The way out of this
would be to allow unrestricted application of the operation b − a with
the proviso that, when b ≤ a, what is obtained is some new kind of
number. On these new kinds of numbers we must have an operation +
defined, with respect to which − is defined as above. In particular,
a − a should be a number z for which a + z = a. The natural candidate
is the number 0 which we “invented” earlier. Thus we seek to extend the
operations +, · defined on the positive integers to a new class of objects,
or numbers, which we shall call integers, in such a way that as many as
possible of the properties of +, · on the positive integers continue to apply
and such that,

(4:1-4) for any integers a, b there exists an integer z such that a + z = b.

Our approach to this end will consist of two steps. We shall first set down
those properties of a system of objects D with certain operations +, •
which we would like to see fulfilled in an extension of the positive integers
and we will investigate the consequences of these properties. We will
then prove the existence of at least one such extension of P. There are
many such extensions; we shall be able to single out the set of integers
as providing, in some sense, the least such extension.

Commutative rings with unity

4.1 Definition. A system (D, +, ·, 0, 1) is called a commutative ring with
unity if 0, 1 ∈ D, D is closed under the binary operations + and ·, and
if the following conditions hold for all x, y, z ∈ D:
(i) 0 ≠ 1;
(ii) x + y = y + x and x·y = y·x;
(iii) x + (y + z) = (x + y) + z and x·(y·z) = (x·y)·z;
(iv) x + 0 = x and x·1 = x;
(v) x·(y + z) = x·y + x·z;
(vi) there exists a u ∈ D such that x + u = y.
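
For a finite system the conditions of 4.1 can be checked by exhaustive search. The sketch below is one possible way to do this in Python; the function name and the two test systems (arithmetic modulo 6, and a "saturating" arithmetic on {0, 1, 2}) are our own illustrative choices and do not come from the text.

```python
from itertools import product

def is_commutative_ring_with_unity(elements, add, mul, zero, one):
    """Check closure and conditions (i)-(vi) of 4.1 by brute force."""
    D = list(elements)
    if zero == one:                                               # (i)
        return False
    for x, y in product(D, repeat=2):
        if add(x, y) not in D or mul(x, y) not in D:              # closure
            return False
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):      # (ii)
            return False
        if not any(add(x, u) == y for u in D):                    # (vi)
            return False
    for x in D:
        if add(x, zero) != x or mul(x, one) != x:                 # (iv)
            return False
    for x, y, z in product(D, repeat=3):
        if add(x, add(y, z)) != add(add(x, y), z):                # (iii), +
            return False
        if mul(x, mul(y, z)) != mul(mul(x, y), z):                # (iii), ·
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):        # (v)
            return False
    return True

# Arithmetic modulo 6 on {0, ..., 5} passes every condition.
print(is_commutative_ring_with_unity(
    range(6), lambda a, b: (a + b) % 6, lambda a, b: (a * b) % 6, 0, 1))

# Saturating arithmetic on {0, 1, 2} fails (vi): 1 + u = 0 has no solution.
print(is_commutative_ring_with_unity(
    range(3), lambda a, b: min(a + b, 2), lambda a, b: min(a * b, 2), 0, 1))
```

Such a check is only available for finite systems; for the systems of real interest here the conditions must be proved.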
4.1] TOWARD EXTENDING THE POSITIVE INTEGERS 105

Some remarks about these conditions are in order here. First of all, the
conditions (ii)-(iv) are dual with respect to +, 0 and ·, 1, i.e., each part
of these conditions is obtained from the other by replacing the one
operation and constant by the other. Thus any deduction of a theorem
concerning +, 0 from the first parts of (ii)-(iv) can be used to obtain a
deduction of the dual theorem from the second parts. (ii) gives us commutative
laws, (iii) associative laws, and (iv) describes 0 and 1 as being identity
elements for +, ·, respectively. (v) is a distributive law for · over +.
With the exception of the first part of (iv), we have seen in Chapter 3
that these conditions are all satisfied by P with +, ·, 1. (vi) fulfills the
possibility of subtraction that we wished to obtain. It should be desired
that the result is unique, i.e., if x + u = y and x + v = y, then u = v;
more simply, we want x + u = x + v to imply u = v. This corresponds
to the cancellation law 3.12 for P. We shall be able to derive this law for D
in 4.3 below.
In the modern study of algebra it has proved useful to consider systems
which satisfy some but not necessarily all of the conditions of 4.1. Most
prominently, those systems which are merely assumed to satisfy, among
the conditions of 4.1, the first part of (ii), both parts of (iii), the first part
of (iv), and (v) and (vi), are called rings. If, in addition, the second part
of 4.1 (ii) is satisfied, the ring is said to be commutative. If, finally, there is
an element 1 satisfying 4.1 (i) and the second part of (iv), the ring is said
to have a unit or unity element. It is for this reason that we have used
the given designation for the systems satisfying all conditions of 4.1.
It is thus seen that the notion of a commutative ring with unity embodies
those properties which, on first glance, we would like to see extended from
P, together with the general possibility of subtraction. We shall see that
many different such systems besides the integers can be constructed. Since
various of these will be useful in our development, it is worthwhile having
a list of results which apply to any commutative ring with unity. These
will be taken up in this section; in the next section we shall deal with
certain elaborations of the basic definition 4.1 which will bring us still
closer to the integers.
Throughout the remainder of this section we assume that (D, +, ·, 0, 1)
is an arbitrary commutative ring with unity.
We begin by establishing the uniqueness of 0, 1 as identity elements.

4.2 Theorem. Let u ∈ D.
(i) If x + u = x for all x ∈ D, then u = 0.
(ii) If x·u = x for all x ∈ D, then u = 1.
Proof. Assuming the hypothesis of (i) we would have 0 + u = 0.
Hence also u + 0 = 0, so that u = 0 by 4.1(iv). The proof of (ii) is
obtained by duality.

4.3 Theorem. If x, y, z ∈ D and x + y = x + z, then y = z.

Proof. We “subtract” x from each side of x + y = x + z. That is,
pick u, according to 4.1(vi), for which x + u = 0 or, by commutativity,
u + x = 0. We add u to both sides, obtaining u + (x + y) = u + (x + z).
Hence by associativity and choice of u, 0 + y = 0 + z. But then by
commutativity and 4.1(iv), y = z.

4.1 (vi) and 4.3 immediately yield the following.

4.4 Corollary. For any x, y ∈ D there exists a unique u with x + u = y.

4.5 Definition. (i) For any x, y ∈ D we take y - x to be the unique u
such that x + u = y.
(ii) For any x ∈ D we take -x = 0 - x.

Thus y - x is a binary operation and -x is a unary operation under
which D is closed.

4.6 Theorem. For any x, y, u ∈ D we have:
(i) x + (-x) = 0;
(ii) if x + u = 0, then u = -x;
(iii) y - x = y + (-x).

Proof. (i) is immediate from 4.5(i) and (ii), and (ii) then follows by 4.4.
To prove (iii) it suffices, by 4.4, to show that x + (y + (-x)) = y. By
associativity and commutativity this is equivalent to y + (x + (-x)) = y,
which in turn follows from (i) and y + 0 = y.

Parts (i) and (ii) may be said to characterize —x as the additive inverse
of x.
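
Definition 4.5 determines subtraction purely from addition, and this can be imitated directly in a finite example. The following Python sketch assumes arithmetic modulo 12 as the underlying ring (a choice of ours, not one made in the text) and computes y - x as the unique u with x + u = y.

```python
N = 12            # modulus, chosen arbitrarily for illustration
D = range(N)

def add(x, y):
    return (x + y) % N

def sub(y, x):
    """y - x in the sense of 4.5(i): the unique u with x + u = y."""
    solutions = [u for u in D if add(x, u) == y]
    assert len(solutions) == 1        # uniqueness, as in Corollary 4.4
    return solutions[0]

def neg(x):
    """-x in the sense of 4.5(ii), namely 0 - x."""
    return sub(0, x)

# Theorem 4.6(i) and (iii), checked for every choice of x and y.
assert all(add(x, neg(x)) == 0 for x in D)
assert all(sub(y, x) == add(y, neg(x)) for x in D for y in D)

print(sub(3, 5), neg(5))   # 10 7
```

Nothing here uses Python's own subtraction operator; the search over u plays the role of condition 4.1(vi).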

4.7 Theorem. For any x ∈ D, x·0 = 0.

Proof. It will suffice, by 4.3, to find a u such that u + x·0 = u + 0. In fact,

x·x + x·0 = x·(x + 0)   by distributivity
          = x·x         by 4.1(iv)
          = x·x + 0     again by 4.1(iv).

4.8 Theorem. For any x, y ∈ D we have:
(i) -(-x) = x;
(ii) x·(-y) = -(x·y);
(iii) (-x)·y = -(x·y);
(iv) (-x)·(-y) = x·y.

Proof. (i) It suffices to show, by 4.6(ii), that (-x) + x = 0; however,
this is immediate from 4.6(i).

(ii) It suffices to show, by 4.6(ii), that x·y + x·(-y) = 0. But

x·y + x·(-y) = x·(y + (-y))   by distributivity
             = x·0            by 4.6(i)
             = 0              by 4.7.

(iii) follows immediately from (ii) and commutativity.
(iv) then follows from (i)-(iii).

Thus the “law of signs” 4.8(iv), which seems so arbitrary to many
beginning students of arithmetic, is seen to need no justification other than
that it is a formal consequence of the conditions for a commutative ring.

4.9 Theorem. For any x, y, z ∈ D, x·(y - z) = x·y - x·z.

Proof.
x·(y - z) = x·(y + (-z))     by 4.6(iii)
          = x·y + x·(-z)     by distributivity
          = x·y + (-(x·z))   by 4.8(ii)
          = x·y - x·z        by 4.6(iii).

By the developments of Section 3.4, we can associate with any sequence
⟨x_1, x_2, ..., x_k, ...⟩ of elements of D sums Σ_{k=1}^{n} x_k and products
Π_{k=1}^{n} x_k generalizing the operations + and ·, respectively. In particular,
the notions of multiple and exponentiation can be defined as follows.

4.10 Definition. Let x ∈ D and n ∈ P. We set

(i) nx = Σ_{k=1}^{n} x
and
(ii) x^n = Π_{k=1}^{n} x.

One should distinguish nx from n·x; the latter may have no meaning,
since for an arbitrary integral domain we need not have P ⊆ D; in
particular we may have n ∉ D. The basic properties of multiples and
exponents are given in the next two theorems, the proofs of which will be
left to the reader.
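
The distinction between the multiple nx and a product n·x is easiest to see in a ring whose elements are not themselves numbers. The Python sketch below uses, as an assumed example of our own, pairs of residues modulo 5 with componentwise operations; the positive integer 3 is not an element of this system, yet 3x and x^3 are perfectly meaningful.

```python
from functools import reduce

M = 5   # componentwise arithmetic modulo 5 on pairs; an illustrative ring

def add(x, y):
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % M)

def mul(x, y):
    return ((x[0] * y[0]) % M, (x[1] * y[1]) % M)

def multiple(n, x):
    """nx of 4.10(i): the sum x + x + ... + x with n terms, n a positive integer."""
    return reduce(add, [x] * n)

def power(x, n):
    """x^n of 4.10(ii): the product x·x· ... ·x with n factors."""
    return reduce(mul, [x] * n)

x = (2, 3)
print(multiple(3, x))   # (1, 4)
print(power(x, 3))      # (3, 2)

# An instance of 4.11(ii): (m + n)x = mx + nx with m = 2, n = 3.
assert multiple(2 + 3, x) == add(multiple(2, x), multiple(3, x))
```

Here `reduce` simply folds the binary operation over n copies of x, which mirrors the recursive definition of finite sums and products.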

4.11 Theorem. For any x, y ∈ D and m, n ∈ P we have:
(i) 1x = x;                  (ii) (m + n)x = mx + nx;
(iii) (m·n)x = m(nx);        (iv) n0 = 0;
(v) n(x + y) = nx + ny;      (vi) n(x·y) = (nx)·y;
(vii) n(-x) = -(nx);         (viii) n(x - y) = nx - ny.

4.12 Theorem. For any x, y ∈ D and m, n ∈ P we have:
(i) x^1 = x;                 (ii) x^(m+n) = x^m·x^n;
(iii) x^(m·n) = (x^m)^n;     (iv) 0^n = 0;
(v) 1^n = 1;                 (vi) (x·y)^n = x^n·y^n.

The motive in the particular formulation of these theorems is to
consider the effect of the arithmetic of 1, +, · in P on the notion of multiple,
as in 4.11(i)-(iii), and on the notion of exponent, as in 4.12(i)-(iii). In
the remaining parts we consider, as far as possible, the effect of using a
fixed multiple or fixed exponent n on the arithmetic of 0, 1, +, ·, -.
The reader should see why other laws suggested by this procedure cannot
be included.

Exercise Group 4.1

1. Prove 4.11(ii), (iii), (v), (vi), (vii).
2. Prove 4.12(iii), (vi).
3. Prove that for any x, y, z, w ∈ D, x - y = z - w if and only if
x + w = z + y.
4. Prove that for any x, y, z, w ∈ D,
(a) (x - y) + (z - w) = (x + z) - (y + w),
(b) (x - y) - (z - w) = (x + w) - (y + z),
(c) (x - y)·(z - w) = (x·z + y·w) - (y·z + x·w).
5. Can you construct a system satisfying all conditions of 4.1 except the
first, 0 ≠ 1?

4.2 Integral domains. There are other concepts and laws associated
with the positive integers which we should see if we can consistently extend
in whole or in part to systems in which subtraction is available. For
example, we have the cancellation law 3.21 for multiplication, according
to which if x, y, z ∈ P and x·y = x·z, then y = z. This law cannot be
consistently extended to commutative rings with unity as it stands; for
0·1 = 0·0 but 1 ≠ 0. However, if we exclude the possibility that
x = 0, we obtain a useful concept.

4.13 Definition. A system (D, +, ·, 0, 1) is called an integral domain if
it is a commutative ring with unity satisfying the following condition:
for all x, y, z ∈ D,

if x ≠ 0 and x·y = x·z, then y = z.

If (D', +, ·, 0, 1) is a subsystem of (D, +, ·, 0, 1) and is an integral
domain then it is said to be a subdomain of the second system.

It is easily seen that if (D, +, ·, 0, 1) is an integral domain and D' ⊆ D,
then D' forms a subdomain of D under the operations of D if and only if
4.2] INTEGRAL DOMAINS 109

1 ∈ D' and for any x, y ∈ D' we have x + y ∈ D', x - y ∈ D', and
x·y ∈ D'.
Later we shall be able to give a number of examples of commutative
rings with unity which are not integral domains. The following provides a
useful equivalent to this new condition.

4.14 Theorem. Suppose that (D, +, ·, 0, 1) is a commutative ring with
unity. Then it is an integral domain if and only if for any x, y ∈ D,
x·y = 0 implies that x = 0 or y = 0.

Proof. Suppose that the system is an integral domain. Suppose that
x·y = 0. If x = 0, we are through. Suppose that x ≠ 0; we shall
show in this case that y = 0. In fact, x·0 = 0 by 4.7, hence x·y = x·0,
hence y = 0 by 4.13. Suppose now that x·y = 0 implies x = 0 or y = 0
for all x, y ∈ D. Consider any x, y, z ∈ D for which x ≠ 0 and x·y = x·z.
Then x·y - x·z = 0 by 4.6, hence x·(y - z) = 0 by 4.9. Since x ≠ 0,
we must have y - z = 0, i.e., y = z by 4.6.

Theorem 4.14 provides the first step to the study of the solutions of
algebraic equations in an integral domain. Let D be such a domain and
let n̄ = n1 for any n ∈ P. By the results of Section 4.1 we have, for
example, x^2 - 2̄·x - 3̄ = (x - 3̄)·(x + 1). Hence, for any x ∈ D,
x^2 - 2̄·x - 3̄ = 0 if and only if (x - 3̄)·(x + 1) = 0. By 4.7 and
4.14, the latter is equivalent to x - 3̄ = 0 or x + 1 = 0, i.e., to x = 3̄
or x = -1. It would be true in any commutative ring with unity that
both 3̄ and -1 are solutions of the equation x^2 - 2̄·x - 3̄ = 0. However,
we would be unable to establish, without the hypothesis that we have
an integral domain, that these are the only solutions of this equation.
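
The role of the integral-domain hypothesis can be seen numerically. The sketch below assumes the familiar arithmetic of residues modulo n, which the text has not yet constructed; we take it for granted that this gives a commutative ring with unity for every n > 1, and an integral domain when n = 7 but not when n = 12.

```python
def roots(n):
    """Solutions of x^2 - 2x - 3 = 0 among 0, ..., n - 1, computed modulo n."""
    return [x for x in range(n) if (x * x - 2 * x - 3) % n == 0]

print(roots(7))    # [3, 6]         (6 plays the role of -1)
print(roots(12))   # [3, 5, 9, 11]  (11 plays the role of -1)
```

Modulo 12 the extra solutions 5 and 9 arise because a product can vanish without either factor vanishing: for x = 5 we have (x - 3)·(x + 1) = 2·6, which is 0 modulo 12.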
We have not yet proved the existence of commutative rings with unity,
let alone of integral domains. Since the conditions for these do not
explicitly demand the existence of any elements other than 0, 1 where
0 ≠ 1, it is conceivable that such systems can be found using only these
two elements. If there were such a commutative ring with unity, it would
necessarily have the following addition and multiplication tables:

  + | 0  1          · | 0  1
 ---+------        ---+------
  0 | 0  1          0 | 0  0
  1 | 1  0          1 | 0  1

The entries in the + table are determined by the conditions x + 0 =
x = 0 + x, except for the entry 1 + 1 = 0. This must hold if we are
to have 1 + u = 0 for some u, and u = 0 will not work. The entries in
the · table are determined by the conditions x·0 = 0 = 0·x and x·1 = x.
Now it can be verified, by checking each possible choice of x, y, z in each
condition of 4.1 and 4.13 (or 4.14), that the two-element system thus
obtained is actually an integral domain. This is a tedious matter; for
example, to verify the distributive law alone requires checking eight cases.
However, we shall be able to obtain verification of this result as an easy
consequence of a theorem allowing us to construct a large number of
integral domains, which will be proved at the end of this chapter.
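
The case-by-case verification described above is tedious by hand but immediate by machine. The following Python sketch encodes the two tables as dictionaries (the names `ADD`, `MUL`, and `check_two_element_system` are ours) and checks the conditions of 4.1 together with the criterion of 4.14.

```python
from itertools import product

D = (0, 1)
ADD = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
MUL = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def add(x, y):
    return ADD[x, y]

def mul(x, y):
    return MUL[x, y]

def check_two_element_system():
    ok = True
    for x, y in product(D, repeat=2):
        ok &= add(x, y) == add(y, x) and mul(x, y) == mul(y, x)   # 4.1(ii)
        ok &= any(add(x, u) == y for u in D)                      # 4.1(vi)
        ok &= mul(x, y) != 0 or x == 0 or y == 0                  # 4.14
    for x in D:
        ok &= add(x, 0) == x and mul(x, 1) == x                   # 4.1(iv)
    for x, y, z in product(D, repeat=3):                          # eight cases each
        ok &= add(x, add(y, z)) == add(add(x, y), z)              # 4.1(iii), +
        ok &= mul(x, mul(y, z)) == mul(mul(x, y), z)              # 4.1(iii), ·
        ok &= mul(x, add(y, z)) == add(mul(x, y), mul(x, z))      # 4.1(v)
    return bool(ok)

print(check_two_element_system())   # True
```

Condition 4.1(i) holds trivially since the two elements are distinct, and closure is guaranteed by the tables themselves.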

Ordered integral domains. The distance is great between this example
of a two-element integral domain and the intuitive concept of the integers
which motivated us to study such systems. There is no contradiction here
but rather a demonstration that we have still not imposed enough
conditions to characterize our intuitive notion of integer. One thing that is
lacking in our development so far is the extension of the notion of order.
For various reasons (which will be brought out below), we expect that 0
should be “less than” 1 and that 1 should be “less than” 1 + 1. But if
this held in our two-element system, we would also have 1 “less than” 0,
contradicting the law of trichotomy. This shows that the notion of order
cannot be consistently applied to our two-element system. It may be
expected then that by adjoining conditions of order we will avoid such
unusual cases and come closer to the desired concept.
We already have an intuitive picture of the ordering of the integers, as
given by
-3   -2   -1   0   1   2   3

where x < y if x is to the left of y. It is clear that this is not a
well-ordering, but it certainly is a simple ordering. Its connection with the
arithmetical operations +, ·, at least on the positive integers, is given by
the statements 3.37(i), (ii), that for all x, y, z ∈ P, x < y implies that
x + z < y + z and x·z < y·z. Of these, we can expect the first to
hold even when z is 0 or negative; the second will be false when z = 0,
and when z is negative it will have the effect, rather, of making y·z < x·z.
Thus if we attempt to impose conditions of order on an integral domain,
we are led to the following definition.

4.15 Definition. A system (D, +, ·, <, 0, 1) is called an ordered integral
domain if the following conditions hold:
(i) (D, +, ·, 0, 1) is an integral domain;
(ii) (D, <) is a simply ordered system;
(iii) for all x, y, z ∈ D, if x < y, then x + z < y + z;
(iv) for all x, y, z ∈ D, if x < y and 0 < z, then x·z < y·z.

In any such domain, we denote by D+ the set {x : x ∈ D and 0 < x}.
If (D', +, ·, <, 0, 1) is a subsystem of (D, +, ·, <, 0, 1) then it is
said to be an ordered subdomain of the second system.

The existence of an ordered integral domain will be provided by the
construction of the integers in the next section.

4.16 Theorem. Let (D, +, ·, <, 0, 1) be an ordered integral domain. Then
for any x, y ∈ D we have:
(i) x < y if and only if y - x ∈ D+;
(ii) if x, y ∈ D+, then x + y ∈ D+ and x·y ∈ D+;
(iii) exactly one of the following three cases holds:

x ∈ D+, x = 0, -x ∈ D+;

(iv) if x ≠ 0, then x^2 ∈ D+;
(v) 1 ∈ D+;
(vi) if x ∈ D+ and n ∈ P, then nx ∈ D+ and x^n ∈ D+.

Proof. (i) If x < y, then x + (-x) < y + (-x) by 4.15(iii). Hence
0 < y - x. Conversely, if 0 < y - x, we have 0 + x < (y - x) + x,
hence x < y.
(ii) Suppose that 0 < x, 0 < y. Then 0 + y < x + y by 4.15(iii),
that is, y < x + y. Hence, by transitivity, also 0 < x + y. Also, by
4.15(iv), 0·y < x·y, that is, 0 < x·y.
(iii) -x ∈ D+ if and only if 0 < -x, that is, 0 < 0 - x, which by (i)
is equivalent to x < 0. Hence (iii) follows directly from the trichotomy
law for < applied to 0, x. The proofs of (iv)-(vi) are left to the reader.

The conditions 4.16(i), (ii), (iii) lead to an alternative characterization


of the notion of an ordered integral domain.

4.17 Theorem. Suppose that (D, +, ·, 0, 1) is an integral domain for which
there is a set Pos ⊆ D satisfying the following conditions:
(i) if x, y ∈ Pos, then x + y ∈ Pos and x·y ∈ Pos;
(ii) for any x ∈ D, exactly one of the following three cases holds:

x ∈ Pos, x = 0, -x ∈ Pos.

Define x < y to hold if y - x ∈ Pos. Then (D, +, ·, <, 0, 1) is an
ordered integral domain in which Pos = D+.

The proof of this is left to the reader.


It is now easy to see that there is no way in which we could define a
relation < which would turn the two-element integral domain into an
ordered integral domain. For by the definition of D+ and 4.16(v) we must
have 0 < 1. Hence also 0 + 1 < 1 + 1 by 4.15(iii). But in the two-element
domain 1 + 1 = 0, so that we would have both 0 < 1 and
1 < 0, contradicting the law of trichotomy. In fact, if D is an ordered
integral domain and we define n̄ = n1, we see by repeated addition of 1
to the inequality 0 < 1 that 0 < 1 < 2̄ < ··· < n̄ < n̄ + 1 < ···.
Thus the set of elements n̄ looks just like the positive integers as far as
their order is concerned. Even more can be said. For the remainder of
this section we assume that (D, +, ·, <, 0, 1) is an arbitrary ordered integral
domain.
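
The impossibility of ordering the two-element domain can also be confirmed through Theorem 4.16: any ordering would yield a set D+ satisfying conditions (i) and (ii) of 4.17, so it is enough to try every subset as a candidate. The Python sketch below does this; the helper names are ours, and arithmetic modulo 2 is used as a stand-in for the two tables given earlier.

```python
from itertools import combinations

D = (0, 1)

def add(x, y):
    return (x + y) % 2

def mul(x, y):
    return (x * y) % 2

def neg(x):
    return (-x) % 2          # the u with x + u = 0

def admissible(pos):
    """Do conditions (i) and (ii) of 4.17 hold for the candidate set pos?"""
    closed = all(add(x, y) in pos and mul(x, y) in pos
                 for x in pos for y in pos)
    trichotomy = all([x in pos, x == 0, neg(x) in pos].count(True) == 1
                     for x in D)
    return closed and trichotomy

candidates = [set(c) for r in range(len(D) + 1) for c in combinations(D, r)]
print([pos for pos in candidates if admissible(pos)])   # []
```

Every candidate fails trichotomy: since -1 = 1 in this system, the element 1 is either in both Pos and its set of negatives or in neither.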

4.18 Theorem. Let P̄ = {n1 : n ∈ P}. Then

(P, +, ·, <, 1) ≅ (P̄, +, ·, <, 1).

Proof. For each n ∈ P, let F(n) = n1. Then 𝒟(F) = P, ℛ(F) = P̄,
and F(1) = 1. Further, by carrying out the argument described above,
we see that for any n, m ∈ P

(1) n < m if and only if F(n) < F(m).

Hence

(2) F is one-to-one;

for if n ≠ m either n < m or m < n, hence either F(n) < F(m) or
F(m) < F(n), so that in any case F(n) ≠ F(m). All that is left to establish
is that

(3) F(n + m) = F(n) + F(m)

and

(4) F(n·m) = F(n)·F(m).

We obtain (3) from 4.11(ii) in the form (n + m)1 = n1 + m1. We
obtain (4) from 4.11(iii), (vi) in the form (n·m)1 = n(m1) = n(1·m1) =
n1·m1.

Absolute value. We are familiar with the concept of the absolute or


numerical value of a number, which is to be the number itself if the num¬
ber is positive or 0 and otherwise is the negative of the number. By the
trichotomy law we can apply this concept to any ordered integral domain.

4.19 Definition. For each x ∈ D, |x| is defined by the following conditions:
(i) if 0 ≤ x then |x| = x;
(ii) if x < 0 then |x| = -x.
4.3] CONSTRUCTION AND CHARACTERIZATION OF INTEGERS 113

4.20 Theorem. For any x, y, u ∈ D we have:
(i) 0 ≤ |x|;
(ii) -|x| ≤ x ≤ |x|;
(iii) if 0 ≤ u and -u ≤ x ≤ u, then |x| ≤ u;
(iv) |x + y| ≤ |x| + |y|;
(v) |x·y| = |x|·|y|.

Proof. (i) is obvious if 0 ≤ x and follows from Exercise 1(c) below
when x < 0. Hence -|x| ≤ 0 ≤ |x|. Since x = -|x| when x < 0, this
leads to (ii). To prove (iii), the conclusion is obvious if 0 ≤ x. If x < 0,
then from -u ≤ x follows -x ≤ u by Exercise 1(c), and hence again
|x| ≤ u. (iv) By adding the inequalities -|x| ≤ x ≤ |x| and -|y| ≤
y ≤ |y| we obtain -(|x| + |y|) ≤ x + y ≤ |x| + |y|. Applying (iii)
with x + y instead of x and |x| + |y| instead of u gives the desired
conclusion. (v) is easily proved by considering the four possible cases.
Part (iv) of 4.20 is the familiar triangle inequality.

Exercise Group 4.2

In the following we have an arbitrary ordered integral domain.

1. Prove that for any x, y, z ∈ D we have:
(a) x < y if and only if x + z < y + z;
(b) if 0 < z, then x < y if and only if x·z < y·z;
(c) x < y if and only if -y < -x;
(d) if z < 0, then x < y if and only if y·z < x·z.
2. Prove Theorem 4.16(iv), (v), (vi).
3. Prove Theorem 4.17.
4. Prove that ||x| - |y|| ≤ |x - y|.
5. For which values of n ∈ P does x^n = y^n imply that x = y for all x, y ∈ D?
Prove your statement.

4.3 Construction and characterization of the integers. According to
4.18 the set of multiples n1 in an ordered integral domain forms a system
isomorphic to the positive integers. We would say that the entire domain
looks just like the integers if it has the additional property that

(4:3-1) for any x ∈ D, either x = n1 for some n ∈ P or x = 0 or
x = -(n1) for some n ∈ P.

We would like to show now that there exists at least one domain with this
property and that any two domains with this property are isomorphic.
Then we will be justified in choosing one of these and calling it the system
of integers.

There are two approaches to the proof of existence. The first is to
adjoin a set of new objects to the given domain of positive integers and to
define the operations outright, and then to try to verify that the new
system is an ordered integral domain with the additional property (4:3-1).
Let 0, 1*, 2*, ... be such objects, of which none is in P and all are distinct
[for example, we can take 0 = (1, 1) and n* = (n + 1, 1) for n ∈ P].
Here n* is to be the element -n in the domain to be constructed; however,
we cannot give it this designation before we have defined suitable
operations. Let P* = {n* : n ∈ P} and let D = P ∪ {0} ∪ P*. We wish
to define operations x + y and x·y and a relation x < y for all x, y ∈ D.
This must be done by cases, according as x, y belong to P, {0}, P*. There
are nine main cases for each operation. (Once + is defined we could
simply define x < y to hold if and only if there is a z ∈ P with x + z = y.)
For example, we could define x + y as follows, for the subcase x ∈ P, and
the three associated subcases y ∈ P, y = 0, y ∈ P* (of which the last
must in turn be divided into three further cases):

                  x + y       if x ∈ P, y ∈ P
                  x           if x ∈ P, y = 0
(4:3-2)  x + y =  x ∸ z       if x ∈ P, y ∈ P*, y = z* and z < x
                  (z ∸ x)*    if x ∈ P, y ∈ P*, y = z* and x < z
                  0           if x ∈ P and y = x*.

Here u ∸ v is the operation defined on P only if v < u and is taken to be
the unique w ∈ P such that v + w = u (3.12, 3.24). The various clauses in
this definition are chosen on the grounds that
when we finish defining + and obtain the definition of - from it
according to 4.5 we will have u - v = u ∸ v whenever u, v ∈ P and v < u,
and -z = z* whenever z ∈ P. Thus, for example, if z < x, x + z*
should be x + (-z) = x - z. It is evident that completing the
definition of +, defining ·, and then verifying that all the desired conditions
are met by the resulting system would be an extremely tedious job.
Fortunately the second approach to the proof of existence is somewhat
more manageable, though slightly more sophisticated. Moreover it has
the advantage of being quite instructive regarding the method of carrying
out proofs of existence of other types of systems. The main idea is to
shift attention from the formation of negatives, as in (4:3-1), to the full
use of subtraction. In other words, we could demand, instead of (4:3-1),
that the desired domain should have the property

(4:3-3) for any x ∈ D there exist n, m ∈ P with x = n1 - m1.

In particular, x = q1, for q ∈ P, can be written as (q + 1)1 - (1)1 and 0
as n1 - n1. Thus to any pair of elements (n, m) ∈ P × P will correspond
the result of subtracting m1 from n1. However, in contrast to (4:3-1),
this representation of each x ∈ D is not unique; we can have x = n'1 -
m'1 for other n', m'. But it is seen that two pairs (n, m) and (n', m') lead
to the same value, by subtraction, if and only if n + m' = n' + m. This
defines an equivalence relation in the set P × P; the integral domain can
then be constructed as a system whose elements are the equivalence sets
of this equivalence relation. To define the operations of addition and
multiplication of these equivalence sets, we observe that if x has a
representation n1 - m1 and y has a representation p1 - q1, then x + y has
a representation (n + p)1 - (m + q)1 and x·y has a representation
(np + mq)1 - (nq + mp)1 (cf. Exercise 4 of Exercise Group 4.1). Thus
we obtain associated operations ⊕, ⊙ on pairs (n, m) with (n, m) ⊕
(p, q) = (n + p, m + q) and (n, m) ⊙ (p, q) = (np + mq, nq + mp). To
show that these lead to well-defined operations on equivalence sets, we
return to the considerations of Section 2.3. The full details of this approach
are now given in the proof of the following theorem.
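
The operations on pairs just described are easy to experiment with. In the Python sketch below a pair (n, m) of positive integers stands for the difference of n and m; the function names are ours, and only addition and multiplication of positive integers are used inside them.

```python
def equiv(a, b):
    """(n, m) and (n', m') give the same difference iff n + m' = n' + m."""
    (n, m), (n2, m2) = a, b
    return n + m2 == n2 + m

def pair_add(a, b):
    (n, m), (p, q) = a, b
    return (n + p, m + q)

def pair_mul(a, b):
    (n, m), (p, q) = a, b
    return (n * p + m * q, n * q + m * p)

# (2, 5) and (4, 7) both stand for "2 - 5"; (6, 1) stands for "6 - 1".
print(equiv((2, 5), (4, 7)))       # True
print(pair_add((2, 5), (6, 1)))    # (8, 6), which stands for 2
print(pair_mul((2, 5), (6, 1)))    # (17, 32), which stands for -15
```

The results agree with what ordinary integer arithmetic would give, namely -3 + 5 = 2 and (-3)·5 = -15, although no negative number is ever formed in the computation.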

The existence theorem.

4.21 Theorem. There exists an ordered integral domain (D, +, ·, <, 0, 1)
with the property that for any x ∈ D either there is an n ∈ P with
x = n1 or x = 0 or there is an n ∈ P with x = -n1.

Proof. Define

(1) W = {((n, m), (n', m')) : (n, m), (n', m') ∈ P × P and
n + m' = n' + m}.

Thus W is a binary relation in P × P. We shall also write

(2) (n, m) ≡ (n', m') for ((n, m), (n', m')) ∈ W.

We first verify that

(3) W is an equivalence relation in P × P.

This follows from the following statements, which hold true for all n, m,
n', m', n", m" ∈ P:
(a) (n, m) ≡ (n, m);
(b) if (n, m) ≡ (n', m'), then (n', m') ≡ (n, m);
(c) if (n, m) ≡ (n', m') and (n', m') ≡ (n", m"), then
(n, m) ≡ (n", m").

Of these, (a) and (b) are obvious from the definition of ≡. To prove (c),
we have, by hypothesis, n + m' = n' + m and n' + m" = n" + m'; we
wish to conclude that n + m" = n" + m. We add the first two equations
together to obtain

n + m' + n' + m" = n' + m + n" + m';

then by commutativity and cancellation of m' + n' from both sides we
obtain the desired result.
Now define for any n, m, p, q ∈ P

(4) (n, m) ⊕ (p, q) = (n + p, m + q),

(5) (n, m) ⊙ (p, q) = (np + mq, nq + mp),

(6) (n, m) < (p, q) if and only if n + q < m + p.

Then we claim that for any n', m', p', q' ∈ P

(7) if (n, m) ≡ (n', m') and (p, q) ≡ (p', q'), then
(a) (n, m) ⊕ (p, q) ≡ (n', m') ⊕ (p', q'),
(b) (n, m) ⊙ (p, q) ≡ (n', m') ⊙ (p', q'),
and
(c) (n, m) < (p, q) if and only if (n', m') < (p', q').

For, under the hypotheses, n + m' = n' + m and p + q' = p' + q. We
wish to show, for (a), that

(n + p) + (m' + q') = (n' + p') + (m + q).

This is obtained directly by adding the two equations together. We leave
the verification of 7(b) to the reader. To prove (c) we have by hypothesis
(in one direction) n + q < m + p and n + m' = n' + m and p + q' = p' + q.
Then by 3.37(i), n + q + m' < m + p + m', hence n' + m + q <
m + p + m', and then n' + q < p + m'; but then also n' + q + q' <
p + m' + q', hence n' + q + q' < p' + q + m', so that

n' + q' < p' + m',

which is the desired result. Conditions 7(a)-(c) express, in the terminology
of (2:3-36), that ≡ is a congruence relation with respect to the operations ⊕, ⊙
and the relation <.
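
Claim (7) can be tested, though of course not proved, by exhaustive search over small pairs. The Python sketch below restates (4)-(6) as functions (with names of our choosing) and checks (7)(a)-(c) for all pairs with entries up to a given bound.

```python
from itertools import product

def equiv(a, b):
    return a[0] + b[1] == b[0] + a[1]

def pair_add(a, b):                      # definition (4)
    return (a[0] + b[0], a[1] + b[1])

def pair_mul(a, b):                      # definition (5)
    return (a[0] * b[0] + a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def pair_less(a, b):                     # definition (6)
    return a[0] + b[1] < a[1] + b[0]

def check_congruence(bound=4):
    """Check (7)(a)-(c) for all pairs with entries in 1, ..., bound."""
    pairs = list(product(range(1, bound + 1), repeat=2))
    for a, a2, b, b2 in product(pairs, repeat=4):
        if equiv(a, a2) and equiv(b, b2):
            if not equiv(pair_add(a, b), pair_add(a2, b2)):     # (7)(a)
                return False
            if not equiv(pair_mul(a, b), pair_mul(a2, b2)):     # (7)(b)
                return False
            if pair_less(a, b) != pair_less(a2, b2):            # (7)(c)
                return False
    return True

print(check_congruence())
```

A successful run over a finite range is only a sanity check on the definitions; the general statement rests on the argument given in the proof.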
Now let

(8) D = {X : for some n, m ∈ P, X = W_(n,m)}.

In other words, D is the collection of all equivalence sets of W. Thus D
is a subset of the set of all subsets of P × P. By (2:3-29) for each X,
Y ∈ D either X = Y or X ∩ Y = ∅. Further, if X = W_(n,m) and
Y = W_(n',m') we have X = Y if and only if (n, m) ≡ (n', m'). Following
our earlier practice, we will now find it more convenient to write [(n, m)]
for W_(n,m). Furthermore, for typographical simplicity, we shall also write
[n, m] instead of [(n, m)], where there is no ambiguity. As we have seen
in (2:3-37), it follows from (7)(a)-(c) that

(9) there are operations +, · defined on D and a relation < between
elements of D such that for any n, m, p, q ∈ P,
(a) [n, m] + [p, q] = [(n, m) ⊕ (p, q)],
(b) [n, m]·[p, q] = [(n, m) ⊙ (p, q)],
(c) [n, m] < [p, q] if and only if (n, m) < (p, q).

To complete the construction of our domain, we need only determine a
0 and 1. These are provided by

(10) (a) 0 = [1, 1] and (b) 1 = [2, 1].

However, it is seen from the definition of ≡ that for any n ∈ P

(11) (a) (n, n) ≡ (1, 1) and (b) (n + 1, n) ≡ (2, 1).

Hence also

(12) (a) 0 = [n, n] and (b) 1 = [n + 1, n].

We shall now show that

(13) (D, +, ·, <, 0, 1) is a system satisfying the conditions of our
theorem.

In order to do this we must show that all the conditions 4.1(i)-(vi) for a
commutative ring with unity are met, that the additional conditions 4.13
(or 4.14) and 4.15 for an ordered integral domain are met, and that the
new condition stated in our theorem is satisfied. We shall content ourselves
here with a sampling of the proofs involved.
Consider first the commutative law for +,

(14) for any X, Y ∈ D, X + Y = Y + X.

Let X = [n, m], Y = [p, q] for some n, m, p, q ∈ P. Then X + Y =
[(n, m) ⊕ (p, q)] and Y + X = [(p, q) ⊕ (n, m)] by (9a). Thus it
suffices to prove that (n, m) ⊕ (p, q) ≡ (p, q) ⊕ (n, m), i.e., that
(n + p, m + q) ≡ (p + n, q + m). In fact, we have equality in the latter,
by commutativity in the positive integers.

Consider next the statement that 1 is an identity element,

(15) for any X ∈ D, X·1 = X.

Let X = [n, m]. Then X·1 = [(n, m) ⊙ (2, 1)] by (9b) and (10b). Hence
X·1 = [2n + m, 2m + n]. Thus it suffices to prove that

(2n + m, 2m + n) ≡ (n, m),

i.e., that

(2n + m) + m = (2m + n) + n;

this last is clearly true.
To prove the possibility of subtraction, we must show that

(16) for any X, Y ∈ D there exists a U ∈ D such that X + U = Y.

Assuming that the earlier conditions for a commutative ring with unity
have been established, it suffices to find V ∈ D such that X + V = 0;
for then we can take U = V + Y. Let X = [n, m]; this corresponds to
n - m. Clearly its negative should correspond to m - n. Thus we take
V = [m, n]. Now

X + V = [n, m] + [m, n] = [(n, m) ⊕ (m, n)]   by (9a)
      = [n + m, m + n]                         by (4)
      = 0                                      by (12a).

To obtain an integral domain we show that

(17) for any X, Y ∈ D, if X·Y = 0, then X = 0 or Y = 0.

Indeed, suppose that X = [n, m], Y = [p, q]. Then

X·Y = [(n, m) ⊙ (p, q)] = [np + mq, mp + nq].

If X·Y = 0, then (np + mq, mp + nq) ≡ (1, 1), hence np + mq =
mp + nq. Suppose that X ≠ 0, i.e., that n ≠ m. Either n > m or
m > n. In the first case, n = m + r for some r ∈ P. Hence
mp + rp + mq = mp + mq + rq by distributivity in P. Then by
cancellation for + in P, rp = rq; hence by cancellation for · in P, p = q.
Thus Y = 0. The argument is similar if m > n.
Let us now consider the properties 4.15, to see that we have an ordered
integral domain. For example, we have

(18) for any X, Y ∈ D, if X ≠ Y, then X < Y or Y < X.

For let X = [n, m], Y = [p, q] and suppose that X ≠ Y. Then (n, m) ≢
(p, q), that is, n + q ≠ m + p. Hence n + q < m + p or m + p < n + q.
In the first case, (n, m) < (p, q) and in the second case (p, q) < (n, m)
by (6). Hence [n, m] < [p, q] or [p, q] < [n, m] by (9c). As another
example, we have

(19) for any X, Y, Z ∈ D, if X < Y, then X + Z < Y + Z.

For let X = [n, m], Y = [p, q], Z = [r, s]. By hypothesis, (n, m) < (p, q),
that is, n + q < p + m. We wish to show that (n + r, m + s) <
(p + r, q + s), that is, n + r + q + s < m + s + p + r. This follows
by adding r + s to both sides of the given inequality.
Finally, we shall prove the special property mentioned in our theorem.
To do this we must first compute n1 and -n1 for every n ∈ P. By
definition n1 = Σ_{k=1}^{n} 1. It is easily seen by induction on n that

(20) n1 = [n + 1, 1].

Further it follows from the proof of (16) that -n1 = [1, n + 1]. We wish
to show that

(21) for any X ∈ D either there is an n ∈ P with X = n1 or X = 0
or there is an n ∈ P with X = -n1.

Let X = [p, q]. If p > q, then p = q + r for r ∈ P; in this case
(p, q) ≡ (r + 1, 1) and X = r1. If p = q, then X = 0 by (12a). If
p < q, q = p + r for r ∈ P; in this case (p, q) ≡ (1, r + 1) and X = -r1.
It is now evident how our main proposition (13) can be derived in a
generally straightforward way from the statements (1)-(12), thus
establishing our theorem. We shall leave the verification of other instances as
exercises for the student. In each instance the verification is reduced to a
question as to whether certain related properties hold in P. If the
properties appealed to were noted down it would be seen that practically all
the results obtained in Chapter 3 concerning +, ·, and < have been used
in proving this theorem.
We can now obtain from 4.21 the existence of a system of the desired
kind which actually contains P.

4.22 Theorem. There exists an ordered integral domain (I, +, ·, <, 0, 1)
which
(i) contains (P, +, ·, <, 1) as a subsystem, and satisfies
(ii) for any x ∈ I either x ∈ P or x = 0 or -x ∈ P.

Proof. Let (D, +, ·, <, 0, 1) be a system satisfying the conditions of
4.21. Let P̄ = {n1 : n ∈ P}. By 4.18

(P, +, ·, <, 1) ≅ (P̄, +, ·, <, 1)

with the isomorphism being established by the function G(n) = n1 for
each n ∈ P. The system (P̄, +, ·, <, 1) is a subsystem of (D, +, ·, <, 1).
Diagrammatically, we wish to complete the following figure.

Figure 4.1

That is, we have exactly the situation of (2:4-9), as illustrated in Figs. 2.22
and 2.23. According to the result there, we can choose a set I containing P
and define operations +, · on I and a relation < on I such that
(I, +, ·, <, 1) contains (P, +, ·, <, 1) as a subsystem, thus satisfying (i).
Further we can extend G to an isomorphism H of this new system onto the
system (D, +, ·, <, 1). If we take 0 to be the unique element z in I with
H(z) = 0, H also establishes that

(I, +, ·, <, 0, 1) ≅ (D, +, ·, <, 0, 1).

Since isomorphic systems have the same algebraic properties, it follows
that our new system (I, ...) must be an ordered integral domain.
Consider now any x ∈ I. Then H(x) + H(-x) = H(x + (-x)) = H(0) = 0;
hence H(-x) = -H(x). Now H(x) ∈ P̄ or H(x) = 0 or -H(x) ∈ P̄ by
assumption on D. In the first case H(x) = G(x) and x ∈ P, since H
extends G. In the second case x = 0. In the final case H(-x) ∈ P̄ and
again -x ∈ P. This completes the proof of our theorem.

Uniqueness of the characterization. We shall now be justified in fixing
the integers to be any one of the systems of 4.22 if we show that any two
such systems are isomorphic. In fact, we shall do a bit more, by showing
that any ordered integral domain D contains an ordered subdomain Ī
isomorphic to the system I.

4.23 Theorem. Let (I, +, ·, <, 0, 1) be a system satisfying the conditions
4.22(i), (ii). Let (D, +, ·, <, 0, 1) be any ordered integral domain.
Define nu, for n ∈ I and u ∈ D, according to 4.10(i) if n ∈ P, otherwise
to be 0 if n = 0 or −((−n)u) if −n ∈ P. Let G be the function
with 𝒟(G) = I, G(n) = n1 for each n ∈ I, and let Ī = ℛ(G).
Then the following hold:

(i) Ī forms an ordered subdomain of D and G establishes an
isomorphism

(I, +, ·, <, 0, 1) ≅ (Ī, +, ·, <, 0, 1).

(ii) If D also satisfies the condition of 4.21, then D = Ī.

Proof. It is also more convenient for this proof to deal with differences
of positive integer multiples instead of just elements and their negatives.
First note that

(1) if x ∈ I there exist n, m ∈ P with x = n − m.

For by 4.22(ii), either x ∈ P or x = 0 or (−x) ∈ P. In the first case
x = (x + 1) − 1, in the second x = 1 − 1, and in the last case
x = 1 − (1 + (−x)). The converse to (1) is, of course, trivial by the
assumption of 4.21. Next note that

(2) if n, m ∈ P then G(n − m) = n1 − m1.

For by the trichotomy law in P, either there is k ∈ P with n = m + k
or n = m or there is ℓ ∈ P with m = n + ℓ. In the first case, by 4.11(ii),

G(n − m) = G(k) = k1 = (m + k)1 − m1.

In the second case

G(n − m) = G(0) = 0 = n1 − n1.

In the final case

G(n − m) = G(−ℓ) = −(ℓ1),

since −(−ℓ) = ℓ; but also −(ℓ1) = n1 − (n + ℓ)1, again by 4.11(ii).


Parts (1) and (2) now allow us to reduce the proof of our theorem to
the isomorphism result 4.18 for positive integer multiples.

(3) G is one-to-one.

For suppose that x, y ∈ I and G(x) = G(y). By (1) we can find n, m, p,
q ∈ P with x = n − m and y = p − q. Then by (2), n1 − m1 =
p1 − q1; hence n1 + q1 = m1 + p1 and (n + q)1 = (m + p)1 by 4.18
[or 4.11(ii)]. But then n + q = m + p by 4.18 and hence x = n − m =
p − q = y. We show next that

(4) for any x, y ∈ I,

G(x + y) = G(x) + G(y) and G(x · y) = G(x) · G(y).



Writing again x = n − m, y = p − q with positive integers n, m, p, q,
we have x + y = (n + p) − (m + q); hence

G(x + y) = (n + p)1 − (m + q)1
         = (n1 + p1) − (m1 + q1)
         = (n1 − m1) + (p1 − q1)
         = G(x) + G(y).

The proof of the second part makes similar use of the algebra of integral
domains. Since G(0) = 0 and G(1) = 1, we need only show

(5) for any x, y ∈ I, x < y if and only if G(x) < G(y),

in order to complete the proof of the first part of our theorem. We have
n − m < p − q if and only if n + q < m + p, since we have an ordered
integral domain. This condition is in turn equivalent to n1 + q1 <
m1 + p1 by 4.18, and hence to n1 − m1 < p1 − q1, again by the
results on ordering.
To prove (ii), suppose that for any u ∈ D, either u = n1 with n ∈ P,
or u = 0 or u = −(n1) with n ∈ P. Then u = G(k) for k ∈ I, with
k = n, k = 0, or k = −n, respectively. Hence D ⊆ ℛ(G) = Ī, the range
of G, and therefore D = Ī.

It follows from this theorem that 4.11 can be extended to hold for
arbitrary integer multiples of elements of any ordered integral domain.
In fact it can be seen, by additional argument, that if (D, +, ·, 0, 1) is
any commutative ring with unity then 4.11(i)-(vii) remains true for any
x, y ∈ D and m, n ∈ I. However, in the following text, we have no further
need for the more general multiples.
The results 4.22 and 4.23 now permit us to adopt the following
convention.

4.24 Convention. We assume throughout the remainder of this book that
(I, +, ·, <, 0, 1) is a fixed ordered integral domain satisfying the
conditions 4.22(i), (ii). We shall call I the set of integers.

Exercise Group 4.3

1. Using the notations of the proof of 4.21, show that for any n, m, p, q, p′,
q′ ∈ P, if (p, q) ≡ (p′, q′), then (n, m) ∘ (p, q) ≡ (n, m) ∘ (p′, q′). Use
this result to establish (7b) of the proof.
2. Using the notations of the proof of 4.21 show the following for any
X, Y, Z ∈ D.
(a) X + 0 = X
(b) X·(Y·Z) = (X·Y)·Z
(c) X·(Y + Z) = X·Y + X·Z
(d) if X < Y and 0 < Z, then X·Z < Y·Z.
3. Prove that for any x ∈ I we have x ∈ P if and only if 0 < x.
4. Prove that for any x, y ∈ I we have y < x if and only if y + 1 ≤ x.

4.4 The integers as an indexing system. Before turning to the
mathematical features of the system of integers which expose its structure more
deeply, especially those relating to questions of divisibility, we wish to
indicate its usefulness in expanding the treatment of sequences, sums, and
products in arbitrary commutative rings. This is made possible by the
following result, which permits us to use segments of I other than P as
the basis for an indexing system.

4.25 Theorem. Let a ∈ I and let P_a = {x : x ∈ I and x ≥ a}. Then
(P_a, <) ≅ (P, <) and (P_a, <) is a well-ordered system in which the
first element is a, the successor of any element x is x + 1, and for each
x > a, the predecessor of x is x − 1.

Proof. We can define a function F on P_a by

(1) F(x) = (x − a) + 1

for each x ≥ a. Then

(2) 𝒟(F) = P_a, ℛ(F) = P.

For if x ≥ a, then x − a ≥ 0 and F(x) > 0, so F(x) ∈ P. Conversely,
if y ∈ P, y > 0 by Exercise 3 above. Let x = y + a − 1; then
x > a − 1, hence x ≥ a by Exercise 4 above, i.e., x ∈ P_a. Clearly
y = F(x).

(3) F is one-to-one.

For if F(x) = F(y), that is, x + (1 − a) = y + (1 − a), we obtain
x = y by cancellation. Finally, for x, y ∈ P_a,

(4) x < y if and only if F(x) < F(y).

This follows immediately from Exercise 1(a) of Exercise Group 4.2. Thus
F establishes the following isomorphism:

(5) (P_a, <) ≅ (P, <).

Hence P_a must be well ordered, since P is. Further, F(a) = 1, so a is the
first element in the ordering of P_a. From F(x + 1) = F(x) + 1, we see
by the isomorphism that x + 1 must be the successor of x in P_a; on the
other hand, if x > a, then x − 1 ≥ a, and F(x − 1) + 1 = F(x), so
that x − 1 must be the predecessor of x.

Thus inductive proofs and recursive definitions can be given equally
well for any of the systems P_a. When a = 1 we have P_1 = P. Another
system that we shall often use is P_0, although we shall occasionally use
others such as P_2, etc. Note that the isomorphism given in 4.25 does not
necessarily extend to other relations or operations. For example, though 1
corresponds to 0 in the isomorphism from P_0 to P, we have 0 + 0 = 0
and 1 + 1 ≠ 1. Of course, an operation +′ can be defined on P_0 which
would be isomorphic to the operation of + on P, but this would not be of
any particular use.
Let S be any set. We expand the notion of an infinite sequence in S to
take in also functions with domain P_0. Such a sequence is written

(4:4-1) ⟨x_0, . . . , x_k, . . .⟩.

For any n > 0 we can construct from this the n-termed sequence

(4:4-2) ⟨x_0, . . . , x_{n−1}⟩

consisting of the first n terms. We can also include the case n = 0 by
introducing the notion of an empty sequence; this can be thought of as the
function F with 𝒟(F) = 0, hence ℛ(F) = 0 (in fact, F = 0).
For simplicity, we assume (where appropriate) throughout the following
that

(4:4-3) (D, +, •, 0, 1) is a commutative ring with unity.

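4.26 Definition. Let ⟨x_0, . . . , x_k, . . .⟩ be an infinite sequence of elements
of D. Then Σ_{k=m}^{n} x_k and Π_{k=m}^{n} x_k are defined for any n, m ∈ I,
where m ≥ 0, by the conditions:

(i) Σ_{k=m}^{n} x_k = 0 and Π_{k=m}^{n} x_k = 1 if n < m;

and

(ii) Σ_{k=m}^{n} x_k = (Σ_{k=m}^{n−1} x_k) + x_n and Π_{k=m}^{n} x_k = (Π_{k=m}^{n−1} x_k) · x_n
if m ≤ n.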

The condition (i) can be viewed as defining the sum and product of the
empty sequence. The choice of values 0 and 1 in (i) is arbitrary, so far
as producing a well-defined notion is concerned, but is not arbitrary if we
wish these special sums and products to share the general properties of
usual sums and products. We have, for example,

Σ_{k=1}^{1} x_k = (Σ_{k=1}^{0} x_k) + x_1 = 0 + x_1 = x_1,

and similarly

Π_{k=1}^{1} x_k = 1 · x_1 = x_1,

so that the conditions of 4.26 accord with 3.42(i) and 3.43(i) in this case.
The condition (ii) of the above definition is justified by recursive definition
on P_m for each m ∈ I, m ≥ 0. Further, inductive proof on P_m easily
serves to establish the following.

4.27 Theorem. Let ⟨x_0, . . . , x_k, . . .⟩ be an infinite sequence of elements of
D, and let n, m, q ∈ I, m ≥ 0, q ≥ 0. Then we have

(i) Σ_{k=1}^{n} x_k and Π_{k=1}^{n} x_k, as defined in 4.26,
accord with the values given by 3.42 and 3.43;

(ii) Σ_{k=m}^{n} x_k = Σ_{k=q}^{n−m+q} x_{k+m−q};

(iii) z · Σ_{k=m}^{n} x_k = Σ_{k=m}^{n} (z · x_k) for any z ∈ D.

4.27(iii) follows immediately from (ii) for q = 1 and the general
distributive law 3.46. The condition 4.27(ii) allows us to choose the initial
value of k at our convenience.
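The conventions for empty sums and products, and the change of index in 4.27(ii), can be tried out in a short Python sketch. It is only an illustration under stated assumptions: the ring is taken to be the ordinary integers, a sequence is modelled as a function of its index, the recursion is read as "the sum up to n is the sum up to n − 1 plus x_n whenever m ≤ n", and the names `ring_sum`, `ring_prod`, `shifted` and the sample sequence `x` are hypothetical.

```python
def ring_sum(x, m, n):
    """Sum of x(k) for k = m, ..., n; the empty sum (n < m) is 0."""
    if n < m:
        return 0
    return ring_sum(x, m, n - 1) + x(n)

def ring_prod(x, m, n):
    """Product of x(k) for k = m, ..., n; the empty product (n < m) is 1."""
    if n < m:
        return 1
    return ring_prod(x, m, n - 1) * x(n)

def shifted(x, m, q):
    """The reindexed sequence k -> x(k + m - q) used in 4.27(ii)."""
    return lambda k: x(k + m - q)

def x(k):
    """Hypothetical sample sequence x_k = k^2 + 1."""
    return k * k + 1
```

With these definitions, `ring_sum(x, 2, 5)` and `ring_sum(shifted(x, 2, 0), 0, 3)` run over the same terms x_2, ..., x_5, which is the content of 4.27(ii) for m = 2, n = 5, q = 0.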

More general associative and commutative laws. We shall now formulate
general associative and commutative laws. In the first of these we consider
any nonnegative integers n_1, n_2, . . . , n_t and set n = n_1 + n_2 + · · · + n_t.
Given x_1, . . . , x_n, we can group these as:

x_1, . . . , x_{n_1}; x_{n_1+1}, . . . , x_{n_1+n_2}; x_{n_1+n_2+1}, . . . , x_{n_1+n_2+n_3}; . . .

Here, if n_i = 0, the corresponding subsequence is regarded as being empty.



4.28 Theorem. Let ⟨x_1, . . . , x_k, . . .⟩ be an infinite sequence of elements
of D. Let ⟨n_1, . . . , n_t⟩ be a sequence of nonnegative integers. For each
i = 1, . . . , t, let m_i = Σ_{j=1}^{i−1} n_j, and let n = m_t + n_t. Then

Σ_{k=1}^{n} x_k = Σ_{i=1}^{t} (Σ_{k=1}^{n_i} x_{m_i+k}).

The same holds if we replace Σ by Π (but the Σ and + occurring in the
indices remain unchanged).
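As an informal numerical check of this formulation, the following Python sketch evaluates the grouped sum for one choice of block sizes. It assumes integer terms, takes m_i to be the sum of the block sizes before the i-th block, and uses the hypothetical name `grouped_sum` and hypothetical sample data.

```python
def grouped_sum(x, sizes):
    """Grouped sum over blocks of the given sizes; x[0] is unused."""
    total = 0
    m_i = 0                      # m_i = n_1 + ... + n_(i-1)
    for n_i in sizes:
        # inner sum of x_(m_i + k) for k = 1, ..., n_i (empty if n_i = 0)
        total += sum(x[m_i + k] for k in range(1, n_i + 1))
        m_i += n_i
    return total

x = [None, 4, 8, 15, 16, 23, 42]   # hypothetical terms x_1, ..., x_6
sizes = [2, 0, 3, 1]               # n_1, ..., n_t with n = 6
```

Here `grouped_sum(x, sizes)` agrees with the ungrouped sum `sum(x[1:7])`, and the block of size 0 contributes an empty inner sum.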

The reader should convince himself that this is the precise formulation of
the desired result. The proof is left to him. For a formulation of a general
commutative law we need the notion of a change in the order of factors.
This is provided by the following.

4.29 Definition. A function F is said to be a permutation of a set S if it
is one-to-one and 𝒟(F) = ℛ(F) = S.

Thus the sequence ⟨x_3, x_1, x_2, x_4⟩ is obtained by a permutation F from the
sequence ⟨x_1, x_2, x_3, x_4⟩ where F(1) = 3, F(2) = 1, F(3) = 2, F(4) = 4.

4.30 Theorem. Let ⟨x_1, . . . , x_k, . . .⟩ be an infinite sequence of elements
of D. Then for any n ∈ P and any permutation F of {1, . . . , n} we
have

Σ_{k=1}^{n} x_k = Σ_{k=1}^{n} x_{F(k)}.

The same holds if we replace Σ by Π.

Proof. The proof is by induction on n ∈ P. For n = 1 it is trivial.
Suppose that it is true for n. Let G be a permutation of {1, . . . , n + 1}.
Then G(m) = n + 1 for a unique m, 1 ≤ m ≤ n + 1. Then

(1) Σ_{k=1}^{n+1} x_{G(k)} = Σ_{k=1}^{m−1} x_{G(k)} + x_{n+1} + Σ_{k=m+1}^{n+1} x_{G(k)}   by 4.28

    = Σ_{k=1}^{m−1} x_{G(k)} + Σ_{k=m}^{n} x_{G(k+1)} + x_{n+1}   by 4.27(ii).

(Note that one or the other of the sums on the right-hand side of the
equation might be empty.) To reduce this to the inductive hypothesis,
we wish to write the sum of the first two terms as Σ_{k=1}^{n} x_{F(k)} for suitable F.
Define

(2) F(k) = G(k) if 1 ≤ k < m,
    F(k) = G(k + 1) if m ≤ k ≤ n.

Since all values of G(k) for k ≠ m are ≤ n, we have

(3) 1 ≤ F(k) ≤ n for all k ≤ n.

We claim that

(4) F is a permutation of {1, . . . , n}.

By (2), (3) we need only check that F is one-to-one. Suppose that
F(k_1) = F(k_2). If both k_1, k_2 are < m or both are ≥ m, it follows from (2)
and the fact that G is a permutation that k_1 = k_2. If, say, k_1 < m ≤ k_2,
we have G(k_1) = G(k_2 + 1), hence k_1 = k_2 + 1, which contradicts our
assumption. Thus neither this case nor, by symmetry, the case
k_2 < m ≤ k_1 can occur. We have from (1) and (2) that

(5) Σ_{k=1}^{n+1} x_{G(k)} = Σ_{k=1}^{m−1} x_{F(k)} + Σ_{k=m}^{n} x_{F(k)} + x_{n+1}

    = Σ_{k=1}^{n} x_{F(k)} + x_{n+1}

    = Σ_{k=1}^{n} x_k + x_{n+1}   by (4) and inductive hypothesis

    = Σ_{k=1}^{n+1} x_k.

This completes the inductive step and hence the proof of the theorem.
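The passage from G to F in this proof can be carried out mechanically. The sketch below is an illustration only: permutations are modelled as Python dictionaries on {1, ..., n + 1}, the ring elements are integers, and the names `reduce_permutation`, `is_permutation` and `check_inductive_step` are hypothetical.

```python
from itertools import permutations

def reduce_permutation(G):
    """From a permutation G of {1..n+1}, return (m, F) as in step (2)."""
    n = len(G) - 1
    m = next(k for k in G if G[k] == n + 1)      # the unique m with G(m) = n + 1
    F = {k: (G[k] if k < m else G[k + 1]) for k in range(1, n + 1)}
    return m, F

def is_permutation(F, n):
    """True when F maps {1..n} one-to-one onto {1..n}."""
    return sorted(F) == sorted(F.values()) == list(range(1, n + 1))

def check_inductive_step(x):
    """x maps 1..n+1 to integers; test every permutation G of {1..n+1}."""
    size = len(x)
    for values in permutations(range(1, size + 1)):
        G = dict(zip(range(1, size + 1), values))
        m, F = reduce_permutation(G)
        lhs = sum(x[G[k]] for k in G)
        rhs = sum(x[F[k]] for k in F) + x[size]
        if not (is_permutation(F, size - 1) and lhs == rhs):
            return False
    return True
```

For example, the permutation G with G(1) = 3, G(2) = 1, G(3) = 4, G(4) = 2 has m = 3 and reduces to F with F(1) = 3, F(2) = 1, F(3) = 2.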

We now extend our definition 4.10 of x^n to include n = 0 as follows:

4.31 Definition. Let x ∈ D, n ∈ I, n ≥ 0. We set

x^n = Π_{k=1}^{n} x.

4.32 Theorem. Suppose that x, y ∈ D. All of 4.12(i)-(vi) continue to hold
for any n, m ∈ I with n ≥ 0, m ≥ 0, except 4.12(iv) (0^n = 0 only
for n > 0). In addition x^0 = 1 for all x ∈ D.

The proofs of these results are straightforward from 4.12 by consideration
of the special cases n = 0, m = 0, or can be obtained from the general
properties 4.26-4.28 of Π.

Geometric series; binomial expansion. We shall conclude this section with
the use of the extended notions of sum, product, and exponent in computing
certain sums from arbitrary commutative rings with unity
(D, +, ·, 0, 1). These are the geometric series and the binomial expansion.
The first of these is trivial.

4.33 Theorem. For any x ∈ D and any n ∈ I with n ≥ 0,
(x − 1) · Σ_{k=0}^{n} x^k = (x^{n+1} − 1).
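Since the identity is asserted for every commutative ring with unity, it can be tested in a ring that is not an integral domain. The following Python sketch assumes the ring of integers modulo a chosen modulus as a sample D; the function name is hypothetical.

```python
def geometric_identity_holds(x, n, modulus):
    """Check (x - 1) * (x^0 + ... + x^n) == x^(n+1) - 1 modulo `modulus`."""
    total = sum(pow(x, k, modulus) for k in range(0, n + 1)) % modulus
    lhs = ((x - 1) * total) % modulus
    rhs = (pow(x, n + 1, modulus) - 1) % modulus
    return lhs == rhs
```

Running the check for all x modulo 12 and small n exercises the case n = 0 (where the sum is just x^0 = 1) as well as elements such as 2, 3, 4, 6 that are zero divisors modulo 12.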

From the familiar expansions,

(4:4-4) (x + y)^2 = x^2 + 2xy + y^2,

(4:4-5) (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3,

(4:4-6) (x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4,

etc., we expect that the general binomial expansion should take the form

(4:4-7) (x + y)^n = Σ_{k=0}^{n} c_k^{(n)} x^{n−k} y^k

for certain c_k^{(n)}. How are these coefficients to be calculated? If we write

(4:4-8) (x + y)^n = (x + y) · (x + y) · . . . · (x + y)   (n times),

and consider each factor as being numbered, say first factor, second factor,
etc., we see that the coefficient of x^{n−k}y^k is the same as the number of
ways we can make distinct choices of k factors. In other words, c_k^{(n)} is
the number of distinct subsets containing exactly k elements chosen from
a set consisting of n elements. Clearly, under this description,

(4:4-9) c_0^{(n)} = 1, c_n^{(n)} = 1, c_k^{(n)} = c_{n−k}^{(n)}.

Further, we have the following recursive relationship:

(4:4-10) c_k^{(n+1)} = c_k^{(n)} + c_{k−1}^{(n)}, for 1 ≤ k ≤ n.

For consider a set {a_1, . . . , a_n, a_{n+1}} with n + 1 distinct elements.
Consider a subset X of this set, where X has exactly k elements. Either
a_{n+1} ∈ X or not. The number of subsets X for which a_{n+1} ∉ X is c_k^{(n)}.
The number of subsets X for which a_{n+1} ∈ X is the same as the number
of (k − 1)-element subsets Y = X − {a_{n+1}} of {a_1, . . . , a_n}, hence is
c_{k−1}^{(n)}. (4:4-10) corresponds to the familiar Pascal triangle

                1   1
              1   2   1
(4:4-11)    1   3   3   1
          1   4   6   4   1

where each number in each row, other than the first and last 1, is obtained
as the sum of the two closest numbers in the row directly above it. Thus
(4:4-9) and (4:4-10), as reflected in the Pascal scheme (4:4-11), provide
a simple recursive calculation procedure by which we can obtain any c_k^{(n)}.
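The recursive procedure just described can be sketched in a few lines of Python. This is an illustration only: the function name `pascal_row` is hypothetical, and the sketch assumes the starting row for n = 0 consists of the single entry 1.

```python
def pascal_row(n):
    """Return the row of coefficients for index n, built row by row."""
    row = [1]                                   # row for n = 0
    for _ in range(n):
        # end entries are 1; each inner entry is the sum of the two
        # closest entries in the previous row
        inner = [row[k] + row[k - 1] for k in range(1, len(row))]
        row = [1] + inner + [1]
    return row
```

For instance, `pascal_row(4)` reproduces the last row shown in the triangle, 1 4 6 4 1.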
There is another approach to the calculation of the c_k^{(n)} which involves
counting permutations. The number of permutations of a set consisting
of n distinct elements is the same as the number of permutations of
{1, . . . , n}. Every function on this set is alternatively described by 3.40,
as a sequence ⟨b_1, . . . , b_n⟩. The condition that such a sequence then be a
permutation can simply be written as {b_1, . . . , b_n} = {1, . . . , n}. The
number of such sequences is determined as follows: b_1 can be chosen in n
different ways; once b_1 is fixed, b_2 can be chosen in n − 1 different ways,
. . . ; once b_1, b_2, . . . , b_{n−2} have been chosen, b_{n−1} can be chosen in two
ways, and b_n is then completely determined. Thus the number of distinct
permutations ⟨b_1, . . . , b_n⟩ is n · (n − 1) · . . . · 2 · 1, i.e., is simply n!.
Now let us look at the number c_k^{(n)} of k-element subsets of the set
{1, . . . , n}. With each such subset X there are associated certain
permutations. We can permute the elements of X in k! ways and we can permute
the elements of the complement X̄ = {1, . . . , n} − X in (n − k)! ways. Each
of the first is a sequence ⟨b_1, . . . , b_k⟩ with range {b_1, . . . , b_k} = X;
each of the second is a sequence ⟨b′_1, . . . , b′_{n−k}⟩ with range X̄. Together
these determine a permutation of {1, . . . , n}, namely ⟨b_1, . . . , b_n⟩ where
b_{k+i} = b′_i for 1 ≤ i ≤ n − k.
The number of permutations of {1, . . . , n} determined in this way by a
given X is k!(n − k)!, since distinct permutations of X or of X̄ lead to
distinct such permutations. Further, if X_1 and X_2 are distinct k-element
subsets and ⟨b_1^{(1)}, . . . , b_n^{(1)}⟩, ⟨b_1^{(2)}, . . . , b_n^{(2)}⟩ are permutations associated in
this way with X_1 and X_2, then they must also be distinct; for otherwise

X_1 = {b_1^{(1)}, . . . , b_k^{(1)}} = {b_1^{(2)}, . . . , b_k^{(2)}} = X_2.

Hence the total number of permutations of {1, . . . , n} thus associated
with k-element subsets is k!(n − k)! c_k^{(n)}. But every permutation
⟨b_1, . . . , b_n⟩ can be obtained in this way; simply take X = {b_1, . . . , b_k}.
Hence we see that

k!(n − k)! c_k^{(n)} = n!,

or, as we would usually write it,

(4:4-12) c_k^{(n)} = n! / (k!(n − k)!).
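The counting argument can be replayed by brute force for small n. The Python sketch below is an informal check, not part of the argument: it lists all permutations of {1, ..., n}, groups them by the set X of their first k entries, and confirms that each X is associated with k!(n − k)! permutations. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def permutations_by_initial_set(n, k):
    """Count permutations <b_1, ..., b_n> of {1..n} by X = {b_1, ..., b_k}."""
    return Counter(frozenset(b[:k]) for b in permutations(range(1, n + 1)))

def subset_count(n, k):
    """Number of k-element subsets X that occur as an initial set."""
    counts = permutations_by_initial_set(n, k)
    # each subset X is associated with exactly k!(n-k)! permutations
    assert all(c == factorial(k) * factorial(n - k) for c in counts.values())
    return len(counts)
```

For n = 5 and k = 2 the 120 permutations fall into 10 groups of 12, in agreement with (4:4-12).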

These informal arguments are the basis of the following definition and
theorems. As is customary, we will now use the symbol \binom{n}{k} for c_k^{(n)}.

4.34 Definition. Let n, k ∈ I, with 0 ≤ n in (i), (ii) below, and 1 ≤ k ≤ n
in (iii). We define

(i) n! = Π_{i=1}^{n} i;

(ii) \binom{n}{0} = \binom{n}{n} = 1;

(iii) \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k−1}.

4.34(i) extends our previous definition of n!, so that 0! = 1 [4.26(i)].
4.34(ii) and (iii) constitute a recursive definition of \binom{n}{k} for all n, k with
0 ≤ k ≤ n.

4.35 Theorem. Let n, k ∈ I, 0 ≤ k ≤ n. Then k!(n − k)! \binom{n}{k} = n!.

4.36 Theorem. For any x, y ∈ D and n ∈ I, n ≥ 0, we have

(x + y)^n = Σ_{k=0}^{n} \binom{n}{k} x^{n−k} y^k.

The proofs of these two theorems, which proceed by suitable inductions,


are left to the reader.
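Before attempting the inductions, both statements can be checked numerically. The Python sketch below is a hedged illustration: it assumes the coefficients are generated by the Pascal recursion with boundary values 1, as suggested by (4:4-9) and (4:4-10), and it tests the expansion in the ring of integers modulo a chosen modulus. The names `binom` and `binomial_sum` are hypothetical.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def binom(n, k):
    """Binomial coefficient for 0 <= k <= n via the Pascal recursion."""
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k) + binom(n - 1, k - 1)

def binomial_sum(x, y, n, modulus):
    """Right-hand side of the expansion of (x + y)^n, reduced modulo `modulus`."""
    terms = (binom(n, k) * pow(x, n - k, modulus) * pow(y, k, modulus)
             for k in range(n + 1))
    return sum(terms) % modulus
```

A comparison of `binomial_sum(x, y, n, 10)` with `pow(x + y, n, 10)` over all residues x, y then tests the expansion, and `factorial(k) * factorial(n - k) * binom(n, k) == factorial(n)` tests the factorial formula.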

Exercise Group 4.4

1. Prove Theorem 4.28.


2. Let F be a permutation of a finite set S; we call the associated set of F the
set X = {x : x ∈ S and F(x) ≠ x}. If F is not the identity permutation
then X ≠ 0. If we have X = {a_1, . . . , a_k} where for each i < k,
F(a_i) = a_{i+1} and F(a_k) = a_1, then F is called a (k-)cycle and is denoted
by (a_1, . . . , a_k). A 2-cycle (a_1, a_2) is called a transposition. The identity
permutation is denoted by (a) for any a ∈ S. If F is a cycle (a_1, . . . , a_k)
and G is a cycle (b_1, . . . , b_l), then (a_1, . . . , a_k)(b_1, . . . , b_l) denotes the
permutation F ∘ G. The cycles are called disjoint if {a_1, . . . , a_k} ∩
{b_1, . . . , b_l} = 0. Multiplication of arbitrary cycles is associative.
(Why?) Thus a product (a_1, . . . , a_k)(b_1, . . . , b_l) · · · (z_1, . . . , z_t) of
cycles is unambiguous. In the following, prove the given statements or
your answers to the questions.

(a) Is multiplication of cycles commutative?
(b) Every permutation is a product of disjoint cycles.
(c) Every permutation is a product of transpositions (not necessarily
disjoint).
(d) What can be said about the uniqueness in the representations (b), (c)?

(e) Express as products of disjoint cycles:

(123) (456)(135) (246)


(1234)(4321)
(12)(145)(23)(21).

(f) Use (c) to give another proof of the generalized commutative law 4.30.
3. Prove Theorem 4.35.
4. Prove Theorem 4.36.
5. Using the interpretation of \binom{n}{k} in terms of sets, prove that the number of
subsets of a set with n elements is 2^n.
6. Observe (in integers) that Σ_{k=1}^{n} (a_k − a_{k−1}) = a_n − a_0. Thus

n^2 = Σ_{k=1}^{n} [k^2 − (k − 1)^2] = Σ_{k=1}^{n} (2k − 1) = 2 Σ_{k=1}^{n} k − n.

This gives

2 Σ_{k=1}^{n} k = n^2 + n = n(n + 1),   Σ_{k=1}^{n} k = n(n + 1)/2.

Use this method to find Σ_{k=1}^{n} k^2, Σ_{k=1}^{n} k^3. Frame a general recursive
procedure for finding Σ_{k=1}^{n} k^m where m is any fixed positive integer.
7. Calculate (1 − x) Π_{k=0}^{n} (1 + x^{2^k}) (x ∈ I).

4.5 Mathematical properties of the integers. The preceding section


dealt with the integers as an auxiliary system for discussing algebraic
properties of arbitrary commutative rings with unity. In this section we
take up the algebraic properties of the integers themselves, primarily with
respect to the notion of divisibility. It will be observed in the following
that a number of results essentially concern only the positive integers, or
could be easily rephrased so as to involve only notions concerning P.
The advantage of dealing with them now, rather than earlier, is that we
have, with the use of 0 and −, a great deal more freedom in algebraic
manipulations.

The division algorithm. In an arbitrary ring, subtraction of one element,
x, from another, y, is made possible by the condition that there exists a
(unique) u such that x + u = y. Similarly, it would be possible to divide
x into y or, in more usual terms, to divide y by x if there is an element u
such that x·u = y. Of course, this is not in general the case. For example,
if y ≠ 0 there is no u such that 0·u = y. Even when x ≠ 0, this is
usually not possible. For example, in the integers we cannot divide 7 by 3.
However, in such cases we can describe how close we can come to the
possibility of division. We consider the multiples 3·1, 3·2, 3·3, . . . of
3, i.e., 3, 6, 9, . . . Since 7 is one more than 6 = 3·2, we say that 7 has
a quotient of 2 when divided by 3 and a remainder of 1; 7 = 3·2 + 1.
Similarly, 8 = 3·2 + 2 has a quotient of 2 and a remainder of 2 when
divided by 3. In general, if a, b are any positive integers, we would say
that a has a quotient q and a remainder r when divided by b, where
b ≠ 0, if

(4:5-1) bq ≤ a < b(q + 1), and a = bq + r,

or equivalently if

(4:5-2) a = bq + r, where 0 ≤ r < b.

The study of such “near” division in the integers has many interesting
consequences. Among questions which should be answered about (4:5-1),
(4:5-2) are whether such a representation is always possible for any a
and b ≠ 0, and if so whether the quotient and remainder are uniquely
determined in such a representation. The answers to these questions are
provided by the next theorem.

4.37 Theorem. For any a, b ∈ I with b > 0 there exist q, r ∈ I such that
a = bq + r and 0 ≤ r < b. Further, if q′, r′ ∈ I and a = bq′ + r′,
0 ≤ r′ < b, then q = q′ and r = r′.

Proof. We first show this for a > 0, or equivalently (by Exercise 3 of
Exercise Group 4.3), for a ∈ P. Given b > 0, we prove by induction on
a ∈ P that

(1) there exist q, r ∈ I with a = bq + r and 0 ≤ r < b.

This is true for a = 1. For if b = 1 we have 1 = b·1 + 0; if 1 < b
we have 1 = b·0 + 1. Suppose that it is true for a. Then for some
q, r, a + 1 = bq + r + 1, and 0 < r + 1 ≤ b. If r + 1 < b, this is already
a representation of the required form. If r + 1 = b, we have
a + 1 = bq + b = b(q + 1) + 0, which again gives the representation.
To prove it for a ≤ 0, we have first 0 = b·0 + 0. If a < 0, then −a ∈ P
by 4.22. Then by (1), −a = bq_1 + r_1, where 0 ≤ r_1 < b. If r_1 = 0,
a = b(−q_1) + 0. If r_1 ≠ 0,

a = b(−q_1) − r_1 = b(−q_1 − 1) + (b − r_1).

In this case set q = −q_1 − 1 and r = b − r_1; since r_1 < b < b + r_1
we have 0 < r < b.
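The existence argument is constructive, and its cases translate directly into a procedure. The Python sketch below follows the induction for a ≥ 0 (one unit step at a time, so it is deliberately slow) and the reduction to −a for a < 0. It is an illustration under these assumptions, with the hypothetical name `divide`; it is not offered as an efficient method.

```python
def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    if a >= 0:
        q, r = 0, 0                 # 0 = b*0 + 0
        for _ in range(a):          # inductive step from a to a + 1
            r += 1
            if r == b:              # case r + 1 = b: carry into the quotient
                q, r = q + 1, 0
        return q, r
    q1, r1 = divide(-a, b)          # -a is positive
    if r1 == 0:
        return -q1, 0
    return -q1 - 1, b - r1          # a = b(-q1 - 1) + (b - r1)
```

For example, `divide(7, 3)` gives `(2, 1)` and `divide(-7, 3)` gives `(-3, 2)`, since −7 = 3·(−3) + 2.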
An alternative proof of the first part of this theorem rests on showing
first the following corollary of it:

(2) for any a ∈ I there exists q ∈ I with bq ≤ a < b(q + 1).

This is done by applying the well-ordering principle 3.32 to the set

A = {bx − a : x ∈ I and a < bx}

of positive integers. (One must first show that A ≠ 0.)

To prove uniqueness, suppose that

(3) bq + r = bq′ + r′ where 0 ≤ r < b, 0 ≤ r′ < b.

Suppose that r ≠ r′. Then either r > r′ or r′ > r. In the first case we
write

(4) r − r′ = b(q′ − q).

Since r − r′ > 0 and b > 0 we must also have q′ − q > 0, and hence
q′ − q ≥ 1. It follows that the right-hand side of (4) is ≥ b. On the
other hand, the left-hand side of (4) is ≤ r < b, giving a contradiction.
Similarly, if r′ > r we obtain a contradiction. Thus r = r′ and
b(q′ − q) = 0. But since we have an integral domain and b ≠ 0, we must
have q′ − q = 0, i.e., q = q′.

A more general theorem can be obtained when we change the restriction
b > 0 to the restriction b ≠ 0. This is left as an exercise.
The result 4.37 is often referred to as the division algorithm for the
integers. Indeed, the following is one mechanical procedure which is
suggested by it for computing q, r given a, b (b > 0). Compute the numbers
a − bx for x ∈ I in the following order: a − b·0, a − b·1,
a − b·(−1), a − b·2, a − b·(−2), . . . , and with each computation
compare a − bx to 0 and b. When we arrive at a q such that 0 ≤ a −
bq < b we have found the desired numbers. Moreover, Theorem 4.37
assures us that we will eventually find such a q in a finite number of steps.
(This algorithm can be simplified once we know which of the cases b ≤ a,
−b < a < b, a ≤ −b is the one that a satisfies.)
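The search order 0, 1, −1, 2, −2, . . . can be written out as a short Python sketch. The names `candidates` and `divide_by_search` are hypothetical, and the returned step count is included only to show that the search stops after finitely many trials.

```python
from itertools import count

def candidates():
    """Yield 0, 1, -1, 2, -2, ... in the order described above."""
    yield 0
    for x in count(1):
        yield x
        yield -x

def divide_by_search(a, b):
    """Return (q, r, steps) with a == b*q + r and 0 <= r < b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    for steps, q in enumerate(candidates(), start=1):
        r = a - b * q
        if 0 <= r < b:              # compare a - bq with 0 and with b
            return q, r, steps
```

For a = 7, b = 3 the trials are q = 0, 1, −1, 2, so the search stops at the fourth step with quotient 2 and remainder 1.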
It is an immediate corollary of 4.37, taking b = 2, that for every integer
a there exists q such that a = 2q or there exists q such that a = 2q + 1;
in other words, we have the familiar fact that every integer is even or odd,
and by uniqueness cannot be both.

The divisibility relation and the primes. The cases in which the remainder
after division is 0 are of special interest.

4.38 Definition. Let a, b ∈ I. We say that a is divisible by b, or that a is
a multiple of b, or that b is a factor of a, if there exists q ∈ I such that
a = bq. If this holds we write b|a. If a is not divisible by b we write b∤a.

We have the following consequences.

4.39 Theorem. Let a, b, c, a_1, . . . , a_n ∈ I. We have:

(i) a|0;
(ii) 0|a if and only if a = 0;
(iii) 1|a;
(iv) a|1 if and only if a = ±1;
(v) a|a;
(vi) if c|b and b|a, then c|a;
(vii) b|a and a|b if and only if a = ±b;
(viii) if c|a, then c|ab;
(ix) if c|a and c|b, then c|(a ± b);
(x) if 1 ≤ i ≤ n and c|a_i, then c|Π_{k=1}^{n} a_k.

Proof. Cases (i)-(iii) are trivial. In (iv) and (vii) we use a = ±b as
an abbreviation for “a = b or a = −b.” (iv) is obvious in the “if” direction.
Suppose that a|1; 1 = aq for q ∈ I. Then 1 = |1| = |a|·|q|. We
cannot have a = 0 or q = 0. Hence |a| > 0 and |q| > 0 and |a|·|q| ≥
|a| ≥ 1. Thus |a| = 1. If a > 1, then |a| = a > 1; if a < −1, then
|a| = −a > 1. Hence a = ±1. (v) and (vi) are trivial. For (vii) the
“if” part is again obvious. Suppose that b|a, a|b, that is, a = bq_1 and
b = aq_2, so that a = (aq_2)q_1 = a(q_2q_1). If a = 0, then b = 0 by (ii).
If a ≠ 0, we obtain by cancellation 1 = q_2q_1, hence q_1|1 and q_1 = ±1
by (iv). Then a = ±b.
(viii) and (ix) are easily seen to hold, the last by distributivity. (x) follows
from (viii) and the generalized associative and commutative laws 4.28
and 4.30.

4.40 Definition. Let p ∈ I. Then p is said to be a prime if p ≠ 0,
p ≠ ±1 and for all a, if a|p, then a = ±1 or a = ±p.

It can be mechanically checked that the first few positive primes are

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, . . .

Often it is only the positive primes which are referred to as primes. However,
from the algebraic point of view the present definition is more
natural (this will be brought out in our discussion, later in the book, of
divisibility questions for polynomials). We shall prove below that there
are infinitely many positive primes (and hence infinitely many negative
ones). The following useful property of nonprimes is easily verified.

4.41 Lemma. If a ∈ I, a > 1, and a is not prime, then there exist a_1, a_2 ∈ I
with 1 < a_1 < a, 1 < a_2 < a, and a = a_1·a_2.

Given a number a > 1 which is not prime, one or both of these numbers
a_1, a_2 could be prime. However if, say, a_1 is not prime it can be factored
further as a_1 = a′_1·a′_2 where both a′_1, a′_2 are greater than 1. We proceed
similarly with a_2, if a_2 is not prime. By continuing this process we expect
that we will reach, in a finite number of steps, a representation
a = p_1·p_2· . . . ·p_n, where p_1, p_2, . . . , p_n are all positive primes. It is
conceivable that, by performing these computations in a different order
we would reach a representation a = p′_1·p′_2· . . . ·p′_m where n ≠ m or
where n = m, but the representations are essentially different in that the
sequence p′_1, p′_2, . . . , p′_m is not a permutation of the sequence p_1, p_2, . . . ,
p_n. This will be shown not to be possible if we can demonstrate that every
prime p has the following property: if p|bc, then p|b or p|c. For from
p_1|(p′_1·p′_2· . . . ·p′_m) will then follow p_1|p′_1 or p_1|(p′_2· . . . ·p′_m), hence
p_1 = p′_1 or p_1|(p′_2· . . . ·p′_m); by repeating this procedure, we would
eventually conclude that p_1 = p′_1 or p_1 = p′_2 or · · · or p_1 = p′_m.
Then cancelling p_1 from both sides of the equation p_1·p_2· . . . ·p_n =
p′_1·p′_2· . . . ·p′_m and repeating the argument for p_2, p_3, . . . , we could
eventually realize that the representations must be the same, except
possibly for the order of the factors. It is the object of the next group of
theorems to make these ideas precise, ending with a proof of the existence
and uniqueness of such representations. The following development,
while not the most direct to gain this end, is more informative and more
readily generalizable to questions of divisibility in other systems.

Greatest common divisors.

4.42 Definition. Let a, b, d ∈ I. We call d a greatest common divisor
(gcd) of a and b if d has the following properties:
(i) d|a and d|b;
(ii) if x ∈ I and x|a and x|b, then x|d.

It follows that if d_1, d_2 are both gcd’s of a and b, then d_1|d_2 and d_2|d_1;
hence d_1 = ±d_2 by 4.39(vii). Further, if d_1 is a gcd of a, b, then so is
−d_1. Evidently “greatest” here does not refer to magnitude in the usual
sense.
These considerations do not yet yield the existence of a gcd for any pair
a, b. This is easily established in certain special cases. If a = 0, then any
number divides a; since b|b this shows that b is a common divisor of a and b.
Moreover, b is obviously a gcd in this case. A gcd also evidently exists in
the case a = b. The nontrivial cases are given by the restriction a ≠ 0,
b ≠ 0, and a ≠ b. For simplicity, let us consider first the case a > b > 0.
The following argument provides a proof of existence of the gcd of a, b
in this case and at the same time gives what is known as the Euclidean
algorithm for finding such a gcd. It is based on the observation that if
we write

(4:5-3) a = bq + r, 0 ≤ r < b,

then for any d ∈ I

(4:5-4) d|a and d|b if and only if d|b and d|r.

Hence a gcd of a, b must at the same time be a gcd of b, r. Then the


problem of finding the gcd of a, b is reduced to the presumably simpler
problem of finding the gcd of 6 with the smaller number r. If r = 0, we
have b as a gcd. Otherwise we can repeat the process with b, r. If we
do not reach a zero remainder in, say, n steps we have the following
situation:

(4:5-5) a = bq -f rx, 0 < rx < 6;


b = r1ql + r2 0 < r2 < rx;
O = r2q2 + r3 0 < r3 < r2;

^n —2 ^n — lQn — 1 4“ 0 rn rn—j.

At the next stage we write

(4:5-6) rn_x = rnqn + rn+1, 0 < rn+1 < rn.

Then

(4:5-7) if rn+x = 0 we have:

rn = (a gcd of rn_lt rn) — • • • = (a gcd of rx, r2)


= (a gcd of b,rx) = (a gcd of a, b).

Note that the sequence of numbers r_1, r_2, . . . is uniquely determined by
a, b. Is it possible that for no n do we have r_{n+1} = 0? If so, then the
set A = {r_1, r_2, . . . , r_n, . . .}, where r_1 > r_2 > . . . , would be a set of
positive integers without a least element, contrary to the well-ordering
of P. Hence we must eventually reach an n for which (4:5-5) and (4:5-6)
hold and r_{n+1} = 0, thus providing us with r_n as the desired gcd by
(4:5-7). This is Euclid’s
algorithm.
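
As an illustration, the chain (4:5-5), (4:5-6) can be sketched in Python. This is only a sketch under stated assumptions: the function names euclid_remainders and euclid_gcd are our own, the inputs are assumed to satisfy a > b > 0 as in the text, and Python's % operator is taken as the remainder of the division algorithm for positive arguments.

```python
def euclid_remainders(a, b):
    """Return [r_1, ..., r_n, 0], the remainders of (4:5-5) and (4:5-6).

    Assumes a > b > 0, the case treated in the text.
    """
    if not (a > b > 0):
        raise ValueError("this sketch assumes a > b > 0")
    remainders = []
    while True:
        r = a % b              # a = b*q + r with 0 <= r < b
        remainders.append(r)
        if r == 0:
            return remainders
        a, b = b, r            # repeat the process with the pair (b, r)


def euclid_gcd(a, b):
    """Positive gcd of a > b > 0: the last nonzero remainder, or b if r_1 = 0."""
    remainders = euclid_remainders(a, b)
    return remainders[-2] if len(remainders) > 1 else b
```

The loop must stop because the remainders strictly decrease, which is exactly the well-ordering argument given above.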
Suppose that we have (4:5-5) and (4:5-6) with r_{n+1} = 0. We can write
out the way in which r_1, r_2, . . . , r_n are determined by a, b as follows:

r_1 = a - bq,
r_2 = b - r_1 q_1 = b - (a - bq)q_1 = -a q_1 + b(1 + q q_1),
etc. Thus we have r_1 = a x_1 + b y_1, where x_1 = 1, y_1 = -q; r_2 =
a x_2 + b y_2, where x_2 = -q_1, y_2 = 1 + q q_1; etc. Suppose that we have
expressed

r_{k-2} = a x_{k-2} + b y_{k-2}

and

r_{k-1} = a x_{k-1} + b y_{k-1},

where k ≥ 3; then

r_k = r_{k-2} - r_{k-1} q_{k-1} = a(x_{k-2} - x_{k-1} q_{k-1}) + b(y_{k-2} - y_{k-1} q_{k-1}),

so we can also write r_k = a x_k + b y_k, with suitable x_k, y_k. It follows that
r_n, which is the positive gcd of a, b, can be written in the form ax + by,
for certain x, y ∈ I. Consider any number ax' + by' where x', y' are integers.
If c ∈ I and c|a and c|b, then c|(ax' + by') by 4.39(viii), (ix). In
particular, r_n|(ax' + by'). If ax' + by' is positive, this implies r_n ≤ ax' +
by'. Hence the positive gcd of a, b is the least positive number of the
form ax' + by'. This characterization suggests another proof of the
existence of gcd’s which is slightly more sophisticated than the foregoing
but is in certain respects more informative.
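
The recurrence for x_k, y_k derived above can be carried along with the remainders. The following Python sketch does this; the name bezout is our own, the inputs are assumed to satisfy a > b > 0, and the starting pairs (1, 0) and (0, 1) simply express a = a·1 + b·0 and b = a·0 + b·1.

```python
def bezout(a, b):
    """Return (g, x, y) with g = a*x + b*y, where g is the positive gcd.

    Assumes a > b > 0. Throughout the loop the two invariants
    r_prev = a*x_prev + b*y_prev and r_curr = a*x_curr + b*y_curr hold.
    """
    r_prev, x_prev, y_prev = a, 1, 0
    r_curr, x_curr, y_curr = b, 0, 1
    while r_curr != 0:
        q = r_prev // r_curr
        # r_k = r_{k-2} - r_{k-1} q_{k-1}, and likewise for x_k and y_k
        r_prev, r_curr = r_curr, r_prev - q * r_curr
        x_prev, x_curr = x_curr, x_prev - q * x_curr
        y_prev, y_curr = y_curr, y_prev - q * y_curr
    return r_prev, x_prev, y_prev
```

For a = 1071, b = 462 the quotients are 2, 3, 7, and the function returns the gcd 21 together with coefficients satisfying 21 = 1071·x + 462·y.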
We will make use of the following two distinctive properties of the set L
of all linear combinations ax + by, where x, y ∈ I:

(4:5-8) (a) if u ∈ L and z ∈ I, then uz ∈ L;
(b) if u, v ∈ L, then u + v ∈ L.

It is seen that if L' is any other set which satisfies (4:5-8)(a), (b) and we
have a, b ∈ L', then L ⊆ L'. Clearly a = a · 1 + b · 0 and b = a · 0 + b · 1
are in L. Hence L is the smallest set L' satisfying all these conditions.
Now we can prove a theorem about any set S satisfying the conditions
of (4:5-8) in place of L, which will give us the desired result about L as a
special case.

4.43 Theorem. Suppose that S ⊆ I, S ≠ ∅, and that S satisfies:

(i) if u ∈ S and z ∈ I, then uz ∈ S;
(ii) if u, v ∈ S, then u + v ∈ S.

Then either S = {0}, or there exists a d > 0 such that S = {dz : z ∈ I};
in the latter case d is uniquely determined.

Proof. Suppose that S ≠ {0}. Since S ≠ ∅ we can pick u ∈ S, u ≠ 0.
If u < 0, then -u = u(-1) ∈ S; hence in any case S contains a positive
integer. Let A = S ∩ P; thus A ≠ ∅. By the well-ordering of P, A contains
a least element, call it d. Then d has the following properties:

(1) d ∈ S;

(2) d > 0;

(3) if u ∈ S and u > 0, then u ≥ d.

It follows immediately from (1) and the hypothesis (i) that

(4) {dz : z ∈ I} ⊆ S.

Suppose, to prove the reverse, that u ∈ S. By the division algorithm
(4.31) we can write

(5) u = dz + r, where z, r ∈ I and 0 ≤ r < d.

Then r = u - dz = u + d · (-z). Since u, d ∈ S it follows from (i), (ii)
that also r ∈ S. If r > 0, we would have r ≥ d by (3), contrary to r < d.
Thus we must have r = 0 and u = dz. Hence

(6) S ⊆ {dz : z ∈ I}.

Steps (4) and (6) together give the desired equality. Thus d ∈ S and is
a divisor of every element of S. Suppose that d' were any other number
with this property for which d' > 0. Then d|d' and d'|d; hence d = ±d'
by 4.39(vii). Clearly we cannot have d = -d', so d = d'.

4.44 Theorem. Let a, b ∈ I, a ≠ 0 or b ≠ 0. Then a and b have a unique
positive greatest common divisor d. For suitable s, t ∈ I we have
d = as + bt.

Proof. Let S = {ax + by : x, y ∈ I}. Then S satisfies 4.43(i), (ii).
Furthermore, a, b ∈ S, so that S ≠ ∅, S ≠ {0}. Choose d > 0 with
S = {dz : z ∈ I}. Since d ∈ S, we must have d = as + bt for some s, t.
Since a, b ∈ S we have d a divisor of both a, b. If c is any other divisor
of both a, b, then c|(as + bt) by 4.39(viii), (ix), that is, c|d. Hence d is
a gcd of a, b. As we have seen, any other gcd d' satisfies d' = ±d. Hence,
if d' > 0, then d' = d.
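
Theorem 4.44 can be spot-checked numerically: the least positive value of ax + by should coincide with the positive gcd. The sketch below searches only a finite window of coefficients, so it is an experiment and not a proof. The name least_positive_combination and the parameter bound are our own; the sketch assumes that a, b are not both 0 and that bound is at least max(|a|, |b|), which is enough for suitable coefficients to fall inside the window.

```python
from math import gcd


def least_positive_combination(a, b, bound):
    """Least positive a*x + b*y over the window |x|, |y| <= bound (brute force)."""
    values = (a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1))
    return min(v for v in values if v > 0)


a, b = 84, 60
d = least_positive_combination(a, b, bound=84)
# Every sampled combination should be a multiple of d, as in S = {dz : z in I}.
all_multiples = all((a * x + b * y) % d == 0
                    for x in range(-5, 6) for y in range(-5, 6))
```

Here d agrees with math.gcd(84, 60), and every sampled combination is a multiple of d.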

4.45 Definition. We define (a, b) for all a, b ∈ I with a ≠ 0 or b ≠ 0
to be the unique positive gcd of a, b. Here a and b are called relatively
prime if (a, b) = 1.

There is a possibility of confusing this notation for the gcd with that for
the ordered pair. Both these notations are in standard use. The context
will always determine which meaning is intended.

If (a, b) = 1, then a, b have no divisors in common other than ±1.
This leads to the following [where, as in our further work, the use of the
symbol (a, b) will implicitly involve the assumption a ≠ 0 or b ≠ 0].

4.46 Theorem. Suppose that a, b, c ∈ I. If (a, c) = 1 and c|ab, then
c|b.

Proof. Since (a, c) = 1 we have by 4.44

(1) 1 = as + ct

for some s, t ∈ I. Hence

(2) b = (ab)s + c(bt).

Since c|ab and c|c(bt), we have c|b.

4.47 Theorem. Suppose that p is a prime and a, b, a_1, . . . , a_n ∈ I. Then:

(i) if p ∤ a, then (a, p) = 1;
(ii) if p|ab, then p|a or p|b;
(iii) if p|(a_1 · . . . · a_n), then p|a_i for some i.

We leave the proof of this to the reader. Although it is now easily obtained,
4.47(iii) establishes the important property of prime numbers which led us
to the consideration of gcd’s. We also leave the following for the reader
to prove.

4.48 Theorem. Suppose that a, b, c ∈ I. If (a, b) = 1, a|c and b|c, then
(ab)|c.

Factorization of integers into primes. The ground is now almost prepared
for us to prove a representation theorem for integers as products of primes.
First, consider the question of proving the existence of at least one
representation for each a > 1. Consider any such a. If a is a prime, we are
through. Otherwise, a = a_1 · a_2 for some a_1, a_2 where 1 < a_1 < a,
1 < a_2 < a. If we assume that the result holds for all b < a, we can
then conclude that it holds for a. This suggests a new type of inductive
proof, slightly different from the kind that we have used so far, in that
we do not just consider whether the inductive hypothesis holds for the
element which immediately precedes a but whether it holds for all elements
which precede a. The validity of this type of argument is ensured by the
following general theorem.

4.49 Theorem. Suppose that (S, <) is a well-ordered system and that
A ⊆ S. For any a ∈ S, let S_a = {x : x ∈ S and x < a}. Suppose
A has the property that whenever S_a ⊆ A, then a ∈ A; then A = S.

Proof. Suppose, to the contrary, that A ≠ S. Let B = S - A. Then
B ≠ ∅ and, by the well-ordering of S, B has a first element, call it b. We
claim S_b ⊆ A. For if x ∈ S_b we have x < b, hence x ∉ B and then x ∈ A.
It follows by our assumption on A that b ∈ A, contradicting the fact that
b ∈ B. Thus we must have A = S.

Conversely, it can be shown that any simply ordered system (S, <)
must be well ordered if it has the following property for all A ⊆ S: if
for every a ∈ S, S_a ⊆ A implies a ∈ A, then A = S.
A proof making use of the property 4.49 of well-ordered systems is sometimes
called a course-of-values induction since it refers to the behavior of
all values preceding a given element a. In particular applications it should
be observed that when trying to verify that the condition, S_a ⊆ A implies
a ∈ A, holds for all a ∈ S, one will encounter one case in which the
hypothesis S_a ⊆ A is trivially satisfied, namely when a is the first element
of S, for then S_a = ∅. It may be necessary to give a direct proof that
a ∈ A in this case.

4.50 Theorem. For any a ∈ I, a > 1, we have:

(i) there exists a sequence (p_1, . . . , p_n) of positive primes such that
a = p_1 · . . . · p_n;
(ii) if (q_1, . . . , q_m) is any other sequence of positive primes such that
a = q_1 · . . . · q_m, then n = m and for some permutation F of
{1, . . . , n}, q_k = p_{F(k)} for all k ≤ n.

Proof. We shall prove (i) and (ii) together by a course-of-values induction
on the set P_2 of integers ≥ 2. Let A be the set of all a ∈ P_2 for
which there exists a sequence (p_1, . . . , p_n) satisfying (i) which is unique
up to order in the sense of (ii). To prove our theorem it suffices, by 4.49,
to show that

(1) if {x : 2 ≤ x < a} ⊆ A, then a ∈ A.

Assume the hypothesis of (1). We first show that (i) is also true for a.
If a is prime, this is immediate. Otherwise, we have, by 4.42,

(2) a = a_1 · a_2 where 1 < a_1 < a, 1 < a_2 < a.

By hypothesis, there exist two sequences, (p_1, . . . , p_{n_1}) and (p'_1, . . . , p'_{n_2}),
of positive primes such that

(3) a_1 = ∏_{k=1}^{n_1} p_k, a_2 = ∏_{k=1}^{n_2} p'_k.

Let n = n_1 + n_2 and define p_{n_1+k} for 1 ≤ k ≤ n_2 as p'_k. Then by 4.27(ii)
and 4.28,

(4) a = (∏_{k=1}^{n_1} p_k) · (∏_{k=n_1+1}^{n} p_k) = ∏_{k=1}^{n} p_k.

Now to prove (ii), suppose further that

(5) a = ∏_{k=1}^{m} q_k,

where (q_1, . . . , q_m) are primes. We write

(6) a = (∏_{k=1}^{n-1} p_k) · p_n.

Hence p_n | ∏_{k=1}^{m} q_k. Then by 4.47 there is an i, 1 ≤ i ≤ m, such that
p_n|q_i. But since 1 < p_n, this can only happen when p_n = q_i. Let G be
any permutation of {1, . . . , m} such that G(m) = i. Then by the generalized
commutative law 4.30

(7) a = ∏_{k=1}^{m} q_k = ∏_{k=1}^{m} q_{G(k)} = (∏_{k=1}^{m-1} q_{G(k)}) · q_i = (∏_{k=1}^{m-1} q_{G(k)}) · p_n.

From (6) and (7) we obtain

(8) ∏_{k=1}^{n-1} p_k = ∏_{k=1}^{m-1} q_{G(k)}

by cancellation. Now it may happen that the quantity in (8), which we
shall now call a_1, is 1. This can only happen if a = p_1 = q_1, and (ii) is
trivially established in this case. Otherwise, 2 ≤ a_1, and we also see from
a = a_1 · p_n and 2 ≤ p_n that a_1 < a. Hence we can apply our inductive
hypothesis to a_1. By (ii) for a_1, n - 1 = m - 1 and for a certain
permutation H of {1, . . . , n - 1},

(9) q_{G(j)} = p_{H(j)} for all j ≤ n - 1.

Define

(10) F(k) = H(j) if k ≠ i and k = G(j),
F(k) = n if k = i.

By these conditions F is well determined; for each k ≤ n (= m), k ≠ i,
we have k = G(j) for a unique j < n by the fact that G is a permutation of
{1, . . . , n} and G(n) = i. Further, F is a permutation of {1, . . . , n}.
For suppose that F(k_1) = F(k_2) where k_1, k_2 are ≤ n. If both k_1, k_2
are ≠ i, and k_1 = G(j_1), k_2 = G(j_2) where j_1, j_2 are < n, then
F(k_1) = H(j_1), F(k_2) = H(j_2); hence j_1 = j_2 by one-to-oneness of H,
and then k_1 = k_2. The only other case to consider is k_1 ≠ i, k_2 = i.
But in this case F(k_1) ≠ F(k_2), since F(k_1) ≤ n - 1 < n = F(k_2). To
conclude the induction we need only
observe from (9) and (10) and the fact q_i = p_n that

(11) q_k = p_{F(k)} for all k ≤ n.

From this theorem the following can be obtained without much trouble;
the proof is left to the reader.

4.51 Corollary. For any a ∈ I, a > 1, there exist unique sequences
(p_1, . . . , p_n) and (i_1, . . . , i_n) such that:
(i) for each k, l with 1 ≤ k < l ≤ n, p_k is a positive prime and
p_k < p_l;
(ii) for each k ≤ n, i_k ∈ P;
(iii) a = p_1^{i_1} · . . . · p_n^{i_n}.

4.50 and 4.51 are generally referred to as different specific forms of the
unique factorization theorem for integers. Given a representation of a as in
4.50, we can find all positive divisors d of a very easily, as all products
of subsequences, i.e., sequences (p_{k_1}, . . . , p_{k_m}), where 1 ≤ k_1 < . . .
< k_m ≤ n. This leads to a convenient way of finding the gcd (a, b) when
representations for a, b are available. (The details will be left to the
exercises.) Further, inspection of the proof of part (i) in 4.50 reveals that
such representations can always be found in a finite number of steps.
This can be done in the following particular routine. Given a > 1, list
all positive primes 2, 3, 5, . . . which are ≤ a, until one comes to the first
which is a divisor of a. Call this prime p_1 and write a = p_1 · a_2. If
a_2 > 1, we repeat this process, to obtain a_2 = p_2 · a_3. Then the sequence
a_1, a_2, . . . of numbers thus obtained (with a_1 = a) must eventually reach 1,
since whenever a_k > 1, we have a_k = p_k · a_{k+1} and hence a_k > a_{k+1}. When
a_n > 1 and a_{n+1} = 1 we have a = p_1 · . . . · p_n.
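
The routine just described can be sketched in Python as follows. The name prime_factors is our own. As a simplification, the sketch tries every integer 2, 3, 4, . . . rather than only the primes; this gives the same result, because the least divisor greater than 1 of any a > 1 is necessarily prime.

```python
def prime_factors(a):
    """Return [p_1, ..., p_n] with a = p_1 * ... * p_n and p_1 <= ... <= p_n.

    Assumes a > 1.
    """
    if a <= 1:
        raise ValueError("a must be greater than 1")
    factors = []
    while a > 1:
        p = 2
        while a % p != 0:      # first divisor > 1 of the current a_k; it is prime
            p += 1
        factors.append(p)
        a //= p                # a_k = p_k * a_{k+1}, so a_{k+1} < a_k
    return factors
```

Since each a_{k+1} is strictly smaller than a_k, the outer loop ends after finitely many steps, as argued above. Restarting the search at 2 each time is wasteful but mirrors the routine in the text.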

4.52 Theorem. There exist infinitely many positive primes.

Proof. We shall show by induction on n ∈ P that there exist at least n
distinct positive primes. This is obvious for n = 1. Suppose that it is
true for n. Let q_1, . . . , q_n be n distinct positive primes. Let a =
(q_1 · . . . · q_n) + 1. By 4.50, a has a representation as a product of positive
primes. Hence there is at least one positive prime p such that p|a. Then
p is distinct from each of q_1, . . . , q_n. For otherwise, if p = q_k for some
k, then p|(q_1 · . . . · q_n), and hence p|(a - (q_1 · . . . · q_n)); but then p|1, so that
p = 1. By the definition of prime, this is impossible. Thus if we take
q_{n+1} = p, we have at least n + 1 distinct positive primes q_1, . . . , q_{n+1}.
Since our proposition is true for every n ∈ P, there cannot be finitely
many primes.
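
The induction step of this proof is constructive, and can be sketched in Python. The name new_prime is our own; the sketch assumes its argument is a nonempty list of distinct positive primes, and it finds a prime divisor of q_1 · . . . · q_n + 1 by searching for the least divisor greater than 1.

```python
def new_prime(primes):
    """Given distinct primes q_1, ..., q_n, return a prime p dividing q_1*...*q_n + 1.

    By the argument in the proof of 4.52, p differs from every q_k.
    """
    a = 1
    for q in primes:
        a *= q
    a += 1
    p = 2
    while a % p != 0:          # least divisor > 1 of a, hence a prime
        p += 1
    return p
```

Note that q_1 · . . . · q_n + 1 itself need not be prime: for the primes 2, 3, 5, 7, 11, 13 it equals 30031 = 59 · 509, and the function returns 59.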
Since the division algorithm provides us with an algorithm for finding
all divisors b of a number a, we have an algorithm for deciding whether
any given number is a prime. Thus, by successively listing all numbers 2,
3, 4, 5, 6, . . . up to any given point, we can effectively list all prime numbers
up to that point. It is not simple, however, to determine in advance
how far one must go in order to find a given number of primes. The answer
to this is known, but only as an approximation statement, as the prime
number theorem, one of the most celebrated mathematical results of the
last century.
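
The decision procedure mentioned here can be sketched directly from the definition. The names is_prime and primes_up_to are our own, and the sketch makes no attempt at efficiency: it tests every candidate divisor d with 1 < d < a.

```python
def is_prime(a):
    """Decide whether the integer a is a positive prime by testing all d with 1 < d < a."""
    if a < 2:
        return False
    return all(a % d != 0 for d in range(2, a))


def primes_up_to(limit):
    """List all positive primes <= limit by examining 2, 3, 4, ... in turn."""
    return [a for a in range(2, limit + 1) if is_prime(a)]
```

A call such as primes_up_to(30) lists the ten primes from 2 to 29; how large limit must be to obtain a prescribed number of primes is the question addressed by the prime number theorem.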

Positional notations for integers. We now return to one of the questions
which motivated the introduction of integers (at least of 0), the possibility
of the positional decimal representation of positive integers. For example,
we write 2037 as an abbreviation for the number 2 · 10^3 + 0 · 10^2 +
3 · 10 + 7. For the sake of uniformity we can also write here 3 · 10 =
3 · 10^1 and 7 = 7 · 1 = 7 · 10^0.
We generalize the desired result of decimal representation to representation
in terms of an arbitrary base b > 1. Note that coefficients in
representation to base 10 are the numbers 0, 1, 2, . . . , 9. Similarly,
coefficients in representations to base b will be 0, 1, . . . , b - 1.

4.53 Theorem. Suppose that a ∈ P, b ∈ I, b > 1. Then there exists a
unique n ∈ I, n ≥ 0, and a unique sequence (c_0, . . . , c_n) of integers
such that:

(i) a = Σ_{k=0}^{n} c_k b^{n-k};

(ii) for each k with 0 ≤ k ≤ n we have 0 ≤ c_k < b;

(iii) c_0 ≠ 0.

Proof. We shall prove the existence and unicity of such a representation
simultaneously by course-of-values induction on P (4.49). Thus suppose
that a ∈ P and that every a_1 ∈ P with a_1 < a has a unique representation
of the desired form. We consider two possibilities, a < b or a ≥ b. If
a < b, then a = a · 1 = a · b^0 = Σ_{k=0}^{0} c_k b^{0-k}, where c_0 = a [so that
(ii), (iii) clearly hold]. Suppose also that a = Σ_{k=0}^{n} c'_k b^{n-k}. If n > 0,
then a = c'_0 b^n + Σ_{k=1}^{n} c'_k b^{n-k} and, since each c'_k b^{n-k} is nonnegative,
a ≥ c'_0 b^n; but c'_0 ≥ 1, so a ≥ b^n = b · b^{n-1} ≥ b, contrary to a < b.
Hence n = 0 and a = c'_0 b^0 = c'_0; thus c'_0 = c_0 and the representation is
unique. Consider now the case a ≥ b. Write

(1) a = qb + r where 0 ≤ r < b,


by the division algorithm. Then r < a, and a - r > 0; also a - r = qb,
hence q > 0, that is,

(2) q ∈ P.

But 1 < b, so q < qb ≤ a, that is,

(3) q < a.

By (2), (3) we can apply the inductive hypothesis to q. Hence there exist
unique m ∈ I, m ≥ 0, and integers (d_0, . . . , d_m) with

(4) (i) q = Σ_{k=0}^{m} d_k b^{m-k};

(ii) 0 ≤ d_k < b whenever 0 ≤ k ≤ m;

(iii) d_0 ≠ 0.

Then from (1) by distributivity we have

(5) a = Σ_{k=0}^{m} d_k b^{m-k+1} + r.

Define

(6) n = m + 1, c_n = r, and c_k = d_k for 0 ≤ k ≤ n - 1.

Thus (5) can be rewritten as

a = Σ_{k=0}^{n-1} c_k b^{n-k} + c_n = Σ_{k=0}^{n} c_k b^{n-k}.

Clearly c_0 ≠ 0 and 0 ≤ c_k < b whenever 0 ≤ k ≤ n. Hence we see
that (6) provides at least one representation for a. Suppose that we also
had

(7) (i) a = Σ_{k=0}^{n'} c'_k b^{n'-k};

(ii) 0 ≤ c'_k < b whenever 0 ≤ k ≤ n';

(iii) c'_0 ≠ 0.

If n' = 0, then a = c'_0 < b, contrary to a ≥ b. Hence n' > 0 and we
can write

a = Σ_{k=0}^{n'-1} c'_k b^{n'-k} + c'_{n'} = (Σ_{k=0}^{n'-1} c'_k b^{(n'-1)-k}) · b + c'_{n'}.

Thus if we define

(8) m' = n' - 1, d'_k = c'_k for 0 ≤ k ≤ m',

and

(9) q' = Σ_{k=0}^{m'} d'_k b^{m'-k},

we have

(10) a = q'b + c'_{n'}.

But 0 ≤ c'_{n'} < b, so that by the unicity condition of the division
algorithm,

(11) q' = q and c'_{n'} = r.

Now we apply the inductive hypothesis of the uniqueness of representation
(4) of q. It follows that m' = m and d'_k = d_k for 0 ≤ k ≤ m. But then
by (6) and (8) n' = m' + 1 = m + 1 = n, c'_{n'} = c'_n = r = c_n, and
c'_k = d'_k = d_k = c_k, for 0 ≤ k ≤ n - 1. Hence also the representation
of a is unique, and the induction step is completed.
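
The existence half of this proof is effectively a recursive procedure: the digits of a are the digits of the quotient q followed by the remainder r. A Python sketch, with the names digits and from_digits chosen by us and divmod standing in for the division algorithm on positive arguments:

```python
def digits(a, b):
    """Return [c_0, ..., c_n] with a = sum of c_k * b**(n-k), 0 <= c_k < b, c_0 != 0.

    Assumes a >= 1 and b >= 2, as in Theorem 4.53.
    """
    if a < 1 or b < 2:
        raise ValueError("requires a >= 1 and b >= 2")
    if a < b:
        return [a]                    # the case a < b: n = 0 and c_0 = a
    q, r = divmod(a, b)               # a = q*b + r with 0 <= r < b, and q < a
    return digits(q, b) + [r]         # c_k = d_k for k <= n - 1, and c_n = r


def from_digits(coefficients, b):
    """Evaluate the sum of c_k * b**(n-k) for coefficients [c_0, ..., c_n]."""
    value = 0
    for c in coefficients:
        value = value * b + c
    return value
```

The recursion terminates for the same reason the induction works: each call is made on a quotient q that is strictly smaller than a.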
By means of the result 4.33 for the geometric series, we can also add the
following bit of information to the preceding.

4.54 Theorem. For a ∈ P represented as in 4.53 we have b^n ≤ a < b^{n+1}.
Furthermore, n is uniquely determined by these inequalities.

Proof. Since c_0 ≠ 0 we clearly have b^n ≤ a. On the other hand, each
c_k ≤ b - 1, so

a = Σ_{k=0}^{n} c_k b^{n-k} ≤ Σ_{k=0}^{n} (b - 1) b^{n-k} = (b - 1) Σ_{k=0}^{n} b^{n-k}
= (b - 1) Σ_{k=0}^{n} b^k = b^{n+1} - 1.

For any other m with b^m ≤ a < b^{m+1}, if say m < n, we would have
m + 1 ≤ n, hence b^{m+1} ≤ b^n ≤ a, which is a contradiction. Similarly,
we cannot have n < m. Hence n = m.
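
Theorem 4.54 says that n can be found without computing any coefficients, simply by comparing a with successive powers of b. A small Python sketch, where the name top_exponent is our own and a ≥ 1, b ≥ 2 are assumed:

```python
def top_exponent(a, b):
    """Return the unique n >= 0 with b**n <= a < b**(n+1); assumes a >= 1, b >= 2."""
    n = 0
    power = b                  # invariant: power == b**(n+1) and b**n <= a
    while power <= a:
        power *= b
        n += 1
    return n
```

The representation of a to the base b then has exactly top_exponent(a, b) + 1 coefficients.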

We have given the representation theorem in the form which corresponds
to our usual way of denoting numbers, with descending powers
of b. Using the commutative law, we could equally well represent a as
Σ_{k=0}^{n} c_k b^k, where 0 ≤ c_k < b and c_n ≠ 0.
From the representation of 4.53 we can deduce the familiar rules for
adding and multiplying integers as learned in elementary arithmetic.

Given a_1, a_2 to be added, say

a_1 = Σ_{k=0}^{n_1} c_{1,k} b^{n_1-k} and a_2 = Σ_{k=0}^{n_2} c_{2,k} b^{n_2-k},

where n_2 ≥ n_1, put n = n_2 and write

a_1 = Σ_{k=0}^{n} d_{1,k} b^{n-k} and a_2 = Σ_{k=0}^{n} d_{2,k} b^{n-k},

where

d_{1,k} = 0 if 0 ≤ k < n - n_1,

d_{1,k+(n-n_1)} = c_{1,k} if 0 ≤ k ≤ n_1,

d_{2,k} = c_{2,k} for 0 ≤ k ≤ n.

Then

a_1 + a_2 = Σ_{k=0}^{n} (d_{1,k} + d_{2,k}) b^{n-k}.

Let d_k = d_{1,k} + d_{2,k} for 0 ≤ k ≤ n. Then d_0 ≠ 0. However, we do not
necessarily have 0 ≤ d_k < b, but can only conclude that 0 ≤ d_k ≤
2b - 2. This forces us to consider the so-called “carry-over.” For example,
if 0 ≤ d_n < b, then we will take for the nth digit c_n in the expansion of
a_1 + a_2 the number d_n. Otherwise we can write d_n = b + r, where
0 ≤ r < b. In this case we take c_n = r, and carry over 1 to the coefficient
of the b^1 term. As the new coefficient of the b^1 term we have d_{n-1} + 1,
which is < 2b. Let d'_{n-1} be d_{n-1} + 1 or d_{n-1} according as there is or is not
a carry-over. Since d_{n-1} ≤ 2b - 2, we certainly have d'_{n-1} < 2b. Hence
we can now repeat for d'_{n-1} the procedure which we applied to d_n. If
b ≤ d'_{n-1}, we will have d'_{n-1} = b + c_{n-1}, 0 ≤ c_{n-1} < b. Then d'_{n-1} b =
1 · b^2 + c_{n-1} b, so that we must now carry over 1 to the b^2-term.
Continuing in this manner we will eventually reach the representation
for a_1 + a_2. The student should analyze in detail the algebraic laws which
make this procedure possible, and should carry out a similar analysis for
multiplication.
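
The carry-over procedure for addition can be sketched in Python on lists of coefficients, written with the highest power first as in 4.53. The name add_digits is our own; both inputs are assumed to be valid representations to the same base b. Padding with leading zeros corresponds to passing from the c_{1,k} to the d_{1,k}.

```python
def add_digits(c1, c2, b):
    """Add two base-b coefficient lists (highest power first) with carry-over."""
    n = max(len(c1), len(c2))
    d1 = [0] * (n - len(c1)) + list(c1)    # pad the shorter list with zeros
    d2 = [0] * (n - len(c2)) + list(c2)
    result = []
    carry = 0
    for k in range(n - 1, -1, -1):         # start at the coefficient of b**0
        d = d1[k] + d2[k] + carry          # 0 <= d < 2b
        if d >= b:
            result.append(d - b)           # d = b + r: keep r and carry 1
            carry = 1
        else:
            result.append(d)
            carry = 0
    if carry:
        result.append(1)                   # a final carry creates a new leading digit
    result.reverse()
    return result
```

For instance, adding [2, 0, 3, 7] and [9, 8, 5] to the base 10 produces a carry at three successive positions.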
Although we have not yet considered any integral domains besides the
integers, it should be remarked that a number of our notions and results
about the integers can be extended to a wide variety of other domains.
This is true of the division algorithm, the notion of divisibility, existence
of gcd, notion of prime, and representation as products of primes. Since
all our proofs made use of the well-ordering of the positive part of the
integers, some variants of these arguments must be made in order to carry
these theorems over to other domains. We shall consider this in only one
case later on in the book, where we take up divisibility questions concerning
the polynomials. There we will be able to use induction on the degree
of the polynomials to take the place of the induction arguments in the
integers. In contrast to these theorems, the general representation theorem
4.53 is peculiar to the integers. For if every positive element of an ordered
domain could be represented to the base 2 (i.e., 1 + 1) of the domain, the
set of positive elements would be isomorphic to P, and the domain itself
would be isomorphic to the integers.

Exercise Group 4.5

1. State and prove a theorem corresponding to the division algorithm 4.37
just under the restriction b ≠ 0.
2. Prove Lemma 4.40.
3. Prove Theorem 4.47(i)-(iii).
4. Prove Theorem 4.48.
5. Find the gcd’s (1960, 252) and (165, 182) by the Euclidean algorithm and
by prime factorization. In each case find s, t ∈ I with (a, b) = as + bt.
6. Let a, b, c ∈ P.
(a) Simplify (ab, ac).
(b) If (a, c) = (b, c) = 1, what is (ab, c)? Why?
7. Prove the Corollary 4.51.
8. Give a direct proof of 4.50(ii) by a course-of-values induction on the
number c = ab.
9. Show that if a^2|b^2 then a|b.
10. Prove that there are infinitely many positive primes p for which there
exists k with p = 4k + 1. Can you see other forms p = ak + b for which
this holds?
11. Calculate the results of the following operations, using all numbers represented
to the base 4. Check by converting to base 10.
(a) 3102 + 223 (b) 3102 · 223
12. Show that if the sum of the digits, to the base 10, of a number a is divisible
by 9, then a is divisible by 9. Are there numbers other than 9 with this
property?

4.6 Congruence relations in the integers. In the proof of Theorem 4.21,
in which we constructed a system isomorphic to the integers, we can
distinguish two main steps. We first defined operations ⊕, ⊙ on P × P
and a relation ≡ between elements of P × P, and we verified that ≡ is a
congruence relation with respect to these operations. In the second step
we used this fact to define, unambiguously, corresponding operations +, ·
on the class D of equivalence sets of ≡. Of course, the algebraic properties
of (P × P, ⊕, ⊙) completely determine those of (D, +, ·). The definitions
of ⊕, ⊙ are of a rather special nature, particularly suited for transforming
a particular kind of algebraic system into a system with subtraction-like
properties. On the other hand, the passage from (P × P, ⊕, ⊙) to (D, +, ·)
has features of a rather general nature, which are seen to be present whenever
we construct from a system

(S, F_1, F_2, . . . , F_k, W_1, W_2, . . . , W_l, a_1, a_2, . . . , a_m)

and any congruence relation ≡ for this system the corresponding system

(S̄, F̄_1, F̄_2, . . . , F̄_k, W̄_1, W̄_2, . . . , W̄_l, ā_1, ā_2, . . . , ā_m)

in the class S̄ of equivalence sets. It is natural to ask what algebraic properties
of the original system continue to hold true in the new system.
Let us denote the equivalence set to which any x ∈ S belongs by [x]. We
recall the conditions of (2:3-36) for ≡ to be a congruence relation on the
original system. For example, if F is one of the original operations and is,
say, binary and x_1 ≡ y_1, x_2 ≡ y_2, then F(x_1, x_2) ≡ F(y_1, y_2), and further
if W is one of the original relations and is, say, binary, then (x_1, x_2) ∈ W
if and only if (y_1, y_2) ∈ W (similarly when F, W are n-ary for other n).
Then corresponding to F, W we can define operations F̄ and relations W̄
on equivalence sets so that

(4:6-1) F̄([x], [y]) = [F(x, y)],

(4:6-2) ([x], [y]) ∈ W̄ if and only if (x, y) ∈ W.

In general any equivalence set X can be represented as [x] for many x ∈ S;
it is clear from this why it is necessary for ≡ to be a congruence relation
in order that (4:6-1), (4:6-2) provide unambiguous definitions of F̄, W̄.
In defining the new system in S̄ from that in S, we need only add to conditions
like (4:6-1) and (4:6-2) the definitions

(4:6-3) ā = [a]

for each constant a of the original system.

Homomorphisms. Now consider the function G(x) = [x] for all x ∈ S.
This has the properties:

(4:6-4) (a) 𝒟(G) = S, ℛ(G) = S̄;
(b) G(a_1) = ā_1, G(a_2) = ā_2, . . . , G(a_m) = ā_m;
(c) if W_1 is, say, binary, then for any x, y ∈ S, (x, y) ∈ W_1
if and only if (G(x), G(y)) ∈ W̄_1, and similarly for W_2, . . . ,
W_l, and any other number of arguments;
(d) if F_1 is, say, binary, then for any x, y ∈ S, G(F_1(x, y)) =
F̄_1(G(x), G(y)), and similarly for F_2, . . . , F_k, and any
other number of arguments.

It is thus seen that G has all the properties required of an isomorphism
from S to S̄ except that G be one-to-one. More generally, suppose that
(S', F'_1, F'_2, . . . , F'_k, W'_1, W'_2, . . . , W'_l, a'_1, a'_2, . . . , a'_m) is any system for
which there is a function G satisfying (4:6-4)(a)-(d) with S', F'_1, F'_2, . . . ,
F'_k, W'_1, W'_2, . . . , W'_l, a'_1, a'_2, . . . , a'_m, respectively, instead of S̄, F̄_1,
F̄_2, . . . , F̄_k, W̄_1, W̄_2, . . . , W̄_l, ā_1, ā_2, . . . , ā_m. Then we say that G is a
homomorphic mapping of the first system onto the second, and that the
second system is a homomorphic image of the first. (Hence, in particular,
every isomorphic image is a homomorphic image.) Thus (4:6-4)(a)-(d)
show that every congruence relation ≡ in the original system gives rise to
a homomorphic mapping G(x) = [x]. Further this relation is connected
to the mapping by the condition

(4:6-5) for all x, y ∈ S, x ≡ y if and only if G(x) = G(y).

Now we shall see that congruence relations give essentially all the
homomorphic images of a system. For suppose that G is a homomorphic
mapping of (S, F_1, . . .) onto (S', F'_1, . . .). Define the relation ≡ by
(4:6-5). (It may be observed that if G is one-to-one, i.e., is an isomorphism,
then ≡ is just the relation = of identity.) Then (4:6-5) shows that ≡ is
an equivalence relation in S. Suppose that x_1 ≡ y_1, x_2 ≡ y_2, that is,
G(x_1) = G(y_1), G(x_2) = G(y_2). If W_1 is, say, binary, then by the condition
of the homomorphism (x_1, x_2) ∈ W_1 if and only if (G(x_1), G(x_2)) ∈ W'_1
and (y_1, y_2) ∈ W_1 if and only if (G(y_1), G(y_2)) ∈ W'_1. But G(x_1) = G(y_1),
G(x_2) = G(y_2), so (x_1, x_2) ∈ W_1 if and only if (y_1, y_2) ∈ W_1. Further,
if F_1 is, say, binary, then we claim F_1(x_1, x_2) ≡ F_1(y_1, y_2), i.e., that
G(F_1(x_1, x_2)) = G(F_1(y_1, y_2)). This follows from the fact that

F'_1(G(x_1), G(x_2)) = F'_1(G(y_1), G(y_2)).

Hence this relation ≡ is actually a congruence relation on the system
(S, F_1, . . .). Now form the corresponding system (S̄, F̄_1, . . .) in the class S̄
of equivalence sets. Then we claim that

(4:6-6) (S', F'_1, . . .) ≅ (S̄, F̄_1, . . .).

Indeed it can be seen that there is a one-to-one mapping H from S' to S̄
which will satisfy the condition

(4:6-7) H(G(x)) = [x]


for every x ∈ S and which will provide the desired isomorphism. (To see
that (4:6-7) uniquely determines H, it is only necessary to check that if
G(x) = G(y), then [x] = [y].) Thus by means of congruence relations in
the original system (S, F_1, . . .) we obtain every homomorphic image, at
least up to isomorphism. The choice of whether, in a given discussion, we
should speak about congruence relations or about homomorphisms is thus
a matter of convenience.
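
The passage from a homomorphism G to its congruence relation, and the mapping H of (4:6-7), can be made concrete for a small finite system. Everything specific in the following Python sketch is an assumption chosen for illustration: the system S = {0, . . . , 11} with addition modulo 12, the image system {0, 1, 2, 3} with addition modulo 4, and the names kernel_classes, F, F_prime, G, H.

```python
def kernel_classes(elements, G):
    """Map each image value G(x) to the equivalence set [x] = {y : G(y) == G(x)}."""
    classes = {}
    for x in elements:
        classes.setdefault(G(x), set()).add(x)
    return {image: frozenset(members) for image, members in classes.items()}


# Assumed example: S = {0, ..., 11} with addition mod 12,
# mapped onto S' = {0, 1, 2, 3} with addition mod 4.
S = range(12)


def F(x, y):
    return (x + y) % 12


def F_prime(u, v):
    return (u + v) % 4


def G(x):
    return x % 4


# Condition (4:6-4)(d): G(F(x, y)) == F'(G(x), G(y)) for all x, y in S.
is_homomorphism = all(G(F(x, y)) == F_prime(G(x), G(y)) for x in S for y in S)

# H plays the role of the mapping in (4:6-7): H[G(x)] is the equivalence set [x].
H = kernel_classes(S, G)
```

Distinct elements of the image receive distinct equivalence sets, so H is one-to-one, and by construction each x lies in H[G(x)].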

Properties preserved under homomorphism. We can now pose our original
question as follows: what algebraic properties are preserved when we pass
from a system to any homomorphic image of it? A precise answer to this
question depends on what we mean by “algebraic property.” A suitable
formulation of this can be given in metamathematical terms, i.e., within
the framework of symbolic logic. Given this formulation, a satisfactory
answer to the above question is known as a result of certain research
carried on in recent years. The class of properties thus known to be preserved
under homomorphism is rather wide, but not easy to describe in
nonlogical terms. We intend only to indicate in an informal way some
special instances of such properties.
Suppose that we form an element b_1 of S by applying to given elements
x, y, z, . . . of S as well as the elements a_1, a_2, . . . various of the operations
F_1, F_2, . . . . Considering x, y, z, . . . as being variable, b_1 may be thought
of as the value t_1(x, y, z, . . .) of a function t_1. When the values t_1(x, y, z, . . .)
and t_2(x, y, z, . . .) are asserted to be equal for all x, y, z, . . . in S, we say
that the equation

(4:6-8) t_1(x, y, z, . . .) = t_2(x, y, z, . . .)

is true in S. For example, the following may be a true equation in S:
F_1(F_2(x), F_1(a_2, x)) = F_2(a_1), where F_1 is binary, F_2 unary. Now it
can be shown that every equation which is true in a system is also true in
every homomorphic image of that system. For example, to see that the above
equation is true in (S', F'_1, . . .), consider any x' ∈ S'. We wish to show
F'_1(F'_2(x'), F'_1(a'_2, x')) = F'_2(a'_1). Let G provide the homomorphism from
S to S'; thus a'_1 = G(a_1), a'_2 = G(a_2), and x' = G(x) for some x ∈ S.
Then

F'_1(F'_2(x'), F'_1(a'_2, x')) = F'_1(F'_2(G(x)), F'_1(G(a_2), G(x)))
= F'_1(G(F_2(x)), G(F_1(a_2, x)))
= G(F_1(F_2(x), F_1(a_2, x)))
= G(F_2(a_1))
= F'_2(G(a_1))
= F'_2(a'_1).

Here we have passed from each line to the next by using the properties
of G as a homomorphism, except when going from the third to the fourth
line, where we used the assumption that F_1(F_2(x), F_1(a_2, x)) = F_2(a_1)
is true in the original system.
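
A concrete instance of this computation can be checked mechanically. In the Python sketch below, all particulars are assumptions made for the sake of an example: the original system is taken to be {0, . . . , 11} with F_1 addition modulo 12 and F_2 negation modulo 12, the constants are a_1 = 7 and a_2 = 5 (so that a_2 = F_2(a_1) and the equation holds), and the homomorphism G sends x to its remainder modulo 4.

```python
def equation_holds(elements, F1, F2, a1, a2):
    """Check F1(F2(x), F1(a2, x)) == F2(a1) for every x in elements."""
    return all(F1(F2(x), F1(a2, x)) == F2(a1) for x in elements)


def make_system(m):
    """Assumed example system: {0, ..., m-1} with F1 = addition, F2 = negation mod m."""
    def F1(x, y):
        return (x + y) % m

    def F2(x):
        return (-x) % m

    return range(m), F1, F2


def G(x):
    return x % 4               # assumed homomorphism from the mod-12 system onto the mod-4 system


S, F1, F2 = make_system(12)
S_image, F1_image, F2_image = make_system(4)
a1, a2 = 7, 5                  # chosen so that a2 == F2(a1) in the mod-12 system

holds_in_S = equation_holds(S, F1, F2, a1, a2)
holds_in_image = equation_holds(S_image, F1_image, F2_image, G(a1), G(a2))
```

Both checks succeed, in agreement with the general statement; the code only verifies this one instance and is no substitute for the argument above.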
Incidentally, we can now expand on our remarks in Chapter 2 in connection
with properties shared by a system and its subsystems. It is easily
seen that every equation which is true in a system is also true in every one of
its subsystems. This answers part of the question: what algebraic properties
are preserved when we pass from a system to any of its subsystems?
The complete answer to this question is also known by recent results
in logic.
In contrast to the above, an inequality is not in general preserved by
homomorphism. For we may have particular elements of S, say a_1, a_2,
such that a_1 ≠ a_2, but G(a_1) = G(a_2), since G is not required to be
one-to-one. An inequality will be preserved when passing to subsystems, but
other, slightly more complicated, properties will fail to be preserved, for
example, “existential” properties (cf. below).
The question of finding a large class of properties of relations preserved
by homomorphism or by passing to subsystems is, from the logical point
of view, no more difficult than for those involving functions and constants,
but it would be even more difficult for us to explain in the present framework.
The reader can check that various properties of a system (S, W),
where W is binary, such as W being reflexive, symmetric, transitive in S,
are preserved under homomorphism or by passing to subsystems. Also
the property of being simply ordered is inherited by subsystems. We can
ask the same question with respect to homomorphism. However, there
are no nontrivial homomorphisms of ordered systems, so that this question
loses interest. For suppose that (S, <) is a simply ordered system and
that ≡ is a congruence relation in this system. Then for any x_1, x_2, y_1, y_2,
if x_1 ≡ y_1, x_2 ≡ y_2, and x_1 < x_2, then y_1 < y_2. If the corresponding
homomorphism G is not trivial, i.e., is not an isomorphism, then x_1 ≡ y_1
for some x_1, y_1 where x_1 ≠ y_1. But then x_1 < y_1 or y_1 < x_1. Suppose
the former. Apply our condition to x_1 ≡ y_1, y_1 ≡ y_1; then from x_1 < y_1
we would have to conclude that y_1 < y_1. We similarly reach a contradiction
if y_1 < x_1. Thus ≡ must be the identity relation in this case. As
we shall see in a moment, we can construct many nontrivial homomorphisms
of the system (I, +, ·, 0, 1). The above shows that the congruence
relations corresponding to these homomorphisms cannot also be congruence
relations with respect to <. (Because of such results as the above, the
notion of homomorphism for systems with relations only is sometimes
defined in a slightly different way than is done here. Under this other
definition, ordered systems can have nontrivial homomorphic images which
are again ordered.)

The above general discussion now leads us easily to the following
theorem.

4.55 Theorem. If G is any homomorphic mapping of a commutative ring
(D, +, ·, 0, 1), with unity, such that G(0) ≠ G(1), then the resulting
image is a commutative ring with unity.

Proof. All the conditions of 4.1 for a system to be a commutative ring
with unity, except (i) and (vi), are equations and hence are preserved
under homomorphism. 4.1(i) is guaranteed for the image system by the
hypothesis G(0) ≠ G(1). To see that 4.1(vi) holds let +' denote the
image of + in the new system, so that G(x + y) = G(x) +' G(y) for
any x, y ∈ D. Consider any x', y' ∈ ℛ(G); we wish to find u' ∈ ℛ(G)
with x' +' u' = y'. We have x' = G(x), y' = G(y) for some x, y ∈ D.
Then x + u = y for some u ∈ D, and

G(x + u) = G(x) +' G(u) = G(y),

i.e., x' +' G(u) = y'. Thus u' = G(u) is a suitable choice.

In contrast to 4.55, a subsystem of a commutative ring with unity need
not be again such a system; we leave it to the reader to provide examples
of such.
The condition 4.14 to be an integral domain is that $x \cdot y = 0$ implies
$x = 0$ or $y = 0$, for all $x, y \in D$. It is not clear that this property is
preserved under homomorphic images; in fact, we shall give many instances
below to show that it is not. (This is not to say that in other cases, a
given homomorphic image of an integral domain could not be an integral
domain.)
We wish now to give a classification of all congruence relations (and hence
of all homomorphic images) of the integers. We shall start with certain
particular relations which we show to be congruence relations.

Congruence modulo an integer.

4.56 Definition. Let $x, y, m \in I$. We put $x \equiv_m y$ if $m \mid (x - y)$. We also
write $x \equiv y \pmod{m}$ (read “x is congruent to y modulo m”) for $x \equiv_m y$.

4.57 Theorem.

(i) For any $m \in I$, $\equiv_m$ is a congruence relation for the system
(I, +, •, 0, 1).
(ii) $\equiv_m$ is the same as $\equiv_{-m}$.
(iii) For m = 0, $\equiv_m$ is simply the identity relation; for m = 1 it is
the universal relation $I \times I$.
(iv) If we denote by $[x]_m$ the equivalence set to which x belongs, then
for m > 0, $\equiv_m$ has exactly m equivalence sets $[0]_m, [1]_m, \ldots, [m-1]_m$.

We leave the proof of this to the reader. The sets $[0]_m, [1]_m, \ldots, [m-1]_m$
are also often called the congruence classes mod m. We have,
for example, for m = 3,

$$[0]_3 = \{\ldots, -3, 0, 3, 6, \ldots\},$$
$$[1]_3 = \{\ldots, -2, 1, 4, 7, \ldots\},$$
$$[2]_3 = \{\ldots, -1, 2, 5, 8, \ldots\}.$$
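The congruence classes mod 3 listed above can be reproduced mechanically. The following is only a small sketch: the function names `congruence_class` and `classes`, and the finite window `lo..hi` used to cut the infinite classes down to printable size, are our own assumptions and not part of the text.

```python
def congruence_class(x, m, lo=-6, hi=9):
    """Members of [x]_m lying in the window lo..hi (inclusive).

    y belongs to [x]_m exactly when m divides x - y (Definition 4.56).
    """
    return [y for y in range(lo, hi + 1) if (x - y) % m == 0]


def classes(m, lo=-6, hi=9):
    """The m classes [0]_m, ..., [m-1]_m, each restricted to the window."""
    return {r: congruence_class(r, m, lo, hi) for r in range(m)}


for r, members in classes(3).items():
    print(f"[{r}]_3 within -6..9:", members)
```

Every integer in the window appears in exactly one of the three lists, in line with 4.57(iv).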

4.58 Theorem. The relations $\equiv_m$, for $m \in I$, are the only congruence
relations for the system (I, +, •, 0, 1).

Proof. Let $\equiv$ be any congruence relation. The study of this relation
can be reduced to the study of the set

(1) $S = \{u : u \equiv 0\}$,

since

(2) $x \equiv y$ if and only if $x - y \in S$.

Now S has the following properties:

(3) $S \subseteq I$, $S \neq \emptyset$;

(4) if $u \in S$ and $z \in I$, then $uz \in S$;

(5) if $u, v \in S$, then $u + v \in S$.

In other words, S is a nonempty subset of I closed under linear
combinations. The properties (4), (5) follow directly from the assumption
that $\equiv$ is a congruence relation. By Theorem 4.43,

(6) either $S = \{0\}$ or $S = \{mz : z \in I\}$ for some m > 0.

In the first case, $x \equiv y$ if and only if $x - y = 0$, that is, $x = y$; hence
$\equiv$ is the same as $\equiv_0$ in this case. In the second case, $x \equiv y$ if and only
if $x - y$ is some multiple of m; thus $\equiv$ is the same as $\equiv_m$ here.

Such sets S as constructed in (1) from a congruence relation in a
commutative ring are called ideals in modern algebra. Conversely, any set S
satisfying (3)-(5) determines a relation $\equiv$ by condition (2). It can be
seen that this relation is necessarily a congruence relation. The use of
ideals provides a further step in simplifying the study of homomorphisms
of commutative rings. Moreover, ideals have proved to be of great use
in a variety of contexts in number theory and algebra.

4.57 and 4.58 give us a complete classification of the congruence
relations for the integers. For the corresponding homomorphic images, we
see that $\equiv_0$, being the identity relation, provides us only with a system
isomorphic to the integers; while $\equiv_1$, being the universal relation, provides
us with a system consisting of a single element ($[0]_1$).

4.59 Definition. Let $m \in P$. We denote by $I_m$ the collection of congruence
classes mod m, that is, $I_m = \{[0]_m, [1]_m, \ldots, [m-1]_m\}$. We denote
by $+_m$, $\cdot_m$ the operations on congruence classes mod m corresponding
to +, •. Thus:
(i) $[x]_m +_m [y]_m = [x + y]_m$,
(ii) $[x]_m \cdot_m [y]_m = [x \cdot y]_m$.
We also write $x + y \pmod{m}$ for $[x + y]_m$ and $x \cdot y \pmod{m}$ for
$[x \cdot y]_m$.

For m = 4 we have the four congruence classes, $[0]_4, [1]_4, [2]_4, [3]_4$,
which we also (as is common practice, but with slight danger of
ambiguity) denote by 0, 1, 2, 3. We then have the following tables for +
and • (mod 4).

    + | 0 1 2 3        • | 0 1 2 3
    --+--------        --+--------
    0 | 0 1 2 3        0 | 0 0 0 0
    1 | 1 2 3 0        1 | 0 1 2 3
    2 | 2 3 0 1        2 | 0 2 0 2
    3 | 3 0 1 2        3 | 0 3 2 1

Precisely written, we have, for example,

$$[2]_4 +_4 [3]_4 = [1]_4, \qquad [2]_4 \cdot_4 [3]_4 = [2]_4.$$

Note that in this last we have $[2]_4 \cdot [3]_4 = [2]_4 \cdot [1]_4$, but $[3]_4 \neq [1]_4$; i.e.,
the cancellation law for multiplication does not hold in this system, hence
it is not an integral domain.
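The tables for + and • (mod 4) can be generated rather than written out by hand. The sketch below represents each class $[x]_m$ by its least nonnegative member, as the text does; the helper names `add_table`, `mul_table` and `show` are hypothetical.

```python
def add_table(m):
    """Row x, column y holds the representative of [x]_m +_m [y]_m."""
    return [[(x + y) % m for y in range(m)] for x in range(m)]


def mul_table(m):
    """Row x, column y holds the representative of [x]_m ._m [y]_m."""
    return [[(x * y) % m for y in range(m)] for x in range(m)]


def show(table, symbol):
    """Print a table with a header row and a leading column of labels."""
    m = len(table)
    print(symbol, *range(m))
    for x, row in enumerate(table):
        print(x, *row)


show(add_table(4), "+")
show(mul_table(4), "*")
```

Row 2 of the multiplication table repeats the entry 2 in columns 1 and 3, which is the failure of cancellation noted above.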

4.60 Theorem. Let $m \in I$, m > 1. Then $(I_m, +_m, \cdot_m, [0]_m, [1]_m)$ is a
commutative ring with unity. It is an integral domain if and only if
m is a prime.

Proof. The first part is immediate from 4.55 and the fact that
$[0]_m \neq [1]_m$ (otherwise $1 \equiv 0 \pmod{m}$, that is, $m \mid 1$). Suppose that m is
prime and that $[x]_m \cdot_m [y]_m = [0]_m$, i.e., by 4.59, $[x \cdot y]_m = [0]_m$, so
$x \cdot y \equiv 0 \pmod{m}$. Then $m \mid (x \cdot y)$. By 4.47(ii), $m \mid x$ or $m \mid y$, hence
$x \equiv 0 \pmod{m}$ or $y \equiv 0 \pmod{m}$, so $[x]_m = [0]_m$ or $[y]_m = [0]_m$. Conversely,
suppose that the system is an integral domain. If m is not prime, we can
write $m = a \cdot b$ where 0 < a < m, 0 < b < m (4.41). Then
$[0]_m = [m]_m = [a]_m \cdot_m [b]_m$, although $[a]_m \neq [0]_m$, $[b]_m \neq [0]_m$. By this
contradiction we see that m must be prime.

Actually, the systems $I_m$, for m a prime, have an even more interesting
property, which we shall take up in the next chapter.
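Theorem 4.60 can be spot-checked by brute force for small m. This is an illustration, not a proof: the functions `zero_divisor_pairs` and `is_prime` are our own helpers, and the range of m examined is an arbitrary choice.

```python
def zero_divisor_pairs(m):
    """Pairs (a, b) of nonzero residues mod m whose product is 0 mod m."""
    return [(a, b) for a in range(1, m) for b in range(1, m) if (a * b) % m == 0]


def is_prime(m):
    """Trial division; adequate for the small values used here."""
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))


for m in range(2, 13):
    pairs = zero_divisor_pairs(m)
    verdict = "integral domain" if not pairs else f"not a domain, e.g. {pairs[0]}"
    print(m, verdict)
```

For composite m the first pair printed is a factorization $m = a \cdot b$ or a multiple of one, which is exactly the obstruction used in the converse half of the proof.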

Applications to a Diophantine problem. We wish to conclude this chapter
with a solution of a problem, in elementary number theory, which makes
essential use of the prime factorization theorem and, to a minor extent, of
the congruence notation. Such problems as the following were first
systematically studied by the Greek mathematician Diophantus.
The statement, for $x \in I$, that $x \equiv 0 \pmod 2$ or $x \equiv 1 \pmod 2$, is just
another way of saying that x is even or x is odd. In the first case we have
$x = 2k$ for some $k \in I$, in the second case $x = 2k + 1$. We have

$$(2k)^2 = 4k^2 \quad \text{and} \quad (2k + 1)^2 = 4k^2 + 4k + 1 = 4k(k + 1) + 1.$$

Since k or k + 1 is even, it follows that $k(k + 1)$ is always even. Hence
$(2k + 1)^2 \equiv 1 \pmod 8$. On the other hand, we have $(2k)^2 \equiv 0$ or
$4 \pmod 8$, according as k (and hence $k^2$) is even or odd. Thus, in any
case, $x^2 \equiv 0$, 1, or 4 (mod 8), no matter what x is. This allows us to
say something about the possible values of $x^2 + y^2$, for $x, y \in I$; we see
that, mod 8, this is never 3, 6, or 7.
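Since $x^2 \pmod 8$ depends only on $x \pmod 8$, the two claims just made can be confirmed by examining the eight residues. The variable names in this sketch are our own.

```python
# Squares of the eight residues mod 8.
squares_mod_8 = sorted({(x * x) % 8 for x in range(8)})

# All residues obtainable as a sum of two squares mod 8.
sums_mod_8 = sorted({(a + b) % 8 for a in squares_mod_8 for b in squares_mod_8})

# Residues that are never a sum of two squares mod 8.
missing = [r for r in range(8) if r not in sums_mod_8]

print("squares mod 8:        ", squares_mod_8)
print("sums of two squares:  ", sums_mod_8)
print("never attained mod 8: ", missing)
```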
It is a familiar fact that we can form a right triangle with legs of length
3 and 4, hypotenuse of length 5; the same holds for the numbers 5, 12,
and 13. Here we are using Pythagoras’ theorem that there is a right
triangle with legs of lengths x, y and hypotenuse z if and only if $x^2 + y^2 = z^2$.
Of course, it is not true that for any $x, y \in P$ we can find $z \in P$ satisfying
this equation. It is natural then to ask whether we can find a simple
systematic method for determining all $x, y, z \in P$ such that $x^2 + y^2 = z^2$.
One method is to enumerate all triples (x, y, z) of elements of P (it is
sufficient here to restrict ourselves to those with x < z, y < z), say
(1,1,2), (1,1,3), (1,2,3), (2,1,3), (1,1,4), (1,2,4), (1,3,4), ...,
testing each in turn to see whether we have the desired relation. This is
both tedious and not very enlightening. The following analysis is much
more satisfactory.
We shall call a triple (x, y, z) for which $x, y, z \in P$ and $x^2 + y^2 = z^2$,
a solution. Consider any such solution. We put $d = (x, y)$, $x = x_1 d$,
$y = y_1 d$. Then $(x_1, y_1) = 1$; for if we had a prime p with $p \mid x_1$, $p \mid y_1$, we
would have $pd \mid x$, $pd \mid y$, contradicting $d = (x, y)$. From $x^2 + y^2 = z^2$
it follows that $d^2 \mid z^2$ and hence that $d \mid z$ (cf. Exercise 9 of Exercise Group
4.5). Put $z = z_1 d$. Then

$$x_1^2 + y_1^2 = z_1^2.$$

Let us call a triple (x, y, z) for which $x, y, z \in P$, $x^2 + y^2 = z^2$, and
$(x, y) = 1$, a primitive solution. By the preceding, to every solution
(x, y, z) is associated a primitive solution $(x_1, y_1, z_1)$ and a d with
$(x, y, z) = (x_1 d, y_1 d, z_1 d)$; conversely, if $(x_1, y_1, z_1)$ is a primitive solution
then $(x_1 d, y_1 d, z_1 d)$ is a solution for any d.
We now assume that (x, y, z) is a primitive solution. Then it is seen that
also $(y, z) = 1$, again by considering the possibility of a prime p which
divides both y, z; similarly $(x, z) = 1$. Since $(x, y) = 1$, it cannot be that
both x and y are even. We also claim that they cannot both be odd. For
otherwise we would have $x^2 \equiv 1 \pmod 8$ and $y^2 \equiv 1 \pmod 8$, hence
$z^2 \equiv 2 \pmod 8$; however, we have seen that this holds for no z. By
symmetry, in finding all solutions, it suffices now to restrict ourselves to the
case that x is even and y is odd. Then from $(x, z) = 1$ it also follows
that z is odd.
Let us now write $x^2 = z^2 - y^2 = (z + y)(z - y)$. We can write
$x = 2t$, $t \in P$. Since both z and y are odd, we have $z + y$ and $z - y$ even.
Hence we can find $r, s \in P$ with $z + y = 2r$, $z - y = 2s$; from this it
follows that $z = r + s$, $y = r - s$, and hence also r > s. We see,
furthermore, that $(r, s) = 1$, for (r, s) divides both $r + s$ and $r - s$, and
hence both y, z. Finally, from $4t^2 = (2r)(2s)$, we conclude that $t^2 = rs$.
Suppose that p is prime and $p \mid r$, so $(p, s) = 1$, hence $p \mid t$ and then
$p^2 \mid rs$. But $(p^2, s) = 1$ also, so that $p^2 \mid r$. Writing $r = p^2 r_1$, $t = p t_1$, we
have $t_1^2 = r_1 s$. It is easy to see now how we can give an inductive proof
(by course-of-values induction on t) that if $t^2 = rs$ and $(r, s) = 1$ then
for some $u, v \in P$, $r = u^2$ and $s = v^2$. For the induction hypothesis will
give us $r_1 = u_1^2$, $s = v^2$ for some $u_1, v$, and then $r = (p u_1)^2$. Furthermore,
we must have $(u, v) = 1$. Finally, from $t^2 = rs$ we obtain $t = uv$.
Note that we cannot have $r \equiv s \pmod 2$, for otherwise $y \equiv 0 \pmod 2$,
i.e., y would be even. This is another way of saying that r and s cannot
both be even or both be odd, or as is often said, that they have opposite
parity. Furthermore, $u^2$ is even or odd according as u is, i.e., $u^2 \equiv u
\pmod 2$. Hence, we also cannot have $u \equiv v \pmod 2$.
We summarize this as follows:

(4:6-9) if (x, y, z) is a primitive solution with x even, y odd, then for some
$u, v \in P$, $x = 2uv$, $y = u^2 - v^2$, $z = u^2 + v^2$, $(u, v) = 1$,
u > v, and u, v have opposite parity.

Conversely, it is seen from

$$(2uv)^2 + (u^2 - v^2)^2 = (u^2 + v^2)^2,$$

that any u, v with u > v thus lead to a solution. (4:6-9), together with
the earlier remark reducing arbitrary to primitive solutions, gives a simple
complete description of all solutions.
Some instances of (u, v) satisfying the conditions of (4:6-9) are (2, 1),
(4, 1), (3, 2), (5, 2). These lead to the solutions (4, 3, 5), (8, 15, 17),
(12, 5, 13), (20, 21, 29), respectively.
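The parametrization in (4:6-9) turns directly into an enumeration procedure, in contrast to the blind search over all triples described earlier. The following sketch assumes a bound `max_u` on u and uses the hypothetical name `primitive_triples`; neither appears in the text.

```python
from math import gcd


def primitive_triples(max_u):
    """Yield (x, y, z) = (2uv, u^2 - v^2, u^2 + v^2) for each pair (u, v)
    with max_u >= u > v >= 1, (u, v) = 1, and u, v of opposite parity."""
    for u in range(2, max_u + 1):
        for v in range(1, u):
            if gcd(u, v) == 1 and (u - v) % 2 == 1:
                yield (2 * u * v, u * u - v * v, u * u + v * v)


for x, y, z in primitive_triples(5):
    # Each triple is a solution, with x even and y odd as in (4:6-9).
    assert x * x + y * y == z * z
    print((x, y, z))
```

Multiplying any triple produced this way by a positive integer d, and possibly interchanging x and y, then gives the remaining solutions.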

Exercise Group 4.6

1. Prove Theorem 4.57.
2. Do there exist $x, y, z \in I$ with $x^2 + y^2 + z^2 = 807$?
3. Construct addition and multiplication tables for 12, I3, I5, U-
4. Show that for any $x \in I$ and any prime p, $x^p \equiv x \pmod p$. [Hint: Use
induction.] What can you say about $x^{p-1} \pmod p$, $x^{p-2} \pmod p$ (for
p > ?)? (Relate this to Exercise 3.)
5. Let $a \in P$. Show that $a \equiv b \pmod 9$ when b is the sum of digits in the
representation of a to the base 10. (Cf. Exercise 12, Exercise Group 4.5.)
6. Give an example of a system and a subsystem where the first is a
commutative ring with unity, while the second is not.
7. Let (D, +, •, 0, 1) be any integral domain.
(a) Show that if $n \in P$ and for some $x \in D$, where $x \neq 0$, we have
nx = 0; then for all $y \in D$, ny = 0. [Hint: Consider $n(x \cdot y)$.] A
domain such that for all $n \in P$ and all $x \in D$ with $x \neq 0$ we have
$nx \neq 0$ is said to be of characteristic $\infty$ (sometimes also called of
characteristic 0). If D is not of characteristic $\infty$, there is a least
positive integer n such that for some $x \in D$, $x \neq 0$ and nx = 0.
In this case D is said to be of characteristic n.
(b) Show that if D is not of characteristic $\infty$ then its characteristic is a
prime number. This notion of characteristic can be used to generalize
4.23, by omitting the requirement of order. If we take $\bar{I} = \{n1 : n \in I\}$
as in 4.23, we see that $0, 1 \in \bar{I}$ and $\bar{I}$ is closed under +, •. Then we
can show the following:
(c) If D is of characteristic p (prime), then the subsystem $\bar{I}$ (with the
operations of D restricted to $\bar{I}$) is isomorphic to the domain $I_p$ of
integers mod p.
8. Show that if $x, y, z \in P$ and $x^4 + y^4 = z^2$ then there exist $x_1, y_1, z_1 \in P$
with $z_1 < z$ and $x_1^4 + y_1^4 = z_1^2$. Use this to show that there are no
$x, y, z \in P$ satisfying $x^4 + y^4 = z^2$.
CHAPTER 5

POLYNOMIALS

5.1 Polynomial functions and polynomial forms. We interrupt our
study of the various number systems to take up the general algebraic notion
of a polynomial. This will be seen to play a very important role in the
further developments. The impressive success achieved by algebra over
arithmetic in the evolution of mathematical thinking can be summarized
as due to two things. First, algebra provides a simple, compact, and
abstract language for formulating various mathematical problems. In
this way, the mathematical heart of such problems is exposed and all
superfluous details can be ignored. Typically, the problem takes the form
of an equation, or several equations, in one or more variables, x,y, ... ,
for which we seek solutions. Second, algebra provides us with systematic
techniques for solving these equations. Of course, the nature of
these solutions varies according to whether we are dealing with integers,
rational numbers, real numbers, etc.
In this chapter we concentrate on the first of these aspects of algebra,
by making precise what is meant by an algebraic equation. For the
purpose of achieving suitable generality, we make no special assumptions
on the underlying system other than that it is an integral domain. Thus,
we assume throughout this chapter that (D, +, •, 0, 1) is an arbitrary integral
domain. It turns out that a considerable amount can be learned about
the structure of algebraic expressions, in particular about polynomials,
even under these general assumptions. We will then see in succeeding
chapters what more can be said about polynomials and polynomial
equations in each special case. In fact, as we shall see, the development of new
number systems beyond given ones is closely intertwined with these
special problems. We naturally wish to extend any number system which
is not fully adequate to provide for the solutions of equations that we
think (for some reason or other) ought to be solvable.
The simplest kind of algebraic expression for a given system D would
be one built up by means of the symbols +, •, symbols for particular
elements of D, and a symbol x for a single “unknown.” These can be
transformed by formal manipulations, which make use of the commutative,
associative, and distributive laws, into a certain standard form, e.g.,
$2 + (-5) \cdot x + 3 \cdot x^2$. Now, questions as to what are symbols and
expressions involve us in questions about language which, if we tried to
make them precise, would take us far afield into the modern logical
descriptions of language. An alternative possibility, which is readily
provided within our present framework, is suggested in the following.

5.1 Definition. By a polynomial function (of one variable) of degree
n ($n \geq 0$) with coefficients in D, we mean a function f such that:

(i) $\mathcal{D}(f) = D$;
(ii) for a certain sequence $\langle a_0, a_1, \ldots, a_n \rangle$ of elements of D, where
$a_n \neq 0$ if n > 0, and all $x \in D$,

$$f(x) = \sum_{i=0}^{n} a_i x^i.$$

Here $\langle a_0, a_1, \ldots, a_n \rangle$ is called the sequence of coefficients of f.

The unfortunate aspect of this definition, which is not shared by a
linguistic expression, is that a polynomial function does not uniquely
determine its sequence of coefficients. For example, in $I_3$, consider the
two polynomial functions f and g, defined by $f(x) = x^3 - x$, $g(x) = [0]$
for all $x \in I_3$. Then $f(x) = g(x)$ for all $x \in I_3$, as is easily checked
directly or from the identity $x^3 - x = x(x - [1])(x - [2])$. Thus the
two sequences of coefficients, $\langle [0] \rangle$ and $\langle [0], [-1], [0], [1] \rangle$, give rise to the
same polynomial function. In fact, it is easily seen that in any finite
domain there are distinct sequences of coefficients which give rise to the
same polynomial function. It turns out to be preferable, then, to consider
objects associated with any domain D which have formally the same
properties as polynomial functions, but which do not share this
disadvantage. What we have in mind is to think of a symbol-like object $\xi$
and objects $\sum_{i=0}^{n} a_i \xi^i$, which we shall call polynomial forms (in $\xi$). These
should have the property:

(5:1-1) if $\sum_{i=0}^{n} a_i \xi^i = \sum_{i=0}^{m} b_i \xi^i$, where either n = 0 or n > 0 and
$a_n \neq 0$, and m = 0 or m > 0 and $b_m \neq 0$, then n = m and
$a_i = b_i$ for $0 \leq i \leq n$.
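The example of f and g in $I_3$ given above is easy to check by direct evaluation. In this sketch a coefficient sequence is a Python list with the constant term first, residues stand for congruence classes, and `evaluate` is a hypothetical helper using Horner's rule.

```python
def evaluate(coeffs, x, m):
    """Value of sum(coeffs[i] * x**i) in the integers mod m (Horner's rule)."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % m
    return value


f = [0, -1, 0, 1]  # coefficients of x^3 - x
g = [0]            # coefficients of the zero polynomial

values_f = [evaluate(f, x, 3) for x in range(3)]
values_g = [evaluate(g, x, 3) for x in range(3)]
print(values_f, values_g)
```

The two coefficient lists differ, yet the two lists of values coincide, which is precisely the defect of polynomial functions that polynomial forms are meant to avoid.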

In particular we should have

(5:1-2) if $\sum_{i=0}^{n} a_i \xi^i = 0$ then $a_i = 0$ for $0 \leq i \leq n$.

For if $a_i \neq 0$ for some i, consider the largest such $i \leq n$; call it k. Thus
$a_i = 0$ for $k < i \leq n$. If k = 0, presuming (as is natural) $\xi^0 = 1$, we
have $a_0 \cdot 1 = 0$ also. If k > 0, we apply (5:1-1) with k instead of n and
m = 0, $b_0 = 0$, giving a contradiction. Conversely, it can be seen that
(5:1-2) implies (5:1-1). For example, if n > m, we define $b_i = 0$,
for $m < i \leq n$ (if necessary). Then $\sum_{i=0}^{n} a_i \xi^i = \sum_{i=0}^{n} b_i \xi^i$, hence
$\sum_{i=0}^{n} (a_i - b_i) \xi^i = 0$. Application of (5:1-2) then gives the desired
result. Of course, all these manipulations with polynomial forms implicitly
involve assumptions about operations +, •, -, which can be defined on
them, extending the operations on D (forms of degree zero). In other
words, what we should expect is that we are dealing with an integral
domain. More explicitly, consider the following:

5.2 Definition. Suppose that (E, +, •, 0, 1) is an integral domain and
$\xi \in E$. Then E is said to be a simple extension of D by $\xi$, in symbols
$E = D[\xi]$, if the following conditions are satisfied:

(i) D forms a subdomain of E;
(ii) for each $\eta \in E$ there are elements $a_0, \ldots, a_n \in D$ with
$\eta = \sum_{i=0}^{n} a_i \xi^i$.

E is said to be a simple transcendental extension by $\xi$ if, in addition,

(iii) whenever $a_0, \ldots, a_n \in D$ and $\sum_{i=0}^{n} a_i \xi^i = 0$, then $a_i = 0$ for
$0 \leq i \leq n$.

(The reason for using the term “transcendental” will be explained in
Chapter 7.)

Existence and uniqueness of simple transcendental extensions. Our goal,
for the proper use of polynomial forms, will be realized if we prove that
any domain D has at least one simple transcendental extension, and that
such is unique up to isomorphism. We first consider, however, some
properties that apply to any simple extension. To make certain
manipulations formally easier, we use the following:

5.3 Definition. Suppose that $E = D[\xi]$ is a simple extension of D.
Suppose that $\langle a_0, \ldots, a_i, \ldots \rangle$ is an infinite sequence of elements of D
for which there is an $n \geq 0$ with $a_i = 0$ for all i > n. We shall call
such a sequence essentially finite. With any such sequence we associate
the element $\sum_{i=0}^{\infty} a_i \xi^i$, defined to be $\sum_{i=0}^{n} a_i \xi^i$.

It is easily seen that this definition of $\sum_{i=0}^{\infty} a_i \xi^i$ is independent of the
choice of n, so long as $a_i = 0$ for all i > n.

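5.4 Lemma. Suppose that $E = D[\xi]$ is a simple extension of D. Suppose
that $\langle a_0, \ldots, a_i, \ldots \rangle$ and $\langle b_0, \ldots, b_i, \ldots \rangle$ are two essentially finite
sequences of elements of D. Then there are essentially finite sequences
$\langle c_0, \ldots, c_i, \ldots \rangle$, $\langle d_0, \ldots, d_i, \ldots \rangle$ of elements of D for which:

(i) $\sum_{i=0}^{\infty} a_i \xi^i + \sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{\infty} c_i \xi^i$, given by $c_i = a_i + b_i$ for all i, and

(ii) $\left( \sum_{i=0}^{\infty} a_i \xi^i \right) \cdot \left( \sum_{i=0}^{\infty} b_i \xi^i \right) = \sum_{i=0}^{\infty} d_i \xi^i$, given by $d_i = \sum_{j=0}^{i} a_j b_{i-j}$.

Proof.

(i) For some k, l we have $a_i = 0$ for i > k, $b_i = 0$ for i > l. Let n be
the maximum of k and l. Then

$$\sum_{i=0}^{\infty} a_i \xi^i = \sum_{i=0}^{n} a_i \xi^i, \qquad \sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{n} b_i \xi^i,$$

and

$$\sum_{i=0}^{n} a_i \xi^i + \sum_{i=0}^{n} b_i \xi^i = \sum_{i=0}^{n} (a_i \xi^i + b_i \xi^i),$$

by 3.45(i),

$$= \sum_{i=0}^{n} (a_i + b_i) \xi^i = \sum_{i=0}^{n} c_i \xi^i = \sum_{i=0}^{\infty} c_i \xi^i,$$

if we define $c_i$ as above. The proof of (ii) is left as an exercise to the student.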
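The two rules of Lemma 5.4 are the familiar recipes for adding and multiplying coefficient lists. As a minimal sketch, assuming integer coefficients and representing an essentially finite sequence by the Python list of its terms up to the last nonzero one (so the zero sequence is the empty list), with hypothetical helper names `trim`, `add` and `mul`:

```python
def trim(seq):
    """Drop trailing zeros so each essentially finite sequence has one list form."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq


def add(a, b):
    """Rule 5.4(i): c_i = a_i + b_i."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))


def mul(a, b):
    """Rule 5.4(ii): d_i is the sum of a_j * b_k over all j + k = i."""
    if not a or not b:
        return []
    d = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            d[j + k] += aj * bk
    return trim(d)


print(add([1, 1], [1, -1]))  # (1 + xi) + (1 - xi)
print(mul([1, 1], [1, -1]))  # (1 + xi) * (1 - xi)
```

The double loop in `mul` accumulates the terms with j + k = i, which is the same sum as $\sum_{j=0}^{i} a_j b_{i-j}$ written in a symmetric form.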


Sometimes it is also convenient to think of the formula for $d_i$ as being
given by $d_i = \sum_{j+k=i} a_j b_k$. If the reader tries to write down corresponding
laws for the sum and product of $\sum_{i=0}^{n} a_i \xi^i$, $\sum_{i=0}^{m} b_i \xi^i$ he will see the
economy of the above formulation. However, for purposes of specific
computations it is, of course, preferable to use these more usual forms.
For example, we have

(5:1-3) $\left( \sum_{i=0}^{n} a_i \xi^i \right) \cdot \left( \sum_{i=0}^{m} b_i \xi^i \right) = \sum_{i=0}^{n+m} d_i \xi^i$,

where, in particular,

(5:1-4) $d_{n+m} = a_n b_m$.

In fact, this follows from 5.4(ii). We extend the given finite sequences
$\langle a_0, \ldots, a_n \rangle$, $\langle b_0, \ldots, b_m \rangle$ to essentially finite sequences by making
$a_i = 0$ for i > n, $b_i = 0$ for i > m. Then we see that if i > n + m
then $d_i = \sum_{j=0}^{i} a_j b_{i-j} = 0$, for if $0 \leq j \leq i$, either n < j or m < i - j,
so that in any case $a_j b_{i-j} = 0$. On the other hand,

$$d_{n+m} = \sum_{j=0}^{n+m} a_j b_{(n+m)-j} = a_n b_m,$$

since if $0 \leq j < n$ then m < (n + m) - j, so that $b_{(n+m)-j} = 0$, while
if $n < j \leq n + m$, we have $a_j = 0$.

5.5 Lemma. Suppose that $E = D[\xi]$ is a simple transcendental extension
of D. Suppose that $\langle a_0, \ldots, a_i, \ldots \rangle$, $\langle b_0, \ldots, b_i, \ldots \rangle$ are two
essentially finite sequences of elements of D. Then

$$\sum_{i=0}^{\infty} a_i \xi^i = \sum_{i=0}^{\infty} b_i \xi^i \quad \text{if and only if} \quad a_i = b_i \text{ for all } i.$$

Proof. If the two sums are equal, then $\sum_{i=0}^{\infty} a_i \xi^i + (-1) \sum_{i=0}^{\infty} b_i \xi^i = 0$.
Applying 5.4(ii) to this special case shows that
$(-1) \sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{\infty} (-b_i) \xi^i$. Then by 5.4(i)
$\sum_{i=0}^{\infty} (a_i - b_i) \xi^i = 0$; furthermore, there
is an n with $a_i - b_i = 0$ and hence $a_i = b_i$ for all i > n. It follows
that $\sum_{i=0}^{n} (a_i - b_i) \xi^i = \sum_{i=0}^{\infty} (a_i - b_i) \xi^i = 0$. Then by our
definition 5.2(iii) of transcendental extension, also $a_i - b_i = 0$ for all $i \leq n$.
Hence $a_i = b_i$ for all i.

5.6 Theorem. Suppose that $E = D[\xi]$, $E' = D[\xi']$ are two simple
transcendental extensions of D. Then $E \cong E'$. We can choose the isomorphic
mapping F so that $F(a) = a$ for each $a \in D$ and $F(\xi) = \xi'$.

Proof. The domains (E, +, •, 0, 1) and $(E', \oplus, \circ, 0, 1)$ both contain D
and have operations agreeing with those on D when applied to elements
of D. However, they need not agree in any respect otherwise. We shall
write $\sideset{}{'}\sum$ for sums in E'. For each $\eta \in E$ we can find an essentially finite
sequence $\langle a_0, \ldots, a_i, \ldots \rangle$ of elements of D with

(1) $\eta = \sum_{i=0}^{\infty} a_i \xi^i$.

By 5.5, the sequence of $a_i$ is uniquely determined by $\eta$. Define F by

(2) $F(\eta) = \sideset{}{'}\sum_{i=0}^{\infty} a_i (\xi')^i$.

Then it is clear that

(3) $\mathcal{D}(F) = E$, $\mathcal{R}(F) = E'$, and F is one-to-one,

the last again by 5.5. Suppose that $\eta, \zeta \in E$, $\eta$ given as in (1), and
$\zeta = \sum_{i=0}^{\infty} b_i \xi^i$. Then

(4) $F(\eta + \zeta) = F(\eta) \oplus F(\zeta)$, $F(\eta \cdot \zeta) = F(\eta) \circ F(\zeta)$,

because the same rules 5.4(i), (ii) for calculating +, • on infinite sums hold
in E as hold for $\oplus$, $\circ$ in E'. For example, in the first case we have

$$F(\eta + \zeta) = F\left( \sum_{i=0}^{\infty} c_i \xi^i \right),$$

where $c_i = a_i + b_i$, thus

$$F(\eta + \zeta) = \sideset{}{'}\sum_{i=0}^{\infty} c_i (\xi')^i, \quad \text{where } c_i = a_i \oplus b_i,$$

since $a_i, b_i \in D$, and finally

$$F(\eta + \zeta) = \sideset{}{'}\sum_{i=0}^{\infty} a_i (\xi')^i \oplus \sideset{}{'}\sum_{i=0}^{\infty} b_i (\xi')^i.$$

If $a \in D$ we have $a = \sum_{i=0}^{\infty} a_i \xi^i$ where $a_0 = a$, $a_i = 0$ for i > 0, hence

(5) $F(a) = a$,

in particular

(6) $F(0) = 0$, $F(1) = 1$.

Finally, we have $\xi = \sum_{i=0}^{\infty} a_i \xi^i$ where $a_1 = 1$, $a_i = 0$ for $i \neq 1$, so

(7) $F(\xi) = \xi'$.

Thus our theorem is proved. It can be seen that the conditions that F
be an isomorphism satisfying (5) and (7) in fact uniquely determine
$F(\eta)$ for all $\eta \in E$ to be given as in (2).
Having the unicity of simple transcendental extensions up to $\cong$, the
only thing we need prove now, in order to make these play the same role
as symbolic polynomial expressions, is an existence theorem. The answer
to the question as to what should serve as the elements of such an
extension is suggested immediately by 5.5: ordinary infinite sequences
$\langle a_0, \ldots, a_i, \ldots \rangle$, $\langle b_0, \ldots, b_i, \ldots \rangle$ are objects such that if
$\langle a_0, \ldots, a_i, \ldots \rangle = \langle b_0, \ldots, b_i, \ldots \rangle$ then $a_i = b_i$ for all i. A definition of sum and product
of two such sequences is simply obtained by imitating 5.4(i), (ii). Finally,
we can identify each $a \in D$ with the sequence $\langle a, 0, 0, \ldots, 0, \ldots \rangle$ and $\xi$
with the sequence $\langle 0, 1, 0, \ldots, 0, \ldots \rangle$.

5.7 Theorem. For each integral domain D there exists a simple
transcendental extension $E = D[\xi]$.

Proof. We first construct directly a domain $(E, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$ which is a
simple transcendental extension by a certain element $\bar{\xi}$ of a domain $\bar{D}$
isomorphic to D. We define

(1) E = the set of all essentially finite sequences $\langle a_0, \ldots, a_i, \ldots \rangle$ of
elements of D.

For $\langle a_0, \ldots, a_i, \ldots \rangle$, $\langle b_0, \ldots, b_i, \ldots \rangle \in E$ we put

(2) $\langle a_0, \ldots, a_i, \ldots \rangle \mathbin{\bar{+}} \langle b_0, \ldots, b_i, \ldots \rangle = \langle c_0, \ldots, c_i, \ldots \rangle$ where
$c_i = a_i + b_i$ for all i,

(3) $\langle a_0, \ldots, a_i, \ldots \rangle \mathbin{\bar{\cdot}} \langle b_0, \ldots, b_i, \ldots \rangle = \langle d_0, \ldots, d_i, \ldots \rangle$ where
$d_i = \sum_{j=0}^{i} a_j b_{i-j}$ for each i.

That E is closed under $\bar{+}$ is obvious, and that it is closed under $\bar{\cdot}$ is easily
verified by the same argument as given for (5:1-3) following 5.4. We
define a function F which will map D isomorphically into E by

(4) $F(a) = \langle a, 0, 0, \ldots, 0, \ldots \rangle$ for each $a \in D$; also set $\bar{a} = F(a)$,
$\bar{D} = \mathcal{R}(F)$.

In particular, this defines $\bar{0}$ and $\bar{1}$. Finally we set

(5) $\bar{\xi} = \langle 0, 1, 0, \ldots, 0, \ldots \rangle$.

We wish to show first that, under these definitions,

(6) $(E, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$ is an integral domain.

The basic laws 4.1 (i), (ii), (iv), (vi) and the first half of (iii), for a
commutative ring with unity, are readily verified. Consider the remaining
conditions, the associative law for $\bar{\cdot}$ and the (left) distributive law for $\bar{\cdot}$
over $\bar{+}$. Given $\langle a_0, \ldots, a_i, \ldots \rangle$, $\langle b_0, \ldots, b_i, \ldots \rangle$, $\langle c_0, \ldots, c_i, \ldots \rangle \in E$,
their product, associating to the right, is a sequence whose kth term is

$$\sum_{i=0}^{k} a_i \left( \sum_{j=0}^{k-i} b_j c_{k-i-j} \right),$$

while associating to the left, it is a sequence whose kth term is

$$\sum_{s=0}^{k} \left( \sum_{t=0}^{s} a_t b_{s-t} \right) c_{k-s}.$$

Consider any k. The terms of the first sum giving the kth term are thus
$a_i b_j c_{k-i-j}$ where (i, j) is a pair for which $i \leq k$, $j \leq k - i$. The terms
of the second are $a_t b_{s-t} c_{k-s}$ where (s, t) is a pair for which $s \leq k$, $t \leq s$.
There is a one-to-one correspondence between these pairs (i, j) and (s, t)
under which $a_i b_j c_{k-i-j} = a_t b_{s-t} c_{k-s}$; namely, set t = i and s = i + j
when given (i, j), or equivalently, i = t and j = s - t when given (s, t).
Hence the two sequences have identical kth terms for each k. Less
precisely, as following 5.4, we can think of the first sum as

$$\sum_{i_1 + (i_2 + i_3) = k} a_{i_1} b_{i_2} c_{i_3},$$

while of the second as

$$\sum_{(i_1 + i_2) + i_3 = k} a_{i_1} b_{i_2} c_{i_3}.$$

We leave the proof of the distributive law to the reader. To complete the
proof that we have an integral domain it is sufficient to show, by 4.15,
that if $\langle a_0, \ldots, a_i, \ldots \rangle \neq \bar{0}$ and $\langle b_0, \ldots, b_i, \ldots \rangle \neq \bar{0}$ then their product
$\langle d_0, \ldots, d_i, \ldots \rangle \neq \bar{0}$. Let n be the largest i such that $a_i \neq 0$ and m
the largest i such that $b_i \neq 0$. Then just as we argued for (5:1-4)
following 5.4, we have $d_{n+m} = a_n b_m$; hence $d_{n+m} \neq 0$, and the sequence of
$d_i$'s is not $\bar{0}$.
To see that we have $E = \bar{D}[\bar{\xi}]$, we next compute what powers of $\bar{\xi}$
and formal polynomials in $\bar{\xi}$ look like. As in any domain, $(\bar{\xi})^0 = \bar{1}$.

(7) $(\bar{\xi})^j = \langle e_0, \ldots, e_i, \ldots \rangle$ where $e_j = 1$ and $e_i = 0$ for $i \neq j$.

This is easily proved by induction on j, using (3) and (5). Furthermore,
we have

(8) if $a \in D$, $\bar{a} \mathbin{\bar{\cdot}} \langle b_0, \ldots, b_i, \ldots \rangle = \langle a b_0, \ldots, a b_i, \ldots \rangle$,

from (3) and (4). Finally, it is seen that

(9) if $\langle a_0, \ldots, a_i, \ldots \rangle \in E$ and $a_i = 0$ for i > n then

$$\sum_{i=0}^{n} \bar{a}_i (\bar{\xi})^i = \langle a_0, \ldots, a_i, \ldots \rangle,$$

where we now use (7), (8), and (2) (say, by induction on n). It follows
immediately from (9) that

(10) if $a_0, \ldots, a_n \in D$ and

$$\sum_{i=0}^{n} \bar{a}_i (\bar{\xi})^i = \bar{0}$$

then $a_i = 0$ for each $i \leq n$.

For we extend $a_0, \ldots, a_n$ to an infinite sequence $\langle a_0, \ldots, a_i, \ldots \rangle$
satisfying the hypothesis of (9). The only way in which we can have
$\langle a_0, \ldots, a_i, \ldots \rangle = \langle 0, \ldots, 0, \ldots \rangle$ is that $a_i = 0$ for all i. Thus we see that

(11) E is a simple transcendental extension of $\bar{D}$.

Finally, we claim that

(12) F establishes an isomorphism between
(D, +, •, 0, 1) and $(\bar{D}, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$.

That F is one-to-one is obvious. The remainder of the proof of (12) simply
rests on showing

(13) if $a, b \in D$ then $\overline{a + b} = \bar{a} \mathbin{\bar{+}} \bar{b}$, $\overline{a \cdot b} = \bar{a} \mathbin{\bar{\cdot}} \bar{b}$,

which is immediate from (2) and (8).

To conclude the proof of the theorem, i.e., to find a simple transcendental
extension $E = D[\xi]$ of D itself, we need only apply the general result of
(2:4-9), much as we did at the end of the proof of 4.22.
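The construction in the proof of 5.7 can be tried out concretely. The sketch below takes D to be the integers (our choice), stores an essentially finite sequence as a tuple without trailing zeros, and uses the hypothetical names `conv` for the product (3), `embed` for the map F of (4), and `XI` for the sequence of (5). It then spot-checks associativity, the relation $d_{n+m} = a_n b_m$, and statements (7) and (8) on random inputs; this illustrates the argument and does not replace it.

```python
import random


def conv(a, b):
    """Product (3): d_i = sum over j + k = i of a_j * b_k, trailing zeros dropped."""
    if not a or not b:
        return ()
    d = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            d[j + k] += aj * bk
    while d and d[-1] == 0:
        d.pop()
    return tuple(d)


def embed(a):
    """F(a) = <a, 0, 0, ...>, stored without trailing zeros."""
    return (a,) if a != 0 else ()


XI = (0, 1)  # the sequence <0, 1, 0, ...> of (5)


def power(s, j):
    """j-fold product of s with itself, starting from the unit <1, 0, ...>."""
    result = embed(1)
    for _ in range(j):
        result = conv(result, s)
    return result


def random_seq(rng, max_len=5):
    """A random integer sequence with trailing zeros removed (may be empty)."""
    s = [rng.randint(-3, 3) for _ in range(rng.randint(1, max_len))]
    while s and s[-1] == 0:
        s.pop()
    return tuple(s)


rng = random.Random(0)
for _ in range(200):
    a, b, c = (random_seq(rng) for _ in range(3))
    assert conv(a, conv(b, c)) == conv(conv(a, b), c)  # associative law
    if a and b:
        assert conv(a, b)[-1] == a[-1] * b[-1]         # d_{n+m} = a_n * b_m

print(power(XI, 3))               # statement (7) with j = 3
print(conv(embed(5), (1, 2, 3)))  # statement (8) with a = 5
```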

Theorems 5.6 and 5.7 thus provide us with existence and uniqueness,
up to isomorphism, of a simple transcendental extension D[£] of any
domain D. For the purposes of algebra, it makes no difference how any
particular such extension is chosen.

5.8 Convention. We assume throughout the remainder of this book that


(D[£], +, •, 0, 1) is some simple transcendental extension of (D, +, •,
0, 1), given that the latter is an integral domain. The elements of D[£]
will be called polynomial forms, or simply polynomials over D.

We shall very often have to compare the behavior of the same polynomial
in several integral domains. If D and E form integral domains with D
a subdomain of E we would expect that the polynomials over D can be
regarded as polynomials over E. Indeed if we form a simple transcendental
extension $E[\xi']$, it is clear that the set of all polynomial forms $\sum_{i=0}^{n} a_i (\xi')^i$
in $E[\xi']$ such that $a_0, \ldots, a_n \in D$ is itself a simple transcendental
extension $D[\xi']$ of D, and hence $\cong$ to the extension chosen by the convention
5.8. For simplicity, since there are only a specific number of integral
domains that we shall have to compare in this way, we can assume that
the $\xi$ used in all cases is the same, so that we have not only $D[\xi'] \cong D[\xi]$,
but in fact $\xi' = \xi$, hence $D[\xi'] = D[\xi]$.
We now return to the relationship of polynomial forms to polynomial
functions.

5.9 Definition. Suppose that (E, +, •, 0, 1) is an integral domain which
contains D as a subdomain and suppose that $\sum_{i=0}^{n} a_i \xi^i$ is any element
of $D[\xi]$.

(i) Associated with any such polynomial is the polynomial function
f with $\mathcal{D}(f) = E$, determined by $f(x) = \sum_{i=0}^{n} a_i x^i$ for all $x \in E$.
(ii) In particular, if $D[\xi] \subseteq E$, the value of the associated function f
is defined at $\xi$ and is just the original polynomial, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$.
(iii) We say that x is a root or zero of $f(\xi)$ in D if $x \in D$ and $f(x) = 0$.

Given E, the association of functions on E with polynomials is well
determined because of the uniqueness of representation 5.5 of polynomial
forms. To be perfectly unambiguous, we should use a different
designation $f_E$ for the associated function, depending on the choice of E.
However, whenever we have two different extensions $E_1$, $E_2$ of D, with $E_1 \subseteq E_2$,
we will have $f_{E_1}(x) = f_{E_2}(x)$ for all $x \in E_1$, so long as $E_1$ is a subdomain
of $E_2$. Since we shall never face any other kind of situation here, the
ambiguity in using such symbols as f, g, etc., without specifying the domains
of these functions, is harmless; moreover, the domains can easily be
specified when necessary. With this in mind, we can freely use the notations
$f(\xi)$, $g(\xi)$, etc. for polynomial forms as in (ii) above. Note that $f(\xi) \neq 0$,
where $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$, simply means that some $a_i \neq 0$. If $f(\xi) = 0$
then $f(x) = 0$ (no matter what E it is that x belongs to); however,
$f(\xi) \neq 0$ provides no information about whether or not f has roots in D.

5.10 Definition. Suppose that fit) E D[£],

(i) We say fit) is of degree n, and write deg ifit)) = n, if either


n = 0 and fit) = a0 where a0 E D, or n > 0 and fit) =
^”=0 Oiif where an ^ 0.
(ii) If fit) = r?=o w/iere n > 0 and ara 5^ 0, we say that
(a0, a-i, an) is the sequence of coefficients of fit) and that an
is the leading coefficient oi fit)- We say that fit) fs monic ?/
an — 1.

We call /(£) a constant polynomial and the associated function / a


constant function if/(£) £ D, i.e., deg (/(£)) = 0.

5.11 Lemma. Suppose that $f(\xi), g(\xi) \in D[\xi]$. Then we have:

(i) $\deg(f(\xi) + g(\xi)) \leq \max(\deg(f(\xi)), \deg(g(\xi)))$;
(ii) if $f(\xi) \neq 0$ and $g(\xi) \neq 0$ then
$\deg(f(\xi) \cdot g(\xi)) = \deg(f(\xi)) + \deg(g(\xi))$.

The proof of this is left to the reader. We can have < in (i), for example,
with $f(\xi) = 1 + \xi$, $g(\xi) = 1 - \xi$. Also the hypotheses in (ii) can
obviously not be eliminated in general.
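The two parts of Lemma 5.11 can be checked on the example just given. The following is a minimal sketch, assuming D is the integers and that a polynomial form is stored as a Python list of coefficients in increasing powers of $\xi$; the helper names `trim`, `deg`, `add` and `mul` are illustrative and do not come from the text.

```python
def trim(coeffs):
    """Drop trailing zero coefficients; the zero polynomial becomes [0]."""
    c = list(coeffs)
    while len(c) > 1 and c[-1] == 0:
        c.pop()
    return c

def deg(coeffs):
    """Degree in the sense of 5.10 (a constant, including 0, has degree 0)."""
    return len(trim(coeffs)) - 1

def add(f, g):
    """Coefficientwise sum of two polynomial forms."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def mul(f, g):
    """Product of two polynomial forms (Cauchy product of coefficients)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

f = [1, 1]    # 1 + xi
g = [1, -1]   # 1 - xi
s = add(f, g)  # the constant 2: strict inequality in 5.11(i)
p = mul(f, g)  # 1 - xi^2: degrees add, as in 5.11(ii)
```

Here `deg(s)` is 0 while both summands have degree 1, and `deg(p)` equals `deg(f) + deg(g)`.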

Divisibility and roots of polynomials. We will be especially interested
in properties of divisibility in polynomials, when the division can be
carried out in $D[\xi]$:

5.12 Definition. Given $f(\xi), g(\xi) \in D[\xi]$, we say that $g(\xi)$
divides $f(\xi)$ in $D[\xi]$, in symbols $g(\xi) \mid f(\xi)$, if for some
$h(\xi) \in D[\xi]$, $f(\xi) = g(\xi)h(\xi)$.

It turns out that the relation | between elements of $D[\xi]$ has
surprisingly many properties in common with the corresponding relation in I.
However it is technically simpler to describe this in the case that division
is trivial in D, or as we shall say in the next chapter, in the case that D
is a field. For the moment, we consider only a simple special case, namely
when $g(\xi) = (-a) + \xi$ or, as is more usually written, $\xi - a$.

5.13 Theorem. Suppose that $f(\xi) \in D[\xi]$, $f(\xi) \neq 0$ and suppose
that $a \in D$. Then $(\xi - a) \mid f(\xi)$ if and only if $f(a) = 0$.

Proof. If $f(\xi) = (\xi - a)h(\xi)$ then $f(x) = (x - a)h(x)$ for all
$x \in D$. Hence $f(a) = 0$. Conversely, let
$f(\xi) = \sum_{i=0}^{n} b_i \xi^i$ with $b_n \neq 0$. Since $f(a) = 0$, we
must have n > 0. Also,

    $$f(\xi) = f(\xi) - f(a) = \sum_{i=0}^{n} b_i \xi^i - \sum_{i=0}^{n} b_i a^i
             = \sum_{i=0}^{n} b_i(\xi^i - a^i) = \sum_{i=1}^{n} b_i(\xi^i - a^i)$$
    $$       = \sum_{i=1}^{n} b_i(\xi - a)(\xi^{i-1} + a\xi^{i-2} + \cdots + a^{i-1})$$

[the last by standard algebra (cf. 4.33)]. Using the general distributive
law we can factor $(\xi - a)$ out of the sum to give the desired result.
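Theorem 5.13 can be illustrated computationally. The sketch below assumes D is the integers and uses synthetic division (a standard technique, not the argument of the proof above) to produce $h(\xi)$ and a remainder r with $f(\xi) = (\xi - a)h(\xi) + r$; the remainder equals $f(a)$, so it vanishes exactly when a is a root. The function names are illustrative.

```python
def evaluate(coeffs, x):
    """Horner evaluation of sum(coeffs[i] * x**i)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def divide_by_linear(coeffs, a):
    """Return (h, r) with f(xi) = (xi - a) * h(xi) + r.

    Coefficients are listed in increasing powers of xi.
    """
    n = len(coeffs) - 1
    h = [0] * n
    carry = 0
    for i in range(n, 0, -1):
        carry = coeffs[i] + a * carry   # coefficient of xi^(i-1) in h
        h[i - 1] = carry
    r = coeffs[0] + a * carry           # remainder, equal to f(a)
    return h, r

# f(xi) = xi^3 - 6 xi^2 + 11 xi - 6 = (xi - 1)(xi - 2)(xi - 3)
f = [-6, 11, -6, 1]
h2, r2 = divide_by_linear(f, 2)   # 2 is a root: remainder 0
h4, r4 = divide_by_linear(f, 4)   # 4 is not a root: remainder f(4)
```

With these values, `h2` is `[3, -4, 1]`, that is $\xi^2 - 4\xi + 3$, and `r4` agrees with `evaluate(f, 4)`.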

5.14 Theorem. Suppose that $f(\xi) \in D[\xi]$ is of degree n and
$f(\xi) \neq 0$. Then $f(\xi)$ has at most n roots in D.

Proof. By induction on n. If n = 0, $f(\xi) = a_0$ where $a_0 \neq 0$. Thus
$f(\xi)$ has 0 roots. Suppose that the theorem is true for n, and that
$f(\xi)$ is of degree n + 1. If $f(\xi)$ has no roots in D we are through.
Otherwise let $a \in D$, $f(a) = 0$. Then by the preceding,
$f(\xi) = (\xi - a)h(\xi)$ for suitable $h(\xi)$, which by 5.11(ii) is of
degree n. Since $f(x) = (x - a)h(x)$, we see that an element b of D with
$b \neq a$ is a root of $f(\xi)$ if and only if it is one of $h(\xi)$.
Applying the induction hypothesis to $h(\xi)$ gives the desired result.

5.15 Corollary. If D is infinite and $f(\xi), g(\xi) \in D[\xi]$, then
$f(\xi) = g(\xi)$ if and only if $f(x) = g(x)$ for all $x \in D$.

Thus if D is infinite we have a one-to-one correspondence between formal
polynomials $f(\xi)$ in $D[\xi]$ and the associated polynomial functions f
with domain exactly D. As we already observed following 5.1, this is not the
case for any finite D.
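The failure of the correspondence for finite D can be seen on a small case. The sketch below assumes D is the integers modulo 5, modeled with Python integers reduced mod p, and uses the form $\xi^5 - \xi$, chosen here only as an illustration: by Fermat's little theorem it vanishes at every element, although it is not the zero form.

```python
def poly_function_mod(coeffs, p):
    """Value table of the associated polynomial function on {0, ..., p-1}."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
            for x in range(p)]

p = 5
f = [0, -1, 0, 0, 0, 1]   # xi^5 - xi, a nonzero polynomial form
zero = [0]                # the zero polynomial form

table_f = poly_function_mod(f, p)
table_zero = poly_function_mod(zero, p)
```

The two forms differ as coefficient sequences, yet `table_f` and `table_zero` coincide, so they determine the same function on this D.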

Formal derivatives. The next definition of “formal derivative” of
polynomials, suggested by the basic operation of the differential calculus,
and the theorem following it will turn out to be useful later in our work.
The proofs are left to the reader.

5.16 Definition. Given $f(\xi) \in D[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$,
we define the formal derivative $f'(\xi)$, also written $(f(\xi))'$, by

    $$f'(\xi) = \sum_{i=1}^{n} i\,a_i \xi^{i-1}.$$

Clearly, if $\deg(f(\xi)) = n > 0$ then $\deg(f'(\xi)) \leq n - 1$. We write
$f(\xi)^k$ for $(f(\xi))^k$ in the following.

5.17 Theorem. If $b \in D$ and $f(\xi), g(\xi) \in D[\xi]$ then we have:

(i) $(b f(\xi))' = b f'(\xi)$;
(ii) $(f(\xi) + g(\xi))' = f'(\xi) + g'(\xi)$;
(iii) $(f(\xi)g(\xi))' = f(\xi)g'(\xi) + f'(\xi)g(\xi)$;
(iv) $(f(\xi)^k)' = k f(\xi)^{k-1} \cdot f'(\xi)$, if $k \in P$.
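Rules (iii) and (iv) of Theorem 5.17 can be spot-checked on particular polynomials; this is a check of instances, not a proof. The sketch assumes D is the integers and stores coefficients in increasing powers of $\xi$; all function names are illustrative.

```python
def trim(c):
    """Drop trailing zeros; the zero polynomial becomes [0]."""
    c = list(c)
    while len(c) > 1 and c[-1] == 0:
        c.pop()
    return c

def add(f, g):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def derivative(f):
    """Formal derivative of 5.16: sum over i >= 1 of i * a_i * xi^(i-1)."""
    d = [i * a for i, a in enumerate(f)][1:]
    return trim(d) if d else [0]

f = [1, 2, 3]       # 1 + 2 xi + 3 xi^2
g = [0, 1, 0, 4]    # xi + 4 xi^3

# 5.17(iii): (f g)' = f g' + f' g
lhs_product = derivative(mul(f, g))
rhs_product = add(mul(f, derivative(g)), mul(derivative(f), g))

# 5.17(iv) with k = 3: (f^3)' = 3 f^2 f'
lhs_power = derivative(mul(f, mul(f, f)))
rhs_power = mul([3], mul(mul(f, f), derivative(f)))
```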

Exercise Group 5.1

1. Prove 5.4(ii).
2. Prove that the left distributive law for + over • holds for the system
E defined in 5.7.
3. Prove Lemma 5.11.
4. Let $\bar{D}$ be the collection of all polynomial functions f with
coefficients in D. For any $f, g \in \bar{D}$ define $f \mathbin{\bar{+}} g$
to be the function $h_1$ with $h_1(x) = f(x) + g(x)$ for all $x \in D$, and
$f \mathbin{\bar{\cdot}} g$ to be the function $h_2$ with
$h_2(x) = f(x) \cdot g(x)$ for all $x \in D$. For any $a \in D$, define
$\bar{a}$ to be the constant function $f \in \bar{D}$ with $f(x) = a$ for
$x \in D$. Define a function F on $D[\xi]$ as follows: for each
$f(\xi) \in D[\xi]$, $F(f(\xi))$ is the polynomial function f with domain D
associated with $f(\xi)$ by 5.9(i). Prove the following:
(a) $\bar{D}$ is closed under $\bar{+}$, $\bar{\cdot}$, and
$\bar{a} \in \bar{D}$ for each $a \in D$.
(b) F is a homomorphic mapping of $(D[\xi], +, \cdot, 0, 1)$ onto
$(\bar{D}, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$.
(c) If D is infinite, the mapping F is an isomorphism.
Where have we implicitly used part of (b) in the proof of 5.13?

5. (a) Prove Theorem 5.17(i)-(iv).
(b) What is $(f(\xi)^p)'$ for $f(\xi) \in I_p[\xi]$ (p a prime)?
6. Suppose that $(D, +, \cdot, <, 0, 1)$ is an ordered integral domain (hence
infinite). Define Pos to be the set of all $f(\xi) \in D[\xi]$ where, for
$n = \deg(f(\xi))$ and $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$, we have
$0 < a_n$. For $f(\xi), g(\xi) \in D[\xi]$, put $g(\xi) < f(\xi)$ if and only
if $f(\xi) - g(\xi) \in$ Pos. Prove:
(a) For $\deg(f(\xi)) = \deg(g(\xi)) = 0$, $f(\xi) = a$, $g(\xi) = b$, we
have $g(\xi) < f(\xi)$, as just defined, if and only if b < a in D.
(b) $(D[\xi], +, \cdot, <, 0, 1)$ is an ordered integral domain.
(c) There are $f(\xi) \in D[\xi]$ with $a < f(\xi)$ for all $a \in D$.

5.2 Polynomials in several variables. Consider the “general” polynomial of
degree 2 in two variables $\xi_1$, $\xi_2$:

    $$f(\xi_1, \xi_2) = a_0 + a_1\xi_1 + a_2\xi_2 + a_3\xi_1^2 + a_4\xi_1\xi_2 + a_5\xi_2^2,$$

where $a_0, a_1, \ldots, a_5 \in D$. It may at first sight seem that a new
notion is involved here. However, if we write

    $$f(\xi_1, \xi_2) = (a_0 + a_1\xi_1 + a_3\xi_1^2) + (a_2 + a_4\xi_1)\xi_2 + a_5\xi_2^2,$$

we can think of $f(\xi_1, \xi_2)$ as a polynomial
$g_0(\xi_1) + g_1(\xi_1)\xi_2 + g_2(\xi_1)\xi_2^2$ in $\xi_2$ with
coefficients in $D[\xi_1]$, hence as an element of $(D[\xi_1])[\xi_2]$.
Equivalently, we can think of $f(\xi_1, \xi_2)$ as an element of
$(D[\xi_2])[\xi_1]$. To ensure that both ways of looking at
$f(\xi_1, \xi_2)$ give the same results in general, we need the following
theorem.

5.18 Theorem. Suppose that $D[\xi_1]$ is a simple transcendental extension
of D by $\xi_1$, and that $(D[\xi_1])[\xi_2]$ is a simple transcendental
extension of $D[\xi_1]$ by $\xi_2$. Then:

(i) $(D[\xi_1])[\xi_2] = (D[\xi_2])[\xi_1]$;
(ii) $D[\xi_2]$ is a simple transcendental extension of D by $\xi_2$ and
$(D[\xi_2])[\xi_1]$ is a simple transcendental extension of $D[\xi_2]$ by
$\xi_1$.

Proof. Suppose that we have an element of $(D[\xi_2])[\xi_1]$; we denote it
by $f(\xi_1, \xi_2)$. Then for some $n \geq 0$ and
$g_0(\xi_2), \ldots, g_n(\xi_2) \in D[\xi_2]$, we have
$f(\xi_1, \xi_2) = \sum_{i=0}^{n} g_i(\xi_2) \cdot \xi_1^i$. We can choose m
large enough so that for each $i = 0, \ldots, n$ we have
$a_{i,0}, \ldots, a_{i,m} \in D$ with
$g_i(\xi_2) = \sum_{j=0}^{m} a_{i,j}\xi_2^j$. Then

    $$f(\xi_1, \xi_2) = \sum_{i=0}^{n} \Big( \sum_{j=0}^{m} a_{i,j}\xi_2^j \Big) \xi_1^i
                      = \sum_{i=0}^{n} \Big( \sum_{j=0}^{m} a_{i,j}\xi_1^i\xi_2^j \Big)$$
    $$                = \sum_{j=0}^{m} \Big( \sum_{i=0}^{n} a_{i,j}\xi_1^i\xi_2^j \Big)
                      = \sum_{j=0}^{m} \Big( \sum_{i=0}^{n} a_{i,j}\xi_1^i \Big) \xi_2^j.$$

We have applied here commutativity and the generalized associative and
distributive laws, which is possible since $(D[\xi_2])[\xi_1]$ is an integral
domain. The last equation shows that $f(\xi_1, \xi_2) \in (D[\xi_1])[\xi_2]$.
Thus $(D[\xi_2])[\xi_1] \subseteq (D[\xi_1])[\xi_2]$. Since the relative
transcendence of $\xi_1$, $\xi_2$ is not used here, we can establish the
reverse inclusion in the same way, and hence (i).
To prove (ii), suppose that $f(\xi_1, \xi_2)$ is represented as above and
that $f(\xi_1, \xi_2) = 0$. We want to show each $g_i(\xi_2) = 0$. By the
representation
$f(\xi_1, \xi_2) = \sum_{j=0}^{m} \big( \sum_{i=0}^{n} a_{i,j}\xi_1^i \big) \xi_2^j$
and the transcendence of $\xi_2$ over $D[\xi_1]$, it follows that each
polynomial $\sum_{i=0}^{n} a_{i,j}\xi_1^i = 0$. But then by the transcendence
of $\xi_1$ over D, $a_{i,j} = 0$ for each i, j. Hence
$g_i(\xi_2) = \sum_{j=0}^{m} a_{i,j}\xi_2^j = 0$, which concludes the proof.

k-fold transcendental extensions. To make the notation simpler for the
general case of k variables, we introduce the following.

5.19 Definition. Given an integral domain $(E, +, \cdot, 0, 1)$ which
contains D as a subdomain, and elements $\xi_1, \ldots, \xi_k$ of E, we
define $D[\xi_1, \ldots, \xi_k]$ recursively as follows:

(i) for k = 0 this is just D;
(ii) for k > 0, $D[\xi_1, \ldots, \xi_k] = (D[\xi_1, \ldots, \xi_{k-1}])[\xi_k]$.

We say that $D[\xi_1, \ldots, \xi_k]$ is a k-fold transcendental extension of
D if for each i, with $0 < i \leq k$, $D[\xi_1, \ldots, \xi_i]$ is a simple
transcendental extension of $D[\xi_1, \ldots, \xi_{i-1}]$ by $\xi_i$.

We can now generalize 5.18 as follows.

5.20 Theorem. Suppose that k > 0 and that $D[\xi_1, \ldots, \xi_k]$ is a
k-fold transcendental extension of D. Suppose that H is any permutation of
$\{1, \ldots, k\}$. Then

(i) $D[\xi_1, \ldots, \xi_k] = D[\xi_{H(1)}, \ldots, \xi_{H(k)}]$;
(ii) $D[\xi_{H(1)}, \ldots, \xi_{H(k)}]$ is a k-fold transcendental
extension of D.

Proof. This can be seen from 5.18, using the fact (Exercise 2 of Exercise
Group 4.4) that every permutation is a product of transpositions, in
particular of simple interchanges of successive elements.

Next, in generalization of 5.6, we have the following.

5.21 Theorem. Suppose that D, D' are isomorphic integral domains with
isomorphism given by G, $\mathcal{D}(G) = D$ and $\mathcal{R}(G) = D'$.
Suppose that $D[\xi_1, \ldots, \xi_k]$, $D'[\xi_1', \ldots, \xi_k']$ are
k-fold transcendental extensions of D, D', respectively. Then
$D[\xi_1, \ldots, \xi_k] \cong D'[\xi_1', \ldots, \xi_k']$. We can choose the
isomorphic mapping F so that $F(a) = G(a)$ for each $a \in D$ and
$F(\xi_i) = \xi_i'$ for each i.

Proof. This proceeds by induction on k. For the induction step we use a
slight generalization of 5.6, where the ground domains are not necessarily
the same but are isomorphic. However, the proof of this generalization is
essentially the same.

It is not necessary to prove a new existence theorem for k-fold
transcendental extensions. This follows immediately from the definition 5.19
and the basic existence theorem 5.7 for simple transcendental extensions.

Thus we are justified in the following.

5.22 Convention. We assume throughout the remainder of this book that
$(D[\xi_1, \ldots, \xi_k], +, \cdot, 0, 1)$ is some fixed k-fold
transcendental extension of $(D, +, \cdot, 0, 1)$, given that the latter is
an integral domain. We call the elements of $D[\xi_1, \ldots, \xi_k]$
polynomial forms, or simply polynomials, in the k “variables”
$\xi_1, \ldots, \xi_k$, over D.

To justify and generalize the use of such notation as $f(\xi_1, \xi_2)$ in
5.18, we make the following definition.

5.23 Definition. Suppose that $(E, +, \cdot, 0, 1)$ is an integral domain
which contains D as a subdomain.

(i) With any element $\eta$ of $D[\xi_1, \ldots, \xi_k]$ is associated a
k-ary function f with
$\mathcal{D}(f) = \{(x_1, \ldots, x_k) : x_1, \ldots, x_k \in E\}$ and values
determined by the following recursive conditions:
(a) if k = 1, f is found as in 5.9(i);
(b) if k > 1 and the given polynomial $\eta$ is
$\sum_{i=0}^{n} \zeta_i \xi_k^i$ where each
$\zeta_i \in D[\xi_1, \ldots, \xi_{k-1}]$, and with each $\zeta_i$ is
associated the (k - 1)-ary function $g_i$ (for any extension E of D) then
for any $x_1, \ldots, x_k \in E$,

    $$f(x_1, \ldots, x_k) = \sum_{i=0}^{n} g_i(x_1, \ldots, x_{k-1})\,x_k^i.$$

(ii) In particular, if $D[\xi_1, \ldots, \xi_k] \subseteq E$ then the value
of the function f associated with any polynomial $\eta$ is determined at
$\xi_1, \ldots, \xi_k$, and we have $f(\xi_1, \ldots, \xi_k) = \eta$.
(iii) $(x_1, \ldots, x_k)$ is said to be a root of the polynomial
$f(\xi_1, \ldots, \xi_k)$ in D if $x_1, \ldots, x_k \in D$ and
$f(x_1, \ldots, x_k) = 0$.

The association of functions to polynomials in (i) above is well determined
by the recursive conditions. For, by our convention 5.22, we are always
dealing with a k-fold transcendental extension $D[\xi_1, \ldots, \xi_k]$ of
D. If k = 1, the association is possible by our arguments concerning 5.9.
If k > 1, then $D[\xi_1, \ldots, \xi_k]$ is a simple transcendental extension
of $D[\xi_1, \ldots, \xi_{k-1}]$ by $\xi_k$. Hence each element $\eta$ of
$D[\xi_1, \ldots, \xi_k]$ has a representation
$\eta = \sum_{i=0}^{n} \zeta_i \xi_k^i$, which is unique in the sense that for
any other such representation $\eta = \sum_{i=0}^{m} \zeta_i' \xi_k^i$, we
must have $\zeta_i = \zeta_i'$ for $i \leq \min(n, m)$ and $\zeta_i = 0$ and
$\zeta_i' = 0$ for $i > \min(n, m)$. Thus using either such representation
gives rise to the same function.
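The recursive conditions of 5.23(i) translate directly into a recursive evaluation procedure. The sketch below assumes D is the integers and represents an element of $D[\xi_1, \ldots, \xi_k]$ as a nested list: a list of coefficients $\zeta_0, \ldots, \zeta_n$ in powers of $\xi_k$, each of which is itself such a list in one variable fewer. For convenience the recursion bottoms out at k = 0 (a plain element of D) rather than at k = 1; this representation and the name `evaluate` are illustrative choices, not part of the text.

```python
def evaluate(poly, xs):
    """Evaluate a nested-list polynomial at the point xs = (x_1, ..., x_k)."""
    if not xs:                       # k = 0: poly is an element of D
        return poly
    *rest, xk = xs
    total = 0
    for zeta in reversed(poly):      # Horner's rule in the last variable
        total = total * xk + evaluate(zeta, rest)
    return total

# f(xi_1, xi_2) = (2 + 3 xi_1 + xi_1^2) + (-5 xi_1 - 4 xi_1^2) xi_2,
# written as a polynomial in xi_2 with coefficients in D[xi_1].
f = [[2, 3, 1], [0, -5, -4]]

value_a = evaluate(f, (1, 2))
value_b = evaluate(f, (2, -1))
```

Padding a coefficient list with extra zero coefficients, as permitted by the uniqueness statement above, leaves every value unchanged.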

As with 5.19, there is a possibility of ambiguity concerning the associated
polynomial functions f if we do not explicitly mention $\mathcal{D}(f)$.
Again, this ambiguity is harmless. There is, however, another possible
ambiguity here which must be treated with slightly more care. Suppose, for
example, that we have an element $\eta = \zeta\xi_k^0$ of
$D[\xi_1, \ldots, \xi_k]$ which does not depend on $\xi_k$; in other words
$\eta = \zeta$ where $\zeta \in D[\xi_1, \ldots, \xi_{k-1}]$. Then, according
to whether we look upon $\eta$ as a polynomial in $\xi_1, \ldots, \xi_k$ or
as one in just $\xi_1, \ldots, \xi_{k-1}$, we have associated two different
functions, a k-ary function f and a (k - 1)-ary function g. The relationship
between these is simple:

    $$f(x_1, \ldots, x_{k-1}, x_k) = g(x_1, \ldots, x_{k-1})$$

for any $x_1, \ldots, x_{k-1}, x_k$. To make the distinction clear, we always
explicitly mention which kind of polynomial we are dealing with, for example,
by such notations as $f(\xi_1, \ldots, \xi_k)$, $g(\xi_1, \ldots, \xi_{k-1})$,
etc.; in the use of $f(\xi_1, \ldots, \xi_k)$, we do not exclude the
possibility that this polynomial does not depend on all of
$\xi_1, \ldots, \xi_k$.
As one special case of 5.20, we see that for each $l \leq k$, any polynomial
$f(\xi_1, \ldots, \xi_k)$ can be written as a polynomial in $\xi_l$ by

    $$D[\xi_1, \ldots, \xi_k] = D[\xi_1, \ldots, \xi_{l-1}, \xi_{l+1}, \ldots, \xi_k, \xi_l];$$

we have

    $$f(\xi_1, \ldots, \xi_k) = \sum_{i=0}^{n} g_i(\xi_1, \ldots, \xi_{l-1}, \xi_{l+1}, \ldots, \xi_k) \cdot \xi_l^i.$$

Moreover, by the transcendence of $\xi_l$ over the preceding domain, this
representation is unique. For each such l we can thus assign a degree
$\deg_l(f)$ to $f(\xi_1, \ldots, \xi_k)$ when it is regarded as a polynomial
in $\xi_l$. For example, the following polynomial over I,

    $$f(\xi_1, \xi_2) = 2 + 3\xi_1 - 5\xi_1\xi_2 + \xi_1^2 - 4\xi_1^2\xi_2$$

has degree 2 as a polynomial in $\xi_1$ and degree 1 as a polynomial in
$\xi_2$. In many problems, it is more appropriate to deal with the total
degree of the terms in $f(\xi_1, \ldots, \xi_k)$; this is obtained by adding
the degrees in each term. In this particular case, the total degree of
$f(\xi_1, \xi_2)$ is 3. It is also natural to group together all terms which
have the same total degree. In the above case, this amounts only to grouping
together $-5\xi_1\xi_2$ and $\xi_1^2$. The general case takes the following
form.
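The degree in each variable, the total degree, and the grouping by total degree can be computed mechanically for the example above. The sketch assumes a polynomial is stored as a dictionary from exponent tuples $(j_1, \ldots, j_k)$ to nonzero integer coefficients; the function names are illustrative.

```python
# f(xi_1, xi_2) = 2 + 3 xi_1 - 5 xi_1 xi_2 + xi_1^2 - 4 xi_1^2 xi_2
f = {(0, 0): 2, (1, 0): 3, (1, 1): -5, (2, 0): 1, (2, 1): -4}

def deg_in(poly, l):
    """Degree of poly regarded as a polynomial in the l-th variable (1-based)."""
    return max((e[l - 1] for e, c in poly.items() if c != 0), default=0)

def total_degree(poly):
    """Largest sum of exponents over the terms with nonzero coefficient."""
    return max((sum(e) for e, c in poly.items() if c != 0), default=0)

def group_by_total_degree(poly):
    """Split poly into its parts of total degree 0, 1, 2, ..."""
    groups = {}
    for e, c in poly.items():
        if c != 0:
            groups.setdefault(sum(e), {})[e] = c
    return groups

groups = group_by_total_degree(f)
```

The group of total degree 2 consists exactly of the two terms $-5\xi_1\xi_2$ and $\xi_1^2$ singled out in the text.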

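5.24 Theorem. For any $i \geq 0$, let

    $$S_i = \{(j_1, \ldots, j_k) : 0 \leq j_1, \ldots, 0 \leq j_k \text{ and } j_1 + \cdots + j_k = i\}.$$

Then for any $f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ we can find
$n \geq 0$, and for each $i \leq n$ and $(j_1, \ldots, j_k) \in S_i$, elements
$a_{j_1,\ldots,j_k} \in D$ such that

    $$f(\xi_1, \ldots, \xi_k) = \sum_{i=0}^{n} \Big( \sum_{(j_1,\ldots,j_k) \in S_i} a_{j_1,\ldots,j_k}\,\xi_1^{j_1} \cdots \xi_k^{j_k} \Big).$$

This representation is unique in the sense that if also

    $$f(\xi_1, \ldots, \xi_k) = \sum_{i=0}^{m} \Big( \sum_{(j_1,\ldots,j_k) \in S_i} b_{j_1,\ldots,j_k}\,\xi_1^{j_1} \cdots \xi_k^{j_k} \Big)$$

then for each i and $(j_1, \ldots, j_k) \in S_i$ we have
$a_{j_1,\ldots,j_k} = b_{j_1,\ldots,j_k}$ if $i \leq \min(n, m)$ and
$a_{j_1,\ldots,j_k} = 0$ for i > m, if n > m, and $b_{j_1,\ldots,j_k} = 0$
for i > n, if m > n.

We leave the proof of this to the reader. For simplicity of notation we
shall write $\sum_{j_1 + \cdots + j_k = i}$ instead of the above
$\sum_{(j_1,\ldots,j_k) \in S_i}$. By the generalized commutative law, it does
not matter in which order this summation is carried out.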

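5.25 Definition. Suppose that
$f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$.

(i) We say that $f(\xi_1, \ldots, \xi_k)$ has degree n in
$\xi_1, \ldots, \xi_k$, and write $\deg(f(\xi_1, \ldots, \xi_k)) = n$, if
either n = 0 and $f(\xi_1, \ldots, \xi_k) \in D$ or n > 0 and we can write

    $$f(\xi_1, \ldots, \xi_k) = \sum_{i=0}^{n} \Big( \sum_{j_1 + \cdots + j_k = i} a_{j_1,\ldots,j_k}\,\xi_1^{j_1} \cdots \xi_k^{j_k} \Big),$$

where $a_{j_1,\ldots,j_k} \neq 0$ for some $j_1, \ldots, j_k$ with
$j_1 + \cdots + j_k = n$.
(ii) We call $f(\xi_1, \ldots, \xi_k)$ homogeneous of degree n if it is of
degree n and if

    $$f(\xi_1, \ldots, \xi_k) = \sum_{j_1 + \cdots + j_k = n} a_{j_1,\ldots,j_k}\,\xi_1^{j_1} \cdots \xi_k^{j_k}.$$

(iii) We call $f(\xi_1, \ldots, \xi_k)$ linear if it is homogeneous of
degree 1.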

Symmetric polynomials. We know that if H is any permutation of
$\{1, \ldots, k\}$ then $D[\xi_{H(1)}, \ldots, \xi_{H(k)}]$ is a k-fold
transcendental extension of D in the sense of 5.19. But then, not only is
this the same as $D[\xi_1, \ldots, \xi_k]$, but also by 5.21 we can establish
an isomorphism F between these two (identical) domains, determined by
$F(a) = a$, $F(\xi_i) = \xi_{H(i)}$ for each i. Then for any
$f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ we have
$F(f(\xi_1, \ldots, \xi_k)) = f(\xi_{H(1)}, \ldots, \xi_{H(k)})$. For example,
if

    $$f(\xi_1, \xi_2, \xi_3) = 2\xi_1 - 3\xi_2\xi_3 + \xi_1\xi_3$$

and

    $$H(1) = 2, \quad H(2) = 3, \quad H(3) = 1,$$

we have

    $$F(f(\xi_1, \xi_2, \xi_3)) = f(\xi_2, \xi_3, \xi_1) = 2\xi_2 - 3\xi_3\xi_1 + \xi_2\xi_1.$$

A very interesting special class of polynomials consists of all those such
that $f(\xi_1, \ldots, \xi_k) = f(\xi_{H(1)}, \ldots, \xi_{H(k)})$, no matter
what the permutation H of $\{1, \ldots, k\}$ is. We shall call these
symmetric polynomials in $\xi_1, \ldots, \xi_k$. For example, the polynomials
$\xi_1 + \xi_2 + \xi_3$, $\xi_1^2 + \xi_2^2 + \xi_3^2$,
$\xi_1\xi_2 + \xi_1\xi_3 + \xi_2\xi_3$, $\xi_1\xi_2\xi_3$ are symmetric in
$\xi_1, \xi_2, \xi_3$. However, these are no longer symmetric if regarded as
polynomials in $\xi_1, \xi_2, \xi_3, \xi_4$, for they are only invariant
under those permutations H of $\{1, \ldots, 4\}$ such that $H(4) = 4$.
There is a very close connection between symmetric polynomials and roots of
polynomials in one variable. Suppose, for example, that we have
$f(\xi) \in D[\xi]$ of degree 3 and roots $x_1, x_2, x_3 \in D$ such that
$f(\xi) = (\xi - x_1)(\xi - x_2)(\xi - x_3)$. If we also write

    $$f(\xi) = \xi^3 + b_1\xi^2 + b_2\xi + b_3,$$

we see that

    $$b_1 = -(x_1 + x_2 + x_3), \quad b_2 = x_1x_2 + x_1x_3 + x_2x_3,$$

and

    $$b_3 = -(x_1x_2x_3).$$

For $f(\xi)$ of degree 4 with roots $x_1, x_2, x_3, x_4$ we have

    $$f(\xi) = (\xi - x_1)(\xi - x_2)(\xi - x_3)(\xi - x_4)
             = \xi^4 + b_1\xi^3 + b_2\xi^2 + b_3\xi + b_4$$

and

    $$b_1 = -(x_1 + x_2 + x_3 + x_4),$$
    $$b_2 = x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_3x_4,$$
    $$b_3 = -(x_1x_2x_3 + x_1x_2x_4 + x_1x_3x_4 + x_2x_3x_4),$$

and

    $$b_4 = x_1x_2x_3x_4.$$
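The relations between coefficients and roots displayed above can be verified numerically for particular roots. The sketch below assumes integer roots, expands the product of the factors $(\xi - x_i)$ into a coefficient list (increasing powers of $\xi$), and compares each coefficient with the corresponding signed sum of products of roots; the function names are illustrative.

```python
from itertools import combinations
from math import prod

def expand_from_roots(roots):
    """Coefficients (low to high) of the product of (xi - x) over the roots."""
    coeffs = [1]
    for x in roots:
        shifted = [0] + coeffs                  # xi * current polynomial
        scaled = [-x * c for c in coeffs] + [0]  # -x * current polynomial
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

def sum_of_products(n, xs):
    """Sum of all products of n of the values xs with increasing indices."""
    return sum(prod(c) for c in combinations(xs, n))

roots = [1, 2, 3, 4]
coeffs = expand_from_roots(roots)   # xi^4 - 10 xi^3 + 35 xi^2 - 50 xi + 24
n = len(roots)
# b_i is the coefficient of xi^(n - i)
b = [coeffs[n - i] for i in range(n + 1)]
```

Each `b[i]` equals `(-1)**i * sum_of_products(i, roots)`, matching the pattern of signs in the displayed formulas for degrees 3 and 4.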

This suggests the following.

5.26 Definition. (i) We call a polynomial
$f(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ symmetric in
$\xi_1, \ldots, \xi_k$ if for any permutation H of $\{1, \ldots, k\}$ we have
$f(\xi_1, \ldots, \xi_k) = f(\xi_{H(1)}, \ldots, \xi_{H(k)})$.
(ii) Let $0 \leq n \leq k$. By the nth elementary symmetric polynomial in
$\xi_1, \ldots, \xi_k$ we mean

    $$\sigma_n(\xi_1, \ldots, \xi_k) = \sum_{1 \leq l_1 < l_2 < \cdots < l_n \leq k} \xi_{l_1} \cdots \xi_{l_n},$$

when n > 0. We take $\sigma_0(\xi_1, \ldots, \xi_k) = 1$.

That every elementary symmetric polynomial $\sigma_n(\xi_1, \ldots, \xi_k)$
is actually symmetric in the sense of part (i) of this definition is seen by
writing their description in a slightly different form. Given k, for each
$n \leq k$, let $M_n$ be the class of all sets
$X \subseteq \{1, \ldots, k\}$ such that X has exactly n elements. Each such
X can be uniquely represented in the form $X = \{l_1, l_2, \ldots, l_n\}$
where $1 \leq l_1 < l_2 < \cdots < l_n \leq k$. Then

    (5:2-1)  $$\sigma_n(\xi_1, \ldots, \xi_k) = \sum_{X \in M_n} \Big( \prod_{l \in X} \xi_l \Big).$$

Now if H is any permutation of $\{1, \ldots, k\}$ then we obtain an induced
permutation $\bar{H}$ of elements of $M_n$ by taking
$\bar{H}(\{l_1, l_2, \ldots, l_n\}) = \{H(l_1), H(l_2), \ldots, H(l_n)\}$.
Whenever $l_1, l_2, \ldots, l_n$ are all distinct, so are
$H(l_1), H(l_2), \ldots, H(l_n)$; furthermore given any distinct
$l_1', l_2', \ldots, l_n'$, we can find $l_1, l_2, \ldots, l_n$ with
$\bar{H}(\{l_1, l_2, \ldots, l_n\}) = \{l_1', l_2', \ldots, l_n'\}$. Of
course, H does not necessarily preserve an ordering
$l_1 < l_2 < \cdots < l_n$; this is the reason for passing to the terminology
of sets. Now we see that

    $$\sigma_n(\xi_{H(1)}, \ldots, \xi_{H(k)}) = \sum_{X \in M_n} \prod_{l \in X} \xi_{H(l)}
      = \sum_{X \in M_n} \prod_{l' \in \bar{H}(X)} \xi_{l'} = \sum_{Y \in M_n} \prod_{l' \in Y} \xi_{l'},$$

since for each $Y \in M_n$ we can find a unique $X \in M_n$ with
$\bar{H}(X) = Y$. Hence
$\sigma_n(\xi_{H(1)}, \ldots, \xi_{H(k)}) = \sigma_n(\xi_1, \ldots, \xi_k)$.
We have thus proved the following.

5.27 Theorem. Suppose that $0 \leq n \leq k$. Then
$\sigma_n(\xi_1, \ldots, \xi_k)$ is a homogeneous polynomial of degree n
which is symmetric in $\xi_1, \ldots, \xi_k$.
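Theorem 5.27 can be spot-checked at a particular point; this tests values of the associated functions and is not a substitute for the proof. The sketch assumes integer arguments and computes $\sigma_n$ as a sum over n-element subsets of indices, in the spirit of (5:2-1); the name `sigma` is illustrative.

```python
from itertools import combinations, permutations
from math import prod

def sigma(n, xs):
    """Value of the nth elementary symmetric polynomial at the point xs."""
    return sum(prod(xs[l] for l in X)
               for X in combinations(range(len(xs)), n))

point = (2, 3, 5, 7)
values = [sigma(n, point) for n in range(len(point) + 1)]

# Symmetry: the value is unchanged under every permutation of the arguments.
symmetric = all(
    sigma(n, perm) == values[n]
    for n in range(len(point) + 1)
    for perm in permutations(point)
)

# Homogeneity of degree n: scaling every argument by t scales the value by t^n.
t = 3
homogeneous = all(
    sigma(n, tuple(t * x for x in point)) == t ** n * values[n]
    for n in range(len(point) + 1)
)
```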

We next generalize the relationship between roots and symmetric polynomials,
observed for polynomials of degrees 3 and 4 in one variable, to polynomials
of any degree. We also note an interrelationship between the polynomials
$\sigma_n(\xi_1, \ldots, \xi_k)$ for different choices of k.

5.28 Theorem. (i) If $x_1, \ldots, x_n \in D$ then

    $$(\xi - x_1) \cdots (\xi - x_n) = \sum_{i=0}^{n} (-1)^i \sigma_i(x_1, \ldots, x_n)\,\xi^{n-i}.$$

(ii) If $0 \leq n \leq k - 1$ then

    $$\sigma_n(\xi_1, \ldots, \xi_{k-1}, 0) = \sigma_n(\xi_1, \ldots, \xi_{k-1}).$$

Proof. We leave the proof of (i) to the reader. In (ii) it is seen from the
notation that we are dealing with the symmetric polynomial of degree n in
$\xi_1, \ldots, \xi_k$ on the left-hand side and with the corresponding
polynomial in $\xi_1, \ldots, \xi_{k-1}$ on the right. The proof of (ii) can
be seen in several ways. The first would be by direct inspection of the
definition 5.26(ii), or its alternative form (5:2-1). From the latter we see
that

    (1)  $$\sigma_n(\xi_1, \ldots, \xi_{k-1}, 0) = \sum_{X \in M_n'} \Big( \prod_{l \in X} \xi_l \Big),$$

where $X \in M_n'$ if and only if $X = \{l_1, l_2, \ldots, l_n\}$ with
$1 \leq l_1 < l_2 < \cdots < l_n \leq k - 1$, since all $X \in M_n$ such that
$k \in X$ contribute 0 to the sum. Then the conclusion is immediate from (1).
Another argument is as follows. Consider a (k + 1)-fold transcendental
extension $D[\xi, \xi_1, \ldots, \xi_k]$, and the k-fold transcendental
extension $D[\xi, \xi_1, \ldots, \xi_{k-1}]$. Applying (i) of this theorem to
these gives

    (2)  $$(\xi - \xi_1) \cdots (\xi - \xi_{k-1})(\xi - \xi_k) = \sum_{i=0}^{k} (-1)^i \sigma_i(\xi_1, \ldots, \xi_{k-1}, \xi_k)\,\xi^{k-i}$$

    (3)  $$(\xi - \xi_1) \cdots (\xi - \xi_{k-1}) = \sum_{i=0}^{k-1} (-1)^i \sigma_i(\xi_1, \ldots, \xi_{k-1})\,\xi^{k-1-i}.$$

From (2) we have

    $$(\xi - \xi_1) \cdots (\xi - \xi_{k-1})\,\xi = \sum_{i=0}^{k} (-1)^i \sigma_i(\xi_1, \ldots, \xi_{k-1}, 0)\,\xi^{k-i}$$

and from (3)

    $$(\xi - \xi_1) \cdots (\xi - \xi_{k-1})\,\xi = \sum_{i=0}^{k-1} (-1)^i \sigma_i(\xi_1, \ldots, \xi_{k-1})\,\xi^{k-i}.$$

By the unicity of representation of polynomials in
$D[\xi, \xi_1, \ldots, \xi_{k-1}]$ as polynomials in $\xi$ with coefficients
in $D[\xi_1, \ldots, \xi_{k-1}]$, we obtain the desired result. [Note that
there is no conflict, since

    $$\sigma_k(\xi_1, \ldots, \xi_{k-1}, \xi_k) = \xi_1 \cdots \xi_{k-1}\xi_k$$

gives $\sigma_k(\xi_1, \ldots, \xi_{k-1}, 0) = 0$.]


The surprising fact about the elementary symmetric polynomials is that every
symmetric polynomial can be represented as a combination of them. For
example,

    $$5\xi_1 + 5\xi_2 + 5\xi_3 + \xi_1^2 + \xi_2^2 + \xi_3^2
      = 5\sigma_1(\xi_1, \xi_2, \xi_3) + \sigma_1(\xi_1, \xi_2, \xi_3)^2 - 2\sigma_2(\xi_1, \xi_2, \xi_3)$$
    $$= g(\sigma_1(\xi_1, \xi_2, \xi_3), \sigma_2(\xi_1, \xi_2, \xi_3)),$$

where $g(\xi_1, \xi_2) = 5\xi_1 + \xi_1^2 - 2\xi_2$. We can also write this as
a polynomial
$h(\sigma_1(\xi_1, \xi_2, \xi_3), \sigma_2(\xi_1, \xi_2, \xi_3), \sigma_3(\xi_1, \xi_2, \xi_3))$
in all the (nontrivial) elementary symmetric polynomials in
$\xi_1, \xi_2, \xi_3$ by taking
$h(\xi_1, \xi_2, \xi_3) = 5\xi_1 + \xi_1^2 - 2\xi_2 + 0 \cdot \xi_3$. The
general statement here is the following.

The fundamental theorem on symmetric polynomials.

5.29 Theorem. If $f(\xi_1, \ldots, \xi_k)$ is a symmetric polynomial in
$\xi_1, \ldots, \xi_k$ then we can find
$g(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ such that

    $$f(\xi_1, \ldots, \xi_k) = g(\sigma_1(\xi_1, \ldots, \xi_k), \ldots, \sigma_k(\xi_1, \ldots, \xi_k)).$$

Proof. For simplicity of notation,

(1) we write $\sigma_l$ for $\sigma_l(\xi_1, \ldots, \xi_k)$.

Note that each $\sigma_l$ is homogeneous of degree l, so that for any j (and
no matter what k is),

(2) $\deg(\sigma_l^j) = l \cdot j$

[by Exercise 2(b) below]. In the terms of (1), the statement of our theorem
takes the following form:

(3) if $f(\xi_1, \ldots, \xi_k)$ is symmetric in $\xi_1, \ldots, \xi_k$ and
$\deg(f(\xi_1, \ldots, \xi_k)) = n$ then we can find
$g(\xi_1, \ldots, \xi_k)$ with
$f(\xi_1, \ldots, \xi_k) = g(\sigma_1, \ldots, \sigma_k)$.


We proceed by induction on k; we call this the primary induction. For k = 1
any polynomial $f(\xi_1)$ is symmetric in $\xi_1$. Furthermore,
$\sigma_1(\xi_1) = \xi_1$, so we can take $g(\xi_1) = f(\xi_1)$. Suppose that
(3) holds for k - 1 and any n, where k > 1; this is the primary induction
hypothesis. We now prove it for k and any n by induction on n; we call this
the secondary induction. For n = 0, $f(\xi_1, \ldots, \xi_k) \in D$ and we can
again take $g(\xi_1, \ldots, \xi_k) = f(\xi_1, \ldots, \xi_k)$. Suppose that
n > 0, and suppose that the result is true for all polynomials in
$\xi_1, \ldots, \xi_k$ of degree < n; this is the secondary induction
hypothesis. Consider $f(\xi_1, \ldots, \xi_k)$ symmetric of degree n. Then if
we set

(4) $f_0(\xi_1, \ldots, \xi_{k-1}) = f(\xi_1, \ldots, \xi_{k-1}, 0)$,

we see that

(5) $f_0(\xi_1, \ldots, \xi_{k-1})$ is a symmetric polynomial in
$\xi_1, \ldots, \xi_{k-1}$ with
$\deg(f_0(\xi_1, \ldots, \xi_{k-1})) = n_0 \leq n$.

For, any permutation $H_0$ of $\{1, \ldots, k - 1\}$ can be extended to a
permutation H of $\{1, \ldots, k\}$ by setting $H(k) = k$. Hence, by the
primary induction hypothesis on k - 1,

(6) we can find $g_0(\xi_1, \ldots, \xi_{k-1})$ such that
$f_0(\xi_1, \ldots, \xi_{k-1}) = g_0(\sigma_1(\xi_1, \ldots, \xi_{k-1}), \ldots, \sigma_{k-1}(\xi_1, \ldots, \xi_{k-1}))$.

By 5.28(ii),
$\sigma_l(\xi_1, \ldots, \xi_{k-1}) = \sigma_l(\xi_1, \ldots, \xi_{k-1}, 0)$
for each $l \leq k - 1$. Thus

(7) if $\sigma_l' = \sigma_l(\xi_1, \ldots, \xi_{k-1}, 0)$, we have
$f_0(\xi_1, \ldots, \xi_{k-1}) = g_0(\sigma_1', \ldots, \sigma_{k-1}')$.

Note that $f_0(\xi_1, \ldots, \xi_{k-1})$ is just the first coefficient in
the representation of $f(\xi_1, \ldots, \xi_k)$ as a polynomial in $\xi_k$.
Since $f_0(\xi_1, \ldots, \xi_{k-1})$ is only symmetric in
$\xi_1, \ldots, \xi_{k-1}$, we cannot expect that
$f(\xi_1, \ldots, \xi_k) - f_0(\xi_1, \ldots, \xi_{k-1})$ is symmetric in
$\xi_1, \ldots, \xi_k$. However, the representation (7) suggests that we
consider the following closely related difference.

(8) Let $f_1(\xi_1, \ldots, \xi_k) = f(\xi_1, \ldots, \xi_k) - g_0(\sigma_1, \ldots, \sigma_{k-1})$.
Then $f_1(\xi_1, \ldots, \xi_k)$ is symmetric in $\xi_1, \ldots, \xi_k$ and
$\deg(f_1(\xi_1, \ldots, \xi_k)) \leq n$.

The symmetry of $f_1(\xi_1, \ldots, \xi_k)$ is obvious, since both
$f(\xi_1, \ldots, \xi_k)$ and
$g_0(\sigma_1(\xi_1, \ldots, \xi_k), \ldots, \sigma_{k-1}(\xi_1, \ldots, \xi_k))$
are symmetric in $\xi_1, \ldots, \xi_k$. As to the degree, it is sufficient
to show that $\deg(g_0(\sigma_1, \ldots, \sigma_{k-1})) \leq n_0$.

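We know that $g_0(\xi_1, \ldots, \xi_{k-1})$ can be written as a sum of terms
of the form $b_{j_1,\ldots,j_{k-1}}\,\xi_1^{j_1} \cdots \xi_{k-1}^{j_{k-1}}$,
with $b_{j_1,\ldots,j_{k-1}} \neq 0$. Then
$g_0(\sigma_1', \ldots, \sigma_{k-1}')$ is the corresponding sum of terms
$b_{j_1,\ldots,j_{k-1}}\,\sigma_1'^{\,j_1} \cdots \sigma_{k-1}'^{\,j_{k-1}}$.
According to (2), which is independent of k, the total degree of each such
term is $m_{j_1,\ldots,j_{k-1}} = j_1 + 2j_2 + \cdots + (k - 1)j_{k-1}$
where, by (5)-(7), $m_{j_1,\ldots,j_{k-1}} \leq n_0$. Also
$g_0(\sigma_1, \ldots, \sigma_{k-1})$ is the corresponding sum of terms

    $$b_{j_1,\ldots,j_{k-1}}\,\sigma_1^{j_1} \cdots \sigma_{k-1}^{j_{k-1}}$$

and the total degree of such terms remains equal to
$m_{j_1,\ldots,j_{k-1}}$. Hence we have
$\deg(g_0(\sigma_1, \ldots, \sigma_{k-1})) \leq n_0$. (In fact, it is easily
seen that we have equality here.)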

Now it follows from (4)-(8) that

(9) $f_1(\xi_1, \ldots, \xi_{k-1}, 0) = 0$.

Thus if we regard $f_1(\xi_1, \ldots, \xi_k)$ as an element of
$(D[\xi_1, \ldots, \xi_{k-1}])[\xi_k]$, it is a polynomial in $\xi_k$ with
coefficients in $D[\xi_1, \ldots, \xi_{k-1}]$ which has the root 0. But then
by 5.13, $\xi_k \mid f_1(\xi_1, \ldots, \xi_k)$ in $D[\xi_1, \ldots, \xi_k]$.
In other words,

(10) we can find a polynomial $h_k(\xi_1, \ldots, \xi_k)$ in
$D[\xi_1, \ldots, \xi_k]$ such that
$f_1(\xi_1, \ldots, \xi_k) = \xi_k h_k(\xi_1, \ldots, \xi_k)$.

Now consider any permutation H of $\{1, \ldots, k\}$ with $H(k) = k - 1$.
Then

    $$f_1(\xi_1, \ldots, \xi_k) = f_1(\xi_{H(1)}, \ldots, \xi_{H(k)})
      = \xi_{k-1} h_k(\xi_{H(1)}, \ldots, \xi_{H(k-1)}, \xi_{H(k)}),$$

which we write as $\xi_{k-1} h^*(\xi_1, \ldots, \xi_k)$. By (9) it follows
that $h^*(\xi_1, \ldots, \xi_{k-1}, 0) = 0$, so that by the same argument as
above, $\xi_k \mid h^*(\xi_1, \ldots, \xi_k)$ in $D[\xi_1, \ldots, \xi_k]$.
Thus we can find a polynomial $h_{k-1}(\xi_1, \ldots, \xi_k)$ such that
$f_1(\xi_1, \ldots, \xi_k) = \xi_{k-1}\xi_k h_{k-1}(\xi_1, \ldots, \xi_k)$. By
repeating this permutation argument with $H(k) = k - 1$,
$H(k - 1) = k - 2$, we can then find $h_{k-2}(\xi_1, \ldots, \xi_k)$ with

    $$f_1(\xi_1, \ldots, \xi_k) = \xi_{k-2}\xi_{k-1}\xi_k h_{k-2}(\xi_1, \ldots, \xi_k).$$

Since $\sigma_k(\xi_1, \ldots, \xi_k) = \xi_1\xi_2 \cdots \xi_k$, we
eventually obtain the following.

(11) We can find $h(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$ such
that $f_1(\xi_1, \ldots, \xi_k) = \sigma_k \cdot h(\xi_1, \ldots, \xi_k)$.
$h(\xi_1, \ldots, \xi_k)$ is symmetric in $\xi_1, \ldots, \xi_k$ and
$\deg(h(\xi_1, \ldots, \xi_k)) \leq n - k$.

The second part of this conclusion is seen from (8) and
$\deg(\sigma_k) = k$. Also we must have symmetry because for any permutation
H of $\{1, \ldots, k\}$,

    $$\sigma_k \cdot h(\xi_1, \ldots, \xi_k) = f_1(\xi_1, \ldots, \xi_k)
      = f_1(\xi_{H(1)}, \ldots, \xi_{H(k)}) = \sigma_k \cdot h(\xi_{H(1)}, \ldots, \xi_{H(k)});$$

hence by the symmetry of $f_1$ and cancellation in an integral domain, we
must have $h(\xi_1, \ldots, \xi_k) = h(\xi_{H(1)}, \ldots, \xi_{H(k)})$. It
follows from our secondary induction hypothesis that

(12) we can find $g_1(\xi_1, \ldots, \xi_k)$ such that
$f_1(\xi_1, \ldots, \xi_k) = \sigma_k \cdot g_1(\sigma_1, \ldots, \sigma_k)$.

We can now conclude the secondary induction step, for by (8) we have

(13) $f(\xi_1, \ldots, \xi_k) = \sigma_k \cdot g_1(\sigma_1, \ldots, \sigma_k) + g_0(\sigma_1, \ldots, \sigma_{k-1})$.

If we regard $g_0(\xi_1, \ldots, \xi_{k-1})$ as a polynomial in
$\xi_1, \ldots, \xi_k$,

    $$g_0(\xi_1, \ldots, \xi_{k-1}) = g_0^*(\xi_1, \ldots, \xi_k),$$

then the polynomial

    $$g(\xi_1, \ldots, \xi_k) = \xi_k\,g_1(\xi_1, \ldots, \xi_k) + g_0^*(\xi_1, \ldots, \xi_k)$$

is such that $f(\xi_1, \ldots, \xi_k) = g(\sigma_1, \ldots, \sigma_k)$. This
concludes the secondary induction step. Thus, by induction, (3) holds for k
and polynomials of any degree n. Since this completes the primary induction
step, we thus see that (3) holds true for any k and n.
The proof also provides us with a systematic method for representing any
given symmetric polynomial as a polynomial in the elementary symmetric
polynomials. However, the computations involved are quite laborious even for
fairly simple symmetric polynomials $f(\xi_1, \ldots, \xi_k)$. Less
complicated techniques for treating some special cases are discussed in the
exercises.
Because of the relationship 5.28(i) between the coefficients, roots, and the
elementary symmetric polynomials in these roots of a polynomial in one
variable, the fundamental theorem 5.29 turns out to have a number of
important consequences concerning the solvability of algebraic equations. We
shall discuss these in later chapters. It can be shown that the polynomial
$g(\xi_1, \ldots, \xi_k)$ can be chosen in only one way to satisfy 5.29 for
given $f(\xi_1, \ldots, \xi_k)$. However, this uniqueness result provides no
additional information in the further applications of 5.29; we thus omit the
proof.
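A representation as in 5.29 can be computed mechanically. The sketch below does not follow the double induction of the proof above; it uses instead the lexicographic leading-term reduction commonly attributed to Gauss, which is a different but standard procedure. It assumes integer coefficients and stores a polynomial in k variables as a dictionary from exponent tuples to nonzero coefficients; the resulting g is stored the same way, with the l-th exponent read as the power of $\sigma_l$. All names are illustrative.

```python
from itertools import combinations

def p_add(f, g, scale=1):
    """Return f + scale * g, dropping terms whose coefficient becomes 0."""
    out = dict(f)
    for e, c in g.items():
        v = out.get(e, 0) + scale * c
        if v:
            out[e] = v
        else:
            out.pop(e, None)
    return out

def p_mul(f, g):
    """Product of two polynomials in the dictionary representation."""
    out = {}
    for e1, c1 in f.items():
        for e2, c2 in g.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c}

def sigma(n, k):
    """The nth elementary symmetric polynomial in k variables."""
    return {tuple(1 if i in X else 0 for i in range(k)): 1
            for X in combinations(range(k), n)}

def in_elementary(f, k):
    """Express a symmetric f as g(sigma_1, ..., sigma_k)."""
    g = {}
    f = {e: c for e, c in f.items() if c}
    while f:
        lead = max(f)                 # lexicographically largest exponent
        c = f[lead]
        # powers of sigma_1..sigma_k whose product has leading term `lead`
        d = tuple(lead[i] - (lead[i + 1] if i + 1 < k else 0)
                  for i in range(k))
        if min(d) < 0:
            raise ValueError("input is not symmetric")
        term = {(0,) * k: 1}
        for n, power in enumerate(d, start=1):
            for _ in range(power):
                term = p_mul(term, sigma(n, k))
        f = p_add(f, term, scale=-c)  # cancel the leading term
        g[d] = g.get(d, 0) + c
    return g

# 5(xi_1 + xi_2 + xi_3) + xi_1^2 + xi_2^2 + xi_3^2
example = {(1, 0, 0): 5, (0, 1, 0): 5, (0, 0, 1): 5,
           (2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}
g_example = in_elementary(example, 3)
```

For the example treated before 5.29, the result reads $\sigma_1^2 - 2\sigma_2 + 5\sigma_1$.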

Exercise Group 5.2

1. Prove Theorem 5.24.

2. Prove the following generalizations of 5.11, for
$f(\xi_1, \ldots, \xi_k), g(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$:

(a) $\deg(f(\xi_1, \ldots, \xi_k) + g(\xi_1, \ldots, \xi_k)) \leq \max(\deg(f(\xi_1, \ldots, \xi_k)), \deg(g(\xi_1, \ldots, \xi_k)))$;
(b) if $f(\xi_1, \ldots, \xi_k) \neq 0$ and $g(\xi_1, \ldots, \xi_k) \neq 0$
then

    $$\deg(f(\xi_1, \ldots, \xi_k) \cdot g(\xi_1, \ldots, \xi_k)) = \deg(f(\xi_1, \ldots, \xi_k)) + \deg(g(\xi_1, \ldots, \xi_k)).$$

3. (a) Show that 5.14 does not generalize to polynomials in several variables
by giving an example of $f(\xi_1, \xi_2) \in I[\xi_1, \xi_2]$ such that
$f(\xi_1, \xi_2) \neq 0$ but $f(\xi_1, \xi_2)$ has infinitely many roots in I.
(b) Show that, nevertheless, 5.15 does generalize to polynomials in several
variables (even though 5.14 is essential to the proof of 5.15): if D is
infinite,
$f(\xi_1, \ldots, \xi_k), g(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$
and

    $$f(x_1, \ldots, x_k) = g(x_1, \ldots, x_k) \text{ for all } x_1, \ldots, x_k \in D$$

then

    $$f(\xi_1, \ldots, \xi_k) = g(\xi_1, \ldots, \xi_k).$$

4. Prove Theorem 5.28(i).

5. Represent
$\xi_1^2\xi_2 + \xi_1^2\xi_3 + \xi_2^2\xi_1 + \xi_2^2\xi_3 + \xi_3^2\xi_1 + \xi_3^2\xi_2$
as $g(\sigma_1, \sigma_2, \sigma_3)$, where we write $\sigma_l$ for
$\sigma_l(\xi_1, \xi_2, \xi_3)$.
6. Let k, m be arbitrary positive integers and set
$\tau_m = \sum_{j=1}^{k} \xi_j^m$. Put $\tau_0 = m$. Show that

    $$\sum_{j=0}^{n} (-1)^j \tau_{m-j}\,\sigma_j = 0,$$

where n = min (k, m), and $\sigma_j$ is $\sigma_j(\xi_1, \ldots, \xi_k)$. Use
this to represent
£j=i £y as a polynomial in the elementary symmetric polynomials in
G> G) G-
7. Verify that

(G — G)2(G — G)2(G — G)2


2 2 3 3 2
= <7i(J2 — 4<T2 — dcrio-g — 27(13 + l8ai<T2cr3-
CHAPTER 6

THE RATIONAL NUMBERS AND FIELDS

6.1 Toward extending integral domains. Algebraic motivations. The source of
the concept of division as the basis for extending integral domains is
twofold. One approach to it is essentially algebraic, the other essentially
geometric. We have seen at the beginning of Chapter 4 that by means of
subtraction we can simplify the question of existence of solutions of
certain pairs of equations. The general form of such equations is

    (6:1-1)  $a_1x + b_1y = c_1$,
             $a_2x + b_2y = c_2$,

where $a_1, a_2, b_1, b_2, c_1, c_2$ are given elements of some integral
domain D. If we multiply the first equation by $b_2$, the second by $b_1$,
and then subtract one equation from the other we see that any $x \in D$ for
which there exists a $y \in D$ satisfying (6:1-1) must also satisfy

    (6:1-2)  $(a_1b_2 - a_2b_1)x = c_1b_2 - c_2b_1$.

Similarly we reach a condition which y must satisfy. Both these new
equations have the general form

    (6:1-3)  $bx = a$.

Now if b = 0, there exists an $x \in D$ satisfying (6:1-3) if and only if
a = 0; and, if this is the case, every $x \in D$ is a solution of (6:1-3). If
$b \neq 0$, we know that there is at most one solution of (6:1-3); for in an
integral domain, if $bx_1 = bx_2$ and $b \neq 0$, then $x_1 = x_2$. However,
it may well be that $b \neq 0$ and yet there is no solution at all to (6:1-3)
(as simple examples in the integers show).
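The elimination leading to (6:1-2) and the three possible outcomes for (6:1-3) can be illustrated in the integers. This is a minimal sketch, assuming D is the integers; the function names and the sample systems are illustrative only.

```python
def eliminate_y(a1, b1, c1, a2, b2, c2):
    """Return (b, a) such that any solution x of (6:1-1) satisfies b*x = a."""
    return a1 * b2 - a2 * b1, c1 * b2 - c2 * b1

def solve_in_integers(b, a):
    """Solve b*x = a in the integers.

    Returns "all" if every integer is a solution, the unique solution if
    there is one, and None if there is no solution.
    """
    if b == 0:
        return "all" if a == 0 else None
    q, r = divmod(a, b)
    return q if r == 0 else None

# x + y = 3, x - y = 1: elimination gives -2x = -4, so x = 2.
b, a = eliminate_y(1, 1, 3, 1, -1, 1)
x = solve_in_integers(b, a)

# 2x = 3 has b != 0 and yet no solution in the integers.
no_solution = solve_in_integers(2, 3)
```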


It would at least be a formal advantage for representing the solutions
of certain equations if we had an integral domain K, containing the given
domain D, satisfying

(6:1-4) for any a, b e D, if b 9^ 0 then there exists x e K such that


bx = a.

In this case we would say that a is divisible by b in K. Suppose further


that we are able to construct the domain K in such a way that its elements
183
184 THE RATIONAL NUMBERS AND FIELDS [CHAP. 6

arise only as solutions of equations of the form (6:1-4), i.e., that

(6:1-5) for any u e K there exist a, 6 G D with b 9^ 0 and bu = a.

This is plausible so far as D is concerned, for any u G D is the solution


of the equation 1 u = a, when we take a = u. Then we claim that, as
a consequence of (6:1-4) and (6:1-5), for any u, v e K, with u 9^ 0, v
must be divisible by u in K, if K is an integral domain. For we have
certain ai, a2, bi, b2 G D with bi 9* 0, b2 9^ 0 and b\U = a\, b2v = a2
by (6:1-5). Since b\ 9^ 0 and u 9^ 0 we have 9^ 0; hence b2a\ 9^ 0.
But then by (6:1-4) there is an x e K with b2aix — bia2. Substitute
b\U for a\ and b2v for a2 here. This gives bib2ux = b1b2v. Since bib2 9^ 0,
we can cancel to give ux = v, which shows that v is divisible by u in K.
Reformulating this, we obtain

(6:1-6) for any a, b G K, if b 9^ 0 then there exists x £ K such that


bx = a.

One of the main objects of this chapter will be to show that given any
integral domain D, we can construct another domain K such that D forms
a subdomain of K and K satisfies (6:1-4) and (6:1-5) or, equivalently, as
we have just seen, (6:1-6) and (6:1-5). Further we shall see that such a
K is uniquely determined up to isomorphism. In particular, any system
thus associated with the integers will serve the purpose that we have in
mind for the rational numbers. Thus we can speak of the quotient a/b of
two integers a, b ($b \neq 0$) in an algebraically well-defined and consistent
sense, as being the result of division in such an extension.

Geometric motivations. The geometric approach to the notion of division has to do with the attempt to apply the process of counting to the measurement of straight line segments. If we take a fixed length as unit of measurement, then we ascribe a positive integer n as length to any line segment which can be subdivided into n equal segments of the given length. As a practical question it is seen that, in this sense, we could only very rarely ascribe a definite length to an arbitrarily given line segment. However, we could measure a larger class of segments if we took a shorter unit of length. Suppose this were chosen in such a way that the original unit had length b as measured with respect to this new unit, where $b \in P$. Suppose further that we are fortunate enough to be measuring a segment which has length exactly a, where $a \in P$, with respect to this new unit. Then we should say that the ratio of the length of this segment to the original unit is a to b. Hence we ascribe to it the “length” a/b in terms of the original unit.
6.1] TOWARD EXTENDING INTEGRAL DOMAINS 185

The relationship between these two approaches is the following. If we assume certain geometric conditions concerning measurement of line segments as evident, the above assignment of formal ratios will satisfy certain algebraic conditions. For example, we should have (for $a, b, c, d \in P$)

(6:1-7) (i) $\frac{a}{b} = \frac{c}{b}$ if and only if $a = c$;
(ii) $\frac{a}{b} = \frac{a \cdot c}{b \cdot c}$;
(iii) $\frac{a}{b} + \frac{c}{b} = \frac{a + c}{b}$;
(iv) $\frac{a}{b} < \frac{c}{b}$ if and only if $a < c$.

The first of these is intuitively evident. The second is seen to hold by considering a further subdivision of the smaller unit (which makes up the given unit of length b times) into c equal parts. The third condition should hold for any operation + on lengths which, if $l_1$ is the length of a given segment $P_1P_2$ and $l_2$ is the length of $P_2P_3$, where $P_2$ is between $P_1$ and $P_3$ on a straight line, is to give $l_1 + l_2$ as the length of $P_1P_3$. The last condition should hold if < is to be a relation between lengths such that if $l_1, l_2$ are lengths then $l_1 < l_2$ if and only if there are points $P_1, P_2, P_3$ on a straight line with $P_2$ between $P_1$ and $P_3$, $l_1$ is the length of $P_1P_2$, and $l_2$ is the length of $P_1P_3$. It follows from (6:1-7) that (for $a, b, c, d \in P$)

(6:1-8) (i) $\frac{a}{b} = \frac{c}{d}$ if and only if $ad = bc$;
(ii) $\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$;
(iii) $\frac{a}{b} < \frac{c}{d}$ if and only if $ad < bc$.

For if $a/b = c/d$, then $ad/bd = bc/bd$ by two applications of (6:1-7)(ii); then $ad = bc$ by (6:1-7)(i). Since these steps can be retraced by the same conditions, we see that the equivalence in (i) above holds. For (ii) we write

$\frac{a}{b} + \frac{c}{d} = \frac{ad}{bd} + \frac{bc}{bd} = \frac{ad + bc}{bd}$

by (6:1-7)(ii) and (iii). Similarly, we can obtain (iii) from (6:1-7)(i), (ii), and (iv). If we apply (6:1-8) to ratios a/1, c/1 we see that the system of ratios a/1 under +, < is isomorphic to the positive integers under +, <. Further properties of + on arbitrary ratios a/b, such as commutativity, associativity, cancellation, etc., can be seen to follow from (6:1-8).
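
The three rules of (6:1-8) can be checked mechanically on small positive integers. This is a minimal sketch, assuming Python's `fractions.Fraction` as an independent model of ratios; the function names `eq`, `add` and `less` are illustrative only.

```python
from fractions import Fraction

def eq(a, b, c, d):        # (6:1-8)(i)
    return a * d == b * c

def add(a, b, c, d):       # (6:1-8)(ii), returned as an unreduced pair
    return (a * d + b * c, b * d)

def less(a, b, c, d):      # (6:1-8)(iii), for positive b and d
    return a * d < b * c

checks = []
for a in range(1, 6):
    for b in range(1, 6):
        for c in range(1, 6):
            for d in range(1, 6):
                n, m = add(a, b, c, d)
                checks.append(
                    eq(a, b, c, d) == (Fraction(a, b) == Fraction(c, d))
                    and Fraction(n, m) == Fraction(a, b) + Fraction(c, d)
                    and less(a, b, c, d) == (Fraction(a, b) < Fraction(c, d))
                )
print(all(checks))
```

The check uses only products and sums of positive integers on the left-hand sides, which is the point of (6:1-8).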

Similar considerations on the use, first, of positive integers, and then of ratios to measure areas lead to the condition

(6:1-9) $\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}$.

Here the guiding ideas are that units of area measurement should be squares, and that the area of a rectangle with lengths $l_1, l_2$ as sides should be $l_1 \cdot l_2$. If this were pursued in detail, it would be seen that we obtain a system $(Ra_1, +, \cdot, <, 1)$ from $(P, +, \cdot, <, 1)$ such that

(6:1-10) (i) $(P, +, \cdot, <, 1)$ is a subsystem of $(Ra_1, +, \cdot, <, 1)$;
(ii) for every $a, b \in P$ there exists $x \in Ra_1$ with $b \cdot x = a$;
(iii) for every $x \in Ra_1$ there exist $a, b \in P$ with $b \cdot x = a$;
(iv) any statement which holds true for the set of positive elements of ordered integral domains holds true for $Ra_1$.

To be more precise concerning (iv), $Ra_1$ would satisfy the commutative and associative laws for $+$, $\cdot$, the left-distributive law for $\cdot$ over $+$, and the law $x \cdot 1 = x$ for all $x \in Ra_1$. In addition, cancellation laws for $+$ and $\cdot$ would hold without restriction, as well as ordering conditions corresponding to 4.15(ii)-(iv). Finally, we would have $x < y$ if and only if there is $u \in Ra_1$ with $x + u = y$.
This suggests that if we can construct an extension $Ra_1$ of P satisfying (6:1-10)(i)-(iv), then we should be able to find an integral domain $Ra_2$ extending $Ra_1$, of which $Ra_1$ is the set of positive elements. In fact, the same method which we used to construct the integers from the positive integers could be used to construct such an $Ra_2$ from $Ra_1$. Finally, to bring the geometric approach in full correspondence with the algebraic approach, we would see that the system $Ra_2$ thus obtained is isomorphic to the system Ra of rational numbers. In a sense, conversely, we shall see that the characteristic properties (6:1-8) of fractions are direct consequences of the statement that Ra is an ordered integral domain satisfying (6:1-6).
Historically, the geometric approach to a general treatment of division and ratios antedated the algebraic approach. However, from our present standpoint, the going is smoother if we follow the latter course. Moreover, by doing this, we do not involve ourselves with additional assumptions of a geometrical nature. Indeed, we can reverse the historical sequence by founding geometry through coordinate systems (the method of analytic geometry, in contrast to synthetic geometry). We shall explore this matter a little more fully later. Despite our algebraic procedure (but not in conflict with it), we shall find that an appeal to intuitive geometric notions will be quite helpful occasionally in motivating certain steps.

Fields. We now define more precisely the systems we are trying to construct.

6.1 Definition. A system $(K, +, \cdot, 0, 1)$ is said to be a field if it is an integral domain satisfying the following condition: for any $a, b \in K$ with $b \neq 0$ there exists an $x \in K$ with $b \cdot x = a$. If, in addition, there is a relation < between elements of K which makes K an ordered integral domain then we call K an ordered field. If K forms a field, or ordered field, and is a subsystem of a second system then it is said to be a subfield, respectively ordered subfield, of the second system.

Throughout the remainder of this section, we assume that $(K, +, \cdot, 0, 1)$ is an arbitrary field. As we have already remarked, it follows directly from Definition 4.13 of an integral domain that for any $a, b \in K$ with $b \neq 0$, there exists a unique $x \in K$ with $b \cdot x = a$.

6.2 Definition. For each $a, b \in K$ with $b \neq 0$, we take

$a/b$, or $\frac{a}{b}$,

to be the unique x with $b \cdot x = a$. We take $b^{-1}$ to be $1/b$.

Thus the function $F(a, b) = a/b$ is an operation from $K \times (K - \{0\})$ to K, and $G(b) = b^{-1}$ is an operation from $K - \{0\}$ to K. We shall call a/b the ratio of a to b or the quotient of a (divided) by b. We shall call $b^{-1}$ the inverse of b; the reason for this is that its characteristic property is $b \cdot b^{-1} = 1$.
It is easily seen that if L is a set with $1 \in L$, $L \subseteq K$ and such that whenever $a, b \in L$ we have $a + b$, $a - b$, $a \cdot b \in L$ and, provided $b \neq 0$, also $a/b \in L$, then L forms a subfield of K under the operations of K. Moreover, if K is an ordered field, then also L is automatically an ordered subfield under the same relation.

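6.3 Theorem. For any $a, b, c, d, e \in K$, if $b \neq 0$, $d \neq 0$, and $e \neq 0$, we have:

(i) $\frac{a}{b} = a \cdot (b^{-1})$;
(ii) $\frac{a}{b} = 0$ if and only if $a = 0$;
(iii) $\frac{a}{b} = \frac{c}{d}$ if and only if $a \cdot d = b \cdot c$;
(iv) $\frac{a}{b} = \frac{a \cdot e}{b \cdot e}$;
(v) $\frac{a}{b} + \frac{c}{d} = \frac{a \cdot d + b \cdot c}{b \cdot d}$ (similarly for $-$ instead of $+$);
(vi) $\frac{a}{b} \cdot \frac{c}{d} = \frac{a \cdot c}{b \cdot d}$;
(vii) . . . ;
(viii) if $a \neq 0$ then $\left(\frac{a}{b}\right)^{-1} = \frac{b}{a}$;
(ix) $1^{-1} = 1$.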

Proof. The proofs are straightforward from 6.2. To see (i), we show $b \cdot (a \cdot b^{-1}) = a$; but this follows directly from the fact that $b \cdot b^{-1} = 1$. Part (ii) follows from the basic property of integral domains. For (iv), if $e \neq 0$ also $b \cdot e \neq 0$. Let $a/b = x$; then $b \cdot x = a$, hence $(b \cdot e) \cdot x = a \cdot e$, so that also $x = (a \cdot e)/(b \cdot e)$. Now from (i), (ii), (iv), and simplified versions of (iii), (v), namely:

(iii)' $\frac{a}{b} = \frac{c}{b}$ if and only if $a = c$;
(v)' $\frac{a}{b} + \frac{c}{b} = \frac{a + c}{b}$;

we can obtain (iii), (v) in the same way as we derived the parts of (6:1-8) from (6:1-7). For (iii)' we note from (ii) that $b^{-1} \neq 0$; hence by cancellation, $a \cdot b^{-1} = c \cdot b^{-1}$ if and only if $a = c$. On the other hand, (v)' is a form of distributivity,

$(a \cdot b^{-1}) + (c \cdot b^{-1}) = (a + c) \cdot b^{-1}$.

The proofs of the remaining statements are equally direct.

Definition 6.2 of $b^{-1}$ allows us to convert any statement about inverses into a statement about ratios. 6.3(i) shows us how to eliminate ratios from statements in favor of inverses. Various familiar properties of inverses and ratios are easily obtained from the remaining parts of 6.3 by applying these principles. For example $b^{-1} \neq 0$ if $b \neq 0$. Further,

Also, if $c \neq 0$,

$(a/b)/(c/d) = (a/b) \cdot (c/d)^{-1} = (a/b) \cdot (d/c) = \frac{a \cdot d}{b \cdot c}$.

6.4 Definition. We define $x^{-m}$ for $x \in K$, $x \neq 0$, and $m \in P$ by

$x^{-m} = (x^{-1})^m$.

6.5 Theorem. Suppose that $x, y \in K$, $x \neq 0$, $y \neq 0$, and that $m, n \in I$. Then $x^{-m} = (x^{-1})^m = (x^m)^{-1}$. Furthermore, all parts of 4.12 under the extended definition of exponentiation provided by 6.4 continue to hold except for 4.12(iv).

Proof. Note that our first statement is required to hold for all $m \in I$, not only $m \in P$ as in the definition. To extend $x^{-m} = (x^{-1})^m$ to negative m (it is trivial for $m = 0$), suppose that $m = -k$ where $k \in P$. Then $(x^{-1})^m = (x^{-1})^{-k} = ((x^{-1})^{-1})^k = x^k = x^{-m}$. To see that $x^{-m} = (x^m)^{-1}$ for $m \in P$, note that $x^m \cdot (x^{-1})^m = (x \cdot x^{-1})^m = 1^m = 1$. Again for $m = 0$ this is clear. For $m = -k$, where $k \in P$, we have

$(x^{-1})^m = (x^{-1})^{-k} = ((x^{-1})^{-1})^k = ((x^{-1})^k)^{-1} = (x^{-k})^{-1} = (x^m)^{-1}$.

The remainder of this theorem is reduced to this first part and to 4.32 (the version of 4.12 allowing $m = 0$ or $n = 0$) by consideration of cases. For example, to prove that $x^{m+n} = x^m \cdot x^n$, it can be seen that it is sufficient to treat the cases $m + n > 0$. We can restrict ourselves further to the case $n = 0$ or $-n \in P$. Then $x^{m+n} \cdot x^{-n} = x^{(m+n)+(-n)} = x^m$; hence, by $x^{-n} \cdot x^n = 1$, $x^{m+n} = x^m \cdot x^n$. The verification of the remaining parts is equally direct (and mildly tedious).

The familiar property $x^{m-n} = x^m/x^n$ follows directly from 6.5.
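
The extended exponent laws can be spot-checked in a particular field. The sketch below assumes the rational numbers as modeled by `fractions.Fraction`; the function `power` is a hypothetical helper that follows Definition 6.4 for negative exponents rather than relying on Python's built-in `**`.

```python
from fractions import Fraction

def power(x, m):
    """x**m for an integer m, using x^(-k) = (x^(-1))^k when m = -k < 0."""
    if m >= 0:
        result = Fraction(1)
        for _ in range(m):
            result *= x
        return result
    return power(1 / x, -m)

x = Fraction(-2, 3)
ok = all(
    power(x, -m) == power(1 / x, m) == 1 / power(x, m)
    and power(x, m + n) == power(x, m) * power(x, n)
    for m in range(-4, 5)
    for n in range(-4, 5)
)
print(ok)
```

A finite check of this kind is of course no substitute for the case analysis in the proof; it only illustrates what the statement says.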

Ordered fields; dense orderings. We now wish to see what properties fractions have with respect to an ordering relation. We thus assume in the following statements that $(K, +, \cdot, <, 0, 1)$ is any ordered field.

6.6 Theorem. For any $a, b, c, d \in K$, if $b \neq 0$, $d \neq 0$, we have:

(i) $0 < \frac{a}{b}$ if and only if $0 < a \cdot b$;
(ii) $\frac{a}{b} < \frac{c}{d}$ if and only if $(a \cdot d) \cdot (b \cdot d) < (b \cdot c) \cdot (b \cdot d)$;
(iii) if $0 < b \cdot d$, then $\frac{a}{b} < \frac{c}{d}$ if and only if $a \cdot d < b \cdot c$;
(iv) $0 < d < b$ if and only if $0 < \frac{1}{b} < \frac{1}{d}$.

Proof. In any ordered domain, for $b \neq 0$ we have $0 < b^2$. Further $1/b \neq 0$, hence also $0 < (1/b)^2 = 1/b^2$. Thus (i) is proved by multiplying the left inequality by $b^2$, the right inequality by $1/b^2$. The proofs of the remaining parts are direct and are left to the reader.
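
Part (ii) is the form of the comparison rule that remains valid when denominators may be negative. As a hedged illustration, assuming the rationals modeled by `fractions.Fraction` and a hypothetical helper `less_by_6_6_ii`, one can compare the product criterion with the built-in ordering:

```python
from fractions import Fraction

def less_by_6_6_ii(a, b, c, d):
    """Decide a/b < c/d using only products, as in 6.6(ii); b, d may be negative."""
    return (a * d) * (b * d) < (b * c) * (b * d)

values = list(range(-4, 5))
denominators = [n for n in values if n != 0]
agree = all(
    less_by_6_6_ii(a, b, c, d) == (Fraction(a, b) < Fraction(c, d))
    for a in values for b in denominators
    for c in values for d in denominators
)
print(agree)
```

The extra factor $b \cdot d$ on both sides amounts to multiplying the inequality by the positive element $(b \cdot d)^2$.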

The ordering relations in ordered fields have an interesting special property. The corresponding property for arbitrary ordered systems is introduced in the next definition.

6.7 Definition. A set S is said to be densely ordered by a relation <, and (S, <) is said to be a densely ordered system, if the following conditions hold:
(i) (S, <) is a simply ordered system;
(ii) for any $x, y \in S$ with $x < y$ there is a $z \in S$ with $x < z < y$.

6.8 Theorem. K is densely ordered by <.

Proof. We need only verify, for any $x, y \in K$, that $x < y$ implies $x < (x + y)/2 < y$. By 6.6(iii), the first inequality is equivalent to $2x < x + y$, the second to $x + y < 2y$. Each of these follows directly from $x < y$.
Note that in any densely ordered system S, there are infinitely many elements between any two distinct elements x, y. Say, for example, that $x < y$. Then there is a $z_1$ with $x < z_1 < y$, hence a $z_2$ with $x < z_1 < z_2 < y$, hence a $z_3$ with $x < z_1 < z_2 < z_3 < y$, etc. This proof implicitly involves the axiom of choice, for at each stage we must choose one among many intervening elements. (The slightly weaker statement, that for any $n \in P$ there exist n distinct elements $z_1, z_2, \ldots, z_n$ with $x < z_i < y$, can be proved without the axiom of choice.) The proof of 6.8, however, provides us with one rule of choice in ordered fields: $x < (x + y)/2 < y$, hence

$x < \frac{x + y}{2} < \frac{(x + y)/2 + y}{2} < y$, etc.
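
The rule of choice just described is easy to mechanize. The following sketch assumes the rationals modeled by `fractions.Fraction`; the name `between` is illustrative and not from the text.

```python
from fractions import Fraction

def between(x, y, n):
    """Return z_1 < z_2 < ... < z_n strictly between x < y,
    each obtained from the previous one by the rule z -> (z + y)/2."""
    zs = []
    z = x
    for _ in range(n):
        z = (z + y) / 2
        zs.append(z)
    return zs

print(between(Fraction(0), Fraction(1), 3))
# [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)]
```

Because the rule is explicit, no appeal to the axiom of choice is needed to produce the sequence in an ordered field.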

Some finite fields. So far we do not have a proof of the existence of any
fields whatever. We shall prove in the next section a general theorem which
tells us that any integral domain can be extended to a field. However,
before doing that we can verify that certain integral domains that we have
already come across are in fact fields, namely the domains $I_p$ of integers
modulo a prime p (4.60). More generally, we have the following.

6.9 Theorem. Any finite integral domain is a field.

Proof. Let K be a finite integral domain. Consider any element $b \neq 0$, and let $F(x) = b \cdot x$ for each $x \in K$. Then $\mathcal{D}(F) = K$, $\mathcal{R}(F) \subseteq K$. Further F is a one-to-one function, for if $b \cdot x_1 = b \cdot x_2$ then $x_1 = x_2$. If $\mathcal{R}(F) \neq K$, then K is set-theoretically equivalent to a proper subset of itself, namely $\mathcal{R}(F)$. But this contradicts the definition (2:4-5) of finiteness. Hence $\mathcal{R}(F) = K$. It follows that for any $a \in K$ there is an x with $b \cdot x = a$. Since b was chosen arbitrarily, this shows that K forms a field.
This theorem is not an empty generalization of the statement that all $I_p$ are fields, since it can be shown that there are many other finite integral domains. However, it may be of interest to give another proof of 6.9 for these specific domains. Recall that the elements of $I_p$ are the equivalence sets $[k]_p$ for $k \in I$ with respect to the congruence relation $\equiv_p$. Then for $a, b \in I_p$ we have $a = [k]_p$, $b = [m]_p$ for some k, m; further if $b \neq [0]_p$ in $I_p$, $m \not\equiv 0 \pmod{p}$. To show that there is an $x \in I_p$ with $b \cdot x = a$, it is sufficient to find $l \in I$ with $m \cdot l \equiv k \pmod{p}$. By 4.44 and 4.47(i), $(m, p) = 1$, hence there are s, t with $ms + pt = 1$. Then $m(sk) + p(tk) = k$, so that $m(sk) \equiv k \pmod{p}$.
An even easier proof makes use of Exercise 5 of Exercise Group 4.6. It can be seen from there that if $m \not\equiv 0 \pmod{p}$ then $m^{p-1} \equiv 1 \pmod{p}$, hence $m \cdot m^{p-2} \equiv 1 \pmod{p}$. In other words, the equivalence set of $m^{p-2}$ is an inverse for that of m in $I_p$. This is sufficient to show that $I_p$ is a field. As an example, the inverse of $[2]_7$ in $I_7$ is $[2^5]_7 = [32]_7 = [4]_7$.
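
Both proofs for the domains of integers modulo a prime are constructive, and the two ways of finding an inverse can be compared directly. This is a sketch under the assumption that residue classes are represented by integers in the range 0 to p - 1; the function names are hypothetical.

```python
def inverse_by_bezout(m, p):
    """Find s with m*s congruent to 1 (mod p) from m*s + p*t = 1,
    using the extended Euclidean algorithm."""
    old_r, r = m % p, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    assert old_r == 1, "m must be prime to p"
    return old_s % p

def inverse_by_fermat(m, p):
    """Use m^(p-2) as the inverse; valid when p is prime and m is not 0 mod p."""
    return pow(m, p - 2, p)

print(inverse_by_bezout(2, 7), inverse_by_fermat(2, 7))   # 4 4
```

The first function corresponds to the argument from $ms + pt = 1$, the second to the argument from $m^{p-1} \equiv 1 \pmod{p}$.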
It might be suspected that, just as we constructed new integral domains
from old ones by forming homomorphic images, so could we also form new
fields from old fields. As it turns out, however, nothing essentially new can
be gained in this way. This is the content of the following theorem, whose
proof is left as an exercise.

6.10 Theorem. Suppose that $(K, +, \cdot, 0, 1)$ is a field. The only congruence relations in this system are the identity relation and the universal relation $K \times K$. (In other words, if $\equiv$ is a congruence relation for which there are $x, y \in K$ with $x \neq y$ and $x \equiv y$, then for all $x, y \in K$, $x \equiv y$.) Hence every homomorphic mapping G of this system either is an isomorphic mapping or satisfies $G(x) = G(y)$ for all $x, y \in K$.

In Section 6.3, we shall return to a further discussion of fields in general, especially of the algebraic problem of solving equations in fields.

Exercise Group 6.1

1. Prove Theorem 6.6(ii)-(iv).
2. Prove Theorem 6.10.
3. Prove that there is a field consisting of exactly four elements. (Keep in mind Exercise 7, Exercise Group 4.6.) Try to construct satisfactory addition and multiplication tables without going through all details.
4. Suppose that K is a field, $n > 0$, $b_0, \ldots, b_n, c_0, \ldots, c_n \in K$ and that $b_i \neq b_j$ if $i \neq j$. Show that there is $f(\xi) \in K[\xi]$ with $f(b_i) = c_i$ for each $i \leq n$ and $\deg(f(\xi)) \leq n$. (This is the so-called interpolation theorem.) Is $f(\xi)$ uniquely determined by these conditions?

6.2 Fields of quotients. The existence theorem. The idea for constructing a field K from an integral domain D is very similar to that used in 4.21 to construct the integers from the positive integers. To any two elements $a, b \in D$ with $b \neq 0$ should correspond a quotient a/b in K. As we have already indicated in the discussion of (6:1-4)-(6:1-6), K need contain no other elements. In other words, if we can extend D to a field K at all then the set Q of all such quotients already forms a field. Thus we can restrict attention to obtaining such a field Q each element of which corresponds to (various) pairs (a, b) of elements of D, with $b \neq 0$, of which it is the quotient. The condition that two such pairs (a, b), (c, d) thus correspond to the same element of Q is given by 6.3(iii), namely, $a \cdot d = b \cdot c$. This shows us how to define the appropriate equivalence relation W on pairs. Then the elements of Q can be taken, at first, to be the equivalence sets of this relation. Finally, to define appropriate operations on the equivalence sets, we first define operations on pairs, with respect to which we show that W is a congruence relation. How these latter operations are to be defined is suggested directly by the various parts of 6.3. For example, 6.3(v) shows that we should take $(a, b) \oplus (c, d) = (a \cdot d + b \cdot c, b \cdot d)$. These ideas are now carried out in detail in the following theorem.

6.11 Theorem. Given any integral domain $(D, +, \cdot, 0, 1)$ we can construct a field $(Q, +, \cdot, 0, 1)$ with the following properties:
(i) D forms a subdomain of Q with respect to the given operations;
(ii) for any $x \in Q$ there exist $a, b \in D$ with $b \neq 0$ and $x = a/b$.
If, further, < is a relation under which D is an ordered integral domain, then we can extend it to a relation < between elements of Q under which Q is an ordered field, without changing its meaning on D.

Proof. Define

(1) $W = \{((a, b), (a', b')) : (a, b), (a', b') \in D \times (D - \{0\})$ and $a \cdot b' = a' \cdot b\}$.

We shall also write

(2) $(a, b) \equiv (a', b')$ for $((a, b), (a', b')) \in W$.

Then

(3) W is an equivalence relation in $D \times (D - \{0\})$.

To prove this, we must check, as usual, reflexivity, symmetry, and transitivity. Only the third of these is not immediately obvious. Suppose that $a, a', a'', b, b', b'' \in D$, with each of the last three distinct from 0. From $(a, b) \equiv (a', b')$, $(a', b') \equiv (a'', b'')$ we have $ab' = a'b$, $a'b'' = a''b'$. Multiply the first equation by $b'b''$, the second by $bb'$. Then we see that $ab'b'b'' = a''b'bb'$. But $(b')^2 \neq 0$, hence $ab'' = a''b$, showing that $(a, b) \equiv (a'', b'')$.
By (3), we can deal with the equivalence sets $W_{(a,b)}$ of W. As in the proof of 4.21, we shall also write $[(a, b)]$ for these sets and $[a, b]$ where possible. To define appropriate operations on the sets $[a, b]$, we first make the following definitions of operations $\oplus$ and $\circ$ on any elements (a, b), (c, d) of $D \times (D - \{0\})$:

(4) $(a, b) \oplus (c, d) = (a \cdot d + b \cdot c, b \cdot d)$,

(5) $(a, b) \circ (c, d) = (a \cdot c, b \cdot d)$.

Note that $D \times (D - \{0\})$ is closed under $\oplus$, $\circ$ only because $b \cdot d \neq 0$, which would not hold if we did not assume that D is an integral domain.

(6) W is a congruence relation with respect to $\oplus$, $\circ$.

Before proving (6), observe first that $\oplus$ and $\circ$ are commutative. Thus it is sufficient in (6) to prove that if $(a, b) \equiv (a', b')$ then

(a) $(a, b) \oplus (c, d) \equiv (a', b') \oplus (c, d)$,

(b) $(a, b) \circ (c, d) \equiv (a', b') \circ (c, d)$.

For in general, if $(a, b) \equiv (a', b')$, $(c, d) \equiv (c', d')$, then

$(a, b) \oplus (c, d) \equiv (a', b') \oplus (c, d)$
$= (c, d) \oplus (a', b')$
$\equiv (c', d') \oplus (a', b')$
$= (a', b') \oplus (c', d')$,

where we have applied (6a) in the first and third $\equiv$, commutativity in the other two. Similarly, we can then prove (6) for $\circ$. By (1), (6a) reduces to showing that

$(ad + bc)b'd = (a'd + b'c)bd$,

which in turn reduces to $adb' + bcb' = a'db + b'cb$, hence finally to $adb' = a'db$, which is true by $ab' = a'b$. Also, (6b) reduces to showing that $(ac)(b'd) = (a'c)(bd)$, which follows immediately from $ab' = a'b$. Thus (6) is established.
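
Statement (6) can be tested by brute force on a small supply of pairs. The sketch below assumes D is the integers and represents a pair (a, b) as a Python tuple; the names `equiv`, `oplus` and `otimes` are illustrative stand-ins for the relation W and the operations (4), (5).

```python
from itertools import product

def equiv(p, q):                       # relation W of (1)
    (a, b), (c, d) = p, q
    return a * d == c * b

def oplus(p, q):                       # operation (4)
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def otimes(p, q):                      # operation (5)
    (a, b), (c, d) = p, q
    return (a * c, b * d)

pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0]
congruent = all(
    equiv(oplus(p, r), oplus(q, r)) and equiv(otimes(p, r), otimes(q, r))
    for p, q, r in product(pairs, repeat=3)
    if equiv(p, q)
)
print(congruent)
```

Only the one-sided conditions (6a), (6b) are tested; as the text notes, commutativity then yields the two-sided statement.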
Let

(7) $\bar{Q} = \{X : \text{for some } (a, b) \in D \times (D - \{0\}),\ X = W_{(a,b)}\}$.

In other words, $\bar{Q}$ is the collection of equivalence sets $[(a, b)]$ or $[a, b]$ of W. Two such sets $[a, b]$, $[a', b']$ are identical if and only if $(a, b) \equiv (a', b')$. As we know from the general considerations of (2:3-37), it follows from (6a and b) that

(8) there are operations $\bar{+}$, $\bar{\cdot}$ defined on $\bar{Q}$ such that for any $[a, b], [c, d] \in \bar{Q}$,

(a) $[a, b] \mathbin{\bar{+}} [c, d] = [(a, b) \oplus (c, d)]$,

(b) $[a, b] \mathbin{\bar{\cdot}} [c, d] = [(a, b) \circ (c, d)]$.

Next put, for any $a \in D$,

(9) $\bar{a} = [a, 1]$.

Observe that for $[a, b] \in \bar{Q}$,

(10) (a) $[a, b] = \bar{0}$ if and only if $a = 0$,

(b) $[a, b] = \bar{1}$ if and only if $a = b$.

This follows directly from (1).

We now claim

(11) $(\bar{Q}, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$ is a field.

The verification of this result by means of (4), (5), (7)-(10) is now a completely routine matter. We shall check only a few of the less obvious laws: associativity of $\bar{+}$, distributivity, and additive and multiplicative inverses.

(12) For any $X, Y, Z \in \bar{Q}$, $X \mathbin{\bar{+}} (Y \mathbin{\bar{+}} Z) = (X \mathbin{\bar{+}} Y) \mathbin{\bar{+}} Z$.

Representing these as $[a, b]$, $[c, d]$, $[e, f]$, we are reduced to showing

$(adf + b(cf + de), bdf) \equiv ((ad + bc)f + bde, bdf)$;

in fact, we have =.

(13) For any $X, Y, Z \in \bar{Q}$, $X \mathbin{\bar{\cdot}} (Y \mathbin{\bar{+}} Z) = (X \mathbin{\bar{\cdot}} Y) \mathbin{\bar{+}} (X \mathbin{\bar{\cdot}} Z)$.

Here we are reduced to showing that

$(a(cf + de), bdf) \equiv (acbf + bdae, bdbf)$.

This has the form $(g, h) \equiv (bg, bh)$, which is true by (1).

(14) For any $X \in \bar{Q}$ there is $Y \in \bar{Q}$ with $X \mathbin{\bar{+}} Y = \bar{0}$.

Given $X = [a, b]$, the obvious candidate for Y is $[-a, b]$. Then

$X \mathbin{\bar{+}} Y = [ab + (-a)b, b^2] = [0, b^2] = \bar{0}$

by (10a).

(15) For any $X \in \bar{Q}$, $X \neq \bar{0}$, there is $Y \in \bar{Q}$ with $X \mathbin{\bar{\cdot}} Y = \bar{1}$.

Since here $[a, b] \neq \bar{0}$ we know by (10a) that $a \neq 0$. Hence $[b, a] \in \bar{Q}$. By (10b) $[a, b] \mathbin{\bar{\cdot}} [b, a] = [ab, ba] = \bar{1}$.
Presuming now that the proof of (11) is completed, (15) also shows that the inverse in $\bar{Q}$ is defined by

(16) $[a, b]^{-1} = [b, a]$,

so long as $a \neq 0$.
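
The laws (12)-(16) can be exercised on a small model of the equivalence sets. This is only a sketch, assuming D is the integers; the class name `Quot` and its methods are hypothetical, and an equivalence set is represented by any one of its pairs, with equality decided by the relation (1).

```python
class Quot:
    """An equivalence set [a, b] over the integers, compared by a*d == c*b."""
    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("second component must be nonzero")
        self.a, self.b = a, b
    def __eq__(self, other):
        return self.a * other.b == other.a * self.b
    def __add__(self, other):           # (8a)
        return Quot(self.a * other.b + self.b * other.a, self.b * other.b)
    def __mul__(self, other):           # (8b)
        return Quot(self.a * other.a, self.b * other.b)
    def __neg__(self):                  # additive inverse from (14)
        return Quot(-self.a, self.b)
    def inverse(self):                  # (16); requires a != 0
        return Quot(self.b, self.a)

zero, one = Quot(0, 1), Quot(1, 1)
X, Y, Z = Quot(2, 3), Quot(-1, 4), Quot(5, -6)
print(X + (Y + Z) == (X + Y) + Z)       # (12)
print(X * (Y + Z) == X * Y + X * Z)     # (13)
print(X + (-X) == zero)                 # (14)
print(X * X.inverse() == one)           # (15)
```

Calling `inverse()` on a set of the form [0, b] raises an error, matching the restriction attached to (16).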
We now wish to show that the field $\bar{Q}$ contains a subsystem isomorphic to D. The natural choice is the set of elements $\bar{a}$ where $a \in D$. Define a function G by the conditions:

(17) $\mathcal{D}(G) = D$, $G(a) = \bar{a}$ for each $a \in D$,

and set

(18) $\bar{D} = \mathcal{R}(G)$.

That $\bar{D}$ is a subsystem of $\bar{Q}$ follows from

(19) (a) $\overline{a + b} = \bar{a} \mathbin{\bar{+}} \bar{b}$,

(b) $\overline{a \cdot b} = \bar{a} \mathbin{\bar{\cdot}} \bar{b}$,

for any $a, b \in D$. This also shows that G is a homomorphic mapping of D onto $\bar{D}$. The only thing left to check in showing that

(20) G establishes an isomorphism between $(D, +, \cdot, 0, 1)$ and $(\bar{D}, \bar{+}, \bar{\cdot}, \bar{0}, \bar{1})$

is that G is one-to-one. But if $\bar{a} = \bar{b}$, we have $(a, 1) \equiv (b, 1)$, hence $a = b$. Now (i)-(iii) of our theorem will practically be established if we show that

(21) for any $X \in \bar{Q}$, there are $A, B \in \bar{D}$, $B \neq \bar{0}$, with $X = A \mathbin{\bar{\cdot}} B^{-1}$.

In fact, if $X = [a, b]$, with $b \neq 0$, then we also have $X = [a, 1] \mathbin{\bar{\cdot}} [1, b] = \bar{a} \mathbin{\bar{\cdot}} (\bar{b})^{-1}$, by (16).

To complete the proof, we apply the general result (2:4-9). According to that we can complete the following diagram, to obtain a system $(Q, +, \cdot, 0, 1)$ satisfying the conditions of our theorem.

Since $(\bar{Q}, \ldots)$ is a field and $(Q, \ldots) \cong (\bar{Q}, \ldots)$, also $(Q, \ldots)$ is a field. Since every element X of $\bar{Q}$ is a quotient of elements A, B of $\bar{D}$, whenever $x \in Q$ and $H(x) = X$, we will have x the quotient of the corresponding elements $H^{-1}(A)$ and $H^{-1}(B)$, i.e., the elements $G^{-1}(A)$ and $G^{-1}(B)$ of D.
If D has defined on it a relation < which makes it into an ordered integral domain, we can extend this relation to Q by first defining the set of positive elements in this relation as follows:

(22) let Pos consist of all quotients a/b in Q for which $a, b \in D$ and $0 < a \cdot b$.

[This, of course, is taken in light of 6.6(i).] Note that according to (22), $x \in \mathrm{Pos}$ if and only if there is some pair $(a, b) \in D \times D$ for which $0 < a \cdot b$ and $x = a/b$. It does not follow directly from this that if $x \in \mathrm{Pos}$ then for any pair $(c, d) \in D \times D$ for which $x = c/d$ we have $0 < c \cdot d$, i.e., that the determination, by (22), of whether an element $x \in \mathrm{Pos}$ is independent of the particular representation as a quotient of elements of D. This is, however, true:

(23) if $(a, b), (c, d) \in D \times (D - \{0\})$ and $a/b = c/d$ then $0 < a \cdot b$ if and only if $0 < c \cdot d$.

For $ad = bc$, hence $(ad)(bc) = b^2c^2$. If $c = 0$ then necessarily $a = 0$, and the conclusion is trivial. Otherwise $b^2c^2 > 0$, hence $(ab)(cd) > 0$, again giving the desired conclusion.
It follows from (23) that

(24) if $a \in D$ then $a \in \mathrm{Pos}$ if and only if $0 < a$.

For we need only consider the representation $a = a/1$. Hence, if we define, for $x, y \in Q$,

(25) $x < y$ if and only if $y - x \in \mathrm{Pos}$,

we see that this relation is an extension of the given relation on D, without changing its meaning for $x, y \in D$.
Thus we need only check the conditions 4.17(i), (ii) to obtain the final conclusion of our theorem. These are, for $x, y \in Q$,

(26) if $x, y \in \mathrm{Pos}$ then $x + y \in \mathrm{Pos}$ and $x \cdot y \in \mathrm{Pos}$,

and

(27) exactly one of the following three cases holds: $x \in \mathrm{Pos}$, $x = 0$, $-x \in \mathrm{Pos}$.

The proofs are left to the reader.
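
Statement (23), that membership in Pos does not depend on the chosen representation, lends itself to a direct finite check. The following is a sketch assuming D is the integers with their usual ordering; `in_pos` is a hypothetical name for the test in (22).

```python
def in_pos(a, b):
    """Membership test (22): the quotient a/b is positive when 0 < a*b."""
    return 0 < a * b

reps = [(a, b) for a in range(-5, 6) for b in range(-5, 6) if b != 0]
independent = all(
    in_pos(a, b) == in_pos(c, d)
    for (a, b) in reps for (c, d) in reps
    if a * d == b * c                   # the pairs represent the same quotient
)
print(independent)
print(in_pos(-3, -4), in_pos(3, -4))
```

The second line of output illustrates why the product $a \cdot b$, rather than a alone, is the right quantity to test when denominators may be negative.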

Note that an application of this theorem is really interesting only if D does not itself already form a field. For otherwise, by the conditions (i) and (ii), every element of Q is already in D, hence by (i), D = Q.

Isomorphism of fields of quotients. We wish now to show that in the algebraic sense, i.e., up to isomorphism, the field Q satisfying the conditions 6.11(i)-(iii) is uniquely determined by D. In fact, and in analogy to 4.23, we can prove the following more general statement.

6.12 Theorem. Let $(D, +, \cdot, 0, 1)$ be an integral domain and $(Q, +, \cdot, 0, 1)$ a field satisfying the conditions 6.11(i), (ii) with D. Suppose further that $(K, \oplus, \circ, 0, 1)$ is a field such that $(D, +, \cdot, 0, 1)$ is a subdomain of $(K, \oplus, \circ, 0, 1)$. Let Q' consist of all $x \in K$ such that for some $a, b \in D$ with $b \neq 0$, we have $x = a \circ b^{\ominus}$ (inverse in K). Then Q' forms a field, and

$(Q, +, \cdot, 0, 1) \cong (Q', \oplus, \circ, 0, 1)$.

In particular, if K also satisfies the conditions 6.11(i)-(ii), then $K = Q'$ and the systems given by Q and K are isomorphic. The same results hold if Q and K are ordered fields by ordering relations which agree on D.

Proof. To define the required isomorphism, given $a, b \in D$, $b \neq 0$, set

(1) $G(a \cdot b^{-1}) = a \circ b^{\ominus}$.

We need to see that this defines a unique value G(x) for each $x \in Q$, independent of any particular representation $x = a \cdot b^{-1}$. If we have another such representation $x = c \cdot d^{-1}$, we know that $a \cdot d = b \cdot c$. Hence, by definition of subsystem, $a \circ d = b \circ c$ so that, in K, $a \circ b^{\ominus} = c \circ d^{\ominus}$. Clearly

(2) $\mathcal{D}(G) = Q$, $\mathcal{R}(G) = Q'$.

That

(3) G is one-to-one

is seen from the converse to the above argument: if $a \circ b^{\ominus} = c \circ d^{\ominus}$ then $a \cdot b^{-1} = c \cdot d^{-1}$. Finally,

(4) $G(x + y) = G(x) \oplus G(y)$ and $G(x \cdot y) = G(x) \circ G(y)$,

for any $x, y \in Q$, as is seen from the fact that the same rules 6.3(v), (vi) for calculating sum and product of quotients hold in Q as in Q'. Clearly

(5) $G(0) = 0$ and $G(1) = 1$.

Finally, if both Q, K are ordered, then the orderings of Q, Q' must be isomorphic by G, since again the same rule 6.6(ii) for deciding whether one quotient precedes another holds in Q as in Q'.

The rational numbers; fields of rational forms. The unique (up to $\cong$) field Q satisfying 6.11(i), (ii) is usually called the field of quotients of D. If we now apply the two preceding theorems to the integers, we obtain the following.

6.13 Theorem. We can construct an ordered field $(Ra, +, \cdot, <, 0, 1)$ which
(i) contains $(I, +, \cdot, <, 0, 1)$ as a subsystem, and is such that
(ii) for any $x \in Ra$ there exist $a, b \in I$ with $b \neq 0$ and $x = a/b$.
Furthermore, any other ordered field which contains the integers contains an ordered subfield isomorphic to Ra.

In view of the unicity of such a system, we can now adopt the following.

6.14 Convention. We assume throughout the remainder of this book that


(Ra, +, •, <, 0, 1) is a fixed ordered field satisfying 6.13(i), (ii). We
shall call Ra the set of rational numbers.

Although the rational numbers form one (in fact the smallest) field
containing the integers, they are by no means the only such field. Two
especially interesting larger fields are formed by the real and the complex
numbers, which we shall deal with in the next two chapters. These will
then lead us to a number of other algebraically interesting intermediate
fields.
Besides the integral domains of the integers I and of the integers modulo p we have constructed a number of new integral domains in Chapter 5, namely those consisting of the polynomial forms over any given domain. Then by 6.11 we immediately obtain the following.

6.15 Theorem. Suppose that $(D, +, \cdot, 0, 1)$ is any integral domain, $k \in P$, and that $(D[\xi_1, \ldots, \xi_k], +, \cdot, 0, 1)$ is the k-fold transcendental extension of D given by 5.22. Then we can construct a field $(D(\xi_1, \ldots, \xi_k), +, \cdot, 0, 1)$ with the following properties:
(i) $D[\xi_1, \ldots, \xi_k]$ is a subdomain of $D(\xi_1, \ldots, \xi_k)$ with respect to the given operations;
(ii) for any $\rho \in D(\xi_1, \ldots, \xi_k)$ there exist

$f(\xi_1, \ldots, \xi_k),\ g(\xi_1, \ldots, \xi_k) \in D[\xi_1, \ldots, \xi_k]$

with $g(\xi_1, \ldots, \xi_k) \neq 0$ and $\rho = f(\xi_1, \ldots, \xi_k)/g(\xi_1, \ldots, \xi_k)$.

Since such an extension $D(\xi_1, \ldots, \xi_k)$ is again uniquely determined up to $\cong$ by 6.12, we can proceed to the following.

6.16 Convention. We assume throughout the remainder of this book that $(D(\xi_1, \ldots, \xi_k), +, \cdot, 0, 1)$ is a fixed field satisfying 6.15(i), (ii), provided that $(D, +, \cdot, 0, 1)$ is an integral domain and $D[\xi_1, \ldots, \xi_k]$ is a k-fold transcendental extension of D by $\xi_1, \ldots, \xi_k$. Then $D(\xi_1, \ldots, \xi_k)$ is called the field of rational forms in $\xi_1, \ldots, \xi_k$.

When $k = 1$, we usually write $D[\xi]$ and $D(\xi)$, respectively, for the domain of polynomial forms and the field of rational forms in $\xi$. As we associated in a natural way with each polynomial form $f(\xi)$ a certain function f, so we might expect to associate with each rational form $\rho$ a certain function r. Given $\rho = f(\xi)/g(\xi)$ with $g(\xi) \neq 0$, we would expect to take $r(x) = f(x)/g(x)$ for each x. However, this only serves to define the function r at each x for which $g(x) \neq 0$, i.e., at each x which is not a root of $g(\xi)$. There is no natural algebraic way to supply an alternative definition of r(x) in case $g(x) = 0$. Of course, we could deal with functions whose domain is not all of D, but that would vary from function to function according to the representation of the given rational form. Moreover, different representations, $f(\xi)/g(\xi)$ and $f_1(\xi)/g_1(\xi)$, of the same rational form need not define the same function; for example, if a is not a root of $g(\xi)$ and we take $f_1(\xi) = f(\xi)(\xi - a)$ and $g_1(\xi) = g(\xi)(\xi - a)$, a will be in the domain of the function determined by $f(\xi)$, $g(\xi)$ but not in that determined by $f_1(\xi)$, $g_1(\xi)$. The same problems arise in associating functions with members of $D(\xi_1, \ldots, \xi_k)$. Rather than make some artificial convention about the algebraically undefined cases, we shall not attempt to set down any specific rule of association.
Originally, the need for dealing with polynomial forms rather than polynomial functions arose from the fact that the latter did not have certain desired properties in the case of D finite. On the other hand, according to 5.15, it makes no difference algebraically whether we deal with forms or functions when D is infinite. The preceding discussion shows, however, that it is essential to deal with rational forms rather than functions, no matter what D is. Indeed, the beauty of the general field of quotient construction 6.11 is that it legitimizes the use of all the standard algebraic manipulations on the ratios $f(\xi)/g(\xi)$, without there being any need to worry about whether the denominators are “defined,” except to ensure that $g(\xi)$ is not trivially 0.
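
The remark about different representations of one rational form can be made concrete. The sketch below assumes D is the rationals (via `fractions.Fraction`) and represents a polynomial form by its list of coefficients, lowest degree first; the helper names `poly_mul`, `poly_eval` and `value` are hypothetical.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def poly_eval(f, x):
    return sum(c * x**i for i, c in enumerate(f))

def value(f, g, x):
    """Value of the function attached to the representation f/g,
    or None when g(x) = 0 and no value is defined."""
    denom = poly_eval(g, x)
    return None if denom == 0 else poly_eval(f, x) / denom

a = Fraction(2)
f, g = [1, 1], [0, 1]                   # f = 1 + xi, g = xi
f1, g1 = poly_mul(f, [-a, 1]), poly_mul(g, [-a, 1])

print(poly_mul(f, g1) == poly_mul(f1, g))   # same rational form: f*g1 = f1*g
print(value(f, g, a), value(f1, g1, a))
```

The two pairs satisfy the cross-multiplication condition of 6.3(iii) as polynomial forms, yet only the first representation yields a value at a.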

Exercise Group 6.2

1. Prove statements (26) and (27) in the proof of Theorem 6.11.


2. Suppose that $x \in Ra$, $0 < x$ and $x^2 < 2$. Let $y = x + (2 - x^2)/(2(x + 1))$. Show that $x < y$ and $y^2 < 2$. Given that $x > 0$, $x^2 > 2$, find a $y \in Ra$ with $0 < y < x$ and $y^2 > 2$.

6.3 Solutions of algebraic equations in fields. Fields provide us with
great flexibility in solving algebraic equations. The general problem
involves not one but several such equations, or as is usually said, a system
of equations involving several variables, for which one seeks a simultaneous
solution. The “expressions” occurring on each side of an equation are built
up from “variables” $\xi_1, \ldots, \xi_k$ and constants, i.e., particular elements of
the field under consideration, by means of the basic rational operations
of the field, $+$, $\cdot$, $-$, ${}^{-1}$. By 6.3(v)-(viii), any such expression can
eventually be reduced to the form of a single ratio, which we can take to
be $g(\xi_1, \ldots, \xi_k)/h(\xi_1, \ldots, \xi_k)$, where $g(\xi_1, \ldots, \xi_k)$, $h(\xi_1, \ldots, \xi_k)$ are
polynomials over the given field, and $h(\xi_1, \ldots, \xi_k) \neq 0$. Thus the question
as to whether there exists a solution of an equation, the sides of which
can be reduced to the forms

$$\frac{g_1(\xi_1, \ldots, \xi_k)}{h_1(\xi_1, \ldots, \xi_k)}, \qquad \frac{g_2(\xi_1, \ldots, \xi_k)}{h_2(\xi_1, \ldots, \xi_k)},$$

reduces to the question of whether there exist $x_1, \ldots, x_k$ such that

$$h_1(x_1, \ldots, x_k) \neq 0, \qquad h_2(x_1, \ldots, x_k) \neq 0$$

and

$$\frac{g_1(x_1, \ldots, x_k)}{h_1(x_1, \ldots, x_k)} = \frac{g_2(x_1, \ldots, x_k)}{h_2(x_1, \ldots, x_k)}.$$

Setting

$$f(\xi_1, \ldots, \xi_k) = g_1(\xi_1, \ldots, \xi_k)h_2(\xi_1, \ldots, \xi_k) - g_2(\xi_1, \ldots, \xi_k)h_1(\xi_1, \ldots, \xi_k),$$

the problem reduces to determining whether there exist $x_1, \ldots, x_k$ with

$$f(x_1, \ldots, x_k) = 0,$$

satisfying the preceding inequalities concerning $h_1$, $h_2$. In general, all


solutions of an initially given system of equations can be found among
all solutions $(x_1, \ldots, x_k)$ of a system of polynomial equations, i.e., for
certain $f_i(\xi_1, \ldots, \xi_k)$, $1 \le i \le m$, among the set of all $(x_1, \ldots, x_k)$ such
that $f_i(x_1, \ldots, x_k) = 0$ for each $i = 1, \ldots, m$. (We emphasize the word
“among” here, since certain solutions of this final system will be excluded
as being solutions of the original system if they make certain denominators
0.) We now restrict attention entirely to such systems of polynomial
equations.
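
The force of the word “among” can be seen in a small computation. The sketch below, in Python over the rationals, uses a made-up equation $(x^2 - 1)/(x - 1) = 2/1$ (an assumption for illustration, not an example from the text): the cross-multiplied polynomial has a root, but that root makes a denominator 0 and so must be discarded.

```python
from fractions import Fraction

# Hypothetical equation g1/h1 = g2/h2 with one variable:
g1 = lambda x: x * x - 1
h1 = lambda x: x - 1
g2 = lambda x: Fraction(2)
h2 = lambda x: Fraction(1)

def f(x):
    """Cross-multiplied polynomial f = g1*h2 - g2*h1 (here x^2 - 2x + 1)."""
    return g1(x) * h2(x) - g2(x) * h1(x)

# Search a small, arbitrary set of rational candidates.
candidates = {Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3)}
roots_of_f = sorted(x for x in candidates if f(x) == 0)

# Keep only roots for which both denominators are nonzero.
solutions = [x for x in roots_of_f if h1(x) != 0 and h2(x) != 0]

print(roots_of_f)   # the only root found is 1
print(solutions)    # empty: x = 1 makes h1 vanish
```

The candidate search is only a stand-in for actually solving $f = 0$; the point is the final filtering step.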
The general problem concerning such a system is to determine the
existence and construction of solutions, i.e., to find out whether there are
any solutions at all and, if so, how to express these by means of various
basic operations, possibly beyond the basic rational operations. Not much
can be said about the problem in this very general form. However, a great
deal is known about special cases of the problem, especially, on the one
hand, systems of linear equations, in which the degree of each $f_i$ is 1, and,
on the other hand, single equations in one variable, i.e., equations in which
m = k = 1. The study of these special cases forms two extensive parts
of algebra, the first of which is usually called linear algebra and the second,
the theory of polynomials over various fields. We shall touch briefly on the
first of these; we shall give to the second a good deal more attention
throughout the remainder of this book, especially as it relates to the fields
of rational, real, and complex numbers.
For our general considerations, we assume now in the rest of this section
that $(K, +, \cdot, 0, 1)$ is an arbitrary field. Sometimes we shall also want to
consider extensions $(L, +, \cdot, 0, 1)$ of such a field.

Systems of linear equations. We know by 5.24 that if
$f(\xi_1, \ldots, \xi_k) \in K[\xi_1, \ldots, \xi_k]$ and $\deg(f(\xi_1, \ldots, \xi_k)) = 1$, we have
$f(\xi_1, \ldots, \xi_k) = a_0 + \sum_{j=1}^{k} a_j\xi_j$ for some $a_0, a_1, \ldots, a_k \in K$, where at
least one $a_j \neq 0$ for $1 \le j \le k$. Thus $f(x_1, \ldots, x_k) = 0$ if and only if
$\sum_{j=1}^{k} a_j x_j = -a_0$.
This leads to a slightly different form of the general system of equations,
in which we take the left-hand term to be given by a homogeneous linear
polynomial. (The term linear comes from the fact that in analytic geometry,
the set of all solutions $(x, y)$ of an equation $a_1x + a_2y = b$, where
$a_1 \neq 0$ or $a_2 \neq 0$, forms a straight line.)

6.17 Definition. Suppose that $k, m \in P$, and that for each $i = 1, \ldots, m$,
$f_i(\xi_1, \ldots, \xi_k) = \sum_{j=1}^{k} a_{ij}\xi_j$, where $(a_{i1}, \ldots, a_{ik})$ is a $k$-termed
sequence of elements of $K$. Suppose that $(b_1, \ldots, b_m)$ is an $m$-termed
sequence of elements of $K$. The given sequences are said to be the coefficients
of a system of $m$ linear equations in $k$ variables. A sequence
$(x_1, \ldots, x_k)$ is said to be a solution of this system if $f_i(x_1, \ldots, x_k) = b_i$,
that is, if

$$\sum_{j=1}^{k} a_{ij}x_j = b_i$$

for each $i = 1, \ldots, m$.

For simplicity of statement in the following, we have not excluded here
the possibility that for some $i$ and all $j = 1, \ldots, k$ we have $a_{ij} = 0$.
Of course, the equation $0 = b_i$ is trivial in this case, and the whole system
will have no solution if, in fact, $b_i \neq 0$.
We want first to show that if the coefficients of a system of linear equations
belong to a field $K$ then nothing is gained by passing to a larger
field $L$, in the sense that if the system possesses any solution at all in $L$,
it already has one in $K$. It is, of course, possible that the system has more
solutions in $L$ than in $K$; however, we shall see that this is not the case if
there is a unique solution $(x_1, \ldots, x_k)$ in $K$. Since the latter is usually
the case of main interest, it follows that any field $K$ is, in a sense, complete
with respect to solutions of linear equations. (Thus, if this were our only
interest, the development of the number systems could simply stop with
the rational numbers.)

6.18 Theorem. Suppose that $K$ is a subfield of $L$. Suppose that $a_{ij} \in K$
and $b_i \in K$ for each $i \le m$ and $j \le k$. If the system of equations

$$\sum_{j=1}^{k} a_{ij}x_j = b_i, \qquad i = 1, 2, \ldots, m,$$

has a solution $(x_1, \ldots, x_k)$ with $x_1, \ldots, x_k \in L$, then it already has a
solution $(x_1', \ldots, x_k')$ with $x_1', \ldots, x_k' \in K$. Furthermore, if the system
has a unique solution $(x_1', \ldots, x_k')$ in $K$, then $(x_1', \ldots, x_k')$ is also
the unique solution in $L$.

Proof. The general idea is that of the method of eliminating variables.
The proof proceeds by induction on $m$. (We could equally well proceed
by induction on $k$.) However, we first study separately the case $k = 1$.
In this case, we are assuming that we have a solution $x_1 \in L$ to a system
of equations

$$a_{11}x_1 = b_1, \quad a_{21}x_1 = b_2, \quad \ldots, \quad a_{m1}x_1 = b_m. \tag{1}$$

If all $a_{i1} = 0$, we must have all $b_i = 0$; hence any $x_1' \in L$ is also a solution
of this system, and in particular any $x_1' \in K$. It is clear that in this
case there is no unique solution in $K$. Suppose that some $a_{i1} \neq 0$; for
simplicity, assume that it is $a_{11}$. Then any solution $x_1'$ must satisfy
$x_1' = b_1/a_{11}$. In particular, we are assuming by (1) that there is one
solution $x_1 \in L$; being a quotient of two elements of $K$, it follows that
$x_1 \in K$. Furthermore, it is clear that $x_1$ is the unique solution in both
$K$ and $L$. Note that the statement that the system has at least one solution
is equivalent to

$$a_{i1}\,\frac{b_1}{a_{11}} = b_i \quad \text{for each } i = 2, \ldots, m. \tag{2}$$

We can assume now that $k > 1$. To start our induction on $m$, we consider
the case $m = 1$:

$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k = b_1. \tag{3}$$

Again we assume that this has at least one solution with $x_1, x_2, \ldots, x_k \in L$.
If all $a_{1j} = 0$, it follows that $b_1 = 0$. But then any $x_1', x_2', \ldots, x_k' \in L$
provide a solution, in particular, any elements of $K$. Clearly there is no
unique solution in this case. Suppose now that some $a_{1j} \neq 0$; assume,
for simplicity, that $a_{11} \neq 0$. (The proof in general follows the same lines
as below.) Then any solution $x_1', x_2', \ldots, x_k' \in L$ satisfies

$$x_1' = a_{11}^{-1}\,[\,b_1 - (a_{12}x_2' + \cdots + a_{1k}x_k')\,]. \tag{4}$$

Conversely, no matter what $x_2', \ldots, x_k' \in L$ are chosen, if we define $x_1'$
by (4), $(x_1', x_2', \ldots, x_k')$ will be a solution of (3). In particular, we can
choose $x_2', \ldots, x_k' \in K$. Then the $x_1'$ associated with these by (4) must
also be in $K$. It is again clear that there is no unique solution.
Assume now that the theorem holds for systems of $m$ equations; we
show that it also holds for a system of $m + 1$ equations. (We still assume
$k > 1$.) Thus suppose that we have a solution $x_1, x_2, \ldots, x_k \in L$ of

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{ik}x_k = b_i \quad \text{for } i = 1, 2, \ldots, m + 1. \tag{5}$$

We make the same division of cases as in (3). If all coefficients $a_{1j} = 0$,
we must have $b_1 = 0$. But then any solution $(x_1', x_2', \ldots, x_k')$ of the
remaining $m$ equations is a solution of the whole system. Hence, by induction,
there is at least one solution entirely in $K$. Clearly, the unique
solution in $K$, if there is one, is at the same time the unique one in $L$.
We assume now that some $a_{1j} \neq 0$, say again, $a_{11}$. Then any solution
$x_1', x_2', \ldots, x_k'$ must satisfy (4). It must also satisfy

$$a_{i1}a_{11}^{-1}\,[\,b_1 - (a_{12}x_2' + \cdots + a_{1k}x_k')\,] + a_{i2}x_2' + \cdots + a_{ik}x_k' = b_i \quad \text{for } i = 2, \ldots, m + 1. \tag{6}$$


Let us define

$$c_{ij} = a_{ij} - a_{i1}a_{11}^{-1}a_{1j} \quad \text{for } i = 2, \ldots, m + 1,\ j = 2, \ldots, k, \tag{7}$$

and

$$d_i = b_i - a_{i1}a_{11}^{-1}b_1 \quad \text{for } i = 2, \ldots, m + 1.$$

Then (6) takes the form

$$c_{i2}x_2' + \cdots + c_{ik}x_k' = d_i \quad \text{for } i = 2, \ldots, m + 1. \tag{8}$$

In other words, (6) [or, what is the same, (8)] is a system of $m$ equations
in $k - 1$ unknowns. Conversely, we see that if $(x_2', \ldots, x_k')$ is any solution
of the system (8), and we define $x_1'$ in terms of $x_2', \ldots, x_k'$ by (4),
then $(x_1', x_2', \ldots, x_k')$ is a solution of (5). Now the system (8) has at least
one solution, namely $(x_2, \ldots, x_k)$. Hence, by induction hypothesis, we
can find a solution $(x_2', \ldots, x_k')$ with all $x_j' \in K$. But then $x_1'$ as determined
by (4) is also in $K$. The final case to consider is that some $(x_1', x_2', \ldots, x_k')$
is the unique solution of (5) in $K$. We leave as an exercise the proof that
it is then the unique solution of (5) in $L$.
Note that the above proof actually provides much more than what is
explicitly stated in the theorem. Inspection of the proof shows that we
have an algorithm for determining, in terms of the coefficients, whether or not
a given system of linear equations has at least one solution in a field K, and
if so, for determining all such solutions. [In the simplest case, k = 1, this
was given by the conditions (2).] Linear algebra is devoted, in part, to
studying such algorithms in more perspicuous forms, through the use of
determinants and matrices. We shall not pursue the general question any
further in this direction, and suggest that the reader consult any of the
several excellent texts now available on this subject. We turn, instead, to
a few simple examples for illustrative purposes.
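
The algorithm implicit in the proof of 6.18 can be sketched in a few lines of Python over the rationals. This is only one possible rendering, under assumptions of our own: the function name `solve` is hypothetical, the pivot is taken to be the first nonzero coefficient of the first equation (the proof assumes $a_{11} \neq 0$ "for simplicity"), the recursion is allowed to bottom out at zero unknowns rather than at $k = 1$, and every variable left free is set to 0, so only one solution is returned rather than all of them.

```python
from fractions import Fraction

def solve(A, b, k):
    """Return one solution (list of k Fractions) of the system A x = b, or None.

    A is a list of rows, each of length k; b is the list of right-hand sides.
    """
    A = [[Fraction(a) for a in row] for row in A]
    b = [Fraction(v) for v in b]
    if not A:                                   # no equations: any k-tuple works
        return [Fraction(0)] * k
    first, b1 = A[0], b[0]
    pivot = next((j for j in range(k) if first[j] != 0), None)
    if pivot is None:                           # first equation reads 0 = b1
        return solve(A[1:], b[1:], k) if b1 == 0 else None
    inv = 1 / first[pivot]
    others = [j for j in range(k) if j != pivot]
    # Reduced system as in (7) and (8): one equation and one unknown fewer.
    C = [[row[j] - row[pivot] * inv * first[j] for j in others] for row in A[1:]]
    d = [bi - row[pivot] * inv * b1 for row, bi in zip(A[1:], b[1:])]
    rest = solve(C, d, k - 1)
    if rest is None:
        return None
    x = [Fraction(0)] * k
    for j, value in zip(others, rest):
        x[j] = value
    # Recover the eliminated variable from the first equation, as in (4).
    x[pivot] = inv * (b1 - sum(first[j] * x[j] for j in others))
    return x

print(solve([[2, 4]], [7], 2))               # one rational solution of 2x1 + 4x2 = 7
print(solve([[1, 1], [1, 1]], [1, 2], 2))    # None: the system is inconsistent
```

Since only the field operations are used, every value produced lies in the field containing the coefficients, which is the content of the first half of the theorem.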
The simplest nontrivial case involves two variables, $k = 2$. Consider
the case $m = 2$:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1, \\ a_{21}x_1 + a_{22}x_2 &= b_2. \end{aligned} \tag{6:3-1}$$

Assume, for simplicity, that $a_{11} \neq 0$, and either $a_{21} \neq 0$ or $a_{22} \neq 0$.
Following the lines of the proof, if we are to obtain a solution with
$x_1, x_2 \in K$, we should have

$$x_1 = a_{11}^{-1}(b_1 - a_{12}x_2) \tag{6:3-2}$$

and

$$a_{21}a_{11}^{-1}(b_1 - a_{12}x_2) + a_{22}x_2 = b_2.$$

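The second of these equations then takes the form

$$(a_{22} - a_{21}a_{11}^{-1}a_{12})\,x_2 = b_2 - a_{21}a_{11}^{-1}b_1, \tag{6:3-3}$$

or, equivalently,

$$(a_{22}a_{11} - a_{21}a_{12})\,x_2 = b_2a_{11} - a_{21}b_1. \tag{6:3-3$'$}$$

Now (6:3-3)' has a solution $x_2$ if and only if either
$a_{22}a_{11} - a_{21}a_{12} = 0 = b_2a_{11} - a_{21}b_1$ or $a_{22}a_{11} - a_{21}a_{12} \neq 0$. In the first case, any $x_2$
is a solution of (6:3-3), and any $x_1$ defined in terms of $x_2$ by (6:3-2) gives
a solution $(x_1, x_2)$ of (6:3-1). In the second case, there is a unique solution
of (6:3-3)' and thence, via (6:3-2), a unique solution $(x_1, x_2)$ of
(6:3-1). This is seen to be given by

$$x_2 = \frac{b_2a_{11} - a_{21}b_1}{a_{22}a_{11} - a_{21}a_{12}}, \qquad x_1 = \frac{a_{22}b_1 - b_2a_{12}}{a_{22}a_{11} - a_{21}a_{12}}, \tag{6:3-4}$$

which, in the symbolism of determinants, is usually expressed by

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}. \tag{6:3-4$'$}$$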

Some evidence of the power of the abstract notion of field is indicated by
the fact that this representation of the solution, with which we are familiar
from the usual number systems, holds equally well in any field, for instance,
in any of the fields $I_p$, $p$ a prime.
As an example, consider the question of whether there exist integers
$y_1, y_2$ such that

$$\begin{aligned} 3y_1 + 4y_2 &\equiv 0 \pmod 5, \\ 4y_1 + y_2 &\equiv 3 \pmod 5. \end{aligned} \tag{6:3-5}$$

This is the same as solving

$$\begin{aligned} [3]x_1 +_5 [4]x_2 &= [0], \\ [4]x_1 +_5 x_2 &= [3] \end{aligned} \tag{6:3-5$'$}$$

in $I_5$. If we apply the method described above, solving, as is simplest
here, the second equation for $x_2$ in terms of $x_1$, we have
$x_2 = [3] -_5 [4]x_1 = [3] +_5 x_1$. Then $[3]x_1 +_5 [4]([3] +_5 x_1)$ is equal to $[0]$, that is,
$[7]x_1 +_5 [12] = [0]$, $[2]x_1 = [-12] = [3]$. We need only find $[2]^{-1}$ in $I_5$,
which is $[3]$. Hence $x_1 = [3][2]^{-1} = [9] = [4]$. From this, $x_2 = [7] = [2]$.
Since $([4], [2])$ is the unique solution of (6:3-5)' in $I_5$, then
$(4, 2)$ is a solution of (6:3-5). Of course, the latter is only unique up to
$\equiv \pmod 5$. For example, another solution of (6:3-5) is $(14, -3)$.
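
The remark that formula (6:3-4) holds in any field can be tried out directly in $I_5$. The following Python sketch represents the class $[n]$ by its residue in $\{0, \ldots, p - 1\}$; the function name `solve_2x2_mod_p` is an assumption of this example, and the three-argument `pow` with exponent $-1$ (available in Python 3.8 and later) is used to obtain inverses modulo a prime.

```python
def solve_2x2_mod_p(a11, a12, b1, a21, a22, b2, p):
    """Unique solution (x1, x2) in I_p given by (6:3-4), or None if the
    denominator a22*a11 - a21*a12 is [0]."""
    det = (a22 * a11 - a21 * a12) % p
    if det == 0:
        return None
    inv = pow(det, -1, p)                  # multiplicative inverse in I_p
    x2 = (b2 * a11 - a21 * b1) * inv % p
    x1 = (a22 * b1 - b2 * a12) * inv % p
    return x1, x2

x1, x2 = solve_2x2_mod_p(3, 4, 0, 4, 1, 3, 5)   # the system (6:3-5)
print(x1, x2)
# Any integers congruent to these, such as (14, -3), also satisfy (6:3-5).
print((3 * 14 + 4 * (-3)) % 5, (4 * 14 + (-3)) % 5)
```

A return value of `None` only says that the formula does not apply; by itself it does not distinguish between no solution and several solutions.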
Note that if we had started with the system

$$\begin{aligned} 3y_1 + 4y_2 &\equiv 0 \pmod 5, \\ 4y_1 + 2y_2 &\equiv 3 \pmod 5, \end{aligned} \tag{6:3-6}$$

there would have been no solution; this essentially comes from looking at
the denominator $a_{22}a_{11} - a_{21}a_{12}$ in (6:3-4). This must be $\neq 0$, unless
$b_2a_{11} - a_{21}b_1 = 0$. In this case, we have

$$b_2a_{11} - a_{21}b_1 = [3] \cdot [3] -_5 [4] \cdot [0] = [9] \neq [0];$$

but

$$a_{22}a_{11} - a_{21}a_{12} = [2][3] -_5 [4] \cdot [4] = [-10] = [0].$$

On the other hand, by taking an example where also $b_2a_{11} - a_{21}b_1 = 0$,
for example,

$$\begin{aligned} 3y_1 + 4y_2 &\equiv 1 \pmod 5, \\ 4y_1 + 2y_2 &\equiv 3 \pmod 5, \end{aligned} \tag{6:3-7}$$

we could arrange it so that for each value of $y_1$ we can find a value of $y_2$
satisfying the equations.
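
Because $I_5$ is finite, the three behaviours just described can be confirmed by simply trying all 25 pairs of residues. The sketch below is a brute-force check in Python, not the method of the text; the function name `solutions_mod` is hypothetical.

```python
def solutions_mod(p, a11, a12, b1, a21, a22, b2):
    """All pairs of residues (y1, y2) modulo p satisfying both congruences."""
    return [(y1, y2)
            for y1 in range(p) for y2 in range(p)
            if (a11 * y1 + a12 * y2 - b1) % p == 0
            and (a21 * y1 + a22 * y2 - b2) % p == 0]

sols_5 = solutions_mod(5, 3, 4, 0, 4, 1, 3)   # system (6:3-5)
sols_6 = solutions_mod(5, 3, 4, 0, 4, 2, 3)   # system (6:3-6)
sols_7 = solutions_mod(5, 3, 4, 1, 4, 2, 3)   # system (6:3-7)

print(len(sols_5), len(sols_6), len(sols_7))
print(sorted(y1 for y1, _ in sols_7))   # every residue y1 occurs in (6:3-7)
```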

Linear equations in integral domains. Consideration of one equation in
two variables, or two equations in three variables, is of interest when we
are dealing with an integral domain which is not a field, for example, the
domain $I$. To study such systems, we can pass to the field of quotients, in
this particular case to $\mathrm{Ra}$. Using the method of eliminating variables,
we find all solutions, if there are any, in this field. We then try to see which
of these solutions belongs to the original domain. Note that the hypotheses
of 6.18 do not apply here. It may well turn out that there are solutions
in the field of quotients without there being solutions in the original
domain. For example, this is the case with the equation

$$2x_1 + 4x_2 = 7 \tag{6:3-8}$$

over the integers. For if we had a solution $x_1, x_2 \in I$, the left-hand side
of (6:3-8) would be divisible by 2, while the right-hand side would not.
On the other hand, there are infinitely many solutions in rationals,
interrelated by $x_1 = \frac{7}{2} - 2x_2$. Clearly, again, none of these pairs $(x_1, x_2)$ has
both $x_1$ and $x_2$ in $I$.

A necessary and sufficient condition that

$$a_1x_1 + a_2x_2 = b \tag{6:3-9}$$

has a solution with $x_1, x_2 \in I$, where $a_1, a_2, b \in I$ and $a_1 \neq 0$, $a_2 \neq 0$,
is that

$$(a_1, a_2) \mid b, \tag{6:3-10}$$

i.e., that the gcd of $a_1, a_2$ divides $b$. The argument that this is necessary
follows the same line as above. On the other hand, if $d = (a_1, a_2)$ and
$d \mid b$, and $a_1 = a_1'd$, $a_2 = a_2'd$, $b = b'd$, we see that (6:3-9) is equivalent to

$$a_1'x_1 + a_2'x_2 = b'. \tag{6:3-11}$$

But now $(a_1', a_2') = 1$. Hence there are $s, t \in I$ (by 4.44) with

$$a_1's + a_2't = 1, \tag{6:3-12}$$

hence

$$a_1'(sb') + a_2'(tb') = b'.$$

Multiplying through by $d$ shows that $(x_1, x_2) = (sb', tb')$ is one solution
of (6:3-9). This is by no means the only solution of (6:3-9) [assuming that
(6:3-10) holds]. If $(x_1', x_2')$ is any solution, and we set $y_1 = x_1' - x_1$,
$y_2 = x_2' - x_2$, we have

$$a_1y_1 + a_2y_2 = 0. \tag{6:3-13}$$

Our previous argument shows that this has solutions, since $(a_1, a_2) \mid 0$, but
(6:3-12) exhibits only the solution $(0, 0)$, for which $x_1' = x_1$, $x_2' = x_2$.
However, there are many others, namely, all those $y_1, y_2 \in I$ with

$$y_1 = -\frac{a_2}{a_1}\,y_2 = -\frac{a_2'}{a_1'}\,y_2. \tag{6:3-14}$$

Since $(a_1', a_2') = 1$, this has a solution with $y_1, y_2 \in I$ if and only if
$a_1' \mid y_2$, that is, $y_2 = a_1'k$ for some $k \in I$; then $y_1 = -a_2'k$. Clearly any
choice of $k \neq 0$ gives a new solution $(x_1', x_2')$ in $I$ of (6:3-9). We conclude:

(6:3-15) If $(a_1, a_2) \mid b$ then $a_1x_1 + a_2x_2 = b$, where $x_1, x_2 \in I$, if and only
if for some $k \in I$, $x_1 = sb' - a_2'k$, $x_2 = tb' + a_1'k$, where
$d = (a_1, a_2)$, $a_1 = a_1'd$, $a_2 = a_2'd$, $b = b'd$, and $a_1's + a_2't = 1$.

As an example: all integral solutions of $2x_1 + 6x_2 = 8$ can be seen to
have the form $x_1 = 16 - 3k$, $x_2 = -4 + k$, by taking $s = 4$, $t = -1$ in
the equation $1 \cdot s + 3 \cdot t = 1$. (Of course, any other particular solution
$s, t$ could be used; the answers will differ only formally.)
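
Statement (6:3-15) translates directly into a small procedure. The Python sketch below obtains $s, t$ from the extended Euclidean algorithm, which is one standard way of producing the numbers whose existence is quoted from 4.44; the function names `ext_gcd` and `integral_solutions` are assumptions of this example. For $2x_1 + 6x_2 = 8$ it happens to find $s = 1$, $t = 0$ rather than $s = 4$, $t = -1$, which illustrates that the resulting descriptions differ only formally.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g, where g is a gcd of a and b
    (possibly negative)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def integral_solutions(a1, a2, b):
    """Describe the integer solutions of a1*x1 + a2*x2 = b (a1, a2 nonzero).

    Returns None if there are none; otherwise (x1, x2, step1, step2), meaning
    that the solutions are exactly (x1 + step1*k, x2 + step2*k) for integers k.
    """
    d, s, t = ext_gcd(a1, a2)
    if b % d != 0:                    # condition (6:3-10) fails
        return None
    a1p, a2p, bp = a1 // d, a2 // d, b // d
    return s * bp, t * bp, -a2p, a1p  # as in (6:3-15)

print(integral_solutions(2, 6, 8))    # particular solution and steps
print(integral_solutions(2, 4, 7))    # None, as for (6:3-8)

x1, x2, step1, step2 = integral_solutions(2, 6, 8)
k = -4
print(x1 + step1 * k, x2 + step2 * k)  # the particular solution (16, -4) of the text
```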

Polynomial equations in the rationals. We can make no general statements
at this point about single polynomial equations in one variable of degree
higher than 1. Instead, we conclude this section with a very useful result
about polynomials over the rationals, our first special interesting case.
Suppose that we have

$$f(\xi) = \sum_{i=0}^{n} \frac{a_i}{b_i}\,\xi^i, \quad \text{where all } a_i, b_i \in I,\ b_i \neq 0.$$

We see that $c$ is a root of $f(\xi)$ if and only if it is a root of $(b_0b_1 \cdots b_n)f(\xi)$.
However, the latter is a polynomial with integral coefficients. It is thus
sufficient to restrict ourselves to this case.

6.19 Theorem. Suppose that $f(\xi) = \sum_{i=0}^{n} a_i\xi^i$ where $a_i \in I$, $n > 0$, and
$a_n \neq 0$. Suppose that $b, c \in I$ with $c \neq 0$ and $(b, c) = 1$. Then if
$b/c$ is a root of $f(\xi)$ we must have $b \mid a_0$ and $c \mid a_n$.

Proof. From $\sum_{i=0}^{n} a_i(b/c)^i = 0$ we conclude that $\sum_{i=0}^{n} a_ic^{n-i}b^i = 0$ by
multiplying through by $c^n$. This can be rewritten in the two forms

$$a_0c^n = -(a_1c^{n-1}b + a_2c^{n-2}b^2 + \cdots + a_nb^n) \tag{1}$$

and

$$a_nb^n = -(a_0c^n + a_1c^{n-1}b + \cdots + a_{n-1}cb^{n-1}). \tag{2}$$

From (1) and (2), respectively, we see that

$$b \mid a_0c^n \quad \text{and} \quad c \mid a_nb^n. \tag{3}$$

From $(b, c) = 1$ we obtain the desired result by repeated application
of 4.46.
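
Theorem 6.19 reduces the search for rational roots of an integral polynomial to finitely many candidates $\pm b/c$ with $b \mid a_0$ and $c \mid a_n$. The Python sketch below carries out that search under the additional assumption $a_0 \neq 0$ (if $a_0 = 0$, then 0 is a root and the condition $b \mid a_0$ gives no restriction); the function names and the sample polynomials are our own choices for illustration.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a_0 + a_1*x + ... + a_n*x^n, where
    coeffs = [a_0, ..., a_n] are integers with a_0 != 0 and a_n != 0."""
    a0, an = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * b, c)
                  for b in divisors(a0) for c in divisors(an)
                  for sign in (1, -1)}
    return sorted(x for x in candidates
                  if sum(a * x**i for i, a in enumerate(coeffs)) == 0)

# 2x^3 + 3x^2 - 8x + 3 = (2x - 1)(x + 3)(x - 1)
print(rational_roots([3, -8, 3, 2]))
# x^2 - 2 has no rational roots
print(rational_roots([-2, 0, 1]))
```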

6.20 Corollary. Suppose that $f(\xi) = \sum_{i=0}^{n} a_i\xi^i$ is a monic polynomial
($a_n = 1$) with all $a_i \in I$. Then every rational root of $f(\xi)$ is an integer.

6.21 Corollary. Suppose that $n > 1$, $a \in I$, $|a| > 1$, and that
$|a| = p_1^{i_1} \cdots p_m^{i_m}$ is the unique prime power representation of $|a|$ with
$p_1 < p_2 < \cdots < p_m$. Suppose that for some $t$, $n \nmid i_t$. Then
$\xi^n - a$ has no rational roots.

Proof. For otherwise we would have, by 6.20, $x \in I$ with $x^n = a$;
hence $|x|^n = |a|$. Take $b = |x|$; then $b \in P$. Every exponent in the prime
power representation of $b^n$ is divisible by $n$. This contradicts the unicity
of the representation for $|a|$.
Thus, in general, the rational numbers are not closed under the operation
of taking $n$th roots. The preceding generalizes the classical proof that
$\sqrt{2}$ is irrational, or expressed in present terms, that there is no $x \in \mathrm{Ra}$
with $x^2 = 2$. Thus the rationals are far from being algebraically complete.
The effort to overcome this incompleteness by introducing new kinds of
numbers is embarked upon in the next chapter, which introduces the real
numbers. Before doing this, we want to take up some general properties
of division among polynomials over any field which will be useful at various
points in our further discussion.

Exercise Group 6.3

1. Complete the proof of Theorem 6.18, as suggested at the end.

2. Find examples of $a_{11}, a_{12}, a_{21}, a_{22}, b_1, b_2 \in I$ for which the pair of
congruences

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &\equiv b_1 \pmod 7, \\ a_{21}x_1 + a_{22}x_2 &\equiv b_2 \pmod 7, \end{aligned}$$

with $x_1, x_2 \in I$, $0 \le x_1 < 7$, $0 \le x_2 < 7$, has
(a) no solutions,
(b) exactly one solution,
(c) more than one solution.

3. Find all incongruent solutions of the pair of congruences

$$\begin{aligned} x_1 - x_2 + 2x_3 &\equiv 1 \pmod 5, \\ 2x_1 + x_2 - x_3 &\equiv 2 \pmod 5. \end{aligned}$$

4. Consider two equations,

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2k}x_k &= b_2, \end{aligned}$$

in $k$ variables ($k \ge 2$) with all $a_{ij}$, $b_i$ in a field $K$. Find simple necessary
and sufficient conditions, in terms of the $a_{ij}$, $b_i$, so that there should be
at least one solution $(x_1, x_2, \ldots, x_k)$ in $K$.

5. Find necessary and sufficient conditions for the existence of at least one
solution $(x_1, x_2, x_3)$ in $I$ to the system of Exercise 4, when $k = 3$, and
all $a_{ij}, b_i \in I$.

6. Find all rational roots, if any, of the following polynomials in $\mathrm{Ra}[\xi]$:
(a) $\xi^3 - 5\xi + 3$, (b) $2\xi^3 - 3\xi^2 - 2\xi + 3$.

6.4 Polynomials over a field. Problems of multiplication and division
in the integers (other than 0) are greatly simplified by analyzing every
such number into basic “building blocks,” namely the prime numbers.
The success of this analysis depends essentially on the unique factorization
theorem 4.50. Similarly, we might hope for an analysis of arbitrary
polynomials into what might be called prime polynomials. Again our hope
would be to obtain a unique factorization theorem. In fact we shall be
able to do this with a line of reasoning very close to that which we carried
out for $I$ in Section 4.4, especially 4.37-4.51. We assume again in this
section that we are dealing with an arbitrary field $(K, +, \cdot, 0, 1)$. Basic
throughout the following is the definition 5.12 of the relation $\mid$ of division
between polynomials in $K[\xi]$.
At first one might be tempted, following 4.40, to call a polynomial $f(\xi)$
prime if $f(\xi) \neq 0, \pm 1$ and if, whenever $g(\xi) \mid f(\xi)$, then $g(\xi) = \pm 1$ or
$g(\xi) = \pm f(\xi)$. However, the role of $\pm 1$ is peculiar to the integers and not
typical of the more general situation. The essential point about $\pm 1$ in $I$
is that these are the only divisors of 1 in $I$ (4.39iv). In contrast, in $K[\xi]$
there are many divisors of 1; namely any $a \in K$ with $a \neq 0$ has $a \mid 1$ by
$1 = a \cdot (1/a)$. We shall see that these are the only divisors of 1 in $K[\xi]$.
From this point of view, we should rather say that $f(\xi)$ is prime if, whenever
$g(\xi) \mid f(\xi)$, then $g(\xi)$ is either a constant or a constant times $f(\xi)$.
Indeed, we think of factoring out a constant from a polynomial as being
a trivial factorization. We should think of two polynomials as being
essentially the same, as far as divisibility problems in polynomials are
concerned, if they differ only by a constant factor, so that they divide
each other.
These remarks suggest the technical advantages of studying factorization
of polynomials over a field, rather than over an arbitrary integral
domain, as in Chapter 5. All of the results of this section can be generalized
so as to hold for polynomials over a domain, but the statements become
more complicated, since divisibility among constants is no longer trivial.
On the other hand, one can often directly apply the results of this section
to polynomials over an integral domain by simply considering them as
polynomials over the associated field of quotients.

Basic properties of divisibility.

6.22 Definition. Suppose that $f(\xi), g(\xi) \in K[\xi]$. We write $f(\xi) \sim g(\xi)$ if
$f(\xi) \mid g(\xi)$ and $g(\xi) \mid f(\xi)$ in $K[\xi]$. We write $f(\xi) \nsim g(\xi)$ if $f(\xi) \sim g(\xi)$
does not hold.

The next theorem summarizes a number of basic properties of $\mid$ and $\sim$ in
$K[\xi]$, many of which are analogous to those of 4.39.

6.23 Theorem. Suppose that $f(\xi), g(\xi), h(\xi), f_1(\xi), \ldots, f_n(\xi) \in K[\xi]$.
Then:
(i) $f(\xi) \mid 0$;
(ii) $0 \mid f(\xi)$ if and only if $f(\xi) = 0$;
(iii) if $a \in K$ and $a \neq 0$ then $a \mid f(\xi)$;
(iv) if $h(\xi) \mid g(\xi)$ and $g(\xi) \mid f(\xi)$ then $h(\xi) \mid f(\xi)$;
(v) if $g(\xi) \mid f_i(\xi)$ for each $i$ then $g(\xi) \mid \sum_{i=1}^{n} f_i(\xi)$;
(vi) if $g(\xi) \mid f_i(\xi)$ for some $i$ then $g(\xi) \mid \prod_{i=1}^{n} f_i(\xi)$;
(vii) $f(\xi) \mid 1$ if and only if $f(\xi) \sim 1$;
(viii) $f(\xi) \mid 1$ if and only if for some $a \in K$, $a \neq 0$, we have $f(\xi) = a$;
(ix) $\sim$ is an equivalence relation between elements of $K[\xi]$;
(x) $f(\xi) \sim g(\xi)$ if and only if for some $a \in K$, $a \neq 0$, we have
$f(\xi) = ag(\xi)$;
(xi) if $f(\xi) \neq 0$, $\deg(g(\xi)) = \deg(f(\xi))$ and $g(\xi) \mid f(\xi)$ then
$g(\xi) \sim f(\xi)$;
(xii) if $f(\xi), g(\xi)$ are monic then $f(\xi) \sim g(\xi)$ if and only if $f(\xi) = g(\xi)$;
(xiii) if $f(\xi) \neq 0$ then there is a unique $a \in K$, $a \neq 0$ and a unique
monic $g(\xi) \in K[\xi]$ with $f(\xi) = ag(\xi)$; hence there is a unique
monic $g(\xi) \in K[\xi]$ with $f(\xi) \sim g(\xi)$.

Proof. The proofs are quite straightforward. The only essentially new
points are in (viii)-(xi). Consider first (viii). As we have already seen,
if $a \in K$, $a \neq 0$ then $a \mid 1$. Suppose that $f(\xi) \mid 1$. Then clearly $f(\xi) \neq 0$.
If we show $\deg(f(\xi)) = 0$, we are through. Suppose otherwise, say
$\deg(f(\xi)) = n > 0$, for some $n$. Then since $f(\xi) \mid 1$, there is $g(\xi)$ with
$1 = f(\xi)g(\xi)$. Also $g(\xi) \neq 0$. Hence by 5.11(ii),

$$0 = \deg(1) = \deg(f(\xi)) + \deg(g(\xi)),$$

which is a contradiction. (ix) is immediate from (iv) and the definition
6.22. To prove (x), suppose that $f(\xi) \sim g(\xi)$. If either $f(\xi)$ or $g(\xi) = 0$,
then so is the other. In this case $f(\xi) = g(\xi)$. Otherwise, we can assume
that $f(\xi) \neq 0$, $g(\xi) \neq 0$. For some $h_1(\xi), h_2(\xi)$, we have

$$f(\xi) = h_1(\xi)g(\xi) = h_1(\xi)h_2(\xi)f(\xi).$$

Since $f(\xi) \neq 0$, we thus have $1 = h_1(\xi)h_2(\xi)$. Hence $h_1(\xi) \mid 1$, so that by
(viii) there is $a \in K$ with $a \neq 0$ and $h_1(\xi) = a$. We leave the proofs of
(xi)-(xiii) to the student.

Prime polynomials.

6.24 Definition. We say that $p(\xi)$ ($\in K[\xi]$) is prime in $K[\xi]$, or irreducible
over $K$, if $p(\xi) \neq 0$, $p(\xi) \nsim 1$ and for any $f(\xi) \in K[\xi]$, if $f(\xi) \mid p(\xi)$
then $f(\xi) \sim 1$ or $f(\xi) \sim p(\xi)$.

This is now the correct analogue of the notion of prime integer. Note that
by 6.23(xiii) with each prime $p(\xi)$ is associated a unique monic $q(\xi)$ with
$p(\xi) \sim q(\xi)$. It follows by 6.23(ix) that also $q(\xi)$ is prime. Further, if
$f(\xi) \mid q(\xi)$ and $f(\xi)$ is monic then $f(\xi) = 1$ or $f(\xi) = q(\xi)$. Thus the monic
prime polynomials provide unique representatives for arbitrary prime
polynomials just as the positive prime integers provide unique representatives
for arbitrary prime integers.
Note that by 6.23(viii), a simpler way of expressing the condition
$f(\xi) \neq 0$, $f(\xi) \nsim 1$ is that $\deg(f(\xi)) > 0$.

6.25 Theorem. If $f(\xi) \in K[\xi]$, $\deg(f(\xi)) > 0$, and $f(\xi)$ is not prime in
$K[\xi]$ then there exist $g(\xi), h(\xi)$ with $f(\xi) = g(\xi)h(\xi)$ and
$0 < \deg(g(\xi)) < \deg(f(\xi))$ and $0 < \deg(h(\xi)) < \deg(f(\xi))$.

Proof. Suppose the contrary. Then whenever $g(\xi) \mid f(\xi)$ we must have
$g(\xi) \sim 1$ or $g(\xi) \sim f(\xi)$. For $f(\xi) = g(\xi)h(\xi)$, and either $\deg(g(\xi)) = 0$
or $\deg(g(\xi)) = \deg(f(\xi))$ or $\deg(h(\xi)) = 0$ or $\deg(h(\xi)) = \deg(f(\xi))$.
In the first case, $g(\xi) = a$, where $a \neq 0$; hence $g(\xi) \sim 1$. In the second
case, $g(\xi) \sim f(\xi)$ by 6.23(xi). Similarly, in the third and fourth cases,
$h(\xi) \sim 1$ or $h(\xi) \sim f(\xi)$, from which it follows easily that

$$g(\xi) \sim f(\xi) \quad \text{or} \quad g(\xi) \sim 1,$$

respectively. Thus if the conclusion were false, $f(\xi)$ would be prime.


This is the analogue of 4.41. Since the degree assigns to each polynomial
a nonnegative integer, and since every nontrivial factorization leads to
smaller nonnegative integers for the factors, we will be able to use the
well-ordering of the nonnegative integers to conclude that we can eventually
factor $f(\xi)$ into prime factors. This will provide us with the
existence part of our representation theorem. To get the uniqueness part,
we need, essentially, an analogue of 4.47(iii), that whenever a prime divides
a product it divides one of the factors. The proof of that rested on development
of the properties of gcd, which in turn rested on the division algorithm
4.37. For a proper analogue, the comparison of ordering of degrees with
that of positive integers suggests that when attempting to divide an $f(\xi)$
by some $h(\xi)$, where $h(\xi)$ does not necessarily divide $f(\xi)$, we should in general
seek a remainder $r(\xi)$ which has smaller degree than $h(\xi)$. Of course, we
can omit the trivial case $\deg(h(\xi)) = 0$ here. This gives us the idea for
the proper formulation.
On the other hand, the idea for the proof of the division algorithm for
polynomials, our next theorem, is just to express in general form the
standard technique of dividing polynomials taught in high-school algebra.
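For example, consider dividing $h(\xi) = \xi^2 - 2\xi + 2$ into $f(\xi) = 5\xi^4 +
\xi^2 - 1$ (polynomials over $\mathrm{Ra}$). The standard method takes the following
form here:

$$\begin{array}{r|rrrrr}
 & & & 5\xi^2 & +\,10\xi & +\,11 \\ \hline
\xi^2 - 2\xi + 2 & 5\xi^4 & +\,0\xi^3 & +\,\xi^2 & +\,0\xi & -\,1 \\
 & 5\xi^4 & -\,10\xi^3 & +\,10\xi^2 & +\,0\xi & +\,0 \\ \cline{2-6}
 & & 10\xi^3 & -\,9\xi^2 & +\,0\xi & -\,1 \\
 & & 10\xi^3 & -\,20\xi^2 & +\,20\xi & +\,0 \\ \cline{3-6}
 & & & 11\xi^2 & -\,20\xi & -\,1 \\
 & & & 11\xi^2 & -\,22\xi & +\,22 \\ \cline{4-6}
 & & & & 2\xi & -\,23
\end{array} \tag{6:4-1}$$

Thus,

$$5\xi^4 + \xi^2 - 1 = (\xi^2 - 2\xi + 2)(5\xi^2 + 10\xi + 11) + (2\xi - 23), \tag{6:4-2}$$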

i.e., we have quotient $5\xi^2 + 10\xi + 11$, remainder $2\xi - 23$. The method
is simply this: we start out with

$$f(\xi) = a\xi^n + a'\xi^{n-1} + \cdots, \qquad h(\xi) = b\xi^m + b'\xi^{m-1} + \cdots,$$

where, say, $m \le n$. We take a “trial” quotient by the terms of highest
degree, $(a/b)\xi^{n-m}$. We then multiply this back by $h(\xi)$ and compare the
result with $f(\xi)$ by forming the difference:

$$f(\xi) - \frac{a}{b}\,\xi^{n-m}h(\xi) = (a\xi^n + a'\xi^{n-1} + \cdots) - \Bigl(a\xi^n + \frac{ab'}{b}\,\xi^{n-1} + \cdots\Bigr). \tag{6:4-3}$$

The result is a polynomial $f_1(\xi)$ of degree $\le n - 1$, $f_1(\xi) = c\xi^{n-1} + \cdots$,
where $c = a' - (ab'/b)$, and we have

$$f(\xi) = \frac{a}{b}\,\xi^{n-m}h(\xi) + f_1(\xi). \tag{6:4-4}$$

If $m > n - 1$ we are through. Otherwise, we repeat the process with
$f_1(\xi)$. If $c \neq 0$, this takes the form

$$f_1(\xi) = \frac{c}{b}\,\xi^{(n-1)-m}h(\xi) + f_2(\xi), \tag{6:4-5}$$

where $f_2(\xi)$ is of degree $\le n - 2$. By continuing this procedure, we
eventually reach a representation of $f(\xi)$ as a sum of terms, each having
$h(\xi)$ as a factor, together with a remainder of degree $< \deg(h(\xi))$.
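
The procedure just described is short enough to write out in full. The following Python sketch works over the rationals with polynomials given as coefficient lists, highest degree first; this representation and the name `poly_divmod` are assumptions of the example. Each pass of the loop forms one trial quotient $(a/b)\xi^{n-m}$ and the corresponding difference, as in (6:4-3).

```python
from fractions import Fraction

def poly_divmod(f, h):
    """Divide f by h, where h has nonzero leading coefficient.

    Polynomials are coefficient lists, highest degree first.
    Returns (q, r) with f = h*q + r; r has fewer coefficients than h.
    """
    f = [Fraction(c) for c in f]
    h = [Fraction(c) for c in h]
    q = []
    while len(f) >= len(h):
        trial = f[0] / h[0]                       # a/b, coefficient of xi^(n-m)
        q.append(trial)
        padded = h + [0] * (len(f) - len(h))      # coefficients of xi^(n-m) * h
        # Subtract trial * xi^(n-m) * h; the leading term cancels and is dropped.
        f = [c - trial * hc for c, hc in zip(f, padded)][1:]
    return q, f

# The worked example: f = 5xi^4 + xi^2 - 1, h = xi^2 - 2xi + 2.
q, r = poly_divmod([5, 0, 1, 0, -1], [1, -2, 2])
print(q)   # coefficients of 5xi^2 + 10xi + 11
print(r)   # coefficients of 2xi - 23
```

The remainder list may begin with zeros when the remainder has degree lower than $\deg(h(\xi)) - 1$; it is left untrimmed here to keep the sketch short.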

The division algorithm for polynomials.

6.26 Theorem. Suppose that $f(\xi), h(\xi) \in K[\xi]$ where $\deg(h(\xi)) > 0$.
Then there exist $q(\xi), r(\xi) \in K[\xi]$ such that $f(\xi) = h(\xi)q(\xi) + r(\xi)$
and $0 \le \deg(r(\xi)) < \deg(h(\xi))$. Further if $q'(\xi), r'(\xi) \in K[\xi]$,
$f(\xi) = h(\xi)q'(\xi) + r'(\xi)$, and $0 \le \deg(r'(\xi)) < \deg(h(\xi))$, then
$q'(\xi) = q(\xi)$ and $r'(\xi) = r(\xi)$.

Proof. Let $h(\xi) = \sum_{i=0}^{m} b_i\xi^i$ where $m > 0$ and $b_m \neq 0$. We also
write $b = b_m$. We first prove the existence part of the theorem by induction
on the set of $n \ge 0$ such that the result holds for all $f(\xi)$ of degree
$\le n$. First note that if $\deg(f(\xi)) < m$, then such a representation exists
with $q(\xi) = 0$, $r(\xi) = f(\xi)$. In particular, it holds for $n = 0$. We now
show that if it holds for $n - 1 \ge 0$ it also holds for $n$. It is sufficient, by
the preceding, to consider only $f(\xi)$ with $\deg(f(\xi)) = n \ge m$. Put
$f(\xi) = \sum_{i=0}^{n} a_i\xi^i$ with $a_n \neq 0$; we write $a = a_n$. Then let
$f_1(\xi) = f(\xi) - (a/b)\xi^{n-m}h(\xi)$. As we have seen, $\deg(f_1(\xi)) \le n - 1$. Hence,
by induction, there are $q_1(\xi), r_1(\xi)$ with $f_1(\xi) = h(\xi)q_1(\xi) + r_1(\xi)$ and
$0 \le \deg(r_1(\xi)) < m$. Thus we can take $q(\xi) = (a/b)\xi^{n-m} + q_1(\xi)$ and
$r(\xi) = r_1(\xi)$. To prove uniqueness now, suppose that $q'(\xi), r'(\xi)$ also
satisfy the given conditions. Then $h(\xi)(q'(\xi) - q(\xi)) = r(\xi) - r'(\xi)$.
If $q'(\xi) \neq q(\xi)$, then the degree of the left-hand side is $\ge \deg(h(\xi))$ by
5.11(ii), while by 5.11(i) the degree of the right-hand side is

$$\le \max(\deg(r(\xi)), \deg(r'(\xi))) < \deg(h(\xi)).$$

Hence $q'(\xi) = q(\xi)$, and consequently $r'(\xi) = r(\xi)$.


Note that 6.26 has our earlier theorem 5.13 as a special consequence.
For suppose that $f(\xi) \neq 0$, $a \in K$, and $f(a) = 0$. We can find $q(\xi), r(\xi)$
such that $f(\xi) = (\xi - a)q(\xi) + r(\xi)$ with $0 \le \deg(r(\xi)) < \deg(\xi - a)$,
that is, $\deg(r(\xi)) = 0$, so $r(\xi) = b$ for some $b \in K$. But then $0 = f(a) = b$
so $(\xi - a) \mid f(\xi)$. Theorem 6.26 also yields the following useful completeness
property: if $f(\xi)$ can be divided by $h(\xi)$ in any field larger than $K$,
it can already be divided by $h(\xi)$ in $K[\xi]$. This is proved next.

6.27 Theorem. Suppose that $K$ is a subfield of a field $L$. Suppose that
$f(\xi), h(\xi) \in K[\xi]$. If $h(\xi) \mid f(\xi)$ in $L[\xi]$ then $h(\xi) \mid f(\xi)$ in $K[\xi]$.

Proof. The special case, $\deg(h(\xi)) = 0$, is trivial. Otherwise we have
$q'(\xi) \in L[\xi]$ with $f(\xi) = h(\xi)q'(\xi)$. We also have $q(\xi), r(\xi) \in K[\xi]$ with
$f(\xi) = h(\xi)q(\xi) + r(\xi)$ and $0 \le \deg(r(\xi)) < \deg(h(\xi))$. By uniqueness
in 6.26 applied in $L[\xi]$, $q'(\xi) = q(\xi)$, $r(\xi) = 0$.
Thus we can now speak of $h(\xi) \mid f(\xi)$ without specifying what field this
should take place in. The reader should not jump to conclusions about
other similar-sounding statements. For example, if $p(\xi)$ is prime in $K[\xi]$,
it may fail to be prime in $L[\xi]$, when $K \subseteq L$. (Why?)

Greatest common divisors.

6.28 Definition. Suppose that $f(\xi), g(\xi), d(\xi) \in K[\xi]$. We call $d(\xi)$ a
greatest common divisor (or gcd) of $f(\xi)$ and $g(\xi)$ in $K[\xi]$ if $d(\xi)$ has
the following properties:
(i) $d(\xi) \mid f(\xi)$ and $d(\xi) \mid g(\xi)$;
(ii) if $h(\xi) \in K[\xi]$ and $h(\xi) \mid f(\xi)$, $h(\xi) \mid g(\xi)$, then $h(\xi) \mid d(\xi)$.

Our method to prove the existence and uniqueness (up to $\sim$) of gcd for
polynomials now follows the same lines as for integers.

6.29 Theorem. Suppose that $S \subseteq K[\xi]$ and that $S$ satisfies:
(i) there exists $s(\xi) \in S$ with $s(\xi) \neq 0$;
(ii) if $s(\xi) \in S$ and $h(\xi) \in K[\xi]$ then $s(\xi)h(\xi) \in S$;
(iii) if $s(\xi), t(\xi) \in S$ then $s(\xi) + t(\xi) \in S$.
Then there exists $d(\xi) \in K[\xi]$, with $d(\xi) \neq 0$ and
(iv) for all $h(\xi) \in K[\xi]$, $h(\xi) \in S$ if and only if $d(\xi) \mid h(\xi)$.
[Hence, in particular, $d(\xi) \in S$.] Further if $d_1(\xi)$ is any other polynomial
satisfying condition (iv), we have $d(\xi) \sim d_1(\xi)$. Hence there
is a unique monic $d(\xi)$ satisfying (iv).

Proof. Consider

S* = {m: m ≥ 0 and for some s(ξ) ∈ S, s(ξ) ≠ 0 and deg (s(ξ)) = m}.

We know that S* ≠ ∅ and S* ⊆ P₀ = P ∪ {0}. Hence by well-ordering
of P₀ (4.25), S* has a least element, call it n. We can then find some
d(ξ) ∈ S with d(ξ) ≠ 0 and deg (d(ξ)) = n. We shall show that (iv)
holds for any such d(ξ). If n = 0, d(ξ) = d ∈ K. Then also d·(1/d) ∈ S
by (ii), hence 1 ∈ S. But then h(ξ) ∈ S for all h(ξ) by (ii). On the other
hand, d|h(ξ) for all h(ξ). Thus we can now assume that n > 0. If d(ξ)|h(ξ),
we have h(ξ) = d(ξ)·g(ξ) where g(ξ) ∈ K[ξ]; hence h(ξ) ∈ S by (ii).
Suppose that h(ξ) ∈ S. Then by the division algorithm 6.26 we can find
q(ξ), r(ξ) with h(ξ) = d(ξ)q(ξ) + r(ξ) and 0 ≤ deg (r(ξ)) < n. If
r(ξ) = 0, we are through. Suppose the contrary. Then from (ii) and (iii)
we see that r(ξ) ∈ S. If we put deg (r(ξ)) = m, it follows that m ∈ S*
and m < n. But this contradicts the choice of n. For uniqueness, if
d₁(ξ) is any other polynomial satisfying (iv) we have d(ξ) ∈ S and
d₁(ξ) ∈ S; thus applying (iv) to both gives d(ξ)|d₁(ξ) and d₁(ξ)|d(ξ),
that is, d(ξ) ~ d₁(ξ).
216 THE RATIONAL NUMBERS AND FIELDS [CHAP. 6

6.30 Theorem. If f(ξ), g(ξ) ∈ K[ξ] and f(ξ) ≠ 0 or g(ξ) ≠ 0 then
f(ξ), g(ξ) have a unique monic gcd d(ξ) in K[ξ]. For suitable s(ξ),
t(ξ) ∈ K[ξ] we have d(ξ) = f(ξ)s(ξ) + g(ξ)t(ξ).

Proof. Let S consist of all polynomials of the form f(ξ)s(ξ) + g(ξ)t(ξ).
Then f(ξ), g(ξ) ∈ S. Clearly S satisfies 6.29(i)-(iii). If we choose monic
d(ξ) satisfying 6.29(iv) it follows that d(ξ)|f(ξ), d(ξ)|g(ξ). Further, if
h(ξ)|f(ξ), h(ξ)|g(ξ), it follows from the definition of S that h(ξ) divides
every element of S. In particular h(ξ)|d(ξ). Hence d(ξ) is a monic gcd.
Clearly, if d₁(ξ) is any other monic gcd then d(ξ) ~ d₁(ξ). But then
d(ξ) = d₁(ξ) by 6.23(xii).

6.31 Corollary. Suppose that f(ξ), g(ξ), d(ξ) ∈ K[ξ], f(ξ) ≠ 0 or
g(ξ) ≠ 0, d₁(ξ) ∈ L[ξ] with d(ξ), d₁(ξ) monic gcd's of f(ξ) and g(ξ)
in K[ξ], L[ξ], respectively. Then d(ξ) = d₁(ξ).

Thus also the notion of gcd, for polynomials in a given field, does not
change if we enlarge the field.

6.32 Definition. We define (f(ξ), g(ξ)) for all f(ξ), g(ξ) ∈ K[ξ] with
f(ξ) ≠ 0 or g(ξ) ≠ 0 to be the unique monic gcd d(ξ) of f(ξ) and
g(ξ); f(ξ) and g(ξ) are called relatively prime if (f(ξ), g(ξ)) = 1.

Although the proof of 6.30 does not exhibit it directly, we can actually
provide an algorithm for determining (f(ξ), g(ξ)). The same formal
procedure as given by the Euclidean algorithm for integers, which we
described following 4.42, can be applied here. We leave it to the reader to
work out examples of this in the exercises.
The next two theorems are direct analogues of 4.40-4.48. The second
is derived quite easily from the first and earlier results.
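
The Euclidean procedure for polynomials mentioned above can be sketched as follows. This is an illustration under our own assumptions rather than a definitive implementation: polynomials over Ra are lists of Fraction coefficients with the constant term first, and the helper names (trim, add, mul, poly_divmod, ext_gcd) are hypothetical. Besides the monic gcd d(ξ), the sketch carries along multipliers s(ξ), t(ξ) with d(ξ) = f(ξ)s(ξ) + g(ξ)t(ξ), as promised by 6.30.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q, sign=1):
    """Return p + sign*q."""
    out = [Fraction(0)] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += sign * b
    return trim(out)


def mul(p, q):
    """Return the product p*q."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_divmod(f, h):
    """Return (q, r) with f = h*q + r and r zero or deg r < deg h."""
    r = trim(Fraction(c) for c in f)
    h = trim(Fraction(c) for c in h)
    q = [Fraction(0)] * max(len(r) - len(h) + 1, 0)
    while len(r) >= len(h):
        shift, coeff = len(r) - len(h), r[-1] / h[-1]
        q[shift] = coeff
        for i, c in enumerate(h):
            r[shift + i] -= coeff * c
        r = trim(r)
    return trim(q), r


def ext_gcd(f, g):
    """Return (d, s, t) with d the monic gcd of f, g and d = f*s + g*t."""
    r0 = trim(Fraction(c) for c in f)
    r1 = trim(Fraction(c) for c in g)
    if not r0 and not r1:
        raise ValueError("the gcd is not defined when both polynomials are 0")
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        # Invariant: r0 = f*s0 + g*t0 and r1 = f*s1 + g*t1.
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul(q, s1), -1)
        t0, t1 = t1, add(t0, mul(q, t1), -1)
    lead = r0[-1]  # divide through to make the gcd monic
    return ([c / lead for c in r0],
            [c / lead for c in s0],
            [c / lead for c in t0])


# gcd of xi^3 - 1 and xi^2 - 1, together with the multipliers s and t.
f, g = [-1, 0, 0, 1], [-1, 0, 1]
d, s, t = ext_gcd(f, g)
```

Checking that add(mul(f, s), mul(g, t)) reproduces d is a convenient test of any hand computation carried out in the exercises.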

6.33 Theorem. Suppose that f(ξ), g(ξ), h(ξ) ∈ K[ξ]. Then we have:
(i) if (f(ξ), h(ξ)) = 1 and h(ξ)|f(ξ)g(ξ) then h(ξ)|g(ξ);
(ii) if (f(ξ), g(ξ)) = 1 and f(ξ)|h(ξ) and g(ξ)|h(ξ) then f(ξ)g(ξ)|h(ξ).

Proof. (i) Under the hypothesis and by 6.30, for some s(ξ), t(ξ) ∈ K[ξ],
f(ξ)s(ξ) + h(ξ)t(ξ) = 1. Hence f(ξ)g(ξ)s(ξ) + h(ξ)g(ξ)t(ξ) = g(ξ).
Since h(ξ) divides the left-hand side, h(ξ)|g(ξ).
(ii) follows directly when we write h(ξ) = f(ξ)q(ξ) and apply part (i) to
g(ξ)|f(ξ)q(ξ).

6.34 Theorem. Suppose that f(ξ), g(ξ), f₁(ξ), ..., fₙ(ξ), p(ξ) ∈ K[ξ],
and suppose that p(ξ) is prime in K[ξ]. Then we have:
(i) p(ξ) ∤ f(ξ) ⇔ (f(ξ), p(ξ)) = 1;
(ii) if p(ξ)|f(ξ)g(ξ) then p(ξ)|f(ξ) or p(ξ)|g(ξ);
(iii) if p(ξ)|f₁(ξ) ··· fₙ(ξ) then p(ξ)|fᵢ(ξ) for some i.

Unique factorization theorem for polynomials. We are now able to prove
the main result.

6.35 Theorem. Suppose that f(ξ) ∈ K[ξ], deg (f(ξ)) > 0. Then
(i) there exists a sequence (p₁(ξ), ..., pₖ(ξ)) of polynomials prime
in K[ξ] such that f(ξ) = p₁(ξ) ··· pₖ(ξ), and
(ii) if (q₁(ξ), ..., qₗ(ξ)) is any other sequence of polynomials prime
in K[ξ] such that f(ξ) = q₁(ξ) ··· qₗ(ξ), then k = l and for
some permutation F of {1, ..., k}, qᵢ(ξ) ~ p_{F(i)}(ξ) for all i ≤ k.

Proof. This proceeds in a way quite similar to the proof of 4.50. Here
it is by course-of-values induction (4.49) on the degree of f(ξ). That is,
we let

(1) A = {n: n > 0 and for all f(ξ) ∈ K[ξ] with deg (f(ξ)) = n, we
have (i) and (ii)}.

The induction proceeds by assuming

(2) if 0 < m < n then m ∈ A.

We wish to conclude

(3) n ∈ A.

Suppose it is given that f(ξ) ∈ K[ξ] with deg (f(ξ)) = n. To show (i)
holds for f(ξ), we consider two possibilities. If f(ξ) is prime, we are through.
Otherwise, we know by 6.25 that

(4) there exist g(ξ), h(ξ) ∈ K[ξ] with f(ξ) = g(ξ)h(ξ) and 0 <
deg (g(ξ)) < n and 0 < deg (h(ξ)) < n.

Hence by (1) and (2), both g(ξ) and h(ξ) can be written as products of
prime polynomials; but then so also can f(ξ). To prove that (ii) holds for
f(ξ), suppose that we have

(5) f(ξ) = p₁(ξ) ··· pₖ(ξ) = q₁(ξ) ··· qₗ(ξ).

If k = 1, then we cannot have l > 1, otherwise we would have a proper
factorization of the prime p₁(ξ); similarly if l = 1. Thus we can assume
now that both k > 1, l > 1. By 6.34(iii), p₁(ξ)|qⱼ(ξ) for some j. Then
by definition (since p₁(ξ) ≁ 1), p₁(ξ) ~ qⱼ(ξ), qⱼ(ξ) = ap₁(ξ) for some
a ≠ 0. Hence

(6) p₂(ξ) ··· pₖ(ξ) = (aq₁(ξ)) ··· qⱼ₋₁(ξ) · qⱼ₊₁(ξ) ··· qₗ(ξ).

On the left-hand side we have a polynomial of degree m, 0 < m < n.



The polynomial aq₁(ξ) is prime since it is ~ to q₁(ξ). Now by induction
hypothesis (2), k − 1 = l − 1 and the polynomials on the right-hand
side of (6) are under some one-to-one correspondence with those on
the left side. This correspondence is directly extended to the desired
permutation in (5).
As a second form of the unique factorization theorem we have the
following.

6.36 Theorem. Suppose that f(ξ) ∈ K[ξ], deg (f(ξ)) > 0. Then
(i) there exists an a ∈ K with a ≠ 0 and a sequence (p₁(ξ), ..., pₖ(ξ))
of monic polynomials prime in K[ξ] such that

f(ξ) = ap₁(ξ) ··· pₖ(ξ),

and
(ii) if b ∈ K, b ≠ 0, and (q₁(ξ), ..., qₗ(ξ)) is any other sequence of
monic polynomials prime in K[ξ] such that

f(ξ) = bq₁(ξ) ··· qₗ(ξ),

we have k = l, a = b, and for some permutation F of {1, ..., k},
qᵢ(ξ) = p_{F(i)}(ξ) for all i ≤ k.

The proof is left as an exercise.


For fruitful applications of these theorems we must wait for further
developments. It may be expected that such applications must depend
to a certain extent on determining, for a given field K, exactly what
polynomials are prime in K[ξ]. In contrast to the situation for determining
the prime integers, this is a quite difficult problem in general. It can be
shown, for example, that there is an algorithm for determining whether
any given polynomial f(ξ) ∈ Ra[ξ] is prime there, but its application is
troublesome. In practice it turns out more suitable to try to obtain simple
results for special classes of polynomials. The simplest result of this kind,
which holds in any field, is the following.

6.37 Theorem. Suppose that f(ξ) ∈ K[ξ] and n = deg (f(ξ)). Then:
(i) if n = 1, f(ξ) is prime in K[ξ];
(ii) if n = 2 or 3, f(ξ) is prime in K[ξ] if and only if f(ξ) has no
root in K.

Proof. (i) is obvious. Consider (ii). If f(ξ) is prime in K[ξ], but f(ξ)
has a root a in K, then (ξ − a)|f(ξ) by 5.13. Clearly, (ξ − a) is not ~ 1
and not ~ f(ξ). Conversely, suppose that f(ξ) has no root in K, but
f(ξ) = g(ξ)h(ξ) is a proper factorization. Then deg (g(ξ)) or deg (h(ξ)) = 1.
Hence one of these has the form a₁ξ + a₀, with a₁ ≠ 0. But then
−a₀/a₁ would be a root of f(ξ).

In particular, if f(ξ) ∈ Ra[ξ] is of degree 2 or 3, we can further test
to see whether f(ξ) is prime over Ra by clearing of fractions to give an
~ polynomial with integer coefficients and applying 6.19. The fact that
only finitely many fractions b/c need be considered in Ra provides us
with a simple algorithm for this case. Often we can do better by applying
6.20 or 6.21. Among other things, 6.21 shows that there are infinitely
many prime polynomials of both degrees 2 and 3.
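
The finite search just described can be sketched as follows. We assume here, as a labeled assumption about 6.19, that it is the familiar criterion that a rational root b/c in lowest terms of a polynomial with integer coefficients must have b dividing the constant term and c dividing the leading coefficient. The function names are hypothetical, and coefficients are listed with the constant term first.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """All rational roots of an integer polynomial (constant term first)."""
    coeffs = list(coeffs)
    if not coeffs or coeffs[-1] == 0:
        raise ValueError("leading coefficient must be nonzero")
    roots = set()
    # 0 is a root exactly when the constant term vanishes; strip factors of xi.
    while coeffs[0] == 0:
        roots.add(Fraction(0))
        coeffs = coeffs[1:]

    def value(x):
        acc = Fraction(0)
        for c in reversed(coeffs):  # Horner evaluation
            acc = acc * x + c
        return acc

    # Only finitely many candidates +-b/c need to be tried.
    for b in divisors(coeffs[0]):
        for c in divisors(coeffs[-1]):
            for cand in (Fraction(b, c), Fraction(-b, c)):
                if value(cand) == 0:
                    roots.add(cand)
    return roots


def prime_in_Ra_deg_2_or_3(coeffs):
    """Apply 6.37(ii): degree 2 or 3 and no rational root."""
    if len(coeffs) - 1 not in (2, 3):
        raise ValueError("the root test only decides degrees 2 and 3")
    return not rational_roots(coeffs)


print(prime_in_Ra_deg_2_or_3([-2, 0, 1]))   # xi^2 - 2
print(prime_in_Ra_deg_2_or_3([2, -3, 1]))   # xi^2 - 3 xi + 2 = (xi - 1)(xi - 2)
```

The restriction to degrees 2 and 3 is deliberate; for higher degrees the absence of rational roots does not settle primeness.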
If K is a finite field, say one of the fields of integers modulo a prime, we
can test for roots directly and hence test for primeness of quadratic or
cubic polynomials.
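
For the integers modulo a prime p the direct test amounts to trying each of the p residues in turn. A minimal sketch, with hypothetical function names and coefficients listed constant term first:

```python
def roots_mod_p(coeffs, p):
    """Roots in {0, ..., p-1} of the polynomial taken modulo the prime p."""
    return [x for x in range(p)
            if sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p == 0]


def prime_mod_p_deg_2_or_3(coeffs, p):
    """Apply 6.37(ii) in the field of integers modulo p."""
    degree = len(coeffs) - 1
    if degree not in (2, 3) or coeffs[-1] % p == 0:
        raise ValueError("need degree 2 or 3 with leading coefficient nonzero mod p")
    return not roots_mod_p(coeffs, p)


# xi^2 + 1 has no root modulo 3, but 2 and 3 are roots modulo 5.
print(roots_mod_p([1, 0, 1], 3), roots_mod_p([1, 0, 1], 5))
```

The same polynomial can thus be prime over one field of this kind and not over another.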
Such a method will not work for degree n = 4. For example, ξ² − 2
has no rational roots, hence neither does f(ξ) = (ξ² − 2)²; but this f(ξ)
is not prime in Ra[ξ]. However, it can be shown that for each n > 3
there are also infinitely many prime polynomials in Ra[ξ] of degree n.
This situation changes radically as we pass to other fields. Among the main
results we shall obtain are that every prime polynomial in the real numbers
is of degree 1 or 2, while every prime polynomial in the complex numbers
is of degree 1. Hence in both these cases we have a complete survey, using
the factorization theorem, of the structure of all polynomials.
We shall conclude this chapter with a general result making use of the
gcd of polynomials. This will be seen to have some simplifying value in
the determination of roots of polynomials. Suppose that c ∈ K and that
f(c) = 0, where f(ξ) ∈ K[ξ] and f(ξ) ≠ 0. It may happen that not only
(ξ − c)|f(ξ) but also (ξ − c)²|f(ξ). There is a largest positive integer
m such that (ξ − c)ᵐ|f(ξ). This leads to the following.

6.38 Definition. Suppose that f(ξ) ∈ K[ξ], f(ξ) ≠ 0, and that f(c) = 0
where c ∈ K.
(i) By the order of c in f(ξ) we understand the largest positive integer
m such that (ξ − c)ᵐ|f(ξ).
(ii) We call c a simple root of f(ξ) if its order in f(ξ) is 1; otherwise
we shall call c a multiple root of f(ξ).

Suppose that c is a root of f(ξ) of order m (≥ 1). Write

(6:4-6) f(ξ) = (ξ − c)ᵐ g(ξ).

Then g(c) ≠ 0, for otherwise (ξ − c)ᵐ⁺¹|f(ξ). By 5.17 we have

(6:4-7) f'(ξ) = m̄(ξ − c)ᵐ⁻¹ g(ξ) + (ξ − c)ᵐ g'(ξ),

where m̄ = m1 (the m-fold multiple of the unit element 1 of K). Let

(6:4-8) d(ξ) = (f(ξ), f'(ξ)) and f₁(ξ) = f(ξ)/d(ξ).



It may happen that d(ξ) ~ f(ξ), and then f₁(ξ) is constant. This will
be the case if f'(ξ) = 0; such can happen, e.g., with f(ξ) = ξᵖ − a in
the field I_p. However, we claim that

(6:4-9) if m̄ ≠ 0 then f₁(c) = 0, and in fact c is a simple root of f₁(ξ).

For we know by (6:4-7) that (ξ − c)ᵐ⁻¹|f'(ξ) and hence (ξ − c)ᵐ⁻¹|d(ξ).
If (ξ − c)ᵐ|d(ξ) then also (ξ − c)ᵐ|f'(ξ). From this would follow
(ξ − c)|m̄g(ξ). But m̄ ≠ 0, so (ξ − c) ∤ m̄, and hence (ξ − c)|g(ξ).
This leads to the contradiction g(c) = 0. Thus c is exactly of order
m − 1 in d(ξ), i.e., we can write d(ξ) = (ξ − c)ᵐ⁻¹h(ξ), where h(c) ≠ 0.
Then necessarily h(ξ)|g(ξ), g(ξ) = h(ξ)g₁(ξ), and f₁(ξ) = (ξ − c)g₁(ξ).
Since g(c) ≠ 0, also g₁(c) ≠ 0, so that we have the conclusion of (6:4-9).
This argument proves the following result.

6.39 Theorem. Suppose that m1 ≠ 0 for each m ∈ P. Consider any
f(ξ) ∈ K[ξ] with f(ξ) ≠ 0. Let d(ξ) = (f(ξ), f'(ξ)) and f₁(ξ) =
f(ξ)/d(ξ). Then:
(i) f₁(ξ) has exactly the same roots in K as f(ξ), and
(ii) f₁(ξ) has only simple roots in K.

Note that the hypothesis is satisfied in any ordered field. Note also
that the choice of f₁(ξ) is independent of K, since f'(ξ) and d(ξ) are
independent of K; for the gcd and f₁(ξ) can be found by Euclid's algorithm
followed by the division algorithm.
If d(ξ) = 1 then f(ξ) = f₁(ξ) and f(ξ) has only simple roots in any
field K which contains its coefficients (so long as m1 ≠ 0 for all m ∈ P).
A partial converse is true for the case that the only polynomials which
are prime in K[ξ] are of degree 1; in this case, if f(ξ) has only simple roots
in K then d(ξ) = 1. This can be proved by induction on the degree of
f(ξ) using (6:4-6) and (6:4-7). The added condition on K is essential;
for example, the polynomial f(ξ) = (ξ² − 2)² in Ra[ξ] has only simple
roots in Ra, since it has no roots in Ra, but (f(ξ), f'(ξ)) = (ξ² − 2).
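
The passage from f(ξ) to f₁(ξ) in 6.39 can be carried out mechanically over Ra. The sketch below is an illustration under our own conventions (coefficient lists with the constant term first, hypothetical helper names); it computes d(ξ) by Euclid's algorithm and then f₁(ξ) by one further division.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, h):
    """Return (q, r) with f = h*q + r and r zero or deg r < deg h."""
    r = trim(Fraction(c) for c in f)
    h = trim(Fraction(c) for c in h)
    q = [Fraction(0)] * max(len(r) - len(h) + 1, 0)
    while len(r) >= len(h):
        shift, coeff = len(r) - len(h), r[-1] / h[-1]
        q[shift] = coeff
        for i, c in enumerate(h):
            r[shift + i] -= coeff * c
        r = trim(r)
    return trim(q), r


def derivative(f):
    """Formal derivative: the coefficient of xi^i is multiplied by i."""
    return trim([i * c for i, c in enumerate(f)][1:])


def monic_gcd(f, g):
    """Monic gcd by Euclid's algorithm (f and g not both zero)."""
    f = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    while g:
        f, g = g, poly_divmod(f, g)[1]
    return [c / f[-1] for c in f]


def simple_root_part(f):
    """Return (d, f1) with d = (f, f') and f1 = f/d, as in (6:4-8)."""
    d = monic_gcd(f, derivative(f))
    f1, remainder = poly_divmod(f, d)
    assert not remainder  # d divides f by definition of the gcd
    return d, f1


# f = (xi - 1)^2 (xi + 2) = xi^3 - 3 xi + 2 has the double root 1.
d, f1 = simple_root_part([2, -3, 0, 1])
# The example from the text: f = (xi^2 - 2)^2 = xi^4 - 4 xi^2 + 4.
d2, f2 = simple_root_part([4, 0, -4, 0, 1])
```

In the first example f₁(ξ) keeps the roots 1 and −2 but each now has order 1; in the second, d(ξ) comes out as ξ² − 2 even though f(ξ) has no roots in Ra at all.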

Exercise Group 6.4

1. Prove Theorem 6.23(xi)-(xiii).
2. Prove Corollary 6.31.
3. Prove Theorem 6.36.
4. Find the gcd (f(ξ), g(ξ)) for the following polynomials in Ra[ξ].
(a) f(ξ) = ξ¹⁸ − 1, g(ξ) = ξ³³ − 1
(b) f(ξ) = 2ξ³ − 2, g(ξ) = ξ⁴ + ξ³ + ξ² + ξ + 1
(c) f(ξ) = ξ² + ξ + 1, g(ξ) = ξ⁴ + ξ² + 1

5. Find the gcd (f(ξ), g(ξ)), where f(ξ), g(ξ) in 4(c) are regarded as
polynomials in I₂[ξ].
6. Which of the following polynomials is prime in Ra[ξ]? Prove your
assertion.
(a) e + a - 1 (b) ξ² + 3ξ − 4
(c) ξ³ − 12 (d) ξ³ + ξ − 2
7. Factor the polynomial f(ξ) = ξ⁶ − 1 into polynomials prime in Ra[ξ].
Prove that your result is correct.
8. Find all polynomials prime in I₃[ξ] and of degree <3.

9. Let K be any field in which the only prime polynomials are those of degree 1.
Suppose that f(ξ) ∈ K[ξ] with f(ξ) ≠ 0. Show that if f(ξ) has no multiple
roots in K then (f(ξ), f'(ξ)) = 1.
10. Suppose that p(ξ) is prime in K[ξ]. For any f(ξ), g(ξ) ∈ K[ξ], define
f(ξ) ≡ g(ξ) (mod p(ξ)) to mean p(ξ)|(f(ξ) − g(ξ)), and f(ξ) ≢ g(ξ)
(mod p(ξ)) to mean that this is not the case. Show that if f(ξ) ≢ 0
(mod p(ξ)), then there exists h(ξ) ∈ K[ξ] with f(ξ)h(ξ) ≡ 1 (mod p(ξ)).
What ideas does this suggest?
11. Show that any f(ξ), g(ξ) ∈ K[ξ] have a least common multiple (lcm),
i.e., a polynomial m(ξ) such that f(ξ)|m(ξ), g(ξ)|m(ξ) and whenever
f(ξ)|h(ξ), g(ξ)|h(ξ), then m(ξ)|h(ξ).
12. We consider elements ρ of the field K(ξ) of rational forms over K.
(a) Show that if g(ξ), h(ξ) ∈ K[ξ], g(ξ) ≠ 0, h(ξ) ≠ 0, and (g(ξ), h(ξ)) = 1
then there exist s(ξ), t(ξ) ∈ K[ξ] with

1/(g(ξ)h(ξ)) = s(ξ)/g(ξ) + t(ξ)/h(ξ).

(b) Show that if ρ ∈ K(ξ) then ρ can be represented as a sum of a number
of terms, one of which is in K[ξ], and the others of which each has
the form r(ξ)/(p(ξ))ⁱ for some r(ξ), p(ξ) ∈ K[ξ], where p(ξ) is prime
in K[ξ], i > 0, and deg (r(ξ)) < deg (p(ξ)). (This is often called
the partial-fractions representation of ρ over K.)
CHAPTER 7

THE REAL NUMBERS

7.1 Toward extending the rationals. Algebraic motivations. At the
beginning of the last chapter we discussed the algebraic and geometric
motivations for introducing the rational numbers. Expressed in terms of
polynomials, the algebraic motivation was quite simply that not even
the first-degree equations aξ − b = 0 could in general be solved in
integers, for a, b ∈ I. The idea for constructing an integral domain
extending I in which such equations could be solved, at least when we could
hope for a solution, i.e., when a ≠ 0, was directly suggested by this algebraic
consideration; so in turn was the general concept of a field. As we saw in
Section 6.3, fields serve to do much more than originally proposed, namely
they provide the proper framework for the study of systems of linear
equations. On the other hand, fields in general, and the rational numbers
in particular, are far from providing us with a free hand for the solutions
of equations of degree 2 or higher. For example, we saw in 6.21 that for
n > 1 and for any a ∈ I which is not a perfect nth power, ξⁿ − a has
no rational roots.
Now there is no a priori reason to have the feeling that every f(ξ) ∈ Ra[ξ]
of degree > 0 should have a root "somewhere." If we are thinking of
numbers as having to do with the measurement of lengths, then we are thinking
in terms of an ordered field. But in any ordered field K (say, containing
Ra), x² > 0 for x ≠ 0, and hence x² + 1 > 0 for all x ∈ K. Therefore
ξ² + 1 has no roots in K. On the other hand, there are good reasons for
wanting to say that ξ² − 2 has a root "somewhere," i.e., that in a suitable
ordered field K extending Ra, there is an x with x² = 2. For Pythagoras'
theorem ascribes such x as length to the hypotenuse of a right
triangle with legs both of length equal to 1.
Even if we have reservations concerning the validity of the basic
geometric principles from which Pythagoras’ theorem is drawn, it still
makes perfect algebraic sense to try to see whether we can construct an
ordered field K, extending the rationals, which contains a root s of ξ² − 2.
If we have such a field K and an s ∈ K with s² = 2, let

(7:1-1) S = {a + bs: a, b ∈ Ra}.

Certainly Ra ⊆ S ⊆ K. Furthermore, it is easily seen that S is closed


7.1] TOWARD EXTENDING THE RATIONALS 223

under + and −. It is also closed under multiplication, for

(a₀ + b₀s)(a₁ + b₁s) = a₀a₁ + (a₀b₁ + a₁b₀)s + b₀b₁s²
                     = (a₀a₁ + 2b₀b₁) + (a₀b₁ + a₁b₀)s.

Note next that for a₀, b₀, a₁, b₁ ∈ Ra,

(7:1-2) a₀ + b₀s = a₁ + b₁s if and only if a₀ = a₁ and b₀ = b₁.

For (b₀ − b₁)s = a₁ − a₀, so that if b₀ − b₁ ≠ 0,

s = (a₁ − a₀)/(b₀ − b₁) ∈ Ra,

which is impossible, since ξ² − 2 has no rational roots. Hence b₀ = b₁,
and then also a₀ = a₁.
It follows that a + bs = 0 if and only if a = b = 0; hence if a + bs ≠ 0,
also a − bs ≠ 0. Now we can see that S is closed under division by
nonzero elements: for

1/(a + bs) = (a − bs)/((a + bs)(a − bs)) = (a − bs)/(a² − b²s²) = (a − bs)/(a² − 2b²).

(Necessarily, by the preceding, a² − 2b² ≠ 0, since it is the product of
the nonzero elements a + bs, a − bs; but this is also seen directly from
the fact that ξ² − 2 has no rational roots.) Thus 1/(a + bs) = a' + b's,
where a' = a/(a² − 2b²), b' = −b/(a² − 2b²), and hence is again in S.
It follows that

(7:1-3) S forms an ordered field when the operations of K are restricted
to S.

The interesting aspect of this is that from any ordered field K which con¬
tains a square root s of 2, we can construct a subfield S which contains
the same root, and which contains nothing more than what is demanded
by these conditions; we say that s generates S (over Ra).
Now (7:1-2) and our proofs of closure suggest how to construct a field S
which satisfies these conditions. We first construct a system S̄ which will
be ≅ to S:

(7:1-4) S̄ consists of all pairs (a, b) for a, b ∈ Ra.

For (a₀, b₀), (a₁, b₁) ∈ S̄ we define

(a₀, b₀) + (a₁, b₁) = (a₀ + a₁, b₀ + b₁)

and

(a₀, b₀) · (a₁, b₁) = (a₀a₁ + 2b₀b₁, a₀b₁ + a₁b₀).

Finally we put ā = (a, 0) for each a ∈ Ra and s̄ = (0, 1). It can then be
shown that under these definitions, (S̄, +, ·, 0̄, 1̄) is a field in which the
224 THE REAL NUMBERS [CHAP. 7

set R̄a of all elements ā with a ∈ Ra constitutes a subfield ≅ to Ra.
Finally, s̄² = 2̄ in S̄. We could also, if desired, define a suitable ordering
relation < under which S̄ becomes an ordered field.
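
The pair construction (7:1-4) is easy to experiment with. The following sketch is only an illustration: the class name Pair and the helper embed are hypothetical, and rationals are modeled by Python's Fraction. It implements the two operations defined above, together with the inverse suggested by the computation of 1/(a + bs).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Pair:
    """The pair (a, b), standing for a + b*s with s*s = 2."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return Pair(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a0, b0) . (a1, b1) = (a0 a1 + 2 b0 b1, a0 b1 + a1 b0)
        return Pair(self.a * other.a + 2 * self.b * other.b,
                    self.a * other.b + other.a * self.b)

    def inverse(self):
        # a^2 - 2 b^2 vanishes only for the pair (0, 0),
        # because no rational has square 2.
        norm = self.a * self.a - 2 * self.b * self.b
        if norm == 0:
            raise ZeroDivisionError("(0, 0) has no inverse")
        return Pair(self.a / norm, -self.b / norm)


def embed(a):
    """The pair (a, 0) associated with the rational a."""
    return Pair(Fraction(a), Fraction(0))


s = Pair(Fraction(0), Fraction(1))
x = Pair(Fraction(1), Fraction(1))       # stands for 1 + s
print(s * s == embed(2))                 # the pair (0, 1) squares to (2, 0)
print(x * x.inverse() == embed(1))
```

A few spot checks of this kind do not replace the verification of the field axioms, which proceeds from the definitions of + and · exactly as indicated in the text.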
The preceding gives a hint of a more general situation. Given a
polynomial f(ξ) ∈ Ra[ξ] of degree > 0, we can ask whether there exists a field
K (or, possibly, an ordered field K) in which f(ξ) has a root, or has "all"
its roots. To see how we might construct a field of this kind, we would
first try to analyze, for such a K, what the set S of elements generated by
such a root looks like. A successful analysis would then show us how to
construct a suitable S directly. As we have seen, we cannot hope to get
an ordered field containing a root of ξ² + 1, but we might still hope to
get a field containing such a root if we drop the ordering requirement.
As we shall see in Chapter 9, this general plan can be carried out for
all nontrivial f(ξ). It provides the first step toward a satisfactory general
treatment of solutions of equations over the rationals. We say "first
step," because even if we succeed in doing what is suggested, we still
have the following difficulty. This plan merely associates with each
nontrivial f(ξ) ∈ Ra[ξ] a field K in which it has a root; in fact the
construction will associate with f(ξ) a "smallest" such K. We cannot then expect
that fields K₁ and K₂ thus associated with different f₁(ξ), f₂(ξ) are the
same, or even ≅. This would prevent us from simultaneously dealing with
and combining in various ways roots of several equations. What would
really be desirable is a single field K which contains all roots of all nontrivial
polynomials over Ra. In fact, a K satisfying this condition can also be
constructed. It can be imagined that the construction of such a field
might involve, in some sense, an infinite number of special constructions
dealing with each /(£) in turn.

Geometric motivations. In contrast, the geometric motivation for
extending the system of rational numbers and the method suggested by it for
proving the existence of a satisfactory extension are, at least at the
beginning, easier to deal with than the algebraic approach just sketched.
Furthermore, the geometric approach contains an idea which has many
applications in other branches of mathematics, especially analysis and
“point-set” topology. Finally, we shall see after carrying through in this
direction that when the geometric approach is combined with one algebraic
step to the complex numbers, it provides us with a suitable single frame¬
work for analyzing the multiplicity of algebraic extensions described above.
The geometric ideas which we shall use often in the following are only
for heuristic purposes. The definitions we make and results we obtain
will be based strictly on the set-theoretical and algebraic notions we have
dealt with up to now. (This is not to say that we believe that geometrical
intuitions are less trustworthy than set-theoretical ones; rather that it is

not necessary to make any additional assumptions as to the nature of
mathematical objects beyond those made in set theory.)
The process of measuring the lengths of (straight) line segments involves
certain comparisons. Basic to any particular assignment of numbers as
measures to lengths is the choice of a particular line segment as providing
a unit of length. Consider a straight line which is infinite in one direction
(a ray):

[Figure 7.1: a ray with origin 0 and the points 1, 2, 3, 4, 5 marked at unit intervals; the point P lies between 3 and 4.]

We have brought the unit length to coincide at one end with the origin
of this line, which we label 0, the other end falling at a point labeled 1.
We also call the corresponding points 0 and 1, respectively. Given any
line segment, we can attempt to measure it by first laying off an equal seg¬
ment on the above line so that one end coincides with the origin. Let us
denote the other end point resulting from this transfer by P; we shall
also use P to denote any number that we succeed in assigning as length to
OP. We then lay off the segment 01 end to end a number of times until
we either reach P exactly as end point or obtain P between successive such
end points. The resulting end points are labeled by the positive integers
1, 2, 3, . . . In the above picture, the length P (regarded as a number)
of OP is not exactly an integer; we have 3 < P < 4. Our next step is
to refine the measurement. This is in effect a choice of a new unit of length,
but notationally it preserves the previous assignment of numbers by
introducing fractional quantities.
If our unit of measurement is the inch, we would usually refine further
by dividing the unit segment into two equal parts, then each of those parts
into two equal parts, and so on. If our unit of measurement is in the metric
system, say, is the meter, we would refine further by dividing the unit
segment into ten equal parts, then each of those parts into ten equal parts,
and so on. In general, given any positive integer b, we can divide the unit
segment into b equal parts by the ruler and compass construction indicated
in the following figure for the special case b = 5.

Figure 7.2

We mark off equal lengths OP₁ = P₁P₂ = P₂P₃ = ··· = P_{b−1}P_b on a
new straight line emanating from 0. We draw the line 1P_b, and then for
any i draw a line through Pᵢ parallel to 1P_b. The point of intersection Qᵢ
with 01 provides a division OQ₁ = Q₁Q₂ = Q₂Q₃ = ··· = Q_{b−1}1 of 01
into b equal parts. We also write 1/b for Q₁, and then denote by a/b the
result of laying off the segment OQ₁ a times. (In particular, Q₂, Q₃, ...
are also labeled 2/b, 3/b, ..., respectively.) By this means a definite
point P on the original line is assigned (with respect to the given unit of
measurement) to any nonnegative rational number a/b.
Now given any straight line, infinite in both directions, a point chosen
arbitrarily on the line as the origin, and a unit of measurement, we can
assign to any rational number a/b a definite point on the line by assigning
nonnegative rationals as above to points on one (the "right") side of the origin
and to negative rationals −a/b, where a, b > 0, the point distant a/b
from the origin on the other ("left") side.

Figure 7.3

In this correspondence, r < s for rational numbers r, s if and only if the
point (labeled) r is to the left of the point s. Thus between any two
distinct points r and s there is always a third point [for example,
(r + s)/2]. Because of this dense ordering of the rational points (i.e.,
points to which rationals have been assigned), we have the appearance of
a completely filled-up line. Nevertheless, Pythagoras’ theorem shows that
this is not so. If the hypotenuse AB of a right triangle with legs both of
length 1 is brought to coincide with OP, where 0 < P, P cannot be rational,
for P² = 2. We can imagine that the straight line is like a thread in which
every point is essential to holding the thread together. Omitting the point
P with P² = 2, 0 < P, will not result in any loss of rational points, but
the thread is now cut into two separate pieces. Indeed, as can be seen,
between any distinct rational points r and s there can always be found
some nonrational point P. (Why?) Hence omitting all the nonrational
points would cause the thread to disintegrate.

Upper and lower sections; continuously ordered systems. We begin to
see now a characteristic property of the ordering of all the points on the
line which is somewhat stronger than the mere property of density of the

ordering (which already holds for the rational points). We wish to
formulate this property in set-theoretical terms. Consider, for example, the
statement that there is a gap in the rational numbers at "where √2 ought to
be." Let

A = {r: r ∈ Ra and r > 0 and r² > 2},

B = {r: r ∈ Ra and r < 0 or r² < 2},

i.e., A consists of all rationals to the right of, or above, √2, and B of all
those to the left of, or below, √2. Intuitively, A and B have the following
properties, for S = Ra:

(7:1-5) (i) A ≠ ∅ and B ≠ ∅;
(ii) A ∪ B = S;
(iii) if r ∈ B and s ∈ A then r < s;
(iv) if r ∈ B and r' < r then r' ∈ B;
(v) if s ∈ A and s < s' then s' ∈ A;
(vi) if r ∈ B then there exists r' ∈ B with r < r';
(vii) if s ∈ A then there exists s' ∈ A with s' < s.
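
Properties (vi) and (vii) can be made explicit. In the sketch below the membership tests follow the definitions of A and B; the map r ↦ (2r + 2)/(r + 2) is our own choice of witness, not something taken from the text. For rational r > 0 one computes r' − r = (2 − r²)/(r + 2) and r'² − 2 = 2(r² − 2)/(r + 2)², so r' moves strictly toward the gap without crossing it.

```python
from fractions import Fraction


def in_A(r):
    """r belongs to the upper section A."""
    return r > 0 and r * r > 2


def in_B(r):
    """r belongs to the lower section B."""
    return r < 0 or r * r < 2


def step(r):
    """For rational r > 0, a rational strictly between r and the gap."""
    return (2 * r + 2) / (r + 2)


def larger_in_B(r):
    """Witness for (vi): some r' in B with r < r'."""
    assert in_B(r)
    return step(r) if r > 0 else Fraction(1)


def smaller_in_A(s):
    """Witness for (vii): some s' in A with s' < s."""
    assert in_A(s)
    return step(s)


r = Fraction(7, 5)               # 49/25 < 2, so r is in B
print(r, larger_in_B(r))         # the second value is larger and still in B
s = Fraction(3, 2)               # 9/4 > 2, so s is in A
print(s, smaller_in_A(s))        # the second value is smaller and still in A
```

Iterating step from either side produces rationals in B and in A that approach each other indefinitely, which is one way of seeing that neither section has an extreme element.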

We might call such a pair A, B a cut in the set S, with A the upper section
and B the lower section of the cut. Let us indicate a lower section with no
largest element [i.e., (i), (iv), (vi) hold] by <-----), a lower section
with a largest element [i.e., (i), (iv) hold, but not (vi)] by <-----].
Similarly, we use (----->, [-----> for upper sections with no, or
some, smallest element, respectively. Then we can combine lower and
upper sections in the rationals, i.e., to satisfy (7:1-5)(i)-(v) in only one
of three ways:
(a) <-----](----->
(b) <-----)[----->
(c) <-----)(----->

Figure 7.4

(The fourth conceivable possibility is excluded by density.) Now
intuitively, every pair of sections satisfying (i)-(v) in the set S of all points
on the line should give a picture like Fig. 7.4(a) or (b); i.e., every cut in
the full line goes through some point. In contrast, the particular subsets
A and B of Ra defined above yield a cut in Ra which looks like Fig. 7.4(c).
Although we have been led to these considerations from ideas involving
measurement, we see now that the "defect" in the rational numbers is
one which can be expressed entirely in terms of its ordering relation. To
simplify matters, note that in discussing cuts in a general set S, it is not
really necessary to involve both lower and upper sections, for each is the

complement of the other. Furthermore, if we restrict consideration to
upper sections, it is only necessary to consider those which have no least
element, as in Fig. 7.4(a) and (c). For given an upper section as in case (b),
we can associate a new upper section of type (a) or (c) by removing the
least element of the given upper section and adjoining it to the given
lower section.

7.1 Definition. Let (S, <) be a simply ordered system.
(i) By an upper section in (S, <) we mean a set A ⊆ S satisfying the
following conditions:
(a) A ≠ ∅ and S − A ≠ ∅;
(b) if x ∈ A and x < x' then x' ∈ A;
(c) if x ∈ A then there exists an x' < x with x' ∈ A.
We denote by U(S) the collection of all upper sections in S.
(ii) (S, <) is said to be continuously ordered if (S, <) is densely
ordered without first or last element and for every A ∈ U(S), S − A
contains a largest element.

Recall the definition 3.28 of largest element; for B = S − A, y is a
largest element of B if y ∈ B and for all y' ∈ B, we have y' ≤ y. It is
easily seen that if A is an upper section in the simply ordered system
(S, <) and B = S − A, then (1) if y ∈ B and x ∈ A then y < x, for
otherwise by (b) above, x ≤ y and y ∈ A; (2) if y ∈ B and y' < y then
y' ∈ B, for otherwise y' ∈ A and then y ∈ A by (b) above. Hence, if
in addition B has no largest element, we see that all the conditions of
(7:1-5) are satisfied; in this case the system would not be continuously
ordered.
The proof that nothing is lost from our original ideas concerning both
upper and lower sections is contained in Theorem 7.3.

7.2 Definition. Let (S, <) be a simply ordered system. By a lower
section in (S, <) we mean a set B ⊆ S satisfying the following conditions:
(a) B ≠ ∅ and S − B ≠ ∅;
(b) if y ∈ B and y' < y then y' ∈ B;
(c) if y ∈ B then there exists y' with y < y' and y' ∈ B.
We denote by L(S) the collection of all lower sections in S.

7.3 Theorem. If (S, <) is a continuously ordered system and B ∈ L(S)
then S − B contains a least element.

Proof. Suppose that S − B does not contain a least element. Let
A = S − B. Then A ∈ U(S). For by 7.2(a), both S − A (= B) and A
are nonempty. Further, if x ∈ A then y < x for all y ∈ B [for otherwise,
by 7.2(b) we would have x ∈ B]. Hence if x < x', also y < x' for all
y ∈ B, so that x' ∉ B, i.e., x' ∈ A. Finally, if x ∈ A then there exists

x' < x with x' ∈ A; for otherwise, x would be the least element of A,
contrary to hypothesis. Thus all the conditions for A ∈ U(S) are satisfied.
Hence by 7.1(ii), B = S − A contains a largest element. But this
contradicts 7.2(c).

7.4 Theorem. (Ra, <) is not continuously ordered.

We leave the proof to the reader.

Existence of continuously ordered systems. Intuitively, the set of all
points on a straight line forms a continuously ordered system. We wish
now to give a precise set-theoretical proof of the existence of such a system.
This proof, and the entire related development of ideas concerning sections,
is due to Dedekind. (Historically, these ideas can be traced back to the
treatment of incommensurable quantities given by the Greek geometer
Eudoxus and included in Euclid’s Elements.) Later we shall give a different
proof based on ideas of Cauchy.

7.5 Theorem. Let (S, <) be a densely ordered system without first or last
element. For X, Y ∈ U(S), put X ≺ Y if Y ⊆ X and Y ≠ X. Then:
(i) (U(S), ≺) is a continuously ordered system which contains a
subsystem ≅ to (S, <);
(ii) if (S, <) is already continuously ordered, then

(U(S), ≺) ≅ (S, <).

Proof. An intuitive picture of the ordering X ≺ Y between elements
X, Y of U(S) is

[Figure 7.5: the upper sections X and Y drawn on a line, with Y contained in X.]

We shall also write Y ⊂ X for Y ⊆ X and Y ≠ X. Thus X ≺ Y is
equivalent to Y ⊂ X. Hence it is clear that X ⊀ X for any X, and that
if X ≺ Y and Y ≺ Z then X ≺ Z. To conclude the verification that

(1) (U(S), ≺) is simply ordered,

we must show that for any X, Y ∈ U(S) if X ≠ Y then X ≺ Y or Y ≺ X.
Since X ≠ Y there is either some u ∈ X − Y or some v ∈ Y − X.
We show in the first case that X ≺ Y; by symmetry, we can conclude

then that Y ≺ X in the second case. Suppose that y ∈ Y; either u < y
or y ≤ u. In the latter case, u ∈ Y by 7.1(b), which contradicts the choice
of u. Hence u < y and then y ∈ X, again by 7.1(b). In other words,
Y ⊆ X; but Y ≠ X, so X ≺ Y.
Before completing the proof of the continuous ordering of (U(S), ≺),
let us show the second part of (i). With each a ∈ S is naturally associated
the set

(2) C_a = {x: x ∈ S and a < x}.

Then

(3) if a ∈ S we have C_a ∈ U(S).

For, first, a ∈ S − C_a and there is an x ∈ S with a < x by hypothesis,
so 7.1(a) is satisfied. Obviously, so also is 7.1(b). Finally if x ∈ C_a then
there is x' ∈ C_a with x' < x by the dense ordering of S.
Next we see that

(4) if a, b ∈ S and a < b then C_a ≺ C_b.

For clearly C_b ⊆ C_a and b ∈ C_a − C_b, so C_b ⊂ C_a. It follows from (4)
that the mapping which associates with each a ∈ S the element C_a of
U(S) is one-to-one. Further if C_a ≺ C_b then a < b, for otherwise b ≤ a
and hence C_b ≼ C_a by (4), contradicting (1). Thus

(5) ({C_a: a ∈ S}, ≺) ≅ (S, <).

Now we return to the full system U(S). Note that

(6) if a ∈ S and X ∈ U(S) then X ≺ C_a if and only if a ∈ X.

For if a ∈ X and a < x then x ∈ X, hence C_a ⊆ X. If C_a = X then we
would have a < x for all x ∈ X, in particular a < a. Thus C_a ⊂ X,
that is, X ≺ C_a. Conversely, suppose that X ≺ C_a. If a ∉ X then, as
we observed just before this theorem, a < x for all x ∈ X. In other
words, X ⊆ C_a and then C_a ≼ X, which is a contradiction. Thus a ∈ X.
We can now prove

(7) (U(S), ≺) is densely ordered without first or last element.

For suppose that X, Y ∈ U(S) and X ≺ Y. Then Y ⊂ X, and we can
find some b ∈ X − Y. Pick a ∈ X with a < b; then also a ∈ X − Y.
We shall show that X ≺ C_a and C_a ≺ Y. (This would not be true for
C_b if b happened to be chosen so that Y = C_b; this is the reason we
stepped to a.) Now X ≺ C_a is immediate from (6). It also follows from
(6) that C_a ≼ Y, since a ∉ Y. But we do not have C_a = Y, otherwise
b ∈ Y. That there is no first or last element in U(S) is easily seen
from (6). Thus (7) is established.

[Figure 7.6]

To conclude the proof of part (i) of our theorem, we must now examine
sections in U(S). We have

(8) A ∈ U(U(S)) if A ⊆ U(S) and
(a) A ≠ ∅ and U(S) − A ≠ ∅;
(b) if X ∈ A and X ≺ X' then X' ∈ A;
(c) if X ∈ A then there exists an X' ≺ X with X' ∈ A.

Such a "super"-section is a little more difficult to imagine:

[Figure 7.7]

The fact that U(S) − A ≠ ∅ provides some Y ∈ U(S) with Y ≺ X for
all X ∈ A. But also S − Y ≠ ∅, so there is some a ∈ S − Y. Hence
a < y for all y ∈ Y and then a < x for all X ∈ A and x ∈ X. This shows
why we have indicated a in Fig. 7.7. Thus, intuitively, the elements of A
must "bunch up." The set that they "bunch up to" is just the set

(9) Z = ⋃ X [X ∈ A];

i.e., we wish to show that

(10) Z ∈ U(S) and Z is the largest element of U(S) − A.

This will, of course, prove that

(11) (U(S), ≺) is continuously ordered.

First of all Z ≠ ∅, since it is a nonempty union of nonempty sets; also
S − Z ≠ ∅, since a ∈ S − Z, where a is as described earlier. If x ∈ Z
then x ∈ X for some X ∈ A. Hence if x < x' we have x' ∈ X and then
x' ∈ Z. Also we know that there exists an x' < x with x' ∈ X, so again
there is such an x' ∈ Z. Thus Z ∈ U(S). Since for each X ∈ A we have
X ⊆ Z, that is, Z ≼ X, if Z were in A, A would have a least element
contrary to (8c). Thus Z ∉ A. To conclude, we must show that if
Y ∈ U(S) − A then Y ≼ Z, i.e., Z ⊆ Y. Now since A is an upper
section in U(S) we know that Y ≺ X for all X ∈ A, hence certainly X ⊆ Y
for all X ∈ A. But then also Z = ⋃ X [X ∈ A] ⊆ Y.
We leave the proof of part (ii) of this theorem to the reader.

Greatest lower bounds and least upper bounds. The property of an ordered
system to be continuously ordered is a special case of an apparently more
general condition concerning existence of greatest lower bounds and least
upper bounds, which is also more frequently used in practice. However,
as we shall see, the conditions are really equivalent.

7.6 Definition. Let (S, <) be a simply ordered system and suppose that
A ⊆ S and A ≠ ∅.
(i) If b ∈ S and b ≤ x for all x ∈ A then b is said to be a lower
bound for A; if A has at least one lower bound then A is said to
be bounded from below (in S).
(ii) If a is a lower bound for A and b ≤ a for every lower bound b
for A, then a is said to be a greatest lower bound for A (in S).
(iii) If b ∈ S and x ≤ b for all x ∈ A then b is said to be an upper
bound for A; if A has at least one upper bound then A is said to
be bounded from above (in S).
(iv) If a is an upper bound for A and a ≤ b for every upper bound b
for A then a is said to be a least upper bound for A (in S).

The following is seen quite easily.

7.7 Lemma. Suppose that (S, <) and A are as in 7.6. Then:
(i) there is at most one a ∈ S such that a is a greatest lower bound for A;
(ii) there is at most one a ∈ S such that a is a least upper bound for A.

7.8 Definition. Suppose that (S, <) and A are as in 7.6. Then:
(i) if A has a greatest lower bound a ∈ S, we call the unique such
element the infimum of A (in S) and write a = inf A, and we say
inf A exists (in S);
(ii) if A has a least upper bound a ∈ S, we call the unique such element
the supremum of A (in S) and write a = sup A, and we say
sup A exists (in S).

Note that inf A exists and inf A ∈ A if and only if A has a least element,
namely inf A; similarly for sup A and largest element.
Properly speaking, we should write "inf_S A" and "sup_S A." For example,
if we take A = {x: x ∈ Ra, 0 < x and x² > 2}, inf_Ra A does not exist
(under the usual ordering of Ra) although A is bounded from below in
Ra. On the other hand, when we define the real numbers and extend
(Ra, <) to (Re, <), inf_Re A will exist and = √2. Where necessary, we
will write "inf_S A" or "inf A in S." It is also common practice to write
"inf_x (x ∈ A)," "glb A," "glb_x (x ∈ A)" for inf A, and "sup_x (x ∈ A),"
"lub A," "lub_x (x ∈ A)" for sup A.
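
The claim that inf_Ra A does not exist for A = {x: x ∈ Ra, 0 < x and
x² > 2} can be made concrete. The following is only a sketch under stated
assumptions, not part of the text's development: the function names and the
improvement formula b ↦ (2b + 2)/(b + 2) are our own choices. It shows that
every positive rational lower bound b of A (that is, b > 0 with b² < 2) can
be replaced by a strictly larger rational lower bound, so no rational lower
bound is greatest.

```python
from fractions import Fraction


def in_A(x: Fraction) -> bool:
    """Membership test for A = {x in Ra : 0 < x and x^2 > 2}."""
    return x > 0 and x * x > 2


def better_lower_bound(b: Fraction) -> Fraction:
    """Given a rational b > 0 with b^2 < 2 (hence a lower bound for A),
    return a strictly larger rational whose square is still < 2.

    The formula is an illustrative assumption; one checks that
    b' - b = (2 - b^2)/(b + 2) > 0 and b'^2 - 2 = 2(b^2 - 2)/(b + 2)^2 < 0.
    """
    if not (b > 0 and b * b < 2):
        raise ValueError("expected a positive rational with b^2 < 2")
    return (2 * b + 2) / (b + 2)


# Start from the lower bound 1 and improve it a few times.
b = Fraction(1)
for _ in range(5):
    b_next = better_lower_bound(b)
    # Each step is a larger lower bound that is still not in A.
    assert b < b_next and not in_A(b_next)
    b = b_next
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the
comparisons are statements about Ra itself and not about floating-point
approximations.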

7.9 Theorem. Suppose that (S, <) is a continuously ordered system, and
suppose that A ⊆ S and A ≠ ∅. Then we have:
(i) if A is bounded from below then inf A exists;
(ii) if A is bounded from above then sup A exists.

Proof. (i) If A contains a least element a then a = inf A. Suppose that
A does not contain a least element, and is bounded from below. Let A' =
{x: for some y ∈ A, y ≤ x}. Then A' ∈ U(S). For, first, A' ≠ ∅ since
A ≠ ∅. Since there is some b with b ≤ y for all y ∈ A, and since S has
no least element [7.1(ii)], there must be some b with b < y for all y ∈ A;
then b ∈ S − A'. Clearly if x ∈ A' and x < x' then x' ∈ A'. Also if
x ∈ A' then there exists x' < x with x' ∈ A'. For suppose otherwise.
Then x would be a least element of A'. For some y ∈ A, y ≤ x; but then
y = x, since A ⊆ A', so that y < x would make x not the least element
of A'. Hence x ∈ A. Thus, in this case, A also has a least element,
contrary to hypothesis. Hence all the conditions for A' ∈ U(S) are
satisfied. Let B' = S − A'. Then by definition B' has a largest element,
call it a; we show now that a = inf A. Since a ∉ A', we have a < y
for all y ∈ A. Thus a is a lower bound for A. Suppose that b is a lower
bound for A. Then b ∉ A, otherwise A would have a least element. Hence
b < y for all y ∈ A; but then b ∉ A' by definition of A', that is, b ∈ B'.
Since a was chosen as the largest element of B', it follows that b ≤ a.
This completes the proof of (i).
(ii) We could prove (ii) by a related argument. Instead, we wish to
make some remarks which show how we can deduce (ii) from (i) using
some general considerations. Associated with any binary relation W is
its converse relation W˘ (Section 2.3): (x, y) ∈ W˘ if and only if (y, x) ∈ W.
Given a system (S, <), denote by <' the relation converse to <: x <' y
if and only if x > y. Then <' is also a binary relation in S; we consider
(S, <'). It is easily seen that (S, <) is simply ordered if and only if
(S, <') is; the same holds for being densely ordered. If we think of (S, <)
as an ordering on some horizontal line drawn on a piece of glass, with
x < y holding if x is to the left of y, then we can think of (S, <') by
looking at the glass from the other side. Then to every notion concerning
(S, <) corresponds a dual notion concerning (S, <'). Thus, for example,
x is a least element in (S, <) if and only if it is a largest element in (S, <').
Also if A ⊆ S, A ≠ ∅, then b is a lower bound for A in (S, <) if and only
if it is an upper bound for A in (S, <'); further if a = inf A in (S, <)
then a = sup A in (S, <'), etc. Thus we see that we could deduce (ii)
from (i) if we are able to show that whenever (S, <) is continuously
ordered so also is (S, <'). In fact, we see from 7.1 and 7.2 that if B ⊆ S
then B is an upper section in (S, <') if and only if it is a lower section in
(S, <). But we saw in 7.3 that in this case S − B contains a least element
in (S, <), that is, S − B contains a largest element in (S, <'). Hence
the latter is also continuously ordered. [A "direct" proof of Theorem
7.9(ii) would implicitly involve such ideas.]
Note that 7.9(i) applied to A ∈ U(S) gives inf A as the largest element
of S − A: for A has no least element, so inf A ∉ A; on the other hand,
if b ∈ S − A, we know that b < x for all x ∈ A, so b is a lower bound for
A and thus b ≤ inf A by definition. Thus any system (S, <) which
satisfies 7.9(i) for all A ⊆ S and which is densely ordered without first or
last element is necessarily continuously ordered. Similarly for 7.9(ii).
The notion of continuous ordering could be modified slightly so as to
allow for systems with first or last element (or both). Then we could easily
prove an existence theorem like 7.5(i) for any densely ordered system
(S, <) from the given Theorem 7.5. However, for the main case in which
we are interested, the set of real numbers, we do not need the modified
notions.
It may be thought that having proved the existence of a continuously
ordered extension of (Ra, <) we are now in a satisfactory situation
regarding the real numbers. However, there is as yet no guarantee that we
can suitably define algebraic operations on such an extension. Furthermore,
the property of being a continuously ordered extension of (Ra, <), while
an essential property of the real numbers, cannot be said to characterize
the real numbers from an algebraic point of view. For it can be shown
that there are two systems, (S_1, <_1) and (S_2, <_2), with this property
such that (S_1, <_1) ≇ (S_2, <_2). (In fact, there are many such systems.)
The proof of this is not difficult but giving it here would involve
introducing various notions from the theory of ordering relations that are
not useful to us otherwise in this text. In contrast, we shall see in the next
section that the real numbers can be completely characterized when the
algebraic structure is considered along with the ordering.

Exercise Group 7.1

1. Prove Theorem 7.4. (Cf. Exercise 2, Exercise Group 6.2.)
2. Prove Theorem 7.5(ii).
3. Show that if A ∈ U(Ra), r ∈ Ra and r > 0, then there exist x ∈ A,
y ∈ Ra − A with x − y < r. (In other words, if we think of r as being
small, there are elements of A and Ra − A as close as we please.)

7.2 Continuously ordered fields. Pursuing the ideas suggested at the
end of the preceding section we now consider the following.

7.10 Definition. A system (K, +, ·, <, 0, 1) is said to be a continuously
ordered field if it is an ordered field for which (K, <) is a continuously
ordered system.

Since every ordered field contains a subsystem isomorphic to the rationals
as a subfield, for simplicity we use the same symbols to denote the basic
operations and relations as for Ra. We modify this usage only when it is
necessary to consider two such fields at the same time. Our first main object
is to show that any two continuously ordered fields are isomorphic. To this
end, we assume throughout the remainder of this section that
(K, +, ·, <, 0, 1) is any continuously ordered field (which contains Ra as a
subfield). We shall show that its structure is completely determined by the
field structure of Ra. Theorem 7.5 showed us how to extend (Ra, <) to a
continuously ordered system (U(Ra), ≺). Our first step is to show that K is
necessarily in one-to-one correspondence with U(Ra). The natural
correspondence associates with each a ∈ K the set C_a given by the following.

7.11 Definition. For a ∈ K, C_a = {x: x ∈ Ra and a < x}.

(This is a generalization of the sets C_a used in the proof of 7.5 for a ∈ Ra.)

The Archimedean property. We expect, in general, that if a ∈ K then
C_a ∈ U(Ra). At the beginning of a proof of this we would have to show
that C_a ≠ ∅, i.e., that for each a ∈ K there exists an x ∈ Ra with a < x.
This is by no means true of arbitrary ordered fields. For example, the
field Ra(ξ) of quotients of Ra[ξ] can be ordered in such a way that x < ξ
for all x ∈ Ra. (Cf. Exercise 6, Exercise Group 5.1.) The desired property
for K can also be formulated more specifically as follows: for each a ∈ K
there exists an n ∈ P with a < n. It is easily seen that this is equivalent
to the previously stated property.

7.12 Theorem. (i) For each a ∈ K there exists n ∈ P with a < n.
(ii) For each a ∈ K with a > 0 there exists n ∈ P with 1/n < a.
(iii) For each a, b ∈ K with a < b there exists x ∈ Ra with a < x < b.

Proof. (i) Suppose for all n ∈ P, n ≤ a. Then a is an upper bound for P.
But (K, <) is continuously ordered, so sup P exists in K by 7.9; we put
b = sup P. Since n ≤ b for all n ∈ P, also n + 1 ≤ b and then n ≤ b − 1
for all n ∈ P. But b − 1 < b, and b − 1 is an upper bound for P,
contradicting b = sup P. (ii) follows immediately by finding n ∈ P with
1/a < n. To prove (iii), we need to find m, n ∈ I such that a < m/n < b. This
should be possible if n ∈ P is chosen so large that 1/n < b − a; we can
find such n by (ii). Then there exists an m ∈ I with a < m/n, for by (i)
if we choose k ∈ P with a < k, we have a < nk/n. Consider the set
A = {m: m ∈ I and a < m/n}. If m ∈ A and m < m' then m' ∈ A.
Also not all integers belong to A; for we can find l ∈ P with (−a) < l
by (i), hence −l < a, and then −ln/n < a. Thus all integers in A are
greater than some one integer. It follows by 4.25 that A has a least
element, call it m; then (m − 1)/n ≤ a < m/n. By adding 1/n to both
sides of the first inequality here we get m/n ≤ a + 1/n < a + (b − a),
so that, in fact, a < m/n < b.
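
The construction in the proof of (iii) is effective once a and b are given
explicitly. As a hedged illustration only, the sketch below carries it out
for rational a < b (an assumption made so that the arithmetic is exact; in
the theorem a and b range over all of K). The function name
`rational_between` is ours.

```python
from fractions import Fraction
from math import floor


def rational_between(a: Fraction, b: Fraction) -> Fraction:
    """Return m/n with a < m/n < b, following the proof of 7.12(iii).

    Step 1: choose n in P with 1/n < b - a.
    Step 2: let m be the least integer with a < m/n, so (m - 1)/n <= a.
    Then m/n <= a + 1/n < a + (b - a) = b.
    """
    if not a < b:
        raise ValueError("need a < b")
    n = floor(1 / (b - a)) + 1   # n > 1/(b - a), hence 1/n < b - a
    m = floor(a * n) + 1         # least integer strictly greater than a*n
    return Fraction(m, n)


r = rational_between(Fraction(1, 3), Fraction(1, 2))
print(r)  # the rational produced by the two steps above
```

For a = 1/3 and b = 1/2 the two steps give n = 7 and m = 3, so the result
is 3/7.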
We have used quite essentially in the proof the fact that K constitutes
a continuously ordered field; we could not expect to obtain (iii) above if
we looked on K as merely forming a continuously ordered system
containing Ra.
There is another formulation of a property of K equivalent to 7.12(i),
which is often referred to as the property of being an Archimedean
ordering, namely: whenever 0 < b < c then for some n ∈ P, c < nb. This
follows directly from 7.12(i) when we take a = c/b. Conversely, 7.12(i)
follows from Archimedean ordering. For if a ∈ K then a ≤ 1 or 1 < a.
In the second case we can find some n ∈ P with a < n·1, so in either case
we obtain n ∈ P with a < n. The Archimedean property can be
interpreted as guaranteeing the possibility of measurement with respect to
arbitrarily chosen positive lengths b, no matter how small. For, given
c > 0, we can always find n ∈ P with (n − 1)b ≤ c < nb; thus we can
"locate" c by laying off b sufficiently often.
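
The measurement interpretation can be tried out numerically. The sketch
below is an illustration under the assumption that c and b are positive
rationals; the function name `lay_off` is hypothetical. It returns the
n ∈ P with (n − 1)b ≤ c < nb.

```python
from fractions import Fraction
from math import floor


def lay_off(c: Fraction, b: Fraction) -> int:
    """Return the n in P with (n - 1)*b <= c < n*b, for c > 0 and b > 0."""
    if not (c > 0 and b > 0):
        raise ValueError("need c > 0 and b > 0")
    # floor(c/b) <= c/b < floor(c/b) + 1, so n = floor(c/b) + 1 works.
    return floor(c / b) + 1


# Locate c = 22/7 using the unit b = 1/10.
c, b = Fraction(22, 7), Fraction(1, 10)
n = lay_off(c, b)
assert (n - 1) * b <= c < n * b
```

Here the unit 1/10 must be laid off 32 times before it first exceeds 22/7,
which places 22/7 between 3.1 and 3.2.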

7.13 Theorem.
(i) If a ∈ K then C_a ∈ U(Ra).
(ii) If a, b ∈ K and a < b then C_a ≺ C_b.
(iii) If A ∈ U(Ra) then A = C_a for a unique a ∈ K.

Proof. (i) C_a ≠ ∅ by 7.12(i); Ra − C_a ≠ ∅ by 7.12(iii) applied to
a − 1, a. Clearly x ∈ C_a and x < x' implies x' ∈ C_a. If x ∈ C_a then
a < x so there exists x' ∈ Ra with a < x' < x by 7.12(iii); but then
x' ∈ C_a. Hence C_a ∈ U(Ra). (ii) follows directly from 7.12(iii) and
properties of upper sections. To prove (iii), consider A ∈ U(Ra). Then
A ≠ ∅ and A is bounded below in K. Let a = inf A in K; a ∉ A, since
otherwise A would have a least element. Hence if x ∈ A then a < x.
Conversely, suppose it is given that x ∈ Ra and a < x. If for all x' ∈ A
we had x ≤ x', then a would not be inf A. Hence there exists x' ∈ A with
x' < x; then x ∈ A since A ∈ U(Ra). Thus A = {x: x ∈ Ra and
a < x} = C_a. a is unique by (ii).
Thus the function which associates with each a ∈ K the set C_a is a
one-to-one order-preserving correspondence between K and U(Ra). This
correspondence induces certain "addition" and "multiplication" operations
on U(Ra).

7.14 Definition. Suppose that A, B ∈ U(Ra). Let a, b be the unique
elements of K such that A = C_a, B = C_b. Then put
(i) A ⊕ B = C_{a+b},
(ii) ⊖A = C_{−a},
(iii) A ∘ B = C_{a·b}.

These definitions of ⊕, ⊖, ∘, as given, depend on K. In order to show that
the structure of K is completely determined by Ra, we must next show
that it is possible to give intrinsic definitions of ⊕, ∘ in terms of Ra. We
shall see here the reason for also singling out the ⊖ operation. Note that
if a ∈ Ra, C_a is well determined independently of K. In particular C_0,
C_1 correspond to 0, 1, respectively.

7.15 Theorem. Suppose that A, B ∈ U(Ra). Then:
(i) A ⊕ B = {u: u ∈ Ra and for some x ∈ A and y ∈ B,
x + y = u};
(ii) ⊖A = {u: u ∈ Ra and for some z ∈ Ra, z < u and −x < z
for all x ∈ A};
(iii) if A ≺ C_0 then C_0 ≺ ⊖A;
(iv) if C_0 ≼ A, C_0 ≼ B then A ∘ B = {u: u ∈ Ra and for some
x ∈ A and y ∈ B, x · y = u};
(v) if A ≺ C_0, C_0 ≼ B then A ∘ B = ⊖[(⊖A) ∘ B];
(vi) if C_0 ≼ A, B ≺ C_0 then A ∘ B = ⊖[A ∘ (⊖B)];
(vii) if A ≺ C_0, B ≺ C_0 then A ∘ B = (⊖A) ∘ (⊖B).

Proof. (i) Using 7.14(i) and the definition of C_a, we see that we must
show for any a, b ∈ K and any u ∈ Ra:

(1) a + b < u if and only if there exist x, y ∈ Ra such that a < x,
b < y, and x + y = u.

Clearly if the right side of this equivalence holds, so does the left. In the
other direction, we first find an x ∈ Ra such that a < x and x + b < u,
that is, a < x < u − b; this is possible by 7.12(iii). If we then take
y = u − x, we have b < y and x + y = u. Before proving (ii), consider
the intuitive picture, let us say for C_0 ≺ A:

[Figure 7.8]

Here A' = {−x: x ∈ A}. Since we want an upper section, we consider
all those u ∈ Ra greater than all elements of A'. However, if it happened
that A = C_a where a ∈ Ra, the element −a would also be counted among
these u, and would be the least such u. To prevent the existence of such
a least element, we also require that there be z < u which also has this
property. By a similar picture for A ≼ C_0, we see that (ii) always gives
the correct definition of ⊖A on geometric grounds. We leave it to the
student to provide an exact proof of (ii) and (iii). (Note that ⊖A should
not be confused with the set difference Ra − A.)
In contrast to (i), the definition in (iv) would not correctly define
A ∘ B without the restriction C_0 ≼ A, C_0 ≼ B. As an example, consider
{u: u ∈ Ra and for some x > −1 and y > 2, x · y = u}, which might
be expected to correspond to C_{−1} ∘ C_2 = C_{−2}. In fact, the set so
defined consists of all rationals; for if n ∈ P and n > 1, we have 2n > 2,
and −1/2 > −1, so −(1/2)(2n) = −n is in the set. Hence also all u ∈ Ra with
−n < u are in the set. It is for this reason that we must break down the
definition of ∘ into cases and use the operation ⊖A to reduce the other
cases to this one.
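
The failure of the naive product definition can be checked directly. The
sketch below is illustrative only: for any rational u it exhibits x > −1 and
y > 2 with x · y = u, so the set of such products is all of Ra. The choice
of y is our own variant of the text's choice y = 2n, and the function name
is hypothetical.

```python
from fractions import Fraction


def naive_product_witness(u: Fraction) -> tuple[Fraction, Fraction]:
    """Return (x, y) with x > -1, y > 2 and x * y == u.

    We take y large compared with |u| so that |u / y| < 1/2; then
    x = u / y automatically satisfies x > -1.
    """
    y = max(Fraction(3), 2 * abs(u) + 1)
    x = u / y
    return x, y


# Even very negative u, far below -2, is a product of this kind.
for u in (Fraction(-100), Fraction(-2), Fraction(0), Fraction(5, 3)):
    x, y = naive_product_witness(u)
    assert x > -1 and y > 2 and x * y == u
```

So the candidate set is not the upper section C_{−2}; it is not an upper
section at all, since its complement in Ra is empty.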
We now give a proof of (iv). Let A = C_a, B = C_b where a, b ∈ K.
By hypothesis and 7.13(ii), 0 ≤ a and 0 ≤ b. We must show that for
any u ∈ Ra,

(2) ab < u if and only if there exist x, y ∈ Ra such that a < x,
b < y and x · y = u.

If the right side holds, we have ab < xy, so ab < u. Suppose that ab < u.
If b = 0, we need only find x ∈ Ra with a < x, which is possible by
7.12(i). For then if we take y = u/x, 0 < y since both x, u are positive
and xy = u. Suppose that 0 < b. Then there is an x ∈ Ra such that
a < x and xb < u, that is, x < u/b, by 7.12(iii). Then again if we take
y = u/x, we have xy = u > xb, so that from x > 0 we conclude y > b.
The proof of the remaining parts is now direct. For example in (v),
writing A = C_a, B = C_b, we have by definition A ∘ B = C_{ab}, ⊖A = C_{−a},
(⊖A) ∘ B = C_{(−a)b}, and ⊖[(⊖A) ∘ B] = C_{−((−a)b)} = C_{ab}. Similarly for
(vi), (vii).

Isomorphism of continuously ordered fields.

7.16 Theorem. (K, +, ·, <, 0, 1) ≅ (U(Ra), ⊕, ∘, ≺, C_0, C_1).

Proof. We have already seen that the function F(a) = C_a is a one-to-one,
order-preserving mapping of K onto U(Ra). Thus we need only show
F(a + b) = F(a) ⊕ F(b), F(a · b) = F(a) ∘ F(b) for all a, b ∈ K. But
this is simply a rephrasing of Definition 7.14(i) and (iii).

7.17 Theorem. Any two continuously ordered fields are isomorphic.

Proof. We can assume, without loss of generality, that both fields
contain Ra as a subfield, by 6.12. We call one (K, +, ·, <, 0, 1), the other
(K', +', ·', <', 0', 1'). But both fields are ≅ to (U(Ra), ⊕, ∘, ≺, C_0, C_1),
by 7.16. We are justified in using this same field in both cases, since 7.15
shows how A ⊕ B, A ∘ B can be defined for all A, B ∈ U(Ra) in terms of
Ra only. For we know that U(Ra) is simply ordered by ≺, so that for
any A ∈ U(Ra), C_0 ≼ A or A ≺ C_0.
We have now achieved our first main object, and we turn to the second,
a proof that there exists at least one continuously ordered field. From
the preceding this should seem rather a straightforward matter. For we
have already proved in 7.5(i) that (U(Ra), ≺) is a continuously ordered
system. Thus all that would be left is to show that if we take 7.15(i), (ii),
(iv)-(vii) as constituting a definition of ⊕, ⊖, ∘ on U(Ra), (U(Ra), ⊕,
∘, ≺, C_0, C_1) is an ordered field. We know this must be so by 7.16 if there
exists any continuously ordered field (K, +, ·, <, 0, 1) at all. However,
in a proof of this statement we would not be able to take advantage, as
we did earlier, of the presumed properties of such a K, but must rather
work out from scratch the fact that U(Ra) possesses all the desired
properties. This is a long and tedious task which provides as reward
little further insight into the matter at hand. We shall, instead, now turn
to another approach to the existence proof which is based on the ideas of
Cauchy.

Limits. Up to now our guiding idea has been to think of real numbers
as greatest lower bounds of sets A of rational numbers, in particular of
upper sections—or, dually, as least upper bounds of lower sections. If
we return to our discussion of measurement in Section 7.1, it is equally
natural to think of real numbers as limits of sequences (a_0, ..., a_n, ...)
of rational numbers. For intuitively, even if a given length is not rational,
we can approximate it "as closely as we please" by rational lengths, simply
by taking finer and finer subdivisions of the unit of measurement. The
decimal representation of a real number, such as √2 = 1.4142 ..., is
intended to signify that the terms of a certain sequence of rational
numbers 1, 1.4, 1.41, 1.414, 1.4142, ... approximate √2 in this sense. Of
course, this is by no means the only sequence which has this property;
this particular sequence is obtained by approximating the number from
below using successive subdivisions by tenths. We could also consider
sequences which approximate from above, or which alternate above and
below the number, or which are obtained by other types of subdivision,
etc. Thus, to begin with, we will think of real numbers as being limits of
arbitrary kinds of sequences of rational numbers, and only later inquire
as to what special kinds of sequences already give all real numbers. The
main questions that we shall have to answer are: first, what is a necessary
and sufficient condition for a sequence of rational numbers to have a real
number as limit and, second, what is a necessary and sufficient condition
for two sequences to be equivalent in the sense that they have the same
limit? Then to construct the real numbers, we will be able to abstractly
identify them with equivalence sets of such sequences.
As is usual in analysis, we use such letters as ε, δ in connection with
statements concerning approximation by smaller and smaller numbers.
We continue to assume in the following that (K, +, ·, <, 0, 1) is any
continuously ordered field which extends the rationals. Before turning to
notions connected with limits of sequences, we want to re-express the
conditions for a number a to be inf A or sup A in "approximation language."

7.18 Lemma. Suppose that A ⊆ K, A ≠ ∅ and a ∈ K. Then:
(i) a = inf A if and only if a is a lower bound for A and for every
ε > 0 (ε in K) there exists x ∈ A with x − a < ε;
(ii) a = sup A if and only if a is an upper bound for A and for
every ε > 0 (ε in K) there exists x ∈ A with a − x < ε.

Proof. (i) Suppose that a = inf A and ε > 0. If there is no x ∈ A
with x − a < ε, or equivalently x < a + ε, then for all x ∈ A, a + ε ≤ x
and a + ε is a lower bound for A. But since a < a + ε, this contradicts
the statement that a is the greatest lower bound for A. Conversely, we
must show that if a satisfies the given condition and b is a lower bound for
A then b ≤ a. Suppose, to the contrary, that b is some lower bound for A
with a < b. Let ε = b − a. Then ε > 0, so there is an x ∈ A with
x − a < ε, hence x < a + ε = b. This contradicts the assumption
that b is a lower bound for A. (ii) is proved similarly.
The lemma still holds if we restrict ε to satisfy the weaker condition,
ε in Ra (at least weaker in a proof that a = inf A or a = sup A holds).
For we know by 7.12(ii) that for each ε > 0 there is n ∈ P with 1/n < ε.
Hence the statement that for every ε > 0 there exists x ∈ A with
x − a < ε is equivalent to the statement that for every n ∈ P there exists
x ∈ A with x − a < 1/n. Thus, in fact, we can restrict ε in (i) to be in
Ra or in any set of rationals which has 0 as greatest lower bound. The
same, of course, holds for (ii).

[Figure 7.9: two points a, b on the line]

It is in accordance with our intuitive geometric notions that if we have
a < b, where a and b represent points on the line, then b − a measures
the distance between the two points. (This is first realized for a > 0
and then extended to arbitrary a, b by considerations of signs.) In general,
if the relation of inequality is not specified, we use |b − a| to measure the
distance; for |b − a| = b − a if a ≤ b, and |b − a| = −(b − a) =
a − b if b < a. We have defined the absolute-value function for any
ordered integral domain in 4.19 and verified its usual properties in 4.20.
In particular we can apply these to K.

7.19 Definition. Suppose that (x_k)_{k≥0} = (x_0, ..., x_k, ...) is a
sequence of elements of K, x_k ∈ K for all k ≥ 0. Suppose that a ∈ K.
We say a is a limit of this sequence (as k increases indefinitely) if
for each ε > 0 (ε in K) there is some m such that

|x_k − a| < ε for all k ≥ m.

Just as remarked earlier the same notion is defined here no matter
whether stated for arbitrary ε in K, just ε in Ra, or even just ε of the
form 1/n, n ∈ P.

7.20 Lemma. Suppose that x_k ∈ K for all k ≥ 0. Then the sequence
(x_k)_{k≥0} has at most one limit in K.

Proof. Suppose that a, b are both limits of (x_k)_{k≥0}. For any ε > 0
there exist m_1, m_2 such that |x_k − a| < ε for k ≥ m_1 and |x_k − b| < ε
for k ≥ m_2. Let k ≥ max(m_1, m_2). Then

|a − b| = |(a − x_k) + (x_k − b)| ≤ |a − x_k| + |x_k − b| < 2ε.

Suppose that a ≠ b. Then choosing ε = (1/2)|a − b| gives a contradiction.

7.21 Definition. Suppose that x_k ∈ K for all k ≥ 0.
(i) We say that (x_k)_{k≥0} is convergent or that the limit of this sequence
exists if there is some a ∈ K such that a is a limit of the sequence.
(ii) If (x_k)_{k≥0} is convergent and a is the unique limit of the sequence
we write

lim_{k→∞} x_k = a.

We deal with convergence and limits of sequences (x_k)_{k≥l} by regarding
them as sequences (x_{k+l})_{k≥0}, with x_0, ..., x_{l−1} chosen arbitrarily. It is
clear that for any l, (x_k)_{k≥l} converges if and only if (x_k)_{k≥0} converges,
and that these have the same limit if they converge. Another way of
putting this is that, for any given l, we can alter the first l terms of a
sequence without changing its convergence or limit, if a limit exists.

Some examples of the use of the above notions are:

(7:2-1) (i) The sequence (1 + 1/k)_{k≥1} is convergent, and

lim_{k→∞} (1 + 1/k) = 1.

(ii) The sequence (1 + (−1)^k (1/k))_{k≥1} is convergent, and

lim_{k→∞} (1 + (−1)^k (1/k)) = 1.

(iii) The sequence ((−1)^k (1 + 1/k))_{k≥1} is not convergent.

(iv) The sequence (k)_{k≥0} is not convergent.

For example, to prove (ii), we note that for each n,

|(1 + (−1)^k (1/k)) − 1| = |(−1)^k (1/k)| = 1/k < 1/n

if k > n; hence for ε = 1/n we can take m = n + 1 in 7.19. The
proofs of (iii) and (iv) are more troublesome if one works with 7.19, for
we must show that no matter what a is chosen we cannot have
lim_{k→∞} x_k = a for the given sequence (x_k)_{k≥0}. We would rather use
here some intrinsic condition on the x_k which would not involve considering
all such a. Intuitively, the reason that the sequences in (iii), (iv) do not
converge is that the terms of the sequence do not eventually become and
remain closer together than any preassigned number.
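
These observations about (7:2-1) can be spot-checked on finitely many terms.
The sketch below is an illustration, not a proof: it verifies the estimate
used for (ii) for k up to a fixed bound, and it shows that consecutive terms
of the sequence in (iii) always stay more than 2 apart. The function names
and the bound `upto` are assumptions of ours.

```python
from fractions import Fraction


def x_ii(k: int) -> Fraction:
    """k-th term of the sequence in (ii): 1 + (-1)^k / k, for k >= 1."""
    return 1 + Fraction((-1) ** k, k)


def x_iii(k: int) -> Fraction:
    """k-th term of the sequence in (iii): (-1)^k (1 + 1/k), for k >= 1."""
    return (-1) ** k * (1 + Fraction(1, k))


def close_to_one(n: int, upto: int = 200) -> bool:
    """Check |x_k - 1| < 1/n for every k with n + 1 <= k <= upto."""
    eps = Fraction(1, n)
    return all(abs(x_ii(k) - 1) < eps for k in range(n + 1, upto + 1))


def consecutive_gap(k: int) -> Fraction:
    """Distance between the k-th and (k+1)-th terms of the sequence in (iii)."""
    return abs(x_iii(k) - x_iii(k + 1))


# (ii): with m = n + 1 the terms stay within 1/n of the limit 1.
assert all(close_to_one(n) for n in range(1, 50))
# (iii): neighbouring terms have opposite signs and absolute value > 1.
assert all(consecutive_gap(k) > 2 for k in range(1, 100))
```

The second check is the finite shadow of the intrinsic condition alluded to
above: however far out we go, terms of (iii) do not come close together.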
From now on we shall use (x_k) to indicate a sequence (x_k)_{k≥0}, except
where another notation is necessary to avoid ambiguity.

Fundamental sequences. The condition on sequences which is involved
in the above is expressed precisely in the following.

7.22 Definition. Suppose that A ⊆ K. Then (x_k) is said to be a
fundamental sequence in A, in symbols (x_k) ∈ Fd(A), if x_k ∈ A for
k ≥ 0, and if for any ε > 0 (ε in K) there exists an m such that
|x_k − x_l| < ε for all k, l ≥ m.

Very often the terminology, Cauchy sequence, is used instead of
fundamental sequence. Clearly, if (x_k) is in Fd(B) then it is in Fd(A) for
any A which contains all x_k. Also, as we have seen before, we can restrict
ourselves to ε in Ra.

7.23 Theorem. If (x_k) is convergent then (x_k) ∈ Fd(K).

Proof. For some a ∈ K, lim_{k→∞} x_k = a. Thus, given any ε > 0, we
can choose m so that |x_k − a| < ε/2 for all k ≥ m. Then if k, l ≥ m,

|x_k − x_l| = |(x_k − a) + (a − x_l)| ≤ |x_k − a| + |a − x_l| < ε/2 + ε/2 = ε.

Hence (x_k) is a fundamental sequence.


Referring back to the examples (iii), (iv) of (7:2-1) we see that both
sequences fail to converge because neither is a fundamental sequence in K.
For, in both cases, we can find an ε > 0, namely ε = 1, such that for every
m there exist k, l ≥ m with |x_k − x_l| ≥ ε; for example, k = m, l = m + 1.
Note that there are subsequences of the sequence ((−1)^k (1 + 1/k))_{k≥1}
which converge, for example, the sequence (1 + 1/(2k))_{k≥1}; on the other
hand, no subsequence of (k)_{k≥0} will converge. By a subsequence we
mean the following.

7.24 Definition. Suppose that (x_k), (y_k) are sequences in K. Then (y_k)
is said to be a subsequence of (x_k) if for a certain sequence (j_k) of
integers, 0 ≤ j_0 < j_1 < ... < j_k < ..., we have y_k = x_{j_k} for each k ≥ 0.

We adapt this definition in the usual way to sequences (x_k)_{k≥l}. The
difference noted between the sequences of (7:2-1) (iii) and (iv) is brought
out by the following.

7.25 Definition. A sequence (x_k) in K is said to be bounded if
{x_0, ..., x_k, ...} is bounded from above and below. Equivalently, it is
bounded if for some b ∈ K we have |x_k| ≤ b for all k.

The Bolzano-Weierstrass Theorem.

7.26 Theorem. Every bounded sequence (xk) in K has at least one con¬
vergent subsequence.

Proof. Choose bQ, c0 with

(1) b0 < Xk < c0 for all k.

We think of the sequence (xk)k>o as “clustering” around various points


between b0 and c0, i.e., points d such that for each e > 0 there are infi¬
nitely many k with Xk nearer to d than e, |Xk — d\ < e. We shall now
describe a procedure for “narrowing down” on one of these points. We
use below the notation

(2) [6, c] = {x: x e K and b < x < c),

and call [5, c] a (closed) interval. The mid-point of this interval, in geometric
244 THE REAL NUMBERS [CHAP. 7

terms, is (6 + c)/2. At any rate, since b < (6 + c)/2 < c whenever


b < c, it is clear that

(3) if there are infinitely many k such that Xk G [b, c] then the same
holds for at least one of the intervals [b, (6 + c)/ 2] and [(6 + c)/2, c\.

Our narrowing-down procedure simply consists in successive divisions by 2,
choosing at each step one subinterval containing $x_k$ for infinitely many $k$.
That is, we define recursively a sequence $b_n$, $c_n$, starting with $b_0$, $c_0$ given
by (1), as follows.

(4) Suppose that $b_n$, $c_n$ are given. We take $b_{n+1} = b_n$ and $c_{n+1} =
(b_n + c_n)/2$ if there are infinitely many $k$ such that $x_k \in [b_n,
(b_n + c_n)/2]$. Otherwise we take $b_{n+1} = (b_n + c_n)/2$ and
$c_{n+1} = c_n$.

[This is justified by Theorem 3.4 providing for recursive definition on P,
by defining the function F on P with values ordered pairs in K, $F(n) =
(b_{n-1}, c_{n-1})$; $F(1)$ is given and $F(n) = G(F(n - 1))$ for $n > 1$, where G
is chosen so as to satisfy (4).] Now the following properties of the sequence
of intervals $[b_n, c_n]$ are easily proved by induction on $n$, using (3) in the last
of these:

(5) (i) $b_n \le c_n$;

(ii) $b_n \le b_{n+1}$ and $c_{n+1} \le c_n$;

(iii) $c_n - b_n = \dfrac{1}{2^n}(c_0 - b_0)$;

(iv) there are infinitely many $k$ such that $x_k \in [b_n, c_n]$.

This leads to the following.

(6) Let $B = \{b_0, \ldots, b_n, \ldots\}$, $C = \{c_0, \ldots, c_n, \ldots\}$. Then each
$c_n$ is an upper bound for B and each $b_n$ is a lower bound for C.

For, we first see from (ii) that if $n \le m$ then $b_n \le b_m$ and $c_m \le c_n$.
Hence for any $m$ we have $b_n \le b_m \le c_m$ if $n \le m$ and $b_n \le c_n \le c_m$
if $n \ge m$, using (5)(i). Thus in any case, given $m$ this shows $b_n \le c_m$
for all $n$, that is, $c_m$ is an upper bound for B. The last statement is proved
similarly. It follows from (6) and 7.9 that sup B and inf C exist. In fact,
since we have chosen smaller and smaller intervals $[b_n, c_n]$, we have

(7) sup B = inf C.

For an exact proof, let $b = \sup B$, $c = \inf C$. Then $b \le c_n$ for all $n$
by (6) and definition of sup. But then, by definition of inf, $b \le c$. Since
7.2] CONTINUOUSLY ORDERED FIELDS 245

for any $n$, $b_n \le b \le c \le c_n$, we have $0 \le c - b \le c_n - b_n =
(1/2^n)(c_0 - b_0)$. If $b_0 = c_0$, certainly $b = c$. Otherwise, this shows that
$(c - b)/(c_0 - b_0) \le 1/2^n$ for all $n$. Now if $b \ne c$, $(c - b)/(c_0 - b_0) > 0$
so that by the Archimedean ordering of K [7.12(i)] there would exist $n$
such that $1/2^n < (c - b)/(c_0 - b_0)$. Thus $b = c$.
The number defined in (7) is a natural candidate as a limit of a certain
subsequence of the $\langle x_k \rangle$. One subsequence of this kind is determined
as follows.

(8) Let $j_0 = 0$. Given $j_n$, let $j_{n+1}$ be the least $k$ such that $j_n < k$ and
$x_k \in [b_{n+1}, c_{n+1}]$. Let $y_n = x_{j_n}$ for all $n \ge 0$.

That there exists at least one such $k$ larger than any given $j$ is immediate
from (5)(iv). Thus this is a well-defined recursive procedure. By choice,
$j_n < j_{n+1}$ for all $n$, so that

(9) (yk) is a subsequence of (xk).

Finally, let

(10) a = sup B.

Then

(11) $\lim_{k \to \infty} y_k = a$.

For suppose we are given an $\varepsilon > 0$. By 7.18(i) there exists $m_1$ with
$a - b_{m_1} < \varepsilon$. But $b_{m_1} \le b_n \le a$ for all $n \ge m_1$, so $a - b_n < \varepsilon$ for
$n \ge m_1$. Similarly, using $a = \inf C$ and 7.18(ii) there exists $m_2$ with
$c_n - a < \varepsilon$ for all $n \ge m_2$. Taking $m = \max(m_1, m_2)$, we find that if
$n \ge m$ and $x \in [b_n, c_n]$ then $|x - a| < \varepsilon$. Now by (8) each $x_{j_n} \in [b_n, c_n] \subseteq
[b_m, c_m]$, for $n \ge m$, so that $|y_n - a| < \varepsilon$ for any $n \ge m$. This proves
(11) and thus concludes the proof of the theorem.
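The halving rule (4) can be sketched in code, with one important caveat: deciding whether an interval contains $x_k$ for infinitely many $k$ is not something a program can do for an arbitrary sequence. The sketch below therefore assumes a hand-written oracle, `infinitely_many_in`, valid only for the particular sequence $x_k = (-1)^k (1 + 1/(k+1))$, $k \ge 0$; both function names are hypothetical and chosen for this illustration.

```python
from fractions import Fraction

def infinitely_many_in(lo, hi):
    """Hypothetical oracle for x_k = (-1)^k (1 + 1/(k+1)), k >= 0.

    Even-indexed terms decrease to 1 from above and odd-indexed terms
    increase to -1 from below, so [lo, hi] contains x_k for infinitely
    many k exactly when lo <= 1 < hi or lo < -1 <= hi.
    """
    return (lo <= 1 < hi) or (lo < -1 <= hi)

def narrow(b0, c0, oracle, steps):
    """Apply the halving rule (4) `steps` times; return the lists of b_n and c_n."""
    bs, cs = [Fraction(b0)], [Fraction(c0)]
    for _ in range(steps):
        b, c = bs[-1], cs[-1]
        mid = (b + c) / 2
        if oracle(b, mid):
            # Infinitely many terms lie in the left half: keep [b, mid].
            bs.append(b)
            cs.append(mid)
        else:
            # Otherwise, by (3), the right half must do: keep [mid, c].
            bs.append(mid)
            cs.append(c)
    return bs, cs

bs, cs = narrow(-2, 2, infinitely_many_in, 20)
```

With $b_0 = -2$, $c_0 = 2$ the intervals close in on the cluster point $-1$; because the rule prefers the left half, the other cluster point $1$ is never reached.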

This important result is often referred to as the Bolzano-Weierstrass
theorem. It now permits us to prove the converse of 7.23.

7.27 Theorem. If $\langle x_k \rangle \in \mathrm{Fd}(K)$ then $\langle x_k \rangle$ is convergent.

Proof. First we show that

(1) (xk) is bounded.

For take $\varepsilon = 1$. Then for some $m$, $|x_k - x_l| < 1$ for all $k, l \ge m$. But
$|x_k| \le |x_k - x_l| + |x_l|$ (by the triangle inequality applied to $|x_k| =
|(x_k - x_l) + x_l|$), so $|x_k| < 1 + |x_l|$ for all $k, l \ge m$. In particular,
$|x_k| < 1 + |x_m|$ for all $k \ge m$. Then if we take $b = \max(1 + |x_m|, |x_0|, \ldots, |x_{m-1}|)$
we have $|x_k| \le b$ for all $k$.
We can thus find by the Bolzano-Weierstrass theorem some sequence
$j_k$ of integers and $a \in K$ with

(2) $j_0 < j_1 < \cdots < j_k < \cdots$ and $\lim_{k \to \infty} x_{j_k} = a$.

Now consider any $\varepsilon > 0$. By (2) there is an $m_1$ such that $|x_{j_k} - a| < \varepsilon/2$
if $k \ge m_1$. Also there is an $m_2$ with $|x_l - x_j| < \varepsilon/2$ if $j, l \ge m_2$. Let
$m = \max(m_1, m_2)$. Then, since $j_m \ge m$, if also $l \ge m$ we have

$$|x_l - a| = |(x_l - x_{j_m}) + (x_{j_m} - a)| \le |x_l - x_{j_m}| + |x_{j_m} - a| < \varepsilon.$$

The importance of this theorem, in combination with 7.23, is that it


provides us with an intrinsic condition on a sequence for it to be convergent.
The property of K given by 7.27 is often referred to by saying that K is
topologically complete. In modern analysis one studies a wide variety of
sets on which can be defined a notion of distance or metric between two
points x, y, corresponding to our use of \x — y\ here, and with respect
to which the notions of limit and fundamental sequence can be developed.
It turns out that versions of the Bolzano-Weierstrass theorem and of the
completeness theorem above can also be proved for many such sets.
We shall see how this can be done for the complex numbers in the next
chapter. These notions and results are generalized still further in modern
topology, where the notion of limit is treated without any dependence on
that of distance, and where algebraic operations on elements play no role
to begin with.
With each $a \in K$ is associated the set of all sequences $\langle x_k \rangle$ with
$\lim_{k \to \infty} x_k = a$. There is at least one such sequence, the trivial sequence $x_k = a$
for all $k$. We shall soon show that we can also find at least one such
sequence with $x_k \in Ra$ for all $k$; this is a precise statement corresponding
to the intuitive idea of approximation by rationals, which led us to study
convergence of sequences. First, however, we wish to see how the operations
on and relations between elements $a$, $b$ of K are reflected in corresponding
operations on and relations between their associated sequences.

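7.28 Definition.
(i) For each $a \in K$ we put $\bar{a} = \langle x_k \rangle$ where $x_k = a$ for all $k \ge 0$.
For any $\langle x_k \rangle$, $\langle y_k \rangle$ we put:
(ii) $\langle x_k \rangle + \langle y_k \rangle = \langle x_k + y_k \rangle$,
(iii) $\langle x_k \rangle - \langle y_k \rangle = \langle x_k - y_k \rangle$,
(iv) $\langle x_k \rangle \cdot \langle y_k \rangle = \langle x_k \cdot y_k \rangle$,
(v) $\langle x_k \rangle / \langle y_k \rangle = \langle x_k / y_k \rangle$ if $y_k \ne 0$ for all $k$, and
(vi) $\langle y_k \rangle < \langle x_k \rangle$ if for some $m$ and $\varepsilon > 0$, $\varepsilon < x_k - y_k$ for all $k \ge m$.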

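7.29 Theorem. Suppose that $A \subseteq K$, and that A is a subfield of K.

(i) If $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(A)$, we have

$\langle x_k \rangle + \langle y_k \rangle$, $\langle x_k \rangle - \langle y_k \rangle$, $\langle x_k \rangle \cdot \langle y_k \rangle \in \mathrm{Fd}(A)$.

If $y_k \ne 0$ for all $k$ and $\lim_{k \to \infty} y_k \ne 0$ then also $\langle x_k \rangle / \langle y_k \rangle \in \mathrm{Fd}(A)$.

(ii) If $\lim_{k \to \infty} x_k = a$, $\lim_{k \to \infty} y_k = b$ then $\lim_{k \to \infty} (x_k \pm y_k) =
a \pm b$, $\lim_{k \to \infty} (x_k \cdot y_k) = a \cdot b$ and, if each $y_k \ne 0$ and $b \ne 0$,
$\lim_{k \to \infty} (x_k / y_k) = a/b$.
(iii) If $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(A)$ then $\lim_{k \to \infty} x_k = \lim_{k \to \infty} y_k$ is equivalent
to $\lim_{k \to \infty} (x_k - y_k) = 0$, i.e., for any $\varepsilon > 0$ ($\varepsilon$ in Ra) there
exists $m$ such that $|x_k - y_k| < \varepsilon$ for all $k \ge m$.
(iv) If $\lim_{k \to \infty} x_k = a$, $\lim_{k \to \infty} y_k = b$ then $\langle y_k \rangle < \langle x_k \rangle$ if and only
if $b < a$.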

Proof. (i) By definition, given $\varepsilon_1 > 0$, we can find $m$ such that for all
$k, l \ge m$, $|x_k - x_l| < \varepsilon_1$ and $|y_k - y_l| < \varepsilon_1$. Given $\varepsilon > 0$, take
$\varepsilon_1 = \varepsilon/2$ and $m$ given by this $\varepsilon_1$. Then for $k, l \ge m$,

$$|(x_k + y_k) - (x_l + y_l)| = |(x_k - x_l) + (y_k - y_l)| \le |x_k - x_l| + |y_k - y_l| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Also,

$$|(x_k - y_k) - (x_l - y_l)| = |(x_k - x_l) - (y_k - y_l)| \le |x_k - x_l| + |y_k - y_l| < \varepsilon.$$

To treat multiplication, we first observe that for any $k, l$,

$$|x_k y_k - x_l y_l| = |(x_k y_k - x_l y_k) + (x_l y_k - x_l y_l)| \le |x_k - x_l|\,|y_k| + |y_k - y_l|\,|x_l|.$$

By (1) of the proof of 7.27, any fundamental sequence is bounded. Hence
we can find $c > 0$ with $|x_i| \le c$ and $|y_i| \le c$ for all $i$. Thus if we choose
$\varepsilon_1 = \varepsilon/2c$, we obtain the required $m$. Finally, to treat division, we first
observe for any $k, l$,

$$\left| \frac{x_k}{y_k} - \frac{x_l}{y_l} \right| = \left| \frac{x_k y_l - x_l y_k}{y_k y_l} \right| = \left| \frac{(x_k y_l - x_l y_l) - (x_l y_k - x_l y_l)}{y_k y_l} \right| \le \frac{1}{|y_k|\,|y_l|} \bigl( |x_k - x_l|\,|y_l| + |y_k - y_l|\,|x_l| \bigr).$$

Again pick $c > 0$ with $|x_i| \le c$, $|y_i| \le c$ for all $i$. Further, since $\lim_{k \to \infty}
y_k \ne 0$, there must be some $d > 0$ such that $|y_k| \ge d$ for all $k$. (This is
seen as follows. First, there is $\varepsilon > 0$ such that for all $m$, there is some
$k \ge m$ with $|y_k| \ge \varepsilon$. But $\langle y_k \rangle$ is fundamental, so there is some $m_2$ such
that for all $k, l \ge m_2$, $|y_k - y_l| < \varepsilon/2$. Thus if $k \ge m_2$ is chosen so that
$|y_k| \ge \varepsilon$, we must have $|y_l| > \varepsilon/2$ for all $l \ge m_2$. Now take $d = \min
(|y_0|, \ldots, |y_{m_2 - 1}|, \varepsilon/2)$. Then $d > 0$ since all $y_k \ne 0$, and $|y_k| \ge d$ for
all $k$.) But then for any $\varepsilon_1$ and $m$ with $|x_k - x_l| < \varepsilon_1$, $|y_k - y_l| < \varepsilon_1$
for all $k, l \ge m$, we have

$$\left| \frac{x_k}{y_k} - \frac{x_l}{y_l} \right| < \frac{2 \varepsilon_1 c}{d^2}.$$

Hence, given $\varepsilon > 0$, if we take $\varepsilon_1 = \varepsilon d^2 / 2c$, we obtain the required $m$.


(ii) The proof of this part proceeds in a completely related way. We
know here that for each $\varepsilon_1 > 0$ there is an $m$ such that $|x_k - a| < \varepsilon_1$ and
$|y_k - b| < \varepsilon_1$ for all $k \ge m$. Then, for example, to prove $\lim_{k \to \infty} (x_k \cdot y_k)
= a \cdot b$, we write

$$|x_k y_k - ab| = |(x_k y_k - a y_k) + (a y_k - ab)| \le |x_k - a|\,|y_k| + |a|\,|y_k - b|.$$

Then, using $c > 0$ with $|y_k| \le c$ for all $k$ and $|a| \le c$, we obtain the
desired result for given $\varepsilon > 0$ by taking $\varepsilon_1 = \varepsilon/2c$. In general, $a$ and $b$
take over the role of $x_l$ and $y_l$, respectively, in the proof of (i).
We leave the proof of (iii) and (iv) to the student.

7.30 Theorem. For each $a \in K$ there exists $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ with

$$\lim_{k \to \infty} x_k = a.$$

Proof. Let $b_k = a + 1/(k + 1)$ for each $k \ge 0$. By 7.12(iii) we can
find for each $k$ an $x_k \in Ra$ with $a < x_k < b_k$. Then $|x_k - a| < 1/(k + 1)$.
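To make the choice of $x_k$ in this proof concrete, here is a minimal sketch for the single case $a = \sqrt{2}$, which is an assumption made for illustration (the theorem itself concerns arbitrary $a \in K$). The function name `approx_sqrt2_from_above` is hypothetical; the code uses only exact integer and rational arithmetic.

```python
from fractions import Fraction
from math import isqrt

def approx_sqrt2_from_above(k):
    """Return a rational x_k with sqrt(2) < x_k < sqrt(2) + 1/(k+1)."""
    n = k + 1
    # isqrt(2*n*n) is floor(n*sqrt(2)); adding 1 gives the least integer
    # strictly above n*sqrt(2), since n*sqrt(2) is never an integer.
    return Fraction(isqrt(2 * n * n) + 1, n)

xs = [approx_sqrt2_from_above(k) for k in range(200)]
```

Both inequalities can be checked without ever computing $\sqrt{2}$: $x_k^2 > 2$ and $(x_k - 1/(k+1))^2 < 2$.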

Construction of a continuously ordered field. We can now assemble these
various results as follows. For each $\langle x_k \rangle \in \mathrm{Fd}(Ra)$, let $G(\langle x_k \rangle) =
\lim_{k \to \infty} x_k$. This exists by 7.27. The function G has the property that

$$G(\langle x_k \rangle + \langle y_k \rangle) = G(\langle x_k \rangle) + G(\langle y_k \rangle),$$

$$G(\langle x_k \rangle \cdot \langle y_k \rangle) = G(\langle x_k \rangle) \cdot G(\langle y_k \rangle)$$

and $\langle x_k \rangle < \langle y_k \rangle$ if and only if $G(\langle x_k \rangle) < G(\langle y_k \rangle)$,

by 7.29. Furthermore, for each $a \in K$ there exists $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ with
$G(\langle x_k \rangle) = a$ by the preceding theorem. Hence the function G maps the
system $(\mathrm{Fd}(Ra), +, \cdot, <, \bar{0}, \bar{1})$ homomorphically onto $(K, +, \cdot, <, 0, 1)$.
It follows by our general considerations on homomorphism of Section 4.6
that the field K is isomorphic to a system of equivalence sets $[\langle x_k \rangle]$ formed from
Fd(Ra), where we take $\langle x_k \rangle \equiv \langle y_k \rangle$ if $G(\langle x_k \rangle) = G(\langle y_k \rangle)$, i.e., if
$\lim_{k \to \infty} x_k = \lim_{k \to \infty} y_k$. This latter relation is independent of K, since
by 7.29(iii), this holds if and only if for each $\varepsilon > 0$ ($\varepsilon$ in Ra) there exists
$m$ with $|x_k - y_k| < \varepsilon$ for all $k \ge m$. Thus if we start with a continuously
ordered field K, this corresponding system of equivalence sets formed
from Fd(Ra) is also a continuously ordered field. By this means we could
get a second proof of 7.17, that any two continuously ordered fields are
isomorphic. Our interest now is, rather, to use these ideas to prove the
following.

7.31 Theorem. There exists a continuously ordered field.

Proof. We have observed in the definition 7.22 of Fd(Ra) that we could
restrict consideration to $\varepsilon$ in Ra. We assume this restriction in the following.
The definitions of $+$, $-$, $\cdot$, $/$, and $<$ make sense for $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(Ra)$,
without assuming anything further about the presumed extension field K.
We also continue to use $\bar{a}$ for the constant sequences $\langle a \rangle$, where $a \in Ra$.
The notion $\lim_{k \to \infty} x_k = a$ also continues to make sense for $x_k \in Ra$, so
long as $a \in Ra$. In particular, we shall want to consider this for the case
$a = 0$. We do not ascribe correct significance to the statement $\lim_{k \to \infty}
x_k \ne 0$, if we take this to mean that $\lim_{k \to \infty} x_k$ exists in Ra and is $\ne 0$;
however, we take this to mean that $\lim_{k \to \infty} x_k = 0$ is false. With this
understanding of the notions concerned, we then have:

(1) Fd(Ra) contains $\bar{a}$ for each $a \in Ra$ and is closed under $+$, $-$, $\cdot$,
and $/$, in this last case when dividing by $\langle y_k \rangle$ with $y_k \ne 0$ for all
$k$ and $\lim_{k \to \infty} y_k \ne 0$.

This is seen from the proof of 7.29(i), which (applied to $A = Ra$)
now depends only on Ra and not on any presumed K. Now we claim

(2) $(\mathrm{Fd}(Ra), +, \cdot, \bar{0}, \bar{1})$ is a commutative ring with unity.

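The proof that the various conditions of 4.1 hold for this system is quite
straightforward on the basis of the definition 7.28 of the various operations.
For example, to prove distributivity, we have

$$\langle x_k \rangle \cdot (\langle y_k \rangle + \langle z_k \rangle) = \langle x_k \rangle \cdot \langle y_k + z_k \rangle = \langle x_k \cdot (y_k + z_k) \rangle$$

$$= \langle (x_k \cdot y_k) + (x_k \cdot z_k) \rangle = \langle x_k \cdot y_k \rangle + \langle x_k \cdot z_k \rangle$$

$$= (\langle x_k \rangle \cdot \langle y_k \rangle) + (\langle x_k \rangle \cdot \langle z_k \rangle).$$

For existence of additive inverse, write

$$-\langle x_k \rangle = \bar{0} - \langle x_k \rangle = \langle -x_k \rangle;$$

then $\langle x_k \rangle + (-\langle x_k \rangle) = \bar{0}$. The system Fd(Ra) also has the following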
property:

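(3) If $\langle x_k \rangle \in \mathrm{Fd}(Ra)$, $x_k \ne 0$ for all $k$ and $\lim_{k \to \infty} x_k \ne 0$, then for
some $\langle y_k \rangle \in \mathrm{Fd}(Ra)$, $\langle x_k \rangle \cdot \langle y_k \rangle = \bar{1}$.

Namely, take $\langle y_k \rangle = \langle 1/x_k \rangle = \bar{1}/\langle x_k \rangle$; thus $\langle y_k \rangle \in \mathrm{Fd}(Ra)$ by (1).

The system Fd(Ra) does not form an integral domain. For example,
if we take $x_k = 0$ for $k$ even, $x_k = 1/(k + 1)$ for $k$ odd, and $y_k =
1/(k + 1)$ for $k$ even, $y_k = 0$ for $k$ odd, both $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(Ra)$, but
$\langle x_k \rangle \cdot \langle y_k \rangle = \bar{0}$. However, we shall now obtain an integral domain, in
fact a field, from Fd(Ra) by taking a homomorphic image as described
earlier.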
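The zero-divisor example above is easy to check termwise. The following sketch represents a sequence as a Python function of $k$; the names `x`, `y` and `times` are introduced here for illustration only.

```python
from fractions import Fraction

def x(k):
    """x_k = 0 for k even, 1/(k+1) for k odd."""
    return Fraction(0) if k % 2 == 0 else Fraction(1, k + 1)

def y(k):
    """y_k = 1/(k+1) for k even, 0 for k odd."""
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)

def times(u, v):
    """Termwise product of two sequences, as in Definition 7.28(iv)."""
    return lambda k: u(k) * v(k)

product = times(x, y)
```

Neither sequence is the constant sequence $\bar{0}$, yet every term of `product` is 0, since for each $k$ one of the two factors vanishes.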

(4) We define $\langle x_k \rangle \equiv \langle y_k \rangle$ to hold if $\lim_{k \to \infty} (x_k - y_k) = 0$.

Then

(5) $\equiv$ is a congruence relation in the system $(\mathrm{Fd}(Ra), +, \cdot, \bar{0}, \bar{1})$.

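For it is clear that $\equiv$ is reflexive and symmetric. If $\langle x_k \rangle \equiv \langle y_k \rangle$ and
$\langle y_k \rangle \equiv \langle z_k \rangle$ then $\lim_{k \to \infty} (x_k - y_k) = \lim_{k \to \infty} (y_k - z_k) = 0$. But then
by 7.29(ii), which can be applied when the limits concerned belong to Ra,

$$\lim_{k \to \infty} (x_k - z_k) = \lim_{k \to \infty} ((x_k - y_k) + (y_k - z_k)) = 0 + 0 = 0.$$

Hence $\langle x_k \rangle \equiv \langle z_k \rangle$ and we see that $\equiv$ is transitive, thus that it is an
equivalence relation. Suppose now that $\langle x_k \rangle \equiv \langle x'_k \rangle$, $\langle y_k \rangle \equiv \langle y'_k \rangle$. Then

$$\langle x_k \rangle + \langle y_k \rangle = \langle x_k + y_k \rangle, \qquad \langle x'_k \rangle + \langle y'_k \rangle = \langle x'_k + y'_k \rangle,$$

and

$$\lim_{k \to \infty} ((x_k + y_k) - (x'_k + y'_k)) = \lim_{k \to \infty} ((x_k - x'_k) + (y_k - y'_k)) = 0,$$

since $\lim_{k \to \infty} (x_k - x'_k) = \lim_{k \to \infty} (y_k - y'_k) = 0$. Thus $\langle x_k \rangle + \langle y_k \rangle \equiv
\langle x'_k \rangle + \langle y'_k \rangle$. Also $\langle x_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y_k \rangle$ and $\langle x'_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y'_k \rangle$ so
that $\langle x_k \rangle \cdot \langle y_k \rangle \equiv \langle x'_k \rangle \cdot \langle y'_k \rangle$. To see the first of these, we consider
$\lim_{k \to \infty} (x_k y_k - x'_k y_k) = \lim_{k \to \infty} (x_k - x'_k) \cdot y_k$. We cannot apply 7.29(ii)
directly, since $\lim_{k \to \infty} y_k$ may not exist in Ra. However, since $\langle y_k \rangle$ is
bounded, if we choose $c \in Ra$ with $c > 0$ and $|y_k| \le c$ for all $k$, then for
any $\varepsilon > 0$, we can find an $m$ such that $|x_k - x'_k| \cdot |y_k| < \varepsilon$ for all $k \ge m$,
namely an $m$ such that $|x_k - x'_k| < \varepsilon/c$ for all $k \ge m$. Using commutativity
of $\cdot$, the other statements follow directly.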

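(6) For each $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ we put $[\langle x_k \rangle]$ = the equivalence set to
which $\langle x_k \rangle$ belongs, with respect to the relation $\equiv$. We denote by
Fd*(Ra) the collection of all such equivalence sets. For $\langle x_k \rangle$,
$\langle y_k \rangle \in \mathrm{Fd}(Ra)$, we put
(a) $[\langle x_k \rangle] \oplus [\langle y_k \rangle] = [\langle x_k \rangle + \langle y_k \rangle]$,
(b) $[\langle x_k \rangle] \circ [\langle y_k \rangle] = [\langle x_k \rangle \cdot \langle y_k \rangle]$, and
(c) $a^* = [\bar{a}]$ for each $a \in Ra$.

By (5), the operations $\oplus$, $\circ$ are well-determined by (6a) and (b).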

(7) The system $(\mathrm{Fd}^*(Ra), \oplus, \circ, 0^*, 1^*)$ is a commutative ring with
unity.

For the function $G(\langle x_k \rangle) = [\langle x_k \rangle]$ is a homomorphic mapping of Fd(Ra)
onto Fd*(Ra) by (5) and the general considerations of Section 4.6. Then
by 4.55 we will have (7) if we can show $1^* \ne 0^*$, that is, $[\bar{1}] \ne [\bar{0}]$, i.e.,
that $\bar{1} \not\equiv \bar{0}$. But this follows from $\lim_{k \to \infty} (1 - 0) = 1$. To improve the
result (7) we now need the following lemma:

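(8) if $X \in \mathrm{Fd}^*(Ra)$ and $X \ne 0^*$ then there exists $\langle x_k \rangle$ such that
$x_k \ne 0$ for all $k$, $\lim_{k \to \infty} x_k \ne 0$ and $X = [\langle x_k \rangle]$.

For, by hypothesis, $X = [\langle u_k \rangle]$ where $\langle u_k \rangle \not\equiv \bar{0}$, i.e., $\lim_{k \to \infty} u_k \ne 0$. Thus
for some $\varepsilon > 0$ and for all $m$ there is some $k \ge m$ with $|u_k| \ge \varepsilon$. Further,
since $\langle u_k \rangle \in \mathrm{Fd}(Ra)$, we can find $m$ such that $|u_k - u_l| < \varepsilon/2$ for all
$k, l \ge m$. But then $|u_l| > \varepsilon/2$ for all $l \ge m$, by comparison with $u_k$,
$k \ge m$, for which $|u_k| \ge \varepsilon$. Now put $x_k = \varepsilon/2$ if $k < m$, $x_k = u_k$ if
$k \ge m$. Then $\langle x_k \rangle \equiv \langle u_k \rangle$ since $x_k - u_k = 0$ for all $k \ge m$. Since
$|x_k| \ge \varepsilon/2 > 0$ for all $k$, this provides the desired representative of X.
We can now prove that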

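(9) the system $(\mathrm{Fd}^*(Ra), \oplus, \circ, 0^*, 1^*)$ is a field.

It is sufficient to show that for each $X \in \mathrm{Fd}^*(Ra)$, with $X \ne 0^*$, we can
find $Z \in \mathrm{Fd}^*(Ra)$ with $X \circ Z = 1^*$. Using (8) we can find $\langle x_k \rangle$ with
$X = [\langle x_k \rangle]$ such that $\langle x_k \rangle$ satisfies the hypothesis of (3). Hence for some
$\langle z_k \rangle$, $\langle x_k \rangle \cdot \langle z_k \rangle = \bar{1}$. Thus for $Z = [\langle z_k \rangle]$, $X \circ Z = [\langle x_k \rangle \cdot \langle z_k \rangle] =
[\bar{1}] = 1^*$.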
We must now turn to verifying properties of ordering in Fd*(Ra).
We first check that

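(10) $\equiv$ is a congruence relation with respect to the relation $<$ in Fd(Ra).

Here we take $<$ as defined in 7.28(vi), $\langle y_k \rangle < \langle x_k \rangle$ if and only if for some
$\varepsilon > 0$ and $m$, $\varepsilon < x_k - y_k$ for all $k \ge m$. It must then be shown that if
$\langle x_k \rangle \equiv \langle x'_k \rangle$, $\langle y_k \rangle \equiv \langle y'_k \rangle$ then $\langle y_k \rangle < \langle x_k \rangle$ if and only if $\langle y'_k \rangle < \langle x'_k \rangle$. We
leave the proof of this to the reader. This allows us to define, unambiguously,

(11) for $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(Ra)$, we put $[\langle y_k \rangle] <^* [\langle x_k \rangle]$ if $\langle y_k \rangle < \langle x_k \rangle$.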

Then

(12) Fd*(Ra) becomes an ordered field under $<^*$.

We check this by verifying the conditions 4.17(i)-(ii) for an ordered
integral domain, when we take Pos to consist of all X with $0^* <^* X$.
That this is sufficient is seen from $[\langle y_k \rangle] <^* [\langle x_k \rangle]$ if and only if $0^* <^*
[\langle x_k \rangle - \langle y_k \rangle]$, which is quite direct from (11) and the definition of $<$.
Concerning 4.17(i), suppose that $0^* <^* [\langle x_k \rangle]$, $0^* <^* [\langle y_k \rangle]$, i.e., for some
$\varepsilon_1 > 0$, $\varepsilon_2 > 0$ and some $m_1$, $m_2$, $\varepsilon_1 < x_k$ for all $k \ge m_1$, $\varepsilon_2 < y_k$ for
all $k \ge m_2$. Then $\varepsilon_1 + \varepsilon_2 < x_k + y_k$ and $\varepsilon_1 \cdot \varepsilon_2 < x_k \cdot y_k$ for all
$k \ge \max(m_1, m_2)$, hence $0^* <^* [\langle x_k \rangle] \oplus [\langle y_k \rangle]$ and $0^* <^* [\langle x_k \rangle] \circ [\langle y_k \rangle]$.
Now for 4.17(ii) consider any $[\langle x_k \rangle] \ne 0^*$, i.e., $\lim_{k \to \infty} x_k \ne 0$. For some
$\varepsilon > 0$ and all $m$ there is $k \ge m$ with $|x_k| \ge \varepsilon$. As we have argued several
times earlier, this shows there is an $m_1$ such that $|x_l| > \varepsilon/2$ for all $l \ge m_1$.
Put $d = \varepsilon/2$. We claim then that there exists $m$ such that $d < x_k$ for
all $k \ge m$ or $x_k < -d$ for all $k \ge m$. In the former case we would have
$\bar{0} < \langle x_k \rangle$, and in the latter case we would have $\bar{0} < -\langle x_k \rangle$. Now suppose
that the assertion is false. Then for all $m$ there exist $k, l \ge m$ with $x_k \le d$
and $-d \le x_l$. But by the preceding, if $m \ge m_1$, we must have $|x_k| > d$,
$|x_l| > d$ for such $k, l$; hence actually $x_k < -d$ and $d < x_l$. But then
$|x_k - x_l| > 2d = \varepsilon$. To summarize: if the assertion is false then for all
$m \ge m_1$ there exist $k, l \ge m$ with $|x_k - x_l| > \varepsilon$. This clearly contradicts
$\langle x_k \rangle \in \mathrm{Fd}(Ra)$. The only point left to show in the proof of (12) is that it
is not possible, for $[\langle x_k \rangle] \in \mathrm{Fd}^*(Ra)$, that any two of the cases $0^* =
[\langle x_k \rangle]$, $0^* <^* [\langle x_k \rangle]$, and $0^* <^* [-\langle x_k \rangle]$ hold simultaneously. This is
easily checked via (11).
As we know by 6.13 any ordered field contains a subfield isomorphic to
the rationals. In this case, the subfield consists simply of the set

(13) $Ra^* = \{a^* : a \in Ra\}$.

That the mapping $F(a) = a^*$ is an isomorphic mapping of Ra onto Ra*,
i.e., that

(14) for $a, b \in Ra$, $(a + b)^* = a^* \oplus b^*$, $(a \cdot b)^* = a^* \circ b^*$, and
$a < b$ if and only if $a^* <^* b^*$,

is clear from $\overline{a + b} = \bar{a} + \bar{b}$, $\overline{a \cdot b} = \bar{a} \cdot \bar{b}$, and $a < b$ if and only if
$\bar{a} < \bar{b}$. We also have the following special property:

(15) if $X, Y \in \mathrm{Fd}^*(Ra)$ and $Y <^* X$ then there exists $a \in Ra$ with
$Y <^* a^* <^* X$.

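For let $X = [\langle x_k \rangle]$, $Y = [\langle y_k \rangle]$, where $\langle x_k \rangle, \langle y_k \rangle \in \mathrm{Fd}(Ra)$ and for some
$\varepsilon > 0$ and $m$ we have $\varepsilon < x_k - y_k$ for all $k \ge m$. We can also choose $m$
so large that $|x_k - x_m| < \varepsilon/4$ and $|y_k - y_m| < \varepsilon/4$ for any $k \ge m$. Then

$$x_m - y_k = (x_m - y_m) + (y_m - y_k) > \varepsilon + (y_m - y_k) > 3\varepsilon/4$$

for any $k \ge m$. Similarly $x_k - y_m > 3\varepsilon/4$ for $k \ge m$. Let $a =
(x_m + y_m)/2$. Then

$$a - y_k = \frac{x_m + y_m - 2 y_k}{2} = \frac{x_m - y_k}{2} + \frac{y_m - y_k}{2} > \frac{3\varepsilon}{8} + \frac{y_m - y_k}{2} > \frac{\varepsilon}{4}$$

for all $k \ge m$. Similarly $x_k - a > \varepsilon/4$ for all $k \ge m$. This shows that
$Y <^* a^* <^* X$.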
The notion of fundamental sequence can be applied to Ra*, leading to
the set Fd(Ra*); note the difference in meaning between this and Fd*(Ra).
Members of the former are sequences of elements from Ra*; members of
the latter are equivalence sets of fundamental sequences from Ra. How¬
ever, we have the following relationship:

(16) if $\langle x_k \rangle$ is a sequence of elements from Ra, then $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ if
and only if $\langle x_k^* \rangle \in \mathrm{Fd}(Ra^*)$.

For, by (14), the absolute value of the difference, in Fd*(Ra), of any two
elements $x_k^*$ and $x_l^*$ is just $(|x_k - x_l|)^*$. Thus for $\langle x_k^* \rangle$ to be in Fd(Ra*) we
must have that for any $\varepsilon^*$ in Ra* with $0^* <^* \varepsilon^*$, there exists $m$ such that
$(|x_k - x_l|)^* <^* \varepsilon^*$ for all $k, l \ge m$. By (14) this is equivalent to $\langle x_k \rangle \in
\mathrm{Fd}(Ra)$.
We can further consider the question of whether, for $\langle x_k^* \rangle \in \mathrm{Fd}(Ra^*)$,
$\lim_{k \to \infty} x_k^*$ exists in Fd*(Ra). In fact we have the following:

(17) for each $\langle x_k^* \rangle \in \mathrm{Fd}(Ra^*)$, $\lim_{k \to \infty} x_k^*$ exists in Fd*(Ra) and is
equal to $[\langle x_k \rangle]$.

Let $X = [\langle x_k \rangle]$. What must be checked here is that for any $\varepsilon^*$ in Ra*
with $0^* <^* \varepsilon^*$ there exists an $m$ such that the difference between $x_l^*$
and X in absolute value is $<^* \varepsilon^*$ for all $l \ge m$; equivalently, that for
any $\varepsilon$ in Ra with $0 < \varepsilon$ there exists an $m$ such that $|\bar{x}_l - \langle x_k \rangle| < \bar{\varepsilon}$ for
all $l \ge m$. Since $|\langle y_k \rangle| = \langle |y_k| \rangle$ for any sequence, we have $|\bar{x}_l - \langle x_k \rangle| =
\langle |x_l - x_k| \rangle_{k \ge 0}$ for any given $l$. By the definition of $<$, we must thus show
that, given $\varepsilon > 0$, there exists $m$ such that for any $l \ge m$ there exist
$\varepsilon_1 > 0$ and $m_1$ such that $\varepsilon_1 < \varepsilon - |x_l - x_k|$, that is, $|x_k - x_l| < \varepsilon - \varepsilon_1$,
for all $k \ge m_1$. In fact, since $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ by (16), we can find $m$ such
that $|x_k - x_l| < \varepsilon/2$ for all $k, l \ge m$; hence we can choose $\varepsilon_1 = \varepsilon/2$ and
$m_1 = m$ to satisfy the preceding.
Now by the general result (2:4-9) we can find a set K, define operations
+, • on K and a relation < in K, as well as a function F on K, satisfying
the following conditions

(18) (a) $(K, +, \cdot, <, 0, 1)$ contains Ra as a subfield;

(b) F maps $(K, +, \cdot, <, 0, 1)$ isomorphically onto
$(\mathrm{Fd}^*(Ra), \oplus, \circ, <^*, 0^*, 1^*)$;

(c) for each $a \in Ra$, $F(a) = a^*$.

It then follows from (12)—(17) that also

(19) (a) $(K, +, \cdot, <, 0, 1)$ is an ordered field;

(b) for any $u, v \in K$ if $u < v$ then there exists $a \in Ra$ with
$u < a < v$;
(c) if $\langle x_k \rangle \in \mathrm{Fd}(Ra)$ then there exists $u \in K$ with $\lim_{k \to \infty} x_k = u$.

From these properties we shall now deduce our theorem, by showing that

(20) (K, <) is a continuously ordered system.

For suppose that A is an upper section in K, $A \in U(K)$. Then we can
find $u \in A$ and $v \in K - A$. Applying (19b) first to $u$, $u + 1$ and then
to $v - 1$, $v$, we can find $a, b \in Ra$ such that $a \in A$, $b \in K - A$, and hence
$b < a$. For any $k \ge 0$ consider the finite sequence of elements

$$b = b + \frac{0(a - b)}{2^k},\; b + \frac{(a - b)}{2^k},\; b + \frac{2(a - b)}{2^k},\; \ldots,\; b + \frac{m(a - b)}{2^k},\; \ldots,\; b + \frac{2^k(a - b)}{2^k} = a,$$

$0 \le m \le 2^k$. Put $z_{k,m} = b + m(a - b)/2^k$. Whenever $z_{k,m} \in A$ and
$m \le m' \le 2^k$, also $z_{k,m'} \in A$, since $z_{k,m} \le z_{k,m'}$. But $z_{k,0} \notin A$. Hence
if $m_k$ is the least $m$ such that $z_{k,m} \in A$, we have $m_k > 0$ and $z_{k,m_k - 1} \notin A$.
Thus, if we set $x_k = z_{k,m_k - 1}$ and $y_k = z_{k,m_k}$ for each $k$, we have $x_k \notin A$,
$y_k \in A$ and $y_k - x_k = (a - b)/2^k$ for all $k$. Furthermore, since we are
performing successive divisions by halves, we see that if $k \ge l$, then $x_l \le x_k$
and $y_k \le y_l$. Then $\langle x_k \rangle \in \mathrm{Fd}(Ra)$. For, first, each $x_k \in Ra$ since we
chose $a, b \in Ra$. Next, consider any $\varepsilon > 0$, and pick $m$ with $(a - b)/2^m < \varepsilon$.
Then if $k, l \ge m$, say $k \ge l$, $|x_k - x_l| = x_k - x_l < y_k - x_l \le y_l -
x_l = (a - b)/2^l \le (a - b)/2^m < \varepsilon$, and similarly if $l \ge k$. But now by (19c) there
exists $w \in K$ with $\lim_{k \to \infty} x_k = w$. We shall show that $w = \inf A$.
From this we can conclude that $w$ is the largest element of $K - A$, since
A has no least element; then (20) will be proved according to our definition
7.1(ii). Here it is convenient to use the criterion 7.18(i) for inf (which
depends only on K being an ordered field). First, since $\langle x_k \rangle$ is an increasing
sequence, there cannot exist an $l$ with $w < x_l$; otherwise we get a
contradiction to $\lim_{k \to \infty} x_k = w$ with $\varepsilon = x_l - w$. Hence $x_k \le w$ for
all $k$. If $w \in A$ then there exists $w' \in A$ with $w' < w$. Let $\varepsilon = w - w'$.
All $x_k < w'$ since all $x_k \notin A$. Hence $w - x_k > w - w' = \varepsilon$ for all $k$,
contradicting $\lim_{k \to \infty} x_k = w$. Thus $w \notin A$, and $w$ is a lower bound for A.
In particular, $w < y_k$ for all $k$. Now consider any $\varepsilon > 0$. We can find
some $k$ with $(a - b)/2^k < \varepsilon$. Then we have $y_k - w \le y_k - x_k = (a - b)/2^k < \varepsilon$.
Thus by 7.18(i), $w = \inf A$, and the theorem is proved.
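The subdivision used in the proof of (20) can be tried out on a concrete upper section. The sketch below assumes, purely for illustration, the section $A = \{q : q > 0 \text{ and } q^2 > 2\}$ restricted to rationals, with $a = 2$ and $b = 1$; the names `in_A` and `bracket` are hypothetical.

```python
from fractions import Fraction

def in_A(q):
    """Membership test for the assumed upper section A = {q : q > 0 and q^2 > 2}."""
    return q > 0 and q * q > 2

def bracket(a, b, k):
    """Return (x_k, y_k): adjacent grid points b + m(a-b)/2^k with x_k not in A, y_k in A."""
    step = (a - b) / 2 ** k
    m = 0
    # Search for the least m with z_{k,m} in A; this terminates because a is in A.
    while not in_A(b + m * step):
        m += 1
    return b + (m - 1) * step, b + m * step

a, b = Fraction(2), Fraction(1)      # a lies in A, b does not
pairs = [bracket(a, b, k) for k in range(12)]
```

The pairs satisfy $y_k - x_k = (a - b)/2^k$, the $x_k$ never decrease and the $y_k$ never increase, which is the behaviour the proof relies on.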
The main idea for the construction of the system K from Ra in this
proof is generalized in modern analysis and topology to show that any set
on which a suitable notion of distance is defined can be extended to one
which is topologically complete in the sense of 7.27. The additional
complications in the preceding proof were due to the necessity of also extending
the basic algebraic operations and ordering from Ra to K, and verifying
that the desired properties of the extensions hold in K.
In view of this existence theorem and the uniqueness theorem 7.17,
we are now free to identify the real numbers by means of the following.

7.32 Convention. We assume throughout the remainder of this book that
$(Re, +, \cdot, <, 0, 1)$ is a fixed continuously ordered field containing the
rationals as a subfield. We call Re the set of real numbers.

Of course, from now on we can apply any of our preceding results about
arbitrary continuously ordered systems and fields to the real numbers.

Exercise Group 7.2

1. Prove Theorem 7.15(ii), (iii).
2. Prove Theorem 7.29(iii), (iv).
3. Carry out the proof of step (10) in the proof of 7.31.
4. Suppose that $(K, +, \cdot, <, 0, 1)$ is an ordered field and L is a subfield
satisfying the following conditions:
(a) for any $u, v \in K$ with $u < v$ there exists $a \in L$ with $u < a < v$;
(b) for any $\langle x_k \rangle \in \mathrm{Fd}(L)$ there exists $u \in K$ with $\lim_{k \to \infty} x_k = u$.
Show that any $\langle x_k \rangle \in \mathrm{Fd}(K)$ converges in K. [Cf. (19) in the proof of
7.31.]
5. Do you believe that any ordered field K which satisfied (4c) is continuously
ordered? Discuss.

6. Suppose that $\langle x_k \rangle$ is a bounded sequence of real numbers. Let $u \in A$ if
and only if there is a subsequence $\langle y_i \rangle = \langle x_{k_i} \rangle$ of $\langle x_k \rangle$ with $\lim_{i \to \infty} y_i = u$.
Show that A is bounded from above and below.
We define

$$\limsup_{k \to \infty} x_k = \sup A, \qquad \liminf_{k \to \infty} x_k = \inf A.$$

Calculate $\limsup_{k \to \infty} x_k$ and $\liminf_{k \to \infty} x_k$ for $x_k = (-1)^k [1 + 1/(k + 1)]$.
7. Prove the following, for any bounded sequence $\langle x_k \rangle$ of real numbers.
(i) If for each $k \ge 0$ we put $z_k = \sup \{x_l : l \ge k\}$, then $\lim_{k \to \infty} z_k$ exists
and $= \limsup_{k \to \infty} x_k$.
(ii) $\limsup_{k \to \infty} x_k = a$ if and only if
(a) for each $\varepsilon > 0$ there exists $m$ such that $x_k < a + \varepsilon$ for all $k \ge m$;
(b) for each $\varepsilon > 0$ and $m$ there exists $k \ge m$ such that $x_k > a - \varepsilon$.
Formulate corresponding results for $\liminf_{k \to \infty} x_k$.
8. Prove that if $\langle x_k \rangle$ is a bounded sequence of real numbers, then $\lim_{k \to \infty} x_k$
exists if and only if $\limsup_{k \to \infty} x_k = \liminf_{k \to \infty} x_k$.
9. Prove that if $\langle x_k \rangle$ is a bounded sequence of real numbers which is
nondecreasing, i.e., $x_k \le x_{k+1}$ for all $k$, then $\lim_{k \to \infty} x_k$ exists.
10. Show that for each $d \in Re$, $d > 0$, there exists $a \in Re$ with $a^2 = d$.
[Hint: Consider sup A where $A = \{x : x \in Re \text{ and } x^2 < d\}$.]

7.3 Infinite series and representations of real numbers. We have
already mentioned several times the decimal representation of real numbers.
This is a particular association of fundamental sequences of rationals
with every real number having that real number as limit. The terms of
these sequences are most naturally regarded as finite sums, with each term
obtained from the preceding by adding a new (smaller) quantity. We
thus turn now to the discussion of “infinite” sums, i.e., limits of such finite
sums in the real numbers.

7.33 Definition. Suppose that $\langle x_i \rangle$ is a sequence of real numbers. By the
associated sequence of partial sums or series we understand the sequence
$\langle s_k \rangle$ with $s_k = \sum_{i=0}^{k} x_i$ for each $k$. If $\langle s_k \rangle$ converges we say the infinite
series $\sum_{i=0}^{\infty} x_i$ converges; otherwise, that it diverges. If $\lim_{k \to \infty} s_k = a$,
we write $\sum_{i=0}^{\infty} x_i = a$.

We treat similarly the meaning of $\sum_{i=n}^{\infty} x_i$ for any $n \ge 0$. The following
is now a direct consequence of our earlier results about sequences.

7.34 Theorem. Suppose that $x_k$ is real for $k \ge 0$.

(i) $\sum_{i=0}^{\infty} x_i$ converges if and only if for any $\varepsilon > 0$ there is an $n$
such that for all $l \ge k \ge n$, $|\sum_{i=k}^{l} x_i| < \varepsilon$.
(ii) If $\sum_{i=0}^{\infty} x_i$ converges then $\lim_{i \to \infty} x_i = 0$.
7.3] INFINITE SERIES; REPRESENTATIONS OF REAL NUMBERS 257

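(iii) If $\sum_{i=0}^{\infty} x_i$ converges and $n > 0$ then $\sum_{i=n}^{\infty} x_i$ converges and

$$\sum_{i=0}^{n-1} x_i + \sum_{i=n}^{\infty} x_i = \sum_{i=0}^{\infty} x_i.$$

(iv) If $\sum_{i=0}^{\infty} |x_i|$ converges then also $\sum_{i=0}^{\infty} x_i$ converges.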

Questions of convergence or divergence of various kinds of series are
quite important in analysis and are studied in some detail there. We shall
not pursue the subject in any complete and systematic way but only discuss
additional properties of series as needed.

Positional notations for real numbers. The decimal representation
$a = m_0.m_1 m_2 m_3 \ldots$ of a real number $a \ge 0$ is given by a sequence $\langle m_i \rangle$
of integers where $m_0 \ge 0$ and $0 \le m_i < 10$ for $i > 0$. The intended
meaning is

$$a = m_0 + \frac{m_1}{10} + \frac{m_2}{10^2} + \frac{m_3}{10^3} + \cdots = \sum_{i=0}^{\infty} \frac{m_i}{10^i} = \sum_{i=0}^{\infty} m_i 10^{-i}.$$

The obvious generalization to arbitrary base $b > 1$, in analogy to the
representation 4.53 of integers to the base $b$, would take the form $a =
\sum_{i=0}^{\infty} m_i b^{-i}$ where $m_i$ are integers, $m_0 \ge 0$ and $0 \le m_i < b$ for $i > 0$.
We could of course also represent $m_0$ to the base $b$, using nonnegative
powers of $b$ as well. But the essential part of the problem is dealing with
the part $a - m_0$ which is a number between 0 and 1. That we can uniquely
find $m_0$ from $a$ and then proceed to treating the remaining part is given
by the following.

7.35 Lemma. Suppose that $a \in Re$, $a \ge 0$. Then there is a unique $m \in I$
with $0 \le m \le a < m + 1$.

Proof. By Archimedean ordering 7.12(i) we can find $n \in P$ with $a < n$.
Let $m + 1$ be the least such $n$. Then $m \le a$. Given also $m' \le a < m' + 1$,
we have $m + 1 \le m' + 1$ by definition of $m$. Hence $m \le m'$. If $m < m'$
then $m + 1 \le m' \le a$, contrary to $a < m + 1$.

7.36 Theorem. Suppose that $b \in P$, $b > 1$. Then for any $a \in Re$, $a \ge 0$,
there exists a sequence $\langle m_i \rangle$ of integers with $m_0 \ge 0$, $0 \le m_i < b$
for all $i > 0$ and $a = \sum_{i=0}^{\infty} m_i \cdot b^{-i}$.

Proof. One way to find such a sequence is to “creep up on $a$” from the
left. Thus, for example for $b = 10$, we first take $m_0$ as the largest integer
$\le a$, then $m_1/10$ as the largest tenth in $a - m_0$, then $m_2/10^2$ as the
largest hundredth in $a - (m_0 + m_1/10)$, etc. This can be rephrased by:

$$m_0 \le a < m_0 + 1, \qquad m_1 \le 10(a - m_0) < m_1 + 1,$$

$$m_2 \le 10^2 \left[ a - \left( m_0 + \frac{m_1}{10} \right) \right] < m_2 + 1, \text{ etc.}$$

In general, given any $b > 1$, we shall define a sequence $\langle m_i \rangle$ recursively
in such a way that for any $k \ge 0$,

(1) $\sum_{i=0}^{k} m_i b^{-i} \le a$.

The sequence is given by

(2) $m_0$ is the unique integer with $m_0 \le a < m_0 + 1$

and

(3) given $m_0, \ldots, m_k$: $m_{k+1}$ is the unique integer with

$$m_{k+1} \le b^{k+1} \left( a - \sum_{i=0}^{k} m_i b^{-i} \right) < m_{k+1} + 1.$$

That there is such a sequence is seen inductively, by showing at the same
time that (1) always holds. First, (2) is immediate from 7.35. Then (1)
also holds for $k = 0$. Suppose that we have $m_0, \ldots, m_k$ satisfying (1).
Then $a - \sum_{i=0}^{k} m_i b^{-i} \ge 0$; so we also have $b^{k+1} (a - \sum_{i=0}^{k} m_i b^{-i}) \ge 0$.
Hence by 7.35 there is a unique $m_{k+1}$ satisfying the inequalities of (3).
Then also (1) holds for $k + 1$, since by (3),

$$m_{k+1} b^{-(k+1)} \le a - \sum_{i=0}^{k} m_i b^{-i}.$$

This completes the induction. Since (1) holds for all $k$, we also have by 7.35

(4) $m_i \ge 0$ for all $i$.

We wish now to show

(5) $m_i < b$ if $i > 0$.

To see this, we first observe by (3) that for any $k \ge 0$,

$$a - \sum_{i=0}^{k+1} m_i b^{-i} < b^{-(k+1)}.$$

Hence

(6) $a - \sum_{i=0}^{k} m_i b^{-i} < b^{-k}$ for any $k$.

For this also holds for $k = 0$ by (2). Again by (3) we have $m_k b^{-k} \le
a - \sum_{i=0}^{k-1} m_i b^{-i}$ for any $k > 0$; but then by (6) applied to $k - 1$,
$m_k b^{-k} < b^{-(k-1)}$; so $m_k < b$. This proves (5). To prove that $a =
\sum_{i=0}^{\infty} m_i b^{-i}$ we need to show that

(7) for any $\varepsilon > 0$ there exists $n$ such that $(a - \sum_{i=0}^{k} m_i b^{-i}) < \varepsilon$
for all $k \ge n$.

By Archimedean ordering, it is sufficient to prove this for $\varepsilon = 1/q$ where
$q > 0$. By (6), it is further sufficient to find $n$ with $1/b^n \le 1/q$, i.e.,
$q \le b^n$. Since $b > 1$, certainly any $n \ge q$ will do.
As an example of the above, any real number $a \ge 0$ can be represented
in the form $a = \sum_{i=0}^{\infty} (m_i / 2^i)$, where each $m_i = 0$ or $1$ for $i > 0$. Of
course, the value $m_0.m_1 m_2 m_3 \ldots$ in this binary representation is completely
different from that in the ordinary decimal expansion. For example,
$0.101010 \ldots$ is $\frac{2}{3}$ when regarded as being a binary representation of a
number, while it is $\frac{10}{99}$ in the decimal representation. We could verify
these special cases by computing the sequence of numbers $m_i$ as described
in the preceding proof for $a = \frac{2}{3}$, $b = 2$ and $a = \frac{10}{99}$, $b = 10$. However,
we shall obtain in a moment a general statement concerning the representation
of rationals.
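The recursion (2), (3) from the proof of 7.36 can be carried out exactly for rational $a$. The following is a minimal sketch under that assumption; the function name `digits` and its parameters are introduced here for illustration and are not part of the text.

```python
from fractions import Fraction
from math import floor

def digits(a, b, n):
    """Return [m_0, ..., m_n] for a >= 0 in base b, following steps (2) and (3)."""
    a = Fraction(a)
    ms = [floor(a)]                  # (2): m_0 <= a < m_0 + 1
    partial = Fraction(ms[0])        # running sum of m_i * b^(-i)
    for k in range(n):
        # (3): m_{k+1} <= b^(k+1) * (a - partial sum) < m_{k+1} + 1
        m_next = floor(b ** (k + 1) * (a - partial))
        ms.append(m_next)
        partial += Fraction(m_next, b ** (k + 1))
    return ms
```

For instance, `digits(Fraction(2, 3), 2, 6)` and `digits(Fraction(10, 99), 10, 6)` both produce the digit pattern of $0.101010\ldots$, in base 2 and base 10 respectively.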
In contrast with the representation of integers to the base $b$, we have
not asserted the uniqueness of the representation of real numbers. In
fact, we know from experience with decimal representations that there
are certain cases in which this fails, for example $\frac{1}{8} = 0.125000 \ldots =
0.124999 \ldots$ In general, we shall show that such distinct representations
of a number $a$ can occur only if $a$ is of the form $c/d$ where $c, d \in I$, $(c, d) = 1$,
and every prime which divides $d$ also divides $b$. A simpler statement of the
same condition is that $a = e/b^k$ for some $e, k \in I$.
To prepare the ground for the proof we need two more results about
infinite series which are also of interest in their own right. The first of
these deals with the natural generalization of the geometric series (4.33)
to infinite series. We leave its proof to the student.

7.37 Theorem. (i) If $x \in Re$, $|x| \ge 1$ then $\sum_{i=0}^{\infty} x^i$ diverges.
(ii) If $x \in Re$, $|x| < 1$ then $\sum_{i=0}^{\infty} x^i = 1/(1 - x)$.

7.38 Theorem. If $0 \le y_i \le x_i$ for all $i \ge 0$, where $x_i$, $y_i$ are real, and if
$\sum_{i=0}^{\infty} x_i$ converges then $\sum_{i=0}^{\infty} y_i$ converges and
$0 \le \sum_{i=0}^{\infty} y_i \le \sum_{i=0}^{\infty} x_i$.

Proof. Let $s_k = \sum_{i=0}^{k} x_i$, $t_k = \sum_{i=0}^{k} y_i$. By hypothesis, $\lim_{k\to\infty} s_k = a$
for some $a \in Re$. Since all $s_k \ge 0$, it is not possible that $a < 0$. By
7.34(iii), $\sum_{i=k+1}^{\infty} x_i$ converges and $\sum_{i=k+1}^{\infty} x_i = a - s_k$. By the
preceding, also $0 \le \sum_{i=k+1}^{\infty} x_i$; thus $s_k \le a$ for all $k$. Now $0 \le t_k \le s_k$ for
all $k$ by the relationship of the $x_i$ and $y_i$. Hence $(t_k)$ is a bounded sequence.
It easily follows from this and the fact that $t_k \le t_{k+1}$ for all $k$ that the
sequence $(t_k)$ is also convergent (cf. Exercise 9 of the preceding section).
But then by 7.29(ii)

$$\lim_{k\to\infty} s_k - \lim_{k\to\infty} t_k = \lim_{k\to\infty} (s_k - t_k) \ge 0.$$

7.39 Theorem. Suppose that $b \in P$, $b > 1$, $a \in Re$, $a > 0$ and $a \ne c/b^k$
for all $c, k \in I$. Then there is a unique sequence $(m_i)$ satisfying the
conditions of 7.36.

Proof. Suppose that we have $a = \sum_{i=0}^{\infty} m_i b^{-i} = \sum_{i=0}^{\infty} m'_i b^{-i}$ where
both $(m_i)$, $(m'_i)$ satisfy the conditions of 7.36. By 7.34(iii),

$$a - \sum_{i=0}^{k} m_i b^{-i} = \sum_{i=k+1}^{\infty} m_i b^{-i}$$

for any $k \ge 0$. Since $0 \le m_i < b$ for each $i > 0$, we have

$$\sum_{i=k+1}^{\infty} m_i b^{-i} \le \sum_{i=k+1}^{\infty} (b - 1) b^{-i}
= (b - 1) b^{-(k+1)} \sum_{i=0}^{\infty} b^{-i}
= (b - 1) b^{-(k+1)} \cdot \frac{1}{1 - 1/b} = \frac{1}{b^k},$$

using (among other facts) 7.37, 7.38. (On what grounds can we factor out
$(b - 1) b^{-(k+1)}$?) Thus for any $k \ge 0$, $a - \sum_{i=0}^{k} m_i b^{-i} \le b^{-k}$. [Note
that we verified this as (6) in the proof of 7.36, but only for the special
sequence considered there.] Now if $a - \sum_{i=0}^{k} m_i b^{-i} = b^{-k}$ for any $k$,
we would have $a$ rational and of the form $a = c/b^k$, contrary to hypothesis.
Hence $a - \sum_{i=0}^{k} m_i b^{-i} < b^{-k}$ for all $k$; of course, the same holds if we
replace $m_i$ by $m'_i$. Now suppose that there is some $k$ with $m_k \ne m'_k$;
let $n$ be the least such $k$, so that $m_i = m'_i$ for $i < n$ and, say, $m'_n < m_n$.
Then $\sum_{i=0}^{n} m'_i b^{-i} < \sum_{i=0}^{n} m_i b^{-i} \le a$, and

$$(m_n - m'_n) b^{-n} = \sum_{i=0}^{n} m_i b^{-i} - \sum_{i=0}^{n} m'_i b^{-i}
\le a - \sum_{i=0}^{n} m'_i b^{-i} < b^{-n}.$$

But then $0 < m_n - m'_n < 1$, which is impossible for distinct integers.

We leave it to the reader to verify that there are exactly two representa¬
tions of the form 7.36 for each a of the form c/bk.
The reader is probably familiar with the fact, for the base b = 10,
that every rational number has an eventually repeating or periodic repre¬
sentation. This is, in fact, a characteristic of the representation of rational
numbers in any base $b > 1$. By this we mean that $a = \sum_{i=0}^{\infty} m_i b^{-i}$ for
some $m_i$ which satisfy the conditions of 7.36, and which, for a certain
$n$ and $q$, satisfy $m_{i+q} = m_i$ for all $i \ge n$ (period $q$), if and only if $a \in Ra$.
The details of this are left for the exercises.
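One way to see the periodicity concretely is ordinary long division: there are only $d$ possible remainders when dividing by $d$, so some remainder must recur, and from then on the digits repeat. The sketch below is illustrative only; the function name `periodic_expansion` and its return shape are assumptions, not notation from the text.

```python
def periodic_expansion(c, d, b=10):
    """Long division of c/d in base b, for positive integers c and d.

    Hypothetical helper: returns (m0, preperiod_digits, period_digits).
    """
    m0, r = divmod(c, d)
    seen = {}       # remainder -> index of the digit that remainder produces
    digits = []
    while r not in seen:
        seen[r] = len(digits)
        m, r = divmod(b * r, d)   # next digit m and next remainder r, with 0 <= r < d
        digits.append(m)
    start = seen[r]               # the digits repeat from this index onward
    return m0, digits[:start], digits[start:]

print(periodic_expansion(1, 7))      # 1/7 in base 10
print(periodic_expansion(2, 3, 2))   # 2/3 in base 2
```

A terminating expansion such as that of 1/8 shows up here with the period consisting of the single digit 0.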
There are other kinds of representations of real numbers which are also
interesting. One is the so-called continued-fractions representation. These
are representations of real numbers as limits of sequences

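$$m_0, \quad m_0 + \cfrac{1}{m_1}, \quad m_0 + \cfrac{1}{m_1 + \cfrac{1}{m_2}}, \quad
m_0 + \cfrac{1}{m_1 + \cfrac{1}{m_2 + \cfrac{1}{m_3}}}, \quad \ldots$$

or, as sometimes also written,

$$m_0, \quad m_0 + \frac{1}{m_1}, \quad m_0 + \frac{1}{m_1 +}\,\frac{1}{m_2}, \quad
m_0 + \frac{1}{m_1 +}\,\frac{1}{m_2 +}\,\frac{1}{m_3}, \quad \ldots$$

so that the limit, if it exists, is denoted by

$$m_0 + \frac{1}{m_1 +}\,\frac{1}{m_2 +}\,\frac{1}{m_3 +} \cdots$$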

For example (but only for very special reasons), the following suggests
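such a representation of $\sqrt{5}$, i.e., of a solution $x$ of $x^2 - 5 = 0$. We
write $x^2 - 4 = 1$, so $(x - 2) = 1/(2 + x)$, $x = 2 + 1/(2 + x)$. Then
by successively substituting this expression for $x$ we get

$$x = 2 + \cfrac{1}{4 + \cfrac{1}{2 + x}}, \qquad
x = 2 + \cfrac{1}{4 + \cfrac{1}{4 + \cfrac{1}{2 + x}}}, \qquad \ldots$$

Of course this hardly proves that

$$\sqrt{5} = 2 + \frac{1}{4 +}\,\frac{1}{4 +}\,\frac{1}{4 +} \cdots$$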

as taken in the above sense, but the statement is indeed correct. Again
we leave to the reader the study of some of the introductory ideas dealing
with continued fractions.
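As a numerical illustration (not a proof) of the claim about $\sqrt{5}$, one can evaluate the finite continued fractions with $m_0 = 2$ and $m_i = 4$ exactly and watch their squares approach 5. The helper name `convergent` is hypothetical.

```python
from fractions import Fraction

def convergent(terms):
    """Evaluate m0 + 1/(m1 + 1/(... + 1/mk)) exactly, for terms = [m0, ..., mk]."""
    value = Fraction(terms[-1])
    for m in reversed(terms[:-1]):
        value = m + 1 / value      # work outward from the innermost fraction
    return value

for k in range(1, 6):
    x = convergent([2] + [4] * k)
    print(k, x, float(x * x - 5))  # the last column should shrink toward 0
```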

The subject of representations has only been touched on here because
of its general interest, and not because of any particular relevance to the
main problems of algebra or analysis. (However, continued fractions do
play a very useful role in number theory, especially in connection with the
solution of certain quadratic equations in integers.) In elementary mathe¬
matics courses, it is customary for students to “calculate” with real
numbers, for example, to add or multiply them, by using some initial
portions of their decimal representations. The significance of such cal¬
culations can be explained precisely in terms of closeness of approximation.
However, when we leave the area of particular calculations it is the
characterization of the real number system as a continuously ordered
field and the facts directly attendant on this, such as that every real
number is the limit of some sequence of rationals, that are of main use.

Power series. In particular, infinite series play their main role in analysis
as a means of representing and investigating a large class of functions.
Algebraically, the simplest functions (of one argument) to deal with are
the polynomial functions (5.1, 5.9). It is natural to generalize these, in
the real numbers, to functions definable in the form $F(x) = \sum_{i=0}^{\infty} a_i x^i$ or,
as is usually said, by means of a power series. However, in contrast to
polynomial functions, such a representation need not be meaningful for
all values of $x$, simply because the series need not converge for every value
of $x$. For example, we have already seen in 7.37 that the power series
$\sum_{i=0}^{\infty} x^i$ converges if and only if $|x| < 1$. In the cases that it converges
it has the same value as the function $F(x) = 1/(1 - x)$. But the behaviors
of the function and the series are otherwise quite different, for the function
is defined for all values of $x \ne 1$. (Recall the remarks concerning an
uncritical use of the relationship between this series and function which
we made in Chapter 1.)
The following theorem will provide us with a condition for testing the
convergence of a power series which will be sufficient for our purposes,
though stronger results can be obtained. (We leave the proof of one of
these to the exercises.)

7.40 Theorem. Suppose that $(b_k)$ is a sequence of nonzero real numbers
such that $\lim_{k\to\infty} |b_{k+1}/b_k|$ exists. We denote this limit by $c$. Then:
(i) $\sum_{i=0}^{\infty} b_i$ converges if $c < 1$, and
(ii) $\sum_{i=0}^{\infty} b_i$ diverges if $c > 1$.

Proof. In (i), $0 \le c$. Choose $d$ with $c < d < 1$ and then let $\epsilon = d - c$.
By hypothesis there is an $m$ such that $\bigl| |b_{k+1}/b_k| - c \bigr| < \epsilon$ for all $k \ge m$;
hence $|b_{k+1}/b_k| < d$ for all $k \ge m$. Then $|b_{k+1}| < d|b_k|$ for all such $k$.
Thus $|b_{m+1}| < d|b_m|$, $|b_{m+2}| < d|b_{m+1}| < d^2|b_m|$, and we see in general,
by induction on $i$, that

(1) $|b_{m+i}| \le d^i |b_m|$.

Now the series $\sum_{i=0}^{\infty} d^i$ converges by 7.37 since $|d| < 1$. Hence so does
the series $|b_m| \sum_{i=0}^{\infty} d^i = \sum_{i=0}^{\infty} |b_m| d^i$. It follows by 7.38 that
$\sum_{i=m}^{\infty} |b_i| = \sum_{i=0}^{\infty} |b_{m+i}|$ also converges. Hence we see that
$\sum_{i=0}^{\infty} |b_i|$ converges, so that $\sum_{i=0}^{\infty} b_i$ converges by 7.34(iv).
To prove (ii) we find $m$ such that $|b_{k+1}/b_k| > 1$ for all $k \ge m$. But
then it is seen that $|b_{m+i}| \ge |b_m|$ for all $i$. Thus for $\epsilon = |b_m|$ we have
$\epsilon > 0$ and $|b_k| \ge \epsilon$ for all $k \ge m$. But if $\sum_{i=0}^{\infty} b_i$ converged we should
have $\lim_{k\to\infty} b_k = 0$ by 7.34(ii), in contradiction to the preceding.
It is seen from the proof that when $c < 1$, we have the (in general)
stronger result that $\sum_{i=0}^{\infty} |b_i|$ converges. The reason is simply that

$$\left| \frac{|b_{k+1}|}{|b_k|} \right| = \left| \frac{b_{k+1}}{b_k} \right|.$$

Note that the test gives no information when $c = 1$. There are many
examples of series $\sum_{i=0}^{\infty} b_i$ with $\lim_{k\to\infty} |b_{k+1}/b_k| = 1$, some of which
converge while others diverge. For example, $\sum_{i=0}^{\infty} 1$ diverges, and it can
be shown that $\sum_{i=1}^{\infty} (1/i)$ also diverges while $\sum_{i=1}^{\infty} (1/i^2)$ converges.

7.41 Corollary. Suppose that $(a_k)$ is a sequence of nonzero real numbers
such that $\lim_{k\to\infty} |a_{k+1}/a_k|$ exists. We denote this limit by $d$. Then:
(i) if $d = 0$, $\sum_{i=0}^{\infty} a_i x^i$ converges for every $x \in Re$;
(ii) if $d \ne 0$ and we put $r = 1/d$ then $\sum_{i=0}^{\infty} a_i x^i$ converges for $|x| < r$
while it diverges for $|x| > r$.

Proof. For each fixed $x$ we apply 7.40 to the series $\sum_{i=0}^{\infty} b_i$ where
$b_i = a_i x^i$. Then $\lim_{k\to\infty} |b_{k+1}/b_k| = |x| \lim_{k\to\infty} |a_{k+1}/a_k| = |x| d$. Thus
we use 7.40 with $c = |x| d$. Then certainly $c < 1$ if $d = 0$, so that
$\sum_{i=0}^{\infty} a_i x^i$ converges no matter what $x$ is. If $d \ne 0$ we have convergence
if $c < 1$, that is, $|x| < 1/d$, and divergence if $c > 1$, that is, $|x| > 1/d$.
Again, for $d \ne 0$, we can obtain examples of power series in which all
possible combinations of convergence or divergence are realized for $x = \pm r$.
The number $r$ in (ii) is usually called the radius of convergence of the power
series. Very often we say, instead of (i), that the power series has an
infinite radius of convergence and write $r = \infty$.
The reader is familiar with power series of the form $\sum_{i=0}^{\infty} (a_i/i!)\,x^i$ from
calculus, and with the particular series associated with various functions
such as $e^x$, $\sin x$, $\cos x$ by means of Taylor's theorem. However, it is usual
at such a level that various properties of these functions are (often
implicitly) assumed. For example, use of the function $e^x$ is an instance of
performing the general operation of exponentiation $y^x$ on real numbers
$x$, $y$. It is assumed that this has the usual properties of exponentiation,
such as $y^{x_1} y^{x_2} = y^{x_1 + x_2}$. We have not yet defined such an operation here
and shown how such a property can be verified. We shall describe how
this can be done at the end of the next section, by means of a general
theorem which shows how various functions defined on the rationals can
be extended in a natural way to the real numbers and, moreover, in such
a way that many properties of the original functions continue to hold for
their extensions. At any rate, even though one generally obtains the
expected results, it is by no means a trivial matter.
An adequate treatment of the trigonometric functions involves even
greater difficulties. The usual treatment of these functions is in terms of
the geometric notion of angle and implicitly involves a number of assump¬
tions about this notion which are rarely articulated in beginning or even
more advanced calculus courses. A precise statement and the verifica¬
tion of these properties, starting from a nongeometric framework, seem
to demand some material from the calculus on either differentiation or
integration. We shall describe such a treatment in Appendix II, and draw
further consequences in Chapter 8.

The exponential function. One sophisticated approach to handling the
exponential, logarithmic, and trigonometric functions is to take as
definitions of these functions the power series expansions which one is led to,
presuming intuitively expected properties of these functions. Then one
shows that the functions, as so defined, do indeed have these properties.
We shall conclude this section with some results showing how one might
pursue the exponential function $e^x$ in this way. Here the expected power
series expansion is $\sum_{i=0}^{\infty} (x^i/i!)$.

7.42 Lemma. $\sum_{i=0}^{\infty} (x^i/i!)$ converges for every real $x$.

Proof.

$$\lim_{k\to\infty} \frac{1/(k+1)!}{1/k!} = \lim_{k\to\infty} \frac{1}{k+1} = 0.$$

Thus we can apply 7.41(i).

7.43 Definition. $E$ is taken to be the function with domain $Re$ such that
$E(x) = \sum_{i=0}^{\infty} (x^i/i!)$ for all $x$. We define $e = E(1)$.
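To get a feeling for the definition, the partial sums $\sum_{i=0}^{k} x^i/i!$ can be computed exactly for rational $x$. This is only a sketch: the function name `exp_partial_sum` is hypothetical, and the comparison with the floating-point constant `math.e` is a sanity check, not part of the development.

```python
import math
from fractions import Fraction

def exp_partial_sum(x, k):
    """Return sum_{i=0}^{k} x^i / i! exactly, for a rational x."""
    x = Fraction(x)
    term = Fraction(1)               # x^0 / 0!
    total = Fraction(0)
    for i in range(k + 1):
        total += term
        term = term * x / (i + 1)    # turn x^i/i! into x^(i+1)/(i+1)!
    return total

print(exp_partial_sum(1, 4))                     # 1 + 1 + 1/2 + 1/6 + 1/24
print(float(exp_partial_sum(1, 20)) - math.e)    # should be very close to 0
```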

One of the main properties we expect of $E$ is that $E(x_1) \cdot E(x_2) =
E(x_1 + x_2)$ for all $x_1$, $x_2$. To verify this, we must somehow multiply the
two series $\sum_{i=0}^{\infty} (x_1^i/i!)$, $\sum_{i=0}^{\infty} (x_2^i/i!)$ to give a single series in powers of
$x_1 + x_2$. The clue here of how to do this is by a generalization of the
multiplication of polynomials as given in 5.4(ii). Of course in that case we were dealing with essentially
finite (eventually zero) sequences, and no questions of convergence were
involved. Examples can be given to show that the corresponding operation
on infinite series does not always lead from convergent series to convergent
series. However we can obtain the following result.

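7.44 Theorem. Suppose that $\sum_{i=0}^{\infty} |a_i|$ and $\sum_{i=0}^{\infty} b_i$ converge. Then so
also does $\sum_{i=0}^{\infty} \bigl(\sum_{j=0}^{i} a_j b_{i-j}\bigr)$ and we have

$$\Bigl(\sum_{i=0}^{\infty} a_i\Bigr)\Bigl(\sum_{i=0}^{\infty} b_i\Bigr)
= \sum_{i=0}^{\infty} \Bigl(\sum_{j=0}^{i} a_j b_{i-j}\Bigr).$$

Proof. Let $s_k = \sum_{i=0}^{k} a_i$, $t_k = \sum_{i=0}^{k} b_i$, $u_k = \sum_{i=0}^{k} \bigl(\sum_{j=0}^{i} a_j b_{i-j}\bigr)$.
Further, let $\bar{a} = \sum_{i=0}^{\infty} |a_i|$, $a = \sum_{i=0}^{\infty} a_i$ [using 7.34(iv)], and $b = \sum_{i=0}^{\infty} b_i$.
We can assume $\bar{a} > 0$, for otherwise $a_i = 0$ for all $i$ and the desired
conclusion is obvious. The conclusion of the theorem can now be stated as

(1) $\lim_{k\to\infty} u_k = ab$.

To prove this, we rewrite $u_k$ as follows:

$$\begin{aligned}
u_k &= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots + (a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0) \\
&= a_0 (b_0 + b_1 + \cdots + b_k) + a_1 (b_0 + b_1 + \cdots + b_{k-1}) + \cdots + a_k b_0 \\
&= a_0 t_k + a_1 t_{k-1} + \cdots + a_k t_0.
\end{aligned}$$

For comparison with $ab$, or certain approximations of it, we set $d_k =
b - t_k$, so $t_k = b - d_k$. Then we have

(2) $u_k = s_k b - (a_0 d_k + a_1 d_{k-1} + \cdots + a_k d_0)$.

Since $\lim_{k\to\infty} s_k = a$, to prove (1) it is sufficient now to show that

(3) $\lim_{k\to\infty} (a_0 d_k + a_1 d_{k-1} + \cdots + a_k d_0) = 0$.

We wish to prove this result by using the facts that $\lim_{k\to\infty} d_k = 0$ and
$\lim_{k\to\infty} \bigl(\sum_{i=0}^{k} |a_i|\bigr) = \bar{a}$. Suppose we are given an $\epsilon > 0$. We know that
the sequence of $|d_k|$ is bounded, say $|d_k| \le d$ for all $k$, where $d > 0$.
Choose $m$ so that $|d_k| < \epsilon/2\bar{a}$ for all $k \ge m$. We can also choose $m$ large
enough to satisfy $\sum_{i=l}^{k} |a_i| < \epsilon/2d$ for all $k \ge l \ge m$. Then for any
$k \ge 2m$,

$$\begin{aligned}
\Bigl| \sum_{i=0}^{k} a_i d_{k-i} \Bigr|
&\le \sum_{i=0}^{m} |a_i d_{k-i}| + \sum_{i=m+1}^{k} |a_i d_{k-i}| \\
&\le \frac{\epsilon}{2\bar{a}} \sum_{i=0}^{m} |a_i| + d \sum_{i=m+1}^{k} |a_i| \\
&< \frac{\epsilon}{2\bar{a}} \cdot \bar{a} + d \cdot \frac{\epsilon}{2d} = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}$$

From this we can conclude that (3), and hence the theorem, is proved.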
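The product series of 7.44 can be formed term by term on finite truncations. The sketch below (with hypothetical helper names `cauchy_product` and `exp_terms`) computes the terms $\sum_{j=0}^{i} a_j b_{i-j}$ exactly and checks, for the series defining $E$, that they coincide with the terms $(x_1 + x_2)^i/i!$. It works only with finitely many terms and so says nothing about convergence.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(a, b):
    """Return [c_0, ..., c_{n-1}] with c_i = sum_{j=0}^{i} a[j] * b[i-j]."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n)]

def exp_terms(x, n):
    """First n terms x^i / i! of the series defining E(x), as exact Fractions."""
    return [Fraction(x) ** i / factorial(i) for i in range(n)]

x1, x2 = Fraction(1, 2), Fraction(1, 3)
product_terms = cauchy_product(exp_terms(x1, 8), exp_terms(x2, 8))
print(product_terms == exp_terms(x1 + x2, 8))   # term-by-term agreement
```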

7.45 Theorem. Suppose that $x, y \in Re$. Then:
(i) $E(0) = 1$;
(ii) $E(x) \cdot E(y) = E(x + y)$;
(iii) $E(-x) = 1/E(x)$;
(iv) if $x < y$ then $E(x) < E(y)$;
(v) $0 < E(x)$;
(vi) for any $n \in I$, $E(n) = e^n$.

Proof. (i) is obvious by definition of $E$. Since the series defining $E(x)$
converges for all $x$, in particular for any $|x|$, we can apply 7.44 to prove
(ii). Thus

$$E(x) \cdot E(y) = \sum_{i=0}^{\infty} \Bigl( \sum_{j=0}^{i} \frac{x^j y^{i-j}}{j!\,(i-j)!} \Bigr)
= \sum_{i=0}^{\infty} \frac{1}{i!} \Bigl( \sum_{j=0}^{i} \frac{i!}{j!\,(i-j)!}\, x^j y^{i-j} \Bigr).$$

But by the binomial expansion 4.36 and by 4.35,

$$\sum_{j=0}^{i} \frac{i!}{j!\,(i-j)!}\, x^j y^{i-j} = (x + y)^i,$$

so that here the right-hand side is just $E(x + y)$. We leave the proofs of
the remaining parts to the reader.

Exercise Group 7.3

1. Show that the following two conditions are equivalent for $a$, $b$ with
$a \in Re$, $a > 0$, and $b \in P$, $b > 1$:
(i) $a = c/d$ for some $c, d \in I$, with $(c, d) = 1$, such that whenever $p$ is
prime and $p \mid d$ then $p \mid b$;
(ii) $a = e/b^k$ for some $e, k \in I$.
2. Prove that each real number of the form $c/b^k$, $c, k \in I$, has exactly two
representations of the form 7.36.
3. Prove Theorem 7.37(i), (ii).
4. (i) Show that if a real number $a$ has a periodic representation to base $b$,
that is, $a = \sum_{i=0}^{\infty} m_i b^{-i}$ where for some $n$, $q$, $m_{i+q} = m_i$ for all $i \ge n$,
then $a \in Ra$. [Hint: Consider $(b^q a) - a$.]
(ii) Conversely, to show that every positive rational number $a = c/d$
($c, d \in P$) has a periodic representation to the base $b$, let

$c = m_0 d + r_0$, $0 \le r_0 < d$,
$b r_0 = m_1 d + r_1$, $0 \le r_1 < d$,

and in general for any $i > 0$,

$b r_{i-1} = m_i d + r_i$, $0 \le r_i < d$.

Then $a = \sum_{i=0}^{\infty} m_i b^{-i}$ is such a representation. (Compare this procedure
with usual longhand decimal division.)
5. If $a$ is a real number, $a > 0$ and $a \notin P$, there are unique $m_0 \in P$ and
$a_0 \in Re$ with $a_0 > 0$ and $a = m_0 + 1/a_0$. (Why?) If $a$ is irrational,
so also is $a_0$. If $a$ is rational, $a = c/d$, $c, d \in P$, and $c = qd + r$, $0 <
r < d$, then $m_0 = q$ and $a_0 = d/r$. Use these facts to prove that every
rational number can be represented as a finite continued fraction,

$$a = m_0 + \frac{1}{m_1 +}\,\frac{1}{m_2 +} \cdots \frac{1}{m_k}.$$

What would be involved in showing that every irrational number $a$ can
be represented as an infinite continued fraction,

$$a = m_0 + \frac{1}{m_1 +}\,\frac{1}{m_2 +} \cdots \frac{1}{m_k +} \cdots \; ?$$

Show that there exists a real number so represented with $m_0 = 2$,
$m_i = 4$ for $i > 0$.
6. Prove Theorem 7.45(iii)-(vi).

7.4 Polynomials and continuous functions on the real numbers. We
are now in a position to handle the algebraic problems which we considered
as one of the motivations for introducing the real numbers, namely those
concerning the existence (or nonexistence) of roots of polynomials $f(\xi)$
with rational coefficients. We should expect that a considerably greater
number of such polynomials have roots in real numbers than in the
rationals. More generally, we should also consider the same questions for
arbitrary $f(\xi) \in Re[\xi]$.
Although these questions can be treated without any essential use of
analysis, the basic and most useful fact in this connection is that every
polynomial function $f(x)$ with real coefficients is a continuous function.
Intuitively, this means that when “graphed,” i.e., when we consider the
set of points $(x, f(x))$ in $Re \times Re$, the result is an “unbroken curve.”
Then if any such curve is above the horizontal axis at one point, below at
another, it should cross the axis somewhere between the two points. More
precisely: if for some $a$, $b$, $f(a) > 0$ and $f(b) < 0$, there should be some $x$
between $a$ and $b$ with $f(x) = 0$. Of course, to the uninstructed eye, this
looks as though it should already occur when we consider the graph of
$f$ in $Ra \times Ra$. However, as we have realized, there are (despite the
density of $Ra$) “gaps” in $Ra$ through which such a curve could pass.
On the other hand, all such “gaps” are filled in Re, at least when it is re¬
garded as a continuously ordered system. We turn now to a discussion of
the general notion of continuous function (on Re) and to a verification
of the above facts.

7.46 Definition. Suppose that $F$ is a unary function with $\mathcal{D}(F) = Re$,
$\mathcal{R}(F) \subseteq Re$, and suppose that $a \in Re$.
(i) We say $F$ is continuous at $a$ if for any $\epsilon > 0$ there exists a $\delta > 0$
such that whenever $|x - a| < \delta$ then $|F(x) - F(a)| < \epsilon$.
(ii) We say F is continuous (on Re) if for each real number a, F is
continuous at a.

More general concepts of continuity are used extensively in analysis,
but the above is sufficient for our purposes and most purposes of algebra.
We begin by investigating some properties held by all continuous functions.
The defining condition (i) is reminiscent of the defining condition 7.21
for limits of sequences. These can in fact be brought together by the follow¬
ing, whose proof we leave to the reader.

7.47 Lemma. Suppose that $F$ is continuous and that $(x_k)$ is a convergent
sequence of real numbers with $\lim_{k\to\infty} x_k = a$. Then $(F(x_k))$ is also
a convergent sequence and $\lim_{k\to\infty} F(x_k) = F(a)$.

Weierstrass Nullstellensatz. The basic theorem to be applied to obtain
roots of polynomials is the following, known as Weierstrass’ Nullstellensatz
(“zeros theorem”).

7.48 Theorem. Suppose that $F$ is a continuous function and that $a$, $b$ are
real numbers with $F(a) < 0$ and $F(b) > 0$. Then there exists at least
one real number $c$ between $a$ and $b$ such that $F(c) = 0$.

Proof. Suppose that a < b. [The proof for a > b proceeds similarly
or can be obtained from the present proof for a < b, using the continuous
function G(x) = —F(x).] We have in mind the graph of F:

It is not excluded that F may have several roots between a and b. We shall
prove the existence of the rightmost root. We define

$$A = \{x : a \le x \le b \text{ and } F(x) < 0\}$$

and then let $c = \sup A$. Then $\sup A$ is well defined since $a \in A$ and $b$
is certainly a bound for $A$. Clearly $a \le c \le b$. If $F(c) = 0$ we are
through. Suppose next that $F(c) > 0$. Let $\epsilon = F(c)$. Then by the
continuity of $F$ at $c$, we can find $\delta > 0$ such that if $|x - c| < \delta$ then
$|F(x) - F(c)| < \epsilon$. Since $c = \sup A$, there exists at least one $x \in A$ with $|x - c| < \delta$
by 7.18(ii). For such $x$, $F(x) < 0$, so that $|F(x) - F(c)| = (F(c) -
F(x)) > F(c) = \epsilon$, contradicting the preceding. Suppose finally that
$F(c) < 0$. Let $\epsilon = |F(c)|$. Again we find $\delta > 0$ such that $|F(x) -
F(c)| < \epsilon$ whenever $|x - c| < \delta$. In this case, consider any $x$ with
$c < x < c + \delta$. Then $x \notin A$ by $c = \sup A$. Hence $F(x) \ge 0$. But then

$$|F(x) - F(c)| = F(x) + |F(c)| \ge |F(c)| = \epsilon,$$

which is again a contradiction. Thus the only possibility is that F(c) = 0.


In the same spirit as this theorem, the next general result that we obtain
for continuous functions concerns the existence of maxima and minima.
A corresponding result for certain functions of complex numbers will
play an important role in the next chapter. We need the following pre¬
paratory theorem.

7.49 Theorem. Suppose that $F$ is continuous and that $a$, $b$ are real numbers
with $a < b$. Let $A = \{F(x) : a \le x \le b\}$; in other words, $A$ is
the range of $F$ restricted to $\{x : a \le x \le b\}$. Then $A$ is bounded
above and below.

Proof. Suppose that $A$ is not bounded above. Then for each $n \in P$
there exists an element of $A$ larger than $n$, i.e., there exists $x$ with $a \le
x \le b$ and $n < F(x)$. We can choose for each $n$ a definite $x_n$ such that
$a \le x_n \le b$ and $n < F(x_n)$ (using the axiom of choice, which, however,
can be avoided by slightly finer considerations). Then $(x_n)$ is a bounded
sequence, so that by the Bolzano-Weierstrass theorem 7.26, it contains a
convergent subsequence $(y_k) = (x_{n_k})$, $n_0 < n_1 < \cdots < n_k < \cdots$. Then
by 7.47 we should have $(F(y_k))$ a convergent sequence. However, it is
easily seen that this contradicts $n_k < F(y_k)$ for all $k$. Thus $A$ must be
bounded above. The proof that $A$ is bounded below is similar.

7.50 Theorem. Suppose that $F$ is continuous and that $a$, $b$ are real numbers
with $a < b$. Then:
(i) there exists at least one number $c$ such that $a \le c \le b$ and
$F(x) \le F(c)$ for all $x$ with $a \le x \le b$, and
(ii) there exists at least one number $c$ such that $a \le c \le b$ and
$F(c) \le F(x)$ for all $x$ with $a \le x \le b$.

Proof. (i) Let $A = \{F(x) : a \le x \le b\}$. Since $A$ is bounded above
by the preceding theorem, it has a least upper bound, call it $d$. Then by
7.18(ii), for each $n \in P$ we can find an $x_n$ such that $a \le x_n \le b$ and
$d - F(x_n) < 1/n$. We apply the Bolzano-Weierstrass theorem again to
find a convergent subsequence $(y_k) = (x_{n_k})$, $n_0 < n_1 < \cdots < n_k < \cdots$,
of the sequence $(x_n)$. Let $\lim_{k\to\infty} y_k = c$. Then $\lim_{k\to\infty} F(y_k) = F(c)$
by 7.47. Clearly, $a \le c \le b$. We show now that $d = F(c)$. Given any
$\epsilon > 0$ we can find $n \in P$ with $1/n < \epsilon/2$ by the Archimedean property
7.12(ii). Then if $k$ is any integer with $n \le n_k$, we have $|d - F(y_k)| =
d - F(x_{n_k}) < 1/n_k \le 1/n < \epsilon/2$. Since $\lim_{k\to\infty} F(y_k) = F(c)$, we know
that there is an $m$ with $|F(y_k) - F(c)| < \epsilon/2$ for all $k \ge m$. Hence by
choosing large enough $k$ we obtain

$$|d - F(c)| \le |d - F(y_k)| + |F(y_k) - F(c)| < \epsilon.$$

Since this is true for any $\epsilon > 0$, it follows that $d = F(c)$ and hence $F(c)$
is the supremum of $A$. The proof of (ii) is quite similar.
We say of (i) that $F$ attains or takes on its absolute maximum at $c$, with
respect to the interval $a \le x \le b$. Similarly in (ii) we say that $F$ attains
its absolute minimum at $c$ for this interval.
As they stand, 7.48 and 7.50 are pure existence results. That is, no
statement is made as to how to locate the numbers c with the given
properties, say by means of some fundamental sequence converging to c.
We shall have somewhat more to say about this question farther on in
this section.

Real polynomials and their roots. We now turn to applying these results
to polynomials. This is done by means of the corresponding functions,
which we shall show to be continuous via the next theorem. Its statement
and proof are related to 7.29.

7.51 Theorem. Suppose that $c \in Re$, and that $G$, $H$ are continuous
functions. Then the function $F$ defined by any one of the following conditions,
for all $x \in Re$, is continuous:
(i) $F(x) = c$;
(ii) $F(x) = x$;
(iii) $F(x) = G(x) + H(x)$;
(iv) $F(x) = G(x) \cdot H(x)$.

Proof. (i) and (ii) are trivial. In (iii)-(iv) we consider any real number $a$.
(iii) Given $\epsilon > 0$, we can find $\delta_1 > 0$, $\delta_2 > 0$ such that

if $|x - a| < \delta_1$ then $|G(x) - G(a)| < \epsilon/2$,
if $|x - a| < \delta_2$ then $|H(x) - H(a)| < \epsilon/2$.

Let $\delta$ be the smaller of $\delta_1$, $\delta_2$. From

$$|F(x) - F(a)| = |(G(x) - G(a)) + (H(x) - H(a))| \le |G(x) - G(a)| + |H(x) - H(a)|,$$

we conclude that $|F(x) - F(a)| < \epsilon$ whenever $|x - a| < \delta$.
(iv) Here we write

$$\begin{aligned}
|F(x) - F(a)| &= |G(x)H(x) - G(a)H(a)| \\
&= |(G(x) - G(a))H(x) + G(a)(H(x) - H(a))| \\
&\le |G(x) - G(a)|\,|H(x)| + |G(a)|\,|H(x) - H(a)|.
\end{aligned}$$

Let $M_1 = \max(|G(a)|, 1)$. Then $M_1 > 0$. We can find $\delta_2 > 0$ such that

if $|x - a| < \delta_2$ then $|H(x) - H(a)| < \epsilon/(2M_1)$.

By 7.49 the values $H(x)$ for $|x - a| < \delta_2$ are bounded from above and
below so that we can find an $M_2$ such that

if $|x - a| < \delta_2$ then $|H(x)| < M_2$.

Now we can find $\delta_1 > 0$ such that

if $|x - a| < \delta_1$ then $|G(x) - G(a)| < \epsilon/(2M_2)$.

Let $\delta$ be the smaller of $\delta_1$, $\delta_2$. Thus whenever $|x - a| < \delta$ we also have
$|H(x)| < M_2$ and hence $|G(x) - G(a)|\,|H(x)| < \epsilon/2$ and $|G(a)|\,|H(x) -
H(a)| < \epsilon/2$, so that, finally, $|F(x) - F(a)| < \epsilon$.
We could also obtain continuity of $F(x) = G(x)/H(x)$, provided
$H(x) \ne 0$ for all $x$. However, this is not needed here.

7.52 Corollary. If $f(\xi) \in Re[\xi]$, that is, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where each
$a_i$ is real, then the associated polynomial function $f(x) = \sum_{i=0}^{n} a_i x^i$
is continuous.

Proof. By induction on $i$ from 7.51(ii), (iv) we see that each of the
functions $g_i(x) = x^i$ is continuous; then so is $f_i(x) = a_i x^i$ by 7.51(i), (iv).
Then the result here follows by induction on $n$ using 7.51(iii).
This theorem suggests that also functions defined by power series,
$F(x) = \sum_{i=0}^{\infty} a_i x^i$, should be continuous (at all points for which they are
defined). However, 7.51 cannot be applied directly to obtain this result,
and some additional considerations are needed. These are pursued in the
exercises (3, 4 below). In particular, it will be seen that the function E
defined in 7.43 is continuous.
In order to apply 7.48 and 7.52, we next prove the following inequalities.

7.53 Theorem. Suppose that $f(\xi) \in Re[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where
$n > 0$ and $a_n = 1$. Let $M$ be the larger of $1$, $\sum_{i=0}^{n-1} |a_i|$. Then:
(i) if $x > M$ then $f(x) > 0$ and if $x < -M$ then $(-1)^n f(x) > 0$;
(ii) if $f(x) = 0$ then $|x| \le M$;
(iii) if $n$ is odd and $x < -M$ then $f(x) < 0$.

Proof. Using the fact that $a \ge -|a|$ we see that for any $x \ge 0$

(1) $f(x) \ge x^n - (|a_0| + |a_1| x + \cdots + |a_{n-1}| x^{n-1})$.

Now if $x > M$ then certainly $x \ge 1$ so that for any $i$, $x^i \le x^{i+1}$. Hence
if $x > M$ and $0 \le i \le j \le n$ then $x^i \le x^j$. It follows that if $x > M$ then

$$|a_0| + |a_1| x + \cdots + |a_{n-1}| x^{n-1}
\le (|a_0| + |a_1| + \cdots + |a_{n-1}|)\, x^{n-1} \le M x^{n-1}.$$

Thus

(2) if $x > M$ then $f(x) \ge x^n - M x^{n-1} = x^{n-1}(x - M) > 0$,

proving the first part of (i). To prove the second part we write

$$(-1)^n f(x) = \sum_{i=0}^{n} (-1)^{n-i} a_i (-x)^i = \sum_{i=0}^{n} b_i y^i,$$

where $b_i = (-1)^{n-i} a_i$ and $y = -x$. Then we obtain, as in (1), from
$|b_i| = |a_i|$,

(3) $(-1)^n f(x) \ge y^n - (|a_0| + |a_1| y + \cdots + |a_{n-1}| y^{n-1})$, for $y = -x$.

Now if $x < -M$ then $y > M$, and we see as before that

$$|a_0| + |a_1| y + \cdots + |a_{n-1}| y^{n-1} \le M y^{n-1},$$

and hence

(4) if $x < -M$ then $(-1)^n f(x) \ge y^n - M y^{n-1} > 0$, for $y = -x$.

(ii) and (iii) follow immediately from (i).


As examples of applications of 7.53(i), consider $f_0(\xi) = \xi^4 - 3\xi + 2$,
$f_1(\xi) = \xi^5 + 3\xi^4 - 2$. Then both $f_0(x) > 0$ and $f_1(x) > 0$ for $x > 5$;
$f_0(x) > 0$ for $x < -5$ while $f_1(x) < 0$ for $x < -5$. It follows from the
next theorem that $f_1(\xi)$ has at least one real root between $-5$ and $5$; the
only related conclusion we can make about $f_0(\xi)$ is that if it has any real
root $x$ then $-5 \le x \le 5$. To apply 7.53 to arbitrary $f(\xi)$, we first divide
$f(\xi)$ by the leading coefficient $a_n$. The behavior of $f(x)$ will then be
determined for $|x| > M$, where $M$ is the larger of $1$ and $\sum_{i=0}^{n-1} |a_i/a_n|$, by the
sign of $a_n$.
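The bound $M$ is easy to compute from the coefficients. The following sketch uses hypothetical helper names `root_bound` and `horner`; it takes a coefficient list $[a_0, \ldots, a_n]$ with $a_n \ne 0$, divides through by $a_n$ as described above, and then spot-checks the signs claimed for $f_0$ and $f_1$ at one point on each side of the bound.

```python
def root_bound(coeffs):
    """coeffs = [a_0, ..., a_n] with a_n != 0; return max(1, sum of |a_i/a_n| for i < n)."""
    an = coeffs[-1]
    return max(1, sum(abs(c / an) for c in coeffs[:-1]))

def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

f0 = [2, -3, 0, 0, 1]        # xi^4 - 3 xi + 2
f1 = [-2, 0, 0, 0, 3, 1]     # xi^5 + 3 xi^4 - 2

print(root_bound(f0), root_bound(f1))   # both bounds come out as 5
print(horner(f0, 6), horner(f0, -6))    # both positive (even degree)
print(horner(f1, 6), horner(f1, -6))    # positive, then negative (odd degree)
```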

7.54 Corollary. Suppose that $f(\xi) \in Re[\xi]$ is of degree $n$, where $n$ is odd.
Then $f(\xi)$ has at least one real root. More particularly, if $f(\xi) =
\sum_{i=0}^{n} a_i \xi^i$ where $a_n \ne 0$, and $M$ is the larger of $1$, $\sum_{i=0}^{n-1} |a_i/a_n|$,
then $f(x) = 0$ for some $x$ with $|x| \le M$.

7.55 Theorem. Suppose that $n \in P$ and $a > 0$. Then there exists a unique
real number $x$ such that $x > 0$ and $x^n = a$.

Proof. Consider $f(\xi) = \xi^n - a$. Then $f(0) < 0$ and $f(b) > 0$ for any
$b \ge a + 1$. Hence there is at least one $x > 0$ with $x^n - a = 0$. Suppose
that $x^n = a$ and $y^n = a$ where $0 < x$, $0 < y$. If $x < y$ then (by 4.15)
$x^n < y^n$, which is contradictory. Similarly we cannot have $y < x$, so
$x = y$.
In general, it can be seen that for a > 0, xn = a has exactly two
solutions x if n is even, while it has just one solution x if n is odd. If a = 0,
it has only x = 0 as solution. If a < 0 it has no solutions if n is even, and
exactly one solution if n is odd.
Comparison of these results with those of 6.20 and 6.21 already indicates
the substantial advantage which the real numbers give us in determining
the existence of roots of polynomial equations. We do not yet have enough
information to obtain a complete description of the prime polynomials
with real coefficients, and thus a complete analysis of arbitrary polynomials
with real coefficients as in 6.35. For this purpose it seems necessary to
make use of the complex numbers, which we shall do in the next chapter.
We can, however, completely settle the cases of polynomials of degree
2 or 3, since for these it is sufficient by 6.37 to test for the existence of
roots. Indeed, since every polynomial of degree 3 has at least one real
root by 7.54, no such polynomial is prime. The test for polynomials of
degree 2 is by means of the usual quadratic formula. To express this here,
we now introduce a symbol for square root and, more generally, nth root,
which is possible by 7.55.

7.56 Definition. Suppose that $a \ge 0$ and $n \in P$. By $\sqrt[n]{a}$, or $a^{1/n}$, we
mean the unique $x \ge 0$ with $x^n = a$. We call this the (nonnegative)
nth root of $a$.

We shall have more to say at the end of this section about the relation
of these kinds of powers to the ones so far treated. As usual, we write
$\sqrt{a}$ for $\sqrt[2]{a}$. Note that whenever $n$ is even, $a^n \ge 0$ no matter what the
sign of $a$ is, and hence $\sqrt[n]{a^n}$ is defined; we have $\sqrt[n]{a^n} = |a|$.

7.57 Theorem. Suppose that $a, b, c \in Re$, $a \ne 0$. Then
(i) $a\xi^2 + b\xi + c$ has a real root if and only if $b^2 - 4ac \ge 0$, and
hence is prime in $Re[\xi]$ if and only if $b^2 - 4ac < 0$;
(ii) if $b^2 - 4ac = 0$, $a\xi^2 + b\xi + c$ has the unique root $x = -b/2a$
and we have $a\xi^2 + b\xi + c = a(\xi - x)^2$;
(iii) if $b^2 - 4ac > 0$, $a\xi^2 + b\xi + c$ has the two roots

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a},$$

so that $a\xi^2 + b\xi + c = a(\xi - x_1)(\xi - x_2)$.

Proof. We proceed by the usual method of “completing the square.”
For any $x$, $ax^2 + bx + c = 0$ is equivalent to $x^2 + (b/a)x = -(c/a)$,
and hence to

$$x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a}, \quad \text{that is,} \quad
\Bigl(x + \frac{b}{2a}\Bigr)^2 = \frac{b^2 - 4ac}{4a^2}.$$

Since $(x + b/2a)^2 \ge 0$, $4a^2 > 0$, if the polynomial has any roots we must
have $b^2 - 4ac \ge 0$. Conversely, if $b^2 - 4ac \ge 0$, we can exhibit the
roots as in (ii) and (iii).
As we shall see from our study of the complex numbers, the only prime
polynomials in the real numbers are the linear $a\xi + b$ and the quadratic
$a\xi^2 + b\xi + c$ with $b^2 - 4ac < 0$. However, the proof of this statement
will involve some moderately deeper considerations.

Computations of roots. Even in the special case 7.54 or 7.55 of poly¬


nomials for which the existence of roots is guaranteed, we have so far no
information on how to “find” these roots. By “find” we mean here a
systematic method of computation which will lead from the given coeffi¬
cients of a polynomial to the terms of a fundamental sequence (xn) with
x = lim„^.oo xn being one of the roots of the polynomial. We might even
hope for more, namely a method which will give us in this way all the
real roots of the polynomial, if there are any.
It might seem at first sight that the quadratic formula of 7.54(h), (iii)
does just this for polynomials of degree 2. In fact it does tell us exactly
how many real roots such a polynomial has. However, it reduces the prob¬
lem of computing these roots to the problem of computing a square root,
which simply brings us back, via the definition 7.56, to the existence
theorem 7.55.
If we disregard for a moment questions of efficiency and simplicity,
we can see at least one method for solving the first computation problem
above for the polynomials of 7.54 and 7.55. The method in fact applies
to any continuous function $F$ for which we can systematically compute
$F(x)$ at each value $x$, even if only at each rational value $x$, and which
satisfies the hypothesis of 7.48 that we have certain $a_0$ and $b_0$ such that
$F(a_0) < 0$ and $F(b_0) > 0$. By continuity it is easily seen that we can also
find such $a_0$, $b_0$ which are rational. Suppose, for example, that $a_0 < b_0$.
Consider $c = (a_0 + b_0)/2$; we can calculate $F(c)$ by hypothesis. If
$F(c) = 0$ we are through. If $F(c) > 0$, we know by 7.48 that there is
some root between $a_1$ and $b_1$, where we take $a_1 = a_0$, $b_1 = c$. If $F(c) < 0$,
we know that there is a root between $a_1$ and $b_1$, where in this case $a_1 = c$,
$b_1 = b_0$. We now repeat the procedure with $a_1$, $b_1$ instead of $a_0$, $b_0$.
This gives rise to two sequences of rationals $a_n$, $b_n$. At any stage we
either arrive at $(a_n + b_n)/2$ as a root of $F$, or we continue to the next
stage to obtain $a_{n+1}$, $b_{n+1}$ with $F(a_{n+1}) < 0$ and $F(b_{n+1}) > 0$. In
general, $b_n - a_n = (1/2^n)(b_0 - a_0)$. It is thus seen (cf. the proof of
the Bolzano-Weierstrass theorem 7.26) that if the process never stops
with a root then $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = c$, and by the continuity of $F$,
$F(\lim_{n\to\infty} a_n) = \lim_{n\to\infty} F(a_n) \leq 0$, while
$F(\lim_{n\to\infty} b_n) = \lim_{n\to\infty} F(b_n) \geq 0$,
so that $F(c) = 0$. Hence either sequence $(a_n)$, $(b_n)$ provides us with
a fundamental sequence converging to a root $c$ of $F$. In fact, if $a_0$, $b_0$ are
chosen to be integers, it is seen that this procedure will lead us to the
representation of $c$ to the base 2 (7.35). It is obvious how this method can be
adapted to other bases, e.g., base 10 for a decimal representation of $c$.
In particular, to apply this method to the polynomials of 7.54, 7.55, we
can take $a_0 = -M$, $b_0 = M$, where $M$ is as given in 7.54, in the first
case, and $a_0 = 0$, $b_0 = a + 1$ in the second case.
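The halving procedure just described can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the name `bisect` and its parameters are our own, $F$ is assumed to be computable exactly at rational arguments, and the caller is assumed to supply rational $a_0 < b_0$ with $F(a_0) < 0 < F(b_0)$.

```python
from fractions import Fraction

def bisect(F, a0, b0, steps):
    """Halve [a0, b0] `steps` times, keeping F(a_n) < 0 < F(b_n).

    Returns (a_n, b_n), or (c, c) if some midpoint c is an exact root.
    """
    a, b = Fraction(a0), Fraction(b0)
    if not (a < b and F(a) < 0 < F(b)):
        raise ValueError("need a0 < b0 with F(a0) < 0 < F(b0)")
    for _ in range(steps):
        c = (a + b) / 2
        value = F(c)
        if value == 0:
            return c, c        # we are through: c is a root
        if value > 0:
            b = c              # a root lies between a and c
        else:
            a = c              # a root lies between c and b
    return a, b

# The "second case" above with n = 2, a = 2: F(x) = x^2 - 2 on [0, a + 1] = [0, 3].
a_n, b_n = bisect(lambda x: x * x - 2, 0, 3, 20)
```

After 20 steps the two rationals differ by $3/2^{20}$ and enclose $\sqrt{2}$; either one may be taken as a term of the fundamental sequence.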
A number of methods that have been developed in algebra and analysis
for computing roots of equations are more suitable than the preceding
when viewed as a practical matter. The primary concerns here are first,
the question of “rapidity of convergence” of the approximations involved,
and then, the simplicity of the algorithm provided. We wish to discuss
only one particular case here, namely the question of finding $x > 0$ such
that $x^n = a$, when $a > 0$ is given. The procedure we consider is known
as Newton’s method; it is suggested by ideas from elementary analysis,
which we treat informally here. Though the idea of the method is simple,
the precise conditions under which it leads to a correct solution for an
arbitrary initial function $F$ are more involved than is worth stating here.
Consider the graph of the function $F(x) = x^n - a$ for $x > 0$.

We shall obtain a sequence $x_k$, starting with any initial $x_0$ known to be
larger than $\sqrt[n]{a}$, for example, $x_0 = a + 1$. Given $x_k$, we find the tangent
line to the graph of $F$ at the point $(x_k, F(x_k))$. As we know from calculus,
the equation of this line is $y - F(x_k) = F'(x_k)(x - x_k)$, where $F'$ is the
derivative of the function $F$. The notion of derivative has already been
discussed here as a formal operation on polynomials in 5.16, 5.17. In
particular, for $f(\xi) = \xi^n - a$, we have $f'(\xi) = n\xi^{n-1}$, the result expected
from the calculus. In the case we are dealing with, the equation of the line
is thus $y - (x_k^n - a) = n x_k^{n-1}(x - x_k)$. We define $x_{k+1}$ to be the
$x$-coordinate of the point of intersection of this line with the $x$-axis. In
other words, $-(x_k^n - a) = n x_k^{n-1}(x_{k+1} - x_k)$, so that

\[ x_{k+1} = x_k - \frac{x_k^n - a}{n x_k^{n-1}} = \frac{n-1}{n}\,x_k + \frac{a}{n x_k^{n-1}}. \]

It appears from Fig. 7.12 that $\lim_{k\to\infty} x_k = \sqrt[n]{a}$. However, the proof
of this is quite another matter. We illustrate such for $n = 2$. In this case
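$x_{k+1} = (x_k/2) + (a/2x_k)$. Then

\[
\begin{aligned}
x_{k+1} - \sqrt{a} &= \tfrac{1}{2}(x_k - \sqrt{a}) - \frac{\sqrt{a}}{2} + \frac{a}{2x_k} \\
&= \tfrac{1}{2}(x_k - \sqrt{a}) - \frac{\sqrt{a}}{2x_k}\,(x_k - \sqrt{a}) \\
&= \frac{x_k - \sqrt{a}}{2}\left(1 - \frac{\sqrt{a}}{x_k}\right) = \frac{(x_k - \sqrt{a})^2}{2x_k}.
\end{aligned}
\]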

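Using this, we can prove by induction on $k$ that $\sqrt{a} < x_k$. For it is true
for $x_0$ by hypothesis. Suppose that it is true for $k$. Then
$(x_k - \sqrt{a})^2/2x_k > 0$, so $x_{k+1} - \sqrt{a} > 0$, thus proving it for $k + 1$.
It then also follows that

\[ x_{k+1} - \sqrt{a} < \frac{(x_k - \sqrt{a})^2}{2\sqrt{a}} \quad\text{for all } k. \]

Put $b = 2\sqrt{a}$, so that

\[ \frac{x_{k+1} - \sqrt{a}}{b} < \left(\frac{x_k - \sqrt{a}}{b}\right)^2. \]

It follows by induction on $k$ that

\[ \frac{x_k - \sqrt{a}}{b} \leq \left(\frac{x_0 - \sqrt{a}}{b}\right)^{2^k}, \]

that is,

\[ x_k - \sqrt{a} \leq \frac{1}{b^{2^k - 1}}\,(x_0 - \sqrt{a})^{2^k}. \]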

A somewhat weaker comparison from the above, using $1 - \sqrt{a}/x_k < 1$,
shows that

\[ x_{k+1} - \sqrt{a} < \frac{x_k - \sqrt{a}}{2} \quad\text{and hence}\quad x_k - \sqrt{a} \leq \frac{x_0 - \sqrt{a}}{2^k}, \]

which is already sufficient to establish $\lim_{k\to\infty} x_k = \sqrt{a}$. We leave the
proof for the more general case as a problem to the reader.
As an example of the preceding, the computation of the sequence for
$\sqrt{2}$ up to $k = 3$, starting with $x_0 = 2$, is $x_1 = 1.500\ldots$, $x_2 = 1.4166\ldots$,
and, rounding off, $x_3 = 1.4142$. Using

\[ x_3 - \sqrt{2} < \frac{1}{2^3(\sqrt{2})^3}\,(x_1 - \sqrt{2})^4 \]

from the above and assuming $x_1 - \sqrt{2} < 0.1$, we see that
$x_3 - \sqrt{2} < (0.1)^5 = 0.00001$. The method is as good as, or better than, the usual
“grade-school” algorithm for computing square roots, and generalizes, as
described, to the computation of arbitrary $n$th roots, which the school
algorithm does not. In addition, the latter has the disadvantage of
involving certain “trial-and-error” procedures in the progress of the
computation. Newton’s method (as should any practical method) lends itself
easily to the use of calculating machines.
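To make the last remark concrete, the iteration $x_{k+1} = \frac{n-1}{n}x_k + a/(n x_k^{n-1})$ can be carried out exactly on rational numbers. The sketch below is only an illustration; the function name `newton_root` and its parameters are our own, and the starting value is assumed to exceed $\sqrt[n]{a}$.

```python
from fractions import Fraction

def newton_root(a, n, x0, steps):
    """Return [x_0, ..., x_steps] for x_{k+1} = ((n-1)/n) x_k + a / (n x_k^(n-1))."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(Fraction(n - 1, n) * x + Fraction(a) / (n * x ** (n - 1)))
    return xs

# The example of the text: a = 2, n = 2, x_0 = 2, carried up to k = 3.
xs = newton_root(2, 2, 2, 3)
decimals = [float(x) for x in xs]   # x_1 = 1.5, x_2 = 1.4166..., x_3 = 1.41421...
```

Here $x_3 = 577/408$, and one checks directly that $(x_3 - 0.00001)^2 < 2 < x_3^2$, in agreement with the error estimate above.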

Location of all roots: Sturm’s theorem. Let us return to the procedure
described earlier for computing at least one root of a continuous function
$F$ for which we have $a$, $b$ with $F(a) < 0$, $F(b) > 0$. It might first be
thought that, by considering the sign of $F$ at the end points of all
subintervals obtained from the original interval by successive subdivisions by
2, we can eventually locate all solutions $x$ of $F(x) = 0$. However, it can
be seen that we could not achieve this for a function whose graph is as
shown in the following figure.

Figure 7.13

If in this figure $c_2$ is irrational, then we will never find out by such
calculations that there are any solutions $x$ of $F(x) = 0$ between $a$ and $b$, other
than $c_1$, since the sign of $F$ at the end points of each subinterval which
contains $c_2$, but not $c_1$, will be positive.
The situation pictured in Fig. 7.13 could not occur if the function we
are dealing with is a polynomial function $f$ which has only simple real
roots in the sense of 6.38. For if $c$ is any real root of $f(\xi)$ we have
$f(\xi) = (\xi - c)g(\xi)$ with $g(\xi) \in \mathrm{Re}[\xi]$ and $g(c) \neq 0$. Say, for example, that
$g(c) > 0$. Then by continuity of $g$ we can find an $\epsilon > 0$ such that $g(x) > 0$
whenever $|x - c| < \epsilon$. Hence $f(x)$ has the same sign as $x - c$ if
$c - \epsilon < x < c + \epsilon$ and thus $f(x) < 0$ if $c - \epsilon < x < c$ and $f(x) > 0$ if $c < x < c + \epsilon$.
Similarly, in the case that $g(c) < 0$, $f(x)$ has the same sign as $-(x - c)$
in a suitably small interval around $c$.
This leads to an algorithm for isolating all the real roots of an arbitrary
polynomial $f(\xi) \in \mathrm{Re}[\xi]$. For by 6.39, if we take $d(\xi) = (f(\xi), f'(\xi))$ and
$f_1(\xi) = f(\xi)/d(\xi)$, $f_1(\xi)$ has exactly the same real roots as $f(\xi)$ and $f_1(\xi)$
has only simple roots. Furthermore, we can find by 7.53(ii), $a$ and $b$ between
which all roots of $f(\xi)$ lie. Thus, because of the remarks of the preceding
paragraph, the general procedure described above can be successfully
carried through for $f_1$, giving the desired result for $f$.
Even if a continuous function $F$ has roots of the sort pictured in Fig. 7.13,
we could still apply the general method for isolating all of these if, first
of all, it has only finitely many roots in all, and, second, we have some
algorithm which tells us exactly how many roots $F$ has in any given
interval. Such an algorithm can in fact be given for arbitrary polynomial
functions $f$ by a judicious use of Euclid’s algorithm for computing a
greatest common divisor of $f(\xi)$ and $f'(\xi)$. The statement of this new
algorithm is known as Sturm’s theorem. While it is not needed in our further
work, it is presented here for its own sake and because of its distinctive
character relative to the algebra of real numbers. Before proving the
theorem, we introduce two concepts which play an important role in it.

7.58 Definition. (i) Suppose that $\langle x_0, \ldots, x_{k-1} \rangle$ is a $k$-termed sequence
of real numbers (with $k$ possibly 0, in which case the sequence is empty).
We say that the sequence is reduced if for each $i < k$, $x_i \neq 0$. With
each nonreduced sequence $\langle x_0, \ldots, x_{k-1} \rangle$ we associate a reduced
sequence $\langle y_0, \ldots, y_{l-1} \rangle$ by the following recursive conditions: For
$k = 0$, we take $l = 0$. Given a reduced sequence $\langle y_0, \ldots, y_{l-1} \rangle$ for
$\langle x_0, \ldots, x_{k-1} \rangle$, we take the same sequence for $\langle x_0, \ldots, x_{k-1}, x_k \rangle$ if
$x_k = 0$; otherwise we take the sequence $\langle y_0, \ldots, y_{l-1}, x_k \rangle$.
(ii) By the number of variations of sign of a reduced sequence
$\langle y_0, \ldots, y_{l-1} \rangle$, which we denote by $V(\langle y_0, \ldots, y_{l-1} \rangle)$, we mean the number
determined by the following recursive conditions: If $l = 0$ or $l = 1$,
$V(\langle y_0, \ldots, y_{l-1} \rangle) = 0$. Given $V(\langle y_0, \ldots, y_{l-1} \rangle)$ for $l \geq 1$ we take

\[ V(\langle y_0, \ldots, y_{l-1}, y_l \rangle) = V(\langle y_0, \ldots, y_{l-1} \rangle) \quad\text{if } y_{l-1} \cdot y_l > 0, \]

and we take

\[ V(\langle y_0, \ldots, y_{l-1}, y_l \rangle) = V(\langle y_0, \ldots, y_{l-1} \rangle) + 1 \quad\text{if } y_{l-1} \cdot y_l < 0. \]

We take the number of variations of sign of an arbitrary sequence
$\langle x_0, \ldots, x_{k-1} \rangle$ to be $V(\langle y_0, \ldots, y_{l-1} \rangle)$, where $\langle y_0, \ldots, y_{l-1} \rangle$ is the
associated reduced sequence; we also write $V(\langle x_0, \ldots, x_{k-1} \rangle)$ for this
number.

The condition $y_{l-1} \cdot y_l > 0$ means simply that $y_{l-1}$ and $y_l$ have the
same sign, i.e., either $y_{l-1} > 0$ and $y_l > 0$ or $y_{l-1} < 0$ and $y_l < 0$.
The condition $y_{l-1} \cdot y_l < 0$ means that they have opposite sign, i.e.,
either $y_{l-1} > 0$ and $y_l < 0$ or $y_{l-1} < 0$ and $y_l > 0$. As an example of
the above, consider the computation of $V(\langle 0, 2, 0, 1 \rangle)$. The associated
reduced sequence is $\langle 2, 1 \rangle$ and $V(\langle 2, 1 \rangle) = V(\langle 2 \rangle) = 0$. As another
example,

\[
\begin{aligned}
V(\langle 2, -1, 0, 0, 3, 1, 0, 2, -2, -5 \rangle) &= V(\langle 2, -1, 3, 1, 2, -2, -5 \rangle) \\
&= V(\langle 2, -1, 3, 1, 2, -2 \rangle) \\
&= V(\langle 2, -1, 3, 1, 2 \rangle) + 1 \\
&= V(\langle 2, -1, 3, 1 \rangle) + 1 \\
&= V(\langle 2, -1, 3 \rangle) + 1 \\
&= V(\langle 2, -1 \rangle) + 2 \\
&= V(\langle 2 \rangle) + 3 = 3.
\end{aligned}
\]

That is, ignoring zero values, we have three changes of sign in the original
sequence, namely from 2 to —1, from —1 to 3, and then finally from
2 to —2.
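The recursive conditions of 7.58 amount to discarding the zero terms and then counting adjacent pairs of opposite sign. A minimal sketch follows; the function name `variations` is our own choice.

```python
def variations(xs):
    """Number of variations of sign V of a finite sequence of real numbers."""
    reduced = [x for x in xs if x != 0]          # the associated reduced sequence
    # One variation for each adjacent pair y_{l-1}, y_l with y_{l-1} * y_l < 0.
    return sum(1 for u, v in zip(reduced, reduced[1:]) if u * v < 0)

first = variations([0, 2, 0, 1])
second = variations([2, -1, 0, 0, 3, 1, 0, 2, -2, -5])
```

The two calls reproduce the two worked examples above, giving 0 and 3 respectively.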
One way of finding a greatest common divisor $d(\xi)$ of $f(\xi)$, $f'(\xi)$ is by
repeated use of the division algorithm 6.26 in just the same way as one
uses Euclid’s algorithm for finding a gcd in integers (which we discussed
following 4.42). That is, we write $f(\xi) = q_0(\xi)f'(\xi) + r_1(\xi)$, where
$\deg(r_1(\xi)) < \deg(f'(\xi))$, then apply the division algorithm to $f'(\xi)$,
$r_1(\xi)$, and so on. For this discussion it is more convenient to determine
$r_1(\xi)$ by $f(\xi) = q_0(\xi)f'(\xi) - r_1(\xi)$. We then write

\[ f'(\xi) = q_1(\xi)r_1(\xi) - r_2(\xi), \quad r_1(\xi) = q_2(\xi)r_2(\xi) - r_3(\xi), \quad \ldots, \]

\[ r_{i-1}(\xi) = q_i(\xi)r_i(\xi) - r_{i+1}(\xi), \]

where $0 \leq \deg(r_{i+1}(\xi)) < \deg(r_i(\xi))$. We continue this procedure
until we reach the first $m$ with $\deg(r_{m+1}(\xi)) = 0$. If $r_{m+1}(\xi) = 0$, we
have $r_{m-1}(\xi) = q_m(\xi)r_m(\xi)$, and $r_m(\xi)$ is the desired gcd. Otherwise,
$r_{m+1}(\xi)$ is a constant, $r_{m+1} \neq 0$, and $r_m(\xi) = q_{m+1}(\xi)r_{m+1}$, since any
constant divides any polynomial. In this case, any nonzero constant is
a gcd, in particular 1 is. It may of course happen that already
$\deg(r_1(\xi)) = 0$, in which case either $f'(\xi) \mid f(\xi)$ or $(f(\xi), f'(\xi)) = 1$.

7.59 Definition. Let $f(\xi) \in \mathrm{Re}[\xi]$, $\deg(f(\xi)) > 0$. By the Sturm sequence
associated with $f(\xi)$ we mean the sequence $\langle f_0(\xi), f_1(\xi), \ldots, f_m(\xi) \rangle$
determined by the following recursive conditions:
(i) $f_0(\xi) = f(\xi)$, $f_1(\xi) = f'(\xi)$;
(ii) for each $i$ with $0 < i < m$, $f_{i-1}(\xi) = q_i(\xi)f_i(\xi) - f_{i+1}(\xi)$,
where $0 \leq \deg(f_{i+1}(\xi)) < \deg(f_i(\xi))$ and $f_{i+1}(\xi) \neq 0$;
(iii) $f_{m-1}(\xi) = q_m(\xi)f_m(\xi)$.
For any real number $c$, we take $V_f(c) = V(\langle f_0(c), f_1(c), \ldots, f_m(c) \rangle)$.
As an example of a computation of a Sturm sequence, it can be seen that
for $f(\xi) = f_0(\xi) = \xi^4 - 2\xi^3 + 2\xi^2 - 2\xi + 1$, we have $f'(\xi) = f_1(\xi) =
4\xi^3 - 6\xi^2 + 4\xi - 2$, $f_2(\xi) = -\tfrac{1}{4}\xi^2 + \xi - \tfrac{3}{4}$, and the last term is
$f_3(\xi) = -32\xi + 32$. Then, for example, $V_f(0) = V(\langle 1, -2, -\tfrac{3}{4}, 32 \rangle) = 2$;
a similar computation shows that $V_f(2) = 1$. Note that in this case
$f(\xi) = (\xi - 1)^2(\xi^2 + 1)$ has exactly one real root, namely $c = 1$.
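The computation of this example can be checked mechanically. The following is a sketch under our own conventions, none of which come from the text: a polynomial is a list of rational coefficients with the highest degree first, and the helper names (`trim`, `poly_rem`, `derivative`, `sturm_sequence`, `evaluate`, `V_f`) are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (highest degree first); zero becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(f, g):
    """Remainder of f on division by the nonzero polynomial g (division algorithm)."""
    r = list(f)
    while len(r) >= len(g):
        q = r[0] / g[0]
        shifted = g + [Fraction(0)] * (len(r) - len(g))
        r = trim([c - q * d for c, d in zip(r, shifted)])   # leading term cancels
    return r

def derivative(f):
    n = len(f) - 1
    return trim([c * (n - i) for i, c in enumerate(f[:-1])])

def sturm_sequence(f):
    """f_0 = f, f_1 = f', and f_{i+1} = -(remainder of f_{i-1} by f_i); needs deg f > 0."""
    seq = [trim([Fraction(c) for c in f])]
    seq.append(derivative(seq[0]))
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:                      # f_{m-1} = q_m f_m: the sequence ends
            return seq
        seq.append([-c for c in r])

def evaluate(p, x):
    value = Fraction(0)
    for c in p:                        # Horner's rule
        value = value * x + c
    return value

def V_f(seq, c):
    values = [v for v in (evaluate(p, Fraction(c)) for p in seq) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

seq = sturm_sequence([1, -2, 2, -2, 1])    # xi^4 - 2 xi^3 + 2 xi^2 - 2 xi + 1
at_0, at_2 = V_f(seq, 0), V_f(seq, 2)
```

With these conventions `seq` has four terms, the third being $-\tfrac14\xi^2 + \xi - \tfrac34$ and the last $-32\xi + 32$, and the two counts are 2 and 1, as stated above.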
Sturm’s theorem is now as follows.

7.60 Theorem. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$, $\deg(f(\xi)) > 0$. Suppose that
$a < b$ and $f(a) \neq 0$, $f(b) \neq 0$. Then the number of distinct roots
$c$ of $f(\xi)$ with $a < c < b$ is equal to $V_f(a) - V_f(b)$.

Proof. Let $\langle f_0(\xi), f_1(\xi), f_2(\xi), \ldots, f_m(\xi) \rangle$ be the Sturm sequence of
$f(\xi)$. Thus $f_m(\xi) \mid f_i(\xi)$ for each $i$ and $f_m(\xi)$ is a gcd of $f(\xi)$, $f'(\xi)$. We put

(1) $\bar{h}(\xi) = h(\xi)/f_m(\xi)$, whenever $f_m(\xi) \mid h(\xi)$.

Thus $\bar{f}_m(\xi) = 1$ identically. It follows from 5.17 that whenever $c$ is a root
of $f(\xi)$ of multiplicity $k$, we have

\[ f(\xi) = (\xi - c)^k g(\xi), \quad\text{with } g(c) \neq 0, \]

and

\[ f'(\xi) = k(\xi - c)^{k-1} g(\xi) + (\xi - c)^k g'(\xi). \]

Hence $c$ is of multiplicity $k - 1$ in $f_m(\xi)$, $f_m(\xi) = (\xi - c)^{k-1}s(\xi)$, where
$s(\xi) \mid g(\xi)$ and $s(\xi) \mid g'(\xi)$. We can thus conclude that

(2) whenever $f(c) = 0$ there are $k > 0$ and $g(\xi)$ with $g(c) \neq 0$ and

\[ \bar{f}_0(\xi) = (\xi - c)g(\xi), \]
\[ \bar{f}_1(\xi) = kg(\xi) + (\xi - c)g'(\xi). \]

Note also that

(3) if $f(x) \neq 0$ then $f_m(x) \neq 0$ and
$V_f(x) = V(\langle \bar{f}_0(x), \bar{f}_1(x), \ldots, \bar{f}_m(x) \rangle)$.

For $f_m(\xi)$ has no real roots other than those of $f(\xi)$. Then for any such $x$,
division of each term of the sequence $\langle f_0(x), f_1(x), \ldots, f_m(x) \rangle$ by the
nonzero constant $f_m(x)$ does not change the number of variations of sign
(although it will change individual signs if it is negative). We put

(4) $V_{\bar{f}}(x) = V(\langle \bar{f}_0(x), \bar{f}_1(x), \ldots, \bar{f}_m(x) \rangle)$

for each $x$. We need not have $V_{\bar{f}}(x) = V_f(x)$ when $f(x) = 0$. However,
by (3) and the hypothesis, $V_{\bar{f}}(a) = V_f(a)$ and $V_{\bar{f}}(b) = V_f(b)$.
Let us now think of $x$ as moving from $a$ to $b$. The main part of our
proof will be to show that $V_{\bar{f}}(x)$ can change only when $x$ passes through
a value $c$ such that $f(c) = 0$ and that, furthermore, $V_{\bar{f}}(x)$ decreases by 1
in such a passage.
We consider the totality $S$ of all roots of all the polynomials associated
with the Sturm sequence (without regard to the initial interval $[a, b]$):

(5) $S = \{x \mid \bar{f}_i(x) = 0 \text{ for some } i \leq m\}$.

Since each polynomial has only finitely many roots, we can write

(6) $S = \{d_1, d_2, \ldots, d_t\}$ where $d_1 < d_2 < \cdots < d_t$.

Pick $d_0$, $d_{t+1}$ arbitrarily with $d_0 < d_1$ and $d_t < d_{t+1}$. Thus

(7) if $i \leq m$ and $0 < j \leq t + 1$ then $\bar{f}_i(x)$ has a fixed sign for all
$x$ with $d_{j-1} < x < d_j$, i.e., either $\bar{f}_i(x) > 0$ for all such $x$ or
$\bar{f}_i(x) < 0$ for all such $x$. If further $\bar{f}_i(d_j) \neq 0$, then $\bar{f}_i(x)$ has the
same sign as $\bar{f}_i(d_j)$ for all such $x$.

To see the first part, suppose there were $x_1$, $x_2$ between $d_{j-1}$ and $d_j$
with $\bar{f}_i(x_1) > 0$ and $\bar{f}_i(x_2) < 0$. Then by Weierstrass’ theorem 7.48 there
would exist an $x$ between $x_1$ and $x_2$ such that $\bar{f}_i(x) = 0$. This contradicts
(5) and (6). The proof of the second part is similar. It follows that

(8) if $0 < j \leq t + 1$ and $d_{j-1} < x_1, x_2 < d_j$ then $V_{\bar{f}}(x_1) = V_{\bar{f}}(x_2)$.

Thus we need only investigate how $V_{\bar{f}}(x)$ changes at the $d_j$. Note that

(9) there is no $i$ with $0 < i \leq m$ and $x$ such that $\bar{f}_{i-1}(x) = 0 = \bar{f}_i(x)$.

For if $i = m$, $\bar{f}_i(\xi) = \bar{f}_m(\xi) = 1$. On the other hand, if $i < m$ and
$\bar{f}_{i-1}(x) = 0 = \bar{f}_i(x)$ then from

(10) $\bar{f}_{i-1}(\xi) = q_i(\xi)\bar{f}_i(\xi) - \bar{f}_{i+1}(\xi)$,

we would conclude that also $\bar{f}_{i+1}(x) = 0$, hence also $\bar{f}_{i+2}(x) = 0, \ldots,$
so that we eventually obtain $\bar{f}_m(x) = 0$, which is impossible.
We claim now that

(11) if $0 < j \leq t + 1$ and $d_{j-1} < x < d_j$ then
$V_{\bar{f}}(d_j) = V_{\bar{f}}(x)$ if $f(d_j) \neq 0$ and
$V_{\bar{f}}(d_j) = V_{\bar{f}}(x) - 1$ if $f(d_j) = 0$.
We write $c$ for $d_j$. To prove (11) let us compare the signs in the sequences

\[ \langle \bar{f}_0(c), \bar{f}_1(c), \ldots, \bar{f}_{i-1}(c), \bar{f}_i(c), \bar{f}_{i+1}(c), \ldots, \bar{f}_m(c) \rangle, \]
\[ \langle \bar{f}_0(x), \bar{f}_1(x), \ldots, \bar{f}_{i-1}(x), \bar{f}_i(x), \bar{f}_{i+1}(x), \ldots, \bar{f}_m(x) \rangle. \]

If for given $i$ with $0 < i < m$, we have $\bar{f}_i(c) = 0$ then by (9) $\bar{f}_{i-1}(c) \neq 0$
and by (10) $\bar{f}_{i-1}(c) = -\bar{f}_{i+1}(c)$. Hence we have one variation of sign
at this position for $x = c$. Again by (9) $\bar{f}_{i-1}(x)$ has the same sign as
$\bar{f}_{i-1}(c)$ and $\bar{f}_{i+1}(x)$ has the same sign as $\bar{f}_{i+1}(c)$ for $d_{j-1} < x < d_j$.
Thus no matter what the value of $\bar{f}_i(x)$ is, there will continue to be only
one variation of sign at this position in the sequence. Hence, if $V_{\bar{f}}(x)$ is
different from $V_{\bar{f}}(c)$ it can only be that $\bar{f}_0(c) = 0$. In this case we have
by (2) certain $k > 0$ and $g(\xi)$ with $\bar{f}_0(x) = (x - c)g(x)$, $\bar{f}_1(x) = kg(x) +
(x - c)g'(x)$ for all $x$, where $g(c) \neq 0$. Thus $\bar{f}_0(c) = 0$, $\bar{f}_1(c) = kg(c)$.
On the other hand, if $d_{j-1} < x < d_j = c$, $g(x) \neq 0$ and we see by (9)
that $g(x)$ has the same sign as $g(c)$, and $\bar{f}_1(x)$ has the same sign as $\bar{f}_1(c)$,
hence as $g(c)$. But $\bar{f}_0(x)$ has opposite sign to that of $g(x)$ for $x < c$; hence
$\bar{f}_0(x)$ and $\bar{f}_1(x)$ have opposite signs for $d_{j-1} < x < d_j$. In other words,
as we move to $x = c$, we lose one variation in sign at this position in the
sequence. It may well happen that, in this case as well, there is some
$i > 1$ with $\bar{f}_i(c) = 0$. But the same argument as before shows that this
can cause no change in the existence of a variation at such a position.
Hence (11) is proved.
Now, given $a$, $b$ with $a < b$ and $f(a) \neq 0$, $f(b) \neq 0$, we can further
assume of $d_0$, $d_{t+1}$ that $d_0 < a$, $b < d_{t+1}$. There is a unique $l \leq t + 1$
with $d_{l-1} \leq a < d_l$. Since the roots of $\bar{f}$ are exactly the same as those of $f$
by (2), we can apply (11) to prove by induction on $j$ that if $l \leq j$ then
$V_{\bar{f}}(a) - V_{\bar{f}}(d_j)$ is equal to the number of distinct roots $c$ of $f(\xi)$ such that
$a < c \leq d_j$. In particular

(12) $V_{\bar{f}}(a) - V_{\bar{f}}(d_{t+1})$ is equal to the number of distinct roots $c$ of
$f(\xi)$ such that $a < c$.

The same argument applies to $b$, so that $V_{\bar{f}}(b) - V_{\bar{f}}(d_{t+1})$ is the number
of distinct roots $c$ of $f(\xi)$ with $b < c$. Hence

(13) $V_{\bar{f}}(a) - V_{\bar{f}}(b)$ is the number of distinct roots $c$ of $f(\xi)$ with
$a < c < b$.

By our observation from (3), this proves the theorem.


Sturm’s theorem can be combined directly with 7.53 to determine in a
finite number of steps exactly how many real roots a given polynomial $f(\xi)$
with real coefficients has. We simply take as $a$, $b$ the numbers $-M - 1$,
$M + 1$ where $M$ is as specified in 7.53 and compute $V_f(a) - V_f(b)$.
However, it is even simpler to consider that $a$ and $b$ are chosen so as to
include between them all the numbers $-M_i$, $M_i$ corresponding to all the
terms $f_i(\xi)$ of the Sturm sequence of $f(\xi)$ [and hence all the roots of all
these $f_i(\xi)$]. We know by 7.53 that if $f_i(\xi)$ is of degree $n_i > 0$ and
$f_i(\xi) = d_{n_i}\xi^{n_i} + \cdots$ has leading coefficient $d_{n_i}$, then $f_i(b)$ has the
same sign as $d_{n_i}$ and $f_i(a)$ has the same sign as $d_{n_i}(-1)^{n_i}$. Of course,
if $f_i(\xi)$ has degree 0 it is a constant and has fixed sign. To illustrate these
computations with the Sturm sequence of $f(\xi) = f_0(\xi) = \xi^4 - 2\xi^3 +
2\xi^2 - 2\xi + 1$, which we computed earlier to be

\[ f_1(\xi) = 4\xi^3 - 6\xi^2 + 4\xi - 2, \]
\[ f_2(\xi) = -\tfrac{1}{4}\xi^2 + \xi - \tfrac{3}{4}, \qquad f_3(\xi) = -32\xi + 32, \]

we see that for suitably large values of $b$, $V_f(b)$ has the sequence of signs
$+, +, -, -$, hence $V_f(b) = 1$. On the other hand, $V_f(a)$ for suitably small
(i.e., large negative) values of $a$ has the sequence of signs $+, -, -, +$,
hence $V_f(a) = 2$. Thus $f(\xi)$ has exactly one real root $c$ [which, in this case,
we already know from the decomposition $f(\xi) = (\xi - 1)^2(\xi^2 + 1)$].
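The sign rule just used needs only the degree and leading coefficient of each term of the Sturm sequence. The sketch below takes the sequence of the example as given data; the names `sign`, `variations` and `count_real_roots`, and the convention of listing coefficients with the highest degree first, are our own assumptions.

```python
from fractions import Fraction

def sign(x):
    return (x > 0) - (x < 0)

def variations(xs):
    reduced = [x for x in xs if x != 0]
    return sum(1 for u, v in zip(reduced, reduced[1:]) if u * v < 0)

def count_real_roots(sturm):
    """Total number of distinct real roots, from leading coefficients alone.

    For large b, f_i(b) has the sign of the leading coefficient d; for large
    negative a, f_i(a) has the sign of d * (-1)**(degree of f_i).
    """
    far_right = [sign(p[0]) for p in sturm]
    far_left = [sign(p[0]) * (-1) ** (len(p) - 1) for p in sturm]
    return variations(far_left) - variations(far_right)

# Sturm sequence of xi^4 - 2 xi^3 + 2 xi^2 - 2 xi + 1, as computed in the text.
sturm = [
    [1, -2, 2, -2, 1],
    [4, -6, 4, -2],
    [Fraction(-1, 4), 1, Fraction(-3, 4)],
    [-32, 32],
]
total = count_real_roots(sturm)
```

For this sequence the far-left signs are $+, -, -, +$ and the far-right signs are $+, +, -, -$, so the count is $2 - 1 = 1$.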
Once we know exactly how many real roots a polynomial $f(\xi)$ has and
we have a rational interval $a_0$, $b_0$ such that all roots $c$ lie between $a_0$ and
$b_0$, the Sturm procedure will lead in a finite number of steps to finding
disjoint rational intervals $a_i$, $b_i$ each of which contains exactly one root
$c_i$ with $a_i < c_i < b_i$. Then for each such root we can find a fundamental
sequence of rationals which approaches it and which we can calculate to
any desired degree of accuracy by the method described earlier. Although
this is a tedious matter for “hand” computation, it is quite routine once
the Sturm sequence for the given polynomial is found and is quite suitable
for machine computation. A more general algorithm is known which will
not only determine the number and location of the roots c but also their
multiplicities. We shall not develop this procedure here.
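The isolation procedure can be sketched as a repeated halving in which Sturm's theorem supplies the number of roots in each piece. Everything named below is our own illustrative choice (coefficient lists with the highest degree first, the helper names, and the rule of nudging a midpoint that happens to be a root of $f$ so that the hypothesis $f(a) \neq 0$, $f(b) \neq 0$ of 7.60 always holds).

```python
from fractions import Fraction

def trim(p):
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(f, g):
    r = list(f)
    while len(r) >= len(g):
        q = r[0] / g[0]
        shifted = g + [Fraction(0)] * (len(r) - len(g))
        r = trim([c - q * d for c, d in zip(r, shifted)])
    return r

def sturm_sequence(f):
    f0 = trim([Fraction(c) for c in f])
    n = len(f0) - 1
    seq = [f0, trim([c * (n - i) for i, c in enumerate(f0[:-1])])]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])

def evaluate(p, x):
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value

def V(seq, x):
    values = [v for v in (evaluate(p, x) for p in seq) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

def isolate(f, a, b):
    """Open rational intervals (lo, hi), each containing exactly one root of f.

    Assumes deg f > 0, a < b, f(a) != 0 and f(b) != 0.
    """
    seq = sturm_sequence(f)
    found, pending = [], [(Fraction(a), Fraction(b))]
    while pending:
        lo, hi = pending.pop()
        count = V(seq, lo) - V(seq, hi)      # Sturm's theorem on (lo, hi)
        if count == 1:
            found.append((lo, hi))
        elif count > 1:
            mid = (lo + hi) / 2
            while evaluate(seq[0], mid) == 0:
                mid = (mid + hi) / 2         # avoid splitting exactly at a root
            pending += [(lo, mid), (mid, hi)]
    return sorted(found)

intervals = isolate([1, 0, -3, 1], -3, 3)    # xi^3 - 3 xi + 1 has three real roots
single = isolate([1, -2, 2, -2, 1], 0, 2)    # the example of the text
```

Each interval returned can then be handed to the halving method described earlier to approximate its root as closely as desired.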
It is natural to ask whether we can develop, as we did with the
Nullstellensatz, an algorithm to do the following: for any continuous function
$F$ whose values $F(x)$ can be computed for all $x$ by some procedure, the
algorithm will compute for us a (fundamental sequence converging to a)
number $c$ between $a$ and $b$ at which $F$ attains its maximum in the interval
$[a, b]$. A partial answer from analysis is that this is possible for a
restricted class of functions, namely among those whose derivative function
$F'$ is continuous. In this case it can be shown that if $F$ attains its
maximum at $c$ in $[a, b]$, then either $c = a$ or $c = b$ or $a < c < b$, $F'(c) = 0$
and $F(c) \geq F(a)$, $F(c) \geq F(b)$. This provides us with the required
algorithm for any function $F$ for which we have an algorithm for finding
all solutions $c$ of $F'(c) = 0$. In particular, this applies to all polynomial
functions on the real numbers. A similar result holds for locating minima.
The arguments needed to realize these results for polynomial functions
are developed in the exercises.
However, as we mentioned in Section 1.2, it can be shown that no such
algorithm can ever be developed which will meet the general conditions
described above. In order to prove that such is impossible it is of course
necessary to have a precise definition of the notion of an algorithm or
effective computation procedure. Such a definition has been arrived at in
recent years by several different routes, the results of which have all
turned out to be equivalent. It is generally accepted that this gives the
widest possible notion of algorithm, involving far more latitude than is
ever met in practice. This theory of algorithms, or of general recursive
functions as it is usually called, is relevant to many situations in algebra
and analysis where we meet with existence statements, instances of which
are not clearly subject to computation. We cannot begin to go into this
material in this book and have mentioned it only so the reader will begin
to realize what are certain inherent limitations in the subject. On the
other hand, it is not necessary to have a precise definition of algorithm in
order to recognize specific cases of algorithms, e.g., the division algorithm,
Euclid’s algorithm for finding a gcd, the algorithm for locating at least
one root of a continuous function according to Weierstrass’ theorem,
and the algorithm of Sturm’s theorem for finding the total number of real
roots of a real polynomial.

Rational and real powers of real numbers. We wish to conclude this
section with some remarks about the fractional powers $a^{1/n}$ introduced
in 7.56 for any $a \geq 0$ and any $n \in P$. Assume that $a$ is fixed in the
following and that, for nontriviality, $a > 0$. It is easily seen that if $n, n_1 \in P$
and $m, m_1 \in I$ and $m/n = m_1/n_1$ then $(a^{1/n})^m = (a^{1/n_1})^{m_1}$. Hence we
can unambiguously define $a^x$ for any rational $x$ to be $(a^{1/n})^m$ whenever
$x = m/n$ where $n \in P$ and $m \in I$. Then the basic properties of
exponentiation which we developed in 4.12 for positive integer exponents and extended
in 6.5 to arbitrary integer exponents can be seen to hold also for arbitrary
rational exponents. For example, for any $x, y \in \mathrm{Ra}$ we have $a^{x+y} =
a^x \cdot a^y$ and $a^{x \cdot y} = (a^x)^y$. We leave it to the reader to verify these facts.
For any real $a > 0$, consider the function $G_a(x) = a^x$ with domain Ra.
It can be shown that if $(x_k)$ is any fundamental sequence of rationals,
then $(G_a(x_k)) = (a^{x_k})$ is a fundamental sequence of real numbers.
Furthermore, if $(x_k)$, $(y_k)$ are two such fundamental sequences for which
$\lim_{k\to\infty} x_k = \lim_{k\to\infty} y_k$ then $\lim_{k\to\infty} a^{x_k} = \lim_{k\to\infty} a^{y_k}$. Hence we can
unambiguously define $a^x$ for any real number $x$ as being $\lim_{k\to\infty} a^{x_k}$ whenever
$x = \lim_{k\to\infty} x_k$ and each $x_k \in \mathrm{Ra}$. Then the function $F_a(x) = a^x$ has
domain Re and is an extension of $G_a$, that is, $F_a(x) = G_a(x)$ for all $x \in \mathrm{Ra}$.
From the fact that the basic properties of 4.12 and 6.5 hold for arbitrary
rational exponents we can see, by taking appropriate limits, that the same
properties continue to hold for arbitrary real exponents.
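A numerical illustration of this definition may be helpful, though it is only a floating-point sketch and proves nothing: we take the rational truncations $x_k$ of the decimal expansion of $\sqrt{2}$ and compute $3^{x_k}$ as $(3^{1/n})^m$ for $x_k = m/n$. The name `rational_power` is our own, and `math.isqrt` assumes Python 3.8 or later.

```python
from fractions import Fraction
import math

def rational_power(a, x):
    """a**x for a > 0 and rational x = m/n, computed as (a**(1/n))**m in floating point."""
    x = Fraction(x)                       # reduced to lowest terms, denominator n > 0
    root = a ** (1.0 / x.denominator)     # approximates the n-th root a^(1/n)
    return root ** x.numerator

# x_k = the decimal expansion of sqrt(2) truncated after k places: 1.4, 1.41, 1.414, ...
xs = [Fraction(math.isqrt(2 * 10 ** (2 * k)), 10 ** k) for k in range(1, 8)]
values = [rational_power(3.0, x) for x in xs]
limit_estimate = values[-1]               # settles near 3**sqrt(2), about 4.7288
```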
Now it can be shown that for any $a > 0$ the function $F_a$ is a continuous
function on Re. Essentially this reduces to showing that $F_a$ is continuous
at 0. For if we wish to show it continuous at any real number $b$, we are
considering

\[ F_a(x) - F_a(b) = a^x - a^b = a^b(a^{x-b} - 1). \]

Thus by continuity at 0 we can make $|F_a(x) - F_a(b)|$ as small as we please
by taking $|x - b|$ so close to 0 that $|a^{x-b} - 1|$ is small enough. Moreover,
suppose that $H$ is any continuous function on Re with $H(x) = G_a(x)$ for
all rational $x$. Then if $(x_k)$ is any fundamental sequence of rationals so
is $(H(x_k))$ and $\lim_{k\to\infty} H(x_k) = H(\lim_{k\to\infty} x_k)$ by 7.47. Hence if $x$ is any
real number, $\lim_{k\to\infty} a^{x_k} = H(x)$ whenever $x = \lim_{k\to\infty} x_k$, with each $x_k$
rational. In other words, $H(x) = F_a(x)$ for all $x \in \mathrm{Re}$. Hence, $F_a$ is the
unique continuous function which extends $G_a$ to all of Re.
Recall the function $E$ defined by a power series in 7.43. We showed in
7.45(vi) that $E(m) = e^m$ for all $m \in I$, where $e = E(1)$ $(> 0)$. It is also
easily seen from 7.45(ii) that $E(1/n) = e^{1/n}$ for any $n \in P$ and then that
$E(m/n) = e^{m/n}$ for any $n \in P$, $m \in I$. In other words, $E(x) = e^x$ for
all $x \in \mathrm{Ra}$. Moreover, $E$ is a continuous function on Re by Exercise 4
below. Thus $E(x) = e^x$ for all real numbers $x$ by the preceding.
All of the above is an instance of a fairly general situation. Suppose
that we are given a function $G$ with domain Ra and range $\subseteq \mathrm{Re}$ such that:
(i) if $(x_k)$ is a fundamental sequence of rationals then $(G(x_k))$ is a
fundamental sequence of reals, and (ii) if $(x_k)$, $(y_k)$ are two such sequences with
$\lim_{k\to\infty} x_k = \lim_{k\to\infty} y_k$, then $\lim_{k\to\infty} G(x_k) = \lim_{k\to\infty} G(y_k)$. Then we can
unambiguously define a function $F$ on all real numbers $x$ by
$F(x) = \lim_{k\to\infty} G(x_k)$, whenever $x = \lim_{k\to\infty} x_k$, with all $x_k \in \mathrm{Ra}$. Then $F$ is an
extension of $G$ to Re which, under suitable additional conditions on $G$,
is also a continuous function on Re. There can be at most one such
continuous function. Thus, from the point of view of analysis, $F$ is the
natural extension of $G$ to Re (though there are, of course, infinitely many
noncontinuous extensions). In many cases various general properties
which hold for $G$ on Ra can be seen to continue to hold for $F$ on Re by
systematic use of the way in which $F$ is defined.
Returning to exponentiation, for any real number $b$ we can also consider
the function $F^{(b)}$ given by $F^{(b)}(x) = x^b$; this is defined only for real numbers
$x > 0$. It can be seen that $F^{(b)}$ is continuous at any number $a > 0$.
This involves considering

\[ |F^{(b)}(x) - F^{(b)}(a)| = |x^b - a^b|. \]

If we set $x = a + h$, then by a generalization of the binomial expansion
4.36 it can be shown that

\[ (a + h)^b = a^b + b\,a^{b-1}h + \frac{b(b-1)}{2!}\,a^{b-2}h^2 + \cdots, \]

which is in this case (for $b \notin P$) an infinite sum. Then

\[ |(a + h)^b - a^b| = |h| \cdot \left| b\,a^{b-1} + \frac{b(b-1)}{2!}\,a^{b-2}h + \cdots \right|, \]

which is close to 0 for $h$ close to 0. The validity of such an expansion and
the proof of continuity of $F^{(b)}$ (on its domain) is given in advanced courses
in calculus by means of a more thorough treatment of power series.

Exercise Group 7.4

1. Prove Lemma 7.47.
2. Show that if $G$, $H$ are continuous and $H(x) \neq 0$ for all $x$ then the
function $F$ determined by $F(x) = G(x)/H(x)$ for all $x$ is continuous.
3. Suppose that $F_0, F_1, \ldots, F_k, \ldots$ is a sequence of functions, each with
domain Re, and $F$ is a function on Re such that for each real number $x$,
$\lim_{k\to\infty} F_k(x) = F(x)$; in other words, for each $x$ and each $\epsilon > 0$ there
is an $m$ such that $|F(x) - F_k(x)| < \epsilon$ for all $k > m$. Given $a$, $b$ with
$a < b$, we say that the sequence $(F_k)$ converges uniformly to $F$ on the
interval $(a, b)$ if $m$ can be chosen independent of $x$ in $(a, b)$, i.e., for each
$\epsilon > 0$ there is an $m$ such that $|F(x) - F_k(x)| < \epsilon$ whenever $k > m$ and
$a < x < b$. Show that if this holds for $a$, $b$ and if $a < c < b$ and each
$F_k$ is continuous at $c$ then $F$ is continuous at $c$. (Examples can be given
to show that the hypothesis of uniformity cannot be dropped here.)
Hence if for each $k$, $F_k$ is continuous on Re and $(F_k)$ converges uniformly
to $F$ on every interval $(-a, a)$ then $F$ is continuous on Re.
4. Suppose that $F(x) = \sum_{i=0}^{\infty} a_i x^i$ is defined for every $x \in \mathrm{Re}$. Prove that
$F$ is continuous on Re.
5. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where $n > 0$ and $a_n = 1$.
Let $N$ be the sum of all the $|a_i|$ for which $a_i < 0$, or $N = 0$ if there are no such.
Let $M = \max(1, N)$. Show that $f(x) > 0$ if $x > M$ [cf. 7.53(i)]. What
is a comparable lower bound for the roots of $f(\xi)$?
6. Show that $x^n = a$ has:
(a) exactly two real solutions $x$ if $n$ is even and $a > 0$,
(b) exactly one real solution $x$ if $n$ is odd, and
(c) no solutions if $n$ is even and $a < 0$.
7. Given $n \in P$, $a \in \mathrm{Re}$, $a > 0$. Let $x_0$ be any number with $\sqrt[n]{a} < x_0$,
and then determine $(x_k)$ by

\[ x_{k+1} = \frac{n-1}{n}\,x_k + \frac{a}{n x_k^{n-1}}. \]

Show that $\sqrt[n]{a} < x_k$ for all $k$ (so that $x_k \neq 0$ and $x_{k+1}$ is well defined by
the preceding for all $k$), and that $\lim_{k\to\infty} x_k = \sqrt[n]{a}$.
8. How many real roots does/(£) = — ^£2 + £ — 2 have? Find the
smallest interval $[a, b]$ with $a, b \in I$ which contains all these roots.
9. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$ is of degree $n > 0$ and that $a < b$. Prove the
following:
(a) If $f(a) = f(b) = 0$ then $f'(\xi)$ has at least one root $c$ with $a < c < b$.
This is known as Rolle’s theorem. [Hint: Consider first the case that
$f(\xi)$ has no roots between $a$ and $b$, and use the factorization
$f(\xi) = (\xi - a)^k(\xi - b)^l g(\xi)$, where $g(a) \neq 0$, $g(b) \neq 0$.]
(b) No matter what $f(a)$, $f(b)$ are, there exists $c$ with $a < c < b$ and
$f'(c) = [f(b) - f(a)]/(b - a)$ (law of the mean for the differential
calculus).
(c) If $f'(x) > 0$ for all $x$ with $a < x < b$ then $f$ is increasing in $[a, b]$,
that is, $f(x) < f(y)$ whenever $a \leq x < y \leq b$. Similarly, if $f'(x) < 0$
for all $x$ with $a < x < b$ then $f$ is decreasing in $[a, b]$.
(d) If $f$ attains a maximum at $c$ in the interval $[a, b]$, that is, $a \leq c \leq b$
and $f(x) \leq f(c)$ whenever $a \leq x \leq b$, then either $c = a$ or $c = b$ or
$f'(c) = 0$. The same holds if $f$ attains a minimum at $c$ in the interval
$[a, b]$.
10. Develop the properties of the functions $G_a(x) = a^x$ $(x \in \mathrm{Ra})$ and
$F_a(x) = a^x$ $(x \in \mathrm{Re})$, as indicated at the end of this section.

7.5 Algebraic and transcendental numbers. At the beginning of this
chapter we suggested that the main algebraic motivation for extending
the rational number system to the real number system was to give us
greater freedom in finding roots of polynomials with rational coefficients
and greater insight into algebraic problems involving the rational numbers.
This has been achieved in part by the work of the preceding section, and
will be completed in the work of the next two chapters. On the other
hand, the motivation for the particular means to construct the real number
system in a satisfactory way was essentially geometric in nature, by
“filling the gaps” in the ordering of the rational numbers. We now turn
to the question of whether the algebraic approach would have led to the
same result. By considering functions like the exponential functions
discussed at the end of the preceding section, we may be led to suspect
that this is not so and that in this respect, the real numbers provide much
more than is necessary for the solution of polynomial equations over the
rationals.
For example, it is not difficult to see that for any real number $a$, we have
$a < 2^a$. Since $2^0 = 1$, and since the function $F(x) = 2^x - a$ is
continuous, it follows from Weierstrass’ Nullstellensatz that for any $a > 1$
there is a (unique) real number $x$ with $2^x - a = 0$, that is, $2^x = a$.
(Such a solution $x$ is usually denoted as $\log_2 a$; it can also be seen that
$\log_2 a$ is defined for any $a > 0$.) For example, we can solve the equation
$2^x = 3$. However, we see no way of finding a polynomial $f(\xi) \in \mathrm{Ra}[\xi]$
for which this $x$ is a root. To phrase the question involved more concisely,
we make the following definition.

7.61 Definition. A real number $x$ is said to be algebraic if there exists a
polynomial $f(\xi) \in \mathrm{Ra}[\xi]$ with $\deg(f(\xi)) > 0$ and $f(x) = 0$. If $x$ is
not algebraic it is said to be transcendental.
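As a small worked illustration of this definition (the particular number is our own choice and is not taken from the text), the real number $x = \sqrt{2} + \sqrt{3}$ is algebraic. Squaring twice removes the radicals:

```latex
\[
x = \sqrt{2} + \sqrt{3}
\;\Longrightarrow\; x^2 = 5 + 2\sqrt{6}
\;\Longrightarrow\; (x^2 - 5)^2 = 24
\;\Longrightarrow\; x^4 - 10x^2 + 1 = 0 .
\]
```

Thus $x$ is a root of $f(\xi) = \xi^4 - 10\xi^2 + 1 \in \mathrm{Ra}[\xi]$, a polynomial of degree 4 with rational coefficients.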

Thus the question we are raising is whether there exist transcendental
numbers. We shall prove in this section that there do. The significance
of this result, beyond what has already been indicated, will be discussed
again later.
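Returning for a moment to the equation $2^x = 3$ considered before Definition 7.61, its solution can at least be approximated by the halving method of Section 7.4, since $2^0 < a < 2^a$ for $a > 1$. The following is a floating-point sketch only; the name `log2_by_bisection` is our own, and the built-in power operator stands in for the function $F_2(x) = 2^x$ of the text.

```python
import math

def log2_by_bisection(a, steps=50):
    """Approximate the x with 2**x == a, for a > 1, by halving the interval [0, a]."""
    lo, hi = 0.0, float(a)        # 2**0 = 1 < a and a < 2**a
    for _ in range(steps):
        mid = (lo + hi) / 2
        if 2.0 ** mid < a:
            lo = mid              # the solution lies to the right of mid
        else:
            hi = mid              # the solution lies at or to the left of mid
    return (lo + hi) / 2

x = log2_by_bisection(3)          # approximately 1.58496
```

Such a computation says nothing, of course, about whether this $x$ is algebraic or transcendental.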
We have already used the word “transcendental” in speaking of simple
transcendental extensions $D = K[\xi]$ of a field $K$ in Definition 5.2. The
connection between that concept and the one given here is very direct.
Suppose that $x$ is a transcendental real number. Let $D$ consist of all
numbers $\sum_{i=0}^{n} a_i x^i$ for $a_i \in \mathrm{Ra}$; we write $D = \mathrm{Ra}[x]$ to indicate this
dependence. Hence $\mathrm{Ra}[x] = \{f(x) : f(\xi) \in \mathrm{Ra}[\xi]\}$. Then $\mathrm{Ra}[x]$ is easily seen to
be an integral domain (under the operations of Re) which satisfies
condition (i) of 5.2 for $x$ in place of $\xi$. Furthermore, it also satisfies 5.2(ii),
for if

\[ f(\xi) = \sum_{i=0}^{n} a_i \xi^i \in \mathrm{Ra}[\xi] \quad\text{and}\quad \sum_{i=0}^{n} a_i x^i = 0 \]

then by the preceding definition $\deg(f(\xi)) = 0$, $f(\xi) = a_0$, and hence
$a_0 = 0$; thus $a_i = 0$ for each $i \leq n$. In other words, if $\mathrm{Ra}[x]$ is defined
as above, it is a simple transcendental extension of Ra. It follows from
5.6 that $\mathrm{Ra}[\xi] \cong \mathrm{Ra}[x]$, under the natural correspondence $F(a) = a$ for
each $a \in \mathrm{Ra}$ and $F(\xi) = x$. Note, however, that the statement that there
exist transcendental real numbers, in the sense of 7.61, is a much more
special statement than the general existence theorem 5.7 for simple
transcendental extensions, even when the latter is restricted to $K = \mathrm{Ra}$.
Once the existence of transcendental numbers is proved, one can go on
to ask whether particular real numbers, such as $e$, $\pi$, $\sqrt{2}^{\sqrt{5}}$, the solution
$x$ of $2^x = 3$, etc., are transcendental. As we shall see, the first method we
shall use to obtain the existence of transcendental numbers will, in principle,
permit us to “exhibit” a specific number of this kind; however, this
approach would be impractical. In the remaining part of the section we
will take up a second proof which does show how to construct some
particular transcendental numbers in a simple explicit way. Concerning
the specific numbers just mentioned, the matter is more difficult; we shall
say more of this later.
Cantor's method. The first method of proof is due to Cantor, the founder
of the mathematical development of set theory, who lived toward the end
290 THE REAL NUMBERS [CHAP. 7

of the 1800's. In the specific problem we have before us, we are dealing
with two infinite sets $A$, $B$ with $B \subseteq A$, of which we are trying to show
that $A \neq B$, that is, $A - B$ is nonempty; namely, $A = \mathrm{Re}$, $B$ = the
set of algebraic real numbers. If we do not have any simple or direct
method to exhibit an element $x$ of $A - B$, we may ask whether there might
not be some other way of showing that $A$ has more elements than $B$.
For the case that $B \subseteq A$, this would simply amount to showing that $A$
and $B$ do not have the same number of elements. Speaking precisely, we
have in mind here using the notion of set-theoretical equivalence introduced
in (2:4-4). This was a special case of isomorphism of systems; sets $A$
and $B$ are set-theoretically equivalent if the systems $(A)$ and $(B)$ are
isomorphic, $(A) \cong (B)$.
We now introduce a special symbol for this relation.

7.62 Definition. We write $A \sim B$ if $A$ and $B$ are set-theoretically
equivalent, i.e., if there exists a one-to-one function $F$ with $\mathcal{D}(F) = A$
and $\mathcal{R}(F) = B$. We write $A \nsim B$ if $A \sim B$ is not true.

Clearly, $\sim$ satisfies the reflexivity, symmetry, and transitivity
conditions for an equivalence relation, that is, $A \sim A$, and if $A \sim B$ then
$B \sim A$, and if $A \sim B$ and $B \sim C$ then $A \sim C$. We shall say that $A$
and $B$ do, or do not, have the same number of elements according as
$A \sim B$, or $A \nsim B$. We have already used this explication of the intuitive
idea of number of elements in (2:4-5) to give precise definitions of the
notions of finite and infinite sets: $A$ is finite if whenever $B \sim A$ and $B \subseteq A$
then $B = A$, and $A$ is infinite if it is not finite, i.e., if there exists a set $B$
with $A \sim B$, $B \subseteq A$, and $B \neq A$. It might at first sight appear that
any two infinite sets are indistinguishable as to their number of elements.
However, it was Cantor's important realization that this is not the case;
he set forth the following theorem (which is easy to prove, once realized).

7.63 Theorem. Suppose that $A$ is any set and $S$ is the set of all subsets
of $A$, $S = \{X : X \subseteq A\}$. Then:
(i) there exists $A' \subseteq S$ such that $A \sim A'$, but
(ii) $A \nsim S$; in fact, there is no function $F$ with $\mathcal{D}(F) = A$ and
$\mathcal{R}(F) = S$.

Proof. (i) We put $X \in A'$ if $X$ has exactly one element, i.e., $X = \{x\}$
for some $x \in A$. The required one-to-one mapping $F$ from $A$ onto $A'$ is
given by $F(x) = \{x\}$.
(ii) Suppose, to the contrary, that there exists a function $F$ with
$\mathcal{D}(F) = A$, $\mathcal{R}(F) = S$. Let $B = \{x : x \in A \text{ and } x \notin F(x)\}$. Thus $B \in S$
and for some $b \in A$, $F(b) = B$. Either $b \in B$ or not. In the first case
$b \notin F(b)$ by the definition of $B$; but then $b \notin B$, which is impossible.
In the second case, that $b \notin B$, we have $b \notin F(b)$, hence $b$ is one of the
elements $x$ of $B$, that is, $b \in B$, which is again impossible. Thus the
assumption that there exists such a function $F$ leads us to a contradiction.
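
The argument for (ii) can be tried out mechanically on a small finite set. The Python sketch below runs through every function $F$ from $A = \{1, 2, 3\}$ into its set of subsets and confirms that the set $B = \{x : x \notin F(x)\}$ is never a value of $F$. The function names are hypothetical, and a finite check of this kind only illustrates the proof; it does not replace it.

```python
from itertools import combinations, product

def subsets(A):
    """Return all subsets of the finite set A as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(A, F):
    """The set B = {x in A : x not in F(x)} from the proof of 7.63(ii)."""
    return frozenset(x for x in A if x not in F[x])

def no_function_is_onto(A):
    """Check every F from A into S; B must lie outside the range of F."""
    items = list(A)
    S = subsets(A)
    for values in product(S, repeat=len(items)):
        F = dict(zip(items, values))      # one function F with domain A
        B = diagonal_set(A, F)
        if B in F.values():               # would contradict the theorem
            return False
    return True

print(no_function_is_onto({1, 2, 3}))     # 8**3 = 512 functions are checked
```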
We have seen in Exercise 5, Exercise Group 4.4, that if $A$ is finite then
$S$ is finite; namely, if $A \sim \{1, 2, \ldots, n\}$ where $n \in P$ then $S \sim
\{1, 2, \ldots, 2^n\}$. From (i) of the preceding it is intuitively clear that if $A$
is infinite then $S$ is infinite. Speaking precisely, if we can find $B \subseteq A$
with $B \sim A$ but $B \neq A$, then by (i) we can find $B' \subseteq A'$ with $B' \sim A'$
but $B' \neq A'$ (here $A'$ is the set of one-element subsets of $A$ used in the
proof of (i)). If we take $S_1 = (S - A') \cup B'$, it is easily seen that $S_1 \subseteq S$,
$S_1 \sim S$, but $S_1 \neq S$. According to (i) we should say that $S$ has at least
as many elements as $A$; but then by (ii) we should say that $S$ has a greater
number of elements than $A$. [Of course, it is trivial that $S$ properly contains
the particular set $A'$ defined in the proof of (i); what is not so trivial is
that no set $A'$ which satisfies (i) can be equal to $S$.] Hence, if $A$ is infinite
then, in some sense, $S$ is of a higher order, or magnitude, of infinity than
$A$. Since from any set $S$ we can form the set $\mathcal{S}(S)$ = the set of all subsets
of $S$, starting with any infinite set $A$ we can form the successively higher
orders of infinity corresponding to $A$, $\mathcal{S}(A)$, $\mathcal{S}(\mathcal{S}(A))$, .... In our work
here we will be concerned only with the first of such steps.

Denumerable and nondenumerable sets. The simplest infinite set to which
we can apply Cantor's result is $A = P$. Consider any subset $X$ of $P$ and
imagine the following procedure. We consider each positive integer
$1, 2, 3, \ldots$ in turn, and inquire at each step whether it is or is not a member
of $X$. If it is not, we write down 0, otherwise we write down 1. This gives
rise to an infinite sequence $(m_k)$, $k = 1, 2, 3, \ldots$, of 0's and 1's, to which
we can associate a real number $0.m_1 m_2 m_3 \ldots$ in the representation to the
base 2 (7.36). In other words, we are defining a function $F$ on the set
$\mathcal{S}$ of all subsets of $P$ with $F(X) = \sum_{i=0}^{\infty} m_i/2^i$ where $m_0 = 0$ and for each
$k \geq 1$,

$$m_k = \begin{cases} 1 & \text{if } k \in X, \\ 0 & \text{if } k \notin X. \end{cases}$$

At first sight this seems to show that $\mathrm{Re}$ has at least as many elements
as $\mathcal{S}$, and hence $\mathrm{Re} \nsim P$. We shall see that this is true, but there is still
a slight complication which must be overcome. The function $F$ just
defined is not one-to-one because of the failure, in certain cases, of the
uniqueness of the representation. Thus, for example, the sets $X_1 = \{1\}$
and $X_2 = \{2, 3, 4, \ldots\}$ will have $F(X_1) = F(X_2)$. Realizing this, we
need only modify our function $F$ slightly to guarantee uniqueness, namely
by taking, say, $F(X) = \sum_{i=0}^{\infty} m_i/3^i$ where the $m_k$ are determined as
before. To see that if $F(X_1) = F(X_2)$ then $X_1 = X_2$, we make use of
the uniqueness result 7.39 for numbers $a$ not of the form $b/3^k$ and of
Exercise 2, Exercise Group 7.3. Each real number $a$ of the form $b/3^k$
has just two representations, one ending eventually in 0's and the other
eventually in 2's; however, this last case never arises with the given choice
of the $m_i$'s.
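
The failure of uniqueness in base 2, and its repair in base 3, can be checked with exact rational arithmetic. In the following Python sketch a set $X$ is described by a finite part together with an optional tail "all $k \geq N$"; this encoding and the name `encode` are assumptions made for the example, and the tail is summed with the geometric series formula rather than term by term.

```python
from fractions import Fraction

def encode(finite_part, tail_start, base):
    """Exact value of the sum of base**(-k) over all k in X.

    X consists of the integers in finite_part together with, if tail_start
    is not None, every integer k >= tail_start. The caller is assumed to
    keep tail_start larger than every member of finite_part.
    """
    total = sum((Fraction(1, base ** k) for k in finite_part), Fraction(0))
    if tail_start is not None:
        # Geometric series: sum over k >= N of base**(-k)
        #   = 1 / (base**(N - 1) * (base - 1))
        total += Fraction(1, base ** (tail_start - 1) * (base - 1))
    return total

X1 = ({1}, None)          # the set {1}
X2 = (set(), 2)           # the set {2, 3, 4, ...}

print(encode(*X1, base=2) == encode(*X2, base=2))   # same value in base 2
print(encode(*X1, base=3) == encode(*X2, base=3))   # different in base 3
```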
This argument thus shows that there is a subset $S$ of $\mathrm{Re}$ with $S \sim \mathcal{S}$,
where $\mathcal{S}$ is the set of all subsets of $P$.
From this we will easily conclude that $P \nsim \mathrm{Re}$ and that, in fact, as in
7.63(ii), there is no function $F$ with $\mathcal{D}(F) = P$ and $\mathcal{R}(F) = \mathrm{Re}$. Since
this is a relationship that will keep recurring we introduce the following
terminology.

7.64 Definition. We say that a set $A$ is denumerable if either $A$ is empty
or there exists a function $F$ with $\mathcal{D}(F) = P$ and $\mathcal{R}(F) = A$; we say
that $A$ is nondenumerable if it is not denumerable.

The words "countable" and "uncountable" are also in frequent usage.
A denumerable set can be finite according to this definition, since we are
not requiring $F$ to be one-to-one. Some authors restrict the use of the
word "denumerable" (or "countable") to infinite sets. Others who use
the above definition single out the infinite denumerable sets by the phrase
"denumerably infinite." In any case, "nondenumerable" (and "uncountable")
is taken to apply only to infinite sets. We can now rephrase our
result 7.63(ii) by: the set of all subsets of $P$ is nondenumerable.

7.65 Theorem. Suppose that $A$, $B$ are any sets.
(i) If $A$ is denumerable and $A \sim B$ then $B$ is denumerable.
(ii) If $A$ is denumerable and $B \subseteq A$ then $B$ is denumerable.
(iii) $A$ is denumerable if and only if $A$ is finite or $P \sim A$.

Proof. (i) Given $F$ with $\mathcal{D}(F) = P$, $\mathcal{R}(F) = A$, and one-to-one $G$ with
$\mathcal{D}(G) = A$, $\mathcal{R}(G) = B$, we form the composition $F_1$ of $G$ and $F$, that is,
$F_1(n) = G(F(n))$ for all $n$. Then $\mathcal{D}(F_1) = P$, $\mathcal{R}(F_1) = B$.
(ii) Given $F$ with $\mathcal{D}(F) = P$, $\mathcal{R}(F) = A$, and $B \subseteq A$ with $B \neq \emptyset$,
let $x_1$ be any element of $B$. Define

$$F_1(n) = \begin{cases} F(n) & \text{if } F(n) \in B, \\ x_1 & \text{otherwise.} \end{cases}$$

Then $\mathcal{D}(F_1) = P$, $\mathcal{R}(F_1) = B$.
(iii) Suppose that $A$ is not finite and that we have $F$ with $\mathcal{D}(F) = P$
and $\mathcal{R}(F) = A$. Then for any $n \in P$ there exists an $x \in A$ with $x \notin
\{F(1), F(2), \ldots, F(n)\}$, for these sets are all finite. We shall now
construct a one-to-one function $G$ with $\mathcal{D}(G) = P$, $\mathcal{R}(G) = A$ by simply
eliminating the (possible) repetitions in $F$. The function $G$ is defined
recursively as follows. We take $G(1) = F(1)$. Suppose that we are given
$G(1), \ldots, G(n)$, defined in such a way that for some $m \geq n$, $\{G(1), \ldots,
G(n)\} = \{F(1), \ldots, F(m)\}$. Consider the least $k > m$ with $F(k) \notin
\{F(1), \ldots, F(m)\}$; we take $G(n + 1) = F(k)$. Then we can prove by
induction that for each $n$ there exists an $m \geq n$ with $\{G(1), \ldots, G(n)\} =
\{F(1), \ldots, F(m)\}$ and that the values $G(1), \ldots, G(n)$ are distinct. Thus
$G$ is seen to establish $P \sim A$. Conversely, if $P \sim A$ it is clear by
definition that $A$ is denumerable. If $A$ is finite and nonempty then $A \sim
\{1, \ldots, n\}$ for some $n \in P$. But then $A$ is denumerable by (i) and (ii).
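
The recursive construction of $G$ in (iii) amounts to passing through the values $F(1), F(2), \ldots$ and keeping each one only at its first appearance. A minimal Python sketch follows; the generator `F` below is a hypothetical enumeration of the integers chosen only because it repeats values, and the names are not from the text.

```python
from itertools import islice

def without_repetitions(F):
    """Yield G(1), G(2), ... from F(1), F(2), ..., skipping repeated values.

    F is any iterable listing the values F(1), F(2), ... in order.
    """
    seen = set()
    for value in F:
        if value not in seen:       # first k at which this value appears
            seen.add(value)
            yield value

def F():
    """A hypothetical enumeration of the integers with many repetitions."""
    n = 0
    while True:
        yield n // 3                # 0, 0, 0, 1, 1, 1, 2, ...
        yield -(n // 3)
        n += 1

print(list(islice(without_repetitions(F()), 7)))
```

If the range of `F` were finite, the generator would eventually search forever for a new value, which corresponds to the hypothesis in (iii) that $A$ is not finite.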

7.66 Theorem. $\mathrm{Re}$ is nondenumerable.

Proof. Let $\mathcal{S}$ be the set of all subsets of $P$. We have given an argument
(preceding 7.64) to show that there exists a subset $S$ of $\mathrm{Re}$ with $S \sim \mathcal{S}$.
But by Cantor's theorem, $\mathcal{S}$ is nondenumerable, hence so is $S$ by 7.65(i).
But then $\mathrm{Re}$ cannot be denumerable by 7.65(ii).
One often sees a more "concrete" proof of this theorem, which goes
something like the following. Suppose that $\mathrm{Re}$ were denumerable. Then
so also would be the subset $S$ consisting of all numbers $a$ with $0 \leq a \leq 1$
and with an expansion $a = 0.m_1 m_2 m_3 \ldots$ in the base 3 where each $m_i$
is 0 or 1. But then we could enumerate

$$S = \{a^{(1)}, a^{(2)}, a^{(3)}, \ldots\},$$

which we list as follows:

$$\begin{aligned}
a^{(1)} &= 0.m_1^{(1)} m_2^{(1)} m_3^{(1)} \ldots \\
a^{(2)} &= 0.m_1^{(2)} m_2^{(2)} m_3^{(2)} \ldots \\
a^{(3)} &= 0.m_1^{(3)} m_2^{(3)} m_3^{(3)} \ldots \\
&\;\;\vdots
\end{aligned} \tag{7:5-1}$$

For contradiction, we shall produce a number $a \in S$ which is not in this
list. Given any number $m = 0$ or 1, we write

$$\bar{m} = \begin{cases} 1 & \text{if } m = 0, \\ 0 & \text{if } m = 1. \end{cases}$$

Then we take

$$a = 0.\bar{m}_1^{(1)} \bar{m}_2^{(2)} \bar{m}_3^{(3)} \ldots \tag{7:5-2}$$

The representation of $a$ differs from that of each member of the supposed
list in at least one position, namely it differs from $a^{(n)}$ at the $n$th position.
By the uniqueness of the representations involved, $a$ is not in the list.
There are no ideas in this proof which are not already contained in the
proof of 7.66 via Cantor's theorem 7.63(ii). However, because of the form
of the argument, the basic technique employed in the proof here or in
7.63(ii) is often called Cantor's diagonal method.
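
The diagonal construction can be illustrated on a finite initial block of such a list. In the Python sketch below, the rows are hypothetical digit strings standing for the first few digits of the first few listed numbers; the function returns a string that differs from the $n$th row at the $n$th position.

```python
def diagonal(rows):
    """Given digit strings over {'0', '1'}, return a string differing from
    the n-th row at the n-th position (n = 1, 2, ..., len(rows))."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[row[n]] for n, row in enumerate(rows))

rows = ["0000", "0101", "1110", "1011"]     # hypothetical initial digits
a = diagonal(rows)
print(a)
print(all(a[n] != rows[n][n] for n in range(len(rows))))
```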

We can now begin to see where the hope lies in proving the existence of
transcendental numbers: we should try to show that the set of algebraic
real numbers is denumerable. We shall do this in a series of steps, by showing
that $I$, $\mathrm{Ra}$, $\mathrm{Ra}[\xi]$, and finally the set of all roots of members of $\mathrm{Ra}[\xi]$,
are all denumerable. This will be possible by the following three general
theorems.

7.67 Theorem.
(i) $P \times P \sim P$.
(ii) If $A$, $B$ are denumerable so is $A \times B$.

Proof. (i) Define a function $F$ with $\mathcal{D}(F) = P \times P$ and $\mathcal{R}(F) \subseteq P$ by
$F(n, m) = 2^n \cdot 3^m$ for each $n, m \in P$. By uniqueness of the prime power
representation of integers we see that $F$ is one-to-one. Hence $P \times P \sim
\mathcal{R}(F)$. Since $\mathcal{R}(F) \subseteq P$ and $\mathcal{R}(F)$ is infinite, $\mathcal{R}(F) \sim P$ by 7.65(ii), (iii).
Hence $P \times P \sim P$ by transitivity of $\sim$.
The proof of (ii) is left to the student. We also give in Exercise 2(b)
below another, more direct way of proving (i).
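
That $F(n, m) = 2^n \cdot 3^m$ is one-to-one can be seen concretely by recovering $(n, m)$ from the value. The following Python sketch does this by counting factors of 2 and 3; the name `decode` is introduced here for illustration only.

```python
def F(n, m):
    """The map of 7.67(i): F(n, m) = 2**n * 3**m."""
    return 2 ** n * 3 ** m

def decode(value):
    """Recover (n, m) from 2**n * 3**m by counting factors of 2 and 3."""
    n = m = 0
    while value % 2 == 0:
        value //= 2
        n += 1
    while value % 3 == 0:
        value //= 3
        m += 1
    if value != 1:
        raise ValueError("not of the form 2**n * 3**m")
    return n, m

pairs = [(n, m) for n in range(1, 30) for m in range(1, 30)]
print(all(decode(F(n, m)) == (n, m) for n, m in pairs))
```

The range of this $F$ is a proper subset of $P$ (for example, 5 is not a value), which is why the proof appeals to 7.65(ii), (iii) rather than claiming that $F$ is onto.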

7.68 Theorem. Suppose that $M$ is any denumerable class of sets and that
for each $X \in M$, $X$ is denumerable. Then $\bigcup X\,[X \in M]$ is denumerable.

Proof. For the proof we can assume, without loss of generality, that
$M \neq \emptyset$ and that each $X \in M$ is nonempty, since the empty set contributes
nothing to the union considered. By hypothesis, there exists a function
$F$ with $\mathcal{D}(F) = P$ and $\mathcal{R}(F) = M$. We denote by $X_k$ the value $F(k)$, so
that $M = \{X_1, X_2, \ldots, X_k, \ldots\}$. For each $k$ there exists by hypothesis
a function $G_k$ with $\mathcal{D}(G_k) = P$ and $\mathcal{R}(G_k) = X_k$. If $x \in \bigcup X\,[X \in M] =
\bigcup X_k\,[k \in P]$ then for some $k \in P$, $x \in X_k$ and hence for some $l \in P$,
$x = G_k(l)$. By 7.67(i), $P \sim P \times P$, so that there exists a function $H$
with $\mathcal{D}(H) = P$, $\mathcal{R}(H) = P \times P$. Given $n \in P$ there are thus unique
$k, l \in P$ with $H(n) = (k, l)$. Hence there exist two functions $H_1$, $H_2$ with
$\mathcal{D}(H_1) = \mathcal{D}(H_2) = P$ and $H(n) = (H_1(n), H_2(n))$ for every $n$; these
have the property that for any $k, l \in P$ we can find $n \in P$ with $H_1(n) = k$,
$H_2(n) = l$. We can thus define a function $G^*$ with $\mathcal{D}(G^*) = P$ and
$\mathcal{R}(G^*) = \bigcup X_k\,[k \in P]$ by $G^*(n) = G_{H_1(n)}(H_2(n))$ for every $n$.
[Note that this proof uses the axiom of choice, in that for each $k$ we must
choose one of many possible functions $G$ with $\mathcal{D}(G) = P$, $\mathcal{R}(G) = X_k$.
However, in most applications the functions $G_k$ can be given explicitly
in advance.]

A more "visual" proof of the above is sometimes given as follows.
We can enumerate $X_k$ as $\{x_1^{(k)}, x_2^{(k)}, \ldots, x_l^{(k)}, \ldots\}$ (possibly with
repetitions). By following the successive arrows in the diagram shown below,
starting with $x_1^{(1)}$, we obtain an enumeration of $\bigcup X_k\,[k \in P]$:

(7:5-3) [Diagram: the array of the elements $x_l^{(k)}$, traversed by arrows along its successive diagonals.]

We are in effect using here the enumeration of $P \times P$, given by Exercise
2(b) below, as (1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), (1, 4), (2, 3), . . .
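
The diagonal enumeration of $P \times P$ just listed is easy to generate, and its positions agree with the formula of Exercise 2(b). In the Python sketch below the function names are hypothetical; `pair_index` is the formula $(n + m - 2)(n + m - 1)/2 + n$ from that exercise.

```python
from itertools import islice

def pair_index(n, m):
    """Position of (n, m) in the diagonal enumeration (Exercise 2(b))."""
    return (n + m - 2) * (n + m - 1) // 2 + n

def diagonal_pairs():
    """Yield (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ... indefinitely."""
    total = 2                          # total = n + m along each diagonal
    while True:
        for n in range(1, total):
            yield (n, total - n)
        total += 1

first = list(islice(diagonal_pairs(), 8))
print(first)
print([pair_index(n, m) for n, m in first])
```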

7.69 Theorem. Suppose that $A$ is denumerable. Let $S$ be the set consisting
of all finite sequences $(x_1, \ldots, x_n)$ of elements of $A$, for all $n \in P$.
Then $S$ is denumerable.

Proof. For each $n \in P$ let $S_n$ be the set of all sequences $(x_1, \ldots, x_n)$ of
length exactly $n$, where each $x_i \in A$. We prove by induction on $n$ that $S_n$
is denumerable. Clearly, $A \sim S_1$. Note that $S_{n+1} \sim S_n \times A$, with a
suitable one-to-one mapping given by the function which associates with
any sequence $(x_1, \ldots, x_n, x_{n+1})$ the ordered pair $((x_1, \ldots, x_n), x_{n+1})$.
Hence if $S_n$ is denumerable, so is $S_{n+1}$ by 7.67(ii). The theorem now follows
from 7.68, since $S = \bigcup S_n\,[n \in P]$.

The existence of transcendental real numbers.

7.70 Theorem. The set $A$ of all algebraic real numbers is denumerable.
Hence there exist transcendental real numbers.

Proof. It of course follows from the theorem that every subset of $A$
is also denumerable. However, as we have already remarked, the proof
will proceed by working up to $A$ through several of its subsets. We first
show that

(1) $I$ is denumerable.

For $I$ is the union of three denumerable sets, $I = P \cup \{0\} \cup \{-n : n \in P\}$.

(2) $\mathrm{Ra}$ is denumerable.

To see this, let $\mathrm{Ra}_n = \{m/n : m \in I\}$ for each $n \in P$. Thus $I \sim \mathrm{Ra}_n$
and hence each $\mathrm{Ra}_n$ is denumerable by (1). But each rational number
belongs to at least one $\mathrm{Ra}_n$, $\mathrm{Ra} = \bigcup \mathrm{Ra}_n\,[n \in P]$, so $\mathrm{Ra}$ is denumerable
by 7.68.
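
Steps (1) and (2) can be combined into an explicit enumeration of $\mathrm{Ra}$, with repetitions, in the manner of the proof of 7.68. The Python sketch below is one possible arrangement under stated assumptions: the particular enumeration $0, 1, -1, 2, -2, \ldots$ of $I$ and the function names are choices made for this example.

```python
from fractions import Fraction
from itertools import islice

def integer_at(j):
    """The j-th integer (j = 1, 2, ...) in the order 0, 1, -1, 2, -2, ..."""
    return j // 2 if j % 2 == 0 else -(j // 2)

def rationals():
    """Enumerate Ra, with repetitions, as the union of the sets Ra_n.

    Pairs (n, j) are visited along diagonals n + j = 2, 3, 4, ...; the pair
    (n, j) contributes the j-th element of Ra_n, namely integer_at(j) / n.
    """
    total = 2
    while True:
        for n in range(1, total):
            j = total - n
            yield Fraction(integer_at(j), n)
        total += 1

print(list(islice(rationals(), 10)))
```

Repetitions such as $0/1 = 0/2$ are harmless here, since Definition 7.64 does not require the enumerating function to be one-to-one.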

(3) $\mathrm{Ra}[\xi]$ is denumerable.

For let $S$ be the set of all finite sequences $(a_0, \ldots, a_n)$ where each $a_i \in \mathrm{Ra}$
and $n$ is an arbitrary integer $\geq 0$. Then $S$ is denumerable by 7.69 so that
there is a function $F$ with $\mathcal{D}(F) = P$, $\mathcal{R}(F) = S$. We can define a function
$G$ with $\mathcal{D}(G) = S$ and $\mathcal{R}(G) = \mathrm{Ra}[\xi]$ by

$$G((a_0, \ldots, a_n)) = \sum_{i=0}^{n} a_i \xi^i \quad \text{for any } (a_0, \ldots, a_n) \in S.$$

Then the composite function $H$ formed from $F$ followed by $G$ has $\mathcal{D}(H) = P$,
$\mathcal{R}(H) = \mathrm{Ra}[\xi]$.

(4) $A$ is denumerable.

Here we associate with each $f(\xi) \in \mathrm{Ra}[\xi]$ with $\deg(f(\xi)) > 0$ the set
$X_{f(\xi)} = \{x : x \in \mathrm{Re} \text{ and } f(x) = 0\}$ of all real roots of $f(\xi)$ (possibly
empty). Let $M$ be the class of all these sets; $X \in M$ if and only if
$X = X_{f(\xi)}$ for some such $f(\xi)$.
By definition, a real number is algebraic if and only if it belongs to some
member of $M$, $A = \bigcup X\,[X \in M]$. Here $M$ is seen to be denumerable by
(3) and 7.65(ii); each member $X_{f(\xi)}$ of $M$ is finite. Hence (4) is proved by 7.68.
By going over the proofs again it can be seen that an enumeration of
$\mathrm{Ra}[\xi]$ can be explicitly described. Furthermore, Sturm's theorem provides
us with a method to explicitly find the number and location, to any desired
degree of accuracy, of all the roots of any given $f(\xi) \in \mathrm{Ra}[\xi]$. But then
we can describe an explicit enumeration $A = \{a^{(1)}, a^{(2)}, a^{(3)}, \ldots\}$ of the
algebraic real numbers by the
expansions of $a^{(1)}, a^{(2)}, a^{(3)}, \ldots$ in any chosen base $b$. Hence by a variant
of Cantor's diagonal method (cf. 7:5-1), we can give a procedure which
will exhibit the representation to the base $b$ of a certain number $a \notin A$.
It is in this sense that we can, in principle, use the above method of proof
to exhibit a transcendental number. Since we shall now present a method,
due to Liouville, for exhibiting such numbers in a much more perspicuous
form, there is no reason to try to pursue the former approach.
Cantor's method does not make any special use of the algebraic properties
of the real numbers. For this reason it is adaptable to a wide variety
of situations in mathematics when one wants to compare various sets. It is
also for this reason that we have presented it here. In contrast, Liouville’s
method makes essential use of the algebraic properties of real numbers
and is thus limited in scope. However, it has more of particular interest
to tell us about algebraic and transcendental numbers.

Liouville's method. The method is based on the following fact. Suppose
that $x$ is an irrational real number. Consider any $m_0 \in P$. If we consider
the rationals $k/m$ where $k \in I$, $m \in P$, and $m \leq m_0$, it is intuitively clear
that one of these rationals is closest to $x$, say $k_1/m_1$. (What result leads to
a proof of this statement?) If we set $|x - k_1/m_1| = \varepsilon$, then $\varepsilon > 0$.
Furthermore, for any $k$, $l$ with $k \in I$ and $l \in P$, $|x - k/l| < \varepsilon$ implies
that $l > m_0$. Thus for any $x \in \mathrm{Re} - \mathrm{Ra}$ and $m \in P$ there exists an
$\varepsilon > 0$ such that if $k, l \in I$ and $1 \leq l \leq m$ then $|x - k/l| \geq \varepsilon$. We
want to see now what can be said about how such $\varepsilon$ depends on $x$ and $m$.
It turns out that if $x$ is algebraic, one can ensure a kind of dependence
which can be shown not to hold for every real number.
The simplest sort of dependence we could hope for would be that,
given $x$, we can find $M > 0$ such that for any $l \in P$, $|x - k/l| \geq M/l$
holds for all $k \in I$. For then taking $\varepsilon = M/m$, we would have $|x - k/l| \geq
M/l \geq \varepsilon$ whenever $k, l \in I$ and $1 \leq l \leq m$. This, however, is in general
not possible. The next simplest sort of dependence we could hope for
would be that, given $x$, we can find $M > 0$ and $n \in P$ such that for any
$l \in P$, $|x - k/l| \geq M/l^n$ holds for all $k \in I$. This would give the desired
result with $\varepsilon = M/m^n$. We shall see that this can be done for $x$ algebraic
and irrational. We shall find $M$, $n$ via a polynomial $f(\xi) \in \mathrm{Ra}[\xi]$ of
which $x$ is a root.
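As an example, suppose we have such $f(\xi)$ of degree 2. Then we can
choose $f(\xi)$ with integer coefficients, $f(\xi) = a\xi^2 + b\xi + c$ where $a$, $b$,
$c \in I$, such that $f(x) = 0$. We know by 7.57 that $b^2 - 4ac > 0$, and
we have some $y$ with $f(\xi) = a(\xi - x)(\xi - y)$. Since the roots of $f(\xi)$
must take the form $(-b \pm \sqrt{b^2 - 4ac})/2a$ and since $x$ is not rational,
$y$ must also be irrational. Consider any $k \in I$ and $l \in P$. Then

$$f\!\left(\frac{k}{l}\right) = \frac{ak^2 + bkl + cl^2}{l^2} = a\left(\frac{k}{l} - x\right)\left(\frac{k}{l} - y\right).$$

Since $ak^2 + bkl + cl^2 \in I$, and since this integer is not 0 because $f(\xi)$ has
no rational root, $|ak^2 + bkl + cl^2| \geq 1$. Hence

$$\left|\frac{k}{l} - x\right| \geq \frac{1}{l^2\,|a|\,\left|\frac{k}{l} - y\right|}. \tag{7:5-4}$$

This suggests that, given $x$ and the quadratic polynomial $f(\xi)$ of which
it is a root, we can find $M > 0$ such that

$$\left|x - \frac{k}{l}\right| \geq \frac{M}{l^2} \quad \text{for any } k \in I,\ l \in P.$$

In fact, let $d = \max(|x|, |y|)$, so $d > 0$. Then it is seen that if $|k/l| \geq 2d$
we have $|k/l - x| \geq d \geq d/l^2$ (since $1 \leq l$). On the other hand, if
$|k/l| < 2d$ we have

$$\left|\frac{k}{l} - y\right| \leq \left|\frac{k}{l}\right| + |y| < 3d,$$

so that by (7:5-4)

$$\frac{1}{\left|\frac{k}{l} - y\right|} > \frac{1}{3d} \quad \text{and} \quad \left|\frac{k}{l} - x\right| > \frac{1}{l^2 \cdot 3d\,|a|}.$$

Thus a suitable choice of $M$ is $\min(d, 1/(3d|a|))$.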
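
For a concrete instance of the quadratic case, take $x = \sqrt{2}$ with $f(\xi) = \xi^2 - 2$, so that $a = 1$, $y = -\sqrt{2}$, and $d = \sqrt{2}$. The Python sketch below compares $M = \min(d, 1/(3d|a|))$ with the smallest value of $l^2\,|x - k/l|$ found for $l \leq 1000$. It uses floating-point arithmetic, which is an assumption of convenience; the quantities involved are far larger than the rounding error, but the computation is a numerical check and not a proof.

```python
import math

x = math.sqrt(2)                 # root of f = xi^2 - 2, so a = 1
y = -math.sqrt(2)                # the other root of f
a = 1
d = max(abs(x), abs(y))
M = min(d, 1 / (3 * d * abs(a)))

def worst_ratio(max_l):
    """Smallest value of l**2 * |x - k/l| over 1 <= l <= max_l, where k is
    the integer nearest to l*x (the best choice of k for that l)."""
    return min(l * l * abs(x - round(l * x) / l) for l in range(1, max_l + 1))

print(M, worst_ratio(1000))
print(worst_ratio(1000) >= M)
```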


The generalization of the above leads to Liouville’s theorem:

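7.71 Theorem. Suppose that $f(\xi) \in \mathrm{Ra}[\xi]$, $\deg(f(\xi)) = n > 1$, and
suppose that $f(x) = 0$ where $x \notin \mathrm{Ra}$. Then we can find a real number
$M > 0$, depending on $f(\xi)$ and $x$, such that whenever $k \in I$ and
$l \in P$ then

$$\left|x - \frac{k}{l}\right| \geq \frac{M}{l^n}.$$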

Proof. If $f(\xi)$ has any rational root $y$ then $f(\xi) = (\xi - y)g(\xi)$ where
$\deg(g(\xi)) = n - 1$ and $g(x) = 0$. By successively dividing out all
rational roots we will eventually reach a polynomial $f_1(\xi)$ of degree
$k \leq n$ with no rational roots and such that $f_1(x) = 0$. If the result is
proved for such polynomials, then from $M/l^k \geq M/l^n$ we get the stated
result. Hence we can assume at the start, with no loss of generality, that

(1) $f(\xi)$ has no rational roots.

We can write $f(\xi) = \sum_{i=0}^{n} r_i \xi^i$ where $r_i = c_i/d_i$, $c_i, d_i \in I$. Then by
multiplying $f(\xi)$ by $d_0 \cdot d_1 \cdots d_n$, we again have a polynomial satisfying
(1) with $x$ as a root and with integer coefficients. We can thus also assume
that

(2) $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where all $a_i \in I$ and $a_n \neq 0$.

Now we claim that

(3) there exists a real number $N > 0$ which depends only on $x$ and $f(\xi)$
such that $|f(x) - f(y)| \leq N|x - y|$ whenever $|x - y| \leq 1$.

This can be seen in several ways. For example, if we use Exercise 9(b) of
the preceding section, for any $x$, $y$ we can find $u$ between $x$ and $y$ with

$$\frac{f(x) - f(y)}{x - y} = f'(u).$$

But by 7.49 we can choose $N > 0$ so that $|f'(u)| \leq N$ whenever
$|x - u| \leq 1$. This $N$ then satisfies (3). Another proof is suggested in
the exercises for this section.
Now since $f(x) = 0$, we have for such $N$ that satisfies (3),

$$\left|f\!\left(\frac{k}{l}\right)\right| \leq N\left|x - \frac{k}{l}\right|$$

for all $k \in I$, $l \in P$ with $|x - (k/l)| \leq 1$. But by (2),

$$l^n f\!\left(\frac{k}{l}\right) = \sum_{i=0}^{n} a_i k^i l^{n-i} \in I$$

and $f(k/l) \neq 0$ by (1), so that $|f(k/l)| \geq 1/l^n$. Hence

(4) $\left|x - \frac{k}{l}\right| \geq \frac{1}{N l^n}$ whenever $k \in I$, $l \in P$ and $\left|x - \frac{k}{l}\right| \leq 1$.

Thus if we take $M = \min(1, 1/N)$, we have the desired result, since
$|x - (k/l)| \geq 1$ implies

$$\left|x - \frac{k}{l}\right| \geq M \geq \frac{M}{l^n}.$$
Now to use this theorem to prove the existence of transcendental numbers,
we need only find numbers $x$ which contradict the conclusion of 7.71
for every $n$. That is, for every $n$ and $M > 0$ there should be $k \in I$,
$l \in P$ with $|x - (k/l)| < M/l^n$. This leads us to the following theorem,
the proof of which we leave to the reader.

7.72 Theorem. (i) Suppose that $x$ is a real irrational number such that for
every $n \in P$ there exist $k \in I$, $l \in P$ with $|x - (k/l)| < 1/(n l^n)$. Then
$x$ is transcendental.
(ii) Suppose that $b \in P$, $b > 1$, and $x = \sum_{i=1}^{\infty} b^{-(i!)}$. Then $x$ is
transcendental.

Note that, no matter what b is, the representation to the base b of the
number exhibited in (ii) is 0.11000100000000000000000100 . . . Numbers
x which satisfy the hypothesis of (i) are often called Liouville numbers,
of which one can produce many examples in the spirit of (ii). It is by no
means true that every transcendental number is a Liouville number.
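
The approximation property behind 7.72(ii) can be checked with exact rational arithmetic. The Python sketch below takes $k/l$ to be the partial sum with $n + 1$ terms, so $l = b^{(n+1)!}$, and bounds the remaining tail by a geometric series; the choice of $n + 1$ terms (rather than $n$) and the function names are assumptions made so that the check goes through uniformly for every $b > 1$. It verifies the inequality of 7.72(i) for a few values only and is not a substitute for the proof asked for in the exercises.

```python
from fractions import Fraction
from math import factorial

def partial_sum(b, j):
    """s_j = sum of b**(-i!) for i = 1, ..., j, as an exact rational."""
    return sum((Fraction(1, b ** factorial(i)) for i in range(1, j + 1)),
               Fraction(0))

def tail_bound(b, j):
    """Upper bound for x - s_j: the tail is at most the geometric series
    sum over t >= (j+1)! of b**(-t), which is b / ((b - 1) * b**((j+1)!))."""
    return Fraction(b, (b - 1) * b ** factorial(j + 1))

def liouville_condition(b, n):
    """Check |x - k/l| < 1/(n * l**n) for k/l = s_(n+1), l = b**((n+1)!)."""
    l = b ** factorial(n + 1)
    return tail_bound(b, n + 1) < Fraction(1, n * l ** n)

print(all(liouville_condition(b, n) for b in (2, 3, 10) for n in range(1, 5)))
print(partial_sum(10, 4))
```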
It is quite another (and, in general, much more difficult) matter to show
that certain specific "interesting" numbers are transcendental. For example,
it is known that $e$, $\pi$, and $\sqrt{2}^{\sqrt{5}}$ are transcendental. On the other
hand, it is not known whether the same is true of $e + \pi$, $e\pi$. Even proofs
that such numbers are irrational are not always simple. One example of
such a proof is suggested in the exercises.

It may be asked why it would be considered sufficient from the algebraic
point of view to deal with algebraic real numbers, rather than the full
real number system. For, it might be argued, since there are polynomials
with rational coefficients which have real (and hence algebraic) roots but
have no rational roots, is it not also possible that there are polynomials
with algebraic coefficients which have real roots but have no algebraic
roots? If this is the case, we should also consider the real numbers which
are roots of such polynomials. It is at first sight conceivable that by
repeatedly extending the set of real numbers which are of algebraic
interest in this way, we could eventually obtain all real numbers.
More precisely, suppose that $K$ is any set of real numbers containing
$\mathrm{Ra}$. Let $Al(K)$ be the set of all $x \in \mathrm{Re}$ such that for some $f(\xi) \in K[\xi]$,
$\deg(f(\xi)) > 0$ and $f(x) = 0$. Then define the sets $\mathrm{Ra}_m$ inductively by:
$\mathrm{Ra}_1 = \mathrm{Ra}$, $\mathrm{Ra}_{m+1} = Al(\mathrm{Ra}_m)$. Thus, the set of algebraic real numbers
is just $\mathrm{Ra}_2$. Let $\mathrm{Ra}^* = \bigcup \mathrm{Ra}_m\,[m \in P]$. Then we see that $Al(\mathrm{Ra}^*) = \mathrm{Ra}^*$.
For if $f(\xi) \in \mathrm{Ra}^*[\xi]$ and $\deg(f(\xi)) > 0$ then $f(\xi) \in \mathrm{Ra}_m[\xi]$ for some $m$;
hence any root of $f(\xi)$ is in $\mathrm{Ra}_{m+1}$ and therefore in $\mathrm{Ra}^*$. Thus $\mathrm{Ra}^*$ has
a very satisfactory algebraic closure property. The question raised above
is simply whether $\mathrm{Ra}^* = \mathrm{Re}$. The answer to this question can already
be seen to be negative by using Cantor's methods. For, the same sort of
argument that was used to prove 7.70 can be used to show that if $K$ is
denumerable then so also is $Al(K)$. Hence, we see by induction that each
$\mathrm{Ra}_m$ is denumerable, so that $\mathrm{Ra}^*$ is denumerable by 7.68. Thus it follows
that $\mathrm{Ra}^* \neq \mathrm{Re}$.
This still leaves open the question of whether, perhaps, by dealing just
with $\mathrm{Ra}_2$ (algebraic numbers) and $\mathrm{Re} - \mathrm{Ra}_2$ (transcendental numbers)
we are still working within too limited a framework. The surprising result
is that this is not so because $Al(\mathrm{Ra}_2) = \mathrm{Ra}_2$, hence $\mathrm{Ra}^* = \mathrm{Ra}_2$. We
shall prove this at the end of the next chapter, by which time the ground
will be adequately prepared for a fuller insight into the behavior of
algebraic numbers. In addition, we shall show that the algebraic numbers
form a field, so that they are closed under all the algebraic processes
studied here. (Of course, in light of the result that $\mathrm{Ra}_3 = \mathrm{Ra}_2$, hence all
$\mathrm{Ra}_m = \mathrm{Ra}_2$ for $m \geq 2$, the preceding discussion in terms of Cantor's
method is just academic. Its main purpose was to show what conclusions
could be drawn about Ra* with the information presently at hand.)
All of this discussion has been restricted to real solutions of polynomial
equations. To complete the picture we are necessarily led to the study of
systems in which we can find roots of polynomials which have no real
roots. The simplest such polynomial is $\xi^2 + 1$. It forms the starting point
for the introduction of the complex numbers, which we take up in the
next chapter.

Exercise Group 7.5

1. Let $A$ be the set of all real numbers $x$ of the form $\sum_{i=1}^{\infty} m_i/3^i$ where each
$m_i = 0$ or 2.
(a) Is $A$ denumerable? Prove your result.
This set $A$ (called Cantor's ternary set) can be pictured as the
intersection of the following sets:

$A_1$ consists of the interval $[0, 1]$ from which the open interval
$(\frac{1}{3}, \frac{2}{3})$ has been removed.

$A_2$ consists of $A_1$ from which the open intervals
$(\frac{1}{9}, \frac{2}{9})$, $(\frac{7}{9}, \frac{8}{9})$ have been removed.

$A_3$ is again obtained from $A_2$ by removing the middle open thirds
of the intervals in $A_2$.

Convince yourself that $A = \bigcap A_n\,[n \in P]$.
(b) If we call the length of $A_n$ the sum of the lengths of the intervals of
$A_n$, we have length$(A_1) = \frac{2}{3}$, length$(A_2) = \frac{4}{9}$, length$(A_3) = \frac{8}{27}$,
etc. What is the length $l_n$ of $A_n$? What is $\lim_{n \to \infty} l_n$? (It is natural
to ascribe this limit as "length" or "measure" to $A$.)
2. (a) Prove Theorem 7.67(ii).
(b) Let

$$F(n, m) = \frac{(n + m - 2)(n + m - 1)}{2} + n.$$

Show that $F$ is a one-to-one function with $\mathcal{D}(F) = P \times P$, $\mathcal{R}(F) = P$.
[Hint: Consider the function $G(l) = (l - 2)(l - 1)/2$; show that
for each $k \in P$ there exists a unique integer $l \geq 2$ with $G(l) < k \leq
G(l + 1)$. Then $k = F(n, m)$ with $n = k - G(l)$, $m = l - n$.]
3. Let $S = \{X : X \subseteq P \text{ and } X \text{ is finite or } P - X \text{ is finite}\}$. Is $S$
denumerable? Prove your result.
4. Prove that if $x \in \mathrm{Re}$ and $m \in P$ then there exist $k, l \in I$ such that
$1 \leq l \leq m$ and $|x - k/l| \leq 1/(l(m + 1))$, by the following method. For
any real number $y$, let $[y]$ be the unique integer $q$ with $q \leq y < q + 1$
(7.35). Consider the $m + 1$ numbers $jx - [jx]$ for $j = 0, 1, 2, \ldots, m$.
Show that if these are arranged as a sequence $y_0, y_1, \ldots, y_m$ in increasing
order, $y_0 \leq y_1 \leq \cdots \leq y_m$, at least one of the differences $y_1 - y_0$,
$y_2 - y_1, \ldots, y_m - y_{m-1}$, $(1 - y_m) + y_0$ is $\leq 1/(m + 1)$. By examining
the differences, obtain the desired conclusion. What does this result
show about possible improvements of Liouville's theorem 7.71?
5. Show that if $x \in \mathrm{Re} - \mathrm{Ra}$, then there are infinitely many $(k, l)$ with
$k \in I$, $l \in P$ and $|x - k/l| < 1/l^2$.
6. (a) Show that if $x \in \mathrm{Re}$ and $i$ is a nonnegative integer, then there exists
$N > 0$ such that $|x^i - y^i| \leq N|x - y|$ whenever $|x - y| \leq 1$.
(b) Use the result of (a) to give a direct proof of step (3) in the argument
for Theorem 7.71.
7. (a) Prove Theorem 7.72(i), (ii).
(b) Give another example of a Liouville number.
8. (a) Suppose $a = \sum_{i=0}^{\infty} (-1)^i a_i$ exists, where each $a_i > 0$. Show that [...]

(b) Using [...]

from 7.45, show that $e^{-1}$, and hence also $e$, is irrational. [Hint: Consider
$n!\,(e^{-1} - \sum_{i=0}^{n} (-1)^i/i!)$.] This can be generalized to show that
$e^{-k}$, and hence also $e^k$, is irrational for each $k \in P$.
CHAPTER 8

THE COMPLEX NUMBERS

8.1 Basic properties. Characterization of the complex numbers. We begin
by investigating what can be said about a field $(K, +, \cdot, 0, 1)$ which
contains the real numbers as a subfield and some root $u$ of $f(\xi) = \xi^2 + 1$,
that is, an element $u$ with $u^2 = -1$. Clearly, there is no relation $<$ which
will make $K$ into an ordered field, since in an ordered field $-1 < 0$ and
$x^2 \geq 0$ for all $x$.

8.1 Theorem. Suppose $(K, +, \cdot, 0, 1)$ is a field which contains $\mathrm{Re}$ as a
subfield, and suppose $u \in K$ is such that $u^2 = -1$. Then for any
$x, y, x_1, y_1 \in \mathrm{Re}$ we have:
(i) $x + uy = x_1 + uy_1$ if and only if $x = x_1$ and $y = y_1$;
(ii) $(x + uy) + (x_1 + uy_1) = (x + x_1) + u(y + y_1)$;
(iii) $-(x + uy) = (-x) + u(-y)$;
(iv) $(x + uy) \cdot (x_1 + uy_1) = (xx_1 - yy_1) + u(xy_1 + x_1 y)$;
(v) $(x + uy) \cdot (x - uy) = x^2 + y^2$;
(vi) if $x + uy \neq 0$ then $x^2 + y^2 > 0$ and

$$\frac{1}{x + uy} = \frac{x}{x^2 + y^2} + u\left(\frac{-y}{x^2 + y^2}\right);$$

(vii) the set $D$ of all elements $x + uy$ in $K$ with $x, y \in \mathrm{Re}$ forms a
subfield of $K$.

Proof. The proof of this theorem is quite easy. For (i), if $x + uy =
x_1 + uy_1$ we have $(x - x_1) = u(y_1 - y)$, hence $(x - x_1)^2 = -(y_1 - y)^2$.
Since $(x - x_1)^2 \geq 0$ and $-(y_1 - y)^2 \leq 0$ we must have $(x - x_1)^2 =
(y_1 - y)^2 = 0$, and then $x - x_1 = y - y_1 = 0$. Parts (ii)-(v) are
obvious; $u^2 = -1$ is used again in (iv) and (v). For (vi) if $x + uy \neq 0$
then $x + uy \neq 0 + u \cdot 0$, hence either $x \neq 0$ or $y \neq 0$ by (i). In either
case $x^2 + y^2 > 0$, and (vi) then follows directly from (v). The results
(ii)-(vi) show that the set $D$ defined in (vii) is closed under the operations
$+$, $-$, $\cdot$, and ${}^{-1}$ of $K$, and clearly contains each element $x = x + u \cdot 0$
of $\mathrm{Re}$, in particular 0 and 1. Hence it must satisfy all the conditions of
a field and is a subfield of $K$.
Thus, among other things, if we want to prove the existence of a field K
containing an element $u$ satisfying the hypothesis of 8.1, part (vii) suggests
that we can already realize this by imposing the stronger condition
304 THE COMPLEX NUMBERS [CHAP. 8

that all elements $z$ of $K$ have the form $z = x + uy$ for some $x, y \in \mathrm{Re}$.
Furthermore, 8.1(i) shows that the function $F(x, y) = x + uy$ would be
a one-to-one mapping of $\mathrm{Re} \times \mathrm{Re}$ onto $K$ in this case. This leads us
directly to the proof of the next theorem.

8.2 Theorem. There exists a field $(K, +, \cdot, 0, 1)$ and an element $u$ of $K$
satisfying the following conditions:
(i) $\mathrm{Re} \subseteq K$ and $\mathrm{Re}$ is a subfield of $(K, +, \cdot, 0, 1)$;
(ii) $u^2 = -1$;
(iii) for each $z \in K$ there exist $x, y \in \mathrm{Re}$ with $z = x + uy$.

Proof. We first construct a field $(K, +, \cdot, 0, 1)$ which will be isomorphic
to the desired field. We take

(1) $K = \mathrm{Re} \times \mathrm{Re}$

and

(2) $0 = (0, 0)$, $1 = (1, 0)$, and $u = (0, 1)$.

Every element of $K$ is uniquely represented in the form $(x, y)$ for some
$x, y \in \mathrm{Re}$. Then $(x, y) = 0$ if and only if $x = y = 0$. We next define:

(3) (a) $(x, y) + (x_1, y_1) = (x + x_1, y + y_1)$,
(b) $-(x, y) = (-x, -y)$,
(c) $(x, y) \cdot (x_1, y_1) = (xx_1 - yy_1, xy_1 + x_1 y)$,
(d) $(x, y)^{-1} = \left(\dfrac{x}{x^2 + y^2}, \dfrac{-y}{x^2 + y^2}\right)$ whenever $(x, y) \neq 0$.

Then we see that

(4) $u^2 = -1$,

for this is $(0, 1) \cdot (0, 1) = (-1, 0) = -(1, 0)$.
Now it is a routine matter using (3)(a)-(c) to verify that

(5) $(K, +, \cdot, 0, 1)$ is a field.

We pick a few cases of the statements that must be checked against the
definitions 4.1, 4.13, and 6.1. For example, for distributivity:

$$\begin{aligned}
(x, y) \cdot [(x_1, y_1) + (x_2, y_2)]
&= (x, y) \cdot (x_1 + x_2, y_1 + y_2) \\
&= (x(x_1 + x_2) - y(y_1 + y_2),\; x(y_1 + y_2) + (x_1 + x_2)y) \\
&= ((xx_1 - yy_1) + (xx_2 - yy_2),\; (xy_1 + x_1y) + (xy_2 + x_2y)) \\
&= (xx_1 - yy_1, xy_1 + x_1y) + (xx_2 - yy_2, xy_2 + x_2y) \\
&= [(x, y) \cdot (x_1, y_1)] + [(x, y) \cdot (x_2, y_2)].
\end{aligned}$$

For the condition 4.14 for an integral domain, if $(x, y) \cdot (x_1, y_1) = 0$, that is, $(xx_1 - yy_1, xy_1 + x_1y) = (0, 0)$, we have $xx_1 - yy_1 = 0 = xy_1 + x_1y$. From the first of these equations, $xx_1yy_1 = y^2y_1^2 \geq 0$ and from the second $xx_1yy_1 = -x_1^2y^2 \leq 0$. Thus $xx_1yy_1 = 0$ and $xy = 0$ or $x_1y_1 = 0$. Suppose that the first of these holds, so that $x = 0$ or $y = 0$. Say $x = 0$; if also $y = 0$ then $(x, y) = 0$ and we are through. Otherwise $xx_1 = yy_1$ and $-xy_1 = x_1y$ shows that $yy_1 = 0 = x_1y$; hence $x_1 = y_1 = 0$ and $(x_1, y_1) = 0$. The argument is similar in the other possible cases. For the condition 6.1 for a field, it is sufficient to show that if $(x, y) \neq 0$ then $(x, y) \cdot (x, y)^{-1} = 1$. Since either $x \neq 0$ or $y \neq 0$, certainly $x^2 + y^2 > 0$ and $(x, y)^{-1}$ is well defined by (3d). The condition is easily checked from (3c). We thus take it that (5) is established.
For any x e Re, let

(6) G(x) = (x, 0).

Then we see directly from (3a, c) that

(7) (a) $\mathfrak{D}(G) = \mathrm{Re}$, $\mathfrak{R}(G) \subseteq K$ and $G$ is one-to-one;
(b) $G(0) = 0$ and $G(1) = 1$;
(c) for any $x, y \in \mathrm{Re}$, $G(x + y) = G(x) + G(y)$ and $G(x \cdot y) = G(x) \cdot G(y)$;
(d) for any $z \in K$ there exist (unique) $x, y \in \mathrm{Re}$ with $z = G(x) + (u \cdot G(y))$.

By the general result (2:4-9), the proof of the theorem is concluded as follows. Write $K'$ for the field $\mathrm{Re} \times \mathrm{Re}$ just constructed and $u' = (0, 1)$ for its distinguished element. We extend Re to a set $K$ in one-to-one correspondence with $K'$ by an extension $H$ of the function $G$, and define $+$, $\cdot$ on $K$ so that $H$ is an isomorphism. Then if $u \in K$ is chosen so that $H(u) = u'$, we will have $u^2 = -1$ by (4) and $z = x + uy$, where

$$H(z) = H(x) + (u' \cdot H(y)) = G(x) + (u' \cdot G(y)),$$

for each $z \in K$.
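
The construction in (1)-(7) can be tried out directly. The following is a minimal sketch, not part of the original text: the class name `Pair` and the helper `G` are illustrative, and exact rationals from `fractions.Fraction` stand in for real numbers so that the field identities can be checked without rounding.

```python
from fractions import Fraction


class Pair:
    """An element (x, y) of Re x Re; rationals stand in for the reals here."""

    def __init__(self, x, y=0):
        self.x, self.y = Fraction(x), Fraction(y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"({self.x}, {self.y})"

    def __add__(self, other):   # definition (3)(a)
        return Pair(self.x + other.x, self.y + other.y)

    def __neg__(self):          # definition (3)(b)
        return Pair(-self.x, -self.y)

    def __mul__(self, other):   # definition (3)(c)
        return Pair(self.x * other.x - self.y * other.y,
                    self.x * other.y + other.x * self.y)

    def inverse(self):          # definition (3)(d), only for (x, y) != 0
        n = self.x ** 2 + self.y ** 2
        if n == 0:
            raise ZeroDivisionError("(0, 0) has no inverse")
        return Pair(self.x / n, -self.y / n)


ZERO, ONE, U = Pair(0, 0), Pair(1, 0), Pair(0, 1)   # as in (2)


def G(x):
    """The embedding x -> (x, 0) of (6)."""
    return Pair(x, 0)


a, b, c = Pair(2, 3), Pair(Fraction(1, 2), -5), Pair(-7, 4)
print(U * U == -ONE)                  # statement (4)
print(a * (b + c) == a * b + a * c)   # distributivity
print(a * a.inverse() == ONE)         # field condition 6.1
print(a == G(2) + U * G(3))           # representation (7)(d)
```

Each printed value should be `True`; this checks particular instances only and is no substitute for the verification in the proof.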

8.3 Theorem. If $(K, +, \cdot, 0, 1)$ and $(K', +, \cdot, 0, 1)$ are two fields which satisfy the conditions 8.2(i)-(iii) for certain $u \in K$, $u' \in K'$, respectively, then $(K, +, \cdot, 0, 1) \cong (K', +, \cdot, 0, 1)$. One function $F$ with $\mathfrak{D}(F) = K$, $\mathfrak{R}(F) = K'$ which gives this isomorphism is uniquely determined by $F(x) = x$ for all $x \in \mathrm{Re}$ and $F(u) = u'$.

Proof. If such $F$ is to be an isomorphism we should have $F(x + u \cdot y) = F(x) + F(u) \cdot F(y) = x + u' \cdot y$ for all $x, y \in \mathrm{Re}$. In fact, if we define $F$ in this way, then $F(z)$ is well determined for all $z \in K$. For by 8.2(iii), $z = x + uy$ for some $x, y \in \mathrm{Re}$, and by 8.1(i), these $x, y$ are uniquely determined by $z$. Applying 8.1(i) to $K'$ also shows $F$ to be one-to-one and $\mathfrak{R}(F) = K'$, by hypothesis 8.2(iii) for $K'$. Finally 8.1(ii), (iv) show that the sum and product of elements $x + u \cdot y$ of $K$ are calculated in exactly the same way as for the corresponding elements $x + u' \cdot y$ of $K'$, that is, $F(z_1 + z_2) = F(z_1) + F(z_2)$, $F(z_1 \cdot z_2) = F(z_1) \cdot F(z_2)$ for any $z_1, z_2 \in K$. This proves the theorem.
Since we now have existence and uniqueness, up to $\cong$, of a field $K$ satisfying 8.2(i)-(iii) with respect to a certain element $u$, we are justified in introducing the following.

8.4 Convention. We assume from now on that $(C, +, \cdot, 0, 1)$ is a fixed field and $i$ a fixed element of $C$ satisfying the conditions of 8.2(i)-(iii); that is,
(i) $\mathrm{Re} \subseteq C$ and Re is a subfield of $(C, +, \cdot, 0, 1)$;
(ii) $i^2 = -1$;
(iii) for each $z \in C$ there exist (unique) $x, y \in \mathrm{Re}$ with $z = x + iy$.
We call $C$ the set of complex numbers, and $i$ the principal square root of $-1$ in $C$, or the imaginary unit.

Imaginary and complex numbers have been used for hundreds of years,
beginning with their use in the solution of polynomial equations. Such
applications were “formal” and came long before geometrical interpreta¬
tions of these numbers and of the basic operations on them. It was
implicitly assumed in such formal manipulations that the same “laws of
algebra” could be applied to complex numbers as to rational and real
numbers. The existence theorem 8.2 explicitly states in what sense this is
possible and the uniqueness theorem 8.3 shows that, for algebraic purposes,
the three conditions 8.2(i)—(iii) are the only ones we need set down to govern
the use of these numbers. The fact that there is no relation $<$ under which $C$ will become an ordered field also shows precisely what limitations are imposed in extending “the laws of algebra.”

Complex conjugates. Note that $-i$ is also a square root of $-1$ in $C$, that is, $(-i)^2 = i^2 = -1$, and $i \neq -i$ (for otherwise $i = 0$, while $0^2 \neq -1$). These are the only square roots of $-1$ in $C$, for they are both roots of the polynomial $\xi^2 + 1$. We shall generalize this in a moment. Every element $z \in C$ can be represented in the form $z = x_1 + (-i)y_1$ for some $x_1, y_1 \in \mathrm{Re}$, namely $x_1 = x$, $y_1 = -y$ where $z = x + iy$. Thus by 8.3 we have the following.

8.5 Corollary. The function $F$ with $\mathfrak{D}(F) = \mathfrak{R}(F) = C$, given by $F(x + iy) = x - iy$, for all $x, y \in \mathrm{Re}$, is an isomorphic mapping of the field of complex numbers onto itself.

8.6 Definition. For each $z \in C$, we set $\bar{z} = x - iy$ when $z = x + iy$ for $x, y \in \mathrm{Re}$. Here $\bar{z}$ is called the (complex) conjugate of $z$.

We thus have (i) and (ii) of the following as a corollary to 8.5, since $F(z) = \bar{z}$ for all $z \in C$.

8.7 Corollary. For any $z, z_1 \in C$ we have:
(i) $\overline{z + z_1} = \bar{z} + \bar{z}_1$;
(ii) $\overline{z \cdot z_1} = \bar{z} \cdot \bar{z}_1$;
(iii) $\bar{\bar{z}} = z$;
(iv) $\bar{z} = z$ if and only if $z \in \mathrm{Re}$;
(v) if $z = x + iy$ then $z \cdot \bar{z} = x^2 + y^2$, and if $z \neq 0$, then $z\bar{z} > 0$.

Parts (iii)-(iv) follow directly from the definition 8.6. Of course, (i) and (ii) can also be checked directly in this way. This leads now to a general result about complex roots of real and complex polynomials.

8.8 Theorem. Suppose that $f(\xi) = \sum_{i=0}^{n} a_i\xi^i$, $f(\xi) \in C[\xi]$, and $z \in C$. Then:
(i) $\overline{f(z)} = \sum_{i=0}^{n} \bar{a}_i\bar{z}^i$;
(ii) if $f(\xi) \in \mathrm{Re}[\xi]$ then $\overline{f(z)} = f(\bar{z})$;
(iii) if $f(\xi) \in \mathrm{Re}[\xi]$ and $z$ is a root of $f(\xi)$, so is $\bar{z}$.

Proof. Part (i) follows from 8.7(i), (ii) by induction, first for the polynomials $\xi^i$ and then in general. Then (ii) is immediate from 8.7(iv). Thus if $f(\xi) \in \mathrm{Re}[\xi]$ and $f(z) = 0$, also $\overline{f(z)} = 0$, so $f(\bar{z}) = 0$, proving (iii).
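
Theorem 8.8 is easy to observe numerically. The sketch below is an illustration only, using Python's built-in `complex` type as a floating-point stand-in for $C$; the helper `poly_eval` and the sample polynomials are assumptions chosen for the example, with small integer coefficients so that the arithmetic is exact.

```python
def poly_eval(coeffs, z):
    """Evaluate sum(coeffs[i] * z**i) by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result


# Part (iii): xi^2 + 4*xi + 5 has real coefficients and the root -2 + i,
# so -2 - i must be a root as well.
real_coeffs = [5, 4, 1]
z = complex(-2, 1)
print(poly_eval(real_coeffs, z), poly_eval(real_coeffs, z.conjugate()))

# Part (i): conjugating f(z) equals evaluating the polynomial with
# conjugated coefficients at the conjugate of z.
complex_coeffs = [1 + 2j, 3 - 1j, 2j]
w = 1 - 1j
lhs = poly_eval(complex_coeffs, w).conjugate()
rhs = poly_eval([a.conjugate() for a in complex_coeffs], w.conjugate())
print(lhs == rhs)
```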

Square roots of complex numbers. It is clear that every real number $d$ has a square root $z$ in complex numbers, i.e., such that $z^2 = d$. If $d = 0$, then $z = 0$ is the unique such square root and we write $\sqrt{0} = 0$. If $d > 0$, then $\sqrt{d}$ and $-\sqrt{d}$ are the only such roots. If $d < 0$ then $-d > 0$, and $z = i\sqrt{-d}$ and $-i\sqrt{-d}$ are the only such roots; in this case we denote the first of these by $\sqrt{d}$ and the second by $-\sqrt{d}$. This gives a unique determination of $\sqrt{d}$ for every $d \in \mathrm{Re}$. Using this, we can extend 7.57 so that every polynomial $a\xi^2 + b\xi + c$ with real coefficients $a, b, c$ and $a \neq 0$ has either the unique root $-b/2a$ if $b^2 - 4ac = 0$, or the two distinct complex roots $(-b \pm \sqrt{b^2 - 4ac})/2a$ if $b^2 - 4ac \neq 0$, which are real only in the case $b^2 - 4ac > 0$.
We wish now to extend this to obtain for every $d \in C$ a complex square root $\sqrt{d}$ and then to show that every polynomial $a\xi^2 + b\xi + c$ with arbitrary complex coefficients has roots as described above. To do the first of these, consider $d = s + it$ with $s, t$ real. If $z = x + iy$ is to satisfy $z^2 = d$ with $x, y$ real, we must have $x^2 - y^2 = s$ and $2xy = t$. If $t = 0$, we can already determine the square roots of $d$, which is real in this case. If $t \neq 0$ then both $x \neq 0$, $y \neq 0$. Thus we can set $y = t/2x$ and $x^2 - t^2/4x^2 = s$, so that $4x^4 - 4sx^2 - t^2 = 0$. By using the quadratic formula we can find all real solutions $x$ of this equation and, for each such $x$, all real solutions $(x, y)$ of the original pair of equations. This leads to the following theorem, whose proof we leave to the reader.

8.9 Theorem. Suppose that $d = s + it$ with $s, t$ real. Let

$$z_0 = \pm\frac{\sqrt{\sqrt{s^2 + t^2} + s}}{\sqrt{2}} + i\,\frac{\sqrt{\sqrt{s^2 + t^2} - s}}{\sqrt{2}}$$

with $+$ or $-$ taken according as $t \geq 0$ or $t < 0$. Then the only complex numbers $z$ with $z^2 = d$ are $z_0$ and $-z_0$.

8.10 Definition. For any $d \in C$ we denote by $d^{1/2}$ or $\sqrt{d}$ the number $z_0$ determined as in 8.9; it is called the principal square root of $d$.
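
The formula of 8.9 can be evaluated directly. The sketch below is illustrative and rests on one assumption about the convention: the sign $\pm$ is attached to the real part, so that the chosen root $z_0$ always has nonnegative imaginary part. The function name `principal_sqrt` is hypothetical. Python's own `cmath.sqrt` follows a different convention (nonnegative real part), so the two can differ by a sign when $t < 0$.

```python
import cmath
import math


def principal_sqrt(d):
    """Square root of d = s + it with nonnegative imaginary part.

    Assumes the sign in the formula of 8.9 is carried by the real part:
    '+' when t >= 0 and '-' when t < 0.
    """
    d = complex(d)
    s, t = d.real, d.imag
    m = math.hypot(s, t)                      # sqrt(s^2 + t^2)
    re = math.sqrt(max(m + s, 0.0) / 2)
    im = math.sqrt(max(m - s, 0.0) / 2)
    return complex(re if t >= 0 else -re, im)


for d in (3 + 4j, -3 - 4j, -4 + 0j, -1j):
    z0 = principal_sqrt(d)
    # z0 * z0 should reproduce d; cmath.sqrt(d) is z0 or -z0.
    print(d, z0, z0 * z0, cmath.sqrt(d))
```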

Later the use of the term “principal” will be expanded somewhat. We now see by the same argument as in 7.57 that the quadratic formula can be given general sense:

8.11 Theorem. Suppose that $a, b, c \in C$ with $a \neq 0$. Then the only complex roots $z_1, z_2$ of $a\xi^2 + b\xi + c$ are $(-b \pm \sqrt{b^2 - 4ac})/2a$, and $a\xi^2 + b\xi + c = a(\xi - z_1)(\xi - z_2)$.
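
As an illustration of 8.11, the quadratic formula can be applied with complex coefficients. This is a sketch under stated assumptions: `quadratic_roots` is a hypothetical helper, and it uses `cmath.sqrt` for the square root of the discriminant. Either square root serves here, since the formula takes both signs.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*xi^2 + b*xi + c for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    root = cmath.sqrt(b * b - 4 * a * c)   # any square root of the discriminant
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


# Coefficients chosen so that the polynomial is 2i(xi - (1 + 2i))(xi - (3 - i)).
a, b, c = 2j, 2 - 8j, -10 + 10j
z1, z2 = quadratic_roots(a, b, c)
print(z1, z2)

# Check the factorization a(xi - z1)(xi - z2) at a sample point.
xi = 0.5 - 2j
print(abs(a * xi * xi + b * xi + c - a * (xi - z1) * (xi - z2)))
```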

Of course, $z_1 = z_2$ if $b^2 - 4ac = 0$, and $z_1 \neq z_2$ otherwise. We wish now to pursue the solutions of polynomial equations of higher degree. As the next step we might consider cubic equations, in particular the equation $z^3 = d$. As a simplifying step here, note that we can write any $z = x + iy$ (with $x, y$ real) as

$$z = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\,\frac{y}{\sqrt{x^2 + y^2}}\right)$$

if $z \neq 0$. Set $r = \sqrt{x^2 + y^2}$, $u = x/r$, $v = y/r$ so that $r > 0$, $u^2 + v^2 = 1$, and $z = r(u + iv)$. This representation is unique, i.e., if $z = r_1(u_1 + iv_1)$, with $r_1 > 0$ and $u_1^2 + v_1^2 = 1$, then $r = r_1$, $u = u_1$, and $v = v_1$. For $z\bar{z} = (r_1u_1)^2 + (r_1v_1)^2 = r_1^2$ for any such representation, hence $r^2 = r_1^2$ and $r = r_1$. Then from $u + iv = u_1 + iv_1$ we get the result. We shall show that the corresponding representation of $z^2$ is $r^2(u + iv)^2$, of $z^3$ is $r^3(u + iv)^3$, etc. Hence if we write $d \neq 0$ as $d = r_1(s_1 + it_1)$ with $r_1 > 0$ and $s_1^2 + t_1^2 = 1$, to solve $z^3 = d$ is the same, by this uniqueness, as solving $r^3 = r_1$ and $(u + iv)^3 = s_1 + it_1$. The first equation is trivially solved for $r$, given $r_1$, by $r = r_1^{1/3}$. However, the second equation leads to two third degree equations in $u$ and $v$ which must be solved simultaneously. This can be done by direct algebraic manipulations, but the work is somewhat tedious and not very informative. Moreover, this is only for the simplest cubic equation. For true insight into the nature of the solutions of these and higher order equations, it is necessary to turn to the geometric representation of complex numbers and of the basic operations on them.

A geometric interpretation. Since we proved the existence of a field isomorphic to $C$ consisting of the elements of $\mathrm{Re} \times \mathrm{Re}$, it is natural to consider such a representation in the plane. We associate with each $z \in C$ the unique “point” $(x, y)$, with $x, y \in \mathrm{Re}$ and $z = x + iy$, which we denote now by $P$, and later by $z$ itself:

Figure 8.1 (the point $P(x, y)$)

Then the quantity $\sqrt{x^2 + y^2}$, which we have already had to deal with several times, is the distance of $P$ from the origin $O$. Furthermore, in the representation

$$z = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\,\frac{y}{\sqrt{x^2 + y^2}}\right)$$

the quantities $x/\sqrt{x^2 + y^2}$, $y/\sqrt{x^2 + y^2}$ are, respectively, the ratios $OA/OP$ and $AP/OP$ of the base and altitude of the right triangle $OAP$ to the hypotenuse. In plane trigonometry these ratios are designated, respectively, as $\cos\theta$ and $\sin\theta$ where $\theta$ is the angle $AOP$, which is found as the angle between the positive $x$-axis and the hypotenuse $OP$, measured in a counterclockwise direction. We write $z = r(\cos\theta + i\sin\theta)$, where $r = \sqrt{x^2 + y^2}$ and also $r = |z|$.
The only geometric notion here which so far is not explained precisely
in terms of our previous work is that of “angle” and in particular of the
angle AOP measured as described above. Actually, the definition of this
notion in analytic terms is by no means elementary. However, we assume
for the moment that the usual geometric and trigonometric notions and
results are well understood. We shall return to the question of obtaining
these within our framework in a short while.
If we are given two complex numbers $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$, with associated points $(x_1, y_1)$, $(x_2, y_2)$, the sum $z_1 + z_2 = (x_1 + x_2) + i(y_1 + y_2)$ has associated point $(x_1 + x_2, y_1 + y_2)$. The corresponding figure is as follows.

Geometrically, the sum $z_1 + z_2$ is obtained by the so-called “parallelogram law” for adding vectors (or resolving forces). It is easily seen that this law applies independent of the signs of $x_1, y_1, x_2, y_2$. By our preceding remarks, $OP_1 = |z_1|$, $OP_2 = |z_2|$, and $OP = |z_1 + z_2|$. But $OP_2 = P_1P$, and the length of one side of a triangle is always less than or equal to the sum of the lengths of the other two sides. Thus we conclude that $OP \leq OP_1 + OP_2$, that is, $|z_1 + z_2| \leq |z_1| + |z_2|$ for any $z_1, z_2$. The definition of $|z|$ and verification of such properties as this, the so-called triangle inequality, can now be given entirely in geometric-free terms.

Absolute value. From 8.7(v), we know that if $z = x + iy$ with $x, y$ real then $z\bar{z} = x^2 + y^2$, which is $\geq 0$.

8.12 Definition. For any $z \in C$ we put $|z| = \sqrt{z\bar{z}}$; $|z|$ is called the absolute value or modulus of $z$.

8.13 Theorem. For any $z, w \in C$ we have:
(i) $|z| \geq 0$, and $|z| = 0$ if and only if $z = 0$;
(ii) if $z$ is real then $|z|$ as defined in 4.19 and as defined above are the same;
(iii) $|z \cdot w| = |z| \cdot |w|$;
(iv) if $z \neq 0$, $|z^{-1}| = |z|^{-1}$;
(v) $|z + w| \leq |z| + |w|$;
(vi) $|z - w| \geq \big|\,|z| - |w|\,\big|$.

Proof. Parts (i), (ii) are obvious from 8.7, and from 8.7(ii) we obtain $|zw|^2 = (zw)\overline{(zw)} = (z\bar{z})(w\bar{w}) = |z|^2|w|^2$, which gives us (iii) by (i). Part (iv) is then immediate by application of (iii) to $w = z^{-1}$. To prove (v), we first consider the special case $w = 1$. Then

$$|z + 1|^2 = (z + 1)(\bar{z} + 1) = |z|^2 + z + \bar{z} + 1.$$

We wish to show that this result is $\leq (|z| + 1)^2 = |z|^2 + 2|z| + 1$. This reduces to showing that $z + \bar{z} \leq 2|z|$. If $z = x + iy$ with $x, y$ real, then we have $2x \leq 2\sqrt{x^2 + y^2}$, which is obviously true. In general, if $w = 0$ the result is trivial and if $w \neq 0$ we have

$$|z + w| = |(zw^{-1} + 1)w| = |zw^{-1} + 1|\,|w| \leq (|zw^{-1}| + 1)|w| = (|z|\,|w|^{-1} + 1)|w| = |z| + |w|.$$

By (v), $|z| = |w + (z - w)| \leq |w| + |z - w|$, giving $|z| - |w| \leq |z - w|$. If $|z| \geq |w|$, this gives (vi). If $|w| > |z|$ we have $\big|\,|z| - |w|\,\big| = |w| - |z| \leq |w - z| = |z - w|$.
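
The properties in 8.13 can be spot-checked numerically. The following sketch is illustrative only: `modulus` is a hypothetical helper that follows Definition 8.12 literally, the sample values are arbitrary, and small tolerances allow for floating-point rounding.

```python
import math


def modulus(z):
    """|z| = sqrt(z * conjugate(z)), as in Definition 8.12."""
    z = complex(z)
    return math.sqrt((z * z.conjugate()).real)


samples = [3 + 4j, -1 - 2j, 0.5j, -7 + 0j, 0j]
eps = 1e-12
for z in samples:
    for w in samples:
        assert abs(modulus(z * w) - modulus(z) * modulus(w)) < 1e-9   # (iii)
        assert modulus(z + w) <= modulus(z) + modulus(w) + eps        # (v)
        assert modulus(z - w) >= abs(modulus(z) - modulus(w)) - eps   # (vi)

print(modulus(3 + 4j))
```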
Thus, despite the fact that C cannot be made an ordered field, we are
still able to introduce an absolute value function which not only extends
the absolute value as given by the ordering of the reals, but also shares its
main properties.
We now turn to the relationship between the algebraic operation $z_1 \cdot z_2$ and its geometrical interpretation. Formally, if we write $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$ then $r_1 = |z_1|$, $r_2 = |z_2|$, and

$$z_1z_2 = r_1r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)].$$

If we set $z = z_1z_2 = r(\cos\theta + i\sin\theta)$, then $r = |z| = |z_1|\,|z_2| = r_1r_2$. Thus

$$\cos\theta = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2,$$
$$\sin\theta = \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2.$$

In fact, we know from elementary trigonometry that

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2,$$
$$\sin(\theta_1 + \theta_2) = \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2.$$

We now examine more closely the notion of angle and the definitions of the
trigonometric functions of angles.
The most convenient system for measuring angles in mathematical analysis is by radians. If we consider the unit circle with center the origin and radius 1 then the circumference of the circle is equal to $2\pi$. The arc length subtended by any angle $\theta$ is in the same ratio to $2\pi$ as $\theta$ is to the full circular angle. In the radian system of measurement, each angle is measured by the same number of radians as there are units in the associated arc length.

Figure 8.3

In the figure above, $OP = OA = 1$ and the value of $\theta$ in radians is equal to the length of the circular arc $AP$.
In trigonometry, $\cos\theta$ and $\sin\theta$ are defined for all real numbers $\theta$, first geometrically for $0 \leq \theta < 2\pi$ [as $\cos\theta = x$ and $\sin\theta = y$ for $(x, y)$ on the unit circle] and then extended by definition to all $\theta$ by the periodicity property $\cos(\theta + n \cdot 2\pi) = \cos\theta$, $\sin(\theta + n \cdot 2\pi) = \sin\theta$ for $n \in I$. Then with reference to the computation of $z = z_1 \cdot z_2$, we see that $\theta = \theta_1 + \theta_2$ if $0 \leq \theta_1 + \theta_2 < 2\pi$ and $\theta = \theta_1 + \theta_2 - 2\pi$ if $2\pi \leq \theta_1 + \theta_2$, if we start with $0 \leq \theta_1, \theta_2 < 2\pi$. This gives diagrams such as the following.

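Representations of some particular numbers are

$$1 = 1 + i \cdot 0 = \cos 0 + i\sin 0, \qquad i = 0 + i \cdot 1 = \cos\frac{\pi}{2} + i\sin\frac{\pi}{2},$$
$$-1 = -1 + i \cdot 0 = \cos\pi + i\sin\pi,$$
$$-i = 0 + i \cdot (-1) = \cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2},$$
$$1 + i = \sqrt{2}\left(\frac{1}{\sqrt{2}} + i\,\frac{1}{\sqrt{2}}\right) = \sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right), \text{ etc.}$$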

Implicit in the trigonometric representation are the following ideas. First, with each complex number $z = x + iy$ such that $|z| = 1$, i.e., such that $x^2 + y^2 = 1$, is uniquely associated a real number $\theta$, called the angle of $z$ or the argument of $z$, which we temporarily denote as $\mathrm{Arg}(z)$. Referring to Fig. 8.3, we determine that this real number $\theta$ is the length of the circular arc $AP$. Equivalently, from elementary geometry, $\theta$ is equal to twice the area of the sector $AOP$, since $\theta$ is in the same ratio to the length $2\pi$ of the circumference of the unit circle as the area of $AOP$ is to the area $\pi$ of the unit circle. In either determination of $\theta$ (and in particular of the number $\pi$) we are using a nonelementary concept, namely length of a curve or area of a plane figure, the general definition of which demands the analytic concept of integration.
Second, it is assumed that we have certain functions $C$ and $S$ of real numbers, $C(\theta) = \cos\theta$, $S(\theta) = \sin\theta$, for any real number $\theta$, such that whenever $\theta = \mathrm{Arg}(z)$ is uniquely determined to be the angle of $z = x + iy$ as in the preceding paragraph, we have $C(\theta) = x$ and $S(\theta) = y$. Finally these functions enjoy certain special properties, such as:

$$C(0) = 1, \quad S(0) = 0,$$
$$C(\pi) = -1, \quad S(\pi) = 0, \text{ etc.},$$
$$C(\theta + 2n\pi) = C(\theta), \quad S(\theta + 2n\pi) = S(\theta),$$
$$C(\theta_1 + \theta_2) = C(\theta_1)C(\theta_2) - S(\theta_1)S(\theta_2),$$
$$S(\theta_1 + \theta_2) = S(\theta_1)C(\theta_2) + C(\theta_1)S(\theta_2).$$

The proofs that we can find such functions (Arg, C and S) would take
us too far afield from our main interests here. These proofs can be ac¬
complished after a modest development of the calculus, which only depends
on the treatment of the real number system given in the preceding chapter.
In Appendix II we sketch the main points that are involved and also give
references to complete proofs which lead to the following theorem. For
our purposes here, it is never really necessary to know exactly how the number $\pi$ is determined. Thus one can equally well read throughout the following “where $\pi$ is any given positive real number.”

Basic properties of the trigonometric functions.

8.14 Theorem. There exist two continuous functions $C$ and $S$ on Re satisfying the following conditions:
(i) $C(0) = 1$, $S(0) = 0$;
(ii) $C(\pi/2) = 0$, $S(\pi/2) = 1$;
(iii) if $0 < \theta < \pi/2$ then $0 < C(\theta) < 1$ and $0 < S(\theta) < 1$;
(iv) for any $\theta$, $C^2(\theta) + S^2(\theta) = 1$;
(v) for any $\theta_1, \theta_2$, $C(\theta_1 + \theta_2) = C(\theta_1)C(\theta_2) - S(\theta_1)S(\theta_2)$;
(vi) for any $\theta_1, \theta_2$, $S(\theta_1 + \theta_2) = S(\theta_1)C(\theta_2) + C(\theta_1)S(\theta_2)$.

Starting from these we now derive some further results.

8.15 Theorem. Suppose that $C, S$ are any two functions satisfying the conditions of 8.14. Then for any real number $\theta$ and $n \in I$:
(i) $-1 \leq C(\theta) \leq 1$, $-1 \leq S(\theta) \leq 1$;
(ii) $C(\theta + \pi/2) = -S(\theta)$ and $S(\theta + \pi/2) = C(\theta)$;
(iii) $C(-\theta) = C(\theta)$ and $S(-\theta) = -S(\theta)$;
(iv) $C(\theta + 2n\pi) = C(\theta)$ and $S(\theta + 2n\pi) = S(\theta)$;
(v) $C(2\theta) = C^2(\theta) - S^2(\theta) = 2C^2(\theta) - 1 = 1 - 2S^2(\theta)$;
(vi) $S(2\theta) = 2S(\theta)C(\theta)$;
(vii) $C^2\!\left(\dfrac{\theta}{2}\right) = \dfrac{1 + C(\theta)}{2}$ and $S^2\!\left(\dfrac{\theta}{2}\right) = \dfrac{1 - C(\theta)}{2}$.

Proof. Part (i) is immediate from 8.14(iv), and (ii) is evident from 8.14(ii), (v), (vi). To prove (iii) we use 8.14(v), (vi) with $\theta_1 = \theta$, $\theta_2 = -\theta$ to obtain

(1) $1 = C(\theta)C(-\theta) - S(\theta)S(-\theta)$

and

$0 = S(\theta)C(-\theta) + C(\theta)S(-\theta)$.

We solve these simultaneous equations for $C(-\theta)$ in terms of $C(\theta)$ and $S(\theta)$ by multiplying the first by $C(\theta)$ and the second by $S(\theta)$ and adding, to give

$$C(\theta) = C^2(\theta)C(-\theta) + S^2(\theta)C(-\theta) = C(-\theta),$$

by 8.14(iv). Similarly we see that $S(-\theta) = -S(\theta)$. By repeated application of (ii) we get

$$C(\theta + 2\pi) = C(\theta) \quad\text{and}\quad S(\theta + 2\pi) = S(\theta);$$

hence (iv) is seen to hold for $n \geq 0$. But by 8.14(v) and (iii),

(2) $C(\theta_1 - \theta_2) = C(\theta_1)C(\theta_2) + S(\theta_1)S(\theta_2)$

and

$S(\theta_1 - \theta_2) = S(\theta_1)C(\theta_2) - C(\theta_1)S(\theta_2)$.

Then we see that $C(\theta - \pi/2) = S(\theta)$ and $S(\theta - \pi/2) = -C(\theta)$, hence we can obtain $C(\theta - 2\pi) = C(\theta)$ and $S(\theta - 2\pi) = S(\theta)$. From this we can get (iv) in general. Parts (v) and (vi) are immediate from 8.14(v), (vi), and part (vii) is obtained from (v) by substituting $\theta/2$ for $\theta$.
From (vii) we can write

$$C\!\left(\frac{\theta}{2}\right) = \pm\sqrt{\frac{1 + C(\theta)}{2}} \quad\text{and}\quad S\!\left(\frac{\theta}{2}\right) = \pm\sqrt{\frac{1 - C(\theta)}{2}}.$$
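
To determine which signs are taken here, one must know the location of $\theta/2$. In general, the signs of $C(\theta)$, $S(\theta)$ for $0 \leq \theta < 2\pi$ [and hence for all $\theta$ by (iv) above] are completely determined by 8.14(i)-(iii) and repeated application of 8.15(ii). We consider the values of these for $\theta = 0$, $\pi/2$, $\pi$, and $3\pi/2$ and for the intermediate ranges (I) $0 < \theta < \pi/2$, (II) $\pi/2 < \theta < \pi$, (III) $\pi < \theta < 3\pi/2$, and (IV) $3\pi/2 < \theta < 2\pi$. Each $\theta$ in range (II) is $\pi/2 + \theta_1$ where $\theta_1$ is in range (I), etc. Thus we see that $C(\theta) = 0$ if and only if $\theta = \pi/2$ or $\theta = 3\pi/2$ [with $S(\theta) = 1$ or $-1$, respectively], and $S(\theta) = 0$ if and only if $\theta = 0$ or $\theta = \pi$ [with $C(\theta) = 1$ or $-1$, respectively]. For $\theta$ in range (I): $C(\theta) > 0$, $S(\theta) > 0$; in range (II): $C(\theta) < 0$, $S(\theta) > 0$; in range (III): $C(\theta) < 0$, $S(\theta) < 0$; and in range (IV): $C(\theta) > 0$, $S(\theta) < 0$. This is summarized in the figure below.

Thus, for example, if $0 \leq \theta \leq \pi$ then $0 \leq \theta/2 \leq \pi/2$ and

$$C\!\left(\frac{\theta}{2}\right) = \sqrt{\frac{1 + C(\theta)}{2}} \quad\text{and}\quad S\!\left(\frac{\theta}{2}\right) = \sqrt{\frac{1 - C(\theta)}{2}},$$

while if $\pi < \theta < 2\pi$ then $\pi/2 < \theta/2 < \pi$ and

$$C\!\left(\frac{\theta}{2}\right) = -\sqrt{\frac{1 + C(\theta)}{2}}$$

and again

$$S\!\left(\frac{\theta}{2}\right) = \sqrt{\frac{1 - C(\theta)}{2}}.$$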
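8.16 Theorem. There is a unique pair of functions $C, S$ satisfying the conditions of 8.14 (including the condition of continuity on Re).

Proof. Suppose that $C, S$ and $C_1, S_1$ are two such pairs of functions. Then both of these also satisfy the conditions of 8.15. By 8.14(i), (ii) and 8.15(ii) we see, as in the preceding discussion, that it is sufficient to prove that

(1) if $0 < \theta < \dfrac{\pi}{2}$ then $C(\theta) = C_1(\theta)$ and $S(\theta) = S_1(\theta)$.

We first note the following by 8.15(vii):

(2) if $k \in P$ then $C\!\left(\dfrac{1}{2^k}\dfrac{\pi}{2}\right) = C_1\!\left(\dfrac{1}{2^k}\dfrac{\pi}{2}\right)$ and $S\!\left(\dfrac{1}{2^k}\dfrac{\pi}{2}\right) = S_1\!\left(\dfrac{1}{2^k}\dfrac{\pi}{2}\right)$.

Next we obtain:

(3) if $\theta$ is any real number and $m \in P$ and if $C(\theta) = C_1(\theta)$ and $S(\theta) = S_1(\theta)$ then $C(m\theta) = C_1(m\theta)$ and $S(m\theta) = S_1(m\theta)$.

This is seen by induction on $m$, using the addition laws 8.14(v), (vi). Combining (2) and (3) thus shows that

(4) for any $k, m \in P$, $C\!\left(\dfrac{m}{2^k}\dfrac{\pi}{2}\right) = C_1\!\left(\dfrac{m}{2^k}\dfrac{\pi}{2}\right)$ and $S\!\left(\dfrac{m}{2^k}\dfrac{\pi}{2}\right) = S_1\!\left(\dfrac{m}{2^k}\dfrac{\pi}{2}\right)$.
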
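Consider now any $\theta$ with $0 < \theta < \pi/2$. We can write $\theta = a(\pi/2)$ where $0 < a < 1$. By the representation 7.36 of real numbers to the base 2, we can write $a = \sum_{i=1}^{\infty} (m_i/2^i)$, where each $m_i$ is 0 or 1. Hence for any $\delta > 0$ we can find $m, k \in P$ with $|a - (m/2^k)| < \delta$; namely $m/2^k = \sum_{i=1}^{n} (m_i/2^i)$ for suitably large $n$. Now we apply the assumed continuity of the functions $C, C_1$. It is seen from this that given any $\varepsilon > 0$ we can find $\delta > 0$ such that $0 < a - \delta < a + \delta < 1$ and

(5) if $|a - b| < \delta$ then $\left|C\!\left(a\dfrac{\pi}{2}\right) - C\!\left(b\dfrac{\pi}{2}\right)\right| < \varepsilon$ and $\left|C_1\!\left(a\dfrac{\pi}{2}\right) - C_1\!\left(b\dfrac{\pi}{2}\right)\right| < \varepsilon$.

Given any such $\varepsilon$ and associated $\delta$, choose $m, k$ as above, and let $b = m/2^k$. Then by (4), $C(b\pi/2) = C_1(b\pi/2)$. Hence

$$\left|C\!\left(a\frac{\pi}{2}\right) - C_1\!\left(a\frac{\pi}{2}\right)\right| \leq \left|C\!\left(a\frac{\pi}{2}\right) - C\!\left(b\frac{\pi}{2}\right)\right| + \left|C_1\!\left(b\frac{\pi}{2}\right) - C_1\!\left(a\frac{\pi}{2}\right)\right| < 2\varepsilon.$$

Since this is true for any $\varepsilon > 0$, we must have

$$C\!\left(a\frac{\pi}{2}\right) = C_1\!\left(a\frac{\pi}{2}\right).$$

Similarly we use continuity to prove

$$S\!\left(a\frac{\pi}{2}\right) = S_1\!\left(a\frac{\pi}{2}\right),$$

for any $a$ with $0 < a < 1$, so that (1) is proved.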

8.17 Definition. We write $C(\theta) = \cos\theta$ and $S(\theta) = \sin\theta$ for any real number $\theta$, where $C, S$ are the unique functions satisfying the conditions of 8.14.
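
The uniqueness argument is constructive in spirit: the conditions of 8.14 and 8.15 alone pin down the values of $C$ and $S$ at every angle $(m/2^k)(\pi/2)$. The sketch below illustrates this; the function name `dyadic_cos_sin` is hypothetical, and the library functions `math.cos` and `math.sin` are used only as an independent comparison. To halve an angle it takes the positive root in 8.15(vii) for $C$, as 8.14(iii) requires in range (I), and obtains $S$ from 8.15(vi), which is numerically steadier than the second formula of (vii).

```python
import math


def dyadic_cos_sin(m, k):
    """Return (C, S) at the angle (m / 2**k) * (pi / 2).

    Only the properties of 8.14 and 8.15 are used, never math.cos or math.sin.
    """
    c, s = 0.0, 1.0                    # 8.14(ii): C(pi/2) = 0, S(pi/2) = 1
    for _ in range(k):                 # halve the angle k times
        c_half = math.sqrt((1 + c) / 2)   # 8.15(vii), positive root
        s = s / (2 * c_half)              # 8.15(vi): S(t) = 2 S(t/2) C(t/2)
        c = c_half
    cm, sm = 1.0, 0.0                  # 8.14(i): C(0) = 1, S(0) = 0
    for _ in range(m):                 # add the small angle m times, 8.14(v), (vi)
        cm, sm = cm * c - sm * s, sm * c + cm * s
    return cm, sm


m, k = 341, 10                         # 341/1024 is close to 1/3
angle = (m / 2 ** k) * (math.pi / 2)
print(dyadic_cos_sin(m, k))
print(math.cos(angle), math.sin(angle))
```

The two printed pairs should agree to many decimal places, which is the content of step (4) of the proof in floating-point form.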

The trigonometric representation; De Moivre’s theorem. We can now obtain


the unique representation of complex numbers in trigonometric form.

8.18 Theorem. For any $z \in C$, if $z \neq 0$, there are unique real numbers $r$ and $\theta$ with $0 < r$, $0 \leq \theta < 2\pi$, and $z = r(\cos\theta + i\sin\theta)$.

Proof. Clearly $|z|^2 = r^2(\cos^2\theta + \sin^2\theta) = r^2$ in any such representation, so $r = |z|$. Thus $r$ is uniquely determined. To prove existence and uniqueness it is sufficient to consider $z = x + iy$ with $x, y \in \mathrm{Re}$ and $|z| = 1$, that is, $x^2 + y^2 = 1$. Then $|x| \leq 1$, $|y| \leq 1$. Any possible representation $x = \cos\theta$, $y = \sin\theta$ completely determines the range of $\theta$ in the sense of Fig. 8.5, according to the signs of $x, y$. If either of $x, y$ is 0 the existence and uniqueness is immediate. In the other cases it is sufficient to settle the existence and uniqueness for the case $0 < x < 1$, $0 < y < 1$. [For example, if we find $\theta$ with $\cos\theta = \frac{1}{2}$ and $\sin\theta = \sqrt{3}/2$ then

$$-\frac{\sqrt{3}}{2} + i\,\frac{1}{2} = \cos\left(\theta + \frac{\pi}{2}\right) + i\sin\left(\theta + \frac{\pi}{2}\right),$$
$$-\frac{1}{2} - i\,\frac{\sqrt{3}}{2} = \cos(\theta + \pi) + i\sin(\theta + \pi), \text{ etc.}$$

On the other hand, to represent $(-1/2) + i(\sqrt{3}/2)$, we first find $\varphi$ with $\cos\varphi = \sqrt{3}/2$, $\sin\varphi = 1/2$.]

Thus consider now any fixed $x, y$ with $x^2 + y^2 = 1$ and $0 < x < 1$, $0 < y < 1$. Let $F(\theta) = \cos\theta - x$ for any $\theta$. With $x$ fixed, $F$ is a continuous function of $\theta$. Since $F(0) = 1 - x > 0$ and $F(\pi/2) = -x < 0$, there must exist $\theta$ between 0 and $\pi/2$ with $F(\theta) = 0$ by the Weierstrass Nullstellensatz 7.48; that is, $0 < \theta < \pi/2$ and $\cos\theta = x$. Since $\cos^2\theta + \sin^2\theta = 1$ we thus have $\sin^2\theta = y^2$; but then $\sin\theta = y$ since both $\sin\theta$ and $y$ are positive. This proves the existence of $\theta$. Suppose also that $\cos\varphi = x$, $\sin\varphi = y$. Then by the sign arguments, also $0 < \varphi < \pi/2$. Suppose, for example, that $\varphi \leq \theta$. Then $\sin(\theta - \varphi) = \sin\theta\cos\varphi - \cos\theta\sin\varphi = yx - xy = 0$. But $0 \leq \theta - \varphi < \pi/2$. Hence $\sin(\theta - \varphi) = 0$ can occur only if $\theta - \varphi = 0$. This proves the theorem.
We shall call the representation of complex numbers $z$ in the form of 8.18 the trigonometric representation of $z$. The idea of using such a representation was independently put forth by several mathematicians in the years around 1800, most prominently by Gauss, so that one often speaks of the Gaussian plane. The basic diagram in Fig. 8.1 is often referred to as the Argand diagram, after another one of the discoverers. For our purposes, the main use we shall make of this representation comes from part (ii) of the following theorem, called De Moivre’s theorem. It is directly suggested by the addition laws for cos and sin, in particular $\cos 2\theta = \cos^2\theta - \sin^2\theta$, $\sin 2\theta = 2\sin\theta\cos\theta$, which can be expressed more compactly in the form $\cos 2\theta + i\sin 2\theta = (\cos\theta + i\sin\theta)^2$.

8.19 Theorem.
(i) If $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$ where $\theta_1, \theta_2$ are any real numbers, then

$$z_1 \cdot z_2 = r_1 \cdot r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)].$$

(ii) For any real numbers $r$ and $\theta$ and any $n \in P$,

$$[r(\cos\theta + i\sin\theta)]^n = r^n[\cos n\theta + i\sin n\theta].$$

We leave the proof to the student.
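
Both parts of 8.19 can be checked numerically before attempting the proof. In this sketch the helpers `from_polar` and `power` are hypothetical names, and the library's `math.cos` and `math.sin` stand in for the functions of 8.17; the particular values of $r$ and $\theta$ are arbitrary.

```python
import math


def from_polar(r, theta):
    """The complex number r(cos theta + i sin theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))


def power(z, n):
    """z**n for a positive integer n, by repeated multiplication."""
    result = z
    for _ in range(n - 1):
        result = result * z
    return result


# Part (i): moduli multiply and angles add.
z1, z2 = from_polar(2.0, 0.7), from_polar(3.0, 2.9)
print(abs(z1 * z2 - from_polar(6.0, 0.7 + 2.9)))

# Part (ii), De Moivre's theorem, for n = 7.
z = from_polar(1.5, 0.4)
print(abs(power(z, 7) - from_polar(1.5 ** 7, 7 * 0.4)))
```

Both printed differences should be zero up to rounding error.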

nth roots of complex numbers. Substituting $\theta/n$ for $\theta$ and $r^{1/n}$ for $r$ in 8.19(ii) now gives us a solution of $z^n = r(\cos\theta + i\sin\theta)$. More generally, we have the following.

8.20 Theorem. Suppose that $d \in C$, $d \neq 0$, and $n \in P$. Then there exist exactly $n$ distinct complex numbers $z$ with $z^n = d$. If $d = |d|(\cos\theta + i\sin\theta)$ with $0 \leq \theta < 2\pi$, these numbers are

$$z_k = |d|^{1/n}\left[\cos\left(\frac{\theta + 2k\pi}{n}\right) + i\sin\left(\frac{\theta + 2k\pi}{n}\right)\right]$$

for $k = 0, 1, \ldots, n - 1$.

Proof. By 8.19(ii), for each $k$,

$$z_k^n = |d|[\cos(\theta + 2k\pi) + i\sin(\theta + 2k\pi)] = |d|(\cos\theta + i\sin\theta) = d.$$

Moreover, the numbers $z_k$ are distinct. For, $0 \leq \theta < 2\pi$, so

$$0 \leq \theta + 2k\pi < 2\pi + 2(n - 1)\pi = 2n\pi,$$

for $0 \leq k \leq n - 1$, hence $0 \leq (\theta + 2k\pi)/n < 2\pi$ for each such $k$. By the uniqueness result 8.18, it follows that if $z_k = z_l$ where $0 \leq k, l \leq n - 1$ then $k = l$. Finally, each $z_k$ is a root of the polynomial $\xi^n - d$. Since this can have at most $n$ distinct roots, the $z_k$ provide all the solutions of $z^n = d$.
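
The formula of 8.20 translates directly into a computation. The following is a sketch only: `nth_roots` is a hypothetical name, `math.atan2` is used to find the angle and is then reduced to the range $0 \leq \theta < 2\pi$ required by the theorem, and all values are floating-point approximations.

```python
import math


def nth_roots(d, n):
    """The n complex numbers z_0, ..., z_{n-1} of 8.20 with z**n = d."""
    d = complex(d)
    if d == 0 or n < 1:
        raise ValueError("need d != 0 and n >= 1")
    theta = math.atan2(d.imag, d.real) % (2 * math.pi)
    if theta >= 2 * math.pi:           # guard against rounding up to 2*pi
        theta = 0.0
    r = abs(d) ** (1.0 / n)            # the real nth root of |d|
    return [complex(r * math.cos((theta + 2 * math.pi * k) / n),
                    r * math.sin((theta + 2 * math.pi * k) / n))
            for k in range(n)]


roots = nth_roots(8j, 3)
for z in roots:
    print(z, z ** 3)                   # each cube should be close to 8i
```

For $d = 8i$ the angle is $\pi/2$, so the first root listed is $2(\cos\pi/6 + i\sin\pi/6) = \sqrt{3} + i$.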
Thus the trigonometric representation very elegantly supplies us with the general existence theorem that we had obtained previously only for $n = 2$ (8.9) and already found troublesome to deal with algebraically for $n = 3$. To see how the above compares with the solution for $n = 2$, let

$$d = s + it = \sqrt{s^2 + t^2}\left(\frac{s}{\sqrt{s^2 + t^2}} + i\,\frac{t}{\sqrt{s^2 + t^2}}\right), \quad\text{for } d \neq 0.$$

To compute the square roots of $d$ according to 8.20, we must evaluate $\cos(\theta/2)$, $\sin(\theta/2)$ and $\cos(\theta/2 + \pi)$, $\sin(\theta/2 + \pi)$ for the unique $\theta$ such that $0 \leq \theta < 2\pi$ and $\cos\theta = s/\sqrt{s^2 + t^2}$, $\sin\theta = t/\sqrt{s^2 + t^2}$. This can be done by 8.15(ii), (vii). We leave it to the student to work out the details. The result of this computation is that the principal square root of $d$, $\sqrt{d}$, as determined by 8.10, is seen to be the same as the number $\sqrt{|d|}\,(\cos\theta/2 + i\sin\theta/2)$, i.e., the number $z_0$ of the preceding theorem. This allows us to extend the definition 8.10 as follows.

8.21 Definition. Suppose that $n \in P$. We set $0^{1/n} = \sqrt[n]{0} = 0$. For any $d \in C$ with $d \neq 0$ we denote by $d^{1/n}$ or $\sqrt[n]{d}$ the number $z_0$ determined as in 8.20; it is called the principal nth root of $d$. All the numbers $z_k$ of 8.20 are called the nth roots of $d$.

Note that this definition is also in accord with 7.56 for $d$ real, $d > 0$. As a practical matter, the computation of the nth roots of a complex number $d$ is broken up into two parts: the computation of the real nth root $|d|^{1/n}$ and the computation of the $\cos((\theta + 2k\pi)/n)$ and $\sin((\theta + 2k\pi)/n)$. The second of these is achieved, to any desired degree of accuracy, by the series representations of $\cos\theta$ and $\sin\theta$ discussed in Appendix II. The results of such computations for specified degrees of accuracy and for a large number of values of $\theta$ are compiled in tables of trigonometric functions.

An alternative approach is available for particular values of $n$. It is seen that if we write $n = 2^m \cdot n_1$ where $2 \nmid n_1$, we can reduce the question of finding nth roots to that of $n_1$th roots, either algebraically by the formula of 8.9 or trigonometrically by the half-angle formulas of 8.15(vii). Thus we suppose now that $n$ is odd. Next, we can try to reverse the use of De Moivre’s theorem to obtain equations for $\cos\theta/n$ and $\sin\theta/n$ in terms of $\cos\theta$, $\sin\theta$. For example, for $n = 3$ this leads to

(8:1-1) $\cos^3\dfrac{\theta}{3} - 3\cos\dfrac{\theta}{3}\sin^2\dfrac{\theta}{3} = \cos\theta$

and then to

(8:1-2) $4\cos^3\dfrac{\theta}{3} - 3\cos\dfrac{\theta}{3} = \cos\theta$.

Setting $x = \cos\theta/3$, $y = \sin\theta/3$, we can thus hope to determine $x, y$ from the given value of $\cos\theta$ and the following equations

(8:1-3) $4x^3 - 3x = \cos\theta$, $\quad x^2 + y^2 = 1$,

by using the determination of the signs of $x$ and $y$ according to the location of $\theta/3$. For example, for $\pi/3$ we have $x > 0$, $y > 0$ and $4x^3 - 3x + 1 = 0$. This equation has $x = -1$ as a root; dividing by $x + 1$ leads to the equation $4x^2 - 4x + 1 = 0$, which has $x = \frac{1}{2}$ as its only root. Then $y^2 = \frac{3}{4}$ and $y = \sqrt{3}/2$. Thus $\cos\pi/3 = \frac{1}{2}$, $\sin\pi/3 = \sqrt{3}/2$, in agreement with school trigonometry. To find the functions of $\pi/6$, we can then apply the half-angle formulas, or again work directly with (8:1-3), since in this case we are led to the simpler equation $4x^3 - 3x = 0$ for $x = \cos\pi/6$.
This algebraic approach does not always lead to a simple algebraic representation for nth roots, even for $n = 3$. For example, for $\pi/9$ we must solve the equation $4x^3 - 3x = \frac{1}{2}$, the means for which are not readily apparent. This will be discussed in more detail at the end of this chapter. Furthermore, the larger the value of $n$, the more complicated is the equation to be solved. We leave it to the reader to explore this matter in the exercises for the special case $n = 5$.
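
Although the cubic of (8:1-3) may have no convenient algebraic solution, it is easy to solve numerically. The sketch below is an illustration under stated assumptions: it supposes $0 \leq \theta \leq \pi$, so that $x = \cos\theta/3$ lies between $\frac{1}{2}$ and 1, where $4x^3 - 3x$ is increasing, and it uses plain bisection. The name `cos_third` is hypothetical, and `math.cos` appears only as a comparison.

```python
import math


def cos_third(cos_theta):
    """Solve 4x^3 - 3x = cos_theta for x in [1/2, 1] by bisection.

    Assumes 0 <= theta <= pi, so that theta/3 lies in [0, pi/3].
    """
    lo, hi = 0.5, 1.0
    for _ in range(60):                 # 60 halvings exhaust double precision
        mid = (lo + hi) / 2
        if 4 * mid ** 3 - 3 * mid < cos_theta:
            lo = mid                    # the root lies to the right of mid
        else:
            hi = mid                    # the root lies at or to the left of mid
    return (lo + hi) / 2


x = cos_third(0.5)                      # theta = pi/3, so x = cos(pi/9)
y = math.sqrt(1 - x * x)                # positive, since pi/9 is in range (I)
print(x, y)
print(math.cos(math.pi / 9), math.sin(math.pi / 9))
```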
There is one case of 8.20 which is of particular interest in several respects, namely the case where $d = 1$. In this case we are dealing with the so-called nth roots of unity.

8.22 Theorem. Given $n \in P$, let $w = \cos 2\pi/n + i\sin 2\pi/n$. Then:
(i) the numbers $w^k$ for $k = 0, 1, \ldots, n - 1$ are the $n$ distinct solutions of $z^n = 1$;
(ii) if $d$ is any complex number with $d \neq 0$ and if $z_0 = \sqrt[n]{d}$ then the numbers $z_k = z_0 \cdot w^k$ for $k = 0, 1, \ldots, n - 1$ are the $n$ distinct solutions of $z^n = d$.

The proof of this is left to the student.
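
A short numerical experiment may help before proving 8.22. In this sketch `roots_of_unity` is a hypothetical helper, the powers of $w$ are built by repeated multiplication, and the example for part (ii) takes $d = -8$ and $n = 3$, for which the principal cube root in the sense of 8.21 is $1 + i\sqrt{3}$ (the angle of $d$ is $\pi$).

```python
import math


def roots_of_unity(n):
    """The powers w^0, w^1, ..., w^(n-1) of w = cos(2*pi/n) + i sin(2*pi/n)."""
    w = complex(math.cos(2 * math.pi / n), math.sin(2 * math.pi / n))
    powers = [complex(1, 0)]
    for _ in range(n - 1):
        powers.append(powers[-1] * w)
    return powers


# Part (i) for n = 5: every power is a fifth root of 1.
fifth = roots_of_unity(5)
print([abs(z ** 5 - 1) for z in fifth])

# Part (ii) for d = -8, n = 3, starting from the principal cube root.
z0 = complex(1, math.sqrt(3))
cube_roots = [z0 * wk for wk in roots_of_unity(3)]
print([z ** 3 for z in cube_roots])
```

Every entry of the first list should be zero up to rounding, and every entry of the second should be close to $-8$.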



By the geometrical representation of product and power (Fig. 8.4 and Theorem 8.19) the numbers $w^k$ for $k = 0, 1, \ldots, n - 1$ are vertices of a polygon with $n$ vertices and $n$ equal angles, in other words of the regular polygon with $n$ sides. Thus, for example, finding the numbers $\cos 2\pi/5$ and $\sin 2\pi/5$ corresponds to constructing the regular pentagon:

Figure 8.6

The question of whether it is possible to construct such regular polygons by certain means is a classical problem of plane geometry. So also is the problem of trisecting any given angle. In trigonometric terms this can be expressed as finding $\cos\theta/3$ and $\sin\theta/3$, given $\cos\theta$ and $\sin\theta$; in algebraic terms we are involved with finding roots of certain third degree equations as discussed above. Thus various geometrical problems are recast into algebraic form. One of the striking contributions of algebra has been the determination of the possibilities for carrying out various geometric constructions on the basis of purely algebraic considerations. This is a matter which will occupy our attention in the next chapter.
First, however, we must complete our work on roots of complex polynomials. This is now settled for the polynomials $\xi^n - d$ where $d \in C$. In the next section we make use of the techniques developed here to handle the general case.

Exercise Group 8.1

1. Prove Theorem 8.9.


2. Given $z \in C$, what is the significance of $(z + \bar{z})/2$, $(z - \bar{z})/2i$, $\bar{z}/|z|^2$
(for $z \neq 0$)?
3. Given a real number $a > 0$ and a complex number $z_1$, what corresponds,
in the geometric interpretation of the complex numbers, to each of the
following conditions:

$|z| = a$, $|z - z_1| = a$, $|z| < a$, $|z - z_1| < a$?

4. Are there any complex roots z of £3 — ~b £ T 4 with |z| < 1 ?


5. (a) Prove Theorem 8.19(i), (ii).
(b) Does 8.19(ii) hold for any $n \in I$? Justify your answer.

6. If the trigonometric representation of z is $r(\cos \theta + i \sin \theta)$ where $r \neq 0$
and $0 \le \theta < 2\pi$, what is the trigonometric representation of $z^{-1}$?
7. Show how 8.9 can be derived from 8.20, and that 8.10 and 8.21 are in
accordance for n = 2.
8. Express in algebraic form (i.e., in terms of i and specific nth roots of real
numbers) all the complex numbers z with:
(a) $z^3 = -i$
(b) $z^6 = -i$
(c) $z^3 = -2 + i \cdot 2$
(d) $z^4 = (-1/2) - i\sqrt{3}/2$
9. Find a polynomial equation to determine $x = \cos \theta/5$, given $\cos \theta$.
10. Prove Theorem 8.22(i), (ii).
11. (a) Show that if $w = \cos 2\pi/5 + i \sin 2\pi/5$ then $w^4 + w^3 + w^2 + w + 1 = 0$.
(b) Let $u = w + 1/w$. Show that $u^2 + u = 1$.
(c) Use (a), (b) to represent w algebraically.
12. Suppose that $n \in P$. Let $w = \cos 2\pi/n + i \sin 2\pi/n$. If $z \in C$, we call
z a primitive nth root of unity if the numbers $z^k$ for $k = 0, 1, \ldots, n - 1$
are all the solutions u of $u^n = 1$. Show that z is a primitive nth root of
unity if and only if $z = w^m$ for some $m \in P$ with $1 \le m < n$ and m
relatively prime to n.
13. Let E be the function of 7.45. We write $E(x) = e^x$. The function E is
extended to the complex numbers by the following definition: $E(z) =
e^x(\cos y + i \sin y)$ whenever $z = x + iy$ with x, y real. We also write
$E(z) = e^z$. Show that:
(a) $e^{z_1 + z_2} = e^{z_1} e^{z_2}$ for any $z_1, z_2 \in C$
(b) $e^{kz} = (e^z)^k$ for any $z \in C$, $k \in I$
(c) $e^{2\pi n i} = 1$ for any $n \in I$
Note that $e^{2\pi i/n}$ is another representation of the primitive nth root of
unity $\cos 2\pi/n + i \sin 2\pi/n$.

8.2 Polynomials and continuous functions in the complex numbers.


The notions of limit and continuity can be extended to the complex
numbers by using an appropriate notion of “distance” between complex
numbers. For real numbers, the distance between x and a is given by $|x - a|$.
This suggests using $|z - d|$ to measure the distance between two complex
numbers z and d, leading to the following diagram:

Figure 8.7

Since $|z - d|$ gives the distance of the point corresponding to $z - d$ from
the origin, it also gives the distance between the points corresponding to
z, d by the parallelogram law. Algebraically, if we set $z = x + iy$,
$d = s + it$ with x, y, s, t real, we have $|z - d| = \sqrt{(x - s)^2 + (y - t)^2}$,
which leads to a direct justification via Pythagoras’ theorem.
In the notions of limit and continuity we make use of conditions of the
form $|z - d| < r$ where r is a real number (> 0). Geometrically, for a
given d and r, the condition $|z - d| = r$ is satisfied by all z whose distance
from d is exactly r, that is, by all z on the circle with center d and
radius r.

Then the set of z with $|z - d| < r$ corresponds to the set of points interior
to the circle $|z - d| = r$.
We shall show in this section that in terms of this notion of distance
between complex numbers, not only the notions of limit and continuity but
also the basic results concerning these can be extended in a straightforward
way from the real to the complex numbers. Our purpose in doing this is
to provide us with the following approach to finding roots of complex
polynomials $f(\xi) \in C[\xi]$. We shall show that for each such polynomial the
corresponding function $f(z)$ is a continuous function on C. Now we cannot in general
speak of maximum or minimum values of such a function, since C is not
ordered. However, the function $|f(z)|$ will also be seen to be continuous,
and this function takes on only nonnegative real values. Then $f(\xi)$ has
a root if and only if the function $|f(z)|$ attains the minimum value 0 for
some z. Our first main step will be to show that $|f(z)|$ does always attain
some minimum value, generalizing 7.50. After that, we shall have to make
some arguments, which hold for the complex numbers, to show that this
minimum value cannot be other than 0. In the following we cover only
those parts of complex analysis needed to reach this result.
If we look back at the proof of 7.50, we see that essential use was made
of the Bolzano-Weierstrass Theorem 7.26, according to which every
bounded sequence contains a convergent subsequence. We also need here
the analogue of this. We thus begin by generalizing some of the material
of Section 7.2 on limits of sequences.

Limits and the Bolzano-Weierstrass theorem extended.

8.23 Definition. Let $(z_k) = (z_0, \ldots, z_k, \ldots)$ be an infinite sequence of
complex numbers $z_k$.
(i) If $d \in C$, we say d is a limit of $(z_k)$ and write

$\lim_{k \to \infty} z_k = d$

if for each real $\varepsilon > 0$ there exists an m such that $|z_k - d| < \varepsilon$
for all $k > m$.
(ii) We say $(z_k)$ is convergent if there exists some $d \in C$ which is a
limit of $(z_k)$.
(iii) We say $(z_k)$ is bounded if for some real number M and all k,
$|z_k| < M$.
(iv) We say $(w_k)$ is a subsequence of $(z_k)$ if for some sequence of
integers $(j_k)$, $j_0 < j_1 < \cdots < j_k < \cdots$ and $w_k = z_{j_k}$ for
each k.

It is easily seen, in generalization of 7.20, that each sequence $(z_k)$ has
at most one limit d; hence we are justified in writing $\lim_{k \to \infty} z_k = d$ when
such a limit exists. The Bolzano-Weierstrass theorem for C now reads
as follows.

8.24 Theorem. Suppose that $(z_k)$ is a bounded sequence of complex numbers.
Then $(z_k)$ contains at least one convergent subsequence.

Proof. In the proof of 7.26 the basic idea was that if an interval
$[b, c] = \{x : b \le x \le c\}$ contains $x_k$ for infinitely many k, the same holds
of one of the two subintervals $[b, (b + c)/2]$, $[(b + c)/2, c]$ obtained by
dividing the original interval in half. In this proof we consider rectangles
in the plane, instead of intervals, and subdivide such into four equal
subrectangles. By a (closed) rectangle here we could mean simply
$[b, c] \times [b', c'] = \{(x, y) : b \le x \le c \text{ and } b' \le y \le c'\}$. We modify this slightly,
so that we are dealing with complex numbers directly. We define

(1) $[b, c] \otimes [b', c'] = \{x + iy : b \le x \le c \text{ and } b' \le y \le c'\}$

for any $b, c, b', c' \in Re$. Let us put

(2) $z_k = x_k + i y_k$ where $x_k, y_k \in Re$,

for each k. Since $(z_k)$ is bounded, we have a real number M with
$\sqrt{x_k^2 + y_k^2} = |z_k| < M$ for each k. But $|x_k|, |y_k| \le \sqrt{x_k^2 + y_k^2}$. Hence
we can choose $b_0, c_0$ so that $b_0 < c_0$ and

(3) $z_k \in [b_0, c_0] \otimes [b_0, c_0]$ for all k.



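Now, in general, we have for any real b, c, b', c':

(4) if there are infinitely many k with $z_k \in [b, c] \otimes [b', c']$ then the
same holds for at least one of the sets

$\left[b, \frac{b + c}{2}\right] \otimes \left[b', \frac{b' + c'}{2}\right]$, $\left[b, \frac{b + c}{2}\right] \otimes \left[\frac{b' + c'}{2}, c'\right]$,
$\left[\frac{b + c}{2}, c\right] \otimes \left[b', \frac{b' + c'}{2}\right]$, $\left[\frac{b + c}{2}, c\right] \otimes \left[\frac{b' + c'}{2}, c'\right]$.

We denote these sets respectively by $S_i(b, c, b', c')$ for i = 1, 2, 3, 4.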

This result is illustrated in the following figure.

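Figure 8.9 (the rectangle $[b, c] \otimes [b', c']$ divided into four equal parts, with $S_2$, $S_4$ in the upper row and $S_1$, $S_3$ in the lower row)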

If $c - b = c' - b'$ then the original set $[b, c] \otimes [b', c']$ and each of its
subsets $S_i(b, c, b', c')$ will be squares. We can now define recursively the
sequences $b_n, c_n, b'_n, c'_n$ as follows.

(5) We take $b_0, c_0$ as given in (3) and $b'_0 = b_0$, $c'_0 = c_0$. Given
$b_n, c_n, b'_n, c'_n$ we take $b_{n+1}, c_{n+1}, b'_{n+1}, c'_{n+1}$ to be such that
$[b_{n+1}, c_{n+1}] \otimes [b'_{n+1}, c'_{n+1}] = S_i(b_n, c_n, b'_n, c'_n)$ for the first i (= 1, 2, 3, 4)
such that $z_k \in S_i(b_n, c_n, b'_n, c'_n)$ for infinitely many k.

Thus, for example, if the first such i is equal to 2, for given $b_n, c_n, b'_n, c'_n$,
we have

$b_{n+1} = b_n$, $c_{n+1} = \frac{b_n + c_n}{2}$, $b'_{n+1} = \frac{b'_n + c'_n}{2}$, $c'_{n+1} = c'_n$.

Now the following properties are easily proved by induction on n:

(6) (i) $b_n < c_n$ and $b'_n < c'_n$;
(ii) $b_n \le b_{n+1}$, $c_{n+1} \le c_n$, $b'_n \le b'_{n+1}$ and $c'_{n+1} \le c'_n$;
(iii) $c_n - b_n = c'_n - b'_n = \frac{1}{2^n}(c_0 - b_0)$;
(iv) there are infinitely many k such that $z_k \in [b_n, c_n] \otimes [b'_n, c'_n]$.

Then, as in the proof of 7.26, we proceed as follows:

(7) Let $B = \{b_0, \ldots, b_n, \ldots\}$, $C = \{c_0, \ldots, c_n, \ldots\}$,
$B' = \{b'_0, \ldots, b'_n, \ldots\}$, and $C' = \{c'_0, \ldots, c'_n, \ldots\}$.
Then each $c_n$ ($c'_n$) is an upper bound for B (B') and each $b_n$ ($b'_n$) is
a lower bound for C (C').

Hence sup B, sup B', inf C, inf C' exist by 7.9 and we can prove, in the
same way as in 7.26, that

(8) sup B = inf C and sup B' = inf C'.

Let

(9) d = (sup B) + i(sup B').

We want to show that d is the limit of a suitably chosen subsequence of
the $z_k$. One such subsequence is found as follows.

(10) Let $j_0 = 0$. Given $j_n$, let $j_{n+1}$ be the least k such that $j_n < k$
and $z_k \in [b_{n+1}, c_{n+1}] \otimes [b'_{n+1}, c'_{n+1}]$. Let $w_n = z_{j_n}$ for all
$n \ge 0$.

Then we see that

(11) $(w_k)$ is a subsequence of $(z_k)$, with $w_n \in [b_n, c_n] \otimes [b'_n, c'_n]$ for
all n.

Since also $d \in [b_n, c_n] \otimes [b'_n, c'_n]$ for all n, in order to show that

(12) $\lim_{n \to \infty} w_n = d$,

it suffices to show that

(13) for any real $\varepsilon > 0$ we can find an m such that $|w - u| < \varepsilon$
whenever $w, u \in [b_n, c_n] \otimes [b'_n, c'_n]$ and $n > m$.

This is realized by computing, for any n and for $w, u \in [b_n, c_n] \otimes [b'_n, c'_n]$,
the absolute value $|w - u|^2$ as follows: if $w = w_1 + i w_2$, $u = u_1 + i u_2$,
then

$|w - u|^2 = (w_1 - u_1)^2 + (w_2 - u_2)^2 \le (c_n - b_n)^2 + (c'_n - b'_n)^2 \le 2 \left(\frac{1}{2^n}\right)^2 (c_0 - b_0)^2$,

the last inequality by (6)(iii). Hence

$|w - u| \le \frac{\sqrt{2}\,(c_0 - b_0)}{2^n}$.

Then given $\varepsilon > 0$, choose m so that

$\frac{\sqrt{2}\,(c_0 - b_0)}{2^m} < \varepsilon$.

This gives (13) and thus concludes the proof of the theorem.
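
The quadrisection in steps (4), (5) and (10) can be imitated on a finite sample of a sequence. The sketch below is only a finite surrogate and rests on an assumption of ours: the condition "contains $z_k$ for infinitely many k", which no program can test, is replaced by "contains at least a quarter of the sample points that are still inside the current square". The name `quadrisect` and this counting rule are not from the text.

```python
def in_rect(z, b, c, bp, cp):
    """True if z lies in the closed rectangle [b, c] (x) [bp, cp]."""
    return b <= z.real <= c and bp <= z.imag <= cp


def quadrisect(zs, steps):
    """Imitate (5) and (10) on the finite list zs.

    Returns (picks, square): picks is an increasing list of indices
    j_0 < j_1 < ..., and square is the final (b, c, b', c').
    """
    m = max(max(abs(z.real), abs(z.imag)) for z in zs)
    b, c, bp, cp = -m, m, -m, m          # initial square, as in (3)
    alive = list(range(len(zs)))         # indices of points in the current square
    picks = [0]                          # j_0 = 0
    for _ in range(steps):
        mx, my = (b + c) / 2, (bp + cp) / 2
        quads = [(b, mx, bp, my), (b, mx, my, cp),   # S1, S2
                 (mx, c, bp, my), (mx, c, my, cp)]   # S3, S4
        for q in quads:
            inside = [k for k in alive if in_rect(zs[k], *q)]
            # Finite stand-in for "infinitely many k": the first S_i holding
            # at least a quarter of the surviving points (one always exists).
            if 4 * len(inside) >= len(alive):
                break
        b, c, bp, cp = q
        alive = inside
        later = [k for k in alive if k > picks[-1]]
        if not later:
            break                        # the finite sample is exhausted
        picks.append(later[0])           # least k > j_n in the new square
    return picks, (b, c, bp, cp)


if __name__ == "__main__":
    zs = [((-1) ** k) * (1 + 1 / (k + 1)) + 1j / (k + 1) for k in range(4000)]
    picks, square = quadrisect(zs, 10)
    print(square, zs[picks[-1]])
```

For the sample sequence in the usage lines, which has the two accumulation points 1 and -1, the squares shrink toward one of them; which one is found depends on the fixed order $S_1, \ldots, S_4$ in which the subsquares are tried, just as in (5).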

Essentially all that we have used about the complex numbers in this
theorem comes from their representation in the plane $Re \times Re$ and from
the basic properties of the distance function $|z - w|$ in this representation.
It could equally well have been expressed and proved as a theorem
concerning $Re \times Re$, without any mention of C. In the same way one can
prove a corresponding theorem for the three-dimensional Euclidean space
$Re \times Re \times Re$, and for higher-dimensional Euclidean spaces. In recent
years the theorem has undergone considerable generalization in the
subject of (point-set) topology. We can also generalize the notion of fundamental
sequence (7.22) to two and higher dimensions, and prove the basic Cauchy
condition for convergence of a sequence, namely that it be a fundamental
sequence (7.23, 7.27). We shall not need this here, and leave it to the
reader to pursue.

Continuity extended. We now pass over to analogues of the material of
Section 7.4 on continuous functions.

8.25 Definition. Suppose that F is a unary function with $\mathfrak{D}(F) = C$,
$\mathfrak{R}(F) \subseteq C$. Suppose that $d \in C$.
(i) We say F is continuous at d if for any (real) $\varepsilon > 0$ there exists
a $\delta > 0$ such that whenever

$|z - d| < \delta$ then $|F(z) - F(d)| < \varepsilon$.

(ii) We say F is continuous on C if for each complex number d, F is
continuous at d.

We easily obtain the following, as with 7.47.

8.26 Lemma. Suppose that F is continuous on C and that $(z_k)$ is a convergent
sequence of complex numbers with $\lim_{k \to \infty} z_k = d$. Then $(F(z_k))$ is
also a convergent sequence and $\lim_{k \to \infty} F(z_k) = F(d)$.

Next, we have the following generalization of 7.51. The proof is left
to the reader.

8.27 Theorem. Suppose that c is complex and that G, H are continuous
functions on C. Then the function F defined by any one of the following
conditions, for all $z \in C$, is continuous on C:
(i) $F(z) = c$;
(ii) $F(z) = z$;
(iii) $F(z) = G(z) + H(z)$;
(iv) $F(z) = G(z) \cdot H(z)$;
(v) $F(z) = |G(z)|$.

Now we can prove a generalization of 7.50(ii) for continuous functions
of the form $|F(z)|$. [Part (i) can also be generalized in the same way, but
will not be needed here.]

8.28 Theorem. Suppose that F is continuous on C and that r is a real
number > 0. Then there exists at least one complex number d such that
$|d| \le r$ and $|F(d)| \le |F(z)|$ for all z with $|z| \le r$.

Proof. This follows the same lines as the proof of 7.50. Let

$A = \{|F(z)| : |z| \le r\}$.

Then A is bounded below by 0. Let a = inf A. By 7.18(i), for each
$n \in P$ we can find some $z_n \in C$ with $|z_n| \le r$ and $|F(z_n)| - a < 1/n$.
By the Bolzano-Weierstrass theorem for C, $(z_n)$ has a convergent
subsequence $(w_k) = (z_{n_k})$, where $n_0 < n_1 < \cdots < n_k < \cdots$. Let
$\lim_{k \to \infty} w_k = d$. Then since each $|w_k| \le r$, it is seen that $|d| \le r$. By 8.26 and
8.27(v), $\lim_{k \to \infty} |F(w_k)| = |F(d)|$. To conclude the proof, we show that
$a = |F(d)|$. Given any $\varepsilon > 0$ we can find $n \in P$ with $1/n < \varepsilon/2$. Thus
whenever k is large enough so that $n < n_k$ we have $|\,|F(w_k)| - a\,| =
|F(z_{n_k})| - a < 1/n_k < \varepsilon/2$. Also we can find m so that
$|\,|F(d)| - |F(w_k)|\,| < \varepsilon/2$ for all $k > m$. Hence for k large enough, $|\,|F(d)| - a\,| < \varepsilon$.
Since this is true for each $\varepsilon > 0$, we must have $a = |F(d)|$.

Polynomial functions; growth and minimum of the modulus. By the
preceding theorem any continuous function on C attains its minimum absolute
value within any given circle. To apply this result to polynomials, we next
obtain directly from 8.27(i)-(iv) the following.

8.29 Theorem. If $f(\xi) \in C[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ where the $a_i$ are complex,
then the associated polynomial function $f(z) = \sum_{i=0}^{n} a_i z^i$ is continuous.

It follows, of course, that $|f(z)|$ attains a minimum in any given circle.
However, we want now to obtain a stronger conclusion, namely that the
modulus $|f(z)|$ takes on an absolute minimum relative to the whole
complex plane. This is realized from the following theorem, which shows that,
on the whole, $|f(z)|$ can only grow larger as $|z|$ is made larger.

8.30 Theorem. Suppose that $f(\xi) \in C[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ with
deg $(f(\xi)) = n > 0$.
Then for any real number $M > 0$ we can find a real number $r > 0$
such that $|f(z)| \ge M$ whenever $|z| \ge r$.

Proof. By 8.13(v), (vi), we have

$|f(z)| \ge |a_n z^n| - \left|\sum_{i=0}^{n-1} a_i z^i\right| \ge |a_n z^n| - \sum_{i=0}^{n-1} |a_i z^i|$.

Thus by 8.13(iii) we have:

(1) if $|z| \ge 1$ then $|f(z)| \ge |a_n|\,|z|^n \left(1 - \sum_{i=0}^{n-1} \frac{|a_i|}{|a_n|\,|z|}\right)$.

Let

(2) $r_f = \sum_{i=0}^{n-1} \frac{|a_i|}{|a_n|}$.

Then we see that

(3) if $|z| \ge \max(1, 2r_f)$ then $|f(z)| \ge \frac{|a_n|\,|z|^n}{2}$.

To prove the theorem, it is sufficient to choose $r > 0$ so that

(4) $r \ge \max(1, 2r_f)$ and $r^n \ge \frac{2M}{|a_n|}$,

which is clearly possible.
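
The choice of r in (4) is explicit enough to compute. The following sketch is an illustration under our own conventions: a polynomial is a Python list `[a_0, ..., a_n]` of coefficients in ascending order, and `growth_radius` and `evaluate` are hypothetical helper names, not notation from the text.

```python
import cmath
import math


def evaluate(coeffs, z):
    """Evaluate f(z) for f = sum coeffs[i] * xi**i (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * z + a
    return acc


def growth_radius(coeffs, M):
    """Return r > 0 as in (4), so that |f(z)| >= M whenever |z| >= r.

    Assumes len(coeffs) >= 2 and coeffs[-1] != 0 (degree n > 0).
    """
    n = len(coeffs) - 1
    a_n = abs(coeffs[-1])
    r_f = sum(abs(a) for a in coeffs[:-1]) / a_n        # as in (2)
    return max(1.0, 2.0 * r_f, (2.0 * M / a_n) ** (1.0 / n))


if __name__ == "__main__":
    f = [4, -1, 0, 1]                      # xi**3 - xi + 4
    r = growth_radius(f, 100.0)
    smallest = min(abs(evaluate(f, cmath.rect(r, 2 * math.pi * j / 360)))
                   for j in range(360))
    print(r, smallest)                     # smallest sampled |f| on |z| = r
```

The radius produced this way is usually far from the least possible one; the proof only needs some r that works.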

8.31 Theorem. Suppose that $f(\xi) \in C[\xi]$ with deg $(f(\xi)) > 0$. Then there
exists at least one complex number d with $|f(d)| \le |f(z)|$ for all $z \in C$.

Proof. By 8.28, we can find $d_1 \in C$ with $|d_1| \le 1$ and $|f(d_1)| \le |f(z)|$
whenever $|z| \le 1$. Let $M = |f(d_1)|$. By 8.30 we can find $r > 1$ such that
$|f(z)| \ge M$ whenever $|z| \ge r$. Now if we apply 8.28 again, we can find
$d \in C$ with $|d| \le r$ and $|f(d)| \le |f(z)|$ for all z with $|z| \le r$. This d satisfies the
conclusion, for if $|z| > r$ we have $|f(z)| \ge |f(d_1)| \ge |f(d)|$, since $|d_1| \le r$.

The next theorem gives us another important property of polynomials.
When combined with 8.31 it will lead directly to our main theorem.

8.32 Theorem. Suppose that $f(\xi) \in C[\xi]$ with deg $(f(\xi)) > 0$. Then for
each $d \in C$ with $f(d) \neq 0$ there exists some $z \in C$ with $|f(z)| < |f(d)|$.

Proof. Let $c = f(d)$ and $n = $ deg $(f(\xi))$. Thus $c \neq 0$, $n > 0$. We shall
investigate the behavior of $f(z)$ in a neighborhood of d. This is the same as
studying the behavior of $f(d + z)$ for z in a neighborhood of 0. Let $g(\xi) =
f(d + \xi)/c$. Then $g(\xi) \in C[\xi]$ and

(1) (i) $g(z) = \frac{f(d + z)}{c}$ for all $z \in C$;

(ii) $g(\xi) = 1 + b_1 \xi + \cdots + b_n \xi^n$, with all $b_i \in C$, $b_n \neq 0$.

For if $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$ we have

$c \cdot g(\xi) = f(\xi + d) = \sum_{i=0}^{n} a_i (\xi + d)^i$,

in which the coefficient of $\xi^n$ is $a_n$, so $b_n = a_n/c$. Since $g(0) = f(d)/c = 1$, this shows that $b_0 = 1$. If

(2) there exists a $z \in C$ with $|g(z)| < 1$,

then the theorem is proved, for $|f(d + z)| < |c| = |f(d)|$ in this case. Let

(3) $g(\xi) = 1 + b_k \xi^k + \cdots + b_n \xi^n$, where $0 < k \le n$ and $b_k \neq 0$.

By 8.18 we can write

(4) $b_k = |b_k|(\cos \delta + i \sin \delta)$, $z = r(\cos \theta + i \sin \theta)$,

where $\delta$, $\theta$ are real numbers and where $0 < r$. Although $|b_k|$ and $\delta$ are
taken to be fixed, we are still free to determine r, $\theta$. By De Moivre’s theorem,

(5) $b_k z^k = |b_k| r^k [\cos(\delta + k\theta) + i \sin(\delta + k\theta)]$.

Our strategy is now as follows: we shall choose $\theta$ in such a way that $b_k z^k$
is a negative real number, and r is so small that $|1 + b_k z^k| < 1$; we then
choose r still smaller, if necessary, so that when $|b_{k+1} z^{k+1} + \cdots + b_n z^n|$
is added to $|1 + b_k z^k|$, the result is still smaller than 1. For the first purpose
we simply take

(6) $\theta = \frac{\pi - \delta}{k}$.

Then $\delta + k\theta = \pi$ and

(7) $b_k z^k = -|b_k| r^k$.

If $k = n$, we can choose $r = 1/\sqrt[k]{|b_k|}$, so that $g(z) = 0$ and (2) is satisfied.
(We are in effect taking z to be a kth root of $-1/b_k$ in this case.) Suppose
that $k < n$. Now if we choose $r < 1$ then

$|b_{k+1} z^{k+1} + \cdots + b_n z^n| = r^{k+1} |b_{k+1} + \cdots + b_n z^{n-k-1}|$

$\le r^{k+1} (|b_{k+1}| + \cdots + |b_n| r^{n-k-1})$

$\le r^k \cdot r\,(|b_{k+1}| + \cdots + |b_n|)$.

(8) Choose $r < \min\left(1, \frac{1}{\sqrt[k]{|b_k|}}, \frac{|b_k|}{|b_{k+1}| + \cdots + |b_n|}\right)$.

Then

$|g(z)| \le |1 + b_k z^k| + |b_{k+1} z^{k+1} + \cdots + b_n z^n|$

$< |1 - |b_k| r^k| + r^k |b_k| = (1 - |b_k| r^k) + |b_k| r^k = 1$.

This proves (2) and hence the theorem.
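
The proof of 8.32 is constructive, and a single step of it can be sketched in code. The following is an illustration under stated assumptions rather than a root-finding method: polynomials are lists `[a_0, ..., a_n]` in ascending order, `shift`, `evaluate` and `smaller_value` are names of our own, the test `b[i] != 0` is an exact floating-point comparison, and the factor 0.5 is simply one way to satisfy the strict inequality in (8).

```python
import cmath
import math


def evaluate(coeffs, z):
    """Evaluate f(z) for f = sum coeffs[i] * xi**i (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * z + a
    return acc


def shift(coeffs, d):
    """Return the coefficients of f(d + xi), by repeated synthetic division."""
    c = list(coeffs)
    n = len(c) - 1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            c[j] += d * c[j + 1]
    return c


def smaller_value(coeffs, d):
    """Given f(d) != 0 and deg f > 0, return z with |f(z)| < |f(d)|."""
    shifted = shift(coeffs, d)
    c = shifted[0]                              # c = f(d)
    b = [a / c for a in shifted]                # g(xi) = f(d + xi)/c, b[0] = 1
    n = len(b) - 1
    k = next(i for i in range(1, n + 1) if b[i] != 0)    # as in (3)
    delta = cmath.phase(b[k])                   # b_k = |b_k|(cos delta + i sin delta)
    theta = (math.pi - delta) / k               # (6): makes b_k z**k negative real
    bound = 1.0 / abs(b[k]) ** (1.0 / k)
    if k == n:
        r = bound                               # then g(z) = 0
    else:
        tail = sum(abs(x) for x in b[k + 1:])   # > 0, since b_n != 0
        r = 0.5 * min(1.0, bound, abs(b[k]) / tail)      # one choice allowed by (8)
    return d + cmath.rect(r, theta)             # the point d + z


if __name__ == "__main__":
    f = [4, -1, 0, 1]                           # xi**3 - xi + 4
    d = 1 + 1j
    z = smaller_value(f, d)
    print(abs(evaluate(f, d)), abs(evaluate(f, z)))
```

Repeating the step keeps lowering $|f|$ in exact arithmetic but need not approach a root; that is why the text combines 8.32 with the existence of an absolute minimum from 8.31 instead of iterating.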

The fundamental theorem of complex algebra. We can now combine these
results to prove our main theorem.

8.33 Theorem. Suppose that $f(\xi) \in C[\xi]$, $f(\xi) = \sum_{i=0}^{n} a_i \xi^i$, with
deg $(f(\xi)) = n > 0$. Then:
(i) there exists at least one complex root of $f(\xi)$, i.e., some $z \in C$ with
$f(z) = 0$;
(ii) for some $z_1, \ldots, z_n \in C$, $f(\xi) = a_n(\xi - z_1) \cdots (\xi - z_n)$.

Proof. (i) By 8.31 we can find $d \in C$ with $|f(d)| \le |f(z)|$ for all $z \in C$.
Then $f(d) = 0$, for otherwise by 8.32 we could find some $z \in C$ with
$|f(z)| < |f(d)|$. (ii) follows by induction on n, using 5.13.

This theorem is one of the many remarkable and important
contributions of Gauss to mathematics (and to physics and astronomy as well).
Gauss himself gave five different proofs of it, and various other proofs have
been developed since then. The main step in the present proof, the
Theorem 8.32, is a special case of a general theorem in complex analysis
due to Weierstrass.
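
The induction behind 8.33(ii) splits off one linear factor at a time. The sketch below assumes that this is done by dividing by $\xi - z$ for a known root z, which is how we read the appeal to 5.13; the function name `deflate` and the list convention `[a_0, ..., a_n]` are our own.

```python
def deflate(coeffs, z):
    """Divide f by (xi - z).

    coeffs lists a_0, ..., a_n in ascending order, with n >= 1.
    Returns (quotient, remainder); the remainder equals f(z).
    """
    n = len(coeffs) - 1
    quotient = [0] * n
    acc = coeffs[n]
    for i in range(n - 1, -1, -1):
        quotient[i] = acc
        acc = coeffs[i] + z * acc
    return quotient, acc


if __name__ == "__main__":
    f = [-1, 0, 0, 0, 1]                 # xi**4 - 1
    q = f
    for root in (1, 1j, -1, -1j):        # the 4th roots of unity
        q, remainder = deflate(q, root)
        print(root, q, remainder)        # each remainder should be 0
    # After n steps only the leading coefficient a_n is left.
```

Each pass lowers the degree by one, so after n passes the quotient is the constant $a_n$, in agreement with $f(\xi) = a_n(\xi - z_1) \cdots (\xi - z_n)$.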
One sometimes hears it said that the fundamental theorem of algebra
is neither “fundamental” nor a “theorem of algebra.” There is perhaps
some justice to these remarks. The greater the number of important
results that are seen to flow from a theorem, the more we are inclined to
refer to it as being fundamental. From the point of view of modern algebra,
it cannot be said that 8.33 is more fundamental than other results.
However, as regards our work here, its significance is of the first order. Our
entire effort has been bent to overcoming the algebraic incompleteness of
the number systems P, I, Ra, and Re with which we have successively
dealt and, in this measure, each of these provided some advance over the
preceding. The Theorem 8.33 can be restated to the effect that, as far as
roots of polynomial equations are concerned, C is algebraically complete.
Thus, in this respect, our work has come to an end, and any constructions
of larger number systems than C must be based on other considerations.
(There are, in fact, a number of such extensions which are of interest, but
they do not fall within the framework of this book because of the nature
of these other considerations.) On the other hand, we shall see that if it
is just this property of algebraic completeness or closure that we are
interested in, a certain subfield of C already serves the purpose, namely the
field Alg of algebraic complex numbers. This notion will be defined in the
next section in obvious generalization of 7.61.
The construction of C from Re is algebraic in spirit, but that of Re
from Ra is not. The first of these can be said to proceed by formally
adjoining a root of $\xi^2 + 1$ to Re, in a sense which will be defined precisely in the
next chapter. However, the construction of Re from Ra involved some
essentially nonalgebraic concepts, either the use of arbitrary Dedekind
sections in Ra or of arbitrary fundamental sequences in Ra, and the
pervading idea is the geometric-analytic notion of continuity. That this
can be avoided if we wish to be content with Alg is, compared to our
intuitive conception of Re and C, a sophisticated development. This is
realized by a general treatment of the process of formal adjunction of
roots of polynomials to given fields. It is this procedure which takes on
central importance in modern algebra, and to which we shall devote
attention in the next chapter.
Thus, when it is said that the fundamental theorem of algebra is not
really a theorem of algebra, what is suggested in part is the nature of the
setting, C, of the theorem. What is also suggested is the nature of the proof
of the theorem. From this point of view it may rightly be said that it is
“really” a theorem of complex analysis or, even, “really” a theorem of
topology. This is not to say that one cannot give algebraic proofs of it.
In fact, one of Gauss’ proofs is very much in the spirit of modern algebra,
but again involves more sophisticated work.
The proof which we have given here makes use of a minimum amount of
information from analysis. By various minor modifications, one could
do with even less, in particular without the use of the trigonometric
functions. However, this is also less enlightening; to do without the
trigonometric representation would also mean doing without various
informative results such as 8.20, 8.22 on the roots of $\xi^n - d$.
In the opposite direction, we want to indicate what a fuller use of
complex analysis would provide. As with the real numbers, it is natural to
consider functions F defined on C by power series, $F(z) = \sum_{k=0}^{\infty} a_k z^k$
where the $a_k \in C$. Questions of convergence for such series can often be
determined in much the same way as 7.41 is used for real power series.
Thus if $\lim_{k \to \infty} |a_{k+1}/a_k| = d$ exists then $\sum_{k=0}^{\infty} a_k z^k$ converges for all z if
$d = 0$, and converges for all z with $|z| < 1/d$ if $d \neq 0$. In the latter case,
the set $\{z : |z| < r\}$, where $r = 1/d$, is called the circle of convergence;
in the former case, the whole complex plane is referred to as the circle
of convergence. A function defined by a power series which converges for
all z is said to be an entire function; in particular any polynomial function
is an entire function. Functions which are defined by power series on the
interior of their circle of convergence have many beautiful properties there,
among which is the property that they are continuous there and all
derivatives of the function are defined there. Such functions are said to be
analytic or holomorphic in their domain of definition.
An example of an entire function is the natural extension of the
function E of 7.43 to C, as given by $E(z) = \sum_{k=0}^{\infty} z^k/k!$. In extension of
notation, we also write $e^z$ for $E(z)$. Various of the properties 7.45 of E for real
numbers also continue to hold, e.g., $e^{z_1} \cdot e^{z_2} = e^{z_1 + z_2}$ for any $z_1, z_2 \in C$.
We have

$e^{iz} = \sum_{k=0}^{\infty} \frac{i^k z^k}{k!} = \sum_{k=0}^{\infty} \left( \frac{(-1)^k z^{2k}}{(2k)!} + i\,\frac{(-1)^k z^{2k+1}}{(2k+1)!} \right)$.

Now, as we have stated in the Appendix II, we have for any real y,

$\cos y = \sum_{k=0}^{\infty} \frac{(-1)^k y^{2k}}{(2k)!}$ and $\sin y = \sum_{k=0}^{\infty} \frac{(-1)^k y^{2k+1}}{(2k+1)!}$,

so that $e^{iy} = \cos y + i \sin y$. Then $e^{x+iy} = e^x e^{iy} = e^x(\cos y + i \sin y)$;
this is what suggested the definition of Exercise 13 of the preceding section.
With a modest development of complex analysis, the above power series
definitions can be used to develop all of the basic properties 8.14 of the
trigonometric functions.
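
The relation between the series for $e^z$ and the trigonometric functions can be observed numerically. This is a minimal sketch of our own; the name `exp_series` and the cutoff of 40 terms are arbitrary choices that are adequate only for moderate $|z|$.

```python
import cmath
import math


def exp_series(z, terms=40):
    """Partial sum of the series for E(z): sum of z**k / k! for k < terms."""
    total = 0j
    term = 1 + 0j                  # z**0 / 0!
    for k in range(terms):
        total += term
        term *= z / (k + 1)        # next term: z**(k+1) / (k+1)!
    return total


if __name__ == "__main__":
    y = 2.0
    print(exp_series(1j * y), complex(math.cos(y), math.sin(y)))
    z1, z2 = 0.3 + 1.1j, -0.7 + 0.4j
    print(exp_series(z1) * exp_series(z2), exp_series(z1 + z2))
```

The two printed pairs should agree to many decimal places, in line with $e^{iy} = \cos y + i \sin y$ and $e^{z_1} \cdot e^{z_2} = e^{z_1 + z_2}$.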
To return to the main matter at hand, a function F on C is said to be
constant if for some complex number c, $F(z) = c$ for all z. Then the general
theorem of Weierstrass, to which we have referred in connection with
8.32, is that if F is an entire function but is not constant and if $F(d) \neq 0$
then there exists $z \in C$ with $|F(z)| < |F(d)|$. There also exists $z \in C$ with
$|F(z)| > |F(d)|$, and in both cases z can be chosen arbitrarily close to d.
Closely related to the latter statement is Liouville’s theorem: a bounded
entire function ($|F(z)| \le M$ for some M and all z) must be constant.
This leads to another proof of the fundamental theorem. For it can be
shown that if $f(\xi)$ has no roots, the function $1/f(z)$ is entire; but also
$1/f(z)$ can be seen to be bounded by using 8.30.
The function $E(z) = e^z$ is entire and not constant; thus $e^z$ is not bounded,
by Liouville’s theorem. Furthermore $e^z = 0$ has no solutions, for $|e^z| =
|e^{x+iy}| = e^x \neq 0$ by 7.45. It may be asked why, since 8.32 generalizes to
any entire function, we cannot also derive a theorem like 8.33 for such,
in particular that $e^z = 0$ for some $z \in C$. The reason lies in the difference
between the way $|F(z)|$ grows for $F(z)$ a polynomial function and for $F(z)$
an entire function like $e^z$. By 8.30, for $F(z)$ a polynomial function, $|F(z)|$
must increase without bound in all directions, for $|z|$ increasing. In
contrast, $|e^z| = e^x$ increases without bound in some directions, namely for x
increasing, but decreases in others, namely for x (negative and) decreasing.
In general, although for each particular r we can conclude from 8.28 that
an entire function F achieves a minimum for $|z| \le r$, without the kind of
growth condition 8.30 we cannot conclude that F achieves an absolute
minimum, as was done for polynomials in 8.31. Without such a conclusion,
we cannot use 8.32 to obtain a more general theorem about the existence
of zeros of entire functions.
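
The contrast between polynomials and $e^z$ can be seen by estimating the minimum of $|F(z)|$ on discs of growing radius. The sketch below uses a crude polar grid search, which is our own device and not a method from the text; `min_modulus_on_disc` is a hypothetical name, and a grid only gives an upper estimate of the true minimum.

```python
import cmath
import math


def min_modulus_on_disc(F, r, radial=50, angular=360):
    """Smallest sampled value of |F(z)| over a polar grid on the disc |z| <= r."""
    best = abs(F(0j))
    for i in range(1, radial + 1):
        for j in range(angular):
            z = cmath.rect(r * i / radial, 2 * math.pi * j / angular)
            best = min(best, abs(F(z)))
    return best


if __name__ == "__main__":
    for r in (1.0, 2.0, 5.0):
        poly_min = min_modulus_on_disc(lambda z: z * z + 1, r)
        exp_min = min_modulus_on_disc(cmath.exp, r)
        print(r, poly_min, exp_min, math.exp(-r))
```

For $\xi^2 + 1$ the sampled minimum stops changing once the disc contains the roots $\pm i$, whereas for $e^z$ it equals $e^{-r}$ and keeps shrinking toward 0 without ever being attained on the whole plane.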

On computing roots of complex polynomials. We wish now to examine the
computational aspects of the fundamental theorem of algebra. As with
real polynomials, we can ask whether there is a systematic method for
computing, to any desired degree of accuracy, at least one root z of a
polynomial $f(\xi) \in C[\xi]$. By this we mean computing to any desired degree
of accuracy some real numbers x, y such that $x + iy$ is a root of $f(\xi)$.
More generally, we can ask whether there is an analogue of Sturm’s
Theorem 7.60 which would allow us to determine the exact number of roots
of $f(\xi)$ in any given region of the complex plane, and hence to determine
all of the roots to any given degree of accuracy. It would be sufficient here
to deal with rather simple regions such as squares or circles.
Our proof of the fundamental theorem is somewhat more closely related
to the Theorem 7.50 on the existence of minima for real functions than it
is to Weierstrass’ Nullstellensatz. The latter provided one direct method
for computing at least one root of a real function which is positive at one
point, negative at another. However, as we remarked, there is in general
no such method for finding where a real function F attains a minimum on
a given interval $[a, b]$. It can be seen that there is such a method for
polynomial functions $f(x)$. For, according to Exercise 9 of Exercise Group 7.4,
if $f(x)$ attains a minimum at $x = c$ in an interval $[a, b]$ either $f'(c) = 0$ or
$c = a$ or $c = b$. All the roots c of $f'(\xi)$ can be located by successive
applications of Sturm’s theorem, and then a minimum can be determined by
comparing $f(a)$, $f(b)$, and the various values $f(c)$.
Unfortunately, there is no simple way to extend this method to complex
polynomials or to reduce the problem for such to the corresponding problem
for real polynomials. It can be seen that for any polynomial $f(\xi)$ with real
or complex coefficients we have $f(x + iy) = g(x, y) + i h(x, y)$ where
$g(\xi_1, \xi_2)$, $h(\xi_1, \xi_2)$ are polynomials in two variables with real coefficients.
Then we can look upon finding a root of $f(\xi)$ as either finding (x, y) where
$|f(x + iy)|$ or, equivalently, where $|f(x + iy)|^2 = g^2(x, y) + h^2(x, y)$ is a
minimum, or finding (x, y) such that $g(x, y) = 0$ and $h(x, y) = 0$. In
either case we have a more difficult problem than that for real polynomials
with one variable.
It turns out from advanced work in analysis that there is an algorithm
for calculating to any degree of accuracy all of the roots of a polynomial
$f(\xi) \in C[\xi]$ whose coefficients themselves can be computed to any degree
of accuracy. This depends on the notion of complex integration, and the
theorem that if S is a simple closed curve, then $\frac{1}{2\pi i} \int_S \frac{f'(z)}{f(z)}\,dz$ is
equal to the number of roots of $f(\xi)$, each counted as often as its
multiplicity in $f(\xi)$, which are within the interior of S; here S must be such that
$f(z) \neq 0$ for all z on S. In general, this integration must be replaced by an
approximate integration in order to carry out the algorithm. One trouble
concerning this procedure is the following. If deg $(f(\xi)) = n$ and $f(\xi)$
has n distinct roots, the computation procedure will eventually isolate
each of these. However, if there are multiple roots, the procedure will not
uncover this for us, so that at a given stage of the process we may not be
sure whether we are isolating a multiple root or two roots extremely close
together. Thus it is preferable to deal instead with the polynomial
given by 6.39, which has the same complex roots as $f(\xi)$, but for which
all of these roots are simple.
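
The approximate integration mentioned above can be sketched for the simplest curve, a circle. The code below is an assumption-laden illustration and not the algorithm the text alludes to: it replaces the integral by an equally spaced sum over the circle, and the names `count_roots_in_circle` and `evaluate`, together with the default of 2000 sample points, are our own choices.

```python
import cmath
import math


def evaluate(coeffs, z):
    """Evaluate f(z) for f = sum coeffs[i] * xi**i (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * z + a
    return acc


def count_roots_in_circle(coeffs, center, radius, samples=2000):
    """Approximate (1/(2*pi*i)) * integral of f'(z)/f(z) dz over a circle.

    Assumes f has no root on the circle itself. The result is a complex
    number close to the number of roots inside, counted with multiplicity.
    """
    derivative = [k * a for k, a in enumerate(coeffs)][1:]
    total = 0j
    for j in range(samples):
        t = 2 * math.pi * j / samples
        e = cmath.rect(1.0, t)
        z = center + radius * e
        dz = 1j * radius * e * (2 * math.pi / samples)
        total += evaluate(derivative, z) / evaluate(coeffs, z) * dz
    return total / (2j * math.pi)


if __name__ == "__main__":
    f = [-1, 0, 0, 0, 1]                          # xi**4 - 1
    print(count_roots_in_circle(f, 0, 2.0))       # all four roots inside
    print(count_roots_in_circle(f, 1, 0.5))       # only the root 1 inside
    print(count_roots_in_circle([1, -2, 1], 1, 0.5))   # double root of (xi - 1)**2
```

The last line of the usage shows the difficulty described in the text: the count 2 alone does not reveal whether the circle contains one double root or two nearby simple roots.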
On the other hand, by a more detailed algebraic analysis, it has been
shown by Tarski that Sturm’s procedure can be generalized to polynomials
of several variables with real coefficients. In particular, if $g(\xi_1, \xi_2)$, $h(\xi_1, \xi_2)$
are two such polynomials, his work provides a systematic procedure for
calculating the exact number of pairs (x, y) satisfying $g(x, y) = 0$,
$h(x, y) = 0$, $a < x < b$, $a' < y < b'$, by certain calculations on a, b,
a', b' and the coefficients of $g(\xi_1, \xi_2)$ and $h(\xi_1, \xi_2)$. These computations
can be carried out effectively if all these numbers are rational, or more
generally, algebraic. Tarski’s procedure can thus be used to decide exactly
how many roots (and even with what multiplicities) a given polynomial
$f(\xi)$ with algebraic coefficients has within a given (algebraically defined)
region of the complex plane, and then to compute these roots to any
desired degree of accuracy.
In both cases not only the theoretical justification but also the procedure
to be followed is rather involved. It is thus outside of the range of this
book to try to describe them any more closely.

Decomposition of real polynomials. We now return to the main lines of
our study by considering the algebraic consequences of the fundamental
theorem. This will occupy our attention in the next section and the
succeeding chapter. We conclude this section with a direct consequence,
which now completely settles the structure of polynomials with real
coefficients. We leave the proof to the reader.

8.34 Theorem. Suppose that $f(\xi) \in Re[\xi]$ with deg $(f(\xi)) = n > 0$.
Then:
(i) there exist $k, l \in I$ with $0 \le l \le n$, $2k + l = n$, and $a, b_1, \ldots, b_k,
c_1, \ldots, c_k, d_1, \ldots, d_l \in Re$ such that for each $i = 1, \ldots, k$,
$b_i^2 - 4c_i < 0$ and such that

$f(\xi) = a(\xi^2 + b_1 \xi + c_1) \cdots (\xi^2 + b_k \xi + c_k)(\xi - d_1) \cdots (\xi - d_l)$;

(ii) $f(\xi)$ is prime in $Re[\xi]$ if and only if n = 1 or n = 2 and $f(\xi)$
has no real roots;
(iii) the representation in (i) is unique up to the order of the factors.
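
The quadratic factors in 8.34(i) can be read off from the complex roots by pairing each non-real root z with its conjugate, since $(\xi - z)(\xi - \bar{z}) = \xi^2 - 2\,\mathrm{Re}(z)\,\xi + |z|^2$. The sketch below takes that pairing as given; it is our reading of how the theorem follows from 8.33, and the name `real_factors`, the tolerance `tol`, and the example $\xi^4 + 1$ are our own.

```python
import cmath
import math


def real_factors(roots, tol=1e-9):
    """Group the complex roots of a real polynomial as in 8.34(i).

    roots must list every root with its multiplicity, so that non-real
    roots occur in conjugate pairs. Returns (quadratics, linears), where
    quadratics holds pairs (b, c) for factors xi**2 + b*xi + c and
    linears holds the real roots d for factors xi - d.
    """
    linears = [z.real for z in roots if abs(z.imag) <= tol]
    upper = [z for z in roots if z.imag > tol]     # one root from each pair
    quadratics = [(-2 * z.real, abs(z) ** 2) for z in upper]
    return quadratics, linears


if __name__ == "__main__":
    # Roots of xi**4 + 1: the four numbers cos(t) + i sin(t), t = pi/4 + j*pi/2.
    roots = [cmath.rect(1.0, math.pi * (2 * j + 1) / 4) for j in range(4)]
    quadratics, linears = real_factors(roots)
    for b, c in quadratics:
        print(b, c, b * b - 4 * c)       # the last value should be negative
    print(linears)
```

Each pair (b, c) produced this way satisfies $b^2 - 4c = -4(\mathrm{Im}\, z)^2 < 0$, which is the condition stated in the theorem.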

This theorem is very useful, for example, in the integral calculus. By
its means it can be shown that the integration of any rational function,
i.e., a quotient of two polynomial functions, can be reduced to certain
standard forms. This is effectively possible if we have decompositions of
the two polynomials in the form (i) above. The method involved, that of
"partial fractions," was described in a general form in Exercise 12 of
Exercise Group 6.4. If this method is applied to the particular case of
rational forms $p$ in $\mathrm{Re}(\xi)$, we have the result that every such $p$ can be
expressed as a polynomial in $\mathrm{Re}[\xi]$ plus a sum of terms of one of the two
forms

$(a_1\xi + a_2)/(\xi^2 + b\xi + c)^m$ or $a/(\xi - d)^m$,

where all coefficients are real. The integration of the corresponding
functions is then easily carried out.
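To make the two standard forms concrete, the following worked example (our own choice of rational form, not one from the text) carries out the partial fractions step and the resulting integration.

```latex
\begin{align*}
\frac{1}{(\xi - 1)(\xi^2 + 1)}
  &= \frac{1/2}{\xi - 1} + \frac{-\tfrac12 \xi - \tfrac12}{\xi^2 + 1}, \\
\int \frac{dx}{(x - 1)(x^2 + 1)}
  &= \tfrac12 \log|x - 1| - \tfrac14 \log(x^2 + 1) - \tfrac12 \arctan x + \text{const}.
\end{align*}
```

Here the first term has the form $a/(\xi - d)^m$ with $a = 1/2$, $d = 1$, $m = 1$, and the second has the form $(a_1\xi + a_2)/(\xi^2 + b\xi + c)^m$ with $a_1 = a_2 = -1/2$, $b = 0$, $c = 1$, $m = 1$.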

Exercise Group 8.2

1. Generalize the notion of fundamental sequence (7.22) to the complex
numbers and prove generalizations of the Theorems 7.23, 7.27 giving
Cauchy's criterion for convergence of a sequence.
2. Prove 8.27(v).
3. Is any of the functions $F$ defined by the following conditions continuous
on $C$? Prove your statement.
(a) $F(z) = x$ for the unique $x, y \in \mathrm{Re}$ with $z = x + iy$;
(b) $F(0) = 0$, and if $z \ne 0$, $F(z) = \theta$ for the unique $r, \theta \in \mathrm{Re}$ with
$0 < r$, $0 \le \theta < 2\pi$, and $z = r(\cos\theta + i \sin\theta)$;
(c) $F(z) = \sqrt{z}$.
4. Prove Theorem 8.34(i)-(iii).
5. (a) Find all the complex roots of $\xi^4 - \xi^2 + 1$. Find the decomposition
of this over $\mathrm{Re}$ as in 8.34(i).
(b) Do the same for the polynomial $f(\xi) = \sum_{i=0}^{7} \xi^i$. [Hint: The roots of
$f(\xi)$ are the 8th roots of unity, other than 1.]
6. Find the partial fractions representation of $1/(\xi^4 - \xi^2 + 1)$ in $\mathrm{Re}(\xi)$ as
described at the end of the section.

8.3 Roots of complex polynomials. Roots of polynomials over a subfield.
We now consider a more general situation. Let $K$ be any subfield of $C$.
We wish to see when we can conclude that every polynomial over $K$ already
has a root in $K$ (and hence, as in 8.33, decomposes completely into linear
polynomials over $K$). For this purpose we associate with any $K$ the set
of all roots of polynomials over $K$.

8.35 Definition. Suppose that $K$ is a subfield of $C$.
(i) If $z \in C$ we say that $z$ is algebraic over $K$ if for some $f(\xi) \in K[\xi]$
we have $f(\xi) \ne 0$ and $f(z) = 0$. We denote by $\mathrm{Alg}(K)$ the set of
all elements which are algebraic over $K$. In particular, we put
$\mathrm{Alg} = \mathrm{Alg}(\mathrm{Ra})$ and say that $z$ is algebraic if $z \in \mathrm{Alg}$.
(ii) We say that $K$ is algebraically closed if for each $f(\xi) \in K[\xi]$ with
$\deg(f(\xi)) > 0$ we have $f(z) = 0$ for some $z \in K$.

Note that the set of algebraic real numbers, as defined in 7.61, is just
the set $\mathrm{Alg} \cap \mathrm{Re}$. The following is just a restatement of part (i) of the
fundamental theorem 8.33. The theorem immediately following it lists
elementary consequences of the preceding definition.

8.36 Theorem. C is algebraically closed.

8.37 Theorem. Suppose that $K$ is any subfield of $C$. Then we have:
(i) $K \subseteq \mathrm{Alg}(K) \subseteq C$;
(ii) if $K$ is algebraically closed and $f(\xi) = \sum_{i=0}^{n} a_i \xi^i \in K[\xi]$ with
$\deg(f(\xi)) = n > 0$ then for some $z_1, \ldots, z_n \in K$,
$f(\xi) = a_n(\xi - z_1) \cdots (\xi - z_n)$;
(iii) $K$ is algebraically closed if and only if $\mathrm{Alg}(K) \subseteq K$.

We leave the proof of this to the reader; it is seen that the algebraic
closure of $C$ enters essentially into the proof of (iii).
It can be shown that if $K$ is denumerable then so also is $\mathrm{Alg}(K)$; this
follows by essentially the same lines of argument as for 7.70. More generally
it can be shown that $K \sim \mathrm{Alg}(K)$.

Algebraically closed subfields. We have already mentioned several times
that the set of algebraic numbers possesses all the desirable algebraic
properties of the field of all complex numbers. What we have in mind is
stated precisely in the following important and more general theorem.

8.38 Theorem. Suppose that K is any subfield of C. Then:


(i) Alg (K) is also a subfield of C;
(ii) Alg (K) is algebraically closed.

Proof. Let $A = \mathrm{Alg}(K)$. In order to prove that $A$ is a subfield of $C$
it is sufficient, as we have seen in Section 6.1, to prove that $1 \in A$ and that

(1) if $z, w \in A$ then $z + w \in A$ and $z \cdot w \in A$ and, provided $z \ne 0$,
also $z^{-1} \in A$.

Given $z, w \in A$ we can find polynomials $f(\xi), g(\xi) \in K[\xi]$ with $\deg(f(\xi)) = n > 0$,
$\deg(g(\xi)) = m > 0$, and $f(z) = 0$, $g(w) = 0$.
That $z^{-1} \in A$ if $z \ne 0$ is easiest to prove. If we write
$f(\xi) = a_0 + a_1\xi + \cdots + a_n\xi^n$ with all $a_i \in K$ then
$a_0 + a_1 z + \cdots + a_n z^n = 0$, so
$a_0(z^{-1})^n + a_1(z^{-1})^{n-1} + \cdots + a_n = 0$ and $z^{-1}$ is a root of the
polynomial $\sum_{i=0}^{n} a_i \xi^{n-i}$.
We now prove that $z + w \in A$. We can assume, without loss of
generality, that both $f(\xi)$, $g(\xi)$ are monic. Hence, by 8.33(ii), we can
find $z_1, \ldots, z_n, w_1, \ldots, w_m \in C$ with

(2) $z = z_1$, $w = w_1$, $f(\xi) = (\xi - z_1) \cdots (\xi - z_n)$ and
$g(\xi) = (\xi - w_1) \cdots (\xi - w_m)$.

We want a polynomial with coefficients in $K$ of which $z_1 + w_1$ is a root.
More generally, we might hope to construct a polynomial with coefficients
in $K$ of which each $z_i + w_j$ is a root for $i = 1, \ldots, n$, $j = 1, \ldots, m$.
The simplest such polynomial is

(3) $h(\xi) = \prod_{i=1}^{n} \prod_{j=1}^{m} [\xi - (z_i + w_j)]$.

We wish to show that $h(\xi) \in K[\xi]$. First note that for any $i$,

$\prod_{j=1}^{m} [\xi - (z_i + w_j)] = \prod_{j=1}^{m} [(\xi - z_i) - w_j] = g(\xi - z_i)$.

Hence

(4) $h(\xi) = \prod_{i=1}^{n} g(\xi - z_i)$.

It is clear that any change in the order of the $z_i$ does not affect the value
of $h(\xi)$. This reminds us of the fundamental theorem on symmetric
polynomials, 5.29. To put the matter in the proper form for application of
5.29 we consider polynomials in an $(n + 1)$-fold transcendental extension
$K[\xi, \xi_1, \ldots, \xi_n]$ of $K$. We wish to consider the element $\prod_{i=1}^{n} g(\xi - \xi_i)$
of $K[\xi, \xi_1, \ldots, \xi_n]$ as a polynomial in $\xi_1, \ldots, \xi_n$ with coefficients in
$K[\xi]$. This is possible by 5.19. We put $D = K[\xi]$ and then
$D[\xi_1, \ldots, \xi_n] = K[\xi, \xi_1, \ldots, \xi_n]$. Then in $D[\xi_1, \ldots, \xi_n]$ we have a
polynomial $p(\xi_1, \ldots, \xi_n)$ with

(5) $p(\xi_1, \ldots, \xi_n) = \prod_{i=1}^{n} g(\xi - \xi_i)$.

Clearly

(6) $p(\xi_1, \ldots, \xi_n)$ is symmetric in $\xi_1, \ldots, \xi_n$ over $D = K[\xi]$.

Hence by 5.29 we can find $q(\xi_1, \ldots, \xi_n) \in D[\xi_1, \ldots, \xi_n]$ with

(7) $p(\xi_1, \ldots, \xi_n) = q(\sigma_1(\xi_1, \ldots, \xi_n), \ldots, \sigma_n(\xi_1, \ldots, \xi_n))$,

where $\sigma_i(\xi_1, \ldots, \xi_n)$, $i = 1, \ldots, n$, are the elementary symmetric
polynomials in $n$ variables. Thus if we return to $K[\xi, \xi_1, \ldots, \xi_n]$ we can find

(8) $r(\xi, \xi_1, \ldots, \xi_n) \in K[\xi, \xi_1, \ldots, \xi_n]$,
$p(\xi_1, \ldots, \xi_n) = r(\xi, \sigma_1(\xi_1, \ldots, \xi_n), \ldots, \sigma_n(\xi_1, \ldots, \xi_n))$.

Now by (4), (5), (7), (8), we have $h(\xi) = p(z_1, \ldots, z_n)$ and then

(9) $h(\xi) = r(\xi, \sigma_1(z_1, \ldots, z_n), \ldots, \sigma_n(z_1, \ldots, z_n))$.

Now recall the close connection between roots, coefficients, and the $\sigma_i$,
stated in 5.28(i): we have

$f(\xi) = (\xi - z_1) \cdots (\xi - z_n) = \sum_{i=0}^{n} (-1)^i \sigma_i(z_1, \ldots, z_n)\, \xi^{n-i}$.

Thus since $f(\xi) \in K[\xi]$ we have

(10) $\sigma_i(z_1, \ldots, z_n) \in K$ for each $i = 1, \ldots, n$.

Put $\sigma_i(z_1, \ldots, z_n) = c_i$; then we see that $h(\xi) = r(\xi, c_1, \ldots, c_n)$, hence

(11) $h(\xi) \in K[\xi]$.

Returning to (3) shows that $z + w \in A$, since $h(z + w) = 0$.


The proof that $z - w \in A$ is obtained with minor modifications of the
preceding, or by the observation that if $z \in A$ also $-z \in A$; for
$\sum_{i=0}^{n} (-1)^i a_i (-z)^i = 0$ whenever $\sum_{i=0}^{n} a_i z^i = 0$. We leave to the reader
the proof that $z \cdot w \in A$. With this the proof of (1) is completed and we
now know that $A$ is a subfield of $C$.
To prove that $A$ is algebraically closed it is sufficient by 8.37(iii) to
show that $\mathrm{Alg}(A) \subseteq A$, i.e., that

(12) if $f(\xi) \in A[\xi]$, $f(\xi) \ne 0$, and $f(z) = 0$ then $z \in A$.

Let $f(\xi) = w_0 + w_1\xi + \cdots + w_n\xi^n$ where $w_n \ne 0$, $n > 0$, and each
$w_i \in A$. Hence for each $i = 0, \ldots, n$ we have a polynomial

(13) $g_i(\xi) \in K[\xi]$, $\deg(g_i(\xi)) = m_i > 0$, and $g_i(w_i) = 0$.

We can assume with no loss of generality that each $g_i$ is monic. By the
fundamental theorem of algebra we can find for each $i = 0, \ldots, n$,
numbers

(14) $w_{i,1}, \ldots, w_{i,m_i} \in C$,
$w_i = w_{i,1}$, and $g_i(\xi) = (\xi - w_{i,1}) \cdots (\xi - w_{i,m_i})$.

We wish now to construct a new nonzero polynomial $h(\xi)$ of which $z$ is
a root and for which $h(\xi) \in K[\xi]$. To satisfy the first of these it is
sufficient that $f(\xi) \mid h(\xi)$. This (and the preceding symmetric polynomials
argument) suggests taking $h(\xi)$ to be a product of polynomials like $f(\xi)$,
in the sense that we take arbitrary substitutions $w_{i,j}$ for the coefficients
$w_i$ of $f(\xi)$. In other words, let

(15) $h(\xi) = \prod_{j_0=1}^{m_0} \prod_{j_1=1}^{m_1} \cdots \prod_{j_n=1}^{m_n} (w_{0,j_0} + w_{1,j_1}\xi + \cdots + w_{n,j_n}\xi^n)$.

Now the symmetry is seen as follows. We consider an
$(m_0 + m_1 + \cdots + m_n)$-fold transcendental extension of $K[\xi]$ obtained by taking

$D_0 = K[\xi]$, $D_1 = D_0[\xi_{0,1}, \ldots, \xi_{0,m_0}]$,
$D_2 = D_1[\xi_{1,1}, \ldots, \xi_{1,m_1}]$, $\ldots$, $D_{n+1} = D_n[\xi_{n,1}, \ldots, \xi_{n,m_n}]$.

Then we have an element $p(\xi_{0,1}, \ldots, \xi_{n,m_n})$ of $D_{n+1}$ corresponding to
$h(\xi)$ given by

(16) $p(\xi_{0,1}, \ldots, \xi_{n,m_n}) = \prod_{j_0=1}^{m_0} \prod_{j_1=1}^{m_1} \cdots \prod_{j_n=1}^{m_n} (\xi_{0,j_0} + \xi_{1,j_1}\xi + \cdots + \xi_{n,j_n}\xi^n)$.

Thus

(17) $h(\xi) = p(w_{0,1}, \ldots, w_{n,m_n})$.
Now we use the symmetric polynomials argument repeatedly as follows.
If we regard $p(\xi_{0,1}, \ldots, \xi_{n,m_n})$ as an element of $D_n[\xi_{n,1}, \ldots, \xi_{n,m_n}]$, we
see by (16) that it is symmetric in $\xi_{n,1}, \ldots, \xi_{n,m_n}$. Hence it can be
expressed as a polynomial over $D_n$ in the elementary symmetric polynomials
$\sigma_i(\xi_{n,1}, \ldots, \xi_{n,m_n})$ in these $m_n$ variables. But the values
$\sigma_i(w_{n,1}, \ldots, w_{n,m_n})$ belong to $K$, since these are related to the
coefficients of $g_n(\xi)$ by 5.28(i).
Thus $p(\xi_{0,1}, \ldots, \xi_{n-1,m_{n-1}}, w_{n,1}, \ldots, w_{n,m_n}) \in D_n$. We now consider
this as an element of $D_{n-1}[\xi_{n-1,1}, \ldots, \xi_{n-1,m_{n-1}}]$ and see again by (16)
that it is symmetric in $\xi_{n-1,1}, \ldots, \xi_{n-1,m_{n-1}}$. Hence by the same
argument as before, we have
$p(\xi_{0,1}, \ldots, \xi_{n-2,m_{n-2}}, w_{n-1,1}, \ldots, w_{n-1,m_{n-1}}, w_{n,1}, \ldots, w_{n,m_n}) \in D_{n-1}$.
By repeating this procedure $n - 1$ more times
we see that $p(w_{0,1}, \ldots, w_{n,m_n}) \in D_0 = K[\xi]$. Thus $h(\xi) \in K[\xi]$ by (17).
To illustrate the proof, consider first showing that $\sqrt{3} + \sqrt[3]{2} \in \mathrm{Alg}$.
We know that $\sqrt{3}, \sqrt[3]{2} \in \mathrm{Alg}$, being roots of $\xi^2 - 3$, $\xi^3 - 2$, respectively.
We can write $\xi^2 - 3 = (\xi - z_1)(\xi - z_2)$ with $z_1 = \sqrt{3}$, $z_2 = -\sqrt{3}$, and
$\xi^3 - 2 = (\xi - w_1)(\xi - w_2)(\xi - w_3)$ with $w_1 = \sqrt[3]{2}$, $w_2 = \sqrt[3]{2}\,\zeta$,
$w_3 = \sqrt[3]{2}\,\zeta^2$, where $\zeta = \cos 2\pi/3 + i \sin 2\pi/3$ (8.22). Following the
first part of the proof of the preceding theorem, we set

$h(\xi) = (\xi - (z_1 + w_1))(\xi - (z_1 + w_2))(\xi - (z_1 + w_3))$
$\qquad \times\, (\xi - (z_2 + w_1))(\xi - (z_2 + w_2))(\xi - (z_2 + w_3))$
$\quad = [(\xi - z_1)^3 - 2][(\xi - z_2)^3 - 2]$
$\quad = (\xi - z_1)^3(\xi - z_2)^3 - 2[(\xi - z_1)^3 + (\xi - z_2)^3] + 4$.

Now

$(\xi - z_1)^3(\xi - z_2)^3 = [(\xi - z_1)(\xi - z_2)]^3 = (\xi^2 - 3)^3$.

Also

$(\xi - z_1)^3 + (\xi - z_2)^3 = 2\xi^3 - 3\xi^2(z_1 + z_2) + 3\xi(z_1^2 + z_2^2) - (z_1^3 + z_2^3) = 2\xi^3 + 18\xi$

since $z_1 + z_2 = 0$, $z_1^2 + z_2^2 = 6$, and $z_1^3 + z_2^3 = 3\sqrt{3} - 3\sqrt{3} = 0$. Hence

$h(\xi) = (\xi^2 - 3)^3 - 2[2\xi^3 + 18\xi] + 4 = \xi^6 - 9\xi^4 - 4\xi^3 + 27\xi^2 - 36\xi - 23$

has $\sqrt{3} + \sqrt[3]{2}$ as a root (as can also be verified by direct computation).
Note that if we were to apply the proof to find a polynomial for $\sqrt[3]{2} + \sqrt{3}$
we would get the same result, but the computations would be more
complicated since we would have to deal with symmetric functions of three
variables.
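The product (3) for this example can also be expanded numerically. The following is only a sketch under our own assumptions: the helper names `poly_mul` and `sum_polynomial` are hypothetical, polynomials are stored as coefficient lists with the highest degree first, and floating-point arithmetic is used, so the integer coefficients are recovered by rounding rather than by the symmetric polynomials argument of the proof.

```python
import cmath

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (highest degree first)."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def sum_polynomial(zs, ws):
    """Expand the product over all i, j of (xi - (z_i + w_j)), as in (3)."""
    h = [1 + 0j]
    for z in zs:
        for w in ws:
            h = poly_mul(h, [1, -(z + w)])
    return h

zeta = cmath.exp(2j * cmath.pi / 3)             # primitive cube root of unity
zs = [3 ** 0.5, -(3 ** 0.5)]                    # roots of xi^2 - 3
cbrt2 = 2 ** (1 / 3)
ws = [cbrt2, cbrt2 * zeta, cbrt2 * zeta ** 2]   # roots of xi^3 - 2

h = sum_polynomial(zs, ws)
coeffs = [round(c.real) for c in h]             # imaginary parts vanish up to rounding
print(coeffs)
```

The rounded coefficients should agree with the polynomial $\xi^6 - 9\xi^4 - 4\xi^3 + 27\xi^2 - 36\xi - 23$ found above.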
To illustrate the second part of the proof, consider finding a nonzero
polynomial $h(\xi) \in \mathrm{Ra}[\xi]$ which has a root $z$ of $f(\xi) = \xi^2 + \sqrt{2}\,\xi + \sqrt{3}$.
In fact, $h(\xi)$ was chosen in the proof in such a way that $f(\xi) \mid h(\xi)$, and $h(\xi)$
will have both roots $(-\sqrt{2} \pm \sqrt{2 - 4\sqrt{3}})/2$ of $f(\xi)$. Following the
proof we first find polynomials $g_i(\xi)$ over $\mathrm{Ra}$ of which the coefficients
$w_0 = \sqrt{3}$, $w_1 = \sqrt{2}$, and $w_2 = 1$ are roots; such polynomials are
$g_0(\xi) = \xi^2 - 3$, $g_1(\xi) = \xi^2 - 2$, and $g_2(\xi) = \xi - 1$. In the notation
of the proof we have $m_0 = 2$, $m_1 = 2$, $m_2 = 1$, and $w_{0,1} = \sqrt{3}$,
$w_{0,2} = -\sqrt{3}$, $w_{1,1} = \sqrt{2}$, $w_{1,2} = -\sqrt{2}$, $w_{2,1} = 1$. Then we take

$h(\xi) = (\xi^2 + \sqrt{2}\,\xi + \sqrt{3})(\xi^2 + \sqrt{2}\,\xi - \sqrt{3})(\xi^2 - \sqrt{2}\,\xi + \sqrt{3})(\xi^2 - \sqrt{2}\,\xi - \sqrt{3})$
$\quad = [(\xi^2 + \sqrt{2}\,\xi)^2 - 3][(\xi^2 - \sqrt{2}\,\xi)^2 - 3]$
$\quad = (\xi^2 + \sqrt{2}\,\xi)^2(\xi^2 - \sqrt{2}\,\xi)^2 - 3(\xi^2 - \sqrt{2}\,\xi)^2 - 3(\xi^2 + \sqrt{2}\,\xi)^2 + 9$
$\quad = (\xi^4 - 2\xi^2)^2 - 3\xi^2[(\xi - \sqrt{2})^2 + (\xi + \sqrt{2})^2] + 9$
$\quad = (\xi^8 - 4\xi^6 + 4\xi^4) - 3\xi^2[2\xi^2 + 4] + 9$
$\quad = \xi^8 - 4\xi^6 - 2\xi^4 - 12\xi^2 + 9$.

Here $h(\xi)$ has the eight roots given by all possible combinations of signs
in $(\pm\sqrt{2} \pm \sqrt{2 \pm 4\sqrt{3}})/2$.
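The same kind of numerical check applies to this second illustration. The sketch below, again with hypothetical helper names and floating-point arithmetic, multiplies out the four quadratic factors obtained by substituting the conjugates $\pm\sqrt{3}$ and $\pm\sqrt{2}$ for the coefficients, and then evaluates the result at one root of $f(\xi)$.

```python
import cmath
from itertools import product

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (highest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_eval(coeffs, x):
    """Evaluate a polynomial at x by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

sqrt2, sqrt3 = 2 ** 0.5, 3 ** 0.5
h = [1.0]
for w0, w1 in product([sqrt3, -sqrt3], [sqrt2, -sqrt2]):
    h = poly_mul(h, [1.0, w1, w0])          # the factor xi^2 + w1*xi + w0
coeffs = [round(c) for c in h]

z = (-sqrt2 + cmath.sqrt(2 - 4 * sqrt3)) / 2   # one root of xi^2 + sqrt(2) xi + sqrt(3)
residual = abs(poly_eval(coeffs, z))
print(coeffs, residual)
```

The residual is not exactly zero in floating point, but it should be negligibly small, in agreement with $f(\xi) \mid h(\xi)$.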
We shall return to a more intensive study of the algebraically closed
field Alg and its subfields in the next chapter.

Multiple roots; discriminants. As a second result concerning roots of
complex polynomials, we wish to develop an algorithm for deciding
whether a given $f(\xi) \in C[\xi]$ has multiple roots, i.e., whether there exists
$z \in C$ with $(\xi - z)^2 \mid f(\xi)$. One such algorithm is suggested by Theorem
6.39. If we put $d(\xi) = (f(\xi), f'(\xi))$ and $f_1(\xi) = f(\xi)/d(\xi)$, we know that
$f_1(\xi)$ has only simple roots and has exactly the same complex roots as $f(\xi)$.
Since $d(\xi)$ is chosen monic and since $f(\xi)$, $d(\xi)$ factor completely into linear
factors by the fundamental theorem of algebra, we see that $f(\xi)$ has only
simple roots if and only if $f(\xi) = f_1(\xi)$, that is, $d(\xi) = 1$. Hence (a
nonconstant) $f(\xi)$ has a multiple root if and only if $(f(\xi), f'(\xi)) \ne 1$. Now
Euclid's algorithm provides us with a method for computing this g.c.d.
from the coefficients of $f(\xi)$ in a finite number of steps by using only the
basic rational operations, and hence provides us with the desired method
for determining whether $f(\xi)$ has multiple roots.
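The g.c.d. test just described can be sketched as follows for polynomials with rational coefficients. This is a minimal illustration under our own conventions, not the text's procedure verbatim: coefficient lists are written with the highest degree first, exact arithmetic is done with `fractions.Fraction`, and the function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a, b = trim(list(a)), trim(list(b))
    while len(a) >= len(b) and a != [0]:
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]          # cancels the leading term of a
        a = trim(a[1:]) if len(a) > 1 else [Fraction(0)]
    return a

def poly_gcd(a, b):
    """Monic g.c.d. by Euclid's algorithm (a is assumed nonzero)."""
    a, b = trim(list(a)), trim(list(b))
    while b != [0]:
        a, b = b, poly_rem(a, b)
    return [c / a[0] for c in a]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])] or [Fraction(0)]

def has_multiple_root(coeffs):
    """True when (f, f') is not 1, i.e. when f has a multiple complex root."""
    f = [Fraction(c) for c in coeffs]
    return len(poly_gcd(f, derivative(f))) > 1

print(has_multiple_root([1, 0, -3, 2]))   # xi^3 - 3 xi + 2 = (xi - 1)^2 (xi + 2)
print(has_multiple_root([1, 0, 0, -2]))   # xi^3 - 2
```

Only rational operations on the coefficients are used, which is the point of the algorithm: no roots need to be computed.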
We wish now to develop an alternative algorithm for the same question
which is more closely related to the preceding arguments involving
symmetric polynomials. Given $f(\xi)$ monic and of degree $n$ and with complex
roots $z_1, \ldots, z_n$, we see that $f(\xi)$ has multiple roots if and only if
$\prod_{1 \le i < j \le n} (z_i - z_j) = 0$. This product is not invariant under all
permutations, but its square is.

8.39 Definition. Suppose that $f(\xi) \in C[\xi]$, $\deg(f(\xi)) = n > 0$ and
$f(\xi) = (\xi - z_1) \cdots (\xi - z_n)$. By the discriminant of $f(\xi)$, which we
denote by $\mathrm{Dis}(f(\xi))$, we mean the number $\prod_{1 \le i < j \le n} (z_i - z_j)^2$.

8.40 Theorem. For each $n > 0$ there is a polynomial $d(\xi_1, \ldots, \xi_n) \in I[\xi_1, \ldots, \xi_n]$
in $n$ variables over the integers, such that for any monic
$f(\xi) \in C[\xi]$, if we write $f(\xi) = \sum_{i=0}^{n} b_{n-i}\xi^i$ ($b_0 = 1$), we have
$\mathrm{Dis}(f(\xi)) = d(b_1, \ldots, b_n)$.

Proof. The polynomial $g(\xi_1, \ldots, \xi_n) = \prod_{1 \le i < j \le n} (\xi_i - \xi_j)^2$ with
integer coefficients is evidently symmetric in $\xi_1, \ldots, \xi_n$. By the
fundamental theorem on symmetric polynomials 5.29, we can find

$d_1(\xi_1, \ldots, \xi_n) \in I[\xi_1, \ldots, \xi_n]$,

with

$g(\xi_1, \ldots, \xi_n) = d_1(\sigma_1(\xi_1, \ldots, \xi_n), \ldots, \sigma_n(\xi_1, \ldots, \xi_n))$.

Then

$\mathrm{Dis}(f(\xi)) = g(z_1, \ldots, z_n) = d_1(\sigma_1(z_1, \ldots, z_n), \ldots, \sigma_n(z_1, \ldots, z_n)) = d_1(-b_1, b_2, \ldots, (-1)^n b_n)$,

by 5.28(i). Thus if we take $d(\xi_1, \ldots, \xi_n) = d_1(-\xi_1, \xi_2, \ldots, (-1)^n \xi_n)$,
we have the desired result.
Since the proof of 5.29 gives us an algorithm to construct the polynomial
$d(\xi_1, \ldots, \xi_n)$, the preceding theorem gives us an algorithm for
computing the discriminant of any polynomial in terms of its coefficients, and
hence for determining whether it has multiple roots.
If $f(\xi)$ is quadratic, $f(\xi) = (\xi - z_1)(\xi - z_2) = \xi^2 + b_1\xi + b_2$, we
have $b_1 = -(z_1 + z_2)$, $b_2 = z_1 z_2$, and

$\mathrm{Dis}(f(\xi)) = (z_1 - z_2)^2 = z_1^2 + z_2^2 - 2z_1z_2 = (z_1 + z_2)^2 - 4z_1z_2 = b_1^2 - 4b_2$.

More generally, to determine multiplicity for $f(\xi) = a\xi^2 + b\xi + c$
($a \ne 0$), we compute the discriminant of $f_1(\xi) = \xi^2 + (b/a)\xi + c/a$
which, by the preceding, is $(b^2 - 4ac)/a^2$. This accords with the result
8.11 giving the direct computation of the roots of $f(\xi)$ as

$\dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.

Recall that by 8.8(iii) if $a, b, c$ are real then either both roots $z_1, z_2$ of
$f(\xi)$ are real, in which case $\mathrm{Dis}(f(\xi)) \ge 0$, or the roots $z_1, z_2$ are nonreal
complex conjugates, say $z_1 = x + iy$, $z_2 = x - iy$ with $y \ne 0$, in which
case $\mathrm{Dis}(f(\xi)) = (2iy)^2 = -4y^2 < 0$. Thus by computing the
discriminant we are also able to tell whether a given real quadratic polynomial
has two distinct real roots, one multiple root, or two distinct nonreal
(conjugate) roots, according as $\mathrm{Dis}(f(\xi)) > 0$, $= 0$, or $< 0$. Of course,
this again agrees with the result of using the quadratic formula.
If $f(\xi)$ is cubic, $f(\xi) = (\xi - z_1)(\xi - z_2)(\xi - z_3) = \xi^3 + b_1\xi^2 + b_2\xi + b_3$, we have

$b_1 = -(z_1 + z_2 + z_3)$, $b_2 = z_1z_2 + z_1z_3 + z_2z_3$, $b_3 = -z_1z_2z_3$.

Already in this case the computation of the discriminant gets somewhat
involved. Referring to Exercise 7 of Exercise Group 5.2, we found that

$(\xi_1 - \xi_2)^2(\xi_1 - \xi_3)^2(\xi_2 - \xi_3)^2 = \sigma_1^2\sigma_2^2 - 4\sigma_2^3 - 4\sigma_1^3\sigma_3 - 27\sigma_3^2 + 18\sigma_1\sigma_2\sigma_3$;

hence

$\mathrm{Dis}(f(\xi)) = (-b_1)^2 b_2^2 - 4b_2^3 - 4(-b_1)^3(-b_3) - 27(-b_3)^2 + 18(-b_1)b_2(-b_3)$
$\quad = b_1^2 b_2^2 - 4b_2^3 - 4b_1^3 b_3 - 27b_3^2 + 18b_1b_2b_3$.
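The coefficient formula for the cubic discriminant can be compared with Definition 8.39 on examples with known roots. The following sketch uses integer roots so that both sides are computed exactly; the function names are our own.

```python
from itertools import combinations

def dis_from_roots(roots):
    """Discriminant as in 8.39: product of (z_i - z_j)^2 over i < j."""
    d = 1
    for zi, zj in combinations(roots, 2):
        d *= (zi - zj) ** 2
    return d

def monic_cubic_from_roots(z1, z2, z3):
    """Coefficients (b1, b2, b3) of (xi - z1)(xi - z2)(xi - z3)."""
    return (-(z1 + z2 + z3), z1 * z2 + z1 * z3 + z2 * z3, -(z1 * z2 * z3))

def dis_cubic(b1, b2, b3):
    """Discriminant of xi^3 + b1 xi^2 + b2 xi + b3 in terms of the coefficients."""
    return (b1**2 * b2**2 - 4 * b2**3 - 4 * b1**3 * b3
            - 27 * b3**2 + 18 * b1 * b2 * b3)

roots = (1, 2, 3)
b = monic_cubic_from_roots(*roots)
print(b, dis_from_roots(roots), dis_cubic(*b))
```

For the roots 1, 2, 3 both computations give 4; a repeated root, such as 1, 1, 5, gives 0 on both sides.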

For fourth and higher degree polynomials the computation becomes even
more involved. The following simplifying device is moderately helpful.
Its proof is left to the student.

8.41 Theorem. Suppose that $n \ge 2$ and that $f(\xi) = \sum_{i=0}^{n} b_{n-i}\xi^i$ is in
$C[\xi]$ with $b_0 = 1$. Then for some $c_2, \ldots, c_n$,

$f\left(\xi - \dfrac{b_1}{n}\right) = \xi^n + \sum_{i=0}^{n-2} c_{n-i}\xi^i$.

If we put

$g(\xi) = f\left(\xi - \dfrac{b_1}{n}\right)$,

we have

$\mathrm{Dis}(f(\xi)) = \mathrm{Dis}(g(\xi))$.

The latter part of this holds since $z$ is a root of $f(\xi)$ if and only if
$z + (b_1/n)$ is a root of $g(\xi) = f(\xi - b_1/n)$. Thus it suffices now to
restrict attention to $f(\xi)$ of the form $\xi^n + c_2\xi^{n-2} + \cdots + c_n$. From the
above we thus obtain the following.

8.42 Theorem. For $f(\xi) = \xi^3 + p\xi + q \in C[\xi]$ we have
$\mathrm{Dis}(f(\xi)) = -4p^3 - 27q^2$.
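The step from the general cubic discriminant to 8.42 is a direct substitution, written out here for convenience with $b_1 = 0$, $b_2 = p$, $b_3 = q$ in the coefficient formula for the cubic derived earlier in this section.

```latex
\begin{align*}
\operatorname{Dis}(\xi^3 + p\xi + q)
  &= b_1^2 b_2^2 - 4 b_2^3 - 4 b_1^3 b_3 - 27 b_3^2 + 18 b_1 b_2 b_3
     \qquad (b_1 = 0,\ b_2 = p,\ b_3 = q) \\
  &= 0 - 4p^3 - 0 - 27q^2 + 0 \\
  &= -4p^3 - 27q^2 .
\end{align*}
```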

As with the quadratic polynomials, we might expect that the
discriminant of a cubic polynomial gives us some information about the real or
nonreal character of its complex roots. This information is summarized
in the following.

8.43 Theorem. Suppose that $f(\xi) \in \mathrm{Re}[\xi]$, $f(\xi)$ is monic, $\deg(f(\xi)) = 3$
and $f(\xi) = (\xi - z_1)(\xi - z_2)(\xi - z_3)$. Let $d = \mathrm{Dis}(f(\xi))$. Then
$d$ is real and we have:
(i) $d > 0$ if and only if $z_1, z_2, z_3$ are real and all distinct;
(ii) $d = 0$ if and only if $z_1, z_2, z_3$ are real and not all distinct;
(iii) $d < 0$ if and only if one of $z_1, z_2, z_3$ is real and the remaining two
numbers are nonreal and complex conjugates.

Proof. Since $d = (z_1 - z_2)^2(z_1 - z_3)^2(z_2 - z_3)^2$, we know that $d = 0$ if
and only if at least two of the roots are equal, thus giving (ii). Suppose
that $d \ne 0$. Since $f(\xi)$ has real coefficients, whenever $z$ is a root of $f(\xi)$
we also have $\bar{z}$ a root; further $z = \bar{z}$ if and only if $z$ is real. Suppose that
$f(\xi)$ has one nonreal root, say $z_1 = x + iy$, $y \ne 0$. Then also one of the
other roots must be $\bar{z}_1$, say $z_2 = x - iy$. But $f(\xi)$ must have at least
one real root by the general theorem 7.54 on roots of odd degree
polynomials. Hence $z_3$ must be real. (Or one can argue this from the
decomposition result 8.34.) Thus in this case

$d = (2iy)^2[(x - z_3) + iy]^2[(x - z_3) - iy]^2 = -4y^2[(x - z_3)^2 + y^2]^2 < 0$.

Clearly if $d \ne 0$ and $z_1, z_2, z_3$ are real then $d > 0$. From these facts we
can establish that the equivalences hold in (i)-(iii).
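Combined with 8.42, the classification of 8.43 becomes a sign test on $-4p^3 - 27q^2$. A minimal sketch, assuming a real cubic already reduced to the form $\xi^3 + p\xi + q$ with exactly representable coefficients (integers or `Fraction`s); the function name and the wording of the returned strings are our own.

```python
def classify_real_cubic(p, q):
    """Classify the roots of xi^3 + p*xi + q (p, q real) by the sign of the discriminant."""
    d = -4 * p**3 - 27 * q**2          # Theorem 8.42
    if d > 0:
        return "three distinct real roots"
    if d == 0:
        return "real roots, not all distinct"
    return "one real root and two nonreal conjugate roots"

print(classify_real_cubic(-7, 6))   # (xi - 1)(xi - 2)(xi + 3)
print(classify_real_cubic(-3, 2))   # (xi - 1)^2 (xi + 2)
print(classify_real_cubic(1, 1))    # xi^3 + xi + 1
```

With floating-point coefficients the test `d == 0` is unreliable, which is why exact inputs are assumed here.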

Even with the simplification 8.41, the computation of $\mathrm{Dis}(f(\xi))$ for
$f(\xi)$ of degree 4, $f(\xi) = \xi^4 + p\xi^2 + q\xi + r$ (in $C[\xi]$), becomes quite
involved. It turns out in this case that we get $\mathrm{Dis}(f(\xi)) = 16p^4r - 4p^3q^2 - 128p^2r^2 + 144pq^2r - 27q^4 + 256r^3$.
There is no value to us
here in going through the details of verifying this. Also a theorem analogous
to 8.43 can be established for real polynomials of degree 4. However the
possibilities for the roots are greater for each of the various cases.
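Although we do not verify the fourth degree formula in general, it can at least be spot-checked against Definition 8.39. The sketch below takes four integer roots whose sum is 0 (so that the $\xi^3$ term vanishes), forms $p, q, r$ from them, and compares the two computations exactly; all names are our own.

```python
from itertools import combinations

def dis_from_roots(roots):
    """Discriminant as in 8.39: product of (z_i - z_j)^2 over i < j."""
    d = 1
    for zi, zj in combinations(roots, 2):
        d *= (zi - zj) ** 2
    return d

def reduced_quartic_from_roots(z):
    """(p, q, r) with (xi - z1)...(xi - z4) = xi^4 + p xi^2 + q xi + r; needs sum(z) == 0."""
    assert sum(z) == 0
    e2 = sum(a * b for a, b in combinations(z, 2))
    e3 = sum(a * b * c for a, b, c in combinations(z, 3))
    e4 = z[0] * z[1] * z[2] * z[3]
    return e2, -e3, e4

def dis_quartic(p, q, r):
    """Discriminant of xi^4 + p xi^2 + q xi + r in terms of p, q, r."""
    return (16 * p**4 * r - 4 * p**3 * q**2 - 128 * p**2 * r**2
            + 144 * p * q**2 * r - 27 * q**4 + 256 * r**3)

for roots in [(1, -1, 2, -2), (0, 1, 2, -3), (1, 2, 3, -6)]:
    p, q, r = reduced_quartic_from_roots(roots)
    print(roots, (p, q, r), dis_from_roots(roots), dis_quartic(p, q, r))
```

Agreement on a few examples is of course not a proof, only a safeguard against misprints in the formula.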
As we obtained in 8.11 a general representation of the complex roots of
an arbitrary quadratic polynomial in terms of its coefficients, we might
also hope to obtain the same sort of representation for the roots of cubic
and higher degree polynomials $f(\xi)$. Furthermore, we might hope that as
with the quadratic formula, such a representation could be given using the
rational operations $+$, $-$, $\cdot$, ${}^{-1}$ and the operation of taking the $n$th roots
of a complex number for various $n$. If we can find such a representation for
at least one root of $f(\xi)$, we say that $f(z) = 0$ is solvable by radicals.
The problem of thus describing the roots of higher degree equations
was one that occupied and baffled mathematicians until the beginning of
the sixteenth century. The method of solving quadratic equations had
long been known from the work of the early Indian and Greek
mathematicians (at least in geometric form). There is an interesting bit of
mathematical history surrounding the eventual discovery of a general
method for solving cubic equations. This was first accomplished by
Ferro who, as was often the manner, kept the solution relatively secret
for thirty years. The solution was then independently found again by
Tartaglia. The mathematician (and physician, among other talents)
Cardan learned the details of this from Tartaglia under a pledge of secrecy.
Cardan then published it as his own work in his Ars Magna. He was equally
ungenerous with the work of his student Ferrari, who was the first to find
a solution by radicals of the general fourth degree equation. It is for this
reason that one often sees the solutions for the third and fourth degree
equations referred to as Cardan’s formulas. Fortunately for his name,
Cardan did contribute much of his own at this stage in the study of roots
of polynomial equations, particularly concerning the relationships among
the roots, the number of roots, and the need to use complex numbers
even in describing the roots of real polynomials. (We shall discuss this
last point below.)

Roots of cubic equations. We wish now to describe solutions by radicals
found for the general cubic equation or, what is sufficient, for the equation

(8:3-1) $f(z) = 0$, where $f(\xi) = \xi^3 + p\xi + q$, $p, q \in C$.

For $p = 0$ the roots $z$ of $\xi^3 + q$ are the three cube roots of $-q$, namely
$\sqrt[3]{-q}$, $\zeta\sqrt[3]{-q}$, $\zeta^2\sqrt[3]{-q}$, where $\zeta = \cos 2\pi/3 + i \sin 2\pi/3$ (these roots are
distinct if $q \ne 0$). We can assume now that $p \ne 0$. In view of the
history of the problem it may be expected that some ingenuity is now
involved which cannot be motivated beforehand. This is seen in the next
step: we seek to find roots $z$ of (8:3-1) which have the form

(8:3-2) $z = w - \dfrac{p}{3w}$, where $w \in C$, $w \ne 0$.

Suppose that $f(z) = 0$; then we can find at least one such $w$ as a solution
of $w^2 - zw - p/3 = 0$, and $w \ne 0$ for otherwise $p = 0$. For any such
$w$ we have, by direct substitution and expansion,

(8:3-3) $w^3 - \dfrac{p^3}{27w^3} + q = 0$,

or, equivalently, $(w^3)^2 + q(w^3) - p^3/27 = 0$. Now let

(8:3-4) $u = w^3$.

Then

(8:3-5) $u^2 + qu - \dfrac{p^3}{27} = 0$.

Proceeding formally for the moment, we write the solution of this last
equation by the quadratic formula as

$u = -\dfrac{q}{2} + \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}$,

where $\sqrt{\ }$ in this case indicates only one of the square roots. Then

(8:3-6) $w = \sqrt[3]{-\dfrac{q}{2} + \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}}$,

where $\sqrt[3]{\ }$ indicates only one of three cube roots. Thus we have the
following conclusion: if $f(z) = 0$ then we can find $w$ satisfying (8:3-2) and
(8:3-6), the latter under some definite choice of the square and cube roots
involved. Conversely, consider any one of the (in general, six) numbers $w$
denoted by (8:3-6). Then if we set $u = w^3$ we have

$u = -\dfrac{q}{2} + \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}$,

so that, no matter what the determination is of $\sqrt{\ }$, $u$ satisfies (8:3-5).
Then since $p \ne 0$ we must have $u \ne 0$ and hence also $w \ne 0$. It follows
then from (8:3-3) through (8:3-5) that $z = w - p/3w$ is a root of $f(\xi)$.
In other words, no matter how we choose to interpret $\sqrt{\ }$ and $\sqrt[3]{\ }$ in
(8:3-6), by (8:3-2) and (8:3-6) we are provided with a solution $z$ of (8:3-1).
We thus have a complete description of the possible roots of $f(\xi)$.
There is still a slight puzzle as to how to single out from this
description a description of three numbers $z_1, z_2, z_3$ with

$f(\xi) = (\xi - z_1)(\xi - z_2)(\xi - z_3)$,

since (8:3-6) provides us with six numbers to choose from. This is settled
as follows. Let

$u = -\dfrac{q}{2} + \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}$, $\quad u' = -\dfrac{q}{2} - \sqrt{\dfrac{q^2}{4} + \dfrac{p^3}{27}}$,

where now $\sqrt{q^2/4 + p^3/27}$ is definitely chosen as the principal square
root, in accordance with 8.10 or 8.21. Then $uu' = -p^3/27$. Now let
$\sqrt[3]{u}$ be the principal cube root of $u$. By 8.22 the other cube roots of $u$ are
$\zeta\sqrt[3]{u}$ and $\zeta^2\sqrt[3]{u}$ where $\zeta = \cos 2\pi/3 + i \sin 2\pi/3$ is a primitive cube root
of unity. We put $w_1 = \sqrt[3]{u}$, $w_2 = \zeta\sqrt[3]{u}$, $w_3 = \zeta^2\sqrt[3]{u}$. Similarly, we
introduce $w_1' = \sqrt[3]{u'}$, $w_2' = \zeta\sqrt[3]{u'}$, $w_3' = \zeta^2\sqrt[3]{u'}$. Now $(w_1w_1')^3 = uu' = -p^3/27$.
Since $(-p/3)^3 = -p^3/27$, $-p/3$ must be one of the values
$w_1w_1'$, $\zeta(w_1w_1')$, or $\zeta^2(w_1w_1')$. Suppose that it is the first. Then

$w_1 - \dfrac{p}{3}\,\dfrac{1}{w_1} = w_1 + w_1' = w_1' - \dfrac{p}{3}\,\dfrac{1}{w_1'}$.

In this case also $w_2w_3' = (\zeta w_1)(\zeta^2 w_1') = w_1w_1' = -p/3$, so

$w_2 - \dfrac{p}{3}\,\dfrac{1}{w_2} = w_2 + w_3' = w_3' - \dfrac{p}{3}\,\dfrac{1}{w_3'}$;

similarly

$w_3 - \dfrac{p}{3}\,\dfrac{1}{w_3} = w_3 + w_2' = w_2' - \dfrac{p}{3}\,\dfrac{1}{w_2'}$.

In other words, if we set

$z_i = w_i - \dfrac{p}{3}\,\dfrac{1}{w_i}$, $\quad z_i' = w_i' - \dfrac{p}{3}\,\dfrac{1}{w_i'}$,

then

$\{z_1, z_2, z_3\} = \{z_1', z_2', z_3'\}$,

though, of course, the order is not necessarily preserved. The same
conclusion can be seen to hold if $-p/3$ is $\zeta(w_1w_1')$ or $\zeta^2(w_1w_1')$. Since we have
seen that $z$ is a root of $f(\xi)$ if and only if it is one of the numbers $z_1, z_2,
z_3, z_1', z_2', z_3'$, it follows that $z$ is a root of $f(\xi)$ if and only if it is one of
$z_1, z_2, z_3$ or, equivalently, one of $z_1', z_2', z_3'$.
Recall that by 8.42 the discriminant of $\xi^3 + p\xi + q$ is $-4p^3 - 27q^2 = -108(q^2/4 + p^3/27)$.
This leads us to summarize the above arguments
in the following form.

8.44 Theorem. Suppose that $f(\xi) = \xi^3 + p\xi + q$, $f(\xi) \in C[\xi]$ and $p \ne 0$.
Let $d = \mathrm{Dis}(f(\xi)) = -4p^3 - 27q^2$ and let $u$ be either of the complex
numbers $-q/2 \pm \sqrt{-d/108}$. Let $\zeta = \cos 2\pi/3 + i \sin 2\pi/3$, and
let $w_1 = \sqrt[3]{u}$, $w_2 = \zeta\sqrt[3]{u}$, $w_3 = \zeta^2\sqrt[3]{u}$. Then $u$ and each $w_i$ are
different from 0. Finally, let $z_i = w_i - p/3w_i$ for $i = 1, 2, 3$. Then
$f(\xi) = (\xi - z_1)(\xi - z_2)(\xi - z_3)$.
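Theorem 8.44 translates almost line for line into a short numerical routine. The sketch below is an illustration in floating-point complex arithmetic, not an exact procedure: `cmath.sqrt` and the power `** (1 / 3)` return the principal square and cube roots, the function name `cubic_roots` is hypothetical, and no attempt is made to control the loss of accuracy that can occur when $-q/2$ and the square root nearly cancel.

```python
import cmath

def cubic_roots(p, q):
    """Roots of xi^3 + p*xi + q with p != 0, following Theorem 8.44."""
    if p == 0:
        raise ValueError("Theorem 8.44 assumes p != 0")
    d = -4 * p**3 - 27 * q**2               # discriminant, by 8.42
    u = -q / 2 + cmath.sqrt(-d / 108)       # either sign of the square root would do
    w1 = complex(u) ** (1 / 3)              # principal cube root of u
    zeta = cmath.exp(2j * cmath.pi / 3)     # cos 2pi/3 + i sin 2pi/3
    ws = [w1, zeta * w1, zeta**2 * w1]
    return [w - p / (3 * w) for w in ws]

# xi^3 - 7 xi + 6 = (xi - 1)(xi - 2)(xi + 3): three real roots, reached via nonreal u
for z in cubic_roots(-7, 6):
    print(z)
```

In the example the discriminant is positive, so $u$ is nonreal even though all three roots are real; the imaginary parts of the computed roots are zero only up to rounding.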

The roots of the general cubic $\xi^3 + b_1\xi^2 + b_2\xi + b_3$ can be found from
this theorem by applying the transformation of 8.41. Consider the special
case that all the coefficients of this polynomial are real. Then the
polynomial $f(\xi) = \xi^3 + p\xi + q$ associated with it by 8.41 also has $p, q$ real.
We know in this case how to classify the roots according as $d > 0$, $d = 0$,
or $d < 0$ by 8.43. The surprising feature of the above solution is that in
the case that $f(\xi)$ has three real distinct roots we have $d > 0$ and we use
as an intermediary in the representation of these roots the cube roots of
the nonreal numbers $-q/2 \pm \sqrt{-d/108}$. For a long time this was
regarded as a defect of the above method of solution. However, it has
turned out from more modern algebraic investigations that this situation
is unavoidable. It can be shown that it is in general impossible to represent
any of the roots of a real cubic with positive discriminant by means of only
the rational operations and the use of real $n$th roots of real numbers, no matter
what integers $n > 1$ are allowed. (This is the surmise of Cardan which we
referred to earlier.) The exact statement and proof of this result involves
some deeper algebraic considerations, which are beyond the scope of this
book.
Roots of fourth degree equations. The foregoing treatment should give a
clearer idea of what is meant by saying that an equation is solvable by
radicals. (A precise definition of the general notion will be given at the end
of the next chapter.) Without going into the same detail, let us see now
how an arbitrary fourth degree polynomial equation over $C$ can be solved
by radicals. Again it is sufficient to consider the equation

(8:3-7) $f(z) = 0$, where $f(\xi) = \xi^4 + p\xi^2 + q\xi + r$, $p, q, r \in C$.

If $r = 0$, the roots of $f(\xi)$ are $z = 0$ and those of the cubic $\xi^3 + p\xi + q$,
which we can solve by 8.44. We thus assume now that $r \ne 0$. If $q = 0$
we can treat $f(\xi) = \xi^4 + p\xi^2 + r$ as a quadratic in $\xi^2$ and then easily
find its roots. We can thus also assume now that $q \ne 0$. As an
intermediate step to finding the roots $z_1, z_2, z_3, z_4$ of $f(\xi)$, we first find a
decomposition of it into quadratic factors, $f(\xi) = (\xi^2 + a\xi + b)(\xi^2 + a'\xi + b')$.
Now the roots, say $z_1, z_2$, of the first factor satisfy $z_1 + z_2 = -a$ and
those, $z_3, z_4$, of the second factor satisfy $z_3 + z_4 = -a'$. But since the
coefficient of $\xi^3$ is 0 in $f(\xi)$, $z_1 + z_2 + z_3 + z_4 = 0$ and hence $a' = -a$
in any such decomposition. Since there is at least one such decomposition,
we see that

(8:3-8) there are $a, b, c \in C$ with $f(\xi) = (\xi^2 + a\xi + b)(\xi^2 - a\xi + c)$.

The problem is now seen to be one of how to determine $a, b, c$ from $p, q, r$
by rational operations and use of $n$th roots. Multiplying out gives us the
following relationships:

(8:3-9) $b + c - a^2 = p$, $a(c - b) = q$, $bc = r$.

Since we are assuming that $q \ne 0$, $r \ne 0$, we see that each of $a, b, c$ is
distinct from 0. Then writing $c + b = a^2 + p$, $c - b = q/a$ gives us

(8:3-10) $2c = a^2 + p + q/a$ and $2b = a^2 + p - q/a$,

and $4r = 4bc = [(a^2 + p) + q/a][(a^2 + p) - q/a] = (a^2 + p)^2 - q^2/a^2$.
Hence $a^2(a^2 + p)^2 - 4ra^2 - q^2 = 0$, so that $a^6 + 2pa^4 + (p^2 - 4r)a^2 - q^2 = 0$.
Thus if we set

(8:3-11) $u = a^2$

we have

(8:3-12) $u^3 + 2pu^2 + (p^2 - 4r)u - q^2 = 0$.
Now by 8.41 and 8.44 we can solve this equation for $u$ by radicals. There
are in general three such solutions and by (8:3-11) $a$ can be taken to be
one square root of one of them; then $b, c$ can be determined from $a$ by
(8:3-10). The question now is, which of the six possible choices of $a$ should
we take? Tracing back our steps shows that any one of these will do;
in other words, no matter what root $u$ we take of (8:3-12) and which
square root $a$ we take of $u$, if we define $b, c$ from such $a$ by (8:3-10) then
the relationships (8:3-9) will hold and hence also the decomposition (8:3-8).
(If $p, q, r$ are real we can also choose $a, b, c$ to be real by 8.34 or directly
from here, since it can be seen in this case that (8:3-12) has at least one
real root $u > 0$.) Since we can find the roots of each of the factors of
(8:3-8) by the quadratic formula, we thus reach the following conclusion:
the general fourth degree polynomial equation over $C$ can be solved by radicals.
In fact, by the use of ambiguous radicals $\sqrt{\ }$, $\sqrt[3]{\ }$ as in (8:3-6), one can
give in terms of the coefficients a single formula which denotes, under the
different particular interpretations of the radical signs, all of the complex
roots of any fourth degree polynomial.
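The steps (8:3-8) through (8:3-12) can be followed numerically. The sketch below is an illustration only, in floating-point complex arithmetic and with hypothetical function names: one root $u$ of the resolvent cubic (8:3-12) is found by the shift of 8.41 and the formula of 8.44, then $a$, $b$, $c$ are obtained from (8:3-11) and (8:3-10), and each quadratic factor is solved by the quadratic formula. The tolerance `1e-14` used to detect a vanishing linear coefficient is an arbitrary choice of ours.

```python
import cmath

def one_cubic_root(B, C, D):
    """One root of u^3 + B u^2 + C u + D, via the shift of 8.41 and Theorem 8.44."""
    p = C - B * B / 3
    q = 2 * B**3 / 27 - B * C / 3 + D
    if abs(p) < 1e-14:                       # reduced cubic is t^3 + q
        t = complex(-q) ** (1 / 3)
    else:
        s = cmath.sqrt(q * q / 4 + p**3 / 27)
        U = -q / 2 + s
        if abs(U) < abs(-q / 2 - s):         # take the larger value to limit cancellation
            U = -q / 2 - s
        w = complex(U) ** (1 / 3)
        t = w - p / (3 * w)
    return t - B / 3

def quartic_roots(p, q, r):
    """Roots of xi^4 + p xi^2 + q xi + r with q != 0, following (8:3-8) to (8:3-12)."""
    if q == 0:
        raise ValueError("this sketch assumes q != 0; otherwise solve a quadratic in xi^2")
    u = one_cubic_root(2 * p, p * p - 4 * r, -q * q)   # (8:3-12)
    a = cmath.sqrt(u)                                  # (8:3-11)
    b = (a * a + p - q / a) / 2                        # (8:3-10)
    c = (a * a + p + q / a) / 2
    roots = []
    for lin, const in ((a, b), (-a, c)):               # factor xi^2 + lin*xi + const
        s = cmath.sqrt(lin * lin - 4 * const)
        roots += [(-lin + s) / 2, (-lin - s) / 2]
    return roots

# xi^4 - 25 xi^2 + 60 xi - 36 = (xi - 1)(xi - 2)(xi - 3)(xi + 6)
for z in quartic_roots(-25, 60, -36):
    print(z)
```

Which of the three roots of the resolvent is returned depends on the principal cube root taken; as noted above, any of them leads to a valid decomposition.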

On equations of higher degree. Despite the special character of some of the


preceding computations, there are some hints in our treatment of the
2nd, 3rd, and 4th degree equations as to the handling of the general nth
degree equation. However, again no real progress was made on this problem
until the late 18th and early 19th centuries. First came the work of
Lagrange, who obtained certain useful reductions of the problem, but
failed to achieve the desired solutions by radicals. Along more special
lines, Gauss concerned himself with the solutions of equations connected
with various geometrical constructions by ruler and compass. As we shall
see in the next chapter, this reduces to working with equations having
solutions that involve square roots only. Starting with this, Gauss was
able to settle both positively and negatively various construction problems
which had been outstanding since the time of the Greek geometers. In
retrospect, it does not seem surprising to us that there should be
equations which can be solved by radicals, but cannot be solved in this more
special form. What is surprising is the result then obtained by Abel: for
each $n \ge 5$, there is no formula for the roots of the general nth degree
polynomial equation by means of rational operations and radicals in terms of
its coefficients. Still left open by this important result is the possibility
that we might be able to find solutions for each particular equation by
means of radicals, although by Abel's theorem no single formula can work
for all equations. This was finally settled by the work of Galois, which
has come to be known as the Galois theory of equations. This beautiful
theory summarized most of the earlier work in such a way as to lead to a
clearer understanding of the reasons for the various outcomes. It further
showed how to construct for each $n \ge 5$ particular equations of degree
$n$ which cannot be solved by radicals. For example, very simple equations
of degree 5 with rational coefficients can be produced having this property.
The modern presentation of these results demands a substantial
development of two topics in modern algebra, the algebraic theory of fields and the
theory of groups. In the next chapter we shall give the background of the
first of these by taking up the algebraic theory of subfields of the complex
numbers. This will be sufficient to allow us to treat certain geometric
construction problems.

Exercise Group 8.3

1. Prove Theorem 8.37(i)-(iii).
2. Complete the proof of Theorem 8.38(i) by showing that $z \cdot w \in \mathrm{Alg}(K)$
whenever $z, w \in \mathrm{Alg}(K)$.
3. (a) Find/(£) E Ra[£] with/(£) 5^ 0 and f(\/2 + iv/2) = 0.
(b) Find/(£) G Ra[£] with/(£) ^ 0 and such that (£2 + i£ — *Z/2)\/(£).
4. We call z an algebraic integer if for some monic f(ξ) ∈ I[ξ], we have
f(z) = 0, i.e., if z is a root of a monic polynomial with integer coefficients.
(a) Using the proof of 8.38, show that the set of algebraic integers forms
an integral domain (under the operations of C).
(b) Show that the domain of algebraic integers does not form a subfield
of C.
(c) Show that if w ∈ Alg then mw is an algebraic integer for some m ∈ I
with m ≠ 0. Thus Alg is the field of quotients of the domain of
algebraic integers.
5. Suppose that f(ξ), g(ξ) ∈ C[ξ] are monic and of degrees n > 0, m > 0,
respectively. Put f(ξ) = (ξ − z₁) ··· (ξ − zₙ), g(ξ) = (ξ − w₁) ···
(ξ − wₘ). We define the resultant of f(ξ) and g(ξ) to be the number

Res (f(ξ), g(ξ)) = ∏_{i=1}^{n} ∏_{j=1}^{m} (zᵢ − wⱼ).

Thus f(ξ), g(ξ) have at least one root in common if and only if
Res (f(ξ), g(ξ)) = 0.
(a) State and prove a theorem analogous to 8.40 for resultants.
(b) What is the relationship between Dis (/(£)) and Res (/(£),
for n > 1 ?
6. Prove Theorem 8.41.
7. (a) Classify the roots of the following polynomials according to 8.43(i)-
(iii):
(i) ξ³ − 4ξ + 1,
(ii) 2ξ³ − 6ξ² − 1.
(b) Find a necessary and sufficient condition on b ∈ Re so that 2ξ³ + bξ + 1
has exactly three real roots.
8. Find all the roots of the following polynomials.

(a) ξ³ + 9ξ − 6 (b) ξ³ + 3iξ + (1 + i) (c) ξ⁴ − 2ξ² − 8ξ − 3


CHAPTER 9

ALGEBRAIC NUMBER FIELDS AND FIELD EXTENSIONS

9.1 Generation of subfields. Although, as we have seen in the preceding
chapter, the use of the complex number field provides us with a good
deal of information concerning the solution of various algebraic problems,
it is necessary to restrict our attention to somewhat smaller parts of C
in order to obtain more detailed information about the nature of these
solutions. This is the case, for example, if we wish to answer such questions
as: are the roots of a given polynomial f(ξ) ∈ C[ξ] constructible by classical
geometric means, or are the roots of f(ξ) constructible by rational operations
and radicals from its coefficients? Since what is problematic in
answering these questions is the use of certain algebraic operations, such
as taking nth roots, over and beyond the rational operations +, −, ·, ⁻¹,
what is suggested here is the following approach. The coefficients of a
given polynomial f(ξ) lie in certain subfields K of C. As we shall see,
there is a smallest such subfield, which we shall call the field generated by
the coefficients of f(ξ). Beyond this field K the roots of f(ξ) lie in another
subfield L. Again we shall see that there is a smallest such subfield which
contains K, which we shall call the field generated by the roots of f(ξ) over K,
or simply the root field of f(ξ). Our hope now is that the nature of the roots
of f(ξ) will somehow be reflected in certain relationships between these two
fields K and L. We shall study in detail one (very useful) such relationship
in this chapter.
The constructions of the two subfields of C described above are particular
instances of a more general procedure, namely, given a set of elements
Z ⊆ C, the construction of the field generated by the elements of Z. For in
the second case above we need take as Z only the set K together with all
roots of f(ξ).
Recall that by 6.1, 6.2, a necessary and sufficient condition for a set K
to be a subfield of C is that

(9:1-1) 1 ∈ K

and

(9:1-2) whenever z, w ∈ K then

z + w ∈ K, z − w ∈ K, z · w ∈ K

and, provided w ≠ 0, also z/w ∈ K.

This leads directly to the following result.

9.1 Lemma. Suppose that M is any nonempty collection of subfields L of C.
Then ⋂L[L ∈ M] is also a subfield of C.

Proof. If we set K = ⋂L[L ∈ M] we see that K satisfies (9:1-1) and
(9:1-2), since each L ∈ M satisfies these conditions.

9.2 Definition. For each set Z ⊆ C let

Gen (Z) = ⋂L [L is a subfield of C and Z ⊆ L].

We call Gen (Z) the subfield (of C) generated by Z.

9.3 Theorem. Suppose that Z ⊆ C. Then we have:
(i) Gen (Z) is a subfield of C;
(ii) Z ⊆ Gen (Z);
(iii) if L is a subfield of C with Z ⊆ L then Gen (Z) ⊆ L;
(iv) if Z₁ = Z ∪ {1} and for each n ∈ P, Zₙ₊₁ consists of Zₙ together
with all elements z + w, z − w, z · w and, for w ≠ 0, also z/w,
such that z, w ∈ Zₙ, then Gen (Z) = ⋃Zₙ[n ∈ P].

Proof. Part (i) is immediately obtained by using 9.1, since the collection
M of all subfields L of C with Z ⊆ L has at least C as one member.
Parts (ii), (iii) follow directly from the definition 9.2. To prove (iv), note
first that Zₙ ⊆ Zₘ whenever n ≤ m, since Zₙ ⊆ Zₙ₊₁ for all n. Let
K = ⋃Zₙ[n ∈ P]. Then 1 ∈ K. If z, w ∈ K then for some n, m we have
z ∈ Zₙ and w ∈ Zₘ. If n ≤ m then both z, w ∈ Zₘ and z + w, z − w, z · w,
and z/w (in case w ≠ 0) belong to Zₘ₊₁ and hence to K; similarly if
m < n. Thus K is a subfield of C and Z ⊆ K. Hence by (iii), Gen (Z) ⊆ K.
To show that K ⊆ Gen (Z), it is sufficient to prove that each Zₙ ⊆ Gen (Z);
this is easily proved by induction on n, using (i).

Part (i) of the above justifies the terminology in 9.2. Parts (i)-(iii)
also justify our referring to Gen (Z) as the smallest subfield of C which
contains Z. Part (iv) provides us with an alternative (inductive) way of
regarding Gen (Z), which could just as well have been taken as the basic
definition. It corresponds to our intuitive idea that Gen (Z) can be
constructed by starting with 1 and the elements of Z and repeatedly applying
the rational operations any finite number of times.
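The inductive description in 9.3(iv) can be traced by machine in a small case. The following is a minimal sketch, assuming Z = {2} and using Python's exact rationals; the names next_level, z1, z2, z3 are illustrative only and do not come from the text.

```python
from fractions import Fraction

def next_level(zn):
    """Return Z_{n+1}: Z_n together with z+w, z-w, z*w and (for w != 0) z/w."""
    out = set(zn)
    for z in zn:
        for w in zn:
            out.update((z + w, z - w, z * w))
            if w != 0:
                out.add(z / w)
    return out

# Assumed example: Z = {2}, so Z_1 = Z ∪ {1}.
z1 = {Fraction(1), Fraction(2)}
z2 = next_level(z1)
z3 = next_level(z2)

print(sorted(z2))              # the finitely many elements of Z_2
print(Fraction(3, 4) in z3)    # 3 and 4 lie in Z_2, so 3/4 lies in Z_3
```

Each level is a finite set, and the levels increase; in this assumed case their union is Gen ({2}) = Ra.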
There is nothing special about the role of C here. We could also define
for any field and any set of elements in the field, the subfield generated
by that set. We would then obtain a theorem just like 9.3 for this notion.
Similarly, we can define the notion of the subdomain generated by a set
of elements in an integral domain, and prove a similar theorem, and so on
for other kinds of algebraic systems like rings, etc. A number of the following
results also hold when adapted to these other contexts.
The proof of the following is left to the reader.

9.4 Theorem.
(i) Gen (∅) = Ra.
(ii) If K is a subfield of C then Gen (K) = K.
(iii) If Z ⊆ C then Gen (Gen (Z)) = Gen (Z).
(iv) If W ⊆ Z ⊆ C then Gen (W) ⊆ Gen (Z).
(v) If Z ⊆ C and z ∈ Gen (Z) then for some finite subset W of Z,
z ∈ Gen (W). In brief, Gen (Z) = ⋃Gen (W) [W ⊆ Z and W
is finite].

Although it follows from the preceding results that

Gen (Z) ∪ Gen (Z′) ⊆ Gen (Z ∪ Z′)

whenever Z ⊆ C, Z′ ⊆ C, in general Gen (Z) ∪ Gen (Z′) ≠ Gen (Z ∪ Z′).
We leave finding examples of this to the student.

The general extension process. We now consider the process of constructing
from a given subfield K and a set of elements Z, the subfield L generated
over K by Z. Actually, it is not necessary at the start here to assume that
K is a subfield.

9.5 Definition. Suppose that K ⊆ C and Z ⊆ C. We take K(Z) =
Gen (K ∪ Z). If Z is finite and nonempty, say Z = {z₁, …, zₙ},
we write K(z₁, …, zₙ) instead of K({z₁, …, zₙ}).

Then as a direct consequence of various of the foregoing we have the
following results.

9.6 Theorem. Suppose that K ⊆ C, Z ⊆ C and Z′ ⊆ C. Then:
(i) K(Z) is a subfield of C;
(ii) K ⊆ K(Z) and Z ⊆ K(Z);
(iii) if L is a subfield of C with K ⊆ L and Z ⊆ L then K(Z) ⊆ L;
(iv) K(K(Z)) = K(Z);
(v) if W ⊆ Z then K(W) ⊆ K(Z);
(vi) K(Z) = ⋃K(W)[W ⊆ Z and W is finite];
(vii) K(Z ∪ Z′) = (K(Z))(Z′).

Proof. The only part of the preceding which is not a direct adaptation
of the corresponding parts of 9.3 and 9.4 is (vii). It is seen as follows.
By (v), K(Z) ⊆ K(Z ∪ Z′), and by (ii), Z′ ⊆ Z ∪ Z′ ⊆ K(Z ∪ Z′). Since
K(Z ∪ Z′) is a subfield of C by (i), it follows by (iii) that (K(Z))(Z′) ⊆
K(Z ∪ Z′). On the other hand, K ∪ Z ⊆ K(Z) ⊆ (K(Z))(Z′) by (ii)
and Z′ ⊆ (K(Z))(Z′), again by (ii). Hence K ∪ Z ∪ Z′ ⊆ (K(Z))(Z′).
Since (K(Z))(Z′) is a subfield of C by (i), it follows from 9.3(iii) that
Gen (K ∪ Z ∪ Z′) ⊆ (K(Z))(Z′), that is, K(Z ∪ Z′) ⊆ (K(Z))(Z′) by 9.5.

A simple example of a statement involving the hypothesis that K is a
subfield of C is the following.

9.7 Theorem. Suppose that K is a subfield of C and Z ⊆ C. Then K(Z) = K
if and only if Z ⊆ K.

The proof is trivial. In particular K(∅) = K whenever K is a subfield.

Simple extensions. By 9.6(vi), the study of the structure of the K(Z) is
reduced to the study of the cases where Z is finite. In this case, if Z ≠ ∅,
say Z = {z₁, …, zₙ}, the study of K(Z) can be further reduced by
9.6(vii). We write here K(Z)(Z′) instead of (K(Z))(Z′). We have

K(z₁, …, zₙ) = K({z₁, …, zₙ}) = K({z₁, …, zₙ₋₁} ∪ {zₙ})
= K({z₁, …, zₙ₋₁})({zₙ}) = K(z₁, …, zₙ₋₁)(zₙ).

Repeating this gives K(z₁, …, zₙ) = K(z₁) ··· (zₙ₋₁)(zₙ). Hence we
need only consider simple extensions L(z) for various L and z. The
hypothesis that K is a subfield now comes essentially into play.
Since every subfield is closed under addition and multiplication, whenever
f(ξ) is a polynomial with coefficients in K, we have f(z) ∈ K(z).
Furthermore, if g(ξ) ∈ K[ξ] and g(z) ≠ 0 then also f(z)/g(z) ∈ K(z).
This suggests studying K(z) by investigating the relationship between it
and both the domain of polynomials K[ξ] and the field of rational forms
K(ξ). This is initiated by the following.

9.8 Theorem. Suppose that K is a subfield of C and z ∈ C. Define the
function G with domain K[ξ] by G(f(ξ)) = f(z) for each f(ξ) ∈ K[ξ].
Let D be the range of G. Then:
(i) for each a ∈ K, G(a) = a, so that K ⊆ D;
(ii) G(ξ) = z, so that z ∈ D;
(iii) D ⊆ K(z);
(iv) D is a subdomain of K(z);
(v) G is a homomorphic mapping of K[ξ] onto D.

Proof. Parts (i) and (ii) are obvious; in particular, G(0) = 0, G(1) = 1
by (i); (iii) follows by our previous remarks. Clearly D is closed under
+ and ·, with G(f₁(ξ) + f₂(ξ)) = f₁(z) + f₂(z), G(f₁(ξ) · f₂(ξ)) =
f₁(z) · f₂(z). This proves (iv) and (v).

Simple transcendental extensions. The mapping G cannot in general be
extended to a mapping G′ of K(ξ) into K(z) in the natural way, i.e., by
sending f₁(ξ)/f₂(ξ) into f₁(z)/f₂(z), since we may well have f₂(z) = 0
even though f₂(ξ) ≠ 0. However, if z is not the root of any nonzero
polynomial over K, this can be carried through. In fact in this case we
get the following.

9.9 Theorem. Suppose that K is a subfield of C and z ∈ C but z ∉ Alg (K).
Then the function G defined in 9.8 can be extended to an isomorphic
mapping of K(ξ) onto K(z), so K(ξ) ≅ K(z).

Proof. The extension G′ of G to K(ξ) described above is now well
defined for all elements f₁(ξ)/f₂(ξ) of K(ξ) since, by hypothesis and
definition of Alg (K), whenever f₂(ξ) ≠ 0 then f₂(z) ≠ 0. Moreover,
the extended mapping G′ is one-to-one. For suppose that

f₁(z)/f₂(z) = g₁(z)/g₂(z)

where f₂(ξ) ≠ 0 and g₂(ξ) ≠ 0. Let h(ξ) = f₁(ξ)g₂(ξ) − f₂(ξ)g₁(ξ).
Then h(z) = 0, and hence h(ξ) = 0. Thus also

f₁(ξ)/f₂(ξ) = g₁(ξ)/g₂(ξ).

Clearly the extended mapping G′ also keeps fixed each element of K and
preserves +, ·, and ⁻¹. Hence all that remains to be proved is that
K(z) is the same as the range of G′. Let L = ℛ(G′). Then K ⊆ L, z ∈ L
and L is a subfield of C. Hence K(z) ⊆ L by 9.6(iii). On the other hand,
it is clear that each element f₁(z)/f₂(z) of L is already in K(z), since the
latter is closed under all the rational operations.
Essentially what is involved in this proof is that if z is not algebraic
over K then K[z], i.e., the range D of the mapping G of 9.8, is a simple
transcendental extension of K. But then K[ξ] ≅ K[z] by 5.6. Hence the
corresponding fields of quotients are ≅ by 6.12. Since K(z) can be seen
to be the field of quotients of K[z], we have K(ξ) ≅ K(z).
This theorem completely determines the structure of simple transcendental
extensions K(z) in C, i.e., extensions where z is not algebraic over
K. These are all isomorphic to K(ξ) and hence to each other. Thus,
for example, π and e are algebraically indistinguishable over the rationals,
Ra(π) ≅ Ra(e).

Simple algebraic extensions. Of course, our main interest here is rather
in studying the nature of roots of polynomial equations, and hence in the
case that z is algebraic over K. Now it can be seen directly that K(z)
consists of all elements f₁(z)/f₂(z) for which f₁(ξ), f₂(ξ) ∈ K[ξ] and
f₂(z) ≠ 0. However, as we saw earlier, we are blocked from establishing
the same sort of connection between K(ξ) and K(z) in this case as for the
nonalgebraic case 9.9.
Our first step in treating the case that z is algebraic over K is to obtain
a survey of the set of all polynomials f(ξ) over K of which z is a root.
This is accomplished by the next theorem.

9.10 Theorem. Suppose that K is a subfield of C and that z ∈ Alg (K).
Let n be the least positive integer such that for some g(ξ) ∈ K[ξ] with
deg (g(ξ)) = n we have g(z) = 0. Then:
(i) there is a unique monic polynomial p(ξ) ∈ K[ξ] of degree n such
that p(z) = 0;
(ii) for any f(ξ) ∈ K[ξ] we have f(z) = 0 if and only if p(ξ) | f(ξ);
(iii) p(ξ) is prime in K[ξ];
(iv) if q(ξ) is monic and prime in K[ξ] and q(z) = 0 then q(ξ) = p(ξ).

Proof. First let p(ξ) be any monic polynomial in K[ξ] of degree n such
that p(z) = 0; by definition of Alg (K), there is at least one such polynomial.
Let f(ξ) ∈ K[ξ] be arbitrary with f(ξ) ≠ 0. By the division
algorithm 6.26 there exist (unique) h(ξ), r(ξ) in K[ξ] with

(1) f(ξ) = h(ξ)p(ξ) + r(ξ) and 0 ≤ deg (r(ξ)) < n.

Then

(2) f(z) = r(z).

Suppose that f(z) = 0; then also r(z) = 0 and hence deg (r(ξ)) = 0 by
the choice of n. But then r(ξ) must be the constant 0. Hence p(ξ) | f(ξ).
Conversely, if this holds, certainly f(z) = 0. In other words, no matter
what choice of p(ξ) we take with deg (p(ξ)) = n and p(z) = 0, we must
have (ii). In particular, if p₁(ξ) is another such polynomial we have
p(ξ) | p₁(ξ). Since p(ξ) is monic it follows that if also p₁(ξ) is monic then
p(ξ) = p₁(ξ) by 6.23(xi), (xii). Thus we have also proved (i). Parts
(iii) and (iv) are then easy consequences.
By (iii), (iv), the polynomial p(ξ) can also be described as the unique
monic q(ξ) irreducible over K with q(z) = 0. Then the positive integer
n of the hypothesis can alternatively be described as being the degree of
this polynomial.
Now consider again the homomorphic mapping G of K[ξ] into K(z),
defined in 9.8, which assigns to each f(ξ) its value f(z). We know by
Section 4.6 that with each homomorphic mapping is associated a certain
congruence relation ≡ such that the homomorphic image is ≅ to the
corresponding system of equivalence sets. The relation f₁(ξ) ≡ f₂(ξ) holds
if and only if G(f₁(ξ)) = G(f₂(ξ)), i.e., if and only if f₁(z) = f₂(z). But
f₁(z) − f₂(z) = 0 if and only if p(ξ) | (f₁(ξ) − f₂(ξ)), where p(ξ) is the
unique monic prime polynomial of 9.10. On the other hand, given any
p(ξ) in K[ξ], we can define such a corresponding equivalence relation.

9.11 Definition. For any p(ξ) in K[ξ] and f₁(ξ), f₂(ξ) ∈ K[ξ], we let
f₁(ξ) ≡ f₂(ξ) (mod p(ξ)) hold if and only if p(ξ) | (f₁(ξ) − f₂(ξ)).

This relation is very similar to ≡ (mod p) where p ∈ I. The analogy
and the results 4.60 and 6.9, which show that the system of integers mod p
forms a field when p is prime, lead us to the following.

9.12 Theorem. Suppose that K is a field and p(ξ) is prime in K[ξ]. Then:
(i) ≡ (mod p(ξ)) is a congruence relation in (K[ξ], +, ·, 0, 1);
(ii) the system of equivalence sets [f(ξ)] of this relation under the
associated operations forms a field;
(iii) for each f(ξ) ∈ K[ξ] there is a unique r(ξ) ∈ K[ξ] with f(ξ) ≡
r(ξ) (mod p(ξ)) and 0 ≤ deg (r(ξ)) < deg (p(ξ)).

Proof. It is clear that ≡ (mod p(ξ)) is an equivalence relation in K[ξ].
Further if f₁(ξ) ≡ f₂(ξ) (mod p(ξ)) then f₁(ξ) + g(ξ) ≡ f₂(ξ) + g(ξ)
(mod p(ξ)), and f₁(ξ) · g(ξ) ≡ f₂(ξ) · g(ξ) (mod p(ξ)), for any g(ξ). Part
(i) now follows from this. As we know from 4.55, the system of equivalence
sets [f(ξ)], with operations

(1) [f₁(ξ)] ⊕ [f₂(ξ)] = [f₁(ξ) + f₂(ξ)],
[f₁(ξ)] ⊙ [f₂(ξ)] = [f₁(ξ) · f₂(ξ)]

forms a commutative ring with unity, so long as we know that [1] ≠ [0].
This holds, of course, from the assumption that p(ξ) is prime in K[ξ].
It remains only to prove that

(2) if f(ξ) ≢ 0 (mod p(ξ)) then for some g(ξ) ∈ K[ξ], we have
f(ξ) · g(ξ) ≡ 1 (mod p(ξ)).

For then each nonzero equivalence set [f(ξ)] has an inverse [g(ξ)],
[f(ξ)] ⊙ [g(ξ)] = [1]. Since p(ξ) ∤ f(ξ) by hypothesis here, we must have
(p(ξ), f(ξ)) = 1. Hence by the representation 6.30 of gcd, there are
polynomials h(ξ), g(ξ) in K[ξ] with 1 = h(ξ)p(ξ) + f(ξ)g(ξ). But this is
just the conclusion desired in (2), and thus (ii) is proved. (iii) is just a
restatement of the division algorithm for division by p(ξ).
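Statement (2) of this proof is effective, since the gcd representation can be computed by the Euclidean algorithm. The following is a rough sketch under the assumption K = Ra, with a polynomial stored as a list of Fraction coefficients, lowest degree first; every function name here is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def trim(f):
    """Drop trailing zero coefficients (the zero polynomial becomes [])."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def sub(f, g):
    n = max(len(f), len(g))
    f = list(f) + [Fraction(0)] * (n - len(f))
    g = list(g) + [Fraction(0)] * (n - len(g))
    return trim([a - b for a, b in zip(f, g)])

def mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(f, p):
    """Division algorithm: (h, r) with f = h*p + r and r of lower degree than p."""
    f, p = trim(f), trim(p)
    h = [Fraction(0)] * max(len(f) - len(p) + 1, 1)
    r = list(f)
    while len(r) >= len(p):
        shift = len(r) - len(p)
        c = r[-1] / p[-1]
        h[shift] = c
        r = sub(r, mul([Fraction(0)] * shift + [c], p))
    return trim(h), r

def inverse_mod(f, p):
    """Return g with f*g ≡ 1 (mod p), assuming f and p are relatively prime."""
    # Invariant: r0 ≡ s0*f and r1 ≡ s1*f (mod p).
    r0, s0 = trim(p), []
    r1, s1 = divmod_poly(f, p)[1], [Fraction(1)]
    while r1:
        q, r2 = divmod_poly(r0, r1)
        r0, s0, r1, s1 = r1, s1, r2, sub(s0, mul(q, s1))
    if len(r0) != 1:
        raise ValueError("f and p are not relatively prime")
    return divmod_poly([c / r0[0] for c in s0], p)[1]

p = [Fraction(-2), Fraction(0), Fraction(1)]   # ξ² − 2
f = [Fraction(1), Fraction(1)]                 # 1 + ξ
g = inverse_mod(f, p)
print(g, divmod_poly(mul(f, g), p)[1])
```

The returned g is the representative of lower degree than p(ξ) promised by part (iii), so it can be compared directly with other representatives.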

We can now bring these results together to obtain the following description
of simple algebraic extensions.

9.13 Theorem. Suppose that K is a subfield of C and z ∈ Alg (K). Let
p(ξ) be the unique monic prime polynomial in K[ξ] with p(z) = 0.
Let G be the homomorphic mapping of K[ξ] into K(z) defined in 9.8,
sending each f(ξ) into f(z). Then we have the following:
(i) The range of G is exactly K(z) and K(z) is ≅ to the field of equivalence
sets (mod p(ξ)) of 9.12(ii).
(ii) If w ∈ K(z) then there is a unique r(ξ) ∈ K[ξ] with

0 ≤ deg (r(ξ)) < deg (p(ξ))

and w = r(z).

Proof. Let D be the range of G. As we have already observed,
G(f₁(ξ)) = G(f₂(ξ)) if and only if f₁(ξ) ≡ f₂(ξ) (mod p(ξ)). Hence by
the general considerations of Section 4.6 on homomorphisms, D is ≅ to
the system of equivalence sets (mod p(ξ)). But the latter is a field by
9.12(ii). Hence D is also a field. But K ⊆ D and z ∈ D by 9.8(i), (ii) so
that K(z) ⊆ D by 9.6(iii). We already know that D ⊆ K(z), so D = K(z).
Thus if w ∈ K(z) then w = G(f(ξ)) for some f(ξ) ∈ K[ξ]; hence w =
G(r(ξ)) for some r(ξ) ∈ K[ξ] with 0 ≤ deg (r(ξ)) < deg (p(ξ)) by
9.12(iii). In other words, w = r(z) for some such r(ξ). If also w = r₁(z)
then r(ξ) = r₁(ξ), for otherwise r(ξ) − r₁(ξ) is a nonzero polynomial of
lower degree than p(ξ) having z as a root.

9.14 Corollary. Suppose that K is a subfield of C, p(ξ) is prime in K[ξ],
and p(z₁) = 0, p(z₂) = 0. Then K(z₁) ≅ K(z₂). This isomorphism
can be chosen so that each a ∈ K is mapped into itself and z₁ maps
into z₂.

The Theorems 9.12 and 9.13 are quite important initial results in the
algebraic theory of fields. We wish to illustrate them with a few examples.
Consider first the case K = Ra, z = √2. The unique monic prime
p(ξ) ∈ Ra[ξ] with √2 as root is p(ξ) = ξ² − 2. According to 9.13(ii),
each element w of Ra(√2) has a unique representation in the form
w = r(√2) where r(ξ) ∈ Ra[ξ] has degree less than 2. Hence for each such
w there are unique rational numbers a, b with w = a + b√2. That the
set of these numbers forms a subfield of C is the result of 9.13(i). To see
this directly, we need only verify closure under the rational operations
+, −, ·, ⁻¹. Verification is trivial for + and −:

(a + b√2) ± (a₁ + b₁√2) = (a ± a₁) + (b ± b₁)√2.

For product we have

(a + b√2) · (a₁ + b₁√2) = aa₁ + (ab₁ + ba₁)√2 + bb₁(√2)²
= (aa₁ + 2bb₁) + (ab₁ + ba₁)√2.

For inverse, given a + b√2 ≠ 0 or equivalently, by uniqueness, not both
a = 0, b = 0, we seek a₁, b₁ ∈ Ra with aa₁ + 2bb₁ = 1, ab₁ + ba₁ = 0.
These simultaneous equations can be solved for a₁, b₁ in terms of a, b.
More directly, we observe that (a + b√2)(a − b√2) = a² − 2b², so

1/(a + b√2) = (a − b√2)/((a + b√2)(a − b√2)) = a/(a² − 2b²) − (b/(a² − 2b²))√2.

Here a² − 2b² ≠ 0, for otherwise, if b ≠ 0, √2 would be rational,
while if b = 0 we would have a = 0.
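The product and inverse rules just derived translate directly into exact arithmetic on pairs. What follows is a small sketch, assuming a + b√2 is stored as the pair (a, b) of Python Fractions; the function names mul and inv are illustrative only.

```python
from fractions import Fraction

def mul(u, v):
    """(a + b√2)(a1 + b1√2), with a + b√2 stored as the pair (a, b)."""
    (a, b), (a1, b1) = u, v
    return (a * a1 + 2 * b * b1, a * b1 + b * a1)

def inv(u):
    """1/(a + b√2) = (a − b√2)/(a² − 2b²); requires (a, b) != (0, 0)."""
    a, b = u
    n = a * a - 2 * b * b          # nonzero for rational a, b not both 0
    if n == 0:
        raise ZeroDivisionError("0 has no inverse")
    return (a / n, -b / n)

u = (Fraction(3), Fraction(-2))    # 3 − 2√2
print(inv(u))                      # the pair for 3 + 2√2
print(mul(u, inv(u)))              # the pair for 1
```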
These computations lie behind the suggestion in Section 7.1 that one
way to construct a field which contains Ra and a root of ξ² − 2 is to start
with the set of ordered pairs (a, b) ∈ Ra × Ra and define operations

(a, b) ⊕ (a₁, b₁) = (a + a₁, b + b₁),
(a, b) ⊙ (a₁, b₁) = (aa₁ + 2bb₁, ab₁ + ba₁).

This forms a field into which Ra can be mapped isomorphically by sending
each a ∈ Ra into (a, 0).
Alternatively, we can construct such a field from Ra[ξ] by 9.12. As
representatives of the equivalence sets we can take the polynomials a + bξ
with a, b ∈ Ra. To compute the representative of a product

(a + bξ)(a₁ + b₁ξ) = aa₁ + (ab₁ + ba₁)ξ + bb₁ξ²,

we have to reduce this modulo the polynomial ξ² − 2. Since ξ² ≡
2 (mod ξ² − 2), we have bb₁ξ² ≡ 2bb₁ (mod ξ² − 2), hence

(a + bξ)(a₁ + b₁ξ) ≡ (aa₁ + 2bb₁) + (ab₁ + ba₁)ξ (mod ξ² − 2).

Thus, in the corresponding system of equivalence sets,

[a + bξ] ⊙ [a₁ + b₁ξ] = [(aa₁ + 2bb₁) + (ab₁ + ba₁)ξ].

By 9.14, Ra(√2) ≅ Ra(−√2). Again this can be seen directly by the
mapping which sends each a ∈ Ra into itself and which sends √2 into
−√2. This should be expected, since there is nothing assumed about the
number √2 in the above computations other than that it is a root of the
polynomial ξ² − 2.

We proceed similarly in dealing with Re(i), which is another way of
denoting C. Here the monic prime polynomial p(ξ) is ξ² + 1. Thus again
each element w of Re(i) has a unique representation w = a + bi for a, b
real. The formal operations of sum and product are necessarily

(a + bi) + (a₁ + b₁i) = (a + a₁) + (b + b₁)i,
(a + bi) · (a₁ + b₁i) = aa₁ + (ab₁ + ba₁)i + bb₁i²
= (aa₁ − bb₁) + (ab₁ + ba₁)i.

From the point of view of polynomials modulo ξ² + 1, this last takes the
form (a + bξ) · (a₁ + b₁ξ) = aa₁ + (ab₁ + ba₁)ξ + bb₁ξ², and

(a + bξ) · (a₁ + b₁ξ) ≡ (aa₁ − bb₁) + (ab₁ + ba₁)ξ (mod ξ² + 1),

since ξ² ≡ −1 (mod ξ² + 1). These computations lie behind our
construction of the complex numbers in Section 8.1. Again, 9.14 gives the
already known result Re(i) ≅ Re(−i), with mapping sending each a ∈ Re
into itself and i into −i.
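As a quick check that reduction modulo ξ² + 1 reproduces ordinary complex multiplication, one can compare the pair rule with Python's built-in complex type. This is only an illustration on small integer inputs (so that the floating-point comparison is exact); the name mul_mod is hypothetical.

```python
def mul_mod(u, v):
    """Product of a + bξ and a1 + b1ξ reduced modulo ξ² + 1, as pairs (a, b)."""
    (a, b), (a1, b1) = u, v
    # aa1 + (ab1 + ba1)ξ + bb1ξ², then ξ² is replaced by −1
    return (a * a1 - b * b1, a * b1 + b * a1)

u, v = (3, 4), (1, -2)
w = mul_mod(u, v)
z = complex(*u) * complex(*v)
print(w, z)
```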

Adjoining roots to arbitrary fields. Theorems 9.12 and 9.13 show that if
K is a subfield of C, p(ξ) is prime in K[ξ] and z is a root of p(ξ), then we
can construct a field isomorphic to K(z) in the system of equivalence
sets modulo p(ξ). From the purely algebraic point of view the set C of
complex numbers is superfluous here, for we can already verify directly
that the polynomial p(ξ) (or more precisely, the polynomial with
corresponding coefficients in the system of equivalence sets) has a root in the
system of equivalence sets. This leads us to the following general theorem,
which we have alluded to several times earlier.

9.15 Theorem. Suppose that (K, +, ·, 0, 1) is any field and f(ξ) ∈ K[ξ]
is of degree m > 0.
(i) We can construct a field (L, +, ·, 0, 1) which contains K as a
subfield and at least one element x with f(x) = 0.
(ii) We can construct such a field L which contains elements x₁, …, xₘ
with f(ξ) = (ξ − x₁) ··· (ξ − xₘ) in L[ξ].

Proof. Let p(ξ) be any prime in K[ξ] with p(ξ) | f(ξ). We define
≡ (mod p(ξ)) as in 9.11. Then the results of 9.12 are seen to hold for this
more general situation by the same proofs. The subfield K can be mapped
isomorphically into the field E of equivalence sets under the associated
operations ⊕, ⊙ (mod p(ξ)) by sending each a ∈ K into [a]. Then if
p(ξ) = a₀ + a₁ξ + ··· + aₙξⁿ, the polynomial corresponding to p(ξ)
in the domain of polynomials E[η] is p₁(η) = [a₀] ⊕ [a₁]η ⊕ ··· ⊕ [aₙ]ηⁿ.
Then [ξ] is a root of p₁(η) in E[η], that is,

[a₀] ⊕ ([a₁] ⊙ [ξ]) ⊕ ··· ⊕ ([aₙ] ⊙ [ξ]ⁿ) = [0],

since the left-hand side is just [a₀ + a₁ξ + ··· + aₙξⁿ] = [p(ξ)] and
p(ξ) ≡ 0 (mod p(ξ)). From E and the isomorphic embedding of K into E
we can construct an extension field L of K which contains a root x of p(ξ)
and hence of f(ξ). This proves (i). To prove (ii) we then proceed by
induction on m.
Given a field L satisfying 9.15(ii) for given K and f(ξ), we can construct
the subfield of L generated by the roots x₁, …, xₘ of f(ξ). It can be shown
that such a field is uniquely determined up to isomorphism. Thus we can
refer to any one such field as the root field of f(ξ) over K.
To illustrate this theorem, consider the field (I₂, +, ·, 0, 1) of integers
modulo 2 (4.59, 4.60, 6.9). We shall construct an extension L of this field
which contains a root x of p(ξ) = ξ² + ξ + 1. This polynomial is prime
over I₂ by 6.37, since neither 0 nor 1 is a root of it. By the proof of 9.15
and by 9.12, L will consist of elements a + bx for a, b ∈ I₂, where
x² + x + 1 = 0. The sum and product are determined by

(a + bx) + (a₁ + b₁x) = (a + a₁) + (b + b₁)x

and

(a + bx) · (a₁ + b₁x) = aa₁ + (ab₁ + ba₁)x + bb₁x².

But x² = −x − 1 = x + 1 so

(a + bx) · (a₁ + b₁x) = (aa₁ + bb₁) + (ab₁ + ba₁ + bb₁)x.

These conditions for sum and product consistently determine the desired
field L. Its elements are 0, 1, x and 1 + x, and the above rule leads to the
following product table:

  ·    |   0       1       x     1 + x
-------+--------------------------------
  0    |   0       0       0       0
  1    |   0       1       x     1 + x
  x    |   0       x     1 + x     1
 1 + x |   0     1 + x     1       x

The polynomial ξ² + ξ + 1 decomposes completely in L, with the roots
x and 1 + x.
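The product table and the two roots can be recomputed mechanically from the product rule above. The following sketch assumes the element a + bx is stored as the pair (a, b) with entries 0 or 1; the helper names add, mul, name and p are illustrative only.

```python
def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    (a, b), (a1, b1) = u, v
    # (aa1 + bb1) + (ab1 + ba1 + bb1)x, coefficients reduced modulo 2
    return ((a * a1 + b * b1) % 2, (a * b1 + b * a1 + b * b1) % 2)

def name(u):
    return {(0, 0): "0", (1, 0): "1", (0, 1): "x", (1, 1): "1+x"}[u]

ORDER = [(0, 0), (1, 0), (0, 1), (1, 1)]     # 0, 1, x, 1 + x

# One printed row per left factor, in the same order as the table.
for u in ORDER:
    print(name(u).rjust(4), *(name(mul(u, v)).rjust(4) for v in ORDER))

def p(u):
    """Value of ξ² + ξ + 1 at the element u of L."""
    return add(add(mul(u, u), u), (1, 0))

roots = [name(u) for u in ORDER if p(u) == (0, 0)]
print(roots)
```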
Similarly, to construct a field L containing I₅ and a root of ξ³ + ξ + 1,
which can be seen to be prime over I₅, we would consider elements of the
form a + bx + cx², where a, b, c ∈ I₅, and determine the product operation
on these by using x³ = −x − 1 = 4x + 4. This field has 5³ = 125
distinct elements.
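A brute-force check of this construction is feasible because the field is small. The sketch below assumes a + bx + cx² is stored as the triple (a, b, c) of residues modulo 5; the names are illustrative. It uses the fact that a cubic with no root in a field has no linear factor and is therefore prime over that field.

```python
from itertools import product

P = 5
ELEMENTS = list(product(range(P), repeat=3))   # (a, b, c) stands for a + bx + cx²

# ξ³ + ξ + 1 has no root in I5.
no_root = all((t**3 + t + 1) % P != 0 for t in range(P))

def mul(u, v):
    c = [0] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            c[i + j] += a * b
    # x^k = x^(k-3) * x³ = x^(k-3) * (4x + 4), applied for k = 4, then k = 3
    for k in (4, 3):
        c[k - 2] += 4 * c[k]
        c[k - 3] += 4 * c[k]
        c[k] = 0
    return tuple(t % P for t in c[:3])

ONE, X = (1, 0, 0), (0, 1, 0)
x_cubed = mul(mul(X, X), X)

# Every nonzero element should have a multiplicative inverse.
has_inverse = all(any(mul(u, v) == ONE for v in ELEMENTS)
                  for u in ELEMENTS if u != (0, 0, 0))
print(len(ELEMENTS), no_root, x_cubed, has_inverse)
```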

By using the procedure of 9.15 “sufficiently often” it can be shown that
every field K can be extended to a field L in which every polynomial
f(ξ) ∈ K[ξ] completely decomposes. Then it can be shown that L can
also be chosen so that every polynomial f(ξ) ∈ L[ξ] completely decomposes
in L[ξ]. In other words, any field K has an algebraically closed extension
field L. This is the result which we had in mind when discussing the
algebraic significance of the fundamental theorem of complex algebra in
Section 8.2. From the algebraic point of view, C is just one among many
possible algebraically closed extensions of Ra. Of course, we already
know from 8.38 the existence of many other such fields. What is of
algebraic interest here is that we have a quite general method of construction
which makes use only of basic algebraic notions and which in no way
involves the analytic notion of limit. (This is not to diminish the interest
of the remarkable fact that the proper arena for classical analysis, the
complex numbers, has at the same time the algebraically significant
property of being algebraically closed.)
Having settled the structure of simple algebraic extensions K(z) in C,
we can now turn to the study of iterations K(z₁, …, zₙ) of such extensions.
This is the main subject matter of the next section.

Exercise Group 9.1

1. Prove Theorem 9.4(i)-(v).
2. Give an example of Z₁ ⊆ C, Z₂ ⊆ C with Gen (Z₁) ∪ Gen (Z₂) ≠
Gen (Z₁ ∪ Z₂).
3. Suppose that none of z₁, z₂, w₁, w₂ is algebraic. Is

Ra(z₁, z₂) ≅ Ra(w₁, w₂)?

Prove your conclusion.
4. Prove Theorem 9.15(ii).
5. (a) Describe the elements of Ra(∛2) in accordance with 9.13(ii) and
show how to compute the product of any two such elements.
(b) Do the same for the elements of Ra(∛2)(ζ), where

ζ = cos (2π/3) + i sin (2π/3).

(c) What is the inverse of 2 + ∛2 in the description of part (a)?


6. (a) Show that p(ξ) = ξ³ + ξ + 1 is prime in Ra[ξ]. Let z be any root
of p(ξ). Describe the set of elements in Ra(z) in accordance with
9.13(ii), and the computation of the product operation in Ra(z). What
is the inverse of 1 + z in this description?
(b) Repeat part (a) for p(ξ) regarded as a polynomial over the field I₂ of
integers modulo 2.
7. Suppose that (K, +, ·, 0, 1) is a field and that K is denumerable.
(a) Prove that K[ξ] is denumerable.
(b) Prove that there exists a field (L, +, ·, 0, 1) which contains K as a
subfield and such that each nontrivial polynomial in K[ξ] decomposes
completely in L, i.e., if f(ξ) is any polynomial in K[ξ] of degree m > 0
then there exist x₁, …, xₘ ∈ L with f(ξ) = (ξ − x₁) ··· (ξ − xₘ).
8. Assume that for any field K (denumerable or not) there exists a field L
satisfying the conditions of Exercise 7 (b). Then show that for any field K
there exists an extension field L which is algebraically closed.

9.2 Algebraic extensions. We assume throughout this section that K, L,
M are arbitrary subfields of C. We want to investigate the situation
L = K(z₁, …, zₖ), where z₁, …, zₖ are algebraic over K or, more generally,
where each zᵢ is algebraic over K(z₁, …, zᵢ₋₁) for i = 1, …, k.
As we know from 9.6(vii), our investigation can be confined to the study
of a series of simple algebraic extensions. The main facts that we wish to
use about such extensions are given in 9.13. To summarize: with each z₁
algebraic over K is associated a unique monic polynomial p₁(ξ) which is
irreducible over K and of which z₁ is a root. If n₁ is the degree of this
polynomial then n₁ > 0 and for each w ∈ K(z₁) there are unique a₀,
a₁, …, a_{n₁−1} ∈ K with

(9:2-1) w = a₀ + a₁z₁ + ··· + a_{n₁−1} z₁^{n₁−1}.

Thus if n₁ = 1, K(z₁) = K, and conversely.
Now if z₂ is algebraic over K₁ = K(z₁) we have a unique monic polynomial
p₂(ξ) which is irreducible over K₁ and of which z₂ is a root. If n₂
is the degree of this polynomial then n₂ > 0 and for each u ∈ K(z₁, z₂) =
K₁(z₂) there are unique w₀, …, w_{n₂−1} ∈ K₁ with

(9:2-2) u = w₀ + w₁z₂ + ··· + w_{n₂−1} z₂^{n₂−1}.

Now if we represent each wⱼ in the form (9:2-1), we obtain unique
a_{0,j}, a_{1,j}, …, a_{n₁−1,j} for each j = 0, …, n₂ − 1 with wⱼ = a_{0,j} +
a_{1,j}z₁ + ··· + a_{n₁−1,j} z₁^{n₁−1}; hence

(9:2-3) u = a_{0,0} + a_{1,0}z₁ + a_{0,1}z₂ + a_{2,0}z₁² + a_{1,1}z₁z₂ + a_{0,2}z₂²
+ ··· + a_{n₁−1,n₂−1} z₁^{n₁−1} z₂^{n₂−1}
= Σ_{i=0}^{n₁−1} Σ_{j=0}^{n₂−1} a_{i,j} z₁^i z₂^j.

By continuing in this way we can get a representation of the set of elements
of K(z₁, z₂, z₃), etc. It is apparent then that one could formulate a
result analogous to 9.13 for extensions K(z₁, …, zₖ) by using representations
in terms of polynomials in K[ξ₁, …, ξₖ] of k variables.
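The double-sum representation (9:2-3) can be made concrete in one small case. The sketch below assumes K = Ra, z₁ = √2 and z₂ = √3, so that n₁ = n₂ = 2; it relies on the fact that ξ² − 3 remains irreducible over Ra(√2), since √3 is not of the form a + b√2 with a, b rational. An element is stored as a dictionary of coefficients a_{i,j}, and the name mul is illustrative only.

```python
from fractions import Fraction
from collections import defaultdict

# Assumed example: K = Ra, z1 = √2 (n1 = 2), z2 = √3 (n2 = 2).
# An element is a dict {(i, j): a_ij} standing for the sum of a_ij * z1^i * z2^j.

def mul(u, v):
    out = defaultdict(Fraction)
    for (i, j), a in u.items():
        for (k, l), b in v.items():
            c = Fraction(a) * b
            e1, e2 = i + k, j + l
            if e1 == 2:          # z1² = 2
                c, e1 = 2 * c, 0
            if e2 == 2:          # z2² = 3
                c, e2 = 3 * c, 0
            out[(e1, e2)] += c
    return {key: val for key, val in out.items() if val != 0}

s = {(1, 0): 1, (0, 1): 1}       # √2 + √3
d = {(1, 0): -1, (0, 1): 1}      # √3 − √2
print(mul(s, s))                 # coefficients of (√2 + √3)²
print(mul(s, d))                 # coefficients of (√2 + √3)(√3 − √2)
```

Here the four products 1, z₁, z₂, z₁z₂ play the role of the n₁n₂ terms in (9:2-3).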

We wish to extract from such representations certain features of the
relationship between L = K(z₁, …, zₖ) and K which are independent of
the choice of z₁, …, zₖ. For this purpose we emphasize a different aspect
of representations (9:2-1) and (9:2-3), namely that each element w of K(z₁)
can be represented as a linear combination of the n₁ elements 1, z₁, …, z₁^{n₁−1}
with coefficients in K, and each element u of K(z₁, z₂) can be represented as
a linear combination of the n₁n₂ elements

1, z₁, z₂, z₁², z₁z₂, z₂², …, z₁^{n₁−1} z₂^{n₂−1}

with coefficients in K, and so on for K(z₁, z₂, z₃), etc. Moreover, each of
these representations is unique.
This suggests studying extensions of the form L = K(v₁, …, vₘ) with
the property that every element of L can already be uniquely expressed
as a linear combination of v₁, …, vₘ with coefficients in K. In particular,
we would consider K(z₁) in the form K(1, z₁, …, z₁^{n₁−1}) and K(z₁, z₂)
in the form K(1, z₁, z₂, z₁², z₁z₂, z₂², …, z₁^{n₁−1} z₂^{n₂−1}) when looking at the
matter from this point of view. The main first result which we wish to
obtain about such extensions is that if such v₁, …, vₘ can be found at
all satisfying the above conditions with K and L then the number m of
these is uniquely determined by K and L. For those readers familiar with
linear algebra, this is a direct consequence of a well-known result about
vector spaces. For if K ⊆ L, L can be regarded as a vector space over the
field K; in this we are concerned only with the operations of addition
applied to elements of L and of multiplication of elements of L by elements
of K. If v₁, …, vₘ can be found so that every element of L is a linear
combination of v₁, …, vₘ with coefficients in K, then L is a finite-dimensional
vector space over K. If, moreover, this representation is unique
then v₁, …, vₘ forms a linearly independent basis for L over K. It is shown
in vector space theory that any two linearly independent bases have the same
number of elements, which gives the desired result here.

Linearly generated extensions; bases and dimension. For those readers
not familiar with linear algebra we now present the notions and arguments
involved as they pertain to the present situation. This can be
achieved fairly quickly.

9.16 Definition. Suppose that V ⊆ C.
(i) If V = ∅ we put K*(V) = {0}. If V ≠ ∅, we put K*(V) =
{w: for some a_1, ..., a_k ∈ K and v_1, ..., v_k ∈ V, w = a_1 v_1 +
... + a_k v_k}. We say that w depends (linearly) on V over K if
w ∈ K*(V). If V is finite, say V = {v_1, ..., v_m}, we write
K*(v_1, ..., v_m) for K*(V).
(ii) We say that V is (linearly) independent over K if for each v ∈ V,
v ∉ K*(V − {v}).

The choice K*(∅) = {0} here has a slight technical advantage. Clearly
K*(V) ⊆ K(V) but, in general, these are distinct. For example, Ra*(√2)
consists only of numbers of the form a√2 for a ∈ Ra, hence does not contain
any rational other than 0. However, as we know from 9.13, Ra(√2) =
Ra*(1, √2).
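
The difference between K*(V) and K(V) in this example can be checked by a short computation. The following is only a sketch under stated assumptions: the pair (a, b) of rationals is our own encoding of a + b√2, and the names mul and in_span_of_sqrt2 are hypothetical helpers, not notation from the text. It shows that the product of two elements of Ra*(√2) need not lie in Ra*(√2).

```python
from fractions import Fraction

# Assumed encoding: the pair (a, b) stands for a + b*sqrt(2) with a, b rational.
def mul(p, q):
    """(a + b*sqrt2) * (c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    a, b = p
    c, d = q
    return (a * c + 2 * b * d, a * d + b * c)

def in_span_of_sqrt2(p):
    """True when p has the form b*sqrt2, i.e. p lies in Ra*(sqrt2)."""
    return p[0] == 0

u = (Fraction(0), Fraction(3))      # 3*sqrt2, an element of Ra*(sqrt2)
v = (Fraction(0), Fraction(1, 2))   # (1/2)*sqrt2, also in Ra*(sqrt2)
product = mul(u, v)                 # the product is the rational number 3
```

Here (3√2)·((1/2)√2) = 3, a nonzero rational, which belongs to Ra(√2) = Ra*(1, √2) but not to Ra*(√2).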
For V ≠ ∅, K*(V) can be defined in a manner similar to K(V) via 9.2:
it is the smallest set (of elements of C) which contains V and which is
closed under addition and under multiplication by elements of K. (In
terms of vector space theory, we would call K*(V) the space generated or
spanned by V over K.) This definition leads to the following properties,
which are similar to those of K(V) and are easy to verify. Here, and
throughout the following, U, V, W, etc., are taken to be arbitrary subsets
of C.

9.17 Theorem.
(i) V ⊆ K*(V).
(ii) If U ⊆ V then K*(U) ⊆ K*(V).
(iii) K*(K*(V)) = K*(V).
(iv) K*(V) = ∪{K*(W): W ⊆ V and W is finite}.
(v) V is independent over K if and only if for each finite W ⊆ V,
W is independent over K.

In the following we shall be concerned only with the case where V is
finite. The main new features of the operation K*(V) are given in the next
three theorems.

9.18 Theorem.
(i) If w ∈ K*(V ∪ {u}) and w ∉ K*(V) then u ∈ K*(V ∪ {w}).
(ii) If U is finite then for some V ⊆ U, V is independent over K and
K*(V) = K*(U).
(iii) If V = {v_1, ..., v_m} with m ≥ 1 then V is independent over K
if and only if for each w ∈ K*(V) there are unique a_1, ..., a_m ∈ K
with w = a_1 v_1 + ... + a_m v_m.

Proof. (i) Suppose that w = a_1 v_1 + ... + a_l v_l + bu where v_1, ...,
v_l ∈ V and a_1, ..., a_l, b ∈ K. Then b ≠ 0, for otherwise w ∈ K*(V).
Hence u = (−b^{-1}a_1)v_1 + ... + (−b^{-1}a_l)v_l + b^{-1}w, so u ∈ K*(V ∪ {w}).
(ii) We have ∅ ⊆ U and ∅ is independent over K. Since U is finite,
there is a maximum number m such that for some V ⊆ U, V is independent
over K and V has m elements. Consider any such V. Then K*(V) ⊆ K*(U)
by 9.17(ii). Suppose that K*(V) ≠ K*(U). Then for some u ∈ U,
u ∉ K*(V), again by 9.17(ii). Thus also u ∉ V by 9.17(i). Now we
claim that V_1 = V ∪ {u} is independent over K. For suppose that
v ∈ V_1; we wish to show that v ∉ K*(V_1 − {v}). For v = u, this is the
same as saying u ∉ K*(V), which is so by choice of u. If v ≠ u and
v ∈ K*(V_1 − {v}), that is, v ∈ K*((V − {v}) ∪ {u}), then, since
v ∉ K*(V − {v}) by the independence of V, we have by (i) that
u ∈ K*((V − {v}) ∪ {v}) = K*(V), which is impossible. But V_1 ⊆ U
and V_1 has m + 1 elements, contrary to the choice of m. Hence we must
have K*(V) = K*(U).
The proof of (iii) is left to the student.
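
Part (ii) can be illustrated computationally in the special case K = Ra, when the elements of U are given by rational coordinate vectors. This is an assumption made only for the sketch; the text works with arbitrary subsets of C. The helper names rank and independent_subset are ours. Instead of choosing an independent subset of maximum size as in the proof, the sketch keeps each u that does not depend on the elements already kept; the vectors kept are then independent and generate the same space.

```python
from fractions import Fraction

def rank(vectors):
    """Rank over the rationals of equal-length vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_subset(U):
    """Keep each u that does not depend linearly on the vectors kept so far."""
    V = []
    for u in U:
        if rank(V + [u]) > len(V):
            V.append(u)
    return V

U = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (3, 1, 0), (0, 0, 0)]
V = independent_subset(U)
```

In this example V consists of (1, 0, 0) and (1, 1, 0); the remaining vectors of U depend on these two, so the two sets generate the same space.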

The result (i), often called the exchange condition, is one of the main
properties needed to prove (ii) above, as well as the next two important
results.

9.19 Theorem. Suppose that U, V are independent over K, and that
K*(V) ⊆ K*(U). Suppose that V is finite with exactly m elements
(m ≥ 0). Then U has at least m elements.

Proof. For the purpose of the proof we will verify a more general statement.
We prove by induction on m ≥ 0 that whenever U, V satisfy the
hypotheses of the theorem then there exists W ⊆ U where W contains
exactly m elements and where K*(U) = K*(V ∪ (U − W)). In other
words, we can exchange the elements of V for certain elements of U without
affecting K*(U). The proof of this is trivial for m = 0. Suppose that
it is true for m − 1, where m > 0, and suppose that V, U satisfy the
hypotheses of the theorem for m. Then for distinct v_1, ..., v_m we have
V = {v_1, ..., v_m}. Let V_1 = {v_1, ..., v_{m−1}} = V − {v_m}. By
hypothesis we can find W_1 ⊆ U, such that W_1 contains exactly m − 1
elements and such that K*(U) = K*(V_1 ∪ (U − W_1)). Since v_m ∈
K*(V) ⊆ K*(U), there is a finite subset U_1 of (U − W_1) with v_m ∈
K*(V_1 ∪ U_1). Among such U_1 there is one with a least number of elements.
It is impossible that U_1 = ∅, for otherwise v_m ∈ K*(V_1) =
K*(V − {v_m}), contradicting the independence of V. Choose any u ∈ U_1.
Since U_1 − {u} has fewer elements than U_1, we have

v_m ∉ K*(V_1 ∪ (U_1 − {u})).

Thus, by exchange,

u ∈ K*(V_1 ∪ (U_1 − {u}) ∪ {v_m}) = K*(V ∪ (U_1 − {u})).

Now let W = W_1 ∪ {u}. Since u ∉ W_1, W contains exactly m elements.
We claim that this is a suitable choice of W to complete the induction.
To do this, we must now show that K*(U) = K*(V ∪ (U − W)).
Since V ⊆ K*(U) and U − W ⊆ K*(U), we have V ∪ (U − W) ⊆ K*(U),
hence K*(V ∪ (U − W)) ⊆ K*(K*(U)) = K*(U). Note that we have

u ∈ K*(V ∪ (U_1 − {u})) ⊆ K*(V ∪ ((U − W_1) − {u}))
= K*(V ∪ (U − W)).

Hence

V_1 ∪ (U − W_1) = V_1 ∪ (U − W) ∪ {u} ⊆ K*(V ∪ (U − W))

and then

K*(U) = K*(V_1 ∪ (U − W_1)) ⊆ K*(K*(V ∪ (U − W)))
= K*(V ∪ (U − W)).

This completes the proof.

We now immediately obtain the following.

9.20 Theorem. Suppose that U, V are both finite and independent over K
and that K*(V) = K*(U). Then V ~ U, that is, V and U have
exactly the same number of elements.

It can be shown by transfinite methods (including use of the axiom of
choice when nondenumerable sets are considered) that 9.18(ii) and 9.20
also hold if we drop the word “finite” in their statement. Corresponding
results also hold for any operation G(V) instead of K*(V), if the notion of
independence is defined similarly to 9.16(ii) and if G(V) satisfies conditions
corresponding to 9.17(i)-(iv) and the exchange condition 9.18(i).
We give an example of this in Exercise 8 below.
By 9.18 if a set S has the form K*(U) for some finite U, where K is
given, then S has the form K*(V) for some finite V which is independent
over K. Moreover the number of elements in V depends only on S and K,
by 9.20. This allows us to introduce the following function of S and K.

9.21 Definition. Suppose that U is finite. By the dimension of K*(U)
over K we mean the unique number m of elements of some V independent
over K with K*(U) = K*(V). In symbols,

m = [K*(U): K].

Finite field extensions. We can now connect these results with field
extensions, as initially suggested.

9.22 Definition. L is said to be a finite extension of K (where K, L are
subfields of C) if for some finite set V, L = K*(V). If V is an independent
set over K with L = K*(V), we call V a (linear) basis for
L over K.

When dealing with finite extensions we can make use of the number
[L: K].

We leave to the reader the proof of the following.

9.23 Lemma. Suppose that L is a finite extension of K. Then:
(i) [L: K] ≥ 1;
(ii) K ⊆ L;
(iii) [L: K] = 1 if and only if L = K.

If L is a finite extension of K with L = K*(V) then also L = K(V).
For by (ii) above K ⊆ L. Since V ⊆ K*(V), also V ⊆ L. Hence K(V) ⊆ L.
But clearly K*(V) ⊆ K(V) so L = K(V). However, it is by no means
true that if V is finite and L = K(V) then L is a finite extension of K.
The following shows that L = Ra(π) is a counterexample for K = Ra.

9.24 Theorem. Suppose that L is a finite extension of K. Then L ⊆ Alg (K).
In fact, if [L: K] = m and z ∈ L then there is f(ξ) ∈ K[ξ] with
deg (f(ξ)) ≤ m and f(z) = 0.

Proof. The conclusion is clear for z = 0. Suppose that z ≠ 0, and let
Z = {1, z, z^2, ..., z^m}. If two of these powers coincide, say z^i = z^j
with i < j, then z is a root of ξ^j − ξ^i and we are done; so we may suppose
that Z has m + 1 distinct elements. Since
K*(Z) ⊆ L = K*(V), for suitable independent V over K with m elements,
we know by 9.19 that Z cannot be independent over K. Thus one of the
elements of Z, say z^i, must be a linear combination of the others over K:

z^i = Σ_{j=0, j≠i}^{m} a_j z^j.

If i > 0 this gives a polynomial over K of degree k, i ≤ k ≤ m, of which
z is a root. If i = 0, not all a_j can be 0, hence we get again a polynomial
of degree ≤ m over K of which z is a root.
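
The argument of this proof can be carried out mechanically whenever computations in L are available. As a hedged illustration, take K = Ra and L = Ra(√2, √3), and assume the standard fact that {1, √2, √3, √6} is a basis for L over Ra, so that m = 4. Elements of L are encoded as 4-tuples of rational coordinates; the functions mul and dependence are our own hypothetical helpers. The sketch lists 1, z, ..., z^4 for z = √2 + √3 and finds a linear dependence among them by row reduction.

```python
from fractions import Fraction

def mul(p, q):
    """Product in the assumed basis (1, sqrt2, sqrt3, sqrt6) of Ra(sqrt2, sqrt3)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,
            a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),   # sqrt3*sqrt6 = 3*sqrt2
            a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),   # sqrt2*sqrt6 = 2*sqrt3
            a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2)

def dependence(columns):
    """Coefficients c, not all zero, with sum of c[k]*columns[k] = 0, or None."""
    n_rows, n_cols = len(columns[0]), len(columns)
    M = [[Fraction(columns[k][r]) for k in range(n_cols)] for r in range(n_rows)]
    pivots = []   # pivots[r] is the pivot column of reduced row r
    row = 0
    for col in range(n_cols):
        p = next((i for i in range(row, n_rows) if M[i][col] != 0), None)
        if p is None:
            # Column col depends on the earlier pivot columns: read off the relation.
            coeffs = [Fraction(0)] * n_cols
            coeffs[col] = Fraction(1)
            for r, pc in enumerate(pivots):
                coeffs[pc] = -M[r][col]
            return coeffs
        M[row], M[p] = M[p], M[row]
        pv = M[row][col]
        M[row] = [x / pv for x in M[row]]
        for i in range(n_rows):
            if i != row and M[i][col] != 0:
                f = M[i][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[row])]
        pivots.append(col)
        row += 1
    return None   # the columns are independent

z = (0, 1, 1, 0)                 # sqrt2 + sqrt3
powers = [(1, 0, 0, 0)]          # z^0
for _ in range(4):
    powers.append(mul(powers[-1], z))
coeffs = dependence(powers)      # coeffs[k] multiplies z^k
```

The coefficients found correspond to 1 − 10z^2 + z^4 = 0, so that z = √2 + √3 is a root of ξ^4 − 10ξ^2 + 1, a polynomial of degree ≤ 4 = [L: Ra].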

As a partial converse of this theorem, we have, first, the following.

9.25 Theorem. Suppose that z is algebraic over K. Then K(z) is a finite
extension over K. In fact, if p(ξ) is the unique monic polynomial which
is prime in K[ξ] such that p(z) = 0 and if deg (p(ξ)) = n then
{1, z, ..., z^{n−1}} is a basis for K(z) over K and [K(z): K] = n.

Proof. Let Z = {1, z, ..., z^{n−1}}. By 9.13(ii), K(z) = K*(Z). Moreover,
by the same result, each w ∈ K(z) has a unique representation
w = a_0 + a_1 z + ... + a_{n−1} z^{n−1} for a_0, ..., a_{n−1} ∈ K. Hence Z is
independent over K by 9.18(iii). If z ∈ K then p(ξ) = ξ − z and n = 1,
so Z = {1} has one element in this case. If z ∉ K then n > 1 and Z
has exactly n distinct elements, hence [K*(Z): K] = n.

Iterated finite extensions. By the preceding, if z_1 is algebraic over K
and z_2 is algebraic over L = K(z_1) then M = K(z_1, z_2) is a finite extension
of L and L is a finite extension of K. To complete the link, we need the
following important result.

9.26 Theorem. Suppose that M is a finite extension of L, and L is a finite
extension of K. Then M is a finite extension of K and [M: K] =
[M: L] · [L: K].

Proof. Let M = L*(u_1, ..., u_m) where {u_1, ..., u_m} is independent
over L, and L = K*(v_1, ..., v_n) where {v_1, ..., v_n} is independent over
K. Thus [M: L] = m and [L: K] = n.
If w ∈ M we can find b_1, ..., b_m ∈ L with

(1) w = Σ_{i=1}^{m} b_i u_i.

Moreover, for each b_i, i = 1, ..., m, we can find a_{i,1}, ..., a_{i,n} ∈ K with

(2) b_i = Σ_{j=1}^{n} a_{i,j} v_j.

Hence

(3) w = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} u_i v_j.

This shows that the set W of all numbers u_i v_j for i = 1, ..., m, j =
1, ..., n generates M over K, M = K*(W). Thus M is a finite extension
of K. If we show that the elements u_i v_j are distinct for the distinct pairs
(i, j) and that W is independent over K then [M: K] = mn. The following
argument proves both of these. Suppose that a_{i,j}, a'_{i,j} ∈ K, and

(4) Σ_{i=1}^{m} Σ_{j=1}^{n} a_{i,j} u_i v_j = Σ_{i=1}^{m} Σ_{j=1}^{n} a'_{i,j} u_i v_j.

For each i = 1, ..., m, let

(5) b_i = Σ_{j=1}^{n} a_{i,j} v_j and b'_i = Σ_{j=1}^{n} a'_{i,j} v_j.

Then by (4),

(6) Σ_{i=1}^{m} b_i u_i = Σ_{i=1}^{m} b'_i u_i.

Hence by 9.18(iii), b_i = b'_i for each i = 1, ..., m, that is,

(7) Σ_{j=1}^{n} a_{i,j} v_j = Σ_{j=1}^{n} a'_{i,j} v_j for each i = 1, ..., m.

Thus, again by 9.18(iii),

(8) a_{i,j} = a'_{i,j} for each i = 1, ..., m and j = 1, ..., n.

Note that if M is a finite extension of K and M ⊇ L ⊇ K, then M is a
finite extension of L; for whenever M = K*(V) also M = L*(V). By
this observation we get the following direct consequences of 9.26, the first
of which is a sharpening of 9.24 and the second a generalization of 9.25.

9.27 Corollary. If L is a finite extension of K and z ∈ L then

[K(z): K] | [L: K].

9.28 Corollary. Given K, z_1, ..., z_q, let K_1 = K, K_{i+1} = K_i(z_i)
for i = 1, ..., q, and L = K_{q+1} = K(z_1, ..., z_q). If each z_i is
algebraic over K_i then L is a finite extension of K. We have [L: K] =
n_1 ··· n_q where n_i = [K_i(z_i): K_i].

This completes the considerations which we entered into at the beginning
of this section. Given an iterated algebraic extension L = K(z_1, ..., z_q),
the number [L: K] is independent of how, or in what order, z_1, ..., z_q are
chosen.
As an example of the computations involved in a specific application
of 9.28, let us find the dimension of the root field L of ξ^3 − 2 over Ra, i.e.,
of the number [L: Ra] where L = Ra(∛2, ζ∛2, ζ^2∛2), and

ζ = cos (2π/3) + i sin (2π/3).

First we observe that L = Ra(∛2, ζ) = Ra(∛2)(ζ) = Ra(ζ)(∛2). The
monic polynomials irreducible over Ra of which ∛2 and ζ are roots are
ξ^3 − 2 and ξ^2 + ξ + 1, respectively. Thus [Ra(∛2): Ra] = 3,
[Ra(ζ): Ra] = 2. However, to find [L: Ra] for example as

[Ra(∛2)(ζ): Ra(∛2)] · [Ra(∛2): Ra],

we must find the monic polynomial irreducible over Ra(∛2) of which
ζ is a root. But this polynomial is still ξ^2 + ξ + 1, for neither of its
nonreal roots ζ, ζ^2 is an element of the field Ra(∛2), which contains only
real numbers. Hence [L: Ra] = 2·3 = 6. A slightly different argument
establishes the same result if we write

[L: Ra] = [Ra(ζ)(∛2): Ra(ζ)] · [Ra(ζ): Ra].

To exhibit a basis for L over Ra we use 9.25 and the proof of 9.26. A basis
for Ra(∛2) over Ra is {1, ∛2, (∛2)^2} and one for Ra(∛2)(ζ) over
Ra(∛2) is {1, ζ}. Hence {1, ∛2, ∛4, ζ, ζ∛2, ζ∛4} is a basis for L
over Ra.
Corollary 9.28 can also be used to give a new and, on the basis of the
techniques developed in this chapter, simpler proof of Theorem 8.38,
namely that whenever K is a subfield of C then Alg (K) is an algebraically
closed subfield of C. For suppose that z_1, z_2 ∈ Alg (K). Then also z_2 is
algebraic over K(z_1). Hence, by the preceding, K(z_1, z_2) is a finite
extension of K. But then every element of K(z_1, z_2) is algebraic over K
by 9.24. In particular, this is true of z_1 + z_2, z_1 − z_2, z_1 · z_2 and, in
case z_2 ≠ 0, z_1/z_2. To prove algebraic closure, suppose that
z ∈ Alg (Alg (K)), i.e. for some w_0, ..., w_m ∈ Alg (K) with w_m ≠ 0 we have

w_0 + w_1 z + ... + w_m z^m = 0.

Then z is algebraic over K(w_0, ..., w_m) and hence

L = K(w_0, ..., w_m, z)

is a finite extension of K by 9.28. But z ∈ L, so z is algebraic over K
by 9.24.

Exercise Group 9.2

1. Prove Theorem 9.17(iii), (v).
2. Prove Theorem 9.18(iii).
3. Suppose that K*(U) ⊆ K*(W) where W, but not necessarily U, is finite.
Show that there exists finite V ⊆ U with K*(U) = K*(V).
4. Prove Lemma 9.23.
5. Show that if [L: K] = 2, p(ξ) is prime in K[ξ], z ∈ L, and p(z) = 0,
then every root of p(ξ) belongs to L. Show further that if deg (p(ξ)) > 1
then L = K(z). Is either (or both) of these conclusions true if we assume
instead that [L: K] = 3?
6. Let L be the root field of ξ^4 − 1 over Ra, i.e., the field generated by the
fourth roots of unity over Ra. Compute [L: Ra]. Do the same for the root
field of ξ^6 − 1.
7. (i) Is Ra(l + </l) = Ra(\/2H
(ii) Is Ra(√2 + √3) = Ra(√2, √3)?
8. Let K+(V) = Alg (K(V)) for any K, V. We say that w depends
algebraically on V over K if w ∈ K+(V). Show that 9.17(i)-(iv) and 9.18(i)
continue to hold true if we replace K*(V) by K+(V).

9.3 Applications to geometric construction problems. Basic geometric
notions. To conclude our work, we wish now to show how the above
results can be applied in deciding certain questions about geometric
constructions with a ruler (straightedge) and compass in the Euclidean plane.
We cannot go into a full account of this material. To do so we would first
have to give, among other things, a set of axioms for Euclidean plane
geometry. In modern terms these axioms would be given by a set of
conditions on a mathematical system of objects, which are called “points,”
“lines,” and “circles,” and between which there are certain relations:
relations of incidence, e.g., “point P lies on line ℒ” and “point P lies on
circle 𝒞,” of betweenness, e.g., “point P lies between points Q_1, Q_2 on a
line joining Q_1, Q_2,” and finally of equidistance, e.g., “the distance of
point P_1 from P_2 is the same as that of Q_1 from Q_2.” (This is just an
indication of what basic notions might be taken, and to accomplish the same
purposes various alternatives are possible.) Any triple of distinct points
(P_1, P_2, P_3) can be regarded as determining an angle with vertex P_2. In
terms of the above discussion we can say that the angle determined by
(P_1, P_2, P_3) is the same as that determined by (Q_1, Q_2, Q_3) if we can find
P'_1 between P_1, P_2 and P'_3 between P_2, P_3, and similarly for Q'_1, Q'_3,
such that in the triples (P'_1, P_2, P'_3) and (Q'_1, Q_2, Q'_3) all corresponding
distances are the same. Thus we can speak of bisecting an angle, i.e., of
dividing the angle into two equal angles, and similarly of trisecting an angle,
etc., just as with the notion of equidistance we can speak of bisecting,
trisecting, etc., a line segment.
Note that the notion of distance, considered as a real-valued function
of pairs of points, or of size of angle, is not available to us under this
conception of geometry. We are limited to dealing with equidistance
and equiangularity. Constructions assuming the use of rulers marked to
any desired degree of accuracy, and of angle measures (protractors), go
beyond what is to be considered here.

The realization in the cartesian plane. It is known that, under suitable
choice of the basic notions and axioms for geometry, any two systems of
objects satisfying these axioms are isomorphic. Moreover, at least one
such system can be given by means of cartesian geometry. By a point
we understand here an element (x, y) of Re × Re (whose coordinates we
call x, y). By a line we understand the set ℒ of all points (x, y) satisfying

(9:3-1) ax + by = c

for some fixed a, b, c ∈ Re with a ≠ 0 or b ≠ 0. By a circle we understand
the set 𝒞 of all points satisfying

(9:3-2) (x − a)^2 + (y − b)^2 = c^2

for some fixed point (a, b) and fixed c ∈ Re with c > 0. To say that
(x, y) lies on the line ℒ determined by (9:3-1) we mean simply that (x, y)
satisfies (9:3-1). Similarly we define what is meant by: (x, y) lies on the
circle 𝒞. It is seen that if ℒ_1, ℒ_2 are any two lines then ℒ_1 = ℒ_2 or ℒ_1 is
disjoint from ℒ_2 (in other words, ℒ_1 is parallel to ℒ_2) or ℒ_1 ∩ ℒ_2 =
{(x, y)} for a unique point (x, y). In the last case we say that ℒ_1, ℒ_2
meet or intersect at (x, y). Similarly we can speak of the points of
intersection, if any, of a line and a circle or of two circles.
Algebraically, such points of intersection can be determined as follows.
To find the intersection of ℒ_1 = {(x, y): a_1 x + b_1 y = c_1} and ℒ_2 =
{(x, y): a_2 x + b_2 y = c_2}, it is first seen that ℒ_1 = ℒ_2 if and only if there
is some h ≠ 0 with a_1 = h a_2, b_1 = h b_2 and c_1 = h c_2, and that ℒ_1 ∩
ℒ_2 = ∅ if and only if there is some h ≠ 0 with a_1 = h a_2, b_1 = h b_2,
and c_1 ≠ h c_2. Then if ℒ_1 ≠ ℒ_2 and ℒ_1 ∩ ℒ_2 ≠ ∅, it can be seen that
there is a unique solution (x, y) of

(9:3-3) a_1 x + b_1 y = c_1,
        a_2 x + b_2 y = c_2.

This is given by (cf. 6.18ff)

(9:3-4) x = (c_1 b_2 − c_2 b_1)/(a_1 b_2 − a_2 b_1),
        y = (a_1 c_2 − a_2 c_1)/(a_1 b_2 − a_2 b_1).

(The hypotheses ensure that a_1/a_2 ≠ b_1/b_2, hence a_1 b_2 − a_2 b_1 ≠ 0.)
Hence the coordinates x, y of the point of intersection of two lines can be found
by rational operations from the coefficients of the equations determining those
lines.
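
The formulas for the intersection of two lines translate directly into a computation that uses rational operations only. The following sketch assumes that each line is given as a triple (a, b, c) of rationals standing for ax + by = c; the function name intersect_lines is ours.

```python
from fractions import Fraction

def intersect_lines(l1, l2):
    """Each line is (a, b, c) for a*x + b*y = c.

    Returns the unique common point, or None when a1*b2 - a2*b1 = 0,
    i.e. when the lines are parallel or identical.
    """
    a1, b1, c1 = map(Fraction, l1)
    a2, b2, c2 = map(Fraction, l2)
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)

point = intersect_lines((1, 1, 3), (1, -1, 1))   # x + y = 3 and x - y = 1
```

For these two lines the point found is (2, 1). Since only +, −, · and division occur, the coordinates stay in whatever field contains the coefficients.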
Similarly to find the intersections of a line ℒ = {(x, y): a_1 x + b_1 y = c_1}
and a circle 𝒞 = {(x, y): (x − a_2)^2 + (y − b_2)^2 = c_2^2}, we seek to solve
simultaneously

(9:3-5) a_1 x + b_1 y = c_1,
        (x − a_2)^2 + (y − b_2)^2 = c_2^2.

For example, if b_1 ≠ 0, we write y = −(a_1/b_1)x + c_1/b_1 = dx + e, and
solve the quadratic equation (x − a_2)^2 + (dx + e − b_2)^2 = c_2^2. There
are at most two real roots of this equation, and hence at most two points
in the intersection. By the quadratic formula we thus see that the
coordinates x, y of each of the points of intersection (if any) of a line and a
circle can be expressed using rational operations and the operation of forming
the square root of a nonnegative real number, from the coefficients of the
equations determining the line and circle.
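
As a minimal sketch of this computation, the following function substitutes y = dx + e into the equation of the circle and applies the quadratic formula. It assumes b_1 ≠ 0, as in the case treated above, and works with floating-point numbers, so the results are approximations; the name intersect_line_circle and the encoding of the line and circle as triples are ours.

```python
import math

def intersect_line_circle(line, circle):
    """line = (a1, b1, c1) for a1*x + b1*y = c1, with b1 != 0 assumed;
    circle = (a2, b2, c2) for (x - a2)^2 + (y - b2)^2 = c2^2."""
    a1, b1, c1 = line
    a2, b2, c2 = circle
    d, e = -a1 / b1, c1 / b1                 # the line written as y = d*x + e
    # Coefficients of A*x^2 + B*x + C = 0 after substituting y = d*x + e.
    A = 1 + d * d
    B = -2 * a2 + 2 * d * (e - b2)
    C = a2 * a2 + (e - b2) ** 2 - c2 * c2
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                            # no real intersection
    root = math.sqrt(disc)                   # the only square root needed
    xs = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    return [(x, d * x + e) for x in xs]

points = intersect_line_circle((1, 1, 5), (0, 0, 5))   # x + y = 5, x^2 + y^2 = 25
```

For this line and circle the two points found are (0, 5) and (5, 0); a line that misses the circle gives the empty list.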

Finally, to find the intersections of two circles, we must solve
simultaneously equations of the form

(9:3-6) (x − a_1)^2 + (y − b_1)^2 = c_1^2,
        (x − a_2)^2 + (y − b_2)^2 = c_2^2.

By subtracting one equation from the other, this problem can be reduced
to one of solving simultaneously either one of these equations with a linear
equation, namely

(2a_2 − 2a_1)x + (2b_2 − 2b_1)y = (c_1^2 − c_2^2) + (a_2^2 − a_1^2) + (b_2^2 − b_1^2).

If the circles have the same center they will be disjoint or identical.
Otherwise (a_1, b_1) ≠ (a_2, b_2) and we see again that there are at most two
solutions for the intersection of two circles, any one of which can be expressed
by rational operations and real square roots from the coefficients of the
equations of the given circles.
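
The reduction just described can be sketched as follows. Each circle is assumed to be given as a triple (a, b, c) of center coordinates and radius; the names radical_line and intersect_circles are ours, and the parametrization of the line used to handle vertical lines as well is a choice made for this sketch, not the method of the text. The arithmetic is floating-point, so results are approximate.

```python
import math

def radical_line(c1, c2):
    """Linear equation p*x + q*y = s obtained by subtracting the circle equations."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    return (2 * a2 - 2 * a1,
            2 * b2 - 2 * b1,
            (r1 * r1 - r2 * r2) + (a2 * a2 - a1 * a1) + (b2 * b2 - b1 * b1))

def intersect_circles(c1, c2):
    p, q, s = radical_line(c1, c2)
    if p == 0 and q == 0:
        return []            # same center: the circles are disjoint or identical
    a1, b1, r1 = c1
    n = p * p + q * q
    x0, y0 = p * s / n, q * s / n          # one point of the line p*x + q*y = s
    fx, fy = x0 - a1, y0 - b1
    # Points of the line are (x0 - t*q, y0 + t*p); substitute into circle 1.
    A = n
    B = 2 * (-q * fx + p * fy)
    C = fx * fx + fy * fy - r1 * r1
    disc = B * B - 4 * A * C
    if disc < 0:
        return []
    root = math.sqrt(disc)
    ts = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    return [(x0 - t * q, y0 + t * p) for t in ts]

pts = intersect_circles((0, 0, 5), (6, 0, 5))
```

For circles of radius 5 about (0, 0) and (6, 0) the linear equation is 12x = 36, and the two points found are (3, −4) and (3, 4) up to rounding.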

Any ruler and compass construction starts from a finite set of initial
data, which consist of specific points, lines, and circles. The construction
proceeds in a finite number of steps to produce new points, lines, and
circles. These can be found only by means of one of the following operations:

(9:3-7) (a) Given distinct points P_1, P_2 we can construct the unique
line ℒ on which these lie.
(b) Given a point P_1 and distinct points Q_1, Q_2, we can construct
the circle 𝒞 with center P_1, any point P_2 of which has the
same distance from P_1 as Q_2 has from Q_1.
(c) Given distinct lines ℒ_1, ℒ_2, we can construct the point of
intersection, if there is any.
(d) Given a line ℒ and a circle 𝒞, we can construct the point or
points of intersection, if there are any.
(e) Given distinct circles 𝒞_1, 𝒞_2, we can construct the point or
points of intersection, if there are any.

If we regard the construction as being carried out in the cartesian plane,
the original data are said to be given if we have the coordinates x_1, y_1 of each
initial point (x_1, y_1), coefficients a, b, c for an equation (9:3-1) of each
initial line, and the coordinates a, b of the center and the radius c giving the
equation (9:3-2) of each initial circle. We proceed to find such numbers or
coefficients for every point, line, and circle obtained in the construction by
using only the rational operations and the formation of real square roots of
numbers already obtained. Indeed, we have already seen how this is to be
done for the constructions (9:3-7c, d and e). For the construction (9:3-7a)
we are given x_1, y_1, x_2, y_2 with (x_1, y_1) ≠ (x_2, y_2) and we want an
equation of the line through (x_1, y_1) and (x_2, y_2). As is well known, this
equation is x = x_1, for the case that x_2 = x_1, and is (y − y_1)/(x − x_1) =
(y_2 − y_1)/(x_2 − x_1) for the case that x_2 ≠ x_1. In either case this can
be brought to the form ax + by = c, where a, b, c are obtained by rational
operations on x_1, y_1, x_2, y_2. For (9:3-7b) we are given a, b, x_1, y_1, x_2, y_2
and the desired equation is (x − a)^2 + (y − b)^2 = c^2 where

c = √((x_2 − x_1)^2 + (y_2 − y_1)^2).

Thus again we have used only rational operations and real square roots.
In particular, we draw the following consequence.
In particular, we draw the following consequence.

(9:3-8) Suppose that the numbers or coefficients for the initial points,
lines, and circles of a construction lie in a certain set A of real
numbers. Let K = Gen (A). Suppose that a point (x, y) is
constructed by ruler and compass from the original data. Then
there exist subfields K_1 ⊆ K_2 ⊆ ... ⊆ K_{q+1} = L of Re such
that (i) K_1 = K, (ii) for each j = 1, ..., q, K_{j+1} = K_j(√u_j)
for some u_j ∈ K_j with u_j > 0, and (iii) x, y ∈ L.

Conversely, in a sense, we claim that

(9:3-9) if x, y ∈ L for such an extension L of a field K of real numbers,
then the point (x, y) can be constructed by ruler and compass
from a finite set of points with coordinates in K.

For if we can show that each point (x, 0) for x ∈ L and each point (0, y)
for y ∈ L can be so constructed, we can find (x, y) as the intersection of
the vertical through (x, 0) and the horizontal through (0, y). We shall
indicate why this works for the points (x, 0) with x ∈ L, the proof for the
points (0, y) being similar. It is sufficient to show that for each x ∈ K
this is so, which is trivial, and that the set of x with (x, 0) so constructible
is closed under +, −, ·, and ^{-1}, as well as the operation of taking square
roots of positive numbers. For +, − this is standard. For x_1 · x_2, we can
restrict ourselves to the case x_1 > 0, x_2 > 0. Then we use the following
figure:

Here O = (0, 0), P_1 = (x_1, 0), ℒ_1 is an arbitrary line through O distinct
from the x-axis, Q_1 has the same distance from O as (1, 0) from O, and Q_2
has the same distance from Q_1 as (x_2, 0) from O. Then line ℒ_2 is drawn
parallel to the line through Q_1 P_1 by the usual construction, and P_2 is
the intersection of ℒ_2 with the x-axis. By proportions we have 1/x_1 =
x_2/w, that is, w = x_1 x_2. By interchanging x_2, w here we can get x_1/x_2.
(In particular, by these constructions for +, ·, ^{-1}, we can get all (x, 0)
with x rational, as we already pointed out in Section 7.1.) To obtain
√x for x > 0, we use the following figure.

Here P is the point (x, 0), A is a point whose distance from P is the same
as that of (1, 0) from (0, 0), Q is the point of bisection of OA, 𝒞 is the
circle with center Q through A, and BP is drawn perpendicular to OA
through P. Then OBA is a right triangle, and it is seen that triangles
OPB and BPA are similar. Hence corresponding proportions are equal,
x/w = w/1, so that w^2 = x and w = √x. Then by constructing B' on
the positive x-axis with distance to O the same as that of B to P, we obtain
(√x, 0).
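
The square root construction can be checked numerically in coordinates. The sketch below assumes A = (x + 1, 0), so that P lies between O and A, and computes the height w of the point B of the circle lying above P; the function name sqrt_by_semicircle is ours.

```python
import math

def sqrt_by_semicircle(x):
    """Height w of the point B above P = (x, 0) on the circle with diameter OA,
    where O = (0, 0) and A = (x + 1, 0) is assumed."""
    radius = (x + 1) / 2                     # Q = (radius, 0) bisects OA
    # B = (x, w) lies on the circle: (x - radius)^2 + w^2 = radius^2.
    return math.sqrt(radius ** 2 - (x - radius) ** 2)

w = sqrt_by_semicircle(2.0)
```

Since radius^2 − (x − radius)^2 = x(x + 1) − x^2 = x, the height w satisfies w^2 = x, in agreement with the proportion x/w = w/1.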
The algebraic equivalent of constructibility.
Since we have not given a completely explicit description of geometry,
the foregoing can only be taken as a sketch of what to expect on the basis
of such a description. However, we believe the reader should accept the
following nongeometric definition as providing an adequate equivalent,
on the basis of the foregoing, of the notion of construction by ruler and
compass in the cartesian plane.

9.29 Definition. Suppose that A ⊆ Re and x, y ∈ Re.
We say that (x, y) is constructible from A if there is a finite sequence
K_1, K_2, ..., K_{q+1} of subfields of Re and a sequence of real numbers
u_j, j = 1, ..., q, such that
(i) K_1 = Gen (A),
(ii) u_j ∈ K_j and u_j > 0 for j = 1, ..., q,
(iii) K_{j+1} = K_j(√u_j) for j = 1, ..., q,
and
(iv) x, y ∈ K_{q+1}.
We say that (x, y) is constructible if it can be constructed from the
empty set.

By the identification of complex numbers x + iy with points (x, y) of
the plane, this definition suggests the following formal generalization.

9.30 Definition. Suppose that B ⊆ C and w ∈ C. We say that w is
C-constructible from B (or constructible from B in the generalized
sense) if there is a finite sequence K_1, K_2, ..., K_{q+1} of subfields of
C and a sequence of complex numbers z_j, j = 1, ..., q, such that
(i) K_1 = Gen (B),
(ii) z_j^2 ∈ K_j for j = 1, ..., q,
(iii) K_{j+1} = K_j(z_j) for j = 1, ..., q,
(iv) w ∈ K_{q+1}.
We say w is C-constructible if it is C-constructible from the empty set.

Now it can be argued on geometrical grounds, as for (9:3-8) and (9:3-9),
that this definition is really not more general than the preceding one, in
the sense of the following theorem. We give here a direct nongeometrical
proof.

9.31 Theorem. Suppose that B ⊆ C and that w = x + iy, where x, y
are real. Let A consist of all real numbers a, b with a + ib ∈ B.
Then w is C-constructible from B if and only if (x, y) is constructible
from A.

Proof. If (x, y) is constructible from A we find a sequence of real fields
K_1, ..., K_{q+1} and of real numbers u_1, ..., u_q satisfying 9.29(i)-(iv).
Thus K_1 = Gen (A). Let z_0 = i, z_j = √u_j for j = 1, ..., q. Then
put K'_0 = Gen (B) and K'_{j+1} = K'_j(z_j) for j = 0, ..., q. It is seen that
K'_0, K'_1, ..., K'_{q+1} is a sequence of subfields satisfying 9.30(i)-(iii) with
K_j ⊆ K'_j for j = 1, ..., q + 1. Hence x, y, i ∈ K'_{q+1}, so w ∈ K'_{q+1} and w is
C-constructible from B.
Conversely, suppose that w is C-constructible from B, and that
K_1, ..., K_{q+1} is a sequence of subfields of C and z_1, ..., z_q a sequence of
elements of C satisfying 9.30. For each j = 1, ..., q, let

(1) z_j^2 = s_j + i t_j.

Then by 8.9,

(2) z_j or −z_j is one of the numbers

(1/√2)(√(√(s_j^2 + t_j^2) + s_j) ± i √(√(s_j^2 + t_j^2) − s_j)).

Now let

(3) L_1 = Gen (A), L_2 = L_1(√2) and, for j = 1, ..., q,

L_{3j} = L_{3(j−1)+2}(√(s_j^2 + t_j^2)),

L_{3j+1} = L_{3j}(√(√(s_j^2 + t_j^2) + s_j)),

L_{3j+2} = L_{3j+1}(√(√(s_j^2 + t_j^2) − s_j)).

We can regard each K_l for l = 1, ..., q + 1 as Gen (B ∪ {z_1, ..., z_{l−1}}).
Consider any number u + iv in B ∪ {z_1, ..., z_{l−1}}; then we see from
(1)-(3) and the definition of A that both u, v ∈ L_{3(l−1)+2}. Hence it is
also seen, say by 9.3(iv), that

(4) if u + iv is in

K_l = Gen (B ∪ {z_1, ..., z_{l−1}})

then u, v ∈ L_{3(l−1)+2}, for l = 1, ..., q + 1.

In particular, for each j = 1, ..., q, since z_j^2 ∈ K_j, we have s_j, t_j ∈
L_{3(j−1)+2} and hence

(5) s_j^2 + t_j^2 ∈ L_{3(j−1)+2}.

But then also

(6) for each j = 1, ..., q,

√(s_j^2 + t_j^2) + s_j ∈ L_{3j} and √(s_j^2 + t_j^2) − s_j ∈ L_{3j+1}.

Since always

s_j^2 + t_j^2 ≥ 0, √(s_j^2 + t_j^2) + s_j ≥ 0, √(s_j^2 + t_j^2) − s_j ≥ 0,

this shows that the sequence of subfields L_1, L_2, ..., L_{3q+2} has the form
L_1 = Gen (A), L_{k+1} = L_k(√r_k) for k = 1, ..., 3q + 1, with r_k ∈ L_k and
r_k ≥ 0. Since we can delete all repetitions in this sequence with r_k = 0,
we thus have a sequence of subfields of Re satisfying the conditions
(i)-(iii) of 9.29. But by (4), since w = x + iy is in K_{q+1}, we have x,
y ∈ L_{3q+2} and hence (x, y) is constructible from A.
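
The step of this proof which replaces a complex square root by real square roots can be illustrated numerically. The following is a sketch only: the function name complex_sqrt is ours, the sign of the imaginary part is chosen to agree with the sign of t, and floating-point arithmetic makes the results approximate in general.

```python
import math

def complex_sqrt(s, t):
    """A square root of s + i*t obtained from real square roots only."""
    modulus = math.sqrt(s * s + t * t)               # sqrt(s^2 + t^2)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    re = math.sqrt(max(modulus + s, 0.0) / 2)
    im = math.sqrt(max(modulus - s, 0.0) / 2)
    if t < 0:
        im = -im                                     # so that 2*re*im has the sign of t
    return complex(re, im)

z = complex_sqrt(3.0, 4.0)    # a square root of 3 + 4i
```

For s = 3, t = 4 we get √(s^2 + t^2) = 5 and z = 2 + i, and indeed (2 + i)^2 = 3 + 4i. Three real square roots are taken, corresponding to the three fields adjoined for each j in (3).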
From the algebraic point of view it is often more convenient to deal with
C-constructibility of numbers x + iy than with constructibility of “points”
(x, y). We can now derive a simple algebraic criterion for C-constructibility,
which is the main tool in settling the classical construction problems.
Note that there is no loss of generality in starting here with an arbitrary
subfield K of C, since C-constructibility from a set B is the same as that
from Gen (B).

9.32 Theorem. Suppose that K is a subfield of C and w ∈ C. If w is
C-constructible from K then w is algebraic over K and [K(w): K] = 2^n
for some n ≥ 0.

Proof. Under the hypothesis, we can find a sequence of subfields
K_1, ..., K_{q+1} of C and a sequence of complex numbers z_1, ..., z_q
satisfying the conditions of 9.30, with K_1 = K. Then for each j = 1, ..., q
if we put d_j = z_j^2, z_j is a root of the monic polynomial ξ^2 − d_j ∈ K_j[ξ].
If this polynomial is prime in K_j[ξ] then [K_{j+1}: K_j] = [K_j(z_j): K_j] = 2.
Otherwise, ξ^2 − d_j splits into linear factors, one of which must be ξ − z_j;
in this case z_j ∈ K_j and [K_{j+1}: K_j] = 1. Hence by 9.28, K_{q+1} is a finite
extension of K with [K_{q+1}: K] = 2^m for some m ≥ 0. But w ∈ K_{q+1},
so by 9.24 and 9.27, w is algebraic over K and [K(w): K] | 2^m. This gives
the desired conclusion.
Under certain additional conditions on the nature of K(z), the converse
to 9.32 is also true. However, the proof of this involves some finer
considerations which we shall not go into here. (A special case of this is
treated in the exercises.) Nevertheless, we can use 9.32 with 9.31 to show
that various geometric configurations cannot be obtained by ruler and compass
constructions.

Some classical construction problems. The problem of the duplication
of the cube is to construct, if possible,
from a given line segment P_1P_2, regarded as the edge of a given cube,
a new line segment Q_1Q_2 which is to be the edge of a cube with double the
volume of the original one. In particular, if such a construction can be
carried out, it should be possible to construct a point (x, 0) satisfying
x^3 = 2 from the original data given by the pair of points (0, 0), (1, 0);
that is, ∛2 should be C-constructible from Ra. But the monic polynomial
p(ξ) ∈ Ra[ξ] which is irreducible over Ra and has ∛2 as root is just
p(ξ) = ξ^3 − 2. Hence [Ra(∛2): Ra] = 3. Thus ∛2 is not C-constructible
from Ra by 9.32. Hence, the duplication of the cube cannot be
carried out by a ruler and compass construction.
The initial data in the problem of the trisection of an angle consist of
two intersecting lines or three distinct points $P_1$, $P_2$, $P_3$. Suppose we had
a method for trisecting any angle by ruler and compass. Then we should
be able to trisect any acute angle $\theta$ in which the initial data are given by
$P_1 = (1, 0)$, $P_2 = (0, 0)$, and a point $P_3 = (x, y)$ on the unit circle;
thus $x = \cos\theta$, $y = \sin\theta$.

Suppose that the given angle has radian measure $\theta$. The supposed
construction will end with the following figure.

Figure 9.4

Thus, if the construction is possible, we can also construct the point
$(\cos(\theta/3), 0)$ by dropping the perpendicular from P3 to the x-axis. Here
the original data consist of rational numbers and $\cos\theta$ (since the value
of $\sin\theta = \sqrt{1 - \cos^2\theta}$ can be constructed from that of $\cos\theta$). Let
$w = \cos(\theta/3)$. We know by (8:1-3) that $w$ is a root of the polynomial
$4\xi^3 - 3\xi - \cos\theta$, that is, of $f(\xi) = \xi^3 - \frac{3}{4}\xi - (\cos\theta)/4$. Let
$K = \mathrm{Ra}(\cos\theta)$. If $f(\xi)$ has no roots in $K$ then $f(\xi)$ will be irreducible over $K$
and we will have $[K(w):K] = 3$. Otherwise $[K(w):K]$ will be 1 or 2.
In the former of these two cases $w \in K$, and hence $w$ is constructible
from $K$. In the second case, $w$ is a root of a quadratic polynomial
irreducible over $K$; since the roots of such can be obtained by adjoining a square
root to $K$, we see again that $w$ is constructible from $K$. Thus we see that,
with the original data as given in Fig. 9.3, the angle with radian measure $\theta$
can be trisected by ruler and compass if and only if $f(\xi) = \xi^3 - \frac{3}{4}\xi - (\cos\theta)/4$
has a root in $\mathrm{Ra}(\cos\theta)$. Now one can give many examples of angles $\theta$ for
which $f(\xi)$ has no root in $\mathrm{Ra}(\cos\theta)$. For example, for $\theta = \pi/3$ (60°) we
have $\cos\theta = \frac{1}{2}$ and $\mathrm{Ra}(\cos\theta) = \mathrm{Ra}$. We know by 6.19 that every
rational root $b/c$, with $\gcd(b, c) = 1$, of $f(\xi)$, and hence of $8\xi^3 - 6\xi - 1$,
must be such that $b \mid 1$ and $c \mid 8$. The only possible candidates are $\pm 1$,
$\pm\frac{1}{2}$, $\pm\frac{1}{4}$, and $\pm\frac{1}{8}$. It is a routine matter to verify that none of these is
a root of $f(\xi)$. (Cf. Exercise 2 for a more concise argument.) Hence,
the trisection of an angle cannot in general be carried out by ruler and compass
constructions.
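
The routine verifications in the two problems above can be mechanized. The following is only a sketch in Python, not part of the original development; the helper names `divisors` and `rational_roots` are our own. It lists the candidates b/c allowed by the rational-root criterion and tests each one in exact arithmetic. Since a cubic with no rational root has no linear factor over Ra, an empty answer confirms irreducibility.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of an integer polynomial.

    coeffs lists the coefficients from the leading one down to the constant
    term; both ends are assumed nonzero. Candidates b/c have b dividing the
    constant term and c dividing the leading coefficient.
    """
    lead, const = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * b, c)
                  for b in divisors(const)
                  for c in divisors(lead)
                  for sign in (1, -1)}
    roots = []
    for r in sorted(candidates):
        value = Fraction(0)
        for a in coeffs:          # Horner evaluation in exact arithmetic
            value = value * r + a
        if value == 0:
            roots.append(r)
    return roots

cube = rational_roots([1, 0, 0, -2])      # xi^3 - 2 (duplication of the cube)
trisect = rational_roots([8, 0, -6, -1])  # 8 xi^3 - 6 xi - 1 (trisection of pi/3)
print(cube, trisect)
```

Both lists come out empty, in agreement with the conclusions reached above.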

Regular polygons; Gauss’ solution.


We conclude this group of problems with a discussion of the question:
for what values of $n \in P$, $n \ge 3$, can we construct a regular n-sided polygon?
First of all, it is easily seen that if it is possible to do this, then we can locate
the center of the circumscribed circle. We can then construct from this
a regular n-sided polygon with circumscribed circle having radius of length
1. From this we can construct such a polygon whose circumscribed circle
is the unit circle, with center the origin, and with (1, 0) as one vertex.

But then by 9.32, the primitive nth root of unity $\zeta_n = \cos 2\pi/n + i \sin 2\pi/n$
will be C-constructible. Let $q_n(\xi)$ be the unique monic polynomial
prime in $\mathrm{Ra}[\xi]$ of which $\zeta_n$ is a root. Thus we are led to finding
this polynomial and its degree, in order to compute $[\mathrm{Ra}(\zeta_n):\mathrm{Ra}] = \varphi(n)$.
It can be shown that $\varphi(n)$ is the number of integers m with $1 \le m \le n$
and (m, n) = 1, that is, m relatively prime to n ($\varphi$ is often called Euler’s
function). Furthermore, a general computation of $\varphi(n)$ is available in
terms of the prime power representation of n:

(9:3-10) if $n = 2^{m_0} p_1^{m_1} \cdots p_j^{m_j}$ ($j$ possibly 0) where $p_1, \ldots, p_j$ are
distinct odd primes and each $m_i > 0$ for $i = 1, \ldots, j$, then

$\varphi(n) = 2^{m_0 - 1} p_1^{m_1 - 1}(p_1 - 1) \cdots p_j^{m_j - 1}(p_j - 1)$.

(We exclude $2^{m_0 - 1}$ here if $m_0 = 0$.) Thus $[\mathrm{Ra}(\zeta_n):\mathrm{Ra}]$ is a power of 2
if and only if n has the form $n = 2^{m_0} p_1 \cdots p_j$, where $p_1, \ldots, p_j$ are
distinct odd primes, for each of which $p_i - 1$ is a power of 2. This leads us
to consider those prime numbers p which can be represented in the form
$p = 2^l + 1$. A further consideration shows that $l$ must itself be a power
of 2 in this case, and $p = 2^{2^k} + 1$. These numbers are called the Fermat
primes. For k = 0, 1, 2, 3, 4 we obtain as values of $2^{2^k} + 1$ the numbers

3, 5, 17, 257, 65537,

each of which, as it turns out, is prime. However, it is known that $2^{2^5} + 1$
is no longer prime. (It is not known whether or not there are infinitely
many Fermat primes.) Thus, if n is not of the form

$2^m p_1 \cdots p_l$ ($m \ge 0$, $l \ge 0$)



with distinct Fermat primes $p_1, \ldots, p_l$, then a regular n-sided polygon cannot be
constructed by ruler and compass. In particular, for n = 7, 9, 11, 13, 14,
18, 19, . . . we cannot thus construct a regular n-sided polygon. On the
other hand, it has been known since the time of the Greek geometers how
to construct regular 3-sided (triangles), 5-sided (pentagons), and 15-sided
polygons, and hence polygons having $2^m \cdot 3$, $2^m \cdot 5$, and $2^m \cdot 3 \cdot 5$ sides,
for any m. In particular, we can construct regular n-sided polygons for
n = 3, 4, 5, 6, 8, 10, 12, 15, 16. What was unsuspected until the time of
Gauss was that a regular 17-sided polygon can also be constructed by ruler
and compass. In fact, Gauss showed that the above is a complete
description; if $n = 2^m p_1 \cdots p_l$ ($m \ge 0$, $l \ge 0$) with distinct Fermat primes $p_1, \ldots, p_l$,
then a regular n-sided polygon can be constructed by ruler and compass.
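
The criterion just stated is easy to tabulate for small n. The following Python sketch is an illustration only, with function names of our own choosing. It computes Euler's function directly from its definition, lists those n from 3 to 20 for which it is a power of 2, and tests the first six numbers of the form $2^{2^k} + 1$ for primality by trial division.

```python
from math import gcd

def phi(n):
    """Euler's function: the number of m with 1 <= m <= n and gcd(m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

def is_power_of_two(k):
    """True when k = 2^m for some m >= 0."""
    return k > 0 and k & (k - 1) == 0

def is_prime(n):
    """Primality by trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Values of n for which [Ra(zeta_n) : Ra] = phi(n) is a power of 2.
constructible = [n for n in range(3, 21) if is_power_of_two(phi(n))]
print(constructible)

# The numbers 2^(2^k) + 1 for k = 0, ..., 5 and whether each is prime.
fermat = [2 ** (2 ** k) + 1 for k in range(6)]
print([is_prime(f) for f in fermat])
```

The first list omits exactly 7, 9, 11, 13, 14, 18, and 19 in this range, and the last of the six numbers fails the primality test because it is divisible by 641.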
A full exposition of these results would require a fair amount of
additional work beyond the material of this book. Thus we shall content
ourselves with an indication of how these results can be obtained for the
special case where n is just a prime number, which we now denote instead
by p. Further details for this are suggested in the Exercise Group 9.3.
Thus given a prime number $p \ge 3$, and $\zeta = \cos 2\pi/p + i \sin 2\pi/p$,
the first task is to find the unique monic polynomial $q(\xi)$ which is prime
in $\mathrm{Ra}[\xi]$ and which has $\zeta$ as a root. We know that $\zeta$ is a root of
$\xi^{p-1} + \xi^{p-2} + \cdots + \xi + 1$. It can be proved that this is the desired polynomial
(cf. Exercise 5 below). Hence $[\mathrm{Ra}(\zeta):\mathrm{Ra}] = p - 1$ and $p - 1$ is,
indeed, the number of integers m with $1 \le m \le p$ and (m, p) = 1. Then
if we prove that every prime p of the form $p = 2^l + 1$ must be a Fermat
prime, i.e., of the form $p = 2^{2^k} + 1$ (cf. Exercise 6 below), we reach the
first special conclusion (by 9.31 and 9.32) that if p is prime and a regular
p-sided polygon can be constructed by ruler and compass, then p is a
Fermat prime.
To prove the converse, we use the following result, established in
Exercise 8 below. Suppose that K is a subfield of C, $z_1$ is a root of a
polynomial $q(\xi) \in K[\xi]$, of which the other roots are $z_2, \ldots, z_n$. Suppose that
$K(z_i) = K(z_j)$ for every i, j and that $[K(z_1):K] = 2^m$ for some $m \ge 0$.
Then $z_1$ is C-constructible from K. (This is a partial converse of 9.32.)
In particular, for the polynomial $q(\xi) = \xi^{p-1} + \xi^{p-2} + \cdots + \xi + 1$,
the roots are $\zeta, \zeta^2, \ldots, \zeta^{p-1}$. We know (by Exercise 12 of Exercise
Group 8.1) that $\mathrm{Ra}(\zeta) = \mathrm{Ra}(\zeta^i)$ for any $i = 1, \ldots, p - 1$. Hence we
can apply the preceding result directly to obtain: if p is a Fermat prime
then a regular p-sided polygon can be constructed by ruler and compass.
Given a Fermat prime we can, in principle, analyze the foregoing proofs
to obtain an actual construction of a regular p-sided polygon. In practice
this becomes somewhat involved. Gauss developed a systematic
procedure for obtaining the required constructions. Even this becomes quite
involved for larger values of p, but it was used successfully to give an
explicit construction for the first interesting case, p = 17. The much simpler
case, p = 5, can be handled as follows. Here $\zeta = \cos 2\pi/5 + i \sin 2\pi/5$.
From $1 = \zeta^5$ we obtain $\zeta^{-1} = \zeta^4$ and $\zeta^{-2} = \zeta^3$. Hence

$0 = \zeta^4 + \zeta^3 + \zeta^2 + \zeta + 1 = (\zeta^{-2} + \zeta^2) + (\zeta^{-1} + \zeta) + 1$.

If we set $\omega = \zeta^{-1} + \zeta$ we have $\omega^2 = \zeta^{-2} + 2 + \zeta^2$, hence $\omega^2 + \omega - 1 = 0$.
The roots of this equation are $(-1 \pm \sqrt{5})/2$. Since $\omega = 2\cos 2\pi/5$
(from $\zeta^{-1} = \cos 2\pi/5 - i \sin 2\pi/5$) we can use this to construct the
point $(\cos 2\pi/5, 0)$ and then the point $(\cos 2\pi/5, \sin 2\pi/5)$ corresponding
to $\zeta$.
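
The computation for p = 5 can be checked numerically. The sketch below is an illustration in floating-point arithmetic, with variable names of our own. It forms the fifth root of unity and the sum of it with its inverse, confirms the quadratic equation up to rounding, and recovers the vertex from the positive root of that equation by one further square root.

```python
import cmath
import math

zeta = cmath.exp(2j * math.pi / 5)        # primitive fifth root of unity
omega = zeta + 1 / zeta                   # omega = zeta + zeta^(-1)

cos72 = (-1 + math.sqrt(5)) / 4           # omega / 2, taking the positive root
sin72 = math.sqrt(1 - cos72 ** 2)         # one further square root

residual = abs(omega ** 2 + omega - 1)    # should vanish up to rounding
distance = abs(zeta - complex(cos72, sin72))
print(residual, distance)
```

Both printed quantities are zero up to rounding error, so only rational operations and two square roots are needed to reach the vertex.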
The notion of C-constructibility from a field K is a special case of the
notion of constructibility from a field K by means of the rational operations
and arbitrary nth roots. To be precise, we would say that a complex number
w is so constructible if there exists a sequence of subfields $K_1, K_2, \ldots, K_{q+1}$
of C, a sequence of complex numbers $z_1, \ldots, z_q$, and a sequence of positive
integers $n_1, \ldots, n_q$, satisfying the following conditions:

(9:3-11) (i) $K_1 = K$,
(ii) $z_j^{n_j} \in K_j$ for $j = 1, \ldots, q$,
(iii) $K_{j+1} = K_j(z_j)$ for $j = 1, \ldots, q$,
(iv) $w \in K_{q+1}$.

Then we would say that an equation $f(w) = 0$ is solvable by radicals over
K, where $f(\xi) \in K[\xi]$, if there is at least one such w constructible in this
way. As with 9.32, if there is such a solution w then K(w) satisfies certain
restrictive conditions which are not met by arbitrary simple algebraic
extensions. However, these conditions cannot be expressed solely in
terms of the dimension $[K(w):K]$, but involve rather a detailed
description of the possible subfields of K(w) and their interrelationships. This
description, which is treated in Galois theory, leads to the results about
solvability and nonsolvability by radicals described at the end of the
preceding chapter.

Exercise Group 9.3

1. The problem of squaring a given circle is one of constructing the side of a
square with area equal to the area of the circle. Why cannot this be done
by ruler and compass?
2. Let $p(\xi) = 4\xi^3 - 3\xi + \frac{1}{2}$. Show that $2p(\xi/2) = \xi^3 - 3\xi + 1$. Use
this equation to get a quick proof that $p(\xi)$ is irreducible over Ra. (Cf. the
proof that the angle $\pi/3$ cannot be trisected by ruler and compass.)
3. Let $\theta = 2\pi/7$. Show that $2\cos\theta$ is a root of $\xi^3 + \xi^2 - 2\xi - 1$. [Hint:
First show that $\cos 4\theta = \cos 3\theta$ and then use the equations for finding
$\cos 4\theta$ and $\cos 3\theta$ in terms of $\cos\theta$.] Use this to get a direct proof that a
regular 7-sided polygon cannot be constructed by ruler and compass.

4. Prove the following result (Eisenstein’s theorem). Suppose that p is a
prime number. If $f(\xi) = a_n\xi^n + a_{n-1}\xi^{n-1} + \cdots + a_0$, where $n > 0$
and each $a_i \in I$, and if $a_n \not\equiv 0 \pmod{p}$, each $a_i \equiv 0 \pmod{p}$ for
$i = 0, \ldots, n - 1$, but $a_0 \not\equiv 0 \pmod{p^2}$, then $f(\xi)$ is prime in $\mathrm{Ra}[\xi]$.
[Method: Consider $f(\xi) = (b_m\xi^m + \cdots + b_0)(c_k\xi^k + \cdots + c_0)$, where
all $b_i, c_i \in I$. Then one but not both of $b_0$, $c_0$ is divisible by p; suppose that
$b_0 \not\equiv 0 \pmod{p}$, $c_0 \equiv 0 \pmod{p}$. Also $c_k \not\equiv 0 \pmod{p}$. Consider the
unique $i > 0$ with $c_i \not\equiv 0 \pmod{p}$ but each of $c_0, \ldots, c_{i-1} \equiv 0 \pmod{p}$,
and examine $a_i \pmod{p}$. Show that i must be n.]
5. Suppose that p is prime and $q(\xi) = \xi^{p-1} + \xi^{p-2} + \cdots + \xi + 1$. Show
that $q(\xi)$ is irreducible over Ra by applying Eisenstein’s theorem to
$q(\xi + 1)$. [Note: $q(\xi) = (\xi^p - 1)/(\xi - 1)$.]
6. (i) Show that if $l = mn$ where m is odd then $2^n + 1$ divides $2^l + 1$.
(ii) Show that if $p = 2^l + 1$ is prime, then $l = 2^k$ for some k.
7. Suppose that K is a subfield of C, $z_1$ is algebraic over K but $z_1 \notin K$,
and that $q(\xi)$ is the unique monic polynomial irreducible over K of which
$z_1$ is a root. Write $q(\xi) = (\xi - z_1) \cdots (\xi - z_n)$. Suppose further that
$K(z_1) = K(z_j)$ for each j. Let $w = \prod_{1 \le i < j \le n} (z_i - z_j)$, so that $w^2$ is
the discriminant of $q(\xi)$. Show that

(a) $w^2 \in K$, (b) $w \notin K$, (c) $[K(w):K] = 2$.

To prove (b), suppose that $w \in K$ and apply 9.14.

8. Let K, $z_1$ and $q(\xi)$ satisfy the hypotheses of Exercise 7. Suppose further
that $[K(z_1):K] = 2^m$ for some $m > 0$. Show that $z_1$ is C-constructible
from K.
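
The hypotheses needed in Exercise 5 can be examined for particular primes before a general proof is attempted. The following Python sketch is offered only as an aid, with function names of our own. It uses the expansion of $((\xi + 1)^p - 1)/\xi$ by the binomial theorem, which gives the coefficient of $\xi^i$ in $q(\xi + 1)$ as the binomial coefficient C(p, i + 1), and then tests the three divisibility conditions of Exercise 4.

```python
from math import comb

def shifted_cyclotomic(p):
    """Coefficients a_0, ..., a_{p-1} of q(xi + 1), where
    q(xi) = (xi^p - 1)/(xi - 1); expanding ((xi + 1)^p - 1)/xi
    gives a_i = C(p, i + 1)."""
    return [comb(p, i + 1) for i in range(p)]

def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Exercise 4 for a_0, ..., a_n and the prime p."""
    *lower, lead = coeffs
    return (lead % p != 0                        # a_n not divisible by p
            and all(a % p == 0 for a in lower)   # a_0, ..., a_{n-1} divisible by p
            and coeffs[0] % (p * p) != 0)        # a_0 not divisible by p^2

for p in (3, 5, 7, 11, 13, 17):
    coeffs = shifted_cyclotomic(p)
    print(p, coeffs, eisenstein_applies(coeffs, p))
```

A check of this kind is evidence for particular p only; the exercise asks for an argument valid for every prime.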

9.4 Conclusion. Even if the successive creation and elucidation of the
various number systems discussed in this book were the result only of an
aesthetic desire to see the matter “whole,” the outcome must be regarded
as an impressive intellectual achievement. However, what is more
remarkable is that it appears to be essential to use the full resources of this
development to analyze problems which, in their statement, are of a
completely elementary nature. We have already seen an indication of this
in the preceding discussion of the geometric constructibility problems.
Even the construction of the regular 17-sided polygon involves various
nontrivial ideas from the algebra of complex numbers.
There are also many questions about the positive integers which, it
appears, could not have been answered without a substantial development
of the advanced parts of this subject, in particular of the structure of
algebraic number fields and certain subdomains of these fields, and of a
good deal of complex analysis. The two branches of mathematics using
these techniques are called, respectively, algebraic number theory and
analytic number theory.

One example in the first of these fields is in connection with what is
called Fermat’s “last theorem,” according to which there are no $x, y, z, n \in P$
with $n > 2$ and $x^n + y^n = z^n$. Fermat wrote in the margin of one of his
books that he knew a proof of this statement. No one to this date has
succeeded in providing any such proof, so that it is probable, considering
the efforts involved, that Fermat was in error about his method. At any
rate, there is a substantial class of integers $n > 2$, for which it has been
shown that $x^n + y^n \ne z^n$ for every $x, y, z \in P$ and n in this class. The
main results here were first found by Kummer, who made use of the
theory of ideals of algebraic integers (in finite extensions of the rationals)
developed by Dedekind and Kronecker.
The prime number theorem is an example of a result obtained by heavy
use of complex analysis. According to this theorem, if $\pi(n)$ is the number
of primes p with $1 < p \le n$, then $\pi(n)$ is asymptotic to $n/\log n$, that is,

$\lim_{n \to \infty} \frac{\pi(n)}{n/\log n} = 1$

(where the logarithm is taken to the base e). This was first conjectured
by Gauss, but it was not finally proved until the end of the 19th century,
concluding with the work of Hadamard and de la Vallée Poussin.
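
The slow approach of this ratio to 1 can be observed directly. The Python sketch below is an illustration and not a proof; the function name is our own. It counts primes with the sieve of Eratosthenes and prints the ratio for a few powers of 10.

```python
import math

def prime_counts(limit):
    """Return a list pi with pi[n] = number of primes p <= n, for 0 <= n <= limit.

    Uses the sieve of Eratosthenes; limit is assumed to be at least 1.
    """
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for d in range(2, math.isqrt(limit) + 1):
        if is_prime[d]:
            for multiple in range(d * d, limit + 1, d):
                is_prime[multiple] = False
    pi, count = [], 0
    for flag in is_prime:
        count += flag          # True counts as 1
        pi.append(count)
    return pi

pi = prime_counts(10 ** 6)
for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    ratio = pi[n] / (n / math.log(n))
    print(n, pi[n], round(ratio, 4))
```

In this range the ratio stays above 1 and decreases only slowly, which gives some feeling for why numerical evidence alone could not settle the conjecture.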
In recent years there has been some success in obtaining elementary,
i.e., nonanalytic, proofs of several such results. This has been done by
Erdős and Selberg for the prime number theorem by some rather difficult
arguments. However, the methods of analytic number theory are still
among the most powerful and penetrating for the solution of various
number-theoretical problems whose statements are quite elementary.
Thus the extensions of the basic number systems provide much more
than a merely formally satisfying edifice. The search for a satisfactory
solution of various elementary problems necessarily led to their successive
development, with rewards that could hardly have been expected initially.
As with Euclid’s systematization of the geometry of his time, the account
presented here of this development followed the discovery of most of the
significant results which it comprehends. This work of our century is part
of the new view which has been reached of mathematics as an integrated
whole. As with Euclid’s geometry, the reader should take it not as an
end but as a new beginning—in this case leading him into the study of the
surprising interrelationships between the various branches of mathematics
which have been uncovered under this modern view.
APPENDIXES
APPENDIX I

SOME AXIOMS FOR SET THEORY

As we saw in Section 2.1, unrestricted use of the concept of “arbitrary”
set can lead to contradictions, e.g., Russell’s paradox. The purpose of
axiomatic set theory is to make explicit various statements about sets
that would be acceptable and would not lead to such contradictions.
These statements should also be strong enough to allow us to provide the
foundations for all mathematical notions and constructions, in particular,
at least those used in this book. We already indicated at various points
in Chapter 2 one such set of principles which, together with the
assumption 3.2 (that there exists at least one Peano system), appears to satisfy
all these conditions. The purpose of this Appendix is to bring these various
principles together, in slightly modified form, so that they may be examined
more readily.
The first adequate proposal for a system of axioms for set theory was
put forth by Zermelo in 1908. His system is sufficient to provide the basis
for all the work in this book as well as all classical algebra, analysis, and
geometry. Stronger systems are needed to account for the mathematical
theory of sets, cardinals, and ordinals due to Cantor, and for modern
mathematical developments which rest on this theory. Various such
systems have been proposed by Fraenkel, von Neumann, Bernays, Gödel,
and Quine. We content ourselves here with a description of a system of
essentially the same strength as Zermelo’s.
To begin with, we may have in mind that there are two kinds of entities
under discussion: first of all, certain objects called individuals and,
second, certain objects called sets, which we conceive of as being
successively built up from the individuals, i.e., sets of individuals, sets of
sets of individuals, etc. For example, we may conceive of the positive
integers as being a basic collection of individuals, all other mathematical
objects such as the integers, rational numbers, real numbers, complex
numbers, being constructed from these by use of the set concept. However,
it turns out that once we have the empty set and basic principles of
existence of sets, we can prove the existence of a set P, an element 1, and an
operation Sc on P, such that (P, Sc, 1) is a Peano system. Thus the
assumption of the existence of individuals is superfluous, and we assume here that
all objects under purview are sets. We think of all variables x, y, z, A, B,
C, M, S, etc., as ranging over sets. Thus, for example, “for all x, $\mathcal{A}(x)$”
is taken to have the same meaning as “for all sets x, $\mathcal{A}(x)$,” and “there
exists an S such that $\mathcal{B}(S)$” is taken to have the same meaning as “there
exists a set S such that $\mathcal{B}(S)$.”

The basic notions of the theory are two relations between sets, that of
identity or equality, $=$, and membership, $\in$. As usual we denote nonequality
by $\ne$ and nonmembership by $\notin$. The basic axioms concerning identity and
its relationships to membership are the following.

Axiom 1. For any x, $x = x$.

Axiom 2. For any x, y, if $x = y$ then $y = x$.

Axiom 3. For any x, y, z, if $x = y$ and $y = z$ then $x = z$.

Axiom 4. For any x, y, if $x = y$ then for all z, $x \in z$ if and only if $y \in z$,
and for all w, $w \in x$ if and only if $w \in y$.

Axiom 5. For any A, B, if for all x, $x \in A$ if and only if $x \in B$, then
$A = B$.

The first four of these treat the logical aspects of identity, for which
corresponding statements would hold in any mathematical context. Axiom 5,
however, is specifically set-theoretical. It is called the axiom of
extensionality and shows that a set is completely determined by its members
[cf. (2:1-23)].
We wish now to give some axioms of set-existence. To begin with, we
have an axiom guaranteeing the existence of at least one set. (In most
treatments of logic this would be logically derivable.) The simplest such
set is the empty set.

Axiom 6. There exists an A such that for all x, $x \notin A$.

By extensionality any two such sets must be identical. Hence we can
introduce the following object (2:1-33):

Definition 1. $\emptyset$ is the unique A such that for all x, $x \notin A$.

In general, the notion “the unique x such that $\mathcal{A}(x)$” can be introduced
whenever $\mathcal{A}(x)$ is a condition for which we have proved that (i) there
exists an x such that $\mathcal{A}(x)$, and (ii) for any x, y if $\mathcal{A}(x)$ and $\mathcal{A}(y)$ then $x = y$.
Next we have an axiom allowing the construction of unordered pairs.

Axiom 7. For any a, b there exists an A such that for all x, $x \in A$ if and only
if $x = a$ or $x = b$.

Again by extensionality, we have a unique such set, given a, b.

Definition 2. For any a, b, $\{a, b\}$ is the unique A satisfying the condition
of Axiom 7. We put $\{a\} = \{a, a\}$ for any a.

As observed in Exercise 1, Exercise Group 2.3, it is not necessary to
introduce ordered pair as an undefined notion.

Definition 3. For any a, b, we take $(a, b) = \{\{a\}, \{a, b\}\}$.

Then, as in the exercise, we can prove the statement of (2:3-5), that if
$(a, b) = (c, d)$ then $a = c$ and $b = d$. As before, we can define $(a, b, c)$
as $((a, b), c)$, and so on for $(a, b, c, d)$, etc.
To prove the existence of sets $\{a, b, c\}$, $\{a, b, c, d\}$, etc., we could
introduce further special axioms like Axiom 7. However, this is not
necessary if we can form unions.
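
The characteristic property of the ordered pair of Definition 3 can be tested on small examples. The following Python sketch is an illustration only. It models sets by Python's immutable `frozenset` type, which is an assumption of the sketch rather than part of the axiomatic development, and the function names are our own. It checks over a few sample sets that two ordered pairs are equal exactly when their first terms agree and their second terms agree.

```python
from itertools import product

EMPTY = frozenset()

def pair(a, b):
    """Unordered pair {a, b} of Axiom 7; pair(a, a) is the singleton {a}."""
    return frozenset({a, b})

def ordered(a, b):
    """Definition 3: (a, b) = {{a}, {a, b}}."""
    return pair(pair(a, a), pair(a, b))

# A few small sets to range over (0 denotes the empty set here):
# 0, {0}, {{0}}, {0, {0}}.
one = pair(EMPTY, EMPTY)
samples = [EMPTY, one, pair(one, one), pair(EMPTY, one)]

# (a, b) = (c, d) should hold exactly when a = c and b = d.
ok = all((a == c and b == d) == (ordered(a, b) == ordered(c, d))
         for a, b, c, d in product(samples, repeat=4))
print(ok)
```

Such a finite check is no substitute for the proof requested in the exercise, but it quickly exposes a wrongly stated property.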

Axiom 8. For any M there exists an S such that for all x, $x \in S$ if and only
if for some $X \in M$ we have $x \in X$.

Again we have unicity.

Definition 4. For any M, $\bigcup X[X \in M]$ is the unique S satisfying the
condition of Axiom 8. In particular, we take

$A \cup B = \bigcup X[X \in \{A, B\}]$.

Then we can define successively $\{a, b, c\} = \{a, b\} \cup \{c\}$, $\{a, b, c, d\} =
\{a, b, c\} \cup \{d\}$, etc. It would appear that we should also provide an axiom
guaranteeing the existence of intersections and relative complements;
however, we shall see that these are derivable from the further axioms.

Definition 5. For any A, B we put $B \subseteq A$ if for all x, $x \in B$ implies
$x \in A$.

The next axiom is that for the set of all subsets of A, often called the
power-set of A.

Axiom 9. For any A there exists an S such that for all X, $X \in S$ if and only
if $X \subseteq A$.

Definition 6. For any A we take $\mathcal{P}(A)$ to be the unique S satisfying the
condition of Axiom 9.

The main principle of set existence is that of restricted set formation or
abstraction (2:1-39). By a condition $\mathcal{A}(x)$ here we mean specifically any
condition formulated entirely in terms of $=$ and $\in$ and built up using the
logical connectives, “not,” “and,” “or,” “if . . . then . . . ,” “if and only if,”
“for all y,” “for some y” (or other variables), where all variables range
over sets. In the next axiom $\mathcal{A}(x)$ is any condition which does not contain
A as a free variable, although it may contain other free variables.

Axiom 10. For any S there exists an A such that for all x, $x \in A$ if and only
if $x \in S$ and $\mathcal{A}(x)$.

Since we have unicity again, we can introduce the following by
extensionality for each such condition $\mathcal{A}(x)$.

Definition 7. For any S, $\{x : x \in S \text{ and } \mathcal{A}(x)\}$ is the unique set A
satisfying the condition of Axiom 10.

We shall now sketch how one can prove the existence of intersection,
difference, and cartesian product from these axioms. First note that if
we can define $\bigcap X[X \in M]$ and if $M \ne \emptyset$, then for any $S \in M$ we have
$\bigcap X[X \in M] \subseteq S$. This suggests the following.

Definition 8. If $M = \emptyset$ we put $\bigcap X[X \in M] = \emptyset$. If $M \ne \emptyset$ and
$S \in M$, we take $\bigcap X[X \in M] = \{x : x \in S$ and for all
$X \in M$, $x \in X\}$. In particular we take

$A \cap B = \bigcap X[X \in \{A, B\}]$.

It is easily seen that this definition of $\bigcap X[X \in M]$ is independent of the
choice of $S \in M$ when $M \ne \emptyset$. However, we need to use at least one such
S in order to apply Axiom 10.

Definition 9. For any S and A we take $A^{(S)} = S - A = \{x : x \in S$ and
$x \notin A\}$.

Again this is possible by Axiom 10.


To prove the existence of $A \times B$ (2:3-9) under the present definition
(3) of ordered pair we note that for any $a \in A$, $b \in B$ we have $\{a\} \in
\mathcal{P}(A \cup B)$ and $\{a, b\} \in \mathcal{P}(A \cup B)$, hence $\{\{a\}, \{a, b\}\} \in \mathcal{P}(\mathcal{P}(A \cup B))$.
This leads to:

Definition 10. For any A, B we take $A \times B = \{x : x \in \mathcal{P}(\mathcal{P}(A \cup B))$
and for some $a \in A$ and $b \in B$, $x = (a, b)\}$.

Thus here we are using Axioms 7-9 and Axiom 10 with $S = \mathcal{P}(\mathcal{P}(A \cup B))$.
It can then be proved that for any $a \in A$ and $b \in B$ we have $(a, b) \in A \times B$.
Also note how the existence of domain and range (2:3-15), (2:3-16) can be
realized. Each element a of $\mathcal{D}(W)$ is a member of some $\{a, b\}$, which is in
turn a member of the element $(a, b)$ of W. In other words, if we put
$W_1 = \bigcup X[X \in W]$ and $S = \bigcup Y[Y \in W_1]$, we should have $\mathcal{D}(W) \subseteq S$,
and similarly $\mathcal{R}(W) \subseteq S$. In fact, if we take $\mathcal{D}(W) = \{x : x \in \bigcup Y[Y \in
\bigcup X[X \in W]]$ and for some y, $(x, y) \in W\}$, we are guaranteed the
existence of $\mathcal{D}(W)$ by Axioms 7, 8, and 10, and we can then prove that
$x \in \mathcal{D}(W)$ if and only if for some y, $(x, y) \in W$. We proceed similarly for
the range. We can now introduce the various basic concepts concerning
relations and functions as in Section 2.3.
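
For finite sets the constructions of Definition 10 and of the domain can be carried out literally. The following Python sketch is an illustration under stated assumptions: sets are modeled by `frozenset`, the strings "a", "b", "c" stand in for arbitrary distinct objects, and all function names are our own. The product is obtained by separating the ordered pairs out of the double power set, and the domain by separating first terms out of the double union.

```python
from itertools import chain, combinations

def pair(a, b):
    """Unordered pair {a, b}; pair(a, a) is the singleton {a}."""
    return frozenset({a, b})

def ordered(a, b):
    """Ordered pair (a, b) = {{a}, {a, b}}."""
    return pair(pair(a, a), pair(a, b))

def power_set(s):
    """All subsets of the finite set s (Axiom 9)."""
    items = list(s)
    return frozenset(frozenset(c)
                     for r in range(len(items) + 1)
                     for c in combinations(items, r))

def union(m):
    """Union of the members of m (Axiom 8)."""
    return frozenset(chain.from_iterable(m))

def cartesian(a, b):
    """Definition 10: separate the ordered pairs out of P(P(A u B))."""
    big = power_set(power_set(a | b))
    return frozenset(x for x in big
                     if any(x == ordered(s, t) for s in a for t in b))

def domain(w):
    """Separate the first terms out of the double union of W.

    Any y with (x, y) in W already lies in the double union, so it is
    enough to search for y there.
    """
    candidates = union(union(w))
    return frozenset(x for x in candidates
                     if any(ordered(x, y) in w for y in candidates))

A = frozenset({"a", "b"})
B = frozenset({"c"})
AxB = cartesian(A, B)
print(len(AxB), domain(AxB) == A)
```

The double power set grows very quickly (already 256 members for a three-element union), which is why the sketch is confined to tiny examples. The axioms assert existence only and are not meant as an efficient procedure.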
If we have the existence of at least one set S, then we can derive the
existence of an empty set from Axiom 10, simply as $\{x : x \in S \text{ and } x \ne x\}$.
If there were no restriction to S in Axiom 10 we could also derive directly
the set existence Axioms 7-9. However, as we know by Russell’s paradox,
an unrestricted version of Axiom 10 leads to contradictions and all
statements can be trivially derived from it. It does not appear that Axioms
7-9 can be derived from the restricted version of Axiom 10 given here,
and hence they had to be stated separately.
We cannot yet prove that there exists an infinite set, although we can
prove a series of statements guaranteeing that there are infinitely many
distinct sets. What we have in mind is the series of statements that any
two of $\emptyset$, $\{\emptyset\}$, $\{\{\emptyset\}\}$, $\{\{\{\emptyset\}\}\}$, ... are distinct. For note first that for any
x, $\{x\} \ne \emptyset$, since $x \in \{x\}$. Note also that if $\{x\} = \{y\}$ then $x = y$,
for by Definition 2, $x \in \{x\}$, hence if $\{x\} = \{y\}$ then $x \in \{y\}$ by Axiom 4,
and then $x = y$ by Definition 2 again. Thus $\{\{\emptyset\}\} \ne \{\emptyset\}$, for
otherwise by the preceding, $\{\emptyset\} = \emptyset$. Also by similar arguments, $\{\{\{\emptyset\}\}\} \ne \emptyset$,
$\{\{\{\emptyset\}\}\} \ne \{\emptyset\}$, $\{\{\{\emptyset\}\}\} \ne \{\{\emptyset\}\}$, etc. The existence of a set which
contains all these sets cannot be proved from the preceding axioms. We
introduce it as a new assumption, often called the axiom of infinity.

Axiom 11. There exists a set S such that
(i) $\emptyset \in S$, and
(ii) whenever $x \in S$ then $\{x\} \in S$.

Definition 11. Let S be any set satisfying the conditions (i), (ii) of Axiom
11. Let $M = \{X : X \in \mathcal{P}(S)$ and $\emptyset \in X$ and whenever
$x \in X$ then $\{x\} \in X\}$. Let $P = \bigcap X[X \in M]$.

It can be seen that P is independent of the choice of S. We can think
of P as the smallest set satisfying Axiom 11 (i), (ii).

Theorem. If we define $\mathrm{Sc}(x) = \{x\}$ for all $x \in P$, we have:
(i) $\emptyset \in P$;
(ii) if $x \in P$ then $\mathrm{Sc}(x) \in P$;
(iii) for all $x \in P$, $\mathrm{Sc}(x) \ne \emptyset$;
(iv) for all $x, y \in P$, if $\mathrm{Sc}(x) = \mathrm{Sc}(y)$ then $x = y$;
(v) if $X \subseteq P$ and $\emptyset \in X$ and for any x, $x \in X$ implies
$\mathrm{Sc}(x) \in X$, then $X = P$.

In other words, $(P, \mathrm{Sc}, \emptyset)$ is a Peano system.



Another usual way of introducing a Peano system in set theory is by
taking, instead, $\mathrm{Sc}(x) = x \cup \{x\}$ and assuming the existence of a set S
which contains $\emptyset$ and which contains $\mathrm{Sc}(x)$ with each x. The smallest
such set consists of $\emptyset$, $\{\emptyset\}$, $\{\emptyset, \{\emptyset\}\}$, $\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$, ... or, if we denote
these respectively by 0, 1, 2, 3, . . . , of 0, 1 = {0}, 2 = {0, 1}, 3 =
{0, 1, 2}, ... . This has certain advantages over the preceding definition
for other developments.
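
The two successor operations can be compared on their first few values. The Python sketch below is an illustration, with sets again modeled by `frozenset` and function names of our own. It builds the nth element in each system and shows that in the first system every element after the empty set has exactly one member, whereas in the second the nth element has n members, namely its predecessors.

```python
EMPTY = frozenset()

def zermelo(n):
    """n-fold application of Sc(x) = {x} to the empty set."""
    x = EMPTY
    for _ in range(n):
        x = frozenset({x})
    return x

def von_neumann(n):
    """n-fold application of Sc(x) = x u {x} to the empty set."""
    x = EMPTY
    for _ in range(n):
        x = x | frozenset({x})
    return x

for n in range(5):
    print(n, len(zermelo(n)), len(von_neumann(n)))

# In the second system each element is the set of its predecessors.
print(von_neumann(3) == frozenset(von_neumann(k) for k in range(3)))
```

One of the advantages alluded to is visible here: in the second system the ordering of the numbers coincides with membership.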
Of the basic set-theoretical principles mentioned in Chapter 2, there
now remains only one that we have not stated here, the axiom of choice.
As given in (2:2-16), this is as follows.

Axiom 12. Suppose that M is such that for all $X \in M$, $X \ne \emptyset$, and for
all $X, Y \in M$, if $X \ne Y$ then $X \cap Y = \emptyset$. Then there exists
an A such that for each $X \in M$, there is a y with $A \cap X = \{y\}$.

In words: given a set M of disjoint nonempty sets, there exists a set A
which has exactly one element in common with each set of M. The
statement of this axiom is rather cumbersome. Some other statements which
are equivalent and simpler to formulate make use of the notion of
function. We recall that a function is defined to be a set F of ordered pairs
$(x, y)$ such that for each $x \in \mathcal{D}(F)$ there is a unique y with $(x, y) \in F$;
the unique y is denoted by $F(x)$.

Axiom 12'. Suppose M is such that for all $X \in M$, $X \ne \emptyset$, and for all
$X, Y \in M$, if $X \ne Y$ then $X \cap Y = \emptyset$. Then there exists
a function F with $\mathcal{D}(F) = M$ and such that for each $X \in M$
we have $F(X) \in X$.

Axiom 12''. For any M there exists a function G such that $\mathcal{D}(G) =
\{X : X \in M \text{ and } X \ne \emptyset\}$ and such that whenever $X \in \mathcal{D}(G)$
then $G(X) \in X$.

To see that Axioms 12, 12', 12'' are equivalent under the preceding axioms
(in fact under 1-10), suppose first that 12 holds. Given M satisfying the
hypothesis of 12' it also satisfies that of 12. Hence we can find a set A
satisfying the conclusion of 12. Now put $F = \{(X, y) : X \in M$ and
$y \in A \cap X\}$. The existence of F is seen from Axiom 10 since it is a
subset of $M \times A$ and we have already seen how the existence of this set
can be proved. Clearly F is the desired function to satisfy the conclusion
of Axiom 12'. Suppose that 12' is true. Consider M satisfying the
hypothesis of 12''. Its elements are not necessarily disjoint or nonempty.
Let $M_1 = \{X : X \in M \text{ and } X \ne \emptyset\}$. This exists by Axiom 10. Now we
“disjoint” the elements of $M_1$. Let $M_2 = \{X \times \{X\} : X \in M_1\}$; this
exists from $M_2 \subseteq M_1 \times \mathcal{P}(M_1)$. Then $M_2$ satisfies the hypothesis of 12'.
If we pick F satisfying the conclusion of 12' we see that for each $X \in M$
with $X \ne \emptyset$, $F(X \times \{X\}) = (y, X)$ for some $y \in X$. Let $G = \{(X, y) :
X \in M_1$ and $F(X \times \{X\}) = (y, X)\}$. This satisfies the conclusion of
12''. It is easily seen that 12'' implies 12. A number of other statements
are known to be equivalent to the axiom of choice and are more useful
in some contexts.
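
The device of “disjointing” a collection can be seen at work on a finite example, where of course no axiom of choice is needed. The following Python sketch is purely illustrative. Ordered pairs are represented by Python tuples rather than by Definition 3, the sample collection is hypothetical, and the function standing in for F simply picks the least pair, whereas Axiom 12' asserts only that some such function exists.

```python
def tag(x):
    """X x {X}: every element y of X is replaced by the pair (y, X)."""
    return frozenset((y, x) for y in x)

# A hypothetical collection M whose members overlap and include the empty set.
M = [frozenset({1, 2}), frozenset({2, 3}), frozenset(), frozenset({3})]
M1 = [x for x in M if x]                 # M_1: the nonempty members of M
M2 = [tag(x) for x in M1]                # M_2: pairwise disjoint copies

# A choice function F on M_2; here the pair with least first term, standing
# in for whatever function Axiom 12' provides.
F = {t: min(t, key=lambda p: p[0]) for t in M2}

# G(X) = y where F(X x {X}) = (y, X).
G = {x: F[tag(x)][0] for x in M1}
print(G)
```

The members {1, 2} and {2, 3} of M share the element 2, but their tagged copies share nothing, since the second term of each pair records the set from which it came.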
It is interesting to get an idea of how these axioms enter into the various
constructions of the book. Many of these involve the power set Axiom 9
and the Axiom 10 of restricted set formation. As we have seen, from these
already follows existence of cartesian product. In the proof of 4.21 of the
existence of an ordered integral domain D extending P we took as
elements of D equivalence sets of pairs (n, m) for $n, m \in P$ with respect
to a certain equivalence relation in $P \times P$. Thus $D \subseteq \mathcal{P}(P \times P)$. In the
proof of 5.7 of the existence of a simple transcendental extension $E = D[\xi]$
of any domain D, we took, first, the set of all essentially finite sequences
$(a_0, \ldots, a_n, \ldots)$ of elements of D. Such a sequence is just a function F
with $\mathcal{D}(F) = \{n : n \in I \text{ and } n \ge 0\}$ and $\mathcal{R}(F) \subseteq D$, hence $F \subseteq I \times D$
and $F \in \mathcal{P}(I \times D)$. Thus the set constructed is a subset of $\mathcal{P}(I \times D)$.
Again, as with the construction of I, in constructing a field of quotients
from an integral domain D (5.11), we use a subset of $\mathcal{P}(D \times D)$. It can
be seen, however, that in none of these cases do we use the full force of
the power-set axiom, for it can be shown that in each case we do not actually
increase the cardinalities dealt with (for D infinite). We know by 7.63
that $\mathcal{P}(A)$ always has greater cardinality than A. The only case where
we use this axiom in its full force is in the construction of Re, either via
Dedekind sections (certain subsets of Ra) or via fundamental sequences.
As we saw in 7.66, $\mathcal{P}(P)$ is set-theoretically equivalent to a subset of Re;
in fact, it can be shown that $\mathcal{P}(P) \sim \mathrm{Re}$. The difference from the
construction of $D[\xi]$ is that the use of infinite, but essentially finite, sequences
$(a_0, \ldots, a_n, \ldots)$ is a simplifying technical device; we could equally well
have used the set of all finite sequences $(a_0, \ldots, a_n)$ of any length n. As we
saw in 7.69, if D is denumerable, so also is this set; in general, if D is
infinite, the two sets are set-theoretically equivalent.
The Axiom 8 of unions also enters in a very essential way in the
verification of the basic property of the real numbers, the continuity of its
ordering. This is best seen from the proof of 7.5, according to which if
(S, <) is a densely ordered system without first or last element then the
system (U(S), <) of upper Dedekind sections (described there) is a
continuously ordered system. The main step (9) in the proof was to show
that if A is any upper section of U(S) then U(S) − A has a largest
element Z; we took $Z = \bigcup X[X \in A]$. (A related consideration, but one
more difficult to isolate logically, is in the proof of 7.31 of the existence
of a continuously ordered field; there the main step is (17), to show that
any fundamental sequence in the class Fd*(Ra) of equivalence sets of
fundamental sequences from Ra has a limit in Fd*(Ra).)
The axiom of choice has been used only rarely in this book, and in all
cases where it has been applied its use could have been avoided. Consider,
for example, the proof of 7.49, that if $F$ is continuous on Re and
$A = \{F(x) : a < x < b\}$ then $A$ is bounded above. The part of the argument
involving the axiom of choice ran as follows: If $A$ is not bounded above
then for each $n \in P$ there exists $x$ with $a < x < b$ and $n < F(x)$. Let
$H(n) = X_n = \{x : a < x < b \text{ and } n < F(x)\}$ for each $n \in P$, and let
$M = \{X_n : n \in P\}$. Thus each $X_n \neq \emptyset$; by the axiom of choice we can
pick a definite $x_n \in X_n$ for each $n$. More precisely, applying the form
Axiom 12", we have a function $G$ with domain $M$ such that $G(X) \in X$
for each $X \in M$. Then we let $x_n = G(H(n))$ for each $n$. The axiom of
choice can be avoided here as follows: We note first that since $F$ is
continuous, for each $n \in P$ there exists $y$ with $a < y < b$, $y \in \mathrm{Ra}$, and
$n < F(y)$. For if we take real $x$ with $n < F(x)$, any rational number $y$
sufficiently close to $x$ in this interval will also have $n < F(y)$. Let
$Y_n = \{y : a < y < b \text{ and } y \in \mathrm{Ra} \text{ and } n < F(y)\}$ for each $n \in P$. Thus
$Y_n \neq \emptyset$ and $Y_n \subseteq X_n$ for each $n$. Now we know by 7.70 that the set Ra
of rational numbers is denumerable. Furthermore, if we follow the proof
of this result, it is easy to produce a definite function $E$ with domain $P$
and range Ra; let $r_m = E(m)$ for each $m$, so that $\mathrm{Ra} = \{r_m : m \in P\}$.
Thus for each $n \in P$ there exists $m \in P$ with $r_m \in Y_n$; given $n \in P$, let
$L(n)$ be the least such $m$ (which exists by the well-ordering of $P$). Then if
we set $y_n = r_{L(n)}$ for each $n$ we have $y_n \in Y_n$, hence $y_n \in X_n$ for all $n$,
as desired. Thus we can give an explicit description of a choice function
in this case. [This is not to be confused with an effective description: we
may have no way to mechanically compute the value of $L(n)$ for each $n$
although, given $L(n)$, we can compute $r_{L(n)} = E(L(n))$.] More generally,
if $S$ is any denumerable set and $M$ is a denumerable collection of subsets
of $S$, say $M = \{X_n : n \in P\}$, then a choice function $G$ with $G(X_n) \in X_n$
for each nonempty $X_n$ can be explicitly defined and proved to exist without
the axiom of choice. The interested reader should survey other uses
of the axiom of choice in this book to see how to avoid it in each case.
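The least-index construction can be sketched in Python. This is only an illustration under stated assumptions: the enumeration `rationals` is one possible choice of the function E, and the sets Y_n are built from the hypothetical function F(y) = 1/y on 0 < y < 1 (not a function from the text), chosen so that membership in each Y_n is decidable and the search for L(n) terminates. For a general continuous F no such mechanical test need exist, which is exactly the distinction between an explicit and an effective description.

```python
from fractions import Fraction
from itertools import count
from math import gcd

def rationals():
    """One explicit enumeration r_1, r_2, ... of Ra without repetition (an assumed E)."""
    yield Fraction(0)
    for total in count(2):              # group the fractions p/q by p + q
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:          # lowest terms only, so nothing repeats
                yield Fraction(p, q)
                yield Fraction(-p, q)

def least_index_choice(in_Y):
    """Return (L, r_L), where L is the least index m with r_m in Y."""
    for m, r in enumerate(rationals(), start=1):
        if in_Y(r):
            return m, r

def make_Y(n):
    """Membership test for Y_n = {y in Ra : 0 < y < 1 and n < F(y)}, with F(y) = 1/y."""
    return lambda y: 0 < y < 1 and n < 1 / y

# The choice function n -> y_n, obtained with no appeal to the axiom of choice.
choices = {n: least_index_choice(make_Y(n)) for n in range(1, 5)}
```

Each value `choices[n]` is the pair (L(n), y_n); the only ingredients are the fixed enumeration and the well-ordering of the positive integers.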
It is natural to ask why we have mentioned and used the axiom of choice
if we can do without it. The reason is that it is quite easy to overlook the
fact that the axiom is implicitly involved in various arguments. Indeed,
the explicit recognition that it was necessary to use it in a number of
standard proofs in analysis was not made until Zermelo did so in the early
1900’s. Once we become aware of this need, we become accustomed to
isolating its role in arguments and then dispensing with it, where possible,
by alternative methods. Unfortunately, it does not seem possible to do
this throughout analysis.
The axiom of choice occupies a controversial position in modern mathematics.
In contrast to the other axioms of set existence, it does not
explicitly describe the set, which it claims to exist, by means of a condition
on its members. It thus rests on a different kind of intuitive evidence
than that required to justify the remaining axioms. It is for this reason
that one finds it constantly referred to in modern mathematical writings,
even where the set-theoretical basis of the arguments is not otherwise
explicitly mentioned.
As we have mentioned earlier, there are systems of axioms of set theory
that are stronger than the one presented here and are in current usage as
a basis to account for all the known developments of mathematics. On the
other hand it is natural to try to isolate the minimal set-theoretical
assumptions needed for a given restricted part of mathematics, such as
classical analysis or algebra. In this direction, the axiom system presented
here is much stronger than necessary. The study of weaker systems and
alternative formulations is the object of current investigations into the
foundations of analysis. A proper understanding of these demands some
background in metamathematics (the logical study of systems of mathematical
reasoning). Such investigations of the foundations of mathematics
are part of the continuing effort (which has been carried on throughout
the history of mathematics) to clarify its basic concepts and methods
and their interrelationships.
APPENDIX II

THE ANALYTICAL BASIS OF THE TRIGONOMETRIC FUNCTIONS

We sketch in this appendix two approaches to the analytical definition
and verification of the basic properties 8.14(i)-(vi) of the trigonometric
functions cos and sin. Both of these depend on a modest development of
the calculus, which presumes no knowledge of the number systems beyond
that provided in Chapters 1-7. Essentially what is needed here are the
“fundamental theorem of calculus,” relating integrals to derivatives, and
the basic facts about the connection between signs of derivatives and the
increasing or decreasing character of a function.
The first approach proceeds directly to explicate in analytic terms the
geometric notions involved, beginning with that of angle. We follow
here the treatment of Morrey’s University Calculus,* pp. 214-228 and
271-274. Consider a point $(x, y)$ on the unit circle, with $y \geq 0$. Thus
$x^2 + y^2 = 1$ and $y = \sqrt{1 - x^2}$. We wish to define what is meant by
the angle associated with $(x, \sqrt{1 - x^2})$, which we denote by $A(x)$.

In radian measure, the angle $\angle NOP$ (where $O$ is the origin, $N$ is the point
$(1, 0)$, and $P$ is the point $(x, y)$) should have the same value as twice
the area of the sector $NOP$. The latter is seen to be the area of the triangle
$OQP$, where $Q$ is $(x, 0)$, together with the area under the circle and above
the $x$-axis between $Q$ and $N$. This leads us by the calculus to define the
function $A$, with domain $\{x : -1 \leq x \leq 1\}$, by the following condition:

(1) $A(x) = x\sqrt{1 - x^2} + 2\int_x^1 \sqrt{1 - t^2}\,dt, \quad -1 \leq x \leq 1.$

Geometrically, this is first seen to be the appropriate definition for
$0 \leq x \leq 1$.

* C. B. Morrey, Jr., University Calculus with Analytic Geometry, Reading,
Mass.: Addison-Wesley Publishing Co., 1962.

When $-1 \leq x < 0$, the quantity $\int_x^1 \sqrt{1 - t^2}\,dt$ gives the total area
under the circle and above the $x$-axis between $Q$ and $N$, while the quantity
$\tfrac{1}{2}x\sqrt{1 - x^2}$ gives the negative of the area of the triangle $OQP$, so (1)
still gives $A(x)$ correctly as twice the area of the sector $NOP$.
Taking (1) as the starting point, we can prove the following:

(2) (a) The derivative $A'(x)$ is defined for $-1 < x < 1$ and
        $A'(x) = -1/\sqrt{1 - x^2}$ there;
    (b) $A$ is continuous and decreasing for $-1 \leq x \leq 1$;
    (c) $A(1) = 0$ and $A(-1) = \pi$, where $\pi = 2\int_{-1}^{1} \sqrt{1 - t^2}\,dt$;
    (d) $A(-x) = \pi - A(x)$ and $A(0) = \pi/2$;
    (e) if $0 \leq \theta \leq \pi$ then there is a unique $x$ with $-1 \leq x \leq 1$
        and $A(x) = \theta$; further, $0 < x < 1$ if $0 < \theta < \pi/2$.

The continuity of $A$ and the formula for $A'(x)$ are seen from the representation

$$A(x) = x\sqrt{1 - x^2} + 2\int_{-1}^{1} \sqrt{1 - t^2}\,dt - 2\int_{-1}^{x} \sqrt{1 - t^2}\,dt,$$

the standard method for differentiating $x\sqrt{1 - x^2}$, and the fundamental
theorem of calculus, by which the derivative of $\int_{-1}^{x} \sqrt{1 - t^2}\,dt$ is
$\sqrt{1 - x^2}$. By (a), we have $A'(x) < 0$ for $-1 < x < 1$, and hence $A$
is decreasing for $-1 \leq x \leq 1$. Parts (c)-(e) are straightforward; in (e)
we can use Weierstrass’ Nullstellensatz 7.48 to find at least one $x$ with
$A(x) = \theta$ and $-1 \leq x \leq 1$, given that $0 \leq \theta \leq \pi$; similarly if
$0 < \theta < \pi/2$.
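As a numerical sanity check on (1) and (2), the following Python sketch approximates A(x) with a midpoint rule and solves A(x) = θ by bisection, relying only on the fact that A is decreasing. The function names, the step count, and the tolerances are illustrative assumptions; the comparison with the library cosine is a plausibility check, not part of the analytic development.

```python
import math

def angle_A(x, steps=4000):
    """Approximate A(x) = x*sqrt(1 - x^2) + 2 * (integral from x to 1 of sqrt(1 - t^2) dt)."""
    h = (1.0 - x) / steps
    # Midpoint rule; max(0, .) guards against tiny negative round-off under the root.
    integral = h * sum(
        math.sqrt(max(0.0, 1.0 - (x + (i + 0.5) * h) ** 2)) for i in range(steps)
    )
    return x * math.sqrt(max(0.0, 1.0 - x * x)) + 2.0 * integral

def solve_A(theta, tol=1e-9):
    """Find x in [-1, 1] with A(x) = theta by bisection, using that A is decreasing."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if angle_A(mid) > theta:
            lo = mid      # A(mid) is too large, so the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

pi_estimate = angle_A(-1.0)   # (2c): A(-1) should approximate pi
x_for_one = solve_A(1.0)      # (2e): the x with A(x) = 1, expected to be near cos(1)
```

The midpoint rule converges slowly near t = ±1, where the integrand has unbounded slope, so agreement to a few decimal places is all that should be expected from this sketch.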

The next step, on the basis of (2e), is to define functions $C$ and $S$ by:

(3) (a) $C(\theta) = x$ and $S(\theta) = \sqrt{1 - x^2}$ whenever $0 \leq \theta \leq \pi$ and
        $x$ is the unique number such that $-1 \leq x \leq 1$ and $A(x) = \theta$;
    (b) $C(-\theta) = C(\theta)$ and $S(-\theta) = -S(\theta)$ whenever $-\pi < \theta < 0$;
    (c) $C(\theta + 2n\pi) = C(\theta)$ and $S(\theta + 2n\pi) = S(\theta)$ for any $\theta$ and
        $n \in I$.

Since for each $\theta$ there is a unique $n \in I$ with $-\pi < \theta + 2n\pi \leq \pi$,
these conditions determine $C(\theta)$, $S(\theta)$ in a definite way for all $\theta$. We then
have from (2c, d) and the preceding definition:

(4) (a) $C(0) = 1$ and $S(0) = 0$;
    (b) $C(\pi/2) = 0$ and $S(\pi/2) = 1$;
    (c) if $0 < \theta < \pi/2$ then $0 < C(\theta) < 1$ and $0 < S(\theta) < 1$;
    (d) for any $\theta$, $C^2(\theta) + S^2(\theta) = 1$.
In other words, we obtain 8.14(i)-(iv) from these definitions. What must
still be shown is that $C$ and $S$ are continuous functions on Re, and that the
addition formulas 8.14(v), (vi) are satisfied.
We consider the latter first. It is seen that $C(-\theta) = C(\theta)$, $S(-\theta) =
-S(\theta)$ hold in general; thus it is sufficient to derive the subtraction
formulas

(5) (a) $C(\theta_1 - \theta_2) = C(\theta_1)C(\theta_2) + S(\theta_1)S(\theta_2)$;
    (b) $S(\theta_1 - \theta_2) = S(\theta_1)C(\theta_2) - C(\theta_1)S(\theta_2)$.

These are a little easier to obtain from (1)-(4), but an analytic verification
without appeal to intuitive geometrical considerations is still fairly
troublesome. We have here to speak of the angles of a triangle.

A triangle is determined by three distinct points, $M$, $N$, $Q$.

To determine, for example, the angle $\angle NMQ$ in terms of our previous
definition of angle, we need the notion of a rigid motion in the plane. In
this case, we would consider a motion that sends $M$ into the origin $O$ and $N$
into a point $N'$ on the positive $x$-axis. Then $Q$ goes into a certain point $Q'$.

Then we take $\angle NMQ$ to be the angle $\theta$ determined by the intersection
$P(x, y)$ of $OQ'$ with the unit circle. Speaking precisely, a rigid motion is
a function $F$ which is a one-to-one mapping of Re × Re onto itself with
the property that the distance between points is preserved, i.e., if we
write $F(x, y) = (G(x, y), H(x, y))$ for any $x$, $y$, we have

$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} = \sqrt{(G(x_2, y_2) - G(x_1, y_1))^2 + (H(x_2, y_2) - H(x_1, y_1))^2}$$

for any $(x_1, y_1)$, $(x_2, y_2)$. What must be proved here is that given distinct
points $M(x_M, y_M)$ and $N(x_N, y_N)$ we can find a rigid motion $F$ with
$F(x_M, y_M) = (0, 0)$ and $F(x_N, y_N) = (d, 0)$ with $d > 0$. We suppose
that this can be done.
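One way such a rigid motion might be written down, sketched here in Python under the assumption that a translation followed by a rotation is used (the text leaves the construction unspecified): translate M to the origin, then rotate by the unit vector pointing from M to N. No trigonometric functions are needed, only a square root. The name `rigid_motion` is hypothetical.

```python
import math

def rigid_motion(M, N):
    """Return F with F(M) = (0, 0) and F(N) = (d, 0), d > 0, preserving distances."""
    (mx, my), (nx, ny) = M, N
    d = math.hypot(nx - mx, ny - my)
    if d == 0:
        raise ValueError("M and N must be distinct")
    c, s = (nx - mx) / d, (ny - my) / d   # unit vector from M toward N

    def F(p):
        u, v = p[0] - mx, p[1] - my       # translate M to the origin
        return (c * u + s * v, -s * u + c * v)   # rotate (c, s) onto (1, 0)

    return F

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

F = rigid_motion((1.0, 2.0), (4.0, 6.0))
image_of_M = F((1.0, 2.0))
image_of_N = F((4.0, 6.0))
```

Because c² + s² = 1, the map (u, v) ↦ (cu + sv, −su + cv) preserves u² + v², which is the distance condition in the definition above.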
Consider the special case where $M$ is already the origin $O$ and $N$, $Q$
are points on the unit circle with $x_N > x_Q$. The figure below shows the
given triangle and its transformed version.

In this case the image of $N$ is $N'(1, 0)$ and that of $Q$ is some $P(x, y)$ on the
unit circle, since both $OQ$, $ON$ have unit length. Let $\theta_1 = A(x_N)$,
$\theta_2 = A(x_Q)$, and let $\theta$ equal the angle $\angle NOQ$ as determined by the rigid
motion, hence $\theta = A(x)$. Thus, $x_N = C(\theta_1)$, $y_N = S(\theta_1)$, $x_Q = C(\theta_2)$,
$y_Q = S(\theta_2)$, $x = C(\theta)$, $y = S(\theta)$. It can be shown that rigid motions
also preserve areas. The area of the sector $ONQ$ is $\tfrac{1}{2}(\theta_2 - \theta_1)$ and that
of $ON'P$ is $\tfrac{1}{2}\theta$; hence $\theta = \theta_2 - \theta_1$. Now the square of the distance of $N'$
to $P$ is

$$(x - 1)^2 + (y - 0)^2 = x^2 + y^2 - 2x + 1 = 2 - 2x = 2 - 2C(\theta_2 - \theta_1).$$

On the other hand, because we have a rigid motion,

$$\begin{aligned}
(x - 1)^2 + (y - 0)^2 &= (x_Q - x_N)^2 + (y_Q - y_N)^2 \\
&= (x_Q^2 - 2x_Q x_N + x_N^2) + (y_Q^2 - 2y_Q y_N + y_N^2) \\
&= (x_Q^2 + y_Q^2) + (x_N^2 + y_N^2) - 2(x_Q x_N + y_Q y_N) \\
&= 2 - 2\bigl(C(\theta_2)C(\theta_1) + S(\theta_2)S(\theta_1)\bigr).
\end{aligned}$$

Thus, by comparison,

$$C(\theta_2 - \theta_1) = C(\theta_2)C(\theta_1) + S(\theta_2)S(\theta_1),$$

that is, we have (5a) for this special case of $\theta_1$, $\theta_2$. Now (5a, b) can be
proved in general by a systematic use of (2)-(4).
To conclude this approach, the continuity of the functions $C$, $S$ can be
obtained by using a general theorem of the differential calculus, according
to which if $F$ is a function on Re and $F'(a)$ is defined then $F$ is continuous
at $a$. In this case we show that $C'(\theta)$, $S'(\theta)$ are defined for every $\theta$, and

(6) $C'(\theta) = -S(\theta)$ and $S'(\theta) = C(\theta)$.


For example, to show that

$$S'(\theta) = \lim_{h \to 0} \frac{S(\theta + h) - S(\theta)}{h}$$

exists, we use $S(\theta + h) = S(\theta)C(h) + C(\theta)S(h)$; then

$$\frac{S(\theta + h) - S(\theta)}{h} = S(\theta)\,\frac{C(h) - 1}{h} + C(\theta)\,\frac{S(h)}{h}.$$

The desired result (6) is then seen from the special results

(7) $\lim_{h \to 0} \dfrac{S(h)}{h} = 1$ and $\lim_{h \to 0} \dfrac{1 - C(h)}{h} = 0$.

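To obtain (7), we use an inequality on areas. Here $0 < h < \pi/2$, $N$ is the
point $(1, 0)$, $P$ is $(C(h), S(h))$, $M$ is $(C(h), 0)$, and $Q$ is $(1, S(h)/C(h))$.

The angle $h$ is equal to twice the area of the sector $ONP$, which
lies between twice the areas of the triangles $OMP$ and $ONQ$, hence

$$C(h)S(h) < h < \frac{S(h)}{C(h)}.$$

Then $C(h) < h/S(h) < 1/C(h)$ and we get $\lim_{h \to 0} h/S(h) = 1$ from
$\lim_{h \to 0} C(h) = 1$. The second part of (7) is obtained by writing

$$\frac{1 - C(h)}{h} = \frac{(1 - C(h))(1 + C(h))}{h(1 + C(h))} = \frac{S^2(h)}{h(1 + C(h))} = \frac{S(h)}{h} \cdot \frac{S(h)}{1 + C(h)}.$$

Then

$$\lim_{h \to 0} \frac{S(h)}{h} = 1 \quad\text{and}\quad \lim_{h \to 0} \frac{S(h)}{1 + C(h)} = 0,$$

yielding (7).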
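The area inequality and the two limits in (7) can be observed numerically. The sketch below uses Python's library `math.cos` and `math.sin` as stand-ins for C and S (an assumption made for illustration only, since the point of the appendix is to construct these functions); the sample values of h are arbitrary.

```python
import math

def check_limits(hs=(0.5, 0.1, 0.01, 0.001)):
    """For each h, report S(h)/h, (1 - C(h))/h, and whether C*S < h < S/C holds."""
    rows = []
    for h in hs:
        C, S = math.cos(h), math.sin(h)     # stand-ins for the functions C, S
        sandwich_ok = C * S < h < S / C     # the inequality on twice the areas
        rows.append((h, S / h, (1.0 - C) / h, sandwich_ok))
    return rows

rows = check_limits()
for h, ratio, quotient, ok in rows:
    print(f"h={h:<6} S(h)/h={ratio:.9f}  (1-C(h))/h={quotient:.9f}  sandwich={ok}")
```

As h decreases, the second column approaches 1 and the third approaches 0, in agreement with (7).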
This concludes our discussion of the first approach to the trigonometric
functions. The reader interested in more details should consult Morrey’s
text cited above. A related approach can be taken based on the definition
of angle in terms of arc length rather than area; we shall not pursue this.
The second approach to the trigonometric functions is more sophisticated
and formal, but the verification of the basic properties is also
smoother going. It is based on the power series representations of these
functions. We follow here the treatment given by Rudin in his book,
Principles of Mathematical Analysis,* pp. 150-152.
What lies behind the basic definitions taken here is that if $F$ is a function
on Re such that the $n$th derivative $F^{(n)}$ (taking $F^{(0)} = F$) is defined
on Re for each $n$ then we have

$$F(x) = \sum_{n=0}^{\infty} \frac{F^{(n)}(0)}{n!}\,x^n.$$

(Actually, this only holds provided certain boundedness conditions are
met; what concerns us here is this representation as motivation.) If the
trigonometric functions $C$, $S$ are to be obtained satisfying 8.14(i)-(vi),
then by the preceding arguments we will have $C^{(1)}(\theta) = -S(\theta)$, $S^{(1)}(\theta) =
C(\theta)$. Hence we see that $C^{(2)}(\theta) = -C(\theta)$, $C^{(3)}(\theta) = S(\theta)$, $C^{(4)}(\theta) =
C(\theta)$, and in general $C^{(n+4)}(\theta) = C^{(n)}(\theta)$; we proceed similarly with $S(\theta)$.
This shows that $C^{(2n+1)}(0) = 0$ and $C^{(2n)}(0) = (-1)^n$, while

$$S^{(2n+1)}(0) = (-1)^n, \quad S^{(2n)}(0) = 0,$$

for any $n$. With this in mind, we can take the following as definitions of
the functions $C$, $S$:

(8) $C(\theta) = \sum_{n=0}^{\infty} \dfrac{(-1)^n}{(2n)!}\,\theta^{2n}$ and $S(\theta) = \sum_{n=0}^{\infty} \dfrac{(-1)^n}{(2n+1)!}\,\theta^{2n+1}$.

That these functions are defined for all $\theta$ can be proved from 7.41. Further,
we see immediately from (8) that $C(0) = 1$, $S(0) = 0$, $C(-\theta) = C(\theta)$
and $S(-\theta) = -S(\theta)$ for any $\theta$.
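The definitions (8) translate directly into partial sums. The following Python sketch is illustrative only: the names `C_series` and `S_series`, the fixed number of terms, and the use of floating-point arithmetic are all assumptions. Each term is obtained from the previous one by the ratio of consecutive terms, so no factorials are computed explicitly.

```python
import math

def C_series(theta, terms=30):
    """Partial sum of C(theta) = sum over n of (-1)^n theta^(2n) / (2n)!."""
    total, term = 0.0, 1.0                  # term for n = 0
    for n in range(terms):
        total += term
        # Ratio of term n+1 to term n: -theta^2 / ((2n+1)(2n+2)).
        term *= -theta * theta / ((2 * n + 1) * (2 * n + 2))
    return total

def S_series(theta, terms=30):
    """Partial sum of S(theta) = sum over n of (-1)^n theta^(2n+1) / (2n+1)!."""
    total, term = 0.0, theta                # term for n = 0
    for n in range(terms):
        total += term
        # Ratio of term n+1 to term n: -theta^2 / ((2n+2)(2n+3)).
        term *= -theta * theta / ((2 * n + 2) * (2 * n + 3))
    return total

samples = [0.0, 1.0, 2.0, -3.0]
values = [(t, C_series(t), S_series(t)) for t in samples]
```

For moderate θ the values agree closely with the library cosine and sine, and the symmetry properties C(−θ) = C(θ), S(−θ) = −S(θ) are visible term by term.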
Now it is shown in analysis that if a function $F$ is defined by $F(x) =
\sum_{n=0}^{\infty} a_n x^n$ for all $x$, then $F$ is continuous on Re, all derivatives $F^{(k)}$ are
defined for all $x$, and we have

$$F^{(k)}(x) = \sum_{n=k}^{\infty} n(n - 1) \cdots (n - (k - 1))\,a_n x^{n-k} = \sum_{n=k}^{\infty} \frac{n!}{(n - k)!}\,a_n x^{n-k}.$$

* Walter Rudin, Principles of Mathematical Analysis, New York: McGraw-Hill, 1953.
(Hence $a_k = F^{(k)}(0)/k!$ for each $k$.) In particular, it follows from (8)
that $C$, $S$ are continuous on Re, that $C'(\theta)$, $S'(\theta)$ are defined for all $\theta$,
and that

$$C'(\theta) = \sum_{n=1}^{\infty} \frac{(-1)^n (2n)\,\theta^{2n-1}}{(2n)!} = \sum_{n=1}^{\infty} \frac{(-1)^n \theta^{2n-1}}{(2n-1)!} = \sum_{n=0}^{\infty} \frac{(-1)^{n+1} \theta^{2n+1}}{(2n+1)!} = -S(\theta),$$

and $S'(\theta) = C(\theta)$ (by a similar computation).
Consider any fixed real number $\varphi$ and let

(9) $G(\theta) = C(\theta + \varphi) - C(\theta)C(\varphi) + S(\theta)S(\varphi)$,
    $H(\theta) = S(\theta + \varphi) - C(\theta)S(\varphi) - S(\theta)C(\varphi)$,
    $F(\theta) = G^2(\theta) + H^2(\theta)$.

Then $G'(\theta) = -S(\theta + \varphi) + S(\theta)C(\varphi) + C(\theta)S(\varphi) = -H(\theta)$ by the
preceding, and also $H'(\theta) = G(\theta)$. Hence, $F'(\theta) = 2G(\theta)G'(\theta) +
2H(\theta)H'(\theta) = 0$. But then $F(\theta)$ must have a fixed constant value for all $\theta$.
Since $G(0) = C(\varphi) - C(\varphi) = 0$ and $H(0) = S(\varphi) - S(\varphi) = 0$, we have
$F(0) = 0$. Thus $F(\theta) = 0$ for all $\theta$. But then $G(\theta) = 0$ and $H(\theta) = 0$
for all $\theta$ since $G^2(\theta) \geq 0$, $H^2(\theta) \geq 0$. Thus $C(\theta + \varphi) = C(\theta)C(\varphi) -
S(\theta)S(\varphi)$, $S(\theta + \varphi) = C(\theta)S(\varphi) + S(\theta)C(\varphi)$, and the addition formulas
8.14(v), (vi) are verified. Taking $\varphi = -\theta$ gives $C(0) = C(\theta)C(-\theta) -
S(\theta)S(-\theta)$, hence $1 = C^2(\theta) + S^2(\theta)$ by the first consequences of (8).
Thus also 8.14(iv) is established.
An even simpler derivation of the addition formulas is based on complex
power series, noting that $C(\theta) + iS(\theta) = E(i\theta)$, where $E(z) = \sum_{n=0}^{\infty} z^n/n!$.
[Compare the addition formula $E(x + y) = E(x) \cdot E(y)$ of 7.45, as well
as Exercise 13, Exercise Group 8.1, and the discussion of $E(z)$ following
8.32.]
The only thing remaining to be proved is 8.14(ii), (iii), that $C(\pi/2) = 0$,
$S(\pi/2) = 1$, and $0 < C(\theta) < 1$, $0 < S(\theta) < 1$, for $0 < \theta < \pi/2$; here
$\pi$ is a number to be introduced in a way different from its geometric
definition (2c). We shall define $\pi/2$ as the least positive solution $\theta_1$ of
$C(\theta_1) = 0$; it is thus necessary to show that there exists such a $\theta_1$. That
there exists positive $\theta_1$ with $C(\theta_1) = 0$ can be seen from Weierstrass’
Nullstellensatz if it is proved there exists positive $\theta$ with $C(\theta) < 0$ [since
we know $C(0) = 1$]. Suppose, to the contrary, that $C(\theta) > 0$ for all $\theta$.
Then $S'(\theta) > 0$ for all $\theta$ and hence $S(\theta)$ is a strictly increasing function,
by differential calculus. Since $S(0) = 0$, we would thus have $S(\theta) > 0$
for $\theta > 0$ and hence $C'(\theta) < 0$ for all such $\theta$; but then $C(\theta)$ is a strictly
decreasing function. It follows that $0 < C(\theta) < 1$ for all $\theta > 0$. Now let
$\theta_0 > 0$ be fixed, and consider $I(\theta) = \int_{\theta_0}^{\theta} S(t)\,dt$ for $\theta_0 < \theta$. By the
fundamental theorem of calculus, $I(\theta) = -C(\theta) + C(\theta_0)$; hence $I(\theta) < 1$.
On the other hand, since $S$ is strictly increasing, the value of the integral
$\int_{\theta_0}^{\theta} S(t)\,dt$ is larger than $S(\theta_0)$ times the length of the interval $\theta - \theta_0$,
that is, $I(\theta) > S(\theta_0)(\theta - \theta_0)$. Since $S(\theta_0) > 0$, we can take $\theta$ so large
that $I(\theta) > 1$. We thus have a contradiction to the hypothesis $C(\theta) > 0$
for all $\theta > 0$. Hence, as we have seen, there is $\theta_1 > 0$ with $C(\theta_1) = 0$.
If there were no least such $\theta_1$, then we could find a sequence $(\theta_n)$ with
$\lim_{n \to \infty} \theta_n = 0$ and $C(\theta_n) = 0$ for all $n$; but then by continuity (7.47) we
would have $C(0) = 0$, contradicting $C(0) = 1$.
We can thus define $\pi$ to be $2\theta_1$, where $\theta_1$ is the least positive real number
with $C(\theta_1) = 0$. Hence $C(\pi/2) = 0$ and $0 < C(\theta) < 1$ for $0 < \theta < \pi/2$.
Again by using $S'(\theta) = C(\theta)$ it follows that $S(\theta)$ is strictly increasing for
$0 < \theta < \pi/2$, hence $0 < S(\theta)$ for $0 < \theta < \pi/2$, and by continuity and
the increasing property, also $0 < S(\pi/2)$. But $S^2(\pi/2) + C^2(\pi/2) = 1$,
so $S(\pi/2) = 1$. This completes the proof.
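The definition of π as twice the least positive zero of C suggests a direct computation. The Python sketch below is a numerical illustration, not a proof: it assumes the series (8) truncated to a fixed number of terms, and it assumes that the only sign change of C on the interval [0, 2] is at the least positive zero, so that bisection on that interval locates it. The bracket [0, 2] and all function names are choices made here.

```python
import math

def C_series(theta, terms=40):
    """Partial sum of C(theta) = sum over n of (-1)^n theta^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -theta * theta / ((2 * n + 1) * (2 * n + 2))
    return total

def S_series(theta, terms=40):
    """Partial sum of S(theta) = sum over n of (-1)^n theta^(2n+1) / (2n+1)!."""
    total, term = 0.0, theta
    for n in range(terms):
        total += term
        term *= -theta * theta / ((2 * n + 2) * (2 * n + 3))
    return total

def zero_by_bisection(f, lo, hi, tol=1e-13):
    """Locate a zero of f in [lo, hi], given f(lo) > 0 > f(hi)."""
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("f must change sign from positive to negative on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

theta_1 = zero_by_bisection(C_series, 0.0, 2.0)   # assumed bracket: C(0) = 1 > 0 > C(2)
pi_estimate = 2.0 * theta_1
```

The sign change used here, C(0) > 0 > C(2), is the numerical counterpart of the existence argument via Weierstrass' Nullstellensatz; evaluating `S_series(theta_1)` gives a value close to 1, as the final step of the proof predicts.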
APPENDIX III

BIBLIOGRAPHY

A list of suggestions for further reading.

Logic:
Quine, W. V., Mathematical Logic (rev. ed.). Cambridge: Harvard University
Press, 1951.
Suppes, P., Introduction to Logic. Princeton: Van Nostrand, 1957.

Set Theory:
Kamke, E., Theory of Sets. F. Bagemihl, translator. New York: Dover, 1950.
Suppes, P., Axiomatic Set Theory. Princeton: Van Nostrand, 1960.

Algebra:
Birkhoff, G. and MacLane, S., A Survey of Modern Algebra (rev. ed.).
New York: Macmillan, 1953.
Paige, L. J. and Swift, J. D., Elements of Linear Algebra. Boston: Ginn, 1961.
Van der Waerden, B. L., Modern Algebra, Vols. I and II. F. Blum, translator.
New York: Ungar, 1949.

Number Theory:
LeVeque, W. J., Topics in Number Theory, Vols. I and II. Reading:
Addison-Wesley, 1956.
Pollard, H., The Theory of Algebraic Numbers (Carus Mathematical Monographs,
Number 9). New York: Wiley, 1950.

Analysis:
Apostol, T. M., Mathematical Analysis. Reading: Addison-Wesley, 1957.
Hille, E., Analytic Function Theory, Vol. I. Boston: Ginn, 1959.
Rudin, W., Principles of Mathematical Analysis. New York: McGraw-Hill,
1953.

INDEX
INDEX

Abel, N. H., 351 of extensionality, 392


Absolute maximum, 9, 270 of infinity, 65, 395
Absolute minimum, 270 of Peano systems, 65
Absolute value, 112, 310 of restricted set formation, 393
Addition of positive integers, 75 of set-existence, 393
Additive inverse, 106 Axiomatic set theory, 23, 391
Adjunction of roots, 362
Algebra, fundamental theorem of, 331
of sets, 25, 29 Basis, linearly independent, 366, 369
Bernays, P., 391
Algebraic (complex), number, 337
Binary, function, 50
equations in fields, 200
operation, 50
field extensions, 353, 365
relation, 43
integer, 351 (Ex. 4), 387
number field, 353 representation of real numbers, 259
Binomial expansion, 127, 287
number theory, 386
Bi-unique function, 53
over a field, 337
real number, 288 Bolzano-Weierstrass theorem, for the
system, 55 complex numbers, 324
for the real numbers, 243
{See also Simple algebraic extension)
Boolean algebra, 32
Algebraically closed field, 337, 373
Bound variable, 14
Algebraically dependent, 373 (Ex. 8)
Bounded, from above, 232
Algorithm, 10, 204, 285
from below, 232
division, 131, 212
sequence, 243, 324
Euclidean, 135, 236
Analytic function, 333
Analytic geometry, 186 Calculus, fundamental theorem of, 401
Analytic number theory, 386 Cancellation law, 77, 80, 105, 108
Analytical basis of the trigonometric Cantor, G., 391
functions, 400 Cantor’s (diagonal) method, 289, 293
Angle, 309, 374, 400 Cantor’s ternary set, 301 (Ex. 1)
Antecedent, 11 Cardan, G., 346
Antisymmetric law, 33, 83 Cardinal number, 57
Antisymmetric relation, 44 Cartesian, plane, 374
Archimedean ordering, 236 product, 37, 394
Archimedean property, 235 Cauchy, A. L., 229, 239
Argand diagram, 318 Cauchy sequence, 242
Argument of a complex number, 313 Characteristic of an integral domain,
Associative law, 33, 75, 80, 105 157 (Ex. 7)
generalized, 94, 125 Characterization of the positive inte¬
Axiom, 3 gers, 72, 89
of abstraction, 393 Choice, axiom of, 18, 35, 100 (Ex. 5),
of choice, 18, 35, 100 (Ex. 5), 396 396
411
412 INDEX

Circle, 374 Convergence, circle of, 333


of convergence, 333 radius of, 263
Class, 14 uniform, 287 (Ex. 3)
Classical construction problems, 381 Convergent sequence, 241, 324
Closure under an operation, 50 Convergent series, 256
Coefficient, leading, 167 Converse of a relation, 39, 41, 52
Coefficients, of a polynomial, 159, 167 Coordinates, 374
of a system of equations, 202 Correspondence, one-to-one, 53
Collection, 14 Cos (cosine), 309, 317, 400
Commutative law, 33, 77, 80, 105 Countable set, 292
generalized, 94, 125 Course-of-values induction, 140
Commutative ring with unity, 104 Cubic equations, roots of, 346
Complement of a set, 25, 394 Cut, 227
Complex, conjugate, 306 Cycle, 130 (Ex. 2)
(See also Fundamental theorem of
complex algebra)
Decimal representation, 257
numbers, 1, 303, 306
Dedekind, R., 65, 229, 387
Composition of functions, 52
Dedekind section, 399
Composition of relations, 43
Degree of a polynomial, 167, 173
Computation, of roots of complex poly¬
de la Vallee Poussin, C. J., 387
nomials, 334
De Moivre’s theorem, 318
of roots of real polynomials, 275
De Morgan’s laws, 34
(See also Effective computation pro¬
Densely ordered system, 190
cedure)
Denumerable set, 291
Conclusion of an implication, 11
Derivative of a function, 400
Condition, 5, 393
Derivative of a polynomial, formal, 168
necessary and sufficient, 12
Determinant, 204
with one free variable, 15
Diagonal method, Cantor’s, 289, 293
with several free variables, 15, 36
Dimension, 369
Congruence, class, 153
Diophantine problems, 155
modulo an integer, 152
Direct predecessor, 87
modulo a polynomial, 359
Direct successor, 87
relation, 50
Discriminant, 342
relations in a field, 191
Disjoint sets, 29
relations in the integers, 147
Distance, 246, 255
Conjugate, complex, 306
Distributive law, 33, 78, 105
Connectives, logical, 10
generalized, 97, 125
Connectivity law, 83
Divergent series, 256
Consequent, 11
Divisibility relation, between integers,
Constant, 5
133
Constant polynomial, 167
between polynomials, 167, 210
Constructible numbers, 378
in an integral domain, 183
Constructions, ruler and compass,
Division algorithm, for integers, 131
376
for polynomials, 212
Continued-fractions representation,
Domain, integral, 101, 108
261, 267 (Ex. 5)
Domain of a relation, 39, 394
Continuous function, of complex num¬
Duality of statements, 33, 105
bers, 327
Duplication of the cube, 381
of real numbers, 9, 267
Continuously ordered field, 235
Continuously ordered system, 226 Effective computation procedure, 285
Contradiction, proof by, 12 Eisenstein’s theorem, 386 (Ex. 4)
INDEX 413

Elementary symmetric polynomial, iterated finite, 371


176' linearly generated, 366
Element of a set, 16 simple, 356
Empty sequence, 124 simple algebraic, 357
Empty set, 22, 392 simple transcendental, 160, 289, 357
Entire function, 333 Extensionality, axiom of, 392
Equality; see Identity
Equations, algebraic, 200 Factor, 133
cubic, 346 Factorization of integers into primes,
of degree higher than four, 350 139
fourth degree, 349 Factorization of real polynomials, 335
polynomial, in the rationals, 208 Factorization theorem, unique, for in¬
quadratic, 274, 308 tegers, 142
solvability of, by radicals, 345, 351 for polynomials, 217
systems of linear, 201 Fermat prime, 383
Equinumerous sets, 57 Fermat’s “last theorem,” 21, 387
Equivalence, logical, 12 Ferrari, 346
relation, 44 Ferro, S., 346
sets, 46 Field(s), 183, 187
set-theoretical, 57, 290 algebraically closed, 337, 373
Erdos, P., 387 continuously ordered, 235
Essentially finite sequence, 160 finite, 190
Euclid, 229 finite extensions of, 369
Euclidean algorithm, for integers, 135 generated by a set of elements, 353
for polynomials, 216 of quotients, 192
Euclidean plane, 374 of rational forms, 198
Euclidean space, 327 ordered, 187
Eudoxus, 229 Finite, extension of a field, 369
Euler’s function, 383 sequence, 91, 124
Even integer, 133 set, 16, 57, 92
Eventually repeating representation, Finite-dimensional vector space, 366
261 First element, 85
Exchange condition, 368 Formal derivative of a polynomial, 168
Existence statements, 7 Fourth degree equations, roots of, 349
Existence theorems, for continuously Fraenkel, A., 391
ordered fields, 251 Free variable, 14
for continuously ordered systems, Function(s), 36, 46, 395
229 bi-unique, 53
for fields of quotients, 192 composition of, 52
for ordered integral domains, 115, constant, 167
119 continuous, 9, 267, 327
for simple transcendental extensions, converse of, 52
163 exponential, 264, 322 (Ex. 13), 333
Exponential function, 264, 322 (Ex. general recursive, 285
13), 333 implicit, 48
Exponentiation, 81, 107, 127, 189, 285 inverse, 53
Extended, intersection of sets, 34 multivalued, 49
products, 93, 124 one-to-one, 53
sums, 93, 124 polynomial, 158, 166, 172, 328
union of sets, 34, 393 single-valued, 49
Extension(s) of a field, 355, 365 trigonometric, 313, 400
finite, 369 Fundamental sequence, 242, 327, 399
414 INDEX

Fundamental theorem, of calculus, 401 Integers, 1, 101, 122


of complex algebra, 331 algebraic, 351 (Ex. 4), 387
on symmetric polynomials, 178 positive, 1, 64
Integral, 400
Galois theory (of equations), 351, 385 Integral domain, 101, 108
Gauss, K. F., 331, 350, 383 characteristic of an, 157 (Ex. 7)
Gaussian plane, 318 ordered, 110
gcd, see Greatest common divisor Integration, 313, 335
General recursive function, 285 Intersect, 375
Generalized, associative laws, 94, 125 Intersection of sets, 25, 394
commutative laws, 94, 125 extended, 34, 394
distributive laws, 97, 125 Interval, 35 (Ex. 1), 243
Generation of subfields, 353 Inverse, additive, 106
Geometric construction problems, 374 multiplicative, 187
Geometric representation of the com¬ of a function, 53
plex numbers, 309 Irrational numbers, 1, 209
Geometric series, 127, 259 Irreflexive relation, 44
Godel, K., 391 Irreducible polynomial, 211
Greatest common divisor (gcd), for in¬ Isomorphism, 55
tegers, 135 Isomorphism theorem, for continu¬
for polynomials, 215 ously ordered fields, 238
Groups, 351 for fields of quotients, 197
for /e-fold transcendental extensions,
171
Hadamard, J. S., 387
for ordered integral domains, 120
Holomorphic function, 333
for Peano systems, 71
Homogeneous polynomial, 174, 201
for simple transcendental extensions,
Homomorphic image, 149
163
Homomorphic mapping, 149, 191
Iterated finite extensions of a field, 371
Homomorphism, 148
properties preserved under, 150
Hypothesis of an implication, 11 Join, see Union of sets
/c-fold transcendental extension, 171
Ideal, 153, 387 Kronecker, L., 387
Identity, element, 105 Kummer, E. E., 387
laws for, 21, 392
of indiscernibles, 20 Lagrange, J. L., 350
of sets, 19, 392 Largest element, 85
Imaginary numbers, 1, 306 Last element, 85
Imaginary unit, 306 Law of signs, 107
Implication, 11 Law of the mean, 288 (Ex. 9)
Implicit function, 48 Leading coefficient, 167
Inclusion, 19, 33 Least element, 85
Incommensurable quantities, 229 Leibniz, G. W., 20
Individuals, 391 Limit of a sequence, 239, 324
Induction, course-of-values, 140 Line, 374
proof by, 66, 124 Linear, algebra, 201, 366
Infinum (inf), 232 basis, 369
Infinite, sequence, 91, 124 combinations, 137, 366
series, 256 polynomial, 174, 201
set, 16, 57 (See also: Systems of linear equa¬
Infinity, axiom of, 65, 395 tions)
INDEX 415

Linearly, dependent, 366 algebraic real, 288


generated extension, 366 cardinal, 57
independent, 366 complex, 1, 303, 306
Liouville numbers, 299 constructible, 378
Liouville’s theorem, 298 imaginary, 1, 306
Location of roots (of a real polyno¬ irrational, 1, 209
mial), 278 Liouville, 299
Logic, 4 prime, 7, 134
Logical, argument, 3 rational, 1, 183, 198
connective, 10 real, 1, 222, 255
equivalence, 12 transcendental real, 288
Lower bound, 232 Numerical value; see Absolute value
greatest, 232
Lower section, 226 Odd integer, 133
One-to-one correspondence (function),
Mathematical statement, 4, 10 53
Mathematical system, 55 Order of a root of a polynomial, 219
Matrices, 204 Operation, 50
Maximum, absolute, 9, 270 Ordered; see Continuously ordered sys¬
Mean, law of the, 288 (Ex. 9) tem and Densely ordered system
Measurement of lengths, 184, 225 field, 187
Meet; see Intersection integral domain, 110
Member of a set, 16 n-tuple, 43
Membership relation, 392 pair, 37, 393
Metamathematics, 32, 150, 399 quadruple, 43
Metric, 246 (See also: Simply ordered system)
Minimum, absolute, 270 triple, 42
Modulo an integer, congruence, 153 (See also: Well-ordered system)
Modulo a polynomial, congruence, 359 Ordering, Archimedean, 236
Modulus, 310
of complex polynomial functions, 328
Morrey, C. B., Jr., 400 Pairwise disjoint sets, 29
Multiple, 107, 133 Parallelogram law, 310
Multiple root of a polynomial, 219, Parameter, 7
335, 342 Partial-fractions representation, 221
Multiplication of positive integers, 78 (Ex. 12), 336
Multivalued function, 49 Partial sums, 256
Partition of a set, 45
n-ary function, 50 Pascal triangle, 128
n-ary relation, 43 Peano, G., 65
Necessary and sufficient condition, 12 Peano systems, 64, 395
Newton’s method, 276 Peano’s axioms for the positive inte¬
Nondenumerable set, 291 gers, 65
nth roots, of complex numbers, 318 Periodic representation, 261, 267 (Ex.
of real numbers, 274 4)
of unity, 320 Permutation, 126, 130 (Ex. 2)
primitive, of unity, 322 (Ex. 12) Plane, 374
n-tuple, ordered, 43 Point, 374
Number of variations of sign, 279 Polygon, regular, 383
Number theory, 155, 386 Polynomial(s), 158, 210
Numbers, algebraic over a field, 337 coefficients of a, 158
algebraic complex, 337 constant, 167
416 INDEX

degree of a, 167, 173 Quadratic formula, 274, 308


divisibility relation between, 167, Quadruple, ordered, 43
210 Quine, W. V., 391
equations in the rationals, 208 Quotient, 132, 184, 187
form, 158 Quotients, field of, 192
formal derivative of a, 168
function, 158, 166, 172 Radians, 311, 400
homogeneous, 174, 201 Radicals, solvability by, 345, 350,
in several variables, 170 385
irreducible, over a field, 210 Radius of convergence, 263
linear, 174, 201 Range of a relation, 39, 394
monic, 167 Ratio, 187
over the complex numbers, 327 Rational forms, field of, 198
over the rational numbers, 208 Rational numbers, 1, 183, 198
over the real numbers, 267 Rational roots of polynomials, 208
prime, over a field, 211, 218, 274, Real numbers, 1, 222, 255
336 Recursive definition, 73, 75, 124
root field of a, 353, 363 Reduced sequence, 279
roots of a, 166, 168, 172 Redudio ad absurdum, 12
symmetric, 174, 339 Reflexive law, 21, 33, 83
zeros of a, 166 Reflexive relation, 44
Positional notation, for integers, 143 Regular polygon, 383
for real numbers, 257 Relation^), 36, 39, 395
Positive integers, 1, 64, 72, 89 binary, 43
Peano’s axioms for the, 65 composition of, 43
Postulate, 3 congruence, 50
Power series, 262, 405 converse of a, 39
Power-set, 393 domain of a, 39, 394
Predecessor, direct, 87 equivalence, 44
Prime, Fermat, 383 n-ary, 43
number, 7, 134 range of a, 39, 394
number theorem, 143, 387 reflexive, 44
polynomial over a field, 211, 218, simple ordering, 83
274, 336 symmetric, 44
relatively, integers, 138 transitive, 44
Primitive nth root of unity, 322 (Ex. Relatively prime integers, 138
. 12>
Primitive recursive definition, 75
Relatively prime polynomials, 216
Remainder, 132
Principal nth root of a complex num¬ Representations of real numbers, 257
ber, 319 Resultant, 351 (Ex. 5)
Principal square root of a complex Rigid motion, 402
number, 308 Ring, commutative, with unity, 105
Product, cartesian, 37, 394 Rolle’s theorem, 288 (Ex. 9)
Products, extended, 93, 124 Root(s), adjunction of, to a field,
Proof, by contradiction, 12 362
by course-of-values induction, 140 computation of complex, 334
by induction, 66, 124 computation of real, 275
Properties preserved under homomor¬ field of a polynomial, 353, 363
phism, 150 location of real, 278
Properties preserved in subsystems, multiple, 219, 335, 342
151 nth, of complex numbers, 318
Pythagoras’ theorem, 155, 222 of real numbers, 274
of unity, 320, 322 (Ex. 12) nondenumerable, 291


of complex polynomials, 331 of all subsets (of a set), 20, 24, 291,
of cubic equations, 346 393
of fourth degree equations, 349 union of, 25
order of, 219 Set-theoretical equivalence, 57, 290
of a polynomial, 166, 172 Signs, law of, 107
of rational polynomials, 208 Simple algebraic extension, 357
of real polynomials, 271 Simple extension of a field, 356
simple, 219, 278, 335 Simple ordering relation, 83
Rudin, W., 405 Simple root of a polynomial, 219, 278,
Ruler and compass construction, 335
376 Simple transcendental extension, 160,
Russell’s paradox, 22, 391 289, 357
Simply ordered system, 83
Section, Dedekind, 399 Sin (sine), 309, 317, 400
lower, 226 Single-valued function, 49
upper, 226 Solutions of a system of linear
Sector of a circle, 313 equations, 202
Selberg, A., 387 Solvability of an equation by radicals,
Self-dual statement, 33 345, 350, 385
Sequence, bounded, 243, 324 Space, Euclidean, 327
Cauchy, 242 Square root of (—1), 306
convergent, 241, 324 Square roots of complex numbers,
empty, 124 307
essentially finite, 160 Squaring a circle, 385
finite, 91, 124 Statement, existence, 7
fundamental, 242, 327, 399 mathematical, 4, 10
infinite, 91, 124 Sturm sequence, 280
limit of a, 239, 324 Sturm’s theorem, 278, 281, 334
Sturm, 280 Subdomain, 108
Series, convergent, 256 ordered, 111
divergent, 256 Subfield, 187, 353
geometric, 127, 259 ordered, 187
infinite, 256 Subsequence, 243, 324
power, 262, 405 Subset, 20
Set(s), 14, 391 Subsystem, 58, 151
algebra of, 25, 29 Subtraction, 105
axiomatic theory of, 23, 391 Successor, direct, 87
complement of a, 25, 394 Successor operation, 64
denumerable, 291 Sums, extended, 93, 124
disjoint, 29 Supremum (sup), 232
empty, 22 Symmetric law, 21
equinumerous, 57 Symmetric polynomials, 174, 339
equivalence of, 46 elementary, 176
extended intersection of, 34, 394 fundamental theorem on, 178
extended union of, 35, 393 Symmetric relation, 44
finite, 16, 57, 92 Synthetic geometry, 186
identity between, 19 System, algebraic, 55
inclusion between, 19 continuously ordered, 226
infinite, 16, 57 densely ordered, 190
intersection of, 25, 394 mathematical, 55
member of a, 16 Peano, 64
simply ordered, 83 Uniform convergence, 287 (Ex. 3)


well ordered, 85, 123, 139 Union of sets, 25, 393
Systems of linear equations, 201, 206 extended, 34, 393
Unique factorization theorem, for
Tarski, A., 335 integers, 142
Tartaglia, 346 for polynomials, 217
Taylor’s theorem, 263 Unit circle, 311
Term of an ordered pair, 38 Unit, imaginary, 306
a sequence, 92 Unity element, 105
Ternary relation, 42 Unity, nth roots of, 320
Ternary set, Cantor’s, 301 (Ex. 1) Unordered pair, 392
Topology, 246, 255, 327 Upper bound, 232
Topologically complete, 246, 255 least, 232
Total degree (of a polynomial in
several variables), 173
Totalities, 14
Vacuously true implication, 11
Transcendental, real number, 288
Variable, 5, 6, 14
(See also: Simple transcendental
extension, 160, 289, 357)
Variations of sign, 279
Vector space, 366
Transitive law, 21, 33, 83
Venn diagram, 26, 29
Transitive relation, 44
von Neumann, J., 391
Transposition, 130 (Ex. 2)
Triangle inequality, 113, 310
Trichotomy law, 78, 83 Weierstrass, K. T., 331 (See also:
Trigonometric functions, 313, 400 Bolzano-Weierstrass theorem)
Trigonometric representation of the Weierstrass’ Nullstellensatz, 268
complex numbers, 317 Well-ordered system, 85, 123, 139
Trisection of an angle, 381
Type of a mathematical system, 55 Zermelo, E., 391
Zeros of a polynomial, 166
Unary function, 50 Zeros-theorem; see Weierstrass’
Uncountable set, 292 Nullstellensatz
