Consulting Editors
RICHARD S. PIETERS
GAIL S. YOUNG
RICHARD S. PIERCE
Department of Mathematics University of Washington
ADDISON-WESLEY
PUBLISHING
COMPANY, INC.
LONDON
READING, MASSACHUSETTS
PALO ALTO
Copyright © 1963 ADDISON-WESLEY PUBLISHING COMPANY, INC. Printed in the United States of America
ALL RIGHTS RESERVED. THIS BOOK, OR PARTS THEREOF, MAY NOT BE REPRODUCED IN ANY FORM WITHOUT WRITTEN PERMISSION OF THE PUBLISHERS.
PREFACE

This book is an offspring of two beliefs which the authors have held for many years: it is worthwhile for the average person to understand what mathematics is all about; it is impossible to learn much about mathematics without doing mathematics. The first of these convictions seems to be accepted by most educated people. The second opinion is less widely held.

Mathematicians teaching in liberal arts colleges and universities are often under pressure from their colleagues in the humanities and social sciences to offer short courses which will painlessly explain mathematics to students with varying backgrounds who are seeking a broad, liberal education. The extent to which such courses do not exist is a credit to the good sense of professional mathematicians. Mathematics is a big and difficult subject. It embraces a rigid method of reasoning, a concise form of expression, and a variety of new concepts and viewpoints which are quite different from those encountered in everyday life. There is no such thing as "descriptive" mathematics. In order to find answers to the questions "What is mathematics?" and "What do mathematicians do?", it is necessary to learn something of the logic, the language, and the philosophy of mathematics. This cannot be done by listening to a few entertaining lectures, but only by active contact with the content of real mathematics. It is the authors' hope that this book will provide the means for this necessary contact.

For most people, the road from marketplace arithmetic to the border of real mathematics is long and steep. It usually takes several years to make this journey. Fortunately, because of the improving curriculum in high schools, many students are completing the elementary mathematics included in algebra, geometry, and trigonometry before entering college, so that as college freshmen they can begin to appreciate the attractions of sophisticated mathematical ideas.
Many of these students have even been exposed to the new programs for school mathematics which introduce modern mathematical ideas and methods. Too often, such students are shunted into a college algebra or elementary calculus course, where the main emphasis is on mathematical formalism and manipulation. Any enthusiasm for creative thinking which a student may carry into college will quickly be blunted by such a course.

It is often claimed that the manipulative skills acquired in elementary algebra and calculus are what a student needs for the application of mathematics to science and engineering, and indeed to the practical problems of life. Although not altogether wrong, this argument overlooks the obvious fact that in almost any situation, the ability to use mathematical technique and reasoning is more valuable than the ability to manipulate and calculate accurately.
Elementary college algebra and calculus courses usually cultivate manipulation at the expense of logical reasoning, and they give the student almost no idea of what mathematics is really like. It is often painfully evident to an instructor in, say, a senior level course in abstract algebra that the average mathematics major in his class has a very distorted idea of the nature of mathematics.

The object of this book is to present in a form suitable for student consumption a small but important part of real mathematics. It is concerned with topics related to the principal number systems of mathematics. The book treats those topics of algebra which are basic for advanced studies in mathematics and of fundamental importance for all working mathematicians. This is the reason that we have entitled our book "The Algebraic Foundations of Mathematics." In accord with the philosophy that students should be taught mathematics by exposing them to the mathematics of professional mathematicians, the book should be useful not only to students majoring in mathematics, but also to adequately prepared students of any speciality.

Since mathematics is a logical science, it is appropriate that any book on real mathematics should emphasize mathematical proofs. The student who masters the technique and acquires the habit of mathematical proof is well on his way toward understanding the nature of mathematics. Such a mastery is hard to achieve, but it is within the reach of a large percentage of the college population.

This book is not intended to be an easy one. It is not meant for the college freshman with minimum preparation from high school. An apt student with three years of high school mathematics should be able to study most parts of the book with profit, but his progress may not be rapid.
Appropriate places for the use of this book include: a freshman course to replace the standard precalculus college algebra for students who will progress to a rigorous treatment of calculus, a terminal course for liberal arts students with a good background in mathematics, an elementary honors course for mathematics majors, a course to follow a traditional calculus course to develop maturity, and a refresher course for high school mathematics teachers.

The book is written in such a way that the law of diminishing returns will not set in too quickly. That is, enough difficult material is included in most sections and chapters so that even the best students will be challenged. The student of more modest ability should keep this in mind in order to combat discouragement.

Some sections digress from the main theme of the book. These are designated by a "star." For the most part, starred sections can be omitted without loss of continuity, although it may be necessary to refer to them for definitions. It should be emphasized that the starred sections are not the most difficult parts of the book. On the contrary, much of the material
in these sections is very elementary. A star has been attached to just those sections which are not sufficiently important to be considered indispensable, but which are still too interesting to omit.

The complete book can be covered in a two-semester or three-quarter course meeting three hours per week. The following table suggests how the book can be used for shorter courses.

Course: College algebra
Time required: 1 semester, 3 hours; 1 quarter, 5 hours

Chapters:
  1 (Omit 1-3, 1-5, and starred sections)
  2 (Omit starred sections)
  4 (Omit 4-1)
  5 (Omit starred sections)
  6 (Omit 6-1, 6-4, 6-5)
  7 (Omit 7-1, 7-2, 7-3, 7-6, and starred sections)
  8
  9 (Omit starred section)
  10 (Omit 10-4)

Chapters:
  1 through 8 (Omit starred sections)

Chapters:
  4 (Omit 4-1, 4-3, 4-5, 4-6)
  5 (Omit starred sections)
  6 (Omit 6-1, 6-4, 6-5)
  8
  9 (Omit starred section)
  10 (Omit 10-3, 10-4)
Above all, this book represents an effort to show college students some of the real beauty of mathematics. The appreciation of mathematical beauty is not like the enjoyment of literature, music, and other art forms. It requires serious effort and hard study. It is much more difficult for a mathematician to explain his triumphs and masterpieces than for any other kind of artist or scientist. Consequently, most mathematicians do not try to interpret their work to the general public, but only communicate with
colleagues having similar interests. For this reason, a mathematician is often considered to be a rather aloof person who lives partly in this world and partly in some other mysterious realm. This is in fact a fairly accurate conception. However, the door to the world of mathematics is never locked, and anyone who will make the effort can enjoy the beauties of an intellectual domain which comes closer to aesthetic perfection than any other science.

Acknowledgements. Writing a textbook is not a routine chore. Without the help of many people, we might never have finished this one. We are particularly indebted to Professors C. W. Curtis, R. A. Dean, and H. S. Zuckerman, who read most of the manuscript of this book, and gave us many valuable suggestions. Our publisher, Addison-Wesley, has watched over our work from beginning to end with remarkable patience and benevolence. The swift and expert typing of Mary Pierce is sincerely appreciated. Finally, we are grateful to many friends for sincere encouragement during the last two years, and especially to our wives, who have lived with us through these trying times.
CONTENTS

1-1 Sets
1-2 The cardinal number of a set
1-3 The construction of sets from given sets
1-4 The algebra of sets
1-5 Further algebra of sets. General rules of operation
*1-6 Measures on sets
*1-7 Properties and examples of measures

2-1 Proof by induction
2-2 The binomial theorem
2-3 Generalizations of the induction principle
*2-4 The technique of induction
2-5 Inductive properties of the natural numbers
*2-6 Inductive definitions

3-1 The definition of numbers
3-2 Operations with the natural numbers
3-3 The ordering of the natural numbers

4-1 Construction of the integers
4-2 Rings
4-3 Generalized sums and products
4-4 Integral domains
4-5 The ordering of the integers
4-6 Properties of order

5-1 The division algorithm
5-2 Greatest common divisor
5-3 The fundamental theorem of arithmetic
*5-4 More about primes
*5-5 Applications of the fundamental theorem of arithmetic
5-6 Congruences
5-7 Linear congruences
*5-8 The theorems of Fermat and Euler

6-1 Basic properties of the rational numbers
6-2 Fields
6-3 The characteristic of integral domains and fields
6-4
6-5

7-1 Development of the real numbers
7-2 The coordinate line
7-3 Dedekind cuts
7-4 Construction of the real numbers
7-5 The completeness of the real numbers
7-6 Properties of complete ordered fields
*7-7 Infinite sequences
*7-8 Infinite series
*7-9 Decimal representation
*7-10 Applications of decimal representations

8-1 The construction of the complex numbers
8-2 Complex conjugates and the absolute value in C
8-3 The geometrical representation of complex numbers
8-4 Polar representation

9-1 Algebraic equations
9-2 Polynomials
9-3 The division algorithm for polynomials
9-4 Greatest common divisor in F[x]
9-5 The unique factorization theorem for polynomials
9-6 Derivatives
9-7 The roots of a polynomial
9-8 The fundamental theorem of algebra
*9-9 The solution of third- and fourth-degree equations
9-10 Graphs of real polynomials
9-11 Sturm's theorem
9-12 Polynomials with rational coefficients

10-1 Polynomials in several indeterminates
10-2 Systems of linear equations
10-3 The algebra of matrices
10-4 The inverse of a square matrix
INTRODUCTION
As we explained in the preface, the purpose of this book is to exhibit a small, but significant and representative, part of the world of mathematics. The selection of a principal subject for this project poses difficulties similar to those which a blind man faces when he tries to discover the shape of an elephant by means of his "sense of feel." Only a few aspects of the subject are within reach, and it is necessary to exercise care to be sure the part examined is truly representative. We might select some important unifying concept of modern mathematics, such as the notion of a group, and explore the ramifications of this idea. Alternatively, an older and perhaps familiar topic can be examined in depth. It is this last more conservative program which will be followed. We will study the principal number systems of mathematics and some of the theories related to them. An attempt will be made to answer the question "what are numbers?" in a way which meets the standards of logical precision demanded in modern mathematics.

This program has certain dangers. Familiarity with ordinary numbers hides subtle difficulties which must be overcome before it is even possible to give an exact definition of them. Checking the details in the construction of the various number systems is often tedious, especially for a student who does not see the point of this effort. On the other hand, the end products of this work, the real and complex number systems, are objects of great usefulness and importance in mathematics. Moreover, the development of these systems offers an opportunity to exhibit a wide variety of mathematical techniques and ideas, so that the student is exposed to a representative cross section of mathematics.

It is customary in technical books to tell the reader what he will need to know in order to understand the text. A typical description of such requirements in mathematical textbooks runs as follows: "This book has no particular prerequisites.
However, the reader will need a certain amount of mathematical maturity." Usually such a statement means that the book is written for graduate students and seasoned mathematicians. Our prerequisites for understanding this book are more modest. The reader should have successfully completed two years of high-school algebra and a year of geometry. The geometry, although not an absolute prerequisite, will be very helpful. For certain topics in the chapters on the complex numbers and the theory of equations, a knowledge of the rudiments of trigonometry is assumed. We do not expect that the reader will have much "mathematical maturity." Indeed, one of the main purposes of this book is to put the reader in touch with mature mathematics.

Some of the obstacles which a beginning student of mathematics faces seem more formidable than they really are. With a little encouragement almost any intelligent person can become a better mathematician than he would imagine possible. The purpose of the remainder of this introduction is to provide some encouraging words on a variety of subjects. It is hoped that our discussion will smooth the reader's way throughout the book. We suggest that this material be read quickly, then referred to later as it is needed.

The number systems. There are five principal number systems in mathematics: the natural numbers: 1, 2, 3, 4, etc.; the integers: 0, 1, $-1$, 2, $-2$, 3, $-3$, etc.; the rational numbers: 0, 1, $-1$, $\frac{1}{2}$, $-\frac{1}{2}$, etc.; the real numbers: 0, 1, $-1$, $\sqrt{2}$, $-\sqrt{2}$, $\pi$, etc.; and the complex numbers: 0, 1, $i$, $1 + i$, etc. With the possible exception of the complex numbers, each of these systems should be familiar. Indeed, the study of these number systems is the principal subject of arithmetic courses in elementary school and of algebra courses in high school. Of course, the names of these systems may not be familiar. For example, the integers are sometimes called whole numbers, and the rational numbers are often referred to as fractions.

In this book the number systems will be considered at two levels. On the one hand, we will assume at least a superficial knowledge of numbers, and use them in examples from the first chapter on. On the other hand, Chapters 3, 4, 6, 7, and 8 each present a critical study of one of these systems. The reader has two alternatives. He can either skim the material in these chapters, relying on the knowledge of numbers which he already possesses, or he can study these chapters in detail. The latter road is longer and more tedious, but it leads to a very solid foundation for advanced courses in mathematical analysis.
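As an informal illustration of the five number systems, the following sketch uses Python's built-in numeric types as stand-ins. The variable names are hypothetical, and the correspondence is an assumption made for illustration only: Python has no separate type for natural numbers, and a `float` merely approximates a real number.

```python
from fractions import Fraction

# Hypothetical stand-ins for the five number systems named in the text.
natural = 3                     # a natural number (stored as a Python int)
integer = -3                    # an integer
rational = Fraction(-3, 2)      # an exact rational number, -3/2
real_approx = 2 ** 0.5          # a float: only an approximation to sqrt(2)
complex_number = complex(0, 1)  # the complex number i

# Every integer is also a rational number, with denominator 1.
assert Fraction(integer) == integer
assert Fraction(integer).denominator == 1

# Rational arithmetic with Fraction is exact.
assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)

# In the complex numbers, i * i = -1.
assert complex_number * complex_number == -1
```

The sketch is only a convenience for experimenting with examples; the careful constructions of these systems are the subject of the later chapters.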
Variables. If a single event can be called the beginning of modern mathematics, then it may possibly be the introduction of variables as a systematic notational device. This innovation, due largely to the French mathematician François Viète (1540-1603), occurred about 1590. Without variables, mathematics would not have progressed very far beyond what we now think of as its "beginnings." By using variables it is possible to express complicated properties of numbers in a very simple way. Basic laws of operation, such as
$x + y = y + x$ and $x \cdot (y \cdot z) = (x \cdot y) \cdot z$

can be stated without using the variables x, y, and z, but the resulting statements lack the clarity of these algebraic identities. For example, the statement that "the product of a number by the sum of two other numbers is equal to the sum of the product of the first number by the second with the product of the first number by the third" is more simply and clearly expressed by the identity

$x \cdot (y + z) = x \cdot y + x \cdot z$.

More complicated laws would be almost impossible to state without using variables. The reader who doubts this should try to express in words a relatively simple identity such as $(x + y)^2 = x^2 + 2xy + y^2$.
The variables which are encountered in high-school algebra courses usually range over systems of numbers; that is, it is intended that these variables stand for real numbers, rational numbers, or perhaps only for integers. However, variable symbols are often useful in other contexts. For example, the symbols l and m in the statement "if l and m are two different nonparallel lines, then l and m have exactly one point in common" are variables, representing arbitrary lines in a plane. In this book variables will be used to denote many kinds of objects. However, in all cases, a variable is a symbol which represents an unspecified member of some definite collection of objects, such as numbers, points, or lines. The given collection is called the range of the variable, and a particular object in the range is called a value of the variable.

The notations used for variables in mathematical literature often puzzle students. In the simplest cases, the letters of the alphabet are used as variable symbols. However, some mathematical statements involve a very large number of variables, and in some cases, even infinitely many. To accommodate the need for many variables, letter symbols with subscripts are usually employed, for example, $x_1$, $x_2$, $x_7$, $y_3$, $z_{15}$, $a_2$, $b_7$, etc. Sometimes double subscripts are more convenient than single ones. Thus, we find expressions such as $x_{1,1}$, $x_{2,7}$, $y_{5,2}$, etc. Variable symbols are often used to denote a subscript on a variable letter. For instance $x_i$, $y_n$, $a_k$, $z_{i,j}$, etc. In these cases, the variable subscript is usually assumed to stand for a natural number, or possibly an integer.

Mathematical language. One of the difficulties in learning mathematics is the language barrier. Not only must the student master many new concepts and the names of these concepts, but he must also learn numerous abbreviations and symbols for common words.
Except for the use of abbreviations, the grammar of mathematics is the same as that of the language in which it is written.
A sentence in mathematical writing is any expression which is a meaningful assertion, either true or false. According to this definition, such formulas as $2 \geq -2$, $1 = 2$, $1 + 1 < 3^2$, and $0 = 0$ must be counted as sentences. Sentences may contain variables. For example, the statement "There is a real number x such that $x^{100} - 25x^{7} + 57x^{5} - 500 = 0$" is an assertion which is either true or false, although it is not obvious which is the case.* There are other expressions of importance in mathematics which cannot be called sentences. These are formulas, such as $x + y = 1$, $x^2 + 2x + 1 = 0$, and $x > 2$, and expressions which have the form of sentences, except that variables occur in place of the subject or object; for example, "x is an integer" and "2 divides n." Expressions such as these are called sentential functions. They have the property that substituting numerical values (or whatever objects the variables represent) for the variables converts them into sentences. For instance, by suitable substitutions for x and y, the sentential function $x + y = 1$ is transformed into sentences such as $1 + 0 = 1$, which is true, and $1 + 1 = 1$, which is false.
It makes no sense to ask whether or not a formula such as $x + y = 1$ is true. For some values of x and y it is true; for others it is not. On the other hand, the formula $x + y = y + x$ has the property that every substitution of numbers for x and y leads to a true sentence. Such a sentential function is usually called an identity. Sentential functions which are not formulas may also have the property of being true for all values of the variables occurring in them. For example, the statement "either $x < y$, or $y \leq x$" has this property of universal validity, provided that it is understood that x and y are variables which range over real numbers. A sentential function which is true for all values of the variables in it is said to be identically true or identically valid (the adjective "identically" is sometimes omitted).

Implications. Many beginning students of mathematics have trouble understanding the idea of logical implication. As many as one-half of all statements in a mathematical proof may be implications, that is, of the form "p implies q," where p and q are sentences or sentential functions.
* It is true.
TABLE 1

Form                            Example
p implies q                     x positive implies that x is nonnegative
q is implied by p               x nonnegative is implied by x being positive
if p, then q                    if x is positive, then x is nonnegative
q if p                          x is nonnegative if x is positive
p only if q                     x is positive only if x is nonnegative
only if q, p                    only if x is nonnegative is x positive
for p it is necessary that q    for x to be positive it is necessary that x be nonnegative
For this reason, it is important to be able to recognize an implication, and to understand what it means. The variety of ways in which mathematicians say "p implies q" is often bewildering to students. The expressions "x = 1 implies x is an integer"; "if x = 1, then x is an integer"; "x = 1 only if x is an integer"; "x = 1 is a sufficient condition for x to be an integer"; and "for x to equal 1, it is necessary that x be an integer" all have the same meaning. Such statements as these occur repeatedly in any book or paper on mathematics. For the reader's convenience, we list in Table 1 some of the forms in which "p implies q" may be written, together with examples of these locutions.

If p and q are both sentences, then the implication "p implies q" is a sentence; if either p, or q, or both p and q are sentential functions, then "p implies q" is a sentential function. In case "p implies q" is a sentence, then its truth is completely determined by the truth or falsity of p and q. Specifically, this implication is true either if p is false or q is true. It is false only if p is true and q is false. For example, "3 = 3 implies 1 < 3" is true, "3 = 2 implies 1 < 2" is true, "3 = 1 implies 1 < 1" is true, but "3 = 3 implies 1 < 1" is false. It may seem strange to consider a sentence "p implies q" to be true even though there is no apparent connection between p and q. The idea which the statement "p implies q" usually conveys is that the validity of the
TABLE 2

Form                                               Example
p is equivalent to q                               x = y is equivalent to x + 1 = y + 1
p if and only if q                                 x = y if and only if x + 1 = y + 1
p is a necessary and sufficient condition for q    x = y is a necessary and sufficient condition for x + 1 = y + 1
p implies q, and conversely                        x = y implies x + 1 = y + 1, and conversely
sentence q is somehow a consequence of the truth of p. It is hard to see how the truth of such an implication as "3 = 1 implies 1 < 1" fits this conception. Our convention concerning the truth of an implication becomes more understandable when we consider how a sentence of the form "p implies q" may be obtained from a sentential function by substitution of numerical values for the variables. For example, the implication "$y + 2 = x$ implies $y < x$" is a sentential function which everyone would agree is identically valid. That is, it is true for all values of x and y. However, by substituting 1 for x and 1 for y, we obtain the sentence "3 = 1 implies 1 < 1," whose truth was previously admitted only with reluctance.

Converse and equivalence. From any statements p and q it is possible to form two different implications, "p implies q" and "q implies p." Each of these implications is called the converse of the other. An implication does not ordinarily have the same meaning as its converse. For example, the converse of the statement "if $n > 0$, then $n^2 > 0$" is the implication "if $n^2 > 0$, then $n > 0$." These assertions obviously have different meanings. In fact, the first statement is identically true, whereas the second statement is not true for all n; for example, $(-1)^2 = 1 > 0$ and $-1 < 0$. If the implication "p implies q" and its converse "q implies p" are both true, then the statements p and q are said to be equivalent. In practice, the notion of equivalence of p and q is most frequently applied when p and q are sentential functions. For example, if x and y are variables which range over numbers, then the formulas $x = y$ and $x + 1 = y + 1$ are equivalent, since "$x = y$ implies $x + 1 = y + 1$" and "$x + 1 = y + 1$ implies $x = y$" are identically valid. There are various ways of saying that two statements p and q are equivalent. Most of these forms are derived from the terminology for implications. Several examples are given in Table 2.
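The convention for the truth of "p implies q" and the difference between an implication and its converse can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper name `implies` and the finite sample of integers are assumptions chosen for illustration, and testing a finite sample is of course not a proof of identical validity.

```python
def implies(p: bool, q: bool) -> bool:
    """Truth value of "p implies q": false only when p is true and q is false."""
    return (not p) or q

# The four sentences discussed in the text.
assert implies(3 == 3, 1 < 3) is True
assert implies(3 == 2, 1 < 2) is True
assert implies(3 == 1, 1 < 1) is True
assert implies(3 == 3, 1 < 1) is False

# "y + 2 = x implies y < x" holds for every substitution in this sample,
# including x = 1, y = 1, which yields "3 = 1 implies 1 < 1".
sample = range(-5, 6)
assert all(implies(y + 2 == x, y < x) for x in sample for y in sample)

# "if n > 0, then n^2 > 0" holds throughout the sample, but its converse fails.
assert all(implies(n > 0, n * n > 0) for n in sample)
counterexamples = [n for n in sample if not implies(n * n > 0, n > 0)]
print(counterexamples)  # the negative integers in the sample, including -1

# x = y and x + 1 = y + 1 are equivalent: each implies the other.
assert all(
    implies(x == y, x + 1 == y + 1) and implies(x + 1 == y + 1, x == y)
    for x in sample for y in sample
)
```

Defining the implication as `(not p) or q` is exactly the rule stated above: the only failing case is a true p with a false q.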
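TABLE 3

p       q       not p   not q   p implies q   not q implies not p
true    true    false   false   true          true
true    false   false   true    false         false
false   true    true    false   true          true
false   false   true    true    true          true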
Contrapositive and inverse. In addition to the implication "p implies q" and its converse "q implies p," two other implications can be formed using p and q. These are "not q implies not p" and "not p implies not q." The implication "not q implies not p" is called the contrapositive of "p implies q," while "not p implies not q" is called the inverse of "p implies q." For example, the contrapositive of the statement "if x = 1, then x is an integer" is the implication "if x is not an integer, then x is not equal to 1." It is easy to see that the contrapositive of "p implies q" is true under exactly the same circumstances that this implication is itself true. The most convincing way to demonstrate this fact is to make a table listing all of the possible combinations of truth values of any two sentences p and q, together with the corresponding truth or falsity of "p implies q" and its contrapositive (Table 3). The entries in the fifth column of Table 3 are determined by the combinations of true and false in the first two columns, while the entries of the last column are determined from the combinations which occur in the third and fourth columns. Of course, the entries of the third column are just the opposite of those in the first column, and a similar relation exists between the fourth and second columns.

The fact that an implication is logically the same as its contrapositive is often very useful in mathematical proofs. Sometimes, rather than proving a statement of the form "p implies q," it is easier to prove the contrapositive "not q implies not p." This is logically acceptable. Also, if we wish to prove that p and q are equivalent, that is, "p implies q" and "q implies p" are valid, it is permissible to establish that "p implies q" and "not p implies not q." This is because "not p implies not q" is the contrapositive of "q implies p." However, beware; it is not correct to claim that if p implies q and not q implies not p, then p is equivalent to q.
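The truth table described above can be generated by enumerating the four combinations of truth values. This is a small sketch under the same convention for implication; the names `implies`, `rows`, and `inverse_differs` are hypothetical and introduced only for this illustration.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Truth value of "p implies q": false only when p is true and q is false."""
    return (not p) or q

# Columns: p, q, not p, not q, "p implies q", "not q implies not p".
rows = [
    (p, q, not p, not q, implies(p, q), implies(not q, not p))
    for p, q in product([True, False], repeat=2)
]

for row in rows:
    print(row)

# The fifth and sixth columns agree in every row, so an implication and its
# contrapositive are true under exactly the same circumstances.
assert all(row[4] == row[5] for row in rows)

# The inverse "not p implies not q" does not always agree with "p implies q".
inverse_differs = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if implies(p, q) != implies(not p, not q)
]
print(inverse_differs)  # [(True, False), (False, True)]
```

The two rows where the inverse disagrees are exactly the cases where p and q have different truth values, which is why proving "p implies q" together with its contrapositive does not establish equivalence.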
Definitions. Simple mathematical proofs often consist of nothing more than showing that the conditions of some definition are satisfied. Nevertheless, beginning students frequently find such arguments difficult to understand. Consider, for example, the problem of showing that 222 is
The logical structure of a mathematical proof may have one of two forms. The direct proof starts from certain axioms or definitions, and proceeds by application of logical rules to the required conclusion. The second method, the so-called indirect proof, is perhaps less familiar, even though it is often used unconsciously in everyday thinking. The indirect proof begins by assuming that the statement to be proved is false. Then, using this assumption, together with the appropriate axioms and definitions, a contradiction of some kind is obtained by means of a logical argument. From this contradiction it is inferred that the statement originally assumed to be false must actually be true.

For example, let us show by an indirect proof that there is no largest natural number. This proof uses three general properties of numbers, which, for our purposes can be considered as axioms:
(a) if n is a natural number, then $n + 1$ is a natural number;
(b) $n < n + 1$;
(c) if $n < m$, then $n \geq m$ is impossible.
Our indirect proof begins with the assumption that the statement to be proved is false, that is, we assume that there is a largest natural number. Let this number be denoted by n. To say that n is the largest natural number means two things:
(i) n is a natural number;
(ii) if m is a natural number, then $n \geq m$.
Applying the rule of detachment to (a) and (i) gives
(iii) $n + 1$ is a natural number.
Substituting $n + 1$ for m in (ii), we obtain
(iv) if $n + 1$ is a natural number, then $n \geq n + 1$.
The rule of detachment can now be applied to (iii) and (iv) to conclude that
(v) $n \geq n + 1$.
However, substituting $n + 1$ for m in (c) gives
(vi) if $n < n + 1$, then $n \geq n + 1$ is impossible.
This, together with (b) and the rule of detachment yields
(vii) $n \geq n + 1$ is impossible.
The statements (v) and (vii) provide the contradiction which completes this typical indirect proof.
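For readers who have access to a proof assistant, the indirect proof above can be mirrored in a few lines. The following is a sketch in Lean 4, not part of the original text. The theorem name `no_largest_nat` is hypothetical, and the library lemma `Nat.not_succ_le_self` plays the combined role of axioms (b) and (c); axiom (a) is built into Lean's natural numbers.

```lean
-- Assume a largest natural number n exists, i.e. every m satisfies m ≤ n.
theorem no_largest_nat : ¬ ∃ n : Nat, ∀ m : Nat, m ≤ n := by
  intro h
  cases h with
  | intro n hn =>
    -- Step (v): substituting n + 1 for m gives n + 1 ≤ n.
    -- Step (vii): `Nat.not_succ_le_self n` says n + 1 ≤ n is impossible.
    exact Nat.not_succ_le_self n (hn (n + 1))
```

The structure follows the text: the assumption is introduced, specialized to $n + 1$, and shown to contradict a known property of the ordering.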
In spite of the elementary character of the logic used by mathematicians, it is a matter of experience that understanding proofs is the most difficult aspect of mathematics. Most people, mathematicians included, must work hard to follow a difficult proof. The statements follow each other relentlessly. Each step requires logical justification, which may not be easy to find. The result of this labor is only the beginning. After the step-by-step correctness of the argument has been checked, it is necessary to go on and find the mathematical ideas behind the proof. Truly, real mathematics is not easy.
CHAPTER 1
SET THEORY
1-1 Sets. The notion of a set enters into all branches of modern mathematics. Algebra, analysis, and geometry borrow freely from elementary set theory and its terminology. Indeed, all of mathematics can be founded on the theory of sets. As is to be expected, an idea with such a wide range of application is quite simple, and any intelligent person can learn enough about set theory for most useful applications of the subject. The central idea of set theory is that of dealing with a collection of objects as an individual thing. Mathematics is not alone in using this idea, and many occurrences of it are found in everyday experience. Thus, for example, one speaks of the Smith family, meaning the collection of people consisting of John Smith, his wife Mary, and their son William. Also, if we referred to Mrs. Smith's wardrobe, we would be treating as a single thing the collection of individual pieces of clothing belonging to Mrs. Smith. The mathematical use of this device of lumping things together into a single entity differs from common usage only in the frequency and systematic manner of its application.
* This statement cannot be considered as a mathematical definition of the term "set." In mathematics, a definition is supposed to completely identify the object being defined. Here we have only supplied the synonym "collection" for the less familiar term "set." The problem of finding a satisfactory mathematical definition is far more difficult than it might seem. The uncritical use of sets can lead to contradictions which are avoided only by imposing restrictions on the naive concept of a set. Finding a definition of "set" which is free from contradictions and which satisfies all mathematical needs has for 75 years been a central problem of the logical foundations of mathematics. Fortunately, these difficult aspects of set theory can be ignored in almost all mathematical applications of the theory.
52 cards, considered as a set, remains the same whether it is in its original package or is shuffled and distributed into four bridge hands.
EXAMPLE 1. The set consisting of the numbers 0 and 1.
EXAMPLE 2. The set of numbers which are roots of the equation $x^2 - x = 0$.
EXAMPLE 3. The set of numbers which are roots of the equation $x^3 - x^2 = 0$.
EXAMPLE 4. The set of numbers $a$ on the real line (Fig. 1-1) which satisfy $-1 \leq a \leq 1$.
EXAMPLE 5. The set of all numbers $x/2$, where $x$ is a real number which satisfies $-2 \leq x \leq 2$.
EXAMPLE 6. The set of all points at a distance less than one from a point $p$ in some plane.
EXAMPLE 7. The set of all points inside a circle of radius one with center at the point $p$ in the plane of the preceding example.
EXAMPLE 8. The set of all circles with center at the point $p$ in the plane of Example 6.
EXAMPLE 9. The set consisting of the single number 0.
EXAMPLE 10. The set which contains no objects whatsoever.
According to our definition of equality of sets, we see that the sets of Examples 1, 2, and 3 are the same. Although 0 occurs as a so-called double root of the equation $x^3 - x^2 = 0$ in Example 3, only its presence or absence matters when speaking of the set of roots. The sets of Examples 4 and 5 are the same, as are those of Examples 6 and 7. If we consider a circle to be the same thing as the set of all of its points, then the elements of the set of Example 8 are themselves sets. Sets of sets will be studied more thoroughly in Section 1-5. The set described in Example 9 contains a single element. Such sets are quite common. It is conventional to regard such a set as an entity which is different from the element which is its only member. Even in ordinary conversation this distinction is often made. If Robert Brown is a bachelor with no known relations, then we would say that the Brown family consists of one member, but we would not say that Mr. Brown consists of one member. The reader may feel that the set of Example 10 does not satisfy the description given in
Definition 1-1.1. However, it is customary in mathematics to interpret the term "collection" in such a way that this notion includes the collection of no objects. Actually, the set containing no elements arises quite naturally in many situations. For instance, in considering the sets of real numbers which are roots of algebraic equations, it would be awkward to make a special case for equations like $x^2 + 1 = 0$, which has no real roots. Because of its importance, the set containing no elements has a special name, the empty set, and it is represented by a special symbol, $\emptyset$. When it is necessary to call attention to the fact that a set $A$ is not the empty set, then we will say that $A$ is nonempty.
One reason that set theory is used in so many branches of mathematics is the versatility of its notation. As anyone who has studied elementary algebra might expect, the letters of the alphabet are used to represent sets. In this book, sets will be represented by capital letters, and the elements of sets will usually be represented by small letters. The statement that an object $a$ is an element of a set $A$ is symbolized by $a \in A$.
We read the expression $a \in A$ as "$a$ is in $A$," or sometimes, "$a$ in $A$." To give a specific example, let $A$ be the set of roots of the equation $x^2 - x = 0$ (Example 2). Then $0 \in A$ and $1 \in A$.
We often wish to express the fact that an object is not contained in a certain set. If $a$ is not an element of the set $A$, we write $a \notin A$
and read this expression "$a$ is not in $A$." Thus if $A$ again is the set of Example 2, we would have $2 \notin A$, $3 \notin A$, $4 \notin A$, etc. It was mentioned earlier that a set is often defined by some property possessed by its elements. There is a very useful notational device in set theory which gives a standard method of symbolizing the set of all objects having a certain property. For instance, the sets of Examples 2 and 4 are respectively written $\{x \mid x^2 - x = 0\}$ and $\{a \mid -1 \leq a \leq 1\}$.
The symbolic form $\{* \mid *\}$ is sometimes called the set builder. In using it, we replace the first asterisk by a variable element symbol ($x$ and $a$ in the examples), and the second asterisk is replaced by a meaningful condition which the object represented by the variable must satisfy to be an element of the set ($x^2 - x = 0$ and $-1 \leq a \leq 1$ in the examples). Thus, the
set builder occurs in the form $\{x \mid \text{condition on } x\}$ (or with some variable other than $x$), and this expression represents the set of all elements which satisfy the stated condition. Often, the totality of possible objects for which the variable stands is evident from the condition required of the variable. For example, if the real roots of algebraic equations are under discussion, then in $\{x \mid x^2 - x = 0\}$, it is clear that $x$ stands for a real number, and that the set consists of the real numbers which satisfy $x^2 - x = 0$. In $\{a \mid -1 \leq a \leq 1\}$, it may not be clear what kind of numbers are allowed as values of the variable. If it is necessary to be more explicit, we would write $\{a \in R \mid -1 \leq a \leq 1\}$,
where $R$ is the set of all real numbers. Similarly, in Example 6, the set builder notation would be $\{q \in P \mid d(p, q) < 1\}$,
where $P$ is the set of all points in some plane and $d(p, q)$ is the distance between points $p$ and $q$ in the plane. Here, the variable $q$ can take as its value any point in the plane $P$, and the set in question consists of those points in $P$ which satisfy $d(p, q) < 1$. Other forms of the set builder notation for this example are
It is often convenient to use general symbols or expressions in place of the variable element in the set builder notation. For example, $\{x^2 \mid x \in R\}$, where $R$ is the set of all real numbers, is the set of all squares of real numbers; $\{x/y \mid x \in N, y \in N\}$, where $N$ is the set of all natural numbers, is the set of all positive fractions. A variation of the set builder notation can be used to denote sets which contain only a few elements. This consists of listing all of the elements of the set between braces. For example, the sets of Examples 1 and 9 would be written $\{0, 1\}$ and $\{0\}$, respectively. It is sometimes convenient to repeat the same element one or more times in the notation for the set. Thus, in Example 3, we might first write the set of roots of $x^3 - x^2 = x \cdot x \cdot (x - 1) = 0$ as $\{0, 0, 1\}$, since 0 is a double root. Of course, by the definition of equality, $\{0, 0, 1\} = \{0, 1\}$. The notation $\{0, 0, 1\}$ conveys no more information about
$x^3 - x^2 = 0$ than $\{0, 1\}$ does. Similarly, $\{a, b, a, a, b\}$ represents the same set as $\{a, b\}$. There is another good reason for allowing repetition of one or more elements in the notation for a set. Consider the example $\{a, b\}$. We can think of this as the set whose members are the letters $a$ and $b$. In mathematical applications, however, it is often convenient to regard $\{a, b\}$ as the set containing variable quantities $a$ and $b$. As such, it would become a specific set if particular values were substituted for $a$ and $b$. For example, if we allow natural numbers to be substituted for $a$ and $b$, then each choice of values for $a$ and $b$ determines a set whose members are these selected natural numbers. In this example, $a$ and $b$ may take on the same value. For instance, if $a = 1$, $b = 1$, then $\{a, b\} = \{1, 1\} = \{1\}$. If we did not allow repetition of the elements in designating sets, the collection of sets $\{a, b\}$ determined by substituting values for $a$ and $b$ would be considerably more difficult to describe. This difficulty would be increased in more complicated examples. Sets containing many elements can often be represented by listing some of the elements between braces and using a sequence of dots to indicate omitted elements. For example, it is clear that
$\{1, 2, 3, \ldots, 2165\}$
represents the set of all natural numbers from 1 to 2165. Some infinite sets can also be represented in this way. For example, $\{1, 2, 3, \ldots\}$
denotes the set of all natural numbers.
DEFINITION 1-1.2. A set $A$ is called a subset of the set $B$ (or $A$ is included in $B$) if every element of $A$ is an element of $B$. It is customary to express the fact that $A$ is a subset of $B$ by writing $A \subseteq B$ or $B \supseteq A$. Any set $A$ is a subset of itself, $A \subseteq A$, according to this definition. If $A \subseteq B$, but $A \neq B$ (that is, $A$ is not the same as the set $B$), then $A$ is called a proper subset of $B$ and in this case we write $A \subset B$ or $B \supset A$. If $A$ is not a subset of $B$, we write $A \not\subseteq B$ or $B \not\supseteq A$.
EXAMPLE 11. The set of all even integers, $2Z = \{0, \pm 2, \pm 4, \ldots\}$, is a proper subset of the set $Z$ of all integers.
EXAMPLE 12. The set of all points at distance less than one from a point $p$ in a plane $P$ is a proper subset of the set of points of $P$ at distance less than or equal to one from $p$.
EXAMPLE 13. $\{0, 1\} \subset \{0, 1, 2, 4\}$.
EXAMPLE 14. $\{a \mid 0 < a \leq 1\} \subset \{a \mid 0 \leq a \leq 1\}$.
EXAMPLE 15. $\emptyset \subseteq A$ for every set $A$.
The reader should carefully check to see that in each of these examples the condition of Definition 1-1.2 is satisfied. The fact that $\emptyset$ is a subset of every set may seem strange. However, it is certainly true according to our definitions: every element of $\emptyset$ is an element of $A$, or in other words, no element of $\emptyset$ can be found which is not in $A$. Since $\emptyset$ has no elements, this condition is certainly satisfied. The inclusion relation has three properties which, although direct consequences of our definition, are quite important. The first of these has already been noted.
THEOREM 1-1.3. For any sets $A$, $B$, and $C$, (a) $A \subseteq A$, (b) if $A \subseteq B$ and $B \subseteq A$, then $A = B$, (c) if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.
Proof. We will prove property (b) in detail, leaving the proof of (c) to the reader. If $A \subseteq B$ and $B \subseteq A$, then every element of $A$ is an element of $B$ and every element of $B$ is an element of $A$. That is, $A$ and $B$ contain exactly the same elements. Thus, by Definition 1-1.1, $A = B$.
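The notions of this section can be tried out on small finite sets. The following is only a sketch in Python, under the assumption that a bounded range of integers stands in for the number line (a program can hold only finitely many elements); the names `candidates`, `is_subset`, and `is_proper_subset` are ours and are introduced purely for illustration.

```python
# Assumption: a bounded range of integers replaces the full set of numbers.
candidates = range(-5, 6)

# Set builder {x | x^2 - x = 0}, restricted to the candidates.
roots = {x for x in candidates if x * x - x == 0}

# Repeating an element in the listing does not change the set.
print({0, 0, 1} == {0, 1} == roots)


def is_subset(a, b):
    """Inclusion: every element of a is an element of b."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """Proper inclusion: a is included in b, but a is not the same set as b."""
    return is_subset(a, b) and a != b


empty = set()
print(is_subset(empty, roots))                  # the empty set is included in every set
print(is_proper_subset({0, 1}, {0, 1, 2, 4}))   # compare Example 13
# Mutual inclusion forces equality, as in part (b) of the theorem.
print(is_subset(roots, {0, 1}) and is_subset({0, 1}, roots) and roots == {0, 1})
```

The function `is_subset` deliberately spells out the definition rather than using Python's built-in comparison of sets, so that the case of the empty set can be seen to follow from the definition itself: a condition imposed on no elements is satisfied.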
Certain sets occur so frequently in mathematical work that it is convenient to use particular symbols to designate them throughout any mathematical paper or book. An example is the practice of denoting the empty set by the symbol $\emptyset$. In this book, the number systems of mathematics, considered as sets, will occur repeatedly. We therefore adopt the following conventions:
$N$ designates the set of all natural numbers: $\{1, 2, 3, \ldots\}$;
$Z$ designates the set of all integers: $\{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$;
$Q$ designates the set of all rational numbers: $\{a/b \mid a \in Z, b \in N\}$;
$R$ designates the set of all real numbers.
This notation, though not universal, would be recognized by most modern mathematicians. Throughout this book, the letters $N$, $Z$, $Q$, and $R$ will not be used to denote any set other than the corresponding ones listed above. In mathematical literature, a considerable amount of variation in notation can be found. The terminology and symbolism introduced in this section will be used in the remainder of this book, but it is by no means
universal. For the reader's convenience, we list some common alternative terminology.
Set: class, ensemble, aggregate, collection.
Element of a set: member of a set, point of a set.
Empty set: void set, vacuous set, null set, zero set.
$\emptyset$: $0$, $\Lambda$.
$a \in A$: $A \ni a$.
$a \notin A$: $a \in' A$, $a \bar{\in} A$, $A \not\ni a$.
$\{* \mid *\}$: $\{* : *\}$, $[* \mid *]$, $[* : *]$.
1. Using the set builder form, write expressions for the following sets.
(a) The set of all even integers.
(b) The set of all integers which are divisible by five.
(c) The set of all integers which leave a remainder of one when divided by five.
(d) The set of all rational numbers greater than five.
(e) The set of all points in space which are inside a sphere with center at the point $p$ and radius $r$.
(f) The set of solutions of the equation $x^3 - 2x^2 - x + 2 = 0$.
2. Tell in words what sets are represented by the following expressions.
(a) $\{x \in N \mid x > 10\}$
(b) $\{x \in Q \mid x - 3 \in N\}$
(c) $\{5, 6, 7, \ldots\}$
(d) $\{a, b, \ldots, y, z\}$
(e) $\{x \mid x = y^2 + z^2, y \in R, z \in R, x^2 - y^2 = (x - y)(x + y)\}$
3. Describe the following sets by listing their elements.
(a) $\{x \mid x^2 = 1\}$
(b) $\{x \mid x^2 - 2x = 0\}$
(c) $\{x \mid x^2 - 2x + 1 = 0\}$
4. List the following collections of sets.
(a) The sets $\{a, b, c\}$, where $a$, $b$, and $c$ are natural numbers less than or equal to 3.
(b) The sets $\{a^2 + a + 1\}$, where $a$ is a natural number less than or equal to 5.
(c) The three element sets $\{a, b, c\}$, where $a$, $b$, and $c$ are integers between 2 and 4.
5. State all inclusion relations which exist between the following sets: $N$, $Z$, $Q$, $R$, the set of all even integers, $\{n \mid n = m^2, m \in Z\}$, $\{x \mid x = y^2, y \in Q\}$, $\{x \mid x = y^2 - n, y \in R, n \in Z\}$.
6. Prove Theorem 1-1.3(c).
7. Prove that if $A \subseteq B$ and $B \subset C$, then $A \subset C$, and if $A \subset B$ and $B \subseteq C$, then $A \subset C$.
1-2 The cardinal number of a set. The simplest and most important classification of sets is given by the distinction between finite and infinite sets. Returning to the examples considered in Section 1-1, the set of Examples 1, 2, and 3, and the sets of Examples 9 and 10 are finite, while the sets of Examples 4 through 8 are infinite. Note that the empty set $\emptyset$ is considered to be finite. It is not altogether easy to explain the difference between a finite and an infinite set, although almost everyone with some experience learns to distinguish finite from infinite sets. Roughly speaking, a finite set is either the empty set, or a set in which we can designate a first element, a second element, a third element, and so on, until at some stage we reach an $n$th element and find that there are no more left. Of course, the number $n$ of elements in the set may be one, two, three, four, . . . , a million, or any natural number whatsoever. A set is said to be infinite if it is not finite, that is, if its elements cannot be counted. Examples of infinite sets are the set $N$ of all natural numbers, the set $Z$ of all integers, the set $Q$ of all rational numbers, the set $R$ of all real numbers, etc. These examples show that some of the most important sets encountered in mathematics are infinite. If $A$ is a finite set, it is meaningful to speak of the number of elements of $A$. This number is called the cardinal number (or cardinality) of the set $A$. We will use the notation $|A|$ to designate the cardinal number of $A$. As examples,
$|\{0, 1, 4\}| = 3$.
It was remarked above that the empty set $\emptyset$ is regarded as being finite. Since $\emptyset$ has no elements, it is natural to say that the cardinality of $\emptyset$ is zero. Thus, in symbols, $|\emptyset| = 0$. There are many synonyms for the expression "cardinal number of $A$." Besides the term "cardinality of $A$," which we have already mentioned, one finds such expressions as "power of $A$" and "potency of $A$." The notation $|A|$ for the cardinal number of the set $A$ is not universal either. The symbolism $\overline{\overline{A}}$ is perhaps even more common (but difficult to print and type), and such expressions as card $A$ or $N(A)$ can also be found. These descriptions of finite and infinite sets, and of the cardinal number of a finite set, are too vague to be called mathematical definitions. Moreover, we have not said anything about the cardinal numbers of sets which are not finite. The first man to systematically study the cardinal number concept for arbitrary sets (both finite and infinite) was Georg Cantor (1845-1918). His researches have had a profound influence on all aspects of modern mathematics. In the remainder of this section, we will examine one of Cantor's most important ideas, and see in particular how it enables us to explain the concept of a finite set in more exact terms.
DEFINITION 1-2.1. A pairing between the elements of two sets $A$ and $B$ such that each element of $A$ is matched with exactly one element of $B$, and each element of $B$ is matched with exactly one element of $A$, is called a one-to-one correspondence between $A$ and $B$.
The reader should study the following examples to be sure that he fully understands the meaning of the fundamental concept defined in Definition 1-2.1.
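EXAMPLE 1. Let $A = \{1, 2, 3\}$ and $B = \{a, b, c\}$. Then there are six possible one-to-one correspondences between $A$ and $B$:
$1 \leftrightarrow a$, $2 \leftrightarrow b$, $3 \leftrightarrow c$;
$1 \leftrightarrow b$, $2 \leftrightarrow c$, $3 \leftrightarrow a$;
$1 \leftrightarrow c$, $2 \leftrightarrow a$, $3 \leftrightarrow b$;
$1 \leftrightarrow b$, $2 \leftrightarrow a$, $3 \leftrightarrow c$;
$1 \leftrightarrow a$, $2 \leftrightarrow c$, $3 \leftrightarrow b$;
$1 \leftrightarrow c$, $2 \leftrightarrow b$, $3 \leftrightarrow a$.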
EXAMPLE 2. It is impossible to obtain a one-to-one correspondence between the set $A = \{1, 2, 3\}$ and the set $B = \{1, 2\}$. No matter how we try to pair off the elements of $A$ and $B$, we find that more than one element of $A$ must correspond to a single element of $B$. If $1 \leftrightarrow 1$ and $2 \leftrightarrow 2$, then 3 must correspond to 1 or 2 in $B$. In the correspondence $1 \leftrightarrow 1$, $2 \leftrightarrow 2$, $3 \leftrightarrow 1$, the element $1 \in B$ is paired with both $1 \in A$ and $3 \in A$, so that the correspondence is not one-to-one. A similar situation occurs in all possible correspondences between $A$ and $B$. The more general fact that there is no one-to-one correspondence between $\{1, 2, 3, \ldots, m\}$ and $\{1, 2, 3, \ldots, n\}$ if $m < n$ is also true. This can be proved using the properties of the natural numbers, which will be discussed in the next two chapters.
EXAMPLE 3. There is a one-to-one correspondence between the set $Z = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$ of all integers and the set $N = \{1, 2, 3, \ldots\}$ of all natural numbers. The elements of $Z$ and $N$ can be paired off as follows:
Note that in order to construct a one-to-one correspondence between $Z$ and $N$, not all of the numbers of $N$ can be paired with themselves. Otherwise, we would use up all of $N$ and have nothing left to associate with $0, -1, -2, \ldots$.
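One way to make such a pairing concrete is to give a rule which can be computed in both directions. The sketch below (in Python) uses one possible rule, chosen by us for illustration and not necessarily the pairing intended in the text: even natural numbers are matched with positive integers, and odd natural numbers with zero and the negative integers. The names `integer_for` and `natural_for` are hypothetical.

```python
def integer_for(n: int) -> int:
    """Integer matched with the natural number n (n = 1, 2, 3, ...)."""
    half = n // 2
    # Even n goes to the positive integer n/2; odd n goes to -(n - 1)/2.
    return half if n % 2 == 0 else -half


def natural_for(z: int) -> int:
    """Natural number matched with the integer z (the reverse direction)."""
    return 2 * z if z > 0 else 1 - 2 * z


# The first few pairs of the correspondence.
for n in range(1, 8):
    print(n, "<->", integer_for(n))

# Going from N to Z and back returns the starting number, so no two
# natural numbers are matched with the same integer, and conversely.
print(all(natural_for(integer_for(n)) == n for n in range(1, 1000)))
print(all(integer_for(natural_for(z)) == z for z in range(-500, 500)))
```

Of course a program can check only finitely many cases; that the rule is a one-to-one correspondence between all of $N$ and all of $Z$ still requires a proof.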
DEFINITION 1-2.2. Let $A$ be a set and let $n$ be a natural number. Then the cardinal number of $A$ is $n$ if there is a one-to-one correspondence between $A$ and the set $\{1, 2, 3, \ldots, n\}$, consisting of the first $n$ natural
numbers. A set $A$ is finite if $A = \emptyset$, or there is a natural number $n$ such that the cardinal number of $A$ is $n$. Otherwise $A$ is called infinite. This definition is no more than a careful restatement of the informal descriptions of finite and infinite sets, and their cardinal numbers, which were given at the beginning of this chapter. The usual practice of writing a finite set in the form $\{a_1, a_2, a_3, \ldots, a_n\}$ (without repetition)
Cantor observed that it is possible to say when two sets $A$ and $B$ have the same number of elements, without referring to the exact number of elements in $A$ and $B$. This idea is illustrated in the following example. Suppose that in a certain mathematics class, every chair in the room is occupied and no students are standing. Then without counting the number of students and the number of chairs, it can be asserted that the number of students in the class is the same as the number of chairs in the room. The reason is obvious; there is a one-to-one correspondence between the set of all students in the class and the set of all chairs in the room.
DEFINITION 1-2.3. Two sets $A$ and $B$ are said to have the same cardinal number, or the same cardinality, or to be equivalent if there exists a one-to-one correspondence between $A$ and $B$.
By Example 1, the two sets $\{1, 2, 3\}$ and $\{a, b, c\}$ have the same cardinality. By Example 3, so do the sets $N$ and $Z$. However, according to Example 2, the sets $\{1, 2, 3\}$ and $\{1, 2\}$ do not have the same cardinal number. In accordance with Definition 1-2.3, the existence of any one-to-one correspondence between $A$ and $B$ is enough to guarantee that $A$ and $B$ have the same cardinal number. As in Example 1, there may be many one-to-one correspondences between $A$ and $B$.
EXAMPLE 4. Every set $A$ is equivalent to itself, since $a \leftrightarrow a$ for $a \in A$ is obviously a one-to-one correspondence of $A$ with itself. If $A$ contains more than one element, then there are other ways of defining a one-to-one correspondence of $A$ with itself. For example, let $A = \{1, 2\}$. Then there are two one-to-one correspondences of $A$ with itself: $1 \leftrightarrow 1$, $2 \leftrightarrow 2$ and $1 \leftrightarrow 2$, $2 \leftrightarrow 1$. Any one-to-one correspondence of a set with itself is called a permutation of the set.
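For small finite sets, all one-to-one correspondences can be listed mechanically, which gives a check on Examples 1, 2, and 4. The following Python sketch assumes that the elements of each set can be sorted (true for numbers and for letters); the function name `correspondences` is ours. It also takes for granted the fact stated in Example 2, that finite sets with different numbers of elements admit no one-to-one correspondence.

```python
from itertools import permutations


def correspondences(a, b):
    """List every one-to-one correspondence between finite sets a and b.

    Each correspondence is returned as a dict sending an element of a
    to the element of b with which it is matched.
    """
    first, second = sorted(a), sorted(b)
    if len(first) != len(second):
        return []  # no pairing can use up both sets exactly
    # Each arrangement of b, read against a in a fixed order, is one pairing.
    return [dict(zip(first, image)) for image in permutations(second)]


for pairing in correspondences({1, 2, 3}, {"a", "b", "c"}):
    print(pairing)                                   # the six pairings of Example 1

print(len(correspondences({1, 2, 3}, {1, 2})))       # Example 2: none exist
print(len(correspondences({1, 2}, {1, 2})))          # Example 4: two permutations
```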
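If $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ are finite sets which both have the cardinal number $n$, then there is a one-to-one correspondence between $A$ and $B$: $a_1 \leftrightarrow b_1$, $a_2 \leftrightarrow b_2$, $\ldots$, $a_n \leftrightarrow b_n$,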
so that $A$ and $B$ are equivalent in the sense of Definition 1-2.3. That is, if $A$ and $B$ are finite sets, then $A$ and $B$ are equivalent if $|A| = |B|$. The important fact to observe about Definition 1-2.3 is that it applies to infinite as well as finite sets. One of Cantor's most remarkable discoveries was that infinite sets can have different magnitudes, that is, in some sense, certain infinite sets are "bigger than" others. To appreciate this fact requires some work. Example 3 has already illustrated the fact that infinite sets which seem to have different magnitudes may in fact have the same cardinality. An even more striking example of this phenomenon is the following one.
EXAMPLE 5. The set $N$ of natural numbers has the same cardinality as the set $F = \{m/n \mid m \in N, n \in N\}$
of all positive rational numbers. This can be seen with the aid of a diagram as in Fig. 1-2. By following the indicated path, each fraction will eventually be passed. If we number the fractions in the order that they are encountered, skipping fractions which are equal to numbers which have been previously passed, we get the desired one-to-one correspondence between $N$ and $F$:
Cantor showed that many important collections of numbers have the same cardinality as the set $N$ of all natural numbers. For example, this is true for the set $A$ of all real algebraic numbers, that is, real numbers $r$ which are solutions of an equation of the form $n_k x^k + n_{k-1} x^{k-1} + \cdots + n_1 x + n_0 = 0$,
where $n_0, n_1, \ldots, n_{k-1}, n_k$ are any integers. The set $A$ includes all rational numbers, since $m/n$ is a root of the equation $nx - m = 0$; $A$ also includes numbers like
$\sqrt{2}$, . . . .
Judging from these examples, one might guess that all infinite sets have the same cardinality. But this is not the case. Cantor proved that it is impossible to give a one-to-one pairing between $N$ and the set $R$ of all real numbers. Later, we will be able to present Cantor's proof that these sets do not have the same cardinal numbers. The fact that the set $A$ of real algebraic numbers has the same cardinality as $N$ and the result that $R$ and $N$ do not have the same cardinal numbers together imply that $R \neq A$. That is, there are real numbers which are not solutions of any equation $n_k x^k + n_{k-1} x^{k-1} + \cdots + n_1 x + n_0 = 0$
with $n_0, n_1, \ldots, n_{k-1}, n_k$ integers. This interesting fact is by no means evident. It is fairly hard to exhibit such a real number, but Cantor's results immediately imply that they do exist. Although Cantor's work on the theory of sets was highly successful in many ways, it raised numerous new and difficult problems. One of these ranks among the three most famous unsolved problems in mathematics (the other two: the Fermat conjecture, which we will describe in Chapter 5, and the Riemann hypothesis, which is too technical to explain in this book). Cantor posed the problem of whether or not there is some infinite set $S$ of real numbers whose cardinality is different from both the cardinality of $N$ and the cardinality of $R$. The conjecture that no such set $S$ exists
is known as the continuum hypothesis. It was first suggested in 1878, and to date, it has been neither proved nor disproved. An infinite set is called denumerable if it has the same cardinality as the set $N$ of all natural numbers. If $S$ is denumerable, then it is possible to pair off the elements of $S$ with the numbers $1, 2, 3, \ldots$. Thus, the elements can be labeled $a_1, a_2, a_3, \ldots$, where $a_n$ is the symbol which stands for the element corresponding to the number $n$. Hence, if $S$ is denumerable, then $S$ can be written $\{a_1, a_2, a_3, \ldots\}$, with the elements of $S$ listed in the form of a sequence. The converse statement is also true. That is, a set which can be designated $\{a_1, a_2, a_3, \ldots\}$ is denumerable (or possibly finite, since distinct symbols might represent the same element of $S$). As we have shown in this section, the set of all integers and the set of all positive rational numbers are examples of denumerable sets. We conclude this section by listing for future reference the following important properties of the equivalence of sets.
(1-2.4). Let $A$, $B$, and $C$ be arbitrary sets. Then (a) $A$ is equivalent to $A$; (b) if $A$ is equivalent to $B$, then $B$ is equivalent to $A$; and (c) if $A$ is equivalent to $B$ and $B$ is equivalent to $C$, then $A$ is equivalent to $C$.
It has already been noted in Example 4 that (a) is satisfied. Property (b) follows from the fact that the definition of a one-to-one correspondence is symmetric. That is, if $A$ and $B$ are interchanged in Definition 1-2.1, the definition says the same thing as before. Thus, a one-to-one correspondence between $A$ and $B$ is a one-to-one correspondence between $B$ and $A$. The proof of (c) is left as an exercise for the reader (see Problem 8).
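The listing of a denumerable set as a sequence $a_1, a_2, a_3, \ldots$ can be carried out explicitly for the positive rational numbers. The Python sketch below is one way to do it, under an assumption of ours: the fractions $m/n$ are visited along the diagonals $m + n = 2, 3, 4, \ldots$, which may differ from the path drawn in the figure of the text. A fraction is skipped when it is not in lowest terms, since it then equals a fraction with a smaller value of $m + n$, which was passed earlier. The name `positive_rationals` is hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def positive_rationals():
    """Yield every positive rational number exactly once."""
    for total in count(2):                  # total = m + n, one diagonal at a time
        for m in range(1, total):
            n = total - m
            if gcd(m, n) == 1:              # skip values already passed, such as 2/2
                yield Fraction(m, n)


# Label the elements a_1, a_2, a_3, ... in the order in which they are met.
for label, value in enumerate(islice(positive_rationals(), 10), start=1):
    print(f"a_{label} = {value}")
```

Since every fraction $m/n$ lies on the diagonal numbered $m + n$, each positive rational number receives a label after finitely many steps, which is what denumerability requires.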
(a) $\{(x, y, z) \mid x \in \{0, 1, 2\}, y \in \{3, 4\}, z \in \{0, 2, 4\}\}$
(b) $\{x \mid x \in Z, x < 5\}$
(c) $\{x \mid x \in N, x^2 - 3 = 0\}$
(d) $\{x \mid x \in Q, 0 < x < 1\}$
2. What is the cardinal number of the following finite sets?
(a) $\{n \mid n \in N, n < 1000\}$
(b) $\{n \mid n \in Z, n^2 \leq 86\}$
(c) $\{n^2 \mid n \in Z, n^2 \leq 36\}$
(d) $\{n \mid n \in N, n^3 \leq 27\}$
(e) $\{n^3 \mid n \in N, n^3 < 27\}$
3. Let $A = \{1, 2, 3, 4\}$ and $B = \{a, b, c, d\}$. List all one-to-one correspondences between $A$ and $B$.
4. Using the method by which we proved that the positive rational numbers have the same cardinality as the set $N$ of natural numbers, indicate how to prove that $N$ has the same cardinal number as the set of all pairs $(m, n)$ of natural numbers. List the pairs which correspond to all numbers up to 21.
5. Prove that the set of all rational numbers $Q$ has the same cardinality as the set of all natural numbers $N$.
6. Let $A$ be the set of all positive real numbers $x$, and let $B$ be the set of all real numbers $y$ satisfying $0 < y < 1$. Show that the pairing $x \leftrightarrow y$, where $y = 1/(1 + x)$, is a one-to-one correspondence between $A$ and $B$.
7. Let $A$ be a denumerable set, and let $B$ be a finite set. Show that the set $S = \{(x, y) \mid x \in A, y \in B\}$ is denumerable.
8. Suppose that sets $A$ and $B$ have the same cardinality, and that sets $B$ and $C$ have the same cardinality. Show that $A$ and $C$ have the same cardinality.
1-3 The construction of sets from given sets. In this section, we will discuss two important methods of constructing sets from given sets. The first process combines two sets $X$ and $Y$ to obtain a set called the product of $X$ and $Y$. The second construction leads from a single set $X$ to another set called the power set of $X$. There are several other methods of building sets from given sets, but they will not be considered in this book. The definition of the product of two sets is based on the concept of an ordered pair of elements. Suppose that $a$ and $b$ denote any objects whatsoever. If the elements $a$ and $b$ are grouped together in a definite order $(a, b)$, where $a$ is the first element and $b$ is the second element, then the resulting object $(a, b)$ is called an ordered pair of elements. Two ordered pairs are the same if and only if they have the same first element and the same second element. Thus we arrive at the following definition.*
DEFINITION 1-3.1. $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$.
EXAMPLE 1. Let $A = \{1, 2, 3\}$. Then the following distinct ordered pairs of elements of $A$ can be formed: $(1, 1)$, $(1, 2)$, $(1, 3)$, $(2, 1)$, $(2, 2)$, $(2, 3)$, $(3, 1)$, $(3, 2)$, $(3, 3)$. Note that $(a, b) = (b, a)$ only if $a = b$. By Definition 1-3.1, this is true in general.
EXAMPLE 2. A man has two pairs of shoes, one brown pair and one black pair. If he dresses in the dark, what are the possible combinations of shoes
* There is a simple way to define an ordered pair in the framework of set theory, namely, for objects $a$ and $b$ let $(a, b) = \{\{a, b\}, a\}$. An ordered pair is then a definite object, and it is possible to prove that $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$. However, we will use the informal description given in the text and regard this property of ordered pairs as a definition.
which he can put on? Let $X$ = {left brown shoe, left black shoe} and $Y$ = {right brown shoe, right black shoe}. Then the set of all possible combinations which the man might wear is the set of all ordered pairs with the first element taken from the set $X$ and the second element taken from the set $Y$, that is, the set of all pairs (left brown shoe, right brown shoe), (left brown shoe, right black shoe), (left black shoe, right brown shoe), (left black shoe, right black shoe).
EXAMPLE 3. The set of all ordered pairs of natural numbers is the set
$S = \{(n, m) \mid n \in N, m \in N\}$.
DEFINITION 1-3.2. Let $X$ and $Y$ be sets. Then the product of the sets $X$ and $Y$ is the set of all ordered pairs $(x, y)$, where $x \in X$ and $y \in Y$. The product of $X$ and $Y$ is denoted by $X \times Y$. Thus, in symbols, $X \times Y = \{(x, y) \mid x \in X, y \in Y\}$.
EXAMPLE 4. The ordered pairs listed in Examples 1, 2, and 3 are exactly the elements of the products $A \times A$, $X \times Y$, and $N \times N$, respectively.
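EXAMPLE 5. Let $U = \{1, 2, 3\}$, $V = \{1, 3\}$, and $W = \{2, 3\}$. Then $U \times V = \{(1, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 3)\}$, and $V \times U = \{(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)\}$. It follows that $U \times V \neq V \times U$. Thus, in forming the product of two sets, the order in which the sets are taken is significant. We also have
$(U \times V) \times W = \{((1, 1), 2), ((1, 3), 2), ((2, 1), 2), ((2, 3), 2), ((3, 1), 2), ((3, 3), 2), ((1, 1), 3), ((1, 3), 3), ((2, 1), 3), ((2, 3), 3), ((3, 1), 3), ((3, 3), 3)\}$,
$U \times (V \times W) = \{(1, (1, 2)), (1, (1, 3)), (1, (3, 2)), (1, (3, 3)), (2, (1, 2)), (2, (1, 3)), (2, (3, 2)), (2, (3, 3)), (3, (1, 2)), (3, (1, 3)), (3, (3, 2)), (3, (3, 3))\}$.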
Note that the elements of $(U \times V) \times W$ are different from all of the elements of $U \times (V \times W)$. In fact, the elements of $(U \times V) \times W$ are ordered pairs whose first element is an ordered pair of numbers, and the second element is a number. In $U \times (V \times W)$ it is just the other way around: the elements are ordered pairs in which the first element is a number and the second element is an ordered pair of numbers. The reader must be careful to make a distinction between $((2, 1), 3)$ and $(2, (1, 3))$, for example.
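The products of Example 5 are small enough to be formed by machine, which gives a check on the remarks just made. The following is a sketch in Python using the standard function `itertools.product`; the helper name `cartesian` is ours, and Python tuples play the part of ordered pairs.

```python
from itertools import product

U, V, W = {1, 2, 3}, {1, 3}, {2, 3}


def cartesian(x, y):
    """The product of x and y: the set of all ordered pairs (a, b)."""
    return {(a, b) for a, b in product(x, y)}


UV = cartesian(U, V)
VU = cartesian(V, U)
print(UV != VU)                 # the order of the factors is significant
print((2, 1) in UV, (2, 1) in VU)

left = cartesian(UV, W)                  # elements of the form ((u, v), w)
right = cartesian(U, cartesian(V, W))    # elements of the form (u, (v, w))
print(left.isdisjoint(right))   # the two products have no element in common
print(len(left), len(right))    # yet they have the same number of elements

print(cartesian(U, set()) == set())      # a product with the empty set is empty
```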
EXAMPLE 6. If $X$ is any set, then $X \times \emptyset = \emptyset \times X = \emptyset$. Indeed, since the empty set contains no element, there cannot be any ordered pair whose first or second element belongs to the empty set.
Even though $U \times V \neq V \times U$ and $(U \times V) \times W \neq U \times (V \times W)$ in Example 5, it is true that $U \times V$ is equivalent to $V \times U$ and $(U \times V) \times W$ is equivalent to $U \times (V \times W)$, as we see by counting the elements in each of these sets. It is easy to prove that these results hold in general.
THEOREM 1-3.3. Let $X$, $Y$, and $Z$ be sets. Then (a) $X \times Y$ is equivalent to $Y \times X$, and (b) $(X \times Y) \times Z$ is equivalent to $X \times (Y \times Z)$.
Proof. We will prove (a) and leave the proof of (b) as an exercise for the reader. According to Definition 1-2.3, we must show that there is a one-to-one correspondence between $X \times Y$ and $Y \times X$. Every element of $X \times Y$ is an ordered pair $(x, y)$, with $x \in X$ and $y \in Y$; every element of $Y \times X$ is an ordered pair $(z, w)$, with $z \in Y$ and $w \in X$. If $(x, y) \in X \times Y$, then $(y, x) \in Y \times X$, so that $(x, y)$ can be matched with $(y, x)$. The pairing $(x, y) \leftrightarrow (y, x)$ is the desired one-to-one correspondence between $X \times Y$ and $Y \times X$.
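The pairing used in this proof can be watched at work on a small example. In the Python sketch below, the sets `X` and `Y` and the function name `swap` are chosen by us for illustration only.

```python
from itertools import product

X = {1, 2, 3}
Y = {"a", "b"}


def swap(pair):
    """Match the ordered pair (x, y) with the ordered pair (y, x)."""
    x, y = pair
    return (y, x)


XY = set(product(X, Y))
YX = set(product(Y, X))

# Every element of Y x X is matched with some element of X x Y ...
print({swap(p) for p in XY} == YX)
# ... and swapping twice returns the original pair, so distinct pairs
# are never matched with the same pair.
print(all(swap(swap(p)) == p for p in XY))
```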
The definition of the product of two sets can be generalized to a finite collection of sets $X_1, X_2, \ldots, X_n$. The product of these sets, denoted by $X_1 \times X_2 \times \cdots \times X_n$, is the set of all ordered strings of elements $(x_1, x_2, \ldots, x_n)$, where $x_i \in X_i$ for $i = 1, 2, \ldots, n$. For example, if $n = 3$, then $X_1 \times X_2 \times X_3 = \{(x_1, x_2, x_3) \mid x_1 \in X_1, x_2 \in X_2, x_3 \in X_3\}$.
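EXAMPLE 7. Let $U$, $V$, and $W$ be the sets defined in Example 5, that is, $U = \{1, 2, 3\}$, $V = \{1, 3\}$, and $W = \{2, 3\}$. Then $U \times V \times W = \{(1, 1, 2), (1, 1, 3), (1, 3, 2), (1, 3, 3), (2, 1, 2), (2, 1, 3), (2, 3, 2), (2, 3, 3), (3, 1, 2), (3, 1, 3), (3, 3, 2), (3, 3, 3)\}$.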
It is possible to generalize Theorem 1-3.3 to products of finite collections of sets (see Problems 6, 7, and 8). We turn now to a second method of obtaining a new set from a given set $X$.
DEFINITION 1-3.4. Let $X$ be any set. The set of all subsets of $X$ is called the power set of $X$, and is denoted by $P(X)$.
Thus, the elements of $P(X)$ are precisely the subsets of $X$. In particular, $\emptyset \in P(X)$ and $X \in P(X)$. For example, if $X = \emptyset$, then $P(X) = \{\emptyset\}$; if $X = \{a\}$, then $P(X) = \{\emptyset, \{a\}\}$.
If $X$ is an infinite set, then it has infinitely many distinct subsets. That is, $P(X)$ is infinite if $X$ is infinite. In fact, if $x \in X$, then $\{x\} \in P(X)$, so that $P(X)$ contains at least as many elements as $X$.

Suppose that $X$ is a finite set. Let $X_1$ be a set which is obtained by adjoining to $X$ a new element $a$ which is not in $X$. That is, the elements of $X_1$ are all of the elements of $X$, together with the new element $a$. Then every subset of $X_1$ either does not contain $a$ and is therefore a subset $A$ of $X$, or else it contains $a$ and is therefore obtained from a subset $A$ of $X$ by adjoining the element $a$ to $A$. Thus, every subset $A$ of $X$ gives rise to two distinct subsets of $X_1$, the set $A$ itself and the set $A_1$ obtained by adjoining $a$ to $A$. Note that all of the sets so constructed are different. That is, $A \neq A_1$, and if $A \neq B$, then $A \neq B_1$, $A_1 \neq B$, and $A_1 \neq B_1$. Therefore, there are just twice as many subsets of $X_1$ as there are subsets of $X$. That is, $|P(X_1)| = 2|P(X)|$. Starting with the empty set $\emptyset$ (for which $|P(\emptyset)| = 1$), it is possible to add elements one by one, doubling the cardinality of the resulting power set each time an element is added, until a set $X$ containing $n$ elements is obtained. Our reasoning shows that the power set of $X$ will contain $2^n$ elements.
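The doubling argument translates directly into a procedure for listing all subsets. The sketch below is one possible rendering; the function name `power_set` is a hypothetical choice, and subsets are represented as `frozenset` objects so that they can themselves be collected.

```python
def power_set(elements):
    """Build P(X) by adjoining the elements of X one at a time."""
    subsets = [frozenset()]          # the power set of the empty set has one element
    for a in elements:
        # Each old subset A yields two subsets: A itself, and A with a adjoined.
        subsets = subsets + [A | {a} for A in subsets]
    return subsets

# Sizes of P(X) for sets X with 0, 1, ..., 5 elements.
sizes = [len(power_set(range(n))) for n in range(6)]
print(sizes)
```

Each pass through the loop doubles the length of the list, which mirrors the identity $|P(X_1)| = 2|P(X)|$.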
THEOREM 13.5. If $X$ is a finite set containing $n$ elements, then $|P(X)| = 2^n$.
There is another way to prove this theorem which is worth examining, since it gives additional information about the number of subsets of a finite set. Let $X$ consist of the distinct elements $a_1, a_2, \ldots, a_n$. With each $a_k$ in $X$, associate a symbol $x_k$, and consider the formal product

$$(1 + x_1)(1 + x_2) \cdots (1 + x_n).$$

If this expression is multiplied out, the result is a sum of distinct products of $x$'s (except the first term, which is 1). There is exactly one such product for every subset $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ of $\{a_1, a_2, \ldots, a_n\}$, namely $x_{i_1} x_{i_2} \cdots x_{i_k}$. The empty set corresponds to 1. For instance, if $X = \{a_1, a_2, a_3\}$, then

$$(1 + x_1)(1 + x_2)(1 + x_3) = 1 + x_1 + x_2 + x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 + x_1 x_2 x_3.$$
Now replace each $x_k$ by the symbol $t$. Then the product becomes $(1 + t)(1 + t) \cdots (1 + t) = (1 + t)^n$, while in its expansion, all products corresponding to sets containing the same number $j$ of elements become $t^j$. In the example $X = \{a_1, a_2, a_3\}$, we obtain $(1 + t)^3 = 1 + t + t + t + t^2 + t^2 + t^2 + t^3 = 1 + 3t + 3t^2 + t^3$. As in this example, all of the terms $t^j$ can be collected into a single expression of the form $N_{j,n} t^j$, where $N_{j,n}$ is precisely the number of subsets of $X$ which have cardinality $j$. Therefore

$$(1 + t)^n = N_{0,n} + N_{1,n} t + N_{2,n} t^2 + \cdots + N_{n,n} t^n. \qquad (11)$$
We can specialize even more by letting $t$ have the value 1. Then the identity (11) becomes

$$2^n = N_{0,n} + N_{1,n} + N_{2,n} + \cdots + N_{n,n}.$$
The sum on the right-hand side of this identity represents the number of subsets containing no elements of $X$ (the empty set), plus the number of subsets containing one element of $X$, plus the number of subsets containing two elements of $X$, and so on, until we reach $N_{n,n}$, the number of subsets containing $n$ elements of $X$. Clearly, this sum is just the total number of subsets of $X$, since every subset contains some number of elements of $X$ between zero and $n$. Thus, we have arrived at the same conclusion as before: there are exactly $2^n$ subsets of a set $X$ with $n$ elements.

By using the binomial theorem of algebra (see Section 22) to expand $(1 + t)^n$, it is possible to squeeze more information from identity (11). We get

$$(1 + t)^n = 1 + nt + \frac{n(n-1)}{2} t^2 + \cdots + \frac{n(n-1) \cdots (n-j+1)}{j!} t^j + \cdots + t^n.$$

The coefficient of $t^j$ is*

$$\frac{n(n-1) \cdots (n-j+1)}{j!} = \frac{n!}{j!\,(n-j)!}.$$
* An exclamation mark (!) following a natural number $n$ denotes the number obtained by multiplying together all the numbers from 1 to $n$. For example, $1! = 1$, $2! = 1 \cdot 2 = 2$, $3! = 1 \cdot 2 \cdot 3 = 6$, $4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$. It is also customary to define $0!$ to be 1. With this convention, the formulas for the binomial coefficients are correct in the cases $j = 0$ and $j = n$.
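Thus identity (11) and the binomial theorem together give

$$N_{0,n} + N_{1,n} t + N_{2,n} t^2 + \cdots + N_{n,n} t^n = 1 + nt + \frac{n(n-1)}{2} t^2 + \cdots + t^n$$

for all numbers $t$. This leads us to expect that the coefficients of the same powers of $t$ on each side of the equation must be equal. That is, $N_{0,n} = 1$, $N_{1,n} = n$, $N_{2,n} = n(n-1)/2$, $\ldots$, $N_{n,n} = 1$, and in general

$$N_{j,n} = \frac{n!}{j!\,(n-j)!}. \qquad (14)$$

It can be proved that if

$$a_0 + a_1 t + \cdots + a_n t^n = b_0 + b_1 t + \cdots + b_n t^n$$

for all values of $t$, then $a_0 = b_0$, $a_1 = b_1$, $\ldots$, $a_n = b_n$. This will justify (14). Thus, our somewhat longer proof of Theorem 13.5 yields the interesting fact that in a set $X$ containing $n$ distinct elements, there are $n!/(j!\,(n-j)!)$ different subsets containing exactly $j$ elements. For example, in a set containing ten elements, there are $10!/(4!\,6!) = 210$ subsets of cardinality 4.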
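The count of subsets by cardinality can be compared with the factorial formula by direct enumeration. This is a minimal sketch; the helper names `count_by_size` and `formula` are hypothetical, and the set $\{1, \ldots, n\}$ stands in for an arbitrary $n$-element set.

```python
from itertools import combinations
from math import factorial

def count_by_size(n):
    """Return [N_0, N_1, ..., N_n], where N_j counts the j-element subsets of {1, ..., n}."""
    X = range(1, n + 1)
    return [sum(1 for _ in combinations(X, j)) for j in range(n + 1)]

def formula(n, j):
    """The value n! / (j! (n - j)!)."""
    return factorial(n) // (factorial(j) * factorial(n - j))

N = count_by_size(10)
print(N[4], formula(10, 4))   # both equal 210
print(sum(N) == 2 ** 10)      # the counts add up to the total number of subsets
```

The enumeration does not use the formula at all, so the agreement is an independent check for this one value of $n$, not a proof.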
1. List the elements of the sets $A \times B$, $B \times A$, $(A \times B) \times C$, and $A \times (B \times C)$, where $A = \{x, y, z, w\}$, $B = \{1, 2\}$, and $C = \{a\}$.
2. Prove Theorem 13.3(b).
3. Let $U = \{1, 2\}$. Prove that $U \times N$ is equivalent to $N$, where $N$ is the set of all natural numbers.
4. Prove that if $U$ is a finite nonempty set and $V$ is a denumerable set, then $U \times V$ is equivalent to $V$.
5. Prove that if $U$ and $V$ are denumerable sets, then the following sets are equivalent: $U$, $V$, $U \times V$, $V \times U$.
6. State the generalization of Theorem 13.3 for a finite collection of sets $X_1, X_2, \ldots, X_n$.
7. Prove that $U \times V \times W$ is equivalent to $(U \times V) \times W$, where $U$, $V$, and $W$ are arbitrary sets.
8. Prove that the following sets are equivalent: $U \times V \times W$, $U \times W \times V$, $V \times U \times W$, $V \times W \times U$, $W \times U \times V$, $W \times V \times U$.
9. For any set $X$, define $X^n = X \times X \times \cdots \times X$ ($n$ factors), where $n$ is a natural number. Show that if $X = \{1, 2\}$ and $Y = \{1, 2, \ldots, n\}$, then $|X^n| = |P(Y)|$.
10. List the elements of $P(X)$, where $X = \{1, 2, 3, 4\}$.
11. Let $X$ be a set with 7 objects. (a) How many subsets of $X$ of cardinality at most three are there? (b) How many subsets of $X$ of cardinality at least three are there?
12. (a) Let $t = -1$ in equation (11) and interpret the meaning of the resulting identity. (b) What is the number of subsets of even cardinality of a set containing $n$ elements?
13. Show that if the sets $A$ and $B$ have the same cardinality, then so do $P(A)$ and $P(B)$.
14. Cantor proved that if $X$ is an infinite set, then $X$ and $P(X)$ do not have the same cardinal number, that is, it is impossible to give a one-to-one correspondence between the elements of $X$ and the elements of $P(X)$. Prove this fact. [Hint: Suppose that $a \leftrightarrow A$ is such a correspondence. Consider the set $\{a \mid a \in X,\ a \leftrightarrow A \text{ and } a \notin A\}$.]
14 The algebra of sets. The ordinary number systems satisfy several important laws of operation, such as $a + b = b + a$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$, and $a \cdot (b + c) = a \cdot b + a \cdot c$. There are also natural operations of combining sets which satisfy rules analogous to these identities. Modern algebra is largely concerned with systems which satisfy various laws of operation, so it is natural that the algebra of sets should be a part of this subject. Our objective in this section is to study the principal operating rules for sets. The first two basic operations of set theory are analogous to addition and multiplication of numbers. They are binary operations, that is, they are performed on a pair of sets to obtain a new set.
DEFINITION 14.1. Let $A$ and $B$ be sets. Define

$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}, \qquad A \cap B = \{x \mid x \in A \text{ and } x \in B\}.$$

The set $A \cup B$ is called the union (or join or set sum) of $A$ and $B$. The set $A \cap B$ is called the intersection (or meet or set product) of $A$ and $B$. As we pointed out in the Introduction, the word "or" in mathematics is interpreted in the inclusive sense, so that the statement "$x \in A$ or $x \in B$" includes the case where $x$ is in both $A$ and $B$. Thus, the union of $A$ and $B$ contains those elements which are in $A$, or in $B$, or in both $A$ and $B$. The intersection of $A$ and $B$ contains those elements which are in both $A$ and $B$.
These sets can be illustrated by means of simple pictures called Venn diagrams. The elements of the sets are represented by the points inside a closed curve in the plane. It should be emphasized that these diagrams are only symbolic, and that the elements of the sets which they represent are not necessarily points in the plane, but can be any objects whatsoever. In Fig. 13, the total shaded area is $A \cup B$ and the doubly shaded area is $A \cap B$.
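EXAMPLE 1. $\{1, 3, 4, 5, 7\} \cup \{2, 3, 6\} = \{1, 2, 3, 4, 5, 6, 7\}$.
EXAMPLE 2. $\{1, 3, 4, 5, 7\} \cap \{2, 3, 6\} = \{3\}$.
EXAMPLE 3. $\{1, 3, 4, 5, 7\} \cap \{2, 6\} = \emptyset$.
EXAMPLE 4. $\{a \mid 0 < a < 1\} \cup \{0, 1\} = \{a \mid 0 \le a \le 1\}$.
EXAMPLE 5. $\{a \mid 0 < a < 1\} \cap \{a \mid \ldots < a < 2\} = \{a \mid \ldots < a < 1\}$.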
EXAMPLE 6. If $A$ is the set of all points of a line through the point $p$ and if $B$ is the set of all points of a second (different) line through $p$, then $A \cap B = \{p\}$.
In most mathematical applications of set theory, all of the sets under consideration will be subsets of some particular set $X$. This set, called the universal set, may be different for different problems, but it will usually be fixed throughout any discussion. For the purposes of developing the algebra of sets, we will fix a universal set $X$ once and for all. All of the sets under consideration are assumed to be subsets of $X$.

The third basic operation of set theory is analogous to forming the negative of a number. It is a unary operation, that is, it is performed on a single set to obtain a new set.
DEFINITION 14.2. Let $A$ be a subset of $X$. Define

$$A^c = \{x \mid x \in X,\ x \notin A\}.$$

The set $A^c$ is called the complement of $A$ in $X$ (or simply the complement of $A$ if it is understood that $A$ is being considered as a subset of the universal set $X$).
Thus, $A^c$ consists of those elements of $X$ which are not elements of $A$. There are many different notations in mathematical literature for the complement of a set $A$. Some which the reader may encounter are $A'$, $\bar{A}$, $C(A)$, and $c(A)$. In Fig. 14, the shaded area represents $A^c$.
= =
(2, 4, 5).
(2).
Then {ala
<
O)"
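THEOREM 14.3. Let $A$, $B$, and $C$ be subsets of $X$. Then the following identities are satisfied:
(a) $A \cup B = B \cup A$, $A \cap B = B \cap A$;
(b) $A \cup (B \cup C) = (A \cup B) \cup C$, $A \cap (B \cap C) = (A \cap B) \cap C$;
(c) $A \cup A = A$, $A \cap A = A$;
(d) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;
(e) $A \cup A^c = X$, $A \cap A^c = \emptyset$;
(f) $A \cup X = X$, $A \cap \emptyset = \emptyset$;
(g) $A \cup \emptyset = A$, $A \cap X = A$.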
Proof. All of these identities are simple consequences of the definition of union, intersection, and complement. We will illustrate this assertion by giving the detailed proofs of (b) and (d). The remaining identities are left for the reader to check.

Suppose that $x \in A \cup (B \cup C)$. Then according to Definition 14.1, either $x \in A$ or $x \in B \cup C$. Suppose that $x \in A$. Then by Definition 14.1 again, $x \in A \cup B$. Again, by 14.1, $x \in (A \cup B) \cup C$. On the other hand, if $x \in B \cup C$, then either $x \in B$ or $x \in C$. If $x \in B$, then $x \in A \cup B$, and consequently $x \in (A \cup B) \cup C$. If $x \in C$, then we conclude immediately that $x \in (A \cup B) \cup C$. Thus, in every case, if $x \in A \cup (B \cup C)$, then $x \in (A \cup B) \cup C$. By Definition 11.2, this means that $A \cup (B \cup C) \subseteq (A \cup B) \cup C$. A similar argument shows that $(A \cup B) \cup C \subseteq A \cup (B \cup C)$. Hence, $A \cup (B \cup C) = (A \cup B) \cup C$. This proves the first half of (b).

If $x \in A \cap (B \cap C)$, then by Definition 14.1, $x \in A$ and $x \in B \cap C$. Thus, $x \in A$, $x \in B$, and $x \in C$. Consequently, $x \in A \cap B$ and $x \in C$. Therefore $x \in (A \cap B) \cap C$. Hence, $A \cap (B \cap C) \subseteq (A \cap B) \cap C$. Similarly, $(A \cap B) \cap C \subseteq A \cap (B \cap C)$. This shows that $A \cap (B \cap C) = (A \cap B) \cap C$.

To prove the first equality of (d), suppose that $x \in A \cap (B \cup C)$. Then $x \in A$ and $x \in B \cup C$. Hence, either $x \in A$ and $x \in B$, or $x \in A$ and $x \in C$. That is, either $x \in A \cap B$ or $x \in A \cap C$. Consequently, $x \in (A \cap B) \cup (A \cap C)$. We have shown that $A \cap (B \cup C) \subseteq (A \cap B) \cup (A \cap C)$. On the other hand, suppose that $x \in (A \cap B) \cup (A \cap C)$. Then either $x \in A \cap B$ or $x \in A \cap C$. If $x \in A \cap B$, then $x \in A$ and $x \in B$, so that $x \in A$ and $x \in B \cup C$. Therefore $x \in A \cap (B \cup C)$. Similarly, if $x \in A \cap C$, then $x \in A \cap (B \cup C)$. Hence, in any case, $x \in A \cap (B \cup C)$. We have shown that

$$(A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C).$$

This inclusion relation, combined with the one obtained above, yields

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C).$$
Let us illustrate by a Venn diagram the identity (d) which we have just proved. The heavily outlined region in Fig. 15 represents either side of the identity. The reader should illustrate the other identities of Theorem 14.3 by Venn diagrams.

The identities (a) through (g) in Theorem 14.3 are the basic rules of operation in the algebra of sets. By algebraic manipulations alone, it is possible to derive from these numerous other laws of operation.
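For a small universal set, the identities (a) through (g) can also be confirmed by exhaustive checking. The sketch below assumes a three-element universal set `X` (an arbitrary choice) and uses hypothetical helper names `comp` and `theorem_14_3_holds`; such a check illustrates the identities but does not replace the proofs.

```python
from itertools import combinations, product

X = frozenset({1, 2, 3})   # a small universal set, chosen only for illustration
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def comp(A):
    """Complement of A in the universal set X."""
    return X - A

def theorem_14_3_holds(A, B, C):
    empty = frozenset()
    return all([
        A | B == B | A, A & B == B & A,                 # (a)
        A | (B | C) == (A | B) | C,                     # (b)
        A & (B & C) == (A & B) & C,
        A | A == A, A & A == A,                         # (c)
        A & (B | C) == (A & B) | (A & C),               # (d)
        A | (B & C) == (A | B) & (A | C),
        A | comp(A) == X, A & comp(A) == empty,         # (e)
        A | X == X, A & empty == empty,                 # (f)
        A | empty == A, A & X == A,                     # (g)
    ])

# Try every triple of subsets of X.
all_ok = all(theorem_14_3_holds(A, B, C) for A, B, C in product(subsets, repeat=3))
print(all_ok)
```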
( C n A) U ( C n B) =
sncl
A n ( B U A)
A
Identities such as those of Example 10 can of course always be obtained directly from the definitions of the set operations, as we did for the proof of Theorem 14.3. However, identities which involve several sets can usually be derived more easily by algebraic manipulations.

THEOREM 14.4. Let $A$, $B$, and $C$ be sets.
(a) $A \subseteq A \cup B$, $B \subseteq A \cup B$; $A \supseteq A \cap B$, $B \supseteq A \cap B$.
(b) If $A \subseteq C$ and $B \subseteq C$, then $A \cup B \subseteq C$; if $A \supseteq C$ and $B \supseteq C$, then $A \cap B \supseteq C$.
(c) If $A \subseteq B$, then $A \cup C \subseteq B \cup C$ and $A \cap C \subseteq B \cap C$.
(d) $A \subseteq B$ if and only if $A \cap B = A$; $A \supseteq B$ if and only if $A \cup B = A$.

The proofs of the various statements in Theorem 14.4 are again simple applications of the definitions. For example, let us prove the first part of (d). If $A \subseteq B$, then $x \in A$ implies $x \in B$, and hence $x \in A \cap B$. Thus $A \subseteq A \cap B$. If $x \in A \cap B$, then in particular, $x \in A$, so that $A \cap B \subseteq A$. Therefore, $A = A \cap B$. Conversely, if $A = A \cap B$, then every element of $A$ is in $A \cap B$ and, in particular, in $B$. Therefore $A \subseteq B$.

THEOREM 14.5. Let $A$ and $B$ be subsets of the set $X$.
(b) $(A \cup B)^c = A^c \cap B^c$, $(A \cap B)^c = A^c \cup B^c$.
The statements (a), (c), and (d) of Theorem 14.5 should be clear. Let us examine (b). To say that $x \in (A \cup B)^c$ is the same as saying $x \notin A \cup B$, which in turn amounts to $x \notin A$ and $x \notin B$. That is, $x \in A^c$ and $x \in B^c$, which means $x \in A^c \cap B^c$. Thus $(A \cup B)^c$ and $A^c \cap B^c$ contain exactly the same elements, so they are equal. The proof that $(A \cap B)^c = A^c \cup B^c$ is similar. We illustrate the identity $(A \cap B)^c = A^c \cup B^c$ by a Venn diagram. In Fig. 16, the region outside of the doubly shaded region represents each of the sets $(A \cap B)^c$ and $A^c \cup B^c$.
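The two complement identities just discussed can be spot-checked in the same exhaustive way. This is a sketch under the assumption of a four-element universal set; `comp` and `de_morgan_ok` are hypothetical names.

```python
from itertools import combinations

X = frozenset(range(1, 5))   # a small universal set, chosen only for illustration
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def comp(A):
    """Complement of A in X."""
    return X - A

# Check both identities for every pair of subsets of X.
de_morgan_ok = all(
    comp(A | B) == comp(A) & comp(B) and comp(A & B) == comp(A) | comp(B)
    for A in subsets
    for B in subsets
)
print(de_morgan_ok)
```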
1. If the universal set is the collection $N$ of all natural numbers, determine $A \cup B$, $A \cap B$, and $A^c$ in the following cases.
(a) $A = \{n \mid n \text{ is even}\}$, $B = \{n \mid n < 10\}$
(b) $A = \{n \mid n^2 > 2n - 1\}$, $B = \{n \mid n^2 = 2n + 3\}$
(c) $A = \{n \mid (n + 1)/2 \in N\}$, $B = \{n \mid n/2 \in N\}$
2. Prove Theorem 14.3(a), (c), (e), (f), and (g).
3. Justify each step of the computations in Example 10, using the results of Theorem 14.3 where they are needed.
4. Prove the following identities by algebraic manipulations, using the results of Theorem 14.3 and Example 10.
(a) $A \cup (A^c \cap B) = A \cup B$, $A \cap (A^c \cup B) = A \cap B$
(b) $A \cup (B \cap (A \cup C)) = A \cup (B \cap C)$
(c) $((A \cap B) \cup (B \cap C)) \cup (C \cap A) = ((A \cup B) \cap (B \cup C)) \cap (C \cup A)$
5. Illustrate Theorem 14.4(c) by a Venn diagram.
6. Show that if $A \subseteq C$, then $A \cup (B \cap C) = (A \cup B) \cap C$.
7. Prove Theorem 14.5(a), (c), and (d).
8. Using Theorem 14.3(d), (e), (g) and Theorem 14.4(d), show that if $A \cup B = X$, then $A^c \subseteq B$. Also, show that if $A \cap B = \emptyset$, then $B \subseteq A^c$. Thus, show that $B = A^c$ if and only if $A \cup B = X$ and $A \cap B = \emptyset$.
9. Use the result of Exercise 8 to give a new proof of Theorem 14.5(b).
10. Make Venn diagrams to illustrate the following identities.
(a) $A \cap (B \cup (C \cup D)) = ((A \cap B) \cup (A \cap C)) \cup (A \cap D)$
(b) $(A \cup (B \cap C))^c = A^c \cap (B^c \cup C^c)$
11. If $A$ and $B$ are any sets, then the difference between $A$ and $B$ is defined to be $A - B = \{a \in A \mid a \notin B\}$. In particular, if $A$ and $B$ are subsets of some universal set $X$, then $A - B = A \cap B^c$. Show that the following are true.
(a) $(A - B) - C = A - (B \cup C)$
(b) $A - (B - A) = A$
(c) $A - (A - B) = A \cap B$
12. Define $A/B$
=
A ri B
UB
The binary operation $(*/*)$ is called the Sheffer stroke operation.
13. Translate the identities of Theorem 14.3(d) and Theorem 14.4(b) into rules involving only the Sheffer stroke operation.
14. Define $A \oplus B = (A \cap B^c) \cup (A^c \cap B)$. Prove the following.
(a) $A \oplus A = \emptyset$, $A \oplus \emptyset = A$
(b) $(A \oplus B) \oplus C = A \oplus (B \oplus C)$
(c) $A \cap (B \oplus C) = (A \cap B) \oplus (A \cap C)$
15 Further algebra of sets. General rules of operation. It is possible to extend many of the identities in the previous section to theorems concerning operations on any number of sets.
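DEFINITION 15.1. Let $S$ be a set whose elements are subsets of the universal set $X$. Define

$$\cup(S) = \{x \in X \mid x \in A \text{ for some } A \in S\}, \qquad \cap(S) = \{x \in X \mid x \in A \text{ for every } A \in S\}.$$

As in the case of two sets, $\cup(S)$ is called the union of the sets of $S$ and $\cap(S)$ is called the intersection of the sets in $S$. Thus, $\cup(S)$ contains those elements which are in any one or more of the sets in $S$, and $\cap(S)$ contains those elements which are in every set in $S$. For these definitions, $S$ need not be a finite collection of sets (see Example 3 below). In Fig. 17, $S = \{A, B, C, D\}$; $\cup(S)$ is the total shaded area and $\cap(S)$ is the most heavily shaded area, inside the heavy outline.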
EXAMPLE 1. Let $S = \{\{1, 2\}, \{1, 3, 5\}, \{2, 5, 6\}\}$. Then $\cup(S) = \{1, 2, 3, 5, 6\}$ and $\cap(S) = \emptyset$.
EXAMPLE 2. If $S = \{A, B\}$, then $\cup(S) = A \cup B$ and $\cap(S) = A \cap B$.
EXAMPLE 3. Let $C$ be a circle in some plane $P$. Let $S$ consist of all sets $A$ which satisfy the following specifications: the elements of $A$ are all points of $P$ lying on the side containing $C$ of some tangent line to $C$. (See Fig. 18.) Then $\cup(S)$ is the set of all points in $P$, while $\cap(S)$ consists of all points inside $C$.
EXAMPLE 4. Let $S$ be the empty set of subsets of the universal set $X$. Then $\cup(S) = \emptyset$ and $\cap(S) = X$.
The reader should carefully check these examples to be sure that they satisfy the condition of Definition 15.1. Example 4 may perhaps be surprising, but it is nevertheless correct according to the definition. Moreover, the intersection or union of the empty set of sets is often encountered. It would be a nuisance if these operations were undefined in this case.

If $S$ is a collection of sets, and if its member sets can be labeled by the elements of another set $I$, then we write $S = \{C_i \mid i \in I\}$. For example, let $I = N = \{1, 2, 3, \ldots\}$, and let $S$ be the collection of sets

$$\{1\},\ \{1, 2\},\ \{1, 2, 3\},\ \ldots.$$

Then the set $\{1, 2, \ldots, i\}$ can be denoted by $C_i$, and we have $S = \{C_i \mid i \in N\}$. When this notation is used, the set $I$ is called an index set. If $S = \{C_i \mid i \in I\}$, it is customary to write $\bigcup_{i \in I} C_i$ for $\cup(S)$ and $\bigcap_{i \in I} C_i$ for $\cap(S)$.

The identities of Theorem 14.3(b) are called the associative laws for the operations of set union and set intersection. These are special cases of a general associativity principle.

THEOREM 15.2. If the sets of the collection $\{A_1, A_2, \ldots, A_n\}$ are united in any way, two at a time, using each of the sets at least once, the resulting set is equal to

$$\cup(\{A_1, A_2, \ldots, A_n\}).$$
If the sets of this collection are intersected in any way, two at a time, using each set at least once, the resulting set is equal to

$$\cap(\{A_1, A_2, \ldots, A_n\}).$$
The phrases "united in any way, two at a time" and "intersected in any way, two at a time" are somewhat vague. However, the meaning becomes clear if we look at examples. The possible ways to unite the sets $\{A_1, A_2, A_3\}$, using each set once, are:
$A_1 \cup (A_2 \cup A_3)$, $A_1 \cup (A_3 \cup A_2)$, $(A_1 \cup A_2) \cup A_3$, $(A_1 \cup A_3) \cup A_2$,
$A_2 \cup (A_3 \cup A_1)$, $A_2 \cup (A_1 \cup A_3)$, $(A_2 \cup A_3) \cup A_1$, $(A_2 \cup A_1) \cup A_3$,
$A_3 \cup (A_1 \cup A_2)$, $A_3 \cup (A_2 \cup A_1)$, $(A_3 \cup A_1) \cup A_2$, $(A_3 \cup A_2) \cup A_1$.
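Arrangements of this kind can be generated mechanically, which gives a way to test individual instances of Theorem 15.2. The following is a minimal sketch: `groupings` and `evaluate` are hypothetical helper names, the four sample sets are arbitrary, and each arrangement is represented as a nested pair whose leaves are the names of the sets.

```python
from itertools import permutations

def groupings(items):
    """All ways to combine the sequence `items`, in the given order, two at a time."""
    if len(items) == 1:
        return [items[0]]
    out = []
    for k in range(1, len(items)):
        for left in groupings(items[:k]):
            for right in groupings(items[k:]):
                out.append((left, right))
    return out

def evaluate(tree, sets):
    """Unite the named sets according to a nested-pair arrangement."""
    if isinstance(tree, tuple):
        return evaluate(tree[0], sets) | evaluate(tree[1], sets)
    return sets[tree]

# Four arbitrary sample sets (an assumption of this sketch).
sets = {"A1": {1, 2}, "A2": {2, 3}, "A3": {5}, "A4": {1, 7}}
ways = [t for p in permutations(sets) for t in groupings(list(p))]
total = set().union(*sets.values())

print(len(ways))                                       # number of arrangements of four sets
print(all(evaluate(t, sets) == total for t in ways))   # every arrangement gives the same union
```

For three sets the same enumeration yields twelve arrangements, matching the list above; for four sets it yields 120.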
Four sets can be united, two at a time using each set once, in 120 ways: $(A_1 \cup (A_2 \cup A_3)) \cup A_4$, $((A_1 \cup A_2) \cup A_3) \cup A_4$, $A_1 \cup ((A_2 \cup A_3) \cup A_4)$, $(A_1 \cup A_2) \cup (A_3 \cup A_4)$, $A_1 \cup (A_2 \cup (A_3 \cup A_4))$, together with the combinations obtained by interchanging $A_1$, $A_2$, $A_3$, and $A_4$ in various ways. It is a simple chore to prove any individual instance of Theorem 15.2, for example, to show that $(A_1 \cup (A_2 \cup A_3)) \cup A_4 = \cup(\{A_1, A_2, A_3, A_4\})$. This is perhaps the best way for the reader to convince himself that the theorem is true. Of course, this method is not a "proof" and will not satisfy the mathematician who demands a general method which will cover all possible cases at once. A mathematically correct proof of Theorem 15.2 is out of reach until we have discussed inductive proofs. These will be considered in Chapter 2, and a proof of the theorem will be given there. As a consequence of Theorem 15.2, we may adopt the notation

$$A_1 \cup A_2 \cup \cdots \cup A_n = \cup(\{A_1, A_2, \ldots, A_n\}), \qquad A_1 \cap A_2 \cap \cdots \cap A_n = \cap(\{A_1, A_2, \ldots, A_n\}),$$
since the expression $A_1 \cup A_2 \cup \cdots \cup A_n$ is just what is obtained from unions of the type considered by omitting parentheses. The theorem says that the arrangement of parentheses is of no consequence anyway.

It is possible to give a proof now of a theorem which is closely related to Theorem 15.2.

THEOREM 15.3. Let $S$ and $T$ be two sets whose elements are sets. Then
(a) $\cup(S \cup T) = \cup(S) \cup \cup(T)$,
(b) $\cap(S \cup T) = \cap(S) \cap \cap(T)$.
EXAMPLE 5. Let $S = \{\{1, 2, 3\}, \{2, 4\}\}$, and $T = \{\{2, 3\}, \{2, 4\}\}$. Then
$S \cup T = \{\{1, 2, 3\}, \{2, 4\}, \{2, 3\}\}$,
$\cup(S \cup T) = \{1, 2, 3, 4\}$, $\cap(S \cup T) = \{2\}$,
$\cup(S) = \{1, 2, 3, 4\}$, $\cup(T) = \{2, 3, 4\}$, $\cap(S) = \{2\}$, $\cap(T) = \{2\}$,
$\cup(S) \cup \cup(T) = \{1, 2, 3, 4\}$, $\cap(S) \cap \cap(T) = \{2\}$.
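The proof of Theorem 15.3 amounts to a careful examination of Definitions 14.1 and 15.1. Suppose $x \in \cup(S \cup T)$. Then $x \in A$ for some $A \in S \cup T$. But if $A \in S \cup T$, then either $A \in S$ or $A \in T$. If $A \in S$, then $x \in \cup(S)$. If $A \in T$, then $x \in \cup(T)$. In either case, $x \in \cup(S) \cup \cup(T)$. This shows that $\cup(S \cup T) \subseteq \cup(S) \cup \cup(T)$. On the other hand, if $x \in \cup(S)$, then $x \in A$ for some $A \in S$. If $A \in S$, then certainly $A \in S \cup T$. Thus, $x \in \cup(S \cup T)$. Therefore $\cup(S) \subseteq \cup(S \cup T)$. Similarly, $\cup(T) \subseteq \cup(S \cup T)$. Thus $\cup(S) \cup \cup(T) \subseteq \cup(S \cup T)$, by Theorem 14.4(b). If the opposite inclusions are combined, it follows that $\cup(S \cup T) = \cup(S) \cup \cup(T)$. Part (b) of Theorem 15.3 is proved similarly.

The identities (d) of Theorem 14.3 are the distributive laws for the set operations of union and intersection. These laws can also be generalized.

THEOREM 15.4. Let $\{B_i \mid i \in I\}$ be a set of sets, and let $A$ be any set. Then

$$A \cap \Bigl(\bigcup_{i \in I} B_i\Bigr) = \bigcup_{i \in I} (A \cap B_i), \qquad A \cup \Bigl(\bigcap_{i \in I} B_i\Bigr) = \bigcap_{i \in I} (A \cup B_i).$$

Proof. An element $x$ belongs to $A \cap (\bigcup_{i \in I} B_i)$ exactly when $x \in A$ and $x \in B_i$ for some $i \in I$, that is, exactly when $x \in A \cap B_i$ for some $i \in I$. This shows that the elements of $A \cap (\bigcup_{i \in I} B_i)$ are exactly the same as the elements of $\bigcup_{i \in I} (A \cap B_i)$. The other statement is proved in a similar way.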
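The union and intersection of a collection of sets are easy to compute for finite collections, which makes it possible to check Theorem 15.3 on the data of Example 5. The sketch below uses hypothetical function names `big_union` and `big_intersection`; the intersection takes the universal set as an explicit argument so that the empty collection behaves as in Example 4.

```python
from functools import reduce

def big_union(S):
    """Elements lying in at least one member of the collection S."""
    return reduce(lambda acc, A: acc | A, S, frozenset())

def big_intersection(S, X):
    """Elements of the universal set X lying in every member of the collection S."""
    return reduce(lambda acc, A: acc & A, S, frozenset(X))

X = frozenset({1, 2, 3, 4})   # universal set assumed for this illustration
S = {frozenset({1, 2, 3}), frozenset({2, 4})}
T = {frozenset({2, 3}), frozenset({2, 4})}

# Theorem 15.3(a) and (b) for this particular S and T.
print(big_union(S | T) == big_union(S) | big_union(T))
print(big_intersection(S | T, X) == big_intersection(S, X) & big_intersection(T, X))

# The empty collection: the union is empty and the intersection is all of X.
print(big_union(set()), big_intersection(set(), X))
```

Starting the intersection from `X` rather than from some member of `S` is the design choice that makes the empty collection well defined.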
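EXAMPLE 6. In Theorem 15.4, let $I = N = \{1, 2, 3, \ldots\}$. Define $B_i = \{1, 2, \ldots, i\}$ for $i \in I$, and let $A = \{2n \mid n \in N\} = \{2, 4, 6, \ldots\}$. Then

$$\bigcup_{i \in I} B_i = N, \quad \text{and} \quad A \cap \Bigl(\bigcup_{i \in I} B_i\Bigr) = A \cap N = A.$$

We have $A \cap B_i = \{2n \mid n \in N,\ 2n \le i\}$, and $\bigcup_{i \in I} (A \cap B_i) = \{2n \mid n \in N\} = A$. Further, $\bigcap_{i \in I} B_i = \{1\}$, so that $A \cup (\bigcap_{i \in I} B_i) = A \cup \{1\} = \{1, 2, 4, 6, \ldots\}$. Finally,

$$A \cup B_i = \{1, 2, \ldots, i\} \cup \{2, 4, 6, \ldots\}.$$

Therefore,

$$\bigcap_{i \in I} (A \cup B_i) = \{1, 2, 4, 6, \ldots\}.$$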
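A computer cannot hold the infinite sets of this example, but a truncated version shows the same pattern. The sketch below assumes a cutoff `M` (a hypothetical parameter, not part of the example) and replaces $N$ by $\{1, \ldots, M\}$ throughout, so it illustrates Theorem 15.4 only for this finite stand-in.

```python
M = 20                                      # truncation bound: an assumption of this sketch
I = range(1, M + 1)
B = {i: set(range(1, i + 1)) for i in I}    # B_i = {1, ..., i}
A = {n for n in I if n % 2 == 0}            # the even numbers up to M

union_B = set().union(*B.values())
inter_B = set.intersection(*B.values())

# First distributive law: A n (U B_i) = U (A n B_i).
lhs1 = A & union_B
rhs1 = set().union(*(A & B[i] for i in I))

# Second distributive law: A u (n B_i) = n (A u B_i).
lhs2 = A | inter_B
rhs2 = set.intersection(*(A | B[i] for i in I))

print(lhs1 == rhs1 == A)
print(lhs2 == rhs2 == A | {1})
```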
In the ordinary arithmetic of numbers, it is possible to start with a single nonzero number, say 2, and to build from it infinitely many other numbers by addition, subtraction, multiplication, and division. One of the surprising facts about the arithmetic of sets is that only a finite number of different sets can be constructed from a finite number of sets using the operations of union, intersection, and complementation. For example, starting with a set $A$ (contained in the universal set $X$), we obtain the sets $A^c$, $A \cap A = A$, $A \cup A = A$. Thus the first step of the construction yields one new set $A^c$. At the second step, we get $A \cap A^c = \emptyset$, $A \cup A^c = X$, as well as $A$ and $A^c$ again. The next step produces no new sets, nor does any step thereafter. A little calculation will show that the only possible sets which can be constructed from two sets $A$ and $B$ in $X$ are $\emptyset$, $A$, $B$, $A^c$, $B^c$, $A \cap B$, $A^c \cap B$, $A \cap B^c$, $A^c \cap B^c$, $A \cup B$, $A^c \cup B$, $A \cup B^c$, $A^c \cup B^c$, $(A \cap B^c) \cup (A^c \cap B)$, $(A \cap B) \cup (A^c \cap B^c)$, $X$. In this list, the four sets $A \cap B$, $A \cap B^c$, $A^c \cap B$, and $A^c \cap B^c$ are particularly interesting.
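The step-by-step construction described here can be carried out by a short program. This is a sketch under specific assumptions: the universal set `X` has one point in each of the four regions of a two-set Venn diagram, so that none of the regions is empty, and `closure` is a hypothetical helper name.

```python
X = frozenset({1, 2, 3, 4})   # one point in each region of the Venn diagram
A = frozenset({1, 2})
B = frozenset({2, 3})

def closure(generators, X):
    """Apply complement, union, and intersection repeatedly until no new set appears."""
    found = set(generators)
    while True:
        new = {X - S for S in found}
        new |= {S | T for S in found for T in found}
        new |= {S & T for S in found for T in found}
        if new <= found:
            return found
        found |= new

print(len(closure({A}, X)))      # sets obtainable from A alone
print(len(closure({A, B}, X)))   # sets obtainable from A and B
```

With these choices the program finds 4 sets from $A$ alone and 16 from $A$ and $B$, in agreement with the list above. If some region of the diagram were empty, fewer distinct sets would result.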
An examination of the Venn diagram in Fig. 19 indicates why these sets are important. We see that except for $\emptyset$, each set in our list is the union of one, two, three, or all of these fundamental sets. For example,

$$A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B).$$
This is an example of a general theorem which is usually called the disjunctive normal form theorem.
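For three sets $A_1$, $A_2$, $A_3$, the fundamental sets $M_{ijk}$ are the eight intersections
$A_1 \cap A_2 \cap A_3$, $A_1 \cap A_2 \cap A_3^c$, $A_1 \cap A_2^c \cap A_3$, $A_1 \cap A_2^c \cap A_3^c$,
$A_1^c \cap A_2 \cap A_3$, $A_1^c \cap A_2 \cap A_3^c$, $A_1^c \cap A_2^c \cap A_3$, $A_1^c \cap A_2^c \cap A_3^c$.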
By the theorem, every possible nonempty combination of $A_1$, $A_2$, and $A_3$ can be obtained as a union of one, two, three, four, five, six, seven, or all of these $M_{ijk}$. For instance,
The proof of Theorem 15.5, like the proof of Theorem 15.2, can be carried out only by mathematical induction. Since this result will not be needed in later parts of this book, a formal proof will not be given.
1. Suppose that $S = \{A\}$, where $A$ is a set. What is $\cup(S)$? What is $\cap(S)$?
2. Check Theorem 15.2 for the following particular combinations of sets: $(A_1 \cup A_2) \cup (A_3 \cup A_4)$, $A_1 \cup (A_2 \cup (A_3 \cup A_4))$, $(A_1 \cup (A_2 \cup A_3)) \cup A_4$.
3. Prove Theorem 15.3(b).
4. Let $\{A_i \mid i \in I\}$ be a set of subsets of $X$. Show that the following are true.
(a) $(\bigcup_{i \in I} A_i)^c = \bigcap_{i \in I} A_i^c$
(b) $(\bigcap_{i \in I} A_i)^c = \bigcup_{i \in I} A_i^c$
5. What is the largest number of different sets which can be constructed from three subsets $A$, $B$, $C$ of a universal set $X$, using the operations of union, intersection, and complementation? If $\emptyset \subset A \subset B \subset C \subset X$, how many different sets can be constructed?
*16 Measures on sets. One important application of set theory is its use in mathematical statistics. The foundation of statistics is the theory of probability, and in its mathematical form, probability is the study of certain kinds of measures on sets. In this section and the next the concept of measure of a set will be introduced, and some of its simplest properties will be examined.
Several ways of "measuring" sets are already known to the reader. For example, the measure of a line segment (which may be considered as a set of points) is usually taken to be the length of the segment. A good measure of a finite set $A$ is $|A|$, the number of elements in $A$. But there are situations where different measures of line segments and finite sets are more useful. For example, a railroad map usually indicates the route between major cities by a sequence of line segments connecting intermediate points as shown in Fig. 110. Here the length of each line segment is of little interest. The important measure of these segments is the actual rail line distance between the cities corresponding to the points which the segments connect. Another useful measure for these line segments might be the annual cost of upkeep of that section of the rail line which is represented by them. Note that this measure has a natural extension to those subsets of the map which are unions of two or more segments. For example, if $I_1$ represents the part of the rail line between Milwaukee and Chicago and if $I_2$ represents the part between Detroit and Buffalo, then the cost of upkeep of the part of the rail line represented by $I_1 \cup I_2$ would be the cost for $I_1$ plus the cost for $I_2$.

We will now consider a measure for finite sets which links our discussion to the application of set theory to probability. Suppose that a pair of dice, labeled $A$ and $B$, are rolled. Both $A$ and $B$ will come to rest with a number of dots from 1 to 6 on the "up" face. The result of the roll can therefore be represented by an ordered pair $(m, n)$ of natural numbers, where $m$ gives the number of dots on the "up" face of $A$ and $n$ gives the number of dots on the "up" face of $B$. Thus, $m$ and $n$ can be any natural numbers from 1 to 6. If the dice are "honest," then it is reasonable to suppose that for any roll of the dice, the 36 different pairs $(m, n)$ are equally likely to occur.
Now it is customary to define the "point" which is made on any roll of the dice to be the total number of spots on the two "up" faces. Thus, if the outcome of the roll is represented by $(m, n)$, then the point made on the roll is $m + n$. Therefore, the possible points which can be made on a roll of the dice are the numbers from 2 to 12, that is, the set of possible points is $\{2, 3, \ldots, 12\}$. We now assign a measure to the subsets of $\{2, 3, \ldots, 12\}$. If $S$ is such a subset, assign as the measure of $S$ the probability that on a roll of the dice the point made will be a member of $S$. The probability of making a certain point is the ratio of the number of different ways that the point can be made, to 36, the number of possible results of a roll. For example, the probability of making the point 2 is $\frac{1}{36}$, since 2 can be made in only one way, by the roll $(1, 1)$. Thus our measure would assign to the subset $\{2\}$ the number $\frac{1}{36}$. Suppose now that the subset $S$ is the set $\{7\}$. The outcome of the roll will be in $\{7\}$ only if the point made is 7. Since 7 can be made in six possible ways: $(1, 6)$, $(2, 5)$, $(3, 4)$, $(4, 3)$, $(5, 2)$, $(6, 1)$, the probability of making 7 is $\frac{6}{36} = \frac{1}{6}$, and the measure of $\{7\}$ is $\frac{1}{6}$. As another example, take $S = \{7, 11\}$. The point made on a roll will be in this set if it is a 7 or 11. We have seen that there are six ways of making 7. There are two ways of making 11: $(5, 6)$ and $(6, 5)$. Thus, the measure of the set $\{7, 11\}$ is $\frac{8}{36} = \frac{2}{9}$. It is clear now that this "probability measure" can be determined for each of the $2^{11} = 2048$ different subsets of possible points.

Let us now look for some common properties of the measures described in the above examples and try to arrive at a suitable mathematical notion of measure. One property is immediately evident. In each case, there is a rule for assigning a certain number to various subsets of a given set. In the example of a railroad map, two different measures were suggested for line segments making up the map. The second of these measures, the cost of upkeep, was actually defined for unions of segments of the map. In both cases, however, the measures are defined only for very special subsets of the whole map. In general, measures need not be defined on all subsets of a given set, but only on some collection of subsets. However, unless these collections satisfy certain "closure" conditions, the measures on them will not be very useful.
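Returning to the dice example, the probability measure on subsets of the possible points can be computed by listing the 36 equally likely rolls. The following is a minimal sketch; the function name `measure` is a hypothetical choice, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs (m, n)

def measure(S):
    """Probability that the point m + n made on a roll belongs to the set S."""
    favorable = sum(1 for (m, n) in rolls if m + n in S)
    return Fraction(favorable, len(rolls))

print(measure({2}), measure({7}), measure({7, 11}))   # 1/36 1/6 2/9
```

Because `measure` accepts any set of points, the same function serves for every one of the subsets of $\{2, 3, \ldots, 12\}$.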
DEFINITION 16.1. Let X be a set. A nonempty collection S of subsets of X is called a ring of subsets of X (or just a ring of sets) if it satisfies the following two conditions.
(a) If $A \in S$ and $B \in S$, then $A \cup B \in S$.
(b) If $A \in S$ and $B \in S$, then $A \cap B^c \in S$.

EXAMPLE 1. If $X$ is any set, then the collection of all subsets of $X$ is a ring of subsets of $X$.
EXAMPLE 2. Let $X$ be an infinite set. Then the collection of all finite subsets of $X$ is a ring of subsets of $X$. Moreover, $X$ is not in this ring.
EXAMPLE 3. Let $S$ be the set of all subsets of $R$ which are finite unions of sets of the type $I = \{x \mid a < x \le b\}$, where $a \in R$ and $b \in R$. Such sets are called half-open intervals. That is, each set of $S$ has the form $I_1 \cup I_2 \cup \cdots \cup I_n$, where $I_1 = \{x \mid a_1 < x \le b_1\}$, $I_2 = \{x \mid a_2 < x \le b_2\}$, $\ldots$, $I_n = \{x \mid a_n < x \le b_n\}$, for some real numbers $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$. Then $S$ is a ring of sets.
The expression "ring of sets" is standard mathematical terminology. It is derived from abstract algebra. The "closure" conditions to which we alluded above are the properties (a) and (b) in Definition 16.1. There are other important closure conditions which are satisfied by rings of sets.

THEOREM 16.2. Let $S$ be a ring of subsets of $X$. Then
(a) $\emptyset \in S$;
(b) If $A \in S$ and $B \in S$, then $A \cap B \in S$;
(c) If $A_1, A_2, \ldots, A_n \in S$, then $A_1 \cup A_2 \cup \cdots \cup A_n \in S$ and $A_1 \cap A_2 \cap \cdots \cap A_n \in S$.
Proof. One of the requirements in Definition 16.1 is that $S$ be nonempty. Thus, there is some subset $A$ of $X$ which belongs to $S$. Consequently, by Definition 16.1(b), $A \cap A^c = \emptyset$ is in $S$. Suppose that $A \in S$ and $B \in S$. Then by Definition 16.1(b), $A \cap B^c \in S$. Now use Definition 16.1(b) again, with $A \cap B^c$ taking the place of $B$. We obtain $A \cap (A \cap B^c)^c \in S$. However, by Theorems 14.5 and 14.3, $A \cap (A \cap B^c)^c = A \cap (A^c \cup B) = (A \cap A^c) \cup (A \cap B) = \emptyset \cup (A \cap B) = A \cap B$. Thus, $A \cap B \in S$. Finally, if $A_1, A_2, \ldots, A_n$ belong to $S$, then using Definition 16.1(a) repeatedly gives $A_1 \cup A_2 \in S$, $A_1 \cup A_2 \cup A_3 = (A_1 \cup A_2) \cup A_3 \in S$, $\ldots$, $A_1 \cup A_2 \cup \cdots \cup A_n \in S$. Similarly, by using repeatedly 16.2(b), which we have just proved, we find that $A_1 \cap A_2 \cap \cdots \cap A_n \in S$.
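For finite collections, the two conditions of Definition 16.1 can be tested directly, and the conclusions of Theorem 16.2 can then be observed. The sketch below is illustrative only: `is_ring` is a hypothetical name, the three sample collections are arbitrary, and the set difference `A - B` plays the role of $A \cap B^c$.

```python
from itertools import combinations

def is_ring(S):
    """Definition 16.1: S is nonempty and closed under A u B and A n B^c."""
    return bool(S) and all((A | B) in S and (A - B) in S for A in S for B in S)

X = frozenset(range(4))
power = {frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)}
small = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
not_ring = {frozenset({0}), frozenset({1})}   # the union {0, 1} is missing

for S in (power, small):
    assert is_ring(S)
    assert frozenset() in S                           # Theorem 16.2(a)
    assert all((A & B) in S for A in S for B in S)    # Theorem 16.2(b)

print(is_ring(not_ring))
```

The collection `small` shows that a ring of subsets of $X$ need not contain $X$ itself.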
There is one more important property that our examples have in common. In the upkeep cost measure on the segments of the railroad map, we noted that if $I_1$ and $I_2$ are distinct segments, then the measure of $I_1 \cup I_2$ is the measure of $I_1$ plus the measure of $I_2$. This is still clearly true if $I_1$ and $I_2$ are replaced by unions of segments, provided that these unions have no segment in common. This additivity property is shared by the probability measure example. Here, the measure was defined for all subsets of the set of points $\{2, 3, \ldots, 12\}$. If $A$ and $B$ are subsets such that no number of $\{2, 3, \ldots, 12\}$ is in both $A$ and $B$ ($A \cap B = \emptyset$), then the measure of $A \cup B$ is the sum of the measures of $A$ and $B$. For example, the measure of $\{7\}$ is $\frac{1}{6}$, the measure of $\{11\}$ is $\frac{1}{18}$, and the measure of $\{7\} \cup \{11\} = \{7, 11\}$ is $\frac{1}{6} + \frac{1}{18} = \frac{2}{9}$. This simple property is the essence of the mathematical notion of measure.

Two sets $A$ and $B$ are said to be disjoint if they have no elements in common, that is, $A \cap B = \emptyset$. A collection of sets is called pairwise disjoint if each pair of different sets in the collection is disjoint. Note that the term "pairwise disjoint" refers to the collection of sets as a whole and not to the individual sets in the collection.
EXAMPLE 4. The sets $\{7, 11\}$ and $\{2, 12\}$ are disjoint.

EXAMPLE 5. The collection of line segments in the railroad map example is pairwise disjoint, provided we agree that each line segment includes its left-hand endpoint, but not its right-hand endpoint.

EXAMPLE 6. Let $A_1, A_2, A_3, \ldots$ be the sets of real numbers $x$ defined by $A_1 = \{x \mid 1 < x \le 2\}$, $A_2 = \{x \mid 2 < x \le 3\}$, $A_3 = \{x \mid 3 < x \le 4\}$, etc. Then the collection $A_1, A_2, A_3, \ldots$ is pairwise disjoint.
DEFINITION 16.3. Let $X$ be a set, and let $S$ be a ring of subsets of $X$. A measure on the collection $S$ is a rule which assigns to each set $A$ in the collection $S$ some real number $m(A)$, subject to the condition that if $A$ and $B$ are disjoint sets in $S$, then
$$m(A \cup B) = m(A) + m(B).$$
We will be concerned principally with measures defined on the set of all subsets of a finite set. For this discussion the following example is important.
EXAMPLE 7. Let $X$ be a set containing $n$ distinct elements $x_1, x_2, \ldots, x_n$. Let $m_1, m_2, \ldots, m_n$ be a sequence of $n$ real numbers. For a nonempty subset $A$ of $X$, define $m(A)$ to be the sum of all those $m_i$ for which $x_i \in A$. If $A = \Phi$, let $m(A) = 0$. For instance, if $n = 3$,
$$m(\Phi) = 0, \quad m(\{x_1\}) = m_1, \quad m(\{x_2\}) = m_2, \quad m(\{x_3\}) = m_3,$$
$$m(\{x_1, x_2\}) = m_1 + m_2, \quad m(\{x_1, x_3\}) = m_1 + m_3, \quad m(\{x_2, x_3\}) = m_2 + m_3,$$
$$m(\{x_1, x_2, x_3\}) = m_1 + m_2 + m_3.$$
It is left to the reader to show that the condition of Definition 16.3 is satisfied, so that a measure is defined on the collection $P(X)$ of all subsets of $X$. Particular cases are worth noting.
(1) If $m_1 = m_2 = \cdots = m_n = 1$, then $m(A) = |A|$, the cardinal number of $A$.
(2) If $m_1 = 1$, $m_2 = \cdots = m_n = 0$, then $m(A) = 1$ if $x_1 \in A$ and $m(A) = 0$ if $x_1 \notin A$. Thus we can say that $m$ measures whether or not $x_1$ is in $A$.
(3) Let $x_1 = 1$, $x_2 = 2$, $\ldots$, $x_n = n$. Let $m_1 = -1$, $m_2 = 1$, $m_3 = -1$, $\ldots$, $m_n = (-1)^n$. Then $m(A)$ is just the number of even numbers in $A$ minus the number of odd numbers in $A$.
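The construction of Example 7 can be tried out numerically. The following is a minimal Python sketch, assuming three elements with arbitrarily chosen weights; the names `weights`, `measure`, and `all_subsets` are illustrative and do not come from the text. It checks the condition of Definition 16.3 on every pair of disjoint subsets, which is a spot check and not a substitute for the proof left to the reader.

```python
from itertools import combinations

# Hypothetical weights m_i attached to the elements x_i (illustrative values only).
weights = {"x1": 2.0, "x2": -1.0, "x3": 0.5}

def measure(subset):
    """m(A): the sum of the weights of the elements of A; the empty sum is 0."""
    return sum(weights[x] for x in subset)

def all_subsets(elements):
    """Every subset of `elements`, returned as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

subsets = all_subsets(weights)

# Additivity: m(A ∪ B) = m(A) + m(B) whenever A and B are disjoint.
additive = all(
    abs(measure(a | b) - (measure(a) + measure(b))) < 1e-12
    for a in subsets for b in subsets if not (a & b)
)
print(additive)
```

Setting every weight to 1 turns `measure` into the cardinality of case (1).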
It is not surprising that there are so many interesting special cases of Example 7, since actually every measure on the collection $P(X)$ of all subsets of a finite set $X$ is of this form. This will become clear after we observe that the additive property of measures has a simple generalization.
THEOREM 16.4. Let $m$ be a measure defined on a ring $S$ of subsets of a set $X$. If $\{A_1, A_2, \ldots, A_n\}$ is a collection of sets in $S$ and this collection is pairwise disjoint, then
$$m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n).$$
If $n = 2$, this theorem is the same as the additivity condition for a measure required in Definition 16.3, namely $m(A_1 \cup A_2) = m(A_1) + m(A_2)$ if $A_1 \cap A_2 = \Phi$. Consider the case $n = 3$. The assertion is
$$m(A_1 \cup A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$
Since the collection $\{A_1, A_2, A_3\}$ is pairwise disjoint, we know in particular that $A_2 \cap A_3 = \Phi$. Since $m$ is a measure, by Definition 16.3, $m(A_2 \cup A_3) = m(A_2) + m(A_3)$. Thus, we have
$$m(A_1) + m(A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$
Now if $A_1$ and $A_2 \cup A_3$ are disjoint, we can apply Definition 16.3 again to the left side of the last equality to obtain the desired result:
$$m(A_1 \cup A_2 \cup A_3) = m(A_1) + m(A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$
By the distributive law for the set operations (Theorem 14.3), we obtain
$$A_1 \cap (A_2 \cup A_3) = (A_1 \cap A_2) \cup (A_1 \cap A_3) = \Phi \cup \Phi = \Phi,$$
so that $A_1$ and $A_2 \cup A_3$ are indeed disjoint. We used here the fact that $A_1 \cap A_2 = \Phi$ and $A_1 \cap A_3 = \Phi$, which is justified by the assumption that $\{A_1, A_2, A_3\}$ is pairwise disjoint.

By repeated application of the argument used in the case $n = 3$, it is possible to see that Theorem 16.4 is true for any $n$. A formal proof of this theorem will not be given here, because such a proof is based on the principle of mathematical induction. The reader should begin to be aware that mathematics leans heavily on this important method of proof, which will be discussed in the next chapter.

Accepting Theorem 16.4, we are ready to examine the assertion that every measure defined on $P(X)$ for a finite set $X$ is of the type given in
Example 7. For simplicity, suppose that $X = \{x_1, x_2, x_3, x_4\}$, where $x_1, x_2, x_3, x_4$ are distinct. Suppose that $m$ is a measure defined on $P(X)$. Then $m_1 = m(\{x_1\})$, $m_2 = m(\{x_2\})$, $m_3 = m(\{x_3\})$, $m_4 = m(\{x_4\})$ are certain real numbers. It is evident that if $A$ is any nonempty subset of $X$, we can write $A = \bigcup_{x_i \in A} \{x_i\}$. For example, $\{x_1, x_2, x_3\} = \{x_1\} \cup \{x_2\} \cup \{x_3\}$. Moreover, if $i \ne j$, then $\{x_i\} \cap \{x_j\} = \Phi$. Thus, the collection of all distinct one element sets $\{x_i\}$, with $x_i \in A$, is pairwise disjoint. Hence, by Theorem 16.4, $m(A)$ is the sum of all $m_i = m(\{x_i\})$ for which $x_i \in A$. For example, if $A = \{x_1, x_2, x_3\}$, then
$$m(A) = m(\{x_1\}) + m(\{x_2\}) + m(\{x_3\}) = m_1 + m_2 + m_3.$$
This argument shows that any measure $m$ on $P(X)$ is a measure of the type described in Example 7, that is, Example 7 is the most general possibility for a measure on the set of all subsets of a finite set. Indeed, starting with a measure $m$ on $P(X)$ for which nothing is assumed except that it satisfies the conditions of Definition 16.3, we have shown that there are numbers $m_i$ corresponding to the distinct elements $x_i \in X$ such that for any nonempty subset $A$ of $X$, the measure $m(A)$ is precisely the sum of those $m_i$ for which $x_i$ is in $A$. But this is just the measure of Example 7, except possibly for $A = \Phi$. However, in the next section we will show that $m(\Phi) = 0$ for every measure.
1. In the dice rolling example, find the measure of the following sets:
7. In a certain game, three pennies are tossed at the same time and points are scored, depending on the outcome of the toss, as follows:
3 heads = 20 points,
2 heads and a tail = 10 points,
Define a probability measure $m$ on the collection of subsets of the possible points $\{20, 15, 10, 5\}$, as was done for the dice rolling example in the text. Find $m(\{20, 15, 10, 5\})$, $m(\{20, 10\})$, and $m(\{5\})$. What is the probability that at least two heads will appear in the outcome of a toss?
8. In a certain card game, two cards are dealt from a standard deck of 52 cards. Aces count 4 points, kings 3 points, queens 2 points, jacks 1 point, and all other cards 0 points. In a given deal, the possible points range from 0 points (neither card is an ace or a face card) to 8 points (a pair of aces). For a subset $A$ of the set $\{0, 1, 2, \ldots, 8\}$ of possible points, define $m(A)$ to be the probability that on a given deal the number of points scored is an element of $A$. Find $m(\{0\})$, n2({%9, $m(\{5, 6, 7, 8\})$, and $m(\{1\})$.

17 Properties and examples of measures. In this section we derive some useful properties of measures.
THEOREM 17.1. Let $m$ be a measure on a ring $S$ of subsets of a set $X$.
(a) $m(\Phi) = 0$.
(b) If $A \in S$ and $A^c \in S$, then $m(A^c) = m(X) - m(A)$.

Proof. (a) Since $\Phi \cap \Phi = \Phi$, the empty set is disjoint from itself (and it is the only set having this property). Thus $m(\Phi) = m(\Phi \cup \Phi) = m(\Phi) + m(\Phi)$. Subtracting the number $m(\Phi)$ from both sides of this equality gives $0 = m(\Phi)$.
(b) By Theorem 14.3(e), $A \cap A^c = \Phi$ and $A \cup A^c = X$. Thus, $A$ and $A^c$ are disjoint, so that $m(A) + m(A^c) = m(A \cup A^c) = m(X)$. Again, by subtraction, $m(A^c) = m(X) - m(A)$.
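THEOREM 17.2. Let $m$ be a measure on a ring $S$ of subsets of a set $X$. If $A \in S$ and $B \in S$, then
$$m(A \cup B) = m(A) + m(B) - m(A \cap B).$$
Proof. The collection $\{A \cap B^c, A^c \cap B, A \cap B\}$ is pairwise disjoint, and
$$A \cup B = (A \cap B^c) \cup (A^c \cap B) \cup (A \cap B), \quad A = (A \cap B^c) \cup (A \cap B), \quad B = (A^c \cap B) \cup (A \cap B).$$
Hence, by Theorem 16.4,
$$m(A \cup B) = m(A \cap B^c) + m(A^c \cap B) + m(A \cap B),$$
$$m(A) = m(A \cap B^c) + m(A \cap B),$$
$$m(B) = m(A^c \cap B) + m(A \cap B).$$
Subtracting the second and third equalities from the first one gives
$$m(A \cup B) - m(A) - m(B) = -m(A \cap B),$$
which when rearranged is the desired identity.

We conclude this chapter by giving some practical examples of measures on finite sets.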
EXAMPLE 1. In a certain class, 40% of the students are blonds, the rest are brunettes, 12% are left-handed, and 5% are both blond and left-handed. Find the percentage of students who are right-handed brunettes. Let $A$ be the subset of blonds in the class and $B$ be the subset of left-handed students. Then $A \cap B$ is the subset of left-handed blonds and $A \cup B$ is the subset of students who are either blond or left-handed. Moreover, $(A \cup B)^c$ is the subset of students who are neither blond nor left-handed, that is, the subset of right-handed brunettes. Recalling that cardinality is a measure on the set of all subsets of a finite set, Theorem 17.2 gives
$$|A \cup B| = |A| + |B| - |A \cap B|.$$
Now if there are $n$ students in the class, and $C$ is any subset of students, then $|C|/n$ gives the fraction of the class which is in $C$ and $100|C|/n$ gives the percentage of the class which is in $C$. Thus, we have
$$\frac{100|A \cup B|}{n} = \frac{100|A|}{n} + \frac{100|B|}{n} - \frac{100|A \cap B|}{n} = 40 + 12 - 5 = 47.$$
Therefore, since 47% of the class is in $A \cup B$, 53% of the class is in $(A \cup B)^c$, that is, are right-handed brunettes.
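The arithmetic of Example 1 can be replayed in a few lines. This is only a sketch of the computation, with illustrative variable names, working directly in percentages (cardinality scaled by $100/n$).

```python
# Percentages from Example 1.
blond = 40          # percent of the class in A
left_handed = 12    # percent of the class in B
both = 5            # percent of the class in A ∩ B

# Theorem 17.2: m(A ∪ B) = m(A) + m(B) - m(A ∩ B).
blond_or_left = blond + left_handed - both

# Theorem 17.1(b): m((A ∪ B)^c) = m(X) - m(A ∪ B), where m(X) = 100 percent.
right_handed_brunettes = 100 - blond_or_left

print(blond_or_left, right_handed_brunettes)  # 47 53
```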
EXAMPLE 2. A certain type of spring balance is constructed so that it measures only weights between one and two pounds. If we have three steaks each of which is known to weigh between $\frac{1}{2}$ and 1 pound, how can we use the spring balance to determine their weights exactly? Let the steaks be denoted by $x_1, x_2, x_3$. For a subset $A$ of $X = \{x_1, x_2, x_3\}$, let $m(A)$ be the total weight of the steaks in the set $A$. Clearly $m$ is a measure on the subsets of $X$. Let $A_1 = \{x_2, x_3\}$, $A_2 = \{x_1, x_3\}$, $A_3 = \{x_1, x_2\}$. Because of our rough knowledge of the weights of $x_1$, $x_2$, and $x_3$, we are certain that $m(A_1)$, $m(A_2)$, and $m(A_3)$ can be accurately determined by the spring balance, since their weights are between 1 and 2 pounds. Now $A_1 \cup A_2 = A_2 \cup A_3 = A_3 \cup A_1 = X$, and $A_1 \cap A_2 = \{x_3\}$, $A_2 \cap A_3 = \{x_1\}$, $A_3 \cap A_1 = \{x_2\}$. Each steak belongs to exactly two of the sets $A_1$, $A_2$, $A_3$, so that $m(A_1) + m(A_2) + m(A_3) = 2m(X)$. Hence,
$$m(X) = \tfrac{1}{2}\,(m(A_1) + m(A_2) + m(A_3)).$$
Therefore,
$$m(\{x_1\}) = m(X) - m(A_1), \quad m(\{x_2\}) = m(X) - m(A_2), \quad m(\{x_3\}) = m(X) - m(A_3).$$
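The weighing scheme of Example 2 can be sketched as a small Python function. The function name `steak_weights` and the sample weights are hypothetical; the sample weights serve only to simulate the three readings of the balance.

```python
def steak_weights(w23, w13, w12):
    """Recover the individual weights from the three pairwise weighings.

    w23 = m(A1) = m({x2, x3}), w13 = m(A2) = m({x1, x3}), w12 = m(A3) = m({x1, x2}).
    """
    total = (w23 + w13 + w12) / 2      # m(X): each steak is weighed exactly twice
    return total - w23, total - w13, total - w12

# Hypothetical true weights in pounds, each between 1/2 and 1.
x1, x2, x3 = 0.75, 0.625, 0.875
print(steak_weights(x2 + x3, x1 + x3, x1 + x2))
```

Each reading passed to the function lies between 1 and 2 pounds, so it is within the range of the balance.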
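It is possible to extend the result of Theorem 17.2 to an identity which involves more than two sets. For example, suppose that $A$, $B$, and $C$ are subsets of $X$ and that $m$ is a measure defined on a ring of subsets of $X$. Then using Theorem 17.2 repeatedly,
$$m(A \cup B \cup C) = m(A) + m(B \cup C) - m(A \cap (B \cup C))$$
$$= m(A) + m(B) + m(C) - m(B \cap C) - m((A \cap B) \cup (A \cap C))$$
$$= m(A) + m(B) + m(C) - m(B \cap C) - [m(A \cap B) + m(A \cap C) - m(A \cap B \cap C)]$$
$$= m(A) + m(B) + m(C) - m(A \cap B) - m(A \cap C) - m(B \cap C) + m(A \cap B \cap C).$$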
EXAMPLE 3. The classification of blood type is made on the presence or absence of three distinct antigens in the blood. These antigens are denoted by A, B, and Rh. The possible blood types are eight in number:
Type
O, Rh negative
O, Rh positive
A, Rh negative
A, Rh positive
B, Rh negative
B, Rh positive
AB, Rh negative
AB, Rh positive
Suppose that in a group of ten people: 4 have antigen A, 5 have antigen B, 6 have antigen Rh, 2 have antigens A and B, 3 have antigens A and Rh, 3 have antigens B and Rh, and 2 have all antigens. Determine the number of people in the group having type O, Rh positive blood. Let $T_A$, $T_B$, $T_{Rh}$ denote the sets of people having the respective antigens A, B, and Rh. The number of people with type O (Rh positive or negative) is
$$|(T_A \cup T_B)^c| = 10 - |T_A \cup T_B| = 10 - (|T_A| + |T_B| - |T_A \cap T_B|) = 10 - (4 + 5 - 2) = 3.$$
The number of people with type O, Rh negative is
$$10 - |T_A \cup T_B \cup T_{Rh}| = 10 - (4 + 5 + 6 - 2 - 3 - 3 + 2) = 1.$$
The number of people with type O, Rh positive is therefore $3 - 1 = 2$. By similar considerations, it is possible to determine the number of people with each of the eight possible blood types.
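The count in Example 3 follows from the two-set and three-set identities for the cardinality measure. A minimal sketch of the arithmetic, with illustrative variable names:

```python
group = 10
a, b, rh = 4, 5, 6             # people with antigen A, B, Rh
ab, a_rh, b_rh = 2, 3, 3       # people with each pair of antigens
a_b_rh = 2                     # people with all three antigens

# Three-set identity: people having at least one antigen.
at_least_one = a + b + rh - ab - a_rh - b_rh + a_b_rh

o_negative = group - at_least_one   # no antigen at all
type_o = group - (a + b - ab)       # neither A nor B (Theorem 17.2)
o_positive = type_o - o_negative

print(at_least_one, o_negative, type_o, o_positive)  # 9 1 3 2
```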
2. Determine $m(A \cup B \cup C \cup D)$ in terms of $m(A)$, $m(B)$, $m(C)$, $m(D)$, $m(A \cap B)$, $m(A \cap C)$, $m(A \cap D)$, $m(B \cap C)$, $m(B \cap D)$, $m(C \cap D)$, $m(A \cap B \cap C)$, $m(A \cap B \cap D)$, $m(A \cap C \cap D)$, $m(B \cap C \cap D)$, and $m(A \cap B \cap C \cap D)$.
3. Show that the empty set $\Phi$ is the only set disjoint from itself.
4. Suppose that a certain spring balance measures only weights between $1\frac{1}{2}$ and 3 pounds. If four steaks are known to weigh between $\frac{1}{2}$ and 1 pound, show how the spring balance can be used to determine their weights exactly.
5. In Example 3, find the number of people with blood types A, Rh negative, A, Rh positive, AB, Rh negative, and AB, Rh positive.
6. Three numbers 1, 2, 3 are written in random order. Assume that each possible ordering is equally likely. What is the probability that at least one of the numbers will occupy its proper place, that is, 1 occurs first, or 2 occurs second, or 3 occurs third?
7. In a certain sample of the population, it is found that lung cancer occurs in 15 cases per 100,000 people. It is estimated that 80% of those with lung cancer smoke and that 65% of those without lung cancer smoke. (These are fictitious estimates.) Determine the approximate ratio of smokers with lung cancer to smokers without lung cancer.
CHAPTER 2
MATHEMATICAL INDUCTION
21 Proof by induction. The essence of mathematics is the construction of logically correct proofs for general theorems. A beginning student is apt to look upon a mathematical proof as a sort of magical incantation which somehow gives truth to a theorem. Nothing could be further from the intention of the person who devises the proof. A proof is worthless if it is not convincing, at least to an intelligent person who makes the effort to understand it.

Generally speaking, there are two steps leading to the understanding of a mathematical proof. The first step is the mechanical checking of the proof to see that each statement follows as a logical consequence of statements which precede it. If the argument survives this test, and if the final statement is the assertion which was to be proved, then it must be admitted that the proof is valid. But to really understand the proof it is necessary to take the second, more difficult, step. One must look at the overall pattern of the argument and discover the basic idea behind it. The ideal is to see the proof through the mind of the person who originated it. Of course, this may require a high degree of mathematical talent, to say nothing of hard work, but the reward in self-satisfaction is substantial, every bit as great, perhaps, as the reward which a musician obtains from mastering a difficult piano or violin sonata.

Fortunately there are a few general methods of constructing mathematical proofs which are both elementary and powerful. Our objective in this chapter is to explore in detail one of the most important of these methods, the so-called proof by mathematical induction.

Mathematical induction must be distinguished from logical induction. Roughly speaking, logical induction is the process of discovering general laws by noting some common feature in a number of special cases. As an example, if the sequence of numbers 1, 4, 9, 16, 25, . . .
is written down, most people who have had some experience with arithmetic will infer by logical induction that the next term in this sequence will be 36. They recognize that 1, 4, 9, 16, and 25 are, respectively, the squares of 1, 2, 3, 4, and 5, so that the natural choice for the next term is $6^2 = 36$. Logical induction, although it is important for the process of mathematical discovery, is of no use in mathematical proofs. On the other hand, mathematical induction is primarily a technique of proof.

If we examine, say, Theorems 15.2, 15.5, and 16.4, we see that they are statements which involve an arbitrary natural number $n$. These statements become specific assertions only when particular numbers are substituted for $n$. For small values of $n$, the statements are quite easy to prove. The difficulty lies in finding a proof which takes care of all values of $n$. This is a situation in which mathematical induction can often be used. We present some examples of mathematical statements which involve an arbitrary natural number $n$.
EXAMPLE 1. $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$.

EXAMPLE 2. If $a \ge -1$, then $(1 + a)^n \ge 1 + na$.
EXAMPLE 3. (Theorem 16.4). Let $m$ be a measure defined on a ring $S$ of subsets of $X$. Let $\{A_1, A_2, \ldots, A_n\}$ be a pairwise disjoint collection of sets in $S$. Then $m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n)$.
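In order to illustrate the mechanism of mathematical induction, consider Example 1. As is often the case, our notation is not well adapted for small values of $n$. For $n = 1, 2, 3$, and 4, the assertions should read
$$1 = \tfrac{1}{2} \cdot 1(1 + 1), \quad 1 + 2 = \tfrac{1}{2} \cdot 2(2 + 1) = 3, \quad 1 + 2 + 3 = \tfrac{1}{2} \cdot 3(3 + 1) = 6, \quad 1 + 2 + 3 + 4 = \tfrac{1}{2} \cdot 4(4 + 1) = 10.$$
For $n$ larger than 4, the notation expresses the asserted identity clearly enough. Thus if allowance is made for the inadequate notation in the cases $n = 1, 2, 3$, and 4, the statement of Example 1, $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$, can be considered as a compact method of writing an infinite sequence of formulas:
$$1 = \tfrac{1}{2} \cdot 1(1 + 1), \quad 1 + 2 = \tfrac{1}{2} \cdot 2(2 + 1), \quad 1 + 2 + 3 = \tfrac{1}{2} \cdot 3(3 + 1), \quad \ldots$$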
A person who is not familiar with the identity which we are considering may be somewhat surprised that the formula works for the values $n = 1, 2, 3$, and 4. But he may be justifiably skeptical that this fact makes the assertion true in general. Let us try $n = 5$. Then $1 + 2 + 3 + 4 + 5 = (1 + 2 + 3 + 4) + 5 = \frac{1}{2} \cdot 4(4 + 1) + 5 = 10 + 5 = 15$. Here we have been able to simplify our calculation somewhat by using the formula which we have already checked for the case $n = 4$. It turns out that this simplification is the real key to the general proof of the formula.
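For $\frac{1}{2} \cdot 4(4 + 1) + 5$ can be expressed as $15 = \frac{1}{2} \cdot 4(4 + 1) + 5 = 5 \cdot (\frac{1}{2} \cdot 4 + 1) = \frac{1}{2} \cdot 5(4 + 2) = \frac{1}{2} \cdot 5(5 + 1)$, the required result. A similar calculation can now be made for $n = 6$, and, using the same simplification, we find that the formula is also correct in this case. In fact, the process of passing from one formula to the next can be formalized if we are willing to use a variable symbol $n$ instead of a specific number. Thus, suppose that we have already shown that $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$. Then
$$1 + 2 + 3 + \cdots + n + (n + 1) = (1 + 2 + 3 + \cdots + n) + (n + 1) = \tfrac{1}{2}n(n + 1) + (n + 1)$$
$$= (n + 1)(\tfrac{1}{2}n + 1) = \tfrac{1}{2}(n + 1)(n + 2) = \tfrac{1}{2}(n + 1)((n + 1) + 1).$$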
The first, third, fourth, and fifth equality signs in the above identity are justified on the basis of the rules of algebraic operation. The remaining equality, the second, is justified by the assumption that formula $n$ of the sequence, $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$, is valid. Note that $1 + 2 + 3 + \cdots + n + (n + 1)$ and $1 + 2 + 3 + \cdots + (n + 1)$ are both abbreviations for the sum of the first $n + 1$ natural numbers, so that they are equal. Thus, the equality of the first and last terms of the above expression is the $n$ plus first identity in the sequence of formulas which we are trying to prove. In other words, the calculation shows that if some identity of the sequence is valid, then so is the following one. In particular, since the fifth identity is correct (as well as the first, second, third, and fourth), so is the sixth. Consequently, so is the seventh, the eighth, and so on. Since any formula of the sequence will eventually be reached in this way, we conclude that the identity of Example 1 is valid for all $n$.

The proof we have given for the identity of Example 1 is a proof by mathematical induction. Although this method of attack is usually suggested for mathematical statements which involve an arbitrary natural number $n$, there are often other types of proof available. For example, we could also prove the formula of Example 1 as follows. Let $S$ be the sum of the first $n$ natural numbers. Then
$$S = 1 + 2 + \cdots + (n - 1) + n,$$
$$S = n + (n - 1) + \cdots + 2 + 1,$$
$$2S = (n + 1) + (n + 1) + \cdots + (n + 1) + (n + 1).$$
Therefore, $n(n + 1) = 2S$ and $S = \frac{1}{2}n(n + 1)$.
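The two ingredients of the inductive argument for Example 1, the first formula and the passage from formula $n$ to formula $n + 1$, can be checked numerically. The sketch below uses an illustrative function name `formula`; a finite check of this kind illustrates the mechanism and does not replace the proof.

```python
def formula(n):
    """Right-hand side of Example 1: n(n + 1)/2."""
    return n * (n + 1) // 2

# Basis: the first formula of the sequence.
basis_ok = formula(1) == 1

# Passage from formula n to formula n + 1: adding n + 1 to n(n + 1)/2
# gives (n + 1)(n + 2)/2, checked here for a range of n.
step_ok = all(formula(n) + (n + 1) == formula(n + 1) for n in range(1, 1000))

# Direct comparison with the sum itself.
direct_ok = all(sum(range(1, n + 1)) == formula(n) for n in range(1, 200))

print(basis_ok, step_ok, direct_ok)
```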
Let us next consider Example 2: If $a \ge -1$, then $(1 + a)^n \ge 1 + na$. This example is somewhat different from the first one, since it has the form of a mathematical theorem, rather than a mathematical identity. As in the case of Example 1, the statement of Example 2 can be considered to be an abbreviation for an infinite sequence of statements:
If $a \ge -1$, then $1 + a \ge 1 + a$.
If $a \ge -1$, then $(1 + a)^2 \ge 1 + 2a$.
If $a \ge -1$, then $(1 + a)^3 \ge 1 + 3a$.
. . .
In these statements, $a$ represents an arbitrary real number, and we require $a \ge -1$ in each statement. Clearly, the first of these statements is true. Let us try to proceed as in Example 1, using the $n$th statement of the sequence to prove the following statement. Assume then that $(1 + a)^n \ge 1 + na$. Since $a \ge -1$, we have $1 + a \ge 0$. Therefore, multiplying each side of the assumed inequality by $1 + a$ preserves the direction of the inequality and gives
$$(1 + a)^{n+1} \ge (1 + na)(1 + a) = 1 + (n + 1)a + na^2 \ge 1 + (n + 1)a.$$
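The inequality of Example 2 can be spot-checked with exact rational arithmetic. In this sketch the function name `bernoulli_holds` and the sample values of $a$ are illustrative choices; the last line shows that the statement can fail when the hypothesis $a \ge -1$ is dropped.

```python
from fractions import Fraction

def bernoulli_holds(a, n):
    """True when (1 + a)^n >= 1 + n*a for the given a and n."""
    return (1 + a) ** n >= 1 + n * a

# Sample values a = -1, -3/4, ..., 3 (all satisfy a >= -1).
samples = [Fraction(k, 4) for k in range(-4, 13)]
holds = all(bernoulli_holds(a, n) for a in samples for n in range(1, 21))

# Outside the hypothesis: a = -4, n = 3 gives (-3)^3 = -27 < 1 - 12 = -11.
fails_outside = not bernoulli_holds(Fraction(-4), 3)

print(holds, fails_outside)  # True True
```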
In this example, the argument needed to pass from statement $n$ to statement $n + 1$ is somewhat more complicated than the corresponding proof in Example 1. Nevertheless, it achieves the same end: from the truth of the first statement ($n = 1$), the truth of the second statement ($n = 2$) follows; from the truth of the second statement, the truth of the third statement ($n = 3$) follows; and so on. Eventually every statement of the sequence is proved.

Let us review the methods which we have used to prove the statements given in Examples 1 and 2. It should be evident that both proofs follow the same outline. That outline, stated in general terms, is the principle of induction. The statements in Examples 1 and 2 involve an arbitrary natural number $n$. Thus, in both cases, we are presented with the problem of proving
all of the statements in an infinite sequence $P_1, P_2, P_3, \ldots$ of mathematical assertions. The procedure which we followed to prove these statements in the examples consisted of two steps. First, we observed that the first statement $P_1$ of the sequence is true. Then we showed that for any $n$, it is possible to construct a proof of the statement $P_{n+1}$, based on the assumption that $P_n$ is true. This deduction of $P_{n+1}$ from $P_n$ took the form of an ordinary mathematical argument (using logic and known mathematical facts). The number $n$ occurred throughout the proof as a variable. For example, we could have substituted a number like 23 for each occurrence of $n$ in the proof to obtain a deduction of $P_{24}$ from $P_{23}$. From these two steps in both Examples 1 and 2, it was concluded that all the statements were true. These conclusions were special cases of what is called the principle of mathematical induction.

(21.1). Principle of mathematical induction. Let $P_1, P_2, P_3, \ldots$ be a sequence of statements. Suppose that (a) $P_1$ is true, and (b) for any $n$, if $P_n$ is true, then $P_{n+1}$ is true. Then all of the statements $P_1, P_2, P_3, \ldots$ are true.

By assumption (a), $P_1$ is true. By assumption (b) in the case $n = 1$, if $P_1$ is true, then $P_2$ is true. Thus $P_2$ is true. By (b) in the case $n = 2$, if $P_2$ is true, then $P_3$ is true. Thus $P_3$ is true. We can continue indefinitely in this way. Since any statement of the sequence will ultimately be reached, it follows that every one of the statements is true.

To apply the principle of induction in making a mathematical proof, it is necessary to establish that conditions (a) and (b) are satisfied. The proof of (a) is usually called the basis of the induction, while the proof of (b) is called the induction step. In carrying out the proof of the induction step, it may be assumed throughout the argument that the statement $P_n$ is true. This is called the induction hypothesis.
It should be noted, however, that the validity of the induction step does not necessarily depend on the truth of $P_n$. For example, if $P_n$ is the assertion "$n^2 + n$ is odd," then $P_{n+1}$ is the statement "$(n + 1)^2 + (n + 1)$ is odd." Since $(n + 1)^2 + (n + 1) = n^2 + n + 2(n + 1)$, it follows that if $P_n$ is true, then so is $P_{n+1}$ (because the sum of an odd number and an even number is odd). That is, condition (b) is satisfied. However, $P_n$ is actually false for every $n \in N$, since $n^2 + n = n(n + 1)$, and at least one of the natural numbers $n$ or $n + 1$ is even.

Another aspect of the proof of the induction step which should be emphasized is that $n$ must represent an arbitrary natural number (that is, a variable) throughout the argument. This is essential because the fact that $P_n$ implies $P_{n+1}$ is applied successively with $n = 1, 2, 3, \ldots$.
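The example "$n^2 + n$ is odd" shows that the induction step alone proves nothing without the basis. A short sketch, with the illustrative name `p` for the statement $P_n$, makes the point concrete: the difference between consecutive values is always even, so the parity never changes, yet the statement already fails for $n = 1$.

```python
def p(n):
    """The statement P_n: n^2 + n is odd."""
    return (n * n + n) % 2 == 1

# The induction step rests on (n+1)^2 + (n+1) - (n^2 + n) = 2(n + 1) being even,
# so P_n and P_{n+1} always have the same truth value.
same_parity = all(
    ((n + 1) ** 2 + (n + 1) - (n * n + n)) % 2 == 0 for n in range(1, 1000)
)

# The basis fails, and in fact no P_n in the tested range is true.
any_true = any(p(n) for n in range(1, 1000))

print(same_parity, p(1), any_true)  # True False False
```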
EXAMPLE 4. We prove Theorem 16.4 (Example 3). The proof is a typical application of mathematical induction to establish a mathematical theorem. The statement to be proved for the basis step is the following: If $\{A_1\}$ is a pairwise disjoint collection of sets, then $m(A_1) = m(A_1)$. This is obviously true. To prove the induction step, we make the induction hypothesis $P_n$: If $\{A_1, A_2, \ldots, A_n\}$ is a pairwise disjoint collection of sets in $S$, then $m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n)$. The statement which has to be proved is $P_{n+1}$: If $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection of sets in $S$, then $m(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1}) = m(A_1) + m(A_2) + \cdots + m(A_{n+1})$. Note that by the definition of pairwise disjointness, if $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection, then so is $\{A_1, A_2, \ldots, A_n\}$. Thus, the induction hypothesis can be applied to obtain
$$m(A_1) + m(A_2) + \cdots + m(A_n) + m(A_{n+1}) = m(A_1 \cup A_2 \cup \cdots \cup A_n) + m(A_{n+1}).$$
We would like to conclude that $m(A_1 \cup A_2 \cup \cdots \cup A_n) + m(A_{n+1}) = m(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1})$. This conclusion is justified by Definition 16.3, provided that $A_1 \cup A_2 \cup \cdots \cup A_n$ and $A_{n+1}$ are disjoint. However, by Theorem 15.4,
$$A_{n+1} \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (A_{n+1} \cap A_1) \cup (A_{n+1} \cap A_2) \cup \cdots \cup (A_{n+1} \cap A_n) = \Phi \cup \Phi \cup \cdots \cup \Phi = \Phi,$$
since $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection. Thus, we have shown that the truth of $P_{n+1}$ follows from the truth of $P_n$. By the principle of induction, this proves Theorem 16.4.
1. Use mathematical induction to prove the following identities. (2n  1) = n2 (a) 1 + 3 + 5 +   . + s ~ ( n 1)(2n 1) (b) 1 2 + 2 2 + 3 2 +  .  + n 2 = 1 (2n 1 ) 2 = +(n 1)(2n 1)(2n (c) l 2 32 52 l6 z6 36 (i),'n6 (d)
+ + + + + + +
+ 3)
+ ( 3 . 4  5)2 
(b)
+ + +
+ +
)
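mathematical induction to prove the following identities.
$1 + 2 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$
$1 + 2(\frac{1}{2}) + 3(\frac{1}{2})^2 + \cdots + n(\frac{1}{2})^{n-1} = 4 - (n + 2)(\frac{1}{2})^{n-1}$
$3 + 3^3 + 3^5 + \cdots + 3^{2n-1} = \frac{3}{8}(9^n - 1)$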
1
1
= 
1 n+1
6. Let $t$ be any real number different from 1. Use mathematical induction to prove the following identities.
(a) $1 + t + t^2 + \cdots + t^{n-1} = \dfrac{t^n - 1}{t - 1}$
7. Use mathematical induction to prove the following inequalities.
(a) $n < 2^n$
(b) $2^{n+3} < (n + 3)!$
(c) $n!\,r! < (n + r)!$, where $r \in N$.
8. Prove that for all natural numbers $m$ and $n$, $m(m + 1) \cdots (m + n - 1)$ is a multiple of $n$.
9. Prove by mathematical induction that if $0 \le a_1 \le 1$, $0 \le a_2 \le 1$, $\ldots$, $0 \le a_n \le 1$, then $(1 - a_1)(1 - a_2) \cdots (1 - a_n) \ge 1 - a_1 - a_2 - \cdots - a_n$.
10. Prove by mathematical induction that if a l , aa, numbers, then (al 2 a2n ala2  a,, 5
+ ; +
a2 =
2
ala2
(al
;
~
=
(a ~l
; a2)2 .]
11. Give a proof by mathematical induction of the case of Theorem 15.4 in which $I = \{1, 2, \ldots, n\}$.
22 The binomial theorem. In this section, we will use mathematical induction to prove the binomial theorem. The binomial theorem and its generalization, the multinomial theorem, are important results not only in elementary algebra, but also in number theory, probability theory, and combinatorial analysis. An application of the binomial theorem has already been given in Section 13. Moreover, the proof of this theorem is a good exercise in the use of mathematical induction. The formulas
$$(x + y)^2 = x^2 + 2xy + y^2,$$
$$(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3,$$
$$(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$$
are familiar from elementary algebra. They suggest the problem of finding the general expanded version of the power $(x + y)^n$. An examination of the cases $n = 2, 3, 4$ suggests that the general formula should be of the following form:
$$(x + y)^n = x^n + N_{1,n}x^{n-1}y + N_{2,n}x^{n-2}y^2 + \cdots + N_{n-1,n}xy^{n-1} + y^n, \qquad (21)$$
where the coefficient $N_{i,n}$ of $x^{n-i}y^i$ is some natural number which depends on $i$ and $n$. For example, if $n = 4$, then $N_{1,4} = 4$, $N_{2,4} = 6$, and $N_{3,4} = 4$. Mathematical induction now provides a means of verifying
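this guess. Let $P_n$ be the statement that equation (21) is valid with $N_{i,n}$ certain positive integers (which will be determined presently). In particular, $P_1$ is just the statement $x + y = x + y$, which is certainly true. Then, making the induction hypothesis that $P_n$ is true, an algebraic calculation gives
$$(x + y)^{n+1} = (x + y)(x + y)^n = (x + y)(x^n + N_{1,n}x^{n-1}y + \cdots + N_{n-1,n}xy^{n-1} + y^n)$$
$$= x^{n+1} + (1 + N_{1,n})x^ny + (N_{1,n} + N_{2,n})x^{n-1}y^2 + \cdots + (N_{n-1,n} + 1)xy^n + y^{n+1}.$$
This identity establishes the validity of $P_{n+1}$. It is only necessary to note that the coefficients for the identity $P_{n+1}$ will be
$$N_{1,n+1} = 1 + N_{1,n}, \quad N_{2,n+1} = N_{1,n} + N_{2,n}, \quad \ldots, \quad N_{n,n+1} = N_{n-1,n} + 1.$$
Since the $N_{i,n}$ are natural numbers by the induction hypothesis, so are the coefficients $N_{i,n+1}$. Moreover, these constants, which are usually called binomial coefficients, satisfy a simple relationship which makes it possible to obtain their value. For convenience, define
$$N_{0,n} = N_{n,n} = 1. \qquad (22)$$
Then
$$N_{i,n+1} = N_{i-1,n} + N_{i,n}, \quad 1 \le i \le n. \qquad (23)$$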
Consider the triangular array of numbers known as the Pascal triangle:

              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5  10  10   5   1

The rule of formation should be clear. The edges of the triangle are composed of ones. The position of the numbers in successive rows is staggered, so that every number not on an edge of the triangle has two numbers above it, one of them to the right and the other to the left. Moreover, each such number is the sum of the two numbers above it. If we write down a similar triangle with the binomial coefficients:

              N_{0,1}   N_{1,1}
          N_{0,2}   N_{1,2}   N_{2,2}
      N_{0,3}   N_{1,3}   N_{2,3}   N_{3,3}
We see that equations (22) and (23) express exactly the same rules of formation that were used to construct the Pascal triangle, and these rules clearly determine uniquely the numbers which appear in the triangle. Hence, the numbers in the $n$th row of the Pascal triangle are precisely the binomial coefficients for the expansion of $(x + y)^n$.
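The rules (22) and (23) translate directly into a procedure for generating the rows of the triangle. The following is a minimal sketch; the function names `next_row` and `pascal_rows` are illustrative, and row $n$ is taken to be the list $N_{0,n}, N_{1,n}, \ldots, N_{n,n}$.

```python
def next_row(row):
    """Row n + 1 from row n: ones on the edges (22), sums of neighbors inside (23)."""
    inner = [row[i - 1] + row[i] for i in range(1, len(row))]
    return [1] + inner + [1]

def pascal_rows(count):
    """Rows 1 through `count` of the triangle, starting from the row 1, 1."""
    rows = [[1, 1]]
    while len(rows) < count:
        rows.append(next_row(rows[-1]))
    return rows

for row in pascal_rows(6):
    print(row)
```

The fourth row printed is 1, 4, 6, 4, 1, in agreement with the coefficients of $(x + y)^4$.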
A striking characteristic of the Pascal triangle is its symmetry about the vertical line through its center. This symmetry is expressed in terms of the binomial coefficients by the formula
$$N_{i,n} = N_{n-i,n}, \quad 0 \le i \le n. \qquad (24)$$
The proof that equation (24) is actually valid is another simple exercise in mathematical induction. The details are left to the reader.

Another less obvious relationship between binomial coefficients can be discovered from the Pascal triangle by tracing down a diagonal from left to right. For example, on the third diagonal we get the sequence 1, 3, 6, 10, 15, 21, . . . . The rule of formation here is not immediately evident. However, consider the successive quotients: $\frac{3}{1} = 3$, $\frac{6}{3} = \frac{4}{2}$, $\frac{10}{6} = \frac{5}{3}$, $\frac{15}{10} = \frac{6}{4}$, $\frac{21}{15} = \frac{7}{5}$, . . . . Similarly, down the next diagonal, 1, 4, 10, 20, 35, . . . , the quotients are $\frac{4}{1} = 4$, $\frac{10}{4} = \frac{5}{2}$, $\frac{20}{10} = \frac{6}{3}$, $\frac{35}{20} = \frac{7}{4}$. These observations suggest another identity:
$$\frac{N_{i,n}}{N_{i-1,n-1}} = \frac{n}{i}, \quad 1 \le i \le n. \qquad (25)$$
This can in fact be proved by induction, using equations (22) and (23). We will not carry out this proof. Instead, let us see how equation (25) can be used to determine a numerical expression for the binomial coefficients $N_{i,n}$, $0 < i < n$. By successive cancellation, we obtain
$$N_{i,n} = \frac{N_{i,n}}{N_{i-1,n-1}} \cdot \frac{N_{i-1,n-1}}{N_{i-2,n-2}} \cdots \frac{N_{1,n-i+1}}{N_{0,n-i}} \cdot N_{0,n-i} = \frac{n}{i} \cdot \frac{n-1}{i-1} \cdots \frac{n-i+1}{1} = \frac{n!}{(n-i)!\,i!}.$$
Recalling the convention that $0! = 1$, we see that the expression $n!/(n-i)!\,i!$ represents $N_{i,n}$ even for $i = 0$ and $i = n$. For by (22),
$$N_{0,n} = 1 = \frac{n!}{(n-0)!\,0!}, \qquad N_{n,n} = 1 = \frac{n!}{(n-n)!\,n!}.$$
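The discussion of this section can now be summarized as a theorem.

THEOREM 22.1.
$$(x + y)^n = x^n + N_{1,n}x^{n-1}y + N_{2,n}x^{n-2}y^2 + \cdots + N_{n-1,n}xy^{n-1} + y^n,$$
where the binomial coefficients $N_{i,n}$ are natural numbers which are given by the formula
$$N_{i,n} = \frac{n!}{(n-i)!\,i!}.$$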
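The closed expression $n!/(n-i)!\,i!$ can be compared with the values produced by the rules (22) and (23) alone. The sketch below, with the illustrative names `by_formula` and `by_recurrence`, does this for small $n$ and also checks the symmetry of each row; it is a finite check and not the inductive proof.

```python
from math import factorial

def by_formula(i, n):
    """n!/((n - i)! i!), the expression for N_{i,n}."""
    return factorial(n) // (factorial(n - i) * factorial(i))

def by_recurrence(i, n):
    """N_{i,n} computed only from the rules (22) and (23)."""
    if i == 0 or i == n:
        return 1                      # rule (22): ones on the edges
    return by_recurrence(i - 1, n - 1) + by_recurrence(i, n - 1)   # rule (23)

agree = all(by_formula(i, n) == by_recurrence(i, n)
            for n in range(1, 13) for i in range(n + 1))
symmetric = all(by_formula(i, n) == by_formula(n - i, n)
                for n in range(1, 13) for i in range(n + 1))

print(agree, symmetric)
```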
Except for the inductive proof of (25), all of the facts in Theorem 22.1 have been established. Instead of proving (25), we will show directly (by induction on $n$) that $N_{i,n} = n!/(n-i)!\,i!$ if $0 \le i \le n$. We have already observed that for all $n$, $N_{0,n} = n!/(n-0)!\,0!$, and $N_{n,n} = n!/(n-n)!\,n!$. This provides the basis of the induction. For if
$n = 1$, then $0 \le i \le n$ implies that either $i = 0$ or $i = 1$. For the induction step, assume that $N_{i,n} = n!/(n-i)!\,i!$ for each $i$ between 0 and $n$ (including $i = 0$ and $i = n$). Then by (23),
$$N_{i,n+1} = N_{i-1,n} + N_{i,n} = \frac{n!}{(n-i+1)!\,(i-1)!} + \frac{n!}{(n-i)!\,i!} = \frac{n!\,[i + (n-i+1)]}{(n+1-i)!\,i!} = \frac{(n+1)!}{(n+1-i)!\,i!},$$
provided that $1 \le i \le n$. Since the cases $i = 0$ and $i = n + 1$ are taken care of by the first remark of this paragraph, the proof of the induction step is complete. By the principle of mathematical induction, Theorem 22.1 is established.

The notation $N_{i,n}$ for the binomial coefficients seems appropriate for the interpretation of these numbers given in Section 13. However, a more common designation of $N_{i,n}$ is $\binom{n}{i}$. That is,
$$\binom{n}{i} = N_{i,n} = \frac{n!}{(n-i)!\,i!}.$$
Henceforth, we will use $\binom{n}{i}$ rather than $N_{i,n}$ to denote the binomial coefficients. There are numerous useful identities involving binomial coefficients. We give one sample of such a relation.
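THEOREM 22.2. Let $m$ and $n$ be natural numbers, and let $k$ be an integer satisfying $0 \le k \le m$, $0 \le k \le n$. Then
$$\binom{m}{k}\binom{n}{0} + \binom{m}{k-1}\binom{n}{1} + \binom{m}{k-2}\binom{n}{2} + \cdots + \binom{m}{0}\binom{n}{k} = \binom{m+n}{k}.$$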
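It is possible to prove this formula by induction on $m + n$, using (22) and (23). However, there is a simpler proof, based on Theorem 22.1, which makes it clear why such an identity holds. We observe that $\binom{m+n}{k}$ is the coefficient of $x^{m+n-k}y^k$ in the expansion of $(x + y)^{m+n}$. However,
$$(x + y)^{m+n} = (x + y)^m (x + y)^n = \left[\binom{m}{0}x^m + \binom{m}{1}x^{m-1}y + \cdots + \binom{m}{m}y^m\right]\left[\binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \cdots + \binom{n}{n}y^n\right].$$
If these expressions are multiplied together and the terms with the same powers of $x$ and $y$ are collected, it is clear that the coefficient of $x^{m+n-k}y^k$ is the sum on the left-hand side of the identity in Theorem 22.2, which must therefore be equal to $\binom{m+n}{k}$.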
The general idea of the proof of Theorem 22.2 was used in Section 13 to obtain an expression for the number of subsets of cardinality k in a set with n elements. The method consists of obtaining two different expressions for the coefficients of a polynomial in one or more variables, and then equating the corresponding coefficients. The justification for this procedure must wait until the nature of polynomials has been examined more carefully (see Section 92). As in the case of Theorem 22.2, there are many instances in which an inductive proof can be replaced by this process of "equating coefficients."
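The process of "equating coefficients" can be imitated numerically by multiplying lists of coefficients. The sketch below assumes Python 3.8 or later for `math.comb`; the helper names `poly_mul`, `binomial_row`, and `convolution_sum` are illustrative. It multiplies the coefficient rows for exponents $m$ and $n$ and compares the result with the row for $m + n$, and it also tests the sum of products of Theorem 22.2 directly for small $m$, $n$, and $k$.

```python
from math import comb

def poly_mul(p, q):
    """Coefficient list of the product of two polynomials in one variable."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def binomial_row(n):
    """Coefficients of (1 + t)^n, listed by increasing power of t."""
    return [comb(n, i) for i in range(n + 1)]

def convolution_sum(m, n, k):
    """Sum of C(m, k - j) * C(n, j) for j = 0, ..., k."""
    return sum(comb(m, k - j) * comb(n, j) for j in range(k + 1))

# (1 + t)^3 (1 + t)^4 has the same coefficients as (1 + t)^7.
product_ok = poly_mul(binomial_row(3), binomial_row(4)) == binomial_row(7)

# The identity of Theorem 22.2 for small m, n and 0 <= k <= min(m, n).
identity_ok = all(convolution_sum(m, n, k) == comb(m + n, k)
                  for m in range(1, 9) for n in range(1, 9)
                  for k in range(min(m, n) + 1))

print(product_ok, identity_ok)
```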
1. Write the binomial formula for the case n cal value of al1 of the coefficients.)
2. Calculate $n!/i!\,(n - i)!$ for $n = 7$, $0 < i < 7$, and compare your results with the values of the binomial coefficients obtained from the Pascal triangle.
3. Prove (24) by induction on $n$, using (22) and (23).
4. Prove (25) by induction on n, using (22) and (23).
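5. Show that the binomial formula implies
$$(1 + t)^n = 1 + \binom{n}{1}t + \binom{n}{2}t^2 + \cdots + \binom{n}{n-1}t^{n-1} + t^n.$$
Show conversely that this identity implies the binomial formula. [Hint: let $t = y/x$.]
6. (For students familiar with differential calculus.) Prove (25) by differentiating both sides of the identity
$$(1 + t)^n = 1 + \binom{n}{1}t + \binom{n}{2}t^2 + \cdots + \binom{n}{n-1}t^{n-1} + t^n,$$
then expanding the left-hand side and comparing coefficients of equal powers of $t$.
7. Prove Theorem 22.2 by induction on $m + n$.
8. Using Theorem 22.2 and (24), show that
$$\binom{n}{0}^2 + \binom{n}{1}^2 + \binom{n}{2}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}.$$
9. Prove by induction on $n$ that if $m$ is a natural number satisfying $m \le n$, then
$$\binom{m}{m} + \binom{m+1}{m} + \binom{m+2}{m} + \cdots + \binom{n}{m} = \binom{n+1}{m+1}.$$
10. Prove by induction on $r$ that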
23 Generalizations of the induction principle. In this section, we will consider some variations of the principle of mathematical induction. It is often difficult to use (21.1) directly in a mathematical proof, even though the problem under consideration seems to be accessible to induction. In many such cases, a slight modification of the induction principle (21.1) will lead to success. Our first observation amounts to only a change of the notation in (21.1).
(23.1). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that (a) $P_r$ is true, and (b) for any $n \geq r$, if $P_n$ is true, then $P_{n+1}$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.

In many inductive problems, the direct application of (21.1) requires an unnatural change in notation. It is better to use (23.1) in such cases.
EXAMPLE 1. If $n \geq 4$, then $2^n < n!$. In this case $r = 4$. The assertion $P_4$ is correct: $2^4 = 16 < 24 = 4!$. It is easy to show that if $2^n < n!$, then $2^{n+1} < (n + 1)!$. Note that the statements $P_1$ ($2^1 < 1!$), $P_2$ ($2^2 < 2!$), and $P_3$ ($2^3 < 3!$) are false.
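The role of the starting index $r = 4$ can be seen by evaluating the statements $P_n$: $2^n < n!$ directly. The short sketch below is an illustration only (the name `statement_holds` is ours); it tests finitely many cases and so does not replace the induction.

```python
from math import factorial


def statement_holds(n):
    """Evaluate the statement P_n: 2^n < n!."""
    return 2 ** n < factorial(n)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, 2 ** n, factorial(n), statement_holds(n))
```

The printed table shows the statements failing for $n = 1, 2, 3$ and holding from $n = 4$ on, which is why the sequence of statements is started at $P_4$.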
Suppose that $P_1, P_2, P_3, \ldots$ is a sequence of statements, and that $P_1$ and $P_2$ are true. To prove $P_3$ by the ordinary induction process, we would show that $P_2$ implies $P_3$. However, all that we want is a proof that $P_3$ is true, and it may be the case that $P_3$ is a consequence of $P_1$, or of a combination of $P_1$ and $P_2$. More generally, if it has been shown that $P_1, P_2, \ldots, P_n$ are all true, and if it is possible to prove that the truth of $P_{n+1}$ is a consequence of some, or possibly all, of the statements $P_1, P_2, \ldots, P_n$, then we can assert that $P_1, P_2, \ldots, P_n, P_{n+1}$ are all true, and we are ready to go on to the next statement in the sequence. If this can be done for every $n$, then it is possible to proceed along the sequence of statements, proving them one at a time. Eventually, any particular statement will be shown to be true. We can formulate this process as a revised principle of induction which, at the same time, takes advantage of the more general notation introduced in (23.1).

(23.2). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that (a) $P_r$ is true, and (b) for any $n \geq r$, if $P_r, P_{r+1}, \ldots, P_n$ are all true, then $P_{n+1}$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.
A proof which is based on (23.2) is called a course of values induction. As in the case of ordinary induction, the proof of the first statement $P_r$ of the sequence is called the basis of the induction, and the proof that the truth of $P_r, P_{r+1}, \ldots, P_n$ implies that $P_{n+1}$ is true is called the induction step. For a course of values induction, the induction hypothesis is the assumption that $P_r, P_{r+1}, \ldots, P_n$ are true. The conditions (a) and (b) of (23.2) can be combined into a single condition.
(23.3). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that for any $n \geq r$, if $P_m$ is true for all $m$ satisfying $r \leq m < n$, then $P_n$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.

The condition in (23.3) may seem ambiguous in the case where $n = r$, since it is impossible to have a natural number $m$ satisfying $r \leq m < r$. But this simply means that the statement "$P_m$ is true for all $m$ satisfying $r \leq m < r$" is automatically satisfied (or, in mathematical terminology, "vacuously satisfied"). Thus, the condition, for the case $n = r$, is just the requirement that $P_r$ is true, which is condition (a) of (23.2). The induction hypothesis for the form of the induction principle given by (23.3) is the assumption that all of the statements $P_m$ with $r \leq m < n$ are true. This is different from the induction hypothesis in (23.2), where it is assumed that $P_m$ holds for $r \leq m \leq n$, that is, for $r \leq m < n + 1$. Our discussion above shows that this shift from $n + 1$ to $n$ is necessary in order that condition (a) of (23.2) can be included in the condition of (23.3). Of course, condition (b) of (23.2) is also included in (23.3), which is assumed to hold for all $n$ (and therefore it holds if $n$ is replaced by $n + 1$).

Course of values induction is frequently used in the study of natural numbers. In order to give an example, we introduce the important concept of a prime number. A prime number (or simply a prime) is a natural number greater than 1, which is not divisible by any natural number other than itself and 1. For example, 2, 3, 5, 7, 11, and 13 are primes, while 4, 6, 8, 9, 10, 12, 14, and 15 are not primes.
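EXAMPLE 2. We will prove using a course of values induction that every natural number greater than 1 is divisible by a prime. Since $n = 2$ is the first natural number greater than one, we take $r = 2$ in (23.3). The sequence of statements to be proved is:

$P_2$: 2 is divisible by a prime,
$P_3$: 3 is divisible by a prime,
$\ldots$

To verify the condition of (23.3), let $n \geq 2$ and assume that $P_m$ is true for all $m$ satisfying $2 \leq m < n$. If $n$ is a prime, then $n$ is divisible by the prime $n$. Otherwise, $n$ is divisible by some natural number $k$ with $1 < k < n$. By the induction hypothesis, $k$ is divisible by a prime $p$, and since $p$ divides $k$ and $k$ divides $n$, it follows that $p$ divides $n$. Thus $P_n$ is true, and by (23.3), every natural number greater than 1 is divisible by a prime.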
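A course of values induction often corresponds to a recursive procedure in which the problem for $n$ is reduced to the same problem for some smaller number. The sketch below is our own illustration of this idea for the statement of Example 2, and the name `prime_divisor` is hypothetical. If $n$ has a divisor $k$ with $1 < k < n$, the function calls itself on $k$, which plays the part of the induction hypothesis; otherwise $n$ is itself a prime.

```python
def prime_divisor(n):
    """Return a prime which divides n, where n is an integer greater than 1."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    for k in range(2, n):
        if n % k == 0:
            # n is not a prime. Since 1 < k < n, the "induction hypothesis"
            # applies to k: a prime dividing k also divides n.
            return prime_divisor(k)
    # No natural number other than 1 and n divides n, so n is a prime.
    return n


if __name__ == "__main__":
    for n in (2, 15, 49, 91, 97):
        print(n, prime_divisor(n))
```

The recursion must stop, because each call is made with a strictly smaller argument that is still at least 2.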
As another example of a course of values induction we will give the promised proof of Theorem 15.2. Actually, we will prove only half of this theorem, since the other half has a similar proof.

Proof of Theorem 15.2. The statement to be proved is this: if sets $A_1, A_2, \ldots, A_n$ are united in any way, two at a time, using each set at least once, then the resulting set is equal to $\cup(\{A_1, A_2, \ldots, A_n\})$.

There is a surprise in this proof. The induction variable is not $n$, as one might expect. Consider the particular set $(A_1 \cup A_2) \cup (A_3 \cup A_4)$. For this case

$$(A_1 \cup A_2) \cup (A_3 \cup A_4) = \cup(\{A_1, A_2\}) \cup \cup(\{A_3, A_4\}) = \cup(\{A_1, A_2, A_3, A_4\}).$$

The first equality of this calculation depends on the observation of Example 2 of Section 15, that $A \cup B = \cup(\{A, B\})$; the second equality follows from Theorem 15.3. Each of the equality signs in the calculation can be considered as a course of values induction
step. But the induction variable is the number of occurrences of $\cup$ in the expression $(A_1 \cup A_2) \cup (A_3 \cup A_4)$, not the number of sets involved in the expression. We can rephrase the statement to be proved so that it has the proper form for induction with the induction variable in clear view.

$P_n$: Let $S$ be a finite collection of sets. Let $E$ be a set obtained by forming $n$ binary unions, starting with the sets in $S$ and using each of these sets at least once. Then $E = \cup(S)$.

It is convenient to begin the sequence of induction statements with $P_0$, that is, we take $r = 0$ in (23.2).

$P_0$: Let $S$ be a finite collection of sets. Let $E$ be a set obtained by forming 0 binary unions (that is, no binary unions), starting with the sets in $S$ and using each of these sets at least once. Then $E = \cup(S)$.

If we are to form no binary unions and use each set in $S$ at least once in the process, then $S$ can contain only a single set $A$. Thus, for $r = 0$, $S = \{A\}$ and $E = A$. Since $E = A = \cup(\{A\}) = \cup(S)$, the basis of induction is satisfied.

Assume that $P_m$ is true for $0 \leq m \leq n$. Let $E$ be a set constructed in accordance with the statement $P_{n+1}$, that is, $E$ is obtained by forming $n + 1$ binary unions, starting with the sets of a finite collection $S$ and using each of the sets of $S$ at least once. The final step in the construction of $E$ is the formation of a union $E_1 \cup E_2$, where $E_1$ and $E_2$ are constructed from the sets in $S$ by forming binary unions. Thus $E = E_1 \cup E_2$, where $E_1$ is constructed with $m_1$ binary unions, using all sets in a subcollection $S_1$ of $S$, and $E_2$ is constructed with $m_2$ binary unions, using all sets of a subcollection $S_2$ of $S$. Since every set in $S$ occurs in the expression $E$, any set of $S$ must be used either in the construction of $E_1$, or in the construction of $E_2$. Therefore $S = S_1 \cup S_2$. (Of course, there can be sets which are in both $S_1$ and $S_2$, and in fact, it might even happen that $S_1 = S_2 = S$. This is why induction on the number of sets in $S$ will not work.) The total number of occurrences of $\cup$ in $E_1 \cup E_2$ is $m_1 + 1 + m_2$. But since $E$ has $n + 1$ occurrences of $\cup$, it follows that $n + 1 = m_1 + m_2 + 1$. Thus, $m_1 \leq n$ and $m_2 \leq n$. But these are the conditions which we need in order to apply the induction hypothesis to $E_1$ and $E_2$. That is, by $P_{m_1}$ and $P_{m_2}$, it follows that

$$E_1 = \cup(S_1) \quad \text{and} \quad E_2 = \cup(S_2).$$

Thus, by Theorem 15.3,

$$E = E_1 \cup E_2 = \cup(S_1) \cup \cup(S_2) = \cup(S_1 \cup S_2) = \cup(S).$$

This completes the proof of the induction step. Therefore, by (23.2), the proof of the first half of Theorem 15.2 is complete.
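The structure of this proof can be mirrored in a small program. In the sketch below, which is our own illustration and not part of the original argument, an expression built from binary unions is represented as a nested pair: a leaf is a `frozenset`, and a pair `(left, right)` stands for `left` $\cup$ `right`. The function `count_unions` computes the induction variable, `leaves` recovers the collection $S$, and `big_union` plays the role of $\cup(S)$. All of these names are assumptions made for the example.

```python
def evaluate(expr):
    """Evaluate a nested binary-union expression to a single set."""
    if isinstance(expr, frozenset):
        return expr
    left, right = expr
    return evaluate(left) | evaluate(right)


def leaves(expr):
    """The collection S of sets which are used in the expression."""
    if isinstance(expr, frozenset):
        return {expr}
    left, right = expr
    return leaves(left) | leaves(right)


def count_unions(expr):
    """The number of binary unions in the expression (the induction variable)."""
    if isinstance(expr, frozenset):
        return 0
    left, right = expr
    return count_unions(left) + 1 + count_unions(right)


def big_union(collection):
    """The union of all of the sets in a finite collection."""
    result = frozenset()
    for member in collection:
        result = result | member
    return result


if __name__ == "__main__":
    a1, a2 = frozenset({1, 2}), frozenset({2, 3})
    a3, a4 = frozenset({4}), frozenset({1, 5})
    expr = ((a1, a2), (a3, a4))
    print(count_unions(expr), evaluate(expr) == big_union(leaves(expr)))
```

Note that an expression such as `((a1, a2), (a2, a1))` uses only two sets but contains three unions, which is the situation described in the parenthetical remark of the proof.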
1. Prove that $2^n < n!$ for $n \geq 4$.
2. Prove in detail that the condition in (23.3) is equivalent to conditions (a) and (b) in (23.2).
3. Let $P_n$ be as in the proof of Theorem 15.2. Give the details of the proof of statements $P_1$ and $P_2$.
4. Prove the second half of Theorem 15.2.
5. It will be shown in Chapter 3 that if a prime number $p$ divides a product $rs$ of two natural numbers, then either $p$ divides $r$, or $p$ divides $s$. Using this fact, give an inductive proof of the following theorem. Let $t(n)$ be the number of (distinct) primes which divide the natural number $n$. Then $2^{t(n)} \leq n$. If $n$ is odd, then $3^{t(n)} \leq n$.

*24 The technique of induction. A real appreciation of the power of mathematical induction can be obtained only by studying some of the problems to which it applies. Some of these applications have been given in the examples of Sections 21, 22, and 23. In this section we will examine a few more samples of induction.

EXAMPLE 1. Consider the following sums of consecutive cubes:

$$1^3 = 1, \quad 1^3 + 2^3 = 9, \quad 1^3 + 2^3 + 3^3 = 36, \quad 1^3 + 2^3 + 3^3 + 4^3 = 100, \quad 1^3 + 2^3 + 3^3 + 4^3 + 5^3 = 225.$$

These suggest a general theorem: the sum of the first $n$ consecutive cubes is the square of a natural number. This statement is similar to Example 1 and Problems 1(a) and (b) of Section 21. Thus, it appears to be a likely candidate for induction. However, in the example and problems of Section 21, the induction step of the proof is carried out by simple algebraic manipulation. For this suggested theorem, there seems to be no such algebraic process available. The trouble here is that our statement is not precise enough. In contrast to the example and problems of Section 21, the statement to be proved does not say that the sum of the first $n$ cubes is given by a particular expression in terms of $n$, but only that it is a square of some number. Let us examine the
cases given above with the hope of finding a more exact statement of the theorem:

$$1^3 = 1^2, \quad 1^3 + 2^3 = 3^2, \quad 1^3 + 2^3 + 3^3 = 6^2, \quad 1^3 + 2^3 + 3^3 + 4^3 = 10^2, \quad 1^3 + 2^3 + 3^3 + 4^3 + 5^3 = 15^2.$$

The sequence of numbers 1, 3, 6, 10, 15 was encountered in the discussion of Example 1 of Section 21. They were the sums $1$, $1 + 2$, $1 + 2 + 3$, $1 + 2 + 3 + 4$, and $1 + 2 + 3 + 4 + 5$. We proved that these sums can be expressed in the form $\tfrac{1}{2}n(n + 1)$. This observation suggests that a more complete statement than the original one is true, namely,

$$1^3 + 2^3 + \cdots + n^3 = \left[\tfrac{1}{2}n(n + 1)\right]^2.$$

This identity can be proved by a straightforward application of the principle of mathematical induction. We leave this task as an exercise for the reader.
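Before attempting the induction, it is reassuring to test the sharpened statement on more cases than the five listed above. The following lines are a minimal sketch for this purpose; the names `sum_of_cubes` and `triangular` are ours.

```python
def sum_of_cubes(n):
    """The sum 1^3 + 2^3 + ... + n^3."""
    return sum(i ** 3 for i in range(1, n + 1))


def triangular(n):
    """The sum 1 + 2 + ... + n, computed from the formula n(n + 1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, sum_of_cubes(n), triangular(n) ** 2)
```

Agreement in the printed cases is evidence, not proof; the induction step still has to be carried out by the reader.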
Example 1 illustrates a surprising phenomenon in the technique of using mathematical induction. Proofs by induction often fail because the theorem to be proved has not been stated in a strong enough form. When the appropriate statement of the result is discovered, mathematical induction may work quite well. The reason that this happens is not hard to see. When the statement of a theorem is strengthened, we of course have more to prove. However, we also have more to work with, because the induction hypothesis is also strengthened. The problem is to strike the right balance between hypothesis and conclusion, so that the induction step can be taken. Induction often works better if we make the problem more general. Moreover, the inductive method can sometimes be used to discover theorems. Our next example illustrates these facts.
EXAMPLE 2. Consider a square array of points with 10 points on each side (see Fig. 21). We define a path through the array of points to be a broken line segment starting at the lower left-hand dot, proceeding from dot to dot, moving either to the right or upward, and finally ending at the upper right-hand dot. One such path is shown in Fig. 21. The problem is to find the number of possible paths through the array. A more concrete formulation of this problem is to consider a person in the center of a large town, say at the corner of First Street and First Avenue. In how many ways can he drive to the corner of Tenth Street and Tenth Avenue, traveling by a route just 18 blocks in length? A little experimentation will convince the reader that the number of such paths is too large to count easily. One possible method of finding the desired number is to work up to it inductively. If the square has two dots on each side, as in Fig. 22, there are only two paths. We may hope to work up through squares with 3, 4, 5, $\ldots$, 10 dots on each side. In fact, if $s$ is any natural number, it should be possible to determine the number of paths through a square array with $s$ dots along each side. An even more general problem can be considered. How many paths are there through a rectangular array of points with $r$ dots horizontally and $s$ dots vertically? It may seem optimistic to try to solve this general problem when the particular case of a $10 \times 10$ square array is apparently not easy. However, here is a situation in which the general problem is more accessible to induction than the specific one. Let $P_{r,s}$ denote the number of paths through an $r$ by $s$ rectangular array, where either $r > 1$ or $s > 1$. If $r = 1$ or $s = 1$, then the dots are in line (vertical or horizontal) and there is clearly only one path along the line of dots. In other words,

$$P_{1,s} = P_{r,1} = 1, \qquad (26)$$

for all $r > 1$ and $s > 1$. If both $r$ and $s$ are larger than 1, then there are two possible starts for a path, either to the dot $A$ immediately right of the lower left-hand dot, or to the dot $B$ just above the initial one (see Fig. 23). Suppose
that the first move is to the right. We then have to follow a path from $A$ to the upper right-hand corner. The number of such paths is just the number of paths through an $r - 1$ by $s$ array, that is, $P_{r-1,s}$ in our notation. Similarly, if the first move is to point $B$, then there are $P_{r,s-1}$ ways to continue. Thus, since every path passing through $A$ is different from every path passing through $B$,

$$P_{r,s} = P_{r-1,s} + P_{r,s-1}, \qquad (27)$$

for $r > 1$ and $s > 1$. The relations (26) and (27) are similar to the identities (22) and (23) which determine the binomial coefficients. This can be seen more easily by changing our notation. We restate (26) and (27) as

$$P_{1,n+1} = P_{n+1,1} = 1 \quad (n \geq 1), \qquad (28)$$

$$P_{i+1,n-i+2} = P_{i,n-i+2} + P_{i+1,n-i+1} \quad (1 \leq i \leq n). \qquad (29)$$

Now define

$$N_{k,l} = P_{k+1,l-k+1} \quad (0 \leq k \leq l). \qquad (210)$$

Letting $k = 0$, $l = n$, and also $k = n$, $l = n$ in (210), we obtain from (28)

$$N_{0,n} = N_{n,n} = 1 \quad (n \geq 1),$$

which is (22). Similarly, (29) becomes

$$N_{i,n+1} = P_{i+1,n-i+2} = P_{i,n-i+2} + P_{i+1,n-i+1} = N_{i-1,n} + N_{i,n} \quad (1 \leq i \leq n),$$

which is (23). It was mentioned in Section 22 that the only solutions of (22) and (23) are the binomial coefficients. Therefore, $N_{k,l} = \binom{l}{k}$. Now let $k = r - 1$, $l = r + s - 2$. Then

$$P_{r,s} = N_{r-1,r+s-2} = \binom{r+s-2}{r-1}.$$

In particular, for a square array with $s$ dots on a side, $r = s$ and

$$P_{s,s} = \binom{2s-2}{s-1}.$$

By letting $s = 10$, we obtain the solution of the problem which was originally proposed; there are

$$\binom{18}{9} = 48{,}620$$

paths through the 10 by 10 array of dots. Thus, a person driving from First Avenue and First Street to Tenth Avenue and Tenth Street and back in our mythical city could do so every day of the year for more than 66 years without ever twice using exactly the same route in either direction.
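The relations (26) and (27) translate directly into a recursive computation of the number of paths, which can then be compared with the binomial coefficient obtained above. The sketch below is an illustration under our own naming (`paths` is not a name used in the text); it stores values which have already been computed so that each $P_{r,s}$ is evaluated only once.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def paths(r, s):
    """Number of paths through an r by s array, computed from (26) and (27)."""
    if r == 1 or s == 1:
        # The dots lie on a single line, so there is only one path.
        return 1
    # The first move is either to the right or upward.
    return paths(r - 1, s) + paths(r, s - 1)


if __name__ == "__main__":
    print(paths(2, 2))
    print(paths(10, 10), comb(18, 9))
```

The generalization to rectangular arrays is what makes the recursion possible: the value for the 10 by 10 square is assembled from values for smaller rectangles which are not squares.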
In this example, we have used induction as a method of proof somewhat indirectly, namely, to show that $P_{r,s} = \binom{r+s-2}{r-1}$. This induction was actually carried out in Section 21. However, the method of setting up the problem (that is, obtaining a relation between $P_{r,s}$, $P_{r-1,s}$, and $P_{r,s-1}$) is clearly based on the principle of induction. Note that in order to apply this technique, it is necessary to generalize the original problem of finding the number of paths in a 10 by 10 array to the corresponding problem for a rectangular array of arbitrary size.
2. It is well known that the sum of the interior angles of a regular $n$-sided polygon is $(n - 2) \cdot 180$ degrees. Give a proof of this fact by first generalizing it to a suitable class of (not necessarily regular) polygons and then using induction. [Hint: Divide a regular polygon into two polygons with a smaller number of sides by drawing a line between two nonadjacent vertices. Then see what induction hypothesis is needed to carry through the induction step. You may use the fact that the sum of the interior angles of a triangle is 180.]
3. Consider a triangular array of dots obtained from the $s$ by $s$ square array of Example 2 by deleting all dots above the diagonal line from the lower left-hand corner to the upper right-hand corner. Figure 24 illustrates the case $s = 5$. Define paths from the lower left-hand dot to the upper right-hand dot as before. What is the number of paths through the triangular array with 10 dots on the horizontal and vertical sides?

FIGURE 24
25 Inductive properties of the natural numbers. There is a close relation between the principle of induction and the order properties of the natural numbers. Our purpose in this section is to describe this relationship. We have spoken several times of a sequence $P_1, P_2, P_3, \ldots$ of statements, and later we will discuss arbitrary sequences of rational and real numbers. So far we have not given a complete definition of a sequence.
We have assumed that this notion has an intuitive meaning. When denoting a sequence by $x_1, x_2, x_3, \ldots$, we are taking advantage of the obvious fact that the objects of any sequence can be labeled by the natural numbers. This observation can be used to define the concept of a sequence in any formal development of mathematics based on set theory and the axioms of the natural numbers. That is, a sequence is a correspondence between the natural numbers and the objects of a set $X$, where 1 corresponds to $x_1$, 2 corresponds to $x_2$, and so on. The objects $x_1, x_2, x_3, \ldots$ need not be distinct. For example, if 1 corresponds to 0, 2 corresponds to 1, 3 corresponds to 0, 4 corresponds to 1, etc., we obtain the sequence which is usually written $0, 1, 0, 1, \ldots$. The elements of the sequence are the members of the set $X$. This definition is precise, and it agrees with our intuitive notion of a sequence. Moreover, if sequences are defined in this way, then their properties, and in particular the principle of induction, can be derived from properties of the natural numbers. Let us see what property of the natural numbers it is that yields the principle of mathematical induction. A careful examination of the discussion in Section 21 shows that the principle of induction depends on the fact that if one proceeds along a sequence of objects, from one to the next, then eventually any given element of the sequence will be reached. Applying this observation to the sequence $1, 2, 3, \ldots$ of natural numbers in their usual order, we get the following statement.

(25.1). Principle of induction for the natural numbers. Let $S$ be a set of natural numbers such that (a) $1 \in S$, and (b) if a natural number $n$ is in $S$, then the next number $n + 1$ is in $S$. Then $S$ contains every natural number.
In the formal development of mathematics, (25.1) is usually taken as an axiom. The principle of induction is deduced from it easily. Let $P_1, P_2, P_3, \ldots$ be a sequence of statements indexed by the natural numbers. Let $S$ be the set of all natural numbers $n$ for which the corresponding statement $P_n$ is true. Suppose that the sequence of statements satisfies the two conditions (a) and (b) of (21.1). Then by (21.1a), $P_1$ is true, so that by the definition of $S$, $1 \in S$. Thus $S$ satisfies (25.1a). If $n \in S$, then by the definition of $S$, $P_n$ is true, and therefore by (21.1b), $P_{n+1}$ is true. Hence, $n + 1 \in S$. Thus $S$ satisfies (25.1b). Consequently, according to (25.1), $S$ contains every natural number. This means that every one of the statements $P_1, P_2, P_3, \ldots, P_n, \ldots$ is true.
There is another important property related to the ordering of the natural numbers. Let $A$ be a nonempty finite set of natural numbers. Then the elements of $A$ can be listed in some way:

$$n_1, n_2, \ldots, n_k.$$

By successively examining the numbers in this list, it is possible to pick out a smallest one, that is, a number $n_i$ which satisfies

$$n_i \leq n_1, \quad n_i \leq n_2, \quad \ldots, \quad n_i \leq n_k.$$

Thus, $A$ contains a smallest number. This same conclusion is reasonable even if $A$ is infinite. Suppose that $A$ is any nonempty set of natural numbers. Since $A$ is not empty, it is possible to select some element $a \in A$. Let $A_0 = \{n \mid n \in A, n \leq a\}$. Then $A_0$ consists of some, but not necessarily all, of the natural numbers $1, 2, 3, \ldots, a$. Therefore, $A_0$ contains only a finite number of elements, and it is not empty since $a \in A_0$. Thus $A_0$ has a smallest element. Call this smallest element $m$. By the definition of $A_0$, $m \in A$ and $m \leq a$. If $n \in A$, then either $n \in A_0$, or $n > a$. In the first case, $m \leq n$ because $m$ is the smallest element of $A_0$. In the second case, we have $n > a \geq m$. Thus, $m$ is the least element of $A$. In practice, it may be difficult to determine which numbers belong to $A_0$, and if the cardinality of $A_0$ is large (say, some enormous power of 10), the process of selecting the smallest number $m$ might take several lifetimes. However, $A$ does have a least element, whether we can find it easily or not. This fact, which has important mathematical applications, can be stated as follows.
(25.2). Well-ordering principle. Let $A$ be a nonempty set of natural numbers. Then $A$ contains a smallest number $m$ (that is, $m \in A$ and $m \leq n$ for all $n \in A$).

It is obvious that the conclusion of the well-ordering principle is not true if $A$ is the empty set. There are two reasons for pointing this out. First, a common blunder in applying the principle is committed by failing to prove that the set to which it is applied is not empty. Second, the well-ordering principle is often used to show indirectly that some set $A$ of natural numbers is empty. One assumes that $A$ is not empty, so that the well-ordering principle can be used to infer that $A$ contains a smallest number. Then, from the existence of this smallest number in $A$, some contradiction follows. Therefore, $A$ must be empty. This method of proof can often be used instead of a course of values induction.
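The argument which leads to (25.2) is in effect a search procedure: given any element $a$ of $A$, the least element of $A$ is found among the finitely many numbers $1, 2, \ldots, a$. The following sketch carries out this search. It assumes, purely for illustration, that the set $A$ is described by a membership test `is_member` and that some element `witness` of $A$ is already known; both names are ours.

```python
def least_element(is_member, witness):
    """Least natural number in A, where A is given by a membership test.

    `witness` must be an element of A. This is the point at which the
    nonemptiness of A is used.
    """
    if not is_member(witness):
        raise ValueError("witness must belong to the set")
    # Scan 1, 2, ..., a. The members of A met here form the finite set A_0.
    for n in range(1, witness + 1):
        if is_member(n):
            return n


if __name__ == "__main__":
    # A = set of natural numbers greater than 1 which divide 91.
    print(least_element(lambda n: n > 1 and 91 % n == 0, 91))
```

If no element of the set is supplied, the search has nowhere to stop, which reflects the warning that the principle cannot be applied until the set is known to be nonempty.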
EXAMPLE 1. Using the well-ordering principle, we give a new proof of the fact that every natural number $n$ greater than 1 is divisible by a prime number (originally proved in Example 2, Section 23, using course of values induction). Let $A$ be the set of all natural numbers $n$ which are greater than one and not divisible by a prime. If $A$ is empty, then every $n > 1$ is divisible by a prime, and this is what we wish to show. Hence, suppose that $A$ is not empty. Then by the well-ordering principle, there is a smallest number $m \in A$. Since $m$ belongs to $A$, it is not 1 and it is not divisible by a prime. In particular, $m$ itself is not a prime. Therefore, $m$ is divisible by some natural number $k$ which is different from 1 and $m$. In particular, $k < m$, and $k \notin A$, since $m$ is the smallest number in $A$. By the definition of $A$, this means that either $k = 1$, or $k$ is divisible by a prime. However $k \neq 1$, so that $k$ must be divisible by a prime $p$. Since $p$ divides $k$ and $k$ divides $m$, it follows that $p$ divides $m$. But this contradicts the fact that $m \in A$, since no number in $A$ is divisible by a prime. Thus, the original assumption that $A$ is not empty must be incorrect. That is, $A = \Phi$, which means that every natural number greater than one is divisible by a prime.
1. Show by examples that the well-ordering principle is not true for subsets of $Z$, $Q$, or $R$. [Hint: Give examples of nonempty sets which do not contain a smallest element.]
2. Show that the well-ordering principle is satisfied for subsets of the following sets.
(a) $\{n \mid n \in Z, n \geq k\}$, where $k$ is any integer
(b) $\{1 - 1/n \mid n \in N\}$
3. Show by a method similar to the derivation of (21.1) from (25.1) that the generalized induction principle (23.3) can be deduced from the well-ordering principle (25.2).
4. Show that if $m$ and $n$ are any two natural numbers, then at least one of the following is true.
(a) $m = n$
(b) there is a natural number $k$ such that $m = n + k$
(c) there is a natural number $l$ such that $n = m + l$
[Hint: Let $S$ be the set of all natural numbers $m$ such that for all $n$ either (a), (b), or (c) is satisfied. Note that 1 is in $S$. Show that condition (b) of (25.1) is also satisfied.]
5. Show that (25.1) is a consequence of (25.2).
6. Give an inductive proof of the sequence of statements $P_1, P_2, P_3, \ldots$, where $P_n$ is the assertion: if $A$ is a set of natural numbers and if $n \in A$, then $A$ contains a smallest element.
7. Show by the well-ordering principle that every nonempty finite set of natural numbers has a largest element.

*26 Inductive definitions. As the reader probably realizes, definitions are an important part of mathematics. Often an inductive process is used to formulate a mathematical definition. Definitions of this sort are called inductive, or recursive.

EXAMPLE 1. Let $x$ be a real number. Then the $n$th power of $x$ is defined for all $n$ as the product of $x$ with itself $n$ times. However, a more precise definition of $x^n$ is formulated inductively by means of two requirements:

(a) $x^1 = x$;
(b) $x^{n+1} = (x^n) \cdot x$.
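EXAMPLE 2. Many important sequences are defined recursively. We cite as an example the Fibonacci sequence:

$$u_1 = 1, \quad u_2 = 1, \quad u_3 = 2, \quad u_4 = 3, \quad u_5 = 5, \quad u_6 = 8, \quad u_7 = 13, \quad \ldots.$$

The sequence is defined inductively by the conditions

$$u_1 = 1, \quad u_2 = 1, \quad u_{n+1} = u_n + u_{n-1} \quad (\text{for } n \geq 2).$$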
EXAMPLE 3. Often informal definitions are given for objects which should properly be defined by induction. For example, the sum $S_n = 1 + 2 + \cdots + (n - 1) + n$ of the first $n$ natural numbers was introduced informally in Section 21. The inductive definition of this sum is given by the conditions $S_1 = 1$, $S_{n+1} = S_n + (n + 1)$. A proof by mathematical induction would establish the identity $S_n = \tfrac{1}{2}n(n + 1)$ in the same way as before.
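Inductive definitions of this kind carry over almost word for word to recursive functions. The sketch below is our own illustration of the three examples; the function names are hypothetical, and natural number arguments $n \geq 1$ are assumed throughout. Each function has one branch for the initial condition and one branch for the condition which passes from $n$ to $n + 1$.

```python
from functools import lru_cache


def power(x, n):
    """The n-th power of x: x^1 = x and x^(n+1) = (x^n) * x."""
    if n == 1:
        return x
    return power(x, n - 1) * x


@lru_cache(maxsize=None)
def fibonacci(n):
    """The n-th Fibonacci number: u_1 = u_2 = 1 and u_(n+1) = u_n + u_(n-1)."""
    if n <= 2:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


def partial_sum(n):
    """The sum S_n of the first n natural numbers: S_1 = 1, S_(n+1) = S_n + (n + 1)."""
    if n == 1:
        return 1
    return partial_sum(n - 1) + n


if __name__ == "__main__":
    print(power(2, 10))
    print([fibonacci(n) for n in range(1, 8)])
    print(partial_sum(100), 100 * 101 // 2)
```

The decorator `lru_cache` only stores values which have already been computed; it does not change the definition of `fibonacci`.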
equality $x^1 = x$. The condition $K$ is the equality $x^{n+1} = (x^n) \cdot x$. In Example 2, $O_n$ is the $n$th term of the Fibonacci sequence. Here, $k = 2$, $C_1$ is the condition $u_1 = 1$, $C_2$ is the condition $u_2 = 1$, and $K$ is the condition $u_{n+1} = u_n + u_{n-1}$, $n \geq 2$.
It is important to show that inductive definitions give uniquely determined objects $O_n$ for every natural number $n$. That is, we would like to know that there exists a sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for every $n \leq k$, and $O_n$ satisfies $K$ for $n > k$, and that if $O_1', O_2', \ldots, O_n', \ldots$ is any sequence of objects such that $O_n'$ satisfies $C_n$ for $n \leq k$, and $O_n'$ satisfies $K$ for $n > k$, then $O_1' = O_1$, $O_2' = O_2$, $\ldots$, $O_n' = O_n$, $\ldots$. This can be proved using the induction principle for the natural numbers (25.1).
THEOREM 26.1. Suppose that $C_1, C_2, \ldots, C_k$, and $K$ are conditions having the properties stated in (1) and (2) above. Then there is a unique sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for $n \leq k$, and $O_n$ satisfies $K$ for $n > k$.
Proof. Let $S$ be the set of all natural numbers $m$ such that there are unique objects $O_1, O_2, \ldots, O_m$ with the properties that for $n \leq m$ and $n \leq k$, $O_n$ satisfies $C_n$, and for $k < n \leq m$, $O_n$ satisfies $K$ (provided $m > k$). Then $1, 2, \ldots,$ and $k$ are in $S$, since by (1), conditions $C_1, C_2, \ldots, C_k$, respectively, determine unique objects $O_1, O_2, \ldots, O_k$. Suppose that some $m \geq k$ belongs to $S$. Then by (2), there is a unique object $O_{m+1}$ satisfying $K$. Therefore there are unique objects $O_1, O_2, \ldots, O_m, O_{m+1}$ such that $O_n$ ($n \leq m + 1$) satisfies $C_n$ for $n \leq k$, and $O_n$ satisfies $K$ for $n > k$. Thus $m + 1 \in S$. Hence, $S$ satisfies the conditions of (25.1), and therefore every natural number is in $S$. This means that there is a unique sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for $n \leq k$, and $O_n$ satisfies $K$ for $n > k$.
As one might expect, if objects $O_n$ are defined inductively, then mathematical induction is an important tool for establishing the properties of these objects. We illustrate this fact by obtaining an estimate of the size of the Fibonacci numbers. Let $a$ be a positive real number satisfying $1 + a = a^2$. Then $a$ is a solution of the equation $x^2 - x - 1 = 0$. By the formula for the roots of a quadratic equation, the solutions of this equation are $\tfrac{1}{2}(1 + \sqrt{5})$ and $\tfrac{1}{2}(1 - \sqrt{5})$. Of the solutions, only the first is positive. Thus, $a = \tfrac{1}{2}(1 + \sqrt{5})$. In particular, $a > 1$. Hence, $u_1 = 1 < a$ and $u_2 = 1 < a < a^2$. Make the induction hypothesis that $u_m < a^m$ for all $m \leq n$, where $n \geq 2$. Then

$$u_{n+1} = u_n + u_{n-1} < a^n + a^{n-1} = a^{n-1}(a + 1) = a^{n-1}a^2 = a^{n+1}.$$

Thus by the principle of induction we conclude that $u_n < a^n$, for all $n$. Note that although we do not know an explicit formula for the number $u_n$, we can nevertheless find some of its properties. This is typical of objects which are defined inductively.
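The estimate $u_n < a^n$ can be compared with actual values of the Fibonacci numbers. The sketch below is an illustration only; the names `A`, `fibonacci_terms`, and `bound_holds` are ours. It uses floating-point arithmetic for $a$, which is adequate here because $u_n$ stays well below $a^n$ in the range tested.

```python
from math import sqrt

# The positive solution of 1 + a = a^2.
A = (1 + sqrt(5)) / 2


def fibonacci_terms(count):
    """The first `count` terms u_1, u_2, ..., of the Fibonacci sequence."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def bound_holds(count):
    """Check u_n < a^n for n = 1, 2, ..., count."""
    terms = fibonacci_terms(count)
    return all(terms[n - 1] < A ** n for n in range(1, count + 1))


if __name__ == "__main__":
    for n, u in enumerate(fibonacci_terms(10), start=1):
        print(n, u, round(A ** n, 3))
    print(bound_holds(60))
```

The table suggests that the bound is far from sharp, since the ratio of $u_n$ to $a^n$ settles down to a value below one half.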
and if O
2. Give an inductive definition for the sums in Problems 1(a), 2(a), 3(a), and 4(a) of Section 21.
3. Give an inductive definition of $n!$.
4. Give an inductive definition of $nx$ (the operation of adding $x$ to itself $n$ times), and prove that this is the same as the operation of multiplying $x$ by $n$.
5. List the first 50 terms of the Fibonacci sequence.
6. Show that no two consecutive terms of the Fibonacci sequence are divisible by the same natural number greater than one.
7. In the Fibonacci sequence $u_1, u_2, \ldots, u_n, \ldots$, show that $u_n$ is even if $n$ is a multiple of 3 and that $u_n$ is odd otherwise.
8. In the Fibonacci sequence $u_1, u_2, \ldots, u_n, \ldots$, show that $a^n < u_{n+2}$.
9. Let $a = \tfrac{1}{2}(1 + \sqrt{5})$ and $b = \tfrac{1}{2}(1 - \sqrt{5})$. Prove that if $u_n$ is the $n$th term of the Fibonacci sequence, then $u_n = (1/\sqrt{5})(a^n - b^n)$.
10. Let the sequence $v_1, v_2, v_3, \ldots$ be defined inductively by

$$v_1 = 1, \quad v_2 = 3, \quad v_{n+1} = v_n + v_{n-1} \quad (n \geq 2).$$

List the first 25 terms of this sequence. Show by induction that for any natural numbers $m$ and $n$, $2u_{m+n} = u_m v_n + u_n v_m$, where $u_1, u_2, \ldots, u_n, \ldots$ is the Fibonacci sequence.
CHAPTER 3
31 THE DEFINITION OF NUMBERS
three camels for seven wives, it was essential that both understood how many items he would have to give up and how many he would receive. This understanding could be achieved, for example, by means of "counters," that is, collections of stones from which sets of small cardinality might be formed in the palm of the hand. From very early times, numbers have been treated as concrete objects, rather than the names of properties of sets. In fact, this fictitious viewpoint was essential for the creation and development of mathematics. It was not until the 19th century that mathematicians started wondering how to justify the existence of numbers. One of the earliest attempts to define the natural numbers as objects was made by the German mathematician Gottlob Frege (1848-1925) in a book on the foundations of arithmetic, published in 1893. Basing his work on Cantor's set theory, Frege defined the cardinal number of a set $A$ to be the class of all sets which are equivalent to $A$ by Cantor's definition of equivalence (see Definition 12.3). According to Frege's definition, the cardinal number of $\{0, 1\}$ is the class of all sets $\{a_1, a_2\}$, where $a_1$ and $a_2$ are distinct objects. Similarly, the cardinal number of $\{\mathrm{I}, \mathrm{II}, \mathrm{III}\}$ is the class of all sets $\{a_1, a_2, a_3\}$, where $a_1$, $a_2$, and $a_3$ are distinct elements. The natural numbers are defined to be the cardinal numbers of finite sets. Thus, the number "2" is a set, namely, the set of all pairs of distinct objects. Frege's definition of the natural numbers is therefore based on two concepts: the notion of a finite set, and the definition of the cardinal number of an arbitrary set. In Section 12, it was taken for granted that the natural numbers existed, and that their properties were well known. Thus, it made sense in Definition 12.2 to define a set to be finite if it was equivalent to $\{1, 2, \ldots, n\}$ for some natural number $n$.
This definition cannot be used if the natural numbers are defined as in Frege's program, using the concept of a finite set. This difficulty can be avoided by defining a set A to be finite if and only if there is no onetoone correspondence betiveen A and a proper subset* of A . I t turned out that there was a more serious flaw in the second notion which enters into Frege's definition of the natural numbers. Unless handled with great care, the concept of the class of al1 sets which are equivalent to a given set A leads to perplexing logical contradictions. Exactly how much care is needed to avoid these contradictions is somewhat uncertain even now. Thus, the definition of the cardinal number of an arbitrary set which Frege used is unacceptable, and therefore so is his construction of the natural numbers.
* It is possible to give a convincing argument that this definition of a finite set agrees with the intuitive idea of such a set. For example, see The Foundations of Mathematics by R. L. Wilder, pp. 62-71. Wiley (1952), New York.
A more satisfactory definition of the natural numbers was given by John von Neumann (1903-1957) in 1923. von Neumann observed that any standard sequence of sets, containing what we would intuitively recognize as 1, 2, 3, 4, . . . elements, respectively, can be adopted as the sequence of natural numbers. He showed that a convenient choice for this standard sequence is

$$\{\emptyset\}, \qquad \{\emptyset, \{\emptyset\}\}, \qquad \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \qquad \ldots$$
These particular sets* are now usually called finite ordinal numbers. The elements of each such ordinal number n are the empty set, and all ordinal numbers which precede n. Usually, the number 0 (zero) is included among the ordinal numbers. If this is done, then according to von Neumann's definition, 0 would have to be the empty set, since $\emptyset$ is the only set which contains no elements. With this convention, the definition of finite ordinal numbers is very natural: n is the set of all ordinal numbers which precede it. In this book, we will consider 0 to be an ordinal number, but not a natural number. This convention simplifies certain statements concerning the arithmetic of N, for example, the cancellation law of multiplication. In order to develop some of von Neumann's theory of the natural numbers, we must give an exact definition of these objects. The method by which the natural numbers can be generated is clear:

$$1 = \{\emptyset\},$$
$$2 = 1 \cup \{1\} = \{\emptyset\} \cup \{1\} = \{\emptyset, 1\} = \{\emptyset, \{\emptyset\}\},$$
$$3 = 2 \cup \{2\} = \{\emptyset, 1\} \cup \{2\} = \{\emptyset, 1, 2\},$$

and

$$n + 1 = \{\emptyset, 1, 2, \ldots, n - 1\} \cup \{n\} = n \cup \{n\}.$$
* The reader should remember the convention discussed in Section 11 that an object a is to be distinguished from the set $\{a\}$ whose only element is a. Thus, $1 = \{\emptyset\} \neq \emptyset$, and $2 = \{\emptyset, \{\emptyset\}\} \neq \{\emptyset, \emptyset\} = \{\emptyset\} = 1$, etc.
DEFINITION 31.1. (a) $\{\emptyset\}$ is a natural number. (b) If n is a set which is a natural number, then $n \cup \{n\}$ is a natural number. (c) The natural numbers are just those sets which are obtained by repeated application of the rules (a) and (b). Henceforth, the term "natural number" will refer to the sets defined in Definition 31.1. Of course, the familiar symbols 1, 2, 3, . . . will be used to denote the respective sets

$$\{\emptyset\}, \qquad \{\emptyset, \{\emptyset\}\}, \qquad \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}, \qquad \ldots$$
We will use lowercase letters to represent natural numbers, even though this violates the custom of denoting sets by capital letters. As usual, the symbol N will stand for the set of all natural numbers. Our definition of the natural numbers may at first seem strange to the reader. However, the use of this definition requires very little readjustment in our way of thinking about natural numbers. The theorems and definitions which have been given in Chapters 1 and 2 are all sensible and correct when the term "natural number" is interpreted according to Definition 31.1. As an example, let us consider the principle of induction (25.1). It is convenient to introduce the following notation. If n is a natural number, define $S(n) = n \cup \{n\}$. Intuitively, S(n) denotes the "successor" of n, that is, $n + 1$. With this notation, the induction principle can be stated in the following form. (31.2). If A is a set of natural numbers such that (a) $1 \in A$, and (b) if $n \in A$, then $S(n) \in A$, then A contains every natural number. This principle is virtually a restatement of Definition 31.1 (c). Indeed, the conditions (a) and (b) in (31.2) state that $\{\emptyset\}$ belongs to A, and if n is in A, then $n \cup \{n\}$ belongs to A. In particular, any set which is obtained by repeated application of the rules (a) and (b) in Definition 31.1 must also belong to A. Therefore, according to Definition 31.1 (c), every natural number is in A. The fact that each natural number n is a set which contains what we instinctively think of as n elements often makes it possible to simplify the statements of definitions and theorems. For example, Definition 12.2 (of a set X having cardinality n) is intuitively equivalent to the statement that there is a one-to-one correspondence between X and n.
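Definition 31.1 and the successor notation $S(n) = n \cup \{n\}$ can be imitated with Python's `frozenset`. The following is only a rough sketch: the names `EMPTY`, `ONE`, `succ`, and `natural` are illustrative assumptions and do not come from the text.

```python
# Illustrative sketch of Definition 31.1; all names here are hypothetical.
EMPTY = frozenset()            # the empty set
ONE = frozenset({EMPTY})       # rule (a): 1 is the set whose only element is the empty set


def succ(n):
    """Rule (b): S(n) is n together with the extra element n."""
    return n | frozenset({n})


def natural(k):
    """Build the set denoted by k (k >= 1) by applying succ k - 1 times to ONE."""
    if k < 1:
        raise ValueError("in the text, the natural numbers start at 1")
    n = ONE
    for _ in range(k - 1):
        n = succ(n)
    return n


three = natural(3)
# The set 3 has three elements: the empty set, 1, and 2.
print(len(three), EMPTY in three, natural(1) in three, natural(2) in three)
```

The check that the set denoted by k has k elements is the informal observation made in the text, here tested only for small k.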
(31.3). If n is any natural number and X is a set, then $|X| = n$ if (and only if) X is equivalent to n.
Because it is intuitively sound and better suited than Definition 12.2 for the development of the theory of the natural numbers from Definition 31.1, we will use (31.3) as our definition of a set having cardinality n. As in Definition 12.2, a set X will be called finite if either $X = \emptyset$, or there is a natural number n such that X is equivalent to n. Although it would be contrary to our intuition, it is not inconceivable that a set X might be equivalent to two different natural numbers m and n. This would imply that there is a one-to-one correspondence between m and n. Fortunately, it is possible to show that if a one-to-one correspondence between natural numbers m and n exists, then $m = n$ (see Problem 19). Consequently, it makes sense to say that the cardinal number of the set X is n if X is equivalent to n, and to define $|X|$ to be this unique number. In other words, (31.3) is meaningful. As a set, each natural number n is equivalent to itself. Therefore, according to (31.3), $|n| = n$, for every natural number n. If X is any finite, nonempty set, then X is equivalent to $|X|$. In fact, for X to be nonempty and finite means by definition that there is a natural number n such that X is equivalent to n. Then by (31.3), $|X| = n$. Hence, X is equivalent to $|X|$. This observation leads to the following useful fact. If X and Y are two finite nonempty sets, then X is equivalent to Y if and only if $|X| = |Y|$. To prove this statement, suppose first that X and Y are equivalent. Since Y is equivalent to $|Y|$, it follows from (12.4c) that X is equivalent to $|Y|$. Therefore, taking n in (31.3) to be the natural number $|Y|$, we obtain $|X| = |Y|$. To prove the converse statement, suppose that $|X| = |Y|$. Since X is equivalent to $|X|$, and Y is equivalent to $|Y|$ (and $|X| = |Y|$), it follows from (12.4c) that X is equivalent to Y. Many results follow from Definition 31.1 and (31.2). In particular, we cite the following.
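(31.4). 1 is a natural number.
(31.5). If n is a natural number, then S(n) is a natural number.
(31.6). If n is a natural number, then $1 \neq S(n)$.
(31.7). If m and n are natural numbers such that $S(m) = S(n)$, then $m = n$.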
The statements (31.4) and (31.5) are reformulations of Definition 31.1 (a) and (b), respectively, using the notation 1 instead of $\{\emptyset\}$ and S(n) instead of $n \cup \{n\}$. The proofs of (31.6) and (31.7) are easy, and we leave them as exercises for the reader (see Problems 4 and 6).
The statements (31.2), (31.4), (31.5), (31.6), and (31.7) are called Peano's axioms, because it was shown in 1889 by the Italian mathematician Giuseppe Peano (1858-1932) that the whole theory of the natural numbers can be developed from these statements. In Peano's development of arithmetic, the natural numbers constitute a set N of undefined objects, with a distinguished element 1. It is assumed that an operation is defined on N which corresponds intuitively to the process of passing from a natural number n to its successor S(n). Finally, it is assumed that Peano's axioms are satisfied. From these few axioms, it is possible to define addition and multiplication in N, and show that these operations have their familiar properties. This axiomatic development of the natural numbers is carried out in several textbooks, in which a construction of the number systems is given. However, we will not use Peano's definition of the operations in N. When the natural numbers are defined as in Definition 31.1, the addition and multiplication operations have useful meanings in terms of the operations of set theory.
1. According to Frege's definition, what is the cardinal number of the empty set?
2. Write in full the sets which are 5 and 6 in von Neumann's definition of the natural numbers.
3. Prove that if $n \in N$, then $\emptyset \in n$; thus, no natural number is the empty set. [Hint: Let $A = \{n \in N \mid \emptyset \in n\}$. Use (31.2) to show that $A = N$.]
4. Show that if $n \in N$, then $n \notin 1$, so that $1 \neq S(n)$.
5. Prove that if $m \in n$, then $m \subseteq n$. [Hint: Let $A = \{n \in N \mid \text{either } m \notin n, \text{ or } m \subseteq n\}$. Use (31.2) to show that $A = N$.]
6. Prove (31.7).
The following problems lead to some of the most important properties of the natural numbers. Several of the assertions made in this section and a number of the unproved statements in the next two sections occur among these problems. The reader with limited mathematical background will probably have difficulty proving some of the statements in Problems 7 through 19, even though hints are supplied in many cases. Such students are advised to read the problems, and try to see what they mean (keeping in mind that the set n is $\{\emptyset, 1, 2, \ldots, n - 1\}$), without attempting to do all of them. However, Problems 7 through 19 should be worked in the order in which they appear, because many of them depend on the preceding ones.
7. Prove that $n \notin n$ for all natural numbers n. [Hint: Let $A = \{n \in N \mid n \notin n\}$. Use (31.2) to prove that $A = N$.]
1, or 1 E n.
1, or n
k
$\in N$.
11. Show that if $m \in n$, then either $S(m) = n$, or else $S(m) \in n$. [Hint: Let $A = \{n \in N \mid \text{for all } m \in n, \text{ either } S(m) = n, \text{ or else } S(m) \in n\}$. Prove that $A = N$.]
12. Show that for all m and n, either $m \in n$, $m = n$, or $n \in m$. [Hint: Let $A = \{n \in N \mid \text{for all } m, \text{ either } m \in n,\ m = n, \text{ or } n \in m\}$. Prove that $A = N$.]
13. Prove that for all m and n, either $m \subset n$, $m = n$, or $n \subset m$.
14. Prove that $m \subset n$ if and only if $m \in n$.
15. Prove that $m \subset n$ if and only if $S(m) \subset S(n)$.
16. Show that if n is a natural number, and X is a proper nonempty subset of n, then there exists $m \in N$ such that $m \subset n$ and $|X| = m$. [Hint: Let A be the set of all n for which the statement is true. Show that $1 \in A$. Suppose that $n \in A$. To prove that $S(n) \in A$, suppose that $\emptyset \subset X \subset S(n) = n \cup \{n\}$. Let $Y = X \cap n$. Show that the statement of the problem is true for S(n) and X in each of the following cases: $Y = \emptyset$; $Y = n$; $\emptyset \subset Y \subset n$ and $n \notin X$; $\emptyset \subset Y \subset n$ and $n \in X$.]
17. Prove that if X is a subset of a finite set Y, then X is finite, and moreover, if $X \subset Y$, then $|X| \subset |Y|$.
18. (a) Show that if X is a finite set and z is any element, then $X \cup \{z\}$ is finite. (b) Prove that if X and Y are finite sets, then $X \cup Y$ is finite. [Hint: Let $A = \{m \in N \mid \text{if } X \text{ is a finite set and } |Y| = m, \text{ then } X \cup Y \text{ is finite}\}$. Use (31.2) to prove that $A = N$.]
19. (a) Prove that if $n \in N$ and $1 \subset n$, then there is no one-to-one correspondence between 1 and n. (b) Show that if $m \in N$ and $n \in N$ are such that there is a one-to-one correspondence between S(m) and S(n), then there is a one-to-one correspondence between m and n. [Hint: Let $r \leftrightarrow s$ be a one-to-one correspondence between S(m) and S(n). If $m \in S(m)$ corresponds to $n \in S(n)$, then $r \leftrightarrow s$ is a one-to-one correspondence between m and n, also. If m does not correspond to n, show that the given correspondence can be modified to obtain a one-to-one correspondence between m and n.] (c) Prove that if m and n are natural numbers such that m and n are equivalent, then $m = n$. [Hint: Let $A = \{m \in N \mid \text{for all } n \in N, \text{ if } m \subset n, \text{ then } m \text{ and } n \text{ are not equivalent}\}$. Use Problem 19(a) and (b), and (31.2) to show that $A = N$. The statement (c) then follows from the result of Problem 13.]

32 Operations with the natural numbers. Once people began to think of the natural numbers as concrete objects, it was found that these objects could be combined in useful ways. Thus, if a set A contains 2 elements and a set B, which is disjoint from A, contains 3 elements, then the union $A \cup B$ invariably contains 5 elements. The process of forming the union
$A \cup B$ of disjoint sets gives rise to the abstract operation which we call addition: $2 + 3 = 5$. The definition of addition can be stated very simply.
DEFINITION 32.1. Let m and n be natural numbers. Let X and Y be sets such that (a) $|X| = m$, $|Y| = n$, and (b) $X \cap Y = \emptyset$. Then the sum of m and n is the natural number $|X \cup Y|$.
The sum of m and n is denoted by $m + n$. The process which associates with each pair m and n of natural numbers their sum $m + n$ is the binary operation called addition. The fact that every pair of natural numbers has a unique sum is expressed by saying that the natural numbers are closed under addition. In order to see that 32.1 is a valid definition, we must show that it provides a rule by which any two natural numbers can be combined to produce a unique third natural number. This is accomplished by proving the following statements.
(1) For any natural numbers m and n, there exist sets X and Y which satisfy (a) and (b).
(2) If X and Y are sets satisfying (a) and (b), then $X \cup Y$ is finite, that is, there is a natural number k such that $|X \cup Y| = k$.
(3) The natural number $|X \cup Y|$ is the same for all pairs X, Y of sets satisfying (a) and (b).
It is clear that (1) and (2) guarantee that each pair of natural numbers has a sum, and (3) insures the fact that this sum is unique. To prove (1), we must define two sets X and Y which satisfy (a) and (b). There are many ways in which this can be done. We can, for example, use the following construction. Let X be the product set $m \times \{1\}$, and let $Y = n \times \{2\}$. Then

$$j \leftrightarrow (j, 1),\ j \in m, \qquad k \leftrightarrow (k, 2),\ k \in n,$$

are one-to-one correspondences between m and $m \times \{1\}$ and n and $n \times \{2\}$, respectively. Therefore, X is equivalent to m and Y is equivalent to n. Thus, by (31.3), (a) is satisfied. Suppose that (b) is not satisfied, that is, $X \cap Y \neq \emptyset$. Then there would be some $j \in m$ and $k \in n$ such that

$$(j, 1) = (k, 2).$$

By Definition 13.1, this implies in particular that $1 = 2$, that is, $\{\emptyset\} = \{\emptyset, \{\emptyset\}\}$. Since this is clearly false, it follows that $X \cap Y = \emptyset$.
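The construction used to prove statement (1) can be tried out concretely. The sketch below is an illustration under stated assumptions, not part of the text's development: the helper names `succ`, `natural`, `tagged`, and `add` are hypothetical, and Python's `len` is used as a stand-in for the cardinal number of a finite set.

```python
# Illustrative sketch of Definition 32.1; all helper names are hypothetical.
EMPTY = frozenset()
ONE = frozenset({EMPTY})


def succ(n):
    """S(n): the set n with n itself added as an element."""
    return n | frozenset({n})


def natural(k):
    """The set denoted by k, for k >= 1."""
    if k < 1:
        raise ValueError("in the text, the natural numbers start at 1")
    n = ONE
    for _ in range(k - 1):
        n = succ(n)
    return n


TWO = natural(2)


def tagged(n, tag):
    """The product set of n with the one-element set containing tag."""
    return frozenset((element, tag) for element in n)


def add(m, n):
    """Sum of m and n as the cardinal number of a disjoint union."""
    X, Y = tagged(m, ONE), tagged(n, TWO)
    assert not (X & Y)            # condition (b): X and Y are disjoint
    # Assumption: len() plays the role of the cardinal number of X union Y.
    return natural(len(X | Y))


print(add(natural(2), natural(3)) == natural(5))
```

Tagging the elements of m with 1 and those of n with 2 is exactly what keeps the two copies disjoint, even when m and n share elements (as they always do, since every natural number contains the empty set).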
The proof of (2) is based on (31.2), and it will not be given (see Problem 18, Section 31). Statement (3) is a consequence of the following result from set theory. (32.2). If X, X', Y, and Y' are sets such that (a) X is equivalent to X' and Y is equivalent to Y', and (b) $X \cap Y = \emptyset$ and $X' \cap Y' = \emptyset$, then $X \cup Y$ is equivalent to $X' \cup Y'$.
Proof. Since X and X' are equivalent, there is a one-to-one correspondence $x \leftrightarrow x'$ between X and X'. Similarly, there is a one-to-one correspondence $y \leftrightarrow y'$ between Y and Y'. The required one-to-one correspondence between $X \cup Y$ and $X' \cup Y'$ is obtained by combining the given correspondences $x \leftrightarrow x'$ and $y \leftrightarrow y'$. Since $X \cap Y = \emptyset$ and $X' \cap Y' = \emptyset$, the resulting combination is a one-to-one correspondence. Figure 31 below illustrates this proof. As an exercise, the reader can use (32.2) to prove (3).
The basic rules of addition are the following familiar statements. (32.4). Properties of Addition. Let k, m, and n be natural numbers. Then (a) $k + (m + n) = (k + m) + n$; (b) $m + n = n + m$; (c) if $k + n = m + n$, then $k = m$.
Statement (a) is the associative law of addition, (b) is the commutative law of addition, and (c) is the cancellation law of addition. The properties (a) and (b) are easily seen to follow from the corresponding associative and commutative laws for set unions. For example, if X
and Y are disjoint sets such that $|X| = m$ and $|Y| = n$, then by Definition 32.1, $m + n = |X \cup Y|$ and $n + m = |Y \cup X|$. Since $X \cup Y = Y \cup X$, it follows that $m + n = n + m$. The proof of (c) will be given in the next section. Before turning our attention to the operation of multiplication, we will prove that the "successor" of each natural number is obtained by adding 1 to the number:

$$S(n) = n + 1.$$
Proof. Let $X = n \times \{1\}$, and let $Y = \{\emptyset\} \times \{2\} = \{(\emptyset, 2)\}$. Then, as in the proof of statement (1) following Definition 32.1,

$$k \leftrightarrow (k, 1),\ k \in n, \qquad \text{and} \qquad \emptyset \leftrightarrow (\emptyset, 2)$$

are one-to-one correspondences between n and X and $1 = \{\emptyset\}$ and Y, respectively. Therefore, $|X| = n$ and $|Y| = 1$. Moreover, $X \cap Y = \emptyset$ as before. Hence, by Definition 32.1, $|X \cup Y| = n + 1$. On the other hand, the correspondence

$$k \leftrightarrow (k, 1), \text{ for } k \in n, \qquad n \leftrightarrow (\emptyset, 2)$$

is one-to-one between $n \cup \{n\} = S(n)$ and $X \cup Y$. Thus, $X \cup Y$ is equivalent to the natural number S(n), that is, $|X \cup Y| = S(n)$. Therefore, $S(n) = |X \cup Y| = n + 1$.
Like addition, multiplication of natural numbers is a binary operation under which the set of natural numbers is closed. That is, if we are given natural numbers m and n, the operation of multiplication yields a unique natural number which is called the product of m and n, and is usually denoted by* mn or $m \cdot n$. In this book, both the notations mn and $m \cdot n$ will be used. The multiplication of natural numbers is often defined to be the process of repeated addition. For a given natural number n, we obtain the product $2 \cdot n$ by adding $n + n$, the product $3 \cdot n$ by adding $(n + n) + n$ [or $n + (n + n)$, since by (32.4a), addition is associative]. In general, the product $m \cdot n$ is found by adding $n + n + \cdots + n$, where there are m terms in the sum. The familiar terminology "m times n" is a literal expression of the idea that multiplication is repeated addition. This definition is meaningful for specific products such as $2 \cdot n$ and $3 \cdot n$. However, in the definition of $m \cdot n$ for an arbitrary natural number m, the statement
* The symbol $m \times n$ is also frequently used to denote the product of m and n. However, in our notation, $m \times n$ stands for the set product of m and n considered as sets.
"there are m terms in the sum $n + n + \cdots + n$" is vague. The difficulty in this definition can be corrected by defining multiplication by the conditions

$$\text{(a) } 1 \cdot n = n, \qquad \text{(b) } S(m) \cdot n = m \cdot n + n,$$
and using (31.2) to prove that $m \cdot n$ is a uniquely defined natural number for each pair of natural numbers m and n. An alternative definition of multiplication, based on the concept of the product of two sets, is better suited to our discussion in which the operations of the natural numbers are related to operations with sets. The following example shows that such a definition is reasonable.
EXAMPLE 1. The seats in theaters are usually labeled by letters corresponding to the rows, and by numbers corresponding to the positions of the seats in each row. Suppose that a theater has 26 rows, each of which contains 50 seats. Then each seat is labeled by an ordered pair consisting of a letter of the alphabet and a number from 1 to 50. Consequently, there is a one-to-one correspondence between the seats in the theater and the set $A \times L$, where $A = \{a, b, c, \ldots, z\}$, and $L = \{1, 2, 3, \ldots, 50\}$. Therefore the number of seats in the theater is exactly the cardinal number $|A \times L|$ of the product set $A \times L$. On the other hand, we know very well that the number of seats can be determined by multiplying the number of rows by the number of seats in each row. That is, $|A \times L| = |A| \cdot |L|$.
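The count in Example 1 can be checked in two ways: by listing the ordered pairs of the product set, and by the repeated-addition conditions $1 \cdot n = n$ and $S(m) \cdot n = m \cdot n + n$ stated before the example. The sketch below uses ordinary Python integers as a stand-in for the natural numbers; the names `multiply`, `rows`, `seats_per_row`, and `labels` are illustrative assumptions.

```python
from itertools import product
from string import ascii_lowercase


def multiply(m, n):
    """m times n from the conditions 1*n = n and S(m)*n = m*n + n (m, n >= 1)."""
    if m < 1 or n < 1:
        raise ValueError("defined here only for natural numbers (>= 1)")
    result = n                    # condition (a): 1 * n = n
    for _ in range(m - 1):        # condition (b), applied m - 1 times
        result = result + n
    return result


rows = set(ascii_lowercase)               # the 26 row letters a, ..., z
seats_per_row = set(range(1, 51))         # the seat numbers 1, ..., 50
labels = set(product(rows, seats_per_row))   # all (letter, number) pairs

print(len(labels), multiply(len(rows), len(seats_per_row)))   # 1300 1300
```

Both computations give 1300, which is the agreement between the two definitions that the example is meant to make plausible.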
DEFINITION 32.6. Let m and n be natural numbers and X and Y be sets such that $|X| = m$ and $|Y| = n$. Then the product of m and n is defined to be the natural number $|X \times Y|$. The process which associates with each pair m and n of natural numbers their product $m \cdot n$ is called multiplication. As in the case of Definition 32.1, this definition needs some justification. In Definition 32.6, the existence of sets X and Y which satisfy $|X| = m$ and $|Y| = n$ is clear. Indeed, the sets m and n themselves satisfy $|m| = m$ and $|n| = n$. Two facts must be proved. (1) If X and Y are sets such that $|X| = m$ and $|Y| = n$, then $X \times Y$ is finite, that is, there is a natural number k such that $|X \times Y| = k$. (2) The natural number $|X \times Y|$ is the same for all pairs X, Y of sets satisfying $|X| = m$ and $|Y| = n$. The result (1) is obtained by using (31.2), and as before the details will not be given (see Problem 6 in this section). Statement (2) is a consequence of the following result concerning set products.
(32.7). If X, X', Y, and Y' are sets such that X is equivalent to X' and Y is equivalent to Y', then $X \times Y$ is equivalent to $X' \times Y'$. Proof. Since X is equivalent to X', there is a one-to-one correspondence $x \leftrightarrow x'$ between X and X'. Similarly, there is a one-to-one correspondence $y \leftrightarrow y'$ between Y and Y'. The required one-to-one correspondence between $X \times Y$ and $X' \times Y'$ is

$$(x, y) \leftrightarrow (x', y').$$
The following important theorem follows easily from Definition 32.6. THEOREM 32.8. If X and Y are finite nonempty sets, then

$$|X \times Y| = |X| \cdot |Y|.$$
Proof. Since X and Y are finite nonempty sets, there exist natural numbers m and n such that X is equivalent to m, and Y is equivalent to n. By Definition 31.3, this means that $|X| = m$ and $|Y| = n$. Therefore,

$$|X \times Y| = m \cdot n = |X| \cdot |Y|$$
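by Definition 32.6. Multiplication has the following properties. (32.9). Properties of multiplication. Let k, m, and n be any natural numbers. Then (a) $k \cdot (m \cdot n) = (k \cdot m) \cdot n$; (b) $m \cdot n = n \cdot m$; (c) if $k \cdot n = m \cdot n$, then $k = m$; (d) $1 \cdot n = n$; (e) $k \cdot (m + n) = k \cdot m + k \cdot n$.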
Parts (a), (b), and (c) of (32.9) correspond exactly to the three properties of addition given in (32.4). Thus, we say that multiplication is associative, commutative, and satisfies the cancellation law. The fact that the number 1 is an identity element for multiplication is stated in (d). The two basic operations are connected by the equality (e), which is called the distributive law of multiplication with respect to addition. Note that (b) and (d) imply $n \cdot 1 = n$, and (b) and (e) imply the "right-hand" distributive law $(m + n) \cdot k = m \cdot k + n \cdot k$. The identities (a), (b), (d), and (e) are proved using the definitions of multiplication and addition, together with some simple results of set theory; the property (c) will be proved in the next section. Let X, Y, and
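W be sets such that $|X| = k$, $|Y| = m$, and $|W| = n$. Then by Theorem 32.8,

$$k \cdot (m \cdot n) = |X| \cdot |Y \times W| = |X \times (Y \times W)|.$$

Similarly,

$$(k \cdot m) \cdot n = |X \times Y| \cdot |W| = |(X \times Y) \times W|.$$

Since $X \times (Y \times W)$ is equivalent to $(X \times Y) \times W$ by Theorem 13.3(b), and these are finite nonempty sets, it follows that $|X \times (Y \times W)| = |(X \times Y) \times W|$. Therefore, $k \cdot (m \cdot n) = (k \cdot m) \cdot n$. The identity (32.9b) is a direct consequence of Theorem 13.3(a):

$$m \cdot n = |Y \times W| = |W \times Y| = n \cdot m.$$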
To prove (32.9d), use the facts that $|\{\emptyset\}| = 1$, and that $(\emptyset, w) \leftrightarrow w$, for $w \in W$, is a one-to-one correspondence between $\{\emptyset\} \times W$ and W. Therefore, $1 \cdot n = |\{\emptyset\} \times W| = |W| = n$. The proof of (32.9e) is based on the following result from set theory, which we leave for the reader to prove. (32.10). Let X, Y, and W be any sets. Then

$$X \times (Y \cup W) = (X \times Y) \cup (X \times W).$$
For the application of (32.10) to the proof of (32.9e), let X, Y, and W be any sets such that $|X| = k$, $|Y| = m$, $|W| = n$, and $Y \cap W = \emptyset$. Then it is clear that $(X \times Y) \cap (X \times W) = \emptyset$. Hence, by (32.3), Theorem 32.8, and (32.10),

$$k \cdot m + k \cdot n = |X \times Y| + |X \times W| = |(X \times Y) \cup (X \times W)| = |X \times (Y \cup W)| = |X| \cdot |Y \cup W| = k \cdot (m + n).$$
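The set identity (32.10) and the counting step behind the distributive law can be checked on small finite sets. This is a sketch under stated assumptions: the particular sets `X`, `Y`, `W` and the helper `set_product` are hypothetical choices made for illustration, and `len` stands in for the cardinal number.

```python
from itertools import product


def set_product(A, B):
    """The product set of A and B: all ordered pairs (a, b)."""
    return frozenset(product(A, B))


# Hypothetical example sets; Y and W are chosen to be disjoint.
X = frozenset({"a", "b", "c"})
Y = frozenset({1, 2})
W = frozenset({3, 4, 5, 6})
assert not (Y & W)

left = set_product(X, Y | W)
right = set_product(X, Y) | set_product(X, W)

# (32.10): the two product sets coincide.
print(left == right)
# The two pieces are disjoint, so their sizes add up as in the proof of the distributive law.
print(len(left) == len(set_product(X, Y)) + len(set_product(X, W)))
```

With these sets the common value is 18, that is, 3 times (2 + 4) on one side and 3 times 2 plus 3 times 4 on the other.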
2. Assuming that addition corresponds to set union and that multiplication corresponds to set intersection, which of the laws (32.4) and (32.9) have analogues for sets, and which of the identities of Theorem 14.3 have analogues in the arithmetic of natural numbers? 3. Write the proof of (32.4a) in full.
6. Prove by induction (31.2) on the cardinal number of Y that if X and Y are finite sets, then $X \times Y$ is finite. [Hint: Use (32.10) and Problem 18, Section 31.]
33 The ordering of the natural numbers. Probably the most important property of the natural numbers is their ordering. Just the act of counting presents the natural numbers in a definite sequence: one, two, three, four, and so on. It is this ordering of the numbers that most children learn long before they can add or multiply. In the development of the natural numbers based on Definition 31.1, the ordering is defined very simply.
DEFINITION 33.1. If m and n are natural numbers, then m is said to be less than n if $m \subset n$. As is customary, we will write $m < n$ or $n > m$ if m is less than n. According to Definition 33.1 and the result of Problem 14, Section 31, the three conditions

$$m < n, \qquad m \subset n, \qquad m \in n$$

are equivalent. In Section 25, we discussed one important aspect of the ordering of the natural numbers, namely the well-ordering principle (25.2). Here we point out some of the more elementary properties of order. (33.2). Properties of order. Let k, m, and n be any natural numbers. Then (a) either $m < n$, $m = n$, or $n < m$, and it is impossible for more than one of these relations to be satisfied by a given pair m and n of natural numbers; (b) if $k < m$ and $m < n$, then $k < n$; (c) if $k < m$, then $k + n < m + n$; (d) if $k < m$, then $k \cdot n < m \cdot n$.
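Definition 33.1 and property (33.2a) can be explored on the first few von Neumann numbers. The sketch below is illustrative only: `succ`, `natural`, `less_than`, and `exactly_one` are hypothetical names, and the check covers a handful of small cases rather than proving anything.

```python
# Illustrative sketch of Definition 33.1; all helper names are hypothetical.
EMPTY = frozenset()
ONE = frozenset({EMPTY})


def succ(n):
    """S(n): the set n with n itself added as an element."""
    return n | frozenset({n})


def natural(k):
    """The set denoted by k, for k >= 1."""
    if k < 1:
        raise ValueError("in the text, the natural numbers start at 1")
    n = ONE
    for _ in range(k - 1):
        n = succ(n)
    return n


def less_than(m, n):
    """m is less than n when m is a proper subset of n."""
    return m < n        # for frozensets, < tests proper inclusion


def exactly_one(m, n):
    """Property (a) of order: exactly one of m < n, m = n, n < m holds."""
    return [less_than(m, n), m == n, less_than(n, m)].count(True) == 1


numbers = {k: natural(k) for k in range(1, 7)}
# Proper inclusion, membership, and the usual order agree on these cases.
print(all((less_than(m, n) == (m in n) == (i < j))
          for i, m in numbers.items() for j, n in numbers.items()))
```

The agreement of proper inclusion with membership is the content of Problem 14 of Section 31, observed here only for the numbers 1 through 6.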
By the definition of <, the property (33.2a) is equivalent to the statement that exactly one of the relations $m \subset n$, $m = n$, $n \subset m$ is satisfied. It is clear from the definition of set theoretical inclusion that at most one of these relations holds. The fact that at least one of the relations is satisfied was stated as Problem 13, Section 31, and we will not prove this result. The assertion (33.2b) is the same as the statement that if $k \subset m$ and $m \subset n$, then $k \subset n$, and this is clearly true. In order to prove part (c), choose sets X, Y, and W such that

$$|X| = k, \qquad |Y| = m, \qquad |W| = n, \qquad (31)$$

and

$$X \subset Y, \qquad Y \cap W = \emptyset. \qquad (32)$$

This can be done in the same way that sets were constructed for the proof
of statement (1) following Definition 32.1. Then by Definition 32.1, $k + n = |X \cup W|$, and $m + n = |Y \cup W|$. (Note that $Y \cap W = \emptyset$ implies $X \cap W = \emptyset$.) It follows from (32) that $X \cup W \subset Y \cup W$. Using the result of Problem 17, Section 31, $X \cup W \subset Y \cup W$ implies $|X \cup W| \subset |Y \cup W|$. Therefore, using Definition 33.1, we have $k + n = |X \cup W| < |Y \cup W| = m + n$. The proof of (33.2d) is similar, and we leave it as an exercise for the reader. The cancellation laws of addition and multiplication, (32.4c) and (32.9c), can be deduced from (33.2). It is necessary to show that if k, m, and n are natural numbers, then
(1) $k + n = m + n$ implies $k = m$, and (2) $k \cdot n = m \cdot n$ implies $k = m$. We prove the contrapositives of these implications (see the Introduction). Suppose that $k \neq m$. Then by (33.2a), either $k < m$, or $m < k$. Suppose that $k < m$. By (33.2c) and (d), $k + n < m + n$ and $k \cdot n < m \cdot n$. Therefore, $k + n \neq m + n$ and $k \cdot n \neq m \cdot n$. If $m < k$, the proof is similar. There is a useful relation between the ordering and addition of natural numbers.
THEOREM 33.3. Let m and n be natural numbers. Then $m < n$ if and only if there is a natural number k such that $n = k + m$.
That is, if $n = k + m$, then $m < n$, and conversely, if $m < n$, it is possible to find a number k such that $n = k + m$. Intuitively, (33.3) is evident. The natural number k is obtained by counting the numbers in the sequence $m + 1, m + 2, \ldots, n$.
A formal proof of Theorem 33.3 could be given by means of induction on m. To carry this out, it is necessary to use the results given in several of the problems in Section 31. We will leave this task to the reader who is interested in working out the details of the theory (see Problem 5). It is easy to see that the number k which occurs in Theorem 33.3 is unique. Indeed, if $n = k + m$ and $n = k' + m$, then $k + m = k' + m$, so that by the cancellation law (32.4c), $k = k'$.
DEFINITION 33.4. Let m and n be natural numbers such that $m < n$. The unique number k satisfying $n = k + m$ is called the difference of n and m.
The usual notation to designate the difference of n and m is $n - m$. The operation which associates with the pair m and n of natural numbers
(satisfying $m < n$) their difference $n - m$ is called subtraction. If m is not less than n, then it is impossible to form the difference $n - m$ and still remain within the system of natural numbers. That is, the natural numbers are not closed with respect to subtraction. One of the reasons for enlarging the system of natural numbers to the integers is to make subtraction of any two numbers possible. Subtraction satisfies the following identities.
(33.5). Let j, k, m, and n be natural numbers such that $j < k$ and $m < n$. Then
(a) $(k - j) + (n - m) = (k + n) - (j + m)$;
(b) $(k - j) \cdot (n - m) = (k \cdot n + j \cdot m) - (j \cdot n + k \cdot m)$;
(c) $(k + n) - (k + m) = n - m$;
(d) $k \cdot (n - m) = (k \cdot n) - (k \cdot m)$.
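The identities (33.5) can be spot-checked numerically. In the sketch below, ordinary Python integers stand in for the natural numbers, and the difference is found by searching for the unique k with n = k + m, in the spirit of Definition 33.4. The function names `difference` and `check_identities` are illustrative assumptions, and the identities are written in the forms (k - j) + (n - m) = (k + n) - (j + m), (k - j)(n - m) = (kn + jm) - (jn + km), (k + n) - (k + m) = n - m, and k(n - m) = kn - km.

```python
def difference(n, m):
    """Return the unique natural number k with n == k + m; requires m < n."""
    if not 1 <= m < n:
        raise ValueError("n - m is a natural number only when m < n")
    k = 1
    while k + m != n:     # terminates because m < n
        k += 1
    return k


def check_identities(limit):
    """Check the four subtraction identities for all j < k and m < n up to limit."""
    for k in range(2, limit + 1):
        for j in range(1, k):
            for n in range(2, limit + 1):
                for m in range(1, n):
                    kj, nm = difference(k, j), difference(n, m)
                    assert kj + nm == difference(k + n, j + m)                   # (a)
                    assert kj * nm == difference(k * n + j * m, j * n + k * m)   # (b)
                    assert difference(k + n, k + m) == nm                        # (c)
                    assert k * nm == difference(k * n, k * m)                    # (d)
    return True


print(check_identities(8))
```

Note that `difference` refuses to form n - m unless m < n, which mirrors the remark that the natural numbers are not closed under subtraction.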
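The proofs of all of these identities are based on the same observation. Suppose that x, y, and z are natural numbers. Then $y < x$, and z is equal to the difference $x - y$ if and only if $z + y = x$. In fact, if $z + y = x$, then $y < x$ by Theorem 33.3, and $z = x - y$ by Definition 33.4. Conversely, if $y < x$ and $z = x - y$, then $z + y = x$ by Definition 33.4. We now use this fact to prove (33.5b). In this case, $x = k \cdot n + j \cdot m$, $y = j \cdot n + k \cdot m$, and $z = (k - j) \cdot (n - m)$. Note that since $j < k$ and $m < n$, the differences $k - j$ and $n - m$ are natural numbers, so that $z = (k - j) \cdot (n - m)$ is a natural number. We must prove that $x = z + y$, that is,

$$\begin{aligned}
(k - j) \cdot (n - m) + (j \cdot n + k \cdot m)
&= (k - j) \cdot (n - m) + [(j \cdot (n - m) + j \cdot m) + k \cdot m] \\
&= [(k - j) \cdot (n - m) + j \cdot (n - m)] + (j \cdot m + k \cdot m) \\
&= [(k - j) + j] \cdot (n - m) + (k \cdot m + j \cdot m) \\
&= [k \cdot (n - m) + k \cdot m] + j \cdot m \\
&= k \cdot [(n - m) + m] + j \cdot m \\
&= k \cdot n + j \cdot m.
\end{aligned}$$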
We leave the proof of the remaining parts of (33.5) as an exercise for the reader.
1. Use the result of Problem 17, Section 31, to show that if Y is a finite set, and $X \subset Y$, then $|X| < |Y|$.
2. Prove (33.2d).
3. Show that if k, m, and n are natural numbers, then (a) $k + n < m + n$ implies $k < m$, and (b) $k \cdot n < m \cdot n$ implies $k < m$.
4. Use (and cite) the necessary results from the problems of Section 31 to prove that the following conditions on a natural number n are equivalent. However, do not use Theorem 33.3. (a) $n \neq 1$. (b) $n > 1$. (c) There is a $k \in N$ such that $n = S(k)$. (d) There is a $k \in N$ such that $n = k + 1$. (e) There is a $k \in N$ and a $j \in N$ such that $n = k + j$. (f) There is a $k \in N$ such that $n > k$.
5. Using ordinary mathematical induction, together with the results of Problem 4, prove the following two statements (for all m). (a) For all natural numbers n, if $m < n$, then there is a natural number k such that $n = k + m$. (b) For all natural numbers n, if there is a natural number k such that $n = k + m$, then $m < n$.
n, then k = m.
9. Let k, m, and n be natural numbers. Prove the following. (a) If $m < n$, then $(n + k) - m = (n - m) + k$. (b) If $n < m < k$, then $n + (k - m) = k - (m - n)$. (c) If $n < k < m < n + k$, then $n - (m - k) = (n + k) - m = k - (m - n)$. (d) If $n + k < m$, then $(m - k) - n = (m - n) - k = m - (n + k)$.
CHAPTER 4
THE INTEGERS
41 Construction of the integers. The average American student first encounters the integers in elementary algebra, where he learns that the integers consist of the natural numbers, zero, and the negative numbers (the negatives of the natural numbers). He learns by rote the rules for adding and multiplying these numbers and some identities which addition and multiplication satisfy. If given the opportunity to use this knowledge, he may remember these rules, but the chances are good that he will never ask why the integers are defined as they are, or why they satisfy their familiar rules of operation. Our purpose in this section is to explore these questions. As far as historical research has determined, the negative numbers and zero were introduced by the Hindu mathematicians of India in the sixth or seventh century A.D. The increasing importance of commerce in India at that time stimulated this invention. The natural numbers could be used to measure fixed quantities of money or merchandise, but business transactions involved changes of these quantities, i.e., increases or decreases. Instead of dealing with receipt and payment as different kinds of exchanges, it was found that both transactions could be treated at once if the amount of money or goods received was denoted by an ordinary natural number, and the amount paid out was represented by a negative number. This idea is useful mainly because the effect of consecutive transactions can be obtained by the operation which we know as addition of integers. For example, if the receipt of five coins is followed by the payment of ten coins, the net result is the same as the payment of five coins. In symbolic form, this equivalence is expressed by the formula

$$5 + (-10) = -5.$$
The interpretation of consecutive exchanges as a single exchange also requires the consideration of transactions which involve no change of money or goods. These are of course represented by the number zero. For instance, a receipt of 5 coins followed by payment of 5 coins has the same effect as "breaking even": $5 + (-5) = 0$. The integers and their operations are of course very familiar in our modern society. The application of the integers to represent exchanges of money
is also commonplace. However, before the sixth century, negative numbers were unknown, and zero was used only as a symbol to distinguish between numbers such as 102 and 12. The invention of these new numbers and the definition of addition and multiplication of the integers to satisfy the needs of commerce must be considered to be among the greatest advances of civilization. Informally, the set Z of all integers consists of (1) all natural numbers, (2) an object called zero and denoted by 0, which is different from all natural numbers, and (3) for each natural number n, an object denoted by $-n$, which is different from all natural numbers and zero, and such that if m and n are two different natural numbers, then $-m$ and $-n$ are different objects. These objects are called the negative numbers. It is not very important what the objects called "integers" really are. In fact, there are several ways to construct the system of integers from the natural numbers, and the different constructions lead to different answers to the question "What are the specific objects called integers?" Of course, all of these constructions lead to systems which are essentially* the same. When the natural numbers are defined to be the finite ordinal numbers in von Neumann's sense (Definition 31.1), then a convenient choice for zero is the empty set $\emptyset$, and the negative numbers can be defined as

$$\{1\}, \quad \{2\}, \quad \{3\}, \quad \ldots, \quad \{n\}, \quad \ldots$$
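Thus, the integers are the following sets:
(a) all natural numbers: $1, 2, 3, \ldots, n, \ldots$;
(b) the empty set $\emptyset$;
(c) all sets $\{n\}$, where $n$ is a natural number: $\{1\}, \{2\}, \{3\}, \ldots, \{n\}, \ldots$.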
As usual, the symbol $0$ will denote zero ($= \emptyset$), and $-1, -2, -3, \ldots, -n, \ldots$ will stand for $\{1\}, \{2\}, \{3\}, \ldots, \{n\}, \ldots$ respectively. It is easy to see that the set $Z$ of Definition 41.1 satisfies the conditions (1), (2), and (3) of the informal description of $Z$ given above. In particular, all of

$$\ldots, \{3\}, \{2\}, \{1\}, \emptyset, 1, 2, 3, \ldots$$
* The systems obtained by the various constructions are isomorphic (from the Greek word meaning "of the same form"). The mathematical meaning of the term isomorphic will be explained in Section 42.
are different. The operations of addition, multiplication, and negation are responsible for the usefulness and importance of the integers. Addition and multiplication are extensions of the corresponding operations in the system of natural numbers, but negation has no counterpart in $N$. The definition of negation is suggested when we think of the set of integers in the usual order,

$$\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots,$$
consisting of the natural numbers, and a mirror-image copy of these numbers (the negative numbers), linked together by the number zero. Negation is the process of passing from an integer $a$ to its mirror image.

DEFINITION 41.2. Negation. Let $m$ be any natural number. Then
(a) $-m = \{m\}$,
(b) $-\{m\} = m$,
(c) $-0 = 0$.

Once the meaning of Definition 41.2 is understood, the notation $-m$ for the negative numbers can be used without fear of trouble, since in fact $-m$ stands for $\{m\}$, whether we think of $-$ as the negation operation symbol, or simply as the usual sign to denote negative numbers.* If parts (a) and (b) of Definition 41.2 are combined, we obtain the familiar rule of double negation: $-(-m) = m$.

The addition operation in the integers is surprisingly complicated.

DEFINITION 41.3. Addition. Let $m$ and $n$ be any natural numbers. Then
(a) $m + n$ is defined as in $N$;
(b) $(-m) + (-n) = -(m + n)$;
(c) $m + (-n) = (-n) + m = \begin{cases} m - n & \text{if } n < m, \\ 0 & \text{if } m = n, \\ -(n - m) & \text{if } m < n; \end{cases}$
(d) $m + 0 = 0 + m = m$;
(e) $(-m) + 0 = 0 + (-m) = -m$;
(f) $0 + 0 = 0$.
* The minus sign is also used to denote the binary operation of subtraction, as, for example, in expressions like $3 - 1$. When used in this way, the symbol "$-$" always occurs between two number symbols; when "$-$" denotes the operation of negation, it is never preceded by a number symbol.
This definition of addition is open to criticism on the grounds that it is cumbersome and difficult to use in proving the important properties of addition. To avoid such a complicated definition of addition, and an almost equally unwieldy definition of multiplication, mathematicians have devised another way of constructing $Z$ from $N$. This construction employs three important new mathematical concepts: equivalence relation, equivalence class, and partition of a set. The introduction and study of these notions represents a considerable digression from our program of constructing the fundamental number system $Z$ (see Section 64). We have therefore chosen to adopt Definition 41.3 as the definition of addition in $Z$. The rules of this definition provide an effective method of performing addition in $Z$, and they can be used to prove the main properties of addition.

(41.4). Properties of addition. Let $a$, $b$, and $c$ be integers. Then
(a) $a + b = b + a$;
(b) $a + (b + c) = (a + b) + c$;
(c) $a + 0 = a$;
(d) $a + (-a) = 0$.
Actually, (41.4a, c, d) can be obtained very easily from Definitions 41.3, 41.2, and the properties of addition for the natural numbers (32.4). It is the proof of the associative law (41.4b) which requires the checking of a discouragingly large number of different cases.
EXAMPLE 1. We will prove the commutative law (41.4a). There are nine possible cases to examine. Let $m$ and $n$ be any natural numbers.
(1) $m + n = n + m$, by (32.4);
(2) $m + (-n) = (-n) + m$, by Definition 41.3(c);
(3) $m + 0 = 0 + m$, by Definition 41.3(d);
(4) $(-m) + n = n + (-m)$, by Definition 41.3(c);
(5) $(-m) + (-n) = -(m + n) = -(n + m) = (-n) + (-m)$, by Definition 41.3(b) and (32.4);
(6) $(-m) + 0 = 0 + (-m)$, by Definition 41.3(e);
(7) $0 + n = n + 0$, by Definition 41.3(d);
(8) $0 + (-n) = (-n) + 0$, by Definition 41.3(e);
(9) $0 + 0 = 0 + 0$.
Of course, cases (2) and (4), cases (3) and (7), and cases (6) and (8) are really the same, so that one case of each of these pairs could be omitted.

EXAMPLE 2. There are 27 main cases to consider in the proof of (41.4b). These are obtained by letting $a$, $b$, and $c$ take all combinations of natural numbers, negative numbers, or zeros. However, because of Definition 41.3(c), it is necessary to break some of these main cases into subcases. For example, suppose that $k$, $m$, and $n$ are natural numbers, and that we wish to prove

$$(-k) + (m + n) = [(-k) + m] + n.$$

This identity has a different meaning in each of five subcases.
(1) If $k < m$, then it is necessary to show $(m + n) - k = (m - k) + n$.
(2) If $k = m$, then it is necessary to show $(m + n) - m = 0 + n$.
(3) If $m < k < m + n$, then it is necessary to show $(m + n) - k = n - (k - m)$.
(4) If $k = m + n$, then it is necessary to show $0 = [-((m + n) - m)] + n$.
(5) If $k > m + n$, then it is necessary to show $-[k - (m + n)] = -[(k - m) - n]$.

The desired identities in each of these cases can be proved using Definition 41.3 and the results of Problem 9, Section 33.
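The case analysis of Definition 41.3 can also be tried out mechanically. The following Python sketch is an illustration only, under stated assumptions: the tagged pairs `("nat", n)`, `("neg", n)`, and `ZERO` are hypothetical stand-ins for the sets of Definition 41.1, and part (c) is read as giving $m - n$, $0$, or $-(n - m)$ according as $n < m$, $m = n$, or $m < n$. A finite check of this kind does not replace the proofs of (41.4a) and (41.4b); it only confirms the laws on a small sample.

```python
# Tagged pairs stand in for the sets of Definition 41.1 (an assumption of this
# sketch): ("nat", n) for the natural number n, ("neg", n) for -n, ZERO for 0.
ZERO = ("zero", 0)


def nat(n):
    return ("nat", n)


def neg(n):
    return ("neg", n)


def add(a, b):
    """Add two integers by the case analysis of Definition 41.3."""
    (kind_a, m), (kind_b, n) = a, b
    if kind_a == "zero":                      # 0 + b = b: parts (d), (e), (f)
        return b
    if kind_b == "zero":                      # a + 0 = a: parts (d), (e)
        return a
    if kind_a == "nat" and kind_b == "nat":   # part (a): addition in N
        return nat(m + n)
    if kind_a == "neg" and kind_b == "neg":   # part (b): (-m) + (-n) = -(m + n)
        return neg(m + n)
    if kind_a == "neg":                       # part (c): (-m) + n = n + (-m)
        return add(b, a)
    if m > n:                                 # part (c): m + (-n) with n < m
        return nat(m - n)
    if m == n:                                # part (c): m + (-m) = 0
        return ZERO
    return neg(n - m)                         # part (c): m + (-n) with m < n


def sample(k):
    """The integers -k, ..., -1, 0, 1, ..., k in tagged form."""
    return [neg(n) for n in range(k, 0, -1)] + [ZERO] + [nat(n) for n in range(1, k + 1)]


def is_commutative(elems):
    return all(add(a, b) == add(b, a) for a in elems for b in elems)


def is_associative(elems):
    return all(add(a, add(b, c)) == add(add(a, b), c)
               for a in elems for b in elems for c in elems)


print(add(nat(5), neg(10)))        # ('neg', 5), that is, 5 + (-10) = -5
print(is_commutative(sample(6)))   # True
print(is_associative(sample(6)))   # True
```

The associativity check runs through every triple of the sample, so it touches all 27 main cases of Example 2 together with the subcases forced by part (c).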
By considering the interpretation of the integers as measures of variation (increase or decrease), it is possible to find a rational basis for Definition 41.3 and for the properties of addition (41.4). Let us examine a specific example. Suppose that the integer $a$ represents the change in the number of gallons of water in a certain reservoir during a given day. For instance, if the amount of water in the reservoir increased by 15,000 gallons during the day, then $a = 15{,}000$, whereas if it decreases by 15,000 gallons, then $a = -15{,}000$. Let $b$ represent the change in the number of gallons of water in the reservoir during the next day. Then $a + b$ represents the change of volume of water in the reservoir (measured in number of gallons of water) during the two-day period. If both $a$ and $b$ represent increases, then $a$ and $b$ are natural numbers, and our physical interpretation of addition agrees for natural numbers with the usual definition of addition in $N$. These facts are so familiar that they would probably be accepted without question. However, for the sake of our discussion, we could use this physical description as the definition of the addition of integers. Quite possibly the rules of addition were originally obtained in this way.

In terms of this example, the properties of addition (41.4) can be interpreted as statements of commonplace observations. The commutative law $a + b = b + a$ means that a change of amount $a$ in one day, followed by a change of amount $b$ the next day, produces the same result in the two-day period as a change of amount $b$ on the first day, followed by a change of amount $a$ on the second day. The associative law $a + (b + c) = (a + b) + c$ is even more evident, since the two sides of this identity simply represent two different ways of looking at the result of the changes on three consecutive days. The identities $a + 0 = a$ and $a + (-a) = 0$ have similar interpretations.
If $m$ is a natural number, and $a$ is an integer, then we can informally define the product $m \cdot a$ to be the integer which is obtained by adding $a$ to itself $m$ times. For example, $1 \cdot a = a$, $2 \cdot a = a + a$, $3 \cdot a = (a + a) + a$. This definition is both natural and useful. If $a$ is a natural number, then $m \cdot a$ is the same as the product of $m$ and $a$, obtained from Definition 32.6. If $a = 0$, then $m \cdot a = 0$, as the reader can show by induction on $m$. If $a = -n$, then $m \cdot a = -(m \cdot n)$. In fact,

$$2 \cdot (-n) = (-n) + (-n) = -(n + n) = -(2 \cdot n),$$
$$3 \cdot (-n) = 2 \cdot (-n) + (-n) = [-(2 \cdot n)] + (-n) = -(2 \cdot n + n) = -(3 \cdot n), \text{ etc.}$$
It is convenient to extend the definition of products so that any two integers can be multiplied together. This means that we must define $0 \cdot c$ and $(-m) \cdot c$ for an arbitrary integer $c$ and natural number $m$. If the multiplication of integers is to satisfy the distributive law

$$(a + b) \cdot c = a \cdot c + b \cdot c,$$

then there is only one way in which these products can be defined. In fact, using this distributive law and (41.4), we obtain

$$0 \cdot c = 0 \cdot c + 0 = 0 \cdot c + [(0 \cdot c) + (-(0 \cdot c))] = (0 \cdot c + 0 \cdot c) + [-(0 \cdot c)] = (0 + 0) \cdot c + [-(0 \cdot c)] = 0 \cdot c + [-(0 \cdot c)] = 0.$$

Also,

$$(-m) \cdot c = (-m) \cdot c + 0 = (-m) \cdot c + [(m \cdot c) + (-(m \cdot c))] = [(-m) \cdot c + m \cdot c] + [-(m \cdot c)]$$
$$= [(-m) + m] \cdot c + [-(m \cdot c)] = 0 \cdot c + [-(m \cdot c)] = 0 + [-(m \cdot c)] = -(m \cdot c).$$
Thus, we are led to the well-known rules for multiplying integers. These will be adopted as the formal definition of multiplication.

DEFINITION 41.5. Multiplication. Let $m$ and $n$ be any natural numbers. Then
(a) $m \cdot n$ is defined as in $N$;
(b) $(-m) \cdot (-n) = m \cdot n$;
(c) $(-m) \cdot n = n \cdot (-m) = -(m \cdot n)$;
(d) $m \cdot 0 = 0 \cdot m = 0$;
(e) $(-m) \cdot 0 = 0 \cdot (-m) = 0$;
(f) $0 \cdot 0 = 0$.
From Definitions 41.5 and 41.3, and the properties of addition and multiplication of the natural numbers (32.4) and (32.9), it is possible to deduce the familiar properties of multiplication of the integers. The proofs are elementary, but tedious because they require the examination of numerous cases.

(41.6). Properties of multiplication. Let $a$, $b$, and $c$ be integers. Then
(a) $a \cdot b = b \cdot a$;
(b) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$;
(c) if $a \cdot c = b \cdot c$, then either $a = b$ or $c = 0$;
(d) $a \cdot 1 = a$;
(e) $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$.
With the exception of the cancellation law (c), the properties of multiplication of integers (41.6) are identical with the properties of multiplication of the natural numbers given in (32.9). Since $a \cdot 0 = b \cdot 0 = 0$ for all integers $a$ and $b$, the statement (32.9c) is not true for the integers. However, (41.6c) shows that $0$ is the only integer which cannot be cancelled.
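The sign rules of Definition 41.5 and the cancellation law (41.6c) can be illustrated by a short Python sketch. It is only an illustration under stated assumptions: Python's built-in integers play the role of $Z$, and the helper names `times` and `mul` are hypothetical. The product of two natural numbers is formed by repeated addition, and the sign is then attached by the rules of the definition.

```python
def times(m, a):
    """Add a to itself m times, for a natural number m >= 1."""
    total = a
    for _ in range(m - 1):
        total = total + a
    return total


def mul(a, b):
    """Multiply integers by the sign rules of Definition 41.5."""
    if a == 0 or b == 0:                 # parts (d), (e), (f)
        return 0
    m, n = abs(a), abs(b)                # the natural numbers involved
    product = times(m, n)                # part (a): the product in N
    if (a > 0) == (b > 0):               # part (a), or part (b): (-m)(-n) = m n
        return product
    return -product                      # part (c): (-m) n = n (-m) = -(m n)


values = range(-6, 7)

# The rules reproduce ordinary integer multiplication on the sample.
agrees = all(mul(a, b) == a * b for a in values for b in values)

# Cancellation law (41.6c): a c = b c forces a = b unless c = 0.
cancels = all(a == b or c == 0
              for a in values for b in values for c in values
              if mul(a, c) == mul(b, c))

print(agrees, cancels)   # True True
```

Note that the cancellation test passes only because the case $c = 0$ is excused, in agreement with the remark that $0$ is the one integer which cannot be cancelled.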
1. Show that the sets $\ldots, \{3\}, \{2\}, \{1\}, \emptyset, 1, 2, 3, \ldots$ are different.
2. Prove that $a + 0 = a$ and $a + (-a) = 0$ for all integers $a$.
3. Let the integers be considered to measure amounts of change. Give an interpretation of the operation of negation and interpret the identity $a + (-a) = 0$.
4. Show by induction on $m$ that $m \cdot 0 = 0$ (where $m$ is a natural number, and $m \cdot 0$ is the result of adding $0$ to itself $m$ times).
5. Using Definition 41.5 and the properties of multiplication of the natural numbers (32.9), prove the laws $a \cdot b = b \cdot a$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$, and $a \cdot 1 = a$.
6. Using Definitions 41.2 and 41.5, prove that for any integers $a$ and $b$, $(-a) \cdot b = a \cdot (-b) = -(a \cdot b)$, and $(-a) \cdot (-b) = a \cdot b$.
7. Using Definitions 41.3 and 41.5, the properties of addition and multiplication of the natural numbers (32.4) and (32.9), (33.5), and (41.4a, c, d), prove the distributive law $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ for the integers. [Hints: (a) Prove the law in the cases where at least one of $a$, $b$, or $c$ is zero. (First show from Definition 41.5 that $d \cdot 0 = 0 \cdot d = 0$ for all $d \in Z$.) (b) Enumerate
the eight possible cases in which none of $a$, $b$, and $c$ is zero. (c) Consider the cases
separately, and state the meaning of these identities in each of the three subcases $m < n$, $m = n$, $m > n$. (d) Prove these cases (see 33.5). (e) Prove all other cases, either directly or by reducing them to the cases treated in (d).]
8. Using Definition 41.5, prove that if $c$ and $d$ are integers, and $c \neq 0$, $d \neq 0$, then $c \cdot d \neq 0$. Using this fact together with the distributive law (41.6e) and the result of Problem 6, prove (41.6c).
9. Prove the associative law of addition $a + (b + c) = (a + b) + c$. [Hints: (a) Use (41.4c) to prove the law in all cases in which at least one of $a$, $b$, or $c$ is zero. (b) Enumerate the eight possible cases in which none of $a$, $b$, or $c$ is zero. (c) Prove that $(-1) \cdot a = -a$, and use this fact together with the distributive law to reduce these eight cases to the three cases: (i) $k + (m + n) = (k + m) + n$, (ii) $(-k) + (m + n) = [(-k) + m] + n$, and (iii) $k + [(-m) + n] = [k + (-m)] + n$. (d) Complete the proof outlined in Example 2 of case (ii). (e) Give the proof of case (iii).]
42 Rings. Starting with the properties of addition and multiplication given in (41.4) and (41.6), it is possible to develop many useful facts about the integers. However, as the reader may have noticed, the system of rational numbers and the real number system also satisfy the identities listed in (41.4) and (41.6). Therefore, if we prove a theorem about the integers using only the facts contained in (41.4) and (41.6), the same theorem should be true for the rational numbers and real numbers. There is one trouble with this useful observation. In order to carry a theorem which has been stated and proved for the integers over to the real or rational numbers, it is necessary to examine the proof of the theorem to be sure that it uses only properties which can be deduced from (41.4) and (41.6). Mathematicians have solved this problem by a simple but powerful idea. They have introduced a new term, "integral domain," to describe all systems on which are defined two operations (called addition and multiplication) satisfying all of the laws given in (41.4) and (41.6). Then, if a theorem can be deduced from the properties listed in (41.4) and (41.6), it is a theorem about integral domains, meaning that it is true for every system which satisfies (41.4) and (41.6). In particular, it is true for the integers, rational numbers, and real numbers.

In this section, we deal with systems which are defined by a set of axioms, without concern for the nature of the particular systems under consideration. Such a viewpoint is called abstract. There are several advantages (other than the possible economy of being able to treat many systems at
once) to be gained by an abstract approach to mathematics. One of them is that in working with an axiomatically defined object, rather than a specific one, there is an economy of ideas and concepts. All of the superfluous notions and facts are thrown away, and our concentration is focused on the essential features of the object which we are studying. The abstract axiomatic approach to problems and theories has become a dominant feature of modern mathematics. Moreover, this viewpoint is gaining importance in physical and social sciences. Anyone who wants to know what is current in science, particularly mathematics, must become acquainted with abstraction. Instead of considering integral domains immediately, we introduce a more general concept.
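DEFINITION 42.1. A ring is a set $A$ on which are defined two binary operations $x + y$ and $x \cdot y$ (called addition and multiplication), and a unary operation $-x$ (called negation), such that $A$ contains among its elements a particular one $0$ (called the zero* of $A$), and the following identities hold for all $x$, $y$, and $z$ in $A$:
(a) $x + y = y + x$;
(b) $x + (y + z) = (x + y) + z$;
(c) $x + 0 = x$;
(d) $x + (-x) = 0$;
(e) $x \cdot (y \cdot z) = (x \cdot y) \cdot z$;
(f) $x \cdot (y + z) = x \cdot y + x \cdot z$;
(g) $(x + y) \cdot z = x \cdot z + y \cdot z$.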
It must be strongly emphasized that in the definition of a ring $A$, nothing is assumed about the nature of the elements of $A$. Any collection of objects for which operations $x + y$, $x \cdot y$, and $-x$ are defined satisfying (a) through (g) is eligible to be called a ring. Moreover, although the operations of a ring are called "addition," "multiplication," and "negation," they need not at all resemble the familiar operations of addition, multiplication, and negation of numbers. The only requirement is that there are definitions which, for every $x$ and $y$ in $A$, determine the elements represented by $x + y$, $x \cdot y$, and $-x$. It should also be remembered that a ring is determined not by its elements alone, but by the elements, together with the operations of addition, multiplication, and negation. There are important examples of different rings having the same set of elements.
* The use of the symbol $0$ to represent the zero in every ring is a long-standing mathematical tradition. There are instances in which this convention might cause confusion, but they are rare.
EXAMPLE 1. The number systems $Z$ (the integers), $Q$ (the rational numbers), and $R$ (the real numbers), with their familiar operations, are all rings. In fact, as we have already observed, these systems are integral domains, that is, they satisfy the laws given in (41.4) and (41.6). It is easy to see that (41.6a) and (41.6e) together imply Definition 42.1(g), so that any integral domain is a ring.
EXAMPLE 2. The set $\{2a \mid a \in Z\}$ of all even integers is a ring, again with the familiar operations.
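EXAMPLE 3. Let $A = \{0, a\}$, with the operations defined by $0 + 0 = a + a = 0$, $0 + a = a + 0 = a$; $0 \cdot 0 = 0 \cdot a = a \cdot 0 = 0$, $a \cdot a = a$; $-0 = 0$, $-a = a$. Then $A$ is a ring. The symbols $a$ and $0$ may be interpreted in this example as representing the properties of an integer being odd or even. However, such an interpretation has no bearing on the question of $A$ being a ring.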
EXAMPLE 4. Let $A = P(S)$, the set of all subsets of a set $S$. For $X \in A$ and $Y \in A$ (that is, $X$ and $Y$ are any subsets of $S$), define

$$X + Y = (X \cap Y^c) \cup (X^c \cap Y); \qquad X \cdot Y = X \cap Y; \qquad -X = X.$$

Then with these operations, $A$ is a ring. More generally, if $A$ is any collection of subsets of $S$ such that if $X$ and $Y$ are in $A$, then $X \cup Y$ and $X \cap Y^c$ are in $A$, then $A$ is a ring with respect to these operations. These are the collections which we called rings of sets in Section 16. The fact that these collections form a ring justifies the terminology "ring of sets."
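For a small set $S$, the claim of Example 4 can be tested by brute force. The Python sketch below is only an illustration under stated assumptions: it takes the identities of Definition 42.1 to be the seven written out in the comments (as they are cited in the proofs of this section), and the names `power_set`, `is_ring`, `sym_diff`, `meet`, and `same` are hypothetical helpers. A check over one three-element set is of course not a proof for arbitrary $S$.

```python
from itertools import combinations, product


def power_set(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_ring(elems, add, mul, neg, zero):
    """Check closure and identities (a)-(g) of Definition 42.1 by brute force."""
    members = set(elems)
    for x, y, z in product(elems, repeat=3):
        checks = (
            add(x, y) in members and mul(x, y) in members and neg(x) in members,
            add(x, y) == add(y, x),                          # (a)
            add(x, add(y, z)) == add(add(x, y), z),          # (b)
            add(x, zero) == x,                               # (c)
            add(x, neg(x)) == zero,                          # (d)
            mul(x, mul(y, z)) == mul(mul(x, y), z),          # (e)
            mul(x, add(y, z)) == add(mul(x, y), mul(x, z)),  # (f)
            mul(add(x, y), z) == add(mul(x, z), mul(y, z)),  # (g)
        )
        if not all(checks):
            return False
    return True


def sym_diff(X, Y):        # X + Y = (X ∩ Y^c) ∪ (X^c ∩ Y)
    return (X - Y) | (Y - X)


def meet(X, Y):            # X · Y = X ∩ Y
    return X & Y


def same(X):               # -X = X
    return X


def union(X, Y):           # a tempting but wrong choice of "addition"
    return X | Y


S = frozenset({1, 2, 3})
A = power_set(S)

print(is_ring(A, sym_diff, meet, same, frozenset()))  # True
print(is_ring(A, union, meet, same, frozenset()))     # False
```

The second call shows why the sum is not taken to be the union: with $X + Y = X \cup Y$ and $-X = X$, identity (d) fails for every nonempty $X$.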
The reader should verify that Examples 3 and 4 are rings as we claim (see Problem 14, Section 14). It should be noted that the commutative law of multiplication, $x \cdot y = y \cdot x$, is not postulated for rings. If a ring satisfies the identity $x \cdot y = y \cdot x$, then it is called commutative. There are important examples of rings which are not commutative, one of which will be given in Chapter 10. Because the commutative law for multiplication is omitted from the postulates for a ring, it is necessary to state both distributive laws, $x \cdot (y + z) = x \cdot y + x \cdot z$ and $(x + y) \cdot z = x \cdot z + y \cdot z$. If the ring is commutative, then either of these laws can be deduced from the other. For example, $(x + y) \cdot z = z \cdot (x + y) = z \cdot x + z \cdot y = x \cdot z + y \cdot z$.

THEOREM 42.2. Let $x$ be an element of the ring $A$. Then $0 + x = x$ and $(-x) + x = 0$.
THEOREM 42.3. Let $A$ be a ring and let $x$, $y$, and $z$ be any elements of $A$.
(a) If $x + z = y + z$, then $x = y$.
(b) If $z + x = z + y$, then $x = y$.
We leave the proof of Theorem 42.2 as an exercise for the reader. To prove Theorem 42.3(a), suppose that $x + z = y + z$. Then, by Definition 42.1(b), (c), and (d), $x = x + 0 = x + [z + (-z)] = (x + z) + (-z) = (y + z) + (-z) = y + [z + (-z)] = y + 0 = y$. The implication (b) follows from (a) and the commutative law, 42.1(a).
THEOREM 42.4. Let $A$ be a ring, and let $x$ and $y$ be elements of $A$. Then
(a) $-(-x) = x$;
(b) $(-x) + (-y) = -(x + y)$;
(c) $x \cdot 0 = 0 \cdot x = 0$;
(d) $(-x) \cdot y = x \cdot (-y) = -(x \cdot y)$;
(e) $(-x) \cdot (-y) = x \cdot y$.

It is not hard to prove the identities of Theorem 42.4 for the integers by the direct use of Definitions 41.2, 41.3, and 41.5. It is significant, however, that the proof for general rings is simpler and more elegant. To prove (a), note that by Definition 42.1(a) and (d),

$$(-x) + x = x + (-x) = 0 = (-x) + [-(-x)].$$

Thus, by Theorem 42.3, $x = -(-x)$.

The proof of (b) also uses the cancellation law, Theorem 42.3. By Definition 42.1(a), (b), (c), (d), and Theorem 42.2, we obtain

$$(x + y) + [(-x) + (-y)] = (y + x) + [(-x) + (-y)] = y + [x + ((-x) + (-y))] = y + [(x + (-x)) + (-y)]$$
$$= y + [0 + (-y)] = y + (-y) = 0 = (x + y) + [-(x + y)].$$

Thus, $(-x) + (-y) = -(x + y)$.

By Definition 42.1(c), $0 + 0 = 0$. Therefore $x \cdot 0 + 0 = x \cdot 0 = x \cdot (0 + 0) = (x \cdot 0) + (x \cdot 0)$ by Definition 42.1(f) and (c). Using Theorem 42.3, we obtain $x \cdot 0 = 0$. Similarly, $0 \cdot x = 0$. This proves (c).

To prove (d), use the distributive laws, Definition 42.1(f) and (g), together with the result just proved, and the cancellation law, Theorem 42.3. For instance, $x \cdot y + (-x) \cdot y = [x + (-x)] \cdot y = 0 \cdot y = 0 = (x \cdot y) + [-(x \cdot y)]$ by Definition 42.1(g) and (d), and part (c). Therefore, $(-x) \cdot y = -(x \cdot y)$ by Theorem 42.3.

The final statement (e) of Theorem 42.4 is obtained by two applications of part (d), together with (a): $(-x) \cdot (-y) = -[x \cdot (-y)] = -[-(x \cdot y)] = x \cdot y$.

From a mathematician's viewpoint, the advantage of the integers over the natural numbers is that the integers are closed under subtraction. That is, if $a$ and $b$ are any integers, then it is always possible to find an integer $x$ which is a solution of

$$b + x = a.$$
THEOREM 42.5. Let $A$ be a ring. Let $x \in A$ and $y \in A$. Then there is one and only one element $z \in A$ satisfying $y + z = x$, namely $z = x + (-y)$.

Proof. Let $z = x + (-y)$. Then $y + z = y + [x + (-y)] = y + [(-y) + x] = [y + (-y)] + x = 0 + x = x + 0 = x$, by the first four laws of Definition 42.1. Therefore, $z = x + (-y)$ is a solution of $y + z = x$. On the other hand, if $z$ is a solution of $y + z = x$, then $x + (-y) = (y + z) + (-y) = (z + y) + (-y) = z + [y + (-y)] = z + 0 = z$. Hence, $z = x + (-y)$ is the unique solution of $y + z = x$.
It is customary to use the subtraction notation $x - y$ to denote the solution of the equation $y + z = x$ in an arbitrary ring. That is,

$$x - y = x + (-y).$$
The postulates for rings can be given using the binary operation of subtraction instead of the unary operation of negation. [Definition 42.1(c) and (d) are replaced by the single identity $y + (x - y) = x$.] However, Definition 42.1 is more familiar and convenient.

As we have pointed out in Example 1, the systems $Z$ and $Q$ of integers and rational numbers are rings. Also, $Z$ is a subset of $Q$. However, more can be said: the operations of addition, negation, and multiplication in $Q$ of the elements belonging to $Z$ agree with the usual operations for the integers. In the study of rings, this situation occurs frequently enough to justify the introduction of a special term to describe it.
DEFINITION 42.6. Let $A$ be a ring. A nonempty subset $B$ of $A$ is called a subring of $A$ if for every $x$ and $y$ in $B$ (not necessarily different) the sum $x + y$, the negative $-x$, and the product $x \cdot y$ all belong to $B$.
Since $x \in B$, $y \in B$ implies that $x \in A$ and $y \in A$, the sum $x + y$, negative $-x$, and product $x \cdot y$ always exist as elements of $A$. The condition that $B$ be a subring is merely the added assumption that these elements are in $B$ and not just $A$. If $B$ is a subring of $A$, then the operations of $A$, applied to $B$, can be considered as operations on $B$. The set $B$ with the operations which it inherits from $A$ forms a ring because the identities of Definition 42.1 are automatically satisfied in $B$, so that the term subring is justified. In speaking of a subring $B$ of $A$, it is customary to think of $B$ as a ring with the operations on $B$ agreeing with the operations of $A$.
. . .l.
Then B is a subring
EXAMPLE 6. Let $C = \{a/2 \mid a \in Z\}$. Then $C \subseteq Q$, but $C$ is not a subring of $Q$, since, for example, $\frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}$ is a product of elements of $C$, but $\frac{1}{4} \notin C$.
EXAMPLE 7. As we have noted, $Z$ is a subring of $Q$. However, $N$ is not a subring of $Z$, nor of $Q$, because if $n \in N$, then $-n \notin N$.
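The closure conditions of Definition 42.6 lend themselves to a quick experiment. The Python sketch below is an illustration only: the function `closure_failure` and the membership tests are hypothetical helpers, and rational numbers are modelled by `fractions.Fraction`. Since only a finite sample is examined, the search can exhibit a failure of closure, but a result of `None` is merely evidence and not a proof that a set is a subring.

```python
from fractions import Fraction
from itertools import product


def closure_failure(member, sample):
    """Return the first (operation, x, y, result) that leaves the set, or None."""
    for x, y in product(sample, repeat=2):
        for name, value in (("sum", x + y), ("negative", -x), ("product", x * y)):
            if not member(value):
                return name, x, y, value
    return None


def is_even(q):            # the even integers of Example 2
    return q.denominator == 1 and q.numerator % 2 == 0


def is_half(q):            # C = {a/2 : a an integer} of Example 6
    return (2 * q).denominator == 1


def is_natural(q):         # N = {1, 2, 3, ...}
    return q.denominator == 1 and q.numerator >= 1


evens = [Fraction(2 * a) for a in range(-4, 5)]
halves = [Fraction(a, 2) for a in range(-4, 5)]
naturals = [Fraction(n) for n in range(1, 6)]

print(closure_failure(is_even, evens))        # None
print(closure_failure(is_half, halves))       # reports a product that leaves C
print(closure_failure(is_natural, naturals))  # reports a negative that leaves N
```

The last two results correspond to Examples 6 and 7: $C$ fails because of a product, and $N$ fails because of a negative.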
The concept of isomorphism, which was mentioned in Section 33 for general mathematical systems, is very important in the theory of abstract rings.
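DEFINITION 42.7. Let $A$ and $B$ be rings. Then $A$ is isomorphic to $B$ if there is a one-to-one correspondence $x \leftrightarrow x'$ between the elements of $A$ and $B$, such that sums, products, and negatives are preserved by the correspondence. That is, if $x \leftrightarrow x'$ and $y \leftrightarrow y'$, then

$$x + y \leftrightarrow x' + y', \qquad x \cdot y \leftrightarrow x' \cdot y', \qquad \text{and} \qquad -x \leftrightarrow -(x').$$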
If a ring $A$ is isomorphic to a ring $B$, then any property of $A$ which can be expressed in terms of the operations of addition, multiplication, and negation is also a property of $B$, and vice versa. For instance, suppose that $A$ is a commutative ring which is isomorphic to the ring $B$. Let $x'$ and $y'$ be any elements of $B$. Then there exist elements $x$ and $y$ in $A$ such that $x \leftrightarrow x'$ and $y \leftrightarrow y'$ by the given isomorphism. Moreover, $x \cdot y \leftrightarrow x' \cdot y'$ and $y \cdot x \leftrightarrow y' \cdot x'$. Since $x \cdot y = y \cdot x$ in $A$, and the correspondence is one-to-one, we have $x' \cdot y' = y' \cdot x'$ in $B$. Thus, $B$ is a commutative ring, since $x'$ and $y'$ were arbitrary elements of $B$. The meaning of Definition 42.7 is that isomorphic rings are indistinguishable in every way which has to do with the fact that they are rings, even though $A$ and $B$ may be different as sets.
EXAMPLE 8. Let $M = \{(a, a) \mid a \in Z\}$. Define addition, multiplication, and negation* in $M$ by the rules

$$(a, a) \oplus (b, b) = (a + b, a + b),$$
$$(a, a) \odot (b, b) = (a \cdot b, a \cdot b),$$
$$\ominus (a, a) = (-a, -a),$$
* The symbols used to denote addition, multiplication, and negation in a ring are usually $+$, $\cdot$, and $-$. When discussing two different rings at the same time, it may be confusing to denote the corresponding operations by the same symbols (although this was done in Definition 42.7). In this case, we will sometimes use $\oplus$, $\odot$, and $\ominus$ to represent the operations in one of the rings, and the usual symbols to denote the operations in the other ring.
where $+$, $\cdot$, and $-$ denote the ordinary operations in $Z$. It can easily be verified that the operations $\oplus$, $\odot$, and $\ominus$ in $M$ satisfy the conditions of Definition 42.1, so that $M$ is a ring. The correspondence $(a, a) \leftrightarrow a$ is a one-to-one correspondence between the elements of $M$ and $Z$, which is an isomorphism. For example,

$$(a, a) \leftrightarrow a, \qquad (b, b) \leftrightarrow b, \qquad \text{and} \qquad (a, a) \oplus (b, b) = (a + b, a + b) \leftrightarrow a + b.$$
The reader can check that multiplication and negation are also preserved by this correspondence.
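The three conditions of Definition 42.7 can be checked on a finite sample for the correspondence of Example 8. The Python sketch below is an illustration under stated assumptions: the operations of $M$ are taken to be componentwise, as in the rules above, and `preserves_operations` is a hypothetical helper. It tests only the preservation of sums, products, and negatives on the sample; that the correspondence is one-to-one must be seen separately.

```python
from itertools import product


def preserves_operations(phi, ops_a, ops_b, sample):
    """Check that phi carries sums, products, and negatives in A to those in B."""
    add_a, mul_a, neg_a = ops_a
    add_b, mul_b, neg_b = ops_b
    for x, y in product(sample, repeat=2):
        if phi(add_a(x, y)) != add_b(phi(x), phi(y)):
            return False
        if phi(mul_a(x, y)) != mul_b(phi(x), phi(y)):
            return False
        if phi(neg_a(x)) != neg_b(phi(x)):
            return False
    return True


# Operations of M = {(a, a)}: componentwise sum, product, and negative.
pair_ops = (
    lambda p, q: (p[0] + q[0], p[1] + q[1]),
    lambda p, q: (p[0] * q[0], p[1] * q[1]),
    lambda p: (-p[0], -p[1]),
)
# Ordinary operations of Z.
int_ops = (lambda a, b: a + b, lambda a, b: a * b, lambda a: -a)

M_sample = [(a, a) for a in range(-5, 6)]

# The correspondence (a, a) <-> a of Example 8.
print(preserves_operations(lambda p: p[0], pair_ops, int_ops, M_sample))      # True

# The correspondence a <-> 2a between Z and the even integers preserves sums
# and negatives but not products.
print(preserves_operations(lambda a: 2 * a, int_ops, int_ops, range(-5, 6)))  # False
```

The failure of the doubling correspondence does not by itself show that $Z$ and the ring of even integers are not isomorphic, since some other correspondence might conceivably work; that question is the subject of Problem 14.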
EXAMPLE 9. The rings $Z$ and $R$ are not isomorphic. In fact, we will show in Section 73 that $R$ is not denumerable. Since $Z$ is denumerable (see Section 12, Example 3), there cannot be any one-to-one correspondence between $Z$ and $R$. For the same reason, $Q$ and $R$ are not isomorphic.
EXAMPLE 10. The rings $Z$ and $Q$ are not isomorphic. Of course there are one-to-one correspondences between $Z$ and $Q$ (see Section 12, Example 5, for instance), but none of these are isomorphisms. To prove this statement, suppose that there is an isomorphism between $Z$ and $Q$. Let $r$ be the rational number corresponding to the integer 1: $1 \leftrightarrow r$. Now let $a$ be the integer corresponding to the rational number $r/2$:

$$a \leftrightarrow r/2.$$

Since any isomorphism preserves sums, the correspondence $a \leftrightarrow r/2$ added to itself gives $2a = a + a \leftrightarrow r/2 + r/2 = r$. Thus $1 \leftrightarrow r$ and $2a \leftrightarrow r$, so that $1 = 2a$. However, no integer $a$ satisfies $2a = 1$. This contradiction shows that $Z$ cannot be isomorphic to $Q$.
EXAMPLE 11. Let $S$ be a set with one element. Let $B = P(S)$, with the operations defined as in Example 4. Then the ring $B$ is isomorphic to the ring $A$ described in Example 3. The correspondence is given by $S \leftrightarrow a$ and $\emptyset \leftrightarrow 0$.

EXAMPLE 12. Let $S$ and $T$ be two finite sets with the same number of elements. Then the rings $A = P(S)$ and $B = P(T)$, with the operations defined in Example 4, are isomorphic.
1. Let $A$ ... $(m_1, n_1) + (m_2, n_2) = (m_1 + m_2, n_1 + n_2)$, ... Show that with these operations, $A$ is a ring. What is the zero element of this ring? Show that in $A$, it is possible to find two nonzero elements whose product is zero.
2. Let $B = \{(m, n) \mid m \in Z, n \in Z\}$. Define ...
3. Let $A$ ... Find all possible ways in which negation and multiplication can be defined on $A$ in order that it will be a ring.
4. Verify that Examples 3 and 4 are rings.
5. Show that there is one and essentially only one ring which contains one element.
6. Show that if $x + y = x$ in a ring, then $y = 0$.
...
10. Prove the assertion in Example 5.
11. Show that if $B$ is a subring of $A$, and if $C$ is a subring of $B$, then $C$ is a subring of $A$.
12. Verify that $M$, in Example 8, is a ring. Complete the proof that the given correspondence is an isomorphism between $M$ and $Z$.
13. Prove the statements made in Examples 11 and 12.
14. Show that the ring $Z$ of all integers is not isomorphic to the ring of all even integers.
15. Prove that if $A$, $B$, and $C$ are rings such that $A$ is isomorphic to $B$, and $B$ is isomorphic to $C$, then $A$ is isomorphic to $C$.
16. Let $A$ be a nonempty set on which are defined three binary operations $x + y$, $x \cdot y$, and $x - y$, satisfying the parts (a), (b), (e), (f), (g) of Definition 42.1, and the law $y + (x - y) = x$. Do not assume the existence of a zero or negation in $A$.
(a) Prove the following for elements $u$, $v$, $w$, $x$, and $y$ of $A$. [Hint: Use part (3) in the proofs of (4), (5), (6), and (7).]
(1) If $y = (u - v) + (v - u)$, then $u + y = u$ and $v + y = v$.
(2) If $u + w = v + w$ for some element $w$, then $u + x = v + x$ for all $x$.
(3) If $u + w = v + w$, then $u = v$.
(4) $u + v = w$ if and only if $v = w - u$.
(5) $(x + y) - y = x$.
(6) $y + (x - x) = y$.
(7) $x - x = y - y$.
(b) Show that by suitably defining the element 0, and the operation of negation, A becomes a ring in the sense of Definition 42.1.
17. Show that if $x - y$ is suitably defined in a ring $A$, then the law $y + (x - y) = x$ is satisfied in $A$. This result combined with Problem 16(b) shows that the postulate set consisting of (a), (b), (e), (f), (g) of Definition 42.1, and the condition $y + (x - y) = x$, is equivalent to Definition 42.1 for a ring.
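43 Generalized sums and products. The purpose of this section is to investigate the consequences and generalizations of the associative, commutative, and distributive laws of addition and multiplication in rings:

$$x + (y + z) = (x + y) + z, \tag{42}$$
$$x + y = y + x, \tag{43}$$
$$x \cdot (y \cdot z) = (x \cdot y) \cdot z, \tag{44}$$
$$x \cdot y = y \cdot x, \tag{45}$$
$$x \cdot (y + z) = x \cdot y + x \cdot z. \tag{46}$$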
These laws are satisfied in the systems of natural numbers, integers, rational numbers, real numbers, complex numbers, and indeed in any commutative ring. Thus our results have a wide range of applicability. Moreover, the study of this section will give the reader a chance to become better acquainted with the abstract approach to mathematics.

The fact that multiplication and addition are binary operations means that numbers are always added and multiplied two at a time. However, everyone is familiar with expressions like $2 + 5 + 1$. Because ordinary addition satisfies the associative law, (42), it does not matter whether we add 2 and 5 and then add 1, or add 2 to the sum of 5 and 1. Consequently, the expression $2 + 5 + 1$ makes sense, even though it does not indicate which two of the numbers are to be added first. In general, the expression

$$a_1 + a_2 + \cdots + a_n$$
indicates the result of adding the $n$ numbers $a_1, a_2, \ldots, a_n$ in any way, two
at a time. The fact that the result does not depend on how the terms of the sum are associated can be proved by induction, using (42), in a way which is similar to the proof of Theorem 15.2 given in Section 23. This result is called the general associative law.

There is a convenient notation which is used to indicate the sum of several numbers. It is the expression $\sum_{i=1}^{n} a_i$, standing for $a_1 + a_2 + \cdots + a_n$. We read $\sum_{i=1}^{n} a_i$ as "the sum of the $a_i$ from $i = 1$ to $i = n$." A sum with a single term is provided for by adopting the convention that $\sum_{i=1}^{1} a_i = a_1$. The symbol $\sum$ is called the summation sign. The letter $i$ in the expression $\sum_{i=1}^{n} a_i$ is called the index of summation. Another choice for the index of summation does not change the sum. Thus,
$$\sum_{i=1}^{n} a_i = \sum_{j=1}^{n} a_j = a_1 + a_2 + \cdots + a_n.$$
m 
x:z
C;=oai,
EXAMPLE 4. EXAMPLE 5.
ras
It is easy to show that the commutative law of addition, (43), can be extended to sums with any number of terms. Using the notation for sums which we have just introduced, the "general commutative law" can be stated as follows.

(43.1) Let $a_1, a_2, \ldots, a_n$ be any numbers. Let $i_1, i_2, \ldots, i_n$ be any rearrangement of the indices $1, 2, \ldots, n$. Then

$$\sum_{j=1}^{n} a_{i_j} = \sum_{i=1}^{n} a_i.$$

For example, if $i_1 = 4$, $i_2 = 2$, $i_3 = 1$, $i_4 = 3$, then $\sum_{j=1}^{4} a_{i_j} = a_4 + a_2 + a_1 + a_3 = a_1 + a_2 + a_3 + a_4 = \sum_{i=1}^{4} a_i$.
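Proof. The proof of (43.1) is by induction on $n$. If $n = 1$, there is nothing to prove. Note that one and only one of the numbers $a_{i_1}, a_{i_2}, \ldots, a_{i_n}$ is $a_n$. Suppose that $a_{i_k}$ is $a_n$. Then by the general associative law and (43),

$$\sum_{j=1}^{n} a_{i_j} = (a_{i_1} + \cdots + a_{i_{k-1}} + a_{i_{k+1}} + \cdots + a_{i_n}) + a_n.$$

Since $n$ does not occur in the list $i_1, i_2, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n$, this finite sequence is simply a rearrangement of $1, 2, \ldots, n - 1$. Therefore, the induction hypothesis yields

$$a_{i_1} + \cdots + a_{i_{k-1}} + a_{i_{k+1}} + \cdots + a_{i_n} = \sum_{i=1}^{n-1} a_i.$$

Consequently,

$$\sum_{j=1}^{n} a_{i_j} = \sum_{i=1}^{n-1} a_i + a_n = \sum_{i=1}^{n} a_i.$$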
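The general commutative law (43.1) can be tried out on a small example by running through every rearrangement of the indices. The Python sketch below is an illustration only; the helper `add_in_order` and the sample values are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparison of sums is not disturbed by rounding.

```python
from fractions import Fraction
from functools import reduce
from itertools import permutations


def add_in_order(terms):
    """Add the terms two at a time, from left to right."""
    return reduce(lambda partial, term: partial + term, terms)


a = [Fraction(1, 2), Fraction(-3), Fraction(7, 5), Fraction(4)]   # a_1, ..., a_4
reference = add_in_order(a)

# Every rearrangement i_1, ..., i_n of the indices gives the same sum.
all_equal = all(add_in_order([a[i] for i in order]) == reference
                for order in permutations(range(len(a))))

# The rearrangement used in the text: i_1 = 4, i_2 = 2, i_3 = 1, i_4 = 3.
example = add_in_order([a[i - 1] for i in (4, 2, 1, 3)])

print(all_equal, example == reference)   # True True
```

With four terms there are 24 rearrangements, so the check is exhaustive for this one list of numbers; the induction in the proof is what covers every $n$ and every choice of terms.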
An important special case of the general commutative law occurs when we consider sums of sums:

$$\sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_{i,j} \Bigr).$$
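In this expression, we have a doubly indexed set of numbers $a_{i,j}$, where $i$ ranges from 1 to $n$ and $j$ ranges from 1 to $m$. We first sum over $j$ (for each $i$) and then add the resulting sums. For example,

$$\sum_{i=1}^{2} \Bigl( \sum_{j=1}^{3} a_{i,j} \Bigr) = (a_{1,1} + a_{1,2} + a_{1,3}) + (a_{2,1} + a_{2,2} + a_{2,3}).$$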
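By the general commutative and associative laws, the order of the two summations can be interchanged:

$$\sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_{i,j} \Bigr) = \sum_{j=1}^{m} \Bigl( \sum_{i=1}^{n} a_{i,j} \Bigr).$$

This equality expresses the fact that if the numbers $a_{i,j}$ are written in a rectangular array

$$\begin{matrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,m} \end{matrix}$$

and if the terms in each row are added, and then these sums added, the result will be the same as if the terms in each column are added and then the totals of the columns are added.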
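The row and column description can be made concrete with a small array. The following Python sketch is an illustration only, with arbitrarily chosen entries; the list `a` holds the rows of the array, so that `a[i][j]` plays the role of $a_{i+1,j+1}$.

```python
# A rectangular array with n = 2 rows and m = 3 columns (values chosen arbitrarily).
a = [[3, -1, 4],
     [1, 5, -9]]
n, m = len(a), len(a[0])

# Sum over j for each i, then add the row totals.
by_rows = sum(sum(a[i][j] for j in range(m)) for i in range(n))

# Sum over i for each j, then add the column totals.
by_columns = sum(sum(a[i][j] for i in range(n)) for j in range(m))

print(by_rows, by_columns)   # 3 3
```

Here the row totals are 6 and $-3$, while the column totals are 4, 4, and $-5$; both sets of totals add up to 3.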
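EXAMPLE 6. Let $a_{i,j} = \binom{i}{j}$ if $0 \le j \le i \le n$ and $a_{i,j} = 0$ if $0 \le i < j \le n$. Then by formula (12) which gives the sum of the binomial coefficients,

$$\sum_{i=0}^{n} \Bigl( \sum_{j=0}^{n} a_{i,j} \Bigr) = \sum_{i=0}^{n} \Bigl( \sum_{j=0}^{i} \binom{i}{j} \Bigr) = \sum_{i=0}^{n} 2^i.$$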
The associative and commutative laws of multiplication, (44) and (45), can be generalized to show that the product of $n$ numbers $a_1, a_2, \ldots, a_n$ does not depend on the grouping or order of the numbers in the product. Thus, for example, $(a_1 \cdot a_2) \cdot (a_3 \cdot a_4) = a_1 \cdot [a_2 \cdot (a_3 \cdot a_4)] = (a_4 \cdot a_2) \cdot (a_1 \cdot a_3)$. We therefore write $a_1 \cdot a_2 \cdots a_n$ to denote the product of the $n$ numbers $a_1, a_2, \ldots, a_n$. There is also a useful notation for products which is similar to the $\sum$ notation for sums.
118
THE INTEGERS
to stand for the product a l a2 a,. The expression n y = 1 ai is read is called "the product of the ai from i = 1 to i = n." The symbol the product sign.
EXAMPLE 9.
(JJr=l
ni=i a;.
n+k
There is a useful generalization of the distributive law, (46). Let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_m$ be any numbers. Then
$$\left( \sum_{i=1}^{n} a_i \right) \cdot \left( \sum_{j=1}^{m} b_j \right) = \sum_{i=1}^{n} \left( \sum_{j=1}^{m} a_i \cdot b_j \right).$$
We leave the proof of this identity as a problem for the reader. As we observed in Section 41, it is possible to define the product $na$ for a natural number $n$ and an integer $a$ in terms of the addition of integers. This remark can be generalized to arbitrary rings.
DEFINITION 43.2. Let $x$ be an element of the ring $A$. Define
$$0x = 0, \qquad nx = \underbrace{x + x + \cdots + x}_{n \text{ summands}}, \qquad (-n)x = \underbrace{(-x) + (-x) + \cdots + (-x)}_{n \text{ summands}},$$
if $n$ is a natural number.
The symbol 0 is used here with two different meanings. The 0 on the left is the integer 0, while on the right, 0 stands for the zero of the ring.
On the basis of Definition 43.2, we can speak of $ax$ where $a \in Z$ and $x$ is an element of any ring. The generalized commutative, associative, and distributive laws of addition can be specialized to obtain the following result.
THEOREM 43.3. Let $x$ and $y$ be any elements of the ring $A$. Let $a$ and $b$ be arbitrary integers. Then
(a) $ax + bx = (a + b)x$;
(b) $a(bx) = (ab)x$;
(c) $a(x + y) = ax + ay$;
(d) $a(x \cdot y) = (ax) \cdot y = x \cdot (ay)$;
(e) $1x = x$;
(f) $a0 = 0$;
(g) $a(-x) = (-a)x$.
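The integer multiples of Definition 43.2 can be computed by repeated addition in any ring. The following sketch uses the integers modulo 6 as the ring; that choice, and all of the names below, are assumptions made only for illustration. It then tests the identities of Theorem 43.3 on given arguments.

```python
MODULUS = 6  # hypothetical ring: the integers modulo 6


def add(u, v):
    return (u + v) % MODULUS


def neg(u):
    return (-u) % MODULUS


def mul(u, v):
    return (u * v) % MODULUS


def multiple(a, x):
    """Return ax for an integer a and a ring element x, by repeated addition."""
    if a < 0:
        # (-n)x is the sum of n summands, each equal to -x.
        a, x = -a, neg(x)
    total = 0  # 0x is the zero of the ring
    for _ in range(a):
        total = add(total, x)
    return total


def check_theorem_43_3(a, b, x, y):
    """Test identities (a) through (g) for the given integers and ring elements."""
    return (
        add(multiple(a, x), multiple(b, x)) == multiple(a + b, x)
        and multiple(a, multiple(b, x)) == multiple(a * b, x)
        and multiple(a, add(x, y)) == add(multiple(a, x), multiple(a, y))
        and multiple(a, mul(x, y)) == mul(multiple(a, x), y) == mul(x, multiple(a, y))
        and multiple(1, x) == x
        and multiple(a, 0) == 0
        and multiple(a, neg(x)) == multiple(-a, x)
    )
```

A finite check of this kind illustrates the theorem but does not replace its proof.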
2
i=l
(j2
1)
2i
(c) c o i 2
and
fl notation.
+ (1)"'n6
(d) (k
+ 1) (k + 2)
5. Let $a_i = a$ for $i = 1, 2, \ldots, n$. Show that
(a) $\sum_{i=1}^{n} a_i = na$,
(b) $\prod_{i=1}^{n} a_i = a^n$.
ax E
bj
=
abj
11. Defineai,i = (i)t'forO 5 j < i < n a n d ai,i By suitably interpreting the identity
OforO
+ + (7)
(7 ):.
44 Integral domains. As we observed in Section 42, the integers satisfy three identities which do not occur in the definition of a ring. This section is concerned with some very simple consequences of these special properties.
DEFINITION 44.1. Let $A$ be a ring. An element $e \in A$ is called an identity for $A$, or an identity element of $A$, if
$$e \cdot x = x \cdot e = x$$
for all $x \in A$.
frequently, and without further discussion. If a ring contains at least one element $x$ different from 0, then 0 cannot be an identity element, since $0 \cdot x = 0 \ne x$. Thus the only ring in which zero is an identity element is the ring containing only the element 0.
DEFINITION 44.3. An integral domain is a commutative ring $A$ with an identity element different from zero, such that the following cancellation law is satisfied in $A$: if $x$, $y$, and $z$ are elements of $A$ such that $x \cdot z = y \cdot z$, then either $x = y$, or $z = 0$.
As we have observed, the rings of integers, rational numbers, and real numbers are all integral domains. There is a way to determine whether a commutative ring satisfies the cancellation law. This test is most conveniently stated, using a new notion.
DEFINITION 44.4. Let $A$ be a commutative ring. An element $x \in A$ is called a divisor of zero if for some $z \ne 0$ in $A$, $x \cdot z = 0$. If $x$ is a divisor of zero and $x \ne 0$, then $x$ is called a proper divisor of zero. It is evident from Theorem 42.4(c) that if $A$ contains at least one element different from zero, then 0 is a divisor of zero. However, an integral domain has no proper divisors of zero.
THEOREM 44.5. Let $A$ be a commutative ring with an identity element different from zero. Then $A$ is an integral domain if and only if $A$ contains no proper divisors of zero.
Proof. Suppose that $A$ is an integral domain. We wish to prove that $A$ has no proper divisors of zero. Suppose that $z$ is a divisor of zero. Then there is an $x \ne 0$ in $A$ such that $z \cdot x = 0$. By Theorem 42.4(c), $0 \cdot x = 0$. Thus, $z \cdot x = 0 \cdot x$. By the cancellation law, either $z = 0$, or $x = 0$. However, by assumption, $x \ne 0$. Thus $z = 0$. In other words, 0 is the only divisor of zero in $A$. That is, $A$ contains no proper divisors of zero. The proof of the converse depends on the distributive law. Suppose that $A$ has no proper divisors of zero. Since $A$ is commutative and has an identity element not equal to 0, it is only necessary to show that $A$ satisfies the cancellation law. Assume that $x$, $y$, and $z$ are elements of $A$ satisfying $x \cdot z = y \cdot z$. Then
$$(x + (-y)) \cdot z = x \cdot z + (-y) \cdot z = x \cdot z + (-(y \cdot z)) = 0$$
by Definition 42.1(f) and Theorem 42.4. Thus, by Definition 44.4, either $x + (-y) = 0$, or else $z$ is a divisor of zero. Since we are assuming that $A$ contains no divisors of zero except 0 (that is, no proper divisors of zero), it follows that either $x + (-y) = 0$, or $z = 0$. If $x + (-y) = 0$, then $x = y$. Thus, either $x = y$, or $z = 0$. This shows that $A$ satisfies the cancellation law.
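Theorem 44.5 can be illustrated with finite rings. The sketch below assumes the ring of integers modulo n, with n = 6 and n = 7 as sample moduli; neither the ring nor the function names come from the text. It lists the proper divisors of zero and tests the cancellation law by brute force.

```python
def proper_zero_divisors(n):
    """Nonzero x in the integers modulo n with x * z = 0 for some nonzero z."""
    return [x for x in range(1, n)
            if any((x * z) % n == 0 for z in range(1, n))]


def cancellation_law_holds(n):
    """Check: whenever x * z = y * z modulo n, either x = y or z = 0."""
    return all(x == y or z == 0
               for x in range(n) for y in range(n) for z in range(n)
               if (x * z) % n == (y * z) % n)
```

Modulo 6 there are proper divisors of zero (for instance 2 times 3 is 0) and cancellation fails; modulo 7 there are none and cancellation holds, in agreement with the theorem.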
Some of the most interesting problems in the study of the integers are concerned with divisibility. In fact, a large part of the theory of numbers is devoted to the divisibility properties of the integers and the natural numbers. Most of the next chapter deals with this topic. In the integers, $x$ divides $y$ if there is an integer $z$ such that $y = x \cdot z$. This definition makes sense in an arbitrary ring $A$, but the notion "$x$ divides $y$" is not very useful unless $A$ is an integral domain. DEFINITION 44.6. Let $A$ be an integral domain. Let $x$ and $y$ be elements of $A$. Then $x$ divides $y$ in $A$ (or $y$ is divisible by $x$ in $A$, or $x$ is a factor of $y$ in $A$), if there is a $z \in A$ such that $y = x \cdot z$.
It is important to note that the notion of divisibility depends not only on the elements $x$ and $y$, but also on the integral domain under consideration. For example, 2 divides 3 in $Q$, but not in $Z$. Usually, however, discussions involving divisibility are restricted to elements of a fixed integral domain. In this case, the terminology "$x$ divides $y$" and "$y$ is divisible by $x$," and the notation
$x \mid y$ if $x$ divides $y$,
$x \nmid y$ if $x$ does not divide $y$
can be used without danger of confusion. For example, in the next chapter, the notion of "divisibility" and the symbolism $a \mid b$ will always refer to divisibility in the ring $Z$ of all integers. The reader can easily verify the following facts.
THEOREM 44.7. Let $A$ be any integral domain. Then for elements $x, y, z, u, v$ of $A$,
(a) if $x \mid y$ and $y \mid z$, then $x \mid z$;
(b) if $x \mid y$ and $x \mid z$, then $x \mid (y + z)$;
(c) if $x \mid y$, then $x \cdot z \mid y \cdot z$;
(d) $x \mid x \cdot y$;
(e) if $x \mid y$ and $x \mid z$, then $x \mid (u \cdot y + v \cdot z)$;
(f) $x \mid 0$, $1 \mid x$, $-1 \mid x$;
(g) if $0 \mid x$, then $x = 0$.
If $x \mid y$ and $y \mid x$, then by definition, elements $z$ and $w$ exist in $A$ such that $x = z \cdot y$, $y = w \cdot x$. Hence, $1 \cdot x = x = z \cdot (w \cdot x) = (z \cdot w) \cdot x$. If $x \ne 0$, this implies by Definition 44.3 that $z \cdot w = 1$. In the ring of integers it is easy to see from Definition 41.5 and (33.2) that the condition $z \cdot w = 1$ can be satisfied only if $z = w = 1$, or if $z = w = -1$. Therefore, either $x = y$, or $x = -y$. The same conclusion is obtained if $x = 0$, since $y = w \cdot x = w \cdot 0 = 0$. Thus, we obtain the following result for integers.
THEOREM 44.8. Let $x$ and $y$ be integers. Suppose that $x$ divides $y$ and $y$ divides $x$ in $Z$. Then either $x = y$, or $x = -y$.
Of course, if $x = y$, or if $x = -y$, then $x \mid y$ and $y \mid x$ in any integral domain. If $y = x \cdot z$ and $y = x \cdot w$ in an integral domain $A$, then $x \cdot z = x \cdot w$. Hence, if $x \ne 0$, then $z = w$. This observation justifies the following definition.
DEFINITION 44.9. Let $A$ be an integral domain. Let $x$ and $y$ be elements of $A$ such that $x$ divides $y$ in $A$, and $x \ne 0$. Then the unique element $z$ such that $y = x \cdot z$ is called the quotient of $y$ and $x$, and it is denoted by $y/x$.
1. Show that the ring of Example 2, Section 42, does not have an identity. Show that the rings of Examples 3 and 4 do have identities.
2. State which of the following rings are not integral domains and give your reasons. (a) The ring of all rational numbers. (b) The ring of Example 2, Section 42. (c) The ring of Example 3, Section 42. (d) The ring of Example 4, Section 42. (e) The ring of Problem 1, Section 42.
3. Give an example of an integral domain which contains exactly two elements.
5. If $x$ and $y$ are rational numbers, what are the conditions for $x$ to divide $y$ in $Q$?
6. Using Definition 41.5 and (33.2), show that in the ring $Z$, the condition $z \cdot w = 1$ can be satisfied only if $z = w = 1$ or $z = w = -1$.
7. Let A
Show that $A$ is an integral domain. Show that $(x_1, y_1)$ divides $(x_2, y_2)$ in $A$ if and only if $x_1^2 + y_1^2$ divides both $x_1 x_2 + y_1 y_2$ and $x_1 y_2 - x_2 y_1$ in $Z$.
8. Let $A$ be an integral domain containing elements $x$, $y$, and $z$. Prove the following facts. (a) If $z \mid x$ and $z \mid y$, then $x/z + y/z = (x + y)/z$. (b) If $z \mid x$, then $y \cdot (x/z) = (y \cdot x)/z$. (c) If $y \mid z$ and $x \mid (z/y)$, then $(x \cdot y) \mid z$, and $z/(x \cdot y) = (z/y)/x$.
9. Show that if B is a subring of an integral domain A, and if B contains the identity element of A, then B is an integral domain.
10. Let $A$ be a ring with the identity element $e$. Show that $B = \{ae \mid a \in Z\}$ is a subring of $A$. (See Definition 43.2 and Theorem 43.3.)
11. Let $A$ be an integral domain. Show that if $B$ is a ring which is isomorphic to $A$, then $B$ is also an integral domain.
45 The ordering of the integers. Since the ordering of the natural numbers is of such great importance in mathematics, it is not surprising that we would want to define a similar order relation on the integers. The way in which this is done is familiar. The ordering in $Z$ is given by
$$\cdots < -3 < -2 < -1 < 0 < 1 < 2 < 3 < \cdots. \qquad (48)$$
But why order $Z$ in this way, rather than in some other way? The answer is that the ordering (48) is the only really useful one. If an ordering is to be defined on $Z$, it should agree on $N$ with the usual ordering, and it should satisfy as many of the basic conditions listed in (33.2) as possible. In particular, such an ordering should at least have the property that addition of the integer $c$ to each of the integers $a$ and $b$ does not change the order relation between them. That is,
$$\text{if } a < b, \text{ then } a + c < b + c. \qquad (49)$$
The fact is that the only ordering of $Z$ which agrees on $N$ with the usual order relation, and which satisfies (49), is the familiar one given by (48). This assertion is not hard to prove. Suppose that $<$ is such an order relation defined on $Z$. If $m$ and $n$ are natural numbers and $m < n$, then (49) implies
$$-n = m + ((-m) + (-n)) < n + ((-m) + (-n)) = -m.$$
Also, $1 < m + 1$, so that
$$0 = 1 + (-1) < (m + 1) + (-1) = m.$$
Consequently,
$$-m = 0 + (-m) < m + (-m) = 0.$$
Thus, our order relation agrees with (48). It is possible to describe the ordering of $Z$ given by (48) in a convenient way.
DEFINITION 45.1. Let $a$ and $b$ be integers. Then $a$ is less than $b$ (or $b$ is greater than $a$) if $b - a \in N$. In this case, we write $a < b$ (or $b > a$).
In other words, $a < b$ if and only if there is a natural number $m$ such that $b = a + m$. It follows in particular that Definition 45.1 agrees with the usual ordering of $N$. (See Definition 33.4.)
THEOREM 45.2. Let $a$, $b$, and $c$ be any integers. Then
(a) either $a < b$, $a = b$, or $b < a$, and it is impossible for more than one of these relations to be satisfied by a given pair $a$, $b$ of integers;
(b) if $a < b$, and $b < c$, then $a < c$;
(c) if $a < b$, then $a + c < b + c$;
(d) if $a < b$ and $c > 0$, then $a \cdot c < b \cdot c$.
These statements are easily derived from the properties of the integers and the natural numbers. For instance, if $c$ is any integer, then by Definitions 41.1 and 41.2 exactly one of the following holds true: $c \in N$, $c = 0$, $-c \in N$. Applying this remark to $c = b - a$, it follows that either $b - a \in N$, $b - a = 0$, or $-(b - a) = a - b \in N$. Thus, by Definition 45.1, either $a < b$, $a = b$, or $b < a$. The condition $c > 0$ is obviously essential for Theorem 45.2(d) to be true. We point this out because neglecting to check this condition is a frequent source of error in algebraic manipulations involving inequalities. It is possible to derive most of the useful properties of the ordering of the ring $Z$ from Theorem 45.2 (a), (b), (c), and (d). As we will show later, the rational numbers and the real numbers also have orderings which satisfy these laws. Thus, any theorems which can be proved using only the properties of rings and the laws given in Theorem 45.2 will also be true for $Q$ and for $R$. By introducing the abstract notion of an ordered integral domain, we can cover all of these cases at once.
DEFINITION 45.3. Let $A$ be an integral domain. Suppose that an order relation (written $x < y$ or $y > x$) is defined on $A$ satisfying the conditions of Theorem 45.2. That is, if $x$, $y$, and $z$ are elements of $A$, then
(a) either $x < y$, $x = y$, or $y < x$, and it is impossible for more than one of these relations to be satisfied by a given pair $x$, $y$ of elements of $A$;
(b) if $x < y$ and $y < z$, then $x < z$;
(c) if $x < y$, then $x + z < y + z$;
(d) if $x < y$, and $z > 0$, then $x \cdot z < y \cdot z$.
Then, $A$ is called an ordered integral domain.
By Theorem 45.2, $Z$ is an ordered integral domain. As we remarked above, the number systems $Q$ and $R$ are also ordered integral domains so that all definitions and theorems concerning ordered integral domains apply to each of the rings $Z$, $Q$, and $R$.
THEOREM 45.4. Let $A$ be an ordered integral domain. Let $x$, $y$, $z$, and $w$ be any elements of $A$. Then
(a) if $x < y$ and $z < w$, then $x + z < y + w$;
(b) if $x > 0$ and $y > 0$, then $x + y > 0$;
(c) if $x > 0$ and $y > 0$, then $x \cdot y > 0$.
Proof. If $x < y$, then $x + z < y + z$ by Definition 45.3(c). Also, if $z < w$, then $y + z < y + w$ by Definition 45.3(c) and the commutativity of addition. Thus, by Definition 45.3(b), if $x < y$ and $z < w$, then $x + z < y + w$. This proves (a). The statement (b) is a special case of (a), and the statement (c) is a special case of Definition 45.3(d), using Theorem 42.4(c).
THEOREM 45.5. Let $A$ be an ordered integral domain. Let $x$, $y$, and $z$ be any elements of $A$. Then
(a) if $x < y$, then $-y < -x$;
(b) if $x < y$ and $z < 0$, then $y \cdot z < x \cdot z$;
(c) if $x < 0$ and $y < 0$, then $x \cdot y > 0$;
(d) if $x \ne 0$, then $x^2 > 0$;
(e) $1 > 0$.
Proof. (a) By Definition 45.3(c), $-y = x + [(-x) + (-y)] < y + [(-x) + (-y)] = -x$. (b) If $z < 0$, then $0 < -z$ by (a). Thus, by Theorem 42.4 and Definition 45.3(d), $-(x \cdot z) = x \cdot (-z) < y \cdot (-z) = -(y \cdot z)$. Hence, by (a), $y \cdot z = -[-(y \cdot z)] < -[-(x \cdot z)] = x \cdot z$. (c) This statement is obtained from (b) by taking $y$ to be 0 and $z$ to be $y$. (d) If $x \ne 0$, then either $x > 0$, or $x < 0$. If $x > 0$, then $x^2 > 0$ by Theorem 45.4(c). If $x < 0$, then $x^2 > 0$ by (c). (e) By Definition 44.3, $1 \ne 0$. Thus, by (d), $1 = 1^2 > 0$.
It follows from Theorem 45.5(e) that not every integral domain can have defined on it an order relation satisfying the conditions of Definition 45.3, since there exist integral domains $A$ which satisfy the condition $1 + 1 = 0$ (see Example 3, Section 42). If such an $A$ could be made into an ordered integral domain by an order relation $<$, then $0 < 1$. Consequently $1 = 0 + 1 < 1 + 1 = 0$. Since both $0 < 1$ and $1 < 0$ would then hold, this contradicts Definition 45.3(a). In any ordered integral domain $A$, the elements of the set
$$P = \{x \in A \mid x > 0\}$$
are called the positive elements, and the elements of the set
$$\{x \in A \mid x < 0\}$$
are called the negative elements.
If $x \in P$ and $y \in P$, then by Theorem 45.4, $x + y \in P$ and $x \cdot y \in P$. Moreover, for any $x \in A$, either $x > 0$, $x = 0$, or $x < 0$. That is, any element of $A$ is either positive, zero, or negative. This remark explains why an element $x$ of an ordered integral domain is called nonnegative if either $x > 0$, or $x = 0$. By Definition 45.1, an integer $a$ is positive if and only if $a = a - 0$ is in $N$. For this reason, the natural numbers, when regarded as elements of $Z$, are often called positive integers.
2. Let $P$ be the set of all positive elements of an ordered integral domain $A$. Let $M$ be the set of negative elements of $A$. Then $A = P \cup \{0\} \cup M$, where the sets $P$, $\{0\}$, and $M$ are pairwise disjoint. Justify this statement.
3. Let $A$ be an ordered integral domain with $P$ as its set of positive elements. Show that $x < y$ in $A$ if and only if $y - x \in P$.
4. Let $A$ be an integral domain. Let $P$ be a subset of $A$ which satisfies the following conditions: (a) if $x \in P$ and $y \in P$, then $x + y \in P$ and $x \cdot y \in P$; (b) if $x \in A$ and $x \ne 0$, then either $x \in P$, or $-x \in P$; (c) $0 \notin P$.
Give an example to show that this is false. Find conditions on $x$, $y$, $z$, and $w$ that will guarantee that $x \cdot z < y \cdot w$.
6. Show that it is impossible to define an ordering of the integral domain given in Problem 2, Section 42, so that it becomes an ordered integral domain. [Hint: Find an element $x \ne 0$ such that $x^2 < 0$.]
7. Let $A$ be an ordered integral domain. Suppose that $u \in A$, and $m$ is a natural number. Prove the following statements. (a) If $m$ is odd, then the equation $x^m = u$ has either one solution $x$ in $A$, or else no solution in $A$. Give examples for the case $A = Z$ to show that both of these possibilities can occur. (b) If $m$ is even, and $u > 0$, then the equation $x^m = u$ has either two solutions in $A$, or else no solution in $A$. Give examples for the case $A = Z$ to show that both of these possibilities can occur. (c) If $m$ is even, and $u = 0$, then $x = 0$ is the only solution of $x^m = u$. (d) If $m$ is even, and $u < 0$, then $x^m = u$ has no solution in $A$.
46 Properties of order. Some of the most important concepts encountered in the calculus are defined in terms of the order relation of the real numbers. Many of these notions can be studied profitably in the abstract setting of ordered integral domains.
DEFINITION 46.1. Let $A$ be any ordered integral domain. Then $x \le y$ (or $y \ge x$) means that either $x < y$, or $x = y$. The relations $x < y$ and $x \le y$ are called inequalities.
THEOREM 46.2. Let $A$ be any ordered integral domain. Let $x$, $y$, $z$, and $w$ be arbitrary elements of $A$.
(a) If $x < y$ and $y \le z$, or if $x \le y$ and $y < z$, then $x < z$.
(b) If $x \le y$ and $y \le z$, then $x \le z$.
(c) If $x \le y$ and $y \le x$, then $x = y$.
(d) Either $x \le y$, or $y \le x$.
(e) Exactly one of the relations $x \le y$, $y < x$ is satisfied.
(f) If $x < y$ and $z \le w$, or if $x \le y$ and $z < w$, then $x + z < y + w$.
(g) If $x \le y$ and $z \le w$, then $x + z \le y + w$.
(h) If $x \le y$ and $z \ge 0$, then $x \cdot z \le y \cdot z$.
(i) If $x \le y$ and $z \le 0$, then $y \cdot z \le x \cdot z$.
(j) For all $x$, $x^2 \ge 0$.
Proof. The proof of this theorem is routine. For example, if $x < y$ and $y \le z$, then either $x < y$ and $y < z$, or $x < y$ and $y = z$. In the first case, $x < z$ by Definition 45.3(b). In the second case, $x < z$, since $x < y = z$. This proves the first part of (a). To prove the first part of (f), suppose that $x < y$ and $z \le w$. If $z = w$, then $x + z < y + w$, since $x + z < y + z$ by Definition 45.3(c). If $z < w$, then $x + z < y + w$ by Theorem 45.4(a). We leave the remaining statements for the reader to prove.
It is common practice to write sequences of inequalities and equalities. For example, $x < y = z \le w < u$ is an abbreviation for $x < y$, $y = z$, $z \le w$, and $w < u$. Usually all of the inequalities in such a sequence are directed in the same way, from small to large, or from large to small. It then follows from Theorem 46.2 that the inequalities obtained by omitting part of the sequence are valid. In the above example, we get $x < z$, $x < w$, $x < u$, $y \le w$, $y < u$, and $z < u$. Frequently sets are defined by means of inequalities. By using the laws given in Theorem 46.2, it is often possible to simplify the descriptions of these sets.
EXAMPLE 1. Determine $\{x \in R \mid x^2 + x \le 2\}$. The condition $x^2 + x \le 2$ is equivalent to $(x + 2)(x - 1) = x^2 + x - 2 \le 2 - 2 = 0$. The product $(x + 2)(x - 1)$ is positive if $x + 2$ and $x - 1$ have the same sign (either positive or negative). Therefore, the product $(x + 2)(x - 1)$ will be $\le 0$ if and only if $x + 2$ and $x - 1$ have opposite signs, or one of these factors is zero. Since $x - 1 < x + 2$, it follows that $(x + 2)(x - 1) \le 0$ is equivalent to $x - 1 \le 0 \le x + 2$. Consequently, $x^2 + x \le 2$ if and only if $x \le 1$ and $-2 \le x$. Therefore,
$$\{x \in R \mid x^2 + x \le 2\} = \{x \in R \mid -2 \le x \le 1\}.$$
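The conclusion of Example 1 can be spot-checked on a grid of rational numbers. The sketch below uses exact fractions so that the endpoints are tested without rounding; the grid and the function names are assumptions chosen for illustration only.

```python
from fractions import Fraction


def satisfies_condition(x):
    """The defining condition of the set: x^2 + x <= 2."""
    return x * x + x <= 2


def in_interval(x):
    """The simplified description: -2 <= x <= 1."""
    return -2 <= x <= 1


# Quarter steps from -5 to 5, a hypothetical sample of the real line.
samples = [Fraction(k, 4) for k in range(-20, 21)]
descriptions_agree = all(satisfies_condition(x) == in_interval(x) for x in samples)
```

Agreement on a finite sample is only a sanity check; the argument in the example is what establishes the equality of the two sets.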
DEFINITION 46.3. Let $A$ be any ordered integral domain. Let $S$ be a nonempty subset of $A$. An element $x$ in $A$ is called the smallest (or least, or minimum) element of $S$ if $x \in S$ and $x \le y$ for all $y \in S$. An element $x$ in $A$ is called the largest (or greatest, or maximum) element of $S$ if $x \in S$ and $y \le x$ for all $y \in S$. The smallest element of $S$ (if it exists) is denoted by $\min S$ and the largest element of $S$ (if it exists) is denoted by $\max S$.
EXAMPLE 2. $\min \{x \in R \mid x \ge 0\} = 0$, $\max \{x \in R \mid x \ge 0\}$ does not exist, $\min \{x \in R \mid x > 0\}$ does not exist, $\min \{x \in Z \mid x > 0\} = 1$, $\max \{x \in Z \mid x > 0\}$ does not exist.
Generally, a nonempty set need not have either a smallest or a largest element. For example, it is easy to see that if $S = A$, then neither $\min S$ nor $\max S$ exists. However, if $S$ is finite, the situation is different.
THEOREM 46.4. Let $S$ be a nonempty finite subset of an ordered integral domain. Then $S$ has a smallest element and a largest element.
Proof. Let $S = \{x_1, x_2, \ldots, x_n\}$. The proof is by induction on $n$. If $n = 1$, then $x_1$ is both the largest and the smallest element of $S$. Suppose that $n > 1$ and that every set containing fewer than $n$ elements has a largest element and a smallest element. Let
$$T = \{x_1, x_2, \ldots, x_{n-1}\}.$$
By assumption, $T$ has a smallest element $x_i$. Then by Theorem 46.2, either $x_i \le x_n$, in which case $x_i$ is the smallest element of $S$, or $x_n < x_i$, in which case $x_n$ is the smallest element of $S$. Similarly, $S$ contains a largest element. This completes the proof of the induction step and proves the theorem.
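The induction in the proof of Theorem 46.4 translates directly into a recursive procedure. The following is a sketch under the assumption that the finite set is given as a nonempty list; the function name is hypothetical.

```python
def smallest(elements):
    """Smallest element of a nonempty list, following the induction in the proof."""
    if len(elements) == 1:
        # Base case n = 1: the single element is the smallest.
        return elements[0]
    # T consists of all elements except the last one, x_n.
    smallest_of_t = smallest(elements[:-1])
    last = elements[-1]
    # Either x_i <= x_n, or x_n < x_i; exactly one of these holds.
    return smallest_of_t if smallest_of_t <= last else last
```

Replacing the comparison by its reverse gives the largest element in the same way.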
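THEOREM 46.5. Let $A$ be an ordered integral domain. Let $x_1, x_2, \ldots, x_n$, and $y$ be any elements of $A$.
(a) $\min \{x_1, x_2, \ldots, x_n\} + y = \min \{x_1 + y, x_2 + y, \ldots, x_n + y\}$, and $\max \{x_1, x_2, \ldots, x_n\} + y = \max \{x_1 + y, x_2 + y, \ldots, x_n + y\}$.
(b) If $y > 0$, then $y \cdot \min \{x_1, x_2, \ldots, x_n\} = \min \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\}$, and $y \cdot \max \{x_1, x_2, \ldots, x_n\} = \max \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\}$.
(c) If $y < 0$, then $y \cdot \min \{x_1, x_2, \ldots, x_n\} = \max \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\}$, and $y \cdot \max \{x_1, x_2, \ldots, x_n\} = \min \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\}$.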
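Proof. Let $x_i$ be the smallest of the elements $x_1, x_2, \ldots, x_n$. Then $\min \{x_1, x_2, \ldots, x_n\} = x_i$, and $x_i \le x_1$, $x_i \le x_2$, $\ldots$, $x_i \le x_n$. For any $y$, $x_i + y \le x_1 + y$, $x_i + y \le x_2 + y$, $\ldots$, $x_i + y \le x_n + y$. Thus, since $x_i + y$ occurs among the numbers $x_1 + y, x_2 + y, \ldots, x_n + y$, it follows that
$$\min \{x_1 + y, x_2 + y, \ldots, x_n + y\} = x_i + y = \min \{x_1, x_2, \ldots, x_n\} + y.$$
This proves the first part of (a). If $y > 0$, then by Theorem 46.2(h), $y \cdot x_i \le y \cdot x_1$, $y \cdot x_i \le y \cdot x_2$, $\ldots$, $y \cdot x_i \le y \cdot x_n$, so that
$$\min \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\} = y \cdot x_i = y \cdot \min \{x_1, x_2, \ldots, x_n\}.$$
This proves the first part of (b). If $y < 0$, then by Theorem 46.2(i), $y \cdot x_1 \le y \cdot x_i$, $y \cdot x_2 \le y \cdot x_i$, $\ldots$, $y \cdot x_n \le y \cdot x_i$, so that
$$\max \{y \cdot x_1, y \cdot x_2, \ldots, y \cdot x_n\} = y \cdot x_i = y \cdot \min \{x_1, x_2, \ldots, x_n\}.$$
This proves the first part of (c). The second statement of each part of the theorem is proved in a way which is similar to the proof of the first statement.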
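The identities of Theorem 46.5 can be tested on integers. In the sketch below the function name and the sample values are assumptions for illustration; the case of a negative multiplier is the one in which minimum and maximum change places.

```python
def check_theorem_46_5(xs, y):
    """Test the identities of the theorem for a nonempty list xs and an element y."""
    shifted = [x + y for x in xs]
    scaled = [y * x for x in xs]
    # Adding y commutes with taking the minimum or the maximum.
    ok = min(xs) + y == min(shifted) and max(xs) + y == max(shifted)
    if y > 0:
        # A positive factor preserves the roles of min and max.
        ok = ok and y * min(xs) == min(scaled) and y * max(xs) == max(scaled)
    elif y < 0:
        # A negative factor interchanges min and max.
        ok = ok and y * min(xs) == max(scaled) and y * max(xs) == min(scaled)
    return ok
```

For example, with the elements -3, 0, 2, 7 and the factor -2, the products are 6, 0, -4, -14, whose maximum 6 is -2 times the minimum -3.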
DEFINITION 46.6. Let $A$ be an ordered integral domain. Suppose that $x \in A$. The absolute value of $x$, denoted* by $|x|$, is defined to be $|x| = \max \{x, -x\}$. In other words, $|x| = x$ if $x \ge 0$ and $|x| = -x$ if $x < 0$. Thus, in the integers for example, $|2| = 2$, $|-2| = 2$, and $|0| = 0$.
THEOREM 46.7. Let $A$ be an ordered integral domain, and let $x$ and $y$ be arbitrary elements of $A$.
(a) $|x| \ge 0$; moreover, $|x| = 0$ only if $x = 0$.
(b) If $x \ge 0$, then $|y| \le x$ if and only if $-x \le y \le x$.
(c) $|x + y| \le |x| + |y|$.
(d) $|x \cdot y| = |x| \cdot |y|$.
(e) $x + |x| \ge 0$.
Proof. (a) If $x \ne 0$, then either $x > 0$, or $-x > 0$. Hence, $|x| = \max \{x, -x\} > 0$.
(b) If $|y| \le x$, then $\max \{y, -y\} \le x$. Thus, $y \le x$ and $-y \le x$. By Theorem 46.2(i), $-y \le x$ implies $-x \le y$. Thus, $-x \le y \le x$. Conversely, if $-x \le y$ and $y \le x$, then $-y \le x$ and $y \le x$. Therefore, $|y| = \max \{y, -y\} \le x$.
(c) Since $|x| \le |x|$ and $|y| \le |y|$, it follows by (b) that $-|x| \le x \le |x|$ and $-|y| \le y \le |y|$. Therefore, by Theorem 46.2(g), $-(|x| + |y|) \le x + y \le |x| + |y|$. Consequently, by (b), $|x + y| \le |x| + |y|$.
(d) If $x \ge 0$ and $y \ge 0$, then $x \cdot y \ge 0$, so that $|x \cdot y| = x \cdot y = |x| \cdot |y|$. If $x \ge 0$, $y \le 0$, then $x \cdot y \le 0$. In this case, $|x \cdot y| = -(x \cdot y) = x \cdot (-y) = |x| \cdot |y|$. The case $x \le 0$, $y \ge 0$ is similar. Finally, if $x \le 0$, $y \le 0$, then $x \cdot y \ge 0$, and therefore $|x \cdot y| = x \cdot y = (-x) \cdot (-y) = |x| \cdot |y|$.
(e) By Definition 46.6 and Theorem 46.5(a),
$$x + |x| = x + \max \{x, -x\} = \max \{x + x, x + (-x)\} = \max \{x + x, 0\} \ge 0.$$
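The properties in Theorem 46.7 can be tested on integers using the definition of absolute value as a maximum. The sketch below is an illustration only; the function names are assumptions, and part (b) is tested with the nonnegative bound taken to be the absolute value of x.

```python
def absolute_value(x):
    """Definition 46.6: the larger of x and -x."""
    return max(x, -x)


def check_theorem_46_7(x, y):
    ax, ay = absolute_value(x), absolute_value(y)
    return (
        ax >= 0 and (ax != 0 or x == 0)              # (a)
        and ((ay <= ax) == (-ax <= y <= ax))         # (b), with bound ax >= 0
        and absolute_value(x + y) <= ax + ay         # (c)
        and absolute_value(x * y) == ax * ay         # (d)
        and x + ax >= 0                              # (e)
    )
```

Running the check over a range of integer pairs exercises every sign combination treated in the proof of (d).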
Problems involving algebraic manipulation of absolute values occur often in analysis. Sometimes Theorem 46.7 can be used to solve them.
* It might be expected that the use of vertical bars to denote both absolute value and the cardinality of a set would be confusing. However, both notations are standard and the double meaning causes no confusion.
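EXAMPLE 3. Determine $\{x \in R \mid |x^2 - 4x| \le 1\}$. By Theorem 46.7(b), $|x^2 - 4x| \le 1$ is equivalent to $-1 \le x^2 - 4x \le 1$. These inequalities hold if and only if $3 = -1 + 4 \le x^2 - 4x + 4 \le 1 + 4 = 5$. Thus, $x$ belongs to the set $\{x \in R \mid |x^2 - 4x| \le 1\}$ if and only if
$$3 \le (x - 2)^2 \le 5.$$
If $\sqrt{3}$ and $\sqrt{5}$ denote the (positive) square roots of 3 and 5, then this inequality can be written in the form
$$\sqrt{3} \le x - 2 \le \sqrt{5}, \quad \text{or} \quad -\sqrt{5} \le x - 2 \le -\sqrt{3}.$$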
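Example 3 can be spot-checked numerically. The sketch below assumes that the final description of the set is the pair of intervals obtained by taking square roots in the inequality for the square of x - 2; it compares that description with the original condition on a grid of sample points. The grid and the names are illustrative assumptions, and floating-point arithmetic is adequate here only because no sample point lies near an endpoint.

```python
from math import sqrt


def satisfies_condition(x):
    """The defining condition: |x^2 - 4x| <= 1."""
    return abs(x * x - 4 * x) <= 1


def in_described_set(x):
    """Assumed final form: sqrt(3) <= x - 2 <= sqrt(5) or -sqrt(5) <= x - 2 <= -sqrt(3)."""
    d = x - 2
    return sqrt(3) <= d <= sqrt(5) or -sqrt(5) <= d <= -sqrt(3)


# Steps of 0.01 from -1 to 6; the endpoints of the intervals are irrational.
samples = [k / 100 for k in range(-100, 601)]
descriptions_agree = all(satisfies_condition(x) == in_described_set(x) for x in samples)
```

The set therefore consists of two separate intervals, one on each side of 2.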
1. Complete the proof of Theorem 46.2.
2. Let $A$ be an ordered integral domain. Prove the following properties of $A$. (a) If $0 \le x < y$ and $0 \le z < w$, then $x \cdot z < y \cdot w$. (b) If $0 \le x \le y$ and $0 \le z \le w$, then $x \cdot z \le y \cdot w$. (c) If $0 \le x < y$, then $x^n < y^n$ for all $n \in N$. (d) If $0 \le x \le y$, then $x^n \le y^n$ for all $n \in N$. (e) If $x < y$, then $z - y < z - x$. (f) If $x \le y$, then $z - y \le z - x$. (g) If $x \le y$, then $-y \le -x$.
3. (a) Prove that if $u$, $v$, and $x$ are elements of an ordered integral domain such that $0 \le u \le v$, then $u^2 \le x^2 \le v^2$ if and only if either
$$u \le x \le v, \quad \text{or} \quad -v \le x \le -u.$$
Show that if O
(b) Generalize this result from squares to arbitrary exponents.
4. Let $A$ be an ordered integral domain. Prove the following properties of the elements of $A$. (a) $2xy \le x^2 + y^2$. (b) If $x \ne y$, then $2xy < x^2 + y^2$. (c) $(xy + zw)^2 \le (x^2 + z^2)(y^2 + w^2)$. [Hint: Show that $(x^2 + z^2)(y^2 + w^2) - (xy + zw)^2 \ge 0$.]
5. Determine the following sets of real numbers. (a) $\{x \in R \mid 4x - 12 > 0\}$ (b) $\{x \in R \mid 9 - 3x < 0\}$ (c) $\{x \in R \mid 2x + 1 < 4x - 7\}$
(d) {x E R((5)(3  2) L (i) (x 4 4)) (e) { x E R19x2 < 25) (f) {x E R(7x2 > 63) (g) { x E Rl(x  l ) ( x + 2) < 0) (h) {x E R ( ( x  1)(x  3) > 0) (i) { X E R ( ( x  l ) ( x + 2) < 0) (j) {x E RIx2 x > 6) (k) {x E R J X ~ x  G < 6) (1) {x E R l ( x + l ) ( x  2)(x  3) < 0) (m) { x E R ( ( x  3 ) ( x + 1)(x  2) > 0) (n) { X E RIx3  x + 3 > 3)
7. Show that if $A$ is an ordered integral domain, then $\max A$ and $\min A$ do not exist. What does this imply about finite integral domains?
8. Let $A$ be an ordered integral domain. Suppose $S$ and $T$ are finite nonempty subsets of $A$. Show that (a) $\max (S \cup T) = \max \{\max S, \max T\}$. (b) $\min (S \cup T) = \min \{\min S, \min T\}$.
9. Let $A$ be an ordered integral domain. Suppose that $x$, $y$, and $z$ are arbitrary elements of $A$. Prove the following facts.
(a> 1 5 1 (b) lx21
=
(e) lx y 21 1x1 IzI ( 4 1x1  191 I lx  y 1 1x1 IYI (e) If z > 0, then Ix  y/ < z if and only if
+ + < + 1!/ +
<
1x1 x2
lIrl
lyl/lxl.
10. Determine the following sets of real numbers. (a) $\{x \in R \mid |2x + 1| < 3\}$ (b) $\{x \in R \mid |1 - x| < 2\}$ (c) $\{x \in R \mid 2|x - 1| < 3\}$ (d) $\{x \in R \mid |2x + 4| \le 2\}$ (e) $\{x \in R \mid |x - 1| > 0\}$ (f) $\{x \in R \mid |x - 3| > 4\}$ (g) $\{x \in R \mid |x^2 - 2x| < 1\}$ (h) $\{x \in R \mid |4x - x^2| \ge 4\}$ (i) $\{x \in R \mid |x^2 - 2x| > 3\}$
CHAPTER 5
ELEMENTARY NUMBER THEORY
51 The division algorithm. In this chapter we will develop some useful and interesting results concerning the natural numbers and the integers. Our study can be viewed as a brief introduction to the theory of numbers, one of the oldest and most respected branches of pure mathematics. We begin with a discussion of the familiar process of long division and some of its consequences. If $a$ and $b$ are integers, and if $a \ne 0$, then it is possible to "divide $a$ into $b$" obtaining a quotient $q$ and a remainder $r$. The exact statement of this fact is called the division algorithm. It is a basic result of considerable importance in number theory.
THEOREM 51.1. Division algorithm. Let $a$ and $b$ be integers, with $a \ne 0$. Then there exist unique integers $q$ and $r$ such that
$$b = aq + r, \quad \text{where } 0 \le r < |a|.$$
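The statement of the division algorithm can be made concrete with a short computation. The sketch below is one possible way to obtain the pair q, r for any nonzero a; it relies on Python's built-in `divmod`, which returns a nonnegative remainder when the divisor is positive. The function name is an assumption.

```python
def divide(b, a):
    """Return (q, r) with b = a*q + r and 0 <= r < |a|, for a != 0."""
    if a == 0:
        raise ValueError("a must be different from 0")
    # Divide by |a| so that the remainder satisfies 0 <= r < |a|.
    q, r = divmod(b, abs(a))
    if a < 0:
        # b = |a|*q + r = a*(-q) + r when a is negative.
        q = -q
    return q, r
```

For instance, dividing 3 into -7 gives q = -3 and r = 2, since -7 = 3(-3) + 2; the remainder is never negative, even when b or a is.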
Proof. There are two things to prove. We must show that integers $q$ and $r$ exist which satisfy
$$b = aq + r, \quad 0 \le r < |a|,$$
and that there is just one pair of integers $q$ and $r$ which satisfy these conditions. First we present the existence proof. Consider the set $S$ of all nonnegative integers of the form $b - ax$, $x \in Z$. We wish to apply the well-ordering principle (25.2) to the set $S$. It is first necessary to prove that $S$ is nonempty. Since $a \ne 0$, either $a \ge 1$ or $a \le -1$. If $a \ge 1$, then $a|b| \ge |b|$, and
$$b - a(-|b|) = b + a|b| \ge b + |b| \ge 0.$$
In this case, $b - ax \in S$, when $x = -|b|$. If $a \le -1$, then $-a|b| \ge |b|$, and
$$b - a|b| \ge b + |b| \ge 0.$$
In this case, $b - ax \in S$, when $x = |b|$. By the well-ordering principle, $S$ contains a smallest number $r$. Let $q$ be the value of $x$ such that $b - ax = r$.
Then $b = aq + r$ and $r \ge 0$. It remains to show that $r < |a|$. If $a \ge 1$, then $a > 0$, so that
$$b - a(q + 1) = b - aq - a < b - aq = r.$$
Since $r$ is the smallest number in $S$, it follows that $b - a(q + 1)$ is not a member of $S$. Since $b - a(q + 1)$ is of the form $b - ax$, and since $S$ consists of all nonnegative integers of this form, the only possible reason for $b - a(q + 1)$ not to be in $S$ is that $b - a(q + 1)$ is negative. Hence,
$$r - a = b - a(q + 1) < 0, \quad \text{so that } r < a = |a|.$$
If $a \le -1$, then
$$b - a(q - 1) = b - aq + a < b - aq = r,$$
so that $b - a(q - 1)$ is not in $S$, and $r + a = b - a(q - 1) < 0$. Therefore, $r < -a = |a|$.
In both of the possible cases $a \ge 1$, $a \le -1$, we obtain $r < |a|$. To prove uniqueness, suppose that
$$b = aq + r, \quad \text{where } 0 \le r < |a|, \quad \text{and} \quad b = aq' + r',$$
where $0 \le r' < |a|$. Then $aq + r = aq' + r'$. Thus, $a(q - q') = r' - r$. Taking absolute values, we obtain $|a| \cdot |q - q'| = |r' - r|$. By adding the inequalities $-|a| < -r \le 0$ and $0 \le r' < |a|$, it follows that $-|a| < r' - r < |a|$. Therefore, by Theorem 46.7, $|r' - r| < |a|$. Hence, $|a| \cdot |q - q'| < |a|$, so that $|q - q'| < 1$. Since $q$ and $q'$ are integers, so is $|q - q'|$. Moreover $0 \le |q - q'| < 1$. Hence, $|q - q'| = 0$, and therefore $q = q'$. Consequently, $r = r'$.
In the expression $b = aq + r$, $q$ is called the quotient and $r$ the remainder in the division of $b$ by $a$. If $r = 0$, then $a$ divides $b$ in $Z$, and in this case $q = b/a$. The division algorithm can be generalized to obtain an important theorem on the representation of natural numbers. The proof of this generalization uses the following simple fact.
(51.2). If $a > 1$ and $b > 0$ are integers, and if $b = aq + r$, where $0 \le r < a$, then $b > q \ge 0$.
Proof. Assume that $q < 0$. Then $-q > 0$, and therefore $-q \ge 1$. Hence, $a(-q) \ge a > r$. Adding $aq$ to both sides of this inequality, we obtain $0 > aq + r = b > 0$, which is a contradiction. Consequently $q \ge 0$. If $q = 0$, then obviously $b > q \ge 0$. If $q > 0$, then since $a > 1$, we have $aq > q$. Therefore $b = aq + r \ge aq > q > 0$. This proves (51.2).
THEOREM 51.3. Let $a$ be a natural number greater than 1. Then every natural number $n$ can be uniquely represented in the form
$$n = r_k a^k + r_{k-1} a^{k-1} + \cdots + r_1 a + r_0,$$
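where $k$ is some nonnegative integer and $r_0, r_1, \ldots, r_k$ are nonnegative integers less than $a$.
Proof. The proof is by course of values induction on $n$ (see 23.3). Assume that every natural number $m < n$ can be represented uniquely in the form
$$r_k a^k + r_{k-1} a^{k-1} + \cdots + r_1 a + r_0,$$
where $r_0, r_1, \ldots, r_k$ are nonnegative integers less than $a$. By the division algorithm, there are unique integers $q$ and $r$ such that
$$n = aq + r, \quad 0 \le r < a.$$
By (51.2), $n > q \ge 0$. If $q = 0$, then $n = r$ is the required unique representation (with $k = 0$, $r_0 = r$). Of course this is the case when $n = 1$. Assume now that $q > 0$, that is, $n \ge a$. Then $q$ is a natural number less than $n$, so that the induction hypothesis applies to $q$. Thus, there is a unique representation
$$q = t_h a^h + t_{h-1} a^{h-1} + \cdots + t_1 a + t_0,$$
so that
$$n = aq + r = t_h a^{h+1} + t_{h-1} a^h + \cdots + t_0 a + r.$$
With a change of notation, this is an expression for $n$ in the required form. To prove that this representation of $n$ is unique, suppose that
$$n = s_j a^j + s_{j-1} a^{j-1} + \cdots + s_1 a + s_0,$$
where $j$ is a nonnegative integer, and $s_0, s_1, \ldots, s_j$ are nonnegative integers less than $a$. Because $n \ge a$, it follows that $j \ge 1$, and
$$n = a(s_j a^{j-1} + s_{j-1} a^{j-2} + \cdots + s_1) + s_0.$$
Since $0 \le s_0 < a$, the uniqueness of the quotient and remainder in the division algorithm gives $s_0 = r$ and $s_j a^{j-1} + \cdots + s_1 = q$. By the induction hypothesis, the representation of $q$ is unique, so that the coefficients $s_1, \ldots, s_j$ are determined by $q$. Hence the representation of $n$ is unique.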
In the particular case a = 10, this theorem is merely a formal statement of the wellknown fact that every natural number can be represented in decimal notation. In fact, when we write an expression such as
The fact that every natural number admits a unique representation of this form is usually taken for granted. By Theorem 51.3, such an assumption is justified. In fact, we have proved that it is not necessary to use powers of 10 for such a representation. Any natural number a > 1 will do just as well. In an expression
the number a is called the base, or radix, of the representation. As in the case of the decimal system, it is convenient to abbreviate
For this notation to be unambiguous, it is necessary to have individual symbols representing each of the numbers O, 1, . . . , a  1. I f a 5 10, then the customary digits can be used. For example, every number is expressible with the base 5, using the coefficients O, 1, 2, 3, and 4. Thus, 1 53 3 5 2 O 5 f 1. 1411301 represents 1 56 4 55 1 5*
If $a > 10$, then new symbols must be introduced for the numbers which are written in the decimal notation as $10, 11, \ldots, a - 1$. (Clearly, the use of 10, 11, etc., would be confusing.) A frequently used base is 12. The scheme for representing numbers to the base 12 is called the duodecimal system. The letters A and B are often employed to denote 10 and 11, respectively, in the duodecimal system. For instance,
7A1B0 represents $7 \cdot 12^4 + 10 \cdot 12^3 + 1 \cdot 12^2 + 11 \cdot 12 + 0$.
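The representation to an arbitrary base can be computed by repeated use of the division algorithm, exactly as in the existence part of the proof of Theorem 51.3. The following is a minimal sketch; the function names are assumptions, and digits are kept as lists of integers (most significant first) so that no special symbols such as A and B are needed.

```python
def to_base(n, a):
    """Digits of the natural number n to the base a > 1, most significant first."""
    digits = []
    while n > 0:
        # n = a*q + r with 0 <= r < a; r is the next digit from the right.
        n, r = divmod(n, a)
        digits.append(r)
    return digits[::-1]


def from_base(digits, a):
    """Value of r_k a^k + ... + r_1 a + r_0 for digits [r_k, ..., r_1, r_0]."""
    value = 0
    for digit in digits:
        value = value * a + digit
    return value
```

Applied to the two examples above, the base 5 numeral 1411301 has the decimal value 28951, and the duodecimal numeral 7A1B0 (digits 7, 10, 1, 11, 0) has the decimal value 162708.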
In representing numbers to bases other than 10, we must be careful that the base being used is clearly understood. If there is a possibility of confusion, the base is usually indicated as a subscript. Thus, for example, $(1411301)_5$ and $(7A1B0)_{12}$ denote the numbers considered above.
The reader will find that with a little practice he can do elementary arithmetic with numbers expressed to bases other than 10. The methods used are the usual ones.
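As an illustration of "the usual methods," column addition with carrying works in any base. The sketch below is an assumption-laden example rather than a prescribed procedure: numbers are given as lists of digits, most significant first, and the function name is hypothetical.

```python
def add_in_base(x_digits, y_digits, a):
    """Add two numbers given by their digits to the base a, carrying as in base 10."""
    xs, ys = x_digits[::-1], y_digits[::-1]  # work from the right-hand column
    result, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        total = carry
        total += xs[i] if i < len(xs) else 0
        total += ys[i] if i < len(ys) else 0
        # Write down the remainder and carry the quotient to the next column.
        carry, digit = divmod(total, a)
        result.append(digit)
    if carry:
        result.append(carry)
    return result[::-1]
```

For the base 5 numerals 413204 and 223001, this procedure gives 1141210: the right-hand column 4 + 1 produces the digit 0 with a carry of 1, and so on.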
EXAMPLE 1. Let 413204, 223001 be numbers written to the base 5. Then
The magnitudes of numbers expressed to any base can be compared in the same familiar way that decimal numbers are compared. For example, $(111)_3 < (222)_3$, $(132)_5 < (141)_5$, $(11)_2 < (100)_2$, and $(A99)_{12} < (B00)_{12}$.
The rule used to compare two numbers is a simple one, but the general statement of it is somewhat involved.
THEOREM 51.4. Let a be a natural number greater than 1. Suppose that n and m are natural numbers represented to the base a. Then n < m if and only if either n has fewer digits than m or n and m have
the same number of digits, and at the first place from the left where the digits of n and m differ, the digit in n is less than the corresponding digit in m. The proof is left as an exercise for the reader (see Problem 9 below). In the binary system of enumeration, each number is represented to the base 2. Thus, a binary number is written as a series of zeros and ones. For example,
Many large-scale digital computers operate with numbers in binary form. There are two reasons for this. First, the operations of addition and multiplication are particularly simple in the binary system. Second, most of the basic components (switches, relays, diodes, rectifiers) of digital computers are bistable devices, that is, they are always in one of two states, which conveniently correspond to the digits 0 and 1 in the representation of a binary number. The binary system of enumeration has other important uses in mathematics.
EXAMPLE 2. For the case $a = 2$, Theorem 51.3 can be stated as follows: every natural number $n$ can be uniquely represented in the form
$$n = 2^{k_1} + 2^{k_2} + \cdots + 2^{k_r},$$
where $k_1 > k_2 > \cdots > k_r \geq 0$.
Thus,
$$\{k_1, k_2, \ldots, k_r\} \to 2^{k_1} + 2^{k_2} + \cdots + 2^{k_r}$$
is a one-to-one correspondence between the set of all finite subsets of $\{0, 1, 2, \ldots\}$ and $N$. Since the set of all finite subsets of any denumerable set has the same cardinality as the set of all finite subsets of $\{0, 1, 2, \ldots\}$ (see Problem 14, Section 13), we obtain a useful theorem. The set of all finite subsets of a denumerable set is denumerable.
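The correspondence between finite sets of exponents and numbers can be tried out directly. The sketch below is illustrative only; the function names are hypothetical, and the empty set is sent to 0 purely as a convention of the code.

```python
def subset_to_number(subset):
    """Send a finite set {k1, ..., kr} of nonnegative integers to 2^k1 + ... + 2^kr."""
    return sum(2 ** k for k in set(subset))  # the empty set gives 0 by convention


def number_to_subset(n):
    """Recover the set of exponents from the binary digits of n."""
    subset = set()
    k = 0
    while n > 0:
        if n % 2 == 1:   # the binary digit in position k is 1
            subset.add(k)
        n //= 2
        k += 1
    return subset


print(subset_to_number({0, 1, 3, 4}))  # 1 + 2 + 8 + 16 = 27
print(number_to_subset(27))            # the set with elements 0, 1, 3, 4
```

That the two functions undo each other is exactly the uniqueness of the binary representation.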
EXAMPLE 3. The binary number system can be used to obtain a winning strategy for Nim, an ancient game, which originated in China. Nim is played by two contestants, using three piles of counters. The contestants alternately pick
up any number of counters from one of the three piles. On each play they must take at least one counter, and the counters chosen can come only from a single pile. The winner is the player who takes the last counter. From a mathematical standpoint, the game is completely described by specifying the numbers $l$, $m$, and $n$ of counters in each of the piles at the beginning of each play. Thus, a sample game might be represented by
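We say that a triple $(l, m, n)$ of nonnegative integers describes a "position" of the game. Let $(l, m, n)$ be any position. Write $l$, $m$, and $n$ in binary form:
$$l = e_k 2^k + \cdots + e_1 2 + e_0, \quad m = f_k 2^k + \cdots + f_1 2 + f_0, \quad n = g_k 2^k + \cdots + g_1 2 + g_0,$$
where each of the digits $e_i$, $f_i$, $g_i$ is 0 or 1 (leading digits may be zero). The position is called "unfavorable" if each of the sums $e_i + f_i + g_i$ is even. Otherwise, the position is "favorable." In particular, $(0, 0, 0)$ is unfavorable. It is easy to see that if $(l, m, n)$ is unfavorable, then any position $(l', m, n)$ with $l' < l$, or $(l, m', n)$ with $m' < m$, or $(l, m, n')$ with $n' < n$ is favorable. Suppose that the counters are removed from the first pile. Then $l' < l$, so that when $l'$ is written in binary form
$$l' = e_k' 2^k + \cdots + e_1' 2 + e_0',$$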
at least one of the binary digits $e_i'$ is different from the corresponding $e_i$ in the representation of $l$. Since $e_i$ and $e_i'$ are either 0 or 1, we have $e_i' = e_i + 1$ if $e_i = 0$, and $e_i' = e_i - 1$ if $e_i = 1$. Therefore,
$$e_i' + f_i + g_i = e_i + f_i + g_i \pm 1.$$
The fact that $(l, m, n)$ is an unfavorable position means that $e_i + f_i + g_i$ is even. Consequently $e_i' + f_i + g_i$ is odd, and the position $(l', m, n)$ is favorable. In particular, an unfavorable position can never lead to $(0, 0, 0)$; hence no player can finish the game from an unfavorable position. Thus, a good strategy is to remove enough counters from some pile so that the opponent is left in an unfavorable position. Of course, if $(l, m, n)$ is unfavorable, this is impossible. But if $(l, m, n)$ is favorable, then by reducing one of $l$, $m$, or $n$, an unfavorable position results. This can be done in the following way. Suppose that $j$ is the largest number such that $e_j + f_j + g_j$ is odd, that is, $e_k + f_k + g_k$, $e_{k-1} + f_{k-1} + g_{k-1}$, $\ldots$, $e_{j+1} + f_{j+1} + g_{j+1}$ are even, but $e_j + f_j + g_j$ is odd. By the definition of a favorable position, such a $j$ exists. Then either $e_j = 1$, $f_j = 1$, or $g_j = 1$, since
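otherwise $e_j + f_j + g_j = 0$. Suppose, for definiteness, that $e_j = 1$. Define $e_i' = e_i$ for each $i > j$ and $e_j' = 0 < e_j$. For each $i < j$, define
$e_i' = e_i$ if $e_i + f_i + g_i$ is even,
$e_i' = 1 - e_i$ if $e_i + f_i + g_i$ is odd.
Then $l' = e_k' 2^k + \cdots + e_1' 2 + e_0'$ satisfies $l' < l$, and every sum $e_i' + f_i + g_i$ is even, so that $(l', m, n)$ is unfavorable.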
Then $l = 0 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2 + 1$,
Thus, $(l, m, n)$ is favorable. Using the procedure which we outlined above, we can obtain an unfavorable position $(l', m, n)$, where
That is, the appropriate strategy is to remove 6 counters from the pile containing 27. The other player will then be faced with an unfavorable position, so that whatever he does leads to a new favorable position. In this way, the player who finds himself with a favorable position can always keep his opponent in an unfavorable position and win the game. In particular, if the initial position is favorable, the player with the first move can always win, provided that he knows the method which we have described. If he does not know this strategy, there is a good chance that one of his moves will lead to a favorable position for his opponent, since usually there are more favorable than unfavorable positions.
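The strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the text's own formulation. It uses the fact that every column sum of binary digits is even exactly when the bitwise exclusive-or `l ^ m ^ n` is zero. The function names are our own, and the pile sizes 17 and 4 in the usage line are hypothetical values chosen so that the correct move removes 6 counters from a pile of 27.

```python
def is_unfavorable(l, m, n):
    """A position is unfavorable when every column sum of binary digits is even."""
    return (l ^ m ^ n) == 0  # XOR has a 1 exactly in the columns with an odd sum


def winning_move(l, m, n):
    """Return an unfavorable position reachable in one move, or None if none exists."""
    piles = [l, m, n]
    x = l ^ m ^ n            # columns in which the digit sum is odd
    if x == 0:
        return None          # already unfavorable: every move gives a favorable position
    for i, pile in enumerate(piles):
        target = pile ^ x    # flip this pile's digits in exactly the odd columns
        if target < pile:    # true when the pile has digit 1 in the highest odd column
            new_piles = piles.copy()
            new_piles[i] = target
            return tuple(new_piles)
    return None              # not reached: some pile has a 1 in the highest odd column


print(winning_move(27, 17, 4))  # (21, 17, 4): remove 6 counters from the pile of 27
```

Flipping the digits of one pile in the odd columns is the same modification of the digits $e_i$ that the procedure above describes.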
1. Using long division, find the quotient by a, where a and b are as follows. (a) a = 212, b = 3111 (c) a = 2164, b = 6411037 (e) a = 121, b = 36
2. Write the following decimal numbers in the base 5 system of enumeration: 2, 21, 3116, 711096, 1O1O. 3. Write the following decimal numbers in the base 12 system of enumeration: 4, 16, 3102, 999111. 4. Find the decimal expression for the following numbers:
6. Carry out the following addition without converting to the base 10. (a) (12145)s (51015)s (b) (111010111)2 4 (10101100)2 (c) (1AlB21)12 (ABAB11A)i2 (d) (140314)5 (2134114)5
7. Carry out the following multiplication without converting to the base 10. (a) (12145)s (51015)s (b) (111010111)2 (10101100)2 (e) (1AlB21)12 (ABAB11A)i2 (d) (140314)5 (2134114)s
8. Let n be a natural number. Suppose that
(a) Show that in the sequence of numbers $1, 2, 3, \ldots, n$, no number other than $2^k$ is divisible by $2^k$ (in $Z$). (b) Use the result of part (a) to prove
9. Let $a$ be a natural number greater than 1. (a) Show that if $n = r_k a^k + r_{k-1} a^{k-1} + \cdots + r_0$, where $0 \leq r_i < a$, then $n < a^{k+1}$. [Hint: First show that $a^{k+1} - 1 = (a - 1)a^k + (a - 1)a^{k-1} + \cdots + (a - 1)$.] (b) Use the result of (a) to prove Theorem 51.4.
52 Greatest common divisor. If $a$ and $b$ are any two integers, then an integer $c$ is called a common divisor of $a$ and $b$ if $c \mid a$ and $c \mid b$. Several simple facts follow immediately from this definition. Since 1 is a divisor of every integer, 1 is a common divisor of any two integers $a$ and $b$. Thus, the set of common divisors of two integers is nonempty. Every integer divides 0. Hence if $b = 0$, then the common divisors of $a$ and $b$ are just the divisors of $a$. In particular, if $a = b = 0$, every integer is a common divisor of $a$ and $b$. In this case the set of common divisors of $a$ and $b$ is infinite. However, in every other case the set of common divisors of $a$ and $b$ is finite. Indeed, if $a \neq 0$ and $c \mid a$, then $a = w \cdot c$ for some nonzero integer $w$. Consequently, by Theorem 36.7, $|a| = |w| \cdot |c| \geq |c|$. Therefore, if $a \neq 0$ and if $c$ is a common divisor of $a$ and $b$, then $-|a| \leq c \leq |a|$. Obviously, there are only finitely many integers $c$ satisfying $-|a| \leq c \leq |a|$. Similarly, if $b \neq 0$, then $-|b| \leq c \leq |b|$, and there are only finitely many integers $c$ satisfying $-|b| \leq c \leq |b|$. Therefore, if either $a$ or $b$ (or both) is different from zero, then the set of common divisors of $a$ and $b$ is finite and nonempty. Thus, by Theorem 46.4, this set contains a largest integer. Since 1 is in the set of common divisors, this largest integer is positive, that is, it is a natural number.
DEFINITION 52.1. Let $a$ and $b$ be integers which are not both zero. The greatest common divisor of $a$ and $b$ is the largest integer in the set of all common divisors of $a$ and $b$. The greatest common divisor of $a$ and $b$ is denoted by $(a, b)$. The expression "greatest common divisor" is often abbreviated g.c.d.
EXAMPLE 1. The common divisors of 12 and 30 are $\pm 1$, $\pm 2$, $\pm 3$, and $\pm 6$. Therefore, the g.c.d. of 12 and 30 is 6.
Note that all of the common divisors of 12 and 30 divide the greatest common divisor. We will show that this is no coincidence, but rather is a fundamental property of the g.c.d.
THEOREM 52.2. Let $a$ and $b$ be integers which are not both zero.
(a) There exist integers $u$ and $v$ such that $(a, b) = ua + vb$.
(b) Every common divisor of $a$ and $b$ divides $(a, b)$.
Proof. If $c$ is a common divisor of $a$ and $b$, then $c$ divides any number of the form $sa + tb$, where $s$ and $t$ are integers. Thus, statement (b) follows from the property (a). Let $S$ be the set of all positive integers (natural numbers) which are of the form $sa + tb$, with $s$ and $t$ integers. Since $a$ and $b$ are not both zero, at least one of the integers
$$a = 1 \cdot a + 0 \cdot b, \quad -a = (-1) \cdot a + 0 \cdot b, \quad b = 0 \cdot a + 1 \cdot b, \quad -b = 0 \cdot a + (-1) \cdot b$$
is positive and therefore belongs to $S$. By the well-ordering principle, $S$ contains a smallest number $d$. By the definition of $S$, there are integers $u$ and $v$ such that $d = ua + vb$. As we noted above, every common divisor of $a$ and $b$ divides $d$, so that in particular $(a, b) \mid d$. Thus, $(a, b) \leq d$. The proof will be finished if we show that $d \leq (a, b)$. By the division algorithm,
$$a = qd + r,$$
where $0 \leq r < d$, $r \in Z$, $q \in Z$. Therefore,
$$r = a - qd = a - q(ua + vb) = (1 - qu)a + (-qv)b.$$
If $r$ were positive, then $r \in S$, since $r$ is of the form $sa + tb$. But $r < d$ and $d$ is the smallest number in $S$. Therefore, $r$ cannot be positive, that is, $r = 0$. Thus, $a = q \cdot d$, so that $d$ divides $a$. Similarly, $d$ divides $b$. Therefore, $d$ is a common divisor of $a$ and $b$. Since $(a, b)$ is the greatest common divisor of $a$ and $b$, $d \leq (a, b)$. The two inequalities $(a, b) \leq d$ and $d \leq (a, b)$ imply that $(a, b) = d = ua + vb$.
Suppose that $a$ and $b$ are integers which are not both zero and $d$ is an integer which satisfies the following conditions:
(51) $d \mid a$ and $d \mid b$;
(52) if $c \mid a$ and $c \mid b$, then $c \mid d$.
By (51), $d$ is a common divisor of $a$ and $b$. Therefore, by Theorem 52.2(b), $d \mid (a, b)$. Since $(a, b) \mid a$ and $(a, b) \mid b$, it follows from (52) that $(a, b) \mid d$. Thus, $d = \pm(a, b)$. In other words, the greatest common divisor of $a$ and $b$ is characterized up to its sign by the above conditions. In fact, these conditions together with the requirement that $d$ be positive can be taken as the definition of the g.c.d. in $Z$. The importance of the conditions (51) and (52) lies in the fact that they make sense in an arbitrary integral domain,
whereas Definition 52.1 depends not only on the ordering of $Z$, but also on the very special fact that a nonzero integer has only a finite number of divisors. Accordingly, if $A$ is an integral domain and if $a$ and $b$ are elements of $A$ which are not both zero, then an element $d \in A$ is called a greatest common divisor of $a$ and $b$ (in $A$) if $d$ satisfies (51) and (52) [where in (52), $c$ is an element of $A$]. Of course, in some integral domains, not every pair of elements has a greatest common divisor (see Problem 14 below). Also, greatest common divisors in integral domains need not be unique. For example, in $Q$, if $a$ and $b$ are not both zero, then every nonzero rational number satisfies (51) and (52). We will use this generalized notion of a greatest common divisor in our discussion of polynomials in Chapter 9.
We will now derive some of the most useful properties of greatest common divisors. The first of these are simple consequences of Definition 52.1.
(52.3). Let $a$ and $b$ be integers which are not both zero. Then
(a) $(a, b) \geq 1$;
(b) $(a, b) = (b, a)$;
(c) $(a, b) = (-a, b) = (a, -b) = (-a, -b) = (|a|, |b|)$;
(d) $(a, b) = |a|$ if and only if $a \mid b$;
(e) $(a, 0) = |a|$ (provided that $a \neq 0$).
Proof. The statement (c) becomes evident if we note that the set of common divisors of $a$ and $b$ is identical with the sets of common divisors of $-a$ and $b$, of $a$ and $-b$, and of $-a$ and $-b$. To prove (d), suppose first that $(a, b) = |a|$. Then in particular $|a|$ is a divisor of $b$. Therefore, $a \mid b$. Conversely, if $a \mid b$, then any divisor of $a$ is also a divisor of $b$. Thus, the common divisors of $a$ and $b$ are exactly the divisors of $a$. Note that $a \neq 0$, since $a = 0$ and $a \mid b$ implies $b = 0$, and we have assumed that $a$ and $b$ are not both zero. By the discussion preceding Definition 52.1, every divisor $c$ of $a$ satisfies $c \leq |a|$. Since $|a|$ divides $a$, it follows that $|a|$ is the largest divisor of $a$, and therefore $|a|$ is the g.c.d. of $a$ and $b$. The remaining statements of (52.3) are easy to prove.
THEOREM 52.4. Let $a$ and $b$ be integers which are not both zero. Suppose that $c$ is a nonzero integer. Then $(ca, cb) = |c|(a, b)$.
Proof. Since $(a, b)$ is a common divisor of $a$ and $b$, and since $|c|$ divides $c$, it follows that $|c|(a, b)$ is a common divisor of $ca$ and $cb$. Hence, by Theorem 52.2(b), $|c|(a, b)$ divides $(ca, cb)$. On the other hand, by Theorem 52.2(a), there exist integers $u$ and $v$ such that $(a, b) = ua + vb$. Consequently,
$$|c|(a, b) = |c|ua + |c|vb = u'(ca) + v'(cb),$$
where $u' = u$, $v' = v$ if $c > 0$, and $u' = -u$, $v' = -v$ if $c < 0$. Since $(ca, cb)$ is a common divisor of $ca$ and $cb$, it follows that $(ca, cb)$ divides $|c|(a, b)$.
Since both $(ca, cb)$ and $|c|(a, b)$ are positive, and each divides the other, $(ca, cb) = |c|(a, b)$.
An immediate consequence of this theorem is the following.
THEOREM 52.5. Let $a$ and $b$ be integers which are not both zero. Suppose that $c$ is a common divisor of $a$ and $b$. Then
$$\left(\frac{a}{c}, \frac{b}{c}\right) = \frac{(a, b)}{|c|}.$$
Note that since $a$ and $b$ are not both zero, and $c$ divides both $a$ and $b$, $c$ cannot be zero. By Theorem 52.4, $(a, b) = (c \cdot a/c, c \cdot b/c) = |c|(a/c, b/c)$.
Any pair of integers $a$ and $b$ has 1 and $-1$ as common divisors. If these are the only common divisors, then $a$ and $b$ are said to be relatively prime. In other words, $a$ and $b$ are relatively prime if $(a, b) = 1$. For example, 2 and 5 are relatively prime, 9 and 16 are relatively prime, 27 and 35 are relatively prime, but 24 and 63 are not relatively prime since they have 3 as a common divisor. We now obtain a result which is needed in the next section for the proof of the fundamental theorem of arithmetic.
THEOREM 52.6. Suppose that $a$ and $b$ are relatively prime and $a$ divides $b \cdot c$. Then $a$ divides $c$.
Proof. Since $(a, b) = 1$, by Theorem 52.2(a) there are integers $u$ and $v$ such that $1 = ua + vb$. Multiplying this equation by $c$ gives $u(ac) + v(bc) = c$. Since $a$ divides $ac$ and, by hypothesis, $a$ divides $bc$, it follows that $a$ divides $c$.
As we pointed out at the beginning of this section, any common divisor $c$ of two nonzero integers $a$ and $b$ satisfies $c \leq \min\{|a|, |b|\}$. Thus, the problem of finding the g.c.d. of $a$ and $b$ can be solved by examining all of the natural numbers which do not exceed $\min\{|a|, |b|\}$ to find the largest one which divides both $a$ and $b$. However, unless $a$ and $b$ are small, this procedure is impractical. There is a very efficient process for determining the g.c.d. of two integers, using the division algorithm. This method was apparently discovered by Euclid, and is called the Euclidean algorithm. If either $a$ or $b$ is zero, then $(a, b)$ is obtained from (52.3e). Moreover, since $(a, b) = (|a|, |b|)$ by (52.3c), it is only necessary to consider positive integers, that is, natural numbers. Let $a$ and $b$ be natural numbers. By the division algorithm,
$$a = bq_1 + r_1, \qquad 0 \leq r_1 < b.$$
If $r_1 = 0$, then $b$ divides $a$ and $(a, b) = b$. Otherwise, divide $b$ by $r_1$:
$$b = r_1 q_2 + r_2, \qquad 0 \leq r_2 < r_1.$$
If $r_2 = 0$, the process ends. Otherwise, divide $r_1$ by $r_2$, and the division algorithm yields
$$r_1 = r_2 q_3 + r_3, \qquad 0 \leq r_3 < r_2.$$
This process can be continued as long as a nonzero remainder is obtained. Since each new remainder is a nonnegative integer which is smaller than the preceding one, the sequence
$$b > r_1 > r_2 > r_3 > \cdots$$
must terminate with some $r_{n+1} = 0$. Thus, we have the following equations:
$$a = bq_1 + r_1,$$
$$b = r_1 q_2 + r_2,$$
$$r_1 = r_2 q_3 + r_3,$$
$$\cdots \qquad\qquad (53)$$
$$r_{n-2} = r_{n-1} q_n + r_n,$$
$$r_{n-1} = r_n q_{n+1}.$$
We will show that $r_n$, the last nonzero remainder, is the g.c.d. of $a$ and $b$. By the equation $r_{n-1} = r_n q_{n+1}$, we see that $r_n \mid r_{n-1}$. Since $r_{n-2} = r_{n-1} q_n + r_n$, it follows that $r_n \mid r_{n-2}$. Continuing up the sequence of equations (53), we find that $r_n$ divides each of the preceding remainders. Then since $b = r_1 q_2 + r_2$, it follows that $r_n \mid b$, and since $a = bq_1 + r_1$, it follows that $r_n \mid a$. Therefore, $r_n$ is a common divisor of $a$ and $b$. Suppose that $c$ is any common divisor of $a$ and $b$. Then $c \mid r_1$, since $r_1 = a - bq_1$. Consequently, $c \mid r_2$, since $r_2 = b - r_1 q_2$. Continuing down the sequence of equations (53), we find that $c \mid r_3$, $c \mid r_4$, $\ldots$, $c \mid r_n$. In particular, since $r_n \neq 0$, $c \leq |c| \leq |r_n| = r_n$. Thus, $r_n$ is the g.c.d. of $a$ and $b$.
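The chain of divisions in (53) translates directly into a short loop. The following Python sketch is an illustration only; the function name `euclid` and the decision to record every division step are our own choices.

```python
def euclid(a, b):
    """Return ((a, b), steps), where steps lists (dividend, divisor, quotient, remainder)."""
    a, b = abs(a), abs(b)    # (a, b) = (|a|, |b|) by (52.3c)
    if a == 0 and b == 0:
        raise ValueError("the g.c.d. is not defined when both integers are zero")
    steps = []
    while b != 0:
        q, r = divmod(a, b)  # division algorithm: a = b*q + r with 0 <= r < b
        steps.append((a, b, q, r))
        a, b = b, r          # continue with the divisor and the remainder
    return a, steps          # a now holds the last nonzero remainder


d, steps = euclid(24756, 6108)
for dividend, divisor, q, r in steps:
    print(f"{dividend} = {divisor} * {q} + {r}")
print("g.c.d. =", d)
```

Each printed line is one of the equations (53) for the given pair of numbers, and the loop stops as soon as a remainder of zero appears.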
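EXAMPLE 2. Let $a = 24756$, $b = 6108$. Then
$$24756 = 6108 \cdot 4 + 324,$$
$$6108 = 324 \cdot 18 + 276,$$
$$324 = 276 \cdot 1 + 48,$$
$$276 = 48 \cdot 5 + 36,$$
$$48 = 36 \cdot 1 + 12,$$
$$36 = 12 \cdot 3 + 0.$$
Therefore, $(24756, 6108) = 12$.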
It is possible to use equations (53) obtained in applying the Euclidean algorithm to determine not only the greatest common divisor $d$ of any pair $a$ and $b$ of natural numbers, but also integers $u$ and $v$ such that $d = ua + vb$.
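The existence of such numbers was proved in Theorem 52.2(a), but that proof does not give a convenient method for finding the values of $u$ and $v$. We illustrate the use of equations (53) to find $u$ and $v$ with the example $a = 24756$, $b = 6108$. Write the equation $48 = 36 \cdot 1 + 12$ in the form $12 = 48 - 36 \cdot 1$. Now reorder the preceding equation of Example 2 to obtain $36 = 276 - 48 \cdot 5$. Substitute and collect:
$$12 = 48 - (276 - 48 \cdot 5) \cdot 1 = 48 \cdot 6 - 276.$$
Continue this process, using each of the equations obtained in the above example (except the last one, $36 = 12 \cdot 3 + 0$):
$$12 = (324 - 276) \cdot 6 - 276 = 324 \cdot 6 - 276 \cdot 7$$
$$= 324 \cdot 6 - (6108 - 324 \cdot 18) \cdot 7 = 324 \cdot 132 - 6108 \cdot 7$$
$$= (24756 - 6108 \cdot 4) \cdot 132 - 6108 \cdot 7 = 24756 \cdot 132 - 6108 \cdot 535.$$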
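The substitution process can also be carried out mechanically. The sketch below is a common variant rather than the back-substitution shown in the text: it works forward through the divisions, keeping each remainder expressed in the form $ua + vb$ as it goes. The function name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = (a, b) and d = u*a + v*b, for natural numbers a and b."""
    # Invariants maintained throughout the loop:
    #   r0 = u0*a + v0*b   and   r1 = u1*a + v1*b
    r0, r1 = a, b
    u0, u1 = 1, 0
    v0, v1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1  # next remainder in the Euclidean algorithm
        u0, u1 = u1, u0 - q * u1  # the same combination applied to the coefficients of a
        v0, v1 = v1, v0 - q * v1  # and to the coefficients of b
    return r0, u0, v0


d, u, v = extended_gcd(24756, 6108)
print(d, u, v)                    # 12 132 -535
print(u * 24756 + v * 6108 == d)  # True
```

For this pair the forward method produces the same coefficients, 132 and $-535$, that the back-substitution gives; in general the integers $u$ and $v$ are not unique.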
Continuing up the set of equations (53) in this way, we can eventually express $r_n$ in terms of $a$ and $b$.
It is possible to extend the definition of the greatest common divisor to collections of several integers. Thus, if $\{a_1, a_2, \ldots, a_n\}$ is any nonempty set of integers, we say that $c$ is a common divisor of the integers in this set if $c \mid a_1$, $c \mid a_2$, $\ldots$, and $c \mid a_n$. If not all of $a_1, a_2, \ldots, a_n$ are zero, then this collection has only a finite number of common divisors, and therefore there is a natural number $d$ which is the greatest common divisor of $a_1, a_2, \ldots, a_n$. As before, the g.c.d. of $a_1, a_2, \ldots, a_n$ is denoted by $(a_1, a_2, \ldots, a_n)$. Note that if $n = 1$, then $(a_1) = |a_1|$. Further, if any $a_i = 0$, then $(a_1, a_2, \ldots, a_n) = (a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$, so that we may restrict our attention to sets of nonzero integers.
THEOREM 52.7. Let $a_1, a_2, \ldots, a_n$ be nonzero integers, where $n \geq 1$.
(a) There exist integers $u_1, u_2, \ldots, u_n$ such that $(a_1, a_2, \ldots, a_n) = u_1 a_1 + u_2 a_2 + \cdots + u_n a_n$.
(b) Every common divisor of $a_1, a_2, \ldots, a_n$ divides $(a_1, a_2, \ldots, a_n)$.
The proof of this theorem is similar to the proof of Theorem 52.2, and we leave it as an exercise. A useful consequence of Theorem 52.7(b) is the following theorem.
THEOREM 52.8. Let $a_1, a_2, \ldots, a_n$ be nonzero integers, where $n \geq 2$. Then $(a_1, a_2, \ldots, a_n) = (a_1, (a_2, \ldots, a_n))$.
Proof. If $c$ is a common divisor of $a_1, a_2, \ldots, a_n$, then by Theorem 52.7(b), $c \mid a_1$ and $c \mid (a_2, \ldots, a_n)$. Thus, by Theorem 52.2(b), $c \mid (a_1, (a_2, \ldots, a_n))$. In particular, $(a_1, a_2, \ldots, a_n) \mid (a_1, (a_2, \ldots, a_n))$. Conversely, if $c$ is a common divisor of $a_1$ and $(a_2, \ldots, a_n)$, then $c \mid a_1$ and $c \mid (a_2, \ldots, a_n)$. Therefore, $c$ is a common divisor of $a_1, a_2, \ldots, a_n$, so that by Theorem 52.7(b), $c \mid (a_1, a_2, \ldots, a_n)$. In particular, $(a_1, (a_2, \ldots, a_n)) \mid (a_1, a_2, \ldots, a_n)$.
Since $(a_1, a_2, \ldots, a_n)$ and $(a_1, (a_2, \ldots, a_n))$ are natural numbers, each of which divides the other, they are equal.
By using the Euclidean algorithm, together with Theorem 52.8, it is possible to determine the g.c.d. of any nonempty finite set of natural numbers. Moreover, Theorem 52.8 can also be used with induction to extend results on the greatest common divisor of two integers to theorems about any nonempty set of integers.
Let $a_1, a_2, \ldots, a_n$ be any nonzero integers. An integer $c$ is called a common multiple of $a_1, a_2, \ldots, a_n$ if $a_1 \mid c$, $a_2 \mid c$, $\ldots$, and $a_n \mid c$. Evidently, $a_1 a_2 \cdots a_n$ and $-a_1 a_2 \cdots a_n$ are both common multiples of $a_1, a_2, \ldots, a_n$. One of these is positive. By the well-ordering property of the natural numbers, there exists a smallest positive integer $c$ which is a common multiple of $a_1, a_2, \ldots, a_n$. This unique positive integer is called the least common multiple (or l.c.m.) of $a_1, a_2, \ldots, a_n$. The usual notation for the l.c.m. of $a_1, a_2, \ldots, a_n$ is $[a_1, a_2, \ldots, a_n]$. There is a close relationship between the g.c.d. and the l.c.m. In fact it is possible to prove that for any two nonzero integers $a$ and $b$,
$$[a, b](a, b) = |a| \cdot |b|$$
(see Problem 12).
1. In the following cases, find the g.c.d. $(a, b)$, and express it in the form $ua + vb$.
(a) $a = 121$, $b = 33$ (b) $a = 543$, $b = 241$ (c) $a = 78696$, $b = 19332$
2. Show that in the expression $(a, b) = ua + vb$, the integers $u$ and $v$ are not unique.
3. Find the following g.c.d.'s.
(a) (144, 90, 1512) (b) (1932, 476, 952, 504, 9261)
4. Show that the integers $a$ and $b$ are relatively prime if and only if there exist integers $u$ and $v$ such that $ua + vb = 1$.
5. Show that any two consecutive integers are relatively prime.
6. Prove that any two successive terms of the Fibonacci sequence
8. Show that if $a_1, a_2, \ldots, a_n$ are nonzero integers ($n \geq 1$), and if $c \neq 0$, then $(ca_1, ca_2, \ldots, ca_n) = |c|(a_1, a_2, \ldots, a_n)$.
9. Show that if $a$ and $b$ are integers which are not both zero, then $a/(a, b)$ and $b/(a, b)$ are relatively prime.
10. Let $a$, $b$, and $c$ be nonzero integers. Prove the following result concerning least common multiples: $[ca, cb] = |c|[a, b]$.
11. Let $a$ and $b$ be nonzero integers which are relatively prime. Use Theorem 52.6 to show that $[a, b] = |a| \cdot |b|$.
12. Using the results of Problems 9, 10, and 11, show that for any two nonzero integers $a$ and $b$, $[a, b](a, b) = |a| \cdot |b|$.
13. In equations (53), show that
(a) $r_n \geq 1$, $r_{n-1} \geq r_n + 1$, $r_{n-2} \geq r_{n-1} + r_n$, $r_{n-3} \geq r_{n-2} + r_{n-1}$, $\ldots$, $b \geq r_1 + r_2$.
(b) Using this result, show that if $u_1, u_2, \ldots$ denote the terms 1, 1, 2, 3, 5, 8, $\ldots$ of the Fibonacci sequence, then
(c) Show that if $p$ is the number of digits in $b$, then the number $n + 1$ of steps in Euclid's algorithm is less than or equal to $5p$. [Hint: By (b) and Problem 8, Section 26,
Thus,
14. Let $A = \{m + n\sqrt{10} \mid m, n \in Z\}$. Prove the following.
(a) $A$ is an integral domain with the usual addition and multiplication of real numbers.
(b) If $a + b\sqrt{10}$ divides $c + d\sqrt{10}$ in $A$, then $a^2 - 10b^2$ divides $c^2 - 10d^2$ in $Z$.
(c) 2 and $4 + \sqrt{10}$ are both common divisors of 6 and $8 + 2\sqrt{10}$ in $A$.
(d) If 2 divides $a + b\sqrt{10}$ in $A$, then $2 \mid a$ and $2 \mid b$ in $Z$.
(e) If $4 + \sqrt{10}$ divides $2c + 2d\sqrt{10}$ in $A$, then $3 \mid c^2 - 10d^2$ in $Z$.
(f) If $2c + 2d\sqrt{10}$ divides 6 in $A$, then $c^2 - 10d^2 \mid 9$ in $Z$.
(g) If $2c + 2d\sqrt{10}$ divides $8 + 2\sqrt{10}$ in $A$, then $c^2 - 10d^2 \mid 6$ in $Z$.
(h) Prove that there is no element $a + b\sqrt{10}$ in $A$ which is a common divisor of 6 and $8 + 2\sqrt{10}$, and is divisible by both 2 and $4 + \sqrt{10}$ in $A$. [Hint: If such an $a + b\sqrt{10}$ exists, then by (d), (e), (f), and (g), we have $a + b\sqrt{10} = 2c + 2d\sqrt{10}$, where $c^2 - 10d^2 = \pm 3$. Now use the easily verified fact that the square of a natural number written in decimal form never ends with 3 or 7.]
53 The fundamental theorem of arithmetic. As the two preceding sections indicate, many questions considered in the study of the natural numbers are concerned with divisibility properties. That is, when will a number $a$ divide a number $b$, if $a$ and $b$ are somehow related? As an example, we might be interested in conditions under which the natural number $n$ divides the binomial coefficient $\binom{n}{k}$. One of the principal elementary tools in the study of divisibility problems is a theorem, called the fundamental theorem of arithmetic, which says that every natural number greater than 1 can be written in an essentially unique way as a product of prime numbers. The primes, which we discussed briefly in Section 23, can therefore be considered as the basic building blocks of all natural numbers.
DEFINITION 53.1. A natural number $p$ is called a prime (or prime number) if $p \neq 1$, and $p$ is not divisible by any natural number other than 1 or $p$. A natural number $n > 1$ which is not a prime is called composite.
For example, 2, 3, 5, 7, 11, 13, 17, and 19 are all of the primes less than 20, and 4, 6, 8, 9, 10, 12, 14, 15, 16, and 18 are all of the composite numbers less than 20. The number 1 is distinguished: it is neither prime nor composite. Following an old tradition, we will usually designate primes by the small Latin letters $p$ and $q$ (sometimes with subscripts). If $p$ is a prime, and if $a$ is any natural number, then the greatest common divisor $(p, a)$ divides $p$, so that either $(p, a) = p$, or $(p, a) = 1$. If $(p, a) = p$, then $p$ divides $a$ (since it is the g.c.d. of $p$ and $a$). Thus, either $p$ divides $a$, or else $p$ and $a$ are relatively prime.
There are two parts of the fundamental theorem of arithmetic. The more elementary part states that every natural number greater than 1 can be written in some way as a product of primes. That is, if $n > 1$, then
$$n = p_1 p_2 \cdots p_k,$$
where $p_1, p_2, \ldots, p_k$ are primes (not necessarily different). Of course, it may happen that $k = 1$, so that the product has only one factor. This result is easily proved by course-of-values induction on $n$ [see (23.3)]. It is only necessary to show that if every natural number $m$, which satisfies $1 < m < n$, can be written as a product of primes, then $n$ can be written
as a product of primes. If $n$ is itself a prime, there is nothing to prove. (This remark takes care of the basis for the induction when $n = 2$.) Otherwise, $n$ is composite, and therefore $n = a \cdot b$, where neither $a$ nor $b$ is 1 or $n$. That is, $1 < a < n$ and $1 < b < n$. By the induction hypothesis,
$$a = \prod_{i=1}^{k} p_i \quad \text{and} \quad b = \prod_{j=1}^{l} q_j,$$
where the $p_i$ and $q_j$ are primes. Therefore, $n = a \cdot b = p_1 \cdots p_k \, q_1 \cdots q_l$ is a product of primes.
The second part of the fundamental theorem of arithmetic is sometimes called the unique factorization theorem. It states that the expression of a natural number as a product of primes is unique, except for the order of the factors. This fact is also proved by induction. This time the induction is on the number $k$ of prime factors in the expression of $n$ as a product of primes. To begin with, however, we need a preliminary fact.
(53.2). If a prime $p$ divides a product $a_1 a_2 \cdots a_k$ of natural numbers, then $p$ divides at least one of the factors $a_i$.
Proof. If $k = 1$, then the hypothesis is the same as the conclusion, so that there is nothing to prove. We may therefore make the induction hypothesis that if $p$ divides a product of $k - 1$ ($k > 1$) natural numbers, then it divides at least one of the factors of this product. By assumption, $p$ divides $a_1 a_2 \cdots a_k = (a_1 a_2 \cdots a_{k-1}) \cdot a_k$. As we remarked above, either $p \mid a_k$ or $(p, a_k) = 1$. If $p \mid a_k$, there is nothing more to prove. If $(p, a_k) = 1$, then by Theorem 52.6, $p$ divides $a_1 a_2 \cdots a_{k-1}$. In this case, the induction hypothesis yields $p \mid a_i$ for some $i$ with $1 \leq i \leq k - 1$. This completes the proof of the induction step, and proves (53.2).
We can now complete the proof that factorization of a natural number $n$ into a product of primes is unique. Suppose that
$$n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l,$$
where $p_1, p_2, \ldots, p_k$ and $q_1, q_2, \ldots, q_l$ are primes. If $k = 1$, then $n = p_1$ is a prime. Moreover, $q_1 \mid p_1$ and $q_1 \neq 1$. Thus, by Definition 53.1, $q_1 = p_1$. If $l$ were greater than 1, then $n = p_1 = q_1 \cdot (q_2 \cdots q_l)$. Since $q_2, \ldots, q_l$ are primes, $q_2 \cdots q_l \neq 1$, and $n = p_1$ is composite. This is a contradiction. Therefore, $l = 1$ and $n = p_1 = q_1$, which is the desired conclusion. This proves the basis step in the induction on $k$. We may therefore assume that $k > 1$. Our induction hypothesis is that if a natural number can be expressed as a product of less than $k$ primes, then this expression is unique up to the order of the factors. That is, the number has no other representation, regardless of the number of factors. From the equality $p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l$, it follows that $p_k$ divides $q_1 q_2 \cdots q_l$. By (53.2), $p_k$ divides some $q_i$. Since $q_i$ is a prime and $p_k \neq 1$, it follows that $p_k = q_i$. By canceling $p_k$, we obtain
$$p_1 p_2 \cdots p_{k-1} = q_1 \cdots q_{i-1} q_{i+1} \cdots q_l.$$
Now $p_1 p_2 \cdots p_{k-1}$ is a product of $k - 1$ primes, and by the induction hypothesis, the factors $p_1, p_2, \ldots, p_{k-1}$ are equal to the factors $q_1, q_2, \ldots, q_{i-1}, q_{i+1}, \ldots, q_l$ in some order. This completes the proof of the main result of this section, which can be stated as follows.
THEOREM 53.3. Fundamental theorem of arithmetic. Every natural number $n > 1$ can be written as a product of primes, and except for the order of the factors, the expression of $n$ in this form is unique.
Of course, the primes which appear in the factorization of a natural number may be repeated. For example, $360 = 2 \cdot 2 \cdot 2 \cdot 3 \cdot 3 \cdot 5 = 2^3 \cdot 3^2 \cdot 5$. In writing a number as a product of primes, it is convenient to group together the repeated primes so that the number is expressed as a product of powers of distinct primes. Thus, each natural number greater than 1 has a unique expression
$$n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$
where $p_1, p_2, \ldots, p_g$ are distinct prime numbers and the exponents $e_i$ are natural numbers.
It is easy to see from Theorem 53.3 that the natural numbers which are divisors of $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$ are the numbers $p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$, where $0 \leq f_i \leq e_i$. For instance, the divisors of $360 = 2^3 \cdot 3^2 \cdot 5$ are
1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20, 24, 30, 36, 40, 45, 60, 72, 90, 120, 180, 360.
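Associated with any natural number $n$ are two quantities of interest. These are
$\tau(n)$ = the number of natural numbers which divide $n$, and
$\sigma(n)$ = the sum of the natural numbers which divide $n$.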
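If we know the factorization of $n$ into a product of powers of primes, it is possible to determine these quantities easily. Suppose that
$$n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$
where $p_1, p_2, \ldots, p_g$ are distinct primes and $e_1, e_2, \ldots, e_g$ are natural numbers. If $d$ is a divisor of $n$, then
$$d = p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g},$$
where $0 \leq f_i \leq e_i$. By Theorem 53.3, different choices of the sequence $f_1, f_2, \ldots, f_g$ give rise to different divisors. Thus, $\tau(n)$ is the number of different sequences $f_1, f_2, \ldots, f_g$ with $0 \leq f_i \leq e_i$. It is easy to show (by induction on $g$, for example) that the number of such sequences is $(e_1 + 1)(e_2 + 1) \cdots (e_g + 1)$.
(53.4). If $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then
$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_g + 1).$$
The product
$$(1 + p_1 + \cdots + p_1^{e_1})(1 + p_2 + \cdots + p_2^{e_2}) \cdots (1 + p_g + \cdots + p_g^{e_g})$$
can be expanded as the sum of all products $p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$ with $0 \leq f_i \leq e_i$, each factor $p_i^{f_i}$ being chosen from a summand of $1 + p_i + \cdots + p_i^{e_i}$. Therefore, the expansion of this product is just the sum of the divisors of $n$, that is, $\sigma(n)$. Finally, since
$$1 + p_i + \cdots + p_i^{e_i} = \frac{p_i^{e_i + 1} - 1}{p_i - 1}$$
[see Problem 6(a), Section 21], we have proved
(53.5). If $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then
$$\sigma(n) = \frac{p_1^{e_1 + 1} - 1}{p_1 - 1} \cdot \frac{p_2^{e_2 + 1} - 1}{p_2 - 1} \cdots \frac{p_g^{e_g + 1} - 1}{p_g - 1}.$$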
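EXAMPLE 1. Let $n = 360 = 2^3 \cdot 3^2 \cdot 5$. Then
$$\tau(360) = (3 + 1)(2 + 1)(1 + 1) = 24,$$
$$\sigma(360) = \frac{2^4 - 1}{2 - 1} \cdot \frac{3^3 - 1}{3 - 1} \cdot \frac{5^2 - 1}{5 - 1} = 15 \cdot 13 \cdot 6 = 1170.$$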
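The formulas for the number and the sum of the divisors can be checked numerically. The Python sketch below is an illustration under our own naming (`factorize`, `tau`, `sigma`); it factors $n$ by simple trial division and compares the formulas with a direct count for $n = 360$.

```python
def factorize(n):
    """Return a dict {prime: exponent} giving the prime factorization of n >= 1."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:  # divide out p as often as possible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:              # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau(n):
    """Number of divisors: the product of (e_i + 1)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


def sigma(n):
    """Sum of divisors: the product of (p_i^(e_i + 1) - 1) / (p_i - 1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p ** (e + 1) - 1) // (p - 1)  # exact division (geometric sum)
    return result


divisors = [d for d in range(1, 361) if 360 % d == 0]
print(factorize(360))                     # {2: 3, 3: 2, 5: 1}
print(tau(360), len(divisors))            # 24 24
print(sigma(360), sum(divisors))          # 1170 1170
```

Trial division is adequate for small numbers such as these; it is not meant as an efficient factoring method.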
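Another useful application of Theorem 53.3 is in finding the greatest common divisor and the least common multiple of a set of natural numbers. Let $\{a_1, a_2, \ldots, a_n\}$ be a set of natural numbers. Then each number $a_i$ can be expressed as a product of powers of the same set of primes $p_1, p_2, \ldots, p_k$, if zero exponents are used. For example, consider $\{360, 105, 1078\}$. Here,
$$360 = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^0 \cdot 11^0,$$
$$105 = 2^0 \cdot 3^1 \cdot 5^1 \cdot 7^1 \cdot 11^0,$$
$$1078 = 2^1 \cdot 3^0 \cdot 5^0 \cdot 7^2 \cdot 11^1.$$
Naturally, any prime raised to the power zero is 1.
(53.6). Let $\{a_1, a_2, \ldots, a_n\}$ be a set of natural numbers, where
$$a_i = p_1^{e_{1i}} p_2^{e_{2i}} \cdots p_k^{e_{ki}}, \qquad e_{ji} \geq 0,$$
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. Then
(a) $(a_1, a_2, \ldots, a_n) = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, where $f_j = \min\{e_{j1}, e_{j2}, \ldots, e_{jn}\}$;
(b) $[a_1, a_2, \ldots, a_n] = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$, where $g_j = \max\{e_{j1}, e_{j2}, \ldots, e_{jn}\}$.
Proof. Since $f_j \leq e_{ji}$ for all $i$ and $j$, the number $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$ is a divisor of each $a_i$. On the other hand, a number which is a divisor of each $a_i$ must be of the form $p_1^{h_1} p_2^{h_2} \cdots p_k^{h_k}$, where $h_j \leq e_{ji}$ for all $i$ and $j$. Then $h_j \leq \min\{e_{j1}, e_{j2}, \ldots, e_{jn}\} = f_j$. Thus, $p_1^{h_1} p_2^{h_2} \cdots p_k^{h_k}$ divides $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, so that $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k} = (a_1, a_2, \ldots, a_n)$. The proof of part (b) is similar and is left to the reader.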
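EXAMPLE 2. Find the g.c.d. and l.c.m. of the set of numbers $\{360, 105, 1078\}$.
We have
$$360 = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^0 \cdot 11^0, \quad 105 = 2^0 \cdot 3^1 \cdot 5^1 \cdot 7^1 \cdot 11^0, \quad 1078 = 2^1 \cdot 3^0 \cdot 5^0 \cdot 7^2 \cdot 11^1,$$
and
$\min\{3, 0, 1\} = 0$, $\min\{2, 1, 0\} = 0$, $\min\{1, 1, 0\} = 0$, $\min\{0, 1, 2\} = 0$, $\min\{0, 0, 1\} = 0$.
Thus, $(360, 105, 1078) = 2^0 \cdot 3^0 \cdot 5^0 \cdot 7^0 \cdot 11^0 = 1$. Also,
$\max\{3, 0, 1\} = 3$, $\max\{2, 1, 0\} = 2$, $\max\{1, 1, 0\} = 1$, $\max\{0, 1, 2\} = 2$, $\max\{0, 0, 1\} = 1$.
Thus, $[360, 105, 1078] = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^2 \cdot 11^1 = 194040$.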
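The rule of taking minimum and maximum exponents is easy to mechanize. The following Python sketch is illustrative only; the helper names `factorize` and `gcd_lcm` are our own, and the factorization is done by plain trial division.

```python
def factorize(n):
    """Return a dict {prime: exponent} for the natural number n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm(numbers):
    """Return (g.c.d., l.c.m.) of a nonempty list of natural numbers, using (53.6)."""
    factorizations = [factorize(a) for a in numbers]
    primes = set()
    for f in factorizations:
        primes.update(f)    # common set of primes p_1, ..., p_k
    g, l = 1, 1
    for p in primes:
        exponents = [f.get(p, 0) for f in factorizations]  # zero exponents allowed
        g *= p ** min(exponents)
        l *= p ** max(exponents)
    return g, l


print(gcd_lcm([360, 105, 1078]))  # (1, 194040)
```

For large numbers the Euclidean algorithm is far more practical than factoring, so this sketch is meant only to mirror the computation in the example.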
1. Express the following numbers as products of powers of distinct primes.
(a) 100 (b) 1300 (c) 1960 (d) 109 (e) 713
2. Find the set of all divisors of each of the numbers in Problem 1.
3. Using (53.4) and (53.5), find $\tau(n)$ and $\sigma(n)$ for each of the numbers $n$ in Problem 1. Check your results on $\sigma(n)$ by computing the sum of the divisors of $n$ directly.
4. Use (53.6) to find the g.c.d. and l.c.m. of the following sets of integers.
(a) {20, 15, 22, 10} (b) {27, 18, 21, 45} (c) {168, 842, 252} (d) {253, 690, 1127}
5. Use the fundamental theorem of arithmetic to determine the square roots of the following numbers with three decimal place accuracy.
(a) 392 (b) 5780 (c) 122694
[Note: $\sqrt{2} = 1.41421\ldots$, $\sqrt{3} = 1.73205\ldots$, $\sqrt{5} = 2.23607\ldots$.]
6. Prove in detail that if $d \mid n$ and $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, then $d = p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$, where $0 \leq f_i \leq e_i$.
7. Show that if the natural numbers a and b are relatively prime, then
8. For any natural number $n$, let $\sigma_k(n)$ be the sum of the $k$th powers of the divisors of $n$: $\sigma_k(n) = \sum_{d \mid n} d^k$, where the sum is over all natural numbers $d$ which divide $n$.
(a) Show that $\sigma_0(n) = \tau(n)$ and $\sigma_1(n) = \sigma(n)$.
(b) Show that if $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then
(c) Show that $\sigma_k(n) = n^k \sigma_{-k}(n)$ both from the definition of $\sigma_k(n)$ and from the formula obtained in part (b).
9. Use the fundamental theorem of arithmetic to give a new proof of Theorem 52.4.
10. Use the fundamental theorem of arithmetic to show that for any natural numbers a and b, (a, b)[a, b] = ab.
54 More about primes. The fundamental theorem of arithmetic shows that the primes are of great importance in number theory. In this section we will consider some apparently simple questions about the set of all primes. Most of these questions will be left unanswered. Indeed, some of the simplest looking problems of the theory of prime numbers can be counted among the outstanding unsolved problems of mathematics.
Probably the first question which one would ask about prime numbers is: How can I tell when a natural number is a prime? It is always possible to test by long division whether or not a natural number a is divisible by a natural number b in the range 1 < b < a. If a is not divisible by any such b, then a must be a prime. However, if a is large, then the amount of computation required to determine in this way whether a is a prime may be considerable. The labor can be reduced by using a simple property of composite numbers.
(54.1). If a is a composite natural number, then a is divisible by a prime $p \le \sqrt{a}$.
Since a is composite, $a = b \cdot c$ where b > 1 and c > 1. Suppose that $b \le c$. (One of the factors of a is less than or equal to the other one, and we can denote that factor by b.) Assume that $b > \sqrt{a}$. Then $c \ge b > \sqrt{a}$, so that $c > \sqrt{a}$. Therefore, $a = b \cdot c > \sqrt{a} \cdot \sqrt{a} = a$, which is impossible. Since the assumption that $b > \sqrt{a}$ led to a contradiction, we can conclude that $b \le \sqrt{a}$. By Theorem 53.3 or Example 2, Section 23, b is divisible by a prime p. Hence $a = b \cdot c$ is divisible by p where $p \le b \le \sqrt{a}$.
To test whether a natural number a is a prime, it suffices by (54.1) to divide a by all primes which are not larger than $\sqrt{a}$. If each division has a nonzero remainder, then a is a prime. For example, consider a = 787. Since $28^2 = 784$ and $29^2 = 841$, $28 < \sqrt{787} < 29$. The primes which are not larger than $\sqrt{787}$ are 2, 3, 5, 7, 11, 13, 17, 19, and 23. By trial we find that 787 is not divisible by any of these primes. Therefore, by (54.1), 787 is a prime.
The first tables of prime numbers were compiled by a simple process based on (54.1). This method "sifts" the composite numbers from the sequence of all natural numbers which are less than or equal to some fixed natural number. The process is credited to the Greek mathematician Eratosthenes (276-194 B.C.). Suppose that we wish to find all primes $\le 100$. By (54.1), every composite number $\le 100$ is divisible by a prime $p \le \sqrt{100} = 10$. Therefore, the primes $\le 100$ are those numbers greater than 1 which are not proper multiples of 2, 3, 5, and 7. Thus, if we let the multiples of 2, 3, 5, and 7 fall through a sieve which contains the first hundred natural numbers, the primes will be left. This process, the sieve of Eratosthenes, is illustrated in Fig. 51.
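The two procedures just described, trial division by the primes not larger than $\sqrt{a}$ and the sieve of Eratosthenes, can be sketched in a few lines. This is an illustrative sketch rather than anything taken from the text; the function names are invented here, and starting the crossing-out at $p^2$ is one of the usual refinements (smaller proper multiples of p have already been removed by smaller primes).

```python
from math import isqrt


def sieve(limit):
    """Return the list of primes <= limit by the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # By (54.1) it is enough to sift with the primes p <= sqrt(limit).
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]


def is_prime_by_trial(a):
    """Test a for primality by dividing by every prime p <= sqrt(a)."""
    if a < 2:
        return False
    return all(a % p != 0 for p in sieve(isqrt(a)))


print(sieve(isqrt(787)))       # [2, 3, 5, 7, 11, 13, 17, 19, 23]
print(is_prime_by_trial(787))  # True
print(len(sieve(100)))         # 25
```

For a = 787 the list of trial divisors is the same list of nine primes used in the example above.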
Knowing the primes $\le 100 = \sqrt{10{,}000}$, we can use this method for finding the primes $\le 10{,}000$, and so forth. There are various refinements to the sieve of Eratosthenes which cut down the labor involved in compiling tables of primes. Moreover, at the present time, this computation can be done by automatic computing machines. A complete list of all the primes among the first 11,000,000 natural numbers has been obtained by the sieve method.
Note that in our table of primes, the primes thin out as the numbers get larger. There are 15 primes $\le 50$ and 10 primes between 50 and 100. A natural question to ask is whether the primes stop somewhere in the sequence of natural numbers, that is, is there a largest prime? Euclid* answered this question in the negative.
THEOREM 54.2. There are infinitely many primes.
Proof. Euclid's proof of this fact is a proof by contradiction. Suppose that the number of primes is finite. Then all of the primes can be written down in order of increasing size,
$$2 < 3 < 5 < 7 < \cdots < p.$$
The number
$$n = (2 \cdot 3 \cdot 5 \cdot 7 \cdots p) + 1,$$
* Most people think of Euclid as a geometer. Actually, Euclid's contributions to the subject of geometry seem to be slight. The familiar geometrical portions of Euclid's Elements are mainly compilations of the work of other geometers. However, Euclid's contributions to number theory were of the highest significance. Theorem 54.2 is rightly considered to be one of the fine gems of mathematical science.
which is the product of all of the primes plus 1, is not divisible by any prime in our list. This is true, since the remainder on dividing n by any one of these primes is $1 \ne 0$. For example, if we divide n by 5, we obtain
$$n = 5 \cdot (2 \cdot 3 \cdot 7 \cdots p) + 1.$$
But by Theorem 53.3 (or by Example 2, Section 23), every natural number greater than 1 is divisible by a prime. Therefore, n is divisible by a prime which is not in our list of all of the primes. This is a contradiction. Hence our assumption that there is only a finite number of primes is false, and Theorem 54.2 is true.
Although Euclid showed over 2200 years ago that the set of primes is infinite, a closely related problem remains unsettled. If p and p + 2 are both primes, then they are said to form a prime pair. For example, 5 and 7, 11 and 13, 17 and 19, 29 and 31 are prime pairs. The largest known prime pair seems to be 1,000,000,009,649 and 1,000,000,009,651. There is strong evidence to support the conjecture that the number of prime pairs is infinite. However, no proof has been found for this statement.
Very little regularity is found in the occurrence of primes in the sequence of natural numbers. On the one hand, the difference between consecutive primes can probably be as small as two infinitely often. On the other hand, there are arbitrarily large gaps between consecutive primes. For if n is any natural number, then the numbers
$$n! + 2, \quad n! + 3, \quad \ldots, \quad n! + n$$
are all composite. In fact, $n! + 2 = (1 \cdot 2 \cdot 3 \cdots n) + 2$ is divisible by 2, $n! + 3$ is divisible by 3, and so forth. Therefore, if p is the largest prime less than $n! + 2$, and q is the next largest prime, then $q - p \ge n$.
The irregular occurrence of the primes makes it seem unlikely that there is any simple expression for the number of primes less than the natural number n. However, studies of tables of primes indicate that the number of primes less than n is approximately equal to $n / \log n$. (Here log n represents the natural logarithm of n; hence $\log n = c \log_{10} n$, where c = 2.302585... and $\log_{10} n$ is the usual logarithm to the base 10.) The fact that the ratio of the number of primes less than n to the quantity $n / \log n$ approaches 1 as n gets large is one of the most important results in the theory of prime numbers. It is known as the prime number theorem. This theorem was conjectured by several mathematicians in the late eighteenth century, but over a hundred years of mathematical development was required before it could be proved.
Even though the set of all primes is irregularly distributed among the natural numbers, it might be hoped that a subset of the primes could be
obtained by some simple formula. Pierre de Fermat (1601-1665), the founder of modern number theory and one of the great mathematicians of all time, observed that for n = 0, 1, 2, 3, and 4, the quantity
$$F_n = 2^{2^n} + 1$$
is a prime, and he conjectured that this might be the case for all n. However, it was found in 1732 that $F_5$ is composite:
$$F_5 = 2^{32} + 1 = 4{,}294{,}967{,}297 = 641 \cdot 6{,}700{,}417.$$
None of the Fermat numbers $F_n$ has been found to be a prime for n > 4, and in fact $F_n$ has been shown to be composite for 28 values of n. Nevertheless, the Fermat numbers have numerous interesting properties.
A related class of numbers from which one might hope to obtain primes is given by the formula
$$M_p = 2^p - 1, \quad p \text{ a prime}.$$
These are called Mersenne numbers after a rather undistinguished French mathematician Marin Mersenne (1588-1648), who asserted in 1644 that $M_p$ is a prime for p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, and 257, and is composite for all other p less than 257. It has since been shown that Mersenne's statement was incorrect: $M_{67}$ and $M_{257}$ are not primes, and $M_{61}$, $M_{89}$, and $M_{107}$ are not composite. There is an efficient method for testing whether certain Mersenne numbers are primes. This test has been used (with the aid of an automatic digital computer) to find the largest known prime $M_{2281}$, a number with 687 digits. It is not known, however, whether there are infinitely many Mersenne primes.
There are two apparent ways to generalize the Mersenne numbers: replace 2 by an arbitrary natural number a > 1, and drop the restriction that the exponent p be a prime. Neither of these generalizations leads to new primes, however.
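The statements about Fermat and Mersenne numbers can be checked by machine. The sketch below is an illustration only. It uses the Lucas-Lehmer test, which is a standard efficient test for Mersenne numbers; the text does not name the method it has in mind, so treating it as Lucas-Lehmer is an assumption. The function names are invented here.

```python
def fermat_number(n):
    """Return F_n = 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1


def is_prime(n):
    """Trial division by 2 and by odd numbers up to sqrt(n); adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def mersenne_is_prime(p):
    """Lucas-Lehmer test for M_p = 2**p - 1, where p is assumed to be a prime."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below applies to odd primes p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([is_prime(fermat_number(n)) for n in range(5)])  # [True, True, True, True, True]
print(fermat_number(5) == 641 * 6700417)               # True
print([p for p in range(2, 258) if is_prime(p) and mersenne_is_prime(p)])
# [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

The last list agrees with the corrected form of Mersenne's assertion: 67 and 257 are absent, while 61, 89, and 107 appear.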
THEOREM 54.3. Let a and n be natural numbers greater than 1. Suppose that $k = a^n - 1$ is a prime. Then a = 2 and n is a prime.
Proof. If a > 2, then $k = a^n - 1 = (a - 1)(a^{n-1} + a^{n-2} + \cdots + 1)$ has a proper divisor $a - 1$ which is larger than 1. This contradicts the assumption that k is a prime. Hence, a = 2. If $n = r \cdot s$, where r > 1 and s > 1, then $k = 2^{rs} - 1 = b^s - 1$, where $b = 2^r > 2$. As we have just seen, this implies that k is composite. Thus, n is a prime.
The Mersenne primes are closely connected with a class of numbers which greatly interested the ancient Greeks: the so-called perfect numbers. A perfect number is a natural number which is equal to the sum of its proper divisors, that is, the sum of all of the divisors except the number itself. For example, $6 = 1 + 2 + 3$ and $496 = 2^4 \cdot 31 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248$ are perfect numbers. Since $\sigma(n)$ is the sum of all of the divisors of n, including n, n is a perfect number if and only if $\sigma(n) = 2n$. From Euclid's time a rule has been known for determining all even perfect numbers.
THEOREM 54.4. A natural number of the form
$$n = 2^{p-1}(2^p - 1),$$
where p and $2^p - 1$ are primes, is a perfect number. Conversely, if n is an even perfect number, then n is of this form.
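Proof. If $n = 2^{p-1}(2^p - 1)$, where p and $2^p - 1$ are primes, then $2^{p-1}$ and $2^p - 1$ are relatively prime, so that $\sigma(n) = \sigma(2^{p-1})\,\sigma(2^p - 1) = (2^p - 1) \cdot 2^p = 2n$. Hence, n is a perfect number. Conversely, suppose that $\sigma(n) = 2n$, where n is even. Since n is even, $n = 2^l k$ where l > 0 and k is odd. Therefore, $\sigma(n) = \sigma(2^l)\,\sigma(k)$ (see Problem 7, Section 53), and
$$2^{l+1} k = 2n = \sigma(n) = (2^{l+1} - 1)\,\sigma(k).$$
Since 2 is the only prime dividing $2^{l+1}$, and $2^{l+1} - 1$ is odd, it follows that $2^{l+1}$ and $2^{l+1} - 1$ are relatively prime. In view of the equality $2^{l+1} k = (2^{l+1} - 1)\,\sigma(k)$, the number $2^{l+1} - 1$ must divide k, so that for some natural number m,
$$k = (2^{l+1} - 1) \cdot m \quad \text{and} \quad \sigma(k) = 2^{l+1} \cdot m.$$
Consequently, $\sigma(k) = k + m$. Thus, the sum of all of the divisors of k is the sum of the two divisors m and k. Hence, k has only two divisors. This implies that m = 1 and k is a prime. Moreover, $k = 2^{l+1} - 1$. Thus,
$$n = 2^l (2^{l+1} - 1),$$
where $k = 2^{l+1} - 1$ is a prime. By Theorem 54.3, l + 1 must also be a prime p. Therefore, $n = 2^{p-1}(2^p - 1)$, where p and $2^p - 1$ are primes.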
Whether there are any odd perfect numbers is another unsolved problem in number theory. Results have been obtained which show that if an odd perfect number exists, it must be larger than 2,200,000,000,000.
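Euclid's rule for even perfect numbers can be compared with a direct search based on the criterion $\sigma(n) = 2n$. The following is a small sketch under that reading; the function names are illustrative and not from the text, and `sigma` simply adds up divisors in pairs d, n/d.

```python
def sigma(n):
    """Sum of all divisors of n, including 1 and n."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_perfect(n):
    return sigma(n) == 2 * n


# Numbers 2**(p-1) * (2**p - 1) with p and 2**p - 1 both prime, for p < 8.
from_rule = [2 ** (p - 1) * (2 ** p - 1)
             for p in range(2, 8) if is_prime(p) and is_prime(2 ** p - 1)]

# Direct search over the even numbers below 10,000.
from_search = [n for n in range(2, 10000, 2) if is_perfect(n)]

print(from_rule)    # [6, 28, 496, 8128]
print(from_search)  # [6, 28, 496, 8128]
```

The two lists coincide, as the rule for even perfect numbers predicts for this range.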
1. Determine which of the following natural numbers are primes and justify your answer. (a) 503 (b) 943 (c) 1511 (d) $2^{13} - 1$ (e) 899
2. Use the sieve of Eratosthenes to compile a table of primes less than 300.
3. Prove that the only prime triple (that is, three consecutive primes of the form p, p + 2, p + 4) is 3, 5, 7.
5. Show that 33,550,336 is a perfect number. 6. Show that 2,096,128 is not a perfect number.
7. (a) Prove that if m < n, then $F_m$ divides $F_n - 2$ (where $F_m$ and $F_n$ are the Fermat numbers $2^{2^m} + 1$ and $2^{2^n} + 1$). (b) Show that if $m \ne n$, then $F_m$ and $F_n$ are relatively prime. (c) Use the result of (b) to give a new proof of Euclid's Theorem 54.2.
8. Show that if n is a natural number and if $k = 2^n + 1$ is a prime, then n is a power of 2.
9. (a) Prove that the product of natural numbers which are all of the form 3x + 1 is a number which is again of this form. (b) Use this remark to prove that there are infinitely many primes of the form 3x + 2, that is, that the infinite sequence 5, 8, 11, 14, 17, ... of natural numbers contains infinitely many primes. [Hint: Proceed as in the proof of Theorem 54.2. Suppose that there is only a finite number of primes of the form 3x + 2. List them all: 5, 11, 17, ..., p,
55 Applications of the fundamental theorem of arithmetic. The importance of the fundamental theorem of arithmetic can hardly be overestimated. This fact can be appreciated after some applications of the theorem are examined. In this section, we will present four different applications. Numerous others will appear later in the book.
Gödel numbering. Any scheme which associates a natural number with each sentence, or sequence of sentences, in some language in such a way that different expressions are associated with different numbers is called a Gödel numbering of the language (after the mathematician Kurt Gödel, who used such a numbering to prove important results in mathematical logic). One of the ways in which this can be done depends on the fundamental theorem of arithmetic and the fact that there are infinitely many prime numbers (Theorem 54.2). Nearly all expressions in the English language can be written using 38 symbols and a space marker. The symbols are the 26 letters of the alphabet, the period, question mark, exclamation point, comma, colon, semicolon, hyphen, apostrophe, two parentheses, and two quotation marks. Associate the numbers 1 to 38 with these symbols in the given order, that is,
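$$\begin{array}{cccccccccc} A & B & C & \cdots & Z & . & ? & \cdots & \text{``} & \text{''} \\ \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \updownarrow \\ 1 & 2 & 3 & \cdots & 26 & 27 & 28 & \cdots & 37 & 38 \end{array}$$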
Associate zero with the space symbol. Let $p_1, p_2, p_3, p_4, \ldots$ denote the sequence of all primes in the order of increasing size. That is, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, .... It is now possible to define a Gödel numbering of the English language by associating with the expression
$$s_1 s_2 s_3 \cdots s_n$$
(where the $s_i$ are either one of the 38 symbols listed above, or else a space marker) the number
$$p_1^{e_1} p_2^{e_2} p_3^{e_3} \cdots p_n^{e_n},$$
where $e_i$ is the integer from 0 to 38 which corresponds to $s_i$. For example, the number associated with the expression GEORGE WASHINGTON is
$$2^7 \cdot 3^5 \cdot 5^{15} \cdot 7^{18} \cdot 11^7 \cdot 13^5 \cdot 17^0 \cdot 19^{23} \cdot 23^1 \cdot 29^{19} \cdot 31^8 \cdot 37^9 \cdot 41^{14} \cdot 43^7 \cdot 47^{20} \cdot 53^{15} \cdot 59^{14}.$$
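The numbering scheme just defined is easy to experiment with. The sketch below is an illustration, not the authors' construction in code form: the names `SYMBOLS`, `godel_number`, and `decode` are invented here, and placing the opening parenthesis and opening quotation mark before their closing partners is an assumption about "the given order".

```python
from itertools import count

# Index in this string = number assigned to the symbol: space is 0, A is 1, ..., Z is 26,
# then . ? ! , : ; - ' ( ) and the two quotation marks (order of the pairs is assumed).
SYMBOLS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ.?!,:;-'()\u201c\u201d"
CODE = {symbol: number for number, symbol in enumerate(SYMBOLS)}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division against the primes found so far."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def godel_number(text):
    """Return p_1**e_1 * p_2**e_2 * ... for the symbols s_1 s_2 ... of text."""
    number = 1
    for p, symbol in zip(primes(), text):
        number *= p ** CODE[symbol]
    return number


def decode(number):
    """Recover the expression by factoring; trailing spaces cannot be recovered."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(SYMBOLS[exponent])
    return "".join(symbols)


n = godel_number("GEORGE WASHINGTON")
print(len(str(n)))   # number of decimal digits
print(decode(n))     # GEORGE WASHINGTON
```

Decoding is nothing more than factoring into primes, which is why the uniqueness part of the fundamental theorem of arithmetic guarantees that the expression can be recovered.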
Since there are infinitely many primes, this scheme associates numbers with expressions of any length. Of course, the numbers involved may be very large. The number associated with the expression GEORGE WASHINGTON has about 250 decimal digits. Nevertheless, even the text of a book such as the King James version of the Bible (written in capitals, and with numbers written out) has a uniquely associated number.
It is theoretically possible to determine any expression from the knowledge of its corresponding number. For example, the number
$$1{,}519{,}219{,}683{,}181{,}760{,}000$$
factors to
$$2^9 \cdot 3^0 \cdot 5^4 \cdot 7^{15},$$
and therefore
I DO
is the expression from which it is obtained. Different expressions must correspond to different numbers, since by the fundamental theorem of arithmetic, two different products of primes cannot be equal to the same natural number. Thus, our scheme satisfies the requirements for a Gödel numbering of the English language.
The scheme for constructing a Gödel numbering which we have presented is not very practical, since the numbers involved are usually very large. However, when applied to formal mathematical languages, the method has important theoretical consequences for logicians and philosophers.
A cardinal number problem revisited. In Section 12, we showed that the set F of all fractions a/b (where a and b are natural numbers) has the same cardinality as N, the set of natural numbers. We will now use the fundamental theorem of arithmetic to give another proof of this fact, that is, to establish a one-to-one correspondence between N and F. First, define a one-to-one correspondence between the set of all nonnegative integers and the set of all integers in such a way that 0 corresponds to 0. Any such correspondence will do; for example,
$$0 \leftrightarrow 0, \quad 1 \leftrightarrow -1, \quad 2 \leftrightarrow 1, \quad 3 \leftrightarrow -2, \quad 4 \leftrightarrow 2, \quad \ldots \qquad (54)$$
THEOREM 55.1. Let $e \leftrightarrow x$ be the correspondence (54). With each natural number
$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$
where $p_1, p_2, \ldots, p_g$ are distinct primes and $e_1, e_2, \ldots, e_g$ are nonnegative integers, associate the fraction
$$r = p_1^{x_1} p_2^{x_2} \cdots p_g^{x_g},$$
where $x_i$ is the integer which corresponds to $e_i$. Then the association $a \leftrightarrow r$ is a one-to-one correspondence between N and F.
Before discussing the proof of this theorem, let us compare the correspondence between N and F given by Theorem 55.1 with the correspondence defined in Section 12. We see immediately that they are different. For example, with the definition given in Theorem 55.1,
which bears no resemblance to the correspondence given in Section 12. The correspondence of Section 12 was defined in rather vague terms. We gave no rule stating what fraction would correspond to a specific natural number n. Instead, it was pointed out how one could, with sufficient patience, find the fraction corresponding to any particular n. For large values of n, the method would not be practical. For example, to find the fraction corresponding to 90,000,000 would be a long, tedious job. On the other hand, the correspondence given by Theorem 55.1 is much more explicit. To apply the rule, the only requirement is that we be able to factor the natural number a into its prime factors. For example, the number $90{,}000{,}000 = 2^7 \cdot 3^2 \cdot 5^7$ corresponds to $2^{-4} \cdot 3 \cdot 5^{-4} = 3/10{,}000$. For a mathematician, the correspondence defined in Theorem 55.1 is much more satisfying than the vague directions laid down in Section 12. Nevertheless, he would admit, perhaps reluctantly, that the discussion in Section 12 proves just as effectively that the set F is denumerable.
The proof of Theorem 55.1 is based on a generalization of the fundamental theorem of arithmetic.
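The explicit rule discussed above can be written out as a short program. This is a sketch under an assumption: the correspondence between nonnegative integers and integers is taken to be 0, 1, 2, 3, 4, ... going to 0, -1, 1, -2, 2, ..., which is inferred from the worked example (7 goes to -4 and 2 goes to 1) and may not be the only one the text allows. All function names are illustrative.

```python
from fractions import Fraction


def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def exponent_to_integer(e):
    """Assumed correspondence: 0->0, 1->-1, 2->1, 3->-2, 4->2, ..."""
    return e // 2 if e % 2 == 0 else -((e + 1) // 2)


def integer_to_exponent(x):
    """Inverse of exponent_to_integer."""
    return 2 * x if x >= 0 else -2 * x - 1


def natural_to_fraction(a):
    r = Fraction(1)
    for p, e in factorize(a).items():
        r *= Fraction(p) ** exponent_to_integer(e)
    return r


def fraction_to_natural(r):
    a = 1
    for p, x in factorize(r.numerator).items():
        a *= p ** integer_to_exponent(x)    # positive exponents of r
    for p, x in factorize(r.denominator).items():
        a *= p ** integer_to_exponent(-x)   # negative exponents of r
    return a


print(natural_to_fraction(90_000_000))            # 3/10000
print(fraction_to_natural(Fraction(3, 10000)))    # 90000000
```

Because `Fraction` always stores a fraction in lowest terms, the numerator and denominator have no common prime, which is the fact about rational numbers that the proof relies on.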
THEOREM 55.2. Every positive rational number r can be written in the form
$$r = p_1^{x_1} p_2^{x_2} \cdots p_g^{x_g},$$
where $p_1, p_2, \ldots, p_g$ are distinct primes and $x_1, x_2, \ldots, x_g$ are integers. Moreover, this representation is unique, except for the order of the factors and the occurrence of primes with exponent zero.
This theorem is an almost immediate consequence of Theorem 53.3, and the fact that every positive rational number r has a unique representation a/b in "lowest terms," that is, with a and b natural numbers which are relatively prime. We leave to the reader the chore of supplying a detailed proof.
Theorem 55.1 can now be easily proved by reinterpreting Theorems 53.3 and 55.2. For this purpose, let $p_1, p_2, p_3, p_4, \ldots$ denote the sequence of all primes in increasing order. Thus, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, .... Then by Theorem 53.3, each natural number a can be written
$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$
where now $e_1, e_2, \ldots, e_g$ are nonnegative integers, and g is some sufficiently large number. The number of factors in the expression is not uniquely determined because we can always multiply by primes to the zero power. Thus, $10 = 2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdot 11^0 \cdot 13^0 \cdots$. However, the number a determines, and is determined by, the sequence of exponents $e_1, e_2, \ldots, e_g$. By adjoining an infinite number of zeros, we do achieve complete uniqueness. In other words, there is a one-to-one correspondence between the natural numbers and the infinite sequences of nonnegative integers
$$(e_1, e_2, \ldots, e_g, e_{g+1}, \ldots)$$
which are zero from some point on (that is, for sufficiently large g, $e_{g+1} = 0$, $e_{g+2} = 0$, ...). The uniqueness statement in the fundamental theorem of arithmetic tells us that the correspondence
$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g} \leftrightarrow (e_1, e_2, \ldots, e_g, 0, 0, \ldots)$$
is one-to-one.
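In exactly the same way, there exists, by Theorem 55.2, a one-to-one correspondence between the set F of all positive rational numbers and the set of all sequences
$$(x_1, x_2, \ldots, x_g, \ldots)$$
of integers with the property that $x_{g+1} = 0$, $x_{g+2} = 0$, ... from some point on. The correspondence is
$$r = p_1^{x_1} p_2^{x_2} \cdots p_g^{x_g} \leftrightarrow (x_1, x_2, \ldots, x_g, 0, 0, \ldots).$$
We therefore have one-to-one correspondences between N and the set J of all $(e_1, e_2, \ldots, e_g, \ldots)$, where the $e_i$ are nonnegative integers which are zero from some point on, and between F and the set K of all $(x_1, x_2, \ldots, x_g, \ldots)$, where the $x_i$ are integers which are zero from some point on. The one-to-one correspondence $e \leftrightarrow x$ given by (54) clearly determines a one-to-one correspondence between J and K:
$$(e_1, e_2, \ldots, e_g, \ldots) \leftrightarrow (x_1, x_2, \ldots, x_g, \ldots).$$
Combining these correspondences, we obtain a one-to-one correspondence between N and F,
$$a \leftrightarrow (e_1, e_2, \ldots, e_g, \ldots) \leftrightarrow (x_1, x_2, \ldots, x_g, \ldots) \leftrightarrow r,$$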
which is the correspondence described in Theorem 55.1.
A Diophantine problem. It is well known that there are right triangles whose sides have integral length. The best known example is the 3, 4, 5 right triangle with bases of length 3 and 4, and hypotenuse of length 5. Somewhat less well known is the right triangle with sides of length 5, 12, and 13. Since the length c of the hypotenuse of a right triangle is related to the lengths a and b of the sides by the Pythagorean formula
$$a^2 + b^2 = c^2, \qquad (55)$$
the problem of finding all right triangles with sides of integral length is equivalent to finding all natural numbers a, b, and c which satisfy (55). An equation such as (55) involving powers of unknown quantities with integral coefficients is called a Diophantine equation (after the ancient Greek mathematician Diophantus). For example, $a + b = 2$, $a^2 + 5b^2 = 1$, $a^2 + ab + b^2 = 5c^2$, and $a^4 + b^4 = c^2$ are all Diophantine equations. The problem of finding all integral solutions of a Diophantine equation, or a system of Diophantine equations, is called a Diophantine problem.
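Using the fundamental theorem of arithmetic, it is possible to obtain the complete solution of (55). First note that if r, s, and t are natural numbers, with r > s, and if we let
$$a = (r^2 - s^2)t, \quad b = 2rst, \quad c = (r^2 + s^2)t, \qquad (56)$$
then
$$a^2 + b^2 = (r^2 - s^2)^2 t^2 + 4r^2 s^2 t^2 = (r^4 + 2r^2 s^2 + s^4) t^2 = (r^2 + s^2)^2 t^2 = c^2.$$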
Therefore, (56) gives a large family of solutions of (55). We will show that every solution of (55) with a, b, and c natural numbers is of the form (56) (or a similar form with a and b interchanged) for suitable natural numbers r, s, and t. The proof is based on the following useful consequence of the fundamental theorem of arithmetic.
THEOREM 55.3. Suppose that a and b are natural numbers which are relatively prime, and $ab = c^n$ for some natural number c. Then $a = a_1^n$, $b = b_1^n$, where $a_1$ and $b_1$ are natural numbers.
Proof. Let
$$a = p_1^{e_1} \cdots p_g^{e_g}, \quad b = q_1^{f_1} \cdots q_h^{f_h},$$
where $p_1, \ldots, p_g$ are distinct primes, $q_1, \ldots, q_h$ are distinct primes, and the exponents $e_1, \ldots, e_g$ and $f_1, \ldots, f_h$ are all positive. Then the $p_i$ must be different from all $q_j$, since otherwise a and b would have a common prime factor, contrary to the assumption that they are relatively prime. Let
$$c = r_1^{m_1} \cdots r_k^{m_k},$$
where $r_1, \ldots, r_k$ are distinct primes and $m_1, \ldots, m_k$ are positive exponents. Then the condition $ab = c^n$ can be written
$$p_1^{e_1} \cdots p_g^{e_g} q_1^{f_1} \cdots q_h^{f_h} = r_1^{n m_1} \cdots r_k^{n m_k}.$$
By Theorem 53.3, it follows that the primes $r_1, \ldots, r_k$ must be $p_1, \ldots, p_g, q_1, \ldots, q_h$ in some order, and that $e_1, \ldots, e_g, f_1, \ldots, f_h$ are the corresponding exponents $n m_1, \ldots, n m_k$. Thus, each $e_i$ and $f_j$ is divisible by n, that is, $e_1/n, \ldots, e_g/n, f_1/n, \ldots, f_h/n$ are all natural numbers. Let
$$a_1 = p_1^{e_1/n} \cdots p_g^{e_g/n}, \quad b_1 = q_1^{f_1/n} \cdots q_h^{f_h/n}.$$
Then $a = a_1^n$, $b = b_1^n$.
We now return to the problem of finding all natural number solutions of (55). Suppose that a, b, and c are natural numbers which satisfy (55).
Let t be the greatest common divisor of a, b, and c. Then a/t, b/t, and c/t are natural numbers with no prime factor in common which satisfy
$$(a/t)^2 + (b/t)^2 = (c/t)^2.$$
We will show that a/t, b/t, and c/t have the form $r^2 - s^2$, $2rs$, and $r^2 + s^2$, respectively, or else $2rs$, $r^2 - s^2$, and $r^2 + s^2$, respectively. Let x = a/t, y = b/t, and z = c/t. Then
$$x^2 + y^2 = z^2, \qquad (57)$$
where no prime divides any two of these natural numbers, that is, each pair of the numbers x, y, z are relatively prime. For example, if $p \mid x$ and $p \mid z$, then p divides $z^2 - x^2 = y^2$. Thus, by (53.2), $p \mid y$. But this is a contradiction, since x, y, and z have no prime factor in common.
Since x and y are relatively prime, they cannot both be even. Suppose they are both odd. Then we could write x = 1 + 2m, y = 1 + 2n, with m and n nonnegative integers. Consequently
$$x^2 + y^2 = 2 + 4(m + m^2 + n + n^2),$$
so that $z^2 = x^2 + y^2$ is even. Then z is even, say z = 2l, and $z^2 = 4l^2$, so that
$$2l^2 = 1 + 2(m + m^2 + n + n^2).$$
This is clearly impossible. Therefore, one of x or y is even, while the other is odd. Suppose that x is odd and y is even. Then z is odd, so that $z - x$ and $z + x$ are even. That is, $\tfrac12(z - x)$ and $\tfrac12(z + x)$ are integers. Moreover, they are relatively prime, since if a prime p divides $\tfrac12(z - x)$ and $\tfrac12(z + x)$, then p divides $\tfrac12(z + x) + \tfrac12(z - x) = z$ and $\tfrac12(z + x) - \tfrac12(z - x) = x$. But this is impossible, since x and z are relatively prime. By (57),
$$(\tfrac12 y)^2 = \tfrac12(z + x) \cdot \tfrac12(z - x),$$
where $\tfrac12 y$ is a natural number, since y is even. By Theorem 55.3, $\tfrac12(z + x)$ and $\tfrac12(z - x)$ are squares, that is, there exist natural numbers r and s such that
$$\tfrac12(z + x) = r^2, \quad \tfrac12(z - x) = s^2.$$
Consequently, $z = r^2 + s^2$, $x = r^2 - s^2$, and $y = 2rs$.
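(55.5). If a, b, and c are natural numbers which have no common prime factor and satisfy $a^2 + b^2 = c^2$, then (a) c is odd; (b) either a is even and b is odd, or vice versa; (c) if a is even,
$$a = 2rs, \quad b = r^2 - s^2, \quad c = r^2 + s^2,$$
where r and s are relatively prime natural numbers and r > s.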
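The claim that every solution of $a^2 + b^2 = c^2$ in natural numbers has the form $(r^2 - s^2)t$, $2rst$, $(r^2 + s^2)t$, possibly with a and b interchanged, can be checked for small values by comparing the formula against an exhaustive search. The sketch below is illustrative only; the function names and the bound 100 are choices made here.

```python
from math import isqrt


def triples_from_formula(limit):
    """All (a, b, c) with c <= limit produced by the formula, in both orders of a, b."""
    found = set()
    for r in range(2, isqrt(limit) + 1):
        for s in range(1, r):
            base = (r * r - s * s, 2 * r * s, r * r + s * s)
            # t runs over the multiples that keep the hypotenuse within the limit
            for t in range(1, limit // base[2] + 1):
                a, b, c = (t * v for v in base)
                found.add((a, b, c))
                found.add((b, a, c))
    return found


def triples_by_search(limit):
    """All (a, b, c) of natural numbers with a*a + b*b == c*c and c <= limit."""
    found = set()
    for a in range(1, limit):
        for b in range(1, limit):
            c = isqrt(a * a + b * b)
            if c <= limit and c * c == a * a + b * b:
                found.add((a, b, c))
    return found


print(triples_from_formula(100) == triples_by_search(100))  # True
print((5, 12, 13) in triples_from_formula(100))             # True
```

Note that the sketch does not require r and s to be relatively prime; dropping that restriction only produces some triples more than once, which the set ignores.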
Now suppose that the equation $x^4 + y^4 = z^2$ can be satisfied by some natural numbers. Then the set of all natural numbers z for which there exist natural numbers x and y, such that $x^4 + y^4 = z^2$, is not empty. Consequently, by the well-ordering principle, this set contains a smallest number c. Let a and b be corresponding natural numbers such that $a^4 + b^4 = c^2$. We will obtain a contradiction by showing that there is a natural number t smaller than c such that $t^2 = x^4 + y^4$ for some natural numbers x and y. This will show that our original assumption that a solution exists is false.
If a and b had a common prime factor p, then $p^4 \mid c^2$. Thus, by the fundamental theorem of arithmetic, $p^2 \mid c$, and therefore $(a/p)^4 + (b/p)^4 = (c/p^2)^2$. Since $c/p^2 < c$, this contradicts the assumption that c is the smallest of the natural numbers z for which $z^2 = x^4 + y^4$ has a solution. Consequently, a and b are relatively prime. This implies that $a^2$, $b^2$, and c have no common prime factor, so that (55.5) applies to the equation $(a^2)^2 + (b^2)^2 = c^2$. We obtain that either $a^2$ or $b^2$ is even, and assuming that $a^2$ is even, we have
$$a^2 = 2rs, \quad b^2 = r^2 - s^2, \quad c = r^2 + s^2,$$
where r and s are relatively prime natural numbers and r > s. Since r and s are relatively prime, it follows that s, b, and r in the equation $s^2 + b^2 = r^2$ have no prime factor in common. Thus, by (55.5a), applied to the equation $s^2 + b^2 = r^2$, r is odd. However, $a^2$ is even, so that a is even. Thus, 4 divides $a^2 = 2rs$, and consequently $2 \mid rs$. Since r is odd, s must be even, and we have
$$(a/2)^2 = r \cdot (s/2).$$
Since (r, s) = 1, it follows that r and s/2 are relatively prime. Consequently, by Theorem 55.3, the equation $(a/2)^2 = r(s/2)$ implies that r and s/2 are squares:
$$r = t^2, \quad s/2 = u^2.$$
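We now apply (55.5) to the equation $s^2 + b^2 = r^2$ again. Since s is even,
$$s = 2vw, \quad b = v^2 - w^2, \quad r = v^2 + w^2,$$
where v and w are relatively prime natural numbers. Since $vw = s/2$ is a square, it follows from Theorem 55.3 that
$$v = x^2, \quad w = y^2$$
for some natural numbers x and y. Combining these equalities with the equations $r = t^2$ and $r = v^2 + w^2$, we obtain
$$t^2 = x^4 + y^4.$$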
Moreover, $t \le t^2 = r \le r^2 < r^2 + s^2 = c$. Thus, we have arrived at the promised contradiction, and proved the following result.
THEOREM 55.6. There are no natural numbers a, b, and c which satisfy $a^4 + b^4 = c^2$. In particular, the equation $x^4 + y^4 = z^4$ has no solution in natural numbers.
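A computer search cannot prove a statement of this kind, but it can illustrate it. The following sketch looks for natural numbers x and y up to a bound for which $x^4 + y^4$ is a perfect square; the function name and the bound of 300 are arbitrary choices made for this illustration.

```python
from math import isqrt


def fourth_power_sums_that_are_squares(bound):
    """Return all (x, y, z) with 1 <= x <= y <= bound and x**4 + y**4 == z**2."""
    hits = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            total = x ** 4 + y ** 4
            z = isqrt(total)        # exact integer square root, no rounding error
            if z * z == total:
                hits.append((x, y, z))
    return hits


print(fourth_power_sums_that_are_squares(300))  # [] as the theorem predicts
```

The empty result for this range is consistent with the theorem; the proof by descent is what rules out solutions beyond any search bound.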
The reader should reexamine the proof of this theorem, noting the following aspects of it.
(1) The main step of the proof is to show that the existence of one triple $(x_1, y_1, z_1)$ of natural numbers satisfying $x_1^4 + y_1^4 = z_1^2$ leads to another triple $(x_2, y_2, z_2)$ satisfying $x_2^4 + y_2^4 = z_2^2$ with $z_2 < z_1$. Repeating the argument would lead to a sequence of triples $(x_n, y_n, z_n)$, n = 1, 2, 3, ..., with $x_n^4 + y_n^4 = z_n^2$ and $z_1 > z_2 > z_3 > \cdots$. This sequence of inequalities is impossible by the well-ordering principle, and therefore proves that the existence of the original triple $(x_1, y_1, z_1)$ is impossible. (Actually, it was convenient for our argument to use the well-ordering principle at the beginning of the proof.) This technique of proof is common in number theory. It is called the "method of infinite descent." The reader may recall that this method was used to establish the Euclidean algorithm in Section 52.
(2) The main step of the proof is carried out by two applications of (55.5) and two applications of Theorem 55.3. Remembering this observation and the general method of proof, the reader should be able to reconstruct the argument without the help of the book.
(3) The method of proof which we used would not suffice to show directly that the equation $x^4 + y^4 = z^4$ has no solutions in N. The generalization to $x^4 + y^4 = z^2$ is essential to the success of our proof. This is another instance of the situation discussed in Section 24, where induction fails in the proof of a certain theorem, but is successful in proving a stronger result. In the case of Theorem 55.6, induction occurs as an application of the well-ordering principle.
(4) It is an immediate consequence of Theorem 55.6 that no equation of the form $x^{4l} + y^{4m} = z^{2n}$ has a solution x = a, y = b, z = c, with $a \in N$, $b \in N$, $c \in N$. Indeed, if such a solution exists, then $a^l$, $b^m$, $c^n$ is a solution of $x^4 + y^4 = z^2$. In particular, if 4 divides n, then the Fermat equation $x^n + y^n = z^n$ has no solution in N.
1. Using the Gödel numbering of the English language which was defined in this section, find the Gödel numbers (in factored form) of the following expressions. (a) ALGEBRA (b) U.S.A. (c) DON'T GIVE UP THE SHIP!
2. Give the proof of Theorem 55.2.
3. Let a and b be any natural numbers. Show that integers r and s exist such that rs = (a, b), (a/r, b/s) = 1.
4. Let a be a natural number. Let $s^2$ be the largest square dividing a. Show that if $d^2$ is a square dividing a, then $d \mid s$.
5. Suppose that (a, b) = 1, (c, d) = 1, and ab = cd. Show that integers r, s, t, and u exist, each pair of which are relatively prime, such that a = rs, b = tu, c = rt, d = su.
< p,
a2
is given by
+ 2b2 = c2
b
=
=
a
[or a = Zrst, b numbers.
= =
Zrst,
c =
(r2
+ 2s2)t
S,
(r2
+ 2s2)t],where r,
56 Congruences. Many interesting problems and numerous theoretical questions in number theory are concerned with properties of the remainder obtained by dividing an integer by a fixed natural number m. For example, if the first of July falls on Sunday, then what will be the day of the week on which the first of September falls? Since July and August each have 31 days, the answer is that the first of September falls r days after Sunday, where r is the remainder obtained on dividing 31 + 31 by 7, namely, r = 6, and the day is Saturday. Another example is the following problem: a certain chemical reaction requires 100 hours; if it is desirable to complete the reaction at 8:00 A.M., at what time of day should it be started? The answer is r hours before 8:00 A.M., where r is the remainder obtained on dividing 100 by 24, that is, 4:00 A.M. A property of remainders which was needed for the solution of Problem 9, Section 54, is the fact that if the natural number a leaves the remainder 1 on division by 3, then the same is true for every power of a. The study of many such problems involving remainders is simplified by the systematic use of a concept which was introduced by the great German mathematician Carl Friedrich Gauss (1777-1855).
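The three remainder problems above translate directly into the remainder operator. The sketch below is illustrative; the helper names `weekday_after` and `start_hour` are invented here, and hours are counted on a 24-hour clock.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def weekday_after(start_day, days_elapsed):
    """Day of the week that falls days_elapsed days after start_day."""
    return DAYS[(DAYS.index(start_day) + days_elapsed) % 7]


def start_hour(finish_hour, duration_hours):
    """Hour of day (0 to 23) at which to start in order to finish at finish_hour."""
    return (finish_hour - duration_hours) % 24


print((31 + 31) % 7)                      # 6
print(weekday_after("Sunday", 31 + 31))   # Saturday
print(100 % 24, start_hour(8, 100))       # 4 4

# Every power of a number leaving remainder 1 on division by 3 leaves remainder 1.
print(all(pow(a, k, 3) == 1 for a in range(1, 100, 3) for k in range(1, 20)))  # True
```

The three-argument form `pow(a, k, 3)` computes the remainder of $a^k$ on division by 3 without forming the large power first.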
DEFINITION 56.1. Let m be a natural number. An integer a is congruent modulo m to an integer b if $a - b$ is divisible by m in the ring of integers.
It is customary to write
$$a \equiv b \pmod{m}$$
to indicate that a is congruent to b modulo m. The relation $a \equiv b \pmod{m}$ is called a congruence, and m is called the modulus of the congruence. By the definition of congruence, every pair a, b of integers are congruent modulo 1. Thus, congruence with the modulus 1 is not very interesting. Congruence modulo 2 has a familiar meaning: $a \equiv b \pmod{2}$ if and only if a and b have the same parity; that is, either a and b are both even, or they are both odd.
The connection between the remainders on division by m and congruence with the modulus m is seen from the following fact.
THEOREM 56.2. Let m be a natural number. Then each integer is congruent modulo m to one and only one of the numbers 0, 1, 2, ..., m - 1.
This theorem is an immediate consequence of the division algorithm. If a is any integer and m is a natural number, then there are unique integers q and r, with $0 \le r < m$, such that a = qm + r. Thus, there is a unique number r among the numbers 0, 1, 2, ..., m - 1 such that $a - r$ is divisible by m. By Definition 56.1, this means that a is congruent modulo m to one and only one of the numbers 0, 1, 2, ..., m - 1.
It is clear that an integer a is divisible by a natural number m if and only if $a \equiv 0 \pmod{m}$. Moreover, $a \equiv b \pmod{m}$ is equivalent to the
statement that $a - b \equiv 0 \pmod{m}$. Thus, the notion of congruence is apparently only a variation of the concept of divisibility. It is therefore surprising that this notion is so useful. The usefulness is partly explained by the fact that congruence has many of the familiar properties of ordinary equality, so that manipulations with congruences are similar to the computations of elementary algebra.
THEOREM 56.3. Let m be a natural number and let a, b, c, and d be integers. Then
(a) $a \equiv a \pmod{m}$;
(b) if $a \equiv b \pmod{m}$, then $b \equiv a \pmod{m}$;
(c) if $a \equiv b \pmod{m}$ and $b \equiv c \pmod{m}$, then $a \equiv c \pmod{m}$;
(d) if $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then $a + c \equiv b + d \pmod{m}$ and $a - c \equiv b - d \pmod{m}$;
(e) if $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then $ac \equiv bd \pmod{m}$;
(f) if $a \equiv b \pmod{m}$, then $ca \equiv cb \pmod{m}$;
(g) if $a \equiv b \pmod{m}$, then $a^n \equiv b^n \pmod{m}$ for any natural number n.
The properties (a), (b), and (c) follow easily from Definition 56.1. To prove (d), suppose that $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$. Then by Definition 56.1, $a - b = km$ and $c - d = lm$ for some integers $k$ and $l$. Thus,
$$(a + c) - (b + d) = (a - b) + (c - d) = (k + l)m$$
and
$$(a - c) - (b - d) = (a - b) - (c - d) = (k - l)m.$$
Therefore, the differences $(a + c) - (b + d)$ and $(a - c) - (b - d)$ are both divisible by $m$. By Definition 56.1, $a + c \equiv b + d \pmod{m}$ and $a - c \equiv b - d \pmod{m}$. The statement (e) is proved similarly. Using the same notation as in the proof of (d), we have
$$ac - bd = (a - b)(c - d) + ad + bc - 2bd = (a - b)(c - d) + d(a - b) + b(c - d) = (km)(lm) + d(km) + b(lm) = (klm + dk + bl)m.$$
Property (f) is an immediate consequence of (e) and the fact that $c \equiv c \pmod{m}$ by (a). Property (g) is obtained by successively applying (e) to the congruence $a \equiv b \pmod{m}$. Using the given congruence twice, (e) implies $a^2 \equiv b^2 \pmod{m}$. Using $a \equiv b \pmod{m}$ and $a^2 \equiv b^2 \pmod{m}$,
(e) gives $a^3 \equiv b^3 \pmod{m}$, and so forth. Of course, this argument can be formalized by induction. The "transitive law," Theorem 56.3(c), and the "reflexive law," Theorem 56.3(a), justify the use of sequences of equalities and congruences. For example,
$$a \equiv b = c \equiv d = e \pmod{m}$$
is a convenient abbreviation for $a \equiv b \pmod{m}$, $b = c$, $c \equiv d \pmod{m}$, and $d = e$. By (a) and (c), the congruences obtained by omitting one or more quantities from this sequence are valid: $a \equiv c \pmod{m}$, $a \equiv d \pmod{m}$, $a \equiv e \pmod{m}$, $b \equiv d \pmod{m}$, $b \equiv e \pmod{m}$, and $c \equiv e \pmod{m}$.
It is a consequence of Theorem 56.3 that in a congruence with modulus $m$ which involves sums and differences of products, any integer in the congruence can be replaced by any other integer to which it is congruent modulo $m$. For example, if $ab^3 - 2abc \equiv 5d^2 \pmod{m}$, and if $b \equiv e \pmod{m}$, then $ae^3 - 2aec \equiv 5d^2 \pmod{m}$. In fact by (g), $b^3 \equiv e^3 \pmod{m}$. Using (f), we obtain $ab^3 \equiv ae^3 \pmod{m}$. Similarly, $2abc \equiv 2aec \pmod{m}$. By (d), $ab^3 - 2abc \equiv ae^3 - 2aec \pmod{m}$. Finally, employing (c), we find $ae^3 - 2aec \equiv 5d^2 \pmod{m}$.
Even the simple properties of congruence given in Theorem 56.3 have useful applications.
  

by 7. By Theorem 56.3,
110+ 210 +
s..
mhere this sum contains 14 occurrences of the blocks 11 01 (since 14 7 = 98). Thus,
+ 21 + + 61 +
+ 21 + 31 +
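EXAMPLE 2. A well-known property of natural numbers written in decimal notation is that such a number is divisible by 9 if and only if the sum of its digits is divisible by 9. The basis for this useful fact is the observation that if
$$n = r_k 10^k + r_{k-1} 10^{k-1} + \cdots + r_1 10 + r_0,$$
where $0 \le r_i < 10$, then since $10 \equiv 1 \pmod{9}$,
$$n \equiv r_k + r_{k-1} + \cdots + r_1 + r_0 \pmod{9}.$$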
That is, any natural number is congruent modulo 9 to the sum of its digits. In particular, $n$ is divisible by 9 if and only if the sum of the digits of $n$ is divisible by 9. Note that the process of adding digits can be repeated to obtain the remainder on division of a number by 9. For instance,
One of the simplest and most familiar methods of checking the addition of a column of numbers is based on this observation. The process is called "casting out nines." It consists of summing the digits of each number in the column, adding these sums, and comparing the result with the number which is obtained by summing the digits of the number which is supposed to be the sum of the given numbers. If the two numbers being compared are not congruent modulo 9, then there is an error. For example:
$$\begin{array}{rl}
2165 & 2 + 1 + 6 + 5 = 14 \\
3082 & 3 + 0 + 8 + 2 = 13 \\
7165 & 7 + 1 + 6 + 5 = 19 \\
11011 & 1 + 1 + 0 + 1 + 1 = 4 \\
35171 & 3 + 5 + 1 + 7 + 1 = 17 \\
1022 & 1 + 0 + 2 + 2 = 5 \\
\hline
59616 & 72, \quad 7 + 2 \equiv 0 \pmod{9} \\
5 + 9 + 6 + 1 + 6 = 27 & 2 + 7 \equiv 0 \pmod{9}
\end{array}$$
Of course, this check is not infallible, but it is easy to apply. It is left to the reader to show that this method can also be used to check multiplication.
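The check described above can be sketched in a few lines of Python. This is only an illustration under our own naming (`digit_sum` and `casting_out_nines_ok` are hypothetical helper names, not part of the text); it uses the column of numbers from the example and also shows why the check is "not infallible."

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a nonnegative integer."""
    return sum(int(ch) for ch in str(n))


def casting_out_nines_ok(addends, claimed_total) -> bool:
    """Return False when the check proves an error.

    True only means that no error was detected: the digit sums of the
    addends and of the claimed total agree modulo 9.
    """
    left = sum(digit_sum(a) for a in addends)
    return left % 9 == digit_sum(claimed_total) % 9


column = [2165, 3082, 7165, 11011, 35171, 1022]

print(casting_out_nines_ok(column, 59616))  # the total given in the text
print(casting_out_nines_ok(column, 59617))  # an off-by-one total
print(casting_out_nines_ok(column, 59661))  # two digits transposed
```

A transposition of digits leaves the digit sum unchanged, so the last call cannot detect that error; this is one concrete way in which the check can fail.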
The following theorem gives some of the most useful relations between congruences with different moduli.
THEOREM 56.4.
(a) If $a \equiv b \pmod{m}$, then $la \equiv lb \pmod{lm}$.
(b) If $a \equiv b \pmod{m}$ and $l \mid m$, then $a \equiv b \pmod{l}$.
(c) If $a \equiv b \pmod{m}$ and $d$ is a common divisor of $a$, $b$, and $m$, then $a/d \equiv b/d \pmod{m/d}$.
(d) If $ca \equiv cb \pmod{m}$, then $a \equiv b \pmod{m/(c, m)}$.
(e) If $a \equiv b \pmod{m_1}$ and $c \equiv d \pmod{m_2}$, then $a + c \equiv b + d \pmod{(m_1, m_2)}$, $a - c \equiv b - d \pmod{(m_1, m_2)}$, and $ac \equiv bd \pmod{(m_1, m_2)}$.
(f) If $a \equiv b \pmod{m_1}$, $a \equiv b \pmod{m_2}$, $\ldots$, $a \equiv b \pmod{m_k}$, then $a \equiv b \pmod{n}$, where $n$ is the least common multiple of $m_1, m_2, \ldots, m_k$.
The reader will find that (a), (b), and (c) are straightforward consequences of Definition 56.1. Property (d) is the cancellation law for congruences. To prove (d), we first note that since $(c, m)$ is a common divisor of $ca$, $cb$, and $m$, by (c) we have $ca/(c, m) \equiv cb/(c, m) \pmod{m/(c, m)}$. This means that $m/(c, m)$ divides $[c/(c, m)](a - b)$. But $m/(c, m)$ and $c/(c, m)$ are relatively prime. Therefore, by Theorem 52.6, $m/(c, m)$ divides $(a - b)$. That is, $a \equiv b \pmod{m/(c, m)}$, proving (d).
In order to prove (e), we observe that $a \equiv b \pmod{(m_1, m_2)}$ and $c \equiv d \pmod{(m_1, m_2)}$ by (b), since $(m_1, m_2) \mid m_1$ and $(m_1, m_2) \mid m_2$. The conclusion follows from Theorem 56.3 (d) and (e).
By Definition 56.1, the hypothesis of (f) is equivalent to the statement that $m_1 \mid (a - b)$, $m_2 \mid (a - b)$, $\ldots$, and $m_k \mid (a - b)$. Therefore, by (53.610) the least common multiple $n$ of $m_1, m_2, \ldots, m_k$ divides $a - b$. In other words, $a \equiv b \pmod{n}$.
1. Show that every integer is congruent modulo 7 to exactly one of the following numbers: 291, 7, 54, 31, 36, 20, 765.
2. Find the remainders on dividing $3^{60}$ by 7, 15, and 31. [Hint: Write $60 = 2^5 + 2^4 + 2^3 + 2^2$, so that $3^{60} = 3^{32} \cdot 3^{16} \cdot 3^{8} \cdot 3^{4}$.]
4. Prove that if $a \equiv b \pmod{m}$, then $(a, m) = (b, m)$.
5. Show that the method of "casting out nines" can be used to check multiplication of natural numbers.
6. Use the fact that $10 \equiv -1 \pmod{11}$ and $10^2 \equiv 1 \pmod{11}$ to discover a rule for divisibility of a natural number (written in decimal notation) by 11.
7. Discover a method of "casting out sixes" as a check for addition and multiplication for natural numbers written in the base 7 notation.
8. Prove (a), (b), and (c) of Theorem 56.4.
9. Find the remainder obtained for the following divisions. 10805 divided by 14. (a) l5 z5 (b) 1 2! 3! 4! (10l0)! divided by 24. (c) (1) (i) : (E)divided by 7.
+ + + + + + + + + (O) + + + +
57 Linear congruences. The linear equation ax = b, where a and b are integers, has a solution which is an integer if and only if a divides b. In fact, this statement is just the definition of divisibility. This section is concerned with the analogous problem of solving the linear congruence
$$ax \equiv b \pmod{m}.$$
EXAMPLE 1. A synodic month (the period of time between two consecutive appearances of a full moon) is approximately $29\tfrac{1}{2}$ days. If a full moon occurs at a certain time on Monday evening, how many synodic months later will the full moon occur at approximately the same time on Wednesday evening? If we measure time in terms of half days, the synodic month is 59 half days in length and a week is 14 half days long. After $x$ synodic months beyond the occurrence of the full moon, $59x$ half days have elapsed, and a full moon occurs again. If we divide $59x$ by 14 obtaining a remainder $r$, then this full moon occurs $r$ half days after Monday evening. Since Wednesday is 4 half days after Monday, the $x$ which solves our problem is the smallest positive integral solution of the congruence $59x \equiv 4 \pmod{14}$.
Since 59
To understand more clearly the nature of the solutions of linear congruences, let us examine a particular example. Consider the congruence
$$3x \equiv 2 \pmod{5}.$$
Substituting $x = 1, 2, 3, \ldots, 20$, we find that among these numbers only $x = 4$, $x = 9$, $x = 14$, and $x = 19$ satisfy the congruence. Note that these numbers are all congruent modulo 5. This suggests that all integers $x$ which are congruent to 4 modulo 5, and only these numbers, are solutions of $3x \equiv 2 \pmod{5}$. By checking more values of $x$ we could gather additional evidence for our guess. However, this is unnecessary, since the conjecture is easy to prove. First, if $x \equiv 4 \pmod{5}$, then by Theorem 56.3, $3x \equiv 3 \cdot 4 = 12 \equiv 2 \pmod{5}$. Thus every such $x$ is a solution. Next we must show that these are the only solutions. If $x$ is any solution, then
$$3x \equiv 2 \equiv 12 = 3 \cdot 4 \pmod{5}.$$
Since 3 is relatively prime to 5, it follows from Theorem 56.4(d) that $x \equiv 4 \pmod{5}$. The result that $x$ satisfies $3x \equiv 2 \pmod{5}$ if and only if $x \equiv 4 \pmod{5}$ provides a complete solution of the linear congruence $3x \equiv 2 \pmod{5}$.
In order to describe the solutions of linear congruences in general, we introduce a new concept. By Theorem 56.2, every integer is congruent modulo $m$ to one and only one of the numbers $0, 1, 2, \ldots, m - 1$. Thus the set $Z$ of all integers is divided into disjoint subsets $X_0, X_1, X_2, \ldots, X_{m-1}$, where
$$X_r = \{x \in Z \mid x \equiv r \pmod{m}\}.$$
The sets $X_0, X_1, X_2, \ldots, X_{m-1}$ are called congruence classes modulo $m$ (or residue classes modulo $m$). For example, if $m = 2$, $X_0$ is the set of all even integers and $X_1$ is the set of all odd integers. If $m = 4$, $X_0 = \{4k \mid k \in Z\}$, $X_1 = \{4k + 1 \mid k \in Z\}$, $X_2 = \{4k + 2 \mid k \in Z\}$, and $X_3 = \{4k + 3 \mid k \in Z\}$.
If the integers $x$ and $y$ are in the same congruence class $X_r$ modulo $m$, then $x \equiv r \pmod{m}$ and $y \equiv r \pmod{m}$. Therefore, by Theorem 56.3(b) and (c), $x \equiv y \pmod{m}$. Conversely, if $x \equiv y \pmod{m}$ and $y \in X_r$, then $x \equiv y \equiv r \pmod{m}$, so that $x \in X_r$. Thus, two integers $x$ and $y$ are in the same congruence class modulo $m$ if and only if $x \equiv y \pmod{m}$.
As in the example discussed above, if $x$ is a solution of the congruence
$$ax \equiv b \pmod{m},$$
and if $y \equiv x \pmod{m}$, then by Theorem 56.3,
$$ay \equiv ax \equiv b \pmod{m}.$$
Thus, if $x$ is a solution of $ax \equiv b \pmod{m}$, then every member of the congruence class which contains $x$ is also a solution of the congruence. In the example $3x \equiv 2 \pmod{5}$, every element of the congruence class $X_4$ is a solution of the congruence. In fact, the solutions of $3x \equiv 2 \pmod{5}$ are exactly the integers which belong to $X_4$. However, it may happen that a linear congruence modulo $m$ has solutions belonging to more than one congruence class modulo $m$. For example,
$$2x \equiv 6 \pmod{12}$$
has the solutions 3 and 9, and therefore every element in either of the two congruence classes $X_3$ and $X_9$ is a solution of this congruence. On the other hand, some linear congruences have no solutions. For instance,
$$2x \equiv 1 \pmod{6}$$
cannot be satisfied by any integer $x$, since $2x - 1$ is always odd and therefore not divisible by 6. These remarks suggest that a linear congruence $ax \equiv b \pmod{m}$ is effectively solved if we obtain a representative set of solutions
$$\{r_1, r_2, \ldots, r_k\},$$
where $0 \le r_i \le m - 1$ and $r_i \ne r_j$ for $i \ne j$ (which implies that the $r_i$ belong to different congruence classes), such that every solution of $ax \equiv b \pmod{m}$ is a member of the congruence class of some $r_i$. If $ax \equiv b \pmod{m}$ has such a representative set of solutions $\{r_1, r_2, \ldots, r_k\}$, then this congruence is said to have exactly $k$ incongruent solutions modulo $m$. In particular, if $k = 1$, that is, all solutions belong to the same congruence class, then we say that the congruence has a unique solution modulo $m$. This is the case for the congruence $3x \equiv 2 \pmod{5}$.
If $m$ is not very large, it is possible to obtain the representative set of solutions by testing each of the numbers $0, 1, 2, \ldots, m - 1$ to see which of them satisfy $ax \equiv b \pmod{m}$. However, this procedure is impractical for large values of $m$. Fortunately, it is possible to prove general theorems which give a complete solution for any linear congruence.
THEOREM 57.1. If $(a, m) = 1$, the congruence $ax \equiv b \pmod{m}$ has a unique solution modulo $m$.
Proof. By Theorem 52.2(a), there exist integers $u$ and $v$ such that $ua + vm = 1$. Multiplying by $b$, we obtain
$$bua + bvm = b,$$
or
$$a(bu) - b = (-bv)m.$$
By Definition 56.1, $a(bu) \equiv b \pmod{m}$, so that $x = bu$ is a solution of the given congruence. Suppose that $r$ is any solution of $ax \equiv b \pmod{m}$. Then $ar \equiv b \equiv a(bu) \pmod{m}$. Since $(a, m) = 1$, we can use the cancellation law for congruences, Theorem 56.4(d), to cancel $a$ and obtain
$$r \equiv bu \pmod{m}.$$
Thus, any solution of the given congruence is congruent modulo $m$ to the solution $x = bu$, so that $ax \equiv b \pmod{m}$ has a unique solution modulo $m$. Note that $u$ can be found by using the Euclidean algorithm, explained in Section 52.
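The proof is constructive, and it may help to see it carried out mechanically. The following Python sketch assumes $m > 0$ and uses an iterative form of the extended Euclidean algorithm to find $u$ with $ua + vm = 1$; the function names `extended_gcd` and `solve_coprime` are our own illustrative choices, not notation from the text.

```python
def extended_gcd(a: int, m: int):
    """Return (g, u, v) with g = gcd(a, m) and u*a + v*m == g (m > 0 assumed)."""
    old_r, r = a, m
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def solve_coprime(a: int, b: int, m: int) -> int:
    """Least nonnegative x with a*x ≡ b (mod m), assuming (a, m) = 1."""
    g, u, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a and m must be relatively prime")
    # x = b*u is the solution constructed in the proof; reduce it modulo m.
    return (b * u) % m


print(solve_coprime(3, 2, 5))    # the congruence 3x ≡ 2 (mod 5)
print(solve_coprime(59, 4, 14))  # the synodic-month congruence 59x ≡ 4 (mod 14)
```

Reducing $bu$ modulo $m$ picks out the representative in $\{0, 1, \ldots, m - 1\}$ of the unique congruence class of solutions.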
Now consider the general linear congruence.
THEOREM 57.2. The congruence $ax \equiv b \pmod{m}$ has a solution if and only if $(a, m)$ divides $b$. If $(a, m)$ divides $b$, the congruence has exactly $(a, m)$ incongruent solutions modulo $m$.
Proof. If the congruence $ax \equiv b \pmod{m}$ has a solution $r$, then $ar - b = lm$, or $ar - lm = b$, for some integer $l$. Since $(a, m)$ is a common divisor of $a$ and $m$, $(a, m)$ divides $b$. Conversely, if $(a, m)$ divides $b$, then we can consider the congruence
$$\frac{a}{(a, m)}\, x \equiv \frac{b}{(a, m)} \pmod{\frac{m}{(a, m)}}. \qquad (58)$$
Since $a/(a, m)$ and $m/(a, m)$ are relatively prime, (58) has a solution $s$ by Theorem 57.1. Then $as \equiv b \pmod{m}$ by Theorem 56.4(a), so that $s$ is a solution of $ax \equiv b \pmod{m}$. In fact, any solution of (58) is a solution of the given congruence.
If the condition $(a, m) \mid b$ is satisfied, then the congruence (58) has a solution $s$ satisfying $0 \le s < m/(a, m)$. Define for $j = 0, 1, \ldots, (a, m) - 1$,
$$s_j = s + j\,\frac{m}{(a, m)}. \qquad (59)$$
Then $s_j \equiv s \pmod{m/(a, m)}$, so that $s_j$ is a solution of (58), and therefore of $ax \equiv b \pmod{m}$. Moreover, since
$$0 \le s_0 < s_1 < \cdots < s_{(a, m) - 1} < m,$$
the numbers $s_j$ belong to different congruence classes modulo $m$. It remains to show that
$$\{s_0, s_1, \ldots, s_{(a, m) - 1}\} \qquad (510)$$
is a representative set of solutions of the congruence $ax \equiv b \pmod{m}$. That is, every integer $t$ satisfying $at \equiv b \pmod{m}$ is congruent modulo $m$ to $s_r$ for some $r$. By Theorem 56.4(c), $at \equiv b \pmod{m}$ implies that $t$ is a solution of (58). Therefore, $t \equiv s \pmod{m/(a, m)}$ by Theorem 57.1. That is, $t = s + l[m/(a, m)]$ for some integer $l$. By the division algorithm, $l = q(a, m) + r$, where $0 \le r < (a, m)$. Hence
$$t = s + \frac{rm}{(a, m)} + qm = s_r + qm \equiv s_r \pmod{m}.$$
EXAMPLE 2. Solve the congruence $15x \equiv 20 \pmod{35}$. Since $(15, 35) = 5$, and $5 \mid 20$, the congruence has 5 solutions which are incongruent modulo 35. These are obtained by first solving $3x \equiv 4 \pmod{7}$. We find the solution $x = 6$. Then a representative set of solutions of $15x \equiv 20 \pmod{35}$ is obtained from (59) and (510). These are
$$6, \quad 13, \quad 20, \quad 27, \quad 34.$$
For $(a, m) = 1$, the congruence $ax \equiv b \pmod{m}$ can be solved as in Theorem 57.1 using the Euclidean algorithm. This is probably the best method if the numbers $a$ and $b$ are large. If these numbers are small, the congruence can often be solved more easily by trial, or by using the properties of congruences given in Theorem 56.3.
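The procedure in the proof of Theorem 57.2 (divide through by $(a, m)$, solve the reduced congruence, then step by $m/(a, m)$) can be sketched as follows. The function name `solve_linear_congruence` is hypothetical, $m > 0$ is assumed, and the modular inverse is obtained with Python's built-in `pow(x, -1, n)`, which requires Python 3.8 or later; the text itself finds the inverse with the Euclidean algorithm.

```python
from math import gcd


def solve_linear_congruence(a: int, b: int, m: int) -> list:
    """Representative set of solutions of a*x ≡ b (mod m), each in range(m).

    Returns an empty list when (a, m) does not divide b.
    """
    g = gcd(a, m)
    if b % g != 0:
        return []
    a1, b1, m1 = a // g, b // g, m // g
    if m1 == 1:
        s = 0  # every integer solves a congruence modulo 1
    else:
        # Unique solution of the reduced congruence a1*x ≡ b1 (mod m1).
        s = (b1 * pow(a1, -1, m1)) % m1
    # The g incongruent solutions s, s + m1, s + 2*m1, ...
    return [s + j * m1 for j in range(g)]


print(solve_linear_congruence(15, 20, 35))
print(solve_linear_congruence(2, 6, 12))
print(solve_linear_congruence(2, 1, 6))
```

The three calls correspond to Example 2 and to the two congruences $2x \equiv 6 \pmod{12}$ and $2x \equiv 1 \pmod{6}$ discussed earlier in this section.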
EXAMPLE 3. Solve $3x \equiv 4 \pmod{7}$. If $x$ is a solution, then $6x \equiv 8 \pmod{7}$, so that $-x \equiv 1 \pmod{7}$, and $x \equiv -1 \equiv 6 \pmod{7}$. Suppose that we wish to solve $5x \equiv 9 \pmod{13}$. If $x$ is a solution, then $18x \equiv 9 \pmod{13}$, and therefore $2x \equiv 1 \pmod{13}$; consequently, $14x \equiv 7 \pmod{13}$, and $x \equiv 7 \pmod{13}$. As a final example, if $x$ is a solution of $5x \equiv 11 \pmod{17}$, then $5x \equiv 45 \pmod{17}$; hence $x \equiv 9 \pmod{17}$.
There is an important application of Theorem 57.1 to the construction of sets of orthogonal Latin squares. A Latin square of side $m$ is an arrangement of $m$ distinct symbols in $m^2$ subsquares of a square, in such a way that every row and every column contains each symbol exactly once. It is immaterial what symbols are used, but it is convenient to let them be the number symbols $0, 1, 2, \ldots, m - 1$. As an example,
is a Latin square of side 3. Two Latin squares of the same size are called orthogonal if, when one is superposed on the other, every ordered pair of symbols occurs exactly once in the resulting square. For instance
are orthogonal Latin squares, since when one is superposed on the other we have
For centuries, amateur and professional mathematicians have found Latin squares interesting. In recent years the study of pairs of orthogonal Latin squares has taken a serious turn, because of the discovery that such pairs have important applications in algebra, geometry, and applied statistics.
Let $p$ be a prime. For any integer $a$, let $r(a)$ be the remainder on dividing $a$ by $p$. That is, $0 \le r(a) < p$, and $a \equiv r(a) \pmod{p}$. For $0 < k < p$, define a Latin square (which we designate by $L_k$) as in Table 51. In other words, the number in the $i$th row and $j$th column of $L_k$ is
$$r((j - 1) + (i - 1)k).$$
To show that $L_k$ is a Latin square, it is necessary to prove that if $0 \le b < p$, then $b$ occurs in every row of $L_k$ and in every column of $L_k$. Consider the $i$th row. Then $b - (i - 1)k \equiv c \pmod{p}$ for some $c$ satisfying $0 \le c \le p - 1$. Let $j = c + 1$. Then
$$(j - 1) + (i - 1)k = c + (i - 1)k \equiv b \pmod{p}.$$
Thus,
$$r((j - 1) + (i - 1)k) = b.$$
Therefore, $b$ occurs as the $j$th entry of the $i$th row. Now examine the $j$th column. Since $0 < k < p$ and $p$ is a prime, it follows that $(k, p) = 1$. Hence by Theorem 57.1, there is an integer $d$ such that $kd \equiv b - (j - 1) \pmod{p}$. We can select $d$ so that $0 \le d \le p - 1$. Define $i = d + 1$. Then
$$(j - 1) + (i - 1)k = (j - 1) + dk \equiv b \pmod{p}.$$
Consequently, $r((j - 1) + (i - 1)k) = b$, so that $b$ is the $i$th entry of the $j$th column in $L_k$. Thus, we have shown that each row and column contains each of the numbers $0, 1, \ldots, p - 1$ at least once. Since there are only $p$ entries in each row and column, it follows that the rows and columns cannot contain these symbols more than once. Therefore, $L_k$ is a Latin square.
We wish to show now that if $0 < k < k' < p$, then the squares $L_k$ and $L_{k'}$ are orthogonal. For this, we have to prove that if $0 \le a \le p - 1$ and $0 \le b \le p - 1$, there are natural numbers $i$ and $j$ such that $1 \le i \le p$, $1 \le j \le p$, and $r((j - 1) + (i - 1)k) = a$, $r((j - 1) + (i - 1)k') = b$. This is clearly equivalent to the problem of solving the congruences
$$(j - 1) + (i - 1)k \equiv a \pmod{p}, \qquad (j - 1) + (i - 1)k' \equiv b \pmod{p}.$$
Subtracting the first of these congruences from the second leads us to consider the congruence
$$(i - 1)(k' - k) \equiv b - a \pmod{p}.$$
Since $0 < k' - k < p$ and $p$ is a prime, it follows from Theorem 57.1 that this congruence has a solution $i$ such that $1 \le i \le p$. Choose $j$ so that $1 \le j \le p$ and
$$j - 1 \equiv a - (i - 1)k \pmod{p}.$$
Then by construction, $j - 1$ and $i - 1$ satisfy the congruence $(j - 1) + (i - 1)k \equiv a \pmod{p}$. However, these values of $j - 1$ and $i - 1$ also satisfy the congruence $(j - 1) + (i - 1)k' \equiv b \pmod{p}$. In fact,
$$(j - 1) + (i - 1)k' = (j - 1) + (i - 1)k + (i - 1)(k' - k) \equiv a + (b - a) = b \pmod{p}.$$
This proves that $L_k$ and $L_{k'}$ are orthogonal. Note that we have constructed a set of $p - 1$ Latin squares, each pair of which is orthogonal.
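The construction of the squares $L_k$ is easy to experiment with. The Python sketch below builds the squares for a prime $p$ using the entry formula $r((j - 1) + (i - 1)k)$ and verifies the two properties proved above; rows and columns are indexed from 0 in the code, so `i` and `j` there stand for $i - 1$ and $j - 1$. The function names are our own illustrative choices.

```python
from itertools import combinations, product


def latin_square(k: int, p: int) -> list:
    """The square L_k: entry in (0-based) row i, column j is (j + i*k) mod p."""
    return [[(j + i * k) % p for j in range(p)] for i in range(p)]


def is_latin(square: list) -> bool:
    """Every row and every column contains each symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(s1: list, s2: list) -> bool:
    """Superposing the squares yields every ordered pair exactly once."""
    n = len(s1)
    pairs = {(s1[i][j], s2[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


p = 5
squares = [latin_square(k, p) for k in range(1, p)]
print(all(is_latin(sq) for sq in squares))
print(all(are_orthogonal(s, t) for s, t in combinations(squares, 2)))
```

Trying the same formula with a composite side, for instance `latin_square(2, 4)`, shows why the primality of $p$ matters: that array repeats symbols within a column.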
Many problems in number theory require the simultaneous solution of systems of congruences. We will prove a famous and important theorem about such congruences. This result was known to Chinese mathematicians as early as 250 A.D., and for this reason it is usually called the Chinese remainder theorem.
THEOREM 57.3. Let $m_1, m_2, \ldots, m_k$ be natural numbers such that $(m_i, m_j) = 1$ if $i \ne j$. Then if $b_1, b_2, \ldots, b_k$ are any integers, there exists an integer $x$ such that
$$x \equiv b_1 \pmod{m_1}, \quad x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad x \equiv b_k \pmod{m_k}.$$
Moreover, $x$ is unique modulo $m_1 m_2 \cdots m_k$.
Proof. Let $n_i = m_1 m_2 \cdots m_{i-1} m_{i+1} \cdots m_k$. Then $(n_i, m_i) = 1$ since $(m_i, m_j) = 1$ if $i \ne j$. Consequently, by Theorem 57.1, there is an integer $t_i$ such that $n_i t_i \equiv b_i \pmod{m_i}$. Let
$$x = n_1 t_1 + n_2 t_2 + \cdots + n_k t_k.$$
Then $x \equiv n_i t_i \equiv b_i \pmod{m_i}$, since if $j \ne i$, then $m_i \mid n_j$, and consequently $n_j t_j \equiv 0 \pmod{m_i}$. Thus, $x$ is a simultaneous solution of the given system of congruences. If $y$ also satisfies $y \equiv b_1 \pmod{m_1}$, $y \equiv b_2 \pmod{m_2}$, $\ldots$, $y \equiv b_k \pmod{m_k}$, then $x \equiv y \pmod{m_1}$, $x \equiv y \pmod{m_2}$, $\ldots$, $x \equiv y \pmod{m_k}$. Therefore by Theorem 56.4(f), $x \equiv y \pmod{m}$, where $m$ is the least common multiple of $m_1, m_2, \ldots, m_k$. But since these integers have no common prime factors, their least common multiple is $m_1 m_2 \cdots m_k$.
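The proof of the Chinese remainder theorem is again a recipe: form each $n_i$, solve $n_i t_i \equiv b_i \pmod{m_i}$, and add up the products $n_i t_i$. A minimal Python sketch of that recipe follows. The name `crt` is our own, the moduli are assumed to be pairwise relatively prime and greater than 1, and `pow(n, -1, m)` and `math.prod` require Python 3.8 or later.

```python
from math import prod


def crt(residues: list, moduli: list) -> int:
    """Least nonnegative x with x ≡ residues[i] (mod moduli[i]) for all i.

    Assumes the moduli are pairwise relatively prime and greater than 1.
    """
    m = prod(moduli)
    x = 0
    for b_i, m_i in zip(residues, moduli):
        n_i = m // m_i                          # product of the other moduli
        t_i = (b_i * pow(n_i, -1, m_i)) % m_i   # solves n_i * t_i ≡ b_i (mod m_i)
        x += n_i * t_i
    return x % m


print(crt([5, 7], [6, 11]))
print(crt([1, 0, 2], [2, 3, 5]))
```

The final reduction modulo $m_1 m_2 \cdots m_k$ reflects the uniqueness part of the argument: any two solutions differ by a multiple of that product.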
1. Give the representative set of solutions for each of the following linear congruences.
(a) $362x \equiv 236 \pmod{24}$
(b) $55x \equiv 5 \pmod{31}$
(c) $84x \equiv 96 \pmod{7}$
(d) $36x \equiv 6 \pmod{21}$
(e) $270x \equiv 30 \pmod{150}$
2. Let $p$ be a prime. Show that if $p \nmid a$, then the congruence $ax \equiv b \pmod{p}$ has a unique solution modulo $p$.
3. Find the solutions of the following systems of congruences.
(a) $x \equiv 5 \pmod{6}$, $x \equiv 7 \pmod{11}$
(b) $x \equiv 1 \pmod{2}$, $x \equiv 0 \pmod{3}$, $x \equiv 2 \pmod{5}$
(c) $x \equiv 21 \pmod{29}$, $x \equiv 5 \pmod{30}$, $x \equiv 24 \pmod{31}$
4. Let $a$, $b$, and $c$ be integers and let $m$ be a natural number. Suppose that $(c, m) = 1$. (a) Prove that the congruence $ax \equiv b \pmod{m}$ is equivalent to the congruence $cax \equiv cb \pmod{m}$, that is, every solution of $ax \equiv b \pmod{m}$ is a solution of $cax \equiv cb \pmod{m}$, and conversely. (b) Suppose, in addition, that $(a, m) = 1$. Prove that the congruence $ax \equiv b \pmod{m}$ is equivalent to the congruence $x \equiv b' \pmod{m}$ for some integer $b'$.
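5. Let $m_1, m_2, \ldots, m_k$ be natural numbers such that $(m_i, m_j) = 1$ if $i \ne j$. Let $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$ be integers. Prove that the system of congruences
$$a_1 x \equiv b_1 \pmod{m_1}, \quad a_2 x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad a_k x \equiv b_k \pmod{m_k}$$
has a solution if and only if
$$(a_1, m_1) \mid b_1, \quad (a_2, m_2) \mid b_2, \quad \ldots, \quad (a_k, m_k) \mid b_k.$$
[Hint: Reduce the system of congruences to the form treated in Theorem 57.3 by using Problem 4(b), together with an argument similar to that given in the proof of Theorem 57.2.]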
6. Determine which of the following systems of congruences have a solution, and when solutions exist, find at least one. (a) 5x = 1 (mod 7), 22s 2 (mod 6) 1 (mod 125) (b) 8x = 14 (mod 24), 4x 1000 (mod 91) (c) 3 8 r ~ 3 (mod 12), 50x 75 (mod 125), x (d) 233x 10 (mod 12), 73x 1 (mod 219), 12x 4 (mod 8)
7. A band of 17 thieves stole a large sack of dollar bills. They tried to divide the bills evenly, but had three bills left over. Two of the thieves began to argue about the extra money, so one of them shot the other. The money was redistributed, but this time there were ten bills remaining. Again argument developed, and one more thief was shot. When the money was redistributed, there was none left over. What was the least possible amount of money which could have been stolen originally?


8. Construct a set of 4 Latin squares with 5 rows and 5 columns, such that each pair of the set is orthogonal.
*58 The theorems of Fermat and Euler. One of the oldest and most famous theorems in number theory was discovered by Fermat and communicated to a friend in 1640. The first published proof of Fermat's theorem, due to the Swiss mathematician Leonhard Euler (1707-1783), appeared almost a century later. Subsequently, a more general theorem was found by Euler. In this section we will discuss these classical results and some of their applications.
Fermat's theorem concerns congruences with prime moduli. These are important because many problems concerning congruences with composite modulus can be reduced to questions about congruences with prime modulus. We begin by noting a simple property of the binomial coefficients.
(58.1). If $p$ is a prime and if $i$ is an integer such that $0 < i < p$, then
$$\binom{p}{i} \equiv 0 \pmod{p}.$$
We leave the proof of this fact as an exercise for the reader (see Problem 6, Section 55). If $p$ is a prime, and if $a$ and $b$ are any integers, then by the binomial theorem
$$(a + b)^p = a^p + \binom{p}{1} a^{p-1} b + \cdots + \binom{p}{p-1} a b^{p-1} + b^p \equiv a^p + b^p \pmod{p},$$
since by (58.1), $\binom{p}{i} \equiv 0 \pmod{p}$ if $1 \le i \le p - 1$. Using mathematical induction, this observation can be generalized as follows.
(58.2). If $p$ is a prime, and if $a_1, a_2, \ldots, a_n$ are any integers, then
$$(a_1 + a_2 + \cdots + a_n)^p \equiv a_1^p + a_2^p + \cdots + a_n^p \pmod{p}.$$
Proof. If $n = 1$, then the assertion is that $a_1^p \equiv a_1^p \pmod{p}$, which is clearly valid. Assuming that the result holds for $n$, it follows from the remarks above that
$$(a_1 + a_2 + \cdots + a_n + a_{n+1})^p = [(a_1 + a_2 + \cdots + a_n) + a_{n+1}]^p \equiv (a_1 + a_2 + \cdots + a_n)^p + a_{n+1}^p \equiv a_1^p + a_2^p + \cdots + a_n^p + a_{n+1}^p \pmod{p}.$$
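If we let $a_1 = a_2 = \cdots = a_n = 1$ in (58.2), we obtain $n^p \equiv n \pmod{p}$. Also, if $p$ is odd, and $a_1 = a_2 = \cdots = a_n = -1$, then (58.2) specializes to $(-n)^p \equiv -n \pmod{p}$. This is also true if $p = 2$, since $-n \equiv n \pmod{2}$. Obviously, $0^p \equiv 0 \pmod{p}$. Therefore, we have proved the following theorem.
THEOREM 58.3. If $p$ is a prime, and if $a$ is any integer, then $a^p \equiv a \pmod{p}$.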
Although this theorem is obtained as a special case of (58.2), it is evident that (58.2) can be deduced easily from the theorem. The "little theorem of Fermat" is a slight variation of Theorem 58.3.
THEOREM 58.4. If $p$ is a prime, and if $a$ is any integer which is not divisible by $p$, then $a^{p-1} \equiv 1 \pmod{p}$.
Proof. By Theorem 58.3, $p$ divides $a(a^{p-1} - 1)$. Since $p$ does not divide $a$, it must divide $a^{p-1} - 1$, by (53.2). Hence, $a^{p-1} \equiv 1 \pmod{p}$.
The method by which we have proved Theorem 58.4 is similar to Euler's first proof of this theorem. Some years later, Euler found a different way to prove Fermat's theorem. Using the ideas of this second proof, he was able to establish the more general result known as Euler's theorem. We will prove Euler's theorem by a method which was discovered about 50 years later. This proof is important because it introduces a technique which has many applications in number theory. The following definition is needed.
DEFINITION 58.5. Let $m$ be a natural number. The totient of $m$ is the number of nonnegative integers less than $m$ which are relatively prime to $m$. The totient of $m$ is usually denoted by $\varphi(m)$.
In other words, $\varphi(m)$ is the number of integers $k$ such that $0 \le k < m$ and $(k, m) = 1$. For example, $\varphi(1) = 1$, $\varphi(2) = 1$, $\varphi(3) = 2$, $\varphi(4) = 2$, $\varphi(5) = 4$, and $\varphi(6) = 2$. If $p$ is a prime, then the numbers $1, 2, \ldots, p - 1$ are all prime to $p$, so that $\varphi(p) = p - 1$.
THEOREM 58.6. Euler's theorem. If $m$ is a natural number and $a$ is an integer which is relatively prime to $m$, then
$$a^{\varphi(m)} \equiv 1 \pmod{m}.$$
Proof. To simplify notation, let $t$ denote $\varphi(m)$. According to the definition of $\varphi(m)$, there are exactly $t$ different natural numbers in the set $\{0, 1, 2, \ldots, m - 1\}$ which are relatively prime to $m$. Let these be designated as $k_1, k_2, \ldots, k_t$. Consider the set of integers $ak_1, ak_2, \ldots, ak_t$. By the division algorithm,
$$ak_i = q_i m + r_i, \qquad i = 1, 2, \ldots, t,$$
where $q_i$ is an integer and $0 \le r_i < m$. Thus, $ak_i \equiv r_i \pmod{m}$. The main step of the proof consists of showing that the list of numbers $r_1, r_2, \ldots, r_t$ is just a rearrangement of the sequence $k_1, k_2, \ldots, k_t$. We do this indirectly. First note that each $r_i$ is relatively prime to $m$. In fact, if a prime $p$ divides both $m$ and $r_i$, then $p$ also divides $q_i m + r_i = ak_i$. Thus, either $p \mid a$ or $p \mid k_i$. However, since $p \mid m$ and both $a$ and $k_i$ are relatively prime to $m$, $p$ cannot divide either $a$ or $k_i$. Therefore, $(r_i, m) = 1$. Since $k_1, k_2, \ldots, k_t$ is the list of all integers $k$ such that $0 \le k < m$ and $(k, m) = 1$, and since each $r_i$ satisfies these conditions, it follows that each $r_i$ must be equal to some $k_j$. If we can show that the numbers $r_1, r_2, \ldots, r_t$ are all different, then our proof that $r_1, r_2, \ldots, r_t$ is a rearrangement of $k_1, k_2, \ldots, k_t$ will be complete. Suppose that $r_i = r_j$ for some $i \ne j$. Then subtracting the equations $ak_i = q_i m + r_i$ and $ak_j = q_j m + r_j$ gives
$$a(k_i - k_j) = (q_i - q_j)m.$$
Therefore, $m \mid a(k_i - k_j)$. Since $a$ is prime to $m$, it follows from Theorem 52.6 that $m \mid (k_i - k_j)$. However, $i \ne j$ implies $k_i \ne k_j$, and $0 \le k_i, k_j < m$ yields $-m < k_i - k_j < m$. Hence, $0 < |k_i - k_j| < m$. Therefore, $m$ cannot divide $k_i - k_j$. This contradiction proves that the numbers $r_1, r_2, \ldots, r_t$ are all different, and that the list $r_1, r_2, \ldots, r_t$ is the same (in possibly different order) as $k_1, k_2, \ldots, k_t$. In particular, by the commutative law of multiplication,
$$k_1 k_2 \cdots k_t = r_1 r_2 \cdots r_t \equiv (ak_1)(ak_2) \cdots (ak_t) = a^t\, k_1 k_2 \cdots k_t \pmod{m}.$$
It only remains to observe that the product $k_1 k_2 \cdots k_t$ can be cancelled from each side of this congruence. In fact, no prime factor of $m$ divides any of the integers $k_1, k_2, \ldots, k_t$, since these numbers are relatively prime to $m$. Thus, $m$ has no prime factor in common with the product $k_1 k_2 \cdots k_t$. That is, $(k_1 k_2 \cdots k_t, m) = 1$, and the cancellation is permissible by (56.4d). Thus,
$$1 \equiv a^t = a^{\varphi(m)} \pmod{m}.$$
This proof can be illustrated by carrying it out in a particular numerical case. Let $m = 14$. Then the integers in the range from 0 to 13 which are relatively prime to 14 are 1, 3, 5, 9, 11, and 13. These can be taken as the numbers $k_1, k_2, \ldots, k_t$ in the proof of Theorem 58.6. In this example, with $a = -5$,
$$\begin{array}{l}
(-5) \cdot 1 = (-1)14 + 9, \\
(-5) \cdot 3 = (-2)14 + 13, \\
(-5) \cdot 5 = (-2)14 + 3, \\
(-5) \cdot 9 = (-4)14 + 11, \\
(-5) \cdot 11 = (-4)14 + 1, \\
(-5) \cdot 13 = (-5)14 + 5.
\end{array}$$
Therefore in our special case, the numbers $r_1, r_2, \ldots, r_t$ occurring in the proof of Theorem 58.6 are 9, 13, 3, 11, 1, and 5. This agrees with the general result that $r_1, r_2, \ldots, r_t$ is a rearrangement of $k_1, k_2, \ldots, k_t$. To conclude the illustration, note that
$$1 \cdot 3 \cdot 5 \cdot 9 \cdot 11 \cdot 13 = 9 \cdot 13 \cdot 3 \cdot 11 \cdot 1 \cdot 5 \equiv [(-5)1][(-5)3][(-5)5][(-5)9][(-5)11][(-5)13] = (-5)^6 (1 \cdot 3 \cdot 5 \cdot 9 \cdot 11 \cdot 13) \pmod{14}.$$
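The key step of the proof, that multiplying the numbers prime to $m$ by $a$ merely rearranges them modulo $m$, can be checked numerically. The sketch below counts the totient directly from Definition 58.5 and repeats the illustration for $m = 14$, taking $a = -5$ (an assumption on our part, chosen because it reproduces the remainders 9, 13, 3, 11, 1, 5 quoted in the text). The function names are hypothetical.

```python
from math import gcd


def reduced_residues(m: int) -> list:
    """All k with 0 <= k < m and (k, m) = 1, in increasing order."""
    return [k for k in range(m) if gcd(k, m) == 1]


def totient(m: int) -> int:
    """The totient of m, obtained by direct counting as in Definition 58.5."""
    return len(reduced_residues(m))


m, a = 14, -5
ks = reduced_residues(m)
rs = [(a * k) % m for k in ks]  # Python's % returns the remainder in 0..m-1

print(ks)
print(rs)
print(sorted(rs) == ks)          # the r_i are a rearrangement of the k_i
print(pow(a, totient(m), m))     # Euler's theorem predicts 1
```

Direct counting is only practical for small $m$, which is the limitation the text goes on to address.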
If the natural number $m$ in Euler's theorem is a prime $p$, then $\varphi(m) = \varphi(p) = p - 1$, and the theorem asserts that if $(a, p) = 1$, then $a^{p-1} \equiv 1 \pmod{p}$. This is exactly the statement of Fermat's theorem.
In order to use Euler's theorem, it is necessary to know the value of the totient $\varphi(m)$. For small values of $m$, $\varphi(m)$ can be obtained by counting the numbers from 0 to $m - 1$ which are relatively prime to $m$. However, if $m$ is large, this procedure is impractical. Fortunately there is a convenient formula for $\varphi(m)$.
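THEOREM 58.7. If $m$ is a natural number different from 1, and if $m = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, where $p_1, p_2, \ldots, p_r$ are distinct primes, and the exponents $e_1, e_2, \ldots, e_r$ are positive, then
$$\varphi(m) = p_1^{e_1 - 1}(p_1 - 1)\, p_2^{e_2 - 1}(p_2 - 1) \cdots p_r^{e_r - 1}(p_r - 1).$$
We will not give a proof of this theorem (however, see Problems 14, 15, and 16 below).
Euler's theorem has numerous applications. For example, it provides another method of solving linear congruences of the type discussed in Theorem 57.1:
$$ax \equiv b \pmod{m}, \quad \text{where } (a, m) = 1.$$
In fact, if $x = a^{\varphi(m) - 1} \cdot b$, then
$$ax = a^{\varphi(m)} \cdot b \equiv 1 \cdot b = b \pmod{m}.$$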
EXAMPLE 1. Solve the congruence $15x \equiv 6 \pmod{22}$. We have $\varphi(22) = \varphi(2 \cdot 11) = 1 \cdot 10 = 10$. Then $6 \cdot 15^9$ is a solution of the congruence. Since $15^2 = 225 \equiv 5 \pmod{22}$, $15^4 \equiv 5^2 \equiv 3 \pmod{22}$, $15^8 \equiv 3^2 = 9 \pmod{22}$, we find $6 \cdot 15^9 \equiv 6 \cdot 15 \cdot 9 = 90 \cdot 9 \equiv 2 \cdot 9 = 18 \pmod{22}$. Therefore, $x = 18$ is the smallest nonnegative solution of the congruence. This method of solving linear congruences is very often not the easiest. If $15x \equiv 6 \pmod{22}$, then $5x \equiv 2 \equiv 90 \pmod{22}$, and $x \equiv 18 \pmod{22}$.
Another application of Euler's theorem is the reduction of large powers of a number modulo m.
EXAMPLE 2. Suppose that we wish to find the least nonnegative integer to which $5^{221}$ is congruent modulo 18. Since $\varphi(18) = 6$ and $(5, 18) = 1$, Euler's theorem yields $5^6 \equiv 1 \pmod{18}$. Since $221 = 36 \cdot 6 + 5$, we obtain
$$5^{221} = (5^6)^{36} \cdot 5^5 \equiv 5^5 \pmod{18}.$$
Finally, $5^2 \equiv 7 \pmod{18}$, $5^4 \equiv 49 \equiv 13 \pmod{18}$, and $5^5 \equiv 5 \cdot 13 = 65 \equiv 11 \pmod{18}$. Thus, 11 is the least nonnegative integer to which $5^{221}$ is congruent modulo 18.
EXAMPLE 3. What are the last two decimals of the number $3^{119}$? This is equivalent to the problem of finding the least nonnegative integer to which $3^{119}$ is congruent modulo 100. Since $(3, 100) = 1$ and $\varphi(100) = \varphi(2^2 5^2) = 40$, we have $3^{40} \equiv 1 \pmod{100}$. Thus, $(3^{40})^3 = 3 \cdot 3^{119} \equiv 1 \pmod{100}$. Consequently, by Theorem 57.1, $3^{119} \equiv r \pmod{100}$, where $r$ is any solution of $3x \equiv 1 \pmod{100}$. Since $100 = 33 \cdot 3 + 1$, we obtain $1 \equiv 3(-33) \equiv 3 \cdot 67 \pmod{100}$. Hence, $3^{119} \equiv 67 \pmod{100}$, so that the last two decimal digits of $3^{119}$ are 67.
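The computations in these examples can be confirmed with Python's built-in three-argument `pow`, which performs modular exponentiation. The helper `reduce_power` below is a hypothetical name for the exponent-reduction idea of Example 2; it assumes $(a, m) = 1$ and that the caller supplies the correct value of the totient.

```python
def reduce_power(a: int, n: int, m: int, phi_m: int) -> int:
    """Least nonnegative residue of a**n modulo m.

    Assumes (a, m) = 1 and phi_m equal to the totient of m, so that
    a**phi_m ≡ 1 (mod m) and the exponent may be reduced modulo phi_m.
    """
    return pow(a, n % phi_m, m)


# Example 2: 5^221 modulo 18, with totient 6.
print(reduce_power(5, 221, 18, 6))

# Example 3: the last two decimal digits of 3^119, with totient 40.
print(reduce_power(3, 119, 100, 40))

# Solving 15x ≡ 6 (mod 22) as x = 15^(totient - 1) * 6 with totient 10.
print((pow(15, 9, 22) * 6) % 22)
```

Each value can be compared with a direct call such as `pow(5, 221, 18)`, which does not rely on Euler's theorem at all.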
DEFINITION 58.8. Let $m$ be a natural number and let $a$ be an integer such that $(a, m) = 1$. The order of $a$ modulo $m$ (or the exponent to which $a$ belongs modulo $m$) is the smallest natural number $d$ such that $a^d \equiv 1 \pmod{m}$.
By Theorem 58.6, the set of all natural numbers $n$ such that $a^n \equiv 1 \pmod{m}$ is nonempty, since in fact $\varphi(m)$ belongs to this set. Therefore, $d$ is well defined and $d \le \varphi(m)$. It is clear from Definition 58.8 and Theorem 56.3 that if $a \equiv b \pmod{m}$, then $a$ and $b$ have the same order modulo $m$.
THEOREM 58.9. Let $d$ be the order of the integer $a$ modulo $m$. If $a^n \equiv 1 \pmod{m}$, then $d \mid n$. In particular, $d \mid \varphi(m)$.
Proof. By the division algorithm, $n = qd + r$, where $q$ is a nonnegative integer and $0 \le r < d$. Consequently,
$$a^r \equiv a^r (a^d)^q = a^{qd + r} = a^n \equiv 1 \pmod{m}.$$
Since $d$ is the smallest positive exponent such that $a^d \equiv 1 \pmod{m}$, it follows that $r = 0$. That is, $d \mid n$. The last statement of the theorem is a consequence of Theorem 58.6.
It is possible to find the order modulo $m$ of an integer $a$ by trial, provided that $m$ is small. For example, if $m$ is 5, then $\varphi(m) = 4$, and the numbers which are relatively prime to $m$ are 1, 2, 3, and 4. If $d$ is the order of $a$ modulo 5, then $d \mid 4$, by Theorem 58.9. Hence, $d = 1$, 2, or 4. Clearly, 1 belongs to the exponent 1. Also, $2^2 = 4 \equiv -1 \pmod{5}$, $3^2 = 9 \equiv -1 \pmod{5}$, $4^2 = 16 \equiv 1 \pmod{5}$. Thus, 4 belongs to the exponent 2, while 2 and 3 belong to the exponent 4 modulo 5. The problem of finding the order of $a$ modulo $m$ can be difficult if $a$ and $m$ are large.
By Theorem 58.9, the largest possible order of an integer modulo $m$ is $\varphi(m)$. If the integer $a$ is relatively prime to $m$ and $a$ belongs to the exponent $\varphi(m)$ modulo $m$, then $a$ is called a primitive root modulo $m$. For example, 2 and 3 are primitive roots modulo 5. It is possible to prove that if $m$ is a prime, then there are exactly $\varphi(\varphi(m))$ primitive roots modulo $m$ among the natural numbers $1, 2, \ldots, m - 1$. However, if $m$ is composite, there may not be any primitive roots for this modulus. For example, every odd integer belongs to one of the exponents 1, 2, or 4 modulo 16, and $\varphi(16) = 8$.
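Finding orders "by trial," as described above, amounts to multiplying by $a$ repeatedly until the residue 1 appears. The following Python sketch does exactly that and then lists primitive roots by comparing each order with the totient; the names `order` and `primitive_roots` are our own, and the method is only sensible for small moduli.

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Smallest natural number d with a**d ≡ 1 (mod m); requires (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be relatively prime to m")
    d, power = 1, a % m
    while power != 1 % m:          # 1 % m also handles the trivial modulus 1
        power = (power * a) % m
        d += 1
    return d


def primitive_roots(m: int) -> list:
    """Numbers in 1..m-1 prime to m whose order equals the totient of m."""
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    return [a for a in units if order(a, m) == len(units)]


print([order(a, 5) for a in (1, 2, 3, 4)])
print(primitive_roots(5))
print(primitive_roots(16))
print(sorted({order(a, 16) for a in range(1, 16, 2)}))
```

The last two lines illustrate the remark about the modulus 16: the orders that occur are only 1, 2, and 4, so no odd integer reaches the totient 8.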
An amusing application of Theorem 58.9 concerns the perfect shuffling of cards. Consider a deck of $2m$ cards. Let the cards of the deck be numbered from top to bottom: $1, 2, 3, \ldots, m, m + 1, \ldots, 2m$. The deck is split into two equal piles, the first pile consisting of the cards $1, 2, 3, \ldots, m$ in order, and the second pile consisting of the cards $m + 1, m + 2, \ldots, 2m$ in order. A perfect shuffle results if the cards are shuffled together from the bottom up, alternating a card from each pile, and beginning with the first pile. After a perfect shuffle, the arrangement of the cards will be changed from $1, 2, 3, \ldots, m, m + 1, \ldots, 2m$ to $m + 1, 1, m + 2, 2, \ldots, 2m, m$. A card numbered $1, 2, \ldots, m$ which was in position $i$ before the perfect shuffle is in position $2i$ after the shuffle. A card numbered $m + 1, m + 2, \ldots, 2m$ which was in position $i$ before the shuffle is in position $2i - (2m + 1)$ afterwards. Note that for $1 \le i \le m$,
$$0 < 2i < 2m + 1,$$
while for $m + 1 \le i \le 2m$,
$$0 < 2i - (2m + 1) < 2m + 1.$$
Thus, in every case the $i$th card goes into position $r_1$, where $r_1$ is the remainder obtained on dividing $2i$ by $2m + 1$. Hence,
$$r_1 \equiv 2i \pmod{2m + 1}.$$
A second perfect shuffle will send a card which is now in position $r_1$ into position $r_2$, where
$$r_2 \equiv 2r_1 \equiv 2 \cdot 2i = 2^2 i \pmod{2m + 1}.$$
In general, after $n$ perfect shuffles, the $i$th card will be in position $r_n$, where
$$r_n \equiv 2^n i \pmod{2m + 1}.$$
The question now arises: what is the least number of perfect shuffles required to return a deck of 2m cards to its original order? The answer is plainly the smallest positive integer n satisfying
1, . . . , 2m. Since this congruence must hold for i = 1, 2, 3, . . . , m, m in particular for i = 1, it is necessary that 2"
1 (mod 2m
+ 1).
581
2" i (mod 2m
+ 1)
+
for every i, b y Theorem 56.3(f). Thus, the positive integer which is the answer to the problem is the order of 2 modulo 2m l . [Note t h a t Definition 58.8 applies since (2, 2m 1) = l . ] Suppose now t h a t we are considering a n ordinary deck of 52 cards. Then 1 = 53. By Theorem 58.9, the order of 2 modulo 53 m = 26, and 2m is a divisor of (p(53) = 52. T h a t is, the order of 2 is one of the numbers 2, 4, 13, 26, or 52. Clearly 2 2 and 2* are not congruent to 1 modulo 53. Also, 26 = 64 11 (mod 53), 212 = 121 = 15 (mod 53)) and therefore 302 = 900 = 53 17  1 2 l 3 30 (mod 53). Finally, 226 = (2 13) 1 (mod 53). Thus, the order of 2 modulo 53 is 52. Consequently, i t follows from our general result t h a t 52 perfect shuffles are required to return a n ordinary card deck to its original order.
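The conclusion can be checked by direct simulation. The Python sketch below is an illustration only; the names `perfect_shuffle`, `shuffles_to_restore`, and `order_of_two` are hypothetical and not part of the text. It shuffles a list representing the deck, top card first, and compares the number of shuffles needed to restore the deck with the order of 2 modulo $2m + 1$.

```python
def perfect_shuffle(deck):
    """One perfect shuffle of an even-length list (top card first).

    The result reads m+1, 1, m+2, 2, ..., 2m, m, as described in the text.
    """
    m = len(deck) // 2
    first_pile, second_pile = deck[:m], deck[m:]
    shuffled = []
    for upper, lower in zip(first_pile, second_pile):
        shuffled += [lower, upper]
    return shuffled

def shuffles_to_restore(size):
    """Count perfect shuffles until a deck of `size` cards is back in order."""
    start = list(range(1, size + 1))
    deck, count = perfect_shuffle(start), 1
    while deck != start:
        deck = perfect_shuffle(deck)
        count += 1
    return count

def order_of_two(modulus):
    """Order of 2 modulo an odd modulus >= 3, found by repeated doubling."""
    n, power = 1, 2 % modulus
    while power != 1:
        power = (2 * power) % modulus
        n += 1
    return n

print(perfect_shuffle([1, 2, 3, 4, 5, 6]))        # [4, 1, 5, 2, 6, 3]
print(shuffles_to_restore(52), order_of_two(53))  # 52 52
```

The simulation moves the cards themselves, while `order_of_two` uses only the congruence $2^n \equiv 1 \pmod{2m+1}$; the two counts agree, as the argument above predicts.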

2. Find the last three decimal digits of the following numbers: 32m00, 71610,
3. Find the last two decimal digits of $9^{9^9}$.
4. Prove that for every natural number $a$, $a$ and $a^5$ have the same final decimal digit.
5. Show that if $p$ is an odd prime which does not divide the integer $a$, then either $a^{(p-1)/2} \equiv 1 \pmod{p}$, or $a^{(p-1)/2} \equiv -1 \pmod{p}$.
6. Find $\varphi(m)$ for $m = 1, 2, 7, 8, 9, 10, 12$, and 16 by listing the numbers $k$ which satisfy $0 \le k < m$, and $(k, m) = 1$.
7. Illustrate the proof of Euler's theorem, Theorem 58.6, for the particular case $m = 15$, $a = 4$.
8. (a) Find the order modulo 16 of the numbers 1, 3, 5, 7, 9, 11, 13, and 15. (b) Find the order modulo 15 of the numbers 1, 2, 4, 7, 8, 11, 13, and 14.
9. Let $p$ be a prime, let $e$ be a positive exponent, and let $a$ and $b$ be integers such that $a \equiv b \pmod{p^e}$. Prove that $a^p \equiv b^p \pmod{p^{e+1}}$. [Hint: Write $a = b + cp^e$, $c \in Z$, and expand $a^p = (b + cp^e)^p$ by the binomial theorem.]
for every natural number n. What does this imply about the existence of primitive roots modulo powers of 2? 11. Using the result of Problem 9, deduce the following case of Euler's theorem from Fermat's theorem. Let p be a prime, let e be a positive exponent, and let a be an integer which is not divisible by p. Then
12. How many perfect shuffles are required to return a deck of 46 cards to their original order? How many for a deck of 22 cards? 13. Suppose that p is a prime and that a is a primitive root modulo p. (a) Show that
$$1^n + 2^n + \cdots + (p-1)^n \equiv 1 + a^n + a^{2n} + \cdots + a^{(p-2)n} \pmod{p}.$$
(d) Determine l n
The following three problems lead to a proof of Theorem 58.7. Accordingly, this theorem should not be used to prove any of the statements in these problems.
14. Let $p$ be a prime and let $e$ be a positive exponent. Prove that

$$\varphi(p^e) = p^e - p^{e-1}.$$

[Hint: First show that the number of integers $k$ such that $0 \le k < p^e$ and $p \mid k$ is $p^{e-1}$.]
15. Let $m$ and $n$ be natural numbers which are relatively prime. Let $r_1, r_2, \ldots, r_{\varphi(m)}$ be all of the different integers $r$ such that $0 \le r < m$, and $(r, m) = 1$, and let $s_1, s_2, \ldots, s_{\varphi(n)}$ be all of the different integers $s$ such that $0 \le s < n$, and $(s, n) = 1$.
(a) Show that for any integers $a$ and $b$, if $c = ma + nb$, then $(c, mn) = 1$ if and only if $(a, n) = 1$ and $(b, m) = 1$.
(b) Show that if $ms_i + nr_j \equiv ms_k + nr_l \pmod{mn}$, then $i = k$ and $j = l$.
(c) Show that if $c$ is an integer such that $(c, mn) = 1$, then for some $i$ and $j$, $c \equiv ms_i + nr_j \pmod{mn}$.
(d) Prove that $\varphi(mn) = \varphi(m) \cdot \varphi(n)$. [Hint: Using (a), (b), and (c), show that for each integer $k$ such that $0 \le k < mn$ and $(k, mn) = 1$, there is exactly one pair $(i, j)$ of indices such that

$$k \equiv ms_i + nr_j \pmod{mn},$$

and every pair $(i, j)$ occurs this way.]
16. Using the results of Problems 14 and 15, prove Theorem 58.7.
CHAPTER 6
By the time Euclid wrote his Elements (about 300 B.C.), rational arithmetic had been developed to almost the same form that we know today. Of course, the negative rational numbers came later. It is not surprising that the invention of fractions occurred early in the history of our culture. The use of numbers for measuring length must have developed naturally from their use as counting devices. If a trader wished to buy a certain amount of cloth, he had to have some way of giving the seller a description of how much cloth was wanted. A convenient measure was the number of arm lengths (approximately, the number of yards). With the need for more accuracy came the necessity of measuring fractional parts of unit lengths. A tailor who could make 3 robes from 10 yards of cloth would not want to buy 4 yards to make a single robe. He would need some way to express the length, $3\tfrac{1}{3}$ yards. Such needs must have led to the early invention of "rulers" and "yard sticks." Through the years, devices for measuring distance have progressed from the crude "measuring sticks" to the finest microscopic gauges.

Let us review the facts about rational numbers which are usually discussed in elementary algebra courses. As we have seen, the main reason for enlarging the ring of integers to the rational numbers is to make division by natural numbers possible. Thus, ideally the system $Q$ of rational numbers should satisfy the following conditions.

(61.1). Properties of $Q$.
(a) $Q$ is an ordered integral domain.
(b) $Q$ contains $Z$ as a subring. That is, $Z \subseteq Q$, and the ordering and the operations of addition, multiplication, and negation in $Z$ agree with the ordering and operations in $Q$.
(c) If $a \in Z$ and $n \in N$, then the equation $nr = a$ has a solution $r \in Q$, that is, the quotient $a/n$ exists in $Q$ (see Definition 44.9).
(d) Every element $r$ of $Q$ is a quotient $a/n$ for some $a \in Z$ and $n \in N$, that is, $r$ satisfies $nr = a$ for some $n \in N$, $a \in Z$.

The property (d) implies that $Q$ is the "smallest" system which satisfies the requirements (a), (b), and (c). This is stated more exactly in the following theorem.

THEOREM 61.2. Let $A$ be an ordered integral domain containing the ring $Z$ of all integers as a subring. Suppose that for each $a \in Z$ and $n \in N$, the quotient $a/n$ exists in $A$. That is, $A$ satisfies (a), (b), and (c) of (61.1). Let $B$ be the set of all elements $r \in A$ which satisfy $nr = a$ for some $n \in N$ and $a \in Z$. Then $B$ is a subring of $A$, and $B$ satisfies (61.1a, b, c, d). Moreover, any ring which satisfies all of the conditions of (61.1) is isomorphic to $B$.

Since we will not use this theorem, its proof will be omitted.

The conditions (c) and (d) of (61.1) lead to the familiar method of representing all rational numbers. In fact, every rational number is a quotient $a/n$, $a \in Z$, $n \in N$, and conversely, for every $a \in Z$, $n \in N$, there is a rational number which is the quotient $a/n$. It should be remembered that $a/n$ is no more than a way of designating a particular rational number. We think of $a/n$ as the expression representing the solution of the equation $nx = a$; that is, $a/n$ represents the number obtained from the division of $a$ by $n$, just as $a \cdot n$ represents the number obtained by multiplying $a$ and $n$. Of course, each rational number has infinitely many representations in this form. For example
$$\frac{1}{2}, \quad \frac{2}{4}, \quad \frac{3}{6}$$
all denote the same rational number.

The conditions (61.1) are the specifications which must be met by the system of rational numbers, and as we noted in Theorem 61.2, any two rings which satisfy these conditions are isomorphic. However, it is not immediately obvious that there is any system at all which satisfies (61.1). In Section 65, starting with $N$ and $Z$, we will construct an ordered integral domain $Q$ which does satisfy (61.1). By using (61.1a, b), it is possible to discover the rules of operation for rational numbers, represented as quotients.
(61.3). Rules of operation for rational numbers. Let $a$ and $b$ be integers, and let $m$ and $n$ be natural numbers. Then
(a) $a/m = b/n$ if and only if $na = mb$;
(b) $a/m < b/n$ if and only if $na < mb$;
(c) $(a/m) + (b/n) = (na + mb)/mn$;
(d) $-(a/m) = (-a)/m$;
(e) $(a/m) \cdot (b/n) = (ab)/(mn)$.

Proof. Let $r$ denote the rational number $a/m$, and let $s$ represent $b/n$. That is, $r$ and $s$ are the unique rational numbers satisfying

$$mr = a, \quad ns = b.$$

If $r = s$, then $mnr = mns$. Hence $na = mb$. Conversely, if $na = mb$, then $mnr = mns$. Since $m$ and $n$ are natural numbers, $mn \neq 0$, and by the cancellation law in an integral domain, $r = s$. The proof of (b) is similar to the proof of (a). If $r < s$, then $na = mnr < mns = mb$, since $mn$ is positive in $Z$, and therefore also positive in $Q$. Conversely, if $na < mb$, then $mnr < mns$. This implies that $r < s$. For otherwise, $s \le r$ and hence $mns \le mnr$, by Theorem 46.2. The proofs of (c), (d), and (e) are very simple. Note that by Definition 42.1 and Theorem 42.4

$$(mn)(r + s) = mnr + mns = na + mb,$$
$$m(-r) = -(mr) = -a,$$
$$(mn)(rs) = (mr)(ns) = ab.$$

Thus, by Definition 44.9,

$$(a/m) + (b/n) = r + s = (na + mb)/mn,$$
$$-(a/m) = -r = (-a)/m,$$
$$(a/m) \cdot (b/n) = rs = (ab)/(mn).$$
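The rules (61.3) translate directly into computations with pairs of integers. In the Python sketch below, a quotient $a/m$ is modelled as the pair `(a, m)`; this representation and the function names are assumptions made for the illustration, not notation from the text. The standard library's `Fraction` type serves only as an independent check.

```python
from fractions import Fraction

# A quotient a/m is modelled as the pair (a, m): a an integer, m a natural number.

def equal(r, s):
    (a, m), (b, n) = r, s
    return n * a == m * b            # rule (a)

def less(r, s):
    (a, m), (b, n) = r, s
    return n * a < m * b             # rule (b)

def add(r, s):
    (a, m), (b, n) = r, s
    return (n * a + m * b, m * n)    # rule (c)

def neg(r):
    a, m = r
    return (-a, m)                   # rule (d)

def mul(r, s):
    (a, m), (b, n) = r, s
    return (a * b, m * n)            # rule (e)

def as_fraction(r):
    """Independent check: convert the pair to Python's built-in rational type."""
    a, m = r
    return Fraction(a, m)

r, s = (1, 2), (-2, 3)
print(add(r, s), mul(r, s), neg(r))   # (-1, 6) (-2, 6) (-1, 2)
print(equal((2, 4), r), less(s, r))   # True True
```

Note that `mul(r, s)` returns `(-2, 6)` rather than `(-1, 3)`: the rules produce some representation of the answer, and rule (a) is what decides when two representations denote the same rational number.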
Although the system of rational numbers is constructed in order to divide integers by natural numbers, it happens that this system enjoys the strongest possible divisibility property: if $r$ and $s$ are rational numbers, and if $r \neq 0$, then $r$ divides $s$ in $Q$. This fact is easily proved using the properties of $Q$ given above. By (61.1), we can write $r = a/m$, $s = b/n$, where $a$ and $b$ are integers and $m$ and $n$ are natural numbers. Then it is well known that $s/r = mb/na$ is the required quotient. However, if $a$ is negative, then $na$ is not a natural number, and it does not follow from (61.1c) that $mb/na$ is in $Q$. This defect is easily corrected by noting that $s/r$ can just as well be represented by $mab/na^2$, where $a^2$ is a positive integer, and consequently $na^2 \in N$. This is the idea involved in the proof of the important divisibility property, which we now formulate more carefully.

THEOREM 61.4. If $r$ and $s$ are rational numbers, and if $r \neq 0$, then $r$ divides $s$ in $Q$; that is, there is a rational number $t$ such that $r \cdot t = s$.

Proof. By (61.1), we can write

$$r = a/m, \quad s = b/n,$$

where $a$ and $b$ are suitably chosen integers, and $m$ and $n$ are natural numbers. Thus, the rational numbers $r$ and $s$ satisfy

$$mr = a, \quad ns = b.$$

Since $r \neq 0$, it follows that $a \neq 0$. Thus, by Theorem 45.5, $a^2$ is a positive integer. Therefore $na^2 \in N$. Evidently, $mab \in Z$. By (61.1c), there is a rational number $t$ such that $na^2 t = mab$. Therefore,

$$(mna^2)rt = (mr)(na^2 t) = a(mab) = (ma^2)b = (ma^2)(ns) = (mna^2)s.$$

Since $Q$ is an integral domain, and $mna^2 \in N$ (hence, $mna^2 \neq 0$), it follows that $rt = s$. It should be noted that in several places in the above proof, we have used the fact that multiplication in $Q$ of elements belonging to $Z$ agrees with the usual multiplication in $Z$, that is, $Z$ is a subring of $Q$.
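The device used in this proof, replacing the denominator $na$ by the natural number $na^2$, can be seen in a small computation. The Python sketch below again models $a/m$ as the pair `(a, m)`; the function name `divide` is hypothetical, and `Fraction` is used only to confirm the result.

```python
from fractions import Fraction

def divide(s, r):
    """Return a pair t = (numerator, denominator) with r * t = s.

    Here r = a/m is nonzero and s = b/n. The denominator n * a * a is a
    natural number even when a is negative, as in the proof of Theorem 61.4.
    """
    (b, n), (a, m) = s, r
    if a == 0:
        raise ZeroDivisionError("r must be nonzero")
    return (m * a * b, n * a * a)

t = divide((5, 6), (-3, 4))
print(t)                                                   # (-60, 54)
print(Fraction(*t) == Fraction(5, 6) / Fraction(-3, 4))    # True
```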
1. Prove the cancellation law for quotients: if m and n are natural numbers and a is an integer, then ma/mn = a/n in Q.
2. Prove that each positive rational number can be represented uniquely in the form m/n, where m and n are relatively prime natural numbers.
4. The Farey series $F_k$ of order $k$ is the ascending sequence of rational fractions $m/n$, where $0 \le m \le n \le k$ and $(m, n) = 1$. For instance, $F_4$ is $\frac{0}{1}, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{1}{1}$. Write the Farey series $F_5$ and Fs.
$$r/(-s) = -(r/s) \quad \text{and} \quad (-r)/(-s) = r/s.$$
7. Point out the places where the condition (61.1b) was used in the proof of Theorem 61.4.
8. Prove Theorem 61.2.

62 Fields. The theory of ordered integral domains developed in Chapter 4 can be applied to the ring $Q$ of rational numbers. However, $Q$ also satisfies the divisibility property, given in Theorem 61.4, which does not hold in all integral domains (for example, it is not satisfied in $Z$). It is to be expected that certain properties of the rational numbers are consequences of Theorem 61.4 (together with the other properties of integral domains). If this is so, then these properties will hold as well for any ordered integral domain which satisfies this divisibility condition. There are examples of such integral domains. For us, the most interesting one other than $Q$ itself is the ring $R$ of all real numbers. Thus, as in Section 42, it appears that by introducing a new abstract concept, we will be able to prove theorems in a general setting which will apply to a number of important special cases.

DEFINITION 62.1. A field is a commutative ring $F$ such that
(a) $F$ contains at least one element different from 0;
(b) if $x \in F$, $y \in F$, and $x \neq 0$, then there is an element $z \in F$ such that $x \cdot z = y$.
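EXAMPLE 1. The rings $Q$ and $R$ are fields, as we have mentioned. So is the system $C$ of all complex numbers. The ring of integers is not a field.

EXAMPLE 2. Let $F = \{0, a\}$. Define $a + a = 0 + 0 = 0$; $a + 0 = 0 + a = a$; $0 \cdot 0 = a \cdot 0 = 0 \cdot a = 0$; $a \cdot a = a$. Then $F$ is a field.

EXAMPLE 3. Let $F = \{0, 1, u, v\}$. Define addition and multiplication in $F$ by the tables

     +  |  0  1  u  v
    ----+-------------
     0  |  0  1  u  v
     1  |  1  0  v  u
     u  |  u  v  0  1
     v  |  v  u  1  0

     ·  |  0  1  u  v
    ----+-------------
     0  |  0  0  0  0
     1  |  0  1  u  v
     u  |  0  u  v  1
     v  |  0  v  1  u

Define $-x = x$ for all $x \in F$. That $F$ is a field is left for the reader to verify.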
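The verification asked for in Example 3 is finite, so it can be done exhaustively. The Python sketch below assumes that the tables of Example 3 read as encoded in `ADD` and `MUL` (row $x$ lists $x + y$, respectively $x \cdot y$, for $y = 0, 1, u, v$); the encoding and the function names are our own. It checks commutativity, associativity, the distributive law, the role of 0, the rule $-x = x$, and condition (b) of Definition 62.1.

```python
from itertools import product

ELEMENTS = "01uv"
# Row x of each table lists x + y (or x * y) for y = 0, 1, u, v.
ADD = {"0": "01uv", "1": "10vu", "u": "uv01", "v": "vu10"}
MUL = {"0": "0000", "1": "01uv", "u": "0uv1", "v": "0v1u"}

def add(x, y):
    return ADD[x][ELEMENTS.index(y)]

def mul(x, y):
    return MUL[x][ELEMENTS.index(y)]

def is_field():
    pairs = list(product(ELEMENTS, repeat=2))
    triples = list(product(ELEMENTS, repeat=3))
    commutative = all(add(x, y) == add(y, x) and mul(x, y) == mul(y, x)
                      for x, y in pairs)
    associative = all(add(add(x, y), z) == add(x, add(y, z))
                      and mul(mul(x, y), z) == mul(x, mul(y, z))
                      for x, y, z in triples)
    distributive = all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
                       for x, y, z in triples)
    # 0 is the additive identity, and each element is its own negative.
    zero_and_negation = all(add(x, "0") == x and add(x, x) == "0"
                            for x in ELEMENTS)
    # Definition 62.1(b): for x != 0 and any y there is z with x * z = y.
    divisible = all(any(mul(x, z) == y for z in ELEMENTS)
                    for x, y in pairs if x != "0")
    return (commutative and associative and distributive
            and zero_and_negation and divisible)

print(is_field())  # True
```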
The definition of a field does not explicitly require the existence of an identity element. However, it is not hard to show that every field does contain an identity. In fact a much stronger statement can be made.

THEOREM 62.2. Every field is an integral domain.

Proof. Suppose that $F$ is a field. We first prove that $F$ contains a nonzero identity element. Let $x$ be any nonzero element of $F$. There is such an $x$ by Definition 62.1(a). By Definition 62.1(b), there is an $e \in F$ such that $x \cdot e = x$. Since $x \neq 0$, it is evident that $e$ cannot be 0. To prove that $e$ is an identity element, it is only necessary to show that $y \cdot e = y$ for all $y \in F$. Note that since $F$ is commutative, $e \cdot y = y \cdot e$ for all $y$. If $y$ is any element of $F$, there exists $z \in F$ such that $x \cdot z = y$. Consequently, $y = x \cdot z = (x \cdot e) \cdot z = (e \cdot x) \cdot z = e \cdot (x \cdot z) = e \cdot y = y \cdot e$. Thus, $e$ is a nonzero identity in $F$. To complete the proof of the theorem, it will now be enough (by Theorem 44.5) to prove that $F$ has no proper divisors of zero. Suppose that $y$ is a divisor of zero in $F$. That is, $x \cdot y = 0$ for some nonzero element $x$. By Definition 62.1(b), there is an element $w \in F$ such that $x \cdot w = e$, the identity of $F$. Consequently, $y = y \cdot e = y \cdot (x \cdot w) = (y \cdot x) \cdot w = (x \cdot y) \cdot w = 0 \cdot w = 0$. Thus, if $y$ is a divisor of zero, then $y = 0$, that is, $F$ contains no proper divisors of zero.

As we mentioned in Chapter 4, the identity element in any ring is usually denoted by 1. Of course, this custom is observed for fields in particular. Since every field $F$ is an integral domain, there is, for each $x \neq 0$ and $y$ in $F$, only one $z \in F$ satisfying the equation $x \cdot z = y$. Thus, as in any integral domain, we can write $z$ as the quotient $y/x$. The property which distinguishes fields from arbitrary integral domains is the fact that in a field the quotient $y/x$ always exists when $x \neq 0$.

DEFINITION 62.3. Let $F$ be a field. Let $x$ be a nonzero element of $F$. The quotient $1/x$ is called the inverse of $x$ in $F$, and is denoted by $x^{-1}$.

Thus, $x^{-1}$ is the unique element satisfying $x \cdot x^{-1} = 1$. It should be emphasized that $x^{-1}$ is not defined for $x = 0$. Moreover, if $x \neq 0$, then $x^{-1} \neq 0$.

THEOREM 62.4. Let $x, y, x_1, x_2, \ldots, x_n$ ($n \ge 1$) be nonzero elements of a field $F$. Then
(a) $(x \cdot y)^{-1} = x^{-1} \cdot y^{-1}$;
(b) $(x_1 \cdot x_2 \cdot \ldots \cdot x_n)^{-1} = x_1^{-1} \cdot x_2^{-1} \cdot \ldots \cdot x_n^{-1}$;
(c) $(x^{-1})^{-1} = x$;
(d) $1^{-1} = 1$; $(-1)^{-1} = -1$.
Proof. The proofs of all of these statements [except (b), which can be obtained by induction from (a)] are based on the above observation about the uniqueness of the inverse, namely, if $s$ and $t$ are elements of $F$ such that $s \cdot t = 1$, then $t = s^{-1}$. For example, to prove (a), let $s = x \cdot y$, $t = x^{-1} \cdot y^{-1}$. Then $s \cdot t = (x \cdot y) \cdot (x^{-1} \cdot y^{-1}) = (x \cdot x^{-1}) \cdot (y \cdot y^{-1}) = 1 \cdot 1 = 1$. Thus, $x^{-1} \cdot y^{-1} = t = s^{-1} = (x \cdot y)^{-1}$. The proof of (c) is similar. By definition of the inverse, $x \cdot x^{-1} = 1$. Thus, $x^{-1} \cdot x = 1$. Therefore, $x$ is the inverse of $x^{-1}$. That is, $x = (x^{-1})^{-1}$. The proofs of (b) and (d) are left for the reader to complete.

Quotients can be expressed in terms of the inverse operation: if $x \neq 0$, then

$$\frac{y}{x} = y \cdot x^{-1}. \qquad (61)$$

Indeed, $x \cdot (x^{-1} \cdot y) = (x \cdot x^{-1}) \cdot y = 1 \cdot y = y$. This observation is useful, because inverses are easier to manipulate than quotients.

DEFINITION 62.5. Let $x$ be a nonzero element of the field $F$. Define $x^n$ for natural numbers $n$ by the inductive conditions

$$x^1 = x, \quad x^{n+1} = x^n \cdot x.$$

Define $x^0 = 1$, and

$$x^{-n} = (x^{-1})^n.$$
We have briefly discussed powers of real numbers in Section 26. The new concept which Definition 62.5 introduces is the idea of zero and negative exponents. That is, if $x$ is a nonzero element of a field, then the object $x^a$ is defined for every integer $a$.

(62.6). Rules of exponents. Let $x$ and $y$ be nonzero elements of a field
The identities (d) and (e) are easy consequences of Definition 62.5. Also, if a and b are natural numbers, then the identities (a), (b), and (c)
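can be proved by induction on $a$ (see Problem 1, Section 26). To extend these results to arbitrary integers involves a somewhat tedious checking of cases. As an illustration, let us consider the identity (a) for the case in which $a$ is a positive integer and $b$ is a negative integer. Then $a = n$ and $b = -m$, where $n$ and $m$ are natural numbers. If $n > m$, then $n = (n - m) + m$. Hence,

$$x^a \cdot x^b = x^{(n-m)+m} \cdot (x^{-1})^m = x^{n-m} \cdot x^m \cdot (x^{-1})^m = x^{n-m} \cdot (x \cdot x^{-1})^m = x^{n-m} = x^{a+b},$$

assuming that all of the identities above have been verified in the cases where $a$ and $b$ are natural numbers. If $n = m$, the proof is simpler:

$$x^a \cdot x^b = x^n \cdot (x^{-1})^n = (x \cdot x^{-1})^n = 1 = x^0 = x^{a+b}.$$

If $n < m$, then $m = (m - n) + n$ and

$$x^a \cdot x^b = x^n \cdot (x^{-1})^{(m-n)+n} = x^n \cdot (x^{-1})^n \cdot (x^{-1})^{m-n} = (x^{-1})^{m-n} = x^{-(m-n)} = x^{a+b}.$$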
An ordered integral domain which satisfies Definition 62.1 (b) is naturally called an ordered field. The most important examples of ordered fields are the systems of rational numbers and real numbers.
THEOREM 62.7. Let $F$ be an ordered field. Let $x$ and $y$ be elements of $F$, and let $a$ and $b$ be integers.
(a) If $x > 0$, then $x^{-1} > 0$; if $x < 0$, then $x^{-1} < 0$.
(b) If $0 < x < y$, or $x < y < 0$, then $x^{-1} > y^{-1}$.
(c) If $x > 0$, then $x^a > 0$; if $x < 0$ and $a$ is odd, then $x^a < 0$; if $x < 0$ and $a$ is even, then $x^a > 0$.
(d) If $0 < x < y$, and $a > 0$, then $x^a < y^a$; if $0 < x < y$, and $a < 0$, then $x^a > y^a$.
(e) If $0 < x < 1$ and $a < b$, then $x^b < x^a$; if $1 < x$, and $a < b$, then $x^a < x^b$.
Proof. If $x > 0$, then $x^{-1} \neq 0$. Hence, either $x^{-1} > 0$, or $x^{-1} < 0$. If $x^{-1} < 0$, then $1 = x \cdot x^{-1} < 0$, which is false by Theorem 45.5. Thus, $x^{-1} > 0$. Similarly, if $x < 0$, then $x^{-1}$ must also be $< 0$. Suppose that $0 < x < y$. Then $x^{-1}$ and $y^{-1}$ are positive. Assume that $x^{-1} \le y^{-1}$. Then $1 = x \cdot x^{-1} < y \cdot x^{-1} \le y \cdot y^{-1} = 1$, which is a contradiction. Therefore, $y^{-1} < x^{-1}$. If $x < y < 0$, then $0 < -y < -x$. Hence, $(-x)^{-1} < (-y)^{-1}$. Therefore, $-(-y)^{-1} < -(-x)^{-1}$. However,

$$-(-y)^{-1} = (-1) \cdot (-y)^{-1} = (-1)^{-1} \cdot (-y)^{-1} = [(-1) \cdot (-y)]^{-1} = y^{-1},$$

and similarly $-(-x)^{-1} = x^{-1}$. Thus, $y^{-1} < x^{-1}$.
EXAMPLE 4. Often problems of simplifying the description of sets of real or rational numbers involve inverses. The rules given in Theorem 62.7 are useful in solving such problems. Consider the set $\{x \in R \mid x + 2x^{-1} > 3\}$. If $x + 2x^{-1} > 3$, then $x \le 0$ is impossible. Hence, $x + 2x^{-1} > 3$ if and only if $x > 0$ and $x(x + 2x^{-1}) > 3x$, that is, $x^2 + 2 > 3x > 0$. This inequality holds if and only if $(x - 2)(x - 1) > 0$ and $x > 0$. Since $x > 0$, the product $(x - 2)(x - 1)$ is positive if and only if $0 < x < 1$ or $x > 2$. Hence,

$$\{x \in R \mid x + 2x^{-1} > 3\} = \{x \in R \mid 0 < x < 1\} \cup \{x \in R \mid x > 2\}.$$
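A simplification of this kind is easy to spot-check numerically. The Python sketch below compares the condition $x + 2x^{-1} > 3$ with the simplified description, $0 < x < 1$ or $x > 2$, on a grid of rational sample points; the function names and the choice of samples are assumptions made for the illustration, and a finite check of this sort is of course not a proof.

```python
from fractions import Fraction

def in_original_set(x):
    """x + 2/x > 3; the inverse is undefined at 0, so 0 is excluded."""
    return x != 0 and x + 2 / x > 3

def in_simplified_set(x):
    """The simplified description: 0 < x < 1 or x > 2."""
    return 0 < x < 1 or x > 2

# Exact rational samples from -5 to 5 in steps of 1/8.
samples = [Fraction(k, 8) for k in range(-40, 41)]
print(all(in_original_set(x) == in_simplified_set(x) for x in samples))  # True
```

Exact rational arithmetic is used so that the boundary points $x = 1$ and $x = 2$, where $x + 2x^{-1}$ equals 3 exactly, are classified correctly.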
(a) ( x  l . yl)l . ( x  l )  l (b) ( x y x  l )  l . ( x  l . y1 . x  l ) ( 4 [ ( x  l ) y]l [ x ' (Y91 2. If A is a subring of a field F, and if 1 E A, does it follow that A is a field? An integral domain? Support your answer by examples or proofs.
5. Prove by induction on n that if x, x i , 22, and a, al, a2, . . . , a, are integers, then
fi
x2)
7. Simplify the descriptions of the following sets. (a) (x E Rlxl > 4) (b) {x E Rlx (c) {x E Rlx xl 23.. 8. Let F be an ordered field. Suppose that x E F, y E F, and x # O . Show that (a) IxlI = I X I  ~ , (b) Ix/ Y I = lxl/lyl. 9. Let x, y, x, and w be elements of an ordered field. Suppose that x # O and w # O. Prove the following. (a) If x and w have the same sign, then x/x < y/w if and only if xw < yx. (b) If x and w have opposite sign, then x/x < y/w if and only if x w > yx. 10. Prove Theorem 62.7(c), (d), and (e). 11. Prove Theorem 62.6 in detail. 12. Show that if A is a commutative ring with an identity element e such that for every x # O in A, there is an element y E A such that x y = e, then A is a field. 13. Prove that if F is a field, and if A is a ring which is isomorphic to F (see Definition 42.7)) then A is a field.
>
<
63 The characteristic of integral domains and fields. The rules of exponents (62.6) resemble the identities listed in Theorem 43.3 for repeated sums. Indeed, the operation of forming powers of $x$ is the multiplicative analogue of the additive operation $ax$ defined in Definition 43.2. Recall that if $a = n$ is a natural number and $x$ is an element of a ring, then

$$ax = \underbrace{x + x + \cdots + x}_{n \text{ summands}};$$

if $a = 0$, then $ax = 0$; if $a = -n$, then

$$ax = \underbrace{(-x) + (-x) + \cdots + (-x)}_{n \text{ summands}}.$$
There is an important classification of integral domains and fields which is based on the operation $ax$.

THEOREM 63.1. Let $A$ be an integral domain. Then exactly one of the following statements is true.
(a) If $x \neq 0$ in $A$, then $nx \neq 0$ for all $n \in N$.
(b) There is a unique prime $p$ such that $px = 0$ for all $x \in A$, and if $nx = 0$ for $x \neq 0$, then $p$ divides $n$ in $Z$.

Proof. Suppose that $nx = 0$ for some $n \in N$ and $x \neq 0$. By the well-ordering principle for $N$, there is a smallest natural number $p$ which has this property. That is, if $m < p$, $m \in N$, then $my \neq 0$ for all nonzero $y \in A$, but there is an $x \neq 0$ in $A$ such that $px = 0$. Let $e$ be the identity* of $A$. By Theorem 43.3, $(pe) \cdot x = p(e \cdot x) = px = 0$. Thus, since $A$ is an integral domain and $x \neq 0$, we obtain $pe = 0$. Therefore, for any $y \in A$, $py = p(e \cdot y) = (pe) \cdot y = 0$. We next show that $p$ is a prime. Suppose otherwise. Then there are natural numbers $r$ and $s$ such that $1 < r < p$, $1 < s < p$, and $p = r \cdot s$. Therefore, $(re) \cdot (se) = (rs)e^2 = pe = 0$. Since $A$ is an integral domain, this implies that either $re = 0$ or $se = 0$. However, by its choice, $p$ was the smallest natural number such that $pe = 0$. Thus, $p$ must be a prime. It is clear that the prime $p$ is unique. Finally, suppose that $nx = 0$ for some natural number $n$ and $x \neq 0$ in $A$. If $p \nmid n$, then $(n, p) = 1$. Thus, integers $a$ and $b$ exist satisfying $an + bp = 1$. Therefore, by Theorem 43.3, $x = 1x = (an + bp)x = (an)x + (bp)x = a(nx) + b(px) = a0 + b0 = 0$. This contradiction shows that $p \mid n$. To review our proof, we have shown that if statement (a) is not true, then statement (b) is true. Therefore, either (a) is true or else (b) is true. Obviously both statements cannot be true for the same integral domain $A$.
DEFINITION 63.2. Let $A$ be an integral domain. The characteristic of $A$ is defined to be zero if $A$ satisfies Theorem 63.1(a), and it is defined to be the prime $p$ if $A$ satisfies Theorem 63.1(b). We say that $A$ is of prime characteristic if its characteristic is some prime $p$.

The characteristic of the rings of integers, the rational numbers, the real numbers, and the complex numbers is zero. In fact a more general result can be proved.

THEOREM 63.3. The characteristic of every ordered integral domain $A$ is zero.

Proof. Suppose that $x > 0$ in $A$. Then

$$2x = x + x > 0 + x = x, \quad 3x = 2x + x > x + x = 2x,$$

and so on, so that $nx > 0$ for all $n \in N$. In particular, $nx \neq 0$ for all $n \in N$. If $x < 0$, then $-x > 0$, and $-(nx) = n(-x) \neq 0$ for all $n \in N$. Therefore, $nx \neq 0$ for all $n \in N$ and all nonzero $x \in A$. Consequently, Theorem 63.1(a) is satisfied, and the characteristic of $A$ is zero.
* In order to avoid confusing the identity of A with the natural number 1, we do not follow the custom of denoting the identity of A by 1 in this proof.
The fields in Examples 2 and 3 of Section 62 have characteristic 2. It is possible to construct fields of arbitrary prime characteristic.

EXAMPLE 1. Let $p$ be a prime. Define $Z_p = \{0, 1, 2, \ldots, p - 1\}$. Instead of the usual addition, negation, and multiplication of integers, define operations $\oplus$, $\ominus$, and $\odot$ by

$a \oplus b = d$, where $d$ is the unique integer such that $0 \le d < p$, and $d \equiv a + b \pmod{p}$;
$\ominus a = e$, where $e$ is the unique integer such that $0 \le e < p$, and $e \equiv -a \pmod{p}$;
$a \odot b = f$, where $f$ is the unique integer such that $0 \le f < p$, and $f \equiv a \cdot b \pmod{p}$.

The fact that $Z_p$ is a ring with respect to these operations is a consequence of the elementary properties of integers and the relation of congruence. For example, to prove that $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, let $a + b \equiv d \pmod{p}$, where $0 \le d < p$. Then $(a \oplus b) \oplus c = e$, where $e$ is determined by the conditions $0 \le e < p$ and $d + c \equiv e \pmod{p}$. Thus, $(a + b) + c \equiv e \pmod{p}$. In exactly the same way, $a \oplus (b \oplus c) = f$, where $f$ is the integer determined by the conditions $0 \le f < p$ and $a + (b + c) \equiv f \pmod{p}$. Since $(a + b) + c = a + (b + c)$, it follows that $e \equiv f \pmod{p}$. However, $e$ and $f$ satisfy $0 \le e < p$ and $0 \le f < p$. Thus, $e = f$, and consequently $a \oplus (b \oplus c) = (a \oplus b) \oplus c$. The other ring postulates are proved similarly. Note that $Z_p$ is commutative and has 1 as an identity. These facts do not depend on $p$ being a prime. However, the assumption that $p$ is prime is needed to show that $Z_p$ is a field. Suppose that $a \neq 0$ in $Z_p$. Then $0 < a < p$, so that $p$ does not divide $a$. Thus, since $p$ is prime, $(a, p) = 1$. Consequently if $b \in Z_p$, there exists (by Theorem 57.2) an integer $m$ such that $am \equiv b \pmod{p}$. Let $c$ be the unique integer such that $0 \le c < p$ and $m \equiv c \pmod{p}$. Then $c \in Z_p$, and $a \cdot c \equiv am \equiv b \pmod{p}$. Since $0 \le b < p$, it follows that $a \odot c = b$. Hence, by Definition 62.1, $Z_p$ is a field. Our final observation concerning $Z_p$ is that its characteristic is $p$. In fact

$$\underbrace{1 \oplus 1 \oplus \cdots \oplus 1}_{p \text{ summands}} \equiv \underbrace{1 + 1 + \cdots + 1}_{p \text{ summands}} = p \equiv 0 \pmod{p}.$$

Thus, $p1 = 0$ in $Z_p$.
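The operations of $Z_p$ are reductions modulo $p$ of the ordinary integer operations, which makes them easy to experiment with. In the Python sketch below the function names (`zp_add`, `zp_neg`, `zp_mul`, `zp_quotient`, `characteristic`) are hypothetical, and the quotient is found by searching through $Z_p$ rather than by the method of Theorem 57.2; the search is guaranteed to succeed only when $p$ is prime.

```python
def zp_add(a, b, p):
    """The sum of a and b in Z_p: the remainder of a + b modulo p."""
    return (a + b) % p

def zp_neg(a, p):
    """The negative of a in Z_p."""
    return (-a) % p

def zp_mul(a, b, p):
    """The product of a and b in Z_p."""
    return (a * b) % p

def zp_quotient(b, a, p):
    """The element c of Z_p with a * c = b in Z_p (assumes p prime, a != 0)."""
    if a % p == 0:
        raise ZeroDivisionError("a must be nonzero in Z_p")
    return next(c for c in range(p) if zp_mul(a, c, p) == b)

def characteristic(p):
    """Least n >= 1 such that 1 added to itself n times gives 0 in Z_p."""
    total, n = 1 % p, 1
    while total != 0:
        total = zp_add(total, 1, p)
        n += 1
    return n

print(zp_add(5, 4, 7), zp_neg(3, 7), zp_mul(5, 4, 7))  # 2 4 6
print(zp_quotient(3, 5, 7))                             # 2
print(characteristic(7))                                # 7
```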
The fields $Z_p$, defined in the above example, are almost as important in mathematics as the fields $Q$ and $R$. They connect the theory of numbers with the powerful methods of abstract algebra. In Chapter 9, we will see an example of how a rather simple theorem about abstract fields can be translated into an important result of number theory when it is specialized to a statement about $Z_p$. For the sake of future reference, we collect some useful facts about $Z_p$.

THEOREM 63.4.
(a) The elements of $Z_p$ are the integers $0, 1, 2, \ldots, p - 1$.
(b) $Z_p$ is a field with respect to the operations $\oplus$, $\ominus$, $\odot$.
(c) If $a$ and $b$ are elements of $Z_p$, then

$$a + b \equiv a \oplus b, \quad -a \equiv \ominus a, \quad a \cdot b \equiv a \odot b \pmod{p}.$$
2. Complete the proof that $Z_p$ is a commutative ring.
3. (a) Show that the field $Z_2$ is isomorphic to the field described in Example 2, Section 62. (b) Show that the field given in Example 3, Section 62, is not isomorphic to any of the fields $Z_p$.
4. Show that a field which has only a finite number of elements cannot have zero characteristic.
5. Let $A$ be an integral domain of prime characteristic $p$. Show that for any $x$ and $y$ in $A$, $(x + y)^p = x^p + y^p$.
6. Let $A$ be an integral domain with identity element $e$. Let $B$ be the subring of $A$ consisting of all elements $ae$ with $a \in Z$ (see Problem 10, Section 46). (a) Show that if the characteristic of $A$ is zero, then the correspondence
is an isomorphism between $Z$ and $B$. (b) Show that if the characteristic of $A$ is the prime $p$, then the correspondence
64 Equivalence relations. In Section 61, the system of rational numbers was described informally. Some basic properties of this system were listed in (61.1), and the consequences of these properties were discussed. We now face the problem of constructing the rational numbers. Our goal is to define a set of objects, operations of addition, negation, and multiplication, and an ordering of the objects, such that the conditions of (61.1) are satisfied. What the objects called rational numbers really are does not matter very much. They stand for different things in different applications. The important thing is to satisfy the conditions of (61.1). It is a fact (and not very difficult to prove) that any two systems which satisfy these conditions are isomorphic in the sense of Section 33. This remark explains why the problem of constructing the rational numbers is equivalent to the construction of a system which satisfies these conditions.

How should a set of objects which satisfies the conditions of (61.1) be defined? By (61.1d), each rational number can be represented by an expression $a/m$, where $a$ represents an integer, $m$ represents a natural number, and the solidus bar / is a punctuation mark which separates $a$ and $m$. This describes the symbol for a rational number in a purely formal way, and points out the fact that a rational number is really determined by the pair of numbers $a$ and $m$, written in a definite order. Therefore, the symbol $(a, m)$, which denotes an ordered pair of numbers, can be used to represent the rational number $a/m$. The set of all ordered pairs $(a, m)$ with $a \in Z$ and $m \in N$ is a definite collection of objects. Our discussion suggests that this set of ordered pairs might be a likely candidate for the set $Q$ of rational numbers. However, there is a difficulty with this choice for $Q$. It follows from (61.3) that different expressions $a/m$ and $b/n$ can represent the same rational number. For example, $\frac{1}{2} = \frac{2}{4} = \frac{3}{6}$. In fact, by (61.3a), $a/m = b/n$ if $na = mb$.

Thus, in the collection of ordered pairs $(a, m)$, we must agree to somehow identify the pairs $(a, m)$ and $(b, n)$ when $na = mb$. This identification procedure is based on the important concept of an equivalence relation. Although our immediate interest is the construction of the system of rational numbers, the methods which will be discussed in the present section are applicable to very general situations.

The term "relation" is familiar to almost everyone. Mathematicians use many particular relations such as inequalities of numbers, inclusions of sets, congruence of integers, and similarity of geometric figures. In addition to these specific examples, the general notion of a relation on a set is of great importance in mathematics.
DEFINITION 64.1. Let $S$ be any set. A relation on $S$ is a set $T$ of ordered pairs $(x, y)$ of elements of $S$.

At first, this definition seems to be far from the usual meaning of the term "relation." It would be less strange if we said that the relation "corresponds to" the set $T$ of all ordered pairs $(x, y)$ of elements of $S$ which stand in the given relation. For example, the relation $<$ on the set of all integers corresponds to the set of all ordered pairs $(a, b)$ which satisfy $a < b$ (or more explicitly, $b - a \in N$). The trouble is that there are relations on sets of objects which correspond to the same set of ordered pairs, but would be considered different by familiar standards. Consider a set of three brothers, Jack, Jerry, and Jim. Suppose that Jack is 8 years old and 5 feet tall, Jerry is 5 years old and 4 feet tall, while Jim is 3 years old and 3 feet tall. Then the relations "is older than" and "is taller than" applied to this set both correspond to the following set of ordered pairs: {(Jack, Jerry), (Jack, Jim), (Jerry, Jim)}. From a mathematician's standpoint, these two relations are the same even though the concepts "older" and "taller" would lead to different relations on another collection of people.

Simplification is a characteristic of mathematics. Considering two relations to be identical if the corresponding sets of ordered pairs are the same is a typical example of simplifying a familiar concept so that it can be used with mathematical precision. This is the justification for Definition 64.1.

Although a relation is defined to be a set $T$ of ordered pairs, it is often convenient to express the fact that a certain pair $(x, y)$ belongs to $T$ by writing $x < y$, $x \equiv y$, $x \sim y$, $x \approx y$, or $x \cong y$. That is, a symbol such as $\sim$ is associated with the relation $T$, and if $(x, y) \in T$, then we write

$$x \sim y$$

and speak of the relation $\sim$ on $S$ (meaning, of course, the relation $T$). The particular relations of congruence and inequality which we have considered in previous sections have all been defined and expressed in this way.

The most important and useful relations in mathematics satisfy certain special conditions. The equivalence relations which we will discuss in this section are defined to be relations which satisfy three particular conditions.
DEFINITION 64.2. Let S be a set. Let T be a relation on S. Theii T is an equivalence relation* if (a) (x, 2) E T for al1 x E S ; (b) if (x, y) E T, then (y, x) E T ; (c) if ( x , y) E T and (y, 2) E T , then (5, x) E T .
* This notion should not be confused with the concept of the "equivalence of two sets" which was introduced in Section 12. The relation of equivalence of sets defined in Definition 12.3 is a particular equivalence relation on the class of all sets.
If we write $x \sim y$ to stand for $(x, y) \in T$, then the conditions (a), (b), and (c) take a more familiar form:
(a') $x \sim x$ for all $x \in S$;
(b') if $x \sim y$, then $y \sim x$;
(c') if $x \sim y$ and $y \sim z$, then $x \sim z$.

The condition (a) or its equivalent (a') is called the "reflexive law." The properties (b) and (b') are called the "law of symmetry," while (c) and (c') are called the "transitive law." Because of the transitive law, it makes sense to write a sequence of equivalences such as
$$x \sim y \sim z \sim w,$$
as we have done in the case of inequalities or congruences. Also, by the reflexivity, such a sequence can include equalities, as in
$$x \sim y = z \sim w.$$
The convenience of writing sequences of equivalences and equalities is one reason why the notation $a \sim b$ is preferred to $(a, b) \in T$.
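For a finite set, the three laws of Definition 64.2 can be checked mechanically. The following is only an illustrative sketch: the function name `is_equivalence` and the two sample relations on {0, 1, ..., 5} are assumptions made here, not part of the text.

```python
from itertools import product

def is_equivalence(S, T):
    """Check conditions (a), (b), (c) of Definition 64.2 for a relation T
    (a set of ordered pairs) on a finite set S."""
    S, T = set(S), set(T)
    reflexive = all((x, x) in T for x in S)
    symmetric = all((y, x) in T for (x, y) in T)
    transitive = all((x, w) in T
                     for (x, y) in T
                     for (z, w) in T
                     if y == z)
    return reflexive and symmetric and transitive

# Two sample relations on S = {0, 1, 2, 3, 4, 5} (hypothetical test data).
S = range(6)
congruent_mod_3 = {(a, b) for a, b in product(S, S) if (a - b) % 3 == 0}
less_than = {(a, b) for a, b in product(S, S) if a < b}

print(is_equivalence(S, congruent_mod_3))  # True
print(is_equivalence(S, less_than))        # False: < is transitive only
```

The relation $<$ fails because it is neither reflexive nor symmetric, while congruence modulo 3 satisfies all three laws.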
EXAMPLE 1. Let $S$ be any set. Let $T = \{(x, x) \mid x \in S\}$. Then $T$ is an equivalence relation on $S$. This equivalence relation is ordinary equality, since $(x, y) \in T$ if and only if $x = y$.

EXAMPLE 2. Let $m$ be a natural number. Let $T = \{(a, b) \mid a \in Z,\ b \in Z,\ a \equiv b \pmod{m}\}$. Then $T$ is the equivalence relation on the set $Z$ of all integers (see Theorem 56.3), which was called "congruence modulo $m$" in Chapter 5.

EXAMPLE 3. Let $S$ be the set of all ordered pairs $(m, n)$ of natural numbers. Let $T$ be the collection of all ordered pairs of ordered pairs $((m, n), (k, l))$ satisfying $m + l = n + k$. Then $T$ is an equivalence relation on $S$.
EXAMPLE 4. Let $S$ be the set of all ordered pairs $(a, m)$ where $a \in Z$ and $m \in N$. Define $T$ to be the set of all ordered pairs of ordered pairs
$$((a, m), (b, n))$$
such that $na = mb$. Then $T$ is an equivalence relation on $S$. To prove the transitive law for example, suppose that
$$((a, m), (b, n)) \in T \quad\text{and}\quad ((b, n), (c, k)) \in T.$$
Then $na = mb$ and $kb = nc$. Therefore,
$$nka = mkb = mnc = nmc.$$
Hence, by cancellation, $ka = mc$. Thus, by definition,
$$((a, m), (c, k)) \in T.$$
The last example is the equivalence relation which will lead to the construction of the system $Q$ of rational numbers. For convenience, we will use the symbol $\doteq$ to denote this relation. That is, write
$$(a, m) \doteq (b, n) \quad\text{if}\quad na = mb.$$
Example 3 is similar to Example 4. It is possible to use this equivalence relation to obtain a new construction of the integers from the natural numbers. The process is analogous to the construction of $Q$ using the equivalence relation of Example 4, which will be given in the next section.

DEFINITION 64.3. Let $S$ be a set and let $\sim$ be an equivalence relation on $S$. For $x \in S$, define
$$[x] = \{y \in S \mid x \sim y\}.$$
The set $[x]$ is called the equivalence class of the element $x$ with respect to the equivalence relation $\sim$.

It should be remembered that the definition of $[x]$ depends on the equivalence relation $\sim$, although this fact is not indicated by the notation.

THEOREM 64.4. Let $\sim$ be an equivalence relation on a set $S$. Then
(a) $x \in [x]$;
(b) $[x] = [y]$ if and only if $x \sim y$;
(c) if $y \in [x]$, then $[x] = [y]$;
(d) for any $x \in S$ and $y \in S$, either $[x] = [y]$ or $[x] \cap [y] = \emptyset$;
(e) $S = \bigcup(\{[x] \mid x \in S\})$.
Proof. By Definition 64.2(a'), $x \sim x$. Thus by Definition 64.3, $x \in [x]$. This proves (a).

To prove (b), suppose first that $[x] = [y]$. Then $y \in [y] = [x]$. Thus, by Definition 64.3, $x \sim y$. Conversely, suppose that $x \sim y$. If $z \in [y]$, then $y \sim z$. Therefore, by Definition 64.2(c'), $x \sim z$. Hence, $z \in [x]$. This shows that $x \sim y$ implies $[y] \subseteq [x]$. Similarly, $y \sim x$ implies $[x] \subseteq [y]$. Consequently, since $y \sim x$ follows from $x \sim y$ by Definition 64.2(b'), we obtain $x \sim y$ implies $[x] = [y]$. This proves (b).

If $y \in [x]$, then by Definition 64.3, $x \sim y$. Therefore, $[x] = [y]$ by what has just been shown. This proves (c).

In order to obtain (d), assume that $[x] \cap [y] \neq \emptyset$. Then there is a $z \in S$ such that $z \in [x]$ and $z \in [y]$. By the property (c) which we have just established, $[x] = [z] = [y]$.

Finally (e) is evident because by Definition 64.3, $[x] \subseteq S$ for all $x$, so that $\bigcup(\{[x] \mid x \in S\}) \subseteq S$. On the other hand, by (a), if $y \in S$, then $y \in [y] \subseteq \bigcup(\{[x] \mid x \in S\})$. Hence, every element of $S$ belongs to the union $\bigcup(\{[x] \mid x \in S\})$.
  
 
EXAMPLE 5. Let $S$ be the set $Z$ of all integers and let $\sim$ be the relation of congruence modulo $m$. Then for any $a \in Z$, it is easy to see that $[a] = \{a + b \cdot m \mid b = 0, \pm 1, \pm 2, \ldots\}$. Therefore there are exactly $m$ distinct (and disjoint) equivalence classes; namely, $[0], [1], [2], \ldots, [m - 1]$. These are the sets which were denoted by $X_0, X_1, X_2, \ldots, X_{m-1}$ in Section 57.

EXAMPLE 6. Let $S$ be the set of all ordered pairs of natural numbers, and let $T$ be the equivalence relation on $S$ which was defined in Example 3. If $(m, n) \in S$, then $[(m, n)] = \{(k, l) \mid k - l = m - n\}$. Consequently, there is a one-to-one correspondence between the equivalence classes for this equivalence relation and the set of all integers; this correspondence is given by $[(m, n)] \leftrightarrow m - n$.

EXAMPLE 7. Let $S$ be the set $F$ of all ordered pairs $(a, m)$ with $a \in Z$, $m \in N$. The equivalence classes of $F$ with respect to the equivalence relation $\doteq$ defined in Example 4 are $[(a, m)] = \{(b, n) \mid na = mb\}$. By (61.3a), $na = mb$ if and only if $a/m = b/n$. Thus, $[(a, m)]$ consists of all pairs $(b, n)$ such that $b/n = a/m$. Therefore, there is a one-to-one correspondence between these equivalence classes and the rational numbers as we know them informally. This remark is the key to the formal construction of $Q$: the rational numbers are defined to be the equivalence classes of elements of $F$ with respect to $\doteq$.
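The statements of Theorem 64.4 can be observed directly on a finite piece of Example 5. The sketch below is an illustration under stated assumptions: the helper `equivalence_classes`, the modulus 4, and the window of integers from -8 to 7 are choices made here, and a finite window can only show part of each (infinite) class.

```python
def equivalence_classes(S, related):
    """Group the elements of a finite collection S into classes, assuming
    that `related` really is an equivalence relation on S."""
    classes = []
    for x in S:
        for cls in classes:
            if related(cls[0], x):   # by Theorem 64.4(b), one comparison suffices
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

m = 4                                # hypothetical modulus
window = range(-8, 8)                # finite stand-in for Z
classes = equivalence_classes(window, lambda a, b: (a - b) % m == 0)

for cls in classes:
    print(cls)

# (d): distinct classes are disjoint; (e): the classes cover the whole window.
all_members = [x for cls in classes for x in cls]
print(len(classes))                                 # 4
print(len(all_members) == len(set(all_members)))    # True
print(sorted(all_members) == list(window))          # True
```

Each printed list is the part of one class $[a]$ that falls inside the window, so exactly $m$ lists appear.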
2. Find all equivalence relations on the set $\{0, 1, 2\}$.

3. Give examples of a relation on the set $\{0, 1, 2\}$ which satisfies the following conditions of Definition 64.2.
(i) none of (a), (b), and (c)
(ii) (a), but neither (b) nor (c)
(iii) (b), but neither (a) nor (c)
(iv) (c), but neither (a) nor (b)
(v) (a) and (b), but not (c)
(vi) (a) and (c), but not (b)
(vii) (b) and (c), but not (a)

4. Show that the relations of Examples 3 and 4 are equivalence relations.
5. Let $S$ be the set of all straight lines in the Euclidean plane. Let $P_0$ be a fixed point in the plane. Let $l_0$ be a fixed line in the plane. State which of the conditions (a), (b), or (c) of Definition 64.2 are satisfied by the following relations.
(i) $l \sim m$ if $l$ is parallel or equal to $m$
(ii) $l \sim m$ if $l$ is not parallel to $m$
(iii) $l \sim m$ if $l$ is perpendicular to $m$
(iv) $l \sim m$ if $l$ is perpendicular to $m$, or if $l$ is parallel or equal to $m$
(v) $l \sim m$ if $l$ and $m$ both pass through $P_0$
(vi) $l \sim m$ if $l$ and $m$ intersect in a point on the line $l_0$
(vii) $l \sim m$ if $l \neq m$
6. Verify directly that the equivalence classes of integers modulo m given in Example 5 satisfy the statements of Theorem 64.4.
7. Let $S$ be the set of all natural numbers. Define $T = \{(m, n) \mid n \text{ divides } m^k \text{ for some } k, \text{ and } m \text{ divides } n^j \text{ for some } j\}$. Show that $T$ is an equivalence relation on $S$. What are the equivalence classes $[1]$, $[2]$, and $[6]$ with respect to the equivalence relation $T$?

8. A partition $P$ of a set $S$ is a set of nonempty subsets of $S$ such that (i) if $A \in P$, $B \in P$, and $A \neq B$, then $A \cap B = \emptyset$; (ii) $\bigcup(P) = S$.
(a) Show that if $\sim$ is an equivalence relation on $S$, then the set of all equivalence classes in $S$ (with respect to the relation $\sim$) is a partition of $S$.
(b) Suppose that $P$ is a partition of $S$. Define $x \sim y$ if $x \in A$ and $y \in A$, where $A$ is some set of $P$. Show that $\sim$ is an equivalence relation on $S$.

65 The construction of Q. Let $F$ be the set of all ordered pairs $(a, m)$, with $a \in Z$, $m \in N$.

DEFINITION 65.1. Let $Q$ be the set of all equivalence classes $[(a, m)]$ of $F$ with respect to the equivalence relation $\doteq$ defined by $(a, m) \doteq (b, n)$ if $na = mb$. The elements of $Q$ are called rational numbers.

It is necessary now to introduce operations of addition, negation, and multiplication on the set $Q$ defined above. The discussion of Example 7, Section 64, suggests that $[(a, m)]$ should be interpreted as the quotient $a/m$. On the basis of this interpretation, the laws given in (61.3) motivate the definitions of the operations and the ordering in $Q$. For example, since $a/m + b/n = (na + mb)/mn$, it is natural to define
$$[(a, m)] + [(b, n)] = [(na + mb, mn)]. \qquad (62)$$
A little thought is required to see that this definition makes sense. Consider a particular example:
According to (62),
so that by (62),
Therefore, in order to justify (62), it is necessary to show that $[(7, 6)] = [(42, 36)]$. By Theorem 64.4, this condition is equivalent to
$$(7, 6) \doteq (42, 36),$$
which is easily verified from the definition of $\doteq$, since $36 \cdot 7 = 6 \cdot 42 = 252$. Of course, in order to justify (62) generally, we must work with expressions which involve arbitrary numbers. However, before proceeding with this calculation, let us formulate the exact definitions of the operations and the ordering in $Q$.
DEFINITION 65.2. Let $r$ and $s$ be elements of $Q$. That is, $r$ and $s$ are equivalence classes of ordered pairs, with respect to the relation $\doteq$. Arbitrarily, select $(a, m) \in r$ and $(b, n) \in s$. Define
(a) $r + s = [(na + mb, mn)]$,
(b) $-r = [(-a, m)]$,
(c) $r \cdot s = [(ab, mn)]$,
(d) $r < s$ if $na < mb$.

Frequently, as in this case, mathematical definitions are made to depend on an arbitrary choice of one or more things. Whenever this happens, it is necessary to show that the object being defined does not really depend on these initial choices. If this can be proved, then the object is said to be well defined. The fact which must be proved in order to justify Definition 65.2 is that the equivalence classes $[(na + mb, mn)]$, $[(-a, m)]$, $[(ab, mn)]$ and the condition $na < mb$ are the same for all choices of $(a, m) \in r$ and $(b, n) \in s$.
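The independence from the choice of representatives can be spot-checked numerically before it is proved. The sketch below is only an illustration: the function names and the particular representatives (three pairs for the class of $(1, 2)$ and three for the class of $(2, 3)$) are assumptions chosen here. They happen to reproduce the classes $[(7, 6)]$ and $[(42, 36)]$ mentioned above, but they are not claimed to be the example used in the text.

```python
from itertools import product

def equivalent(p, q):
    """(a, m) is equivalent to (b, n) if na = mb."""
    (a, m), (b, n) = p, q
    return n * a == m * b

def add(p, q):          # Definition 65.2(a)
    (a, m), (b, n) = p, q
    return (n * a + m * b, m * n)

def neg(p):             # Definition 65.2(b)
    a, m = p
    return (-a, m)

def mul(p, q):          # Definition 65.2(c)
    (a, m), (b, n) = p, q
    return (a * b, m * n)

def less(p, q):         # Definition 65.2(d)
    (a, m), (b, n) = p, q
    return n * a < m * b

# Hypothetical representatives of two classes r and s.
r_reps = [(1, 2), (3, 6), (5, 10)]
s_reps = [(2, 3), (4, 6), (10, 15)]

sums = [add(p, q) for p, q in product(r_reps, s_reps)]
prods = [mul(p, q) for p, q in product(r_reps, s_reps)]
negs = [neg(p) for p in r_reps]
orders = {less(p, q) for p, q in product(r_reps, s_reps)}

print(add((1, 2), (2, 3)), add((3, 6), (4, 6)))    # (7, 6) (42, 36)
print(all(equivalent(sums[0], x) for x in sums))   # True
print(all(equivalent(prods[0], x) for x in prods)) # True
print(all(equivalent(negs[0], x) for x in negs))   # True
print(orders)                                      # {True}
```

A check of this kind only tests finitely many choices; the general statement still requires the algebraic argument.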
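Suppose that $(a, m)$ and $(a', m')$ both belong to $r$, and that $(b, n)$ and $(b', n')$ both belong to $s$. Then $(a, m) \doteq (a', m')$ and $(b, n) \doteq (b', n')$, that is,
$$m'a = ma' \quad\text{and}\quad n'b = nb'.$$
It must be shown that
$$[(na + mb, mn)] = [(n'a' + m'b', m'n')], \quad [(-a, m)] = [(-a', m')], \quad [(ab, mn)] = [(a'b', m'n')],$$
and
$$na < mb \quad\text{if and only if}\quad n'a' < m'b'.$$
By Theorem 64.4(b) and the definition of $\doteq$, these conditions are equivalent to $m'n'(na + mb) = mn(n'a' + m'b')$, $m'(-a) = m(-a')$, $(m'n')(ab) = (mn)(a'b')$, and $na < mb$ if and only if $n'a' < m'b'$.

These results can easily be obtained from the relations $m'a = ma'$ and $n'b = nb'$. For example,
$$m'n'(na + mb) = (n'n)(m'a) + (m'm)(n'b) = (n'n)(ma') + (m'm)(nb') = mn(n'a' + m'b').$$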
Also, if $na < mb$, then $m'n'na < m'n'mb$, since $m'$ and $n'$ are natural numbers. Thus, $n'n \cdot ma' < m'm \cdot nb'$, or $(mn)(n'a') < (mn)(m'b')$. Therefore, $n'a' < m'b'$. In the same way, $n'a' < m'b'$ implies $na < mb$. The remaining identities are left for the reader to check.

THEOREM 65.3. The set $Q$ defined in Definition 65.1, together with the operations and the ordering given by Definition 65.2, is an ordered integral domain.
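Proof. The proof of this result is entirely straightforward, as the following sample indicates. Let $r$, $s$, and $t$ be elements of $Q$, that is, equivalence classes of ordered pairs. We will prove the distributive law $(r + s) \cdot t = r \cdot t + s \cdot t$. Let $(a, m) \in r$, $(b, n) \in s$, and $(c, k) \in t$. Then $(r + s) \cdot t = [(na + mb, mn)] \cdot [(c, k)]$. Note that $(na + mb, mn)$ belongs to its equivalence class $[(na + mb, mn)]$, so we may choose it to form the product $[(na + mb, mn)] \cdot [(c, k)]$. By Definition 65.2(c),
$$(r + s) \cdot t = [((na + mb)c, mnk)] = [(nac + mbc, mnk)].$$
On the other hand, by Definition 65.2(c) and (a),
$$r \cdot t + s \cdot t = [(ac, mk)] + [(bc, nk)] = [(nk \cdot ac + mk \cdot bc, mk \cdot nk)].$$
Since $(mk \cdot nk)(nac + mbc) = (mnk)(nk \cdot ac + mk \cdot bc)$, the pairs $(nac + mbc, mnk)$ and $(nk \cdot ac + mk \cdot bc, mk \cdot nk)$ are equivalent. Hence, by Theorem 64.4(b), $(r + s) \cdot t = r \cdot t + s \cdot t$.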
The construction of $Q$ given in this section encounters the following problem: the set $Q$ defined in Definition 65.1 does not contain the set $Z$ of all integers. However, $Q$ does contain the subset
$$Z' = \{[(a, 1)] \mid a \in Z\},$$
which has all the properties of $Z$. Indeed, by Definition 65.2,
$$[(a, 1)] + [(b, 1)] = [(a + b, 1)], \quad -[(a, 1)] = [(-a, 1)], \quad [(a, 1)] \cdot [(b, 1)] = [(ab, 1)],$$
and
$$[(a, 1)] < [(b, 1)] \quad\text{if and only if}\quad a < b. \qquad (63)$$
It follows that the correspondence $a \leftrightarrow [(a, 1)]$ is an isomorphism between $Z$ and $Z'$. In fact, if $a \neq b$, then $1 \cdot a \neq 1 \cdot b$, so that $(a, 1)$ is not equivalent to $(b, 1)$ under the relation $\doteq$. Hence, by Theorem 64.4, $[(a, 1)] \neq [(b, 1)]$. This proves that the correspondence is one-to-one. The other conditions required for an isomorphism are easily obtained from (63).
In order to satisfy condition (b) of (61.1), we will identify each equivalence class $[(a, 1)]$ with the corresponding integer $a$. That is, $[(a, 1)]$ will be considered as a new label for $a$. The process of identifying a ring $A$ (or a more general mathematical system) with a subring $B$ of another ring is used frequently in mathematical constructions. This is always possible if $A$ is isomorphic to $B$, and if we are only concerned with properties which are consequences of the operations in $A$. Indeed, from this viewpoint, there is no real difference between $A$ and $B$. The identification of $Z$ with $Z'$ carries with it the identification of $N$ with $N' = \{[(m, 1)] \mid m \in N\}$. Thus, we obtain $N \subset Z \subset Q$. It remains to show that (61.1c, d) are satisfied.

THEOREM 65.4. For any $a \in Z$ and $m \in N$,
$$[(m, 1)] \cdot [(a, m)] = [(a, 1)].$$

Proof. Since $(m, 1) \in [(m, 1)]$ and $(a, m) \in [(a, m)]$, it follows from Definition 65.2 that
$$[(m, 1)] \cdot [(a, m)] = [(ma, m)].$$
By the definition of the relation $\doteq$, $(ma, m) \doteq (a, 1)$. Thus, by Theorem 64.4(b), $[(ma, m)] = [(a, 1)]$.

From this theorem and the identification of the integer $a$ with the equivalence class $[(a, 1)]$, we obtain the result that $Q$ satisfies (61.1c, d).

THEOREM 65.5. If $a \in Z$ and $m \in N$, then $[(m, 1)]$ divides $[(a, 1)]$ in $Q$. Moreover, every element of $Q$ is of the form $[(a, 1)]/[(m, 1)]$ for some $a \in Z$ and $m \in N$.
1. Complete the proof that the operations of Definition 65.2(b) and (c) are well defined.
resulting ring? Show that if $m$ is a prime, then the ring obtained by this construction is a field which is isomorphic to $Z_m$.
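5. Let $S$ be the set of all ordered pairs $(m, n)$ of natural numbers with the equivalence relation defined in Example 3, Section 64:
$$(k, l) \sim (m, n) \quad\text{if}\quad k + n = l + m.$$
Define
$$[(k, l)] + [(m, n)] = [(k + m, l + n)],$$
$$-[(m, n)] = [(n, m)],$$
$$[(k, l)] \cdot [(m, n)] = [(km + ln, kn + lm)],$$
$$[(k, l)] < [(m, n)] \quad\text{if}\quad k + n < l + m.$$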
(a) Prove that these operations are well defined on the set $Z'$ of equivalence classes of $S$, and that with these operations, $Z'$ is an ordered integral domain.
(b) Show that the correspondence $[(m, n)] \leftrightarrow m - n$ is an isomorphism between $Z'$ and the ring $Z$ of all integers.
CHAPTER 7
* However, the theory of proportions developed by the Greek mathematician Eudoxus (408-355 B.C.) can be considered as a geometrical analogue of Dedekind's development of the real numbers.
There is evidence that this discovery was not made by Pythagoras himself, but rather by one of his followers.
711
DEVELOPMENT
225
Therefore, $m^2$ is even. Consequently, $m$ is even. Thus, the number $a$ has the factor 2 in common with $m$. This contradicts the fact that $a$ and $m$ were selected to be relatively prime. Hence our assumption that 2 is the square of a rational number must be false.

By using the more sophisticated facts about the integers which were established in Chapter 5, we can prove a general theorem, from which Pythagoras' result is obtained as a special instance.

THEOREM 71.1. Let $m$ be a natural number, and let $a$ be an integer. If there is a rational number $r$ such that
$$r^m = a,$$
then $r$ is necessarily an integer.

This theorem does not state that there is, or is not, a solution of the equation $x^m = a$. Such results depend on $m$ and $a$. For example, if $a = 4$ and $m = 2$, then $x^m = a$ has two rational solutions $x = 2$ and $x = -2$; on the other hand we have just seen that if $a = 2$ and $m = 2$, then $x^m = a$ has no solution which is a rational number. What Theorem 71.1 says is that in searching for a rational number $r$ such that $r^m = a$, we can restrict our attention to integers.

To prove the theorem, suppose that $r$ is a rational number such that $r^m = a$. Then $r$ can be represented as a quotient $b/n$, where $b \in Z$, $n \in N$, and $b$ is relatively prime to $n$. Consequently,
$$b^m = a n^m.$$
If $n \neq 1$, then there is a prime $p$ such that $p \mid n$. Consequently, $p \mid b^m$ and therefore, by (53.2), $p \mid b$. However, this is impossible, since $n$ and $b$ are relatively prime. Thus,
$$n = 1 \quad\text{and}\quad r = b \in Z.$$
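Theorem 71.1 turns the search for rational solutions of $x^m = a$ into a finite search over integers. The sketch below illustrates this; the function name `rational_roots` and the search bound $|x| \le |a|$ are choices made here (the bound is valid because an integer $x$ with $x^m = a$ satisfies $|x| \le |a|$).

```python
def rational_roots(a, m):
    """All rational solutions of x**m == a, for an integer a and a natural
    number m. By Theorem 71.1 every such solution is an integer, and an
    integer solution satisfies |x| <= |a|, so a finite search suffices."""
    bound = abs(a)
    return [x for x in range(-bound, bound + 1) if x ** m == a]

print(rational_roots(4, 2))     # [-2, 2]
print(rational_roots(2, 2))     # []   (2 is not the square of a rational number)
print(rational_roots(-27, 3))   # [-3]
```

An empty result therefore means that the equation has no rational solution at all, not merely no integer solution.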
Let us now see how Theorem 71.1 implies that $\sqrt{2}$ is irrational. Suppose that $\sqrt{2} = r \in Q$. Then $r^2 = 2$. Thus, by Theorem 71.1 (using the case $m = 2$, $a = 2$), it follows that $r$ is an integer. However, this is impossible, since 2 is a prime. Therefore, 2 cannot be the square of a rational number. The discovery of the irrationality of $\sqrt{2}$ did not lead Greek mathematicians to the introduction of real numbers, although it did inspire the development of Eudoxus' theory of incommensurable line segments (which is the geometrical analogue of Dedekind's theory of real numbers, created 2200 years later). The Greeks of Euclid's era carefully separated arithmetic from geometry, and as a result, concepts such as length and
area had only a geometrical meaning for them. The use of numbers in practical computations was spurned by the Greek ruling class as being contrary to "Platonic idealism." Nevertheless, fractions were well known and commonly used in ancient Greece. In fact, there was one outstanding exception to the purism of the Greek geometers. That was Archimedes of Syracuse (287-212 B.C.), who is considered to be the greatest of all mathematicians before Isaac Newton (1642-1727). Archimedes used numbers extensively in his studies of volumes and areas, and his idea of successive approximation is close to the modern conception of real numbers.

The intuitive notion of a real number as a specific object appeared late in the Renaissance. To be sure, rational approximations of particular numbers such as roots of integers and $\pi$ were found and used by the Babylonians even before the period when Greek mathematics flourished. However, the idea of the system of all real numbers developed only after the introduction around 1600 of the familiar "decimal point" notation for decimal fractions. A decimal fraction is a rational number $r$ of the form
$$r = a_m 10^m + a_{m-1} 10^{m-1} + \cdots + a_1 10 + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n},$$
where the $a$'s and $b$'s are integers between 0 and 9 inclusive, or else $r$ is the negative of such a number. If $r = a_m 10^m + a_{m-1} 10^{m-1} + \cdots + a_1 10 + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n}$, then $r$ is represented by the expression
$$a_m a_{m-1} \cdots a_1 a_0 . b_1 b_2 \cdots b_n.$$
The number of digits following the decimal point is called the number of decimal places in the representation of the decimal fraction $r$. Using Theorem 51.3, we can easily see that a rational number $r$ is a decimal fraction if and only if $r$ can be written in the form $r = a/10^k$, with $a \in Z$ and $k$ some nonnegative integer. Thus, not every rational number is a decimal fraction. However, every rational number can be approximated by a decimal fraction. In fact, one of our main goals after defining the real numbers is to show that every real number can be approximated by a decimal fraction.
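The criterion $r = a/10^k$ can be tested mechanically: written in lowest terms, $r$ is a decimal fraction exactly when its denominator has no prime factors other than 2 and 5. The sketch below relies on this standard consequence of unique factorization; the function name `decimal_places` is an assumption made here.

```python
from fractions import Fraction

def decimal_places(r):
    """Return the least k >= 0 with r = a / 10**k for some integer a,
    or None if the rational number r is not a decimal fraction."""
    d = Fraction(r).denominator      # denominator in lowest terms
    twos = fives = 0
    while d % 2 == 0:
        d //= 2
        twos += 1
    while d % 5 == 0:
        d //= 5
        fives += 1
    # Any prime factor other than 2 or 5 rules out the form a / 10**k.
    return max(twos, fives) if d == 1 else None

print(decimal_places(Fraction(3, 8)))    # 3     (3/8 = 0.375)
print(decimal_places(Fraction(7, 20)))   # 2     (7/20 = 0.35)
print(decimal_places(Fraction(1, 3)))    # None  (1/3 is not a decimal fraction)
```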
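EXAMPLE 1. The rational number $\tfrac{1}{3}$ is not a decimal fraction. For $\tfrac{1}{3} = a/10^k$ implies $10^k = 2^k 5^k = 3a$, which is impossible by the fundamental theorem of arithmetic. However,
$$0.3 = \tfrac{3}{10} < \tfrac{1}{3} < \tfrac{4}{10} = 0.4,$$
$$0.33 = \tfrac{33}{100} < \tfrac{1}{3} < \tfrac{34}{100} = 0.34,$$
$$0.333 = \tfrac{333}{1000} < \tfrac{1}{3} < \tfrac{334}{1000} = 0.334, \ \ldots$$

EXAMPLE 2. It is possible to approximate $\sqrt{2}$ by trial and error. Note that $1^2 = 1 < 2$ and $2^2 = 4 > 2$. It therefore seems plausible that $\sqrt{2}$ lies between a pair of decimal fractions $1.b$ and $1.(b + 1)$, where $0 \le b < b + 1 \le 9$. By trial, we obtain
$$(1.4)^2 = 1.96 < 2 < 2.25 = (1.5)^2.$$
Thus, apparently $\sqrt{2}$ lies between 1.4 and 1.5. If this procedure is repeated, additional decimal places are obtained:
$$(1.41)^2 = 1.9881 < 2 < 2.0164 = (1.42)^2,$$
$$(1.414)^2 = 1.999396 < 2 < 2.002225 = (1.415)^2,$$
$$(1.4142)^2 = 1.99996164 < 2 < 2.00024449 = (1.4143)^2,$$
$$(1.41421)^2 = 1.9999899241 < 2 < 2.0000182084 = (1.41422)^2.$$
Therefore, a perfectly straightforward search procedure yields a decimal fraction whose square is as close to 2 as desired. Our calculation shows that the square of 1.41421 differs from 2 by about 0.00001. By continuing the computation, we could improve this estimate as much as we wish.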
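The trial-and-error search of Example 2 can be carried out with exact integer arithmetic. The following is a minimal sketch, assuming the search is organized one decimal place at a time; the function name `sqrt2_decimal` is hypothetical.

```python
from fractions import Fraction

def sqrt2_decimal(places):
    """Largest decimal fraction with the given number of decimal places
    whose square does not exceed 2, found one digit at a time by trial."""
    x = 1                                   # 1**2 < 2 < 2**2
    for k in range(1, places + 1):
        x *= 10                             # make room for the next digit
        target = 2 * 10 ** (2 * k)          # compare x**2 with 2 * 10**(2k)
        while (x + 1) ** 2 <= target:       # try the digits 1, 2, ..., 9
            x += 1
    return Fraction(x, 10 ** places)

approx = sqrt2_decimal(5)
print(approx)                    # 141421/100000
print(float(2 - approx ** 2))    # roughly 0.00001
```

Working with the integer $x = 10^k \cdot (\text{approximation})$ avoids any rounding error, so each comparison is exactly the comparison of a square with 2 made in the example.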
The above examples suggest that infinite decimal sequences might be introduced to represent real numbers. Imagine an endless sequence of successive decimal fractions
$$a_m \cdots a_1 a_0 . b_1, \quad a_m \cdots a_1 a_0 . b_1 b_2, \quad a_m \cdots a_1 a_0 . b_1 b_2 b_3, \ \ldots$$
approximating a real number $u$. As in the above examples, it should be possible to obtain the $n$-place decimal approximation from the $(n - 1)$-place decimal approximation by adding a single decimal digit. Then, the result of all the approximations can be conveniently represented by an infinite sequence of decimal digits:
$$a_m a_{m-1} \cdots a_1 a_0 . b_1 b_2 b_3 \cdots.$$
From this infinite expression, the $n$-place decimal approximation is obtained by using only the first $(m + 1) + n$ symbols, that is, $a_m a_{m-1} \cdots a_1 a_0 . b_1 b_2 b_3 \cdots b_n$. The expression $a_m a_{m-1} \cdots a_1 a_0 . b_1 b_2 b_3 \cdots$ will be called the infinite decimal sequence representing $u$. For example, $\tfrac{1}{3}$ is represented by the infinite decimal sequence
$$0.33333\ldots.$$
In practice it is not possible to specify the complete infinite decimal sequence representing a real number $u$, except for very special values of $u$. The decimal fractions are themselves represented by decimal sequences which end with an infinite string of zeros:
$$a_m a_{m-1} \cdots a_1 a_0 . b_1 b_2 \cdots b_n 000\ldots.$$
We will prove later that rational numbers are characterized by the fact that their infinite decimal representations are ultimately periodic, that is, from some point on, the representations consist of the repetition of blocks of decimal digits. For example,
$\tfrac{1}{3}$ is represented by 0.333333...,
$\tfrac{1}{11}$ is represented by 0.090909...,
$\tfrac{5}{12}$ is represented by 0.416666....
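The ultimately periodic behavior can be seen by carrying out long division and watching for a repeated remainder: since only the remainders 0, 1, ..., $m - 1$ are possible, some remainder must recur, and from then on the digits repeat. The sketch below is an illustration, not the proof promised in the text; the function name `decimal_expansion` is an assumption, and it is written for a nonnegative numerator only.

```python
def decimal_expansion(a, m):
    """Long division of a by m (a >= 0, m >= 1).
    Returns (integer part, non-repeating digits, repeating block)."""
    whole, rem = divmod(a, m)
    digits = []
    seen = {}                        # remainder -> position where it first occurred
    while rem not in seen:
        seen[rem] = len(digits)
        rem *= 10
        digits.append(str(rem // m))
        rem %= m
    start = seen[rem]                # the digits from this position on repeat forever
    return whole, "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 3))    # (0, '', '3')
print(decimal_expansion(1, 11))   # (0, '', '09')
print(decimal_expansion(5, 12))   # (0, '41', '6')
print(decimal_expansion(3, 8))    # (0, '375', '0')
```

For a decimal fraction such as 3/8 the repeating block is the single digit 0, in agreement with the remark that decimal fractions end with an infinite string of zeros.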
If infinite decimal sequences are used to represent real numbers, there is a tendency to identify the numbers with the sequences which represent them. A similar temptation was encountered in our discussion of the rational numbers, where we were inclined to identify rational numbers with the quotients $a/m$. In fact, the definition of the rational numbers in Section 65 was motivated by the idea that rational numbers are represented by quotients $a/m$, with $a \in Z$, $m \in N$. Similarly, it is possible to define real numbers in terms of infinite decimal sequences, but this construction involves formidable technical difficulties. It will therefore be avoided. Instead, we will base the construction of $R$ on the intuitive idea that there is a one-to-one correspondence between the set of all real numbers and the set of all points on a line. This geometrical motivation and the resulting construction of $R$, following the ideas of Dedekind, will be outlined in the next two sections. Infinite decimal sequences do, however, provide a useful way of representing real numbers, and this subject will be discussed in Section 79.
l. Indicate which of the following numbers are decimal fractions and give 6 25 their decimal representation: +, &, 12g.
2/5.
22,$, and
4. Prove that a rational number $r$ is a decimal fraction if and only if $r = a/10^k$ for some $a \in Z$ and nonnegative integer $k$.
Prove
. . .p p ,
where p l , p2,
. . . , p,
xma=O
has a rational number solution x nents el, e2) . . . , and e,.
=
6. Let n be a natural number. Suppose that u and v are rational numbers which are related by the equation
Show that
By repeated use of this observation, find a rational number u such that Find a rational number u such that u2  10 < lob8. u2  5 <
7. Show that if t
>
> r2 >
2.
72 The coordinate line. The motivation for Dedekind's construction of the real number system is geometrical. As we saw in Section 61, the use of numbers to measure distances led to the introduction of rational numbers. Rulers were constructed by subdividing a convenient unit of length into fractional parts. A number of these subdivided units could be laid out on a single "yard stick" or "tape measure." The mathematical idea behind all these measuring implements is the notion of a one-to-one correspondence between the rational numbers and certain points of a line. The early Greek geometers were the first to realize that not all the points of the line were "used up" in this correspondence, that is, there are points of the line which do not correspond to any rational number. From this they drew the conclusion that numbers are inadequate for describing geometrical notions such as length and area. The modern viewpoint is quite different. The fact that the rational numbers do not fill up the whole line is accepted as evidence that the rational number system should be enlarged. Moreover, the desire to have a one-to-one correspondence between the set of all numbers and the set of all points on a line is the principle which guides the construction of the real numbers.

If two points $P_0$ and $P_1$ are given on a line $l$, then there is a natural way to set up a one-to-one correspondence between the rational numbers and points of $l$ so that 0 corresponds to $P_0$ and 1 corresponds to $P_1$. The construction of this correspondence parallels the steps of the construction of the rational number system given in Chapters 3, 4, and 6. Once the real numbers have been defined, this correspondence can be completed so that the real numbers are associated in a one-to-one way with all the points of $l$. The correspondence between numbers and points is called a coordinate system on $l$.*

In this section, the one-to-one correspondence between $Q$ and points of a line will be described. We will then see in the next section how this correspondence leads to Dedekind's definition of the real number system. The geometrical ideas are introduced only to guide our intuition toward the appropriate definitions. Accordingly, no attempt will be made to give rigorous proofs of geometrical statements.

As is customary, assume that the distinguished points $P_0$ and $P_1$ on $l$ are situated so that $P_1$ is on the right side of $P_0$ (see Fig. 71). The segment of $l$ from $P_0$ to $P_1$ is called the basic unit interval. The length of this interval is the unit of length for the coordinate system on $l$. The points $P_0$ and $P_1$ are called the origin and unit point, respectively, of the coordinate system.
The first step in the construction of the coordinate line consists of associating the natural numbers with points of $l$. Let $P_2$ be the point to the right of $P_1$ whose distance from $P_1$ is the same as the distance from $P_1$ to $P_0$. Such a point can be constructed mechanically, using a pair of compasses, by drawing a semicircle centered at $P_1$, starting at the point $P_0$ (see Fig. 72). The other point of intersection of this semicircle with $l$ is $P_2$. Let $P_3$ be the point to the right of $P_2$ whose distance from $P_2$ is also equal to the unit length. This can be constructed in the same way that $P_2$ was obtained, since the distance from $P_2$ to $P_1$ is also equal to the unit length. Continue this process, obtaining points $P_4, P_5, P_6, \ldots$ so that the segments $P_0P_1, P_1P_2, P_2P_3, P_3P_4, P_4P_5, \ldots$ are all congruent. (That is, they have the same length and the ordering of their endpoints is the same: $P_1$ is right of $P_0$, $P_2$ is right of $P_1$, $P_3$ is right of $P_2$, etc.) Now set up the correspondence $n \leftrightarrow P_n$ between the natural numbers and the points constructed in this way (see Fig. 73). It is apparent that this correspondence is one-to-one. However, in order to prove this fact, it would be necessary to use some simple geometrical properties of straight lines (specifically, the fact that straight lines do not close back on themselves, as do the great circles on spheres, for example).

* The term "coordinate system" may also refer to a one-to-one correspondence between the points of a plane and the set of all pairs of real numbers (see Section 83), or points in space and the set of all triples of real numbers. A line $l$ together with a coordinate system on $l$ is often called a coordinate line.

The next step in setting up a one-to-one correspondence between the rational numbers and points of $l$ consists of defining a sequence $P_{-1}, P_{-2}, P_{-3}, \ldots$ of points on $l$, moving from $P_0$ to the left in such a way that all of the segments $P_{-1}P_0, P_{-2}P_{-1}, P_{-3}P_{-2}, \ldots$ are congruent to the basic unit interval $P_0P_1$. These points can be obtained using a pair of compasses in the same way that the points $P_2, P_3, P_4, \ldots$ were constructed. Now define the correspondence $a \leftrightarrow P_a$ between the integers and the set of points $\ldots, P_{-3}, P_{-2}, P_{-1}, P_0, P_1, P_2, P_3, \ldots$ (see Fig. 74).

Let $r$ be any rational number. Then $r$ can be represented as a quotient of an integer by a natural number, $r = a/m$. Thus, in order to obtain a correspondence between $Q$ and points of $l$, it suffices to define points $P_{a,m}$ on $l$ for each $a \in Z$ and $m \in N$, so that $P_{a,m} = P_{b,n}$ if and only if $a/m = b/n$.
Then the correspondence
$$a/m \leftrightarrow P_{a,m}$$
between $Q$ and the points $P_{a,m}$ of $l$ is well defined (since $a/m = b/n$ implies $P_{a,m} = P_{b,n}$) and one-to-one (since $a/m \neq b/n$ implies $P_{a,m} \neq P_{b,n}$).

To construct the points $P_{a,m}$, choose $P_{1,m}$ to be the first point to the right of $P_0$ in a subdivision of the basic unit interval into $m$ equal parts. That is, $P_{1,m}$ is the point on the right-hand side of $P_0$ such that the distance from $P_0$ to $P_1$ is $m$ times the distance from $P_0$ to $P_{1,m}$. For example, $P_{1,1} = P_1$, and $P_{1,2}$ is the point which bisects the segment $P_0P_1$. To obtain the points $P_{a,m}$ for arbitrary $a \in Z$, we repeat the process used to obtain the points $P_a$ associated with the integers, except that $P_0P_{1,m}$ is used as the basic unit interval instead of $P_0P_1$. That is, the points $P_{2,m}, P_{3,m}, P_{4,m}, \ldots$ are constructed to the right of $P_{1,m}$ and $P_{-1,m}, P_{-2,m}, P_{-3,m}, \ldots$ are constructed to the left of $P_0$, so that the intervals
$$\ldots,\ P_{-3,m}P_{-2,m},\ P_{-2,m}P_{-1,m},\ P_{-1,m}P_0,\ P_{1,m}P_{2,m},\ P_{2,m}P_{3,m},\ \ldots$$
are all congruent to the interval $P_0P_{1,m}$. For example, if $m = 2$, the points shown in Fig. 75 are obtained. It is clear from the definition of the points $P_{a,m}$ that for any natural number $k$, $P_{a,m} = P_{ka,km}$. Then if $a/m = b/n$, it follows that
$$P_{a,m} = P_{na,nm} = P_{mb,nm} = P_{b,n}.$$
If $a/m \neq b/n$, then either $na < mb$, or $mb < na$. In case $na < mb$, it is evident that $P_{a,m} = P_{na,nm}$ lies to the left of $P_{b,n} = P_{mb,nm}$. Similarly, if $mb < na$, then $P_{b,n}$ lies to the left of $P_{a,m}$. In either case $P_{a,m} \neq P_{b,n}$.

If $r$ is any rational number, we can define $P_r$ to be the point $P_{a,m}$, where $r = a/m$ is any representation of $r$ as a quotient. Using this more convenient notation, we can express the correspondence between $Q$ and the points of $l$ in the form $r \leftrightarrow P_r$. Our discussion in the preceding paragraph shows that this correspondence has the following basic property.
(72.1). The point $P_r$ lies to the left of the point $P_s$ if and only if $r < s$.
Since $P_{1,1} = P_1$, the unit interval which is used for the construction of the points $P_{a,1}$ is the same as the original basic unit interval $P_0P_1$. This means that the points $P_{a,1}$ are the same as the points $P_a$ which were associated with the integers. Thus, the correspondence $a/1 \leftrightarrow P_{a,1}$ agrees with the previously defined association $a \leftrightarrow P_a$.

It is not possible to picture all of the points $P_{a/m}$ on a line segment, since as $m$ gets large, these points become increasingly dense along every part of the line (see Fig. 76). In fact it is possible to prove the following important result.

(72.2). If $S$ and $T$ are two different points of $l$, then there is a point $P_{a/m}$ between $S$ and $T$.

We will not give a proof of (72.2) since that would require a careful formulation of geometrical principles. However, it is worthwhile to give an informal argument in support of this statement. Suppose for definiteness that $S$ lies to the left of $T$. Then the basic interval can be covered by a finite number of translates* of $ST$. That is, there are points $T_1, T_2, T_3, \ldots, T_m$ on $l$ such that $T_m$ lies to the right of $P_1$ and the intervals $P_0T_1, T_1T_2, T_2T_3, \ldots, T_{m-1}T_m$ are all congruent to $ST$ (see Fig. 77). Thus, we can suppose that $m$ is the number of translates of $ST$ needed to cover $P_0P_1$ in this way. Then if $P_0P_1$ is subdivided into $m$ equal subintervals, each of these will be shorter than the intervals $P_0T_1, T_1T_2, T_2T_3, \ldots, T_{m-1}T_m$. In particular, $P_0P_{1/m}$ is properly contained in $P_0T_1$.
There is a unique integer $a$ such that $S$ lies in the interval $P_{(a-1)/m}P_{a/m}$ with $P_{a/m}$ to the right of $S$ (see Fig. 78). Then $P_{a/m}$ lies on the left side of $T$, since otherwise the interval $P_{(a-1)/m}P_{a/m}$ would contain the interval $ST$. However, $P_{(a-1)/m}P_{a/m} \supseteq ST$ is impossible, since $P_{(a-1)/m}P_{a/m}$ is congruent to $P_0P_{1/m}$, $ST$ is congruent to $P_0T_1$, and $P_0P_{1/m} \subset P_0T_1$. Therefore, the point $P_{a/m}$ lies strictly between $S$ and $T$.

The density property (72.2) obscures the fact that the points $P_r$ do not fill the whole line. The sets $\{P_{a/m} \mid a \in Z\}$ form a "mesh" on $l$ which becomes arbitrarily fine as $m$ increases. It is conceivable therefore that $\bigcup_{m \in N} \{P_{a/m} \mid a \in Z\}$ is the set of all points of $l$. The fact that the points $P_r$ do not exhaust $l$ is a consequence of Pythagoras' theorem that $\sqrt{2}$ is irrational. This discovery, which greatly influenced the development of Greek mathematics, was probably made by means of a geometrical example such as the following one.
EXAMPLE 1. Draw a line $l$ through diagonally opposite corners of a square. Set up a coordinate system on $l$, using one corner of the square as the origin $P_0$, and choosing $P_1$ on the segment of $l$ inside the square, so that the unit of length for the coordinate system is the same as the length of the sides of the square (that is, the distance from $P_0$ to $P_1$ is the same as the length of the side of the square). Let $T$ be the point on $l$ which is the corner of the square opposite to $P_0$ (see Fig. 7-9). Finally, let $S$ be one of the two corners of the square which is not on $l$. Then $P_0ST$ is a right triangle. Thus, by the Pythagorean triangle theorem

$$\overline{P_0T}^2 = \overline{P_0S}^2 + \overline{ST}^2 = 2\,\overline{P_0P_1}^2,$$

where $\overline{P_0T}$, $\overline{P_0S}$, $\overline{ST}$, and $\overline{P_0P_1}$ represent the lengths* of the line segments $P_0T$, $P_0S$, $ST$, and $P_0P_1$. If $T = P_r$ for some rational number $r$, then the distance from $P_0$ to $T$ is $|r|$ times the length of the basic unit interval $P_0P_1$. That is, $\overline{P_0T} = |r|\,\overline{P_0P_1}$. Hence,

$$|r|^2\,\overline{P_0P_1}^2 = 2\,\overline{P_0P_1}^2,$$

so that $r^2 = 2$. As we saw in Section 7-1, 2 cannot be the square of a rational number. Therefore, $T$ must be different from all of the points $P_r$.

* The early Greek mathematicians always interpreted lengths and areas as different kinds of geometrical magnitudes (not numbers), so that the Pythagorean triangle theorem for them was a relation between the areas of three squares. However, they did assign a meaning to fractional multiples of lengths and areas, and they showed that if the length of the side of one square is $r$ times the length of the side of another, then the area of the first square is $r^2$ times the area of the second. Thus, the proof given in this example would have made sense to the Greek geometers.
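The informal argument for the density property (7-2.2) is constructive, and it can be sketched in code. The sketch below is only an illustration under an added assumption: the points $S$ and $T$ are represented by rational coordinates `s` and `t` (the text allows arbitrary points), and the function name `point_between` is hypothetical. It chooses $m$ so that $m$ translates of $ST$ more than cover the unit interval, then picks the integer $a$ with $(a-1)/m \le s < a/m$.

```python
from fractions import Fraction
from math import floor


def point_between(s: Fraction, t: Fraction) -> Fraction:
    """Return a fraction a/m with s < a/m < t, following the covering argument."""
    if not s < t:
        raise ValueError("need s < t")
    # Choose m so that m translates of ST reach past P_1: m * (t - s) > 1.
    m = floor(1 / (t - s)) + 1
    # a is the unique integer with (a - 1)/m <= s < a/m.
    a = floor(s * m) + 1
    # Then a/m <= s + 1/m < s + (t - s) = t, so a/m lies strictly between s and t.
    return Fraction(a, m)


between = point_between(Fraction(1, 3), Fraction(1, 2))
print(between)
```

The choice of `m` mirrors the requirement that $P_0P_{1/m}$ be properly contained in $P_0T_1$, that is, $1/m < t - s$.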
1. Using a ruler, draw a figure which extends the subdivision in Fig. 7-6 to include all points $a/5$ and $a/6$ between $-3$ and $3$.
2. For which of the following values of $k$ is it possible to construct a rectangle with sides of integral length whose diagonal has length $\sqrt{k}$:
3. Show how to subdivide a line segment into 7 equal subsegments, using only a ruler and a pair of compasses.
4. Show that the correspondence $r \leftrightarrow P_r$ satisfies (7-2.1).
5. Define the length of a line segment $P_rP_s$ on the line $l$ to be
7-3 Dedekind cuts. We turn now to the problem of constructing the real numbers so that they will correspond to all of the points on the line $l$. Our purpose in this section is to show how this requirement leads to the definition of real numbers. If $T$ is a point on $l$, define

$$X_T = \{r \in Q \mid P_r \text{ lies to the right of } T\}. \tag{7-1}$$

That is, $X_T$ consists of the rational numbers $r$ whose points $P_r$ on $l$ lie to the right of $T$ (see Fig. 7-10). In the example shown in Fig. 7-10, $r \in X_T$ and $s \notin X_T$. The properties (7-2.1) and (7-2.2) of the correspondence $r \leftrightarrow P_r$, together with the definition of the sets $X_T$ given in (7-1), lead to a number of important facts.
(7-3.1). Let $T$ be any point on $l$. Then
(a) $\emptyset \subset X_T \subset Q$;
(b) if $r$ and $s$ are rational numbers such that $r < s$ and $r \in X_T$, then $s \in X_T$;
(c) $X_T$ has no smallest element;
(d) if $T = P_r$, then $X_T = \{s \in Q \mid r < s\}$;
(e) if the point $S$ on $l$ lies to the left of $T$, then $X_S \supset X_T$;
(f) the correspondence $T \leftrightarrow X_T$ is one-to-one.
We will prove (a), (b), and (c) and leave it for the reader to prove the remaining statements on the basis of (7-2.1) and (7-2.2). Let $S$ and $R$ be points on $l$ such that $S$ lies to the left of $T$ and $R$ lies to the right of $T$. By (7-2.2), there are rational numbers $s$ and $r$ such that $P_s$ is between $S$ and $T$ and $P_r$ is between $T$ and $R$. Then $P_s$ lies to the left of $T$ and $P_r$ lies to the right of $T$. By (7-1), $s \notin X_T$ and $r \in X_T$. Therefore $X_T$ is a nonempty proper subset of $Q$, that is, $\emptyset \subset X_T \subset Q$.

To prove (b), we note that by (7-2.1) if $r < s$, then $P_s$ lies to the right of $P_r$. Since $r \in X_T$, $P_r$ lies to the right of $T$. Hence $P_s$ lies to the right of $T$. Consequently, $s \in X_T$.

The proof of (c) uses both (7-2.1) and (7-2.2). Suppose that $r \in X_T$, that is, $P_r$ lies to the right of $T$. By (7-2.2) there is a rational number $s$ such that the point $P_s$ is between $T$ and $P_r$. Therefore, $P_s$ lies to the right of $T$, and to the left of $P_r$. Hence, $s \in X_T$ by (7-1), and $s < r$ by (7-2.1). This argument proves that for any element $r$ of $X_T$, there is always a smaller element $s \in X_T$. Thus $X_T$ cannot have a smallest element.

The fact stated in (7-3.1f) that $T \leftrightarrow X_T$ is a one-to-one correspondence between the points of $l$ and the sets $X_T$ of rational numbers suggests that these sets might be the appropriate objects to call real numbers, since our stated objective is to define the real numbers so that they will correspond to all the points on $l$. However, the sets $X_T$ are defined using rather vague geometrical ideas. The definition of real numbers should be based on the established properties of the rational number system. We would therefore like to find properties of the sets $X_T$ which characterize these sets in an
exact way. The properties (a), (b), and (c) of (7-3.1) satisfy this requirement.

DEFINITION 7-3.2. A Dedekind cut* is a set $X$ of rational numbers satisfying (a) $\emptyset \subset X \subset Q$; (b) if $r < s$ and $r \in X$, then $s \in X$; and (c) $X$ has no smallest element.

It is now our contention that the sets of rational numbers thus defined can be identified with the sets $X_T$. By (7-3.1), every set $X_T$ is a set of rational numbers $X$ which satisfies the conditions of Definition 7-3.2. On the other hand, if $X$ is a Dedekind cut, then there is a point $T$ on the line $l$ such that $X = X_T$. This is not a statement which we can prove, but rather it is a geometrical assumption about the set of points on a line. However, this assumption can be made plausible.

Let $X$ be a Dedekind cut. Note that there is a point on $l$ which lies to the left of every point $P_r$ corresponding to a rational number $r$ which belongs to $X$. Indeed, if this is not the case, then $X$ contains every rational number, contrary to Definition 7-3.2(a). To see this, assume that there is no point of $l$ which lies to the left of every point $P_r$ with $r \in X$. This means that for every point $S$ on $l$, there is some point $P_r$ with $r \in X$ such that either $P_r = S$ or $P_r$ lies to the left of $S$. Let $s$ be an arbitrary element of $Q$, and choose $S$ on $l$ to be a point to the left of $P_s$. Then by assumption there is an $r \in X$ such that $P_r$ lies to the left of $P_s$. By (7-2.1), $r < s$. Since $r \in X$, it follows from Definition 7-3.2(b) that $s \in X$. Thus, we have shown that $X$ contains every rational number $s$, contradicting Definition 7-3.2(a), as predicted.

Now imagine that a movable point indicator is placed on $l$ at a point which lies to the left of every point $P_r$ with $r \in X$. Let the indicator be moved to the right, as far as it will go without passing through one of the points $P_r$ corresponding to a rational number $r$ which belongs to $X$. Since $X$ is not empty by Definition 7-3.2(a), the indicator cannot be moved indefinitely.
Therefore, it must stop at some point $T$, blocked by the condition that if it is moved any farther to the right, then it will pass through a point $P_r$ with $r \in X$. We assert that $X = X_T$. In the first place, suppose that $s \in X$. We wish to show that $s \in X_T$, that is, the point $P_s$ lies to the right of $T$. Since $X$ has no smallest element by Definition
* The sets of rational numbers satisfying (a), (b), and (c) of Definition 7-3.2 are more properly called upper Dedekind cuts, but we will simply refer to them as "cuts." A lower Dedekind cut is defined to be a set $X$ of rational numbers such that $\emptyset \subset X \subset Q$, $r > s$ and $r \in X$ implies $s \in X$, and $X$ has no largest element. The real numbers can be defined using lower Dedekind cuts, but it turns out that the definition of multiplication is less natural in terms of lower cuts.
7-3.2(c), there is an $r \in X$ such that $r < s$. Therefore, $P_r$ lies to the left of $P_s$, by (7-2.1). Moreover, $P_r$ cannot lie to the left of $T$, since otherwise the indicator would have passed through $P_r$ in moving to the position $T$. Thus, either $P_r = T$ or $P_r$ lies to the right of $T$. Since $P_s$ lies to the right of $P_r$, it follows that in either case, $P_s$ is to the right of $T$. Consequently, $X \subseteq X_T$.

Now suppose that $s \in X_T$. Then $P_s$ lies to the right of $T$. Since the indicator cannot be moved any closer to $P_s$ than $T$ without passing through some point $P_r$ corresponding to a rational number $r$ in $X$, there must be an $r \in X$ such that $P_r$ lies to the left of $P_s$. Hence $r < s$ by (7-2.1). Therefore, by Definition 7-3.2(b), it follows that $s \in X$. This shows that $X_T \subseteq X$. Therefore, $X = X_T$.

Of course, this argument is not a proof in the mathematical sense. However, it does show, intuitively at least, that every Dedekind cut is of the form $X_T$ for some point $T$ on the line $l$. This means that $T \leftrightarrow X_T$ is a one-to-one correspondence between the set of all points on $l$ and the set of all Dedekind cuts. It therefore seems reasonable to formally define the set of all real numbers to be the set of all Dedekind cuts. That is, by definition, real numbers are Dedekind cuts. Then we can say that there is a one-to-one correspondence between the set of all points on $l$ and the set of all real numbers. The correspondence $T \leftrightarrow X_T$ is called the coordinate system on the line $l$ (or the coordinatization of $l$) with the basic unit interval $P_0P_1$.
( 4 {r E &Ir 2 > 3 ) (b) { r E Q l l / r < 1) (c> { Y E &Ir2 > 1) (d) {r E QI lir3 2 0 ) (e) {r E QIO (1 y )  l < 1) 2. Illustrate the proofs of (73.la, b, c) by means of diagrams.
<
7-4 Construction of the real numbers. The motivation given in the last two sections has prepared the way for the formal definition of the real numbers and their operations. For convenience, we repeat the definition given informally at the end of the last section.
DEFINITION 7-4.1. The set $R$ of all real numbers is the set of all Dedekind cuts, that is, the totality of the sets $X$ of rational numbers which satisfy conditions (a), (b), and (c) of Definition 7-3.2.
Thus the real numbers are defined strictly in terms of the set $Q$ of rational numbers and the order relation $<$ in $Q$. The reader should be aware of the fact that our point of view is now changed. The real numbers are the Dedekind cuts. The operations in $R$ must be defined and their properties derived solely on the basis of Definition 7-4.1 and known properties of the rational numbers. Before considering the operations in $R$, it is necessary to establish some fundamental properties of Dedekind cuts.
(7-4.2). If $X$ and $Y$ are Dedekind cuts, then exactly one of the relations

$$X \subset Y, \quad X = Y, \quad \text{or} \quad X \supset Y$$

is satisfied.
Proof. Suppose that neither $X \subset Y$ nor $X = Y$ is satisfied. Then there is an element $r \in X$ such that $r \notin Y$. If $s \in Y$, then $s \le r$ is impossible, because otherwise $r \in Y$ by Definition 7-3.2(b). Therefore, $r < s$. Since $r \in X$, it follows from Definition 7-3.2(b) again that $s \in X$. Hence, we have shown that $s \in Y$ implies $s \in X$, that is, $Y \subseteq X$. By assumption $Y \ne X$. Therefore, $X \supset Y$. We have shown that if neither of the two relations $X \subset Y$, $X = Y$ is satisfied, then the third relation $X \supset Y$ must hold. This proves that at least one of the three relations is satisfied. It is obvious from the definition of set inclusion that at most one of the relations $X \subset Y$, $X = Y$, $X \supset Y$ can be satisfied.
The reader should remember that for an arbitrary pair of sets $X$ and $Y$, it is not necessarily true that any of the relations $X \subset Y$, $X = Y$, or $X \supset Y$ holds. Therefore (7-4.2) expresses an important special property of Dedekind cuts. Corresponding to each rational number $r$ there is a Dedekind cut given by the definition

$$X(r) = \{t \in Q \mid t > r\}. \tag{7-2}$$

The correspondence $r \leftrightarrow X(r)$ has the following properties.
(7-4.3). Let $r$ and $s$ be rational numbers.
(a) $X(r) \supset X(s)$ if and only if $r < s$.
(b) $X(r + s) = \{t + u \mid t \in X(r),\ u \in X(s)\}$.
(c) If $r \ge 0$ and $s \ge 0$, then $X(r \cdot s) = \{t \cdot u \mid t \in X(r),\ u \in X(s)\}$.
Proof. The first of these statements follows easily from (7-2). We will prove (b) and leave the proof of (c) as an exercise. If $t \in X(r)$ and $u \in X(s)$, then $t > r$ and $u > s$, by (7-2). Therefore, $t + u > r + s$, that is, $t + u \in X(r + s)$. Hence $\{t + u \mid t \in X(r),\ u \in X(s)\} \subseteq X(r + s)$.

Suppose that $v \in X(r + s)$. Then $v > r + s$, by (7-2). There is a rational number $w$ satisfying $v > w > r + s$; for example, $w = \frac{1}{2}(r + s + v)$ has this property. Then $w - r > s$ and $(v - w) + r > r$. Thus, if $t = (v - w) + r$ and $u = w - r$, it follows that $t \in X(r)$, $u \in X(s)$, and $t + u = (v - w) + r + (w - r) = v$. Therefore,

$$v \in \{t + u \mid t \in X(r),\ u \in X(s)\}.$$

Since $v$ was any rational number in $X(r + s)$, it follows that $X(r + s) \subseteq \{t + u \mid t \in X(r),\ u \in X(s)\}$. This proves (b).
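The second half of the proof of (7-4.3b) is an explicit recipe, and a small numerical check may help. The sketch below is illustrative only; the function name `split_sum` is hypothetical. Given rationals $r$, $s$ and any $v > r + s$, it computes $w = \frac{1}{2}(r + s + v)$ and then $t = (v - w) + r$, $u = w - r$, exactly as in the proof.

```python
from fractions import Fraction


def split_sum(r: Fraction, s: Fraction, v: Fraction):
    """Given v > r + s, return (t, u) with t > r, u > s and t + u == v."""
    if not v > r + s:
        raise ValueError("v must belong to X(r + s), that is, v > r + s")
    w = (r + s + v) / 2      # the midpoint satisfies v > w > r + s
    t = (v - w) + r          # t > r because v - w > 0
    u = w - r                # u > s because w > r + s
    return t, u


r, s, v = Fraction(1, 2), Fraction(-1, 3), Fraction(1)
t, u = split_sum(r, s, v)
print(t, u, t + u)
```

Exact rational arithmetic (`Fraction`) is used so that the identity $t + u = v$ holds without rounding.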
It should be noted that (7-4.3c) is not true if either of the assumptions $r \ge 0$ and $s \ge 0$ is omitted. In fact, if $r < 0$, and $s$ is any rational number, then $\{u \cdot v \mid u \in X(r),\ v \in X(s)\} = Q$ (see Problem 3 below).

The one-to-one correspondence $r \leftrightarrow X(r)$ which is established by (7-2) between the set $Q$ of rational numbers and a set of Dedekind cuts serves to identify the rational numbers with a subset of $R$. Of course, $Q$ itself is not a subset of $R$, and in order to be able to think of the rational number system as a part of the system of real numbers, it is necessary to "identify" each rational number $r$ with the corresponding cut $X(r)$. A similar identification process was used when we enlarged the system of integers to the field of rational numbers (see Section 6-5). In effect, the rational numbers are redefined to be the set of all Dedekind cuts $X(r)$. It is important to show that the operations and ordering which will be defined in $R$ agree for cuts of the form $X(r)$ with the usual operations and ordering in $Q$. Specifically, it will be necessary to prove

$$X(r) + X(s) = X(r + s), \qquad -X(r) = X(-r), \qquad X(r) \cdot X(s) = X(r \cdot s),$$

$$X(r) < X(s) \text{ if and only if } r < s.$$
These facts will be established as each operation is defined in $R$. It is convenient to discuss the ordering of $R$ before defining the operations of addition, negation, and multiplication.

DEFINITION 7-4.4. Let $X \in R$, $Y \in R$. Define $X < Y$ (or $Y > X$) if $X \supset Y$.
It may seem odd that the ordering of $R$ is the reverse of the inclusion relation. This reversal is necessary, however, to make the ordering in $R$ agree with the usual ordering in $Q$. In fact, by (7-4.3a), $X(r) \supset X(s)$ is equivalent to $r < s$. Hence, $X(r) < X(s)$ if and only if $r < s$ according to Definition 7-4.4.

THEOREM 7-4.5. The ordering of $R$ has the properties: (a) for any $X$ and $Y$ in $R$, exactly one of the relations $X < Y$, $X = Y$, or $Y < X$ is satisfied; (b) if $X < Y$ and $Y < W$, then $X < W$.
The statement (a) is a reformulation of (7-4.2), using the Definition 7-4.4, and statement (b) is a consequence of the transitivity of inclusion.

DEFINITION 7-4.6. Addition in $R$. Let $X \in R$, $Y \in R$. Define

$$X + Y = \{r + s \mid r \in X,\ s \in Y\}.$$

Then $X + Y$ is called the sum of $X$ and $Y$.

It is necessary to show that $X + Y$ is a Dedekind cut. Obviously, $\emptyset \subset X + Y \subseteq Q$. Since $X \subset Q$ and $Y \subset Q$, there are rational numbers $u$ and $v$ such that $u < r$ for all $r \in X$ and $v < s$ for all $s \in Y$. Consequently, $u + v < r + s$ for all $r \in X$ and $s \in Y$. Therefore, $u + v \notin X + Y$. This proves that $X + Y \subset Q$. Next, suppose that $r$, $s$, and $t$ are rational numbers and that $r + s < t$, where $r \in X$ and $s \in Y$. Then $r < t - s$. Thus, $t - s \in X$. Consequently, $t = (t - s) + s \in X + Y$. This shows that $X + Y$ satisfies Definition 7-3.2(b). Finally, to show that $X + Y$ has no smallest element, suppose that $t \in X + Y$. Then by definition, $t = r + s$ for some $r \in X$ and $s \in Y$. Since $r$ is not the smallest element of $X$, there exists $r' \in X$ such that $r' < r$. Then $t = r + s > r' + s \in X + Y$. Hence, $t$ is not the smallest element of $X + Y$, and since $t$ was any number in $X + Y$, it follows that this set has no smallest element. Therefore, $X + Y$ satisfies all of the conditions required to be a Dedekind cut, so that $X + Y \in R$.

By (7-4.3b) and Definition 7-4.6, $X(r) + X(s) = X(r + s)$ for all rational numbers $r$ and $s$.
Therefore, addition of the elements of $R$ which we have identified with rational numbers agrees with the usual addition in $Q$.

Defining negation in $R$ is a bit tricky. The negative, $-X$, of a Dedekind cut $X \in R$ must be a cut such that the sum of $X$ and $-X$ is the zero element of $R$. If the rational numbers [which we are identifying with the cuts of the form $X(r)$, $r \in Q$] are to be a subring of $R$, then the zero of $R$ must be $X(0)$. Thus $-X$ must satisfy $X + (-X) = X(0) = \{t \in Q \mid t > 0\}$. In particular, if $r \in X$, $s \in -X$, then $r + s > 0$, so that $s > -r$. Consequently, if $s \in -X$, then $s > -r$ for all $r \in X$; that is, $-X \subseteq \{s \in Q \mid s > -r \text{ for all } r \in X\}$.

As a first guess, one might suppose that the set

$$T_X = \{s \in Q \mid s > -r \text{ for all } r \in X\}$$
is the cut $-X$ for which we are looking. Indeed, it is easy to see that $T_X$ satisfies the first two conditions in Definition 7-3.2 of a Dedekind cut, namely, $\emptyset \subset T_X \subset Q$, and if $s < t$ and $s \in T_X$, then $t \in T_X$ (see Fig. 7-11). What makes $T_X$ most attractive as a candidate for the role of $-X$ is the fact (which we will not prove) that $\{r + s \mid r \in X,\ s \in T_X\} = X(0)$. That is, if $T_X$ is a Dedekind cut, then according to Definition 7-4.6, $X + T_X = X(0)$. Unfortunately, the set $T_X$ is not always a cut. In fact, if $X = X(r)$, where $r \in Q$, then

$$\begin{aligned} T_X &= \{s \in Q \mid s > -t \text{ for all } t \in X(r)\} \\ &= \{s \in Q \mid s > -t \text{ for all } t \in Q \text{ such that } t > r\} \\ &= \{s \in Q \mid s > t \text{ for all } t \in Q \text{ such that } t < -r\} \\ &= \{s \in Q \mid s \ge -r\}, \end{aligned}$$

and the set $T_X$ has a smallest element $-r$. Therefore, $T_X$ is not a Dedekind cut, by Definition 7-3.2(c). It appears, however, that the negative of $X$
should be $T_X$ if $T_X$ has no least element, and it should be the set obtained from $T_X$ by deleting the smallest element, in case $T_X$ has a least element. A convenient way to formulate this definition is to say that $-X$ is the set of all elements in $T_X$ which exceed some other element of $T_X$. This description of $-X$ is equivalent to the following one.

DEFINITION 7-4.7. Let $X \in R$. The negative of $X$ is defined to be

$$-X = \{r \in Q \mid r > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s \in X\}.$$
We must prove that $-X$ is a Dedekind cut. Obviously, $-X \subseteq Q$. Moreover, if $r \in X$, then by Definition 7-4.7, $-r \notin -X$. Hence, $-X \subset Q$. Suppose that $t \notin X$. Then $t < s$ for all $s \in X$, by Definition 7-3.2(b). Consequently, $-t > -s$ for all $s \in X$. Thus, if $r > -t$, then $r \in -X$. Therefore, $\emptyset \subset -X$. It is obvious from Definition 7-4.7 that if $r < s$ and $r \in -X$, then $s \in -X$. Finally, if $r \in -X$, then by Definition 7-4.7, $r > t$ for some $t$ such that $t > -s$ for all $s \in X$. Let $r'$ be a rational number satisfying $r > r' > t$ (see Problem 1). Then $r' \in -X$. Consequently, $r$ is not the smallest element of $-X$. Since $r$ was an arbitrary element of $-X$, it follows that $-X$ has no smallest element. Therefore, $-X$ is a Dedekind cut.

We note that negation in $R$ agrees on cuts of the form $X(r)$ with negation in $Q$. In fact, by Definition 7-4.7,
$$-X(r) = \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s \in X(r)\}.$$

Since $X(r) = \{s \in Q \mid s > r\}$, we have

$$-X(r) = \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s > r\}.$$

If $t > -s$ for every rational number $s$ which is greater than $r$, then $-t < s$ for all $s > r$. This implies that $-t \le r$; that is, $t \ge -r$. Conversely, if $t \ge -r$, then $-t \le r$, and therefore $-t < s$ for all $s > r$. We have shown that $t > -s$ for all $s > r$ if and only if $t \ge -r$. Consequently,

$$-X(r) = \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t \ge -r\} = \{u \in Q \mid u > -r\} = X(-r).$$
(7-4.8). If $X \in R$ and $X < X(0)$, then $X(0) < -X$.

Proof. By Definition 7-4.4, $X \supset X(0)$. Let $s$ be an element of $X$ which is not in $X(0)$. Then $s \le 0$. Since $X$ has no smallest element, there is a rational number $t \in X$ such that $t < s \le 0$. Then $-t > 0$. Hence, $-t \in X(0)$. If $r \in -X$, then $r > -t$, so that $r \in X(0)$. This shows that $-X \subseteq X(0)$. Since $-t \in X(0)$ and $-t \notin -X$ (by Definition 7-4.7), it follows that $-X \subset X(0)$. Therefore $X(0) < -X$.

As one might expect, $X(0)$ is the zero element of $R$. Therefore, $X \in R$ is called positive if $X(0) < X$, negative if $X < X(0)$, and nonnegative if $X(0) < X$ or $X(0) = X$ (that is, $X(0) \le X$). By (7-4.8), if $X$ is negative, then $-X$ is positive.

DEFINITION 7-4.9. Multiplication in $R$. Let $X \in R$ and $Y \in R$. Define
(a) $X \cdot Y = \{r \cdot s \mid r \in X,\ s \in Y\}$ if $X$ and $Y$ are nonnegative,
(b) $X \cdot Y = -[(-X) \cdot Y]$ if $X$ is negative and $Y$ is nonnegative,
(c) $X \cdot Y = -[X \cdot (-Y)]$ if $X$ is nonnegative and $Y$ is negative, and
(d) $X \cdot Y = (-X) \cdot (-Y)$ if $X$ and $Y$ are negative.

To justify this definition,* three remarks are needed. First, if $X$ and $Y$ are nonnegative, then $\{r \cdot s \mid r \in X,\ s \in Y\}$ is a cut. The proof of this fact is similar to the argument which we gave to show that $\{r + s \mid r \in X,\ s \in Y\}$ is a Dedekind cut, and we leave it for the reader. [Note that since $X \subseteq X(0)$ and $Y \subseteq X(0)$, it follows that $\emptyset \subset X \cdot Y \subseteq X(0) \subset Q$.] Second, if $X$ is negative and $Y$ is nonnegative, then $-X$ is positive, so that the expression $-[(-X) \cdot Y]$ makes sense because products of nonnegative cuts have been defined. A similar remark applies to the cases (c) and (d). Our final remark is that if $r$ and $s$ are rational numbers, then

$$X(r) \cdot X(s) = X(r \cdot s).$$
* It is unfortunate that to define the product of two Dedekind cuts, four cases must be considered separately. We can easily see, however, that if either $X$ or $Y$ is negative, then $\{r \cdot s \mid r \in X,\ s \in Y\} = Q$ (see Problem 3). Thus it would not do to use Definition 7-4.9(a) without some restriction on $X$ and $Y$. This problem can be avoided by arranging the construction of $R$ in a different order. Instead of proceeding from the natural numbers, to the integers, to the rational numbers, to the real numbers, as we have done in Chapters 3, 4, 6, and 7, the system $R$ could have been obtained by the construction:

natural numbers → positive rational numbers → positive real numbers → real numbers.

This route from $N$ to $R$ is somewhat more convenient, but less interesting, because it does not give us an opportunity to study the important rings $Z$ and $Q$ along the way.
If $r$ and $s$ are nonnegative rational numbers, this identity follows from (7-4.3c). Then using the identity $-X(r) = X(-r)$, together with Definition 7-4.9(b), (c), and (d), the desired result is easily obtained for all combinations of the signs of $r$ and $s$. For example, if $r$ is negative and $s$ is nonnegative, then

$$X(r) \cdot X(s) = -[(-X(r)) \cdot X(s)] = -[X(-r) \cdot X(s)] = -[X((-r) \cdot s)] = X(-[(-r) \cdot s]) = X(r \cdot s).$$

Until now, the only examples of Dedekind cuts which we have seen are the sets $X(r)$ corresponding to rational numbers. In Section 7-10, it will be shown that the sets $Q$ and $R$ do not have the same cardinal number. Consequently, there must be a vast set of cuts which are not of the form $X(r)$ for any $r \in Q$. It seems worthwhile to give here a specific example of such a cut.
EXAMPLE 1. Let $X = \{r \in Q \mid r > 0 \text{ and } r^2 > 2\}$. Obviously, $\emptyset \subset X \subset Q$, and if $r \in X$, $r < s$, then $s \in X$. To prove that $X$ has no smallest element, suppose that $r \in X$. Let

$$s = \frac{r}{2} + \frac{1}{r}.$$

Then $r - s = r/2 - 1/r = (r^2 - 2)/2r > 0$. Hence, $r > s > 0$. Also, $s^2 - 2 = (r^2 - 2)^2/4r^2 > 0$, so that $s^2 > 2$. Therefore, $s \in X$. This proves that $X$ is a Dedekind cut. Obviously $X$ is nonnegative. Thus,

$$X^2 = \{rs \mid r \in X,\ s \in X\}.$$

If $r \in X$ and $s \in X$, then $(rs)^2 = r^2 s^2 > 2 \cdot 2 = 4$. Therefore, $rs > 2$. That is, $X^2 \subseteq X(2)$. On the other hand, if $t > 2$, it is possible to find a positive rational number $r$ such that $t > r^2 > 2$ (see, for example, Problem 7, Section 7-1). Hence, $r \in X$, and since $r^2 \in X^2$ and $t > r^2$, also $t \in X^2$. This shows that $X^2 \supseteq X(2)$. Therefore, $X^2 = X(2)$. In other words, $X$ is the real number $\sqrt{2}$. In particular, $X$ cannot be of the form $X(t)$ for any rational number $t$, since otherwise $X(t)^2 = X(2)$ would imply $t^2 = 2$.
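The step showing that this cut has no smallest element can be tried out numerically. The following is a minimal sketch, not part of the original text: the names `in_cut` and `smaller_element` are hypothetical, and exact rational arithmetic stands in for $Q$. Starting from any $r \in X$, the map $r \mapsto r/2 + 1/r$ produces a strictly smaller element that is still in $X$.

```python
from fractions import Fraction


def in_cut(r: Fraction) -> bool:
    """Membership test for X = {r in Q | r > 0 and r^2 > 2}."""
    return r > 0 and r * r > 2


def smaller_element(r: Fraction) -> Fraction:
    """Given r in X, return s = r/2 + 1/r, which satisfies s in X and s < r."""
    if not in_cut(r):
        raise ValueError("r must belong to the cut")
    return r / 2 + 1 / r


r = Fraction(2)
for _ in range(4):
    s = smaller_element(r)
    # Each step stays inside the cut and moves strictly to the left.
    print(s, in_cut(s), s < r)
    r = s
```

Repeating the step gives a decreasing sequence of rationals whose squares stay above 2, which is exactly why no element of $X$ can be the smallest.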
THEOREM 7-4.10. The system $R$ of all real numbers, given by Definition 7-4.1, with the operations of addition, negation, multiplication, and order defined by Definitions 7-4.6, 7-4.7, 7-4.9, and 7-4.4, and with $X(0)$ and $X(1)$ as zero and identity element, is an ordered field.
The reader should refresh his memory by listing all the identities which have to be checked in the proof of this theorem. Some of these are trivial. For example,

$$X + Y = \{r + s \mid r \in X,\ s \in Y\} = \{s + r \mid s \in Y,\ r \in X\} = Y + X$$

establishes the commutative law of addition. The identities which involve multiplication (particularly the distributive law) are troublesome, because their proofs require the consideration of numerous cases. There are two rules whose proofs involve a new idea.

$X + (-X) = X(0)$.

If $X \in R$, $W \in R$, and $X \ne X(0)$, then there is a $Y \in R$ such that $X \cdot Y = W$.
The proofs of both these results use the following property of Dedekind cuts.

(7-4.11). If $X$ is a Dedekind cut, and if $r$ is a rational number greater than zero, then there is a rational number $s$ such that $s \notin X$ and $s + r \in X$.

Proof. Since $X \subset Q$, there is some $s \in Q$ with $s \notin X$. Suppose that (7-4.11) is false. Then for any $s$ not in $X$, it follows that $s + r \notin X$. Starting with such an $s$, we obtain $s + r \notin X$, $s + 2r = (s + r) + r \notin X$, $s + 3r = (s + 2r) + r \notin X$, etc. By induction, $s + nr \notin X$ for all natural numbers $n$. However, this is impossible. In fact, since $r > 0$, it is possible to choose $n$ large enough so that $s + nr$ exceeds any rational number. In particular, choosing $t \in X$, we can find $n$ so that $s + nr > t$. Then by Definition 7-3.2(b), $s + nr \in X$.
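The proof of (7-4.11) amounts to stepping to the right in increments of $r$ until the cut is entered. A hedged sketch of that search follows; it assumes a cut is given by a membership predicate `in_cut` on rationals and that some rational `outside` not in the cut is already known (both are illustrative devices, not notation from the text, as is the name `straddle`).

```python
from fractions import Fraction


def straddle(in_cut, r: Fraction, outside: Fraction) -> Fraction:
    """Return s with s not in X and s + r in X.

    in_cut  -- membership predicate of a Dedekind cut X (assumed to be a genuine cut)
    r       -- a rational number greater than zero
    outside -- any rational number known not to belong to X
    """
    if r <= 0:
        raise ValueError("r must be greater than zero")
    if in_cut(outside):
        raise ValueError("the starting point must lie outside the cut")
    s = outside
    # s, s + r, s + 2r, ... cannot all stay outside X, because X is nonempty
    # and s + n*r eventually exceeds any chosen element of X.
    while not in_cut(s + r):
        s += r
    return s


def sqrt2_cut(q: Fraction) -> bool:
    """The cut of Example 1: positive rationals whose square exceeds 2."""
    return q > 0 and q * q > 2


s = straddle(sqrt2_cut, Fraction(1, 10), Fraction(0))
print(s, s + Fraction(1, 10))
```

The loop terminates only because the predicate describes a nonempty cut; for an arbitrary predicate there is no such guarantee.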
Using (7-4.11), we show that $X + (-X) = X(0)$. Let $r \in X$ and $s \in -X$. Then by Definition 7-4.7 there is a rational number $t$ such that $s > t$, and $t > -u$ for all $u \in X$. In particular, $s > -r$, so that $r + s > 0$. Therefore, $X + (-X) = \{r + s \mid r \in X,\ s \in -X\} \subseteq X(0)$. On the other hand, suppose that $r \in X(0)$, that is, $r > 0$. By (7-4.11), it is possible to find $s \in Q$ such that $s \notin X$, and $s + (r/2) \in X$. Hence, $s < t$ for all $t \in X$. Consequently, $(-s) + r/2 > -s$ and $-s > -t$ for all $t \in X$. It follows that $(-s) + r/2 \in -X$. Therefore,

$$r = [s + (r/2)] + [(-s) + (r/2)] \in X + (-X).$$

Since $r$ was an arbitrary element of $X(0)$, we have proved $X + (-X) \supseteq X(0)$. Therefore, $X + (-X) = X(0)$.

We conclude this section by showing that if $X > X(0)$ and $W \ge X(0)$, then there is a Dedekind cut $Y$ such that $X \cdot Y = W$. Define

$$Y = \{r/s \mid r \in W,\ 0 < s < t \text{ for all } t \in X\}.$$
There must be rational numbers $s$ satisfying $0 < s < t$ for all $t \in X$, because $X$ is a Dedekind cut which is properly contained in $X(0) = \{r \in Q \mid r > 0\}$. Hence, $\emptyset \subset Y \subseteq X(0) \subset Q$. If $r \in W$ and $0 < s < t$
for all $t \in X$, and if $r/s < u$, then $r < su$. Hence, $su \in W$ and $u = su/s \in Y$. Finally, $Y$ has no smallest element, because if $r/s \in Y$, with $r \in W$ and $0 < s < t$ for all $t \in X$, then there exists $r' \in W$ such that $r' < r$. Consequently $r'/s < r/s$ and $r'/s \in Y$. This proves that $Y$ is a nonnegative Dedekind cut. By Definition 7-4.9,

$$X \cdot Y = \{u \cdot (r/s) \mid u \in X,\ r \in W,\ 0 < s < t \text{ for all } t \in X\}.$$
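If $u \in X$ and $0 < s < t$ for all $t \in X$, then in particular $s < u$. Moreover, since $r \in W$, and $W \ge X(0)$, it follows that $r > 0$. Therefore, $u \cdot (r/s) > r$. Hence, $X \cdot Y \subseteq W$. To reverse this inclusion, suppose that $r \in W$. Since $W$ has no smallest element, there is an $r' \in W$ with $r' < r$. Then $(r - r')/r' > 0$. Select $s \in Q$ so that $0 < s < t$ for all $t \in X$. Then $s(r - r')/r' > 0$. Hence, by (7-4.11), there exists $s' \in Q$ such that $s' \notin X$ and $s' + [s(r - r')/r'] \in X$. We can suppose that $s' \ge s$, since otherwise $s'$ could be replaced by $s$. Since $s' \notin X$, it follows that $0 < s' < t$ for all $t \in X$. Thus $r'/s' \in Y$ by the definition of $Y$. Therefore,

$$r = r' + (r - r') \ge r' + \frac{s}{s'}(r - r') = \left\{s' + [s(r - r')/r']\right\} \cdot \frac{r'}{s'} \in X \cdot Y,$$

so that $r \in X \cdot Y$ by Definition 7-3.2(b). This shows that $W \subseteq X \cdot Y$, and therefore $X \cdot Y = W$.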
1. (a) Show that if $u$ and $v$ are rational numbers with $u < v$, then $w = \frac{1}{2}(u + v)$ is a rational number satisfying $u < w < v$. (b) Use part (a) to prove that $X(r)$ is a Dedekind cut for every rational number $r$.
4. Suppose that X
X.
5. Show that X
<
6. Prove that if $X$ is a Dedekind cut, then the set $T_X = \{s \in Q \mid s > -r \text{ for all } r \in X\}$ satisfies Definition 7-3.2(a) and (b), and $\{r + s \mid r \in X,\ s \in T_X\} = X(0)$.
7. Draw a diagram to illustrate the proof that $X + (-X) \supseteq X(0)$.
8. Show from the definition of multiplication that $X \cdot X(0) = X(0)$ for all $X \in R$.
<
Y, then
10. Prove the following laws in $R$.
(a) $X + (Y + W) = (X + Y) + W$
(b) $X + X(0) = X$
(c) $X \cdot Y = Y \cdot X$
(d) $X \cdot (Y \cdot W) = (X \cdot Y) \cdot W$
(e) $X < Y$ implies $X + W < Y + W$
(f) $X < Y$ and $W > X(0)$ implies $X \cdot W < Y \cdot W$
=
+ Y)
(X)
12. Prove the distributive law $W \cdot (X + Y) = (W \cdot X) + (W \cdot Y)$ in the following cases.
(a) $X$, $Y$ and $W$ are nonnegative
(b) $W \ge X(0)$, $X < X(0)$, $X + Y \ge X(0)$ [Hint: Consider $W \cdot (X + Y) + W \cdot (-X)$.]
(c) $W \ge X(0)$, $X < X(0)$, $Y \ge X(0)$, $X + Y < X(0)$
(d) $W \ge X(0)$, $X < X(0)$, $Y < X(0)$
(e) $W < X(0)$, $X$, $Y$ arbitrary
+ + (Y). (X + Y)
<
Y T Y
w>
>
+ + +
=
X for al1 X E R.
14. Show that if $X \ne X(0)$ and $W$ is any element of $R$, then there exists $Y$ such that $X \cdot Y = W$. It is necessary to consider the three cases: $X > X(0)$, $W < X(0)$; $X < X(0)$, $W \ge X(0)$; $X < X(0)$, $W < X(0)$. These can be reduced to the case $X > X(0)$, $W \ge X(0)$, which has already been considered.
7-5 The completeness of the real numbers. Theorem 7-4.10 shows that the system $R$ of all real numbers is an ordered field. The same is true of the rational numbers, but we have seen that the field of real numbers is more versatile than the field $Q$. For example, such equations as $x^2 = 2$, $x^2 = 3$, $x^2 = 5$, etc., can be solved in $R$, but not in $Q$. Is it possible to find a property of $R$ which distinguishes it from arbitrary ordered fields? In this section we will show that such a fundamental property exists.
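DEFINITION 7-5.1. Let $A$ be an ordered integral domain.* Suppose that $S$ is a subset of $A$. An element $x \in A$ is called an upper bound of $S$ in $A$ if $x \ge y$ for all $y \in S$. An element $x \in A$ is called a lower bound of $S$ in $A$ if $x \le y$ for all $y \in S$.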
* To state this definition, or Definition 7-5.2, it is not necessary that $A$ be an integral domain. The only requirement is that $A$ be a partially ordered set. That is, there is a relation $\le$ defined on $A$ which satisfies (i) $x \le x$ for all $x \in A$; (ii) if $x \le y$ and $y \le x$, then $x = y$; (iii) if $x \le y$ and $y \le z$, then $x \le z$.
EXAMPLE 1. Let $A = Q$. Let $S = \{r \in Q \mid r^2 < 2\}$. Then $t \in Q$ is an upper bound of $S$ if $t > 0$ and $t^2 > 2$ (that is, considered as an element of $R$, $t > \sqrt{2}$). An element $u \in Q$ is a lower bound of $S$ if $u < 0$ and $u^2 > 2$ (that is, $u < -\sqrt{2}$).

EXAMPLE 2. Let $A = Z$. Let $S = \{a \in Z \mid a^2 < 2\}$. Then $S = \{-1, 0, 1\}$. Consequently, the upper bounds of $S$ in $Z$ are all integers $b \ge 1$. The lower bounds of $S$ are the integers $b \le -1$.

EXAMPLE 3. If $A$ is an ordered integral domain, and if $S$ is a subset of $A$ which has a greatest element $x$, then the upper bounds of $S$ in $A$ are all elements $y \in A$ such that $y \ge x$. Similarly if $z$ is the least element of $S$, then the lower bounds of $S$ are all of the elements $u \in A$ such that $u \le z$.

EXAMPLE 4. Let $A = Q$, $S$ ... bound in $A$.
DEFINITION 7-5.2. Let $A$ be an ordered integral domain. Suppose that $S$ is a subset of $A$. An element $x \in A$ is called the least upper bound of $S$ in $A$ if $x$ is the smallest element in the set of all upper bounds of $S$. An element $y \in A$ is called the greatest lower bound of $S$ in $A$ if $y$ is the largest element in the set of all lower bounds of $S$.

It is sometimes convenient to have a more formal statement of this definition. Referring to Definition 4-6.3, we see that $x$ is the least upper bound of $S$ if and only if (a) $x \ge y$ for all $y \in S$, (b) if $z \ge y$ for all $y \in S$, then $z \ge x$. Similarly, $x$ is the greatest lower bound of $S$ if and only if (a') $x \le y$ for all $y \in S$, (b') if $z \le y$ for all $y \in S$, then $z \le x$. Since the largest element in a set and the smallest element of a set are unique, if they exist at all, we are justified in speaking of the least upper bound and the greatest lower bound. Of course, the least upper bound and the greatest lower bound of a set may not exist at all.

The expressions l.u.b. $S$ and g.l.b. $S$ are frequently used as abbreviations for the least upper bound of $S$ and the greatest lower bound of $S$, respectively. Often the Latin terms "supremum" and "infimum" are used instead of "least upper bound" and "greatest lower bound." In this case, the abbreviations sup $S$ and inf $S$ are used. Thus, l.u.b. $S$ = sup $S$, and g.l.b. $S$ = inf $S$.
EXAMPLE 5. Let $A = Q$ and let $S = \{r \in Q \mid r^2 < 2\}$. Then $S$ has no least upper bound and no greatest lower bound in $Q$, because the set of all upper bounds of $S$ has no smallest element, and the set of all lower bounds of $S$ has no largest element. (See the Example in Section 7-4.)
EXAMPLE 6. Let $A = R$ and let $S = \{X \in R \mid X^2 < 2\}$. Then l.u.b. $S = \sqrt{2}$ in $R$ and g.l.b. $S = -\sqrt{2}$. (See the Example in Section 7-4.)

EXAMPLE 7. Let $A = R$ and let $T = \{X \in R \mid X^2 \le 2\}$. Then l.u.b. $T = \sqrt{2}$ and g.l.b. $T = -\sqrt{2}$. Note that in this example, the least upper bound and the greatest lower bound of $T$ actually belong to $T$, whereas in Example 6, this was not the case.
THEOREM 7-5.3. Let $F$ be an ordered field. Let $S$ and $T$ be nonempty subsets of $F$ such that g.l.b. $S$ and g.l.b. $T$ exist.
(a) If $U = \{x + y \mid x \in S,\ y \in T\}$, then g.l.b. $U$ = g.l.b. $S$ + g.l.b. $T$.
(b) If $V = \{x \cdot y \mid x \in S,\ y \in T\}$, and if all the elements of $S$ and $T$ are nonnegative, then g.l.b. $V$ = (g.l.b. $S$) $\cdot$ (g.l.b. $T$).
(c) If $W = \{-x \mid x \in S\}$, then l.u.b. $W$ = $-$(g.l.b. $S$).
Proof. We will prove (a) and (c), leaving (b) as a test for the reader. By definition of the greatest lower bound, it follows that g.l.b. $S \le x$ for all $x \in S$ and g.l.b. $T \le y$ for all $y \in T$. Hence, g.l.b. $S$ + g.l.b. $T \le x + y$ for each $x \in S$ and $y \in T$. That is, g.l.b. $S$ + g.l.b. $T$ is a lower bound of $\{x + y \mid x \in S,\ y \in T\} = U$. We wish to show that this sum is the greatest lower bound of $U$. That is, if $z \le x + y$ for all $x \in S$ and $y \in T$, then $z \le$ g.l.b. $S$ + g.l.b. $T$. Let $x$ be an arbitrary element of $S$. Then $z \le x + y$ for all $y \in T$, so that $z - x$ is a lower bound of $T$. Therefore, $z - x \le$ g.l.b. $T$. Transposing, we obtain $z -$ g.l.b. $T \le x$. Since $x$ can be any element of $S$, it follows that $z -$ g.l.b. $T$ is a lower bound of $S$. Thus, $z -$ g.l.b. $T \le$ g.l.b. $S$. This gives the desired result: $z \le$ g.l.b. $S$ + g.l.b. $T$. Therefore, (a) is proved.

To prove (c), note that by Definition 7-5.1, $w$ is an upper bound of $W$ if and only if $w \ge -x$ for all $x \in S$. The condition $w \ge -x$ is evidently equivalent to $-w \le x$. Thus, $w$ is an upper bound of $W$ if and only if $-w$ is a lower bound of $S$. Since $S$ has a greatest lower bound, the condition for $-w$ to be a lower bound of $S$ is the same as $-w \le$ g.l.b. $S$, or equivalently $w \ge -$(g.l.b. $S$). This sequence of equivalent statements shows that $-$(g.l.b. $S$) is an upper bound of $W$ and every other upper bound of $W$ is larger. Therefore, $-$(g.l.b. $S$) is the least upper bound of $W$.
A useful case of Theorem 7-5.3 occurs when the set $T$ consists of a single element $y$. The laws (a) and (b) then become
(a') g.l.b. $\{x + y \mid x \in S\}$ = (g.l.b. $S$) $+\ y$,
(b') g.l.b. $\{x \cdot y \mid x \in S\}$ = (g.l.b. $S$) $\cdot\ y$, provided that $y$ and all of the elements of $S$ are nonnegative.
The reader should be able to formulate and prove an analogue of Theorem 7-5.3 for least upper bounds.

If $A$ is any ordered integral domain and $S$ is the empty set, then every element of $A$ satisfies the condition for being an upper bound and a lower bound of $S$. This fact may seem strange, but a careful reading of Definition 7-5.1 shows that it is true. For instance, the condition $x \ge y$ for all $y \in \emptyset$ is satisfied vacuously, because there is no $y$ in $\emptyset$. It follows that the empty set has no least upper bound, and no greatest lower bound, since an ordered integral domain has no greatest element and no least element (see Problem 4 below). Also, if the set $S$ has no upper bound, then it cannot have a least upper bound. If $S$ has no lower bound, then it cannot have a greatest lower bound.

There are two important examples of ordered integral domains in which every nonempty set which has an upper bound also has a least upper bound, and every nonempty set which has a lower bound also has a greatest lower bound. These are the rings $Z$ and $R$.

DEFINITION 7-5.4. An ordered integral domain $A$ is called complete if it satisfies: (a) if $S$ is a nonempty set in $A$ which has an upper bound in $A$, then l.u.b. $S$ exists; (b) if $S$ is a nonempty set in $A$ which has a lower bound in $A$, then g.l.b. $S$ exists.

We leave it as a problem for the reader to show that $Z$ is complete.

THEOREM 7-5.5. $R$ is a complete ordered field.
Proof. Let S be a nonempty set of Dedekind cuts which has a lower bound. That is, there exists a cut X such that $X \le Y$ for all $Y \in S$. By definition of the ordering in R, this means that $Y \subseteq X$ for all $Y \in S$. Define
$W = \bigcup(\{Y \mid Y \in S\}).$
We will show that W is a Dedekind cut. Since S is not empty, there is some $Y \in S$. Therefore, $\emptyset \subset Y \subseteq W$. Since every Y in S is contained in X, it follows that $W \subseteq X \subset Q$. Therefore, W satisfies condition (a) of the definition of a Dedekind cut. Suppose that $r \in W$ and $r < s$. Then there is some $Y \in S$ such that $r \in Y$. Since Y is a cut, $r \in Y$ and $r < s$ implies $s \in Y$. Hence, $s \in Y \subseteq W$. Finally, W has no smallest element. For if $r \in W$, then $r \in Y$ for some $Y \in S$. Since Y has no smallest element, there is a rational number $r'$ such that $r' < r$ and $r' \in Y \subseteq W$. That is, for every number in W, there is a smaller number in W. Consequently, W has no smallest element, as claimed. We have shown that W satisfies all the conditions of a Dedekind cut. Therefore, $W \in R$.
We next prove that W is the greatest lower bound of S. By definition of W, if $Y \in S$, then $Y \subseteq W$. Therefore, $W \le Y$ for all $Y \in S$, so that W is a lower bound of S. Suppose that U is any lower bound of S. That is, $U \le Y$ for all $Y \in S$. Thus, $Y \subseteq U$ for all $Y \in S$. Hence,
$W = \bigcup(\{Y \mid Y \in S\}) \subseteq U.$
Therefore, $U \le W$. This shows that any lower bound of S in R is less than or equal to W, so that W is the greatest lower bound of S.
Our proof up to this point shows that if S is a nonempty subset of R which has a lower bound, then g.l.b. S exists. To complete the proof, it is necessary to prove that if T is a nonempty subset of R which has an upper bound, then l.u.b. T exists. Let $S = \{-X \mid X \in T\}$. If U is an upper bound of T, then $-U$ is a lower bound of S. Hence, g.l.b. S exists. Noting that $T = \{-Y \mid Y \in S\}$, it follows from Theorem 75.3(c) that l.u.b. T exists and is equal to $-(\text{g.l.b. } S)$. This completes the proof of Theorem 75.5.
1. Which of the following sets have upper bounds in Q? Which ones have lower bounds in Q?
$\{|a - b| \mid a \in Z,\ b \in Z\}$; $\{r^n \mid n \in N\}$, where $r \in Q$, $0 < r < 1$; $\{r^n \mid n \in N\}$, where $r \in Q$, $r > 1$; $\{n \mid n \in N\}$; $\{a \cdot b/(a^2 + b^2) \mid a \in N,\ b \in N\}$
2. Determine the least upper bounds and greatest lower bounds in Q (whenever they exist) of the sets given in Problem 1.
3. Show that any nonempty finite subset of an ordered integral domain has a least upper bound and a greatest lower bound.
4. Show that if A is an ordered integral domain, and if $S = A$, then S has no upper bound in A and no lower bound in A.
5. Show that if S is a nonempty set in an ordered integral domain, then every upper bound of S is greater than or equal to every lower bound of S. Can the equality ever hold? If so, when? Show that for any set S such that g.l.b. S and l.u.b. S exist, the inequality g.l.b. $S \le$ l.u.b. $S$ is satisfied.
6. Give examples of nonempty subsets S of Q which have upper and lower bounds, satisfying (a) g.l.b. S exists, but l.u.b. S does not exist, (b) g.l.b. S does not exist, but l.u.b. S exists.
76 Properties of complete ordered fields. It is difficult to overestimate the importance of the completeness property of the real numbers. Almost all of the fundamental theorems of analysis make use of completeness. In fact, one naturally wonders if it would be possible to construct mathematical theories such as calculus, using an arbitrary complete ordered field rather than the particular field R. The answer is that this would be possible, but because of the following theorem the results of this theory would not be any more general than the usual theorems concerning the real numbers.
THEOREM 76.1. Let F be a complete ordered field. Then there is an isomorphism between F and R which preserves the ordering. That is, there is a one-to-one correspondence between F and R such that if x and y in F correspond respectively to X and Y in R, then $x + y$ corresponds to $X + Y$, $x \cdot y$ corresponds to $X \cdot Y$, and $x < y$ if and only if $X < Y$.
Theorems 75.5 and 76.1 are the two most important results concerning the system of real numbers. Taken together, these theorems tell us that there is one, and, except for differences in the description of the elements and operations, only one complete ordered field. Theorem 76.1 also shows that any property which can be proved for the real numbers is a consequence of the ordered field properties and completeness. The complicated description of R by means of Dedekind cuts can now be discarded.* It was needed only to prove the existence of a complete ordered field. We will not prove Theorem 76.1 in spite of the importance of this result. Instead, the use of completeness will be illustrated by proving two important elementary theorems about R. To emphasize the fact that only
* However, there are a few results concerning R which are proved most easily by using the properties of Dedekind cuts. We will find such an example in Section 79.
the ordered field properties and completeness are used in the proofs, we will state these theorems for complete ordered fields. Then by Theorem 75.5, they are true for R in particular.
THEOREM 76.2. Let F be a complete ordered field. (a) Suppose that $x > 1$ in F. Then for any $y \in F$, there is a natural number n such that $x^n > y$. (b) Suppose that $0 \le x < 1$ in F. Then for any $y > 0$ in F, there is a natural number n such that $x^n < y$.
Proof. If statement (a) is false, then the set $S = \{x, x^2, x^3, \ldots\}$ has an upper bound y in F. Hence, by completeness there is a least upper bound w of S in F. Then $w \ge x^n$ for all $n \in N$. Since $x > 0$, it follows that $x^{-1} > 0$. Hence, $w \cdot x^{-1} \ge x^{n-1}$ for all $n \in N$. Thus, $w \cdot x^{-1}$ is also an upper bound of S. Since w is the least upper bound, this implies that $w \cdot x^{-1} \ge w$. Consequently, $x^{-1} \ge 1$ because $w > 0$. Thus, $x \le 1$. This inequality contradicts the original assumption that $x > 1$.
To prove (b), we first dispose of a trivial case. If $x = 0$, then $y > 0 = x^1$, so that (b) holds with $n = 1$. If $x > 0$, then since $x < 1$, it follows that $1 < x^{-1}$. By (a), there is a natural number n such that $(x^{-1})^n > y^{-1}$. Consequently, $x^n < y$.
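The existence statements of Theorem 76.2 can be made concrete by a direct search. The sketch below works in the ordered field of rational numbers, using Python's exact `Fraction` type; the rationals are not complete, but the two searches still terminate there, because the powers of a rational number greater than 1 are unbounded. The function names `power_exceeding` and `power_below` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def power_exceeding(x, y):
    """Smallest natural number n with x**n > y, assuming x > 1."""
    if x <= 1:
        raise ValueError("requires x > 1")
    n, p = 1, x
    while p <= y:          # keep multiplying until the power passes y
        n += 1
        p *= x
    return n

def power_below(x, y):
    """Smallest natural number n with x**n < y, assuming 0 <= x < 1 and y > 0."""
    if not (0 <= x < 1) or y <= 0:
        raise ValueError("requires 0 <= x < 1 and y > 0")
    n, p = 1, x
    while p >= y:          # keep multiplying until the power drops below y
        n += 1
        p *= x
    return n

n_a = power_exceeding(Fraction(3, 2), 10)
n_b = power_below(Fraction(1, 2), Fraction(1, 1000))
print(n_a, n_b)
```

The case x = 0 in part (b) is handled by the same loop, which stops at once with n = 1, in agreement with the trivial case treated in the proof.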
THEOREM 76.3. Let F be a complete ordered field. Let $z \in F$ be positive. Suppose that m is any natural number. Then there is one and only one positive $x \in F$ such that $x^m = z$.
Proof. This theorem is trivially true for $m = 1$, so that it can be assumed that $m > 1$. However, with minor changes of notation, the argument which follows is valid in the case $m = 1$, also. The proof is divided into three parts.
(1) We will use the completeness of F to show that there is an element $x \in F$ satisfying the following conditions: (a) $x > 0$; (b) if $0 \le y < x$, then $y^m < z$; (c) if $x < w$, then $w^m \ge z$. Let
$S = \{y \in F \mid y \ge 0,\ y^m < z\}.$
Then S contains some positive element of F. For example, if $y = \min\{1, z/2\}$, then $0 < y \le 1$ and $y < z$, so that $y^m \le y < z$. Suppose that $w \ge 0$ in F satisfies $w^m \ge z$. Then $w \ge y$ for all $y \in S$. For if this is not the case, then $w < y$ for some $y \in S$. It would then follow that $w^m < y^m < z$, which is contrary to $w^m \ge z$. Therefore, in particular, $\max\{1, z\}$ is an upper bound of S. By the completeness of F, the set S has a least upper bound in F. Let
$x = \text{l.u.b. } S.$
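We will show that x satisfies (a), (b), and (c). Since $x \ge y$ for all $y \in S$ and some $y \in S$ satisfies $y > 0$, it follows that $x > 0$. To prove (b), suppose that $0 \le y < x$. Then y is not an upper bound of S, since x is the least upper bound of S. Therefore, $y < y_1$ for some $y_1 \in S$. Consequently by the definition of S, $y^m < y_1^m < z$. To prove (c), suppose that $x < w$. Then $w \notin S$, because x is an upper bound of S. Therefore, $w^m$ is not less than z, that is, $w^m \ge z$.
(2) We now show that if x satisfies (a), (b), and (c), then both of the inequalities $x^m > z$ and $x^m < z$ lead to contradictions. Therefore, $x^m = z$. Suppose first that $x^m > z$. Let
$y = x - (x^m - z)(m \cdot x^{m-1})^{-1}.$
Then (d) $0 < y < x$, (e) $(x - y)(m \cdot x^{m-1}) = x^m - z$. Therefore, using the identity
$x^m - y^m = (x - y)(x^{m-1} + x^{m-2} y + \cdots + y^{m-1}),$
we obtain
$x^m - y^m \le (x - y)(m \cdot x^{m-1}) = x^m - z.$
Consequently, $z \le y^m$. However, by (d) and (b), $y^m < z$. Thus, $x^m > z$ leads to a contradiction. Next, assume that $x^m < z$. Define
$w = \min\{2x,\ x + (z - x^m)(2^m x^{m-1})^{-1}\}.$
Then (f) $x < w \le 2x$, (g) $(w - x)(2^m x^{m-1}) \le z - x^m$. Therefore, using the identity
$w^m - x^m = (w - x)(w^{m-1} + w^{m-2} x + \cdots + x^{m-1}),$
we obtain
$w^m - x^m < (w - x)(2^m x^{m-1}) \le z - x^m.$
Hence, $w^m < z$. However, by (f) and (c), $z \le w^m$. Therefore, the assumption $x^m < z$ also leads to a contradiction. The only remaining possibility is $x^m = z$.
(3) We complete the proof of Theorem 76.3 by showing that there is only one positive $x \in F$ such that $x^m = z$. Suppose that x and y are in F, $0 < x$, $0 < y$, $x^m = z$, and $y^m = z$. Then $x^m = y^m$, so that
$0 = x^m - y^m = (x - y)(x^{m-1} + x^{m-2} y + \cdots + y^{m-1}).$
Since $0 < x^{m-1} + x^{m-2} y + \cdots + y^{m-1}$, it follows that $x - y = 0$, that is, $x = y$.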
This proof is a typical sample of the reasoning methods which are used in analysis. To a beginning student, such proofs look very mysterious and complicated. Often the problem is that the details obscure the simple idea on which the argument is based. In order to understand such a proof, it is necessary to strip away the details and find the underlying idea. The above proof provides a good example. As the quantity y increases, starting at zero, the value $y^m$ also increases continuously, that is, it does not jump. Since $y^m$ will ultimately exceed z, there must be some first value x of y for which $x^m \ge z$. This value is obtained in (1) by taking the least upper bound of $\{y \in F \mid y \ge 0,\ y^m < z\}$. Then the fact that the increase of $y^m$ is continuous implies that $y^m$ cannot have "jumped over" z at x. Therefore, $x^m = z$. This is what was established in part (2) of the proof.
The unique positive $x \in F$ satisfying $x^m = z$ is called the mth root of z in F. This quantity is usually denoted by $\sqrt[m]{z}$. (If $m = 2$, the expression is customarily abbreviated to $\sqrt{z}$.) It is convenient to define $\sqrt[m]{0} = 0$. If $z < 0$, then the expression $\sqrt[m]{z}$ is not defined.* By Theorems 75.5 and 76.3, we have proved that every positive real number has a unique positive mth root for all natural numbers m.
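The idea of obtaining the mth root as the least upper bound of the set of nonnegative y with y^m less than z can be imitated numerically by bisection. The following is a minimal sketch, assuming exact rational arithmetic with Python's `Fraction`; it does not compute the root itself, which may be irrational, but only a pair of rationals enclosing it. The function name `root_bounds` and the parameter `steps` are illustrative.

```python
from fractions import Fraction

def root_bounds(z, m, steps=40):
    """Return rationals (lo, hi) with lo**m < z <= hi**m, for z > 0."""
    z = Fraction(z)
    if z <= 0 or m < 1:
        raise ValueError("requires z > 0 and m >= 1")
    lo = Fraction(0)                 # 0 lies in S = {y >= 0 : y**m < z}
    hi = max(Fraction(1), z)         # max(1, z) is an upper bound of S
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** m < z:
            lo = mid                 # mid belongs to S
        else:
            hi = mid                 # mid is an upper bound of S
    return lo, hi

lo, hi = root_bounds(2, 2)
print(float(lo), float(hi))
```

Each step halves the gap between an element of S and an upper bound of S, so the two values close in on the least upper bound from both sides.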
1. Let e be the identity element of a complete ordered field F. Show that if $y \in F$, then there exists a natural number n such that $y < ne$.
2. Let F be a complete ordered field. Let m be an odd natural number. Let $z \in F$ (either positive or negative). Prove that there is one and only one $x \in F$ such that $x^m = z$.
3. Let F be a complete ordered field. Suppose that $x \ge 0$ in F. Define for $a \in Z$, $m \in N$,
$x^{a/m} = (x^{1/m})^a.$
Show that if $a/m = b/n$, then $x^{a/m} = x^{b/n}$. Thus, $x^r$ is well defined for every rational number r. Prove the following rules of exponents. (a) $x^r \cdot x^s = x^{r+s}$ for $x \ge 0$ in F, $r \in Q$, and $s \in Q$. (b) $(x^r)^s = x^{r \cdot s}$ for $x \ge 0$ in F, $r \in Q$, and $s \in Q$. (c) $(x \cdot y)^r = x^r \cdot y^r$ for $x \ge 0$ and $y \ge 0$ in F, and $r \in Q$.
2.
The following problems lead to a proof of Theorem 76.1. They should be done in order. In all of these problems, e denotes the identity element of a complete ordered field F.
5. Suppose that $y \in F$, $z \in F$ are such that $z - y > e$. Show by the well-ordering principle (using the result of Problem 1) that there is an integer a such that $y < ae < z$.
6. Suppose that $y \in F$, $z \in F$ are such that $y < z$. Show that there is an integer a and a natural number m such that $y < (ae)/(me) < z$.
>
b/n in Q.
* For odd values of m, it would make sense to let $\sqrt[m]{z} = -\sqrt[m]{-z}$ for $z < 0$. However, for m even and $z < 0$, the expression is meaningless in an ordered field, because of Theorem 45.5.
8. For $x \in F$, define $X(x) = \{a/m \in Q \mid a \in Z,\ m \in N,\ x < (ae)/(me)\}$. Show that $X(x)$ is a Dedekind cut.
9. With the notation of Problem 8, prove that if $x < y$ in F, then $X(x) < X(y)$.
10. Show that if X is a Dedekind cut, then there exists $x \in F$ such that $X = X(x)$, where $X(x)$ is defined as in Problem 8. [Hint: Let x be the greatest lower bound in F of the set $\{(ae)/(me) \mid a \in Z,\ m \in N,\ a/m \in X\}$.]
11. With the notation of Problem 8, prove that
$\{a/m \in Q \mid a \in Z,\ m \in N,\ 0 < (ae)/(me) \text{ in } F\}$ is
(b) Use this fact, together with the properties of addition in R, to show that $X(-x) = -X(x)$.
13. (a) Prove that $X(0) \cdot X(x) = X(0 \cdot x)$ for all $x \in F$. (b) Prove that $X(x) \cdot X(y) = X(x \cdot y)$ for $x > 0$ and $y > 0$ in F. (c) Prove that $X(x) \cdot X(y) = X(x \cdot y)$ for all x and y in F.
14. Show that $x \to X(x)$ is an isomorphism between F and R which preserves order (see Theorem 76.1).
*77 Infinite sequences. In order to bring our discussion of the real number system back to its starting point, we must show that the real numbers (considered as Dedekind cuts) can be represented by means of the infinite decimal sequences discussed in Section 71. This will be done in Section 79. The theoretical foundation of decimal representations will be laid in this section and the following one. Since the real numbers often occur as elements of sets, it is confusing to use capital letter set symbols to denote these objects. We will therefore change our notation, beginning in this section, and denote real numbers by small Latin letters u, v, w, etc. The ring of rational numbers will always be considered to be a subring of R, and this convention leads to the inclusions
$N \subset Z \subset Q \subset R.$
It is clear from the discussion of Theorem 76.1 that we can ignore the way in which the real numbers are constructed without losing any essential information about them. The vital fact to remember is that R is a complete ordered field.
DEFINITION 77.1. Let $u_1, u_2, u_3, \ldots$ be an infinite sequence of real numbers. This sequence is said to converge to a real number v if, for any real numbers $w_1$ and $w_2$ satisfying $w_1 < v < w_2$, there is a natural number k (depending on how close $w_1$ and $w_2$ are to v) such that if $n \ge k$, then $w_1 < u_n < w_2$, that is,
$w_1 < u_k < w_2, \quad w_1 < u_{k+1} < w_2, \quad w_1 < u_{k+2} < w_2,$
and so forth. This definition is so important in mathematics that it deserves some discussion. The meaning which we wish to convey by saying that $u_1, u_2, u_3, \ldots$ converges to v is that the numbers $u_n$ get close to v as we move to the right along the sequence. It is natural to ask, how close to v? The answer is "arbitrarily close to v" by going out "sufficiently far." The expressions in quotation marks are vague, but they can be made exact. The phrase "the u's are arbitrarily close to v" must be replaced by an expression such as "the u's lie in an arbitrarily small interval around v," and the phrase "sufficiently far out along the sequence" should be changed to "from some point on in the sequence." Combining these replacements gives a better informal definition of convergence of a sequence to v: no matter how small an interval is prescribed around v, all the numbers of the sequence from a certain point on lie in this interval. The reader can now see that Definition 77.1 is only a formal restatement (using mathematical symbolism) of this informal definition.
It appears offhand that for any sequence $u_1, u_2, u_3, \ldots$ there might be three possibilities:
(a) $u_1, u_2, u_3, \ldots$ does not converge to any real number;
(b) $u_1, u_2, u_3, \ldots$ converges to exactly one real number;
(c) $u_1, u_2, u_3, \ldots$ converges to two or more real numbers.
We will show that this last possibility is inconsistent with the definition of convergence.
THEOREM 77.2. It is impossible for an infinite sequence to converge to two different real numbers.
Proof. Suppose that the sequence $u_1, u_2, u_3, \ldots$ converged to numbers $v_1$ and $v_2$ with $v_1 < v_2$. Let $w_1$, $w_2$, and $w_3$ be any numbers satisfying $w_1 < v_1 < w_2 < v_2 < w_3$. Then by Definition 77.1, there are natural numbers $k_1$ and $k_2$ such that if $n \ge k_1$, then $w_1 < u_n < w_2$, and if $m \ge k_2$, then $w_2 < u_m < w_3$. However, if n is larger than both $k_1$ and $k_2$, these conditions imply $u_n < w_2 < u_n$, which is impossible. Thus, the sequence $u_1, u_2, u_3, \ldots$ cannot converge to two different numbers.
Because of this theorem, we are justified in saying that v is the limit of the sequence $u_1, u_2, u_3, \ldots$ if this sequence converges to v. In this case it is customary to write $v = \lim_{n \to \infty} u_n$.
EXAMPLE 1. Let $u_1, u_2, u_3, \ldots$ be the sequence 1, 2, 3, . . . . Then this sequence does not converge to any real number v, since for any v, there is some m such that $v + 1 < m < m + 1 < \cdots$. In particular, it is not possible to find a natural number k such that $v - 1 < u_n < v + 1$ for all $n \ge k$. This example shows that the possibility (a) listed above can occur.
EXAMPLE 2. Let $u_1, u_2, u_3, \ldots$ be the sequence 1, 0, 1, 0, 1, 0, . . . . Then this sequence does not converge to any real number v. In fact, no matter what the number v might be, it is impossible to have $v - \frac{1}{2} < 0 < v + \frac{1}{2}$ and $v - \frac{1}{2} < 1 < v + \frac{1}{2}$, which is what would be required if we took $w_1 = v - \frac{1}{2}$ and $w_2 = v + \frac{1}{2}$ in Definition 77.1.
EXAMPLE 3. Let $u_1, u_2, u_3, \ldots$ be the sequence $1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots$. Then $\lim_{n \to \infty} u_n = 0$. For suppose that $w_1 < 0 < w_2$. Choose k to be the smallest natural number which is greater than $(w_2)^{-1}$. If $n \ge k$, then $n > (w_2)^{-1}$. Therefore, $w_1 < 0 < u_n = 1/n < w_2$. This example shows that the possibility (b) listed above can also occur.
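The choice of k in Example 3 can be traced numerically. The following is a minimal sketch, assuming exact rational arithmetic through Python's standard `fractions` module; the function name `index_for_reciprocals` is illustrative and not part of the text.

```python
from fractions import Fraction
import math

def index_for_reciprocals(w2):
    """Smallest natural number k with k > 1/w2, assuming w2 > 0."""
    w2 = Fraction(w2)
    if w2 <= 0:
        raise ValueError("w2 must be positive")
    return math.floor(1 / w2) + 1

w2 = Fraction(1, 100)
k = index_for_reciprocals(w2)
# Every term 1/n with n >= k lies strictly between 0 and w2
# (checked here only for a finite stretch of the sequence).
tail_ok = all(0 < Fraction(1, n) < w2 for n in range(k, k + 1000))
print(k, tail_ok)
```

The lower number of Definition 77.1 plays no role here, since every term of the sequence is positive and the lower number is negative.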
EXAMPLE 4. Let $u_1, u_2, u_3, \ldots$ be the infinite sequence of real numbers u, u, u, . . . , all of which are the same. Then it is evident, and even easy to prove, that $\lim_{n \to \infty} u_n = u$.
EXAMPLE 5. Let $u_1, u_2, u_3, \ldots$ be the infinite sequence $t, t^2, t^3, \ldots$, where t is some real number. Suppose that $|t| < 1$. Then $|t| > |t|^2 > |t|^3 > \cdots$. It is a familiar fact that $|t|^n$ "gets close to zero" as n gets large. On a more rigorous level, it follows from Theorem 76.2(b) that if $w > 0$, then there is a natural number k such that $w > |t|^k$. Therefore, $w > |t|^n$, or equivalently $-w < t^n < w$, for all $n \ge k$. If $w_1 < 0 < w_2$, let $w = \min\{-w_1, w_2\}$. Then
$w_1 \le -w < t^n < w \le w_2$
for all $n \ge k$. Thus, $t, t^2, t^3, \ldots$ converges to 0 when $|t| < 1$. If $|t| > 1$, then for any v there is a natural number n such that $|t|^n > |v|$, by Theorem 76.2(a). It follows easily that the sequence $t, t^2, t^3, \ldots$ does not converge if $|t| > 1$. If $|t| = 1$, then either $t = 1$ and the sequence is 1, 1, 1, . . . , or $t = -1$ and the sequence is $-1, 1, -1, \ldots$. In the first case the sequence converges to 1, and in the second case it does not converge.
EXAMPLE 6. Let $u_1, u_2, u_3, \ldots$ be the sequence $0, \frac{1}{2}, -\frac{1}{3}, 0, \frac{1}{5}, -\frac{1}{6}, \ldots$. That is, $u_n = 0$ if $n \equiv 1$ (mod 3), $u_n = 1/n$ if $n \equiv 2$ (mod 3), and $u_n = -1/n$ if $n \equiv 0$ (mod 3). Then $\lim_{n \to \infty} u_n = 0$. We leave the proof of this fact as a problem for the reader.
There are some useful properties of the limits of sequences which will be needed in the succeeding sections.
THEOREM 77.3. Let $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$ be infinite sequences of real numbers which have limits. Let w be any real number. Then (a) $\lim_{n \to \infty} (u_n + v_n) = \lim_{n \to \infty} u_n + \lim_{n \to \infty} v_n$, and (b) $\lim_{n \to \infty} (w \cdot u_n) = w \cdot \lim_{n \to \infty} u_n$.
The meaning of (a) is that the sequence $u_1 + v_1, u_2 + v_2, u_3 + v_3, \ldots$ has a limit which is the sum of the limits of the sequences $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$. Equality (b) means that the sequence $w \cdot u_1, w \cdot u_2, w \cdot u_3, \ldots$ has a limit which is w times the limit of the sequence $u_1, u_2, u_3, \ldots$.
Suppose that $\lim_{n \to \infty} u_n = u$, and that $\lim_{n \to \infty} v_n = v$. To prove (a), we must show that if $w_1 < u + v < w_2$, then there is a natural number k such that for $n \ge k$, $w_1 < u_n + v_n < w_2$. From the inequality $w_1 < u + v < w_2$, it follows that $w_1 - u < v < w_2 - u$. Choose $w_1'$ and $w_2'$ so that $w_1 - u < w_1' < v < w_2' < w_2 - u$. Then $w_1 - w_1' < u < w_2 - w_2'$. This, together with the inequality $w_1' < v < w_2'$, allows us to use the hypotheses $\lim_{n \to \infty} u_n = u$ and $\lim_{n \to \infty} v_n = v$. Indeed, by Definition 77.1, there must be natural numbers $k_1$ and $k_2$ such that if $n \ge k_1$, then $w_1 - w_1' < u_n < w_2 - w_2'$, and if $n \ge k_2$, then $w_1' < v_n < w_2'$. Let $k = \max\{k_1, k_2\}$. Then if $n \ge k$, it follows that $n \ge k_1$ and $n \ge k_2$. Therefore, $n \ge k$ implies $w_1 - w_1' < u_n < w_2 - w_2'$ and $w_1' < v_n < w_2'$. Adding these inequalities gives $w_1 < u_n + v_n < w_2$ for all $n \ge k$. This proves (a).
The proof of (b) must be separated into three cases: $w = 0$, $w > 0$, and $w < 0$. If $w = 0$, then the statement to be proved is that the limit of the sequence 0, 0, 0, . . . is 0. This is clear. Suppose that $w > 0$. Let $\lim_{n \to \infty} u_n = u$, as before. We wish to show that $\lim_{n \to \infty} w \cdot u_n = w \cdot u$. That is, if $w_1 < w \cdot u < w_2$, then there is a natural number k such that $w_1 < w \cdot u_n < w_2$ for all $n \ge k$. The inequality $w_1 < w \cdot u < w_2$ and the fact that $w > 0$ yield $w^{-1} \cdot w_1 < u < w^{-1} \cdot w_2$. Since $\lim_{n \to \infty} u_n = u$, there is a k such that $w^{-1} \cdot w_1 < u_n < w^{-1} \cdot w_2$ for all $n \ge k$. Consequently, $w_1 < w \cdot u_n < w_2$ for all $n \ge k$. The proof for $w < 0$ differs from the proof in the case $w > 0$ only in that the inequality $w_1 < w \cdot u < w_2$ is equivalent to $w^{-1} \cdot w_2 < u < w^{-1} \cdot w_1$, rather than $w^{-1} \cdot w_1 < u < w^{-1} \cdot w_2$. As a particular case of Theorem 77.3(b), we obtain (for $w = -1$)
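$\lim_{n \to \infty} (-u_n) = -\lim_{n \to \infty} u_n.$
It is also true that
$\lim_{n \to \infty} (u_n \cdot v_n) = (\lim_{n \to \infty} u_n) \cdot (\lim_{n \to \infty} v_n).$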
This formula is more general than Theorem 77.3(b), but we will not prove it. (See Problem 9, however.)
There are two problems associated with every sequence. Does the sequence converge to some number? If so, to what real number does it converge? It appears from Definition 77.1 that in order to give a "yes" answer to the first of these questions, it would be necessary to have the answer to the second. It turns out that this is not always the case. Many methods have been devised which, for particular types of sequences, yield a criterion for convergence. One of the simplest is the following.
THEOREM 77.4. Let $u_1, u_2, u_3, \ldots$ be an increasing sequence of real numbers, that is, $u_1 \le u_2 \le u_3 \le \cdots$. Then this sequence converges if and only if it has an upper bound (in other words, there is a real number w such that $u_n \le w$ for all n).
Proof. First suppose that $u_1, u_2, u_3, \ldots$ converges to v. Choose any real numbers $w_1$ and $w_2$ satisfying $w_1 < v < w_2$. Then by Definition 77.1, there is a natural number k such that if $n \ge k$, then $w_1 < u_n < w_2$. In particular, $u_1 \le u_2 \le \cdots \le u_k \le u_{k+1} \le \cdots \le u_n < w_2$ for all $n \ge k$. That is, $w_2$ is an upper bound of $\{u_n \mid n \in N\}$.
Conversely, assume that $\{u_n \mid n \in N\}$ has an upper bound. Then by the completeness of R, this set also has a least upper bound v. We will prove that $\lim_{n \to \infty} u_n = v$. Suppose that $w_1 < v < w_2$. Then $u_n \le v < w_2$, for all n, since v is an upper bound of $\{u_n \mid n \in N\}$. Moreover, because v is the least upper bound of $\{u_n \mid n \in N\}$, and $w_1 < v$, it follows that $w_1$ cannot be an upper bound of the set of $u_n$'s. Hence, there is some natural number k such that $w_1 < u_k$. Then
$w_1 < u_k \le u_{k+1} \le u_{k+2} \le \cdots,$
so that $w_1 < u_n < w_2$ for all $n \ge k$.
It is possible to prove a theorem similar to Theorem 77.4 for decreasing sequences of real numbers. A decreasing sequence converges if and only if it has a lower bound. These results do not give any information about sequences which are neither increasing nor decreasing. Such sequences can be bounded but not converge, as Example 2 shows.
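The second half of the proof of Theorem 77.4 locates the index k by using only the fact that a number below the least upper bound is not an upper bound. The sketch below illustrates this for one increasing sequence chosen for the purpose, namely n/(n + 1), whose least upper bound is 1. Both the sequence and the function names `u` and `first_index_above` are assumptions of this illustration.

```python
from fractions import Fraction

def u(n):
    """The nth term of the increasing sequence 1/2, 2/3, 3/4, ..."""
    return Fraction(n, n + 1)

def first_index_above(w1):
    """Smallest k with w1 < u(k); requires w1 < 1, the least upper bound."""
    if w1 >= 1:
        raise ValueError("w1 must be less than the least upper bound 1")
    k = 1
    while u(k) <= w1:      # w1 is not an upper bound, so this loop stops
        k += 1
    return k

w1 = Fraction(99, 100)
k = first_index_above(w1)
# From index k on, every term lies between w1 and the upper bound 1.
tail_ok = all(w1 < u(n) < 1 for n in range(k, k + 500))
print(k, tail_ok)
```

Because the sequence is increasing, a single term above the lower number guarantees that all later terms are above it as well, which is exactly the step used in the proof.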
2. Show that if u is any real number, then the sequence u, u, u, . . . converges to u.
3. Prove the statement made in Example 6.
4. Show that if $u_1, u_2, u_3, \ldots$ is any infinite sequence of real numbers, and if $v_1, v_2, v_3, \ldots$ is the sequence obtained from $u_1, u_2, u_3, \ldots$ by omitting the first m terms, then $v_1, v_2, v_3, \ldots$ converges to w if $u_1, u_2, u_3, \ldots$ converges to w, and it does not converge if $u_1, u_2, u_3, \ldots$ does not converge.
1~31,
. . . converges
6. Let $u_1, u_2, u_3, \ldots$ be an infinite sequence of real numbers. Suppose that for each real number u, there is some n such that $|u_n| > |u|$. Prove that the sequence does not converge.
7. Use Theorem 77.3(b) (with $w = -1$) to prove the analogue of Theorem 77.4 for decreasing sequences.
8. Show that any convergent sequence, considered as a set, has an upper bound and a lower bound.
9. Prove that $\lim_{n \to \infty} (u_n \cdot v_n) = (\lim_{n \to \infty} u_n) \cdot (\lim_{n \to \infty} v_n)$ in the following cases. (a) $\lim_{n \to \infty} u_n = 0$ (b) $\lim_{n \to \infty} u_n > 0$
*78 Infinite series. A particularly important class of sequences is obtained from the formal expressions called infinite series. To motivate the concept of an infinite series, let us return to the decimal fractions which were discussed in Section 71. The usual notation
For example,
and
This observation tempts us to use a similar interpretation for infinite decimal sequences. We would like to write
However, the sum of infinitely many numbers is not defined. By using the definition of convergence of sequences, it is sometimes possible to assign a meaning to infinite sums. In particular, this definition covers all of the sums which are associated with infinite decimal sequences.
DEFINITION 78.1. An infinite series is an expression
$\sum_{k=1}^{\infty} v_k = v_1 + v_2 + v_3 + \cdots,$
where $v_1, v_2, v_3, \ldots$ is a given sequence of numbers. The elements of this sequence are called the terms of the series.
DEFINITION 78.2. Let $\sum_{k=1}^{\infty} v_k$ be an infinite series. The number
$u_n = \sum_{k=1}^{n} v_k = v_1 + v_2 + \cdots + v_n$
is called the nth partial sum of this series. The series is said to converge to u, or to have the sum u, or simply to be convergent, if the sequence $u_1, u_2, u_3, \ldots$ converges to u. In this case, we write
$\sum_{k=1}^{\infty} v_k = u.$
If the sequence $u_1, u_2, u_3, \ldots$ does not converge, then the series is called divergent.
A convenient way to abbreviate the definition of the sum of an infinite series is by the formula
$\sum_{k=1}^{\infty} v_k = \lim_{n \to \infty} \sum_{k=1}^{n} v_k.$
EXAMPLE 1. Let $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, . . . . Then the nth partial sum of $\sum_{k=1}^{\infty} v_k$ is $1 + 1 + \cdots + 1 = n$. Since the sequence 1, 2, 3, . . . of partial sums is not convergent (Example 1, Section 77), it follows from Definition 78.2 that this series is divergent.
EXAMPLE 2. Let $v_1 = 1$, $v_2 = -1$, $v_3 = 1$, . . . , $v_k = (-1)^{k+1}$, . . . . Then $v_1 = 1$, $v_1 + v_2 = 1 - 1 = 0$, $v_1 + v_2 + v_3 = 1 - 1 + 1 = 1$, $v_1 + v_2 + v_3 + v_4 = 1 - 1 + 1 - 1 = 0$, . . . . That is, the sequence of partial sums of the series $\sum_{k=1}^{\infty} v_k$ is 1, 0, 1, 0, . . . . Since this sequence does not converge (Example 2, Section 77), it follows that $\sum_{k=1}^{\infty} (-1)^{k+1}$ is divergent.
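EXAMPLE 3. Let $v_1 = 1/(1 \cdot 2)$, $v_2 = 1/(2 \cdot 3)$, $v_3 = 1/(3 \cdot 4)$, . . . . Then
$u_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n+1}.$
Since $\lim_{n \to \infty} 1/(n+1) = 0$, the series $\sum_{k=1}^{\infty} 1/(k(k+1))$ converges to 1.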
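The partial sums of the series in Example 3 can be computed exactly, which makes the cancellation visible. This is a small sketch, assuming that the kth term is 1/(k(k + 1)) and using Python's exact `Fraction` type; the function name `partial_sum` is illustrative.

```python
from fractions import Fraction

def partial_sum(n):
    """nth partial sum of the series with terms 1/(k(k+1))."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

# Each partial sum should equal 1 - 1/(n+1), so the gap to 1 is 1/(n+1).
closed_form_ok = all(partial_sum(n) == 1 - Fraction(1, n + 1)
                     for n in range(1, 60))
gap_at_1000 = 1 - partial_sum(1000)
print(closed_form_ok, gap_at_1000)
```

The distance from the nth partial sum to 1 is 1/(n + 1), which is the sequence of reciprocals shifted by one place, so its convergence to 0 is the same fact as in Example 3 of Section 77.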
EXAMPLE 4. Let $v_1 = 1$, $v_2 = t$, $v_3 = t^2$, . . . , $v_k = t^{k-1}$, . . . . That is, $\sum_{k=1}^{\infty} v_k = \sum_{k=1}^{\infty} t^{k-1}$. It is easy to prove by induction [see Problem 6(a), Section 21] that the nth partial sum of this series is
$1 + t + t^2 + \cdots + t^{n-1} = \frac{1 - t^n}{1 - t},$
provided that $t \ne 1$. It follows from Theorem 77.3 and Example 5, Section 77, that
$\lim_{n \to \infty} \frac{1 - t^n}{1 - t} = \frac{1}{1 - t}$
if $|t| < 1$, and this limit does not exist if $|t| > 1$. Hence, by Definition 78.2, the series $\sum_{k=1}^{\infty} t^{k-1}$ converges to $1/(1 - t)$ if $|t| < 1$, and it diverges if $|t| > 1$. If $|t| = 1$, then $t = 1$ or $t = -1$. In these cases, the series $\sum_{k=1}^{\infty} t^{k-1}$ is the same as the ones discussed in Examples 1 and 2, both of which diverge. This example, and Example 5, Section 77, upon which it is based, should be studied carefully. Both results have important applications in the theory of infinite sequences and series.
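The formula for the partial sums of the geometric series in Example 4 is easy to test with exact arithmetic. The following sketch assumes the series with terms t^(k-1) and the sample value t = 1/3, which is chosen only for illustration; the function name `geometric_partial_sum` is hypothetical.

```python
from fractions import Fraction

def geometric_partial_sum(t, n):
    """1 + t + t**2 + ... + t**(n-1), computed term by term."""
    return sum(t ** (k - 1) for k in range(1, n + 1))

t = Fraction(1, 3)
limit = 1 / (1 - t)                      # 1/(1 - t), here 3/2

# The term-by-term sum agrees with (1 - t**n)/(1 - t).
formula_ok = all(geometric_partial_sum(t, n) == (1 - t ** n) / (1 - t)
                 for n in range(1, 30))
# The gap to 1/(1 - t) is t**n/(1 - t), which shrinks with n.
gaps = [limit - geometric_partial_sum(t, n) for n in range(1, 30)]
print(formula_ok, float(gaps[-1]))
```

The gap between the nth partial sum and 1/(1 - t) is t^n/(1 - t), so the convergence of the series for |t| < 1 reduces to the convergence of the powers of t to 0.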
Many of the results concerning infinite sequences lead to theorems about infinite series. A typical example is the following.
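THEOREM 78.3. Let $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ be infinite series which converge. Let w be any real number. Then
(a) $\sum_{k=1}^{\infty} (v_k + w_k) = \sum_{k=1}^{\infty} v_k + \sum_{k=1}^{\infty} w_k$,
(b) $\sum_{k=1}^{\infty} (w \cdot v_k) = w \cdot \sum_{k=1}^{\infty} v_k$.
That is, under the assumptions that the series $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ converge, the series $\sum_{k=1}^{\infty} (v_k + w_k)$ and $\sum_{k=1}^{\infty} (w \cdot v_k)$ converge to the corresponding expressions on the right-hand side of (a) and (b). To prove (a), note that by the generalized commutative law
$\sum_{k=1}^{n} (v_k + w_k) = \sum_{k=1}^{n} v_k + \sum_{k=1}^{n} w_k.$
Therefore, by Theorem 77.3(a),
$\sum_{k=1}^{\infty} (v_k + w_k) = \lim_{n \to \infty} \sum_{k=1}^{n} v_k + \lim_{n \to \infty} \sum_{k=1}^{n} w_k = \sum_{k=1}^{\infty} v_k + \sum_{k=1}^{\infty} w_k.$
Similarly, by the generalized distributive law,
$\sum_{k=1}^{n} w \cdot v_k = w \cdot \left(\sum_{k=1}^{n} v_k\right).$
Thus, by Theorem 77.3(b),
$\sum_{k=1}^{\infty} (w \cdot v_k) = \lim_{n \to \infty} w \cdot \left(\sum_{k=1}^{n} v_k\right) = w \cdot \sum_{k=1}^{\infty} v_k.$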
The generalized commutative and distributive laws used in the proof of Theorem 78.3 are concerned only with finite sums. This theorem is in a sense a generalization of these laws to infinite series. If $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ both diverge, one might expect that $\sum_{k=1}^{\infty} (v_k + w_k)$ also diverges. However, the series
$\sum_{k=1}^{\infty} \left(1 + \frac{1}{k(k+1)}\right)$ and $\sum_{k=1}^{\infty} (-1)$
both diverge, while the series of their termwise sums, $\sum_{k=1}^{\infty} 1/(k(k+1))$, converges.
The infinite series discussed in Examples 3 and 4, above, are unusual, because the sum of these series can be determined. Generally, it is very difficult to find the sum of a series. Often we need only to know whether or not a given series converges. For this problem, numerous tests have been devised. Most of these tests apply only to series with nonnegative terms. They are based on the following consequence of Theorem 77.4.
THEOREM 78.4. Let $\sum_{k=1}^{\infty} v_k$ be an infinite series such that $v_k \ge 0$ for all k. Then this series is convergent if and only if the set of its partial sums has an upper bound, that is, there is a real number w such that $\sum_{k=1}^{n} v_k = u_n \le w$ for all n.
Proof. By Definition 78.2, the series $\sum_{k=1}^{\infty} v_k$ is convergent if and only if its sequence of partial sums $u_1, u_2, u_3, \ldots$ is convergent. Since $v_k \ge 0$ for all k, it follows that $u_1 \le u_1 + v_2 = u_2 \le u_2 + v_3 = u_3 \le \cdots$. Hence, the partial sums of $\sum_{k=1}^{\infty} v_k$ form an increasing sequence. By Theorem 77.4, such a sequence converges if and only if it is bounded. This proves the theorem.
As it stands, Theorem 78.4 is not a very useful criterion for deciding whether or not a series converges. However, there are numerous tricks for determining whether the partial sums of particular infinite series are bounded or not. Such tests are studied at length in calculus courses. We will implicitly use one well-known test (the "comparison test") to prove a result which makes it possible to assign a real number to every infinite decimal sequence.
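THEOREM 78.5. The infinite series
$a_m \cdot 10^m + a_{m-1} \cdot 10^{m-1} + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots + b_n \cdot 10^{-n} + \cdots,$
where $a_m, \ldots, a_0, b_1, b_2, \ldots, b_n, \ldots$ are integers between 0 and 9 (inclusive), converges to a real number.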
Proof. The (m
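so that by Theorem 78.4, this series converges to a real number.
By Theorem 78.5, we see that an infinite decimal sequence can be considered as an abbreviation for a convergent infinite series:
$a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n \ldots = a_m \cdot 10^m + a_{m-1} \cdot 10^{m-1} + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots + b_n \cdot 10^{-n} + \cdots.$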
Thus, associated with every infinite decimal sequence is a real number (the sum of the corresponding series). This definition of the real number associated with an infinite decimal sequence agrees with the intuitive idea of the decimal representation of real numbers which we discussed in Section 71. Indeed, the decimal representation of a real number u was described there as a sequence of progressively more accurate approximations of u by decimal fractions. This sequence of decimal fractions is exactly the sequence of partial sums of the series associated with the infinite decimal sequence representing u, so that if the intuitive idea of "progressively more accurate approximations of u" agrees with the exact notion of convergence, then the series associated with the infinite decimal representation of u must converge to u. In the next section we will completely justify this viewpoint.
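The agreement between the "progressively more accurate approximations" and the partial sums of the associated series can be watched in a small computation. The sketch below assumes the familiar fact that the decimal sequence 0.333... represents 1/3, and it uses exact rational arithmetic; the function name `decimal_partial_sums` and its two parameters are illustrative.

```python
from fractions import Fraction

def decimal_partial_sums(integer_part, digits):
    """Yield integer_part + b1/10 + ... + bn/10**n for n = 1, 2, ..."""
    total = Fraction(integer_part)
    for n, b in enumerate(digits, start=1):
        if not 0 <= b <= 9:
            raise ValueError("digits must lie between 0 and 9")
        total += Fraction(b, 10 ** n)
        yield total

# The first six decimal fractions 0.3, 0.33, ..., 0.333333.
sums = list(decimal_partial_sums(0, [3] * 6))
third = Fraction(1, 3)
gaps = [third - s for s in sums]   # distance from each fraction to 1/3
print([str(s) for s in sums])
print([str(g) for g in gaps])
```

Here the nth gap is 1/(3·10^n), so each additional digit brings the partial sum ten times closer to the number being represented.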
> 1.
if
= 9
2k k!
(c)
U* =
k loAk.
3. Prove that if $\sum_{k=1}^{\infty} u_k$ is an infinite series such that $0 = u_{n+1} = u_{n+2} = u_{n+3} = \cdots$, that is, all terms after the nth one are zero, then $\sum_{k=1}^{\infty} u_k$ converges to $\sum_{k=1}^{n} u_k$.
xyZl
uk
5 . Prove that the converse of the theorem of Problem 4 is false by showing that (a) lirnk,, 1 / ( d m 1 % ) = 0, and (b) the series
1 x 1=d
uk
dX.1
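6. Prove the comparison test: If $\sum_{k=1}^{\infty} u_k$ is an infinite series with $u_k \ge 0$ for all k, and if $\sum_{k=1}^{\infty} v_k$ is an infinite series such that (a) $u_n \le v_n$ for $n = 1, 2, \ldots$, and (b) $\sum_{k=1}^{\infty} v_k$ converges, then $\sum_{k=1}^{\infty} u_k$ converges.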
7. Use the comparison test to show that the following infinite series are convergent.
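*79 Decimal representation. At the end of the last section, it was shown that every infinite decimal sequence can be considered as the representation of a real number, namely,
$a_m a_{m-1} \ldots a_0 . b_1 b_2 b_3 \ldots = a_m \cdot 10^m + a_{m-1} \cdot 10^{m-1} + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + b_3 \cdot 10^{-3} + \cdots.$
This observation provokes the two main questions which will be answered in this section. Can every real number be represented in this way by a decimal sequence? Can certain real numbers be represented by more than one decimal sequence, and if so, which ones, and in how many ways? Throughout this section, both finite and infinite decimal sequences will be considered as abbreviations of their corresponding decimal sums; that is
$a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n},$
$a_m a_{m-1} \ldots a_0 . b_1 b_2 b_3 \ldots = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + b_3 \cdot 10^{-3} + \cdots.$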
A decimal fraction which has n decimal places (see Definition 71.2) is called an n-place decimal fraction or an n-place decimal sequence. These are the decimal sequences with n digits following the decimal point, that is, $a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n$. A nonnegative rational number r is an n-place decimal fraction if and only if $10^n \cdot r$ is an integer (see Problem 4, Section 71). It is convenient to summarize some familiar properties of decimal sequences which will be used in this section.
THEOREM 79.1. (a) If r is an n-place decimal fraction, then $r + 10^{-n}$ is an n-place decimal fraction.
(b) If r and s are n-place decimal fractions, and $r < s$, then $r + 10^{-n} \le s$.
(c) If $a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n = c_m c_{m-1} \ldots c_0 . d_1 d_2 \ldots d_n$, then $a_m = c_m$, $a_{m-1} = c_{m-1}$, . . . , $a_0 = c_0$, $b_1 = d_1$, $b_2 = d_2$, . . . , $b_n = d_n$.
(d) $a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n \le a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n b_{n+1} \ldots b_{n+k} < (a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n) + 10^{-n}$.
Proof. By the remark preceding this theorem, if r is an n-place decimal fraction, then $10^n r$ is an integer. Therefore, $10^n r + 1 = 10^n (r + 10^{-n})$ is also an integer. Consequently $r + 10^{-n}$ is an n-place decimal fraction. This proves (a). The proof of (b) is based on the same idea. Since $r < s$, $10^n r$ is an integer less than $10^n s$. Thus, $10^n r + 1 \le 10^n s$. Hence, $r + 10^{-n} \le s$. To prove (c), note that if $a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n = c_m c_{m-1} \ldots c_0 . d_1 d_2 \ldots d_n$ is multiplied by $10^n$, then we obtain
$a_m \cdot 10^{m+n} + \cdots + a_0 \cdot 10^n + b_1 \cdot 10^{n-1} + \cdots + b_n = c_m \cdot 10^{m+n} + \cdots + c_0 \cdot 10^n + d_1 \cdot 10^{n-1} + \cdots + d_n.$
Thus, by the uniqueness of the decimal represent'ation of a natural number (Theorem 51.3), a, = cm, a,1 = cm1, . . . , a0 = co, b l = dl, b2 = d2, . . . , bn = dn. Finally, the proof of (d) is a simple calculation:
amam1 . . . a0 . blbz . . . b, = a, . l o m lom' . . a. b1 101 bz. . bn lo" . lorn' . . . a. bl 10l+ b2 . ior2 a,. 10'" + 6, lo" b,+l  bn+k. 1 0  ( ~ + ~ '
+ (g.lo'"+l' + . . . + 9 . lo'"+")
(a, lom
+ (lo"
 10'n+k')
<
=
(a,, 10"
+ a. + bl
101
lo") lo")
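THEOREM 79.2. [Parts (a) and (b) are illegible in the source. The surviving fragments are the index ranges $i = m, m-1, \ldots, 0$ and $j = 1, 2, 3, \ldots$, and the digit symbols $a_i$, $c_i$, $b_j$, $d_j$, $f_j$.]
(c) $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 000 \cdots = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$.
(d) $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \le a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots \le (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}$.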
The identities (a), (b), and (c) are easily proved, using the interpretation of decimal sequences as sums, together with Theorem 78.3. The proof of Theorem 79.2(d) is based on the corresponding result for finite decimal sequences, Theorem 79.1(d), and the definition of an infinite decimal sequence as the sum of an infinite series. Suppose that

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots = u$$

and

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = r .$$

We have to prove that $r \le u \le r + 10^{-n}$. Suppose that $u < r$. Then, by the definition of the sum of an infinite series, some partial sum $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}$ would be less than $r$, which is impossible by Theorem 79.1(d). Consequently, $r \le u$. In the same way, we see that $u \le r + 10^{-n}$.

Note that the second strict inequality of Theorem 79.1(d) has been weakened to $\le$ in Theorem 79.2(d). In Theorem 79.4 it will be shown that this weakening is essential. In the proof of the fundamental theorem of decimal representation of real numbers we will use an important property of the real number system which has not yet been discussed.

(79.3). Let $x$ and $y$ be real numbers such that $x < y$. Then there is a rational number $r$ such that $x < r \le y$.
This result has been proved for complete ordered fields in Problems 5 and 6 of Section 76. However, a simpler proof can be given for the real number system if we go back to the construction of real numbers by Dedekind cuts. By Definition 74.4, the inequality $x < y$ means that $x$, considered as a set of rational numbers, properly contains $y$. Hence, there is a rational number $r$ such that $r$ belongs to $x$, but not to $y$. Then the Dedekind cut corresponding to $r$ contains $y$ and is contained properly in $x$. (See Problem 3, Section 74.) Thus, if we identify $r$ with the cut to which it corresponds and use Definition 74.4 again, we obtain $x < r \le y$.

THEOREM 79.4. Fundamental theorem of decimal representation. Let $u$ be a positive real number. Then $u$ is represented by some infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$. That is, $u$ is the limit of the infinite series corresponding to this decimal sequence.
Proof. The proof of this theorem consists of several steps. First we show that for any $n$, there is a unique $n$-place decimal fraction $r_n$ such that

$$r_n \le u < r_n + 10^{-n} .$$

The rational number $r_n$ is called the $n$-place decimal approximation of $u$. The next step of the proof is to show that there is an infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$ such that for each natural number $n$, the $n$-place decimal approximation $r_n$ of $u$ is exactly $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$. The proof is completed by showing that the infinite series

$$a_m 10^m + \cdots + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n} + \cdots$$

converges to $u$.

(1) By (79.3), there is a rational number $r$ such that $u - 10^{-n} < r \le u$. Let $10^n r = a/m$, where $a \in Z$ and $m \in N$. By the division algorithm, we can write $a = m b + d$, where $b$ and $d$ are integers, and $0 \le d < m$. Then $10^n r = b + (d/m)$. In particular,

$$b \le 10^n r < b + 1 .$$

There are two possible cases: either $b > 10^n u - 1$, or $b \le 10^n u - 1$. Suppose first that $b > 10^n u - 1$. Define $r_n = 10^{-n} b$. Then

$$r_n \le r \le u < r_n + 10^{-n} .$$

Consequently, $r_n \le u < r_n + 10^{-n}$. Moreover, $r_n$ is an $n$-place decimal fraction, since $10^n r_n = b$ is an integer, and $b + 1 > 10^n u \ge 0$, so that $b \ge 0$. In the second case, where $b \le 10^n u - 1$, define $r_n = 10^{-n}(b + 1)$. Then $u - 10^{-n} < r < r_n \le u$, so that $r_n \le u < r_n + 10^{-n}$. As in the first case, $r_n$ is an $n$-place decimal fraction, since $10^n r_n = b + 1 \in Z$, and $-1 < 10^n u - 1 < 10^n r < b + 1$ implies $b + 1 \ge 0$.

It is easy to see that there is at most one $n$-place decimal fraction $r_n$ such that $r_n \le u < r_n + 10^{-n}$. Indeed, suppose that $s_n$ is an $n$-place decimal fraction such that $s_n \le u < s_n + 10^{-n}$. If $r_n < s_n$, then $r_n + 10^{-n} \le s_n$, by Theorem 79.1(b). However, this yields the contradiction $s_n \le u < r_n + 10^{-n} \le s_n$. For a similar reason, the inequality $s_n < r_n$ is impossible. Therefore, $r_n = s_n$. This completes the proof of the first step.

(2) Suppose that $r_n = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$ is the $n$-place decimal approximation of $u$, and that $r_{n+1} = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n d_{n+1}$ is the $(n + 1)$-place decimal approximation of $u$. (We may assume that the number of digits to the left of the decimal point is the same for $r_n$ and $r_{n+1}$ by adjoining zeros, if necessary.) Let $s_n = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n$ be the $n$-place decimal fraction obtained from $r_{n+1}$ by deleting the last digit. Since $r_{n+1} = s_n + d_{n+1} \cdot 10^{-(n+1)}$ and $0 \le d_{n+1} \le 9$, it follows that $s_n \le r_{n+1} \le s_n + 9 \cdot 10^{-(n+1)}$. Moreover, $r_{n+1} \le u < r_{n+1} + 10^{-(n+1)}$, because $r_{n+1}$ is the $(n + 1)$-place decimal approximation of $u$. Hence,

$$s_n \le r_{n+1} \le u < r_{n+1} + 10^{-(n+1)} \le s_n + 9 \cdot 10^{-(n+1)} + 10^{-(n+1)} = s_n + 10^{-n} .$$

Therefore, $s_n$ is the $n$-place decimal approximation of $u$. By the uniqueness of such $n$-place approximations, which was proved in (1), it follows that $s_n = r_n$. Hence by Theorem 79.1(c),

$$c_m = a_m, \quad c_{m-1} = a_{m-1}, \quad \ldots, \quad c_0 = a_0, \quad d_1 = b_1, \quad d_2 = b_2, \quad \ldots, \quad d_n = b_n .$$

We have proved that $r_{n+1} = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n d_{n+1}$, that is, the $(n + 1)$-place decimal approximation of $u$ is obtained from the $n$-place approximation by adding a single decimal digit. Thus, the sequence $r_1, r_2, r_3, \ldots$ of decimal approximations of $u$ gives rise to the infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$ such that for each $n$,

$$r_n = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n .$$
This completes the second step of the proof.

(3) The $(m + 1 + n)$th partial sum of the infinite series

$$a_m 10^m + \cdots + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n} + \cdots$$

is

$$a_m 10^m + \cdots + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n} = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = r_n .$$

Therefore, to complete the proof, we have only to show that if $w_1 < u < w_2$, then there is a natural number $k$ such that $w_1 < r_n < w_2$ for all $n \ge k$. By Definitions 78.1 and 78.2, this will imply that the infinite series corresponding to the infinite decimal sequence

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$$

converges to $u$. By Theorem 76.2(b), there is a natural number $k$ such that $10^{-k} = (10^{-1})^k < u - w_1$. Then $w_1 < u - 10^{-k}$. Therefore, if $n \ge k$,

$$w_1 < u - 10^{-k} \le u - 10^{-n} < r_n \le u < w_2 .$$

This completes the proof of Theorem 79.4.

We now consider the second question which was mentioned at the beginning of this section: which real numbers can be represented by two or more infinite decimal sequences? It is easy to show that there are numbers which have different representations.
EXAMPLE 1. We will show that $0.999\ldots = 1 = 1.000\ldots$ . By Theorem 79.2(a), $10 \cdot (0.999\ldots) = 9.999\ldots = 9 + (0.999\ldots)$. Hence, $9 \cdot (0.999\ldots) = 10 \cdot (0.999\ldots) - (0.999\ldots) = 9$. Dividing by 9 gives the desired conclusion.
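The conclusion of Example 1 can also be seen from the partial sums of the series behind $0.999\ldots$. The following is a minimal sketch in Python, using exact rational arithmetic; the function name `nines_partial_sum` is an illustrative choice and does not come from the text. It shows that the $n$th partial sum falls short of 1 by exactly $10^{-n}$, a gap that eventually drops below any positive bound.

```python
from fractions import Fraction


def nines_partial_sum(n):
    """Return the n-place decimal fraction 0.99...9 (n nines) as an exact Fraction."""
    return sum(Fraction(9, 10 ** j) for j in range(1, n + 1))


# The gap between 1 and the n-th partial sum is exactly 10**(-n).
for n in (1, 2, 5, 10):
    s = nines_partial_sum(n)
    assert 1 - s == Fraction(1, 10 ** n)
    print(n, s, 1 - s)
```

Exact fractions are used instead of floating-point numbers so that the identity $1 - 0.99\ldots9 = 10^{-n}$ holds without rounding error.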
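This example can easily be generalized.

THEOREM 79.5. Let $a_m, a_{m-1}, \ldots, a_0, b_1, b_2, \ldots, b_n$ be decimal digits. Then

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 999 \cdots = (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n} .$$

Proof.

$$\begin{aligned}
10^n \cdot (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 999 \cdots)
&= a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n . 999 \cdots \\
&= (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n) + (0.999 \cdots) \\
&= (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n) + 1 \\
&= 10^n [(a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}] .
\end{aligned}$$

Dividing by $10^n$ gives the theorem.

This theorem shows that every $n$-place decimal fraction can be represented by two different infinite decimal sequences, one of which has all zeros after the $n$th digit to the right of the decimal point, and the other one having all nines after the $n$th digit to the right of the decimal point. We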
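will prove that this is the only case in which a real number has more than one decimal representation.

THEOREM 79.6. Suppose that the real number $u$ is represented by two different decimal sequences,

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots = u = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n d_{n+1} \cdots ,$$

and that $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n < c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n$ for some $n$. Then

$$b_{n+1} = 9, \quad b_{n+2} = 9, \quad b_{n+3} = 9, \quad \ldots, \qquad d_{n+1} = 0, \quad d_{n+2} = 0, \quad d_{n+3} = 0, \quad \ldots .$$

Proof. For $k \ge 1$, let

$$r_k = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_k, \qquad s_k = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_k .$$

By Theorem 79.2(d), $r_k \le u \le r_k + 10^{-k}$ and $s_k \le u \le s_k + 10^{-k}$. If $r_k < s_k$, then by Theorem 79.1(b),

$$r_k + 10^{-k} \le s_k \le u \le r_k + 10^{-k} .$$

Hence, $r_k < s_k$ implies $u = s_k = r_k + 10^{-k}$. Moreover, by Theorem 79.1(d), $r_{k+1} < r_k + 10^{-k} = s_k \le s_{k+1}$, so that $r_k < s_k$ also yields $r_{k+1} < s_{k+1}$. Since by assumption $r_n < s_n$, an induction argument proves that for all $k \ge n$,

$$r_k < s_k, \quad \text{and} \quad u = s_k = r_k + 10^{-k} .$$

Thus, if $k \ge n$,

$$c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n d_{n+1} \cdots d_k = s_k = s_n = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n 0 \cdots 0 .$$

By the uniqueness theorem for finite decimal sequences [Theorem 79.1(c)], $d_{n+1} = d_{n+2} = \cdots = d_k = 0$. Let $e_m e_{m-1} \cdots e_0 . f_1 f_2 \cdots f_n$ be the $n$-place decimal fraction $s_n - 10^{-n}$. Then

$$r_k = s_n - 10^{-k} = (s_n - 10^{-n}) + (10^{-n} - 10^{-k}) = e_m e_{m-1} \cdots e_0 . f_1 f_2 \cdots f_n 9 \cdots 9 ,$$

with $k - n$ nines. Consequently, by Theorem 79.1(c) again, $a_m = e_m$, $a_{m-1} = e_{m-1}$, $\ldots$, $a_0 = e_0$, $b_1 = f_1$, $b_2 = f_2$, $\ldots$, $b_n = f_n$, $b_{n+1} = 9$, $b_{n+2} = 9$, $\ldots$, $b_k = 9$. (Note that if $d_n > 0$, then $b_n = d_n - 1$, $b_{n-1} = d_{n-1}$, $\ldots$, $b_1 = d_1$, $a_0 = c_0$, $\ldots$, $a_{m-1} = c_{m-1}$, $a_m = c_m$.) Since $k$ can be any natural number greater than or equal to $n$, this completes the proof of Theorem 79.6.

We can summarize the results of Theorems 79.4, 79.5, and 79.6 as follows: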
THEOREM 79.7. If $u$ is a positive real number which is not a decimal fraction, then $u$ can be represented in exactly one way as an infinite decimal sequence. If $u$ is a decimal fraction, then $u$ can be represented in exactly two ways as an infinite decimal sequence. One of these representations ends with a sequence of nines and the other ends with a sequence of zeros.
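The $n$-place decimal approximations $r_n$ from the proof of Theorem 79.4 can be computed directly for rational $u$. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it takes $r_n = \lfloor 10^n u \rfloor / 10^n$, which satisfies $r_n \le u < r_n + 10^{-n}$ and therefore agrees with the unique $r_n$ of the proof. The names `decimal_approximation` and `digits_after_point` are hypothetical.

```python
from fractions import Fraction
from math import floor


def decimal_approximation(u, n):
    """Return the n-place decimal fraction r_n with r_n <= u < r_n + 10**(-n)."""
    scale = 10 ** n
    return Fraction(floor(u * scale), scale)


def digits_after_point(u, n):
    """Read off b_1, ..., b_n: each approximation extends the previous one by a digit."""
    digits = []
    previous = decimal_approximation(u, 0)
    for j in range(1, n + 1):
        current = decimal_approximation(u, j)
        digits.append(int((current - previous) * 10 ** j))
        previous = current
    return digits


u = Fraction(22, 7)
for n in range(1, 7):
    r = decimal_approximation(u, n)
    # The defining inequality of the n-place decimal approximation.
    assert r <= u < r + Fraction(1, 10 ** n)
print(digits_after_point(u, 6))
```

Because the right-hand inequality is strict, this procedure applied to a decimal fraction such as $1/2$ produces the representation ending in zeros, never the one ending in nines.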
1. Give all possible infinite decimal representations of the following numbers. (a) 1.01 (b) [illegible in the source] (c) 5 (d) [illegible in the source]
2. Carry out the proof of Theorem 79.5 for the particular case of the number 0.4999... .
3. In which of the following proofs is the completeness property of R used either directly or indirectly: the proof of Theorem 78.5; the proof of Theorem 79.4; the proof of Theorem 69.5; the proof of Theorem 79.6.
4. Prove Theorem 79.2(a), (b), and (c).
5. Prove the following refinement of (79.3): let $x$ and $y$ be real numbers such that $x < y$; then there is a rational number $r$ such that $x < r < y$.
6. An infinite binary sequence is an expression of the form

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots ,$$

where $a_m, a_{m-1}, \ldots, a_0, b_1, b_2, \ldots, b_n, \ldots$ are binary digits 0 or 1. Such a binary sequence represents the number

$$a_m 2^m + a_{m-1} 2^{m-1} + \cdots + a_0 + b_1 2^{-1} + b_2 2^{-2} + \cdots + b_n 2^{-n} + \cdots .$$

(a) State the analogue for infinite binary sequences of each of Theorems 78.5, 79.4, 79.5 and 79.6. (b) Find the binary sequence representing [number illegible in the source].
7. If $x$ is any real number, define the greatest integer function $[x]$ as follows: $[x]$ is the largest integer $n$ such that $n \le x$. If $x = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$, what is $[x]$?

*710 Applications of decimal representations. The possibility of representing real numbers by infinite decimal sequences is of considerable practical importance. However, the theorems on decimal representation also have theoretical applications of some importance. One of these will be presented in this section. This application leads naturally to a discussion of the decimal representation of rational numbers.

Cantor's theorem. One of the most interesting applications of the decimal representation theorem for real numbers is Cantor's proof that the set of all real numbers is not denumerable. That is, there is no one-to-one correspondence between the set R of all real numbers and the set N of natural numbers. The proof is by contradiction: we assume that such a correspondence exists and show that this assumption leads to a contradiction. Suppose that $n \leftrightarrow u_n$ is a one-to-one correspondence between N and R. Let

$$M = \{ n \in N \mid n \leftrightarrow u_n, \text{ where } 0 \le u_n < 1 \} .$$

Let the elements of $M$ be labeled $n_1, n_2, n_3, \ldots$, with $n_1 < n_2 < n_3 < \cdots$. Note that $M$ must be infinite, since the set of all real numbers between 0 and 1 is infinite. Therefore, we obtain the one-to-one correspondence

$$k \leftrightarrow u_{n_k}$$

between N and the set of all real numbers $u$ satisfying $0 \le u < 1$. Expressing each $u_{n_k}$ as an infinite decimal sequence, we obtain the table

$$\begin{aligned}
u_{n_1} &= 0 . b_{1,1} b_{1,2} b_{1,3} \cdots b_{1,k} \cdots \\
u_{n_2} &= 0 . b_{2,1} b_{2,2} b_{2,3} \cdots b_{2,k} \cdots \\
u_{n_3} &= 0 . b_{3,1} b_{3,2} b_{3,3} \cdots b_{3,k} \cdots \\
&\;\;\vdots \\
u_{n_k} &= 0 . b_{k,1} b_{k,2} b_{k,3} \cdots b_{k,k} \cdots \\
&\;\;\vdots
\end{aligned}$$

The contradiction which we are seeking is obtained by constructing a decimal sequence $0 . c_1 c_2 \cdots c_k c_{k+1} \cdots$ corresponding to a number $v$, different from 1, which cannot occur in the list

$$u_{n_1}, u_{n_2}, u_{n_3}, \ldots$$

(contradicting the assumption that this list contains all real numbers $u$ satisfying $0 \le u < 1$). To obtain $v$, let $c_1$ be any decimal digit which is different from $b_{1,1}$, 0, and 9; let $c_2$ be any decimal digit which is different from $b_{2,2}$, 0, and 9; $\ldots$; let $c_k$ be any decimal digit which is different from $b_{k,k}$, 0, and 9; and so forth. Note that there are at least six possible choices for each of the numbers $c_1, c_2, c_3, \ldots$ . Define

$$v = 0 . c_1 c_2 c_3 \cdots c_k \cdots .$$

Then $v$ is a positive real number, less than 1, which does not end with a sequence of zeros or nines. Hence, the decimal representation of $v$ is unique, by Theorem 79.7. Moreover, for every $k$, $v \ne u_{n_k}$. In fact, by the way that $v$ was constructed, the decimal representation of $v$ is different from the decimal representation of $u_{n_k}$. Since $v$ has only one decimal representation, this implies that $v \ne u_{n_k}$. This completes the proof.
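The choice of the digits $c_k$ can be tried out on a finite piece of such a table. The following sketch assumes that each listed number is given by a string of its digits after the decimal point; the sample table and the name `diagonal_digits` are made up for illustration. For each row $k$ it picks the smallest digit different from the diagonal entry $b_{k,k}$, 0, and 9, which is one of the several admissible choices.

```python
def diagonal_digits(rows):
    """Return c_1 c_2 ... c_k with c_k different from rows[k][k], '0' and '9'.

    rows[k] is the string of digits after the decimal point of the k-th
    listed number (0-based here), and must have at least k + 1 digits.
    """
    result = []
    for k, row in enumerate(rows):
        forbidden = {row[k], "0", "9"}
        # Any allowed digit would do; take the smallest one.
        result.append(next(d for d in "12345678" if d not in forbidden))
    return "".join(result)


# A hypothetical finite excerpt of the table in the proof.
table = ["1415926535", "7182818284", "4142135623", "3333333333", "5000000000"]
v_digits = diagonal_digits(table)
print("0." + v_digits)

# The constructed sequence differs from the k-th row in its k-th digit.
for k, row in enumerate(table):
    assert v_digits[k] != row[k]
```

A finite table can only illustrate the rule, of course; the proof itself applies the same rule to every row of an infinite list.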
THEOREM 710.1. Cantor's theorem. The set of all real numbers is not denumerable.

Cantor's theorem shows conclusively that it is not possible in any way to set up a one-to-one correspondence between the points of a line and the rational numbers. The example given in Section 72 showed that the natural coordinate correspondence between Q and the points of a line $l$ does not exhaust all points of $l$. The fact that it is not possible to establish any such correspondence between Q and the points of $l$ is a much stronger result. To prove that no such correspondence is possible, observe that by the discussion of Section 73, there is a one-to-one correspondence between the real numbers and the points of $l$. Therefore, the existence of a one-to-one correspondence between the points of $l$ and Q would lead to a one-to-one correspondence between R and Q. However, since Q is denumerable (Problem 5, Section 12), this contradicts Cantor's theorem.

Perhaps even more important than the result of Cantor's theorem is the method used for its proof. The crucial step in this proof is the observation that if an infinite list of sequences is given, arranged in the form of a rectangular array,

$$\begin{array}{l}
a_{1,1}, a_{1,2}, a_{1,3}, a_{1,4}, \ldots \\
a_{2,1}, a_{2,2}, a_{2,3}, a_{2,4}, \ldots \\
a_{3,1}, a_{3,2}, a_{3,3}, a_{3,4}, \ldots \\
\quad \vdots
\end{array}$$

then any sequence $c_1, c_2, c_3, c_4, \ldots$ which differs at each entry from the "diagonal" sequence $a_{1,1}, a_{2,2}, a_{3,3}, a_{4,4}, \ldots$ must differ from every one of the sequences $a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}, \ldots$, which are the rows of the rectangular array. This type of argument is usually called the diagonal method. It occurs in the proofs of some of the most important theorems of modern mathematics.

The representation of rational numbers. In Section 72 the term "irrational" was introduced to describe those real numbers which do not belong to Q. Cantor's theorem shows in a striking way that the irrational numbers are much more abundant than the rational numbers. It is natural to ask if there is some way to recognize the decimal representation of a rational number. We will now prove that the infinite decimal sequences which represent positive rational numbers are exactly the ones which are ultimately periodic.
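DEFINITION 710.2. An infinite decimal sequence is called ultimately periodic if it is of the form

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \, b_{n+1} \cdots b_{n+k} \, b_{n+1} \cdots b_{n+k} \, b_{n+1} \cdots b_{n+k} \cdots .$$

That is, from a certain point on, the decimal sequence consists of repetitions of a finite sequence of decimal digits.

It is convenient to abbreviate ultimately periodic decimal sequences by writing

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \overline{b_{n+1} \cdots b_{n+k}} .$$

The line over the block of digits indicates that this finite sequence is repeated indefinitely in the decimal sequence.

EXAMPLE 1. The expression $33.\overline{3}$ stands for $33.3333\ldots = 100/3$. This number could just as well be abbreviated $33.33\overline{3}$, or even $33.\overline{333}$, etc.

EXAMPLE 2. The expression $121.4\overline{27}$ stands for $121.4272727\ldots$ . This could also be written $121.42\overline{72}$, or $121.427\overline{27}$. The possibility of expressing the number $u$ represented by $121.4\overline{27}$ as $121.427\overline{27}$ leads to a method of determining $u$ as a rational fraction. In fact

$$1000 \cdot u = 1000 \cdot (121.427\overline{27}) = 121427.\overline{27} = 121427 + 0.\overline{27},$$

and

$$10 \cdot u = 10 \cdot (121.4\overline{27}) = 1214.\overline{27} = 1214 + 0.\overline{27} .$$

Subtracting these equations, the terms $0.\overline{27}$ on the right-hand sides cancel each other, leaving

$$990 \cdot u = 121427 - 1214 = 120213, \quad \text{so that} \quad u = \frac{120213}{990} = \frac{13357}{110} .$$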
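The argument used in Example 2 can easily be generalized to show that every ultimately periodic decimal sequence represents a rational number.

THEOREM 710.3. Let $u = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \overline{b_{n+1} \cdots b_{n+k}}$. Then $u$ is the rational number

$$\frac{(a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) - (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n)}{10^{n+k} - 10^n} .$$

In fact,

$$\begin{aligned}
(10^{n+k} - 10^n) \cdot u
&= [(a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) + 0.\overline{b_{n+1} \cdots b_{n+k}}] - [(a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n) + 0.\overline{b_{n+1} \cdots b_{n+k}}] \\
&= [(a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) - (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n)] + 0.000 \cdots ,
\end{aligned}$$

and dividing by $10^{n+k} - 10^n$ gives the stated value of $u$.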
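The formula of Theorem 710.3 translates directly into a short computation. In the sketch below, the three digit strings stand for $a_m \cdots a_0$, $b_1 \cdots b_n$, and the repeating block $b_{n+1} \cdots b_{n+k}$; the function name `periodic_to_fraction` and this string-based input format are assumptions made for the illustration.

```python
from fractions import Fraction


def periodic_to_fraction(integer_part, prefix, block):
    """Rational value of integer_part . prefix (block repeated forever).

    integer_part: non-empty digit string a_m ... a_0
    prefix:       digit string b_1 ... b_n (may be empty)
    block:        non-empty digit string b_{n+1} ... b_{n+k}
    """
    n, k = len(prefix), len(block)
    long_number = int(integer_part + prefix + block)   # a_m...a_0 b_1...b_{n+k}
    short_number = int(integer_part + prefix)          # a_m...a_0 b_1...b_n
    return Fraction(long_number - short_number, 10 ** (n + k) - 10 ** n)


# Example 2 of the text: 121.4272727...
print(periodic_to_fraction("121", "4", "27"))
# Example 1 of the text: 33.333...
print(periodic_to_fraction("33", "", "3"))
```

`Fraction` reduces the result to lowest terms automatically, so the value printed for Example 2 is the reduced form of $120213/990$.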
A theorem such as this always suggests a converse. In this case we are led to ask if every decimal representation of a nonnegative rational number is ultimately periodic. There is evidence to support this conjecture. For example, by Theorem 79.7 the two decimal representations of any decimal fraction are ultimately periodic since they end in a sequence of zeros or nines. Consider also the following example.
EXAMPLE 3. It is possible to obtain the decimal expansion of rational numbers by long division. For instance, dividing 4 by 7 gives the successive quotient digits 5, 7, 1, 4, 2, 8 and the successive remainders 5, 1, 3, 2, 6, 4:

$$40 = 5 \cdot 7 + 5, \quad 50 = 7 \cdot 7 + 1, \quad 10 = 1 \cdot 7 + 3, \quad 30 = 4 \cdot 7 + 2, \quad 20 = 2 \cdot 7 + 6, \quad 60 = 8 \cdot 7 + 4 .$$

Since the remainder 4 is the same as the number which we began dividing, it is clear that continuation of the process will give the block 571428 repeatedly. Therefore, it seems certain that

$$\frac{4}{7} = 0.\overline{571428} .$$

(The only reason for not trusting this conclusion is that we have not shown that the continued use of long division does really lead to the decimal representation of a fraction. However, the validity of the result can be checked directly from Theorem 710.3.)

THEOREM 710.4. Every decimal representation of a positive rational number is ultimately periodic.

The idea underlying the proof of this theorem is the same as the principle operating in Example 3. The process of dividing the numerator of the fraction by the denominator must somewhere yield two remainders which are equal. When the same remainder occurs a second time, the decimal begins to repeat. This idea is somewhat disguised in the following proof.
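Proof. Let $u$ be the positive rational number $c/d$ where $c$ and $d$ are natural numbers. If $u$ is a decimal fraction, then its two decimal representations are ultimately periodic, by Theorem 79.7. Therefore suppose that $u$ is not a decimal fraction, so that its decimal representation

$$u = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$$

is unique. By the division algorithm, we can write

$$10^i c = q_i d + r_i ,$$

where $q_0, q_1, q_2, \ldots, r_0, r_1, r_2, \ldots$ are nonnegative integers, and $0 \le r_i < d$ for $i = 0, 1, 2, \ldots$ . There are at most $d$ different values which $r_i$ can take (actually, at most $d - 1$, since the assumption that $u$ is not a decimal fraction implies that $r_i \ne 0$). Therefore, in the list of numbers

$$r_0, r_1, r_2, \ldots, r_d$$

there must be two which are equal, say

$$r_{n+k} = r_n = r, \quad \text{where } n \ge 0, \; k > 0 .$$

Then

$$10^n u = q_n + r/d, \qquad 10^{n+k} u = q_{n+k} + r/d .$$

Note that $r/d$, $q_n + r/d$, and $q_{n+k} + r/d$ are not decimal fractions, since otherwise $c/d$ would be a decimal fraction. This fact is needed so that we can use the uniqueness of the decimal representations which was proved in Theorem 79.6. Because $q_n$ and $q_{n+k}$ are integers and $0 \le r/d < 1$, it follows that

$$r/d = 0 . b_{n+1} b_{n+2} b_{n+3} \cdots \quad \text{and} \quad r/d = 0 . b_{n+k+1} b_{n+k+2} b_{n+k+3} \cdots .$$

Therefore,

$$b_{n+1} = b_{n+k+1}, \quad b_{n+2} = b_{n+k+2}, \quad b_{n+3} = b_{n+k+3}, \quad \ldots .$$

Consequently,

$$u = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \overline{b_{n+1} \cdots b_{n+k}} .$$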
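The repeated-remainder idea behind Theorem 710.4 can be sketched as a long-division routine. The code below follows the procedure of Example 3 rather than the exact bookkeeping of the proof: it records the position at which each remainder first appears and stops as soon as a remainder recurs. The function name `decimal_expansion` and the returned triple of strings are illustrative assumptions.

```python
def decimal_expansion(c, d):
    """Long division of c by d (c >= 0, d > 0).

    Returns (integer_part, prefix, block): the digits before the decimal
    point, the non-repeating digits after it, and the repeating block.
    """
    integer_part, remainder = divmod(c, d)
    digits = []
    first_seen = {}  # remainder -> index of the digit it produces
    while remainder not in first_seen:
        first_seen[remainder] = len(digits)
        digit, remainder = divmod(10 * remainder, d)
        digits.append(str(digit))
    start = first_seen[remainder]  # the block begins where this remainder first arose
    return str(integer_part), "".join(digits[:start]), "".join(digits[start:])


print(decimal_expansion(4, 7))
print(decimal_expansion(13357, 110))
```

Since a remainder is always one of $0, 1, \ldots, d - 1$, the loop must stop after at most $d$ steps. For a decimal fraction the routine returns the representation whose repeating block is `"0"`.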
1. Find the rational numbers which are represented by the following ultimately periodic decimal sequences. (a) 21.01 (b) 4.0010012 (c) 0.00111.
2. Find the decimal sequences which represent the following rational numbers. (a) 2/7 (b) 201/999 (c) 18/17
3. Let $u$ be the real number 0.1010010001000010... whose decimal representation consists of a sequence of ones separated by blocks of zeros, with the length of each block equal to the number of ones which precede it. Show that $u$ is irrational.
4. Show that the number [series over $k = 1, 2, 3, \ldots$; the summand is illegible in the source] is irrational.
5. Let $A$ be a set containing at least two elements. Use the diagonal method to prove that the set $S$ of all sequences $a_1, a_2, a_3, \ldots$ of elements of $A$ is not denumerable. Use this result to show that the set of all subsets of the set N is not denumerable. [Hint: Let $A = \{0, 1\}$ and establish a one-to-one correspondence between the set $P(N)$ of all subsets of N and the set $S$ of all sequences $a_1, a_2, a_3, \ldots$ of zeros and ones. For instance, let $M \leftrightarrow a_1, a_2, a_3, \ldots$, where $a_i = 0$ if $i \notin M$ and $a_i = 1$ if $i \in M$.]
6. Let

$$(a_m a_{m-1} \cdots a_0 . b_1 b_2 b_3 \cdots, \; c_m c_{m-1} \cdots c_0 . d_1 d_2 d_3 \cdots) \leftrightarrow a_m c_m a_{m-1} c_{m-1} \cdots a_0 c_0 . b_1 d_1 b_2 d_2 b_3 d_3 \cdots .$$

[For example, $(121.2121\ldots, 003.3333\ldots) \leftrightarrow 102013.23132313\ldots$ .] (a) Show that this definition establishes a one-to-one correspondence between the set of all pairs $(u, v)$ of real numbers and a subset $T$ of R, provided the following convention is accepted: each real number is represented by an infinite decimal sequence (which may begin with a finite number of zeros), possibly ending with a sequence of all zeros, but not with a sequence of nines. (b) Show that $T$ is not all of R.
7. Use Theorem 710.4 and the proof of Theorem 710.3 to show that any natural number $k$ which is not divisible by 2 or 5 will divide some number of the sequence 9, 99, 999, 9999, ... .
8. Show that if $m$ is a natural number which is relatively prime to 10, then the decimal expansion of $1/m$ is of the form
CHAPTER 8
THE COMPLEX NUMBERS
(i) C is a field containing R as a subring, and (ii) C contains a number* $i$ which satisfies $i^2 = -1$. (81)
It is also reasonable to require that C be minimal among the systems satisfying these conditions. That is, there should be no proper subring of C which also satisfies (81). For otherwise, we could attain our objectives more economically with the subring than with C. The construction which gives the desired field turns out to be remarkably easy. The result is the complex number system, which not only contains a solution of $x^2 = -1$, but also solutions of the most general algebraic equations.

Complex numbers were introduced in about 1560 by the Italian mathematician Rafael Bombelli (1530-1572?). Bombelli was a teacher at the University of Bologna, an important center of mathematics during the Renaissance. Until about 1800, complex numbers were viewed as mysterious objects, devoid of any real meaning.† At the end of the eighteenth century, several mathematicians independently gave logically correct definitions and useful geometrical interpretations of these numbers.

* The use of the symbol $i$ to represent $\sqrt{-1}$ in C is standard mathematical notation. This element is usually called the imaginary unit.
† A vestige of the early mysticism surrounding complex numbers is the common use of the term "imaginary" to distinguish them from "real" numbers.
In order to see how the complex numbers and their operations should be defined, we suppose that there is a field $F$ which satisfies the conditions (81). Then $F$ contains R and the number $i$, and therefore it will contain all expressions of the form $u + i \cdot v$, where $u$ and $v$ are real numbers. Also, since the usual rules of arithmetic are available in a field, it is easy to derive expressions for the sum, negative, and product of such numbers:

$$\begin{aligned}
(x + i \cdot y) + (u + i \cdot v) &= (x + u) + i \cdot (y + v), \\
-(x + i \cdot y) &= (-x) + i \cdot (-y), \\
(x + i \cdot y) \cdot (u + i \cdot v) &= (x \cdot u - y \cdot v) + i \cdot (y \cdot u + x \cdot v).
\end{aligned} \qquad (82)$$

These identities show that the collection of all the elements which can be written in the form $u + i \cdot v$ is a subring of $F$. Moreover, it is not hard to show that this subring also satisfies the conditions (81). In particular, if $F$ is a field C with all of the desired properties, then the assumption that C is minimal implies that C coincides with the subring of all elements of the form $u + i \cdot v$. Therefore, it must be possible to write every element of C in the form $u + i \cdot v$, where $u$ and $v$ are real numbers. It is apparent that $u + i \cdot v$ is determined by the two real numbers $u$ and $v$. Moreover, if $x + i \cdot y = u + i \cdot v$, then $x = u$ and $y = v$. Indeed, if $y \ne v$, then $i = (x - u)/(v - y)$. However, $i^2 = -1$, and since $(x - u)/(v - y)$ is a real number, it follows that $[(x - u)/(v - y)]^2 \ge 0$, which is a contradiction. Therefore, $y = v$, and consequently $x = u$. We can summarize this discussion by saying that if a number system C with the desired properties exists at all, then there is a one-to-one correspondence,

$$(u, v) \leftrightarrow u + i \cdot v,$$

between the set R $\times$ R of all ordered pairs of real numbers and C. This observation suggests that a way to construct the complex numbers is to define suitable operations on the set R $\times$ R. The identities of (82) show how the operations of addition, negation, and multiplication must be defined for the ordered pairs.

There is another important fact which is a consequence of the above discussion. Any two rings which satisfy all of the requirements desired for C are isomorphic. That is, if C exists at all, then C is unique.

DEFINITION 81.1. The set C of all complex numbers consists of all ordered pairs $(u, v)$ of real numbers. If $(x, y) \in$ C and $(u, v) \in$ C, then
(a) $(x, y) + (u, v) = (x + u, y + v)$;
(b) $-(x, y) = (-x, -y)$; and
(c) $(x, y) \cdot (u, v) = (x \cdot u - y \cdot v, \; y \cdot u + x \cdot v)$.
The ordered pairs of real numbers are definite objects which can be interpreted as complex numbers without any logical contradiction. However, the set of all ordered pairs of real numbers often occurs in mathematics with other interpretations. The intended meaning of $(u, v)$ should be specified whenever such pairs are used. In the case of complex numbers, this will usually be unnecessary, because once we show that the system C defined above satisfies the requirements listed in (81), it will be possible to return to the convenient notation $u + i \cdot v$.

The reader should be aware of the double use of the signs $+$, $-$, and $\cdot$ in Definition 81.1. On the left-hand sides of the identities (a), (b), and (c), they represent the operations which are being defined for ordered pairs, while on the right-hand sides of these equalities, they indicate the known operations in the field R of real numbers.

There is no problem about the operations in Definition 81.1 being well defined, as there was in the case of the rational numbers and the real numbers. Definition 81.1 involves no arbitrary choice, such as was made in defining the operations on the equivalence classes which are the elements of Q. Also, the expressions on the right-hand sides of (a), (b), and (c) obviously belong to the set C of all ordered pairs of real numbers, so that the problem of closure, which was troublesome in defining addition, negation, and multiplication of real numbers, does not arise.

It must now be shown that the complex numbers as defined above satisfy the description given in (81). This result is the content of the two following theorems.
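THEOREM 81.2. The set C of all complex numbers with the operations defined in Definition 81.1 is a field with $(0, 0)$ as the zero element and $(1, 0)$ as the identity element. The complex number $(0, 1)$ is a solution of the equation

$$x^2 = -(1, 0) .$$

Proof. The proof that C is a commutative ring with $(0, 0)$ as the zero element and $(1, 0)$ as the identity element consists of checking the identities of Definition 42.1 in a straightforward way. For example, we will prove the associative law of multiplication:

$$\begin{aligned}
(u_1, v_1) \cdot ((u_2, v_2) \cdot (u_3, v_3))
&= (u_1, v_1) \cdot (u_2 u_3 - v_2 v_3, \; v_2 u_3 + u_2 v_3) \\
&= (u_1 u_2 u_3 - u_1 v_2 v_3 - v_1 v_2 u_3 - v_1 u_2 v_3, \; v_1 u_2 u_3 - v_1 v_2 v_3 + u_1 v_2 u_3 + u_1 u_2 v_3), \\
((u_1, v_1) \cdot (u_2, v_2)) \cdot (u_3, v_3)
&= (u_1 u_2 - v_1 v_2, \; v_1 u_2 + u_1 v_2) \cdot (u_3, v_3) \\
&= (u_1 u_2 u_3 - v_1 v_2 u_3 - v_1 u_2 v_3 - u_1 v_2 v_3, \; v_1 u_2 u_3 + u_1 v_2 u_3 + u_1 u_2 v_3 - v_1 v_2 v_3) .
\end{aligned}$$

Hence, $(u_1, v_1) \cdot ((u_2, v_2) \cdot (u_3, v_3)) = ((u_1, v_1) \cdot (u_2, v_2)) \cdot (u_3, v_3)$.

To prove that C is a field, it is necessary to show that if $(u, v) \ne (0, 0)$ in C and $(w, z) \in$ C, then there exists $(x, y) \in$ C such that

$$(u, v) \cdot (x, y) = (w, z) . \qquad (83)$$

If both sides of this equality are multiplied on the left by $(u, -v)$, then by the associative law just proved, we obtain

$$(u^2 + v^2, 0) \cdot (x, y) = (u, -v) \cdot (w, z) = (uw + vz, \; uz - vw) .$$

Now $u^2 + v^2 \ne 0$. For if $u^2 + v^2 = 0$, then $0 \le u^2 \le u^2 + v^2 = 0$ and $0 \le v^2 \le u^2 + v^2 = 0$, so that $u = v = 0$. This contradicts the assumption that $(u, v) \ne (0, 0)$. Thus, $(u^2 + v^2)^{-1}$ exists in R, and

$$\begin{aligned}
(x, y) &= (1, 0) \cdot (x, y) = (((u^2 + v^2)^{-1}, 0) \cdot (u^2 + v^2, 0)) \cdot (x, y) \\
&= ((u^2 + v^2)^{-1}, 0) \cdot (u, -v) \cdot (w, z) \\
&= ((u^2 + v^2)^{-1} (uw + vz), \; (u^2 + v^2)^{-1} (uz - vw)) .
\end{aligned}$$

As frequently happens in elementary algebra, the steps which lead to the solution of (83) can be reversed to prove that the expression obtained for $(x, y)$ really is a solution:

$$\begin{aligned}
(u, v) \cdot ((u^2 + v^2)^{-1} (uw + vz), \; (u^2 + v^2)^{-1} (uz - vw))
&= (u, v) \cdot (u, -v) \cdot ((u^2 + v^2)^{-1}, 0) \cdot (w, z) \\
&= (u^2 + v^2, 0) \cdot ((u^2 + v^2)^{-1}, 0) \cdot (w, z) \\
&= (1, 0) \cdot (w, z) = (w, z) .
\end{aligned}$$

By Definition 81.1(c) and (b), $(0, 1)^2 = (-1, 0) = -(1, 0)$. This observation completes the proof of Theorem 81.2.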
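The operations of Definition 81.1 and the solution formula found in the proof of Theorem 81.2 can be checked on concrete pairs. The sketch below represents a complex number as a Python tuple of exact rationals; the function names `add`, `neg`, `mul`, and `solve` are illustrative choices, and the sample values are arbitrary.

```python
from fractions import Fraction


def add(p, q):
    """Definition 81.1(a): (x, y) + (u, v) = (x + u, y + v)."""
    return (p[0] + q[0], p[1] + q[1])


def neg(p):
    """Definition 81.1(b): -(x, y) = (-x, -y)."""
    return (-p[0], -p[1])


def mul(p, q):
    """Definition 81.1(c): (x, y) . (u, v) = (xu - yv, yu + xv)."""
    x, y = p
    u, v = q
    return (x * u - y * v, y * u + x * v)


def solve(p, q):
    """Return (x, y) with p . (x, y) = q, assuming p != (0, 0)."""
    u, v = p
    w, z = q
    inverse = 1 / Fraction(u * u + v * v)  # (u^2 + v^2)^(-1), exists since p != (0, 0)
    return (inverse * (u * w + v * z), inverse * (u * z - v * w))


i = (0, 1)
assert mul(i, i) == neg((1, 0))            # (0, 1)^2 = -(1, 0)

a, b, c = (2, -1), (3, 5), (-4, 7)
assert mul(a, mul(b, c)) == mul(mul(a, b), c)   # associativity on one sample

x = solve((3, 2), (1, 1))
assert mul((3, 2), x) == (1, 1)
print(x)
```

A check on sample values is of course not a proof; it only confirms that the formulas were transcribed consistently.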
The definition of a complex number as an ordered pair $(u, v)$ of real numbers was suggested by the correspondence $(u, v) \leftrightarrow u + i \cdot v$, where $u$ and $v$ are real and $i^2 = -1$. In particular, a real number $u = u + i \cdot 0$ should correspond to the pair $(u, 0)$.

THEOREM 81.3. The correspondence $u \leftrightarrow (u, 0)$ is an isomorphism between R and the subring R$' = \{(u, 0) \mid u \in$ R$\}$ of C. Each element of C can be expressed in the form $(u, 0) + (0, 1) \cdot (v, 0)$.
This theorem, whose proof we leave for the reader, is the justification for identifying each real number $u$ with the corresponding element $(u, 0)$ of R$'$. If this identification is made, then R becomes a subring of C. Thus, C satisfies condition (i) of (81). In particular, we have now attained the following chain of inclusions relating the classical number systems of mathematics:

$$N \subset Z \subset Q \subset R \subset C .$$
For simplicity, each element of R in C will be denoted by a single symbol such as $0$, $1$, $\tfrac{1}{2}$, $2$, $u$, and $v$, rather than by the corresponding pair $(0, 0)$, $(1, 0)$, $(\tfrac{1}{2}, 0)$, $(2, 0)$, $(u, 0)$, and $(v, 0)$. Note that with this notation, 0 and 1 represent the zero and identity of C, as they should. Moreover, by Theorem 81.2, $(0, 1)^2 = -1$, which leads to an exact definition of the symbol $i$ as an abbreviation for $(0, 1)$. Therefore,

$$i^2 = -1 ,$$
and C satisfies condition (ii) of (81). By virtue of the notation just introduced, the expression $u + i \cdot v$ takes on a definite meaning as a complex number. In fact,

$$u + i \cdot v = (u, 0) + (0, 1) \cdot (v, 0) = (u, v) .$$

We see from this equality that every complex number can be represented uniquely in the form $u + i \cdot v$, with $u$ and $v$ real numbers. From this fact it follows easily that no proper subring of C satisfies (81); that is, C is minimal.
1. Express the following complex numbers in the form $u + iv$. (a) $(1, 1)$ (b) $(0, 1) \cdot (2, 1)$ (c) $(2, 1) \cdot (1, 2)$ (d) $(3, 2)/(1, 1)$ (e) $(u, v)^{-1}$, where $(u, v) \ne (0, 0)$
2. Complete the proof of Theorem 81.2.
3. Prove Theorem 81.3.
4. Determine the value of the sum [illegible in the source] for all values of $n$.
5. Show that the following sets are subrings of C. (a) $\{(r, s) \mid r \in Q, s \in Q\}$ (b) $\{(a, b) \mid a \in Z, b \in Z\}$
82 Complex conjugates and the absolute value in C. It is not possible to define an ordering of the complex numbers such that they will form an ordered field. For if C could be made into an ordered field, then $i^2 = -1$ would have to be both positive and negative, by Theorem 45.3. However, the field C has some important special properties which are not present in every field, or even in ordered fields. In this section the consequences of some of these properties will be examined.

Throughout this section and the remainder of the book, we will represent complex numbers either by single letters, such as $z$ and $w$, or else by the notation $x + iy$ and $u + iv$, where $x$, $y$, $u$, and $v$ denote real numbers. The discussion at the end of Section 81 justifies this convention. Since $x + iy = u + iv$ implies that $x = u$ and $y = v$, we see that the real numbers $x$ and $y$ appearing in the representation $z = x + iy$ are uniquely determined by the complex number $z$. The real number $x$ is called the real part of $z$, and $y$ is called the imaginary part of $z$. It is convenient to write $x = \mathcal{R}(z)$ and $y = \mathcal{I}(z)$ in this case. That is,

$$z = \mathcal{R}(z) + i \, \mathcal{I}(z), \quad \text{where } \mathcal{R}(z) \text{ and } \mathcal{I}(z) \text{ are real numbers.}$$

DEFINITION 82.1. Let $z = x + iy$ be a complex number. The complex conjugate of $z$ is the complex number

$$\bar{z} = x + i(-y) .$$

We will often simplify the phrase "complex conjugate of $z$" to "conjugate of $z$," although the latter expression has a broader meaning in other phases of algebra. Also, it is customary to write $x - iy$ instead of $x + i(-y)$.

THEOREM 82.2. Let $z$ and $w$ be complex numbers. Then
(a) $\overline{z + w} = \bar{z} + \bar{w}$;
(b) $\overline{(-z)} = -\bar{z}$;
(c) $\overline{z \cdot w} = \bar{z} \cdot \bar{w}$;
(d) if $w \ne 0$, then $\overline{z/w} = \bar{z}/\bar{w}$;
(e) if $w = \bar{z}$, then $\bar{w} = z$, that is, $\bar{\bar{z}} = z$;
(f) $z + \bar{z} = 2\mathcal{R}(z)$, $z - \bar{z} = 2i\,\mathcal{I}(z)$.
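We will omit the proof of (a), (b), (e), and (f). The proof of (c) is obtained by direct computation. Let $z = x + iy$, $w = u + iv$. Then $\bar{z} = x + i(-y)$, and $\bar{w} = u + i(-v)$. Hence,

$$\bar{z} \cdot \bar{w} = (xu - yv) + i[-(yu + xv)],$$

and

$$\overline{z \cdot w} = \overline{(xu - yv) + i(yu + xv)} = (xu - yv) + i[-(yu + xv)] .$$

To prove (d), note that by what we have just shown, $\overline{(z/w)} \cdot \bar{w} = \overline{(z/w) \cdot w} = \bar{z}$. Hence, $\overline{z/w} = \bar{z}/\bar{w}$.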
If $z = x + iy$, then

$$z \cdot \bar{z} = (x + iy)(x - iy) = x^2 + y^2 .$$

Therefore $z \cdot \bar{z}$ is a nonnegative real number, and it has a square root in R.

DEFINITION 82.3. Let $z = x + iy$ be a complex number. The absolute value or modulus of $z$ is the nonnegative real number

$$|z| = \sqrt{z \cdot \bar{z}} = \sqrt{x^2 + y^2} .$$

If $z$ is a real number, say $z = x + i \cdot 0$, then $|z| = \sqrt{x^2}$. If $x \ge 0$, then $\sqrt{x^2} = x$. If $x < 0$, then $\sqrt{x^2} = -x$. Therefore, the definition of the absolute value of $z$ given above is consistent with Definition 46.6 for the absolute value of elements of an ordered integral domain (in particular, the absolute value in R).

THEOREM 82.4. Let $z$ and $w$ be complex numbers. Then
(a) $|z| \ge 0$; if $|z| = 0$, then $z = 0$;
(b) $z \cdot \bar{z} = |z|^2$;
(c) $|\bar{z}| = |z|$;
(d) $|-z| = |z|$;
(e) $|z \cdot w| = |z| \cdot |w|$;
(f) if $w \ne 0$, then $|z/w| = |z|/|w|$;
(g) $|\mathcal{R}(z)| \le |z|$, $|\mathcal{I}(z)| \le |z|$;
(h) $|z + w| \le |z| + |w|$.
Proof. Let $z = x + iy$. By Definition 8-2.3, $|z| \geq 0$. If $|z| = 0$, then $0 = x^2 + y^2 \geq x^2 \geq 0$; hence, $x = 0$; similarly, $y = 0$. To prove (b), observe that
$$z \cdot \bar{z} = (x + iy) \cdot [x + i(-y)] = x^2 + y^2 = |z|^2.$$
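The equality (c) is obtained from (b) and Theorem 8-2.2(e) by taking the square root of both sides of the identity $|\bar{z}|^2 = \bar{z} \cdot \bar{\bar{z}} = \bar{z} \cdot z = z \cdot \bar{z} = |z|^2$. Using Definition 8-2.3,
$$|-z| = \sqrt{(-x)^2 + (-y)^2} = \sqrt{x^2 + y^2} = |z|.$$
The identity (e) is obtained from (b) and Theorem 8-2.2(c) by taking the square root of both sides of the equality
$$|z \cdot w|^2 = (z \cdot w) \cdot \overline{(z \cdot w)} = (z \cdot \bar{z}) \cdot (w \cdot \bar{w}) = |z|^2 \cdot |w|^2.$$
Using this result, we have $|z/w| \cdot |w| = |(z/w) \cdot w| = |z|$. If $w \neq 0$, then $|w| \neq 0$ by (a), so that this identity can be divided by $|w|$ to obtain (f). If $z = x + iy$, then by definition,
$$|\mathcal{R}(z)| = |x| = \sqrt{x^2} \leq \sqrt{x^2 + y^2} = |z|.$$
The second statement of (g) is proved in a similar way. Finally, to obtain the triangle inequality (h), note that by Theorem 8-2.2,
$$|z + w|^2 = (z + w)(\bar{z} + \bar{w}) = z\bar{z} + (z\bar{w} + \overline{z\bar{w}}) + w\bar{w} = |z|^2 + 2\mathcal{R}(z\bar{w}) + |w|^2 \leq |z|^2 + 2|z\bar{w}| + |w|^2 = |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2.$$
Taking the square root of the first and last term of this inequality yields (h). The theorem we have just proved contains the most important elementary properties of the absolute value. The reader should become thoroughly familiar with these facts. The result of Theorem 8-2.4(b) can be used to calculate quotients in C. The general idea, which was used implicitly in the proof of Theorem 8-1.2, is that
$$\frac{z}{w} = \frac{z \cdot \bar{w}}{w \cdot \bar{w}} = \frac{z \cdot \bar{w}}{|w|^2}.$$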
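The quotient rule just described can be tried numerically. The following is a minimal sketch in Python, not part of the original text; the function names `conjugate` and `quotient` are illustrative choices, and Python's built-in `complex` type is assumed as the model of C.

```python
def conjugate(z: complex) -> complex:
    """Return x - iy for z = x + iy."""
    return complex(z.real, -z.imag)


def quotient(z: complex, w: complex) -> complex:
    """Compute z / w as (z * conj(w)) / |w|^2, for w != 0."""
    if w == 0:
        raise ZeroDivisionError("w must be nonzero")
    # w * conj(w) = |w|^2 is a nonnegative real number (Theorem 8-2.4(b)).
    modulus_squared = (w * conjugate(w)).real
    numerator = z * conjugate(w)
    # Dividing by a real number only rescales the real and imaginary parts.
    return complex(numerator.real / modulus_squared,
                   numerator.imag / modulus_squared)


if __name__ == "__main__":
    # Compare the hand-rolled quotient with Python's own complex division.
    print(quotient(1 + 2j, 3 - 4j), (1 + 2j) / (3 - 4j))
```

The point of the sketch is that the only division performed is by the real number $|w|^2$, which is how quotients in C are reduced to arithmetic in R.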
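We will use the results of Theorems 8-2.2 and 8-2.4 and the fact that every nonnegative real number has a square root to prove that every complex number w has a square root z in C. In fact, an explicit expression for z in terms of w can be obtained. Let $w \in C$. First assume that there is a complex number z which satisfies $z^2 = w$. We will solve this equation for z in terms of w. By Theorem 8-2.4, $z^2 = w$ implies $|w| = |z^2| = |z|^2 = z \cdot \bar{z}$. Therefore,
$$[2\mathcal{R}(z)]^2 = (z + \bar{z})^2 = z^2 + 2z\bar{z} + \bar{z}^2 = w + 2|w| + \bar{w} = 2[|w| + \mathcal{R}(w)].$$
In particular, $2[|w| + \mathcal{R}(w)]$ is a nonnegative real number. Taking square roots,
$$2\mathcal{R}(z) = \pm\sqrt{[2\mathcal{R}(z)]^2} = \pm\sqrt{2[|w| + \mathcal{R}(w)]}. \tag{8-7}$$
There are now two cases to consider. If $|w| + \mathcal{R}(w) \neq 0$, then by (8-7), $\mathcal{R}(z) \neq 0$, and
$$z = \frac{z(z + \bar{z})}{z + \bar{z}} = \frac{z^2 + z\bar{z}}{2\mathcal{R}(z)} = \frac{w + |w|}{2\mathcal{R}(z)}.$$
Using (8-7) again, we find that z has the two possible values,
$$z = \pm\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}. \tag{8-8}$$
If $|w| + \mathcal{R}(w) = 0$, then
$$[\mathcal{I}(w)]^2 = |w|^2 - [\mathcal{R}(w)]^2 = 0.$$
Hence $\mathcal{I}(w) = 0$, and $w = \mathcal{R}(w) = -|w|$. That is, $w = -u$, where u is a nonnegative real number. By (8-7), $\mathcal{R}(z) = 0$. Hence, $z = i\,\mathcal{I}(z)$, and
$$-u = w = z^2 = -[\mathcal{I}(z)]^2, \qquad\text{so that}\qquad z = \pm i\sqrt{u} = \pm i\sqrt{|w|}. \tag{8-9}$$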
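Our discussion shows that if there is a complex number z satisfying $z^2 = w$, then z must have the form (8-8) if $|w| + \mathcal{R}(w) \neq 0$, and z is given by (8-9) if $|w| + \mathcal{R}(w) = 0$. It remains to show conversely that if z is given by (8-8) in case $|w| + \mathcal{R}(w) \neq 0$, and by (8-9) when $|w| + \mathcal{R}(w) = 0$, then $z^2 = w$. This is done by an easy computation. Suppose first that $|w| + \mathcal{R}(w) \neq 0$. Then
$$\left(\pm\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}\right)^2 = \frac{|w|^2 + 2|w|w + w^2}{2|w| + w + \bar{w}} = \frac{w(\bar{w} + 2|w| + w)}{2|w| + w + \bar{w}} = w.$$
If $|w| + \mathcal{R}(w) = 0$, then $(\pm i\sqrt{|w|})^2 = -|w| = w$.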
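THEOREM 8-2.5. If w is any nonzero complex number, then there are exactly two complex numbers z such that $z^2 = w$. If $|w| + \mathcal{R}(w) \neq 0$, then these numbers are given by
$$\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}} \qquad\text{and}\qquad -\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}.$$
If $|w| + \mathcal{R}(w) = 0$, then these numbers are $i\sqrt{|w|}$ and $-i\sqrt{|w|}$.
For any complex number w, it is convenient to let the symbol $\sqrt{w}$ stand for
$$\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}$$
if $|w| + \mathcal{R}(w) \neq 0$, and for $i\sqrt{|w|}$ if $|w| + \mathcal{R}(w) = 0$. Then we can say that the two square roots of w are $\sqrt{w}$ and $-\sqrt{w}$.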
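The two cases of the square-root formula above translate directly into a short routine. This is a hedged sketch in Python rather than anything taken from the text: the name `complex_sqrt` is an illustrative choice, and floating-point arithmetic only approximates the exact real square roots that the argument assumes.

```python
import math


def complex_sqrt(w: complex) -> complex:
    """Return the square root of w selected by the two-case formula above."""
    modulus = abs(w)
    s = modulus + w.real          # |w| + R(w), which is never negative
    if s > 0:
        # First case: (|w| + w) / sqrt(2[|w| + R(w)])
        return (modulus + w) / math.sqrt(2 * s)
    # Second case: |w| + R(w) = 0, so w is a nonpositive real number
    return complex(0.0, math.sqrt(modulus))


if __name__ == "__main__":
    for w in (3 + 4j, complex(-4, 0), 5 + 12j):
        z = complex_sqrt(w)
        print(w, z, z * z)        # z * z should reproduce w
```

The other square root is simply the negative of the value returned, as in the theorem.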
For example, if $|w| = 5$ and $\mathcal{R}(w) = 3$, then $2[|w| + \mathcal{R}(w)] = 16$, so that $\sqrt{w} = (5 + w)/4$.
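In the case of square roots of complex numbers, just as for square roots of real numbers, we must be careful not to assume that $\sqrt{w^2}$ is always equal to w. It is in fact easy to see that
$$\sqrt{w^2} = \begin{cases} w & \text{if } \mathcal{R}(w) > 0, \text{ or if } \mathcal{R}(w) = 0 \text{ and } \mathcal{I}(w) \geq 0, \\ -w & \text{if } \mathcal{R}(w) < 0, \text{ or if } \mathcal{R}(w) = 0 \text{ and } \mathcal{I}(w) < 0. \end{cases} \tag{8-10}$$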
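In any case, $\sqrt{w^2} = \pm w$. More generally, we obtain the following result. (8-2.6). Let z and w be complex numbers. Then (a) $\sqrt{z \cdot w} = \pm(\sqrt{z} \cdot \sqrt{w})$; (b) if $w \neq 0$, then $\sqrt{z/w} = \pm(\sqrt{z}/\sqrt{w})$. We leave the proof of these identities for the reader. The theorem that every complex number has a square root in C can be used to show that any quadratic equation
$$ax^2 + bx + c = 0, \tag{8-11}$$
where a, b, and c are complex numbers and $a \neq 0$, has a solution x in C. Suppose that x is a complex number which satisfies (8-11). Rewrite (8-11) in the form
$$a\left(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}\right) - \frac{b^2}{4a} + c = 0.$$
That is, the term $b^2/4a$ is added and subtracted on the left-hand side of (8-11), so that the expression in parentheses becomes a perfect square. This is the familiar method of completing the square. It leads to the equality
$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}.$$
Therefore,
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{8-12}$$
Conversely, it can be checked by direct substitution that the two numbers given in (8-12) are solutions of (8-11). That is, the following result holds.
THEOREM 8-2.7. Let a, b, and c be complex numbers with $a \neq 0$. Then the solutions in C of the equation $ax^2 + bx + c = 0$ are the numbers given by (8-12).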
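EXAMPLE. For the equation $(1 + i)x^2 - (1 + i2)x - 2 = 0$ we have $b^2 - 4ac = 5 + i12$ and $\sqrt{5 + i12} = 3 + i2$. The solutions are therefore
$$x = \frac{(1 + i2) + (3 + i2)}{2(1 + i)} = 2 \qquad\text{and}\qquad x = \frac{(1 + i2) - (3 + i2)}{2(1 + i)} = \frac{-2}{2(1 + i)} = \frac{-2(1 - i)}{4} = -\frac{1 - i}{2}.$$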
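The quadratic formula for complex coefficients can be checked with a few lines of code. The sketch below is an illustration, not the book's method: it relies on `cmath.sqrt` from the Python standard library for the square root of the discriminant, the function name `solve_quadratic` is hypothetical, and the sample equation $(1 + i)x^2 - (1 + i2)x - 2 = 0$ is used only as a test case.

```python
import cmath


def solve_quadratic(a: complex, b: complex, c: complex):
    """Return the two solutions of a*x**2 + b*x + c = 0 for complex a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # One square root of the discriminant; the other is its negative.
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


if __name__ == "__main__":
    a, b, c = 1 + 1j, -(1 + 2j), -2
    for x in solve_quadratic(a, b, c):
        # Substituting back should give a value close to zero.
        print(x, a * x * x + b * x + c)
```

Substituting each returned value back into the equation, as the loop does, mirrors the "direct substitution" check mentioned in the text.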
(d)
d m
(e)
fl
3. Find the solutions of the following equations. 1 = O (b) (3 i)x2 (a) x2 2ix 10x  (9 j i3) = O (c) 5x2+ 2/Zx  1 = o 4. Prove Theorem 82.2(a), (b), (e), and (f).
+ +
5. Show that if z and w are complex numbers, then the following are true.
(a) $|z - w| \leq |z| + |w|$ (b) $|z - w| \geq |z| - |w|$ (c) $|z + w|^2 + |z - w|^2 = 2(|z|^2 + |w|^2)$
7. Prove (8-10).
8. Prove (8-2.6). 9. Show that if $\mathcal{R}(w) > 0$ and $\mathcal{R}(z) > 0$, then
2 / / x = fi d .
10. Show that the numbers given by (812) are solutions of (811). 11. Prove that if w = u
12. Let $w = a + ib$, where a and b are integers. Prove that $|w|$ is an integer if and only if $w = t \cdot z^2$ or $w = it \cdot z^2$, where $z = r + is$ with r, s, and t integers. [Hint: See Theorem 5-5.4.]
8-3 The geometrical representation of complex numbers. We mentioned in Section 8-1 that ordered pairs (u, v) of real numbers have several interpretations in mathematics. One of the most familiar applications of these pairs occurs in analytic geometry. In fact, analytic geometry is based on the "coordinatization" of the plane, that is, a one-to-one correspondence between the set of all points P of the plane, and the set of all pairs (x, y) of real numbers. This correspondence provides an important way of representing complex numbers by points in the plane. For the reader who is not familiar with analytic geometry, we will discuss briefly the process of defining coordinate systems in a plane. The construction begins with the choice of any two perpendicular lines. It is convenient to take one of these to be horizontal. This line is called the x-axis, and is denoted by X. The other line must then be vertical. It is called the y-axis, and is denoted by Y. Let O be the point of intersection of X and Y. The point O is called the origin of the coordinate system in the plane. Let I be a point on X which lies to the right of O. Using OI as the basic unit interval, define a coordinate system on X by the construction described in Section 7-3. Let J be a point on Y, above O, such that the distance $\overline{OJ}$ is equal to the distance $\overline{OI}$. That is, the segments OI and OJ are congruent. Establish a coordinate system on Y using OJ as the basic unit interval. Let P be any point in the plane. Construct the line l through P and parallel to X (hence perpendicular to Y). Also draw the line m passing through P and parallel to Y (hence perpendicular to X). Then l meets Y at some point S, and m meets X at some point T. Let x be the real number corresponding to T in the coordinate system on X. Let y be the real number corresponding to S in the coordinate system on Y. We associate with P the number pair (x, y) (see Fig. 8-1):
$$P \leftrightarrow (x, y).$$
Different points evidently correspond in this way to different number pairs, and every pair of real numbers is associated with some point. In fact, the point corresponding to (x, y) can be found as the intersection of the vertical line through the point associated with x on X and the horizontal line through the point corresponding to y on Y. Thus, $P \leftrightarrow (x, y)$ is a one-to-one correspondence between the set of all points of the plane and the set of all pairs of real numbers. A plane, together with a correspondence between points and number pairs defined in this way, is called a coordinate plane. The numbers x and y in the pair (x, y) corresponding to the point P are called the cartesian* coordinates of P. Sometimes, to be more specific, x is called the X-coordinate or abscissa of P, and y is called the Y-coordinate or ordinate of P. The points on the x-axis are exactly the points whose coordinates are of the form (x, 0). The points on the y-axis have coordinates (0, y). In particular, $O \leftrightarrow (0, 0)$, $I \leftrightarrow (1, 0)$, and $J \leftrightarrow (0, 1)$.
(8-3.1). Let S and T be points with cartesian coordinates $(x_S, y_S)$ and $(x_T, y_T)$, respectively. Then the distance between S and T is
$$\overline{ST} = \sqrt{(x_S - x_T)^2 + (y_S - y_T)^2}.$$
* The term "cartesian" is used in honor of the French mathematician and philosopher René Descartes (1596-1650), who was the founder of analytic geometry.
Proof. The proof of this statement is based on the Pythagorean triangle theorem. Let $l_S$ and $l_T$ be horizontal lines through S and T, respectively, and let $m_S$ and $m_T$ be vertical lines through these same points. Let P be the point of intersection of the perpendicular lines $m_S$ and $l_T$. Then PST is a right triangle with ST as its hypotenuse. Figure 8-2 illustrates a typical situation. If ST is horizontal or vertical, the triangle PST is degenerate, and this case requires special treatment. The lines $m_S$ and $m_T$ intersect the x-axis at points corresponding to the real numbers $x_S$ and $x_T$, and these two points, together with T and P, determine a rectangle. Thus, the distance between P and T is the same as the distance between the points on the x-axis corresponding to $x_S$ and $x_T$: $\overline{TP} = |x_S - x_T|$. Similarly, $\overline{PS} = |y_S - y_T|$. Hence,
$$\overline{ST}^2 = \overline{TP}^2 + \overline{PS}^2 = |x_S - x_T|^2 + |y_S - y_T|^2 = (x_S - x_T)^2 + (y_S - y_T)^2.$$
Taking the square root completes the proof. We now turn to the representation of complex numbers as points in a coordinate plane. This representation is obtained simply by using the definition of complex numbers as pairs of real numbers, and associating each complex number $x + iy = (x, y)$ with the point in the coordinate plane whose coordinates are x and y. It is then possible to use the complex numbers as labels for the corresponding points in the plane (see Fig. 8-3), just as the real numbers are used to represent the points on a line. The term complex plane is often used to describe a coordinate plane whose points are labeled by complex numbers. If complex numbers are interpreted in this way, then the operations with them have interesting geometrical meanings. For example, if $z = x + iy$, then $\mathcal{R}(z) = x$ is the abscissa of z and $\mathcal{I}(z) = y$ is the ordinate of z. Thus, in particular, the real numbers represent points on the x-axis. The absolute value $|z| = \sqrt{x^2 + y^2}$ is the distance from the origin O to z. More generally, if z and w are complex numbers, then $|z - w|$ is the distance between the point z and the point w. To see this, let $z = x + iy$ and $w = u + iv$. Then $|z - w| = |(x - u) + i(y - v)| = \sqrt{(x - u)^2 + (y - v)^2}$, which, by (8-3.1), is the distance between the point with coordinates (x, y) and the point with coordinates (u, v). Often it is possible to give concise descriptions of sets of points in the plane, using complex numbers.
EXAMPLE 1. $\{z \mid \mathcal{I}(z) > 0\}$ is the set of all points in the upper half plane; in other words, the set of all points which lie above the x-axis.
EXAMPLE 2. $\{z \mid |z| < 1\}$ is the set of all points which have distance less than one from the origin O, that is, the set of points which lie inside a circle of radius one with center at O.
EXAMPLE 3. $\{z \mid |z - i| = 1\}$ is the set of all points on the circle with center at i, and radius equal to one.
EXAMPLE 4. $\{z \mid \mathcal{I}(z) = m\mathcal{R}(z)\}$, where m is a real number, is the set of all points on a line l through the origin, with slope equal to m (see Fig. 8-4).
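The four sets in these examples are defined by simple conditions on a complex number, so membership can be tested mechanically. The following Python sketch is only an illustration; the function names and the tolerance `tol` used for the two equality conditions are assumptions introduced here, since exact equality is not reliable in floating-point arithmetic.

```python
def in_upper_half_plane(z: complex) -> bool:
    """Example 1: the imaginary part of z is positive."""
    return z.imag > 0


def in_open_unit_disk(z: complex) -> bool:
    """Example 2: z lies at distance less than one from the origin."""
    return abs(z) < 1


def on_unit_circle_about_i(z: complex, tol: float = 1e-12) -> bool:
    """Example 3: |z - i| = 1, tested up to a small tolerance."""
    return abs(abs(z - 1j) - 1) <= tol


def on_line_through_origin(z: complex, m: float, tol: float = 1e-12) -> bool:
    """Example 4: imaginary part equals m times the real part."""
    return abs(z.imag - m * z.real) <= tol


if __name__ == "__main__":
    print(in_upper_half_plane(1 + 2j), in_open_unit_disk(0.5 + 0.5j))
    print(on_unit_circle_about_i(1 + 1j), on_line_through_origin(2 + 6j, 3.0))
```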
The addition of complex numbers has an interesting geometrical meaning. Let z and w be complex numbers representing points in the complex plane. Let O be the origin in the plane. If z, w, and O lie on a line l which is not the y-axis, then by Example 4, $\mathcal{I}(z) = m\mathcal{R}(z)$ and $\mathcal{I}(w) = m\mathcal{R}(w)$ for some real number m. Consequently, $\mathcal{I}(z + w) = \mathcal{I}(z) + \mathcal{I}(w) = m[\mathcal{R}(z) + \mathcal{R}(w)] = m\mathcal{R}(z + w)$. Therefore, $z + w$ corresponds to a point on l. [If z and w lie on the y-axis, then $\mathcal{R}(z) = \mathcal{R}(w) = 0$ implies $\mathcal{R}(z + w) = 0$, and $z + w$ is on the y-axis.] If the origin O does not separate z from w on l, then $z + w$ is at a distance $|z| + |w|$ from O on the same side as z and w. If O is between z and w, then $z + w$ is at a distance $||z| - |w||$ from O, on the same side as z if $|z| > |w|$, and on the same side as w if $|w| > |z|$ (see Fig. 8-5). If z, w, and O do not lie on the same line, then the point corresponding to $z + w$ can be determined by the parallelogram rule.
(8-3.2). Parallelogram rule. Let z and w be complex numbers representing points S and T, such that O, S, and T do not lie on a line. If P is the point corresponding to $z + w$, then OSPT is a parallelogram (see Fig. 8-6).
It should be remarked in connection with (8-3.2) that OSPT is the order in which the vertices are encountered in moving around the sides of the parallelogram. That is, ST and OP are diagonals, not sides of the figure. The proof of (8-3.2) is an exercise in elementary geometry, which we will leave for the interested reader.
1. Draw coordinate axes in a plane, and plot the points with the following coordinates: (2, 1), (1, 2), (1, l), ($, +). 2. Find the distance between the following pairs of points in the complex plane. (a) 4, i9 (b) i, 2  i (c) 1 i, 1  i (d) 9 + i(15),4  i9
{zl9(ix)
I2[] is the circle with centcr a t w, with 4. Sliow t h a t {xl Iz  zr;l = radius 4 2 1x1. [Hint: Use the ideiitity obtained in Problerii 5(c), Scction 82: ' 7 L w ' 2 t  2C12 1% I 2(/s12i Izc12).] 5 . \\1i:it is tlie geon~ctricalinterl)rctation of tlie law Iz wl 5 1x1 IwI? ITscthis intrr1)rctation t o tlccide ivlicri tlie cquality Ix wl = 1x1 f I w l holds.
1
IZ
6. What is the geometrical meaning of the identity given in Problem 5(c), Section 8-2?
7. Describe the method of finding the point corresponding to $z - w$, if you are given the points corresponding to z and w.
8. Mhat is thc gcoriictrical iiitcr1,rctation of 2 , 2, and z poiiits rcl)resciitccl 1.13' x and zo) 3

9. Show that if $z \neq 0$, where z is a complex number, and if t is any real number, then the points O, z, and tz are all in a line. Show that O, z, and w lie on a line if and only if either $z = 0$, or w is a real multiple of z.
8-4 Polar representation. To interpret multiplication, we introduce a new way of representing complex numbers. This representation is based on the polar coordinate system* used in analytic geometry, and for this reason it is called the polar representation of complex numbers. Let O be the origin of a cartesian coordinate system in the plane. As in the last section, the points of the plane will be labeled by complex numbers. Let $z = x + iy$ be a nonzero complex number. The line segment from O to the point z has length $|z| = \sqrt{x^2 + y^2}$. Let $\theta$ denote the angle† which this line segment makes with the right half of the x-axis (see Fig. 8-7). As is customary, we will measure angles counterclockwise between 0 and 360 degrees. The components x and y of z can be expressed in terms of $|z|$ and $\theta$ by the equations
$$x = |z| \cos\theta, \qquad y = |z| \sin\theta.$$
FIGURE 8-7
* The definition of polar coordinates makes use of the geometrical concept of "angle" and the trigonometric functions "sine" and "cosine." In order to obtain results such as Theorem 8-4.3, below, without resorting to the use of "geometrically evident" facts, we would have to define angles, sines, and cosines more carefully, and work hard to show that these notions have the properties which are obvious to our geometrical intuition. However, no attempt will be made to carry out such a program here.
† The symbol $\theta$ is the small Greek letter theta. Angles in mathematics are often represented by lower case Greek letters, such as $\theta$, $\phi$ (phi), and $\chi$ (chi).
Therefore,
$$z = |z| (\cos\theta + i \sin\theta).$$
This expression is called the polar representation of z. The angle $\theta$ is called the argument, or amplitude, of the complex number z. This angle will be denoted by Arg z. It should be remembered that the argument of the complex number 0 is not defined. Since we are measuring angles in degrees* between 0 and 360, the argument of every nonzero complex number satisfies
$$0 \leq \text{Arg } z < 360.$$
It is evident that if z is a positive real number, then Arg z = 0, and if z is a negative real number, then Arg z = 180. Although the argument of a complex number z is always between 0 and 360, it should be noted that if $z = |z| (\cos\theta + i \sin\theta)$, where $\theta$ = Arg z, then we also have
$$z = |z| [\cos(\theta + n \cdot 360) + i \sin(\theta + n \cdot 360)]$$
for any integer n. For example,
$$z = \cos 410 + i \sin 410$$
is a complex number with Arg z $\neq$ 410. In fact, Arg z = 410 - 360 = 50. Let z and w be two nonzero complex numbers, with Arg z = $\theta$, and Arg w = $\phi$. Then
$$z = |z| (\cos\theta + i \sin\theta), \qquad w = |w| (\cos\phi + i \sin\phi).$$
Hence,
$$z \cdot w = |z|\,|w| [(\cos\theta \cos\phi - \sin\theta \sin\phi) + i(\sin\theta \cos\phi + \cos\theta \sin\phi)].$$
This expression can be simplified by using the sum formulas of trigonometry: $\cos(\theta + \phi) = \cos\theta \cos\phi - \sin\theta \sin\phi$, $\sin(\theta + \phi) = \sin\theta \cos\phi + \cos\theta \sin\phi$. We obtain
$$z \cdot w = |z|\,|w| [\cos(\theta + \phi) + i \sin(\theta + \phi)]. \tag{8-14}$$
* In most higher mathematics, angles are measured in radians rather than degrees. However, for the applications in this book, radians have no advantage over degrees. In computational work, it is more convenient to use degrees rather than radians, since most trigonometric tables list angles in degrees.
Comparing this formula with the equation which expresses the trigonometric representation of $z \cdot w$, we obtain the rule for determining the argument of the product of two numbers.
THEOREM 8-4.1. Let z and w be nonzero complex numbers. Then
Arg $z \cdot w$ = Arg z + Arg w, if Arg z + Arg w < 360, and
Arg $z \cdot w$ = Arg z + Arg w - 360, if Arg z + Arg w $\geq$ 360.
FIGURE 8-8
This theorem provides the desired geometrical interpretation of multiplication in C. The point $z \cdot w$ is the point on the half line which makes an angle of (Arg z + Arg w) with the positive real axis, and which is at a distance $|z|\,|w|$ from O (see Fig. 8-8). A particularly important case of (8-14) occurs when w = z. This formula then becomes $z^2 = |z|^2 (\cos 2\theta + i \sin 2\theta)$. We can easily generalize this result by induction:
$$z^n = |z|^n (\cos n\theta + i \sin n\theta), \qquad\text{where } \theta = \text{Arg } z. \tag{8-15}$$
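The rule for the argument of a product can be illustrated numerically. This Python sketch is an assumption-laden illustration, not the book's procedure: it obtains Arg z from `math.atan2` in the standard library and reduces the result to the range from 0 to 360 degrees; the function names are hypothetical.

```python
import math


def arg_degrees(z: complex) -> float:
    """Arg z in degrees, reduced to 0 <= Arg z < 360 (z must be nonzero)."""
    if z == 0:
        raise ValueError("the argument of 0 is not defined")
    return math.degrees(math.atan2(z.imag, z.real)) % 360.0


def arg_of_product(z: complex, w: complex) -> float:
    """Argument of z*w computed from Arg z and Arg w as in Theorem 8-4.1."""
    total = arg_degrees(z) + arg_degrees(w)
    return total if total < 360.0 else total - 360.0


if __name__ == "__main__":
    z, w = -1j, complex(-1, 0)          # arguments 270 and 180 degrees
    print(arg_of_product(z, w), arg_degrees(z * w))
```

Both printed values should agree up to rounding, since the second is the argument of the product computed directly.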
In particular, if $|z| = 1$, this identity gives Demoivre's theorem.
THEOREM 8-4.2. $(\cos\theta + i \sin\theta)^n = \cos n\theta + i \sin n\theta$.
The theorem of Demoivre has numerous applications. One which students often appreciate is its use as a device for deriving trigonometric formulas for multiple angles.
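EXAMPLE 1. We use Theorem 8-4.2 to determine $\cos 4\theta$ and $\sin 4\theta$. Taking n = 4 in this formula, and using the binomial theorem, we obtain:
$$\cos 4\theta + i \sin 4\theta = (\cos\theta + i \sin\theta)^4 = \cos^4\theta + 4\cos^3\theta\,(i \sin\theta) + 6\cos^2\theta\,(i \sin\theta)^2 + 4\cos\theta\,(i \sin\theta)^3 + (i \sin\theta)^4$$
$$= (\cos^4\theta - 6\cos^2\theta \sin^2\theta + \sin^4\theta) + i(4\cos^3\theta \sin\theta - 4\cos\theta \sin^3\theta).$$
Therefore,
$$\cos 4\theta = \cos^4\theta - 6\cos^2\theta \sin^2\theta + \sin^4\theta, \qquad \sin 4\theta = 4\cos^3\theta \sin\theta - 4\cos\theta \sin^3\theta.$$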
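The two multiple-angle formulas obtained in this example are easy to spot-check. The sketch below, an illustration under the assumption that angles are supplied in degrees, evaluates the right-hand sides and compares them with the cosine and sine of four times the angle; the function name `cos4_sin4` is hypothetical.

```python
import math


def cos4_sin4(theta_degrees: float):
    """Evaluate the formulas for cos(4*theta) and sin(4*theta) from Example 1."""
    t = math.radians(theta_degrees)      # math.cos and math.sin expect radians
    c, s = math.cos(t), math.sin(t)
    cos4 = c**4 - 6 * c**2 * s**2 + s**4
    sin4 = 4 * c**3 * s - 4 * c * s**3
    return cos4, sin4


if __name__ == "__main__":
    for theta in (10.0, 25.0, 77.5):
        direct = (math.cos(math.radians(4 * theta)),
                  math.sin(math.radians(4 * theta)))
        print(theta, cos4_sin4(theta), direct)
```

A numerical agreement at a few angles is of course no proof, but it is a quick guard against sign errors in the binomial expansion.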
A more significant application of this theorem occurs in the proof of the following result.
THEOREM 8-4.3. Let w be any nonzero complex number. Let n be a natural number. Then there are exactly n distinct complex numbers z satisfying $z^n = w$. If $w = |w| (\cos\theta + i \sin\theta)$, with $\theta$ = Arg w, then these numbers are given by $z = z_0$, $z = z_1$, . . . , $z = z_{n-1}$, where
$$z_j = \sqrt[n]{|w|} \left[\cos\left(\frac{\theta + j \cdot 360}{n}\right) + i \sin\left(\frac{\theta + j \cdot 360}{n}\right)\right]. \tag{8-16}$$
For example, if $w = 1 + i$ and n = 3, then $z_0 = \sqrt[6]{2}\,(\cos 15 + i \sin 15)$.
"O)
+ i sin (O
=
=
1 ' 360)])1
I w l [COS (e
I w l (COS O
Thus, for any nonnegative integer j, the formula (816) gives a solution o the equation zn = w. What ]ve must shom is that for j = 0,1, . . . , n  1, the numbers given by (816) are al1 ditrerent, and that for any z sueh that zn = w, there is somc j = 0, 1, . . . , n  1, such that z = zj. First observe that if O 5 j < n, then
841
POLAR REPIEESESTATIOS
+ j 360
n
+ j 360
+ j~ 360 # S + 360 n n
j2
Arg zj2.
Thercfore, zj, # zj,. This proves that the numbcrs 20, 21, . . . , ~ n  1 are al1 different. Assume next that z is a complex number which satisfies zn = w. )Ve want to prove that z is one of the numbers 20, 21, . . . , ~ ~  1 that is, /zl = and S j 360 Arg z = n
vm
for some j = 0, 1, . . . , n  1. I t follows from Theorem 82.4(e) (using mathematical induction) that zn = w implies lzln = i w l . Therefore, Izl = Let z = lzl (cos i sin +) be thc polar representation of z, with + = Arg z. Thcn by Theorem 84.2,
w.
++
Izln (cos n+
Therefore, since jzln = (w!, this equality implics cos n+ = cos 0, and sin n+ = sin 8. Using the fact that ttvo anglcs mhich have the same sine and cosine differ by an integral multiplc of 360, we obt,ain n+  S = j 360, where j E 2. Thereforc, S j 360
+= +
O 5 S
Xrg w
<
360,
and
308
[CHAP.
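Formula (8-16) for the n solutions of $z^n = w$ can be turned into a short routine. The following Python sketch is offered as an illustration under stated assumptions: angles are kept in degrees as in the text and converted to radians only for `math.cos` and `math.sin`, Arg w is obtained from `math.atan2`, and the name `nth_roots` is a hypothetical choice.

```python
import math


def nth_roots(w: complex, n: int) -> list:
    """Return the n solutions z_0, ..., z_{n-1} of z**n = w for nonzero w."""
    if w == 0 or n < 1:
        raise ValueError("w must be nonzero and n a natural number")
    modulus = abs(w) ** (1.0 / n)                      # n-th root of |w|
    theta = math.degrees(math.atan2(w.imag, w.real)) % 360.0   # Arg w
    roots = []
    for j in range(n):
        angle = math.radians((theta + j * 360.0) / n)  # argument of z_j
        roots.append(complex(modulus * math.cos(angle),
                             modulus * math.sin(angle)))
    return roots


if __name__ == "__main__":
    for z in nth_roots(1 + 1j, 3):
        print(z, z**3)      # each cube should be close to 1 + 1j
```

The arguments of the returned roots differ by 360/n degrees, so the roots are equally spaced on a circle of radius $\sqrt[n]{|w|}$ about the origin.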
1. Find the arguments of the following complex numbers. (e)  l + i (d) l + i 3 (e) 3  24
(a) 5
(b) i
2. Use Demoivre's theorem to obtain expressions for $\cos 5\theta$ and $\sin 5\theta$ in terms of $\cos\theta$ and $\sin\theta$.
3. Give the inductive proof of (8-15) in detail.
4. Show that Demoivre's theorem is valid for negative exponents; that is, $(\cos\theta + i \sin\theta)^{-n} = \cos(-n\theta) + i \sin(-n\theta)$ for all $n \in N$.
5. Use Theorems 8-4.3 and 8-2.5 to obtain expressions for $\cos(\theta/2)$ and $\sin(\theta/2)$ in terms of $\cos\theta$ and $\sin\theta$.
6. Using a table of sines and cosines, find all the solutions z of the following equations (giving the real and imaginary parts with four decimal accuracy). (a) $z^3 = 1 + i$ (b) $z^5 = 1$ (c) $z^4 = i$ (d) $z^6 = 1$ (e) $z^3 = 2 - i2$
rnl
+ lk + lZk + .+ + rzk+ + . +
{k+
<(nl)k <(nl)k
The solution of equations of higher degree is also considered, but usually only in the cases where the polynomials involved can be factored; for example,
These examples are all special cases of the general nth degree equation
THE THEORY OF ALGEBRAIC EQUATIONS
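make this quantity zero. For first- and second-degree equations, convenient formulas exist which give the solutions explicitly. The solution of the linear equation
$$a_0 + a_1x = 0, \qquad (a_1 \neq 0)$$
is given by the formula
$$x = -\frac{a_0}{a_1}.$$
The quadratic equation $a_0 + a_1x + a_2x^2 = 0$, $(a_2 \neq 0)$, has the solutions $x = \left(-a_1 \pm \sqrt{a_1^2 - 4a_0a_2}\right)/2a_2$.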
There is no reason to suppose that for equations of degree higher than two it will always be possible to find solutions among the complex numbers. We saw that in order to solve equations such as $x^2 - 2 = 0$ and $x^2 - 5 = 0$ it was necessary to extend the rational number system to the real number field; to solve $x^2 + 1 = 0$ it was necessary to go beyond the reals to the complex numbers. Is there any reason then to suppose that it is possible to solve such equations as
x3+32+1=0,
and
x5f(1+i)x3+2x2ie1=O
within the complex number field? Moreover, even if the general nth degree equation has solutions in the complex field C, can we expect to obtain explicit expressions for these solutions such as we have for the solutions of linear and quadratic equations? These two questions will be discussed more completely in Sections 9-8 and 9-9. In the theory of numbers, the study of algebraic congruences is almost as important as the study of algebraic equations. The problem of solving linear congruences, $ax + b \equiv 0 \pmod{m}$,
and
are more difficult to solve. Of course, integral solutions x of any congruence can be found by trial and error if they exist. However, this method is practical only for congruences of small modulus.
First observe that $x \equiv y \pmod 8$ implies that $x^4 + 3x^2 + 5x + 1 \equiv y^4 + 3y^2 + 5y + 1 \pmod 8$. Therefore, if x is a solution of $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod 8$, then so is y, and vice versa. This means that to solve the congruence $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod 8$, it is only necessary to find the numbers in the set {0, 1, 2, . . . , 7} which are solutions. Every solution is congruent to a solution which is in this set, and conversely, every number which is congruent to a solution x satisfying $0 \leq x < 8$ is itself a solution. Before starting the work of substituting each of these numbers into the given congruence, let us note that the theorems of Chapter 5 can be used to simplify our work. If x is even, then
$$x^4 + 3x^2 + 5x + 1 \equiv 1 \pmod 2.$$
Therefore, by Theorem 5-6.4(b), no even value of x can be a solution of $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod 8$. By Euler's Theorem, 5-8.6,
$$x^4 = x^{\varphi(8)} \equiv 1 \pmod 8,$$
provided that x is relatively prime to 8, that is, provided that x is odd. Therefore, for x = 1, 3, 5, and 7,
$$x^4 + 3x^2 + 5x + 1 \equiv 3x^2 + 5x + 2 \pmod 8.$$
Substituting x = 1, 3, 5, and 7 in $3x^2 + 5x + 2$ gives the values 10, 44, 102, and 184, which are congruent to 2, 4, 6, and 0 (mod 8), respectively. Hence x = 7 is the only solution in the set {0, 1, 2, . . . , 7}, and the solutions of the congruence are exactly the integers $x \equiv 7 \pmod 8$.
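The trial-and-error method described above is easy to mechanize for a small modulus. The Python sketch below is an illustration only: the function name `solve_congruence` and the convention that `coefficients[k]` is the coefficient of $x^k$ are assumptions made here, and the sample polynomial $x^4 + 3x^2 + 5x + 1$ is the one read from the example in the text.

```python
def solve_congruence(coefficients, modulus):
    """Return all x with 0 <= x < modulus satisfying the polynomial congruence.

    coefficients[k] is taken to be the coefficient of x**k.
    """
    solutions = []
    for x in range(modulus):
        # Reduce each power modulo m before multiplying by its coefficient.
        value = sum(c * pow(x, k, modulus)
                    for k, c in enumerate(coefficients)) % modulus
        if value == 0:
            solutions.append(x)
    return solutions


if __name__ == "__main__":
    # x**4 + 3*x**2 + 5*x + 1 = 0 (mod 8)
    print(solve_congruence([1, 5, 3, 0, 1], 8))   # prints [7]
```

Because every integer is congruent to exactly one residue in the range searched, the list returned describes all integral solutions; the search takes m evaluations, which is why the text calls the method practical only for small moduli.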
The problem of solving congruences with a prime modulus p is equivalent to finding the solution of equations
$$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n = 0,$$
where $a_0, a_1, \ldots,$ and $a_n$ are elements of the field $Z_p$, which was discussed in Example 1, Section 6-3. This fact is important because it enables us to apply the theory of algebraic equations to obtain theorems about congruences. Such an application will be given in Section 9-7.
1. Find the real values of x ~vhich are solutions of the following equations. (a) (b) (c) (d) (e)
2. Find the real and complex values of x which are solutions of the following equations. 1= O (a) x4 (b) x 3 + x 2 + x + 1 = O (c) x 1 + 5 x 5 + 4 = o (d) x2" 2xn  1 = O (e) x6  3x4 3x2  1 = O
(a) has two (different) real solutions if b2 > 4ac, (b) has one real solution if b2 = 4ac, and (c) has two complex conjugate solutions if b2 < 4ac.
4. Find al1 integers x which are solutions of the following congruences. (a) x2  2x 1 O (mod 2 ) (b) x46 l 7x32 8x17 5x16 2x9 4x3 32 1 = O (mod 3 ) (c) x6  1 = O (mod 7) (d) x1 63x4 x = O (mod 9) (e) 2x25 57x 1 = O (mod 30)
+ + + +
+ + + +
9-2 Polynomials. The theory of equations is based on the algebra of polynomials. From elementary algebra, the reader is familiar with the procedures for adding, subtracting, multiplying, and factoring polynomials. All of the operational rules which were given in Section 4-2 as the postulates for a ring are used in manipulating polynomials. In order to justify the use of these rules, it is necessary to examine the concept of a polynomial more critically than is customary in elementary algebra courses. This is particularly imperative for our purposes, because we want to develop the theory so that it can be applied to equations in the fields $Z_p$, as well as the fields of complex and real numbers. Our plan in this section is to first review the intuitive definitions concerning polynomials and their operations. Then we will examine these notions more critically, and see how they can be put on a sound basis. The reader who is not interested in the formal development of polynomials may omit the last part of this section. Let D be an integral domain. A polynomial in x with coefficients in D, is (tentatively) defined to be a formal expression
$$a_0x^0 + a_1x^1 + a_2x^2 + \cdots + a_nx^n, \tag{9-1}$$
where $a_0, a_1, a_2, \ldots,$ and $a_n$ are elements in D. For the present the symbols $x^0, x^1, \ldots, x^n$ and the plus signs in (9-1) are to be thought of as nothing more than punctuation marks which separate $a_0, a_1, \ldots,$ and $a_n$. The notation $a_0x^0 + a_1x^1 + \cdots + a_nx^n$ is adopted because this expression will ultimately be interpreted as a sum of products. For $0 \leq i \leq n$, the expressions $a_ix^i$ are called terms of the polynomial. The elements $a_0, a_1, a_2, \ldots, a_n$ of D are called the coefficients of $x^0, x^1, x^2, \ldots,$ and $x^n$, respectively, in this polynomial.
DEFINITION 9-2.1. Two polynomials in x with coefficients in D are equal if they have exactly the same terms, except for terms with zero coefficients. That is,
$$a_0x^0 + a_1x^1 + \cdots + a_nx^n = b_0x^0 + b_1x^1 + \cdots + b_mx^m$$
if $a_0 = b_0$, $a_1 = b_1$, $a_2 = b_2$, . . . , and $a_j = 0$ for $m < j \leq n$ if $m < n$, or $b_j = 0$ for $n < j \leq m$ if $n < m$. For example,
In writing a polynomial, it is customary (a) to omit terms with coefficient 0, (b) to write $a_0$ instead of $a_0x^0$, (c) to write x instead of $x^1$, (d) to write $x^j$ instead of $1x^j$ for $j > 0$, and (e) to write $-a_jx^j$ instead of $+(-a_j)x^j$. For instance, instead of
+ 22 + x3
5x4.
We will later see that these conventions are entirely justified. It is also a common practice to use expressions such as a(x), b(x), c(x), f(x), g(x), and p(x) to represent polynomials. It follows from Definition 9-2.1 that any two polynomials can be written with the same number of terms. For example, suppose that $a_0x^0 + a_1x^1 + \cdots + a_mx^m$ and $b_0x^0 + b_1x^1 + \cdots + b_nx^n$ are polynomials with $m < n$. Then we can write
$$a_0x^0 + a_1x^1 + \cdots + a_mx^m = a_0x^0 + a_1x^1 + \cdots + a_mx^m + 0x^{m+1} + \cdots + 0x^n.$$
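This observation shows that the following definition of addition of two polynomials is completely general.
DEFINITION 9-2.2. Let $a(x) = a_0x^0 + a_1x^1 + \cdots + a_nx^n$ and $b(x) = b_0x^0 + b_1x^1 + \cdots + b_nx^n$ be polynomials in x with coefficients in D. The sum of a(x) and b(x) is the polynomial
$$a(x) + b(x) = (a_0 + b_0)x^0 + (a_1 + b_1)x^1 + \cdots + (a_n + b_n)x^n.$$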
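As an example, let $a(x) = 2 + x - x^2$ and $b(x) = 3 + x^2 + 2x^3$. Then $a(x) + b(x) = 5 + x + 2x^3$. Indeed, we can consider $2 + x - x^2$ as an abbreviation for $2x^0 + 1x^1 + (-1)x^2 + 0x^3$, and $3 + x^2 + 2x^3$ as an abbreviation of $3x^0 + 0x^1 + 1x^2 + 2x^3$; therefore, by Definition 9-2.2,
$$a(x) + b(x) = (2 + 3)x^0 + (1 + 0)x^1 + (-1 + 1)x^2 + (0 + 2)x^3 = 5x^0 + 1x^1 + 0x^2 + 2x^3 = 5 + x + 2x^3.$$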
In elementary algebra courses, the process of multiplying two polynomials is usually carried out in several steps. First, all combinations of two terms, one from each polynomial, are multiplied. Then the rule of exponents is applied to the powers of x, and finally the coefficients of equal powers of x are collected. The whole procedure can be carried out, using
It is not convenient to use this description of the process of multiplying polynomials as the definition of multiplication, but the end product of the method can be described in general terms and provides a satisfactory definition.
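DEFINITION 9-2.3. Let $a(x) = a_0x^0 + a_1x^1 + \cdots + a_mx^m$ and $b(x) = b_0x^0 + b_1x^1 + \cdots + b_nx^n$ be polynomials in x with coefficients in D. The product of a(x) and b(x) is the polynomial
$$a(x) \cdot b(x) = c_0x^0 + c_1x^1 + \cdots + c_{m+n}x^{m+n}, \qquad c_i = a_0b_i + a_1b_{i-1} + \cdots + a_{i-1}b_1 + a_ib_0,$$
where $a_j = 0$ if $j > m$, and $b_k = 0$ if $k > n$.
DEFINITION 9-2.4. Let $a(x) = a_0x^0 + a_1x^1 + \cdots + a_nx^n$ be a polynomial in x with coefficients in D. The negative of a(x) is the polynomial
$$-a(x) = (-a_0)x^0 + (-a_1)x^1 + \cdots + (-a_n)x^n.$$
In this definition, the coefficients $-a_0, -a_1, -a_2, \ldots, -a_n$ are the negatives of the elements $a_0, a_1, a_2, \ldots, a_n$ in the integral domain D.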
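The definitions of sum, product, and negative act only on the coefficients, so they can be sketched with lists. In the following Python illustration a polynomial $a_0x^0 + a_1x^1 + \cdots + a_nx^n$ is represented by the list `[a0, a1, ..., an]`; this representation, the function names, and the use of Python integers for the integral domain D are all assumptions made for the sake of the example.

```python
def poly_add(a, b):
    """Sum of two polynomials: pad with zero terms, then add coefficientwise."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Product: the coefficient of x**i is a0*b_i + a1*b_(i-1) + ... + a_i*b0."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c


def poly_neg(a):
    """Negative: replace every coefficient by its negative."""
    return [-x for x in a]


if __name__ == "__main__":
    a = [2, 1, -1]        # 2 + x - x^2
    b = [3, 0, 1, 2]      # 3 + x^2 + 2x^3
    print(poly_add(a, b))         # coefficients of 5 + x + 2x^3
    print(poly_mul(a, b), poly_mul(b, a))
```

The padding step in `poly_add` corresponds to writing both polynomials with the same number of terms, and the equality of the two products printed last illustrates the commutativity of multiplication.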
THEOREM 9-2.5. Let D[x] be the set of all polynomials in x with coefficients in an integral domain D. Define equality, addition, multiplication, and negation in D[x] by Definitions 9-2.1, 9-2.2, 9-2.3, and 9-2.4, respectively. Then D[x] is an integral domain.
Proof. In order to prove that D[x] is a commutative ring with $0x^0$ as its zero and $1x^0$ as its identity, it is necessary to verify such identities as the associative, commutative, and distributive laws. This is a rather tedious job which we will leave to the reader. It should be remarked that the proofs of these laws use the fact that addition, multiplication, and negation in D satisfy the postulates for a commutative ring. For example, if $a(x) = a_0x^0 + a_1x^1 + \cdots + a_mx^m$ and $b(x) = b_0x^0 + b_1x^1 + \cdots + b_nx^n$ are in D[x], then by Definition 9-2.3, the coefficient of $x^i$ in the product $a(x) \cdot b(x)$ is
$$a_0b_i + a_1b_{i-1} + \cdots + a_{i-1}b_1 + a_ib_0.$$
Again using Definition 9-2.3, the coefficient of $x^i$ in the product $b(x) \cdot a(x)$ is
$$b_0a_i + b_1a_{i-1} + \cdots + b_{i-1}a_1 + b_ia_0.$$
Since D is a commutative ring, it follows that these two expressions are equal for every i. Therefore, by Definition 9-2.1, $a(x) \cdot b(x) = b(x) \cdot a(x)$. That is, multiplication is commutative in D[x]. In order to prove that D[x] is an integral domain, it suffices by Theorem 4-4.5 to show that if a(x) and b(x) are nonzero polynomials, then $a(x) \cdot b(x)$ is not the zero polynomial $0x^0$. Since a(x) and b(x) are not zero, it is possible to write
$$a(x) = a_0x^0 + a_1x^1 + \cdots + a_mx^m, \qquad b(x) = b_0x^0 + b_1x^1 + \cdots + b_nx^n,$$
where $a_m \neq 0$ and $b_n \neq 0$ in D. By Definition 9-2.3, the coefficient of $x^{m+n}$ in $a(x) \cdot b(x)$ is $a_mb_n$. Since D is an integral domain, $a_mb_n \neq 0$, by Theorem 4-4.5. Therefore, $a(x) \cdot b(x) \neq 0x^0$. This proves the theorem. It is time to justify the notation $a_0x^0 + a_1x^1 + a_2x^2 + \cdots + a_nx^n$ for polynomials. It is clear from Definitions 9-2.2 and 9-2.3 that
$$a_0x^0 + a_1x^1 + \cdots + a_nx^n = (a_0x^0) + (a_1x^0) \cdot (0x^0 + 1x^1) + (a_2x^0) \cdot (0x^0 + 1x^1)^2 + \cdots + (a_nx^0) \cdot (0x^0 + 1x^1)^n,$$
where the right-hand side of this equality is no longer a formal expression, but is an actual sum of products of polynomials. This observation suggests
that we should use the symbol x to denote the polynomial $0x^0 + 1x^1$, and each element $a \in D$ should be identified with the polynomial $ax^0$. This last identification can be easily justified. Indeed, by Definitions 9-2.2, 9-2.3, and 9-2.4,
$$ax^0 + bx^0 = (a + b)x^0, \qquad (ax^0) \cdot (bx^0) = (a \cdot b)x^0, \qquad -(ax^0) = (-a)x^0,$$
so that the correspondence $a \leftrightarrow ax^0$ is an isomorphism. With these identifications, it follows that $a(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ is actually a sum of the products
$$a_i \cdot \underbrace{x \cdot x \cdot \cdots \cdot x}_{i \text{ factors}}$$
in the integral domain D[x]. A number of useful consequences follow from this observation. For example, we can rearrange the terms in a polynomial in any way which might be convenient. In particular, the polynomial $a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + a_nx^n$ can be written in "descending powers" of x, that is, in the form $a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$. It is customary to denote the integral domain of all polynomials in x with coefficients in D, as we have done in Theorem 9-2.5, by D[x]. The identification of x with $0x^0 + 1x^1$ and of each $a \in D$ with $ax^0$ will always be made. The polynomial x is often called an indeterminate, and D[x] is referred to as the domain of polynomials in the indeterminate x with coefficients in D. The elements of D, when regarded as elements of D[x], are called constant polynomials. The term $a_0$ in $a(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ is called the constant term of a(x). The zero and identity of D are also the zero and identity of D[x]. Let us now examine our definition of polynomials more critically. There are two weak points in the construction of D[x] which we have given. First, the idea of a "formal expression" is vague. Second, Definition 9-2.1, for "equality" of polynomials, needs to be clarified. It is possible to give a definition of polynomials and their operations which avoids both of these problems. However, some discussion is needed to see that this definition is reasonable. The definition of equality given above implies that a polynomial can be expressed using any number of terms with zero coefficients. For instance,
in which the coefficients are zero from some point on. This viewpoint has the advantage that there is no ambiguity about the number of terms in a polynomial. Two polynomials $\sum_{n=0}^{\infty} a_nx^n$ and $\sum_{n=0}^{\infty} b_nx^n$ are equal when $a_n = b_n$ for every $n$. The problem of giving an exact meaning to the formal expressions $\sum_{n=0}^{\infty} a_nx^n$ can be avoided. It is evident that a polynomial is completely determined by the sequence $(a_0, a_1, a_2, \ldots, a_n)$ of its coefficients. Thus, if we want concrete mathematical objects for our polynomials, we can take them to be the sequences of elements in $D$ which up to now have been thought of as the coefficients of the "powers" of $x$. For the reasons explained above, it is advantageous to let all of these sequences be infinite, but of course zero from some point on.
These remarks motivate the following construction of an integral domain whose elements are definite objects, and which has the same algebraic properties as the ring of all polynomials in $x$ with coefficients in $D$. Let $A$ be the set of all infinite sequences
$$(a_0, a_1, a_2, \ldots)$$
of elements from the integral domain $D$, such that $a_k = 0$ for all except finitely many values of $k$. That is, the sequences which belong to $A$ are those which are of the form
$$(a_0, a_1, a_2, \ldots, a_n, 0, 0, \ldots).$$
Two such sequences are equal if they contain exactly the same elements of $D$ in the same order. The operations of addition, multiplication, and negation in $A$ are defined by the rules
$$(a_0, a_1, a_2, a_3, \ldots) + (b_0, b_1, b_2, b_3, \ldots) = (a_0 + b_0,\ a_1 + b_1,\ a_2 + b_2,\ a_3 + b_3, \ldots),$$
$$(a_0, a_1, a_2, a_3, \ldots)\cdot(b_0, b_1, b_2, b_3, \ldots) = (a_0b_0,\ a_0b_1 + a_1b_0,\ a_0b_2 + a_1b_1 + a_2b_0, \ldots), \qquad (92)$$
and
$$-(a_0, a_1, a_2, a_3, \ldots) = (-a_0, -a_1, -a_2, -a_3, \ldots).$$
The sums $a_i + b_i$, the sums of products $a_0b_i + a_1b_{i-1} + \cdots + a_{i-1}b_1 + a_ib_0$, and the negatives $-a_i$ are formed in the integral domain $D$. It is easy to see that the set $A$ is closed under the three operations of (92), that is, if the sequences $(a_0, a_1, a_2, a_3, \ldots)$ and $(b_0, b_1, b_2, b_3, \ldots)$ have only a finite number of nonzero elements, then the same is true of the sum, product, and negatives of these sequences. For example, if $a_j = 0$ for all $j > m$ and $b_k = 0$ for all $k > n$, then for $l > m + n$,
$$a_0b_l + a_1b_{l-1} + \cdots + a_{l-1}b_1 + a_lb_0 = 0,$$
since if $j + k = l > m + n$, then either $j > m$, or $k > n$. Consequently, all terms are zero after the $(m + n + 1)$st in the sequence
$$(a_0b_0,\ a_0b_1 + a_1b_0,\ a_0b_2 + a_1b_1 + a_2b_0, \ldots),$$
which is the product of $(a_0, a_1, a_2, a_3, \ldots)$ and $(b_0, b_1, b_2, b_3, \ldots)$.
A few straightforward calculations show that $A$ is an integral domain whose zero is $(0, 0, 0, 0, \ldots)$, and whose identity is $(1, 0, 0, 0, \ldots)$. Moreover, the correspondence
$$a \leftrightarrow (a, 0, 0, 0, \ldots)$$
is an isomorphism between $D$ and the subring of $A$ consisting of all sequences of the form $(a, 0, 0, 0, \ldots)$. As usual, we identify $D$ with this subring, and write $a$ instead of $(a, 0, 0, 0, \ldots)$. It follows from (92) that the element $x = (0, 1, 0, 0, \ldots)$ satisfies $x^2 = (0, 0, 1, 0, \ldots)$, $x^3 = (0, 0, 0, 1, \ldots)$, etc., and
$$ax = (a, 0, 0, 0, \ldots)\cdot(0, 1, 0, 0, \ldots) = (0, a, 0, 0, \ldots),$$
$$ax^2 = (a, 0, 0, 0, \ldots)\cdot(0, 0, 1, 0, \ldots) = (0, 0, a, 0, \ldots),$$
etc. Consequently,
$$(a_0, a_1, a_2, \ldots, a_n, 0, 0, \ldots) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n.$$
In other words, the elements of $A$ can be expressed in the same way as the polynomials which we have been thinking of as "formal expressions." It is easy to see that the correspondence between polynomials and the elements of $A$ is a ring isomorphism between $D[x]$ and $A$.
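The construction of $A$ can be tried out directly on a computer. The following Python sketch is an illustration added for the reader, not part of the formal development. It assumes that $D = Z$, stores only the entries of a sequence up to its last nonzero term, and uses the hypothetical helper names `trim`, `add`, `neg`, and `mul` for the three operations of (92).

```python
from itertools import zip_longest

def trim(seq):
    """Drop trailing zeros so that each element of A has one finite form."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

def add(a, b):
    # (a0 + b0, a1 + b1, a2 + b2, ...)
    return trim(s + t for s, t in zip_longest(a, b, fillvalue=0))

def neg(a):
    # (-a0, -a1, -a2, ...)
    return trim(-s for s in a)

def mul(a, b):
    # entry i of the product is a0*b_i + a1*b_(i-1) + ... + a_i*b0
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return trim(out)

x = [0, 1]                    # the sequence (0, 1, 0, 0, ...)
x_squared = mul(x, x)         # (0, 0, 1, 0, ...)
five_x_squared = mul([5], x_squared)   # (0, 0, 5, 0, ...)
```

Here the empty list plays the role of the zero sequence $(0, 0, 0, \ldots)$, and `[1]` plays the role of the identity.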
1. Use Definition 92.2 to find the sums of the following pairs of polynomials with coefficients in $Z$.
(a) OxO 7 x 1 (3)x2 1x3, 5x0 6x1 (3)x2 (b) 1 7x4  x7, $3 5x5 (c) ixO oxl ox2 . ox24 ixO oxl ox2 . . 0 ~ 2 1 ~ ~ 2 ~ .
2. Use Definition 92.3 to find the products of the pairs of polynomials listed in Problem 1.
Show that this is the expression which is obtained by multiplying all combinations of terms from each factor and collecting the coefficients of equal powers of $x$.
4. Prove that addition is commutative and associative in $D[x]$. Show that $a(x) + 0x^0 = a(x)$ for $a(x) \in D[x]$. Prove that $a(x) + [-a(x)] = 0x^0$ for $a(x) \in D[x]$.
5. Prove that multiplication is distributive with respect to addition in $D[x]$. Show that $1x^0$ is the identity of $D[x]$.
6. Prove the following properties of multiplication in $D[x]$.
(a) $(a_0x^0 + a_1x^1 + a_2x^2 + \cdots + a_nx^n)\cdot(cx^i) = a_0cx^i + a_1cx^{i+1} + a_2cx^{i+2} + \cdots + a_ncx^{i+n}$.
(b) $(ax^i)\cdot[b(x)\cdot(cx^i)] = [(ax^i)\cdot b(x)]\cdot(cx^i)$ for $b(x) \in D[x]$ and $a \in D$, $c \in D$.
(c) Use the distributive law and (b) to prove that $a(x)\cdot[b(x)\cdot(cx^i)] = [a(x)\cdot b(x)]\cdot(cx^i)$ for $a(x), b(x) \in D[x]$ and $c \in D$.
(d) Use the distributive law and (c) to prove that multiplication is associative in $D[x]$.
93 The division algorithm for polynomials. In this and the following two sections, we will investigate the arithmetic of polynomial rings. It will be seen that the theory of the rings $F[x]$ of all polynomials in $x$ with coefficients in a field $F$ is remarkably similar to the theory of the ring $Z$ of all integers. The reader is advised to compare the results in Sections 93, 94, and 95 with the theorems about the integers which were proved in Sections 51, 52, and 53.
DEFINITION 93.1. Let $a(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ be a polynomial with coefficients in an integral domain $D$. Suppose that $a(x)$ is not the zero polynomial. The degree of $a(x)$ is the largest $m \geq 0$ such that $a_m \neq 0$. The coefficient $a_m$ is called the leading coefficient of $a(x)$.
The degree of any nonzero polynomial is a nonnegative integer. For example, the degree of $3 + (-1)x + 3x^2 - 4x^3$ is three; the degree of $2$ is zero; the degree of $3 + x$ is one.
The polynomials of degree zero are exactly the nonzero constant polynomials; the polynomials of degree one are the polynomials of the form $a + bx$ with $b \neq 0$, etc. No degree is assigned to the zero polynomial. It is convenient to denote the degree of a nonzero polynomial $a(x)$ by
$$\mathrm{Deg}\,[a(x)].$$
For instance,
0x4] = 3,
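It is obvious from (91) that if $\mathrm{Deg}\,[a(x)] = n$, then it is possible to write
$$a(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0, \quad \text{with } a_n \neq 0.$$
Conversely, if $a(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0$, with $a_n \neq 0$, then $\mathrm{Deg}\,[a(x)] = n$.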
These two observations are often useful.
THEOREM 93.2. Let $a(x)$ and $b(x)$ be nonzero polynomials in $D[x]$, where $D$ is any integral domain. Then
(a) $\mathrm{Deg}\,[a(x)\cdot b(x)] = \mathrm{Deg}\,[a(x)] + \mathrm{Deg}\,[b(x)]$;
(b) if $a(x) + b(x) \neq 0$, then $\mathrm{Deg}\,[a(x) + b(x)] \leq \max\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}$;
(c) if $\mathrm{Deg}\,[a(x)] \neq \mathrm{Deg}\,[b(x)]$, then $\mathrm{Deg}\,[a(x) + b(x)] = \max\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}$.
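The three statements of Theorem 93.2 can be checked on small examples. The Python sketch below is only an illustration. It assumes $D = Z$, represents a polynomial by the list of its coefficients in ascending powers of $x$, and the names `deg`, `add`, and `mul` are hypothetical helpers introduced for this purpose.

```python
from itertools import zip_longest

def trim(p):
    """Remove trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of a nonzero polynomial; the zero polynomial has no degree."""
    p = trim(p)
    if not p:
        raise ValueError("no degree is assigned to the zero polynomial")
    return len(p) - 1

def add(a, b):
    return trim(s + t for s, t in zip_longest(a, b, fillvalue=0))

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return trim(out)

a = [1, 0, 2]     # 1 + 2x^2
b = [3, -1]       # 3 - x
c = [0, 0, -2]    # -2x^2, same degree as a

product_degree = deg(mul(a, b))   # part (a): equals deg(a) + deg(b)
unequal_sum = deg(add(a, b))      # part (c): equals max(deg(a), deg(b))
equal_sum = deg(add(a, c))        # part (b): leading terms cancel
```

The last line shows why (b) is only an inequality: when the degrees agree, the leading coefficients may cancel and the degree of the sum may drop.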
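Proof. Let
$$\mathrm{Deg}\,[a(x)] = n \quad \text{and} \quad \mathrm{Deg}\,[b(x)] = m.$$
Then $a(x) = a_nx^n + \cdots + a_0$ and $b(x) = b_mx^m + \cdots + b_0$, with $a_n \neq 0$ and $b_m \neq 0$, and
$$a(x)\cdot b(x) = a_nb_mx^{n+m} + (a_nb_{m-1} + a_{n-1}b_m)x^{n+m-1} + \cdots + a_0b_0.$$
Since $D$ is an integral domain, $a_nb_m \neq 0$. Therefore, $\mathrm{Deg}\,[a(x)\cdot b(x)] = m + n = \mathrm{Deg}\,[a(x)] + \mathrm{Deg}\,[b(x)]$.
To prove (b) and (c), suppose first that $n > m$. Then
$$a(x) + b(x) = a_nx^n + \cdots + a_{m+1}x^{m+1} + (a_m + b_m)x^m + \cdots + (a_0 + b_0),$$
and the coefficient of $x^n$ is $a_n \neq 0$. Thus, $\mathrm{Deg}\,[a(x) + b(x)] = n = \max\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}$. By a similar argument, if $n < m$, then $\mathrm{Deg}\,[a(x) + b(x)] = m = \max\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}$. This proves (c), and also (b) except in the case that $n = m$. When $n = m$,
$$a(x) + b(x) = (a_n + b_n)x^n + (a_{n-1} + b_{n-1})x^{n-1} + \cdots + (a_0 + b_0).$$
If $a(x) + b(x) \neq 0$, then $a_k + b_k \neq 0$ for some $k$ not exceeding $n$. The degree of $a(x) + b(x)$ is the largest such $k$. Clearly, $k \leq n = \max\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}$.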
Except for certain special results concerning $Z[x]$, we will restrict our discussion to the integral domains $F[x]$, where $F$ is a field. Since every field is an integral domain, the definitions and results which have already been given in Sections 92 and 93 apply to $F[x]$. The fields which particularly concern us are $C$, $R$, $Q$, and $Z_p$.
The degree of a polynomial is used in the study of the arithmetic of polynomials in much the same way as the absolute value is used in the study of $Z$. The principle of mathematical induction is applied to the study of $Z$ by means of the absolute value of an integer. Similarly, it is through the degree of a polynomial that induction can be used in $F[x]$. The division algorithm for polynomials is our first example of a theorem about polynomials which is proved by induction on degrees.
THEOREM 93.3. The division algorithm in $F[x]$. Let $a(x)$ and $b(x)$ be polynomials in $F[x]$, where $F$ is a field. Suppose that $b(x) \neq 0$. Then there exist unique polynomials $q(x)$ and $r(x)$ in $F[x]$ such that
$$a(x) = q(x)\cdot b(x) + r(x),$$
and either $r(x) = 0$, or else $\mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)]$.
This fundamental result is a statement of the process of long division for polynomials. The reader is probably familiar with the mechanics of this process, without having thought about its formal statement and proof.
EXAMPLE 1. Let $a(x) = x^3 + 2x + 3$ and $b(x) = 2x^2 - 3x + 1$. We will think of these polynomials as elements of $Q[x]$, even though they have integral coefficients. To find the polynomials $q(x)$ and $r(x)$ whose existence is guaranteed by Theorem 93.3, the familiar long division process will be used:
Therefore,
The proof of the division algorithm is based on an induction in the form of the well-ordering principle. It is convenient to prove a result which plays the role of the induction step in the proof of Theorem 93.3.
(93.4). Suppose that $b(x)$ and $c(x)$ are nonzero polynomials in $F[x]$, such that $\mathrm{Deg}\,[b(x)] \leq \mathrm{Deg}\,[c(x)]$. Then there is a polynomial $f(x)$ such that either $c(x) = f(x)\cdot b(x)$, or else
$$\mathrm{Deg}\,[c(x) - f(x)\cdot b(x)] < \mathrm{Deg}\,[c(x)].$$
Proof. Let $b(x) = b_nx^n + b_{n-1}x^{n-1} + \cdots + b_0$, and $c(x) = c_mx^m + c_{m-1}x^{m-1} + \cdots + c_0$, where $b_n \neq 0$, $c_m \neq 0$, and $n \leq m$. Define $f(x) = (c_m\cdot b_n^{-1})\cdot x^{m-n}$. Then
$$c(x) - f(x)\cdot b(x) = (c_{m-1} - c_mb_n^{-1}b_{n-1})x^{m-1} + \cdots,$$
because the terms in $x^m$ cancel. Thus, if $c(x) - f(x)\cdot b(x) \neq 0$, then the degree of this polynomial is less than $m$, the degree of $c(x)$. This is exactly what had to be shown for the proof of (93.4).
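Repeating the cancellation step of (93.4) until the degree drops below that of $b(x)$ is exactly the long division process. The following Python sketch illustrates this under stated assumptions: the field is taken to be $Q$ (via the standard `fractions` module), a polynomial is a list of coefficients in ascending powers of $x$, and `poly_divmod` is a hypothetical name chosen here.

```python
from fractions import Fraction

def trim(p):
    """Convert to rational coefficients and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Return (q, r) with a = q*b + r and r = 0 or Deg r < Deg b."""
    r, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("b(x) must not be the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while r and len(r) >= len(b):
        shift = len(r) - len(b)       # the exponent m - n of (93.4)
        coeff = r[-1] / b[-1]         # c_m * b_n^(-1), needs a field
        q[shift] = coeff
        for i, bi in enumerate(b):    # subtract f(x)*b(x)
            r[i + shift] -= coeff * bi
        r = trim(r)                   # the leading term has cancelled
    return trim(q), r

# Divide x^4 + x + 1 by 2x^2 - 1 in Q[x].
q, r = poly_divmod([1, 1, 0, 0, 1], [-1, 0, 2])
```

The division by the leading coefficient `b[-1]` is the one place where the field property is used, which is why the quotient here has the fractional coefficients $\tfrac12$ and $\tfrac14$ although both given polynomials have integral coefficients.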
We will now prove Theorem 93.3. We first prove the existence of polynomials $q(x)$ and $r(x)$ with the required properties. If there is a polynomial $q(x)$ such that $a(x) = q(x)\cdot b(x)$, then $r(x)$ can be taken to be $0$. Therefore, suppose that $a(x) \neq g(x)\cdot b(x)$ for all $g(x) \in F[x]$. Then $\mathrm{Deg}\,[a(x) - g(x)\cdot b(x)]$ is defined for all polynomials $g(x) \in F[x]$. Consequently,
$$\{\mathrm{Deg}\,[a(x) - g(x)\cdot b(x)] \mid g(x) \in F[x]\}$$
is a nonempty set of nonnegative integers. By the well-ordering principle, this set contains a smallest integer $k$. That is, there is a polynomial $q(x) \in F[x]$ such that $\mathrm{Deg}\,[a(x) - q(x)\cdot b(x)] = k \leq \mathrm{Deg}\,[a(x) - g(x)\cdot b(x)]$ for all $g(x) \in F[x]$. If $k \geq \mathrm{Deg}\,[b(x)]$, then by (93.4), there is a polynomial $f(x) \in F[x]$ such that either
$$a(x) - q(x)\cdot b(x) = f(x)\cdot b(x), \quad \text{or} \quad \mathrm{Deg}\,[a(x) - q(x)\cdot b(x) - f(x)\cdot b(x)] < k.$$
In the first case, $a(x) = [q(x) + f(x)]\cdot b(x)$, which is contrary to the assumption that $a(x) \neq g(x)\cdot b(x)$ for all $g(x) \in F[x]$. In the second case, $\mathrm{Deg}\,[a(x) - [q(x) + f(x)]\cdot b(x)] < k$, which contradicts $k \leq \mathrm{Deg}\,[a(x) - g(x)\cdot b(x)]$ for all $g(x) \in F[x]$. The only alternative to these contradictions is that $k$ is less than the degree of $b(x)$. Thus, if we call $r(x) = a(x) - q(x)\cdot b(x)$, it follows that $a(x) = q(x)\cdot b(x) + r(x)$, and $\mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)]$.
It remains to show that the polynomials $q(x)$ and $r(x)$ satisfying the conditions of Theorem 93.3 are unique. Suppose that
$$a(x) = q_1(x)\cdot b(x) + r_1(x) \quad \text{and} \quad a(x) = q_2(x)\cdot b(x) + r_2(x),$$
where $r_1(x)$ and $r_2(x)$ are either zero, or else they have degree less than $\mathrm{Deg}\,[b(x)]$. By subtracting these expressions, we obtain
$$[q_1(x) - q_2(x)]\cdot b(x) = r_2(x) - r_1(x).$$
If $r_2(x) - r_1(x) \neq 0$, then by Theorem 93.2,
$$\mathrm{Deg}\,[q_1(x) - q_2(x)] + \mathrm{Deg}\,[b(x)] = \mathrm{Deg}\,[r_2(x) - r_1(x)] < \mathrm{Deg}\,[b(x)].$$
This is impossible, because $\mathrm{Deg}\,[q_1(x) - q_2(x)] \geq 0$. Therefore, $r_2(x) - r_1(x) = 0$, and since $[q_1(x) - q_2(x)]\cdot b(x) = r_2(x) - r_1(x) = 0$ and $b(x) \neq 0$, we also have $q_1(x) - q_2(x) = 0$. This completes the proof of the uniqueness of $q(x)$ and $r(x)$.
The polynomials $q(x)$ and $r(x)$ in the expression
$$a(x) = q(x)\cdot b(x) + r(x)$$
given by the division algorithm are called, respectively, the quotient and remainder on dividing $a(x)$ by $b(x)$.
The division algorithm for polynomials can be generalized in a way which is analogous to the way that Theorem 51.3 generalizes the division algorithm for integers. We limit ourselves to stating a special case of this generalization.
THEOREM 93.5. Let $c \in F$, where $F$ is a field. Then every nonzero polynomial $f(x)$ can be uniquely represented in the form
$$f(x) = a_n(x - c)^n + a_{n-1}(x - c)^{n-1} + \cdots + a_1(x - c) + a_0,$$
where $a_n, a_{n-1}, \ldots, a_0$ are elements of $F$.
This theorem can be proved from Theorem 93.3 by induction on $\mathrm{Deg}\,[f(x)]$ in the same way that Theorem 51.3 is obtained from Theorem 51.1. We omit the proof.
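The induction suggested above is also a practical recipe: the remainder on dividing $f(x)$ by $x - c$ is $a_0$, the remainder on dividing the quotient by $x - c$ is $a_1$, and so on. The Python sketch below illustrates this for $F = Q$. The function names are hypothetical, the division by $x - c$ is carried out by the usual synthetic division scheme, and the input list is assumed to have a nonzero last entry.

```python
from fractions import Fraction

def divide_by_linear(f, c):
    """Quotient and remainder of f(x) on division by x - c.

    f lists the coefficients in ascending powers of x and is nonempty.
    """
    q = [Fraction(0)] * (len(f) - 1)
    carry = Fraction(0)
    for i in range(len(f) - 1, 0, -1):
        carry = f[i] + carry * c      # next quotient coefficient
        q[i - 1] = carry
    return q, f[0] + carry * c        # the remainder is a constant

def powers_of_x_minus_c(f, c):
    """Return [a0, a1, ..., an] with f(x) = sum of a_k * (x - c)^k."""
    f = [Fraction(a) for a in f]
    c = Fraction(c)
    coeffs = []
    while f:
        f, remainder = divide_by_linear(f, c)
        coeffs.append(remainder)
    return coeffs

# x^2 = (x - 1)^2 + 2(x - 1) + 1
expansion = powers_of_x_minus_c([0, 0, 1], 1)
```

Each pass through the loop lowers the degree of the quotient by one, so the process stops after $\mathrm{Deg}\,[f(x)] + 1$ divisions.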
1. Use the division algorithm to find the quotient and remainder on dividing $a(x)$ by $b(x)$ for the following pairs of polynomials.
(a) $a(x) = 2x^3 - 3x^2 + x - 1$ and $b(x) = x^2 + 2$ are in $Q[x]$
(b) $a(x) = x^2 + 2$ and $b(x) = 2x^3 - 3x^2 + x - 1$ are in $Q[x]$
(c) $a(x) = x^7 + 4x$ and $b(x) = x - 1$ are in $Q[x]$
(d) $a(x) = x^2 + \sqrt{2}\,x - 1$ and $b(x) = x - (\sqrt{6} - \sqrt{2})/2$ are in $R[x]$
(e) $a(x) = x^3 + ix^2 + x + i$ and $b(x) = x^2 + i$ are in $C[x]$
(f) $a(x) = 3x^4 + 8x^2 + 2$ and $b(x) = 12x^2 + x + 3$ are in $Z_{13}[x]$
2. Let $f(x) = ax^2 + bx + c \in F[x]$. Let $d \in F$. Let $a_2$, $a_1$, and $a_0$ be the coefficients of the powers of $x - d$ in the representation
$$f(x) = a_2(x - d)^2 + a_1(x - d) + a_0.$$
Find expressions which give $a_2$, $a_1$, and $a_0$ in terms of $a$, $b$, $c$, and $d$. Show directly that Theorem 93.5 is true in this case.
3. Let f( x )
x9
6. Show that Theorem 93.3 can be generalized as follows: let $a(x)$ and $b(x)$ be polynomials in $D[x]$, where $D$ is an integral domain. Suppose that $b(x) \neq 0$, and the leading coefficient (see Definition 93.1) of $b(x)$ is 1. Then there exist unique polynomials $q(x)$ and $r(x)$ in $D[x]$ such that
$$a(x) = q(x)\cdot b(x) + r(x),$$
and either $r(x) = 0$, or else $\mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)]$.
94 Greatest common divisor in $F[x]$. The divisibility of elements in an integral domain was discussed briefly in Section 44. The concepts and notation introduced in that section apply to the ring $F[x]$ of polynomials with coefficients in a field $F$. For convenience let us recall that according to Definition 44.6, a polynomial $b(x)$ divides $a(x)$ in $F[x]$ if there is a polynomial $c(x) \in F[x]$ such that $a(x) = b(x)\cdot c(x)$. It is also customary to say in this case that $b(x)$ is a factor of $a(x)$. Thus, $b(x)$ divides $a(x)$ in $F[x]$ if and only if the remainder on dividing $a(x)$ by $b(x)$ is the zero polynomial. The statement that $b(x)$ divides $a(x)$ is abbreviated by writing
$$b(x) \mid a(x).$$
The relation $b(x) \mid a(x)$ in $F[x]$ has certain useful properties which depend on the particular nature of the integral domain $F[x]$.
(94.1). (a) If $b(x) \mid a(x)$ in $F[x]$, then $d\cdot b(x) \mid a(x)$ and $b(x) \mid f(x)\cdot a(x)$, where $d$ is any nonzero element of $F$, and $f(x)$ is any polynomial of $F[x]$.
(b) A nonzero constant polynomial divides every polynomial in $F[x]$.
(c) If $b(x) \mid a(x)$ and $a(x) \neq 0$, then the degree of $b(x)$ is less than or equal to the degree of $a(x)$.
(d) If $b(x) \mid a(x)$ and $a(x)$ and $b(x)$ have the same degree, then each polynomial is a nonzero constant multiple of the other.
(e) If $c(x) \mid a(x)$, $c(x) \mid b(x)$, then $c(x) \mid [f(x)\,a(x) + g(x)\,b(x)]$ for every $f(x)$ and $g(x)$ in $F[x]$.
Proof. To prove (a), we note that $a(x) = b(x)\cdot c(x)$ for some $c(x) \in F[x]$. Then $a(x) = [d\cdot b(x)]\cdot[d^{-1}\cdot c(x)]$ and $f(x)\cdot a(x) = b(x)\cdot[f(x)\cdot c(x)]$. Therefore, $d\cdot b(x) \mid a(x)$ and $b(x) \mid f(x)\cdot a(x)$. Statement (b) follows from (a) and the fact that the identity element $1 \in F[x]$ divides every polynomial in $F[x]$ [see Theorem 44.7(f)]. The properties (c) and (d) follow from Theorem 93.2(a). If $b(x) \mid a(x)$, then by definition, there is a polynomial $c(x)$ such that $a(x) = b(x)\cdot c(x)$. Since $a(x) \neq 0$, it follows that $b(x) \neq 0$ and $c(x) \neq 0$. Therefore, by Theorem 93.2(a),
$$\mathrm{Deg}\,[a(x)] = \mathrm{Deg}\,[b(x)] + \mathrm{Deg}\,[c(x)].$$
Since the degrees are nonnegative integers, it follows that $\mathrm{Deg}\,[b(x)] \leq \mathrm{Deg}\,[a(x)]$. Moreover, if $\mathrm{Deg}\,[b(x)] = \mathrm{Deg}\,[a(x)]$, then $\mathrm{Deg}\,[c(x)] = 0$, so that $c(x)$ is a nonzero constant. This proves (c) and (d). Finally, the property (e) is no more than a restatement of Theorem 44.7(e).
EXAMPLE l . I n Q [ x ] , (+
since
and
Definition 52.1 of the greatest common divisor of two integers was based on the ordering of the integers. Such a definition does not make sense in rings such as $F[x]$ which are not ordered. However, the conditions (51) and (52) for the greatest common divisor make sense in any integral domain, and as we observed in Section 52, these conditions can be used to define the greatest common divisor of two elements (not both zero) in any integral domain. For convenience, let us restate this definition for the integral domain $F[x]$.
DEFINITION 94.2. Let $a(x)$ and $b(x)$ be polynomials in $F[x]$ which are not both zero. Then $d(x) \in F[x]$ is a greatest common divisor (g.c.d.) of $a(x)$ and $b(x)$ in $F[x]$ if
(a) $d(x) \mid a(x)$ and $d(x) \mid b(x)$;
(b) if $c(x) \in F[x]$ satisfies $c(x) \mid a(x)$ and $c(x) \mid b(x)$, then $c(x) \mid d(x)$.
It follows from (94.1a, b) that if $d(x)$ is a greatest common divisor of $a(x)$ and $b(x)$, then so is $c\cdot d(x)$, where $c$ is any nonzero element of $F$. Thus, a g.c.d. of $a(x)$ and $b(x)$ is not unique. Moreover, it is by no means obvious that two polynomials $a(x)$ and $b(x)$ necessarily have any greatest common divisor.
DEFINITION 94.3. A nonzero polynomial $f(x)$ is called monic if the leading coefficient of $f(x)$ is 1. That is, $f(x)$ has the form
$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0.$$
If $g(x) = b_nx^n + b_{n-1}x^{n-1} + \cdots + b_0$ is a nonzero polynomial in $F[x]$ with leading coefficient $b_n \neq 0$, then
$$b_n^{-1}\cdot g(x) = x^n + (b_n^{-1}b_{n-1})x^{n-1} + \cdots + b_n^{-1}b_0$$
is a monic polynomial. Thus, for any nonzero polynomial $g(x)$, there is a unique monic polynomial $f(x)$ which is a multiple of $g(x)$ by a nonzero element of $F$. It is customary to call $f(x)$ the monic polynomial associated with $g(x)$ [or simply the monic associate of $g(x)$].
THEOREM 94.4. Let $a(x)$ and $b(x)$ be polynomials in $F[x]$ which are not both zero. Then there exists a unique monic polynomial $d(x) \in F[x]$ which is a greatest common divisor of $a(x)$ and $b(x)$. Moreover,
$$d(x) = g(x)\,a(x) + h(x)\,b(x)$$
for some $g(x)$ and $h(x)$ in $F[x]$.
Proof. Suppose that $a(x) = 0$. Then $b(x) \neq 0$, and it is easy to see that the monic polynomial associated with $b(x)$ is a greatest common divisor of $a(x)$ and $b(x)$. Similarly, if $b(x) = 0$, then the monic associate of $a(x)$ is a g.c.d. of $a(x)$ and $b(x)$. In both of these cases, this monic g.c.d. can be expressed in the form $g(x)\,a(x) + h(x)\,b(x)$, in fact, with $g(x)$ and $h(x)$ constant polynomials. Assume therefore that $a(x) \neq 0$ and $b(x) \neq 0$. We will prove the statement of the theorem (except the uniqueness) by course of values induction on
$$\min\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\}.$$
If $\min\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\} = 0$, then either $a(x)$ or $b(x)$ is a nonzero constant polynomial, and the only common divisors of $a(x)$ and $b(x)$ are the nonzero constant polynomials. Hence, 1 is a monic g.c.d. of $a(x)$ and $b(x)$ in this case. Moreover, if $a(x) = a \in F$, then $1 = a^{-1}\cdot a + 0\cdot b(x)$, and if $b(x) = b \in F$, then $1 = 0\cdot a(x) + b^{-1}\cdot b$. Assume inductively that if $s(x)$ and $t(x)$ are polynomials such that $\min\{\mathrm{Deg}\,[s(x)], \mathrm{Deg}\,[t(x)]\} < n$, then $s(x)$ and $t(x)$ have a monic g.c.d. $d(x)$, which can be expressed in the form $d(x) = e(x)\,s(x) + f(x)\,t(x)$ for some $e(x)$ and $f(x)$ in $F[x]$. Suppose that $n = \mathrm{Deg}\,[b(x)] \leq \mathrm{Deg}\,[a(x)]$. The proof is similar if $n = \mathrm{Deg}\,[a(x)] < \mathrm{Deg}\,[b(x)]$. By the division algorithm,
$$a(x) = q(x)\,b(x) + r(x),$$
where either $r(x) = 0$, or else $r(x) \neq 0$ and $\mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)]$. If $r(x) = 0$, then $b(x) \mid a(x)$, and it follows easily from Definition 94.2 that the monic polynomial associated with $b(x)$ is a greatest common divisor of $a(x)$ and $b(x)$. The monic associate of $b(x)$ has the form $g(x)\,a(x) + h(x)\,b(x)$, where $g(x) = 0$ and $h(x)$ is a nonzero constant polynomial. Consider the case in which $r(x) \neq 0$. Then
$$\min\{\mathrm{Deg}\,[r(x)], \mathrm{Deg}\,[b(x)]\} = \mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)] = \min\{\mathrm{Deg}\,[a(x)], \mathrm{Deg}\,[b(x)]\} = n.$$
Thus, by the induction hypothesis, $r(x)$ and $b(x)$ have a monic greatest common divisor $d(x)$, which can be written in the form
$$d(x) = e(x)\,r(x) + f(x)\,b(x).$$
Since $d(x) \mid b(x)$ and $d(x) \mid r(x)$, it follows that $d(x) \mid [q(x)\,b(x) + r(x)]$, by (94.1e). That is, $d(x) \mid a(x)$. Suppose that $c(x) \in F[x]$ is such that $c(x) \mid a(x)$ and $c(x) \mid b(x)$. Then $c(x)$ divides $a(x) - q(x)\,b(x) = r(x)$. Therefore, since $d(x)$ is a g.c.d. of $b(x)$ and $r(x)$, Definition 94.2(b) requires that $c(x) \mid d(x)$. We have shown that $d(x)$ satisfies both of the conditions of Definition 94.2 for a g.c.d. of $a(x)$ and $b(x)$. Therefore, $d(x)$ is a g.c.d. of $a(x)$ and $b(x)$. Moreover,
$$d(x) = e(x)\,r(x) + f(x)\,b(x) = e(x)\,[a(x) - q(x)\,b(x)] + f(x)\,b(x) = g(x)\,a(x) + h(x)\,b(x),$$
where $g(x) = e(x)$ and $h(x) = f(x) - e(x)\,q(x)$.
To prove that $d(x)$ is the unique monic g.c.d. of $a(x)$ and $b(x)$, assume that $d_1(x)$ is also a monic polynomial which satisfies Definition 94.2. Since $d(x)$ satisfies part (a) and $d_1(x)$ satisfies part (b), it follows that $d(x) \mid d_1(x)$. Similarly, since $d_1(x)$ satisfies (a) and $d(x)$ satisfies (b), it follows that $d_1(x) \mid d(x)$. Therefore, $\mathrm{Deg}\,[d_1(x)] = \mathrm{Deg}\,[d(x)]$, by (94.1c). By (94.1d), $d_1(x)$ is a constant multiple of $d(x)$, say $d_1(x) = k\,d(x)$, where $k \neq 0$ is in $F$. Since both $d_1(x)$ and $d(x)$ have leading coefficient 1, it follows that $k = 1$. Hence, $d_1(x) = d(x)$. This completes the proof of the theorem.
The result of Theorem 94.4 allows us to speak of the monic g.c.d. of two polynomials $a(x)$ and $b(x)$ in $F[x]$ which are not both zero. It is convenient to denote this unique monic g.c.d. by the expression
$$(a(x), b(x)).$$
This is similar to the notation introduced in Definition 52.1 for the g.c.d. of two integers.
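The proof of Theorem 94.4 is constructive, and its recursion can be followed step by step. The Python sketch below is an illustration under stated assumptions rather than a definitive implementation: the field is $Q$, polynomials are coefficient lists in ascending powers of $x$, and the names `gcd_with_cofactors`, `poly_divmod`, `sub`, and `mul` are hypothetical. The function returns the monic g.c.d. $d(x)$ together with polynomials $g(x)$ and $h(x)$ such that $d(x) = g(x)\,a(x) + h(x)\,b(x)$, using the rule $g = e$, $h = f - e\,q$ from the proof.

```python
from fractions import Fraction
from itertools import zip_longest

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(a, b):
    return trim(s - t for s, t in zip_longest(a, b, fillvalue=Fraction(0)))

def mul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return trim(out)

def poly_divmod(a, b):
    """Quotient and remainder; b is assumed to be nonzero and trimmed."""
    r = trim(a)
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while r and len(r) >= len(b):
        shift, coeff = len(r) - len(b), r[-1] / b[-1]
        q[shift] = coeff
        for i, bi in enumerate(b):
            r[i + shift] -= coeff * bi
        r = trim(r)
    return trim(q), r

def gcd_with_cofactors(a, b):
    """Return (d, g, h) with d monic and d = g*a + h*b."""
    a, b = trim(a), trim(b)
    if not a and not b:
        raise ValueError("a(x) and b(x) must not both be zero")
    if not b:                       # d is the monic associate of a
        return [c / a[-1] for c in a], [1 / a[-1]], []
    if len(a) < len(b):             # exchange the roles of a and b
        d, h, g = gcd_with_cofactors(b, a)
        return d, g, h
    q, r = poly_divmod(a, b)
    if not r:                       # b divides a
        return [c / b[-1] for c in b], [], [1 / b[-1]]
    d, e, f = gcd_with_cofactors(r, b)
    return d, e, sub(f, mul(e, q))  # g = e, h = f - e*q

a = [1, 4, 4, 5, 3, 1]   # x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1
b = [0, 2, 2, 3, 2, 1]   # x^5 + 2x^4 + 3x^3 + 2x^2 + 2x
d, g, h = gcd_with_cofactors(a, b)
```

The recursion stops because the smaller of the two degrees decreases at every second call, which mirrors the course of values induction used in the proof.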
It is possible to prove Theorem 94.4 by a method which resembles the proof of the analogous Theorem 52.2 (see Problem 5). However, the proof which we have given provides a practical method of finding the g.c.d. of two polynomials $a(x)$ and $b(x)$. If $b(x) = 0$, then $(a(x), b(x))$ is the monic associate of $a(x)$. If $a(x) \neq 0$, $b(x) \neq 0$, and $\mathrm{Deg}\,[b(x)] \leq \mathrm{Deg}\,[a(x)]$, then $(a(x), b(x)) = (b(x), r(x))$, where $r(x)$ is the remainder obtained from the division of $a(x)$ by $b(x)$. Consequently, by repeated application of the division algorithm, it is possible to find the g.c.d. of $a(x)$ and $b(x)$.
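The rule $(a(x), b(x)) = (b(x), r(x))$ translates into a short loop. The following Python sketch assumes the field $Q$ and coefficient lists in ascending powers of $x$. The names `poly_rem` and `monic_gcd` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_rem(a, b):
    """Remainder on dividing a(x) by a nonzero, trimmed b(x)."""
    r = trim(a)
    while r and len(r) >= len(b):
        shift, coeff = len(r) - len(b), r[-1] / b[-1]
        for i, bi in enumerate(b):
            r[i + shift] -= coeff * bi
        r = trim(r)
    return r

def monic_gcd(a, b):
    """Monic g.c.d. of a(x) and b(x), which must not both be zero."""
    a, b = trim(a), trim(b)
    if not a and not b:
        raise ValueError("a(x) and b(x) must not both be zero")
    while b:
        a, b = b, poly_rem(a, b)     # (a, b) = (b, r)
    return [c / a[-1] for c in a]    # monic associate of the last divisor

# g.c.d. of x^2 - 1 and x^3 - 1 in Q[x]
example = monic_gcd([-1, 0, 1], [-1, 0, 0, 1])
```

For this pair the loop ends with the divisor $x - 1$, which is already monic.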
EXAMPLE 2. Let $a(x) = x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1$ and $b(x) = x^5 + 2x^4 + 3x^3 + 2x^2 + 2x$. Then by repeated use of the division algorithm:
$$x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1 = 1\cdot(x^5 + 2x^4 + 3x^3 + 2x^2 + 2x) + (x^4 + 2x^3 + 2x^2 + 2x + 1),$$
$$x^5 + 2x^4 + 3x^3 + 2x^2 + 2x = x\,(x^4 + 2x^3 + 2x^2 + 2x + 1) + (x^3 + x),$$
$$x^4 + 2x^3 + 2x^2 + 2x + 1 = (x + 2)(x^3 + x) + (x^2 + 1),$$
$$x^3 + x = x\,(x^2 + 1) + 0.$$
Therefore,
$$(a(x), b(x)) = x^2 + 1.$$
The method of obtaining the g.c.d. of two polynomials as in Example 2 by repeated use of the division algorithm is called the Euclidean algorithm, because it is similar to the process of obtaining the g.c.d. of two integers which has been passed down to us in the works of Euclid. In general terms, the process consists of forming the successive equations
$$a(x) = q_1(x)\,b(x) + r_1(x),$$
$$b(x) = q_2(x)\,r_1(x) + r_2(x),$$
$$r_1(x) = q_3(x)\,r_2(x) + r_3(x),$$
$$\cdots$$
$$r_{n-2}(x) = q_n(x)\,r_{n-1}(x) + r_n(x),$$
$$r_{n-1}(x) = q_{n+1}(x)\,r_n(x) + 0,$$
where $r_1(x), r_2(x), \ldots, r_n(x)$ are not zero, and $\mathrm{Deg}\,[b(x)] > \mathrm{Deg}\,[r_1(x)] > \mathrm{Deg}\,[r_2(x)] > \cdots > \mathrm{Deg}\,[r_n(x)]$. It follows from the proof of Theorem 94.4 that the monic polynomial associated with $r_n(x)$ is the monic g.c.d. of $a(x)$ and $b(x)$. Indeed,
$$(a(x), b(x)) = (b(x), r_1(x)) = (r_1(x), r_2(x)) = \cdots = (r_{n-1}(x), r_n(x)) = (r_n(x), 0),$$
and $(r_n(x), 0)$ is the monic polynomial associated with $r_n(x)$.
Two polynomials $a(x)$ and $b(x)$ in $F[x]$ are relatively prime if the monic g.c.d. of $a(x)$ and $b(x)$ is 1. The proof of the following result is identical with the proof of Theorem 52.6.
THEOREM 94.5. If $a(x)$ and $b(x)$ are relatively prime polynomials in $F[x]$, and if $a(x) \mid b(x)\,c(x)$, where $c(x) \in F[x]$, then $a(x) \mid c(x)$.
 (dS
+ d 3 )+ ~d 6
x2
X~
x2 + l
2. Prove that $a(x)$ and $b(x)$ are relatively prime if and only if there exist polynomials $f(x)$ and $g(x)$ in $F[x]$ such that $f(x)\,a(x) + g(x)\,b(x) = 1$.
3. Let $a(x)$ and $b(x)$ be polynomials in $D[x]$, where $D$ is an integral domain. The polynomial $a(x)$ is called an associate of $b(x)$ if $a(x) \mid b(x)$ and $b(x) \mid a(x)$. If $a(x)$ is an associate of $b(x)$, we write $a(x) \sim b(x)$.
(a) Prove that $\sim$ is an equivalence relation on $D[x]$.
(b) Let $D = Z$. Prove that $a(x) \sim b(x)$ if and only if $b(x) = a(x)$ or $b(x) = -a(x)$.
(c) Let $D = F$, a field. Prove that $a(x) \sim b(x)$ if and only if $b(x) = k\cdot a(x)$, where $k$ is a nonzero element of $F$.
(d) Suppose that $d_1(x)$ and $d_2(x)$ are greatest common divisors of $a(x)$ and $b(x)$. Prove that $d_1(x) \sim d_2(x)$. (The definition of a greatest common divisor of two polynomials in $D[x]$, where $D$ is an integral domain, is obtained from Definition 94.2 by replacing the field $F$ by $D$.)
4. Find an example which shows that the polynomials $f(x)$ and $g(x)$ in the expression $d(x) = f(x)\,a(x) + g(x)\,b(x)$ for the monic g.c.d. of $a(x)$ and $b(x)$ in Theorem 94.4 are not unique.
5. Let $a(x)$ and $b(x)$ be polynomials in $F[x]$ which are not both zero. Let
$$S = \{f(x)\,a(x) + g(x)\,b(x) \mid f(x) \in F[x],\ g(x) \in F[x]\}.$$
(a) Show that $S$ contains at least one nonzero, monic polynomial.
(b) Without using Theorem 94.4, prove that the monic polynomial of smallest degree in $S$ is a g.c.d. of $a(x)$ and $b(x)$.
6. Show that the only greatest common divisors of $2$ and $x$ in $Z[x]$ are $1$ and $-1$. [See Problem 3(d) for the definition of greatest common divisor in $Z[x]$.] Prove that it is impossible to find $f(x) \in Z[x]$ and $g(x) \in Z[x]$ satisfying $1 = f(x)\cdot 2 + g(x)\cdot x$. This shows that Theorem 94.4 is false in $D[x]$, where $D$ is an integral domain.
7. Let $a(x)$ and $b(x)$ be polynomials, not both zero, with coefficients in the rational field $Q$. Prove that the monic g.c.d. of $a(x)$ and $b(x)$ in $R[x]$ has rational coefficients. Is the same conclusion true if $R[x]$ is replaced by $C[x]$? [Hint: Prove that the monic g.c.d. of $a(x)$ and $b(x)$ in $Q[x]$ is also a g.c.d. of $a(x)$ and $b(x)$ in $R[x]$, then use the uniqueness statement in Theorem 94.4.]
8. Let $a(x)$, $b(x)$, and $c(x)$ be polynomials in $F[x]$, with $a(x) \neq 0$, $b(x) \neq 0$, and $a(x)$ monic. Prove that $(a(x)\,b(x), a(x)\,c(x)) = a(x)\,(b(x), c(x))$.
9. Let $\{a_1(x), a_2(x), \ldots, a_n(x)\}$ $(n \geq 2)$ be a set of polynomials in $F[x]$ with $a_1(x) \neq 0$. A greatest common divisor of $\{a_1(x), a_2(x), \ldots, a_n(x)\}$ in $F[x]$ is a polynomial $d(x) \in F[x]$ such that (i) $d(x) \mid a_i(x)$ for $i = 1, 2, \ldots, n$, and (ii) if $c(x) \mid a_i(x)$ for $i = 1, 2, \ldots, n$, then $c(x) \mid d(x)$.
(a) Prove that $(\ldots((a_1(x), a_2(x)), a_3(x)), \ldots, a_n(x))$ is a g.c.d. of $\{a_1(x), a_2(x), \ldots, a_n(x)\}$ in $F[x]$.
(b) State and prove a theorem similar to Theorem 94.4 for sets of $n \geq 2$ polynomials.
10. Find the monic g.c.d. of the following sets of polynomials.
(a) $x^4 - x^3 + 3x^2 - 2x + 2$, $x^3 + x^2 + 2x + 2$, x 2/S i
(b) $x^4 - 1$, $x^3 + x^2 + x + 1$, $x^2 - 1$
(c) $x^5 + 13x^4 + 63x^3 + 148x^2 + 208x + 192$, $4x^4 + 52x^3 + 189x^2 + 296x + 208$, $20x^3 + 156x^2 + 378x + 296$
11. A least common multiple (l.c.m.) of two nonzero polynomials $a(x)$ and $b(x)$ in $F[x]$ is a polynomial $m(x) \in F[x]$ which satisfies (i) $a(x) \mid m(x)$ and $b(x) \mid m(x)$, and (ii) if $l(x)$ is any polynomial in $F[x]$ such that $a(x) \mid l(x)$ and $b(x) \mid l(x)$, then $m(x) \mid l(x)$. Prove that if $a(x)$ and $b(x)$ are nonzero polynomials in $F[x]$, then $a(x)\,b(x)/(a(x), b(x))$ is a l.c.m. of $a(x)$ and $b(x)$.
12. Find a least common multiple for the following pairs of polynomials.
(a) $x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1$, $x^5 + 2x^4 + 3x^3 + 2x^2 + 2x$
(b) $x^4 - x^3 + 3x^2 - 2x + 2$, $x^3 + x^2 + 2x + 2$
(c) ~3  2 ~ + 1, x ~ + 1
95 The unique factorization theorem for polynomials. The fundamental theorem of arithmetic, which was proved in Section 53, states that every natural number can be written uniquely as a product of prime numbers. The purpose of this section is to prove a similar theorem about the arithmetic in an integral domain $F[x]$, where $F$ is a field.
DEFINITION 95.1. Let $p(x)$ be a polynomial of positive degree in $F[x]$. Then $p(x)$ is irreducible in $F[x]$ if $p(x)$ is not divisible by any polynomial in $F[x]$ except constant polynomials and constant multiples of $p(x)$. Otherwise, $p(x)$ is called reducible in $F[x]$.
This definition requires some discussion. By (94.1a, b), any polynomial $a(x)$ in $F[x]$ is divisible by every nonzero constant polynomial and by every nonzero constant multiple of $a(x)$. Thus, the irreducible polynomials in $F[x]$ are exactly those which have no divisors other than these "trivial" ones. This parallels closely the definition of a prime number (Definition 53.1).
Suppose that $a(x)$ is a polynomial of positive degree which is reducible in $F[x]$. Then by Definition 95.1, $a(x) = f(x)\,g(x)$, where $f(x)$ is not a constant and $f(x)$ is not a constant multiple of $a(x)$. It follows that $\mathrm{Deg}\,[f(x)] < \mathrm{Deg}\,[a(x)]$. For if $\mathrm{Deg}\,[f(x)] = \mathrm{Deg}\,[a(x)]$, then by Theorem 93.2(a), $\mathrm{Deg}\,[g(x)] = 0$. Therefore, $g(x)$ is a nonzero constant, which implies that $f(x)$ is a constant multiple of $a(x)$. This contradiction proves that $\mathrm{Deg}\,[f(x)] < \mathrm{Deg}\,[a(x)]$. Therefore, a reducible polynomial $a(x)$ in $F[x]$ has a factor $f(x)$ such that $0 < \mathrm{Deg}\,[f(x)] < \mathrm{Deg}\,[a(x)]$. Conversely, it is easy to show that if $a(x) \in F[x]$ has a factor $f(x) \in F[x]$ which satisfies $0 < \mathrm{Deg}\,[f(x)] < \mathrm{Deg}\,[a(x)]$, then $a(x)$ is reducible in $F[x]$.
Since Definition 95.1 applies only to polynomials of positive degree, the constant polynomials are neither reducible nor irreducible. These polynomials play a special role in $F[x]$ similar to that of the integers $1$ and $-1$ in the arithmetic of $Z$.
It is very important to observe that irreducibility is defined relative to a particular field $F$. That is, a polynomial which is irreducible in $F[x]$ may be reducible in $K[x]$ for some field $K$ containing $F$.
EXAMPLE 1. The polynomial $x^2 - 3$ is irreducible in $Q[x]$. For suppose that
$$x^2 - 3 = (ax + b)(cx + d) = acx^2 + (ad + bc)x + bd,$$
where $a$, $b$, $c$, and $d$ are rational numbers. This implies that $ac = 1$, $ad + bc = 0$, and $bd = -3$. Thus, $c = 1/a$, $d = -3/b$, and substituting in $ad + bc = 0$, we obtain
$$-\frac{3a}{b} + \frac{b}{a} = 0.$$
Therefore, $-3a^2 + b^2 = 0$ and $(b/a)^2 = 3$. However, $\sqrt{3}$ is not a rational number. Thus, $x^2 - 3$ is irreducible in $Q[x]$. On the other hand, $x^2 - 3 = (x - \sqrt{3})(x + \sqrt{3})$, so that $x^2 - 3$ is reducible in $R[x]$.
EXAMPLE 2. Any polynomial of degree one, $ax + b$, $a \neq 0$, with coefficients in a field $F$, is irreducible in $F[x]$. In fact $ax + b = f(x)\,g(x)$ with $0 < \mathrm{Deg}\,[f(x)] < 1$ is obviously impossible. Moreover, if $K$ is any field containing $F$ as a subring, then $ax + b$ is also irreducible in $K[x]$.
The principal result of this section is that every polynomial of positive degree in $F[x]$ can be expressed as a product of an element in $F$ and one or more monic irreducible polynomials in $F[x]$. Moreover, this factorization is unique, except possibly for the order of the factors. This is the unique factorization theorem in $F[x]$, which is the analogue of the fundamental theorem of arithmetic. The following preliminary results are needed for the proof of this important theorem.
(95.2). If $p(x)$ is irreducible in $F[x]$ and $f(x) \in F[x]$, then either $p(x) \mid f(x)$ in $F[x]$, or $p(x)$ and $f(x)$ are relatively prime.
Proof. Let $d(x) = (p(x), f(x))$. Then $d(x) \mid p(x)$, by Definition 94.2. Since $p(x)$ is irreducible, it follows that either $d(x)$ is a constant or $d(x)$ is a nonzero constant multiple of $p(x)$. If $d(x)$ is a constant, then $d(x) = 1$ (because $d(x)$ is monic), so that $p(x)$ and $f(x)$ are relatively prime. If $d(x)$ is a nonzero constant multiple of $p(x)$, then $p(x) = k\,d(x)$ for some nonzero $k \in F$. Since $d(x) \mid f(x)$, it follows that $p(x) \mid f(x)$, by (94.1a).
(95.3). If $p(x)$ is irreducible in $F[x]$, and $p(x)$ divides the product $a_1(x)\,a_2(x) \cdots a_n(x)$ of polynomials in $F[x]$, then $p(x)$ divides at least one of the polynomials $a_i(x)$.
The proof is the same as the proof of (53.2).
THEOREM 95.4. Unique factorization theorem in $F[x]$. Every polynomial $a(x) \in F[x]$ of positive degree can be written as a product of a nonzero element of $F$ and monic irreducible polynomials in $F[x]$. Except for the order of the factors, the expression of $a(x)$ in this form is unique.
Proof. Both parts of this theorem are proved by course of values induction on $n = \mathrm{Deg}\,[a(x)]$. The proof that $a(x)$ can be factored into a product of a nonzero element of $F$ and monic irreducible polynomials in $F[x]$ is similar to the corresponding part of the fundamental theorem of arithmetic. Suppose that $\mathrm{Deg}\,[a(x)] = 1$. Then $a(x) = bx + c$, where $b \in F$, $c \in F$, and $b \neq 0$. By Example 2, $x + (b^{-1}\cdot c)$ is a monic irreducible polynomial, and
$$a(x) = b\,[x + (b^{-1}\cdot c)].$$
Suppose that $a(x)$ has degree $n > 1$, and assume that every polynomial of degree $m$, with $1 \leq m < n$, can be expressed in the form
$$c\,p_1(x)\,p_2(x) \cdots p_r(x),$$
where $c$ is a nonzero element of $F$ and the $p_i(x)$ are monic irreducible polynomials in $F[x]$. If $a(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ is irreducible, then
$$a(x) = a_n\,[x^n + (a_{n-1}a_n^{-1})x^{n-1} + \cdots + (a_1a_n^{-1})x + a_0a_n^{-1}]$$
is the required form. If $a(x)$ is not irreducible, then $a(x) = b(x)\,c(x)$, where $b(x)$ and $c(x)$ are polynomials in $F[x]$ satisfying $1 \leq \mathrm{Deg}\,[b(x)] < \mathrm{Deg}\,[a(x)]$ and $1 \leq \mathrm{Deg}\,[c(x)] < \mathrm{Deg}\,[a(x)]$. Therefore, by the induction hypothesis,
$$b(x) = c_1\,p_1(x) \cdots p_r(x) \quad \text{and} \quad c(x) = c_2\,q_1(x) \cdots q_s(x),$$
where $c_1$ and $c_2$ are nonzero elements of $F$, and the $p_i(x)$ and $q_j(x)$ are monic irreducible polynomials. Thus,
$$a(x) = b(x)\,c(x) = (c_1c_2)\,p_1(x) \cdots p_r(x)\,q_1(x) \cdots q_s(x),$$
which is the required form.
To prove that the factorization of a polynomial $a(x)$ is unique, we can use induction either on the degree of $a(x)$, or on the number of monic irreducible polynomials which occur in some decomposition of $a(x)$ into a product of irreducible polynomials. This last method corresponds to the proof of the uniqueness given in Theorem 53.3. However, for the proof of Theorem 95.4, it is slightly easier to use induction on the degree of $a(x)$. Suppose first that $a(x)$ has degree one and that
$$a(x) = a_1(x + b_1) = a_2(x + b_2).$$
Thcn a l = a2 f 0, and albl = a2b2. Multiplying the last, equation by a;' = a;', we obtain bl = 62. Therefore, any t\vo factorizat,ions of
336
EQUATIONS
[CHAP.
a ( x ) are identical. Now, suppose that a ( x ) has degree n > 1, and assume that the unique fact,orization theorem is true for al1 polynomials of degree less than n. Let
of a ( x ) into products of an element of F and one be any two fa~torizat~ions or more monic irreducible polynomials. Since the P ~ ( xand ) q j ( x ) are monic polynomials, the leading coefficient of a ( x ) is both cl and c2. That is, c1 = cg. Thus,
so that p l ( x ) divides q l ( x ) q 2 ( x ). . . qs(x). Since p l ( x ) is irreducible, it follows from (95.3) that p l ( x ) divides one of the polynomials q j ( x ) . However, q j ( x ) is irreducible, and p l ( x ) is not a constant, so that p l ( x ) must be a constant multiple of q j ( x ) . Since p l ( x ) and q j ( x ) are both monic polynomials, it follows that pl ( x ) = q j ( x ) . If r = 1, then a ( x ) is irreducible, so that s = j = 1. In this case, the factorizations a ( x ) = c l p l ( x ) = c2q1(x)are identical. Otherwise, p l ( x ) can be cancelled from the above expression to obtain
Since $n = \mathrm{Deg}[p_1(x)] + \mathrm{Deg}[p_2(x) \cdots p_r(x)]$ and $\mathrm{Deg}[p_1(x)] \ge 1$, it follows that $\mathrm{Deg}[p_2(x) \cdots p_r(x)] < n$. By the induction hypothesis, the polynomials $p_2(x), \ldots, p_r(x)$ are just the polynomials $q_1(x), q_2(x), \ldots, q_{j-1}(x), q_{j+1}(x), \ldots, q_s(x)$ in some order. Therefore, the two factorizations of $a(x)$ are the same, except possibly for the order of the factors.

The process of expressing a polynomial as a product of an element of $F$ and a product of monic polynomials which are irreducible in $F[x]$ is the familiar "complete factorization" which is studied in elementary algebra. It would be convenient to have a systematic method which would give a complete factorization of any polynomial $a(x)$ in any integral domain $F[x]$. Simply to have a way of deciding whether or not a given polynomial in $F[x]$ is irreducible in $F[x]$ would be helpful. Unfortunately, such methods exist only for particular fields $F$. For example, if the field $F$ is a finite field of the form $Z_p$ (where $p$ is a prime), then there are only finitely many polynomials of a given degree. By examining all products of two polynomials of degree less than the degree of $a(x)$, it is possible to decide whether or not $a(x)$ is irreducible.
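The finite search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a polynomial in $Z_p[x]$ is represented as a list of coefficients reduced modulo $p$, lowest degree first, and the function names are hypothetical. The sketch handles monic polynomials only, which loses nothing, since a factorization of a monic polynomial can always be normalized so that both factors are monic.

```python
from itertools import product

def poly_mul_mod(f, g, p):
    """Multiply two coefficient lists (lowest degree first) in Z_p[x]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def monic_polys(deg, p):
    """Yield every monic polynomial of the given degree in Z_p[x]."""
    for low in product(range(p), repeat=deg):
        yield list(low) + [1]

def is_irreducible_mod_p(a, p):
    """Brute-force test for a monic polynomial a of positive degree.

    a must already have its coefficients reduced modulo p.
    Every product b(x) c(x) of monic polynomials of positive degree
    with Deg b + Deg c = Deg a is examined.
    """
    n = len(a) - 1
    for d in range(1, n // 2 + 1):
        for b in monic_polys(d, p):
            for c in monic_polys(n - d, p):
                if poly_mul_mod(b, c, p) == a:
                    return False
    return True

# x^2 + 1 has no factorization in Z_3[x], while x^2 + 2 = (x + 1)(x + 2).
print(is_irreducible_mod_p([1, 0, 1], 3), is_irreducible_mod_p([2, 0, 1], 3))
```

The number of products examined grows like $p^n$, which is why the method is practical only for small $p$ and low degree.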
EXAMPLE 3. By a method which is similar to the "sieve of Eratosthenes" (see Section 54), the monic irreducible polynomials of any degree in the rings $Z_p[x]$ can be determined. Actually, the method is practical only for small $p$, and for polynomials of low degree. We will consider the case $p = 3$. The following list includes every monic polynomial in $Z_3[x]$ of degree less than or equal to two:

$x$
$x + 1$
$x + 2$
$x^2 = x \cdot x$
$x^2 + 1$
$x^2 + 2 = (x + 1) \cdot (x + 2)$
$x^2 + x = x \cdot (x + 1)$
$x^2 + x + 1 = (x + 2) \cdot (x + 2)$
$x^2 + x + 2$
$x^2 + 2x = x \cdot (x + 2)$
$x^2 + 2x + 1 = (x + 1) \cdot (x + 1)$
$x^2 + 2x + 2$

It follows that the monic, irreducible polynomials of degree one and two in $Z_3[x]$ are $x$, $x + 1$, $x + 2$, $x^2 + 1$, $x^2 + x + 2$, and $x^2 + 2x + 2$.
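The sieve of this example can be mechanized. The following Python sketch is an illustration, not part of the original text: it assumes polynomials are stored as coefficient tuples with the lowest degree first, and the function name `monic_irreducibles` is hypothetical. It crosses out every product of two monic polynomials of positive degree and returns what is left.

```python
from itertools import product

def monic_irreducibles(p, max_deg):
    """Monic irreducible polynomials of degree 1..max_deg in Z_p[x].

    Polynomials are tuples of coefficients, lowest degree first.
    """
    def monics(deg):
        return [tuple(low) + (1,) for low in product(range(p), repeat=deg)]

    def mul(f, g):
        out = [0] * (len(f) + len(g) - 1)
        for i, a in enumerate(f):
            for j, b in enumerate(g):
                out[i + j] = (out[i + j] + a * b) % p
        return tuple(out)

    # Cross out every product of two monic polynomials of positive degree
    # whose total degree does not exceed max_deg.
    crossed = set()
    for d1 in range(1, max_deg):
        for d2 in range(d1, max_deg - d1 + 1):
            for f in monics(d1):
                for g in monics(d2):
                    crossed.add(mul(f, g))

    return [f for d in range(1, max_deg + 1) for f in monics(d) if f not in crossed]

# For p = 3 and degree at most two this gives the six polynomials listed above:
# (0,1)=x, (1,1)=x+1, (2,1)=x+2, (1,0,1)=x^2+1, (2,1,1)=x^2+x+2, (2,2,1)=x^2+2x+2.
print(monic_irreducibles(3, 2))
```

Calling `monic_irreducibles(3, 3)` extends the same sieve to the cubics of $Z_3[x]$.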
1. Determine which of the following polynomials are irreducible in Q[x]. (a) x3  2 (b) x3 1 (d) x4  x2  1 (e) x2 2x2 22 1 (e) x4 zx 4 (f) X~ zx3 I
+ +
3. Let $f(x) = ax^2 + bx + c$ be a polynomial with rational coefficients, where $a \ne 0$. (a) Prove that $f(x)$ is irreducible in $Q[x]$ if and only if $b^2 - 4ac$ is not the square of a rational number. (b) Prove that $f(x)$ is irreducible in $R[x]$ if and only if $b^2 - 4ac < 0$. (c) Prove that $f(x)$ is reducible in $C[x]$ for all values of $a$, $b$, and $c$.
4. Express the polynomials listed in Problem 1 as a product of monic irreducible polynomials in $Q[x]$, $R[x]$, and $C[x]$.
5. Use the method of Example 3 to find all irreducible monic polynomials of the third degree in $Z_3[x]$.
6. Find the complete factorization of al1 1)olynomials of degree four in Z ~ [ X ] .
8. Let $a(x) = c\,p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_r(x)^{n_r}$ be a polynomial in $F[x]$, where the $p_i(x)$ are monic polynomials which are irreducible in $F[x]$, $p_i(x) \ne p_j(x)$ for $i \ne j$, and the exponents $n_i$ are natural numbers. Prove that $b(x) \in F[x]$ divides $a(x)$ if and only if $b(x) = d\,p_1(x)^{m_1} p_2(x)^{m_2} \cdots p_r(x)^{m_r}$, where $0 \le m_i \le n_i$ for $i = 1, 2, \ldots, r$.
9. Any two nonzero polynomials $a(x)$ and $b(x)$ in $F[x]$ can be expressed in the forms
$$a(x) = c\,p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_r(x)^{n_r}, \qquad b(x) = d\,p_1(x)^{m_1} p_2(x)^{m_2} \cdots p_r(x)^{m_r},$$
where the $p_i(x)$ are monic irreducible polynomials, $p_i(x) \ne p_j(x)$ if $i \ne j$, and the exponents $n_i$ and $m_i$ are nonnegative integers. (a) Prove that the monic g.c.d. of $a(x)$ and $b(x)$ is
$$p_1(x)^{t_1} p_2(x)^{t_2} \cdots p_r(x)^{t_r},$$
where $t_i = \min(m_i, n_i)$ for $i = 1, 2, \ldots, r$. (b) Prove that a least common multiple of $a(x)$ and $b(x)$ is
$$p_1(x)^{s_1} p_2(x)^{s_2} \cdots p_r(x)^{s_r},$$
where $s_i = \max(m_i, n_i)$ for $i = 1, 2, \ldots, r$.
96 Derivatives. Up to now, our discussion of the rings of polynomials with coefficients in a field has run parallel to the development of the fundamental theorem of arithmetic. In this section we introduce an idea which has no analogue in the arithmetic of integers. This is the concept of the derivative of a polynomial. This concept is one of the basic notions of calculus. The derivative of a polynomial plays an important role in the theory of equations, and for its application in this subject, it can be defined in a purely algebraic way.
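DEFINITION 96.1. Let $a(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial in $F[x]$. The polynomial
$$(n \cdot a_n) x^{n-1} + ((n-1) \cdot a_{n-1}) x^{n-2} + \cdots + (2 \cdot a_2) x + 1 \cdot a_1$$
is called the derivative of $a(x)$. The derivative of a constant polynomial is zero. It is customary to denote the derivative of $a(x)$ by $a'(x)$, the derivative of $b(x)$ by $b'(x)$, etc. The expressions $n \cdot a_n$, $(n-1) \cdot a_{n-1}$, $\ldots$, $2 \cdot a_2$, and $1 \cdot a_1$ for the coefficients of $a'(x)$ denote the elements of $F$ which are obtained by repeated addition; for example,
$$(n-1) \cdot a_{n-1} = a_{n-1} + a_{n-1} + \cdots + a_{n-1} \quad (n - 1 \text{ summands}).$$
Note that if the characteristic of the field $F$ is a prime $p$ with $p \le n$, then $p \cdot a_p = 0$ in $F$. Also, if $2p \le n$, then $(2p) \cdot a_{2p} = 0$, etc.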
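EXAMPLE 1. If $a(x) = x^4 + \sqrt{2}\,x^3 - x + \sqrt{3} \in R[x]$, then the derivative $a'(x) = 4x^3 + 3\sqrt{2}\,x^2 - 1$. If $a(x) = x^5 + 2x^3 + 3x + 1 \in Z_5[x]$, that is, the coefficients belong to the field $Z_5$, then $a'(x) = 5x^4 + 6x^2 + 3 = x^2 + 3$.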
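Because the derivative is defined purely algebraically, it can be computed mechanically from the coefficients. The Python sketch below is an illustration only; the function name `derivative` and the list representation (entry `i` is the coefficient of $x^i$) are assumptions made here. With a modulus `p` the coefficients are reduced in $Z_p$, which makes the effect of a prime characteristic visible.

```python
def derivative(coeffs, p=None):
    """Formal derivative of a polynomial.

    coeffs[i] is the coefficient of x**i. If p is given, the
    coefficients are treated as elements of Z_p and reduced modulo p.
    An empty list stands for the zero polynomial.
    """
    d = [i * c for i, c in enumerate(coeffs)][1:]
    if p is not None:
        d = [c % p for c in d]
    while d and d[-1] == 0:   # drop vanished leading terms
        d.pop()
    return d

# a(x) = x^5 + 2x^3 + 3x + 1 in Z_5[x]: the term 5x^4 vanishes.
print(derivative([1, 3, 0, 2, 0, 1], p=5))   # [3, 0, 1], that is, x^2 + 3

# x^5 is not constant, yet its derivative in Z_5[x] is zero.
print(derivative([0, 0, 0, 0, 0, 1], p=5))   # []
```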
If $a'(x)$ is the derivative of $a(x)$, then the derivative of $a'(x)$ is called the second derivative of $a(x)$, and is denoted by $a''(x)$. For any natural number $n$, the result of taking $n$ successive derivatives of a polynomial $a(x)$ is called the $n$th derivative of $a(x)$. The $n$th derivative of $a(x)$ can be denoted by
$$a^{\prime\prime \cdots \prime}(x)$$
with $n$ primes. However, this notation is unusual if $n > 3$. For large $n$ it is customary to write $a^{(n)}(x)$ for the $n$th derivative of $a(x)$, and we will follow this practice if $n > 2$.

THEOREM 96.2. Let $b(x)$ and $c(x)$ be polynomials in $F[x]$.
(a) If $a(x) = b(x) + c(x)$, then $a'(x) = b'(x) + c'(x)$.
(b) If $a(x) = b(x) \cdot c(x)$, then $a'(x) = b(x) \cdot c'(x) + b'(x) \cdot c(x)$.
(c) If $a(x) = b(x)^n$, where $n \ge 1$, then $a'(x) = n \cdot b(x)^{n-1} \cdot b'(x)$.
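Proof. Let $b(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_0$ and $c(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_0$. (As we observed in Section 92, there is no loss of generality in assuming that $b(x)$ and $c(x)$ are written with the same number of terms.) Then
$$a(x) = (b_n + c_n) x^n + (b_{n-1} + c_{n-1}) x^{n-1} + \cdots + (b_0 + c_0).$$
Hence, by Definition 96.1,
$$a'(x) = n \cdot (b_n + c_n) x^{n-1} + (n-1) \cdot (b_{n-1} + c_{n-1}) x^{n-2} + \cdots + 1 \cdot (b_1 + c_1)$$
$$= [(n \cdot b_n) x^{n-1} + \cdots + 1 \cdot b_1] + [(n \cdot c_n) x^{n-1} + \cdots + 1 \cdot c_1] = b'(x) + c'(x).$$
This proves (a). We prove (b) first in the case that $b(x) = e x^m$ and $c(x) = f x^n$, where $e, f \in F$, $m \ge 0$ and $n \ge 0$. Then $a(x) = b(x)\,c(x) = (ef) x^{m+n}$. By definition, $a'(x) = [(m+n) \cdot ef]\,x^{m+n-1}$, and
$$b'(x)\,c(x) + b(x)\,c'(x) = [(m \cdot e) x^{m-1}]\,(f x^n) + (e x^m)\,[(n \cdot f) x^{n-1}] = [(m+n) \cdot ef]\,x^{m+n-1},$$
so that $a'(x) = b'(x)\,c(x) + b(x)\,c'(x)$. Next observe that if the identity (b) is correct for $a_1(x) = b_1(x)\,c(x)$ and $a_2(x) = b_2(x)\,c(x)$, then it is true for $a(x) = b(x)\,c(x)$, where $b(x) = b_1(x) + b_2(x)$. Indeed, $a(x) = [b_1(x) + b_2(x)]\,c(x) = b_1(x)\,c(x) + b_2(x)\,c(x) = a_1(x) + a_2(x)$, so that, by (a),
$$a'(x) = a_1'(x) + a_2'(x) = [b_1(x)\,c'(x) + b_1'(x)\,c(x)] + [b_2(x)\,c'(x) + b_2'(x)\,c(x)] = b(x)\,c'(x) + b'(x)\,c(x).$$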
Similarly, if the identity (b) holds for $b(x)\,c_1(x)$ and $b(x)\,c_2(x)$, then it holds for $b(x)\,[c_1(x) + c_2(x)]$. The proof of the identity (b) can now be completed by induction. It is convenient to use two steps. First we prove by induction on $m$ that (b) is valid when $b(x)$ is a polynomial of degree at most $m$ and $c(x)$ is a monomial. The general case, in which $c(x)$ is a polynomial of degree at most $n$, is then obtained by induction on $n$. The reader can renew his skill in the use of mathematical induction by filling in the details of this argument.

In order to prove (c), we use induction on $n$. If $n = 1$, the statement is that if $a(x) = b(x)$, then $a'(x) = 1 \cdot b(x)^0 \cdot b'(x)$. Since $b(x)^0$ is 1 (the usual convention for the exponent zero), this identity is correct. Assume that $n > 1$, and that the derivative of $b(x)^{n-1}$ is $(n-1) \cdot b(x)^{n-2} \cdot b'(x)$. Write $b(x)^{n-1} = c(x)$. Then if $a(x) = b(x)^n = b(x) \cdot c(x)$, it follows from (b) and the induction hypothesis that
$$a'(x) = b(x)\,c'(x) + b'(x)\,c(x) = b(x) \cdot (n-1) \cdot b(x)^{n-2} \cdot b'(x) + b'(x) \cdot b(x)^{n-1}$$
$$= (n-1) \cdot b(x)^{n-1} \cdot b'(x) + b(x)^{n-1} \cdot b'(x) = n \cdot b(x)^{n-1} \cdot b'(x).$$
Therefore, the induction is complete, and Theorem 96.2 is completely proved. The reader should examine the proof of (b) very carefully, since the method is common in mathematical arguments. Our proof consists of three steps. First, it is shown that the identity is true for the simplest polynomials, that is, the monomials. Next we prove that the set of polynomials satisfying the identity is closed under addition. Finally, since every polynomial is a sum of monomials, it follows that the identity is true for all polynomials. This last step is of course a form of mathematical induction. It is possible to prove (b) by straightforward calculation, but the notation becomes unwieldy.
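The identities of Theorem 96.2 are easy to spot-check on particular polynomials. The Python sketch below is such a check and not a proof; the helper names (`trim`, `add`, `mul`, `deriv`) and the two sample polynomials with integer coefficients are choices made here for illustration. Coefficient lists are stored lowest degree first.

```python
def trim(f):
    """Drop zero leading coefficients, keeping at least one entry."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def deriv(f):
    """Formal derivative as in Definition 96.1."""
    return trim([i * c for i, c in enumerate(f)][1:] or [0])

b = [1, -2, 3]      # 3x^2 - 2x + 1
c = [4, 0, 1, 5]    # 5x^3 + x^2 + 4

# (b): the derivative of b(x)c(x) equals b(x)c'(x) + b'(x)c(x).
product_rule_holds = deriv(mul(b, c)) == add(mul(b, deriv(c)), mul(deriv(b), c))

# (c) with n = 3: the derivative of b(x)^3 equals 3 b(x)^2 b'(x).
power_rule_holds = deriv(mul(b, mul(b, b))) == mul([3], mul(mul(b, b), deriv(b)))

print(product_rule_holds, power_rule_holds)
```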
EXAMPLE 2. Let $a(x) = (x - c)^n$. Then
$$a'(x) = n \cdot (x - c)^{n-1}, \quad a''(x) = n \cdot (n-1) \cdot (x - c)^{n-2}, \quad \ldots,$$
$$a^{(n-1)}(x) = n \cdot (n-1) \cdots 2 \cdot (x - c), \quad a^{(n)}(x) = n! \cdot 1.$$
The derivative is useful for studying the multiple factors of a polynomial. For this application, we need the following formula, which is a combination of Theorem 96.2(b) and (c).
THEOREM 96.3. Let $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$, $k \ge 1$, where $c \in F$, $p_1(x), \ldots, p_k(x)$ are polynomials in $F[x]$ which are not constant, and $n_1, \ldots, n_k$ are natural numbers. Then
$$a'(x) = \sum_{j=1}^{k} n_j \cdot \frac{a(x)}{p_j(x)} \cdot p_j'(x).$$
This result is easily obtained from Theorem 96.2 by induction on $k$. We leave the details for the reader to supply.

THEOREM 96.4. Let $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$, $k \ge 1$, where
(a) $c \in F$,
(b) $p_1(x), \ldots, p_k(x)$ are distinct monic, irreducible polynomials in $F[x]$,
(c) $n_1 \ge 1, \ldots, n_k \ge 1$, and
(d) $F$ has characteristic zero.
Then the monic greatest common divisor of $a(x)$ and $a'(x)$ is
$$p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}.$$
Proof. It is immediate that $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$ divides $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$. We next observe that $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$ divides $a(x)/p_j(x)$ for $j = 1, \ldots, k$. Therefore, $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$ divides
$$\sum_{j=1}^{k} n_j \cdot \frac{a(x)}{p_j(x)} \cdot p_j'(x) = a'(x),$$
where we have used the formula for $a'(x)$ given by Theorem 96.3. Thus $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$ is a common divisor of $a(x)$ and $a'(x)$. To complete the proof, we must show that every common divisor of $a(x)$ and $a'(x)$ divides $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$. Let $f(x)$ be a common divisor of $a(x)$ and $a'(x)$. Then since $f(x) \mid a(x)$, it follows that
$$f(x) = d\,p_1(x)^{m_1} \cdots p_k(x)^{m_k},$$
where $m_1 \le n_1, \ldots, m_k \le n_k$. It is now sufficient to show that $m_1 \ne n_1, \ldots, m_k \ne n_k$, for in this case $m_1 \le n_1 - 1, \ldots, m_k \le n_k - 1$, so that $f(x)$ divides $p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}$. Assume that $m_1 = n_1$. Then $f(x) \mid a'(x)$ implies that $p_1(x)^{n_1} \mid a'(x)$. Moreover, $p_1(x)^{n_1}$ divides $a(x)/p_j(x)$ for $j = 2, \ldots, k$, so that by the formula of Theorem 96.3, $p_1(x)^{n_1}$ divides
$$n_1 \cdot \frac{a(x)}{p_1(x)} \cdot p_1'(x) = c\,p_1(x)^{n_1 - 1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}\,[n_1 p_1'(x)].$$
Hence, by the unique factorization theorem, and the fact that $p_1(x), \ldots, p_k(x)$ are distinct monic irreducible polynomials, it follows that $p_1(x) \mid n_1 p_1'(x)$. We now observe that $n_1 p_1'(x) \ne 0$. In fact, the leading coefficient of $n_1 p_1'(x)$ is $n_1 \cdot \mathrm{Deg}[p_1(x)]$ times the identity element of $F$, which is not zero because of assumption (d). Therefore, $\mathrm{Deg}[n_1 p_1'(x)] = \mathrm{Deg}[p_1(x)] - 1$. By (94.1c), this contradicts $p_1(x) \mid n_1 p_1'(x)$. This contradiction was obtained by assuming that $m_1 = n_1$. Therefore, $m_1 \ne n_1$, and similarly $m_2 \ne n_2, \ldots, m_k \ne n_k$. As we remarked above, these inequalities imply the theorem.

A special case of Theorem 96.4 is worth emphasizing.
THEOREM 96.5. If $p(x)$ is an irreducible polynomial in $F[x]$, where $F$ is a field of characteristic zero, then
$$(p(x), p'(x)) = 1.$$
Theorem 96.4 is useful for factoring certain polynomials, because the derivative, $a'(x)$, and the greatest common divisor, $(a(x), a'(x))$, can both be effectively calculated in $F[x]$.
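EXAMPLE 3. Let $a(x) = x^9 + \cdots$ be a polynomial of degree nine in $Q[x]$, with derivative $a'(x) = 9x^8 + \cdots$. Denote
$$d(x) = (a(x), a'(x)) = x^4 + 3x^3 - x^2 - 8x - 4.$$
Then $d'(x) = 4x^3 + 9x^2 - 2x - 8$, and
$$(d(x), d'(x)) = x + 2.$$
By Theorem 96.4, $(x + 2)^2 \mid d(x)$. Dividing, we find $d(x) = (x + 2)^2 (x^2 - x - 1)$. The polynomial $x^2 - x - 1$ is irreducible in $Q[x]$ (see Problem 3, Section 95). Again by Theorem 96.4, $(x + 2)^3 (x^2 - x - 1)^2$ divides $a(x)$. Dividing $a(x)$ by this product gives the remaining factor of $a(x)$.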
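Both steps of this method, differentiating and taking a greatest common divisor, can be carried out by machine. The Python sketch below is an illustration under stated assumptions: it works in $Q[x]$ with exact `Fraction` coefficients stored lowest degree first, and the function names are hypothetical. It is applied to the polynomial $d(x) = x^4 + 3x^3 - x^2 - 8x - 4$ from the example.

```python
from fractions import Fraction

def trim(f):
    """Drop zero leading coefficients, keeping at least one entry."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f

def deriv(f):
    return trim([i * c for i, c in enumerate(f)][1:] or [0])

def poly_divmod(f, g):
    """Division algorithm in Q[x]: returns (quotient, remainder)."""
    r = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g) and any(r):
        shift = len(r) - len(g)
        factor = r[-1] / g[-1]
        q[shift] = factor
        for i, c in enumerate(g):
            r[i + shift] -= factor * c
        r = trim(r)
    return trim(q), r

def monic_gcd(f, g):
    """Monic greatest common divisor by the Euclidean algorithm."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while any(g):
        f, g = g, poly_divmod(f, g)[1]
    return [c / f[-1] for c in f]

d = [-4, -8, -1, 3, 1]                      # x^4 + 3x^3 - x^2 - 8x - 4
print(monic_gcd(d, deriv(d)))               # coefficients of x + 2
print(poly_divmod(d, [4, 4, 1]))            # quotient x^2 - x - 1, remainder 0
```

Exact rational arithmetic is used deliberately: with floating-point coefficients the test for a zero remainder in the Euclidean algorithm becomes unreliable.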
1. Find tlie derivatives of thc folloving polynomials. 3x4 Ex2  x 6, in Q[x] (a) 5x5 (b) x4 2/'2 x2 1, in R[x] (c) x " ix3 (2 3i)x2 4 3 x i, in C[x] (d) xn  1, in Q[x] (e) xn  1, in Zp[x] (f) xp+l 1, in Zp[x]
+ +
+ + + +
+ +
2. Find the successive derivatives af'(x), aC3)(x), u ( ~ ) ( x )., . . ior thc polynomials given in Problem 1. In cach case, find tlie smallcst natural number nz such t h a t a(")(x) = 0. 3. Prove t h a t for any nonzcro polynomial a(x) E F[x] there is a natural numbcr m 5 Deg [a(x)] such t h a t a(")(x) = O for al1 n > m. Prove t h a t if thc z = Dcg [a(x)]. Wliat can ??z be if the characcharacteristic of P is zcro, thcn n teristic of F is a prime p? 4. Complete thc dctails of the proof of Theorcn~ 96.2(b). 5. l'rove Thcorem 96.3. 6. Tlse the metliod of Examplc 3 to factor thc follo~vingpolynomials coniyletcly in the indicated I+'[x]. (a) x5 4x4 7 ~ ' 8~ x 9 35 2, in Q[x] (b) x G + 6 x + 11x4 12x3 19x2+ 6 x + 9, inQ[x] (c) x3 ix2 x i, in C[x] (d) x4  15x2  28x  12, in R[x] (e) x3 (22/ 43)x2 (2 2 4 6 ) ~ 2 4 3 , in R[x] (f) x4 x3 x 1, in Q[x]
+ + + + + + + + + + + + + + + + + + + + +
7. Cse Theorem 96.5 to show t h a t the following pol~nomials are not irrcducible in Q[x]. (a) x4 2x3 3x2 2x 1 (b) 4x3 16x2+ 21x+ 9 (c) x 6 + x4  x2  1
8. Show that Theorem 96.4 is correct if the assumption that the characteristic of $F$ is zero is replaced by the condition that the characteristic of $F$ is a prime which is larger than $\mathrm{Deg}[a(x)]$. Give an example which shows that Theorem 96.4 may fail if the assumption (d) is omitted entirely.
9. A nonzcro polynomial a ( x ) in F[x]is said to have a muEtiple3factor if there cxists a polynomial b(x) E F [ x ] ,of positive degree, such that b ( ~ ) ~ ( a ( Prove x). t h a t if F is a field of characteristic zero, then a polynomial a ( x ) E F[x] has a multiplc factor if and only if a ( x ) and a'($) are not relatively prime. 10. Use the result of Problem 9 to prove that the following polynomials have no multiple factors in Q[x]. (a) x4 x3 x2 x 1 22  1 ( b ) x3 (e) xn  1 (d) " x 3x2 2x  4
Find the condition on a and b in ordcr that the given polynomial have a multiple factor.
97 The roots of a polynomial. We now return to our study of the solutions of algebraic equations. The work of the last five sections makes it possible to discuss this subject more critically than we did in Section 91.
DEFINITION 97.1. Let $D$ be an integral domain, and let $A$ be a commutative ring which contains $D$ as a subring. If
$$a(x) = a_0 + a_1 x + \cdots + a_n x^n \in D[x]$$
and $u \in A$, then the element $a_0 + a_1 u + \cdots + a_n u^n \in A$ is called the value of $a(x)$ for $x = u$, and is denoted by $a(u)$. The element $a(u)$ is said to be obtained by substituting $u$ for $x$ in $a(x)$. Since the representation $a(x) = a_0 + a_1 x + \cdots + a_n x^n$ is unique (by Definition 92.1), it follows that $a(u)$ is uniquely defined by Definition 97.1.

EXAMPLE 1. Let $a(x) = x^3 + 2x + 1 \in Z[x]$. Then $a(3) = 3^3 + 2(3) + 1 = 34$. Suppose that $A = Z[x]$. Then if $u = x - 1$,
$$a(u) = (x - 1)^3 + 2(x - 1) + 1 = x^3 - 3x^2 + 5x - 2.$$
The substitution process has some elementary properties which are useful.

(97.2). Let $D$ be an integral domain which is a subring of a commutative ring $A$. Let $f(x)$, $a(x)$, and $b(x)$ be in $D[x]$. Suppose that $u \in A$.
(a) If $f(x) = a(x) + b(x)$, then $f(u) = a(u) + b(u)$.
(b) If $f(x) = a(x) \cdot b(x)$, then $f(u) = a(u) \cdot b(u)$.
(c) If $f(x)$ is a constant $d$ in $D$, then $f(u) = d$.
(d) If $f(x) = a(b(x))$, that is, $f(x)$ is the polynomial obtained by substituting $b(x)$ for $x$ in $a(x)$, then $f(u) = a(b(u))$.
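Proof. The properties (a) and (c) follow directly from Definition 97.1. To prove (b), let $a(x) = \sum_i a_i x^i$ and $b(x) = \sum_j b_j x^j$, so that $f(x) = \sum_k \bigl(\sum_{i+j=k} a_i b_j\bigr) x^k$. Therefore,
$$f(u) = \sum_k \Bigl(\sum_{i+j=k} a_i b_j\Bigr) u^k = \Bigl(\sum_i a_i u^i\Bigr)\Bigl(\sum_j b_j u^j\Bigr) = a(u) \cdot b(u).$$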
The property (d) is obtained from (a), (b), and (c) by induction on the degree of $a(x)$.

DEFINITION 97.3. Let $D$ be an integral domain, and let $A$ be a commutative ring containing $D$ as a subring. Let $a(x) \in D[x]$. An element $c$ in $A$ is called a root of $a(x)$ [or a zero of $a(x)$] in $A$ if $a(c) = 0$.

The problem of finding the roots of the polynomial $a(x)$ in $A$ is exactly the same as the problem of solving the equation $a(x) = 0$ in $A$. We now restrict our attention to polynomials with coefficients in a field $F$. The results of Sections 93, 94, 95, and 96 (for example, the division algorithm, the properties of greatest common divisors, and the unique factorization theorem) can be used to obtain important information about the roots in $F$ of polynomials in $F[x]$. Since many of the theorems proved in these sections do not apply to polynomials with coefficients in an integral domain, this restriction is essential.

THEOREM 97.4. Remainder theorem. Let $F$ be a field. If $a(x) \in F[x]$ and $c \in F$, then $a(c)$ is the remainder obtained on dividing $a(x)$ by $x - c$. That is, there is a unique polynomial $q(x) \in F[x]$ such that
$$a(x) = q(x)\,(x - c) + a(c).$$
Proof. By the division algorithm, there exist unique polynomials $q(x)$ and $r(x)$ in $F[x]$ such that
$$a(x) = q(x)\,(x - c) + r(x),$$
where either $r(x) = 0$, or the degree of $r(x)$ is less than the degree of $x - c$. Since $\mathrm{Deg}[x - c] = 1$, it follows in either case that $r(x)$ is a constant $r \in F$. By (97.2), $a(c) = q(c)\,(c - c) + r = r$.
THEOREM 97.5. Factor theorem. An element $c$ in the field $F$ is a root of the polynomial $a(x) \in F[x]$ if and only if $x - c$ is a factor of $a(x)$ in $F[x]$.

Proof. By Theorem 97.4, the remainder obtained on dividing $a(x)$ by $x - c$ is $a(c)$. Therefore, $x - c$ divides $a(x)$ in $F[x]$ if and only if $a(c) = 0$.

The factor theorem is often useful when one is trying to factor a polynomial.
EXAMPLE 2. By inspection, the polynomial $x^3 + x^2 + x + 1$ has $-1$ as a root. Thus, by Theorem 97.5, $x - (-1) = x + 1$ is a factor of $x^3 + x^2 + x + 1$. Dividing, we find
$$x^3 + x^2 + x + 1 = (x^2 + 1)(x + 1).$$
The polynomial $x^2 + 1$ is irreducible in $Q[x]$ and $R[x]$, because otherwise it would have a real root, by Theorem 97.5. Thus, $(x^2 + 1)(x + 1)$ is the complete factorization of $x^3 + x^2 + x + 1$ in $Q[x]$ and $R[x]$. However, in $C[x]$, $x^2 + 1 = (x + i)(x - i)$. Thus,
$$x^3 + x^2 + x + 1 = (x + i)(x - i)(x + 1)$$
is a complete factorization in $C[x]$.
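The remainder and factor theorems together describe the familiar process of synthetic division: dividing by $x - c$ produces the quotient and, as the remainder, the value $a(c)$. The Python sketch below illustrates this with Horner's scheme. The function name `divide_by_linear` is an assumption made here, and, unlike a lowest-degree-first convention, the coefficients are listed with the highest degree first, as in hand computation.

```python
def divide_by_linear(coeffs, c):
    """Divide a(x) by x - c using synthetic division (Horner's scheme).

    coeffs lists the coefficients of a(x), highest degree first.
    Returns (quotient coefficients, remainder); by the remainder
    theorem the remainder equals a(c).
    """
    acc = 0
    row = []
    for a in coeffs:
        acc = acc * c + a
        row.append(acc)
    return row[:-1], row[-1]

a = [1, 1, 1, 1]                  # x^3 + x^2 + x + 1

# c = -1 is a root: the remainder is 0 and the quotient is x^2 + 1.
print(divide_by_linear(a, -1))    # ([1, 0, 1], 0)

# c = 2 is not a root: the remainder is a(2) = 8 + 4 + 2 + 1 = 15.
print(divide_by_linear(a, 2))     # ([1, 3, 7], 15)
```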
Using the factor theorem and the unique factorization theorem, we can now prove one of the most useful general theorems about the roots of polynomials.

THEOREM 97.6. Let $F$ be a field, and let $a(x) \in F[x]$ be a nonzero polynomial of degree $n$. Then $a(x)$ has at most $n$ distinct roots in $F$. If $c_1, c_2, \ldots, c_k$ are all of the different roots of $a(x)$ in $F$, then
$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_k)^{m_k}\,b(x),$$
where $m_1, m_2, \ldots,$ and $m_k$ are natural numbers, and $b(x)$ is a nonzero polynomial in $F[x]$ which has no roots in $F$.

Proof. If $c \in F$ is a root of $a(x)$, then $x - c$ is a monic irreducible factor of $a(x)$ in $F[x]$, by Theorem 97.5. By the unique factorization theorem and Theorem 93.2(a), $a(x)$ has at most $n$ different monic irreducible factors in $F[x]$. Therefore, $a(x)$ has at most $n$ distinct roots in $F$. Let these be $c_1, c_2, \ldots, c_k$. Then $x - c_1, x - c_2, \ldots, x - c_k$ must occur among the irreducible factors in the complete factorization of $a(x)$ in $F[x]$. Thus we can write
$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_k)^{m_k}\,b(x),$$
where $m_1 \ge 1, m_2 \ge 1, \ldots, m_k \ge 1$, and $b(x)$ is a product of irreducible polynomials which are different from $x - c_1, x - c_2, \ldots,$ and $x - c_k$. If $b(x)$ had a root $c$ in $F$, then $a(c) = 0$, so that $c$ would be one of $c_1, c_2, \ldots,$ or $c_k$. By the factor theorem, this would imply that $(x - c_j) \mid b(x)$ for some $j$. This would contradict the fact that $b(x)$ is the product of all the irreducible factors of $a(x)$ which are different from $x - c_1, x - c_2, \ldots,$ and $x - c_k$. Therefore, $b(x)$ has no root in $F$.
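THEOREM 97.7. Let $a(x) \in F[x]$ be a monic polynomial of degree $n$ which has $n$ distinct roots $c_1, c_2, \ldots, c_n$ in $F$. Then
$$a(x) = (x - c_1)(x - c_2) \cdots (x - c_n).$$
Proof. By Theorem 97.6,
$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_n)^{m_n}\,b(x),$$
where $m_1, m_2, \ldots, m_n$ are greater than zero. Taking the degrees on both sides, we obtain from Theorem 93.2(a),
$$n = \mathrm{Deg}[a(x)] = \mathrm{Deg}[(x - c_1)^{m_1}] + \cdots + \mathrm{Deg}[(x - c_n)^{m_n}] + \mathrm{Deg}[b(x)] = m_1 + m_2 + \cdots + m_n + \mathrm{Deg}[b(x)].$$
Since $m_1, m_2, \ldots, m_n$ are natural numbers, this equality implies that $m_1 = m_2 = \cdots = m_n = 1$, and $\mathrm{Deg}[b(x)] = 0$. Hence, $b(x)$ is a nonzero constant, and since $a(x)$ is monic this constant must be 1. Thus,
$$a(x) = (x - c_1)(x - c_2) \cdots (x - c_n).$$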
A root $c \in F$ of the polynomial $a(x) \in F[x]$ is said to have multiplicity $m$, or to be an $m$-fold root of $a(x)$, if $(x - c)^m$ divides $a(x)$, but $(x - c)^{m+1}$ does not divide $a(x)$ in $F[x]$. Thus, $c$ is a root of multiplicity $m$ of $a(x)$ if
$$a(x) = (x - c)^m\,b(x),$$
where $b(x) \in F[x]$ is a polynomial such that $b(c) \ne 0$. Roots of multiplicity one are usually called simple roots. Roots of multiplicity two or more are called multiple roots. If the field $F$ has characteristic zero, then it follows from Theorem 96.4 that a root of multiplicity $m > 1$ of $a(x)$ is a root of multiplicity $m - 1$ of $a'(x)$, and a simple root of $a(x)$ is not a root of $a'(x)$.
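The definition of multiplicity suggests a direct computation: divide by $x - c$ as long as the remainder is zero and count the divisions. The Python sketch below does this for polynomials with integer or rational coefficients, listed highest degree first. The function names and the sample polynomial $(x - 1)^3 (x + 2) = x^4 - x^3 - 3x^2 + 5x - 2$ are choices made here for illustration.

```python
def divide_by_linear(coeffs, c):
    """Synthetic division by x - c; coeffs are highest degree first."""
    acc = 0
    row = []
    for a in coeffs:
        acc = acc * c + a
        row.append(acc)
    return row[:-1], row[-1]

def multiplicity(coeffs, c):
    """Largest m such that (x - c)^m divides a(x); a(x) must be nonzero."""
    assert any(coeffs), "the zero polynomial has no well-defined multiplicity"
    m = 0
    while coeffs:
        quotient, remainder = divide_by_linear(coeffs, c)
        if remainder != 0:
            break
        m += 1
        coeffs = quotient
    return m

a = [1, -1, -3, 5, -2]        # (x - 1)^3 (x + 2)
a_prime = [4, -3, -6, 5]      # its derivative, 4x^3 - 3x^2 - 6x + 5

print(multiplicity(a, 1), multiplicity(a, -2), multiplicity(a, 0))   # 3 1 0
# In characteristic zero the multiplicity drops by one in the derivative,
# and a simple root of a(x) is not a root of a'(x).
print(multiplicity(a_prime, 1), multiplicity(a_prime, -2))           # 2 0
```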
EXAMPLE 3. Let us find the roots in C of x7 with the multiplicities of each root. We have
The roots of x2
They are
Therefore,
+ + i(d3/2), and +
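THEOREM 97.8. Let $p$ be a prime. Suppose that
$$a(x) = a_0 + a_1 x + \cdots + a_n x^n$$
is a polynomial with integral coefficients, such that $a_n \not\equiv 0 \pmod p$. Then there are at most $n$ integers $d$ which are incongruent modulo $p$, and satisfy $a(d) \equiv 0 \pmod p$.

If $a(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $b(x) = b_0 + b_1 x + \cdots + b_n x^n$ are polynomials with integral coefficients, it is customary to write $a(x) \equiv b(x) \pmod p$ if
$$a_0 \equiv b_0 \pmod p, \quad a_1 \equiv b_1 \pmod p, \quad \ldots, \quad \text{and} \quad a_n \equiv b_n \pmod p.$$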
It is clear from Theorem 56.3, which gives the properties of congruence, that $a(x) \equiv b(x) \pmod p$ implies $a(d) \equiv b(d) \pmod p$ for any integer $d$, and $d \equiv e \pmod p$ implies $a(d) \equiv a(e) \pmod p$ for any $a(x) \in Z[x]$. Because of these two observations, the study of the congruence modulo $p$ of polynomials in $Z[x]$ is equivalent to the study of polynomials with coefficients in the field $Z_p$. Theorem 97.8 is simply a reinterpretation of Theorem 97.6 from this new viewpoint. So that the reader can get a better understanding of the method of translating theorems about the field $Z_p$ into statements about congruences modulo $p$, we will give the proof of Theorem 97.8 in full detail.

Suppose that $d_1, d_2, \ldots, d_k$ are integers such that $d_i \not\equiv d_j \pmod p$ for $i \ne j$, and $a(d_i) \equiv 0 \pmod p$ for all $i$. We must show that $k \le n$. Let $b_0, b_1, \ldots, b_n$ be the remainders obtained on dividing $a_0, a_1, \ldots, a_n$, respectively, by $p$. Let $e_1, e_2, \ldots, e_k$ be the remainders on dividing $d_1, d_2, \ldots, d_k$ by $p$. That is,
$$a_j \equiv b_j \pmod p, \quad 0 \le b_j < p, \qquad \text{and} \qquad d_i \equiv e_i \pmod p, \quad 0 \le e_i < p,$$
for $0 \le j \le n$ and $1 \le i \le k$.
Then $a(x) \equiv b(x) \pmod p$, and $b(e_i) \equiv 0 \pmod p$. The integers $b_0, b_1, \ldots, b_n$, and $e_1, e_2, \ldots, e_k$ can be regarded as elements of the field $Z_p$. Thus, $b(x)$ can be considered as a polynomial with coefficients in $Z_p$. Since $a_n \not\equiv 0 \pmod p$, the leading coefficient $b_n$ of $b(x)$ is not zero. Therefore, $\mathrm{Deg}[b(x)] = n$. Note that the addition and multiplication operations of $Z_p$ are different from the operations in $Z$, so that the result of substituting $e_i$ into $b(x)$ when $e_i$ is thought of as an element of $Z_p$, and $b(x)$ is considered as belonging to $Z_p[x]$, will be different from the result obtained when $e_i$ is taken as an integer, and $b(x)$ as a polynomial with integral coefficients. In $Z_p$, the value $b(e_i)$ is the remainder obtained on dividing the integer
$$b_0 + b_1 e_i + b_2 e_i^2 + \cdots + b_n e_i^n$$
by $p$. However, by the choice of the $e_i$,
$$b_0 + b_1 e_i + b_2 e_i^2 + \cdots + b_n e_i^n \equiv 0 \pmod p.$$
Hence $b(e_i) = 0$ in $Z_p$ for $i = 1, 2, \ldots, k$. Since $d_i \not\equiv d_j \pmod p$ for $i \ne j$, the elements $e_1, e_2, \ldots, e_k$ are distinct. Therefore, $k \le n$, by Theorem 97.6.
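EXAMPLE 4. We illustrate the proof of Theorem 97.8 by an example. Let $a(x) = x^3 - x^2 + x + 9$. Then
$$a(x) \equiv b(x) = x^3 + 4x^2 + x + 4 \pmod 5.$$
In $Z_5$ we have $b(0) = 4$, $b(1) = 0$, $b(2) = 0$, $b(3) = 0$, and $b(4) = 1$. Thus, $b(x)$ has roots 1, 2, and 3 in $Z_5$. Since $Z_5$ is a field and $\mathrm{Deg}[b(x)] = 3$, the polynomial $b(x)$ cannot have more than three roots. Returning to the original polynomial $a(x)$, we see that
$$a(1) \equiv 0 \pmod 5, \quad a(2) \equiv 0 \pmod 5, \quad a(3) \equiv 0 \pmod 5,$$
and if $d$ is any integer such that $a(d) \equiv 0 \pmod 5$, then $d \equiv 1 \pmod 5$, $d \equiv 2 \pmod 5$, or else $d \equiv 3 \pmod 5$.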
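Since only the residues $0, 1, \ldots, p - 1$ need to be tried, the solutions of a polynomial congruence can be found by direct search. The Python sketch below is an illustration; the function name `roots_mod` is an assumption made here, and coefficients are listed highest degree first. The last call uses the composite modulus 8 to show that the bound of Theorem 97.8 depends on $p$ being a prime.

```python
def roots_mod(coeffs, m):
    """Residues d in {0, ..., m-1} with a(d) congruent to 0 modulo m.

    coeffs lists the integer coefficients of a(x), highest degree first.
    """
    roots = []
    for d in range(m):
        value = 0
        for a in coeffs:
            value = (value * d + a) % m   # Horner's scheme, reduced at each step
        if value == 0:
            roots.append(d)
    return roots

# a(x) = x^3 - x^2 + x + 9 and its reduction b(x) = x^3 + 4x^2 + x + 4
# have the same roots modulo 5.
print(roots_mod([1, -1, 1, 9], 5))   # [1, 2, 3]
print(roots_mod([1, 4, 1, 4], 5))    # [1, 2, 3]

# Modulo 8, which is not a prime, x^2 - 1 has four incongruent roots.
print(roots_mod([1, 0, -1], 8))      # [1, 3, 5, 7]
```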
Although at first glance it seems somewhat trivial, Theorem 97.8 is a powerful tool in number theory. To support this statement, we digress from our study of the theory of equations and use Theorem 97.8 to prove the fact, mentioned in Section 58, that if $p$ is a prime, then there are $\varphi(\varphi(p)) = \varphi(p - 1)$ primitive roots modulo $p$ among the numbers $1, 2, \ldots, p - 1$. The reader who is not familiar with the material in Sections 16 and 58 can pass on to the next section. Recall that if $a$ is an integer prime to $p$, then the order of $a$ modulo $p$ is the smallest natural number $d$ such that $a^d \equiv 1 \pmod p$. By Theorem 58.9, the order $d$ of $a$ modulo $p$ is a divisor of $p - 1$, and $a$ is called a primitive root modulo $p$ if its order is $p - 1$. The desired result is a special case of the following theorem.
THEOREM 97.9. Let $p$ be a prime. Suppose that $d \mid p - 1$. Then among the numbers $1, 2, \ldots, p - 1$, there are exactly $\varphi(d)$ integers which have order $d$ modulo $p$.

The proof is carried out in three stages. Only the first step uses Theorem 97.8.

(1) Among the integers of the set $S = \{1, 2, \ldots, p - 1\}$ there are exactly $d$ which satisfy $x^d - 1 \equiv 0 \pmod p$.

Proof. Since $d \mid p - 1$, we have
$$x^{p-1} - 1 = (x^d - 1)(1 + x^d + x^{2d} + \cdots + x^{kd}), \quad \text{with } k = [(p - 1)/d] - 1.$$
By Fermat's theorem, the congruence $x^{p-1} - 1 \equiv 0 \pmod p$ has $p - 1$ solutions in $S$. By Theorem 97.8, the congruence $1 + x^d + x^{2d} + \cdots + x^{kd} \equiv 0 \pmod p$ has at most $kd = p - 1 - d$ solutions in $S$. Therefore,
$$x^d - 1 \equiv 0 \pmod p$$
must have at least $d$ solutions in $S$. On the other hand, by Theorem 97.8, there can be at most $d$ solutions of $x^d - 1 \equiv 0 \pmod p$ in the set $S$.

(2) To obtain Theorem 97.9 from the result (1), we will use induction on $d$. To carry out this induction, an important identity is needed:
$$\sum_{e \mid d} \varphi(e) = d,$$
that is, the sum of $\varphi(e)$ over all natural numbers $e$ which divide $d$ (including 1 and $d$) is exactly equal to $d$.
Proof. Let $T = \{1, 2, \ldots, d\}$. For each divisor $e$ of $d$, define $T_e = \{k \in T \mid (d, k) = e\}$. Then each number $k \in T$ belongs to exactly one of the sets $T_e$, with $e \mid d$, that is, $T$ is the union of the pairwise disjoint collection $\{T_e \mid e \text{ divides } d\}$. Hence, by Theorem 16.4,
$$d = |T| = \sum_{e \mid d} |T_e|.$$
In order to determine $|T_e|$, the number of elements in $T_e$, note that $k$ belongs to $T_e$ if and only if $(d, k) = e$, and that $(d, k) = e$ is equivalent to $e \mid k$ and $(d/e, k/e) = 1$. Hence, there is a one-to-one correspondence between $T_e$ and the set $\{m \in Z \mid 1 \le m \le d/e,\ (d/e, m) = 1\}$, given by
$$k \leftrightarrow k/e.$$
Therefore, $|T_e| = |\{m \in Z \mid 1 \le m \le d/e,\ (d/e, m) = 1\}| = \varphi(d/e)$, by the definition of the totient, Definition 58.5. Consequently, since $d/e$ ranges over all divisors of $d$ as $e$ does,
$$d = \sum_{e \mid d} \varphi(d/e) = \sum_{e \mid d} \varphi(e).$$
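(3) We can now prove Theorem 97.9. There is exactly one natural number $a$ in the set $S = \{1, 2, \ldots, p - 1\}$ which has order 1 modulo $p$, namely, $a = 1$. Hence, the theorem is true for $d = 1$. We can therefore make the induction hypothesis that if $e \mid p - 1$ and $e < d$, then there are exactly $\varphi(e)$ integers in $S$ which have order $e$ modulo $p$. For each divisor $e$ of $d$, define $S_e = \{a \in S \mid a \text{ has order } e \text{ modulo } p\}$. It is obvious that the collection $\{S_e \mid e \text{ divides } d\}$ is pairwise disjoint. By Theorem 58.9,
$$\bigcup \{S_e \mid e \text{ divides } d\} = \{a \in S \mid a^d - 1 \equiv 0 \pmod p\}.$$
Hence, by (1) and the induction hypothesis,
$$d = \sum_{e \mid d} |S_e| = |S_d| + \sum_{e \mid d,\ e < d} |S_e| = |S_d| + \sum_{e \mid d,\ e < d} \varphi(e).$$
By (2), $d = \varphi(d) + \sum_{e \mid d,\ e < d} \varphi(e)$. Therefore,
$$|S_d| + \sum_{e \mid d,\ e < d} \varphi(e) = \varphi(d) + \sum_{e \mid d,\ e < d} \varphi(e).$$
Consequently,
$$|S_d| = \varphi(d).$$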
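Both the identity in (2) and the statement of Theorem 97.9 can be checked numerically for small values. The Python sketch below is such a check, not a proof; the function names are hypothetical, and the totient is computed naively from its definition as a count of integers relatively prime to $n$.

```python
from math import gcd

def totient(n):
    """Number of m with 1 <= m <= n and (m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

def order_mod(a, p):
    """Smallest natural number k with a^k congruent to 1 modulo p.

    Assumes that a is prime to p, so that such a k exists.
    """
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def order_counts(p):
    """Map each order d to the number of a in {1, ..., p-1} of order d."""
    counts = {}
    for a in range(1, p):
        d = order_mod(a, p)
        counts[d] = counts.get(d, 0) + 1
    return counts

# For p = 13 the orders are the divisors of 12, and each divisor d
# occurs exactly totient(d) times; there are totient(12) = 4 primitive roots.
print(order_counts(13))

# The identity of step (2): the totients of the divisors of d add up to d.
print(all(sum(totient(e) for e in range(1, d + 1) if d % e == 0) == d
          for d in range(1, 31)))
```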
1. FVithout actual division, find the remainder when (a) x3 22  4 is divided by x  1 ; (b) x25 14x1 24 is divided by x 1 ; (c) x5 12x4 13x2 x 27 is divided by x 3.
+ +
2. Completely factor the following polynomials in C[x]. (a) x2 ix 2 (b) xs  1 x2 1 (c) 2 4 (d) X~  2 ~ 3 5 ~2 zx 24 (e) x3  2 (f) x3  5x2  9 x + 12
+ + + +
4. Find all monic fifth-degree polynomials $f(x)$ in $C[x]$ such that (a) $f(x)$ has $i$ as a root of multiplicity four; (b) $f(x)$ has 0, 1, 2, and 3 as simple roots; (c) $f(x)$ has 1 and $i$ as roots of multiplicity two; (d) $f(x)$ has $i$ and $-i$ as simple roots and 1 as a root of multiplicity two.
5. Show that the sum of the multiplicities of the roots of a polynomial $a(x) \in F[x]$ is less than or equal to the degree of $a(x)$.
6. Show that if $a(x)$ and $b(x)$ are polynomials of degree less than $n$ in $F[x]$, and if $a(d_i) = b(d_i)$ for $i = 1, 2, \ldots, n$, where $d_1, d_2, \ldots, d_n$ are distinct elements of $F$, then $a(x) = b(x)$ in $F[x]$.
7. Let a(x)
xP  x
8. Prove Taylor's theorem: I f f(x) is a polynomial of degree n in F[x], where F has characteristic zero, and if c is any element of F, then
11. Show that if $f(x) = ax^2 + bx + c \in F[x]$, $a \ne 0$, then either (a) $f(x)$ has two distinct roots in $F$, (b) $f(x)$ has one root of multiplicity two in $F$, or (c) $f(x)$ is irreducible in $F[x]$.
12. Show that in $Z_p[x]$ there are exactly $\frac{1}{2} p(p - 1)$ polynomials of the form $x^2 + ax + b$ which are irreducible. [Hint: Show that there are $\frac{1}{2} p(p + 1)$ polynomials of this form which are reducible.]
13. Let $a(x)$ be a monic polynomial of degree $n$ in $Z[x]$. Suppose that $p$ is a prime. Assume that $d_1, d_2, \ldots, d_n$ are integers such that $d_i \not\equiv d_j \pmod p$ if $i \ne j$, and $a(d_i) \equiv 0 \pmod p$ for all $i$. Prove that
$$a(x) \equiv (x - d_1)(x - d_2) \cdots (x - d_n) \pmod p.$$
14. Use Problem 13 and Fermat's theorem to show that if p is a prime, then xpl

= 1 (modp).
98 The fundamental theorem of algebra. We come now to what is probably the most important result in the theory of equations.

THEOREM 98.1. The fundamental theorem of algebra. If $f(x) \in C[x]$ is a nonzero polynomial with $\mathrm{Deg}[f(x)] \ge 1$, then $f(x)$ has at least one root in $C$.

This theorem was surmised as early as the sixteenth century. Several incorrect proofs of it were published before a satisfactory proof was found by Gauss in 1797. Gauss ultimately gave five different proofs of the fundamental theorem of algebra, each of which introduced new ideas and methods which have greatly influenced the development of mathematics. Of course, many other proofs of this theorem have been discovered since Gauss's time. Unfortunately, all of the known paths from elementary mathematical principles to Theorem 98.1 are quite long. We will not try to give a proof in this section. The reader who is interested in seeing a complete and correct proof can study Appendix 3 of this book, after he has read the remainder of this chapter.

It is possible for us to give a geometrical argument which shows that the fundamental theorem of algebra is plausible. Let $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 \in C[x]$, where $a_n \ne 0$ and $n \ge 1$. Since every root of
If a complex number $z$ is substituted for $x$ in $f(x)$, then we obtain a complex number $f(z)$. We interpret the numbers $z$ and $f(z)$ as points in the complex plane. As $z$ ranges over a circle of radius $r$ with center at the origin $O$ of the complex plane, the corresponding point $f(z)$ describes a closed curve $C_r$. Figure 9-1 shows the curves $C_{1/4}$, $C_{1/2}$, $C_1$, and $C_{3/2}$ for a particular polynomial $f(x)$. If $r = 0$, then $C_r$ is not a curve, but instead it is the point $a_0$, and for small positive values of $r$, $C_r$ lies very close to this point. In particular, for sufficiently small values of $r$, $C_r$ does not enclose the origin of the complex plane, because $a_0 \ne 0$. If $r$ is very large, the curve $C_r$ is approximated by the curve $C_r'$ corresponding to the polynomial $x^n$, since for values of $z$ which have large absolute value, the term $z^n$ in $f(z)$ dominates the sum $a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$ of the remaining terms. If $z = r(\cos\theta + i \sin\theta)$, then $z^n = r^n(\cos n\theta + i \sin n\theta)$ (see Section 84). Thus, $C_r'$ is a circle of radius $r^n$ which is traversed $n$ times as $z$ circles the origin once. From this observation, it follows that for large $r$, $C_r$ is a curve which encircles the origin of the complex plane $n$ times and lies relatively close to the circle with center $O$ and radius $r^n$.

As $r$ increases from small to large values, $C_r$ is deformed from a curve which does not enclose the origin into one which encircles the origin $n$ times. The reader should try to visualize this deformation process in Fig. 9-1. It is geometrically evident that at some stage of this deformation process, the corresponding curve must pass through $O$. That is, there exists an $r > 0$ such that $C_r$ passes through $O$. By definition of $C_r$, this means that for some complex number $z$ with $|z| = r$, the value of $f(x)$ for $x = z$ is 0. Thus, $z$ is the desired root of $f(x)$.

It is possible to make this intuitive argument into a valid proof of the fundamental theorem of algebra by giving exact definitions of the geometrical concept of a curve, of the deformation of one curve into another, and of the idea of a curve enclosing a point. In addition, it is necessary to establish some properties of these notions which seem obvious, but turn out to be very difficult to prove. To carry out this program would require a fairly deep penetration into the field of geometry which mathematicians call topology. Since our main interest in this book is algebra, we will not pursue this topic.

We now examine some of the consequences of the fundamental theorem of algebra.

THEOREM 98.2. The irreducible polynomials in $C[x]$ are exactly the polynomials of degree one. Hence, every polynomial $a(x) \in C[x]$ of positive degree can be written in the form
$$a(x) = b\,(x - c_1)(x - c_2) \cdots (x - c_n),$$
where b is a nonzero complex number, $c_1, c_2, \ldots, c_n$ are all of the roots of a(x) in C (possibly with repetitions), and n = Deg [a(x)]. This factorization of a(x) is unique up to the order of the factors.

Proof. Suppose that p(x) is an irreducible polynomial in C[x]. By Definition 9-5.1, $p(x) \neq 0$ and Deg [p(x)] > 0. Therefore, by the fundamental theorem, p(x) has a root $c \in C$. By the factor theorem, x − c divides p(x) in C[x]. Thus, $x - c = b\cdot p(x)$ for some $b \neq 0$ in C (by Definition 9-5.1), so that $p(x) = b^{-1}(x - c)$ has degree one. Since polynomials of degree one are always irreducible (see Example 2, Section 9-5), this proves the first statement of Theorem 9-8.2. The second statement is a consequence of the unique factorization theorem, taking into account what we have just shown.

The reader should bear in mind that since $Z \subset Q \subset R \subset C$, polynomials with coefficients in Z, Q, or R are polynomials in C[x], and therefore they have roots in C. This observation leads to the characterization of the
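irreducible polynomials in R[x]. First we need an important property of the complex roots of real polynomials.

THEOREM 9-8.3. Let $f(x) \in R[x] \subset C[x]$. If c = a + ib is a complex number which is a root of f(x), then the complex conjugate $\bar c = a - ib$ of c is also a root of f(x).

Of course, it may happen that c itself is real, in which case $\bar c = c$. In this case, the theorem is trivial. To prove this theorem, let
\[ f(x) = a_0 + a_1x + \cdots + a_nx^n, \quad\text{so that}\quad a_0 + a_1c + \cdots + a_nc^n = f(c) = 0. \]
Taking the complex conjugate of the left-hand side of this equation, we obtain from Theorem 8-2.2
\[ \bar a_0 + \bar a_1\bar c + \cdots + \bar a_n\bar c^{\,n} = \bar 0 = 0. \]
Since $a_0, a_1, \ldots, a_n$ are real, it follows that $\bar a_0 = a_0$, $\bar a_1 = a_1$, ..., and $\bar a_n = a_n$. Therefore,
\[ f(\bar c) = a_0 + a_1\bar c + \cdots + a_n\bar c^{\,n} = 0. \]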
Thus, $\bar c$ is a root of f(x).

THEOREM 9-8.4. The irreducible polynomials in R[x] are exactly the polynomials of degree one and the polynomials
\[ ax^2 + bx + c \]
with a, b, and c real, and $b^2 - 4ac < 0$. Hence, every polynomial $a(x) \in R[x]$ of positive degree can be written in the form
\[ a(x) = b(x - c_1)(x - c_2)\cdots(x - c_r)\,d_1(x)\,d_2(x)\cdots d_s(x), \]
where b is a nonzero real number, $c_1, c_2, \ldots, c_r$ are all of the roots of a(x) in R (possibly with repetitions), and $d_1(x), d_2(x), \ldots, d_s(x)$ are quadratic polynomials in R[x] which have no real roots.

Proof. Suppose that p(x) is an irreducible polynomial in R[x]. Then $p(x) \neq 0$ and Deg [p(x)] > 0. By Theorem 9-8.1, there is a complex number z such that p(z) = 0. If z is real, then x − z divides p(x) in R[x],
so that p(x) has degree one, as in the proof of Theorem 9-8.2. Therefore, suppose that z is not real. By Theorem 9-8.3, $\bar z$ is a root of p(x), and $\bar z \neq z$. Let
\[ d(x) = (x - z)(x - \bar z) = x^2 - (z + \bar z)x + z\bar z. \]
By Theorem 8-2.2(f), $z + \bar z = 2\mathcal{R}(z)$ is real, and by Theorem 8-2.4(b), $z\bar z = |z|^2$ is real. Therefore, $d(x) \in R[x]$. By the division algorithm, we can write
\[ p(x) = q(x)\cdot d(x) + r(x), \]
where q(x) and r(x) are in R[x], and either r(x) = 0 or Deg [r(x)] < Deg [d(x)] = 2. Since
\[ r(z) = p(z) - q(z)\,d(z) = 0 - q(z)\cdot 0 = 0 \]
and
\[ r(\bar z) = p(\bar z) - q(\bar z)\,d(\bar z) = 0 - q(\bar z)\cdot 0 = 0, \]
it follows that r(x) must be the zero polynomial. Indeed, otherwise the number of roots of r(x) would exceed Deg [r(x)], which is impossible by Theorem 9-7.6. Thus, d(x) divides p(x) in R[x]. Since p(x) is irreducible,
\[ p(x) = a\cdot d(x) = ax^2 + bx + c, \]
where a is some nonzero real number, and $b = -a(z + \bar z)$, $c = a\,z\bar z$ are also real. Moreover, by Theorem 8-2.2(f),
\[ b^2 - 4ac = a^2\left[(z + \bar z)^2 - 4z\bar z\right] = a^2(z - \bar z)^2 = a^2\left(2i\,\mathcal{I}(z)\right)^2 = -4a^2\,\mathcal{I}(z)^2 < 0, \]
since $a \neq 0$ and $\mathcal{I}(z) \neq 0$ (because z is not real). This shows that every irreducible polynomial in R[x] is either linear or of the form $ax^2 + bx + c$ with $b^2 - 4ac < 0$. Conversely, all such polynomials are irreducible in R[x] (see Problem 3, Section 9-5). The last statement of Theorem 9-8.4 is a specialization of the unique factorization theorem to the ring of polynomials with real coefficients.
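The construction used in the proof of Theorem 9-8.4 is easy to try numerically. The following is only an illustrative sketch in Python; the function names `real_quadratic_from_root` and `discriminant` are chosen here for illustration and do not come from the text. Given a nonreal number z, it forms the coefficients of $d(x) = (x - z)(x - \bar z)$ and checks that the discriminant is negative.

```python
def real_quadratic_from_root(z: complex) -> tuple[float, float, float]:
    """Coefficients (a, b, c) of d(x) = (x - z)(x - conj(z)).

    d(x) = x^2 - (z + conj z) x + z * conj z = x^2 - 2 Re(z) x + |z|^2,
    so all three coefficients are real.
    """
    return (1.0, -2.0 * z.real, z.real ** 2 + z.imag ** 2)


def discriminant(a: float, b: float, c: float) -> float:
    """b^2 - 4ac for the quadratic a x^2 + b x + c."""
    return b * b - 4.0 * a * c


# Example: z = 1 + i gives d(x) = x^2 - 2x + 2.
a, b, c = real_quadratic_from_root(complex(1, 1))
print(a, b, c)
print(discriminant(a, b, c))  # equals -4 * Im(z)^2, hence negative
```

For z = 1 + i the sketch produces $x^2 - 2x + 2$, which is the quadratic factor that appears in Example 1 below.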
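EXAMPLE 1. The knowledge of a single root of a polynomial often simplifies the task of finding the remaining roots. For instance, if we are given that 1 + i is a root of $x^4 - 4x^3 + 5x^2 - 2x - 2$, then it follows from Theorem 9-8.3 that 1 − i is also a root. Dividing $x^4 - 4x^3 + 5x^2 - 2x - 2$ by
\[ [x - (1 + i)][x - (1 - i)] = x^2 - 2x + 2, \]
we obtain $x^2 - 2x - 1$. By the quadratic formula, the roots of this polynomial are
\[ \tfrac{1}{2}\left(2 + \sqrt{4 + 4}\right) = 1 + \sqrt{2} \quad\text{and}\quad \tfrac{1}{2}\left(2 - \sqrt{4 + 4}\right) = 1 - \sqrt{2}. \]
Thus, the roots of $x^4 - 4x^3 + 5x^2 - 2x - 2$ are 1 + i, 1 − i, $1 + \sqrt{2}$, and $1 - \sqrt{2}$.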
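The division carried out in Example 1 can be checked mechanically. The sketch below is an illustration in Python, not part of the original text; the helper name `poly_divmod` and the convention of listing coefficients from the highest power down are assumptions made here. It uses exact rational arithmetic so that the remainder is exactly zero.

```python
from fractions import Fraction


def poly_divmod(num, den):
    """Long division of polynomials given as coefficient lists,
    highest power first. Returns (quotient, remainder).
    The leading coefficient of `den` must be nonzero."""
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(rem) >= len(den):
        factor = rem[0] / den[0]          # next quotient coefficient
        quotient.append(factor)
        for i, d in enumerate(den):       # subtract factor * den
            rem[i] -= factor * d
        rem.pop(0)                        # leading term is now zero
    return quotient, rem


# x^4 - 4x^3 + 5x^2 - 2x - 2 divided by x^2 - 2x + 2
q, r = poly_divmod([1, -4, 5, -2, -2], [1, -2, 2])
print(q)  # coefficients of the quotient x^2 - 2x - 1
print(r)  # zero remainder
```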
EXAMPLE 2. Sometimes it is necessary to determine a polynomial from the knowledge of its roots. If the polynomial belongs to C[x] and the leading coefficient, and all of the complex roots, together with their multiplicities, are given, then Theorem 9-8.2 solves this problem. For example, the monic polynomial which has i as a double root, 1 + i as a simple root, and 1 as a simple root is
\[ (x - i)^2[x - (1 + i)](x - 1) = x^4 - (2 + 3i)x^3 - (2 - 5i)x^2 + (4 - i)x - (1 + i). \]
Very often in such problems, the information about the roots is incomplete, so that it is necessary to use other data. For example, suppose that we wish to find every real, cubic polynomial a(x) with leading coefficient 1 and constant term 1, which has i as one of its roots. Since a(x) is to have real coefficients and i is a root, it follows from Theorem 9-8.3 that −i is also a root. Let z be the remaining root. Then
\[ x^3 + bx^2 + cx + 1 = a(x) = (x - i)(x + i)(x - z) = x^3 - zx^2 + x - z. \]
Therefore, b = −z, c = 1, 1 = −z. Thus, the only polynomial with the required property is $x^3 + x^2 + x + 1$. Of course it can also be seen that z = −1 by observing that the product of the roots of a cubic polynomial is equal to the negative of the constant term divided by the leading coefficient.
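The expansions in Example 2 can be reproduced by multiplying out the linear factors one at a time. The following Python sketch is offered only as an illustration; the function name `poly_from_roots` and the coefficient ordering (highest power first) are assumptions made here, not notation from the text.

```python
def poly_from_roots(roots):
    """Coefficients (highest power first) of the monic polynomial
    (x - r1)(x - r2)...(x - rn), roots listed with multiplicity."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r):
        # shift by one power of x, then subtract r times the old coefficients.
        new = coeffs + [0]
        for i, c in enumerate(coeffs):
            new[i + 1] -= r * c
        coeffs = new
    return coeffs


# i as a double root, 1 + i and 1 as simple roots
print(poly_from_roots([1j, 1j, 1 + 1j, 1]))

# the real cubic with roots i, -i, and -1
print(poly_from_roots([1j, -1j, -1]))
```

The first call gives the coefficients 1, −(2 + 3i), −(2 − 5i), 4 − i, −(1 + i), and the second gives 1, 1, 1, 1, in agreement with the two polynomials found in Example 2.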
1. Using Fig. 9-1, estimate roughly the absolute values of the roots of the polynomial $x^3 + x + i$.

2. Find all of the roots of the following polynomials, making use of the given data.
(a) $x^3 - 6x^2 - 24x - 160$, one root of which is $-2 - 2\sqrt{3}\,i$.
(b) $x^3 + (1 - 2i)x^2 - (1 + 2i)x - 1$, which has a double root.
(c) $x^5 - 3x^4 + 4x^3 - 4x + 4$, which has 1 + i as a double root.
3. Find the monic polynomial a(x) in C[x] from the given data.
(a) a(x) has simple roots 1, 2, i, 1 + 4i, and 1 − 4i, and no others.
(b) a(x) has i as a root of multiplicity three and Deg [a(x)] = 3.
(c) a(x) is real, of the fourth degree, and has 1 − i and i among its roots.
(d) a(x) is a real cubic polynomial of the form $x^3 + bx + c$, and 2 + i is a root of a(x).

4. Let $x^3 + ax^2 + bx + c = (x - r_1)(x - r_2)(x - r_3)$. Express a, b, and c in terms of $r_1$, $r_2$, and $r_3$. Obtain similar results for monic polynomials of degree four.

5. Using Theorem 9-8.4, prove that every real polynomial of odd degree has at least one real root. [Remark. In Section 9-10, we will give a proof of this fact which does not make indirect use of the fundamental theorem of algebra.]

6. Let f(x) be a monic polynomial in R[x] such that f(x) has no roots in R. Prove that f(a) > 0 for all real numbers a.

7. Let $f(x) = ax^{2n} + bx^n + c$ be a polynomial with $n \geq 1$.
(a) Show how Theorems 8-2.7 and 8-4.3 can be used to find all of the roots of f(x).
(b) Find the roots of $x^6 - 2ix^3 + (1 - i)$.

8. Prove that if $f(x) \in R[x]$ has the complex root c with multiplicity m, then f(x) also has $\bar c$ as a root of multiplicity m.
*9-9 The solution of third and fourth-degree equations. The fundamental theorem of algebra is what mathematicians call an existence theorem. It asserts that certain numbers always exist, but it gives no method for finding them. The Italian mathematicians of the Renaissance period were mainly concerned with methods by which they could actually determine roots of particular equations. It was a remarkable achievement that they† discovered formulas which explicitly exhibited the solutions of third and fourth-degree equations.

The expressions which give the roots of the general cubic equation can easily be derived by formal manipulation. Suppose that x = z is a solution of
\[ x^3 + bx^2 + cx + d = 0, \qquad (9\text{-}3) \]
† Scipio Ferro discovered a solution of $x^3 + ax = b$, where a and b are positive real numbers. This was rediscovered and generalized somewhat by Tartaglia, who showed his work to Cardan under a pledge of secrecy. Cardan published the result of Ferro and Tartaglia, together with some discoveries of his own, but he neglected to mention that the solution of the cubic equation was not his own work.
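and let w = z + b/3. Then
\[ w^3 + \left(c - \frac{b^2}{3}\right)w + \left(d - \frac{bc}{3} + \frac{2b^3}{27}\right) = 0. \]
Conversely, if w satisfies
\[ w^3 + \left(c - \frac{b^2}{3}\right)w + \left(d - \frac{bc}{3} + \frac{2b^3}{27}\right) = 0, \]
then it is easy to see that z = w − b/3 is a solution of $x^3 + bx^2 + cx + d = 0$. Therefore, we can restrict our attention to reduced cubic equations, that is, equations of the form
\[ x^3 + px + q = 0, \qquad (9\text{-}4) \]
where the coefficients p and q are related to the coefficients of the general cubic equation (9-3) by
\[ p = c - \frac{b^2}{3}, \qquad q = d - \frac{bc}{3} + \frac{2b^3}{27}. \]
If p = 0 in (9-4), then the reduced cubic equation has the special form
\[ x^3 + q = 0. \]
In this case, the three roots of the equation are the three complex cube roots of −q, which can be found by Theorem 8-4.3. Thus, we may assume that $p \neq 0$ in (9-4). Suppose that w is a solution of (9-4). Let u satisfy $u^2 - wu - p/3 = 0$. Then, since $p \neq 0$, it follows that $u \neq 0$. Therefore, w = u − p/3u. Substituting in (9-4), we have
\[ (u - p/3u)^3 + p(u - p/3u) + q = 0, \]
that is,
\[ u^3 - (p/3u)^3 + q = 0. \]
Consequently
\[ (u^3)^2 + q(u^3) - (p/3)^3 = 0. \]
By the quadratic formula, either
\[ u^3 = -\frac{q}{2} + \sqrt{(q/2)^2 + (p/3)^3} \qquad (9\text{-}6) \]
or
\[ u^3 = -\frac{q}{2} - \sqrt{(q/2)^2 + (p/3)^3}, \qquad (9\text{-}7) \]
where the radical denotes a fixed square root of $(q/2)^2 + (p/3)^3$.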
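Suppose that u satisfies the equation (9-6). By Theorem 8-4.3, this equation has three solutions. If u is any one solution, then the other two are $\zeta u$ and $\zeta^2 u$, where
\[ \zeta = \cos 120^\circ + i\sin 120^\circ = -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}\,i, \qquad \zeta^2 = \cos 240^\circ + i\sin 240^\circ = -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}\,i. \]
We will show that the numbers
\[ w_1 = u - p/3u, \qquad w_2 = \zeta u - p/3\zeta u, \qquad w_3 = \zeta^2 u - p/3\zeta^2 u \]
are actually roots of the reduced cubic equation $x^3 + px + q = 0$. We first note that
\[ \left(-\frac{q}{2} + \sqrt{(q/2)^2 + (p/3)^3}\right)\left(-\frac{q}{2} - \sqrt{(q/2)^2 + (p/3)^3}\right) = -(p/3)^3. \]
Therefore,
\[ (-p/3u)^3 = \frac{-(p/3)^3}{u^3} = -\frac{q}{2} - \sqrt{(q/2)^2 + (p/3)^3}. \qquad (9\text{-}8) \]
Substituting $w_1 = u - p/3u$ in $x^3 + px + q$, we have
\[ w_1^3 + pw_1 + q = u^3 + (-p/3u)^3 + q = \left(-\frac{q}{2} + \sqrt{(q/2)^2 + (p/3)^3}\right) + \left(-\frac{q}{2} - \sqrt{(q/2)^2 + (p/3)^3}\right) + q = 0. \]
Since $\zeta^3 = 1$, the number $\zeta u$ is also a solution of (9-6), so that the same computation shows that $w_2 = \zeta u - p/3\zeta u$ satisfies (9-4).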
In the same way, $\zeta^2 u - p/3\zeta^2 u$ satisfies (9-4). Therefore, $w_1$, $w_2$, and $w_3$ are roots of the reduced cubic. These roots were obtained by assuming that u satisfies (9-6). However, (9-8) shows that if u is a solution of (9-6), then v = −p/3u is a solution of (9-7). Therefore, the three solutions of (9-7) are v, $\zeta v$, and $\zeta^2 v$. These lead to the roots
\[ v - p/3v = w_1, \qquad \zeta v - p/3\zeta v = w_3, \qquad \zeta^2 v - p/3\zeta^2 v = w_2 \]
of the reduced cubic. Thus, (9-7) does not lead to new solutions of (9-4). We summarize our results in the following theorem.

THEOREM 9-9.1. Let $p \neq 0$ and q be complex numbers. Then the solutions of the reduced cubic equation
\[ x^3 + px + q = 0 \]
are
\[ u - p/3u, \qquad \zeta u - p/3\zeta u, \qquad \zeta^2 u - p/3\zeta^2 u, \]
where $\zeta = -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}\,i$, and u is a complex number such that
\[ u^3 = -\frac{q}{2} + \sqrt{(q/2)^2 + (p/3)^3}. \]

The expressions in this theorem for the solutions of the reduced cubic equation are known as Cardan's formulas.
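Cardan's formulas translate directly into a short computation. The following Python sketch is an illustration under stated assumptions: the names `reduced_cubic_roots` and `cubic_roots` are invented here, the principal values supplied by Python's `cmath.sqrt` and complex power stand in for "a fixed square root" and "any one cube root", and no attempt is made at numerical care when −q/2 and the radical nearly cancel.

```python
import cmath

ZETA = complex(-0.5, 3 ** 0.5 / 2)  # cos 120 degrees + i sin 120 degrees


def reduced_cubic_roots(p, q):
    """Roots of x^3 + p x + q = 0 following Cardan's formulas."""
    p, q = complex(p), complex(q)
    if p == 0:
        w = (-q) ** (1 / 3)                    # one complex cube root of -q
        return [w, ZETA * w, ZETA ** 2 * w]
    s = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    u = (-q / 2 + s) ** (1 / 3)                # one solution of u^3 = -q/2 + s
    # u is nonzero because u^3 * (-q/2 - s) = -(p/3)^3 and p != 0
    return [z * u - p / (3 * z * u) for z in (1, ZETA, ZETA ** 2)]


def cubic_roots(b, c, d):
    """Roots of x^3 + b x^2 + c x + d = 0 via the substitution x = w - b/3."""
    p = c - b * b / 3
    q = d - b * c / 3 + 2 * b ** 3 / 27
    return [w - b / 3 for w in reduced_cubic_roots(p, q)]


for root in cubic_roots(3, 0, 2):              # x^3 + 3x^2 + 2 = 0
    print(root, abs(root ** 3 + 3 * root ** 2 + 2))
```

The printed residuals should be close to zero (up to floating-point rounding); the equation $x^3 + 3x^2 + 2 = 0$ is the one treated by hand in the next example.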
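EXAMPLE 1. Let us solve $x^3 + 3x^2 + 2 = 0$. The corresponding reduced equation is obtained by letting x = y − 1: $y^3 - 3y + 4 = 0$. Thus, p = −3, q = 4, and
\[ -\frac{q}{2} + \sqrt{(q/2)^2 + (p/3)^3} = -2 + \sqrt{3}. \]
Taking
\[ u = \sqrt[3]{-2 + \sqrt{3}} \]
(the real cube root), we find that
\[ -p/3u = 1/u = \sqrt[3]{-2 - \sqrt{3}}, \]
since $(-2 + \sqrt{3})(-2 - \sqrt{3}) = 1$. Hence, by Theorem 9-9.1, the roots of $y^3 - 3y + 4 = 0$ are
\[ \sqrt[3]{-2 + \sqrt{3}} + \sqrt[3]{-2 - \sqrt{3}}, \qquad \zeta\sqrt[3]{-2 + \sqrt{3}} + \zeta^2\sqrt[3]{-2 - \sqrt{3}}, \qquad \zeta^2\sqrt[3]{-2 + \sqrt{3}} + \zeta\sqrt[3]{-2 - \sqrt{3}}, \]
and the solutions of $x^3 + 3x^2 + 2 = 0$ are
\[ -1 + \sqrt[3]{-2 + \sqrt{3}} + \sqrt[3]{-2 - \sqrt{3}}, \qquad -1 + \zeta\sqrt[3]{-2 + \sqrt{3}} + \zeta^2\sqrt[3]{-2 - \sqrt{3}}, \qquad -1 + \zeta^2\sqrt[3]{-2 + \sqrt{3}} + \zeta\sqrt[3]{-2 - \sqrt{3}}. \]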
The solution of the general quartic equation can be obtained from the solution of a cubic equation by an ingenious trick discovered by Ferrari (1522-1565), a student of Cardan. As in the case of cubic equations, it is convenient to reduce the general quartic equation,
\[ x^4 + ax^3 + bx^2 + cx + d = 0, \qquad (9\text{-}9) \]
by means of the substitution x = y − a/4, to the form
\[ y^4 + ry^2 + sy + t = 0. \qquad (9\text{-}10) \]
If y is a solution of (9-10) and u is any complex number, then
\[ (y^2 + u)^2 = y^4 + 2uy^2 + u^2 = (2u - r)y^2 - sy + (u^2 - t), \]
since $y^4 = -ry^2 - sy - t$. The idea is to choose u so that
\[ (2u - r)y^2 - sy + (u^2 - t) = (my + n)^2 \]
for suitable complex numbers m and n. This equation will certainly hold no matter what y may be, provided $m^2 = 2u - r$, $n^2 = u^2 - t$, and 2mn = −s. These requirements impose the condition $(-s)^2 = (2mn)^2 = 4m^2n^2 = 4(2u - r)(u^2 - t)$. In other words, u must satisfy the resolvent cubic equation
\[ 8u^3 - 4ru^2 - 8tu + (4rt - s^2) = 0. \qquad (9\text{-}11) \]
If u is a solution of (9-11), and
\[ m = \sqrt{2u - r}, \qquad n = -s/(2\sqrt{2u - r}), \qquad (9\text{-}12) \]
then $(y^2 + u)^2 = (my + n)^2$, so that
\[ y^2 + u = \pm(my + n). \qquad (9\text{-}13) \]
The four roots of the two equations (9-13) are the roots of the reduced quartic equation (9-10), as is easily shown by reversing our steps in the derivation of (9-13). Since a solution of (9-11) can be obtained, using Theorem 9-9.1, it follows that (9-10), and hence (9-9), can be solved explicitly.
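EXAMPLE 2. Consider the quartic equation
\[ x^4 - 4x^3 + 12x^2 - 12x + 5 = 0. \]
Let x = y + 1. We obtain
\[ y^4 + 6y^2 + 4y + 2 = 0. \]
Thus, r = 6, s = 4, and t = 2. The resolvent cubic equation (9-11) is $8u^3 - 24u^2 - 16u + 32 = 0$, that is, $u^3 - 3u^2 - 2u + 4 = 0$. Clearly, u = 1 is a solution. By (9-12), $m = \sqrt{-4} = 2i$ and $n = -4/(4i) = i$, so that the equations (9-13) become
\[ y^2 - 2iy + (1 - i) = 0 \quad\text{and}\quad y^2 + 2iy + (1 + i) = 0. \]
The roots of these equations are, respectively,
\[ i \pm \sqrt{-2 + i} \quad\text{and}\quad -i \pm \sqrt{-2 - i}. \]
Combining these results, we obtain all of the solutions of the original equation $x^4 - 4x^3 + 12x^2 - 12x + 5 = 0$:
\[ 1 + i + \sqrt{-2 + i}, \quad 1 + i - \sqrt{-2 + i}, \quad 1 - i + \sqrt{-2 - i}, \quad 1 - i - \sqrt{-2 - i}. \]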
Success in solving the cubic and quartic equations led mathematicians from the time of Bombelli to seek similar results for the general fifth-degree equation $x^5 + bx^4 + cx^3 + dx^2 + ex + f = 0$. However, all efforts failed. The reason for this failure was finally discovered in 1824 by the young Norwegian genius, N. H. Abel (1802-1829), who proved that the general fifth-degree equation cannot be solved by means of radicals. That is, there are no expressions (involving only the operations of addition, multiplication, subtraction, division, and the operation of taking square roots, cube roots, fourth roots, etc.) which explicitly exhibit the roots of an arbitrary monic fifth-degree polynomial in terms of the coefficients of the polynomial. Even deeper insight into the solutions of polynomial equations resulted from the investigations of Abel's French contemporary, Evariste Galois* (1811-1832). Galois' theory not only showed why it is impossible to solve the general fifth-degree equation by radicals, but also revealed why the third and fourth-degree equations can be solved. Even today, Galois' work stands, practically unchanged, as one of the most beautiful theories of modern mathematics.

* Galois was perhaps the greatest of all mathematical prodigies. Of him it can truly be said that he was neither appreciated nor understood during his lifetime. His mathematical work was not published until 14 years after his death, and was not absorbed into the body of mathematical knowledge for another 25 years. Yet the ideas in this work revolutionized algebra. Galois was killed in a duel at the age of 20.
2. (a) Prove in detail that any solution y of one of the equations (9-13) is a solution of (9-10), provided m and n are given by (9-12) and u is any solution of (9-11). (b) Write on a large piece of paper an expression which gives a solution y of (9-10) in terms of r, s, and t.
3. Let f(x) be a monic cubic polynomial with roots $r_1$, $r_2$, $r_3$. The discriminant D of f(x) is defined to be
\[ D = (r_1 - r_2)^2(r_1 - r_3)^2(r_2 - r_3)^2. \]
px
+ q is
+ 3 + c2 = O.]
+ bx2 + cx + d.
c3 =
5. Prove that a cubic polynomial f(x) in R[x] has three distinct real roots if the discriminant D of f(x) is positive, real roots, one of which is a multiple root, if D = 0, and a single real root and two (nonreal) complex conjugate roots if D < 0.
6. Cse tlic rcsults of I'roblciiis 3 and 5 to detcriiliiie tlie riumbcr of real roots of the folloniiig polyrioniials. (h) z3  ql0 .x  1 (c) 2~~  % . { 1 (a) xLk 2x  1
7. Find the roots of $x^3 - 2x + 1$ by observing that 1 is a root. Find the expression, given by Theorem 9-9.1, for each of these roots.
8. Let a(x) = z:' 1 p r 1 q, jvlicrc p ancl rl nrc real arid (p/3)3 (so tliat p < 0). I'rovc tliat tlic tlircc roots of a(x) :ire
+ (q/2)2 < O
where
[Iiint: Let q/2 i~'[(p/3)~ ( ~ / 2 ) ~ = 1 r ( ~ 04 s i sin 4). Show that r = dp"/7 and cos 4 = (q/2)/V'p3/27. Substitute into Theorem 99.1, and use Theorem 84.3.1 9. Use the result of Problem 8 to find the roots of the follo~ving polynomials. (a) x3  22 1 9 (c) x3  3x2  32  4 (b) x3  92
9-10 Graphs of real polynomials. An important part of the theory of equations in R[x] is concerned with finding the real roots of polynomials. For a given polynomial $a(x) \in R[x]$, the problem is to determine the number of real roots of a(x) and obtain decimal approximations of each real root. In this section and the following one, we will discuss some of the basic methods for solving these problems.

Let $a(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ be a polynomial with real coefficients. Associated with each real number c is the value a(c) of a(x) at x = c. Of course, a(c) is also a real number. The set of all ordered pairs of real numbers
\[ \{(c, a(c)) \mid c \in R\} \]
is called the graph of a(x). Since each ordered pair of real numbers can be represented by a point in a coordinate plane, the graph of a(x) can be represented by a set of points in the plane. It is customary to also refer to this set of points as the graph of a(x). Experience shows that the graph of a real polynomial a(x) is a smooth unbroken curve. For example, if a(x) is a constant polynomial, then the graph of a(x) is a horizontal line. If Deg [a(x)] = 1, then the graph of a(x) is a straight line which is neither horizontal nor vertical [see Fig. 9-2(a)]. If Deg [a(x)] = 2, then the graph of a(x) is a parabola [see Fig. 9-2(b)].
From the graph of a real polynomial a(x), it is possible to obtain a great deal of information about a(x). For example, the real roots of a(x) are the numbers c such that a(c) = 0, that is, they are the points at which the graph of a(x) either touches or crosses the X-axis of the coordinate plane. Thus the graph of a(x) tells us (at least roughly) where the real roots of a(x) are located.
EXAMPLE 1. Let us sketch the graph of $a(x) = x^3 - 3x^2 - 2x + 6$. It is convenient to make a table of values of a(c) corresponding to various choices of c:

    c      -2    -1     0     1     2     3     4
    a(c)  -10     4     6     2    -2     0    14

We plot the points determined by the pairs (c, a(c)) from the above table in a coordinate plane, and sketch an unbroken curve which passes through these points (see Fig. 9-3). It is seen from this graph that a(x) has three real roots at approximately −1.5, 1.5, and 3. Actually 3 is an exact root of a(x) as our table shows, and factoring out x − 3 gives
\[ a(x) = (x - 3)(x^2 - 2) = (x - 3)(x - \sqrt{2})(x + \sqrt{2}). \]
Hence, for $a(x) = x^3 - 3x^2 - 2x + 6$ it is not necessary to plot the graph in order to find the real zeros. However, for polynomials of higher degree, graphical methods may be the most effective way of approximating the roots.
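A table of values such as the one used in Example 1 is conveniently produced by nested multiplication (Horner's rule). The following Python sketch is illustrative only; the function name `horner` and the choice of the integers from −2 to 4 are assumptions made here.

```python
def horner(coeffs, x):
    """Value at x of the polynomial whose coefficients are listed
    from the highest power down, computed by nested multiplication."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


a = [1, -3, -2, 6]                     # a(x) = x^3 - 3x^2 - 2x + 6
table = [(c, horner(a, c)) for c in range(-2, 5)]
for c, value in table:
    print(c, value)

# Consecutive integers where a(c) changes sign bracket a real root.
brackets = [(c, c + 1) for (c, v), (_, w) in zip(table, table[1:]) if v * w < 0]
print(brackets)
```

The sign changes between −2 and −1 and between 1 and 2 correspond to the roots $-\sqrt{2}$ and $\sqrt{2}$, and the value 0 at c = 3 exhibits the third root exactly.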
The fact that the graph of a polynomial is an unbroken curve suggests the following important result.

THEOREM 9-10.1. Let f(x) be a polynomial in R[x]. Suppose that a and b are real numbers such that a < b, and f(a) and f(b) have opposite signs. Then f(x) has at least one real root c with a < c < b.

This result is intuitively obvious. In fact, by assumption, the points (a, f(a)) and (b, f(b)) are on opposite sides of the X-axis in the coordinate plane. Since the graph of f(x) is an unbroken curve which passes through these two points, this graph must, at one or more points between a and b, cross the X-axis (see Fig. 9-4). That is, there is a real number c with a < c < b, such that f(c) = 0.

Of course, the above remarks do not constitute a proof of the theorem. The completeness property of the real numbers will be used to locate the largest root c of f(x) in the interval from a to b. The argument is a slight modification of the proof of Theorem 7-6.3. The proof of Theorem 9-10.1 will not make use of the fundamental theorem of algebra. This remark is important, because our proof of the fundamental theorem given in the appendix is based on Theorem 9-10.1. Before giving the proof, it is convenient to establish a simple property of real polynomials.

(9-10.2). Let $g(x) \in R[x]$. Then there is a positive real number m which depends only on g(x) such that $-m \leq g(h) \leq m$ for all $h \in R$ satisfying $|h| \leq 1$.

Indeed, if $g(x) = b_0 + b_1x + \cdots + b_kx^k$, we can take $m = 1 + |b_0| + |b_1| + \cdots + |b_k|$, since $|h| \leq 1$ implies
\[ |g(h)| \leq |b_0| + |b_1||h| + \cdots + |b_k||h|^k \leq |b_0| + |b_1| + \cdots + |b_k| < m. \]
Proof of Theorem 9-10.1. Since f(a) and f(b) have opposite signs, it follows that either f(a) > 0 > f(b), or f(a) < 0 < f(b). We will prove the theorem for the case f(a) > 0 and f(b) < 0. The proof in the other case is similar. Let
\[ S = \{t \in R \mid a \leq t \leq b \text{ and } f(t) > 0\}. \]
That is, S is the set of all real numbers between a and b for which the value of f(x) is positive. The set S is not empty since $a \in S$. Moreover, b is an upper bound for S. Since R is a complete ordered field, the set S has a least upper bound (see Definition 7-5.4). Let c = l.u.b. S. Then $a \leq c$, because $a \in S$ and c is an upper bound of S, and $c \leq b$, since b is an upper bound of S and c is the least upper bound of S. The definitions of S and c imply two facts which we will use:

(1) if $c < t \leq b$, then $f(t) \leq 0$;
(2) if h > 0, then there is a real number t such that $c - h < t \leq c$ and f(t) > 0.

Indeed, if $c < t \leq b$, then $t \notin S$, since c is an upper bound for S. However, $a \leq c < t \leq b$ and f(t) > 0 implies that $t \in S$, by the definition of S. Therefore, f(t) > 0 is impossible. That is, $f(t) \leq 0$. Moreover, h > 0 means that c − h is not an upper bound of S, so that c − h < t for some $t \in S$. Furthermore, $t \in S$ implies $t \leq c$ and f(t) > 0.

The proof will be completed by showing that both of the inequalities f(c) > 0 and f(c) < 0 lead to contradictions. Indeed, it then follows that f(c) = 0, so that $c \neq a$ and $c \neq b$. Thus, a < c < b.

Consider the polynomial f(x + c) − f(c), where f(x + c) is obtained from f(x) by substituting x + c for x in f(x). Since f(0 + c) − f(c) = 0, it follows that 0 is a root of this polynomial. Consequently, by the factor theorem, we have

(3) $f(x + c) - f(c) = x\cdot g(x)$,

where g(x) is some polynomial in R[x]. Let m be a positive real number such that

(4) if $h \in R$ and $|h| \leq 1$, then $-m \leq g(h) \leq m$.

Such a number exists, by (9-10.2).

Suppose that f(c) > 0. Then c < b, since f(b) < 0. Define
\[ h = \tfrac{1}{2}\min\{2,\ b - c,\ f(c)/m\}. \]
This definition is so contrived that h satisfies (5) h > 0, (6) $h \leq 1$, (7) h + c < b, (8) $h\cdot m < f(c)$. By (3), (4), (5), (6), and (8), we obtain
\[ f(h + c) = f(c) + h\cdot g(h) \geq f(c) - h\cdot m > 0. \]
However, it follows from (5) and (7) that c < h + c < b, so that this inequality is in contradiction with (1). Therefore, f(c) > 0 is impossible.

Suppose that f(c) < 0. Define
\[ h = \min\{1,\ -f(c)/m\}. \]
This choice of h leads to the inequalities (9) h > 0, (10) $h \leq 1$, (11) $h\cdot m \leq -f(c)$. By (2), there is a real number t such that $c - h < t \leq c$ and f(t) > 0. Consequently, $-h < t - c \leq 0 < h$, so that $|t - c| < h \leq 1$. Therefore, by (3), (4), and (11) (substituting t − c for x),
\[ f(t) = f(c) + (t - c)\cdot g(t - c) \leq f(c) + |t - c|\cdot m < f(c) + h\cdot m \leq 0. \]
This contradicts f(t) > 0. Therefore, f(c) < 0 is impossible, and the proof is complete.
EXAMPLE 2. Another proof that each positive real number d has a real nth root (Theorem 7-6.3) can be obtained very easily from Theorem 9-10.1. Consider the polynomial $f(x) = x^n - d$. Since $n \geq 1$, $(d + 1)^n \geq d + 1$. Hence, $f(d + 1) = (d + 1)^n - d \geq d + 1 - d = 1 > 0$, and because f(0) = −d < 0, it follows from Theorem 9-10.1 that f(x) has a positive root. That is, d has a positive nth root.
EXAMPLE 3. Theorem 9-10.1 can be used to locate the real roots of the polynomial $f(x) = x^3 - 12x^2 - 13x + 6$. We make a table of values for f(x):

    x      -2    -1     0     1    12    13
    f(x)  -24     6     6   -18  -150     6

By Theorem 9-10.1, f(x) has three real roots $t_1$, $t_2$, and $t_3$ such that
\[ -2 < t_1 < -1, \qquad 0 < t_2 < 1, \qquad\text{and}\qquad 12 < t_3 < 13. \]
Since f(x) can have at most three roots, $t_1$, $t_2$, and $t_3$ are all of the roots of f(x).
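The search carried out in Example 3 can be mechanized: scan consecutive integers for a change of sign, then halve each bracketing interval repeatedly, applying Theorem 9-10.1 at every step. The Python sketch below is an illustration only; the names `sign_change_intervals` and `bisect_root`, the scanning range, and the tolerance are assumptions made here, and the halving procedure (bisection) is a standard technique rather than one described in the text.

```python
def f(x):
    return x ** 3 - 12 * x ** 2 - 13 * x + 6


def sign_change_intervals(func, low, high):
    """Pairs (k, k + 1) of consecutive integers in [low, high]
    at which func takes values of opposite sign."""
    return [(k, k + 1) for k in range(low, high) if func(k) * func(k + 1) < 0]


def bisect_root(func, a, b, tol=1e-10):
    """Approximate a root in (a, b), assuming func(a) and func(b)
    have opposite signs, by repeatedly halving the interval."""
    fa = func(a)
    while b - a > tol:
        mid = (a + b) / 2
        fm = func(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid            # the sign change lies in the left half
        else:
            a, fa = mid, fm    # the sign change lies in the right half
    return (a + b) / 2


intervals = sign_change_intervals(f, -2, 13)
print(intervals)
for a, b in intervals:
    print(bisect_root(f, a, b))
```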
To make the most effective use of the method used in Example 3 to locate the real roots of a polynomial $f(x) \in R[x]$, it is desirable to have an upper and a lower bound for the real roots of f(x). Otherwise, we will usually not know how large or small to take t in calculating f(t) for a table of values of f(x).

THEOREM 9-10.3. Let $f(x) = x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_0$ be a polynomial in R[x]. Define
\[ M = \max\{-a_{n-1},\ -a_{n-2},\ \ldots,\ -a_0,\ 0\}, \]
\[ m = \max\{a_{n-1},\ -a_{n-2},\ \ldots,\ (-1)^{n-1}a_0,\ 0\}. \]
Then f(t) > 0 for all t > M + 1, and $(-1)^nf(t) > 0$ for all t < −(m + 1). In particular, if f(x) has a real root c, then $-(m + 1) \leq c \leq M + 1$.

Proof. By the definition of M, $-a_j \leq M$, and hence $-M \leq a_j$, for j = 0, 1, ..., n − 1. If $t > M + 1 \geq 1$, then
\[ f(t) \geq t^n - M(t^{n-1} + \cdots + t + 1) = t^n - M\,\frac{t^n - 1}{t - 1} > t^n - (t^n - 1) = 1 > 0 \]
[see Problem 6(a), Section 2-1]. To prove that t < −(m + 1) implies that $(-1)^nf(t) > 0$, simply apply the result which has just been proved to the polynomial
\[ (-1)^nf(-x) = x^n - a_{n-1}x^{n-1} + a_{n-2}x^{n-2} - \cdots + (-1)^na_0. \]
We leave the details for the reader to work out.

It should be emphasized that the bounds −(m + 1) and M + 1 obtained in Theorem 9-10.3 for the real roots of a polynomial are not in general the best possible. For instance, the theorem gives the bounds −1 and 6 for the real roots of $x^2 - 5x + 9$, although this polynomial actually has no real root.
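The bounds of Theorem 9-10.3 are easy to compute. The Python sketch below is illustrative; the function name `root_bounds` and the convention of passing the coefficients $a_{n-1}, \ldots, a_0$ of a monic polynomial as a list are assumptions made here.

```python
def root_bounds(a):
    """Bounds (-(m + 1), M + 1) for the real roots of the monic polynomial
    x^n + a[0] x^(n-1) + a[1] x^(n-2) + ... + a[n-1],
    with M and m defined as in Theorem 9-10.3."""
    M = max([0] + [-c for c in a])
    # signs alternate: a_{n-1}, -a_{n-2}, a_{n-3}, ..., (-1)^(n-1) a_0
    m = max([0] + [(-1) ** k * c for k, c in enumerate(a)])
    return (-(m + 1), M + 1)


print(root_bounds([-5, 9]))   # x^2 - 5x + 9
```

For $x^2 - 5x + 9$ the sketch returns the bounds −1 and 6 mentioned above. A polynomial that is not monic must first be divided by its leading coefficient.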
EXAMPLE 4. Let us obtain upper and lower bounds for the real roots of the polynomial
\[ f(x) = 2x^4 - 3x^3 + 2x - 4. \]
Since f(x) is not a monic polynomial, Theorem 9-10.3 does not apply directly to give bounds for the roots of f(x). However, the roots of f(x) are evidently the same as the roots of
\[ \tfrac{1}{2}f(x) = x^4 - \tfrac{3}{2}x^3 + x - 2. \]
We have
\[ \max\{\tfrac{3}{2},\ 0,\ -1,\ 2,\ 0\} = 2 \quad\text{and}\quad \max\{-\tfrac{3}{2},\ 0,\ 1,\ 2,\ 0\} = 2. \]
Hence, if c is a real root of f(x), then $-3 \leq c \leq 3$ by Theorem 9-10.3.
An important consequence of Theorems 9-10.1 and 9-10.3 is the following result.

THEOREM 9-10.4. If f(x) is a nonzero polynomial in R[x] such that Deg [f(x)] is odd, then f(x) has at least one real root.

Proof. Let $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + a_nx^n$, where $a_0, a_1, \ldots,$ and $a_n$ are real numbers, $a_n \neq 0$, and n is odd. Define $g(x) = a_n^{-1}f(x)$. Then
\[ g(x) = x^n + b_{n-1}x^{n-1} + \cdots + b_1x + b_0, \]
where $b_{n-1} = a_{n-1}/a_n$, ..., $b_1 = a_1/a_n$, and $b_0 = a_0/a_n$. Since every root of g(x) is also a root of f(x), it is sufficient to show that g(x) has at least one real root. Let u and v be real numbers such that
\[ u > \max\{-b_{n-1},\ -b_{n-2},\ \ldots,\ -b_1,\ -b_0,\ 0\} + 1, \]
and
\[ v < -\left[\max\{b_{n-1},\ -b_{n-2},\ \ldots,\ (-1)^{n-2}b_1,\ (-1)^{n-1}b_0,\ 0\} + 1\right]. \]
By Theorem 9-10.3, g(u) > 0 and $(-1)^ng(v) > 0$. Since n is odd, $(-1)^n = -1$, so that g(v) < 0. Therefore, by Theorem 9-10.1, g(x) has a real root between v and u.

The above proof does not depend on the fundamental theorem of algebra. A proof of Theorem 9-10.4 can be based on the fundamental theorem of algebra (see Problem 5, Section 9-8), but then it would not be logically correct to turn around and use Theorem 9-10.4 in the proof of the fundamental theorem, as we will do in Appendix 3.
1. By plotting points at ½ unit intervals from −3 to 3, sketch the graphs of the following polynomials.
(a) $x^2 - 2x + 1$
(b) $-2x^3 + x - 3$
(c) $x^4 + x^3 + x^2 + x + 1$
(d) $x^3 - 2x^2 - 3$

graphing the following polynomials, estimate the location of their real x4 x4 x3 2x2  8x  3 28x2 24x 12  4x 1

+ +
3. Find upper and lower bounds for the real roots of the following polynomials. (a) x7  x6  x5 x4  x3 x 1 (b) x12  23x2 722  1 (c) 4x5  2x  I (d) 99xg9 x7 1
+ +
4. Use the method of Example 3 to find the largest integer the real roots c of the following polynomials. 5 (a) x3  7x (b) x4  4x2 x 1 (c) x5  7x3 3x2 5x  1
c for al1 of
+ + + + +
5. Prove that a monic polynomial in R[x] which has even degree must have at least two real roots if the constant term is negative.

6. Prove the last part of Theorem 9-10.3 in detail.
7. Let a l , a2, . . . , a, be real numbers with a l bl, b2, . . . , b, be positive real numbers. Define
g(x) and
=
< a2 <
<
a,.
Let bo,
(x  ai)(x
a 2 ) . .  ( x  a,)
8. Let f(x) be a polynomial of positive degree in R[x]. Prove that if f'(x) has no real root, then f(x) has exactly one real root. [Hint: Use Theorem 9-10.4 to show that f(x) has at least one real root; use Theorem 9-6.4 to show that f(x) has no multiple real roots; prove that if
\[ f(a) = f(b) = 0, \qquad a < b, \]
where f(x) has no root between a and b, then f'(a) and f'(b) have opposite signs; from these facts, deduce the assertion of Problem 8.]
9-11 Sturm's theorem. Theorem 9-10.1 guarantees the existence of at least one real root between c and d if the values of the polynomial $f(x) \in R[x]$ at c and d have opposite signs. There may be more than one. For example, if
\[ f(x) = 64x^3 - 88x^2 + 34x - 3, \]
then f(0) = −3 and f(1) = 7. The roots of f(x) are 1/8, 1/2, and 3/4. In sketching a graph of f(x) from a table of values, it would be easy to overlook two of these roots:

    x       -1     0     1     2
    f(x)  -189    -3     7   225

From this data, we would probably sketch the graph pictured in Fig. 9-5. The actual graph of f(x), with the three zeros indicated, is shown in Fig. 9-6.

Sturm's theorem* makes it possible to determine the number of real roots of a polynomial between any two numbers. Applying this theorem to the polynomial $f(x) = 64x^3 - 88x^2 + 34x - 3$, we would be able to see that f(x) has three real roots between 0 and 1, and thereby avoid the error of sketching the graph of f(x) as in Fig. 9-5.

Let f(x) be a polynomial of positive degree. We will describe a process which assigns to every real number t a nonnegative integer N(t), such that the value of N(t) is diminished by 1 whenever t passes a root of f(x). Then for any real numbers c < d such that $f(c) \neq 0$ and $f(d) \neq 0$, the integer N(c) − N(d) is the number of real roots of f(x) between c and d.
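The first step in defining N(t) is to alter slightly the Euclidean algorithm (see Section 9-4). By the division algorithm,
\[ f(x) = q_1(x)f'(x) + r_1(x), \]
where either $r_1(x) = 0$ or Deg $[r_1(x)]$ < Deg [f'(x)]. Let $s_1(x) = -r_1(x)$, so that $f(x) = q_1(x)f'(x) - s_1(x)$. Continuing in this way, we obtain
\[ f'(x) = q_2(x)s_1(x) - s_2(x), \]
\[ s_1(x) = q_3(x)s_2(x) - s_3(x), \]
\[ \cdots \]
\[ s_{k-2}(x) = q_k(x)s_{k-1}(x) - s_k(x), \]
\[ s_{k-1}(x) = q_{k+1}(x)s_k(x), \]
where $s_k(x)$ is the last nonzero remainder. Except possibly for sign, the remainders $s_1(x), s_2(x), \ldots, s_k(x)$ obtained in this way are the same as the remainders obtained in applying the Euclidean algorithm to find a greatest common divisor of f(x) and f'(x). Therefore, the last nonzero remainder $s_k(x)$ is a g.c.d. of f(x) and f'(x). The sequence of polynomials
\[ f(x),\ f'(x),\ s_1(x),\ s_2(x),\ \ldots,\ s_k(x) \qquad (9\text{-}14) \]
is called the Sturm sequence of f(x).

EXAMPLE 1. Let $f(x) = 64x^3 - 88x^2 + 34x - 3$. Then $f'(x) = 192x^2 - 176x + 34$, and
\[ f(x) = \left(\tfrac{1}{3}x - \tfrac{11}{72}\right)f'(x) - \left(\tfrac{38}{9}x - \tfrac{79}{36}\right), \]
\[ f'(x) = \left(\tfrac{864}{19}x - \tfrac{6516}{361}\right)\left(\tfrac{38}{9}x - \tfrac{79}{36}\right) - \tfrac{2025}{361}. \]
Thus, the Sturm sequence of f(x) is
\[ 64x^3 - 88x^2 + 34x - 3, \qquad 192x^2 - 176x + 34, \qquad \tfrac{38}{9}x - \tfrac{79}{36}, \qquad \tfrac{2025}{361}. \]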
For each real number t, the values at x = t of the polynomials given in (9-14) form a finite sequence of real numbers:
\[ f(t),\ f'(t),\ s_1(t),\ s_2(t),\ \ldots,\ s_k(t). \qquad (9\text{-}15) \]
A variation in sign occurs in the sequence (9-15) whenever one of the numbers is positive, and the next nonzero number in the sequence is negative, or vice versa. For instance, in the sequence 3, 0, −1, −2, 0, 0, 1, variations in sign occur at 3 and −2. Let N(t) be the total number of variations in sign for the sequence (9-15). The number N(t) can be computed by discarding the numbers in the sequence (9-15) which are 0, and counting the number of variations in sign for the new sequence which consists of positive and negative real numbers.
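EXAMPLE 2. Let f(x) be the polynomial $64x^3 - 88x^2 + 34x - 3$, whose Sturm sequence was obtained in Example 1. The values of the polynomials in the Sturm sequence of f(x) corresponding to x = 0 and x = 1 are, respectively,
\[ -3,\ 34,\ -\tfrac{79}{36},\ \tfrac{2025}{361} \qquad\text{and}\qquad 7,\ 50,\ \tfrac{73}{36},\ \tfrac{2025}{361}. \]
Therefore, N(0) = 3 and N(1) = 0.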
THEOREM 9-11.1. Sturm's theorem. Let f(x) be a polynomial in R[x] whose Sturm sequence is given by (9-14). Let c and d be real numbers such that c < d and $f(c) \neq 0$ and $f(d) \neq 0$. For each real number t, let N(t) be the number of variations in sign in the sequence (9-15). Then the number of distinct roots of f(x) between c and d is equal to N(c) − N(d).

The proof of Sturm's theorem is elementary, but rather long. For this reason, we will not prove Theorem 9-11.1 in this section. The interested reader can find a proof of Sturm's theorem in Appendix 1.
EXAMPLE 3. Returning to the polynomial $f(x) = 64x^3 - 88x^2 + 34x - 3$, we note that by Sturm's theorem and the result of Example 2, f(x) must have three roots between 0 and 1. This is in agreement with the observation made at the beginning of this section that 1/8, 1/2, and 3/4 are roots of f(x). There can be no others, because the degree of f(x) is three.
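The whole procedure of this section, forming the Sturm sequence by repeated division with negated remainders and then counting variations in sign, can be sketched in a few lines. The Python code below is an illustration under stated assumptions: the function names are invented here, polynomials are represented as coefficient lists from the highest power down, and exact rational arithmetic (`fractions.Fraction`) is used so that signs are determined without rounding.

```python
from fractions import Fraction


def remainder(num, den):
    """Remainder of num divided by den (coefficient lists, highest power
    first), with leading zeros removed; [] is the zero polynomial."""
    rem = [Fraction(c) for c in num]
    while len(rem) >= len(den):
        factor = rem[0] / den[0]
        for i, d in enumerate(den):
            rem[i] -= factor * d
        rem.pop(0)
    while rem and rem[0] == 0:
        rem.pop(0)
    return rem


def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def sturm_sequence(f):
    """f, f', s1, ..., sk, where each s is the negative of a remainder."""
    seq = [[Fraction(c) for c in f], [Fraction(c) for c in derivative(f)]]
    while True:
        r = remainder(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def evaluate(p, t):
    value = Fraction(0)
    for c in p:
        value = value * t + c
    return value


def variations(seq, t):
    """N(t): number of sign variations after discarding zeros."""
    values = [v for v in (evaluate(p, t) for p in seq) if v != 0]
    return sum(1 for x, y in zip(values, values[1:]) if x * y < 0)


seq = sturm_sequence([64, -88, 34, -3])
print(variations(seq, 0) - variations(seq, 1))  # number of roots between 0 and 1
```

For the polynomial of Examples 1 to 3 the difference N(0) − N(1) computed by this sketch is 3.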
It is to be emphasized that Sturm's theorem gives a way of finding the number of distinct real roots of a polynomial. This theorem does not give any information about the multiplicity of these roots. However, if the last term $s_k(x)$ in the Sturm sequence of a polynomial f(x) is not a constant, then f(x) may have multiple real roots, which can be located by applying Sturm's theorem to $s_k(x)$, since this polynomial is a g.c.d. of f(x) and f'(x).
EXAMPLE 4. Let $f(x) = 4x^3 - 3x + 1$. Then $f'(x) = 12x^2 - 3$, and
\[ f(x) = \tfrac{1}{3}x\cdot f'(x) - (2x - 1), \qquad f'(x) = (6x + 3)(2x - 1). \]
Thus, the Sturm sequence of f(x) is $4x^3 - 3x + 1$, $12x^2 - 3$, $2x - 1$. The values of these polynomials are

    x = -2:   -25,  45,  -5
    x =  0:     1,  -3,  -1
    x =  2:    27,  45,   3

Therefore, N(−2) = 2, N(0) = 1, and N(2) = 0. It follows from Sturm's theorem that f(x) has one root between −2 and 0, and one root between 0 and 2. It is easy to see (by Theorem 9-10.3, for example) that f(x) has no root smaller than −2, and none larger than 2. Thus, f(x) has only two distinct real roots. Clearly, one of these must be a double root, since the complex roots of a real polynomial occur in pairs, by Theorem 9-8.3. If we note that 2x − 1 is a greatest common divisor of f(x) and f'(x), then it becomes clear from Theorem 9-5.4 that ½ is a double root of f(x). By inspection, the other real root is −1.
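EXAMPLE 5. Let $f(x) = x^4 + 4x^3 + x^2 - 6x + 2$. Then $f'(x) = 4x^3 + 12x^2 + 2x - 6$,
\[ f(x) = \left(\tfrac{1}{4}x + \tfrac{1}{4}\right)f'(x) - \left(\tfrac{5}{2}x^2 + 5x - \tfrac{7}{2}\right), \]
\[ f'(x) = \left(\tfrac{8}{5}x + \tfrac{8}{5}\right)\left(\tfrac{5}{2}x^2 + 5x - \tfrac{7}{2}\right) - \left(\tfrac{2}{5}x + \tfrac{2}{5}\right), \]
\[ \tfrac{5}{2}x^2 + 5x - \tfrac{7}{2} = \left(\tfrac{25}{4}x + \tfrac{25}{4}\right)\left(\tfrac{2}{5}x + \tfrac{2}{5}\right) - 6. \]
By Theorem 9-10.3, every real root of f(x) is between −7 and 7. Computing the values of the Sturm sequence for each integral value beginning at x = −7, we find that N(−7) = 4, N(−6) = 4, N(−5) = 4, N(−4) = 4, N(−3) = 4, N(−2) = 2, N(−1) = 2, N(0) = 2, and N(1) = 0. This shows that all four roots of f(x) are real, and there are two roots between −3 and −2 and two roots between 0 and 1. Since f(−3) > 0, f(−2) > 0, f(0) > 0, and f(1) > 0, the existence of these real roots would not be detected by Theorem 9-10.1 if we calculated f(x) only for integer values of x. The calculation of N(−5/2) = 3 and N(1/2) = 1 locates the roots of f(x) in the intervals
\[ -3 < x < -\tfrac{5}{2}, \qquad -\tfrac{5}{2} < x < -2, \qquad 0 < x < \tfrac{1}{2}, \qquad\text{and}\qquad \tfrac{1}{2} < x < 1. \]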
Having isolated each real root of f(x), we can use Theorem 9-10.1* to obtain the n-place decimal approximation of these roots. For example, since f(0) = 2, f(0.1) = 1.4141, f(0.2) = 0.8736, f(0.3) = 0.4061, f(0.4) = 0.0416, and f(0.5) = −0.1875, it follows from Theorem 9-10.1 that the root of f(x) in the interval 0 < x < ½ is between 0.4 and 0.5. Repeating this process, we obtain f(0.41) = 0.0120 and f(0.42) = −0.0161 (with four decimal accuracy). Thus, the 2-place decimal approximation of this root is 0.41. Continuing in this way, we can locate the root between successive thousandths, ten thousandths, etc.

There are various schemes for systematizing and shortening the calculations involved in finding decimal approximations of the real roots of a polynomial in R[x]. The interested reader can find these methods discussed in standard college algebra and theory of equations textbooks.
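The digit-by-digit search just described is simple to carry out by machine. The following Python sketch is only an illustration; the function name `decimal_approximation` is invented here, and the sketch assumes that the function takes values of opposite sign at the two ends of the starting interval and that the left endpoint is a decimal with no more places than are requested. Exact fractions are used so that the signs are not disturbed by rounding.

```python
from fractions import Fraction


def f(x):
    return x ** 4 + 4 * x ** 3 + x ** 2 - 6 * x + 2


def decimal_approximation(func, low, high, places):
    """Largest decimal with `places` digits after the point that lies
    to the left of the sign change of func in the interval (low, high)."""
    low, high = Fraction(low), Fraction(high)
    step = Fraction(1)
    for _ in range(places):
        step /= 10
        # advance by tenths, hundredths, ... while the sign does not change
        while low + step < high and func(low) * func(low + step) > 0:
            low += step
        high = min(high, low + step)
    return low


print(decimal_approximation(f, 0, Fraction(1, 2), 2))  # Fraction(41, 100)
```

With two places the sketch reproduces the approximation 0.41 found above; asking for more places continues the same process to thousandths and beyond.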
1. Give the Sturm sequence of each of the following l)olynomials. (a) x3 x2 x 1 ~ 6 (b) x4  3x2  1 0 (c) xS  5x  2
+ + +
* If the multiplicity of the isolated root is even, then Theorem 9-10.1 will not help in locating the root. For the polynomial which we are considering, it is obvious that all of the roots are simple, because the sum of the multiplicities of all the roots is four, and there are four distinct roots.
2. Use Sturm's theorem to locate (between consecutive integers) all the real roots of the polynomials in Problem 1.
3. Let $f(x) = ax^2 + bx + c$, where $a$, $b$, and $c$ are real numbers, with $a \neq 0$. Find the Sturm sequence of $f(x)$. Use Sturm's theorem to show that $f(x)$ has real roots if and only if $b^2 \geq 4ac$.
4. Let $p$ and $q$ be real numbers, with $p \neq 0$. Show that the Sturm sequence of the polynomial $x^3 + px + q$ is
$$x^3 + px + q, \qquad 3x^2 + p, \qquad -\tfrac{2p}{3}x - q, \qquad -\frac{27q^2 + 4p^3}{4p^2},$$
provided $27q^2 + 4p^3 \neq 0$. Use Sturm's theorem to show that $x^3 + px + q$ has one real root if $27q^2 + 4p^3 > 0$ and three real roots if $27q^2 + 4p^3 < 0$. [Hint: Consider the cases $p > 0$, $p < 0$ separately.]
6. Find the 3-place decimal approximations of all the roots of the polynomial of Example 5.
9-12 Polynomials with rational coefficients. The fundamental theorem of algebra leads to a complete solution of the problem "what are the irreducible polynomials in $C[x]$ and in $R[x]$?" (See Theorems 9-8.2 and 9-8.4.) Determining the irreducible polynomials in $Q[x]$ is much more difficult. There are ways of testing whether or not a polynomial in $Q[x]$ is irreducible. However, all of these methods are rather complicated, and they do not lead to very interesting general results. For this reason, we will only consider a part of the general problem of determining the complete factorization of polynomials in $Q[x]$, namely, the determination of the linear factors. By the factor theorem, a polynomial $x - r$ with $r \in Q$ is a factor of $a(x)$ in $Q[x]$ if and only if $r$ is a root of $a(x)$. Suppose that
$$a(x) = \frac{u_0}{v_0} + \frac{u_1}{v_1}x + \frac{u_2}{v_2}x^2 + \cdots + \frac{u_n}{v_n}x^n,$$
where the numbers $u_i$ and $v_i \neq 0$ are integers. Let $v$ be a common multiple of the denominators $v_0, v_1, v_2, \ldots, v_n$, for example, $v = v_0 \cdot v_1 \cdot v_2 \cdots v_n$, or $v = [v_0, v_1, v_2, \ldots, v_n]$. Then the polynomial $b(x) = v \cdot a(x)$ has integral coefficients. Moreover, $b(r) = v \cdot a(r) = 0$ if and only if $a(r) = 0$. Thus, the problem of finding the monic linear factors of a polynomial in $Q[x]$ can be reduced to the problem of finding the rational roots of a polynomial in $Z[x]$. The following theorem shows that the rational roots of a polynomial in $Z[x]$ can be found by trial.
THEOREM 9-12.1. Let $a(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + a_nx^n$ be a polynomial with integral coefficients. Suppose that $a_0 \neq 0$, $a_n \neq 0$, and $n \geq 1$. If $b$ and $c$ are relatively prime integers such that $b/c$ is a root of $a(x)$, then $b$ divides $a_0$ and $c$ divides $a_n$.
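Proof. Since $a(b/c) = 0$, multiplying by $c^n$ gives
$$a_0c^n + a_1bc^{n-1} + \cdots + a_{n-1}b^{n-1}c + a_nb^n = 0.$$
Therefore,
$$(a_0c^{n-1} + a_1bc^{n-2} + \cdots + a_{n-1}b^{n-1}) \cdot c = -a_nb^n,$$
and
$$a_0c^n = b \cdot [-(a_1c^{n-1} + \cdots + a_{n-1}b^{n-2}c + a_nb^{n-1})].$$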
These equalities imply that $c$ divides $a_nb^n$ and $b$ divides $a_0c^n$. Since $b$ and $c$ have no common prime factor by hypothesis, it follows that $(c, b^n) = 1$ and $(b, c^n) = 1$. Thus, by Theorem 5-2.6, $c$ divides $a_n$ and $b$ divides $a_0$.
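The search "by trial" that this theorem permits is finite and easy to mechanize. The sketch below is one possible way to organize it, assuming the polynomial is given by its integer coefficients $a_0, a_1, \ldots, a_n$ with $a_0 \neq 0$ and $a_n \neq 0$. The names `divisors` and `rational_roots` are illustrative only.

```python
from fractions import Fraction
from math import gcd

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a0 + a1*x + ... + an*x^n, coeffs = [a0, ..., an].

    Every candidate b/c has b dividing a0 and c dividing an,
    with b and c relatively prime; each candidate is tested exactly.
    """
    a0, an = coeffs[0], coeffs[-1]
    candidates = set()
    for b in divisors(a0):
        for c in divisors(an):
            if gcd(b, c) == 1:
                candidates.add(Fraction(b, c))
                candidates.add(Fraction(-b, c))

    def value(r):
        return sum(a * r**k for k, a in enumerate(coeffs))

    return sorted(r for r in candidates if value(r) == 0)

# 6x^2 - x - 1 = (3x + 1)(2x - 1)
print(rational_roots([-1, -1, 6]))
```

A polynomial whose constant term is zero should first be divided by the appropriate power of $x$, which accounts for the root $0$ separately.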
EXAMPLE 1. We will use Theorem 9-12.1 to show that 0 is the only rational root of the polynomial $a(x) = x^7 - 3x^6 + 2x^3 + x^2$. Clearly, 0 is a root of $a(x)$, and $a(x) = x^2(x^5 - 3x^4 + 2x + 1)$. If $r \neq 0$ is a rational root of $a(x)$, then $r$ is a root of $x^5 - 3x^4 + 2x + 1$. We can write $r = b/c$, where $b$ and $c$ are relatively prime integers. By Theorem 9-12.1, $b$ divides the constant term of $x^5 - 3x^4 + 2x + 1$, and $c$ divides the leading coefficient of this polynomial. That is, $b$ and $c$ both divide 1. Hence, $b$ and $c$ are either 1 or $-1$, so that $r = \pm 1$ also. However, $a(1) = 1^7 - 3 \cdot 1^6 + 2 \cdot 1^3 + 1^2 = 1$ and $a(-1) = (-1)^7 - 3(-1)^6 + 2(-1)^3 + (-1)^2 = -5$. Therefore, 0 is the only rational root of $a(x)$.
EXAMPLE 2. Let
a(x)
= x4
+ yx3 + $$2
 2 3'
If $r = b/c$ is a rational root of $6a(x)$, where $b$ and $c$ are relatively prime integers, then by Theorem 9-12.1, $b$ divides 4 and $c$ divides 6. Therefore, the possibilities for $r$ are $\pm 1$, $\pm 2$, $\pm 4$, $\pm\frac{1}{2}$, $\pm\frac{1}{3}$, $\pm\frac{2}{3}$, $\pm\frac{4}{3}$, $\pm\frac{1}{6}$. Testing each of these numbers, we find that a(+)
=
O, a($)
O , and
3
and
are the only rational roots of a ( x ) . The division algorithm gives the factorization a(x) = (X + ) ( x  + ) ( x 2 2x 2))
+ +
+ 2x + 2 is irreducible in Q[x].
EXAMPLE 3. Theorem 912.1 can be used in combination with some of the previous results in this chapter to obtain considerable information about the complete set of roots in C of a polynomial in Q [ x ] . Let
Since a ( x ) E Q [ x ] ,it f o l l o ~ ~ that s a ( x ) E R [ x ] . By Theorem 910.3, a real root c of a ( x ) satisfies o < c < 1 3 3 .
3 
By Theorem 912.1, the possible rational roots of b ( x ) are f1, f7 , f a ,and f3. Since 7 < and 7 > it follo\vs that 7 and 7 cannot be roots of b ( x ) [for otherwise a ( x ) would have a root in Q C R which is not between the bounds for the real roots of a ( x ) ] . Testing the numbers f1, f5, f3 in b ( x ) , we find that b(1) = 0 , b ( 3 ) = 0 , and that 1 and 3 are the only rational roots of b ( x ) . The division algorithm yields
y
y,
From Theorem 912.1, the only possible rational roots of c ( x ) are 1 and 1. Of course, 1 cannot be a root of c ( x ) , since i t is not a root of b ( x ) . By substituting, we find that e(1) = O. Division gives
C(X) = (X
so that b(x)
=
+ 1 ) ( x 3  x2  1 ) 3(x + I ) ~ ( x 3)(x3

x2  1 )
in Q[x]. Let d ( x ) = x3  x2  l . Since d(1) # O, it follows x3  x2  1 is irreducible in Q[x](see Problem 3 ) . Thus a ( x ) = ( x I ) ~ ( x 3 ) ( x 3  x2  1 ) is the complete factorization of a ( x ) into irreducibles in Q[x]. Further roots of b(x) in C are roots of d ( x ) . Regarding d ( x ) as a polynomial in R [ x ] , we use Theorem 910.3 again, and find that every real root c of d ( x ) satisfies 2 5
c
and $N(-2) = 2$, $N(2) = 1$. Therefore, by Sturm's theorem, $d(x)$ has exactly one real root. This root is located between 1 and 2 since $d(1) = -1$ and $d(2) = 3$. The other roots of $d(x)$ are a pair of conjugate complex numbers (Theorem 9-8.3). In summary, we have obtained the following information about the roots in $C$ of the polynomial $a(x)$: $-1$ is a double root; 3 is a simple root; there is a simple real root between 1 and 2 which is not rational; there is a pair of conjugate complex roots. Of course, real and complex roots of $x^3 - x^2 - 1$ can be found in terms of square roots and cube roots, using the methods of Section 9-9.
The roots of polynomials in $Q[x]$ have many interesting properties. In the remainder of this section, we will examine some of the simplest ideas which are used in the study of the roots of rational polynomials. Our discussion will scratch the surface of an extensive branch of mathematics known as algebraic number theory.

DEFINITION 9-12.2. A complex number $u$ is called an algebraic number if $u$ is a root of some nonzero polynomial with rational coefficients. Complex numbers which are not algebraic are called transcendental.

Every rational number $r$ is an algebraic number, because $r$ is a root of $x - r$. Any number of the form $u = \sqrt[m]{r}$, where $r$ is rational, is algebraic, because $u$ is a root of $x^m - r$. The complex unit $i = \sqrt{-1}$ is an algebraic number. More generally, any number of the form $r + is$, $r \in Q$, $s \in Q$, is an algebraic number, because $r + is$ is a root of $x^2 - 2rx + (r^2 + s^2)$. Later we will show that the sum and product of any two algebraic numbers is an algebraic number, so that numbers such as 4 3 4,4 fi, i/5 i, 2 fi, etc., are algebraic. We observed in Section 1-2 that the set of all algebraic numbers is denumerable (see the discussion following Example 5). Since the set $C$ of complex numbers is not denumerable, there must be many complex numbers which are not algebraic. That is, transcendental numbers certainly exist. However, it is not very easy to produce specific examples of transcendental numbers, and it is quite difficult to prove that particular numbers such as $\pi$ and $2^{\sqrt{2}}$ are transcendental. According to Definition 9-12.2, a number $u$ is algebraic if it is a root of any nonzero polynomial in $Q[x]$. Of course, if $u$ is algebraic, then $u$ is a root
e,
of infinitely many polynomials with coefficients in Q. The following theorem tells us exactly what this set of polynomials can be.
THEOREM 9-12.3. Let $u$ be an algebraic number. Then there is a unique monic polynomial $p(x)$ of least degree having $u$ as a root. This polynomial $p(x)$ is irreducible, and it has the following property: if $a(x) \in Q[x]$, and $u$ is a root of $a(x)$, then $p(x)$ divides $a(x)$ in $Q[x]$.
The unique polynomial $p(x)$ described in this theorem is called the minimal polynomial of $u$. The degree of $u$ is defined to be the degree of the minimal polynomial of $u$. Thus, the rational numbers are exactly the algebraic numbers of degree one, and the numbers $r + \sqrt{s}$, where $r, s \in Q$ and $s$ is not a square in $Q$, are of degree two.

To prove Theorem 9-12.3, let $J = \{a(x) \in Q[x] \mid a(u) = 0\}$. That is, $J$ is the set of all polynomials in $Q[x]$ which have $u$ as a root. The assumption that $u$ is an algebraic number means that $J$ is a subset of $Q[x]$ which contains at least one nonzero polynomial. Therefore, among the nonzero polynomials in $J$ there is one of least degree; dividing it by its leading coefficient, we obtain a polynomial $p(x)$ in $J$. Then $p(x)$ is a monic polynomial such that $p(u) = 0$ and Deg $[p(x)] \leq$ Deg $[a(x)]$ for all nonzero $a(x) \in J$. We will show: (i) $p(x)$ is irreducible, and (ii) if $a(x) \in J$, then $p(x)$ divides $a(x)$ in $Q[x]$. It will then follow easily that $p(x)$ is unique.

Suppose that $p(x)$ is reducible. Then $p(x) = b(x) \cdot c(x)$, where $b(x)$ and $c(x)$ are nonzero polynomials in $Q[x]$ which have degrees less than Deg $[p(x)]$. Since $b(u) \cdot c(u) = p(u) = 0$, it follows that either $b(u) = 0$, or $c(u) = 0$. Hence, by definition of $J$, either $b(x) \in J$, or $c(x) \in J$. This is impossible however, because Deg $[p(x)] \leq$ Deg $[a(x)]$ for all nonzero $a(x) \in J$. Therefore, $p(x)$ is irreducible. In order to prove (ii), let $a(x) \in J$. By the division algorithm, it is possible to write
$$a(x) = q(x) \cdot p(x) + r(x),$$
where $q(x) \in Q[x]$, $r(x) \in Q[x]$, and either $r(x) = 0$, or else Deg $[r(x)] <$ Deg $[p(x)]$. Suppose that $r(x) \neq 0$. Then Deg $[r(x)] <$ Deg $[p(x)]$.
Moreover, $r(u) = a(u) - q(u) \cdot p(u) = 0 - q(u) \cdot 0 = 0$, because $a(x) \in J$ and $p(x) \in J$. Thus, $r(x) \in J$. However, this is impossible, since $r(x) \in J$ implies that Deg $[p(x)] \leq$ Deg $[r(x)]$. Consequently, $r(x) \neq 0$ is impossible. Therefore, $r(x) = 0$ and $a(x) = q(x) \cdot p(x)$. That is, $p(x)$ divides $a(x)$, which proves (ii). It remains to show that $p(x)$ is unique. By choice, $p(x)$ is one monic polynomial of minimal degree in $J$. Suppose that $a(x)$ is another one. Then Deg $[a(x)] =$ Deg $[p(x)]$. By what we have just proved, $p(x) \mid a(x)$. Therefore, $a(x)$ is a nonzero, constant multiple of $p(x)$ (see 9-4.1d). Since $a(x)$ and $p(x)$ are both monic, the constant must be one. That is, $a(x) = p(x)$. This establishes the uniqueness of $p(x)$.
EXAMPLE 4. Let $u = \sqrt{2}$. Then the minimal polynomial of $u$ is $x^2 - 2$, since $u$ is a root of this polynomial, but not of any polynomial of lower degree in $Q[x]$. Thus $\sqrt{2}$ is an algebraic number of degree two. The polynomials in $Q[x]$ which have $\sqrt{2}$ as a root are exactly those polynomials which are divisible by $x^2 - 2$. In particular, if $\sqrt{2}$ is a root of the rational polynomial $a(x)$, then $-\sqrt{2}$ is also a root of $a(x)$.
We wish to prove that the set of all algebraic numbers is a subring of the ring $C$ of all complex numbers. A preliminary result is needed, which is important in its own right.

THEOREM 9-12.4. Let $u$ be an algebraic number of degree $n$. Define
$$Q[u] = \{r_0 + r_1u + \cdots + r_{n-1}u^{n-1} \mid r_0, r_1, \ldots, r_{n-1} \in Q\}.$$
Then $Q[u]$ is closed under addition, multiplication, negation, and the inverse of every nonzero element of $Q[u]$ is in $Q[u]$. Thus, $Q[u]$ is a field which is a subring* of $C$.
Proof. Let $U = \{a(u) \mid a(x) \in Q[x]\}$. Then it follows from (9-7.2) that $U$ is a subring of $C$. We will first prove that $Q[u] = U$. It is clear that $Q[u] \subseteq U$. Indeed, $Q[u]$ is just the set of all complex numbers $r(u)$, where $r(x) \in Q[x]$ is such that either $r(x) = 0$, or else Deg $[r(x)] < n$. On the other hand, suppose that $w \in U$. Then $w = a(u)$ for some $a(x) \in Q[x]$. Let $p(x)$ be the minimal polynomial of $u$. Then the degree of $p(x)$ is,
* In general, if $D$ is a subring of a ring $A$ and $a \in A$, then $D[a]$ denotes the smallest subring of $A$ containing $D$ and $a$. This notation seems to conflict with the use of $D[x]$ to denote the ring of polynomials with coefficients in $D$, but there is no contradiction because $D[x]$ is the smallest subring of $D[x]$ which contains $D$ and $x$. Throughout the rest of this section, the symbols $u$ and $v$ will always stand for algebraic numbers, and $x$ will denote an indeterminate as usual.
by definition, the degree $n$ of $u$. By the division algorithm, we can write $a(x) = q(x) \cdot p(x) + r(x)$, where $r(x) \in Q[x]$, and either $r(x) = 0$, or else Deg $[r(x)] <$ Deg $[p(x)] = n$. Thus,
$$w = a(u) = q(u) \cdot p(u) + r(u) = q(u) \cdot 0 + r(u) = r(u).$$
Consequently $w \in Q[u]$. Since $w$ was any element of $U$, we have proved that $U \subseteq Q[u]$. Thus, $Q[u] = U$. The only thing left to show is that every nonzero element of $Q[u]$ has an inverse in $Q[u]$. Let $w = r_0 + r_1u + \cdots + r_{n-1}u^{n-1}$ be an element of $Q[u]$ which is not zero. Then in particular, the polynomial $r(x) = r_0 + r_1x + \cdots + r_{n-1}x^{n-1}$ is not zero. Moreover, Deg $[r(x)] \leq n - 1 <$ Deg $[p(x)]$. Hence, $p(x)$ does not divide $r(x)$. Since $p(x)$ is irreducible by Theorem 9-12.3, it follows that $p(x)$ and $r(x)$ are relatively prime [see (9-5.2)]. Therefore, by Theorem 9-4.4, polynomials $g(x)$ and $h(x)$ exist in $Q[x]$, such that
$$g(x) \cdot r(x) + h(x) \cdot p(x) = 1.$$
Substituting $u$ for $x$ gives $g(u) \cdot r(u) + h(u) \cdot p(u) = g(u) \cdot w = 1$. Therefore, $w^{-1} = r(u)^{-1} = g(u)$, which is an element of $U = Q[u]$.
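The last step of this proof is constructive: the polynomial $g(x)$ can be computed by the Euclidean algorithm. The following sketch illustrates this under the assumptions of the proof ($p(x)$ irreducible and not dividing $r(x)$). Polynomials are stored as coefficient lists with the constant term first; all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients of highest degree; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by b (b nonzero), constant term first."""
    a = trim(a)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        factor = Fraction(a[-1]) / b[-1]
        q[shift] = factor
        a = trim([c - factor * (b[i - shift] if i >= shift else 0)
                  for i, c in enumerate(a)])
    return trim(q), a

def inverse_mod(r, p):
    """Return g with g(x)*r(x) = 1 + (multiple of p(x)).

    Assumes p is irreducible and does not divide r, so the Euclidean
    algorithm ends with a nonzero constant remainder.
    """
    r0, g0 = trim([Fraction(c) for c in p]), []
    r1, g1 = trim([Fraction(c) for c in r]), [Fraction(1)]
    while len(r1) > 1:
        q, rem = divmod_poly(r0, r1)
        # invariant: r0 - g0*r and r1 - g1*r are multiples of p
        r0, g0, r1, g1 = r1, g1, rem, add(g0, mul([-c for c in q], g1))
    return [c / r1[0] for c in g1]

# u = 1 + sqrt(2) has minimal polynomial x^2 - 2x - 1, and 1/u = u - 2.
print(inverse_mod([0, 1], [-1, -2, 1]))
```

The coefficient list returned for $g(x)$ gives $w^{-1} = g(u)$ directly as an element of $Q[u]$.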
THEOREM 9-12.5. If $u$ and $v$ are algebraic numbers, then $u + v$, $u \cdot v$, and $-u$ are algebraic numbers. If $u$ is a nonzero algebraic number, then $u^{-1}$ is an algebraic number.

In order to prove this theorem, it is necessary to use a result which will be established in Section 10-2 (see Theorem 10-2.9). The special case of Theorem 10-2.9 which we will use here can be stated as follows.

(9-12.6). Let $\{r_{i,j} \mid 1 \leq i \leq g,\ 0 \leq j \leq g\}$ be a set of rational numbers. Then there exist rational numbers $s_0, s_1, s_2, \ldots, s_g$, not all of which are zero, such that
$$s_0r_{i,0} + s_1r_{i,1} + \cdots + s_gr_{i,g} = 0 \quad \text{for } i = 1, 2, \ldots, g.$$
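Proof of Theorem 9-12.5. Suppose that the degree of $u$ is $m$ and the degree of $v$ is $n$. We will prove that $u + v$ is a root of a nonzero polynomial in $Q[x]$ which has degree at most $m \cdot n$. Therefore $u + v$ is algebraic of degree $\leq m \cdot n$. By Theorem 9-12.4, for any natural numbers $i$ and $j$, there exist rational numbers $a_{i,0}, a_{i,1}, a_{i,2}, \ldots, a_{i,m-1}$ and $b_{j,0}, b_{j,1}, b_{j,2}, \ldots, b_{j,n-1}$ such that
$$u^i = \sum_{k=0}^{m-1} a_{i,k}u^k, \qquad v^j = \sum_{l=0}^{n-1} b_{j,l}v^l.$$
Hence, for each natural number $h$ (writing $u^0 = v^0 = 1$ in the same form),
$$(u + v)^h = \sum_{i=0}^{h} \binom{h}{i} u^i v^{h-i} = \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} r_{k,l,h}\,u^kv^l, \quad \text{where } r_{k,l,h} = \sum_{i=0}^{h} \binom{h}{i} a_{i,k}\,b_{h-i,l}.$$
Since all of the binomial coefficients $\binom{h}{i}$ are natural numbers, it follows that each of the numbers $r_{k,l,h}$ is rational. It is also convenient to define $r_{k,l,0} = 1$ if $k = l = 0$ and $r_{k,l,0} = 0$ if $k > 0$ or $l > 0$, so that
$$(u + v)^0 = 1 = \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} r_{k,l,0}\,u^kv^l.$$
By (9-12.6) (taking $g = m \cdot n$ and replacing the indices $i = 1, 2, \ldots, g$ by the $m \cdot n$ pairs $(k, l)$, $0 \leq k \leq m - 1$, $0 \leq l \leq n - 1$ in some order), there exist rational numbers $s_0, s_1, \ldots, s_{mn}$, not all of which are zero, such that
$$\sum_{h=0}^{mn} s_h r_{k,l,h} = 0$$
for $0 \leq k \leq m - 1$ and $0 \leq l \leq n - 1$. Consequently,
$$\sum_{h=0}^{mn} s_h(u + v)^h = \sum_{h=0}^{mn} s_h \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} r_{k,l,h}\,u^kv^l = \sum_{k=0}^{m-1}\sum_{l=0}^{n-1}\Big(\sum_{h=0}^{mn} s_h r_{k,l,h}\Big)u^kv^l = 0.$$
That is, $u + v$ is a root of the nonzero polynomial $\sum_{h=0}^{mn} s_hx^h$, which has degree at most $m \cdot n$. Therefore, $u + v$ is an algebraic number. A similar proof shows that $u \cdot v$ is a root of a nonzero polynomial of degree at most $m \cdot n$. Thus, $u \cdot v$ is algebraic. In particular, $-u = (-1) \cdot u$ is algebraic. Finally, suppose that $u \neq 0$, and let the minimal polynomial of $u$ be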
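$$p(x) = c_0 + c_1x + \cdots + c_{m-1}x^{m-1} + x^m.$$
Then $c_0 \neq 0$, since otherwise $x$ would divide the irreducible polynomial $p(x)$, which would force $p(x) = x$ and $u = 0$. Since $p(u) = 0$,
$$(-c_0^{-1}c_1)u + (-c_0^{-1}c_2)u^2 + \cdots + (-c_0^{-1}c_{m-1})u^{m-1} + (-c_0^{-1})u^m = 1.$$
Therefore,
$$u^{-1} = (-c_0^{-1}c_1) + (-c_0^{-1}c_2)u + \cdots + (-c_0^{-1}c_{m-1})u^{m-2} + (-c_0^{-1})u^{m-1}.$$
Since the sums and products of algebraic numbers are algebraic, and since $u$ and each of the rational numbers $-c_0^{-1}c_1$, $-c_0^{-1}c_2$, $\ldots$, $-c_0^{-1}c_{m-1}$, $-c_0^{-1}$ is algebraic, it follows that $u^{-1}$ is algebraic. This completes the proof of Theorem 9-12.5.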
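EXAMPLE 5. It is instructive to carry out the proof of Theorem 9-12.5 in a special case. Let $u = 1 + \sqrt{2}$ and $v = \sqrt{3}$. Then the minimal polynomials of $u$ and $v$ are $x^2 - 2x - 1$ and $x^2 - 3$, respectively. We have
$$u^2 = 2u + 1, \quad u^3 = 5u + 2, \quad u^4 = 12u + 5, \qquad v^2 = 3, \quad v^3 = 3v, \quad v^4 = 9,$$
so that
$$\begin{aligned}
(u + v)^1 &= u + v,\\
(u + v)^2 &= 4 + 2u + 2uv,\\
(u + v)^3 &= 2 + 14u + 6v + 6uv,\\
(u + v)^4 &= 32 + 48u + 8v + 32uv.
\end{aligned}$$
We wish to find rational numbers $s_0$, $s_1$, $s_2$, $s_3$, and $s_4$, not all zero, satisfying
$$\begin{aligned}
s_0 + 4s_2 + 2s_3 + 32s_4 &= 0,\\
s_1 + 2s_2 + 14s_3 + 48s_4 &= 0,\\
s_1 + 6s_3 + 8s_4 &= 0,\\
2s_2 + 6s_3 + 32s_4 &= 0.
\end{aligned}$$
A method for solving such systems of equations will be developed in Section 10-2. However, it is easy to verify that
$$s_0 = -8, \quad s_1 = 16, \quad s_2 = -4, \quad s_3 = -4, \quad s_4 = 1$$
is a solution. Consequently,
$$(u + v)^4 - 4(u + v)^3 - 4(u + v)^2 + 16(u + v) - 8 = 0.$$
Therefore, $u + v = 1 + \sqrt{2} + \sqrt{3}$ is a root of $x^4 - 4x^3 - 4x^2 + 16x - 8$. The proof that $uv$ is an algebraic number is somewhat simpler in this special case. Note that
$$(uv)^2 = u^2v^2 = (2u + 1) \cdot 3 = 3 + 6u, \qquad (uv)^4 = u^4v^4 = (12u + 5) \cdot 9 = 45 + 108u.$$
Thus,
$$(uv)^4 - 18(uv)^2 + 9 = (45 + 108u) - 18(3 + 6u) + 9 = 0.$$
Consequently, $uv = \sqrt{3} + \sqrt{2}\sqrt{3}$ is a root of $x^4 - 18x^2 + 9$. It can be shown that the polynomials $x^4 - 4x^3 - 4x^2 + 16x - 8$ and $x^4 - 18x^2 + 9$ are irreducible in $Q[x]$, so that if $u = 1 + \sqrt{2}$ and $v = \sqrt{3}$, then the degree of $u + v$ and $u \cdot v$ is exactly 4, the product of the degree of $u$ and the degree of $v$. It may happen however that the degree of $u + v$ or of $u \cdot v$ is less than the product of the degrees of $u$ and $v$. For example, if $u = \sqrt{2}$ and $v = \sqrt[4]{2}$, then the degree of $u$ is 2, the degree of $v$ is 4, and the degrees of $u + v$ and $u \cdot v$ are both 4: $u + v$ is a root of $x^4 - 4x^2 - 8x + 2$, $u \cdot v$ is a root of $x^4 - 8$.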
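The two polynomials found in this example can be checked by exact arithmetic. The sketch below assumes a representation that is not used in the text: a number $a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ with rational $a, b, c, d$ is stored as the tuple `(a, b, c, d)`, and products are reduced with $\sqrt{2}\sqrt{3} = \sqrt{6}$, $\sqrt{2}\sqrt{6} = 2\sqrt{3}$, $\sqrt{3}\sqrt{6} = 3\sqrt{2}$.

```python
def mul(p, q):
    """Product of a + b*sqrt2 + c*sqrt3 + d*sqrt6 and e + f*sqrt2 + g*sqrt3 + h*sqrt6."""
    a, b, c, d = p
    e, f, g, h = q
    return (a * e + 2 * b * f + 3 * c * g + 6 * d * h,   # rational part
            a * f + b * e + 3 * (c * h + d * g),          # coefficient of sqrt2
            a * g + c * e + 2 * (b * h + d * f),          # coefficient of sqrt3
            a * h + d * e + b * g + c * f)                # coefficient of sqrt6

def evaluate(coeffs, w):
    """Value at w of the polynomial with the given coefficients (highest degree first)."""
    acc = (0, 0, 0, 0)
    for c in coeffs:
        acc = mul(acc, w)
        acc = (acc[0] + c,) + acc[1:]
    return acc

u_plus_v = (1, 1, 1, 0)    # 1 + sqrt2 + sqrt3
u_times_v = (0, 0, 1, 1)   # (1 + sqrt2) * sqrt3 = sqrt3 + sqrt6

print(evaluate([1, -4, -4, 16, -8], u_plus_v))
print(evaluate([1, 0, -18, 0, 9], u_times_v))
```

A result of `(0, 0, 0, 0)` means that the number is a root of the polynomial; no floating-point rounding is involved.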
THEOREM 9-12.7. The set $A$ of all algebraic numbers is a field which is a subring of $C$. If $u$ is any algebraic number, then the field $Q[u]$ is a subring of $A$.

Proof. By Theorem 9-12.5, the set $A$ of all algebraic numbers is a field with respect to the operations of addition, multiplication, and negation in $C$. That is, $A$ is a subring of $C$. If $v$ is any element of $Q[u]$, where $u$ is an algebraic number, then by the definition of $Q[u]$, $v$ is a sum of products of algebraic numbers. Thus, by Theorem 9-12.5, $v \in A$. Therefore, $Q[u] \subseteq A$.
1. Find al1 of the rational roots of the following polynomials. (a) 2x3  7x2 10x  6 (b) x3  ,X 3x  2 (C) x3  S X 2  $x + 1. 16 (d) x3  48x 64 (e) x4  52  1 (f) 2~~  ~5  2 ~ 4 ~3 2x2 32  2
+ +
+ + +
2. Prove that if r is a rational root of a monic polynomial with integral coefficients, then r is an integer.
3. Prove that a polynomial of degree 2 or 3 in Q[x]is irreducible in Q[x]if i t has no rational root. Use this result to show that the following polynomials are irreducible in Q[x]. (b) x2 +X  1 (a> x2 II: 1 (c) x3 37x2 2 1 1 ~  1 ( d ) x3  25x  5
+ ++
4. Give the complete factorization in Q[x]of the following polynomials. (a) x4  1 (b) 2 ~ 4 x3 2 ~ 2  1 (e) x4 x2 1
+ + + +
5. For the following polynomials in Q[x]determine al1 rational roots, and the number and approximate location of al1 real roots. (a) x4 3x3 f$x2 &x  & ( b ) x5 4x4 7x3 7x2 4x 1 (c) x 7 + Z J i x 6 + $ x 5  Yx4 3 5 3  3x2+ GX  4 xX 3
+ + + + + + + +
+ d3, +
+,
7. Suppose that $r$ and $s$ are rational numbers and $s$ is not the square of a rational number. Prove (a) the minimal polynomial of $r + \sqrt{s}$ is $x^2 - 2rx + (r^2 - s)$; (b) if $a(x) \in Q[x]$ is such that $r + \sqrt{s}$ is a root of $a(x)$, then $r - \sqrt{s}$ is also a root of $a(x)$.
8. Carry out the proof that if u and v are algebraic numbers, then u an algebraic number in the following special case. (a) u = d Z , v = d 5 (b) u = 43, v = +S.
+ v is
9. Give the details of the proof that if u and v are algebraic numbers of degree
CHAPTER 10
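DEFINITION 10-1.1. Let $D$ be an integral domain. The domain of polynomials in the distinct indeterminates $x_1, x_2, \ldots, x_r$ with coefficients in $D$ is defined by induction on $r$. For $r = 1$, $D[x_1]$ is the integral domain of polynomials in $x_1$ with coefficients in $D$, defined as in Section 9-2. If $r > 1$ and $D[x_1, x_2, \ldots, x_{r-1}]$ has been defined, let
$$D[x_1, x_2, \ldots, x_r] = (D[x_1, x_2, \ldots, x_{r-1}])[x_r].$$
The elements of $D[x_1, x_2, \ldots, x_r]$ are called polynomials in $x_1, x_2, \ldots, x_r$ with coefficients in $D$.

According to Definitions 10-1.1 and 9-2.1, each element of $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely in the form
$$f_0 + f_1x_r + f_2x_r^2 + \cdots + f_nx_r^n, \tag{10-1}$$
where $f_i \in D[x_1, x_2, \ldots, x_{r-1}]$. If $r = 2$, then each $f_i$ is a polynomial in $x_1$, which can be expressed in the form $\sum_{j=0}^{m_i} a_{i,j}x_1^j$ with $a_{i,j} \in D$. Choose $m$ to be the largest of the integers $m_0, m_1, \ldots, m_n$ and define $a_{i,j} = 0$ if $m_i < j \leq m$. Then the polynomial (10-1) becomes
$$\sum_{i=0}^{n}\sum_{j=0}^{m} a_{i,j}x_1^jx_2^i.$$
This representation is unique: if
$$\sum_{i=0}^{n}\sum_{j=0}^{m} a_{i,j}x_1^jx_2^i = \sum_{i=0}^{n}\sum_{j=0}^{m} b_{i,j}x_1^jx_2^i,$$
where all $a_{i,j}$ and $b_{i,j}$ are in $D$, then $a_{i,j} = b_{i,j}$ for all $i$ and $j$. In fact, define
$$f_i = \sum_{j=0}^{m} a_{i,j}x_1^j \quad \text{and} \quad g_i = \sum_{j=0}^{m} b_{i,j}x_1^j$$
for all $i$. Then $\sum_{i=0}^{n} f_ix_2^i = \sum_{i=0}^{n} g_ix_2^i$. By uniqueness of the representation (10-1), it follows that $f_i = g_i$ for all $i$. Therefore, by Definition 9-2.1, $a_{i,j} = b_{i,j}$ for all $i$ and $j$.

In general, it can be shown by induction on $r$ that each polynomial in $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely as a multiple sum
$$\sum_{i_1=0}^{n_1}\sum_{i_2=0}^{n_2}\cdots\sum_{i_r=0}^{n_r} a_{i_1,i_2,\ldots,i_r}\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}, \tag{10-2}$$
where for each string $i_1, i_2, \ldots, i_r$ of integers satisfying $0 \leq i_1 \leq n_1$, $0 \leq i_2 \leq n_2$, $\ldots$, $0 \leq i_r \leq n_r$, $a_{i_1,i_2,\ldots,i_r}$ is an element of $D$. The existence of a representation of the form (10-2) is the reason why the elements of $D[x_1, x_2, \ldots, x_r]$ are called polynomials in $x_1, x_2, \ldots, x_r$, with coefficients in $D$. Because it is cumbersome, the expression (10-2) is frequently shortened to
$$\sum_i a_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r},$$
where $i$ stands for the ordered string $(i_1, i_2, \ldots, i_r)$, and the sum is over a finite number of such strings. It is sometimes convenient to denote polynomials in $r$ indeterminates by expressions such as $a(x_1, x_2, \ldots, x_r)$.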
396
SYSTEMS OF EQUATIONS
AND MATRICES
[CHAP.
10
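The statement that the representation $\sum_i a_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}$ of a polynomial in $D[x_1, x_2, \ldots, x_r]$ is unique means that
$$\sum_i a_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r} = \sum_i b_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}$$
only if $a_i = b_i$ for all $i = (i_1, i_2, \ldots, i_r)$. This fact is very important. Many definitions concerning polynomials in several indeterminates are stated in terms of the representation of polynomials in the form
$$\sum_i a_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}.$$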
The concepts introduced in this way are well defined because of the uniqueness of the representation (a fact which is often not mentioned). Those polynomials in $D[x_1, x_2, \ldots, x_r]$ which contain only the indeterminates $x_{j_1}, x_{j_2}, \ldots, x_{j_s}$, where $j_1, j_2, \ldots, j_s$ are distinct elements of the set $\{1, 2, \ldots, r\}$, form a subring of $D[x_1, x_2, \ldots, x_r]$. This subring is isomorphic to the ring of all polynomials in any $s$ indeterminates with coefficients in $D$. It is natural to denote this subring of $D[x_1, x_2, \ldots, x_r]$ by $D[x_{j_1}, x_{j_2}, \ldots, x_{j_s}]$. For example, a polynomial $\sum_i\sum_j\sum_k a_{i,j,k}\,x^ky^jz^i$ such that $a_{i,j,k} = 0$ for all $k > 0$ can be expressed as
$$\sum_i\sum_j\sum_k a_{i,j,k}\,x^ky^jz^i = \sum_i\sum_j (a_{i,j,0}\,x^0)\,y^jz^i = \sum_i\sum_j b_{i,j}\,y^jz^i,$$
where $a_{i,j,0}\,x^0 = b_{i,j} \in D$. The set of all such polynomials is the subring of $D[x, y, z]$ which we denote by $D[y, z]$. In this way, the rings of polynomials in the various subsets of $\{x_1, x_2, \ldots, x_r\}$ are identified with subrings of $D[x_1, x_2, \ldots, x_r]$.

If $a(x_1, x_2, \ldots, x_r)$ is a polynomial in $D[x_1, x_2, \ldots, x_r]$, then it is clear from the representation (10-2) that for each natural number $j \leq r$, we can think of $a(x_1, x_2, \ldots, x_r)$ as a polynomial in $x_j$ with coefficients in $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r]$. Thus, no distinction is made between $D[x_1, x_2, \ldots, x_r]$ and $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r, x_j]$. In general, if $i_1, i_2, \ldots, i_r$ is any permutation of $1, 2, \ldots, r$, then $D[x_{i_1}, x_{i_2}, \ldots, x_{i_r}]$ is regarded as the same domain of polynomials as $D[x_1, x_2, \ldots, x_r]$. For example, the polynomial
is expressed as (x4
which is a polynomial in $D[y, z][x] = D[y, z, x]$. The notion of the degree of a polynomial can be generalized in several ways to polynomials in several indeterminates. When $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ is regarded as a polynomial in $x_j$ with coefficients in $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r]$, we can use Definition 9-3.1 to define the $x_j$-degree of $a(x_1, x_2, \ldots, x_r)$. That is, if
$$a(x_1, x_2, \ldots, x_r) = \sum_{i=0}^{n} f_i(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r)\,x_j^i,$$
where $f_n(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r) \neq 0$, then the $x_j$-degree of $a(x_1, x_2, \ldots, x_r)$ is $n$. For example, $\frac{1}{2}x^2y + 2xy^3 + 1 = 1 + (2y^3)x + (\frac{1}{2}y)x^2 = 1 + (\frac{1}{2}x^2)y + (2x)y^3$, so that the $x$-degree of this polynomial is 2 and its $y$-degree is 3.
Of course, the properties of the degree of a polynomial listed in Theorem 9-3.2 are satisfied by the $x_j$-degree for each $x_j$. It is also possible to define the total degree of
$$\sum_i a_{i_1,i_2,\ldots,i_r}\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}$$
to be the largest of the sums $i_1 + i_2 + \cdots + i_r$ for which $a_{i_1,i_2,\ldots,i_r}$ is not zero. For example, the total degree of $\frac{1}{2}x^2y + 2xy^3 + 1$ is four. It is easy to prove the analogue of Theorem 9-3.2 for the total degree.
(10-1.2). Let $a(x_1, x_2, \ldots, x_r)$ and $b(x_1, x_2, \ldots, x_r)$ be nonzero polynomials of total degrees $m$ and $n$ respectively. Then
(a) $a(x_1, x_2, \ldots, x_r) \cdot b(x_1, x_2, \ldots, x_r)$ has total degree $m + n$;
(b) $a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r)$ is either zero, or has total degree $\leq \max\{m, n\}$;
(c) if $m \neq n$, then the total degree of $a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r)$ is equal to $\max\{m, n\}$.
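The two notions of degree, and part (a) of (10-1.2), can be illustrated on the example polynomial $\frac{1}{2}x^2y + 2xy^3 + 1$. The sketch below assumes a representation chosen only for illustration: a polynomial is a dictionary sending each exponent string $(i_1, \ldots, i_r)$ to its nonzero coefficient, which mirrors the shortened form of (10-2).

```python
from fractions import Fraction

def total_degree(p):
    """Largest sum i_1 + ... + i_r over the terms with nonzero coefficient."""
    return max(sum(e) for e, c in p.items() if c != 0)

def degree_in(p, j):
    """The x_j-degree, where j = 0 refers to the first indeterminate."""
    return max(e[j] for e, c in p.items() if c != 0)

def multiply(p, q):
    """Product of two polynomials in the same indeterminates."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(s + t for s, t in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

a = {(2, 1): Fraction(1, 2), (1, 3): 2, (0, 0): 1}   # (1/2)x^2 y + 2xy^3 + 1
b = {(1, 0): 1, (0, 1): 1}                           # x + y

print(total_degree(a), degree_in(a, 0), degree_in(a, 1))
print(total_degree(multiply(a, b)))
```

Here the product has total degree $4 + 1 = 5$, in agreement with part (a).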
We leave the proof of these facts for the reader to supply. The arithmetical properties of the rings $F[x]$ with $F$ a field cannot be generalized to polynomial domains $F[x_1, x_2, \ldots, x_r]$ with $r > 1$. The most important results in Sections 9-3 and 9-4 are false in $F[x_1, x_2, \ldots, x_r]$ when $r > 1$. Surprisingly enough, the unique factorization theorem is
true in $F[x_1, x_2, \ldots, x_r]$, although it is proved in a different way than Theorem 9-5.4. We will not enter into a discussion of these matters, but will only note the following example.
EXAMPLE 1. The polynomials $x$ and $y$ in $Q[x, y]$ clearly have only nonzero rational numbers as common divisors. Hence, 1 is a greatest common divisor of $x$ and $y$ (in the sense explained in Section 5-2). It is not hard to see, however, that there are no polynomials $f(x, y)$ and $g(x, y)$ in $Q[x, y]$ such that
$$f(x, y) \cdot x + g(x, y) \cdot y = 1.$$
The definition of substitution given in 9-7.1 can be extended to polynomials in several indeterminates.
DEFINITION 10-1.3. Let $D$ be an integral domain, and let $A$ be a commutative ring which contains $D$ as a subring. Suppose that
$$a(x_1, x_2, \ldots, x_r) = \sum_i a_i\,x_1^{i_1}x_2^{i_2}\cdots x_r^{i_r}$$
is in $D[x_1, x_2, \ldots, x_r]$. Let $(u_1, u_2, \ldots, u_r)$ be an ordered string of elements of $A$. Then the element
$$\sum_i a_i\,u_1^{i_1}u_2^{i_2}\cdots u_r^{i_r}$$
in $A$ is called the value of $a(x_1, x_2, \ldots, x_r)$ for $x_1 = u_1$, $x_2 = u_2$, $\ldots$, and $x_r = u_r$, and this value is denoted by $a(u_1, u_2, \ldots, u_r)$. The element $a(u_1, u_2, \ldots, u_r)$ is said to be obtained by substituting $u_1, u_2, \ldots, u_r$ for $x_1, x_2, \ldots, x_r$ in $a(x_1, x_2, \ldots, x_r)$.
EXAMPLE 2. Let $D = R$, $a(x, y, z) = x^2 + y^2 - z^2$. If $A = C$, the value of $a(x, y, z)$ at $(1, i, 1)$ is $a(1, i, 1) = 1^2 + (i)^2 - 1^2 = -1$. If $A = R$, the value of $a(x, y, z)$ at $(\sqrt{2}, \sqrt{2}, 2)$ is $a(\sqrt{2}, \sqrt{2}, 2) = (\sqrt{2})^2 + (\sqrt{2})^2 - (2)^2 = 0$. Let $A = R[x, y]$. Then the value of $a(x, y, z)$ at
1011
POLYNOMIALS I N S E V E R A L I X D E T E R M I N A T E S
399
The property of substitution given in (9-7.2) can be generalized.

(10-1.4). Let $D$ be an integral domain which is a subring of the commutative ring $A$. Let $f(x_1, x_2, \ldots, x_r)$, $a(x_1, x_2, \ldots, x_r)$, and $b(x_1, x_2, \ldots, x_r)$ be in $D[x_1, x_2, \ldots, x_r]$. Suppose that $u_1, u_2, \ldots, u_r$ are in $A$.
(a) If $f(x_1, x_2, \ldots, x_r) = a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r)$, then
a(x1, $2,
xr), then
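for all $v \in A$.
(d) Let $g(x_1, x_2, \ldots, x_s) \in D[x_1, x_2, \ldots, x_s]$, $a_i(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ for $i = 1, 2, \ldots, s$. If
$$h(x_1, x_2, \ldots, x_r) = g(a_1(x_1, x_2, \ldots, x_r), \ldots, a_s(x_1, x_2, \ldots, x_r)),$$
then
$$h(u_1, u_2, \ldots, u_r) = g(a_1(u_1, u_2, \ldots, u_r), \ldots, a_s(u_1, u_2, \ldots, u_r)).$$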
The statements (a), (b), and (c) are easily proved by means of the generalized commutative, associative, and distributive laws of operation in a ring. (See Section 9-7 for the proof of (b) in the case $r = 1$.) The statement (d) can be obtained from (a), (b), and (c) by induction on $s$ (see Problem 14 below). Part (d) includes (a) and (b) as the special cases in which $g(x_1, x_2) = x_1 + x_2$ and $g(x_1, x_2) = x_1 \cdot x_2$. Another important consequence of (d) is the fact that the result of substituting for the indeterminates in a polynomial does not depend on the way in which the polynomial is expressed. For example,
then
for any $u_1, u_2, u_3, u_4$ in a commutative ring containing $Z$ as a subring. Of course, this fact could be shown directly.

DEFINITION 10-1.5. Let $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$, where $D$ is an integral domain. Let $A$ be a commutative ring containing $D$. If $u_1, u_2, \ldots, u_r$ are in $A$, then the ordered string $(u_1, u_2, \ldots, u_r)$ is called a zero of $a(x_1, x_2, \ldots, x_r)$ [or a solution of $a(x_1, x_2, \ldots, x_r) = 0$] in the ring $A$ if $a(u_1, u_2, \ldots, u_r) = 0$. More generally, if
are polynomials in $D[x_1, x_2, \ldots, x_r]$, then $(u_1, u_2, \ldots, u_r)$ is called a solution of the system of equations
EXAMPLE 3. Let $a(x, y) \in R[x, y]$. The zeros $(u, v)$ of $a(x, y)$ in $R$ can be considered as the coordinates of points in the cartesian plane. The set of all such points constitutes what is called an algebraic curve (possibly degenerate, that is, the empty set, or a finite number of points). For example, if $a(x, y) = x^2 + y^2 - 1$, the set of all points $(u, v)$ which are zeros of $a(x, y)$ is the same as the set of all points which are at a distance one from the origin. Hence, the solutions in $R$ of $a(x, y) = 0$, when plotted as points in the cartesian plane, form a circle of radius one with center at the origin.

EXAMPLE 4. Let $a(x, y, z) \in R[x, y, z]$. The zeros $(u, v, w)$ of $a(x, y, z)$ in $R$ can be considered as the coordinates of points in three-dimensional cartesian space (by a process which is similar to the representation of number pairs by points in the plane). The set of all zeros in $R$ of a polynomial $a(x, y, z) \in R[x, y, z]$ constitutes what is called an algebraic surface (possibly degenerate, that is, the empty set, or a finite set of points and algebraic curves). For example, let $a(x, y, z) = x^2 + y^2 - z^2$. It is possible to show that the set of all zeros of $a(x, y, z)$ in $R$ lie on two cones with their vertices meeting at the origin and with their axes extending along the $z$-axis in space (see Fig. 10-1). The zero
$$(x^2 - y^2,\ 2xy,\ x^2 + y^2)$$
of $a(x, y, z)$ in $R[x, y]$ is called a parametrization of the upper half of this surface. The points on the upper cone are exactly those solutions $(w_1, w_2, w_3)$ in $R$ of $a(x, y, z) = 0$ with $w_3 \geq 0$. If any real numbers $u$ and $v$ are substituted for $x$ and $y$, respectively, in $(x^2 - y^2, 2xy, x^2 + y^2)$, we obtain a zero $(u^2 - v^2, 2uv, u^2 + v^2)$ in $R$ of $a(x, y, z)$ with $u^2 + v^2 \geq 0$, and therefore a point on the upper cone. The reader can show conversely that any zero $(w_1, w_2, w_3)$ in $R$ of $a(x, y, z)$ with $w_3 \geq 0$ is of the form $w_1 = u^2 - v^2$, $w_2 = 2uv$, $w_3 = u^2 + v^2$ for suitable real numbers $u$ and $v$.
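Substitution in the sense of Definition 10-1.3, and the parametrization of the cone just described, can both be tried numerically. The sketch below is only a spot check on a small grid of integers, not a proof; the function names `a` and `cone_point` are chosen for illustration.

```python
def a(x, y, z):
    """Value of the polynomial x^2 + y^2 - z^2 at (x, y, z)."""
    return x**2 + y**2 - z**2

def cone_point(u, v):
    """The string (u^2 - v^2, 2uv, u^2 + v^2) obtained from the parametrization."""
    return (u * u - v * v, 2 * u * v, u * u + v * v)

# Substitution in C: the value at (1, i, 1).
print(a(1, 1j, 1))

# Every point produced by the parametrization is a zero with third coordinate >= 0.
points = [cone_point(u, v) for u in range(-5, 6) for v in range(-5, 6)]
print(all(a(*w) == 0 and w[2] >= 0 for w in points))
```

With integer inputs the arithmetic is exact, so each listed point is verified to lie on the upper cone.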
EXAMPLE 5. Let $a_1(x, y, z) = x^2 + y^2 - z^2$, $a_2(x, y, z) = x^2 + y^2 - 1$ be in $R[x, y, z]$. The zeros in $R$ of the system $a_1(x, y, z) = 0$, $a_2(x, y, z) = 0$ consist of all $(u, v, \pm 1)$ with $u^2 + v^2 = 1$. Thus, in the three-dimensional cartesian coordinates, the set of all these zeros forms two circles of radius one in space (see Fig. 10-2).
The branch of mathematics which is concerned with the zeros of systems of polynomials in several indeterminates is known as algebraic geometry. In recent years, the geometric aspects of algebraic geometry have become subordinate to the algebraic features of the theory.

Each of the rings $D[x_1, x_2, \ldots, x_r]$ contains an important class of special polynomials, the symmetric polynomials. Ordinarily, a polynomial is changed into a different polynomial when its indeterminates are permuted. For example, if $a(x, y, z) = x + y^2 + z^3$, then $a(z, x, y) = z + x^2 + y^3$, $a(y, z, x) = y + z^2 + x^3$, etc. However, certain polynomials are left unchanged by all permutations of their indeterminates. For instance, let $a(x, y) = x^2 + xy + y^2$. The only permutations of $(x, y)$ are
$$\begin{pmatrix} x & y \\ x & y \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} x & y \\ y & x \end{pmatrix}.$$
Obviously, the first of these permutations does not change $a(x, y)$. The second permutation changes $a(x, y)$ into $a(y, x)$. However,
$$a(y, x) = y^2 + yx + x^2 = x^2 + xy + y^2 = a(x, y).$$
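DEFINITION 10-1.6. A polynomial $a(x_1, x_2, \ldots, x_r)$ in $D[x_1, x_2, \ldots, x_r]$ is called symmetric if it has the property that for any permutation
$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$
of $\{1, 2, \ldots, r\}$,
$$a(x_{j_1}, x_{j_2}, \ldots, x_{j_r}) = a(x_1, x_2, \ldots, x_r).$$
That is, $a(x_1, x_2, \ldots, x_r)$ is symmetric if every interchange of the indeterminates in $a(x_1, x_2, \ldots, x_r)$ leaves this polynomial unchanged.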
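It is not necessary to check every permutation of $\{1, 2, \ldots, r\}$ to determine whether a polynomial $a(x_1, x_2, \ldots, x_r)$ is symmetric.

(10-1.7). Let $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$. Then $a(x_1, x_2, \ldots, x_r)$ is symmetric if and only if for every pair $i, j$ of natural numbers with $1 \leq i < j \leq r$,
$$a(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_r) = a(x_1, x_2, \ldots, x_r).$$
That is, interchanging $x_i$ and $x_j$ has no effect on $a(x_1, x_2, \ldots, x_r)$.

Proof. Suppose that $a(x_1, x_2, \ldots, x_r)$ is symmetric. Interchanging $x_i$ and $x_j$ is a particular permutation of the indeterminates, so it leaves $a(x_1, x_2, \ldots, x_r)$ unchanged.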
The proof of the converse will be clearer if we first examine a special case. Let $r = 4$ and suppose that interchanging any two indeterminates has no effect on the polynomial $a(x_1, x_2, x_3, x_4)$. Consider the permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}.$$
By assumption,
$$a(x_1, x_2, x_3, x_4) = a(x_3, x_2, x_1, x_4),$$
since $a(x_3, x_2, x_1, x_4)$ is obtained from $a(x_1, x_2, x_3, x_4)$ by interchanging $x_1$ and $x_3$. For the same reason, we have $a(x_1, x_2, x_3, x_4) = a(x_1, x_4, x_3, x_2)$ and $a(x_1, x_2, x_3, x_4) = a(x_1, x_2, x_4, x_3)$. In the identity $a(x_1, x_2, x_3, x_4) = a(x_1, x_4, x_3, x_2)$, substitute $u_1$, $u_2$, $u_3$, and $u_4$ for $x_1$, $x_2$, $x_3$, and $x_4$, where $u_1 = x_3$, $u_2 = x_2$, $u_3 = x_1$, and $u_4 = x_4$. It then follows from (10-1.4d) that
$$a(x_3, x_2, x_1, x_4) = a(x_3, x_4, x_1, x_2).$$
Similarly, in the identity $a(x_1, x_2, x_3, x_4) = a(x_1, x_2, x_4, x_3)$ substitute $u_1$, $u_2$, $u_3$, and $u_4$ for $x_1$, $x_2$, $x_3$, and $x_4$, where $u_1 = x_3$, $u_2 = x_4$, $u_3 = x_1$, and $u_4 = x_2$. We obtain
$$a(x_3, x_4, x_1, x_2) = a(x_3, x_4, x_2, x_1).$$
Combining these equalities gives the required result that $a(x_1, x_2, x_3, x_4)$ is left unchanged by the permutation
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}.$$
The proof of the general case follows the same idea, but uses more elaborate notation. First note that if $k_1, k_2, \ldots, k_r$ is any rearrangement of $1, 2, \ldots, r$, then for any pair $i$, $j$ with $1 \le i < j \le r$,
$$a(x_{k_1}, \ldots, x_{k_{i-1}}, x_{k_j}, x_{k_{i+1}}, \ldots, x_{k_{j-1}}, x_{k_i}, x_{k_{j+1}}, \ldots, x_{k_r}) = a(x_{k_1}, x_{k_2}, \ldots, x_{k_r}). \tag{103}$$
Indeed, by hypothesis, interchanging $x_i$ and $x_j$ in $a(x_1, x_2, \ldots, x_r)$ has no effect. Substituting $u_1, u_2, \ldots, u_r$ for $x_1, x_2, \ldots, x_r$, where $u_1 = x_{k_1}$, $u_2 = x_{k_2}$, $\ldots$, $u_r = x_{k_r}$, gives the required identity (103). The identity (103) means that in $a(x_{k_1}, x_{k_2}, \ldots, x_{k_r})$, any two of the indeterminates $x_{k_1}, x_{k_2}, \ldots, x_{k_r}$ can be interchanged without changing the polynomial. Moreover, for any permutation
$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$
it is possible to obtain $a(x_{j_1}, x_{j_2}, \ldots, x_{j_r})$ from $a(x_1, x_2, \ldots, x_r)$ by a finite sequence of such interchanges. Indeed, starting with $a(x_1, x_2, \ldots, x_r)$, we can put $x_{j_1}$ in the first position by substituting $x_{j_1}$ for $x_1$ and $x_1$ for $x_{j_1}$. If $j_1 = 1$, this operation involves no change at all. If $j_1 \ne 1$, then the substitution simply interchanges $x_1$ and $x_{j_1}$ in $a(x_1, x_2, \ldots, x_r)$. In this case, it follows from (103) that the resulting polynomial is equal to $a(x_1, x_2, \ldots, x_r)$. By a similar substitution, it is possible to get $x_{j_2}$ into the second position. Since $j_2 \ne j_1$ (by the definition of a permutation), the interchange which puts $x_{j_2}$ into the second place will not affect $x_{j_1}$. Continuing this process, we have
$$\begin{aligned} a(x_1, x_2, \ldots, x_r) &= a(x_{j_1}, \ldots) \\ &= a(x_{j_1}, x_{j_2}, \ldots) \\ &\;\;\vdots \\ &= a(x_{j_1}, x_{j_2}, \ldots, x_{j_r}) \end{aligned}$$
(making allowance for the inexactness of our notation). Each polynomial in the column on the right side is obtained from the polynomial above it by interchanging two indeterminates or by no change at all. Hence, by the identity (103), each polynomial is equal to the one which precedes it. This proves (101.7).
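The content of (101.7) can be checked mechanically on small examples. The following is a minimal sketch, not part of the text: it assumes a polynomial is stored as a dictionary from exponent tuples to nonzero coefficients (a representation chosen here only for illustration), and the function names are hypothetical. It compares the test over all permutations (Definition 101.6) with the test over interchanges of two indeterminates.

```python
from itertools import combinations, permutations

# Assumed representation: a polynomial in r indeterminates is a dict that maps
# an exponent tuple (e1, ..., er) to its nonzero coefficient.

def substitute(poly, perm):
    """Return a(x_{j1}, ..., x_{jr}) for perm = (j1, ..., jr), 1-based."""
    result = {}
    for exps, coeff in poly.items():
        new_exps = [0] * len(exps)
        for k, e in enumerate(exps):
            # x_{k+1}^e is replaced by x_{j_{k+1}}^e
            new_exps[perm[k] - 1] += e
        key = tuple(new_exps)
        result[key] = result.get(key, 0) + coeff
    return {k: c for k, c in result.items() if c != 0}

def is_symmetric_all(poly, r):
    """Definition 101.6: test every permutation of (1, ..., r)."""
    return all(substitute(poly, p) == poly
               for p in permutations(range(1, r + 1)))

def is_symmetric_swaps(poly, r):
    """Criterion (101.7): test only interchanges of x_i and x_j, i < j."""
    for i, j in combinations(range(1, r + 1), 2):
        perm = list(range(1, r + 1))
        perm[i - 1], perm[j - 1] = j, i
        if substitute(poly, tuple(perm)) != poly:
            return False
    return True

a = {(2, 0): 1, (1, 1): 1, (0, 2): 1}            # x^2 + xy + y^2
b = {(1, 0, 0): 1, (0, 2, 0): 1, (0, 0, 3): 1}   # x + y^2 + z^3

print(is_symmetric_swaps(a, 2), is_symmetric_all(a, 2))   # True True
print(is_symmetric_swaps(b, 3), is_symmetric_all(b, 3))   # False False
```

The interchange test examines $r(r-1)/2$ substitutions instead of $r!$, which is the practical value of (101.7).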
THEOREM 101.8. The sum, product, and negative of symmetric polynomials are symmetric. Hence, the set of all symmetric polynomials in $D[x_1, x_2, \ldots, x_r]$ is a subring of $D[x_1, x_2, \ldots, x_r]$.
Proof. Let
$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$
be a permutation of $(1, 2, \ldots, r)$. If $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$, then the polynomial $a(x_{j_1}, x_{j_2}, \ldots, x_{j_r})$ is obtained from $a(x_1, x_2, \ldots, x_r)$ by substituting $x_{j_1}$ for $x_1$, $x_{j_2}$ for $x_2$, $\ldots$, and $x_{j_r}$ for $x_r$. In particular, if $a(x_1, x_2, \ldots, x_r)$ and $b(x_1, x_2, \ldots, x_r)$ are symmetric, and
$$f(x_1, x_2, \ldots, x_r) = a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r),$$
then by (101.4a),
$$f(x_{j_1}, \ldots, x_{j_r}) = a(x_{j_1}, \ldots, x_{j_r}) + b(x_{j_1}, \ldots, x_{j_r}) = a(x_1, \ldots, x_r) + b(x_1, \ldots, x_r) = f(x_1, \ldots, x_r).$$
It follows that $f(x_1, x_2, \ldots, x_r)$ is symmetric. The fact that the product and negative of symmetric polynomials are symmetric follows in a similar way from (101.4).
There is a particularly important class of symmetric polynomials, which can be conveniently defined as follows.
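The elementary symmetric polynomials in $D[x_1, x_2, \ldots, x_r]$ are the polynomials $s_1^{(r)}(x_1, x_2, \ldots, x_r)$, $s_2^{(r)}(x_1, x_2, \ldots, x_r)$, $\ldots$, $s_r^{(r)}(x_1, x_2, \ldots, x_r)$ defined by the following identity in $(D[x_1, x_2, \ldots, x_r])[x_{r+1}]$:
$$(x_{r+1} - x_1)(x_{r+1} - x_2) \cdots (x_{r+1} - x_r) = x_{r+1}^r - s_1^{(r)} x_{r+1}^{r-1} + s_2^{(r)} x_{r+1}^{r-2} - \cdots + (-1)^r s_r^{(r)}.$$
For example, if $r = 2$,
$$(x_3 - x_1)(x_3 - x_2) = x_3^2 - (x_1 + x_2)x_3 + x_1x_2,$$
so that
$$s_1^{(2)}(x_1, x_2) = x_1 + x_2, \qquad s_2^{(2)}(x_1, x_2) = x_1x_2.$$
If $r = 3$,
$$(x_4 - x_1)(x_4 - x_2)(x_4 - x_3) = x_4^3 - (x_1 + x_2 + x_3)x_4^2 + (x_1x_2 + x_2x_3 + x_3x_1)x_4 - x_1x_2x_3,$$
so that
$$s_1^{(3)}(x_1, x_2, x_3) = x_1 + x_2 + x_3, \qquad s_2^{(3)}(x_1, x_2, x_3) = x_1x_2 + x_2x_3 + x_3x_1, \qquad s_3^{(3)}(x_1, x_2, x_3) = x_1x_2x_3.$$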
. . . , xT])[xT+l], we obtain
(101.10). Let $f(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$. Then
$$f(s_1^{(r)}(x_1, \ldots, x_r), s_2^{(r)}(x_1, \ldots, x_r), \ldots, s_r^{(r)}(x_1, \ldots, x_r))$$
is symmetric. This observation is an immediate consequence of the symmetry of the elementary symmetric polynomials and (101.4d). We leave the proof to the reader. The converse of (101.10) is a deeper and more important result.
THEOREM 101.11. Fundamental theorem of symmetric polynomials. Let $a(x_1, x_2, \ldots, x_r)$ be a symmetric polynomial in $D[x_1, x_2, \ldots, x_r]$, where $D$ is any integral domain. Then there is a polynomial $f(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ such that
$$a(x_1, x_2, \ldots, x_r) = f(s_1^{(r)}(x_1, \ldots, x_r), s_2^{(r)}(x_1, \ldots, x_r), \ldots, s_r^{(r)}(x_1, \ldots, x_r)).$$
We will not prove this theorem here, but the interested reader can find a proof in Appendix 2.
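EXAMPLE 6. Let $a(x_1, x_2, x_3) \in Z[x_1, x_2, x_3]$ be the symmetric polynomial $x_1^3 + x_2^3 + x_3^3$. We have
$$(s_1^{(3)})^3 = (x_1 + x_2 + x_3)^3 = x_1^3 + x_2^3 + x_3^3 + 3(x_1^2x_2 + x_1x_2^2 + x_1^2x_3 + x_1x_3^2 + x_2^2x_3 + x_2x_3^2) + 6x_1x_2x_3$$
and
$$s_1^{(3)}s_2^{(3)} = (x_1 + x_2 + x_3)(x_1x_2 + x_2x_3 + x_3x_1) = x_1^2x_2 + x_1x_2^2 + x_1^2x_3 + x_1x_3^2 + x_2^2x_3 + x_2x_3^2 + 3x_1x_2x_3.$$
Hence,
$$x_1^3 + x_2^3 + x_3^3 = (s_1^{(3)})^3 - 3s_1^{(3)}s_2^{(3)} + 3s_3^{(3)}.$$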
The general procedure followed in Example 6 can be used to express any symmetric polynomial $a(x_1, x_2, \ldots, x_r)$ in $D[x_1, x_2, \ldots, x_r]$ in terms of the elementary symmetric polynomials. Roughly speaking, the process consists of computing all products (including powers) of elementary symmetric polynomials such that the products have total degree no greater than the total degree of $a(x_1, x_2, \ldots, x_r)$. It is then possible (usually by inspection) to express $a(x_1, x_2, \ldots, x_r)$ as a sum with coefficients in $D$ of these products. This procedure can be systematized, but the statement of the exact process is somewhat complicated. In practice, the method of trial and error is usually effective.
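The identity found in Example 6 can be confirmed by multiplying out the right-hand side. The sketch below is illustrative only: it assumes polynomials in three indeterminates are stored as dictionaries from exponent tuples to integer coefficients, and the helper names (`add`, `scale`, `mul`, `elementary`) are hypothetical.

```python
from itertools import combinations

# Assumed representation: dict mapping exponent tuples to nonzero coefficients.

def add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def scale(c, p):
    return {e: c * v for e, v in p.items() if c * v != 0}

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(x + y for x, y in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def elementary(k, r):
    """The k-th elementary symmetric polynomial in r indeterminates:
    the sum of all products of k distinct indeterminates."""
    return {tuple(1 if i in subset else 0 for i in range(r)): 1
            for subset in combinations(range(r), k)}

s1, s2, s3 = (elementary(k, 3) for k in (1, 2, 3))

# Right-hand side of Example 6: s1^3 - 3*s1*s2 + 3*s3.
rhs = add(add(mul(s1, mul(s1, s1)), scale(-3, mul(s1, s2))), scale(3, s3))

power_sum = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}   # x1^3 + x2^3 + x3^3
print(rhs == power_sum)   # True
```

The same helpers can be used for the trial-and-error method described above: compute the candidate products, then compare coefficients.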
1. Formulate the definitions of the following concepts for the special case of polynomials in the two indeterminates x and y. (a) the total degree of a(x, y) (b) the value of a(x, y) for x = u, y = v (c) a zero of a(x, y) in the ring 1 . 1 2. What are Deg, [a(%, y)], Deg, [a(%, y)], and the total degree of a(x, y) for the following polynomials?
3. Prove by induction on $r$ that every element of $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely in the form (102).
4. Prove (101.2).
5. Prove that there are no polynomials $f(x, y)$, $g(x, y)$ in $D[x, y]$ such that $x f(x, y) + y g(x, y) = 1$. [Hint: Substitute $x$ for $y$.]
6. Describe geoinetrically the zeros in R of the following polynomials in R[x, yl. ( 4 x2 t y2 (b) x  y (d) (x  112 (Y 212  4 ( 4 XY (e) x2 2x  3 (f) y2 1 7. Find the solutions in R of the following systems of equations. =O (a) x + y  5 = O , x  y + l 1 = 0, 10x  15y 2 = O (b) 2x  3y (c) 2x  3y 1 = O , 10% 15y 5 = O
+ +
+ +
+ +
< <
<i<4
9. Which of the following polynomials in D[xi, x2, x3, x4] are symmetric? Prove your assertions. (a) xSx2 222x3 232x4 24x1 (b) (xl 22 x 3 ) ( ~ 2 23 x4)(51 x3 ~4)(~1 22 24) x2x3 x3x1. (c) x1x2
+ + + + + + + + +
+ +
+ +
10. Give the details of the proof of (101.10). 11. Express the following symmetric polynomials in Z[xl, x2, xa] in terms of the elementary symmetric polynomials. (a) 2: x2 23 (b) x?xz 22x3 ~ 3 x 1 x?x3 22x1 23x2 (4 x : 22 23 2 2 (d) x?x%3 xlxgxg f XlX2x3 12. Suppose that the roots of the polynomial x3  2x2 x and r3. Find the cubic polynomial whose roots are rS, r2, and rg.
+ + + + + + +
13. (a) Show that in Q[x, y], every symmetric polynomial a(x, y) can be written in the form a(x, Y)
=
E
i=O
ri.i(xy)?xi
j=O
+ Y'),
where ri,j E Q. [Hint: Let a(x, y) = ~ . ~ k , ~ x and ~ yobserve l, that since a(x, y) is symmetric a(x, y) = +[a(x, y) a(y, x)].] (b) Prove the fundamental theorem on symmetric polynomials for Q[x, y] by showing that for al1 j 2 O , si yi can be written in the form f(x y, xy) for some f(x, y) E Q[x, y]. [Hint: Note that x ? + ~ yi+2 = (x Y)(xif l yi+l)  xy(xi yi), and use induction.]
++
+ + +
and
+ cm(x1,
22,
. , xr)
in D[xl, XZ., . . . , xr], then for any ul, u2, ring containing D as a subring
. . . , ur in
a commutative
and
Q(u~ ~) 2
)
ur)
dl(u1, u.2,
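is no greater than one. Thus, the equations can be written in the form
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,r}x_r &= b_2 \\ &\;\;\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s, \end{aligned} \tag{104}$$
where the coefficients $a_{i,j}$ and $b_i$ are elements of an integral domain $D$. We refer to (104) as a system of $s$ linear equations in $r$ indeterminates (or unknowns) with coefficients in $D$. For example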
1 = 1 = 0
is a system of three equations in five unknowns with coefficients in the field $Q$. Note that the case in which all of the coefficients $a_{i,1}, a_{i,2}, \ldots, a_{i,r}$ and the constant term of one or more equations in a system are zero is not excluded. It is often convenient to omit terms which have zero coefficient,
provided that this does not cause confusion. For example, instead of xl 0x1 \ve would write
+ 0x2 + Ox3  x4 = 1
+ x2 + x3 + x4
xz
o,
+ +
X3
X l  X4 =
Xq
1 = 0.
because then it would not be clear that the system is in four indeterminates rather than three, unless this fact were mentioned explicitly. Therefore, whenever such a system is written, all indeterminates will be exhibited. In dealing with arbitrary systems of linear equations, it is convenient to use the summation notation, and write
$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$
instead of (104). This notation is not convenient for specific systems in which $r$ and $s$ are small. If $r \le 3$, we will use $x$, $y$, and $z$ instead of $x_1$, $x_2$, and $x_3$. Definition 101.5, of a solution of a general system of polynomial equations, applies to systems of linear equations in particular. That is, if
$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$
is a system of $s$ linear equations in $r$ unknowns with coefficients in the integral domain $D$, and if $A$ is a commutative ring containing $D$ as a subring, then a solution in $A$ of this system consists of an ordered string $(c_1, c_2, \ldots, c_r)$ of $r$ elements in $A$, such that $\sum_{j=1}^{r} a_{i,j}c_j = b_i$, for $i = 1, 2, \ldots, s$.
DEFINITION 102.1. A system of linear equations with coefficients in an integral domain $D$ is called consistent if it has a solution in some commutative ring containing $D$ as a subring. Otherwise, the system is called inconsistent.
When $D$ is a field, there is a way to decide whether or not a system of linear equations with coefficients in $D$ is consistent, and to find all of the solutions of the system if it is consistent. In the remainder of this section we will explain this method of solving systems of linear equations.* The general idea of the process is to construct a new system of equations from the given one. The new system is such that its consistency can be determined by inspection, and when it is consistent, its solutions are easily found. Moreover, the new system is constructed in such a way that it has exactly the same set of solutions as the original system.
and
be systems of linear equations with coefficients in a field F. The systems are equivalent if every solution of the first system is a solution of the second, and vice versa. For example, the system
z+
2x
y=o 2y = o
It is obvious that the relation of equivalence of systems of equations is reflexive, symmetric, and transitive. That is, every system is equivalent to itself; if the system $S_1$ is equivalent to the system $S_2$, then $S_2$ is equivalent to $S_1$; and if the system $S_1$ is equivalent to the system $S_2$ and $S_2$ is equivalent to a system $S_3$, then $S_1$ is equivalent to $S_3$. Moreover, any two inconsistent systems are equivalent.
* The theory of determinants furnishes another method of solving systems of linear equations. In the simplest case of $r$ equations in $r$ unknowns, with the determinant of the coefficients not equal to zero, the familiar Cramer's rule provides explicit formulas for the unknowns as quotients of certain determinants. However, if the number of equations and unknowns exceeds four, then it requires considerable computation to evaluate these determinants, so that Cramer's rule is of more theoretical than practical importance. In this book we will not discuss determinants or their application to the solution of linear equations. A complete discussion of these topics can be found in References 20, 21, 22, 24, and 25 listed at the end of this book.
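There are three basic operations called elementary transformations which replace a given system of equations with coefficients in a field $F$ by an equivalent system. These operations are described as follows:
(1) interchange two equations;
(2) multiply an equation by an element of $F$ and add the result to a different equation of the system;
(3) multiply an equation by a nonzero element of $F$.
Thus, if the original system of equations is (104), then the forms of systems obtained by applying elementary transformations of the three types are as follows.
Type 1, where $1 \le m < n \le s$ (the $m$th and $n$th equations are interchanged):
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\;\;\vdots \\ a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,r}x_r &= b_n \\ &\;\;\vdots \\ a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,r}x_r &= b_m \\ &\;\;\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$
Type 2, where $1 \le m < n \le s$, and $c \in F$:
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\;\;\vdots \\ a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,r}x_r &= b_m \\ &\;\;\vdots \\ (a_{n,1} + ca_{m,1})x_1 + (a_{n,2} + ca_{m,2})x_2 + \cdots + (a_{n,r} + ca_{m,r})x_r &= b_n + cb_m \\ &\;\;\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$
Type 3, where $c \ne 0$ in $F$ (the $m$th equation is multiplied by $c$):
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\;\;\vdots \\ ca_{m,1}x_1 + ca_{m,2}x_2 + \cdots + ca_{m,r}x_r &= cb_m \\ &\;\;\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$
It is clear that each type of elementary transformation takes a system of $s$ linear equations in $r$ unknowns with coefficients in $F$ into a system of linear equations of the same sort, that is, $s$ equations in $r$ unknowns with coefficients in $F$.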
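The three elementary transformations are easy to state as operations on a list of equations. The sketch below is an illustration under stated assumptions, not the book's notation: each equation is stored as a row `[a_1, ..., a_r, b]`, equations are indexed from 0, the field is $Q$ (via Python's `Fraction`), and the function names and the sample system are hypothetical.

```python
from fractions import Fraction

# Assumed representation: a system is a list of rows [a_1, ..., a_r, b],
# one row per equation, with equations indexed from 0.

def interchange(system, m, n):
    """Type 1: interchange equations m and n."""
    rows = [row[:] for row in system]
    rows[m], rows[n] = rows[n], rows[m]
    return rows

def add_multiple(system, m, n, c):
    """Type 2: add c times equation m to a different equation n."""
    if m == n:
        raise ValueError("type 2 requires two different equations")
    rows = [row[:] for row in system]
    rows[n] = [x + c * y for x, y in zip(rows[n], rows[m])]
    return rows

def multiply(system, m, c):
    """Type 3: multiply equation m by a nonzero element c."""
    if c == 0:
        raise ValueError("type 3 requires a nonzero multiplier")
    rows = [row[:] for row in system]
    rows[m] = [c * x for x in rows[m]]
    return rows

def satisfies(system, solution):
    """Check whether the ordered string `solution` satisfies every equation."""
    return all(sum(a * x for a, x in zip(row[:-1], solution)) == row[-1]
               for row in system)

# Hypothetical system: x + y = 3, x - y = 1, with solution (2, 1).
system = [[1, 1, 3], [1, -1, 1]]
for new_system in (interchange(system, 0, 1),
                   add_multiple(system, 0, 1, Fraction(-1)),
                   multiply(system, 1, Fraction(1, 2))):
    print(satisfies(new_system, (2, 1)), satisfies(new_system, (0, 0)))
    # prints "True False" for each of the three transformed systems
```

Each function returns a new system and leaves its argument untouched, which makes it easy to keep the whole chain of successive systems.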
THEOREM 102.3. Suppose that $S$ and $S'$ are systems of linear equations with coefficients in a field $F$ such that $S'$ is obtained from $S$ by means of a sequence of elementary transformations. That is, there are systems of linear equations $S_0, S_1, S_2, \ldots, S_n$ such that $S_0$ is $S$ and $S_n$ is $S'$, and for each natural number $k \le n$ the system $S_k$ is obtained from the system $S_{k-1}$ by means of an elementary transformation. Then the systems $S$ and $S'$ are equivalent.
Proof. Since the relation of equivalence between two systems of linear equations is transitive, it is sufficient to prove that for each $k \le n$, $S_k$ is equivalent to $S_{k-1}$. There are three cases to consider, depending on which type of elementary transformation is used in passing from $S_{k-1}$ to $S_k$. If $S_k$ is obtained from $S_{k-1}$ by interchanging two equations in the list, then it is obvious that every solution of $S_k$ is a solution of $S_{k-1}$, and vice versa. Suppose that $S_k$ is obtained from $S_{k-1}$ by adding a multiple of one equation to another. That is, $S_{k-1}$ is the system
$$\sum_{j=1}^{r} a_{i,j}x_j = b_i \quad\text{for } i = 1, 2, \ldots, s,$$
and $S_k$ is the system
$$\sum_{j=1}^{r} d_{i,j}x_j = e_i \quad\text{for } i = 1, 2, \ldots, s,$$
with $d_{i,j} = a_{i,j}$ and $e_i = b_i$ for $i \ne n$, and $d_{n,j} = a_{n,j} + ca_{m,j}$, $e_n = b_n + cb_m$, where $m \ne n$. Let $(c_1, c_2, \ldots, c_r)$ be a solution of $S_{k-1}$. Then $(c_1, c_2, \ldots, c_r)$ plainly satisfies every equation of $S_k$, except possibly $\sum_{j=1}^{r} d_{n,j}x_j = e_n$. However,
$$\sum_{j=1}^{r} a_{m,j}c_j = b_m \quad\text{and}\quad \sum_{j=1}^{r} a_{n,j}c_j = b_n.$$
Multiplying the first of these equations by $c$ and adding it to the second, we obtain from the general distributive, associative, and commutative laws
$$\sum_{j=1}^{r} (a_{n,j} + ca_{m,j})c_j = b_n + cb_m.$$
That is, $\sum_{j=1}^{r} d_{n,j}c_j = e_n$. Therefore, $(c_1, c_2, \ldots, c_r)$ is a solution of $S_k$. Conversely, if $(c_1, c_2, \ldots, c_r)$ is a solution of $S_k$, then $\sum_{j=1}^{r} a_{i,j}c_j = b_i$ for $i \ne n$ and $\sum_{j=1}^{r} (a_{n,j} + ca_{m,j})c_j = b_n + cb_m$. Subtracting from this equality $c$ times the equation $\sum_{j=1}^{r} a_{m,j}c_j = b_m$ gives $\sum_{j=1}^{r} a_{n,j}c_j = b_n$. Thus, $(c_1, c_2, \ldots, c_r)$ is a solution of $S_{k-1}$. Thus $S_k$ and $S_{k-1}$ are equivalent in this case also. The proof that $S_k$ is equivalent to $S_{k-1}$ if $S_k$ is obtained from $S_{k-1}$ by multiplying some equation by a nonzero element of $F$ is left as an exercise for the reader (see Problem 7 below).
We now illustrate by an example the way in which a system of linear equations can be transformed by a sequence of elementary transformations into an equivalent system which can easily be solved.
x>=l
2x+*yz=o
3x2y+z
2x+ 3x
Multiply the first equation by 3 Multiply the first equation by 3 and add to the third equation Multiply the second equation by 3 Multiply the second equation by and add to the third equation Multiply the third equation by
+ &Y
 fz = O
=
 3y+$z=4
 m 2y3 + s z
x+&yjz=o y  1, = 4 6 3
23  ~ y + ~ =z 1
+ &Y
" 3
o
= 4
y  A,
= 31
15
x+&yjz=o y  1, =
6
2 =
4
4%
on the left and the resulting equivalent system is given on the right. The final system of equations in this table is easily solved. If (cl, ea, es) is a solution, then c3 = ' 4 5 7 0 (from the second equation), and el = 1 2 7 ) c2 = iC3 3 = &e2 +c3 = M (from the first equation). It is routine to check by direct substitution that (*%, is a solution of the system
Therefore, this system has exactly one solution in any commutative ring containing $Q$ as a subring. It follows from Theorem 102.3 that the original system
It is the special form of the last system of equations in Table 101 that makes it possible to obtain this solution so easily. This system is a particular case of a system of equations which is in "echelon form."
DEFINITION 102.4. A system of linear equations
$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$
is said to be in echelon form if there exists an integer $m$ with $0 \le m \le s$ and a sequence of natural numbers $(n_1, n_2, \ldots, n_m)$ such that
(a) $1 \le n_1 < n_2 < \cdots < n_m \le r$;
(b) if $1 \le i \le m$, then $a_{i,j} = 0$ for $j < n_i$ and $a_{i,n_i} = 1$;
(c) if $m < i \le s$, then $a_{i,j} = 0$ for all $j$. [If $m = s$, case (c) does not occur.]
In Example 1, the last system obtained in Table 101 is in echelon form, with $m = 3$, $n_1 = 1$, $n_2 = 2$, and $n_3 = 3$. The system
nl = 1,
and
n2 = 3.
The system
is also in echelon form with $m = 0$. Systems of this kind (with the coefficients of all indeterminates equal to zero) seem rather trivial, but it would be inconvenient to exclude them from our discussion. In general, if $m = 0$ in Definition 102.4, then the set $\{n_1, n_2, \ldots, n_m\}$ of natural numbers is empty. In this case, the conditions (a) and (b) are satisfied vacuously, and condition (c) implies that $a_{i,j} = 0$ for all $i$ and $j$. Note that by condition (a) in Definition 102.4, the number $m$ cannot exceed $r$, because it is impossible to have more than $r$ different natural numbers $n_i$ which satisfy $1 \le n_i \le r$.
THEOREM 102.5. If $S$ is a system of $s$ linear equations in $r$ unknowns with coefficients in a field $F$, then it is possible to transform $S$ into a system of linear equations $S'$ in echelon form by means of a finite sequence of elementary transformations.
Proof. The proof of this theorem is by course of values induction on the number $t$ of different indeterminates which have nonzero coefficients in the system. That is, $t$ is the number of indeterminates having at least one nonzero coefficient. Of course, $t \le r$. If this number is zero, then the system must have the trivial form
$$\begin{aligned} 0x_1 + 0x_2 + \cdots + 0x_r &= b_1 \\ 0x_1 + 0x_2 + \cdots + 0x_r &= b_2 \\ &\;\;\vdots \\ 0x_1 + 0x_2 + \cdots + 0x_r &= b_s, \end{aligned}$$
which is already in echelon form (with $m = 0$). Thus, the basis of the induction $t = 0$ offers no difficulty. Assume that $t > 0$ and every system in which fewer than $t$ indeterminates appear with nonzero coefficients can be transformed to a system in echelon form by means of elementary transformations. Suppose that $S$ is a system in which exactly $t$ indeterminates appear with nonzero coefficients, and let $n_1$ be the least natural number such that $x_{n_1}$ has a nonzero coefficient in one of the equations. Since $t > 0$, it follows from the well-ordering principle that such an $n_1$ exists. If the coefficient of $x_{n_1}$ is zero in the first equation, interchange the first equation with an equation in which the coefficient of $x_{n_1}$ is not zero. Multiply the new first equation by the inverse in $F$ of the coefficient of $x_{n_1}$. After these elementary transformations, the system has the form
The construction of (106) from the original system is effected by a finite number of elementary transformations. Moreover, it is evident that if an indeterminate x, occurs with zero coefficient in every equation of the original system
at most $t - 1$ indeterminates appear with nonzero coefficients. By the induction hypothesis, the system (107) can be transformed into echelon form by a finite sequence of elementary transformations. Clearly, in the resulting echelon system obtained from (107), the indeterminates $x_j$ for $j \le n_1$ will occur with coefficient zero. That is, the echelon system obtained from (107) involves only indeterminates $x_j$ with $j > n_1$.
Consequently, combining this system with the first equation of (106), we obtain an echelon system
Since a sequence of elementary transformations applied to (107) can be considered as a sequence of elementary transformations applied to (106) which do not involve the first equation, it follows that we can get from our original system to a system in echelon form by applying a finite number of elementary transformations. This completes the induction, and proves Theorem 102.5.
By combining the results of Theorems 102.3 and 102.5, we obtain the most important result of this section.
THEOREM 102.6. Any system $S$ of $s$ linear equations in $r$ unknowns with coefficients in a field $F$ is equivalent to a system $S'$ of $s$ linear equations in $r$ unknowns with coefficients in $F$ where $S'$ is in echelon form.
It should be emphasized that a system of linear equations may be equivalent to many different systems in echelon form. The system $S'$ in Theorem 102.6 is by no means unique (see Problem 5 below). The reduction process described in Example 1 and in the proof of Theorem 102.5 works for arbitrary fields. When it is used for fields of the form $Z_p$, where $p$ is a prime number, the results can be interpreted to obtain information concerning the solution of linear congruences with a prime modulus (see the discussion following Theorem 97.8).
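The reduction described in the proof of Theorem 102.5 can be sketched as a short program. The version below is an illustration under stated assumptions: equations are rows `[a_1, ..., a_r, b]`, the field is $Q$ (via `Fraction`), the system is nonempty, and the function name `to_echelon` and both sample systems are hypothetical. It uses an iterative loop in place of the induction, and at each stage it also clears the column below the leading 1 with type 2 transformations.

```python
from fractions import Fraction

def to_echelon(system):
    """Reduce a nonempty system of rows [a_1, ..., a_r, b] over Q to echelon
    form (Definition 102.4). Returns (rows, pivots), pivots = [n_1, ..., n_m]."""
    rows = [[Fraction(x) for x in row] for row in system]
    s, r = len(rows), len(rows[0]) - 1
    top = 0          # index of the next equation to be put in place
    pivots = []
    for col in range(r):
        # Look for an equation, not yet used, with nonzero coefficient here.
        pivot = next((i for i in range(top, s) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[top], rows[pivot] = rows[pivot], rows[top]          # type 1
        inverse = 1 / rows[top][col]
        rows[top] = [inverse * x for x in rows[top]]             # type 3
        for i in range(top + 1, s):                              # type 2
            c = -rows[i][col]
            rows[i] = [x + c * y for x, y in zip(rows[i], rows[top])]
        pivots.append(col + 1)
        top += 1
        if top == s:
            break
    return rows, pivots

# Hypothetical system: x + y + z = 6, 2x + 2y + z = 9, x - y = -1.
rows, pivots = to_echelon([[1, 1, 1, 6], [2, 2, 1, 9], [1, -1, 0, -1]])
print(pivots)   # [1, 2, 3]
# rows now represents x + y + z = 6, y + (1/2)z = 7/2, z = 3.

# Hypothetical inconsistent system: x + y = 1, x + y = 2.
rows2, pivots2 = to_echelon([[1, 1, 1], [1, 1, 2]])
print(pivots2)  # [1]; the last equation has become 0x + 0y = 1
```

For a field $Z_p$ the same steps apply with arithmetic carried out modulo $p$; that variant is not shown here.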
+ 4x2 + x3 +
x4
have coefficients in $Z_5$, the integers modulo 5. We list the successive equivalent systems, arriving finally at a system in echelon form. The reader should describe the elementary transformations at each step.
This system is not satisfied for any choice of $x_1$, $x_2$, $x_3$, and $x_4$ because the final equation $0 = 3$ is never satisfied. Therefore the original system has no solution in any commutative ring containing $Z_5$. The linear system of equations in this example can be regarded as a system of simultaneous linear congruences
+ + + 2x3 + + + + +
23 x3 x3 x3+
= 4 (mod 5)
O(mod5).
( e l , c2, c3,
x2+
24
Our result shows that this system of congruences has no solution $(c_1, c_2, c_3, c_4)$ where $c_i \in Z$.
with coefficients in $Q$ is in echelon form. Let $x_5 = c$, where $c$ is an element in any commutative ring $A$ containing $Q$ as a subring. Then from the last equation, $x_4 = 1 - c$. Substituting $x_4 = 1 - c$, $x_5 = c$ in the second equation and choosing $x_3 = d \in A$, we have
xz
=
id
+ z(1
C)  c
+ S = 1  3c 
'5d .
Thus, (5
3 Sd
,1 
 3d, d , 1  c, c )
is a solution of the given system, where c and d are arbitrary elements in A. For example, if A = Q[x] and c = d = x, then a solution is
(5+*x,
1  fx, x, 1  x, x).
Examples 1, 2, and 3 illustrate the fact that systems of equations in echelon form can be solved (or shown to be inconsistent) without much trouble. In fact, we can prove the following general results.
THEOREM 102.7. Let
$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$
be a system of linear equations with coefficients in a field $F$, which is in echelon form: $a_{i,j} = 0$ for $j < n_i$, $a_{i,n_i} = 1$ for $1 \le i \le m$, and $a_{i,j} = 0$ for all $j$ if $m < i \le s$, where $1 \le n_1 < n_2 < \cdots < n_m \le r$ and $0 \le m \le s$.
(a) The system is consistent if and only if either $m = s$, or $b_i = 0$ for every $i$ satisfying $m < i \le s$. If the system is consistent, then it has a solution $(c_1, c_2, \ldots, c_r)$ with each $c_i$ in $F$.
(b) If the system is consistent, then its solution is unique if and only if $m = r$. When this condition is not satisfied (that is, the system has more than one solution) then it is always possible to find at least as many solutions $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$ as there are elements in $F$.
Proof. (a) Suppose that $m < s$ and there is an $i$ with $m < i \le s$ such that $b_i \ne 0$. Then the $i$th equation of the given system is
$$0x_1 + 0x_2 + \cdots + 0x_r = b_i.$$
This equation plainly has no solution in any ring $A$ containing $F$ as a subring. On the other hand, if either $m = s$, or $b_i = 0$ for all $i$ satisfying $m < i \le s$, then it is easy to see that $(c_1, c_2, \ldots, c_r)$ is a solution with $c_i \in F$, where we define recursively
$$c_{n_m} = b_m, \qquad c_{n_k} = b_k - \sum_{l=k+1}^{m} a_{k,n_l}c_{n_l} \quad (k = m-1, m-2, \ldots, 1),$$
and $c_j = 0$ for all indices $j$ which are not among the indices $n_1, n_2, \ldots, n_m$. Note that the $c_{n_k}$ are determined by the $a_{i,j}$ and $b_i$. For example,
$$c_{n_{m-1}} = b_{m-1} - a_{m-1,n_m}b_m.$$
It follows that our system is consistent and has a solution in $F$.
(b) Suppose that the system is consistent and $m = r$. Since the natural numbers $n_1, n_2, \ldots, n_r$ satisfy $1 \le n_1 < n_2 < \cdots < n_r \le r$, it follows that $n_k = k$ for $k = 1, 2, \ldots, r$. That is, the system has the form
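$$\begin{aligned} x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ x_2 + \cdots + a_{2,r}x_r &= b_2 \\ &\;\;\vdots \\ x_r &= b_r \\ 0x_r &= b_{r+1} = 0 \\ &\;\;\vdots \\ 0x_r &= b_s = 0. \end{aligned}$$
If $(c_1, c_2, \ldots, c_r)$ is any solution, then $c_r = b_r$ is uniquely determined, and if $c_{k+1}, \ldots, c_r$ are unique, then since $c_k = b_k - \sum_{j=k+1}^{r} a_{k,j}c_j$, it follows that $c_k$ is also unique. Hence, by the principle of induction, the system of equations has a unique solution. Conversely, if the system is consistent, but the condition $m = r$ is not satisfied, then there exists an index $l$ such that $l \ne n_k$ for all $1 \le k \le m$. Let $c \in F$. Define $e_i = b_i - a_{i,l}c$. Then the system
$$\sum_{j=1}^{r} a_{i,j}x_j = e_i, \qquad i = 1, 2, \ldots, s,$$
is still consistent because if $i > m$, then $a_{i,l} = 0$ and $e_i = b_i = 0$. By the proof of part (a), this new system has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$, such that $c_l = 0$. It is then clear that
$$(c_1, \ldots, c_{l-1}, c, c_{l+1}, \ldots, c_r)$$
is a solution of our original system of equations. Since $c$ can be arbitrary, it follows that the system has at least as many different solutions (in $F$) as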
there are elements in $F$. In particular, since every field contains at least two elements, the system has more than one solution.
As a consequence of Theorems 102.6 and 102.7, we have the following useful result.
THEOREM 102.8. If a system of linear equations in $r$ unknowns with coefficients in a field $F$ is consistent, then the system has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$.
Proof. By Theorem 102.6, the given system $S$ is equivalent to a system $S'$ of linear equations with coefficients in $F$, such that $S'$ is in echelon form. Since $S$ is consistent and $S'$ is equivalent to $S$, it follows that $S'$ is consistent. By Theorem 102.7, $S'$ has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$. Since $S'$ is equivalent to $S$, it follows that $(c_1, c_2, \ldots, c_r)$ is also a solution of $S$.
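The consistency test of Theorem 102.7(a) and the recursive definition of the $c_{n_k}$ in its proof translate directly into back substitution. The following is a minimal sketch, assuming the echelon system is given as rows `[a_1, ..., a_r, b]` over $Q$ together with the list `[n_1, ..., n_m]`; the function name `solve_echelon` and the sample systems are hypothetical.

```python
from fractions import Fraction

def solve_echelon(rows, pivots):
    """rows: an echelon system as rows [a_1, ..., a_r, b];
    pivots: the indices [n_1, ..., n_m] of Definition 102.4 (1-based).
    Returns None if the system is inconsistent; otherwise returns the
    solution in which every unknown other than x_{n_1}, ..., x_{n_m} is 0."""
    m = len(pivots)
    r = len(rows[0]) - 1
    # Theorem 102.7(a): consistent iff b_i = 0 for every i > m.
    if any(row[-1] != 0 for row in rows[m:]):
        return None
    c = [Fraction(0)] * r
    # Work upward from the m-th equation, as in the recursive definition.
    for i in range(m - 1, -1, -1):
        col = pivots[i] - 1
        c[col] = rows[i][-1] - sum(rows[i][j] * c[j] for j in range(col + 1, r))
    return c

# Hypothetical echelon system in four unknowns with m = 2, n_1 = 1, n_2 = 3:
#   x1 + 2x2 + x3 + x4 = 4,   x3 + 3x4 = 5,   0 = 0.
echelon = [[1, 2, 1, 1, 4], [0, 0, 1, 3, 5], [0, 0, 0, 0, 0]]
print(solve_echelon(echelon, [1, 3]))
# the solution found is (-1, 0, 5, 0)

# Hypothetical inconsistent echelon system: x + y = 1, 0 = 1.
print(solve_echelon([[1, 1, 1], [0, 0, 1]], [1]))   # None
```

Here $m = 2 < r = 4$, so by part (b) the solution returned is only one of many; other solutions arise by assigning arbitrary values to $x_2$ and $x_4$ and adjusting the right-hand sides as in the proof.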
When $b_1 = b_2 = \cdots = b_s = 0$ in (104), the system is called homogeneous. A homogeneous system of linear equations is always consistent since $(0, 0, \ldots, 0)$ is a solution. An interesting question concerning homogeneous equations is whether or not they have solutions other than the trivial one $(0, 0, \ldots, 0)$. This problem can always be referred to the case in which the homogeneous system is in echelon form. Indeed, it is clear that every elementary transformation carries a homogeneous system into a homogeneous system. Therefore, by Theorems 102.3 and 102.5, every homogeneous system is equivalent to a homogeneous system in echelon form. It is clear that if a homogeneous system has a unique solution, then it has no solution other than the trivial one $(0, 0, \ldots, 0)$. Consequently, Theorem 102.7(b) provides a condition for a homogeneous system in echelon form to have a nontrivial solution, namely $m < r$, where $m$ is the number of equations of the system in which some nonzero coefficient appears and $r$ is the number of indeterminates. In particular, if the number $s$ of equations is less than the number $r$ of unknowns, then the system has a nontrivial solution. Consequently, we obtain the following useful result.
Let
$$\sum_{j=1}^{r} a_{i,j}x_j = 0, \qquad i = 1, 2, \ldots, s,$$
be a homogeneous system of $s$ linear equations in $r$ unknowns with coefficients in the field $F$. Suppose that $s < r$. Then $c_1, c_2, \ldots, c_r$ exist in $F$, not all zero, such that
$$\sum_{j=1}^{r} a_{i,j}c_j = 0 \quad\text{for } i = 1, 2, \ldots, s.$$
Proof. By Theorems 102.3 and 102.5, the system $\sum_{j=1}^{r} a_{i,j}x_j = 0$, $i = 1, 2, \ldots, s$, is equivalent to a homogeneous system $S'$ of $s$ linear equations in $r$ unknowns with coefficients in $F$ such that $S'$ is in echelon form. Since $m \le s < r$, it follows from Theorem 102.7(b) and the fact that every field contains at least two elements that there is a solution
$$(c_1, c_2, \ldots, c_r)$$
of $S'$ which is different from $(0, 0, \ldots, 0)$. Since $S'$ is equivalent to the given system, it follows that $\sum_{j=1}^{r} a_{i,j}c_j = 0$ for $i = 1, 2, \ldots, s$.
The value of $x_4$ can be chosen arbitrarily and the equations solved for $x_3$, $x_2$, and $x_1$.
1. Reduce the following systems of linear equations with coefficients in Q to echelon form by means of elementary transformations, describing the elementary transformation being used a t each step. y = 3 (a) 2 x x  y = l x+ y = 2
(b) Zxi
Xl
22
 x2
+ +x3 
 +x3+
x4 = 1
24
(c)
X 
y = 2
x+ y = 2 32y = 2 x + 7 y = 2
4x1 2x1
22 X2
X3
+ + + 4x5 + 2x5
54 X4
= =
2 0
(e)
x ! 2 y
(l)i'~j
(1Ii,
1, 2,
. . . , 100
2. Discuss the solution of each of the systems in Problem 1. That is, determine whether or not each system is consistent, and if i t is describe a11 possible solutions (as in Example 3). 3. Describe the elementary transformations used a t each step in Example 2. 4. Solve the following systems of linear equations with coefficients in 2 7 . (a) 2x+ 2y+ 32 = 1 4x+6y+ x = 4 X x = 3 (b) xl
+ + 2x2 + 3x3 + 4x4 + 5x5 + 6x6 / 4x2 + 2x3 + 2x4 + 4x5 / xg + 6x3 + + 6x5 + 6x6 xi + $c 2x2 + 4x3 + 4x4 + 2x5 + x i + 4x2 + 5x3 + 2x4 + 3x5 + 6x6
+ + + + +
22
53
24
X5
$6
X i X i
1 = 1
=
= 1 =
22
$4
X i
X6 =
1 1 1
Does this list of systems include al1 possible echelon forms to which the given system can be reduced? 6. Suppose that the system C:=i ai,jXj = bi, i = 1, 2, . . . , r, of r linear equations in r unknowns with coefficients in a field F has the unique solution ( 1 , 2 , . . . , c . Show that i t is possible to reduce this system by elementary
7. Complete the proof of Theorem 102.3 by showing that if a system $S'$ is obtained from a system $S$ by an elementary transformation of type 3 (multiplication of an equation in $S$ by a nonzero element of $F$), then $S$ and $S'$ are equivalent systems.
8. Let a, b, c, d, e, and f be elements of any field with a # O. Prove that the system by = e ax cx+dy = f
10. Show that if the system S' is obtained from the system S by an elementary transformation, then there is an elementary transformation which carries the system S' into S. 11. Show that if (ci, ca,
. . . , c,)
with al1 of the cj belonging to some ring A containing al1 ai,j, and if d is any element of A, then (dcl, dc2, . . . , dc,) is also a solution of the homogeneous system. 12. Show that if (cl, c2,
. . . , c,)
then (cl
103 The algebra of matrices. The study of linear equations in the preceding section serves as a natural introduction to the concept of a rectangular matrix. The system of equations S,
can be completely determined if the coefficients of $S$ are given and the position of each coefficient in the system is known. This information is conveniently presented by the rectangular array
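DEFINITION 103.1. Let $A$ be a ring. An $m$ by $n$ matrix (plural: matrices) with elements in $A$ is a rectangular array*
$$\begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$
with $m$ rows and $n$ columns, where the entries $a_{i,j}$ are elements of the ring $A$. For example,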
[:, 7
:,U]
* In this section and the following one, boldface capital letters will denote matrices.
al,l=2,
and
a2,1
a1,3
1,
0,
a2,3 = 2,
The entries $a_{i,j}$ of a matrix are called the elements of the matrix, and the position of each element in the matrix is indicated by its subscripts. For instance, $a_{1,1}$ is the element in the first row and first column (the upper left-hand corner) of the matrix, while $a_{3,4}$ is the element in the third row and fourth column. In general, $a_{i,j}$ is the element in the $i$th row and $j$th column for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. The number $m$ of rows and the number $n$ of columns in a matrix can be arbitrary natural numbers. These numbers are called the dimensions of the matrix. If $\mathbf{A}$ is an $n$ by $n$ matrix, that is, the number of rows is equal to the number of columns, then $\mathbf{A}$ is called a square matrix. A matrix with only one column, that is, an $m$ by 1 matrix, is called a column matrix, or a column vector. Similarly, a matrix with only one row is called a row matrix, or a row vector. The reader should be careful not to confuse matrices with determinants. Corresponding to every square matrix $\mathbf{A}$ with elements in a commutative ring $A$, there is associated in a certain way an element of $A$ called the determinant of $\mathbf{A}$. For example, if $\mathbf{A}$ is the 2 by 2 matrix
the determinant of A is
The matrix $\mathbf{A}$ is not an element of the ring $A$, whereas the determinant of $\mathbf{A}$ is an element of $A$. For $r$ by $s$ matrices with $r \ne s$, the determinant is not even defined.
Matrices are more than just convenient forms for presenting numerical data. By defining suitable operations of addition, subtraction, and multiplication, it is possible to develop an algebra of matrices which has numerous applications. The purpose of this section is to define these matrix operations and derive their basic properties. Some of the applications of the algebra of matrices will be described in examples. Two matrices will be called equal if they are identically the same. That is, if
then A = B if and only if m = r, n = s (thus, A and B have the same dimensions), and ai,j = bi,j for i = 1, 2, . . . , m and j = 1, 2, . . . , n. For example, if
A=
[U
:]
and
=
B=
[ o o'1,
1
then A # B ; if A = (1 1 1) and B
[O O O]
and
[+ o
I
i2
O
2(\/2)2
then A = B.
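DEFINITION 103.2. If $\mathbf{A}$ and $\mathbf{B}$ are $m$ by $n$ matrices with elements $a_{i,j}$ and $b_{i,j}$ in a ring $A$, then the sum, $\mathbf{A} + \mathbf{B}$, of $\mathbf{A}$ and $\mathbf{B}$ is the matrix
$$\mathbf{C} = \begin{bmatrix} a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} + b_{m,1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n} \end{bmatrix}.$$
Thus, $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is an $m$ by $n$ matrix with elements in $A$ such that $c_{i,j} = a_{i,j} + b_{i,j}$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. It is clear that addition of matrices is a well-defined binary operation on the set of all $m$ by $n$ matrices with elements in $A$. However, the sum $\mathbf{A} + \mathbf{B}$ is not defined unless $\mathbf{A}$ and $\mathbf{B}$ have the same dimensions.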
Since matrices are added "elementwise", according to Definition 103.2, the properties of addition which hold in the ring A are also satisfied by matrix addition.
(10-3.3). Matrix addition is associative.
Proof. Let A, B, and C be m by n matrices with elements $a_{i,j}$, $b_{i,j}$, and $c_{i,j}$ in a ring $\mathcal{A}$. Then by Definition 10-3.2, (A + B) + C has the element
$$(a_{i,j} + b_{i,j}) + c_{i,j}$$
in the ith row and jth column. Similarly, A + (B + C) has the element
$$a_{i,j} + (b_{i,j} + c_{i,j})$$
in the ith row and jth column. Both (A + B) + C and A + (B + C) are m by n matrices with elements in $\mathcal{A}$, and since addition is associative in $\mathcal{A}$, it follows that $(a_{i,j} + b_{i,j}) + c_{i,j} = a_{i,j} + (b_{i,j} + c_{i,j})$ for all i, j. Thus, according to the definition of equality of matrices, (A + B) + C = A + (B + C).
The commutative law of addition in a ring $\mathcal{A}$ leads to the corresponding property of matrix addition.
(10-3.4). Matrix addition is commutative.
It will be left as an exercise for the reader to prove (10-3.4), that is, to show that if A and B are m by n matrices with elements in a ring $\mathcal{A}$, then A + B = B + A.
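Let O denote the m by n matrix which has the zero element of $\mathcal{A}$ in every position. Then it follows from Definitions 4-2.1(c) and 10-3.2 that
$$A + O = A, \tag{10-8}$$
where A is any m by n matrix with elements in $\mathcal{A}$. Of course, O + A = A also. Because O satisfies (10-8), it is called the zero matrix. Let
$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$
be an m by n matrix with elements $a_{i,j}$ in a ring $\mathcal{A}$. Define the negative of A to be the m by n matrix
$$-A = \begin{bmatrix} -a_{1,1} & -a_{1,2} & \cdots & -a_{1,n} \\ -a_{2,1} & -a_{2,2} & \cdots & -a_{2,n} \\ \vdots & \vdots & & \vdots \\ -a_{m,1} & -a_{m,2} & \cdots & -a_{m,n} \end{bmatrix}. \tag{10-9}$$
In (10-9), the element $-a_{i,j}$ of $-A$ is the negative of $a_{i,j}$ in the ring $\mathcal{A}$. Thus, we have
$$A + (-A) = \begin{bmatrix} a_{1,1} + (-a_{1,1}) & \cdots & a_{1,n} + (-a_{1,n}) \\ a_{2,1} + (-a_{2,1}) & \cdots & a_{2,n} + (-a_{2,n}) \\ \vdots & & \vdots \\ a_{m,1} + (-a_{m,1}) & \cdots & a_{m,n} + (-a_{m,n}) \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix} = O. \tag{10-10}$$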
Let ${}_mM_n(\mathcal{A})$ denote the set of all m by n matrices with elements in a ring $\mathcal{A}$. Then with addition and negation defined by Definition 10-3.2 and (10-9), the properties (10-3.3), (10-3.4) and equations (10-8), (10-10) correspond exactly to the conditions of Definition 4-2.1(a), (b), (c), and (d) in the definition of a ring. The reader might expect that the next step would be to introduce an "elementwise" multiplication in the set ${}_mM_n(\mathcal{A})$, which together with addition and negation would make ${}_mM_n(\mathcal{A})$ into a ring. Indeed, this can be done (see Problem 6 below). However, it turns out that in the various applications of matrices, a different definition of matrix multiplication is more useful.
DEFINITION 10-3.5. Let A be an m by n matrix with elements $a_{i,j}$ in a ring $\mathcal{A}$ and let B be an n by q matrix with elements $b_{i,j} \in \mathcal{A}$. Then the product AB is the m by q matrix which has the element
$$\sum_{k=1}^{n} a_{i,k}b_{k,j} = a_{i,1}b_{1,j} + a_{i,2}b_{2,j} + \cdots + a_{i,n}b_{n,j}$$
in the ith row and jth column for i = 1, 2, ..., m and j = 1, 2, ..., q.
According to this definition, it is possible to multiply two matrices with elements in a ring only when the first matrix has the same number of columns as the second matrix has rows. Therefore, if $m \neq n$, Definition 10-3.5 does not define the product of two matrices in the set ${}_mM_n(\mathcal{A})$. However, if m = n, then it does define a binary operation on the set ${}_nM_n(\mathcal{A})$.
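The row-by-column rule of Definition 10-3.5 can be sketched in a few lines of Python. This is only an illustration: the helper name `mat_mul` and the matrices below are assumptions, chosen so that A is 2 by 3 and B is 3 by 3, which makes AB defined and BA undefined.

```python
def mat_mul(A, B):
    """Product of an m by n matrix A and an n by q matrix B (lists of rows)."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        # The number of columns of A must equal the number of rows of B.
        raise ValueError("columns of A must match rows of B")
    q = len(B[0])
    # Element (i, j) is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

# Hypothetical matrices with integer elements.
A = [[1, 2, 0],
     [0, 1, -1]]            # 2 by 3
B = [[1, 0, 2],
     [2, 1, 0],
     [0, 3, 1]]             # 3 by 3

AB = mat_mul(A, B)          # defined: A has 3 columns, B has 3 rows

try:
    mat_mul(B, A)           # B has 3 columns, but A has only 2 rows
    ba_defined = True
except ValueError:
    ba_defined = False

print(AB, ba_defined)
```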
EXAMPLE 2. Let
[displays of the matrices A and B not recoverable]
be matrices with elements in the field Q of rational numbers. Since A has three columns and B has three rows, the product AB is defined. In fact, according to Definition 10-3.5,
[display of the product AB not recoverable]
The product BA is not defined, since B has three columns, while A has only two rows.
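EXAMPLE 3. Using the definition of multiplication given in Definition 10-3.5, it is possible to write a system of s linear equations in r unknowns as a single matrix equation. For the system
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,r}x_r &= b_2 \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s, \end{aligned}$$
the s by r matrix
$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,r} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,r} \\ \vdots & \vdots & & \vdots \\ a_{s,1} & a_{s,2} & \cdots & a_{s,r} \end{bmatrix}$$
is called the matrix of coefficients of the system. Define column matrices X and B by
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_r \end{bmatrix}, \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \end{bmatrix}.$$
Since A has r columns and X has r rows, it is possible to form the product AX. By definition, this product is a column matrix with s rows, namely,
$$AX = \begin{bmatrix} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,r}x_r \\ \vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r \end{bmatrix}.$$
Thus the system of equations can be written as the single matrix equation AX = B. Using this notation, a solution of the system of equations is a column matrix with r rows
$$C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_r \end{bmatrix}$$
such that AC = B.
EXAMPLE 4. Let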
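$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,r}x_r &= b_2 \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s \end{aligned}$$
be a system of linear equations with coefficients in the integral domain $\mathcal{D}$. Suppose that $y_1, y_2, \ldots, y_t$ are new unknowns which are related to $x_1, x_2, \ldots, x_r$ by the equations
$$\begin{aligned} x_1 &= d_{1,1}y_1 + d_{1,2}y_2 + \cdots + d_{1,t}y_t \\ x_2 &= d_{2,1}y_1 + d_{2,2}y_2 + \cdots + d_{2,t}y_t \\ &\ \,\vdots \\ x_r &= d_{r,1}y_1 + d_{r,2}y_2 + \cdots + d_{r,t}y_t, \end{aligned}$$
with $d_{j,k} \in \mathcal{D}$. Substituting these expressions for $x_1, x_2, \ldots, x_r$ in the original system gives a new system of s equations in the unknowns $y_1, y_2, \ldots, y_t$, in which the coefficient of $y_k$ in the ith equation is $\sum_{j=1}^{r} a_{i,j}d_{j,k}$. Thus, if A is the matrix of coefficients of the original system, and if
$$D = \begin{bmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,t} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,t} \\ \vdots & \vdots & & \vdots \\ d_{r,1} & d_{r,2} & \cdots & d_{r,t} \end{bmatrix}$$
is the matrix of the coefficients in the system of equations which relate $x_1, x_2, \ldots, x_r$ to $y_1, y_2, \ldots, y_t$, then the matrix of coefficients of the new system is AD, according to Definition 10-3.5.
These calculations can be carried out within the algebra of matrices. Let
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_r \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_t \end{bmatrix}, \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \end{bmatrix}.$$
Then the relation between the x's and y's can be expressed by the matrix equation X = DY (see Example 3). Also, the original system of equations can be written in the form
$$AX = B.$$
Substituting DY for X gives A(DY) = B. It must be noted of course that the number of columns of A is equal to the number of rows of DY, so that A(DY) makes sense. In a moment we will show that matrix multiplication is associative. Assuming this fact, it follows that
$$(AD)Y = B.$$
The matrix of coefficients of this system is clearly AD, which is what we proved above by writing the systems in full. This example illustrates the notational savings which matrices provide.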
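The change of unknowns in Example 4 can be checked numerically. The sketch below is an illustration only: the matrices A and D and the values assigned to the new unknowns are hypothetical, and `mat_mul` is a helper written for this example. It compares A(DY) with (AD)Y.

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (Definition 10-3.5)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2],
     [3, 4]]          # coefficients of the original system in x1, x2
D = [[1, 1],
     [0, 2]]          # x1 = y1 + y2, x2 = 2*y2
Y = [[5],
     [7]]             # sample values for the new unknowns y1, y2

X = mat_mul(D, Y)               # values of x1, x2 obtained from X = DY
left = mat_mul(A, X)            # A(DY): substitute into the original system
AD = mat_mul(A, D)              # matrix of coefficients of the new system
right = mat_mul(AD, Y)          # (AD)Y

print(X, left, AD, right)
```

Both sides give the same column matrix, which is what associativity of matrix multiplication guarantees in general.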
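We will now establish the associativity of matrix multiplication which was mentioned in Example 4.
(10-3.6). Matrix multiplication is associative.
Proof. Let A be an m by n matrix with elements $a_{i,j}$ in a ring $\mathcal{A}$, B an n by q matrix with elements $b_{i,j}$ in $\mathcal{A}$, and C a q by r matrix with elements $c_{i,j}$ in $\mathcal{A}$. Then the products AB, BC, (AB)C, and A(BC) are all defined. We wish to prove that these last two products are equal. By Definition 10-3.5, AB has the element $\sum_{k=1}^{n} a_{i,k}b_{k,l}$ in the ith row and lth column, and BC has the element $\sum_{l=1}^{q} b_{k,l}c_{l,j}$ in the kth row and jth column. Again using Definition 10-3.5, (AB)C has the element
$$\sum_{l=1}^{q}\Big(\sum_{k=1}^{n} a_{i,k}b_{k,l}\Big)c_{l,j}$$
in the ith row and jth column for i = 1, 2, ..., m and j = 1, 2, ..., r, and A(BC) has the element
$$\sum_{k=1}^{n} a_{i,k}\Big(\sum_{l=1}^{q} b_{k,l}c_{l,j}\Big)$$
in the ith row and jth column for i = 1, 2, ..., m and j = 1, 2, ..., r. By the distributive laws, Definition 4-2.1(f) and (g), and the commutative law for addition, Definition 4-2.1(a), which are satisfied in the ring $\mathcal{A}$,
$$\sum_{l=1}^{q}\Big(\sum_{k=1}^{n} a_{i,k}b_{k,l}\Big)c_{l,j} = \sum_{l=1}^{q}\sum_{k=1}^{n} (a_{i,k}b_{k,l})c_{l,j}, \qquad \sum_{k=1}^{n} a_{i,k}\Big(\sum_{l=1}^{q} b_{k,l}c_{l,j}\Big) = \sum_{l=1}^{q}\sum_{k=1}^{n} a_{i,k}(b_{k,l}c_{l,j}).$$
Since $(a_{i,k}b_{k,l})c_{l,j} = a_{i,k}(b_{k,l}c_{l,j})$ by the associative law for multiplication in $\mathcal{A}$, it follows that the element in the ith row and jth column of (AB)C is the same as the element in the ith row and jth column of A(BC) for i = 1, 2, ..., m and j = 1, 2, ..., r. Therefore,
$$(AB)C = A(BC).$$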
If A and B are m by n and n by q matrices, respectively, with elements in a ring, then AB is defined, but BA has no meaning unless m = q. However, even in the case where both products AB and BA are defined, they are not necessarily equal. Indeed, if A is an m by n matrix and B is an n by m matrix, then AB is m by m and BA is n by n. Thus, if $m \neq n$, the two products do not have the same dimensions, and are not equal. The following example shows that even when A and B are both n by n square matrices (so that AB and BA are also n by n matrices), the products AB and BA may not be equal.
EXAMPLE 5. [The displayed square matrices C and D and their products CD and DC are not recoverable; the example exhibits products CD and DC which are not equal.]
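We will now adopt the simpler notation $M_n(\mathcal{A})$ for the set ${}_nM_n(\mathcal{A})$ of all n by n matrices with elements in a ring $\mathcal{A}$. The matrices of $M_n(\mathcal{A})$ are called n-rowed square matrices with elements in $\mathcal{A}$. We have already proved most of the results needed for the following theorem.
THEOREM 10-3.7. The set $M_n(\mathcal{A})$ of all n-rowed square matrices with elements in a ring $\mathcal{A}$, with addition, multiplication, and negation defined by Definitions 10-3.2 and 10-3.5 and (10-9), is a ring. If $\mathcal{A}$ contains an identity element 1, then the n by n matrix
$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
(whose elements $e_{i,j}$ are 1 if i = j and 0 if $i \neq j$) is the identity in $M_n(\mathcal{A})$. Moreover, if $n \geq 2$ and $1 \neq 0$ in $\mathcal{A}$, then $M_n(\mathcal{A})$ is not commutative.
Proof. The only identities left to verify in order to prove that $M_n(\mathcal{A})$ is a ring are the distributive laws, Definition 4-2.1(f) and (g), that is,
$$A(B + C) = AB + AC, \qquad (A + B)C = AC + BC.$$
These follow easily from the properties of addition and multiplication in $\mathcal{A}$, and we leave their proof as an exercise for the reader. To prove that I is an identity in $M_n(\mathcal{A})$, let A be a matrix in $M_n(\mathcal{A})$ with elements $a_{i,j}$. Then the element in the ith row and jth column of AI is
$$\sum_{k=1}^{n} a_{i,k}e_{k,j} = a_{i,j},$$
since $e_{k,j} = 0$ for $k \neq j$ and $e_{j,j} = 1$. Hence AI = A. Similarly, IA = A.
To complete the proof, it will be sufficient to exhibit two matrices A and B in $M_n(\mathcal{A})$ such that $AB \neq BA$ (assuming of course that $n \geq 2$ and $1 \neq 0$ in $\mathcal{A}$). Let
[displays of the matrices A and B not recoverable]
Then
$$AB = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \qquad \text{and} \qquad BA = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix},$$
so that $AB \neq BA$.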
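The two claims of Theorem 10-3.7 about the identity matrix and about noncommutativity can be illustrated for n = 3. In the sketch below, the helpers `mat_mul`, `identity`, and `unit` are written for this example. The choice of A (a single 1 in row 1, column 1) and B (a single 1 in row 1, column 2) is an assumption: it is one pair that yields a product AB with a single 1 in row 1, column 2 and a zero product BA, and it is not claimed to be the pair used in the proof.

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (Definition 10-3.5)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    """n by n matrix with 1 on the main diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def unit(n, row, col):
    """n by n matrix with 1 in position (row, col), counted from 1, and 0 elsewhere."""
    return [[1 if (i, j) == (row - 1, col - 1) else 0 for j in range(n)]
            for i in range(n)]

n = 3
I = identity(n)
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # arbitrary test matrix
identity_ok = mat_mul(M, I) == M and mat_mul(I, M) == M

A = unit(n, 1, 1)                            # assumed choice, see lead-in
B = unit(n, 1, 2)
AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(identity_ok, AB, BA, AB != BA)
```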
1. (a) Construct the 5 by 3 matrix which has $a_{i,j} = i \cdot j$ for i = 1, 2, 3, 4, 5 and j = 1, 2, 3. (b) Construct the 2 by 4 matrix with elements in Q which has $a_{i,j} = i/j$ for i = 1, 2, and j = 1, 2, 3, 4.
2. List every 2 by 2 matrix which has elements in the ring of integers modulo 2.
3. If A and B are m by n matrices with elements in a ring $\mathcal{A}$, then the difference of A and B is defined by A - B = A + (-B). Prove that A - B is the unique solution of the matrix equation B + X = A.
5. Prove (10-3.4).
6. Define an elementwise multiplication in ${}_mM_n(\mathcal{A})$ by letting the product of A and B be the m by n matrix which has the element $a_{i,j}b_{i,j}$ in the ith row and jth column. Prove that with this multiplication and with addition and negation, defined by Definition 10-3.2 and (10-9), ${}_mM_n(\mathcal{A})$ is a ring. Prove that if $\mathcal{A}$ is commutative, then ${}_mM_n(\mathcal{A})$ is commutative. Is it true that ${}_mM_n(\mathcal{A})$ is an integral domain if $\mathcal{A}$ is an integral domain?
8. Write the systems of homogeneous linear equations in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ whose matrices of coefficients are as follows.
[displays of the matrices not recoverable]
10. Complete the proof of Theorem 10-3.7 by proving the distributive laws in $M_n(\mathcal{A})$. Prove, more generally, that if $A \in {}_mM_n(\mathcal{A})$ and $B, C \in {}_nM_q(\mathcal{A})$, then A(B + C) = AB + AC. State a general form of the other distributive law.
11. Prove that if $n \geq 2$ and the ring $\mathcal{A}$ contains an element $a \neq 0$, then $M_n(\mathcal{A})$ contains proper divisors of zero.
12. Let $\mathcal{A}$ be a ring with identity. Suppose that $n \geq 2$. Find matrices A and B in $M_n(\mathcal{A})$ such that $(AB)^2 \neq A^2B^2$.
13. Prove that for any ring $\mathcal{A}$, the ring $M_1(\mathcal{A})$ of all 1 by 1 matrices with elements in $\mathcal{A}$ is isomorphic to $\mathcal{A}$.
10-4 The inverse of a square matrix. If F is a field, then by Theorem 10-3.7 the ring $M_n(F)$ of all n-rowed square matrices with elements in F has the identity
$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$
which is the n by n matrix with 1 in every position on the diagonal line from the upper left-hand corner to the lower right-hand corner (the so-called "main diagonal") and 0 in every other position. The existence of an identity element in $M_n(F)$ makes it possible to define inverses.
DEFINITION 10-4.1. Let A and B be in $M_n(F)$, where F is a field. If AB = BA = I, then the matrix B is called an inverse of the matrix A in $M_n(F)$.
If B is an inverse of A, then of course A is an inverse of B, since Definition 10-4.1 is symmetrical in A and B. A matrix A may not have an inverse, but if an inverse does exist, then it is unique. In fact, suppose that AB = BA = I and AC = CA = I, where A, B, and C belong to $M_n(F)$. Then by the associative law,
$$B = BI = B(AC) = (BA)C = IC = C.$$
We will denote the unique inverse of A, when it exists, by $A^{-1}$. Matrices which have no inverse are called singular; if A has an inverse, then A is called nonsingular.
EXAMPLE 1. The matrix
[display of the matrix and the opening of the example not recoverable]
Therefore, the numbers $b_{1,1}$, $b_{1,2}$, $b_{2,1}$, and $b_{2,2}$ must satisfy the following equations:
[system of equations not recoverable]
Multiplying the first equation by 2 and adding it to the third equation, we get an equivalent system of equations:
[remainder of Example 1, and all of Example 2, not recoverable]
An important elementary property of the set of all nonsingular matrices is the fact that this set is closed under multiplication. In fact, the inverse of the product of nonsingular matrices can be given explicitly in terms of the inverses of the given matrices.
THEOREM 10-4.2. Let $A_1, A_2, \ldots, A_k$ be nonsingular matrices in $M_n(F)$, where F is a field. Then $A_k^{-1} \cdots A_2^{-1}A_1^{-1}$ is the inverse of the product $A_1A_2 \cdots A_k$, so that this product is nonsingular.
Proof. If k = 1, the assertion to be proved is that $A_1^{-1}$ is the inverse of $A_1$. This is true by the definition of $A_1^{-1}$. Suppose that k = 2. Then
$$(A_1A_2)(A_2^{-1}A_1^{-1}) = A_1(A_2A_2^{-1})A_1^{-1} = A_1IA_1^{-1} = A_1A_1^{-1} = I$$
and
$$(A_2^{-1}A_1^{-1})(A_1A_2) = A_2^{-1}(A_1^{-1}A_1)A_2 = A_2^{-1}IA_2 = A_2^{-1}A_2 = I.$$
Thus, by Definition 10-4.1, $A_2^{-1}A_1^{-1}$ is an inverse of $A_1A_2$. Since inverses are unique, $(A_1A_2)^{-1} = A_2^{-1}A_1^{-1}$. The proof of the general case is obtained by induction on k, using the case k = 2 to establish the induction step. We omit the details.
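The reversal of order in Theorem 10-4.2 is easy to overlook, so a small numerical check may help. The sketch below assumes two hypothetical nonsingular matrices in $M_2(Q)$ whose inverses are written down directly; `mat_mul` is a helper defined for the example.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of matrices stored as lists of rows (Definition 10-3.5)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I = [[1, 0], [0, 1]]

# Hypothetical nonsingular matrices together with their inverses.
A1, A1_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]
A2, A2_inv = [[2, 0], [0, 1]], [[Fraction(1, 2), 0], [0, 1]]

product = mat_mul(A1, A2)

# Inverses multiplied in the reversed order, as in the theorem.
candidate = mat_mul(A2_inv, A1_inv)
reversed_ok = (mat_mul(product, candidate) == I
               and mat_mul(candidate, product) == I)

# Inverses multiplied in the same order as the factors: not an inverse here.
same_order = mat_mul(A1_inv, A2_inv)
same_order_ok = mat_mul(product, same_order) == I

print(reversed_ok, same_order_ok)
```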
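If
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2 \\ &\ \,\vdots \\ a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,n}x_n &= b_n \end{aligned}$$
is a system of n linear equations in n unknowns with coefficients in a field F, then the matrix of coefficients of this system
$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{bmatrix}$$
belongs to $M_n(F)$. If the matrix A has an inverse in $M_n(F)$, and if this inverse is known, then the system of equations can easily be solved. In fact, suppose that $(c_1, c_2, \ldots, c_n)$ is any solution of the system. As we observed in Example 3, Section 10-3,
$$AC = B, \quad \text{where} \quad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \quad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$
Therefore, by the associative law, $C = IC = (A^{-1}A)C = A^{-1}(AC) = A^{-1}B$. That is, the solution $(c_1, c_2, \ldots, c_n)$ can be obtained in the form of a column matrix by computing $A^{-1}B$, provided that $A^{-1}$ is known. Conversely, by direct substitution of $C = A^{-1}B$ for X in the matrix equation AX = B, it follows that C is a solution. Therefore, the elements of C furnish a solution of the original system of linear equations. Note that the solution of the system is unique since $C = A^{-1}B$ and $A^{-1}$ is unique.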
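As an illustration of solving a system through a known inverse, consider the hypothetical system $2x_1 + x_2 = 3$, $x_1 + x_2 = 2$. Its matrix of coefficients and the inverse used below are assumptions made for this sketch; the inverse is verified by multiplication before it is used.

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (Definition 10-3.5)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I = [[1, 0], [0, 1]]
A = [[2, 1],
     [1, 1]]                 # matrix of coefficients
A_inv = [[1, -1],
         [-1, 2]]            # assumed inverse, checked below
B = [[3],
     [2]]                    # column matrix of right-hand sides

inverse_ok = mat_mul(A, A_inv) == I and mat_mul(A_inv, A) == I

C = mat_mul(A_inv, B)        # the unique solution, as a column matrix
solves = mat_mul(A, C) == B  # direct substitution of C for X in AX = B

print(inverse_ok, C, solves)
```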
EXAMPLE 3. [The displayed system of linear equations with coefficients in C is not recoverable.] The matrix of the coefficients of this system is the matrix A whose inverse was given in Example 2. By our discussion, the unique solution of this system is obtained from the column matrix
[display of $A^{-1}B$ not recoverable]
The above discussion gives some indication of why it is important to be able to decide whether or not a matrix has an inverse, and if it has, to find the inverse. In the remainder of this section, we will describe a practical method* of finding the inverse of any nonsingular square matrix with elements in a field. The process is similar to the method of solving systems of linear equations which was explained in Section 10-2.
* It can be shown that a square matrix A with elements in a field is nonsingular if and only if the determinant of A is not zero. An explicit expression can even be given for the inverse of A in terms of certain determinants. However, the method which we will explain below is a more practical way to find $A^{-1}$ than by evaluating these determinants.
Suppose that $\sum_{j=1}^{n} a_{i,j}x_j = b_i$, i = 1, 2, ..., m, is a system of m linear equations in n unknowns with coefficients in the field F. Let A be the matrix of coefficients of this system. If we apply an elementary transformation to this system, then a system of linear equations is obtained whose matrix of coefficients B can be described in terms of the matrix A. For example, if the elementary transformation interchanges the equations k and l, then B is obtained from A by interchanging the rows k and l. This observation motivates the definition of an elementary row transformation of a matrix A in ${}_mM_n(F)$. There are three types of such elementary transformations, which can be described as follows.
Type 1. Interchange two rows of A.
Type 2. Multiply each element of one row of A by c and add it to the corresponding element of another row, where $c \in F$.
Type 3. Multiply each element of one row of A by c, where $c \in F$, and $c \neq 0$.
It is clear that the method used to prove Theorem 10-2.6 can be employed to show that any matrix can be carried into echelon form:
[display (10-11) of the echelon form not recoverable]
EXAMPLE 4. Let
[Example 4 not recoverable]
A sequence of elementary row transformations on a matrix $A \in {}_mM_n(F)$ can be accomplished by multiplying A by a matrix $P \in M_m(F)$. This fact can be used to give a necessary and sufficient condition for a square matrix to have an inverse, and to calculate the inverse when it exists. In order to carry out this program, we need several preliminary results.
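(10-4.3). Let $I^{(i,j)}$ be the matrix obtained from the identity matrix $I \in M_m(F)$ by interchanging the ith and jth rows of I. Let $A \in {}_mM_n(F)$. Then the matrix $I^{(i,j)}A$ is the matrix obtained from A by interchanging the ith and jth rows of A.
Proof. The matrix $I^{(i,j)}$ has 1's on the diagonal except in the ith and jth rows where the diagonal element is zero, 1 in the (i, j)-position, 1 in the (j, i)-position, and zeros elsewhere. If $A \in {}_mM_n(F)$ is a matrix with elements $a_{i,j}$, then it follows from the definition of matrix multiplication that $I^{(i,j)}A$ is the matrix A with its ith and jth rows interchanged.
For example, $I^{(2,4)}$ is the matrix obtained from the identity matrix in $M_4(F)$ by interchanging the second and fourth rows, and
$$I^{(2,4)}A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ a_{2,1} & \cdots & a_{2,n} \\ a_{3,1} & \cdots & a_{3,n} \\ a_{4,1} & \cdots & a_{4,n} \end{bmatrix} = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ a_{4,1} & \cdots & a_{4,n} \\ a_{3,1} & \cdots & a_{3,n} \\ a_{2,1} & \cdots & a_{2,n} \end{bmatrix}.$$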
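(10-4.4). Let $I_c^{(i,j)}$ be the matrix obtained from the identity matrix $I \in M_m(F)$ by multiplying each element of the ith row of I by $c \in F$ and adding it to the corresponding element of the jth row ($i \neq j$). Let $A \in {}_mM_n(F)$. Then the matrix $I_c^{(i,j)}A$ is the matrix obtained from A by multiplying each element of the ith row of A by c and adding it to the corresponding element of the jth row.
For instance, $I_c^{(1,3)}$ is the matrix obtained from the identity matrix in $M_4(F)$ by multiplying each element of the first row of I by c and adding to the corresponding element of the third row. Moreover,
$$I_c^{(1,3)}A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ c & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ a_{2,1} & \cdots & a_{2,n} \\ a_{3,1} & \cdots & a_{3,n} \\ a_{4,1} & \cdots & a_{4,n} \end{bmatrix} = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ a_{2,1} & \cdots & a_{2,n} \\ ca_{1,1} + a_{3,1} & \cdots & ca_{1,n} + a_{3,n} \\ a_{4,1} & \cdots & a_{4,n} \end{bmatrix}.$$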
(10-4.5). Let $I_c^{(i)}$ be the matrix obtained from the identity matrix $I \in M_m(F)$ by multiplying each element of the ith row of I by $c \neq 0$ in F. Let $A \in {}_mM_n(F)$. Then $I_c^{(i)}A$ is the matrix obtained from A by multiplying each element of the ith row of A by c.
Proof. The result follows at once when we note that $I_c^{(i)}$ is a matrix with 1's on the diagonal except in the ith row where c is on the diagonal, and zeros elsewhere.
We will refer to the matrices $I^{(i,j)}$, $I_c^{(i,j)}$, and $I_c^{(i)}$ as elementary transformation matrices of type 1, 2, and 3, respectively. The results (10-4.3), (10-4.4), and (10-4.5) show that each elementary row transformation on a matrix can be accomplished by multiplying the given matrix on the left by a matrix obtained from I by this same elementary transformation.
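The three kinds of elementary transformation matrices can be built directly from the identity matrix, and their effect under left multiplication can then be observed. The following Python sketch is an illustration: the function names `swap_matrix`, `add_multiple_matrix`, and `scale_matrix` are hypothetical, rows are counted from 1 as in the text, and the 4 by 2 matrix A is an arbitrary test case.

```python
def mat_mul(A, B):
    """Product of matrices stored as lists of rows (Definition 10-3.5)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def swap_matrix(m, i, j):
    """Type 1: the identity with rows i and j interchanged."""
    E = identity(m)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def add_multiple_matrix(m, i, j, c):
    """Type 2: the identity with c times row i added to row j (i != j)."""
    E = identity(m)
    E[j - 1][i - 1] = c
    return E

def scale_matrix(m, i, c):
    """Type 3: the identity with row i multiplied by c (c != 0)."""
    E = identity(m)
    E[i - 1][i - 1] = c
    return E

A = [[1, 2], [3, 4], [5, 6], [7, 8]]                     # arbitrary 4 by 2 matrix

swapped = mat_mul(swap_matrix(4, 2, 4), A)               # rows 2 and 4 interchanged
added = mat_mul(add_multiple_matrix(4, 1, 3, 10), A)     # 10 * row 1 added to row 3
scaled = mat_mul(scale_matrix(4, 2, 2), A)               # row 2 multiplied by 2
print(swapped, added, scaled)
```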
EXAMPLE 5. [The statement of the example and the matrix A are not recoverable.] In Table 10-2, we list a sequence of elementary row transformations which will carry A into echelon form, the corresponding elementary transformation matrices, and the result of performing these elementary transformations.
[Table 10-2 not recoverable; its first step is "Interchange the first and second rows".]
[display not recoverable] and the required matrix P is the product of the five elementary transformation matrices. Since
[display not recoverable]
it is evident that P is obtained from the identity matrix I by performing the given sequence of elementary transformations on I. Thus, P can be computed without resorting to matrix multiplication. The following steps carry I into P by the elementary transformations listed in Table 10-2:
[display not recoverable]
(10-4.6). Each elementary transformation matrix in $M_m(F)$ has an inverse in $M_m(F)$ which is an elementary transformation matrix of the same type.
Proof. By (10-4.3), when a matrix is multiplied on the left by $I^{(i,j)}$, the ith and jth rows of the matrix are interchanged. Since $I^{(i,j)}$ is obtained from I by interchanging the ith and jth rows of I, it follows that
$$I^{(i,j)}I^{(i,j)} = I.$$
Therefore, the inverse of $I^{(i,j)}$ is $I^{(i,j)}$. By (10-4.4), multiplying a matrix on the left by $I_{-c}^{(i,j)}$ adds $-c$ times each element of the ith row of the matrix to the corresponding element of the jth row. Since $I_c^{(i,j)}$ is obtained from I by multiplying each element of the ith row of I by c and adding to the corresponding element of the jth row, it follows that $I_{-c}^{(i,j)}I_c^{(i,j)} = I$. A similar argument shows that $I_c^{(i,j)}I_{-c}^{(i,j)} = I$. Therefore $I_{-c}^{(i,j)}$ is the inverse of $I_c^{(i,j)}$. Finally, it is easy to check that $I_{1/c}^{(i)}$ is the inverse of $I_c^{(i)}$, and this completes the proof.
Since any product of nonsingular matrices has an inverse, by Theorem 10-4.2, the following result is obtained from (10-4.6).
(10-4.7). A matrix $P \in M_m(F)$ which is a product of elementary transformation matrices has an inverse in $M_m(F)$.
We now return to the consideration of n-rowed square matrices. One more preliminary result is needed before the main theorem.
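(10-4.8). Let the matrix A in $M_n(F)$ be in the echelon form. If A has 1 in every main diagonal position, then
$$A = \begin{bmatrix} 1 & d_{1,2} & d_{1,3} & \cdots & d_{1,n} \\ 0 & 1 & d_{2,3} & \cdots & d_{2,n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & d_{n-1,n} \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix},$$
and it is possible to transform A into the identity matrix I in $M_n(F)$ by a sequence of elementary row transformations.
Proof. If the last row of A is multiplied by $-d_{1,n}$ and added to the first row, then multiplied by $-d_{2,n}$ and added to the second row, etc., we obtain the matrix which is identical with A except that $d_{1,n}, d_{2,n}, \ldots, d_{n-1,n}$ are replaced by 0:
$$\begin{bmatrix} 1 & d_{1,2} & \cdots & d_{1,n-1} & 0 \\ 0 & 1 & \cdots & d_{2,n-1} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$
Next, the (n - 1)st row of this new matrix is multiplied by $-d_{1,n-1}$ and added to the first row, then multiplied by $-d_{2,n-1}$ and added to the second row, and so forth. This sequence of elementary row transformations leads to the matrix
$$\begin{bmatrix} 1 & d_{1,2} & \cdots & d_{1,n-2} & 0 & 0 \\ 0 & 1 & \cdots & d_{2,n-2} & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix}.$$
It is obvious how this process is continued to finally obtain the identity matrix I.
EXAMPLE 6. By using type 2 elementary transformations, the matrix
[display not recoverable]
in $M_4(Q)$ [remainder of the example not recoverable]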
THEOREM 10-4.9. Let F be a field and suppose that $A \in M_n(F)$. Then A has an inverse in $M_n(F)$ if and only if A can be transformed into the identity matrix I of $M_n(F)$ by a sequence of elementary row transformations. The inverse of A can be obtained by applying to I the same sequence of elementary row transformations that is used to get from A to I.
Proof. Suppose that A can be transformed into I by a sequence of elementary transformations. Then by (10-4.3), (10-4.4), and (10-4.5) there is a sequence $E_1, E_2, \ldots, E_{k-1}, E_k$ of elementary transformation matrices such that $E_kE_{k-1} \cdots E_2E_1A = I$.
Let $B = E_kE_{k-1} \cdots E_2E_1$. Then BA = I. We wish to show that B is the inverse of A. By Definition 10-4.1, it is sufficient to prove that AB = I. Note that by (10-4.7), B has an inverse $B^{-1}$. From this fact and the identity BA = I we obtain the desired result: $AB = IAB = B^{-1}BAB = B^{-1}IB = B^{-1}B = I$. By definition, $B = E_kE_{k-1} \cdots E_2E_1I$, so that B is obtained from I by applying in order the elementary transformations corresponding to $E_1, E_2, \ldots, E_{k-1}$, and finally $E_k$. This proves the last statement of the theorem.
The only thing left to show is that if A has an inverse, then A can be transformed into I by a sequence of elementary transformations. Suppose that $A^{-1}$ exists. As we remarked before, any matrix A can be transformed into the echelon form (10-11) by means of a sequence of elementary row transformations. Consequently, by (10-4.3), (10-4.4), and (10-4.5), there is a matrix $P \in M_n(F)$, such that P is a product of elementary transformation matrices and C = PA is in echelon form (10-11). To complete the proof, it is sufficient by (10-4.8) to show that C has the form
$$C = \begin{bmatrix} 1 & d_{1,2} & d_{1,3} & \cdots & d_{1,n} \\ 0 & 1 & d_{2,3} & \cdots & d_{2,n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{10-12}$$
with 1 in every diagonal position. Suppose that C does not have this form. Then because C is a square matrix in echelon form, it follows that every element of the last row of C is zero. Therefore, by the definition of matrix multiplication, if D is any matrix in $M_n(F)$, then every element in the last row of CD is zero. In particular, C cannot have an inverse. However, C = PA. By assumption A has an inverse, and since P is a product of elementary transformation matrices, it follows from (10-4.7) that P has an inverse. Therefore, by Theorem 10-4.2, C has an inverse. This contradiction shows that C must have the form (10-12), which completes the proof.
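Theorem 10-4.9 translates into a procedure: apply elementary row transformations to A until it becomes I, and apply the same transformations to I along the way. The sketch below is one possible arrangement, assuming elements in the field Q represented by Python's `Fraction`. It clears each column above and below the pivot in a single sweep, which is a common variant rather than the two-stage reduction (echelon form first, then (10-4.8)) described in the text. The function name `inverse` is hypothetical.

```python
from fractions import Fraction

def inverse(A):
    """Return the inverse of a square matrix over Q, or None if A is singular.

    Every elementary row transformation applied to the working copy M of A
    is applied to P as well; P starts as I and ends as the inverse of A.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                       # no 1 can be produced on the diagonal
        if pivot != col:                      # type 1: interchange two rows
            M[col], M[pivot] = M[pivot], M[col]
            P[col], P[pivot] = P[pivot], P[col]
        c = 1 / M[col][col]                   # type 3: multiply a row by c != 0
        M[col] = [c * x for x in M[col]]
        P[col] = [c * x for x in P[col]]
        for r in range(n):                    # type 2: add a multiple of a row
            if r != col and M[r][col] != 0:
                d = -M[r][col]
                M[r] = [x + d * y for x, y in zip(M[r], M[col])]
                P[r] = [x + d * y for x, y in zip(P[r], P[col])]
    return P

print(inverse([[2, 1], [1, 1]]))
print(inverse([[1, 2], [2, 4]]))              # singular: the second row becomes zero
```

Returning `None` for a singular matrix corresponds to the case in which the reduced matrix is not of the form (10-12).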
The last part of the above proof shows that no matter how a nonsingular matrix A is reduced to echelon form by elementary row transformations, the result will be of the form (10-12). Otherwise A could not have an inverse. Therefore, if a matrix $A \in M_n(F)$ reduces by elementary row transformations to an echelon form different from (10-12) (which means that the last row must contain all zeros), then A does not have an inverse in $M_n(F)$.
EXAMPLE 7. We will show that the matrix
[display not recoverable]
in $M_4(Q)$ has no inverse. In fact, by the usual process of carrying A into echelon form, we obtain
[sequence of matrices not recoverable]
At this point, it is possible to stop, even though complete reduction to echelon form has not been achieved. It is clear however that elementary row transformations applied to the last two rows of this matrix cannot produce a 1 on the main diagonal in the third row and third column. Therefore, A can be transformed by elementary transformations into an echelon matrix which is not of the form (10-12). Consequently, A has no inverse in $M_4(Q)$. Note that this same conclusion could not be obtained from the next to last matrix in the above sequence, because of the presence of 4 in the fourth row, second column.
EXAMPLE 8. Let us apply the process described in Theorem 10-4.9 to obtain the inverse which was given (without any motivation) for the matrix
[display not recoverable]
in Example 2. From the second line on, the first column of Table 10-3 describes an elementary row transformation. The second and third columns give the matrices which are obtained by applying these elementary transformations to the corresponding matrices of the preceding lines. The second and third columns of the first line contain the matrices A and I in $M_3(C)$, respectively. The second and third columns of the last line of Table 10-3 contain I and $A^{-1}$.
2. [Opening of the problem not recoverable] $A^{-1}A = I$ in Example 2.
3. Show by an example that the sum of two nonsingular matrices is not necessarily nonsingular. Can the sum of two singular matrices be nonsingular?
4. Carry the following matrices into echelon form (10-11) by elementary row transformations.
[displays of the matrices not recoverable]
5. Write the following elementary transformation matrices in $M_5(Q)$: [list not recoverable]. Describe in words the elementary row transformations to which each of these matrices corresponds.
[Table 10-3. Reduction of A to I, and of I to $A^{-1}$, by elementary row transformations (Example 8). The entries of the table are not recoverable.]
6. Find a matrix P such that PA is in echelon form (10-11) for each matrix A listed in Problem 4.
7. Find the inverses of the elementary transformation matrices of Problem 5.
8. Which of the matrices of Problem 4 have inverses? Find the inverses when they exist.
9. Prove that $A \in M_n(F)$ has an inverse if and only if A is a product of elementary transformation matrices.
10. Let A be the matrix of coefficients of the homogeneous system of n equations in n unknowns with coefficients in a field F:
$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= 0 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= 0 \\ &\ \,\vdots \\ a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,n}x_n &= 0. \end{aligned}$$
(a) Show that if B is a nonsingular matrix, then the solutions of AX = O are the same as the solutions of (BA)X = O. (b) Use the result of (a) to prove that A is nonsingular if and only if AX = O has only the trivial solution X = O. [Hint: To prove that this condition is sufficient, let B be a product of elementary transformation matrices such that BA is in echelon form. Use the result of Theorem 10-2.7(b), together with (10-4.8) and Theorem 10-4.9.] (c) Use part (b) to prove that if $A \in M_n(F)$ is such that BA = I for some $B \in M_n(F)$, then A is nonsingular, and $B = A^{-1}$.
11. Prove that the matrix
$$\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix}$$
has an inverse if and only if
$$a_{1,1}a_{2,2}a_{3,3} + a_{1,2}a_{2,3}a_{3,1} + a_{1,3}a_{2,1}a_{3,2} - a_{1,3}a_{2,2}a_{3,1} - a_{1,1}a_{2,3}a_{3,2} - a_{1,2}a_{2,1}a_{3,3}$$
is not zero.
APPENDIX 1
THE PROOF OF STURM'S THEOREM
THEOREM A1-1. Sturm's theorem. Let f(x) be a polynomial in R[x] with Sturm sequence
$$f(x),\ f'(x),\ s_1(x),\ s_2(x),\ \ldots,\ s_k(x), \tag{1}$$
where
$$\begin{aligned} s_1(x) &= q_1(x)f'(x) - f(x), \\ s_2(x) &= q_2(x)s_1(x) - f'(x), \\ s_3(x) &= q_3(x)s_2(x) - s_1(x), \\ &\ \,\vdots \\ s_k(x) &= q_k(x)s_{k-1}(x) - s_{k-2}(x). \end{aligned}$$
Let c and d be real numbers such that c < d and $f(c) \neq 0$ and $f(d) \neq 0$. For each real number t, let N(t) be the number of variations in sign in the sequence (1) evaluated at x = t. Then the number of distinct real roots of f(x) between c and d is equal to N(c) - N(d).
Proof. The first step in the proof is to replace the Sturm sequence (1) by a modified sequence for which the value of N(c) - N(d) is the same as for (1). The last Sturm polynomial $s_k(x)$ is a g.c.d. of f(x) and f'(x) (see Section 9-11), and is a divisor of every polynomial in the Sturm sequence (1). The modified sequence is
$$g(x) = \frac{f(x)}{s_k(x)},\quad g_0(x) = \frac{f'(x)}{s_k(x)},\quad g_1(x) = \frac{s_1(x)}{s_k(x)},\quad \ldots,\quad g_k(x) = \frac{s_k(x)}{s_k(x)} = 1. \tag{2}$$
Dividing the equations which define $s_1(x), \ldots, s_k(x)$ by $s_k(x)$ shows that $g_1(x) = q_1(x)g_0(x) - g(x)$, $g_2(x) = q_2(x)g_1(x) - g_0(x)$, and $g_j(x) = q_j(x)g_{j-1}(x) - g_{j-2}(x)$ for j = 3, ..., k.
Since $s_k(x)$ divides f(x) and $f(c) \neq 0$, it follows that $s_k(c) \neq 0$. Therefore, dividing each polynomial of sequence (1) by $s_k(x)$ leaves the signs the same at x = c if $s_k(c) > 0$, and reverses each sign at x = c if $s_k(c) < 0$. In either case, the number of variations in sign in the sequence $g(c), g_0(c), g_1(c), \ldots, g_k(c)$ is the same as in the sequence $f(c), f'(c), s_1(c), \ldots, s_k(c)$. That is, N(c) is the same for sequence (2) as for sequence (1). Similarly, $s_k(d) \neq 0$ since $f(d) \neq 0$, and N(d) calculated from (2) is the same as N(d) computed from (1). Thus, the modified Sturm sequence (2) yields the same value of N(c) - N(d) as the original sequence (1).
We next observe that the real roots of g(x) are the same as the real roots of f(x), although possibly with different multiplicities. In fact, suppose that the distinct real roots of f(x) are $u_1, u_2, \ldots, u_p$. Then
$$f(x) = a(x - u_1)^{m_1}(x - u_2)^{m_2} \cdots (x - u_p)^{m_p}\, q_1(x)^{n_1}q_2(x)^{n_2} \cdots q_v(x)^{n_v},$$
where a is a nonzero real number, $m_1, m_2, \ldots, m_p, n_1, n_2, \ldots, n_v$ are natural numbers, and $q_1(x), q_2(x), \ldots, q_v(x)$ are distinct monic polynomials of degree greater than one which are irreducible in R[x] and consequently have no real roots. By Theorem 9-6.4,
$$s_k(x) = b(x - u_1)^{m_1 - 1} \cdots (x - u_p)^{m_p - 1}\, q_1(x)^{n_1 - 1} \cdots q_v(x)^{n_v - 1}$$
for some nonzero real number b, so that
$$g(x) = \frac{a}{b}(x - u_1)(x - u_2) \cdots (x - u_p)\, q_1(x)q_2(x) \cdots q_v(x).$$
Thus, the different real roots of g(x) are also $u_1, u_2, \ldots,$ and $u_p$. Moreover, we note for future reference that each $u_i$ is a simple root of g(x). Thus, to prove the theorem, it is sufficient to show that N(c) - N(d) calculated from sequence (2) is the number of roots of g(x) in the interval from c to d.
Divide the interval between c and d at each point which corresponds to a root of any one of the polynomials in the sequence (2). We then have a finite set of real numbers
$$c = x_0 < x_1 < x_2 < \cdots < x_{r-1} < x_r = d$$
such that each $x_i$ for $1 \leq i < r$ is a root of some polynomial in (2), and every root of every polynomial in (2) in the interval from c to d is in the set $\{x_0, x_1, x_2, \ldots, x_{r-1}, x_r\}$. Thus if t satisfies $x_{i-1} < t < x_i$ for some i = 1, 2, ..., r, then none of the polynomials in (2) is equal to 0 at x = t. The proof is carried out by showing that (i) the value of N(t) remains unchanged in each interval $x_{i-1} < t < x_i$, (ii) the value of N(t) is the same in two adjacent intervals $x_{i-1} < t < x_i$ and $x_i < t < x_{i+1}$ if $x_i$ is not a root of g(x), and (iii) the value of N(t) for $x_i < t < x_{i+1}$ is one less than the value of N(t) for $x_{i-1} < t < x_i$ if $x_i$ is a root of g(x).
Proof of (i). Suppose that one of the polynomials in sequence (2) changes sign in an interval $x_{i-1} < t < x_i$. Denote this polynomial by h(x). Then $h(t_1)$ and $h(t_2)$ have opposite signs for some $t_1$, $t_2$ with $x_{i-1} < t_1 < t_2 < x_i$. By Theorem 9-10.1, h(x) has a root between $t_1$ and $t_2$. However, this contradicts the fact that every root of h(x) between c and d is in the set $\{x_0, x_1, x_2, \ldots, x_{r-1}, x_r\}$. Therefore, every polynomial in sequence (2) has the same sign for all t such that $x_{i-1} < t < x_i$. This implies that N(t), which is the number of variations in sign in the sequence $g(t), g_0(t), g_1(t), \ldots, g_k(t)$, remains unchanged in the interval $x_{i-1} < t < x_i$.
Proof of (ii). Suppose that $x_i$ is not a root of g(x). We will compare the sequences
$$g(t),\ g_0(t),\ g_1(t),\ \ldots,\ g_k(t) = 1, \tag{3}$$
where $x_{i-1} < t < x_i$, and
$$g(x_i),\ g_0(x_i),\ g_1(x_i),\ \ldots,\ g_k(x_i) = 1. \tag{4}$$
By the proof of (i), the signs of the numbers in (4) are the same as those for the corresponding numbers in (3), except that some of the numbers in sequence (4) may be zero. Observe that the first and last terms in (4) are not zero, since $x_i$ is not a root of g(x) and $g_k(x_i) = 1$. Moreover, no two consecutive terms in (4) are zero. For otherwise, examination of the equations (2) shows that all following terms would be zero. In particular, $g_k(x_i) = 0$, which is impossible. It also follows from (2) that those numbers in sequence (4) which are adjacent to a zero have opposite signs. For example, if $g_2(x_i) = 0$, then since $g_3(x) = q_3(x)g_2(x) - g_1(x)$, we have $0 \neq g_3(x_i) = -g_1(x_i)$. Therefore, at a place where a zero occurs in (4), there are the following possibilities for the signs in the sequences (3) and (4):
signs in (3): + + -  or  + - -      signs in (4): + 0 -
signs in (3): - + +  or  - - +      signs in (4): - 0 +
Thus, the variation in sign that occurs in (3) is preserved in (4). Hence, N(t), the total number of variations in sign in (3), is the same as $N(x_i)$, the total number of variations in sign in (4). If t satisfies $x_i < t < x_{i+1}$ in (3), then the above argument shows that $N(x_i) = N(t)$. Therefore, N(t) is the same for all t such that $x_{i-1} < t < x_{i+1}$, which completes the proof of (ii). The reader should observe that since $g(c) \neq 0$, $g(d) \neq 0$, we have incidentally proved that N(t) is the same for all t such that $c = x_0 \leq t < x_1$ as well as for all t such that $x_{r-1} < t \leq x_r = d$.
Proof of (iii). Note first that if $x_i$ is a root of g(x), then $i \neq 0$ and $i \neq r$, since $g(x_0) = g(c) \neq 0$ and $g(x_r) = g(d) \neq 0$. Suppose that $x_i$ is a root of f(x) of multiplicity m. Then
$$f(x) = (x - x_i)^m a(x), \tag{5}$$
where $x_i$ is not a root of a(x). Moreover, $s_k(x) = (x - x_i)^{m-1}s(x)$, where $x_i$ is not a root of s(x). Thus, s(x) and $x - x_i$ are relatively prime, so that s(x) divides a(x). Since $s_k(x)$ divides
$$f'(x) = m(x - x_i)^{m-1}a(x) + (x - x_i)^m a'(x),$$
it follows that s(x) also divides a'(x). Let b(x) = a(x)/s(x) and c(x) = a'(x)/s(x). Then we have
$$g(x) = (x - x_i)b(x), \tag{6}$$
$$g_0(x) = mb(x) + (x - x_i)c(x), \tag{7}$$
where $x_i$ is not a root of b(x). It follows that $b(t) \neq 0$ for all t such that $x_{i-1} < t < x_{i+1}$. Indeed, $b(x_i) \neq 0$, since $x_i$ is not a root of b(x). If b(t) = 0 for $t \neq x_i$, then by (6) g(t) = 0. This is impossible because $x_i$ is the only root of g(x) between $x_{i-1}$ and $x_{i+1}$. It therefore follows from Theorem 9-10.1 that b(t) has the same sign throughout the interval $x_{i-1} < t < x_{i+1}$. Suppose that b(t) > 0 for all t in this interval. Then by (6),
$$g(t) < 0 \ \text{ if } x_{i-1} < t < x_i, \qquad g(t) > 0 \ \text{ if } x_i < t < x_{i+1}.$$
By (7), $g_0(x_i) = mb(x_i) > 0$. Therefore, $g_0(t) > 0$ for all t such that $x_{i-1} < t < x_{i+1}$. Thus for $x_{i-1} < t < x_i$, the signs of the sequence $g(t), g_0(t), \ldots$ are $-, +, \ldots$, and for $x_i < t < x_{i+1}$ the signs are $+, +, \ldots$. That is, one variation in sign between g(t) and $g_0(t)$ is lost as t passes $x_i$. This same result is obtained if we suppose that $b(x_i) < 0$.
If $x_i$ is not a root of any polynomial in (2) except g(x), then each term of the abbreviated sequence
$$g_0(t),\ g_1(t),\ \ldots,\ g_k(t)$$
has the same sign throughout the interval $x_{i-1} < t < x_{i+1}$. In this case, the complete sequence
$$g(t),\ g_0(t),\ g_1(t),\ \ldots,\ g_k(t)$$
has exactly one less variation in sign when $x_i < t < x_{i+1}$ than when $x_{i-1} < t < x_i$. If $x_i$ is a root of some polynomial in (2) other than g(x), then $x_i$ must be a root of one of the polynomials $g_1(x), g_2(x), \ldots, g_{k-1}(x)$, since $g_0(x_i) \neq 0$ and $g_k(x_i) = 1 \neq 0$. It is now possible to use the result of (ii) applied to the sequence
$$g_0(t),\ g_1(t),\ g_2(t),\ \ldots,\ g_{k-1}(t),\ g_k(t).$$
That is, since $g_0(x_i) \neq 0$ and $g_k(x_i) \neq 0$, the number of variations in sign in $g_0(t), g_1(t), g_2(t), \ldots, g_{k-1}(t), g_k(t)$ is the same for $x_{i-1} < t < x_i$ as for $x_i < t < x_{i+1}$. Therefore, in every case, the value of N(t) is exactly one less in the interval $x_i < t < x_{i+1}$ than in the interval $x_{i-1} < t < x_i$. This completes the proof of (iii).
Combining the results (i), (ii), and (iii), we have proved that the only change which occurs in the value of N(t) for $c \leq t \leq d$ is that N(t) is diminished by 1 at each root of g(x) in the given interval. Therefore, the number of roots of g(x) [which is the number of distinct real roots of the polynomial f(x)] between c and d is N(c) - N(d).
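Sturm's theorem can be turned into a root-counting routine. The sketch below is an illustration under stated assumptions: polynomials have rational coefficients and are stored as lists with the constant term first, f has degree at least one, and the function names (`sturm_sequence`, `count_real_roots`, and the helpers) are chosen for this example. Each new term of the sequence is the negative of a remainder, which matches the relations $s_1(x) = q_1(x)f'(x) - f(x)$, $s_2(x) = q_2(x)s_1(x) - f'(x)$, and so on.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients of highest degree, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def derivative(p):
    return trim([k * c for k, c in enumerate(p)][1:] or [Fraction(0)])

def remainder(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a = trim(list(a))
    while len(a) >= len(b) and a != [0]:
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for k, c in enumerate(b):
            a[shift + k] -= factor * c        # cancels the leading term of a
        a = trim(a[:-1] or [Fraction(0)])
    return a

def sturm_sequence(f):
    """f, f', s_1, ..., s_k, where each s_j is minus a remainder."""
    f = trim([Fraction(c) for c in f])
    seq = [f, derivative(f)]
    while True:
        r = remainder(seq[-2], seq[-1])
        if r == [0]:
            return seq
        seq.append([-c for c in r])

def evaluate(p, t):
    value = Fraction(0)
    for c in reversed(p):                     # Horner's rule
        value = value * t + c
    return value

def variations(values):
    """Number of variations in sign, ignoring zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def count_real_roots(f, c, d):
    """N(c) - N(d); assumes c < d, f(c) != 0, and f(d) != 0."""
    seq = sturm_sequence(f)
    def N(t):
        return variations([evaluate(p, t) for p in seq])
    return N(c) - N(d)

f = [1, -3, 0, 1]                             # x^3 - 3x + 1
print(count_real_roots(f, -2, 2), count_real_roots(f, 0, 1))
```

For a polynomial with a repeated root, such as $(x - 1)^2(x + 1)$, the count is of distinct roots, as the theorem states.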
APPENDIX 2
[opening of Appendix 2 not recoverable]
In fact, it is easily seen that
$$\operatorname{Deg}_{x_i}[a(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_r)] = \operatorname{Deg}_{x_j}[a(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_r)]$$
for any $a(x_1, x_2, \ldots, x_r)$