
AN INTRODUCTION TO ANALYSIS

Robert C. Gunning

© Robert C. Gunning


Contents

1 Algebraic Fundamentals 1
1.1 Sets and Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Groups, Rings and Fields . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2 Topological Fundamentals 39
2.1 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3 Mappings 63
3.1 Continuous Mappings . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Differentiable Mappings . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Analytic Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . 87



Chapter 1

Algebraic Fundamentals

1.1 Sets and Numbers

Perhaps the basic concept dealt with in mathematics is that of a set, an unambiguously determined collection of mathematical entities. A set can be specified by listing its contents, as for instance the set A = {w, x, y, z}, or by describing its contents in any other way so long as the members of the set are fully determined. There is a generally accepted standard notation dealing with sets, and students should learn it and use it freely. That a belongs to or is a member of the set A is indicated by writing a ∈ A; that b does not belong to the set A is indicated by writing b ∉ A. A set A is a subset of a set B if whenever a ∈ A then also a ∈ B; the condition that A is a subset of B is indicated by writing A ⊂ B or B ⊃ A. With the definition adopted here, A ⊂ B includes the case that A = B as well as the case that there are some b ∈ B for which b ∉ A; in the latter case the inclusion A ⊂ B is said to be a proper inclusion and is denoted by A ⊊ B or B ⊋ A. There is some variation in the usage here. Sets A and B are said to be equal, denoted by A = B or B = A, if they contain exactly the same elements, or equivalently if both A ⊂ B and B ⊂ A.

Among all sets the strangest is no doubt the empty set, the set that contains nothing at all, traditionally denoted by ∅, so the set ∅ = { }. In particular ∅ ⊂ A for any set A, for since there is nothing in the empty set ∅ the condition that anything contained in ∅ is also contained in A holds vacuously. This set should be approached with some caution, since it can readily sneak up unexpectedly as an exception or counterexample to some mathematical statement if that statement is not carefully phrased.

The intersection of sets A and B, denoted by A ∩ B, consists of those elements that belong to both A and B; the union of sets A and B, denoted by A ∪ B, consists of those elements that belong to either A or B or both. More generally, for any collection of sets {Aα}, or equivalently for any set consisting of the sets Aα, the intersection and union of this collection of sets are defined by

(1.1)    ⋂α Aα = { a : a ∈ Aα for all Aα }   and   ⋃α Aα = { a : a ∈ Aα for some Aα }.

[Figure 1.1: Venn diagrams illustrating the sets A ∩ B, A ∪ B, A ∆ B respectively.]
Two sets A and B are disjoint if A ∩ B = ∅. The intersection and union operations on sets are related by

(1.2)    A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

The difference between two sets A and B, denoted by A ∼ B, consists of those a ∈ A such that a ∉ B, whether B is a subset of A or not; for clarity ∅ ∼ ∅ = ∅. Obviously the order in which the two sets are listed is critical in this case; but there is also the symmetric difference between these two sets, defined by

(1.3)    A ∆ B = B ∆ A = (A ∼ B) ∪ (B ∼ A) = (A ∪ B) ∼ (A ∩ B).

It is clear from the definitions that

(1.4)    A ∼ (B ∪ C) = (A ∼ B) ∩ (A ∼ C),    A ∼ (B ∩ C) = (A ∼ B) ∪ (A ∼ C).

It is worth writing out the proofs of (1.4) in detail if these equations do not seem evident. If A, B, C, as well as the intersection and union, are all viewed as subsets of a set E, then a difference such as E ∼ A often is denoted just by ∼ A and is called the complement of the subset A; that the sets in question are subsets of E is understood in the discussion of these subsets, since it might not be altogether clear from the definition of the difference. With this understanding (1.4) takes the form

(1.5)    ∼ (A ∪ B) = (∼ A) ∩ (∼ B),    ∼ (A ∩ B) = (∼ A) ∪ (∼ B).

The difference and symmetric difference, as well as the intersection and union, can be illustrated by Venn diagrams, as in the accompanying Figure 1.1.
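The identities (1.2) through (1.5) can be checked mechanically on small finite examples; the following sketch uses Python's built-in set type, with arbitrary choices of the sets E, A, B, C (Python writes A & B, A | B, A - B, and A ^ B for A ∩ B, A ∪ B, A ∼ B, and A ∆ B).

```python
E = set(range(10))   # an ambient set, so that complements make sense
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7}

# (1.2) the distributive laws relating intersection and union
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# (1.3) the two descriptions of the symmetric difference agree
assert (A - B) | (B - A) == (A | B) - (A & B) == A ^ B

# (1.4) differences of unions and intersections
assert A - (B | C) == (A - B) & (A - C)
assert A - (B & C) == (A - B) | (A - C)

# (1.5) De Morgan's laws for complements taken in E
assert E - (A | B) == (E - A) & (E - B)
assert E - (A & B) == (E - A) | (E - B)
```

Trying other choices of A, B, C of course proves nothing, but failures would immediately expose a misremembered identity.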

A mapping f : A −→ B from a set A to a set B associates to each a ∈ A its image f(a) ∈ B; the set A often is called the domain of the mapping and the set B the range of the mapping. The mapping f : A −→ B is said to be injective if f(a1) ≠ f(a2) whenever a1 ≠ a2; it is said to be surjective if for every b ∈ B there is a ∈ A such that b = f(a); and it is said to be bijective if it is both injective and surjective.

A mapping f : A −→ B is bijective if and only if there is a mapping g : B −→ A such that g(f(a)) = a for every a ∈ A and f(g(b)) = b for every b ∈ B. Indeed if f is bijective then it is surjective, so each point b ∈ B is the image b = f(a) of a point a ∈ A; and f is also injective, so the point a is uniquely determined by b and consequently can be viewed as the image a = g(b) of a well-defined mapping g : B −→ A, from which it follows immediately that g(f(a)) = a for every a ∈ A and f(g(b)) = b for every b ∈ B. Conversely if there is a mapping g : B −→ A such that g(f(a)) = a for every a ∈ A and f(g(b)) = b for every b ∈ B then the mapping f is clearly both injective and surjective, so it is bijective. The mapping g is called the inverse mapping to f; it is usually denoted by g = f⁻¹, and it is a bijective mapping from B to A.

For any mapping f : A −→ B, not necessarily bijective or injective or surjective, there can be associated to any subset X ⊂ A its image f(X) ⊂ B, defined by

(1.6)    f(X) = { f(a) ∈ B : a ∈ X },

and to any subset Y ⊂ B its inverse image f⁻¹(Y) ⊂ A, defined by

(1.7)    f⁻¹(Y) = { a ∈ A : f(a) ∈ Y }.

The image f(X) may or may not coincide with the range of the mapping f. If f : A −→ B is injective then clearly it determines a bijective mapping from any subset X ⊂ A to its image f(X) ⊂ B, so in a sense it describes an injection of the set X into B. If f : A −→ B is bijective the inverse image f⁻¹(Y) of a subset Y ⊂ B is just the image of that subset Y under the inverse mapping f⁻¹; it should be kept clearly in mind though that f⁻¹(Y) is well defined even for mappings f : A −→ B that are not bijective, including mappings for which the inverse mapping f⁻¹ is not even defined. For any mapping f : A −→ B and any subsets X1, X2 ⊂ A and Y1, Y2 ⊂ B it is fairly easy to see that

(1.8)    f(X1 ∪ X2) = f(X1) ∪ f(X2),    f(X1 ∩ X2) ⊂ f(X1) ∩ f(X2),
         f⁻¹(Y1 ∪ Y2) = f⁻¹(Y1) ∪ f⁻¹(Y2),    f⁻¹(Y1 ∩ Y2) = f⁻¹(Y1) ∩ f⁻¹(Y2),

where the inclusion f(X1 ∩ X2) ⊂ f(X1) ∩ f(X2) may be a proper inclusion; this is an instance in which the inverse images of sets are better behaved than the images of sets. To any mappings f : A −→ B and g : B −→ C there can be associated the composite mapping g ◦ f : A −→ C defined by

(1.9)    (g ◦ f)(a) = g(f(a)) for all a ∈ A.

The order in which the composite is written should be kept carefully in mind: g ◦ f is the operation that results from first applying the mapping f and then the mapping g.
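The asymmetry in (1.8) is easy to see concretely: a sketch in Python with a non-injective mapping f, given as a dictionary (the particular sets and mapping are arbitrary illustrative choices).

```python
A = {1, 2, 3}
f = {1: 0, 2: 1, 3: 0}   # a non-injective mapping into {0, 1}: f(1) = f(3) = 0

def image(X):
    # f(X) as in (1.6)
    return {f[x] for x in X}

def inverse_image(Y):
    # f^{-1}(Y) as in (1.7)
    return {x for x in f if f[x] in Y}

X1, X2 = {1, 2}, {2, 3}
# images preserve unions ...
assert image(X1 | X2) == image(X1) | image(X2)
# ... but for intersections only an inclusion holds, and here it is proper:
assert image(X1 & X2) < image(X1) & image(X2)   # {1} is a proper subset of {0, 1}

# inverse images preserve both unions and intersections
Y1, Y2 = {0}, {0, 1}
assert inverse_image(Y1 | Y2) == inverse_image(Y1) | inverse_image(Y2)
assert inverse_image(Y1 & Y2) == inverse_image(Y1) & inverse_image(Y2)
```

The failure of equality for images comes exactly from the failure of injectivity: the points 1 and 3 are collapsed to the same image point 0.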

To any set A there can be associated other sets; for example, the set consisting of all subsets of A, sometimes called the power set of A and denoted by P(A). The power set P(∅) of the empty set is a set that consists of a single element, since the only subset of the empty set is the empty set itself; the power set of the set {a} consisting of a single point has two elements, P({a}) = {{a}, ∅}. To any two sets A, B there can be associated the set

(1.10)    B^A = { f : A −→ B }

consisting of all mappings f : A −→ B. If A = {a} is a single point the set B^{a} can be identified with the set B, since a mapping f : {a} −→ B is determined fully by its image f(a) ∈ B and the image can be any point in B; on the other hand the set {a}^B can be identified with the point a, since the only mapping f : B −→ {a} is the mapping for which f(b) = a for all b ∈ B. If F2 = {0, 1} is the set consisting of the two points 0, 1, then to each mapping f : A −→ F2 there can be associated the subset f⁻¹(1) ⊂ A, and conversely to each subset E ⊂ A there can be associated the characteristic function χE of that set, the mapping χE : A −→ F2 defined by

(1.11)    χE(a) = 1 if a ∈ E,    χE(a) = 0 if a ∉ E.

This establishes a bijective mapping between the set (F2)^A and the power set P(A), the identification

(1.12)    P(A) = (F2)^A;

sometimes the power set P(A) of a set A is denoted just by P(A) = 2^A.

Yet another way of associating to a set A another set is through an equivalence relation on the set A, a relation between some pairs of elements a ∈ A which is denoted by a1 ≈ a2 and is characterized by the following properties: (i) reflexivity: a ≈ a; (ii) symmetry: if a1 ≈ a2 then a2 ≈ a1; and (iii) transitivity: if a1 ≈ a2 and a2 ≈ a3 then a1 ≈ a3. If ≈ is an equivalence relation on A then to any a ∈ A there can be associated the set of all x ∈ A that are equivalent to a, the subset

Aa = { x ∈ A : x ≈ a } ⊂ A,

which by reflexivity includes in particular a itself. Similarly to any b ∈ A there can be associated the set Ab of all y ∈ A that are equivalent to b. If c ∈ Aa ∩ Ab for an element c ∈ A then c ≈ a and c ≈ b, so by symmetry and transitivity a ≈ b; then by symmetry and transitivity again whenever x ∈ Ab then x ≈ b ≈ a, so x ≈ a and consequently x ∈ Aa, hence Ab ⊂ Aa; and correspondingly Aa ⊂ Ab, so that actually Ab = Aa. Thus the set A is naturally decomposed into a collection of disjoint equivalence classes.
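The identification (1.12) can be made completely explicit for a small finite set: a sketch in Python that enumerates every mapping A −→ F2 and pairs it with the subset f⁻¹(1) (the three-element set A is an arbitrary choice).

```python
from itertools import product

A = ['a', 'b', 'c']

# every mapping f: A -> {0, 1}, encoded as a dictionary of values
functions = [dict(zip(A, values)) for values in product([0, 1], repeat=len(A))]

# associate to each f the subset f^{-1}(1) of A
subsets = {frozenset(x for x in A if f[x] == 1) for f in functions}

# the correspondence is bijective: 2^#(A) functions, 2^#(A) distinct subsets
assert len(functions) == 2 ** len(A) == len(subsets)

# and the characteristic function of f^{-1}(1) recovers f: the round trip
# is the identity
for f in functions:
    E = {x for x in A if f[x] == 1}
    chi_E = {x: (1 if x in E else 0) for x in A}
    assert chi_E == f
```

This is also the reason for the notation 2^A: a set with n elements has exactly 2^n subsets.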

The set consisting of these various equivalence classes is another set, called the quotient of the set A under this equivalence relation and denoted by A/≈. This is a construction that arises remarkably frequently in mathematics.

The notion of equivalence also is used in a slightly different way, as a relation among sets rather than as a relation between the elements of a particular set; that is, it is not phrased as an equivalence relation between subsets of a given set but rather as an equivalence relation among various sets, to avoid any involvement with the paradoxical notion of the set of all sets. For instance the equality A = B of two sets clearly satisfies the three conditions for an equivalence relation. As another example, two sets A, B are said to be equivalent if there is a bijective mapping f : A −→ B, which is indicated by writing A ≃ B; this notion also clearly satisfies the three conditions for an equivalence relation among sets. Equivalent sets intuitively are those with the same “number of elements”, but what that really means requires some discussion, particularly for sets that are not finite. One approach is merely to use the equivalence relation itself as a proxy for the “number of elements” in a set; that is, to define the cardinality of a set A as the equivalence class of that set,

(1.13)    #(A) = { X : X is a set for which X ≃ A }.

In particular this can be used for the comparison of the sizes of sets, perhaps the basic use of the notion of the “number of elements” in a set, by setting

(1.14)    #(A) ≤ #(B) when there is an injective mapping f : A −→ B.

This concept is well defined: if A′ ≃ A, so that there is a bijective mapping g : A −→ A′, if B′ ≃ B, so that there is a bijective mapping h : B −→ B′, and if there is an injective mapping f : A −→ B, then f′ = h ◦ f ◦ g⁻¹ : A′ −→ B′ is an injective mapping, so #(A′) ≤ #(B′). As far as the notation is concerned, it is customary to consider #(B) ≥ #(A) as an alternative way of writing #(A) ≤ #(B), and to write #(A) < #(B) or #(B) > #(A) to indicate that #(A) ≤ #(B) and #(A) ≠ #(B).

The relation (1.14) is a partial order relation, a relation having the following properties: (i) reflexivity: #(A) ≤ #(A); (ii) transitivity: if #(A) ≤ #(B) and #(B) ≤ #(C) then #(A) ≤ #(C); and (iii) identity: if #(A) ≤ #(B) and #(B) ≤ #(A) then #(A) = #(B). Reflexivity is clear, since the identity mapping ι : A −→ A that associates to any a ∈ A the same element ι(a) = a ∈ A is clearly injective; and transitivity is also clear, since if f : A −→ B and g : B −→ C are injective mappings then the composition g ◦ f : A −→ C is also injective. Identity is not so clear though; but it does hold, as a consequence of the following theorem.

Theorem 1.1 (Cantor–Bernstein Theorem) If there are injective mappings f : A −→ B and g : B −→ A between two sets A and B then there is a bijective mapping h : A −→ B.

Proof: Introduce the auxiliary set

C = (A ∼ g(B)) ∪ ⋃n∈N (g ◦ f)ⁿ(A ∼ g(B)) ⊂ A,

where g ◦ f : A −→ A is the mapping that takes a point a ∈ A to the point (g ◦ f)(a) = g(f(a)) ∈ A. In terms of this set introduce the mapping h : A −→ B defined by

h(x) = f(x) if x ∈ C ⊂ A,    h(x) = g⁻¹(x) if x ∈ (A ∼ C) ⊂ A,

where g⁻¹(x) is well defined on the subset (A ∼ C) ⊂ g(B) since g is injective; indeed since A ∼ g(B) ⊂ C it follows that A ∼ C ⊂ g(B). The theorem will be demonstrated by showing that the mapping h : A −→ B just defined is bijective.

To demonstrate first that h is injective, consider any two distinct points x1, x2 ∈ A and suppose to the contrary that h(x1) = h(x2). If x1, x2 ∈ C then by definition h(x1) = f(x1) and h(x2) = f(x2), so f(x1) = f(x2), a contradiction since f is injective. If x1, x2 ∈ A ∼ C ⊂ g(B) then x1 = g(y1) and x2 = g(y2) for uniquely determined distinct points y1, y2 ∈ B, since g is injective; and by definition h(x1) = y1 and h(x2) = y2, so y1 = y2, a contradiction. If x1 ∈ C and x2 ∈ A ∼ C then by definition h(x1) = f(x1) and h(x2) = y2, where y2 ∈ B is the uniquely determined point for which g(y2) = x2. Since h(x1) = h(x2) it follows that f(x1) = y2, hence x2 = g(y2) = (g ◦ f)(x1). Since x1 ∈ C, by the definition of the set C either x1 ∈ A ∼ g(B) or there is some n ∈ N for which x1 ∈ (g ◦ f)ⁿ(A ∼ g(B)); in either case x2 = (g ◦ f)(x1) ∈ (g ◦ f)ᵏ(A ∼ g(B)) ⊂ C for some k, a contradiction since x2 ∈ A ∼ C. That shows that h is injective.

To demonstrate next that h is surjective, consider a point y ∈ B and let x = g(y) ∈ A. If x ∈ A ∼ C then by definition h(x) = g⁻¹(x) = y. If x ∈ C it cannot be the case that x ∈ A ∼ g(B), since x = g(y) ∈ g(B); so it must be the case that there is some n ∈ N for which x ∈ (g ◦ f)ⁿ(A ∼ g(B)), consequently x = (g ◦ f)(x1) for some point x1 ∈ C. Since g(f(x1)) = x = g(y) and g is injective, it must be the case that y = f(x1); and since x1 ∈ C, by definition h(x1) = f(x1) = y. That shows that h is surjective and thereby concludes the proof.

For finite sets it is possible to associate to the cardinality of a set a much more familiar entity, by introducing formally the set of the natural numbers, defined as the set N that satisfies the following Peano axioms: (i) origin: there is a specified element 1 ∈ N; (ii) succession: to any a ∈ N there is associated an a′ ∈ N distinct from a, called the successor to a, where a′ = b′ if and only if a = b, but 1 is not the successor to anything in N; and (iii) induction: if E ⊂ N is any subset such that 1 ∈ E and that a′ ∈ E whenever a ∈ E, then E = N.

The successor to 1 is often denoted by 2 = 1′, the successor to 2 by 3 = 2′ = (1′)′, and so on; induction is just a precise way of saying that the set N originates from 1 in this way. An immediate consequence of the axiom of induction is that every element a ∈ N except 1 is the successor of another element of N; indeed if

E = {1} ∪ { x ∈ N : x = y′ for some y ∈ N },

it is clear that 1 ∈ E and that if x ∈ E then x′ ∈ E, so by the induction axiom E = N.

Induction is also key to the technique of proof by mathematical induction: if T(n) is a mathematical statement depending on the natural number n ∈ N, and if T(1) is true and T(n′) is true whenever T(n) is true, then T(n) is true for all n ∈ N. Indeed if E is the set of those n ∈ N for which T(n) is true, then by hypothesis 1 ∈ E and n′ ∈ E whenever n ∈ E, so by the induction axiom E = N. A number of examples of the application of this method of proof will occur in the subsequent discussion.

In terms of the symbol |, called a stroke, identify the equivalence class #(A) of the set A = { | } with the natural number 1 ∈ N, and the equivalence class #(B) of the set B = { | | } with the natural number 1′ = 2 ∈ N; and if the equivalence class #(C) of a set C = { | | | . . . | } is identified with the natural number n ∈ N, identify the equivalence class #(D) of the set D that arises from C by adding an additional stroke with the natural number n′ ∈ N. All the natural numbers are associated to equivalence classes of sets in this way; for if S ⊂ N is the subset of natural numbers associated to sets in this way then 1 ∈ S and if n ∈ S then n′ ∈ S, so by the induction axiom S = N. The sets that are associated to the natural numbers in this way are called the finite sets, and in this way the cardinality of a finite set is identified with a natural number.

The partial order relation (1.14) then provides a partial order relation among the natural numbers. In this case though it is actually an order relation, a partial order relation with the additional property that for any two natural numbers m, n either m < n or m > n or m = n, and only one of these relations is possible; thus an order relation is a partial ordering on a set with the property that any two elements of the set are related to one another in this way. Indeed if m = #(A) and n = #(B) where A and B are two collections of strokes, then upon setting up a mapping from A to B that takes the first stroke in A to the first stroke in B, the second stroke in A to the second stroke in B, and so on, eventually either there will be no strokes left in A but some strokes remaining in B, in which case #(A) < #(B); or there will be strokes remaining in A but no strokes left in B, in which case #(A) > #(B); or all strokes in both A and B will be exhausted, in which case #(A) = #(B). Actually the partial order relation (1.14) is an order relation in general, even for infinite sets, but the proof requires the use of the axiom of choice, a complication that will not be discussed further here¹.

¹The ordering of sets really leads to a different concept, that of an ordinal number rather than a cardinal number.
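The proof of Theorem 1.1 is constructive, and the construction can be carried out concretely. A sketch in Python for A = B = N, with the illustrative choice of injections f(a) = 2a and g(b) = 2b + 1 (neither of which is surjective): here A ∼ g(B) is the set of even numbers, and membership in C is decided by pulling back along g and f until the chain either stops in A ∼ g(B) or leaves the image of f.

```python
def in_C(x):
    # x is in C exactly when repeatedly undoing g then f eventually lands
    # in A ~ g(B); with g(b) = 2b + 1, the set A ~ g(B) is the evens.
    if x % 2 == 0:
        return True
    y = (x - 1) // 2        # x = g(y)
    if y % 2 == 1:          # y is not in the image of f (f(a) = 2a is even)
        return False
    return in_C(y // 2)     # y = f(a) with a = y // 2

def h(x):
    # the bijection of the proof: f on C, g^{-1} off C
    return 2 * x if in_C(x) else (x - 1) // 2

# h is injective on an initial segment ...
values = [h(x) for x in range(1000)]
assert len(set(values)) == len(values)

# ... and surjective: every b has an explicit preimage
for b in range(500):
    x = 2 * b + 1                         # the candidate g(b)
    pre = b // 2 if in_C(x) else x        # either b/2 in C, or g(b) off C
    assert h(pre) == b
```

For genuinely finite sets the construction degenerates, since injections in both directions already force the two sets to be bijective; the infinite example above is where the layered set C actually does some work.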

A set A is said to be countable² if there is an injective mapping f : A −→ N, so that #(A) ≤ #(N); and it is said to be countably infinite if there is a bijective mapping f : A −→ N, so that #(A) = #(N). Thus a finite set is considered to be a countable set. The set N itself is an infinite set, as is apparent from the observation that to any collection { | | . . . | } of strokes it is always possible to add another stroke, so that no finite set of strokes exhausts all sets of strokes. The cardinality of the set N is usually denoted by #(N) = ℵ0; thus #(A) = ℵ0 for any countably infinite set A, but #(E) < #(N) = ℵ0 for any finite subset E ⊂ N. In that sense N is the smallest infinite set, or equivalently #(A) ≥ ℵ0 for any infinite set A. That a set S is infinite is indicated by writing #(S) = ∞, as a slight abuse of notation, since that does not mean that ∞ is the cardinal number of the set S but just that S is not a finite set.

Actually for any infinite subset E ⊂ N it is also the case that #(E) = #(N) = ℵ0, even if E is a proper subset of N. Indeed each element of E is in particular a natural number n, so when these natural numbers are arranged in increasing order n1 < n2 < n3 < · · · then the mapping E −→ N that associates to ni ∈ E the natural number i ∈ N is a bijective mapping, and consequently #(E) = #(N). It may be rather counterintuitive that any infinite proper subset of N has the same cardinality, or the same number of elements, as N itself; if A ⊂ B is a proper subset and both A and B are finite then #(A) < #(B), but that is not necessarily the case for infinite sets.

On the other hand there are infinite sets that are strictly larger than N; for instance

(1.15)    #((F2)^A) > #(A) for any set A.

Indeed if f : A −→ (F2)^A is assumed to be a bijective mapping from a set A to the set (F2)^A of all subsets of A, then in particular the subset

E = { x ∈ A : x ∉ f(x) }

is a well defined set, so it must be the image E = f(a) of some a ∈ A. However if a ∈ E then from the definition of the set E it follows that a ∉ f(a) = E, while if a ∉ E = f(a) then from the definition of the set E it also follows that a ∈ E; this contradictory situation shows that there cannot be a bijective mapping f : A −→ (F2)^A. In particular #(N) < #((F2)^N), so the set of all subsets of the set N of natural numbers is not a countable set; and the set of all subsets of that set is properly larger still, so there is no end to the size of possible sets. Nonetheless there are many sets that appear to be considerably larger than N but are still countable.

Theorem 1.2 (Cantor’s Diagonalization Theorem) If to each natural number n ∈ N there is associated a countable set En, the union E = ⋃n∈N En is a countable set.

Proof: Since each set En is countable there is an injective mapping N −→ En that associates to each i ∈ N an element xn,i ∈ En. The elements of the union E then can be arranged in an array

x1,1  x1,2  x1,3  x1,4  · · ·
x2,1  x2,2  x2,3  x2,4  · · ·
x3,1  x3,2  x3,3  x3,4  · · ·
 · · ·

where row n consists of all the elements xn,i ordered by the natural number i ∈ N; if row n is finite, the places that would be filled by some terms xn,i are blank, so are just ignored in this ordering. Imagine these elements now rearranged in order by starting with x1,1, then proceeding along the increasing diagonal with x2,1, x1,2, then in the next increasing diagonal x3,1, x2,2, x1,3, and so on. When all of the elements in E are written out in order as x1,1, x2,1, x1,2, x3,1, x2,2, x1,3, · · · there is the obvious bijection N −→ E, showing that E is countable.

Although it was already shown that the set of all subsets of N is not countable, nonetheless it follows from Cantor’s Diagonalization Theorem that the set E of all finite subsets of N is countable. Indeed the first step is to show by induction that for any natural number n ∈ N the set En of all subsets A ⊂ N for which #(A) = n is countable. That is clearly the case for n = 1; and if En is countable, then since every set in En+1 is of the form {i} ∪ A for some i ∈ N and some A ∈ En, it follows from Cantor’s Diagonalization Theorem and the observation that a subset of a countable set is at most countable that En+1 is countable. Then since all the sets En are countable, it follows again from Cantor’s Diagonalization Theorem that the set E = ⋃n∈N En is countable.

²This terminology is not universally accepted; it is not uncommon to use “at most countable” in place of countable and “countable” in place of countably infinite, so some caution is necessary when comparing discussions of these topics. This is discussed in various books on set theory, such as the classical treatment by Kamke, The Theory of Sets.
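The diagonal ordering used in the proof of Theorem 1.2 is easy to generate explicitly: a sketch in Python that enumerates the index pairs (n, i) diagonal by diagonal, in the order x1,1, x2,1, x1,2, x3,1, . . . read off along each diagonal n + i = constant.

```python
from itertools import islice

def diagonal_pairs():
    # enumerate the pairs (n, i) with n, i >= 1 along the diagonals
    # n + i = s, for s = 2, 3, 4, ...
    s = 2
    while True:
        for n in range(s - 1, 0, -1):   # n decreasing: (s-1,1), ..., (1,s-1)
            yield (n, s - n)
        s += 1

pairs = list(islice(diagonal_pairs(), 5000))

# the first terms match the ordering in the proof: x11, x21, x12, x31, ...
assert pairs[:6] == [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]

# every pair is visited exactly once, so the enumeration is injective ...
assert len(set(pairs)) == len(pairs)
# ... and any fixed pair is eventually reached, e.g. the (40, 40) entry
assert (40, 40) in set(pairs)
```

Since every pair (n, i) appears exactly once in this list, composing with the mappings i ↦ xn,i gives the surjection of N onto the union E that the proof requires.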

1.2 Groups, Rings and Fields

To any two natural numbers a, b ∈ N there can be associated their sum a + b ∈ N and their product a · b ∈ N: if a = #(A) and b = #(B) for disjoint finite sets A and B, the sum is defined by a + b = #(A ∪ B) and the product is defined by a · b = #(A × B), where A × B is the Cartesian product of the two sets A and B, the set defined by

(1.16)    A × B = { (x, y) : x ∈ A, y ∈ B }.

These algebraic operations satisfy the following laws: (i) the associative law of addition: a + (b + c) = (a + b) + c; (ii) the commutative law of addition: a + b = b + a; (iii) the associative law of multiplication: a · (b · c) = (a · b) · c; (iv) the commutative law of multiplication: a · b = b · a; and (v) the distributive law relating addition and multiplication: (a + b) · c = a · c + b · c.

To verify these laws, suppose that a = #(A), b = #(B), and c = #(C) for disjoint finite sets A, B, C. Then b + c = #(B ∪ C), so a + (b + c) = #(A ∪ (B ∪ C)) = #(A ∪ B ∪ C), while a + b = #(A ∪ B), so (a + b) + c = #((A ∪ B) ∪ C) = #(A ∪ B ∪ C), showing that a + (b + c) = (a + b) + c. Moreover a + b = #(A ∪ B) = #(B ∪ A) = b + a. For multiplication, a · b = #(A × B) and b · a = #(B × A); the mapping that sends (x, y) ∈ A × B to (y, x) ∈ B × A is a bijective mapping, so #(A × B) = #(B × A) and consequently a · b = b · a. Then b · c = #(B × C), so a · (b · c) = #(A × (B × C)), where A × (B × C) consists of the points (a, (b, c)), and correspondingly (a · b) · c = #((A × B) × C), where (A × B) × C consists of the points ((a, b), c); the mapping that sends (a, (b, c)) to ((a, b), c) is a bijective mapping, and consequently a · (b · c) = (a · b) · c. Finally (a + b) · c = #((A ∪ B) × C), where (A ∪ B) × C = (A × C) ∪ (B × C), so #((A ∪ B) × C) = #((A × C) ∪ (B × C)) = a · c + b · c, and consequently (a + b) · c = a · c + b · c.

The successor to a natural number a = #(A), where A is a finite collection of strokes, was identified with the natural number a′ = #(A′), where A′ is derived from A by adding another stroke; so A′ = A ∪ {|} and consequently a′ = #(A ∪ {|}) = a + 1. From this identification follows the cancellation law of the natural numbers, the result that a + n = b + n for natural numbers a, b, n ∈ N if and only if a = b. It is clear that a + n = b + n if a = b, and the converse can be established by induction on n: indeed a + 1 = b + 1 is equivalent to a′ = b′, which by succession implies that a = b; while if the converse holds for some n ∈ N and if a + n + 1 = b + n + 1, then (a + n)′ = (b + n)′, so by succession a + n = b + n and then by induction a = b.

The order relation a ≤ b, where a = #(A) and b = #(B) for sets A and B consisting of collections of strokes, indicates that there is an injective mapping f : A −→ B; hence the collection of strokes B can be viewed as the collection of strokes bijective to A together with some additional strokes C, so B = A ∪ C and therefore #(B) = #(A) + #(C). Consequently the order relation a ≤ b amounts to the condition that b = a + c for another natural number c.
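The definitions of sum and product through disjoint unions and Cartesian products can be tested directly on small finite sets; a sketch in Python (the disjoint sets A, B, C are arbitrary examples, and len plays the role of #).

```python
from itertools import product

A = {'a1', 'a2', 'a3'}          # #(A) = 3
B = {'b1', 'b2'}                # #(B) = 2, disjoint from A
C = {'c1', 'c2', 'c3', 'c4'}    # #(C) = 4, disjoint from A and B

# a + b = #(A ∪ B) for disjoint A, B; a·b = #(A × B)
assert len(A | B) == len(A) + len(B)                  # 3 + 2 = 5
assert len(set(product(A, B))) == len(A) * len(B)     # 3 · 2 = 6

# the distributive law via the identity (A ∪ B) × C = (A × C) ∪ (B × C)
assert set(product(A | B, C)) == set(product(A, C)) | set(product(B, C))

# commutativity of the product via the bijection (x, y) -> (y, x)
assert {(y, x) for (x, y) in product(A, B)} == set(product(B, A))
```

The assertions mirror the verification in the text: each arithmetic law reduces to an identity of sets or to an explicit bijection.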

The algebraic operations on the set N of natural numbers do not form a naturally complete mathematical structure; but N can be extended to larger and more algebraically complete sets, and at the same time N is in a sense a basic component and model for these further algebraic structures. It may be clearest first to describe these structures more abstractly.

A group is defined to be a set G with a specified element 1 ∈ G and with an operation that associates to any elements a, b ∈ G another element a · b ∈ G, and that satisfies (i) the associative law: (a · b) · c = a · (b · c) for any a, b, c ∈ G; (ii) the identity law: 1 · a = a · 1 = a for any a ∈ G; and (iii) the inverse law: to each a ∈ G there is associated a unique a⁻¹ ∈ G such that a · a⁻¹ = a⁻¹ · a = 1. The element 1 ∈ G is called the identity in the group, and the element a⁻¹ ∈ G is called the inverse of the element a ∈ G. A group G is said to be an abelian group, or equivalently a commutative group, if in addition it satisfies (iv) the commutative law: a · b = b · a for all a, b ∈ G.

It is customary to simplify the notation by writing ab in place of a · b, and by dropping parentheses when the associative law indicates that they are not needed to specify the result of the operation uniquely, so by writing abc in place of (ab)c. Some care must be taken, though, since there are expressions in which parentheses are necessary: when a second operation + is also present, for instance, a · b + c is not well defined, since it could stand for either a · (b + c) or (a · b) + c and these can be quite different.

The cancellation law holds in any group: if a · b = a · c for some a, b, c ∈ G, then b = (a⁻¹ · a) · b = a⁻¹ · (a · b) = a⁻¹ · (a · c) = (a⁻¹ · a) · c = c.

While the multiplicative notation for the group operation is the most common in general, there are cases in which additive notation is used. In the additive notation the group operation associates to any elements a, b ∈ G another element a + b ∈ G; the associative law takes the form (a + b) + c = a + (b + c), the identity element is denoted by 0 and the identity law takes the form a + 0 = 0 + a = a, and the inverse law associates to each a ∈ G a unique element −a ∈ G such that a + (−a) = (−a) + a = 0.

The simplest group consists of just a single element 1, with the group operation 1 · 1 = 1. A more interesting example of a group is the symmetric group S(A) on a set A, the set of all bijective mappings f : A −→ A, where the product f · g of two bijections f and g is the composition f ◦ g; the identity element of S(A) is the identity mapping ι : A −→ A for which ι(a) = a for all a ∈ A, and the inverse of a mapping f is the usual inverse mapping f⁻¹. It is a straightforward matter to verify that S(A) does satisfy the group laws. For even quite simple sets A, though, such as the set consisting of three elements, the group S(A) is not abelian.

If G is a group and H ⊂ G is a subset such that 1 ∈ H, that a⁻¹ ∈ H whenever a ∈ H, and that a · b ∈ H whenever a, b ∈ H, then it is evident that H itself is a group, called a subgroup of G. A mapping f : G −→ H between two groups that satisfies f(a · b) = f(a) · f(b) for any a, b ∈ G is called a group homomorphism.
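The failure of commutativity in the symmetric group on three elements can be seen by composing two transpositions in both orders; a sketch in Python, with permutations of A = {0, 1, 2} encoded as dictionaries.

```python
f = {0: 1, 1: 0, 2: 2}    # the transposition swapping 0 and 1
g = {0: 0, 1: 2, 2: 1}    # the transposition swapping 1 and 2

def compose(p, q):
    # the group product p · q is the composition p ∘ q: (p ∘ q)(x) = p(q(x))
    return {x: p[q[x]] for x in q}

# the two orders of composition give different permutations, so S(A) is
# not abelian
assert compose(f, g) != compose(g, f)

# the group laws themselves do hold; e.g. composing with the inverse
# recovers the identity mapping
f_inv = {v: k for k, v in f.items()}
assert compose(f, f_inv) == {0: 0, 1: 1, 2: 2}
```

Here f ∘ g sends 0 ↦ 1 ↦ 2 ↦ 0 while g ∘ f sends 0 ↦ 2 ↦ 1 ↦ 0: the two three-cycles are distinct.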

A group homomorphism preserves other aspects of the group structure automatically: for if 1G denotes the identity in G and 1H denotes the identity in H, then since f(1G) = f(1G · 1G) = f(1G) · f(1G) it follows from cancellation in H that f(1G) = 1H; and since f(a) · f(a⁻¹) = f(a · a⁻¹) = f(1G) = 1H it follows that f(a⁻¹) = f(a)⁻¹. A group homomorphism f : G −→ H that is a bijective mapping is called a group isomorphism; the inverse mapping g = f⁻¹ then is clearly a group homomorphism g : H −→ G, and the groups G and H can be identified through these mappings.

Groups involve a single operation, but other structures involve two operations. The basic such structure is that of a ring³, defined as a set R with distinct specified elements 0, 1 ∈ R and with operations that associate to any elements a, b ∈ R their sum a + b ∈ R and their product a · b ∈ R, such that (i) R is an abelian group under the operation “+”, with the identity element 0 and with the inverse of a ∈ R denoted by −a; (ii) the operation “·” is associative and commutative with the identity element 1, so that (a · b) · c = a · (b · c) and a · b = b · a and 1 · a = a · 1 = a; and (iii) the distributive law a · (b + c) = (a · b) + (a · c) holds. The operation a + b is called addition and the operation a · b is called multiplication. If R is a ring and S ⊂ R is a subset such that 0, 1 ∈ S, that −a ∈ S whenever a ∈ S, and that a + b ∈ S and a · b ∈ S whenever a, b ∈ S, then it is evident that S itself is a ring, called a subring of R.

The additive identity 0 plays a special role in multiplication, since the distributive law implies that a · 0 = 0 for any a ∈ R: for if a ∈ R then a = a · 1 = a · (1 + 0) = a · 1 + a · 0 = a + a · 0, hence a · 0 = 0. Another consequence of the distributive law is that (−a) · b = −(a · b), for a · b + (−a) · b = (a + (−a)) · b = 0 · b = 0; and consequently (−a) · (−b) = a · b. For both rings and fields the notation is customarily simplified by writing ab in place of a · b, by dropping parentheses when the meaning is clear, so by writing a + b + c in place of (a + b) + c and abc in place of (ab)c, and by writing a − b in place of a + (−b).

A field is defined to be a ring F for which the set of nonzero elements forms a group under multiplication, one with the additional property that to any a ∈ F for which a ≠ 0 there is associated another element a⁻¹ ∈ F such that a · a⁻¹ = a⁻¹ · a = 1; thus a field is a special ring. The multiplicative subgroup of nonzero elements in the field F is denoted by F×.

A ring homomorphism f : R −→ S between two rings is a mapping such that (i) f(a + b) = f(a) + f(b) for all a, b ∈ R; (ii) f(a · b) = f(a) · f(b) for all a, b ∈ R; and (iii) f(1R) = 1S, where 1R is the multiplicative identity in the ring R and 1S is the multiplicative identity in the ring S. Since a ring R is in particular a group under addition it is automatically the case that f(0R) = 0S, where 0R is the additive identity in the ring R and 0S is the additive identity in the ring S; but a ring is not a group under multiplication, so it does not automatically follow that f(1R) = 1S, which is the reason for condition (iii) in the definition of a ring homomorphism. However for the special case of a field condition (iii) is fulfilled automatically, since the nonzero elements of a field form a group under multiplication.

³What is called a “ring” here is sometimes called a “commutative ring with an identity”, since there are more general structures that are also called rings, in which multiplication is not necessarily commutative and there does not necessarily exist an identity 1 for multiplication.

A ring homomorphism f : R → S that is a bijective mapping is called a ring isomorphism; the inverse mapping g = f⁻¹ then is clearly a ring isomorphism g : S → R, and the rings R and S can be identified through these mappings. The corresponding definitions hold for the special case of rings that are fields; however, for a field condition (iii) is fulfilled automatically, since the nonzero elements of a field form a group under multiplication. It is customary to drop the subscripts on the ring identity elements, so that 0 denotes the additive identity for any ring.

The natural numbers N can be extended to a ring by adjoining enough additional elements to make addition into a true group operation, which will automatically contain inverses, while maintaining the associative and commutative laws of multiplication and the distributive law relating addition and multiplication⁴. For this purpose consider the Cartesian product N × N consisting of all pairs (a, b) of natural numbers a, b ∈ N and define⁵ operations of addition and multiplication of elements of N × N by

(a, b) + (c, d) = (a + c, b + d)  and  (a, b) · (c, d) = (ac + bd, ad + bc).

It is a straightforward matter to verify that these two operations satisfy the same associative, commutative and distributive laws as the natural numbers, as a simple consequence of those laws for the natural numbers. The only slightly complicated cases are those of the associative law for multiplication and the distributive law, where a calculation leads to

((a, b) · (c, d)) · (e, f) = (ace + adf + bcf + bde, acf + ade + bce + bdf) = (a, b) · ((c, d) · (e, f))

and

((a, b) + (c, d)) · (e, f) = (ae + bf + ce + df, af + be + cf + de) = (a, b) · (e, f) + (c, d) · (e, f).

Next introduce an equivalence relation on the set N × N defined by (a1, b1) ∼ (a2, b2) if and only if a1 + b2 = a2 + b1. It is easy to see that this actually is an equivalence relation: reflexivity and symmetry are obvious, and if (a, b) ∼ (c, d) and (c, d) ∼ (e, f) then a + d = b + c and c + f = d + e, so (a + f) + c = a + (f + c) = a + (d + e) = (a + d) + e = (b + c) + e = (b + e) + c, from which it follows by the cancellation law that a + f = b + e, hence (a, b) ∼ (e, f). It is an immediate consequence of the definition of equivalence that for any (a, b) ∈ N × N

(1.17) (a, b) ∼ (a + c, b + c) for any c ∈ N,

a very useful observation in dealing with this equivalence relation. It is another straightforward matter to verify that the operations on N × N preserve equivalence classes, in the sense that if (a1, b1) ∼ (a2, b2) then

(a1, b1) + (c, d) ∼ (a2, b2) + (c, d)  and  (a1, b1) · (c, d) ∼ (a2, b2) · (c, d);

thus the operations of addition and multiplication are well defined among equivalence classes of elements of N × N, so they are well defined on the quotient (N × N)/∼ of the set N × N by this equivalence relation. That quotient is a well defined set, denoted by Z and called the set of integers, and Z is a well defined ring under these operations. Indeed by (1.17), for any (a, b) ∈ N × N

(1.18) (a, b) + (b, a) = (a + b, a + b) ∼ (1, 1),

so the equivalence class of (b, a) in the quotient Z acts as the additive inverse of the equivalence class of (a, b); and for any (a, b) ∈ N × N

(1.19) (a, b) + (1, 1) = (a + 1, b + 1) ∼ (a, b),

so the equivalence class of (1, 1) in the quotient Z satisfies the group identity law. These algebraic operations satisfy the same associative, commutative and distributive laws as the natural numbers.

The mapping φ : N → Z that associates to any n ∈ N the equivalence class in Z of the pair (n + 1, 1) is an injective mapping, since if (m + 1, 1) ∼ (n + 1, 1) then m + 2 = n + 2, so m = n by the cancellation law. Furthermore φ(m + n) ∈ Z is the equivalence class of the pair (m + n + 1, 1), while φ(m) + φ(n) ∈ Z is the equivalence class of the pair (m + 1, 1) + (n + 1, 1) = (m + n + 2, 2) ∼ (m + n + 1, 1) in view of (1.17), so φ(m + n) = φ(m) + φ(n). On the other hand φ(m · n) is the equivalence class of the pair (mn + 1, 1), while φ(m) · φ(n) is the equivalence class of the pair

(m + 1, 1) · (n + 1, 1) = ((m + 1)(n + 1) + 1 · 1, (m + 1) · 1 + (n + 1) · 1) = (mn + m + n + 2, m + n + 2) ∼ (mn + 1, 1)

in view of (1.17), so φ(m · n) = φ(m) · φ(n). Thus under the imbedding φ : N → Z the algebraic operations on N are just the restriction of the algebraic operations on Z; therefore the mapping φ identifies the natural numbers with a subset N ⊂ Z, so Z is an extension of the set N with its algebraic operations to a ring.

To see what is contained in the larger set Z that is not contained in the subset N, it is evident from (1.17) that any pair (m, n) ∈ N × N is equivalent to either (1, 1) or (n, 1) or (1, n) where n ∈ N and n > 1. Any pair (n, 1) with n > 1 is in the image φ(N) of the natural numbers; the equivalence class of (1, 1) is the additive identity in Z and is not contained in φ(N); and the equivalence class of (1, n) with n > 1 is the inverse −φ(n − 1) and is not contained in φ(N). Thus the elements of the ring Z can be identified with differences of elements in the image φ(N) of the natural numbers, so the only elements in Z not contained in the image φ(N) are precisely the elements needed to extend the natural numbers to a ring.

⁴Historically multiplicative inverses were quite early extensions of the natural numbers, since fractions appear early in history; but while zero and the additive inverses appeared quite early in commerce, they were surprisingly late in the academic world, where European mathematicians were still suspicious of both even in the late middle ages. See the discussions in the Princeton Companion to Mathematics and further references there.
⁵The idea is to look at formal differences a − b of the natural numbers as elements of a larger set; the formal definitions of addition and multiplication are what would be expected of such differences.
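The construction of Z out of N × N can be sketched in a few lines of Python. This is our own illustrative model, not code from the text: pairs of ordinary Python integers stand in for pairs of natural numbers, and the operations are exactly those defined above.

```python
# Illustrative model of the construction of Z: a pair (a, b) of naturals
# stands for the formal difference a - b.

def equivalent(p, q):            # (a1, b1) ~ (a2, b2) iff a1 + b2 = a2 + b1
    (a, b), (c, d) = p, q
    return a + d == c + b

def add(p, q):                   # (a, b) + (c, d) = (a + c, b + d)
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):                   # (a, b) . (c, d) = (ac + bd, ad + bc)
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def phi(n):                      # the imbedding of N: n -> class of (n + 1, 1)
    return (n + 1, 1)

# phi preserves addition and multiplication, up to equivalence
assert equivalent(add(phi(2), phi(3)), phi(5))
assert equivalent(mul(phi(2), phi(3)), phi(6))
# (b, a) is the additive inverse of (a, b): the sum is equivalent to (1, 1)
assert equivalent(add((5, 2), (2, 5)), (1, 1))
```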

A particularly interesting class of rings is that of the ordered rings, defined as rings R with a distinguished subset P ⊂ R such that (i) for any element a ∈ R either a = 0 or a ∈ P or −a ∈ P, and only one of these possibilities can occur; and (ii) if a, b ∈ P then a + b ∈ P and a · b ∈ P. The set P is called the set of positive elements of the ring R; note particularly that the additive identity 0 is not considered a positive element. Associated to this set P is the relation defined by setting a ≤ b if either b = a or b − a ∈ P. This is a partial order relation: it is reflexive since clearly a ≤ a; it is transitive since if a ≤ b and b ≤ c then b − a ∈ P and c − b ∈ P so c − a = (c − b) + (b − a) ∈ P; and it satisfies the identity condition since if a ≤ b then either b = a or b − a ∈ P and if b ≤ a then either b = a or a − b ∈ P, and since it cannot be the case that both b − a ∈ P and a − b ∈ P it must be the case that a = b. Actually it is an order relation, since for any a, b ∈ R either b − a = 0 or b − a ∈ P or a − b ∈ P, only one of these possibilities can arise, and these possibilities correspond to the relations a = b, a ≤ b and b ≤ a respectively. As usual for an order relation, a < b is taken to mean that a ≤ b but a ≠ b; and b ≥ a is equivalent to a ≤ b, while b > a is equivalent to a < b. The positive elements of the ordered ring thus are precisely those elements a ∈ R for which a > 0.

It may be worthwhile listing a few standard properties of the order relation in an ordered ring.
(i) If a ∈ R and a ≠ 0 then a² > 0; for if a ∈ P then a² ∈ P, while if a ∉ P then −a ∈ P and a² = (−a)² ∈ P.
(ii) The multiplicative identity is positive, that is 1 > 0, since 1 = 1².
(iii) If a ≤ b and c > 0 then ca ≤ cb; for if b − a ∈ P and c ∈ P then c · (b − a) ∈ P.
(iv) If a ≤ b and c < 0 then ca ≥ cb; for if b − a ∈ P and c < 0 then −c ∈ P and c(a − b) = (−c)(b − a) ∈ P.
(v) If a ≤ b then −a ≥ −b, for (−a) − (−b) = b − a ∈ P.

For the special case of the ring Z it is clear that the subset φ(N) satisfies the conditions to be the set of positive elements defining the order on Z, where N is identified with the positive integers; noting again that with this convention 0 is not considered a positive integer.

Some interesting additional examples of rings can be derived from the ring Z by considering for any n ∈ N the equivalence relation defined by

(1.20) a ≡ b (mod n) if and only if a − b = nx for some x ∈ Z.

It is easy to see that if a1 ≡ a2 (mod n) and b1 ≡ b2 (mod n) then (a1 + b1) ≡ (a2 + b2) (mod n) and (a1 · b1) ≡ (a2 · b2) (mod n); hence the operations of addition and multiplication can be defined on equivalence classes, providing the natural structure of a ring on the quotient. That quotient ring is often denoted by Z/nZ. It is amusing and instructive to write out the detailed addition and multiplication tables for some small values of n. This ring is not an ordered ring; for if it were an ordered ring then 1 + 1 + · · · + 1 > 0 for any such sum, since 1 > 0, but the sum of n copies of 1 is 0 in Z/nZ.

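Following the text's suggestion to write out the tables for small n, a short script can generate them. This is our own illustration; the function name is not from the text.

```python
# Illustrative: the addition and multiplication tables of Z/nZ, with two
# checks of facts stated in the text about Z/6Z.

def tables(n):
    add = [[(a + b) % n for b in range(n)] for a in range(n)]
    mul = [[(a * b) % n for b in range(n)] for a in range(n)]
    return add, mul

add6, mul6 = tables(6)

# the classes of 2 and 3 are nonzero in Z/6Z, yet their product is zero
assert mul6[2][3] == 0
# adding 1 to the class of 5 wraps around to 0: six copies of 1 sum to 0,
# which is why Z/6Z cannot be an ordered ring
assert add6[5][1] == 0
```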
In rings such as Z/6Z there are nonzero elements, such as the equivalence class a of the integer 2 and the equivalence class b of the integer 3, for which a · b = 0. Such elements cannot possibly have multiplicative inverses; for if a, b ∈ R, neither of these two elements is zero, and a · b = 0, then the existence of a⁻¹ would imply that b = (a⁻¹ · a) · b = a⁻¹ · (a · b) = a⁻¹ · 0 = 0, a contradiction. A ring for which a · b = 0 implies that either a = 0 or b = 0 is called an integral domain. In an integral domain the cancellation law holds: if a · c = b · c where c ≠ 0 then 0 = bc − ac = (b − a) · c, and since c ≠ 0 it follows that b − a = 0, so b = a. It is clear that any ordered ring R is an integral domain; for if R is an ordered ring and a, b ∈ R are both nonzero, then ±a ∈ P and ±b ∈ P for some choice of the signs, and consequently ±(a · b) = (±a) · (±b) ∈ P for the appropriate sign, so a · b ≠ 0.

To any ring R it is possible to associate another ring R[x], called the polynomial ring over the ring R. The elements p(x) ∈ R[x] are polynomials in x of the form p(x) = a0 + a1 x + a2 x² + · · · + an xⁿ with coefficients aj ∈ R. If an ≠ 0 the polynomial is said to have degree n; the term an xⁿ is the leading term of the polynomial and the coefficient an is the leading coefficient. Any two polynomials can be written as p(x) = a0 + a1 x + · · · + an xⁿ and q(x) = b0 + b1 x + · · · + bn xⁿ for the same integer n, since some of the coefficients aj or bk can be taken to be zero; the sum of these polynomials is defined by

p(x) + q(x) = (a0 + b0) + (a1 + b1) x + · · · + (an + bn) xⁿ

and their product is defined by collecting the terms aj bk x^(j+k) for all pairs of indices j, k. It is a straightforward matter to verify that R[x] is a ring with these operations, where the additive identity is the additive identity 0 of the ring R and the multiplicative identity is the multiplicative identity 1 of the ring R. The ring R is naturally imbedded as a subset of R[x] by viewing any a ∈ R as a polynomial of degree 0. If R is an ordered ring then R[x] can be viewed as an ordered ring as well, by taking as the set P of positive polynomials those polynomials for which the leading coefficient an is a positive element of R; for any polynomial p(x) other than 0 with the leading coefficient an, either an > 0 and p(x) ∈ P or an < 0 and −p(x) ∈ P, and the sum and product of any positive polynomials are clearly also positive polynomials.

A ring containing nonzero elements whose product is zero cannot be extended to a field; however if there are no such elements in a ring then that ring can be extended to a field by adding additional elements. If R is an integral domain consider the Cartesian product R × R× consisting of all pairs (a, b) where a, b ∈ R and b ≠ 0, and define⁶ operations of addition and multiplication of elements of R × R× by

(a, b) + (c, d) = (ad + bc, bd)  and  (a, b) · (c, d) = (ac, bd).

⁶The idea of this construction is to introduce formally the quotients a/b of pairs of elements of the ring R for which b ≠ 0, which will automatically have inverses; the definitions of the sum and product of two pairs are chosen with this in mind.
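The sum and product in R[x] translate directly into operations on coefficient lists. The sketch below is our own illustration, taking R = Z and representing a polynomial by its list of coefficients [a0, a1, ..., an].

```python
# Illustrative: polynomials over Z as coefficient lists.

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))       # pad with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j, aj in enumerate(p):
        for k, bk in enumerate(q):
            r[j + k] += aj * bk      # collect the term aj*bk*x^(j+k)
    return r

# (1 + x)(1 - x) = 1 - x^2
assert poly_mul([1, 1], [1, -1]) == [1, 0, -1]
# the distributive law on a sample: p(q + r) = pq + pr
p, q, r = [1, 2], [3, 0, 1], [0, 5]
assert poly_mul(p, poly_add(q, r)) == poly_add(poly_mul(p, q), poly_mul(p, r))
```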

It is a straightforward matter to verify that the set R × R× is closed under these operations, with the additive identity element (0, 1), the multiplicative identity element (1, 1), and the additive inverse −(a, b) = (−a, b). The only slightly complicated case is that of the associative law of addition, where a calculation leads to

((a, b) + (c, d)) + (e, f) = (adf + bcf + bde, bdf) = (a, b) + ((c, d) + (e, f)).

Next introduce an equivalence relation on the set R × R× defined by (a1, b1) ∼ (a2, b2) if and only if a1 · b2 = a2 · b1. It is easy to see that this actually is an equivalence relation. Indeed reflexivity and symmetry are trivial; and for transitivity, if (a1, b1) ∼ (a2, b2) and (a2, b2) ∼ (a3, b3) then a1 b2 = a2 b1 and a2 b3 = a3 b2, so a1 b2 b3 = b1 a2 b3 = b1 a3 b2 and consequently (a1 b3 − a3 b1) b2 = 0, so a1 b3 − a3 b1 = 0 by the cancellation law. It is an immediate consequence of the definition of equivalence that

(1.21) (a, b) ∼ (a · c, b · c) if c ≠ 0,

a very useful observation in dealing with this equivalence relation. It is an equally straightforward matter to verify that addition and multiplication on R × R× preserve equivalence classes, in the sense that if (a1, b1) ∼ (a2, b2) then

(a1, b1) + (c, d) ∼ (a2, b2) + (c, d)  and  (a1, b1) · (c, d) ∼ (a2, b2) · (c, d);

and since the operations of addition and multiplication are well defined among equivalence classes of elements of R × R×, they are well defined on the quotient F = (R × R×)/∼ of the set R × R× by this equivalence relation, and they determine the structure of a ring on F. If a ≠ 0 and b ≠ 0 then (b, a) · (a, b) = (ab, ab) ∼ (1, 1), so (b, a) is the multiplicative inverse of (a, b), and consequently F is a field. This field is called the field of quotients of the ring R.

The mapping φ : R → F that sends an element a ∈ R to the equivalence class φ(a) ∈ F of the element (a, 1) ∈ R × R× is clearly an injective mapping for which φ(a1 + a2) = φ(a1) + φ(a2) and φ(a1 · a2) = φ(a1) · φ(a2); in this way R can be realized as a subset of F such that the algebraic operations of R are compatible with those of F.

In particular, since the ring Z is an ordered ring, hence an integral domain, it has a well defined field of quotients, denoted by Q and called the field of rational numbers. Any rational number in Q can be represented by the equivalence class of a pair (a, b) ∈ Z × Z×; and since φ(a) is the equivalence class of (a, 1) and φ(b)⁻¹ is the equivalence class of (1, b), it follows that (a, b) ∼ φ(a) · φ(b)⁻¹, so any rational can be represented by the product a · b⁻¹ for some integers a, b. It is customary to simplify the notation by setting a · b⁻¹ = a/b. By (1.21) that rational also can be represented by the product (a · c)(b · c)⁻¹ for any nonzero integer c, so that rationals
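The pairs-with-equivalence description of the field of quotients can likewise be tested mechanically. The sketch below is our own illustration for R = Z, with the equivalence, sum and product exactly as defined above.

```python
# Illustrative model of the field of quotients of Z: a pair (a, b) with
# b != 0 stands for the quotient a/b.

def equivalent(p, q):            # (a1, b1) ~ (a2, b2) iff a1*b2 = a2*b1
    (a, b), (c, d) = p, q
    return a * d == c * b

def add(p, q):                   # (a, b) + (c, d) = (ad + bc, bd)
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):                   # (a, b) . (c, d) = (ac, bd)
    (a, b), (c, d) = p, q
    return (a * c, b * d)

# 1/2 + 1/3 = 5/6 and (1/2)(2/1) = 1, up to equivalence
assert equivalent(add((1, 2), (1, 3)), (5, 6))
assert equivalent(mul((1, 2), (2, 1)), (1, 1))
# (a, b) ~ (ac, bc) for any nonzero c, as in (1.21)
assert equivalent((3, 4), (9, 12))
```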

can be represented by quotients a/b for integers a and b. The rationals Q form an ordered field, where the set P of positive elements is defined as the set of rationals a/b for which a > 0 and b > 0. To verify that, note that any rational other than 0 can be represented by a quotient a/b of integers; if the integers are of the same sign then, after multiplying both by −1 if necessary, both will be positive, so a/b ∈ P, but if they are of opposite signs the quotient a/b can never be represented by a quotient of two positive integers, so a/b ∉ P. It is clear that if a/b and c/d are both in P then so are (a/b) + (c/d) and (a/b) · (c/d), so P can serve as the set of positive elements in the field Q.

The natural numbers are a countably infinite set. Since the integers can be decomposed as the union Z = 0 ∪ N ∪ −N of countable sets, the ring Z is also a countably infinite set. The set of all pairs of integers is also countably infinite, by Cantor's diagonal enumeration of pairs; and since the rationals can be imbedded in the set of pairs Z × Z by selecting a representative pair for each equivalence class in Q, it follows that the field Q is also a countably infinite set. This is yet another example of sets N ⊂ Z ⊂ Q where the inclusions are strict inclusions but nonetheless #(N) = #(Z) = #(Q) = ℵ0.

The rationals thus form an ordered field, but one that is still incomplete for many purposes. For instance, although there are rational numbers x such as x = 1 for which x² < 2 and rational numbers such as x = 2 for which x² > 2, there is no rational number x for which x² = 2, a very old observation⁷. To see this, suppose that a, b ∈ N are any natural numbers for which (a/b)² = 2, or equivalently a² = 2b², where it can be assumed that not both a and b are multiples of 2. Since a² is a multiple of 2, a itself must be a multiple of 2, so a = 2c; but then 4c² = (2c)² = a² = 2b², so b² = 2c² and b must also be a multiple of 2, a contradiction.

However the rationals can be extended to a larger field R, the field of real numbers, which does include a number x for which x² = 2, among many other delightful properties. This is a slightly more difficult extension to describe than those from the natural numbers to the integers or from the integers to the rationals. The actual construction of this field as an extension of the rationals, which amounts to a verification that the field R described axiomatically actually exists, will be deferred to a later point in the discussion; so the discussion here will begin with an axiomatic definition of the field R of real numbers.

To begin more generally, suppose that F is an ordered field, where the order is described by a subset P ⊂ F of positive elements. A nonempty subset E ⊂ F is bounded above by a ∈ F if x ≤ a for all x ∈ E, and correspondingly it is bounded below by b ∈ F if x ≥ b for all x ∈ E. That E is bounded above by a is indicated by writing E ≤ a, and that E is bounded below by b is indicated by writing E ≥ b; more generally E1 ≤ E2 indicates that x1 ≤ x2 for any x1 ∈ E1 and x2 ∈ E2. A subset is just said to be bounded above if it is bounded above by some element of the field F, and correspondingly for bounded below. If E ⊂ F is bounded above, an element a ∈ F is said to be the least upper bound or supremum of the set E if (i) E ≤ a, and

⁷This is often attributed to the school of Pythagoras, an almost mythical mathematician who lived in the sixth century BCE in Greece; see for instance the Princeton Companion to Mathematics.

(ii) if E ≤ x then x ≥ a. The least upper bound of a set E is denoted by sup(E). Correspondingly, if E ⊂ F is bounded below, an element b ∈ F is said to be the greatest lower bound or infimum of the set E if (i) E ≥ b, and (ii) if E ≥ y then y ≤ b; the greatest lower bound of a set E is denoted by inf(E). It is evident that if −E = { −x ∈ F | x ∈ E } then sup(−E) = −inf(E).

For a general ordered field it is not necessarily the case that a set that is bounded above has a least upper bound, or that a set that is bounded below has a greatest lower bound. An ordered field for which any nonempty set that is bounded above has a least upper bound is called a complete ordered field. By definition the field of real numbers is a complete ordered field, and it is denoted by R. That there exists such a field and that it is uniquely defined remain to be demonstrated; so for the present just assume that R is a complete ordered field.

In a complete ordered field it is automatically the case also that any nonempty set that is bounded below has a greatest lower bound. Indeed if E ⊂ F is a nonempty set and E ≥ b, consider the set of all lower bounds of E, the set L = { x ∈ F | E ≥ x }. This is a nonempty set, since b ∈ L by hypothesis, and by definition L ≤ E, so in particular L is bounded above. Since F is a complete ordered field it follows that L has a least upper bound a = sup(L). Any x ∈ E is an upper bound of L, so since a is the least upper bound it follows that x ≥ a; hence a is a lower bound for E. On the other hand any lower bound y of E belongs to L, so y ≤ a since a is an upper bound of L. Thus a is the greatest lower bound of E.

There is a natural mapping φ : N → R defined inductively by φ(1) = 1 and φ(n′) = φ(n) + 1, where n′ ∈ N denotes the successor to n ∈ N. Thus φ(n + 1) = φ(n) + φ(1), and by induction on k it follows that φ(n + k) = φ(n) + φ(k) for all n, k ∈ N. Moreover since nk = n + n + · · · + n for k terms it follows as well that φ(nk) = φ(n + n + · · · + n) = φ(n) + φ(n) + · · · + φ(n) = kφ(n) = φ(k)φ(n), so the mapping φ preserves both of the algebraic operations. Since φ(k) > 0 it follows that φ(n + k) = φ(n) + φ(k) > φ(n) in terms of the order in R, which implies that φ(n + k) ≠ φ(n), so the mapping φ is injective. The constructions involved in extending the natural numbers to the rationals are expressed in terms of the algebraic operations on N and can be performed in the same way on elements in the image φ(N) ⊂ R, and in this way the mapping φ extends to an injective field homomorphism φ : Q → R. In this sense it is possible to view the rational numbers as a subfield of the real numbers, and that point of view will be adopted henceforth.

Theorem 1.3 For any real number a > 0 there is an integer n such that 1/n < a < n; and for any real numbers a < b there is a rational number r such that a < r < b.

Proof: If it is not true that there is an integer n1 for which n1 > a, then
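Theorem 1.3 is effective, in the sense that a suitable integer n can be computed directly from a. The sketch below is our own illustration (the function name is not from the text, and floating-point input is a simplification of working with real numbers).

```python
# Illustrative: given a > 0, produce an integer n with 1/n < a < n.

import math

def archimedean_n(a):
    # n must exceed both 1/a and a; one more than each floor suffices
    return max(math.floor(1 / a) + 1, math.floor(a) + 1)

for a in (0.001, 0.5, 1.0, 7.25, 1000.0):
    n = archimedean_n(a)
    assert 1 / n < a < n
```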

a ≥ n for all integers n, so the subset N ⊂ R is a nonempty set that is bounded above, and it must therefore have a least upper bound b. Then b − 1 is not an upper bound for the set N, so there is some integer n ≥ b − 1; but in that case n + 2 > n + 1 ≥ b, so that b cannot be an upper bound of the set N, a contradiction. Thus there is an integer n1 > a; and for the same reason there is an integer n2 > 1/a; and if n ≥ n1 and n ≥ n2 then 1/n < a < n, as desired. Next, if a < b then by what has just been shown there is an integer n such that 1/n < (b − a), so a + 1/n < b. The set of all rational numbers of the form m/n for which m/n ≤ a is bounded above, so it has a least upper bound m0/n; then m0/n ≤ a but (m0 + 1)/n > a, hence a < (m0 + 1)/n ≤ a + 1/n < b, as desired.

Thus there are rational numbers both larger and smaller than any positive real number, and between any two real numbers there is a rational number; so any real number can be approximated as closely as desired by rational numbers. On the other hand the real numbers are sufficiently complete that, for example, any positive real number r has a unique positive square root. Only the existence needs to be established, since the uniqueness is clear: if 0 < s < t then s² < t². For the existence, if r ∈ R and r > 0, introduce the sets of real numbers

X = { x ∈ R | x ≥ 0, x² < r }  and  Y = { y ∈ R | y > 0, y² > r }.

These sets are clearly nonempty and x < y for any x ∈ X and y ∈ Y, so by the completeness property of the real numbers there exist the bounds x0 = sup(X) and y0 = inf(Y). Since any y ∈ Y is an upper bound for X, it must be at least the least upper bound, so x0 ≤ y; and then x0 as a lower bound for Y must be at most the greatest lower bound, so x0 ≤ y0. If x0² < r, so that x0² = r − ε for some ε > 0, take 0 < h ≤ 1 and set a = 2x0 + 1; then 2x0h + h² ≤ (2x0 + 1)h = ah, so whenever also h < ε/a it follows that (x0 + h)² = x0² + 2x0h + h² ≤ r − ε + ah < r, showing that (x0 + h) ∈ X, which is a contradiction since x0 = sup(X). If y0² > r, so that y0² = r + ε for some ε > 0, then whenever 0 < h < y0 and h < ε/(2y0) it follows that (y0 − h)² = y0² − 2y0h + h² ≥ r + ε − 2y0h > r, showing that (y0 − h) ∈ Y, which is a contradiction since y0 = inf(Y). Altogether r ≤ x0² ≤ y0² ≤ r, and consequently x0² = y0² = r, as desired.
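The proof locates the square root of r as sup(X). A bisection sketch over the rationals, our own illustration using Python's fractions module, approximates this supremum from below and above to any desired accuracy.

```python
# Illustrative: approximate sup(X) for X = { x >= 0 : x^2 < r } by
# bisection, maintaining lo in X and hi in Y throughout.

from fractions import Fraction

def sqrt_bounds(r, steps=40):
    lo, hi = Fraction(0), Fraction(max(r, 1) + 1)   # lo in X, hi in Y
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < r:
            lo = mid        # mid lies in X: a better lower bound for sup(X)
        else:
            hi = mid        # mid lies in Y (or mid^2 = r exactly)
    return lo, hi

lo, hi = sqrt_bounds(2)
assert lo * lo < 2 < hi * hi
assert hi - lo == Fraction(3, 2 ** 40)   # the interval of length 3, halved 40 times
```

Because no rational squares to 2, the bounds never collide, in keeping with the earlier observation that the supremum itself is irrational.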

1.3 Vector Spaces

A vector space over a field F by definition is a set V on which there are two operations: addition, which associates to v1, v2 ∈ V an element v1 + v2 ∈ V, and scalar multiplication, which associates to a ∈ F and v ∈ V an element av ∈ V, such that
(i) V is an abelian group under addition;
(ii) scalar multiplication is associative: (ab)v = a(bv) for any a, b ∈ F and v ∈ V;
(iii) the multiplicative identity 1 ∈ F acts as an identity under scalar multiplication as well: 1v = v for all v ∈ V;
(iv) the distributive law holds for addition and scalar multiplication: a(v1 + v2) = av1 + av2 for all a ∈ F and v1, v2 ∈ V.
The elements of V are called vectors, and usually will be denoted by bold-faced letters; the elements of the field F are called scalars in this context. Note that the definition of a vector space involves both the set V and a particular field F.

If 0 ∈ F is the additive identity in the field and v ∈ V is any vector in the vector space, then 0v = (0 + 0)v = 0v + 0v, so since V is a group under addition it follows that 0v is the identity element of that group; that element is denoted by 0 ∈ V and is called the zero vector. The simplest vector space over any field F is the vector space consisting of the zero vector 0 alone; it is sometimes called the trivial vector space or the zero vector space, and usually it is denoted just by 0.

A subset V0 ⊂ V for which v1 + v2 ∈ V0 whenever v1, v2 ∈ V0 and av ∈ V0 whenever a ∈ F and v ∈ V0 is called a linear subspace of V, and clearly is itself a vector space over F. To a linear subspace V0 ⊂ V there can be associated an equivalence relation in the vector space V by setting v1 ∼ v2 (mod V0) whenever v1 − v2 ∈ V0. This is clearly an equivalence relation: it is reflexive since v1 − v1 = 0 ∈ V0; it is symmetric since if v1 − v2 ∈ V0 then (v2 − v1) = −(v1 − v2) ∈ V0; and it is transitive since if v1 − v2 ∈ V0 and v2 − v3 ∈ V0 then v1 − v3 = (v1 − v2) + (v2 − v3) ∈ V0.

The equivalence class of a vector v ∈ V is the subset v + V0 = { v + w | w ∈ V0 } ⊂ V. The set of equivalence
classes has the structure of a vector space over the field F , where the sum of
two equivalence classes is defined by (v1 + V0 ) + (v2 + V0 ) = (v1 + v2 ) + V0
and the scalar product is defined by a(v + V0 ) = av + V0 ; this vector space is
denoted by V /V0 and is called the quotient space of V by the subspace V0 or
alternatively the vector space V modulo V0 .
A homomorphism or linear transformation between vector spaces V
and W over the same field F is defined to be a mapping T : V −→ W such that
(i) T (v1 + v2 ) = T (v1 ) + T (v2 ) for all v1 , v2 ∈ V , and
(ii) T (av) = aT (v) for all a ∈ F and v ∈ V .
A linear transformation T : V −→ W is called an injective linear trans-
formation if it is an injective mapping when viewed as a mapping between
these two sets; and it is called a surjective linear transformation if it is
a surjective mapping when viewed as a mapping between these two sets. A
linear transformation T : V −→ W that is both injective and surjective is
called an isomorphism between the two vector spaces; its inverse mapping
T −1 : W −→ V clearly is also an isomorphism of vector spaces. That two
vector spaces V and W are isomorphic is denoted by V ≅ W , which merely
indicates that there is some isomorphism between the two vector spaces but
does not specify the isomorphism. That should be distinguished from the
identity V = W of two vector spaces, which implies either that they are actu-
ally the same thing or that there is a specified isomorphism between them. The
set of all linear transformations from a vector space V to a vector space W is
denoted by L(V, W ). The sum of two linear transformations S, T ∈ L(V, W )
is the mapping defined by (S + T )(x) = S(x) + T (x) for any vector x ∈ V ,
and is clearly itself a linear transformation (S + T ) ∈ L(V, W ); and the scalar
product of a scalar c ∈ F and a linear transformation T ∈ L(V, W ) is
the mapping defined by (cT )(x) = cT (x) for any vector x ∈ V , and is clearly
also a linear transformation. The distributive law obviously holds for addition
and scalar multiplication, so the set L(V, W ) is another vector space over F .
The kernel or null space of a linear transformation T : V −→ W is the
subset

(1.22) ker(T) = { v ∈ V | T(v) = 0 } ⊂ V.

It is clear that ker(T) ⊂ V is a linear subspace of V; for if v1, v2 ∈ ker(T) then T(v1 + v2) = T(v1) + T(v2) = 0 + 0 = 0, so v1 + v2 ∈ ker(T), and T(av1) = aT(v1) = a0 = 0, so av1 ∈ ker(T). A linear transformation T : V → W is injective if and only if ker(T) = 0. Indeed if ker(T) = 0 and if T(v1) = T(v2) for two vectors v1, v2 ∈ V then T(v1 − v2) = 0, so (v1 − v2) ∈ ker(T) = 0, hence v1 = v2, and T is an injective mapping; conversely if a linear transformation T : V → W is an injective mapping then T⁻¹(0) is a single point of V, hence ker(T) = 0.

The image of a linear transformation T : V → W is the subset

(1.23) T(V) = { T(v) | v ∈ V } ⊂ W,

the image of the mapping T in the usual sense. A linear transformation T : V → W is surjective precisely when its image is the full vector space W. It is clear that T(V) ⊂ W is a linear subspace of W; for if w1, w2 ∈ T(V) then w1 = T(v1) and w2 = T(v2) for some vectors v1, v2 ∈ V, so w1 + w2 = T(v1) + T(v2) = T(v1 + v2) ∈ T(V) and aw1 = aT(v1) = T(av1) ∈ T(V) for any a ∈ F.

If V0 ⊂ V is a linear subspace of V, the mapping that associates to any v ∈ V its equivalence class (v + V0) ∈ V/V0 is clearly a surjective linear transformation from the vector space V to the quotient space V/V0 with kernel V0; it is called the quotient linear transformation modulo the subspace V0 and generally will be denoted by Υ : V → V/V0. If T : V → W is a linear transformation with the kernel V0 = ker(T), then T(v + v0) = T(v) for any vector v0 ∈ V0, so all elements in the equivalence class v + V0 have the same image under the transformation T; thus the linear transformation T determines a well defined mapping ι : V/V0 → W, which clearly is an injective linear mapping. The linear transformation T : V → W can be viewed as the composition T = ι ∘ Υ
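The criterion that T is injective exactly when ker(T) = 0 can be illustrated for matrix transformations of Q². The brute-force grid search below is our own illustration (the matrices and names are arbitrary choices, not from the text), using exact rational arithmetic.

```python
# Illustrative: two matrix transformations of Q^2, one with trivial kernel
# (hence injective) and one with a nontrivial kernel (hence not injective).

from fractions import Fraction

def apply(T, v):
    return tuple(sum(Fraction(T[i][j]) * v[j] for j in range(2))
                 for i in range(2))

T1 = [[1, 2], [3, 4]]    # invertible, so the kernel should be trivial
T2 = [[1, 2], [2, 4]]    # rank one, so the kernel is a line

grid = [(Fraction(a), Fraction(b)) for a in range(-3, 4) for b in range(-3, 4)]
ker1 = [v for v in grid if apply(T1, v) == (0, 0)]
ker2 = [v for v in grid if apply(T2, v) == (0, 0)]

assert ker1 == [(0, 0)]                     # only the zero vector
assert (2, -1) in ker2 and len(ker2) > 1    # a whole line of solutions
# hence T2 is not injective: distinct vectors share an image
assert apply(T2, (Fraction(2), Fraction(-1))) == apply(T2, (Fraction(0), Fraction(0)))
```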
Thus T can be written as the composition of the quotient linear transformation Υ : V −→ V/V0 followed by the injective linear transformation ι : V/V0 −→ W, a decomposition indicated by the diagram

                 T
(1.24)      V −−−−→ W
             Υ ↘     ↗ ι
               V/V0

A diagram such as this, in which T = ι ◦ Υ, is called a commutative diagram; it has the property that the effect of tracing through successive compositions of mappings depends only on the beginning and ending point of the succession of mappings and not on the particular sequence of mappings involved.

A common convention is to say that a sequence of linear transformations

(1.25)    Vi−1 −−Ti−1−→ Vi −−Ti−→ Vi+1

is exact at Vi if Ti−1(Vi−1) = ker(Ti) ⊂ Vi, so if the image of the linear transformation Ti−1 is precisely the kernel of the next linear transformation Ti; and a longer sequence of linear transformations such as

(1.26)    V1 −−T1−→ V2 −−T2−→ V3 −→ · · · −→ Vn−1 −−Tn−1−→ Vn

is an exact sequence of linear transformations if it is exact at each Vi for 2 ≤ i ≤ n − 1. Here by convention ι indicates an injective linear transformation and ν denotes the null linear transformation that maps everything to the zero vector 0. For example it is clear from the definitions that

(1.27)    0 −→ V −−T−→ W is exact if and only if T is injective,

(1.28)    V −−T−→ W −→ 0 is exact if and only if T is surjective,

hence

(1.29)    0 −→ V −−T−→ W −→ 0 is exact if and only if T is an isomorphism.

A particularly useful special case is a short exact sequence, one of the form

(1.30)    0 −→ V1 −−T1−→ V2 −−T2−→ V3 −→ 0.

The exactness at V1 means that T1 is injective, the exactness at V2 means that the subspace T1(V1) ⊂ V2 is precisely the kernel of T2, and the exactness at V3 means that T2 is surjective; consequently the exactness of the sequence means that V3 ≅ V2/T1(V1), so the exact sequence (1.30) is really just the exact sequence

(1.31)    0 −→ V1 −−ι−→ V2 −−Υ−→ V2/V1 −→ 0.

Any other exact sequence that begins and ends with the trivial vector space 0 can be decomposed into a sequence of short exact sequences. For instance the exact sequence

(1.32)    0 −→ V1 −−T1−→ V2 −−T2−→ V3 −−T3−→ V4 −→ 0

is readily seen to be equivalent to the two short exact sequences

          0 −→ V1 −−T1−→ V2 −−T2−→ T2(V2) −→ 0,
          0 −→ T2(V2) −−ι−→ V3 −−T3−→ V4 −→ 0.

As a final example here, the detailed description of a linear transformation T : V −→ W can be given in the form of the exact sequence

(1.33)    0 −→ ker(T) −−ι−→ V −−T−→ W −−Υ−→ coker(T) −→ 0,

where the cokernel of the linear transformation T : V −→ W is the quotient space coker(T) = W/T(V); the terminology reflects the symmetry of the exact sequence, a form of formal duality. The exactness here is a straightforward consequence of the definitions.

The span of a finite set v1, v2, . . . , vn of vectors in a vector space V over a field F is the set of vectors defined by

(1.34)    span(v1, . . . , vn) = { Σ_{j=1}^{n} aj vj | aj ∈ F };

clearly the span of any finite set of vectors in V is a subspace of V. By convention the span of the empty set of vectors is the trivial vector space 0 consisting of the zero vector alone. Vectors v1, . . . , vn are linearly dependent if Σ_{j=1}^{n} aj vj = 0 for some aj ∈ F where aj ≠ 0 for at least one of the scalars aj; and these vectors are linearly independent if they are not linearly dependent, so if whenever Σ_{j=1}^{n} aj vj = 0 then aj = 0 for all indices 1 ≤ j ≤ n. It is easy to see that vectors v1, . . . , vn are linearly dependent if and only if span(v1, . . . , vn) = span(v1, . . . , vj−1, vj+1, . . . , vn) for some index 1 ≤ j ≤ n. Indeed if the vectors v1, . . . , vn are linearly dependent then Σ_{i=1}^{n} ai vi = 0 for some scalars ai, not all of which are zero, so if aj ≠ 0 then vj = −Σ_{1≤i≤n, i≠j} aj^{−1} ai vi, hence span(v1, . . . , vn) = span(v1, . . . , vj−1, vj+1, . . . , vn); conversely if span(v1, . . . , vn) = span(v1, . . . , vj−1, vj+1, . . . , vn) then vj ∈ span(v1, . . . , vj−1, vj+1, . . . , vn), so vj = Σ_{1≤i≤n, i≠j} ai vi for some scalars ai, and that shows that the vectors v1, . . . , vn are linearly dependent.

Theorem 1.4 (Basis Theorem) If w1, . . . , wm ∈ span(v1, . . . , vn) for some vectors vj in a vector space V and if m > n then the vectors w1, . . . , wm are linearly dependent.
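The dependence criterion above lends itself to direct computation. The following sketch (in Python, with exact rational arithmetic from the standard fractions module; the function names are illustrative and not from the text) decides linear dependence of vectors in Q^n by Gaussian elimination: the vectors v1, . . . , vk are linearly dependent exactly when the rank of the matrix having these vectors as its rows is less than k.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (a list of rows) over the rationals, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # find a pivot in column c at or below row r
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def linearly_dependent(vectors):
    # v1, ..., vk are dependent exactly when rank < k
    return rank(vectors) < len(vectors)
```

In particular any three vectors in F^2 come out dependent, exactly as the Basis Theorem predicts for m > n.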

Proof: Since w1 ∈ span(v1, . . . , vn) it follows that w1 = Σ_{j=1}^{n} aj vj for some scalars aj. If all the scalars aj vanish then w1 = 0 and the vectors w1, . . . , wm are trivially linearly dependent; so it can be assumed that not all of the scalars aj vanish, and by relabeling the vectors vj it can be assumed that a1 ≠ 0, hence that w1 = a1 v1 + Σ_{j=2}^{n} aj vj, or equivalently that v1 = a1^{−1} w1 − Σ_{j=2}^{n} a1^{−1} aj vj. By the same argument as before it follows that span(v1, . . . , vn) = span(w1, v2, . . . , vn). Since w2 ∈ span(w1, v2, . . . , vn) it follows that w2 = b1 w1 + Σ_{j=2}^{n} bj vj for some scalars bj. If b1 ≠ 0 but bj = 0 for 2 ≤ j ≤ n this equation already shows that the vectors w1, w2 are linearly dependent, and hence of course so are the vectors w1, . . . , wm. Otherwise bj ≠ 0 for some j > 1, and by relabeling the vectors vj it can be assumed that b2 ≠ 0; by the same argument as before it follows that span(w1, v2, . . . , vn) = span(w1, w2, v3, . . . , vn). The argument can be continued, so eventually either some set of vectors (w1, . . . , wk) with k ≤ n is linearly dependent or span(v1, . . . , vn) = span(w1, . . . , wn). In the latter case, since wn+1 ∈ span(w1, . . . , wn) it follows that wn+1 = Σ_{j=1}^{n} cj wj, and consequently the vectors w1, . . . , wn+1 are linearly dependent, and therefore so are the vectors w1, . . . , wm; and that suffices for the proof.

A vector space V is finite dimensional if it is the span of finitely many vectors, that is, if V = span(v1, . . . , vn) for some vectors v1, . . . , vn ∈ V. If these vectors are not linearly independent then as already observed the span is unchanged by deleting at least one of these vectors, and that argument can be repeated until all the remaining vectors are linearly independent. Therefore any finite dimensional vector space can be written as the span of a finite set of linearly independent vectors, called a basis for the vector space.

It is an immediate consequence of the Basis Theorem that the number of vectors in a basis is the same for all bases. Indeed if w1, . . . , wm and v1, . . . , vn are two bases for a vector space V then since w1, . . . , wm ∈ span(v1, . . . , vn) and since the vectors w1, . . . , wm are linearly independent the Basis Theorem shows that m ≤ n; reversing the roles of these two bases and applying the same argument shows that n ≤ m. The number of vectors in a basis for a vector space V is called the dimension of that vector space, denoted by dim V.

If W ⊂ V for two finite dimensional vector spaces V, W and if w1, . . . , wm is a basis for W, then there is a basis for V of the form w1, . . . , wm, vm+1, . . . , vn. Indeed if W ⊂ V but W ≠ V then there is at least one vector vm+1 ∈ V ∼ W, and the vectors w1, . . . , wm, vm+1 are clearly linearly independent; the argument can be repeated, and the Basis Theorem shows that this process finally stops, for since V is finite dimensional it has a basis of n vectors for some n, so there can be no more than n linearly independent vectors in V. That shows incidentally that dim W ≤ dim V whenever W ⊂ V, and that if dim W = dim V then W = V.

Theorem 1.5 (Dimension Theorem) Let W, V, V1, . . . , Vn be finite dimensional vector spaces over a field F.
(i) If W ⊂ V then dim V = dim W + dim V/W.
(ii) If 0 −→ V1 −→ V2 −→ · · · −→ Vn −→ 0 is an exact sequence of vector spaces then

(1.35)    Σ_{j=1}^{n} (−1)^j dim Vj = 0.

(iii) In particular, for any linear transformation T : V −→ W,

(1.36)    dim(V) − dim(W) = dim ker(T) − dim coker(T).

Proof: (i) Choose a basis w1, . . . , wm for the vector space W and extend it to a basis w1, . . . , wm, vm+1, . . . , vn for the vector space V, where m = dim W and n = dim V. The equivalence classes of V under the equivalence relation mod W have the form v + W, and the classes vj + W for m + 1 ≤ j ≤ n form a basis for the quotient space V/W, so that n − m = dim V/W as desired.
(ii) It is clear that (1.35) holds for the exact sequences 0 −→ V1 −→ 0 and 0 −→ V1 −→ V2 −→ 0, the special cases n = 1 and n = 2; and since the special case n = 3 is just the short exact sequence (1.31), the desired formula for that case reduces to part (i), which has been demonstrated. To prove the general case by induction on n, suppose that (1.35) holds for some n − 1 ≥ 2 and let V*n−1 ⊂ Vn−1 be the image of the vector space Vn−2 in Vn−1; then the exact sequence for the case n is equivalent to the pair of exact sequences

          0 −→ V1 −→ V2 −→ · · · −→ Vn−2 −→ V*n−1 −→ 0,
          0 −→ V*n−1 −→ Vn−1 −→ Vn −→ 0.

By the induction hypothesis 0 = Σ_{j=1}^{n−2} (−1)^j dim Vj + (−1)^{n−1} dim V*n−1, while the second short exact sequence gives 0 = (−1)^n ( dim V*n−1 − dim Vn−1 + dim Vn ) = (−1)^n dim V*n−1 + (−1)^{n−1} dim Vn−1 + (−1)^n dim Vn; adding these two formulas, the terms involving dim V*n−1 cancel, and the result is (1.35) for the case n.
(iii) This follows immediately from the preceding part (ii) applied to the exact sequence (1.33), and that suffices for the proof.

A standard example of a vector space of dimension n over a field F is the set F^n consisting of n-tuples of scalars in F. The common practice, which will be followed systematically here, is to view a vector v ∈ F^n as a column vector

(1.37)    v = ( x1 )
              ( x2 )
              ( ·  )
              ( xn )

where the xi ∈ F are the coordinates of the vector v. Associated to any vector v ∈ F^n there is also its transposed vector tv, the row vector

(1.38)    tv = (x1, x2, . . . , xn);

this alternative version of a vector does play a significant role other than just notational convenience. Sometimes for notational convenience a vector is described merely by listing its coordinates, as

(1.39)    v = {xj} = {x1, x2, . . . , xn},

although the vector still will be viewed as a column vector.
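Part (iii) of the Dimension Theorem can be observed concretely for linear transformations between coordinate spaces: for an n × m matrix T of rank r one has dim ker(T) = m − r and dim coker(T) = n − r, so the alternating sum of dimensions in the four-term exact sequence (1.33) vanishes. A small sketch (Python, exact Fraction arithmetic; rank computed by Gaussian elimination, all names purely illustrative):

```python
from fractions import Fraction

def rank(rows):
    """Rank over the rationals, by Gaussian elimination on a copy of the rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def euler_characteristic(T, m, n):
    """Alternating sum dim ker − dim V + dim W − dim coker for T : F^m −→ F^n,
    given as an n×m matrix; by (1.35) applied to (1.33) it is always 0."""
    r = rank(T)
    dim_ker, dim_coker = m - r, n - r
    return dim_ker - m + n - dim_coker
```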

Thus the sum of two vectors is the vector obtained by adding the corresponding coordinates of the two vectors, and the scalar product of a scalar a ∈ F and a vector v is the vector obtained by multiplying all the coordinates of the vector by a:

(1.40)    {xj} + {yj} = {xj + yj}   and   a·{xj} = {a xj}.

It is clear that F^n is a vector space over the field F with these operations. A canonical basis for the vector space F^n can be described in terms of the Kronecker symbol

(1.41)    δ^i_j = 1 if i = j,   δ^i_j = 0 if i ≠ j,

and the associated Kronecker vector

(1.42)    δj = {δ^i_j} ∈ F^n;

thus δj is the column vector for which all coordinates are 0 except for the j-th coordinate, which is 1. It is clear that these n vectors are linearly independent, hence are a basis for the vector space F^n, so any vector (1.37) can be written in terms of this basis as

(1.43)    v = Σ_{j=1}^{n} xj δj.

It is also clear that any n-dimensional vector space V over the field F is isomorphic to the vector space F^n; for if v1, . . . , vn is a basis for V then the mapping T : F^n −→ V defined by T( Σ_{j=1}^{n} xj δj ) = Σ_{j=1}^{n} xj vj obviously is an isomorphism between these two vector spaces for which T(δj) = vj, and its inverse is the isomorphism T^{−1} : V −→ F^n for which T^{−1}(vj) = δj. For many purposes it is therefore sufficient just to consider the vector spaces F^n rather than abstract vector spaces.

A linear transformation T : F^m −→ F^n can be described quite explicitly in a way that is very convenient for calculation. The image of a basis vector δj ∈ F^m is a linear combination of the basis vectors δi ∈ F^n, so

(1.44)    T(δj) = Σ_{i=1}^{n} aij δi

for some scalars aij ∈ F.

Hence by linearity the image of an arbitrary vector v = Σ_{j=1}^{m} xj δj ∈ F^m is the vector

(1.45)    T(v) = Σ_{j=1}^{m} Σ_{i=1}^{n} aij xj δi ∈ F^n,

so the linear transformation T is described completely by the set of scalars aij ∈ F. It is customary and convenient to list these scalars in an array of the form

(1.46)    A = ( a11 a12 a13 · · · a1m )
              ( a21 a22 a23 · · · a2m )
              ( a31 a32 a33 · · · a3m )
              (        · · ·         )
              ( an1 an2 an3 · · · anm )

called an n × m matrix over the field F; thus A = {aij}, where aij is the entry in row i and column j. The horizontal lines of entries are called the rows of the matrix and the vertical lines are called the columns of the matrix, so the matrix can be viewed as a collection of n row vectors or alternatively as a collection of m column vectors. The vector space L(F^m, F^n) thus can be identified with the set of all n × m matrices over the field F, which in turn is just the vector space F^{nm}, since an n × m matrix is merely a collection of nm scalars; and it is evident that the addition and scalar multiplication of linear transformations in L(F^m, F^n) amount to the addition and scalar multiplication of the corresponding matrices in F^{nm}, so the identification of L(F^m, F^n) with F^{nm} actually is an isomorphism of vector spaces. It follows from this that

(1.47)    dim L(F^m, F^n) = mn.

When vectors in F^{nm} are viewed as n × m matrices over F it is customary and helpful to denote this vector space by F^{n×m}. The zero vector in F^{n×m} is described by the zero matrix, the matrix having all entries 0; it represents the trivial linear transformation that maps all vectors in F^m to the zero vector in F^n. The preceding conventions and terminology are rather rigidly followed, so they should be kept in mind and used carefully and systematically.

Equation (1.45) can be expressed alternatively as a collection of linear equations relating the coordinates xj of v and the coordinates yi of y = T(v), namely as the collection of linear equations

(1.48)    yi = Σ_{j=1}^{m} aij xj   for 1 ≤ i ≤ n,

so yi = ai1 x1 + ai2 x2 + ai3 x3 + · · · + aim xm. This system of equations often is viewed as a matrix product.

The matrix product expresses the column vector or n × 1 matrix y = {yi} as the product of the n × m matrix A and the column vector or m × 1 matrix x = {xj}, in the form

(1.49)    y = A x,

where the entry yi in row i of the product is the sum over the index j of the products of the entries aij in row i of the matrix A and the successive entries xj of the column vector x. More generally, matrices A and B can be multiplied to yield a product only when the number of columns of the matrix A is equal to the number of rows of the matrix B: the product of an l × m matrix A = {aij} and an m × n matrix B = {bij} is defined to be the l × n matrix A B = C = {cij} with the entries

(1.50)    cij = Σ_{k=1}^{m} aik bkj   for 1 ≤ i ≤ l, 1 ≤ j ≤ n;

the number of rows of the product A B is the number of rows of A, while the number of columns of the product A B is the number of columns of the matrix B. It is worth calculating a few matrix products just to become familiar with the technique. For example, in a product A B = C the first column of C is the product of the matrix A and the first column of the matrix B; in general, column j of the product matrix A B = C is the column vector cj = A bj, where bj is column j of the matrix B.

The principal significance of matrix products, though, is that the composition of linear transformations corresponds to the product of the matrices describing these linear transformations. Indeed if a linear transformation T : F^n −→ F^m is described by an m × n matrix B = {bij} and a linear transformation S : F^m −→ F^l is described by an l × m matrix A = {aij}, then for any index 1 ≤ i ≤ n

    (S ◦ T)(δi) = S( T(δi) ) = S( Σ_{k=1}^{m} bki δk ) = Σ_{k=1}^{m} bki S(δk)
                = Σ_{k=1}^{m} bki Σ_{j=1}^{l} ajk δj = Σ_{j=1}^{l} ( Σ_{k=1}^{m} ajk bki ) δj;

thus the composition S ◦ T is described by the matrix C = {cji} with the entries cji = Σ_{k=1}^{m} ajk bki, so that C = A B. For this reason the linear transformation described by a matrix A is usually denoted just by A, and A B denotes both the matrix product and the composition A ◦ B of the linear transformations described by these matrices. This convention will be followed henceforth.
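The entry formula (1.50) and the correspondence between composition and matrix product can be checked mechanically; the following sketch (Python, illustrative names) implements the product by that formula and verifies that applying A B to a vector agrees with applying B and then A.

```python
def matmul(A, B):
    """Product of an l×m matrix A and an m×n matrix B: entries c_ij = sum_k a_ik * b_kj."""
    l, m, n = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "columns of A must match rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)] for i in range(l)]

def apply(A, x):
    """The matrix-vector product y = A x of the linear system (1.48)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]
```

Here applying the product matrix agrees with composing the two mappings, the content of the composition rule C = A B.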

In extension of the definition (1.38) of the transpose of a vector, the transpose of an n × m matrix A = {aij} is defined to be the m × n matrix

(1.51)    tA = B = {bij}   where bij = aji;

thus if ai is the i-th column vector of the matrix A then tai is the i-th row vector of the matrix tA. If the matrix A describes a linear transformation A : F^m −→ F^n then the matrix tA describes a linear transformation tA : F^n −→ F^m; but note particularly that tA does not describe the inverse of the linear transformation described by A but rather something quite different altogether, for the inverse of a mapping between two vector spaces of different dimensions is not even well defined. If C = A B the entries of these matrices are related by cij = Σ_k aik bkj, and that equation can be interpreted alternatively as tC = tB tA; so transposition reverses the order of matrix multiplication.

Some further notational conventions involving matrices are widely used. A matrix can be decomposed into matrix blocks by splitting the rectangular array of its entries into subarrays. Thus for example an n × m matrix A = {aij} can be decomposed into 4 matrix blocks

          A = ( A11  A12 )
              ( A21  A22 )

where A11 is a k × k matrix, A12 is a k × (m − k) matrix, A21 is an (n − k) × k matrix and A22 is an (n − k) × (m − k) matrix. A decomposition of this form in which A12 and A21 are both zero matrices is called a direct sum decomposition, denoted by A = A11 ⊕ A22. An n × n matrix A which has the direct sum decomposition A = A11 ⊕ A22 ⊕ · · · ⊕ Ann, in which each of the component matrices Aii is a 1 × 1 matrix, just a scalar, is called a diagonal matrix, since all of its entries vanish aside from those along the main diagonal, as in

(1.52)    A = ( a1  0   0   0  )
              ( 0   a2  0   0  )
              ( 0   0   a3  0  )
              ( 0   0   0   a4 ).

Another special form of a matrix is a subdiagonal matrix, a matrix A = {aij} such that aij = 0 whenever i ≤ j, so that the only nonzero entries lie below the main diagonal, as for example

(1.53)    A = ( 0   0   0   0 )
              ( a21 0   0   0 )
              ( a31 a32 0   0 )
              ( a41 a42 a43 0 );

a superdiagonal matrix of course is defined correspondingly, as a matrix A = {aij} such that aij = 0 whenever i ≥ j.
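The reversal rule t(A B) = tB tA can likewise be verified on examples; a short sketch (Python, illustrative names):

```python
def transpose(A):
    """The m×n transpose of an n×m matrix: entry (i, j) of tA is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # entries c_ij = sum_k a_ik * b_kj, as in (1.50)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

Transposing a product and multiplying the transposes in the opposite order give the same matrix.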

A further standard convention is that an n × m matrix A can be viewed as a collection of m column vectors, written A = ( a1 a2 · · · am ) where aj ∈ F^n. The linear subspace of F^n spanned by these column vectors aj is called the column space of the matrix A, and the dimension of this subspace is called the column rank of the matrix A, denoted by crank(A). The column space of course is just the image A F^m ⊂ F^n, since A x = Σ_{j=1}^{m} xj aj for any vector x ∈ F^m; so the column rank of A is just the dimension of the image A F^m. Thus 0 ≤ crank(A) ≤ min(m, n), where min(m, n) denotes the minimum of the natural numbers m and n; and in particular crank(A) = 0 if and only if A = 0. The row space of the matrix A and its row rank rrank(A) are defined correspondingly, where of course rrank(A) = crank(tA). Actually the row rank and the column rank of any matrix are always the same, as in the following result; so the distinction between them is ordinarily ignored, and the common value is just called the rank of the matrix and denoted by rank(A).

Theorem 1.6 rrank(A) = crank(A) for any matrix A.

Proof: If A ∈ F^{n×m} has crank(A) = ν, let c1, . . . , cν ∈ F^n be a basis for the column space of the matrix A and let C = ( c1 c2 · · · cν ) be the n × ν matrix with these columns. The columns of the matrix A then can be written as linear combinations aj = Σ_{k=1}^{ν} rkj ck of the column vectors ck for some scalars rkj, forming a ν × m matrix R = {rkj}; in terms of the entries, aij = Σ_{k=1}^{ν} cik rkj, so that A = C R. The matrix equation A = C R also expresses the rows of the matrix A as linear combinations of the ν rows of the matrix R, hence rrank(A) ≤ rrank(R); and since R is a ν × m matrix necessarily rrank(R) ≤ ν, so it follows that rrank(A) ≤ ν = crank(A). The same argument applied to the transpose matrix tA, which interchanges the roles of rows and columns, shows that crank(A) ≤ rrank(A); consequently crank(A) = rrank(A) as desired, which suffices for the proof.

Theorem 1.7 (First Normal Form Theorem) If A ∈ F^{n×m} is a matrix describing a linear transformation A : F^m −→ F^n for which dim ker(A) = k, then there are matrices S and T describing isomorphisms S : F^n −→ F^n and T : F^m −→ F^m so that the matrix S A T = B = {bij} has the entries

(1.54)    bij = 1 if 1 ≤ i = j ≤ m − k,   bij = 0 otherwise,

and thus is the n × m matrix

(1.55)    B = ( I_{m−k}  0 )
              (    0     0 )

where I_{m−k} denotes the (m − k) × (m − k) identity matrix.

Proof: Let W ⊂ F^m be the kernel of the linear transformation A, so dim W = k, and set r = m − k. Choose vectors v1, . . . , vk forming a basis for W, and extend this to a basis v1, . . . , vk, vk+1, . . . , vm for the vector space F^m. The image vectors wi = A(vk+i) ∈ F^n for 1 ≤ i ≤ r are linearly independent; for if Σ_{i=1}^{r} ci A(vk+i) = 0 where not all the scalars ci vanish, then Σ_{i=1}^{r} ci vk+i ∈ W, but that vector is not in the span of the basis v1, . . . , vk for W, a contradiction. Introduce the linear mapping T : F^m −→ F^m for which T(δi) = vk+i when 1 ≤ i ≤ r and T(δr+i) = vi when 1 ≤ i ≤ k, which is clearly an isomorphism of vector spaces; extend the linearly independent vectors w1, . . . , wr to a basis w1, . . . , wn of F^n, and introduce the linear mapping S : F^n −→ F^n for which S(wi) = δi, which is also an isomorphism of vector spaces. Consider then the composition S ◦ A ◦ T : F^m −→ F^n. If 1 ≤ i ≤ r then (S ◦ A ◦ T)(δi) = S( A(vk+i) ) = S(wi) = δi, while if r + 1 ≤ i ≤ m then (S ◦ A ◦ T)(δi) = S( A(vi−r) ) = S(0) = 0. Consequently B = S A T is the matrix with the entries (1.54), which suffices for the proof.

The product of two n × n matrices is another n × n matrix, so matrix multiplication defines an algebraic operation on the space F^{n×n} of square n × n matrices. The matrix I ∈ F^{n×n} describing the identity mapping F^n −→ F^n clearly has the property that I A = A I = A for any matrix A ∈ F^{n×n}, so that matrix acts as an identity for the multiplication of matrices. Since the identity linear transformation takes the basis vectors δi to themselves, it is easy to see that the matrix I has the entries I = {δ^i_j}, the Kronecker symbols. It is often convenient to specify the size of an identity matrix, so to denote the n × n identity matrix by In; thus for instance

(1.56)    I4 = ( 1 0 0 0 )
               ( 0 1 0 0 )
               ( 0 0 1 0 )
               ( 0 0 0 1 ).

Not all matrices in F^{n×n} have multiplicative inverses. Of course the zero matrix has no inverse, but there are also nonzero matrices, such as A = ( 0 1 ; 0 0 ) ∈ F^{2×2} for which A · A = 0, and these also can have no inverses. However the n × n matrices describing isomorphisms of the vector space F^n clearly do have inverses, so the set of such matrices forms a group under multiplication, called the general linear group over the field F and denoted by Gl(n, F). Any subset G ⊂ Gl(n, F) with the property that A B ∈ G whenever A, B ∈ G, provided that I ∈ G and that for any matrix A ∈ G there is a matrix A^{−1} ∈ G such that A A^{−1} = I, clearly forms a group under multiplication; groups of this form are known as linear groups over the field F.

An interesting special example of a linear group is the subgroup of permutation matrices, the matrices describing those linear transformations that permute or rearrange the variables (x1, x2, . . . , xn) in F^n for any field F. Any such transformation takes the standard basis δ1, δ2, . . . , δn of F^n to the basis formed by some rearrangement or permutation of these vectors, such as δj1, δj2, . . . , δjn, where j1, j2, . . . , jn is a rearrangement of the first n natural numbers; this linear transformation is described by the matrix

(1.57)    ∆j1,j2,...,jn = ( δj1 δj2 · · · δjn ).

The group of permutation matrices is denoted by Sn. To count the number of matrices in Sn, note that for any permutation matrix ( δj1 δj2 · · · δjn ) there are n possible choices for the vector δj1; there then remain n − 1 possible choices for a distinct vector δj2, and continuing this process shows that the number of permutation matrices is the product n(n − 1)(n − 2) · · · 1, a natural number traditionally denoted by n!. The permutation matrices thus can be characterized as those matrices having the entry 1 exactly once in each row and in each column and all other entries 0.
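Theorem 1.6 is easy to observe computationally: row-reducing a matrix and row-reducing its transpose always yield the same rank. A sketch (Python, exact Fraction arithmetic, illustrative names):

```python
from fractions import Fraction

def rank(rows):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]
```

Here rank(A) computes the row rank, while rank(transpose(A)) computes the column rank; by Theorem 1.6 the two always agree.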

The group Sn of permutation matrices is isomorphic to the symmetric group S(x1, . . . , xn) of permutations of a set of n elements. To the usual order of the variables there can be associated the polynomial P(x) = Π_{i<j} (xi − xj), a polynomial of degree n(n − 1)/2 consisting of the product of the differences of any two distinct variables where the index of the first variable is less than the index of the second variable; the square P(x)^2 of this polynomial thus consists of the product of the differences between all pairs of distinct variables, where each difference is counted twice, corresponding to the two orders of the indices. After a permutation π of the indices the resulting polynomial π*P(x) still consists of a product of the differences of any two distinct variables, but possibly with a reversal of the signs of some of the differences; consequently π*P(x) = ±P(x), where the sign is called the sign of the permutation π and is denoted by sgn(π). A permutation π is even if sgn(π) = +1 and is odd if sgn(π) = −1.

Theorem 1.8 (i) sgn(π) = −1 for the permutation π ∈ S(x1, . . . , xn) that interchanges two variables xi and xj.
(ii) sgn(π1 ◦ π2 ◦ · · · ◦ πk) = sgn(π1) · sgn(π2) · · · sgn(πk) for any permutations π1, π2, . . . , πk, each of which interchanges two variables.
(iii) Any permutation can be written as a product of permutations that interchange two variables.
(iv) sgn(π1 π2) = sgn(π1) · sgn(π2) for any two permutations π1, π2.

Proof: (i) A permutation π that interchanges the variables xi and xj, where i < j, does not change the sign of any factor of the polynomial P(x) that involves neither the variable xi nor the variable xj. If k < i < j or i < j < k the product (xi − xk)(xj − xk) is unchanged by the permutation, and if i < k < j the product (xi − xk)(xk − xj) is replaced by (xj − xk)(xk − xi), so the sign of that product also is unchanged. Thus the only effect of the permutation π on the sign of the polynomial P(x) arises from the change in the sign of the factor (xi − xj); hence π*P(x) = −P(x), so sgn(π) = −1.
(ii) If π1, π2 are permutations that each interchange a pair of variables, the effect of π1 is to replace the polynomial P(x) by the polynomial P1(x) = −P(x), by the result of part (i); following this by the permutation π2 replaces the polynomial P1(x) by another polynomial P2(x) for which P2(x) = −P1(x), again by the result of part (i). The composition is just the permutation π2 ◦ π1, hence sgn(π2 ◦ π1) = sgn(π1) · sgn(π2). The argument can be continued inductively, leading to the stated result for any permutations π1, π2, . . . , πk, each of which interchanges two variables.
(iii) Suppose a permutation π replaces the variables x1, x2, . . . , xn by a rearranged set xj1, xj2, . . . , xjn. Composing the permutation π with the permutation π1,j1 that interchanges the variables x1 and xj1 replaces the variables xj1, xj2, . . . , xjn by x1, xk2, . . . , xkn; composing this permutation with the permutation π2,k2 replaces the variables x1, xk2, . . . , xkn by x1, x2, xl3, . . . , xln; and so on. Thus the composition of the permutation π with a succession of permutations which interchange two variables leads to the identity permutation, so π is the product of the inverses of these permutations, and each such inverse is itself a permutation interchanging two variables.
(iv) Any permutations π1 and π2 can be written as products of permutations that interchange two variables, by the result of part (iii); hence the desired result follows from the result of part (ii), and that suffices for the proof.
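The sign of a permutation can be computed by counting inverted pairs of indices, which is exactly the number of factors of P(x) whose sign the permutation reverses, and the multiplicativity of part (iv) of Theorem 1.8 can then be checked exhaustively for small n. A sketch (Python, illustrative names):

```python
from itertools import permutations

def sgn(p):
    """Sign of a permutation p of 0, ..., n−1: parity of the number of pairs
    i < j with p[i] > p[j], i.e. of the factors of P(x) whose sign is reversed."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def compose(p, q):
    """(p ∘ q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))
```

A single interchange comes out odd, and the sign is multiplicative on all of S4.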

Closely related to the preceding considerations are multilinear mappings, defined as mappings T : V^k −→ W from the Cartesian product V^k of k copies of a vector space V over the field F to another vector space W over F that are linear mappings V −→ W on each separate factor V; thus a multilinear mapping T : V^k −→ W associates to any k vectors vi ∈ V a vector T(v1, v2, . . . , vk) ∈ W, where T(v1, . . . , vk) is a linear function of the vector vj ∈ V whenever the remaining vectors v1, . . . , vj−1, vj+1, . . . , vk are held fixed. A multilinear mapping T : V^k −→ W is an alternating mapping if

(1.58)    T(vj1, vj2, . . . , vjk) = sgn( j1 j2 · · · jk ; 1 2 · · · k ) T(v1, v2, . . . , vk)

for any rearrangement j1, j2, . . . , jk of the numbers 1, 2, . . . , k, where sgn( j1 · · · jk ; 1 · · · k ) denotes the sign of the permutation taking the indices 1, 2, . . . , k to j1, j2, . . . , jk. A very important and extremely useful invariant of square matrices is closely related to multilinear mappings.

Theorem 1.9 For any field F and any numbers k, n there are alternating multilinear mappings

(1.59)    Φ : F^n × · · · × F^n −→ F   (k factors)

from k copies of the vector space F^n to the field F, and all such mappings differ by a constant factor.

Proof: If Φ is an alternating multilinear mapping as in (1.59) and the vectors vi ∈ F^n are written in terms of the basis vectors δj as vi = Σ_{j=1}^{n} aij δj, then since Φ is multilinear it follows that

    Φ(v1, . . . , vk) = Φ( Σ_{j1=1}^{n} a1j1 δj1 , . . . , Σ_{jk=1}^{n} akjk δjk )
                      = Σ_{j1,...,jk=1}^{n} a1j1 · · · akjk Φ(δj1, . . . , δjk),

and since Φ also is assumed to be alternating,

    Φ(δj1, . . . , δjk) = sgn( j1 · · · jk ; 1 · · · k ) Φ(δ1, . . . , δk),

where the sign factor is to be interpreted as 0 whenever two of the indices j1, . . . , jk coincide, since an alternating mapping changes sign when two equal arguments are interchanged and hence vanishes on such arguments. Thus if there is an alternating multilinear mapping (1.59) then it must have the form

(1.60)    Φ(v1, . . . , vk) = C Σ_{j1,...,jk=1}^{n} sgn( j1 · · · jk ; 1 · · · k ) a1j1 · · · akjk,

where C = Φ(δ1, . . . , δk). Conversely the mapping Φ : F^n × · · · × F^n −→ F defined by (1.60) clearly is an alternating multilinear mapping, and that suffices for the proof.

In particular the determinant of an n × n matrix A = {aij} over a field F is defined by

(1.61)    det A = Φ(a1, . . . , an) = Σ_{j1,...,jn=1}^{n} sgn( j1 · · · jn ; 1 · · · n ) a1j1 · · · anjn,

where a1, . . . , an are the column vectors of the matrix A = {aij} and the mapping Φ : F^n × · · · × F^n −→ F is the alternating multilinear mapping of the preceding theorem, normalized so that

(1.62)    det I = 1 for the identity matrix I,

that is, so that Φ(δ1, . . . , δn) = 1. Thus the determinant is a uniquely defined mapping det : F^{n×n} −→ F from the vector space of n × n matrices over the field F to that field. The basic properties of the determinant follow quite directly from its definition.

Theorem 1.10 The determinant mapping over a field F satisfies
(i) det(A B) = det A · det B for any n × n matrices A and B, and
(ii) det A ≠ 0 if and only if A is a linear isomorphism A : F^n −→ F^n.

Proof: (i) Note first that the expansion (1.61) is equally an expansion in terms of the rows of a matrix, since replacing each permutation by its inverse merely reorders the factors of each term without changing the sign; so the determinant may be viewed as an alternating multilinear function Φ of the n rows of a matrix as well. If A and B are n × n matrices, the rows vi of the product A B can be written in terms of the rows bj of the matrix B as vi = Σ_{j=1}^{n} aij bj, and since the mapping Φ is multilinear and alternating it follows that

    det A B = Φ(v1, . . . , vn) = Φ( Σ_{j1=1}^{n} a1j1 bj1 , . . . , Σ_{jn=1}^{n} anjn bjn )
            = Σ_{j1,...,jn=1}^{n} a1j1 · · · anjn Φ(bj1, . . . , bjn)
            = Σ_{j1,...,jn=1}^{n} a1j1 · · · anjn sgn( j1 · · · jn ; 1 · · · n ) Φ(b1, . . . , bn)
            = det A · det B.
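The defining expansion (1.61) can be transcribed directly, summing over all permutations with their signs, and for small matrices this also lets one check the product rule det(A B) = det A · det B on examples. A sketch (Python, illustrative names; the permutation sum has n! terms, so this is a demonstration of the definition rather than a practical algorithm):

```python
from itertools import permutations

def sgn(p):
    n = len(p)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    """Determinant by the permutation expansion: sum of sgn(j1...jn) * a_{1 j1} ... a_{n jn}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
```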

that det A · det A^{-1} = det I = 1, and consequently det A ≠ 0. Conversely if A is an n × n matrix for which det A ≠ 0, it follows from the First Normal Form Theorem 1.3.7 that there are matrices S, T that are linear isomorphisms from the vector space F^n to itself such that SAT = B, where

   B = ( I_k  0 ; 0  0 )

for some number k with 0 ≤ k ≤ n. Since the matrices S, A, T all have nonzero determinants it follows from part (i) that det B ≠ 0; however if k < n then one of the columns of the matrix B is a zero vector, and since the determinant is a multilinear function of the columns of B that means that det B = 0, a contradiction. Consequently k = n, so B = I, the identity matrix; and since A = S^{-1} B T^{-1} is a product of matrices describing linear isomorphisms, it follows that the matrix A also describes a linear isomorphism, thereby concluding the proof.

The special case of vector spaces over the field R of real numbers will be the focus of much of the subsequent analytical discussion, both for its later application and as an example of the preceding perhaps more abstract discussion, and it is worth examining some of the special properties of this case. By definition an algebra over the real numbers is a real vector space A together with an operation that associates to any two vectors v_1, v_2 ∈ A their product v_1 · v_2 ∈ A such that
(i) multiplication of vectors is associative and has an identity element I ∈ A;
(ii) the distributive law v · (v_1 + v_2) = v · v_1 + v · v_2 holds for any vectors v, v_1, v_2 ∈ A;
(iii) scalar multiplication and the multiplication of vectors are related by c(v_1 · v_2) = (cv_1) · v_2 = v_1 · (cv_2) for all c ∈ R and v_1, v_2 ∈ A.
The properties of multiplication of vectors are just those of a ring, except that multiplication is not assumed to be commutative; so an algebra can be characterized as a vector space that is also a ring under an operation of multiplication of vectors and the usual addition of vectors, where scalar multiplication and multiplication of vectors are related by (iii) above. An algebra for which the multiplication is commutative is called a commutative algebra. Just as in the case of rings, the zero vector 0 plays a special role in multiplication in an algebra, so that 0 · v = v · 0 = 0 for any vector v ∈ A, with the same proof as in the preceding examination of rings. The dimension of an algebra A is defined to be its dimension as a vector space. The ring R itself is a one-dimensional real algebra, and obviously is the only example. The vector space R^{n×n} under multiplication of vectors is an algebra, which of course is not commutative. The set of all polynomials is another commutative algebra, although not a finite dimensional algebra, and other infinite dimensional algebras will be discussed at a later point. It is not difficult to describe all 2-dimensional commutative real algebras. Indeed suppose that A is a commutative real algebra with dim(A) = 2, and choose a basis v_1, v_2 for the vector space A so that v_1 is the identity for multiplication.
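The permutation-sum definition (1.61) and the product rule of Theorem 1.3.10 are easy to spot-check numerically. The following sketch (plain Python; the helper names are mine, not the text's, and the sign of a permutation is counted by inversions) is an illustration only, not part of the development:

```python
from itertools import permutations
from math import prod

def sgn(p):
    # sign of the permutation p, computed by counting inversions
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(A):
    # the permutation-sum definition (1.61) of the determinant
    n = len(A)
    return sum(sgn(p) * prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
B = [[1, 0, 2], [0, 1, 1], [3, 1, 0]]
assert det(I) == 1                           # the normalization (1.62)
assert det(matmul(A, B)) == det(A) * det(B)  # part (i) of Theorem 1.3.10
```

Of course a check on a few sample matrices proves nothing; it merely illustrates the theorem.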

The multiplication table in terms of this basis then must be of the form

   A :  v_1 · v_1 = v_1,  v_1 · v_2 = v_2 · v_1 = v_2,  v_2 · v_2 = x_1 v_1 + x_2 v_2  for some x_1, x_2 ∈ R.

This multiplication table can be simplified by replacing v_2 by v_2' = v_2 − t v_1 for a suitable real number t ∈ R; for any choice of t ∈ R the vectors v_1, v_2' are another basis for A, and by the distributive law

   v_2' · v_2' = (v_2 − t v_1) · (v_2 − t v_1) = v_2 · v_2 − 2t v_1 · v_2 + t^2 v_1 · v_1
              = (x_1 v_1 + x_2 v_2) − 2t v_2 + t^2 v_1
              = (x_1 + t^2) v_1 + (x_2 − 2t) v_2
              = (x_1 + t^2) v_1 + (x_2 − 2t)(v_2' + t v_1)
              = (x_1 + t x_2 − t^2) v_1 + (x_2 − 2t) v_2',

so if 2t = x_2 it follows that v_2' · v_2' = y v_1 for some y ∈ R. For a further simplification replace the vector v_2' by v_2'' = s v_2' for another real number s ∈ R, so that v_2'' · v_2'' = s^2 y v_1. The real number y can be either 0 or positive or negative; if y ≠ 0 it is possible to choose s so that s^2 y = 1 if y > 0 and s^2 y = −1 if y < 0, assuming that any positive real number can be written as the square of a positive real number, which is the case but which has not yet been demonstrated here. Therefore after relabeling the vectors v_1 and v_2 the multiplication table for the algebra takes the simpler form

(1.63)   v_1 · v_1 = v_1,  v_1 · v_2 = v_2 · v_1 = v_2,  v_2 · v_2 = ε v_1,

where ε is either 0 or 1 or −1.

For the case ε = 0 the 2 × 2 real matrices

(1.64)   v_1 = ( 1  0 ; 0  1 ),   v_2 = ( 0  1 ; 0  0 )

obviously satisfy (1.63); the vector space spanned by these two matrices is

(1.65)   A_0 = { ( a  b ; 0  a ) : a, b ∈ R },

and a straightforward calculation shows that this set of matrices is closed under multiplication and forms a commutative real algebra. For the case ε = 1 the 2 × 2 real matrices

(1.66)   v_1 = ( 1  0 ; 0  1 ),   v_2 = ( 0  1 ; 1  0 )

clearly satisfy (1.63); the vector space spanned by these two matrices is

(1.67)   A_1 = { ( a  b ; b  a ) : a, b ∈ R },

and a straightforward calculation shows that this set of matrices is closed under multiplication and forms a commutative real algebra. For the case ε = −1 the 2 × 2 real matrices

(1.68)   v_1 = ( 1  0 ; 0  1 ),   v_2 = ( 0  1 ; −1  0 )

obviously satisfy (1.63); the vector space spanned by these two matrices is

(1.69)   A_{−1} = { ( a  b ; −b  a ) : a, b ∈ R },

and a straightforward calculation shows that this set of matrices also is closed under multiplication and forms a commutative real algebra. In (1.68) there is a choice whether to use v_2 or −v_2 as the second basis vector; whichever is chosen clearly leads to the same algebra (1.69) and the same algebraic operations. It is also easy to see that these three real commutative algebras are distinct algebras, no two of which are isomorphic. Indeed there are elements in A_0 the squares of which are zero, as for instance v_2 · v_2 = 0. In A_1 however (x_1 v_1 + x_2 v_2) · (x_1 v_1 + x_2 v_2) = (x_1^2 + x_2^2) v_1 + 2 x_1 x_2 v_2 for any real x_1, x_2, so nonzero elements always have nonzero squares, but (v_1 + v_2) · (v_1 − v_2) = 0, so there are nonzero elements having products 0. Finally if (x_1 v_1 + x_2 v_2) ∈ A_{−1} then (x_1 v_1 + x_2 v_2) · (x_1 v_1 − x_2 v_2) = (x_1^2 + x_2^2) v_1 for any real x_1, x_2, so any nonzero element (x_1 v_1 + x_2 v_2) ∈ A_{−1} has the multiplicative inverse (x_1 v_1 + x_2 v_2)^{−1} = (x_1^2 + x_2^2)^{−1} (x_1 v_1 − x_2 v_2), showing that A_{−1} actually is a field. The field A_{−1} is called the field of complex numbers and is denoted by C; alas, there are no other finite dimensional real algebras that are fields^8. The world is in some ways not as interesting as it might be. The basis vector v_1 is customarily identified with the real identity element 1, and a product x v_1 ∈ C is identified with the real number x viewed as being imbedded in C; since (x v_1) · (y v_1) = (xy) v_1 the imbedding R ⊂ C is compatible with the algebraic operations in the fields R and C. The basis vector v_2 is usually denoted by i, and the products y v_2 = yi for y ∈ R are known for historic reasons as imaginary numbers. The elements of the field C, the complex numbers, thus are written z = x + iy, where x = Re(z) ∈ R is called the real part of the complex number z and y = Im(z) ∈ R is called the imaginary part of the complex number z. Corresponding to the choice in (1.68) of v_2 or −v_2, it is possible to associate to any complex number z = x + iy ∈ C another complex number z̄ = x − iy, called the complex conjugate of z; the mapping z −→ z̄ is a field isomorphism, since it is readily verified that the conjugate of z_1 + z_2 is z̄_1 + z̄_2 and the conjugate of z_1 · z_2 is z̄_1 · z̄_2. It is worth noting that obviously there is no nontrivial isomorphism of the field R with itself, exhibiting an interesting difference between the two fields R and C.

^8 See the Princeton Companion to Mathematics.
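The identification of A_{−1} with the complex numbers can be made concrete. The sketch below (helper names are mine; Python's built-in complex type serves as the reference) checks that the matrices of (1.69) multiply exactly the way complex numbers do, and commute:

```python
def as_matrix(z):
    # a + bi  <->  ( a  b ; -b  a ), an element of the algebra A_{-1} of (1.69)
    a, b = z.real, z.imag
    return [[a, b], [-b, a]]

def mul2(A, B):
    # multiplication of 2 x 2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

z, w = 1 + 2j, 3 - 1j
assert mul2(as_matrix(z), as_matrix(w)) == as_matrix(z * w)        # products agree
assert mul2(as_matrix(z), as_matrix(w)) == mul2(as_matrix(w), as_matrix(z))  # commutativity
```

So the field operations of C can be carried out entirely inside this algebra of 2 × 2 matrices.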

Chapter 2

Topological Fundamentals

2.1 Normed Vector Spaces

The geometric as distinct from the algebraic properties of vector spaces involve the notion of the length or size of a vector. Formally a norm on a real vector space V is a mapping V −→ R that associates to any vector x ∈ V a real number ||x|| ∈ R, the norm of the vector x, with the following properties:
(i) positivity: ||x|| ≥ 0 and ||x|| = 0 if and only if x = 0;
(ii) homogeneity: ||cx|| = |c| ||x|| for any c ∈ R;
(iii) the triangle inequality: ||x + y|| ≤ ||x|| + ||y||.
For the 1-dimensional real vector space R the norm of a number x ∈ R, usually denoted by |x| and called the absolute value of x, is defined by

(2.1)   |x| = x if x ≥ 0,  |x| = −x if x ≤ 0.

That the absolute value satisfies the conditions required of a norm is quite apparent; and it is clear that aside from a constant factor the absolute value is the only norm on the vector space R^1, since for any norm ||x|| on R^1 it follows from homogeneity that ||x|| = ||x · 1|| = |x| ||1||. However for general vector spaces there are a variety of norms in quite common use; in particular there are the following three norms on the vector space R^n of dimension n > 1:
(i) the ℓ1 or mean norm of a vector x = {x_j} is defined by ||x||_1 = Σ_{j=1}^n |x_j|;
(ii) the ℓ2 or Cartesian norm or Euclidean norm of a vector x = {x_j} is defined by ||x||_2 = ( Σ_{j=1}^n x_j^2 )^{1/2} = √(x_1^2 + ··· + x_n^2), with the non-negative square root;
(iii) the ℓ∞ or supremum norm or sup norm of a vector x = {x_j} is defined by ||x||_∞ = max_{1≤j≤n} |x_j|.
That the ℓ1 norm satisfies the properties of a norm is clear since the absolute

value of a real number is a norm. That the ℓ∞ norm satisfies the properties of a norm also is clear; if x, y ∈ R^n then since |x_j| ≤ ||x||_∞ and |y_j| ≤ ||y||_∞ for 1 ≤ j ≤ n it follows that |x_j + y_j| ≤ |x_j| + |y_j| ≤ ||x||_∞ + ||y||_∞ for 1 ≤ j ≤ n, and consequently ||x + y||_∞ = max_{1≤j≤n} |x_j + y_j| ≤ ||x||_∞ + ||y||_∞. That the ℓ2 norm satisfies these three properties also is obvious from the definition, except perhaps for the triangle inequality. It is convenient to demonstrate that inequality by first demonstrating an inequality involving the inner product of two vectors, which is defined by

(2.2)   (x, y) = Σ_{j=1}^n x_j y_j for vectors x = {x_j} and y = {y_j} ∈ R^n;

often the inner product is written (x, y) = x · y and called the dot product of the two vectors x and y. It is clear from its definition that the inner or dot product has the following properties:
(i) linearity: (c_1 x_1 + c_2 x_2, y) = c_1 (x_1, y) + c_2 (x_2, y) for any c_1, c_2 ∈ R;
(ii) symmetry: (x, y) = (y, x);
(iii) positivity: (x, x) ≥ 0 and (x, x) = 0 if and only if x = 0.
It is apparent from symmetry that the inner product (x, y) is also a linear function of the vector y. The ℓ2 norm can be defined in terms of the inner product by

(2.3)   ||x||_2 = √(x, x) (with the non-negative square root).

Conversely the inner product can be defined in terms of the ℓ2 norm by

(2.4)   (x, y) = (1/4)||x + y||_2^2 − (1/4)||x − y||_2^2,

since ||x + y||_2^2 − ||x − y||_2^2 = (x + y, x + y) − (x − y, x − y) = [ (x, x) + 2(x, y) + (y, y) ] − [ (x, x) − 2(x, y) + (y, y) ] = 4(x, y); equation (2.4) is called the polarization identity.

Theorem 2.1 For any vectors x, y ∈ R^n
(i) |(x, y)| ≤ ||x||_2 ||y||_2, and this is an equality if and only if the two vectors are linearly dependent;
(ii) ||x + y||_2 ≤ ||x||_2 + ||y||_2, and this is an equality if and only if one of the two vectors is a non-negative multiple of the other.

Proof: If x, y ∈ R^n where y ≠ 0 and if t ∈ R, introduce the continuous function f(t) of the variable t defined by

(2.5)   f(t) = ||x + ty||_2^2 = Σ_{j=1}^n (x_j + t y_j)^2 = Σ_{j=1}^n x_j^2 + 2t Σ_{j=1}^n x_j y_j + t^2 Σ_{j=1}^n y_j^2 = ||x||_2^2 + 2t(x, y) + t^2 ||y||_2^2.

It is no doubt familiar that a real quadratic polynomial f(t) = at^2 + bt + c for a > 0 is strictly positive for large values of |t|; it is strictly positive for all t if b^2 − 4ac < 0; it has a single root at t = −b/(2a) if b^2 − 4ac = 0, and is strictly positive for other values of t; and it has two roots, and is strictly negative for values of t between these roots, if b^2 − 4ac > 0. The function (2.5), the quadratic polynomial for which a = ||y||_2^2, b = 2(x, y), c = ||x||_2^2, by definition is never negative; hence

   0 ≥ b^2 − 4ac = 4[ (x, y)^2 − ||x||_2^2 ||y||_2^2 ],

so (x, y)^2 ≤ ||x||_2^2 ||y||_2^2 and consequently |(x, y)| ≤ ||x||_2 ||y||_2, which is the inequality (i). This inequality is an equality if and only if b^2 − 4ac = 0, which is the case that f(t_0) = 0 where t_0 = −b/(2a); but from (2.5) that means that x + t_0 y = 0, hence that the vectors x and y are linearly dependent. That suffices for the proof.

The very useful inequality (i) in the preceding theorem, called the Cauchy-Schwarz inequality, can be written

(2.6)   |ψ| ≤ 1 where ψ = (x, y) / ( ||x||_2 ||y||_2 ), if x ≠ 0, y ≠ 0.
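Both the polarization identity (2.4) and the Cauchy-Schwarz inequality are easy to spot-check on sample vectors; the following sketch (illustrative only, with my own helper names) assumes nothing beyond the definitions above:

```python
def dot(x, y): return sum(a * b for a, b in zip(x, y))
def norm2(x): return dot(x, x) ** 0.5

x, y = [1.0, 2.0, -1.0], [3.0, 0.5, 2.0]

# polarization identity (2.4): (x, y) = (1/4)||x + y||^2 - (1/4)||x - y||^2
plus  = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]
assert abs(dot(x, y) - (dot(plus, plus) - dot(minus, minus)) / 4) < 1e-12

# Cauchy-Schwarz inequality, with equality for linearly dependent vectors
assert abs(dot(x, y)) <= norm2(x) * norm2(y)
z = [2.0 * t for t in x]
assert abs(abs(dot(x, z)) - norm2(x) * norm2(z)) < 1e-9
```

The last assertion illustrates the equality case of the theorem, up to floating-point rounding.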

Two vectors x and y are said to be parallel and in the same direction if ψ = 1, parallel and in the opposite direction if ψ = −1, and orthogonal or perpendicular if ψ = 0. The geometrical interpretation of the parameter ψ makes the ℓ2 norm particularly useful in many applications.

There are a great number of inequalities such as the Cauchy-Schwarz inequality that play significant roles throughout mathematics. Another such inequality is related to the ℓ1 norm, which sometimes is called the mean norm since (1/n)||x||_1 is the mean or average value of the n coordinates x_j of any vector x = {x_j} ∈ R^n; the mean or average often also is called the arithmetic mean of the n real numbers x_j. The multiplicative analogue of the arithmetic mean is the geometric mean, defined by the value (|x_1||x_2| ··· |x_n|)^{1/n}.

Theorem 2.2 (Inequality of the arithmetic and geometric means) For any real numbers x_j ∈ R the arithmetic mean

(2.7)   µ = (1/n)( |x_1| + |x_2| + ··· + |x_n| )

is related to the geometric mean by the inequality

(2.8)   |x_1 x_2 ··· x_n| ≤ µ^n.

Proof: It is clearly sufficient to prove the result under the additional hypothesis that x_j > 0 for all j, since the result holds trivially if any variable is 0 and the absolute values of the variables are all nonnegative. The result of the theorem also holds clearly if x_1 = x_2 = ··· = x_n. Suppose then that x_1 > x_2, where x_1 = max x_j and x_2 = min x_j. If the variables x_1 and x_2 are replaced by x_1' and x_2' where

   x_1' = x_2' = (x_1 + x_2)/2

then the value of µ is unchanged, but since

   x_1' x_2' = (1/4)(x_1 + x_2)^2 = (1/4)(x_1 − x_2)^2 + x_1 x_2 > x_1 x_2

this change increases the value of the product while µ^n remains the same. The process can be repeated so long as not all of the variables x_j are equal, increasing the value of the product at each iteration. After finitely many repetitions all the variables are equal and the inequality holds at the end, which shows that the inequality held from the beginning and thereby concludes the proof.

There is a useful notion of equivalence of norms ||x||_a and ||x||_b on a real vector space V, defined by

(2.9)   ||x||_a ∼ ||x||_b if there are real numbers c_a > 0, c_b > 0 such that ||x||_a ≤ c_a ||x||_b and ||x||_b ≤ c_b ||x||_a for all x ∈ V.

That this is actually an equivalence relation is quite straightforward. Indeed reflexivity and symmetry are clear; as for transitivity, if ||x||_a ∼ ||x||_b there are positive real numbers c_a, c_b such that ||x||_a ≤ c_a ||x||_b and ||x||_b ≤ c_b ||x||_a, while if ||x||_b ∼ ||x||_c there are positive real numbers k_b, k_c such that ||x||_b ≤ k_b ||x||_c and ||x||_c ≤ k_c ||x||_b; hence ||x||_a ≤ c_a k_b ||x||_c and ||x||_c ≤ k_c c_b ||x||_a, so that ||x||_a ∼ ||x||_c. The three norms on R^n defined previously here have quite different definitions and describe rather different geometries, but they are equivalent norms, as indicated in Figure 2.1.
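A numerical sketch of Theorem 2.2 and of the averaging step used in its proof (illustrative only; the sample values and helper names are mine):

```python
def arithmetic_mean(xs): return sum(xs) / len(xs)

def product(xs):
    p = 1.0
    for t in xs:
        p *= t
    return p

xs = [1.0, 4.0, 2.0, 8.0]
mu = arithmetic_mean(xs)
assert product(xs) <= mu ** len(xs)   # the inequality (2.8) for positive x_j

# the averaging step of the proof: replacing the extreme entries by their mean
# leaves mu unchanged and increases the product
ys = sorted(xs)
ys[0] = ys[-1] = (ys[0] + ys[-1]) / 2
assert arithmetic_mean(ys) == mu
assert product(ys) > product(xs)
```

Iterating the step drives the product up toward µ^n, which is the mechanism of the proof.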

Figure 2.1: The sets { x ∈ R^2 : ||x|| ≤ 1 } for the ℓ1, ℓ2 and ℓ∞ norms on R^2.

Theorem 2.3 For any vector x ∈ R^n

(2.10)   ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞ and ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞;

consequently the ℓ1, ℓ2 and ℓ∞ norms are equivalent to one another.

Proof: First ||x||_1 = Σ_{j=1}^n |x_j| ≥ |x_{j_0}| for any particular index j_0, so ||x||_1 ≥ ||x||_∞; and similarly ||x||_2^2 = Σ_{j=1}^n |x_j|^2 ≥ |x_{j_0}|^2 for any particular index j_0, so ||x||_2 ≥ ||x||_∞. On the other hand ||x||_1 = Σ_{j=1}^n |x_j| ≤ Σ_{j=1}^n ||x||_∞ = n ||x||_∞, and similarly ||x||_2^2 = Σ_{j=1}^n x_j^2 ≤ Σ_{j=1}^n ||x||_∞^2 = n ||x||_∞^2, hence ||x||_2 ≤ √n ||x||_∞. That demonstrates (2.10), which shows that ||x||_1 ∼ ||x||_∞ and ||x||_2 ∼ ||x||_∞, and consequently by symmetry and transitivity ||x||_1 ∼ ||x||_2, which suffices for the proof.

Another sort of norm is defined on the vector space L(V, W) of linear transformations between two normed vector spaces V and W; but since this norm depends explicitly on the norms chosen in the vector spaces V and W it is useful to modify the notation to indicate the norms explicitly, writing L(V, W, ℓ_V, ℓ_W) for the space of linear transformations between V and W when these vector spaces have ℓ_V and ℓ_W norms. The operator norm ||T||_o then is defined by

(2.11)   ||T||_o = sup_{x ∈ V, x ≠ 0} ||T(x)||_W / ||x||_V

for any linear transformation T ∈ L(V, W, ℓ_V, ℓ_W). This norm takes the value ||T||_o = ∞ if the quotients ||T(x)||_W / ||x||_V are not bounded above and consequently the supremum is not a well defined real number; the linear transformations T for which the norm ||T||_o is a well defined real number are called bounded linear transformations, for quite obvious reasons. As will be seen as the discussion proceeds, any linear transformation between finite dimensional vector spaces is a bounded linear transformation, but that is not the case for some quite naturally defined linear transformations between infinite dimensional vector spaces. That the operator norm actually is a norm for bounded linear transformations is clear: it is obviously positive and homogeneous, and if S, T ∈ L(V, W, ℓ_V, ℓ_W) are bounded linear transformations then

   ||(S + T)(x)||_W / ||x||_V ≤ ||S(x)||_W / ||x||_V + ||T(x)||_W / ||x||_V for all x ∈ V,

hence ||S + T||_o ≤ ||S||_o + ||T||_o, so the triangle inequality holds. Linear transformations A ∈ L(R^m, R^n) are described by matrices A ∈ R^{n×m}, which are just vectors in R^{nm}, so the ℓ1, ℓ2 and ℓ∞ norms are well defined on these matrices just as for any vectors; for instance

.

for any matrix A = {a_ij} ∈ R^{n×m} the ℓ∞ norm is

   ||A||_∞ = max_{1≤i≤n, 1≤j≤m} |a_ij|.

For any vector x ∈ R^m and any index 1 ≤ i ≤ n

   | Σ_{j=1}^m a_ij x_j | ≤ Σ_{j=1}^m |a_ij| |x_j| ≤ Σ_{j=1}^m ||A||_∞ ||x||_∞ = m ||A||_∞ ||x||_∞,

so

(2.12)   ||Ax||_∞ = max_{1≤i≤n} | Σ_{j=1}^m a_ij x_j | ≤ m ||A||_∞ ||x||_∞.

Thus a matrix A as a linear transformation A ∈ L(R^m, R^n, ℓ∞, ℓ∞) is a bounded linear transformation, and its operator norm has the bound

(2.13)   ||A||_o ≤ m ||A||_∞ for A ∈ L(R^m, R^n, ℓ∞, ℓ∞).

On the other hand for any A ∈ L(R^m, R^n, ℓ∞, ℓ∞) it follows from the definition of the operator norm that for any basis vector δ^j ∈ R^m

   ||A||_o ≥ ||A δ^j||_∞ / ||δ^j||_∞ = ||A δ^j||_∞ = max_{1≤i≤n} |a_ij|,

hence conversely

(2.14)   ||A||_o ≥ ||A||_∞ for A ∈ L(R^m, R^n, ℓ∞, ℓ∞).

It then follows from (2.13) and (2.14) that the operator norm on a linear transformation A ∈ L(R^m, R^n, ℓ∞, ℓ∞) is equivalent to the ℓ∞ norm on A, and hence to the ℓ1 and ℓ2 norms on A. Of course there are comparable bounds for the equivalent norms ||x||_1 and ||x||_2 on either R^m or R^n.

In many situations the particular norm is not terribly relevant, since any equivalent norm could be used; if that is the case, and if it is not necessary to specify just which norm is involved, the notation ||x|| will be used, meaning any one of a set of equivalent norms where that set of norms is either specified or understood. Some care must be taken, of course, for in some calculations the precise relations between the norms are significant; in particular ||x|| should have the same meaning in any single equation, or usually throughout any single proof. In particular when considering the vector spaces R^m or the vector spaces L(R^m, R^n) the notation ||x|| stands for any of the various equivalent norms.
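The comparisons (2.10) and the operator-norm bounds (2.13)–(2.14) can be spot-checked on samples. The sketch below also uses the standard closed form — not derived in the text — that for the sup norms on R^m and R^n the operator norm of a matrix equals its maximal absolute row sum:

```python
# the inequalities (2.10) on a sample vector
x = [2.0, -5.0, 1.0, 3.0]
n = len(x)
n1   = sum(abs(t) for t in x)
n2   = sum(t * t for t in x) ** 0.5
ninf = max(abs(t) for t in x)
assert ninf <= n1 <= n * ninf
assert ninf <= n2 <= n ** 0.5 * ninf

# the bounds (2.13) and (2.14) on a sample matrix A : R^3 -> R^2
A = [[1.0, -2.0, 3.0], [0.5, 4.0, -1.0]]
m = len(A[0])
a_inf  = max(abs(a) for row in A for a in row)       # the entrywise norm ||A||_inf
op_inf = max(sum(abs(a) for a in row) for row in A)  # ||A||_o for the sup norms (max row sum)
assert a_inf <= op_inf <= m * a_inf
```

Here the operator norm 6.0 sits between ||A||_∞ = 4.0 and m ||A||_∞ = 12.0, as (2.13) and (2.14) require.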

y 0 ) ≤ ρ(x0 . y) = ρ(y. and the basic properties of norms can be translated into basic properties for the notion of the distance between two vectors. y) = 1 if x 6= y. y) ≥ 0 and ρ(x. and the triangle inequality of the mapping ρ follows equally immediately from that of the norm. and not even all metrics on a vector space are derived from norms on that vector space. z) = 0 so x  z. For this purpose a metric on a set S is defined to be a mapping ρ : S × S −→ R with the following properties: (i) positivity: ρ(x. and (iii) the triangle inequality: ρ(x. but that if S = Rn this metric does not arise from any norm on Rn . In particular a normed vector space V with a norm kxk is a metric space with the metric norm defined by (2. In some cases spaces naturally occur equipped with a mappng ρ : S×S −→ R that satisfies all of the conditions for a metric except that ρ(x. (ii)symmetry: ρ(x. y).2. y) + ρ(y. y). Consequently it is possible to define a mapping ρ∗ : S ∗ × S ∗ −→ R by setting . y). It is clear that this relation satisfies the reflexivity and symmetry conditions for an equivalence relation. since for any norm ρ(cx. y) = 0 if and only if x = y. y) = ρ(y.  it is clear that this satisfies the conditions necessary to define a metric on S. y) ≤ ρ(x. and if x  y and y  z so that ρ(x. so this relation aosy satisfies the transitivity condition hence is an equivalent relation. y 0 ) = ρ(x. However metrics can be defined on a wide range of sets other than vector spaces. A metric space is a set S together with a metric ρ defined on it. x) + ρ(x. cy) = |c|ρ(x. On any such space it is possible to introduce an equivalence relation by setting x  y whenever ρ(x. y and x0 .. If x  x0 and y  y 0 then by the triangle inequality ρ(x0 .16) ρd (x. z) = 0 then by the triangle inequality ρ(x. the positivity and symmetry of the mapping ρ follow immediately from the pos- itivity and homogeneity of the norm.2. z) = 0 hence ρ(x. y) = kx − yk. METRICS 45 2. 
and it is convenient to axiomatize that notion and to derive the general properties that are applicable in a wide range of particular cases. such a mapping is called a pseudometric. y) for any scalar c ∈ R. z) + ρ(z. (2. z) ≤ ρ(x. so that ρ(x0 . y 0 yields the reverse inequality. y 0 ) = ρ(x. reversing the roles of x. The notion of distance between points is important in more general sets than just normed vector spaces. y) = 0 for some elements x. x). y) + ρ(y. and a set equipped with such a mapping is called a pseudometric space. y) = 0. y ∈ S for which x 6= y.15) ρ(x. For instance the discrete metric is defined on an arbitrary set S by   0 if x = y.2 Metric Spaces In a normed vector space the norm kx − yk can be interpreted as the dis- tance between the two vectors x and y. a metric space does not involve just a set but also a specified metric on that set.

.46 CHAPTER 2. in that case the two metric spaces are called isometric metri spaces. a2 . in terms of the metric ρ(x. the question whether a sequence converges or not may be quite difficult to answer for some sequences1 . Clearly if a sequence aν converges to a in terms of a metric ρ then it also converges to a for any metric equivalent to ρ. a0ν ) + ρ(aν . for instance setting aν = a for all numbers ν yields a perfectly acceptable sequence. is the only known cycle. A sequence {aν } does not necessarily consist of distinct points. is that no matter what natural number n is chosen the sequence eventually reaches the value 1.. TOPOLOGICAL FUNDAMENTALS ρ∗ (x∗ . On the other hand different spaces can have essentially the same metrics. 4.9) that equivalent norms on a vector space V determine equivalent metrics on the set V . 4. For that reason it is important to have tests to see whether sequences converge or not that do not 1 A well known sequence begins by setting a = n for any natural number n. a) <  whenever ν > N . φ(y) = ρS (x. nor does the sequence of real numbers given by aν = (−1)ν . y) ≤ ca ρb (x. that these two metrics are equivalent is indicated by writing ρa  ρb . a00 ) = 0 so a0 = a00 . 1.. A sequence in a metric space S is a set of points of S indexed by the natural numbers. It is evident that an isometry is necessarily an injective mapping. and in this way there is naturally associated to any pseudometric space an associated metric space. y ∗ ) = ρ(x. . although it has been verified for all values n ≤ 5 × 1018 and 1. A sequence does not necessarily converge or have a limit. a00 ) ≤ ρ(a0 . That has not yet been proved. . first made by Luther Collatz in 1937. y) for any x ∈ S representing the equivalence class x∗ and any x ∈ S representing the equivalence class x∗ . A sequence {aν } converges to a or has the limit a. cb > 0 such that ρa (x. . Two metrics ρa and ρb on a set S are defined to be equivalent metrics if if there are real numbers ca > 0. 
ρ*(x*, y*) = ρ(x, y) for any x ∈ S representing the equivalence class x* and any y ∈ S representing the equivalence class y*; this clearly defines a metric on the space S*, and in this way there is naturally associated to any pseudometric space an associated metric space, which often can be used in place of the pseudometric space.

It is quite possible for a space to have several metrics, just as normed vector spaces can have several different norms. Two metrics ρ_a and ρ_b on a set S are defined to be equivalent metrics if there are real numbers c_a > 0, c_b > 0 such that ρ_a(x, y) ≤ c_a ρ_b(x, y) and ρ_b(x, y) ≤ c_b ρ_a(x, y) for all x, y ∈ S; that these two metrics are equivalent is indicated by writing ρ_a ∼ ρ_b. It is obvious from this definition and (2.9) that equivalent norms on a vector space V determine equivalent metrics on the set V. On the other hand different spaces can have essentially the same metrics. A mapping φ : S −→ T between metric spaces S and T with metrics ρ_S and ρ_T is said to be an isometry if ρ_T(φ(x), φ(y)) = ρ_S(x, y) for all points x, y ∈ S. It is evident that an isometry is necessarily an injective mapping, and that if an isometry is also surjective then it is a one-to-one mapping between the two metric spaces and its inverse mapping is also an isometry; in that case the two metric spaces are called isometric metric spaces.

A sequence in a metric space S is a set of points of S indexed by the natural numbers; thus a sequence in S is an indexed set such as {a_1, a_2, a_3, ...} where a_ν ∈ S, or just a list {a_ν} of points a_ν ∈ S for all ν ∈ N. A sequence {a_ν} does not necessarily consist of distinct points; for instance setting a_ν = a for all numbers ν yields a perfectly acceptable sequence. A sequence {a_ν} converges to a or has the limit a, indicated by writing lim_{ν→∞} a_ν = a, if for any ε > 0 there is a number N such that ρ(a_ν, a) < ε whenever ν > N. Clearly if a sequence a_ν converges to a in terms of a metric ρ then it also converges to a for any metric equivalent to ρ. A sequence does not necessarily converge or have a limit; for example the sequence of real numbers given by a_ν = ν clearly does not converge, nor does the sequence of real numbers given by a_ν = (−1)^ν. The sequence of real numbers a_ν = a clearly has the limit a, and the sequence of real numbers a_ν = 1/ν clearly has the limit 0. However if a sequence has a limit, that limit is unique; for if a' and a'' are both limits of a sequence a_ν then for any ε > 0 there is a number N such that ρ(a_ν, a') < ε and ρ(a_ν, a'') < ε whenever ν > N, and then for any such ν it follows from the triangle inequality that ρ(a', a'') ≤ ρ(a', a_ν) + ρ(a_ν, a'') < 2ε, and consequently ρ(a', a'') = 0, so a' = a''. As might be expected, the question whether a sequence converges or not may be quite difficult to answer for some sequences^1. For that reason it is important to have tests to see whether sequences converge or not that do not

^1 A well known sequence begins by setting a_1 = n for any natural number n, and defining the further terms inductively by setting a_{ν+1} = a_ν / 2 if a_ν is even and a_{ν+1} = 3a_ν + 1 if a_ν is odd. The conjecture, first made by Lothar Collatz in 1937, is that no matter what natural number n is chosen the sequence eventually reaches the value 1, and then the sequence is periodic: 1, 4, 2, 1, 4, 2, 1, .... That has not yet been proved at the time this is written, although it has been verified for all values n ≤ 5 × 10^18, and 1, 4, 2 is the only known cycle.
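The sequence of the footnote is easy to experiment with; the sketch below verifies the conjecture for a small range of starting values (which of course proves nothing):

```python
def reaches_one(n, max_steps=100_000):
    # iterate a -> a/2 (a even) or a -> 3a + 1 (a odd), as in the footnote
    steps = 0
    while n != 1 and steps < max_steps:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return n == 1

assert all(reaches_one(n) for n in range(1, 5000))
```

The step cap is a safeguard of this sketch only: if the conjecture failed for some n the loop would still terminate.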

involve knowing or conjecturing what the limit of the sequence is. A sequence a_ν is a Cauchy sequence if for any ε > 0 there is a number N such that ρ(a_µ, a_ν) < ε whenever µ, ν > N. It is clear that any convergent sequence is a Cauchy sequence; for if lim_{ν→∞} a_ν = a then for any ε > 0 there is a number N such that ρ(a_ν, a) < ε/2 whenever ν > N, and therefore by the triangle inequality ρ(a_µ, a_ν) ≤ ρ(a_µ, a) + ρ(a, a_ν) < ε/2 + ε/2 = ε whenever µ, ν > N. It is not necessarily the case though that a Cauchy sequence is convergent. For example the set S of real numbers x for which 0 < x ≤ 1 is a metric space with the metric defined by the absolute value norm; the sequence 1, 1/2, 1/3, 1/4, ... is clearly a Cauchy sequence, but it converges to the real number 0, which is not a point of S, so the sequence does not converge in S. For a more interesting example, the set Q of rational numbers with the metric defined by the absolute value norm is also a metric space; it was demonstrated earlier that √2 can be expressed as the supremum of a collection of rational numbers, so it is possible to choose a sequence of rational numbers a_ν converging to √2, but since √2 is not a rational number the sequence a_ν does not converge in Q. Two Cauchy sequences {a_ν} and {b_ν} are said to be equivalent Cauchy sequences, indicated by writing {a_ν} ∼ {b_ν}, if for any ε > 0 there is a number N such that ρ(a_ν, b_ν) < ε whenever ν > N. It is easy to see that two convergent Cauchy sequences have the same limit if and only if they are equivalent. Indeed suppose that {a_ν} and {b_ν} are Cauchy sequences with limits a = lim_{ν→∞} a_ν and b = lim_{ν→∞} b_ν. If a = b then by the triangle inequality ρ(a_ν, b_ν) ≤ ρ(a_ν, a) + ρ(b, b_ν), and for any ε > 0 there is a number N such that ρ(a, a_ν) < ε and ρ(b, b_ν) < ε whenever ν > N, hence ρ(a_ν, b_ν) < 2ε whenever ν > N, so {a_ν} ∼ {b_ν}. Conversely if {a_ν} ∼ {b_ν}, so for any ε > 0 there is a number N such that ρ(a_ν, b_ν) < ε whenever ν > N, the number N can be chosen sufficiently large so that in addition ρ(a_ν, a) < ε and ρ(b_ν, b) < ε whenever ν > N; it follows from the triangle inequality again that ρ(a, b) ≤ ρ(a, a_ν) + ρ(a_ν, b_ν) + ρ(b_ν, b) ≤ ε + ε + ε = 3ε, hence a = b. Quite different Cauchy sequences may well have the same limit. A metric space S is said to be complete if every Cauchy sequence in S is convergent. The metric space consisting of all real numbers x for which 0 < x ≤ 1 with the metric defined by the absolute value, and the metric space consisting of all rational numbers with the metric defined by the absolute value norm, are examples of metric spaces that are not complete. Completeness depends not just on the point set but also on the particular metric; for example the metric space consisting of all real numbers x for which 0 < x ≤ 1 but with the metric given by ρ(x, y) = |1/x − 1/y| is complete. The basic property of the real number system can be restated in these terms as follows.

Theorem 2.4 Real vector spaces with the metric defined by the ℓ∞ or any equivalent norm are complete metric spaces.

Proof: First let {a_ν} be a Cauchy sequence in the vector space R^1. There is then a collection of open intervals I_n = (x_n, y_n) and of natural numbers N_n such that |y_n − x_n| < 1/n and a_ν ∈ I_n for all ν > N_n; for there is a number N_n such that |a_µ − a_ν| < 1/(2n) whenever µ, ν > N_n, so the interval I_n centered at a_{N_n + 1} will do. When the intervals are constructed inductively it is clear that
but since 2 is not a rational number the sequence anu does not converge in Q. A metric space S is said to be complete if every Cauchy sequence in S is convergent. For a more interesting example. and therefore by the triangle inequality ρ(aµ . . and conversely if {aν }  {bν }. ν > Nn so the interval In centered at aNn +1 will do. if for any  > 0 there is a number N such that ρ(aν . 1/3. a) < /2 whenever ν > N . Proof: First let {aν } be a Cauchy sequence in the vector space R1 . aν ) ≤ ρ(aµ .2. A sequence aν is a Cauchy sequence if for any  > 0 there is a number N such that ρ(aµ . There is then a collection of open intervals In = (xn . b) ≤ ρ(a. bν ) <  whenever ν > N . and the metric space consisting of all rational numbers with the metric defined by the absolute value norm. It is clear that any convergent sequence is a Cauchy sequence. If a = b then by the triangle inequality ρ(aν . for if limn→∞ aν = a then for any  > 0 there is a number N such that ρ(aν . the sequence 1. METRICS 47 involve knowing or conjecturing what the limit of the sequence is. It is easy to see that two Cauchy sequences have the same limit if and only if they are equivalent. bν ) <  whenever µ. aν ) <  whenever µ. bν ) <  +  = 2 so {aν }  {bν }. Theorem 2. so for any  > 0 the number N can be chosen sufficiently large so that in addition ρ(aν . the set Q of rational numbers with the metric defined by the √ absolute value norm is also a metric space.

while in the second case set a2 = a1 and M2 = M1 + 21 (M1 − a1 ). The numbers xn are bounded above and the numbers yn are bounded below. . . the real number r0 will be contained in at least one of these subintervals. the existence of least upper bounds for any set of real numbers bounded above. It is evident that x = y = lim aν so the sequence aν is convergent. Conversely if it is assumed that the real number system is a complete metric space with the metric associated to the absolute value then it is possible to prove that any set of real numbers bounded above has a least upper bound. Then either there is a point a2 ∈ S such that a1 + 12 (M1 − a1 ) < a2 ≤ M1 or a ≤ M1 + 21 (M1 − a1 ) for all a ∈ S. . . . and since xn < yn for each n then x ≤ y. and choose a point a1 ∈ S. 1. so the least upper bound x = sup xn and the greatest lower bound y = inf yn exist. 9.ν . and it is clear that A = sup S. so that r = n0 +r0 where 0 ≤ r0 < 1. . In the first case take the point a2 and set M2 = M1 . The process can be repeated inductively.ν is a Cauchy sequence so it must converge and therefore the entire sequence aν converges. Then if the interval from 0 to 1/10 is split into 10 equal intervals of length 1/102 by points n/102 for n = 0. TOPOLOGICAL FUNDAMENTALS it can also be assumed that In+1 ⊂ In and Mn+1 ≥ Nn . Note though that if at stage ν the remainder rν is at one of the points of subdivision then it can be viewed as being . while since yn − xn < n1 then x = y. the real number r1 will be contained in at least one of these subintervals. Repeating this construction leads to sequences aν and Mν which are Cauchy sequences with the same limit point A = limν→∞ aν = limν→∞ Mν . . Then if {aν } is a Cauchy sequence in Rn in the `∞ norm and if aν = (a1. in what is actually a very familiar construction.17) aν = n0 .n1 n2 · · · nν called a decimal expansion. Indeed suppose that S ⊂ R is bounded above by M1 . 
if n2 /102 ≤ r2 < (n2 + 1)/102 then r = n0 + n1 /10 + n2 /102 + r2 where 0 ≤ r2 < 1/102 . The real numbers can be described in terms of Cauchy sequences of rational numbers. For any positive real number r let n0 ∈ N be the largest natural number less than or equal to r. leading to the sequence of rational numbers aν = n0 + n1 /10 + n2 /100 + · · · + nν /10ν which are customarily written in the abbreviated notation (2. The proof of the preceding theorem used the basic completeness property. an. . so that a ≤ M1 for any a ∈ S.ν ) ∈ Rn it follows that for each i in the range 1 ≤ i ≤ n the sequence of real numbers ai. .48 CHAPTER 2. .ν . thus providing a very conve- nient way of describing that real number. thus in either case a1 ≤ a2 ≤ M2 ≤ M1 and (M2 − a2 ) ≤ 21 (M1 − a1 ). . 1. Thus the completeness of the real number system can be characterized either by the existence of a least upper’ bound for any sequence of real numbers bounded above or alternatively as a complete metric space with the metric defined by the absolute value. 9. That suffices for the proof. as a critical part of the argument. if n1 /10 ≤ r1 < (n1 + 1)/10 then r = n0 + n1 /10 + r1 where 0 ≤ r1 < 1/10. a2. The sequence of rationals {aν } is clearly a Cauchy sequence converging to the initial real number r. If the interval from 0 to 1 is split into 10 equal intervals of length 1/10 by points n/10 for n = 0.

contained in either of the two subintervals meeting at that boundary, so the process does not necessarily provide a unique Cauchy sequence describing the real number r, although of course any two of these Cauchy sequences must be equivalent Cauchy sequences. That is the way in which a real number can have one decimal expansion such as 1.2340000000... and another decimal expansion 1.2339999999.... Note also that any decimal expansion is a Cauchy sequence, so it does describe a real number. It should be added however that the same construction can be used based on any natural number n rather than 10, leading to an expansion in the base n of any real number. For example √2 can be expressed as a Cauchy sequence of the form √2 = 1.0110101000001001111... in an expansion in the base 2.

If a metric space S with a metric ρ is not complete it can be extended to a complete metric space. To demonstrate this, for any metric space S let S* be the set of all Cauchy sequences of elements of S. If A = {a_ν}, B = {b_ν} ∈ S* then ρ(a_µ, b_µ) ≤ ρ(a_µ, a_ν) + ρ(a_ν, b_ν) + ρ(b_ν, b_µ), and consequently ρ(a_µ, b_µ) − ρ(a_ν, b_ν) ≤ ρ(a_µ, a_ν) + ρ(b_ν, b_µ); interchanging the roles of µ and ν leads to the corresponding inequality with the left-hand side reversed, hence |ρ(a_µ, b_µ) − ρ(a_ν, b_ν)| ≤ ρ(a_µ, a_ν) + ρ(b_ν, b_µ). Since A and B are Cauchy sequences it follows from the preceding inequality that {ρ(a_ν, b_ν)} is a Cauchy sequence of real numbers, so from the completeness of R it follows that the limit

   ρ(A, B) = lim_{ν→∞} ρ(a_ν, b_ν)

exists. The function ρ(A, B) on S* × S* clearly satisfies all the conditions of a metric on S* except that ρ(A, B) = 0 for some sequences A ≠ B, so it is only a pseudometric.
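A Cauchy sequence of rationals with an irrational limit, such as the sequences for √2 discussed above, can be produced concretely. The sketch below uses Newton's iteration for √2 — my choice of recipe, not the text's — carried out in exact rational arithmetic, so every term really is in Q:

```python
from fractions import Fraction

x = Fraction(2)
terms = [x]
for _ in range(6):
    x = (x + 2 / x) / 2        # Newton's iteration for sqrt(2), entirely within Q
    terms.append(x)

# consecutive terms cluster arbitrarily closely (a Cauchy sequence in Q),
# yet the would-be limit sqrt(2) is not rational
assert abs(terms[-1] - terms[-2]) < Fraction(1, 10**10)
assert abs(terms[-1] ** 2 - 2) < Fraction(1, 10**10)
```

So the sequence is Cauchy in Q but does not converge in Q, exactly the phenomenon the completion is designed to repair.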
However, as noted earlier, the space S** of equivalence classes of elements of S*, where any two elements at distance 0 apart are considered equivalent, is a metric space under the induced metric. The initial set S is imbedded as a subset of S** by associating to any a ∈ S the equivalence class of the constant Cauchy sequence defined by aν = a, and S** is a complete metric space containing S; the details of this argument will be left to the interested reader.
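The pseudometric on S* can be seen concretely with S = Q: the two decimal expansions of the same real number give distinct Cauchy sequences A ≠ B with ρ(A, B) = lim ρ(aν, bν) = 0. A small sketch of this (the helper name is hypothetical), using the truncations of 1.2340000 . . . and 1.2339999 . . . :

```python
from fractions import Fraction

def truncations(n0, digits, base=10):
    """The Cauchy sequence a_1, a_2, ... of partial sums of an expansion."""
    seq, s = [], Fraction(n0)
    for k, d in enumerate(digits):
        s += Fraction(d, base**(k + 1))
        seq.append(s)
    return seq

# Two different expansions of the same real number 1.234:
A = truncations(1, [2, 3, 4, 0, 0, 0, 0, 0])
B = truncations(1, [2, 3, 3, 9, 9, 9, 9, 9])

# rho(a_nu, b_nu) = |a_nu - b_nu| tends to 0, so rho(A, B) = 0 although A != B.
distances = [abs(a - b) for a, b in zip(A, B)]
```

Here `distances` is 0, 0, 1/10³, 1/10⁴, . . . , 1/10⁸, so the limit defining ρ(A, B) is 0 even though the sequences differ term by term; on equivalence classes this degeneracy disappears, which is exactly the passage from S* to S**.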

It was discovered during the development of topology that a good many of the basic results did not require the use of a norm or metric, but only the notion of nearness provided by a norm or metric. That notion is based on the concept of an ε-neighborhood Nε(a) of a point a in a metric space S with a metric ρ(x, y), which is the subset of S defined by

(2.18) Nε(a) = { x ∈ S | ρ(a, x) < ε } .

An open set in the metric space S is defined in terms of these ε-neighborhoods as a subset U ⊂ S such that for each point a ∈ U there is an ε > 0 for which Nε(a) ⊂ U; intuitively an open set is one that contains, with any point a, all points that are sufficiently near a. For example an ε-neighborhood Nε(a) of a point a ∈ S is itself an open set; indeed if b ∈ Nε(a) then ρ(a, b) < ε, and if x ∈ Nε′(b) where 0 < ε′ < ε − ρ(a, b) then by the triangle inequality ρ(a, x) ≤ ρ(a, b) + ρ(b, x) < ρ(a, b) + ε′ < ε, so that Nε′(b) ⊂ Nε(a). The entire set S of course is open, and the condition that for each a ∈ ∅ there is an ε > 0 such that Nε(a) ⊂ ∅ is fulfilled vacuously, since there are no points a ∈ ∅, so the empty set is open as well. It is obvious that any union of open sets is open. If Ui are open sets for 1 ≤ i ≤ N and a ∈ U = ∩i Ui then for each i there is an εi > 0 such that Nεi(a) ⊂ Ui; so if ε = min εi then ε > 0 and Nε(a) ⊂ Nεi(a) ⊂ Ui for all i, hence Nε(a) ⊂ U, so U is open. That argument fails if there are infinitely many such sets Ui, since in that case there does not necessarily exist an ε > 0 such that ε ≤ εi for all i. These properties can be summarized in the statement that an arbitrary union of open sets is open, a finite intersection of open sets is open, and the empty set and the set S itself are open; thus the collection T of all the open subsets of a metric space S has the following properties:
(i) if Uα ∈ T then ∪α Uα ∈ T ;
(ii) if Ui ∈ T for 1 ≤ i ≤ N then ∩_{i=1}^N Ui ∈ T ;
(iii) ∅ ∈ T and S ∈ T .
It is easy to see that equivalent metrics on a set S determine the same collection of open subsets of S. Indeed if ρ1 and ρ2 are equivalent metrics on a set S then by definition there are constants c1 > 0 and c2 > 0 such that ρ1(a, b) ≤ c1 ρ2(a, b) and ρ2(a, b) ≤ c2 ρ1(a, b) for all points a, b ∈ S. If U ⊂ S is an open subset in the topology defined by the metric ρ1 then for any point a ∈ U there is an ε > 0 such that Nε(a) = { x ∈ S | ρ1(a, x) < ε } ⊂ U; then { x ∈ S | ρ2(a, x) < ε/c1 } ⊂ { x ∈ S | ρ1(a, x) < ε } ⊂ U, since ρ1(a, x) ≤ c1 ρ2(a, x), so that U is also an open set in terms of the metric ρ2.
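For a finite set S and a finite collection T of its subsets, properties (i)-(iii) can be checked exhaustively, since every union of members of a finite family is the union of some subfamily. A brute-force sketch (the function name is hypothetical; it is feasible only for small T, as it inspects all subfamilies):

```python
from itertools import chain, combinations

def is_topology(S, T):
    """Check properties (i)-(iii) for a collection T of subsets of the
    finite set S: closure under unions and intersections of subfamilies,
    and membership of the empty set and of S itself."""
    T = set(map(frozenset, T))
    if frozenset() not in T or frozenset(S) not in T:
        return False                       # property (iii)
    families = chain.from_iterable(
        combinations(T, r) for r in range(1, len(T) + 1))
    for fam in families:
        if frozenset().union(*fam) not in T:
            return False                   # property (i)
        if frozenset(S).intersection(*fam) not in T:
            return False                   # property (ii)
    return True
```

The discrete and indiscrete collections pass the check, while a family missing the union of two of its members fails property (i).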

The converse result follows upon reversing the roles of the two metrics ρ1 and ρ2, so the two metrics determine the same open sets. The normed vector spaces considered in Section 2.1 provide illustrative examples of metric spaces. The ε-neighborhoods of the origin 0 ∈ Rn in the ℓ1, ℓ2, and ℓ∞ norms are the subsets sketched in Figure 2.1, and are open sets in the metric topology defined by any of these norms. It is clear from the sketch that any one of these neighborhoods of the origin can be fit into another one by choosing a sufficiently small ε, and that is the geometric content of the assertion that these metrics are equivalent. An r-neighborhood of a point a ∈ Rn in the ℓ2 norm is also called an open ball of radius r centered at the point a ∈ Rn and is commonly denoted by Br(a); here, as distinct from the general expectation for ε-neighborhoods, r is not expected to be particularly small. The corresponding sets in an arbitrary metric space S, defined in terms of the metric ρ on S, are the sets

(2.19) Br(a) = { x ∈ S | ρ(a, x) < r } .

In place of the ε-neighborhoods in the ℓ∞ norm in Rn, though, it is more common to consider the slightly more general open cells in Rn, subsets of the form

(2.20) ∆ = { x = {xj} ∈ Rn | aj < xj < bj for 1 ≤ j ≤ n }

for arbitrary real numbers aj < bj.

In particular in R1 a cell is just the set of points x ∈ R such that a < x < b, also called an open interval and denoted by (a, b). It is also traditional to use notation such as this for intervals other than open intervals: thus [a, b] denotes the set of points x ∈ R for which a ≤ x ≤ b, [a, b) denotes the set of points x ∈ R for which a ≤ x < b, and (a, b] denotes the set of points x ∈ R for which a < x ≤ b; it is clear that these latter three sets are not open sets. In this notation (a, ∞) stands for the set of points x ∈ R for which x > a, while (−∞, b) stands for the set of points x ∈ R for which x < b. In general open sets can be quite complicated sets with no simple general description as cells or balls; the exception though is the special case of the vector space R1, where all the open sets can be described quite simply.

Theorem 2.5 An open subset U ⊂ R1 is a disjoint union of a countable number of open intervals.

Proof: If U ⊂ R1 is an open set and a ∈ U let E = { x ∈ R | (a, x) ⊂ U }. Since U is open, E ≠ ∅, so there are points x ∈ E for which x > a. If the set E is not bounded above then (a, ∞) ⊂ U. If the set E is bounded above and a+ = sup(E), where the supremum exists as a consequence of the completeness of the real number system, then clearly (a, a+) ⊂ U. If a+ ∈ U then (a+, a+ + ε) ⊂ U for some ε > 0 since U is open, so that a+ + ε ∈ E, which is impossible since a+ = sup(E); hence a+ ∉ U. The corresponding argument shows that there is a value a− so that (a−, a+) ⊂ U but a− ∉ U. Thus the open interval (a−, a+) is contained in U and is disjoint from the complement U ∼ (a−, a+). The same argument can be applied to any other point in U outside the interval (a−, a+), so U is the union of a disjoint collection of open intervals. Choosing a rational number in each of these intervals maps the set of intervals injectively to a subset of Q, so the set of intervals is countable, and that suffices for the proof.
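Theorem 2.5 has a computational counterpart when an open set U ⊂ R1 is given as a finite union of open intervals: sorting and merging recovers the disjoint maximal intervals of the theorem. A small sketch (the function name is hypothetical):

```python
def disjoint_intervals(intervals):
    """Merge a finite union of open intervals (a, b) into the disjoint
    maximal open intervals with the same union."""
    parts = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in parts:
        # Open intervals (x, y) and (a, b) with a < y overlap, so they merge;
        # if a == y the point y belongs to neither, so they stay separate.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged
```

Note that (0, 1) and (1, 2) are not merged, since their union omits the point 1 and hence is not an interval.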

2.3 Topological Spaces

A topological space is defined to be a set S together with a collection T of subsets of S such that:
(i) if Uα ∈ T then ∪α Uα ∈ T ;
(ii) if Ui ∈ T for 1 ≤ i ≤ N then ∩_{i=1}^N Ui ∈ T ;
(iii) ∅ ∈ T and S ∈ T .
The sets in T are called the open sets in the topology T ; thus in a topological space any union or finite intersection of open sets is open, and the empty set ∅ and the full set S are open. The open sets in a metric space S provide a topology on S, called the metric topology, but there are many other possible topologies on a metric space. For instance any set S can be given the discrete topology, in which T contains all the subsets of S, or the indiscrete topology, in which T contains just the empty set ∅ and the set S itself; and there is a wide range of topologies between these two extreme cases. Metric spaces really are a special category of topological spaces, with some quite special properties, so caution is required when attempting to extend properties of metric spaces to general topological spaces. An open subset of a topological space S containing a point a is called an open neighborhood of the point a. The ε-neighborhoods of a point a in a metric space are open neighborhoods of a, since ε-neighborhoods are themselves open sets, but there are considerably more general open neighborhoods of any point in a metric space; in a sense, open neighborhoods in a general topological space play the role of ε-neighborhoods in a metric space with the metric topology.

The complements of the open subsets of a topological space S are called the closed sets of S. A topology on a space can be defined alternatively by specifying the collection of closed sets and defining the open sets to be the complements of the closed sets; however it is customary to define topologies in terms of the open sets. It follows immediately from the defining properties of a topology that the collection C of closed sets in a topological space S has the following properties:
(i) if Fα ∈ C then ∩α Fα ∈ C ;
(ii) if Fi ∈ C for 1 ≤ i ≤ N then ∪_{i=1}^N Fi ∈ C ;
(iii) ∅ ∈ C and S ∈ C.
Thus in a topological space an arbitrary intersection and a finite union of closed sets are closed; but it is important to remember that arbitrary intersections of open sets are not necessarily open, and arbitrary unions of closed sets are not necessarily closed. However there are not uncommon situations in which a set of interest is a countable intersection of open sets, a set called a Gδ set, or a countable union of closed sets, a set called an Fσ set. Of course in a space in which each point is a closed set, any set whatsoever is a union of its points, hence is a union of some collection of closed sets; and since the complement of any set likewise is a union of some collection of closed sets, the set itself is an intersection of some collection of open sets.

If S is a topological space then any subset R ⊂ S can be given the induced topology or relative topology, in which the open sets of R are defined to be the intersections R ∩ U of the subset R ⊂ S with the open subsets U of the topological space S. It is quite obvious that the collection of these intersections does satisfy the conditions to define a topology on R, and that the closed sets in the relative topology of a subset R ⊂ S are just the complements in R of the open sets of R. What can be a bit confusing is that the open or closed sets in R, although they are subsets of S as well, are not necessarily open sets or closed sets in S. For example, if S = R1 and R = [−1, 1] ⊂ S then the interval (0, 1) is open in R1 and is also open in R, but the interval [−1, 1] is open in R, since it is the full set R, while it is not open in R1. For another example, if S = R2 and R = { (x1, x2) ∈ R2 | x2 = 0 } then the relatively open subsets of R are just the open subsets of the x1 axis viewed as the set R1, but these sets of course are not open subsets of R2.

A special class of topological spaces, known as Hausdorff spaces, are by definition topological spaces S with the property that for any two distinct points a, b ∈ S there are disjoint open subsets Ua, Ub ⊂ S with a ∈ Ua and b ∈ Ub; or more briefly, they are topological spaces with the property that any two distinct points have disjoint open neighborhoods. A metric space S with its natural topology is a Hausdorff space, for if a, b ∈ S are two distinct points in the metric space S then ρ(a, b) = 2ε > 0 and the ε-neighborhoods Nε(a) and Nε(b) are disjoint open sets for which a ∈ Nε(a) and b ∈ Nε(b). Hausdorff spaces have a number of particularly useful and natural properties, so it will be assumed that a topological space is Hausdorff whenever it is either necessary or just convenient to do so. A point in a Hausdorff space is a closed set: indeed if S is a Hausdorff space and a ∈ S is a point in S then for any point b ≠ a in S there is an open subset Ub ⊂ S such that b ∈ Ub and a ∉ Ub; the union U = ∪_{b∈S, b≠a} Ub is then an open subset of S, and since a = S ∼ U it follows that a is a closed set. On the other hand if S is a set containing more than one point then a point of S is not a closed set in the indiscrete topology; a set S with the indiscrete topology is thus an example of a topological space that is not Hausdorff.

A limit point of a subset E ⊂ S of a topological space is a point a ∈ S such that every open neighborhood of a contains a point of E other than a. The set of limit points of E is denoted by E′ and is called the derived set of E. If S is a Hausdorff space a point a ∈ S is a limit point of a subset E ⊂ S if and only if every open neighborhood of a includes infinitely many distinct points of E: for if a is a limit point of E and U is an open neighborhood of a then there is a point a1 ∈ U ∩ E with a1 ≠ a; since a1 is closed the complement S ∼ a1 is open, so U ∼ a1 = U ∩ (S ∼ a1) is an open neighborhood of a, and hence there is a point a2 ∈ (U ∼ a1) ∩ E with a2 ≠ a, and of course a2 ≠ a1; the argument can be repeated, so U must indeed contain infinitely many distinct points of E.

Theorem 2.6 A subset of a topological space is closed if and only if it contains all its limit points.

Proof: If E ⊂ S is a closed set and a ∈ E′ but a ∉ E then U = S ∼ E is an open neighborhood of a that is disjoint from E, which contradicts the assumption that a ∈ E′; and that is true for any limit point of E, so E′ ⊂ E. On the other hand suppose that E ⊂ S is a set for which E′ ⊂ E. If a ∉ E then a cannot be a limit point of E, so there is some open neighborhood U of a for which U ∩ E = ∅; and since that is true for any point a ∉ E it follows that S ∼ E is open and hence E is closed. That suffices for the proof.

The closure of a subset E ⊂ S of a topological space S, denoted by Ē, is defined to be the union Ē = E ∪ E′. It is easy to see that, at least in a Hausdorff space, the closure Ē of any set E is a closed set. Indeed if a ∈ (Ē)′ and U is any open neighborhood of a then there must be a point a1 ∈ Ē ∩ U other than a, so by the definition of the closure either a1 ∈ E or a1 ∈ E′. In the second case the set U ∼ a is an open neighborhood of a1, since a is closed in a Hausdorff space; thus U ∼ a is an open subset of S containing a1, so since a1 ∈ E′ there must be a point a2 ∈ (U ∼ a) ∩ E. Thus in either case there is a point of E in the intersection U ∩ E other than a, so a ∈ E′ ⊂ Ē; it is thus impossible that a ∈ (Ē)′ but a ∉ Ē, and consequently Ē is closed. It follows readily from this that if A ⊂ F where F is a closed set then Ā ⊂ F; hence the closure Ē can be characterized as the intersection of all closed sets in S containing E. On the real line R1 the closure of an open interval (a, b) is the closed interval [a, b], and in Rn the closure of an open cell ∆ = { x ∈ Rn | ai < xi < bi } is the closed cell ∆̄ = { x ∈ Rn | ai ≤ xi ≤ bi }. A closed cell can be defined in this way even when ai = bi for some i, but in that case it is not the closure of an open cell. In a metric space the closure of an open ball Br(a) = { x ∈ S | ρ(a, x) < r } is contained in the closed ball B̄r(a) = { x ∈ S | ρ(x, a) ≤ r }, and in Rn these two sets are equal when r > 0. A closed ball can be defined in this way even when r = 0, although in that case it is just a single point and is not the closure of an open ball.

A set E is said to be a perfect set if E = E′. By definition then perfect sets are not only closed sets but have the additional property that every point of the set is a limit point; thus aside from the empty set, every perfect set must be an infinite set.

The interior of a subset E ⊂ S in a topological space S, denoted by E°, is defined to be the subset of E consisting of all points a ∈ E for which there is an open neighborhood U of the point a such that U ⊂ E. It is clear that E° is an open set, since it contains an open neighborhood of each of its points; consequently if U is an open set such that U ⊂ E then U ⊂ E°, hence E° can be characterized as the union of all open subsets of E. The interior of a set E also can be characterized as the complement of the closure of the complement of E, for x ∈ (S ∼ E°) precisely when every open neighborhood of x contains some points not in E, which is just the condition that x lies in the closure of S ∼ E.

The boundary of a subset E ⊂ S, denoted by ∂E, is defined to be the intersection of the closure of E with the closure of the complement S ∼ E. Clearly the boundary of any set is a closed set, and it is evident from the definition that a ∈ ∂E if and only if for any open neighborhood U of the point a there are points a1 ∈ U ∩ E and a2 ∈ U ∩ (S ∼ E). In Rn the boundary of an open or closed ball Br(a) is the subset ∂Br(a) = { x ∈ Rn | ρ(a, x) = r }, and the boundary of an open or closed cell ∆ = { x ∈ Rn | ai < xi < bi } is just the set of points

∂∆ = { x ∈ Rn | ai ≤ xi ≤ bi for all i, and xi = ai or xi = bi for at least one i } .
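In a finite topological space the derived set E′ and the criterion of Theorem 2.6 can be computed directly from the definitions, which makes small examples easy to verify. A sketch (function names hypothetical):

```python
def derived_set(S, T, E):
    """E' = limit points of E: points a in S such that every open set U in T
    containing a also contains a point of E other than a."""
    opens = [frozenset(U) for U in T]
    E = frozenset(E)
    return {a for a in S
            if all(U & (E - {a}) for U in opens if a in U)}

def is_closed(S, T, E):
    """Theorem 2.6: E is closed if and only if E' is contained in E."""
    return derived_set(S, T, E) <= set(E)
```

In the indiscrete topology on {1, 2} the point 2 is a limit point of {1}, so {1} is not closed there, while in the discrete topology every subset has empty derived set and is closed.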

A topological space S is said to be connected if the only subsets of S that are both open and closed are the empty set ∅ and the entire set S; a subset E ⊂ S is said to be connected if it is a connected set in the relative topology. Any set S with the indiscrete topology is connected, but a set S with the discrete topology is not connected if it contains at least two points. A point a ∈ S is an isolated point of S if the set consisting just of that point is an open set; more generally a point a ∈ E in a subset E ⊂ S is an isolated point of E if the set consisting just of that point is an open set in E with the relative topology. In a Hausdorff space an individual point is a closed set, so if a Hausdorff space S consisting of more than one point contains an isolated point then S is not connected, since that point is a proper subset that is both open and closed.

The vector space Rn with the metric topology is connected. Indeed suppose that U ⊂ Rn is a subset other than the empty set or all of Rn that is both open and closed, and choose a point a ∈ U and a point b ∈ Rn ∼ U. The set of real numbers s such that a + t(b − a) ∈ U for 0 ≤ t < s is nonempty, since U is open, and it is bounded above since b ∉ U; so this set has a least upper bound s0, by one of the characteristic properties of the real number system. Since U is closed it must be the case that a + s0(b − a) ∈ U, for this is the limit of the points a + s(b − a) ∈ U as s → s0; but then since U is open it also must be the case that a + s(b − a) ∈ U for some real numbers s > s0, a contradiction. The same argument shows that open and closed balls and open or closed cells in Rn are connected sets. On the other hand the subset Q ⊂ R of rational points, with the relative topology it inherits from R, is not a connected set; indeed the set of rational numbers x with √2 < x < √3 is both open and closed in Q.

A subset E ⊂ S of a topological space is said to be dense, or sometimes dense in S, if Ē = S. A topological space is said to be separable if it has a countable dense subset. For example the set of all points in Rn with rational coordinates is a countable dense subset of Rn, since any real number can be approximated as closely as desired by a rational number; hence the space Rn is a separable topological space. On the other hand the set Rn with the discrete topology is not a separable space, for no set in Rn has any limit points in the discrete topology and consequently the only dense subset is Rn itself, which will be shown not to be countable. A subset E ⊂ S is said to be nowhere dense if its closure Ē contains no open sets, or equivalently if (Ē)° = ∅. For example the set of rational numbers with denominator at most 100 is nowhere dense in R1, and the subset R1 of the plane R2 is nowhere dense in R2; for more interesting examples see page 60.

A particularly useful special class of subsets of a topological space has a somewhat subtle and indirect definition, and is somewhat harder to understand than the classes of subsets described thus far, but it is a very important class of subsets with a wide range of uses. For the purposes of this definition, an open covering of a subset E ⊂ S of a topological space S is a collection of open subsets Uα ⊂ S such that E ⊂ ∪α Uα.

If some of the sets Uα are redundant they can be eliminated, and the remaining sets are also an open covering of E, called a subcovering of the set E. A subset E ⊂ S is said to be compact if every open covering of E has a finite subcovering, that is, if for any open covering {Uα} of E finitely many of the sets Uα actually cover all of E. To show that a set E is not compact it suffices to find a single open covering {Uα} of E such that no finite collection of the sets Uα can cover E; but to show that E is compact it is necessary to show that for any open covering {Uα} of E finitely many of the sets Uα already cover E. Examples of noncompact sets are quite easy: for instance R with its usual topology is not compact, since it is contained in the union of all the open intervals (−n, n) but not in any finite set of these intervals; and even a finite interval (a, b) is not compact, since it is contained in the union of all the open intervals (a + 1/k, b − 1/k) for positive integers k but again it is not contained in any finite collection of these intervals. It is rather more difficult to demonstrate that other sets are compact, since it is necessary to show that any open covering has a finite subcovering. Of course any single point and any finite set of points are obviously compact sets. Before finding further compact sets though it is convenient to list a number of standard properties of compactness.

Theorem 2.7 (Compactness Theorems) Compact sets in a topological space S have the following properties:
(i) A closed subset of a compact subset in S is compact.
(ii) If S is Hausdorff and Kn ⊂ S are nonempty compact subsets such that Kn+1 ⊂ Kn for all natural numbers n then ∩n Kn ≠ ∅.
(iii) If S is Hausdorff then any compact subset of S is closed.
(iv) If S is Hausdorff and K ⊂ S is compact then any infinite subset E ⊂ K has a limit point in K.
(v) If S is Hausdorff and K1, K2 ⊂ S are disjoint compact subsets then there are disjoint open subsets U1, U2 ⊂ S such that K1 ⊂ U1 and K2 ⊂ U2.

Proof: (i) If F ⊂ K ⊂ S where F is closed and K is compact, and if Uα ⊂ S are open subsets of S such that F ⊂ ∪α Uα, then the sets Uα together with U = S ∼ F form an open covering of K, so finitely many of these sets serve to cover K and hence also serve to cover F; excluding the set U, there are then finitely many of the sets Uα that cover F, hence F is compact.
(ii) If ∩n Kn = ∅ then the sets Un = S ∼ Kn for n > 1 are open subsets of S, since the Kn are closed by (iii), and K1 ⊂ ∪n Un. But the union of any finite collection of the sets Un is S ∼ KN for the largest index N involved, which does not contain the nonempty subset KN ⊂ K1; so no finite subset of the sets Un can cover K1, which is a contradiction since K1 is compact. Thus it cannot be the case that ∩n Kn = ∅.
(iii) If K ⊂ S is compact and a ∈ (S ∼ K) then since S is Hausdorff for any point b ∈ K there are disjoint open neighborhoods Ub of b and Ub,a of a. The collection of open sets Ub for all points b ∈ K then cover the compact set K, so finitely many of these sets Ubi already serve to cover K. The finite intersection U = ∩i Ubi,a is then an open neighborhood of a that is disjoint from each Ubi, hence is an open neighborhood of a that is contained in S ∼ K; and since that is true for any point a it follows that S ∼ K is open and consequently that K is closed.
(iv) Suppose E ⊂ K is an infinite subset of the compact subset K ⊂ S and E has no limit points in K. Then E has no limit points at all, since any limit point of E is a limit point of K and hence must be contained in K, because K is closed by (iii); so in particular E is a closed subset of S. Furthermore every point a ∈ E has an open neighborhood Ua that does not contain any of the other points of E. The collection of these sets Ua together with the open set S ∼ E is then an open covering of K, so it must have a finite subcovering; but that is impossible, since no finite collection of the sets Ua can cover the entire infinite set E. Therefore E must have limit points in K.
(v) Suppose first that K2 = a is a single point. Since S is Hausdorff, for any point b ∈ K1 there are disjoint open subsets Ub, Ua,b ⊂ S such that b ∈ Ub and a ∈ Ua,b. The collection of open sets Ub for all b ∈ K1 form an open covering of the compact set K1, so finitely many of these sets Ubi already serve to cover K1. The union U1 = ∪i Ubi is an open set containing K1 and the intersection U2 = ∩i Ua,bi is an open set containing a, and U1 ∩ U2 = ∅ as desired. Next if K2 is any compact set, it follows from the special case just demonstrated that for any point a ∈ K2 there are disjoint open sets U1,a and Ua such that K1 ⊂ U1,a and a ∈ Ua. The sets Ua for all a ∈ K2 form an open covering of K2, and since K2 is compact finitely many Uai of these sets already serve to cover K2.

Then as before the sets U1 = ∩i U1,ai and U2 = ∪i Uai are disjoint open subsets of S such that K1 ⊂ U1 and K2 ⊂ U2, which suffices for the proof.

It was already noted that if R is a subset of a topological space S then open subsets of R in the relative topology need not be open subsets of S, and correspondingly for closed sets. However compactness is really an absolute property, in the sense that a subset K ⊂ R is compact in the relative topology of R if and only if it is compact in S. Indeed if K ⊂ R is compact in the relative topology and if Uα are open sets in S that cover K, then the intersections Uα ∩ R are open sets in the relative topology of R that cover K, so finitely many of these sets serve to cover K, and consequently finitely many of the sets Uα will cover K as well; so K is compact as a subset of S. Conversely if K is a compact subset of S and K ⊂ R, and if Uα are relatively open sets in R that cover K, there will be open subsets Vα ⊂ S for which Uα = Vα ∩ R; the sets Vα of course cover K, so finitely many of them will cover K since K is compact in S, and hence finitely many of the sets Uα also serve to cover K, so K is a compact subset of R in the relative topology.

The basic nontrivial example of a compact set is a closed cell in Rn; the proof that a cell is compact rests on the completeness property of the real numbers. Just for the purposes of the proof, the edgesize of the cell ∆ = { x ∈ Rn | ai ≤ xi ≤ bi } is defined to be the real number e(∆) = maxi (bi − ai); thus e(∆) ≥ 0, and e(∆) = 0 if and only if the cell is just the single point (a1 = b1, a2 = b2, . . . , an = bn).
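The role of the edgesize can be illustrated numerically in dimension one: repeatedly bisecting a closed interval and keeping one half at each stage produces nested cells ∆ν+1 ⊂ ∆ν with e(∆ν+1) = e(∆ν)/2, and by the lemma below their intersection is a single point. In this sketch (the function and the rule for choosing a half are illustrative assumptions, not anything from the text) the half kept is the one whose endpoints still bracket √2, so the cells shrink to the single point √2:

```python
def nested_cells(a, b, keep_left, steps):
    """Repeatedly bisect the closed interval [a, b], keeping the half
    selected by keep_left(left_end, midpoint); the edgesize halves at
    each step, so the cells satisfy the hypotheses of the lemma below."""
    cells = [(a, b)]
    for _ in range(steps):
        a, b = cells[-1]
        m = (a + b) / 2
        cells.append((a, m) if keep_left(a, m) else (m, b))
    return cells

# Keep the half whose endpoints bracket sqrt(2), i.e. with a^2 <= 2 <= m^2:
cells = nested_cells(1.0, 2.0,
                     keep_left=lambda a, m: a * a <= 2.0 <= m * m,
                     steps=40)
```

After 40 steps the edgesize is 2⁻⁴⁰ and the midpoint of the last cell agrees with √2 to about twelve decimal places.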

Lemma 2.1 If ∆ν ⊂ Rn are closed cells in Rn for ν = 1, 2, . . . such that ∆ν+1 ⊂ ∆ν and limν→∞ e(∆ν) = 0, then ∩ν ∆ν is a single point of Rn.

Proof: If ∆ν = { x = {xj} | a_j^ν ≤ xj ≤ b_j^ν } then for each j clearly a_j^{ν+1} ≥ a_j^ν and b_j^{ν+1} ≤ b_j^ν; and since ∆ν ⊂ ∆1 it is also the case that a_j^ν ≤ b_j^1 and b_j^ν ≥ a_j^1. The basic completeness property of the real number system implies that any increasing sequence of real numbers bounded from above and any decreasing sequence of real numbers bounded from below have limiting values; therefore limν→∞ a_j^ν = aj and limν→∞ b_j^ν = bj for some uniquely determined real numbers aj, bj, and it is clear that the cell ∆ = { x = {xj} | aj ≤ xj ≤ bj } is contained in the intersection ∩ν ∆ν. On the other hand since (bj − aj) ≤ (b_j^ν − a_j^ν) ≤ e(∆ν) and limν→∞ e(∆ν) = 0, it must be the case that bj = aj, so the limiting cell ∆ is just a single point of Rn, which concludes the proof.

Theorem 2.8 A closed cell in Rn is compact.

Proof: If a closed cell ∆ = { x = {xj} | aj ≤ xj ≤ bj } is not compact there is an open covering {Uα} of ∆ that does not admit any finite subcovering. The cell ∆ can be written as the union of the closed cells arising from bisecting each of its sides; if finitely many of the sets {Uα} covered each of these subcells then finitely many would cover the entire cell ∆, which is not the case, hence at least one of the subcells cannot be covered by finitely many of the sets {Uα}. Then bisect each of the sides of that subcell, and repeat the process. The result is a collection of closed cells ∆ν which cannot be covered by finitely many of the open sets {Uα} and for which ∆ν+1 ⊂ ∆ν ⊂ ∆ and limν→∞ e(∆ν) = 0. It then follows from the preceding lemma that ∩ν ∆ν = a, a single point in Rn. This point must be contained in one of the sets Uα0, and if ν is sufficiently large then ∆ν ⊂ Uα0 as well; but that is a contradiction, since the cells ∆ν were chosen so that none of them could be covered by finitely many of the sets {Uα}. Therefore the cell ∆ is compact, which concludes the proof.

Although the entire vector space Rn with its standard topology is not compact, any closed cell contained in Rn is compact, so the space Rn is at least locally compact, meaning that each point in Rn is an interior point of a compact set. Not all metric spaces are locally compact, but those that are have some particularly convenient properties. If S is a locally compact Hausdorff space then for any point a ∈ S and any open neighborhood U of the point a there is an open neighborhood V of the point a such that a ∈ V ⊂ V̄ ⊂ U where V̄ is compact. Indeed a ∈ K° for some compact subset K ⊂ S, and if U is an open neighborhood of a then a ∈ (U ∩ K°); since the point a and the boundary ∂(U ∩ K°) are disjoint closed subsets of the compact set K, by Theorem 2.7 (v) there are disjoint open neighborhoods V of a and W of ∂(U ∩ K°), and replacing V by V ∩ (U ∩ K°) if necessary it can be assumed that a ∈ V ⊂ U ∩ K°. Then V̄ is a closed subset of K hence is compact, and since V ∩ W = ∅ the closure V̄ contains no points of ∂(U ∩ K°), so it follows that V̄ ⊂ (U ∩ K°) ⊂ U.

Theorem 2.9 A nonempty perfect set in a locally compact Hausdorff space contains uncountably many points.

Proof: Suppose that E is a nonempty perfect subset of a locally compact Hausdorff space S and that E contains only countably many points a1, a2, . . . ; since E = E′ the set E must contain infinitely many points. Choose an open neighborhood U1 of the point a1 in S such that Ū1 is compact, which is possible since S is locally compact. Since a1 ∈ U1 ∩ E′ the set U1 must contain infinitely many points of E, so it is possible to choose an open set U2 with Ū2 ⊂ U1 such that U2 ∩ E′ ≠ ∅ but a1 ∉ U2. Now continue this process inductively, obtaining in this way a sequence of open subsets Un ⊂ S such that Ūn ⊂ Un−1, that Un ∩ E′ ≠ ∅, but that an−1 ∉ Un. The sets Ūn ∩ E are closed sets that are contained in the compact set Ū1, so they are also compact; they are nonempty and Ūn ⊂ Ūn−1, so their intersection U = ∩n (Ūn ∩ E) is a nonempty subset of E by Theorem 2.7 (ii). However by construction the set U can contain none of the points an, which is a contradiction, showing that the assumption that E is countable is false and thereby concluding the proof.

In particular the field R of real numbers is uncountable.²

²One of the classical questions in mathematics, going back to Georg Cantor, is the truth or falsity of the continuum hypothesis: the hypothesis that there are no subsets of the set R of real numbers that have cardinality strictly greater than that of the set Q of the rationals but strictly less than that of the entire set R of real numbers, or equivalently the question whether there are any sets S such that ℵ0 < #(S) < c where c = #(R). Kurt Gödel showed in 1940 that it is consistent with the rest of mathematics to assume that there are no such sets, and Paul Cohen showed in 1963 that it is consistent with the rest of mathematics to assume that there are such sets.

The preceding theorem is of perhaps surprising use in a good deal of analysis.

Theorem 2.10 (Baire's Theorem) The intersection of a countable number of dense open subsets of a locally compact Hausdorff space S is dense in S.

Proof: If Ei are dense open subsets of a locally compact Hausdorff space S and E = ∩i Ei, consider a point a ∈ S and an open neighborhood U of the point a. The intersection U ∩ E1 is nonempty, since E1 is dense; and if a1 ∈ U ∩ E1 then, since both U and E1 are open, there will be a nonempty open neighborhood U1 of the point a1 such that Ū1 ⊂ E1 ∩ U, and since S is locally compact it can be assumed that the closure Ū1 is compact. The intersection U1 ∩ E2 is nonempty, since E2 is dense, so there is a point a2 ∈ U1 ∩ E2 and an open neighborhood U2 of a2 such that U2 ≠ ∅ and Ū2 ⊂ U1 ∩ E2. Repeating this process inductively yields a sequence of nonempty open subsets Ui ⊂ S such that Ūi+1 ⊂ Ui ∩ Ei+1. All of the sets Ūi are closed subsets of the compact set Ū1 hence are compact sets, and since Ūi+1 ⊂ Ūi the intersection ∩i Ūi is nonempty by Theorem 2.7 (ii); so there is a point b ∈ ∩i Ūi. Since b ∈ Ūi ⊂ Ei for all i it follows that b ∈ E, and since also b ∈ Ū1 ⊂ U the arbitrary neighborhood U of the arbitrary point a contains a point of E; hence E is dense in S as desired.

A subset of a topological space is said to be of the first category if it is a countable union of nowhere dense sets, and of the second category if it is not of the first category. Theorem 2.10 is usually called Baire's category theorem, following the notion of category for which Baire proved the theorem; the preceding theorem then shows that a locally compact Hausdorff space S is a set of the second category.

second category, that is, cannot be represented as a countable union of nowhere dense subsets. Indeed suppose that Ai ⊂ S are nowhere dense sets such that S = ∪i Ai, so of course S = ∪i Āi; but then ∩i (S ∼ Āi) = ∅, which contradicts the preceding theorem since the sets S ∼ Āi are dense open subsets of S.

An interesting consequence of this result is that the rational numbers cannot be written as an intersection of countably many open subsets of R. Indeed suppose that Q = ∩i Ui for some open subsets Ui ⊂ R; each Ui contains all the rationals, so each complement R ∼ Ui is a closed nowhere dense set, and the set I of irrationals can be written as the union I = ∪i (R ∼ Ui) of the countably many closed nowhere dense sets R ∼ Ui. But then R = Q ∪ ∪i (R ∼ Ui) is a representation of the real numbers R as a countable union of nowhere dense sets, since the rationals are countable and each single point is nowhere dense, in contradiction to the Baire category theorem.

To return to the special case of the vector spaces Rn again, compactness can be characterized quite simply and usefully. For this purpose a subset S ⊂ Rn is said to be bounded if it is contained in a ball Br(a), or equivalently if it is contained in a cell ∆.

Theorem 2.11 (Heine-Borel Theorem) A subset E ⊂ Rn is compact if and only if it is closed and bounded.

Proof: Any bounded closed subset E ⊂ Rn is contained in some closed cell, and since a closed cell is compact by Theorem 2.8 it follows from Theorem 2.7 (i) that the set E is compact. On the other hand it also follows from Theorem 2.7 (iii) that a compact subset of Rn is closed, since Rn is Hausdorff; and if a set E is not bounded it can be covered by the open balls Br(0) for r = 1, 2, 3, . . ., but not by any finite subset of these balls, so E is not compact. That suffices for the proof.

It follows from Theorem 2.7 that in any Hausdorff space any infinite set of points in a compact subset K has a limit point in K; Weierstrass demonstrated that at least in Rn the converse also holds.

Theorem 2.12 (Bolzano-Weierstrass Theorem) A subset E ⊂ Rn is compact if and only if every sequence of distinct points in E has a limit point in E.

Proof: It is only necessary to demonstrate that if a subset E ⊂ Rn has the property that any infinite subset of E has a limit point in E then E is compact; equivalently, that if E ⊂ Rn is not compact then it contains an infinite set of points with no limit point in E. If E ⊂ Rn is not compact then by the Heine-Borel Theorem either E is not bounded or E is not closed. If E is not bounded it must contain points aν such that ||aν|| ≥ ν, and this set of points clearly can have no limit point at all. If E is not closed then at least one of its limit points is not contained in E, and a sequence of distinct points of E converging to that limit point has no limit point in E itself. That suffices to conclude the proof.

It is worth examining some more nontrivial and perhaps surprising examples of the topology of subsets of the vector spaces Rn; these illustrate some of the complications that are possible, so are useful to keep in mind in order not to leap to conclusions that may seem intuitively obvious but that are in fact false.
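Before turning to those examples, the bisection argument that usually proves the Bolzano-Weierstrass theorem — one half of a bounded interval always contains infinitely many terms of the sequence — can be run numerically. A sketch under the assumption of a bounded real sequence (the helper name is invented for illustration):

```python
import math

def limit_point(values, lo, hi, steps=30):
    """Bisection from the Bolzano-Weierstrass argument: repeatedly descend
    into a half of [lo, hi] that still holds at least half the surviving
    terms, so the nested intervals close in on a limit point."""
    pts = list(values)
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [x for x in pts if x <= mid]
        right = [x for x in pts if x > mid]
        if len(left) >= len(right):
            pts, hi = left, mid
        else:
            pts, lo = right, mid
    return (lo + hi) / 2

xs = [math.sin(n) for n in range(1, 100001)]   # a bounded sequence in [-1, 1]
c = limit_point(xs, -1.0, 1.0)
assert -1.0 <= c <= 1.0
# c behaves like a limit point: many terms fall in a small neighborhood of it
assert sum(1 for x in xs if abs(x - c) < 1e-3) > 10
```

With 100,000 terms of a bounded sequence, whichever point the bisection singles out has a crowd of terms arbitrarily close to it, which is what the theorem guarantees for closed bounded sets.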

The first example is the Cantor set, one of the classical and canonical examples of surprising sets. Let I0 be the closed unit interval I0 = [0, 1]; let I1 ⊂ I0 be the set that arises from I0 if the open middle third (1/3, 2/3) of I0 is removed; let I2 be the set that arises from I1 if the open middle third of each segment of I1 is removed; and continue that process. This process produces nonempty compact subsets In ⊂ I0 such that I0 ⊃ I1 ⊃ I2 ⊃ · · ·, and the Cantor set is the intersection C = ∩n In, which is a nonempty compact subset of the unit interval I0. When the real numbers in the interval [0, 1] are described by their ternary expansions, the expansions to the base 3, it is clear that the Cantor set consists of those real numbers in [0, 1] with a ternary expansion .a1 a2 a3 a4 · · · where ai = 0 or 2, so a ternary expansion for which ai ≠ 1. Note that the end points of the segments in each set In belong to the Cantor set; any point in the Cantor set is then a limit of the set of these end points, so the Cantor set is not only closed but perfect. On the other hand the Cantor set contains no closed intervals, since any interval will eventually contain points that are removed; it is an interesting exercise to note that the Cantor set is a closed nowhere dense subset of the unit interval.

The next example is the rational neighborhood. It was noted in Theorem 2.5 that open subsets of R1 consist of countably many disjoint open intervals (a, b), which sounds somewhat simpler than it may be in some cases. The rationals are countable, so list them as a1, a2, a3, . . ., and for some ε > 0 let S be the union S = ∪i Ii where Ii is the open interval centered at the point ai and of length ε/2^i. The set S thus is an open subset of R1 that contains all the rationals, but nonetheless it does not cover the entire real line. Indeed if S covered the closed interval I = [0, 1] for instance, then since that interval is compact finitely many N of the intervals Ii would serve to cover I; but the total length of these intervals is at most ε(1/2 + (1/2)² + · · · + (1/2)^N) < ε, so for ε < 1 they cannot cover I. It is an interesting exercise to note that the closure of the set S is the whole line, so the boundary of S is its complement, and that the complement of S is a closed nowhere dense subset of R1.
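Returning to the Cantor set, its ternary description lends itself to a computational check: a point of [0, 1] lies in C exactly when repeatedly rescaling its outer third back onto [0, 1] never lands in the open middle third. A sketch in exact arithmetic (the function names are illustrative):

```python
from fractions import Fraction

def in_cantor(x, depth=60):
    """Decide Cantor-set membership by iterating the map that sends each
    outer third of [0, 1] back onto [0, 1]; a point escapes exactly when
    some ternary digit is forced to be 1.  Exact rational arithmetic makes
    the periodic orbits of rationals reliable within `depth` steps."""
    x = Fraction(x)
    for _ in range(depth):
        if x >= Fraction(2, 3):
            x = 3 * x - 2          # ternary digit 2
        elif x <= Fraction(1, 3):
            x = 3 * x              # ternary digit 0
        else:
            return False           # the digit 1 is unavoidable
    return True

assert in_cantor(Fraction(1, 4))       # 1/4 = 0.020202... in base 3
assert in_cantor(Fraction(1, 3))       # end points survive every stage
assert not in_cantor(Fraction(1, 2))   # the middle third is removed first

def stage(n):
    """The closed intervals making up the n-th stage I_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            t = (b - a) / 3
            nxt += [(a, a + t), (b - t, b)]
        intervals = nxt
    return intervals

I4 = stage(4)
assert len(I4) == 16
assert sum(b - a for a, b in I4) == Fraction(2, 3) ** 4
```

The second check makes the nowhere-density quantitative: the total length of In is (2/3)ⁿ, which tends to 0 even though C itself is uncountable.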


Chapter 3

Functions and Mappings

3.1 Continuous Mappings

As defined earlier, a mapping f : S −→ T from a set S to a set T associates to each point x ∈ S a point f(x) = y ∈ T. Mappings having special regularity properties are of particular interest; perhaps the simplest sort of regularity is continuity, which can be defined in a variety of ways depending on the particular sorts of sets involved.

If S and T are metric spaces, with metrics ρS and ρT, a mapping f : S −→ T is defined to be continuous at a point a ∈ S if for any ε > 0 there is a δ > 0 such that ρT(f(x), f(a)) < ε whenever ρS(x, a) < δ. Equivalently, in terms of ε-neighborhoods in these metric spaces, a mapping f : S −→ T is continuous at a point a ∈ S if for any ε > 0 there is a δ > 0 such that f(Nδ(a)) ⊂ Nε(b) where b = f(a). It is quite evident that if a mapping f is continuous at a point a ∈ S in terms of metrics ρS and ρT it is also continuous in terms of any equivalent metrics ρ′S and ρ′T. Yet another equivalent way of defining continuity is that a mapping f : S −→ T is continuous at a point a ∈ S where f(a) = b ∈ T if for any ε > 0 the inverse image f⁻¹(Nε(b)) contains an open neighborhood of the point a ∈ S; for if f⁻¹(Nε(b)) contains an open neighborhood U of the point a then for some δ sufficiently small Nδ(a) ⊂ U so that f(Nδ(a)) ⊂ Nε(b), and conversely if there is a δ > 0 such that f(Nδ(a)) ⊂ Nε(b) then Nδ(a) is an open neighborhood of a contained in f⁻¹(Nε(b)). Finally, for yet one more equivalent statement, a mapping f : S −→ T is continuous at a point a ∈ S if f⁻¹(V) is an open neighborhood of a for any open neighborhood V of f(a); for any open neighborhood V of b contains Nε(b) for a sufficiently small ε > 0, and Nε(b) is itself an open neighborhood of b. In view of this last equivalence, if it is just assumed that S and T are topological spaces a mapping f : S −→ T is defined to be continuous at a point a ∈ S if f⁻¹(V) is an open neighborhood of a for any open neighborhood V of f(a); this definition thus is equivalent to the usual definition for a metric space in terms of the metric topology, so is a perfectly reasonable definition in general.

A mapping f : S −→ T between two topological spaces is defined to be continuous if it is continuous at each point a ∈ S, and that is clearly equivalent to the condition that f⁻¹(V) is an open subset of S for any open subset V ⊂ T. It is obvious that f⁻¹(T ∼ V) = S ∼ f⁻¹(V) for any mapping f : S −→ T and any subset V ⊂ T; consequently a mapping f : S −→ T is continuous if and only if f⁻¹(C) is closed for each closed subset C ⊂ T, since the closed sets are precisely the complements of the open sets.

The simplest example of a continuous mapping between topological spaces is a constant mapping, a mapping f : S −→ T such that f(x) = b for all points x ∈ S and a single point b ∈ T; for the inverse image f⁻¹(V) of any open subset V ⊂ T is either the empty set ∅ or the entire set S, both of which are open, since if a ∈ S then either f(a) ∈ V or f(a) ∉ V and only one of these cases is possible.

For any mappings f : R −→ S and g : S −→ T the composition h = g ◦ f : R −→ T is the mapping that associates to any point a ∈ R the point h(a) = g(f(a)) ∈ T. If f is continuous at a point a ∈ R and g is continuous at the point b = f(a) ∈ S then the composition h = g ◦ f is continuous at the point a ∈ R; for if W is an open neighborhood of c = h(a) ∈ T then V = g⁻¹(W) is an open neighborhood of b since g is continuous at b, and U = h⁻¹(W) = f⁻¹(V) is an open neighborhood of a since f is continuous at a. Consequently if f : R −→ S is a continuous mapping in R and g : S −→ T is a continuous mapping in S then h = g ◦ f : R −→ T is a continuous mapping in R. A mapping f : S −→ T is continuous in a subset U ⊂ S if its restriction f|U : U −→ T is continuous in terms of the relative topology or induced topology in U.

Mappings f : Rm −→ Rn between real vector spaces are described by their coordinate functions fi(x), where f(x) = {fi(x)} for 1 ≤ i ≤ n, and the mapping f is continuous if and only if the coordinate functions fi(x) are continuous, since ||f(x) − f(a)||∞ = maxi |fi(x) − fi(a)|. Consequently for many purposes it is enough to consider mappings f : Rm −→ R1, usually called functions. A linear function, one for which f(x) = Σ_{i=1}^{m} ci xi, is continuous, for |f(x) − f(a)| = |Σ_{i=1}^{m} ci(xi − ai)| ≤ m ||c||∞ ||x − a||∞; if ||c||∞ = 0 the mapping is constant hence continuous, while if ||c||∞ ≠ 0 then |f(x) − f(a)| < ε whenever ||x − a||∞ < ε/(m ||c||∞), so the mapping is also continuous. Therefore any linear transformation T : Rm −→ Rn is continuous in Rm.

For some other simple examples, a mapping f : R2 −→ R1 for which f(x1, x2) = x1 x2 is continuous at any point (a1, a2), for

|f(x1, x2) − f(a1, a2)| = |(x1 − a1)(x2 − a2) + a1(x2 − a2) + a2(x1 − a1)|
   ≤ |x1 − a1||x2 − a2| + |a1||x2 − a2| + |a2||x1 − a1| < ε

whenever ε < 1 and |xi − ai| < (ε/3) min(1, |a1|⁻¹, |a2|⁻¹). A mapping f : R1 −→ R1 for which f(x) = 1/x is continuous at any point a ≠ 0, for if |x − a| < min(|a|/2, ε a²/2) then |x| > |a|/2 and |f(x) − f(a)| = |ax|⁻¹|x − a| < ε. More generally if mappings f, g : S −→ R from a topological space S to the real numbers are continuous then their sum f + g is continuous, since it is the composition of the continuous mappings (f, g) : S −→ R2 and the addition x1 + x2 of real numbers; their product fg is continuous, since it is the composition of the continuous mappings (f, g) : S −→ R2 and the product x1 x2 of real numbers; and the mapping 1/f is continuous at any point a ∈ S at which f(a) ≠ 0, since it is the composition of the continuous mapping f : S −→ R and the inverse 1/x of real numbers.
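The δ extracted in the argument for f(x) = 1/x can be tested directly: taking δ = min(|a|/2, ε a²/2) keeps |1/x − 1/a| below ε. A randomized spot check of that bound (the helper name is illustrative):

```python
import random

def delta_for_inverse(a, eps):
    """The delta from the continuity estimate for f(x) = 1/x at a != 0:
    |x - a| < min(|a|/2, eps*a*a/2) forces |x| > |a|/2 and hence
    |1/x - 1/a| = |x - a| / (|x||a|) < eps."""
    return min(abs(a) / 2, eps * a * a / 2)

random.seed(0)                       # deterministic spot check
for _ in range(1000):
    a = random.uniform(0.1, 10.0) * random.choice([-1, 1])
    eps = random.uniform(1e-3, 1.0)
    d = delta_for_inverse(a, eps)
    x = a + random.uniform(-d, d) * 0.999   # strictly inside the delta-ball
    assert abs(1 / x - 1 / a) < eps
```

The first entry of the minimum keeps x away from 0, the second makes the difference quotient small; both are needed, exactly as in the estimate above.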

It follows from these observations that any sums and products of continuous real-valued functions, and inverses of continuous real-valued functions at points at which they are nonzero, are all continuous functions.

Functions can fail to be continuous, or equivalently can be discontinuous, for a variety of reasons, some obvious and others perhaps not so obvious. The real-valued function

(3.1)   f(x) = 0 if x is rational, 1 if x is irrational,

is continuous at no point. The real-valued function

(3.2)   g(x) = 0 if x is irrational, 1/q if x = p/q for coprime p, q,

is continuous at all irrational numbers but discontinuous at all rational numbers. The function of two variables

(3.3)   h(x1, x2) = x1 x2 / (x1² + x2²) if |x1| + |x2| ≠ 0, and h(x1, x2) = 0 if x1 = x2 = 0,

is continuous at every point (x1, x2) ∈ R2 except at the origin, at which it is discontinuous.

Continuity of a mapping f : S −→ T between two topological spaces is characterized by the condition that the inverse image of an open set is open, or alternatively by the condition that the inverse image of a closed set is closed. However the image of an open set need not be open, and the image of a closed set need not be closed. For example if S = [0, 1) ∪ [2, 3] with the induced topology as a subset of R1, the mapping f : S −→ [0, 3] defined by

(3.4)   f(x) = 2x if 0 ≤ x < 1, and f(x) = x if 2 ≤ x ≤ 3,

clearly is a continuous mapping. The subset [2, 3] ⊂ S is open in S but its image [2, 3] is closed in [0, 3], and the subset [0, 1) ⊂ S is closed in S but its image [0, 2) is open in [0, 3]. However some properties of sets are preserved under mappings rather than their inverses.

Theorem 3.1 The image of a compact subset K ⊂ S under a continuous mapping f : S −→ T is compact.

Proof: If K ⊂ S is compact and f(K) is contained in a union of open sets Uα then K is contained in the union of the open sets f⁻¹(Uα), and since K is compact it is contained in a union of finitely many of the sets f⁻¹(Uα); hence f(K) is contained in the union of the images f(f⁻¹(Uα)) ⊂ Uα of these finitely many open sets. That suffices for the proof.
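The discontinuity of the function (3.3) at the origin can be seen numerically: along the line x2 = m·x1 the function is constantly m/(1 + m²), so different lines through the origin yield different limiting values, and no choice of h(0, 0) can make h continuous there. A small check (assuming ordinary floating-point arithmetic):

```python
def h(x1, x2):
    """The function (3.3): x1*x2 / (x1^2 + x2^2) away from the origin."""
    return 0.0 if x1 == x2 == 0 else x1 * x2 / (x1 ** 2 + x2 ** 2)

# Along the line x2 = m*x1 the value is constant, m / (1 + m^2):
for m in (0.0, 1.0, -2.0):
    for t in (1e-2, 1e-5, 1e-9):
        assert abs(h(t, m * t) - m / (1 + m ** 2)) < 1e-12

# Approaching along x2 = x1 gives 1/2, along the x1-axis gives 0,
# so h has no limit at the origin.
assert abs(h(1e-9, 1e-9) - 0.5) < 1e-12
assert h(1e-9, 0.0) == 0.0
```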

However the inverse image of a compact set under a continuous mapping need not be compact; indeed for the continuous mapping (3.4) the inverse image f⁻¹([0, 2]) = [0, 1) ∪ {2} of the compact set [0, 2] is not compact.

One application of the preceding theorem is very commonly used.

Corollary 3.1 If f : S −→ R is a continuous real-valued function on a compact topological space S then there are points a, b ∈ S for which

(3.5)   f(a) = sup_{x∈S} f(x)   and   f(b) = inf_{x∈S} f(x).

Proof: The image f(S) ⊂ R is compact by the preceding theorem, hence is a closed set; so if α = sup_{x∈S} f(x) then since α is a limit point of the set f(S) it must be contained in the set f(S), hence α = f(a) for some point a ∈ S, and correspondingly for β = inf_{x∈S} f(x). That suffices for the proof.

On a noncompact space such as the interval I = (0, 1) ⊂ R clearly sup_{x∈I} x = 1 and inf_{x∈I} x = 0, but these values are not attained at any point of I.

A one-to-one continuous mapping f : S −→ T between two topological spaces has an inverse mapping, but the inverse mapping may not be continuous. For example the mapping (3.4) is a continuous one-to-one mapping, but its inverse fails to be continuous at the point 2. If the inverse of a one-to-one mapping f : S −→ T between two topological spaces is continuous the mapping f is called a homeomorphism between the two spaces S and T; its inverse then is a homeomorphism f⁻¹ : T −→ S. There are some general conditions which ensure that a one-to-one continuous mapping is a homeomorphism.

Theorem 3.2 A one-to-one and continuous mapping f : S −→ T from a compact Hausdorff space S onto another topological space T has a continuous inverse.

Proof: If the mapping f : S −→ T is one-to-one it has a well defined inverse mapping g : T −→ S. To show that g is continuous it suffices to show that g⁻¹(E) is closed for any closed subset E ⊂ S. If E is closed then since S is compact E is compact, and then g⁻¹(E) = f(E) is compact by Theorem 3.1, hence is closed by Theorem 2.7 (iii). That suffices for the proof.

Theorem 3.3 The image of a connected topological space S under a continuous mapping f : S −→ T is connected.

Proof: If the image f(S) is not connected then it contains a nonempty proper subset E ⊂ f(S) that is both open and closed; its inverse image f⁻¹(E) ⊂ S then is a nonempty proper subset of S that is also both open and closed, which is impossible since S is assumed connected. That suffices for the proof.
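The mapping (3.4) gives a concrete numerical picture of a continuous bijection whose inverse is discontinuous: points of the image just below 2 come from points of S near 1, while 2 itself comes from 2. A sketch:

```python
def f(x):
    """The mapping (3.4), a continuous bijection from S = [0,1) U [2,3] onto [0,3].
    Only arguments in S are meaningful."""
    return 2 * x if 0 <= x < 1 else x

def f_inv(y):
    """The inverse mapping on [0, 3]."""
    return y / 2 if 0 <= y < 2 else y

# f is one-to-one from S onto [0, 3]:
for x in (0.0, 0.5, 0.999, 2.0, 2.5, 3.0):
    assert f_inv(f(x)) == x

# ... but f_inv jumps at y = 2: values just below 2 map near 1,
# while f_inv(2) = 2 itself.
assert abs(f_inv(2.0) - f_inv(2.0 - 1e-9)) > 0.9
```

The jump shows that continuity of the inverse genuinely uses the hypotheses of Theorem 3.2 — here the domain S is not compact.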

The preceding theorem has a very commonly used corollary.

Corollary 3.2 If f : [a, b] −→ R is a continuous function in the interval [a, b] ⊂ R where f(a) = A < f(b) = B, then any C ∈ [A, B] is the image C = f(c) of some point c ∈ [a, b].

Proof: Suppose to the contrary that there is some value C such that A < C < B but f(x) ≠ C for all a ≤ x ≤ b. The sets E− = f⁻¹(−∞, C) and E+ = f⁻¹(C, ∞) are nonempty disjoint open subsets of the interval [a, b] for which [a, b] = E− ∪ E+, since there are no points x ∈ [a, b] for which f(x) = C; but these sets then are closed as well as open, which is impossible since [a, b] is connected. That contradiction suffices to conclude the proof.

The inverse image of a connected topological space under a continuous mapping need not be connected however; indeed the mapping (3.4) once again is a continuous one-to-one mapping from the set S that is not connected onto the connected set [0, 3], so the inverse image of the connected set [0, 3] is not connected.

A function f : S −→ R in a topological space S is said to have the limit b at a point a ∈ S, indicated by b = lim_{x→a} f(x), if the function f̃a,b : S −→ R defined by

(3.6)   f̃a,b(x) = f(x) for x ≠ a,   f̃a,b(x) = b for x = a,

is continuous at the point a. For a connected topological space S consisting of more than a single point, if the limit lim_{x→a} f(x) exists for a mapping f : S −→ R it is unique. Indeed if f̃a,b and f̃a,c are continuous at a for real values b ≠ c, there are disjoint open neighborhoods Ub and Uc of these two points for which f̃a,b⁻¹(Ub) and f̃a,c⁻¹(Uc) are open neighborhoods of a in S; the intersection of these two neighborhoods then also is an open neighborhood of a in S, but since any point x ≠ a in that intersection would have f(x) ∈ Ub ∩ Uc = ∅, the intersection is just the point a itself, so a is an isolated point of S, which is impossible. On the other hand if the point a is an isolated point of S then b = lim_{x→a} f(x) for any b ∈ R, since an isolated point a is an open subset itself; hence the notion of the limit of a function is of interest primarily in connected topological spaces.

A function may not have a limit; for instance the function f : R −→ R that is 1 if x is rational and 0 if x is irrational does not have a limit at any point x ∈ R, since the inverse image under f̃a,b of any sufficiently small open neighborhood of any point b ∈ R is either empty or consists just of rational numbers or consists just of irrational numbers, aside possibly from the point a itself, hence is not an open neighborhood of a.

The special case of continuous functions in a topological space S, mappings f : S −→ R, is of particular interest and importance, and more detailed information and further techniques for dealing with continuous functions are available for special classes of topological spaces. In view of the uniqueness of limits, it is evident from (3.6) that if a function f : S −→ R has a limit at a point a ∈ S then f is continuous at a if and only if f(a) = lim_{x→a} f(x).
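The intermediate value theorem of Corollary 3.2 has a constructive counterpart: bisection locates a point c with f(c) = C for a continuous f. A minimal sketch (the helper name is illustrative):

```python
def ivt_point(f, a, b, C, tol=1e-10):
    """Locate c in [a, b] with f(c) = C by bisection, assuming f is
    continuous with f(a) <= C <= f(b); this is the constructive content
    of the intermediate value theorem."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < C:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(x) = x^3 - x is continuous with f(1) = 0 and f(2) = 6, so it takes
# the intermediate value 2 somewhere in [1, 2]:
c = ivt_point(lambda x: x ** 3 - x, 1.0, 2.0, 2.0)
assert abs((c ** 3 - c) - 2.0) < 1e-6
```

The connectedness of [a, b] is what makes the halving step always legitimate: one of the two halves must still straddle the value C.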

It is sometimes useful to characterize limits in terms of sequences.

Theorem 3.4 If f : S −→ R is a function on a connected metric space S then lim_{x→a} f(x) = b if and only if lim_{n→∞} f(an) = b for any sequence {an} in S ∼ a such that lim_{n→∞} an = a.

Proof: If lim_{x→a} f(x) = b then for any ε > 0 there is an open neighborhood Nδ(a) of the point a ∈ S such that |f(x) − b| < ε whenever x ∈ Nδ(a) ∼ a; so if lim_{n→∞} an = a there is a number N such that if n > N then an ∈ Nδ(a), and hence |f(an) − b| < ε, which shows that lim_{n→∞} f(an) = b. On the other hand if it is not the case that lim_{x→a} f(x) = b then for some ε > 0 there will exist neighborhoods N_{1/n}(a) such that f(N_{1/n}(a)) is not contained in Nε(b), so there will be points an ∈ N_{1/n}(a) such that f(an) ∉ Nε(b); thus lim_{n→∞} an = a but it is not the case that lim_{n→∞} f(an) = b. That suffices for the proof.

For the special case of bounded functions in a metric space S another characterization of continuity has several uses. If a function f : S −→ R is bounded in S, that is, satisfies |f(x)| ≤ M for all points x ∈ S and some constant M, its oscillation in any subset E ⊂ S is defined by

(3.7)   o_f(E) = sup_{x∈E} f(x) − inf_{x∈E} f(x);

this is well defined since the values |f(x)| are bounded. It is evident that if E1 ⊂ E2 then o_f(E1) ≤ o_f(E2). In particular if a function f is defined and bounded in an open neighborhood Nr(a) of a point a ∈ S then o_f(Nr1(a)) ≤ o_f(Nr2(a)) whenever r1 ≤ r2 ≤ r, and it is then clear that the limit

(3.8)   o_f(a) = lim_{r→0} o_f(Nr(a))

exists; it is called the oscillation of the function f at the point a.

Theorem 3.5 A bounded function f in an open neighborhood of a point a in a metric space S is continuous at the point a if and only if o_f(a) = 0.

Proof: If o_f(a) = 0 then for any ε > 0 there is a δ > 0 such that o_f(Nδ(a)) < ε, so in particular |f(x) − f(a)| < ε whenever x ∈ Nδ(a), and consequently f is continuous at the point a. Conversely if f is continuous at a then for any ε > 0 there is a δ > 0 such that |f(x) − f(a)| < ε/2 whenever x ∈ Nδ(a); therefore f(a) − ε/2 ≤ f(x) ≤ f(a) + ε/2 for any x ∈ Nδ(a), hence o_f(Nδ(a)) ≤ ε and consequently o_f(a) = 0. That suffices for the proof.

Lemma 3.1 If f is a bounded function in a subset S of a metric space then

   Eε = { x ∈ S | o_f(x) < ε }

is an open subset of S for any ε > 0.

Proof: If a ∈ Eε then o_f(a) < ε, so o_f(Nδ(a) ∩ S) < ε if δ is sufficiently small. If b ∈ Nδ(a) ∩ S and δ′ is chosen sufficiently small that Nδ′(b) ⊂ Nδ(a), then o_f(Nδ′(b) ∩ S) ≤ o_f(Nδ(a) ∩ S) < ε and therefore Nδ′(b) ∩ S ⊂ Eε. That shows that Eε is an open subset of S and thereby concludes the proof.

Theorem 3.6 If f is a bounded function in a metric space S then the set of those points of S at which f is continuous forms a Gδ set.

Proof: By the preceding lemma the set

   En = { x ∈ S | o_f(x) < 1/n }
is an open subset of S for any natural number n. The intersection E = ∩_{n∈N} En is consequently a Gδ set, that is, a countable intersection of open sets. Clearly x ∈ E if and only if o_f(x) = 0, which is just the condition that f is continuous at the point x ∈ S by Theorem 3.5, and that suffices for the proof.

It was demonstrated as an application of Baire's Theorem, Theorem 2.10, that the rational numbers do not form a Gδ subset of the real numbers; therefore it follows from the preceding theorem that there does not exist a function that is continuous precisely at the rational numbers and discontinuous at the irrational numbers, to complement the discussion of the example (3.2) of a function that is continuous at the irrationals and discontinuous at the rationals.

Some properties of continuity are not purely topological properties, in the sense that they cannot be stated purely in terms of open and closed sets, but are metric properties, involving the metrics used to define the topology. For a mapping f : S −→ T between two metric spaces S and T with metrics ρS and ρT, the mapping f is continuous at a point a ∈ S if and only if for every ε > 0 there is a δ > 0, which may depend on the point a, such that ρT(f(a), f(x)) < ε whenever ρS(x, a) < δ; and the mapping f is continuous in S if it is continuous at each point a ∈ S. The mapping f is uniformly continuous in S if it is possible to find values δ > 0 that are independent of the point a ∈ S, or equivalently if for any ε > 0 there exists δ > 0 such that ρT(f(a), f(x)) < ε whenever ρS(x, a) < δ, for all points a, x ∈ S. Not all continuous mappings are uniformly continuous; for example the mapping f : R −→ R defined by f(x) = x² is not uniformly continuous. However compactness is sufficient to guarantee that a continuous mapping is uniformly continuous.

Theorem 3.7 A continuous mapping f : S −→ T from a compact metric space S to a metric space T is uniformly continuous.

Proof: If f : S −→ T is a continuous mapping and ε > 0 then for any point a ∈ S there is a δa > 0 such that ρT(f(a), f(x)) < ε/2 whenever x ∈ Nδa(a). The collection of open sets Nδa(a) for all points a ∈ S covers the compact space S, so finitely many Ni of these neighborhoods, for 1 ≤ i ≤ N, already suffice to cover S. If δ = mini δai then δ > 0, and if x, y ∈ S are any two points such that ρS(x, y) < δ/2 then x ∈ Ni for one of these finitely many neighborhoods, and since ρS(x, y) < δ/2 it is also the case that y ∈ Ni. It follows that ρT(f(x), f(y)) ≤ ρT(f(x), f(ai)) + ρT(f(ai), f(y)) ≤ ε/2 + ε/2 = ε, so f is uniformly continuous, and that concludes the proof.

A sequence of continuous mappings fn : S −→ T between two topological spaces is said to converge at a point a ∈ S if the sequence of points fn(a) converges in T. The sequence is said to converge in S if it converges at each point of S, and in that case the limit of the sequence is the mapping f for which f(x) = lim_{n→∞} fn(x). A sequence of points in a complete metric space converges if and only if it is a Cauchy sequence; it follows immediately that a sequence of mappings fn : S −→ T from a topological space S to a complete metric space T converges in S if and only if the sequence {fn(x)} is a Cauchy sequence at each point x ∈ S. The limit of a convergent sequence of continuous functions is not necessarily continuous though; for instance the functions fn(x) = xⁿ are continuous in the interval [0, 1] ⊂ R and converge to the limit function

   f(x) = lim_{n→∞} xⁿ = 0 if 0 ≤ x < 1,   1 if x = 1.

For mappings to metric spaces though there is a simple condition that is sufficient to ensure that the limit of a sequence of continuous functions is continuous. A sequence of mappings fn : S −→ T from a topological space S to a metric space T with the metric ρ is uniformly convergent to a mapping f : S −→ T if for any ε > 0 there is a number N such that ρ(fn(x), f(x)) < ε for all x ∈ S whenever n > N.

Theorem 3.8 The limit of a uniformly convergent sequence of continuous mappings from a topological space to a metric space is a continuous mapping.

Proof: Suppose that the sequence of continuous mappings fn : S −→ T from a topological space S to a metric space T converges uniformly to a mapping f : S −→ T. Then for any point a ∈ S and any ε > 0 there is a number N such that ρ(fN(x), f(x)) < ε for all x ∈ S. Since the mapping fN is continuous at the point a there is an open neighborhood U of the point a in S such that ρ(fN(x), fN(a)) < ε for all points x ∈ U. Then for all x ∈ U it follows from the triangle inequality that

   ρ(f(x), f(a)) ≤ ρ(f(x), fN(x)) + ρ(fN(x), fN(a)) + ρ(fN(a), f(a)) < ε + ε + ε = 3ε,

which shows that the limit mapping is continuous at the point a; since that holds for any a ∈ S it follows that the limit mapping is continuous in S, which suffices for the proof.

It is easy to see further that a sequence of mappings from a topological space S to a complete metric space T is uniformly convergent if and only if it is a uniformly Cauchy sequence, meaning that for any ε > 0 there is a number N such that

(3.11)   ρ(fm(x), fn(x)) < ε for all x ∈ S whenever m, n > N.

Indeed it is obvious that a uniformly convergent sequence of mappings is uniformly Cauchy. On the other hand if fn : S −→ T is a uniformly Cauchy sequence in S then for any ε > 0 there is a number N such that ρ(fm(x), fn(x)) < ε whenever m, n > N; the sequence {fn(x)} then converges at each point x ∈ S to a value f(x), and upon taking the limit as m → ∞ it follows that ρ(f(x), fn(x)) ≤ ε for all x ∈ S whenever n > N, so that the sequence {fn} converges uniformly to f in S.

If a sequence of mappings fn : S −→ T from a topological space S to a metric space T converges to a mapping f : S −→ T, and if the individual mappings fn are continuous at the point a ∈ S, so that lim_{x→a} fn(x) = fn(a), it is natural to ask whether the limit function is continuous at a; this is just the question whether f(a) = lim_{x→a} f(x), that is, whether

(3.9)   lim_{n→∞} lim_{x→a} fn(x) = lim_{x→a} lim_{n→∞} fn(x).

Questions about interchanging orders of limits arise surprisingly frequently in analysis, and some caution is always required since the order in which limits are taken can be critical. Even in simple cases the equality (3.9) is readily seen not to hold: for instance for the ordinary functions fn(x) = 1/(1 + nx) of a real variable, lim_{x→0} fn(x) = 1 for all n hence lim_{n→∞} lim_{x→0} fn(x) = 1, while lim_{n→∞} fn(x) = 0 for all points x ≠ 0 hence lim_{x→0} lim_{n→∞} fn(x) = 0. The question whether (3.9) holds can arise even when the functions involved are not necessarily continuous at the point a. A local version of Theorem 3.8, which is sometimes quite useful, can be stated as a result about the interchange of the order of limits, so long as the limits exist.

Theorem 3.9 If a sequence of mappings fn : S −→ T from a topological space S to a complete metric space T converges uniformly to a mapping f : S −→ T, and if the limits lim_{x→a} fn(x) = bn ∈ T exist, then the points bn converge to a point b ∈ T and lim_{x→a} f(x) = b; that is,

(3.10)   lim_{n→∞} lim_{x→a} fn(x) = lim_{x→a} lim_{n→∞} fn(x).

Proof: Since the sequence of mappings {fn} is uniformly convergent it is uniformly Cauchy, so for any ε > 0 there is a number N such that ρ(fm(x), fn(x)) < ε for all x ∈ S whenever m, n > N, which is just (3.11). As x approaches a it follows from (3.11) that ρ(bm, bn) ≤ ε for all m, n > N,

so the sequence {bn} is Cauchy hence converges to a point b ∈ T. By the triangle inequality

   ρ(f(x), b) ≤ ρ(f(x), fN(x)) + ρ(fN(x), bN) + ρ(bN, b).

On the other hand as m tends to ∞ it also follows from (3.11) that ρ(fN(x), f(x)) ≤ ε, and ρ(bN, b) ≤ ε; and since lim_{x→a} fN(x) = bN there is a δ > 0 so that ρ(fN(x), bN) ≤ ε whenever ρ(x, a) < δ. So whenever ρ(x, a) < δ it follows that ρ(f(x), b) ≤ 3ε, and that suffices for the proof.

The set of bounded real-valued functions on a topological space S is denoted by B(S). This set is a real vector space, since the sum of two bounded functions and the product of a bounded function by a constant are also bounded; and moreover it is a commutative algebra, since the product of any two bounded functions is also bounded. The ℓ∞ norm of a function f ∈ B(S) is defined by ||f||∞ = sup_{x∈S} |f(x)|, which is a finite value since by assumption a function f ∈ B(S) is bounded. It is quite clear that this is a norm on the space B(S), just as for the analogous norm on a vector space, and that a sequence of functions fn ∈ B(S) converges in this norm if and only if the sequence is uniformly convergent. The vector space B(S) is a complete vector space, since it is easy to see that any Cauchy sequence of functions fn ∈ B(S) converges. Indeed if fn ∈ B(S) is a Cauchy sequence then for any ε > 0 there is a number N such that ||fn − fm||∞ < ε whenever m, n > N, which implies that for any point x ∈ S the values fn(x) form a Cauchy sequence of real numbers and hence converge to a real value f(x); and since ||fm − fn||∞ ≤ ε for all m, n > N, this inequality holds in the limit as m → ∞, so that ||f − fn||∞ ≤ ε whenever n > N and consequently the sequence fn converges to f in the ℓ∞ norm. The subset C(S) ⊂ B(S) of bounded continuous functions is a closed subspace, since the uniform limit of a sequence of continuous functions is continuous by Theorem 3.8; hence C(S) is also a complete metric space.
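The functions fn(x) = 1/(1 + nx) from the discussion of Theorem 3.9 also illustrate the ℓ∞ point of view: they converge to 0 pointwise on (0, 1], but at x = 1/n the value is always 1/2, so ||fn||∞ ≥ 1/2 for every n and the sequence does not converge in the norm of B((0, 1]); and the two iterated limits at 0 disagree. A numerical sketch:

```python
def fn(n, x):
    return 1.0 / (1.0 + n * x)

# lim_{x->0} fn(x) = 1 for every fixed n, so lim_n lim_x = 1:
for n in (1, 10, 1000):
    assert abs(fn(n, 1e-12) - 1.0) < 1e-6

# lim_{n->inf} fn(x) = 0 for every fixed x > 0, so lim_x lim_n = 0:
for x in (1.0, 0.01):
    assert fn(10 ** 9, x) < 1e-6

# The pointwise convergence to 0 is not uniform: at x = 1/n the value
# is 1/2 no matter how large n is, so sup |fn| >= 1/2 for all n.
for n in (10, 10 ** 4, 10 ** 8):
    assert abs(fn(n, 1.0 / n) - 0.5) < 1e-12
```

The failure of uniform convergence is precisely why the interchange of limits (3.9) fails here, in accordance with Theorem 3.9.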

3.2 Differentiable Mappings

For mappings between vector spaces Rn it is possible to consider not just continuous mappings but also differentiable mappings. A mapping f : U −→ Rn from an open subset U ⊂ Rm into Rn is differentiable at a point a ∈ U if there is a linear transformation A : Rm −→ Rn, described by an n × m matrix A, such that for all h in an open neighborhood of the origin in Rm

(3.12)   f(a + h) = f(a) + Ah + ε(h)   where lim_{h→0} ||ε(h)|| / ||h|| = 0,

where || · || is the ℓ∞ norm or any equivalent norm, for it is clear that equivalent norms yield the same condition of differentiability. Note though that in the quotient ||ε(h)|| / ||h|| the vectors and hence the norms involved are from two different vector spaces. Note also that this condition only involves values h ≠ 0, so no division by 0 is involved. The case m = 3 and n = 2, that of a mapping f : R3 −→ R2, illustrates the general form of the criterion (3.12), which in this case takes the form

   ( f1(a + h) )   ( f1(a) )   ( a11 a12 a13 ) ( h1 )   ( ε1(h) )
   ( f2(a + h) ) = ( f2(a) ) + ( a21 a22 a23 ) ( h2 ) + ( ε2(h) ).
                                               ( h3 )

If f : Rm −→ Rn is differentiable at a then it is continuous at a, since ||f(a + h) − f(a)|| = ||Ah + ε(h)|| ≤ ||A||o ||h|| + ||ε(h)|| < ε if ||h|| is sufficiently small, where ||A||o is the operator norm of the linear transformation A.

The simplest case is that of mappings f : R1 −→ R1, functions of a real variable; in that case the norm is just the absolute value of a real number and it is possible to divide the equation (3.12) by the real number h, so that

   |ε(h)| / |h| = | (f(a + h) − f(a))/h − A |;

consequently the condition that the function f(x) is differentiable at the point a is that

(3.13)   lim_{h→0} (f(a + h) − f(a))/h = A   for some A ∈ R.

Since the limit is uniquely defined, it follows that the constant A is uniquely determined; it is called the derivative of the function f at the point a and is denoted by f′(a).

Some simple mappings are easily seen to be differentiable. For example a constant mapping f : Rm −→ Rn is differentiable at any point a ∈ Rm since f(a + h) = f(a), which is (3.12) for A = 0 and ε(h) = 0. A linear transformation f(x) = Ax for an n × m matrix A is differentiable at any point a ∈ Rm since f(a + h) = A(a + h) = f(a) + Ah, which is (3.12) for the matrix A with ε(h) = 0. If f(x) = cxⁿ for some natural number n then f(a + h) − f(a) = c(a + h)ⁿ − caⁿ = cnaⁿ⁻¹h + h²P(h) for some polynomial P(h) in the variable h, and consequently f(x) is differentiable at any point a ∈ R and its derivative is f′(a) = cnaⁿ⁻¹.

.10 A mapping f : Rm −→ Rn is differentiable at a point a if and only if each of the coordinate functions fi of the mapping f is differentiable at that point. and since k(h)k ∞ khk∞ = max1≤i≤n khk∞ |i (h)| k(h)k∞ it follows that limh→0 khk∞ = 0. . For the special case of a vector h = {0. . it is called the derivative of the mapping f at the point a and is denoted by f 0 (a). . That concludes the proof. is a differentiable function of the variable xk and that its derivative at the point xk = ak is the real number aik . am ) = fi (a1 . . MAPPINGS Theorem 3. so the mapping f is differentiable.14) fi (a + h) = fi (a) + aij hj + i (h) where limh→0 khk = 0.14) takes the form fi (a1 . The collection of these n equations taken together form the equation (3. . 0} having all com- ponents zero except for hk equation (3. so the matrix A is uniquely determined. It follows that the entries in the matrix A = {aik } are the uniquely determined partial derivatives of the coordinate functions of the mapping. 0. viewed as a function of the variable xk alone for fixed values xj = aj for the remaining variables for j 6= k.12) that each coordinate function fi satisfies m |i (h)| X (3. . am ) + aik hk + i (hk ) where limhk →0 |i|h(hkk| )| = 0. . and this is just the condition that each of the coordinate functions fi of the mapping f is differentiable at the point a. . .74 CHAPTER 3. . j=1 since |khk i (h)| ∞ ≤ k(h)k ∞ khk∞ . . . . The constant aik is called the partial derivative of the function fi with respect to the variable xk at the point a. Proof: If f is a differentiable mapping it follows from (3. . that is just the condition that fi (x). ak . . ak + hk . Conversely if each of the coordinate mappings fi is differentiable at the point a then (3.14) holds for 1 ≤ i ≤ n. so it is an m × n matrix with the entries n . hk . .12) in which (h) = {i (h)}. . . . . . . and is denoted by aik = ∂k fi (a). . 0.

o (3.15) f 0 (a) = aik = ∂k fi (a) .

. 1 ≤ k ≤ m. 1 ≤ i ≤ n .
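The matrix of partial derivatives in (3.15) can be approximated by difference quotients; a minimal sketch, for an illustrative mapping f : R3 −→ R2 chosen here only as an example:

```python
# Finite-difference approximation of the n x m matrix f'(a) = {∂_k f_i(a)}
# for the sample mapping f(x1, x2, x3) = (x1*x2, x2 + x3**2) from R^3 to R^2.
def f(x):
    return [x[0] * x[1], x[1] + x[2] ** 2]

def jacobian(f, a, h=1e-6):
    n, m = len(f(a)), len(a)
    J = [[0.0] * m for _ in range(n)]
    for k in range(m):
        ak = a[:]                      # perturb only the k-th coordinate
        ak[k] += h
        for i in range(n):
            J[i][k] = (f(ak)[i] - f(a)[i]) / h
    return J

a = [1.0, 2.0, 3.0]
J = jacobian(f, a)     # exact derivative at a: [[2, 1, 0], [0, 1, 6]]
```

Each column of the computed matrix approximates the derivative with respect to one variable, exactly as in the special choice of h in the discussion above.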

In particular for the special case of functions f : Rm −→ R1 any linear combination c1 f1(x) + c2 f2(x) of differentiable functions is differentiable and (c1 f1 + c2 f2)′(a) = c1 f1′(a) + c2 f2′(a). It is evident from (3.12) that a sum f1 + f2 of two differentiable mappings f1, f2 is also differentiable and that (f1 + f2)′(a) = f1′(a) + f2′(a); and it is also evident from (3.12) that if f : Rl −→ Rm is differentiable and T : Rm −→ Rn is a linear transformation then Tf : Rl −→ Rn is differentiable and (Tf)′(a) = Tf′(a). So in that sense differentiation is a linear operator.

There are various alternative notations for derivatives and partial derivatives of functions of several variables that are in common use. For instance Df(a) is often used for f′(a), and Dk f(a) or ∂f/∂xk (a) for ∂k f(a) when f(x) is a real-valued function of the variable x ∈ Rm; and when the mapping f : Rm −→ Rn is viewed as giving the coordinates yi of a point y ∈ Rn as functions yi = fi(x) of the coordinates xj of points x ∈ Rm the notation ∂yi/∂xj is quite commonly used for ∂j fi.

One particularly useful property of the derivative is its role in determining local extremal values of functions, local maxima or minima. A real-valued function f : U −→ R1 defined in an open subset U ⊂ Rm has a local maximum at a point a ∈ U if there is an open neighborhood Ua of the point a in U such that f(x) ≤ f(a) for all points x ∈ Ua; the notion of a local minimum is defined correspondingly of course, and a local extremum is either a local maximum or a local minimum.

Theorem 3.11 If f : U −→ R1 is a function in an open subset U ⊂ Rm that is differentiable at a point a ∈ U and if the function f has a local extremum at that point then f′(a) = 0.

Proof: First for the special case of a function of a single variable: if f(x) has a local maximum at a point a then (f(a + h) − f(a))/h ≤ 0 for h > 0, so the limit (3.13) taken through values h > 0 is non-positive; but (f(a + h) − f(a))/h ≥ 0 for h < 0, so the limit (3.13) taken through values h < 0 is non-negative. Since the limits are the same whether h > 0 or h < 0 it follows that f′(a) = 0; the corresponding argument gives the same result at a local minimum. Next for the case of a function of several variables: when all the variables except the j-th are held fixed, the function f(x) as a function of the single variable xj has an extremum at the point xj = aj, so by the result established in the special case it follows that ∂j f(a) = 0; and since that holds for each variable xj it follows that all the partial derivatives are 0 at the point a, so f′(a) = 0, which suffices for the proof.
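Theorem 3.11 can be observed numerically: at a local extremum every partial derivative vanishes. A small sketch, using an illustrative function with a local minimum at (1, −2):

```python
# f(x, y) = (x - 1)**2 + (y + 2)**2 has a local minimum at a = (1, -2);
# by Theorem 3.11 both partial derivatives must vanish there.
def f(x, y):
    return (x - 1) ** 2 + (y + 2) ** 2

def partials(f, x, y, h=1e-6):
    # central differences for ∂1 f and ∂2 f
    d1 = (f(x + h, y) - f(x - h, y)) / (2 * h)
    d2 = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return d1, d2

d1, d2 = partials(f, 1.0, -2.0)   # both approximately 0
```
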

One consequence of the preceding result is particularly useful in proofs, the mean value theorem. It is particularly simple and useful for functions of a single variable, so only that case will be examined first; the corresponding but somewhat weaker result for functions of several variables will be discussed later, after some further useful general properties of differentiation have been established.

Theorem 3.12 (Mean Value Theorem) If f and g are continuous functions in a closed interval [a, b] and are differentiable in the open interval (a, b) then

(3.16)  ( f(b) − f(a) ) g′(x) = ( g(b) − g(a) ) f′(x)

at a point x ∈ (a, b). In particular

(3.17)  f(b) − f(a) = (b − a) f′(x)

at a point x ∈ (a, b).

Proof: Introduce the function h(t) defined in the interval [a, b] by

h(t) = ( f(b) − f(a) ) g(t) − ( g(b) − g(a) ) f(t),

for which h(a) = h(b) = f(b)g(a) − g(b)f(a). Since h is a continuous function on the compact set [a, b] it takes a maximum value at some point x1 ∈ [a, b] and a minimum value at some point x2 ∈ [a, b]. If both points x1, x2 are end points of the interval then h(t) is constant, since h(a) = h(b), and then h′(x) = 0 for all points x ∈ (a, b); if at least one of the points x1, x2 is an interior point of (a, b), at which h(t) is differentiable, then h′(xj) = 0 at that point by the preceding theorem. The conclusion of the theorem is just the statement that h′(x) = 0 at a point x ∈ (a, b), which establishes (3.16); and since (3.17) is just the special case of (3.16) for which g(x) = x, that suffices for the proof.

The mean value theorem for functions of a single variable thus asserts that if f(x) is continuous in a closed interval [a, b] and is differentiable at each point of the open interval (a, b) then f(b) − f(a) = f′(c)(b − a) for some point c ∈ (a, b).

If a mapping f : Rm −→ Rn is differentiable at a point a ∈ Rm then it has partial derivatives ∂k fi(a) with respect to each variable xk at that point; but it is not true that conversely if the coordinate functions of a mapping f have partial derivatives at a point a with respect to each variable xj then f is a differentiable mapping. For example the mapping f : R2 −→ R defined by

f(x) = x1 x2 / ( x1² + x2² )  if x ≠ 0,   f(x) = 0  if x = 0,

vanishes identically in the variable x2 if x1 = 0, so ∂2 f(0, 0) = 0, and similarly ∂1 f(0, 0) = 0. This function is not even continuous at the origin, since for instance it takes the value 1/2 whenever x1 = x2 except at the origin where it takes the value 0; hence it is not differentiable at the origin. However if the partial derivatives of a mapping not only exist but also are continuous then the mapping is differentiable.

Theorem 3.13 If the partial derivatives of a mapping f : Rm −→ Rn exist at all points near a and are continuous at the point a then the mapping f is differentiable at the point a.

Proof: In view of Theorem 3.10 it is enough to prove this for the special case that n = 1, so that f : Rm −→ R is just a real-valued function; and for convenience only the case m = 2 will be demonstrated in detail, since it is easier to follow the proof in the simpler case and all the essential ideas are present. Assume that the partial derivatives ∂k f(x) exist for all points x near a and are continuous at a = {aj}, and consider a fixed vector h = {hj}. When one of the variables is held fixed and f is viewed as a function of the remaining variable it is a differentiable function of a single variable; the mean value theorem can be applied to the function f(x1, a2) of the single variable x1 in the interval between a1 and

a1 + h1, and to the function f(a1 + h1, x2) of the single variable x2 in the interval between a2 and a2 + h2, if ‖h‖ is sufficiently small; as a consequence there exist values α1 between a1 and a1 + h1 and α2 between a2 and a2 + h2 such that

f(a1 + h1, a2) − f(a1, a2) = h1 ∂1 f(α1, a2)   and
f(a1 + h1, a2 + h2) − f(a1 + h1, a2) = h2 ∂2 f(a1 + h1, α2).

Then

f(a + h) − f(a) = f(a1 + h1, a2 + h2) − f(a1, a2)
  = ( f(a1 + h1, a2 + h2) − f(a1 + h1, a2) ) + ( f(a1 + h1, a2) − f(a1, a2) )
  = h2 ∂2 f(a1 + h1, α2) + h1 ∂1 f(α1, a2)
  = h2 ∂2 f(a1, a2) + h1 ∂1 f(a1, a2) + ε(h)

where

ε(h) = h2 ( ∂2 f(a1 + h1, α2) − ∂2 f(a1, a2) ) + h1 ( ∂1 f(α1, a2) − ∂1 f(a1, a2) ).

By the triangle inequality

|ε(h)| / ‖h‖∞ ≤ | ∂2 f(a1 + h1, α2) − ∂2 f(a1, a2) | + | ∂1 f(α1, a2) − ∂1 f(a1, a2) |,

since |h1|/‖h‖∞ ≤ 1 and |h2|/‖h‖∞ ≤ 1; and since the partial derivatives are assumed to be continuous at the point a it follows that lim_{h→0} |ε(h)|/‖h‖∞ = 0. That shows that the mapping f : R2 −→ R is differentiable at the point (a1, a2), as defined in (3.12), which suffices to conclude the proof.

If a mapping f : Rm −→ Rn is differentiable at all points of an open subset U ⊂ Rm then the partial derivatives of the coordinate functions of the mapping f exist at each point of U, but they need not be continuous functions in U; so the assumption in the preceding theorem that the partial derivatives are continuous is not automatically fulfilled. A mapping f : U −→ Rn in an open subset U ⊂ Rm is said to be continuously differentiable, or of class C1, in U if the partial derivatives of its coordinate functions exist and are continuous throughout the set U; and that condition implies that the mapping is differentiable at each point of U.

There is sometimes some confusion in the discussion of this point, so some care should be taken to distinguish carefully between the differentiability of a mapping and the existence of all its partial derivatives. Differentiability implies that the partial derivatives exist, but does not imply that they are continuous; and the existence of the partial derivatives does not imply differentiability unless it is also assumed that the partial derivatives are continuous.
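The example above of a function whose partial derivatives at the origin both exist although the function is not even continuous there can be checked numerically; a sketch:

```python
# f(x1, x2) = x1*x2/(x1**2 + x2**2) for x != 0 and f(0) = 0: both partial
# derivatives at the origin exist and equal 0, yet f takes the value 1/2
# on the line x1 = x2, so f is not continuous (hence not differentiable) at 0.
def f(x1, x2):
    return 0.0 if x1 == x2 == 0 else x1 * x2 / (x1 ** 2 + x2 ** 2)

h = 1e-8
d1 = (f(h, 0.0) - f(0.0, 0.0)) / h     # ∂1 f(0,0): exactly 0
d2 = (f(0.0, h) - f(0.0, 0.0)) / h     # ∂2 f(0,0): exactly 0
on_diagonal = f(1e-9, 1e-9)            # stays at 1/2 arbitrarily close to 0
```
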

There is a corresponding result for differentiable mappings from Rm to Rn, with somewhat varying hypotheses. In view of Theorems 3.10 and 3.13 it is generally sufficient just to demonstrate this result for functions of a single variable, in which case a somewhat stronger result is possible.

Theorem 3.14 If {fn} is a sequence of differentiable functions in an open interval (a, b) such that the derivatives fn′ converge uniformly in (a, b) and the functions fn converge at least at one point c ∈ (a, b), then the sequence of functions fn is uniformly convergent to a differentiable function f in (a, b) and

(3.18)  lim_{n→∞} fn′(x) = f′(x)

at all points x ∈ (a, b).

Proof: Suppose that g = lim_{n→∞} fn′ in (a, b). For any point x ∈ (a, b) it follows from (3.17) in the Mean Value Theorem that

( fm(x) − fn(x) ) − ( fm(c) − fn(c) ) = (x − c) ( fm′(t) − fn′(t) )

for some point t ∈ (a, b) between x and c, so since |x − c| ≤ |b − a|

| ( fm(x) − fm(c) ) − ( fn(x) − fn(c) ) | ≤ |b − a| | fm′(t) − fn′(t) |.

Since the derivatives fn′ converge uniformly in (a, b), for any ε > 0 there is a number N such that |fm′(t) − fn′(t)| < ε for all t ∈ (a, b) whenever m, n > N, hence such that

| ( fm(x) − fm(c) ) − ( fn(x) − fn(c) ) | ≤ ε |b − a|  whenever m, n > N;

thus the sequence of functions fn(x) − fn(c) is uniformly Cauchy in (a, b), and since the sequence fn(c) converges it follows that the sequence of functions fn converges uniformly in (a, b) to a function f, which by Theorem 3.8 is at least continuous in (a, b). For any point x ∈ (a, b) and any real number h ∈ R sufficiently small that x + h ∈ (a, b) it follows from (3.17) in the Mean Value Theorem again that fn(x + h) − fn(x) = h fn′(xn) for some point xn ∈ (a, b) between x and x + h; consequently

( f(x + h) − f(x) ) / h = lim_{n→∞} ( fn(x + h) − fn(x) ) / h = lim_{n→∞} fn′(xn).

The points xn all lie in the closed and consequently compact interval bounded by x and x + h, so some subsequence xnk of the points xn converges to a point xh ∈ (a, b) between x and x + h. It is easy to see that lim_{k→∞} fn′k(xnk) = g(xh), where g = lim_{k→∞} fn′k. Indeed
| fn′k(xnk) − g(xh) | ≤ | fn′k(xnk) − g(xnk) | + | g(xnk) − g(xh) |,

and for any ε > 0 there is a number K such that first |fn′k(x) − g(x)| < ε for all x ∈ (a, b) whenever k > K, since the sequence of functions fn′ converges uniformly to g, and second |g(xnk) − g(xh)| < ε whenever k > K, since xnk converges to xh and g is continuous in (a, b); therefore |fn′k(xnk) − g(xh)| < 2ε whenever k > K. Thus ( f(x + h) − f(x) ) / h = g(xh) where xh is between x and x + h. Since lim_{h→0} xh = x it follows that f′(x) = lim_{h→0} g(xh) = g(x), which suffices for the proof.

The differentiation of functions that arise by the composition of several other functions can be reduced to the differentiation of the various factors in the composition, but the calculation can be somewhat complicated. The basic situation is that of the composition of a mapping g : U −→ Rm defined in an open neighborhood U of a point a ∈ Rl and a mapping f : V −→ Rn defined in an open neighborhood V of the point b = g(a) ∈ Rm, where g(U) ⊂ V; the composition φ = f ◦ g : U −→ Rn is the mapping defined by φ(x) = f(g(x)) for any x ∈ U, as in the following diagram, in which U ⊂ Rl, V ⊂ Rm, W ⊂ Rn and the mappings ι are simply the inclusions of the indicated points, such as a, in the open subsets, such as U:

                    g                f
(3.19)     U   −−−−−−→   V    −−−−−−→   W
           ↑ι            ↑ι             ↑ι
           a   −−−−−−→  b = g(a) −−−−→  f(b) = φ(a).

Theorem 3.15 (Chain Rule) If the mapping g is differentiable at the point a and the mapping f is differentiable at the point b = g(a) then the composite function φ = f ◦ g is differentiable at the point a and φ′(a) = f′(g(a)) · g′(a).

Proof: Since the mapping f is differentiable at the point b

(3.20)  f(b + h) = f(b) + f′(b)h + εf(h)  where  lim_{h→0} ‖εf(h)‖∞ / ‖h‖∞ = 0,

and since the mapping g is differentiable at the point a

(3.21)  g(a + k) = g(a) + g′(a)k + εg(k)  where  lim_{k→0} ‖εg(k)‖∞ / ‖k‖∞ = 0.

Substituting (3.21) into (3.20), where b = g(a) and h = g′(a)k + εg(k), leads to the result that

φ(a + k) = f(g(a + k)) = f(b + h) = f(b) + f′(b)h + εf(h) = φ(a) + f′(b)( g′(a)k + εg(k) ) + εf(h),

hence that

(3.22)  φ(a + k) = φ(a) + f′(b) g′(a) k + ε(k)  where  ε(k) = f′(b) εg(k) + εf(h).

From the triangle inequality it follows that

‖ε(k)‖∞ / ‖k‖∞ ≤ ‖f′(b) εg(k)‖∞ / ‖k‖∞ + ( ‖εf(h)‖∞ / ‖h‖∞ ) · ( ‖g′(a)k + εg(k)‖∞ / ‖k‖∞ )
               ≤ ‖f′(b)‖o ( ‖εg(k)‖∞ / ‖k‖∞ ) + ( ‖εf(h)‖∞ / ‖h‖∞ ) ( ‖g′(a)‖o + ‖εg(k)‖∞ / ‖k‖∞ )

in terms of the operator norms of the matrices. Since lim_{k→0} ‖εg(k)‖∞/‖k‖∞ = 0 and lim_{h→0} ‖εf(h)‖∞/‖h‖∞ = 0, while h = g′(a)k + εg(k) tends to 0 as k tends to 0, it follows from the preceding equation that lim_{k→0} ‖ε(k)‖∞/‖k‖∞ = 0; and it then follows from (3.22) that φ = f ◦ g is differentiable at the point a and that φ′(a) = f′(g(a)) · g′(a), which concludes the proof.

In particular if l = m = n = 1, so all the vector spaces involved are one-dimensional, then f and g are real-valued functions, as is their composition φ(x) = f(g(x)), and the conclusion of the preceding theorem takes the simpler form φ′(x) = f′(g(x)) · g′(x).

For a slightly more complicated example, the function g(y1, y2) = y1 y2 is differentiable at any point y1, y2 ∈ R and g′(y1, y2) = ( y2  y1 ); it follows that for any differentiable mapping f = {f1, f2} : U −→ R2 in an open subset U ⊂ Rm the composition φ = g ◦ f : U −→ R, the product φ(x) = f1(x)f2(x), is a differentiable mapping and

(3.23)  φ′(x) = g′(f(x)) f′(x) = f2(x) f1′(x) + f1(x) f2′(x),

since g′(f(x)) = ( f2(x)  f1(x) ) and f′(x) is the matrix with the rows f1′(x) and f2′(x); this provides a very convenient formula for reducing the calculation of the derivative of a product f1 f2 of two functions to the calculation of the derivative of each of the functions. Of course this holds in the special case m = 1; and in general for a product φ(x) = f1(x)f2(x) of two functions of several variables the entries in the matrix φ′(x) have the form ∂k φ(x) = f2(x) ∂k f1(x) + f1(x) ∂k f2(x).
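Both the one-variable chain rule and the product formula (3.23) can be checked against difference quotients; a sketch with illustrative sample functions:

```python
# Verify φ'(x) = f'(g(x)) * g'(x) and (f1*f2)' = f2*f1' + f1*f2'
# by comparing against central difference quotients.
def deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

g = lambda x: x ** 2 + 1.0
f = lambda y: y ** 3
phi = lambda x: f(g(x))

a = 1.5
chain = deriv(f, g(a)) * deriv(g, a)     # f'(g(a)) * g'(a)
direct = deriv(phi, a)                   # φ'(a) computed directly

f1, f2 = (lambda x: x ** 2), (lambda x: x + 2.0)
prod_rule = f2(a) * deriv(f1, a) + f1(a) * deriv(f2, a)
prod_direct = deriv(lambda x: f1(x) * f2(x), a)
```
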

The corresponding argument shows that the quotient ψ = f1/f2 of two differentiable functions is differentiable at any point x at which f2(x) ≠ 0, and that

∂k ψ(x) = ( ∂k f1(x) f2(x) − ∂k f2(x) f1(x) ) / f2(x)²,

another very useful formula in practice.

For another application of the chain rule, if f : U −→ V is a one-to-one mapping between two open subsets U, V ⊂ Rm and if g : V −→ U is the inverse mapping then g ◦ f : U −→ U is the identity mapping g ◦ f(x) = x, so (g ◦ f)′(x) = I where I is the m × m identity matrix. If the mappings f and g are continuously differentiable then by the chain rule g′(f(x)) · f′(x) = (g ◦ f)′(x) = I; thus the matrix g′(f(x)) is the inverse of the matrix f′(x) at each point x ∈ U, so both matrices are nonsingular matrices at each point x ∈ U.

An alternative notation for the chain rule is suggestive and sometimes quite useful. When mappings g : Rl −→ Rm and f : Rm −→ Rn are described in terms of the coordinates t = {t1, . . . , tl} ∈ Rl, x = {x1, . . . , xm} ∈ Rm and y = {y1, . . . , yn} ∈ Rn, the coordinate functions of the mappings f, g and φ = f ◦ g have the form yi = fi(x), xj = gj(t) and yi = φi(t). The partial derivatives are sometimes denoted by

(φ′)ik = ∂k φi = ∂yi/∂tk,   (f′)ij = ∂j fi(x) = ∂yi/∂xj,   (g′)jk = ∂k gj(t) = ∂xj/∂tk.

By the preceding theorem the derivative of the composite function φ = f ◦ g is the matrix product φ′ = f′ g′, which in terms of the entries of these matrices is (φ′)ik = Σ_{j=1}^m (f′)ij (g′)jk, or equivalently ∂k φi = Σ_{j=1}^m ∂j fi · ∂k gj; and in the alternative notation this takes the form

(3.24)  ∂yi/∂tk = Σ_{j=1}^m ( ∂yi/∂xj ) · ( ∂xj/∂tk ).

This is the extension to mappings in several variables of the traditional formulation of the chain rule for functions of a single variable as the identity dy/dt = (dy/dx) · (dx/dt). This form of the chain rule is in some ways easier to remember, and with some caution easier to use, than the version of the chain rule in the preceding theorem. It is customary however to omit any explicit mention of the points at which the derivatives are taken, so some care must be taken to remember that the derivative ∂yi/∂xj is evaluated at the point x while the derivatives ∂yi/∂tk and ∂xj/∂tk are evaluated at the point t. This lack of clarity means that some caution must be taken when this notation is used.

Some care also must be taken with the chain rule in those cases where the compositions are not quite so straightforward.
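The inverse-mapping relation, that g′(f(x)) is the matrix inverse of f′(x), can be checked on an explicit pair; a sketch using the illustrative invertible mapping f(x1, x2) = (x1 + x2², x2), whose inverse is g(y1, y2) = (y1 − y2², y2):

```python
# g'(f(a)) should be the matrix inverse of f'(a); here both Jacobians are
# computed by finite differences and multiplied, giving roughly the identity.
def f(x):
    return [x[0] + x[1] ** 2, x[1]]

def g(y):
    return [y[0] - y[1] ** 2, y[1]]

def jacobian(F, a, h=1e-6):
    base = F(a)
    J = [[0.0] * len(a) for _ in base]
    for k in range(len(a)):
        ak = a[:]
        ak[k] += h
        for i in range(len(base)):
            J[i][k] = (F(ak)[i] - base[i]) / h
    return J

a = [0.5, 2.0]
Jf = jacobian(f, a)        # f'(a) = [[1, 4], [0, 1]]
Jg = jacobian(g, f(a))     # g'(f(a)) = [[1, -4], [0, 1]]
prod = [[sum(Jg[i][k] * Jf[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]  # approximately the 2 x 2 identity matrix
```
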

For instance if φ(x1, x2) = f(x1, x2, x3) where x3 = g(x1, x2), for a function f(x1, x2, x3) of three variables and a function g(x1, x2) of two variables, the function φ is really the composition φ = f ◦ G of the mapping f : R3 −→ R1 given by the function f and the mapping G : R2 −→ R3 given by

G(x1, x2) = ( x1, x2, g(x1, x2) ),

so if the function g is differentiable it follows from the chain rule that

φ′(x) = f′(G(x)) G′(x)
      = ( ∂1 f(G(x))  ∂2 f(G(x))  ∂3 f(G(x)) ) ·  ( 1        0
                                                    0        1
                                                    ∂1 g(x)  ∂2 g(x) ),

hence the coordinate functions of the matrix φ′(x) are

∂1 φ(x) = ∂1 f(G(x)) + ∂3 f(G(x)) ∂1 g(x),
∂2 φ(x) = ∂2 f(G(x)) + ∂3 f(G(x)) ∂2 g(x).

This amounts to calculating the partial derivative ∂1 φ(x) as the sum of the partial derivatives of the function f(x1, x2, x3), where x3 = g(x1, x2), with respect to each of its three variables, multiplying each of these derivatives by the derivative of what is in the place of that variable with respect to the variable x1. Some care must be taken with this calculation, though. The meaning of the expression ∂/∂x1 f(x1, x2, g(x1, x2)) is not clear, for it may mean either the derivative of the function f with respect to its first variable or the derivative of the composite function of the two variables x1, x2 with respect to the variable x1; the expression ∂1 f(x1, x2, g(x1, x2)) is less ambiguous, but if there are questions in any case it is safest to write the formula to be differentiated as an explicit composition of mappings. Practice makes the application of the chain rule simpler, as is illustrated in the assigned problems.

The chain rule also is useful in deriving information about the derivatives of functions that are defined only implicitly. For example if a function f(x1, x2) satisfies the equation

f(x1, x2)⁵ + x1 f(x1, x2) + f(x1, x2) = 2x1 + 3x2

and the initial condition f(0, 0) = 0, then the values of that function are determined implicitly but not explicitly by the preceding equation. This equation is the condition that the composition of the mapping F : R2 −→ R3 defined by F(x1, x2) = ( x1, x2, f(x1, x2) ) and the mapping G : R3 −→ R defined by G(x1, x2, y) = y⁵ + x1 y + y − 2x1 − 3x2 is the trivial mapping G ◦ F(x1, x2) = 0. If the function f is differentiable it follows from the chain rule that

∂1(G ◦ F) = 5 f(x1, x2)⁴ ∂1 f(x1, x2) + f(x1, x2) + x1 ∂1 f(x1, x2) + ∂1 f(x1, x2) − 2,

so since ∂1(G ◦ F) = 0 and f(0, 0) = 0 the preceding equation reduces to ∂1 f(0, 0) = 2. A similar calculation yields the value of ∂2 f(0, 0).
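The implicitly determined partial derivatives can be checked numerically by solving the equation for f(x1, x2) with a few Newton steps and differencing the result; the solver below is an illustration, not part of the text (the same chain-rule computation applied to the second variable gives ∂2 f(0, 0) = 3):

```python
# Solve y**5 + x1*y + y - 2*x1 - 3*x2 = 0 for y = f(x1, x2) near y = 0
# by Newton's method, then estimate ∂1 f(0,0) and ∂2 f(0,0) by differences;
# the chain-rule computation in the text gives ∂1 f(0,0) = 2, ∂2 f(0,0) = 3.
def solve_f(x1, x2):
    y = 0.0
    for _ in range(50):
        F = y ** 5 + x1 * y + y - 2 * x1 - 3 * x2
        dF = 5 * y ** 4 + x1 + 1       # derivative in y; near 1 close to 0
        y -= F / dF
    return y

h = 1e-6
d1 = (solve_f(h, 0.0) - solve_f(0.0, 0.0)) / h   # approximately 2
d2 = (solve_f(0.0, h) - solve_f(0.0, 0.0)) / h   # approximately 3
```
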

There is a notational convention that is commonly used in considering the derivatives of functions and is quite convenient. The derivative f′(x) of a function is a 1 × m matrix, so it is not a vector in the sense used for points x ∈ Rm, which are always viewed as column vectors; but the transpose tf′(x) is a column vector, so it can be viewed as an ordinary vector in Rm. To avoid confusion and keep these distinctions clear, the transpose of the derivative is often denoted by ∇f(x), although an earlier but still used alternative notation is grad f(x); this vector is called the gradient of the function f(x). Thus for a function f(x1, x2) of two variables

f′(x) = ( ∂1 f(x)  ∂2 f(x) )   and   ∇f(x) = grad f(x) = t( ∂1 f(x)  ∂2 f(x) ).

Although this is really a rather trivial point, the convention that the gradient of a function defined in a subset of a vector space Rm is a vector in the same sense as all other vectors in Rm is quite standard and quite useful: ∇f(x) is a vector that can be used in the same context as the vector x, either in the addition of vectors or in the dot or inner product of two vectors.

For example, a straight line through a point a ∈ Rm in the direction of a unit vector u can be described parametrically as the set of points x = φ(t) for t ∈ R, where φ : R −→ Rm is the mapping φ(t) = a + tu; the coordinate functions of the mapping φ(t) are φj(t) = aj + uj t, so the derivative φ′(0) is just the m × 1 matrix or vector φ′(0) = {uj}. If f : U −→ R is a differentiable function in an open set U ⊂ Rm containing the point a, the restriction of f to this straight line can be viewed as a function (f ◦ φ)(t) = f(a + tu) of the parameter t ∈ R near the origin. The derivative of this restriction is called the directional derivative of the function f at the point a in the direction of the unit vector u, and is denoted by ∂u f(a). It follows from the chain rule that ∂u f(a) = f′(a) φ′(0), hence

(3.25)  ∂u f(a) = Σ_{j=1}^m ∂j f(a) uj = ∇f(a) · u.

Thus the directional derivative of the function f in the direction of a unit vector u is the dot product of the gradient of the function f at the point a with the unit vector u. The maximal directional derivative of the function f(x) at the point a is in the direction of the vector ∇f(a) and is equal to ‖∇f(a)‖2, while the minimal directional derivative is in the direction of the vector −∇f(a) and is equal to −‖∇f(a)‖2; and ∂u f(a) = 0 for any vector u orthogonal to the gradient ∇f(a).
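Formula (3.25) and the extremal property of the gradient direction can be checked numerically; a sketch with an illustrative function:

```python
import math

# ∂_u f(a) = ∇f(a) · u, and the directional derivative is largest when u
# points along ∇f(a); illustrated for f(x, y) = x**2 + 3*y at a = (1, 1).
def f(x, y):
    return x ** 2 + 3 * y

a = (1.0, 1.0)
grad = (2 * a[0], 3.0)                      # ∇f(a), computed by hand
glen = math.hypot(*grad)                    # ||∇f(a)||_2

def dir_deriv(u, h=1e-6):
    return (f(a[0] + h * u[0], a[1] + h * u[1]) - f(*a)) / h

u = (3 / 5, 4 / 5)                          # an arbitrary unit vector
lhs = dir_deriv(u)                          # ∂_u f(a) by difference quotient
rhs = grad[0] * u[0] + grad[1] * u[1]       # ∇f(a) · u

u_max = (grad[0] / glen, grad[1] / glen)    # unit vector along the gradient
max_rate = dir_deriv(u_max)                 # approximately ||∇f(a)||_2
```
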

The gradient ∇f(a) of a function f at a point a is thus a vector in the direction in which f is increasing most rapidly, and the length ‖∇f(a)‖ of the gradient vector is the maximum rate of increase of the function f; this is familiar from physics, for the electric, magnetic and gravitational fields. More generally a mapping f : U −→ Rm, when viewed as associating to a point x ∈ Rm a vector also of dimension m, is called a vector field on the subset U ⊂ Rm.

The question whether a vector is to be viewed as a row vector or a column vector can be avoided merely by writing a vector as an explicit linear combination of basis vectors in the vector space Rm. If f(x) = xj then f′(x) = (0, · · · , 0, 1, 0, · · · , 0), with the entry 1 in column j and all other entries 0; thus f′(x) is independent of x and actually is one of the standard basis vectors for the vector space Rm. The standard notation for its derivative then would be x′j, which is somewhat confusing since x′j commonly is used to denote another set of variables in Rn; it is clearer and rather more customary to denote the derivative of the function xj by dxj, so that

(3.26)  dxj = (0, · · · , 0, 1, 0, · · · , 0),  where the entry 1 is in column j.

With this notation the derivative of an arbitrary differentiable function f can be denoted correspondingly by df(x) and written in terms of the basis (3.26) as

(3.27)  df(x) = Σ_{j=1}^m ∂j f(x) dxj.

This form for the derivative f′(x) is called a differential form, or to be more explicit, a differential form of degree 1. A vector field f in an open subset U ⊂ Rm with the coordinate functions fj(x) also can be written as a linear combination of the standard basis vectors and consequently as a differential form of degree 1, explicitly

(3.28)  ωf(x) = Σ_{j=1}^m fj(x) dxj.

The condition that a function f is differentiable at a point a can be viewed as the condition that f can be approximated near the point a by an affine function, a polynomial of degree 1 in the variables in Rm, and that the error in this approximation is fairly small; this interpretation is often quite useful in practice. It is often expressed as the condition that for small changes ∆x in the coordinates of a point in Rm the change ∆f in the value of the function is approximately a linear function of ∆x, for the definition of a differentiable function can be written

(3.29)  ∆f(x) = f(x + ∆x) − f(x) = f′(x)∆x + ε(∆x),

so the change ∆f(x) in the value of the function f(x) is approximately equal to the linear function f′(x)∆x of the change ∆x in the variable x, and the error ε(∆x) is much smaller than the change ∆x in the variable since lim_{∆x→0} ε(∆x)/‖∆x‖ = 0.
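The behavior of the error term in (3.29) can be observed numerically: the ratio |ε(∆x)|/‖∆x‖ shrinks as ∆x does. A sketch with an illustrative function:

```python
# For f(x, y) = x*y**2 at the point (1, 2): Δf = f'(x)Δx + ε(Δx), where
# f'(x) = (y**2, 2*x*y) = (4, 4); the ratio |ε(Δx)|/||Δx|| should shrink
# roughly linearly with ||Δx||.
def f(x, y):
    return x * y ** 2

def error_ratio(t):
    dx, dy = t, t                  # Δx = (t, t), so ||Δx||_inf = |t|
    df = f(1 + dx, 2 + dy) - f(1, 2)
    linear = 4 * dx + 4 * dy       # f'(x)Δx with the hand-computed derivative
    return abs(df - linear) / abs(t)

ratios = [error_ratio(10 ** (-k)) for k in (1, 2, 3)]   # decreasing toward 0
```
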

Another of the standard approximations in one variable, the mean value theorem, does extend directly to functions of several variables, but extends only as an inequality to mappings from Rm to Rn.

Theorem 3.16 (Mean Value Theorem) If f : U −→ R is a differentiable function in an open set U ⊂ Rm and if two points a, b and the line λ joining them lie in U then

(3.30)  f(b) − f(a) = ∇f(c) · (b − a)

for some point c between a and b on the line λ joining these two points.

Proof: The function g(t) = f(a + t(b − a)) is a differentiable function of the variable t in an open neighborhood of the interval [0, 1], and by the mean value theorem for functions of one variable g(1) − g(0) = g′(τ) for a point τ ∈ (0, 1). Since g(1) = f(b) and g(0) = f(a), while by the chain rule

g′(t) = Σ_{j=1}^m ∂j f(a + t(b − a)) (bj − aj),

it follows that

f(b) − f(a) = g(1) − g(0) = g′(τ) = Σ_{j=1}^m ∂j f(a + τ(b − a)) (bj − aj) = Σ_{j=1}^m ∂j f(c) (bj − aj) = ∇f(c) · (b − a)

where c = a + τ(b − a), and that suffices for the proof.

Although the preceding theorem can be applied to each coordinate function of the mapping f : U −→ Rn for n > 1, the points at which the derivatives of the different coordinate functions are evaluated may be different. However for some purposes an estimate is useful enough, and that problem can be avoided.

Theorem 3.17 (Mean Value Inequality) If f : U −→ Rn is a differentiable mapping in an open set U ⊂ Rm and if two points a, b and the line λ joining them lie in U then

(3.31)  ‖f(b) − f(a)‖2 ≤ ‖f′(c)‖o ‖b − a‖2

for some point c between a and b on the line λ, where ‖ ‖o is the operator norm.

Proof: For any vector u ∈ Rn for which ‖u‖2 = 1 the dot product fu(x) = u · f(x) is a real-valued function to which the Mean Value Theorem can be applied, so

fu(b) − fu(a) = ∇fu(c) · (b − a) = Σ_{j=1}^m ∂j (u · f)(c) (bj − aj)
             = Σ_{i=1}^n Σ_{j=1}^m ui ∂j fi(c) (bj − aj) = Σ_{i=1}^n Σ_{j=1}^m ui f′(c)ij (bj − aj)
             = u · ( f′(c)(b − a) )

for some point c between a and b on the line λ joining these two points. If the unit vector u is chosen to lie in the direction of the vector f(b) − f(a) then

‖f(b) − f(a)‖2 = |fu(b) − fu(a)| = | u · ( f′(c)(b − a) ) | ≤ ‖f′(c)(b − a)‖2 ≤ ‖f′(c)‖o ‖b − a‖2

by the Cauchy-Schwarz inequality, which suffices for the proof.

Of course there is an analogous result in terms of any norms equivalent to the ℓ2 norm, for the appropriate operator norm and a suitable constant factor. The constant factor is irrelevant for many applications, such as the following.

Theorem 3.18 If f : ∆ −→ Rn is a differentiable mapping in a cell ∆ ⊂ Rm such that ‖f′(x)‖o ≤ M for all x ∈ ∆ then the mapping f is uniformly continuous in ∆.

Proof: If ‖f′(x)‖o ≤ M for all x ∈ ∆ then for any two points x, y ∈ ∆ the line joining them is contained in ∆, so it follows from the Mean Value Inequality that ‖f(x) − f(y)‖∞ ≤ M′ ‖x − y‖∞ for some constant M′; hence the mapping f is uniformly continuous in ∆, and that concludes the proof.
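The Lipschitz bound derived from the Mean Value Inequality in the proof of Theorem 3.18 can be probed numerically; a sketch on the cell [0, 1]², with an illustrative mapping and a crude constant:

```python
import math
import random

# For f(x, y) = (sin(x + y), x*y) on the cell [0,1]^2 every partial derivative
# is bounded by 1 in absolute value, so the mean value inequality yields a
# Lipschitz bound ||f(p) - f(q)||_2 <= M' * ||p - q||_2 for a modest constant.
def f(p):
    x, y = p
    return (math.sin(x + y), x * y)

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

random.seed(0)
M_prime = 2.0                     # a crude constant consistent with the bound
pairs = [((random.random(), random.random()),
          (random.random(), random.random())) for _ in range(1000)]
ok = all(dist(f(p), f(q)) <= M_prime * dist(p, q) + 1e-12 for p, q in pairs)
```
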

3.3 Real Analytic Mappings

If f : U −→ R is a function defined in an open set U ⊂ R^2, and if the partial derivative ∂_{j_1} f(x) exists at all points x ∈ U, the function ∂_{j_1} f(x) may itself have partial derivatives, such as ∂_{j_2}(∂_{j_1} f)(x), which for convenience is shortened to ∂_{j_2}∂_{j_1} f(x); and the process may continue, leading to ∂_{j_3}∂_{j_2}∂_{j_1} f(x) and so on. The order in which successive derivatives are taken may be significant; for example, a straightforward calculation for the function

    f(x_1, x_2) = x_1 x_2 (x_1^2 − x_2^2)/(x_1^2 + x_2^2)   if (x_1, x_2) ≠ (0, 0),
    f(x_1, x_2) = 0                                          if (x_1, x_2) = (0, 0)

shows that ∂_1∂_2 f(0, 0) = 1 but ∂_2∂_1 f(0, 0) = −1. However for continuously differentiable functions the order of differentiation is irrelevant.

Theorem 3.19 If f : U −→ R is a function in an open subset U ⊂ R^2, if the partial derivatives ∂_1 f(x), ∂_2 f(x) and the mixed partial derivatives ∂_1∂_2 f(x), ∂_2∂_1 f(x) exist at all points x ∈ U, and if the mixed partial derivatives ∂_1∂_2 f(x), ∂_2∂_1 f(x) are continuous at a point a ∈ U, then ∂_1∂_2 f(a) = ∂_2∂_1 f(a).

Proof: By an application of the mean value theorem for functions of one variable it follows that

    Δ = [ f(a_1 + h_1, a_2 + h_2) − f(a_1 + h_1, a_2) ] − [ f(a_1, a_2 + h_2) − f(a_1, a_2) ]
      = φ(a_1 + h_1) − φ(a_1) = φ′(α_1) h_1

where φ(x_1) = f(x_1, a_2 + h_2) − f(x_1, a_2) and α_1 is between a_1 and a_1 + h_1 if ‖h‖ is sufficiently small; and by another application of the mean value theorem for functions of one variable it further follows that

    φ′(α_1) = ∂_1 f(α_1, a_2 + h_2) − ∂_1 f(α_1, a_2) = ∂_2∂_1 f(α_1, α_2) h_2

where α_2 is between a_2 and a_2 + h_2 if ‖h‖ is sufficiently small; consequently Δ = ∂_2∂_1 f(α_1, α_2) h_2 h_1. On the other hand it is possible to group the terms in the equation for Δ in another way, so that

    Δ = [ f(a_1 + h_1, a_2 + h_2) − f(a_1, a_2 + h_2) ] − [ f(a_1 + h_1, a_2) − f(a_1, a_2) ].

The same argument used for the first grouping when applied to this grouping amounts to interchanging the roles of the two variables, which leads to the result that Δ = ∂_1∂_2 f(β_1, β_2) h_1 h_2 for some points β_1 between a_1 and a_1 + h_1 and β_2 between a_2 and a_2 + h_2 if ‖h‖ is sufficiently small; the points β_1 and β_2 are not necessarily the same as the points α_1 and α_2, since they are derived by different applications of the mean value theorem in one variable. Comparing the two expressions for Δ and dividing by h_1 h_2 shows that ∂_2∂_1 f(α_1, α_2) = ∂_1∂_2 f(β_1, β_2). Since the functions ∂_1∂_2 f(x_1, x_2) and ∂_2∂_1 f(x_1, x_2) are both continuous, and α_1 and β_1 both approach a_1 as h_1 tends to 0, and correspondingly for the points α_2 and β_2, it follows in the limit that ∂_2∂_1 f(a_1, a_2) = ∂_1∂_2 f(a_1, a_2), which suffices to conclude the proof.

The preceding result of course applies to functions defined in open subsets of R^n for any n, since it just involves a change in the order of differentiation for any pair of variables; and it applies to higher derivatives of any orders, for the same reasons. When it is applicable, that is, for functions that are continuously differentiable of the appropriate orders, the notation can be simplified, for example by writing ∂_1^2 ∂_2 f(x) in place of ∂_1∂_2∂_1 f(x) or ∂_2∂_1∂_1 f(x), or by writing ∂_{j_1 j_2} f(x) in place of ∂_{j_1}∂_{j_2} f(x). It is sometimes convenient to use the multi-index notation, in which ∂^I f(x) where I = (3, 2, 1) stands for ∂_1^3 ∂_2^2 ∂_3 f(x), for instance. Another notation frequently used is

    ∂^3 f(x) / (∂x_1^2 ∂x_2) = ∂_1^2 ∂_2 f(x).

The definition (3.12) of differentiability involved an approximation of a function by a polynomial of degree 1 for sufficiently small values of the auxiliary variable h. The existence of higher order derivatives can be interpreted correspondingly as involving an approximation of a function by polynomials of higher degree for sufficiently small values of the auxiliary variable h.

Theorem 3.20 (Taylor expansion in one variable) If f : U −→ R^1 is a function having derivatives up to order k + 1 in an open neighborhood U ⊂ R^1 of a point a ∈ R^1 then for any h ∈ R^1 sufficiently small

(3.32)  f(a + h) = f(a) + f′(a) h + (1/2!) f″(a) h^2 + (1/3!) f‴(a) h^3 + ··· + (1/k!) f^{(k)}(a) h^k + (1/(k+1)!) f^{(k+1)}(α) h^{k+1}

where α is between a and a + h.

Proof: For a fixed h ∈ R sufficiently small that the closed interval from a to a + h is contained in U let

(3.33)  R(x) = f(a + x) − f(a) − f′(a) x − (1/2!) f″(a) x^2 − (1/3!) f‴(a) x^3 − ··· − (1/k!) f^{(k)}(a) x^k − (1/(k+1)!) c x^{k+1}
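The example preceding Theorem 3.19 can be checked numerically. The following sketch approximates both mixed partial derivatives at the origin by nested central difference quotients; the particular step sizes are arbitrary illustrative choices, not taken from the text.

```python
# f(x1, x2) = x1*x2*(x1^2 - x2^2)/(x1^2 + x2^2), f(0,0) = 0:
# the two mixed partials at the origin disagree.

def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 * (x1**2 - x2**2) / (x1**2 + x2**2)

def d1(g, x1, x2, step=1e-6):
    """Central difference approximation of the partial derivative in x1."""
    return (g(x1 + step, x2) - g(x1 - step, x2)) / (2 * step)

def d2(g, x1, x2, step=1e-6):
    """Central difference approximation of the partial derivative in x2."""
    return (g(x1, x2 + step) - g(x1, x2 - step)) / (2 * step)

h = 1e-3
d1d2 = (d2(f, h, 0.0) - d2(f, -h, 0.0)) / (2 * h)   # approximates +1
d2d1 = (d1(f, 0.0, h) - d1(f, 0.0, -h)) / (2 * h)   # approximates -1
```

The discrepancy is possible because the mixed second partials of this function exist everywhere but are not continuous at the origin, so Theorem 3.19 does not apply there.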

where c ∈ R is chosen so that R(h) = 0; clearly R(0) = 0 as well. Note that

(3.34)  R′(x) = f′(a + x) − f′(a) − f″(a) x − (1/2!) f‴(a) x^2 − ··· − (1/(k−1)!) f^{(k)}(a) x^{k−1} − (1/k!) c x^k,

which is much the same as (3.33) but for the index k − 1 in place of the index k; in particular R′(0) = 0. Since R(0) = R(h) = 0 it follows from an application of the mean value theorem for functions of a single variable that R′(α_1) = 0 for some value α_1 between 0 and h; and since R′(0) = 0 and R′(α_1) = 0 it follows from another application of the mean value theorem for functions of a single variable that R″(α_2) = 0 for some α_2 between 0 and α_1. The process continues, with further differentiation of the expression (3.33), yielding a sequence of points {α_1, α_2, ..., α_{k+1}} where α_{i+1} is between 0 and α_i, hence is between 0 and h, and R^{(i)}(α_i) = 0. Finally for the case i = k + 1 it follows that R^{(k+1)}(x) = f^{(k+1)}(a + x) − c, and since R^{(k+1)}(α_{k+1}) = 0 it follows that c = f^{(k+1)}(a + α_{k+1}) = f^{(k+1)}(α) for a point α between a and a + h; since R(h) = 0 that yields the expansion (3.32). That suffices for the proof.

The corresponding result in several variables can be deduced from the result in a single variable by an application of the chain rule.

Theorem 3.21 (Taylor expansion in several variables) If f : U −→ R has continuous partial derivatives up to order k + 1 in an open neighborhood U ⊂ R^m of a point a ∈ R^m then for any h = {h_j} ∈ R^m sufficiently small

(3.35)  f(a + h) = f(a) + Σ_{j=1}^m ∂_j f(a) h_j + (1/2!) Σ_{j_1, j_2 = 1}^m ∂_{j_1 j_2} f(a) h_{j_1} h_{j_2} + ···
                 + (1/k!) Σ_{j_1, ..., j_k = 1}^m ∂_{j_1 ··· j_k} f(a) h_{j_1} ··· h_{j_k}
                 + (1/(k+1)!) Σ_{j_1, ..., j_{k+1} = 1}^m ∂_{j_1 ··· j_{k+1}} f(α) h_{j_1} ··· h_{j_{k+1}}

where α is between a and a + h on the line segment connecting them.

Proof: Let φ(t) = a + t h for any t ∈ R and consider the function g(t) = f(φ(t)), for which g(0) = f(φ(0)) = f(a) and g(1) = f(φ(1)) = f(a + h). By the Taylor expansion (3.32) of the function g(t)

(3.36)  g(1) = g(0) + g′(0) + (1/2!) g″(0) + ··· + (1/k!) g^{(k)}(0) + (1/(k+1)!) g^{(k+1)}(τ)
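The one-variable expansion (3.32) can be illustrated concretely. For f = sin every derivative is one of ±sin, ±cos, so each derivative is bounded by 1 and the remainder after the degree-k term is at most |h|^{k+1}/(k+1)!. A sketch, with a, h, and k arbitrary illustrative choices:

```python
import math

# Taylor expansion (3.32) for f = sin about the point a:
# the derivatives of sin cycle through sin, cos, -sin, -cos.
a, h, k = 0.3, 0.1, 4

derivs = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
taylor = sum(derivs[n % 4] * h**n / math.factorial(n) for n in range(k + 1))

# The remainder term f^{(k+1)}(alpha) h^{k+1}/(k+1)! is bounded by
# |h|^{k+1}/(k+1)! since every derivative of sin is bounded by 1.
error = abs(math.sin(a + h) - taylor)
bound = abs(h)**(k + 1) / math.factorial(k + 1)
assert error <= bound
```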

for some τ ∈ (0, 1). By repeated applications of the chain rule

    g′(t) = Σ_{j_1 = 1}^m ∂_{j_1} f(a + t h) h_{j_1},

    g″(t) = Σ_{j_1, j_2 = 1}^m ∂_{j_1 j_2} f(a + t h) h_{j_1} h_{j_2},

and in general

    g^{(ν)}(t) = Σ_{j_1, ..., j_ν = 1}^m ∂_{j_1 j_2 ··· j_ν} f(a + t h) h_{j_1} h_{j_2} ··· h_{j_ν};

substituting these values into (3.36), at t = 0 for the first k + 1 terms and at t = τ for the last, yields (3.35) for α = a + τ h and thereby concludes the proof.

If a function of a single variable has derivatives of all orders, that is, if it is a C^∞ function, then for any number N the Taylor expansion provides an approximation

(3.37)  f(a + x) = P_N(x) + ε_N(x)

by the polynomial

(3.38)  P_N(x) = c_0 + c_1 x + c_2 x^2 + ··· + c_N x^N   where c_n = f^{(n)}(a)/n!,

with the error

(3.39)  ε_N(x) = (1/(N+1)!) f^{(N+1)}(α(x)) x^{N+1}

for a point α(x) between a and a + x depending on x. There are a number of alternative expressions for the error in this polynomial approximation, but for present purposes the form used here will suffice. If lim_{N→∞} ε_N(x) = 0 for all points x ∈ U for some open neighborhood U of the origin then the function f(a + x) is the limit of the polynomials (3.38) in the neighborhood U. There is an analogous result for functions of m variables, where the Taylor expansion provides a polynomial approximation

(3.40)  f(a + x) = P_N(x) + ε_N(x)

for polynomials P_N(x) of degree N in m variables as in (3.38). The approximations (3.37) and (3.40) can be written as power series expansions in the variable x or the collection of variables x = (x_1, ..., x_m); for the simpler case m = 1 a power series expansion in the variable x is a formal expression

(3.41)  f(a + x) ∼ Σ_{n=0}^∞ c_n x^n
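The reduction in the proof of Theorem 3.21 rests on the identity g^{(ν)}(t) = Σ ∂_{j_1 ··· j_ν} f(a + t h) h_{j_1} ··· h_{j_ν}; for ν = 2 it says that the second derivative of g(t) = f(a + t h) equals the double sum over the second partials of f. A numerical sketch for an arbitrarily chosen cubic polynomial (the function f, point a, and direction h are illustrative assumptions):

```python
# For f(x, y) = x*y^2 + x^3 the second partials, computed by hand, are
# f_xx = 6x, f_xy = f_yx = 2y, f_yy = 2x; the double sum over them,
# evaluated at a in the direction h, should equal g''(0) for g(t) = f(a + t h).

def f(x, y):
    return x * y**2 + x**3

a = (1.0, 2.0)
h = (0.3, -0.1)

x, y = a
hessian_sum = 6*x * h[0]**2 + 2 * (2*y) * h[0]*h[1] + 2*x * h[1]**2

def g(t):
    return f(a[0] + t * h[0], a[1] + t * h[1])

eps = 1e-3
g2 = (g(eps) - 2 * g(0.0) + g(-eps)) / eps**2   # central second difference
assert abs(g2 - hessian_sum) < 1e-6
```

Since f is a cubic polynomial, g(t) is cubic in t and the central second difference is exact up to rounding, so the agreement is very close.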

for any real constants c_n; the partial sums of this series are the polynomials

(3.42)  P_N(x) = Σ_{n=0}^N c_n x^n,

and the series is said to converge to the function f(a + x) in a neighborhood U of the origin 0 ∈ R if

(3.43)  f(a + x) = lim_{N→∞} P_N(x)   for all x ∈ U,

which is indicated by

(3.44)  f(a + x) = Σ_{n=0}^∞ c_n x^n   for all x ∈ U.

A function that can be represented by a convergent power series expansion in a neighborhood U of the origin is called a real analytic function of the variable a + x for x ∈ U. There are corresponding definitions for functions of several variables. Since polynomials such as (3.42) have well defined values when the coefficients c_n and the variable x take complex values, it is possible to consider power series expansions over the complex numbers as well as over the real numbers; thus it is possible to consider power series expansions (3.41) where the coefficients c_n and the variables a and x are complex numbers, and the partial sums (3.42) and the notion (3.43) of convergence are well defined, where the region of convergence is a neighborhood U of the origin 0 in the complex plane C. A function that can be represented by a convergent power series expansion of the variable a + x for x ∈ U is called a complex analytic function or a holomorphic function in the neighborhood U ⊂ C. The standard convention is to denote the variable by z for complex values, where z = x + iy.

The question whether a power series expansion converges amounts to much the same thing and has much the same answer whether the variable is real or complex; the condition for convergence arises from the examination of a simple special case, that of the geometric series. Since

    (1 − z)(1 + z + z^2 + ··· + z^N) = 1 − z^{N+1}

it follows that

    1 + z + z^2 + ··· + z^N = 1/(1 − z) − z^{N+1}/(1 − z),

and if |z| ≤ r < 1 then

    | z^{N+1}/(1 − z) | ≤ r^{N+1}/(1 − r),

and consequently lim_{N→∞} z^{N+1}/(1 − z) = 0 whenever |z| ≤ r < 1, so

(3.45)  1/(1 − z) = Σ_{n=0}^∞ z^n   for z ∈ C, |z| < 1.

This is the power series expansion of the holomorphic function 1/(1 − z) in the disc |z| < 1 in the complex plane.
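The rate of convergence of the geometric series is exactly the bound r^{N+1}/(1 − r) derived above; a small check at a complex point inside the unit disc (the particular z and N are arbitrary choices):

```python
# Partial sums of the geometric series versus 1/(1 - z), with the
# error bounded by r^{N+1}/(1 - r) for r = |z| < 1.
z = 0.4 + 0.3j          # |z| = 0.5 < 1
r = abs(z)
N = 20

partial = sum(z**n for n in range(N + 1))
error = abs(partial - 1 / (1 - z))
assert error <= r**(N + 1) / (1 - r)
```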

The examination of this series yields convergence results for general power series, indeed even for more general series than power series, so it is more efficient to consider the general case first here. In general an infinite series is a formal sum Σ_{n=0}^∞ c_n for some real or complex values c_n; the partial sums of this series are the finite sums P_N = Σ_{n=0}^N c_n, which are well defined real or complex numbers, and the series converges to a real or complex number C if C = lim_{N→∞} P_N.

Theorem 3.22 (Convergence Tests) Let Σ_{n=0}^∞ c_n be an infinite series of complex terms c_n.
(i) A necessary but not sufficient condition for the convergence of this series is that lim_{n→∞} |c_n| = 0.
(ii) (Comparison Test) A sufficient condition for the convergence of this series is that |c_n| ≤ C_n for all n, where C_n ≥ 0 and Σ_{n=0}^∞ C_n converges.
(iii) (Root Test) A sufficient condition for the convergence of this series is that |c_n| ≤ M r^n for some positive real numbers M and r with r < 1 and for all n > N for some N.
(iv) (Ratio Test) A sufficient condition for the convergence of this series is that |c_{n+1}/c_n| ≤ r < 1 for all n > N for some N.

Proof: (i) If the series converges, then the partial sums P_N form a Cauchy sequence, so lim_{N→∞} |c_N| = lim_{N→∞} |P_N − P_{N−1}| = 0; but that is not sufficient to guarantee that the series converges, since for instance the terms of the series 1/2 + 1/2 + 1/3 + 1/3 + 1/3 + 1/4 + 1/4 + 1/4 + 1/4 + ··· tend to 0 but the series does not converge.
(ii) If P_N are the partial sums of the series Σ_{n=0}^∞ c_n then |P_N − P_M| = | Σ_{n=M+1}^N c_n | ≤ Σ_{n=M+1}^N C_n < ε whenever M, N are sufficiently large, since the series Σ_{n=0}^∞ C_n converges; thus the partial sums P_N of the series Σ_{n=0}^∞ c_n are a Cauchy sequence, so that series converges.
(iii) The convergence of a series does not depend on the first N terms for any N, so it suffices to assume that |c_n| ≤ M r^n for some real number 0 ≤ r < 1 and for all n; since the series Σ_{n=0}^∞ r^n converges by (3.45), the series Σ_{n=0}^∞ M r^n converges as well, hence the series Σ_{n=0}^∞ c_n converges by (ii), the comparison test.
(iv) The convergence of a series does not depend on the first N terms for any N, so it suffices to assume that |c_{n+1}/c_n| ≤ r < 1 for all n, in which case it follows immediately by induction that |c_n| ≤ |c_0| r^n, hence the series converges by (iii), the root test.

The convergence conditions of the preceding theorem are the simplest of a wide range of other tests for convergence of series, but are still remarkably useful. As an example, it follows readily from the ratio test that the series Σ_{n=0}^∞ n ε^n converges for any positive real number ε < 1, since

    ( (n+1) ε^{n+1} ) / ( n ε^n ) = ε (1 + 1/n) < 1   whenever n > ε/(1 − ε).

It should be noted that the sufficient conditions for convergence of series in the preceding theorem are not generally necessary; so if a series converges it cannot be assumed that its terms satisfy the conditions of the root or ratio test.
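The ratio-test example above can be tried out directly: for the series Σ n ε^n with ε = 1/2 the ratios ε(1 + 1/n) drop below 1 once n > ε/(1 − ε), and the partial sums settle down to ε/(1 − ε)^2. (That closed form is the standard value obtained by differentiating the geometric series term by term; it is supplied here for the check, not taken from the text.)

```python
# Ratio test for the series sum of n * eps**n, 0 < eps < 1.
eps = 0.5
ratios = [(n + 1) * eps**(n + 1) / (n * eps**n) for n in range(1, 50)]
# eps*(1 + 1/n) < 1 once n > eps/(1 - eps) = 1, i.e. from n = 2 on.
assert all(rho < 1 for rho in ratios[1:])

partial = sum(n * eps**n for n in range(200))
assert abs(partial - eps / (1 - eps)**2) < 1e-12
```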

The question whether a particular series converges or not can be a quite difficult one though, requiring more subtle tests; however for power series the situation is relatively simple.

Theorem 3.23 If a power series Σ_{n=0}^∞ c_n z^n converges for a particular value z_0 ∈ C then it converges for all complex values |z| < |z_0| and converges uniformly for all complex values |z| ≤ r for any r < |z_0|.

Proof: If the power series converges for z = z_0 then lim_{n→∞} |c_n z_0^n| = 0, so |c_n z_0^n| < C for some number C > 0; hence |c_n z^n| < C (|z|/|z_0|)^n whenever |z| < |z_0|, so by the root test the series converges whenever |z| < |z_0|. If |z| ≤ r < |z_0| then for any M < N the partial sums satisfy

    |P_N(z) − P_M(z)| ≤ Σ_{n=M+1}^N |c_n z^n| ≤ Σ_{n=M+1}^N C (r/|z_0|)^n,

and since the series Σ_{n=0}^∞ C (r/|z_0|)^n converges, for any ε > 0 it follows that |P_N(z) − P_M(z)| < ε whenever M, N are sufficiently large; so the series converges uniformly in the disc |z| ≤ r, which suffices for the proof.

It follows that for any complex power series there is a largest value R, called the radius of convergence of the series, for which the series converges whenever |z| < R, and indeed converges uniformly in any smaller disc. Of course it is possible that R = 0, so the series does not converge at any nonzero complex number; or that R = ∞, so the series converges for any z and then converges uniformly in any finite disc |z| ≤ r. Since the series converges uniformly in any disc of radius r < R it follows that the sum is a continuous function in the disc |z| < R.

If a real power series Σ_{n=0}^∞ a_n x^n has a radius of convergence R, the sum f(x) = Σ_{n=0}^∞ a_n x^n is a well defined continuous function in the interval |x| < R. The individual terms can be differentiated, and the series that results is given by Σ_{n=0}^∞ n a_n x^{n−1}. If 0 < r_1 < r < R then |a_n| r^n < M for some number M, hence |n a_n r_1^n| < n ε^n M where ε = r_1/r < 1; as noted in the example considered on the preceding page, the series Σ_{n=0}^∞ n ε^n M converges, hence the series Σ_{n=0}^∞ n a_n x^{n−1} converges in |x| < r_1 by the comparison test. It then follows from Theorem 3.14 that this series converges to the derivative of the function f(x). Thus if f(x) = Σ_{n=0}^∞ a_n x^n has a radius of convergence R then f′(x) = Σ_{n=0}^∞ n a_n x^{n−1}, where this series too has a radius of convergence R; hence a convergent power series can be differentiated term-by-term inside its radius of convergence. Thus functions described by convergent power series have derivatives of all orders, and the derivatives are described by the power series arising from term-by-term differentiation.

As an example of the usefulness of this observation, consider the problem of finding which functions of a real variable x, if any, are solutions of the differential equation e′(x) = e(x), one of the most basic of differential equations. If there is such a function and it is normalized for instance by e(0) = 1 then all its derivatives also satisfy e^{(n)}(0) = 1, so it has the Taylor series expansion

(3.46)  e(x) = Σ_{n=0}^∞ x^n / n!.

It follows immediately from the ratio test that this series converges for all values x; hence it can be differentiated term-by-term, and that yields the differential equation e′(x) = e(x).
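Term-by-term differentiation can be seen at work on the geometric series itself: on |x| < 1 the series for 1/(1 − x) differentiates to Σ n x^{n−1}, whose sum is 1/(1 − x)^2. A sketch with an arbitrary sample point and truncation order:

```python
# Differentiating the geometric series term by term inside its
# radius of convergence R = 1.
x = 0.3
N = 100

f_series = sum(x**n for n in range(N + 1))
fprime_series = sum(n * x**(n - 1) for n in range(1, N + 1))

assert abs(f_series - 1 / (1 - x)) < 1e-12
assert abs(fprime_series - 1 / (1 - x)**2) < 1e-12
```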

Thus there is a unique solution of the differential equation e′(x) = e(x) normalized by e(0) = 1, and this function has the power series expansion (3.46) that converges on the entire line R. If f(x) = e(x + a)/e(a) for any a then f(0) = 1 and by the chain rule f′(x) = f(x), and consequently f(x) = e(x), or equivalently e(x + a) = e(a)e(x). On the one hand that implies that e(−x) = e(x)^{−1}; on the other hand it also implies that e(n) = e(1 + 1 + ··· + 1) = e(1)^n for any positive integer n, and e(1) = e(1/n + 1/n + ··· + 1/n) = e(1/n)^n, so e(1/n) = e(1)^{1/n}, the positive n-th root of e(1); consequently e(p/q) = e(1)^{p/q}. The actual value of the real number e = e(1) can be calculated approximately, to any degree of accuracy, from the power series expansion (3.46); the result is that e = 2.718 with accuracy up to three decimal places. Thus e^α = e(α) for any rational number α, and consequently e^α is defined to be the real number e(α) for any real number α; to see the significance of that observation, consider how otherwise it is possible to make sense of a real power of a real number in a simple and straightforward manner.

Since all the terms in the series expansion (3.46) are positive when x > 0 it follows that e(x) ≥ 1 + x for all points x ≥ 0, hence that e(x) > 1 for all x > 0; and since e(−x) = e(x)^{−1} it follows that e(x) > 0 for all x ∈ R. It follows further from the series expansion (3.46) that e(x) > x^{n+1}/(n+1)! for any positive integer n and all x > 0, hence that lim_{x→∞} e(x)/x^n = ∞, or equivalently lim_{x→∞} x^n/e(x) = 0; thus e(x) increases faster than any power of x for large values of x. Since e′(x) = e(x) > 0 the function e(x) is a monotonically strictly increasing function of x, so the mapping x −→ e(x) is a homeomorphism e : R −→ R^+ where R^+ = { x ∈ R | x > 0 }; its inverse hence is a well defined homeomorphism l : R^+ −→ R. The function e(x) is called the exponential function and its inverse l(y) is called the logarithm function; the more traditional notation is e(x) = e^x.

Since e(0) = 1 it follows that l(1) = 0. Moreover if e(h) = x and e(k) = y then h = l(x) and k = l(y), so e(h + k) = e(h)e(k) = xy, hence l(xy) = h + k = l(x) + l(y). If e(x) = y and e(x + h) = y + k then l(y) = x and l(y + k) = x + h, hence

    ( l(y + k) − l(y) ) / k = h / ( e(x + h) − e(x) ),

and since k tends to 0 as h tends to 0, by continuity it follows in the limit that

    l′(y) = 1/e′(x) = 1/y = y^{−1}.

It then follows inductively from the preceding equation that l^{(n)}(y) = (−1)^{n−1} (n − 1)! y^{−n}, so the Taylor expansion of this function has the form

    l(1 + y) = Σ_{n=1}^∞ (−1)^{n−1} y^n / n.
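The basic properties of the exponential function derived here can be verified from a truncated form of the series (3.46): the functional equation e(x + a) = e(x)e(a), the inverse relation with the logarithm, and the approximate value e = 2.718. (Forty terms is an arbitrary truncation; the comparison functions from the standard library are used only as a check.)

```python
import math

# Truncation of the series (3.46) for the exponential function.
def e(x, terms=40):
    return sum(x**n / math.factorial(n) for n in range(terms))

assert abs(e(1.0) - math.e) < 1e-12          # e = e(1)
assert round(e(1.0), 3) == 2.718             # three decimal places
assert abs(e(0.7 + 0.4) - e(0.7) * e(0.4)) < 1e-12   # e(x + a) = e(x)e(a)
assert abs(math.log(e(2.0)) - 2.0) < 1e-12   # l(y) inverts e(x)
```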

Figure 3.1: e(iy) = c(y) + i s(y)

The more traditional notation for the exponential function is e(x) = e^x or sometimes e(x) = exp(x), and for the logarithm l(y) = log(y); that notation generally will be used henceforth, although sometimes the formulas for differentiation are simpler in the notation e(x) and l(x).

The function e(x) is defined by a Taylor series that converges for all real values of the variable x; hence the series also converges for all complex values of the variable z, which provides an extension of the exponential function to a holomorphic or complex analytic function e(z) of the variable z ∈ C. The identity e(x + y) = e(x)e(y) translates into equalities for the Taylor polynomials on the two sides of this equation; these polynomial identities then hold for complex values of the variables as well, so the extension of the exponential function also satisfies e(w + z) = e(w)e(z) for all w, z ∈ C. When restricted to complex numbers of the form z = iy for real values y, the real part of e(iy) is a well-defined function c(y) and the imaginary part of e(iy) is a well-defined function s(y), so e(iy) = c(y) + i s(y); and upon separating the even and odd terms in the Taylor expansion (3.46) it follows that

    e(iy) = Σ_{n=0}^∞ (iy)^n / n! = Σ_{k=0}^∞ (iy)^{2k} / (2k)! + Σ_{k=0}^∞ (iy)^{2k+1} / (2k+1)!

and consequently that

(3.47)  c(y) = Σ_{k=0}^∞ (−1)^k y^{2k} / (2k)!

and

(3.48)  s(y) = Σ_{k=0}^∞ (−1)^k y^{2k+1} / (2k+1)!.

Since the coefficients of the Taylor expansion of the exponential function are real numbers it follows that the complex conjugate of e(iy) is e(−iy), and consequently

    |e(iy)|^2 = e(iy) e(−iy) = e(iy − iy) = 1,

so that |e(iy)| = 1; thus the complex numbers e(iy) for all y ∈ R lie on the unit circle |z| = 1 in the complex plane. Upon differentiating the preceding Taylor series it follows readily that

(3.49)  c′(y) = −s(y)   and   s′(y) = c(y).

The more customary notation and terminology is that c(y) = cos y, the cosine function, and s(y) = sin y, the sine function. That identifies the sine and cosine with the familiar trigonometric interpretations as the sides of a right triangle with hypotenuse of length 1, as sketched in Figure 3.1.
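The series (3.47) and (3.48) and the identity e(iy) = c(y) + i s(y) can be checked against the standard library's cosine, sine, and complex exponential; truncating at 25 terms is an arbitrary choice, since both series converge for every y.

```python
import cmath
import math

# Truncations of the series (3.47) for c(y) and (3.48) for s(y).
def c(y, terms=25):
    return sum((-1)**k * y**(2*k) / math.factorial(2*k) for k in range(terms))

def s(y, terms=25):
    return sum((-1)**k * y**(2*k + 1) / math.factorial(2*k + 1) for k in range(terms))

y = 1.2
assert abs(c(y) - math.cos(y)) < 1e-12
assert abs(s(y) - math.sin(y)) < 1e-12
assert abs(complex(c(y), s(y)) - cmath.exp(1j * y)) < 1e-12   # e(iy) = c + i s
assert abs(abs(cmath.exp(1j * y)) - 1.0) < 1e-12              # |e(iy)| = 1
```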
