LINEAR ALGEBRA

STEPHEN H. FRIEDBERG

ARNOLD J. INSEL

LAWRENCE E. SPENCE

List of Symbols

A_ij : the ij-th entry of the matrix A (page 9)
A^{-1} : the inverse of the matrix A (page 100)
A^+ : the pseudoinverse of the matrix A (page 414)
A* : the adjoint of the matrix A (page 331)
Ã_ij : the matrix A with row i and column j deleted (page 210)
A^t : the transpose of the matrix A (page 17)
(A|B) : the matrix A augmented by the matrix B (page 161)
B_1 ⊕ ··· ⊕ B_k : the direct sum of matrices B_1 through B_k (page 320)
B(V) : the set of bilinear forms on V (page 422)
β* : the dual basis of β (page 120)
β_x : the T-cyclic basis generated by x (page 526)
C : the field of complex numbers (page 7)
C_i : the ith Gerschgorin disk (page 296)
cond(A) : the condition number of the matrix A (page 469)
C^n(R) : the set of functions f on R with f^(n) continuous (page 21)
C^∞ : the set of functions with derivatives of every order (page 130)
C(R) : the vector space of continuous functions on R (page 18)
C([0,1]) : the vector space of continuous functions on [0,1] (page 331)
C_x : the T-cyclic subspace generated by x (page 525)
D : the derivative operator on C^∞ (page 131)
det(A) : the determinant of the matrix A (page 232)
δ_ij : the Kronecker delta (page 89)
dim(V) : the dimension of V (page 47)
e^A : lim_{m→∞} (I + A + A^2/2! + ··· + A^m/m!) (page 312)
e_i : the ith standard vector of F^n (page 43)
E_λ : the eigenspace of T corresponding to λ (page 264)
F : a field (page 6)
f(A) : the polynomial f(x) evaluated at the matrix A (page 565)
F^n : the set of n-tuples with entries in a field F (page 8)
f(T) : the polynomial f(x) evaluated at the operator T (page 565)
F(S, F) : the set of functions from S to a field F (page 9)
H : the space of continuous complex functions on [0, 2π] (page 332)
I_n or I : the n × n identity matrix (page 89)
I_V or I : the identity operator on V (page 67)
K_λ : the generalized eigenspace of T corresponding to λ (page 485)
K_φ : {x : (φ(T))^p(x) = 0 for some positive integer p} (page 525)
L_A : the left-multiplication transformation by the matrix A (page 92)
lim_{m→∞} A_m : the limit of a sequence of matrices (page 284)
L(V) : the space of linear transformations from V to V (page 82)
L(V, W) : the space of linear transformations from V to W (page 82)
M_{m×n}(F) : the set of m × n matrices with entries in F (page 9)
ν(A) : the column sum of the matrix A (page 295)
ν_j(A) : the jth column sum of the matrix A (page 295)
N(T) : the null space of T (page 67)
nullity(T) : the dimension of the null space of T (page 69)
O : the zero matrix (page 8)
per(M) : the permanent of the 2 × 2 matrix M (page 448)
P(F) : the space of polynomials with coefficients in F (page 10)
P_n(F) : the polynomials in P(F) of degree at most n (page 18)
φ_β : the standard representation with respect to the basis β (page 104)
R : the field of real numbers (page 7)
rank(A) : the rank of the matrix A (page 152)
rank(T) : the rank of the linear transformation T (page 69)
ρ(A) : the row sum of the matrix A (page 295)
ρ_i(A) : the ith row sum of the matrix A (page 295)
R(T) : the range of the linear transformation T (page 67)

CONTINUED ON REAR ENDPAPERS


Contents

Preface ix

1 Vector Spaces 1
1.1 Introduction 1
1.2 Vector Spaces 6
1.3 Subspaces 16
1.4 Linear Combinations and Systems of Linear Equations 24
1.5 Linear Dependence and Linear Independence 35
1.6 Bases and Dimension 42
1.7* Maximal Linearly Independent Subsets 58
Index of Definitions 62

2 Linear Transformations and Matrices 64
2.1 Linear Transformations, Null Spaces, and Ranges 64
2.2 The Matrix Representation of a Linear Transformation 79
2.3 Composition of Linear Transformations and Matrix Multiplication 86
2.4 Invertibility and Isomorphisms 99
2.5 The Change of Coordinate Matrix 110
2.6* Dual Spaces 119
2.7* Homogeneous Linear Differential Equations with Constant Coefficients 127
Index of Definitions 145

3 Elementary Matrix Operations and Systems of Linear Equations 147
3.1 Elementary Matrix Operations and Elementary Matrices 147
3.2 The Rank of a Matrix and Matrix Inverses 152
3.3 Systems of Linear Equations - Theoretical Aspects 168
3.4 Systems of Linear Equations - Computational Aspects 182
Index of Definitions 198

4 Determinants 199
4.1 Determinants of Order 2 199
4.2 Determinants of Order n 209
4.3 Properties of Determinants 222
4.4 Summary - Important Facts about Determinants 232
4.5* A Characterization of the Determinant 238
Index of Definitions 244

5 Diagonalization 245
5.1 Eigenvalues and Eigenvectors 245
5.2 Diagonalizability 261
5.3* Matrix Limits and Markov Chains 283
5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 313
Index of Definitions 328

6 Inner Product Spaces 329
6.1 Inner Products and Norms 329
6.2 The Gram-Schmidt Orthogonalization Process and Orthogonal Complements 341
6.3 The Adjoint of a Linear Operator 357
6.4 Normal and Self-Adjoint Operators 369
6.5 Unitary and Orthogonal Operators and Their Matrices 379
6.6 Orthogonal Projections and the Spectral Theorem 398
6.7* The Singular Value Decomposition and the Pseudoinverse 405
6.8* Bilinear and Quadratic Forms 422
6.9* Einstein's Special Theory of Relativity 451
6.10* Conditioning and the Rayleigh Quotient 464
6.11* The Geometry of Orthogonal Operators 472
Index of Definitions 480

7 Canonical Forms 482
7.1 The Jordan Canonical Form I 482
7.2 The Jordan Canonical Form II 497
7.3 The Minimal Polynomial 516
7.4* The Rational Canonical Form 524
Index of Definitions 548

Appendices 549
A Sets 549
B Functions 551
C Fields 552
D Complex Numbers 555
E Polynomials 561

Answers to Selected Exercises 571

Index 589

Sections denoted by an asterisk are optional.


Preface

The language and concepts of matrix theory and, more generally, of linear algebra have come into widespread usage in the social and natural sciences, computer science, and statistics. In addition, linear algebra continues to be of great importance in modern treatments of geometry and analysis.

The primary purpose of this fourth edition of Linear Algebra is to present a careful treatment of the principal topics of linear algebra and to illustrate the power of the subject through a variety of applications. Our major thrust emphasizes the symbiotic relationship between linear transformations and matrices. However, where appropriate, theorems are stated in the more general infinite-dimensional case. For example, this theory is applied to finding solutions to a homogeneous linear differential equation and the best approximation by a trigonometric polynomial to a continuous function.

Although the only formal prerequisite for this book is a one-year course in calculus, it requires the mathematical sophistication of typical junior and senior mathematics majors. This book is especially suited for a second course in linear algebra that emphasizes abstract vector spaces, although it can be used in a first course with a strong theoretical emphasis.

The book is organized to permit a number of different courses (ranging from three to eight semester hours in length) to be taught from it. The core material (vector spaces, linear transformations and matrices, systems of linear equations, determinants, diagonalization, and inner product spaces) is found in Chapters 1 through 5 and Sections 6.1 through 6.5. Chapters 6 and 7, on inner product spaces and canonical forms, are completely independent and may be studied in either order. In addition, throughout the book are applications to such areas as differential equations, economics, geometry, and physics. These applications are not central to the mathematical development, however, and may be excluded at the discretion of the instructor.

We have attempted to make it possible for many of the important topics of linear algebra to be covered in a one-semester course. This goal has led us to develop the major topics with fewer preliminaries than in a traditional approach. (Our treatment of the Jordan canonical form, for instance, does not require any theory of polynomials.) The resulting economy permits us to cover the core material of the book (omitting many of the optional sections and a detailed discussion of determinants) in a one-semester four-hour course for students who have had some prior exposure to linear algebra.

Chapter 1 of the book presents the basic theory of vector spaces: subspaces, linear combinations, linear dependence and independence, bases, and dimension. The chapter concludes with an optional section in which we prove


that every infinite-dimensional vector space has a basis.

Linear transformations and their relationship to matrices are the subject of Chapter 2. We discuss the null space and range of a linear transformation, matrix representations of a linear transformation, isomorphisms, and change of coordinates. Optional sections on dual spaces and homogeneous linear differential equations end the chapter.

The application of vector space theory and linear transformations to systems of linear equations is found in Chapter 3. We have chosen to defer this important subject so that it can be presented as a consequence of the preceding material. This approach allows the familiar topic of linear systems to illuminate the abstract theory and permits us to avoid messy matrix computations in the presentation of Chapters 1 and 2. There are occasional examples in these chapters, however, where we solve systems of linear equations. (Of course, these examples are not a part of the theoretical development.) The necessary background is contained in Section 1.4.

Determinants, the subject of Chapter 4, are of much less importance than they once were. In a short course (less than one year), we prefer to treat determinants lightly so that more time may be devoted to the material in Chapters 5 through 7. Consequently we have presented two alternatives in Chapter 4: a complete development of the theory (Sections 4.1 through 4.3) and a summary of important facts that are needed for the remaining chapters (Section 4.4). Optional Section 4.5 presents an axiomatic development of the determinant.

Chapter 5 discusses eigenvalues, eigenvectors, and diagonalization. One of the most important applications of this material occurs in computing matrix limits. We have therefore included an optional section on matrix limits and Markov chains in this chapter even though the most general statement of some of the results requires a knowledge of the Jordan canonical form. Section 5.4 contains material on invariant subspaces and the Cayley-Hamilton theorem.

Inner product spaces are the subject of Chapter 6. The basic mathematical theory (inner products; the Gram-Schmidt process; orthogonal complements; the adjoint of an operator; normal, self-adjoint, orthogonal and unitary operators; orthogonal projections; and the spectral theorem) is contained in Sections 6.1 through 6.6. Sections 6.7 through 6.11 contain diverse applications of the rich inner product space structure.

Canonical forms are treated in Chapter 7. Sections 7.1 and 7.2 develop the Jordan canonical form, Section 7.3 presents the minimal polynomial, and Section 7.4 discusses the rational canonical form.

There are five appendices. The first four, which discuss sets, functions, fields, and complex numbers, respectively, are intended to review basic ideas used throughout the book. Appendix E on polynomials is used primarily in Chapters 5 and 7, especially in Section 7.4. We prefer to cite particular results from the appendices as needed rather than to discuss the appendices


independently.

The following diagram illustrates the dependencies among the various chapters:

Chapter 1 → Chapter 2 → Chapter 3 → Sections 4.1-4.3 or Section 4.4 → Sections 5.1 and 5.2
Sections 5.1 and 5.2 → Chapter 6
Sections 5.1 and 5.2 → Section 5.4 → Chapter 7

One final word is required about our notation. Sections and subsections labeled with an asterisk (*) are optional and may be omitted as the instructor sees fit. An exercise accompanied by the dagger symbol (†) is not optional, however; we use this symbol to identify an exercise that is cited in some later section that is not optional.

DIFFERENCES BETWEEN THE THIRD AND FOURTH EDITIONS

The principal content change of this fourth edition is the inclusion of a new section (Section 6.7) discussing the singular value decomposition and the pseudoinverse of a matrix or a linear transformation between finite-dimensional inner product spaces. Our approach is to treat this material as a generalization of our characterization of normal and self-adjoint operators.

The organization of the text is essentially the same as in the third edition. Nevertheless, this edition contains many significant local changes that improve the book.


Section 5.1 (Eigenvalues and Eigenvectors) has been streamlined, and some material previously in Section 5.1 has been moved to Section 2.5 (The Change of Coordinate Matrix). Further improvements include revised proofs of some theorems, additional examples, new exercises, and literally hundreds of minor editorial changes.

We are especially indebted to Jane M. Day (San Jose State University) for her extensive and detailed comments on the fourth edition manuscript. Additional comments were provided by the following reviewers of the fourth edition manuscript: Thomas Banchoff (Brown University), Christopher Heil (Georgia Institute of Technology), and Thomas Shemanske (Dartmouth College).

To find the latest information about this book, consult our web site on the World Wide Web. We encourage comments, which can be sent to us by e-mail or ordinary post. Our web site and e-mail addresses are listed below.

web site: http://www.math.ilstu.edu/linalg
e-mail: linalg@math.ilstu.edu

Stephen H. Friedberg
Arnold J. Insel
Lawrence E. Spence

1 Vector Spaces

1.1 Introduction
1.2 Vector Spaces
1.3 Subspaces
1.4 Linear Combinations and Systems of Linear Equations
1.5 Linear Dependence and Linear Independence
1.6 Bases and Dimension
1.7* Maximal Linearly Independent Subsets

1.1 INTRODUCTION

Many familiar physical notions, such as forces, velocities, and accelerations, involve both a magnitude (the amount of the force, velocity, or acceleration) and a direction. Any such entity involving both magnitude and direction is called a "vector." A vector is represented by an arrow whose length denotes the magnitude of the vector and whose direction represents the direction of the vector. In most physical situations involving vectors, only the magnitude and direction of the vector are significant; consequently, we regard vectors with the same magnitude and direction as being equal irrespective of their positions. In this section the geometry of vectors is discussed. This geometry is derived from physical experiments that test the manner in which two vectors interact.

(The word velocity is being used here in its scientific sense, as an entity having both magnitude and direction. The magnitude of a velocity, without regard for the direction of motion, is called its speed.)

Familiar situations suggest that when two like physical quantities act simultaneously at a point, the magnitude of their effect need not equal the sum of the magnitudes of the original quantities. For example, a swimmer swimming upstream at the rate of 2 miles per hour against a current of 1 mile per hour does not progress at the rate of 3 miles per hour, for in this instance the motions of the swimmer and current oppose each other, and the rate of progress of the swimmer is only 1 mile per hour upstream. If, however, the

swimmer is moving downstream (with the current), then his or her rate of progress is 3 miles per hour downstream.

Experiments show that if two like quantities act together, their effect is predictable. In this case, the vectors used to represent these quantities can be combined to form a resultant vector that represents the combined effects of the original quantities. This resultant vector is called the sum of the original vectors, and the rule for their combination is called the parallelogram law. (See Figure 1.1.)

Parallelogram Law for Vector Addition. The sum of two vectors x and y that act at the same point P is the vector beginning at P that is represented by the diagonal of the parallelogram having x and y as adjacent sides.

[Figure 1.1]

Since opposite sides of a parallelogram are parallel and of equal length, the endpoint Q of the arrow representing x + y can also be obtained by allowing x to act at P and then allowing y to act at the endpoint of x. Similarly, the endpoint of the vector x + y can be obtained by first permitting y to act at P and then allowing x to act at the endpoint of y. Thus two vectors x and y that both act at the point P may be added "tail-to-head"; that is, either x or y may be applied at P and a vector having the same magnitude and direction as the other may be applied to the endpoint of the first. If this is done, the endpoint of the second vector is the endpoint of x + y.

The addition of vectors can be described algebraically with the use of analytic geometry. In the plane containing x and y, introduce a coordinate system with P at the origin. Let (a1, a2) denote the endpoint of x and (b1, b2) denote the endpoint of y. Then as Figure 1.2(a) shows, the endpoint Q of x + y is (a1 + b1, a2 + b2). Henceforth, when a reference is made to the coordinates of the endpoint of a vector, the vector should be assumed to emanate from the origin. Moreover, since a vector beginning at the origin is completely determined by its endpoint, we sometimes refer to the point x rather than the endpoint of the vector x if x is a vector emanating from the origin.

Besides the operation of vector addition, there is another natural operation that can be performed on vectors: the length of a vector may be magnified

or contracted. This operation, called scalar multiplication, consists of multiplying the vector by a real number. If the vector x is represented by an arrow, then for any nonzero real number t, the vector tx is represented by an arrow in the same direction if t > 0 and in the opposite direction if t < 0. The length of the arrow tx is |t| times the length of the arrow x. Two nonzero vectors x and y are called parallel if y = tx for some nonzero real number t. (Thus nonzero vectors having the same or opposite directions are parallel.)

To describe scalar multiplication algebraically, again introduce a coordinate system into a plane containing the vector x so that x emanates from the origin. If the endpoint of x has coordinates (a1, a2), then the coordinates of the endpoint of tx are easily seen to be (ta1, ta2). (See Figure 1.2(b).)

[Figure 1.2]

The algebraic descriptions of vector addition and scalar multiplication for vectors in a plane yield the following properties:

1. For all vectors x and y, x + y = y + x.
2. For all vectors x, y, and z, (x + y) + z = x + (y + z).
3. There exists a vector denoted 0 such that x + 0 = x for each vector x.
4. For each vector x, there is a vector y such that x + y = 0.
5. For each vector x, 1x = x.
6. For each pair of real numbers a and b and each vector x, (ab)x = a(bx).
7. For each real number a and each pair of vectors x and y, a(x + y) = ax + ay.
8. For each pair of real numbers a and b and each vector x, (a + b)x = ax + bx.

Arguments similar to the preceding ones show that these eight properties, as well as the geometric interpretations of vector addition and scalar multiplication, are true also for vectors acting in space rather than in a plane. These results can be used to write equations of lines and planes in space.
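These eight properties can be spot-checked numerically for coordinate vectors in the plane. Here is a minimal Python sketch (the helper names add and smul are illustrative) that implements coordinatewise addition and scalar multiplication and verifies properties 1, 2, 7, and 8 for a few sample vectors and scalars; such checks illustrate the properties but of course do not prove them.

```python
def add(x, y):
    """Coordinatewise addition of two plane vectors."""
    return (x[0] + y[0], x[1] + y[1])

def smul(t, x):
    """Scalar multiplication: stretch/contract (and possibly flip) x by t."""
    return (t * x[0], t * x[1])

x, y, z = (3.0, -2.0), (-1.0, 1.0), (2.0, 5.0)
a, b = 2.0, -3.0

assert add(x, y) == add(y, x)                              # property 1: commutativity
assert add(add(x, y), z) == add(x, add(y, z))              # property 2: associativity
assert smul(a, add(x, y)) == add(smul(a, x), smul(a, y))   # property 7
assert smul(a + b, x) == add(smul(a, x), smul(b, x))       # property 8
print("properties 1, 2, 7, 8 hold for the sample vectors")
```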


Consider first the equation of a line in space that passes through two distinct points A and B. Let O denote the origin of a coordinate system in space, and let u and v denote the vectors that begin at O and end at A and B, respectively. If w denotes the vector beginning at A and ending at B, then "tail-to-head" addition shows that u + w = v, and hence w = v - u, where -u denotes the vector (-1)u. (See Figure 1.3, in which the quadrilateral OABC is a parallelogram.) Since a scalar multiple of w is parallel to w but possibly of a different length than w, any point on the line joining A and B may be obtained as the endpoint of a vector that begins at A and has the form tw for some real number t. Conversely, the endpoint of every vector of the form tw that begins at A lies on the line joining A and B. Thus an equation of the line through A and B is x = u + tw = u + t(v - u), where t is a real number and x denotes an arbitrary point on the line. Notice also that the endpoint C of the vector v - u in Figure 1.3 has coordinates equal to the difference of the coordinates of B and A.

[Figure 1.3]

Example 1

Let A and B be points having coordinates (-2,0,1) and (4,5,3), respectively. The endpoint C of the vector emanating from the origin and having the same direction as the vector beginning at A and terminating at B has coordinates (4,5,3) - (-2,0,1) = (6,5,2). Hence the equation of the line through A and B is

x = (-2,0,1) + t(6,5,2).  •
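The computation in Example 1 is easy to verify numerically. In the minimal Python sketch below (the helper name point_on_line is illustrative), points of the form A + t(B - A) reproduce A at t = 0 and B at t = 1.

```python
A = (-2.0, 0.0, 1.0)
B = (4.0, 5.0, 3.0)

# direction vector w = B - A, computed coordinatewise
w = tuple(b - a for a, b in zip(A, B))          # (6.0, 5.0, 2.0)

def point_on_line(t):
    """Return the point A + t*w on the line through A and B."""
    return tuple(a + t * wi for a, wi in zip(A, w))

assert point_on_line(0.0) == A
assert point_on_line(1.0) == B
print(point_on_line(0.5))   # midpoint of the segment AB: (1.0, 2.5, 2.0)
```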

Now let A, B, and C denote any three noncollinear points in space. These points determine a unique plane, and its equation can be found by use of our previous observations about vectors. Let u and v denote vectors beginning at A and ending at B and C, respectively. Observe that any point in the plane containing A, B, and C is the endpoint S of a vector x beginning at A and having the form su + tv for some real numbers s and t. The endpoint of su is the point of intersection of the line through A and B with the line through S


[Figure 1.4]

parallel to the line through A and C. (See Figure 1.4.) A similar procedure locates the endpoint of tv. Moreover, for any real numbers s and t, the vector su + tv lies in the plane containing A, B, and C. It follows that an equation of the plane containing A, B, and C is

x = A + su + tv,

where s and t are arbitrary real numbers and x denotes an arbitrary point in the plane.

Example 2

Let A, B, and C be the points having coordinates (1,0,2), (-3,-2,4), and (1,8,-5), respectively. The endpoint of the vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at B is (-3,-2,4) - (1,0,2) = (-4,-2,2). Similarly, the endpoint of a vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at C is (1,8,-5) - (1,0,2) = (0,8,-7). Hence the equation of the plane containing the three given points is

x = (1,0,2) + s(-4,-2,2) + t(0,8,-7).  •
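Similarly, the plane parametrization in Example 2 can be checked by evaluating it at a few parameter values. In the Python sketch below (point_on_plane is an illustrative name), the choices (s, t) = (0, 0), (1, 0), and (0, 1) recover the points A, B, and C.

```python
A = (1.0, 0.0, 2.0)
B = (-3.0, -2.0, 4.0)
C = (1.0, 8.0, -5.0)

u = tuple(b - a for a, b in zip(A, B))   # (-4.0, -2.0,  2.0)
v = tuple(c - a for a, c in zip(A, C))   # ( 0.0,  8.0, -7.0)

def point_on_plane(s, t):
    """Return the point A + s*u + t*v of the plane through A, B, and C."""
    return tuple(a + s * ui + t * vi for a, ui, vi in zip(A, u, v))

assert point_on_plane(0, 0) == A
assert point_on_plane(1, 0) == B
assert point_on_plane(0, 1) == C
print(point_on_plane(1, 1))   # a fourth point of the plane: (-3.0, 6.0, -5.0)
```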

Any mathematical structure possessing the eight properties on page 3 is called a vector space. In the next section we formally define a vector space and consider many examples of vector spaces other than the ones mentioned above.

EXERCISES

1. Determine whether the vectors emanating from the origin and terminating at the following pairs of points are parallel.

(a) (3,1,2) and (6,4,2)
(b) (-3,1,7) and (9,-3,-21)
(c) (5,-6,7) and (-5,6,-7)
(d) (2,0,-5) and (5,0,-2)

2. Find the equations of the lines through the following pairs of points in space.
(a) (3,-2,4) and (-5,7,1)
(b) (2,4,0) and (-3,-6,0)
(c) (3,7,2) and (3,7,-8)
(d) (-2,-1,5) and (3,9,7)

3. Find the equations of the planes containing the following points in space.
(a) (2,-5,-1), (0,4,6), and (-3,7,1)
(b) (3,-6,7), (-2,0,-4), and (5,-9,-2)
(c) (-8,2,0), (1,3,0), and (6,-5,0)
(d) (1,1,1), (5,5,5), and (-6,4,2)

4. What are the coordinates of the vector 0 in the Euclidean plane that satisfies property 3 on page 3? Justify your answer.

5. Prove that if the vector x emanates from the origin of the Euclidean plane and terminates at the point with coordinates (a1, a2), then the vector tx that emanates from the origin terminates at the point with coordinates (ta1, ta2).

6. Show that the midpoint of the line segment joining the points (a, b) and (c, d) is ((a + c)/2, (b + d)/2).

7. Prove that the diagonals of a parallelogram bisect each other.

1.2 VECTOR SPACES

In Section 1.1 we saw that, with the natural definitions of vector addition and scalar multiplication, the vectors in a plane satisfy the eight properties listed on page 3. Many other familiar algebraic systems also permit definitions of addition and scalar multiplication that satisfy the same eight properties. In this section, we introduce some of these systems, but first we formally define this type of algebraic structure.

Definitions. A vector space (or linear space) V over a field F (fields are discussed in Appendix C) consists of a set on which two operations (called addition and scalar multiplication, respectively) are defined so that for each pair of elements x, y in V there is a unique element x + y in V, and for each element a in F and each element x in V there is a unique element ax in V, such that the following conditions hold.

(VS 1) For all x, y in V, x + y = y + x (commutativity of addition).

(VS 2) For all x, y, z in V, (x + y) + z = x + (y + z) (associativity of addition).

(VS 3) There exists an element in V denoted by 0 such that x + 0 = x for each x in V.

(VS 4) For each element x in V there exists an element y in V such that x + y = 0.

(VS 5) For each element x in V, 1x = x.

(VS 6) For each pair of elements a, b in F and each element x in V, (ab)x = a(bx).

(VS 7) For each element a in F and each pair of elements x, y in V, a(x + y) = ax + ay.

(VS 8) For each pair of elements a, b in F and each element x in V, (a + b)x = ax + bx.

The elements x + y and ax are called the sum of x and y and the product of a and x, respectively.
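For a concrete candidate structure, conditions (VS 1) through (VS 8) can at least be spot-checked on sample elements. The following Python sketch (the function check_axioms and its argument names are illustrative) tests the identities among (VS 1), (VS 2), and (VS 5) through (VS 8) on supplied samples; a single reported failure disproves the axiom, while an empty report merely fails to find a counterexample.

```python
def check_axioms(add, smul, vectors, scalars):
    """Spot-check the identities (VS 1), (VS 2), (VS 5), (VS 6), (VS 7), (VS 8)
    on the supplied sample vectors and scalars.  Returns the labels of the
    identities that fail for at least one sample."""
    failures = set()
    for x in vectors:
        if smul(1, x) != x:
            failures.add("VS 5")
        for y in vectors:
            if add(x, y) != add(y, x):
                failures.add("VS 1")
            for z in vectors:
                if add(add(x, y), z) != add(x, add(y, z)):
                    failures.add("VS 2")
            for a in scalars:
                if smul(a, add(x, y)) != add(smul(a, x), smul(a, y)):
                    failures.add("VS 7")
        for a in scalars:
            for b in scalars:
                if smul(a * b, x) != smul(a, smul(b, x)):
                    failures.add("VS 6")
                if smul(a + b, x) != add(smul(a, x), smul(b, x)):
                    failures.add("VS 8")
    return sorted(failures)

# The usual coordinatewise operations on R^2 pass every spot check:
usual_add = lambda x, y: (x[0] + y[0], x[1] + y[1])
usual_smul = lambda c, x: (c * x[0], c * x[1])
print(check_axioms(usual_add, usual_smul,
                   [(0, 0), (1, 2), (-3, 5)], [0, 1, 2, -1]))   # []
```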

The elements of the field F are called scalars and the elements of the vector space V are called vectors. The reader should not confuse this use of the word "vector" with the physical entity discussed in Section 1.1: the word "vector" is now being used to describe any element of a vector space.

A vector space is frequently discussed in the text without explicitly mentioning its field of scalars. The reader is cautioned to remember, however, that every vector space is regarded as a vector space over a given field, which is denoted by F. Occasionally we restrict our attention to the fields of real and complex numbers, which are denoted R and C, respectively.

Observe that (VS 2) permits us to define the addition of any finite number of vectors unambiguously (without the use of parentheses).

In the remainder of this section we introduce several important examples of vector spaces that are studied throughout this text. Observe that in describing a vector space, it is necessary to specify not only the vectors but also the operations of addition and scalar multiplication.

An object of the form (a1, a2, ..., an), where the entries a1, a2, ..., an are elements of a field F, is called an n-tuple with entries from F.


The elements a1, a2, ..., an are called the entries or components of the n-tuple. Two n-tuples (a1, a2, ..., an) and (b1, b2, ..., bn) with entries from a field F are called equal if ai = bi for i = 1, 2, ..., n.

Example 1

The set of all n-tuples with entries from a field F is denoted by F^n. This set is a vector space over F with the operations of coordinatewise addition and scalar multiplication; that is, if u = (a1, a2, ..., an) ∈ F^n, v = (b1, b2, ..., bn) ∈ F^n, and c ∈ F, then

u + v = (a1 + b1, a2 + b2, ..., an + bn)   and   cu = (ca1, ca2, ..., can).

Thus R^3 is a vector space over R. In this vector space,

(3,-2,0) + (-1,1,4) = (2,-1,4)   and   -5(1,-2,0) = (-5,10,0).

Similarly, C^2 is a vector space over C. In this vector space,

(1 + i, 2) + (2 - 3i, 4i) = (3 - 2i, 2 + 4i)   and   i(1 + i, 2) = (-1 + i, 2i).
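The coordinatewise operations of Example 1 translate directly into code. A minimal Python sketch (vadd and vsmul are illustrative names; Python's built-in complex numbers stand in for C) reproduces the computations displayed above.

```python
def vadd(u, v):
    """Coordinatewise addition of two n-tuples over the same field."""
    return tuple(a + b for a, b in zip(u, v))

def vsmul(c, u):
    """Scalar multiplication of an n-tuple by c."""
    return tuple(c * a for a in u)

# In R^3:
print(vadd((3, -2, 0), (-1, 1, 4)))     # (2, -1, 4)
print(vsmul(-5, (1, -2, 0)))            # (-5, 10, 0)

# In C^2, using Python's built-in complex numbers (1j denotes i):
print(vadd((1 + 1j, 2), (2 - 3j, 4j)))  # ((3-2j), (2+4j))
print(vsmul(1j, (1 + 1j, 2)))           # ((-1+1j), 2j)
```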

Vectors in F^n may be written as column vectors

( a1 )
( a2 )
( .. )
( an )

rather than as row vectors (a1, a2, ..., an). Since a 1-tuple whose only entry is from F can be regarded as an element of F, we usually write F rather than F^1 for the vector space of 1-tuples with entry from F.  •

An m x n matrix with entries from a field F is a rectangular array of the form

( a11  a12  ...  a1n )
( a21  a22  ...  a2n )
(  .    .         .  )
( am1  am2  ...  amn )

where each entry aij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is an element of F. We call the entries aij with i = j the diagonal entries of the matrix. The entries ai1, ai2, ..., ain compose the ith row of the matrix, and the entries a1j, a2j, ..., amj compose the jth column of the matrix. The rows of the preceding matrix are regarded as vectors in F^n, and the columns are regarded as vectors in F^m. The m x n matrix in which each entry equals zero is called the zero matrix and is denoted by O.


In this book, we denote matrices by capital italic letters (e.g., A, B, and C), and we denote the entry of a matrix A that lies in row i and column j by Aij. In addition, if the number of rows and columns of a matrix are equal, the matrix is called square. Two m x n matrices A and B are called equal if all their corresponding entries are equal, that is, if Aij = Bij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Example 2

The set of all m x n matrices with entries from a field F is a vector space, which we denote by M_{m×n}(F), with the following operations of matrix addition and scalar multiplication: For A, B ∈ M_{m×n}(F) and c ∈ F,

(A + B)ij = Aij + Bij   and   (cA)ij = cAij

for 1 ≤ i ≤ m and 1 ≤ j ≤ n. For instance, matrix addition and scalar multiplication in M_{2×3}(R) are computed entrywise.  •

Example 3

Let S be any nonempty set and F be any field, and let F(S, F) denote the set of all functions from S to F. Two functions f and g in F(S, F) are called equal if f(s) = g(s) for each s ∈ S. The set F(S, F) is a vector space with the operations of addition and scalar multiplication defined for f, g ∈ F(S, F) and c ∈ F by

(f + g)(s) = f(s) + g(s)   and   (cf)(s) = c[f(s)]

for each s ∈ S. Note that these are the familiar operations of addition and scalar multiplication for functions used in algebra and calculus.  •
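The operations in Examples 2 and 3 both act slotwise: entry by entry for matrices and point by point for functions. The Python sketch below (matrices stored as tuples of rows, functions as ordinary callables; the sample entries are illustrative) mirrors the two definitions.

```python
def mat_add(A, B):
    """(A + B)ij = Aij + Bij for matrices stored as tuples of rows."""
    return tuple(tuple(a + b for a, b in zip(ra, rb)) for ra, rb in zip(A, B))

def mat_smul(c, A):
    """(cA)ij = c * Aij."""
    return tuple(tuple(c * a for a in row) for row in A)

A = ((1, 0, -2), (3, 5, 1))      # two sample members of M_{2x3}(R)
B = ((0, 4, 4), (-1, 2, 6))
print(mat_add(A, B))             # ((1, 4, 2), (2, 7, 7))
print(mat_smul(-3, A))           # ((-3, 0, 6), (-9, -15, -3))

def fun_add(f, g):
    """(f + g)(s) = f(s) + g(s); returns a new function on the same set S."""
    return lambda s: f(s) + g(s)

def fun_smul(c, f):
    """(cf)(s) = c * f(s)."""
    return lambda s: c * f(s)

f = lambda t: 2 * t + 1          # sample members of F(R, R)
g = lambda t: t * t
h = fun_add(f, fun_smul(3, g))   # h(t) = 2t + 1 + 3t^2
print(h(2))                      # 17
```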

A polynomial with coefficients from a field F is an expression of the form

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0,

where n is a nonnegative integer and each a_k, called the coefficient of x^k, is in F. If f(x) = 0, that is, if a_n = a_{n-1} = ··· = a_0 = 0, then f(x) is called the zero polynomial and, for convenience, its degree is defined to be -1;


otherwise, the degree of a polynomial is defined to be the largest exponent of x that appears in the representation

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

with a nonzero coefficient. Note that the polynomials of degree zero may be written in the form f(x) = c for some nonzero scalar c. Two polynomials,

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

and

g(x) = b_m x^m + b_{m-1} x^{m-1} + ··· + b_1 x + b_0,

are called equal if m = n and a_i = b_i for i = 0, 1, ..., n.

When F is a field containing infinitely many scalars, we usually regard a polynomial with coefficients from F as a function from F into F. (See page 569.) In this case, the value of the function

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

at c ∈ F is the scalar

f(c) = a_n c^n + a_{n-1} c^{n-1} + ··· + a_1 c + a_0.

Here either of the notations f or f(x) is used for the polynomial function f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0.

Example 4

Let

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

and

g(x) = b_m x^m + b_{m-1} x^{m-1} + ··· + b_1 x + b_0

be polynomials with coefficients from a field F. Suppose that m ≤ n, and define b_{m+1} = b_{m+2} = ··· = b_n = 0. Then g(x) can be written as

g(x) = b_n x^n + b_{n-1} x^{n-1} + ··· + b_1 x + b_0.

Define

f(x) + g(x) = (a_n + b_n) x^n + (a_{n-1} + b_{n-1}) x^{n-1} + ··· + (a_1 + b_1) x + (a_0 + b_0)

and for any c ∈ F, define

cf(x) = c a_n x^n + c a_{n-1} x^{n-1} + ··· + c a_1 x + c a_0.

With these operations of addition and scalar multiplication, the set of all polynomials with coefficients from F is a vector space, which we denote by P(F). •
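Representing a polynomial by its list of coefficients (a_0, a_1, ..., a_n) turns the operations of Example 4 into short list manipulations; padding the shorter list with zeros plays the role of setting b_{m+1} = ··· = b_n = 0. A minimal Python sketch follows (the sample polynomials are those of Exercise 4(e) below).

```python
def poly_add(f, g):
    """Add two polynomials given as coefficient lists [a0, a1, ..., an]."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))        # pad with zero coefficients
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_smul(c, f):
    """Multiply a polynomial by the scalar c."""
    return [c * a for a in f]

def poly_eval(f, c):
    """Value f(c); Horner's rule, consistent with viewing f as a function."""
    value = 0
    for a in reversed(f):
        value = value * c + a
    return value

f = [3, 4, 0, -7, 2]                  # 2x^4 - 7x^3 + 4x + 3
g = [7, -6, 2, 8]                     # 8x^3 + 2x^2 - 6x + 7
print(poly_add(f, g))                 # [10, -2, 2, 1, 2]
print(poly_smul(5, g))                # [35, -30, 10, 40]
print(poly_eval(f, 2))                # 2*16 - 7*8 + 4*2 + 3 = -13
```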


We will see in Exercise 23 of Section 2.4 that the vector space defined in the next example is essentially the same as P(F).

Example 5

Let F be any field. A sequence in F is a function σ from the positive integers into F. In this book, the sequence σ such that σ(n) = a_n for n = 1, 2, ... is denoted {a_n}. Let V consist of all sequences {a_n} in F that have only a finite number of nonzero terms a_n. If {a_n} and {b_n} are in V and t ∈ F, define

{a_n} + {b_n} = {a_n + b_n}   and   t{a_n} = {t a_n}.

With these operations V is a vector space.  •
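Since each sequence in Example 5 has only finitely many nonzero terms, it can be stored as a dictionary mapping an index n to the term a_n, with absent indices understood to be 0. The Python sketch below (this dictionary representation is one convenient choice, not notation from the text) implements the two operations just defined.

```python
def seq_add(a, b):
    """Termwise sum of two finitely supported sequences (dicts index -> term)."""
    result = {n: a.get(n, 0) + b.get(n, 0) for n in set(a) | set(b)}
    return {n: v for n, v in result.items() if v != 0}   # keep only nonzero terms

def seq_smul(t, a):
    """Scalar multiple of a finitely supported sequence."""
    return {n: t * v for n, v in a.items() if t * v != 0}

a = {1: 5, 3: -2}          # the sequence 5, 0, -2, 0, 0, ...
b = {1: -5, 2: 7}          # the sequence -5, 7, 0, 0, ...
print(seq_add(a, b))       # {2: 7, 3: -2} (display order may vary)
print(seq_smul(4, a))      # {1: 20, 3: -8}
```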

Our next two examples contain sets on which addition and scalar multiplication are defined, but which are not vector spaces.

Example 6

Let S = {(a1, a2): a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, a2 - b2)   and   c(a1, a2) = (ca1, ca2).

Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space with these operations.  •
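A single counterexample suffices to show that an axiom fails. Applying the addition of Example 6 to one pair of vectors, as in the short Python sketch below, already exhibits the failure of commutativity (VS 1).

```python
def add6(x, y):
    """The addition of Example 6: second coordinates are subtracted."""
    return (x[0] + y[0], x[1] - y[1])

x, y = (1, 2), (3, 4)
print(add6(x, y))          # (4, -2)
print(add6(y, x))          # (4, 2)  -- so x + y != y + x and (VS 1) fails
```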

Example 7

Let S be as in Example 6. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, 0)   and   c(a1, a2) = (ca1, 0).

Then S is not a vector space with these operations because (VS 3) (hence (VS 4)) and (VS 5) fail.  •

We conclude this section with a few of the elementary consequences of the definition of a vector space.

Theorem 1.1 (Cancellation Law for Vector Addition). If x, y, and z are vectors in a vector space V such that x + z = y + z, then x = y.

Proof. There exists a vector v in V such that z + v = 0 (VS 4). Thus

x = x + 0 = x + (z + v) = (x + z) + v = (y + z) + v = y + (z + v) = y + 0 = y

by (VS 2) and (VS 3).

Corollary 1. The vector 0 described in (VS 3) is unique.

Proof. Exercise.

Corollary 2. The vector y described in (VS 4) is unique.

Proof. Exercise.

The vector 0 in (VS 3) is called the zero vector of V, and the vector y in (VS 4) (that is, the unique vector such that x + y = 0) is called the additive inverse of x and is denoted by -x.

The next result contains some of the elementary properties of scalar multiplication.

Theorem 1.2. In any vector space V, the following statements are true:
(a) 0x = 0 for each x ∈ V.
(b) (-a)x = -(ax) = a(-x) for each a ∈ F and each x ∈ V.
(c) a0 = 0 for each a ∈ F.

Proof. (a) By (VS 8), (VS 3), and (VS 1), it follows that

0x + 0x = (0 + 0)x = 0x = 0x + 0 = 0 + 0x.

Hence 0x = 0 by Theorem 1.1.

(b) The vector -(ax) is the unique element of V such that ax + [-(ax)] = 0. Thus if ax + (-a)x = 0, Corollary 2 to Theorem 1.1 implies that (-a)x = -(ax). But, by (VS 8),

ax + (-a)x = [a + (-a)]x = 0x = 0

by (a). Consequently (-a)x = -(ax). In particular, (-1)x = -x. So, by (VS 6),

a(-x) = a[(-1)x] = [a(-1)]x = (-a)x.

The proof of (c) is similar to the proof of (a).

EXERCISES

1. Label the following statements as true or false.
(a) Every vector space contains a zero vector.
(b) A vector space may have more than one zero vector.
(c) In any vector space, ax = bx implies that a = b.
(d) In any vector space, ax = ay implies that x = y.
(e) A vector in F^n may be regarded as a matrix in M_{n×1}(F).
(f) An m x n matrix has m columns and n rows.
(g) In P(F), only polynomials of the same degree may be added.
(h) If f and g are polynomials of degree n, then f + g is a polynomial of degree n.
(i) If f is a polynomial of degree n and c is a nonzero scalar, then cf is a polynomial of degree n.


(j) A nonzero scalar of F may be considered to be a polynomial in P(F) having degree zero.
(k) Two functions in F(S, F) are equal if and only if they have the same value at each element of S.

2. Write the zero vector of M_{3×4}(F).

3. If

M = (1 2 3)
    (4 5 6),

what are M13, M21, and M22?

4. Perform the indicated operations.

(e) (2x^4 - 7x^3 + 4x + 3) + (8x^3 + 2x^2 - 6x + 7)
(f) (-3x^3 + 7x^2 + 8x - 6) + (2x^3 - 8x + 10)
(g) 5(2x^7 - 6x^4 + 8x^2 - 3x)
(h) 3(x^5 - 2x^3 + 4x + 2)

Exercises 5 and 6 show why the definitions of matrix addition and scalar multiplication (as defined in Example 2) are the appropriate ones.

5. Richard Gard ("Effects of Beaver on Trout in Sagehen Creek, California," J. Wildlife Management, 25, 221-242) reports the following numbers of trout having crossed beaver dams in Sagehen Creek.

Upstream Crossings Fall Brook trout. Rainbow trout Brown trout. Spring Summer

14 Downstream Crossings Fall 9 * 1 Spring 1 (J 1


Brook trout Rainbow trout Brown trout

Summer 4 0 0

Record the upstream and downstream crossings in two 3 x 3 matrices, and verify that the sum of these matrices gives the total number of crossings (both upstream and downstream) categorized by trout species and season.

6. At the end of May, a furniture store had the following inventory.

                     Early American   Spanish   Mediterranean   Danish
Living room suites         1             2            1            3
Bedroom suites             5             1            1            4
Dining room suites         3             1            2            6

Record these data as a 3 x 4 matrix M. To prepare for its June sale, the store decided to double its inventory on each of the items listed in the preceding table. Assuming that none of the present stock is sold until the additional furniture arrives, verify that the inventory on hand after the order is filled is described by the matrix 2M. If the inventory at the end of June is described by the matrix

A = (5 3 1 2)
    (6 2 1 5)
    (1 0 3 3),

interpret 2M - A. How many suites were sold during the June sale?

7. Let S = {0, 1} and F = R. In F(S, R), show that f = g and f + g = h, where f(t) = 2t + 1, g(t) = 1 + 4t - 2t^2, and h(t) = 5^t + 1.

8. In any vector space V, show that (a + b)(x + y) = ax + ay + bx + by for any x, y ∈ V and any a, b ∈ F.

9. Prove Corollaries 1 and 2 of Theorem 1.1 and Theorem 1.2(c).

10. Let V denote the set of all differentiable real-valued functions defined on the real line. Prove that V is a vector space with the operations of addition and scalar multiplication defined in Example 3.


11. Let V = {0} consist of a single vector 0 and define 0 + 0 = 0 and c0 = 0 for each scalar c in F. Prove that V is a vector space over F. (V is called the zero vector space.)

12. A real-valued function f defined on the real line is called an even function if f(-t) = f(t) for each real number t. Prove that the set of even functions defined on the real line with the operations of addition and scalar multiplication defined in Example 3 is a vector space.

13. Let V denote the set of ordered pairs of real numbers. If (a1, a2) and (b1, b2) are elements of V and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + b1, a2 b2)   and   c(a1, a2) = (ca1, a2).

Is V a vector space over R with these operations? Justify your answer.

14. Let V = {(a1, a2, ..., an): ai ∈ C for i = 1, 2, ..., n}; so V is a vector space over C by Example 1. Is V a vector space over the field of real numbers with the operations of coordinatewise addition and multiplication?

15. Let V = {(a1, a2, ..., an): ai ∈ R for i = 1, 2, ..., n}; so V is a vector space over R by Example 1. Is V a vector space over the field of complex numbers with the operations of coordinatewise addition and multiplication?

16. Let V denote the set of all m x n matrices with real entries; so V is a vector space over R by Example 2. Let F be the field of rational numbers. Is V a vector space over F with the usual definitions of matrix addition and scalar multiplication?

17. Let V = {(a1, a2): a1, a2 ∈ F}, where F is a field. Define addition of elements of V coordinatewise, and for c ∈ F and (a1, a2) ∈ V, define

c(a1, a2) = (ca1, 0).

Is V a vector space over F with these operations? Justify your answer.

18. Let V = {(a1, a2): a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ V and c ∈ R, define

(a1, a2) + (b1, b2) = (a1 + 2b1, a2 + 3b2)

and

c(a1, a2) = (ca1, ca2).

Is V a vector space over R with these operations? Justify your answer.

20. The appropriate notion of substructure for vector spaces is introduced in this section. Let V = {(01.Wi) = (cvi. Thus a subset W of a.02): 01. (VS 7). necessary to verify all of the vector space properties to prove that a subset is a subspace.wi) + (v2. with these operations.cw1). Prove that Z is a vector space over F with the operations (vi.„ + 6„} and t{a„} = {tan}.Define addition of elements of V coordinatewise. 21. define .16 Chap.{«. and (VS 8) hold for all vectors in the vector space. Prove that.02 £ R}. vector space V is a subspace of V if and only if the following four properties hold.o. . (VS 6). (See 22. w\ + w2) . and for (01.4 subset W of a vector space V over a field F is called a subspace of V if W is a vector space over F with the operations of addition and scalar multiplication defined on V. Fortunately it is not. Let Z = {(v. define {<>•„} + {6n} . Is V a vector space over R with these operations? Justify your answer. 0) if c = 0 cai. these properties automatically hold for the vectors in any subset.) 1.w): v £ Vandv/.md c(v\. The latter is called the zero subspace of V. How many matrices arc there in the vector space M m x n (Z2)? Appendix C.02) in V and c £ R. (VS 2). Let V and W be vector spaces over a field F. £ W}. Let V be the set of sequences {an} of real numbers. — 1 it c / 0. V is a vector space over R. 1 Vector Spaces 19. note that V and {0} are subspaces. Definition. it is of interest to examine subsets that possess the same structure as the set under consideration. Because properties (VS 1). (VS 5). {6n} £ V and any real number t. (See Example 5 for the definition of a sequence.w2) = (ci + u2.) For {«„}. In any vector space V.3 SUBSPACES In the study of any algebraic structure. .

0 by Theorem 1. the 2 x 2 matrix displayed above is a symmetric matrix. it is this result that is used to prove that a subset is.3 hold: 1.4 such that . (b) x + y £ W whenever x i.Sec. Conversely. 1 0 -2 3 5 .3 Subspaces 17 1. ex £ W whenever c £ /•" and x C W. The transpose A1 of an m x n matrix A is the n x m matrix obtained from A by interchanging the rows with the columns: that is. the discussion preceding tins theorem shows thai W is a subspace of V if the additive inverse of each vector in W lies in W. Proof. For example.4' = A. (c) ex £ W whenever c . Hence W is a subspace of V. The next theorem shows that the zero vector of W must be the same as the zero vector of V ami thai property I is redundant. t hen W is a vector space wit h the operat ions of addition and scalar multiplication defined on V. a subspace.F and x C W. Clearly. Then W is a subspace ofV if and only if the following three conditions hold for the operations defined in V. (W is closed under addition. (W is closed under scalar multiplication. The set W of all symmetric matrices " u :F) is a subspace of M n x „ ( F ) since the conditions of Theorem 1. ) 3. I"he zero matrix is equal to its transpose and hence belongs to W.1 2 3 1 2 2 3 A symmetric matrix is a matrix . 1. a symmetric matrix must be square. in fact. But also x + 0 = x.) Using this fact. (See Exercise 3.1 (p. and there exists a vector 0' > W Mich that x — 0' = X for each x £ W. Normally. and (c) hold.) 2. But if a i W. Theorem 1. Let V be a vector space and W a subset ofV.3. If W is a subspace of V. we show that the set of symmetric matrices is closed under addition and scalar multiplication. if conditions (a). It is easily proved that for any matrices A and B and any scalars a and 6. (Al)ij = Aj%. 4./• c W and y ( W. W has a zero vector. then ( -l)x C W by condition (c). 12). (b). and —x = ( — 1 )x by Theorem 1. So condition (a) holds. Each vector in W has an additive inverse in W. I The preceding theorem provides a simple method for determining whether or not a given subset of a vector space is a subspace.W and y i W. For example.2 (p. and thus 0' . (a) 0 e W . x+y £ W whenever . . (aA + bB)' = a A' + bB'. Hence conditions (!>] and (c) hold. I I).

tr(M) = Mn + M 2 2 + --.R) defined in Example 3 of Section 1. .+ M n n . if all its nondiagonal entries are zero.j = 0 whenever i ^ j. II) is the1 constant function defined by f(t) = 0 for all t € R. It therefore follows from Theorem 1. The example's that follow provide further illustrations of the concept of a subspace. Moreover. + Bi:j = 0 + 0 = 0 1 and 1 (cA)l:J = eA. We claim that C(/t*) is a subspace of J-(R. then whenever i ^ j.sum of the diagonal entries of M. we have1 (a/1)' = «. Se> P„(F) is edoseel under addition and scalar multiplication. and the product of a scalar and a polynomial of degree le'ss than or equal to n is a polynomial of degree less than or equal te> n. Since constant functions are1 continuous. and the product of a real number and a.R) by Theorem 1. diagonal matrix if M. the1 sum of two polynomials with degrees less than or equal to n is another polynomial of degree less than or equal to n. So for any a £ F. it is in P. that is. we. If A £ W. if . • Example 2 Lc1!. three are particularly important.have / £ C(R). = cO = 0 for any scalar c. If A £ W and B € W. that is. Clearly C(7?) is a subset of the vector space1 T(R. is the.ro polynomial has degree —1.4' = aA. then A' = A and B* = B. C(7?) denote the set of all continuous real-valued functions defined on R. (A f B)ij = A. denoted tr(M). then A' = A. First note that the1 zero of 1F(R.4 and B are diagonal v x n matrices. Moreover. Thus aA £ W. • Example 4 The1 trace of an n x n matrix M. Hence A + B and cA are diagonal matrices for any scalar c. Since the zc.2. the sum of two continuous functions is continuous. continuous function is continuous.R). Example 1 Let it be a nonnegative integer.. The first.. so that A + B £ W.3. Moreover. • Example 3 An n x n matrix M is called a.„(F).3 that P„(F) is a subspace of P(F). So C(R) is closed under addition and scalar multiplication and hene-e1 is a subspace of J-(R.3. Thus (A + B)1 = A1 + Bf = A + B. and let P„(F) consist of all polynomials in P(F) having ck'gree1 k'ss than or equal to n. 1 Vector Spaces 2. Therefore the set of diagonal matrices is a subspace of M„ X „(F) by Theorem 1. Clearly the1 zero matrix is a diagonal matrix because all of its emtric's are1 0. 3.18 Chap.

then W is a subspace of V. As we already have suggested. 1. (See Exercise 19. In fact. it follows that x + y and ax are contained in each subspace in C. • Example 5 The set of matrices in M. it can be readily shown that the union of two subspaces of V is a subspace of V if and only if one of the subspaces contains the other.3.IIXII(R. I Having shown that the intersection of subspaces of a vector space V is a subspace of V. This idea is explored in Exercise 23.union of subspaces must contain the zero vector and be' closed under scalar multiplication. If V is a vector space and W is a subset of V that is a vector space. Hence X f y and a. Since1 every subspace contains the zcre> vector. It is easily seen that the. Proof. Let c. a natural way to combine two subspaces Wi and W_) to obtain a subspace that contains both W| and W-j. (b) The empty set is a subspace of every vector space. 0 £ W. (d) The intersection of any two subsets of V is a subspace of V.) having nonnegative entries is not a subspace of Mmxn{R) because it is not closed under scalar multiplication (by negative scalars). £ /•' and x.Sec.each subspace in C is closed under addition and scalar multiplication.y £ W. and lei W demote' the intersection of the1 subspaces in C. (c) If V is a vector space other than the zero vector space. then V contains a subspace W such thai W f V. however. Because.r are also contained in W. T h e o r e m 1. Any intersection of subspaces of a vector si>ace V is a subspace ofV. se> that W is a subspace of V by Theorem 1.4. Then x and y are' contained in each subspace in C. it is natural to consider whether or not the union of subspaces of V is a subspace of V. but in general the union of subspaces of V need not be1 closed under addition. EXERCISES 1. the key to finding such a subspace is to assure thai it must be closed under addition. Let C be a collection of subspaces of V.3 Subspaces 19 It follows from Exercise 6 that the set of n x n matrices having trace equal to zero is a subspace of M n x n ( F ) . Label the following statements as true or false.) There is. • The next theorem shows how te) form a new subspace from other subspaces. (a) .

02. 6 £ F. and W3 n W4. a 2 .a 2 .03) {(01. 5. T h e n W = R2. In addition.02.4a 2 . that is.20 Chap.03) {(01.3 a | + 60^ = 0} 9. Determine the transpose of each of the matrices that follow. Prove that (aA + bB)1 = aA* + bB1 for any A. W = {(ai. Wi n W 4 .a 2 . Prove that tr(aA + bB) = atr(A) + 6tr(£) for any A. Let Wi.02.03) {(01.7o2 + a 3 = 0} R 3 : O] . Prove that diagonal matrices are symmetric matrices. (a) (b) (c) (d) (e) (f) W) W2 W3 W4 W5 W6 = = = = = {(01. a 3 ) {(aj. a 3 ) £ £ £ £ £ £ R 3 : ai = 3a2 and 03 = —a2) R 3 : a.6 (v. 1 Vector Spaces (e) An n x n diagonal matrix can never have more than n nonzero entries. Prove that (A1)* = A for each A £ M m X n ( F ) .3o3 = 1} R 3 : 5a? . 7. and observe that each is a subspace of R3. (f) The trace of a square matrix is the product of its diagonal entries.6 4 7 (d) -2 5 7 0 I 4 1 . a 2 £ R}. (b) 0 3 8 .0): a i . compute its trace. Describe Wi n W 3 . Determine whether the following sets are subspaces of R3 under the operations of addition and scalar multiplication defined on R3. 2. B £ M m x n ( F ) and any a. (g) Let W be the ay-plane in R3. W 3 . Justify your answers.a 3 = 0} R 3 : ai + 2a 2 .03) {(ai. 4. 8. if the matrix is square. 6. B £ M n x n ( F ) .) ( I 13 5) (f) (h) 3. = a 3 + 2} R 3 : 2ai . and VV4 be as in Exercise 8. . Prove that A + A1 is symmetric for any square matrix A. a2.

. 17. and. . Let S be a nonempty set and F a. y £ W. Let C(S. . but W 2 = {(a \. F2) are subspace. w2. a 2 . 11. 21. Prove that the set of all even functions in T(F\. • • •. . = { ( o i .. F2) and the set of all odd functions in T(I'\. the upper triangular matrices form a subspace of M. Let S be a nonempty set and F a. Is the set of all differentiable real-valued functions defined on R a subspace of C(R)? Justify your answer.2. n] a subspace of P(F) if n > 1? Justify your answer. whenever a £ F and x. 15. those for which limn-too a.. Prove that for any SQ £ S. this exercise is essential for a later sectiejii. that is. 18.s of. F). F) denote the set of all functions / £ T(S.. an) £ F n : a i + a 2 H h an = 1} is not. a2. V is a subspace of V if and only if 0 £ W and ax + y £ W whenever a £ F and x. Prove that a subset W of a vector space. 14. Let Cn(R. Is the set. a n ) £ F" : ai + o 2 + • • • + a n = 0} is a subspace of F n . . if Aij = 0 whenever i > j. F) is a subspace of ^(S.e. C W 2 or W 2 C W.3 Subspaces 21 10. Prove that. {/ £ T(S.F2) is called an even function if g(—t) = g(t) for each t £ F\ and is called an odd function if g(—t) = —g(t) for each t £ Fi. m x n (F). Prove that Cn(R) is a subspace of F(R.. 12. Show that the set of convergent sequences {o n } (i. . 'A dagger means that.. A function g £ T(Fl. of V if and only if W ^ 0. 16. F). . 22. field. and W2 be subspaces of a vector space V. Prove that C(S. Prove that W| U W2 is a subspace of V if and only if W. an..y £ W. wn are in W. is a subspace of F(S.R). 1.F 2 ). W = {f(x) £ P(F): f(x) = 0 or f(x) has degree. An m x n matrix A is called upper triangular if all entries lying below the diagonal entries are zero. F): / ( s 0 ) = 0}. .Sec. . then a\W\ + a2w2 + • • • + anwn £ W for any scalars a\.) denote the set of all real-valued functions defined on the real line that have a continuous nth derivative.' Prove that if W is a subspace of a vector space V and w\.n exists) is a subspace of the vector space V in Exercise 20 of Section 1. 20.F(Fi. Prove that a subset W of a vector space V is a subspace. Let F\ and F 2 be fiedds. 13. F) such that f(s) — 0 for all but a finite number of elements of S. Prove that W. then ax £ W and x + y £ W. a 2 . field. 19. Let W.

m .) Show that MWXn(F)=W1(DW2. Wc denote that V is the direct sum of W] and W 2 by writing V = W.. Let W[ denote the set of all polynomials /(. . is the set [x+y: x £ S\ and y £ S2}. In M m x n ( F ) define W.o 2 . . (a) we have o^ = 0 whenever / is even. If S\ and S2 are nonempty subsets of a vector space V. o n ) £ F" : a. 27. Prove that Wi + VV2 is a subspace of V that contains both W] and W2.22 The following definitions are used in Exercises 23 30. F n is the direct sum of the subspaces W. A vector space V is called the direct sum of W] and V V 2 if W. and let W| denote the subspace of V consisting of all diagonal matrices. Show that. = a 2 = • • • = a n _] = ()}. then the sum of Si and S2. 24..1 + • • • + bxx + b0. denoted S\ +S2. (b) Prove that any subspace of V that contains both Wi and W2 must also contain W t + W 2 . + W 2 = V. o n ) £ F n : o n = 0} and W 2 = {(01. 26. Let Wi and W 2 be subspaces of a vector space V. . .0 whenever i < j}. where W 2 = {A £ V: Aij = 0 whenever i > j). . and W 2 are subspaces of'W such that W2 D W 2 = {0} and W. Definition. = { ( o i . is the set of all upper triangular matrices defined in Exercise 12. 25. we have 6* = 0 whenever i is odd."_1 -| h a i a + c/(l. 23. .j . 1 Vector Spaces Definition. © W 2 ...{A £ M TOXn (F): Ai3 = 0 whenever i > j} and W 2 = {A £ M m X n ( F ) : Ai. Chap. Show that V = Wi © W 2 . o 2 ./:) in P(F) such that in the representation f(x) ~ anx" + fln_ia. Likewise let W 2 denote the set of all polynomials g(x) in P(F) such that in the representation g(x) = bmx'" + 6 m _ 1 a. © W 2 . Prove that P(F) = W. Let V demote the vector space consisting of all upper triangular n x n matrices (as defined in Exercise 12). (W.

(d) Prove that the set S is a vector space with the operations defined in (c). . (c) Prove that the preceding operations are well defined. 1. Both Wi and W2 are subspaces of M n X „(F).{ A £ F): Aij = 0 whenever i < ?'} and W2 to be the set of all symmetric n x n matrices with entries from /•'. that is. A matrix M is called skew-symmetric if Ml = — M. Clearly. 29. 31. then (c. Define W.\ + x2: where ari £ Wi and x2 £ W 2 . Compare this exercise with Exercise 28. 30. Now assume that F is not of characteristic 2 (see Appendix C). Prove that M n x n ( F ) — Wi © W 2 .v2 £ W.3 Subspaces 23 28.e of M n x n ( F ) . and let W 2 be the subspace of M n x n ( F ) consisting of all symmetric n x n matrices. Prove that the set Wi of all skew-symmetric n x n matrices with entries from F is a subspae. a skewsymmetric matrix is square. Let F be a field that is not of characteristic 2. fiedd.v2 £ V and a(v + W) = av + W for all v £ V and a £ F. It is customary to denote this coset. by v + W rather than {v} + W. Addition and scalar multiplication by scalars of F can be defined in the collection S = [v f W: v £ V} of all cosets of W as follows: (vi + W) + (v2 + W) = (Vj + v2) + W for all v\. For any v £ V the set {v} + W = {v + w: w £ W} is called the coset of W containing v. Let Wi and W2 be subspaces of a vector space V. + W = v2 + W if and only if vx . . Prove that V is the direct sum of Wi and W 2 if and only if each vector in V can be uniquely written as x. (b) Prove that c. Let W be a subspace of a vector space V over a field F. Prove that M „ x n ( F ) = Wi + W 2 . show that if vi + W = v[ + W and v2 + W = v'2 + W. + W) + (v2 + W) = (v[ + W) + (v'2 + W) and a(Vl + W) = a(v[ + W) for all a £ F.Sec. (a) Prove that v + W is a subspace of V if and only if v £ W. This vector space is called the quotient space of V modulo W and is denoted by V/W. Let F be a.

10 0.7 (0) Soy sauce 0 0.4 Chap.. .34 0.02 0. Department of Agriculture. and the set of all points in this plane is a subspace of R3. U.24 1.18 L.01 0. In this case we also say that v is a linear combination of ii\. B. 1 Vector Spaces LINEAR COMBINATIONS AND SYSTEMS OF LINEAR EQUATIONS In Section 1.\. respectively. In this case. of a vitamin present is either none or too small to measure.07 Niacin C (mg) (nig) 0.. (nig) 0. only) 100 0.. Consumer and food Economics Research Division.2 2 0. a2 ti„ and call a. Let V be a vector space and S a nonempty subset ofV.3 0 Raw wild rice (0) 0..4 0 Raw brown rice (0) 0.1 (0) Jams and preserves In 0. generalization of such expressions is presented in the following deli nit ions.2 2 Coconut custard pie (baked from mix) 0 0. Watt and Annabel I.C. The appropriate. an in F such that v = oiU\ + a2u2 + • • • f anun. unpared apples (freshly harvested) Chocolate-coated candy with coconut center Clams (meat. .1 Vitamin Content of 100 Grams of Certain Foods A (units) 0 90 0 1 3[ (mg) 0. 1963. Observe that in any vector space V. Example 1 TABLE 1.02 0. . where s and t are scalars and u and v are vectors.01 0.1 4 0. (This is proved as Theorem 1.01 0. an the coefficients of the linear combination. ..3 10 Cupcake from mix (dry form) 0 0.) Expressions of the form su + tv.2 0 Apple butter Raw. Thus the zero vector is a. Washington.03 0. Composition of Foods (Agriculture Handbook Number 8). Ov — 0 for each v £ V.01 0. a2.01 0.05 0. un in S and scalars ai.2 (0) Source: Rernice K. the equation of the plane simplifies to x = su + tv.01 0.4 0 Cooked spaghetti (unenriched) 0 0..S. and s and t deneDte arbitrary real numbers. An important special case occurs when A is the origin.1. A and ending at B and C..02 I5_.05 4. D.02 0. linear combination of any nonempty subset of V. u2.25 0. A vector v £ V is called a linear combination of vectors of S if there exist a finite number of vectors u\..5. and C in space is x = A + su + tv.02 0. a 2 . play a central role in the theory of vector spacers.3 0 Cooked farina (unenriched) (0)a 0..03 0. where u and v denote the vectors beginning at. it was shown that the equaticju of the plane through three noncollinear points A. .03 6..45 0. a Zeros in parentheses indicate that the amount. Merrill.02 0. Definitions.06 0.

02 0.01 + 0. we see that /0.03 + 0.4 Linear Combinations and Systems of Linear Equations 25 Table 1.02 + 0.3).40 6. raw brown rice. and wild rice1. .01 = 0.oo\ 0.ooy \0. B 2 (riboflavin).10 0.18 0.00/ v 4.40 0.20 4. coconut custard pie. 100 grams of chocolate candy. Bi (thiamine). how.8) can be.10 0. the vitamin vector for apple butter is /0.0()\ /o. 100 grams of raw brown rice. niacin. 100 grains of coconut custard pie.Sec.25 = 0.02 -f 0. and if so.01 0.01 0.00j V 2-00y 200 grains of apple butter. The vitamin content of 100 grams of each food can be recorded as a column vector in R:> for example1.45 0. soy sauce. Similarly.00y K0.20 0.03 0.00yi yO. u3 = (0.01 0. For now.07 + 0. In Chapter 3. 100 grams of jam. and soy sauce.00\ /0.30 1.oo\ fom\ /IOO. u2 = ( .1 shows the vitamin content of 100 grams of 12 foods with respect to vitamins A.00y \0. . raw brown rice.00\ 0. and 100 grams of spaghetti provide exactly the same amounts of the five vitamins as 100 grams of clams. we discuss a general method for using matrices to solve any system of linear equations.34 0. coconut custard pie.OOX 0.20 0. and C (ascorbic acid).00y lO.oo\ /90. since /io.02 0. This question often reduces to the problem e > f solving a system of linear equations.2.10 0. -2).02 0.2 . 100 grams of apples. linear combination of ui = (1. So 100 grams of cupcake.01 0. • Throughout. 100 grams of farina. we illustrate how to solve a system of linear equations by showing how to determine if the vector (2.70 ^0. expressed as a.4 .oo\ /o.01 0. 2.oo\ /0.20 0.63 0.05 + 2 0.02 4 0.oo/ \ 10.05 0. Chapters 1 and 2 we encounter many different situations in which it is necessary to determine1 whether or not a vector can be expressed as a linear combination of other vectors.00\ /o.06 + 0. 1.00J Considering the vitamin vectors for cupcake.()()/ Ki)i)oj Thus the vitamin vector for wild rice1 is a linear combination of the vitamin vectors for cupcake. and 200 grains of soy sauce provide exactly the same amounts of the five vitamins as 100 grams of raw wild rice.00\ /o.6.20 \2.OOy \0.1).00\ /0.30 0.02 0.30 ^2.

Thus we must determine if there are scalars a₁, a₂, a₃, a₄, and a₅ such that
(2, 6, 8) = a₁u₁ + a₂u₂ + a₃u₃ + a₄u₄ + a₅u₅
= a₁(1, 2, 1) + a₂(−2, −4, −2) + a₃(0, 2, 3) + a₄(2, 0, −3) + a₅(−3, 8, 16)
= (a₁ − 2a₂ + 2a₄ − 3a₅, 2a₁ − 4a₂ + 2a₃ + 8a₅, a₁ − 2a₂ + 3a₃ − 3a₄ + 16a₅).
Hence (2, 6, 8) can be expressed as a linear combination of u₁, u₂, u₃, u₄, and u₅ if and only if there is a 5-tuple of scalars (a₁, a₂, a₃, a₄, a₅) satisfying the system of linear equations
a₁ − 2a₂ + 2a₄ − 3a₅ = 2
2a₁ − 4a₂ + 2a₃ + 8a₅ = 6
a₁ − 2a₂ + 3a₃ − 3a₄ + 16a₅ = 8,   (1)
which is obtained by equating the corresponding coordinates in the preceding equation.

To solve system (1), we replace it by another system with the same solutions, but which is easier to solve. The procedure to be used expresses some of the unknowns in terms of others by eliminating certain unknowns from all the equations except one. To begin, we eliminate a₁ from every equation except the first by adding −2 times the first equation to the second and −1 times the first equation to the third. The result is the following new system:
a₁ − 2a₂ + 2a₄ − 3a₅ = 2
2a₃ − 4a₄ + 14a₅ = 2
3a₃ − 5a₄ + 19a₅ = 6.   (2)
In this case, it happened that while eliminating a₁ from every equation except the first, we also eliminated a₂ from every equation except the first. This need not happen in general. We now want to make the coefficient of a₃ in the second equation equal to 1, and then eliminate a₃ from the third equation. To do this, we first multiply the second equation by ½, which produces
a₁ − 2a₂ + 2a₄ − 3a₅ = 2
a₃ − 2a₄ + 7a₅ = 1
3a₃ − 5a₄ + 19a₅ = 6.
Next we add −3 times the second equation to the third, obtaining
a₁ − 2a₂ + 2a₄ − 3a₅ = 2
a₃ − 2a₄ + 7a₅ = 1
a₄ − 2a₅ = 3.   (3)

We continue by eliminating a₄ from every equation of (3) except the third. This yields
a₁ − 2a₂ + a₅ = −4
a₃ + 3a₅ = 7
a₄ − 2a₅ = 3.   (4)
System (4) is a system of the desired form: It is easy to solve for the first unknown present in each of the equations (a₁, a₃, and a₄) in terms of the other unknowns (a₂ and a₅). Rewriting system (4) in this form, we find that
a₁ = 2a₂ − a₅ − 4
a₃ = −3a₅ + 7
a₄ = 2a₅ + 3.
Thus for any choice of scalars a₂ and a₅, a vector of the form
(a₁, a₂, a₃, a₄, a₅) = (2a₂ − a₅ − 4, a₂, −3a₅ + 7, 2a₅ + 3, a₅)
is a solution to system (1). In particular, the vector (−4, 0, 7, 3, 0) obtained by setting a₂ = 0 and a₅ = 0 is a solution to (1). Therefore
(2, 6, 8) = −4u₁ + 0u₂ + 7u₃ + 3u₄ + 0u₅,
so that (2, 6, 8) is a linear combination of u₁, u₂, u₃, u₄, and u₅.
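The solution just found is easy to verify directly. The following sketch (Python with NumPy, offered only as a numerical check and not as part of the solution method) confirms that the 5-tuple (−4, 0, 7, 3, 0) satisfies system (1), that is, that (2, 6, 8) = −4u₁ + 0u₂ + 7u₃ + 3u₄ + 0u₅.

    import numpy as np

    # The vectors of this example and the target vector (2, 6, 8).
    u1 = np.array([1, 2, 1])
    u2 = np.array([-2, -4, -2])
    u3 = np.array([0, 2, 3])
    u4 = np.array([2, 0, -3])
    u5 = np.array([-3, 8, 16])
    target = np.array([2, 6, 8])

    # Coefficients produced by the elimination above.
    combo = -4*u1 + 0*u2 + 7*u3 + 3*u4 + 0*u5
    print(combo)                          # [2 6 8]
    assert np.array_equal(combo, target)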

The procedure just illustrated uses three types of operations to simplify the original system:
1. interchanging the order of any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a constant multiple of any equation to another equation in the system.
In Section 3.4, we prove that these operations do not change the set of solutions to the original system. Note that we employed these operations to obtain a system of equations that had the following properties:
1. The first nonzero coefficient in each equation is one.
2. If an unknown is the first unknown with a nonzero coefficient in some equation, then that unknown occurs with a zero coefficient in each of the other equations.
3. The first unknown with a nonzero coefficient in any equation has a larger subscript than the first unknown with a nonzero coefficient in any preceding equation.

To help clarify the meaning of these properties, note that none of the following systems meets these requirements.
x₁ + 3x₂ + x₄ = 7
2x₃ − 5x₄ = −9   (5)

x₁ − 2x₂ + 3x₃ = −5
x₃ − 2x₅ = 9
x₄ + 3x₅ = 6   (6)

x₁ + x₅ = 1
x₄ − 6x₅ = 0
x₂ + 5x₃ − 3x₅ = 2   (7)

Specifically, system (5) does not satisfy property 1 because the first nonzero coefficient in the second equation is 2; system (6) does not satisfy property 2 because x₃, the first unknown with a nonzero coefficient in the second equation, occurs with a nonzero coefficient in the first equation; and system (7) does not satisfy property 3 because x₂, the first unknown with a nonzero coefficient in the third equation, does not have a larger subscript than x₄, the first unknown with a nonzero coefficient in the second equation.

Once a system with properties 1, 2, and 3 has been obtained, it is easy to solve for some of the unknowns in terms of the others (as in the preceding example). If, however, in the course of using operations 1, 2, and 3 a system containing an equation of the form 0 = c, where c is nonzero, is obtained, then the original system has no solutions. (See Example 2.) We return to the study of systems of linear equations in Chapter 3. We discuss there the theoretical basis for this method of solving systems of linear equations and further simplify the procedure by use of matrices.

Example 2
We claim that 2x³ − 2x² + 12x − 6 is a linear combination of x³ − 2x² − 5x − 3 and 3x³ − 5x² − 4x − 9 in P₃(R), but that 3x³ − 2x² + 7x + 8 is not. In the first case we wish to find scalars a and b such that
2x³ − 2x² + 12x − 6 = a(x³ − 2x² − 5x − 3) + b(3x³ − 5x² − 4x − 9)

= (a + 3b)x³ + (−2a − 5b)x² + (−5a − 4b)x + (−3a − 9b).
Thus we are led to the following system of linear equations:
a + 3b = 2
−2a − 5b = −2
−5a − 4b = 12
−3a − 9b = −6.
Adding appropriate multiples of the first equation to the others in order to eliminate a, we find that
a + 3b = 2
b = 2
11b = 22
0b = 0.
Now adding the appropriate multiples of the second equation to the others yields
a = −4
b = 2
0 = 0
0 = 0.
Hence
2x³ − 2x² + 12x − 6 = −4(x³ − 2x² − 5x − 3) + 2(3x³ − 5x² − 4x − 9).

In the second case, we wish to show that there are no scalars a and b for which
3x³ − 2x² + 7x + 8 = a(x³ − 2x² − 5x − 3) + b(3x³ − 5x² − 4x − 9).
Using the preceding technique, we obtain a system of linear equations
a + 3b = 3
−2a − 5b = −2
−5a − 4b = 7
−3a − 9b = 8.   (8)
Eliminating a as before yields
a + 3b = 3
b = 4
11b = 22
0 = 17.
But the presence of the inconsistent equation 0 = 17 indicates that (8) has no solutions. Hence 3x³ − 2x² + 7x + 8 is not a linear combination of x³ − 2x² − 5x − 3 and 3x³ − 5x² − 4x − 9. •
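Both parts of Example 2 amount to asking whether a 4 × 2 system of linear equations is consistent, and that question can also be settled numerically. The sketch below (Python with NumPy, an added illustration rather than the method of this section) solves each system in the least-squares sense and inspects the residual: a residual of essentially zero means the system is consistent, while a clearly nonzero residual means it is not.

    import numpy as np

    # Columns hold the coefficients of x^3, x^2, x, 1 for the two given polynomials.
    A = np.array([[ 1,  3],
                  [-2, -5],
                  [-5, -4],
                  [-3, -9]], dtype=float)

    v1 = np.array([2, -2, 12, -6], dtype=float)   # 2x^3 - 2x^2 + 12x - 6
    v2 = np.array([3, -2,  7,  8], dtype=float)   # 3x^3 - 2x^2 + 7x + 8

    for v in (v1, v2):
        coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
        residual = np.linalg.norm(A @ coeffs - v)
        print(coeffs, residual)

    # First case: coefficients (-4, 2) with residual essentially zero, so the
    # polynomial is a linear combination.  Second case: the residual is clearly
    # nonzero, in agreement with the inconsistent equation 0 = 17 above.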

Throughout this book, we form the set of all linear combinations of some set of vectors. We now name such a set of linear combinations.

Definition. Let S be a nonempty subset of a vector space V. The span of S, denoted span(S), is the set consisting of all linear combinations of the vectors in S. For convenience, we define span(∅) = {0}.

In R³, for instance, the span of the set {(1,0,0), (0,1,0)} consists of all vectors in R³ that have the form a(1,0,0) + b(0,1,0) = (a, b, 0) for some scalars a and b. Thus the span of {(1,0,0), (0,1,0)} contains all the points in the xy-plane. In this case, the span of the set is a subspace of R³. This fact is true in general.

Theorem 1.5. The span of any subset S of a vector space V is a subspace of V. Moreover, any subspace of V that contains S must also contain the span of S.

Proof. This result is immediate if S = ∅ because span(∅) = {0}, which is a subspace that is contained in any subspace of V.
If S ≠ ∅, then S contains a vector z. So 0z = 0 is in span(S). Let x, y ∈ span(S). Then there exist vectors u₁, u₂, ..., uₘ, v₁, v₂, ..., vₙ in S and scalars a₁, a₂, ..., aₘ, b₁, b₂, ..., bₙ such that
x = a₁u₁ + a₂u₂ + ··· + aₘuₘ and y = b₁v₁ + b₂v₂ + ··· + bₙvₙ.
Then
x + y = a₁u₁ + a₂u₂ + ··· + aₘuₘ + b₁v₁ + b₂v₂ + ··· + bₙvₙ
and, for any scalar c,
cx = (ca₁)u₁ + (ca₂)u₂ + ··· + (caₘ)uₘ
are clearly linear combinations of the vectors in S; so x + y and cx are in span(S). Thus span(S) is a subspace of V.
Now let W denote any subspace of V that contains S. If w ∈ span(S), then w has the form w = c₁w₁ + c₂w₂ + ··· + cₖwₖ for some vectors w₁, w₂, ..., wₖ in S and some scalars c₁, c₂, ..., cₖ. Since S ⊆ W, we have w₁, w₂, ..., wₖ ∈ W. Therefore w = c₁w₁ + c₂w₂ + ··· + cₖwₖ is in W by Exercise 20 of Section 1.3. Because w, an arbitrary vector in span(S), belongs to W, it follows that span(S) ⊆ W.

Definition. A subset S of a vector space V generates (or spans) V if span(S) = V. In this case, we also say that the vectors of S generate (or span) V.

in fact.0.0.8 a + 56 + 3c)(x2 + 3x .a 2 + a 3 ). and t for which r ( l .1 1 1 o a 12 .1) generate R3 since an arbitrary vector {0. 1.x 2 . and (0. 2 — 4x + 4 generate P2(-R) since each of the three given polynomials belongs to P 2 (i?) and each polynomial ax2 + bx + c in P2(-R) is a linear combination of these three.1. s.3) + ( .2.1). + 4) = az 2 + bx + a / Example 5 The matrices 1 1 1 0 1 1 0 1 1 1 0 1 and 0 1 1 1 • generate M 2x2 (. • (aua2.1 •3an .0). 2 1 1 .2) + (4a . 1 0 1 ) 1 0 1 1 1 + .1.ofl21 + o a 22.c)(2a-2 + 5x .2 + a 3 ).1) + t{0.4a.a + 6 + c ) ( .ft) since an arbitrary matrix A in M 2x2 (. the scalars r.0) + a(l.0.26 .Sec. and -a.1.3) in R3 is a linear combination of the three given vectors. 0.1.4 Linear Combinations and Systems of Linear Equations Example 3 31 The vectors (1.1) = are r = -(a\ + 02 — a 3 ). 1. s = -{a\ . and t = -(— a\ + 0. the matrices 1 0 0 1 1 1 0 1 and 1 1 0 1 . (1. namely. 2x2 + 5x — 3.1 1 1 2 . ( ^ a i l + o a 1 2 + o a 21 ~ 3^22) .R) c a n be expressed as a linear combination of the four given matrices as follows: an a2i aV2 a22 = . 2 1 1 1 x 0 0 a fl (""Q !! + o 1 2 + o 21 + o a 22) 1 On the other hand.a3) Example 4 The polynomials x2 + 3x — 2. ( .

Hence not every 2 x 2 matrix is a linear combination of these three matrices. then span(5) equals the intersection of all subspaces of V that contain S. Thus x G R3 is a linear combination of u. it is permissible to multiply an equation by any constant. (See Figure 1. (b) The span of 0 is 0 .) Figure 1. So any linear combination of these matrices has equal diagonal entries. (d) In solving a system of linear equations. it is permissible to add any multiple of one equation to another. (f) Every system of linear equations has a solution. • At the beginning of this section we noted that the equation of a plane through three noncollinear points in space.5 Usually there are many different subsets that generate a subspace W. In the next section we explore the circumstances under which a vector can be removed from a generating set to obtain a smaller generating set. is of the form x — su + tv. (e) In solving a system of linear equations. (a) The zero vector is a linear combination of any nonempty set of vectors. (c) If 5 is a subset of a vector space V. v € R3 and s and t are scalars. EXERCISES 1. v E R if and only if x lies in the plane containing u and v. one of which is the origin. Label the following statements as true or false. where u. 1 Vector Spaces do not generate M 2x2 (i?) because each of these matrices has equal diagonal entries.5.) It is natural to seek a subset of W that generates W and is as small as possible. . (See Exercise 13.32 Chap.

2 .T3 + 5x4 = 3 Xi + 2x 2 -Xi 42xi + 5x 2 4xi + l l x 2 Xi 2xi 3xi xi 4.3x 4 .4.3.4 ) (-2. .Sec. For each of the following lists of vectors in R3. (a) 2si . 2 . (a) (b) (c) (d) (e) (f) x3 .3x3 — 3x4 — 6 2xi 4. (a) (b) (c) (d) (e) (f) (-2. ( .5 3.3x2 4 . x 3 .-2.2x + 3 6x3 . .3 .1 3 2 3 4x + 2x .2.1 lx2 4 .3x 4 -1 .1. .2x2 + 3x .2x2 + 4x + 1.2x3 .8x2 + 4x. .3 . determine whether the first vector can be expressed as a linear combination of the other two. ( 1 .IOX4 .2 .-1) (1.(2.6x2 + x + 4 -2x3 .1). (1. . Solve the following systems of linear equations by the method introduced in this section.6 xi 4.3x 4 .3 ) .-1).2x3 .5.x 2 4.3x3 .x 5 .x + 1.2x2 + 3x .5.2. .3x2 + 4x + 1.2. x 3 . x 3 4 . x 3 .2 .2x 2 + x2 + x2 4. .x 2 + 2x 4 -3 3 x .1).3).4x 5 ~ 4x 4 .2 3 2 x 4 . 1 ) (2. ( .(1.2.2x 5 = 7 = -16 = 2 = 7 / (b) (c) (d) (e) (f) + 6x3 = .1 .1 . (1.3x3 = -2 3xi — 3x 2 — 2x3 + 5x4 = 7 X\ — X2 — 2x3 — x 4 = —3 3xi .1x2 .1.2x2 .6. ( 2 .2x 2 + 2x 3 = 2 xi + 8x 3 + 5.3x 2 — X3 4.x + 2. x . ( .3x 4 .0). determine whether the first polynomial can be expressed as a linear combination of the other two.1 ) (3. x 3 .lx2 + 4x3 = 10 x\ . 2 ) (5.4 Linear Combinations and Systems of Linear Equations 33 2.3x 2 4x 3 l()x3 5x 3 7x 3 X4 + x 5 .0).0. (1. 3 .1. ( . ( 1 .4.4x 2 . For each list of polynomials in P 3 (K).2x3 + x 2 + 3x .T4 = .x + 2x 4-13. . x 3 .3.-5). 1 ) . 1.3 ) .3 . 3 ) 4.3 .1 + x3 = 8 X3 = 15 + IOX3 = .4x 4 = 8 xi 4.-1. x 3 + 3x2 .2.x 2 + 2x 4 .-3). .2).2x 2 + X3 = 3 2x\ — X2 — 2x 3 = 6 x\ + 2x 2 — X3 + a*4 = 5 xi 4.

Show that the vectors (1. 1 ) } (-1. 1 ) } (-1. Show that if Si and S2 are arbitrary subsets of a vector space V. 11.1 . (0.0. . . 12.x 4. x 2 4. then span(Si) C span(5 2 ).0. Interpret this result geometrically in R3.1)} ( 2 .0.1).2).34 Chap.2).x 2 4-x + l . Show that if Mi = 1 0 0 0 M2 = 0 0 0 1 and M3 = 0 1 1 0 0 0 1 0 0 0 1 0 and 0 0 0 1 then the span of {Mi. Show that the matrices 1 0 0 0 generate M2X2(F)10. .1.) . x 4 . en} generates F n .x 3 4-2x 2 4-3x 4-3.-1).1.l . 9.2). x 2 4-x 4. 1 Vector Spaces 5.1).-1.2) = span(5i)+span(S' 2 ). e 2 .3 ) . In each part. 1 .1 .1. 7.1. . ( .2. S = {(1.1.1} 2 x 3 . S = {x 3 4.x 2 x + 3. x n }. . .1 .3. let ej denote the vector whose j t h coordinate is 1 and whose other coordinates are 0.X 4. 1 .1.0). . Show that a subset W of a vector space V is a subspace of V if and only if span(W) = W.1. In F n . x . . determine whether the given vector is in the span of S. S= {x 3 + x 2 + x + l . 13. . .1) generate F 3 . S = {(1.1. Show that Pn(F) is generated by {1. and (0. (1. Prove that {ei.0.1.1} 1 2 -3 4 1 0 0 1 s = S = 1 0 -1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1 0 0 6. then span(S'iU5. S = {(1. 1 . In particular.1. M 3 } is the set of all symmetric 2 x 2 matrices. deduce that span(5 2 ) — V. (0. ( . 8.1). (a) (b) (c) (d) (e) (f) (g) (h) (2. (The sum of two subsets is defined in the exercises of Section 1.1)} .0. ^ Show that if Si and S 2 are subsets of a vector space V such that Si C 5 2 . 14.T Prove that span({x}) = {ax: a £ F} for any vector x in a vector space. S = {(1.1. M 2 .-1). if Si C 5 2 and span(Si) = V.

4). where ui — (2. —1).a 2 4. .2 = • • • = an = 0. U3 = (1.u 2 . .a 3 = . Let us attempt to find a proper subset of S that also generates W. u 3 — 2u\ ~ 3w2 4. Indeed.^/4}.a 2 4. This does not. Give an example in which span(Si D S 2 ) and span(Si) nspan(S 2 ) are equal and one in which they are unequal.a 3 w 3 . Prove that span (Si DS 2 ) C span(Si) n span(S 2 ).1.4a 1 4. The reader should verify that no such solution exists. Thus U4 is a linear combination of ui.3a 2 .OW4. . Consider. Under what conditions are there only a finite number of distinct subsets S of W such that S generates W? 1.3a 2 — 03 = — 1 has a solution.u 2 . for example. The search for this subset is related to the question of whether or not some vector in S is a linear combination of the other vectors in S. and U4.. answer our question of whether some vector in S is a linear combination of the other vectors in S. —2. 1. —1). a 2 .a 3 ).a 2 4. if and only if there are scalars a i . in fact.2 .vn € S and aiVi 4. and a 3 satisfying (1.a 2 4.. —ai . and U4 = (1. the smaller that S is. the fewer computations that are required to represent vectors in W.^3. . Now U4 is a linear combination of the other vectors in S if and only if there arc scalars a\.3). W is an infinite set. It can be shown.• • • 4.a 2 v 2 4. It is desirable to find a "small" finite subset S that generates W because we can then describe each vector in W as a linear combination of the finite number of vectors in S.-1. Unless W is the zero subspace. Let W be a subspace of a vector space V. then o\ = 0.-1. and a 3 such that u4 = aiui 4..a 2 w 2 4. that is.a 3 .1 ) = (2ai 4.a i .Sec. a 2 . Let S\ and S 2 be subsets of a vector space V.a 3 . 16. however.2 4ai 4.U2.anvn = 0. that U3 is a linear combination of wi.5 LINEAR DEPENDENCE AND LINEAR INDEPENDENCE Suppose that V is a vector space over an infinite field and that W is a subspace of V. 1)2. Let V be a vector space and S a subset of V with the property that whenever v\.a 3 = 1 . the subspace W of R3 generated by S = {u\. u 2 = (1. namely. 17. and U3 if and only if the system of linear equations 2ai 4. Prove that every vector in the span of S can be uniquely written as a linear combination of vectors of S.5 Linear Dependence and Linear Independence 35 15.
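Both claims just made can be checked numerically. The sketch below (Python with NumPy, offered only as a check on the computations suggested to the reader) verifies that the system for u₄ has no solution and that u₃ = 2u₁ − 3u₂.

    import numpy as np

    u1 = np.array([2, -1, 4])
    u2 = np.array([1, -1, 3])
    u3 = np.array([1, 1, -1])
    u4 = np.array([1, -2, -1])

    # Is u4 a linear combination of u1, u2, u3?  Solve with the u's as columns in
    # the least-squares sense; a nonzero residual means no exact solution exists.
    A = np.column_stack([u1, u2, u3]).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, u4.astype(float), rcond=None)
    print(np.linalg.norm(A @ coeffs - u4))   # clearly nonzero: no solution

    # By contrast, u3 is a linear combination of u1 and u2 (and hence of u1, u2, u4).
    assert np.array_equal(2*u1 - 3*u2, u3)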

the zero vector can be expressed as a linear combination of the vectors in S using coefficients that are not all zero. w 2 . The converse of this statement is also true: If the zero vector can be written as a linear combination of the vectors in S in which not all the coefficients are zero. such that aiUi 4. .1. . . That is.4 . Thus. . This observation leads us to the following definition. (-1. -4). . un. . . Note that since U3 = 2ui — 3w2 4. then some vector in S is a linear combination of the others. . (1. it is more efficient to ask whether the zero vector can be expressed as a linear combination of the vectors in S with coefficients that are not all zero. 2 ) . we can save ourselves some work. we have —2ui 4. for a set to be linearly dependent. Consequently. there must exist a nontrivial representation of 0 as a linear combination of vectors in the set.a2w2 4. u 2 . .3u 2 4.U3. 1 Vector Spaces In the preceding example. . if any. .«3 — O W 4 = 0 can be solved for any vector having a nonzero coefficient. . not all zero. w2. .un in S and scalars ai. .0)} in R 4 . in the example above. a 2 . checking that some vector in S is a linear combination of the other vectors in S could require that we solve several different systems of linear equations before we determine which. Example 1 Consider the set S = {(1. un. .4 . A subset S of a vector space V is called linearly dependent if there exist a finite number of distinct vectors W]. w 2 . of ui. By formulating our question differently. any subset of a vector space that contains the zero vector is linearly dependent. so U\.36 Chap.3 . Definition. the equation —2wi 4. We show that S is linearly dependent and then express one of the vectors in S as a linear combination of the other vectors in S. . because 0 = 1 • 0 is a nontrivial representation of 0 as a linear combination of vectors in the set. rather than asking whether some vector in S is a linear combination of the other vectors in S. a n .OW4. Thus. We call this the trivial representation of 0 as a linear combination of u\. . . 0 ) . we have aiUi 4. (2. 2 . .3u 2 4.2. For any vectors Wi.U2. For instance.a 2 w 2 4. In this case we also say that the vectors of S are linearly dependent.w3 — O W 4 = 0.To show that .0.3. or U3 (but not U4) can be written as a linear combination of the other three vectors.• • • 4. and U4 is a linear combination of the others.• • • + anur= 0.anun = 0 if ai = a-2 = • • • = a n = 0. because some vector in S is a linear combination of the others.

.3 . 0 .a 2 ( 2 . 1.l .a 4 ( .0 ( . a 2 . 2 . One such solution is ai = 4. .0) = 0. A subset S of a vector space that is not linearly dependent is called linearly independent.3. . 2 ) . we must find scalars ai. not all zero. and 04 = 0.a 3 — a 4 3ai 4.4 ) 4 .2a 2 — 3a 3 —4ai — 4a 2 4.1.a 3 ( l .4 . for linearly dependent sets must be nonempty. A set is linearly independent if and only if the only representations of 0 as linear combinations of its vectors are trivial representations.2a 2 4.5 Linear Dependence and Linear Independence 37 S is linearly dependent. and 04. .l . 2 ) 4 .2. 0 ) = 0. .a 4 2ai — 4a 3 = = = = 0 0 0 0. the set 1 -4 -3 2 0 5 -3 6 7 -2 4 -7 -2 -1 3 -3 11 2 • is linearly dependent because 1 -4 -3 2 4-3 0 5 -3 6 7 -2 4\_ -7 (-2 3 11 V—1 —3 2 0 0 0 0 0 0 Definition. 0. .3 . a 2 = —3. . 0 ) 4 . a3. 2 . such that a i ( l . 3 . 1. l . 03 = 2. The empty set is linearly independent. For if {u} is linearly dependent. As before. 0 ) + 2(1. . 2.4 . then au = 0 for some nonzero scalar a. The following facts about linearly independent sets are true in any vector space. 2 . A set consisting of a single nonzero vector is linearly independent. Example 2 In M 2x3 (i?).4 . Finding such scalars amounts to finding a nonzero solution to the system of linear equations ai 4.3(2. 3. . we also say that the vectors of S are linearly independent.Sec. and 4(1.4 ) 4. Thus u= a (au) — a 0 = 0. Thus S is a linearly dependent subset of R4.4 .2a 3 4.

and so S is linearly independent.0. (0.l ) + a 3 ( 0 . . . The set {po(x). By equating the coefficients of xk on both sides of this equation for k = 1 .1 ) 4 . 1 Vector Spaces The condition in item 3 provides a useful method for determining whether a finite set is linearly independent.ai 4.0.0.1) = (0. .2. .1)} is linearly independent. . . . . 1 .0. =0 =0 a3 =0 —ai — a2 — 03 4. 1 . • Example 4 For k = 0 .x n . .a 2 = 0 = 0 = 0 1 .• • 4.38 Chap.a n = 0 Clearly the only solution to this system of linear equations is ao = ai = • • • = a n = 0.(a 0 4..0.pn(x)} is linearly independent in Pn(F).anpn(x) =0 ai ao 4. .0. 0 .(a 0 4.0). This technique is illustrated in the examples that follow.0. (0. we obtain ao a 0 4. . 2 .pi(x).(a 0 4.a 2 )x 2 -| 1 .ai)x 4. a n . (0.1. Example 3 To prove that the set S = {(1.a\ 4.1. a i .04 = 0 a2 Clearly the only solution to this system is ai = a 2 = a 3 = a4 = 0.0.. n let pk(x) = xk + xk+1 4. then ao 4.-1). and 04 are scalars such that a i ( l . 0 . 0 .3..0.• • • 4.aipi(x) H for some scalars ao.-1).. Suppose that ai. n.-1). we must show that the only linear combination of vectors in S that equals the zero vector is the one in which all the coefficients are zero. . . . 0 . we obtain the following system of linear equations. .ai 4.0. 0. l .a 2 4.a 2 ( 0 . • .0. Equating the corresponding coordinates of the vectors on the left and the right sides of this equation.ai -I h a n ) x n = 0. For if a0p0(x) 4.1 ) + a 4 (0.ai ao 4.
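Item 3 also gives a convenient computational test: a finite list of vectors in Fⁿ is linearly independent exactly when the matrix having those vectors as its rows has rank equal to the number of vectors, since a rank deficiency signals a nontrivial solution of the corresponding homogeneous system. The sketch below (Python with NumPy, an added illustration) applies this test to the set of Example 3 and to the linearly dependent set of Example 1.

    import numpy as np

    def independent(vectors):
        # True when the given vectors are linearly independent.
        M = np.array(vectors, dtype=float)
        return np.linalg.matrix_rank(M) == len(vectors)

    # Example 3: a linearly independent subset of R^4.
    print(independent([(1, 0, 0, -1), (0, 1, 0, -1), (0, 0, 1, -1), (0, 0, 0, 1)]))   # True

    # Example 1: linearly dependent, since 4u1 - 3u2 + 2u3 + 0u4 = 0.
    print(independent([(1, 3, -4, 2), (2, 2, -4, 0), (1, -3, 2, -4), (-1, 0, 1, 0)]))  # False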

1 .6.1.0411. . U\ or u2) is a linear combination of the other vectors in S.0.OU4.3u 2 4. w2 = (1. If S 2 is linearly independent. consider the subset S = {u\. and «4 — ( 1 . Let S be a linearly independent subset of a vector space V.4 = (ai 4. 11.OM4) 4. We have previously noted that S is linearly dependent: in fact. -2u.a3?i3 4. and let V be a vector in V that is not in S. Then some vector v € S can be written as a linear combination of the other vectors in S. where //. Let V be a vector space. .a 3 (2ui — 3u2 4. we remarked that the issue of whether S is the smallest generating set for its span is related to the question of whether some vector in S is a linear combination of the other vectors in S.1*2. 1.5 Linear Dependence and Linear Independence 39 The following important results are immediate consequences of the definitions of linear dependence and linear independence. and let St C S 2 C V. Exercise.2u2 4. It follows that if no proper subset of S generates the span of S.3 = 2ii\ — 3u2 4. Another way to view the preceding statement is given in Theorem 1. Therefore every linear combination a\Ui f a2u2 + a3":i + 04114 of vectors in S can be written as a linear combination of U[.3a 3 )» 2 + 04^4.^3. and 114: a\tii 4. This equation implies that w3 (or alternatively. u2.\ 4.Sec.U3 . To see why.u 2 . .7. Thus the issue of whether S is the smallest generating set for its span is related to the question of whether S is linearly dependent. Proof.U4} of S has the same span as S! More generally. Let V be a vector space. —1. I Earlier in this section. If Si is linearly dependent. S S I Corollary. Thus the subset S' — {ai.3). For example. then Si is linearly independent. Exercise. suppose that S is any linearly dependent set containing two or more vectors.2 . then S 2 is linearly dependent. and the subset obtained by removing v from S has the same span as S.a. 4 ) .O//4 = 0.-1). Theorem 1. 113 = (1. Theorem 1.7.114} of R3.2a3)jvi + (a 2 . and let S] C S 2 C V. = ( 2 . then S must be linearly independent. Then S U {v} is linearly dependent if and only ifv € span(S).2u2 4. Proof.1 ) . .4U4 = aiUi + o.

.vm.40 Chap.3.l A ) \ 2 _4J)-M2><2^ (c) {x 3 4. . one of the u^s. EXERCISES 1.a2u2 4.• • • + anun = 0 for some nonzero scalars a i . Since v is a linear combination of u2.anxn = 0 and X i . 62.. Conversely.bmvm. . Label the following statements as true or false... . let v £ span(S).2x . .. .V2. Therefore S U {v} is linearly dependent by Theorem 1.v} is linearly dependent.x 2 4. (c) The empty set is linearly dependent. 1 -3 -2 6 in M 2 x 2 (fl) (a) -2 4 4 -8 ^ H .vm in S and scalars 6i. Thus aiv 4. 1 Vector Spaces Proof. equals v. (f) If aixi 4. (h).1} in P3(R) 3 The computations in Exercise 2(g). . ra.. Then there exist vectors Vi. x 3 . H Linearly independent generating sets are investigated in detail in Section 1. x 2 . the coefficient of v in this linear combination is nonzero. (a) If S is a linearly dependent set. which are in S.x 2 4.un. . and so v = a1 (-a2u2 onun) = . (d) Subsets of linearly dependent sets are linearly dependent.( a x a2)u2 (ax 1 an)un. . then there are vectors u\. (e) Subsets of linearly independent sets are linearly independent.6. (i).. Because S is linearly independent..x 4. 2 . and so the set {vi. . then each vector in S is a linear combination of other vectors in S.anun = 0. .1.. a n . . (b) Any set containing the zero vector is linearly dependent. un in S U {v} such that ai^i 4.• • • 4. we have v £ span(S)..v2. 6 m such that v = biVi + b2v2 4.xn are linearly independent. Determine whether the following sets are linearly dependent or linearly independent. and (j) are tedious unless technology is used.a2x2 4..6. • • • . u2. a 2 .. Hence 0 = b\vi + b2v2 4-l)v..• • • 4. . then all the scalars a« are zero.2x 2 . . .. 2. Since v ^ Vi for i = 1 . say «i.. .• • • 4. If Su{v} is linearly dependent.a2u2 4.

( l . 4 ) } i n R 3 (f) { ( l .x. 0 . x n } is linearly independent in P n ( F ) . . . prove that the set is linearly dependent. 8. (b) Prove that if F has characteristic 2. 6.4. In M m X n ( F ) . 1. . . (a) Prove that if F = R. .2 x 3 4.5x 2 4. then S is linearly independent. 7.l . Show that the set { l .3x 4.l ) } i n R 3 (g) 1 0 ~2 1 1 0 ~2 1 0 1 0 1 -1 1 -1 1 -1 2 1 0 -1 2 1 0 2 1 -4 4 2 2 1 -2 in M 2 x 2 (tf) in M 2 x 2 ( # ) 41 (h) (i) {x4 . 10.3x 4.5x 2 .4x 2 . .3x 3 4. 9.3x 2 . . l . Prove that { e i .0.1). . . .6.x 4-1.1)} be a subset of the vector space F 3 .Sec.x 4.* Let u and v be distinct vectors in a vector space V.x 3 4. ( l .3 that the set of diagonal matrices in M 2 x2(^) is a subspace. 5.3x 2 4. x 3 . (0.2x 4.5 Linear Dependence and Linear Independence (d) {x 3 .2x 4 4.x 3 + ox 2 . Recall from Example 3 in Section 1. 2 ) .l .x 4 4-x 3 . .6} in P3(R) (e) { ( l .2x 4 4. let ej denote the vector whose j t h coordinate is 1 and whose other coordinates are 0.3. Prove that \Exi: 1 < i < m.x 4 4. v} is linearly dependent if and only if u or v is a multiple of the other.2} in P 4 (R) 4 (j) {x . e 2 .3. 2 .5. Let S = {(1. (1.4x 2 4. 1 < j < n) is linearly independent. let E1* denote the matrix whose only nonzero entry is 1 in the zth row and j t h column. then S is linearly dependent.x 3 . x 4 + 3x 2 . Find a linearly independent set that generates this subspace.0). / 4.6. .8x 4.5x . l ) .2 .8x 4. . e n } is linearly independent. x 4 4.5. Show that {u. 2x 2 4.1. . In M 3 x 2 (F). Give an example of three linearly dependent vectors in R3 such that none of the three is a multiple of another. x . 2 ) .1. In F n . x 2 . . .l . l ) .8x} in P4(R) 3.5x 2 + 5x .x 3 4. ( 2 . ( .

A linearly independent generating set for W possesses a very useful property—every vector in W can be expressed in one and only one way as a linear combination of the vectors in the set.Wn} be a linearly independent subset of a vector space V over the field Z2. then S must be linearly independent. R)..w} is linearly independent. (b) Let u. 1.v. Prove that S is linearly dependent if and only if ui = 0 or Wfc+i € span({tii.u2. u — v} is linearly independent. Prove that {u. .. and w be distinct vectors in V.3) with nonzero diagonal entries. Prove that / and g are linearly independent in J-(R. Let V be a vector space over a field of characteristic not equal to two. How many vectors are there in span(S)? Justify your answer.U\. Prove that if {Ai. where r ^ s. Let S = {ui. (a) Let u and v be distinct vectors in V.U2. g. un in S such that v is a linear combination of ui. Prove that {u. • • • . € F(R. 15. Prove Theorem 1.. Prove that a set S is linearly dependent if and only if S = {0} or there exist distinct vectors v.. Prove that a set S of vectors is linearly independent if and only if each finite subset of S is linearly independent.6 and its corollary. R) be the functions defined by f(t) = ert and g(t) = est. (This property is proved below in Theorem 1. Prove that the columns of M are linearly independent. Let / . v.u + w. 17. . Let M be a square upper triangular matrix (as defined in Exercise 12 of Section 1.u2. v} is linearly independent if and only if {u 4. . Alk} is also linearly independent. Uk}) for some k (1 < k < n).U2. Prove that S is linearly independent. .. • • • . 13.v.42 Chap.A2.8. 16. Let S = {ui.un} be a finite set of vectors.5 that if S is a generating set for a subspace W and no proper subset of S is a generating set for W.6 BASES AND DIMENSION We saw in Section 1. Let S be a set of nonzero polynomials in P(F) such that no two have the same degree. 18. 1 Vector Spaces 11.. .. w 2 .Ak} is a linearly independent subset of Mnxn(-F)j then {A\. 14. . • • •.. 20. 19. 12.un. A2. • • . v 4.w} is linearly independent if and only if {u + v.) It is this property that makes linearly independent generating sets the building blocks of vector spaces.

that is. . } is a basis. . let ei = (1. Theorem 1. . •' Example 4 In Pn(F) the set { 1. x 2 . . . • / Example 5 In P(F) the set { 1 . Then {E*? : 1 < i < m. • Observe that Example 5 shows that a basis need not be finite.0. Suppose that a\U\ +a2u2 4anun and v = b\U\ + b2u2 + • • • + bnut \. we see that 0 is a basis for the zero vector space.Sec. row and j t h column.1..0).anun . . . we aiso say that the vectors of ft form a basis for V. . e n } is readily seen to be a basis for F n and is called the standard basis for F n . 0 ) . can be expressed in the form v = aiUi 4.. a n . . {ei. Then ft is a basis for V if and only if each o G V can be uniquely expressed as a linear combination of vectors of ft. .e 2 = (0. If v e V.. x . Example 1 Recalling that span(0) = {()} and 0 is linearly independent. If ft is a basis for V. We call this basis the s t a n d a r d basis for Pn(F).0. • Example 2 In F n . a 2 . 1.0..0.8.. • Example 3 In MTOXn(F).u2. e 2 .1). . x.e„ = (0.0. . In fact.. Proof. . x 2 .. 1 < j < n} is a basis for M m x n (F).a2u2 H for unique scalars a\. establishes the most significant property of a basis. . later in this section it is shown that no basis for P(F) can be finite. then v € span(/?) because span(/i/) = V.. .. Let V be a vector space and ft = {ui. . . .. . Tims v is a linear combination of the vectors of ft. which is used frequently in Chapter 2. Let ft be a basis for V. . . let E*i denote the matrix whose only nonzero entry is a 1 in the ith. The next theorem. x n } is a basis. Hence not every vector space has a finite basis. .6 Bases and Dimension 43 Definition.un} be a subset ofV. .. . A basis j3 for a vector space V is a linearly independent subset of V that generates V.

.U2. 39).. . then the preceding construction shows that ft U {v} is linearly dependent.8 shows that if the vectors u\. Thus v determines a unique n-tuple of scalars ( a i . it follows that ai — b\ = a2 — b2 = • • • = on . un.. Since S is a finite set. . each n-tuple of scalars determines a unique vector v £ V by using the entries of the n-tuple as the coefficients of a linear combination of ui.9 identifies a large class of vector spaces of this type..an = bn. This method is illustrated in the next example.a2 = b2. We claim that ft is a basis for V.. then every vector in V can be uniquely expressed in the form v = arui 4. Let v £ S. this theorem is often remembered as saying that a tinite spanning set for V can be reduced to a basis for V..u2. If a vector space V is generated by a Bnite set S. .u2. .. Theorem 1. but adjoining to ft any vector in S not in ft produces a linearly dependent set. . an) and. Otherwise S contains a nonzero vector u\. By item 2 on page 37. If v £ ft. I Because of the method by which the basis ft was obtained in the proof of Theorem 1.u2. This fact suggests that V is like the vector space F n . So v £ span(/3) by Theorem 1.a2.. Otherwise. conversely.an.7 (p. 1 Vector Spaces are two such representations of v. .4 that this is indeed the case.. If S = 0 or S = {0}. Thus S C span(/?). Since ft is linearly independent.b2)u2 4h (an bn)un.a2u2 -\ h anun for appropriately chosen scalars 0].. {ui} is a linearly independent set. In this book. Hence a\ — l>i. The proof of the converse is an exercise.9..b\)ui + (a2 . Subtracting the second equation from the first gives 0 = (a\ ..un form a basis for a vector space V.. Proof.---. a 2 .. By Theorem 1. then some subset of S is a basis for V.. then clearly v £ span(/3). if possible. we must eventually reach a stage at which ft = {ui.. Hence V has a finite basis.5 (p. Because ft is linearly independent by construction. we are primarily interested in vector spaces having finite bases. ..bn = 0. then V = {()} and 0 is a subset of S that is a basis for V.44 Chap. where n is the number of vectors in the basis for V. and so v is uniquely expressible as a linear combination of the vectors of ft. it suffices to show that ft spans V. Uk} is linearly independent. We see in Section 2. if v £ ft.. Continue.. 30) we need to show that S C span(/?).9.. I Theorem 1.. Uk) is a linearly independent subset of S..itfc in S such that {ui. T h e o r e m 1. choosing vectors u2.

Sec.-3. (8. 5 ) 4. 1.2.0) from our basis.1 ) .0. select any nonzero vector in S. . (8. The induction begins with m = 0.2. Now we consider the set {(2. 2 . 0 .5) = (8. .-1)} is a subset of S that is a basis for R3. we exclude (7.2. 5 ) .1 ) } obtained by adjoining another vector in S to the two vectors that we have already included in our basis. — 1) in our basis or exclude it from the basis according to whether {(2.3 . (0.20). (1.5). say (2.5). 5). .3(1.3 . As before. .10 (Replacement Theorem).-3. 2 . (1. that is generated by a set G containing exactly n linearly independent subset of V containing exactly and there exists a subset HofG containing exactly L U H generates V. . -12. To start.2. 5 ) .2 ) . Since 4(2. —2) as part of our basis. 2. (1. 5). 5 ) .9. We conclude that {(2. —2)} is linearly independent. the set {(2.(0. 45 It can be shown that S generates R3. Because 2(2.3 . .-2).2 ) is not a multiple of (2.20).0. —12.1 ) } is linearly independent or linearly dependent. —1) in our basis.2.2.0) = (0.3 . —3. (1. Let V be a vector space vectors.0.-3. . and so we include (0.2.0. and let L be a m vectors.0)} is linearly independent or linearly dependent. (1.-3.2 ) . ( 0 .0)}. In a similar fashion the final vector in S is included or excluded from our basis according to whether the set {(2. wc include (0.2 ) 4. to be a vector in the basis.3 . On the other hand. and so taking H = G gives the desired result. We can select a basis for R3 that is a subset of S by the technique used in proving Theorem 1. . . . 2. (0.5).0. • The corollaries of the following theorem are perhaps the most significant results in Chapter 1. -1). so that the set {(2.(7.1 ) .4(0. 5 ) and vice versa. The proof is by mathematical induction on m. (7.0.5.-12. An easy calculation shows that this set is linearly independent. . .0.6 Bases and Dimension Example 6 Let S = {(2.3 .2 ) . for in this case L = 0 . ( 0 . .0). (7. Thus we include (1.0. . . ( 1 .-12. Hence we do not include (8. Theorem 1.20)} is linearly dependent by Exercise 9 of Section 1.2.(1. -2).0. Then rn < n n — rn vectors such that Proof. .20) in our basis.

biui 4. I If a vector space has a finite basis. . . Because {v\.. a contradiction. {v\. v2. Thus there exist scalars a\.. Then every basis for V contains the same number of vectors. um. it follows that {vi.. • • • .(-6r/ 1 6 n _ m )u f l _ m . Definitions.7 (p.. un-m are clearly in span(L U H).1 vectors.... Moreover. vm. .u2.. .a2v2 + amvrn 4.. 39). . say &i. vm. . Un-m} Q span(L U H).. un-m } of G such that {vi . Suppose that ft is a finite basis for V that contains exactly n vectors..v2. Since S is linearly independent and ft generates V. v2...v2.m } generates V. n > rn + l. Hence n > ra. Un-m)... We prove that the theorem is true for m 4. and so we may apply the induction hypothesis to conclude that rn < n and that there is a subset {ui ... 39) contradicts the assumption that L is linearly independent. we obtain n < ra... a 2 .v2. lest vm+i be a linear combination of v\.vm}u{ui. By the corollary to Theorem 1....5 (p.(-bi1a2)v2 4. If 7 contains more than n vectors.1.. Therefore 7 is finite. is nonzero. . b2.6 (p. that is.... w n . b\. u 2 . v%.. and let 7 be any other basis for V. aTO.. Corollary 1 asserts that the number of vectors in any basis for V is an intrinsic property of V..V2. + (lhl)vni+i Let H = {u2.b2u2 4bn= VTO+1 • (9) n—rn **n—rn Note that n — m > 0.. vm.1.. for otherwise we obtain the same contradiction.. 30) implies that span(L U H) = V. un-m } generates V. the theorem is true for rn 4. < n. This fact makes possible the following important definitions. 1 Corollary 1. Theorem 1.(-bilam)vm 4.. and because V\... Solving (9) for wi gives ui = (-6^ 1 Oi)« 1 4. Reversing the roles of ft and 7 and arguing as above. Let L = {v\... the replacement theorem implies that n +1 < n... The unique number of vectors . 1 Vector Spaces Now suppose that the theorem is true for some integer m > 0.. Then ui £ span(LUi/). Proof. A vector space is called finite-dimensional if it has a basis consisting of a Unite number of vectors. then we can select a subset S of 7 containing exactly n 4-1 vectors. and the number m of vectors in 7 satisfies 77?...(-bi1b2)u2 H + • • • 4. Since H is a subset of G that contains (n — rn) — 1 = n — (m + 1) vectors. Hence rn = n.46 Chap. .u2. which by Theorem 1. Let V be a vector space having a tinitc basis.u2.. u2. v m + i } be a linearly independent subset of V consisting of ra 4... This completes the induction. ui.ui.vm} is linearly independent. . bn-m such that a\V\ 4. some bi.. .

the vector space of complex numbers has dimension 1. namely {l. the first conclusion in the replacement theorem states that if V is a finite-dimensional vector space. however. Just as no linearly independent subset of a finite-dimensional vector space V can contain more than dim(V) vectors. 1. Example 7 The vector space {0} has dimension zero.. . Yet nothing that we have proved in this section guarantees an infinite-dimensional vector space must have a basis.x.) • Example 12 Over the field of real numbers.) • In the terminology of dimension. The following results are consequences of Examples 1 through 4. in fact. • • • • The following examples show that the dimension of a vector space depends on its field of scalars. Example 9 The vector space M m x n ( F ) has dimension rnn. that every vector space has a basis.. Example 10 The vector space Pn{F) has dimension n + 1. then no linearly independent subset of V can contain more than dim(V) vectors. Let V be a vector space with dimension n.}. (A basis is {l.i}. / Example 11 Over the field of complex numbers. From this fact it follows that the vector space P(F) is infinite-dimensional because it has an infinite linearly independent set. the vector space of complex numbers has dimension 2.Sec. This set is. A vector space that is not finite-dimensional is called infinite-dimensional. (a) Any finite generating set for V contains at least n vectors. x2. (A basis is {1}.6 Bases and Dimension 47 in each basis for V is called the dimension of\/ and is denoted by dim(V). Example 8 The vector space F n has dimension n.7 it is shown. a basis for P(F). and a generating set for V that contains exactly n vectors is a basis for V. a corresponding statement can be made about the size of a generating set. Corollary 2.. In Section 1.

therefore (a) implies that L U H contains exactly n vectors and that L U H is a basis for V.4 and (a) of Corollary 2 that {x 2 4. By Theorem 1. (c) Every linearly independent subset of V can be extended to a basis for V. Now L U H contains at most n vectors. Proof.0. so that G is a basis for V.1)} is a basis for R4. . . (b) Let L be a linearly independent subset of V containing exactly n vectors.1. then we must have H = G. Thus H = 0 .4x 4. and L generates V.3.5x . E x a m p l e 15 .1. G must contain at least n vectors. 1 E x a m p l e 13 It follows from Example 4 of Section 1.2x 2 4.3x .0.1 ) .2. ( : : It follows from Example 3 of Section 1. Since a subset of G contains n vectors. if G contains exactly n vectors.x 2 .1 ) .0.1 ) . It follows from the replacement theorem that there is a subset H of ft containing n — n = 0 vectors such that L U H generates V.4} is a basis for P2(R.)•(. L is a basis for V.0. Moreover.48 Chap. (0. . • E x a m p l e 14 It follows from Example 5 of Section 1. 1 Vector Spaces (b) Any linearly independent subset ofV that contains exactly n vectors is a basis for V.0.9 some subset H of G is a basis for V. then the replacement theorem asserts that there is a subset H of ft containing exactly n — ra vectors such that LU H generates V.0.: ^ : ) • ( : : ) . Since L is also linearly independent. (0. Let ft be a basis for V.4 and (a) of Corollary 2 that : is a basis for M2x2(R).). (a) Let G be a finite generating set for V.5 and (b) of Corollary 2 that {(1. • . Corollary 1 implies that H contains exactly n vectors. (c) If L is a linearly independent subset of V containing ra vectors. . (0.

The Venn diagram in Figure 1. An Overview of Dimension and Its Consequences Theorem 1.6 depicts these relationships. every linearly independent subset of V contains no more than n vectors and can be extended to a basis for V by including appropriately chosen vectors.5 and (b) of Corollary 2 that {po(x).. . bases. If V has a finite basis. This procedure also can be used to extend a linearly independent set to a basis.. In Section 3. Moreover.6 . n. For this reason..Sec. It follows from Example 4 A procedure for reducing a generating set to a basis was illustrated in Example 6. we summarize here the main results of this section in order to put them into better/perspective. Also. we will discover a simpler method for reducing a generating set to a basis. 49 f xn. Figure 1. . This number is called the dimension of V.Pi(x). 1. as (c) of Corollary 2 asserts is possible. . and generating sets. let pk(x) = x fc 4-x fc+1 4 of Section 1. . A basis for a vector space V is a linearly independent subset of V that generates V.4. when we have learned more about solving systems of linear equations.6 Bases and Dimension Example 16 For k = 0 . then every basis for V contains the same number of vectors. every basis for V contains exactly n vectors.9 as well as the replacement theorem and its corollaries contain a wealth of information about the relationships among linearly independent sets.pn(x)} is a basis for P n (F). and V is said to be finite-dimensional.. Thus if the dimension of V is n. 1 . each generating set for V contains at least n vectors and can be reduced to a basis for V by excluding appropriately chosen vectors.

.Xfc} is linearly independent but adjoining any other vector from W produces a linearly dependent set. Moreover. x 2 . Then W is finite-dimensional and dim(W) < dim(V). so {xi} is a linearly independent set.a :i . . a2 = 04)..X2. W contains a nonzero vector Xj. If W = {()}. so W = V.a&) £ F 5 : a.7 (p. (0.1.3 that the set of symmetric n x n matrices is a subspace W of M n X n ( F ) .11. this process must stop at a stage where A .\ + 03 4. • . a 2 . Theorem 1.04.Xfc} is linearly independent.. Theorem 1. then W is finite-dimensional and dim(W) = 0 < n. Xi.0.1.\ .1. (-1.3).X2. . • Example 19 We saw in Section 1. 1 Vector Spaces Our next result relates the dimension of a subspace to the dimension of the vector space that contains it. Continue choosing vectors. Example 18 The set of diagonal n x n matrices is a subspace W of M n x n ( F ) (see Example 3 of Section 1.0)} as a basis.0.. 39) implies that { x i . .0. Let W be a subspace of a finite-dimensional vector space V.50 The Dimension of Subspaces Chap.0. . .1). • • • . and hence it is a basis for W. Otherwise. Let dim(V) = n. A basis for W is {En. Proof. if dim(W) = dim(V). • • • . Thus dim(W) = n. Therefore dim(W) = k < n. < n and { x i . It is easily shown that W is a subspace of F5 having {(-1.E22.Enn}. But Corollary 2 of the replacement theorem implies that this basis for W is also a basis for V.0). X & in W such that {xi. x 2 . ..0. then a basis for W is a linearly independent subset of V containing n vectors. Thus dim(W) = 3. If dim(W) = n.05 = 0. where E%3 is the matrix in which the only nonzero entry is a 1 in the iih row and j t h column. then V = W.0. A basis for W is {Aij : 1 < i < j < n}. .Xfc} generates W. Since no linearly independent subset of V can contain more than n vectors. 1 Example 17 Let / W = { (o.

fᵢ(x) = [(x − c₀)(x − c₁)···(x − cᵢ₋₁)(x − cᵢ₊₁)···(x − cₙ)] / [(cᵢ − c₀)(cᵢ − c₁)···(cᵢ − cᵢ₋₁)(cᵢ − cᵢ₊₁)···(cᵢ − cₙ)],
that is, fᵢ(x) is the product of the factors (x − cₖ)/(cᵢ − cₖ) over all k = 0, 1, ..., n with k ≠ i,

are called the Lagrange polynomials (associated with c₀, c₁, ..., cₙ). Note that each fᵢ(x) is a polynomial of degree n and hence is in Pₙ(F). By regarding fᵢ(x) as a polynomial function fᵢ: F → F, we see that
fᵢ(cⱼ) = 0 if i ≠ j and fᵢ(cⱼ) = 1 if i = j.   (10)
This property of Lagrange polynomials can be used to show that β = {f₀, f₁, ..., fₙ} is a linearly independent subset of Pₙ(F). Suppose that
a₀f₀ + a₁f₁ + ··· + aₙfₙ = 0
for some scalars a₀, a₁, ..., aₙ, where 0 denotes the zero function. Then
a₀f₀(cⱼ) + a₁f₁(cⱼ) + ··· + aₙfₙ(cⱼ) = 0 for j = 0, 1, ..., n.
But by (10) the left side of this equation equals aⱼ. Hence aⱼ = 0 for j = 0, 1, ..., n, and so β is linearly independent. Since the dimension of Pₙ(F) is n + 1, it follows from Corollary 2 of the replacement theorem that β is a basis for Pₙ(F).

Because β is a basis for Pₙ(F), every polynomial function g in Pₙ(F) is a linear combination of polynomial functions of β, say,
g = b₀f₀ + b₁f₁ + ··· + bₙfₙ.
It follows that
g(cⱼ) = b₀f₀(cⱼ) + b₁f₁(cⱼ) + ··· + bₙfₙ(cⱼ) = bⱼ;
so
g = g(c₀)f₀ + g(c₁)f₁ + ··· + g(cₙ)fₙ
is the unique representation of g as a linear combination of elements of β. This representation is called the Lagrange interpolation formula. Notice that the preceding argument shows that if b₀, b₁, ..., bₙ are any n + 1 scalars in F (not necessarily distinct), then the polynomial function
g = b₀f₀ + b₁f₁ + ··· + bₙfₙ
is the unique polynomial in Pₙ(F) such that g(cⱼ) = bⱼ. Thus we have found the unique polynomial of degree not exceeding n that has specified values bⱼ at given points cⱼ in its domain (j = 0, 1, ..., n).

For example, let us construct the real polynomial g of degree at most 2 whose graph contains the points (1, 8), (2, 5), and (3, −4). (Thus, in the notation above, c₀ = 1, c₁ = 2, c₂ = 3, b₀ = 8, b₁ = 5, and b₂ = −4.) The Lagrange polynomials associated with c₀, c₁, and c₂ are
f₀(x) = [(x − 2)(x − 3)] / [(1 − 2)(1 − 3)] = ½(x² − 5x + 6),
f₁(x) = [(x − 1)(x − 3)] / [(2 − 1)(2 − 3)] = −(x² − 4x + 3),
and
f₂(x) = [(x − 1)(x − 2)] / [(3 − 1)(3 − 2)] = ½(x² − 3x + 2).
Hence the desired polynomial is
g(x) = b₀f₀(x) + b₁f₁(x) + b₂f₂(x) = 8f₀(x) + 5f₁(x) − 4f₂(x)
= 4(x² − 5x + 6) − 5(x² − 4x + 3) − 2(x² − 3x + 2)
= −3x² + 6x + 5.

An important consequence of the Lagrange interpolation formula is the following result: If f ∈ Pₙ(F) and f(cᵢ) = 0 for n + 1 distinct scalars c₀, c₁, ..., cₙ in F, then f is the zero function.

EXERCISES

1. Label the following statements as true or false.
(a) The zero vector space has no basis.
(b) Every vector space that is generated by a finite set has a basis.
(c) Every vector space has a finite basis.
(d) A vector space cannot have more than one basis.

a 3 . 1.9). (3. a 3 . Let u. (3.03.6 ) .a 3 . 1 .2 .1. The vectors m = (1. w} is also a basis for V.Ug} that is a basis for generate W.2x 2 4. 2 ) .8 .3 .2 ) . 4 .2.6 Bases and Dimension 55 8. and W 4 = (0.3) (-4. -18. a 2 . au} and {au. 2 . What are the dimensions of Wi and W2? .9. 12.04) in F4 as a linear combination of Mi.a 5 = 0}.a 4 = 0} and W 2 = {(ai.a 5 ) £ F 5 : a x . (1.. u2.0.X3 = 0 is a subspace of R3.1.3) (-2.-2) ( . . (0. bv} are also bases for V. 1 ) . . u2 u4 uQ u8 = = = = ( .x 3 = 0 2xi — 3x 2 4.1. Show that if {u. and w be distinct vectors of a vector space V. .2 .5 .1).. (1. . -3). (-1.0).1. .3 0 ) .1 . (a) (b) (c) (d) ( . 7 ) . (3. . v} is a basis for V and a and 6 are nonzero scalars. v.15. (1. 3 .1). a 5 ) £ F 5 : a 2 = a 3 = a 4 and a x 4. . W. 13. a 4 ..1 . The set of solutions to the system of linear equations xi .a4.1) form a basis for F4.6 . then {u 4.Sec.v.1). -12. 9 .02. v. 10. . then both {u 4. .6 ) . (2.a2. . In each part. v + w.0.9 . w} is a basis for V. Let W denote the subspace of R5 consisting of all the vectors having coordinates that sum to zero. Find a subset of the set {ui.3 . u2. 7 . (-2. u3 = (0. ( . Show that if {u.w.3 .3).0. (1.6 ) . 6 ) . Find bases for the following subspaces of F 5 : Wi = {(ai.1. Find the unique representation of an arbitrary vector (01. (0.2 . 9. (2. U3.10) / 11. . 14.2 .9 . . (-1.1.12). Find a basis for this subspace.0. Let u and v be distinct vectors of a vector space V.7). u2 = (0. .15). .5). The vectors wi u3 u5 u7 = = = = (2.24). . and U4.1. (1. use the Lagrange interpolation formula to construct the polynomial of smallest degree whose graph contains the following points.v 4.

determine the dimension ofZ.8.. cnf^(x).. . Complete the proof of Theorem 1. and Z be as in Exercise 21 of Section 1. (b) State and prove a relationship involving dim(Wi) and dim(W 2 ) in the case that dim(Wi) ^ dim(W 2 ). If V and W are vector spaces over F of dimensions m and n.. Prove that a vector space is infinite-dimensional if and only if it contains an infinite linearly independent subset. (Be careful not to assume that S is finite.. (a) Prove that there is a subset of S that is a basis for V. Find a basis for W.vk.cn such that g(x) = co/(x) 4. 1 Vector Spaces 15.C l / ' ( x ) 4. The set of all skew-symmetric n x n matrices is a subspace W of Mnxn(^) (see Exercise 28 of Section 1. Prove that for any g(x) £ Pn(R) there exist scalars Co. Find a basis for W. Determine necessary and sufficient conditions on Wi and W 2 so that d i m ( W i n W 2 ) = dim(W 1 ). The set of all n x n matrices having trace equal to zero is a subspace W Exam of M n x n ( F ) (see Example 4 of Section 1. 20.2.3).v}). Find a basis for the vector space in Example 5 of Section 1. 23. Let f(x) be a polynomial of degree n in Pn(R). • • • .56 Chap. and define Wi = and W 2 = span({vi.V2. (a) Find necessary and sufficient conditions on v such that dim(Wi) = dim(W 2 ).v2. Let Wi and W 2 be subspaces of a finite-dimensional vector space V. .v2. 25.. What is the dimension of W? 17. be vectors in a vector space V. c\. The set of all upper triangular n x n matrices is a subspace W of M n X n ( F ) (see Exercise 12 of Section 1...3).* Let V be a vector space having dimension n.vk. 24..3). Find a basis for W.c2f"(x) + • • • 4where f^n'(x) denotes the nth derivative of f(x). 19.. What is the dimension of W? 16.2. Let Vi.. 22.) (b) Prove that S contains at least n vectors.. 21. and let S be a subset of V that generates V.vk}).v span({vi. Justify your answer. Let V. W. What is the dimension of W? 18.

26. For a fixed a ∈ R, determine the dimension of the subspace of Pn(R) defined by {f ∈ Pn(R) : f(a) = 0}.

27. Let W1 and W2 be the subspaces of P(F) defined in Exercise 25 in Section 1.3. Determine the dimensions of the subspaces W1 ∩ Pn(F) and W2 ∩ Pn(F).

28. Let V be a finite-dimensional vector space over C with dimension n. Prove that if V is now regarded as a vector space over R, then dim V = 2n. (See Examples 11 and 12.)

Exercises 29-34 require knowledge of the sum and direct sum of subspaces, as defined in the exercises of Section 1.3.

29. (a) Prove that if W1 and W2 are finite-dimensional subspaces of a vector space V, then the subspace W1 + W2 is finite-dimensional, and dim(W1 + W2) = dim(W1) + dim(W2) - dim(W1 ∩ W2). Hint: Start with a basis {u1, u2, ..., uk} for W1 ∩ W2 and extend this set to a basis {u1, u2, ..., uk, v1, v2, ..., vm} for W1 and to a basis {u1, u2, ..., uk, w1, w2, ..., wp} for W2.
    (b) Let W1 and W2 be finite-dimensional subspaces of a vector space V, and let V = W1 + W2. Deduce that V is the direct sum of W1 and W2 if and only if dim(V) = dim(W1) + dim(W2).

30. Let V = M2×2(F), let W1 consist of all matrices of the form (a b; c a) with a, b, c ∈ F, and let W2 consist of all matrices of the form (0 a; -a b) with a, b ∈ F (rows are separated by semicolons). Prove that W1 and W2 are subspaces of V, and find the dimensions of W1, W2, W1 + W2, and W1 ∩ W2.

31. Let W1 and W2 be subspaces of a vector space V having dimensions m and n, respectively, where m ≥ n.
    (a) Prove that dim(W1 ∩ W2) ≤ n.
    (b) Prove that dim(W1 + W2) ≤ m + n.

32. (a) Find an example of subspaces W1 and W2 of R^3 with dimensions m and n, where m > n > 0, such that dim(W1 ∩ W2) = n.
    (b) Find an example of subspaces W1 and W2 of R^3 with dimensions m and n, where m > n > 0, such that dim(W1 + W2) = m + n.

    (c) Find an example of subspaces W1 and W2 of R^3 with dimensions m and n, where m > n > 0, such that both dim(W1 ∩ W2) < n and dim(W1 + W2) < m + n.

33. (a) Let W1 and W2 be subspaces of a vector space V such that V = W1 ⊕ W2. If β1 and β2 are bases for W1 and W2, respectively, show that β1 ∩ β2 = ∅ and β1 ∪ β2 is a basis for V.
    (b) Conversely, let β1 and β2 be disjoint bases for subspaces W1 and W2, respectively, of a vector space V. Prove that if β1 ∪ β2 is a basis for V, then V = W1 ⊕ W2.

34. (a) Prove that if W1 is any subspace of a finite-dimensional vector space V, then there exists a subspace W2 of V such that V = W1 ⊕ W2.
    (b) Let V = R^2 and W1 = {(a1, 0) : a1 ∈ R}. Give examples of two different subspaces W2 and W2' such that V = W1 ⊕ W2 and V = W1 ⊕ W2'.

The following exercise requires familiarity with Exercise 31 of Section 1.3.

35. Let W be a subspace of a finite-dimensional vector space V, and consider the basis {u1, u2, ..., uk} for W. Let {u1, u2, ..., uk, u_{k+1}, ..., u_n} be an extension of this basis to a basis for V.
    (a) Prove that {u_{k+1} + W, u_{k+2} + W, ..., u_n + W} is a basis for V/W.
    (b) Derive a formula relating dim(V), dim(W), and dim(V/W).

1.7* MAXIMAL LINEARLY INDEPENDENT SUBSETS

In this section, several significant results from Section 1.6 are extended to infinite-dimensional vector spaces. Our principal goal here is to prove that every vector space has a basis. This result is important in the study of infinite-dimensional vector spaces because it is often difficult to construct an explicit basis for such a space. Consider, for example, the vector space of real numbers over the field of rational numbers. There is no obvious way to construct a basis for this space, and yet it follows from the results of this section that such a basis does exist.

The difficulty that arises in extending the theorems of the preceding section to infinite-dimensional vector spaces is that the principle of mathematical induction, which played a crucial role in many of the proofs of Section 1.6, is no longer adequate. Instead, a more general result called the maximal principle is needed. Before stating this principle, we need to introduce some terminology.

Definition. Let ℱ be a family of sets. A member M of ℱ is called maximal (with respect to set inclusion) if M is contained in no member of ℱ other than M itself.

Example 1
Let ℱ be the family of all subsets of a nonempty set S. (This family ℱ is called the power set of S.) The set S is easily seen to be a maximal element of ℱ.

Example 2
Let S and T be disjoint nonempty sets, and let ℱ be the union of their power sets. Then S and T are both maximal elements of ℱ.

Example 3
Let ℱ be the family of all finite subsets of an infinite set S. Then ℱ has no maximal element. For if M is any member of ℱ and s is any element of S that is not in M, then M ∪ {s} is a member of ℱ that contains M as a proper subset.

Definition. A collection of sets C is called a chain (or nest or tower) if for each pair of sets A and B in C, either A ⊆ B or B ⊆ A.

Example 4
For each positive integer n let An = {1, 2, ..., n}. Then the collection of sets C = {An : n = 1, 2, 3, ...} is a chain. In fact, Am ⊆ An if and only if m ≤ n.

With this terminology we can now state the maximal principle.

Maximal Principle.4 Let ℱ be a family of sets. If, for each chain C ⊆ ℱ, there exists a member of ℱ that contains each member of C, then ℱ contains a maximal member.

Because the maximal principle guarantees the existence of maximal elements in a family of sets satisfying the hypothesis above, it is useful to reformulate the definition of a basis in terms of a maximal property. In Theorem 1.12, we show that this is possible; in fact, the concept defined next is equivalent to a basis.

Definition. Let S be a subset of a vector space V. A maximal linearly independent subset of S is a subset B of S satisfying both of the following conditions.
(a) B is linearly independent.
(b) The only linearly independent subset of S that contains B is B itself.

4 The Maximal Principle is logically equivalent to the Axiom of Choice, which is an assumption in most axiomatic developments of set theory. For a treatment of set theory using the Maximal Principle, see John L. Kelley, General Topology, Graduate Texts in Mathematics Series, Vol. 27, Springer-Verlag, 1991.

Example 5
Example 2 of Section 1.4 shows that {x^3 - 2x^2 - 5x - 3, 3x^3 - 5x^2 - 4x - 9} is a maximal linearly independent subset of
S = {2x^3 - 2x^2 + 12x - 6, x^3 - 2x^2 - 5x - 3, 3x^3 - 5x^2 - 4x - 9}
in P3(R). In this case, however, any subset of S consisting of two polynomials is easily shown to be a maximal linearly independent subset of S. Thus maximal linearly independent subsets of a set need not be unique.

A basis β for a vector space V is a maximal linearly independent subset of V, because
1. β is linearly independent by definition.
2. If v ∈ V and v ∉ β, then β ∪ {v} is linearly dependent by Theorem 1.7 (p. 39) because span(β) = V.
Our next result shows that the converse of this statement is also true.

Theorem 1.12. Let V be a vector space and S a subset that generates V. If β is a maximal linearly independent subset of S, then β is a basis for V.

Proof. Let β be a maximal linearly independent subset of S. Because β is linearly independent, it suffices to prove that β generates V. We claim that S ⊆ span(β), for otherwise there exists a v ∈ S such that v ∉ span(β). Since Theorem 1.7 (p. 39) implies that β ∪ {v} is linearly independent, we have contradicted the maximality of β. Therefore S ⊆ span(β). Because span(S) = V, it follows from Theorem 1.5 (p. 30) that span(β) = V.

Thus a subset of a vector space is a basis if and only if it is a maximal linearly independent subset of the vector space. Therefore we can accomplish our goal of proving that every vector space has a basis by showing that every vector space contains a maximal linearly independent subset. This result follows immediately from the next theorem.

Theorem 1.13. Let S be a linearly independent subset of a vector space V. There exists a maximal linearly independent subset of V that contains S.

Proof. Let ℱ denote the family of all linearly independent subsets of V that contain S. In order to show that ℱ contains a maximal element, we must show that if C is a chain in ℱ, then there exists a member U of ℱ that contains each member of C. We claim that U, the union of the members of C, is the desired set. Clearly U contains each member of C, and so it suffices to prove

that U ∈ ℱ (i.e., that U is a linearly independent subset of V that contains S). Because each member of C is a subset of V containing S, we have S ⊆ U ⊆ V. Thus we need only prove that U is linearly independent. Let u1, u2, ..., un be in U and a1, a2, ..., an be scalars such that a1u1 + a2u2 + ··· + anun = 0. Because ui ∈ U for i = 1, 2, ..., n, there exists a set Ai in C such that ui ∈ Ai. But since C is a chain, one of these sets, say Ak, contains all the others. Thus ui ∈ Ak for i = 1, 2, ..., n. However, Ak is a linearly independent set, so a1u1 + a2u2 + ··· + anun = 0 implies that a1 = a2 = ··· = an = 0. It follows that U is linearly independent.

The maximal principle implies that ℱ has a maximal element. This element is easily seen to be a maximal linearly independent subset of V that contains S.

Corollary. Every vector space has a basis.

It can be shown, analogously to Corollary 1 of the replacement theorem (p. 46), that every basis for an infinite-dimensional vector space has the same cardinality. (Sets have the same cardinality if there is a one-to-one and onto mapping between them.) (See, for example, N. Jacobson, Lectures in Abstract Algebra, vol. 2, Linear Algebra, D. Van Nostrand Company, New York, 1953, p. 240.)

EXERCISES

1. Label the following statements as true or false.
   (a) Every family of sets contains a maximal element.
   (b) Every chain contains a maximal element.
   (c) If a family of sets has a maximal element, then that maximal element is unique.
   (d) If a chain of sets has a maximal element, then that maximal element is unique.
   (e) A basis for a vector space is a maximal linearly independent subset of that vector space.
   (f) A maximal linearly independent subset of a vector space is a basis for that vector space.

2. Show that the set of convergent sequences is an infinite-dimensional subspace of the vector space of all sequences of real numbers. (See Exercise 21 in Section 1.3.)

3. Let V be the set of real numbers regarded as a vector space over the field of rational numbers. Prove that V is infinite-dimensional. Hint: Use the fact that π is transcendental, that is, π is not a zero of any polynomial with rational coefficients.

Exercises 4-7 extend other results from Section 1.6 to infinite-dimensional vector spaces.

4. Let W be a subspace of a (not necessarily finite-dimensional) vector space V. Prove that any basis for W is a subset of a basis for V.

5. Prove the following infinite-dimensional version of Theorem 1.8 (p. 43): Let β be a subset of an infinite-dimensional vector space V. Then β is a basis for V if and only if for each nonzero vector v in V, there exist unique vectors u1, u2, ..., un in β and unique nonzero scalars c1, c2, ..., cn such that v = c1u1 + c2u2 + ··· + cnun.

6. Prove the following generalization of Theorem 1.9 (p. 44): Let S1 and S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly independent and S2 generates V, then there exists a basis β for V such that S1 ⊆ β ⊆ S2. Hint: Apply the maximal principle to the family of all linearly independent subsets of S2 that contain S1, and proceed as in the proof of Theorem 1.13.

7. Prove the following generalization of the replacement theorem: Let β be a basis for a vector space V, and let S be a linearly independent subset of V. There exists a subset S1 of β such that S ∪ S1 is a basis for V.

INDEX OF DEFINITIONS FOR CHAPTER 1

Additive inverse 12
Basis 43
Cancellation law 11
Chain 59
Column vector 8
Degree of a polynomial 9
Diagonal entries of a matrix 17
Diagonal matrix 18
Dimension 47
Finite-dimensional space 46
Generates 30
Infinite-dimensional space 47
Lagrange interpolation formula 52
Lagrange polynomials 52
Linear combination 24
Linearly dependent 36
Linearly independent 37
Matrix 8
Maximal element of a family of sets 58
Maximal linearly independent subset 59
n-tuple 7
Polynomial 9
Row vector 8
Scalar 7
Scalar multiplication 6
Sequence 11
Span of a subset 30
Spans 30
Square matrix 9
Standard basis for F^n 43
Standard basis for Pn(F) 43
Subspace 16
Subspace generated by the elements of a set 30
Symmetric matrix 17
Trace 18
Transpose 17
Trivial representation of 0 36

Vector 7
Vector addition 6
Vector space 6
Zero matrix 8
Zero polynomial 9
Zero subspace 16
Zero vector 12
Zero vector space 15

2. In calculus.7 and 5. Later we use these transformations to study rigid motions in R n (Section 6. These special functions are called linear transformations.1 LINEAR TRANSFORMATIONS.1 2. we consider a number of examples of linear transformations. In the remaining chapters.7* T r a n s f o r m a t i o n s M a t r i c e s Linear Transformations.5 2. NULL SPACES. reflections. we see further examples of linear transformations occurring in both the physical and the social sciences. Null spaces. Many of these transformations are studied in more detail in later sections.1). AND RANGES In this section. Throughout this chapter. 3. Recall that a function T with domain V and codomain W is denoted by 64 .3 2.10).2).1) provide us with another class of linear transformations. and 4 of Section 2. and they abound in both pure and applied mathematics. and projections (see Examples 2.6* 2. we assume that all vector spaces are over a common field F. It is now natural to consider those functions defined on vector spaces that in some sense "preserve" the structure. the operations of differentiation and integration provide us with two of the most important examples of linear transformations (see Examples 6 and 7 of Section 2.2 2. and Ranges The Matrix Representation of a Linear Transformation Composition of Linear Transformations and Matrix Multiplication Invertibility and Isomorphisms The Change of Coordinate Matrix Dual Spaces Homogeneous Linear Differential Equations with Constant Coefficients i n Chapter 1. rotations. These two examples allow us to reformulate many of the problems in differential and integral equations in terms of linear transformations on particular vector spaces (see Sections 2. In geometry. we developed the theory of abstract vector spaces in considerable detail.2 L i n e a r a n d 2.4 2.

T: V → W. (See Appendix B.)

Definition. Let V and W be vector spaces (over F). We call a function T: V → W a linear transformation from V to W if, for all x, y ∈ V and c ∈ F, we have
(a) T(x + y) = T(x) + T(y) and
(b) T(cx) = cT(x).

If the underlying field F is the field of rational numbers, then (a) implies (b) (see Exercise 37), but, in general, (a) and (b) are logically independent. See Exercises 38 and 39.

We often simply call T linear. The reader should verify the following properties of a function T: V → W. (See Exercise 7.)

1. If T is linear, then T(0) = 0.
2. T is linear if and only if T(cx + y) = cT(x) + T(y) for all x, y ∈ V and c ∈ F.
3. If T is linear, then T(x - y) = T(x) - T(y) for all x, y ∈ V.
4. T is linear if and only if, for x1, x2, ..., xn ∈ V and a1, a2, ..., an ∈ F, we have
   T(a1x1 + a2x2 + ··· + anxn) = a1T(x1) + a2T(x2) + ··· + anT(xn).

We generally use property 2 to prove that a given transformation is linear.

Example 1
Define T: R^2 → R^2 by T(a1, a2) = (2a1 + a2, a1). To show that T is linear, let c ∈ R and x, y ∈ R^2, where x = (b1, b2) and y = (d1, d2). Since
cx + y = (cb1 + d1, cb2 + d2),
we have
T(cx + y) = (2(cb1 + d1) + cb2 + d2, cb1 + d1).
Also
cT(x) + T(y) = c(2b1 + b2, b1) + (2d1 + d2, d1) = (2cb1 + cb2 + 2d1 + d2, cb1 + d1).
So T is linear.
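Property 2 gives a practical test for linearity. As an informal illustration for the map of Example 1 (a minimal Python/NumPy sketch with randomly chosen vectors, not part of the exposition):

```python
import numpy as np

def T(v):
    # The transformation of Example 1: T(a1, a2) = (2*a1 + a2, a1).
    a1, a2 = v
    return np.array([2 * a1 + a2, a1])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
c = 3.0
# Property 2: T(cx + y) = cT(x) + T(y) for the sampled vectors.
print(np.allclose(T(c * x + y), c * T(x) + T(y)))   # True
```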

As we will see in Chapter 6, the applications of linear algebra to geometry are wide and varied. The main reason for this is that most of the important geometrical transformations are linear. Three particular transformations that we now consider are rotation, reflection, and projection. We leave the proofs of linearity to the reader.

[Figure 2.1: (a) Rotation; (b) Reflection; (c) Projection]

Example 2
For any angle θ, define Tθ: R^2 → R^2 by the rule: Tθ(a1, a2) is the vector obtained by rotating (a1, a2) counterclockwise by θ if (a1, a2) ≠ (0, 0), and Tθ(0, 0) = (0, 0). Then Tθ: R^2 → R^2 is a linear transformation that is called the rotation by θ.

We determine an explicit formula for Tθ. Fix a nonzero vector (a1, a2) ∈ R^2. Let α be the angle that (a1, a2) makes with the positive x-axis (see Figure 2.1(a)), and let r = sqrt(a1^2 + a2^2). Then a1 = r cos α and a2 = r sin α. Also, Tθ(a1, a2) has length r and makes an angle α + θ with the positive x-axis. It follows that
Tθ(a1, a2) = (r cos(α + θ), r sin(α + θ))
           = (r cos α cos θ - r sin α sin θ, r cos α sin θ + r sin α cos θ)
           = (a1 cos θ - a2 sin θ, a1 sin θ + a2 cos θ).
Finally, observe that this same formula is valid for (a1, a2) = (0, 0). It is now easy to show, as in Example 1, that Tθ is linear.

Example 3
Define T: R^2 → R^2 by T(a1, a2) = (a1, -a2). T is called the reflection about the x-axis. (See Figure 2.1(b).)

Example 4
Define T: R^2 → R^2 by T(a1, a2) = (a1, 0). T is called the projection on the x-axis. (See Figure 2.1(c).)
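The formula just derived is multiplication by a 2 x 2 matrix of cosines and sines. As an informal numerical check (a Python/NumPy sketch with arbitrary vectors, not part of the exposition):

```python
import numpy as np

def T_theta(v, theta):
    # Rotation by theta: (a1, a2) -> (a1 cos t - a2 sin t, a1 sin t + a2 cos t).
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ v

print(T_theta(np.array([1.0, 0.0]), np.pi / 2))      # approximately (0, 1)

# Linearity: rotating a linear combination gives the same combination of rotations.
u, w = np.array([2.0, -3.0]), np.array([0.5, 4.0])
print(np.allclose(T_theta(3 * u + w, 0.7), 3 * T_theta(u, 0.7) + T_theta(w, 0.7)))  # True
```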

We now look at some additional examples of linear transformations.

Example 5
Define T: Mm×n(F) → Mn×m(F) by T(A) = A^t, where A^t is the transpose of A, defined in Section 1.3. Then T is a linear transformation by Exercise 3 of Section 1.3.

Example 6
Define T: Pn(R) → Pn-1(R) by T(f(x)) = f'(x), where f'(x) denotes the derivative of f(x). To show that T is linear, let g(x), h(x) ∈ Pn(R) and a ∈ R. Now
T(ag(x) + h(x)) = (ag(x) + h(x))' = ag'(x) + h'(x) = aT(g(x)) + T(h(x)).
So by property 2 above, T is linear.

Example 7
Let V = C(R), the vector space of continuous real-valued functions on R. Let a, b ∈ R, a < b. Define T: V → R by
T(f) = the integral of f(t) dt from a to b
for all f ∈ V. Then T is a linear transformation because the definite integral of a linear combination of functions is the same as the linear combination of the definite integrals of the functions.

Two very important examples of linear transformations that appear frequently in the remainder of the book, and therefore deserve their own notation, are the identity and zero transformations.

For vector spaces V and W (over F), we define the identity transformation I_V: V → V by I_V(x) = x for all x ∈ V and the zero transformation T_0: V → W by T_0(x) = 0 for all x ∈ V. It is clear that both of these transformations are linear. We often write I instead of I_V.

We now turn our attention to two very important sets associated with linear transformations: the range and null space. The determination of these sets allows us to examine more closely the intrinsic properties of a linear transformation.

Definitions. Let V and W be vector spaces, and let T: V → W be linear. We define the null space (or kernel) N(T) of T to be the set of all vectors x in V such that T(x) = 0; that is, N(T) = {x ∈ V : T(x) = 0}.
We define the range (or image) R(T) of T to be the subset of W consisting of all images (under T) of vectors in V; that is, R(T) = {T(x) : x ∈ V}.
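For a transformation given by a matrix, the null space and range can be computed exactly with a computer algebra system. The following SymPy sketch is an informal illustration only; the matrix is an arbitrary example corresponding to the map T(a1, a2, a3) = (a1 - a2, 2 a3).

```python
from sympy import Matrix

A = Matrix([[1, -1, 0],
            [0,  0, 2]])     # the matrix of T(a1, a2, a3) = (a1 - a2, 2*a3)

print(A.nullspace())    # basis for N(T): the single vector (1, 1, 0)
print(A.columnspace())  # basis for R(T): two vectors, so R(T) is all of F^2
```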

2. then R(T) = span(T(/?)) = span({T(<. Clearly T(vi) £ R(T) for each i.i). Since T(0 V ) = #w. Theorem 2.. respectively.. Thus x + y £ R(T) and ex £ R(T)...y £ N(T) and c£ F. Then there exist v and w in V such that T(v) — x and T(w) = y. If ft = {vi. 2 Linear Transformations and Matrices Let V and W be vector spaces. Proof. It is left as an exercise to verify that N(T) = {(a.y £ R(T) and c £ F.Hence x + y £ N(T) and ex £ N(T). 2 o 3 ) . Then N(l) = {0}.5 (p. so that N(T) is a subspace of V. and T(cv) = cT(v) — ex. So T(v 4. 02. Because T(0v) = 0\N. vn} is a basis for V. and T(cx) = cT(x) = c 0 w = #w. • In Examples 8 and 9.. N(T0) = V. To clarify the notation.. Theorem 2. we have that 0\j £ N(T).03) = (01 . andR(T o ) = {0}.. With this accomplished. Because R(T) is a subspace. . . and let T: V — • W be linear.0):a£R} and R(T) = R 2 . 30). a. Let V and W be vector spaces and T: V Then N(T) and R(T) are subspaces ofV and W.68 Example 8 Chap. respectively. a basis for the range is easy to discover using the technique of Example 6 of Section 1. Proof. W be linear.1.. R(l) = V.T( V2 ). and let I: V — > V and TQ: V — > W be the identity and zero transformations. 1 The next theorem provides a method for finding a spanning set for the range of a linear transformation. T(vn)}) = span(T(/3)) by Theorem 1. Let x. .w) = T(v) 4.. The next result shows that this is true in general.. Now let x.T(vn)}). T(v2). Then T(x + y) = T(x) + T(y) = 0yv + 0w = #w. we see that the range and null space of each of the linear transformations is a subspace.i). we have that #w € R(T). so R(T) is a subspace of W. R(T) contains span({T(i.T(w) = x + y. Let V and W be vector spaces. we use the symbols Oy and 0w to denote the zero vectors of V and W.6. respectively.v2.a 2 . • Example 9 Let T : R3 — > R2 be the linear transformation defined by T(ai.

Now suppose that w ∈ R(T). Then w = T(v) for some v ∈ V. Because β is a basis for V, we have
v = a1v1 + a2v2 + ··· + anvn for some a1, a2, ..., an ∈ F.
Since T is linear, it follows that
w = T(v) = a1T(v1) + a2T(v2) + ··· + anT(vn) ∈ span(T(β)).
So R(T) is contained in span(T(β)).

It should be noted that Theorem 2.2 is true if β is infinite, that is, R(T) = span({T(v) : v ∈ β}). (See Exercise 33.)

The next example illustrates the usefulness of Theorem 2.2.

Example 10
Define the linear transformation T: P2(R) → M2×2(R) by
T(f(x)) = (f(1) - f(2)   0
              0        f(0))
(rows of the 2 x 2 matrix shown on separate lines). Since β = {1, x, x^2} is a basis for P2(R), we have
R(T) = span(T(β)) = span({T(1), T(x), T(x^2)})
     = span({ (0 0; 0 1), (-1 0; 0 0), (-3 0; 0 0) })
     = span({ (0 0; 0 1), (-1 0; 0 0) }).
Thus we have found a basis for R(T), and so dim(R(T)) = 2.

As in Chapter 1, we measure the "size" of a subspace by its dimension. The null space and range are so important that we attach special names to their respective dimensions.

Definitions. Let V and W be vector spaces, and let T: V → W be linear. If N(T) and R(T) are finite-dimensional, then we define the nullity of T, denoted nullity(T), and the rank of T, denoted rank(T), to be the dimensions of N(T) and R(T), respectively.

Reflecting on the action of a linear transformation, we see intuitively that the larger the nullity, the smaller the rank: the more vectors that are carried into 0, the smaller the range. The same heuristic reasoning tells us that the larger the rank, the smaller the nullity. This balance between rank and nullity is made precise in the next theorem.
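As a numerical preview of that theorem (an informal SymPy sketch with an arbitrarily chosen matrix, not part of the exposition), the rank and nullity of a matrix transformation can be computed directly and compared with the dimension of the domain:

```python
from sympy import Matrix

# An arbitrary linear transformation from F^4 to F^3, given by a matrix.
A = Matrix([[1, 2, 0, -1],
            [2, 4, 1,  0],
            [3, 6, 1, -1]])     # third row = first row + second row

rank = A.rank()                 # dimension of the range of x -> A x
nullity = len(A.nullspace())    # dimension of the null space of x -> A x
print(rank, nullity, rank + nullity == A.cols)   # 2 2 True: rank + nullity = 4
```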

. Let V and W be vector spaces.... we have R(T) = s p a n ( { T M ... we have T / So ( J2 biVi) \i=fc+i / =°- ^2 bivi i=k+l Hence there exist c.. 2 Linear Transformations and Matrices Theorem 2. 51).. Using Theorem 2. . ..ck n k ^2 biVi = y]ciVi i=k+l i=l e N T ( )- £ F such that or k n ^2(~Ci)vi + 22 °ivi = °i=l i=k+l Since ft is a basis for V.3 (Dimension Theorem). bn £ F.v2. We claim that S = {T(vk+i). If V is finite-dimensional.2 = 3. Interestingly. T(vk+2)... for a linear transformation..... Suppose that n yi biT(vi) = 0 i=fc+l for 6 fc+ i. and let T: V — > W be linear. This is demonstrated in the next two theorems. The reader should review the concepts of "one-to-one" and "onto" presented in Appendix B.vk} is a basis for N(T)... both of these concepts are intimately connected to the rank and nullity of the transformation. Proof.. B If we apply the dimension theorem to the linear transformation T in Example 9. First we prove that S generates R(T).2 and the fact that T(v-i) — 0 for 1 < i < k.T(v f c + 2 ). .. Hence S is linearly independent. therefore rank(T) = n — k.rank(T) = dim(V).11 (p. bk+2.\. Notice that this argument also shows that T(i>fc+i).. and {vi. T(vn)} = span({T(v/ c + i).. By the corollary to Theorem 1...vk} to a basis ft = {vi. Now we prove that S is linearly independent... T(vk+2).. .70 Chap.v2. T(v2)... Suppose that dim(V) = n.... we have bi — 0 for all i. we may extend {vi. dim(N(T)) = k.v2... T(vn)} is a basis for R(T). T(vn)} = span(S). we have that nullity(T) 4. then nullity(T) 4. so nullity(T) = 1.vn} for V. T(vn) are distinct. Using the fact that T is linear.c2.

Then 0 = T(x) — T(y) = T(x — y) by property 3 on page 65.4. if and only if rank(T) = dim(W). Since T is one-to-one. 16. (a) T is one-to-one. we have nullity (T) 4.1 Linear Transformations. Now assume that N(T) = {0}.) The linearity of T in Theorems 2. Then the following are equivalent. Let V and W be vector spaces.11 (p. (b) T is onto. and if and only if dim(R(T)) = dim(W). then it does not follow that one-to-one and onto are equivalent. this equality is equivalent to R(T) = W. From the dimension theorem. Suppose that T is one-to-one and x £ N(T). Then T is one-to-one if and only if N(T) = {0}. for it is easy to construct examples of functions from R into R that are not one-to-one. I We note that if V is not finite-dimensional and T: V — » V is linear. Therefore x — y £ N(T) = {0}.5. the conditions of one-to-one and onto are equivalent in an important special case. (c) rank(T)=dim(V). 2. Now. and 21. Let V and W be vector spaces of equal (finite) dimension. By Theorem 1. the definition of T being onto. and suppose that T(x) = T(y). and vice versa. and Ranges 71 Theorem 2. Surprisingly. Theorem 2. (See Exercises 15. or x = y. but are onto. The next two examples make use of the preceding theorems in determining whether a given linear transformation is one-to-one or onto.4. Null Spaces. we have x — 0. if and only if rank(T) = dim(V). Then T(x) = 0 — T(0). if and only if nullity(T) = 0.Sec. This means that T is one-to-one. Proof. I The reader should observe that Theorem 2.5 is essential. and let T: V — • W be linear. Proof.rank(T) = dim(V). with the use of Theorem 2. Hence N(T) = {0}. So x — y = 0. Example 11 Let T: P2(R) —* P3(R) be the linear transformation defined by T(f(x)) = 2f'(x)-r fX3f(t)dt. 50). Jo .4 allows us to conclude that the transformation defined in Example 9 is not one-to-one. we have that T is one-to-one if and only if N(T) = {0}.4 and 2. and let T: V — > W be linear.

. .o 2 ). Then S is linearly independent in P 2 (i?) because T(S) = { ( 2 . Let S = {2 — x + 3x2.\x2. Ax + x3}).l . .72 Now Chap.x + x2.2 ) } is linearly independent in R3. • Example 12 Let T : F2 — + F2 be the linear transformation defined by T(oi. we transferred a property from the vector space of polynomials to a property in the vector space of 3-tuples. . . T(x 2 )}) = span({3x. This technique is exploited more fully later. u 2 . ( 0 .-x2. Since {3x.a 2 ) = (oi + o 2 . From the dimension theorem..x3} is linearly independent. w2. N(T) = {0}. So nullity(T) = 0. Hence Theorem 2. T(x).4 that T is one-to-one. l . nullity(T) + 3 = 3. which follows from the next theorem and corollary. rank(T) = 3.a i x 4-a 2 a: 2 ) = (ao. . 2 Linear Transformations and Matrices R(T) = span({T(l). We conclude from Theorem 2. 2 . It is easy to see that N(T) = {0}. n. then a subset S is linearly independent if and only if T(S) is linearly independent. so T is one-to-one. 2 4. Since dim(P3(i?)) = 4.6. 0 . • In Exercise 14.. l ) . . Example 13 Let T: P2(R) —• R3 be the linear transformation defined by T(a 0 4. 3 ) . • In Example 13. . and suppose that {vi. Let V and W be vector spaces over F. is used frequently throughout the book. This result. . .1 — 2. Theorem 2. o i ) . vn} is a basis for V. it is stated that if T is linear and one-to-one.5 tells us that T must be onto.oi. and therefore. there exists exactly one linear transformation T: V — * W such that T(vi) = W{ for i = 1 .. T is not onto. One of the most important properties of a linear transformation is that it is completely determined by its action on a basis. Clearly T is linear and one-to-one. .x2}.Ax 4. 2 4. ( l . . Example 13 illustrates the use of this result. wn in W. For w\.

2. . T: V — > W are linear and U(??. Null Spaces. V has a finite basis {vi..=i have U(x) = y^QjUfoj) = J^OiWi = T(x). 2 . . v £ V and d £ F. .. Then we may write n u — y biVi and i=l for some scalars bi. ..v = ^2^bi i=1 So / T(du 4.Sec.1 Linear Transformations. i=\ where a i o 2 . = d y] bjWj 4. for ?' = 1 . . n. Let V and W be vector spaces. n. ?. 2 . (c) T is unique: Suppose that U: V — * W is linear and U(t^) = Wi for i = 1 . ... i=l i=l Hence U = T. du 4. Define T: V -» W by n T(x) = J ^ o ^ . ci. . .T(v). . bn. Then for x € V with = ^OiUi.c^w. If U. and Ranges Proof. Then X = y^QtTJj..) = T(f^) for i = 1 . . CiWj = n v = y CiVi i=l cn. .. . vn}. . . Thus + ('i)v>- dT(u) 4.. Let x £ V. • • •. n. I Corollary. . c2.22 i=i i=l i=l (b) Clearly T(vi) = «. . . 2 . and suppose that. i=i 73 (a) T is linear: Suppose that u. b2. v2..v) — y](dbj 4. o n are unique scalars. then U = T..

In each part. (c) T is one-to-one if and only if the only vector x such that T(x) = 0 is x = 0.o 2 ) . and T is a function from V to W. If we know that U(l.203). This follows from the corollary and from the fact that {(1. (1. x 2 € V and yi. • EXERCISES 1.2). T: R2 -> R3 defined by T(a l 5 a 2 ) = (01 4-o 2 . (g) If T.2) = (3. then T(#v) = #w(e) If T is linear.0. U: V — > W are both linear and agree on a basis for V.3) and U(l. Label the following statements as true or false.f'(x). then T carries linearly independent subsets of V onto linearly independent subsets of W. 3 a i ) . T: M 2 x 3 ( F ) -» M 2 x 2 ( F ) defined by T On 012 0 2 1 0 22 013 \ 023 /'2an — 012 ai3 + 2ai 2 0 0 5. then T preserves sums and scalar products.rank(T) = dim(W).o 2 ) = (2a2 . For Exercises 2 through 6. (b) If T(x 4. Then compute the nullity and rank of T. 2 Linear Transformations and Matrices Let T : R2 — > R2 be the linear transformation defined by T(oi. then T is linear. and verify the dimension theorem. (h) Given x i . then U = T. 4.1)} is a basis for R2.o i . T: P2(R) .2o! . 2. and suppose that U: R2 — > R2 is linear.3). 1) = (1.74 Example 14 Chap. Finally.03) = (ai — 02. use the appropriate theorems in this section to determine whether T is one-to-one or onto.P3(R) defined by T(/(x)) = x / ( x ) 4.y2 £ W. there exists a linear transformation T: V — > W such that T(xi) = yi and T(x 2 ) = y2. (d) If T is linear. T: R3 — > R2 defined by T(oi. V and W are finite-dimensional vector spaces (over F). (f) If T is linear. . prove that T is a linear transformation.y) = T(x) + T(y). then nullity(T) 4. 3. (a) If T is linear. and find bases for both N(T) and R(T). then T = U.02.

. For each of the following parts. Prove that T linear and one-to-one. .Tl)} is a basis for W.3) = (1.vk} is chosen so that T(vi) = Wi for i = 1 ..it)2. Prove properties 1.vn} is a basis for V and T is one-to-one and onto. . T(?. (a) Prove that T is one-to-one if and only if T carries linearly independent subsets of V onto linearly independent subsets of W.3) that tv(A) = Y/Aii. Define T:P(JJ)-P(Jl) by T(/(x)) = f(t)dt. Let V and W be vector spaces.. 9. (a) (b) (c) (d) (e) T(oi.0) = (1. .Sec.2 . tion 1. Prove that S is linearly independent if and only if T(S) is linearly independent.3) = (1. 3.4).v2.2) and T(2. and Ranges 6.0. 75 Recall (Example 4. • • • . a2) T(oi.. 0. Prove that the transformations in Examples 2 and 3 are linear. . (b) Suppose that T is one-to-one and that 5 is a subset of V. Prove that T(ft) = {T(vi).1) and T ( . Sec- 7.v2..1 Linear Transformations. 2 . Is there a linear transformation T: R3 — » R2 such that T(l. 15. Prove that if S — {vi. then S is linearly independent..a 2 ) T(ai.a 2 ) T(ai.-1. T: R2 — > R2 is a function. . Null Spaces.o 2 ) (ai 4-1.o 2 ) (oi.3)? Is T one-to-one? 11. (c) Suppose ft = {vi.T(v2). Let V and W be vector spaces and T : V — > W be linear. a 2 ) = = = = = (l. let T: V — > W be linear.. and 4 on page 65.0) (|oi|. .4). W h a t / s T(8. T: M n x n ( F ) -¥ F defined by T(A) = tr(A). but not onto. . and let {w\.6 ) = (2.5). 0 . a 2 ) 10. 2.wk} be a linearly independent subset of R(T). T(1.o 2 ) T(ai.of) (sinai. k . .1)? 13. 14.. and T(l. 1) = (2. 8. state why T is not linear. What is T(2... Prove that there exists a linear transformation T: R2 — > R3 such that T ( l .11)? 12. l ) = (1. Recall the definition of P(R) on page 10. In this exercise. 2. Suppose that T: R2 -> R2 is linear.

. a 2 .76 Chap. Let V and W be vector spaces with subspaces Vi and Wi. . but not one-to-one. a vector space and Wi and W2 be subspaces of (Recall the definition of direct sum given in the function T: V — > V is called the projection on xi 4. Definition. Define the functions T. but not one-to-one.. (c) Prove that U is one-to-one. U: V -» V by T ( a i . pn . o 2 . exercises of Section 1. ) = (0. (b) Prove that T is onto. R2 such that 19. then T cannot be onto. 22. and c such that T(x. for x = T(x) = x\. . Can you generalize this result for T: F n —• F? State and prove an analogous result for T ..x2 with xi £ Wi and x2 £ W 2 . Prove that T is onto.) and U ( a i . Let T: P(R) -> P{R) be defined by T(/(x)) = f'(x). 18. The following definition is used in Exercises 24 27 and in Exercise 30.ai. Include figures for each of the following parts. • • •)• T and U arc called the left shift and right shift operators on V.. If T: V — » W is linear. respectively. Let V be V such that V = W] © W 2 . Let T: R3 — » R be linear. we have 24. Let T: R2 — * R2. .03. z) = ax + by + cz for all (x. b. . (a) Prove that if dim(V) < dim(W). (b) Prove that if dim(V) > dim(W). respectively. z) £ R3. . Let T: R — > R be linear. Give an example of a linear transformation T: R2 N(T) = R(T). pra 23.3. but not onto. Show that there exist scalars a. ^ (a) Prove that T and U are linear. ) = (02.) A Wi along VV2 if. y. Give an example of distinct linear transformations T and U such that N(T) = N(U) and R(T) = R(U). . prove that. then T cannot be one-to-one. 20. Let V and W be finite-dimensional vector spaces and T: V linear. 21. Describe geometrically the possibilities for the null space of T. Let V be the vector space of sequences described in Example 5 of Section 1.2. y. 2 Linear Transformations and Matrices Recall that T is W be 16. linear. 17. T(V ( ) is a subspace of W and that {x £ V: T(x) £ Wi} is a subspace of V. Hint: Use Exercise 22.a 2 .

show that T is the projection on the xyplane along the 2-axis. that is. a): a £ R}. Prove that there exists a subspace W and a function T: V — > V such that T is a projection on W along W . Null Spaces. Prove that W is T-invariant and that Tw — lw31. Warning: Do not assume that W is T-invariant or that T is a projection unless explicitly stated. and let T: V —• V be linear. Let T: R: R3.3. The following definitions are used in Exercises 28 32. (b) Give an example of a subspace W of a vector space V such that there are two projections on W along two (distinct) subspaces. T(W) C W. 6.c) = (a — c. (c) If T(a. Suppose that W is a subspace of a finite-dimensional vector space V. (b) Find a formula for T(a. Exercises 28-32 assume that W is a subspace of a vector space V and that T: V — > V is linear. 26. 0). V. A subspace W of V is said to be T-invariant ifT(x) £ W for every x £ W. we define the restriction of T on W to be the function T w : W — • W defined by T w ( x ) = T(x) for all x £ W.V . 0. and N(T) are all T-invariant. If W is T-invariant. b). where T represents the projection on the y-axis along the line L — {(s. b. (b) Find a formula for T(a.b. Prove that W. c). prove that Tw is linear. 7f W is T-iuvariaut. where T represents the projection on the y-axis along the x-axis. 2.1 Linear Transformations. Prove that the subspaces {0}. Suppose that T is the projection on W along some subspace W .0). assume that T: V — > V is the projection on Wi along W 2 . Describe T if Wi is the zero subspace. 30. 29. 28. 27.s): s £ R}.) (a) . Describe T if W. b. R(T). (a) (b) (c) (d) Prove that T is linear and W| = { J G V : T(X) = x}. b. (a) If T(o.Sec. where T represents the projection on the 2-axis along the xy-plane. Using the notation in the definition above. = R(T) and W 2 = N(T).c) = (a. Let V be a vector space. show that T is the projection on the x.(/-plane along the line L = {(o. 25. . (Recall the definition of direct sum given in the exercises of Section 1.b). Definitions. Suppose that V = R(T)©W and W is T-invariant. and Ranges 77 (a) Find a formula for T(a.

33. 32. 39. V has a basis ft. (a) Prove that V = R(T) + N(T). 35. f(y) = x. but V is not a direct sum of these two spaces. Let V be a finite-dimensional vector space and T: V — » V be linear. (b) Find a linear operator T x on V such that R(TT) n N(Ti) = { 0} but V is not a direct sum of R(Ti) and N(Ti). (b) Show that if V is finite-dimensional. Let x and y be two distinct vectors in ft. and let ft be a basis for V. Be careful to say in each part where finite-dimensionality is used. that is.6: Let V and W be vector spaces over a common field. Let V and T be as defined in Exercise 21. 38.78 Chap. (b) Suppose that R(T) n N(T) = {0}.2 for the case that ft is infinite. Let T: C — » C be the function defined by T(z) = z.13 (p. and define / : ft — > V by f(x) = y. then any additive function from V into W is a linear transformation. Prove that if V and W are vector spaces over the field of rational numbers. there exists a linear transformation . Thus the result of Exercise 35(a) above cannot be proved without assuming that V is finite-dimensional.T(y) for all x. 37. By the corollary to Theorem 1. R(T) = spm({T(v):v£ft}). / 36. By Exercise 34. Prove that there is an additive function T: R — > R (as defined in Exercise 37) that is not linear. Conclude that V being finite-dimensional is also essential in Exercise 35(b). Prove the following generalization of Theorem 2. 60). y £ V. A function T: V — > W between vector spaces V and W is called additive if T(x + y) = T(x) 4. Prove that V = R(T) © N(T). Prove that N(T W ) = N ( T ) n W and R(T W ) = T(W). and f(z) = z otherwise. Exercises 35 and 36 assume the definition of direct sum given in the exercises of Section 1. Prove that V = R(T) © N(T). 2 Linear Transformations and Matrices (a) Prove that W C N ( T ) . then W = N(T). 34. Then for any function f:ft—>\N there exists exactly one linear transformation T: V -+ W such that T(x) = f(x) for all x £ ft. Prove Theorem 2. (a) Suppose that V = R(T) + N(T). Prove that T is additive (as defined in Exercise 37) but not linear.3. Hint: Let V be the set of real numbers regarded as a vector space over the field of rational numbers. Suppose that W is T-invariant. (c) Show by example that the conclusion of (b) is not necessarily true if V is not finite-dimensional.

2 THE MATRIX REPRESENTATION OF A LINEAR TRANSFORMATION Until now. Similarly. We first need the concept of an ordered basis for a vector space. e 2 . . Compare the method of solving (b) with the method of deriving the same result as outlined in Exercise 35 of Section 1. . we can identify abstract vectors in an n-dimensional vector space with n-tuples. (c) Read the proof of the dimension theorem. that is. Definition. . e n } the standard ordered basis for F n .2 The Matrix Representation of a Linear Transformation 79 T: V -> V such that T(u) = f(u) for all u £ ft. (b) Suppose that V is finite-dimensional. as introduced next. we have studied linear transformations by examining their ranges and null spaces.3. ft = {ei. x n } the standard ordered basis for P n ( F ) . ei. The following exercise requires familiarity with the definition of quotient space given in Exercise 31 of Section 1. Then T is additive. Example 1 In F3. but for c = y/x. . . Let V be a vector space and W be a subspace of V. but ft ^ 7 as ordered bases. . . . An ordered basis for V is a basis for V endowed with a specific order. In fact. we call {ei. Let V be a finite-dimensional vector space. Use (a) and the dimension theorem to derive a formula relating dim(V). In this section. • For the vector space F n . x . for the vector space P n (F).e3} can be considered an ordered basis. 2. we embark on one of the most useful approaches to the analysis of a linear transformation on a finite-dimensional vector space: the representation of a linear transformation by a matrix. (a) Prove that 7 7 is a linear transformation from V onto V/W and that N(n) = W. dim(W). and dim(V/W). This identification is provided through the use of coordinate vectors. .e 2 . Now that we have the concept of ordered basis.Sec. Also 7 = {e2. an ordered basis for V is a finite sequence of linearly independent vectors in V that generates V. 40. we call {1. 63} is an ordered basis.6. T(cx) ^ cT(x). 2. we develop a one-to-one correspondence between matrices and linear transformations that allows us to utilize properties of one to study properties of the other. Define the mapping n: V -» V/W by jj(v) = v + W for v £ V.

by Let us now proceed with the promised matrix representation of a linear transformation.. v2. we call the mxn matrix A defined by Aij = a^ the matrix representation of T in the ordered bases ft and 7 and write A = [T]^. 1 < j < n. a 2 . Example 2 Let V = P2(R). . let oi.102. We illustrate the computation of [T]2 in the next several examples. vn} and 7 = {101.? < n. If f(x) = 4 + 6 x .4 in more detail. x . . x 2 } be the standard ordered basis for V.u2. 1 <i <m. . Definition. Then for each j.. wm}. 73).6 (p. then U = T by the corollary to Theorem 2.un} be an ordered basis for a finitedimensional vector space V. such that m T(VJ) = y2 aijwi i=l f° r 1 < . If V = W and ft = 7. a n be the unique scalars such that x = = y^QjMj.7 x 2 . Let ft — {ui. . Also observe that if U: V — > W is a linear transformation such that [U]^ = [T]^. . . For x £ V. Suppose that V and W are finite-dimensional vector spaces with ordered bases ft = {v\.. It is left as an exercise to show that the correspondence x —• [x]/3 provides us with a linear transformation from V to F n . i=i vector of x relative /ai\ o2 \x a = \On/ Notice that [ui\p = ei in the preceding definition. denoted [x]a.. then we write A = [T]p. We study this transformation in Section 2. Let T: V — + W be linear. Using the notation above. respectively. then We define the coordinate to ft. there exist unique scalars Oij £ F.80 Chap. Notice that the jih column of A is simply [T(vj-)]7. and let ft = {1. • • • . •. . 2 Linear Transformations and Matrices Definition.

Now T(l.4a 2 ).x 2 . 2. respectively.0 . ei}. Then T(l) = 0 .4e 3 .l 4 . 0) = (1. So [T]} / 0 1 0 0' = 0 0 2 0 0 0 0 3. respectively.Sec.2 .3a 2 .l + 0 . e 2 . 3.2) = lei + 0e2 4.x 4 .1) = (3.4 ) = 3 d + 0e2 . a2) = (oi 4. its coefficients give the entries of column j + 1 of [TH.x + 0-x 2 T(x 3 ) = 0 . -£ 01.0 .3 . / • Note that when T(x3) is written as a linear combination of the vectors of 7. . Let ft and 7 be the standard ordered bases for P3(R) and P 2 (i?).l 4 . then '2 mi' = 1 0 1 Example 4 Let T: P3(-R) —• P2(-#) be the linear transformation defined by T(/(x)) = f'(x). 81 Let ft and 7 be the standard ordered bases for R2 and R3. Hence If we let 7' = {e3.2 The Matrix Representation of a Linear Transformation Example 3 Let T : R2 —* R3 be the linear transformation defined by T(ai.0. x 2 T(x) = l-l-r-O-x + O-x2 T(x 2 ) = 0 . 2oi .0.0.x 4 . • .2e3 and T(0.
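The matrix of Example 4 can also be produced mechanically: apply T to each vector of the ordered basis and record the resulting coordinates as a column. The following Python/NumPy sketch is an informal illustration only; a polynomial a0 + a1 x + a2 x^2 + a3 x^3 is represented by its coefficient vector.

```python
import numpy as np

def deriv(p):
    # Derivative of a0 + a1 x + a2 x^2 + a3 x^3, written in the ordered basis {1, x, x^2}.
    a0, a1, a2, a3 = p
    return np.array([a1, 2 * a2, 3 * a3])

# Column j holds the coordinates of T applied to the j-th vector of the
# standard ordered basis {1, x, x^2, x^3} of P_3(R).
T_matrix = np.column_stack([deriv(e) for e in np.eye(4)])
print(T_matrix)
# [[0. 1. 0. 0.]
#  [0. 0. 2. 0.]
#  [0. 0. 0. 3.]]
```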

We denote the vector space of all linear transformations from V into W by £(V. W).U is linear.8. to have the result that both sums and scalar multiples of linear transformations are also linear. Definition.cU(x) 4. Then (aT 4. we see a complete identification of £(V. y £ V and c £ F.y) 4.82 Chap. (a) For all a. and let T. where V and W are vector spaces over F. Let V and W be vector spaces over a field F. respectively.U(x) for a/1 x £ V. the zero transformation. however. To make this more explicit. and hence that the collection of all linear transformations from V into W is a vector space over F. (b) Noting that T 0 . aT 4. and aT: V -> W by (aT)(x) = oT(x) for all x € V. 2 Linear Transformations and Matrices Now that we have defined a procedure for associating matrices with linear transformations. This identification is easily established by the use of the next theorem. So aT 4. U: V — > W be linear transformations.8 that this association "preserves" addition and scalar multiplication. II Definitions. W). We are fortunate. the collection of all linear transformations from V to W is a vector space over F.• W be linear.4. it is easy to verify that the axioms of a vector space are satisfied.U is linear. Let V and W be vector spaces over F. Theorem 2. where n and rn are the dimensions of V and W.y) = a[T(cx + y)]4-cU(x) + U(y) = a[cT(x) + T(y)] + cU(x)4-U(y) = acT(x) 4.U(cx 4.y) = aT(cx 4. we write £(V) instead of £(V. Let V and W be finite-dimensional vector spaces with ordered bases ft and 7. we show in Theorem 2. U: V . plays the role of the zero vector. Then . We define T 4.aT(y) + V(y) = c(aT + [)){x) + {aT + U){y). (a) Let x. W) with the vector space Mmxn(E). Theorem 2. Proof. U : V — » W be arbitrary functions. and let a £ F. these are just the usual definitions of addition and scalar multiplication of functions. and let T. In the case that V = W.7.UWcx 4. we need some preliminary discussion about the addition and scalar multiplication of linear transformations. In Section 2. (b) Using the operations of addition and scalar multiplication in the preceding definition. respectively.U: V —* W by (T + U)(x) = T(x) 4. £ F. Of course. Let T.

a 2 ) = (2ai 4.3ai 4.a 2 .2oi — 4a 2 ) and U(oi. 1 < j < n) such that rn T(t.U ] J ) i j = n .j) = yjoij'" .. unique scalars ay and />ij (1 < i < m. respectively.U using the preceding definitions. Let ft = {vi.wm}..3 -r 0I 2. So [T + U ] J .5ai .3a 2 . If we compute T 4.2a 2 ).a 2 ) = (ai 4. i=\ Thus ( [ T 4 . Then mj = [0 2 (as computed in Example 3).. Proof. . 01 -A. illustrating Theorem 2..2ai.[ 2 Oj. • which is simply [T]l + [U]^. J 4 .Sec..o 2 ) = (ai . we obtain (T 4. v2. .. and U '1 j2 .0.2oi.2a 2 .^ = ([T]54-[U]3) u .U)(a.. U ail( There exist l m U(vj) = 7 ] 6jjt^j i=l for 1 < j < n.w2. i j=] Hence (T + U K ^ ^ ^ O y + f c y J t B . vn} and 7 = {wi. .2 The Matrix Representation of a Linear Transformation (a) [T + % = [T]} + M}and 83 (b) [oT]I = a[T]^ for all scalars a. 2.. So (a) is proved. Example 5 Let T: R2 — > R3 and U: R2 — > R3 be the linear transformations respectively defined by T(oi.2a 2 ). Let ft and 7 be the standard ordered bases of R2 and R3. and the proof of (b) is similar.8.

x 2 }. .x1}. 7 (g) T: R -* R defined by T ( a i .84 Chap.03) = 2ai 4 . (0.ai +03). . R3 defined by T(ai.a2 — 303. ai 4. R 2 defined by T(ai. . . 4.2.3)}. . n m 2.1. Define T: Let 0 = Compute [T]^.3ai 4-4a2. compute [T]l. Label the following statements as true or false. n m For each linear transformation T: R — > R . M2x2(R) P 2 (fl) by T = (a + b) + (2d)x 4. respectively. W) is a vector space.03) = (2ai 4.a„) = ( o n . . . U: V — > W are linear transformations.3)}. (b) [T]J = [[)]} implies that T = U. .ai).an. (f) £(V. . a2. r n ( 0 T : R -* R defined by T ( a i .02) = (2ai — a2. . a i .x. (a) (b) (c) (d) T:R 2 T:R 3 T:R 3 T:R 3 R3 defined by T(01. and 7 = {1}- . Let ft and 7 be the standard ordered bases for R and R . aT 4. a 2 . a „ _ i .6x2.03) = (2a2 4 .1).0 3 .02.W) = £(W. (c) If m = dim(V) and n = dim(W). .1. 5.3a2 — 03.03). . and T.x. . . .2). (d) [ H . Let T: R2 » R3 be defined by T ( a i . r 3. then [T]^ is an m x n matrix. 02.V). Let ft be the standard ordered basis for R2 and 7 = {(1.0). a 2 ) = (ai — a 2 . a n ) = ai 4. a 2 . R defined by T(ai. (a) For any scalar a. .U ] J = rT]J + [U]J(e) £(V.U is a linear transformation from V to W. 2 Linear Transformations and Matrices EXERCISES 1. Let and 7 = {l. . a i . . ft={l. Assume that V and W are finite-dimensional vector spaces with ordered bases ft and 7. a 2 . (2. a n ) = ( a i . (2. respectively. compute [TTC.ai).01). .0 ! 4-4a 2 4-5a 3 . If a = {(1. . 2 a i 4-a 2 ). . Compute [T]J. n (e) T: R -* R defined by T ( a i .

Prove part (b) of Theorem 2.7. where z is the complex conjugate of z.x 2 . compute [f(x)]@. Compute [T]l. [T]^.1 that T is not linear if V is regarded as a vector space over the field C.6 (p. vn}. / 8. Let V be an n-dimensional vector space. compute [a]7. (Recall by Exercise 38 of Section 2. Complete the proof of part (b) of Theorem 2. . (c) Define T: M 2 x 2 ( F ) -> F by T(A) = tr(A). and compute [T]^. (b) Define T: P2(R) . Suppose that W is a T-invariant subspace of V (see the exercises of Section 2. By Theorem 2. Let V be Define vo formation Compute a vector space with the ordered basis ft = {v\.M2x2(R) by T(/(x)) = ^ 0 ) 85 *£jjj) .Sec. and let T: V — * V be a linear transformation. 11..V2. where ft — {l.n.. Prove that T is linear. 2. Show that there is a basis ft for V such that [T]^ has the form A O B C where A is a k x k matrix and O is the (n — k) x k zero matrix.) 10. Define T: V — > F n by T(x) = [x]^. (f) If f(x) = 3 — 6x 4.8. Prove that T is linear. • • •. where ' denotes differentiation.2. Compute [T]2. .1) having dimension k.. there exists a linear transT: V —• V such that T(VJ) = Vj 4. Let V be the vector space of complex numbers over the field R. (d) Define T: P2(R) -* R by T(/(x)) = /(2).fj-i for j = 1. Define T: V — > V by T(z) = z.2 The Matrix Representation of a Linear Transformation (a) Define T: M 2 x 2 ( F ) -+ M 2 x 2 ( F ) by T(A) = AK Compute [T] a .* Let V be an n-dimensional vector space with an ordered basis ft. 72). (g) For a £ F. = 0. 7. 9. 6. Compute [T]L (e) If A = 1 0 -2 4 compute [i4]Q.i}.

W). If R(T) n R(U) = {0}. Let V and W be vector spaces.T(y)) = all(T(x)) 4. and let T: V — » W be linear. we learned how to associate a matrix with a linear transformation in such a way that both sums and scalar multiples of matrices are associated with the corresponding sums and scalar multiples of the transformations. Let V and W be vector spaces such that dim(V) = dim(W). (a) 5° is a subspace of £(V. 16. Let V and W be vector spaces. Define 5° = {T £ £(V. and Z be vector spaces over the same field F. Let V be a finite-dimensional vector space and T be the projection on W along W .9. Proof. . Show that there exist ordered bases ft and 7 for V and W. (b) If Si and S2 are subsets of V and Si C S2. and let S be a subset of V. then (Vi 4. 13. and let T: V -+ W and U: W -» Z be linear. where f{j)(x) is the jib. Let V = P(R). . W.U(T(y)) = o(UT)(x) 4. (See Appendix B. The attempt to answer this question leads to a definition of matrix multiplication. prove that {T. Prove that the set {Ti.3 COMPOSITION OF LINEAR TRANSFORMATIONS AND MATRIX MULTIPLICATION In Section 2. U} is a linearly independent subset of £(V. T h e o r e m 2. We use the more convenient notation of UT rather than U o T for the composite of linear transformations U and T. Prove the following statements.UT(y). derivative of f(x). (See the definition in the exercises of Section 2. 2 Linear Transformations and Matrices 12.86 Chap.y) = U(T(ox 4.) Find an ordered basis ft for V such that [T]^ is a diagonal matrix.)) = fu)(x). (c) If Vi and V2 are subspaces of V. W).V 2 )° = V? n Vg. Then UT: V -> Z is linear.y)) = U(oT(x) 4.) Our first result shows that the composite of linear transformations is linear. Let x.2. 2. Let V. I . . y € V and a£ F.1 on page 76. T 2 . and let T and U be nonzero linear transformations from V into W. . and for j > 1 define T. 14. then S% C 5?. respectively. The question now arises as to how the matrix representation of a composite of linear transformations is related to the matrix representation of each of the associated linear transformations. such that [T]I is a diagonal matrix.-(/(a. T n } is a linearly independent subset of £(V) for any positive integer n. Then UT(ox 4. where W and W are subspaces of V. 15.W): T(x) = 0 for all x £ S}.

+ U 2 ) = T U i + T U 2 a n d ( U ] + U 2 )T = U1T4. BkMwk) B kj . m }. Definition. MkBkjfe=i This computation motivates the following definition of matrix multiplication.) Let T: V — » W and U: W — • Z be linear transformations.=1 i=l where Cij — 2_. Let A be an m x n matrix and B be an n x p matrix. We define the product of A and B. Let T. to be the m x p matrix such that n (AB)ij = Y..10.fc=i k=l = E E ^ i=l \A.Sec. Then (a) T(U.Z2. • • • .3 Composition of Linear Transformations and Matrix Multiplication 87 The following theorem lists some of the properties of the composition of linear transformations. where a = {vi. we have / m \ rn (UT)fe) = \J(T(Vj)) = U ( ] £ Bkjwk = J2 . and let A = [U]^ and B = [T]£. Some interesting applications of this definition are presented at the end of this section. respectively.zp} are ordered bases for V.u.(aU 2 ) for all scalars a.. and Z. I A more general result holds for linear transformations that have domains unequal to their codomains.T (d) a(UjU 2 ) = (alli)U 2 = U.. We would like to define the product AB of two matrices so that AB — [UT]^. k=\ Note that (AB)ij is the sum of products of corresponding entries from the ith row of A and the j t h column of B..i02. AikBkj for 1 < i < m. W. Consider the matrix [UT]^. denoted AB. 1 < j < p. 2. Exercise. and 7 = {z\. Theorem 2.v2.. For 1 < j < n.vn}. Ui. (See Exercise 8.. U2 £ £(V).U 2 T (b) T(UiU 2 ) = (TU.. Let V be a vector space. Proof.)U 2 (c) Tl = IT . . ft = {ttfi.
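The defining formula translates directly into a computation. The following Python/NumPy sketch (an informal illustration only, with arbitrarily chosen matrices) forms AB entry by entry from (AB)_ij = sum over k of A_ik B_kj and compares the result with the library product:

```python
import numpy as np

def mat_mult(A, B):
    # (AB)_ij = sum over k of A_ik * B_kj; A must be m x n and B must be n x p.
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, 2.0, 1.0],
              [0.0, 4.0, -1.0]])          # 2 x 3
B = np.array([[4.0], [2.0], [5.0]])       # 3 x 1
print(mat_mult(A, B))                     # 2 x 1: [[13.], [3.]]
print(np.allclose(mat_mult(A, B), A @ B)) # True: agrees with the built-in product
```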

Therefore the transpose of a product is the product of the transposes in the opposite order.88 Chap. then (AB)1 = BtAt. Example 1 We have 1-44-2-24-1-5 0-44-4-24-(-l)-5 Notice again the symbolic relationship ( 2 x 3 ) . Let V. 2 Linear Transformations and Matrices The reader should observe that in order for the product AB to be defined. there are restrictions regarding the relative sizes of A and B. Since (AB)tij = (AB)ji and [B*A% = ] £ ( B * ) « ( A % = fc=i fc=i J2BkiA^ = J2AjkBki fc=i we are finished. Then [UTE = [urjmS- . and 7. we have that matrix multiplication is not commutative. The following mnemonic device is helpful: "(ra x n)'(n x p) = (ra x p)". Consider the following two products: 1 1 0 0 0 1 1 0 1 1 0 0 and 0 1 1 0 1 1 0 0 0 0 1 1 Hence we see that even if both of the matrix products AB and BA are defined.11. Theorem 2. Recalling the definition of the transpose of a matrix from Section 1. ft. Let T: V — * W and U: W — * Z be linear transformations. and Z be finite-dimensional vector spaces with ordered bases a.3. in order for the product AB to be defined. it need not be true that AB = BA. • As in the case with composition of functions. The next theorem is an immediate consequence of our definition of matrix multiplication. we show that if A is an mxn matrix and B is an nxp matrix. and the two "outer" dimensions yield the size of the product. respectively. that is. W.( 3 x l) = 2 x 1. the two "inner" dimensions must be equal.

12.10(b) has its analog in Theorem 2. observe that '0 1 0 0N [UTl/j = [UlSm? = | 0 0 2 0 . 1 0 0 1 and 1% — /i = (l).11. (c). The n x n identity matrix In is defined by (In)ij — Oy. 2. it follows that UT = I. /° 1 0 \° 0 0 1 2 0 o\ 0 0 3 1 / The preceding 3 x 3 diagonal matrix is called an identity matrix and i is defined next. T h e o r e m 2. Let T.16. and D and E be q x rn matrices. Jo Let a and /? be the standard ordered bases of Ps(R) and P2(R).11 in the next example.10. along with a very useful notation. Thus. for example. the identity transformation on P2(R). (b) a(AB) = (aA)B — A(aB) for any scalar a. (d) If V is an n-dimensional vector space with an ordered basis ft.U £ £(V). Then [[)T]0 = [V]p[T]p. Let V be a finite-dimensional vector space with an ordered basis ft.E)A = DA + EA. respectively. Let A be an m x n matrix.3 Composition of Linear Transformations and Matrix Multiplication 89 Corollary. We define the Kronecker delta 6ij by Sij = lifi=j and 6ij =0ifi^j. To illustrate Theorem 2. We illustrate Theorem 2.Sec. Example 2 Let U:'P 3 (i2) -> P2(R) and T: P2(R) -* ?3{R) be the linear transformations respectively defined by U(/(x)) = f'(x) and T(/(x)) = f f(t) dt. and (d) of Theorem 2. When the context is clear. (c) ImA = A = AIn. h = The next theorem provides analogs of (a). Theorem 2. Then (a) A(B -t-C) = AB + AC and (D 4. then [lv]/3 = In- . From calculus. we sometimes omit the subscript n from In.0 0 0 3. Definitions. B and C be n x p matrices. the Kronecker delta. Observe also that part (c) of the next theorem illustrates that the identity matrix acts as a multiplicative identity in M n X n ( F ) .

A2 = AA.ak be scalars. (See Exercise 5.(AC)ij = [AB 4. We define A0 = In.. For each j (1 < j < p) let Uj and Vj denote the jth columns of AB and B. Let A be an ra x n matrix.Bk benxp matrices. .Y k=\ k=l fc=l = (AB)ij 4. Then Al^OiBi) j=i and / < fc \ fc a C A Y i i ) = X ] OiCi A vi=I / i=l Proof. A3 — A2A.AikCkj) = J2 AikBkj 4. Thus the cancellation property for multiplication in fields is not valid for matrices. For an n x n matrix A. So A(B + C) = AB + AC. Ak = Ak~l A for k = 2 . and. respectively. we see that if A = 0 0 1 0/ ' =YjaiABi i=\ A ikCkj then A2 = O (the zero matrix) even though A ^ O. . . C 2 .) (a) We have [A(B + C)}ij = ] T Aik(B 4.a2. Cfc be qxm matrices. . We prove the first half of (a) and (c) and leave the remaining proofs as an exercise.13. assume that the cancellation law is valid.. (c) We have m m [ImA)ij = y ^\lrn)ikAkj — / Ojfc-^fcj = Ay. Bi.. we define A. from A-A = A2 = O = AO. 3 .90 Chap.C)kj = J2 Aik(Bkj + Ckj) k=i fc=i n n n = ^2(AikBkj 4.AC]y. Then. To see why. Ci. which is false. . . With this notation. Let A be an ra x n matrix and B be an n x p matrix. .B2.• • . 2 Linear Transformations and Matrices Proof. Then .1 = A. in general. we would conclude that A — O. . Exercise. and ai. fc=l fc=l Corollary.. Theorem 2. .

(a) u_j = Av_j
(b) v_j = Be_j, where e_j is the jth standard vector of F^p.

Proof. (a) We have

u_j = ((AB)_{1j}, (AB)_{2j}, ..., (AB)_{mj})^t = (Σ_{k=1}^{n} A_{1k}B_{kj}, ..., Σ_{k=1}^{n} A_{mk}B_{kj})^t = A(B_{1j}, B_{2j}, ..., B_{nj})^t = Av_j.

Hence (a) is proved. The proof of (b) is left as an exercise. (See Exercise 6.)

It follows (see Exercise 14) from Theorem 2.13 that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B. An analogous result holds for rows; that is, row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A.

The next result justifies much of our past work. It utilizes both the matrix representation of a linear transformation and matrix multiplication in order to evaluate the transformation at any given vector.

Theorem 2.14. Let V and W be finite-dimensional vector spaces having ordered bases β and γ, respectively, and let T: V → W be linear. Then, for each u ∈ V, we have

[T(u)]_γ = [T]_β^γ [u]_β.

Proof. Fix u ∈ V, and define the linear transformations f: F → V by f(a) = au and g: F → W by g(a) = aT(u) for all a ∈ F. Notice that g = Tf. Let α = {1} be the standard ordered basis for F. Identifying column vectors as matrices and using Theorem 2.11, we obtain

[T(u)]_γ = [g(1)]_γ = [g]_α^γ = [Tf]_α^γ = [T]_β^γ [f]_α^β = [T]_β^γ [f(1)]_β = [T]_β^γ [u]_β.

Example 3
Let T: P₃(R) → P₂(R) be the linear transformation defined by T(f(x)) = f′(x), and let β and γ be the standard ordered bases for P₃(R) and P₂(R), respectively.

Then

[T]_β^γ = (0 1 0 0; 0 0 2 0; 0 0 0 3).

We illustrate Theorem 2.14 by verifying that [T(p(x))]_γ = [T]_β^γ [p(x)]_β, where p(x) ∈ P₃(R) is the polynomial p(x) = 2 − 4x + x² + 3x³. Let q(x) = T(p(x)). Then q(x) = p′(x) = −4 + 2x + 9x². Hence

[T(p(x))]_γ = [q(x)]_γ = (−4; 2; 9),

but also

[T]_β^γ [p(x)]_β = (0 1 0 0; 0 0 2 0; 0 0 0 3)(2; −4; 1; 3) = (−4; 2; 9).

So [T(p(x))]_γ = [T]_β^γ [p(x)]_β, as Theorem 2.14 asserts. •

We complete this section with the introduction of the left-multiplication transformation L_A, where A is an m × n matrix. This transformation is probably the most important tool for transferring properties about transformations to analogous properties about matrices, and vice versa. For example, we use it to prove that matrix multiplication is associative.

Definition. Let A be an m × n matrix with entries from a field F. We denote by L_A the mapping L_A: F^n → F^m defined by L_A(x) = Ax (the matrix product of A and x) for each column vector x ∈ F^n. We call L_A a left-multiplication transformation.

Example 4
Let A = (1 2 1; 0 1 2). Then A ∈ M_{2×3}(R) and L_A: R³ → R². If x = (x₁; x₂; x₃) is a column vector in R³, then

L_A(x) = Ax = (x₁ + 2x₂ + x₃; x₂ + 2x₃). •

We see in the next theorem that not only is L_A linear but, in fact, it has a great many other useful properties. These properties are all quite natural and so are easy to remember.
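Left-multiplication transformations are also convenient to experiment with numerically. The sketch below assumes numpy, uses the matrix of Example 4, and checks linearity on two vectors chosen only for illustration.

import numpy as np

A = np.array([[1, 2, 1],
              [0, 1, 2]])        # the 2 x 3 matrix of Example 4

def L_A(x):
    """Left-multiplication transformation: x in R^3 maps to Ax in R^2."""
    return A @ x

x = np.array([1, 3, -1])
y = np.array([2, 0, 4])
print(L_A(x))                                              # A x
print(np.array_equal(L_A(2*x + y), 2*L_A(x) + L_A(y)))     # True: L_A is linear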

Theorem 2.15. Let A be an m × n matrix with entries from F. Then the left-multiplication transformation L_A: F^n → F^m is linear. Furthermore, if B is any other m × n matrix (with entries from F) and β and γ are the standard ordered bases for F^n and F^m, respectively, then we have the following properties.
(a) [L_A]_β^γ = A.
(b) L_A = L_B if and only if A = B.
(c) L_{A+B} = L_A + L_B and L_{aA} = aL_A for all a ∈ F.
(d) If T: F^n → F^m is linear, then there exists a unique m × n matrix C such that T = L_C. In fact, C = [T]_β^γ.
(e) If E is an n × p matrix, then L_{AE} = L_A L_E.
(f) If m = n, then L_{I_n} = 1_{F^n}.

Proof. The fact that L_A is linear follows immediately from Theorem 2.12.
(a) The jth column of [L_A]_β^γ is equal to L_A(e_j). However L_A(e_j) = Ae_j, which is also the jth column of A by Theorem 2.13(b). So [L_A]_β^γ = A.
(b) If L_A = L_B, then we may use (a) to write A = [L_A]_β^γ = [L_B]_β^γ = B. Hence A = B. The proof of the converse is trivial.
(c) The proof is left as an exercise. (See Exercise 7.)
(d) Let C = [T]_β^γ. By Theorem 2.14, we have [T(x)]_γ = [T]_β^γ [x]_β, or T(x) = Cx = L_C(x) for all x ∈ F^n. So T = L_C. The uniqueness of C follows from (b).
(e) For any j (1 ≤ j ≤ p), we may apply Theorem 2.13 several times to note that (AE)e_j is the jth column of AE and that the jth column of AE is also equal to A(Ee_j). So (AE)e_j = A(Ee_j). Thus

L_{AE}(e_j) = (AE)e_j = A(Ee_j) = L_A(Ee_j) = L_A(L_E(e_j)).

Hence L_{AE} = L_A L_E by the corollary to Theorem 2.6 (p. 73).
(f) The proof is left as an exercise. (See Exercise 7.)

We now use left-multiplication transformations to establish the associativity of matrix multiplication.

Theorem 2.16. Let A, B, and C be matrices such that A(BC) is defined. Then (AB)C is also defined and A(BC) = (AB)C; that is, matrix multiplication is associative.

Proof. It is left to the reader to show that (AB)C is defined. Using (e) of Theorem 2.15 and the associativity of functional composition (see Appendix B), we have

L_{A(BC)} = L_A L_{BC} = L_A (L_B L_C) = (L_A L_B) L_C = L_{AB} L_C = L_{(AB)C}.

So from (b) of Theorem 2.15, it follows that A(BC) = (AB)C.
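Property (e) of Theorem 2.15 is the engine of the associativity proof, and both facts can be spot-checked numerically. A small sketch (numpy assumed; the integer matrices are random choices made only for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))
B = rng.integers(-3, 4, size=(3, 4))
C = rng.integers(-3, 4, size=(4, 2))
x = rng.integers(-3, 4, size=(4,))

# L_{AB}(x) agrees with L_A(L_B(x)), i.e. L_{AE} = L_A L_E with E = B
print(np.array_equal((A @ B) @ x, A @ (B @ x)))     # True

# matrix multiplication is associative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))     # True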

Needless to say, this theorem could be proved directly from the definition of matrix multiplication (see Exercise 18). The proof above, however, provides a prototype of many of the arguments that utilize the relationships between linear transformations and matrices.

Applications
A large and varied collection of interesting applications arises in connection with special matrices called incidence matrices. An incidence matrix is a square matrix in which all the entries are either zero or one and, for convenience, all the diagonal entries are zero. If we have a relationship on a set of n objects that we denote by 1, 2, ..., n, then we define the associated incidence matrix A by A_{ij} = 1 if i is related to j, and A_{ij} = 0 otherwise.

To make things concrete, suppose that we have four people, each of whom owns a communication device. If the relationship on this group is "can transmit to," then A_{ij} = 1 if i can send a message to j, and A_{ij} = 0 otherwise. Suppose that

A = (0 1 0 0; 1 0 0 1; 0 1 0 1; 1 1 0 0).

Then, since A_{34} = 1 and A_{14} = 0, we see that person 3 can send to 4 but 1 cannot send to 4.

We obtain an interesting interpretation of the entries of A². Consider, for instance,

(A²)_{31} = A_{31}A_{11} + A_{32}A_{21} + A_{33}A_{31} + A_{34}A_{41}.

Note that any term A_{3k}A_{k1} equals 1 if and only if both A_{3k} and A_{k1} equal 1, that is, if and only if 3 can send to k and k can send to 1. Thus (A²)_{31} gives the number of ways in which 3 can send to 1 in two stages (or in one relay). Since

A² = (1 0 0 1; 1 2 0 0; 2 1 0 1; 1 1 0 1),

we see that there are two ways 3 can send to 1 in two stages. In general, (A + A² + ··· + A^m)_{ij} is the number of ways in which i can send to j in at most m stages.

A maximal collection of three or more people with the property that any two can send to each other is called a clique. The problem of determining cliques is difficult, but there is a simple method for determining if someone

exactly one of them dominates (or. that is. described earlier. Consider. for example. In this case. If we define a new matrix B by By = 1 if i and j can send to each other. Our final example of the use of incidence matrices is concerned with the concept of dominance. and By = 0 otherwise. it can be shown (see Exercise 21) that the matrix A 4. given any two people. For such a relation. we form the matrix B. A relation among a group of people is called a dominance relation if the associated incidence matrix A has the property that for all distinct pairs i and j. Aij = 1 if and only if Aji — 0. suppose that the incidence matrix associated with some relationship is /o 1 A = 1 \1 1 0 1 1 0 1\ 1 0 0 1 1 0/ To determine which people belong to cliques. An = 0 for all i. it can be shown that any person who dominates [is dominated by] the greatest number of people in the first stage has this property. In fact. can send a message to) the other. 2. we conclude that there are no cliques in this relationship. there is at least one person who dominates [is dominated by] all others in one or two stages. Now (0 1 A + A2 = 1 1 V2 2 1 0 1 2 0 2 2 2 2 1 1 2 0 2 1\ 0 1 1 V .Sec. (0 1 0 1 0 1 B = 0 1 0 1 0 1 1\ 0 1 o 3 and B (0 A = 0 U A 0 4 0 0 4 0 4 4\ 0 4 0 Since all the diagonal entries of B 3 are zero. In other words. using the terminology of our first example. For example.A 2 has a row [column] in which each entry is positive except for the diagonal entry.3 Composition of Linear Transformations and Matrix Multiplication 95 belongs to a clique. and compute B 3 . then it can be shown (see Exercise 19) that person i belongs to a clique if and only if (B3)u > 0. the matrix /o 1 0 0 0 1 A = 1 0 0 0 1 0 \1 1 1 1 0 1 0 0 0\ 0 0 1 o^ The reader should verify that this matrix corresponds to a dominance relation. Since A is an incidence matrix.
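The route counting and the clique test described above lend themselves to quick machine checks. The sketch below assumes numpy; the matrix A is transcribed from the four-person transmission example, and B is then computed from A exactly as in the text (mutual pairs only), so everything else is derived rather than typed in.

import numpy as np

# "can transmit to" incidence matrix of the four-person example
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])

A2 = A @ A
print(A2[2, 0])        # (A^2)_{31} = 2: two ways for person 3 to reach person 1 in two stages
print(A + A2)          # (A + A^2)_{ij}: ways to send from i to j in at most two stages

# clique test: B_{ij} = 1 exactly when i and j can send to each other,
# and person i belongs to a clique if and only if (B^3)_{ii} > 0
B = np.minimum(A, A.T)
print(np.diag(np.linalg.matrix_power(B, 3)))   # all zeros here, so no one belongs to a clique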

A2 = I implies that A = I or A = -I. In each part. where O denotes the zero matrix. BCt. 2. 4.W.mg[U]?. while persons 1.2/(x) and U(a 4. V. and A(££>). [T2]g = ([T]g)2. (a) [UTfc . (AB)D. 3. (a) Let = 0 ? and B = D - ' ) . (c) [U(w)]p = [\J]P[w]l3 for ail w€\N. If A is square and Ay = <5y for all i and j. (d) (e) (f) (g) (h) (i) (j) [lv]« = / . . Label the following statements as true or false. L ^ + B = LA + LB. T = LA for some matrix A. and 5 dominate (can send messages to) all the others in at most two stages. 4. A*B. (b) Let B = Compute A4. 2 Linear Transformations and Matrices Thus persons 1. / A = G C = (-: -?)< 1 B 2. T: V — • W and U: W — * Z denote linear transformations. Let g(x) = 3 4. 3. Let ft and 7 be the standard ordered bases of P2(R) and R3.96 Chap. respectively. and 7. and A and B denote matrices. 3.3C).x. and 4 are dominated by (can receive messages from) all the others in at most two stages. ft. and Z denote vector spaces with ordered (finite) bases a. a . Let T: P2(R.bx + ex2) = (a + b. -2 Oj ( 1 Compute A(2B 4. CB. and CA. EXERCISES 1.b). (b) \T(v)]0 = mSMa for all v € V.) -> P 2 (B) and U: P 2 (B) linear transformations respectively defined by T(/(x)) = f'(x)g(x) R3 be the and C = (4 0 3) . respectively. c. then A = I. A2 = O implies that A = O.

Find linear transformations U. then U is onto. (a) Prove that if UT is one-to-one. 2. Prove (c) and (f) of Theorem 2.2x 4. then UT is also. let T be the linear transformation defined in the corresponding part of Exercise 5 of Section 2. where A = ( \ (b) * [T(/(x))]a.T: F2 — • F2 such that UT = To (the zero transformation) but TU ^ To. Then use Theorem 2. and let T: V — > V be linear. . and let T: V — W and U: W — Z be linear. and Z be vector spaces. Let A and B be n x n matrices. For each of the following parts. Prove (b) of Theorem 2.Sec.13. 11.x 2 . 9.11 to verify your result. [T]^. Now state and prove a more general result involving linear transformations with domains unequal/to their codomains. 4. Let A be an n x n matrix. 13.10.r)=4f [T(/(x))] 7 .2. Use Theorem 2. 6. (b) Let h(x) = 3 .14 to compute the following vectors: (a) [T(A)] a . Prove that A is a diagonal matrix if and only if Ay = SijAjj for all i and j. (c) [T(A)]7.15. Must U also be one-to-one? (b) Prove that if UT is onto.Use your answer to find matrices A and B such that AB = O but BA^O. W. Must T also be onto? (c) Prove that if U and T are one-to-one and onto. Let V be a vector space. Prove that T 2 = T() if and only if R(T) C N(T). then T is one-to-one.where/(.12 and its corollary.x + 2x2. and [UTg directly.3 Composition of Linear Transformations and Matrix Multiplication 97 (a) Compute [[)]}. where A=(l (d) 5. Let V. 8. *=i Prove that tr(AB) = tr(BA) and tr(A) = tr(A'). where f(x) = 6 . 7. 10. 12. Compute [h{x)]a and [U(/t(x))]7. Recall that the trace of A is defined by tr(A) = £ ) A i i .14 to verify your result. Complete the proof of Theorem 2. Prove Theorem 2. Then use [[)]} from (a) and Theorem 2.

if z = (ai. 17. (a) Suppose that z is a (column) vector in F p .T(x)) for every x in V. (d) Prove the analogous result to (b) about rows: Row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A. prove that R(T) n N(f) = {0}. If the j t h column of A is a linear combination of a set of columns of A. 18.. prove that wA is a linear combination of the rows of A with the coefficients in the linear combination being the coordinates of w. 16. prove that the jth column of MA is a linear combination of the corresponding coftimns of MA with the same corresponding coefficients. (a) If rank(T) = rank(T 2 ). ap)1.98 Chap. (b) Prove that V = R(Tfc) © N(Tfc) for some positive integer k.13. prove that multiplication of matrices is associative. Assume the notation in Theorem 2. * Let M and A be matrices for which the product matrix MA is defined.. then show that Bz = J2aJUJi=l (b) Extend (a) to prove that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B. Let V be a finite-dimensional vector space. Hint: Note that x = T(x) 4. 19.3). m (c) For any row vector w £ F . and show that V = {y: T(y) = y} © N(T) (see the exercises of Section 1. 2 Linear Transformations and Matrices 14. ..3).(x .. Deduce that V = R(T) e N(T) (see the exercises of Section 1. Determine all linear transformations T: V — *V such that T = T 2 . 20.13(b) to prove that Bz is a linear combination of the columns of B. and By = 0 otherwise. Hint: Use properties of the transpose operation applied to (a). prove that i belongs to a clique if and only if (B3)u > 0. Let V be a vector space. Use Theorem 2. For an incidence matrix A with related matrix B defined by By = 1 if i is related to j and j is related to i. In particular.a2. Using only the definition of matrix multiplication. and let T: V — • V be linear. 15. Use Exercise 19 to determine the cliques in the relations corresponding to the following incidence matrices.

Nevertheless. of course.. if T is invertible. we apply many of the results about invertibility to the concept of isomorphism. These ideas will be made precise shortly. . We will see that finite-dimensional vector spaces (over F) of equal dimension may be identified. / 2.4 INVERTIBILITY AND ISOMORPHISMS The concept of invertibility is introduced quite early in the study of functions.A2 has a row [column] in which each entry is positive except for the diagonal entry. Let A be an n x n incidence matrix that corresponds to a dominance relation. 2.3. 22. the inverse of the left-multiplication transformation L. For example. in calculus we learn that the properties of being continuous or differentiable are generally retained by the inverse functions. Let V and W be vector spaces.17) that the inverse of a linear transformation is also linear. Determine the number of nonzero entries of A. Definition. 23.\ (when it exists) can be used to determine properties of the inverse of the matrix A. As noted in Appendix B. Fortunately. This result greatly aids us in the study of inverses of matrices.4 Invertibility and Isomorphisms (() 1 0 \l 0 1 0 1\ 0 0 1 0 1 0 1 ()) {{) 1 1 \1 0 1 1\ 0 0 1 0 0 1 0 1 0/ 99 (a) (b) 21. many of the intrinsic properties of functions are shared by their inverses. As one might expect from Section 2. true for linear transformations. Prove that the matrix A 4.Sec. Use Exercise 21 to determine which persons dominate [are dominated by] each of the others within two stages. In the remainder of this section. The facts about inverse functions presented in Appendix B are. Prove that the matrix A = corresponds to a dominance relation. then T is said to be invertible. A function U: W —• V is said to be an inverse of T if TU — lw and UT = lvIf T has an inverse. we repeat some of the definitions for use in this section. We see in this section (Theorem 2. Let A be an incidence matrix that is associated with a dominance relation. then the inverse of T is unique and is denoted by T~l. and let T: V — > W be linear.

The following facts hold for invertible functions T and U.
1. (TU)⁻¹ = U⁻¹T⁻¹.
2. (T⁻¹)⁻¹ = T; in particular, T⁻¹ is invertible.
We often use the fact that a function is invertible if and only if it is both one-to-one and onto. We can therefore restate Theorem 2.5 as follows.
3. Let T: V → W be a linear transformation, where V and W are finite-dimensional spaces of equal dimension. Then T is invertible if and only if rank(T) = dim(V).

Example 1
Let T: P₁(R) → R² be the linear transformation defined by T(a + bx) = (a, a + b). The reader can verify directly that T⁻¹: R² → P₁(R) is defined by T⁻¹(c, d) = c + (d − c)x. Observe that T⁻¹ is also linear. As Theorem 2.17 demonstrates, this is true in general. •

Theorem 2.17. Let V and W be vector spaces, and let T: V → W be linear and invertible. Then T⁻¹: W → V is linear.

Proof. Let y₁, y₂ ∈ W and c ∈ F. Since T is onto and one-to-one, there exist unique vectors x₁ and x₂ such that T(x₁) = y₁ and T(x₂) = y₂. Thus x₁ = T⁻¹(y₁) and x₂ = T⁻¹(y₂); so

T⁻¹(cy₁ + y₂) = T⁻¹[cT(x₁) + T(x₂)] = T⁻¹[T(cx₁ + x₂)] = cx₁ + x₂ = cT⁻¹(y₁) + T⁻¹(y₂).

It now follows immediately from Theorem 2.5 (p. 71) that if T is a linear transformation between vector spaces of equal (finite) dimension, then the conditions of being invertible, one-to-one, and onto are all equivalent.

We are now ready to define the inverse of a matrix. The reader should note the analogy with the inverse of a linear transformation.

Definition. Let A be an n × n matrix. Then A is invertible if there exists an n × n matrix B such that AB = BA = I.

If A is invertible, then the matrix B such that AB = BA = I is unique. (If C were another such matrix, then C = CI = C(AB) = (CA)B = IB = B.) The matrix B is called the inverse of A and is denoted by A⁻¹.

Example 2
The reader should verify that the inverse of

(5 7; 2 3)   is   (3 −7; −2 5). •
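The claim in Example 2 can be confirmed mechanically; a short sketch (numpy assumed) checks both products and recomputes the inverse directly.

import numpy as np

A = np.array([[5, 7],
              [2, 3]])
B = np.array([[ 3, -7],
              [-2,  5]])

print(A @ B)              # the 2 x 2 identity matrix
print(B @ A)              # the 2 x 2 identity matrix, so B = A^{-1}
print(np.linalg.inv(A))   # agrees with B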

we have nullity(T) = 0 and rank(T) = dim(R(T)) = dim(W). vn}. observe that [iJT]0 = [^[T}} = BA = In = {W}0 I by Theorem 2..142. we have dim(V) = dim(W). if W is finite-dimensional. Conversely.2. 2. To show that U = T _ 1 . there exists U e £ ( W ..2 (p. 1 So by the dimension theorem (p.Thus In = [Ivfe = [ T .Sec. . Let n = dim(V).l T l . So UT = l v . Similarly. [ T g p . • • •. 2 . Proof.6 (p..18. 44). By the lemma. hence W is finitedimensional by Theorem 1. So [T]l is an n x n matrix. Then there exists an n x n matrix B such that AB = BA = In. .X2.1 T = lv. Let ft = {x\. = [ T . In this case.. wn} and 0 = {«i. . . Then T is invertible if and only if [T]^ is invertible. we develop a number of results that relate the inverses of matrices to the inverses of linear transformations. So [T]} is invertible and ( [ T g ) " 1 = [T" 1 ]^. we learn a technique for computing the inverse of a matrix. Then V is finite-dimensional if and only if W is finite-dimensional. Lemma. V ) such that u (wj) = ^2 BiiVi i=l fori = 1 . TU = l w .11 (p... dim(V) = dim(W). T(/?) spans R(T) = W. ifc. respectively. Suppose that V is finite-dimensional. At this point. then so is V by a similar argument. 70).1 ] ^ . Suppose that T is invertible. Now T _ 1 : W —• V satisfies T T _ 1 = lw and T . By Theorem 2. it follows that dim(V) = dim(W). Let V and W be finite-dimensional vector spaces with ordered bases (3 and 7. Furthermore. 88). 68). ^T-1]^ = ([T]2) _1 . . n.9 (p. Theorem 2. It follows that [U]^ = B. By Theorem 2. xn} be a basis for V. Now suppose that V and W are finite-dimensional. where 7 = {101. Proof. 72)..4 Invertibility and Isomorphisms 101 In Section 3. and similarly..1 ] ? = J n . using T _ 1 . Let T be an invertible linear transformation from V to W. Let T: V — > W be linear. Because T is one-to-one and onto. Now suppose that A = [T]Z is invertible.

Example 4 Define T: F 2 — * P.) So we need only say that V and W are isomorphic. Furthermore. Exercise. Let V and W be vector spaces. !§j Corollary 2. that certain vector spaces strongly resemble one another except for the form of their vectors. For example. Such a linear transformation is called an isomorphism from V onto W. in the case of M2x2(^) and F 4 . For T as in Example 1. (See Appendix A. in terms of the vector space structure. Let V be a finite-dimensional vector space with an ordered basis (3. that is. that is.1 = I-A-'Proof.. 6. if we associate to each matrix a c 6 d the 4-tuple (a. Furthermore. we have TO=(i ' '-']?=(_. • . we see that sums and scalar products associate in a similar manner. It is easily checked that T is an isomorphism. We leave as an exercise (see Exercise 13) the proof that "is isomorphic to" is an equivalence relation. so F2 is isomorphic to Pi(F). Exercise.1 ]^ = (PI/3) . We say that V is isomorphic to W if there exists a linear transformation T: V — > W that is invertible. and let T: V — > V be linear. ( L a ) . Then A is invertible if and only if\-A is invertible. Proof. Then T is invertible if and only if [T]# is invertible. Let A be ann x n matrix. respectively. [T . • Corollary 1.i(F) by T(a\. d). It can be verified by matrix multiplication that each matrix is the inverse of the other.102 Example 3 Chap. @ The notion of invertibility may be used to formalize what may already have been observed by the reader. Definitions. these two vector spaces may be considered identical or isomorphic.a2) = fli + nix. 2 Linear Transformations and Matrices Let 0 and 7 be the standard ordered bases of Pi(R) and R2.. c.

18. we have associated linear transformations with their matrix representations. 2. $(T) = [T]J for T € Let V and W be finite-dimensional vector spaces over F m. we have R(T) = span(T(/?)) = span(7) = W. Thus T is one-to-one (see Exercise 11). • In each of Examples 4 and 5. V2. By use of the Lagrange interpolation formula in Section 1. L . if V and W are isomorphic. . 71). then either both of V and W are finite-dimensional or both arc infinite-dimensional. 71). . respectively. and let 0 and 7 be ordered bases for V Then the function <I>: £(V. and let 0 — {vi.. there exists T: V —• W such that T is linear and T(vi) — Wi for i — 1 . By the lemma preceding Theorem 2. Using Theorem 2. From Theorem 2. defined by £(V. W) — » M m x n ( F ) .6. . Let V be a vector space over F. Corollary. Theorem 2.2 (p. By Theorem 2.102.18.n. Theorem 2.5 (p.vn} and 7 = {101. Then V is isomorphic to W if and only if dim(V) = dim(W). Up to this point.. As the next theorem shows. Let V and W be finite-dimensional vector spaces (over the same field). We conclude that Ps{R) is isomorphic to M 2x2 ( J R).. Hence T is an isomorphism. W).20.wn} be bases for V and W. Suppose that V is isomorphic to W and that T: V — > W is an isomorphism from V to W. the reader may have observed that isomorphic vector spaces have equal dimensions. 2 . the collection of all linear transformations between two given vector spaces may be identified with the appropriate vector space o f m x n matrices. So T is onto.6 (p. we have that dim(V) = dim(W). 72). this is no coincidence. . Proof. We are now in a position to prove that. because dim(P3(/?)) = dim(M 2x2 (i?)). respectively. Then V is isomorphic to F n if and only if dim(V) = n. it follows that T is invertible by Theorem 2.4 Invertibility and Isomorphisms Example 5 Define T:P3(J2)-M2x2(ie) byT(/) = /(I) /(3) 1(2) /(4) 103 It is easily verified that T is linear. of dimensions n and and W. / Now suppose that dim(V) = dim(W). 1 By the lemma to Theorem 2. • . 68).5 (p. respectively. is an isomorphism.19. Moreover. . •. as a vector space. it can be shown (compare with Exercise 22) that T(/) = O only when / is the zero polynomial. we have that T is also one-to-one.Sec.

$ is linear. .8 (p.2.2). 4>p is an isomorphism.4)}. / Definition..wm}. Theorem 2. For x = (1. Example 6 Let 0 = {(1.6 (p. By Theorem 2. For any finite-dimensional vector space V with ordered basis 0. 2 Linear Transformations and Matrices Proof. The next theorem tells us much more. 1 This theorem provides us with an alternate proof that an n-dimensional vector space is isomorphic to F n (sec the corollary to Theorem 2.-2). The proof follows from Theorems 2. . (3.0). Proof. Let 0 — {vi.20 and 2.. By Theorem 2. we have <f>p (x) = [x](3 f _ 2 j and (j)1(x) = [x\ -5 2 We observed earlier that <j>p is a linear transformation. Then £(V. respectively.104 Chap. 82). This is accomplished if we show that for every mxn matrix A. It is easily observed that 0 and 7 are ordered bases for R2. .19). 7 = {101... there exists a unique linear transformation T: V — * W such that J v ( j) = 5 Z AiJWi for 1 . Thus < ] > is an isomorphism.103. there exists a unique linear transformation T: V —• W such that $(T) = A. Hence we must show that $ is one-to-one and onto. or $(T) = A. I We conclude this section with a result that allows us to see more clearly the relationship between linear transformations defined on abstract finitedimensional vector spaces and linear transformations from F n to F m . We begin by naming the transformation x — * [x]p introduced in Section 2.V2. and let A be a given mxn matrix. W) is finite-dimensional of dimension mn..vn}.. (0. 72).19 and the fact that dim(M m X n (F)) = mn.1)} and 7 = {(1. Proof Exercise. Let V and W be finite-dimensional vector spaces of dimensions n and rn. The standard representation ofV with respect to 0 is the function <f)p: V — > F n defined by <t>p{x) = [x]ff for each x € V.i' - n 1 But this means that [T]l = A.21. Let 0 be an ordered basis for an n-dhncnsional vector space V over the fleid F . Corollary.

[Figure 2.2: V is mapped to W by T across the top of the diagram, V is mapped to F^n by φ_β and W is mapped to F^m by φ_γ down the sides, and F^n is mapped to F^m by L_A across the bottom.]

Let V and W be vector spaces of dimension n and m, respectively, and let T: V → W be a linear transformation. Define A = [T]_β^γ, where β and γ are arbitrary ordered bases of V and W, respectively. We are now able to use φ_β and φ_γ to study the relationship between the linear transformations T and L_A: F^n → F^m.

Let us first consider Figure 2.2. Notice that there are two composites of linear transformations that map V into F^m:
1. Map V into F^n with φ_β and follow this transformation with L_A; this yields the composite L_A φ_β.
2. Map V into W with T and follow it by φ_γ to obtain the composite φ_γ T.
These two composites are depicted by the dashed arrows in the diagram. By a simple reformulation of Theorem 2.14 (p. 91), we may conclude that

L_A φ_β = φ_γ T;

that is, the diagram "commutes." Heuristically, this relationship indicates that after V and W are identified with F^n and F^m via φ_β and φ_γ, respectively, we may "identify" T with L_A. This diagram allows us to transfer operations on abstract vector spaces to ones on F^n and F^m.

Example 7
Recall the linear transformation T: P₃(R) → P₂(R) defined in Example 4 of Section 2.2 (T(f(x)) = f′(x)). Let β and γ be the standard ordered bases for P₃(R) and P₂(R), respectively, and let φ_β: P₃(R) → R⁴ and φ_γ: P₂(R) → R³ be the corresponding standard representations of P₃(R) and P₂(R). If A = [T]_β^γ, then

A = (0 1 0 0; 0 0 2 0; 0 0 0 3).

a 2 .03) = (3ai — 2a3. For each of the following linear transformations T. \\ = a + 26.#2. T is invertible if and only if T is one-to-one and onto.2. In each part.3ai + 4 a 2 ) . Label the following statements as true or false. a 2 ) = (3ai . P3(R) -> P2{R) defined by T(p{x)) = p'(x).106 Chap. 0 7 T(p(x)). 2 Linear Transformations and Matrices We show that Lj\(f)p(p(x)) = Consider the polynomial p(x) = 2+x—3x2+bx3. and A and B are matrices. (a) (b) (c) (d) T: T: T: T: R2 — > R3 defined by T(oi. • Try repeating Example 7 with different polynomials p(x). AB = I implies that A and B are invertible. 3 R — > R3 defined by T(oi. a +b a c c+d (e) T: M 2x2 (i?) -> P2(R) defined by T (a (f) T: M2x2(R) -+ M2x2{R) defined by T . determine whether T is invertible and justify your answer. Pn{F) is isomorphic to P m ( F ) if and only if n — rn. 4 a i ) .a 2 .1 = A.x + (c + d)x2.a 2 . (a) (b) (c) (d) (e) (f ) (g) (h) (i) ([Tig)" 1 £ [T-'lg. T = LA.a 2 ) = (ai — 2a 2 .3«i + 4 a 2 ) . A is invertible if and only if L^ is invertible. EXERCISES 1. Now / 0 1 0 0s = 0 0 2 0 \0 0 0 3 / LA4>0(p(x)) 2\ 1 -3 V V But since T(p(x)) = v'(x) = 1 — 6x + 15a. If A is invertible. then ( A . A must be square in order to possess an inverse. M 2X 3(F) is isomorphic to F 5 . V and W are vector spaces with ordered (finite) bases a and 0. T: V — » W is linear. 2.1 ) . respectively. a 2 . we have -J(vU)) (-61 15 So LAMPW) = ^T(P(*)). where A = [T]g. 2 R — > R3 defined by T ( a i .

(b) F 4 a n d P 3 ( F ) . . 5. 11. Let A and B be n x n matrices such that AB is invertible.21.Sec. (d) V = {A e M2x2(R): tr(A) = 0} and R4. Could A be invertible? Explain.* Let A be invertible. 14. Let A be an n x n matrix. (a) F3 and P 3 (F). Construct an isomorphism from V to F . a "one-sided" inverse is a "two-sided" inverse. Prove that A is not invertible. then B = O. (a) Suppose that A2 = O. Give an example to show that arbitrary matrices A and B need not be invertible if AB is invertible. (b) Suppose that AB — O for some nonzero n x n matrix B. 2. 9.) (c) State and prove analogous results for linear transformations defined on finite-dimensional vector spaces. Let = (A'1)1. Let ~ mean "is isomorphic to. 7. (c) M 2x2 (i?) and P 3 (i2).18.T Let A and B be n x n matrices such that AB — In. Prove that A1 is invertible and (A1)'1 6. 10. (We are.4 Invertibility and Isomorphisms 107 3. Prove that AB is invertible and (AB)'1 = B^A'1. Prove that if A is invertible and AB = O. Prove that A and B are invertible. 13. in effect. 4J Let A and B be n x n invertible matrices. Verify that the transformation in Example 5 is one-to-one. saying that for square matrices. Prove Corollaries 1 and 2 of Theorem 2. (a) Use Exercise 9 to conclude that A and B are invertible." Prove that ~ is an equivalence relation on the class of vector spaces over F. 8. Prove Theorem 2. (b) Prove A — B~x (and hence B = A~l). 12. Which of the following pairs of vector spaces are isomorphic? Justify your answers.

First prove that {Tij: 1 < i < m. which is a basis for M2x2(R). 72). Prove that rank(T) = rank(Lyi) and that nullity(T) = nullity (L. (a) Compute [Tj/3. 19. there exist linear transformations T. 2 Linear Transformations and Matrices 15. w m }. * Let T: V — > W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W. Then let MlJ be the mxn matrix with 1 in the ith row and j t h column and 0 elsewhere.* Let V and W be finite-dimensional vector spaces and T: V — > W be an isomorphism. where A = [T]^.i). Let B be an n x n invertible matrix. By Theorem 2. E21. (b) Verify that LAMM) / = ^ T ( M ) for A = [T]^ and M = 1 2 3 4 20. Prove that $ is an isomorphism. 17. Define $ : M n x n ( F ) — > Mnxn(F) by $(A) = B~XAB. In Example 5 of Section 2. (b) Prove that dim(V 0 ) = dim(T(V 0 )).j: V — > W such that Tij(vk) = Wi if k = j 0 iik^ j. Let 0 = {EU. 21.-.E22}. Vn} and 7 = {w\.E12. and prove that [T^-H = MlK Again by Theorem 2. (a) Prove that T(V 0 ) is a subspace of W. Let Vn be a subspace of V. W) — > M m X n ( F ) such that $(Tij) = M y '..-. Hint: Apply Exercise 17 to Figure 2. Prove that T is an isomorphism if and only if T(0) is a basis for W. . Let 0 and 7 be ordered bases for V and W.1. Suppose that 0 is a basis for V.. the mapping T: M 2x2 (i?) — > M 2x2 (i?) defined by T(M) = Ml for each M e M 2x2 (i?) is a linear transformation. there exists a linear transformation $ : £(V. 1 < j < n} is a basis for £(V. and let T: V — * W be a linear transformation. Prove that $ is an isomorphism. W). 16. .6 (p. respectively. Repeat Example 7 with the polynomial p(x) = 1 + x + 2x2 + x3. as noted in Example 3 of Section 1. w2.6.108 Chap. . Let V and W be n-dimensional vector spaces. Let V and W be finite-dimensional vector spaces with ordered bases 0 = {vi> v2. 18.2. respectively.6.

3 and with Exercise 40 of Section 2. 60) in Section 1. ..Sec. F) denote the vector space of all functions / € !F(S. Define T: P n ( F ) -» Fn+1 by T ( / ) = ( / ( c b ) . . c i . that is.2. prove that if v + N(T) = v' + N(T). (b) Prove that T is linear. Let V be a nonzero vector space over a field F. (d) Prove that the diagram shown in Figure 2. Define the mapping T: V/N(T) -» Z by T(v + N(T)) = T(v) for any coset v + N(T) in V/N(T).. 23. Prove that T is an isomorphism. Let T: V — » Z be a linear transformation of a vector space V onto a vector space Z.C„. Let c n . Let C(S. c n be distinct scalars from an infinite field F.13 (p. V T • Z 4 / 7 T V/N(T) Figure 2. prove that T = Tr/. / ( d ) .4 Invertibility and Isomorphisms 109 22. 24. every vector space has a basis).3 25. . . that is. Hint: Use the Lagrange polynomials associated with C 0 .7.1. i=0 where n is the largest integer such that er(n) ^ 0. Let V denote the vector space defined in Example 5 of Section 1. . Define T: V — W by T(<r) = j ^ a ( i ) x j ..3 commutes. (a) Prove that T is well-defined. . .Ci.. (c) Prove that T is an isomorphism. and suppose that S is a basis for V. and let W = P(F). (By the corollary to Theorem 1. 2. then T(v) = T(v'). F) such that f(s) — 0 for all but a finite number . . Prove that T is an isomorphism. The following exercise requires familiarity with the concept of quotient space defined in Exercise 31 of Section 1./(c„)).

1 . in calculus an antiderivative of 2xex can be found by making the change of variable u — x .) We see how this change of variable is determined in Section 6.110 Chap. 2. Prove that \I> is an isomorphism. the new coordinate axes are chosen to lie in the direction of the axes of the ellipse. in which form it is easily seen to be the equation of an ellipse.3. / " = v T 7f' 2 can be used to transform the equation 2x — 4xy + by2 — 1 into the simpler equation (x')2 +6(y')2 = 1. s€S. the coordinate vector of P relative to the standard ordered basis 0 = {ei. (See Figure 2.4. For example. an x'y'-coordinate system with coordinate axes rotated from the original xy-coordinate axes. Similarly./(s)^0 otherwise. This is done by introducing a new frame of reference. e 2 }.5 T H E C H A N G E OF COORDINATE MATRIX In many areas of mathematics. Geometrically. (See Exercise 14 of Section 1. to . the change of variable is a change in the way that the position of a point P in the plane is described.F) -> V be defined by * ( / ) = 0 if / is the zero function.5. in geometry the change of variable 2 . a change of variable is used to simplify the appearance of an expression. The resulting expression is of such a simple form that an antiderivative is easily recognized: 2xex dx = eudu = eu + c = ex + c. and the change of variable is actually a change from [P]p = ( j. and *(/) = E /«*. 2 Linear Transformations and Matrices of vectors in 5. Thus every nonzero vector space can be viewed as a space of functions. The unit vectors along the x'-axis and the y'-axis form an ordered basis 1_/-1 ' • { A C y/E\ 2 for R2. In this case.) Let # : C{S.

[P]_β′ = (x′; y′), the coordinate vector of P relative to the new rotated basis β′.

[Figure 2.4: the original xy-coordinate axes together with the rotated x′y′-coordinate axes determined by β′, with the ellipse 2x² − 4xy + 5y² = 1 lying along the new axes.]

A natural question arises: How can a coordinate vector relative to one basis be changed into a coordinate vector relative to the other? Notice that the system of equations relating the new and old coordinates can be represented by the matrix equation

(x; y) = (1/√5)(2 −1; 1 2)(x′; y′).

Notice also that the matrix

Q = (1/√5)(2 −1; 1 2)

equals [I]_β′^β, where I denotes the identity transformation on R². Thus [v]_β = Q[v]_β′ for all v ∈ R². A similar result is true in general.

Theorem 2.22. Let β and β′ be two ordered bases for a finite-dimensional vector space V, and let Q = [1_V]_β′^β. Then
(a) Q is invertible.
(b) For any v ∈ V, [v]_β = Q[v]_β′.

Proof. (a) Since 1_V is invertible, Q is invertible by Theorem 2.18 (p. 101).
(b) For any v ∈ V,

[v]_β = [1_V(v)]_β = [1_V]_β′^β [v]_β′ = Q[v]_β′

by Theorem 2.14 (p. 91).
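Theorem 2.22(b) is easy to verify numerically for the rotated basis of the ellipse example. The sketch below assumes numpy and uses an arbitrary test vector; Q is built by placing the β′ vectors in the columns.

import numpy as np

s = 1 / np.sqrt(5)
b1 = np.array([2*s, 1*s])          # first vector of the rotated basis beta'
b2 = np.array([-1*s, 2*s])         # second vector of beta'
Q = np.column_stack([b1, b2])      # Q = [1_V]_{beta'}^{beta}

v_bp = np.array([3.0, -2.0])       # an arbitrary coordinate vector relative to beta'
v = v_bp[0]*b1 + v_bp[1]*b2        # the same vector in standard (beta) coordinates
print(np.allclose(Q @ v_bp, v))                  # True: [v]_beta = Q [v]_{beta'}
print(np.allclose(np.linalg.inv(Q) @ v, v_bp))   # Q^{-1} changes beta-coordinates into beta'-coordinates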

The matrix Q = [1_V]_β′^β defined in Theorem 2.22 is called a change of coordinate matrix. Because of part (b) of the theorem, we say that Q changes β′-coordinates into β-coordinates. Observe that if β = {x₁, x₂, ..., x_n} and β′ = {x′₁, x′₂, ..., x′_n}, then

x′_j = Σ_{i=1}^{n} Q_{ij} x_i

for j = 1, 2, ..., n; that is, the jth column of Q is [x′_j]_β.

Notice that if Q changes β′-coordinates into β-coordinates, then Q⁻¹ changes β-coordinates into β′-coordinates. (See Exercise 11.)

Example 1
In R², let β = {(1,1), (1,−1)} and β′ = {(2,4), (3,1)}. Since

(2,4) = 3(1,1) − 1(1,−1)   and   (3,1) = 2(1,1) + 1(1,−1),

the matrix that changes β′-coordinates into β-coordinates is

Q = (3 2; −1 1).

Thus, for instance,

[(2,4)]_β = Q[(2,4)]_β′ = Q(1; 0) = (3; −1). •

For the remainder of this section, we consider only linear transformations that map a vector space V into itself. Such a linear transformation is called a linear operator on V. Suppose now that T is a linear operator on a finite-dimensional vector space V and that β and β′ are ordered bases for V. Then T can be represented by the matrices [T]_β and [T]_β′. What is the relationship between these matrices? The next theorem provides a simple answer using a change of coordinate matrix.

Theorem 2.23. Let T be a linear operator on a finite-dimensional vector space V, and let β and β′ be ordered bases for V. Suppose that Q is the change of coordinate matrix that changes β′-coordinates into β-coordinates. Then

[T]_β′ = Q⁻¹ [T]_β Q.

Proof. Let I be the identity transformation on V. Then T = IT = TI; hence, by Theorem 2.11 (p. 88),

Q[T]_β′ = [I]_β′^β [T]_β′^β′ = [IT]_β′^β = [TI]_β′^β = [T]_β^β [I]_β′^β = [T]_β Q.

Therefore [T]_β′ = Q⁻¹[T]_β Q.

) We wish to find an expression for T(a. the image of the second vector in 0' is T 3 .1. b) for any (a. 2. (3a — b a + 36 7 ' and let 0 and 0' be the ordered bases in Example 1. • It is often useful to apply Theorem 2.i f i H f i Notice that the coefficients of the linear combination are the entries of the second column of [T]^. The reader should verify that [T]a = 3 -1 1 37" In Example 1.5 The Change of Coordinate Matrix Example 2 Let T be the linear operator on R2 defined by T 113 . (See Figure 2. we can verify that the image under T of each vector of 0' is the linear combination of the vectors of 0' with the entries of the corresponding column as its coefficients. by Theorem 2. Since T is linear.1 17 ' 5 1 37 ' / Hence. it is completely . y) —> (x. Example 3 Recall the reflection about the rc-axis in Example 3 of Section 2. as the next example shows. we saw that the change of coordinate matrix that changes /^'-coordinates into /^-coordinates is Q = and it is easily verified that W 3 2 .5. The rule (x.Sec. —y) is easy to obtain. b) in R2.23. [T]/3'=Q-1[T]/3Q=(J I)' To show that this is the correct matrix. For example. We now derive the less obvious rule for the reflection T about the line y = 2x.23 to compute [T]^.

and let Q be the matrix that changes /^'-coordinates into /^-coordinates. Thus for any (a.2) and T ( .2 .2 .114 Chap. Clearly.( . 1 ) = (2. 1 ) = . 2 Linear Transformations and Matrices y y = 2x Figure 2.2) = (1. .1 ) .b) in R2. T(l. We can solve this equation for [T]^ to obtain that [T]li = Q\T\d'Q~lBecause Q 5 I -2 1 the reader can verify that t * . Therefore if we let 0' = 1\ 1-2 27' V i then 0' is an ordered basis for R2 and / H> = o -17' Let 0 be the standard ordered basis for R2. we have T 1 /-3 4 4 37 \b 1 / . it follows that T is left-multiplication by [T]p.5 determined by its values on a basis for R2.3 a + 46 4 a + 367 ' .H i a)Since 0 is the standard ordered basis. Then Q = 1 -2 and Q l[^]pQ — [T]/9'.
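The matrices appearing in Example 3 can be checked with a few lines of arithmetic. The sketch below assumes numpy; β′ = {(1, 2), (−2, 1)} is the basis in which the reflection about y = 2x is diagonal, and the standard-basis matrix is recovered as Q[T]_β′Q⁻¹.

import numpy as np

Q = np.array([[1, -2],
              [2,  1]])               # columns are the beta' vectors (1,2) and (-2,1)
T_bp = np.array([[1,  0],
                 [0, -1]])            # [T]_{beta'}: the reflection fixes (1,2) and negates (-2,1)

T_std = Q @ T_bp @ np.linalg.inv(Q)   # [T]_beta = Q [T]_{beta'} Q^{-1}
print(T_std)                          # (1/5) * [[-3, 4], [4, 3]]
print(T_std @ np.array([1, 2]))       # [1, 2]: points on the line y = 2x are fixed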

23 is contained in the next corollary.Sec. we can change bases in V as well as in W (see Exercise 8). Let Q be the 3 x 3 matrix whose jth column is the j t h vector of 7. however. Observe that the relation of similarity is an equivalence relation (see Exercise 9). In this case. Let A e M n x n (F). where V is distinct from W. Definition.23 can be stated as follows: If T is a linear operator on a finite-dimensional vector space V. whore Q is the n x n matrix whose jth column is the jth vector of 7. then [T]p' is similar to [T]p.AQ. 6. whose proof is left as an exercise. At this time. Let A and B be matrices in M n x n ( F ) . We say that B is similar to A if there exists an invertible matrix Q such that B = Q. we introduce the name for this relationship. Notice also that in this terminology Theorem 2. So we need only say that A and B are similar. and 7. Theorem 2. .1 AQ.23 will be the subject of further study in Chapters 5. Then [Lyt]7 = Q .5 The Change of Coordinate Matrix 115 A useful special case of Theorem 2. 2.23 can be generalized to allow T: V — * W. l M v = Q. Then and So by the preceding corollary.AQ : Q-1 = I ° = M V 0 2 4 -1 8 6 -1 The relationship between the matrices [T]^' and [T]p in Theorem 2. Example 4 Let A = and let 7 = which is an ordered basis for R3. and if 0 and 0' are any ordered bases for V. Corollary. and let 7 be an ordered basis for F".

x. and let Q be the change of coordinate matrix that changes /^'-coordinates into /^-coordinates. x J J are ordered bases for a vector space and Q is the change of coordinate matrix that changes /^'-coordinates into /^-coordinates.62a.x2.x.2x . x + 1.4x2 .x + 1. ( ..9 . c2x2 + c\x + Cn} 2 2 2 2 (c) 0 = {2x .x + 1.2 .« Q P V Q " 1 ..1 ) } {(01.e 2 } and 0' = { ( . Then the jth column of Q is [xj]p>. (e) Then for any ordered bases 0 and 7 for V. (a) 0 = {x2. .1 ) } {(2.3 ) } {(-4. Let T be a linear operator on a finite-dimensional vector space V. Then [J]p. (61. x.1 3). -x 2 + 2x + 1} and (f) / 0' = { 9 x . (a) (b) (c) (d) 0= 0 = 0= 0 = {ei. 1 ) } 3.1 . find the change of coordinate matrix that changes /^'-coordinates into 0coordinates. x + 1} and 0' = {x2 + x + 4. For each of the following pairs of ordered bases 0 and 0' for R2. . . (d) The matrices A.2 + 61X + 60. .4 .. .l} and 0' = {a2x2 + a\x + a 0 .2x2 + 3} ? = {x2 — a . 3x + 1. For each of the following pairs of ordered bases 0 and 0' for P2(R). (2.02). x 2 + 3x . .2. Label the following statements as true or false. -2x2 + 5x + 5.x. x 2 + 2 1 x .2x2 . c2x2 + C\X + CQ} 2 (b) 0 = {l. let 0 and 0' be ordered bases for V. .3} 9 = {2x2 . x 2 . x } and 0' = {1. . (5. x } 2 2 (d) 0 = {x . 2.B£ M n X n ( F ) are called similar if B = Q*AQ for some Q < E MnXn(F). [T]^ is similar to [T]7.3.. (a) Suppose that 0 = {x\.xn} and 0' = { x i .116 Chap. (b) Every change of coordinate matrix is invertible. ( . x } and 2 0' = {a2x + mx + a0. x 2 + 1.5).1). (2.0)} and 0' = {eue2} and 0' = {(2. 3 x 2 + 5x + 2} 4.62)} and 0' = {(0. Let T be the linear operator on R2 defined by T 2a+ 6 a — 36 . (c) Let T be a linear operator on a finite-dimensional vector space V. find the change of coordinate matrix that changes /^'-coordinates into /^-coordinates.10).3).6 2 x 2 + 61 a: + 60. 2 Linear Transformations and Matrices EXERCISES 1. a : — 1} and (e) / 2 0' = {5a: .x .Sx + 2.

5 The Change of Coordinate Matrix let 0 be the standard ordered basis for R2. find an invertible matrix Q such that [LA]P = Q~lAQ. (See the definition of projection in the exercises of Section 2. 1 — x}. Prove the following generalization of Theorem 2. the derivative of p(x). Let 0 — {l. For each matrix A and ordered basis /?.Sec.) 8.1. Let T be the linear operator on Pi(R) defined by T(p(x)) = p'(x). Let T: V — • » W be a linear transformation from a finite-dimensional vector space V to a finite-dimensional vector space W.23 and the fact that 1 1 tofind[T]p>. let L be the line y = mx. (b) T is the projection on L along the line perpendicular to L. 1\ -ij -1 ( u l 1 2 1 2 (d) A = 7. Also. where m ^ 0. Use Theorem 2. and let 0' = Use Theorem 2. In R2. 6. Find an expression for T(x. where (a) T is the reflection of R2 about L. 2. t/).a:} and 0' = {1 + x. find [L^]^. r2 1 117 2 -1 -1 1 5.23 and the fact that i 1 to find [T]p'. Let 0 and 0' be ordered bases for .23.

. Prove that 0' is a basis for V and hence that Q is the change of coordinate matrix changing /^'-coordinates into /^-coordinates. respectively. then RQ is the change of coordinate matrix that changes o>coordinates into 7-coordinates. . 14..118 Chap. Define x'j = 22 Qijxi for 1 < J < ™. T = L^. 2 Linear Transformations and Matrices V. then Q~l changes /^-coordinates into a-coordinates. respectively. x n } be an ordered basis for V. 11. Now apply the results of Exercise 13 to obtain ordered bases 0' and 7' from 0 and 7 via Q and P . 10. and let 0 — {x\.0. and if there exist invertible mxrn and n x n matrices P and Q. . Prove that "is similar to" is an equivalence relation on MnXn(F). then there exist an n-dimensional vector space V and an m-dimensional vector space W (both over F). (a) Prove that if Q and R are the change of coordinate matrices that change a-coordinates into ^-coordinates and .23. . respectively. then tr(A) = tr(I?). W = F m . x 2 . Prove the converse of Exercise 8: If A and B are each mxn matrices with entries from a field F . and let 7 and 7' be ordered bases for W. Hint: Use Exercise 13 of Section 2. and a linear transformation T: V — > W such that A = [T]} and B = [!]}. .* Let V be a finite-dimensional vector space over a field F . (b) Prove that if Q changes a-coordinates into /3-coordinates.0-coordinates into 7-coordinates. where Q is the matrix that changes /^'-coordinates into /^-coordinates and P is the matrix that changes 7'-coordinates into 7-coordinates. • • •. 12. a. x'n}. Let Q be an nxn invertible matrix with entries from F. 9. and 0 and 7 be the standard ordered bases for F" and F m . and set 0' — {x[. = P _ 1 [T]JQ. respectively. Prove the corollary to Theorem 2. Then [T]j/. Prove that if A and B are similar n x n matrices. such that B = P~1AQ.3. ordered bases 0 and 0' for V and 7 and 7' for W. 13. Let V be a finite-dimensional vector space with ordered bases and 7. x 2 . Hints: Let V = F n .

2.6* DUAL SPACES

In this section, we are concerned exclusively with linear transformations from a vector space V into its field of scalars F, which is itself a vector space of dimension 1 over F. Such a linear transformation is called a linear functional on V. We generally use the letters f, g, h, ... to denote linear functionals. As we see in Example 1, the definite integral provides us with one of the most important examples of a linear functional in mathematics.

Example 1
Let V be the vector space of continuous real-valued functions on the interval [0, 2π]. Fix a function g ∈ V. The function h: V → R defined by

h(x) = (1/2π) ∫₀^{2π} x(t) g(t) dt

is a linear functional on V. In the cases that g(t) equals sin nt or cos nt, h(x) is often called the nth Fourier coefficient of x. •

Example 2
Let V = M_{n×n}(F), and define f: V → F by f(A) = tr(A), the trace of A. By Exercise 6 of Section 1.3, we have that f is a linear functional. •

Example 3
Let V be a finite-dimensional vector space, and let β = {x₁, x₂, ..., x_n} be an ordered basis for V. For each i = 1, 2, ..., n, define f_i(x) = a_i, where

[x]_β = (a₁; a₂; ...; a_n)

is the coordinate vector of x relative to β. Then f_i is a linear functional on V called the ith coordinate function with respect to the basis β. Note that f_i(x_j) = δ_{ij}. These linear functionals play an important role in the theory of dual spaces (see Theorem 2.24). •

Definition. For a vector space V over F, we define the dual space of V to be the vector space L(V, F), denoted by V*.

Thus V* is the vector space consisting of all linear functionals on V with the operations of addition and scalar multiplication as defined in Section 2.2. Note that if V is finite-dimensional, then by the corollary to Theorem 2.20 (p. 104)

dim(V*) = dim(L(V, F)) = dim(V)·dim(F) = dim(V).

Hence, by Theorem 2.19, V and V* are isomorphic. We also define the double dual V** of V to be the dual of V*. In Theorem 2.26, we show, in fact, that there is a natural identification of V and V** in the case that V is finite-dimensional.
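Returning to Example 1: for concrete choices of x and g the functional h can be approximated by a Riemann sum. The sketch below assumes numpy; the test functions are arbitrary choices made only for illustration.

import numpy as np

t = np.linspace(0, 2*np.pi, 20001)
dt = t[1] - t[0]
g = np.sin(t)                          # fix g(t) = sin t

def h(x_vals):
    """Approximate h(x) = (1/(2*pi)) * integral of x(t) g(t) over [0, 2*pi]."""
    return np.sum(x_vals * g) * dt / (2*np.pi)

x = 3*np.sin(t) + np.cos(2*t)          # an arbitrary test function
y = np.cos(t)
print(h(x))                            # approximately 1.5, the coefficient of sin t in x
print(np.isclose(h(2*x + y), 2*h(x) + h(y)))   # True: h is linear in x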

Theorem 2.24. Suppose that V is a finite-dimensional vector space with the ordered basis β = {x₁, x₂, ..., x_n}. Let f_i (1 ≤ i ≤ n) be the ith coordinate function with respect to β as just defined, and let β* = {f₁, f₂, ..., f_n}. Then β* is an ordered basis for V*, and, for any f ∈ V*, we have

f = Σ_{i=1}^{n} f(x_i) f_i.

Proof. Let f ∈ V*. Since dim(V*) = n, we need only show that

f = Σ_{i=1}^{n} f(x_i) f_i,

from which it follows that β* generates V*, and hence is a basis by Corollary 2(a) to the replacement theorem (p. 47).

Let g = Σ_{i=1}^{n} f(x_i) f_i. For 1 ≤ j ≤ n, we have

g(x_j) = (Σ_{i=1}^{n} f(x_i) f_i)(x_j) = Σ_{i=1}^{n} f(x_i) f_i(x_j) = Σ_{i=1}^{n} f(x_i) δ_{ij} = f(x_j).

Therefore f = g by the corollary to Theorem 2.6 (p. 72).

Definition. Using the notation of Theorem 2.24, we call the ordered basis β* = {f₁, f₂, ..., f_n} of V* that satisfies f_i(x_j) = δ_{ij} (1 ≤ i, j ≤ n) the dual basis of β.

Example 4
Let β = {(2,1), (3,1)} be an ordered basis for R². Suppose that the dual basis of β is given by β* = {f₁, f₂}. To explicitly determine a formula for f₁, we need to consider the equations

1 = f₁(2,1) = f₁(2e₁ + e₂) = 2f₁(e₁) + f₁(e₂)
0 = f₁(3,1) = f₁(3e₁ + e₂) = 3f₁(e₁) + f₁(e₂).

Solving these equations, we obtain f₁(e₁) = −1 and f₁(e₂) = 3; that is, f₁(x, y) = −x + 3y. Similarly, it can be shown that f₂(x, y) = x − 2y. •
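The computation in Example 4 generalizes: if the vectors of β are placed as the columns of a matrix, the rows of that matrix's inverse hold the coefficients of the dual basis functionals, since the product of the two matrices must be the identity. A sketch (numpy assumed, using the basis of Example 4):

import numpy as np

Bmat = np.array([[2, 3],
                 [1, 1]])       # columns are the basis vectors (2,1) and (3,1)
D = np.linalg.inv(Bmat)         # row i of D holds the coefficients of f_i

print(D)                        # [[-1, 3], [1, -2]]: f1(x,y) = -x + 3y and f2(x,y) = x - 2y
print(np.round(D @ Bmat))       # the identity matrix, i.e. f_i(x_j) = delta_ij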

)f.g2. . we have T t (g J ) = g j T = ^(g. .24. In Section 2. 2. Let V and W be finite-dimensional vector spaces over F with ordered bases 0 and 7. respectively. We now answer this question by applying" what we have already learned about dual spaces. it would be impossible for U to be a linear transformation from V into W. . By Theorem 2. respectively. For any linear transformation T: V -> W. respectively. the mapping V: W* -> V* defined by T'(g) = gT for all g e W* is a linear transformation with the property that [T']'« = ([T]!)'.s. column j entry of [T*]^.Sec. • • -/gm}. For g £ W*..T)(x.25.25 is called the transpose of T.f2._ We illustrate Theorem 2. To find the j t h column of [Tf]ij. the question arises as to whether or not there exists a linear transformation U associated with T in some natural way such that U may be represented in some basis as A1. we begin by expressing T (gj) as a linear combination of the vectors of 0*. For a matrix of the form A = [T]l.. it is clear that T*(g) = gT is a linear functional on V and hence is in V*. Of course.. x 2 . It is clear that T is the unique linear transformation U such that [ < = cm?)*. is (gjT)(xi) = gj(T(Xi)) = gj I J2 Akiyk j m = ^2Akigj(yk) fc=i Hence [T*]£ = A*. . x n } and 7 = {3/1. To complete the proof. Theorem 2. Thus T f maps W* into V*. let A = [T]^. s=l So the row i. if m / n. fc=i I The linear transformation Tl defined in Theorem 2. we proved that there is a one-to-one correspondence between linear transformations T: V — » W and mxn matrices (over F) via the correspondence T *-*• [T]2.ym} with dual bases 0* — {fi.». rn = y~] Akj6jk = Aji.4. let 0 = { x i . Proof.f n } and 7* = {gi. We leave the proof that Tb is linear to the reader. .6 Dual Spaces 121 We now assume that V and W are finite-dimensional vector spaces over F with ordered bases 0 and 7.2/2. For convenience. • . • • • .25 with the next example.

Suppose that [T*]£ = P b \ Then T*(gi) = af.122 Example 5 Chap. So T*( g l )(l) = (ofi + cf 2 )(l) = afi (1) + cf 2 (l) = a(l) + c(0) = a. an isomorphism between V and V** that does not depend on any choice of bases for the two vector spaces. Clearly. Let f = f. and d = 2. 2 Linear Transformations and Matrices Define T: ?i{R) -* R2 by T(p(x)) = (p(0).25.(1 j) • We compute [T']r. But also (T t (gi))(l) = g i ( T ( l ) ) = g 1 ( l . Let x ^ 0. Then ip is an isomorphism.. . +cf 2 .. I Theorem 2. Lemma. Let /?* = {f 1. f 2 .p(2)). Then fi(x x ) = 1 ^ 0. It is easy to verify that x is a linear functional on V*.. . If x(f) = 0 for all f G V*. Let {fi.f1 1 = ( m ? ) \ 0 2 • We now concern ourselves with demonstrating that any finite-dimensional vector space V can be identified in a natural way with its double dual V**. . so x G V**.X2. x n } for V such that xi = x. For a vector x G V. in fact. . Choose an ordered basis 0 = {xi. There is. So a = 1. f n } be the dual basis of 0. l ) = l. .g 2 }. and let x G V. Proof. The correspondence x *-* x allows us to define the desired isomorphism between V and V**.26.. We show that there exists f G V* such that x(f) ^ 0. f2} ami 7* = {gi. directly from the definition. Let 0 and 7 be the standard ordered bases for Pi(R) and R2. Let V be a finite-dimensional vector space. we obtain that c = 0. b = 1. . Hence a direct computation yields [T / as predicted by Theorem 2.. m? . and define ip: V — » V** by ip(x) = x. respectively. Let V be a finite-dimensional vector space. we define x: V* — * F by x(f) = f(x) for every f G V*. then x = 0. . Using similar computations.

(e. (c) Every vector space is isomorphic to its dual space. (d) Every vector space is the dual of some other vector space.. | Corollary.. (b) A linear functional defined on a field may be represented as a 1 x 1 matrix. Let {fi. y G V and c£ F. then the domain of (vy is v**.. then T(0) = 0*. In fact. Then every ordered basis for V* is the dual basis for some basis for V.. and V** are isomorphic. (g) If V is isomorphic to W. Let V be a finite-dimensional vector space with dual space V*. (a) Every linear transformation is a linear functional.. Proof. f n } is the dual basis of {xi. . 5ij = x~i(fj) = f/(xi) for all i and j. Therefore -0(cx + y) = ex + y = cip(x) + V(y)- 123 (b) • { / > is one-to-one: Suppose that i})(x) is the zero functional on V* for some x G V. the existence of a dual space). for infinite-dimensional vector spaces. (a) if) is linear: Let x.f 2 .. f2.6 Dual Spaces Proof. Assume that all vector spaces are finite-dimensional.g. x2. then V* is isomorphic to W*. that is.. we have V>(cx + y)(f) = f(cx + y) = ef(x) + % ) = cx(f) + y(f) = (cx + y)(f).f n } be an ordered basis for V*... x ^ | . (c) ijj is an isomorphism: This follows from (b) and the fact that dim(V) = dim(V**).. EXERCISES 1.26 to conclude that for this basis for V* there exists a dual basis {xi. . we conclude that x = 0. only a finite-dimensional vector space is isomorphic to its double dual via the map x — » x. . For f G V*.X2. 2. We may combine Theorems 2. x n } in V**.. Label the following statements as true or false.Sec... Then x(f) = 0 for every f G V*. no two of V.. (f) If T is a linear transformation from V to W. can be extended to the case where V is not finite-dimensional.24 and 2. Thus {fi. By the previous lemma. V*.. (e) If T is an isomorphism from V onto V* and 0 is a finite ordered basis for V. I! Although many of the ideas of this section.

Let V = R3.2p(l). For each of the following vector spaces V and bases 0. and define fi.1)} (b) V = P 2 ( Z ? ) . find explicit formulas for vectors of the dual basis 0* for V*.z) =x-2y.f(p(x))=J 0 1 p(t)<ft (f) V = M2x2(F). (c) Compute [T]p and ([T]^)4. and find a basis for V for which it is the dual basis. x 2 } 4.R). Let V = Pi(. define fi.x). f:i(x. and d such that T'(fi) = ofi + cf2 and T'(f 2 ) = 6f: + df2.Zz. determine which are linear functionals. and then find a basis for V for which it is the dual basis. Define T: V — > W by T(p(x)) = (p(0) . (a) V = R 3 .2/) = (2x. x . / 3 = { l . (0. where ' denotes differentiation R 2 . (b) Compute [T^p*. f 5.y) (Zx + 2y.0.y. f2} is the dual basis.124 Chap.y) = (a) Compute T*(f). z ) = a . as in Example 4. Prove that {fi. f ( x . f2(x. b.f3 G V* as follows: fi(x.1). T3} is a basis for V*. z) = x + y + z. f2.f(A) = Au 3. y . f(p(x)) = 2p'(0) + p"(l).f(x. where p'(x) is the derivative of p(x). where 0 is the standard ordered basis for R2 and 0* = {f 1. c. (a) (b) (c) (d) V= V= V= V= P(R). 7. 2.f2. Prove that {fi. and./? = {(1. 6.z) = y . 2 Linear Transformations and Matrices (h) The derivative of a function may be considered as a linear functional on the vector space of differentiable functions.1).f 2 G V* by fi(p(x))= / p(t)dt Jo and f 2 (p(x))= / Jo p(t)dt. (1. Let V = Pi(R) and W = R2 with respective standard ordered bases 0 and 7.f2} is a basis for V*.y. for p(x) G V. by finding scalars a. = 2x + y and T: R2 -> R2 by T(x. For the following functions f on a vector space V. and compare your results with (b). Define f G (R2)* by f(x.f(i4)=tr(i4) R 3 .2.0.p(0) +p'(0)).4y) M2x2(F).y. 2 y2 + z2 (e) V = P(/l). .

. . (a) For 0 < i < n.gm} is the dual basis of the standard ordered basis for F m .. In fact. (b) Compute [T*]^. 3 8.. define fj(x) = (g. Hint: Apply any linear combination of this set that equals the zero transformation to p(x) = (x — ci)(x — c2) • • • (x — cn). Hint: If T is linear. rt m 9.25. f i ..26 and (a) to show that there exist unique polynomials po(x).p n (x) such that Pi(cj) = 5ij for 0 < i < n. (e) Prove that r-b n j p(t)dt = ^2p(ci)di. . . .pi(x). Ja i=0 where di= [ Pi(t)dt. and compare your results with (b). State an analogous result for R2. and deduce that the first coefficient is zero. a x ( ) = ^2<hPi(x). . These polynomials are the Lagrange polynomials defined in Section 1. that is. Prove that a function T: F —• F is linear if and only if there exist n fi. define U G V* by U(p(x)) = p(c. Ja X ^P(ci)Pi(x) .fm G (F )* such that T(x) = (fi(x).f 2 . • • • . . (c) Compute [T]« and its transpose. deduce that there exists a unique polynomial q(x) of degree at most n such that q(ci) — a.i).. c i .i for 0 < i < n.. (b) Use the corollary to Theorem 2. fli. . Let V = P n (F)....f n } is a basis for V*.6 Dual Spaces 125 (a) For f G W* defined by f (o. b) = a .. . 10. Prove that {fo.g2.f 2 (x). . a n (not necessarily distinct). and let c n . . where {gi.. (c) For any scalars an. fi — T'(gi) for 1 < i < m. Show that every plane through the origin in R may be identified with 3 the null space of a vector in (R )*. compute T'(f). 2.T)(x) for x G F n .6. i=0 n (d) Deduce the Lagrange interpolation formula: P( ) = i=0 for any p(x) G V.--. without appealing to Theorem 2. cn be distinct scalars in F .Sec.26.f m (x)) for all x G F n .

. V denotes a finite-dimensional vector space over F ..26. 13. (b) If W is a subspace of V and x g" W. 2 Linear Transformations and Matrices i(b — a) n . where 'ip is defined as in Theorem 2. respectively. (e) For subspaces Wi and W 2 . this result yields Simpson's rule for evaluating the definite integral of a polynomial.. Let V and W be finite-dimensional vector spaces over F . and let tp\ and ip2 be the isomorphisms between V and V** and W and W**.X2.e. Prove that if W is a subspace of V..f n }..X2. Prove that ip(0) = 0**. _i for % = 0 . .126 Suppose now that Ci = a-\ Chap. . the preceding result yields the trapezoidal rule for evaluating the definite integral of a polynomial. x n } of V. prove that Wi = W2 if and only if W? = Wg. Let T: V — » W be linear. . Let 0* = {fi. (a) Prove that S° is a subspace of V*. where ip is defined in Theorem 2. Prove that {f fc+ i.. Hint: Extend an ordered basis {xi. 11. . . Prove that the diagram depicted in Figure 2.26.6 commutes (i. • • •..f„} is a basis for W°. . For n = 1. Let V be a finite-dimensional vector space with the ordered basis 0. n. (c) Prove that (5°)° = span(^'(S')). define the annihilator 5° of 5 as 5° = {f G V*: f(x) = 0 for all x G S}. . (d) For subspaces Wi and W2. For every subset S of V. V W i>2 V / W Figure 2. and define T " = (T*)*. -. In Exercises 13 through 17. show that (Wj + W 2 )° = Wj n W§. then dim(W) + dim(W°) = dim(V). 14.f2. xk] of W to an ordered basis 0 = {xi. 1 .26...6 12. as defined in Theorem 2.ffc+2. prove that ip2T = T " ^ ) . For n — 2. prove that there exists f G W° such that f (x) ^ 0.

Prove that < I > is an isomorphism. 20. and let T: V — » W be a linear transformation. let F(t) denote the force acting on the weight and y(t) denote the position of the weight along the y-axis. Use Exercises 14 and 15 to deduce that rank(L^t) = rank(L^) for any A&MmXn(F).7* HOMOGENEOUS LINEAR DIFFERENTIAL EQUATIONS WITH CONSTANT COEFFICIENTS As an introduction to this section. the weight is lowered a distance s along the y-axis and released.1. Hint: Apply Exercise 34 of Section 2. The spring then begins to oscillate. consider the following physical problem. W / V ) . At any time t > 0.7). For example. 18. The second derivative of y with respect . Let T be a linear operator on V. Prove that N(T<) = (R(T))°. We describe the motion of the spring.) Let $: V* —• J-(S. Prove that there exists a nonzero linear functional f G V* such that f (x) = 0 for all x G W. Suppose that at a certain time. Let V be a nonzero vector space over a field F .1) if and only if W° is ^-invariant. the restriction of f to S. A weight of mass m is attached to a vertically suspended spring that is allowed to stretch until the forces acting on the weight are in equilibrium. 17.7. Suppose that W is a finite-dimensional vector space and that T: V — »W is linear. 19. Hint: For the infinite-dimensional case. 2. 16. use Exercise 34 of Section 2. y(0) = — s.Sec..7. Suppose that the weight is now motionless and impose an xy-coordinate system with the weight at the origin and the spring lying on the positive y-axis (see Figure 2. Let V and W be nonzero vector spaces over the''same field. Hint: Parts of the proof require the result of Exercise 19 for the infinitedimensional case. 60) in Section 1.13 (p.1 as well as results about extending linearly independent sets to bases in Section 1. and let W be a proper subspace of V (i. and let 5 be a basis for V.e. every vector space has a basis. (By the corollary to Theorem 1.7 Homogeneous Linear Differential Equations with Constant Coefficients 127 15. F) be the mapping defined by $(f) = fs. say t = 0. Let V be a nonzero vector space. Prove that W is T-invariant (as defined in the exercises of Section 2. 2. (b) Prove that T* is onto if and only if T is one-to-one. (a) Prove that T is onto if and only if T* is one-to-one. and let W be a subspace of V.

to time, y''(t), is the acceleration of the weight at time t; hence, by Newton's second law of motion,

F(t) = my''(t).     (1)

It is reasonable to assume that the force acting on the weight is due totally to the tension of the spring, and that this force satisfies Hooke's law: The force acting on the weight is proportional to its displacement from the equilibrium position, but acts in the opposite direction. If k > 0 is the proportionality constant, then Hooke's law states that

F(t) = -ky(t).     (2)

Combining (1) and (2), we obtain my'' = -ky or

y'' + (k/m)y = 0.     (3)

The expression (3) is an example of a differential equation. A differential equation in an unknown function y = y(t) is an equation involving y, t, and derivatives of y. If the differential equation is of the form

a_n y^(n) + a_(n-1) y^(n-1) + ... + a_1 y^(1) + a_0 y = f,     (4)

then the equation is said to be linear, where a_0, a_1, ..., a_n and f are functions of t and y^(k) denotes the kth derivative of y. The functions a_i are called the coefficients of the differential equation (4). Thus (3) is an example of a linear differential equation in which the coefficients are constants and the function f is identically zero. When f is identically zero, (4) is called homogeneous.

In this section, we apply the linear algebra we have studied to solve homogeneous linear differential equations with constant coefficients. If a_n ≠ 0, we say that differential equation (4) is of order n.

In this case, we divide both sides of (4) by a_n to obtain a new, but equivalent, equation

y^(n) + b_(n-1) y^(n-1) + ... + b_1 y^(1) + b_0 y = 0,

where b_i = a_i / a_n for i = 0, 1, ..., n - 1. Because of this observation, we always assume that the coefficient a_n in (4) is 1.

A solution to (4) is a function that when substituted for y reduces (4) to an identity.

Example 1
The function y(t) = sin √(k/m) t is a solution to (3) since

y''(t) + (k/m) y(t) = -(k/m) sin √(k/m) t + (k/m) sin √(k/m) t = 0

for all t. Notice, however, that substituting y(t) = t into (3) yields

y''(t) + (k/m) y(t) = (k/m) t,

which is not identically zero. Thus y(t) = t is not a solution to (3). •

In our study of differential equations, it is useful to regard solutions as complex-valued functions of a real variable even though the solutions that are meaningful to us in a physical sense are real-valued. The convenience of this viewpoint will become clear later. Thus we are concerned with the vector space F(R, C) (as defined in Example 3 of Section 1.2). In order to consider complex-valued functions of a real variable as solutions to differential equations, we must define what it means to differentiate such functions. Given a complex-valued function x ∈ F(R, C) of a real variable t, there exist unique real-valued functions x1 and x2 of t such that

x(t) = x1(t) + i x2(t) for t ∈ R,

where i is the imaginary number such that i^2 = -1. We call x1 the real part and x2 the imaginary part of x.

Definitions. Given a function x ∈ F(R, C) with real part x1 and imaginary part x2, we say that x is differentiable if x1 and x2 are differentiable. If x is differentiable, we define the derivative x' of x by

x' = x1' + i x2'.

We illustrate some computations with complex-valued functions in the following example.

Example 2
Suppose that x(t) = cos 2t + i sin 2t. Then

x'(t) = -2 sin 2t + 2i cos 2t.

We next find the real and imaginary parts of x^2. Since

x^2(t) = (cos 2t + i sin 2t)^2 = (cos^2 2t - sin^2 2t) + i(2 sin 2t cos 2t) = cos 4t + i sin 4t,

the real part of x^2(t) is cos 4t, and the imaginary part is sin 4t. •

The next theorem indicates that we may limit our investigations to a vector space considerably smaller than F(R, C). Its proof, which we omit, involves a simple induction argument, which is illustrated in Example 3.

Theorem 2.27. Any solution to a homogeneous linear differential equation with constant coefficients has derivatives of all orders; that is, if x is a solution to such an equation, then x^(k) exists for every positive integer k.

Example 3
To illustrate Theorem 2.27, consider the equation

y^(2) + 4y = 0.

Clearly, to qualify as a solution, a function y must have two derivatives. If y is a solution, however, then

y^(2) = -4y.

Thus since y^(2) is a constant multiple of a function y that has two derivatives, y^(2) must have two derivatives; hence y^(4) exists, and in fact

y^(4) = -4y^(2).

Since y^(4) is a constant multiple of a function that we have shown has at least two derivatives, it also has at least two derivatives; hence y^(6) exists. Continuing in this manner, we can show that any solution has derivatives of all orders. •

Definition. We use C∞ to denote the set of all functions in F(R, C) that have derivatives of all orders.

It is a simple exercise to show that C∞ is a subspace of F(R, C) and hence a vector space over C. In view of Theorem 2.27, it is this vector space that is of interest to us.
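The computations in Example 2 can be spot-checked numerically by differentiating the real and imaginary parts separately, exactly as the definition of the derivative prescribes. The following Python/NumPy sketch is our own illustration (a finite-difference check only, not part of the text): it compares the numerical derivative of x(t) = cos 2t + i sin 2t with -2 sin 2t + 2i cos 2t, and compares the real and imaginary parts of x^2 with cos 4t and sin 4t.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
x = np.cos(2*t) + 1j*np.sin(2*t)

# derivative of x = x1 + i*x2 is x1' + i*x2'; approximate each part by finite differences
dx = np.gradient(x.real, t) + 1j*np.gradient(x.imag, t)
print(np.max(np.abs(dx - (-2*np.sin(2*t) + 2j*np.cos(2*t)))))  # small (finite-difference error only)

# real and imaginary parts of x^2 agree with cos 4t and sin 4t
x2 = x**2
print(np.max(np.abs(x2.real - np.cos(4*t))), np.max(np.abs(x2.imag - np.sin(4*t))))  # both ~0
```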

where p(t) is the auxiliary polynomial associated with the equation. We can use the derivative operation to define a mapping D: C°° — + C°° by D(x) = x' forxGC00.+ aiy{i) +a0y=(). More generally. For x G C°°. 2.Sec. It is easy to show that D is a linear operator. can be rewritten using differential operators as . the complex polynop(t) = tn + o n _ i i n _ 1 + • • • + oii + a 0 is called the auxiliary polynomial associated with the equation. .Dn + o w _ i D n .) Definitions.7 Homogeneous Linear Differential Equations with Constant Coefficients 131 is of interest to us. p(D) is called a differential operator.1 + • • • + o i D + o 0 l. For example. +an-it n . mial aiD + ool)(y) = 0. Given the differential equation above. y{n)+an-iy(n-v + --. this equation implies the following theorem. the derivative x' of x also lies in C°°. consider any polynomial over C of the form p(t) = antn If we define p(D) = anDn + On-iD*.1 + Definition. Clearly. then p(D) is a linear operator on C°°. For any polynomial p(t) over C of positive degree. (See Appendix E. (3) has the auxiliary polynomial Any homogeneous linear differential equation with constant coefficients can be rewritten as p(D)(y) = 0. Any homogeneous linear differential equation with constant coefficients.l + a\t + OQ. The order of the differential operator p(D) is the degree of the polynomial p(t). Differential operators are useful since they provide us with a means of reformulating a differential equation in the context of linear algebra.

Theorem 2.28. The set of all solutions to a homogeneous linear differential equation with constant coefficients coincides with the null space of p(D), where p(t) is the auxiliary polynomial associated with the equation.

Proof. Exercise.

Corollary. The set of all solutions to a homogeneous linear differential equation with constant coefficients is a subspace of C∞.

In view of the preceding corollary, we call the set of solutions to a homogeneous linear differential equation with constant coefficients the solution space of the equation. A practical way of describing such a space is in terms of a basis. We now examine a certain class of functions that is of use in finding bases for these solution spaces.

For a real number s, we are familiar with the real number e^s, where e is the unique number whose natural logarithm is 1 (i.e., ln e = 1). We know, for instance, certain properties of exponentiation, namely,

e^(s+t) = e^s e^t and e^(-s) = 1/e^s

for any real numbers s and t. We now extend the definition of powers of e to include complex numbers in such a way that these properties are preserved.

Definition. Let c = a + ib be a complex number with real part a and imaginary part b. Define

e^c = e^a (cos b + i sin b).

The special case

e^(ib) = cos b + i sin b

is called Euler's formula.

For example, for c = 2 + i(π/3),

e^c = e^2 (cos(π/3) + i sin(π/3)) = e^2 (1/2 + i √3/2).

Clearly, if c is real (b = 0), then we obtain the usual result: e^c = e^a. Using the approach of Example 2, we can show by the use of trigonometric identities that

e^(c+d) = e^c e^d and e^(-c) = 1/e^c

for any complex numbers c and d.
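Since e^c is defined here from scratch, a quick check that the definition behaves like the familiar exponential may be helpful. The short Python sketch below is our own illustration (the function name exp_c is ours): it implements e^c = e^a (cos b + i sin b) directly, compares it with the library exponential, and tests the two properties quoted above.

```python
import cmath
import math

def exp_c(c):
    # e^c = e^a (cos b + i sin b), where a and b are the real and imaginary parts of c
    a, b = c.real, c.imag
    return math.exp(a) * complex(math.cos(b), math.sin(b))

c = 2 + 1j*math.pi/3
d = -0.5 + 1.25j
print(exp_c(c), cmath.exp(c))                   # the two values agree
print(abs(exp_c(c + d) - exp_c(c)*exp_c(d)))    # ~0: e^(c+d) = e^c e^d
print(abs(exp_c(-c) - 1/exp_c(c)))              # ~0: e^(-c) = 1/e^c
```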

Definition. A function f: R → C defined by f(t) = e^(ct) for a fixed complex number c is called an exponential function.

The derivative of an exponential function, as described in the next theorem, is consistent with the real version. The proof involves a straightforward computation, which we leave as an exercise.

Theorem 2.29. For any exponential function f(t) = e^(ct), f'(t) = c e^(ct).

Proof. Exercise.

We can use exponential functions to describe all solutions to a homogeneous linear differential equation of order 1. Recall that the order of such an equation is the degree of its auxiliary polynomial. Thus an equation of order 1 is of the form

y' + a_0 y = 0.     (5)

Theorem 2.30. The solution space for (5) is of dimension 1 and has {e^(-a_0 t)} as a basis.

Proof. Clearly (5) has e^(-a_0 t) as a solution. Suppose that x(t) is any solution to (5). Then

x'(t) = -a_0 x(t) for all t ∈ R.

Define z(t) = e^(a_0 t) x(t). Differentiating z yields

z'(t) = (e^(a_0 t))' x(t) + e^(a_0 t) x'(t) = a_0 e^(a_0 t) x(t) - a_0 e^(a_0 t) x(t) = 0.

(Notice that the familiar product rule for differentiation holds for complex-valued functions of a real variable. A justification of this involves a lengthy, although direct, computation.) Since z' is identically zero, z is a constant function. (Again, this fact, well known for real-valued functions, is also true for complex-valued functions. The proof, which relies on the real case, involves looking separately at the real and imaginary parts of z.) Thus there exists a complex number k such that

z(t) = e^(a_0 t) x(t) = k for all t ∈ R.

So x(t) = k e^(-a_0 t). We conclude that any solution to (5) is a linear combination of e^(-a_0 t).
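The argument in the proof of Theorem 2.30 is easy to watch numerically: for any solution x, the auxiliary function z(t) = e^(a_0 t) x(t) is constant. The following sketch is our own illustration (the particular values of a_0 and k are hypothetical choices); it checks this for x(t) = k e^(-a_0 t) and also confirms, up to finite-difference error, that x satisfies x' + a_0 x = 0.

```python
import numpy as np

a0 = 1.5 + 0.75j            # hypothetical complex coefficient
k = 2.0 - 1.0j              # hypothetical constant
t = np.linspace(0.0, 2.0, 4001)

x = k * np.exp(-a0*t)
z = np.exp(a0*t) * x        # z(t) = e^(a0 t) x(t) should equal the constant k
print(np.max(np.abs(z - k)))                               # ~0

dx = np.gradient(x.real, t) + 1j*np.gradient(x.imag, t)    # derivative of real and imaginary parts
print(np.max(np.abs(dx + a0*x)))                           # small: x' + a0 x = 0
```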

l2/(»-1' + its auxiliary polynomial p(t) = tn + an_xtn'1 + • • • + ait + oo OiyW + a 0 ? / = 0. where c. el and e2t are solutions to the differential equation because c — 1 and c — 2 are zeros of p(t).30 is as follows.2). p(t) = (t-ci)(t-c2)---(t-cn). and so.U + 2 = (t .31. For any complex number c. by Exercise 9. (This follows from the fundamental theorem of algebra in Appendix D. N(D-Cil) C N(p(D)) foralli Since N(p(D)) coincides with the solution space of the given differential equation.cn are (not necessarily distinct) complex numbers. The operators D — c j commute..c2. Example 4 Given the differential equation y"-3y' its auxiliary polynomial is p(t) = t2 . if c is a zero ofp(t). ^+an. that is. This result is a consequence of the next theorem.l)(t .i. . we have that r. span({e*. Theorem 2. factors into a product of polynomials of degree 1.e 2 t } is linearly independent. . Let p(t) be the auxiliary polynomial for a homogeneous linear differential equation with constant coefficients.) Thus p(D) = (D-c1\)(D-c2\)---(D-cn\). Since the solution space of the differential equation is a subspace of C°°. the null space of the differential operator D — cl has {ect} as a basis. Hence. • + 2y=0. we can conclude that {e*. 2 Linear Transformations and Matrices Another way of stating Theorem 2.31. we can deduce the following result from the preceding corollary. We next concern ourselves with differential equations of order greater than one.134 Chap. Thus if we can show that the solution space is two-dimensional.e2*} is a basis for the solution space. Given an nth order homogeneous linear differential equation with constant coefficients. It is a simple matter to show that {e e . then ect is a solution to the differential equation. Corollary.e2*}) lies in the solution space. by Theorem 2. For any complex number c...
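The corollary above reduces Example 4 to arithmetic with the auxiliary polynomial: each zero c of p(t) contributes the solution e^(ct), because substituting y = e^(ct) into the equation produces p(c) e^(ct). A small Python sketch of this bookkeeping (our own illustration, not part of the text):

```python
import numpy as np

p = [1.0, -3.0, 2.0]            # auxiliary polynomial t^2 - 3t + 2 of y'' - 3y' + 2y = 0
zeros = np.roots(p)
print(zeros)                    # [2. 1.]: so e^t and e^(2t) are solutions

# substituting y = e^(ct) gives y'' - 3y' + 2y = (c^2 - 3c + 2) e^(ct) = p(c) e^(ct)
for c in zeros:
    print(np.polyval(p, c))     # 0 for each zero, so each e^(ct) solves the equation
```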

respectively. Let V be a vector space. W = w. Let W\ and w2 be the real and imaginary parts of w.. W\ and W2. Then the null space of TU is finite-dimensional... We wish to find a u G C°° such that (D — c\)u = v. Let v G C°°.v2. and suppose that T and U are linear operators on V such that U is onto and the null spaces of T and U are finite-dimensional. and the real and imaginary parts of W are W\ and W2.. Since U is onto. Let w(t) = v(t)e~ct for t G R.. Finally.cu(t) = W'(t)cct = w(t)e. Lemma 1.. Then w\ and w2 are continuous because they are differentiable.32. the null space ofp(D) is an n-dimensional subspace of C°°. Furthermore. To complete the proof of the lemma.vi. for any i and j. Furthermore.. Clearly u G C°°. . respectively. Hence they have antiderivatives.vq} contains p + q distinct vectors. Clearly. Proof.u2..7 Homogeneous Linear Differential Equations with Constant Coefficients 135 Theorem 2. . and since (D . 2.Vq] be bases for N(T) and N(U). Note that the Wi's are distinct.v2.. II + W(t)cect cW(f)ect L e m m a 2. q = dim(N(U)).wp.up} and {v\. we can choose for each i (1 < i < p) a vector Wi G V such that U(wi) — m. we establish two lemmas. For any differential operator p(D) of order n.:t = v(t)erctert = v(t). Proof. The differential operator D — c\: C°° — > C°° is onto for any complex number c.Sec. Let p — dim(N(T)). let u: R — * C be defined by u(t) = W(t)ect for t G R.. and {ui.c\)u(t) = u'(t) .. w G C°° because both v and e~ct lie in C°°. . say. Then W G C°°.32. Hence the set 0= {wi.w2. —* C be defined by W(t) = Wt (t) + iW2(t) for t G R. Wi 7^ Vj. for otherwise Ui = i)(wi) = U(VJ) = 0—a contradiction. As a preliminary to the proof of Theorem 2. it suffices to show that 0 is a basis for N(TU). Let W: R. respectively. and dim(N(TU)) = dim(N(T)) + dim(N(U)). . we have (D — c\)u = v.. .

TU(v) = T(U(v)). T U K ) = T(ui) = 0 and TU(^) = T(0) = 0. bq such that v . I Proof of Theorem 2. We conclude that 0 is a basis for N(TU). we obtain oiUi + a2u2 + • • • + apup — 0. Again.npwp).. V apup h aPU(wp) f. The proof is by mathematical induction on the order of the differential operator p(D)..ap..b\. suppose that Theorem 2.136 Chap. Thus (6) &if 1 + b2v2 " 1 r bqvq = 0.. any scalars such that aiwi + a 2 u'2 H h apwp + &1V1 + 62^2 H r...32.02. reduces to .. Since {m.(aiwi + a2w2 H or v = aiu/i + 02^2 H f. let ai. Since for any Wi and Vj in 0. Thus U(v) € N(T).6 g v 9 ..vq} implies that the frj's are all zero...aPwp)) = 0. = aiU(w\) + a2U(w2) H Consequently.32 holds for any differential operator of order less than n.v2. It follows that there exist scalars 61. it follows that 0 C N(TU).30. (6) Applying U to both sides of (6)...bq be r. and consider a differential . b2.a p w p ) = 61^1 + & 2 W 2H h &?u9 Therefore /? spans N(TU). the a..... The first-order case coincides with Theorem 2.. and dim(N(TU)) = p + q = dim(N(T)) + dim(N(U)). For some integer n > 1.. .up] is linearly independent. So there exist scalars 01. Now suppose that v G N(TU). Then 0 .o p such that \J(v) = aiu\ + 02^2 + = \J(ttiWi + 02^2 + Hence \J(v — (aiwi + 02^2 H f.&9vg = 0. v — (a\Wi + 02^2 + • • • + apwp) lies in N(U). Hence N(TU) is finitedimensional.apwp + 61 vi + 62^2 H r. .u2. ..a2. To prove that 0 is linearly independent.b2. 2 Linear Transformations and Matrices We first show that 0 generates N(TU). . the linear independence of {v\.'s are all zero.

D — cl is onto.. II ... Also..) Example 5 We find all solutions to the differential equation y" + 5y' + 4y=0. Thus the given differential operator may be rewritten as p(D) = g(D)(D-cl).33..30.. 2. Proof Exercise. eCn*} is a basis for the solution space of the differential equation. The solution space of any nth-order homogeneous linear differential equation with constant coefficients is an n-dimensional subspace of C°°. By the results of Chapter 1. the set of exponential functions {eClt..d ) ) = 1.eC2t.. Hints for its proof arc provided in the exercises. then {eCl*. Exercise.c2.tt} is linearly independent. . and by the corollary to Theorem 2.. Proof.7 Homogeneous Linear Differential Equations with Constant Coefficients 137 operator p(D) of order n..) 1 Corollary. (See Exercise 10.\. by the induction hypothesis.c n . Thus. The corollary to Theorem 2. (See Exercise 10..Cn.th-order homogeneous linear differential equation with constant coefficients to finding a set of n linearly independent solutions to the equation. The polynomial p(t) can be factored into a product of two polynomials as follows: p(t) = q(t)(t-c). For any nth-order homogeneous linear differential equation with constant coefficients. any such set must be a basis for the solution space. ec'2t.. we conclude that dim(N(p(D))) = dim(N(<-/(D))) + dim(N(D . by Lemma 2.ec.32 reduces the problem of finding all solutions to an r/.. I Corollary. d i m ( N ( D .Sec. dim(N(r/(D)) = n—1. The next theorem enables us to find a basis quickly for many such equations..c2. . Theorem 2. where q(t) is a polynomial of degree n — 1 and c is a complex number. if the auxiliary polynomial has n distinct zeros Ci. Now.cl)) = (n . Given n distinct complex numbers c.1) + 1 = n. by Lemma 1.

its solution space is two-dimensional. for which the auxiliary polynomial is (t + l) 2 .. Thus {e3*f. So any solution to the given equation is of the form y(t)=ble-t for unique scalars b\ and b2.138 Chap.. Using this latter basis. For a given complex number c and positive integer n.32. Example 6 We find all solutions to the differential equation y" + 9y = 0. By the corollary to Theorem 2.. In order to obtain a basis for the solution space. Thus {e~L. The reader can verify that te~l is a such a solution. . suppose that (t — c)n is the auxiliary polynomial of a homogeneous linear differential equation with constant coefficients. Since cos 3^ = .. By Theorem 2.tect. —1 and —4. 2 Linear Transformations and Matrices Since the auxiliary polynomial factors as (t + 4)(t + 1). we see that any solution to the given equation is of the form y(t) = b\ cos St + b2 sin 3t for unique scalars &iand b2. The auxiliary polynomial t2 + 9 factors as (t — 3i)(t + 3?') and hence has distinct zeros Ci = 3i and c2 — —3i. sin 3t} is also a basis for this solution space.e~4t} is a basis for the solution space. The following lemma extends this result. it has two distinct zeros.e-3**} is a basis for the solution space. e~l is a solution to this equation.31.tn-1ect} is a basis for the solution space of the equation. • + b2e-4t it follows from Exercise 7 that {cos St. This basis has an advantage over the original one because it consists of the familiar sine and cosine functions and makes no reference to the imaginary number i. we need a solution that is linearly independent of e _ t . • Next consider the differential equation y" + 2y' + y = 0.( e3/7 + e~ 3 i t ) and sin3£ = —(e :nt 2i -3it ). Then the set 0={ect. Lemma.
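The claim that t e^(-t) solves y'' + 2y' + y = 0 is quickly verified: with y = t e^(-t) we get y' = (1 - t) e^(-t) and y'' = (t - 2) e^(-t), and the three terms cancel. A short check in Python (our own illustration):

```python
import numpy as np

t = np.linspace(0.0, 3.0, 7)
y   = t * np.exp(-t)            # the proposed solution
dy  = (1 - t) * np.exp(-t)      # y', by the product rule
d2y = (t - 2) * np.exp(-t)      # y''
print(np.max(np.abs(d2y + 2*dy + y)))   # ~0 (up to rounding): t e^(-t) solves y'' + 2y' + y = 0
```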

Proof. Since the solution space is n-dimensional, we need only show that β is linearly independent and lies in the solution space.

First, observe that for any positive integer k,

(D - cI)(t^k e^(ct)) = k t^(k-1) e^(ct) + c t^k e^(ct) - c t^k e^(ct) = k t^(k-1) e^(ct).

Hence for k < n,

(D - cI)^n (t^k e^(ct)) = 0.

It follows that β is a subset of the solution space.

We next show that β is linearly independent. Consider any linear combination of vectors in β such that

b_0 e^(ct) + b_1 t e^(ct) + ... + b_(n-1) t^(n-1) e^(ct) = 0     (7)

for some scalars b_0, b_1, ..., b_(n-1). Dividing by e^(ct) in (7), we obtain

b_0 + b_1 t + ... + b_(n-1) t^(n-1) = 0.     (8)

Thus the left side of (8) must be the zero polynomial function. We conclude that the coefficients b_0, b_1, ..., b_(n-1) are all zero. So β is linearly independent and hence is a basis for the solution space.

Example 7
We find all solutions to the differential equation

y^(4) - 4y^(3) + 6y^(2) - 4y^(1) + y = 0.

Since the auxiliary polynomial is

t^4 - 4t^3 + 6t^2 - 4t + 1 = (t - 1)^4,

we can immediately conclude by the preceding lemma that {e^t, t e^t, t^2 e^t, t^3 e^t} is a basis for the solution space. So any solution y to the given differential equation is of the form

y(t) = b_1 e^t + b_2 t e^t + b_3 t^2 e^t + b_4 t^3 e^t

for unique scalars b_1, b_2, b_3, and b_4. •

The most general situation is stated in the following theorem.

Theorem 2.34. Given a homogeneous linear differential equation with constant coefficients and auxiliary polynomial

(t - c_1)^(n_1) (t - c_2)^(n_2) ... (t - c_k)^(n_k),

where n_1, n_2, ..., n_k are positive integers and c_1, c_2, ..., c_k are distinct complex numbers, the following set is a basis for the solution space of the equation:

{e^(c_1 t), t e^(c_1 t), ..., t^(n_1 - 1) e^(c_1 t), ..., e^(c_k t), t e^(c_k t), ..., t^(n_k - 1) e^(c_k t)}.
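Before turning to the proof and to Example 8, it may help to read Theorem 2.34 as a recipe: factor the auxiliary polynomial, and each zero c of multiplicity n contributes e^(ct), t e^(ct), ..., t^(n-1) e^(ct). The Python sketch below is our own illustration (the function name, the root-grouping tolerance, and the string output are ours; grouping numerically computed roots is only a heuristic) of building that basis from the polynomial's coefficients.

```python
import numpy as np

def solution_basis(coeffs, tol=1e-6):
    """coeffs = [1, a_(n-1), ..., a_0] of the auxiliary polynomial; returns basis labels."""
    roots = np.roots(coeffs)
    groups = []                          # pairs [zero, multiplicity]
    for r in roots:
        for g in groups:
            if abs(r - g[0]) < tol:      # treat nearly equal computed roots as one repeated zero
                g[1] += 1
                break
        else:
            groups.append([r, 1])
    basis = []
    for c, n in groups:
        c = complex(round(c.real, 6), round(c.imag, 6))
        label = f"{c.real:g}" if c.imag == 0 else f"({c.real:g}{c.imag:+g}i)"
        basis += [("" if j == 0 else f"t^{j} ") + f"e^({label}t)" for j in range(n)]
    return basis

print(solution_basis([1, -4, 5, -2]))   # auxiliary polynomial (t - 1)^2 (t - 2); order may vary
```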

Proof. Exercise.

Example 8
The differential equation

y^(3) - 4y^(2) + 5y^(1) - 2y = 0

has the auxiliary polynomial

t^3 - 4t^2 + 5t - 2 = (t - 1)^2 (t - 2).

By Theorem 2.34, {e^t, t e^t, e^(2t)} is a basis for the solution space of the differential equation. Thus any solution y has the form

y(t) = b_1 e^t + b_2 t e^t + b_3 e^(2t)

for unique scalars b_1, b_2, and b_3. •

EXERCISES

1. Label the following statements as true or false.
(a) The set of solutions to an nth-order homogeneous linear differential equation with constant coefficients is an n-dimensional subspace of C∞.
(b) The solution space of a homogeneous linear differential equation with constant coefficients is the null space of a differential operator.
(c) The auxiliary polynomial of a homogeneous linear differential equation with constant coefficients is a solution to the differential equation.
(d) Any solution to a homogeneous linear differential equation with constant coefficients is of the form a e^(ct) or a t^k e^(ct), where a and c are complex numbers and k is a positive integer.
(e) Any linear combination of solutions to a given homogeneous linear differential equation with constant coefficients is also a solution to the given equation.
(f) For any homogeneous linear differential equation with constant coefficients having auxiliary polynomial p(t), if c_1, c_2, ..., c_k are the distinct zeros of p(t), then {e^(c_1 t), e^(c_2 t), ..., e^(c_k t)} is a basis for the solution space of the given differential equation.
(g) Given any polynomial p(t) ∈ P(C), there exists a homogeneous linear differential equation with constant coefficients whose auxiliary polynomial is p(t).

Find a basis for the solution space of each of the following differential equations. then (d) x + yeU(P(D)q(D)).I ) (c) N(D3 + 6D 2 +8D) 5. C). For each of the following parts. then so is \{=o + v). Consider a second-order homogeneous linear differential equation with constant coefficients in which the auxiliary polynomial has distinct conjugate complex roots a + ib and a — ib. (b) Show that any differential operator is a linear operator on C01 7.3 D 2 + 3 D .7 Homogeneous Linear Differential Equations with Constant Coefficients 141 2. Find a basis for each of the following subspaces of C°°. Justify your claim with either a proof or a counterexample.D . (a) Show that D: C°° — * C°° is a linear operator.y{2) + 3y{1) +5y = 0 / 4. Given two polynomials p(t) and q(t) in P(C). is a solution to the equation. if a.b G R. (b) There exists a homogeneous linear differential equation with constant coefficients whose solution space has the basis {t.2. 6. Show that C°° is a subspace of T(R. . t2}. Prove that if {x. (a) N ( D 2 . where a. eat sin bt} is a basis for the solution space.</ ?/4) . (e) xyeU(p(D)q(D)). (a) Any finite-dimensional subspace of C°° is the solution space of a homogeneous linear differential equation with constant coefficients. Show that {eat cos bt. 2. 3. y} is a basis for a vector space over C.I ) (b) N ( D 3 .^-y) 8. whichever is appropriate. so is its derivative x'. + y=0 y(3) . if x G N(p(D)) and y G N((/(D)). (a) (b) (c) (d) (e) y" + 2y' + y = 0 */" = . determine whether the statement is true or false.Sec.(/2> + y = 0 y" + 2y. (c) For any homogeneous linear differential equation with constant coefficients.

apply the operator D — c n l to both sides equation to establish the theorem for n distinct exponential 11.e. The case k = 1 is the lemma to Theorem 2. apply the operator (D — Cfcl)"fc to any linear combination of the alleged basis that equals 0. where Dv: V — > V is defined by Dv(x) = x' for x G V. where g(t) and h(t) are polynomials of positive degree. Assuming that the theorem holds for k — 1 distinct Cj's. Then prove that the two spaces have the same finite dimension.142 Chap. (a) Prove that for any x G C°° there exists y G C°° such that y is a solution to the differential equation. Assuming that the theorem n — 1 functions.j). A differential equation y ( n ) + an-iy{n-l) + •••+ alV{l) + a0y = x is called a nonhomogeneous linear differential equation with constant coefficients if the a% 's are constant and x is a function that is not identically zero. • • •. Then verify that this set is linearly independent by mathematical induction on k as follows.34. U n } is a collection of pairwise commutative linear operators on a vector space V (i. 12. 13. U2. Let V be the solution space of an nth-order homogeneous linear differential equation with constant coefficients having auxiliary polynomial p(t). Prove that if p(t) = g(t)h(t). operators such that U7Uj = UjUi for all i. To show the b^s are zero. the linear operator p(D): C°° —• C°° is onto.32 to show that for any polynomial pit). Hint: First verify that the alleged basis lies in the solution space. Suppose that {Ui. 10. Hint: Use Lemma 1 to Theorem 2. apply mathematical induction on n Verify the theorem for n = 1..33 and its corollary. Prove Theorem 2. for any i (1 < i < n). Hint: Suppose that b\eClt + b2eC2t -\ h bneCnt = 0 (where the c^'s are distinct). . Prove Theorem 2.34. Hint: First prove o(D)(V) C N(/i(D)). 2 Linear Transformations and Matrices 9. Prove that. then N(MD)) = R(o(Dv)) = p(D)(V). is true for of the given functions. as follows. N(U i )CN(UiU 2 ---U n ).

to the given differential equation such that x(to) — co and x^(to) = ck for k = 1. \xSn-lHto)J (a) Prove that $ is linear and its null space is the zero subspace of V. and define a mapping $ : V — +C ™ by * *(*) = x(to) x'(to) \ for each x in V.. Hint: Use Exercise 14. Deduce that «£ is an isomorphism.T(TC_1)(£0) = 0. Prove that if z is any solution to the associated nonhomogeneous linear differential equation. there exists exactly one solution. Hint: Use mathematical induction on n as follows. prove that. and any complex numbers Co. 15.c n _i (not necessarily distinct). .. Now apply the induction hypothesis. . 16. It is well known that the motion of a pendulum is approximated by the differential equation 0" + ^8= 0. and consider an nth-order differential equation with auxiliary polynomial p(t). (b) Prove the following: For any nth-order homogeneous linear differential equation with constant coefficients. Show that z(to) = 0 a n d z' — cz — 0 to conclude that z = 0. First prove the conclusion for the case n = 1. Pendular Motion. Fix ta G R. n — 1. any to G R. Given any nth-order homogeneous linear differential equation with constant coefficients. . and let z = q((D))x. if x(t0) = x'(to) = ••• = . then the set of all solutions to the nonhomogeneous linear differential equation is [z + y: ye V}. Factor p(t) = q(t)(t — c).Sec.8). 2. Let V be the solution space of an nth-order homogeneous linear differential equation with constant coefficients.. for any solution x and any to G R. . x. 2 . 14. Ci. interpreted so that 6 is positive if the pendulum is to the right and negative if the pendulum is to the . Next suppose that it is true for equations of order n — 1. where 6(t) is the angle in radians that the pendulum makes with a vertical line at time t (see Figure 2.7 Homogeneous Linear Differential Equations with Constant Coefficients 143 (b) Let V be the solution space for the homogeneous linear equation y ^ + on-i^"-1) •+Oii(/ (1) +a0y= 0. then x = 0 (the zero function).

The variable t and constants / and g must be in compatible units (e. but that acts in the opposite direction. called the damping force. is given by my + ry' + ky — 0. The modification of (3) to account for the frictional force.) (c) Prove that it takes 2-Ky/lJg units of time for the pendulum to make one circuit back and forth. there is a frictional force acting on the motion that is proportional to the speed of motion.) 17.8 left of the vertical line as viewed by the reader. where r > 0 is the proportionality constant. (a) Find the general solution to this equation. . (This time is called the period of the pendulum. (a) Express an arbitrary solution to this equation as a linear combination of two real-valued solutions. In reality. The ideal periodic motion described by solutions to (3) is due to the ignoring of frictional forces. t in seconds. Find the general solution to (3). Periodic Motion of a Spring with Damping. (The significance of these conditions is that at time t = 0 the pendulum is released from a position displaced from the vertical by 0n..144 Chap.g. 2 Linear Transformations and Matrices Figure 2. ignoring frictional forces. 18. which describes the periodic motion of a spring. (b) Find the unique solution to the equation that satisfies the conditions 0(0) = 0O > 0 and 0'(O) = 0. and g in meters per second per second). however. / in meters. Periodic Motion of a Spring without Damping. Here I is the length of the pendulum and g is the magnitude of acceleration due to gravity.

that is. which do not involve linear algebra. Justify this approach. prove that ec+d = cced and (c) Prove Theorem 2. C). prove that lim y(t) = 0. (d) Prove Theorem 2. show that the amplitude of the oscillation decreases to zero. the initial velocity. (c) For y(t) as in (b). the product xy is differentiable and (xy)' = x'y + xy'. Hint: Apply the rules of differentiation to the real and imaginary parts of xy. Hint: Use mathematical induction on the number of derivatives possessed by a solution. 20. C) and x' — 0.27. 2 Index of Definitions 145 (b) Find the unique solution in (a) that satisfies the initial conditions y(0) — 0 and y'(0) = vo. arc included for the sake of completeness. t—>oo 19.d G C.28. (e) Prove the product rule for differentiating complex-valued functions of a real variable: For any differentiable functions x and y in T(R. (b) For any c. then x is a constant function. In our study of differential equations.Chap. (f) Prove that if x G J-(R.29. The following parts. INDEX OF DEFINITIONS FOR CHAPTER 2 Auxiliary polynomial 131 Change of coordinate matrix 112 Clique 94 Coefficients of a differential equation 128 Coordinate function 119 Coordinate vector relative to a basis 80 Differential equation 128 Differential operator 131 Dimension theorem 69 Dominance relation 95 Double dual 120 Dual basis 120 Dual space 119 Euler's formula 132 Exponential function 133 Fourier coefficient 119 Homogeneous linear differential equation 128 Identity matrix 89 Identity transformation 67 . we have regarded solutions as complex-valued functions even though functions that are useful in describing physical motion are real-valued. (a) Prove Theorem 2.

146 Chap. 2 Linear Transformations and Matrices Order of a differential operator 131 Product of matrices 87 Projection on a subspace 76 Projection on the x-axis 66 Range 67 Rank of a linear transformation 69 Reflection about the x-axis 66 Rotation 66 Similar matrices 115 Solution to a differential equation 129 Solution space of a homogeneous differential equation 132 Standard ordered basis for F" 79 Standard ordered basis for Pn(F) 79 Standard representation of a vector space with respect to a basis 104 Transpose of a linear transformation 121 Zero transformation 67 Incidence matrix 94 Inverse of a linear transformation 99 Inverse of a matrix 100 Invertible linear transformation 99 Invertible matrix 100 Isomorphic vector spaces 102 Isomorphism 102 Kronecker delta 89 Left-multiplication transformation 92 Linear functional 119 Linear operator 112 Linear transformation 65 Matrix representing a linear transformation 80 Nonhomogeneous differential equation 142 Nullity of a linear transformation 69 Null space 67 Ordered basis 79 Order of a differential equation 129 / .

4. 147 . the study of certain "rank-preserving" operations on matrices. multiplying any equation in the system by a nonzero constant.1 3. In this representation of the system. his chapter is devoted to two related objectives: 1.4 Elementary Matrix Operations and Elementary Matrices The Rank of a Matrix and Matrix Inverses Systems of Linear Equations—Theoretical Aspects Systems of Linear Equations—Computational Aspects _£. The technique by which the variables are eliminated utilizes three types of operations: 1. 2.3. These operations provide a convenient computational method for determining all solutions to a system of linear equations. involves the elimination of variables so that a simpler system can be obtained. In Section 3.2 3. which was discussed in Section 1. the three operations above are the "elementary row operations" for matrices. 3. adding a multiple of one equation to another. interchanging any two equations in the system. we obtain a simple method for computing the rank of a linear transformation between finite-dimensional vector spaces by applying these rank-preserving matrix operations to a matrix that represents that transformation. we express a system of linear equations as a single matrix equation. the application of these operations and the theory of linear transformations to the solution of systems of linear equations. Solving a system of linear equations is probably the most important application of linear algebra. 2.3 E l e m e n t a r y O p e r a t i o n s o f L i n e a r M a t r i x a n d S y s t e m s E q u a t i o n s 3. The familiar method of elimination for solving systems of linear equations. As a consequence of objective 1.3 3.

or (3). (3) adding any scalar multiple of a row [column] of A to another row [column]. type 2. Any of these three operations is called an elementary operation. The resulting matrix is B = Multiplying the second column of A by 3 is an example of an elementary column operation of type 2.12). There are two types of elementary matrix operations—row operations and column operations.1 Chap. They arise from the three operations that can be used to eliminate variables in a system of linear equations. (2) multiplying any row [column] of A by a nonzero scalar. or type 3 depending on whether they are obtained by (1). Definitions. As we will see. Example 1 Let A = Interchanging the second row of A with the first row is an example of an elementary row operation of type 1. Any one of the following three operations on the rows [columns] of A is called an elementary row [column] operation: (1) interchanging any two rows [columns] of A. we use these operations to obtain simple computational methods for determining the rank of a linear transformation and the solution of a system of linear equations. we define the elementary operations that are used throughout the chapter. The resulting matrix is C = . Elementary operations are of type 1. Let A be an m x n matrix. the row operations are more useful. 3 Elementary Matrix Operations and Systems of Linear Equations ELEMENTARY MATRIX OPERATIONS AND ELEMENTARY MATRICES In this section.148 3. In subsequent sections.

For example. or 3 operation. In this case. A can be obtained from M by adding —4 times the third row of M to the first row ofM. Then there exists an m x m [n x n] elementary matrix E such that B — EA [B = AE]. if E is . or 3 according to whether the elementary operation performed on In is a type 1. Notice that if a matrix Q can be obtained from a matrix P by means of an elementary row operation. fn fact.Sec.1 Elementary Matrix Operations and Elementary Matrices 149 Adding 4 times the third row of A to the first row is an example of an elementary row operation of type 3. is an elementary matrix since it can be obtained from 1$ by an elementary column operation of type 3 (adding —2 times the first column of I3 to the third column) or by an elementary row operation of type 3 (adding —2 times the third row to the first row). and suppose that B is obtained from A by performing an elementary row [column] operation. the resulting matrix is M = '17 2 7 12' 2 1 . 2. (See Exercise 4. interchanging the first two rows of I3 produces the elementary matrix Note that E can also be obtained by interchanging the first two columns of J3. An n x n elementary matrix is a matrix obtained by performing an elementary operation on In. Let A € M m X n (F). Definition. (See Exercise 8. 3. any elementary matrix can be obtained in at least two ways— either by performing an elementary row operation on In or by performing an elementary column operation on In. The elementary matrix is said to be of type 1. In fact. in Example 1. 2. then P can be obtained from Q by an elementary row operation of the same type. respectively.1 3 4 0 1 2 . Conversely. E is obtained from fm [fn] by performing the same elementary row [column] operation as that which was performed on A to obtain B.1.) Similarly. Theorem 3.) So. Our first theorem shows that performing an elementary row operation on a matrix is equivalent to multiplying the matrix by an elementary matrix.

which we omit. In this case. then EA [AE] is the matrix obtained from A by performing the same elementary row [column] operation as that which produces E from Im [In]. 3 Elementary Matrix Operations and Systems of Linear Equations an elementary m x m [n x n] matrix. By reversing the steps used to transform J n into E. we obtain the elementary matrix E = Note that EA = B. The result is that In can be obtained from E by an elementary row operation of the same type. we can transform E back into fn. and the inverse of an elementary matrix is an elementary matrix of the same type.1 for each type of elementary row operation. In the second part of Example 1. By Theorem 3.1. Let E be an elementary nxn matrix. Then E can be obtained by an elementary row operation on fn. The proof for column operations can then be obtained by using the matrix transpose to transform a column operation into a row operation. T h e o r e m 3.) The next example illustrates the use of the theorem. The proof. Proof.2. (See Exercise 7. requires verifying Theorem 3. we obtain the elementary matrix (\ 0 E = 0 \0 Observe that AE = C. Therefore. B is obtained from A by interchanging the first two rows of A. I 0 0 0\ 3 0 0 0 1 0 0 0 1/ . The details are left as an exercise.150 Chap. It is a useful fact that the inverse of an elementary matrix is also an elementary matrix. by Exercise 10 of Section 2. Elementary matrices are invertible. Performing this same operation on J3. Performing this same operation on I4.4. C is obtained from A by multiplying the second column of A by 3. Example 2 Consider the matrices A and B in Example 1. E is invertible and E~1 = E. there is an elementary matrix E such that EE = fn.
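Theorems 3.1 and 3.2 are easy to experiment with numerically: performing a row operation on the identity and then multiplying on the left has the same effect as performing the operation on A directly, a column operation corresponds to multiplying on the right, and the inverse of an elementary matrix is elementary of the same type. A small NumPy sketch follows (our own illustration; the matrix A below is hypothetical, since the entries of the matrices in Examples 1 and 2 are not reproduced here).

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # hypothetical 2 x 3 matrix

E = np.eye(2)
E[[0, 1]] = E[[1, 0]]               # type 1 row operation on I_2: interchange rows 1 and 2
print(E @ A)                        # same as interchanging the two rows of A

F = np.eye(3)
F[0, 2] = -2.0                      # type 3 column operation on I_3: add -2 times column 1 to column 3
print(A @ F)                        # adds -2 times the first column of A to its third column

print(np.linalg.inv(F))             # I_3 with +2 in the (1,3) position: elementary of the same type
```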

The transpose of an elementary matrix is an elementary matrix.3 3\ /l 1 . If B is a matrix that can be obtained by performing an elementary row operation on a matrix A.1 Elementary Matrix Operations and Elementary Matrices EXERCISES 1. By means of several additional operations. 3.2 0 1/ 4. transform C into I3. Prove that if B can be obtained from A by an elementary row [column] operation. (a) (b) (c) (d) (e) (f) (g) (h) 151 (i) An elementary matrix is always square. 6. . 5. 7.2 to obtain the inverse of each of the following elementary matrices. Let A--= /l 0 . The inverse of an elementary matrix is an elementary matrix. and C ^ 0 1/ \l 0 -2 -3 31 -2 1 Find an elementary operation that transforms A into B and an elementary operation that transforms B into C. Prove Theorem 3. Let A be an m x n matrix. The product of two n x n elementary matrices is an elementary matrix. then B can also be obtained by performing an elementary column operation on A. (a) /o 0 \1 0 1\ 1 0 0 0/ /l (b) ( 0 \0 0 0\ 3 0 0 1/ / 1 0 0\ (c) ( 0 1 0 \ . Prove the assertion made on page 149: Any elementary nxn matrix can be obtained in at least two ways -either by performing an elementary row operation on In or by performing an elementary column operation on fn. The sum of two nxn elementary matrices is an elementary matrix. B = ( 1 -2 VI .1. then A can be obtained by performing an elementary row operation on B. Prove that E is an elementary matrix if and only if Et is. Use the proof of Theorem 3. The only entries of an elementary matrix are zeros and ones. then Bl can be obtained from At by the corresponding elementary column [row] operation. Label the following statements as true or false. 3.Sec. 2. The n x n identity matrix is an elementary matrix. If B is a matrix that can be obtained by performing an elementary row operation on a matrix A.

Prove that any elementary row [column] operation of type 2 can be obtained by dividing some row [column] by a nonzero scalar.4 with respect to the appropriate standard ordered bases. Prove that any elementary row [column] operation of type 3 can be obtained by subtracting a multiple of some row [column] from another row [column]. Many results about the rank of a matrix follow immediately from the corresponding facts about a linear transformation. denoted rank(^4). A.4. 10. Theorem 3. namely. Thus the rank of the linear transformation LA is the same as the rank of one of its matrix representations.152 Chap.3. The next theorem extends this fact to any matrix representation of any linear transformation defined on finite-dimensional vector spaces. 12. Prove that any elementary row [column] operation of type 1 can be obtained by a succession of three elementary row [column] operations of type 3 followed by one elementary row [column] operation of type 2. 100) and Corollary 2 to Theorem 2. This is a restatement of Exercise 20 of Section 2. 3. we define the rank of a matrix. and let 0 and 7 be ordered bases for V and W.2 THE RANK OF A MATRIX AND MATRIX INVERSES In this section. Prove that if a matrix Q can be obtained from a matrix P by an elementary row operation. Hint: Treat each type of elementary row operation separately. Every matrix A is the matrix representation of the linear transformation L. Definition. We then use elementary operations to compute the rank of a matrix and a linear transformation. respectively. Proof. ffA G Mmxn(F). then P can be obtained from Q by an elementary row operation of the same type. is that an n x n matrix is invertible if and only if its rank is n. Let T: V — > W be a linear transformation between finitedimensional vector spaces. which follows from Fact 3 (p. 9. The section concludes with a procedure for computing the inverse of an invertible matrix. we define the rank of A. 3 Elementary Matrix Operations and Systems of Linear Equations 8. 102). An important result of this type. to be the rank of the linear transformation LA • F n — > Fm. Let A be an m x n matrix. 11. | . Prove that there exists a sequence of elementary row operations of types 1 and 3 that transforms A into an upper triangular matrix. Then rank(T) = rank([T]^).18 (p.

The next theorem and its corollary tell us how to do this. and hence rank(Z?) = rank(/4) by Theorem 3. m x m and nxn matrices. Let A be an m x n matrix. Proof. Therefore rank(AQ) = dim(R(L^c))) = dim(R(L yl )) = rank(A). We omit the details. apply Exercise 17 of Section 2.5.2 The Rank of a Matrix and Matrix Inverses 153 Now that the problem of finding the rank of a linear transformation has been reduced to the problem of finding the rank of a matrix. E is invertible.4. Theorem 3. Proof. |fi| Now that we have a class of matrix operations that preserve rank. First observe that R(L. then there exists an elementary matrix E such that B = EA. . Theorem 3. (c) rank(PAQ) = rank(A). 150). This establishes (a). we need a result that allows us to perform rank-preserving operations on matrices. If B is obtained from a matrix A by an elementary row operation. respectively. applying (a) and fab). By Theorem 3. that is. The proof that elementary column operations are rank-preserving is left as an exercise. To establish (b). rank(i4) = rank(L A ) = dim(R(Lj4)). the rank of a matrix is the dimension of the subspace generated by its columns. | ff P and Q are invertible Corollary.Sec. Elementary row and column operations on a matrix are rankpreserving. For any AGMmxn(F). we have mnk(PAQ) = rank(PA) = rank(A). The next theorem is the first of several in this direction. we need a way of examining a transformed matrix to ascertain its rank.4Q) = R(L^LQ) = L /V L Q (F") = L A ( L Q ( R ) ) = L ^ R ) = R(LA) since LQ is onto. Proof. The rank of any matrix equals the maximum number of its linearly independent columns.2 (p. (b) rank(PA) = rank(A).4 to T = Lp. Finally.4. 3. and therefore. then (a) rank(j4<2) = rank(A).
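The invariance statements of this kind are convenient to sanity-check numerically. In the sketch below (our own illustration; over R, a random square matrix is invertible with probability 1), A is built to have rank 2, and multiplying by invertible P and Q on either side leaves the rank unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # a 4 x 6 matrix of rank 2
P = rng.standard_normal((4, 4))                                  # invertible with probability 1
Q = rng.standard_normal((6, 6))

print([np.linalg.matrix_rank(M) for M in (A, P @ A, A @ Q, P @ A @ Q)])   # [2, 2, 2, 2]
```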

4 guarantees that the rank of the modified matrix is the same as the rank of A. 3 Elementary Matrix Operations and Systems of Linear Equations Let 0 be the standard ordered basis for F n . . Thus rank(A) = dim(R(L / i)) = dim(span({ai.2 (p. by Theorem 2.. . for any j. we have seen in Theorem 2.a 2 . The corollary to Theorem 3..A) = cjim I span = 2. But. 68). 90) that LA(CJ) = Aej = a.5 until A has been suitably modified by means of appropriate elementary row and column operations so that the number of linearly independent columns is obvious.. . . The next example illustrates this procedure. it is frequently useful to postpone the use of Theorem 3. the result is .a„})).. where aj is the j t h column of A. o 2 .. Then 0 spans F n and hence. Thus rank(. Example 2 Let A = If we subtract the first row of A from rows 2 and 3 (type 3 elementary row operations).. One such modification of A can be obtained by using elementary row and column operations to introduce zero entries.. K(LA) = span(LA(/?)) = span ({L A (ei). a n } ) . . Example 1 Let Observe that the first and second columns of A are linearly independent and that the third column is a linear combination of the first two. U(en)})..13(b) (p. To compute the rank of a matrix A. LA(e2). Hence R(LA) = s p a n ( { a i .154 Chap.

Then r < m. r < n. The number above each arrow indicates how many elementary operations are involved. T h e o r e m 3. As an aid in following the proof. though easy to understand.6. and. we first consider an example. Example 3 Consider the matrix /0 2 4 4 A = 8 2 \6 3 4 4 0 2 2 2\ 8 0 10 2 9 1/ By means of a succession of elementary row and column operations. A) 4 8 \6 2 4 2 3 4 4 0 2 2 2\ 8 0 10 2 9 1/ (4 4 0 2 8 2 \6 3 4 4 0 2 8 0\ 2 2 10 2 9 1/ / 1 0 8 \6 1 1 2 4 2 0 3 2 2 0\ 2 2 10 2 9 1/ .Sec. we obtain It is now obvious that the maximum number of linearly independent columns of this matrix is 2. Its proof.6 and its corollaries are quite important. Let A be an m x n matrix of rank r. We list many of the intermediate matrices.2 The Rank of a Matrix and Matrix Inverses 155 If we now subtract twice the first column from the second and subtract the first column from the third (type 3 elementary column operations). The power of this theorem can be seen in its corollaries. by means of a finite number of elementary row and column operations. but on several occasions a matrix is transformed from the preceding one by means of several elementary operations. • The next theorem uses this process to transform a matrix into a particularly simple form. Theorem 3.6. Hence the rank of A is 2. A can be transformed into the matrix D = Ir Q2 OX Oa where 0\. Try to identify the nature of each elementary operation (row or column and type) in the following matrix transformations. is tedious to read. and O3 are zero matrices. 3. we can transform A into a matrix D as in Theorem 3. 02. Thus Du = 1 for i < r and Dij = 0 otherwise.

and the next several operations (type 3) result in 0's everywhere in the first row and first column except for the 1. so rank(^l) = 3. By means of at most one elementary row and at most one elementary column .6. If A is the zero matrix. rank (. r = 0 by Exercise 3. The proof is by mathematical induction on m. Now suppose that A ^ O and r = rank (A). Clearly. Theorem 3. Thus the theorem is established for m = 1. Note that there is one linearly independent column in D. In this case. however.6.3 1/ (I 0 0 3 /l 0 0 \0 0 0 2 4 -6 -8 -3 -4 0\ 1 8 4/ 0 °\ 2 2 -6 2 -3 i/ 0 0 4 2 1 (1 0 0 [p 0 0 1 2 --6 . Subsequent elementary operations do not change the first row and first column. this matrix can in turn be transformed into the matrix (1 0 0).4 0 0\ 1 1 -6 2 . rank(D) = 3. Next assume that the theorem holds for any matrix with at most m — 1 rows (for some m > 1). By means of at most n — 1 type 3 column operations.4 and by Theorem 3.3 . we proceed with the proof of Theorem 3.156 Chap. Since A ^ O. Aij ^ 0 for some i.8 --3 ./I) = rank(D).3 1/ 0 0\ 0 0 0 2 0 4y 2 A 0 0 0 0 1 2 1 ^ 0 0 4 0 \0 0 2 0 0 0 0 ( 0\ 0 2 °> 3 /l 0 0 1 * 0 0 \0 0 0 0 0 0 0\ 0 8 4/ (\ 0 0 0 1 0 0 0 1 ^° 0 2 l /l c 0 0 1 0 0 0 1 v» ( 0 I /l 0 0 0 0 0 <A 1 0 0 0 0 1 0 0 V» 0 0 0 oy By the corollary to Theorem 3.1 position. If n — 1. the conclusion follows with D = A.4. then r > 0. We now suppose that n > 1.1 position. With this example in mind. By means of at most one type 1 column operation and at most one type 2 column operation. the number of rows of A. Proof of Theorem 3.1 position.6 can be established in a manner analogous to that for m = 1 (see Exercise 10). • Note that the first two elementary operations in Example 3 result in a 1 in the 1. 3 Elementary Matrix Operations and Systems of Linear Equations 1 1 2 »> 2 4 2 2 -6 -8 -6 2 Vo . Suppose that A is any m x n matrix.4 . Suppose that m = 1. We must prove that the theorem holds for any matrix with m rows. So rank(D) = rank(A) = 1 by the corollary to Theorem 3. A can be transformed into a matrix with a 1 in the 1. j.5.

1 position. Therefore r — 1 < m — 1 and r — \ < n — 1 by the induction hypothesis.A) = rank(jB) = r. B' has rank one less than B. A can be transformed into D by a finite number of elementary operations. we used two row and three column operations to do this. B' can be transformed by a finite number of elementary row and column operations into the (m — 1) x (n— 1) matrix D' such that D' = /r-] On 04 Oe where O4. with a finite number of elementary operations. which are ones. since A can be transformed into B and B can be transformed into D. and OQ are zero matrices. D' consists of all zeros except for its first r — 1 diagonal entries.2 The Rank of a Matrix and Matrix Inverses 157 operation (each of type 1). / 2 B' =1-6 1-3 4 -8 -4 2 2' -6 2 -3 1 By Exercise 11. However this follows by repeated applications of Exercise 12. By means of at most one additional type 2 operation. (Look at the second operation in Example 3. 3. Since rank(.1 position.) Thus. Also by the induction hypothesis. In Example 3.Sec. A can be transformed into a matrix /1 0 B = Vo 0 ••• B' ) 0 \ where B' is an (m — 1) x (n — 1) matrix. . we can eliminate all nonzero entries in the first row and the first column with the exception of the 1 in the 1. That is.1 position (just as was done in Example 3). Hence r < m and r < n. we can move the nonzero entry to the 1. O5.) By means of at most m — 1 type 3 row operations and at most n — 1 type 3 column operations. for instance. we can assure a 1 in the 1. Let /1 0 D = \o 0 ••• 0 \ D' / We see that the theorem now follows once we show that D can be obtained from B by means of a finite number of elementary row and column operations. each by a finite number of elementary operations. rank(-B') — r — 1. (In Example 3. Thus.

4. matrix in which Oi. By Theorem 3.EP and elementary nxn matrices Gi. Thus there exist elementary mxm matrices E\.-Gq.) ^J . (a) By Corollary 1. 3 Elementary Matrix Operations and Systems of Linear Equations Finally. rank(A') = rank(C t A*5*) = rank(D f ).G2. since D' contains ones as its first r— 1 diagonal entries. Hence by Theorem 3. I Corollary 1. Then B and C are invertible by Exercise 4 of Section 2. (See Exercise 13. Let A be an m x n matrix. there exist invertible matrices B and C such that D = BAC. Then D* is an n x m matrix with the form of the matrix D in Corollary 1. and O3 are zero matrices. 02. Let B — EpEp-\ • • • E\ and C = G\G2. so are Bl and C* by Exercise 5 of Section 2. and D = BAC. each Ej and Gj is invertible.5. Let A be an m x n matrix of rank r.E2. where D = is the mxn fr o2 Ol o. Then (a) rank(4*) = rank(^l). numerically equal to the rank of the matrix. This establishes (a).... (c) The rows and columns of any matrix generate subspaces of the same dimension. where D satisfies the stated conditions of the corollary. Thus rank(^ 4 ) = rank(Dt) =r = rank(^l). The proofs of (b) and (c) are left as exercises. Suppose that r = rank(A). we have Dl = (BAC)1 = CtAtBt. Then there exist invertible matrices B and C of sizes mxm and nxn. Since B and C are invertible. 149) each time we perform an elementary operation.4. We can appeal to Theorem 3. ..6.1 (p. Proof. A can be transformed by means of a finite number of elementary row and column operations into the matrix D.. 150). such that D = BAC. the rank of a matrix is the dimension of the subspace generated by its rows. Taking transposes. D contains ones as its first r diagonal entries and zeros elsewhere.. (b) The rank of any matrix equals the maximum number of its linearly independent rows. I Corollary 2. By Theorem 3. that is.158 Chap. and hence rank(Z) t ) = r by Theorem 3.4.Gq such that D — EpEp-i • • • E2E\AG\G2 • • • Gq. This establishes the theorem. Proof..2 (p. respectively.

Let T: V — > W and U: W — • Z be linear transformations on finite-dimensional vector spaces V. I J . (c) By (a). and (b). rank(AB) — rank(LAs) = rank(L^LB) < rank(L^) = rank(^l). Then (a) rank(UT) < rank(U). (a) Clearly. where the E^s and GVs are elementary matrices. and Z. Hence. (d) rank(AB) < rank(B). R(T) C W. (b) Let ct. T h e o r e m 3. and let A and B be matrices such that the product AB is defined.6. Notice how the proof exploits the relationship between the rank of a matrix and the rank of a linear transformation. Hence the matrix D in Corollary 1 equals In. so that A= E^E^. Thus rank(UT) = dim(R(UT)) < dim(R(U)) = rank(U). and 7 be ordered bases for V. Proof. (d) By (c) and Corollary 2 to Theorem 3.0. and there exist invertible matrices B and C such that In .7.Sec. Thus A = B-lInC-1 = B-lC~l. rank(UT) = r a n k ^ ' S ' ) < rank(5') = rank(T). / (c) rank(. note that B = EpEp-\ • • • E\ and C = GiG2-Gq. by Theorem 3. however. (c). Every invertible matrix is a product of elementary matrices. and let A' = [U]J and B' = [T]g. W. We prove these items in the order: (a).. As in the proof of Corollary 1. W.2 The Rank of a Matrix and Matrix Inverses 159 Corollary 3. 88). and Z. Hence R(UT) = UT(V) = U(T(V)) = U(R(T)) C U(W) = R(U).4£) < rank(A). (d). 1 We now use Corollary 2 to relate the rank of a matrix product to the rank of each factor. Proof.3 and (d). then rank(^4) = n. respectively. If A is an invertible nxn matrix. and hence A is the product of elementary matrices. 3.11 (p. Then A'B' = [UT]2 by Theorem 2. (b) rank(UT) < rank(T).BAC. The inverses of elementary matrices are elementary matrices. mnk(AB) = rank((4£)*) = rank(B t A t ) < rank(B f ) = rank(B).-E-lG-1G-l1---G-[1.
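The rank inequalities of Theorem 3.7 are easy to observe numerically. The following short check is an aside, not part of the text's development; it assumes the NumPy library, and the random matrices are merely illustrative.

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 5)).astype(float)

rank = np.linalg.matrix_rank
print(rank(A @ B) <= min(rank(A), rank(B)))   # True: rank(AB) <= rank(A) and rank(AB) <= rank(B)
print(rank(A.T) == rank(A))                   # True: the rank of the transpose equals rank(A)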

In this case.160 Chap. As an alternative method. 3 Elementary Matrix Operations and Systems of Linear Equations It is important to be able to compute the rank of any matrix. and fourth columns of A are identical and that the first and second columns of A are linearly independent. Suppose that we begin with an elementary row operation to obtain a zero in the 2. we can transform A as follows: 1 0 0 2 -3 0 3 -5 3 r -1 0. and thus to determine its rank.1 1 Note that the first and second rows of A are linearly independent since one is not a multiple of the other.4. Thus rank(vl) = 2.1 position. Theorems 3. there are several ways to proceed. Hence rank(v4) = 2. .6 to accomplish this goal.6. (b) Let A/l 3 1 1 1 0 1 1 \ 0 3 0 0. third. We can use the corollary to Theorem 3. and Corollary 2 to Theorem 3.5 and 3. (c) Let A = Using elementary row operations. Example 4 (a) Let A = 1 2 1 1 1 1 . we obtain 0 0 3 -3 3 1 1 0 0 0 0 Now note that the third row is a multiple of the second row. Subtracting the first row from the second row. note that the first. The object is to perform elementary row and column operations on a matrix to "simplify" it (so that the transformed matrix has many zero entries) to the point where a simple observation enables us to determine how many linearly independent rows or columns the matrix has. Thus raxfk(A) — 2. and the first and second rows are linearly independent.
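The ranks found in parts (a) and (b) of Example 4 can be confirmed with a one-line computation. The sketch below is a numerical aside assuming NumPy; the matrices are transcribed from the example.

import numpy as np

A = np.array([[1, 2, 1],
              [1, 1, 1]])
print(np.linalg.matrix_rank(A))   # 2: the two rows are linearly independent

B = np.array([[1, 3, 1, 1],
              [1, 0, 1, 1],
              [0, 3, 0, 0]])
print(np.linalg.matrix_rank(B))   # 2: columns 1, 3, and 4 coincide, and columns 1 and 2 are independent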

The last matrix displayed in Example 4(c) clearly has three linearly independent rows and hence has rank 3. In summary, perform row and column operations until the matrix is simplified enough so that the maximum number of linearly independent rows or columns is obvious.

The Inverse of a Matrix

We have remarked that an n x n matrix is invertible if and only if its rank is n. Since we know how to compute the rank of any matrix, we can always test a matrix to determine whether it is invertible. We now provide a simple technique for computing the inverse of a matrix that utilizes elementary row operations.

Definition. Let A and B be m x n and m x p matrices, respectively. By the augmented matrix (A|B), we mean the m x (n + p) matrix (A B), that is, the matrix whose first n columns are the columns of A and whose last p columns are the columns of B.

Let A be an invertible n x n matrix, and consider the n x 2n augmented matrix C = (A|I_n). By Exercise 15, we have

A^{-1}C = (A^{-1}A | A^{-1}I_n) = (I_n | A^{-1}).        (1)

By Corollary 3 to Theorem 3.6, A^{-1} is a product of elementary matrices, say A^{-1} = E_p E_{p-1} · · · E_1. Thus (1) becomes

E_p E_{p-1} · · · E_1 (A | I_n) = A^{-1}C = (I_n | A^{-1}).

Because multiplying a matrix on the left by an elementary matrix transforms the matrix by an elementary row operation (Theorem 3.1, p. 149), we have the following result: If A is an invertible n x n matrix, then it is possible to transform the matrix (A|I_n) into the matrix (I_n|A^{-1}) by means of a finite number of elementary row operations.

Conversely, suppose that A is invertible and that, for some n x n matrix B, the matrix (A|I_n) can be transformed into the matrix (I_n|B) by a finite number of elementary row operations. Let E_1, E_2, ..., E_p be the elementary matrices associated with these elementary row operations as in Theorem 3.1; then

E_p E_{p-1} · · · E_1 (A | I_n) = (I_n | B).        (2)

Letting M = E_p E_{p-1} · · · E_1, we have from (2) that

(MA | M) = M(A | I_n) = (I_n | B).

Hence MA = I_n and M = B. It follows that M = A^{-1}, so B = A^{-1}. Thus we have the following result: If A is an invertible n x n matrix, and the matrix (A|I_n) is transformed into a matrix of the form (I_n|B) by means of a finite number of elementary row operations, then B = A^{-1}.

If, on the other hand, A is an n x n matrix that is not invertible, then rank(A) < n. Hence any attempt to transform (A|I_n) into a matrix of the form (I_n|B) by means of elementary row operations must fail, because otherwise A could be transformed into I_n using the same row operations. This is impossible, however, because elementary row operations preserve rank. In fact, A can be transformed into a matrix with a row containing only zero entries, yielding the following result: If A is an n x n matrix that is not invertible, then any attempt to transform (A|I_n) into a matrix of the form (I_n|B) produces a row whose first n entries are zeros.

The next two examples demonstrate these comments.

Example 5

We determine whether the matrix

A = ( 0  2  4 )
    ( 2  4  2 )
    ( 3  3  1 )

is invertible, and if it is, we compute its inverse. We attempt to use elementary row operations to transform

(A|I) = ( 0  2  4 | 1  0  0 )
        ( 2  4  2 | 0  1  0 )
        ( 3  3  1 | 0  0  1 )

into a matrix of the form (I|B). One method for accomplishing this transformation is to change each column of A successively, beginning with the first column, into the corresponding column of I. Since we need a nonzero entry in the 1,1 position, we begin by interchanging rows 1 and 2. The result is

( 2  4  2 | 0  1  0 )
( 0  2  4 | 1  0  0 )
( 3  3  1 | 0  0  1 )

In order to place a 1 in the 1,1 position, we must multiply the first row by 1/2; this operation yields

( 1  2  1 | 0  1/2  0 )
( 0  2  4 | 1   0   0 )
( 3  3  1 | 0   0   1 )

We now complete work in the first column by adding -3 times row 1 to row 3 to obtain

( 1   2   1 | 0   1/2  0 )
( 0   2   4 | 1    0   0 )
( 0  -3  -2 | 0  -3/2  1 )

In order to change the second column of the preceding matrix into the second column of I, we multiply row 2 by 1/2 to obtain a 1 in the 2,2 position. This operation produces

( 1   2   1 |  0   1/2  0 )
( 0   1   2 | 1/2   0   0 )
( 0  -3  -2 |  0  -3/2  1 )

We now complete our work on the second column by adding -2 times row 2 to row 1 and 3 times row 2 to row 3. The result is

( 1  0  -3 |  -1   1/2  0 )
( 0  1   2 | 1/2    0   0 )
( 0  0   4 | 3/2  -3/2  1 )

Only the third column remains to be changed. In order to place a 1 in the 3,3 position, we multiply row 3 by 1/4; this operation yields

( 1  0  -3 |  -1   1/2   0  )
( 0  1   2 | 1/2    0    0  )
( 0  0   1 | 3/8  -3/8  1/4 )

Adding appropriate multiples of row 3 to rows 1 and 2 completes the process and gives

( 1  0  0 |  1/8  -5/8   3/4 )
( 0  1  0 | -1/4   3/4  -1/2 )
( 0  0  1 |  3/8  -3/8   1/4 )

Thus A is invertible, and

A^{-1} = (  1/8  -5/8   3/4 )
         ( -1/4   3/4  -1/2 )
         (  3/8  -3/8   1/4 )  •
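The reduction of (A|I) carried out above can also be performed mechanically. The sketch below is a numerical aside assuming NumPy; the helper name gauss_jordan_inverse is ours, not the text's, and the pivoting strategy differs slightly from the hand computation.

import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce (A | I) to (I | A^{-1}), in the spirit of Example 5."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the augmented matrix (A | I_n)
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # type 1: bring a nonzero pivot to row j
        if np.isclose(M[p, j], 0.0):
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]
        M[j] /= M[j, j]                      # type 2: make the pivot equal to 1
        for i in range(n):                   # type 3: clear the rest of column j
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]                          # the right half is A^{-1}

A = [[0, 2, 4], [2, 4, 2], [3, 3, 1]]
print(gauss_jordan_inverse(A))
# [[ 0.125 -0.625  0.75 ]
#  [-0.25   0.75  -0.5  ]
#  [ 0.375 -0.375  0.25 ]]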

Example 6

We determine whether the matrix

A = ( 1  2   1 )
    ( 2  1  -1 )
    ( 1  5   4 )

is invertible, and if it is, we compute its inverse. Using a strategy similar to the one used in Example 5, we attempt to use elementary row operations to transform

(A|I) = ( 1  2   1 | 1  0  0 )
        ( 2  1  -1 | 0  1  0 )
        ( 1  5   4 | 0  0  1 )

into a matrix of the form (I|B). We first add -2 times row 1 to row 2 and -1 times row 1 to row 3. We then add row 2 to row 3. The result,

( 1   2   1 |  1  0  0 )      ( 1   2   1 |  1  0  0 )
( 0  -3  -3 | -2  1  0 )  →   ( 0  -3  -3 | -2  1  0 )
( 0   3   3 | -1  0  1 )      ( 0   0   0 | -3  1  1 )

is a matrix with a row whose first 3 entries are zeros. Therefore A is not invertible. •

Being able to test for invertibility and compute the inverse of a matrix allows us, with the help of Theorem 2.18 (p. 101) and its corollaries, to test for invertibility and compute the inverse of a linear transformation. The next example demonstrates this technique.

Example 7

Let T: P2(R) -> P2(R) be defined by T(f(x)) = f(x) + f'(x) + f''(x), where f'(x) and f''(x) denote the first and second derivatives of f(x). We use Corollary 1 of Theorem 2.18 (p. 102) to test T for invertibility and to compute the inverse if T is invertible. Taking β to be the standard ordered basis of P2(R), we have

[T]_β = ( 1  1  2 )
        ( 0  1  2 )
        ( 0  0  1 )

Using the method of Examples 5 and 6, we can show that [T]_β is invertible with inverse

([T]_β)^{-1} = ( 1  -1   0 )
               ( 0   1  -2 )
               ( 0   0   1 )

Thus T is invertible, and ([T]_β)^{-1} = [T^{-1}]_β. Hence by Theorem 2.14 (p. 91), we have

[T^{-1}(a_0 + a_1 x + a_2 x^2)]_β = ( 1  -1   0 ) ( a_0 )   ( a_0 - a_1  )
                                    ( 0   1  -2 ) ( a_1 ) = ( a_1 - 2a_2 )
                                    ( 0   0   1 ) ( a_2 )   (    a_2     )

Therefore

T^{-1}(a_0 + a_1 x + a_2 x^2) = (a_0 - a_1) + (a_1 - 2a_2)x + a_2 x^2.  •

EXERCISES

1. Label the following statements as true or false.
(a) The rank of a matrix is equal to the number of its nonzero columns.
(b) The product of two matrices always has rank equal to the lesser of the ranks of the two matrices.
(c) The m x n zero matrix is the only m x n matrix having rank 0.
(d) Elementary row operations preserve rank.
(e) Elementary column operations do not necessarily preserve rank.
(f) The rank of a matrix is equal to the maximum number of linearly independent rows in the matrix.
(g) The inverse of a matrix can be computed exclusively by means of elementary row operations.
(h) The rank of an n x n matrix is at most n.
(i) An n x n matrix having rank n is invertible.

2. Find the rank of the following matrices.
(a) (c) 1 1 0 2 1 4

a2. / = (ai + 2a2 + as. For each of the following linear transformations T. compute the rank and the inverse if it exists. Use elementary row and column operations to transform each of the following matrices into a matrix D satisfying the conditions of Theorem 3.166 Chap. (a) (b) 5. and compute T . .) f(x). and then determine the rank of each matrix. rank(A) — 0 if and only if A is the 4. 3 Elementary Matrix Operations and Systems of Linear Equations 1 2 1 2 4 2 / 2 0 4 1 6 2 -8 1 /l 2 3 1 4 0 (e) 0 2 . -a\ + a2 + 2a-s. Prove that for any mxn zero matrix.3 \1 0 0 1 l\ 3 0 5 1 -3 l) 1 1 0 0 1\ 2 1 0/ (d) 1 2 (f) 3 1-4 (I 1 0 1\ 2 2 0 2 (g) 1 1 0 1 V i o V 3.ai + a 3 ).6. / (a) 1 1 2 1 1 2 (g) . matrix A.a2. determine whether T is invertible.2 V 3 6. (c) T: R3 -> R3 defined by T(a\.1 if it exists. For each of the following matrices. (a) T: P2(R) -* P2(R) defined by T(/(x)) = f"(x) + 2f'(x) (b) T: P2(R) -+ P2(R) defined by T(/(*)) = (x + l)f'(x).

1) x (n + 1) matrices respectively defined by ( 1 0 B = U 0 ••• B' 7 0 \ and D— U / 1 0 0 ••• D' 7 0 \ Prove that if B' can be transformed into D' by an elementary row [column] operation. + oix 2 .2 The Rank of a Matrix and Matrix Inverses (d) T: R3 -» P2(R) defined by T(oi.R3 defined by T(/(x)) = ( / ( . (f) T: M2X2(-R) -> R4 defined by T(A) = where E = 7.Sec. Prove that if rank(B) = r.a2 + 03)3. 11. Let B' and D ' b e m x n matrices.l ) . 10.02.4) = rank(^4).1. 8. then rank(c. Complete the proof of the corollary to Theorem 3.tr(At). Prove Theorem 3. . 3. / ( 0 ) . 167 ( ! » • ) as a product of elementary matrices.4 by showing that elementary column operations preserve rank. Prove that if c is any nonzero scalar. Let A be an m x n matrix. 9. then rank(£') = r . Let / ' 0 B = V0 B' 7 0 ••• 0 \ where B' is an m x n submatrix of B.03) = (a] + o 2 + o 3 ) + (ai .tr(AE)). 12. / ( l ) ) .tr(EA).6 for the case that A is an m x 1 matrix. and let B and D be (ra-f. then B can be transformed into D by an elementary row [column] operation. (e) T: P a (H) . Express the invertible matrix 0 1 1 0 (tY(A).

A + B) < rank(A) + rank(5) for any mxn matrices A and B. Prove that there exists an nxm matrix B such that AB = Im. Conversely. Let A be an m x n matrix with rank m and B be an n x p matrix with rank n. (b) Suppose that B is a 5 x 5 matrix such that AB = O. (a) Prove that R(T + U) C R(T) + R(U). Let B be an n x m matrix with rank m. 18. 3.6. 3 Elementary Matrix Operations and Systems of Linear Equations 13. Suppose that A and B are matrices having n rows. Let A be an m x n matrix with rank m. 16. Justify your answer. Prove that there exists an mxn matrix A such that AB = Im. Prove that AB can be written as a sum of n matrices of rank at most one. we apply results from Chapter 2 to describe the solution sets of . 21.3 SYSTEMS OF LINEAR EQUATIONS—THEORETICAL ASPECTS This section and the next are devoted to the study of systems of linear equations. Let T. Determine the rank of AB. 22. which arise naturally in both the physical and social sciences. then rank(T+U) < rank(T) + rank(U). Let / Prove that ( 1 -1 A = -2 \ 3 0 -1 3 1 4 1 -1 -5 2 -1 -1 1 l\ 0 3 -e) (a) Find a 5 x 5 matrix M with rank 2 such that AM = O.168 Chap. then there exist a 3 x 1 matrix B and a 1 x 3 matrix C such that A = BC. show that if A is any 3 x 3 matrix having rank 1. then the 3 x 3 matrix BC has rank at most 1. 15. In this section. (c) Deduce from (b) that rank(. (See the definition of the sum of subsets of a vector space on page 22. where O is the 4 x 5 zero matrix. 14. 19. U: V — > W be linear transformations. 17. Supply the details to the proof of (b) of Theorem 3. 20. Let A be an m x n matrix and B be an n x p matrix. M(A\B) = (MA\MB) for any mxn matrix M. Prove that if B is a 3 x 1 matrix and C is a 1 x 3 matrix. Prove (b) and (c) of Corollary 2 to Theorem 3. Prove that rank(B) < 2.4.) (b) Prove that if W is finite-dimensional.

.. 3. If we let x2 x— \xn) then the system (S) may be rewritten as a single matrix equation Ax = b. The set of all solutions to the system (S) is called the solution set of the system.-i + a\2x2 021^1 + 022-X2 (S) o m iXi + arn2x2 1 U-itviyi'rt — "tn • • + ainXn = b\ • • + a2nxn = 62 where aij and h (1 < i < rn and 1 < j < n) are scalars in a field F and Xi. System (S) is called consistent if its solution set is nonempty. The system of equations On2. otherwise it is called inconsistent.. is called a system of m linear equations in n unknowns over the field F.xn are n variables taking values in F. b2 and b= £ F" . .Sec. A solution to the system (S) is an n-tuple s2 \SnJ such that As = b. The mxn matrix 021 «i2 «22 ai„ \ a2l J A = \flml O-m'2 is called the coefficient matrix of the system (S). To exploit the results that we have developed.x2. In Section 3. elementary row operations are used to provide a computational method for finding all solutions to such systems. we often consider a system of linear equations as a single matrix equation.4.3 Systems of Linear Equations—Theoretical Aspects 169 systems of linear equations as subsets of a vector space.

Example 1

(a) Consider the system

x_1 + x_2 = 3
x_1 - x_2 = 1.

By use of familiar techniques, we can solve the preceding system and conclude that there is only one solution: x_1 = 2, x_2 = 1; that is,

s = ( 2 )
    ( 1 )

In matrix form, the system can be written

( 1   1 ) ( x_1 )   ( 3 )
( 1  -1 ) ( x_2 ) = ( 1 )

so

A = ( 1   1 )    and    b = ( 3 )
    ( 1  -1 )               ( 1 )

(b) Consider

2x_1 + 3x_2 +  x_3 = 1
 x_1 -  x_2 + 2x_3 = 6;

that is,

( 2   3  1 ) ( x_1 )   ( 1 )
( 1  -1  2 ) ( x_2 ) = ( 6 )
             ( x_3 )

This system has many solutions, such as

s = ( -6 )              (  8 )
    (  2 )    and   s = ( -4 )
    (  7 )              ( -3 )

(c) Consider

x_1 + x_2 = 0
x_1 + x_2 = 1;

that is,

( 1  1 ) ( x_1 )   ( 0 )
( 1  1 ) ( x_2 ) = ( 1 )

It is evident that this system has no solutions. •

Thus we see that a system of linear equations can have one, many, or no solutions.
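The three possibilities in Example 1 are easy to diagnose numerically. The following aside assumes NumPy and anticipates the rank criterion established later in this section (Theorem 3.11).

import numpy as np

# (a) a unique solution: A is square and invertible
A = np.array([[1.0, 1.0], [1.0, -1.0]]); b = np.array([3.0, 1.0])
print(np.linalg.solve(A, b))                                                      # [2. 1.]

# (b) infinitely many solutions: rank(A) = rank(A|b) = 2, but there are 3 unknowns
A = np.array([[2.0, 3.0, 1.0], [1.0, -1.0, 2.0]]); b = np.array([1.0, 6.0])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(np.column_stack([A, b])))   # 2 2

# (c) no solutions: the augmented matrix has larger rank than A
A = np.array([[1.0, 1.0], [1.0, 1.0]]); b = np.array([0.0, 1.0])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(np.column_stack([A, b])))   # 1 2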

We must be able to recognize when a system has a solution and then be able to describe all its solutions. This section and the next are devoted to this end.

We begin our study of systems of linear equations by examining the class of homogeneous systems of linear equations. Our first result (Theorem 3.8) shows that the set of solutions to a homogeneous system of m linear equations in n unknowns forms a subspace of F^n. We can then apply the theory of vector spaces to this set of solutions. For example, a basis for the solution space can be found, and any solution can be expressed as a linear combination of the vectors in the basis.

Definitions. A system Ax = b of m linear equations in n unknowns is said to be homogeneous if b = 0. Otherwise the system is said to be nonhomogeneous.

Any homogeneous system has at least one solution, namely, the zero vector. The next result gives further information about the set of solutions to a homogeneous system.

Theorem 3.8. Let Ax = 0 be a homogeneous system of m linear equations in n unknowns over a field F. Let K denote the set of all solutions to Ax = 0. Then K = N(L_A); hence K is a subspace of F^n of dimension n - rank(L_A) = n - rank(A).

Proof. Clearly, K = {s ∈ F^n : As = 0} = N(L_A). The second part now follows from the dimension theorem (p. 70).

Corollary. If m < n, the system Ax = 0 has a nonzero solution.

Proof. Suppose that m < n. Then rank(A) = rank(L_A) <= m. Hence dim(K) = n - rank(L_A) >= n - m > 0, where K = N(L_A). Since dim(K) > 0, K ≠ {0}. Thus there exists a nonzero vector s ∈ K, so s is a nonzero solution to Ax = 0.

Example 2

(a) Consider the system

x_1 + 2x_2 + x_3 = 0
x_1 -  x_2 - x_3 = 0.

Let

A = ( 1   2   1 )
    ( 1  -1  -1 )

. 3 Elementary Matrix Operations and Systems of Linear Equations be the coefficient matrix of this system. If A = (l —2 l) is the coefficient matrix.9. since is a solution to the given system. It is clear that rank(^l) = 2. Thus they constitute a basis for K. Our next result shows that the solution set of a nonhomogeneous system Ax = b can be described in terms of the solution set of the homogeneous system Ax = 0. Thus any vector in K is of the form where t G R. For example.s + fc:fcGKH}. then dim(K) = 3 — 1 = 2. Theorem 3. Hence if K is the solution set.4. (b) Consider the system X\ — 2x2 + x$ — 0 of one equation in three unknowns. then rank (A) = 1. Note that /2\ lj ana /-1N ( 0 are linearly independent vectors in K. Thus any nonzero solution constitutes a basis for K.s} + KH = {. Then for any solution s to Ax — b K = {. and let KH be the solution set of the corresponding homogeneous system Ax — 0. We now turn to the study of nonhomogeneous systems. is a basis for K.172 Chap. If K is the solution set of this system. so that K=<£i In Section 3. then dim(K) = 3 — 2 — 1. We refer to the equation Ax = 0 as the homogeneous system corresponding t o Ax = b. explicit computational methods for finding a basis for the solution set of a homogeneous system are discussed. Let K be the solution set of a system of linear equations Ax = b.

I Example 3 (a) Consider the system Xi + 2x2 + £ 3 = 7 #1 — #2 — X3 = —4. then Aw — b. then w = s + k for some fce KH.Sec. We must show that K = {. and thus K-{s} + K». But then Aw = A(s + k) = As + Ak = b + 0 = b. So w — s G KH. The corresponding homogeneous system is the system in Example 2(a). It follows that w = s + fc G {S} + KH.s) = Aw .. It is easily verified that " ( I ) ' is a solution to the preceding nonhomogeneous system.3 Systems of Linear Equations—Theoretical Aspects 173 Proof. Let s be any solution to Ax = b.. Hence A(w . | 1 j +* 2 ( o) /. so w G K."?} + KH./. (b) Consider the system x\ — 2x2 + x% — 4.• R\. and therefore KC{S} + KH- Conversely. Therefore {«} + KH QK. Thus there exists k G KH such that w — s — k. The corresponding homogeneous system is the system in Example 2(b). If w G K.b= 0.As = b. 0 . . the solution set K can be written as K.9. Since is a solution to the given system. suppose that w G {s} + KH. So the solution set of the system is K = { ( i ) + t ( t l : t e * by Theorem 3. 3.

Let Ax = b be a system of linear equations.10.174 Chap. Thus N(L^) = {()}. we computed the inverse of the coefficient matrix A of this system.9.11. . The matrix (^4|6) is called the augmented matrix of the system Ax = b. Conversely.2. Thus A~xb is a solution. Theorem 3. Let KH denote the solution set for the corresponding homogeneous system Ax = 0. This criterion involves the rank of the coefficient matrix of the system Ax — b and the rank of the matrix (>1|&). . But this is so only if KH = {0}. We now establish a criterion for determining when a system has solutions. we have A(A~lb) = (AA~x)b = b. In Example 1(c).5 (p. and hence A is invertible. we saw that R(Lyi) = s p a n ( { a i . . we saw a system of linear equations that has no solutions. b = 4 3 w V 8 5 8 4 3 8 4' 2 1 f2\ I3\ U w (-l\ 4 l-M We use this technique for solving systems of linear equations having invertible coefficient matrices in the application that concludes this section. then the system has exactly one solution. . {s} = {s} + KH. By Theorem 3. In Example 5 of Section 3. Thus the system has one and only one solution. Substituting A~lb into the system. o 2 . namely. If A is invertible. (See Exercise 9. namely. then A is invertible. Thus the system has exactly one solution. o „ } ) . namely. Conversely. Proof. if the system has exactly one solution.T2 + 2x 3 = 3 3xi + 3x2 + X3 = 1. A~lb. suppose that the system has exactly one solution s. then As = b. Theorem 3. Proof. Then the system is consistent if and only ifrank(A) = rank(-A|fr). To say that Ax = b has a solution is equivalent to saying that b G R(L/i). ( * *1 A l X2 \ = A . If s is an arbitrary solution. 3 Elementary Matrix Operations and Systems of Linear Equations The following theorem provides us with a means of computing solutions to certain systems of linear equations. A~lb.) In the proof of Theorem 3. jif Example 4 Consider the following system of three linear equations in three unknowns: 2x2 + 4x 3 = 2 2xi + 4. Let Ax = b be a system of n linear equations in n unknowns. Multiplying both sides by A~l gives s — A~1b. 153). . Suppose that A is invertible.

This last statement is equivalent to dim(span({oi. Because the two ranks are unequal..Sec. Such a vector s must be a solution to the system x\ + x2 + x 3 = 3 X] — X2 + X3 = 3 X\ + X3 = 2.2) £ R(T).. Hence (3.. respectively.. . .. o n })) = dim(span({oi. Since A = 1 1 1 1 and (A\b) = 1 1 1 0 1 1 I an. Example 5 Recall the system of equations xi + x2 — 0 xi + x2 = 1 in Example 1(c). .5. So by Theorem 3.. 3. the preceding equation reduces to rank(v4) = rank(A|6).o n }) = span({ai. a2.. rank(A) = 1 and rank(A|o) = 2.3 Systems of Linear Equations—Theoretical Aspects 175 the span of the columns of A.2 +03. a i + o 3 ) .3. it follows that this system has no solutions.2) is in the range of the linear transformation T: R3 — * R3 defined by T(oi. But b G span({oi.an.o 3 ) = (ai +0. ..o n }) if and only if span({ai. o 2 .02.X2.3.a n }).. • .02.2) G R(T) if and only if there exists a vector s = (xi.a 2 .3.11 to determine whether (3. Now (3.02. Since the ranks of the coefficient matrix and the augmented matrix of this system are 2 and 3.3. b}))..2). • Example 6 We can use Theorem 3.o 2 + a 3 .. . . . the system has no solutions. .a2...01 ..xa) in R3 such that T($) — (3.. Thus Ax = b has a solution if and only if b G span({ai.b})..

and a carpenter who builds all the housing.20p2 + 0.20 0.60 Let Pi.20p3 = Pi 0.70 0. Because the total cost of all clothing is p2. We assume that each person sells to and buys from a central pool and that everything produced is consumed.10p2 + 0.10 Housing 0. Suppose that the proportion of each of the commodities consumed by each person is given in the following table. and so we obtain the equation 0. respectively. this case is referred to as the closed model. and carpenter.70p2 + 0.20 0.20p2 + 0. clothing.40 0.p2. and housing must equal the farmer's income. 3 Elementary Matrix Operations and Systems of Linear Equations An Application In 1973.20p2Moreover.10 0.60p3 = PaThis system can be written as Ap = p.40pi + 0. and p% denote the incomes of the farmer.50pi + 0. Notice that each of the columns of the table must sum to 1. We close this section by applying some of the ideas we have studied to illustrate two special cases of his work. the amount spent by the farmer on food.40pi + 0. a tailor who makes all the clothing. To ensure that this society survives. Note that the farmer consumes 20% of the clothing.lOpi + 0. where P= . we require that the consumption of each individual equals his or her income.20 0. the amount spent by the farmer on clothing is 0. We begin by considering a simple society composed of three people (industries)-—a farmer who grows all the food. Since no commodities either enter or leave the system.50 Clothing 0. Each of these three individuals consumes all three of the commodities produced in the society. Wassily Leontief won the Nobel prize in economics for his work in developing a mathematical model that can be used to describe various economic phenomena.20p3 = PiSimilar equations describing the expenditures of the tailor and carpenter produce the following system of linear equations: 0. Farmer Tailor Carpenter Food 0.20p3 =7 Pi O. the tailor's income.176 Chap. tailor.

there exists a k for which pk >^AkjPjj Hence. Notice that we are not simply interested in any nonzero solution to the system. California) is stated below. bn) and c — (c\. and carpenter have incomes in the proportions 25 : 35 : 40 (or 5 : 7 : 8). In this context. in fact. tailor." by Ben Noble. Proceedings of the Summer Conference for College Teachers on Applied Mathematics. Let A be an n x n input output matrix having the form A = B D C E where D is a l x ( n . it may seem reasonable to replace the equilibrium condition by the inequality Ap < p. .3 Systems of Linear Equations—Theoretical Aspects 177 and A is the coefficient matrix of the system.l ) positive vector and C is an ( n . Theorem 3. Berkeley. For otherwise.12. CUPM. 3. where A is a matrix with nonnegative entries whose columns sum to 1. The vector b is called nonnegative [positive] if b > 0 [b > 0]. For vectors 6 = (b\. Then (I — A)x = 0 has a one-dimensional solution set that is generated by a nonnegative vector... At first.Sec.. the requirement that consumption not exceed production. 1971. but in one that is nonnegative.b2. A is called the input-output (or consumption) matrix. is / We may interpret this to mean that the society survives if the farmer. c n ) in R n .. But. .. we use the notation b > c [b > c] to mean bi > c-i [bi > Ci] for all i.c2. 5 3 » > 5 Z £ ^PS = £ ( J 2 AiA Pi = £ * * ' * 3 which is a contradiction. A useful theorem in this direction (whose proof may be found in "Applications of Matrices to Economic Models and Social Science Relationships. and Ap = p is called the equilibrium condition. Ap < p implies that Ap — p in the closed model. which is equivalent to the equilibrium condition. One solution to the homogeneous system (I — A)x = 0. . Thus we must consider the question of whether the system (/ — A)x = 0 has a nonnegative solution. since the columns of A sum to 1. that is.l ) x l positive vector..

10 cents worth of clothing. Then the value of the surplus of food in the society is xi .25 0 v In the open model.25 0. Hence . clothing. we must find a nonnegative solution to ( / — A)x = d. Recall that for a real number a. Finally.(Auxx + A12X2 + ^13^3).10 . 40 cents worth of clothing.2.. and d > 0.X2. Similarly. the series 1 + a + a2 + • • • converges to (1 — o ) _ 1 if \a\ < 1. where A is a matrix with nonnegative entries such that the sum of the entries of each column of A does not exceed one.178 Chap. namely. It is easy to see that if (/ — A)-1 exists and is nonnegative. Let A be the 3 x 3 matrix such that Aij represents the amount (in a fixed monetary unit such as the dollar) of commodity i required to produce one monetary unit of commodity j. The assumption that everything produced is consumed gives us a similar equilibrium condition for the open model.30 A = I 0. and 20 cents worth of housing are required for the production of $1 of clothing. In this case.d2. AijXj = di 3=1 fori = 1.. are nonnegative. Similarly. we assume that there is an outside demand for each of the commodities produced.35 0.20 0. the value of food produced minus the value of food consumed while producing the three commodities.40 0.3. suppose that 30 cents worth of food. The following matrix does also: '0.A) -1 is nonnegative since the matrices J. (I — . and 30 cents worth of housing are required for the production of $1 worth of food. 10 cents worth of clothing.50 0. that is.20 0. A2.30\ 0. let Xi. A. then the desired solution is (/ — A)~ld. and housing produced with respective outside demands d\. and 0*3.10 \0. 0.75 0.30 0. suppose that 30 cents worth of food. and X3 be the monetary values of food. suppose that 20 cents worth of food. it can be shown (using the concept of convergence of matrices developed in Section 5.. To illustrate the open model.{' 1 / . Then the input output matrix is /0.30/ .3) that the series I + A + A2 + • • • converges to (/ — A)"1 if {A71} converges to the zero matrix. and 30 cents worth of housing are required for the production of $1 worth of housing. Returning to our simple society. that the surplus of each of the three commodities must equal the corresponding outside demands.25 0. 3 Elementary Matrix Operations and Systems of Linear Equations Observe that any input-output matrix with all positive entries satisfies the hypothesis of this theorem. In general.65' 0 0.

so

I - A = (  0.70  -0.20  -0.30 )
        ( -0.10   0.60  -0.10 )
        ( -0.30  -0.20   0.70 )

and

(I - A)^{-1} = ( 2.0  1.0  1.0 )
               ( 0.5  2.0  0.5 )
               ( 1.0  1.0  2.0 )


Since (I - A)^{-1} is nonnegative, we can find a (unique) nonnegative solution to (I - A)x = d for any demand d. For example, suppose that there are outside demands for $30 billion in food, $20 billion in clothing, and $10 billion in housing. If we set

d = ( 30 )
    ( 20 )
    ( 10 )

then

x = (I - A)^{-1} d = ( 90 )
                     ( 60 )
                     ( 70 )
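As a check, the open-model computation can be reproduced in a few lines. The sketch assumes NumPy, with the input-output matrix A read off from the example above.

import numpy as np

# Column j lists the amounts of food, clothing, and housing needed to
# produce $1 worth of commodity j (food, clothing, housing respectively).
A = np.array([[0.30, 0.20, 0.30],
              [0.10, 0.40, 0.10],
              [0.30, 0.20, 0.30]])
d = np.array([30.0, 20.0, 10.0])          # outside demand, in billions of dollars

x = np.linalg.solve(np.eye(3) - A, d)     # solve (I - A)x = d
print(x)                                  # [90. 60. 70.]
print(np.linalg.inv(np.eye(3) - A))       # nonnegative, so every d >= 0 yields x >= 0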

So a gross production of $90 billion of food, $60 billion of clothing, and $70 billion of housing is necessary to meet the required demands. EXERCISES 1. Label the following statements as true or false. (a) Any system of linear equations has at least one solution. (b) Any system of linear equations has at most one solution. (c) Any homogeneous system of linear equations has at least one solution. (d) Any system of n linear equations in n unknowns has at most one solution. (e) Any system of n linear equations in n unknowns has at least one solution. (f) If the homogeneous system corresponding to a given system of linear equations has a solution, then the given system has a solution. (g) If the coefficient matrix of a homogeneous system of n linear equations in n unknowns is invertible, then the system has no nonzero solutions. (h) The solution set of any system of m linear equations in n unknowns is a subspace of F n . 2. For each of the following homogeneous systems of linear equations, find the dimension of and a basis for the solution set.

180 (a)

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations xi + 3x2 = 0 2xi + 6x 2 = 0 (b) Xi -f X2 — X3 = 0 4xi + X2 - 2x 3 = 0 2xi + x 2 - x 3 = 0 xi - x 2 + x 3 = 0 xi + 2x 2 - 2x 3 = 0

xi + 2x 2 - x 3 = 0 (c) 2xi + x + x = 0 2 3 (e) xi + 2x 2 - 3x 3 + x 4 = 0 Xi + 2X2 + X3 + X4 = 0 X2 — X3 + X4 = 0

(d)

xi + 2x 2 = 0 (f) Xi — x = 0 2

(g)

3. Using the results of Exercise 2, find all solutions to the following systems. xi + 3x2 = 5 (a) 2xi + 6x = 10 2 Xi + 2X2 — X3 = 3 (c) 2xi + x + x = 6 2 3 (b) Xl + X2 - X3 = 1 4xi + x 2 - 2x 3 = 3 2xi + x 2 - x 3 = 5 Xi - X2 + X3 as 1 xi + 2x2 - 2x3 = 4

(d)

(e) xi + 2x2 -^3x3 + x4 = 1 Xi + 2X2 + X3 + X4 = 1 X2 ~ X3 + X4 = 1

xi + 2x 2 = 5 (f) Xi - x = - 1 2

(g)

4. For each system of linear equations with the invertible coefficient matrix A, (1) Compute A"1. (2) Use A-1 to solve the system.
= *l+*2ZX3 \ 2xi - 2x 2 + x 3 = 4 5. Give an example of a system of n linear equations in n unknowns with infinitely many solutions.

W

2x1+5x2 = 3

M

6. Let T: R3 —• R2 be defined by T(a, b, c) = (a + 6,2a — c). Determine T-Kui)7. Determine which of the following systems of linear equations has a solution.

.

Sec. 3.3 Systems of Linear Equations—Theoretical Aspects (a) Xi + X2 — X3 + 2x4 = 2 xi + x 2 + 2x3 = 1 2xi + 2x 2 + x 3 + 2x 4 = 4 (b) X\ -r x2 — Xg = 1 2xi + x 2 + 3x 3 = 2

181

xi + 2x 2 + 3x 3 = 1 (c) xi + x 2 - x 3 = 0 xi -I- 2x2 + X3 = 3 Xi + 2x2 — X3 = 1 (e) 2xi + x2 + 2x 3 = 3 Xi — 4x2 + 7x3 = 4

X\ + Xl + (d) Xi 4xi +

X2 + 3X3 — X4 = 0 X2+ X3 + X 4 = 1 2x2 + X3 - X 4 = 1 X2 + 8x3 — X4 = 0

8. Let T: R3 -+ R3 be defined by T(o,6,c) = (o + 6,6 - 2c, o + 2c). For each vector v in R3, determine whether v £ R(T). (a) v = ( 1 , 3 , - 2 ) (b) ^ = (2,1,1) 9. Prove that the system of linear equations Ax = b has a solution if and only if 6 e R(L^). 10. Prove or give a counterexample to the following statement: If the coefficient matrix of a system of m linear equations in n unknowns has rank m, then the system has a solution. / 11. In the closed model of Leontief with food, clothing, and housing as the basic industries, suppose that the input-output matrix is 1 2 ft\ (& 5 1 5 A = 16 6 16 1 1 1 14 3 2J At what ratio must the farmer, tailor, and carpenter produce in order for equilibrium to be attained? 12. A certain economy consists of two sectors: goods and services. Suppose that 60% of all goods and 30% of all services are used in the production of goods. What proportion of the total economic output is used in the production of goods? 13. In the notation of the open model of Leontief, suppose that A = and d =

are the input-output matrix and the demand vector, respectively. How much of each commodity must be produced to satisfy this demand?

182

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations

14. A certain economy consisting of the two sectors of goods and services supports a defense system that consumes $90 billion worth of goods and $20 billion worth of services from the economy but does not contribute to economic production. Suppose that 50 cents worth of goods and 20 cents worth of services are required to produce $1 worth of goods and that 30 cents worth of goods and 60 cents worth of services are required to produce $1 worth of services. What must the total output of the economic system be to support this defense system? 3.4 SYSTEMS OF LINEAR EQUATIONSCOMPUTATIONAL ASPECTS

In Section 3.3, we obtained a necessary and sufficient condition for a system of linear equations to have solutions (Theorem 3.11 p. 174) and learned how to express the solutions to a nonhomogeneous system in terms of solutions to the corresponding homogeneous system (Theorem 3.9 p. 172). The latter result enables us to determine all the solutions to a given system if we can find one solution to the given system and a basis for the solution set of the corresponding homogeneous system. In this section, we use elementary row operations to accomplish these two objectives simultaneously. The essence of this technique is to transform a given system of linear equations into a system having the same solutions, but which is easier to solve (as in Section 1.4). Definition. Two systems of linear equations are called equivalent they have the same solution set. if

The following theorem and corollary give a useful method for obtaining equivalent systems. Theorem 3.13. Let Ax = 6 be a system of m linear equations in n unknowns, and let C be an invertible mxm matrix. Then the system (CA)x = Cb is equivalent to Ax — b. Proof. Let K be the solution set for Ax = 6 and K' the solution set for (CA)x = Cb. If w < E K, then Aw = 6. So (CA)w = Cb, and hence w € K'. Thus KCK'. Conversely, if w € K', then (CA)w = Cb. Hence Aw = C~\CAw) = C~\Cb) = 6;

so w < E K. Thus K' C K, and therefore, K = K'. | Corollary. Let Ax = 6 be a system ofm linear equations in n unknowns. If (A'\b') is obtained from (A\b) by a finite number of elementary row operations, then the system A'x = 6' is equivalent to the original system.

Sec. 3.4 Systems of Linear Equations—Computational Aspects

183

Proof. Suppose that (A'\b') is obtained from (^4|6) by elementary row operations. These may be executed by multiplying (A\b) by elementary mx m matrices E\, E2,..., Ep. Let C = Ep • • • E2E\; then (A'\b') = C(A\b) = (CA\Cb). Since each Ei is invertible, so is C. Now A' = CA and b' = Cb. Thus by Theorem 3.13, the system A'x = b' is equivalent to the system Ax = b. I We now describe a method for solving any system of linear equations. Consider, for example, the system of linear equations 3xi + 2x 2 + 3x 3 - 2x 4 = 1 xi + x 2 + x 3 =3 xi + 2x 2 + x 3 - x 4 = 2. First, we form the augmented matrix 3 2 3 -2 1 1 1 0 1 2 1 -1 By using elementary row operations, we transform the/augmented matrix into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row. (Recall that matrix A is upper triangular if Aij — 0 whenever i>h) 1. In the leftmost nonzero column, create a 1 in the first row. In our example, we can accomplish this step by interchanging the first and third rows. The resulting matrix is 1 2 1 -1 1 1 1 0 3 2 3 -2 2. By means of type 3 row operations, use the first row to obtain zeros in the remaining positions of the leftmost nonzero column. In our example, we must add —1 times the first row to the second row and then add —3 times the first row to the third row to obtain 0 0 2 1 -1 0 -4 0 -1 1 1 2 1 -5

3. Create a 1 in the next row in the leftmost possible column, without using previous row(s). In our example, the second column is the leftmost

184

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations possible column, and we can obtain a 1 in the second row, second column by multiplying the second row by —1. This operation produces 0 0 2 1 -1 1 0 -1 -4 0 1 2 -1 -5

4. Now use type 3 elementary row operations to obtain zeros below the 1 created in the preceding step. In our example, we must add four times the second row to the third row. The resulting matrix is 2 1 -1 0 1 0 -1 0 0 0 -3 5. Repeat steps 3 and 4 on each succeeding row until no nonzero rows remain. (This creates zeros above the first nonzero entry in each row.) In our example, this can be accomplished by multiplying the third row by — | . This operation produces 1 2 1 -1 0 1 0 -1 1 0 0 0 2 -1 3

We have now obtained the desired matrix. To complete the simplification of the augmented matrix, we must make the first nonzero entry in each row the only nonzero entry in its column. (This corresponds to eliminating certain unknowns from all but one of the equations.) 6. Work upward, beginning with the last nonzero row, and add multiples of each row to the rows above. (This creates zeros above the first nonzero entry in each row.) In our example, the third row is the last nonzero row, and the first nonzero entry of this row lies in column 4. Hence we add the third row to the first and second rows to obtain zeros in row 1, column 4 and row 2, column 4. The resulting matrix is 2 1 0 0 1 0 0 0 0 0 1 7. Repeat the process described in step 6 for each preceding row until it is performed with the second row, at which time the reduction process is complete. In our example, we must add —2 times the second row to the first row in order to make the first row, second column entry become zero. This operation produces 0 0 0 1 0 1 0 0 0 0 1

Sec. 3.4 Systems of Linear Equations—Computational Aspects

185

We have now obtained the desired reduction of the augmented matrix. This matrix corresponds to the system of linear equations x\ + x2 x3 =1 =2 X4 = 3.

Recall that, by the corollary to Theorem 3.13, this system is equivalent to the original system. But this system is easily solved. Obviously X2 = 2 and X4 = 3. Moreover, xi and X3 can have any values provided their sum is 1. Letting X3 = t, we then have xi = 1 — t. Thus an arbitrary solution to the original system has the form fl-t\ f\\ 2 2 = 0 t K ^ j A Observe that 0 1 V <v f-l\ +* 0 1 V °/

/

is a basis for the homogeneous system of equations corresponding to the given system. In the preceding example we performed elementary row operations on the augmented matrix of the system until we obtained the augmented matrix of a system having properties 1, 2, and 3 on page 27. Such a matrix has a special name. Definition. A matrix is said to be in reduced row echelon form if the following three conditions are satisfied. (a) Any row containing a nonzero entry precedes any row in which all the entries are zero (if any). (b) The first nonzero entry in each row is the only nonzero entry in its column. (c) The first nonzero entry in each row is 1 and it occurs in a column to the right of the first nonzero entry in the preceding row. Example 1 (a) The matrix on page 184 is in reduced row echelon form. Note that the first nonzero entry of each row is 1 and that the column containing each such entry has all zeros otherwise. Also note that each time we move downward to

186

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations

a new row, we must move to the right one or more columns to find the first nonzero entry of the new row. (b) The matrix

is not in reduced row echelon form, because the first column, which contains the first nonzero entry in row 1, contains another nonzero entry. Similarly, the matrix '0 1 0 2y 1 0 0 1 ,0 0 1 1 , is not in reduced row echelon form, because the first nonzero entry of the second row is not to the right of the first nonzero entry of the first row. Finally, the matrix

/ is not in reduced row echelon form, because the first nonzero entry of the first row is not 1. • It can be shown (see the corollary to Theorem 3.16) that the reduced row echelon form of a matrix is unique; that is, if different sequences of elementary row operations are used to transform a matrix into matrices Q and Q' in reduced row echelon form, then Q = Q'. Thus, although there are many different sequences of elementary row operations that can be used to transform a given matrix into reduced row echelon form, they all produce the same result. The procedure described on pages 183-185 for reducing an augmented matrix to reduced row echelon form is called Gaussian elimination. It consists of two separate parts. 1. In the forward pass (steps 1-5), the augmented matrix is transformed into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row. 2. In the backward pass or back-substitution (steps 6-7), the upper triangular matrix is transformed into reduced row echelon form by making the first nonzero entry of each row the only nonzero entry of its column. P

Sec. 3.4 Systems of Linear Equations—Computational Aspects

187

Of all the methods for transforming a matrix into its reduced row echelon form, Gaussian elimination requires the fewest arithmetic operations. (For large matrices, it requires approximately 50% fewer operations than the Gauss-Jordan method, in which the matrix is transformed into reduced row echelon form by using the first nonzero entry in each row to make zero all other entries in its column.) Because of this efficiency, Gaussian elimination is the preferred method when solving systems of linear equations on a computer. In this context, the Gaussian elimination procedure is usually modified in order to minimize roundoff errors. Since discussion of these techniques is inappropriate here, readers who are interested in such matters are referred to books on numerical analysis. When a matrix is in reduced row echelon form, the corresponding system of linear equations is easy to solve. We present below a procedure for solving any system of linear equations for which the augmented matrix is in reduced row echelon form. First, however, we note that every matrix can be transformed into reduced row echelon form by Gaussian elimination. In the forward pass, we satisfy conditions (a) and (c) in the definition of reduced row echelon form and thereby make zero all entries below the first nonzero entry in each row. Then in the backward pass, we make zero all entries above the first nonzero entry in each row, thereby satisfying condition (b) in the definition of reduced row echelon form. T h e o r e m 3.14. Gaussian elimination transforms any matrix into its reduced row echelon form. We now describe a method for solving a system in which the augmented matrix is in reduced row echelon form. To illustrate this procedure, we consider the system 2xi xi xi 2x, + 3x2 + x2 4- x 2 + 2x2 + + + + X3 + 4x4 — 9x5 = 17 X3 + X4 — 3x5 = 6 X3 + 2x4 - 5x 5 = 8 2x 3 + 3x 4 - 8x5 = 14, -9 -3 -5 -8 17\ 6 8

for which the augmented matrix is (2 3 1 4 1 1 1 1 1

14 V / Applying Gaussian elimination to the augmented matrix of the system produces the following sequence of matrices.

17\ 6 8 UJ

(\ 2 1 V2

1 1 1 3 1 4 1 1 2 2 2 3

-3 -9 -5 -8

6\ 17 8 14,

188

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations /l 1 1 0 1 - 1 0 0 0 \0 0 0 1 -3 2 -3 1 -2 1 -2 6\ 5 2 V <l l 0 1 0 0 ^0 0 1 1 -3 -1 2 -3 0 1 -2 0 0 0 6> 5 2 «y

a i 0 1 0 0 \0 0

1 0 -1 4 ^ -1 0 1 1 0 1 -2 2 0 0 0 <v

/l 0 ~~* 0 \0

0 2 0 - 2 3\ 1 - 1 0 1 1 0 0 1 - 2 2 0 0 0 0 0/

The system of linear equations corresponding to this last matrix is xi + 2x 3 #2 - #3 - 2x5 = 3 + x5 = 1 X4 — 2x5 = 2.

Notice that we have ignored the last row since it consists entirely of zeros. To solve a system for which the augmented matrix is in reduced row echelon form, divide the variables into two sets. The first set consists of those variables that appear as leftmost variables in one of the equations of the system (in this case the set is {xi,X2,X4}). The second set consists of all the remaining variables (in this case, {x3,xs}). To each variable in the second set, assign a parametric value t\,t2,... (X3 = t\, X5 = t2), and then solve for the variables of the first set in terms of those in the second set: xi = - 2 x 3 + 2x 5 + 3 = -2ti + 2*2 + 3 x2= x3 - x5 + 1 = t\ - t2 + 1 x4 = 2x 5 + 2 = 2t2 + 2. Thus an arbitrary solution is of the form /3\ /xA /-2ti + 2*2 + 3\\ x2 h — t2 + 1 1 = 0 -rh x3 = ti 2 x4 2*2 + 2 V^5> \ t2 / w where ti,t2 6 R. Notice that \ ( f-2\ < 2\ 1 -1 ! 1 5 0 0 2 { I o^ I V J f-2\ ( 2\ -1 1 1 + t2 0 2 0 I 0) I 1/

Sec. 3.4 Systems of Linear Equations—Computational Aspects

189

is a basis for the solution set of the corresponding homogeneous system of equations and /3\ 1 0 2 w is a particular solution to the original system. Therefore, in simplifying the augmented matrix of the system to reduced row echelon form, we are in effect simultaneously finding a particular solution to the original system and a basis for the solution set of the associated homogeneous system. Moreover, this procedure detects when a system is inconsistent, for by Exercise 3, solutions exist if and only if, in the reduction of the augmented matrix to reduced row echelon form, we do not obtain a row in which the only nonzero entry lies in the last column. Thus to use this procedure for solving a system Ax = 6 of m linear equations in n unknowns, we need only begin to transform the augmented matrix (.4|6) into its reduced row echelon form (^4'|6') by means of Gaussian elimination. If a row is obtained in which the only nonzero entry lies in the last column, then the original system is inconsistent. Otherwise, discard any zero rows from (A'\b'), and write the corresponding system of equations. Solve this system as described above to obtain an arbitrary solution of the form s = so + hui + t2u2 H + tn_rwn_r,

where r is the number of nonzero rows in A' (r < m). The preceding equation is called a general solution of the system Ax = 6. It expresses an arbitrary solution s of Ax = b in terms of n — r parameters. The following theorem states that s cannot be expressed in fewer than n — r parameters. Theorem 3.15. Let Ax = 6 be a system of r nonzero equations in n unknowns. Suppose that rank(A) — rank(^|6) and that (A\b) is in reduced row echelon form. Then (a) rank(A) = r. (b) If the general solution obtained by the procedure above is of the form s = s0 + hui + t2u2 H \tn-run-r,

then {ui,u2,..., un-r} is a basis for the solution set of the corresponding homogeneous system, and So is a solution to the original system. Proof. Since (A\b) is in reduced row echelon form, (A\b) must have r nonzero rows. Clearly these rows are linearly independent by the definition of the reduced row echelon form, and so rank(yl|6) = r. Thus rank(yl) = r.

190

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations

Let K be the solution set for Ax = b. and let KH be the solution set for Ax = 0. Setting t\ = t2 — • • • — tn-r = 0, we see that s = So G K. But by Theorem 3.9 (p. 172), K = {.s,,} + KH. Hence KH = {so} + K = s p a n ( { u 1 . u 2 , . . . . «„-,})•

Because rank(A) = r, we have dim(Kn) = n — r. Thus since dim(Kn) = n — r and KH is generated by a set {u\,u2,... ,un-r} containing at most n — r vectors, we conclude that this set is a basis for KH1 An Interpretation of the Reduced Row Echelon Form Let A be an m x n matrix with columns Oi,a2, • • • ,an< a i J d let B be the reduced row echelon form of A. Denote the columns of B by b\,b2,...,b„. If the rank of A is r, then the rank of B is also r by the corollary to Theorem 3.4 (p. 153). Because B is in reduced row echelon form, no nonzero row of B can be a linear combination of the other rows of B. Hence B must have exactly r nonzero rows, and if r > 1, the vectors e.\, e2,..., er must occur among the columns of B. For i = 1 , 2 , . . . , r, let j.t denote a column number of B such that bjt = e». We claim that a 7 l , aj2..... a J r , the columns of A corresponding to these columns of B, are linearly independent. For suppose that there are scalars c\,c2,...,cr such that c\Uj1 + c2aj2 -\ 1 - crajr = 0.

Because B can be obtained from A by a sequence of elementary row operations, there exists (as in the proof of the corollary to Theorem 3.13) an invertible mxm matrix M such that MA = B. Multiplying the preceding equation by M yields 3i Since Ma*. = 6, = a, it follows that C\('\ + C2C2 H
CIMOJ, +c2Ma

+ crMa.; = 0.

r Cr('r = 0.

Hence c\ = c2 = • • • = cr = 0. proving that the vectors ctj11Q>ja, • • • ,o-jr are linearly independent. Because B has only r nonzero rows, every column of B has the form

dr 0 W

jr.Sec.16(c) asserts that the first.dr. and a$. there is a column bji of B such that bjt = e^. r. a2. The reduced row echelon form of a matrix is unique.. . third. . (d) For each k = 1 . third.16(d) that a2 = 2a-i.16. and let B be the reduced row echelon form of A.) Example 2 Let (2 1 A = 2 \3 4 2 4 6 6 2 3 1 8 0 7 5 4\ 1 0 9/ / M The reduced row echelon form of A is 4 0\ (I 2 0 0 0 1 -1 0 B = 0 1 0 0 0 0 0 0 0/ ^ Since B has three nonzero rows. jr nre linearly independent. Let the columns of A be denoted ai. ... 2 . the same result shows that 04 = 4a\ + (—l)as. and fifth columns of B are ei. n. The corresponding column of A must be \-drer) = d\M 1 191 ei+d2M r-l*e2-t + drM_1( drM~lbjr = diM^bj. (c) The columns of A numbered ji. and 63.---. and fifth columns of A are linearly independent. a$. so Theorem 3.d2. 2 .4 Systems of Linear Equations—Computational Aspects for scalars d\. Because the second column of B is 2ei. + d2M~1bJ2 + • • • + = diajr + d2aj2 + h drajv. where r > 0. (See Exercisel5. Exercise. . 04. (b) For each i = 1 . since the fourth column of B is 4ei + (—l)e2. . it follows from Theorem 3. Corollary. Theorem 3. then column k of A is d\Oj1 + d2aj2 + • • • + dra. M '(di e i +d2e2 H . Proof. . if column k ofB is d\ei +d2e2 -\ \-drer. The next theorem summarizes these results. Let A be an m x n matrix of rank r. The first. .e2. the rank of A is 3.j2. • . Moreover. 3. as is easily checked. Then (a) The number of nonzero rows in B is r.

e2.0. Theorem 3.0.\. we conclude that the first. and 63.0. . The augmented matrix of this system of equations is A= \ -3 5 -12 20 and its reduced row echelon form is B = Using the technique described earlier in this section. Recall that S is linearly dependent if there are scalars Ci. Nevertheless.C4. We begin by noting that if S were linearly independent. Hence 0 = {(2. . In this case. This technique is illustrated in the next example. Because every finite-dimensional vector space over F is isomorphic to F n for some n. However. (7.2 ) + c 4 ( 0 .2.16(c) gives us additional information.c 4 =0 has a nonzero solution.0). . The procedure described there can be streamlined by using Theorem 3. (1. .3 .3 . . -12. 2 . third. we can find nonzero solutions of the preceding system.2 ) . 0 .1 ) . such that c i ( 2 .3 c i . Since the first. 47) that 0 is a basis for R3.6. 5 ) . confirming that 5 is linearly dependent. . and C5.12c2 + 2c4 + 2c5 = 0 5ci + 20c2 .0)}.20). we extracted a basis for R3 from the generating set S = {(2.2c3 . 2 . and fourth columns of A are linearly independent. 2 0 ) + c 3 ( l .2. But the columns of A other than the last column (which is the zero vector) are vectors in S. a similar approach can be used to reduce any finite generating set to a basis. .l ) + C 5 ( 7 .3 . . 3 Elementary Matrix Operations and Systems of Linear Equations In Example 6 of Section 1. then S would be a basis for R3. .1 2 . (8. 5 ) + C 2 ( 8 . (0. it is instructive to consider the calculation that is needed to determine whether S is linearly dependent or linearly independent.C3. 0 ) = (0. 2 .2 ) . 5 ) . (1. Thus S is linearly dependent if and only if the system of linear equations 2ci + 8c2 + c 3 + 7c5 = 0 . it is clear that S is linearly dependent because S contains more than dim(R 3 ) = 3 vectors.C2. ( 0 . . third. . and fourth columns of B are e.192 Chap. If follows from (b) of Corollary 2 to the replacement theorem (p.1 ) } is a linearly independent subset of S. not all zero.16.

. (6. Note that the 4 x 5 matrix in which the columns are the vectors in S' is the matrix A in Example 2. We can then apply the technique described above to reduce this generating set to a basis for V containing S. It follows that {2 + x + 2x 2 + 3x 3 . To find a subset of S that is a basis for V.1 ) . 47).Sec. the set S' generates V.1. and fifth columns. Our approach is based on the replacement theorem and assumes that we can find an explicit basis 0 for V.4x 4 + 2x 5 = 0}.1.8. (4.2.1 .1.4 Systems of Linear Equations—Computational Aspects Example 3 The set 193 S = {2+x+2x2+3x3.1.9)} is a basis for the subspace of R4 that is generated by S'. Recall that this is always possible by (c) of Corollary 2 to the replacement theorem (p.7). Example 4 Let V = {(xi. (2.7). .).9)} consisting of the images of the polynomials in S under the standard representation of Ps(R) with respect to the standard ordered basis.4 + x + 9x 3 } is a basis for the subspace V of Ps(i?.6 + 3x + 8x 2 + 7x 3 . .1 .1. (4. we consider the subset Sf = {(2.4+2x+4x2+6x3. .0.X2.3). Since 0 C S'. (-5.0. Hence {(2.1.1)} is a linearly independent subset of V.2+x+5x3.0.3. Let S' be the ordered set consisting of the vectors in S followed by those in 0.1. (1.4.0.x 5 ) e R 5 : xi +7x2 + 5x 3 . It is easily verified that V is a subspace of R5 and that S = {(-2. (6.3. From the reduced row echelon form of A.1.6+3x+8x2+7x3. which is the matrix B in Example 2.5). • We conclude this section by describing a method for extending a linearly independent subset S of a finite-dimensional vector space V to a basis for V.6).2 . and fifth columns of A are linearly independent and the second and fourth columns of A are linear combinations of the first.3).2. we see that the first.X4.2.x 3 . 3. (4. .0.1 ) .4+x+9x3} generates a subspace V of Ps(R). third.0.8. third. .

. l . 0 . 0 .0. (4.1.0. 3 Elementary Matrix Operations and Systems of Linear Equations To extend 5 to a basis for V. 0 ) + £ 3 ( 4 . .X4. l . X3 = t2. If X2 = £1. 0 .2 . then the vectors in V have the form (xi.0.1 0 0 oy EXERCISES 1.ti. The matrix whose columns consist of the vectors in S followed by those in 0 is (-2 0 0 -1 ^ 1 -5 -7 -5 4 1 1 1 0 0 -2 0 0 1 0 -1 1 0 0 1 -1 1 0 0 0 -2\ 0 0 0 l ) and its reduced row echelon form is /l 0 0 1 0 0 0 0 \o 0 Thus { ( . To do so. 1 . (-5.1.1.0. 0 ) + ^ ( . Label the following statements as true or false. X4.7 . 0 .t4) = i i ( .0). 0 . (-5.1)} is a basis for V by Theorem 3. and X5. Since in this case V is defined by a single equation.0. X4 = £3. .l).x 5 ) = (-7£i -5t2+4t3 -2t4. Hence 0= {(-7. (4.5 0 0 0 1 .0.5 .0).1 .0).0.0.0.1.194 Chap.x 2 . we solve the system of linear equations that defines V. .t3.1 ) . and X5 = £4.1 . l . .2 .0)} is a basis for V containing S. then the systems Ax = b and ^4'x = b' are equivalent. 0 .0. .0.1.0.1 ) . we first obtain a basis 0 for V.0. ( 1 . 0 ) + t4(-2. 0 . we need only write the equation as Xi = —7x2 — 5x3 + 4x4 — 2x5 and assign parametric values to X2.1.0. (-2.5 0 0 .X3. X3. (a) If (A'\b') is obtained from (A\b) by a finite sequence of elementary column operations. • 0 0 1 0 0 1 0 1 0 0 1 0 -1\ -.1).0. 0 .15.t2.

(g) If a matrix A is transformed by elementary row operations into a matrix A' in reduced row echelon form. then the systems ^4x = 6 and ^4'x = b' are equivalent.x5 = (h) 5xi . then the number of nonzero rows in A' equals the rank of A. (c) If A is an n x n matrix with rank n. Use Gaussian elimination to solve the following systems of linear equations. then the system Ax = 6 is consistent.2x4 . where r equals the number of nonzero rows in A.4x2 .2x2 + x-s . X 3 ——1 Xi .X3 2xi .x 3 . If this system is consistent.Sec. (e) If (A\b) is in reduced row echelon form. Xi + 2X2 — X3 = —1 (a) 2xi + 2x2 + X3 = 1 3xi + 5x2 — 2 .x2 .x 3 + x4 = 3 2xi .2x3 + 5x4 = -6 (g) xi + 2x2 — X3 + 3x4 = 2 (f) 2xi + 4x2 . 3.3x2 + .8x2 + X3 — 4x4 = 9 (e) -X! + 4x2 . 2.4 Systems of Linear Equations—Computational Aspects 195 (b) If (A'\b') is obtained from (^4|6) by a finite sequence of elementary row operations.x3 (b) 3xi . then the reduced row echelon form of A is In(d) Any matrix can be put in reduced row echelon form by means of a finite sequence of elementary row operations.x 3 + 6x4 = 5 x2 + 2x4 = 3 2xi — 2x2 — X3 + 6x4 — 2x5 = 1 .2x2 .5x2 Xi + 5x3 = = = = 1 6 7 9 xi + 2x2 + 2x4 = 6 3xi + 5x2 — X3 + 6x4 = 17 (c) 2xi + 4x2 + x 3 + 2x4 = 12 2xi — 7x3 + HX4 = 7 x\ — x2 — 2x3 + 3x4 = — 7 2xi — X2 + 6x3 + 6x4 = —2 (d) — 2X] + X2 — 4X3 ~ " 3X4 = 0 3xi — 2x2 + 9^3 + IOX4 = —5 xi .4x2 + 5x3 + 7x4 . then the dimension of the solution set of Ax = 0 is n — r.x 5 = 6 5 2 10 5 3xi — X2 + X3 — X4 + 2x5 = X] . (f) Let J4X = 6 be a system of m linear equations in n unknowns for which the augmented matrix is in reduced row echelon form.3x4 + 3x5 = 2xi — X2 — 2x4 + X5 = . X ' i — X2 + X3 + 2x4 — X5 = 2 4xi .

6. Let the reduced row echelon form of A be .X4 = 2 (a) 2xi + x 2 + x 3 .x2 + 2x 3 + 4x 4 + x 5 = 2 Xi — x2 + 2X3 + 3X4 + x 5 = — 1 2xi — 3x2 + 6x3 + 9x4 + 4x5 = —5 7xi — 2x2 + 4x3 + 8x4 + X5 = 6 2xi + 3x3 ~ 4x5 — 5 3xi — 4x2 + 8x3 + 3x4 = 8 Xi — x2 + 2x3 + X4 — X5 = 2 —2xi + 5x2 — 9x3 — 3x4 — 5x5 = —8 (i) (j) 3. and fourth columns of A are and respectively. Suppose that the augmented matrix of a system .4 / |6 / ) if and only if (^4'|6') contains a row in which the only nonzero entry lies in the last column. 3 Elementary Matrix Operations and Systems of Linear Equations 3xi .Ax = 6 is transformed into a matrix (A'\b') in reduced row echelon form by a finite sequence of elementary row operations. (a) Prove that vank(A') ^ rank(. X\ + X2 — 3x3 + X4 = —2 xi + 2x2 — X3 + X4 = 2 (b) xi + x 2 + x 3 . If the system is consistent. find all solutions. For each of the systems that follow. apply Exercise 3 to determine whether the system is consistent. Determine A if the first. Let the reduced row echelon form of A be /l 0 0 \0 -3 0 4 0 1 3 0 0 0 0 0 0 0 5\ 0 2 1 -1 0 0/ 5. find a basis for the solution set of the corresponding homogeneous system.X4 = 2 Xi + X2 — X3 =0 '10 2 0 -2' 0 1 . (b) Deduce that Ax = 6 is consistent if and only if (^4'|6') contains no row in which the only nonzero entry lies in the last column. second.196 Chap.T4 = 2 (c) Xi + X2 — 3X3 + X4 = 1 Xi + X2 + X3 .5 0 . 4.3 J) 0 0 1 6.x 4 = 3 xi + x 2 . Finally.x 3 = 0 xi + 2x2 ~ ~ 3x3 + 2.

-3).0)} is a linearly independent subset of V. Find a subset of {u\.Sec. 9 . (1. 2 . . uz = (-8.6 ) . . 7 ) ug} that is a basis for W. 8 ) generate R3. Find a subset of {ui.0)} is a linearly independent subset of V. 9 .1). (a) Show that S = {(1. . l . Let W be the subspace of M2x2(R) consisting of the symmetric 2 x 2 matrices. Let V = {(xi.0.3. u2 = (1.1. 10.8 .1.2x 2 + 3x 3 . Let V denote the set of all solutions to the system of linear equations ^i — #2 + 2x4 — 3x5 + XQ = 0 2xi — X2 — X3 + 3x4 — 4x5 + 4x6 = 0- . 3. third.2 . 8. u4 = ( 2 . .1.X4.0.1 8 .2.x 2 . —2).37.4. (3.1. 9. generate W.-2. 1 5 . 12.-2). The vectors «i «3 u5 u7 = = = = (2.3 . .. -17). and sixth columns of A are / 1\ -2 -1 i *) >—> ( and 1 2 \-4j 3\ -9 2 197 .. —3. 6 ) .7. .1).x 5 ) G R 5 : Xi .1.9 .4 ) . (b) Extend S to a basis for V. (-1.1 2 . w4 = (1.. respectively.3 .x 4 + 2x 5 = 0}. Let W denote the subspace of R5 consisting of all vectors having coordinates that sum to zero.2.5 . .-5. U4.. .4. The set S = 1 2 2 3 2 1 1 9 1 -2 -2 4 -1 2 2 -1 generates W. 11. (b) Extend S to a basis for V. and u8 = ( 2 . u6 = ( 0 .-3.X3.1. . 1 2 ) .us.2). Find a subset of S that is a basis for W. V V 7. . u2 = ( .-2.12. u2.u2.6 .l . (a) Show that S = {(0.-9.4 Systems of Linear Equations—Computational Aspects Determine A if the first. It can be shown that the vectors u\ = (2.u$} that is a basis for R3. Let V be as in Exercise 10. and u5 = ( .

-1. Let V be as in Exercise 12. Prove the corollary to Theorem 3. prove that A is also in reduced row echelon form.output matrix 177 Nonhomogeneous system of linear equations 171 Nonnegative vector 177 Open model of a simple economy 178 Positive matrix 177 Rank of a matrix 152 Reduced row echelon form of a matrix 185 Solution to a system of linear equations 169 Solution set of a system of equations 169 System of linear equations 169 Type 1. 2. 14.16: The reduced row echelon form of a matrix is unique. (b) Extend 5 to a basis for V.2. INDEX OF DEFINITIONS FOR CHAPTER 3 Augmented matrix 161 Augmented matrix of a system of linear equations 174 Backward pass 186 Closed model of a sirnpic economy 176 Coefficient matrix of a system of linear equations 169 Consistent system of linear equations 169 Elementary column operation 148 Elementary matrix 149 Elementary operation 148 Elementary row operation 148 Equilibrium condition for a simple economy 177 Equivalent systems of linear equations 182 Forward pass 186 Gaussian elimination 186 General solution of a system of linear equations 189 Homogeneous system corresponding to a nonhomogeneous system 172 Homogeneous system of linear equations 171 Inconsistent system of linear equations 169 Input. If (A\b) is in reduced row echelon form.1. (a) Show that S = {(1.0.0)} is a linearly independent subset of V.0).1. (1.0)} is a linearly independent subset of V.1.0.0. 13.1.1.0).198 Chap. (0.1.1.1. and 3 elementary operations 148 .1. 3 Elementary Matrix Operations and Systems of Linear Equations (a) Show that S = {(0.0. 15. (b) Extend S to a basis for V.1.

4 Determinants

4.1 Determinants of Order 2
4.2 Determinants of Order n
4.3 Properties of Determinants
4.4 Summary—Important Facts about Determinants
4.5* A Characterization of the Determinant

The determinant, which has played a prominent role in the theory of linear algebra, is a special scalar-valued function defined on the set of square matrices. Although it still has a place in the study of linear algebra and its applications, its role is less central than in former times. Yet no linear algebra book would be complete without a systematic treatment of the determinant, and we present one here. However, the main use of determinants in this book is to compute and establish the properties of eigenvalues, which we discuss in Chapter 5.

Although the determinant is not a linear transformation on Mnxn(F) for n > 1, it does possess a kind of linearity (called n-linearity) as well as other properties that are examined in this chapter. In Section 4.1, we define the determinant of a 2 x 2 matrix and investigate its geometric significance in terms of area and orientation. In Sections 4.2 and 4.3, we extend the definition of the determinant to all square matrices and derive its important properties and develop an efficient computational procedure. To illustrate the important role that determinants play in geometry, we also include optional material that explores the applications of the determinant to the study of area and orientation. For the reader who prefers to treat determinants lightly, Section 4.4 contains the essential properties that are needed in later chapters. Finally, Section 4.5, which is optional, offers an axiomatic approach to determinants by showing how to characterize the determinant in terms of three key properties.

4.1 DETERMINANTS OF ORDER 2

In this section, we define the determinant of a 2 x 2 matrix and investigate its geometric significance in terms of area and orientation.

Definition. If

A = [ a  b ]
    [ c  d ]

is a 2 x 2 matrix with entries from a field F, then we define the determinant of A, denoted det(A) or |A|, to be the scalar ad - bc.

Example 1

For the matrices

A = [ 1  2 ]      and      B = [ 3  2 ]
    [ 3  4 ]                   [ 6  4 ]

in M2x2(R), we have

det(A) = 1·4 - 2·3 = -2      and      det(B) = 3·4 - 2·6 = 0. •

For the matrices A and B in Example 1, we have

A + B = [ 4  4 ]
        [ 9  8 ],

and so det(A + B) = 4·8 - 4·9 = -4. Since det(A + B) ≠ det(A) + det(B), the function det: M2x2(R) → R is not a linear transformation. Nevertheless, the determinant does possess an important linearity property, which is explained in the following theorem.

Theorem 4.1. The function det: M2x2(F) → F is a linear function of each row of a 2 x 2 matrix when the other row is held fixed. That is, if u, v, and w are in F^2 and k is a scalar, then (writing [x; y] for the 2 x 2 matrix whose first row is x and whose second row is y)

det [u + kv; w] = det [u; w] + k·det [v; w]

and

det [w; u + kv] = det [w; u] + k·det [w; v].

Proof. Let u = (a1, a2), v = (b1, b2), and w = (c1, c2) be in F^2, and let k be a scalar. Then

det [u + kv; w] = (a1 + kb1)c2 - (a2 + kb2)c1
                = (a1c2 - a2c1) + k(b1c2 - b2c1)
                = det [u; w] + k·det [v; w].

A similar calculation establishes the second equation.
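The computations in Example 1 and the linearity property of Theorem 4.1 are easy to spot-check numerically. The following short sketch is not part of the text; it uses plain Python with no external libraries, and the helper name det2 is only illustrative.

def det2(M):
    # Determinant of a 2 x 2 matrix given as [[a, b], [c, d]].
    (a, b), (c, d) = M
    return a * d - b * c

A = [[1, 2], [3, 4]]
B = [[3, 2], [6, 4]]
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]    # A + B

print(det2(A), det2(B), det2(S))     # -2 0 -4: det(A + B) != det(A) + det(B)

# Linearity in the first row (second row held fixed), as in Theorem 4.1, with k = 5:
u, v, w, k = (1, 2), (0, 1), (3, 4), 5
left  = det2([[u[0] + k * v[0], u[1] + k * v[1]], list(w)])
right = det2([list(u), list(w)]) + k * det2([list(v), list(w)])
print(left == right)                 # True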

For the 2 x 2 matrices A and B in Example 1, it is easily checked that A is invertible but B is not. Note that det(A) ≠ 0 but det(B) = 0. We now show that this property is true in general.

Theorem 4.2. Let A ∈ M2x2(F). Then the determinant of A is nonzero if and only if A is invertible. Moreover, if A is invertible, then

A^(-1) = (1/det(A)) [  A22  -A12 ]
                    [ -A21   A11 ].

Proof. If det(A) ≠ 0, then we can define a matrix

M = (1/det(A)) [  A22  -A12 ]
               [ -A21   A11 ].

A straightforward calculation shows that AM = MA = I, and so A is invertible and M = A^(-1).

Conversely, suppose that A is invertible. A remark on page 152 shows that the rank of

A = [ A11  A12 ]
    [ A21  A22 ]

must be 2. Hence A11 ≠ 0 or A21 ≠ 0. If A11 ≠ 0, add -A21/A11 times row 1 of A to row 2 to obtain the matrix

[ A11          A12         ]
[  0   A22 - A12·A21/A11 ].

Because elementary row operations are rank-preserving by the corollary to Theorem 3.4 (p. 153), it follows that A22 - A12·A21/A11 ≠ 0. Therefore det(A) = A11·A22 - A12·A21 ≠ 0. On the other hand, if A21 ≠ 0, we see that det(A) ≠ 0 by adding -A11/A21 times row 2 of A to row 1 and applying a similar argument. Thus, in either case, det(A) ≠ 0.
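The inverse formula of Theorem 4.2 can be checked directly in a few lines. The sketch below is not part of the text; it is plain Python, and the name inverse2 is only illustrative.

def inverse2(M):
    # Inverse of an invertible 2 x 2 matrix by the formula of Theorem 4.2.
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

A = [[1, 2], [3, 4]]
Ainv = inverse2(A)
# Check that A * Ainv is the identity matrix.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(prod)    # [[1.0, 0.0], [0.0, 1.0]]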

In Sections 4.A ! 2 A 2 i ^ 0. In the remainder of this section. Thus.2 remains true in this more general context.1. Recall that a coordinate system {u.202 Chap. we extend the definition of the determinant to nxn matrices and show that Theorem 4.3. On the other hand. in either case.A n / A 2 i times row 2 of A to row 1 and applying a similar argument. (See Figure 4. I.2. we explore the geometric significance of the determinant of a 2 x 2 matrix. det(A) ^ 0. we see that det(A) ^ 0 by adding . if A2i ^ 0.2 and 4. 4 Determinants Therefore det(A) = AnA22 . The Area of a Parallelogram By the angle between two vectors in R2. which can be omitted if desired.) Clearly 0 1*" I v Notice that ei = 1 and O (w e2/ l 'I 0 1 ~ 1 V-C2. In particular.) Figure 4.v} is an ordered basis for R2. we show the importance of the sign of the determinant in the study of orientation. we mean the angle with measure 9 (0 < 9 < IT) that is formed by the vectors having the same magnitude and direction as the given vectors but emanating from the origin.1: Angle between two vectors in R' If 0 = {u. -1. v} is called right-handed if u can be rotated in a counterclockwise direction through an angle 9 (0 < 9 < n) . we define the orientation of 0 to be the real number del O del (The denominator of this fraction is nonzero by Theorem 4.

1 Determinants of Order 2 203 to coincide with v. .) In general (see Exercise 12). v) A A ^ M ' x Figure 4.. we also define () = 1 if {u. Otherwise {u.) Observe that V in the following origin of R2.2 O = 1 if and only if the ordered basis {u.Sec. V V I A right-handed coordinate system A left-handed coordinate system Figure 4. Regarding u and v as arrows emanating from the call the parallelogram having u and v as adjacent sides the determined by u and v. which we consider to be a degenerate parallelogram having area zero. if u and r are parallel).e.2.v} is linearly dependent. Any ordered set {u. we parallelogram if the set {//. 4.3: Parallelograms determined by u and v is linearly dependent (i. v} is called a left-handed system. (See Figure 4. v} forms a right-handed coordinate system.3. For convenience. v} in R2 determines a parallelogram manner. then the "parallelogram" determined by u and v is actually a line segment. (See Figure 4.

First. since 0 [l) = H. however. we cannot expect that < But we can prove that A|W v from which it follows that det Our argument that » > : . Observe first. can be generalized to R n .: employs a technique that. although somewhat indirect.* » : . 4 Determinants the area of the parallelogram determined by u and v. =0 u\ "l-det \vI W -«**: we may multiply both sides of the desired equation by to obtain the equivalent form ° ' : • * : .» : • . and det which we now investigate. that since det may be negative.204 There is an interesting relationship between A Chap.

4. since the altitude h of the parallelogram determined by u and cv is the same as that in the parallelogram determined by u and v.M .4.) Hence Figure 4.4 " • : = ° : .Sec.o r : .1 Determinants of Order 2 205 We establish this equation by verifying that the three conditions of Exercise 11 are satisfied bv the function < = ° : -A : (a) We begin by showing that for any real number c l")-e.t[u \v .*(" cv J \v Observe that this equation is valid if c = 0 because '™J = ° l o -A o1 So assume that c ^ 0. (See Figure 4. : c )=c. Regarding cv as the base of the parallelogram determined by u and cv.sr v^ |c|-A -A : A similar argument shows that Si0") v ~e. we see that A[ J = base x altitude = |c|(length of v)(altitude) = \c\ • A ( ).

5 = A If a = 0.//•} is linearly independent. Then for any vectors V[. and b. ) =a-6 [ /. We are now able to show that s( '•' )=s(u) \Vl+V2j V'"/ + 4 " V'-' for all u. 2).2)11 + (6i + b2 = (bl+b2)6 ir .v\.V2 G R2.206 We next prove that Chap. 4 Determinants 1.5). Otherwise.t'2 € R2 there exist scalars a. then au + bw J u+w = 61 ")=b-of \bw I \w by the first paragraph of (a). Thus si .' \ u +-ic I \-w So the desired conclusion is obtained in either case.. such that Vi = a. if a ^ 0. •vi +v2 =8 II [O] + 0. Since the result is immediate if u = 0. we assume that u / 0. Choose any vector w € R2 such that {(/. Because the parallelograms determined by u and w and by u and u + w have a common base u and the same altitude (see Figure 4.W + bite (i = 1. w € R2 and any real numbers a and b. )=»•*(" mi + bw J \w for any u. it follows that Figure 4. then u 81 ) =a-6 \au + bwj "i.

4) = 0. (c) Because the parallelogram determined by ei and e2 is the unit square.v 6 R2(b) Since A f£\ = 0. . that the area of the parallelogram determined by u = (-1.2 ) is det det 4 -2 = 18. )+*( !* a\u + biwl \a2u-\-b2wl A similar argument shows that 51U1+U2)=S(U1)+5(U2 for all ui. EXERCISES 1.Sec. 'j '2 e2 Therefore 5 satisfies the three conditions of Exercise 11.1 Determinants of Order 2 1. Label the following statements as true or false. (d) If u and v are vectors in R2 emanating from the origin. d e t r vI \v Thus we see. then the area of the parallelogram having u and v as adjacent sides is det . (c) If A e M2x2{F) and det(. and hence 5 = det. then A is invertible. (b) The determinant of a 2 x 2 matrix is a linear function of each row of the matrix when the other row is held fixed. for example.5) and v = (4. it follows that = *(" \V\ 6 C^\ = 0 f£\ • A (^) = 0 for any u G R2.U2. (a) The function det: W\2x2(F) —* F is a linear transformation. 4. So the area of the parallelogram determined by u and v equals r .

Prove that if B is the matrix obtained by interchanging the rows of a 2 x 2 matrix A.2 ) (d) u = (3. The classical adjoint of A1 is C*. Compute the determinants of the following matrices in M2x2(-ft)(a) 'I 1 ) « ( 1 i) W (a -f 3.1) (c) u = ( 4 . 4 Determinants (e) A coordinate system is right-handed if and only if its orientation equals 1.A) for any A 6 M2X2(-F)8. (a) u = (3. Prove that if A e M2 X 2(^) is upper triangular. Aoo -A: 21 -A 1 2 A« 11.2 ) and v = (2..det(A).3) a n d u = (-3. (b) / 5-2i (-3+ i 6 + 4A 7* .4) and v = (2. The classical adjoint of a 2 x 2 matrix A G M2X2(-F) is the matrix C = Prove that (a) (b) (c) (d) 0 4 = AC = [det(A)]7. If A is invertible.4 A 3 + 2* 2 . (i) 8 is a linear function of each row of the matrix when the other row is held fixed. then A" 1 = [ d e t ^ ^ C . For each of the following pairs of vectors u and v in R . Prove that det(A') = det(. 5) (b) u = (1. 7. compute the area of the parallelogram determined by u and v.l ) a n d v = ( .6 . .208 Chap. Compute the determinants of the following matrices in M2x2(C). 2. (ii) If the two rows of A E M2x2(F) are identical.3 i ] . then 8(A) = 0. det(C) = det(A). then det(B) = . 6. . Be M2X2{F). . Let 8: N\2x2{F) — » F be a function with the following three properties. 9. 10. . (a) '-1 + i 1 . Prove that if the two columns of A G M2x2(F) are identical. then det(^l) equals the product of the diagonal entries of A. Prove that det{AB) = det(A) • det(B) for any A. . v (C) /2i 3 I 4 6i 4..6 ) 5. then dct(A) = 0.

(This result is generalized in Section 4. we extend the definition of the determinant to n x n matrices for n > 3. Thus for /.l ) 1 + i A y • d e t ( i y ) . so that A = (An). v} forms a right-handed coordinate system. Definitions. Let A G M n X n(F). Prove that 8(A) = det(A) for all A G M 2X 2(F).1) x (n . Prove that • 0 - if and only if {u. 8 1 5 6 8 9 Aia = I 5 7 8 and A-12 = 1 3 4 6 2 3\ 5 6 8 9/ eM3x3(R).2 Determinants of Order n (iii) li I is the 2 x 2 identity matrix. we define det(A) — An.Sec. Hint: Recall the definition of a rotation given in Example 2 of Section 2. A=\4 \7 we have Au« id for / 1 -1 -3 4 B = 2 -5 6 V-2 we have Bo* = i -l 2 .) 12.5. we define det(A) recursively as n det(A) = j ^ ( .5 -2 6 -r 8| 1 and £4. it is convenient to introduce the following notation: Given A G M n X n (F). . 7f n = 1. for n > 2. 2 1 -3 -4 1\ -1 G M 4 x 4 (tf).1) matrix obtained from A by deleting row i and column j by Aij.r} be an ordered basis for R2. 4. Let {u. then 8(1) = 1.For n > 2. 4.1.2 DETERMINANTS OF ORDER 71 In this section. denote the (n . For this definition.

The scalar det(A) is called the determinant of A and is also denoted by |A|. The scalar (-1)^(i+j) det(Ãij) is called the cofactor of the entry of A in row i, column j. Note that, for 2 x 2 matrices, this definition of the determinant agrees with the one given in Section 4.1 because

det(A) = A11(-1)^(1+1) det(Ã11) + A12(-1)^(1+2) det(Ã12) = A11·A22 - A12·A21.

Letting

cij = (-1)^(i+j) det(Ãij)

denote the cofactor of the row i, column j entry of A, we can express the formula for the determinant of A as

det(A) = A11·c11 + A12·c12 + ··· + A1n·c1n.

Thus the determinant of A equals the sum of the products of each entry in row 1 of A multiplied by its cofactor. This formula is called cofactor expansion along the first row of A.

Example 1

Let

A = [  1   3  -3 ]
    [ -3  -5   2 ]  ∈ M3x3(R).
    [ -4   4  -6 ]

Using cofactor expansion along the first row of A, we obtain

det(A) = (-1)^(1+1)(1)·det [ -5   2 ]  +  (-1)^(1+2)(3)·det [ -3   2 ]  +  (-1)^(1+3)(-3)·det [ -3  -5 ]
                           [  4  -6 ]                       [ -4  -6 ]                        [ -4   4 ]

       = 1[(-5)(-6) - 2(4)] - 3[(-3)(-6) - 2(-4)] + (-3)[(-3)(4) - (-5)(-4)]

       = 1(22) - 3(26) - 3(-32)

       = 40. •
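The recursive definition translates directly into code. The following minimal sketch is not from the text; it is plain Python, the names det and submatrix are only illustrative, and it reproduces the value 40 obtained in Example 1 by cofactor expansion along the first row.

def submatrix(A, i, j):
    # The matrix obtained from A by deleting row i and column j (0-indexed).
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    # Cofactor expansion along the first row, as in the recursive definition.
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (1 + (j + 1)) * A[0][j] * det(submatrix(A, 0, j)) for j in range(n))

A = [[ 1,  3, -3],
     [-3, -5,  2],
     [-4,  4, -6]]
print(det(A))    # 40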

• Example 3 Let 2 0 0 1\ 0 1 3 .l ) 5 ( l ) .4 ) .5 2 GM 4 x 4 (fi). det(C 14 ) = (-l)2(2)-det 1 3 -3 -5 \-4 4 / -3\ 2 + 0 +0 -6/ -5 ( 0 1 •f ( .1(48) = 32.2 .det(B13) -^+(-l)3(l).1(12)+ 3(20) = 48.2 ( . \ 4 -4 4 -ey Using cofactor expansion along the first row of C and the results of Examples 1 and 2. we obtain det(C) = (-1) 2 (2). d e t I .(-3)(4)] = 0 . we obtain det(B) = (-l)1 +l Bu • det(Bn) + (-l)1+2Bi2- det(BV2) + (-iy+3BV3. 4/ 211 Using cofactor expansion along the first row of B.Sec. det(Cn) + (-1) 3 (0).3 .2 Determinants of Order // Example 2 Let B= 0 -2 \ 4 / 1 -3 -4 3\ .det(C 12 ) + (-1) 4 (0)« det(C 13 ) + (-1) 5 (1).det(-^ -2 4 -3 -4.1 [-2(4) .3 C = -2 .5 GM 3 x 3 (i?).3 4 -4 = 2(40)+ 0 + 0 . ~\ = (-l)2(0)-det(^ + (-l)4(3)-det = 0 . 4.(-5)(4)] + 3 [ . • .

200) that. We now show that a similar property is true for determinants of any size. and each a^ are row vectors in F n . and let I denote the nxn identity matrix. we obtain det(I) = (-1) 2 (1) • det(Jn) + (-1) 3 (0) • det(/ 1 2 ) + • • • + (-l)l+n(0). for 1 < r < n. we present a more efficient method for evaluating determinants. Assume that the determinant of the (n — 1) x (n — 1) identity matrix is 1 for some n > 2. and so the determinant of any identity matrix is 1 by the principle of mathematical induction.1 (p. it is a linear function of each row when the other row is held fixed.3. The result is clearly true for the l x l identity matrix. Assume that for some integer n > 2 the determinant of any (n — l ) x ( n — 1) matrix is a linear function of each row when the remaining . Proof.212 Example 4 Chap. • As is illustrated in Example 3. Using cofactor expansion along the first row of J. 4 Determinants The determinant of the n x n identity matrix is 1.+ 0 = 1 because In is the (n . but we must first learn more about them. The proof is by mathematical induction on n. even for matrices as small as 4 x 4 . Recall from Theorem 4. Theorem 4. v. \ ar_! v ar+i det o r _i u + kv a r +i = det whenever k is a scalar and u. the calculation of a determinant using the recursive definition is extremely tedious.det(/ln) = l(l) + 0 + --.1) x (n . Later in this section. we have "I \ ( «i \ ar_ i u + fcdet O-r+l ( a. The determinant of an n x n matrix is a linear function of each row when the remaining rows are held fixed.1) identity matrix. That is. We prove this assertion by mathematical induction on n. This shows that the determinant of the nxn identity matrix is 1. The result is immediate if n = 1. although the determinant of a 2 x 2 matrix is not a linear transformation.

respectively. row r — 1 of Au is (bi+kci. and C y are the same except for row r — 1. and suppose that for some r (1 < r < n). a2. Let B G M n x n ( F ) . • • •. the rows of Aij.... We must prove that det(A) = det(B) + fcdet(C). Proof.... Since Z?ij and Cij are (n — 1) x (n — 1) matrices. For r > 1 and 1 ^ j < n-.. bn) and v = (c\. . which is the sum of row r — 1 of B\j and fc times row r — 1 of Cij...2 Determinants of Order n 213 rows are held fixed. a n . then det(B) = (-l)i+k det(Bik). respectively. 6„ + fccn).Sec. Thus since A\j = Z?ij = C\j. 4.. we have det(Aij) = det(Z?ij) + kdet(Cij) by the induction hypothesis. If row i of B equals e^ for some k (1 < k < n). v G F n and some scalar k. Our next theorem shows that the determinant of a square matrix can be evaluated by cofactor expansion along any row. c n ). bj^i + kCj-i. we have n det(A) = ^2(-l)1+j Au • d e t ( i i j ) = y > i ) ' + ' A .£ ( .. Let A be an n x n matrix with rows a\.. then det(A) = 0. Moreover.l ) 1 + i A i i • dct(Bl3) + * 2 ( . Let u = (&i. and let £ and C be the matrices obtained from A by replacing row r of A by u and v. b2.l ) 1 + i A y * det(Cy) = det(£) + fcdet(C).. We leave the proof of this fact to the reader for the case r = 1.bj+i + fccj+i. we have ar = u + kv for some u. If A & M n x n ( F ) has a row consisting entirely of zeros. The definition of a determinant requires that the determinant of a matrix be evaluated by cofactor expansion along the first row.. JBIJ. c2. Its proof requires the following technical result.det(fiy) +Ardet(Cij .. See Exercise 24. and so the theorem is true for all square matrices by mathematical induction. where n > 2. This shows that the theorem is true for nxn matrices. Corollary.. Lemma.

l ) i + > B y .l ) 1 + ^ l j . Assume that for some integer n > 3.1 . let Cij denote the (n — 2) x (n — 2) matrix obtained from B by deleting rows 1 and i and columns j and k. and so the lemma is true for all square matrices by mathematical induction. The lemma is easily proved for n = 2. [(-!)(*-!)+*det(^) j>fc 1+ = ( .3. e t ( C i j<fc + ^(_1)i+0-i)5lj. we have (-l)«-i)+(*-i)det(Cy = I 0 V< (-l)(*-i)+*det(Cy) if j <k if j = k )ij>k.det(C.i+fc ^ ( . Hence by the induction hypothesis and the corollary to Theorem 4. . 4 Determinants Proof.l ) 1 + ^ i r det(By) + £ ( . The result follows immediately from the definition of the determinant if i = 1.d e t ( B l i . if j > k. it follows that (let(£) = (-l) i + f c det(G i f c )This shows that the lemma is true for n x n matrices.214 Chap.l ^ Z V [(_i)(*-D+(*-i) det(C^) 3<k + £ ( . Suppose therefore that 1 < i < n.l ) 1 + ^ • det(Bn. j>fc Because the expression inside the preceding bracket is the cofactor expansion of Bik along the first row. j>fc j<fc = ^ ( . row i — 1 of Z?ij is the following vector in F n _ 1 : efc_i 0 efc ifj<fc if j = A . the lemma is true for (n — 1) x (n — 1) matrices. and let B be an n x n matrix in which row i of B equals e* for some /c (1 < k < n).l ) ^ i r d . j=i = ^ ( . The proof is by mathematical induction on n. For each j ^ A : (1 < j < n). For each j . f Therefore n det(B) = ^ ( .

3 provides this information for elementary row operations of type 2 (those in which a row is multiplied by a nonzero scalar). we have n det(A) =^Aij 3=1 n det(Bj) = ^(-1)* + J A.l ) i + J A i r d e t ( A i 3=1 by Theorem 4. 4. and hence det(A) = 0. Since each Ay is an (n — 1) x (n — 1) matrix with two identical rows. This completes the proof for n x n matrices. If A G M n X n (F) has two identical rows. Now det(A) = ^ ( .3 and the lemma.4. Next we turn our attention to elementary row operations of type 1 (those in which two rows are interchanged). The proof is by mathematical induction on n.4. let Bj denote the matrix obtained from A by replacing row i of A by ej. Row i of A can be written as X)?=i ^ijej. It is possible to evaluate determinants more efficiently by combining cofactor expansion with the use of elementary row operations.• det(A^). We leave the proof of the result to the reader in the case that n = 2. the induction hypothesis implies that each det(A^) = 0. Because n > 3. 3=1 I Corollary. d e t ( A ^ ) . Before such a process can be developed. That is. we need to learn what happens to the determinant of a matrix if we perform an elementary row operation on that matrix. then for any integer i (I < i < n). So the result is true if i = 1.For 1 < j < n. ..2 Determinants of Order n 215 We are now able to prove that cofactor expansion along any row can be used to evaluate the determinant of a square matrix. Theorem 4. if A G M n x n ( F ) .l ) J + M ^ .. Assume that for some integer n > 3.. Theorem 4. n det(A) = ^ ( . Then by Theorem 4. and so the lemma is true for all square matrices by mathematical induction. then det(A) = 0. i-i Proof. we can choose an integer i (1 < i < n) other than r and s. Fix i > 1. it is true for (n — 1) x (n — 1) matrices. The determinant of a square matrix can be evaluated by cofactor expansion along any row. and let rows r and s of A G M n x n ( F ) be identical for r ^ s. Proof.Sec. Cofactor expansion along the first row of A gives the determinant of A by definition.

an /aA / \ ( *i \ / a. Let the rows of A G M n X n ( F ) be a\. \ L T-ZZ .. Thus /ai\ a. Then det(B) = det(A). a n .d e t ( A ) . Therefore det(B) = . We now complete our investigation of how an elementary row operation affects the determinant of a matrix by showing that elementary row operations of type 3 do not change the determinant of a matrix.6.4 and Theorem 4. det \ + det ar + a. If A G M n x n (F) and B is a matrix obtained from A by interchanging any two rows of A.216 Chap.5. det / /ai\ "/• det ar \an/ \an/ \ = det ar + a4 On / /ai\ a. a2.. Let A G M n X n (F)...3. then det(-B) = — det(A). and let B be a matrix obtained by adding a multiple of one row of A to another row of A. A = and B = ar \anJ \anJ (ttl\ as Consider the matrix obtained from A by replacing rows r and s by ar + at By the corollary to Theorem 4. where r < s. we have ( ai ar + as 0 = det ar + a. and let B be the matrix obtained from A by interchanging rows r and s. V an fai\ ar = det ar\an/ Va n / = 0 + det(A) + det(B) + 0. 4 Determinants Theorem 4. Proof. Theorem 4.

If the rank of A is less than n. then the rows ai. then det(B) = det (A). then det(B) = -det(A).. Corollary. 201). Applying Theorem 4. then det(B) = fcdet(A). and so det(B) = 0...b2. row r. Then row r of B consists entirely of zeros. an of A are linearly dependent.3 to row s of B.. Suppose that B is the nxn matrix obtained from A by adding k times row r to row s. we can prove half of the promised generalization of this result in the following corollary. det(B) = det(A). The converse is proved in the corollary to Theorem 4. So there exist scalars Cj such that ar = c\a\ H h Cr-icir-i + cr+iar+i -I h cnan.6. we proved that a 2 x 2 matrix is invertible if and only if its determinant is nonzero.5. In Theorem 4.. Let the rows of A be a\. Then bi = ai for i ^ s and bs = as + kar. If A £ M n x n ( F ) has rank less than n.7. I The following rules summarize the effect of an elementary row operation on the determinant of a matrix A G M n X n (F).. Consider.2 Determinants of Order 217 Proof..Sec. But by Theorem 4. where r ^ s.6.. we obtain M = . Let C be the matrix obtained from A by replacing row s with a r . say. Hence det(A) = 0 .2 (p. (a) If B is a matrix obtained by interchanging any two rows of A. By Exercise 14 of Section 1.. the matrix in Example 1: -3\ 4 -6/ Adding 3 times row 1 of A to row 2 and 4 times row 1 to row 3..4. (c) If B is a matrix obtained by adding a multiple of one row of A to another row of A.. some row of A. is a linear combination of the other rows. then det(A) = 0. (b) If B is a matrix obtained by multiplying a row of A by a nonzero scalar k. a2. These facts can be used to simplify the evaluation of a determinant. we obtain det(B) = det(A) + Ardet(C) = det(A) because det(C) = 0 by the corollary to Theorem 4. As a consequence of Theorem 4. and the rows of B be b\. bn. 4. an.. Let B be the matrix obtained from A by adding — c? times row i to row r for each i ^ r. for instance. a2. Proof.

4 Determinants Since M was obtained by performing two type 3 elementary row operations on A.1 ) 1 + 1 ( 1 ) .) By using elementary row operations of types 1 and 3 only. The preceding calculation of det(P) illustrates an important general fact. as described earlier.l ) 1 + 1 ( l ) . Since det(A) = det(M) = det(P). (See Exercise 23. we have reduced the computation of det (A) to the evaluation of one determinant of a 2 x 2 matrix.6. and so we can easily evaluate the determinant of any square matrix. it follows that det(A) = 40.l ) 1 + 2 ( 3 ) * det(M 12 ) + (-l)1+3(-3)*det(M13).218 Chap. The cofactor expansion of M along the first row gives det(M) = ( . we can transform any square matrix into an upper triangular matrix. det(Mn) = ( . we have det(P) = ( . If we add —4 times row 2 of M to row 3 (another elementary row operation of type 3). Both Mi2 and M\3 have a column consisting entirely of zeros. The next two examples illustrate this technique. and so det(Mi2) = det(Mi 3 ) = 0 by the corollary to Theorem 4. we obtain P = Evaluating det(P) by cofactor expansion along the first row.10 = 40.1 ) ^ ( 1 ) . det ( / 6 _~l = l[4(-18)-(-7)(16)]=40. we have det(A) = det(M). d e t ( P n ) = (-l)1+1(l)-det^ ~ £ ) =1*4. Thus with the use of two elementary row operations of type 3.1 ) 1 + 1 ( 1 ) . det(Mn) + ( . But we can do even better. The determinant of an upper triangular matrix is the product of its diagonal entries. Hence det(M) = ( . Example 5 To evaluate the determinant of the matrix .

Interchanging rows 1 and 2 of B produces C = -2 0 4 -3 -5 1 3 -4 4 By means of a sequence of elementary row operations of type 3. is far more efficient than using cofactor expansion. Since det I . it follows that det(B) = .det(C) = 48. This matrix can be transformed into an upper triangular matrix by means of the following sequence of elementary row operations of type 3: / 2 0 -2 \ 4 0 1 -3 -4 0 3 -5 4 A -3 2 -6) (2 0 0 0 1 -3 -4 0 1 0 0 • 0 3 -5 4 0 3 4 0 A -3 3 -8/ A -3 -6 V f2 0 * • 0 \0 0 1 0 0 0 A 3 -3 4 -6 16 . 1 = ad — be.1. 4. Since C was obtained from B by an interchange of rows. .Sec.v in Example 3. as illustrated in Example 6.2 Determinants of Order n 219 in Example 2.4. Consider first the evaluation of a 2 x 2 matrix. we can transform C into an upper triangular matrix: Thus det(C) = -2-1-24 = -48. Example 6 The technique in Example 5 can be used to evaluate the determinant of the matrix C = 2 0 -2 4 0 1 -3 -4 0 3 -5 4 A -3 2 • .2 0 ^ /2 0 0 \° T hus det(C) = 2. 4 = 32 Using elementary row operations to evaluate the determinant of a matrix. we must begin with a row interchange.

which is not large by present standards. By contrast. then det(B) = det(A). we have defined the determinant of a square matrix in terms of cofactor expansion along the first row. (h) The determinant of an upper triangular matrix equals the product of its diagonal entries. (b) The determinant of a square matrix can be evaluated by cofactor expansion along any row. These properties enable us to evaluate determinants much more efficiently. We then showed that the determinant of a square matrix can be evaluated using cofactor expansion along any row.220 Chap. then det(B) = kdct(A). (f) If B is a matrix obtained from a square matrix A by adding k times row i to row j. evaluating the determinant of an n x n matrix by cofactor expansion along any row expresses the determinant as a sum of n products involving determinants of (n— 1) x (n— 1) matrices. including properties that enable us to calculate det(B) from det (A) whenever B is a matrix obtained from A by means of an elementary row operation. 4 Determinants the evaluation of the determinant of a 2 x 2 matrix requires 2 multiplications (and 1 subtraction). we showed that the determinant possesses a number of special properties. EXERCISES 1. Thus it would take a computer performing one billion multiplications per second over 77 years to evaluate the determinant of a 20 x 20 matrix by this method. Thus in all. (e) If B is a matrix obtained from a square matrix A by multiplying a row of A by a scalar. In the next section. (c) If two rows of a square matrix A are identical. then det(A) = 0. then det(B) = —det(A). the evaluation of the determinant of an n x n matrix by cofactor expansion along any row requires over n! multiplications. whereas evaluating the determinant of an n x n matrix by elementary row operations as in Examples 5 and 6 can be shown to require only (n 3 + 2n — 3)/3 multiplications. In addition. (d) If B is a matrix obtained from a square matrix A by interchanging any two rows. cofactor expansion along a row requires over 20! ~ 2. then det(A) = 0. (a) The function det: M n X n ( F ) — > F is a linear transformation. (g) If A G M n x „ ( F ) has rank n.4 x 1018 multiplications. . For n > 3. To evaluate the determinant of a 20 x 20 matrix. the method using elementary row operations requires only 2679 multiplications for this calculation and would take the same computer less than three-millionths of a second! It is easy to see why most computer programs for evaluating the determinant of an arbitrary matrix do not use cofactor expansion. Label the following statements as true or false. we continue this approach to discover additional properties of determinants. In this section.

1 0 2\ 0 1 5 (i.5 . 1 -1 2 -A -3 4 1 . evaluate the determinant of the given matrix by cofactor expansion along the indicated row. -1 3 0/ along the first row 1 0 8. along the fourth row 11.2 2 3 .1 0 1 1 2 O y v-l along the fourth row (% 2+i 0 \ -1 3 2i \ 10. In Exercises 13-22. 0 1+i 2 \ -2i 0 1 r« 3 4i 0 y along the third row / 0 2 1 3\ 1 0 .2 Determinants of Order n 2./ along the second row / 12. 2 along the I 2N 0 -3 3 0.3 8 ^-2 6 -4 I. c3) 221 3. : J 5. second row !). Find the value of k that satisfies the following equation: / det 2a i 36i + 5d \ 7cj 2a 2 362 + 5c2 7c2 2a 3 \ /oj a2 363 + 5c3 = fcdct [ 6X b2 7c3 / \c\ c2 a3\ b3 . c3/ 4. V 2 3 0/ along the first row 0 -1 7. \ 0 -1 1 -.Sec.1 2 . Find the value of k that satisfies the following equation: /3a i det I 36i \3ci 3a 2 3o2 3c2 3a 3 \ fa\ 363 = kdct I &i 3c 3 / \d a2 b2 e2 a3\ b3 .. +61 b2 + c2 a 2 + c2 a2+62 b3 + c3\ /oj a 3 + c 3 j = /edet I 6i a3 + 6 3 / \d a2 62 c 2 a3' 63 c3/ In Exercises 5-12. evaluate the determinant of the given matrix by any legitimate method. ^-1 along the 0 2\ 1 5 3 0/ third row . Find the value of k that satisfies the following equation: /bi+ci det j ai + ci \ a . 4. .

Because the determinant . . 26.3. 23... 25. 24..aj. 29. Prove that if A G M n X n ( F ) has two identical columns. then det(A) = 0. a2. Let A G M n x n ( F ) . Calculate det(Z?) in terms of det(A). an. .1.' Prove that if E is an elementary matrix.3 PROPERTIES OF DETERMINANTS In Theorem 3. Compute det(Fj) if F.. Under what conditions is det(-A) = det(A)? 27. then det(F') = det(F). is an elementary matrix of type i. 30. a n _ i . Prove the corollary to Theorem 4. we saw that performing an elementary row operation on a matrix can be accomplished by multiplying the matrix by an elementary matrix. This result is very useful in studying the effects on the determinant of applying a sequence of elementary row operations. and let B be the matrix in which the rows are a n . 4 Determinants 21. .. Prove that the determinant of an upper triangular matrix is the product of its diagonal entries. . 4. Prove that det(ArA) = kn det(A) for any A G M n x n ( F ) . 28.222 Chap. Let the rows of A G M n x n ( F ) be a\.

.(F).1 (p. det(AP) = — dct(B) = det(A). if A is invertible. If A G M n X n ( F ) is not invertible. AB is a matrix obtained by interchanging two rows of B. then det(A) = 0 by the corollary to Theorem 4. 159).• Sec. v . 4.5 (p. say. A matrix A G M n x n ( F ) is invertible if and only if det(A) 7^ 0.) If A is an nxn matrix with rank less than n. (a) If E is an elementary matrix obtained by interchanging any two rows of/. For any A.3 Properties of Determinants 223 of the nxn identity matrix is 1 (see Example 4 in Section 4. then A is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3. Hence by Theorem 4.6 p. (b) If E is an elementary matrix obtained by multiplying some row of / by the nonzero scalar k.2). if A has rank n.1 . det(A _ 1 ) = det(AA _ 1 ) = det(/) = 1 . On the other hand.det(B). Theorem 4. 217). then det(A _ 1 ) = -—7—-. (See Exercise 18. Thus det(AB) = det(A). 159).1 . then det(F) = k. Similar arguments establish the result when A is an elementary matrix of type 2 or type 3. If A is an elementary matrix obtained by interchanging two rows of / . det(AP) = det(A). On the other hand.7.7 (p. A = Em • • • E2Ei. B G . We now apply these facts about determinants of elementary matrices to prove that the determinant is a multiplicative function. if A G MnXTl(F) is invertible. 216). Proof. The first paragraph of this proof shows that det(AB) = det(F r = det(F r •E2ElB) det(Fm_!---F2Fi£) = det(F m ) = det(A)-det(P).det(P) in this case. Furthermore.6 (p.det(P). Since rank(AP>) < rank(A) < n by Theorem 3.6 (p. then det(A) = . thendet(F) = . then det(F) = 1. det(A) Proof. 216). then the rank of A is less than n. But by Theorem 3. det(F 2 ) • det(Fi) • det(P) I = det(Fm---F2Fi)-det(P) Corollary. So det(A) = 0 by the corollary to Theorem 4. then det(A). we have det(AB) = 0. (c) If E is an elementary matrix obtained by adding a multiple of some row of J to another row. we can interpret the statements on page 217 as the following facts about the determinants of elementary matrices. 149). We begin by establishing the result when A is an elementary matrix.

If det(A) ^ 0.224 by Theorem 4.2. (The effect on the determinant of performing an elementary column operation is the same as the effect of performing the corresponding elementary row operation. Xk _ det(Mfc) " det(A) ' . On the other hand. and so A1 is not invertible. det(A') = det(A). the recursive definition of a determinant involved cofactor expansion along a row..7.. where x = (x\.• • E2E\. and for each k (k = l. For example. For any A G M n x n (F).. • • • • det(F 2 ) •det(Fi) = det(F m • •• F 2 F 0 = det(A). we have used only the rows of a matrix.8 are that determinants can be evaluated by cofactor expansion along a column.G (p.det(F 2 )---.• <let(F m ) = d e t ( F m ) . if A is invertible.7 we have det(A') = det(E\El2 .. Theorem 4. Let Ax = b be the matrix form of a system of n linear equations in n unknowns. Since the rows of A are the columns of A6. det(A') = det(A). . Our next result shows that the determinants of A and A1 are always equal. Since det(F. say A = Em.n). then A is a product of elementary matrices.9 ( C r a m e r ' s Rule).) = det(F|) for every i by Exercise 29 of Section 4. Hence det(A) ^ 0 and det(A~ 1 ) = Chap. I Among the many consequences of Theorem 4. Thus det(A') = 0 = det(A) in this case. 4 Determinants det(A) In our discussion of determinants until now. by Theorem 4. and the more efficient method developed in Section 4.det(F2) = d e t ( F 0 . T h e o r e m 4.x2.. If A is not invertible.. Proof.fit ) det(F^) = det(Fi).8. But rank(A') = rank(A) by Corollary 2 to Theorem 3.. in either case.xny. 158).) We conclude our discussion of determinant properties with a well-known result that relates determinants to the solutions of certain types of systems of linear equations. then tin's system has a unique solution. Thus.2 used elementary row operations.. then rank(A) < n. this fact enables us to translate any statement about determinants that involves the rows of a matrix into a corresponding statement that involves its columns.. and that elementary column operations can be used as well as elementary row operations in evaluating a determinant.2.

Thus AXk = Mk.3 Properties of Determinants where Mk is the nxn byb. . The matrix form of this system of linear equations is Ax = b.det(Xfc) = Therefore xfc = [det(A)].1 -det(M fc ). 90). 4. where '1 ll Because det (A) = 6 ^ 0 .10 (p. then the system Ax = b has a unique solution by the corollary to Theorem 4.9.13 (p. Using the notation of Theorem 4. Cramer's rule applies. AXk is the nxn matrix whose zth column is Aei = ai if i ^ k and Ax = b if i = k. let ak denote the kth column of A and Xk denote the matrix obtained from the nxn identity matrix by replacing column k by x. For each integer k (1 < k < n).Sec.7 and Theorem 3. det(Mfc) = det(AXk) = det(A). Then by Theorem 2. 225 matrix obtained from A by replacing column k of A Proof. Example 1 We illustrate Theorem 4. we have '2 2 13 0 1 1 det(A) 2 I1 3 1 det(A) 3> 1 -1 I det(A)-xk= xk- det (Mi) Xi = det (A) 15 6 det(M 2 ) x2 = det(A) 3s 1 -1 -6 = -1.7.9 by using Cramer's rule to solve the following system of linear equations: xi + 2x2 + 3x 3 = 2 x\ + x3 = 3 x\ + x2 — x3 = 1.Evaluating Xk by cofactor expansion along row k produces det(X fc ) = xk-det(I„-i) Hence by Theorem 4. 174). If det (A) ^ 0.

which was discussed in Section 3. a 2 . p.) Example 2 The volume of the parallelepiped having the vectors O] = (1. we sometimes need to know that there is a solution in which the unknowns are integers.1). as the determinant calculation shows. .1. Hence by the familiar formula for volume. .226 and det det(M 3 ) ^3 = det(A) det(A) Chap. its volume should be \/6*\/2*\/3 = 6. 1993. In this situation. and a 3 = (1. 4 Determinants 6 2 Thus the unique solution to the given system of linear equations is (xx. a 2 . . Cramer's rule can be useful because it implies that a system of linear equations with integral coefficients has an integral solution if the determinant of its coefficient matrix is ±1. Elementary Classical Analysis. Hoffman.6) with sides of lengths \/6. Thus Cramer's rule is primarily of theoretical and aesthetic interest.4. The amount of computation to do this is far greater than that required to solve the system by the method of Gaussian elimination. Freeman and Company. a n . Cramer's rule is not useful for computation because it requires evaluating n + 1 determinants o f n x n matrices to solve a system of n linear equations in n unknowns. 524. Marsden and Michael J. • In our earlier discussion of the geometric significance of the determinant formed from the vectors in an ordered basis for R2. As in Section 4. respectively. . and y/3. we also saw that this . (For a proof of a more generalized result.1. If the rows of A are a i .1) as adjacent sides is = 6. On the other hand. . New York. Note that the object in question is a rectangular parallelepiped (see Figure 4. it is possible to interpret the determinant of a matrix A G M n x n ( P ) geometrically.x3) = ( 2'-1'2 In applications involving systems of linear equations.0.x2. —1). .-2. a 2 = (1.a„ as adjacent sides. W. then |det(A)| is the n-dimensional volume (the generalization of area in R2 and volume in R 3 ) of the parallelepiped having the vectors a i . see Jerrold E. rather than of computational value. . \/2. .H.

Specifically. where Q is the change of coordinate matrix changing 7-coordinates into /^-coordinates.6: Parallelepiped determined by three vectors in R3.1) (1. determinant is positive if and only if the basis induces a right-handed coordinate system. 4. Thus. A similar statement is true in R".Sec.0. if 7 is any ordered basis for Rn and (3 is the standard ordered basis for R n .-1) Figure 4. for instance.3 Properties of Determinants 227 (1. 0 1/ . then 7 induces a right-handed coordinate system if and only if det(Q) > 0. 7 = induces a left-handed coordinate system in R3 because whereas 7 = induces a right-handed coordinate system in R3 because r det 2 \0 "2 °\ 1 0 = 5 > 0.-2.

For any A. det(AB) = det(A) • det(B). where x = (x\.2x 2 + x 3 = 10 3x! + 4x 2 .B G M n X n (F).2x2 + x3= 0 3xi + 4x2 — 2x3 = —5 2xi + x 2 . matrix is invertible if and only if .8 to prove a result analogous to Theorem 4. det (A*) = . if (3 and 7 are two ordered bases for R n . In Exercises 2-7.det(A). a-nxi + d\2X2 = b\ a2\X\ + a22x2 = b2 where ana22 . where Q is the change of coordinate matrix changing 7-coordinates into /^-coordinates.8 4. use Cramer's rule to solve the given system of linear equations. . 9.4 5.3x 3 = 5 3. . x\ — x 2 + Ax3 = —2 6. . x n ) f . (h) Let Ax = b be the matrix form of a system of n linear equations in n unknowns. then the unique solution of Ax = b is Xk ~ _ det(M fc ) det(A) i Q v k _ l 2 tar *-*M. Use Theorem 4.x 2 =12 xi + 2x 2 + x 3 = . If det(A) 7^ 0 and if Mk is the nxn matrix obtained from A by replacing row k of A by bl. For any A G M n x n ( F ) . .2 x i .a i 2 a 2 i ^ 0 2xi + x2 — 3x 3 = 1 x\ .x 2 + x 3 = 6 8. then det(F) = ±1. -8x1 + 3x 2 + x 3 = 0 2xi . .3 (p. Prove that an upper triangular nxn all its diagonal entries are nonzero. 4 Determinants More generally. but for columns.228 Chap. xi . 2. then the coordinate systems induced by (3 and 7 have the same orientation (either both are right-handed or both are left-handed) if and only if det(Q) > 0.2x 3 = 0 xi . A matrix M G M n X n ( F ) has rank n if and only if det(M) ^ 0. EXERCISES 1.-. The determinant of a square matrix can be evaluated by cofactor expansion along any column. (g) Every system of n linear equations in n unknowns can be solved by Cramer's rule.x 2 + 4x 3 = . -8x1 + 3x2 + x 3 = 8 2xi — x 2 + x 3 = 0 3xi + x 2 + x 3 = 4 7. 212). Label the following statements as true or false. (a) (b) (c) (d) (e) (f) If E is an elementary matrix.n. x 2 . A matrix M G M n x n ( F ) is invertible if and only if det(M) = 0.

where O is the nxn zero matrix. where Mij is the complex conjugate of Mij. then det(M) = det(A)« det(C). 11. then | det(Q)| = 114. then det(M) = 0. Mk = O. + Prove that if M G M n x n ( F ) can be written in the form M = 'A O B' C wheje A and C are square matrices. and let B be the matrix in M n X n ( F ) having Uj as column j. /. Prove that if M is nilpotent. 20. 21. 4.* Prove that if A. A matrix A G M n x n ( F ) is called lower triangular if Atj — 0 for 1 < » < J < Ti. Prove that if Q is orthogonal. A matrix M G M n x n ( C ) is called skew-symmetric if Ml = — M. 16. B G M n x n ( F ) are similar.1 ) . where Q* = Ql. 15. un} be a subset of F n containing n distinct vectors. for some positive integer k. then det(AB) = det(A) • det(B). 18. Describe det (A) in terms of the entries of A. Complete the proof of Theorem 4. then det(A) = det(B).3 Properties of Determinants 229 10. then A is invertible (and hence B = A . Let A. Use determinants to prove that if A.Be Mnxn(F) be such that AB = -BA. Prove that 0 is a basis for F n if and only if det(B) ^ 0. Let /3 = {ui. A matrix M G M n x n ( C ) is called nilpotent if. (b) A matrix Q G M n X n (C) is called unitary if QQ* = / . Prove that if M is skew-symmetric and n is odd. . (a) Prove that det(Af) = det(M).Sec. Prove that if Q is a unitary matrix. u2. Suppose that A is a lower triangular matrix.j. What happens if n is even? 12. Suppose that M G M n X n ( F ) can be written in the form M = where A is a square matrix. 17. 19. Prove that det(M) = det(A). 13. B G M n X n ( F ) are such that AB =-. • • •. A matrix Q G M n x n (i?) is called orthogonal if QQl — I. then A or B is not invertible. then M is not invertible. then det(Q) = ±1. Prove that if n is odd and F is not a field of characteristic two.7 by showing that if A is an elementary matrix of type 2 or type 3. For M G M n x n (C). let M be the matrix such that (M)ij = ~M~~j for all i.

Let A G M „ x n ( F ) be nonzero.c n are distinct scalars in an infinite field F. 25.. f(cn)). an m x m s u b m a t r i x is obtained by deleting any n ~ m rows and any n — m columns of A. . suppose that rank (A) = k. where J is the nxn identity matrix. For any m (1 < m < n). (b) Conversely. (a) Show that M = [T]l has the form Co Cf]\ \l C cnJ A matrix with this form is called a V a n d e r m o n d e matrix. .230 Chap. Let c. (a) Prove that if B is t he matrix obtained from A by replacing column /. 0<i<j<n the product of all terms of the form c3.jk denote the cofactor of the row j. Prove that rank(A) = k. .(F). Let T: P„(F) — > F n + 1 be the linear transformation defined in Exercise 22 of Section 2. (c) Prove that det(M)= J] (<*-*). (b) Use Exercise 22 of Section 2. 23.• by ej.. Prove that there exists a A :x A : submatrix with a nonzero determinant..ci. — C{ for 0 < i < j < n. where^ cn. Let 0 be the standard ordered basis for P n ( F ) and 7 be the standard ordered basis for F n + 1 . .4 by T ( / ) = (/(co). / ( d ) . column k entry of the matrix A G M nX7 . Let A G M n X n ( F ) have the form ( 0 1 0 A = \ 0 0 0 0 0 -1 0 0 0 ••• -1 ao \ ax "2 On-l/ Compute det(A + tl). 24. 4 Determinants 22.4 to prove that det(M) •/ 0. . then det(B) = Cjk- . (a) Let k (1 < k < n) denote the largest integer such that some k x k submatrix has a nonzero determinant.

then AC = [det(A)]7. Let y\.. 27. define T(y) G C°° by / y(t) y'(t) Wn)(t) vi(t) . 4.square matrix A is the transpose of the matrix whose ij-entry is the ij-cofactor of A. The classical adjoint of a . 28.y2. The following definition is used in Exercises 26 27. (c) Deduce that if C is the nxn matrix such that Cij = c. then C and A . Prove the following statements.ji.yn be linearly independent functions in C°°. in) y\ (t) [T(y)](i)=det \ y2(t) v'zit) (n). Let C be the classical adjoint of A G M r i X n (F). m n. we have cj2 A \CjnJ = det(A) 231 Hint: Apply Cramer's rule to Ax = ej.Sec. (c) If A is an invertible upper triangular matrix.3 Properties of Determinants (b) Show that for 1 < j < n. (a) det(C) = [detCA)]""1... For each y G C°°. (b) Cl is the classical adjoint of A1. y} 2'"(t) ••• ••• yn(t)\ y'Jt) y(n\t)I . 26. then A" 1 = [det(A)]~lC. '4 0 0> (b) ( 0 4 0 0 0 4.. Find the classical adjoint of each of the following matrices. Definition.1 are both upper triangular matrices. (d) Show that if det(A) / 0.

If A is 1 x 1. If A is n x n for n > 2. 4 Determinants The preceding determinant is called the Wronskian of y. the scalar (—l)i+J' det(Aij) is called the cofactor of the row i column j entry of A.. \ 4.1 ) * + ' A W • det(Aij) (if the determinant is evaluated by the entries of row i of A) or n +JA det(A) = Y^{-tf ij' i=l det(iij) (if the determinant is evaluated by the entries of column j of A). 2. y2. where Aij is the (n-l)x(n-l) matrix obtained by deleting row i and column j from A.2 and 4. consequently.. the single entry of A. (a) Prove that T: C°° — » C°° is a linear transformation.3. until 2 x 2 matrices are obtained. then det (A) = A n . and so forth.Ai2A2\. In the formulas above. (b) Prove that N(T) contains span({yi..4 SUMMARY—IMPORTANT FACTS ABOUT DETERMINANTS In this section. The determinants of the 2 x 2 matrices are then evaluated as in item 2. the determinant of A is evaluated as the sum of terms obtained by multiplying each entry of some row or column of A by the cofactor of that entry. y\. then n det(A) = J 3 ( . These determinants are then evaluated in terms of determinants of (n — 2) x (n — 2) matrices. The determinant of an n x n matrix A having entries from a field F is a scalar in F . • . and can be computed in the following manner: 1. "13- ( 5 3) = ( "1)(3) " (2)(5) = 3. denoted by det(A) or |A|. If A is 2 x 2.yn})yn.232 Chap.•. the facts presented here are stated without proofs. . The results contained in this section have been derived in Sections 4. In this language.. we summarize the important properties of the determinant needed for the remainder of the text. det For example. then det(A) = AnA 2 2 . Thus det (A) is expressed in terms of n determinants of (n — 1) x (n — 1) matrices.

l ) 1+2 f\ (l)det 2 \3 2 1 \Z -4 . and —13. Similarly.3 ) ] + (-1)(1)[(1)(1) . Thus det(A) = ( .16 + 0 = -23. let US also compute the determinant of A by expansion along the second column. and 8.1 1 2. -4 -3 -1 1 + (-l)3+2(0)det = 14 + 40 + 0 + 48 = 102. \ .4 .l ) ( . The cofactor of A.l ) 4 + 1 d e t ( £ ) . We have det(B) = (-lY^(l)det (l43 " J ) + ( .3 1 \3 6 1 2) To evaluate the determinant of A by expanding along the fourth row.V .4 Summary—Important Facts about Determinants 233 Let us consider two examples of this technique in evaluating the determinant of the 4 x 4 matrix (2 1 1 5\ 1 1 .Sec.4 . 4. and A44 are 8.l ) * + i ( l ) d c t ( J . 11. and A 42 are —14. We can now evaluate the determinant of A by multiplying each entry of the fourth row by its cofactor: t his gives det(A) = 3(23) + 6(8) + 1(11) + 2(-13) = 102.n = 3 is ( . the cofactors of A 42 . respectively. A 22 . we must know the cofactors of each entry of that row.(5)(-3)] + 0 = . A 43 . * ) + (-l)3+1(0)det( ] _ j ) = 1(1)[(-4)(1) . For the sake of comparison.7 . where Let us evaluate this determinant by expanding along the first column. Thus the cofactor of A4X is (— 1 )5(—23) = 23. respectively. The reader should verify that the cofactors of A 12 .( .1 A = 2 0 . '2 -l)4+'(6)det I 1 2 1 5N -3 1 1 2.l \ (2 -3 I +(-l)2+2(l)det 2 1 2 / \3 1 . 10.

l) = 1(-1) 1+2 -5 ~6\ -3 1 • . 2. then det(B) = det(A). (The elementary row operations used here consist of adding multiples of row 1 to rows 2 and 4. As an example of the use of these three properties in evaluating determinants. Our procedure is to introduce zeros into the second column of A by employing property 3. then det(Z?) = k.1 0 2 ) = 102. Therefore det(A) = l ( . . and then expand along that column.det(A). let us compute the determinant of the 4 x 4 matrix A considered previously. The difference is the presence of a zero in the second column. This technique utilizes the first three properties of the determinant.3 1 6 1 2/ /-I det 2 V-9 -1 0 det 2 0 \-9 0 ( 2 X 1 5 \ -6 -5 1 -3 . the fact that t he value 102 is obtained again is no surprise since the value of the determinant of A is independent of the choice of row or column used in the expansion. In fact.5 -28.4 .234 Chap.1 0 .) This procedure yields (2 1 det(A) = det 2 V3 1 1 5\ 1 . For this reason. This results in the value —102. it is often helpful to introduce zeros into the matrix by means of elementary row operations before computing the determinant. it is beneficial to evaluate the determinant of a mat rix by expanding along a row or column of the matrix that contains t he largest number of zero entries. If B is a matrix obtained from an n x n matrix A by adding a multiple of row i to row j or a multiple of column i to column j for i / j. Observe that the computation of det(A) is easier when expanded along the second column than when expanded along the fourth row. and then to expand along that column. If B is a matrix obtained by interchanging any two rows or interchanging any two columns of an nxn matrix A.l ) 1 + 2 ( . If B is a matrix obtained by multiplying each entry of some row or column of an n x n matrix A by a scalar k. 4 Determinants Of course. which makes it unnecessary to evaluate one of the cofactors (the cofactor of A 32 ). Properties of the Determinant 1. then det(B) = — det(A).5 -28 The resulting determinant of a 3 x 3 matrix can be evaluated in the same manner: Use type 3 elementary row operations to introduce two zeros into the first column. 3.

The reader should compare this calculation of det(A) with the preceding ones to see how much less work is required when properties 1, 2, and 3 are employed.

In the chapters that follow, we often have to evaluate the determinants of matrices having special forms. The next two properties of the determinant are useful in this regard.

4. The determinant of an upper triangular matrix is the product of its diagonal entries. In particular, det(I) = 1.

5. If two rows (or columns) of a matrix are identical, then the determinant of the matrix is zero.

As an illustration of property 4, notice that

   det ( -3  1  2 )
       (  0  4  5 )   =  (-3)(4)(-6) = 72.
       (  0  0 -6 )

Property 4 provides an efficient method for evaluating the determinant of a matrix:

(a) Use Gaussian elimination and properties 1, 2, and 3 above to reduce the matrix to an upper triangular matrix.
(b) Compute the product of the diagonal entries.

(A worked 4 x 4 example of this procedure, in which the matrix is reduced by type 3 row operations to an upper triangular matrix whose diagonal entries are then multiplied, is displayed in the original.)

The next three properties of the determinant are used frequently in later chapters. Indeed, perhaps the most significant property of the determinant is that it provides a simple characterization of invertible matrices. (See property 7.)

6. For any n x n matrices A and B, det(AB) = det(A) det(B).
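Properties 4 and 6 can also be checked numerically. A minimal sketch, assuming NumPy; the matrices below are arbitrary test cases.

import numpy as np

U = np.array([[-3., 1., 2.],
              [ 0., 4., 5.],
              [ 0., 0., -6.]])            # upper triangular, as in the illustration above

print(np.isclose(np.linalg.det(U), np.prod(np.diag(U))))   # True: (-3)(4)(-6) = 72

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, (3, 3)).astype(float)
B = rng.integers(-3, 4, (3, 3)).astype(float)
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))     # True: property 6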

7. An n x n matrix A is invertible if and only if det(A) is not zero. Furthermore, if A is invertible, then det(A^{-1}) = 1/det(A).

8. For any n x n matrix A, the determinants of A and A^t are equal.

9. If A and B are similar matrices, then det(A) = det(B).

The final property, stated as Exercise 15 of Section 4.3, is used in Chapter 5. It is a simple consequence of properties 6 and 7. For example, property 7 guarantees that the matrix A on page 233 is invertible because det(A) = 102.

EXERCISES

1. Label the following statements as true or false.
   (a) The determinant of a square matrix may be computed by expanding the matrix along any row or column.
   (b) In evaluating the determinant of a matrix, it is wise to expand along a row or column containing the largest number of zero entries.
   (c) If two rows or columns of A are identical, then det(A) = 0.
   (d) If B is a matrix obtained by interchanging two rows or two columns of A, then det(B) = det(A).
   (e) If B is a matrix obtained by multiplying each entry of some row or column of A by a scalar, then det(B) = det(A).
   (f) If B is a matrix obtained from A by adding a multiple of some row to a different row, then det(B) = det(A).
   (g) The determinant of an upper triangular n x n matrix is the product of its diagonal entries.
   (h) For every A in M_{n x n}(F), det(A^t) = -det(A).
   (i) If A, B are in M_{n x n}(F), then det(AB) = det(A) det(B).
   (j) If Q is an invertible matrix, then det(Q^{-1}) = [det(Q)]^{-1}.
   (k) A matrix Q is invertible if and only if det(Q) is not zero.

2. Evaluate the determinant of the following 2 x 2 matrices. [The four matrices (a)-(d), two with real entries and two with complex entries, appear in the original display.]
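Several parts of Exercise 1 above can be explored numerically before being settled by the properties of this section. A minimal sketch, assuming NumPy and an arbitrary invertible test matrix; it suggests (but of course does not prove) that statements (d), (e), and (h) are false in general, while (j) holds.

import numpy as np

A = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [5., 6., 0.]])        # an arbitrary invertible matrix (det(A) = 1)
d = np.linalg.det(A)

B = A[[1, 0, 2], :]                 # interchange two rows: the sign changes, so (d) fails
print(np.isclose(np.linalg.det(B), d))                      # False

C = A.copy(); C[0] *= 3.0           # scale a row by 3: det is multiplied by 3, so (e) fails
print(np.isclose(np.linalg.det(C), d))                      # False

print(np.isclose(np.linalg.det(A.T), -d))                   # (h): False in general
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / d))   # (j): True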

third row 0 1 2s -1 0 -3 (c) 2 3 0. 4. Evaluate the determinant of the following matrices by any legitimate method.: | J \ 2 3 0/ along the first row 237 (a) (b) along the first column 1 0 (d) -1 along the 0 2N 1 o 3 0. along the second column / 0 1+i 2 \ -2/ 0 1 . Suppose that M G M n X n ( F ) can be written in the form M = A BN O / where A is a square matrix. .1 2 -5 -3 8 \-2 6 -4 \) along the fourth row / :S\ 2 1 2 1 0 -2 I 3 -1 0 (g) V-l 1 2 0/ along the fourth column (h) 4. ( 1 -5 (h) -9 \-A -2 12 22 9 3 -14 -20 -14 -12\ 19 31 15/ 5.4 Summary—Important Facts about Determinants .i] \ 3 Ai 0 / along the third row / () (e) i 2+i 0 \ -1 3 2/ (f) 0 -1 l-i/ along the third column 1 -1 2 -1\ -3 4 1 .Sec. Prove that det(M) = det(A).

6.* Prove that if M in M_{n x n}(F) can be written in the form

   M = ( A  B )
       ( O  C ),

where A and C are square matrices, then det(M) = det(A) det(C).

4.5* A CHARACTERIZATION OF THE DETERMINANT

In Sections 4.2 and 4.3, we showed that the determinant possesses a number of properties. In this section, we show that three of these properties completely characterize the determinant; that is, the only function delta: M_{n x n}(F) -> F having these three properties is the determinant. The first of these properties that characterize the determinant is the one described in Theorem 4.3 (p. 212). This characterization of the determinant is the one used in Section 4.1 to establish the relationship between the determinant and the area of the parallelogram determined by u and v.

Definition. A function delta: M_{n x n}(F) -> F is called an n-linear function if it is a linear function of each row of an n x n matrix when the remaining n - 1 rows are held fixed; that is, delta is n-linear if, for every r = 1, 2, ..., n, we have

   delta(a_1, ..., a_{r-1}, u + kv, a_{r+1}, ..., a_n)
      = delta(a_1, ..., a_{r-1}, u, a_{r+1}, ..., a_n) + k delta(a_1, ..., a_{r-1}, v, a_{r+1}, ..., a_n)

(the rows of the matrix being listed as the arguments of delta) whenever k is a scalar and u, v, and each a_i are vectors in F^n.

Example 1
The function delta: M_{n x n}(F) -> F defined by delta(A) = 0 for each A in M_{n x n}(F) is an n-linear function.

Example 2
For 1 <= j <= n, define delta_j: M_{n x n}(F) -> F by delta_j(A) = A_{1j} A_{2j} ... A_{nj} for each A in M_{n x n}(F); that is, delta_j(A) equals the product of the entries of column j of A.
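The defining property of an n-linear function can be spot-checked numerically. The sketch below is purely illustrative; it assumes NumPy, uses the function delta_2 of Example 2 (the product of the entries of the second column), and takes arbitrary test vectors. It verifies linearity in one row while the other rows are held fixed.

import numpy as np

def delta_j(A, j=1):
    # delta_j(A): product of the entries of column j (0-based index here).
    return np.prod(A[:, j])

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, (3, 3)).astype(float)
u, v, k = rng.standard_normal(3), rng.standard_normal(3), 2.5

r = 1                               # hold rows 1 and 3 fixed, vary row 2
A1, A2, A3 = A.copy(), A.copy(), A.copy()
A1[r] = u + k * v
A2[r] = u
A3[r] = v

print(np.isclose(delta_j(A1), delta_j(A2) + k * delta_j(A3)))   # True: linear in row r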

Let A be in M_{n x n}(F), and for each i write a_i = (A_{i1}, A_{i2}, ..., A_{in}) for row i of A. Then each delta_j is an n-linear function because, for any scalar k and any v = (b_1, b_2, ..., b_n) in F^n, replacing row r of A by a_r + kv gives

   delta_j = A_{1j} ... A_{(r-1)j} (A_{rj} + k b_j) A_{(r+1)j} ... A_{nj}
           = A_{1j} ... A_{(r-1)j} A_{rj} A_{(r+1)j} ... A_{nj} + k (A_{1j} ... A_{(r-1)j} b_j A_{(r+1)j} ... A_{nj}),

which is the value of delta_j at the original matrix plus k times the value of delta_j at the matrix whose row r is v.

Example 3
The function delta: M_{n x n}(F) -> F defined for each A in M_{n x n}(F) by delta(A) = A_{11} A_{22} ... A_{nn} (i.e., delta(A) equals the product of the diagonal entries of A) is an n-linear function.

Theorem 4.3 (p. 212) asserts that the determinant is an n-linear function. For our purposes this is the most important example of an n-linear function.

Example 4
The function delta: M_{n x n}(R) -> R defined for each A in M_{n x n}(R) by delta(A) = tr(A) is not an n-linear function for n >= 2. For if I is the n x n identity matrix and A is the matrix obtained by multiplying the first row of I by 2, then delta(A) = n + 1, which is not equal to 2n = 2 delta(I).

Now we introduce the second of the properties used in the characterization of the determinant.

Definition. An n-linear function delta: M_{n x n}(F) -> F is called alternating if, for each A in M_{n x n}(F), we have delta(A) = 0 whenever two adjacent rows of A are identical.
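The determinant itself is an alternating n-linear function, and this behavior is easy to observe numerically. A minimal sketch, assuming NumPy and arbitrary test matrices:

import numpy as np

rng = np.random.default_rng(2)

# Two identical adjacent rows force the determinant to be 0.
A = rng.standard_normal((4, 4))
A[2] = A[1]
print(np.isclose(np.linalg.det(A), 0.0))    # True

# Interchanging two rows reverses the sign (compare Theorem 4.10(a) below).
C = rng.standard_normal((4, 4))
B = C[[0, 3, 2, 1], :]                      # interchange rows 2 and 4
print(np.isclose(np.linalg.det(B), -np.linalg.det(C)))   # True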

Theorem 4.10. Let delta: M_{n x n}(F) -> F be an alternating n-linear function.
(a) If A is in M_{n x n}(F) and B is a matrix obtained from A by interchanging any two rows of A, then delta(B) = -delta(A).
(b) If A in M_{n x n}(F) has two identical rows, then delta(A) = 0.

Proof. (a) Let A be in M_{n x n}(F), let the rows of A be a_1, a_2, ..., a_n, and let B be the matrix obtained from A by interchanging rows r and s, where r < s.

We first establish the result in the case that s = r + 1. Consider the matrix whose rows are a_1, ..., a_{r-1}, a_r + a_{r+1}, a_r + a_{r+1}, a_{r+2}, ..., a_n; since rows r and r + 1 of this matrix are identical and delta is alternating, delta of this matrix is 0. Because delta is an n-linear function, expanding in rows r and r + 1 gives

   0 = delta(..., a_r, a_r, ...) + delta(..., a_r, a_{r+1}, ...) + delta(..., a_{r+1}, a_r, ...) + delta(..., a_{r+1}, a_{r+1}, ...)
     = 0 + delta(A) + delta(B) + 0,

where the unwritten rows are a_1, ..., a_{r-1} and a_{r+2}, ..., a_n. Thus delta(B) = -delta(A).

Next suppose that s > r + 1. Beginning with a_r and a_{r+1}, successively interchange a_r with the row that follows it until the rows are in the sequence a_1, a_2, ..., a_{r-1}, a_{r+1}, ..., a_s, a_r, a_{s+1}, ..., a_n. In all, s - r interchanges of adjacent rows are needed to produce this sequence. Then successively interchange a_s with the row that precedes it until the rows are in the order a_1, a_2, ..., a_{r-1}, a_s, a_{r+1}, ..., a_{s-1}, a_r, a_{s+1}, ..., a_n. This process requires an additional s - r - 1 interchanges of adjacent rows and produces the matrix B. It follows from the preceding paragraph that

   delta(B) = (-1)^{(s-r)+(s-r-1)} delta(A) = -delta(A).

(b) Suppose that rows r and s of A in M_{n x n}(F) are identical, where r < s. If s = r + 1, then delta(A) = 0 because delta is alternating and two adjacent rows of A are identical.

Then the rows of A. But 8(B) = -8(A) by (a).6 (p. 2. Moreover. Hence the third condition that is used in the characterization of the determinant is that the determinant of the n x n identity matrix is 1. 4. then 8(B) = <5(A). and C are identical except for row j. (See Exercise 12. If M G M n X n (F) has rank less than n. row j of B is the sum of row j of A and k times row j of C. I The next result now follows as in the proof of the corollary to Theorem 4. and so it is omitted. Corollary 3.7 (p. and let C be obtained from A by replacing row J of A by row z of A. respectively. lXTi (F) by adding k times row i of A to row j .Sec. Since 5 is an n-linear function and C has two identical rows. Let 8: M n x n (F) — • F be an alternating n-linear function. Then 8(E\) = -8(1).) Corollary 2. and 3. this can happen only if 8(1) = 1. Proof. Proof. If . If B is a matrix obtained from A G M„ X „(F) by adding a multiple of some row of A to another row. In view of Corollary 3 to Theorem 4. it follows that 8(B) = 8(A) + k8(C) = 8(A) + fc-0 = 8(A). Let B be obtained from A G M. We wish to show that under certain circumstances. we must first show that an alternating n-linear function 8 such that 8(1) = 1 is a multiplicative function. we have 8(AB) = 8(A)-8(B).5 A Characterization of the Determinant 241 of A are identical. Then 8(B) = 0 because two adjacent rows of B are identical. and E3 in M n X f l (F) be elementary matrices of types 1. and S(E3) = 8(1).10 and the facts on page 223 about the determinant of an elementary matrix. where j ^ i. Let 8: M n x n ( F ) —> F be an alternating n-linear function such that 8(1) = 1.11. Let 8: M n X n (F) — • F be an alternating n-linear function. the only alternating n-linear function 8: M n x „ ( F ) — + F is the determinant. Exercise.) T h e o r e m 4. 8(A) = det(A) for all A G M„xri(F). 216). let B be the matrix obtained from A by interchanging rows r + 1 and s. . 223). I Corollary 1. then 8(M) = 0. Be M n x n ( F ) . Let 8: M n X n (F) — • F be an alternating n-linear function. Before we can establish the desired characterization of the determinant.s > r + 1. (See Exercise 11. and let E\. Suppose that E2 is obtained by multiplying some row of I by the nonzero scalar k. The proof of this result is identical to the proof of Theorem 4. 8(E2) = k-8(I). Proof. B. Hence 8(A) = 0. Exercise. that is. For any A. E2.

12. EXERCISES 1.}(F) — * F in Exercises 3-10 are 3-linear functions.. say A = Em---E2En Since 8(1) = 1. (f) The function 8: Mnxn(F) -* F defined by 8(A) = 0 for every A G M n x n ( F ) is an alternating //-linear function. 159).6 (p. Hence by Theorems 4.. (c) If 8: M n X f l (F) — * F is an alternating n-linear function and the matrix A G M „ x n ( F ) has two identical rows. 8(A) = 0. we have 8(A) = det(A) in this case. Since the corollary to Theorem 4.242 Proof. (b) Any n-linear function 8: M n x n ( F ) — * F Is a linear function of each row of an n x n matrix when the other n — 1 rows are held fixed. Label the following statements as true or false. 217) gives det(A) = 0. HE2)-8(Ei) det(F 2 )*det(F 1 ) I = det(Fm---F2F. then by Corollary 2 to Theorem 4. 4 Determinants T h e o r e m 4. | X „(F) — > F is a linear transformation.11 and 4. and let A G M „ x n ( F ) . If A has rank less than n. we have 8(A) = 8(Em • • • E2EX) = <KJSm) = det(F w ) = det(A)..lustily each answer. on the other hand. Kxercise.10 and the facts on page 223 that 8(E) — det(F) for every elementary matrix E. Let 8: M n X n ( F ) — > F be an alternating n-linear function such that 8(1) = 1. and has the property t hat 8[ I) 1.) Theorem 4. (d) If 8: M nXT . is alternating. (a) Any n-linear function 8: M . 223). Determine all the 1-linear functions 8: M ) x i ( F ) — » F. Chap. Determine which of the functions 8: M3X.6 p. A has rank n. .(1 •') I that i^ //-linear.(F) — > F is an alternating n-linear function and B is obtained from A G M n x n ( F ) by interchanging two rows of A.7 (p. then 8(A) = det(A) for every A G M n X n (F). If. . then 8(B) = 8(A).10. 2. it follows from Corollary 3 to Theorem 1. Proof. If 8: M„ X H (F) — • F is an alternating n-linear function such that 8(1) = 1. then 8(A) = 0. . (e) There is a unique alternating n-linear function 8: M n X n (F) — > F.12 provides the desired characterization of the determinant: It is the unique function 8: M. then A is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3.

8(A) = AiiA 2 3 A 3 2 6. Prove that the function 8: M 2X 2(F) — * F defined by 8(A) = AnA 2 2 a + AnA2ib + Ai2A22c + Ai2A2id is a 2-linear function.2 (p. Ai2A2xd . 5(A) = A „ + A 2 3 + A32 7. 9). Prove that if 8(B) = —8(A) whenever B is obtained from A G M n X n ( F ) by interchanging any two rows of A.5 A Characterization of the Determinant 3. Prove Theorem 4. 17. 20.d G F. 9). where k is any nonzero scalar 4.10.Sec. 13. 5(A) = AnA 21 A 3 2 8. Prove that if 8: M nXT1 (F) — > F is an alternating n-linear function. 8(A) = A 2 2 5. Give an example to show that the implication in Exercise 19 need not hold if F has characteristic two. Let 8: M n X n (F) — * F be an n-linear function and F a field that does not have characteristic two. Prove that a linear combination of two n-linear functions is an n-linear function.d G F.b. 15. then 8(M) = 0 whenever M G M n X n ( F ) has two identical rows. 19. where the sum and scalar product of n-linear functions are as defined in Example 3 of Section 1. 16. 8(A) = AnA3iA32 243 9. Prove that det: M 2 x 2 ( F ) — > F is a 2-linear function of the columns of a matrix. 14. Prove that 8: M 2 x 2 (F) — » F is a 2-linear function if and only if it has the form 8(A) = An A 22 o + AnA 2 i& + A i 2 A 2 2 c + for some scalars a. b. 18. Let a. 8(A) = k. then there exists a scalar k such that 8(A) = A:det(A) for all A G M n X n (F). 4.2 (p. Prove Corollaries 2 and 3 of Theorem 4.c.c. 8(A) = AUA22A33 AUA21A32 11. 12. Prove that the set of all n-linear functions over a field F is a vector space over F under the operations of function addition and scalar multiplication as defined in Example 3 of Section 1.11. 8(A) = AnA222A233 10.

INDEX OF DEFINITIONS FOR CHAPTER 4

Alternating n-linear function 239
Angle between two vectors 202
Cofactor 210
Cofactor expansion along the first row 210
Cramer's rule 224
Determinant of a 2 x 2 matrix 200
Determinant of a matrix 210
Left-handed coordinate system 203
n-linear function 238
Orientation of an ordered basis 202
Parallelepiped, volume of 226
Parallelogram determined by two vectors 203
Right-handed coordinate system 202

3* 5.2 5. see. how can it be found? Since computations involving diagonal matrices are simple. for example. A linear operator T on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis (3 for V such that [T]^ 245 . The key to our success was to find a basis (3' for which \J\p> is a diagonal matrix.5 D i a g o n a l i z a t i o n 5. 5. we seek answers to the following questions.5. We now introduce the name for an operator or matrix that has such a basis. an affirmative answer to question 1 leads us to a clearer understanding of how the operator T acts on V. Does there exist an ordered basis (3 for V such that [T]/? is a diagonal matrix? 2. his chapter is concerned with the so-called diagonalization problem. A solution to the diagonalization problem leads naturally to the concepts of eigenvalue and eigenvector.4 Eigenvalues and Eigenvectors Diagonalizability Matrix Limits and Markov Chains Invariant Subspaces and the Cayley-Hamilton Theorem J. as we will see in Chapter 7. Section 5. We consider some of these problems and their solutions in this chapter. they also prove to be useful tools in the study of many nondiagonalizable operators. 1. If such a basis exists. Definitions. Aside from the important role that these concepts play in the diagonalization problem.3. we were able to obtain a formula for the reflection of R2 about the line y = 2x. and an answer to question 2 enables us to obtain easy solutions to many practical problems that can be formulated in a linear algebra context. For a given linear operator T on a finite-dimensional vector space V.1 5.1 EIGENVALUES AND EIGENVECTORS In Example 3 of Section 2.

A square matrix A is called diagonalizable if L_A is diagonalizable.

We want to determine when a linear operator T on a finite-dimensional vector space V is diagonalizable and, if so, how to obtain an ordered basis beta = {v_1, v_2, ..., v_n} for V such that [T]_beta is a diagonal matrix. Note that, if D = [T]_beta is a diagonal matrix, then for each vector v_j in beta we have

   T(v_j) = \sum_{i=1}^{n} D_{ij} v_i = D_{jj} v_j = lambda_j v_j,   where lambda_j = D_{jj}.

Conversely, if beta = {v_1, v_2, ..., v_n} is an ordered basis for V such that T(v_j) = lambda_j v_j for some scalars lambda_1, lambda_2, ..., lambda_n, then clearly [T]_beta is the diagonal matrix whose diagonal entries are lambda_1, lambda_2, ..., lambda_n.

In the preceding paragraph, each vector v in the basis beta satisfies the condition that T(v) = lambda v for some scalar lambda. Moreover, because v lies in a basis, v is nonzero. These computations motivate the following definitions.

Definitions. Let T be a linear operator on a vector space V. A nonzero vector v in V is called an eigenvector of T if there exists a scalar lambda such that T(v) = lambda v. The scalar lambda is called the eigenvalue corresponding to the eigenvector v.
Let A be in M_{n x n}(F). A nonzero vector v in F^n is called an eigenvector of A if v is an eigenvector of L_A; that is, if Av = lambda v for some scalar lambda. The scalar lambda is called the eigenvalue of A corresponding to the eigenvector v.

The words characteristic vector and proper vector are also used in place of eigenvector. The corresponding terms for eigenvalue are characteristic value and proper value.

Note that a vector is an eigenvector of a matrix A if and only if it is an eigenvector of L_A. Likewise, a scalar lambda is an eigenvalue of A if and only if it is an eigenvalue of L_A.

Using the terminology of eigenvectors and eigenvalues, we can summarize the preceding discussion as follows.

Theorem 5.1. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if there exists an ordered basis beta for V consisting of eigenvectors of T. Furthermore, if T is diagonalizable, beta = {v_1, v_2, ..., v_n} is an ordered basis of eigenvectors of T, and D = [T]_beta, then D is a diagonal matrix and D_{jj} is the eigenvalue corresponding to v_j for 1 <= j <= n.
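The defining equation Av = lambda v is straightforward to verify once a candidate eigenvector is in hand. A minimal sketch, assuming NumPy; the matrix is the one that appears in Example 1 below, and the two vectors are eigenvectors that can be checked directly.

import numpy as np

A = np.array([[1., 3.],
              [4., 2.]])
v1 = np.array([1., -1.])     # an eigenvector for lambda = -2
v2 = np.array([3., 4.])      # an eigenvector for lambda = 5

print(np.allclose(A @ v1, -2 * v1))   # True: A v1 = -2 v1
print(np.allclose(A @ v2, 5 * v2))    # True: A v2 = 5 v2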

To diagonalize a matrix or a linear operator is to find a basis of eigenvectors and the corresponding eigenvalues.

Before continuing our study of the diagonalization problem, we consider three examples of eigenvalues and eigenvectors.

Example 1
Let A be the 2 x 2 matrix with rows (1, 3) and (4, 2), and let v_1 = (1, -1) and v_2 = (3, 4), regarded as column vectors. Since

   L_A(v_1) = (-2, 2) = -2 v_1,

v_1 is an eigenvector of L_A, and hence of A. Here lambda_1 = -2 is the eigenvalue corresponding to v_1. Furthermore,

   L_A(v_2) = (15, 20) = 5 v_2,

and so v_2 is an eigenvector of L_A, and hence of A, with the corresponding eigenvalue lambda_2 = 5. Note that beta = {v_1, v_2} is an ordered basis for R^2 consisting of eigenvectors of both A and L_A, and therefore A and L_A are diagonalizable.

Example 2
Let T be the linear operator on R^2 that rotates each vector in the plane through an angle of pi/2. It is clear geometrically that for any nonzero vector v, the vectors v and T(v) are not collinear; hence T(v) is not a multiple of v. Therefore T has no eigenvectors and, consequently, no eigenvalues. Thus there exist operators (and matrices) with no eigenvalues or eigenvectors. Of course, such operators and matrices are not diagonalizable.

Example 3
Let C^infinity(R) denote the set of all functions f: R -> R having derivatives of all orders. (Thus C^infinity(R) includes the polynomial functions, the exponential functions, the sine and cosine functions, etc.) Clearly, C^infinity(R) is a subspace of the vector space F(R, R) of all functions from R to R as defined in Section 1.2. Let T: C^infinity(R) -> C^infinity(R) be the function defined by T(f) = f', the derivative of f. It is easily verified that T is a linear operator on C^infinity(R). We determine the eigenvalues and eigenvectors of T.

Suppose that f is an eigenvector of T with corresponding eigenvalue lambda. Then f' = T(f) = lambda f. This is a first-order differential equation whose solutions are of the form f(t) = c e^{lambda t} for some constant c. Consequently, every real number lambda is an eigenvalue of T, and lambda corresponds to eigenvectors of the form c e^{lambda t} for c not equal to 0. Note that for lambda = 0, the eigenvectors are the nonzero constant functions.

In order to obtain a basis of eigenvectors for a matrix (or a linear operator), we need to be able to determine its eigenvalues and eigenvectors. The following theorem gives us a method for computing eigenvalues.

Theorem 5.2. Let A be in M_{n x n}(F). Then a scalar lambda is an eigenvalue of A if and only if det(A - lambda I_n) = 0.

Proof. A scalar lambda is an eigenvalue of A if and only if there exists a nonzero vector v in F^n such that Av = lambda v, that is, (A - lambda I_n)(v) = 0. By Theorem 2.5 (p. 71), this is true if and only if A - lambda I_n is not invertible. However, this result is equivalent to the statement that det(A - lambda I_n) = 0.

Definition. Let A be in M_{n x n}(F). The polynomial f(t) = det(A - t I_n) is called the characteristic polynomial of A.

Theorem 5.2 states that the eigenvalues of a matrix are the zeros of its characteristic polynomial. When determining the eigenvalues of a matrix or a linear operator, we normally compute its characteristic polynomial, as in the next example.

Example 4
To find the eigenvalues of the matrix A in M_{2 x 2}(R) with rows (1, 1) and (4, 1), we compute its characteristic polynomial:

   det(A - t I_2) = det ( 1-t   1  )
                        (  4   1-t )  =  t^2 - 2t - 3 = (t - 3)(t + 1).

It follows from Theorem 5.2 that the only eigenvalues of A are 3 and -1.

The observant reader may have noticed that the entries of the matrix A - t I_n are not scalars in the field F. They are, however, scalars in another field F(t), the field of quotients of polynomials in t with coefficients from F. Consequently, any results proved about determinants in Chapter 4 remain valid in this context.
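Example 4 can be repeated numerically: the coefficients of the characteristic polynomial and its zeros can both be computed, and the zeros agree with the eigenvalues. A minimal sketch, assuming NumPy. (Note that np.poly returns the coefficients of det(tI - A), which differs from det(A - tI) only by the factor (-1)^n.)

import numpy as np

A = np.array([[1., 1.],
              [4., 1.]])               # the matrix of Example 4

coeffs = np.poly(A)                    # coefficients of det(tI - A): [1, -2, -3]
print(coeffs)                          # t^2 - 2t - 3
print(np.roots(coeffs))                # 3 and -1
print(np.linalg.eigvals(A))            # the same zeros, possibly in a different order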

3. Theorem 5.t)(2 .t) = -(*-l)(*-2)(t-3). Example 5 Let T be the linear operator on P 2 (B) defined by T(f(x)) = f(x)+(x+l)f'(x). let 0 be the standard ordered basis for P 2 (B).tl3) = (let \ 0 0 1 2-t 0 0 2 3-r = (1 . The next theorem tells us even more. Thus if T is a linear operator on a finite-dimensional vector space V and 0 is an ordered basis for V. 5. • Examples 4 and 5 suggest that the characteristic polynomial of an n x n matrix A is a polynomial of degree n. and let A = [T]^. This fact enables us to define the characteristic polynomial of a linear operator as follows. (b) A has at most n distinct eigenvalues. Hence A is an eigenvalue of T (or A) if and only if A = 1. The remark preceding this definition shows that the definition is independent of the choice of ordered basis 0. Definition. (a) The characteristic polynomial of A is a polynomial of degree n with leading coefficient ( — l ) n .0(3 . 2.1 Eigenvalues and Eigenvectors 249 It is easily shown that similar matrices have the same characteristic polynomial (see Exercise 12). We define the characteristic polynomial f(t) of T to be the characteristic polynomial of A = [T]^. Let A G M n x n (F). or 3. That is. Proof. f(t) = det(A-tln).Sec. Then The characteristic polynomial of T is (\-t det(A . We often denote the characteristic polynomial of an operator T by det(T — tl). . Exercise. Let T be a linear operator on an n-dimensional vector sjmcc V with ordered basis 0. then A is an eigenvalue of T if and only if A is an eigenvalue of [T]g. It can be proved by a straightforward induction argument.

T h e o r e m 5. Let T be a linear operator on a vector space V. Let Then is an eigenvector corresponding to Ai = 3 if and only if x ^ 0 and x G N(LR. Ai = 3 and A2 = — 1. ). x ^ 0 and -2 4 l \ (xx\ -2) \x2) __ (-2xx + x2\ V 4 *i . and let X be an eigenvalue ofT. Let B2 = A.i X2I = 4 i\ I (-\ \ 0 o^ -1/ (2 \s V4 2 . Example 6 To find all the eigenvectors of the matrix in Example 4. A vector v G V is an eigenvector of T corresponding to X if and only if v ^ 0 and v G N(T .250 Chap. that is. 5 Diagonalization Theorem 5.4. Our next result gives us a procedure for determining the eigenvectors corresponding to a given eigenvalue. Now suppose that x is an eigenvector of A corresponding to A2 = —1. Proof.AI). recall that A has two eigenvalues.2 W _ /0 V° Clearly the set of all solutions to this equation is Hence x is an eigenvector corresponding to Aj = 3 if and only if x =t { \ for some t ^ 0.2 enables us to determine all the eigenvalues of a matrix or a linear operator on a finite-dimensional vector space provided that we can compute the zeros of the characteristic polynomial. Exercise. We begin by finding all the eigenvectors corresponding to Ai = 3.

Figure 5. 5. To find the eigenvectors of a linear operator T on an /". • Suppose that 0 is a basis for F n consisting of eigenvectors of A. Thus x is an eigenvector corresponding to A2 = — 1 if and only if x= t Observe that » .1 is the special case of Figure 2. We show that v € V is an eigenvector of T corresponding to A if and only if 4>p(v) .2 in Section 2. Recall that for v G V. <f)p(v) = [v]p.) = | t ( 2)-teRj.i is a basis for R2 consisting of eigenvectors of A.1 Eigenvalues and Eigenvectors Then 1 251 'xi x2 ' E N(Lfi2 if and only if rr is a solution to the system 2xi + x2 = 0 4xi + 2x2 = 0. Hence N(Lfl.4 in which V = W and 0 = 7. for instance. if Q = then »-lAQ i Q~ = 3 0 0 -1 2/' -2 for some t ^ 0. the diagonal entries of this matrix are the eigenvalues of A that correspond to the respective columns of Q. Thus L^. In Example 6.Sec. The corollary to Theorem 2. is diagonalizable.23 assures us that if Q is the nxn matrix whose columns are the vectors in 0. and hence A. select an ordered basis 0 for V and let A = [T]^. .-dimensional vector space. then Q"1 AQ is a diagonal matrix. the coordinate vector of v relative to 0. Of course.

hence 0/3 (n) is an eigenvector of A. The next example illustrates this procedure. Suppose that v is an eigenvector of T corresponding to A. Let Ai = 1. This argument is reversible. and so we can establish that if 4>g(v) is an eigenvector of A corresponding to A. 1 0\ 0 2 2 . Recall that T has eigenvalues 1. Hence A<p0(v) = \-A<j>p{v) = 4>pT(v) = 4>n(Xv) = Xcj>fl(v). Then T(v) = A?. then v is an eigenvector of T corresponding to A. and 3 and that A = [T}0 = / .. since 0/? is an isomorphism. Now 4>p(v) 7 * ^ 0.( F" —bi_» p" Figure 5. 2. Example 7 Let T be the linear operator on P2(R) defined in Example 5. a vector y G F n is an eigenvector of A corresponding to A if and only if 01 ] (y) is an eigenvector of T corresponding to A. \ 0 0 3/ We consider each eigenvalue separately.1 Chap. 0 2. and let 0 be the standard ordered basis for P 2 (i?).252 V V </'.) An equivalent formulation of the result discussed in the preceding paragraph is that for an eigenvalue A of A (and hence of T). . 5 Diagonalization is an eigenvector of A corresponding to A. (See Exercise 13. and define Bi = A-XJ Then j 3 6 R ! /0 = 0 \0 1 0N 1 2 |. Thus we have reduced the problem of finding the eigenvectors of a linear operator on a finite-dimensional vector space to the problem of finding the eigenvectors of a matrix.

Sec. 5.1 Eigenvalues and Eigenvectors

253

is an eigenvector corresponding to Ai = 1 if and only if x ^ 0 and x G N(LBJ); that is. x is a nonzero solution to the system x2 =0 x2 + 2x 3 = 0 2x3 = 0. Notice that this system has three unknowns, xi, x 2 , and X3, but one of these, xi, does not actually appear in the system. Since the values of xi do not affect the system, we assign xi a parametric value, say xi = a, and solve the system for x 2 and X3. Clearly, x 2 = X3 = 0, and so the eigenvectors of A corresponding to Ai = 1 are of the form

for a ^ 0. Consequently, the eigenvectors of T corresponding to Ai = 1 are of the form 0^ 1 (aei) = a0^ x (ei) = a-1 = a for any a ^ O . Hence the nonzero constant polynomials are the eigenvectors of T corresponding to Ai = 1. Next let A2 = 2, and define ' - 1 1 0N B2 = A - X2I = ( 0 0 2 0 0 1 It is easily verified that

and hence the eigenvectors of T corresponding to A2 = 2 are of the form ^" (ei + e 2 ) = a(l + x) V for 0 ^ 0. Finally, consider A3 = 3 and A-A3/= -2 0 0 1 0X -1 2 0 0

254 Since

Chap. 5 Diagonalization

the eigenvectors of T corresponding to A3 = 3 are of the form 0 ^ [ o | 2 | ) = a0 / ; 1 (e i + 2e2 + e 3 ) = a(l + 2x + x 2 ) V for a ^ 0. For each eigenvalue, select the corresponding eigenvector with a = 1 in the preceding descriptions to obtain 7 = { l , l + x , l + 2 x + x 2 } , which is an ordered basis for P2(R) consisting of eigenvectors of T. Thus T is diagonalizable. and / l 0 0\ [T]7 = 0 2 0 . \ 0 0 3/ W /

We close this section with a geometric description of how a linear operator T acts on an eigenvector in the context of a vector space V over R. Let v be an eigenvector of T and A be the corresponding eigenvalue. We can think of W = span({n}), the one-dimensional subspace of V spanned by v, as a line in V that passes through 0 and v. For any w G W, w = cv for some scalar c, and hence T(iy) = T(cn) = cT(n) = cAn = Xw; so T acts on the vectors in W by multiplying each such vector by A. There are several possible ways for T to act on the vectors in W, depending on the value of A. We consider several cases. (See Figure 5.2.) CASE 1. If A > 1, then T moves vectors in W farther from 0 by a factor of A. CASE 2. If A = 1, then T acts as the identity operator on W. CASE 3. If 0 < A < 1, then T moves vectors in W closer to 0 by a factor of A. CASE 4. If A = 0, then T acts as the zero transformation on W. CASE 5. If A < 0, then T reverses the orientation of W; that is, T moves vectors in W from one side of 0 to the other.

Sec. 5.1 Eigenvalues and Eigenvectors Case 1: A > 1

255

Case 2: A = 1

Case 3: 0 < A < 1

Case 4: A = 0

Case 5: A < 0

Figure 5.2: The action of T on W = span({v}) when v is an eigenvector of T. To illustrate these ideas, we consider the linear operators in Examples 3, 4, and 2 of Section 2.1. For the operator T on R2 defined by T(ai,a 2 ) = (ai, — a2), the reflection about the x-axis, ei and e2 are eigenvectors of T with corresponding eigenvalues 1 and — 1, respectively. Since ei and e 2 span the x-axis and the y-axis, respectively, T acts as the identity on the x-axis and reverses the orientation of the t/-axis. For the operator T on R2 defined by T(ai,a 2 ) = (ai,0), the projection on the x-axis, ei and e 2 are eigenvectors of T with corresponding eigenvalues 1 and 0, respectively. Thus, T acts as the identity on the x-axis and as the zero operator on the y-axis. Finally, we generalize Example 2 of this section by considering the operator that rotates the plane through the angle 9, which is defined by T0(0.1,0,2) = (ai cos# — a2sin*9, ai sin# + a2 cos9). Suppose that 0 < 9 < n. Then for any nonzero vector v, the vectors v and T#(n) are not collinear, and hence T# maps no one-dimensional subspace of R2 into itself. But this implies that T0 has no eigenvectors and therefore no eigenvalues. To confirm this conclusion, let 0 be the standard ordered basis for R2, and note that the characteristic polynomial of T# is det{[Te}3 - tl2) = det cos 9 — t — sin 1 = t - 2 - ( 2 c o s 0 ) t + l, sin# cos 9 — t

256

Chap. 5 Diagonalization

which has no real zeros because, for 0 < 9 < it, the discriminant 4 cos2 9 — 4 is negative. EXERCISES Label the following statements as true or false. (a) Every linear operator on an n-dimensional vector space has n distinct eigenvalues. (b) If a real matrix has one eigenvector, then it has an infinite number of eigenvectors. (c) There exists a square matrix with no eigenvectors. (d) Eigenvalues must be nonzero scalars. (e) Any two eigenvectors are linearly independent. (f) The sum of two eigenvalues of a linear operator T is also an eigenvalue of T. (g) Linear operators on infinite-dimensional vector spaces never have eigenvalues. (h) An n x n matrix A with entries from a field F is similar to a diagonal matrix if and only if there is a basis for F n consisting of eigenvectors of A. (i) Similar matrices always have the same eigenvalues, (j) Similar matrices always have the same eigenvectors, (k) The sum of two eigenvectors of an operator T is always an eigenvector of T. 2. For each of the following linear operators T on a vector space V and ordered bases 0, compute [T]/?, and determine whether 0 is a basis consisting of eigenvectors of T. 1 0 a - 66 , and 0 = (a) V = R2, T 17a - 106 (b) V = Pi (R), J(a + 6x) = (6a - 66) + (12a - 116)x, and 3 = {3 + 4x, 2 + 3x} 3a + 26 - 2c' (c) V = R - 4 a - 36 + 2c . and

(d) V = P 2 (i?),T(a + 6x + cx 2 ) = ( - 4 a + 26 - 2c) - (7a + 36 + 7c)x + (7a + 6 + 5c> 2 , and 0 = {x - x 2 , - 1 + x 2 , - 1 - x + x 2 }

Sec. 5.1 Eigenvalues and Eigenvectors (e) V = P3(R), T(a + 6x + ex2 + dx 3 ) = - d + (-c + d)x + (a + 6 - 2c)x2 + ( - 6 + c - 2d)x 3 , and /? = {1 - x + x 3 , l + x 2 , l , x + x 2 } (f) V = M 2 x 2 (i2),T c 1 0 1 0 d - 7 a - 4 6 + Ac -Ad b , and - 8 a - 46 + 5c - Ad d -1 2 0 0 1 0 2 0 -1 0 0 2

257

0 =

3. For each of the following matrices A G M n X n (F), (i) Determine all the eigenvalues of A. (ii) For each eigenvalue A of A, find the set of eigenvectors corresponding to A.
n (hi; If possible, find a basis for F consisting of eigenvectors of A. (iv; If successful in finding such a basis, determine an invertible matrix Q and a diagonal matrix D such that Q~x AQ = D.

=R

for F = R

for F = C

for F = R 4. For each linear operator T on V, find the eigenvalues of T and an ordered basis 0 for V such that [T]/3 is a diagonal matrix. (a) (b) (c) (d) (e) (f) (g) V= V= V= V= V= V= V= R2 and T(a, 6) = ( - 2 a + 36. - 10a + 96) R3 and T(a, 6, c) = (7a - 46 + 10c, 4a - 36 + 8c, - 2 a + 6 - 2c) R3 and T(a, 6, c) = ( - 4 a + 3 6 - 6c. 6a - 76 + 12c, 6a - 66 + lie) P,(7?) and T(u.r + 6) = ( - 6 a + 26)x + ( - 6 a + 6) P2(R) and T(/(x)) = x/'(x) + /(2)x + /(3) P3(R) and T(/(x)) = f(x) + /(2)x P3(R) and T(/(x)) = xf'(x) + f"(x) - f(2) and T (a ^

(h) V = M2x2(R)

258 (i) V = M 2 x 2 (fi) and T d1 c \a d 6

Chap. 5 Diagonalization

(j) V = M2x2(/?.) and T(A) = A1 + 2 • tr(A) • I2 5. Prove Theorem 5.4. 6. Let T be a linear operator on a finite-dimensional vector space V, and let 0 be an ordered basis for V. Prove that A is an eigenvalue of T if and only if A is an eigenvalue of [T]#. 7. Let T be a linear operator on a finite-dimensional vector space V. We define the d e t e r m i n a n t of T, denoted det(T), as follows: Choose any ordered basis 0 for V, and define det(T) = det([T]/3). (a) Prove that the preceding definition is independent of the choice of an ordered basis for V. That is, prove that if 0 and 7 are two ordered bases for V, then det([T]/3) = det([T] 7 ). (b) Prove that T is invertible if and only if det(T) ^ 0. (c) Prove that if T is invertible, then det(T _ 1 ) = [det(T)]" 1 . (d) Prove that if U is also a linear operator on V, then det(TU) = dct(T)-det(U). (e) Prove that det(T — Aly) = det([T]/j — XI) for any scalar A and any ordered basis 0 for V. 8. (a) Prove that a linear operator T on a finite-dimensional vector space is invertible if and only if zero is not an eigenvalue of T. (b) Let T be an invertible linear operator. Prove that a scalar A is an eigenvalue of T if and only if A~! is an eigenvalue of T - 1 . (c) State and prove results analogous to (a) and (b) for matrices. 9. Prove that the eigenvalues of an upper triangular matrix M are the diagonal entries of M. 10. Let V be a finite-dimensional vector space, and let A be any scalar. (a) For any ordered basis 0 for V, prove that [Aly]/? = XI. (b) Compute the characteristic polynomial of Aly. (c) Show that Aly is diagonalizable and has only one eigenvalue. 11. A scalar m a t r i x is a square matrix of the form XI for some scalar A; that is, a scalar matrix is a diagonal matrix in which all the diagonal entries are equal. (a) Prove that if a square matrix A is similar to a scalar matrix XI, then A- XL (b) Show that a diagonalizable matrix having only one eigenvalue is a scalar matrix.

Sec. 5.1 Eigenvalues and Eigenvectors (c) Prove that I . 1 is not diagonalizable.

259

12. (a) Prove that similar matrices have the same characteristic polynomial. (b) Show that the definition of the characteristic polynomial of a linear operator on a finite-dimensional vector space V is independent of the choice of basis for V. 13. Let T be a linear operator on a finite-dimensional vector space V over a field F, let 0 be an ordered basis for V, and let A = [T]/3. In reference to Figure 5.1, prove the following. (a) If r G V and <t>0(v) is an eigenvector of A corresponding to the eigenvalue A, then v is an eigenvector of T corresponding to A. (b) If A is an eigenvalue of A (and hence of T), then a vector y G F n is an eigenvector of A corresponding to A if and only if 4>~Q (y) is an eigenvector of T corresponding to A. 14.* For any square matrix A, prove that A and A1 have the same characteristic polynomial (and hence the same eigenvalues). 15- (a) Let T be a linear operator on a vector space V, and let x be an eigenvector of T corresponding to the eigenvalue A. For any positive integer m, prove that x is an eigenvector of T m corresponding to the eigenvalue A7", (b) State and prove the analogous result for matrices. 16. (a) Prove that similar matrices have the same trace. Hint: Use Exercise 13 of Section 2.3. (b) How would you define the trace of a linear operator on a finitedimensional vector space? Justify that your definition is welldefined. 17. Let T be the linear operator on M n X n (i?) defined by T(A) = A1. (a) Show that ±1 are the only eigenvalues of T. (b) Describe the eigenvectors corresponding to each eigenvalue of T. (c) Find an ordered basis 0 for M2x2(Z?) such that [T]/3 is a diagonal matrix. (d) Find an ordered basis 0 for Mnx„(R) such that [T]p is a diagonal matrix for n > 2. 18. Let A,B e M„ x n (C). (a) Prove that if B is invertible, then there exists a scalar c G C such that A + cB is not invertible. Hint: Examine det(A + cB).

r 260 Chap. 5 Diagonalization (b) Find nonzero 2 x 2 matrices A and B such that both A and A + cB are invertible for all e G C. 19. * Let A and B be similar nxn matrices. Prove that there exists an ndimensional vector space V, a linear operator T on V, and ordered bases 0 and 7 for V such that A = [T]/3 and B = [T]7. Hint: Use Exercise 14 of Section 2.5. 20. Let A be an nxn matrix with characteristic polynomial + a n - i r " - 1 + • • • + a ^ + a0.

f(t) = (-l)ntn

Prove that /(0) = an = det(A). Deduce that A is invertible if and only if a 0 ^ 0. 21. Let A and f(t) be as in Exercise 20. (a) Prove t h a t / ( 0 = (An-t)(A22-t) ••• (Ann-t) + q(t), where q(t) is a polynomial of degree at most n — 2. Hint: Apply mathematical induction to n. (b) Show that tr(A) = ( - l ) n - 1 a n _ i . 22. t (a) Let T be a linear operator on a vector space V over the field F, and let g(t) be a polynomial with coefficients from F. Prove that if x is an eigenvector of T with corresponding eigenvalue A, then g(T)(x) = g(X)x. That is, x is an eigenvector of o(T) with corresponding eigenvalue g(X). (b) State and prove a comparable result for matrices. (c) Verify (b) for the matrix A in Exercise 3(a) with polynomial g(t) = 2t2 — t + 1, eigenvector x = I „ J, and corresponding eigenvalue A = 4. 23. Use Exercise 22 to prove that if f(t) is the characteristic polynomial of a diagonalizable linear operator T, then /(T) = To, the zero operator. (In Section 5.4 we prove that this result does not depend on the diagonalizability of T.) 24. Use Exercise 21(a) to prove Theorem 5.3. 25. Prove Corollaries 1 and 2 of Theorem 5.3. 26. Determine the number of distinct characteristic polynomials of matrices in M 2 x 2 (Z 2 ).

Sec. 5.2 Diagonalizability 5.2 DIAGONALIZABILITY

261

In Section 5.1, we presented the diagonalization problem and observed that not all linear operators or matrices are diagonalizable. Although we are able to diagonalize operators and matrices and even obtain a necessary and sufficient condition for diagonalizability (Theorem 5.1 p. 246), we have not yet solved the diagonalization problem. What is still needed is a simple test to determine whether an operator or a matrix can be diagonalized, as well as a method for actually finding a basis of eigenvectors. In this section, we develop such a test and method. In Example 6 of Section 5.1, we obtained a basis of eigenvectors by choosing one eigenvector corresponding to each eigenvalue. In general, such a procedure does not yield a basis, but the following theorem shows that any set constructed in this manner is linearly independent. Theorem 5.5. Let T be a linear operator on a vector space V, and let Ai, A 2 ,..., Xk be distinct eigenvalues of T. Ifvi,v2, • • • ,vk are eigenvectors of T such that Xi corresponds to Vi (I < i < k), then {vi,v2,... ,vk} is linearly independent. Proof. The proof is by mathematical induction on k. Suppose that k = 1. Then vi ^ 0 since v\ is an eigenvector, and hence {vi} is linearly independent. Now assume that the theorem holds for k - 1 distinct eigenvalues, where k — 1 > 1, and that we have k eigenvectors v\, v2,..., Vk corresponding to the distinct eigenvalues Ai, A 2 , . . . , Xk. We wish to show that {vi, v2,..., vk} is linearly independent. Suppose that a i , a 2 , . . . , a k are scalars such that aini+a2n2-l Yakvk = 0. (1)

Applying T - A^ I to both sides of (1), we obtain ai(Ai - Xk)vi +a 2 (A 2 - Afc)n2 + ••• +a fc _i(A fe _ 1 - Xk)vk-i By the induction hypothesis { f i , v 2 , . . . ,vk-i} hence = 0.

is linearly independent, and

ai(Xi - Afc) = a 2 (A 2 - Afc) = • • • = a fc _i(A fc -i - Afc) = 0. Since Ai, A 2 ,... , A^ are distinct, it follows that Aj - A^ ^ 0 for 1 < i < k — 1. So ai = a 2 = • • • = afc_i = 0, and (1) therefore reduces to akvk = 0. But vk ^ 0 and therefore ak = 0. Consequently ai = a 2 = • • • = ak = 0, and it follows that {vi,v2,... ,vk} is linearly independent. Corollary. Let T be a linear operator on an n-dimensional vector space V. If T has n distinct eigenvalues, then T is diagonalizable.

262

Chap. 5 Diagonalization

Proof. Suppose that T has n distinct eigenvalues Ai,...,A„. For each i choose an eigenvector Vi corresponding to A;. By Theorem 5.5, {v\,... ,vn} is linearly independent, and since dim(V) = n, this set is a basis for V. Thus, by Theorem 5.1 (p. 246), T is diagonalizable. 1 Example 1 Let A = 1 1 G 1 1 M2x2(R).

The characteristic polynomial of A (and hence of L^) is det (A - tl) = det \-t 1 1 1-rf = t(t - 2),

and thus the eigenvalues of L^ are 0 and 2. Since L^ is a linear operator on the two-dimensional vector space R2, we conclude from the preceding corollary that L.4 (and hence A) is diagonalizable. • The converse of Theorem 5.5 is false. That is, it is not true that if T is diagonalizable. then it has n distinct eigenvalues. For example, the identity operator is diagonalizable even though it has only one eigenvalue, namely, A = l. We have seen that diagonalizability requires the existence of eigenvalues. Actually, diagonalizability imposes a stronger condition on the characteristic polynomial. Definition. A polynomial f(t) in P(F) splits over F if there are scalars c , a i , . . . ,a„ (not necessarily distinct) in F such that f(t) = c(t-ai)(t-a2)---(t-an).

For example, t2 - 1 = (t + l)(t - 1) splits over R, but (t2 + l)(t - 2) does not split over R because t2 + 1 cannot be factored into a product of linear factors. However, (t2 + \)(t - 2) does split over C because it factors into the product (t + i)(t — i)(t — 2). If f(t) is the characteristic polynomial of a linear operator or a matrix over a field F , then the statement that f(t) splits is understood to mean that it splits over F. T h e o r e m 5.6. The characteristic polynomial of any diagonalizable linear operator splits. Proof. Let T be a diagonalizable linear operator on the n-dimensional vector space V. and let 0 be an ordered basis for V such that [T)p = D is a

Sec. 5.2 Diagonalizability diagonal matrix. Suppose that (Xx 0 D= \o o •••
A,,/

263

0 A2

0\ 0

and let f(t) be the characteristic polynomial of T. Then (Xx-t 0 f(t) = det(D - tl) = det V 0 0 ••• A»-t/ I 0 A2 - f ••• ••• 0 \ 0

= (Ai - t)(X2 - t) • • • (A„ - 1 ) = ( - 1 H * - Ai)(t - A 2 )... (t - An). From this theorem, it is clear that if T is a diagonalizable linear operator on an n-dimensional vector space that fails to have distinct eigenvalues, then the characteristic polynomial of T must have repeated zeros. The converse of Theorem 5.6 is false: that is, the character ist i < polynomial of T may split, but T need not be diagonalizable. (See Example 3. which follows.) The following concept helps us determine when an operator whose characteristic polynomial splits is diagonalizable. Definition. Let X be an eigenvalue of a linear operator or matrix with characteristic polynomial f(t). The (algebraic) multiplicity of X is the largest positive integer k for which (t - X)k is a factor of f(t). Example 2 Let

which has characteristic polynomial f(t) = — (t — 3)2(t — A). Hence A = 3 is an eigenvalue of A with multiplicity 2. and A = 4 is an eigenvalue of A with multiplicity 1. • If T is a diagonalizable linear operator on a finite-dimensional vector spate V, then there is an ordered basis 0 for V consisting of eigenvectors of T. We know from Theorem 5.1 (p. 246) that [T]/3 is a diagonal matrix in which the diagonal entries are the eigenvalues of T. Since the characteristic polynomial of T is detQT]/? — tl), it is easily seen that each eigenvalue of T must occur as a diagonal entry of [T]/3 exactly as many times as its multiplicity. Hence

264

Chap. 5 Diagonalization

0 contains as many (linearly independent) eigenvectors corresponding to an eigenvalue as the multiplicity of that eigenvalue. So the number of linearly independent eigenvectors corresponding to a given eigenvalue is of interest in determining whether an operator can be diagonalized. Recalling from Theorem 5.4 (p. 250) that the eigenvectors of T corresponding to the eigenvalue A are the nonzero vectors in the null space of T — AI, we are led naturally to the study of this set. Definition. Let T be a linear operator on a vector space V, and let X be an eigenvalue ofT. Define E\ = {x G V: T(x) = Ax} = N(T — Aly). The set E\ is called the eigenspace of T corresponding to the eigenvalue X. Analogously, we define the eigenspace of a square matrix A to be the eigenspace of LA. Clearly, E\ is a subspace of V consisting of the zero vector and the eigenvectors of T corresponding to the eigenvalue A. The maximum number of linearly independent eigenvectors of T corresponding to the eigenvalue A is therefore the dimension of EA- Our next result relates this dimension to the multiplicity of A. T h e o r e m 5.7. Let T be a linear operator on a finite-dimensional vector space V, and let X be an eigenvalue of T having multiplicity m. Then 1 < dim(EA) < m. Proof. Choose an ordered basis {v\. r2 vp} for EA. extend it to an ordered basis 0 = {vi, t)2, • • •, Up, vp+i,..., vn} for V, and let A = [T]g. Observe that V{ (1 < i < p) is an eigenvector of T corresponding to A, and therefore
A

-{()

C

By Exercise 21 of Section 4.3, the characteristic polynomial of T is f{t) = det(A - tin) = det ({X = det((A - t)Ip) det(C = (X-ty>g(t), J)l" tln-p) B C tln-p

where g(t) is a polynomial. Thus (A — t)p is a factor of f(t), and hence the multiplicity of A is at least p. But dim(EA) = p, and so dim(EA) < rn. Example 3 Let T be the linear operator on P2(R) defined by T(/(x)) = f'(x). The matrix representation of T with respect to the standard ordered basis 0 for

= I 2 3 l\ 2 . -t. there is no basis for P2(R) consisting of eigenvectors of T.Sec. the characteristic polynomial of T is /-/. and therefore dim(EA) = 1. Thus T has only one eigenvalue (A = 0) wit h mult iplicity 3. Solving T(/(. and hence the characteristic polynomial of T is /4 . • Example 4 Let T be the linear operator on R3 defined by A. Let > be the standard ordered basis for R3.r)) = f'(x) = 0 shows that EA = N(T . and therefore T is not diagonalizable.) [J}0 = 0 \0 1 0\ 0 2 0 0/ 265 Consequently. So {1} is a basis for EA.2 Diagonalizability P2(R.) is /.£/) = det 0 V 0 1 -/ 0 0N 2 = -t*. So the eigenvalues of T are Ai = 5 and A2 = 3 with multiplicities 1 and 2.t 0 1N 2 A-t -(t-5)(t-3)2. det([T]p . 5. A /4ai + a3\ a 2 I = I 2ai + 3a 2 + 2o3 . \". Then /4 0 [T].1 det([T]/3 . . Consequently. respectively.AI) = N(T) is the subspace of P2(R) consisting of the constant polynomials..tl) = (let [ 2 \ I 0 3.i/ \ ai + Au->) T We determine the eigenspace of T corresponding to each eigenvalue.

It is easily seen (using the techniques of Chapter 3) that is a basis for EA. EA2 = N(T — A2I) is the solution space of the system x\ + x 3 = 0 2xi + 2x3 = 0 x\ + x 3 = 0. X2 = s. and dim(EA2) = 2. T is diagonalizable. namely. the multiplicity of each eigenvalue Ai is equal to the dimension of the corresponding eigenspace EA^ . • Hence dim(EA. we assign it a parametric value.) = 1. The result is the general solution to the system 0 . is linearly independent and hence is a basis for R3 consisting of eigenvectors of T. introducing another parameter t.x3 = 0. Observe that the union of the two bases just derived. 5 Diagonalization EAX is the solution space of the system of linear equations -xi + x3 = 0 2xi . and solve the system for xi and x3.266 Chap. say. t G R. Consequently. Similarly. Since the unknown x 2 does not appear in this system. • .2x 2 + 2x 3 = 0 xi . for s. It follows that is a basis for EA2. In this case.

suppose that. .. be a finite linearly independent subset of the eigenspace EA. and hence a^ — 0 for all j. and let Ai.. • Then S = Si U S2 u • • • U Sk is a linearly independent subset ofW. ^ 0 for 1 < i < m."s are linearly independent. A/. as we now show. Vi is an eigenvector of T corresponding to A. Therefore. let S... We conclude that S is lineally independent.... Wi — 0 for all i. for each i < m. rVk = 0..2.A. Bui each .. and W\ + • • • + wk = 0. Let T be a linear operator on a vector space V.5.. for 1 < m < k.Sec.scalars {a^} such that ^^a-ijVij i=i j=i For each i. For each i = 1. T h e o r e m 5.. by the lemma.. = 0. be distinct eigenvalues ot'T. Suppose that for each i Si = {viUVi2. 5. Consider an\.8. let Wi = ^ajjVij. that Vi = 0 for all i. This is indeed true. and v\ + v2 H h vm = 0. is linearly independent. Proof. By renumbering if necessary.. and let X\. Lemma. We begin with the following lemma.. which states that these r.. Then. for each /'. Proof.2. We conclude. Let T be a linear operator.vini}.k. But this contradicts Theorem 5. Suppose otherwise. For each i = 1. A 2 . we have v.. the eigenspace corresponding to A... and Vi = 0 for / > m. //' Vi+v2-\ then V{ = 0 for all i. which is a slight variation of Theorem 5. Then S = [vy : 1 < j < /... .A A : be distinct eigenvalues of T. therefore. and 1 < i < A:}. let Vi G EAJ.5. Then Wi G EA.S'.A2 .2 Diagonalizability 267 Examples 3 and 1 suggest that an operator whose characteristic polynomial splits is diagonalizable if and only if the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue.

. Then n.. mi 'I 1=1 2=1 It follows that ^ ( m . For each /.9.8. The n^'s sum to n because 0 contains n vectors. suj)pose that nij = d. We simultaneously show that T is diagonalizable and prove (b). suppose that T is diagonalizable.) for all i. and let 0 = 0\ \J02U • • • U0k. denote the multiplicity of A.. etc.. (b) If T is diagonaliy.. namely. = m. we conclude that ntj = dt. then the vectors in /52 (in the same order as in ii->). .d t ) = 0. is an ordered basis for EA. let 0r = 0 n E\t. The mVs also sum to n because the degree of the characteristic polynomial of T is equal to the sum of the multiplicities of the eigenvalues.268 Chap. By Theorem 5. Let Ai.nhie and I. Then (a) T is diagonalizable if and only if the multiplicity of Xi is equal to dim(EA. i — /_. Xk be the distinct eigenvalues ofT. let /. Thus k k k n n = 2__. i=i Since (m. for all i. Con\crsely. Let 0 be a basis for V consisting of eigenvectors of T. is a linearly independent subset of a subspace of dimension (/. for all i. 0 is linearly independent. T h e o r e m 5.-the vectors in 0\ are listed first (in the same order as in f3\).7. The next theorem tells us when the resulting set is a basis for the entire space. < <l. 5 Diagonalization Theorem 5. and let n^ denote the number of vectors in ). Proof. and n = dim(V).for all i.). by collecting bases for the individual eigeuspaces. Furthermore..8 tells us how to construct a linearly independent subset of eigenvectors. and di < nrii by Theorem 5. First. = n - . d. — di) > 0 for all i. the set of vectors in 0 that are eigenvectors corresponding to At. For each i. A 2 . let 0i be an ordered basis for E A < .. For each /. di < 2 . for each / because 1. for each i... 0 contains = n i=l i=l 2 We regard 0\ U 02 U • • • l)0k as an ordered basis in the natural way.. since d. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits.. then 0 = 0i U02 U • • • U0k is an ordered basis2 for V consisting of eigenvectors ofT. = dim(EA.

When testing T for diagonalizability. (By Theorem 5. Furthermore. and I) has as its j t h diagonal entry the eigenvalue of A corresponding to the jth column of Q. The union of these bases is a basis *) for F" consisting of eigenvectors of B. 02. we can use the corollary to Theorem 2.ra\\k(B — Xf). The matrix Q has as its columns the vectors in a basis of eigenvectors of A. 1. For each eigenvalue A ofT.2 Diagonalizability 269 vectors.. Therefore 3 is an ordered basis for V consisting of eigenvectors of V. If T is a diagonalizable operator and :j\. Test for Diagonalization Let T be a linear operator on an n-dimensional vector space V. We summarize our results. is diagonalizable. condition 2 is automatically satisfied for eigenvalues of multiplicity 1.rank(T — AI). We now consider some examples illustrating the preceding ideas. 0k are ordered bases for the eigenspaces of T. The characteristic polynomial ofT splits. Example 5 We test t he matrix G M 3x :i(/r) for diagonalizability. This theorem completes our study of the diagonalization problem. If T is diagonalizable and a basis 8 for V consisting of eigenvectors o f T is desired. 5. 2. then we first find a basis for each eigenspace of B. and hence [T]@ is a diagonal matrix. then the union 0 = 0iU02U-• -U0k is an ordered basis for V consisting of eigenvectors of T. and hence T.. If the characteristic polynomial of B splits. if A is an n x n diagonalizable matrix.) If so. 115) to find an invertible n x n matrix Q and a diagonal n x n matrix D such that Q~l AQ = D. then use condition 2 above to check if the multiplicity of each repeated eigenvalue of B equals n . the multiplicity of A equals n .. These same conditions can be used to test if a square matrix A is diagonalizable because diagonalizability of A is equivalent to diagonalizability of the operator L^. . Each vector in 7 is the coordinate vector relative to a of an eigenvector of T.Sec.. The set consisting of these n eigenvectors of T is the desired basis >.7. Then T is diagonalizable if and only if both of the following conditions hold. and we conclude that T is diagonalizable.23 (p. it is usually easiest to choose a convenient basis Q for V and work with B = [T] a . then B.

Also A has eigenvalues Ai = 4 and A2 = 3 with multiplicities 1 and 2. The eigenspace corresponding to Ai = 1 is G R' 0 1 1\ (xi 0 0 0 u 0 1 1/ V'':i Ex. = = 0 . which splits.rank which is equal to the multiplicity of A|. So we need only verify condition 2 for Ai = 1 . 5 Diagonalization The characteristic polynomial of A is d e t ( A . Thus we need only test condition 2 for A2. and A is therefore not diagonalizable. I) = 3 . which splits. and so condition 1 of the test for diagonalization is satisfied. Since A| has multiplicity 1.rank(£ . condition 2 is satisfied for X\. and hence of T. We first test T for diagonalizability.t J ) = -(t — A)(t-3)2. Example 6 Let T be the linear operator on P2(R) defined by T(/(x)) = / ( l ) + f'(0)x + (/'(()) + /"(0))x 2 . Also B has the eigenvalues Ai = 1 and A2 = 2 with multiplicities 2 and 1. Condition 2 is satisfied for A2 because it has multiplicity 1. Therefore T is diagonalizable. respectively. is (/ I )2{l 2).270 Chap. we see that 3 — rank(A — A27) = 1. Then B = /l 0 \0 1 l\ 1 0 . We consider each eigenvalue separately. For this case. 3 . Hence condition 1 of the test for diagonalization is satisfied.X2I = I 0 0 0 I \° ° 7 has rank 2. which is not the multiplicity of A2. respectively. We now find an ordered basis 7 for R3 of eigenvectors of B. Thus condition 2 fails for A2. Because /() 1 0\ A .A. 1 l) The characterist ic polynomial of li. Let a denote the standard ordered basis for P2(R) and B = [T)n.

We now find an ordered basis γ for R³ consisting of eigenvectors of B. We consider each eigenvalue separately.

The eigenspace corresponding to λ₁ = 1 is the solution space of the system

x₂ + x₃ = 0,

and has γ₁ = {(1, 0, 0), (0, −1, 1)} as a basis. The eigenspace corresponding to λ₂ = 2 is the solution space of the system

−x₁ + x₂ + x₃ = 0,  x₂ = 0,

and has γ₂ = {(1, 0, 1)} as a basis.

Let γ = γ₁ ∪ γ₂. Then γ is an ordered basis for R³ consisting of eigenvectors of B.

Finally, observe that the vectors in γ are the coordinate vectors relative to α of the vectors in the set

β = {1, −x + x², 1 + x²},

which is an ordered basis for P₂(R) consisting of eigenvectors of T. Thus

[T]_β = [1 0 0; 0 1 0; 0 0 2].
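The rank condition in the test for diagonalization is easy to check numerically. The following sketch is not part of the text; it assumes NumPy is available, groups floating-point eigenvalues by a tolerance, and applies the test to the matrices of Examples 5 and 6.

```python
import numpy as np

def passes_diagonalizability_test(B, tol=1e-8):
    """Check condition 2 of the test: for each eigenvalue lam,
    multiplicity(lam) == n - rank(B - lam*I).  (Over C the characteristic
    polynomial always splits, so condition 1 holds automatically here.)"""
    n = B.shape[0]
    eigvals = np.linalg.eigvals(B)
    used = np.zeros(len(eigvals), dtype=bool)
    for i, lam in enumerate(eigvals):
        if used[i]:
            continue
        close = np.isclose(eigvals, lam, atol=1e-6)   # group equal eigenvalues
        used |= close
        multiplicity = int(close.sum())
        if multiplicity != n - np.linalg.matrix_rank(B - lam * np.eye(n), tol=tol):
            return False
    return True

A = np.array([[3., 1., 0.], [0., 3., 0.], [0., 0., 4.]])   # Example 5
B = np.array([[1., 1., 1.], [0., 1., 0.], [0., 1., 2.]])   # Example 6
print(passes_diagonalizability_test(A))   # False: A is not diagonalizable
print(passes_diagonalizability_test(B))   # True:  T of Example 6 is diagonalizable
```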

Our next example is an application of diagonalization that is of interest in Section 5.3.

Example 7. Let

A = [0 −2; 1 3].

We show that A is diagonalizable and find a 2 × 2 matrix Q such that Q⁻¹AQ is a diagonal matrix. We then show how to use this result to compute Aⁿ for any positive integer n.

First observe that the characteristic polynomial of A is (t − 1)(t − 2), and hence A has two distinct eigenvalues, λ₁ = 1 and λ₂ = 2. By applying the corollary to Theorem 5.5 to the operator L_A, we see that A is diagonalizable. Moreover,

γ₁ = {(−2, 1)} and γ₂ = {(−1, 1)}

are bases for the eigenspaces E_{λ₁} and E_{λ₂}, respectively. Therefore

γ = γ₁ ∪ γ₂ = {(−2, 1), (−1, 1)}

is an ordered basis for R² consisting of eigenvectors of A. Let

Q = [−2 −1; 1 1],

the matrix whose columns are the vectors in γ. Then, by the corollary to Theorem 2.23 (p. 115),

D = Q⁻¹AQ = [L_A]_γ = [1 0; 0 2].

To find Aⁿ for any positive integer n, observe that A = QDQ⁻¹. Therefore

Aⁿ = (QDQ⁻¹)ⁿ = (QDQ⁻¹)(QDQ⁻¹)···(QDQ⁻¹) = QDⁿQ⁻¹
   = Q [1 0; 0 2ⁿ] Q⁻¹
   = [−2 −1; 1 1][1 0; 0 2ⁿ][−1 −1; 1 2]
   = [2 − 2ⁿ, 2 − 2ⁿ⁺¹; −1 + 2ⁿ, −1 + 2ⁿ⁺¹].
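The closed form for Aⁿ obtained above can be checked numerically. The sketch below is illustrative only; it assumes NumPy and compares the explicit formula and QDⁿQ⁻¹ with a directly computed matrix power.

```python
import numpy as np

A = np.array([[0., -2.], [1., 3.]])
Q = np.array([[-2., -1.], [1., 1.]])        # columns: eigenvectors for 1 and 2
D = np.diag([1., 2.])

n = 6
closed_form = np.array([[2 - 2**n, 2 - 2**(n + 1)],
                        [-1 + 2**n, -1 + 2**(n + 1)]], dtype=float)
via_diagonalization = Q @ np.linalg.matrix_power(D, n) @ np.linalg.inv(Q)

print(np.allclose(np.linalg.matrix_power(A, n), closed_form))   # True
print(np.allclose(via_diagonalization, closed_form))            # True
```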

We now consider an application that uses diagonalization to solve a system of differential equations.

Systems of Differential Equations

Consider the system of differential equations

x₁′ = 3x₁ + x₂ + x₃
x₂′ = 2x₁ + 4x₂ + 2x₃
x₃′ = −x₁ − x₂ + x₃,

where, for each i, xᵢ = xᵢ(t) is a differentiable real-valued function of the real variable t. Clearly, this system has a solution, namely, the solution in which each xᵢ(t) is the zero function. We determine all of the solutions to this system.

Let x: R → R³ be the function defined by x(t) = (x₁(t), x₂(t), x₃(t)). The derivative of x, denoted x′, is defined by x′(t) = (x₁′(t), x₂′(t), x₃′(t)). Let

A = [3 1 1; 2 4 2; −1 −1 1]

be the coefficient matrix of the given system, so that we can rewrite the system as the matrix equation x′ = Ax.

It can be verified that for

Q = [−1 0 −1; 0 −1 −2; 1 1 1]  and  D = [2 0 0; 0 2 0; 0 0 4]

we have Q⁻¹AQ = D. Substitute A = QDQ⁻¹ into x′ = Ax to obtain x′ = QDQ⁻¹x or, equivalently, Q⁻¹x′ = DQ⁻¹x. The function y: R → R³ defined by y(t) = Q⁻¹x(t) can be shown to be differentiable, and y′ = Q⁻¹x′ (see Exercise 16). Hence the original system can be written as y′ = Dy.

Since D is a diagonal matrix, the system y′ = Dy is easy to solve. Setting y(t) = (y₁(t), y₂(t), y₃(t)), we can rewrite y′ = Dy as the three equations

y₁′ = 2y₁,  y₂′ = 2y₂,  y₃′ = 4y₃,

which are independent of each other and thus can be solved individually. It is easily seen (as in Example 3 of Section 5.1) that the general solution to these equations is y₁(t) = c₁e^{2t}, y₂(t) = c₂e^{2t}, and y₃(t) = c₃e^{4t}, where c₁, c₂, and c₃ are arbitrary constants. Finally,

x(t) = Qy(t) = [−1 0 −1; 0 −1 −2; 1 1 1](c₁e^{2t}, c₂e^{2t}, c₃e^{4t})
     = (−c₁e^{2t} − c₃e^{4t}, −c₂e^{2t} − 2c₃e^{4t}, c₁e^{2t} + c₂e^{2t} + c₃e^{4t})

yields the general solution of the original system. Note that this solution can be written as

x(t) = e^{2t}z₁ + e^{4t}z₂,

where z₁ = c₁(−1, 0, 1) + c₂(0, −1, 1) is an arbitrary vector in E_{λ₁} and z₂ = c₃(−1, −2, 1) is an arbitrary vector in E_{λ₂}, the eigenspaces corresponding to λ₁ = 2 and λ₂ = 4, respectively. This result is generalized in Exercise 15.
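The claim that x(t) = Qy(t) solves x′ = Ax can also be verified numerically. The following sketch is not from the text; it assumes NumPy, uses the Q and eigenvalues given above, and checks that the analytic derivative of x agrees with Ax at a few sample times.

```python
import numpy as np

A = np.array([[3., 1., 1.], [2., 4., 2.], [-1., -1., 1.]])
Q = np.array([[-1., 0., -1.], [0., -1., -2.], [1., 1., 1.]])  # eigenvector columns
lams = np.array([2., 2., 4.])
c = np.array([1.5, -0.5, 2.0])          # arbitrary constants c1, c2, c3

def x(t):
    return Q @ (c * np.exp(lams * t))   # x(t) = Q (c1 e^{2t}, c2 e^{2t}, c3 e^{4t})

def x_prime(t):
    return Q @ (c * lams * np.exp(lams * t))

for t in (0.0, 0.3, 1.0):
    assert np.allclose(x_prime(t), A @ x(t))   # x'(t) = A x(t)
print("general solution verified at sample times")
```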

Direct Sums*

Let T be a linear operator on a finite-dimensional vector space V. There is a way of decomposing V into simpler subspaces that offers insight into the behavior of T. In the case of diagonalizable operators, the simpler subspaces are the eigenspaces of the operator. This approach is especially useful in Chapter 7, where we study nondiagonalizable linear operators. The definition of direct sum that follows is a generalization of the definition given in the exercises of Section 1.3.

Definition. Let W₁, W₂, ..., W_k be subspaces of a vector space V. We define the sum of these subspaces to be the set

{v₁ + v₂ + ··· + v_k : vᵢ ∈ Wᵢ for 1 ≤ i ≤ k},

which we denote by W₁ + W₂ + ··· + W_k or Σᵢ₌₁ᵏ Wᵢ.

It is a simple exercise to show that the sum of subspaces of a vector space is also a subspace.

Example 8. Let V = R³, let W₁ denote the xy-plane, and let W₂ denote the yz-plane. Then R³ = W₁ + W₂ because, for any vector (a, b, c) ∈ R³, we have

(a, b, c) = (a, b, 0) + (0, 0, c),

where (a, b, 0) ∈ W₁ and (0, 0, c) ∈ W₂.

Notice that in Example 8 the representation of (a, b, c) as a sum of vectors in W₁ and W₂ is not unique. For example, (a, b, c) = (a, 0, 0) + (0, b, c) is another representation. Because we are often interested in sums for which representations are unique, we introduce a condition that assures this outcome.

Definition. Let W₁, W₂, ..., W_k be subspaces of a vector space V. We call V the direct sum of the subspaces W₁, W₂, ..., W_k and write V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k if

V = Σᵢ₌₁ᵏ Wᵢ  and  Wⱼ ∩ Σ_{i≠j} Wᵢ = {0} for each j (1 ≤ j ≤ k).

Example 9. Let V = R⁴, W₁ = {(a, b, 0, 0) : a, b ∈ R}, W₂ = {(0, 0, c, 0) : c ∈ R}, and W₃ = {(0, 0, 0, d) : d ∈ R}. For any (a, b, c, d) ∈ V,

(a, b, c, d) = (a, b, 0, 0) + (0, 0, c, 0) + (0, 0, 0, d) ∈ W₁ + W₂ + W₃.

Thus V = Σᵢ₌₁³ Wᵢ. To show that V is the direct sum of W₁, W₂, and W₃, we must prove that W₁ ∩ (W₂ + W₃) = W₂ ∩ (W₁ + W₃) = W₃ ∩ (W₁ + W₂) = {0}. But these equalities are obvious, and so V = W₁ ⊕ W₂ ⊕ W₃.

Our next result contains several conditions that are equivalent to the definition of a direct sum.

Theorem 5.10. Let W₁, W₂, ..., W_k be subspaces of a finite-dimensional vector space V. The following conditions are equivalent.
(a) V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k.
(b) V = Σᵢ₌₁ᵏ Wᵢ and, for any vectors v₁, v₂, ..., v_k such that vᵢ ∈ Wᵢ (1 ≤ i ≤ k), if v₁ + v₂ + ··· + v_k = 0, then vᵢ = 0 for all i.
(c) Each vector v ∈ V can be uniquely written as v = v₁ + v₂ + ··· + v_k, where vᵢ ∈ Wᵢ.
(d) If γᵢ is an ordered basis for Wᵢ (1 ≤ i ≤ k), then γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is an ordered basis for V.
(e) For each i = 1, 2, ..., k, there exists an ordered basis γᵢ for Wᵢ such that γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is an ordered basis for V.

Proof. Assume (a). We prove (b). Clearly V = Σᵢ Wᵢ. Now suppose that v₁, v₂, ..., v_k are vectors such that vᵢ ∈ Wᵢ for all i and v₁ + v₂ + ··· + v_k = 0. Then for any j,

−vⱼ = Σ_{i≠j} vᵢ ∈ Σ_{i≠j} Wᵢ.

But −vⱼ ∈ Wⱼ, and hence −vⱼ ∈ Wⱼ ∩ Σ_{i≠j} Wᵢ = {0}. So vⱼ = 0, proving (b).

Now assume (b). We prove (c). Let v ∈ V. By (b), there exist vectors v₁, v₂, ..., v_k such that vᵢ ∈ Wᵢ and v = v₁ + v₂ + ··· + v_k. We must show that this representation is unique. Suppose also that v = w₁ + w₂ + ··· + w_k, where wᵢ ∈ Wᵢ for all i. Then

(v₁ − w₁) + (v₂ − w₂) + ··· + (v_k − w_k) = 0.

But vᵢ − wᵢ ∈ Wᵢ for all i, and therefore vᵢ − wᵢ = 0 for all i by (b), proving the uniqueness of the representation.

Now assume (c). We prove (d). For each i, let γᵢ be an ordered basis for Wᵢ. Since V = Σᵢ Wᵢ by (c), it follows that γ₁ ∪ γ₂ ∪ ··· ∪ γ_k generates V. To show that this set is linearly independent, consider vectors vᵢⱼ ∈ γᵢ (j = 1, 2, ..., mᵢ and i = 1, 2, ..., k) and scalars aᵢⱼ such that

Σᵢ Σⱼ aᵢⱼ vᵢⱼ = 0.

For each i, set wᵢ = Σⱼ aᵢⱼ vᵢⱼ. Then, for each i, wᵢ ∈ span(γᵢ) = Wᵢ and w₁ + w₂ + ··· + w_k = 0. Since 0 ∈ Wᵢ for each i and 0 + 0 + ··· + 0 = 0, (c) implies that wᵢ = 0 for all i. But each γᵢ is linearly independent, and hence aᵢⱼ = 0 for all i and j. Consequently γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is linearly independent and therefore is a basis for V.

Clearly (e) follows immediately from (d).

Finally, we assume (e) and prove (a). For each i, let γᵢ be an ordered basis for Wᵢ such that

the union γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is a basis for V. Then

V = span(γ₁ ∪ γ₂ ∪ ··· ∪ γ_k) = span(γ₁) + span(γ₂) + ··· + span(γ_k) = Σᵢ₌₁ᵏ Wᵢ

by repeated applications of Exercise 14 of Section 1.4. Fix j (1 ≤ j ≤ k), and suppose that, for some nonzero vector v ∈ V,

v ∈ Wⱼ ∩ Σ_{i≠j} Wᵢ.

Then v ∈ Wⱼ = span(γⱼ) and v ∈ Σ_{i≠j} Wᵢ = span(∪_{i≠j} γᵢ), so that v is a nontrivial linear combination of both γⱼ and ∪_{i≠j} γᵢ. Hence v can be expressed as a linear combination of γ₁ ∪ γ₂ ∪ ··· ∪ γ_k in more than one way. But these representations contradict Theorem 1.8 (p. 43), and so we conclude that

Wⱼ ∩ Σ_{i≠j} Wᵢ = {0},

proving (a).

With the aid of Theorem 5.10, we are able to characterize diagonalizability in terms of direct sums.

Theorem 5.11. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if V is the direct sum of the eigenspaces of T.

Proof. Let λ₁, λ₂, ..., λ_k be the distinct eigenvalues of T.

First suppose that T is diagonalizable, and for each i choose an ordered basis γᵢ for the eigenspace E_{λᵢ}. By Theorem 5.9, γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is a basis for V, and hence V is a direct sum of the E_{λᵢ}'s by Theorem 5.10.

Conversely, suppose that V is a direct sum of the eigenspaces of T. For each i, choose an ordered basis γᵢ of E_{λᵢ}. By Theorem 5.10, the union γ₁ ∪ γ₂ ∪ ··· ∪ γ_k is a basis for V. Since this basis consists of eigenvectors of T, we conclude that T is diagonalizable.
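Theorem 5.11 can be illustrated numerically. The sketch below is not part of the text; it assumes NumPy, uses the matrix of the operator in Example 10 below, and checks that the eigenspace dimensions add up to dim(V) and that bases of the eigenspaces combine into a basis of R⁴.

```python
import numpy as np

# The operator of Example 10, T(a, b, c, d) = (a, b, 2c, 3d), as a matrix.
A = np.diag([1., 1., 2., 3.])
n = A.shape[0]

eigenvalues = [1., 2., 3.]
dims = [n - np.linalg.matrix_rank(A - lam * np.eye(n)) for lam in eigenvalues]
print(dims, sum(dims))            # [2, 1, 1] 4: eigenspace dimensions add up to dim(V)

# Bases of the eigenspaces (read off by hand for this diagonal matrix),
# stacked side by side, have rank 4, so their union is a basis for R^4.
basis = np.column_stack([np.eye(4)[:, [0, 1]],   # basis of E_1
                         np.eye(4)[:, [2]],      # basis of E_2
                         np.eye(4)[:, [3]]])     # basis of E_3
print(np.linalg.matrix_rank(basis))              # 4
```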

Example 10. Let T be the linear operator on R⁴ defined by

T(a, b, c, d) = (a, b, 2c, 3d).

It is easily seen that T is diagonalizable with eigenvalues λ₁ = 1, λ₂ = 2, and λ₃ = 3. Furthermore, the corresponding eigenspaces coincide with the subspaces W₁, W₂, and W₃ of Example 9. Thus Theorem 5.11 provides us with another proof that R⁴ = W₁ ⊕ W₂ ⊕ W₃.

EXERCISES

1. Label the following statements as true or false.
(a) Any linear operator on an n-dimensional vector space that has fewer than n distinct eigenvalues is not diagonalizable.
(b) Two distinct eigenvectors corresponding to the same eigenvalue are always linearly dependent.
(c) If λ is an eigenvalue of a linear operator T, then each vector in E_λ is an eigenvector of T.
(d) If λ₁ and λ₂ are distinct eigenvalues of a linear operator T, then E_{λ₁} ∩ E_{λ₂} = {0}.
(e) Let A ∈ M_{n×n}(F) and β = {v₁, v₂, ..., v_n} be an ordered basis for F^n consisting of eigenvectors of A. If Q is the n × n matrix whose jth column is vⱼ (1 ≤ j ≤ n), then Q⁻¹AQ is a diagonal matrix.
(f) A linear operator T on a finite-dimensional vector space is diagonalizable if and only if the multiplicity of each eigenvalue λ equals the dimension of E_λ.
(g) Every diagonalizable linear operator on a nonzero vector space has at least one eigenvalue.
The following two items relate to the optional subsection on direct sums.
(h) If a vector space is the direct sum of subspaces W₁, W₂, ..., W_k, then Wᵢ ∩ Wⱼ = {0} for i ≠ j.
(i) If V = Σᵢ₌₁ᵏ Wᵢ and Wᵢ ∩ Wⱼ = {0} for i ≠ j, then V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k.

2. For each of the following matrices A ∈ M_{n×n}(R), test A for diagonalizability, and if A is diagonalizable, find an invertible matrix Q and a diagonal matrix D such that Q⁻¹AQ = D.
(a) [1 2; 0 1]   (b) [1 3; 3 1]   (c) [1 4; 3 2]

3. For each of the following linear operators T on a vector space V, test T for diagonalizability, and if T is diagonalizable, find a basis β for V such that [T]_β is a diagonal matrix.
(a) V = P₃(R) and T is defined by T(f(x)) = f′(x) + f″(x).
(b) V = P₂(R) and T is defined by T(ax² + bx + c) = cx² + bx + a.
(c) V = R³ and T is defined by …
(d) V = P₂(R) and T is defined by T(f(x)) = f(0) + f(1)(x + x²).
(e) V = C² and T is defined by T(z, w) = (z + iw, iz + w).
(f) V = M_{2×2}(R) and T is defined by T(A) = Aᵗ.

4. Prove the matrix version of the corollary to Theorem 5.5: If A ∈ M_{n×n}(F) has n distinct eigenvalues, then A is diagonalizable.

5. State and prove the matrix version of Theorem 5.6.

6. Suppose that A ∈ M_{n×n}(F) has two distinct eigenvalues, λ₁ and λ₂, and that dim(E_{λ₁}) = n − 1. Prove that A is diagonalizable.

7. Justify the test for diagonalizability and the method for diagonalization stated in this section.

8. Let T be a linear operator on a finite-dimensional vector space V, and suppose there exists an ordered basis β for V such that [T]_β is an upper triangular matrix.
(a) Prove that the characteristic polynomial for T splits.
(b) State and prove an analogous result for matrices.
The converse of (a) is treated in Exercise 32 of Section 5.4.

9. For A = …, find an expression for Aⁿ, where n is an arbitrary positive integer.

10. Let T be a linear operator on a finite-dimensional vector space V with the distinct eigenvalues λ₁, λ₂, ..., λ_k and corresponding multiplicities m₁, m₂, ..., m_k. Suppose that β is a basis for V such that [T]_β is an upper triangular matrix. Prove that the diagonal entries of [T]_β are λ₁, λ₂, ..., λ_k and that each λᵢ occurs mᵢ times (1 ≤ i ≤ k).

11. Let A be an n × n matrix that is similar to an upper triangular matrix and has the distinct eigenvalues λ₁, λ₂, ..., λ_k with corresponding multiplicities m₁, m₂, ..., m_k. Prove the following statements.
(a) tr(A) = Σᵢ₌₁ᵏ mᵢλᵢ
(b) det(A) = (λ₁)^{m₁}(λ₂)^{m₂}···(λ_k)^{m_k}

12. Let T be an invertible linear operator on a finite-dimensional vector space V.
(a) Recall that for any eigenvalue λ of T, λ⁻¹ is an eigenvalue of T⁻¹ (Exercise 8 of Section 5.1). Prove that the eigenspace of T corresponding to λ is the same as the eigenspace of T⁻¹ corresponding to λ⁻¹.
(b) Prove that if T is diagonalizable, then T⁻¹ is diagonalizable.

13. Let A ∈ M_{n×n}(F). Recall from Exercise 14 of Section 5.1 that A and Aᵗ have the same characteristic polynomial and hence share the same eigenvalues with the same multiplicities. For any eigenvalue λ of A and Aᵗ, let E_λ and E′_λ denote the corresponding eigenspaces for A and Aᵗ, respectively.
(a) Show by way of example that for a given common eigenvalue, these two eigenspaces need not be the same.
(b) Prove that for any eigenvalue λ, dim(E_λ) = dim(E′_λ).
(c) Prove that if A is diagonalizable, then Aᵗ is also diagonalizable.

14. Find the general solution to each system of differential equations.
(a) x′ = x + y, y′ = 3x − y
(b) x₁′ = 8x₁ + 10x₂, x₂′ = −5x₁ − 7x₂
(c) x₁′ = x₁, x₂′ = x₂ + x₃, x₃′ = 2x₃

15. Let

A = [a₁₁ a₁₂ ··· a₁ₙ; a₂₁ a₂₂ ··· a₂ₙ; … ; aₙ₁ aₙ₂ ··· aₙₙ]

be the coefficient matrix of the system of differential equations

x₁′ = a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ
x₂′ = a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ
⋮
xₙ′ = aₙ₁x₁ + aₙ₂x₂ + ··· + aₙₙxₙ.

Suppose that A is diagonalizable and that the distinct eigenvalues of A are λ₁, λ₂, ..., λ_k. Prove that a differentiable function x: R → Rⁿ is a solution to the system if and only if x is of the form

x(t) = e^{λ₁t}z₁ + e^{λ₂t}z₂ + ··· + e^{λ_k t}z_k,

where zᵢ ∈ E_{λᵢ} for i = 1, 2, ..., k. Use this result to prove that the set of solutions to the system is an n-dimensional real vector space.

16. Let C ∈ M_{m×n}(R), and let Y be an n × p matrix of differentiable functions. Prove (CY)′ = CY′, where (Y′)ᵢⱼ = (Yᵢⱼ)′ for all i, j.

Exercises 17 through 19 are concerned with simultaneous diagonalization.

Definitions. Two linear operators T and U on a finite-dimensional vector space V are called simultaneously diagonalizable if there exists an ordered basis β for V such that both [T]_β and [U]_β are diagonal matrices. Similarly, A, B ∈ M_{n×n}(F) are called simultaneously diagonalizable if there exists an invertible matrix Q ∈ M_{n×n}(F) such that both Q⁻¹AQ and Q⁻¹BQ are diagonal matrices.

17. (a) Prove that if T and U are simultaneously diagonalizable linear operators on a finite-dimensional vector space V, then the matrices [T]_β and [U]_β are simultaneously diagonalizable for any ordered basis β.
(b) Prove that if A and B are simultaneously diagonalizable matrices, then L_A and L_B are simultaneously diagonalizable linear operators.

18. (a) Prove that if T and U are simultaneously diagonalizable operators, then T and U commute (i.e., TU = UT).
(b) Show that if A and B are simultaneously diagonalizable matrices, then A and B commute.
The converses of (a) and (b) are established in Exercise 25 of Section 5.4.

19. Let T be a diagonalizable linear operator on a finite-dimensional vector space, and let m be any positive integer. Prove that T and Tᵐ are simultaneously diagonalizable.

Exercises 20 through 23 are concerned with direct sums.

20. Let W₁, W₂, ..., W_k be subspaces of a finite-dimensional vector space V such that Σᵢ₌₁ᵏ Wᵢ = V. Prove that V is the direct sum of W₁, W₂, ..., W_k if and only if

dim(V) = Σᵢ₌₁ᵏ dim(Wᵢ).

21. Let V be a finite-dimensional vector space with a basis β, and let β₁, β₂, ..., β_k be a partition of β (i.e., β₁, β₂, ..., β_k are subsets of β such that β = β₁ ∪ β₂ ∪ ··· ∪ β_k and βᵢ ∩ βⱼ = ∅ if i ≠ j). Prove that V = span(β₁) ⊕ span(β₂) ⊕ ··· ⊕ span(β_k).

22. Let T be a linear operator on a finite-dimensional vector space V, and suppose that the distinct eigenvalues of T are λ₁, λ₂, ..., λ_k. Prove that

span({x ∈ V : x is an eigenvector of T}) = E_{λ₁} ⊕ E_{λ₂} ⊕ ··· ⊕ E_{λ_k}.

23. Let W₁, W₂, K₁, K₂, ..., K_p, M₁, M₂, ..., M_q be subspaces of a vector space V such that W₁ = K₁ ⊕ K₂ ⊕ ··· ⊕ K_p and W₂ = M₁ ⊕ M₂ ⊕ ··· ⊕ M_q. Prove that if W₁ ∩ W₂ = {0}, then

W₁ + W₂ = W₁ ⊕ W₂ = K₁ ⊕ K₂ ⊕ ··· ⊕ K_p ⊕ M₁ ⊕ M₂ ⊕ ··· ⊕ M_q.

5.3* MATRIX LIMITS AND MARKOV CHAINS

In this section, we apply what we have learned thus far in Chapter 5 to study the limit of a sequence of powers A, A², ..., A^m, ..., where A is a square matrix with complex entries. Such sequences and their limits have practical applications in the natural and social sciences.

We assume familiarity with limits of sequences of real numbers. The limit of a sequence of complex numbers {z_m : m = 1, 2, ...} can be defined in terms of the limits of the sequences of the real and imaginary parts: If z_m = r_m + i s_m, where r_m and s_m are real numbers and i is the imaginary number such that i² = −1, then

lim_{m→∞} z_m = lim_{m→∞} r_m + i lim_{m→∞} s_m,

provided that lim_{m→∞} r_m and lim_{m→∞} s_m exist.

Definition. Let L, A₁, A₂, ... be n × p matrices having complex entries. The sequence A₁, A₂, ... is said to converge to the n × p matrix L, called the limit of the sequence, if

lim_{m→∞} (A_m)ᵢⱼ = Lᵢⱼ

for all 1 ≤ i ≤ n and 1 ≤ j ≤ p. To designate that L is the limit of the sequence, we write lim_{m→∞} A_m = L.

Example 1. The entries of the matrices A_m of this example are simple functions of m (quotients of polynomials in m and powers of numbers of absolute value less than 1), so the limit is found by taking the limit of each entry separately; the resulting limit matrix has entries including 1, 0, 3 + 2i, and a value expressed in terms of e, the base of the natural logarithm.

A simple, but important, property of matrix limits is contained in the next theorem. Note the analogy with the familiar property of limits of sequences of real numbers that asserts that if lim_{m→∞} a_m exists, then

lim_{m→∞} c a_m = c (lim_{m→∞} a_m).

Theorem 5.12. Let A₁, A₂, ... be a sequence of n × p matrices with complex entries that converges to the matrix L. Then for any P ∈ M_{r×n}(C) and Q ∈ M_{p×s}(C),

lim_{m→∞} PA_m = PL  and  lim_{m→∞} A_mQ = LQ.

Proof. For any i (1 ≤ i ≤ r) and j (1 ≤ j ≤ p),

lim_{m→∞} (PA_m)ᵢⱼ = lim_{m→∞} Σ_{k=1}^{n} Pᵢₖ(A_m)ₖⱼ = Σ_{k=1}^{n} Pᵢₖ lim_{m→∞} (A_m)ₖⱼ = Σ_{k=1}^{n} PᵢₖLₖⱼ = (PL)ᵢⱼ.

Hence lim_{m→∞} PA_m = PL. The proof that lim_{m→∞} A_mQ = LQ is similar.

Corollary. Let A ∈ M_{n×n}(C) be such that lim_{m→∞} A^m = L. Then for any invertible matrix Q ∈ M_{n×n}(C),

lim_{m→∞} (QAQ⁻¹)^m = QLQ⁻¹.

Proof. Since

(QAQ⁻¹)^m = (QAQ⁻¹)(QAQ⁻¹)···(QAQ⁻¹) = QA^mQ⁻¹,

we have

lim_{m→∞} (QAQ⁻¹)^m = lim_{m→∞} QA^mQ⁻¹ = Q(lim_{m→∞} A^m)Q⁻¹ = QLQ⁻¹

by applying Theorem 5.12 twice.

In the discussion that follows, we frequently encounter the set

S = {λ ∈ C : |λ| < 1 or λ = 1}.

Geometrically, this set consists of the complex number 1 and the interior of the unit disk (the disk of radius 1 centered at the origin). This set is of interest because if λ is a complex number, then lim_{m→∞} λ^m exists if and only if λ ∈ S. This fact, which is obviously true if λ is real, can be shown to be true for complex numbers also.

The following important result gives necessary and sufficient conditions for the existence of the type of limit under consideration.

Theorem 5.13. Let A be a square matrix with complex entries. Then lim_{m→∞} A^m exists if and only if both of the following conditions hold.
(a) Every eigenvalue of A is contained in S.
(b) If 1 is an eigenvalue of A, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of A.

One proof of this theorem, which relies on the theory of Jordan canonical forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second proof, which makes use of Schur's theorem (Theorem 6.14 of Section 6.4), can be found in the article by S. H. Friedberg and A. J. Insel, "Convergence of matrix powers," Int. J. Math. Educ. Sci. Technol., Vol. 23, no. 5, 1992, pp. 765–769.

The necessity of condition (a) is easily justified. For suppose that λ is an eigenvalue of A such that λ ∉ S. Let v be an eigenvector of A corresponding to λ. Regarding v as an n × 1 matrix, we see that if L = lim_{m→∞} A^m exists, then

lim_{m→∞} (A^m v) = Lv

by Theorem 5.12. But lim_{m→∞} (A^m v) = lim_{m→∞} (λ^m v) diverges because lim_{m→∞} λ^m does not exist. Hence if lim_{m→∞} A^m exists, then condition (a) of Theorem 5.13 must hold.

Although we are unable to prove the necessity of condition (b) here, we consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix

B = [1 1; 0 1]

is (t − 1)², and hence B has eigenvalue λ = 1 with multiplicity 2. It can easily be verified that dim(E_λ) = 1, so that condition (b) of Theorem 5.13 is violated. A simple mathematical induction argument can be used to show that

B^m = [1 m; 0 1],

and therefore that lim_{m→∞} B^m does not exist. We see in Chapter 7 that if A is a matrix for which condition (b) fails, then A is similar to a matrix whose upper left 2 × 2 submatrix is precisely this matrix B.

In most of the applications involving matrix limits, the matrix is diagonalizable, so that condition (b) of Theorem 5.13 is automatically satisfied. In this case, Theorem 5.13 reduces to the following theorem, which can be proved using our previous results.

Theorem 5.14. Let A ∈ M_{n×n}(C) satisfy the following two conditions.
(i) Every eigenvalue of A is contained in S.
(ii) A is diagonalizable.
Then lim_{m→∞} A^m exists.

Proof. Since A is diagonalizable, there exists an invertible matrix Q such that Q⁻¹AQ = D is a diagonal matrix. Suppose that

D = [λ₁ 0 ··· 0; 0 λ₂ ··· 0; … ; 0 0 ··· λₙ],

where λ₁, λ₂, ..., λₙ are the eigenvalues of A. Because condition (i) requires that, for each i, either λᵢ = 1 or |λᵢ| < 1, we have

lim_{m→∞} λᵢ^m = 1 if λᵢ = 1, and lim_{m→∞} λᵢ^m = 0 otherwise.

But since

D^m = [λ₁^m 0 ··· 0; 0 λ₂^m ··· 0; … ; 0 0 ··· λₙ^m],

the sequence D, D², ... converges to a limit L. Hence

lim_{m→∞} A^m = lim_{m→∞} (QDQ⁻¹)^m = QLQ⁻¹

by the corollary to Theorem 5.12.

The technique for computing lim_{m→∞} A^m used in the proof of Theorem 5.14 can be employed in actual computations, as we now illustrate. For the 3 × 3 matrix A given in the text, the methods of Sections 5.1 and 5.2 produce an invertible matrix Q and a diagonal matrix D such that Q⁻¹AQ = D, with every diagonal entry of D lying in S. Hence

lim_{m→∞} A^m = lim_{m→∞} (QDQ⁻¹)^m = lim_{m→∞} QD^mQ⁻¹ = Q (lim_{m→∞} D^m) Q⁻¹,

and the limit on the right is obtained by replacing each diagonal entry λᵢ of D by 1 if λᵢ = 1 and by 0 if |λᵢ| < 1.
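Because the entries of the 3 × 3 matrix in the computation above did not survive cleanly in this copy, the following sketch uses a hypothetical diagonalizable matrix instead. It is not part of the text and assumes NumPy; the matrix A is built from an arbitrary invertible Q and eigenvalues 1, 0.5, and −0.3, all of which lie in S, and a large power of A is compared with Q(lim D^m)Q⁻¹.

```python
import numpy as np

# Hypothetical diagonalizable matrix with eigenvalues 1, 0.5, -0.3 (all in S).
Q = np.array([[1., 1., 0.], [0., 1., 1.], [1., 0., 1.]])   # any invertible matrix
D = np.diag([1., 0.5, -0.3])
A = Q @ D @ np.linalg.inv(Q)

# lim D^m replaces each eigenvalue by 1 if it equals 1 and by 0 otherwise.
limit = Q @ np.diag([1., 0., 0.]) @ np.linalg.inv(Q)
print(np.allclose(np.linalg.matrix_power(A, 60), limit, atol=1e-10))   # True
```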

Next, we consider an application that uses the limit of powers of a matrix. Suppose that the population of a certain metropolitan area remains constant but there is a continual movement of people between the city and the suburbs. Specifically, let the entries of the following matrix A represent the probabilities that someone living in the city or in the suburbs on January 1 will be living in each region on January 1 of the next year:

                            Currently living   Currently living
                            in the city        in the suburbs
Living next year in the city      0.90              0.02
Living next year in the suburbs   0.10              0.98

For instance, the probability that someone living in the city (on January 1) will be living in the suburbs next year (on January 1) is 0.10.

Notice that since the entries of A are probabilities, they are nonnegative. Moreover, the assumption of a constant population in the metropolitan area requires that the sum of the entries of each column of A be 1. Any square matrix having these two properties (nonnegative entries and columns that sum to 1) is called a transition matrix or a stochastic matrix. For an arbitrary n × n transition matrix M, the rows and columns correspond to n states, and the entry Mᵢⱼ represents the probability of moving from state j to state i in one stage. In our example, there are two states (residing in the city and residing in the suburbs), and, for example, A₂₁ is the probability of moving from the city to the suburbs in one stage, that is, in one year.

We now determine the probability that a city resident will be living in the suburbs after 2 years. There are two different ways in which such a move can be made: remaining in the city for 1 year and then moving to the suburbs, or moving to the suburbs during the first year and remaining there the second year. (See Figure 5.3, which diagrams the transitions between the city and the suburbs.) The probability that a city dweller remains in the city for the first year is 0.90, whereas the probability that a city dweller moves to the suburbs during the first year is 0.10. Hence the probability that a city dweller stays in the city for the first year and then moves to the suburbs during the second year is the product (0.90)(0.10). Likewise, the probability that a city dweller moves to the suburbs in the first year and remains in the suburbs during the second year is the product (0.10)(0.98). Thus the probability that a city dweller will be living in the suburbs after 2 years is the sum of these products,

(0.90)(0.10) + (0.10)(0.98) = 0.188.

Observe that this number is obtained by the same calculation as that which produces (A²)₂₁, and hence (A²)₂₁ represents the probability that a city dweller will be living in the suburbs after 2 years. In general, for any transition matrix M, the entry (M^m)ᵢⱼ represents the probability of moving from state j to state i in m stages.

Suppose additionally that 70% of the 2000 population of the metropolitan area lived in the city and 30% lived in the suburbs. We record these data as a column vector:

P = (0.70, 0.30),

where the first coordinate is the proportion of city dwellers and the second the proportion of suburb residents. Notice that the rows of P correspond to the states of residing in the city and residing in the suburbs, and that these states are listed in the same order as the listing in the transition matrix A. Observe also that the column vector P contains nonnegative entries that sum to 1; such a vector is called a probability vector. In this terminology, each column of a transition matrix is a probability vector. It is often convenient to regard the entries of a transition matrix or a probability vector as proportions or percentages instead of probabilities, as we have already done with the probability vector P.

In the vector AP, the first coordinate is the sum (0.90)(0.70) + (0.02)(0.30). The first term of this sum, (0.90)(0.70), represents the proportion of the 2000 metropolitan population that remained in the city during the next year, and the second term, (0.02)(0.30), represents the proportion of the 2000 metropolitan population that moved into the city during the next year. Hence the first coordinate of AP represents the proportion of the metropolitan population that was living in the city in 2001. Similarly, the second coordinate of

AP = (0.636, 0.364)

represents the proportion of the metropolitan population that was living in the suburbs in 2001. This argument can be easily extended to show that the coordinates of A²P represent the proportions of the metropolitan population living in each location in 2002 and, in general, that the coordinates of A^mP represent the proportion of the metropolitan population that will be living in the city and suburbs, respectively, after m stages (m years after 2000).

Will the city eventually be depleted if this trend continues? In view of the preceding discussion, it is natural to define the eventual proportions of the city dwellers and suburbanites to be the first and second coordinates, respectively, of lim_{m→∞} A^mP. We now compute this limit.

It is easily shown that A is diagonalizable, and so there is an invertible matrix Q and a diagonal matrix D such that Q⁻¹AQ = D. Here the columns of Q are eigenvectors of A corresponding to the eigenvalues 1 and 0.88, and

D = [1 0; 0 0.88].

Therefore

L = lim_{m→∞} A^m = lim_{m→∞} QD^mQ⁻¹ = Q (lim_{m→∞} D^m) Q⁻¹ = Q [1 0; 0 0] Q⁻¹ = [1/6 1/6; 5/6 5/6].

Consequently

lim_{m→∞} A^mP = LP = (1/6, 5/6).

Thus, eventually, 1/6 of the population will live in the city and 5/6 will live in the suburbs each year. Note that the vector LP satisfies A(LP) = LP. Hence LP is both a probability vector and an eigenvector of A corresponding to the eigenvalue 1. Since the eigenspace of A corresponding to the eigenvalue 1 is one-dimensional, there is only one such vector, and so LP is independent of the initial choice of probability vector P. (See Exercise 15.) For example, had the 2000 metropolitan population consisted entirely of city dwellers, the limiting outcome would be the same.

In analyzing the city-suburb problem, we gave probabilistic interpretations of A² and AP, showing that A² is a transition matrix and AP is a probability vector. In fact, the product of any two transition matrices is a transition matrix, and the product of any transition matrix and probability vector is a probability vector. A proof of these facts is a simple corollary of the next theorem, which characterizes transition matrices and probability vectors.
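The limit just computed can be confirmed numerically. The sketch below is illustrative only and assumes NumPy; it iterates the city-suburb chain for many years and also extracts the probability eigenvector of A for the eigenvalue 1.

```python
import numpy as np

A = np.array([[0.90, 0.02], [0.10, 0.98]])   # city-suburb transition matrix
P = np.array([0.70, 0.30])                   # initial probability vector

# Iterating the chain approaches LP = (1/6, 5/6), independent of P.
print(np.linalg.matrix_power(A, 500) @ P)    # approx [0.1667, 0.8333]

# The same vector is the eigenvector of A for eigenvalue 1, scaled to sum to 1.
w, V = np.linalg.eig(A)
v = V[:, np.argmin(np.abs(w - 1.0))].real
print(v / v.sum())                           # [1/6, 5/6]
```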

Theorem 5.15. Let M be an n × n matrix having real nonnegative entries, let v be a column vector in Rⁿ having nonnegative coordinates, and let u ∈ Rⁿ be the column vector in which each coordinate equals 1. Then
(a) M is a transition matrix if and only if Mᵗu = u;
(b) v is a probability vector if and only if uᵗv = (1).

Proof. Exercise.

Corollary.
(a) The product of two n × n transition matrices is an n × n transition matrix. In particular, any power of a transition matrix is a transition matrix.
(b) The product of a transition matrix and a probability vector is a probability vector.

Proof. Exercise.

The city-suburb problem is an example of a process in which elements of a set are each classified as being in one of several fixed states that can switch over time. In general, such a process is called a stochastic process. For instance, the object could be an American voter, and the state of the object could be his or her preference of political party; or the object could be a molecule of H₂O, and the states could be the three physical states in which H₂O can exist (solid, liquid, and gas). In such processes, the switching to a particular state is described by a probability, and in general this probability depends on such factors as the state in question, the time in question, some or all of the previous states in which the object has been (including the current state), and the states that other objects are in or have been in.

If the probability that an object in one state changes to a different state in a fixed interval of time depends only on the two states (and not on the time, earlier states, or other factors), then the stochastic process is called a Markov process. If, in addition, the number of possible states is finite, then the Markov process is called a Markov chain. We treated the city-suburb example as a two-state Markov chain. Of course, a Markov process is usually only an idealization of reality because the probabilities involved are almost never constant over time.

With this in mind, we consider another Markov chain. A certain community college would like to obtain information about the likelihood that students in various categories will graduate. The school classifies a student as a sophomore or a freshman depending on the number of credits that the student has earned. Data from the school indicate that, from one fall semester to the next, 40% of the sophomores will graduate, 30% will remain sophomores, and 30% will quit permanently. For freshmen, the data show that 10% will graduate by next fall, 50% will become sophomores, 20% will remain freshmen, and 20% will quit permanently. During the present year, 50% of the students at the school are sophomores and 50% are freshmen. Assuming that the trend indicated by the data continues indefinitely, the school would like to know

1. the percentage of the present students who will graduate, the percentage who will be sophomores, the percentage who will be freshmen, and the percentage who will quit school permanently by next fall;
2. the same percentages as in item 1 for the fall semester two years hence; and
3. the probability that one of its present students will eventually graduate.

The preceding paragraph describes a four-state Markov chain with the following states:

1. having graduated
2. being a sophomore
3. being a freshman
4. having quit permanently.

(Notice that students who have graduated or have quit permanently are assumed to remain indefinitely in those respective states. Thus a freshman who quits the school and returns during a later semester is not regarded as having changed states; the student is assumed to have remained in the state of being a freshman during the time he or she was not enrolled.) The given data provide us with the transition matrix

A = [1 0.4 0.1 0; 0 0.3 0.5 0; 0 0 0.2 0; 0 0.3 0.2 1]

of the Markov chain. Moreover, we are told that the present distribution of students is half in each of states 2 and 3 and none in states 1 and 4. The vector

P = (0, 0.5, 0.5, 0)

that describes the initial probability of being in each state is called the initial probability vector for the Markov chain.

To answer question 1, we must determine the probabilities that a present student will be in each state by next fall. As we have seen, these probabilities are the coordinates of the vector

AP = (0.25, 0.40, 0.10, 0.25).

Hence by next fall, 25% of the present students will graduate, 40% will be sophomores, 10% will be freshmen, and 25% will quit the school permanently. Similarly,

A²P = A(AP) = (0.42, 0.17, 0.02, 0.39)

provides the information needed to answer question 2: within two years 42% of the present students will graduate, 17% will be sophomores, 2% will be freshmen, and 39% will quit school.

Finally, the answer to question 3 is provided by the vector LP, where L = lim_{m→∞} A^m. For the matrices

Q = [1 4 19 0; 0 −7 −40 0; 0 0 8 0; 0 3 13 1]  and  D = [1 0 0 0; 0 0.3 0 0; 0 0 0.2 0; 0 0 0 1]

we have Q⁻¹AQ = D. Thus

L = lim_{m→∞} A^m = Q (lim_{m→∞} D^m) Q⁻¹ = Q [1 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 1] Q⁻¹ = [1 4/7 27/56 0; 0 0 0 0; 0 0 0 0; 0 3/7 29/56 1].

So

LP = (59/112, 0, 0, 53/112),

and hence the probability that one of the present students will graduate is 59/112.
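The vector LP can also be approximated directly by taking a large power of A. The following sketch is not part of the text and assumes NumPy.

```python
import numpy as np

A = np.array([[1, 0.4, 0.1, 0],
              [0, 0.3, 0.5, 0],
              [0, 0.0, 0.2, 0],
              [0, 0.3, 0.2, 1]], dtype=float)
P = np.array([0, 0.5, 0.5, 0])

# A^m P approaches LP = (59/112, 0, 0, 53/112) as m grows.
print(np.linalg.matrix_power(A, 200) @ P)
print(59 / 112, 53 / 112)        # about 0.5268 and 0.4732
```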

In the preceding two examples, we saw that lim_{m→∞} A^mP, where A is the transition matrix and P is the initial probability vector of the Markov chain, gives the eventual proportions in each state. In general, however, the limit of powers of a transition matrix need not exist. For example, if

M = [0 1; 1 0],

then lim_{m→∞} M^m does not exist because odd powers of M equal M and even powers of M equal I. The reason that the limit fails to exist is that condition (a) of Theorem 5.13 does not hold for M (−1 is an eigenvalue). In fact, it can be shown (see Exercise 20 of Section 7.2) that the only transition matrices A such that lim_{m→∞} A^m does not exist are precisely those matrices for which condition (a) of Theorem 5.13 fails to hold. But even if the limit of powers of the transition matrix exists, the computation of the limit may be quite difficult. (The reader is encouraged to work Exercise 6 to appreciate the truth of the last sentence.) Fortunately, there is a large and important class of transition matrices for which this limit exists and is easily computed: the class of regular transition matrices.

Definition. A transition matrix is called regular if some power of the matrix contains only positive entries.

Example 2. The transition matrix

[0.90 0.02; 0.10 0.98]

of the Markov chain used in the city-suburb problem is clearly regular because each entry is positive. On the other hand, the transition matrix

A = [1 0.4 0.1 0; 0 0.3 0.5 0; 0 0 0.2 0; 0 0.3 0.2 1]

of the Markov chain describing community college enrollments is not regular because the first column of A^m is e₁ for any power m.

Observe that a regular transition matrix may contain zero entries. For example,

M = [0.9 0.5 0; 0 0.5 0.5; 0.1 0 0.5]

is regular because every entry of M² is positive.

The remainder of this section is devoted to proving that, for a regular transition matrix A, the limit of the sequence of powers of A exists and has identical columns. From this fact, it is easy to compute this limit. In the course of proving this result, we obtain some interesting bounds for the magnitudes of eigenvalues of any square matrix. These bounds are given in terms of the sums of the absolute values of the rows and columns of the matrix. The necessary terminology is introduced in the definitions that follow.

Definitions. Let A ∈ M_{n×n}(C). For 1 ≤ i, j ≤ n, define ρᵢ(A) to be the sum of the absolute values of the entries of row i of A, and define νⱼ(A) to be the sum of the absolute values of the entries of column j of A. Thus

ρᵢ(A) = Σ_{j=1}^{n} |Aᵢⱼ|  for i = 1, 2, ..., n

and

νⱼ(A) = Σ_{i=1}^{n} |Aᵢⱼ|  for j = 1, 2, ..., n.

The row sum of A, denoted ρ(A), and the column sum of A, denoted ν(A), are defined as

ρ(A) = max{ρᵢ(A) : 1 ≤ i ≤ n}  and  ν(A) = max{νⱼ(A) : 1 ≤ j ≤ n}.

Example 3. For the matrix A considered here, ρ₁(A) = 7, ρ₂(A) = 6 + √5, and ρ₃(A) = 6, while ν₁(A) = 4 + √5, ν₂(A) = 3, and ν₃(A) = 12. Hence ρ(A) = 6 + √5 and ν(A) = 12.

Our next results show that the smaller of ρ(A) and ν(A) is an upper bound for the absolute values of eigenvalues of A. In the preceding example, for instance, A has no eigenvalue with absolute value greater than 6 + √5. To obtain a geometric view of the theorem stated below, we introduce some terminology.

Definition. For an n × n matrix A, we define the ith Gerschgorin disk Cᵢ to be the disk in the complex plane with center Aᵢᵢ and radius rᵢ = ρᵢ(A) − |Aᵢᵢ|; that is,

Cᵢ = {z ∈ C : |z − Aᵢᵢ| ≤ rᵢ}.

For example, consider the matrix

A = [1 + 2i  1; 2i  −3].

For this matrix, C₁ is the disk with center 1 + 2i and radius 1, and C₂ is the disk with center −3 and radius 2. (See Figure 5.4, which shows the two disks plotted in the complex plane with real and imaginary axes.)

Gerschgorin's disk theorem, stated below, tells us that all the eigenvalues of A are located within these two disks. In particular, we see that 0 is not an eigenvalue, and hence, by Exercise 8(c) of Section 5.1, A is invertible.

Theorem 5.16 (Gerschgorin's Disk Theorem). Let A ∈ M_{n×n}(C). Then every eigenvalue of A is contained in a Gerschgorin disk.

Proof. Let λ be an eigenvalue of A with the corresponding eigenvector v = (v₁, v₂, ..., vₙ). Then v satisfies the matrix equation Av = λv, which can be written

Σ_{j=1}^{n} Aᵢⱼvⱼ = λvᵢ  (i = 1, 2, ..., n).   (2)

Suppose that vₖ is the coordinate of v having the largest absolute value; note that vₖ ≠ 0 because v is an eigenvector of A. We show that λ lies in Cₖ, that is, |λ − Aₖₖ| ≤ rₖ. For i = k, it follows from (2) that

|λvₖ − Aₖₖvₖ| = |Σ_{j≠k} Aₖⱼvⱼ| ≤ Σ_{j≠k} |Aₖⱼ||vⱼ| ≤ |vₖ| Σ_{j≠k} |Aₖⱼ| = |vₖ| rₖ.

Thus

|vₖ||λ − Aₖₖ| ≤ |vₖ| rₖ,

so |λ − Aₖₖ| ≤ rₖ because |vₖ| > 0.

Corollary 1. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then |λ| ≤ ρ(A).

Proof. By Gerschgorin's disk theorem, |λ − Aₖₖ| ≤ rₖ for some k. Hence

|λ| = |(λ − Aₖₖ) + Aₖₖ| ≤ |λ − Aₖₖ| + |Aₖₖ| ≤ rₖ + |Aₖₖ| = ρₖ(A) ≤ ρ(A).

Corollary 2. Let λ be any eigenvalue of A ∈ M_{n×n}(C). Then

|λ| ≤ min{ρ(A), ν(A)}.

Proof. Since |λ| ≤ ρ(A) by Corollary 1, it suffices to show that |λ| ≤ ν(A). By Exercise 14 of Section 5.1, λ is an eigenvalue of Aᵗ, and so |λ| ≤ ρ(Aᵗ) by Corollary 1. But the rows of Aᵗ are the columns of A; consequently ρ(Aᵗ) = ν(A). Therefore |λ| ≤ ν(A).

The next corollary is immediate from Corollary 2.

Corollary 3. If λ is an eigenvalue of a transition matrix, then |λ| ≤ 1.

Theorem 5.17. Every transition matrix has 1 as an eigenvalue.

Proof. Let A be an n × n transition matrix, and let u ∈ Rⁿ be the column vector in which each coordinate is 1. Then Aᵗu = u by Theorem 5.15, and hence u is an eigenvector of Aᵗ corresponding to the eigenvalue 1. But since A and Aᵗ have the same eigenvalues, it follows that 1 is also an eigenvalue of A.

Suppose that A is a transition matrix for which some eigenvector corresponding to the eigenvalue 1 has only nonnegative coordinates. Then some multiple of this vector is a probability vector P as well as an eigenvector of A corresponding to the eigenvalue 1. It is interesting to observe that if P is the initial probability vector of a Markov chain having A as its transition matrix, then the Markov chain is completely static. For in this situation, A^mP = P for every positive integer m; hence the probability of being in each state never changes. Consider, for instance, the city-suburb problem with P = (1/6, 5/6).
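Gerschgorin's disk theorem and its corollaries are easy to check numerically for the 2 × 2 matrix considered above. The sketch below is illustrative only and assumes NumPy.

```python
import numpy as np

A = np.array([[1 + 2j, 1], [2j, -3]])
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i = rho_i(A) - |A_ii|
print(centers, radii)                                  # [1+2j, -3+0j], [1., 2.]

for lam in np.linalg.eigvals(A):
    # Every eigenvalue must lie in at least one Gerschgorin disk.
    print(lam, bool(np.any(np.abs(lam - centers) <= radii + 1e-12)))
```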

The next result asserts that the upper bound in Corollary 3 is attained.

Theorem 5.18. Let A ∈ M_{n×n}(C) be a matrix in which each entry is positive, and let λ be an eigenvalue of A such that |λ| = ρ(A). Then λ = ρ(A) and {u} is a basis for E_λ, where u ∈ Cⁿ is the column vector in which each coordinate equals 1.

Proof. Let v be an eigenvector of A corresponding to λ, with coordinates v₁, v₂, ..., vₙ. Suppose that vₖ is the coordinate of v having the largest absolute value, and let b = |vₖ|. Then

|λ|b = |λ||vₖ| = |λvₖ| = |Σ_{j=1}^{n} Aₖⱼvⱼ| ≤ Σ_{j=1}^{n} |Aₖⱼvⱼ| = Σ_{j=1}^{n} |Aₖⱼ||vⱼ| ≤ Σ_{j=1}^{n} |Aₖⱼ|b = ρₖ(A)b ≤ ρ(A)b.   (3)

Since |λ| = ρ(A), the three inequalities in (3) are actually equalities; that is,

(a) |Σ_{j=1}^{n} Aₖⱼvⱼ| = Σ_{j=1}^{n} |Aₖⱼvⱼ|,
(b) Σ_{j=1}^{n} |Aₖⱼ||vⱼ| = Σ_{j=1}^{n} |Aₖⱼ|b,
(c) ρₖ(A) = ρ(A).

We see in Exercise 15(b) of Section 6.1 that (a) holds if and only if all the terms Aₖⱼvⱼ (j = 1, 2, ..., n) are nonnegative multiples of some nonzero complex number z. Without loss of generality, we assume that |z| = 1. Thus there exist nonnegative real numbers c₁, c₂, ..., cₙ such that

Aₖⱼvⱼ = cⱼz.   (4)

By (b) and the assumption that Aₖⱼ ≠ 0 for all k and j, we have

|vⱼ| = b for j = 1, 2, ..., n.   (5)

Combining (4) and (5), we have vⱼ = bz for all j. So

v = (bz, bz, ..., bz) = bzu,

and hence {u} is a basis for E_λ. Finally, observe that all of the entries of Au are positive because the same is true for the entries of both A and u. But Au = λu, and hence λ > 0. Therefore λ = |λ| = ρ(A), and the eigenspace corresponding to ρ(A) has dimension 1.

Corollary 1. Let A ∈ M_{n×n}(C) be a matrix in which each entry is positive, and let λ be an eigenvalue of A such that |λ| = ν(A). Then λ = ν(A), and the dimension of E_λ equals 1.

Proof. Exercise.

Corollary 2. Let A ∈ M_{n×n}(C) be a transition matrix in which each entry is positive, and let λ be any eigenvalue of A other than 1. Then |λ| < 1. Moreover, the eigenspace corresponding to the eigenvalue 1 has dimension 1.

Proof. Exercise.

Our next result extends Corollary 2 to regular transition matrices and thus shows that regular transition matrices satisfy condition (a) of Theorems 5.13 and 5.14.

Theorem 5.19. Let A be a regular transition matrix, and let λ be an eigenvalue of A. Then
(a) |λ| ≤ 1.
(b) If |λ| = 1, then λ = 1, and dim(E_λ) = 1.

Proof. Statement (a) was proved as Corollary 3 to Theorem 5.16.

(b) Since A is regular, there exists a positive integer s such that A^s has only positive entries. Because A is a transition matrix and the entries of A^s are positive, the entries of A^{s+1} = A^s·A are positive. Suppose that |λ| = 1. Then λ^s and λ^{s+1} are eigenvalues of A^s and A^{s+1}, respectively, having absolute value 1. So by Corollary 2 to Theorem 5.18, λ^s = λ^{s+1} = 1. Thus λ = 1. Let E_λ and E′_λ denote the eigenspaces of A and A^s, respectively, corresponding to λ = 1. Then E_λ ⊆ E′_λ and, by Corollary 2 to Theorem 5.18, dim(E′_λ) = 1. Hence E_λ = E′_λ, and dim(E_λ) = 1.

Corollary. Let A be a regular transition matrix that is diagonalizable. Then lim_{m→∞} A^m exists.

The preceding corollary, which follows immediately from Theorems 5.14 and 5.19, is not the best possible result. In fact, it can be shown that if A is a regular transition matrix, then lim_{m→∞} A^m exists regardless of whether A is or is not diagonalizable. As with Theorem 5.13, however, the fact that the multiplicity of 1 as an eigenvalue of A is 1 cannot be proved at this time. Nevertheless, we state this result here (leaving the proof until Exercise 21 of Section 7.2) and deduce further facts about lim_{m→∞} A^m when A is a regular transition matrix.

Theorem 5.20. Let A be an n × n regular transition matrix. Then
(a) The multiplicity of 1 as an eigenvalue of A is 1.
(b) lim_{m→∞} A^m exists.
(c) L = lim_{m→∞} A^m is a transition matrix.
(d) AL = LA = L.
(e) The columns of L are identical. In fact, each column of L is equal to the unique probability vector v that is also an eigenvector of A corresponding to the eigenvalue 1.
(f) For any probability vector w, lim_{m→∞} (A^m w) = v.

Proof. (a) See Exercise 21 of Section 7.2.

(b) This follows from (a) and Theorems 5.13 and 5.19.

(c) By Theorem 5.15, we must show that uᵗL = uᵗ. Now A^m is a transition matrix by the corollary to Theorem 5.15, so

uᵗL = uᵗ lim_{m→∞} A^m = lim_{m→∞} uᵗA^m = lim_{m→∞} uᵗ = uᵗ,

and it follows that L is a transition matrix.

(d) By Theorem 5.12,

AL = A lim_{m→∞} A^m = lim_{m→∞} A^{m+1} = L.

Similarly, LA = L.

(e) Since AL = L by (d), each column of L is an eigenvector of A corresponding to the eigenvalue 1. Moreover, by (c), each column of L is a probability vector. Thus, by (a), each column of L is equal to the unique probability vector v corresponding to the eigenvalue 1 of A.

(f) Let w be any probability vector, and set y = lim_{m→∞} A^m w = Lw. Then y is a probability vector by the corollary to Theorem 5.15, and also Ay = ALw = Lw = y by (d). Hence y is also an eigenvector of A corresponding to the eigenvalue 1. So y = v by (e).

Definition. The vector v in Theorem 5.20(e) is called the fixed probability vector or stationary vector of the regular transition matrix A.

Theorem 5.20 can be used to deduce information about the eventual distribution in each state of a Markov chain having a regular transition matrix.

Example 4. A survey in Persia showed that on a particular day 50% of the Persians preferred a loaf of bread, 30% preferred a jug of wine, and 20% preferred "thou beside me in the wilderness." A subsequent survey 1 month later yielded the following data: Of those who preferred a loaf of bread on the first survey, 40% continued to prefer a loaf of bread, 10% now preferred a jug of wine, and 50% preferred "thou"; of those who preferred a jug of wine on the first survey, 20% now preferred a loaf of bread, 70% continued to prefer a jug of wine, and 10% now preferred "thou"; of those who preferred "thou" on the first survey, 20% now preferred a loaf of bread, 20% now preferred a jug of wine, and 60% continued to prefer "thou." Assuming that this trend continues, we can predict the percentage of Persians in each state for each month following the original survey.

The situation described in the preceding paragraph is a three-state Markov chain in which the states are the three possible preferences. Letting the first, second, and third states be preferences for bread, wine, and "thou," respectively, we see that the probability vector that gives the initial probability of being in each state is

P = (0.50, 0.30, 0.20),

and the transition matrix is

A = [0.40 0.20 0.20; 0.10 0.70 0.20; 0.50 0.10 0.60].

The probabilities of being in each state m months after the original survey are the coordinates of the vector A^mP. The reader may check that

AP = (0.30, 0.30, 0.40),  A²P = (0.26, 0.32, 0.42),  A³P = (0.252, 0.334, 0.414),  and  A⁴P = (0.2504, 0.3418, 0.4078).

Note the apparent convergence of A^mP.

Since A is regular, the long-range prediction concerning the Persians' preferences can be found by computing the fixed probability vector for A; this is the unique probability vector v such that (A − I)v = 0. Letting v = (v₁, v₂, v₃), we see that the matrix equation (A − I)v = 0 yields the following system of linear equations:

−0.60v₁ + 0.20v₂ + 0.20v₃ = 0
0.10v₁ − 0.30v₂ + 0.20v₃ = 0
0.50v₁ + 0.10v₂ − 0.40v₃ = 0.

It is easily shown that {(5, 7, 8)} is a basis for the solution space of this system. Hence the unique fixed probability vector for A is

(5/(5 + 7 + 8), 7/(5 + 7 + 8), 8/(5 + 7 + 8)) = (0.25, 0.35, 0.40).

Thus, in the long run, 25% of the Persians prefer a loaf of bread, 35% prefer a jug of wine, and 40% prefer "thou beside me in the wilderness."

Note that if Q is an invertible matrix whose first column is the eigenvector (5, 7, 8) of A corresponding to the eigenvalue 1 and whose remaining columns are eigenvectors corresponding to the other two eigenvalues of A, then

lim_{m→∞} A^m = Q [1 0 0; 0 0 0; 0 0 0] Q⁻¹ = [0.25 0.25 0.25; 0.35 0.35 0.35; 0.40 0.40 0.40],

a matrix whose columns are all equal to the fixed probability vector, in agreement with Theorem 5.20(e).

Example 5. Farmers in Lamron plant one crop per year: either corn, soybeans, or wheat. Because they believe in the necessity of rotating their crops, these farmers do not plant the same crop in successive years. In fact, of the total acreage on which a particular crop is planted, exactly half is planted with each of the other two crops during the succeeding year. This year, 300 acres of corn, 200 acres of soybeans, and 100 acres of wheat were planted.

The situation just described is another three-state Markov chain in which the three states correspond to the planting of corn, soybeans, and wheat, respectively. In this problem, however, the amount of land devoted to each crop, rather than the percentage of the total acreage (600 acres), is given. By converting these amounts into fractions of the total acreage, we see that the transition matrix A and the initial probability vector P of the Markov chain are

A = [0 1/2 1/2; 1/2 0 1/2; 1/2 1/2 0]  and  P = (300/600, 200/600, 100/600) = (1/2, 1/3, 1/6).

The fraction of the total acreage devoted to each crop in m years is given by the coordinates of A^mP, and the eventual proportions of the total acreage used for each crop are the coordinates of lim_{m→∞} A^mP. Thus the eventual

amounts of land devoted to each crop are found by multiplying this limit by the total acreage. Since A is a regular transition matrix, Theorem 5.20 shows that lim_{m→∞} A^m is a matrix L in which each column equals the unique fixed probability vector for A. It is easily seen that the fixed probability vector for A is (1/3, 1/3, 1/3). Hence

L = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3],

so

600 · lim_{m→∞} A^mP = 600 · LP = (200, 200, 200).

Thus, in the long run, we expect 200 acres of each crop to be planted each year. (For a direct computation of 600 · lim_{m→∞} A^mP, see Exercise 14.)

In this section, we have concentrated primarily on the theory of regular transition matrices. There is another interesting class of transition matrices that can be represented in the form

[I B; O C],

where I is an identity matrix and O is a zero matrix. (Such transition matrices are not regular, since the lower left block remains O in any power of the matrix.) The states corresponding to the identity submatrix are called absorbing states because such a state is never left once it is entered. A Markov chain is called an absorbing Markov chain if it is possible to go from an arbitrary state into an absorbing state in a finite number of stages. Observe that the Markov chain that describes the enrollment pattern in the community college is an absorbing Markov chain with states 1 and 4 as its absorbing states. Readers interested in learning more about absorbing Markov chains are referred to Introduction to Finite Mathematics (third edition) by J. Kemeny, J. Snell, and G. Thompson (Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974) or Discrete Mathematical Models by Fred S. Roberts (Prentice-Hall, Inc., Englewood Cliffs, N.J., 1976).

An Application

In species that reproduce sexually, the characteristics of an offspring with respect to a particular genetic trait are determined by a pair of genes, one inherited from each parent. The genes for a particular trait are of two types, which are denoted by G and g. The gene G represents the dominant characteristic, and g represents the recessive characteristic. Offspring with genotypes GG or Gg exhibit the dominant characteristic, whereas offspring with genotype gg exhibit the recessive characteristic. For example, in humans, brown eyes are a dominant characteristic and blue eyes are the corresponding recessive characteristic; thus offspring with genotypes GG or Gg are brown-eyed, whereas those of type gg are blue-eyed.

(We assume that the population under consideration is large, that mating is random with respect to genotype, and that the distribution of each genotype within the population is independent of sex and life expectancy.) Let

P = (p, q, r)

denote the proportions of the adult population with genotypes GG, Gg, and gg, respectively, at the start of the experiment.

Let us consider the probability of offspring of each genotype for a male parent of genotype Gg; that is, suppose that only males of genotype Gg are permitted to reproduce. This experiment describes a three-state Markov chain whose columns correspond to the genotype of the female parent (GG, Gg, gg) and whose rows correspond to the genotype of the offspring (GG, Gg, gg), with transition matrix

B = [1/2 1/4 0; 1/2 1/2 1/2; 0 1/4 1/2].

It is easily checked that B² contains only positive entries, so B is regular. Thus, by permitting only males of genotype Gg to reproduce, the proportion of offspring in the population having a certain genotype would stabilize at the fixed probability vector for B, which is (1/4, 1/2, 1/4).

Now suppose that similar experiments are to be performed with males of genotypes GG and gg, respectively. As before, these experiments are three-state Markov chains with transition matrices

A = [1 1/2 0; 0 1/2 1; 0 0 0]  and  C = [0 0 0; 1 1/2 0; 0 1/2 1],

respectively. In order to consider the case where all male genotypes are permitted to reproduce, we form the transition matrix M = pA + qB + rC, which is the linear combination of A, B, and C weighted by the proportion of males of each genotype. Thus

M = [p + q/2, p/2 + q/4, 0; q/2 + r, 1/2, p + q/2; 0, q/4 + r/2, q/2 + r].

To simplify the notation, let a = p + (1/2)q and b = (1/2)q + r. (The numbers a and b represent the proportions of G and g genes, respectively, in the population.) Then

M = [a, a/2, 0; b, 1/2, a; 0, b/2, b],  where a + b = p + q + r = 1.

Let p′, q′, and r′ denote the proportions of the first-generation offspring having genotypes GG, Gg, and gg, respectively. Then

P′ = MP = (ap + (1/2)aq, bp + (1/2)q + ar, (1/2)bq + br).

In order to consider the effects of unrestricted matings among the first-generation offspring, a new transition matrix M′ must be determined based upon the distribution of first-generation genotypes. As before, we find that

M′ = [a′, a′/2, 0; b′, 1/2, a′; 0, b′/2, b′],

where a′ = p′ + (1/2)q′ and b′ = (1/2)q′ + r′.

However, since P′ = MP = (a², 2ab, b²), we have

a′ = p′ + (1/2)q′ = a² + (1/2)(2ab) = a(a + b) = a  and  b′ = (1/2)q′ + r′ = (1/2)(2ab) + b² = b(a + b) = b.

Thus M′ = M, so the distribution of second-generation offspring among the three genotypes is

M(MP) = (a³ + a²b, a²b + ab + ab², ab² + b³) = (a²(a + b), ab(a + 1 + b), b²(a + b)) = (a², 2ab, b²) = MP,

the same as the first-generation offspring. In other words, MP is the fixed probability vector for M, and genetic equilibrium is achieved in the population after only one generation. (This result is called the Hardy-Weinberg law.) Notice that in the important special case that a = b (or, equivalently, that p = r), the distribution at equilibrium is

MP = (1/4, 1/2, 1/4).

EXERCISES

1. Label the following statements as true or false.
(a) If A ∈ M_{n×n}(C) and lim_{m→∞} A^m = L, then, for any invertible matrix Q ∈ M_{n×n}(C), we have lim_{m→∞} QA^mQ⁻¹ = QLQ⁻¹.
(b) If 2 is an eigenvalue of A ∈ M_{n×n}(C), then lim_{m→∞} A^m does not exist.
(c) Any vector (x₁, x₂, ..., xₙ) ∈ Rⁿ such that x₁ + x₂ + ··· + xₙ = 1 is a probability vector.
(d) The sum of the entries of each row of a transition matrix equals 1.
(e) The product of a transition matrix and a probability vector is a probability vector.
(f) Let z be any complex number such that |z| < 1. Then the matrix … does not have 3 as an eigenvalue.
(g) Every transition matrix has 1 as an eigenvalue.
(h) No transition matrix can have −1 as an eigenvalue.
(i) If A is a transition matrix, then lim_{m→∞} A^m exists.
(j) If A is a regular transition matrix, then lim_{m→∞} A^m exists and has rank 1.

2. Determine whether lim_{m→∞} A^m exists for each of the following matrices A, and compute the limit if it exists.
(a) …  (b) …  (c) …  (d) …  (e) …  (f) …  (g) …  (h) …  (i) …  (j) …

3. Prove that if A₁, A₂, ... is a sequence of n × p matrices with complex entries such that lim_{m→∞} A_m = L, then lim_{m→∞} (A_m)ᵗ = Lᵗ.

4. Prove that if A ∈ M_{n×n}(C) is diagonalizable and L = lim_{m→∞} A^m exists, then either L = Iₙ or rank(L) < n.

5 0. or in square 4. A player begins a.5 o r 0. 20% have become ambulatory.) A die is rolled. and 20% have died. 5. 60% of the ambulatory patients have recovered.3 0 0.3 0. are bedridden. are ambulatory.2 0. game of chance by placing a marker in box 2.4. 20% remain ambulatory.5 0 0' 0. Determine the percentages of patients who have recovered.5 0. 0 0. or G is rolled.5 0 /0.5.8 (a) (b) (c) (d) 0.5 0 0 \ 0 1 0.5 0. it is easier to represent e<i as a linear combination of eigenvectors of A and then apply A" to the result. Which of the following transition matrices are regular? 0.5 0 1 \ 0 1 0. and lim (AB)m all exist. A hospital trauma unit has determined that 30% of its patients are ambulatory and 70% are bedridden at the time of arrival at the hospital. After the same amount of time. 8. and have died 1 month after arrival. 10% of the bedridden patients have recovered. in which case the player wins the game. marked Start. 6.3 0. 50%: remain bedridden. but m—-oc in ->oc lim (AB)m ^ ( lim Am)( lim 309 lim Am.5 1 0 0 0 «/ /l (f) 0 \0 . (\ (e) 0 °\ 0 1 /0. 4. A month after arrival. and the marker is moved one square to the left if a I or a 2 is rolled and one square to the right if a 3. m — » o o Bm).2 0. 7.5 0 1\ 0.Sec. Also determine the eventual percentages of patients of each type.2 0.3 Matrix Limits and Markov Chains 5. What is the probability of winning this game? Hint: Instead of diagonali/ing the appropriate transition matrix Win 1 Start 2 Lose 4 3 Figure 5.5 .7 0. Find 2 x 2 matrices A and B having real entries such that lim Bm. 5. (See Figure 5. in which case the player loses the game. and 20% have become bedridden. This process continues until the marker lands in square 1.

7 0.l\ 0. Five years later. Otherwise.2 0.1 0. m—»oc 10. Assuming that the trends indicated by the 1945 survey continue.2 0. l \ 0.2 0. the initial probability vector is For each transition matrix.1 \0. and agricultural land in the county in 1950 and the corresponding eventual percentages. after a diaper change. then it is discarded and replaced by a new liner. unused.1 0.8 0.4 (d) (e) (f) 11.2 0. The probability that the baby will soil any diaper liner is one-third. for each matrix A in Exercise 8.9 0. compute the proportions of objects in each state after two stages and the eventual proportions of objects in each state by determining the fixed probability vector.2 0 0. In all cases.3 0.7/ 0. 60% had remained unused.4 0.1 0.6/ 0. Each of the matrices that follow is a regular transition matrix for a three-state Markov chain.2 0 0.1 0 . . 10% had become unused.2 0. of the urban land had remained urban. 20% of the unused land had become urban.5 0. the 1945 survey showed that 20% of the agricultural land had become unused while 80% remained agricultural. except that each liner is discarded and replaced after its third use (even if it has never been soiled).2 0. a county land-use survey showed that 10% of the county land was urban. the liner is washed with the diapers and reused.310 /0 (g) \ 0 0\ 0 0 (h) V1 4 o o Chap.8 0. Likewise. compute the percentages of urban.3 0. eventually what proportions of the diaper liners being used will be new.1 0. In 1940.2\ 0.3 0.5 0.2 V0.5 (c) /0.2 0. 12. (a) /0. and 20% had become agricultural.1 ().2 0. Finally.1 0.1 0.3 0.1 0. If there are only new diaper liners at first. and 20% had become agricultural.3 0.8/ 0.8 0. 50%i was unused. A diaper liner is placed in each diaper worn by a baby.5 0.6 0. a follow-up survey revealed that 70%. 5 Diagonalization A o o o i I i o o i 9. Compute lim A7"' if it exists.6 0. the liner is soiled.1 \ 0 0. If.9 0.6 0.4 0.2 0. and 40% was agricultural.6 (b) /0.

18. Prove the two corollaries of Theorem 5. 19. but 30% had changed to an intermediate-sized car.ll cars in 1985. 20% drove intermediate-sized cars. and 40% drove small cars. and twice used? Hint: Assume that a diaper liner ready for use is in one of three states: new. 13. After its use. then ( Deduce that r m m I 1 ''m+1 >'m + i rin rm-\ i J . 5. then W contains a unique probability vector. Prove that if a 1-dimensional subspace W of R" contains a nonzero vector with all nonnegative entries.A" 200 ] V 100 200 2"> 200 IOO': (-1)'"11 (100) 2' 15. 70% continued to drive intermediate-sized cars. Assuming that these trends continue.3 latrix Limits and Markov Chains 311 once used. it then transforms into one of the three states described. Show that if A and P are as in Example 5. In 1975. and twice used. 17. Prove Theorem 5. r m I 1 7'm-l !'„ r Vm = 3 om- 200 /30(r 600(4'" P) . Suppose that M and M' are n x n transition matrices. A second survey in 1985 showed that 70% of the largecar owners in 1975 still owned large cars in 1985. the automobile industry determined that 40% of American car owners drove Large cars. Finally. 14. of the small-car owners in 1975. and 20% had changed to small cars in 1985. once used.15 and its corollary. Of those who owned intermediate-sized cars in 1975. 10% owned intermediate-sized cars and 90% owned sma. 10% had switched to large cars.19.18. . 16. determine the percentages of Americans who own cars of each size in 1995 and the corresponding eventual percentages.Sec. Prove the corollary of Theorem 5.

(c) Deduce that if the nonzero entries of M and M' occur in the same positions.sum of the infinite series 1+ A + A*_ 3! and Bm is the mth partial sum of this series. N is any n x n transition matrix. then cM + (1 — c)N is a regular transition matrix. j.2 if and only if x(t) = e v for some u £ R". and c is a real number such that 0 < c < 1.. 21. 22. For A 6 M n x „(C).. Let A G M n xn(C) be diagonalizable. (b) Suppose that for all i. where III. Thus e A2 2! Am is the .> 0 whenever A/.) 20. Prove that there exists a transition matrix N and a real number c with 0 < c. Prove that a differentiable function x: B -^ R" is a solution to the system of differential equations defined in Exercise 15 of Section 5.2 shows that e exists for every Ac M„ X „(C).> 0.) 23. 5 Diagonalization (a) Prove that if M is regular. Compute c° and c1. define e = lim Bm. we have that M[.I 4 A (see Exercise 22).B e M 2x2 (/?) such that eAeB ^ eA+B. .— ' oc Bm . Use the result of Exercise 21 to show that eA exists.c)JV.(1 . (Exercise 21 of Section 7.312 Chap. Find A. 24. Prove that eA = PeDP~1. Let P~lAP = D be a diagonal matrix. respectively. (Note the analogy with the power series c" = 1 a:' 4! 2! which is valid for all complex numbers a. where A is defined in that exercise. Definition. < 1 such that M''= cM 4. then M is regular if and only if M' is regular. where () and / denote the n x n zero and identity matrices. The following definition is used in Exercises 20-24.
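Exercises 20 through 24 above define the matrix exponential e^A as the limit of the partial sums B_m of the series I + A + A^2/2! + ···. The sketch below (Python with numpy assumed; it is not part of the original text, and the sample matrix is an arbitrary choice) illustrates the computation e^A = P e^D P^{-1} of Exercise 22 for a diagonalizable matrix and compares it with the partial sums of the series:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # a diagonalizable (symmetric) sample matrix

# e^A = P e^D P^{-1}, where P^{-1} A P = D is diagonal (Exercise 22).
eigvals, P = np.linalg.eig(A)
expA = P @ np.diag(np.exp(eigvals)) @ np.linalg.inv(P)

# Partial sums B_m = I + A + A^2/2! + ... + A^m/m! converge to e^A.
B_m, term = np.eye(2), np.eye(2)
for m in range(1, 30):
    term = term @ A / m
    B_m = B_m + term

print(np.allclose(expA, B_m))        # True
```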

5.4 INVARIANT SUBSPACES AND THE CAYLEY-HAMILTON THEOREM

In Section 5.1, we observed that if v is an eigenvector of a linear operator T, then T maps the span of {v} into itself. Subspaces that are mapped into themselves are of great importance in the study of linear operators (see, e.g., Exercises 28-42 of Section 2.1).

Definition. Let T be a linear operator on a vector space V. A subspace W of V is called a T-invariant subspace of V if T(W) ⊆ W, that is, if T(v) ∈ W for all v ∈ W.

Example 1
Suppose that T is a linear operator on a vector space V. Then the following subspaces of V are T-invariant:
1. {0}
2. V
3. R(T)
4. N(T)
5. E_λ, for any eigenvalue λ of T.
The proofs that these subspaces are T-invariant are left as exercises. (See Exercise 3.) •

Example 2
Let T be the linear operator on R^3 defined by T(a, b, c) = (a + b, b + c, 0). Then the xy-plane = {(x, y, 0): x, y ∈ R} and the x-axis = {(x, 0, 0): x ∈ R} are T-invariant subspaces of R^3. •

Let T be a linear operator on a vector space V, and let x be a nonzero vector in V. The subspace

W = span({x, T(x), T^2(x), ...})

is called the T-cyclic subspace of V generated by x. It is a simple matter to show that W is T-invariant. In fact, W is the "smallest" T-invariant subspace of V containing x; that is, any T-invariant subspace of V containing x must also contain W (see Exercise 11).

Cyclic subspaces have various uses. We apply them in this section to establish the Cayley-Hamilton theorem. In Exercise 41, we outline a method for using cyclic subspaces to compute the characteristic polynomial of a linear operator without resorting to determinants. Cyclic subspaces also play an important role in Chapter 7, where we study matrix representations of nondiagonalizable linear operators.
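Examples 3 and 4 below compute T-cyclic subspaces by hand. For an operator given by a matrix, the same computation can be sketched numerically: stack the vectors v, Av, A^2 v, ... and watch when the rank stops growing. (Python with numpy is assumed; the matrix below is the standard-basis representation of the operator in Example 3, and the sketch is not part of the original text.)

```python
import numpy as np

# The operator of Example 3: T(a, b, c) = (-b + c, a + c, 3c).
A = np.array([[0.0, -1.0, 1.0],
              [1.0,  0.0, 1.0],
              [0.0,  0.0, 3.0]])
v = np.array([1.0, 0.0, 0.0])        # generate the T-cyclic subspace of e1

# The dimension of the cyclic subspace is the largest k for which
# {v, Av, ..., A^(k-1) v} is linearly independent.
powers = [v]
for _ in range(2):
    powers.append(A @ powers[-1])
W = np.vstack(powers)
print(np.linalg.matrix_rank(W))       # 2, so span({e1, T(e1), T^2(e1), ...}) has dimension 2
```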

Example 3
Let T be the linear operator on R^3 defined by

T(a, b, c) = (-b + c, a + c, 3c).

We determine the T-cyclic subspace generated by e_1 = (1, 0, 0). Since

T(e_1) = T(1, 0, 0) = (0, 1, 0) = e_2  and  T^2(e_1) = T(e_2) = (-1, 0, 0) = -e_1,

it follows that

span({e_1, T(e_1), T^2(e_1), ...}) = span({e_1, e_2}) = {(s, t, 0): s, t ∈ R}. •

Example 4
Let T be the linear operator on P(R) defined by T(f(x)) = f'(x). Then the T-cyclic subspace generated by x^2 is span({x^2, 2x, 2}) = P_2(R). •

The existence of a T-invariant subspace provides the opportunity to define a new linear operator whose domain is this subspace. If T is a linear operator on V and W is a T-invariant subspace of V, then the restriction T_W of T to W (see Appendix B) is a mapping from W to W, and it follows that T_W is a linear operator on W (see Exercise 7). As a linear operator, T_W inherits certain properties from its parent operator T. The following result illustrates one way in which the two operators are linked.

Theorem 5.21. Let T be a linear operator on a finite-dimensional vector space V, and let W be a T-invariant subspace of V. Then the characteristic polynomial of T_W divides the characteristic polynomial of T.

Proof. Choose an ordered basis γ = {v_1, v_2, ..., v_k} for W, and extend it to an ordered basis β = {v_1, v_2, ..., v_k, v_{k+1}, ..., v_n} for V. Let A = [T]_β and B_1 = [T_W]_γ. Then, by Exercise 12, A can be written in the form

A = \begin{pmatrix} B_1 & B_2 \\ O & B_3 \end{pmatrix}.

Let f(t) be the characteristic polynomial of T and g(t) the characteristic polynomial of T_W. Then

f(t) = \det(A - tI_n) = \det\begin{pmatrix} B_1 - tI_k & B_2 \\ O & B_3 - tI_{n-k} \end{pmatrix} = g(t)\cdot\det(B_3 - tI_{n-k})

by Exercise 21 of Section 4.3. Thus g(t) divides f(t). ∎

Example 5
Let T be the linear operator on R^4 defined by

T(a, b, c, d) = (a + b + 2c - d, b + d, 2c - d, c + d),

and let W = {(t, s, 0, 0): t, s ∈ R}. Observe that W is a T-invariant subspace of R^4 because, for any vector (a, b, 0, 0) ∈ W,

T(a, b, 0, 0) = (a + b, b, 0, 0) ∈ W.

Let γ = {e_1, e_2}, which is an ordered basis for W. Extend γ to the standard ordered basis β for R^4. Then

B_1 = [T_W]_γ = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad A = [T]_β = \begin{pmatrix} 1 & 1 & 2 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}

in the notation of Theorem 5.21. Let f(t) be the characteristic polynomial of T and g(t) be the characteristic polynomial of T_W. Then

f(t) = \det(A - tI_4) = \det\begin{pmatrix} 1-t & 1 & 2 & -1 \\ 0 & 1-t & 0 & 1 \\ 0 & 0 & 2-t & -1 \\ 0 & 0 & 1 & 1-t \end{pmatrix} = (1-t)^2\cdot\det\begin{pmatrix} 2-t & -1 \\ 1 & 1-t \end{pmatrix} = g(t)\cdot\det(B_3 - tI_2). •

In view of Theorem 5.21, we may use the characteristic polynomial of T_W to gain information about the characteristic polynomial of T itself. In this regard, cyclic subspaces are useful because the characteristic polynomial of the restriction of a linear operator T to a cyclic subspace is readily computable.

Theorem 5.22. Let T be a linear operator on a finite-dimensional vector space V, and let W denote the T-cyclic subspace of V generated by a nonzero vector v ∈ V. Let k = dim(W). Then
(a) {v, T(v), T^2(v), ..., T^{k-1}(v)} is a basis for W.
(b) If a_0 v + a_1 T(v) + ··· + a_{k-1} T^{k-1}(v) + T^k(v) = 0, then the characteristic polynomial of T_W is f(t) = (-1)^k (a_0 + a_1 t + ··· + a_{k-1} t^{k-1} + t^k).

Proof. (a) Since v ≠ 0, the set {v} is linearly independent. Let j be the largest positive integer for which

β = {v, T(v), ..., T^{j-1}(v)}

is linearly independent. Such a j must exist because V is finite-dimensional. Let Z = span(β). Then β is a basis for Z. Furthermore, T^j(v) ∈ Z by Theorem 1.7 (p. 39). We use this information to show that Z is a T-invariant subspace of V. Let w ∈ Z. Since w is a linear combination of the vectors of β, there exist scalars b_0, b_1, ..., b_{j-1} such that

w = b_0 v + b_1 T(v) + ··· + b_{j-1} T^{j-1}(v),

and hence

T(w) = b_0 T(v) + b_1 T^2(v) + ··· + b_{j-1} T^j(v).

Thus T(w) is a linear combination of vectors in Z, and hence belongs to Z. So Z is T-invariant. Furthermore, v ∈ Z. By Exercise 11, W ⊆ Z. Clearly Z ⊆ W, and so we conclude that Z = W. Therefore dim(W) = j, and thus j = k. It follows that β is a basis for W, proving (a).

(b) Now view β (from (a)) as an ordered basis for W. Let a_0, a_1, ..., a_{k-1} be the scalars such that

a_0 v + a_1 T(v) + ··· + a_{k-1} T^{k-1}(v) + T^k(v) = 0.

Observe that

[T_W]_β = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix},

which has the characteristic polynomial

f(t) = (-1)^k (a_0 + a_1 t + ··· + a_{k-1} t^{k-1} + t^k)

by Exercise 19. Thus f(t) is the characteristic polynomial of T_W, proving (b). ∎

Example 6
Let T be the linear operator of Example 3, and let W = span({e_1, e_2}), the T-cyclic subspace generated by e_1. We compute the characteristic polynomial f(t) of T_W in two ways: by means of Theorem 5.22 and by means of determinants.

(a) By means of Theorem 5.22. From Example 3, {e_1, e_2} = {e_1, T(e_1)} is a basis for W, and T^2(e_1) = -e_1. Hence

1·e_1 + 0·T(e_1) + T^2(e_1) = 0.

Therefore, by Theorem 5.22(b),

f(t) = (-1)^2 (1 + 0·t + t^2) = t^2 + 1.

(b) By means of determinants. Let β = {e_1, e_2}, which is an ordered basis for W. Since T(e_1) = e_2 and T(e_2) = -e_1, we have

[T_W]_β = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad\text{and therefore}\quad f(t) = \det\begin{pmatrix} -t & -1 \\ 1 & -t \end{pmatrix} = t^2 + 1. •

The Cayley-Hamilton Theorem

As an illustration of the importance of Theorem 5.22, we prove a well-known result that is used in Chapter 7. The reader should refer to Appendix E for the definition of f(T), where T is a linear operator and f(x) is a polynomial.

Theorem 5.23 (Cayley-Hamilton). Let T be a linear operator on a finite-dimensional vector space V, and let f(t) be the characteristic polynomial of T. Then f(T) = T_0, the zero transformation. That is, T "satisfies" its characteristic equation.

Proof. We show that f(T)(v) = 0 for all v ∈ V. This is obvious if v = 0 because f(T) is linear; so suppose that v ≠ 0. Let W be the T-cyclic subspace generated by v, and suppose that dim(W) = k. By Theorem 5.22(a), there exist scalars a_0, a_1, ..., a_{k-1} such that

a_0 v + a_1 T(v) + ··· + a_{k-1} T^{k-1}(v) + T^k(v) = 0.

Hence Theorem 5.22(b) implies that

g(t) = (-1)^k (a_0 + a_1 t + ··· + a_{k-1} t^{k-1} + t^k)

is the characteristic polynomial of T_W. Combining these two equations yields

g(T)(v) = (-1)^k (a_0 I + a_1 T + ··· + a_{k-1} T^{k-1} + T^k)(v) = 0.

By Theorem 5.21, g(t) divides f(t); hence there exists a polynomial q(t) such that f(t) = q(t)g(t). So

f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0. ∎
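Example 7 below verifies the matrix form of this theorem by hand for a 2 × 2 matrix. The same check can be made numerically; the sketch here (Python with numpy assumed; not part of the original text) computes the coefficients of the characteristic polynomial with numpy.poly and evaluates that polynomial at the matrix itself:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-2.0, 1.0]])           # the matrix of Example 7

coeffs = np.poly(A)                   # [1, -2, 5]: the monic polynomial t^2 - 2t + 5

# Evaluate the characteristic polynomial at A (Horner's scheme with matrix
# multiplication); Cayley-Hamilton says the result is the zero matrix.
f_of_A = np.zeros_like(A)
for c in coeffs:
    f_of_A = f_of_A @ A + c * np.eye(2)

print(np.allclose(f_of_A, 0))         # True
```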

Example 7
Let T be the linear operator on R^2 defined by T(a, b) = (a + 2b, -2a + b), and let β = {e_1, e_2}. Then

A = [T]_β = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}.

The characteristic polynomial of T is, therefore,

f(t) = \det(A - tI) = \det\begin{pmatrix} 1-t & 2 \\ -2 & 1-t \end{pmatrix} = t^2 - 2t + 5.

It is easily verified that T_0 = f(T) = T^2 - 2T + 5I. Similarly,

f(A) = A^2 - 2A + 5I = \begin{pmatrix} -3 & 4 \\ -4 & -3 \end{pmatrix} - \begin{pmatrix} 2 & 4 \\ -4 & 2 \end{pmatrix} + \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. •

Example 7 suggests the following result.

Corollary (Cayley-Hamilton Theorem for Matrices). Let A be an n × n matrix, and let f(t) be the characteristic polynomial of A. Then f(A) = O, the n × n zero matrix.

Proof. See Exercise 15. ∎

Invariant Subspaces and Direct Sums*

It is useful to decompose a finite-dimensional vector space V into a direct sum of as many T-invariant subspaces as possible because the behavior of T on V can be inferred from its behavior on the direct summands. For example, T is diagonalizable if and only if V can be decomposed into a direct sum of one-dimensional T-invariant subspaces (see Exercise 36). In Chapter 7, we consider alternate ways of decomposing V into direct sums of T-invariant subspaces if T is not diagonalizable. We proceed to gather a few facts about direct sums of T-invariant subspaces that are used in Section 7.4. The first of these facts is about characteristic polynomials.

*This subsection uses optional material on direct sums from Section 5.2.

Theorem 5.24. Let T be a linear operator on a finite-dimensional vector space V, and suppose that V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_k, where W_i is a T-invariant subspace of V for each i (1 ≤ i ≤ k). Suppose that f_i(t) is the characteristic polynomial of T_{W_i} (1 ≤ i ≤ k). Then f_1(t)·f_2(t)···f_k(t) is the characteristic polynomial of T.

Proof. The proof is by mathematical induction on k, the number of subspaces. Suppose first that k = 2. Let β_1 be an ordered basis for W_1, β_2 an ordered basis for W_2, and β = β_1 ∪ β_2. Then β is an ordered basis for V by Theorem 5.10(d) (p. 276). Let A = [T]_β, B_1 = [T_{W_1}]_{β_1}, and B_2 = [T_{W_2}]_{β_2}. By Exercise 34, it follows that

A = \begin{pmatrix} B_1 & O' \\ O & B_2 \end{pmatrix},

where O and O' are zero matrices of the appropriate sizes. Then

f(t) = \det(A - tI) = \det(B_1 - tI)\cdot\det(B_2 - tI) = f_1(t)\cdot f_2(t),

as in the proof of Theorem 5.21, proving the result for k = 2. Now assume that the theorem is valid for k - 1 summands, where k - 1 ≥ 2, and suppose that V is a direct sum of k subspaces, say, V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_k. Let W = W_1 + W_2 + ··· + W_{k-1}. It is easily verified that W is T-invariant and that V = W ⊕ W_k. So by the case for k = 2, f(t) = g(t)·f_k(t), where g(t) is the characteristic polynomial of T_W. Clearly W = W_1 ⊕ W_2 ⊕ ··· ⊕ W_{k-1}, and therefore g(t) = f_1(t)·f_2(t)···f_{k-1}(t) by the induction hypothesis. We conclude that

f(t) = g(t)\cdot f_k(t) = f_1(t)\cdot f_2(t)\cdots f_k(t). ∎

As an illustration of this result, suppose that T is a diagonalizable linear operator on a finite-dimensional vector space V with distinct eigenvalues λ_1, λ_2, ..., λ_k. By Theorem 5.11 (p. 278), V is a direct sum of the eigenspaces of T. Since each eigenspace is T-invariant, we may view this situation in the context of Theorem 5.24. For each eigenvalue λ_i, the restriction of T to E_{λ_i} has characteristic polynomial (λ_i - t)^{m_i}, where m_i is the dimension of E_{λ_i}. By Theorem 5.24, the characteristic polynomial f(t) of T is the product

f(t) = (λ_1 - t)^{m_1}(λ_2 - t)^{m_2}\cdots(λ_k - t)^{m_k}.

It follows that the multiplicity of each eigenvalue is equal to the dimension of the corresponding eigenspace, as expected. In what follows, f(t) denotes the characteristic polynomial of T.

Example 8
Let T be the linear operator on R^4 defined by

T(a, b, c, d) = (2a - b, a + b, c - d, c + d),

and let W_1 = {(s, t, 0, 0): s, t ∈ R} and W_2 = {(0, 0, s, t): s, t ∈ R}. Notice that W_1 and W_2 are each T-invariant and that R^4 = W_1 ⊕ W_2. Let β_1 = {e_1, e_2}, β_2 = {e_3, e_4}, and β = β_1 ∪ β_2 = {e_1, e_2, e_3, e_4}. Then β_1 is an ordered basis for W_1, β_2 is an ordered basis for W_2, and β is an ordered basis for R^4. Let A = [T]_β, B_1 = [T_{W_1}]_{β_1}, and B_2 = [T_{W_2}]_{β_2}. Then

B_1 = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad\text{and}\quad A = \begin{pmatrix} B_1 & O \\ O & B_2 \end{pmatrix}.

If f(t), f_1(t), and f_2(t) denote the characteristic polynomials of T, T_{W_1}, and T_{W_2}, respectively, then

f(t) = \det(A - tI) = \det(B_1 - tI)\cdot\det(B_2 - tI) = f_1(t)\cdot f_2(t). •
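The computation in Example 8 is easy to reproduce numerically. The following sketch (Python with numpy assumed; it is not part of the original text) checks that the characteristic polynomial of the block-diagonal matrix A is the product of the characteristic polynomials of B_1 and B_2, as Theorem 5.24 asserts:

```python
import numpy as np

B1 = np.array([[2.0, -1.0],
               [1.0,  1.0]])
B2 = np.array([[1.0, -1.0],
               [1.0,  1.0]])

A = np.block([[B1, np.zeros((2, 2))],
              [np.zeros((2, 2)), B2]])      # A = [T]_beta as in Example 8

f_A  = np.poly(A)                            # coefficients of det(tI - A)
f_12 = np.polymul(np.poly(B1), np.poly(B2))  # f1(t) * f2(t)
print(np.allclose(f_A, f_12))                # True
```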

The matrix A in Example 8 can be obtained by joining the matrices B_1 and B_2 in the manner explained in the next definition.

Definition. Let B_1 ∈ M_{m×m}(F), and let B_2 ∈ M_{n×n}(F). We define the direct sum of B_1 and B_2, denoted B_1 ⊕ B_2, as the (m + n) × (m + n) matrix A such that

A_{ij} = \begin{cases} (B_1)_{ij} & \text{for } 1 \le i, j \le m \\ (B_2)_{(i-m),(j-m)} & \text{for } m + 1 \le i, j \le n + m \\ 0 & \text{otherwise.} \end{cases}

If B_1, B_2, ..., B_k are square matrices with entries from F, then we define the direct sum of B_1, B_2, ..., B_k recursively by

B_1 ⊕ B_2 ⊕ ··· ⊕ B_k = (B_1 ⊕ B_2 ⊕ ··· ⊕ B_{k-1}) ⊕ B_k.

If A = B_1 ⊕ B_2 ⊕ ··· ⊕ B_k, then we often write

A = \begin{pmatrix} B_1 & O & \cdots & O \\ O & B_2 & \cdots & O \\ \vdots & & & \vdots \\ O & O & \cdots & B_k \end{pmatrix}.

Example 9
For square matrices B_1, B_2, and B_3, the direct sum B_1 ⊕ B_2 ⊕ B_3 is the square matrix having B_1, B_2, and B_3 as its diagonal blocks and zero entries elsewhere; its size is the sum of the sizes of the three blocks. •

The final result of this section relates direct sums of matrices to direct sums of invariant subspaces. It is an extension of Exercise 34 to the case k > 2.

Theorem 5.25. Let T be a linear operator on a finite-dimensional vector space V, and let W_1, W_2, ..., W_k be T-invariant subspaces of V such that V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_k. For each i, let β_i be an ordered basis for W_i, and let β = β_1 ∪ β_2 ∪ ··· ∪ β_k. Let A = [T]_β and B_i = [T_{W_i}]_{β_i} for i = 1, 2, ..., k. Then A = B_1 ⊕ B_2 ⊕ ··· ⊕ B_k.

Proof. See Exercise 35. ∎

EXERCISES

1. Label the following statements as true or false.
(a) There exists a linear operator T with no T-invariant subspace.
(b) If T is a linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V, then the characteristic polynomial of T_W divides the characteristic polynomial of T.
(c) Let T be a linear operator on a finite-dimensional vector space V, and let v and w be in V. If W is the T-cyclic subspace generated by v and W is also the T-cyclic subspace generated by w, then v = w.
(d) If T is a linear operator on a finite-dimensional vector space V, then for any v ∈ V the T-cyclic subspace generated by v is the same as the T-cyclic subspace generated by T(v).
(e) Let T be a linear operator on an n-dimensional vector space. Then there exists a polynomial g(t) of degree n such that g(T) = T_0.
(f) Any polynomial of degree n with leading coefficient (-1)^n is the characteristic polynomial of some linear operator.
(g) If T is a linear operator on a finite-dimensional vector space V, and if V is the direct sum of k T-invariant subspaces, then there is an ordered basis β for V such that [T]_β is a direct sum of k matrices.

a 4. For each linear operator T on the vector space V. and z 0 1 (c) V = IV!L. and z = cx. For each of the following linear operators T on the vector space V. and W = { / E V: f(l) = ai 4. 5. Let T be a linear operator on a finite-dimensional vector space V. T(A) = A'\ and z = 1 0 (d) v = . T(/(/)) = [.3 (b) V .).b + c. and W = P2(R) (b) V . T-invariant subspace is a linear operator on that subspace.b for some a and 6} '0 1' (e) V = M2x2(R). b . c.322 Chap.t): te R\ (d) V = C([0.. Prove that the following subspaces are T-invariant. b. then the same is true for T. 1]). d) = {a + b.Q 2) A and z = 7. T(tt. (a) { 0 } a n d V (b) N(T) and R(T) (c) E\.4) . Prove that the intersection of any collection of T-invariant subspaces of V is a T-invariant subspace of V.(R). as in Example 6. a + c. and W = P2(R) (c) V . compute the characteristic polynomial of Tw in two ways./•).se 6.' f(x)dx] I. Let T be a linear operator on a vector space V. (a) V = P3(R).c.i.c. For each linear operator T and cyclic subspace W in Exerc4. Prove that W is p(T)-invariant for any polynomial g(t).fix). T(/(s)) = /"(. and W = {4 G V: A* = A) 3. Let T be a linear operator on a vector space with a T-invariant./. . 8. Let T be a linear operator on a vector space V. T(. Prove that the restriction of a linear operator T to a. c) = (a 4. . 2(R). determine whether the given subspace W is a T-invariant subspace of V. for any eigenvalue A of T 4.P3(i?. Prove that if v is an eigenvector of Tw with corresponding eigenvalue A. (a) V = R4.R.P(R).d). 5 Diagonalization 2. b. subspace W. T(f(:r)) =.xf(x). T(a. 9. 6. a 4-6 4. and let W be a Tinvariant subspace of V. a + 6 + c). T(A) = A. T(/(a:)) . find an ordered basis for the T-cyclic subspace generated by the vector z. and W = {{Lt.

4 Invariant Subspaces and the Cayley-Hamilton Theorem 323 10. 14. Let A be an n x n matrix with characteristic polynomial f(t) = (-l)ntn + an-it n . 15. let v be a nonzero vector in V. 11. then any nontrivial T-invariant subspace of V contains an eigenvector of T. Use the Cayley-Hamilton theorem (Theorem 5." But this argument is nonsense. (a) Prove that if the characteristic polynomial of T splits.l / a o ) [ ( . then An — 1 n A" 1 = ( .AI) = det(O) = 0. it is tempting to "prove" that f(A) = O by saying u f(A) = det(A . (a) Prove that A is invertible if and only if an ^ 0. A.. 5.A — tl) is the characteristic polynomial of A.--})) < n. Prove that dim(span({/ n .23) to prove its corollary for matrices. and let W be the T-cyclic subspace of V generated by v. Let T be a linear operator on a vector space V. Let A be an n x n matrix. Prove that the polynomial g(i) of Exercise 13 can always be chosen so that its degree is less than dim(W).Sec.).l )nM . Prove that A = Bx O Bi B. prove that w E W if and only if there exists a polynomial g(t) such that w = g(T)(?. Let T be a linear operator on a vector space V. find the characteristic polynomial f(t) of T. (b) Prove that if A is invertible.A 2 . then so does the characteristic polynomial of the restriction of T to any T-invariant subspace of V. 12. and let W be the T-cyclic subspace of V generated by v. 17.21. For any w E V.An~2 aiUn\ J. Let T be a linear operator on a finite-dimensional vector space V. Prove that (a) W is T-invariant. (b) Deduce that if the characteristic polynomial of T splits. Why? 16. 13.l + • ait + OQ. Warning: If f(t) = det (. . in the proof of Theorem 5. and verify that the characteristic polynomial of Tw (computed in Exercise 9) divides f(t). (b) Any T-invariant subspace of V containing v also contains W. For each linear operator in Exercise 6. 18. let u b e a nonzero vector in V.

Suppose that Vi. 2 -ak-2 -Ok-iJ where ao./ k-\ + f Hint: Use mathematical induction on k. Prove that if v\ 4.324 (c) Use (b) to compute A ! Chap. Prove that if U is a linear operator on V. and suppose that V is a T-cyclic subspace of itself.+ a fc _. Prove that either V is a T-cyclic subspace of itself or T = cl for some scalar c. 23. Let T be a linear operator on a finite-dimensional vector space V.. • • •. Show that if U is any linear operator on V such that UT = TU. and let W be a T-invariant subspace of V. . then UT = TU if and only if U = g(T) for some polynomial g(t). 5 Diagonalization for 19. ak-\ are arbitrary scalars. 21. then U = q(T) for some polvnoniial g(t). Let T be a linear operator on a two-dimensional vector space V. Prove that the characteristic polynomial of A is (-!)*(«<) 4-ai/. Let T be a linear operator on a two-dimensional vector space V and suppose that T ^ c\ for any scalar c. Hint: Use mathematical induction on k. 20. Prove that the restriction of a diagonalizable linear operator T to any nontrivial T-invariant subspace is also diagonalizable.ai. Let T be a linear operator on a vector space V.vk is in W. then vi E W for all i.! -o. 22.v2 4 (.v2. Hint: Use the result of Exercise 23. + --. 24. vk are eigenvectors of T corresponding to distinct eigenvalues. expanding the determinant along the first row. Hint: Suppose that V is generated by /. Choose g(t) according to Exercise 13 s o t h a t o(T)(f) = U(t>). Let A denote the k X k matrix /() 0 ••• 0 1 0 ••• 0 0 1 ••• 0 0 0 \0 0 0 1 -a„ \ -a.

Let T be a linear operator on an //-dimensional vector space V such that T has n distinct eigenvalues.3.) Hint: For any eigenvalue A of T. (a) Prove. Vfc+ij • • • . is.. T is a fixed linear operator on a finite-dimensional vector space V..4. Prove that V is a T-cyclic subspace of itself. show that T(e + W) = T(v' + W) whenever v + \N = v' + W..1. that T is well defined. and T.v2.v n } for V. Hint: Extend an ordered basis 7 = {v\.. and let W be a T-invariant subspace of V. That.4 Invariant Subspaces and the Cayley-Hamilton Theorem 325 25.vk} for W to an ordered basis ft = {vi.2: If T and U are diagonalizable linear operators on a finite-dimensional vector space V such that UT = TU. (See the definitions in the exercises of Section 5.. the reader should first review the other exercises treating quotient spaces: Exercise 35 of Section 1.. Let T be a linear operator on a vector space V. v2. and Exercise 24 of Section 2. (b) Prove that T is a linear operator on V/W. 27. For the purposes of Exercises 27 through 32. and W is a nonzero T-invariant subspace of V. Exercise 40 of Section 2. (c) bet //: V — > V/W be the linear transformation defined in Exercise 40 of Section 2..W for any v 4. Exercises 27 through 32 require familiarity with quotient spaces as defined in Exercise 31 of Section 1.. (a) Prove the converse to Exercise 18(a) of Section 5.. and apply Exercise 24 to obtain a basis for EA of eigenvectors of U. Show that the diagram of Figure 5. vk. 5.W E V/W. Before attempting these exercises. Then show that the collection of . Let f(t). prove that rjT = Tip (This exercise does not require the assumption that V is finite-dimensional.) V V V/W V/W Figure 5. . T n _ 1 ( v ) } is linearly independent.2.6. respectively.6 commutes. Tw. show that E\ is U-invariant. g(t).Sec. Prove that f(t) = g(i)h(i). then T and U are simultaneously diagonalizable. (b) State and prove a matrix version of (a). T(v). and h(t) be the characteristic polynomials of T.. 26.. Hint: Use Exercise 23 to find a vector v such that {v.1 by TJ(V) = v + W. Definition. We require the following definition. that is. Define T: V/W -* V/W by T(v + W) = T(v) 4.6 28.

2 4.25 for the case k = 2. then T is diagonalizable. then so isT. . the number of subspaces.. The results of Theorem 5. concerned with direct sums. 4.22 to compute1 the characteristic polynomial of Tw. and use this fact to compute the characteristic polynomial of T. T = LA. and apply the induction hypothesis to T: V/W — * V/W. Use the hint in Exercise 28 to prove that if T is diagonalizable. Exercises 33 through 40 arc. Hint: Begin with Exercise 34 and extend it using mathematical induction on k. (This result is used in the proof of Theorem 5. and let Wi. let.6 is helpful here. Exercise 35(b) of Section 1.) 35. 32. First prove that T has an eigenvector v. (c) Use the results of (a) and (b) to find the characteristic polynomial of A. Prove that if both Tw and T are diagonalizable and have no common eigenvalues.W 2 4 h Wfc is also a T-invariant subspace of V.i /W.W .W. . Prove the converse to Exercise 9(a) of Section 5. 31. 5 Diagonalization cosets a = {vk+i 4.j is an upper triangular matrix. . be. T-invariant subspaces of V. v„ + W} is an ordered basis for V/W.326 Chap. then there is an ordered basis 0 for V such that [T]. (b) Show that {e2 4 W} is a basis for R . -3\ 2 3 4 . . Let A = /I . Give a direct proof of Theorem 5. 34. This is illustrated in the next exercise. and prove.24.2: If the characteristic polynomial of T splits. and let W be the cyclic subspace V 2 l) of R generated by e±. W> W/.22 and Exercise 28 are useful in devising methods for computing characteristic polynomials without the use of determinants. vk \. Let T be a linear operator on a vector space V. . Prove Theorem 5. Prove that W. that where £ j = [T]7 and S 3 = [T]f. 1(4 W = span({e}).. 30. 33.25. (a) Use Theorem 5. Hints: Apply mathematical induction to diin(V). 29.

5. 37. .3 such that [T].. be square matrices with entries in the same field. establish the general result by mathematical induction on dim(V). Prove that T is diagonalizable if and only if T w .. 39.4 Invariant Subspaces and the Cayley-Hamilton Theorem 327 36.. 42.Bk. Prove that T is diagonalizable if and only if V is the direct sum of one-dimensional T-invariant subspaces. and let W|.2 //)}) is L^-invariant. is diagonalizable for all /'. Let . Let 2 n •2 A = \n2 . Let T be a linear operator on a. Let T be a linear operator on a finite-dimensional vector space V. (This is an extension of Exercise 25. finite-dimensional vector space1 V.n. 1 1).) •••<!(•! (T w . Prove that there is an ordered basis . B2... be T-invariant subspaces of V such that V = WiO W2 ] ••• DWfc.) Hints for the ease that the operators commute: The result is trivial if each operator has only OIK1 eigenvalue. (1. 38.W_> WA.W-_> W. 41.4 E M„ x „(/i) be the matrix defined by .).Sec..4.. — 1 for all i and j. Hint: First prove that A has rank 2 and that span({(l. Prove that det(T) =det(T W | )«let(Tw.( is a diagonal matrix for all T E C if and only if the. B\. + I ir n 4. and let A — B\ B> > ••• i. operators of C commute under composition.2 7 Find the characteristic polynomial of A. has more than one eigenvalue.. Let B\. find the characteristic polynomial of .4. Lei T be a linear operator on a finite-dimensional vector space V. Otherwise. Prove that the characteristic polynomial of A is the product of the characteristic polynomials of the P. 40. be T-invariant subspaces of V such that V W| | Wj : • • • W/. using the fact that V is the direct sum of the eigenspaces of some operator in C that. and let Wi. Let C be a collection of diagonalizable linear operators on a finitedimensional vector space V.'s.

'514 Diagonalizable linear operator 245 Diagonalizable matrix 246 Direct sum of matrices 320 Direct sum of subspaces 275 Eigenspace of a linear operator 264 Eigenspace of a matrix 264 Eigenvalue of a linear operator 246 Eigenvalue of a matrix 246 Eigenvector of a linear operator 246 Eigenvector of a matrix 246 Fixed probability vector 301 Generator of a cyclic subspace 313 Gerschgorin disk 296 Initial probability vector for a Markov chain 292 Invariant. 5 Diagonalization INDEX OF DEFINITIONS FOR CHAPTER 5 Absorbing Markov chain 404 Absorbing state 404 Characteristic polynomial of a linear operator 249 Characteristic polynomial of a matrix 218 Column sum of a matrix 295 Convergence of matrices 284 Cyclic subspace .328 Chap. subspace 313 Limit of a sequence of mat rices 284 Markov chain 291 Markov process 291 Multiplicity of an eigenvalue 263 Probability vector 289 Regular transition matrix 294 Row sum of a matrix 295 Splits 262 Stochastic process 288 Sum of subspaces 275 Transition matrix 288 .

6 Inner Product Spaces

6.1 Inner Products and Norms
6.2 The Gram-Schmidt Orthogonalization Process and Orthogonal Complements
6.3 The Adjoint of a Linear Operator
6.4 Normal and Self-Adjoint Operators
6.5 Unitary and Orthogonal Operators and Their Matrices
6.6 Orthogonal Projections and the Spectral Theorem
6.7* The Singular Value Decomposition and the Pseudoinverse
6.8* Bilinear and Quadratic Forms
6.9* Einstein's Special Theory of Relativity
6.10* Conditioning and the Rayleigh Quotient
6.11* The Geometry of Orthogonal Operators

Most applications of mathematics are involved with the concept of measurement and hence of the magnitude or relative size of various quantities. So it is not surprising that the fields of real and complex numbers, which have a built-in notion of distance, should play a special role. Except for Section 6.8, we assume that all vector spaces are over the field F, where F denotes either R or C. (See Appendix D for properties of complex numbers.)

We introduce the idea of distance or length into vector spaces via a much richer structure, the so-called inner product space structure. This added structure provides applications to geometry (Sections 6.5 and 6.11), physics (Section 6.9), conditioning in systems of linear equations (Section 6.10), least squares (Section 6.3), and quadratic forms (Section 6.8).

6.1 INNER PRODUCTS AND NORMS

Many geometric notions such as angle, length, and perpendicularity in R^2 and R^3 may be extended to more general real and complex vector spaces. All of these ideas are related to the concept of inner product.

Definition. Let V be a vector space over F. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F, denoted ⟨x, y⟩, such that for all x, y, and z in V and all c in F, the following hold:
(a) ⟨x + z, y⟩ = ⟨x, y⟩ + ⟨z, y⟩.
(b) ⟨cx, y⟩ = c⟨x, y⟩.
(c) ⟨x, y⟩ = \overline{⟨y, x⟩}, where the bar denotes complex conjugation.
(d) ⟨x, x⟩ > 0 if x ≠ 0.

Note that (c) reduces to ⟨x, y⟩ = ⟨y, x⟩ if F = R. Conditions (a) and (b) simply require that the inner product be linear in the first component.

It is easily shown that if a_1, a_2, ..., a_n ∈ F and y, v_1, v_2, ..., v_n ∈ V, then

\left\langle \sum_{i=1}^{n} a_i v_i,\, y \right\rangle = \sum_{i=1}^{n} a_i \langle v_i, y \rangle.

Example 1
For x = (a_1, a_2, ..., a_n) and y = (b_1, b_2, ..., b_n) in F^n, define

\langle x, y \rangle = \sum_{i=1}^{n} a_i \overline{b_i}.

The verification that ⟨·, ·⟩ satisfies conditions (a) through (d) is easy. For example, if z = (c_1, c_2, ..., c_n), we have for (a)

\langle x + z, y \rangle = \sum_{i=1}^{n} (a_i + c_i)\overline{b_i} = \sum_{i=1}^{n} a_i \overline{b_i} + \sum_{i=1}^{n} c_i \overline{b_i} = \langle x, y \rangle + \langle z, y \rangle.

Thus, for x = (1 + i, 4) and y = (2 - 3i, 4 + 5i) in C^2,

\langle x, y \rangle = (1 + i)(2 + 3i) + 4(4 - 5i) = 15 - 15i. •

The inner product in Example 1 is called the standard inner product on F^n. When F = R the conjugations are not needed, and in early courses this standard inner product is usually called the dot product and is denoted by x·y instead of ⟨x, y⟩.

Example 2
If ⟨x, y⟩ is any inner product on a vector space V and r > 0, we may define another inner product by the rule ⟨x, y⟩' = r⟨x, y⟩. If r < 0, then (d) would not hold. •

Example 3
Let V = C([0, 1]), the vector space of real-valued continuous functions on [0, 1]. For f, g ∈ V, define ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt. Since the preceding integral is linear in f, (a) and (b) are immediate, and (c) is trivial. If f ≠ 0, then f^2 is bounded away from zero on some subinterval of [0, 1] (continuity is used here), and hence ⟨f, f⟩ = ∫_0^1 [f(t)]^2 dt > 0. •

Definition. Let A ∈ M_{m×n}(F). We define the conjugate transpose or adjoint of A to be the n × m matrix A* such that (A*)_{ij} = \overline{A_{ji}} for all i, j.

Example 4
Let

A = \begin{pmatrix} i & 1 + 2i \\ 2 & 3 + 4i \end{pmatrix}. \quad\text{Then}\quad A^* = \begin{pmatrix} -i & 2 \\ 1 - 2i & 3 - 4i \end{pmatrix}. •

Notice that if x and y are viewed as column vectors in F^n, then ⟨x, y⟩ = y*x.

The conjugate transpose of a matrix plays a very important role in the remainder of this chapter. In the case that A has real entries, A* is simply the transpose of A.

Example 5
Let V = M_{n×n}(F), and define ⟨A, B⟩ = tr(B*A) for A, B ∈ V. (Recall that the trace of a matrix A is defined by tr(A) = \sum_{i=1}^{n} A_{ii}.) We verify that (a) and (d) of the definition of inner product hold and leave (b) and (c) to the reader. For this purpose, let A, B, C ∈ V. Then (using Exercise 6 of Section 1.3)

\langle A + B, C \rangle = \operatorname{tr}(C^*(A + B)) = \operatorname{tr}(C^*A) + \operatorname{tr}(C^*B) = \langle A, C \rangle + \langle B, C \rangle.

Also

\langle A, A \rangle = \operatorname{tr}(A^*A) = \sum_{i=1}^{n} (A^*A)_{ii} = \sum_{i=1}^{n}\sum_{k=1}^{n} (A^*)_{ik} A_{ki} = \sum_{i=1}^{n}\sum_{k=1}^{n} \overline{A_{ki}}\, A_{ki} = \sum_{i=1}^{n}\sum_{k=1}^{n} |A_{ki}|^2.

Now if A ≠ O, then A_{ki} ≠ 0 for some k and i. So ⟨A, A⟩ > 0. •
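A small numerical check of Example 5 (a sketch assuming Python with numpy; it is not part of the original text, and the random matrices are arbitrary): the inner product ⟨A, B⟩ = tr(B*A) agrees with the entrywise sum that appears in the verification above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

frobenius = np.trace(B.conj().T @ A)        # <A, B> = tr(B* A)
entrywise = np.sum(A * B.conj())            # sum over i, k of A_ki * conj(B_ki)
print(np.allclose(frobenius, entrywise))    # True

# In particular <A, A> = sum of |A_ki|^2 is real and nonnegative.
print(np.isclose(np.trace(A.conj().T @ A).imag, 0))   # True
```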

The inner product on M_{n×n}(F) in Example 5 is called the Frobenius inner product.

A vector space V over F endowed with a specific inner product is called an inner product space. If F = C, we call V a complex inner product space, whereas if F = R, we call V a real inner product space. It is clear that if V has an inner product ⟨x, y⟩ and W is a subspace of V, then W is also an inner product space when the same function ⟨x, y⟩ is restricted to the vectors x, y ∈ W. Thus Examples 1, 3, and 5 also provide examples of inner product spaces.

The reader is cautioned that two distinct inner products on a given vector space yield two distinct inner product spaces. For instance, it can be shown that both

\langle f(x), g(x) \rangle_1 = \int_0^1 f(t)g(t)\, dt \quad\text{and}\quad \langle f(x), g(x) \rangle_2 = \int_{-1}^{1} f(t)g(t)\, dt

are inner products on the vector space P(R). Even though the underlying vector space is the same, these two inner products yield two different inner product spaces. For example, the polynomials f(x) = x and g(x) = x^2 are orthogonal in the second inner product space, but not in the first.

For the remainder of this chapter, F^n denotes the inner product space with the standard inner product as defined in Example 1. Likewise, M_{n×n}(F) denotes the inner product space with the Frobenius inner product as defined in Example 5.

A very important inner product space that resembles C([0, 1]) is the space H of continuous complex-valued functions defined on the interval [0, 2π] with the inner product

\langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(t)\overline{g(t)}\, dt.

The reason for the constant 1/2π will become evident later. This inner product space, which arises often in the context of physical situations, is examined more closely in later sections.

At this point, we mention a few facts about integration of complex-valued functions. First, the imaginary number i can be treated as a constant under the integration sign. Second, every complex-valued function f may be written as f = f_1 + i f_2, where f_1 and f_2 are real-valued functions. Thus we have

\int f = \int f_1 + i \int f_2 \quad\text{and}\quad \int \overline{f} = \overline{\int f}.

From these properties, as well as the assumption of continuity, it follows that H is an inner product space (see Exercise 16(a)).

Some properties that follow easily from the definition of an inner product are contained in the next theorem.

Theorem 6.1. Let V be an inner product space. Then for x, y, z ∈ V and c ∈ F, the following statements are true.
(a) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
(b) ⟨x, cy⟩ = \overline{c}⟨x, y⟩.
(c) ⟨x, 0⟩ = ⟨0, x⟩ = 0.
(d) ⟨x, x⟩ = 0 if and only if x = 0.
(e) If ⟨x, y⟩ = ⟨x, z⟩ for all x ∈ V, then y = z.

Proof. (a) We have

\langle x, y + z \rangle = \overline{\langle y + z, x \rangle} = \overline{\langle y, x \rangle + \langle z, x \rangle} = \overline{\langle y, x \rangle} + \overline{\langle z, x \rangle} = \langle x, y \rangle + \langle x, z \rangle.

The proofs of (b), (c), (d), and (e) are left as exercises. ∎

The reader should observe that (a) and (b) of Theorem 6.1 show that the inner product is conjugate linear in the second component.

In order to generalize the notion of length in R^3 to arbitrary inner product spaces, we need only observe that the length of x = (a, b, c) ∈ R^3 is given by \sqrt{a^2 + b^2 + c^2} = \sqrt{\langle x, x \rangle}. This leads to the following definition.

Definition. Let V be an inner product space. For x ∈ V, we define the norm or length of x by ||x|| = \sqrt{\langle x, x \rangle}.

Example 6
Let V = F^n. For x = (a_1, a_2, ..., a_n),

\|x\| = \|(a_1, a_2, \ldots, a_n)\| = \left[\sum_{i=1}^{n} |a_i|^2\right]^{1/2}

is the Euclidean definition of length. Note that if n = 1, we have ||a|| = |a|. •

As we might expect, the well-known properties of Euclidean length in R^3 hold in general, as shown next.

Theorem 6.2. Let V be an inner product space over F. Then for all x, y ∈ V and c ∈ F, the following statements are true.
(a) ||cx|| = |c|·||x||.
(b) ||x|| = 0 if and only if x = 0. In any case, ||x|| ≥ 0.
(c) (Cauchy-Schwarz Inequality) |⟨x, y⟩| ≤ ||x||·||y||.
(d) (Triangle Inequality) ||x + y|| ≤ ||x|| + ||y||.

Proof. We leave the proofs of (a) and (b) as exercises.
(c) If y = 0, then the result is immediate. So assume that y ≠ 0. For any c ∈ F, we have

0 \le \|x - cy\|^2 = \langle x - cy, x - cy \rangle = \langle x, x \rangle - \overline{c}\langle x, y \rangle - c\langle y, x \rangle + c\overline{c}\langle y, y \rangle.

In particular, if we set

c = \frac{\langle x, y \rangle}{\langle y, y \rangle},

the inequality becomes

0 \le \langle x, x \rangle - \frac{|\langle x, y \rangle|^2}{\langle y, y \rangle},

from which (c) follows.
(d) We have

\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + \langle y, x \rangle + \langle x, y \rangle + \langle y, y \rangle = \|x\|^2 + 2\Re\langle x, y \rangle + \|y\|^2 \le \|x\|^2 + 2|\langle x, y \rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\cdot\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,

where ℜ⟨x, y⟩ denotes the real part of the complex number ⟨x, y⟩. Note that we used (c) to prove (d). ∎

The case when equality results in (c) and (d) is considered in Exercise 15.

Example 7
For F^n, we may apply (c) and (d) of Theorem 6.2 to the standard inner product to obtain the following well-known inequalities:

\left|\sum_{i=1}^{n} a_i \overline{b_i}\right| \le \left[\sum_{i=1}^{n} |a_i|^2\right]^{1/2} \left[\sum_{i=1}^{n} |b_i|^2\right]^{1/2}

and

\left[\sum_{i=1}^{n} |a_i + b_i|^2\right]^{1/2} \le \left[\sum_{i=1}^{n} |a_i|^2\right]^{1/2} + \left[\sum_{i=1}^{n} |b_i|^2\right]^{1/2}. •

The reader may recall from earlier courses that, for x and y in R^3 or R^2, we have ⟨x, y⟩ = ||x||·||y|| cos θ, where θ (0 ≤ θ ≤ π) denotes the angle between x and y. This equation implies (c) immediately since |cos θ| ≤ 1. Notice also that nonzero vectors x and y are perpendicular if and only if cos θ = 0, that is, if and only if ⟨x, y⟩ = 0.

We are now at the point where we can generalize the notion of perpendicularity to arbitrary inner product spaces.

Definitions. Let V be an inner product space. Vectors x and y in V are orthogonal (perpendicular) if ⟨x, y⟩ = 0. A subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A vector x in V is a unit vector if ||x|| = 1. Finally, a subset S of V is orthonormal if S is orthogonal and consists entirely of unit vectors.

Note that if S = {v_1, v_2, ...}, then S is orthonormal if and only if ⟨v_i, v_j⟩ = δ_{ij}, where δ_{ij} denotes the Kronecker delta. Also, observe that multiplying vectors by nonzero scalars does not affect their orthogonality and that if x is any nonzero vector, then (1/||x||)x is a unit vector. The process of multiplying a nonzero vector by the reciprocal of its length is called normalizing.

Example 8
In F^3, {(1, 1, 0), (1, -1, 1), (-1, 1, 2)} is an orthogonal set of nonzero vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we obtain the orthonormal set

\left\{ \frac{1}{\sqrt{2}}(1, 1, 0),\ \frac{1}{\sqrt{3}}(1, -1, 1),\ \frac{1}{\sqrt{6}}(-1, 1, 2) \right\}. •

Our next example is of an infinite orthonormal set that is important in analysis. This set is used in later examples in this chapter.

Example 9
Recall the inner product space H (defined on page 332). We introduce an important orthonormal subset S of H. For what follows, i is the imaginary number such that i^2 = -1. For any integer n, let f_n(t) = e^{int}, where 0 ≤ t ≤ 2π. (Recall that e^{int} = cos nt + i sin nt.) Now define S = {f_n : n is an integer}. Clearly S is a subset of H. Using the property that \overline{e^{it}} = e^{-it} for every real number t, we have, for m ≠ n,

\langle f_m, f_n \rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{imt}\,\overline{e^{int}}\, dt = \frac{1}{2\pi}\int_0^{2\pi} e^{i(m-n)t}\, dt = \frac{1}{2\pi i(m-n)}\left. e^{i(m-n)t}\right|_0^{2\pi} = 0.

Also, ⟨f_n, f_n⟩ = \frac{1}{2\pi}\int_0^{2\pi} 1\, dt = 1. In other words, ⟨f_m, f_n⟩ = δ_{mn}. •
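The orthonormality computation in Example 9 can be confirmed by numerical integration. The following sketch (Python with numpy assumed; the uniform grid is an arbitrary choice and the code is not part of the original text) averages the integrand over [0, 2π):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)   # uniform grid on [0, 2*pi)

def inner(m, n):
    # <f_m, f_n> = (1/(2*pi)) * integral of e^{imt} * conj(e^{int}) dt,
    # approximated by the average of the integrand over the grid.
    return np.mean(np.exp(1j * (m - n) * t))

print(np.round(inner(3, 3), 6))    # (1+0j)
print(np.round(inner(3, 5), 6))    # 0j
```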

2 + 3i) and y .i. be over the field of real or complex numbers. and (A.4 1 -i i 2 Compute (x. inequality.(2 v'. '|<y. If x. 3. {/. There is exactly one inner product on the vector space R". |j/||.y) — (x. In C([0. and \\f I g\\. spaces.. The triangle inequality only holds in finite-dimensional inner product.i) and y . where . y. x = (2. show that (x.2.y) — 0 for all x in an inner product space. • EXERCISES 1.„. then y — z. If (x.(2 4. (a) Complete the proof in Example 5 that (•. Only square matrices have a conjugate-transpose. Let. let f(t) . Compute (f\g) (as defined in Example 3).A||. (a) (b) (c) (d) (e) (f) (g) (h) An inner product is a scalar-valued function on the set of ordered pairs of vectors. 6 Inner Product Spaces In other words. \\B\\. \\y\\.1 f i.336 Also.y) — xAy* is an inner product.) = Smn.2i). y) for x = (1 . and z are vectors in an inner product space such that (x. 4. then y = 0.B) for . Then verify both the Cauchy Schwarz inequahty and the triangle. .i. 2.4 1 2 \i and B = 1 4.I and g(t) = e'. 3 -. Label the following statements as true or false. (b) Use the Frobenius inner product to compute ||. An inner product space must.y). Then verify both the Cauchy Schwarz inequality and the triangle inequality.i 0 5. An inner product is linear in both components. •) is an inner product (the Frobenius inner product) on MnX. Compute (x. 1]). 1 4-2/) be vectors in C3.z). and ||:r + v/||.|. In C2. Chap.(/**)./„. \\x\\.

then x = 0.1.B) on U2x2(R).Sec.c||2 4. 7.. Complete the proof of Theorem Ci. z) = 0 for all z E ft. (b) (A. 9.g(x)) = / J f'(t)g(t)dt on P(R).b).. 11..(•.B*. What does this equation state about parallelograms in R2? 12. where ' denotes differentiation. 14.... (a) ((a. Prove that (A + cB)* = A* +c. and let c be a scalar. scalars. (a) Prove that if V is an inner product space. are two inner products on a vector space V. Let V be an inner product space.d)) =ac bd on R2. then | (x. v2... Prove the parallelogram law on an inner product space V: that is.y) | = ||rr|| • \\y\\ if and only if one of the vectors x or y is a multiple of the other. Prove that \\x 4. (c) (f(x). Suppose that (•. show that \\x + y\\2 + \\x .\. Prove that (•. 337 Provide reasons why each of the following is not an inner product on the given vector spaces.2\\y\\2 for all x.1 Inner Products and Norms 6.ni".n2i=\ 13.y\\2 = ||a:||2 4.\\y\\2• Deduce the Pythagorean theorem in R2. •)x and (•. (b) Prove that if (x.y\\2 = 2||. •). •). let a = fay) 1 2 : . Let 0 be a basis for a finite-dimensional inner product space. <••/. Hint: If the identity holds and y -£ 0. B) = tv(A 4. Let {v\. y E V. a2. 4. -) 2 is another inner product on V. (a) Prove that if (x. Let A and B be n x n matrices...2. Complete the proof of Theorem 6. z) for all z E 0. then x = y. and suppose that X and y are orthogonal vectors in V.} be an orthogonal set in V. 6. z) = (y.(c. 15. and let a. 10. •) = (•. Prove that ak be J^^i i=l Ei«.

of the matrix A! Let A be an n x //. Prove that the representation in (a) is unique.338 Chap.i.r|| = \\ay 4. Let T be a linear operator on an inner product space V. A*. where 3? (x. then B{ = . where B[ — B\ and B2 . . Let V be an inner product space over F. if F = R. Prove that \\x ± y\\2 = \\x\\2 ± 24? fa y) + \\y\\2 for all x. 16. and suppose that ||T(. Prove the polar identities: For all x. and let W be an inner product space ewer F with inner product (•.4 = Bx 4. vectors. Let V be a vector space over /•'. and define ./:|| for all x.B-2.l]).4.?r)|| = ||. and A = Ax + iA2.and generalize. (b) Derive a similar result for the equality ||x 4. Prove that T is one-to-one. 19.y) — (T(x). That is. (t>) I ikll \W\ i < ll. If T: V —> W is linear.('. (a) Show that the vector space H with (•. Define Ai = -(A + A*) 2 (a) and A2 = 2/ -(A-A*). (b) Prove that A\ = . y E V. Would it be reasonable to define .C([(). respectively.r|| 4. 18./ Is this an inner product on V? 17. 6 Inner Product Spaces and let z = x — ay. prove that (x.4.. •). yE V.•1/2 (.T(y)) defines an inner product on V if and only if T is one-to-one. (a) (b) fay) fay) = l\\x + y\\2-\\\x= \ E L I <k\\x + jkyf y\\2 if F f(t)g(t)dt. matrix.r jy'| for all x. •) defined on page 332 is an inner product space.B2. (a) 20. and B2 = A2. Prove that y and z are orthogonal and a = 2 //I Then apply Exercise 10 to ||. prove1 that if . Let A be an n x n matrix. y) denotes the real part of the complex number (x.4| and A2 to be the real and imaginary parts. it to the case of /.y G V. where F = R or F .y). (b) Let V . = cwnere <2 = ~l 21. Let V be an inner product space. = A2.A//).y\\ = ||.z\\2 to obtain ||c|| = 0.\\y\\.

(d) Define linear operators T and U on V by T(x) — Ax and U(x) = A*x.y E V there exist v\. we may still define a norm.r||. ||^|| = max \AKj\ \\f\\ = max 1/(01 t€[0. Let V be a vector space over F. Show that [\J]p — [T]g for any orthonormal basis ft for V. . where F is either R or C. real-valued function on V satisfying the following three conditions for all x. Ay) = (A*x. (2) \\ax\\ = \a\ • ||. Prove that B = A*. Prove that the following are norms on the given vector spaces V.. 6.Sec. (a) V . we have (x.v2. 23.r|| > 0. let Q be the n x n matrix whose columns are the vectors in ft. (a) Prove that (x.y) for all x. Let V be a real or complex vector space (possibly infinite-dimensional). Regardless of whether V is or is not an inner product space.y) for all x. \\ • || as a. (b) V . Q* =Q '. (b) Suppose that for some B € M n x n ( F ) . The following definition is used in Exercises 24 27.y C V. (c) Let a be the standard ordered basis for V.11)..M m x n ( F ) . (b) Prove that if V — R" or V = C" and ft is the standard ordered basis. 24.bi. Let V = F n .y E_ V.l] for all .1 Inner Products and Norms 339 22. For any orthonormal basis 0 for V. Definition. Prove that (•• •) is an inner product. and \\x\\ = 0 if and only if x = 0.. and let 3 be a basis for V. then the inner product.„X/I(/•"). on V and that 0 is ail orthonormal basis for V. For x.4 G V for all / G V (a) an( i y = / ^hVji=\ .Ay) = (Bx.C([0. defined above is the standard inner product. Prove that. (3) \\x + y\\ < \\x\\ + \\y\\. Thus every real or complex vector space. y C V and a E F: (1) j|.v„ E ft such that x = 2_]aivi I I Define fay) = ])2"i. and let A E M. may be regarded as an inner product space.

j6|} 25. \cfay) every every every (3) in every .yEV. dfa y) f 0 if x ^ y. (b) Prove (x 4. y. (f) Prove | (x. y E V. 27.2 / l l 2 ] • Prove that (•. 6 Inner Product Spaces 11/11=/ \f(t)\dt forall/GV for all (a.1]).((c—r)x. Define fay) = 4 [lls + 3/ll 2 . y E V. Prove the following results for all x.r ) fay) .x) for all x E R2 if the norm is defined as in Exercise 24(d).(x. Hint: Condition the definition of norm can be helpful.l k . (d) V = R . where r varies over the set of rational numbers.x). •) defines an inner product on V such that ||. b)\\ = max{|a|.y) for every positive integer m and x. y) = r (x.'c||||y|| for every x. dfa x) = 0. Let || • || be a.b) E V i.y) | = | ( c . a. and x. every rational number r.y E V.x) for all x E V.y) | < ||.(cx. (g) Prove that for every c E R. 2y) = 2 (x. to establish item (b) of the definition of inner product. Hints: (a) Prove (x. y E V. (d) Prove m(—x. the scalar d.y) = d(y. y) + (u. z E V. called the distance between x and y. . 2 Chap.x||2 = (x.y)>0.340 (c) V = C([0. \c — r\ can be made arbitrarily small.. and define. (a) (b) (c) (d) (c) d(a. y) for all x. norm on a real vector space V satisfying the parallelogram law given in Exercise 11.y) for every positive integer n and x. Use Exercise 20 to show that there is no inner product (•. •) on R2 such that \\x\\2 = (x. dfay)< dfaz) + dfay). (e) Prove (rx. y) — (x.y E V.y€V.u. (c) Prove (nx. Let || • |j be a norm on a vector space V. y) for all x. y) for every rational number r and x. (h) Use the fact that for any c E R. d(x.y) = ?i(x. 26. for each ordered pair of vectors.y) = (x.y) = \\x — y\\.y) \ < 2|c-r|||:e||||y||.

furthermore. where V is regarded as a vector space over R. 30. The special properties of these bases stem from the fact that the basis vectors form an orthonormal set. A subset of V is an orthonormal basis forW if it is an ordered basis that is orthonormal./.y] is the real part of the complex number (x. •] is an inner product for V.. •] be the real-valued function such that [x. •). 6. Prove that [*. Let [•. Let (•. where V is regarded as a vector space over R. Then apply Exercise 29.i[x.ix] = 0 for all x E V.•] = 0 for all . •] is a real inner product on V. •) on V such that ||aj|| = (x. ) is a complex inner product on V. Let V be a complex inner product space with an inner product (•. 29. Example 2 The set • is an orthonormal basis for R2./• G V.lust as bases are the building blocks of vector spaces. y c V. Prove. Let || • || be a norm (as defined in Exercise 24) on a complex vector space V satisfying the parallelogram law given in Exercise 11. we have seen the special role of the standard ordered bases for C" and R". • . We now name such bases.y) for all x. Hint: Apply Exercise 27 to V regarded as a vector space over R.-. and suppose that [•.. Let V be a vector space over C./:. iy] Prove that for x. 6. Example 1 The standard ordered basis for F" is an orthonormal basis for F".2 THE GRAM-SCHMIDT ORTHOGONALIZATION PROCESS AND ORTHOGONAL COMPLEMENTS In previous chapters.. Definition. Prove that there is an inner product (•. y] 4. such that [.x) for all x G V. bases that are also orthonormal sets are the building blocks of inner product spaces.Sec.2 Gram-Schmidt Orthogonalization Process 341 28.y E V. . that [x. Let V be an inner product space.•) be the complex-valued function defined by fa y) = [.

Suppose that V\. . • • • . Then. lLai^"'^VJ) = ai (vJ>vl) = a j\\vj\\2- So aj — . Let V be an inner product. Then S is linearly independent.nk E S and y^aiVt i=l = o..ll 2 =0 As in the proof of Theorem 6. Let V be an inner product space. space and S — {v\. I The next corollary follows immediately from Theorem G. v2 . Vj) for all j.Vi '•/ i=i Proof.j = (0.342 Chap.) Corollary 2. then k y = ^2(y>Vi)vi.3. Corollary 1. and let S be an orthogonal subset of"V consisting of nonzero vectors. a2.3. in addition to the hypotheses of Theorem 6. Write y — 2_. in particular.. S is orthonormal and y E span(S). for I < j < k. Ie. If. Proof.i where a. (See Example 3. i=\ If V possesses a finite orthonormal basis. If y E span(S). So S is linearly independent.32 ..\. T h e o r e m 6.vk] be an orthogonal subset of\/ consisting of nonzero vectors. and the result follows. then Corollary 1 allows us to compute the coefficients in a linear combination very easily..3 with y ~ 0.3. 6 Inner Product Spaces The next theorem and its corollaries illustrate why orthonormal sets and. orthonormal bases are so important. then y. t^2-.vj) = \Y^(HVi-v>) = ak G F. a>vi. i^i we have (y. we have o..

Example 3
By Corollary 2, the orthonormal set
{ (1/√2)(1, 1, 0), (1/√3)(1, −1, 1), (1/√6)(−1, 1, 2) }
obtained in Example 8 of Section 6.1 is an orthonormal basis for R³. Let x = (2, 1, 3). The coefficients given by Corollary 1 to Theorem 6.3 that express x as a linear combination of the basis vectors are
a_1 = (1/√2)(2 + 1) = 3/√2,  a_2 = (1/√3)(2 − 1 + 3) = 4/√3,  and  a_3 = (1/√6)(−2 + 1 + 6) = 5/√6.
As a check, we have
(2, 1, 3) = (3/2)(1, 1, 0) + (4/3)(1, −1, 1) + (5/6)(−1, 1, 2).

Corollary 2 tells us that the vector space H in Section 6.1 contains an infinite linearly independent set, and hence H is not a finite-dimensional vector space.

Of course, we have not yet shown that every finite-dimensional inner product space possesses an orthonormal basis. The next theorem takes us most of the way in obtaining this result. It tells us how to construct an orthogonal set from a linearly independent set of vectors in such a way that both sets generate the same subspace.

Before stating this theorem, let us consider a simple case. Suppose that {w_1, w_2} is a linearly independent subset of an inner product space (and hence a basis for some two-dimensional subspace). We want to construct an orthogonal set from {w_1, w_2} that spans the same subspace. Figure 6.1 suggests that the set {v_1, v_2}, where v_1 = w_1 and v_2 = w_2 − cw_1, has this property if c is chosen so that v_2 is orthogonal to w_1. To find c, we need only solve the following equation:
0 = (v_2, w_1) = (w_2 − cw_1, w_1) = (w_2, w_1) − c(w_1, w_1).
So c = (w_2, w_1)/||w_1||². Thus
v_2 = w_2 − [(w_2, w_1)/||w_1||²] w_1.

[Figure 6.1: w_1 = v_1, w_2, and v_2 = w_2 − cw_1.]

The next theorem shows us that this process can be extended to any finite linearly independent subset.

Theorem 6.4. Let V be an inner product space and S = {w_1, w_2, ..., w_n} be a linearly independent subset of V. Define S' = {v_1, v_2, ..., v_n}, where v_1 = w_1 and
v_k = w_k − Σ_{i=1}^{k−1} [(w_k, v_i)/||v_i||²] v_i   for 2 ≤ k ≤ n.   (1)
Then S' is an orthogonal set of nonzero vectors such that span(S') = span(S).

Proof. The proof is by mathematical induction on n, the number of vectors in S. For k = 1, 2, ..., n, let S_k = {w_1, w_2, ..., w_k}. If n = 1, then the theorem is proved by taking S'_1 = S_1; i.e., v_1 = w_1 ≠ 0. Assume then that the set S'_{k−1} = {v_1, v_2, ..., v_{k−1}} with the desired properties has been constructed by the repeated use of (1). We show that the set S'_k = {v_1, v_2, ..., v_{k−1}, v_k} also has the desired properties, where v_k is obtained from S'_{k−1} by (1). If v_k = 0, then (1) implies that w_k ∈ span(S'_{k−1}) = span(S_{k−1}), which contradicts the assumption that S_k is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that
(v_k, v_i) = (w_k, v_i) − Σ_{j=1}^{k−1} [(w_k, v_j)/||v_j||²] (v_j, v_i) = (w_k, v_i) − [(w_k, v_i)/||v_i||²] ||v_i||² = 0,
since (v_j, v_i) = 0 if i ≠ j by the induction assumption that S'_{k−1} is orthogonal. Hence S'_k is an orthogonal set of nonzero vectors. Now, by (1), we have that span(S'_k) ⊆ span(S_k). But by Corollary 2 to Theorem 6.3, S'_k is linearly independent; so dim(span(S'_k)) = dim(span(S_k)) = k. Therefore span(S'_k) = span(S_k).

The construction of {v_1, v_2, ..., v_n} by the use of Theorem 6.4 is called the Gram–Schmidt process.
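Formula (1) translates directly into a short computation. The following Python sketch (not part of the original exposition; the function name and the use of numpy with the standard dot product on Rⁿ are our own choices for illustration) applies the Gram–Schmidt process to the data used in Example 4 below and then normalizes the result.

```python
import numpy as np

def gram_schmidt(vectors):
    """Apply the Gram-Schmidt process (Theorem 6.4) to a linearly
    independent list of vectors; return the orthogonal vectors v_1, ..., v_n."""
    orthogonal = []
    for w in vectors:
        v = w.astype(float).copy()
        for u in orthogonal:
            # subtract the projection of w on each previously constructed v_i, as in (1)
            v -= (np.dot(w, u) / np.dot(u, u)) * u
        orthogonal.append(v)
    return orthogonal

# the vectors of Example 4: w1 = (1,0,1,0), w2 = (1,1,1,1), w3 = (0,1,2,1)
S = [np.array([1, 0, 1, 0]), np.array([1, 1, 1, 1]), np.array([0, 1, 2, 1])]
vs = gram_schmidt(S)
us = [v / np.linalg.norm(v) for v in vs]   # normalize to obtain an orthonormal set
print(vs)   # (1,0,1,0), (0,1,0,1), (-1,0,1,0)
```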

Example 4
In R⁴, let w_1 = (1, 0, 1, 0), w_2 = (1, 1, 1, 1), and w_3 = (0, 1, 2, 1). Then {w_1, w_2, w_3} is linearly independent. We use the Gram–Schmidt process to compute the orthogonal vectors v_1, v_2, and v_3, and then we normalize these vectors to obtain an orthonormal set.
Take v_1 = w_1 = (1, 0, 1, 0). Then
v_2 = w_2 − [(w_2, v_1)/||v_1||²] v_1 = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).
Finally,
v_3 = w_3 − [(w_3, v_1)/||v_1||²] v_1 − [(w_3, v_2)/||v_2||²] v_2 = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1) = (−1, 0, 1, 0).
These vectors can be normalized to obtain the orthonormal basis {u_1, u_2, u_3}, where
u_1 = (1/||v_1||) v_1 = (1/√2)(1, 0, 1, 0),  u_2 = (1/||v_2||) v_2 = (1/√2)(0, 1, 0, 1),  and  u_3 = (1/||v_3||) v_3 = (1/√2)(−1, 0, 1, 0).

Example 5
Let V = P(R) with the inner product (f(x), g(x)) = ∫_{−1}^{1} f(t)g(t) dt, and consider the subspace P_2(R) with the standard ordered basis β. We use the Gram–Schmidt process to replace β by an orthogonal basis {v_1, v_2, v_3} for P_2(R), and then use this orthogonal basis to obtain an orthonormal basis for P_2(R).
Take v_1 = 1. Then ||v_1||² = ∫_{−1}^{1} 1² dt = 2, and (x, v_1) = ∫_{−1}^{1} t · 1 dt = 0. Thus
v_2 = x − [(x, v_1)/||v_1||²] · 1 = x.

Furthermore,
(x², v_1) = ∫_{−1}^{1} t² dt = 2/3  and  (x², v_2) = ∫_{−1}^{1} t³ dt = 0.
Therefore
v_3 = x² − [(x², v_1)/||v_1||²] v_1 − [(x², v_2)/||v_2||²] v_2 = x² − (1/3) · 1 − 0 · x = x² − 1/3.
We conclude that {1, x, x² − 1/3} is an orthogonal basis for P_2(R).
To obtain an orthonormal basis, we normalize v_1, v_2, and v_3 to obtain
u_1 = 1 / √(∫_{−1}^{1} 1² dt) = 1/√2,
u_2 = x / √(∫_{−1}^{1} t² dt) = √(3/2) x,
and similarly,
u_3 = v_3/||v_3|| = √(5/8) (3x² − 1).
Thus {u_1, u_2, u_3} is the desired orthonormal basis for P_2(R).

If we continue applying the Gram–Schmidt orthogonalization process to the basis {1, x, x², ...} for P(R), we obtain an orthogonal basis whose elements are called the Legendre polynomials. The orthogonal polynomials v_1, v_2, and v_3 in Example 5 are the first three Legendre polynomials.
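The same process can be carried out symbolically for the polynomial inner product of Example 5. The sketch below is an illustration only (not from the text); it assumes numpy's Polynomial class, whose products and antiderivatives are exact, so the inner product (f, g) = ∫_{−1}^{1} f(t)g(t) dt is evaluated without numerical error.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(f, g):
    """Inner product (f, g) = integral of f(t)g(t) over [-1, 1]."""
    F = (f * g).integ()          # an antiderivative of f*g
    return F(1.0) - F(-1.0)

basis = [P([1]), P([0, 1]), P([0, 0, 1])]   # the standard basis {1, x, x^2}
orthogonal = []
for w in basis:
    v = w
    for u in orthogonal:
        v = v - (inner(w, u) / inner(u, u)) * u
    orthogonal.append(v)

# coefficients in increasing degree: 1, x, x^2 - 1/3 (the first Legendre polynomials)
print([v.coef for v in orthogonal])
```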

The following result gives us a simple method of representing a vector as a linear combination of the vectors in an orthonormal basis.

Theorem 6.5. Let V be a nonzero finite-dimensional inner product space. Then V has an orthonormal basis β. Furthermore, if β = {v_1, v_2, ..., v_n} and x ∈ V, then
x = Σ_{i=1}^{n} (x, v_i) v_i.

Proof. Apply Theorem 6.4 to an ordered basis β_0 for V to obtain an orthogonal set β' of nonzero vectors with span(β') = span(β_0) = V. By normalizing each vector in β', we obtain an orthonormal set β that generates V. By Corollary 2 to Theorem 6.3, β is linearly independent; therefore β is an orthonormal basis for V. The remainder of the theorem follows from Corollary 1 to Theorem 6.3.

Example 6
We use Theorem 6.5 to represent the polynomial f(x) = 1 + 2x + 3x² as a linear combination of the vectors in the orthonormal basis {u_1, u_2, u_3} for P_2(R) obtained in Example 5. Observe that
(f(x), u_1) = ∫_{−1}^{1} (1/√2)(1 + 2t + 3t²) dt = 2√2,
(f(x), u_2) = ∫_{−1}^{1} √(3/2) t (1 + 2t + 3t²) dt = 2√6/3,
and
(f(x), u_3) = ∫_{−1}^{1} √(5/8) (3t² − 1)(1 + 2t + 3t²) dt = 2√10/5.
Therefore f(x) = 2√2 u_1 + (2√6/3) u_2 + (2√10/5) u_3.

Theorem 6.5 gives us a simple method for computing the entries of the matrix representation of a linear operator with respect to an orthonormal basis.

Corollary. Let V be a finite-dimensional inner product space with an orthonormal basis β = {v_1, v_2, ..., v_n}. Let T be a linear operator on V, and let A = [T]_β. Then for any i and j, A_ij = (T(v_j), v_i).

Proof. From Theorem 6.5, we have
T(v_j) = Σ_{i=1}^{n} (T(v_j), v_i) v_i.
Hence A_ij = (T(v_j), v_i).

The scalars (x, v_i) given in Theorem 6.5 have been studied extensively for special inner product spaces. Although the vectors v_1, v_2, ..., v_n were chosen from an orthonormal basis, we introduce a terminology associated with orthonormal sets β in more general inner product spaces.

Definition. Let β be an orthonormal subset (possibly infinite) of an inner product space V, and let x ∈ V. We define the Fourier coefficients of x relative to β to be the scalars (x, y), where y ∈ β.

In the first half of the 19th century, the French mathematician Jean Baptiste Fourier was associated with the study of the scalars
∫_0^{2π} f(t) sin nt dt  and  ∫_0^{2π} f(t) cos nt dt,
or more generally,
c_n = (1/2π) ∫_0^{2π} f(t) e^{−int} dt,
for a function f. In the context of Example 9 of Section 6.1, we see that c_n = (f, f_n), where f_n(t) = e^{int}; that is, c_n is the nth Fourier coefficient for a continuous function f ∈ V relative to S. These coefficients are the "classical" Fourier coefficients of a function, and the literature concerning the behavior of these coefficients is extensive. We learn more about these Fourier coefficients in the remainder of this chapter.

Example 7
Let S = {e^{int}: n is an integer}. In Example 9 of Section 6.1, S was shown to be an orthonormal set in H. We compute the Fourier coefficients of f(t) = t relative to S. Using integration by parts, we have, for n ≠ 0,
(f, f_n) = (1/2π) ∫_0^{2π} t e^{−int} dt = i/n,
and, for n = 0,
(f, f_0) = (1/2π) ∫_0^{2π} t(1) dt = π.
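These coefficients are easy to check numerically. The following Python sketch is an illustration only (not from the text); it approximates the integral on a grid with a trapezoidal sum and compares the result with the value i/n obtained above.

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 200001)
f = t                                     # the function f(t) = t
dt = t[1] - t[0]

def fourier_coefficient(n):
    # (f, f_n) = (1/2pi) * integral of f(t) * conj(e^{int}) dt over [0, 2pi]
    integrand = f * np.exp(-1j * n * t)
    integral = np.sum(integrand[:-1] + integrand[1:]) * dt / 2.0   # trapezoidal rule
    return integral / (2.0 * np.pi)

print(fourier_coefficient(0))             # approximately pi
for n in (1, 2, 3, -1):
    print(n, fourier_coefficient(n), 1j / n)   # both values are approximately i/n
```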

As a result of these computations and using Exercise 16 of this section, we obtain an upper bound for the sum of a special infinite series as follows:
||f||² ≥ |(f, f_0)|² + Σ_{n=1}^{k} |(f, f_n)|² + Σ_{n=1}^{k} |(f, f_{−n})|² = π² + 2 Σ_{n=1}^{k} 1/n²
for every k. Because this inequality holds for every k, we may let k → ∞ and use the fact that
||f||² = (1/2π) ∫_0^{2π} t² dt = 4π²/3
to obtain
π² + 2 Σ_{n=1}^{∞} 1/n² ≤ 4π²/3,  that is,  Σ_{n=1}^{∞} 1/n² ≤ π²/6.
Additional results may be produced by replacing f by other functions.

We are now ready to proceed with the concept of an orthogonal complement.

Definition. Let S be a nonempty subset of an inner product space V. We define S⊥ (read "S perp") to be the set of all vectors in V that are orthogonal to every vector in S; that is, S⊥ = {x ∈ V: (x, y) = 0 for all y ∈ S}. The set S⊥ is called the orthogonal complement of S.

It is easily seen that S⊥ is a subspace of V for any subset S of V.

Example 8
The reader should verify that {0}⊥ = V and V⊥ = {0} for any inner product space V.

Example 9
If V = R³ and S = {e_3}, then S⊥ equals the xy-plane (see Exercise 5).

Exercise 18 provides an interesting example of an orthogonal complement in an infinite-dimensional inner product space.

Consider the problem in R³ of finding the distance from a point P to a plane W. (See Figure 6.2.) Problems of this type arise in many settings. If we let y be the vector determined by 0 and P, we may restate the problem as follows: Determine the vector u in W that is "closest" to y. The desired distance is clearly given by ||y − u||. Notice from the figure that the vector z = y − u is orthogonal to every vector in W, and so z ∈ W⊥.
The next result presents a practical method of finding u in the case that W is a finite-dimensional subspace of an inner product space.

[Figure 6.2: the orthogonal projection u of y on the plane W, with z = y − u ∈ W⊥.]

Theorem 6.6. Let W be a finite-dimensional subspace of an inner product space V, and let y ∈ V. Then there exist unique vectors u ∈ W and z ∈ W⊥ such that y = u + z. Furthermore, if {v_1, v_2, ..., v_k} is an orthonormal basis for W, then
u = Σ_{i=1}^{k} (y, v_i) v_i.

Proof. Let {v_1, v_2, ..., v_k} be an orthonormal basis for W, let u be as defined in the preceding equation, and let z = y − u. Clearly u ∈ W and y = u + z.
To show that z ∈ W⊥, it suffices to show, by Exercise 7, that z is orthogonal to each v_j. For any j, we have
(z, v_j) = (y − Σ_{i}(y, v_i) v_i, v_j) = (y, v_j) − Σ_{i}(y, v_i)(v_i, v_j) = (y, v_j) − (y, v_j) = 0.
To show uniqueness of u and z, suppose that y = u + z = u' + z', where u' ∈ W and z' ∈ W⊥. Then u − u' = z' − z ∈ W ∩ W⊥ = {0}. Therefore u = u' and z = z'.
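The formula u = Σ (y, v_i) v_i is a one-line computation once an orthonormal basis for W is available. The following Python sketch is an illustration only (not from the text); it assumes the standard inner product on Rⁿ, that the rows of `basis` are orthonormal, and uses an arbitrary vector y chosen for illustration.

```python
import numpy as np

def project(y, orthonormal_basis):
    """Orthogonal projection of y on W = span(orthonormal_basis), as in Theorem 6.6."""
    u = np.zeros_like(y, dtype=float)
    for v in orthonormal_basis:
        u += np.dot(y, v) * v          # add the term (y, v_i) v_i
    return u

# W = span({e1, e2}) in R^3 (compare Example 11 later in this section)
basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
y = np.array([2.0, -1.0, 3.0])
u = project(y, basis)
print(u, np.linalg.norm(y - u))        # projection (2, -1, 0); distance 3
```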

Corollary. In the notation of Theorem 6.6, the vector u is the unique vector in W that is "closest" to y; that is, for any x ∈ W, ||y − x|| ≥ ||y − u||, and this inequality is an equality if and only if x = u.

Proof. As in Theorem 6.6, we have that y = u + z, where z ∈ W⊥. Let x ∈ W. Then u − x is orthogonal to z; so, by Exercise 10 of Section 6.1, we have
||y − x||² = ||u + z − x||² = ||(u − x) + z||² = ||u − x||² + ||z||² ≥ ||z||² = ||y − u||².
Now suppose that ||y − x|| = ||y − u||. Then the inequality above becomes an equality, and therefore ||u − x||² + ||z||² = ||z||². It follows that ||u − x|| = 0, and hence x = u. The proof of the converse is obvious.

The vector u in the corollary is called the orthogonal projection of y on W. We will see the importance of orthogonal projections of vectors in the application to least squares in Section 6.3.

Example 10
Let V = P_3(R) with the inner product (f(x), g(x)) = ∫_{−1}^{1} f(t)g(t) dt for all f(x), g(x) ∈ V. We compute the orthogonal projection f_1(x) of f(x) = x³ on P_2(R). By Example 5,
{u_1, u_2, u_3} = { 1/√2, √(3/2) x, √(5/8)(3x² − 1) }
is an orthonormal basis for P_2(R). For these vectors, we have
(f(x), u_1) = ∫_{−1}^{1} t³ (1/√2) dt = 0,
(f(x), u_2) = ∫_{−1}^{1} t³ √(3/2) t dt = √6/5,
and
(f(x), u_3) = ∫_{−1}^{1} t³ √(5/8)(3t² − 1) dt = 0.
Hence
f_1(x) = (f(x), u_1) u_1 + (f(x), u_2) u_2 + (f(x), u_3) u_3 = (3/5) x.

It was shown (Corollary 2 to the replacement theorem, p. 47) that any linearly independent set in a finite-dimensional vector space can be extended to a basis. The next theorem provides an interesting analog for an orthonormal subset of a finite-dimensional inner product space.

Theorem 6.7. Suppose that S = {v_1, v_2, ..., v_k} is an orthonormal set in an n-dimensional inner product space V. Then
(a) S can be extended to an orthonormal basis {v_1, v_2, ..., v_k, v_{k+1}, ..., v_n} for V.
(b) If W = span(S), then S_1 = {v_{k+1}, v_{k+2}, ..., v_n} is an orthonormal basis for W⊥ (using the preceding notation).
(c) If W is any subspace of V, then dim(V) = dim(W) + dim(W⊥).

Proof. (a) By Corollary 2 to the replacement theorem (p. 47), S can be extended to an ordered basis S' = {v_1, v_2, ..., v_k, w_{k+1}, ..., w_n} for V. Now apply the Gram–Schmidt process to S'. The first k vectors resulting from this process are the vectors in S because S is orthonormal, and this new set spans V. Normalizing the last n − k vectors of this set produces an orthonormal set that spans V. The result now follows.
(b) Because S_1 is a subset of a basis, it is linearly independent. Since S_1 is clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V, we have
x = Σ_{i=1}^{n} (x, v_i) v_i.
If x ∈ W⊥, then (x, v_i) = 0 for 1 ≤ i ≤ k. Therefore
x = Σ_{i=k+1}^{n} (x, v_i) v_i ∈ span(S_1).
(c) Let W be a subspace of V. It is a finite-dimensional inner product space because V is, and so it has an orthonormal basis {v_1, v_2, ..., v_k}. By (a) and (b), we have
dim(V) = n = k + (n − k) = dim(W) + dim(W⊥).

Example 11
Let W = span({e_1, e_2}) in F³. Then x = (a, b, c) ∈ W⊥ if and only if 0 = (x, e_1) = a and 0 = (x, e_2) = b. So x = (0, 0, c), and therefore W⊥ = span({e_3}). One can deduce the same result by noting that e_3 ∈ W⊥ and, from (c), that dim(W⊥) = 3 − 2 = 1.

EXERCISES

1. Label the following statements as true or false.
(a) The Gram–Schmidt orthogonalization process allows us to construct an orthonormal set from an arbitrary set of vectors.

(b) Every nonzero finite-dimensional inner product space has an orthonormal basis.
(c) The orthogonal complement of any set is a subspace.
(d) If {v_1, v_2, ..., v_n} is a basis for an inner product space V, then for any x ∈ V the scalars (x, v_i) are the Fourier coefficients of x.
(e) An orthonormal basis must be an ordered basis.
(f) Every orthogonal set is linearly independent.
(g) Every orthonormal set is linearly independent.

2. In each part, apply the Gram–Schmidt process to the given subset S of the inner product space V to obtain an orthogonal basis for span(S). Then normalize the vectors in this basis to obtain an orthonormal basis β for span(S), and compute the Fourier coefficients of the given vector relative to β. Finally, use Theorem 6.5 to verify your result.
(a) V = R³, S = {(1, 0, 1), (0, 1, 1), (1, 3, 3)}, and x = (1, 1, 2)
(b) V = R³, S = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}, and x = (1, 0, 1)
(c) V = P_2(R) with the inner product (f(x), g(x)) = ∫_0^1 f(t)g(t) dt, S = {1, x, x²}, and h(x) = 1 + x
(d) V = span(S), where S = {(1, i, 0), (1 − i, 2, 4i)}, and x = (3 + i, 4i, −4)
(e) V = R⁴, S = {(2, −1, −2, 4), (−2, 1, −5, 5), (−1, 3, 7, 11)}, and x = (−11, 8, −4, 18)
(f) V = R⁴, S = {(1, −2, −1, 3), (3, 6, 3, −1), (1, 4, 2, 8)}, and x = (−1, 2, 1, 1)
(g) V = M_{2×2}(R), S = { [3 5; −1 1], [−1 9; 5 −1], [7 −17; 2 −6] }, and A = [−1 27; −4 8]
(h) V = M_{2×2}(R), S = { [2 2; 2 1], [11 4; 2 5], [4 −12; 3 −16] }, and A = [8 6; 25 −13]
(i) V = span(S) with the inner product (f, g) = ∫_0^{2π} f(t)g(t) dt, S = {sin t, cos t, 1, t}, and h(t) = 2t + 1
(j) V = C⁴, S = {(1, i, 2 − i, −1), (2 + 3i, 3i, 1 − i, 2i), (−1 + 7i, 6 + 10i, 11 − 4i, 3 + 4i)}, and x = (−2 + 7i, 6 + 9i, 9 − 3i, 4 + 4i)
(k) V = C⁴, S = {(−4, 3 − 2i, i, 1 − 4i), (−1 − 5i, 5 − 4i, −3 + 5i, 7 − 2i), (−27 − i, −7 − 6i, −15 + 25i, −7 − 6i)}, and x = (−13 − 7i, −12 + 3i, −39 − 11i, −26 + 5i)
(l) V = M_{2×2}(C), S = { [1 − i −2 − 3i; 2 + 2i 4 + i], [8i 4; −3 + 3i −4 + 4i], [−25 − 38i −2 − 13i; 12 − 78i −7 + 24i] }, and A = [−2 + 8i −13 + i; 10 − 10i 9 − 9i]

3. In R², let β = { (1/√2)(1, 1), (1/√2)(1, −1) }. Find the Fourier coefficients of (3, 4) relative to β.

4. Let S = {(1, 0, i), (1, 2, 1)} in C³. Compute S⊥.

5. Let S_0 = {x_0}, where x_0 is a nonzero vector in R³. Describe S_0⊥ geometrically. Now suppose that S = {x_1, x_2} is a linearly independent subset of R³. Describe S⊥ geometrically.

6. Let V be an inner product space, and let W be a finite-dimensional subspace of V. If x ∉ W, prove that there exists y ∈ V such that y ∈ W⊥, but (x, y) ≠ 0. Hint: Use Theorem 6.6.

7. Let β be a basis for a subspace W of an inner product space V, and let z ∈ V. Prove that z ∈ W⊥ if and only if (z, v) = 0 for every v ∈ β.

8. Prove that if {w_1, w_2, ..., w_n} is an orthogonal set of nonzero vectors, then the vectors v_1, v_2, ..., v_n derived from the Gram–Schmidt process satisfy v_i = w_i for i = 1, 2, ..., n. Hint: Use mathematical induction.

9. Let W = span({(i, 0, 1)}) in C³. Find orthonormal bases for W and W⊥.

10. Let W be a finite-dimensional subspace of an inner product space V. Prove that there exists a projection T on W along W⊥ that satisfies N(T) = W⊥. In addition, prove that ||T(x)|| ≤ ||x|| for all x ∈ V. Hint: Use Theorem 6.6 and Exercise 10 of Section 6.1. (Projections are defined in the exercises of Section 2.1.)

11. Let A be an n × n matrix with complex entries. Prove that AA* = I if and only if the rows of A form an orthonormal basis for Cⁿ.

12. Prove that for any matrix A ∈ M_{m×n}(F), (R(L_{A*}))⊥ = N(L_A).

13. Let V be an inner product space, S and S_0 be subsets of V, and W be a finite-dimensional subspace of V. Prove the following results.
(a) S_0 ⊆ S implies that S⊥ ⊆ S_0⊥.
(b) S ⊆ (S⊥)⊥; so span(S) ⊆ (S⊥)⊥.
(c) W = (W⊥)⊥. Hint: Use Exercise 6.
(d) V = W ⊕ W⊥. (See the exercises of Section 1.3.)

14. Let W_1 and W_2 be subspaces of a finite-dimensional inner product space. Prove that
(W_1 + W_2)⊥ = W_1⊥ ∩ W_2⊥  and  (W_1 ∩ W_2)⊥ = W_1⊥ + W_2⊥.
(See the definition of the sum of subsets of a vector space on page 22.) Hint for the second equation: Apply Exercise 13(c) to the first equation.

15. Let V be a finite-dimensional inner product space over F.
(a) Parseval's Identity. Let {v_1, v_2, ..., v_n} be an orthonormal basis for V. For any x, y ∈ V prove that
(x, y) = Σ_{i=1}^{n} (x, v_i) (y, v_i)‾ .
(b) Use (a) to prove that if β is an orthonormal basis for V with inner product (·, ·), then for any x, y ∈ V
⟨φ_β(x), φ_β(y)⟩' = ⟨[x]_β, [y]_β⟩' = (x, y),
where ⟨·, ·⟩' is the standard inner product on Fⁿ.

16. (a) Bessel's Inequality. Let V be an inner product space, and let S = {v_1, v_2, ..., v_n} be an orthonormal subset of V. Prove that for any x ∈ V we have
||x||² ≥ Σ_{i=1}^{n} |(x, v_i)|².
Hint: Apply Theorem 6.6 to x ∈ V and W = span(S). Then use Exercise 10 of Section 6.1.
(b) In the context of (a), prove that Bessel's inequality is an equality if and only if x ∈ span(S).

17. Let T be a linear operator on an inner product space V. If (T(x), y) = 0 for all x, y ∈ V, prove that T = T_0. In fact, prove this result if the equality holds for all x and y in some basis for V.

18. Let V = C([−1, 1]). Suppose that W_e and W_o denote the subspaces of V consisting of the even and odd functions, respectively. (See Exercise 22 of Section 1.3.) Prove that W_e⊥ = W_o, where the inner product on V is defined by
(f, g) = ∫_{−1}^{1} f(t)g(t) dt.

19. In each of the following parts, find the orthogonal projection of the given vector on the given subspace W of the inner product space V.
(a) V = R², u = (2, 6), and W = {(x, y): y = 4x}.
(b) V = R³, u = (2, 1, 3), and W = {(x, y, z): x + 3y − 2z = 0}.
(c) V = P(R) with the inner product (f(x), g(x)) = ∫_0^1 f(t)g(t) dt, h(x) = 4 + 3x − 2x², and W = P_1(R).

20. In each part of Exercise 19, find the distance from the given vector to the subspace W.

21. Let V = C([−1, 1]) with the inner product (f, g) = ∫_{−1}^{1} f(t)g(t) dt, and let W be the subspace P_2(R), viewed as a space of functions. Use the orthonormal basis obtained in Example 5 to compute the "best" (closest) second-degree polynomial approximation of the function h(t) = e^t on the interval [−1, 1].

22. Let V = C([0, 1]) with the inner product (f, g) = ∫_0^1 f(t)g(t) dt. Let W be the subspace spanned by the linearly independent set {t, √t}.
(a) Find an orthonormal basis for W.
(b) Let h(t) = t². Use the orthonormal basis obtained in (a) to obtain the "best" (closest) approximation of h in W.

23. Let V be the vector space defined in Example 5 of Section 1.2, the space of all sequences σ in F (where F = R or F = C) such that σ(n) ≠ 0 for only finitely many positive integers n. For σ, μ ∈ V, we define
(σ, μ) = Σ_{n=1}^{∞} σ(n) μ(n)‾ .
Since all but a finite number of terms of the series are zero, the series converges.
(a) Prove that (·, ·) is an inner product on V, and hence V is an inner product space.
(b) For each positive integer n, let e_n be the sequence defined by e_n(k) = δ_{n,k}, where δ_{n,k} is the Kronecker delta. Prove that {e_1, e_2, ...} is an orthonormal basis for V.
(c) Let σ_n = e_1 + e_n and W = span({σ_n: n ≥ 2}).
(i) Prove that e_1 ∉ W, so W ≠ V.
(ii) Prove that W⊥ = {0}, and conclude that W ≠ (W⊥)⊥.

Thus the assumption in Exercise 13(c) that W is finite-dimensional is essential.

6.3 THE ADJOINT OF A LINEAR OPERATOR

In Section 6.1, we defined the conjugate transpose A* of a matrix A. For a linear operator T on an inner product space V, we now define a related linear operator on V called the adjoint of T, whose matrix representation with respect to any orthonormal basis β for V is [T]_β*. The analogy between conjugation of complex numbers and adjoints of linear operators will become apparent. We first need a preliminary result.

Let V be an inner product space, and let y ∈ V. The function g: V → F defined by g(x) = (x, y) is clearly linear. More interesting is the fact that if V is finite-dimensional, every linear transformation from V into F is of this form.

Theorem 6.8. Let V be a finite-dimensional inner product space over F, and let g: V → F be a linear transformation. Then there exists a unique vector y ∈ V such that g(x) = (x, y) for all x ∈ V.

Proof. Let β = {v_1, v_2, ..., v_n} be an orthonormal basis for V, and let
y = Σ_{i=1}^{n} g(v_i)‾ v_i.
Define h: V → F by h(x) = (x, y), which is clearly linear. Furthermore, for 1 ≤ j ≤ n we have
h(v_j) = (v_j, y) = (v_j, Σ_{i} g(v_i)‾ v_i) = Σ_{i} g(v_i)(v_j, v_i) = Σ_{i} g(v_i) δ_{ji} = g(v_j).
Since g and h both agree on β, we have that g = h by the corollary to Theorem 2.6 (p. 73).
To show that y is unique, suppose that g(x) = (x, y') for all x ∈ V. Then (x, y) = (x, y') for all x; so by Theorem 6.1(e) (p. 333), we have y = y'.

Example 1
Define g: R² → R by g(a_1, a_2) = 2a_1 + a_2; clearly g is a linear transformation. Let β = {e_1, e_2}, and let y = g(e_1)e_1 + g(e_2)e_2 = 2e_1 + e_2 = (2, 1), as in the proof of Theorem 6.8. Then g(a_1, a_2) = ((a_1, a_2), (2, 1)) = 2a_1 + a_2.

Theorem 6.9. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Then there exists a unique function T*: V → V such that (T(x), y) = (x, T*(y)) for all x, y ∈ V. Furthermore, T* is linear.

Proof. Let y ∈ V. Define g: V → F by g(x) = (T(x), y) for all x ∈ V. We first show that g is linear. Let x_1, x_2 ∈ V and c ∈ F. Then
g(cx_1 + x_2) = (T(cx_1 + x_2), y) = (cT(x_1) + T(x_2), y) = c(T(x_1), y) + (T(x_2), y) = cg(x_1) + g(x_2).
Hence g is linear.
We now apply Theorem 6.8 to obtain a unique vector y' ∈ V such that g(x) = (x, y'); that is, (T(x), y) = (x, y') for all x ∈ V. Defining T*: V → V by T*(y) = y', we have (T(x), y) = (x, T*(y)) for all x, y ∈ V.
To show that T* is linear, let y_1, y_2 ∈ V and c ∈ F. Then for any x ∈ V,
(x, T*(cy_1 + y_2)) = (T(x), cy_1 + y_2) = c̄(T(x), y_1) + (T(x), y_2) = c̄(x, T*(y_1)) + (x, T*(y_2)) = (x, cT*(y_1) + T*(y_2)).
Since x is arbitrary, T*(cy_1 + y_2) = cT*(y_1) + T*(y_2) by Theorem 6.1(e) (p. 333).
Finally, we need to show that T* is unique. Suppose that U: V → V is linear and that it satisfies (T(x), y) = (x, U(y)) for all x, y ∈ V. Then (x, T*(y)) = (x, U(y)) for all x, y ∈ V, so T* = U.

The linear operator T* described in Theorem 6.9 is called the adjoint of the operator T. The symbol T* is read "T star." Thus T* is the unique operator on V satisfying (T(x), y) = (x, T*(y)) for all x, y ∈ V. Note that we also have
(x, T(y)) = (T(y), x)‾ = (y, T*(x))‾ = (T*(x), y);
so (x, T(y)) = (T*(x), y) for all x, y ∈ V. We may view these equations symbolically as adding a * to T when shifting its position inside the inner product symbol.
The reader should observe the necessity of the hypothesis of finite-dimensionality in the proof of Theorem 6.9. Although the uniqueness and linearity of T* follow as before, the existence of the adjoint is not guaranteed for an infinite-dimensional inner product space (see Exercise 24). For an infinite-dimensional inner product space, the adjoint of a linear operator T may be defined to be the function T* such that (T(x), y) = (x, T*(y)) for all x, y ∈ V, provided it exists.

Many of the theorems we prove about adjoints, nevertheless, do not depend on V being finite-dimensional. Thus, unless stated otherwise, for the remainder of this chapter we adopt the convention that a reference to the adjoint of a linear operator on an infinite-dimensional inner product space assumes its existence.

Theorem 6.10. Let V be a finite-dimensional inner product space, and let β be an orthonormal basis for V. If T is a linear operator on V, then
[T*]_β = [T]_β*.

Proof. Let A = [T]_β, B = [T*]_β, and β = {v_1, v_2, ..., v_n}. Then from the corollary to Theorem 6.5 (p. 346), we have
B_ij = (T*(v_j), v_i) = (v_i, T*(v_j))‾ = (T(v_i), v_j)‾ = A̅_ji = (A*)_ij.
Hence B = A*.

Corollary. Let A be an n × n matrix. Then L_{A*} = (L_A)*.

Proof. If β is the standard ordered basis for Fⁿ, then, by Theorem 2.16 (p. 93), we have [L_A]_β = A. Hence [(L_A)*]_β = [L_A]_β* = A* = [L_{A*}]_β, and so (L_A)* = L_{A*}.

Theorem 6.10 is a useful result for computing adjoints. As an illustration of Theorem 6.10, we compute the adjoint of a specific linear operator.

Example 2
Let T be the linear operator on C² defined by T(a_1, a_2) = (2ia_1 + 3a_2, a_1 − a_2). If β is the standard ordered basis for C², then
[T]_β = [ 2i  3 ]
        [ 1  −1 ].
So
[T*]_β = [T]_β* = [ −2i  1 ]
                  [  3  −1 ].
Hence T*(a_1, a_2) = (−2ia_1 + a_2, 3a_1 − a_2).
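Theorem 6.10 can also be checked numerically. The following Python sketch is an illustration only (not from the text); it verifies the defining relation (T(x), y) = (x, T*(y)) for the operator of Example 2 at randomly chosen vectors, using the standard inner product on C².

```python
import numpy as np

A = np.array([[2j, 3], [1, -1]])        # [T]_beta from Example 2
A_star = A.conj().T                     # [T*]_beta = ([T]_beta)*

def inner(x, y):
    # standard inner product on C^n: (x, y) = sum of x_i * conj(y_i)
    return np.vdot(y, x)

rng = np.random.default_rng(0)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
print(inner(A @ x, y), inner(x, A_star @ y))   # the two values agree
```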

The following theorem suggests an analogy between the conjugates of complex numbers and the adjoints of linear operators.

Theorem 6.11. Let V be an inner product space, and let T and U be linear operators on V. Then
(a) (T + U)* = T* + U*;
(b) (cT)* = c̄ T* for any c ∈ F;
(c) (TU)* = U*T*;
(d) T** = T;
(e) I* = I.

Proof. We prove (a) and (d); the rest are proved similarly. Let x, y ∈ V.
(a) Because
(x, (T + U)*(y)) = ((T + U)(x), y) = (T(x), y) + (U(x), y) = (x, T*(y)) + (x, U*(y)) = (x, T*(y) + U*(y)) = (x, (T* + U*)(y)),
T* + U* has the property unique to (T + U)*. Hence T* + U* = (T + U)*.
(d) Similarly, since
(x, T(y)) = (T*(x), y) = (x, T**(y)),
(d) follows.

The same proof works in the infinite-dimensional case, provided that the existence of T* and U* is assumed.

Corollary. Let A and B be n × n matrices. Then
(a) (A + B)* = A* + B*;
(b) (cA)* = c̄ A* for all c ∈ F;
(c) (AB)* = B*A*;
(d) A** = A;
(e) I* = I.

Proof. We prove only (c); the remaining parts can be proved similarly. Since
L_{(AB)*} = (L_{AB})* = (L_A L_B)* = (L_B)*(L_A)* = L_{B*} L_{A*} = L_{B*A*},
we have (AB)* = B*A*.

In the preceding proof, we relied on the corollary to Theorem 6.10. An alternative proof, which holds even for nonsquare matrices, can be given by appealing directly to the definition of the conjugate transpose of a matrix (see Exercise 5).

Least Squares Approximation

Consider the following problem: An experimenter collects data by taking measurements y_1, y_2, ..., y_m at times t_1, t_2, ..., t_m, respectively. For example, he or she may be measuring unemployment at various times during some period. Suppose that the data (t_1, y_1), (t_2, y_2), ..., (t_m, y_m) are plotted as points in the plane. (See Figure 6.3.)

From this plot, the experimenter feels that there exists an essentially linear relationship between y and t, say y = ct + d, and would like to find the constants c and d so that the line y = ct + d represents the best possible fit to the data collected. One such estimate of fit is to calculate the error E that represents the sum of the squares of the vertical distances from the points to the line; that is,
E = Σ_{i=1}^{m} (y_i − ct_i − d)².

[Figure 6.3: the data points and the least squares line y = ct + d.]

Thus the problem is reduced to finding the constants c and d that minimize E. (For this reason the line y = ct + d is called the least squares line.) If we let
A = [ t_1 1; t_2 1; ...; t_m 1 ],  x = (c, d),  and  y = (y_1, y_2, ..., y_m),
regarded as column vectors, then it follows that E = ||y − Ax||².
We develop a general method for finding an explicit vector x_0 ∈ Fⁿ that minimizes E; that is, given an m × n matrix A, we find x_0 ∈ Fⁿ such that ||y − Ax_0|| ≤ ||y − Ax|| for all vectors x ∈ Fⁿ. This method not only allows us to find the linear function that best fits the data, but also, for any positive integer k, the best fit using a polynomial of degree at most k.

First, we need some notation and two simple lemmas. For x, y ∈ Fⁿ, let ⟨x, y⟩_n denote the standard inner product of x and y in Fⁿ. Recall that if x and y are regarded as column vectors, then ⟨x, y⟩_n = y*x.

Lemma 1. Let A ∈ M_{m×n}(F), x ∈ Fⁿ, and y ∈ Fᵐ. Then
⟨Ax, y⟩_m = ⟨x, A*y⟩_n.

Proof. By a generalization of the corollary to Theorem 6.11 (see Exercise 5(b)), we have
⟨Ax, y⟩_m = y*(Ax) = (y*A)x = (A*y)*x = ⟨x, A*y⟩_n.

Lemma 2. Let A ∈ M_{m×n}(F). Then rank(A*A) = rank(A).

Proof. By the dimension theorem, we need only show that, for x ∈ Fⁿ, we have A*Ax = 0 if and only if Ax = 0. Clearly, Ax = 0 implies that A*Ax = 0. So assume that A*Ax = 0. Then
0 = ⟨A*Ax, x⟩_n = ⟨Ax, A**x⟩_m = ⟨Ax, Ax⟩_m,
so that Ax = 0.

Corollary. If A is an m × n matrix such that rank(A) = n, then A*A is invertible.

Now let A be an m × n matrix and y ∈ Fᵐ. Define W = {Ax: x ∈ Fⁿ}; that is, W = R(L_A). By the corollary to Theorem 6.6 (p. 350), there exists a unique vector in W that is closest to y. Call this vector Ax_0, where x_0 ∈ Fⁿ. Then ||Ax_0 − y|| ≤ ||Ax − y|| for all x ∈ Fⁿ; so x_0 has the property that E = ||Ax_0 − y|| is minimal, as desired.
To develop a practical method for finding such an x_0, we note from Theorem 6.6 and its corollary that Ax_0 − y ∈ W⊥; so ⟨Ax, Ax_0 − y⟩_m = 0 for all x ∈ Fⁿ. Thus, by Lemma 1, we have that ⟨x, A*(Ax_0 − y)⟩_n = 0 for all x ∈ Fⁿ; that is, A*(Ax_0 − y) = 0. So we need only find a solution x_0 to A*Ax = A*y. If, in addition, we assume that rank(A) = n, then by Lemma 2 we have x_0 = (A*A)⁻¹A*y. We summarize this discussion in the following theorem.

Theorem 6.12. Let A ∈ M_{m×n}(F) and y ∈ Fᵐ. Then there exists x_0 ∈ Fⁿ such that (A*A)x_0 = A*y and ||Ax_0 − y|| ≤ ||Ax − y|| for all x ∈ Fⁿ. Furthermore, if rank(A) = n, then x_0 = (A*A)⁻¹A*y.

To return to our experimenter, let us suppose that the data collected are (1, 2), (2, 3), (3, 5), and (4, 7). Then
A = [ 1 1; 2 1; 3 1; 4 1 ]  and  y = (2, 3, 5, 7),
so that
A*A = [ 30 10; 10 4 ]  and  (A*A)⁻¹ = (1/20)[ 4 −10; −10 30 ].
Therefore
(c, d) = x_0 = (A*A)⁻¹A*y = (1.7, 0).
It follows that the line y = 1.7t is the least squares line. The error may be computed directly as E = ||Ax_0 − y||² = 0.3.

In practice, the m × 2 matrix A in our least squares application has rank equal to two, and hence A*A is invertible by the corollary to Lemma 2. For, otherwise, the first column of A is a multiple of the second column, which consists only of ones. But this would occur only if the experimenter collects all the data at exactly one time.
Suppose that the experimenter chose the times t_i (1 ≤ i ≤ m) to satisfy Σ_{i} t_i = 0. Then the two columns of A would be orthogonal, so A*A would be a diagonal matrix (see Exercise 19). In this case, the computations are greatly simplified.
Finally, the method above may also be applied if, for some k, the experimenter wants to fit a polynomial of degree at most k to the data. For instance, if a polynomial y = ct² + dt + e of degree at most 2 is desired, the appropriate model is
x = (c, d, e),  y = (y_1, y_2, ..., y_m),  and  A = [ t_1² t_1 1; t_2² t_2 1; ...; t_m² t_m 1 ].
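The computation above can be reproduced directly from Theorem 6.12. The following Python sketch is an illustration only (not from the text); it sets up the normal equations A*Ax_0 = A*y for the data (1, 2), (2, 3), (3, 5), (4, 7) and solves them, and it also shows that numpy's built-in least squares routine returns the same x_0.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
A = np.column_stack([t, np.ones_like(t)])     # columns: t_i and 1

# solve the normal equations (A*A) x0 = A* y  (Theorem 6.12)
x0 = np.linalg.solve(A.T @ A, A.T @ y)
E = np.sum((A @ x0 - y) ** 2)
print(x0, E)                                   # approximately (1.7, 0) and E = 0.3

# np.linalg.lstsq minimizes ||Ax - y|| directly and agrees with x0
print(np.linalg.lstsq(A, y, rcond=None)[0])
```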

Minimal Solutions to Systems of Linear Equations

Even when a system of linear equations Ax = b is consistent, there may be no unique solution. In such cases, it may be desirable to find a solution of minimal norm. A solution s to Ax = b is called a minimal solution if ||s|| ≤ ||u|| for all other solutions u. The next theorem assures that every consistent system of linear equations has a unique minimal solution and provides a method for computing it.

Theorem 6.13. Let A ∈ M_{m×n}(F) and b ∈ Fᵐ. Suppose that Ax = b is consistent. Then the following statements are true.
(a) There exists exactly one minimal solution s of Ax = b, and s ∈ R(L_{A*}).
(b) The vector s is the only solution to Ax = b that lies in R(L_{A*}); that is, if u satisfies (AA*)u = b, then s = A*u.

Proof. (a) For simplicity of notation, we let W = R(L_{A*}) and W' = N(L_A). Let x be any solution to Ax = b. Then x = s + y for some s ∈ W and y ∈ W⊥. But W⊥ = W' by Exercise 12, and therefore b = Ax = As + Ay = As. So s is a solution to Ax = b that lies in W. To prove (a), we need only show that s is the unique minimal solution. Let v be any solution to Ax = b. By Theorem 3.9 (p. 172), we have that v = s + u, where u ∈ W'. Since s ∈ W, which equals W'⊥ by Exercise 12, we have
||v||² = ||s + u||² = ||s||² + ||u||² ≥ ||s||²
by Exercise 10 of Section 6.1. Thus ||s|| ≤ ||v||, and so s is a minimal solution. We can also see from the preceding calculation that if ||v|| = ||s||, then u = 0; hence v = s. Therefore s is the unique minimal solution to Ax = b, proving (a).
(b) Assume that v is also a solution to Ax = b that lies in W. Then
v − s ∈ W ∩ W' = W ∩ W⊥ = {0};
so v = s.
Finally, suppose that (AA*)u = b, and let v = A*u. Then v ∈ W and Av = b. Therefore s = v = A*u by the discussion above.

Example 3
Consider the system
x + 2y + z = 4
x − y + 2z = −11
x + 5y      = 19.

To find the minimal solution to this system, we must first find some solution u to AA*x = b. Now
AA* = [ 6 1 11; 1 6 −4; 11 −4 26 ],
so we consider the system
6x + y + 11z = 4
x + 6y − 4z = −11
11x − 4y + 26z = 19,
for which one solution is u = (1, −2, 0). (Any solution will suffice.) Hence
s = A*u = (−1, 4, −3)
is the minimal solution to the given system.
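Theorem 6.13(b) gives a direct recipe: solve (AA*)u = b and take s = A*u. The following Python sketch is an illustration only (not from the text); it carries this out for the system of Example 3 and checks the answer against the minimum-norm solution returned by numpy's least squares routine, which applies here because the system is consistent.

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [1.0, -1.0, 2.0],
              [1.0, 5.0, 0.0]])
b = np.array([4.0, -11.0, 19.0])

# Theorem 6.13(b): any solution u of (A A*) u = b gives the minimal solution s = A* u.
# A A* is singular here (rank(A) = 2), so lstsq is used to produce some solution u.
u = np.linalg.lstsq(A @ A.T, b, rcond=None)[0]
s = A.T @ u
print(s)                    # (-1, 4, -3), the minimal solution
print(A @ s)                # reproduces b

# applied to Ax = b itself, lstsq also returns the minimum-norm solution
print(np.linalg.lstsq(A, b, rcond=None)[0])
```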

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every linear operator has an adjoint.
(b) Every linear operator on V has the form x → (x, y).
(c) For every linear operator T on V and every ordered basis β for V, we have [T*]_β = ([T]_β)*.
(d) The adjoint of a linear operator is unique.
(e) For any linear operators T and U and scalars a and b, (aT + bU)* = aT* + bU*.
(f) For any n × n matrix A, we have (L_A)* = L_{A*}.
(g) For any linear operator T, we have (T*)* = T.

2. For each of the following inner product spaces V (over F) and linear transformations g: V → F, find a vector y such that g(x) = (x, y) for all x ∈ V.
(a) V = R³, g(a_1, a_2, a_3) = a_1 − 2a_2 + 4a_3
(b) V = C², g(z_1, z_2) = z_1 − 2iz_2
(c) V = P_2(R) with (f, h) = ∫_0^1 f(t)h(t) dt, g(f) = f(0) + f'(1)

3. For each of the following inner product spaces V and linear operators T on V, evaluate T* at the given vector in V.
(a) V = R², T(a, b) = (2a + b, a − 3b), x = (3, 5).
(b) V = C², T(z_1, z_2) = (2z_1 + iz_2, (1 − i)z_1), x = (3 − i, 1 + 2i).
(c) V = P_1(R) with (f, g) = ∫_{−1}^{1} f(t)g(t) dt, T(f) = f' + 3f, f(t) = 4 − 2t.

4. Complete the proof of Theorem 6.11.

5. (a) Complete the proof of the corollary to Theorem 6.11 by using Theorem 6.11, as in the proof of (c).
(b) State a result for nonsquare matrices that is analogous to the corollary to Theorem 6.11, and prove it using a matrix argument.

6. Let T be a linear operator on an inner product space V. Let U_1 = T + T* and U_2 = TT*. Prove that U_1 = U_1* and U_2 = U_2*.

7. Give an example of a linear operator T on an inner product space V such that N(T) ≠ N(T*).

8. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Prove that if T is invertible, then T* is invertible and (T*)⁻¹ = (T⁻¹)*.

9. Prove that if V = W ⊕ W⊥ and T is the projection on W along W⊥, then T = T*. Hint: Recall that N(T) = W⊥. (For definitions, see the exercises of Sections 1.3 and 2.1.)

10. Let T be a linear operator on an inner product space V. Prove that ||T(x)|| = ||x|| for all x ∈ V if and only if (T(x), T(y)) = (x, y) for all x, y ∈ V. Hint: Use Exercise 20 of Section 6.1.

11. For a linear operator T on an inner product space V, prove that T*T = T_0 implies T = T_0. Is the same result true if we assume that TT* = T_0?

12. Let V be an inner product space, and let T be a linear operator on V. Prove the following results.
(a) R(T*)⊥ = N(T).
(b) If V is finite-dimensional, then R(T*) = N(T)⊥. Hint: Use Exercise 13(c) of Section 6.2.

13. Let T be a linear operator on a finite-dimensional inner product space V. Prove the following results.
(a) N(T*T) = N(T). Deduce that rank(T*T) = rank(T).
(b) rank(T) = rank(T*). Deduce from (a) that rank(TT*) = rank(T).
(c) For any n × n matrix A, rank(A*A) = rank(AA*) = rank(A).

14. Let V be an inner product space, and let y, z ∈ V. Define T: V → V by T(x) = (x, y)z for all x ∈ V. First prove that T is linear. Then show that T* exists, and find an explicit expression for it.

The following definition is used in Exercises 15–17 and is an extension of the definition of the adjoint of a linear operator.

Definition. Let T: V → W be a linear transformation, where V and W are inner product spaces with inner products (·, ·)_1 and (·, ·)_2, respectively. A function T*: W → V is called an adjoint of T if (T(x), y)_2 = (x, T*(y))_1 for all x ∈ V and y ∈ W.

15. Let T: V → W be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products (·, ·)_1 and (·, ·)_2, respectively. Prove the following results.
(a) There is a unique adjoint T* of T, and T* is linear.
(b) If β and γ are orthonormal bases for V and W, respectively, then [T*]_γ^β = ([T]_β^γ)*.
(c) rank(T*) = rank(T).
(d) (T*(x), y)_1 = (x, T(y))_2 for all x ∈ W and y ∈ V.
(e) For all x ∈ V, T*T(x) = 0 if and only if T(x) = 0.

16. State and prove a result that extends the first four parts of Theorem 6.11 using the preceding definition.

17. Let T: V → W be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove that (R(T*))⊥ = N(T), using the preceding definition.

18. Let A be an n × n matrix. Prove that det(A*) is the complex conjugate of det(A).

19. Suppose that A is an m × n matrix in which no two columns are identical. Prove that A*A is a diagonal matrix if and only if every pair of columns of A is orthogonal.

20. For each of the sets of data that follows, use the least squares approximation to find the best fits with both (i) a linear function and (ii) a quadratic function. Compute the error E in both cases.
(a) {(−3, 9), (−2, 6), (0, 2), (1, 1)}

(b) {(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)}
(c) {(−2, 4), (−1, 3), (0, 1), (1, −1), (2, −3)}

21. In physics, Hooke's law states that (within certain limits) there is a linear relationship between the length x of a spring and the force y applied to (or exerted by) the spring. That is, y = cx + d, where c is called the spring constant. Use the following data to estimate the spring constant (the length is given in inches and the force is given in pounds).

Length x:  3.5   4.0   4.5   5.0
Force y:   1.0   2.2   2.8   4.3

22. Find the minimal solution to each of the following systems of linear equations.
(a) x + 2y − z = 12
(b) x + 2y − z = 1,  2x + 3y + z = 2,  4x + 7y − z = 4
(c) x + y − z = 0,  2x − y + z = 3,  x − y + z = 2
(d) x + y + z − w = 1,  2x − y + w = 1

23. Consider the problem of finding the least squares line y = ct + d corresponding to the m observations (t_1, y_1), (t_2, y_2), ..., (t_m, y_m).
(a) Show that the equation (A*A)x_0 = A*y of Theorem 6.12 takes the form of the normal equations:
(Σ_{i=1}^{m} t_i²) c + (Σ_{i=1}^{m} t_i) d = Σ_{i=1}^{m} t_i y_i  and  (Σ_{i=1}^{m} t_i) c + m d = Σ_{i=1}^{m} y_i.
These equations may also be obtained from the error E by setting the partial derivatives of E with respect to both c and d equal to zero.

(b) Use the second normal equation of (a) to show that the least squares line must pass through the center of mass, (t̄, ȳ), where
t̄ = (1/m) Σ_{i=1}^{m} t_i  and  ȳ = (1/m) Σ_{i=1}^{m} y_i.

24. Let V and {e_1, e_2, ...} be defined as in Exercise 23 of Section 6.2. Define T: V → V by
T(σ)(k) = Σ_{i=k}^{∞} σ(i)  for every positive integer k.
Notice that the infinite series in the definition of T converges because σ(i) ≠ 0 for only finitely many i.
(a) Prove that T is a linear operator on V.
(b) Prove that for any positive integer n, T(e_n) = Σ_{i=1}^{n} e_i.
(c) Prove that T has no adjoint. Hint: By way of contradiction, suppose that T* exists. Prove that for any positive integer n, T*(e_n)(k) ≠ 0 for infinitely many k.

6.4 NORMAL AND SELF-ADJOINT OPERATORS

We have seen the importance of diagonalizable operators in Chapter 5. For these operators, it is necessary and sufficient for the vector space V to possess a basis of eigenvectors. As V is an inner product space in this chapter, it is reasonable to seek conditions that guarantee that V has an orthonormal basis of eigenvectors. A very important result that helps achieve our goal is Schur's theorem (Theorem 6.14). The formulation that follows is in terms of linear operators. The next section contains the more familiar matrix form. We begin with a lemma.

Lemma. Let T be a linear operator on a finite-dimensional inner product space V. If T has an eigenvector, then so does T*.

Proof. Suppose that v is an eigenvector of T with corresponding eigenvalue λ. Then for any x ∈ V,
0 = (0, x) = ((T − λI)(v), x) = (v, (T − λI)*(x)) = (v, (T* − λ̄I)(x)),
and hence v is orthogonal to the range of T* − λ̄I. So T* − λ̄I is not onto and hence is not one-to-one. Thus T* − λ̄I has a nonzero null space, and any nonzero vector in this null space is an eigenvector of T* with corresponding eigenvalue λ̄.

Recall (see the exercises of Section 2.1) that a subspace W of V is said to be T-invariant if T(W) is contained in W. If W is T-invariant, we may define the restriction T_W: W → W by T_W(x) = T(x) for all x ∈ W. It is clear that T_W is a linear operator on W. Recall from Section 5.2 that a polynomial is said to split if it factors into linear polynomials.

Theorem 6.14 (Schur). Let T be a linear operator on a finite-dimensional inner product space V. Suppose that the characteristic polynomial of T splits. Then there exists an orthonormal basis β for V such that the matrix [T]_β is upper triangular.

Proof. The proof is by mathematical induction on the dimension n of V. The result is immediate if n = 1. So suppose that the result is true for linear operators on (n − 1)-dimensional inner product spaces whose characteristic polynomials split. By the lemma, we can assume that T* has a unit eigenvector z. Suppose that T*(z) = λz and that W = span({z}). We show that W⊥ is T-invariant. If y ∈ W⊥ and x = cz ∈ W, then
(T(y), x) = (T(y), cz) = (y, T*(cz)) = (y, cT*(z)) = (y, cλz) = c̄λ̄(y, z) = c̄λ̄(0) = 0.
So T(y) ∈ W⊥. It is easy to show (see Theorem 5.21 p. 314, or as a consequence of Exercise 6 of Section 4.4) that the characteristic polynomial of T_{W⊥} divides the characteristic polynomial of T and hence splits. By Theorem 6.7(c) (p. 352), dim(W⊥) = n − 1, so we may apply the induction hypothesis to T_{W⊥} and obtain an orthonormal basis γ of W⊥ such that [T_{W⊥}]_γ is upper triangular. Clearly, β = γ ∪ {z} is an orthonormal basis for V such that [T]_β is upper triangular.

We now return to our original goal of finding an orthonormal basis of eigenvectors of a linear operator T on a finite-dimensional inner product space V. Note that if such an orthonormal basis β exists, then [T]_β is a diagonal matrix, and hence [T*]_β = [T]_β* is also a diagonal matrix. Because diagonal matrices commute, we conclude that T and T* commute. Thus if V possesses an orthonormal basis of eigenvectors of T, then TT* = T*T.

Definitions. Let V be an inner product space, and let T be a linear operator on V. We say that T is normal if TT* = T*T. An n × n real or complex matrix A is normal if AA* = A*A.

It follows immediately from Theorem 6.10 (p. 359) that T is normal if and only if [T]_β is normal, where β is an orthonormal basis.

Example 1
Let T: R² → R² be rotation by θ, where 0 < θ < π. The matrix representation of T in the standard ordered basis is given by
A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ].
Note that AA* = I = A*A; so A, and hence T, is normal.

Clearly, the operator T in Example 1 does not even possess one eigenvector. So in the case of a real inner product space, we see that normality is not sufficient to guarantee an orthonormal basis of eigenvectors. All is not lost, however. We show that normality suffices if V is a complex inner product space.

Example 2
Suppose that A is a real skew-symmetric matrix; that is, Aᵗ = −A. Then A is normal because both AAᵗ and AᵗA are equal to −A².

Before we prove the promised result for normal operators, we need some general properties of normal operators.

Theorem 6.15. Let V be an inner product space, and let T be a normal operator on V. Then the following statements are true.
(a) ||T(x)|| = ||T*(x)|| for all x ∈ V.
(b) T − cI is normal for every c ∈ F.
(c) If x is an eigenvector of T, then x is also an eigenvector of T*. In fact, if T(x) = λx, then T*(x) = λ̄x.
(d) If λ_1 and λ_2 are distinct eigenvalues of T with corresponding eigenvectors x_1 and x_2, then x_1 and x_2 are orthogonal.

Proof. (a) For any x ∈ V, we have
||T(x)||² = (T(x), T(x)) = (T*T(x), x) = (TT*(x), x) = (T*(x), T*(x)) = ||T*(x)||².
The proof of (b) is left as an exercise.
(c) Suppose that T(x) = λx for some x ∈ V. Let U = T − λI. Then U(x) = 0, and U is normal by (b). Thus (a) implies that
0 = ||U(x)|| = ||U*(x)|| = ||(T* − λ̄I)(x)|| = ||T*(x) − λ̄x||.
Hence T*(x) = λ̄x. So x is an eigenvector of T*.
(d) Let λ_1 and λ_2 be distinct eigenvalues of T with corresponding eigenvectors x_1 and x_2. Then, using (c), we have
λ_1(x_1, x_2) = (λ_1x_1, x_2) = (T(x_1), x_2) = (x_1, T*(x_2)) = (x_1, λ̄_2x_2) = λ_2(x_1, x_2).

Since λ_1 ≠ λ_2, we conclude that (x_1, x_2) = 0.

Theorem 6.16. Let T be a linear operator on a finite-dimensional complex inner product space V. Then T is normal if and only if there exists an orthonormal basis for V consisting of eigenvectors of T.

Proof. Suppose that T is normal. By the fundamental theorem of algebra (Theorem D.4), the characteristic polynomial of T splits. So we may apply Schur's theorem to obtain an orthonormal basis β = {v_1, v_2, ..., v_n} for V such that [T]_β = A is upper triangular. We know that v_1 is an eigenvector of T because A is upper triangular. Assume that v_1, v_2, ..., v_{k−1} are eigenvectors of T. We claim that v_k is also an eigenvector of T. It then follows by mathematical induction on k that all of the v_i's are eigenvectors of T.
Consider any j < k, and let λ_j denote the eigenvalue of T corresponding to v_j. By Theorem 6.15, T*(v_j) = λ̄_j v_j. Since A is upper triangular,
T(v_k) = A_{1k}v_1 + A_{2k}v_2 + ··· + A_{jk}v_j + ··· + A_{kk}v_k.
Furthermore, by the corollary to Theorem 6.5 (p. 347),
A_{jk} = (T(v_k), v_j) = (v_k, T*(v_j)) = (v_k, λ̄_j v_j) = λ_j (v_k, v_j) = 0.
It follows that T(v_k) = A_{kk}v_k, and hence v_k is an eigenvector of T. So by induction, all the vectors in β are eigenvectors of T.
The converse was already proved on page 370.

Interestingly, as the next example shows, Theorem 6.16 does not extend to infinite-dimensional complex inner product spaces.

Example 3
Consider the inner product space H with the orthonormal set S from Example 9 in Section 6.1. Let V = span(S), and let T and U be the linear operators on V defined by T(f) = f_1 f and U(f) = f_{−1} f. Then
T(f_n) = f_{n+1}  and  U(f_n) = f_{n−1}
for all integers n. Thus
(T(f_m), f_n) = (f_{m+1}, f_n) = δ_{(m+1),n} = δ_{m,(n−1)} = (f_m, f_{n−1}) = (f_m, U(f_n)).
It follows that U = T*. Furthermore, TT* = I = T*T; so T is normal.
We show that T has no eigenvectors. Suppose that f is an eigenvector of T, say, T(f) = λf for some λ. Since V equals the span of S, we may write
f = Σ_{i} a_i f_i,  where a_m ≠ 0.
Hence
Σ_{i} a_i f_{i+1} = T(f) = λf = λ Σ_{i} a_i f_i.
Since a_m ≠ 0, we can write f_{m+1} as a linear combination of the other f_i appearing in this equation. But this is a contradiction because S is linearly independent.
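For a concrete finite-dimensional illustration of Theorem 6.16 (not from the text): the rotation matrix of Example 1 with θ = π/2, viewed as a matrix over C, is normal but not self-adjoint, and its eigenvectors for the distinct eigenvalues ±i form an orthonormal basis for C². The following Python sketch checks this numerically.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])             # rotation by pi/2; normal, not self-adjoint

print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

# over C the characteristic polynomial splits, and Theorem 6.16 applies
eigenvalues, V = np.linalg.eig(A)       # columns of V are unit eigenvectors
print(eigenvalues)                       # i and -i
print(np.allclose(V.conj().T @ V, np.eye(2)))   # True: the eigenvectors are orthonormal
```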

Example 1 illustrates that normality is not sufficient to guarantee the existence of an orthonormal basis of eigenvectors for real inner product spaces. For real inner product spaces, we must replace normality by the stronger condition that T = T* in order to guarantee such a basis.

Definitions. Let T be a linear operator on an inner product space V. We say that T is self-adjoint (Hermitian) if T = T*. An n × n real or complex matrix A is self-adjoint (Hermitian) if A = A*.

It follows immediately that if β is an orthonormal basis, then T is self-adjoint if and only if [T]_β is self-adjoint. For real matrices, this condition reduces to the requirement that A be symmetric.
Before we state our main result for self-adjoint operators, we need some preliminary work.
By definition, a linear operator on a real inner product space has only real eigenvalues. The lemma that follows shows that the same can be said for self-adjoint operators on a complex inner product space. Similarly, the characteristic polynomial of every linear operator on a complex inner product space splits, and the same is true for self-adjoint operators on a real inner product space.

Lemma. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Then
(a) Every eigenvalue of T is real.
(b) Suppose that V is a real inner product space. Then the characteristic polynomial of T splits.

Proof. (a) Suppose that T(x) = λx for x ≠ 0. Because a self-adjoint operator is also normal, we can apply Theorem 6.15(c) to obtain
λx = T(x) = T*(x) = λ̄x.
So λ = λ̄; that is, λ is real.
(b) Let n = dim(V), β be an orthonormal basis for V, and A = [T]_β. Then A is self-adjoint. Let T_A be the linear operator on Cⁿ defined by T_A(x) = Ax for all x ∈ Cⁿ. Note that T_A is self-adjoint because [T_A]_γ = A, where γ is the standard ordered (orthonormal) basis for Cⁿ. So, by (a), the eigenvalues of T_A are real. By the fundamental theorem of algebra, the characteristic polynomial of T_A splits into factors of the form t − λ. Since each λ is real, the characteristic polynomial splits over R. But T_A has the same characteristic polynomial as A, which has the same characteristic polynomial as T. Therefore the characteristic polynomial of T splits.

We are now able to establish one of the major results of this chapter.

Theorem 6.17. Let T be a linear operator on a finite-dimensional real inner product space V. Then T is self-adjoint if and only if there exists an orthonormal basis β for V consisting of eigenvectors of T.

Proof. Suppose that T is self-adjoint. By the lemma, we may apply Schur's theorem to obtain an orthonormal basis β for V such that the matrix A = [T]_β is upper triangular. But
A* = [T]_β* = [T*]_β = [T]_β = A.
So A and A* are both upper triangular, and therefore A is a diagonal matrix. Thus β must consist of eigenvectors of T.
The converse is left as an exercise.

Theorem 6.17 is used extensively in many areas of mathematics and statistics. We restate this theorem in matrix form in the next section.

Example 4
As we noted earlier, real symmetric matrices are self-adjoint, and self-adjoint matrices are normal. The following matrix A is complex and symmetric:
A = [ i  i ]        A* = [ −i  −i ]
    [ i  1 ],             [ −i   1 ].
But A is not normal, because (AA*)_{12} = 1 + i and (A*A)_{12} = 1 − i. Therefore complex symmetric matrices need not be normal.
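In matrix terms, Theorem 6.17 says that a real symmetric matrix has an orthonormal basis of eigenvectors with real eigenvalues. The following Python sketch is an illustration only (not from the text; the symmetric matrix is arbitrary); numpy.linalg.eigh is designed for self-adjoint matrices and returns exactly such a basis.

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])          # real symmetric, hence self-adjoint

eigenvalues, Q = np.linalg.eigh(A)          # columns of Q: orthonormal eigenvectors
print(eigenvalues)                           # real eigenvalues (part (a) of the lemma)
print(np.allclose(Q.T @ Q, np.eye(3)))       # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))   # True: [T]_beta is diagonal
```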

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every self-adjoint operator is normal.
(b) Operators and their adjoints have the same eigenvectors.
(c) If T is an operator on an inner product space V, then T is normal if and only if [T]_β is normal, where β is any ordered basis for V.
(d) A real or complex matrix A is normal if and only if L_A is normal.
(e) The eigenvalues of a self-adjoint operator must all be real.
(f) The identity and zero operators are self-adjoint.
(g) Every normal operator is diagonalizable.
(h) Every self-adjoint operator is diagonalizable.

2. For each linear operator T on an inner product space V, determine whether T is normal, self-adjoint, or neither. If possible, produce an orthonormal basis of eigenvectors of T for V and list the corresponding eigenvalues.
(a) V = R² and T is defined by T(a, b) = (2a − 2b, −2a + 5b).
(b) V = R³ and T is defined by T(a, b, c) = (−a + b, 5b, 4a − 2b + 5c).
(c) V = C² and T is defined by T(a, b) = (2a + ib, a + 2b).
(d) V = P_2(R) and T is defined by T(f) = f', where (f, g) = ∫_0^1 f(t)g(t) dt.
(e) V = M_{2×2}(R) and T is defined by T(A) = Aᵗ.
(f) V = M_{2×2}(R) and T is defined by T([a b; c d]) = [c d; a b].

3. Give an example of a linear operator T on R² and an ordered basis for R² that provides a counterexample to the statement in Exercise 1(c).

4. Let T and U be self-adjoint operators on an inner product space V. Prove that TU is self-adjoint if and only if TU = UT.

5. Prove (b) of Theorem 6.15.

6. Let V be a complex inner product space, and let T be a linear operator on V. Define
T_1 = (1/2)(T + T*)  and  T_2 = (1/(2i))(T − T*).
(a) Prove that T_1 and T_2 are self-adjoint and that T = T_1 + iT_2.
(b) Suppose also that T = U_1 + iU_2, where U_1 and U_2 are self-adjoint. Prove that U_1 = T_1 and U_2 = T_2.
(c) Prove that T is normal if and only if T_1T_2 = T_2T_1.

7. Let T be a linear operator on an inner product space V, and let W be a T-invariant subspace of V. Prove the following results.
(a) If T is self-adjoint, then T_W is self-adjoint.
(b) W⊥ is T*-invariant.
(c) If W is both T- and T*-invariant, then (T_W)* = (T*)_W.
(d) If W is both T- and T*-invariant and T is normal, then T_W is normal.

8. Let T be a normal operator on a finite-dimensional complex inner product space V, and let W be a subspace of V. Prove that if W is T-invariant, then W is also T*-invariant. Hint: Use Exercise 24 of Section 5.4.

9. Let T be a normal operator on a finite-dimensional inner product space V. Prove that N(T) = N(T*) and R(T) = R(T*). Hint: Use Theorem 6.15 and Exercise 12 of Section 6.3.

10. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Prove that for all x ∈ V,
||T(x) ± ix||² = ||T(x)||² + ||x||².
Deduce that T − iI is invertible and that [(T − iI)⁻¹]* = (T + iI)⁻¹.

11. Assume that T is a linear operator on a complex (not necessarily finite-dimensional) inner product space V with an adjoint T*. Prove the following results.
(a) If T is self-adjoint, then (T(x), x) is real for all x ∈ V.
(b) If T satisfies (T(x), x) = 0 for all x ∈ V, then T = T_0. Hint: Replace x by x + y and then by x + iy, and expand the resulting inner products.
(c) If (T(x), x) is real for all x ∈ V, then T = T*.

12. Let T be a normal operator on a finite-dimensional real inner product space V whose characteristic polynomial splits. Prove that V has an orthonormal basis of eigenvectors of T. Hence prove that T is self-adjoint.

13. An n × n real matrix A is said to be a Gramian matrix if there exists a real (square) matrix B such that A = BᵗB. Prove that A is a Gramian matrix if and only if A is symmetric and all of its eigenvalues are nonnegative. Hint: Apply Theorem 6.17 to T = L_A to obtain an orthonormal basis {v_1, v_2, ..., v_n} of eigenvectors with the associated eigenvalues λ_1, λ_2, ..., λ_n. Define the linear operator U by U(v_i) = √λ_i v_i.

14. Simultaneous Diagonalization. Let V be a finite-dimensional real inner product space, and let U and T be self-adjoint linear operators on V such that UT = TU. Prove that there exists an orthonormal basis for V consisting of vectors that are eigenvectors of both U and T. (The complex version of this result appears as Exercise 10 of Section 6.6.) Hint: For any eigenspace W = E_λ of T, we have that W is both T- and U-invariant. By Exercise 7, we have that W⊥ is both T- and U-invariant. Apply Theorem 6.17 and Theorem 6.6 (p. 350).

xn matrix A. i}) for j > 2.e2.\.T)(e . 16. . (The general case is proved in Section 5. Prove the Cayley Hamilton theorem for a complex n.both diagonal matrices. (a) T is positive definite [semidefinite] if and only if all of its eigenvalues are positive [nonnegative]. Because of (f). Let .. Definitions.4..j > 0 for all nonzero //. linite-dimeusional inner product space is called positive definite [positive scmidefinitc] if T is self-adjoint and (T(x).x) > 0 [(T(x).4 and B be symmetric n x n matrices such that AB = BA. A linear operator T on a. 17. .. That is. An ii x n matrix A with entries from R or (' is called positive definite positive semidefinite} HL. (d) If T and U are positive semidefinite operators such that T 2 — U2. ) G span({< h e 2 e. then T = U. . where {e. Use Exercise 14 to prove that there exists an orthogonal matrix P such that P'AP and PtBP an. (e) If T and U are positive definite operators such that TU = UT.4 is positive definite [semidefinite].jU.) The following definitions are used in Exercises 17 through 23.Sec. where 0 is an orthonormal basis for V. Prove the following results. and let A = [T]^. results analogous to items (a) through (d) hold for matrices as well as operators.4 [\Jormal and Self-Adjoint Operators 377 15.a n ).-tuples ( a i . . en} is the standard ordered basis for C". in which case /(*) = U(Au -t) i i Now if T = l_4.x) > 0] for all x f 0. . o 2 . (c) T is positive semidefinite if and only if A — B* 11 for some square matrix II. Hint: Use Schur's theorem to show that A may be assumed to be upper triangular. Let T and U be a self-adjoint linear operators on an //-dimensional inner product space V.. then TU is positive definite. (f) T is positive definite [semidefinite] if and only if . prove that f{A) — ().\ is positive definite [positive semideBmte]. if /'(/) is the characteristic polynomial of A.. 6. (b) T is positive definite if and only if y AjjO. we have (Ajj\ .

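As a numerical illustration of the Gramian-matrix exercise above and of item (c) of the positive-(semi)definite exercise (a real symmetric matrix with nonnegative eigenvalues can be written as B^t B): the sketch below assumes NumPy, and the matrix A is chosen only for illustration. The construction B = D^(1/2) P^t from an orthogonal diagonalization A = P D P^t is one standard choice, not the only one.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # an illustrative symmetric matrix

# Orthogonal diagonalization A = P D P^t (eigh returns orthonormal eigenvectors).
eigvals, P = np.linalg.eigh(A)
assert np.all(eigvals >= -1e-12)         # nonnegative eigenvalues, so A is Gramian

# One admissible B with A = B^t B: take B = D^(1/2) P^t.
B = np.diag(np.sqrt(eigvals)) @ P.T
assert np.allclose(A, B.T @ B)
```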
Hint: Let.y) defines another inner product on V.y)' = (T(x). and let (•. (c) T is positive definite. Let V be a. Hint: Show that UT is self-adjoint with respect to the inner product (x.y) for all x and y in V. • ) ' and U .). Prove the following results. •)• a n < l l°t T oe a positive definite linear operator on V. and define a matrix A by Ajj = (vj. (a) . . Let T2 = T r 1 U * . (•. Let V be an inner product space with inner product (•. 6 Inner Product Spaces 18.•)). 20. (b) If c > 0. •) the inner product on V with respect to which 0 is orthonormal (see Exercise 22(a) of Section 6.. 19.3. (See Exercise 15 of Section 6. and Ti the positive definite operator according to Exercise 22. •) be any other inner product on V. then cT is positive definite.vn} be an orthonormal basis for V with respect to (•. Lei T be the unique linear operator on V such that [T]. 21. This exercise provides a converse to Exercise 20. 22. Prove that there exist positive definite linear operators T| and T[ and self-adjoint linear operatorsT 2 and T 2 such that U = T 2 Tj = T'. repeal the argument with T _ ' in place of T.. Prove that there exists a unique linear operator T on V such that (x. Let U be a diagonalizable linear operator on a finite-dimensional inner product. and let T and U be self-adjoint operators on V such that T is positive definite. Let T and U be positive definite operators on an inner product space V. (a) T*T and TT* are positive semidefinite. Let V be a finitedimensional inner product space with inner product (•. where V and W are finitedimensional inner product spaces.) (b) rank(T*T) = rank(TT*) = rank(T).T ^ ' U ' T .v2.1). •) be the inner product associated with V. (b) Prove that the operator T of (a) is positive definite with respect to both inner products. Prove that both TU and UT are diagonalizable linear operators that have only real eigenvalues.Vi) for all i and j. Let T: V — > W be a linear transformation. (a) T + U is positive definite.j — A. •). (the adjoint is with respect to (•. {•.378 Chap.y). Hint: Let 0 = {vi. space V such that all of the eigenvalues of U are real.. To show that TU is self-adjoint. finite-dimensional inner product space.y) = (T(a. Prove that (x.T 2 . •). Prove the following results. 0 a basis of eigenvectors for U.y) = (T(x). 23. Show that U is self-adjoint with respect to ( • .

5 Unitary and Orthogonal Operators and Their Matrices 379 24. (b) Use Exercise 32 of Section 5. It should be noted that. Let 7 be the orthonormal basis for V obtained by applying the Gram Schmidt orthogonalization process to 0 and then normalizing the resulting vectors./. (a) Suppose that 0 is an ordered basis for V such that [T]. We will see that this condition guarantees. we study those lineai. we prove that. we continue our analogy between complex numbers and linear operators.|| for all x G V. If.operators T on an inner product space V such that TT* = T*T — I. these are the normal operators whose eigenvalues all have absolute value I. linear operators preserve the operations of vector addition and scalar multiplication. Definitions.^ is an upper triangular matrix. that T preserves the inner product. . we cail T a unitary operator if F — (' and an orthogonal operator if F — R. Clearly. In past chapters. Let T be a linear operator on a Unite-dimensional inner product space V (over F). in fact. We will see that these are precisely the linear operators that "preserve length" in the sense that ||T(x)|| = ||x|| for all x G V.z-)j| = ||. In particular. It is now natural to consider those linear operators T on an inner product space that preserve length.1 I. If ||T(.5 UNITARY AND ORTHOGONAL OPERATORS AND THEIR MATRICES In this section. This argument gives another proof of Schur's theorem.Sec. A complex number 2 has length 1 if zz = 1.11 p. Theorem 6. and isomorphisms preserve all the vector space structure. then the operator is called a unitary or orthogonal operator. on a finite-dimensional complex inner product space. In this section. 6. We study these operators in much more detail in Section 6. in addition. Let T be a linear opera!or on a finite dimensional inner product space V. in the infinite-dimensional case. an operator satisfying the preceding norm requirement is generally called an isornetry. linear operator acts similarly to the conjugate of a complex number (see.4 and (a) to obtain an alternate proof of Schur's theorem. the operator is onto (the condition guarantees one-to-one). we were interested in studying those functions that preserve the structure of the underlying space. 359). As another characterization. any rotation or reflection in R2 preserves length and hence is an orthogonal operator. Recall that the adjoint of a. 6. Prove that [T]7 is an upper triangular matrix. for example.

372) or 6.r)> = <. We prove first that (a) implies (b). 374).. Let 0 = {v\. so T(0) = {T(t . we may choose an orthonormal basis 0 for V consisting of eigenvectors of U. So T is a unitary operator. (a) TT* = T*T = I.T(vj)) = (vi.y G V. Let x G V. L e m m a .. 2 ) . From (a).)|| = ||z|| for all x € V. Thus 0 = (x..18. Proof By either Theorem 6. and let 0 — {v\.y G V. y) = (T(ar). U(ar)) = 0 for all x G V. we first prove a lemma.. we prove that (b) implies (c). (d) There exists an orthonormal basis 0 for V such that.18. and thus U = T„. x). Before proving the theorem.380 Example 1 Chap. Compare this lemma to Exercise 11(b) of Section 6... vn) be an orthonormal basis for V. Then the following statements are equivalent. then U(x) = Xx for some A.. If (x. 6 Inner Product Spaces Let h G H satisfy \h(x)\ = 1 for all x. Define the linear operator T on H by T ( / ) = hf.T(y)). so A = 0. (b) (T(x).. Let U be a self-adjoint operator on a finite-dimensional inner product space V. U(. Then l|T(/)|| 2 = \\hff = ±-J "h(t)f(t)Mjm)dt • = Wff since |/?(/)|2 = 1 for all t. Then (a. then U = T 0 . (c) If ft is an orthonormal basis for V.17 (p.Vj) — Sjj. then T(0) is an orthonormal basis for V. Let T be a linear operator on a finite-dimensional inner product space V. y) = (T*T(x).4.v2. 1 ). (e) ||T(:/. T(/3) is an orthonormal basis for V. Next we prove that (d) implies (e). Second.. Hence U(x) = 0 for all x G 0. Thus all the conditions above are equivalent to the definition of a unitary or orthogonal operator. I Proof of Theorem./.T(e„)}. Now E .T(/. Xx) = X (x. Theorem 6. Therefore T(p') is an orthonormal basis for V. vn}.T(y)) = (x. If x G 0. That (c) implies (d) is obvious. It follows that (T(Vi). 6.. it follows that unitary or orthogonal operators are normal.16 (p.y) for all x. Let x.v2.

r)|| = ||..T(x)) = (x.I'-'. we have To = U = I — T*T. Finally.. 6. Hence. i. that T(0) is also orthonormal. vn} such that T(v.| = 1 for all i. we obtain IITMIP = D«. T is orthogonal by Theorem 6.4 to conclude that TT* — I.v. we prove? that (e) implies (a).. . .A.18(a). by the lemma.X) — \\x\ = \\T(x)\\ = (T(x). so I Aj I = 1 for every i. T is self-adjoint. U(x)) = 0 for all x G V. 374). If T is self-adjoint. and so Ixll2 = ^2aiVi. we have I'M "INI = IIAjUiH = ||T(UJ)|| = \\vi\\. vn} such that T(vi) — Xji'i and |A. v2. Proof. In fact. 0 r1 and using the fact.i Hence ||T(.i2 i=\ 381 since 0 is orthonormal. Let U = I . Let T be a linear operator on a finite-dimensional real inner product space V. For any x G V. Thus (TT*)(vi) = T(Xii'i) = A. .17 (p. Then V has an orthonormal basis of eigenvectors of T with corresponding eigenvalues of absolute value 1 if and only ifT is both sell-adjoint and orthogonal. So TT* = I. It follows immediately from the definition that every eigenvalue of a unitary or orthogonal operator has absolute value 1. and again by Exercise 10 of Section 2.. and therefore T*T — I.0 for all .x||. Suppose that V has an orthonormal basis {?. then U is selfadjoint.T*T(x) So (re. then. .17. = X'jvj = Vi for each /".T*T.i) = A. • • •. we have 2 [X. even more is true.4. by Theorem 6. we may use Exercise 10 of Section 2. Corollary 1.J2aJvJ 3=1 i=i j=\ a ii i=i j=i Ei«.j for all ?'.Sec.'|..• G V.e. Since V is finite-dimensional. By Theorem 6. and (re. . we have that V possesses an orthonormal basis {v\. Applying the same manipulations to n T(x) = ] T > T ( . (I -T*T)0r)) . If T is also orthogonal. t ' 2 .5 Unitary and Orthogonal Operators and Their Matrices for some scalars a.

382 Chap. Example 3 Let T be a reflection of R2 about a line L through the origin. Example 2 Let T: R2 — > R2 be a rotation by 6 > . The fact that rotations by a fixed angle preserve perpendicularity not only can be seen geometrically but now follows from (b) of Theorem 6." that is. The proof is similar to the proof of Corollary 1. • Definition.Then T(?. where 0 < 0 < n.5.v2} is an orthonormal basis for R2. {vi. Finally. It is seen easily from the preceding matrix that T x is the rotation by —$. A linear operator T on R2 is called a reflection of R2 about L if T(x) = x for all x G L and T(x) — —x for all As an example of a reflection. Definitions.15 (p. Select vectors v\ G L and v2 G L1.i) = v\ and T(v2) — —v2. It is clear geometrically that T "preserves length. • We now examine the matrices that represent unitary and orthogonal transformations. Furthermore. that ||T(:r)|| = \\x\\ for all x G R2. It follows that T is an orthogonal operator by Corollary 1 to Theorem 6. We show that T is an orthogonal operator. an inspection of the matrix representation of T with respect to the standard ordered basis.such that \\vi || — \\v2\\ = 1. Perhaps the fact that such a transformation preserves the inner product is not so obvious: however. As we mentioned earlier. consider the operator defined in Example 3 of Section 2.18. respectively. wThich is cos 0 sin 6 — sin i cos i j| reveals that T is not self-adjoint for the given restriction on 9. 371). Thus v\ and v2 are eigenvectors of T with corresponding eigenvalues 1 and — 1. Let L be a one-dimensional subspace of R2. we obtain this fact from (b) also. matrix if . We may view L as a line in the plane through the origin. Then V has an orthonormal basis of eigenvectors ofT with corresponding eigenvalues of absolute value 1 if and only ifT is unitary: Proof. 6 Inner Product Spaces Corollary 2. Let T be a linear operator on a finite-dimensional complex inner product space V. A square matrix A is called an orthogonal A* A = AAl = I and unitary if A* A = A A* = I.18. this fact also follows from the geometric observation that T has no eigenvectors and from Theorem 6.

v\ G L.. Suppose that a is the angle from the positive x-sxis to L. A is an orthogonal matrix. and let A = [T]g. cos a). we have [T]7 = [lA]y = Let Q = cos a sin a — sin a cos a ly the corollary to Theorem 2.i|| = ||v2|| = 1. 359) that a linear operator T on an inner product space V is unitary [orthogonal] if and only if [T]p is unitary [orthogonal] for some orthonormal basis 0 for V. Similarly. Let v\ — (cos a.23 (p. • v . 115). It also follows from the definition above and from Theorem 6. We describe A. we call A orthogonal rather than unitary. Example 5 Let T be a reflection of R2 about a line L through the origin.v2} is an orthonormal basis for R2. Example 4 From Example 2. Then ||i.Sec. Note that the condition AA* = / is equivalent to the statement that the rows of A form an orthonormal basis for F n because Si. sin a) and t'2 = (—sin a. One can easily see that the rows of the matrix form an orthonormal basis for R .10 (p. A= Q[lA\1Q-1 . let 0 be the standard ordered basis for R2. A similar remark can be made about the columns of A and the condition A* A = I. In this case. the columns of the matrix form an orthonormal basis for R2. and v2 G L 1 . a real unitary matrix is also orthogonal.5 Unitary and Orthogonal Operators and Their Matrices 383 Since for a real matrix A we have A* = A1. 6. Since T is an orthogonal operator and 0 is an orthonormal basis. k-\ k=i and the last term represents the inner product of the ith and jfth rows of A. Because T(fi) = v\ and T(v2) = —v2. Hence 7 = {v\. Then T = L^. = hj = {AA*)ij = J2Aik(A*)kj = Y^AikAjk. the matrix cos 9 sin 9 — sin 9 cos 9 is clearly orthogonal.

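A quick numerical check of the remarks above (the rows and the columns of an orthogonal matrix form orthonormal bases, and such a matrix preserves length), applied to the rotation matrix of Example 4; NumPy is assumed, and the angle and test vector are arbitrary.

```python
import numpy as np

theta = 0.7                                      # any angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # the rotation matrix of Example 4

# A A^t = A^t A = I: the rows and the columns are orthonormal bases for R^2.
assert np.allclose(A @ A.T, np.eye(2))
assert np.allclose(A.T @ A, np.eye(2))

# Length preservation: ||Ax|| = ||x||.
x = np.array([3.0, -4.0])
assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))
```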
sin of) We know that. it follows that Q is unitary [orthogonal .384 cos a sin o shm\ f\ cos a J l o 0 -I Chap. Let A be a complex n x n matrix. Let A be a real n x /.19 and is left as an exercise. The preceding paragraph has proved half of each of the next two theorems. we say that A is unitarily equivalent orthogonally equivalent. however. Similarly. A*A = P*D*DP. „(li)\. . Then A is symmetric if and only if A is orthogonally equivalent to a real diagonal matrix. then A is normal. 6 Inner Product Spaces C O S Q sin a — sine* COSQ cos. where P is a unitary matrix and I) is a diagonal matrix.20.19. there exists an orthonormal basis Q for F" consisting of eigenvectors of A. By the corollary to Theorem 2. Since D is a diagonal matrix. Thus AA* = A*A. we have DD* =D*D. It is easily seen (see Exercise 18) that this relation is an equivalence relation on M„.23 (p. Then A is normal if and only if A is unitarily equivalent to a diagonal matrix.4. the matrix Q whose columns are the vectors in 3 is such that D — Q AC). I Proof. Proof. But since t he columns of Q are an orl lionoiinal basis for F". 115).a sin a 2sinacosa sin 2o j sin a coso — (cos o. 1 Example 6 Let . More generally. In this case.(P*DP){P*D*P) P*DID*P P*DD*P.„(C) [M„. Suppose that A — P*DP.4 is similar to a diagonal matrix D. By the preceding remarks. The proof is similar to the proof of Theorem 6. Hence . matrix. we need only prove that if A is unitarily equivalent to a diagonal matrix. Then AA* = (P*DP)(P*DP)* .4 and II are unitarily equivalent [orthogonally equivalent] if and only if there exists a unitary [orthogonal] matrix P such thai A P' IIP. Theorem 6. I Theorem 6. for a complex normal real symmetric matrix . to D.

Notice that (1. Let V be a real inner product space./•(•'') -f(y)\\ = \\x-y\ .5 Unitary and Orthogonal Operators and Their Matrices 385 Since A is symmetric. then A is orthogonally equivalent to a real upper triangular matrix.14 p.0). 1)} is a basis for the eigenspace corresponding to 2.U l . As it is the matrix form of Schur's theorem. 1 . Rigid Motions* The purpose of this application is to characterize the so-called rigid motions of a finite-dimensional real inner product space.1. It is easy to show that the eigenvalues of . Definition. To find P.15(d) (p.21 (Schur).4 are 2 and 8.1.1) is orthogonal to the preceding two vectors. The key requirement for such a transformation is that it preserves distances. Taking the union of these two bases and normalizing the vectors. -2)}. Theorem 6. the next "result is immediate. as predicted by Theorem 6. l .0).0 0 8 Because of Schur's theorem (Theorem 6. (b) If F — R.Sec. The set {(-1.20 tells us that A is orthogonally equivalent to a diagonal matrix. Because this set is not orthogonal. we obtain an orthonormal basis of eigenvectors. 1. . we also refer to it as Schur's theorem. .0.-^(1. then A is unitarily equivalent to a complex upper triangular matrix.1 ( 1 . (a) If F = ('. V2 v0 v3 J Thus one possible choice for P is I p = 0 1 i 2 \/6 I v •':. l ) ) . hence the term rigid. We find an orthogonal matrix P and a diagonal matrix D such that PtAP= I).1)} is a basis for the eigenspace corresponding to 8.1.-2). 6. l v/5/ '2 0 if I0 2 0 .1. 370). The set {(1. we apply the Gram Schmidt process to obtain the orthogonal set {(-1.0). 371). A function f: V — *V is called a rigid motion if !. we obtain the following orthonormal basis for R'5 consisting of eigenvectors of A: 4 ( 1. Theorem 6. Let A C M„ X „(F) be a matrix whose characteristic polynomial splits over F. One may think intuitively of such a motion as a transformation that does not affect the shape of a figure under ifs action. (-1.1.

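The computation of Example 6 can be verified numerically. The matrix there is determined by its eigenvalues 2, 2, 8 and the eigenvectors found above; it equals 2I + 2J, with J the all-ones matrix, which is the matrix used in the sketch below (NumPy assumed).

```python
import numpy as np

# The symmetric matrix of Example 6 (equal to 2I + 2J, J the all-ones matrix).
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

eigvals, P = np.linalg.eigh(A)           # columns of P are orthonormal eigenvectors
D = np.diag(eigvals)

assert np.allclose(P.T @ P, np.eye(3))   # P is an orthogonal matrix
assert np.allclose(P.T @ A @ P, D)       # A is orthogonally equivalent to D (Theorem 6.20)
print(np.round(eigvals, 10))             # [2. 2. 8.], as in the example
```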
where V is a real inner product. any orthogonal operator on a finite-dimensional real inner product space is a rigid motion. Any orthogonal operator is a special case of this composite. Observe that T is the composite of / and the translation by —f(0).aT(y)\\2 = \\[T(x + ay) . is called a translation if there exists a vector e„ G V such that g(x) = X + VQ for all x G V.r) . ||T(.T(x) . Then ||T(x + ay) . Any translation is also a special case.T{y)\\2 = \\x . But ||T(x) . space. for any x G V ||T(x)|| 2 = !|/(.T(y)) + ||y| and \\x . y) for all x. Let T: V V be defined by T{x)=f(x)-f{0) for all x G V. as well as composites of rigid motions on a real inner product space.y G V. and let a G R.T(x)] . 6 Inner Product Spaces For example. We show that T is an orthogonal operator. y G V. every rigid motion on V may be characterized in this way.y\\2: so (T(x). (See Exercise 22.aT(y)|| 2 .y\\2 = ||x|| 2 . hence T is a rigid motion. Then there exists a unique orthogonal operator T on V and a unique translation g on V such that f = g o T.y) + \\y\\2. A function g: V — V.22. y G V.T(y)) + ||T(y)|| 2 2<T(aO. where g is the translation by f(0). Another class of rigid motions are the translations. It is a simple exercise to show that translations. in which the orthogonal operator is the identity operator. from which it follows that / = g o T. Let x. We are now in a position to show that T is a linear transformation.r) -f(0)\\2 = \\x-0\[2 = \\x\\2. Chap. Remarkably. y € V. We say that g is the translation by VQ.T(y)\\2 = ||T(z)|| 2 .2 (T(x). T h e o r e m 6. Proof.) Thus an orthogonal operator on a finitedimensional real inner product space V followed by a translation on V is a rigid motion on V. Let f: V — • V be a rigid motion on a Hnite-dimcnsional real inner product space V. are also rigid motions.386 for all x. in which the translation is by 0. T(y)) = (x. Thus for any x. Furthermore.2 (x. and consequently ||T(x)|| = ||x|| for any x G V.

1 1. -cos6>).T(y)> - (T(z). I Orthogonal Operators on R2 Because of Theorem 6. and hence T -. T h e o r e m 6..y)] for all x G V. and hence the translation is unique. an understanding of rigid motions requires a characterization of orthogonal operators. and det(A) = —1. Let. suppose that I (e->) — .5 Unitary and Orthogonal Operators and Their Matrices 387 = ||T(ar + ay) . Then exactly one of the following conditions is satisfied: (a) T is a rotation..(.U. To prove uniqueness.4) = cos2 0 + sin2 9 = 1.23. Substituting x — 0 in the preceding equation yields W Q = Vn.„ . . reduces to T(x) = U(. Since T(e 2 ) is a unit vector and is orthogonal to T(ei).T(x). v ~' K ' \s\\\9 cosi It follows from Example 1 of Section 6. Since T also preserves inner products.r) + oT(y).cos0) r .18(c). 1 hen . T(y)> = \\(x + ay) . Also del. . .r) for all x G V. T is an orthogonal operator. .22.x\\2 + a2\\yf 2 2 2 2 . T(. T be an orthogonal operator on R .2a[{T(x + oy).4 —I . Either T(e 2 ) = (-sin0.ay) = T(. 0 < 9 < 2TT.4 that T is a rotation by the angle 9.(x.y) . suppose that un and VQ are in V and T and U are orthogonal operators on V such that f(x)=T(x)+u0 = U(x)+v0 2a2[\y\\2-2a[(x. Thus T(x -f. (b) T is a reflection about a line through the origin. _ .tf) = {T(ei).2a (T(x + ay) . „ „. This equation. .y)+a\\y\\2-(x. Because T is an orthogonal operator. there are only two possible choices for T(c2).T(e2)} is an orthonormal basis for R2 by Theorem 6. therefore. there is a unique angle 9. 6. . Proof. such that T(ei) = (cos 9. (e. Since T(ei) is a unit vector. sin 9).T(y))] = a \\y\\ + a \\y[\ . where 0 is the standard ordered basis for R2.os9 —sin i first..s i n 9.2a[(x + ay. and det(A) = 1. _ . cos 0). and let A = [T]^.T(x)\\2 + a2\\T(y)\\2 . result characterizes orthogonal operators on R2. . The next.Sec.y)] = = 0. and hence T is linear. We postpone the case of orthogonal operators on more general spaces to Section 6. or T(e 2 ) = (sin0.

4 is a reflection of R2 about a line L through the origin by Theorem 6. Corollary. Example 7 Let \/5 -1 75 A = 2 We showjthat LA is the reflection of R2 about a line L through the origin. and hence is the line with the equation y= Conic Sections As an application of Theorem 6. Thus L is the span of {?.1 .}. Now suppose that T(e 2 ) = (sin0. (2) '5. Then A = det(A) = .= . and therefore A is an orthogonal matrix. it suffices to find an eigenvector of L4 corresponding to 1. 6 Inner Product Spaces cos 9 sin 0" sin0 — cos0 Comparing this matrix to the matrix A of Example 5.1 . Any rigid motion on R2 is either a rotation followed by a translation or a reflection about a line through the origin followed by a translation.20.22 and 6. Alternatively.23.23.s i n 2 0 = . and then describe L.. we see that T is the reflection of R2 about a line L. Clearly AA* = A* A = I. \/5 — 1). Furthermore. One such vector is v = (2. we consider the quadratic equation ax2 + 2bxy + ay2 + dx + ey + f = 0. L is the line through the origin with slope (\/5 — l)/2.l . Hence LA is an orthogonal operator.388 Chap. Since L is the one-dimensional eigenspace corresponding to the eigenvalue I of L4.. 5 5 and thus L. Furthermore. | Combining Theorems 6. 1 4 det(>4 = . — cos0). we obtain the following characterization of rigid motions on R2. so that a = 0/2 is the angle from the positive x-axis to L.c o s 2 0 ..

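Before developing the conic-section application, here is a computational aside on Theorem 6.23 and Example 7. The helper below (a name introduced here, not from the text) classifies a 2 × 2 orthogonal matrix as a rotation (det = 1) or a reflection (det = −1) and reports the angle of rotation or the angle of the line of reflection; NumPy is assumed.

```python
import numpy as np

def classify_orthogonal(A, tol=1e-10):
    # Illustrative helper based on Theorem 6.23 (not part of the text).
    assert np.allclose(A.T @ A, np.eye(2), atol=tol), "A must be orthogonal"
    theta = np.arctan2(A[1, 0], A[0, 0])         # angle determined by the first column
    if np.isclose(np.linalg.det(A), 1.0):
        return "rotation", theta                 # rotation by theta
    return "reflection", theta / 2               # reflection about the line at angle theta/2

# The matrix of Example 7: A = (1/sqrt(5)) [[1, 2], [2, -1]].
A = np.array([[1.0, 2.0], [2.0, -1.0]]) / np.sqrt(5)
kind, angle = classify_orthogonal(A)
print(kind, np.tan(angle))                       # reflection, slope (sqrt(5) - 1)/2
```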
y) — » (x'. b = d = e = 0.y2 = 1 with center at the origin. The remaining conic sections. equivalently. the ellipse.5 Unitary and Orthogonal Operators and Their Matrices 389 For special choices of the coefficients in (2). we have by Theorem 6. namely.and y-terms. are obtained by other choices of the coefficients. If we let A = a b b e and X— For example. then our — 3.20.-2) in the .. This change of variable allows us to equation simplifies to ' .Sec.(y + 2)2 = 3.8. Now define X' = by X' = P'X or. by Theorem 6.y) —* (x'. We now concentrate solely on the elimination of the xy-term..y').r-t/-coordinate system. For example. since P is orthogonal. the quadratic then (3) may be written as X'AX = (AX. For example. and hyperbola. For. Quadratic forms are studied in more generality in Section 6. we may choose an orthogonal matrix P and a diagonal matrix D with real diagonal entries Ai and A2 such that P'AP = D. the equation x2 + 2x + y2 + Ay + 2 — 0 may be rewritten as (x + I) 2 4. we may interchange the columns . we obtain the various conic sections.X). form 3. (3) which is called the associated quadratic form of (2). which describes a circle with radius A/3 and center at (—1. If / > = 0. we obtain the circle x 4. and / = —1. Furthermore. 6.l\2 IV'\2 — eliminate the x. if a = c = 1.23 (with T = Lp) that det(P) = ±1. we consider the expression ax2 + 2bxy + cy2. Then XlAX = (PX')tA(PXl) = X'f(PtAP)X' = X''DX' = Xx{x')2 XiilJ >\2 Thus the transformation (x. parabola.r2 + Axy + 6y may be written as The fact that A is symmetric is crucial in our discussion. by PX' = PT>fX = X. where x' = x + 1 and y' = y + 2. then it is easy to graph the equation by the method of completing the square because the xy-term is absent. To accomplish this. If we consider the transformation of coordinates (x. If det(P) = —1. and hence in (2).y') allows us to eliminate the xy-term in (3).

So. consider the quadratic equation 2x2 \xy + ryy2 . by special choices of the coefficients. in the case // . 371). consequently.3 6 = 0./•')-' and (//')• are the eigenvalues of A = a b b e This result is a. The arguments above. Q'AQA2 0 0 A.Vv/57 /2_\ .•' follows that matrix P represents a rotation. the coefficients of (. we may always choose P so that det(P) = I. the same is true of the columns of Q. Axy I 5//2. of course. Notice that de\(Q) -det(P) = 1. 4 2 -9 -2 n so that the eigenvalues of A are 1 and 0 with associated eigenvectors As expected (from Theorem 6. the ellipsoid! the hyperbolic paraboloid. the . Therefore.22 (with T = L/-). etc. In summary. we can take Q for our new /'. Furl hermore.15(d) p. if det(P) 1.390 Chap. In the notation for which the associated quadratic form is 2x we have been using. where P is an orthogonal matrix and del (P) — 1.. 6 Inner Product Spaces of P to obtain a matrix Q.3. are easily extended to quadratic (•(Illations in n variables. we obtain the quadratic surfaces the elliptic cone. restatement of a result known as t he principal axis theort m for R2. For example. By Lemma4 to Theorem 6./•//-term in (2) may be eliminated by a rotation of the x-axis and //-axis to new axes x' and //' given by X PX'. The corresponding orthonormal basis of eigenvectors r i < . As an illustration of the preceding transformation. these vectors are orthogonal. Because the columns of / ' form an orthonormal basis of eigenvectors of .1.

There is another possibility .Sec. -7?2/ v-> 0 (i . Thus the original equation 2x2 — Axy \ by2 36 may be written in the form (. respectively. (See Figure 6.4.5 Unitary and Orthogonal Operators and Their Matrices determines new axes x' and y' as in Figure 6.) Note thai the preceding matrix / ' has the form cos 9 sin0 sin 0 cos0 where 9 = cos' 1 —= ~ 2(i.(i .and //-axes. Thus the change of variable X PX' can be accomplished by this rotation of the X.4 an ellipse.1. So /' is the matrix representation of a rotal ion v5 of R2 through the angle 0.and //'-axes in the directions of the first and second vectors ol 4. •'• .7 = * \/-> v I . 6.V + 4"'V5 y5 we have the new quadratic form (x') I (it//')2./•')"' M>(//)2 = 36 relative to a new coordinate system with the x'. Hence if (2 /' 1 V\ 5 -1\ 2 v'5 / 2 5 V 1 V -I 2 391 then P'AP I'nder the transformation X / ' A ' or 2 . It is clear I hat this equation represents Figure 6.

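A numerical check of the example above, in which the form 2x² − 4xy + 5y² corresponds to A = [[2, −2], [−2, 5]]; NumPy is assumed. The column sign adjustment forces det(P) = 1, so that X = PX′ is a rotation.

```python
import numpy as np

A = np.array([[2.0, -2.0],
              [-2.0, 5.0]])              # matrix of the form 2x^2 - 4xy + 5y^2

eigvals, P = np.linalg.eigh(A)           # eigenvalues 1 and 6, orthonormal eigenvectors
if np.linalg.det(P) < 0:
    P[:, 1] = -P[:, 1]                   # flip one column so that det(P) = 1 (a rotation)

# In the rotated coordinates the cross term disappears: the form becomes
# (x')^2 + 6(y')^2, so the conic is the ellipse (x')^2 + 6(y')^2 = 36.
assert np.allclose(P.T @ A @ P, np.diag(eigvals))
print(eigvals)                                   # [1. 6.]
print(np.degrees(np.arctan2(P[1, 0], P[0, 0])))  # rotation angle (sign depends on eigh)
```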
and the eigenvalues are interchanged. then the operator must be unitary or orthogonal. however. but interchanges the names of the x'. but not the inner product. (0) If all the eigenvalues of a linear operator are I. If two mat rices are unit arily equivalent. II the eigenvector of A corresponding to the eigenvalue 6 is taken to be (I. 2 I) I) 2 I I' 12 1 VI I 2 (C) 3-3. then we obtain the matrix /J_ -2 V. (i) A linear operator may preserve the norm. Prove that the composite ol unitary orthogonal operators is unitary ort hogonalb . (a) (b) (Q) (d) (e) (f) (g) Every unitary operator is normal.392 Chap. 2. which is the matrix representation of a rotation through the angle 9 9 63. for each of the following matrices . I. 2) instead of ( 1. This possibility produces the same ellipse as the VO one in figure 0.orthogonal operator is diagonalizable. Label the following statements as true or false.1. I . t hen t hey are also similar. The adjoint ol a unitary operator is unitary. If T is an orthogonal operator on V. EXERCISES 1. Even.and //'-axes. The sum ol unitary matrices is unitary.2). 6 Inner Product Spaces for P.75 _2_\ 1 75. find an orthogonal or unitary matrix /' and a diagonal matrix D such thai P"AP = D. Assume that the underlying inner product spaces are finite-dimensional.4°. A matrix is unitary if and only if it is invertible. 3-3/ 5 W (d) / 0 2 2\ 2 (l 2 \2 2 0/ (e) 3. then [T j is an orthogonal matrix for any ordered basis / for V.

(</) = 2U. Prove that T is a unitary operator if and only if \h(t)\ — I for 0 < t < 1. unitary operator on a.'s are the (not necessarily distinct) eigenvalues of A. then T has a unitary square. Prove that (T + /'I)(T — y'l)~' is unitary using Exercise 10 of Section 6.Sec.r(/l) = ] T X i i-i where the A. Prove that . Let U be a linear operator on a finite-dimensional inner product space V. and define T: V — V by T ( / ) = /.r)|| — ||. define T~: C — > C by T-. Characterize those z for which T a is normal. Let V be the inner product space of complex-valued continuous functions on [0. real symmetric or complex normal matrix. self-adjoint. must U be unitary? Justify your answer with a proof or a counterexample.|2.|| for all x in some orthonormal basis for V. 7. Let A be an n x /. For z 6 C./'.4. or unitary. 6. Let T be a self-adjoint linear operator on a finite-dimensional inner product space.. Let h C V. there exists a unitary operator U such that T = U ./.1] with the inner product </. Prove that if T is a.5 Unitary and Orthogonal Operators and Their Matrices 393 4. If ||U(.<?)= / f(t)gjt)dt. . 9. root: that is. 8. Which of the following pairs of matrices are unitarily equivalent? 0 1 1 0 and (b) 0 0 '0 and f 0 and and 6. 10. 5. finite-dimensional inner product space V. and tr(A*A) = ^ |A.

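A numerical illustration of the exercise above on (T + iI)(T − iI)⁻¹: for a self-adjoint T this product is unitary. NumPy is assumed, and the Hermitian matrix below is an arbitrary example, not one from the text.

```python
import numpy as np

T = np.array([[1.0, 2.0 - 1.0j],
              [2.0 + 1.0j, 3.0]])              # a self-adjoint (Hermitian) matrix
assert np.allclose(T, T.conj().T)

I = np.eye(2)
U = (T + 1j * I) @ np.linalg.inv(T - 1j * I)   # T - iI is invertible since T is self-adjoint

assert np.allclose(U.conj().T @ U, I)          # U is unitary
print(np.abs(np.linalg.eigvals(U)))            # every eigenvalue has absolute value 1
```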
.V2.U(r.y e W. I u2) = V\ . Let U be a unitary operator on an inner product space V. G W and v2 G W .) 15. Define U : V -V by U(e. self-adjoint unitary operator. Prove that U is a. Prove that a matrix that is both unitary and upper triangular must be a. (a) . 18. Prove the following results. A linear operator U on V is called a partial isometry if there exists a subspace W of V such that |!U(.r) 0 for all x G W . Suppose that U is such an operator and {v\.394 Chap. (b) {U(/']). Prove or disprove that A is similar to II if and only if A and 11 are unitarily equivalent. 14. and let W be a finite-dimensional U-invariant subspace of V. . r-> Uk} is an orthonormal basis for W. .r|| for all x C W and U(. ^).1.is not U-invariant.1.y) for all x.7 (p. 6 Inner Product Spaces 11.] ] A . V W : AN 1 . Prove that n (let 1. Show that "is unitarily equivalent to" is an equivalence relation on Mnxn(C).is U-invariant. Let V be a finite-dimensional inner product space. ('ontrast (b) with Exercise 16. 13. Suppose that . 20. Observe that W need not be U-invariant. Let A be an n x n real symmetric or complex normal matrix. where the A.'s are the (not necessarily distinct) eigenvalues of A. (See t he definit ions in the exercises in Section (i.U(//)) = (x.r)|| = ||. 12.3. Let W be a finite-dimensional subspace of an inner product space V. then A is positive definite [semidefinite] if and only if II is positive definite semidefinite]. (U(:r). Find an example of a unitary operator U on an inner product space and a U-invariant subspace W such thai W .) U(a-)} is an orthonormal basis for R(U). where e. Find an orthogonal matrix whose first row is ( | . 17. 16. Hint: Use Exercise 20 of Section 6. diagonal mat rix. Prove that if A and 11 are unitarily equivalent matrices. 352) and the exercises of Section 1. 19. | . By Theorem 0. Prove that (a) U(W) W: (b) W. 1.4 and 11 are diagonalizable matrices.

j = ] (c) Use (b) to show.y) = (x. then both UT and TU arc reflections about lines through the origin.. (b) If T is a rotation and U is a reflection about a line through the origin. E I^I2 = E •1.that the matrices 1 2 2\ i) . Let A and B be n x n matrices that. W]..and 0 = {U(ci).5 Unitary and Orthogonal Operators and Their Matrices 395 (c) There exists an orthonormal basis 7 for V such that the first k columns of [U]7 form an orthonormal set and the remaining columns are zero..iu-2.y G 0. Let T and U be orthogonal operators on R2. 22. (a) If T and U are both reflections about lines through the origin.23 to prove the following results. 6.3-] t. Then 0 is an orthonormal basis for V.. 21.22: If / : V — > V is a rigid motion on a finite-dimensional real inner product space V. U(e 2 ). then there exists a unique orthogonal operator T on V and a unique translation g on V such that f = T o g. 23..6.)) = v\ (1 < i < k) and T(v)i) = 0 (1 < i < 3). (a) Prove that any translation on V is a rigid motion.mm Sec. \J(vk). Prove the following variation of Theorem 6. Use Theorem 6. (e) Let T be the linear operator on V that satisfies T(U(t'. (d) Let {wi. Let V be a real inner product space.. (f) U* is a partial isometry. and T = U*..T(y)) for all x. m.. (b) Prove that the composite of any two rigid motions on V is a rigid motion on V. • • • • Wj} be an orthonormal basis for R(U)1. are unitarily equivalent. then UT is a rotation. Then T is well defined. This exercise is continued in Exercise 9 of Section 6. There are four cases. (a) Prove that tr(AM) = (b) Use (a) to prove that ti(B*B). 24.. Hint: Show that (D(x). Wj}.d I%I2- fi (l 4 1 are not unitarily equivalent. .

Prove A = QR. By Exercise 24. QR-Factorization.uo "„ be the orthonormal basis obtained by normalizing the r.2./-2 + 2xy + 3//2 x2 .1).. respectively.0). Suppose that T and U are orthogonal operators on R such that T is the rotation by the angle o and U is the reflection about the line L through the origin. show that A--1 U U ifk = I Vk\\Uk + ^ (wk. (c) Compute Q and R as in (b) for the 3 x 3 matrix whose columns are the vectors (1.w„ be linearly independent vectors in F".-. Let r be the angle from the positive .2.A2(j/7) . 26. Find its angle of rotation. by the Gram Schmidt process.4 is as defined in Exercise 2(e).396 Chap.0. Let Ui. (a) Solving (1) in Section 6. //. both UT and TU are reflections about lines l_i and L2. Find new coordinates x'. respectively.1.2 for Wk in terms of u&. 29. 27. z) and .z' so that the preceding expression is of the form A| (.Ay2 3.W2 a'. (a) (b) bind the angle 9 from the positive rc-axis to L|.r> rn be the orthogonal vectors obtained from W\. (2. Let wi. By Exercise 24.r-axis to L2.y'. UT is a rotation.r-axis to L. 6 Inner Product Spaces 25. . i) 3 [\<k< n).ll Rjk = { (wk.Uj) 0 ifj=* if j <k ifj>Ar. bind the angle 0 from the positive . Define /? G M„>„(F) by '•. through the origin..W2. (a) (b) (c) (d) (e) x2 + Axy + y2 2x2 + 2xy 4. . and let L'i. Consider the expression X AX. where X1 = (. Suppose that T and U are reflections of R2 about the respective lines L and L' through the origin and that 0 and v are the angles from the positive x-axis to L and L'. (b) Let A and Q denote the n x // mat rices in which the fcth columns are Wk and Uk.X->(y')2 + X%(z')2.r')2 4.y' so that the following quadratic forms can be written as Xi(x') -t.ry + y2 28./•. find a change of coordinates x'.1). and (2.'s.2\r x2 .Ylxy . respectively.

2x3 .2 (x. 30. 4.. where Q is unitary and />' is upper triangular. Hint: Use Exercise 17.9^ Sec. (Later. (a) H„ is linear. however. (d) H* = H„ and H2 = I. Decompose A to QR. we have shown that every invertible matrix is the product of a unitary [orthogonal] matrix and an upper triangular matrix.2ar2 4. Then QRx — b." then it is nearly as stable as the orthogonalization method. J.I I •''2 + X3 = . This last system can be easily solved since P is upper triangular. H„ is a reflection. 6. (Note: If V is a real inner product space. then in the language of Section 6.) . Prove that D = i?2/?j" is a unitary diagonal matrix. then (3 is orthonormal if and only i f ' is orthonormal. sj)<>ce.) 'At one time. II.5 Unitary and Orthogonal Operators and Their Matrices 397 (d) Since Q is unitary [orthogonal] and /? is upper triangular in (b).r•> ./•) = x . and hence H„ is a unitary [orthogonal] operator on V.P. Definition. and let u be a unit vector in V.I Xj • 2. The following definition is used in Exercises 31 and 32. because of its great stability. and hence Rx = Q'b. this method for solving large systems of linear equations with a computer was being advocated as a better method than Gaussian elimination even though it requires about three times as much work.where Q\-Qz £ M„ x „(/'') are unitary and P\.1 .r. G M„ x „(/'') are upper triangular. (b) H„(.11. Suppose that 3 and *) are ordered bases for an //-dimensional real [complex] inner product space V. Prove the following results.^-coordinates. Prove thai if Q is an orthogonal [unitary] ii x ii matrix that changes 7-coordinates into . (e) The QR factorization described in (b) provides an orthogonalization method for solving a linear system Ax — b when A is invertible. by the Gram Schmidt process or other means. 31. (c) Hu(u) = -u. it) 11 for all x C V. Use the orthogonalization method and (c) to solve the system . Define the Householder operator H„: V — V by H„ ('. Let V be a finite-dimensional complex [real] inner product. Let H„ be a Householder operator on a finite-dimensional inner product space V. Wilkinson showed that if Gaussian elimination is done "properly.r) — X if and only if X is orthogonal to u. Suppose that A G M n x n ( F ) is invertible and A = Q1Ri = Q>Il>.

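A sketch of the QR factorization described in the exercise above, and of the solution method in part (e), using classical Gram–Schmidt exactly as that exercise defines Q and R (not the numerically stabler variants). NumPy is assumed; the matrix and right-hand side below are illustrative and are not necessarily the ones appearing in parts (c) and (e).

```python
import numpy as np

def gram_schmidt_qr(A):
    # Q has orthonormal columns u_k; R is upper triangular with
    # R[j, k] = <w_k, u_j> for j < k and R[k, k] = ||v_k||, as in the exercise.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):
            R[j, k] = np.dot(A[:, k], Q[:, j])
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.array([[1.0, 2.0, 2.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])          # columns are linearly independent
Q, R = gram_schmidt_qr(A)
assert np.allclose(A, Q @ R)

# Part (e): solve Ax = b through the triangular system R x = Q* b.
b = np.array([1.0, 2.0, 3.0])
x = np.linalg.solve(R, Q.T @ b)
assert np.allclose(A @ x, b)
```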
with x 1 G Wi and x-z C WL>. Let V be an inner product space. prove that there exists a unit vector // in V such thai r\u(x) = y. where A.2. Because V .i.W. \\x — 9y\\ (b) If F = R.9y) is real. we have R ( T ) = W . it can be shown (see Exercise 17 of Section 2. 374) to develop an elegant representation of a normal (if /•' — C) or a self-adjoint (if F = R) operator T on a finite-dimensional inner product space.. we rely heavily on Theorems 6. Hint: Choose 9 so that (x.. Let x and y be linearly independent vectors in V such that ||cc|| = \\y\\. 6.r2. are the distinct eigenvalues of T and T | . Recall from the exercises of Section 2.R(T). T is uniquely determined by its range.6 (p. if V is finite-dimensional.3. r-Ax — 9y). 350). arc orthogonal project ions. We say that T is an orthogonal projection if R(T) = N(T) and N(T) 1 . ©W 2 = W.r) = x\. Definition. In the notation of Theorem 6." fn fact.A2T2 + • • • 1 A/. We must first develop some results about these special projections. projection if and only if T = = T2. Let V be a finite-dimensional inner product space over F. • W:i does not imply that W2 . Note that by Exercise 13(c) of Section 6.•) — 9y.398 Chap..17 (p. we need only assume that one of the preceding conditions holds.N(T). The special case where V is a direct sum of two subspaces is considered in the exercises of Section 1. we can define a function . if R(T) .6 ORTHOGONAL PROJECTIONS AND THE SPECTRAL THEOREM In this sect ion. We prove that T can be written in the form A| T[ 4. we have T(.1.. (a) If F — ('. = b e V : T(x) =x d N(T) = W->. For an orthogonal projection T. X> A/.. and set u — r.3) that T is a.16 (p. By Exercise 20 of Section 2.R(T)' ' = N(T) . Now assume that W is a. T/. prove that there exists a. unit vector u in V and a complex number 0 with |0j — 1 such that H„(.1 that if V = W] ©W 2 . for example. 372) and 0.2.. We assume that the reader is familiar with the results about direct sums developed at the end of Section 5. finite-dimensional subspace of an inner product space V. 6 Inner Product Spaces 32. then a linear operator T on V is t he projection on W.. Thus there is no ambiguity if we refer to T as a "projection on W|" or simply as a "projection. we see that Wi does not uniquely determine T. whenever .W. So V = R(T) © N(T). along W2 if. und let T: V — V be a projection.T/. then R(T) .. however./• — X] 4-. T_>.

ijt where c/„ or (/ „ is nonzero.a\). Then T is the orthogonal projection of V on W. Note that V .T(v) G W-L. 6. It is easy to show that T is an orthogonal projection on W. We show that the best approximation to / by a trigonometric polynomial of degree less than or equal to n is the trigonometric polynomial . Figure: 6. We call T the orthogonal projection of V on W./• and U(ai. we see that T(v) is the "best approximation in W to y".6 Orthogonal Projections and the Spectral Theorem 399 T: V — V by T(/y) — u. Define a trigonometric polynomial of degree // to be a function g G H of the form g(t) X "jfj1 E j=-n . We can say even more there exists exactly one orthogonal projection on W. Define U and T as in Figure 6. then R(T) — W — R(U).a 2 ) = (<i\. whereas v . then \\w — v[\ > |JT(r) v\\. if ir G W. where T(v) is the foot of a. As an application to Fourier analysis. 350). To understand the geometric difference between an arbitrary projection on W and the orthogonal projection on W. that is. perpendicular from v on the line y — . recall the inner product space H and the orthonormal set S in Example 9 of Section 6. These results follow immediately from the corollary to Theorem 0.R(U)J.Sec.5.5 From Figure 6. this approximation property characterizes T.U(» £ W ' . let V = R2 and W = span{(l. we have T — U. and U is a different projection on W.= N(U). 1)}.1. For if T and U are orthogonal projections on W. Let / G H.5. Hence N(T) = R(T) J .0 (p. In fact. and since every projection is uniquely determined by its range and null space.

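Before completing the Fourier example, here is a numerical illustration of the best-approximation property just described. NumPy is assumed, the subspace and vector are arbitrary, and the projection formula P = W(WᵗW)⁻¹Wᵗ is a standard construction rather than one derived in the text.

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                       # columns span a subspace W of R^3

# Matrix of the orthogonal projection of R^3 on W.
P = W @ np.linalg.inv(W.T @ W) @ W.T
assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # P^2 = P = P^t

# Best approximation: ||y - Py|| <= ||y - w|| for every w in W.
y = np.array([3.0, -1.0, 2.0])
best = np.linalg.norm(y - P @ y)
for _ in range(1000):
    w = W @ np.random.randn(2)                   # a random element of W
    assert best <= np.linalg.norm(y - w) + 1e-12
```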
where X\. I sing the preceding results.T(.r). Let V be an inner product space.T ( y ) ) = (y. and let T be a linear operator on V.yi 4. Hence R(T) . Then T is a projection by Exercise 17 of Section 2. Now suppose that .y2) . Suppose that T is an orthogonal projection./• C RfT)^.T(y . The corollary to Theorem 6./•).2.r). Then T is an orthogonal projection if and only if T has an adjoint T* and T2 — T = T". 350) tells us that the best approximation to / by a function in W is T(/)= £ (fjj)fj An algebraic characterization of orthogonal projections follows in the next theorem. Si. Let . that is.= N(T).D N(T) by Exercise 13(b) of Section 0. Therefore x G N(T) 1 .N(T) ' and R(T)' = N(T).//i) = ( l i .y ( V.y) for all x.y\ G R(T) and . For any y G V.r 400 Chap.T(y)) . Then x = T(x) = T*(. we have R(T) ' N(T) 1 .y) = (T*(. Now l|y-T(»)||2 = <y-T(y). and let T be the orthogonal projection of H on W. let W = span({/.y-T(y)> = (y. and thus.!)•>} = (xi. and so (x.<•'•]•</]} • So (.N(T)~.y>. we need only show that T* exists and T = T*.yi) + (x1. y .y 2 G N(T).T*(//)) = (x.ri 4. 0) = 0. we have (T(. ( T ( y ) .6 (p. I . // .(x. Since T 2 = T because T is a projection.(. y i ) + {•f2-!l\} .y G V. Proof. But also T(y))) = (y.y Since // T(y)). T(y) G N(T).r 2 . from which it follows that R(T) C N(T) X .0) = 0. Then we can write x = X\ 4 x-2 and y = yx 4.r) = 0.y-T(y))-(T(y).T(//)) {T(x).//) = (x.r 2 . 6 Inner Product Spaces whose coefficients are the Fourier coefficients o f / relative to the orthonormal set S. Now V = R(T) © N(T) and R(T) 1 N(T). Hence R(T). Lei y G N(T) ± .T(y))) = (y.r C R(T) and y G N(T).3.//) = (. the first term must equal zero.//) = (x-i. thus T* exists and T = T*. Hence (•'•-T(//)) . T h e o r e m 6.r. For this result.{•''! •!!)) and (T(.r.J(y)) = 0. T(y) = y.: \j\ < //}). Let x. /• C N(T).T(y) G R(T). We must show that y G R(T).T'dl Thus y T(y) = 0: that is.r). Now suppose that T = T = = T*.24. and hence we must show that R(T) .

Xk. [T]^ has the form h Oi Or o3 If U is any projection on W. Then the following statements are true. Suppose that T is a linear operator on a finite-dimensional inner product space V over F with the distinct eigenvalues Ai. + T 2 + --. we have dim(W/-) = dim(V) -dim(Wt) by Theorem 6.Assume that T is normal if F = C and that T is self-adjoint if F — R... Then [T]/j is a diagonal matrix with ones as the first k diagonal entries and zeros elsewhere. Vk} is a basis for W. (d) I = T . j < k. Xk.Sec. and T be the orthogonal projection of V on W. In fact. vn} for V such that {v\. 278). (d) Since T* is the orthogonal projection of V on Wj. 372) and 6..e\Nk. For each i (1 < i < k). (a) v = w 1 © w 2 e . (b) If ar € Wi and y G Wj for some i ^ j. let W* be the eigenspace of T corresponding to the eigenvalue Aj. however 7 is not necessarily orthonormal. so V = Wi © w 2 w* by Theorem 5. we have X = %i + %2 + r.• • •. .dim(W... Hence. proving (b). T is diagonalizablc. Hence W^ = WT1.17 (p. On the other hand. we have dim(W^) = ^ d i m ( W j ) = dim(V) ..V2. Theorem 6.). and let T. it follows from (b) that N(Ti) = R(Tj) x = W-1 = \N[.. then (c) TjTj = SijT-i for 1 < i.25 (The Spectral Theorem). We may choose an orthonormal basis j3 = {v\. 352). where Ti(x) = x% £ Wj.. (e) T = AJTJ + A2T2 + • • • + AfcTfc. (b If W^ denotes the direct sum of the subspaces \Nj for j 7^ i.+Tk. y) — 0 by Theorem 6. (c) The proof of (c) is left as an exercise. v<i. proving (d). then (x.. 374). 37T It follows easily from this result that W^ C W4From (a). 6. Proof. We are now ready for the principal theorem of this section. we may choose a basis 7 for V such that [U]7 has the form above. (a) By Theorems 6.. W be a subspace of V. for x € V.6 Orthogonal Projections and the Spectral Theorem 401 Let V be a finite-dimensional inner product space.. A2.11 (p. be the orthogonal projection of\/ on W.16 (p..15(d) (p.7(c) (p.

many more results are found in the exercises.r. If F = C. in (d) is called the resolution of the identity operator induced by T.) + T(x2) + ••• + T(xfc) = Ai#i + \2x2 H 1. A 2 . Conversely.9(Afc)Tfc = A1T1 + A2T2 + ••• + AfcTfc=T*. Then T(x) = T(. Then 9(V=g(X1)T1+g(X2)T2 + • • • + . and the sum T = A1T1 + A2T2 + • • • + AfcTfc in (e) is called the spectral decomposition of T. then T commutes with T* since T commutes with every polynomial in T. where Xi <E Wj. we assume that T is a linear operator on a finite-dimensional inner product space V over F. If A1T1 + A2T2 + • • • + A^Tfc is the spectral decomposition of T. Proof. then T is normal if and only if T* = g(J) for some polynomial g. write x = x\ + x2 -\ Chap. 1 . . we have T* = A1T1 + A2T2 + • • • + AfcT& since each Tj is self-adjoint.+ AfcTfc)(a. let 0 be the union of orthonormal bases of the Wj's and let raj = dim(Wj). Let T = A1T1 + A2T2 + • • • + A^T^ be the spectral decomposition of T. 6 Inner Product Spaces h xk.Ox) + A2T2(:r) + • • • + AfcTfc(z) .) Then [T]^ has the form (X\Imi o \ 0 o Xol 2*m 0 0 2 \ o Xklmk) that is. A^} of eigenvalues of T is called the spectrum of T. (Thus raj is the multiplicity of Aj.). Taking the adjoint of both sides of the preceding equation. . Suppose first that T is normal. \T]f) is a diagonal matrix in which the diagonal entries are the eigenvalues Aj of T. . With the preceding notation. The spectral decomposition of T is unique up to the order of its eigenvalues. So T is normal. then it follows (from Exercise 7) that fl(T) = g(Xi)T\ + #(A 2 )T 2 + • • • 4. I The set {Ai. the sum I = Ti +T2 + • • • +T*. For what follows.( A i T i + AaTa + " . .402 (e) For x € V.AfcZfc = AjT.g{Xk)Tk for any polynomial g. We now list several interesting corollaries of the spectral theorem. if T* = <?(T) for some polynomial g. we may choose a polynomial g such that g(X{) = Aj for 1 < i < k. and each Aj is repeated raj times. Using the Lagrange interpolation formula (see page 52). This fact is used below. Corollary 1.

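A numerical illustration of the resolution of the identity and the spectral decomposition; NumPy is assumed, and the self-adjoint (hence normal) matrix below, with eigenvalues 1 and 3 (the latter of multiplicity two), is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])          # a self-adjoint matrix with eigenvalues 1, 3, 3

eigvals, V = np.linalg.eigh(A)           # orthonormal eigenvectors as columns of V
distinct = np.unique(np.round(eigvals, 10))

# Orthogonal projection T_i on each eigenspace W_i.
projections = [V[:, np.isclose(eigvals, lam)] @ V[:, np.isclose(eigvals, lam)].T
               for lam in distinct]

# Resolution of the identity: I = T_1 + ... + T_k.
assert np.allclose(sum(projections), np.eye(3))
# Spectral decomposition: A = lambda_1 T_1 + ... + lambda_k T_k.
assert np.allclose(sum(lam * P for lam, P in zip(distinct, projections)), A)
# T_i T_j = delta_ij T_i.
assert np.allclose(projections[0] @ projections[1], 0)
```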
Proof Choose a polynomial gj (1 < j < k) such that #?(Aj) = 6^. Let T he as in the spectral theorem with spectral decomposition T = AiTi + A2T2 + • • • + AfcTfc. If T is unitary. . then T is normal and every eigenvalue of T has absolute value 1 by Corollary 2 to Theorem 6.• • + AfcTfc = A. Then T* = AiTi + A2T2 + . 6.6 Orthogonal Projections and the Spectral Theorem 403 Corollary 2.XkTk be the spectral decomposition of T. then T is unitary if and only if T is normal and |A| = 1 for every eigenvalue A ofT.17 (p.+ |Afc| Tfc = Tj + T 2 + • • • + Tfc 2 2 2 A*Tfc) Hence T is unitary.Sec. Label the following statements as true or false. Corollary 4.18 (p. (c) Every self-adjoint operator is a linear combination of orthogonal projections. be the spectral decomposition of T. TT* = (AiTi + A2T2 + • • • + AfcTfcXAiTi + A 2 T 2 = |A 1 | T 1 + |A 2 | T 2 + --. I EXERCISES 1. (a) All projections are self-adjoint. then by (c) of the spectral theorem. Suppose that every eigenvalue of T is real. 382). If F = C and T is normal. Proof. Then 9j(V = Pi(Ai)Ti + gj{X2)T2 + ••• + gj{Xk)Tk — 5\jT\ + S2jT2 -\ h SkjTk = Tj. The converse has been proved in the lemma to Theorem 6. (b) An orthogonal projection is uniquely determined by its range. If |A| = 1 for every eigenvalue A of T. Let T = Ai T x + A2T2 -I. then T is self-adjoint if and only if every eigenvalue of T is real.• —f. 374). Then each Tj is a polynomial in T. If F = C. Assume that the underlying inner product spaces are finite-dimensional. Proof. Let T = AiTi + A2T2 -fh A^T*.Ti + A2T2 + • • • + XkTk = T. 1 Corollary 3.

then SCO = £ > ( A i ) T < z=i (b) If T" = To for some n. . Let T be a normal operator on a finite-dimensional inner product space. (f) T is a projection if and only if every eigenvalue of T is 1 or 0. Let V = R2. (a) If g is a polynomial. Give an example of a projection for which this inequality does not hold. prove that ||T(a:)|| < ||rr|| for all x e V. then T(x) is the vector in W that is closest to x. then T = To(c) Let U be a linear operator on V. 0. (3) Verify your results using the spectral theorem. Then U commutes with T if and only if U commutes with each Tj. (2) For each eigenvalue of LA.404 Chap. Prove that T is an orthogonal projection. 3. For each of the matrices A in Exercise 2 of Section 6. Let W be a finite-dimensional subspace of an inner product space V.x)|| < ||x|| for x G V. then I — T is the orthogonal projection of V on W -1 . What can be concluded about a projection for which the inequality is actually an equality for all x 6 V? (b) Suppose that T is a projection such that ||T(. Prove that if T is a projection.5: (1) Verify that LA possesses a spectral decomposition. 5. 6 Inner Product Spaces (d) If T is a projection on W. and /? be the standard ordered basis for V. (e) T is invertible if and only if Aj ^ 0 for 1 < i < k.1)}). Compute [T]^. (e) Every orthogonal projection is a unitary operator. where T is the orthogonal projection of V on W. 6. 7. Use the spectral decomposition AiTi + A2T2 + • —h XkTk of T to prove the following results. 2. 2 (d) There exists a normal operator U on V such that U = T. Show that if T is the orthogonal projection of V on W. W = span({(l. Let T be a normal operator on a finite-dimensional complex inner product space V. Let T be a linear operator on a finite-dimensional inner product space V.2)}). 4. then T is also an orthogonal projection. Do the same for V = R3 and W = span({(l. (a) If T is an orthogonal projection. explicitly define the orthogonal projection on the corresponding eigenspace.

Let U and T be normal operators on a finite-dimensional complex inner product space V such that TU = UT. we establish a comparable theorem whose scope is the entire class of linear transformations on both complex and real finite-dimensional inner product spaces—the singular value theorem for linear transformations (Theorem 6.17. However. or any matrix A for which (Ax. 372. The singular value theorem encompasses both real and complex spaces.16. Thus any operator T for which (T(x).Sec.26). All rely on the use of orthonormal bases and numerical invariants. the eigenvalues. (a) U*U is an orthogonal projection on W. This property is necessary to guarantee the uniqueness of singular values. 374). In this section.7 The Singular Value Decomposition and the Pseudoinverse (g) T = —T* if and only if every Aj is an imaginary number. (b) UU*U = U. There are similarities and differences among these theorems. we characterized normal operators on complex spaces and selfadjoint operators on real spaces in terms of orthonormal bases of eigenvectors and their corresponding eigenvalues (Theorems 6.5. If the two spaces and the two bases are identical. and 6. the singular values. 6.7* THE SINGULAR VALUE DECOMPOSITION AND THE PSEUDOINVERSE In Section 6. . Referring to Exercise 20 of Section 6.4. Hint: Use the hint of Exercise 14 of Section 6. Prove (c) of the spectral theorem. Use Corollary 1 of the spectral theorem to show that if T is a normal operator on a complex finite-dimensional inner product space and U is a linear operator that commutes with T.T(y)) = {x. Simultaneous diagonalization. 11. the singular value theorem is concerned with two (usually distinct) inner product spaces and with two (usually distinct) orthonormal bases. are nonnegative. in this section we use the terms unitary operator and unitarymatrix to include orthogonal operators and orthogonal matrices in the context of real spaces. Another difference is that the numerical invariants in the singular value theorem. p. y). 405 8. 10. 9.y). 6. Prove that there exists an orthonormal basis for V consisting of vectors that are eigenvectors of both T and U. in fact. prove the following facts about a partial isometry U. for all x and y is called unitary for the purposes of this section. Ay) = (x. be a normal or self-adjoint operator. p. For brevity. then the transformation would. then U commutes with T*.4 along with Exercise 8. in contrast to their counterparts. for which there is no such restriction. because of its general scope.

v2.j (ui. Furthermore. T*T is a positive semidefinite linear operator of rank r on V. where 3 and 7 are orthonormal bases for V and W. hence there is an orthonormal basis {v\. Vi is an eigenvector of T*T with corresponding eigenvalue of if 1 < i < r and 0 if i > r.406 Chap. is an orthonormal subset of W. .. Therefore the scalars o\_. Then T(Vj)) ur} (JiO (Vi.Vj) o.. Proof. .T(vi). define < T j = y/Xi and Ui = —T(vA We show that {u{.vn} for V and {u\.. . By this exercise. the linear operator T*T on V is positive semidefinite and rank(T*T) — rank(T) by Exercise 18 of Section 6.. 6 Inner Product Spaces In Exercise 15 of Section 6.26 (Singular Value Theorem for Linear Transformations)..3. By Exercises 18 of Section 6. and let T: V — > W be a linear transformation of rank r.. we begin with the principal result. 0 (4) Conversely.o2.\Oi Oj < r. suppose that the preceding conditions are satisfied. Let V and W be finite-dimensional inner product spaces.3... We first establish the existence of the bases and scalars. respectively. the definition of the adjoint of an operator is extended to any linear transformation T: V — • W. . Then for 1 < i < n. v2.Uj) = ( .u2. Theorem 6. and Aj = 0 for i > r. Then there exist orthonormal bases {v\.o~r arc uniquely determined by T.4 and 15(d) of Section 6. u2..o I"} = Sij.. where V and W arc finite-dimensional inner product spaces. For 1 < i < r.4... v„} for V consisting of eigenvectors of T*T with corresponding eigenvalues Aj.. where Ai > A2 > • • • > Ar > 0. • • •. With these facts in mind. Suppose I <i. the adjoint T* of T is a linear transformation from W to V and [T*]^ = ([T]!)*.um} for W and positive scalars o~\ > <r2 > • • • > or such that T{Vi) = crjUj if 1 < i < r ifi > r.

If i > r. 352). .. 6. the singular values of a linear transformation T: V —* W and its adjoint T* are identical.. {u\.o2..26 are simply reversed for T*. ...um}. To establish uniqueness. Clearly T(vi) = GiUi if 1 < i < r. T*T(vi) = T*(cTjWj) = <7jT*(uj) = afvi and T*T(VJ) = T*(0) = 0 for i > r. um} for W.. and so T(vi) = 0 by Exercise 15(d) of Section 6. Therefore each Vi is an eigenvector of T*T with corresponding eigenvalue of if i < r and 0 if i > r. o~iVi if i = j < r 0 otherwise. If r is less than both m and n.v2.. (T*(ui).. By Theorem 6. In view of (5).. (5) if i — j < r otherwise..ur} is orthonormal. Although the singular values of a linear transformation T are uniquely determined by T. and o~\ > a2 > • • • > or > 0 satisfy the properties stated in the first part of the theorem.u2....7(a) (p. suppose that {v\.. So for i < r. this set extends to an orthonormal basis {u\..or in Theorem 6. The unique scalars o\.Vj) = (ui. I Definition. Furthermore.3. .. Then for 1 < i < m and 1 < j < n. then the term singular value is extended to include or+i = • • • = ok = 0. ur.. the orthonormal bases for V and W given in Theorem 6... where k is the minimum of m and n. the orthonormal bases given in the statement of Theorem 6...26 are not uniquely determined because there is more than one orthonormal basis of eigenvectors of T*T.g(x))= / -l f{t)g{t)dt.T(vj)) ai 0 and hence for any 1 < i < m.. u2.u2. then T*T(^j) = 0.7 The Singular Value Decomposition and the Pseudoinverse 407 and hence {ui.Sec.26 are called the singular values of T.vn}. Example 1 Let P2(R) and Pi(-R) be the polynomial spaces with inner products defined by (f(x).

3) = 0. Now set °~i — \ A 7 = N/15 and a2 =• \[X2 — v3. v2. Translating everything into the context of T. respectively. the nonzero singular values of T. we use the results of Exercise 21(a) of Section 6. where o\ > a2 > 0 are the nonzero singular values of T.u2} .x o~\ V 2 and u2 = —T{v2) = -7=. v/2 Then (3 — {^1. A2 = 3. These eigenvalues correspond. and A3 = 0. let ui = \ / g ( 3 ^ . e 2 = (0. (See Exercise 15 of Section 6.408 Chap. and take Mi = — T(i'i) = \ .l ) . Let A = m. A2.2 and T(?.2 to obtain orthonormal bases for P2(7?) and P^ft).0. = (1.3.1..0. = 0 0 \/3 0 0 v 7 !^ Then which has eigenvalues (listed in descending order of size) Ai = 15.1). 6 Inner Product Spaces Let T: P 2 (/?) —• P\(P) be the linear transformation defined by T(f(x)) = f'{x). and V-A — —=.) = OjUi for i = 1. respectively. cr2 s/2 for Pi(/?). and Pi(/?). u2} for Pi(H) such that T(i'. P 2 (i?).v3] for P2(R) and 7 = {u\. and c. to the orthonormal eigenvectors e 3 = (0.0). and A3. we must use a matrix representation constructed from orthonormal bases for P2(li) and P\(R) to guarantee that the adjoint of the matrix representation of T is the same as the matrix representation of the adjoint of T.^3} is an orthonormal basis for P2(R) consisting of eigenvectors of T*T with corresponding eigenvalues Ai. to obtain the required basis 7 = {ui. To facilitate the computations. Caution is advised here because not any matrix representation will do. we translate this problem into the corresponding problem for a matrix representation of T.^2.0) in R3. Since the adjoint is defined in terms of inner products. Find orthonormal bases li = {v\.) For this purpose.

We characterize S' in terms of an equation relating x[ and x2. We apply Theorem 6.6.6 .26 to describe 5 ' = T(5).7 The Singular Value Decomposition and the Pseudoinverse 409 We can use singular values to describe how a figure is distorted by a linear transformation. relative to 0. u € S' if and only if v € S if and only if i^=xl+x22 = l. respectively.26. if u = x\u\ + x'2u2. this is the equation of an ellipse with major axis and minor axis oriented along the rc'-axis and the y'-axis. where the x'-axis contains u\ and the ?/-axis contains u2.|| = l}. as in Theorem 6. If o\ = o2.:r2T('<j2) = XiOi'ti\ + x2o2u2. (See Figure 6. 6. Furthermore. we have x\ = x\a\ and x'2 = x2a2. Example 2 Let T be an invertible linear operator on R2 and S'={a:GR 2 :||3. the unit circle in R2.v2} and (3 = {ui. which we shall call the aj'^/'-coordinate system for R . is the coordinate vector of u X.u2} be orthonormal bases for R2 so that T(i>i) = o~iUi and T(f 2 ) = a2u2.) • T V — X\V\ + X1V2 U = X1U1 + X2U2 Figure 6. Since T is invertible. Thus for u = x'xu\ + x'2u2. it has rank equal to 2 and hence has singular values o\ > <r2 > 0.1vi + x2v2) = xiT(t. This is illustrated in the next example. this is the equation of a circle of radius o\. For any vector u € R2. Then /3 determines a coordinate system. then [u]^ = ( . and if a\ > o2.Sec.i) -f. For any vector v = xi^i + x2v2 G R 2 . Let {vi. the equation u = T(v) means that u = T(a.

The singular value theorem for linear transformations is useful in its matrix form because we can perform numerical computations on matrices. We begin with the definition of the singular values of a matrix.

Definition. Let A be an m × n matrix. We define the singular values of A to be the singular values of the linear transformation L_A.

Theorem 6.27 (Singular Value Decomposition Theorem for Matrices). Let A be an m × n matrix of rank r with the positive singular values σ1 ≥ σ2 ≥ ··· ≥ σr, and let Σ be the m × n matrix defined by

Σij = σi if i = j ≤ r, and 0 otherwise.

Then there exists an m × m unitary matrix U and an n × n unitary matrix V such that A = UΣV*.

Proof. Let T = L_A: F^n → F^m. By Theorem 6.26, there exist orthonormal bases β = {v1, v2, ..., vn} for F^n and γ = {u1, u2, ..., um} for F^m such that T(vi) = σi ui for 1 ≤ i ≤ r and T(vi) = 0 for i > r. Let U be the m × m matrix whose jth column is uj for all j, and let V be the n × n matrix whose jth column is vj for all j. Note that both U and V are unitary matrices. By Theorem 2.13(a) (p. 90), the jth column of AV is Avj = σj uj. Observe that the jth column of Σ is σj ej, where ej is the jth standard vector of F^m. So by Theorem 2.13(a) and (b), the jth column of UΣ is given by U(σj ej) = σj U(ej) = σj uj. It follows that AV and UΣ are m × n matrices whose corresponding columns are equal, and hence AV = UΣ. Therefore A = AVV* = UΣV*. ∎

Definition. A factorization A = UΣV*, where U and V are unitary matrices and Σ is the m × n matrix defined as in Theorem 6.27, is called a singular value decomposition of A.

In the proof of Theorem 6.27, the columns of V are the vectors in β, and the columns of U are the vectors in γ. Furthermore, the nonzero singular values of A are the same as those of L_A; hence they are the square roots of the nonzero eigenvalues of A*A or of AA*. (See Exercise 9.)
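As a quick numerical illustration of the preceding remark (this example supplements, and is not one of, the numbered examples), the singular values of a matrix can be read off from the eigenvalues of A*A:
\[
A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}, \qquad
A^*A = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix}.
\]
The eigenvalues of A*A are 45 and 5, so the singular values of A are σ1 = √45 = 3√5 and σ2 = √5. As a consistency check, σ1σ2 = 15 = |det A|, which must hold for any singular value decomposition A = UΣV* of a square matrix because |det U| = |det V*| = 1.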

The Polar Decomposition of a Square Matrix A singular value decomposition of a matrix can be used to factor a square matrix in a manner analogous to the factoring of a complex number as the product of a complex number of length 1 and a nonnegative number. we take 1 1 A = —Av\ = —fc o~i Oi V2 Next choose u2 = —j= I J. Hence.£ i . 6. Then y/Q 0 0 0 0 0 /1 V^ 1 V = </Z \V3 Also.27. / \ «i = — L A (Mi) orthonormal basis 7 = {«I.27. and set U = 1 1 —X I \y/2 y/2/ Then A — UT.^2.M 2 } for R2. a unit vector orthogonal to Wi. the complex number of length 1 is replaced by a unitary matrix. to obtain the 1 E = and J_ 1 \ \/2 V^ -1 1 V2 V6 .28 (Polar Decomposition).7 The Singular Value Decomposition and the Pseudoinverse Example 3 We find a singular value decomposition for A = First observe that for 1 >'\ = 73 and * .^3} is an orthonormal basis for R3 consisting of eigenvectors of A* A with corresponding eigenvalues Ai = 6. as in Theorem 6. Consequently.j U3 = _1_ 7e 1 1 -1 -1 411 the set 0 = {^1. For any square matrix A. there exists a unitary matrix W and a positive semidefinite matrix P such that A = WP.Sec.V* is the desired singular value decomposition. • . and A2 = A3 = 0. Theorem 6. and the nonnegative number is replaced by a positive semidefinite matrix. as in the proof of Theorem 6. In the case of matrices. we let V be the matrix whose columns are the vectors in 0. o\ = \/6 is the only nonzero singular value of A.

6 Inner Product Spaces Furthermore. Proof. So we have V = Next. It can be shown that To find the polar decomposition of A = = * 71 (-1 "2 = 7*2 ( I arc orthonormal eigenvectors of A* A with corresponding eigenvalues Ai — 200 and A2 = 50. So 0 — {v\. is called a polar decomposition of A. Therefore W = Z. and so / = (QF-'YiQp-1) = P-1Q2P~]. I The factorization of a square matrix A as WP where W is unitary and P is positive semidefinite.5. and consequently the factorization is unique.V* = WP. Hence P2 = Q2. and therefore Z*W = QP~X. Thus QP~X is unitary. The object is to find an orthonormal basis 0 for R2 consisting of eigenvectors of A* A. where W and Z are unitary and P and Q are positive semidefinite. we begin by finding a sin10 -2 gular value decomposition WEV* of A. Since W is the product of unitary matrices. Example 4 11 .5 . W is unitary. there exist unitary matrices U and V and a diagonal matrix E with nonnegative diagonal entries such that A = t/EV*.27. Since A is invertible. Thus o\ = v200 = 10v2 and o2 = VoO = 5\/2 are the singular values of A. and since E is positive semidefinite and P is unitarily equivalent to E.412 Chap.4. we find the columns u\ and u2 of U: Mi = — Av\ = . then the representation is unique. it follows that P and Q are positive definite and invertible. By Theorem 6. if A is invertible. it follows that P = Q by Exercise 17 of Section 6. Now suppose that A is invertible and factors as the products A = WP = ZQ. where W = UV* and P = VEV*.v2} is an appropriate basis. Since both P and Q are positive definite'. P is positive semidefinite by Exercise 14 of Section 6.I „ ) <ri 5 Vand u2 = — Av 2 02 E = /l0\/2 0 5V2 . So A = UYV* = UV*VY.

Thus

U = (1/5) [[4, 3], [−3, 4]].

Therefore, we have

W = UV* = (1/(5√2)) [[7, −1], [1, 7]]  and  P = VΣV* = (5/√2) [[3, −1], [−1, 3]].

The Pseudoinverse

Let V and W be finite-dimensional inner product spaces over the same field, and let T: V → W be a linear transformation. It is desirable to have a linear transformation from W to V that captures some of the essence of an inverse of T even if T is not invertible. A simple approach to this problem is to focus on the "part" of T that is invertible, namely, the restriction of T to N(T)⊥. Let L: N(T)⊥ → R(T) be the linear transformation defined by L(x) = T(x) for all x ∈ N(T)⊥. Then L is invertible, and we can use the inverse of L to construct a linear transformation from W to V that salvages some of the benefits of an inverse of T.

Definition. Let V and W be finite-dimensional inner product spaces over the same field, and let T: V → W be a linear transformation. Let L: N(T)⊥ → R(T) be the linear transformation defined by L(x) = T(x) for all x ∈ N(T)⊥. The pseudoinverse (or Moore-Penrose generalized inverse) of T, denoted by T†, is defined as the unique linear transformation from W to V such that

T†(y) = L⁻¹(y) for y ∈ R(T), and T†(y) = 0 for y ∈ R(T)⊥.

The pseudoinverse of a linear transformation T on a finite-dimensional inner product space exists even if T is not invertible. Furthermore, if T is invertible, then T† = T⁻¹ because N(T)⊥ = V, and L (as just defined) coincides with T. As an extreme example, consider the zero transformation T0: V → W between two finite-dimensional inner product spaces V and W. Then R(T0) = {0}, and therefore T0† is the zero transformation from W to V.
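Before describing T† in terms of singular values, it may help to compute one pseudoinverse directly from the definition; the following small example is supplementary and uses only the definition above. Let T: R² → R be defined by T(x1, x2) = x1 + x2. Then N(T) = span{(1, −1)}, so N(T)⊥ = span{(1, 1)} and R(T) = R. The restriction L of T to N(T)⊥ satisfies L(t(1, 1)) = 2t, so L⁻¹(s) = (s/2)(1, 1). Since R(T)⊥ = {0}, the definition yields
\[
T^{\dagger}(s) = \left(\tfrac{s}{2}, \tfrac{s}{2}\right) \quad \text{for all } s \in \mathbb{R}.
\]
Observe that T†T(x1, x2) = ((x1 + x2)/2, (x1 + x2)/2), the orthogonal projection of (x1, x2) on N(T)⊥, a fact that is proved in general in the lemma to Theorem 6.30 below.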

{vr+i.. Then L_1(UJ) — —Vi for 1 < i < r. 6 Inner Product Spaces We can use the singular value theorem to describe the pseudoinverse of a linear transformation.. The pseudoinverse of A can be computed with the aid of a singular value decomposition A = UY.u2} be the orthonormal bases for P2(R) and P\(R) in Example 1. T f (a + bx) = aV(l) + bV{x) = ax + ^{3x2 ..V*...vr+2. Then a\ = \/l5 and a 2 = v 3 are the nonzero singular values of T.u2. Let A be an m x n matrix of rank r.414 Chap. and {ur+\..um} be orthonormal bases for V and W. ty} is a basis for N(T)-1-. ... .. Let {v\.. for any polynomial a + bx G P\(R)..^3} and 7 = {ui. Suppose that V and W are finite-dimensional vector spaces and T : V — > W is a linear transformation or rank r. Let L be the restriction of T to N(T)-1. Then there exists a unique n x ra matrix B such that (Lyi)t: FTO — > F n is equal to the left-multiplication transformation LB. as in Example 1.. and let oi > o2 > • • • > or be the nonzero singular values of T satisfying (4) in Theorem 6. Therefore Oi I —Vi T f ( W j) = I Oi \ 0 Example 5 Let T: P2(R) —* Pi(-R) be the linear transformation defined by T(f(x)) — f'{x).1 ) .^.26... vn} and {u\.We call B the pseudoinverse of A and denote it by B — A^. as in the definition of pseudoinverse.u2. Thus ( U ) + = L A t.. • i V l if 1 < i < r (6) if r < i < ra. Let 0 and 7 be the ordered bases whose vectors are the columns of V and U.. ^ ^ 1).. Similarly. {u\. . Then {v\.. respectively. vn} is a basis for N(T). It follows that Tt(y|. T*(l) = x.1)...v2. The Pseudoinverse of a Matrix Let A be an ra x n matrix.ur] is a basis for R(T). Thus.ur+2. .v2..)=Tt(Wi) = and hence Tt(*) = i ( 3 z 2 .um} is a basis for R(T) X . Let 0 = {^1.

We know that the . and that T* = L^t • The Pseudoinverse and Systems of Linear Equations Let A be an ra x n matrix with entries in F. the matrix equation Ax — b is a system of linear equations.7 The Singular Value Decomposition and the Pseudoinverse 415 respectively. 6. and (4) and (6) are satisfied for T = L^. Let E* be the n x ra matrix defined by .Sec.29 is actually the pseudoinverse of E. or infinitely many solutions.29. I— \0 if i = j < r otherwise.V* and nonzero singular values o\ > o2 > • • • > ay. and this is a singular value decomposition of A^. Notice that E^ as defined in Theorem 6. Then A^ — VT^U*.27. a unique solution. Theorem 6. and so it either has no solutions. Then for any b G F m . Let A be anmxn matrix of rank r with a singular value decomposition A = UT. Example 6 We find A^ for the matrix A — Since A is the matrix of Example 3. v/6/ 'v^ 0 0 0 0 0 Notice that the linear transformation T of Example 5 is L^. we have & A* = VtfU* = 1 V3 -1 1 V2 -1 V2 0 3s) 1 V§ _2_ 7e/ /1 fav < \ ( J °1 V2 0 0 [jL 1° °y \V2 1 V2 -1 V2 1 1 /' 1 1 1 7 6 I-1 -1 1 V2 -1 v/2 fa 1 73 & 1 v/2 -1 V2 0 V6) 1 _2_ . we obtain the following result. respectively.29. Reversing the roles of 0 and 7 in the proof of Theorem 6. and let o\ > o2 > • • • > ay be the nonzero singular values of A. we can use the singular value decomposition obtained in that example: /1 A= = UZV* = f \72 By Theorem 6. Then 0 and 7 are orthonormal bases for F" and F m . where A is the matrix of Example 6.

I T h e o r e m 6. therefore. we define L: N(T)-1. Observe that b G R(T). That is. (a) If Ax = b is consistent. A is not invertible or the system Ax = bis inconsistent. on the other hand. The proof of (b) is similar and is left as an exercise. For convenience. Let V and W be finite-dimensional inner product . 350). and let T: V'->W be linear. and if x G N(T). Now suppose that y is any solution to the system.) = 6 by part (b) of the lemma. by the corollary to Theorem 6. then \\z\\ < \\y\\ with equality if and only if z = y.\. (a) Suppose that Ax = b is consistent. let T = L. if • i . Furthermore. we have that ll^ll 5 = ||M|| with equality if and only if z = y. 350). That is. by the corollary to Theorem 6. we need the following lemma. Consequently VT is the orthogonal projection of V on N(T) X . and hence z is the orthogonal projection of y on N(T)"1" by part (a) of the lemma. Proof. if Az = Ay. We therefore pose the following question: In general.6 (p. If x G N(T) X . how is the vector . This proves (a). then z is the unique best approximation to a solution having minimum norm. with equality if and only if Az = Ay. As in the earlier discussion. If. if A is invertible.30. Therefore. then TtT(. Then (a) T ' T is the orthogonal projection of V on N(T)"1. where A is an in x // matrix and b G F'". then T+T(a.416 Chap.6 (p. (b) Suppose that Ax = b is inconsistent. That is.spaces. z is a solution to the system. L e m m a . Thus z is a solution to the system. 6 Inner Product Spaces system has a unique solution for every / > G F m if and only if A is invertible. By the lemma. then z is the unique solution to the system having minimum norm. then z has the following properties. then A~l — A*. and let z = A^b. Az = AA^b = TT'(6) = b is the orthogonal projection of b on R(T). Furthermore.) = L"1L(a:) = x. Proof.4*6 related to the system of linear equations Ax = 6? In order to answer this question. If z = A^b. and if y is any solution to the system. (b) If Ax = b is inconsistent./. Consider the system of linear equations Ax = b. Then T+T(w) = A]Ay = AU) = z.—> W by L(x) — T{x) for all x G N(T)'. and therefore Az = AA*b = TT f (/. (b) TT' is the orthogonal projection of W on R(T).) = T+(0) = 0. then || z || < || w|| with equality if and only if z — y. in which case the solution is given by A~xb. Az is the vector in R(T) nearest b. and so the solution can be written as x = A^b. then A^b still exists. \\Az — b\\ < \\Ay — b\\ for any i) G F"..
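A supplementary remark connects Theorem 6.30 to the method of least squares in the case that L_A is one-to-one; it relies on the formula of Exercise 22(a) (and its matrix analogue in Exercise 23). If A is an m × n matrix with rank n, then A*A is invertible and A† = (A*A)⁻¹A*, so z = A†b satisfies the normal equations A*Az = A*b; by Theorem 6.30, this z minimizes ||Ax − b||. For instance, with
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
\]
we have
\[
A^*A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
A^*b = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
z = (A^*A)^{-1}A^*b = \frac{1}{3}\begin{pmatrix} 2 \\ -1 \end{pmatrix}.
\]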

ii Note that the vector z = A'b in Theorem 6. • . 6. then \\Az — b\\ < \\Ay — b\\ with equality if and only if Az = Ay. al- z = A^b = 6 is not a solution to the second system. and therefore D -1 -lj is the solution of minimal norm by Theorem 6. Let A = coefficient matrix of the system. Example 7 Consider the linear systems X\ -I.30 is the vector x'o described in Theorem 6. it is the "best approximation" to a solution having minimum norm. as described in Theorem 6. suppose that y is any vector in F n such that Az — Ay = c.7 The Singular Value Decomposition and the Pseudoinverse 417 Ay is any other vector in R(T).X3 = 1 x\ + x2 .30(a).x2 .12 that arises in the least squares application on pages 360 364. Let b = though 1 1 -1 Thus. Finally. hence we may apply part (a) of this theorem to the system Ax = c to conclude that ||z|| < ||y|| with equality if and only if z = y.X3 = 1 and Xi + X2 . Then A*c = A^Az = A^AA^b = A^b = z by Exercise 23.30(b). The second system is obviously inconsistent.X3 = 1 x\ + x2 .x3 = 2.Sec. and let b = '1 By Example 6. 1 1 1 1 -1 -1 the The first system has infinitely many solutions.
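As a supplementary check on Example 7 (using the same A, b, and A† as above), one can verify directly that Az is the orthogonal projection of b on R(L_A) for the inconsistent system, as Theorem 6.30(b) and the lemma predict. With b = (1, 2)ᵗ, we have z = A†b = (1/2, 1/2, −1/2)ᵗ and
\[
Az = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}
\begin{pmatrix} 1/2 \\ 1/2 \\ -1/2 \end{pmatrix}
= \begin{pmatrix} 3/2 \\ 3/2 \end{pmatrix},
\]
which is the orthogonal projection of b on R(L_A) = span{(1, 1)ᵗ}, since ⟨b, (1,1)⟩/⟨(1,1), (1,1)⟩ = 3/2. Accordingly, ||Az − b|| = 1/√2 is the minimum value of ||Ax − b||.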

um} for W. cosx}) with the inner product defined by (/. find orthonormal bases {vi. g) = C f(t)g(t) dt. (b) The singular values of any matrix A are the eigenvalues of A* A. X\ + x2. and the nonzero singular values ai > a 2 > • • • > a r of T such that T(vi) = OiUi for 1 < i < r. Label the following statements as true or false. (g) The pseudoinverse of any linear operator exists even if the operator is not invertible. Let T: V — are finite-dimensional inner product spaces.. and the inner products are defined as in Example 1 (c) Let V = W = span({l. (a) T: R2 — + R3 defined by T{x\. u2. 6 Inner Product Spaces 1. 1 0 1 0 1 -1 /I i\ 0 I 1 0 V (d) (e) 1-M 1 1 — i —i l (0 I (a) (b) (c) V I l 0 -1 1 1 -2 1 1 1 4.. the vector A*b is a solution to Ax = b.. Find a polar decomposition for each of the following matrices. • W be a linear transformation of rank r. (1 + i)zx + z2) 3. where T(f(x)) = f"(x). (a) 0 -i) (b > ( 1 j { 5. • • •.418 EXERCISES Chap. sin x. X\ — x2) (b) T: P2(R) . Find a singular value decomposition for each of the following matrices.i)z2.• Pi(R). (a) The singular values of any linear operator on a finite-dimensional vector space are also eigenvalues of the operator. if a is a singular value of A. then |c|a is a singular value of cA. (e) If A is an eigenvalue of a self-adjoint matrix A.v2. then A is a singular value of A..x2) = (x\. (d) The singular values of any linear operator are nonnegative. (c) For any matrix A and any scalar c. Find an explicit formula for each of the following expressions. In each of the following. •M . and T is defined by T ( / ) = f + 2 / (d) T: C2 -> C2 defined by T(z1. vn} for V and {u\.z2) = ((1 . where V and W 2. (f) For any rnxn matrix A and any b G F n .

um} are orthonormal bases for V and W. where T is the linear transformation of Exercise 2(d) 6. For each of the given linear transformations T : V — » W.7 The Singular Value Decomposition and the Pseudoinverse 419 (a) T+ (x\. Let V and W be finite-dimensional inner product spaces over F. respectively. where T is the linear transformation of Exercise 2(c) (d) T + iz\.. /l i\ 0 I 1 0 1 (b) (c) (a) 1 0 1 0 V (d) (e) 1-H 1-i 1N (f) I I I ly I I I 0 -l -2 1 1 1 7. find the "best approximation to a solution" having minimum norm. find the unique solution having minimum norm. and suppose that ai > a 2 > • • • > a r > 0 arc such that T(Mj) = OjUi if 1 < i < r if r < i. (i) If the system is consistent.u2.Sec. and suppose that {v\. (ii) If the system is inconsistent. For each of the given systems of linear equations.z 2 ). 6.x4 — 2 Xi . as described in Theorem 6. X3).vn} and {u\.. Use the results of Exercise 3 to find the pseudoinverse of each of the following matrices..30(b). (ii) Describe the subspace Z2 of W such that TT* is the orthogonal projection of W on Z 2 . (i) Describe the subspace Zi of V such that T*T is the orthogonal projection of V on Z\.v2.T3 + £ 4 = 2 9. where T is the linear transformation of Exercise 2(b) (c) T + (a + 6 sin a: + coos re). .. x2. where T is the linear transformation of Exercise 2(a) (b) T + (a + bx + ex2). 0 .. (Use your answers to parts (a) and (f) of Exercise 6..2x 3 + x4 = -1 X\ — X2 + . (a) (b) (c) (d) T T T T is the is the is the is the linear linear linear linear transformation transformation transformation transformation of of of of Exercise Exercise Exercise Exercise 2(a) 2(b) 2(c) 2(d) 8. Let T: V —• W is a linear transformation of rank r..) (a) X\ + x\ + —X\ -\ x2 = 1 x2 = 2 X2 = 0 (b) X\ + x2 + xs -\.

(b) Use (a) to prove that A is normal if and only if WP = PW.. (c) Prove that TT* and T*T have the same nonzero eigenvalues. An. . then UT.27.. .vn} and corresponding eigenvalues Ai. A 2 . This exercise relates the singular values of a well-behaved linear operator or matrix to its eigenvalues.420 Chap. . . . Let A be a square matrix.. Prove that if A is a positive definite matrix and A — UEV* is a singular value decomposition of A. 10... Let A be a square matrix with a polar decomposition A = WP. 11.. . = (b) Let A be an ra x n matrix with real or complex entries. Prove that the nonzero singular values of A are the positive square roots of the nonzero eigenvalues of AA*. . Prove an alternate form of the polar decomposition for A: There exists a unitary matrix W and a positive semidefinite matrix P such that A = PW. Let A be a normal matrix with an orthonormal basis of eigenvectors 0 — {v\. .. the singular value decomposition theorem for matrices. . . .u2. A m . Prove that if A is a positive semidefinite matrix. |A n |.5 to obtain another proof of Theorem 6. (d) State and prove a result for matrices analogous to (c).. 16. A 2 . Prove that for each i there is a scalar 0j of absolute value 1 such that if U is the n x n matrix with 9iVt as column i and E is the diagonal matrix such that Ejj = |Aj| for each i. then the singular values of A are the same as the eigenvalues of A. 6 Inner Product Spaces (a) Prove that {u\. (a) Let T be a normal linear operator on an ?i-dimensional inner product space with eigenvalues Ai. A 2 .v2. 14. 15.. 12.. then U = V.V* is a singular value decomposition of A 13. . where A. Use Exercise 8 of Section 2. including repetitions. Let V be the n x n matrix whose columns are the vectors in 0. . . . An. (a) Prove that A is normal if and only if WP2 = P2W. Prove that the singular values of T are |Ai|. |A 2 |. including repetitions.um} is a set of eigenvectors of TT* with corresponding eigenvalues Ai. (b) State and prove a result for matrices analogous to (a).

UTU = U. State and prove a result for matrices that is analogous to the result of Exercise 21. 21. Let V and W be finite-dimensional inner product spaces.0). (c) Both T*T and TT* are self-adjoint. 25. then TT* is invertible and T* = T*(TT*) _ 1 . 22. The preceding three statements are called the P e n r o s e conditions. (b) For any n x n unitary matrix H. and both UT and TU are self-adjoint. 20. 24. 421 G R2 by (a) Prove that (UT)* ^ T*U*.x2) T(xi. Prove the following results. then T*T is invertible and T* = (T*T)~ 1 T*. (c) (At)«-(A*)t. Prove that (A*)2 = O. (a) TT*T = T. (b) If T is onto. but (AB)^ ^ 18. Let T: V — >W and U: W — > V be linear transformations such that TUT = T. State and prove a result for matrices that is analogous to the result of Exercise 22. and let T: V — > W be linear.7 The Singular Value Decomposition and the Pseudoinverse 17. 23. 6.x 2 ) = (xi + x 2 .0) and U(xi. Let T and U be linear operators on R2 defined for all (xi. (b) T*TT*=T*. Let V and W be finite-dimensional inner product spaces. and let T: V — * W be linear. 19. Let A be a square matrix such that A2 = O. (a) The nonzero singular values of A are the same as the nonzero singular values of A*.4*)* = (A)*.Sec. which are the same as the nonzero singular values of A1. Let ^4 be an ra x n matrix. (A//)* = ifM*.x2) = (xi. (a) For any ra x ra unitary matrix G. (GAY = A^G*. Prove the following results. (b) Exhibit matrices A and B such that AB is defined. Let V and W be finite-dimensional inner product spaces. (a) If T is one-to-one. (b) (. Prove that U = T*. Prove the following results. Prove the following results. Let A be a matrix with real or complex entries. and they characterize the pseudoinverse of a linear transformation as shown in Exercise 22. .

26. Let V and W be finite-dimensional inner product spaces, and let T: V → W be a linear transformation. Prove part (b) of the lemma to Theorem 6.30: TT† is the orthogonal projection of W on R(T).

27. Let V and W be finite-dimensional inner product spaces with orthonormal bases β and γ, respectively, and let T: V → W be linear. Prove that ([T]_β^γ)† = [T†]_γ^β.

6.8* BILINEAR AND QUADRATIC FORMS

There is a certain class of scalar-valued functions of two variables defined on a vector space that arises in the study of such diverse subjects as geometry and multivariable calculus. This is the class of bilinear forms. We study the basic properties of this class with a special emphasis on symmetric bilinear forms, and we consider some of its applications to quadratic surfaces and multivariable calculus.

Bilinear Forms

Definition. Let V be a vector space over a field F. A function H from the set V × V of ordered pairs of vectors to F is called a bilinear form on V if H is linear in each variable when the other variable is held fixed; that is, H is a bilinear form on V if
(a) H(ax1 + x2, y) = aH(x1, y) + H(x2, y) for all x1, x2, y ∈ V and a ∈ F;
(b) H(x, ay1 + y2) = aH(x, y1) + H(x, y2) for all x, y1, y2 ∈ V and a ∈ F.
We denote the set of all bilinear forms on V by B(V). Observe that an inner product on a vector space is a bilinear form if the underlying field is real, but not if the underlying field is complex.

Example 1
Define a function H: R² × R² → R by

H((a1, a2), (b1, b2)) = 2a1b1 + 3a1b2 + 4a2b1 − a2b2  for (a1, a2), (b1, b2) ∈ R².

We could verify directly that H is a bilinear form on R². However, it is more enlightening and less tedious to observe that if

A = [[2, 3], [4, −1]],  x = (a1, a2), and y = (b1, b2), regarded as column vectors,

then H(x, y) = xᵗAy. The bilinearity of H now follows directly from the distributive property of matrix multiplication over matrix addition. ♦
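For instance, evaluating H at a particular pair of vectors shows that the two descriptions above agree (a quick check using the matrix A of Example 1). Taking x = (1, 2) and y = (3, 1),
\[
H(x, y) = 2(1)(3) + 3(1)(1) + 4(2)(3) - (2)(1) = 31,
\]
while
\[
x^{t}Ay = \begin{pmatrix} 1 & 2 \end{pmatrix}
\begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}
\begin{pmatrix} 3 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 \end{pmatrix}\begin{pmatrix} 9 \\ 11 \end{pmatrix} = 31.
\]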

The bilinearity of H follows as in Example 1. then J is a bilinear Definitions.y) form.y).y) 4 H2{x. y G V.x) forallyGV. For example. Notice that since x and y are n x 1 matrices and A is an nxn matrix. We define the sum Hi 4 H2 and the scalar product aH\ by the equations (f/i 4. For any bilinear form i J o n a vector space V over a field F. the followingproperties hold. = Hx{x.x). = H(y.y G V.z + w) = H(x. define H: V x V . let H\ and H2 be bilinear forms on V.y G V.w G V. H(x. and let a be a scalar.y) and (aHi)(x.y) The following theorem is an immediate consequence of the definitions. then Lx and R^ are linear.y) = a(Hi(x. We identify this matrix with its single entry. Let V be a vector space.y) + H(x2.y) = (axi + x2)tAy = {ax\ 4 = ax\Ay 4 x\Ay = aH{x1. Example 2 423 Let V = F n .8 Bilinear and Quadratic Forms The preceding bilinear form is a special case of the next example. 6. .y) is a 1 x 1 matrix. y) = xlAy for x. 3. w). H(0. H(x + y.y)) forallx. For all x. z) 4 H(y. z) 4 H(x. x\)Ay • We list several properties possessed by all bilinear forms.• F by H(x.x2. If .z. 4. w) + H(y.x2. For any A G M n X n (F).7: V x V — > F is defined by J(x. where the vectors are considered as column vectors. Their proofs are left to the reader (see Exercise 2).y) and Rx(y) = H(y. 0) = 0 for all x G V.Sec. the functions Lx. x) = H(x. we have H(axi 4. 2. for a G F and x\. If. for any x G V.H2){x. 1. Rx: V — * F are defined by Lx(y) = H{x. y.

Definition. 6 Inner Product Spaces Theorem 6. Vj) for i. For any vector space V.. n. Furthermore.v2.. j = 1 . = 2 4 3 . the sum of two bilinear forms and the product of a scalar and a bilinear form on V are again bilinear forms on V. We can associate with // an .1 = 8.424 Chap.31. and let H G £(V).vn} be an ordered basis for an n-dimensional vector space V. r and B = $p(H). The matrix A above is called the matrix representation of H with respect to the ordered basis 0 and is denoted bytppffi).2 -1 = 2-3-4-1 =-6. ..(F). where F is the field of scalars for V./ x n matrix A whose entry in row i and column j is defined by Aij = H{vi. = 2 . . We first consider an example and then show that ipp is an isomorphism.4 + 1 = 2. = 2 + 3 + 4 . | Let 0 = {vx. 2 . If 7 is the standard ordered basis for R .3 + 441=4. Exercise. and let 0 = Then I3U = H BV2 = H B2\ = H and B22 =a H So MH) = 4 . Example 3 Consider the bilinear form H of Example 1. . We can therefore regard ijjp as a mapping from S(V) to M„ x r . the reader can verify that iMH) = '2 4 3N -1 . Proof. B(V) is a vector space with respect to these operations. . that takes a bilinear form H into its matrix representation ipp(H). .
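The verification invited above is immediate from the definition of the matrix representation, since each entry is H evaluated at a pair of standard basis vectors: with γ = {e1, e2},
\[
H(e_1, e_1) = 2, \quad H(e_1, e_2) = 3, \quad H(e_2, e_1) = 4, \quad H(e_2, e_2) = -1,
\]
so that
\[
\psi_{\gamma}(H) = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}.
\]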

4. Ry(vi) = H(vi. So H{x. To show that i/jp is onto. Hence LVi is the zero transformation from V to F. which is linear by property 1 on page 423. 6. Let viyv3 G 0.y) — Ry(x) = 0 for all x. y G F". for any i and j.j\ hence.Sec.y G V. and 0 be the standard ordered basis for F n .y) = [^(x)]*A[<f>e(y)] for all x. there exists a unique matrix A G M„. and therefore V/3 is one-to-one. So H{vi. and recall the linear mapping Ry: V —> F defined in property 1 on page 423. By (7).8 Bilinear and Quadratic Forms 425 T h e o r e m 6. Proof. Exercise. Then <^(fj) = ej and 4>(i{vj) = e. #(V) lias dimenI The following corollary is easily established by reviewing the proof of Theorem 6. n a positive integer. Then for any H G B(fn). sion n2. and recall the mapping LVi: V — * F . A = ipp{H). we view (f)p(x) G F n as a column vector. | We conclude that •tfipiH) = A and V^/3 is onto. Corollary 1. Let F be a field. Fix Vi G 0. Corollary 3. If H G B{\1) and A G M„Xn(-F). y) = xlAy for all x.32. y) = lMx)]*A\^p(y)] for all x. LVi(vj) = H(vi. and hence Ry is the zero transformation. ^cn ij)0{H) = A if and only ifH(x. For any n-dimcnsional vector space V over F and any ordered basis 0 for V. Corollary 2. For x G V. We leave the proof that ipp is linear to the reader. The following result is now an immediate consequence of Corollary 2.y G V. x) = LVi (x) = 0 for all x G V and v{ G 0. Let V be an n-dimensional vector space over F with ordered basis 0. We show that tj>p(H) = A. Proof.= Ai3. consider any A G M n X n ( F ) . To show that ipp is one-to-one. H(Vi. y G V. such that H(x. By hypothesis. . iftp: i?(V) — > M r t X n (F) is an isomorphism. namely. A slight embellishment of the method of Example 2 can be used to prove that H G B{V).y) = 0 for all Vi G 0.Vj) = 0 for all Vj G 0. suppose that ij^p(H) = O for some H G B(V). Let H: V X V — > F be the mapping defined by H(x. Thus H is the zero bilinear form. For any n-dimensional vector space V. xn (F). (7) Next fix an arbitrary y G V. Recall the isomorphism <f>p: V — > F n defined in Section 2.Vj) = [Mvi)]tA[Mvj)} = e\Ae3.32.

Proof.y G R2..x'Ay for all x..e. n h u r ' n : B ^ ^ Chap. the congruence. 6 Inner Product Spaces ta 6i &MS> It can be shown that H is a bilinear form. One involves a direct computation.. B G M „ x n ( F ) . one can pose the following question: How docs the matrix corresponding to a fixed bilinear form change when the ordered basis is changed? As we have seen.426 Example 4 Define a function H: R2 x R2 -^ R by // "l . we have Av = d e t | * '''" Therefore A = {< ()) A 12 = d e t ( j and A22 = det ( f)=l. Then B is said to be congruent to A if there exists an invertible matrix Q G M n x n ( F ) such that B = Q'AQ. The next theorem relates congruence to the matrix representation of a bilinear form.v2. the corresponding question leads to another relation on square matrices. We find the matrix A in Corollary 3 such that H{x. Let V be a finite-dimensional vector space with ordered bases 0 = {vi.33. relation. 1 1 01 0 1 -1 0 There is an analogy between bilinear forms and linear operators on finitedimensional vector spaces in that both are associated with unique square matrices and the correspondences depend on the choice of an ordered basis for the vector space. •. In the case of bilinear forms. Theorem 6. Then. Observe that the relation of congruence is an equivalence relation (sec Exercise 12).j) for all i and j . We give the more direct proof here.. Since Ai3 — H(ci. Therefore ij^{H) is congruent tod'3{H). Definition. As in the case of linear operators.y) .. while the other follows immediately from a clever observation. w7l). .w2. There are essentially two proofs of this theorem. and let Q be the change of coordinate matrix changing ^-coordinates into 0-coordinates. vn) and 7 = {w\. for any H G #(V). we have 0 7 (J7) = Q'ipg{H)Q. the corresponding question for matrix representations of linear operators leads to the definition of the similarity relation on square matrices.. leaving the other proof for the exercises (see Exercise 13). Let A.

.. Vr) fc=l r=l n n fc=l » fc=l 7i 7=1 7 1 r=l ^QkiH(vk.33. r=l 427 fc=l T J = y£Qlk(AQhJ fe=i Hence B = Q*AQ. if B is congruent to ipp(H).Sec. 6. . Then for 1 < i. Qijvi 7=1 f° r 1 < i < M. = (QtAQ)lj.Wj) = H I 'Y^QkiVk.Wj J n = fc=sl = YL ®kiH I Vk> 5 Z QriVr) fc=l V 7-1 / T t n = 22 ^kl S Q'jH(vk. I The following result is the converse of Theorem 6..wn]. and let H be a bilinear form on V. Let 7 = {w\. Corollary. then Q changes 7-coordinates into 0-coordinates. where = = 2_..i. .8 Bilinear and Quadratic Forms Suppose that A — ipp(H) and B = ipy(H). then there exists an ordered basis 7 for V such that ip^{H) = B.v2. Furthermore.QkiVk k=\ Thus Bij = H{w.Wj) and w3 =^TQrjVr.w2. Suppose that B = Q'ij)ft(H)Q for some invertible matrix Q and that 0 = {v\.. if B = Qt"iftp(H)Q for some invertible matrix Q. Proof... Let V he an n-dimensional vector space with ordered basis 0. For any n xn matrix B. j < n. = ^.vn}.

Since Q is invertible, γ is an ordered basis for V, and Q is the change of coordinate matrix that changes γ-coordinates into β-coordinates. Therefore, by Theorem 6.33,

ψ_γ(H) = Qᵗψ_β(H)Q = B. ∎

Symmetric Bilinear Forms

Like the diagonalization problem for linear operators, there is an analogous diagonalization problem for bilinear forms, namely, the problem of determining those bilinear forms for which there are diagonal matrix representations. As we will see, there is a close relationship between diagonalizable bilinear forms and those that are called symmetric.

Definition. A bilinear form H on a vector space V is symmetric if H(x, y) = H(y, x) for all x, y ∈ V.

As the name suggests, symmetric bilinear forms correspond to symmetric matrices.

Theorem 6.34. Let H be a bilinear form on a finite-dimensional vector space V, and let β be an ordered basis for V. Then H is symmetric if and only if ψ_β(H) is symmetric.

Proof. Let β = {v1, v2, ..., vn} and B = ψ_β(H). First assume that H is symmetric. Then for 1 ≤ i, j ≤ n,

Bij = H(vi, vj) = H(vj, vi) = Bji,

and it follows that B is symmetric. Conversely, suppose that B is symmetric. Let J: V × V → F, where F is the field of scalars for V, be the mapping defined by J(x, y) = H(y, x) for all x, y ∈ V. By the properties of bilinear forms listed on page 423, J is a bilinear form. Let C = ψ_β(J). Then, for 1 ≤ i, j ≤ n,

Cij = J(vi, vj) = H(vj, vi) = Bji = Bij.

Thus C = B. Since ψ_β is one-to-one (Theorem 6.32), we have J = H. Hence H(y, x) = J(x, y) = H(x, y) for all x, y ∈ V, and therefore H is symmetric. ∎

Definition. A bilinear form H on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis β for V such that ψ_β(H) is a diagonal matrix.

Corollary. Let H be a diagonalizable bilinear form on a finite-dimensional vector space V. Then H is symmetric.

In fact. • The bilinear form of Example 5 is an anomaly. Recall . and H: V x V H F be the bilinear form defined by = a\b2 +a2bx. comparing the row 1. Then there is an ordered basis 0 for V such that ij)@ (H) = D is a diagonal matrix. by Theorem 6. Clearly H is symmetric. there exists an invertible matrix Q such that B = Q* AQ. So by Theorem 6. B = Suppose that QThen = B = Q'AQ a c b d 0 1 1 0 a b c d ac + ac be + ad be + ad bd + bd 1 0 0 1 But p + p = 0 for all p G F. and hence. 6. suppose that H is diagonalizable. Then there is an ordered basis 7 for V such that B — 0 7 ( i / ) is a diagonal matrix.34. a contradiction. Since Q is invertible. Trivially. column 1 entries of the matrices in the equation above. Its failure to be diagonalizable is due to the fact that the scalar field Z2 is of characteristic two. D is a symmetric matrix. Thus. By way of contradiction. Example 5 Let F = Z2.8 Bilinear and Quadratic Forms 429 Proof Suppose that H is diagonalizable. 1 Unfortunately. then A = #(//) = ^ JV a symmetric matrix. the converse is not true. Therefore H is not diagonalizable. it follows that rank (73) = rank(^4) = 2. We show that H is not diagonalizable.Sec.33. H is symmetric.\/ = F 2 . and consequently the diagonal entries of B are nonzero. Since the only nonzero scalar of F is 1. hence ac + ac — 0. we conclude that 1 = 0. if 0 is the standard ordered basis for V. as is illustrated by the following example.

rank(L. . and hence dim(N(L J )) = n . I . Recall the function L.y) for all y G V. then A is congruent to a diagonal matrix. Furthermore.x) ^ 0.430 Chap. v G V such that H(u. L^ is linear. of characteristic two.) = 1. If // is the zero bilinear form on V. we establish the following Lemma. so suppose that 77 is a nonzero symmetric bilinear form on V. set x = u + v. and therefore II is diagonalizable. Let F be a field that is not of characteristic two. Proof.v) y£ 0. Then every symmetric bilinear form on V is diagonalizable.j < n . v) ^ 0.x) = H(u.. there is nothing to prove. Since H is nonzero. 6 Inner Product Spaces from Appendix C that a field F is of characteristic two if 1 + 1 = 0 in F. Set yn = x. then every element of <B(V) is diagonalizable.1).U2.r) is obviously a symmetric bilinear form on a vector space of dimension // — 1. Theorem 6. Tims. by the induction hypothesis. there exists an ordered basis {vi. Let V be a finite-dimensional vector space over a field F not of characteristic two. then 1 + 1 = 2 has a multiplicative inverse. we can choose vectors u. then trivially H is diagonalizable. By property 1 on page 423. Proof. Then vn <£ N(LX).1..Vi) — 0 for i = 1 .„) = H(un.:: V — F defined by L. and so 0 ~ {MI. .v) ^ 0 | because 2 ^ 0 and H{u.u) ^ 0 or H(v. Now suppose that the theorem is valid for vector spaces of dimension less than n for some fixed integer n > 1. . We use mathematical induction on n = dim(V). If H(u.n — 1.u) + H{u.v2. Then there is a vector x in V such that H(x. i.vn) is an ordered basis for V.34 for scalar fields that are not. By the lemma. We conclude that ijipiH) is a diagonal matrix. Proof. If n — 1. x) ^ 0.v) + H(v. Exercise.. . In addition. Lemma. Before proving the converse of the corollary to Theorem 6. /7(u.v) ^ 0. .(/y) = H(x.u) + H{v.v) = 2H(u.Vj) = 0 for i / j (1 < i. .35. Lx is nonzero. . Let H be a nonzero symmetric bilinear form on a vector space V over a field F not of characteristic two. . 1 Corollary. 2 . The restriction of 77 to N(L. Consequently. If A G M„ x „(F) is a symmetric matrix. Otherwise. Then H(x.. and suppose that diin(V) = n.vn i} for INKL^) such that II(vi.x) / 0. which we denote by 1/2. there exists a nonzero vector x in V such that 77(x. If F is not of characteristic two. since L a (x) = II(x. .
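A small supplementary example illustrates this corollary and anticipates the computational procedure described next. Over R, let
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad
Q = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}.
\]
Then
\[
Q^{t}AQ = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & -3 \end{pmatrix},
\]
so A is congruent to a diagonal matrix, as the corollary guarantees. Here Q records a single elementary column operation (adding −2 times column 1 to column 2) paired with the corresponding row operation, which is exactly the method developed below.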

i(R) defined by We use the procedure just described to find an invertible matrix Q and a diagonal matrix D such that Q'AQ = D...1. Thus E'AE can be obtained from A by performing an elementary operation on the columns of A and then performing the same operation on the rows of AE.) Suppose that Q is an invertible matrix and D is a diagonal matrix such that QlAQ = D. we conclude that by means of several elementary column operations and the corresponding row operations. and if Q — E\E2 • . then Q'AQ = D. there are matrices Q. sayQ = E1E2---Ek.. and Ql AQ = D.E2.8 Bilinear and Quadratic Forms Diagonalization of Symmetric Matrices 431 Let A be a symmetric n x n matrix with entries from a field F not of characteristic two. This method requires familiarity with elementary matrices and their properties. If E is an elementary nxn matrix. We now give a method for computing Q and D. (Note that the order of the operations can be reversed because of the associative property of matrix multiplication. By the corollary to Theorem 6.Sec. D is diagonal. The elementary matrix that corresponds to this elementary column operation is E. We begin by eliminating all of the nonzero entries in the first row and first column except for the entry in column 1 and row 1. 6.6 (p.Ek are the elem. A can be transformed into a diagonal matrix D. then AE can be obtained by performing an elementary column operation on A. Q is a product of elementary matrices.D G M n x n ( F ) such that Q is invertible. 159). To this end. we add the first column of A to the second column to produce a zero in row 1 and column 2. From the preceding equation. if E\. Furthermore. By Corollary 3 to Theorem 3. ElA can be obtained by performing the same operation on the rows of A rather than on its columns. Example 6 Let A be the symmetric matrix in M:ix. = . which the reader may wish to review in Section 3.35. By Exercise 21.• Ek. .E2 • • • Ek. Thus D = Q'AQ = ££££_i • • • E\AE.entary matrices corresponding to these elementary column operations indexed in the order performed.

So we let /I 1 . respectively. 4 1 We now use the first column of E[AE\ to eliminate the 3 in row 1 column 3. and follow this operation with the corresponding row operation. The corresponding elementary matrix E3 and the result of the elementary operations E!iE2E\AE\E2E:i are. The reader should justify the following method for computing Q without recording each elementary matrix separately. this method produces the following sequence of matrices: 1 -1 3 2 1 -1 1 1 3 1 0 3 1 1 1 3 1 1 (A\I)= .2. The method is inspired by the algorithm for computing the inverse of a matrix developed in Section 3.432 Chap. Starting with the matrix A of the preceding example. 1 0 0 1 0 0 0 -4 1 E^El2E\AElE2E3 0 0 0 1 0 0 0 -24 E3 = and = Since we have obtained a diagonal matrix. The corresponding elementary matrix E2 and the result of the elementary operations E^E\AEXE2 are.= 0 -3 0 1 0 0 0 1 1 0 0 4 0 1 0 4 -8 and E\E\AEiE2 = Finally.7 \ Q = E1E2E3 = I 0 1 .4 \0 0 1/ /I 0 D = I0 1 \0 0 • 0N 0 -24. the process is complete. E. We use a sequence of elementary column operations and corresponding row operations to change the n x 2n matrix (A\I) into the form (D\B). 6 Inner Product Spaces We perforin the corresponding elementary operation on the rows of AE\ to obtain /1 E\AEi = 0 \3 0 3N 1 4 | . It then follows that D = Q'AQ. and to obtain the desired diagonalization Q' AQ = D. where D is a diagonal matrix and B = Q*. respectively. we subtract 4 times the second column of E2E\AE\E2 from the third column and follow this with the corresponding row operation.

8 Bilinear and Quadratic Forms 433 Therefore F> = Q = l and (l 1 Q= I0 1 VO 0 -V -4 1. x) for some symmetric bilinear form 77 on V. tn) = ^2 a-ijtitji<3 K{x) ... then we can recover 77 from K because H(x.K(y)} (9) . In fact. there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms given by (8).. x) for all x G V. Let V be a vector space over F.. if K is a quadratic form on a vector space V over a field F not of characteristic two..Sec.t2.y) = -[K(x + y) (See Exercise 16..t2. .. (8) If the field F is not of characteristic two. and K(x) = 77(x. Definition. 6.) Example 7 The classic example of a quadratic form is the homogeneous second-degree polynomial of several variables. Quadratic Forms Associated with symmetric bilinear forms are functions called quadratic forms. define the polynomial f(h.tn that take values in a field F not of characteristic two and given (not necessarily distinct) scalars aij (1 < i < j < n). Given the variables t\.. A function K: V —• F is called a quadratic form if there exists a symmetric bilinear form 77 G #(V) such that K(x) = H(x.

33. SI L . Let 0 = {wi.. ' 2 .5. Since A is symmetric.20 p. Proof. Let V be a finite-dimensional real inner product space. The following theorem and its corollary are useful. 0 is orthonormal by Exercise 30 of Section 6.v2. where A 7 — A'* ~ ~ an \ai3 if t = j if i^j. Wn} be defined by ivj = Y J QijVi for 1 < 7 < n. 384). T h e o r e m 6.t2)t3) = (tut2. To see this. there exists an orthogonal matrix Q and a diagonal matrix D such that D — Q' AQ by Theorem 6.. . and let H be a symmetric bilinear form on V.*( 77) is a diagonal matrix. In fact. Then there exists an orthonormal basis 0 for V such that «/'. the theory of symmetric bilinear forms and quadratic forms on finite-dimensional vector spaces over 77 is especially nice. if 3 is the standard ordered basis for F". For example.t3)A % ( t2 Quadratic Forms Over the Field R Since symmetric matrices over 7? are orthogonally diagonalizable (see Theorem 6..434 Chap.vn} for V. h) = 2/'f .e3) = A j from the quadratic form K.36.y) = x' Ay for all x.. Furthermore. ^#(77) = D.t\ + 6tit 2 . given the polynomial / ( ' i .. apply (9) to obtain H(ei.. let Setting H(x. ..4t 2 t 3 with real coefficients. and let A — */>-v(T7). 6 Inner Product Spaces Any such polynomial is a quadratic form.w2. Choose any orthonormal basis 7 = {vi. and verify that / is computable from H by (8) using / in place of K. 7=1 By Theorem 6. since Q is orthogonal and 7 is orthonormal. then the symmetric bilinear form 77 corresponding to the quadratic form / has the matrix representation ipn(H) = A.y G R'\ we see that f(ti.20.

..2>.x) = [<f>p(x)]tD[<f>(i(x)} = = I > f ..i 2 .vn} for V and scalars Ai. In fact.t2) relative to 7. there exists an orthonormal basis 0 — {v\..v2.t2).sn)D K(x)=H(x.t2) = Xis2 + A2A. . A 2 .. (10) we find an orthonormal basis 7 = {xi.Then fsi\ s2 (s1. Let 77 be the symmetric bilinear form for which K{x) = H(x.2 + 4 ll . as an expression involving . then 0 can be chosen to be any orthonormal basis for V such that il>p(H) is a diagonal matrix. 7=1 I Example 8 For the homogeneous real polynomial of degree 2 defined by / ( t i . then f{t\. There exists an orthonormal basis 0 = {v\..x 2 } for R2 and scalars Ai and A2 such that if 1 ) G R2 and ( '} ] = SiXi + s 2 x 2 .36. . if 77 is the symmetric bilinear form determined by K. .s2..=i then K(x) = J2XiS2.x) for all x G V.v2. Let K be a quadratic form on a finite-dimensional real inner product space V. By Theorem 6. .8 Bilinear and Quadratic Forms 435 Corollary. Thus the polynomial f(ti.Sec. . We can think of s\ and $2 as the coordinates of (ti. An (not necessarily distinct) such that if x G V and n x = ^2siVi. . . 6. and suppose that x = X^iLi SiVi...vn} for V such that tpp(H) is the diagonal matrix /Ai 0 D = \0 0 A2 0 ••• 0\ 0 An/ Let x G V. Proof. Si G 7?.. t 2 ) = 5i 2 + 2r.

Consequently. setting Q J _ (2 1 we see that Q is an orthogonal matrix and QlAQ = '6 0 0 1 Clearly Q is also a change of coordinate matrix. is transformed into a new polynomial g(si. Then Next. = 7i(i) •°d W2 = 7s(-2)' Let 7 = {vi. • '6 0N y (| L The next example illustrates how the theory of quadratic forms can be applied to the problem of describing quadratic surfaces in R3. we find an orthogonal matrix Q such that Q'AQ is a diagonal matrix.2*2*3 + 2*1 + 3*i . (11) 2*f + 6*i*2 + 5*1 .*3 + 14 = 0.436 Chap. Let 77 denote the symmetric bilinear form corresponding to the quadratic form defined by (10). Then 7 is an orthonormal basis for R2 consisting of eigenvectors of A.2*2 . So g(si. Example 9 Let «S be the surface in R3 defined by the equation . K(x) = 6sl + si for any x = s\v 1 + s2i>2 G R2.v2}. . 6 Inner Product Spaces the coordinates of a point with respect to the standard ordered basis for R2.s2) = XiS2 + A2s2 interpreted as an expression involving the coordinates of a point relative to the new ordered basis 7. For this purpose. let 0 be the standard ordered basis for R2. Hence. observe that Ai = 6 and A2 = 1 are the eigenvalues of A with corresponding orthonormal eigenvectors ". s2) = 6s2 + s 2 . and let i4 = ^ ( H ) . V7(77) = Q*il>p{H)Q = QlAQ = Thus by the corollary to Theorem 6.36.

and let A — ipp{H). and ^(H) = QtMH)Q ^ = QtAQ= /2 [0 \0 0 7 0 0\ 0). Next. hence A has the eigenvalues Ai = 2. Then The characteristic polynomial of A is (—1)(* — 2)(* — 7)*. we diagonalize K.8 Bilinear and Quadratic Forms 437 Then (11) describes the points of S in terms of their coordinates relative to 0.2*2*3 + 2*§. then K(x) = 2s 2 + 7s|. A2 = 7. We find a new orthonormal basis 7 for R3 so that the equation describing the coordinates of < S relative to 7 is simpler than (11).v3} and 1 vTo Q = 0 3 \VTo 3 \/35 5 -1 -3 y/Ti 2 1 W As in Example 8.Sec. Q is a change of coordinate matrix changing 7-coordinates to /3-coordinates. if x = siVi + s2v2 + S3M3. and A3 = 0.36.v2. 0/ By the corollary to Theorem 6. Let 77 be the symmetric bilinear form corresponding to K. 6. Corresponding unit eigenvectors are and V3 = Vi = VTo\3J v2 = Set 7 = {vi. the standard ordered basis for R3. We begin with the observation that the terms of second degree on the left side of (11) add to form a quadratic form K on R3: f< I t 2 = 2*2 + 6*i*2 + 5*1 . (12) .

438 Chap. I l l ) .22 (p. Then.s2 *i = —7= • + 10 V35 *2 = and *a = 3.7 We are now ready to transform (11) into an equation involving coordinates relative to 7. by Theorem 2. and therefore Si 3.S1 s2 '35 S3 VT4 5. t2.s2 \/3o~ 3. 6 Inner Product Spaces Figure 6. and suppose that x = S1V1 -r82v2+s$V3. Let x — (ti. t3) G R3.S3 2s 3 \/\l .

p2. then for any i = 1 . * n ) be a fixed real-valued function of n real variables for which all third-order partial derivatives exist and are continuous. we say that / has a local extremum at p. 6. n the .v2. Wade (Prentice Hall. if we draw new axes x'. Upper Saddle River. Consequently. . An Introduction to Analysis 2d ed. then p is a critical point of / .'\2 ^iy') 14. . and z' in the directions of V\. respectively. • • •. 2 . The reader is undoubtedly familiar with the one-variable version of Taylor's theorem. . For. we conclude that if x G R3 and x = Sifi + S2^2 + S3M3. 2 . A point p G R n is called a critical point of / if df(p)/dti = 0 for i = 1 .\/Us3 + 14 = 0 or s3 = ^ s ? + — &\ + Vu. consult. / has a local minimum at p G Rn if there exists a 5 > 0 such that f(p) < f(x) whenever ||x— p\\ < 6. and the preceding equation. *2. We recognize S to be an elliptic paraboloid. . The function / is said to have a local maximum at a point p G R n if there exists a 6 > 0 such that f(p) > f(x) whenever \\x — p|| < 8.y'. If / has either a local minimum or a local maximum at p. rewritten as . . . by William R. n. 439 Combining (11).7 is a sketch of the surface S drawn so that the vectors V\. . For a statement and proof of the multivariable version. Figure 6. N. • • • iPn)-. 2000).8 Bilinear and Quadratic Forms Thus 14s? 3*i — 2*2 — *3 = — = -V14s 3 . z — 7 (xT v ' + 2 coincides with the surface S. for example. It is a well-known fact that if / has a local extremum at a point p G R n . For practical purposes. • The Second Derivative Test for Functions of Several Variables We now consider an application of the theory of quadratic forms to multivariable calculus—the derivation of the second derivative test for local extrema of a function of several variables. We assume an acquaintance with the calculus of functions of several variables to the extent of Taylor's theorem. Let z — /(*i. Likewise. (12). if / has a local extremum at p = (pi.J. the scale of the z' axis has been adjusted so that the figure fits the page. then x G S if and only if 2s^ + lsA2 . the graph of the equation.Sec.. and V3. . v2 and V3 are oriented to lie in the principal directions.

.pn +*n) . . * 2 .pi+i... *n) be a real-valued function in n real variables for which all third-order partial derivatives exist and arc continuous. 1. then f has no local extremum at p (p is called a saddle-point of f).. .. The partial derivatives of g at 0 are equal to the corresponding partial derivatives of / at p. (b) If all eigenvalues of A(p) are negative.. .....pn) be a critical point of f. Note that if the thirdorder partial derivatives of / are continuous. and let A(p) be the Hessian of f at p. 2. If p ^ 0. by an elementary single-variable argument. . 3. all of the eigenvalues of A(p) are real. Proof. . So.f(p)- . The second-order partial derivatives of / at a critical point p can often be used to test for a local extremum at p. 0 is a critical point of g. column j entry is d2f{p) (dti)(dtj)This matrix is called the Hessian matrix of / at p. Let p = (p\. then the second derivative test is inconclusive.. 6 Inner Product Spaces function <f>i defined by 4>i(t) = f(pi. (d) If rank(A(p)) < n and A(p) does not have both positive and negative eigenvalues.. . Theorem 6. df(p) Ok d&ipi] = 0. . then f has a local minimum at p. we may define a function g: Rn —• 77 by 0(*i. 0 . +p2.t2 The following facts are easily verified. dt Thus p is a critical point of / ... (a) If all eigenvalues of A(p) arc positive.37 (The Second Derivative Test). Let / ( t i . In this case.t n ) = /(*i +pl. 0). (c) 7f A(p) has at least one positive and at least one negative eigenvalue. and hence A(p) is a symmetric matrix.p2. then the mixed second-order partials of / at p are independent of the order in which they are taken. . But critical points are not necessarily local extrema.pn) has a local extremum at * = pi.p2. . These partials determine a matrix A(p) in which the row i....* 2 . . The function / has a local maximum [minimum] at p if and only if g has a local maximum [minimum] at 0 — ( 0 .440 Chap.pi-\. then f has a local maximum at p.t. .

) . Let 7 = {v\.8 Bilinear and Quadratic Forms 441 In view of these facts.. Theorem 6.. • • • . .*n)> 2^(^)(a*j (13) where 5 is a real-valued function on Rn such that 5(x) lim x-»0 ||x[| 2 5(*i.Sec.. Now we apply Taylor's theorem to / to obtain the first-order approximation of / around 0. 384) implies that there exists an orthogonal matrix Q such that /Ai 0 \0 0 A2 0 0\ 0 XnJ d2f(0) "Cjtj.33 ' * ^ 7 (77) = QlMH)Q = ^QtA(p)Q = 0 0 0 T/ ... Then Q is the change of coordinate matrix changing 7coordinates into /^-coordinates. .^{dt^dt.. * .«. and by Theorem 6.vn} be the orthogonal basis for Rn whose ith vector is the ith column of Q. ... • •..*n)-»0 *i + *i + r. we may assume without loss of generality that p = 0 and f(p) = 0.*2. It is easy to verify that ip@(H) — $A(p). Since A(p) is symmetric.. (ti./ W + EdU T T^ ^' + 2 S E ^(dti)(dtj) 7=1 1 n a 2 /(fl) *i*j + 5(*i. We have d2f(0) *z*j+5(*i. .*„) /(*..* n ) lim ? = 0...* 2 . . and 0 be the standard ordered basis for R n .ta.*n (14) Let 7f: R n —• 77 be the quadratic form defined by (ti\ *2 v»/ 77 be the symmetric bilinear form corresponding to K.v2.. 6. .20 (p.*2. K 1 y 2 (15) QtA{p)Q = is a diagonal matrix whose diagonal entries are the eigenvalues of A(p).

Thus / attains both positive and negative values arbitrarily close to 0.442 Suppose that A(p) is not the zero values. + e)s2 < 0 = f(0).e\\x\\2 < f(x) < K{x) + f ||x| Suppose that x = > SiVt. by the left inequality in (17). Then i=\ Z* and 1 K(x) = . Consider any x G R n and (15). so / has neither a local maximum nor a local minimum at 0. we have that if all of the eigenvalues of A(p) are negative.r > 0 and ^Xj + e < 0. By (14). 6 Inner Product Spaces matrix. and hence. Choose e > 0 such that e < exists S > 0 such that for any x G |S(x)| < e||x|| 2 . we have < S. By a similar argument using the right inequality in (17). by (13) \S(x)\<e\\x\ K{x) . there ||x|| < 6. Then |Aj|/2 for all R n satisfying such that 0 < A(p) Aj ^ 0 < ||x|| has nonzero eigen0. . Next. we obtain 7=1 ^ ' i=l > ' Now suppose that all eigenvalues of A(p) are positive. This establishes (a) and (b) of the theorem. Let s be any real number such that 0 < |s| < 6. say. suppose that A(p) has both a positive and a negative eigenvalue. Then.e V s 2 < / ( x ) . then / has a local maximum at 0. 7=1 ^ ' Thus f(0) < / ( x ) for ||x|| < 6. This establishes (c). Then ^A. \f(x)-K(x)\ and hence = Chap. .f > 0 for all i. Then ^Aj . Aj > 0 and Aj < 0 for some i and j. /(0) = 0 < E ( i A .y^Xn 2 7=1 :i6) Combining these equations with (16). . and so / has a local minimum at 0. Substituting x = svi and x = SVJ into the left inequality and the right inequality of (17). we obtain f(0) = 0<(±Xi-e)s2<f(sVi) and f(SVj) < (JA. respectively.

Let p and q be the number of positive diagonal entries in the matrix representations of 77 with respect to 0 and 7. assume that p < q. Let 0 = {v\... Without loss of generality. respectively. .38 (Sylvester's Law of Inertia).Sec.. Theorem 6. consider the functions /(*i.. We suppose that p •£ q and arrive at a contradiction...* 2 ) = 4 .wn}.4 and g{tiM) = tl+4 at p = 0. Each such form has a diagonal matrix representation in which the diagonal entries may be positive. negative. and zero. We prove the law and apply it to describe the equivalence classes of congruent symmetric real matrices.vr. 6.8 Bilinear and Quadratic Forms 443 To show that the second-derivative test is inconclusive under the conditions stated in (d). Let 77 be a symmetric bilinear form on a finite-dimensional real vector space V.. We can therefore define the rank of a bilinear form to be the rank of any of its matrix representations.v2... This result is called Sylvester's law of inertia.wq. they are independent of the choice of diagonal representation. . It suffices to show that both representations have the same number of positive entries because the number of negative entries is equal to the difference between the rank and the number of positive entries. Suppose that 0 and 7 are ordered bases for V that determine diagonal representations of 77.vp.w2. we show that the number of entries that are positive and the number that are negative are unique.. / does not have a local extremum at 0. Without loss of generality.. negative. That is. whereas g has a local minimum at 0. . we may assume that 0 and 7 are ordered so that on each diagonal the entries are in the order (of positive. Although these entries are not unique. the function has a critical point at p... If a matrix representation is a diagonal matrix. Then the number of positive diagonal entries and the number of negative diagonal entries in any diagonal matrix representation of H are each independent of the diagonal representation. Proof.. We confine our analysis to symmetric bilinear forms on finite-dimensional real vector spaces. and A(p) = '2 0N 0 0 However. I Sylvester's Law of Inertia Any two matrix representations of a bilinear form have the same rank because rank is preserved under congruence...... or zero...vn} and 7 = {wi. In both cases.wr. . then the rank is equal to the number of nonzero diagonal entries of the matrix.

where r is the rank of H and n = dim(V). Let L: V → R^{p+r−q} be the mapping defined by

L(x) = (H(x, v₁), H(x, v₂), ..., H(x, v_p), H(x, w_{q+1}), H(x, w_{q+2}), ..., H(x, w_r)).

It is easily verified that L is linear and rank(L) ≤ p + r − q. Hence

nullity(L) ≥ n − (p + r − q) > n − r.

Since dim(span({v_{r+1}, v_{r+2}, ..., v_n})) = n − r, there exists a nonzero vector v₀ such that v₀ ∉ span({v_{r+1}, v_{r+2}, ..., v_n}), but v₀ ∈ N(L). Since v₀ ∈ N(L), we have H(v₀, v_i) = 0 for i ≤ p and H(v₀, w_i) = 0 for q < i ≤ r.

Suppose that v₀ = Σ_{j=1}^n a_j v_j. For any i ≤ p,

0 = H(v₀, v_i) = H( Σ_{j=1}^n a_j v_j, v_i ) = Σ_{j=1}^n a_j H(v_j, v_i) = a_i H(v_i, v_i),

and H(v_i, v_i) > 0, so that a_i = 0. Since v₀ is not in the span of {v_{r+1}, v_{r+2}, ..., v_n}, it follows that a_i ≠ 0 for some p < i ≤ r. Thus

H(v₀, v₀) = H( Σ_{j=1}^n a_j v_j, Σ_{j=1}^n a_j v_j ) = Σ_{j=1}^n a_j² H(v_j, v_j) = Σ_{j=p+1}^r a_j² H(v_j, v_j) < 0,

because H(v_j, v_j) < 0 for p < j ≤ r, H(v_j, v_j) = 0 for j > r, and a_i ≠ 0 for some p < i ≤ r.

Similarly, suppose that v₀ = Σ_{j=1}^n b_j w_j. Since H(v₀, w_i) = 0 for q < i ≤ r, we obtain b_i = 0 for q < i ≤ r, and hence

H(v₀, v₀) = Σ_{j=1}^n b_j² H(w_j, w_j) = Σ_{j=1}^q b_j² H(w_j, w_j) + Σ_{j=r+1}^n b_j² H(w_j, w_j) ≥ 0,

because H(w_j, w_j) > 0 for j ≤ q and H(w_j, w_j) = 0 for j > r. So H(v₀, v₀) < 0 and H(v₀, v₀) ≥ 0, which is a contradiction. We conclude that p = q.

Definitions. The number of positive diagonal entries in a diagonal representation of a symmetric bilinear form on a real vector space is called the index of the form. The difference between the number of positive and the number of negative diagonal entries in a diagonal representation of a symmetric bilinear form is called the signature of the form. These same terms apply to the associated quadratic form. The three terms rank, index, and signature are called the invariants of the bilinear form because they are invariant with respect to matrix representations. Notice that the values of any two of these invariants determine the value of the third.
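Because a real symmetric matrix is congruent (indeed, orthogonally similar) to the diagonal matrix of its eigenvalues, Sylvester's law lets the invariants be read off from the signs of the eigenvalues. The following sketch assumes NumPy is available; the function name and the tolerance parameter `tol` (used to decide when a computed eigenvalue should be treated as zero) are illustrative choices, not notation from the text.

```python
import numpy as np

def invariants(A, tol=1e-10):
    """Return (rank, index, signature) of a real symmetric matrix A.

    By Sylvester's law of inertia these numbers do not depend on which
    diagonal matrix congruent to A is used; here we use the eigenvalue
    diagonalization, since A is congruent to the diagonal matrix of its
    eigenvalues.
    """
    eigenvalues = np.linalg.eigvalsh(A)        # real eigenvalues of a symmetric matrix
    positive = int(np.sum(eigenvalues > tol))  # number of positive diagonal entries
    negative = int(np.sum(eigenvalues < -tol)) # number of negative diagonal entries
    return positive + negative, positive, positive - negative

# The diagonal matrix with entries 1 and -1 (see Example 11) has
# rank 2, index 1, and signature 0.
print(invariants(np.diag([1.0, -1.0])))        # (2, 1, 0)
```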

Example 10
The bilinear form corresponding to the quadratic form K of Example 9 has a 3 × 3 diagonal matrix representation with diagonal entries of 2, 3, and 0. Therefore the rank, index, and signature of K are each 2. ♦

Example 11
The matrix representation of the bilinear form corresponding to the quadratic form K(x, y) = x² − y² on R² with respect to the standard ordered basis is the diagonal matrix with diagonal entries 1 and −1. Therefore the rank of K is 2, the index of K is 1, and the signature of K is 0. ♦

Since the congruence relation is intimately associated with bilinear forms, we can apply Sylvester's law of inertia to study this relation on the set of real symmetric matrices.

Corollary 1 (Sylvester's Law of Inertia for Matrices). Let A be a real symmetric matrix. Then the number of positive diagonal entries and the number of negative diagonal entries in any diagonal matrix congruent to A is independent of the choice of the diagonal matrix.

Proof. Let A be an n × n real symmetric matrix, and suppose that D and E are each diagonal matrices congruent to A. By Corollary 3 to Theorem 6.32, D and E are diagonal matrix representations of the bilinear form H on Rⁿ defined by H(x, y) = xᵗAy with respect to the standard ordered basis for Rⁿ. Therefore Sylvester's law of inertia tells us that D and E have the same number of positive and negative diagonal entries.

Definitions. Let A be a real symmetric matrix, and let D be a diagonal matrix that is congruent to A. The number of positive diagonal entries of D is called the index of A. The difference between the number of positive diagonal entries and the number of negative diagonal entries of D is called the signature of A. As before, the rank, index, and signature of a matrix are called the invariants of the matrix, and the values of any two of these invariants determine the value of the third.

Any two of these invariants can be used to determine an equivalence class of congruent real symmetric matrices. We can state this result as the matrix version of Sylvester's law.

Corollary 2. Two real symmetric n × n matrices are congruent if and only if they have the same invariants.

Proof. If A and B are congruent n × n symmetric matrices, then they are both congruent to the same diagonal matrix, and it follows that they have the same invariants. Conversely, suppose that A and B are n × n symmetric matrices with the same invariants. Let D and E be diagonal matrices congruent to A and B,


respectively, chosen so that the diagonal entries are in the order of positive, negative, and zero. (Exercise 23 allows us to do this.) Since A and B have the same invariants, so do D and E. Let p and r denote the index and the rank, respectively, of both D and E. Let d_i denote the ith diagonal entry of D, and let Q be the n × n diagonal matrix whose ith diagonal entry q_i is given by

q_i = 1/√(d_i)     if 1 ≤ i ≤ p,
q_i = 1/√(−d_i)    if p < i ≤ r,
q_i = 1            if r < i.

Then QᵗDQ = J_pr, where

J_pr = ( I_p    O         O
         O     −I_{r−p}   O
         O      O         O ).

It follows that A is congruent to J_pr. Similarly, B is congruent to J_pr, and hence A is congruent to B.

The matrix J_pr acts as a canonical form for the theory of real symmetric matrices. The next corollary, whose proof is contained in the proof of Corollary 2, describes the role of J_pr.

Corollary 3. A real symmetric n × n matrix A has index p and rank r if and only if A is congruent to J_pr (as just defined).
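The scaling matrix Q in the proof of Corollary 2 is easy to build explicitly. The following sketch assumes NumPy; the function name and the sample diagonal matrix `D` are illustrative and do not come from the text.

```python
import numpy as np

def to_canonical_form(D):
    """Scale a diagonal matrix D into J_pr, as in the proof of Corollary 2.

    Assumes the diagonal entries of D are already ordered positive,
    negative, zero.  q_i = 1/sqrt(|d_i|) for nonzero entries and 1 for
    zero entries, exactly as in the construction of Q.
    """
    d = np.diag(D).astype(float)
    q = np.ones_like(d)
    nonzero = d != 0
    q[nonzero] = 1.0 / np.sqrt(np.abs(d[nonzero]))
    Q = np.diag(q)
    return Q.T @ D @ Q

# A hypothetical diagonal matrix with two positive entries, one negative
# entry, and one zero entry (index 2, rank 3):
D = np.diag([4.0, 9.0, -16.0, 0.0])
print(to_canonical_form(D))     # diag(1, 1, -1, 0), the matrix J_pr for p = 2, r = 3
```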

Example 12
Let

A = ( 1 1 3
      1 2 1
      3 1 1 ),

and let B and C be two other 3 × 3 real symmetric matrices. We apply Corollary 2 to determine which pairs of the matrices A, B, and C are congruent. The matrix A is the 3 × 3 matrix of Example 6, where it is shown that A is congruent to a diagonal matrix with diagonal entries 1, 1, and −24. Therefore A has rank 3 and index 2. Using the methods of Example 6 (it is not necessary to compute Q), it can be shown that B and C are congruent, respectively, to diagonal matrices.


It follows that both A and C have rank 3 and index 2, while B has rank 3 and index 1. We conclude that A and C are congruent but that B is congruent to neither A nor C. ♦

EXERCISES

1. Label the following statements as true or false.
   (a) Every quadratic form is a bilinear form.
   (b) If two matrices are congruent, they have the same eigenvalues.
   (c) Symmetric bilinear forms have symmetric matrix representations.
   (d) Any symmetric matrix is congruent to a diagonal matrix.
   (e) The sum of two symmetric bilinear forms is a symmetric bilinear form.
   (f) Two symmetric matrices with the same characteristic polynomial are matrix representations of the same bilinear form.
   (g) There exists a bilinear form H such that H(x, y) ≠ 0 for all x and y.
   (h) If V is a vector space of dimension n, then dim(B(V)) = 2n.
   (i) Let H be a bilinear form on a finite-dimensional vector space V with dim(V) > 1. For any x ∈ V, there exists y ∈ V such that y ≠ 0, but H(x, y) = 0.
   (j) If H is any bilinear form on a finite-dimensional real inner product space V, then there exists an ordered basis β for V such that ψ_β(H) is a diagonal matrix.

2. Prove properties 1, 2, 3, and 4 on page 423.

3. (a) Prove that the sum of two bilinear forms is a bilinear form.
   (b) Prove that the product of a scalar and a bilinear form is a bilinear form.
   (c) Prove Theorem 6.31.

4. Determine which of the mappings that follow are bilinear forms. Justify your answers.
   (a) Let V = C[0,1] be the space of continuous real-valued functions on the closed interval [0,1]. For f, g ∈ V, define

       H(f, g) = ∫₀¹ f(t) g(t) dt.

   (b) Let V be a vector space over F, and let J ∈ B(V) be nonzero. Define H: V × V → F by H(x, y) = [J(x, y)]² for all x, y ∈ V.


   (c) Define H: R × R → R by H(t₁, t₂) = t₁ + 2t₂.
   (d) Consider the vectors of R² as column vectors, and let H: R² × R² → R be the function defined by H(x, y) = det(x, y), the determinant of the 2 × 2 matrix with columns x and y.
   (e) Let V be a real inner product space, and let H: V × V → R be the function defined by H(x, y) = ⟨x, y⟩ for x, y ∈ V.
   (f) Let V be a complex inner product space, and let H: V × V → C be the function defined by H(x, y) = ⟨x, y⟩ for x, y ∈ V.

5. Verify that each of the given mappings is a bilinear form. Then compute its matrix representation with respect to the given ordered basis β.
   (a) H: R³ × R³ → R, where H((a₁, a₂, a₃), (b₁, b₂, b₃)) = a₁b₁ − 2a₁b₂ + a₂b₁ − a₃b₃.
   (b) Let V = M₂ₓ₂(R) and

       β = { (1 0; 0 0), (0 1; 0 0), (0 0; 1 0), (0 0; 0 1) }.

       Define H: V × V → R by H(A, B) = tr(A) · tr(B).
   (c) Let β = {cos t, sin t, cos 2t, sin 2t}. Then β is an ordered basis for V = span(β), a four-dimensional subspace of the space of all continuous functions on R. Let H: V × V → R be the function defined by H(f, g) = f′(0) · g″(0).
6. Let H: R² × R² → R be the function defined by

   H((a₁, a₂), (b₁, b₂)) = a₁b₂ + a₂b₁   for (a₁, a₂), (b₁, b₂) ∈ R².

   (a) Prove that H is a bilinear form.
   (b) Find the 2 × 2 matrix A such that H(x, y) = xᵗAy for all x, y ∈ R².

   For a 2 × 2 matrix M with columns x and y, the bilinear form H(M) = H(x, y) is called the permanent of M.

7. Let V and W be vector spaces over the same field, and let T: V → W be a linear transformation. For any H ∈ B(W), define T̃(H): V × V → F by T̃(H)(x, y) = H(T(x), T(y)) for all x, y ∈ V. Prove the following results.

   (a) If H ∈ B(W), then T̃(H) ∈ B(V).
   (b) T̃: B(W) → B(V) is a linear transformation.
   (c) If T is an isomorphism, then so is T̃.

8. Assume the notation of Theorem 6.32.
   (a) Prove that for any ordered basis β, ψ_β is linear.
   (b) Let β be an ordered basis for an n-dimensional space V over F, and let φ_β: V → Fⁿ be the standard representation of V with respect to β. For A ∈ M_{n×n}(F), define H: V × V → F by H(x, y) = [φ_β(x)]ᵗ A [φ_β(y)]. Prove that H ∈ B(V). Can you establish this as a corollary to Exercise 7?
   (c) Prove the converse of (b): Let H be a bilinear form on V. If A = ψ_β(H), then H(x, y) = [φ_β(x)]ᵗ A [φ_β(y)].

9. (a) Prove Corollary 1 to Theorem 6.32.
   (b) For a finite-dimensional vector space V, describe a method for finding an ordered basis for B(V).

10. Prove Corollary 2 to Theorem 6.32.

11. Prove Corollary 3 to Theorem 6.32.

12. Prove that the relation of congruence is an equivalence relation.

13. The following outline provides an alternative proof to Theorem 6.33.
    (a) Suppose that β and γ are ordered bases for a finite-dimensional vector space V, and let Q be the change of coordinate matrix changing γ-coordinates to β-coordinates. Prove that φ_β = L_Q φ_γ, where φ_β and φ_γ are the standard representations of V with respect to β and γ, respectively.
    (b) Apply Corollary 2 to Theorem 6.32 to (a) to obtain an alternative proof of Theorem 6.33.

14. Let V be a finite-dimensional vector space and H ∈ B(V). Prove that, for any ordered bases β and γ of V, rank(ψ_β(H)) = rank(ψ_γ(H)).

15. Prove the following results.
    (a) Any square diagonal matrix is symmetric.
    (b) Any matrix congruent to a diagonal matrix is symmetric.
    (c) The corollary to Theorem 6.35.

16. Let V be a vector space over a field F not of characteristic two, and let H be a symmetric bilinear form on V. Prove that if K(x) = H(x, x) is the quadratic form associated with H, then, for all x, y ∈ V,

    H(x, y) = ½ [ K(x + y) − K(x) − K(y) ].


17. For each of the given quadratic forms K on a real inner product space V, find a symmetric bilinear form H such that K(x) = H(x, x) for all x ∈ V. Then find an orthonormal basis β for V such that ψ_β(H) is a diagonal matrix.
    (a) K: R² → R defined by K(t₁, t₂) = −2t₁² + 4t₁t₂ + t₂²
    (b) K: R² → R defined by K(t₁, t₂) = 7t₁² − 8t₁t₂ + t₂²
    (c) K: R³ → R defined by K(t₁, t₂, t₃) = 3t₁² + 3t₂² + 3t₃² − 2t₁t₃

18. Let S be the set of all (t₁, t₂, t₃) ∈ R³ for which

    3t₁² + 3t₂² + 3t₃² − 2t₁t₃ + 2√2 (t₁ + t₃) + 1 = 0.

    Find an orthonormal basis β for R³ for which the equation relating the coordinates of points of S relative to β is simpler. Describe S geometrically.

19. Prove the following refinement of Theorem 6.37(d).
    (a) If 0 < rank(A) < n and A has no negative eigenvalues, then f has no local maximum at p.
    (b) If 0 < rank(A) < n and A has no positive eigenvalues, then f has no local minimum at p.

20. Prove the following variation of the second-derivative test for the case n = 2: Define

    D = [∂²f(p)/∂t₁²] · [∂²f(p)/∂t₂²] − [∂²f(p)/∂t₁∂t₂]².

    (a) If D > 0 and ∂²f(p)/∂t₁² > 0, then f has a local minimum at p.
    (b) If D > 0 and ∂²f(p)/∂t₁² < 0, then f has a local maximum at p.
    (c) If D < 0, then f has no local extremum at p.
    (d) If D = 0, then the test is inconclusive.

    Hint: Observe that, as in Theorem 6.37, D = det(A) = λ₁λ₂, where λ₁ and λ₂ are the eigenvalues of A.

21. Let A and E be in M_{n×n}(F), with E an elementary matrix. In Section 3.1, it was shown that AE can be obtained from A by means of an elementary column operation. Prove that EᵗA can be obtained by means of the same elementary operation performed on the rows rather than on the columns of A. Hint: Note that EᵗA = (AᵗE)ᵗ.


22. For each of the following matrices A with entries from R, find a diagonal matrix D and an invertible matrix Q such that QᵗAQ = D.

    Hint: Use an elementary operation other than interchanging columns.

23. Prove that if the diagonal entries of a diagonal matrix are permuted, then the resulting diagonal matrix is congruent to the original one. 24. Let T be a linear operator on a real inner product space V, and define H: V x V -> R by H(x, y) = (x, T(y)) for all x, y € V. (a) (b) (c) (d) Prove that H is a bilinear form. Prove that H is symmetric if and only if T is self-adjoint. What properties must T have for H to be an inner product on V? Explain why H may fail to be a bilinear form if V is a complex inner product space.

25. Prove the converse to Exercise 24(a): Let V be a finite-dimensional real inner product space, and let H be a bilinear form on V. Then there exists a unique linear operator T on V such that H(x, y) = ⟨x, T(y)⟩ for all x, y ∈ V. Hint: Choose an orthonormal basis β for V, let A = ψ_β(H), and let T be the linear operator on V such that [T]_β = A. Apply Exercise 8(c) of this section and Exercise 15 of Section 6.2 (p. 355). 26. Prove that the number of distinct equivalence classes of congruent n × n real symmetric matrices is (n + 1)(n + 2)/2.

6.9*

EINSTEIN'S SPECIAL THEORY OF RELATIVITY

As a consequence of physical experiments performed in the latter half of the nineteenth century (most notably the Michelson Morley experiment of 1887), physicists concluded that the results obtained in measuring the speed of light are independent of the velocity of the instrument used to measure the speed of light. For example, suppose that while on Earth, an experimenter measures the speed of light emitted from the sun and finds it to be 186,000 miles per second. Now suppose that the experimenter places the measuring equipment in a spaceship that leaves Earth traveling at 100,000 miles per second in a direction away from the sun. A repetition of the same experiment from the spaceship yields the same result: Light is traveling at 186,000 miles per second


relative to the spaceship, rather than 86,000 miles per second as one might expect! This revelation led to a new way of relating coordinate systems used to locate events in space—time. The result was Albert Einstein's special theory of relativity. In this section, we develop via a linear algebra viewpoint the essence of Einstein's theory.

Figure 6.8

The basic problem is to compare two different inertial (nonaccelerating) coordinate systems S and S' in three-space (R³) that are in motion relative to each other under the assumption that the speed of light is the same when measured in either system. We assume that S' moves at a constant velocity in relation to S as measured from S. (See Figure 6.8.) To simplify matters, let us suppose that the following conditions hold:
1. The corresponding axes of S and S' (x and x', y and y', z and z') are parallel, and the origin of S' moves in the positive direction of the x-axis of S at a constant velocity v > 0 relative to S.
2. Two clocks C and C' are placed in space, the first stationary relative to the coordinate system S and the second stationary relative to the coordinate system S'. These clocks are designed to give real numbers in units of seconds as readings. The clocks are calibrated so that at the instant the origins of S and S' coincide, both clocks give the reading zero.
3. The unit of length is the light second (the distance light travels in 1 second), and the unit of time is the second. Note that, with respect to these units, the speed of light is 1 light second per second.

Given any event (something whose position and time of occurrence can be described), we may assign a set of space-time coordinates to it. For example,

if p is an event that occurs at position (x, y, z)ᵗ relative to S and at time t as read on clock C, we can assign to p the set of coordinates (x, y, z, t)ᵗ. This ordered 4-tuple is called the space-time coordinates of p relative to S and C. Likewise, p has a set of space-time coordinates (x', y', z', t')ᵗ relative to S' and C'.

For a fixed velocity v, let T_v: R⁴ → R⁴ be the mapping defined by

T_v((x, y, z, t)ᵗ) = (x', y', z', t')ᵗ,

where (x, y, z, t)ᵗ and (x', y', z', t')ᵗ are the space-time coordinates of the same event with respect to S and C and with respect to S' and C', respectively. Einstein made certain assumptions about T_v that led to his special theory of relativity. We formulate an equivalent set of assumptions.

Axioms of the Special Theory of Relativity

(R 1) The speed of any light beam, when measured in either coordinate system using a clock stationary relative to that coordinate system, is 1.

(R 2) The mapping T_v: R⁴ → R⁴ is an isomorphism.
(R 3) If T_v((x, y, z, t)ᵗ) = (x', y', z', t')ᵗ, then y' = y and z' = z.
(R 4) If

T_v((x, y, z, t)ᵗ) = (x', y', z', t')ᵗ   and   T_v((x, ŷ, ẑ, t)ᵗ) = (x'', y'', z'', t'')ᵗ,

then x'' = x' and t'' = t'.
(R 5) The origin of S moves in the negative direction of the x'-axis of S' at the constant velocity −v < 0 as measured from S'.

Axioms (R 3) and (R 4) tell us that for p ∈ R⁴, the second and third coordinates of T_v(p) are unchanged and the first and fourth coordinates of T_v(p) are independent of the second and third coordinates of p. As we will see, these five axioms completely characterize T_v. The operator T_v is called the Lorentz transformation in direction x. We intend to compute T_v and use it to study the curious phenomenon of time contraction.

Theorem 6.39. On R⁴, the following statements are true.
(a) T_v(e_i) = e_i for i = 2, 3.
(b) span({e₂, e₃}) is T_v-invariant.
(c) span({e₁, e₄}) is T_v-invariant.
(d) Both span({e₂, e₃}) and span({e₁, e₄}) are T*_v-invariant.
(e) T*_v(e_i) = e_i for i = 2, 3.

Proof. (a) By axiom (R 2),

T_v((0, 0, 0, 0)ᵗ) = (0, 0, 0, 0)ᵗ,

and hence, by axiom (R 4), the first and fourth coordinates of

T_v((0, a, b, 0)ᵗ)

are both zero for any a, b ∈ R. Thus, by axiom (R 3),

T_v((0, 1, 0, 0)ᵗ) = (0, 1, 0, 0)ᵗ   and   T_v((0, 0, 1, 0)ᵗ) = (0, 0, 1, 0)ᵗ.

The proofs of (b), (c), and (d) are left as exercises.
(e) For any j ≠ 2, ⟨T*_v(e₂), e_j⟩ = ⟨e₂, T_v(e_j)⟩ = 0 by (a) and (c); for j = 2, ⟨T*_v(e₂), e_j⟩ = ⟨e₂, T_v(e₂)⟩ = ⟨e₂, e₂⟩ = 1 by (a). We conclude that T*_v(e₂) is a multiple of e₂ (i.e., that T*_v(e₂) = k e₂ for some k ∈ R). Thus

1 = ⟨e₂, e₂⟩ = ⟨e₂, T_v(e₂)⟩ = ⟨T*_v(e₂), e₂⟩ = ⟨k e₂, e₂⟩ = k,

and hence T*_v(e₂) = e₂. Similarly, T*_v(e₃) = e₃.

Suppose that, at the instant the origins of S and S' coincide, a light flash is emitted from their common origin. The event of the light flash when measured either relative to S and C or relative to S' and C' has space-time coordinates (0, 0, 0, 0)ᵗ. Let P be the set of all events whose space-time coordinates (x, y, z, t)ᵗ relative to S and C are such that the flash is observable from the point with coordinates (x, y, z)ᵗ (as measured relative to S) at the time t (as measured on C). Let us characterize P in terms of x, y, z, and t.

Since the speed of light is 1, at any time t ≥ 0 the light flash is observable from any point whose distance to the origin of S (as measured on S) is t · 1 = t. These are precisely the points that lie on the sphere of radius t with center at the origin. The coordinates (relative to


S) of such points satisfy the equation x² + y² + z² − t² = 0. Hence an event lies in P if and only if its space-time coordinates (x, y, z, t)ᵗ (t ≥ 0) relative to S and C satisfy the equation

x² + y² + z² − t² = 0.

By virtue of axiom (R 1), we can characterize P in terms of the space-time coordinates relative to S' and C' similarly: An event lies in P if and only if, relative to S' and C', its space-time coordinates (x', y', z', t')ᵗ (t' ≥ 0) satisfy the equation

(x')² + (y')² + (z')² − (t')² = 0.

Let

A = ( 1 0 0  0
      0 1 0  0
      0 0 1  0
      0 0 0 −1 ).

Theorem 6.40. If ⟨L_A(w), w⟩ = 0 for some w ∈ R⁴, then ⟨T*_v L_A T_v(w), w⟩ = 0.

Proof. Let w = (x, y, z, t)ᵗ ∈ R⁴, and suppose that ⟨L_A(w), w⟩ = 0.

CASE 1. t ≥ 0. Since ⟨L_A(w), w⟩ = x² + y² + z² − t², the vector w gives the coordinates of an event in P relative to S and C. Because w and

T_v(w) = (x', y', z', t')ᵗ

are the space-time coordinates of the same event relative to S' and C', the discussion preceding Theorem 6.40 yields (x')² + (y')² + (z')² − (t')² = 0. Thus

⟨T*_v L_A T_v(w), w⟩ = ⟨L_A T_v(w), T_v(w)⟩ = (x')² + (y')² + (z')² − (t')² = 0,

and the conclusion follows.

CASE 2. t < 0. The proof follows by applying case 1 to −w.

We now proceed to deduce information about T_v. Let

w₁ = (1, 0, 0, 1)ᵗ   and   w₂ = (1, 0, 0, −1)ᵗ.

By Exercise 3, {w₁, w₂} is an orthogonal basis for span({e₁, e₄}), and span({e₁, e₄}) is T*_v L_A T_v-invariant. The next result tells us even more.

Theorem 6.41. There exist nonzero scalars a and b such that
(a) T*_v L_A T_v(w₁) = a w₂.
(b) T*_v L_A T_v(w₂) = b w₁.

Proof. (a) Because ⟨L_A(w₁), w₁⟩ = 0, we have ⟨T*_v L_A T_v(w₁), w₁⟩ = 0 by Theorem 6.40. Thus T*_v L_A T_v(w₁) is orthogonal to w₁. Since span({e₁, e₄}) = span({w₁, w₂}) is T*_v L_A T_v-invariant, T*_v L_A T_v(w₁) must lie in this set. But {w₁, w₂} is an orthogonal basis for this subspace, and so T*_v L_A T_v(w₁) must be a multiple of w₂. Thus T*_v L_A T_v(w₁) = a w₂ for some scalar a. Since T_v and A are invertible, so is T*_v L_A T_v. Thus a ≠ 0, proving (a). The proof of (b) is similar to (a).

Corollary. Let B_v = [T_v]_β, where β is the standard ordered basis for R⁴. Then
(a) B*_v A B_v = A.
(b) T*_v L_A T_v = L_A.

We leave the proof of the corollary as an exercise. For hints, see Exercise 4.

Now consider the situation 1 second after the origins of S and S' have coincided as measured by the clock C. Since the origin of S' is moving along the x-axis at a velocity v as measured in S, its space-time coordinates relative to S and C are

(v, 0, 0, 1)ᵗ. Similarly, the space-time coordinates for the origin of S' relative to S' and C' must be (0, 0, 0, t')ᵗ for some t' > 0. Thus we have

T_v((v, 0, 0, 1)ᵗ) = (0, 0, 0, t')ᵗ    (18)

for some t' > 0. By the corollary to Theorem 6.41,

⟨T*_v L_A T_v((v, 0, 0, 1)ᵗ), (v, 0, 0, 1)ᵗ⟩ = ⟨L_A((v, 0, 0, 1)ᵗ), (v, 0, 0, 1)ᵗ⟩ = v² − 1.    (19)

But also, by (18),

⟨T*_v L_A T_v((v, 0, 0, 1)ᵗ), (v, 0, 0, 1)ᵗ⟩ = ⟨L_A T_v((v, 0, 0, 1)ᵗ), T_v((v, 0, 0, 1)ᵗ)⟩ = ⟨L_A((0, 0, 0, t')ᵗ), (0, 0, 0, t')ᵗ⟩ = −(t')².    (20)

Combining (19) and (20), we conclude that

v² − 1 = −(t')²,    (21)

or

t' = √(1 − v²).    (22)

Next recall that the origin of S moves in the negative direction of the x'-axis of S' at the constant velocity −v < 0 as measured from S'. [This fact is axiom (R 5).] Consequently, 1 second after the origins of S and S' have coincided as measured on clock C, there exists a time t'' > 0 as measured on clock C' such that

T_v((0, 0, 0, 1)ᵗ) = (−v t'', 0, 0, t'')ᵗ.    (23)

From (23), it follows in a manner similar to the derivation of (22) that

t'' = 1/√(1 − v²);    (24)

hence, from (23) and (24),

T_v((0, 0, 0, 1)ᵗ) = ( −v/√(1 − v²), 0, 0, 1/√(1 − v²) )ᵗ.    (25)

The following result is now easily proved using (22), (25), and Theorem 6.39.

Theorem 6.42. Let β be the standard ordered basis for R⁴. Then

[T_v]_β = B_v = ( 1/√(1 − v²)    0  0   −v/√(1 − v²)
                  0              1  0    0
                  0              0  1    0
                  −v/√(1 − v²)   0  0    1/√(1 − v²) ).

Time Contraction

A most curious and paradoxical conclusion follows if we accept Einstein's theory. Suppose that an astronaut leaves our solar system in a space vehicle traveling at a fixed velocity v as measured relative to our solar system. It follows from Einstein's theory that, at the end of time t as measured on Earth, the time that passes on the space vehicle is only t√(1 − v²).

To establish this result, consider the coordinate systems S and S' and clocks C and C' that we have been studying. Suppose that the origin of S' coincides with the space vehicle and the origin of S coincides with a point in the solar system, so that the origins of S and S' coincide and clocks C and C' read zero at the moment the astronaut embarks on the trip. As viewed from S, the space-time coordinates of the vehicle at any time t > 0 as measured by C are (vt, 0, 0, t)ᵗ, whereas, as viewed from S', the space-time coordinates of the vehicle at any time t' > 0 as measured by C' are (0, 0, 0, t')ᵗ. But if two sets of space-time coordinates (vt, 0, 0, t)ᵗ and (0, 0, 0, t')ᵗ are to describe the same event, it must follow that

T_v((vt, 0, 0, t)ᵗ) = (0, 0, 0, t')ᵗ,

and hence B_v (vt, 0, 0, t)ᵗ = (0, 0, 0, t')ᵗ. From the preceding equation, we obtain

t' = −v²t/√(1 − v²) + t/√(1 − v²) = t√(1 − v²).    (26)

This is the desired result.
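The matrix B_v of Theorem 6.42 and equation (26) are easy to check numerically. The sketch below assumes NumPy; the function name and the velocity and time values are illustrative choices, not values taken from the text.

```python
import numpy as np

def lorentz_matrix(v):
    # B_v from Theorem 6.42 (units chosen so that the speed of light is 1).
    g = 1.0 / np.sqrt(1.0 - v**2)
    B = np.identity(4)
    B[0, 0] = B[3, 3] = g
    B[0, 3] = B[3, 0] = -v * g
    return B

v, t = 0.6, 10.0                          # illustrative values
B = lorentz_matrix(v)
event = np.array([v * t, 0.0, 0.0, t])    # vehicle's coordinates relative to S and C
print(B @ event)                          # (0, 0, 0, t') with t' = t*sqrt(1 - v^2)
print(t * np.sqrt(1 - v**2))              # 8.0, matching equation (26)
```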

Let us make one additional point. Suppose that we consider units of distance and time more commonly used than the light second and second, such as the mile and hour, or the kilometer and second. Let c denote the speed of light relative to our chosen units of distance and time. It is easily seen that if an object travels at a velocity v relative to a set of units, then it is traveling at a velocity v/c in units of light seconds per second. Thus, for an arbitrary set of units of distance and time, (26) becomes

t' = t √(1 − v²/c²).

A dramatic consequence of time contraction is that distances are contracted along the line of motion (see Exercise 9).

EXERCISES

1. Prove (b), (c), and (d) of Theorem 6.39.

2. Complete the proof of Theorem 6.40 for the case t < 0.

3. For w₁ = (1, 0, 0, 1)ᵗ and w₂ = (1, 0, 0, −1)ᵗ, show that
   (a) {w₁, w₂} is an orthogonal basis for span({e₁, e₄});
   (b) span({e₁, e₄}) is T*_v L_A T_v-invariant.

4. Prove the corollary to Theorem 6.41. Hints:
   (a) Prove that

       B*_v A B_v = (  p  0 0  q
                       0  1 0  0
                       0  0 1  0
                      −q  0 0 −p ),

       where p = (a + b)/2 and q = (a − b)/2.
   (b) Show that q = 0 by using the fact that B*_v A B_v is self-adjoint.
   (c) Apply Theorem 6.40 to w = (0, 1, 0, 1)ᵗ to show that p = 1.

5. Derive (24), and prove that

   T_v((0, 0, 0, 1)ᵗ) = ( −v/√(1 − v²), 0, 0, 1/√(1 − v²) )ᵗ.    (25)

   Hint: Use a technique similar to the derivation of (22).

6. Compute (B_v)⁻¹. Show that (B_v)⁻¹ = B_(−v). Conclude that if S' moves at a negative velocity v relative to S, then [T_v]_β = B_v, where B_v is of the form given in Theorem 6.42.

7. Consider three coordinate systems S, S', and S'' with the corresponding axes (x, y, z; x', y', z'; x'', y'', z'') parallel and such that the x-, x'-, and x''-axes coincide. Suppose that S' is moving past S at a velocity v₁ > 0 (as measured on S), S'' is moving past S' at a velocity v₂ > 0 (as measured on S'), and S'' is moving past S at a velocity v₃ > 0 (as measured on S), and that there are three clocks C, C', and C'' such that C is stationary relative to S, C' is stationary relative to S', and C'' is stationary relative to S''. Suppose that when measured on any of the three clocks, all the origins of S, S', and S'' coincide at time 0. Assuming that T_{v₃} = T_{v₂} T_{v₁} (i.e., B_{v₃} = B_{v₂} B_{v₁}), prove that

   v₃ = (v₁ + v₂)/(1 + v₁v₂).

   Note that substituting v₂ = 1 in this equation yields v₃ = 1. This tells us that the speed of light as measured in S or S' is the same. Why would we be surprised if this were not the case?

8. Suppose that an astronaut left Earth in the year 2000 and traveled to a star 99 light years away from Earth at 99% of the speed of light and that upon reaching the star immediately turned around and returned to Earth at the same speed. Assuming Einstein's special theory of relativity, show that if the astronaut was 20 years old at the time of departure, then he or she would return to Earth at age 48.2 in the year 2200. Explain the use of Exercise 7 in solving this problem.

9. Recall the moving space vehicle considered in the study of time contraction. Suppose that the vehicle is moving toward a fixed star located on the x-axis of S at a distance b units from the origin of S. If the space vehicle moves toward the star at velocity v, Earthlings (who remain "almost" stationary relative to S) compute the time it takes for the vehicle to reach the star as t = b/v. Due to the phenomenon of time contraction, the astronaut perceives a time span of t' = t√(1 − v²) = (b/v)√(1 − v²). A paradox appears in that the astronaut perceives a time span inconsistent with a distance of b and a velocity of v. The paradox is resolved by observing that the distance from the solar system to the star as measured by the astronaut is less than b. Assuming that the coordinate systems S and S' and clocks C and C' are as in the discussion of time contraction, prove the following results.
   (a) At time t (as measured on C), the space-time coordinates of the star relative to S and C are (b, 0, 0, t)ᵗ.
   (b) At time t (as measured on C), the space-time coordinates of the star relative to S' and C' are

       ( (b − tv)/√(1 − v²), 0, 0, (t − bv)/√(1 − v²) )ᵗ.

   (c) For x' = (b − tv)/√(1 − v²) and t' = (t − bv)/√(1 − v²), we have x' = b√(1 − v²) − t'v. This result may be interpreted to mean that at time t' as measured by the astronaut, the distance from the astronaut to the star, as measured by the astronaut (see Figure 6.9), is b√(1 − v²) − t'v.
   (d) Conclude from the preceding equation that (1) the speed of the space vehicle relative to the star, as measured by the astronaut, is v; (2) the distance from Earth to the star, as measured by the astronaut, is b√(1 − v²). Thus distances along the line of motion of the space vehicle appear to be contracted by a factor of √(1 − v²).

6.10* CONDITIONING AND THE RAYLEIGH QUOTIENT

In Section 3.4, we studied specific techniques that allow us to solve systems of linear equations in the form Ax = b, where A is an m × n matrix and b is an m × 1 vector. Such systems often arise in applications to the real world. The coefficients in the system are frequently obtained from experimental data, and, in many cases, both m and n are so large that a computer must be used in the calculation of the solution. Thus two types of errors must be considered. First, experimental errors arise in the collection of data since no instruments can provide completely accurate measurements. Second, computers introduce roundoff errors. One might intuitively feel that small relative changes in the coefficients of the system cause small relative errors in the solution. A system that has this property is called well-conditioned; otherwise, the system is called ill-conditioned.

We now consider several examples of these types of errors, concentrating primarily on changes in b rather than on changes in the entries of A. In addition, we assume that A is a square, complex (or real), invertible matrix since this is the case most frequently encountered in applications.

Example 1
Consider the system

x₁ + x₂ = 5
x₁ − x₂ = 1.

The solution to this system is (3, 2)ᵗ. Now suppose that we change the system somewhat and consider the new system

x₁ + x₂ = 5
x₁ − x₂ = 1.0001.

This modified system has the solution (3.00005, 1.99995)ᵗ. We see that a change of 10⁻⁴ in one coefficient has caused a change of less than 10⁻⁴ in each coordinate of the new solution. More generally, the system

x₁ + x₂ = 5
x₁ − x₂ = 1 + h

has the solution (3 + h/2, 2 − h/2)ᵗ. Hence small changes in b introduce small changes in the solution. Of course, we are really interested in relative changes since a change in the solution of, say, 10, is considered large if the original solution is of the order 10⁻², but small if the original solution is of the order 10⁶. We use the notation δb to denote the vector b' − b, where b is the vector in the original system and b' is the vector in the modified system. Thus we have δb = (0, h)ᵗ. We now define the relative change in b to be the scalar ||δb||/||b||, where || · || denotes the standard norm on Cⁿ (or Rⁿ); that is, ||b|| = √⟨b, b⟩. Most of what follows is true for any norm. Similar definitions hold for the relative change in x. In this example,

||δb||/||b|| = |h|/√26   while   ||δx||/||x|| = (|h|/√2)/√13 = |h|/√26.

Thus the relative change in x equals, coincidentally, the relative change in b, so the system is well-conditioned. ♦

Example 2
Consider the system

x₁ + x₂ = 3
x₁ + 1.00001 x₂ = 3.00001,

which has (2, 1)ᵗ as its solution. The solution to the related system

x₁ + x₂ = 3
x₁ + 1.00001 x₂ = 3.00001 + e

is (2 − (10⁵)e, 1 + (10⁵)e)ᵗ. Hence

||δx||/||x|| = 10⁵ √2 |e| / √5 > 10⁴ |e| ≥ 10⁴ · ||δb||/||b||.

Thus the relative change in x is at least 10⁴ times the relative change in b! This system is very ill-conditioned. Observe, in this example, that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection. ♦
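The behavior in Example 2 is easy to reproduce numerically. The following sketch assumes NumPy is available; the perturbation `eps` is an illustrative value, not one used in the text.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.00001]])
b = np.array([3.0, 3.00001])
eps = 1e-6                            # illustrative perturbation of the last coordinate of b
b_new = b + np.array([0.0, eps])

x = np.linalg.solve(A, b)             # (2, 1)
x_new = np.linalg.solve(A, b_new)     # approximately (2 - 1e5*eps, 1 + 1e5*eps)

rel_change_b = np.linalg.norm(b_new - b) / np.linalg.norm(b)
rel_change_x = np.linalg.norm(x_new - x) / np.linalg.norm(x)
print(rel_change_b, rel_change_x, rel_change_x / rel_change_b)
# The ratio is on the order of 10^5: the system is very ill-conditioned.
```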

To apply the full strength of the theory of self-adjoint matrices to the study of conditioning, we need the notion of the norm of a matrix. (See Exercise 24 of Section 6.1 for further results about norms.)

Definition. Let A be a complex (or real) n × n matrix. Define the (Euclidean) norm of A by

||A|| = max_{x ≠ 0} ||Ax|| / ||x||,

where x ∈ Cⁿ or x ∈ Rⁿ.

Intuitively, ||A|| represents the maximum magnification of a vector by the matrix A. The question of whether or not this maximum exists, as well as the problem of how to compute it, can be answered by the use of the so-called Rayleigh quotient.

Definition. Let B be an n × n self-adjoint matrix. The Rayleigh quotient for x ≠ 0 is defined to be the scalar R(x) = ⟨Bx, x⟩ / ||x||².

The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.

Theorem 6.43. For a self-adjoint matrix B ∈ M_{n×n}(F), we have that max_{x≠0} R(x) is the largest eigenvalue of B and min_{x≠0} R(x) is the smallest eigenvalue of B.

Proof. By Theorems 6.19 (p. 384) and 6.20 (p. 384), we may choose an orthonormal basis {v₁, v₂, ..., vₙ} of eigenvectors of B such that Bv_i = λ_i v_i (1 ≤ i ≤ n), where λ₁ ≥ λ₂ ≥ ··· ≥ λₙ. (Recall that by the lemma to Theorem 6.17, p. 373, the eigenvalues of B are real.) Now, for x ∈ Fⁿ, there exist scalars a₁, a₂, ..., aₙ such that x = Σ a_i v_i. Hence

R(x) = ⟨Bx, x⟩ / ||x||² = ⟨ Σ a_i λ_i v_i, Σ a_j v_j ⟩ / ||x||² = Σ λ_i |a_i|² / ||x||² ≤ λ₁ Σ |a_i|² / ||x||² = λ₁.

It is easy to see that R(v₁) = λ₁, so we have demonstrated the first half of the theorem. The second half is proved similarly.

Corollary 1. For any square matrix A, ||A|| is finite and, in fact, equals √λ, where λ is the largest eigenvalue of A*A.

Proof. Let B be the self-adjoint matrix A*A, and let λ be the largest eigenvalue of B. Since, for x ≠ 0,

0 ≤ ||Ax||² / ||x||² = ⟨Ax, Ax⟩ / ||x||² = ⟨A*Ax, x⟩ / ||x||² = R(x),

it follows from Theorem 6.43 that ||A||² = λ.

Observe that the proof of Corollary 1 shows that all the eigenvalues of A*A are nonnegative. For our next result, we need the following lemma.

Lemma. For any square matrix A, λ is an eigenvalue of A*A if and only if λ is an eigenvalue of AA*.

Proof. Let λ be an eigenvalue of A*A. If λ = 0, then A*A is not invertible. Hence A and A* are not invertible, so that AA* is not invertible and λ = 0 is an eigenvalue of AA*. Suppose now that λ ≠ 0. Then there exists x ≠ 0 such that A*Ax = λx. Apply A to both sides to obtain (AA*)(Ax) = λ(Ax). Since Ax ≠ 0 (lest λx = 0), we have that λ is an eigenvalue of AA*. The proof of the converse is left as an exercise.

Corollary 2. Let A be an invertible matrix. Then ||A⁻¹|| = 1/√λ, where λ is the smallest eigenvalue of A*A.

Proof. Recall that λ is an eigenvalue of an invertible matrix if and only if λ⁻¹ is an eigenvalue of its inverse. Now let λ₁ ≥ λ₂ ≥ ··· ≥ λₙ be the eigenvalues of A*A, which by the lemma are the eigenvalues of AA*. Then ||A⁻¹||² equals the largest eigenvalue of (A⁻¹)*A⁻¹ = (AA*)⁻¹, which equals 1/λₙ.

For many applications, it is only the largest and smallest eigenvalues that are of interest. For example, in the case of vibration problems, the smallest eigenvalue represents the lowest frequency at which vibrations can occur. We see the role of both of these eigenvalues in our study of conditioning.

Example 3
Let B be the self-adjoint matrix whose associated quadratic form, for x = (a, b, c)ᵗ ≠ 0, gives

R(x) = ⟨Bx, x⟩ / ||x||² = 2(a² + b² + c² − ab + ac + bc) / (a² + b² + c²).

The eigenvalues of B are 3, 3, and 0, so by Theorem 6.43 the maximum value of R(x) is 3 and the minimum value is 0. ♦

Now that we know ||A|| exists for every square matrix A, we can make use of the inequality ||Ax|| ≤ ||A|| · ||x||, which holds for every x. Assume in what follows that A is invertible, that Ax = b, and that b ≠ 0. For a given δb, let δx be the vector that satisfies A(x + δx) = b + δb. Then A(δx) = δb, and so δx = A⁻¹(δb). Hence

||δx|| = ||A⁻¹(δb)|| ≤ ||A⁻¹|| · ||δb||   and   ||b|| = ||Ax|| ≤ ||A|| · ||x||.

Thus

||δx|| / ||x|| ≤ ||A|| · ||A⁻¹|| · ||δb|| / ||b||.

Similarly (see Exercise 9),

(1 / (||A|| · ||A⁻¹||)) · ||δb|| / ||b|| ≤ ||δx|| / ||x||.

The number ||A|| · ||A⁻¹|| is called the condition number of A and is denoted cond(A). It should be noted that the definition of cond(A) depends on how the norm of A is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish the inequalities above is that ||Ax|| ≤ ||A|| · ||x|| for all x. We summarize these results in the following theorem.

Theorem 6.44. For the system Ax = b, where A is invertible and b ≠ 0, the following statements are true.
(a) For any norm || · ||, we have

(1 / cond(A)) · ||δb|| / ||b|| ≤ ||δx|| / ||x|| ≤ cond(A) · ||δb|| / ||b||.

(b) If || · || is the Euclidean norm, then cond(A) = √(λ₁/λₙ), where λ₁ and λₙ are the largest and smallest eigenvalues, respectively, of A*A.

Proof. Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.43.

It is clear from Theorem 6.44 that cond(A) ≥ 1. It is left as an exercise to prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of b and δb. We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.

We have so far considered only errors in the vector b. If there is an error δA in the coefficient matrix of the system Ax = b, the situation is more complicated. For example, A + δA may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in x can be given in terms of cond(A). For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston, 1994, p. 60) shows that if A + δA is invertible, then the relative error in x is bounded by a multiple of cond(A) · ||δA|| / ||A||.

It should be mentioned that, in practice, one never computes cond(A) from its definition, for it would be an unnecessary waste of time to compute A⁻¹ merely to determine its norm. In fact, if a computer is used to find A⁻¹, the computed inverse of A in all likelihood only approximates A⁻¹, and the error in the computed inverse is affected by the size of cond(A). So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of cond(A) can be found. Thus, in most cases, the estimate of the relative error in x is based on an estimate of cond(A).
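A quick numerical check of Theorem 6.44(b), assuming NumPy is available; the function name is an illustrative choice, and the matrix used is the coefficient matrix of Example 2.

```python
import numpy as np

def euclidean_norm_and_cond(A):
    # The eigenvalues of A*A are real and nonnegative; by the corollaries to
    # Theorem 6.43, ||A|| = sqrt(largest) and ||A^{-1}|| = 1/sqrt(smallest).
    eigenvalues = np.linalg.eigvalsh(A.conj().T @ A)   # ascending order
    norm_A = np.sqrt(eigenvalues[-1])
    norm_A_inv = 1.0 / np.sqrt(eigenvalues[0])
    return norm_A, norm_A_inv, norm_A * norm_A_inv

A = np.array([[1.0, 1.0],
              [1.0, 1.00001]])        # the coefficient matrix of Example 2
norm_A, norm_A_inv, cond_A = euclidean_norm_and_cond(A)
print(cond_A)                         # roughly 4 * 10^5, so the system is ill-conditioned
print(np.linalg.cond(A, 2))           # NumPy's built-in value agrees
```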

EXERCISES

1. Label the following statements as true or false.
   (a) If Ax = b is well-conditioned, then cond(A) is small.
   (b) If cond(A) is large, then Ax = b is ill-conditioned.
   (c) If cond(A) is small, then Ax = b is well-conditioned.
   (d) The norm of A always equals the largest eigenvalue of A.
   (e) The norm of A equals the Rayleigh quotient.

2. Compute the norms of the matrices given in the text.

3. Prove that if B is symmetric, then ||B|| is the largest eigenvalue of B.

4. Let B be the symmetric matrix given in the text. Compute ||B||, ||B⁻¹||, and cond(B).

5. Let B be a symmetric matrix. Prove that min_{x≠0} R(x) equals the smallest eigenvalue of B.

6. Prove that if λ is an eigenvalue of AA*, then λ is an eigenvalue of A*A. This completes the proof of the lemma to Corollary 2 to Theorem 6.43.

7. Let A and A⁻¹ be as follows:

   A = (  6   13  −17
         13   29  −38
        −17  −38   50 ).

   The eigenvalues of A are approximately 84.74, 0.2007, and 0.0588.
   (a) Approximate ||A||, ||A⁻¹||, and cond(A). (Note Exercise 3.)
   (b) Suppose that we have vectors x and x̃ such that Ax = b and ||b − Ax̃|| ≤ 0.001. Use (a) to determine upper bounds for ||x̃ − A⁻¹b|| (the absolute error) and ||x̃ − A⁻¹b|| / ||A⁻¹b|| (the relative error).

8. Suppose that x is the actual solution of Ax = b and that a computer arrives at an approximate solution x̃. If cond(A) = 100, ||b|| = 1, and ||b − Ax̃|| = 0.1, obtain upper and lower bounds for ||x − x̃|| / ||x||.

9. Prove the left inequality of (a) in Theorem 6.44.

10. Prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix.

11. Prove the following results.
    (a) Let A and B be square matrices that are unitarily equivalent. Prove that ||A|| = ||B||.
    (b) Let T be a linear operator on a finite-dimensional inner product space V. Define

        ||T|| = max_{x≠0} ||T(x)|| / ||x||.

        Prove that ||T|| = ||[T]_β||, where β is any orthonormal basis for V.
    (c) Let V be an infinite-dimensional inner product space with an orthonormal basis {v₁, v₂, ...}. Let T be the linear operator on V such that T(v_k) = k v_k. Prove that ||T|| (defined in (b)) does not exist.

The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6.7.

12. Let A be an n × n matrix of rank r with the nonzero singular values σ₁ ≥ σ₂ ≥ ··· ≥ σ_r. Prove each of the following results.
    (a) ||A|| = σ₁.
    (b) ||A†|| = 1/σ_r.
    (c) If A is invertible (and hence r = n), then cond(A) = σ₁/σₙ.

6.11* THE GEOMETRY OF ORTHOGONAL OPERATORS

By Theorem 6.22 (p. 386), any rigid motion on a finite-dimensional real inner product space is the composite of an orthogonal operator and a translation. Thus, to understand the geometry of rigid motions thoroughly, we must analyze the structure of orthogonal operators. Such is the aim of this section. We show that any orthogonal operator on a finite-dimensional real inner product space is the composite of rotations and reflections. This material assumes familiarity with the results about direct sums developed at the end of Section 5.2, and familiarity with the definition and elementary properties of the determinant of a linear operator defined in Exercise 7 of Section 5.1.

Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a rotation if T is the identity on V or if there exists a two-dimensional subspace W of V, an orthonormal basis β = {x₁, x₂} for W, and a real number θ such that

T(x₁) = (cos θ) x₁ + (sin θ) x₂,   T(x₂) = (−sin θ) x₁ + (cos θ) x₂,

and T(y) = y for all y ∈ W⊥. In this context, T is called a rotation about W⊥. The subspace W⊥ is called the axis of rotation. Rotations are defined in Section 6.1 for the special case that V = R².

Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a reflection if there exists a one-dimensional subspace W of V such that T(x) = −x for all x ∈ W and T(y) = y for all y ∈ W⊥. In this context, T is called a reflection of V about W⊥.

It should be noted that rotations and reflections (or composites of these) are orthogonal operators (see Exercise 2).

Example 1
A Characterization of Orthogonal Operators on a One-Dimensional Real Inner Product Space
Let T be an orthogonal operator on a one-dimensional inner product space V. Choose any nonzero vector x in V. Then V = span({x}), and so T(x) = λx for some λ ∈ R. Since T is orthogonal and λ is an eigenvalue of T, λ = ±1. If λ = 1, then T is the identity on V, and hence T is a rotation. If λ = −1, then T(x) = −x for all x ∈ V, and T(y) = y for all y ∈ V⊥ = {0}, so T is a reflection of V about V⊥. Thus T is either a rotation or a reflection. Note that in the first case, det(T) = 1, and in the second case, det(T) = −1. ♦

Example 2
Some Typical Reflections
(a) Define T: R² → R² by T(a, b) = (−a, b), and let W = span({e₁}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W⊥. Thus T is a reflection of R² about W⊥ = span({e₂}), the y-axis.
(b) Let T: R³ → R³ be defined by T(a, b, c) = (a, b, −c), and let W = span({e₃}). Then T(x) = −x for all x ∈ W, and T(y) = y for all y ∈ W⊥ = span({e₁, e₂}), the xy-plane. Hence T is a reflection of R³ about W⊥, the xy-plane. ♦

) Theorem 6. and. 6 Inner Product Spaces operators on a two-dimensional real inner product space V. det(Ti) = 1 and det(T 2 ) = . 387) since all two-dimensional real inner product spaces are structurally identical. Thus. If T is a linear operator on a nonzero finite-dimensional real vector space V. By Exercise 15 of Section 6. The proof for TiT 2 is similar. 2/n} for V. . det(T) = det(T 2 )« det(Ti) = —1.n. If we then define W = 4>^l(Z). where j3 is an orthonormal basis for V. Then <f>p is an isomorphism. Moreover.23 (p. Proof.10 . then by Theorem 6. Furthermore.474 Chap. (S| We now study orthogonal operators on spaces of higher dimension. A complete description of the reflections of R2 is given in Section 6. it suffices to show that there exists an L^-invariant subspace Z of R n such that 1 < dim(Z) < 2. as we have seen in Section 2. 104).45. The Proof. The proof follows from Theorem 6. and T is a reflection if and only if det(T) = — 1. As a consequence. by Theorem 6. Let T he an orthogonal operator on a two-dimensional real inner product space V.5. (See Exercise 8. 2 . the resulting isomorphism (j>p: V — » R2 preserves inner products. Fix an ordered basis (3 = {y\. Since T 2 and Ti are orthogonal. and let A = [T]p.1 . • • •. y 2 . For a rigorous justification. then there exists a T-invariant subspace W of V such that 1 < dim(W) < 2. Corollary. Rn Figure 6. .10 commutes. so is T.45. V V Rn L ^ . the diagram in Figure 6. T is a rotation if and only if det(T) = 1. Lemma. If Ti is a reflection on V and T 2 is a rotation on V.45.21 (p. . T is a reflection. Let T = T 2 Tj be the composite. it follows that W satisfies the conclusions of the lemma (see Exercise 13). apply Theorem 2.2. Let V be a two-dimensional real inner product space.4. Then T is either a rotation or a reflection. that is. . LA(f>p = <j>p\. composite of a reflection and a rotation on V is a reflection on V. Let <f>p: V — > R n be the linear transformation defined by <f>p(yi) — e% for i — 1 .

We now study orthogonal operators on spaces of higher dimension.

Lemma. If T is a linear operator on a nonzero finite-dimensional real vector space V, then there exists a T-invariant subspace W of V such that 1 ≤ dim(W) ≤ 2.

Proof. Fix an ordered basis β = {y₁, y₂, ..., yₙ} for V, and let A = [T]_β. Let φ_β: V → Rⁿ be the linear transformation defined by φ_β(y_i) = e_i for i = 1, 2, ..., n. Then φ_β is an isomorphism, and, as we have seen in Section 2.4, L_A φ_β = φ_β T. Consequently, it suffices to show that there exists an L_A-invariant subspace Z of Rⁿ such that 1 ≤ dim(Z) ≤ 2. If we then define W = φ_β⁻¹(Z), it follows that W satisfies the conclusions of the lemma (see Exercise 13).

The matrix A can be considered as an n × n matrix over C and, as such, can be used to define a linear operator U on Cⁿ by U(v) = Av. Since U is a linear operator on a finite-dimensional vector space over C, it has an eigenvalue λ ∈ C. Let x ∈ Cⁿ be an eigenvector corresponding to λ. We may write λ = λ₁ + iλ₂, where λ₁ and λ₂ are real, and

x = x₁ + i x₂,

where x₁ and x₂ have real entries. Note that at least one of x₁ or x₂ is nonzero since x ≠ 0. Now

U(x) = λx = (λ₁ + iλ₂)(x₁ + i x₂) = (λ₁x₁ − λ₂x₂) + i(λ₁x₂ + λ₂x₁),

and also U(x) = A(x₁ + i x₂) = Ax₁ + i Ax₂. Comparing the real and imaginary parts of these two expressions for U(x), we conclude that

Ax₁ = λ₁x₁ − λ₂x₂   and   Ax₂ = λ₁x₂ + λ₂x₁.

Finally, let Z = span({x₁, x₂}), the span being taken as a subspace of Rⁿ. Since x₁ ≠ 0 or x₂ ≠ 0, Z is a nonzero subspace. Thus 1 ≤ dim(Z) ≤ 2, and the preceding pair of equations shows that Z is L_A-invariant.

Theorem 6.46. Let T be an orthogonal operator on a nonzero finite-dimensional real inner product space V. Then there exists a collection of pairwise orthogonal T-invariant subspaces {W₁, W₂, ..., W_m} of V such that
(a) 1 ≤ dim(W_i) ≤ 2 for i = 1, 2, ..., m.
(b) V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_m.

Proof. The proof is by mathematical induction on dim(V). If dim(V) = 1, the result is obvious. So assume that the result is true whenever dim(V) < n for some fixed integer n > 1. Suppose dim(V) = n. By the lemma, there exists a T-invariant subspace W₁ of V such that 1 ≤ dim(W₁) ≤ 2. If W₁ = V, the result is established. Otherwise, W₁⊥ ≠ {0}. By Exercise 14, W₁⊥ is T-invariant and the restriction of T to W₁⊥ is orthogonal. Since dim(W₁⊥) < n, we may apply the induction hypothesis to T restricted to W₁⊥ and conclude that there exists a collection of pairwise orthogonal T-invariant subspaces {W₂, W₃, ..., W_m} of W₁⊥ such that 1 ≤ dim(W_i) ≤ 2 for i = 2, 3, ..., m and W₁⊥ = W₂ ⊕ W₃ ⊕ ··· ⊕ W_m. Thus {W₁, W₂, ..., W_m} is pairwise orthogonal, and by Exercise 13(d) of Section 6.2,

V = W₁ ⊕ W₁⊥ = W₁ ⊕ W₂ ⊕ ··· ⊕ W_m.
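Theorem 6.46 is reflected in the eigenvalues of a real orthogonal matrix: each two-dimensional rotation subspace contributes a conjugate pair of eigenvalues of absolute value 1, and each one-dimensional subspace contributes +1 or −1, so that det(T) = (−1)^(number of −1's), as the next theorem makes precise. A sketch assuming NumPy; the random seed and matrix size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random 5x5 orthogonal matrix

eigenvalues = np.linalg.eigvals(Q)
print(np.abs(eigenvalues))                         # all numerically equal to 1
num_minus_one = int(np.sum(np.isclose(eigenvalues, -1)))
# det(Q) = (-1)^(number of eigenvalues equal to -1):
print(np.isclose(np.linalg.det(Q), (-1) ** num_minus_one))   # True
```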

Applying Example 1 and Theorem 6.45 in the context of Theorem 6.46, we conclude that the restriction of T to W_i is either a rotation or a reflection for each i = 1, 2, ..., m. Thus, in some sense, T is composed of rotations and reflections. Unfortunately, very little can be said about the uniqueness of the decomposition of V in Theorem 6.46. For example, the W_i's, the number m of W_i's, and the number of W_i's for which T_{W_i} is a reflection are not unique. Although the number of W_i's for which T_{W_i} is a reflection is not unique, whether this number is even or odd is, in some sense, an intrinsic property of T. Moreover, we can always decompose V so that T_{W_i} is a reflection for at most one W_i. These facts are established in the following result.

Theorem 6.47. Let T, V, W₁, W₂, ..., W_m be as in Theorem 6.46.
(a) The number of W_i's for which T_{W_i} is a reflection is even or odd according to whether det(T) = 1 or det(T) = −1.
(b) It is always possible to decompose V as in Theorem 6.46 so that the number of W_i's for which T_{W_i} is a reflection is zero or one according to whether det(T) = 1 or det(T) = −1. Furthermore, if T_{W_i} is a reflection, then dim(W_i) = 1.

Proof. (a) Let r denote the number of W_i's in the decomposition for which T_{W_i} is a reflection. Then, by Exercise 15,

det(T) = det(T_{W₁}) · det(T_{W₂}) ··· det(T_{W_m}) = (−1)^r,

proving (a).
(b) Let E = {x ∈ V: T(x) = −x}. Then E is a T-invariant subspace of V. If E = {0}, take W = V. Otherwise, let W = E⊥. Then W is T-invariant, and by Exercise 13(d) of Section 6.2, V = E ⊕ W. The restriction of T to W is orthogonal, and T_{W_i} is a rotation for every T-invariant subspace W_i of W on which T restricts to a rotation or a reflection. For otherwise, there

exists a nonzero x ∈ W for which T(x) = −x; but then x ∈ W ∩ E = E⊥ ∩ E = {0}, a contradiction. So by applying Theorem 6.46 to the restriction of T to W, we obtain a collection of pairwise orthogonal T-invariant subspaces {W₁, W₂, ..., W_k} of W such that

W = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k

and, for 1 ≤ i ≤ k, T_{W_i} is a rotation. If E = {0}, this completes the proof. Otherwise, choose an orthonormal basis β for E containing p vectors (p > 0). It is possible to decompose β into a pairwise disjoint union β = β₁ ∪ β₂ ∪ ··· ∪ β_r such that each β_i contains exactly two vectors for i < r, and β_r contains two vectors if p is even and one vector if p is odd. For each i = 1, 2, ..., r, let W_{k+i} = span(β_i). Then {W₁, W₂, ..., W_k, W_{k+1}, ..., W_{k+r}} is pairwise orthogonal, and

V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k ⊕ ··· ⊕ W_{k+r}.    (27)

Moreover, if β_i contains two vectors, then

det(T_{W_{k+i}}) = det( −1  0
                          0 −1 ) = 1,

so T_{W_{k+i}} is a rotation by Theorem 6.45. If β_r consists of one vector, then dim(W_{k+r}) = 1 and

det(T_{W_{k+r}}) = det(−1) = −1,

and hence T_{W_{k+r}} is a reflection by Example 1. Thus T_{W_j} is a rotation for j < k + r, and we conclude that the decomposition in (27) satisfies the condition of (b).

As a consequence of the preceding theorem, an orthogonal operator can be factored as a product of rotations and reflections.

Corollary. Let T be an orthogonal operator on a finite-dimensional real inner product space V. Then there exists a collection {T₁, T₂, ..., T_m} of orthogonal operators on V such that the following statements are true.
(a) For each i, T_i is either a reflection or a rotation.
(b) For at most one i, T_i is a reflection.
(c) T_iT_j = T_jT_i for all i and j.
(d) T = T₁T₂ ··· T_m.
(e) det(T) = 1 if T_i is a rotation for each i = 1, 2, ..., m; otherwise det(T) = −1.

Proof. As in the proof of Theorem 6.47(b), we can write

V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_m,

where T_{W_i} is a rotation or a reflection for each i, and T_{W_i} is a reflection for at most one i. For each i = 1, 2, ..., m, define T_i: V → V by

T_i(x₁ + x₂ + ··· + x_m) = x₁ + x₂ + ··· + x_{i−1} + T(x_i) + x_{i+1} + ··· + x_m,

where x_j ∈ W_j for all j. It is easily shown that each T_i is an orthogonal operator on V, and, clearly, T_i is a rotation or a reflection according to whether T_{W_i} is a rotation or a reflection. This establishes (a) and (b). The proofs of (c), (d), and (e) are left as exercises. (See Exercise 16.)

Example 3
Orthogonal Operators on a Three-Dimensional Real Inner Product Space
Let T be an orthogonal operator on a three-dimensional real inner product space V. We show that T can be decomposed into the composite of a rotation and at most one reflection. Let

V = W₁ ⊕ W₂ ⊕ ··· ⊕ W_m

be a decomposition as in Theorem 6.47(b). Clearly, m = 2 or m = 3. For each i, let T_i be as in the proof of the corollary to Theorem 6.47.

If m = 3, then V = W₁ ⊕ W₂ ⊕ W₃ and dim(W_i) = 1 for all i. Since T_{W_i} is a reflection for at most one i, T_{W_i} is the identity on W_i for the others, and we conclude that T is either a single reflection or the identity (a rotation). In particular, if det(T) = −1, then T is a reflection.

If m = 2, then V = W₁ ⊕ W₂. Without loss of generality, suppose that dim(W₁) = 1 and dim(W₂) = 2. Thus T_{W₁} is a reflection or the identity on W₁, and T_{W₂} is a rotation or a reflection. If T_{W₂} is not a reflection, then T_{W₂} is a rotation. Defining T₁ and T₂ as in the proof of the corollary to Theorem 6.47, we have that T = T₁T₂ is the composite of a rotation and at most one reflection. (Note that if T_{W₁} is not a reflection, then T₁ is the identity on V and T = T₂.) ♦

EXERCISES

1. Label the following statements as true or false. Assume that the underlying vector spaces are finite-dimensional real inner product spaces.
   (a) Any orthogonal operator is either a rotation or a reflection.
   (b) The composite of any two rotations on a two-dimensional space is a rotation.
   (c) The composite of any two rotations on a three-dimensional space is a rotation.
   (d) The composite of any two rotations on a four-dimensional space is a rotation.
   (e) The identity operator is a rotation.
   (f) The composite of two reflections is a reflection.
   (g) Any orthogonal operator is a composite of rotations.
   (h) For any orthogonal operator T, if det(T) = −1, then T is a reflection.
   (i) Reflections always have eigenvalues.
   (j) Rotations always have eigenvalues.

2. Prove that rotations, reflections, and composites of rotations and reflections are orthogonal operators.

11 The Geometry of Orthogonal Operators 3.ip G R. let A = cos* Usm< sin* — cos* (a) Prove that LA is a reflection. (b) Find the axis in R2 about which L^ reflects. 6. that is. define matrices 1 0 0 A = | 0 cos <p — sin < . (c) Deduce that any two rotations on R2 commute. For any real number < / > . 6. _ /cos <p — sin ysin</> COS' (a) Prove that any rotation on R2 is of the form T^. (b) Find the axis in R2 about which L^ reflects.45 using the hints preceding the statement of the theorem. (b) Prove that LAB is a rotation. Prove Theorem 6. Prove that no orthogonal operator can be both a rotation and a reflection. Let ( I 2 v/3 \/3\ 2 1 "2 I 479 A = and B— 0 -1 (a) Prove that VA is a reflection. define T<p = LA. 0 sin <b cos and 'cos ip B = | sint/' 0 — sin ip 0^ cos ip 0 0 1. 4. (a) Prove that LA and LB are rotations. For any real number <p. Sec. 5. . 9. (c) Prove that \-AB and \-BA are rotations. the subspace of R2 on which L^ acts as the identity. Prove that the composite of any two rotations on R3 is a rotation on R3. (c) Find the axis of rotation for LAB8. Given real numbers (p and ip. 7. where . (b) Prove that T^T^p = T^+^) for any (p. for some </>.•^•-T* i.

. 12. Complete the proof of the1 corollary to The'orcm 6.. Hint: Use the fact that. prove the following rosults. 16.s. (b) If /? is even. 13. Let T be an orthogonal [unitary] operator on a finite-dimensional real [complex] inner product space V.(/) G W.47. Let T be1 a linear e)perate>r on a finite-eliniensional vector space V. Tw is one-to-one and onto te) concluele that. for any y G W. V = W] © W 2 0 .) ele>t(TwJ. Complete the proof e > f the lemma to Theorem 6. 15. show that there exists a unique rotation T em V such that T(x) — y. If W is a T-invariant subspace erf V. Lt. 14. say. INDEX OF DEFINITIONS FOR CHAPTER 6 Adjoint of a linear operator Adjoint of a matrix 331 Axis of rotation 473 358 Bilinear form 422 Complex inner product space Condition number 469 332 . then T can be e'xpressexl as the1 e-emiposite of at most one reflection anel at most \(n — 1) rotations. (b) W L is a T-invariant subspace of V.46 by shewing that W = 0~i1(Z) satisfies the required conelitieais. (a) If a. Let T be' a linear e)perate>r on an n-dimensional real inner product space V. where V is a diroct sum of T-invariant subspae-e. 6 Inner Product Spaces 10.480 Chap. 11. (a) Tw is an orthogonal [unitary] operator on W. (c) T w is an orthogonal [unitary] operator on W. Prove that if V is a two. T*(y) = T '(. then the cemiposite e > f two reflcctiems on V is a rotation of V. 18. Give an example of an orthogonal operator that is neither a rcfiection nor a rotation. Suppe)se that T is not the identity. y G V sue:h that x / y and ||x|| = \\y\\ = 1. Let V be a real inner product spae*? e)f elimension 2. Prove the following results. For any x. Define T: V — *V by T(x) = — x. Prove that det(T) = elet(T W| ) • ele>t(Tw. of at most | n rotations e>r as the ce)iupe)sitc of one inflection and at most ^(n — 2) rotations. 17.or three-dimensional real inner product space'. Prove that T is a product e > f rotatiems if and only if dim(V) is even. is odd.. then T can be1 e'xpresse>el as the' colllpe)site.t V be a finite-dimensional real inner product spae-e.• -cOW/.

Congruent matrices 426
Conjugate transpose (adjoint) of a matrix 331
Critical point 439
Diagonalizable bilinear form 428
Fourier coefficients of a vector relative to an orthonormal set 348
Frobenius inner product 332
Gram-Schmidt orthogonalization process 344
Hessian matrix 440
Index of a bilinear form 444
Index of a matrix 445
Inner product 329
Inner product space 332
Invariants of a bilinear form 444
Invariants of a matrix 445
Least squares line 361
Legendre polynomials 346
Local extremum 439
Local maximum 439
Local minimum 439
Lorentz transformation 454
Matrix representation of a bilinear form 424
Minimal solution of a system of equations 364
Norm of a matrix 467
Norm of a vector 333
Normal matrix 370
Normal operator 370
Normalizing a vector 335
Orthogonal complement of a subset of an inner product space 349
Orthogonal matrix 382
Orthogonal operator 379
Orthogonal projection 398
Orthogonal projection on a subspace 351
Orthogonal subset of an inner product space 335
Orthogonal vectors 335
Orthogonally equivalent matrices 384
Orthonormal basis 341
Orthonormal set 335
Penrose conditions 421
Permanent, of a 2 x 2 matrix 448
Polar decomposition of a matrix 412
Pseudoinverse of a linear transformation 413
Pseudoinverse of a matrix 414
Quadratic form 433
Rank of a bilinear form 443
Rayleigh quotient 467
Real inner product space 332
Reflection 473
Resolution of the identity operator induced by a linear transformation 402
Rigid motion 385
Rotation 472
Self-adjoint matrix 373
Self-adjoint operator 373
Signature of a form 444
Signature of a matrix 445
Singular value decomposition of a matrix 410
Singular value of a linear transformation 407
Singular value of a matrix 410
Space-time coordinates 453
Spectral decomposition of a linear operator 402
Spectrum of a linear operator 402
Standard inner product 330
Symmetric bilinear form 428
Translation 386
Trigonometric polynomial 399
Unit vector 335
Unitarily equivalent matrices 384
Unitary matrix 382
Unitary operator 379

7 Canonical Forms

7.1 The Jordan Canonical Form I
7.2 The Jordan Canonical Form II
7.3 The Minimal Polynomial
7.4* The Rational Canonical Form

As we learned in Chapter 5, the advantage of a diagonalizable linear operator lies in the simplicity of its description. Such an operator has a diagonal matrix representation, or, equivalently, there is an ordered basis for the underlying vector space consisting of eigenvectors of the operator. However, not every linear operator is diagonalizable, even if its characteristic polynomial splits. Example 3 of Section 5.2 describes such an operator. Recall from Section 5.2 that the diagonalizability of T depends on whether the union of ordered bases for the distinct eigenspaces of T is an ordered basis for V. So a lack of diagonalizability means that at least one eigenspace of T is too "small."

It is the purpose of this chapter to consider alternative matrix representations for nondiagonalizable operators. These representations are called canonical forms. There are different kinds of canonical forms, and their advantages and disadvantages depend on how they are applied. The choice of a canonical form is determined by the appropriate choice of an ordered basis. Naturally, the canonical forms of a linear operator are not diagonal matrices if the linear operator is not diagonalizable.

In this chapter, we treat two common canonical forms. The first of these, the Jordan canonical form, requires that the characteristic polynomial of the operator splits. This form is always available if the underlying field is algebraically closed, that is, if every polynomial with coefficients from the field splits. For example, the field of complex numbers is algebraically closed by the fundamental theorem of algebra (see Appendix D). The rational canonical form, treated in Section 7.4, does not require such a factorization.

7.1 THE JORDAN CANONICAL FORM I

Let T be a linear operator on a finite-dimensional vector space V, and suppose that the characteristic polynomial of T splits.

In this section, we extend the definition of eigenspace to generalized eigenspace. From these subspaces, we select ordered bases whose union is an ordered basis β for V such that

[T]_β = \begin{pmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & A_k \end{pmatrix},

where each O is a zero matrix, and each A_i is a square matrix of the form (λ) or

\begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}

for some eigenvalue λ of T. Such a matrix A_i is called a Jordan block corresponding to λ, and the matrix [T]_β is called a Jordan canonical form of T. We also say that the ordered basis β is a Jordan canonical basis for T. Observe that each Jordan block A_i is "almost" a diagonal matrix; in fact, [T]_β is a diagonal matrix if and only if each A_i is of the form (λ).

Example 1
Suppose that T is a linear operator on C^8, and β = {v_1, v_2, ..., v_8} is an ordered basis for C^8 such that

J = [T]_β = \begin{pmatrix}
2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}

is a Jordan canonical form of T. Notice that the characteristic polynomial of T is det(J − tI) = (t − 2)^4 (t − 3)^2 t^2, and hence the multiplicity of each eigenvalue is the number of times that the eigenvalue appears on the diagonal of J. Also observe that v_1, v_4, v_5, and v_7 are the only vectors in β that are eigenvectors of T. These are the vectors corresponding to the columns of J with no 1 above the diagonal entry. ♦
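The following short check is not part of the original text; it is a sketch that assumes the SymPy library is available. Since J is upper triangular, its characteristic polynomial is the product of the diagonal factors, which gives the multiplicities quoted above.

```python
# Quick check of Example 1 (editorial sketch; assumes SymPy is installed).
# J is the 8x8 Jordan canonical form displayed above.
import sympy as sp

J = sp.Matrix([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0]])

t = sp.symbols('t')
# det(J - tI) = (t - 2)^4 (t - 3)^2 t^2: each eigenvalue's multiplicity is the
# number of times it appears on the diagonal of J.
print(sp.factor((J - t * sp.eye(8)).det()))
```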

In Sections 7.1 and 7.2, we prove that every linear operator whose characteristic polynomial splits has a Jordan canonical form that is unique up to the order of the Jordan blocks. Nevertheless, it is not the case that the Jordan canonical form is completely determined by the characteristic polynomial of the operator. For example, let T' be the linear operator on C^8 such that [T']_β = J', where β is the ordered basis in Example 1 and

J' = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.

Then the characteristic polynomial of T' is also (t − 2)^4 (t − 3)^2 t^2. But the operator T' has the Jordan canonical form J', which is different from J, the Jordan canonical form of the linear operator T of Example 1.

Consider again the matrix J and the ordered basis β of Example 1. Notice that T(v_2) = v_1 + 2v_2, and therefore (T − 2I)(v_2) = v_1. Similarly, (T − 2I)(v_3) = v_2. Since v_1 and v_4 are eigenvectors of T corresponding to λ = 2, it follows that (T − 2I)^3(v_i) = 0 for i = 1, 2, 3, and 4. Similarly, (T − 3I)^2(v_i) = 0 for i = 5, 6, and (T − 0I)^2(v_i) = 0 for i = 7, 8. Because of the structure of each Jordan block in a Jordan canonical form, we can generalize these observations: If v lies in a Jordan canonical basis for a linear operator T and is associated with a Jordan block with diagonal entry λ, then (T − λI)^p(v) = 0 for sufficiently large p. Eigenvectors satisfy this condition for p = 1. Just as eigenvectors lie in eigenspaces, generalized eigenvectors lie in "generalized eigenspaces."

Definition. Let T be a linear operator on a vector space V, and let λ be a scalar. A nonzero vector x in V is called a generalized eigenvector of T corresponding to λ if (T − λI)^p(x) = 0 for some positive integer p.

Notice that if x is a generalized eigenvector of T corresponding to λ, and p is the smallest positive integer for which (T − λI)^p(x) = 0, then (T − λI)^{p−1}(x) is an eigenvector of T corresponding to λ. Therefore λ is an eigenvalue of T.

In the context of Example 1, each vector in β is a generalized eigenvector of T. In fact, v_1, v_2, v_3, and v_4 correspond to the scalar 2, v_5 and v_6 correspond to the scalar 3, and v_7 and v_8 correspond to the scalar 0.
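These observations can be confirmed directly. The sketch below is not part of the original text; it assumes SymPy and identifies v_1, ..., v_8 with the standard basis vectors e_1, ..., e_8 of C^8, which is legitimate because J = [T]_β.

```python
# Editorial sketch (assumes SymPy).  J is the matrix of Example 1, built here
# block by block; e_i stands in for v_i since J = [T]_beta.
import sympy as sp

J = sp.Matrix(sp.BlockDiagMatrix(
    sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]]),   # 3x3 block for lambda = 2
    sp.Matrix([[2]]),                               # 1x1 block for lambda = 2
    sp.Matrix([[3, 1], [0, 3]]),                    # 2x2 block for lambda = 3
    sp.Matrix([[0, 1], [0, 0]])                     # 2x2 block for lambda = 0
).as_explicit())

I8, Z = sp.eye(8), sp.zeros(8, 1)
e = [I8.col(i) for i in range(8)]

# Only v1, v4, v5, v7 are eigenvectors (columns of J with no 1 above the diagonal).
print([i + 1 for i in range(8)
       if any((J - lam * I8) * e[i] == Z for lam in (2, 3, 0))])   # [1, 4, 5, 7]

# Every vector of beta is a generalized eigenvector.
print(all((J - 2 * I8) ** 3 * e[i] == Z for i in range(4)),     # v1..v4, lambda = 2
      all((J - 3 * I8) ** 2 * e[i] == Z for i in (4, 5)),       # v5, v6,  lambda = 3
      all(J ** 2 * e[i] == Z for i in (6, 7)))                  # v7, v8,  lambda = 0
```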

Definition. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. The generalized eigenspace of T corresponding to λ, denoted K_λ, is the subset of V defined by

K_λ = {x ∈ V : (T − λI)^p(x) = 0 for some positive integer p}.

Note that K_λ consists of the zero vector and all generalized eigenvectors corresponding to λ.

Recall that a subspace W of V is T-invariant for a linear operator T if T(W) ⊆ W. In the development that follows, we assume the results of Exercises 3 and 4 of Section 5.4. In particular, the range of a linear operator T is T-invariant, and, for any polynomial g(x), if W is T-invariant, then it is also g(T)-invariant.

Theorem 7.1. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Then
(a) K_λ is a T-invariant subspace of V containing E_λ (the eigenspace of T corresponding to λ).
(b) For any scalar μ ≠ λ, the restriction of T − μI to K_λ is one-to-one.

Proof. (a) Clearly, 0 ∈ K_λ. Suppose that x and y are in K_λ. Then there exist positive integers p and q such that

(T − λI)^p(x) = (T − λI)^q(y) = 0.

Therefore

(T − λI)^{p+q}(x + y) = (T − λI)^{p+q}(x) + (T − λI)^{p+q}(y) = (T − λI)^q(0) + (T − λI)^p(0) = 0,

and hence x + y ∈ K_λ. The proof that K_λ is closed under scalar multiplication is straightforward. To show that K_λ is T-invariant, consider any x ∈ K_λ, and choose a positive integer p such that (T − λI)^p(x) = 0. Then

(T − λI)^p T(x) = T(T − λI)^p(x) = T(0) = 0.

Therefore T(x) ∈ K_λ. Finally, it is a simple observation that E_λ is contained in K_λ.

(b) Let x ∈ K_λ, and suppose that (T − μI)(x) = 0. By way of contradiction, suppose that x ≠ 0. Let p be the smallest integer for which (T − λI)^p(x) = 0, and let y = (T − λI)^{p−1}(x). Then

(T − λI)(y) = (T − λI)^p(x) = 0,

so that y ∈ E_λ. Furthermore,

(T − μI)(y) = (T − μI)(T − λI)^{p−1}(x) = (T − λI)^{p−1}(T − μI)(x) = 0,

so that y ∈ E_μ. But E_λ ∩ E_μ = {0}, and thus y = 0, contrary to the hypothesis. So x = 0, and hence the restriction of T − μI to K_λ is one-to-one. ∎

Theorem 7.2. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Suppose that λ is an eigenvalue of T with multiplicity m. Then
(a) dim(K_λ) ≤ m.
(b) K_λ = N((T − λI)^m).

Proof. (a) Let W = K_λ, and let h(t) be the characteristic polynomial of T_W. Clearly W is T-invariant. By Theorem 5.21 (p. 314), h(t) divides the characteristic polynomial of T, and by Theorem 7.1(b), λ is the only eigenvalue of T_W. Hence h(t) = (−1)^d (t − λ)^d, where d = dim(W), and d ≤ m.

(b) Clearly N((T − λI)^m) ⊆ K_λ. Now let W and h(t) be as in (a). Then h(T_W) is identically zero by the Cayley-Hamilton theorem (p. 317), and hence (λI − T)^d(x) = 0 for every x ∈ W. Since d ≤ m, we have K_λ = W ⊆ N((T − λI)^m), and the result follows. ∎
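Theorem 7.2(b) turns the generalized eigenspace into a single null-space computation. The sketch below is not part of the original text; it assumes SymPy and uses the matrix J of Example 1, for which λ = 2 has multiplicity m = 4.

```python
# Editorial sketch (assumes SymPy): K_2 = N((J - 2I)^4) for the J of Example 1.
import sympy as sp

J = sp.Matrix(sp.BlockDiagMatrix(
    sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]]), sp.Matrix([[2]]),
    sp.Matrix([[3, 1], [0, 3]]), sp.Matrix([[0, 1], [0, 0]])).as_explicit())

N4 = ((J - 2 * sp.eye(8)) ** 4).nullspace()   # power equal to the multiplicity
N9 = ((J - 2 * sp.eye(8)) ** 9).nullspace()   # any larger power

print(len(N4))              # 4; Theorem 7.2(a) guarantees dim K_2 <= 4
print(len(N9) == len(N4))   # True: powers beyond the multiplicity add nothing
```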

Theorem 7.3. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of T. Then, for every x ∈ V, there exist vectors v_i ∈ K_{λ_i}, 1 ≤ i ≤ k, such that

x = v_1 + v_2 + ⋯ + v_k.

Proof. The proof is by mathematical induction on the number k of distinct eigenvalues of T. First suppose that k = 1, and let m be the multiplicity of λ_1. Then (λ_1 − t)^m is the characteristic polynomial of T, and hence (λ_1 I − T)^m = T_0 by the Cayley-Hamilton theorem (p. 317). Thus V = K_{λ_1}, and the result follows.

Now suppose that for some integer k > 1, the result is established whenever T has fewer than k distinct eigenvalues, and suppose that T has k distinct eigenvalues. Let m be the multiplicity of λ_k, and let f(t) be the characteristic polynomial of T. Then f(t) = (t − λ_k)^m g(t) for some polynomial g(t) not divisible by (t − λ_k). Let W = R((T − λ_k I)^m). Clearly W is T-invariant. Observe that (T − λ_k I)^m maps K_{λ_i} onto itself for i < k. For suppose that i < k. Since (T − λ_k I)^m maps K_{λ_i} into itself and λ_k ≠ λ_i, the restriction of T − λ_k I to K_{λ_i} is one-to-one (by Theorem 7.1(b)) and hence is onto. One consequence of this is that for i < k, K_{λ_i} is contained in W, and hence λ_i is an eigenvalue of T_W with corresponding generalized eigenspace K_{λ_i}. Next, observe that λ_k is not an eigenvalue of T_W. For suppose that T(v) = λ_k v for some nonzero v ∈ W. Then v = (T − λ_k I)^m(y) for some y ∈ V, and it follows that

0 = (T − λ_k I)(v) = (T − λ_k I)^{m+1}(y).

Therefore y ∈ K_{λ_k}, and hence v = (T − λ_k I)^m(y) = 0, contrary to the hypothesis. Since every eigenvalue of T_W is an eigenvalue of T, the distinct eigenvalues of T_W are λ_1, λ_2, ..., λ_{k−1}, and so the induction hypothesis applies to T_W.

Now let x ∈ V. Then (T − λ_k I)^m(x) ∈ W, and hence there are vectors w_i ∈ K_{λ_i}, 1 ≤ i ≤ k − 1, such that

(T − λ_k I)^m(x) = w_1 + w_2 + ⋯ + w_{k−1}.

Since (T − λ_k I)^m maps K_{λ_i} onto itself for i < k, there exist vectors v_i ∈ K_{λ_i} such that (T − λ_k I)^m(v_i) = w_i for i < k. Thus

(T − λ_k I)^m(x − (v_1 + v_2 + ⋯ + v_{k−1})) = (T − λ_k I)^m(x) − (T − λ_k I)^m(v_1) − ⋯ − (T − λ_k I)^m(v_{k−1}) = 0.

Therefore there exists a vector v_k ∈ K_{λ_k} such that x = v_1 + v_2 + ⋯ + v_k. ∎

The next result extends Theorem 5.9(b) (p. 268) to all linear operators whose characteristic polynomials split. In this case, the eigenspaces are replaced by generalized eigenspaces.

Theorem 7.4. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of T with corresponding multiplicities m_1, m_2, ..., m_k. For 1 ≤ i ≤ k, let β_i be an ordered basis for K_{λ_i}. Then the following statements are true.
(a) β_i ∩ β_j = ∅ for i ≠ j.
(b) β = β_1 ∪ β_2 ∪ ⋯ ∪ β_k is an ordered basis for V.
(c) dim(K_{λ_i}) = m_i for all i.

Proof. (a) Suppose that x ∈ β_i ∩ β_j ⊆ K_{λ_i} ∩ K_{λ_j}, where i ≠ j. Then x ≠ 0. By Theorem 7.1(b), T − λ_j I is one-to-one on K_{λ_i}, and therefore (T − λ_j I)^p(x) ≠ 0 for any positive integer p. But this contradicts the fact that x ∈ K_{λ_j}.

(b) Let x ∈ V. By Theorem 7.3, there exist vectors v_i ∈ K_{λ_i}, 1 ≤ i ≤ k, such that x = v_1 + v_2 + ⋯ + v_k. Since each v_i is a linear combination of the vectors of β_i, it follows that x is a linear combination of the vectors of β. Therefore β spans V. Let q be the number of vectors in β. Then dim(V) ≤ q. For 1 ≤ i ≤ k, let d_i = dim(K_{λ_i}). By Theorem 7.2(a),

q = Σ_{i=1}^{k} d_i ≤ Σ_{i=1}^{k} m_i = dim(V).

Hence q = dim(V), and consequently β is a basis for V by Corollary 2 to the replacement theorem (p. 47).

(c) Using the notation and result of (b), we see that Σ d_i = dim(V) = Σ m_i. But d_i ≤ m_i for each i by Theorem 7.2(a), and therefore d_i = m_i for all i. ∎
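For the operator of Example 1, Theorem 7.4 can be checked numerically. The sketch below is not part of the original text; it assumes SymPy and again uses the matrix J of Example 1, whose eigenvalues 2, 3, 0 have multiplicities 4, 2, 2.

```python
# Editorial sketch (assumes SymPy): dim K_lambda equals the multiplicity, and the
# union of bases of the generalized eigenspaces is a basis of C^8 (Theorem 7.4).
import sympy as sp

J = sp.Matrix(sp.BlockDiagMatrix(
    sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]]), sp.Matrix([[2]]),
    sp.Matrix([[3, 1], [0, 3]]), sp.Matrix([[0, 1], [0, 0]])).as_explicit())

multiplicities = {2: 4, 3: 2, 0: 2}
bases = {lam: ((J - lam * sp.eye(8)) ** m).nullspace()
         for lam, m in multiplicities.items()}

print([len(bases[lam]) for lam in (2, 3, 0)])     # [4, 2, 2], the multiplicities

union = sp.Matrix.hstack(*bases[2], *bases[3], *bases[0])
print(union.rank())                               # 8, so the union is a basis of C^8
```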

Corollary. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Then T is diagonalizable if and only if E_λ = K_λ for every eigenvalue λ of T.

Proof. Combining Theorems 7.4 and 5.9(a) (p. 268), we see that T is diagonalizable if and only if dim(E_λ) = dim(K_λ) for each eigenvalue λ of T. But E_λ ⊆ K_λ, and these subspaces have the same dimension if and only if they are equal. ∎

We now focus our attention on the problem of selecting suitable bases for the generalized eigenspaces of a linear operator so that we may use Theorem 7.4 to obtain a Jordan canonical basis for the operator. For this purpose, we consider again the basis β of Example 1. We have seen that the first four vectors of β lie in the generalized eigenspace K_2. Furthermore, observe that the vectors in β that determine the first Jordan block of J satisfy v_1 = (T − 2I)^2(v_3) and v_2 = (T − 2I)(v_3); that is,

β_1 = {v_1, v_2, v_3} = {(T − 2I)^2(v_3), (T − 2I)(v_3), v_3}.

The relation between these vectors is the key to finding Jordan canonical bases. This leads to the following definitions.

Definitions. Let T be a linear operator on a vector space V, and let x be a generalized eigenvector of T corresponding to the eigenvalue λ. Suppose that p is the smallest positive integer for which (T − λI)^p(x) = 0. Then the ordered set

{(T − λI)^{p−1}(x), (T − λI)^{p−2}(x), ..., (T − λI)(x), x}

is called a cycle of generalized eigenvectors of T corresponding to λ. The vectors (T − λI)^{p−1}(x) and x are called the initial vector and the end vector of the cycle, respectively, and we say that the length of the cycle is p.

Notice that the initial vector of a cycle of generalized eigenvectors of a linear operator T is the only eigenvector of T in the cycle. Also observe that if x is an eigenvector of T corresponding to the eigenvalue λ, then the set {x} is a cycle of generalized eigenvectors of T corresponding to λ of length 1.

In Example 1, the subsets β_1 = {v_1, v_2, v_3}, β_2 = {v_4}, β_3 = {v_5, v_6}, and β_4 = {v_7, v_8} are the cycles of generalized eigenvectors of T that occur in β. Notice that β is a disjoint union of these cycles. Furthermore, setting W_i = span(β_i) for 1 ≤ i ≤ 4, we see that β_i is a basis for W_i and that [T_{W_i}]_{β_i} is the i-th Jordan block of the Jordan canonical form of T. This is precisely the condition that is required for a Jordan canonical basis.
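A cycle is generated mechanically from its end vector by repeated application of T − λI. The sketch below is not part of the original text; it assumes SymPy, uses the matrix J of Example 1, and identifies v_i with the standard basis vector e_i.

```python
# Editorial sketch (assumes SymPy): building cycles of generalized eigenvectors
# of Example 1 from their end vectors.
import sympy as sp

J = sp.Matrix(sp.BlockDiagMatrix(
    sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]]), sp.Matrix([[2]]),
    sp.Matrix([[3, 1], [0, 3]]), sp.Matrix([[0, 1], [0, 0]])).as_explicit())

def cycle(end_vector, lam):
    """Return the cycle {(T - lam I)^(p-1) x, ..., (T - lam I) x, x}."""
    A, x, vectors = J - lam * sp.eye(8), end_vector, []
    while x != sp.zeros(8, 1):
        vectors.insert(0, x)      # prepend, so the initial vector comes first
        x = A * x
    return vectors

e = [sp.eye(8).col(i) for i in range(8)]
print([v.T for v in cycle(e[2], 2)])   # the cycle {e1, e2, e3} = beta_1
print([v.T for v in cycle(e[7], 0)])   # the cycle {e7, e8}     = beta_4
```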

Theorem 7.5. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits, and suppose that β is a basis for V such that β is a disjoint union of cycles of generalized eigenvectors of T. Then the following statements are true.
(a) For each cycle γ of generalized eigenvectors contained in β, W = span(γ) is T-invariant, and [T_W]_γ is a Jordan block.
(b) β is a Jordan canonical basis for V.

Proof. (a) Suppose that γ corresponds to λ, that γ has length p, and that x is the end vector of γ. Then γ = {v_1, v_2, ..., v_p}, where v_i = (T − λI)^{p−i}(x) for i < p and v_p = x. So

(T − λI)(v_1) = (T − λI)^p(x) = 0,

and hence T(v_1) = λv_1. For i > 1,

(T − λI)(v_i) = (T − λI)^{p−i+1}(x) = v_{i−1},

so that T(v_i) = v_{i−1} + λv_i. Clearly W is (T − λI)-invariant, and therefore T maps W into itself. By the preceding equations, we see that [T_W]_γ is a Jordan block.

(b) For (b), simply repeat the arguments of (a) for each cycle in β in order to obtain [T]_β. We leave the details as an exercise. ∎

In view of this result, we must show that, under appropriate conditions, there exist bases that are disjoint unions of cycles of generalized eigenvectors. Since the characteristic polynomial of a Jordan canonical form splits, this is a necessary condition. We will soon see that it is also sufficient. The next result moves us toward the desired existence theorem.

Theorem 7.6. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Suppose that γ_1, γ_2, ..., γ_q are cycles of generalized eigenvectors of T corresponding to λ such that the initial vectors of the γ_i's are distinct and form a linearly independent set. Then the γ_i's are disjoint, and their union γ = ⋃_{i=1}^{q} γ_i is linearly independent.

Proof. Exercise 5 shows that the γ_i's are disjoint. The proof that γ is linearly independent is by mathematical induction on the number of vectors in γ. If this number is less than 2, then the result is clear. So assume that, for some integer n > 1, the result is valid whenever γ has fewer than n vectors, and suppose that γ has exactly n vectors. Let W be the subspace of V generated by γ. Clearly W is (T − λI)-invariant, and dim(W) ≤ n. Let U denote the restriction of T − λI to W.

and so we can extend each 7* to a larger cycle ji = 7* U {vi} of generalized eigenvectors of T corresponding to A. From these inequalities and the dimension theorem. •. Note that if 7* has length one. by the induction hypothesis. then 7^ = 0 .7. For 1 < i < q. Hence dim(R(U)) = n — q.W2. The result is clear for n = 1. Proof. each vector of 7^ is the image under U of a vector in 7. So suppose that for some integer n > 1 the result is valid whenever dim(KA) < n. the end vector of 7» is the image under U of a vector Vi G KA.wQ. wq} is a linearly independent subset of EA. Furthermore. every nonzero image under U of a vector of 7$ is contained in 7^. The proof is by mathematical induction on n = dim(KA). Since 7 generates W and consists of n vectors. Hence 7 is linearly independent. I Corollary. and assume that dim(KA) = n. and conversely. it must be a basis for W. In the case that 7^ ^ 0. Since {wi. and let X be an eigenvalue of T. . . we obtain n > dim(W) = dim(R(U)) . Let U denote the restriction of T — AI to KA. Therefore.U2. and 1 hence of T itself. and the initial vectors of the 7^'s are also initial vectors of the 7i's. this set can be extended to a basis {wi.. corresponding to A for which 7 = M 7^ is a basis for R(U).. ..Then R(U) is a subspace of KA of lesser dimension.. 7' generates R(U). Thus we may apply the induction hypothesis to conclude that 7' is linearly independent. w 2 .7 2 . Therefore 7' is a basis for R(U). we have dim(N(U)) > q. Since the q initial vectors of the 7. We conclude that dim(W) = n.dim(N(U)) >(n-q) + q — n.'s form a linearly independent set and lie in N(U). Let T be a linear operator on a finite-dimensional vector space V. let Wi be the initial vector of 7J (and hence of 7*). .us} . . Let 7' = IIT*i Then by the last statement. • • •. For 1 < i < q. Then K\ has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding to X. let 7^ denote the cycle obtained from 7$ by deleting the end vector. 7 g of generalized eigenvectors of this restriction.. there exist disjoint cycles 71.U\. 7' consists of n — q vectors.490 Chap.. 7 Canonical Forms For each i. and R(U) is the space of generalized eigenvectors corresponding to A for the restriction of T to R(U). Every cycle of generalized eigenvectors of a linear operator is linearly independent. Theorem 7..

. • • • . 1 We can now compute the Jordan canonical forms of matrices and linear operators in some simple cases. Corollary 2. . . {'"«} are disjoint cycles of generalized eigenvectors of T corresponding to A such that the initial vectors of these cycles are linearly independent. 1 The Jordan canonical form also can be studied from the viewpoint of matrices.72.1 The Jordan Canonical Form I 491 for EA. for each i there is an ordered basis fti consisting of a disjoint union of cycles of generalized eigenvectors corresponding to A. Then 7 consists of r + q + s vectors. w 2 . Furthermore. it follows that nullity(U) = q + s. We show that 7 is a basis for KA. • • • . Corollary 1. It follows that 7 is a basis for KAI The following corollary is immediate. {«i}. . Let Ai. . and A is similar to J. U\. . Then the Jordan canonical form of A is defined to be the Jordan canonical form of the linear operator LA on Fn. as is illustrated in the next two examples. Proof. The tools necessary for computing the Jordan canonical forms in general are developed in the next section. 7. So 7 is a linearly independent subset of KA containing dim(KA) vectors. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits. Exercise. Afc be the distinct eigenvalues of T.7.wq. Let ft = ft\ U 02 U • • • U 0kThen. {^2}.4(b).Then 71. Then T has a Jordan canonical form. Definition. . Then A has a Jordan canonical form J. Therefore dim(KA) = rank(U) + nullity(U) = r + q + s. us] is a basis for EA = N(U). Let A be an n x n matrix whose characteristic polynomial splits..6.W2. . % . Therefore their union 7 is a linearly independent subset of KA by Theorem 7. by Theorem 7.Suppose that 7 consists of r = rank(U) vectors. . Let A G M n X n (F) be such that the characteristic polynomial of A (and hence of LA) splits. ft is an ordered basis for V. By Theorem 7. since {wi. The next result is an immediate consequence of this definition and Corollary 1.Sec. Proof. • • •. A 2 .

\.) = 2 and a generalized eigenspace has a basis consisting of a union of cycles. = N((T-2I) 2 ). we obtain the cycle of generalizes! eigenvectors 02 = {(A-2I)v. we have that EA. Since EAl = N(T-3I). The characteristic polynomial of A is f(t)=det(A-tI) = -(t-Z)(t-2)2. Since dim(KA_.2.2.v = 0. KA. need to find a Jordan canonical basis for T = L. this basis is either a union of two cycles of length 1 or a single cycle of length 2. Hence Ai = 3 and A2 = 2 arc the eigenvalues of A with multiplicities 1 and 2. To find the Jorelan canonical form for A. —3. but (A . Since (A — 2I)v = (I.1) is an eigenvector of T corresponding to A) = 3. we. therefore ft = is a basis for KA.ptable candidate for v. = KA.2I)v f 0. Therefore the desired basis is a single cycle of length 2. A vector v is the end vector of such a cycle if and only if (A . Observe that (—1.e.492 Example 2 Let A= 3 1 -2 -1 5 0 1 -1 4 Chap.4. .2I)2v = 0. dim(KA. and dhn(KA2) = 2.2.. = N(T-3I). respectively. 7 Canonical Forms G M 3 x3(rt). It can easily be shown that is a basis for the solution space of the homogeneous system (A — 2I)2. —1).0) is an ae-c. and KA. By Theorem 7. The former case is impossible because the vectors in the basis would be eigenvectors -contradicting the fact that dim(EA2) = 1. .) = 1.v} = m . By Theorem 7. The vector v = (-1. Now choose a vector v in this set so that (A — 2I)v ^ 0.

then 7 determines a single Jordan block [T]. there must be a vector that satisfies this condition.Sec. J = Q~lAQ. So ft is a basis for KA. Let ft be the standard ordered basis for P2(R). where Q is the matrix whose columns are the vectors in 0. = /-I 0 \ 0 1 0" -1 1 0 . or else .Then m* = which has the characteristic polynomial f(t) — —(t + l) 3 . 7. In fact.1 which is a Jordan canonical form of T.NOW dim(E A ) = 3 . and there do not exist two or more linearly independent eigenvectors. 0/ Therefore a basis for KA cannot be a union of two or three cycles because the initial vector of each cycle is an eigenvector. We find a Jordan canonical form of T and a Jordan canonical basis for T. In any basis for KA. Therefore.0 -1 0 0 0\ . Notice that A is similar to J. The end vector h(x) of such a. cycle must satisfy (T 4.2 = 3 . Example 3 Let T be the linear operator on P2(R) defined by T(g(x)) = —g(x) — g'(x).4. Thus A = — 1 is the only eigenvalue of T.\)2(h(x)) / 0.2 = 1. and hence KA = ?2(R) by Theorem 7.1 The Jordan Canonical Form I 493 as a basis for KA2 • Finally.rank(j4 + 7) = 3 . 3 0 0 ~o1 2 1 0 0 2 J = m* = is a Jordan canonical form for A. So the desired basis must consist of a single cycle of length 3. If 7 is such a cycle. we take the union of these two bases to obtain ft = ftiUft2 = which is a Jordan canonical basis for A.rank '0 0 .

494 Chap. Proof.. ( . Let T be a linear operator on a finite-dimensional vector space V. for each i. which is optional. we./ has Jordan canonical form J. then 0i U 02 U • • • U ftk is a Jordan canonical basis for T. If. characteristic polynomial splits.~2x. for a generalized eigenvector of a linear operator T to correspond to a scalar that is not an eigenvalue of T.11 to the nondiagonalizable case. (a) Eigenvectors of a linear operator T are also generalized eigenvectors of T. the operator L. Testing the vectors in 0. whose. 7 Canonical Forms no vector in KA satisfies this condition.8. (f) Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits. T is diagonalizable if and only if V is the direct sum of the eigenspaces of T. Then. we develop a computational approach for finding a Jordan canonical form and a Jordan canonical basis.11 (p..x 2 } = {2. If T is diagonalizable.A2 Afc be the. (c) Any linear operator on a finite-dimensional ve..\. • In the next section..e:tor space. and let Ai. Exercise. then the eigenspaces and the generalized eigenspaces coincide. has a Jorelan canonical form. The next result./. (h) Let T be a linear operator on an n-dimensional vector spae. EXERCISES 1. we see that h(x) = x is acceptable. 278). distinct eigenvalues of T. contrary to our reasoning. ( T + I)(x 2 ). By Theorem 5. fti is a basis for K. for any eigenvalue A of T. (e) There is exactly one cycle of generalized eigenvectors corresponding to each eigenvalue of a linear operator on a finite-dimensional ve. Then V is the direct sum of the generalized eigenspaces of T.e. In the process. (b) It is possible. (g) For any Jorelan block . (d) A cycle of generalized eigenvectors is linearly independent. 2 } is a Jordan canonical basis for T. extends Theorem 5. Theorem 7. and suppose that the characteristic polynomial of T splits. Label the following statements as true or false. KA = N((T-AI)"). Therefore 7 = {(T + I ) V ) .ctor space. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial sj>lits. prove that Jorelan canonical forms are unique up to the order of the Jordan blocks.

then for any positive integer k N((T-Alv) f c ) = N((Al v -T) f c ). then the cycles are disjoint.1 3/ 3. Prove that if the initial eigenvectors are distinct. 6. Then find a Jordan canonical form J of A.e1. For each matrix A. (a) N(U) C N(U2) C • • • C N(Ufc) C N(Ufc+1) C • • •.72.7 P be cycles of generalized eigenvectors of a linear operator T corresponding to an eigenvalue A. (a) A = 1 l\ . For each linear operator T. Let T: V . 5.1 The Jordan Canonical Form I 495 2.Sec. 7. • • • . 4 J Let T be a linear operator on a vector space V. (a) T is the linear operator on P2(R) defined by T(/(x)) = 2/(x) — (b) V is the real vector space of functions spanned by the set of real valued functions {l. Prove that span(7) is a T-invariant subspace of V. (a) N(T) = N(-T). Prove the following results. 7. find a basis for each generalized eigenspace of L^ consisting of a union of disjoint cycles of generalized eigenvectors.» W b e a linear transformation. (d) T(A) = 2A + At for all A G M 2x2 (i2). te1}.t. ) -A . (c) T is the linear operator on M 2 x 2 ( # ) defined by T(A) = I for all A G M 2X 2(#). and let 7 be a cycle of generalized eigenvectors that corresponds to the eigenvalue A. (b) N(Tfc) = N((-T) fc ). find a basis for each generalized eigenspace of T consisting of a union of disjoint cycles of generalized eigenvectors. Let U be a linear operator on a finite-dimensional vector space V.1 *) 11 21 3 -4 -8 -1 -5 -11 0 (b) A = 1 2 3 2 (c) A = (2 1 0 o\ 0 2 1 0 (d) A = 0 0 3 0 ^0 1 . Let 71. (c) If V = W (so that T is a linear operator on V) and A is an eigenvalue of T. and T is the linear operator on V defined by T ( / ) = / ' . Then find a Jordan canonical form J of T. Prove the following results. t2.

then fc N(U'") = N(U ) for any positive integer k > rn. defined in Section 5. and let A be an eigenvalue of T. then KA = N((T . . Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits. 13. of Theorem 7. A 2 . Let T be a linear operator on (e) V whose characteristic polynomial splits. 8. and suppose that J = [T]^ has q Jordan blocks with A in the diagonal positions. and let Ai. vk in the statement 9. Prove Corollary 2 to Theorem 7. Prove that q < dim(EA). 12... and let A be an eigenvalue of T.3 are unique.4 to prove that the vectors v\.7. Afc be the distinct eigenvalues of T. then Tw is diagonalizable. Prove that J . Let T be a linear operator on V. Use (e) to obtain a simpler proof of Exercise 24 of Section 5.A. Use Theorem 7. and let A be an eigenvalue of T.A. (b) Let ft be a Jordan canonical basis for T.J\ © J 2 © • • • © Jfc is the Jordan canonical form of J . Then T is diagonalizable if and only if rank(T . (b) Suppose. 7 Canonical Forms m m+1 ) for some positive integer rn.Prove that ft' is a basis for KA10.496 Chap.. and let Ai. . . then (b) If rank(U ) = rank(U m fc rank(U ) = rank(U ) for any positive integer k > m. Afc be the distinct eigenvalues of T. v2.AI)'")Second Test for Diagonalizability. For each i. Let ft' = ft n KA. Prove that q < dim(E A )..4 on page 320. . Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Exercises 12 and 13 are concerned with direct sums of matrices.. 11. A 2 . Prove Theorem 7.I)2) for 1 < i < k. (d) Prove that if rank((T-Al)' n ) = rank((T-Al) m + 1 ) for some integer rn. ..5(b). (a) Prove Theorem 7.4: If (f) T is a diagonalizable linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V.. let Jt be the Jordan canonical form of the restriction of T to KA^ . that ft is a Jordan canonical basis for T.I) = rank((T . . TO w+1 (c) If rank(U ) = rank(U ) for some positive integer rn. (a) Suppose that 7 is a basis for KA consist ing of the union of q disjoint cycles of generalized eigenvectors.8. Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits.

. That is. is completely detenninc'd by T.].. is a disjoint union of cycles 7!. 489).. of cycles that form 0i} and the length pj (j — 1. To aid in formulating the uniqueness theorem for . While developing a method for finding . let T. be the restriction of T to KA. is unique.. In this matrix. if/?..2. fc the union ft — M fti is a Jordan canonical basis for T. It is in this sense that A. It then follows that the Jordan canonical form for T is unique up to an ordering of the eigenvalues of T. Let Ai../ anel ft as well. we show that for eae:h i. contains an ordered basis fti consisting of a union of disjoint cycles of generalized eigenvectors corresponding to A. Then Ai is the Jordan canonical form of T. rij) of e%ach cycle. the number n. and /Ai 0 J O A2 O 0\ 0 Ak) = m* = \() is the Jordan canonical form of T.^. 490).. A2 Afc be the distinct eigenvalues of T. Example 1 To illustrate the discussion above. Specifically. the ordered basis fti for KA.. we index the ewcles so that p\ > p2 > • • • > pUi. each O is a zero matrix of appropriate size. we fix a linear operator T on an n-dimensional vector space V such that the characteristic polynomial of T splits.. it becomes evident that in some sense the matrices Aj are unique.5 (p. there. For each i.7 (p.4(b) (p. In this section./. which in turn determines the matrix Aj.2 THE JORDAN CANONICAL FORM II 497 For the purposes of this section. So by Theorems 7. As we will see. wc compute the matrices A. orderings of vectors in fti. By Theorem 7.. 7. 487) and 7. is no uniqueness theorem for the. each generalized eigenspace KA.. will henceforth be ordered in such a way that the cycles appear in order of decreasing length. 7 2 .Sec. we adopt the following convention: The basis fti for KA. is the union of four cycles fti = 71 U 72 U 73 U 74 with respective . 7. anel let Ai = [T. ./. for some /'. and if the length of the cycle 7~ is pj. and the bases fti. This ordering of the cye:les limits the possible. suppose that. bases fti or for ft. . thereby computing .2 The Jordan Canonical Form II 7. .

respectively. . • • • »7n< with lengths p\ > p 2 > • • • > pn. 0 0 0 0 Chap..498 lengths p] = 3.I. .l)(„2) v2 •(T-A. anel the dots are configured according to the following rules.. (T-A.. Then (Xi 0 0 0 0 0 0 0 1 0 A.i . 0 0 0 0 Xi 1 0 0 0 A. 1 0 A. pi > p2 > • • • > pni. is the restriction of T to KA. anel ordered bases 0i. 1. p2 = 3. the diagram can be reconstructed from the values of the r'j's. facts. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 A.the columns of the dot diagram become shorter (or at least not longer) as we move from left to right.) and p\ rows. 0 A. The proofs of these.. The array consists of n* columns (one column for each cycle).. columns (one for each cycle. pz = 2. Now let /-. Observe that r\ > r 2 > ••• > r P l .v2 <•„. are treated in Exercise 9. 2. denote the number of dots in the j t h row of the dot diagram. Furthermore.-'K.72. In the following dot diagram of T.2 (.) (T \i\r»i-2(vni) (T-K\)(vni) (T-A. Denote the end vectors of the cycles by v\.. Suppose that fti is a disjoint union of cycles of generalized eigenvectors 71. Since. '•1 (T-A. we use an array of dots called a dot d i a g r a m of T. Counting from left to right. where T. and p4 = 1. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A. contains one clot for each vector in fti. The dot diagram of T. Notice that the.l)(c1) • e. the y'th column consists of the Pj dots that correspond to the vectors of 7/ starting with the initial vector at the top and continuing down to the end vector. 7 Canonical Forms Ar = 0 \ 0 0 0 0 0 0 0 Xi ) To help us visualize each of the matrices A. (T-Atl)p». (T-AJ)"-1^.> 2 .l)". each clot is labeled with the name of the vector in fti to which it corresponds. which are combinatorial in nature. dot diagram of Tj has //.l)p'-2(r (T-A...

and P4 = 1. and KXi is invariant under ( T . N((T — Ajl)r) = N(U).A t l ) r ) C KA. Hence the number of dots in the first r rows of the dot diagram equals nullity((T — AJ)7-).. see Exercise 8. x G Si if and only if x is one of the first r vectors of a cycle.. But S\ is a linearly independent subset of N(U) consisting of a vectors. the dot diagram of Tj is as follows: • • • Here r\ = 4. Then a + b = rrii.. Thus {U(x): x G S2} is a basis for R(U) consisting of b vectors. p\ = p 2 = 3.9 yields the following corollary. rather than with fti. Hence the dot diagram is completely determined by T. Theorem 7. and let m-i = din^KAj. For any x G fti. For example. using only T and Xi. G A : U(x) ^ 0}. We now devise a method for computing the dot diagram of Tj using the ranks of linear operators determined by T and A^. the vectors in fti that arc associated with the dots in the first r rows of the dot diagram of Ti constitute a basis for N((T — AJ) r ). and r3 = 2. Hence rank(U) = b. 1 In the case that r = 1.- . 0i is not unique.) To determine the dot diagram of Ti.A ? J ) r . and hence it suffices to establish the theorem for U. is n«. p$ = 2. Proof. Now define Si = {x G fti: U(x) = 0} and S2 = {. N ( ( T . 7. On the other hand. we fix a basis 0i for KA. Theorem 7. the number of Jordan blocks corresponding to Xi equals the dimension of Ex. cycles of generalized eigenvectors with lengths p\ > p 2 > • • • > prli. To facilitate our arguments. r 2 = 3. therefore 5] is a basis for N(U). The next three results give us the required method. the number of dots in the j t h row of the dot diagram. SO that fti is a disjoint union of n7. Let a and b denote the number of vectors in S\ and S2. Corollary. For any x & S2.Sec./. with n-i = 4. Hence a is the number of dots in the first r rows of the dot diagram. Hence in a Jordan canonical form ofT. By the preceding remarks. we devise a method for computing each Tj. The dimension of EA. respectively. and so nullity(U) = nii — b = a. It follows that U maps 5 2 in a one-to-one fashion into fti. from which it follows that it is unique. the effect of applying U to x is to move the dot corresponding to x exactly r places up its column to another dot. and this is true if and only if x corresponds to a dot in the first r rows of the dot diagram.2 The Jordan Canonical Form II 499 In Example 1. (It is for this reason that we associate the dot diagram with T. Let U denote the restriction of (T — Ajl)7' to KA*.9. Clearly. For any positive integer r.

(b) rj = rank((T .1 ) . Hence we have proved the following result. rj = (rL +r2-\ + Tj) . Exercise. Theorem 7.A J ) ' .rank((T . Let rj denote the number of dots in the jth row of the dot diagram of Tj. Hence r\ — dim(V) — rank(T — A»l). Chap. subject to the convention that the cycles of generalized eigenvectors for the bases of each generalized eigenspace are listed in order of decreasing length. 7 Canonical Forms We are now able to devise a method for describing the dot diagram in terms of the ranks of operators. we have ri + r2 + • • • + rj = nullity((T .9.rank((T . We apply these results to find the Jordan canonical forms of two matrices and a linear operator.Ail) J ).rank(T .AJ).Xi\)j) if j > 1. Proof.( n + r2 + j h r^-i) I = [dim(V) .500 Proof.Ajl)^ 1 ) . Theorem 7. the restriction ofT to KA4.10. the Jordan canonical form of a linear operator or a matrix is unique up to the ordering of the eigenvalues.Xi\) )} . and for j > 1.Xi\)j). Corollary. for 1 < j» < pi.rank((T .10 shows that the dot diagram of Tj is completely determined by T and Aj. For any eigenvalue Xi ofT.Ajl)j) = dim(V) . Thus.rank((T . (a) n = dim(V) .rank((T . the dot diagram of Ti is unique.Ajl)^ 1 )] = rank((T . Then the following statements are true. Example 2 Let (1 0 A = 0 -l 3 1 1 V0 0 1\ -1 0 1 0 0 3/ . By Theorem 7.[dim(V) .

3). 7.2 = 2. (Actually.4(c) (p. /o 0 n = 4 — rank(i4 — 27) = 4 — rank 0 and r 2 = ram\(A . by Theorem 7. with multiplicities 3 and 1.Sec.1 So M = P"il A (2 1 0' = 0 2 0 V° 0 2. Therefore A2 = [T2J>a = (3). respectively. As we did earlier. 1 -1 0 0 V V> . Setting 0 = 0\ U 02. respectively. we have (2 1 0 2 = 0 0 \0 0 0 0\ 0 0 2 0 0 3/ J=[lAb .2) 3 (i .ti) = (t. it follows that dim(KA2) = 1. Let Ti and T2 be the restrictions of L^ to the generalized eigenspaces KA. Then.rank((. 487). Ai = 2 and A2 = 3. Thus A has two distinct eigenvalues. hence the dot diagram of Ti has three dots.) Hence the dot diagram associated with 0i is -1 0 i\ 1 -1 0 = 4 .21) . let rj denote the number of dots in the jth.2 The Jordan Canonical Form II 501 We find the Jordan canonical form of A and a Jordan canonical basis for the linear operator T = L^.4 . Suppose that 0\ is a Jordan canonical basis for Ti.) = 3 by Theorem 7. and KA2.1 = 1. row of this dot diagram. the computation of r 2 is unnecessary in this case because r\ = 2 and the dot diagram only contains three dots. and consequently any basis 02 for KA2 consists of a single eigenvector corresponding to A2 = 3. Since A2 = 3 has multiplicity 1.10.21 f) = 2 . it follows that dim^A. The characteristic polynomial of A is det(i4 . Since Ai has multiplicity 3.

7 Canonical Forms We now find a Jordan canonical basis for T = L^.2I) 2 It is easily seen that f (\\ 0 0 W 2 AA l ' 2 w /o\ I 0 w is a basis for N((T — 2I) ) = KA. We begin by determining a Jordan canonical basis 0\ for Ti. there are two such cycles. select /1\ 0 V2 = 0 . We reprint below the dot diagram with the dots labeled with the names of the vectors to which they correspond. each corresponding to a cycle of generalized eigenvectors. Chap.Since the dot diagram of Ti has two columns. Of these three basis vectors. .2 0 0 = 0 0 2 V" 0 0 A-2J and (A .21).502 and so J is the Jordan canonical form of A. Suppose that we choose (0\ 1 '••i = 2 w Then /o 0 = 0 V 0 L\ (0\ -i 1 1 -1 0 1 2 0 1 --1 0 V w -*\ -1 " -1 V-i/ (T-2\)(vi) = (A-2I)(v1) Now simply choose v2 to be a vector in EA.21) (t>i). Let v\ and u2 denote the end vectors of the first and second cycles. that is linearly independent of (T . the last two do not belong to N(T — 21). • (T-2l)(t>i) •v2 Prom this diagram we sec that V\ e N((T . Now /0 0 = 0 \0 -1 1 1 -1 0 1\ -1 0 -1 0 0 /() . and hence we select one of these for v\.2I)2) but Vi £ N(T . for example. respectively.

0 0 W Thus \\ (°) 1 i 1 ? 2 h is a Jordan canonical basis for L^. Hence any eigenvector of L^ corresponding to A2 = 3 constitutes an appropriate basis 02.1 Q = . dim(KA2) = dim(EA2) = 1. /-1\ -1 -1 \rh 0 \ 0 0 w • By Theorem 7. Notice that if /-l . For example. Since A2 = 3 has multiplicity 1.1 \-l then J = Q~lAQ.Sec. 7. 489).6 (p. the linear independence of 0\ is guaranteed since t'2 was chosen to be linearly independent of (T — 21)(vi).2 The Jordan Canonical Form II Thus we have associated the Jordan canonical basis M -1 -1 K-y (°\ i > 2 sPJ (1\ 0 ) 0 v<v 503 0i = with the dot diagram in the following manner. o 1 i\ 1 0 0 2 0 0 0 0 1/ v<v /1\ /i\ 0 0 ! 0 ? 0 W W 02 = 0 = 0lU02 = .

2 -4 2 0 1 -2 3 -6 3 2^ 3 3 V We find the Jordan canonical form J of A. In this case. and hence this dot diagram has the following form: Thus M = [T 2 ] A = 4 0 1 4 .]. we have dim(KA2) = 2.21) = 4 .2)2(t .2.A — 4/) = 3. and let T. be the restriction of L^ to KA* for i = 1.504 Example 3 Let ( A = 2 Chap. hence the dot diagram of Ti is as follows.. and a matrix Q such that J — Q _ 1 AQ. for example.2 = 2. and A2 = 4.4 . /2\ M 1 1 5 0 2 \v w 01 = Next we compute the dot diagram of T 2 . 0\ is an arbitrary basis for EAX = N(T — 21). = 4 . a Jordan canonical basis for L^. = g °) where /?i is any basis corresponding to the dots. Let r\ denote the number of dots in the first row of this diagram. 7 Canonical Forms -2 -2 V. Then r. Since A2 = 4 has multiplicity 2. Therefore ^=[T.4) 2 . The characteristic polynomial of A is dct(A .ti) = (t . there is only 4 — 3 = 1 dot in the first row of the diagram. Ai = 2.rank(. We begin by computing the dot diagram of Ti. Let T = L^. Since rank(.

4I) 2 ) such that v £ N(T .v} -i -i \ V o/ 0 = 0lU02 = V <v is a Jordan canonical basis for L.4.41).AI)x = 1 W for example..41 is Choose v to be any solution to the system of linear equations /0\ {A . we illustrate another method. In this example. The corresponding Jordan canonical form is given by Ax O O An / 2 0 0 \ 0 0 2 0 0 0 0 4 0 0 \ 0 1 4/ •/ = \\-A\B = . corresponding to the dots. 7.2 The Jordan Canonical Form II 505 where 02 is any basis for KA. A simple calculation shows that a basis for the null space of L^ . The end vector of this cycle is a vector v € K^a = N((T . 02 is a cycle of length 2. -1 -1 \ V Thus ( /0\ 1 ={ 1 I W Therefore /& (2y M 1 1 1 o • 2 < 1 v) 2/ W / -I -1 A 1 l 02 = {(LA-4\)(v).Sec. In this case. One way of finding such a vector was used to select the vector v\ in Example 2.

Example 4 Let V be the vector space of polynomial functions in two real variables x and y of degree at most 2.y. y) = x + 2x2 — 3xy + y.y2. By Theorem 7. Then /() 0 0 A = 0 0 vo 1 0 0 0 0 o 0 0 0 0 0 o 0 2 0 0 0 o 0 0 0 0 0 o 0\ 0 1 0 0 oy and hence the characteristic polynomial of T is f-t det(A . namely.ti) = det V 0 0 0 0 () I -t 0 0 0 0 0 0 -t 0 0 0 0 2 0 -t 0 0 0 0 0 0 -t 0 0\ 0 1 = ? 0 0 -v Thus A = 0 is the only eigenvalue of T. Let T be the linear operator on V defined by T(f(x.y). Then V is a vector space over R and a = {l.(x + 2x2 . we define Q to be the matrix whose columns are the vectors of 0 listed in the same order. We find the Jordan canonical form and a Jordan canonical basis for T. let rj denote the number of dots in the jib.rank(.x. Let A = [T]a. For each j.3y.3xy + y) = 1 + Ax . and KA = V. . if f(x.xy} is an ordered basis for V. then TCffoy)) = j.10.3 = 3. • 0 1\ 1 -1 1 -1 1 V For example.y)) = -^f(x.506 Chap. x2. row of the dot diagram of T. 7 Canonical Forms Finally. n = 6 . 0 1 1 Q = 0 2 V2 0 f2 Then J = Q~lAQ.4) = 6 .

y))^0.42 = 0 0 \0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0\ 0 0 0 0 0/ 507 r 2 = rank(4) . it follows that r$ = 1.y)) and (T .y2.4 2 ) = 3 . since the second column of the dot diagram consists of two dots.y) d2 such that 2 :fi(x.2 The Jordan Canonical Form II and since A) 0 0 . we see that x is a suitable candidate. So the dot diagram of T is We conclude that the Jordan canonical form of T is / 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 J = 0 0 0 0 1 0 0 0 0 0 \ 0 0 0 0 0 o\ 0 0 0 0 0/ We now find a Jordan canonical basis for T.x.y)) = 0.y)) = ^(x2) = 2.1 = 2. Because there are a total of six dots in the dot diagram and r\ = 3 and r 2 = 2. .xy} for dx K A = V. we must find a polynomial f\(x.y) 7^ 0..y) = x . Examining the basis a = {l. = T(h(x. we must find a polynomial / 2 (a. we see that (T .Sec.y. 7.rank(.x2.y)) = T2(h(x. Since the first column of the dot diagram of T consists of three dots.X\)(h(x. but ^(f2(x.X\)2(j\(x. y) such that ~(f2(x.y)) = —(x2) = 2x Likewise. Setting fi(x.

23 (p.x2. The following result may be thought of as a corollary to Theorem 7.2x. by the corollary to Theorem 2. the only choice in a that satisfies these constraints is xy. Hence JA and JB are Jordan canonical forms of L^. The only remaining polynomial in a is y2. with the same ordering of their eigenvalues. the third column of the dot diagram consists of a single polynomial that lies in the null space of T. each having Jordan canonical forms computed according to the conventions of this section. Theorem 7. •2 • 2x • x2 Thus 0 = {2. 1 Example 5 We determine which of the matrices B = . The reader can do the same in the exercises.?y) = xy. xy. we relied on our ingenuity and the context of the problem to find Jordan canonical bases. to develop a general algorithm for computing Jordan canonical bases. J A and JB are matrix representations of L^. If A and B have the same Jordan canonical form J. Let J A and JR denote the Jordan canonical forms of A and B. respectively. however. ox Finally. Conversely.11. Proof. So we set /2(a.y)) is a Jordan canonical basis for T. 490). Lot A and B he n x n matrices.y2} y >xy • ir (T . 7 Canonical Forms Since our choice must be linearly independent of the polynomials already chosen for the first cycle.7 p. and it is suitable here. Thus JA = JB by the corollary to Theorem 7. Then A is similar to both J A and JB. 115). Then A and B are similar if and only if they have (up to an ordering of their eigenvalues) the same Jordan canonical form. We do not attempt. Then A and B have the same eigenvalues.y) = y2. then A and B are each similar to J and hence are similar to each other. Thus = %-(xy) = y.10. although one could be devised by following the steps in the proof of the existence of such a basis (Theorem 7. We are successful in these cases because the dimensions of the generalized eigenspaces under consideration are small.10. Therefore we have identified polynomials with the dots in the dot diagram as follows.y. suppose that A and B are similar. • In the three preceding examples.508 Chap.X\)(f2(x>y)) = T(f2(x. and therefore.. So set fz(x.

Then (see Exercise 4) 0 °\ 0 2 o . (a) The Jordan canonical form of a diagonal matrix is the matrix itself. a n d Jc be the Jordan canonical forms of A. whereas D has —t(t — l)(t — 2) as its characteristic polynomial. Because similar matrices have the same characteristic polynomials. and C in Example 5. Observe that A. B.2 The Jordan Canonical Form II and /. (b) Let T be a linear operator on a finite-dimensional vector space V that has a Jordan canonical form J. A and C are not diagonalizable because their Jordan canonical forms are not diagonal matrices. 2 for their common eigenvalues. B. . (d) Matrices having the same Jordan canonical form are similar. 7. (e) Every matrix is similar to its Jordan canonical form. B is similar to neither A nor C. (h) The dot diagrams of a linear operator on a finite-dimensional vector space are unique. A is similar to C. and C have the same characteristic polynomial —(t — l)(t — 2) 2 . Label the following statements as true or false. Hence T is diagonalizable if and only if the Jordan canonical basis for T consists of eigenvectors of T. D cannot be similar to A. (g) Every linear operator on a finite-dimensional vector space has a unique Jordan canonical basis. respectively.Sec. and C. then the Jordan canonical form of [T]^ is J . Thus. 0 0 2 and 0 0 Jc = ° 2 1 Vo 0 2 /I JB = Since J A = Jc-.0 0 2. B. JB. Similar statements can be made about matrices. using the ordering 1. or C. EXERCISES 1. 509 are similar. of the matrices A. If 0 is any basis for V. B. Assume that the characteristic polynomial of the matrix or linear operator splits. '0 1 2' [ o i l . Thus a linear operator T on a finite-dimensional vector space V is diagonalizable if and only if its Jordan canonical form is a diagonal matrix. Since JB is different from J A and Jc. (c) Linear operators having the same characteristic polynomial are similar. (f) Every linear operator with the characteristic polynomial (—l)n(t — A)ra has the same Jordan canonical form. • The reader should observe that any diagonal matrix is a Jordan canonical form. Let J A.

and A3 = —3 are the distinct eigenvalues of T and that the dot diagrams for the restriction of T to KA. 3. (e) Compute the following numbers for each i. (a) A = 1 2\ -1 2 -1 2 1 4/ ( (c) A = (d) A 0 -2 -2 V-2 . (b). Let T be a linear operator on a finite-dimensional vector space V with Jordan canonical form / 2 1 0 0 2 1 0 0 2 0 0 0 0 0 0 0 0 0 V0 0 0 (a) (b) (c) (d) 0 0 0 2 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 3 0 0N 0 0 0 0 0 3J Find the characteristic polynomial of T. find the smallest positive integer pi for which KAi = N((T . docs EA. A2 = 4.3 Find the Jordan canonical form J of T. = 4 A3 = . Find the dot diagram corresponding to each eigenvalue of T.7 = Q~1AQ. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. For each of the matrices A that follow.510 Chap. = KA/? For each eigenvalue A*. 7 Canonical Forms 2.2. and (c) are those used in Example 5. Notice that the matrices in (a).AJ)^). find a Jordan canonical form J and an invertible matrix Q such that . if any. .3) are as follows: Ai = 2 A. (i = 1.Ajl to KA. For which eigenvalues A. where Uj denotes the restriction of T . (i) rank(Ui) (ii) rank(U 2 ) (iii) nullity(U?) (iv) nullity(U2) 4.. Suppose that Ai = 2.

Sec. as defined in Example 4. (b) T is the linear operator on ?3(R) defined by T(/(a:)) = xf"(x).y)) = £f(x. 6. Let A be an n x n matrix whose characteristic polynomial splits. and W be the subspace spanned by 7. Hint: For any eigenvalue A of A and A1 and any positive integer r. show that rank((A — Al)r) = rank((Al — Al)r). (e) T is the linear operator on M 2 x 2 (i?) defined by T(A)=(l (f) ty(A-A% V is the vector space of polynomial functions in two real variables x and y of degree at most 2. {ex: x € 0} is a Jordan canonical basis for T.y) + ^f(x. (a) Prove that [Tw]7' = ([Tw]7) • (b) Let J be the Jordan canonical form of A. For each linear operator T. and conclude that A and A* are similar.2 The Jordan Canonical Form II 511 5. (c) T is the linear operator on P-$(R) defined by T(f(x)) = f"(x) + 2f(x). Use (a) to prove that J and J* are similar. Define 7' to be the ordered set obtained from 7 by reversing the order of the vectors in 7. 7.e2t}. and T is the linear operator on V defined by T ( / ) = / ' . . Let 0 be a Jordan canonical basis for T.t2et. (a) Prove that for any nonzero scalar c. Prove that A and A1 have the same Jordan canonical form.tet. find a Jordan canonical form J of T and a Jordan canonical basis 0 for T. 7 be a cycle of generalized eigenvectors corresponding to an eigenvalue A. Let T be a linear operator on a finite-dimensional vector space.y). 7. 8. and suppose that the characteristic polynomial of T splits. (a) V is the real vector space of functions spanned by the set of realvalued functions {et. Let A be an n x n matrix whose characteristic polynomial splits. and T is the linear operator on V defined by T(f(x. (c) Use (b) to prove that A and At are similar. (d) T is the linear operator on M 2x2 (i?) defined by T(A)-(» ty-A-A*.

Let 7' be the ordered set obtained from 7 by replacing x by x + y. nilpotcnt nilpotent 11. (a) N ( P ) C N(T' + 1 ) for every positive integer i. (a) Prove that dim(KA) is the sum of the lengths of all the blocks corresponding to A in the Jordan canonical form of T. Suppose that a dot diagram has k columns and m rows with pj dots in column j and r* dots in row i. Prove the following results. and that if 7' replaces 7 in the union that defines 0. Let x be the end vector of 7. The following definitions arc used in Exercises 11 19. where A is the matrix given in Example 2. 7 Canonical Forms (b) Suppose that 7 is one of the cycles of generalized eigenvectors that forms 0. (a) rn — p\ and k = r\. Let T be a linear operator on a finite-dimensional vector space V. (b) Pj = max {i: Ti > j} for 1 < j < k and r^ = max {j: pj > i} for 1 < i < rn. then the new union is also a Jordan canonical basis for T. Prove the following results. 10. that is different from the basis given in the example. Prove that 7' is a cycle of generalized eigenvectors corresponding to A. . and let y be a nonzero vector in EA. and suppose that 7 corresponds to the eigenvalue A and has length greater than 1. (c) n > r 2 > ••• > rm. and let A be an eigenvalue of T. Prove that T is nilpotcnt if and only if \T]j3 is nilpotent. and suppose that p is the smallest positive integer for which T p = T 0 . Definitions. (b) Deduce that EA = KA if and only if all the Jordan blocks corresponding to A are l x l matrices. (c) Apply (b) to obtain a Jordan canonical basis for LA.512 Chap. (d) Deduce that the number of dots in each column of a dot diagram is completely determined by the number of dots in the rows. An nxn matrix A is called if Ap — O for some positive integer p. Prove that any square upper triangular matrix with each diagonal entry equal to zero is nilpotent. Let T be a linear operator whose characteristic polynomial splits. and let 0 be an ordered basis for V. Let T be a nilpotent operator on an n-dimensional vector space V. 13. Hint: Use mathematical induction on m. 9. 12. A linear operator T on a vector space V is called if Tp = To for some positive integer p.

if we delete from 0 the vectors corresponding to the last i dots in each column of a dot. p\. . 18.Sec. SU = US.(This unique representation is guaranteed by Theorem 7. (If a column of the dot diagram contains fewer than i dots. for each i. and 0 is the only eigenvalue of T. . Prove that U is nilpotent and commutes with S.Let 0 be a Jordan canonical basis for T. A 2 . Let D be the diagonal matrix whose diagonal entries are the diagonal entries of J.. Let T be a nilpotent linear operator on a finite-dimensional vector space V. . (c) Let 0 = 0P be the ordered basis for N(TP) = V in (b). Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. and let M = J — D. 0P such that 0i is a basis for N(T*) and 0i+\ contains 0i for 1 < i < p — 1. and let Ai. Prove the converse of Exercise 13(d): If T is a linear operator on an ndimensional vector space V and (— l) n t n is the characteristic polynomial of T. Then [T]^ is an upper triangular matrix with each diagonal entry equal to zero.) 17. the resulting set is a basis for R(T*). (a) M is nilpotent. then T is nilpotent.. 486) and Exercise 8 of Section 7.j + X2v2 H h Afc-Ufc.) (a) Prove that S is a diagonalizable linear operator on V. (b) Let U = T — S. (d) The characteristic polynomial of T is (—l)ntn. (b) MD = DM.. Characterize all such operators. Hence the characteristic polynomial of T splits. Prove the following results. such that x — v\ + V2 H \-Vfc.. Vi is the unique vector in KA. Recall from Exercise 13 that A = 0 is the only eigenvalue of T. Afc be the distinct eigenvalues of T. and hence V = KA. that is. 15. Give an example of a linear operator T on a finite-dimensional vector space such that T is not nilpotent. Prove that for any positive integer i. 16. 7. .2 The Jordan Canonical Form II 513 (b) There is a sequence of ordered bases 0\. 14. where. and let J be the Jordan canonical form of T. but zero is the only eigenvalue of T. all the vectors associated with that column arc removed from 0.1. Let S: V — * V be the mapping defined by S(x) = Ai?. diagram of 0.3 (p. . Let T be a linear operator on a finite-dimensional vector space V.

(c) If p is the smallest positive integer for which M^p = O, then, for any positive integer r < p,

\[ J^r = D^r + rD^{r-1}M + \frac{r(r-1)}{2!}D^{r-2}M^2 + \cdots + rDM^{r-1} + M^r, \]

and, for any positive integer r \ge p,

\[ J^r = D^r + rD^{r-1}M + \frac{r(r-1)}{2!}D^{r-2}M^2 + \cdots + \frac{r!}{(r-p+1)!\,(p-1)!}D^{r-p+1}M^{p-1}. \]

19. Let

\[ J = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} \]

be the m x m Jordan block corresponding to λ, and let N = J - λI_m. Prove the following results:

(a) N^m = O, and, for 1 ≤ r < m, the entry of N^r in row i and column j is 1 if j = i + r and 0 otherwise.

(b) For any integer r ≥ m,

\[ J^r = \begin{pmatrix} \lambda^r & r\lambda^{r-1} & \dfrac{r(r-1)}{2!}\lambda^{r-2} & \cdots & \dfrac{r(r-1)\cdots(r-m+2)}{(m-1)!}\lambda^{r-m+1} \\[1ex] 0 & \lambda^r & r\lambda^{r-1} & \cdots & \dfrac{r(r-1)\cdots(r-m+3)}{(m-2)!}\lambda^{r-m+2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda^r \end{pmatrix}. \]

(c) lim_{r→∞} J^r exists if and only if one of the following holds:
    (i) |λ| < 1;
    (ii) λ = 1 and m = 1.
Furthermore, lim_{r→∞} J^r is the zero matrix if condition (i) holds and is the 1 x 1 matrix (1) if condition (ii) holds.
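The formulas of Exercise 19 are easy to experiment with; the following Python sketch (not from the text) builds a sample Jordan block, checks that N^m = O, and compares the entries of J^r with the closed form C(r, j-i) λ^{r-(j-i)}, which is a restatement of the displayed matrix in (b).

import numpy as np
from math import comb

lam, m, r = 0.5, 4, 7                               # sample eigenvalue, block size, power
J = lam * np.eye(m) + np.diag(np.ones(m - 1), 1)    # m x m Jordan block for lam
N = J - lam * np.eye(m)

# (a) N^m = O
assert np.allclose(np.linalg.matrix_power(N, m), 0)

# (b) for r >= m, the (i, j)-entry of J^r is C(r, j-i) * lam^(r-(j-i)) when j >= i
Jr = np.linalg.matrix_power(J, r)
expected = np.array([[comb(r, j - i) * lam ** (r - (j - i)) if j >= i else 0.0
                      for j in range(m)] for i in range(m)])
assert np.allclose(Jr, expected)

# (c) with |lam| < 1 the powers tend to the zero matrix
print(np.linalg.matrix_power(J, 200).max())         # essentially 0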

The following definition is used in Exercises 20 and 21.

Definition. For any A ∈ M_{nxn}(C), define the norm of A by ||A|| = max{|A_{ij}| : 1 ≤ i, j ≤ n}.

20. Let A, B ∈ M_{nxn}(C). Prove the following results.
(a) ||A|| ≥ 0, and ||A|| = 0 if and only if A = O.
(b) ||cA|| = |c| ||A|| for any scalar c.
(c) ||A + B|| ≤ ||A|| + ||B||.
(d) ||AB|| ≤ n ||A|| ||B||.

21. Let A ∈ M_{nxn}(C) be a transition matrix. (See Section 5.3.) Since C is an algebraically closed field, A has a Jordan canonical form J to which A is similar. Let P be an invertible matrix such that P^{-1}AP = J. Prove the following results.
(a) ||A^m|| ≤ 1 for every positive integer m.
(b) There exists a positive number c such that ||J^m|| ≤ c for every positive integer m.
(c) Each Jordan block of J corresponding to the eigenvalue λ = 1 is a 1 x 1 matrix.
(d) lim_{m→∞} A^m exists if and only if 1 is the only eigenvalue of A with absolute value 1. (Note that the limit exists under these conditions; see the discussion preceding Theorem 5.13 on page 285.)
(e) Prove Theorem 5.20(a) using (c) and Theorem 5.13.

The next exercise requires knowledge of absolutely convergent series as well as the definition of e^A for a matrix A. (See page 312.)

22. Use Exercise 20(d) to prove that e^A exists for every A ∈ M_{nxn}(C).
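The norm just defined is easy to experiment with numerically. The Python sketch below (not from the text) checks the bound of Exercise 20(d) for random complex matrices and sums partial sums of the series for e^A, the convergence that Exercise 22 asks you to justify.

import numpy as np

def norm_max(A):
    """The norm of the definition above: the largest |A_ij|."""
    return np.abs(A).max()

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Exercise 20(d): ||AB|| <= n ||A|| ||B||
assert norm_max(A @ B) <= n * norm_max(A) * norm_max(B) + 1e-12

# Partial sums S_k = I + A + A^2/2! + ... + A^k/k!; the terms shrink rapidly,
# which is why the series defining e^A converges entrywise.
S, term = np.eye(n, dtype=complex), np.eye(n, dtype=complex)
for k in range(1, 30):
    term = term @ A / k
    S = S + term
print("size of the 30th term:", norm_max(term))   # essentially 0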

23. Let x' = Ax be a system of n linear differential equations, where x is an n-tuple of differentiable functions x_1(t), x_2(t), ..., x_n(t) of the real variable t, and A is an n x n coefficient matrix as in Exercise 15 of Section 5.2. In contrast to that exercise, however, do not assume that A is diagonalizable, but assume that the characteristic polynomial of A splits. Let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of A.

(a) Prove that if u is the end vector of a cycle of generalized eigenvectors of L_A of length p and u corresponds to the eigenvalue λ_i, then for any polynomial f(t) of degree less than p, the function

\[ e^{\lambda_i t}\left[f(t)(A - \lambda_i I)^{p-1} + f'(t)(A - \lambda_i I)^{p-2} + \cdots + f^{(p-1)}(t)I\right]u \]

is a solution to the system x' = Ax.

(b) Prove that the general solution to x' = Ax is a sum of the functions of the form given in (a), where the vectors u are the end vectors of the distinct cycles that constitute a fixed Jordan canonical basis for L_A.

24. Use Exercise 23 to find the general solution to each of the following systems of linear equations, where x, y, and z are real-valued differentiable functions of the real variable t.

(a) x' = 2x + y,  y' = 2y - z,  z' = 3z
(b) x' = 2x + y,  y' = 2y + z,  z' = 2z

7.3 THE MINIMAL POLYNOMIAL

The Cayley-Hamilton theorem (Theorem 5.23, p. 317) tells us that for any linear operator T on an n-dimensional vector space, there is a polynomial f(t) of degree n such that f(T) = T_0, namely, the characteristic polynomial of T. Hence there is a polynomial of least positive degree with this property, and this degree is at most n. If g(t) is such a polynomial, we can divide g(t) by its leading coefficient to obtain another polynomial p(t) of the same degree with leading coefficient 1; that is, p(t) is a monic polynomial.

Definition. Let T be a linear operator on a finite-dimensional vector space. A polynomial p(t) is called a minimal polynomial of T if p(t) is a monic polynomial of least positive degree for which p(T) = T_0.

The preceding discussion shows that every linear operator on a finite-dimensional vector space has a minimal polynomial. The next result shows that it is unique.

Theorem 7.12. Let p(t) be a minimal polynomial of a linear operator T on a finite-dimensional vector space V.
(a) For any polynomial g(t), if g(T) = T_0, then p(t) divides g(t). In particular, p(t) divides the characteristic polynomial of T.
(b) The minimal polynomial of T is unique.

Proof. (a) Let g(t) be a polynomial for which g(T) = T_0. By the division algorithm for polynomials (Theorem E.1 of Appendix E, p. 562), there exist polynomials q(t) and r(t) such that

g(t) = q(t)p(t) + r(t),     (1)

where r(t) has degree less than the degree of p(t). Substituting T into (1) and using that g(T) = p(T) = T_0, we have r(T) = T_0. Since r(t) has degree less than the degree of p(t) and p(t) is the minimal polynomial of T, r(t) must be the zero polynomial. Thus (1) simplifies to g(t) = q(t)p(t), proving (a).

(b) Suppose that p_1(t) and p_2(t) are each minimal polynomials of T. Then p_1(t) divides p_2(t) by (a). Since p_1(t) and p_2(t) have the same degree, we have that p_2(t) = c p_1(t) for some nonzero scalar c. Because p_1(t) and p_2(t) are monic, c = 1; hence p_1(t) = p_2(t).

The minimal polynomial of a linear operator has an obvious analog for a matrix.

Definition. Let A ∈ M_{nxn}(F). The minimal polynomial p(t) of A is the monic polynomial of least positive degree for which p(A) = O.

The following results are now immediate.

Theorem 7.13. Let T be a linear operator on a finite-dimensional vector space V, and let β be an ordered basis for V. Then the minimal polynomial of T is the same as the minimal polynomial of [T]_β.

Proof. Exercise.

Corollary. For any A ∈ M_{nxn}(F), the minimal polynomial of A is the same as the minimal polynomial of L_A.

Proof. Exercise.

In view of the preceding theorem and corollary, Theorem 7.12 and all subsequent theorems in this section that are stated for operators are also valid for matrices. For the remainder of this section, we study primarily minimal polynomials of operators (and hence matrices) whose characteristic polynomials split. A more general treatment of minimal polynomials is given in Section 7.4.

Theorem 7.14. Let T be a linear operator on a finite-dimensional vector space V, and let p(t) be the minimal polynomial of T. A scalar λ is an eigenvalue of T if and only if p(λ) = 0. Hence the characteristic polynomial and the minimal polynomial of T have the same zeros.

Proof. Let f(t) be the characteristic polynomial of T. Since p(t) divides f(t), there exists a polynomial q(t) such that f(t) = q(t)p(t). If λ is a zero of p(t), then

f(λ) = q(λ)p(λ) = q(λ)·0 = 0.

So λ is a zero of f(t); that is, λ is an eigenvalue of T.

Conversely, suppose that λ is an eigenvalue of T, and let x ∈ V be an eigenvector corresponding to λ. By Exercise 22 of Section 5.1, we have

0 = T_0(x) = p(T)(x) = p(λ)x.

Since x ≠ 0, it follows that p(λ) = 0, and so λ is a zero of p(t).
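Theorems 7.12 and 7.14 already suggest a brute-force way to compute the minimal polynomial of a matrix whose characteristic polynomial splits: every eigenvalue must appear as a root, and the exponents cannot exceed those of the characteristic polynomial. The Python sketch below (not from the text) searches over the allowed exponents; the helper name and the sample matrix are illustrative assumptions, and the eigenvalues are assumed known exactly.

import numpy as np
from itertools import product

def minimal_polynomial_exponents(A, eigs_with_mult, tol=1e-9):
    """Smallest exponents (m_1, ..., m_k), 1 <= m_i <= n_i, with
    prod_i (A - lambda_i I)^{m_i} = O.  eigs_with_mult lists
    (eigenvalue, algebraic multiplicity) pairs."""
    n = A.shape[0]
    eigs = [lam for lam, _ in eigs_with_mult]
    mults = [m for _, m in eigs_with_mult]
    best = None
    for exps in product(*[range(1, m + 1) for m in mults]):
        P = np.eye(n)
        for lam, e in zip(eigs, exps):
            P = P @ np.linalg.matrix_power(A - lam * np.eye(n), e)
        if np.abs(P).max() < tol and (best is None or sum(exps) < sum(best)):
            best = exps
    return dict(zip(eigs, best))

# Sample matrix with eigenvalues 2 (multiplicity 2) and 3 (multiplicity 1).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
print(minimal_polynomial_exponents(A, [(2.0, 2), (3.0, 1)]))
# {2.0: 2, 3.0: 1}, i.e. p(t) = (t - 2)^2 (t - 3)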

The following corollary is immediate.

Corollary. Let T be a linear operator on a finite-dimensional vector space V with minimal polynomial p(t) and characteristic polynomial f(t). Suppose that f(t) factors as

\[ f(t) = (\lambda_1 - t)^{n_1}(\lambda_2 - t)^{n_2} \cdots (\lambda_k - t)^{n_k}, \]

where λ_1, λ_2, ..., λ_k are the distinct eigenvalues of T. Then there exist integers m_1, m_2, ..., m_k such that 1 ≤ m_i ≤ n_i for all i and

\[ p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k}. \]

Example 1

We compute the minimal polynomial of the matrix

\[ A = \begin{pmatrix} 3 & -1 & 0 \\ 0 & 2 & 0 \\ 1 & -1 & 2 \end{pmatrix}. \]

Since A has the characteristic polynomial

\[ f(t) = \det\begin{pmatrix} 3-t & -1 & 0 \\ 0 & 2-t & 0 \\ 1 & -1 & 2-t \end{pmatrix} = -(t-2)^2(t-3), \]

the minimal polynomial of A must be either (t - 2)(t - 3) or (t - 2)^2(t - 3) by the corollary to Theorem 7.14. Substituting A into p(t) = (t - 2)(t - 3), we find that p(A) = O; hence p(t) is the minimal polynomial of A.  •

Example 2

Let T be the linear operator on R^2 defined by T(a, b) = (2a + 5b, 6a + b), and let β be the standard ordered basis for R^2. Then

\[ [T]_\beta = \begin{pmatrix} 2 & 5 \\ 6 & 1 \end{pmatrix}, \]

and hence the characteristic polynomial of T is

\[ f(t) = \det\begin{pmatrix} 2-t & 5 \\ 6 & 1-t \end{pmatrix} = (t-7)(t+4). \]

Since f(t) has no repeated factors, the corollary to Theorem 7.14 shows that the minimal polynomial of T is also (t - 7)(t + 4).  •
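Both examples are easy to check numerically. The Python sketch below (not from the text) substitutes the matrices reconstructed above into the claimed minimal polynomials.

import numpy as np

I3, I2 = np.eye(3), np.eye(2)

# Example 1: p(A) = (A - 2I)(A - 3I) should be the zero matrix.
A = np.array([[3.0, -1.0, 0.0],
              [0.0,  2.0, 0.0],
              [1.0, -1.0, 2.0]])
print((A - 2 * I3) @ (A - 3 * I3))          # zero matrix

# Example 2: [T]_beta = [[2, 5], [6, 1]] and p(t) = (t - 7)(t + 4).
B = np.array([[2.0, 5.0],
              [6.0, 1.0]])
print((B - 7 * I2) @ (B + 4 * I2))          # zero matrix
print(np.poly(B))                           # [1, -3, -28]: t^2 - 3t - 28 = (t-7)(t+4)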

Example 3

Let D be the linear operator on P_2(R) defined by D(g(x)) = g'(x), the derivative of g(x). We compute the minimal polynomial of D. Let β be the standard ordered basis for P_2(R). Then

\[ [D]_\beta = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}, \]

and it follows that the characteristic polynomial of D is -t^3. So by the corollary to Theorem 7.14, the minimal polynomial of D is t, t^2, or t^3. Since D^2(x^2) = 2 ≠ 0, it follows that D^2 ≠ T_0; hence the minimal polynomial of D must be t^3.  •

In Example 3, the minimal and characteristic polynomials are of the same degree. This is no coincidence.

Theorem 7.15. Let T be a linear operator on an n-dimensional vector space V such that V is a T-cyclic subspace of itself. Then the characteristic polynomial f(t) and the minimal polynomial p(t) have the same degree, and hence f(t) = (-1)^n p(t).

Proof. Since V is a T-cyclic space, there exists an x ∈ V such that β = {x, T(x), ..., T^{n-1}(x)} is a basis for V (Theorem 5.22, p. 315). Let g(t) = a_0 + a_1 t + ... + a_k t^k be a polynomial of degree k < n. Then a_k ≠ 0 and

g(T)(x) = a_0 x + a_1 T(x) + ... + a_k T^k(x),

and so g(T)(x) is a linear combination of the vectors of β having at least one nonzero coefficient, namely, a_k. Since β is linearly independent, it follows that g(T)(x) ≠ 0, and hence g(T) ≠ T_0. Therefore the minimal polynomial of T has degree n, which is also the degree of the characteristic polynomial of T.

Theorem 7.15 gives a condition under which the degree of the minimal polynomial of an operator is as large as possible. We now investigate the other extreme. By Theorem 7.14, the degree of the minimal polynomial of an operator must be greater than or equal to the number of distinct eigenvalues of the operator. The next result shows that the operators for which the degree of the minimal polynomial is as small as possible are precisely the diagonalizable operators.
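Before moving on, here is a quick numerical check of Example 3 (not from the text), using the matrix [D]_β above: D^2 is nonzero while D^3 is the zero matrix, so the minimal polynomial is t^3. Its degree, 3, equals dim P_2(R), as Theorem 7.15 predicts, since P_2(R) is D-cyclic.

import numpy as np

# [D]_beta for the derivative operator on P_2(R) with respect to {1, x, x^2}.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

print(np.linalg.matrix_power(D, 2))   # nonzero: D^2(x^2) = 2
print(np.linalg.matrix_power(D, 3))   # zero matrix, so the minimal polynomial is t^3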

Theorem 7.16. Let T be a linear operator on a finite-dimensional vector space V. Then T is diagonalizable if and only if the minimal polynomial of T is of the form

\[ p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_k), \]

where λ_1, λ_2, ..., λ_k are the distinct eigenvalues of T.

Proof. Suppose that T is diagonalizable. Let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of T, and define

p(t) = (t - λ_1)(t - λ_2) ⋯ (t - λ_k).

Let β = {v_1, v_2, ..., v_n} be a basis for V consisting of eigenvectors of T, and consider any v_i ∈ β. Then (T - λ_j I)(v_i) = 0 for some eigenvalue λ_j. Since t - λ_j divides p(t), there is a polynomial q_j(t) such that p(t) = q_j(t)(t - λ_j). Hence

p(T)(v_i) = q_j(T)(T - λ_j I)(v_i) = 0.

It follows that p(T) = T_0, since p(T) takes each vector in a basis for V into 0; hence the minimal polynomial of T divides p(t). Moreover, each λ_i is an eigenvalue of T, so by Theorem 7.14 each factor t - λ_i divides the minimal polynomial of T, and therefore p(t) divides the minimal polynomial of T. Therefore p(t) is the minimal polynomial of T.

Conversely, suppose that there are distinct scalars λ_1, λ_2, ..., λ_k such that the minimal polynomial p(t) of T factors as

p(t) = (t - λ_1)(t - λ_2) ⋯ (t - λ_k).

By Theorem 7.14, the λ_i's are eigenvalues of T. We apply mathematical induction on n = dim(V). Clearly T is diagonalizable for n = 1. Now assume that T is diagonalizable whenever dim(V) < n for some n > 1, and let dim(V) = n and W = R(T - λ_k I). Obviously W ≠ V, because λ_k is an eigenvalue of T. If W = {0}, then T = λ_k I, which is clearly diagonalizable. So suppose that 0 < dim(W) < n. Then W is T-invariant, and for any x ∈ W,

(T - λ_1 I)(T - λ_2 I) ⋯ (T - λ_{k-1} I)(x) = 0.

It follows that the minimal polynomial of T_W divides the polynomial (t - λ_1)(t - λ_2) ⋯ (t - λ_{k-1}). Hence by the induction hypothesis, T_W is diagonalizable. Furthermore, λ_k is not an eigenvalue of T_W by Theorem 7.14. Therefore W ∩ N(T - λ_k I) = {0}. Now let β_1 = {v_1, v_2, ..., v_m} be a basis for W consisting of eigenvectors of T_W (and hence of T), and let β_2 = {w_1, w_2, ..., w_p} be a basis for N(T - λ_k I), the eigenspace of T corresponding to λ_k. Then β_1 and β_2 are disjoint by the previous comment. Moreover, m + p = n by the dimension theorem applied to T - λ_k I. We show that β = β_1 ∪ β_2 is linearly independent. Consider scalars a_1, a_2, ..., a_m and b_1, b_2, ..., b_p such that

a_1 v_1 + a_2 v_2 + ⋯ + a_m v_m + b_1 w_1 + b_2 w_2 + ⋯ + b_p w_p = 0.

Let x = a_1 v_1 + a_2 v_2 + ⋯ + a_m v_m and y = b_1 w_1 + b_2 w_2 + ⋯ + b_p w_p. Then x ∈ W, y ∈ N(T - λ_k I), and x + y = 0. It follows that x = -y ∈ W ∩ N(T - λ_k I), and therefore x = 0. Since β_1 is linearly independent, we have that a_1 = a_2 = ⋯ = a_m = 0. Similarly, b_1 = b_2 = ⋯ = b_p = 0, and we conclude that β is a linearly independent subset of V consisting of n eigenvectors. It follows that β is a basis for V consisting of eigenvectors of T, and consequently T is diagonalizable.

Example 4

We determine all matrices A ∈ M_{2x2}(R) for which A^2 - 3A + 2I = O. Let g(t) = t^2 - 3t + 2 = (t - 1)(t - 2). Then g(A) = O, and hence the minimal polynomial p(t) of A divides g(t). Hence the only possible candidates for p(t) are t - 1, t - 2, and (t - 1)(t - 2). If p(t) = t - 1 or p(t) = t - 2, then A = I or A = 2I, respectively. If p(t) = (t - 1)(t - 2), then A is diagonalizable with eigenvalues 1 and 2, and hence A is similar to

\[ \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]  •

Example 5

Let A ∈ M_{nxn}(R) satisfy A^3 = A. We show that A is diagonalizable. Let g(t) = t^3 - t = t(t + 1)(t - 1). Then g(A) = O, and hence the minimal polynomial p(t) of A divides g(t). Since g(t) has no repeated factors, neither does p(t). Thus A is diagonalizable by Theorem 7.16.  •

Example 6

In Example 3, we saw that the minimal polynomial of the differential operator D on P_2(R) is t^3. Since t^3 has a repeated factor, Theorem 7.16 shows that D is not diagonalizable.  •

There are methods for determining the minimal polynomial of any linear operator on a finite-dimensional vector space. In the case that the characteristic polynomial of the operator splits, the minimal polynomial can be described using the Jordan canonical form of the operator. (See Exercise 13.) In the case that the characteristic polynomial does not split, the minimal polynomial can be described using the rational canonical form, which we study in the next section. (See Exercise 7 of Section 7.4.)
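Example 5 is easy to illustrate numerically. The Python sketch below (not from the text) uses a sample matrix satisfying A^3 = A, verifies g(A) = O, and confirms diagonalizability by checking that the eigenvectors span R^3.

import numpy as np

# A sample matrix with A^3 = A (its eigenvalues lie in {-1, 0, 1}).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
assert np.allclose(np.linalg.matrix_power(A, 3), A)

# g(A) = A(A + I)(A - I) = O, so the minimal polynomial has no repeated roots.
print(A @ (A + np.eye(3)) @ (A - np.eye(3)))    # zero matrix

# Diagonalizability check: the matrix of eigenvectors has full rank.
w, V = np.linalg.eig(A)
print("eigenvalues:", np.round(w, 6))
print("rank of eigenvector matrix:", np.linalg.matrix_rank(V))   # 3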

EXERCISES

1. Label the following statements as true or false. Assume that all vector spaces are finite-dimensional.
(a) Every linear operator T has a polynomial p(t) of largest degree for which p(T) = T_0.
(b) Every linear operator has a unique minimal polynomial.
(c) The characteristic polynomial of a linear operator divides the minimal polynomial of that operator.
(d) The minimal and the characteristic polynomials of any diagonalizable operator are equal.
(e) Let T be a linear operator on an n-dimensional vector space V, p(t) be the minimal polynomial of T, and f(t) be the characteristic polynomial of T. Suppose that f(t) splits. Then f(t) divides [p(t)]^n.
(f) The minimal polynomial of a linear operator always has the same degree as the characteristic polynomial of the operator.
(g) A linear operator is diagonalizable if its minimal polynomial splits.
(h) Let T be a linear operator on a vector space V such that V is a T-cyclic subspace of itself. Then the degree of the minimal polynomial of T equals dim(V).
(i) Let T be a linear operator on a vector space V such that T has n distinct eigenvalues, where n = dim(V). Then the degree of the minimal polynomial of T equals n.

2. Find the minimal polynomial of each of the following matrices.

\[ \text{(a) } \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \qquad \text{(b) } \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \qquad \text{(c) } \begin{pmatrix} 4 & -14 & 5 \\ 1 & -4 & 2 \\ 1 & -6 & 4 \end{pmatrix} \]

3. For each linear operator T on V, find the minimal polynomial of T.
(a) V = R^2 and T(a, b) = (a + b, a - b)
(b) V = P_2(R) and T(g(x)) = g'(x) + 2g(x)
(c) V = P_2(R) and T(f(x)) = -xf''(x) + f'(x) + 2f(x)
(d) V = M_{nxn}(R) and T(A) = A^t. Hint: Note that T^2 = I.

4. Determine which of the matrices and operators in Exercises 2 and 3 are diagonalizable.

5. Describe all linear operators T on R^2 such that T is diagonalizable and T^3 - 2T^2 + T = T_0.

Let T be a linear operator on a finite-dimensional vector space. Prove Theorem 7. Prove that the minimal polynomial of T is (t-x1r(t-x2y^. Let D be the differentiation operator on P(/?). (b) The minimal polynomial of Dv (the restriction of D to V) is g(t). Prove that there exists no polynomial g(t) for which </(D) = T 0 .1 = -— ( T " . 7. 12. Prove the corollary to Theorem 7.32 (p. (a) T is invertible if and only if p(0) -£ 0. and suppose that W is a T-invariant subspace of V. then the characteristic polynomial of Dv is(-l)ng(t). Prove that V is a T-cyclic subspace if and only if each of the eigenspaces of T is one-dimensional.14.13 and its corollary. Hint: Use Theorem 2. the space of polynomials over R. Prove the following results. Let T be a linear operator on a finite-dimensional vector space V. A 2 . I ) .3 The Minimal Polynomial 6. 13.--(t-xk)p^. 523 8. 7. . 135) for (b) and (c). T n . and for each i let pi be the order of the largest Jordan block corresponding to Xi in a Jordan canonical form of T. and let p(t) be the minimal polynomial of T. Prove that the minimal polynomial of Tw divides the minimal polynomial of T.7). The following exercise requires knowledge of direct sums (see Section 5. Hence D has no minimal polynomial. Let T be a diagonalizable linear operator on a finite-dimensional vector space V. Afc be the distinct eigenvalues of T. . and suppose that the characteristic polynomial of T splits. . 10. (b) If T is invertible and p(t) = tn + a n _ i £ n _ 1 + • • • + ait + a 0 . then T . Let g(t) be the auxiliary polynomial associated with a homogeneous linear differential equation with constant coefficients (as defined in Section 2. . and let V denote the solution space of this differential equation. (c) If the degree of g(t) is n. a0 9. Let T be a linear operator on a finite-dimensional vector space. where D is the differentiation operator on C°°.Sec.2). 11. Let Ai. Prove the following results. . (a) V is a D-invariant subspace.2 + • • • + a 2 T + a .1 + a n _ .
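The exercise above expressing T^{-1} in terms of the coefficients of the minimal polynomial can be tried out numerically. In the Python sketch below (not from the text) the sample matrix has distinct eigenvalues, so its minimal polynomial equals its characteristic polynomial, and the degree-2 case of the formula applies.

import numpy as np

# Sample invertible matrix; here p(t) = t^2 - 3t - 28, so p(0) = -28 != 0.
A = np.array([[2.0, 5.0],
              [6.0, 1.0]])
coeffs = np.poly(A)              # characteristic polynomial, highest degree first: [1, -3, -28]
a1, a0 = coeffs[1], coeffs[2]

# For a degree-2 minimal polynomial t^2 + a_1 t + a_0, the formula gives
# A^{-1} = -(1/a_0)(A + a_1 I).
A_inv = -(A + a1 * np.eye(2)) / a0
assert np.allclose(A_inv @ A, np.eye(2))
print(A_inv)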

. Let T be linear operator on a finite-dimensional vector space V. operators need not have eigenvalues! However. (b) If fi. * Let T be a linear operator on a finite-dimensional vector space V. (d) Let W 2 be a T-invariant subspace of V such that W 2 C W^ and let g2(t) be the unique monic polynomial of least degree such that g2(T)(x) G W 2 . and dim(W) equals the degree of p(t). respectively. (b) The T-annihilator of X divides any polynomial git) for which 9(T) = T(). 15. T be a linear operator on a finite-dimensional vector space V. Let T be a linear operator on a linite-dimensional vector space V.(t) is a polynomial for which h(T)(x) G Wi. (d) The degree of the T-annihilator of x is I if and only if x is an eigenvector of T. In general. and let x be a nonzero vector in V. characteristic polynomials need not split. and Tw 2 . 16.4* THE RATIONAL CANONICAL FORM Until now we have used eigenvalues. (a) The vector x has a unique T-annihilator.524 Chap. Prove the following results. that p\{t)p2(t) is the minimal polynomial of T. (a) There exists a unique monic polynomial g\ (t) of least positive degree such that 0] (T)(. (c) If p(t) is the T-annihilator of x and W is the T-cyclic subspace generated by x. Definition. (c) gi(t) divides the minimal and the characteristic polynomials of T. the unique factorization theorem for polynomials (see Appendix E) guarantees that the characteristic polynomial f(t) of any linear operator T on an n-dimensional vector space factors . 7. The polynomial p(t) is called a T-annihilator of x if p(t) is a monic polynomial of least degree for which p(T)(x) = 0. and generalized eigenvectors in our analysis of linear operators with characteristic polynomials that split..T) G WI. Let x G V such that . then gi(t) divides h(t). and indeed.• ^ Wi. 7 Canonical Forms 14. and let Wi be a T-invariant subspace of V. Then gi(t) divides g2(t). and let Wi and W 2 be T-invariant subspaces of V such that V = W| ©W 2 . Exercise 15 uses the following definition. Prove the following results. Prove or disprove. eigenvectors. and let x be a nonzero vector in V. then />(<) is the minimal polynomial of Tw. Suppose that pi(t) and p2(t) are the minimal polynomials of Tw.

where the <t>i(t)'s (1 < i < k) arc distinct irreducible monic polynomials and the ni 's are positive integers. In this context. The one that we study is called the rational canonical form. = {x G V: (<fii(T))p(x) — 0 for some positive integer p). Recall (Theorem 5.Sec. In the case that f(t) splits. In general.. We show that each K^ is a nonzero T-invariant subspace of V... 7.i r O i ( * ) ) n i (Mt))n* ••• (MW. and there is a one-to-one correspondence between eigenvalues of T and the irreducible monic factors of the characteristic polynomial. we call it the T-cyclic basis g e n e r a t e d by x and denote it by . of V by K^. each irreducible monic polynomial factor is of the form </»j(i) = t — Xi.To distinguish this basis from all other ordered bases for Cx.T2^).22) that if dim(Ca. In this section.^-1^)} is an ordered basis for Cr. 315). we define the subset K^.4 The Rational Canonical Form uniquely as f(t) = (-i)n(Mt)r(<h(t))n2---(Mmnk 525 where the </>i(£)'s (1 < i < k) are distinct irreducible monic polynomials and the ni's are positive integers. then K^ is the generalized eigenspace of T corresponding to the eigenvalue A. Here the bases of interest naturally arise from the generators of certain cyclic subspaces. but the irreducible monic factors always exist. Since a canonical form is a description of a matrix representation of a linear operator.) = k. eigenvalues need not exist. then the set {^T^. Note that if (/>i(t) = t — X is of degree one. the reader should recall the definition of a T-cyclic subspace generated by a vector and Theorem 5. Definition. We briefly review this concept and introduce some new notation and terminology Let T be a linear operator on a finite-dimensional vector space V. our next task is to describe a canonical form of a linear operator suitable to this context. we establish structure theorems based on the irreducible monic factors of the characteristic polynomial instead of eigenvalues. it can be defined by specifying the form of the ordered bases allowed for these representations. We use the notation Cx for the T-cyclic subspace generated by x. and let x be a nonzero vector in V. where Aj is an eigenvalue of T. For 1 < i < k. Let T be a linear operator on a finite-dimensional vector space V with characteristic polynomial f(t) = ( .. the following definition is the appropriate replacement for eigenspace and generalized eigenspace. Having obtained suitable generalizations of the related concepts of eigenvalue and eigenspace. For this reason.22 (p.

and the characteristic polynomial of the companion matrix of a monic polynomial g(t) of degree k is equal to ( — l)kg(t). 7 Canonical Forms 0X. is the companion matrix of a polynomial (<t>(t))m such that <j)(t) is a monic irreducible divisor of the characteristic polynomial of T and m is a positive integer.526 Chap. 0 1 ak-lTk-l{x) where aox + a\T(x) + Furthermore. . Since A is the matrix representation of the restriction of T to Cx. the monic polynomial h(t) is also the minimal polynomial of A. and suppose that the T-annihilator of x is of the form ((j)(t) )v for some irreducible monic polynomial <j)(t). Recall from the proof of Theorem 5.3. introduced in the Exercises of Section 7. It is the object of this section to prove that for every linear operator T on a finite-dimensional vector space V. and x G K&. Let A be the matrix representation of the restriction of T to C x relative to the ordered basis 0X.ti) = (-l) f c (a 0 + axt + • • • + k-\ ak-it The matrix A is called the companion matrix of the monic polynomial h(t) = ao + ai/. which relies on the concept of T-annihilator.4. Every monic polynomial has a companion matrix. A matrix representation of this kind is called a rational canonical form of T.3. h(t) is also the T-annihilator of x. Lemma. + ••• + ak-ifk~1 + tk. Then (f>(t) divides the minimal polynomial of T.15 (p. We call the accompanying basis a rational canonical basis for T. the characteristic polynomial of A is given by det(i4 . Let T be a linear operator on a finite-dimensional vector space V. there exists an ordered basis 0 for V such that the matrix representation [T]^ is of the form (Cl o O o c2 Cr) \o o where each C7. 519).22 that /o 0 • • 0 1 0 • • 0 0 1 • • 0 xo A -ao \ -ai -a2 -afc-i/ + Tk(x) = 0.) By Theorem 7. let x be a nonzero vector in V. h(t) is also the minimal polynomial of this restriction. By Exercise 15 of Section 7. (See Exercise 19 of Section 5. The next theorem is a simple consequence of the following lemma.
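The companion matrix described above can be built mechanically. The following Python sketch (not from the text) constructs it for a sample monic polynomial and checks the claim about its characteristic polynomial; numpy.poly returns the coefficients of det(tI - C).

import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial
    h(t) = coeffs[0] + coeffs[1] t + ... + coeffs[k-1] t^{k-1} + t^k."""
    k = len(coeffs)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)          # 1's on the subdiagonal
    C[:, -1] = -np.array(coeffs)        # last column holds -a_0, ..., -a_{k-1}
    return C

# Sample polynomial h(t) = 3 - t + t^3  (a_0 = 3, a_1 = -1, a_2 = 0).
C = companion([3.0, -1.0, 0.0])
print(C)
# np.poly(C) gives det(tI - C), here [1, 0, -1, 3], i.e. t^3 - t + 3 = h(t);
# the characteristic polynomial det(C - tI) is then (-1)^3 h(t), as stated above.
print(np.poly(C))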

By Exercise 15(b) of Section 7. 0 is the disjoint union of the T-cyclic bases.v4. and let 0 be an ordered basis for V.4 The Rational Canonical Form 527 Proof. where <j)l(t)=t2 -t + 3 and 4>2(t) = t2 + 1.v3.p by the definition of K^.v7. (<i>(t))p divides the minimal polynomial of T. that is. C 2 .v2.~= Sec. and 0 2 (£). Exercise. Therefore 4>(t) divides the minimal polynomial of T. In this case. where each Vi lies in K^ for some irreducible monic divisor <p(t) of the characteristic polynomial ofT.v8} Ill is a rational canonical basis for T such that ( o -3 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 -1 0 0 0 -2 1 0 0 0 0 0 0 0 \ 0 0 0 0 0 0 0 0 0 0 0 -1 1 oy C=[Tb = v is a rational canonical form of T. Let T be a linear operator on a finite-dimensional vector space V. 7.17. the characteristic polynomial f(t) of T is the product of the characteristic polynomials of the companion matrices: f(t) = Mt)(h(t))2Mt) = Mt)(Mt))3• . the submatrices C\. In the context of Theorem 7.v5. (cj)2(t))2. 0 = 0VI U 0V3 U 0V7 = {vi.v6. respectively. v6} U {v7. and Cz are the companion matrices of the polynomials <f>\ (t). | Theorem 7. Then 0 is a rational canonical basis for T if and only if 0 is the disjoint union of T-cyclic bases 0Vi.v5. Furthermore. By Exercise 40 of Section 5. x G K.v8}.v2} U {v3.17.. Proof.4. Example 1 Suppose that T is a linear operator on R8 and 0= {vi.3.V4.

Then (<f>i(T)) (x) = 0 for some positive integer q. and the restriction of 0. where 4>(t) is an irreducible monic divisor of the minimal polynomial of T. Proof. nKlh = {()} fori ^j. we show that the companion matrices C. Theorem 7. In the course of showing that. each such divisor is used in this way at least once. (e) K. the minimal polynomial and the characteristic polynomial have the same irreducible divisors.' .( because (<i>i(T))mi(x) = (d>i(T))mifi(T)(z)=p(T)(z) q = 0.18. and (e) are obvious. and the result follows. The reader is advised to review the definition of T-annihilator and the accompanying Exercise 15 of Section 7. .528 Chap. every linear operator T on a finite dimensional vector space has a rational canonical form C. Hence the T-annihilator of x divides (4>%(t))q by Exercise 15(b) of Section 7. (a) K0( i. then (a). Let fi(t) be the polynomial obtained from p(t) by omitting the factor (0. We begin with a result that lists several properties of irreducible divisors of the minimal polynomial. Furthermore... .. Then the following statements are true. where the <f>i(t)*s (1 < i < k) arc the distinct irreducible monic factors of pit) and the nij 's are positive integers. while (c) and (d) are vacuously true.s a nonzero T-invariant subspace of V for each i. If k = 1. Let T lie a linear operator on a finite-dimensional vector space V. that constitute C are always constructed from powers of the monic irreducible divisors of the characteristic polynomial of T. each of which is the companion matrix of some power of a monic irreducible divisor of the characteristic polynomial of T.3.(/) is a proper divisor of pit): therefore there exists a vector z G V such that x — fi(T)(z) ^ 0. (b) Assume the hypothesis. is invariant under (f>j(T) for i ^ j. (b).(T))'"<) for each i. is nonzero. 7 Canonical Forms The rational canonical form C of the operator T in Example 1 is constructed from matrices of the form C\. (c) K0. (d) K0. To prove that K^. We eventually show that the converse is also true: that is. then the T-annihilator of x is of the form ((j>i(t))p for some integer p. (b) If x is a nonzero vector in some K^. Now suppose that k > 1. Then x G K.. and suppose that p(t) = (Mt))mi(Mt)r3---(Mt))mk is the minimal polynomial ofT.3. first observe that /. (a) The proof that Kfl>i is a T-invariant subspace of V is left as an exercise. every irreducible divisor of the former is also an irreducible divisor of the latter. A key role in our analysis is played by the subspaces K^.(/))'"'.(T) to K(j>j is one-to-one and onto.N((0. Since the minimal polynomial of an operator divides tin1 characteristic polynomial of the operator.

7.(T) to K 0i is onto. Then x G K^. The result is trivial if k = 1. Let fi(t) be the polynomial obtained from p(t) by omitting the factor (<t>i(t))"1' • As a consequence of Theorem 7. Therefore the restriction of 0y(T) to K 0i is one-to-one. so suppose that k > 1. = N((^(T))'"')- Since a rational canonical basis for an operator T is obtained from a union of T-cyclic bases.(T))'"') C K^.. and suppose that p(t) = (Mt))mi(Mt))m3---(</>k(t)yrn/. The next major result. But this is impossible because (j>i(t) and <f>j(t) are relatively prime (see Appendix E).Sec. Since K<Pi is T-invariant. I Theorem 7. and suppose that x ^ 0. Thus K 0 . and suppose that p(t) = (Mt))mi(<h(t))m2---(Mt))mk is the minimal polynomial of T. (2) . (e) Suppose that 1 < i < k.. Let x G K</)( HK^. we obtain /„(T)(«0 = 0. Then there exists y G K 0i such that fj(T)(y) = x. Suppose that 0_. By (b).18. it is also 0y(T)-invariant. We begin with the following lemma.-(T)(:/:) = 0 for SOUK.. Thus. X G K 0 / .(T) to (2).iT)y"')fiiT)iy) =p(T)(y) = 0. and fr(V(vj) = 0 for i / j. Let T be a linear operator on a finite-dimensional vector space V. Let x G Kfa. we need to know when such a union is linearly independent. Clearly.4 The Rational Canonical Form 529 (c) Assume i ^ j. where the 0. from which it follows that Vi = 0. Proof. D K<t>j = {<)} by (c). Since V is finite-dimensional. 's (1 < i < k) are the distinct irreducible monic factors of pit) and the rrii 's are positive integers. Lemma. (d) Assume / ^ j.19. where <p(t) is an irreducible monic divisor of the minimal polynomial of T. Therefore ((MVr-Kx) = «<P./: is a power of both 4>i(t) and <I)j(t). the T-annihilator of.. For 1 < i < k. /i(T) is one-to-one on K^. Theorem 7. applying /. N((0. I and hence x G N ( 0 . Then v{ = 0 for all i. Since fj(t) is a product of polynomials of the form (pj(i) for j ^ i. reduces this problem to the study of T-cyclic bases within K^. we have by (d) that the restriction of /. ( T ) ) m ' ) . Let T be a linear operator on a finite-dimensional vector space V.19. this restriction is also onto. let Vi G K 0i be such that V1+V2 + '" +vk = 0. Consider any i. We conclude that x = 0. Let fc(t) be the polynomial defined in (a).

we fix a linear operator T on a finite-dimensional vector space V and an irreducible monic divisor 6(t) of the minimal polynomial of T.-(/) be the polynomial defined by / i ( t ) = y w . If A : — 1. Then vk be distinct vectors in K. let Si be a linearly independent subset of K 0 i . Consider any linear combination of vectors in 5 2 that sums to zero. Now suppose that k > 1. Let v\. For Theorems 7. Furthermore. we can focus on bases of individual spaces of the form K(/)(/). Proof. where 0(f) is an irreducible monic divisor of the minimal polynomial of T. These results serve the dual purposes of leading to the existence theorem for the rational canonical form and of providing methods for constructing rational canonical bases.p such that 0V2 U • • • U 0Vk For each i. then (a) is vacuously true and (b) is obvious.20. T h e o r e m 7.. the proof of (b) is identical to the proof of Theorem 5.. For 1 < i < k. v2. 267) with the eigenspaces replaced by the subspaces K^. Si =0VlU is linearly independent.18(c).8 (p. where the 0j 's (1 < i < fc) are the distinct irreducible monic factors ofp(t) and the 7m's are positive integers. The next several results give us ways to construct bases for these spaces that are unions of T-cyclic bases. choose wi G V such that 0(T)(w^) = t'j.530 Chap. 1=0 Then (3) can be rewritten as £ / < ( T ) ( t m ) = 0. i=\ (4) . Then (a) follows immediately from Theorem 7. Then (a) 5j n 5j = 0 for i ^ j (b) S\ U S2 U • • • U Sfc is linearly independent. Proof.21 and the latter's corollary. say.. let /. 7 Canonical Forms is the minimal polynomial of T. (3) < = 1 j=0 For each i. S2 = 0un U 0W2 U • • • U 0Wk is also linearly independent.20 and 7. i In view of Theorem 7...19.

and let 0 be a basis for W.(T)(«i) = 0 far all i. Proof. 1 We now show that K 0 has a basis consisting of a union of T-cycles. Therefore <S2 is linearly independent. . 0(f) divides the T-annihilator of Vi. . linear independence of 5i requires that /i(T)(tiK)=fl|. Then 0 U 0X is linearly independent.) By Theorem 7. for each i.Sec. Then the following statements are true. 7. (b) For some ttfi. v 2 . i=l j=l Again.18(b). there exists a polynomial gi(t) such that fi(t) = <?j(f)0(f). So (4) becomes k k X > ( T ) 0 ( T ) K ) = £ > ( T ) M . . . (See Exercise 15 of Section 7.w2. and hence 0(f) divides fi(t) for all i. Thus. 0 = J > 0 > ( T ) ( u . aij = 0 for all j. Therefore the T-annihilator of vi divides fi(t) for all i. vk}. 0 can be extended to the linearly independent set 0' = 0U0W1 whose span contains N(0(T)). and suppose that yZ aiVi + z = 0 i=i and z = 2_\fy~^j (x)' 3=0 U0W2U---U0Wa.4 The Rational Canonical Form Apply 0(T) to both sides of (4) to obtain I > 0 O / < ( T ) ( t i . it follows that fi(T)(vi) = 0 for all*.0. Lemma. • • • . (a) Suppose that x G N(0(T)). but x (£ W. Since Si is linearly independent. ) = X > ( T ) ( « 0 = »• i=i i=l i=\ 531 This last sum can be rewritten as a linear combination of the vectors in S\ so that each /»(T)(«j) is a linear combination of the vectors in 0Vi. But fi(T)(wi) is the result of grouping the terms of the linear combination in (3) that arise from the linearly independent set 0Wi. .3. (a) Let 0 = {vi.ws in N(0(T)). . We conclude that for each i. Let W be a T-invariant subspace ofK^.

and the restriction of T to this subspace has (0(f)) m _ 1 as its minimal polynomial.fc+1.. Since V = N(0(T)). . this set is a rational canonical basis for V. choose Wi in V such that Vi = cf>(T)(wi). but not in Wi. Then z G Cx D W. if necessary. Choose a vector tui G N(0(f)) that is not in W. .. By (a). Then R(0(T)) is a T-invariant subspace of V.21.. Therefore z = 0. If Wi does not contain N(0(f)). where w^ is in N(0(T)) for i > k. 0\ = 0 U 0Wl is linearly independent. so that /32 = 0\ D0W2 = 0U0Wl Li0W2 is linearly independent. The proof is by mathematical induction on m. choose a vector u. Let W = span(/3). Then 2 has 0(f) as its T-annihilator.fc+2. Continuing this process. . the union 0 of the sets 0Wi is linearly independent. v 2 . Therefore we may apply the induction hypothesis to obtain a rational canonical basis for the restriction of T to R(T). . Thus 0 U 0X is linearly independent. . and hence C2 C Cx n W. Apply (b) of the lemma to W = {0} to obtain a linearly independent subset of V of the form 0Vl U 0V2 U • • • U 0Vk. we eventually obtain vectors w\. By Theorem 7. and consequently x G W. 7 Canonical Forms where d is the degree of 0(f). and therefore d = dim(Q) < dim(C x n W ) < dim(C x ) = d. . for some integer m > 1.vk are the generating vectors of the T-cyclic bases that constitute this rational canonical basis. Let r = rank(0(T)).2 in N(0(f))./?u. it follows that a^ = 0 for all i. then there exists a rational canonical basis for T. w 2 . Since 0 is linearly independent. . whose span contains N(0(T)). Suppose that m = 1. Suppose that v\. Now suppose that. It follows that Cx PI W = Cx. . Proof.0W„ to 0. and assume that the minimal polynomial of T is p(t) = (0(f) ) m . from which it follows that bj — 0 for all j. where k < m. to obtain a linearly independent set 0' = 0Wl U0W2 U •• • U0Wk U • • • U0Wa whose span W contains both W and N(0(T)). For each i. . Then W contains R(0(T)). Apply (b) of the lemma and adjoin additional T-cyclic bases Ai. (b) Suppose that W does not contain N(0(T)). Suppose that z ^ 0. . ws in N(0(T)) such that the union 0'=0U0WlU0W2U---U0Wa is a linearly independent set whose span contains N(0(T)). the result is valid whenever the minimal polynomial of T is of the form (0(T))fc. | T h e o r e m 7. If the minimal polynomial ofT is of the form p(t) = (0(f) ) m .532 Chap.20. contrary to hypothesis. Let Wi = span(/?i).

K0fc has a basis 02 consisting of a union of Tcyclic bases. We are now ready to study the general case. Thus W = V. The case k — 1 is proved in Theorem 7.18(e). Let s denote the number of vectors in 0. By the corollary to Theorem 7. Corollary.21. a rational canonical form. hence. 1 < i < k — 1. Suppose that the result is valid whenever the minimal polynomial contains fewer than k distinct irreducible factors for some k > 1.22. and 0 = 0i U/?2 is linearly independent. It follows that (MT))mk + 1(y) = 0. Thus q(t) contains fewer than k distinct irreducible divisors. Furthermore. Therefore dim(W') = rank(U) + nullity(U) = rank(0(T)) + nullity(0(T)) = dim(V). K0 has a basis consisting of the union of T-cyclic bases.4 The Rational Canonical Form 533 We show that W = V. Let U denote the restriction of 0(T) to W .Sec. where the 0i(f)'s are the distinct irreducible monic factors of p(t) and mi > 0 for all i. Then s = dim(R((0fc(T))Wfc)) + dim(K0fc) = rank((0 f c (T)r*) + nullity((0fc(T))TOfc) = n. it follows that R(U) = R(0(T)) and N(U) = N(0(T)). and suppose that p(t) contains k distinct factors. and hence y G K ^ and x = f> fc (T)) m *(y) = 0 by Theorem 7. Every linear operator on a finite-dimensional vector space has a rational canonical basis and. By Theorem 7. Proof. So by the induction hypothesis. Let U be the restriction of T to the T-invariant subspace W = R((0fc(T)mfc). We conclude that 0 is a basis for V. By the way in which W was obtained from R(0(T)). a contradiction. U has a rational canonical basis 0\ consisting of a union of U-cyclic bases (and hence T-cyclic bases) of vectors from some of the subspaces K 0i . Proof. 0fc(f) does not divide q(t).3. Then q(t) divides p(t) by Exercise 10 of Section 7. and 0' is a rational canonical basis for T. I I | .21 to the restriction of T to K^. Let T be a linear operator on a finite-dimensional vector space V. there would exist a nonzero vector x G W such that 0fc(U)(:r) = 0 and a vector y G V such that x = (4>k{T))mk(y). Theorem 7. The proof is by mathematical induction on k. Therefore 0 is a rational canonical basis. Apply Theorem 7. 0\ and 02 are disjoint.21.19. and T has a rational canonical form. which is 0(T)-invariant. and let q(t) be the minimal polynomial of U. and let p(t) = (0i(f)) m i (0 2 (*)) m 2 • • • (<f>k(t))mk be the minimal polynomial of T. For otherwise. 7.

Therefore each irreducible monic divisor 0i(f) of /(f) divides the characteristic polynomial of at least one of the companion matrices. (a) 0i(f).19. is the product of the characteristic polynomials of the companion matrices that compose C. We are now able to relate the rational canonical form to the characteristic polynomial. because this sum is equal to the degree of /(f).534 Chap. 7 is linearly independent. if 0(f) is an irreducible monic polynomial that divides the minimal polynomial of T.diUi. where di is the degree of 0i(f).i ) » ( f c ( t ) r (Mt)r ••• (Mt))nk. Then the following statements are true.. (0j(f)) p is the T-annihilator of a nonzero vector of V. (b). and (d) Let C — \T]p. and hence of T. (b) For each i.22. we have nidi < dim(K 0i ). Since /(f) is the product of the characteristic polynomials of the companion matrices that compose C. then 0(f) divides the characteristic polynomial of T because the minimal polynomial divides the characteristic polynomial. We conclude that (0j(f)) p . which is a rational canonical form of T. (d) If 7J is a basis for K 0i for each i. By Theorem 7. (a) By Theorem 7. Furthermore. if each 7$ is a disjoint union of T-cyclic bases... then 7 = 71 U 7 2 U • • • U 7fc is a basis for V. i=i Now let s denote the number of vectors in 7. n = 2_. divides the minimal polynomial of T. 0 2 (f).23. (c) If 0 is a rational canonical basis for T. Consider any i. i=i . then 7 is a rational canonical basis for T. the characteristic polynomial of C. we may multiply those characteristic polynomials that arise from the T-cyclic bases in 0i to obtain the factor (0j(f)) ni of /(f). Since this polynomial has degree nidi. (1 < i < k). we relied on the minimal polynomial. . then 0i = 0C\ K 0i is a basis for K 0i for each i.4. and the union of these bases is a linearly independent subset 0i of K 0 i . 7 Canonical Forms In our study of the rational canonical form. By Exercise 40 of Section 5. T h e o r e m 7. where the (pi(t)'s (I < i < k) are distinct irreducible monic polynomials and the Hi's are positive integers. In particular. and therefore = y^dini i=i < / ]dim(K^ i ) = s < n. and so 0j(f). dim(K 0i ) = diUi. Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial /(f) = ( . and hence for some integer p. 0fc(f) are the irreducible monic factors of the minimal polynomial. (c). Proof. Conversely. T has a rational canonical form C.

although the rational canonical bases are not. and P3 = 2. 0Vk are the T-cyclic bases of 0 that are contained in K^. we arrange the T-cyclic bases in decreasing order of size. the rational canonical form of a linear operator is unique up to the arrangement of the irreducible monic divisors. the rational canonical form of a linear operator T can be modified by permuting the Tcyclic bases that constitute the corresponding rational canonical basis. To simplify this task. For example. 0Vj contains dpj vectors. p2 = 2. let (0(f))Pj be the annihilator of Vj. This has the effect of permuting the companion matrices that make up the rational canonical form. we introduce arrays of dots from which we can reconstruct the rational canonical form. Furthermore. Furthermore. we define a dot diagram for each irreducible monic divisor of the characteristic polynomial of the given operator. if k = 3. pi > p 2 > • • • > pk since the T-cyclic bases are arranged in decreasing order of size.i) for all i.. In the case of the rational canonical form.. As in the case of the Jordan canonical form. 7.. This polynomial has degree dpj\ therefore. and dini = dim(K()f. Our task is to show that. 0(f) is an irreducible monic divisor of the characteristic polynomial of T. pi = 4. 0Vl. It follows that 7 is a basis for V and 0i is a basis for K^ for each i. column. In what follows. Certainly. the rational canonical form is unique. we devised a dot diagram for each eigenvalue of the given operator. subject to this order. within each such grouping. For the Jordan canonical form. by Exercise 15 of Section 7. and d is the degree of 0(f). we show that except for these permutations. For each j.4 The Rational Canonical Form 535 Hence n = s. 0V2.. A proof that the resulting dot diagrams are completely determined by the operator is also a proof that the rational canonical form is unique.Sec. arranged so that the jfth column begins at the top and terminates after pj dots. then the dot diagram is Although each column of a dot diagram corresponds to a T-cyclic basis . I Uniqueness of the Rational Canonical Form Having shown that a rational canonical form exists. we adopt the convention of ordering every rational canonical basis so that all the T-cyclic bases associated with the same irreducible monic divisor of the characteristic polynomial are grouped together.3. we are now in a position to ask about the extent to which it is unique. T is a linear operator on a finite-dimensional vector space with rational canonical basis 0. As in the case of the Jordan canonical form. We define the dot diagram of 0(f) to be the array consisting of k columns of dots with Pj dots in the jib.

0i(f) = t — t + 3 and 0 2 (f) = t + 1. hence this diagram contributes two copies of the 2 x 2 companion . Example 2 Recall the linear operator T of Example 1 with the rational canonical basis ii and the rational canonical form C — [T]/j. there are fewer dots in the column than there are vectors in the basis. Since there are two irreducible monic divisors of the characteristic polynomial of T. and suppose that the irreducible monic divisors of the characteristic polynomial of T are 01(f)=f-l. that the dot diagrams associated with these divisors are as follows: Diagram for 0i(f) Diagram for <j>2(t) Diagram for 03(f) Since the dot diagram for Oi (f) has two columns.. These diagrams are as follows: Dot diagram for 0i(f) Dot diagram for <p2(t) • In practice. corresponds to the l x l companion matrix of 0i(£) = t — 1. each containing a single dot. These two companion matrices are given by Ci = 0 1 -1 2 and C2 = (\). furthermore. 0V3 and 0Vl. it contributes two companion matrices to the rational canonical form. Example 3 Let T be a linear operator on a finite-dimensional vector space over R. The second column. Because 0i(f) is the T-annihilator of V\ and 0Vl is a basis for K 0 . Suppose. we obtain the rational canonical form of a linear operator from the information provided by dot diagrams. The first column has two dots.. Since V3 has T-annihilator (0 2 (f)) 2 and V7 has T-annihilator 0 2 (f). 7 Canonical Forms 0Vi in K(/. there are two dot diagrams to consider. and therefore corresponds to the 2 x 2 companion matrix of (0i(f)) 2 = (f — l) 2 .536 Chap. This is illustrated in the next example. lie in K 02 . The dot diagram for 0 2 (f) = f2 + 2 consists of two columns. 0 2 (f) = f 2 + 2 . and 4>-sit) = t2 + t + I. The other two T cyclic bases. in the dot diagram of 0 2 (f) we have p\ = 2 and p 2 = 1. the dot diagram for 0i(f) consists of a single dot. with only one dot.

The dot diagram associated with the Jordan canonical form of U gives us a key to understanding the dot diagram of T that is associated with 0(f). C-x ~ CA — 537 The dot diagram for 03(f) = f2 4. by Exercise 12 of Section 7. and 0Vl.2. = 0 1 -1 -ly Therefore the rational canonical form of T is the 9 x 9 matrix (Cx O C = O o \o 0 C2 0 o o o -1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O O c3 o O 0 0 1 0 0 0 0 0 0 O O o c4 O 0 0 0 0 1 0 0 0 0 o\ O o o C5J 0 0 0 -2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -2 1 0 0 0 0 0 0 o > 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 1 -1 J ( v We return to the general problem of finding dot diagrams.4 The Rational Canonical Form matrix for 0 2 (f).f + 1 consists of a single column with a single dot contributing the single 2 x 2 companion matrix C. be the cycle of generalized eigenvectors of U corresponding to A = 0 with end vector TZ(VJ).18(d). Let 0 be a rational canonical basis for T. 0V2. and suppose again that the T-annihilator of Vj is (4>(t))Pj. 7. the characteristic polynomial of U is ( . and U has a Jordan canonical form. namely. we fix a linear operator T on a finite-dimensional vector space and an irreducible monic divisor 0(f) of the characteristic polynomial of T... . Therefore K^ is the generalized eigenspace of U corresponding to A = 0. We now relate the two diagrams. By Theorem 7. \Jq = TQ for some positive integer q. where m = dim(K^). Then 0Vj consists of dpj vectors in 0. For 0 < i < d. As we did before.l ) w f m .—z MET Sec. let 7. Consequently.. Let U denote the restriction of the linear operator 0(T) to K^. Consider one of these T-cyclic bases 0Vj.. 0Vk be the Tcyclic bases of 0 that are contained in K 0 .

7.. Chap. and hence (<l>(t))Pj~lg(t) is a nonzero polynomial with degree less than pjd. and since span(aj) = span(/3Wi) = CVi. otj is an ordered basis for CVj. Lemma 1.1 ^ ) + a 1 (0(T))*. Since (0(f)) Pj is the T-annihilator of Vj. Since ctj is the union of cycles of generalized eigenvectors of U corresponding to A = 0. this basis is . it follows that (0(T)) p J _1 o(T)(vj) ^ 0. Consider any linear combination of these vectors a o W ) ) * . Notice that otj contains pjd vectors. v Now let otj = 70U71 U .U 7 d _ i . . bearing in mind that in the first case we are considering a rational canonical form and in the second case we are considering a Jordan canonical form. Lemma 2. by otj as a basis for Cv .1 T(t V ) (0(T))^-1P'"1(f/)} is linearly independent. We conclude that otj is a basis for C v .. i ) + • • • -r-a d .1 T(i.Tivj)}. For each j. 7J'S are disjoint. is a linearly independent subset of C. where not all of the coefficients are zero. Since 0Vi U 0V2 U • • • U 0Vk is a basis for K^.1 (t. Consequently. it suffices to show that the set of initial vectors of these cycles { ( 0 ( T ) r . Proof. (0(T))^. For convenience.1 (p. i ). Exercise 9 implies that a is a basis for K 0 . we conclude that a is a Jordan canonical basis. By Lemma 1. The key to this proof is Theorem 7. •.J-lVivj).. we designate the first diagram Di. results in a column of pj dots in D\. ) .. otj consists oipjd linearly independent vectors in Cv . We do this for each j to obtain a subset o: = Qi U Q:2 • • • U afc of K^.Then g(t) is a nonzero polynomial of degree less than d. Proof. Let g(t) be the polynomial defined by g(t) — ao + a\t + • • • + a<i-if d_1 . a is a Jordan canonical basis for K^. 7 Canonical Forms (<PiT))Tivj).1 ^ ) . which has dimension pjd. So by Theorem 7.1 (0(T))^. and the second diagram D2. 487). Then H = {(<f>(T)y. By Theorem 7. 485).4 (p. and the.2 T ' ( r .538 where T°(Vj) = bj. H Thus we may replace 0V. otj is linearly independent. ( 0 ( T ) ) ^ . If] We are now in a position to relate the dot diagram of T corresponding to 0(f) to the dot diagram of U. Because a is a union of cycles of generalized eigenvectors of U. the presence of the T-cyclic basis 0X. . Therefore the set of initial vectors is linearly independent.4.1 T d .

500) gives us the number of dots in any row of D2. which are powers of the irreducible monic divisors. In effect. we may divide the appropriate expression in this theorem by d to obtain the number of dots in the corresponding row of D\. denote the number of dots in the ith row of the dot diagram for 0(f) with respect to a rational canonical basis for T. let 0(f) be an irreducible monic divisor of the characteristic polynomial ofT of degree d. 7. we have the following uniqueness condition. and all columns in JD2 are obtained in this way. Under the conventions described earlier.Sec. xex cos 2x.10 (p. We call the number of such occurrences the multiplicity of the elementary divisor.rank((0(T)) i )l for % > 1. Let T be a linear operator on a finite-dimensional vector space V. Since a companion matrix may occur more than once in a rational canonical form.So each column in Di determines d columns in D2 of the same length. Thus we have the following result. the same is true for the elementary divisors. which becomes part of the Jordan canonical basis for U.1 ) . Conversely. are called the elementary divisors of the linear operator.rank(0(T))] (b) n = i[rank((0(T)) i . Theorem 7. otj determines d columns each containing pj dots in D2. d Thus the dot diagrams associated with a rational canonical form of an operator are completely determined by the operator. Then (a) n = i[dim(V) . ex sin 2x. Since the rational canonical form is completely determined by its dot diagrams. the rational canonical form of a linear operator. the polynomials corresponding to the companion matrices that determine this form are also unique. Example 4 Let 0 = {ex cos 2x. Corollary. These polynomials. therefore. xex sin 2x} . Since Theorem 7.4 The Rational Canonical Form 539 replaced by the union otj of d cycles of generalized eigenvectors of U. the rational canonical form of a linear operator is unique up to the arrangement of the irreducible monic divisors of the characteristic polynomial. each of length Pj. Alternatively.24. the elementary divisors and their multiplicities determine the companion matrices and. Since the rational canonical form of a linear operator is unique. and let r. each row in D2 has d times as many dots as the corresponding row in D\.

It follows that the second dot lies in the second row. Let D be the linear operator on V defined by D(y) = y'.r a n k ( 0 ( A ) ) ] = i [ 4 . and hence of A. the number of dots in the first row. For the cyclic generator.) has degree 2 and V is four-dimensional. it suffices to find a function g in V for which 0(D) (o) ^ 0. its rational canonical form is given by the companion matrix of (0(f)) 2 = f4 . which is /() 0 1 0 0 1 \0 0 0 0 0 1 -25\ 20 -14 4/ Thus (0(f)) 2 is the only elementary divisor of D. Since 0(/. we can use A in place of D in the formula given in Theorem 7. and 0 is an ordered basis for V. R). Thus 0(f) = f2 — 2f -f-5 is the only irreducible monic divisor of /(f). and the dot diagram is as follows: 4\ 0 0 o/ Hence V is a D-cyclic space generated by a single function with D-annihilator (0(f)) 2 .4 4>(A) = 0 0 0 \0 0 0 and so r 1 = l [ 4 .24.2 ] = l.2 f + 5) 2 . and let V = span (/if). is /(f) = ( f 2 . the space of all real-valued functions defined on R. and it has multiplicity 1. and let A — [D]^. . the dot diagram for 0(f) contains only two dots.20f + 25. Now (0 0 0 0 0 . Because ranks are preserved under matrix representations. R). Then V is a four-dimensional subspace oiT(R. the derivative of y. 7 Canonical Forms be viewed as a subset of J-(R. Furthermore. Therefore the dot diagram is determined by ri.540 Chap.4i 3 + 14f2 . Then 1 -2 A = 0 ^ 0 ( 2 1 0 0 1 0\ 0 1 1 2 -2 V and the characteristic polynomial of D.

excos2a:) ^ 0. Since the degree of 0i (f) is 2. By Theorem 7. D3(xex cos 2x)} is a rational canonical basis for D. Hence 0g = {xex cos 2x. Notice that the function h defined by h(x) = xex sin 2x can be chosen in place of g. Example 5 For the following real matrix A. D(xex cos 2x). the elementary divisors and their multiplicities are the same as those of LA • Let A be an n x n matrix.rank(0i (A))] = i [ 5 .r a n k ( . if Q is the matrix whose columns are the vectors of 0 in the same order. In fact. the total number of dots in the dot diagram of 0i(f) is 4/2 = 2.Likewise. let C be a rational canonical form of A. which are defined in the obvious way. dim(K 0 . then Q~XAQ = C. Then C = [LA\P. therefore 01 (f) = f2 + 2 and 0 2 (f) = f — 2 are the distinct irreducible monic divisors of /(f). 7. it follows that 0(D)(a. therefore g(x) = xex cos 2x can be chosen as the cyclic generator.•*• Sec. Let A G M n x n (F).) = 4 and dim(K02) = 1.4 The Rational Canonical Form 541 Since 0(^4)(e3) ^ 0. This shows that the rational canonical basis is not unique. 0 2 0 -6 1 -2 0 0 0 1 -3 A = 1 1 -2 1 -1 V1 . and therefore A is similar to C. for A. and let 0 be the appropriate rational canonical basis for L^. • It is convenient to refer to the rational canonical form and elementary divisors of a matrix.4 3 . Thus the dot diagram of 0i (f) is . we find the rational canonical form C of A and a matrix Q such that Q~lAQ — C. Definitions.3 2\ 2 2 2 *) The characteristic polynomial of A is /(f) = — (f2 + 2)2(f — 2). and the number of dots n in the first row is given by n = 4[dim(R 5 ) . The rational canonical form of A is defined to be the rational canonical form of LA. D2(xex cos 2x).23. 4 2 + 2/)] = |[5-1]=2.

Then it can be seen that 2\ -2 0 Av-? = -2 V-4/ and 0Vl U 0V2 is a basis for K 0 ] .Av\}.2) = 1. Setting v\ — e. is the union of two cyclic bases 0Vx and 0V2. then 0D K^. we sec that fi)\ 1 1 1 \V Next choose v2 in K^. Since dim(K0.542 and each column contributes the companion matrix 0 1 -2 0 Chap. which contributes the l x l matrix (2).2 lie in N(0i(L / t)). where v\ and V2 each have annihilator 0i(f). the dot diagram of 0 2 (f) = f — 2 consists of a single dot. 7 Canonical Forms for 0i (f) = f2 + 2 to the rational canonical form C. V2 = e 2 .\. Consequently 0i(f) is an elementary divisor with multiplicity 2. = N(0(Lyt)). / {v\. but not in the span of 0Vi = For example. It follows that both v\ and ?. Therefore the rational canonical form C is / 0 -2 0 0 1 0 0 0 C = 0 0 0 -2 0 0 1 0 0 V o 0 0 0 \ 0 0 0 2/ We can infer from the dot diagram of 0i(f) that if 0 is a rational canonical basis for L^. Hence 0 2 (f) is an elementary divisor with multiplicity 1.4)). Av\ . It can be shown that ((1\ 0 o 0 AA (o\ l 0 0 • 2 0 1 w W ( I w \ °\ 0 -l > 0 X V _ is a basis for N(0i(L.

r 2 = rank(0(^)) .SO setting 2 0\ (1 0 0 0 1 1 -2 1 0 1 Q = 0 1 0 0 1 0 -2 1 \o 1 0 . Av\.rank((0(^)) 3 ) = 1 . any nonzero vector in K02 is an eigenvector of A corresponding to the eigenvalue A = 2. hence in applying Theorem 7. In this case. For example. 0(f) has degree 1. 0 = {vi. where r^ is the number of dots in the ith row of the dot diagram.rank(0(A)) = 4 . and so K 0 = R4.1 = 1. 7. we obtain n = 4 .23. the only irreducible monic divisor of /(f) is 0(f) — f — 2.4)) 2 ) = 2 . choose M I V3= 1 1 w By Theorem 7.2 = 2. Av2. we find the rational canonical form C and a matrix Q such that Q~lAQ = C: (2 1 0 0\ 0 2 1 0 A = 0 0 2 0 \0 0 0 2/ Since the characteristic polynomial of A is /(f) = (f — 2) 4 .4 2/ we have Q~lAQ = C. Since there are dim(R 4 ) = 4 dots in the diagram. Example 6 For the following matrix A.24 to compute the dot diagram for 0(f).4 The Rational Canonical Form 543 Since the dot diagram of 0 2 (f) = f — 2 consists of a single dot.0 = 1.v%} is a rational canonical basis for LA.rank((0(. we may terminate these computations • . and r 3 = rank((0(4)) 2 ) .Sec.V2.

It follows that 0\ Avi = 2 V and n \ A V{ = 4 4 W Next we choose a vector v2 G N(L/i -21) that is not in the span of 0Vi. It can easily be shown that N(L4-2l)=span({c1. v2} is a rational canonical basis for L. and (f — 2) has the companion matrix (2). v2 = < 1 satisfies this condition. the rational canonical form of A is given by / 0 1 C = 0 \ 0 0 8 0 -12 6 1 0 0 0 \ 0 0 2 J Next we find a rational canonical basis for L.\. Vj £ N((L 4 .e4}) and N ( ( L A .2 l ) 2 ) = span({ei. The.. A«. Furthermore. preceding dot diagram indicates that there are two vectors V\ and v2 in R4 with annihilators (0(f)) 3 and 0(f). Thus r/o\ (o\ 1 0 \ 1 > 2 I w W (i\ 1 4 w (0\ 0 0 W . Clearly. The standard vector C3 meets the criteria for v\\ so we set V\ = ('3. respectively. and v2 G N(L4 ..544 with r$.e 4 }).21). Thus the dot diagram for A is Chap. A2vi. and such that 0 = [0Vl U 0Vi} = {v.4.e2. 7 Canonical Forms Since (f — 2) has the companion matrix '0 0 1 0 J) 1 8" -12 6..2I)2).

is a rational canonical basis for L_A. Finally, let Q be the matrix whose columns are the vectors of β in the same order:

        (0   0   1   0)
   Q =  (0   1   4   0)
        (1   2   4   0)
        (0   0   0   1)

Then C = Q^(−1)AQ.

Direct Sums*

The next theorem is a simple consequence of Theorem 7.23.

Theorem 7.25 (Primary Decomposition Theorem). Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial

   f(t) = (−1)^n (φ_1(t))^(n_1) (φ_2(t))^(n_2) ··· (φ_k(t))^(n_k),

where the φ_i(t)'s (1 ≤ i ≤ k) are distinct irreducible monic polynomials and the n_i's are positive integers. Then the following statements are true.
(a) V = K_φ1 ⊕ K_φ2 ⊕ ··· ⊕ K_φk.
(b) If T_i (1 ≤ i ≤ k) is the restriction of T to K_φi and C_i is the rational canonical form of T_i, then C_1 ⊕ C_2 ⊕ ··· ⊕ C_k is the rational canonical form of T.

Proof. Exercise.

The next theorem is a simple consequence of Theorem 7.17.

Theorem 7.26. Let T be a linear operator on a finite-dimensional vector space V. Then V is a direct sum of T-cyclic subspaces C_vi, where each v_i lies in K_φ for some irreducible monic divisor φ(t) of the characteristic polynomial of T.

Proof. Exercise.

EXERCISES

1. Label the following statements as true or false.
(a) Every rational canonical basis for a linear operator T is the union of T-cyclic bases.

(b) Let S — {sinx.xcos3. Define T to be the linear operator on V such that T(/) = /'• (c) T is the linear operator on M 2x2 (. then its Jordan canonical form and rational canonical form are similar. 2.cosa:. find the elementary divisors.R) defined by . (a) A = F = R (b) A = 0 1 -1 -1 F= R (c) A = (d) A = F = R (e) A = F = R 3. For each of the following matrices A G M n x „ ( F ) . a subset of F(R. 7 Canonical Forms (b) If a basis is the union of T-cyclic bases for a linear operator T. R). (a) T is the linear operator on P$(R) defined by T(f(x)) = f(0)x-f'(l). (c) There exist square matrices having no rational canonical form. (g) If a matrix has a Jordan canonical form. the rational canonical form C. and let V = span(5). (f) Let 0(f) be an irreducible monic divisor of the characteristic polynomial of a linear operator T. (e) For any linear operator T on a finite-dimensional vector space.xsinx. The dots in the diagram used to compute the rational canonical form of the restriction of T to K 0 are in one-to-one correspondence with the vectors in a basis for K^. any irreducible factor of the characteristic polynomial of T divides the minimal polynomial of T. (d) A square matrix is similar to its rational canonical form. and a rational canonical basis Q.}.546 Chap. For each of the following linear operators T. find the rational canonical form C of A and a matrix Q G M n X n ( F ) such that Q~l AQ = C. then it is a rational canonical basis for T.

R). and let V = span(S'). where 0i(f) and 0 2 (f) are distinct irreducible monic polynomials and n — dim(V). we require that the generators of the T-cyclic bases that constitute a rational canonical basis have T-annihilators equal to powers of irreducible monic factors of the characteristic polynomial of T. and 0Vl U 0 V2 is a basis for V. .1 ). 5. (a) Prove that R(0(T)) C N((0(T)) m .Sec. Prove that for each i. 547 (d) Let S = {sin x sin?/. m-i is the number of entries in the first column of the dot diagram for 0i(f).y) = df(x. ^2 has T-annihilator 0 2 (f). Let T be a linear operator on a finite-dimensional vector space V with minimal polynomial (0(f)) m for some positive integer m. (a) Prove that there exist V\.4 The Rational Canonical Form T(A) = 0 -1 1 1 •A. 7. Let T be a linear operator on a finite-dimensional vector space with minimal polynomial m = (Mt))mi(Mt)y w)r where the 0*(f)'s are distinct irreducible monic factors of /(f). Let T be a linear operator on a finite-dimensional vector space. cos x sin y.y) dy 4.V2 G V such that v\ has T-annihilator <i>i(t). df(x. to assure the uniqueness of the rational canonical form.y) dx . sin x cost/. (b) Prove that there is a vector t^ G V with T-annihilator 0i(f)0 2 (f) such that 0Va is a basis for V. Thus. cos a: cost/}. 6. Define T to be the linear operator on V such that T(f)(x. 7. (b) Give an example to show that the subspaces in (a) need not be equal. Prove that the rational canonical form of T is a diagonal matrix if and only if T is diagonalizable. (c) Describe the difference between the matrix representation of T with respect to 0Vl U 0V2 and the matrix representation of T with respect to 0Va. a subset of F(R x R. (c) Prove that the minimal polynomial of the restriction of T to R(0(T)) equals (0(f)) m ~ 1 . Let T be a linear operator on a finite-dimensional vector space V with characteristic polynomial /(f) = (—l) n 0i(f)0 2 (f).

Prove Theorem 7. Let T be a linear operator on a finite-dimensional vector space V. Prove. Now suppose that 7i. if 0(T) is not one-to-one. 10. then x G Cy if and only if C x = C«. then 0(f) divides the characteristic polynomial of T.548 Chap.26. . Hint: Apply Exercise 15 of Section 7.. Let V be a vector space and 0i. 9. 7 Canonical Forms 8. 11. that if 0(f) is the T-annihilator of vectors x and y. . Prove that for any irreducible polynomial 0(f).02. •.3.0k be disjoint subsets of V whose union is a basis for V. 7fc are linearly independent subsets of V such that span(7$) = span(/?») for all i. and suppose that 0(f) is an irreducible monic factor of the characteristic polynomial of T. Prove Theorem 7.25. INDEX OF DEFINITIONS FOR CHAPTER 7 Companion matrix 526 Cycle of generalized eigenvectors 488 Cyclic basis 525 Dot diagram for Jordan canonical form 498 Dot diagram lor rational canonical form 535 Elementary divisor of a linear operator 539 Elementary divisor of a matrix 541 End vector of a cycle 488 Generalized eigenspace 484 Generalized eigenvector 484 Generator of a cyclic basis 525 Initial vector of a cycle 488 Jordan block 483 Jordan canonical basis 483 Jordan canonical form of a linear operator 483 Jordan canonical form of a matrix 491 Length of a cycle 488 Minimal polynomial of a linear operator 516 Minimal polynomial of a matrix 517 Multiplicity of an elementary divisor 539 llational canonical basis of a linear operator 526 Rational canonical form for a linear operator 526 Rational canonical form of a matrix 541 . 7 2 . Let T be a linear operator on a finite-dimensional vector space. Prove that 71 U 7 2 U • • • U 7fc is also a basis for V. Exercises 11 and 12 arc concerned with direct sums. 12.• • .

Appendices

APPENDIX A  SETS

A set is a collection of objects, called elements of the set. If x is an element of the set A, then we write x ∈ A; otherwise, we write x ∉ A. For example, if Z is the set of integers, then 3 ∈ Z and 1/2 ∉ Z.

Sets may be described in one of two ways:
1. By listing the elements of the set between set braces { }.
2. By describing the elements of the set in terms of some characteristic property.

For example, the set consisting of the elements 1, 2, 3, and 4 can be written as {1, 2, 3, 4} or as {x: x is a positive integer less than 5}. Note that the order in which the elements of a set are listed is immaterial; hence {1, 2, 3, 4} = {3, 1, 2, 4}.

Two sets A and B are called equal, written A = B, if they contain exactly the same elements. A set B is called a subset of a set A, written B ⊆ A or A ⊇ B, if every element of B is an element of A; for example, {1, 2, 6} ⊆ {2, 8, 7, 6, 1}. Observe that A = B if and only if A ⊆ B and B ⊆ A, a fact that is often used to prove that two sets are equal. If B ⊆ A and B ≠ A, then B is called a proper subset of A. The empty set, denoted by ∅, is the set containing no elements. The empty set is a subset of every set.

One set that appears frequently is the set of real numbers, which we denote by R throughout this text.

Example 1
Let A denote the set of real numbers between 1 and 2. Then A may be written as A = {x ∈ R: 1 < x < 2}.

Sets may be combined to form other sets in two basic ways. The union of two sets A and B, denoted A ∪ B, is the set of elements that are in A, or B, or both; that is, A ∪ B = {x: x ∈ A or x ∈ B}.

5. denoted Ad B. .n} 2=1 and p | Ai = {rr: .550 Appendices The intersection of two sets A and B. j=i Similarly. 2 .8} and A n J? = {1. A fl B = {x: x G A and :r G B}. if X = {1. then XUY = {1. . aeA Example 3 Let A = {a € R: a > 1}. . Two sets are called disjoint if their intersection equals the empty set.1 } oeA and f) Aa = {x e R: 0 < x <2}.8}. Example 2 Let A= {1. and let 4 a = < x e R: — < x < 1 + a I « for each a G A. then the union and intersections of these sets are defined. 2 . . Then AuB = {1. . Likewise.3.3. n}. An are sets. . .2. by M A tt = {x: X E AQ for some a G A} r*6A and p | Aa = {x: x G An for all a G A}.8} and Y = {3. Then ( J A a = {£ G R: x > . .4. . A 2 . respectively. . .2. Specifically. Thus X and Y are disjoint sets.8} • and X (lY = 0.4.5. . The union and intersection of more than two sets can be defined analogously.5}. by n M Ai = { x : x G Ai for some i — 1 .3.r G A{ for all i = 1 . respectively.5}.5} and B = {1.7. is the set of elements that are in both A and B.5. that is. if Aj. if A is an index set and {Aa: a G A} is a collection of sets. the union and intersection of these sets arc defined.7. a€A • .

x in A a unique element denoted f(x) in Z?. 2. written / = </. and 2 is a preimage of 5. that is. then a function / from A to B. • As Example 1 shows. the image of 2 is 5. Moreover. More precisely. Then A is the domain o f / . then y ~ x (symmetry). "is equal to. two functions f: A —> B and <?: A — > f? are equal. If x ~ y." and "is greater than or equal to" are familiar relations. x ~ x (refcxivity). Functions such that each element of the range has a unique preimage are called one-to-one. On the set of real numbers. if S — [1. Likewise.101] is the range of / . then ~ is an equivalence. A relation 5 on a set A is called an equivalence relation on A if the following three conditions hold: 1. we often write x ~ y in place of (x. then / is called onto. if x ^ y implies f(x) ^ f(y)If / : A —• B is a function with range 5 . So / is onto if and only if the range of / equals the codomain of/.9 ] U [9. a relation on A is a set S of ordered pairs of elements of A such that (x. and x is called a preimage of f(x) (under / ) .10]. and [1. cquivalently. for instance. is a rule that associates to each element.2] and T = [82. written / : A — * B.1 in R. The element f(x) is called the image of x (under / ) . Let / : A —> R be the function that assigns to each element x in A the element x 4.5] and f~l(T) = [-10. . For example. that is / : A — > B is one-to-one if /(x) = f(y) implies x = y or. APPENDIX B FUNCTIONS If A and B are sets. that is. / is defined by /(x) = x2 + l. B is called the codomain of / . Finally. the preimage of an element in the range need not be unique. If S is a relation on a set A. and the set {fix): x E A} is called the range of / . for any elements x and y in A. then f(S) = [2. x stands in a given relationship to y.o(x) for all x G A Example 1 Suppose that A = [—10. if T C B. . 3. y) G S if and only if x stands in the given relationship to y.101]. If x ~ y and y ^ z. if /(A) = B. Notice that —2 is another preimage of 5. if / ( x ) = . Since /(2) = 5. we denote by f(S) the set {/(x) : x £ S} of all images of elements of S. if we define x ~ y to mean that x — ?/ is divisible by a fixed integer n.we denote by f~l(T) the set {x G A: / ( x ) G T7} of all preimages of elements in T.10]. For each x G A. If / : A —• B.Appendix B Functions 551 By a relation on a set A. Note that the range of / is a subset of B. i? is the codomain o f / . relation on the set of integers. If S C A. y) G S." "is less than. then A is called the domain of / . then x ~ 2 (transitivity). we mean a rule for determining whether or not.

If / : A . Note that if 5 = [0. then it is unique and is called the inverse of / . and division) can be defined so that. if h: C — * I) is another function. This function is onto.(/. however. Example 2 Let / : [-1.1] -» [0. Hence.•: S — * B. subtraction. . the sum. Thus (g o /)(x) = g(f(x)) for all x G A.'?o. 2.Functional composition is associative. • Let A. let A = B = C = R. If such a function g exists. and C be sets and f:A-+B and o: B -* C be functions. f(x) = sinx. 1] be defined by f(x) = x 2 . Then (g o /)(x) = (. called the restriction of / to S. then h o (p of) — (h o g ) o / ./r. difference. By following / with g. then fr is one-to-one. If / : A —* B and g: B —> C are invertible. if T = [A. 1. whereas ( / o g)(x) = f(g(x)) ~ sin(x 2 + 3). we obtain a function g of: A —> C called the composite of o and / . and quotient of any two elements in the set is an element of the set. then fs is both onto and one-to-one. Finally. The next example illustrates these concepts. with the exception of division by zero. A function / : A — > B is said to be invertible if there exists a function g: B — > A such that ( / o g)(y) = y for all y G B and (g o /)(x) = x for all x G A. then q o f is invertible. 1].o(x) = x 2 + 3. 9 ° / 7^ / ° . • The following facts about invertible functions are easily proved. More precisely. APPENDIX C FIELDS The set of real numbers is an example of an algebraic structure called a field.1 .' = / . a field is denned as follows. and ( / _ 1 ) . It can be shown that / is invertible if and only if / is both one-to-one and onto. hence / is invertible. The inverse of /' is the function / _ l : /? —• R defined by/-'(x) = (x-l)/3. product. Basically. For example. can be formed by defining fs(x) = f(x) for each x G S. then / ' is invertible. and . Example 3 The function f:R—* R denned by fix) — 3x + 1 is one-to-one and onto. and (.</(/(•''-')) = sin2 2 + 3. a field is a set in which four operations (called addition.> B is invertible. We denote the inverse of / (when it exists) by / . but not onto. B./-1ory-'.=. Then a function /<.1]. that is.552 Appendices Let / : A — * B be a function and S Q A. multiplication. but not one-to-one since / ( — I) = / ( l ) = 1.

respectively. with addition and multiplication as in R is a field. respectively. 1-1 = 1. • Example 3 The set of all real numbers of the form a + by/2. rational numbers. • 0-l = l . there exist elements c and d in F such that a+c= 0 and tW = 1 (existence of inverses for addition and multiplication) (F 5) a-(b + c) =ci-b + a-c (distributivity of multiplication over addition) The elements x + y and x-y are called the sum and product. y in F. • Example 2 The set of rational numbers with the usual definitions of addition and multiplication is a field. 0-0 = 0.•*• Appendix C Fields 553 Definitions. ( F l ) a + b = b + a and a*6 = b-a (commutativity of addition and multiplication) ( F 2 ) (a + b)+c = a + (b + c) and (a-b)>c = a>(b-c) (associativity of addition and multiplication) (F 3) There exist distinct elements 0 and 1 in F such that 0+ a=a and 1 -a = a (existence of identity elements for addition and multiplication) (F 4) For each clement a in F and each nonzero clement b in F.c in F. A field F is a set on which two operations + and • (called addition and multiplication. of x and y. The elements 0 (read "zero") and 1 (read "one') mentioned in (F 3) are called identity elements for addition and multiplication.0 = 0.b. respectively. respectively) are defined so that. . where a and b an. • Example 4 The field Z2 consists of two elements 0 and 1 with the operations of addition and multiplication defined by the equations 0 + 0 = 0. 0+1 = 1+0=1. for each pair of elements x. and 1 + 1=0. Example 1 The set of real numbers R with the usual definitions of addition and multiplication is a field. there are unique elements x + y and x-y in F for which the following conditions hold for all elements a. and the elements c and d referred to in (F 4) are called an additive inverse for a and a multiplicative inverse for b.
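The addition and multiplication tables of Example 4 extend to the field Z_p of integers modulo any prime p, a standard fact that is not proved in this appendix. The following sketch is not part of the text; the class name Zp and the use of Python are illustrative only, and it is meant simply as a way to experiment with these small fields.

class Zp:
    """Arithmetic in the field Z_p of integers modulo a prime p.
    For p = 2 this reproduces the tables of Example 4."""

    def __init__(self, value, p=2):
        self.p = p
        self.value = value % p

    def __add__(self, other):
        return Zp(self.value + other.value, self.p)

    def __mul__(self, other):
        return Zp(self.value * other.value, self.p)

    def __neg__(self):
        # additive inverse required by (F 4)
        return Zp(-self.value, self.p)

    def inverse(self):
        # multiplicative inverse required by (F 4);
        # Fermat's little theorem gives a^(p-2) as the inverse of a nonzero a
        if self.value == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return Zp(pow(self.value, self.p - 2, self.p), self.p)

    def __repr__(self):
        return f"{self.value} (mod {self.p})"

one = Zp(1, 2)
print(one + one)      # 0 (mod 2), matching 1 + 1 = 0 in Example 4

The inverse method uses Fermat's little theorem; an extended Euclidean algorithm would serve equally well.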

( . we have (a • b) • d = a • (6. Proof. difference. respectively. Note that . Consider the left side of this equality: By (F 2) and (F 3).l. then (F 4) guarantees the existence of an element d in the field such that b-d = 1. Thus a = c. Multiply both sides of the equality a-b = c . (b) If b T& 0. (a) If a + b = c + b. are unique. a — b = a + (—6) and . Division by zero is undefined. the right side of the equality reduces to c.554 Example 5 Appendices Neither the set of positive integers nor the set of integers with the usual definitions of addition and multiplication is a field. then a = c. b. that is. (b) If a-b = cb and b ^ 0. a unique multiplicative inverse. Suppose that 0' G F satisfies 0' + o. (a) The proof of (a) is left as an exercise. subtraction of b is defined to be addition of —b and division by b ^ 0 is defined to be multiplication by o _ 1 . the symbol f denotes b l. Specifically. Similarly. Proof.2 that 0 has no multiplicative inverse. product. I Corollary. the following statements are true. (It is shown in the corollary to Theorem C. o with this exception. . l (Cancellation Laws). The elements 0 and 1 mentioned in (F 3). the sum.& ) = 6 and (&-1)""1 = 6. but.= d'b b .d) = a • 1 = a. Thus 0' = 0 by Theorem C. I Thus each element b in a field has a unique additive inverse and.b by d to obtain (a-6)-d = (c-fe)-d. The proofs of the remaining parts are similar. Subtraction and division can be defined in terms of addition and multiplication by using the additive and multiplicative inverses. if b ^ 0. Since 0 + a = a for each a G F. and c in a held. Theorem C . then a = c. For arbitrary elements a. for in either case (F 4) does not hold. = a for each a G F.) The additive inverse and the multiplicative inverse of b are denoted by — b and ft""1. 1 In particular. and quotient of any two elements of a field are defined. • The identity and inverse elements guaranteed by (F 3) and (F 4) are unique. we have 0' + a = 0 + a for each a G F. and the elements c and d mentioned in (F 4). this is a consequence of the following theorem.

• '—*^B Appendix C Fields 555 Many of the familiar properties of multiplication of real numbers are true in any field.( a .6 ) . (b) ( . at least. (c) By applying (b) twice. For this reason. | Corollary. Finally. in the field Z2 (defined in Example 4). so a-6 + ( .6 ) = . if no such positive integer exists. (F 5) shows that 0 + a-0 = a-0 = a-(0 + 0) = a-0 + a-0. it may happen that a sum 1 + 1 H + 1 (p summands) equals 0 for some positive integer p. (b) By definition.• • + x (p summands) equals 0 for all x G F. 1 + 1 = 0.b= [a + (-a)] -6 = 0-6 = 6-0 = 0 by (F 5) and (a).6 ) ] = .( .6 ) ] = a-6.( a . Thus Z2 has characteristic two.a ) .( a .a ) . .6 = a .( a . then x + x + .[ . Observe that if F is a field of characteristic p / 0. note that in other sections of this book. Thus ( . Then each of the following statements are true.6 ) .M . as the next theorem shows. In this case. So in order to prove that (—a)-6 = — (a-6).a ) = 0. The proof that a . Let a and b be arbitrary elements of a field. —(a-6) is the unique element of F with the property a-6 + [-(a-6)] = 0.l. we find that (-a)-(-b) = . (c) ( . (a) a-0 = 0. some of the results about vector spaces stated in this book require that the field over which the vector space is defined be of characteristic zero (or.2. For example.6 ) is similar. In an arbitrary field F. it suffices to show that a-6 + (—a)-6 = 0.( . Proof. of some characteristic other than two). Theorem C.6 ) = .6 = .a ) . and R has characteristic zero. The additive identity of a field has no multiplicative inverse. the product of two elements a and 6 in a field is usually denoted a6 rather than a • 6. Thus 0 = a-0 by Theorem C. then F is said to have characteristic zero. But —a is the element of F such that a + ( .a ) .6 ) = a-6.( . In a field having nonzero characteristic (especially characteristic two). the smallest positive integer p for which a sum of p l's equals 0 is called the characteristic of F. (a) Since 0 + 0 = 0. many unnatural problems arise.

APPENDIX D  COMPLEX NUMBERS

For the purposes of algebra, the field of real numbers is not sufficient, for there are polynomials of nonzero degree with real coefficients that have no zeros in the field of real numbers (for example, x^2 + 1). It is often desirable to have a field in which any polynomial of nonzero degree with coefficients from that field has a zero in that field. It is possible to "enlarge" the field of real numbers to obtain such a field.

Definitions. A complex number is an expression of the form z = a + bi, where a and b are real numbers called the real part and the imaginary part of z, respectively.

The sum and product of two complex numbers z = a + bi and w = c + di (where a, b, c, and d are real numbers) are defined, respectively, as follows:

   z + w = (a + bi) + (c + di) = (a + c) + (b + d)i

and

   zw = (a + bi)(c + di) = (ac − bd) + (bc + ad)i.

Example 1
The sum and product of z = 3 − 5i and w = 9 + 7i are, respectively,

   z + w = (3 − 5i) + (9 + 7i) = (3 + 9) + [(−5) + 7]i = 12 + 2i

and

   zw = (3 − 5i)(9 + 7i) = [3·9 − (−5)·7] + [(−5)·9 + 3·7]i = 62 − 24i.

Any real number c may be regarded as a complex number by identifying c with the complex number c + 0i. Observe that this correspondence preserves sums and products; that is,

   (c + 0i) + (d + 0i) = (c + d) + 0i   and   (c + 0i)(d + 0i) = cd + 0i.

Any complex number of the form bi = 0 + bi, where b is a nonzero real number, is called imaginary. The product of two imaginary numbers is real since

   (bi)(di) = (0 + bi)(0 + di) = (0 − bd) + (b·0 + 0·d)i = −bd.

In particular, for i = 0 + 1i, we have i·i = −1. The observation that i^2 = i·i = −1 provides an easy way to remember the definition of multiplication of complex numbers: simply multiply two complex numbers as you would any two algebraic expressions, and replace i^2 by −1. Example 2 illustrates this technique.

respectively.Oi = 6.6-0) + (6-1 + a-0)i = a + bi. 4-7i = 4 + 7i. Every complex number a + bi has an additive inverse. But also each complex number except 0 has a multiplicative inverse. The (complex) conjugate of a complex number a + bi is the complex number a — bi.6i 2 = —5 + 15i + 2« — 6 ( .5 + 2i)(l .Z Appendix D Complex Numbers Example 2 The product of — 5 + 2i and 1 — 3i is ( . Then the following . l .5 + 15i + 2i .Zi) = . (a + « ) = a2 + b2 b2 In view of the preceding statements.Si) + 2i(l . In fact. and 6 are.5 ( 1 . Theorem D .2. -3 + 2i = -3-2i. • The next theorem contains some important properties of the conjugate of a complex number. Proof. Likewise the real number 1.Si) = . and 6 = 6 + Oi = 6 . is a multiplicative identity element for the set of complex numbers since (a + 6i)-l = (a + bi)(l+0i) = (a-1 . 1 Definition. regarded as a complex number. Example 3 The conjugates of —3 + 2i. We denote the conjugate of the complex number z byz. 4 — 7i. is an additive identity element for the complex numbers since (a + bi) + 0 = (a + bi) + (0 + Oi) = (a + 0) + (6 + 0)i = a + bi. namely (—a) + (—b)i.l ) = 1 + 17i • 557 The real number 0. Theorem D. the following result is not surprising. The set of complex numbers with the operations of addition and multiplication previously defined is a Held. regarded as a complex number. statements are true. Let z and w be complex numbers. Exercise.

r/'2 13 The absolute value of a complex number has the familiar properties of the absolute value of a real number. where a. For any complex number z = a + bi. Let z = a + bi.b G R. Then the following statements are true. for z~z — (a + bi)(a — bi) = a2 + b2. Theorem D. (c) ZW — ~Z'W. for if c + di ^ 0. We leave the proofs of (a). where a. we have zw = (a + bi)(c + di) = (ac — bd) + (ad + bc)i Appendices — (ac — 6d) — (ad + bc)i — (a — 6i)(c — di) = z-w. We denote the absolute value of z by \z\.2i ~ 3 .5 + 14i_ 3 . Let z and w denote any two complex numbers. 6. (d). \wJ w (e) z is a real number if and only if z = z. The fact that the product of a complex number and its conjugate is real provides an easy method for determining the quotient of two complex numbers. + d? be — ad . 13 + 13*' a + 6i c — di c + di c — di (ac + bd) + (be — ad)i c2+d2 ac + bd. c. Then [z + w) = (a + c) + (6 + d)i = (a + c) .(6 + d)i = (a — bi) + (c — di) = z + w. (b) (z + w) = z + w.3. . then a + 6i c + di Example 4 To illustrate this procedure. (b) Let z — a + bi and w — c + di. we compute the quotient (1 + 4i)/(3 — 2i): l + 4 i _ l + 4 i 3 + 2 f _ .558 (a) z = z. z~z is real and nonnegative. as the following result shows. Observe that zz = |z| 2 . and (e) to the reader. The absolute value (or modulus) of z is the real number \/a 2 + 62. (c) For z and to. Definition. Proof. (d) (-) = = ifw ^ 0. This fact can be used to define the absolute value of a complex number.2 i ' 3 + 2i " 9+ 4 5 14. d G R.

We may represent z as a vector in the complex plane (see Figure D. Taking x = wz. (b) For the proof of (b). Using Theorem D.||2| = 2|z||tu|. Notice that. w \w\ (c) \z + w\ < \z\ + \w\. there are two axes. proving (a). where a. (a) By Theorem D. observe that x + x = (a + 6i) + (a . and the absolute value of z gives the length of the vector z. we have.l(a)). we have \zw\2 = izw)izw) = (zw)(z • w) = (zz)(ww) — |2|2|w. the real axis and the imaginary axis. . apply (a) to the product (— ) w. it follows that \z\ = 1(2 + w) — w\ < \z + w\ + I — w\ = \z + w\ + \w\. wz + wz < 2\wz\ = 2|w. by Theorem D. (d) From (a) and (c). So \z\ — |t^| < \z + w\. Thus x + x is real and satisfies the inequality x + x < 2|x|. (b) . we obtain (c).|2. proving (d).\w\ < \z + w\. Suppose that z — a + bi. The real and imaginary parts of z are the first and second coordinates. 1 »' It is interesting as well as useful that complex numbers have both a geometric and an algebraic representation. as in R2.2 again gives \z + w\ = iz + w)(z + w) = (z + w)(z + w) = zz + wz + ZW + WW < \zf + 2|z|H + \wf = (|z| By taking square roots.bi) = 2a < 2\/a2 + b2 = 2|x|. It is clear that addition of complex numbers may be represented as in R2 using the parallelogram law.2 and (a).=11 ifw^O. where a and 6 are real numbers. 559 (d) \z\ .B~ Appendix D Complex Numbers (a) \zw\ = \z\-\w\.2. Proof. 6 G R. \w/ (c) For any complex number x = a + 6i.

w are two nonzero complex numbers.7 (p. We want to find zrj in C such that p(zo) = 0. has a simple geometric interpretation: If z = \z\e and w = \w\e. For \z\ = s > 0. then from the properties established in Section 2.l In Section 2.„z" + a„ iz" l + ••• + a\z + a{) is a polynomial in P(C) of degree n > 1. Suppose that p(z) = o.4 (The Fundamental Theorem of Algebra).r) has a zero. The following proof is based on one in the book Principles of Mathematical Analysis 3d. as well as addition. Proof. 1976). we have (. Then />(. New York. by Walter Rudin (McGraw-Hill Higher Education.«-i i an . Our motivation for enlarging the set of real numbers to the set of complex numbers is to obtain a field such that every polynomial with nonzero degree having coefficients in that field has a zero.l(b). el° is the unit vector that makes an angle 6 with the positive real axis. that is.7 and Theorem D. Thus multiplication.tn . Theorem D.) So zw is the vector whose length is the product of the lengths of z and . Because of the geometry we have introduced. Our next result guarantees that the field of complex numbers has this property. we introduce Filler's formula. we sec1 that any nonzero complex number z may be depicted as a multiple of a unit vector..560 imaginary axis Appendices (b) Figure D. From this figure. The special case _ c o s g _|_ j g m 0 js 0£ particular interest. where 0 is the angle that the vector z makes with the positive real axis.//• >o we' .i(<H a. we have \p(z)\ •-= \anzn + a „ _ .132)./>.3. namely. Let m be the greatest lower bound of {|/>(c)|: 2 G ('}. z = |^|e*^. we may represent the vector c'" as in Figure D. and makes the angle 0 + uj with the positive real axis.

It follows that rn is the greatest lower bound of {|p(z)|: z G D). So we may write q(z) = 1 + bkzk + bk+1zh+1 where 6fc 7^ 0. Because D is closed and bounded and p(z) is continuous. or etkdbk = — l&J. We want to show that rn = 0. | A field is called algebraically closed if it has the property that every polynomial of positive degree with coefficients from that field factors as a product of polynomials of degree 1.\bk\rk + bk+1rk+1ei(k+l)d + ••• + Choose r small enough so that 1 — |6fc[rfc > 0. If p(z) = anzn + an-izn~l + • • • + a\z + ao is a polynomial of degree n > 1 with complex coefficients. Thus the preceding corollary asserts that the field of complex numbers is algebraically closed. so that the expression within the brackets is positive.4 and the division algorithm for polynomials (Theorem E.| a n _ i | s _ 1 |a 0 | |ao| \a0\s~n].7s Appendix D Complex Numbers > |a n ||2 n | . We obtain that \q(rel0)\ < 1. bnrnein6 bnrnein0. c 2 . 561 Because the last expression approaches infinity as s approaches infinity.c 2 ) • • • (z .| a n _ i | s n _ 1 = sn[\an\ . cn (not necessarily distinct) such that p(z) = an(z . Then q(z) is a polynomial of P(zo) + ••• + bnzn. • • •. if necessary. Because — — has modulus one. we may pick a real number 9 bk such that etkd = — ~-. Let q(z) = :—-—.l). Then \q(rei9)\ < 1 . Assume that rn ^ 0. For any r > 0. and \q(z)\ > 1 for all z in C. . q(0) = 1.cA)(z . Now choose r even smaller. then there exist complex numbers Ci. But this is a contradiction. we may choose a closed disk D about the origin such that \p(z)\ > m + 1 if z is not in D.|6fc|rfc + |6fc + i|r fc+1 + • • • + |6 n |r n = l-rfc[|6fc|-|6fc+i|r |6 n |r"-*].Cn). degree n. Corollary. Proof. We argue by contradiction. Exercise. I The following important corollary is a consequence of Theorem D.lon-iU^p" 1 = \an\sn . we have bk q(rei9) = 1 + bkrkelk6 + & fc+1 r* +1 e«* +1 >' + • • • + = 1 . there exists ZQ in D such that |p(zo)| = m.
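The corollary can be illustrated numerically. The following sketch is not part of the text; it assumes Python with NumPy, whose roots routine computes approximate zeros of a polynomial from its coefficients and whose poly routine rebuilds the monic polynomial having prescribed zeros.

import numpy as np

# A polynomial of degree n >= 1 with complex (here real) coefficients
# has n zeros c_1, ..., c_n, counted with multiplicity, and factors as
# p(z) = a_n (z - c_1)(z - c_2)...(z - c_n).
coeffs = [1, 0, 0, 0, 1]      # p(z) = z^4 + 1, which has no real zeros
zeros = np.roots(coeffs)      # four complex zeros, computed numerically
print(zeros)

# Rebuilding the monic polynomial from its zeros recovers the
# coefficients (up to rounding error).
print(np.poly(zeros))         # approximately [1, 0, 0, 0, 1]

The zeros returned here are the four complex numbers (±1 ± i)/√2, so z^4 + 1 factors into linear factors over C even though it has no zeros in R.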

and assume that f(x) has degree n. CASE 1. Throughout this appendix.r) and r(x) such that f(x)=q(x)g(x) where the degree of r(x) is less than m. and it follows that f(x) and g(x) are nonzero constants. When 0 < rn < n. Now suppose that the result is valid for all polynomials with degree less than n for some fixed n > 0. (1) Then h(x) is a polynomial of degree less than n. Then there exist unique polynomials q(. refer to Section 1. Suppose that f{x) = anxn + an-ixn and g(x) = bmxm + bm-ix™-1 +••• + bix + bo. Theorem E . A polynomial f(x) divides a polynomial g(x) if there exists a polynomial q(x) such that g(x) = f(x)q(x). we assume that all polynomials have coefficients from a fixed field F. If n < m. (3) . we apply mathematical induction on n. and let h(x) be the polynomial defined by hi.562 APPENDIX E POLYNOMIALS Appendices In this appendix. Definition. l (The Division Algorithm for Polynomials).2. and let g(x) be a polynomial of degree rn > 0. Proof. Let f(x) be a polynomial of degree n. and therefore we may apply the induction hypothesis or CASE 1 (whichever is relevant) to obtain polynomials q\ (x) and r(x) such that r(x) has degree less than rn and h(x) = qx(x)g(x) + r(x). For the definition of a polynomial. we discuss some useful properties of the polynomials with coefficients from a field. take q(x) = 0 and r(x) = f(x) to satisfy (1). We begin by establishing the existence of q(x) and r(x) that satisfy (1). Hence we may take q(x) — f(x)/g(x) and r(x) = 0 to satisfy (1). First suppose that n = 0.r) = fix)-anb-l]-r"</(*)• (2) + aix + ao + r(x). Our first result shows that the familiar long division process for polynomials with real coefficients is valid for polynomials with coefficients from an arbitrary field. Then m = 0. CASE 2.

Since r(x) has degree less than 1. Then / ( a ) = 0 if and only if x — a divides f(x). suppose that / ( a ) = 0. (4) The right side of (4) is a polynomial of degree less than m. suppose that F is the field of complex numbers. we call q(x) and r(x) the quotient and remainder. Thus f(x) = q(x)(x-a). and r 2 (x) exist such that n ( x ) and r 2 (x) each has degree less than m and f(x) = oi(x)o(x) + ri(x) = q2(x)g(x) + r 2 (x). Then the quotient and remainder for the division of f(x) = (3 + i)x5 .(1 . respectively. I In the context of Theorem E. which establishes (a) and (b) for any n > 0 by mathematical induction. there exist polynomials q(x) and r(x) such that r(x) has degree less than one and f(x) =o(x)(x-a)+r(x). Since g(x) has degree m. Let f(x) be a polynomial of positive degree. Then there exists a polynomial q(x) such that f(x) = (x — a)q(x). g 2 (x). Hence < 7 i (x) = g 2 (x). Proof. it must follow that q\(x) — q2(x) is the zero polynomial. +4 Corollary 1. thus n ( x ) = r2(x) by (4).Appendix E Polynomials 563 Combining (2) and (3) and solving for f(x) gives us f(x) = q(x)g(x) + r(x) with q(x) — a n 6 ~ 1 x " _ m + </i(x). Substituting a for x in the equation above. For example. and let a G F. we obtain ria) = 0. Then ki(x-) . I . respectively. Suppose that x — a divides f(x).3i)x + 9.6 + 2i)x 2 + (2 + i)x + 1 by g(x) = (3 + i)x2 -2ix arc. Conversely. for the division of f(x) by g(x). This establishes the existence of q(x) and r(x). q(x) = x 3 + ix 2 . Suppose that Oi(x). By the division algorithm. Thus f(a) = (a — a)o(a) = O-g(a) = 0. ri(x). it must be the constant polynomial r(x) = 0.2 and r(x) = (2 .q2(x)]g(x) = r2(x) .l. We now show the uniqueness of q(x) and r(x).n(x).i)x4 + 6x 3 + ( .

Thus f(x) and g(x) are relatively prime because they have. the polynomials with real coefficients f(x) = x 2 (x . Now suppose that the result is true for some positive integer n. consider f(x) and g(x) = (x — 2)(x — 3). by the induction hypothesis. assume that the degree of fi(x) is greater than or equal to the degree of / 2 (x).564 Appendices For any polynomial f(x) with coefficients from a field F. then there is nothing to prove. If / 2 (x) has degree 0.2. and let f(x) be a polynomial of degree n + 1. an element a G F is called a zero of f(x) if f(a) = 0. Corollary 2. Otherwise. Note that q(x) must be of degree n. then by Corollary 1 we may write f(x) = (x —a)o(x) for some polynomial q(x). ( x ) = g ( x ) / 2 ( x ) + r(x). The result is obvious if n — 1. Polynomials having no common divisors arise naturally in the study of canonical forms. (5) . zeros. no common factors of positive degree.) Definition. which do not appear to have common factors. Without loss of generality. where 1 denotes the constant. there exist polynomials q(x) and r(x) such that r(x) has degree less than n and / . Could other factorizations of f(x) and g(x) reveal a hidden common factor? We will soon see (Theorem E. it follows that / ( x ) can have at most n + 1 distinct. By the division algorithm. Now suppose that the theorem holds whenever the polynomial of lesser degree has degree less than n for some positive integer n. q(. Proof. In this case. polynomial with value 1. we can take q\(x) = 0 and f/2(x) = 1/c. there exist polynomials q\(x) and q2ix) such that qi(x)fi(x) + q2(x)f2(x) = l. If f(x) has no zeros. Any polynomial of degree n > 1 has at most n distinct zeros. Proof. Two nonzero polynomials are called relatively polynomial of positive degree divides each of them.r) can have at most n distinct zeros. the preceding corollary states that a is a zero of f(x) if and only if x — a divides f(x). therefore. If fi(x) and f2(x) are relatively prime polynomials.9) that the preceding factors are the only ones. On the other hand. if a is a zero of /(x).1) and h(x) = (x — l)(x — 2) are not relatively prime because x — 1 divides each of them. The proof is by mathematical induction on the degree of / 2 (x). Since any zero of f(x) distinct from a is also a zero of q(x). The proof is by mathematical induction on n. (See Chapter 7. With this terminology. and suppose that / 2 (x) has degree n. Theorem E. then / 2 (x) is a nonzero constant c. prime if no For example.

then there exists a polynomial g(x) of positive degree that divides both / 2 (x) and r(x). | Example 1 Let /i(x) = x 3 — x 2 + 1 and / 2 (x) = (x — l) 2 . Definitions. . Thus there exist polynomials ai (x) and p2 (x) such that 9i(x)f2(x)+g2(x)r(x) Combining (5) and (6). As polynomials with real coefficients. 6. the following notation is convenient.+ a n T n . /i(x) and f2(x) are relatively prime. and 7. we consider linear operators that are polynomials in a particular operator T and matrices that are polynomials in a particular matrix A.g2(x)q(x)] Thus. we have 1 = gi(x)f2(x) = g^(x)fi(x) + g2(x) [fi(x) q(x)f2(x)\ f2(x).2. • and hence these polynomials satisfy the conclusion of Theorem E. contradicting the fact that / i ( x ) and / 2 (x) are relatively prime. It is easily verified that the polynomials qi(x) = —x + 2 and g 2 (x) = x 2 — x — 1 satisfy qi(x)fi(x) + q2(x)f2(x) = 1. Throughout Chapters 5. r(x) is not the zero polynomial. If T is a linear operator on a vector space V over F. For these operators and matrices. Hence. We claim that / 2 (x) and r(x) are relatively prime. Suppose otherwise. we obtain the desired result. g(x) also divides /i(x). by (5). Since r(x) has degree less than n. Similarly. we define / ( T ) = a 0 l + a i T + --. = l. we define f(A) = a0I + a1A + ---+anAn. setting q\(x) = p 2 (^) and <?2(x) = gi(x) — g2(x)q(x). we may apply the induction hypothesis to / 2 (x) and r(x). if A is an x n matrix with entries from F.Appendix E Polynomials 565 Since /i(x) and f2(x) are relatively prime. (6) + [oi(x) . Let f(x) = a 0 + ai (x) + \.anxn be a polynomial with coefficients from a field F.

Exercise. Similarly. Then. Let T be a linear operator on a vector space V over a field F.3a-36). Theorem E. for any polynomials fi(x) and / 2 (x) with coefficients from F. . = I. It is easily checked that T 2 (a.4. a + 26). Proof. then there exist polynomials q\(x) and q2(x) with entries from F such that (a) ai(T)/!(T)+o 2 (T)/ 2 (T) = l (b) q1{A)f1(A) + q2{A)f2(A) Proof. If f\ (x) and / 2 (x) are relatively prime polynomials with entries from F. and let A be a square matrix with entries from F.3(a. and let T be a linear operator on a vector space V over F. and let f(x) = x2 + 2x . if A = then f(A) = A2+2A-3I = 1 2 +2 2 l 1 -1 )-3 l -1 r °\0 ° 1 6 3 3 -3 The next three results use this notation. Let T be a linear operator on a vector space V over a field F. (a) / ( T ) is a linear operator on V.6) = (2a + 6.6) = (T 2 + 2 T .3. (a) / i ( T ) / 2 ( T ) = / 2 (T)/ 1 (T) (b) / 1 (A)/ 2 (A) = / 2 (A)/ 1 (A).566 Example 2 Appendices Let T be the linear operator on R2 defined by T(a.2a . 6) = (5a + 6. Then the following statements are true. then [f(T)}0 = f(A).5.26) . Proof. 1 Theorem E. so /(T)(a. Exercise. (b) If8 is a finite ordered basis for V and A = [T)0. a — 6). a + 26) + (4a + 26. 6 ) = (5a + 6.3 l ) ( a .3. 1 Theorem E. 6) = (6a+ 36. Exercise. Let f(x) be a polynomial with coefficients from a field F. and let A be an n x n matrix with entries from F.

g(x). particular types of polynomials play an important role. Theorem E. and 0(x) be polynomials. Let 0(x) and f(x) be polynomials. If f(x) has positive degree and cannot be expressed as a product of polynomials with coefficients from F each having positive degree. Definitions. Proof. If 0(x) is irreducible and 4>(x) does not divide f(x). Let f(x). for polynomials with coefficients from an algebraically closed field. i Theorem E. Proof. there is a polynomial h(x) such that f(x)g(x) 4>(x)h(x). Suppose that 0(x) does not divide f(x).7. then 0(x) divides f(x) or <j>(x) divides g(x).8. Since 0(x) divides f(x)g(x). If 0(x) is irreducible and divides the product f(x)g(x). The following facts are easily established. For example.6. Proof. Exercise. Any two distinct irreducible monic polynomials arc relatively prime. and so there exist polynomials tfi(x) and q2(x) such that l = Ol(x)0(x)+o2(x)/(x). f(x) = x 2 + 1 is irreducible over the field of real numbers. the polynomials of degree 1 are the only irreducible polynomials. In this setting.w Appendix E Polynomials 567 In Chapters 5 and 7. q2(x)h(x)}. Moreover. then f(x) is called irreducible. Clearly any polynomial of degree 1 is irreducible.6. Thus (7) becomes g(x) = qi(x)(f>(x)g(x) + c 2 (x)0(x)a(x) = 0(x) \qi(x)g(x) + So 0(x) divides g(x). Observe that whether a polynomial is irreducible depends on the field F from which its coefficients come. Both of these problems are affected by the factorization of a certain polynomial determined by T (the characteristic polynomial ofT). A polynomial f(x) with coefficients from a field F is called monic if its leading coefficient is 1. but it is not irreducible over the field of complex numbers since x 2 + 1 = (x + i)(x — i). Exercise. Multiplying both sides of this equation by g(x) yields g(x) = ai(x)0(x)p(x) + a 2 (x)/(x)o(x). then 0(x) and f(x) are relatively prime. 1 (7) = . we are concerned with determining when a linear operator T on a finite-dimensional vector space can be diagonalized and with finding a simple (canonical) representation of T. I Theorem E. Then 0(x) and f(x) are relatively prime by Theorem E.

r) • • • (t)n(x) = [0l(x)0 2 (x) • • • 0„_i(x)] 0 n (x).568 Appendices Corollary. For n = 1. We are now able to establish the unique factorization theorem. If 0(x) divides 0l(-r)0 2 (. . The. Let 0(x). . induction hypothesis . Since 0(x) is an irreducible monic polynomial. 0(x) = 0i(x) for some i (i = 1 . .aix + a 0 for some constants a* with a n ^ 0. . Suppose then that for some n > 1. Proof. If f(x) is not irreducible. Theorem E. .7. unique distinct irreducible monic polynomials 0i(. We prove the corollary by mathematical induction on n..8. . <f>n(x) be irreducible monic polynomials. we have f(x) — a0(x). . nk such that f(x) = <iMx)Y'i[Mx)}n2---[ct>kix)}nk. 0(x) = <f>n(x) by Theorem E. degree is uniquely expressible as a. there exist a unique constant c.9 (Unique Factorization Theorem for Polynomials). . the result is an immediate. 0 2 ( x ) . 0 n (x) be n irreducible polynomials. consequence of Theorem E.r). . 4>k(x). . each of positive degree less than n. and let f(x) be a polynomial of degree n. 0 2 ( x ) . . . and unique positive integers n\. . 2 . . then 0(x) = 0i(x) for some i (i — 1 . We begin by showing the existence of such a factorization using mathematical induction on the degree of fix). In the first case. If f(x) is of degree 1. . then 0(x) divides the product 0i(x)0 2 (x) • • • 0„_i (x) or 0(x) divides 0 n (x) by Theorem E. Then f(x) = anxn H r. the result is proved in this case. then / W = a„(. the corollary is true for any n — 1 irreducible monie polynomials.r" \ + ^i:r»-'+. Proof. in the second case. For any polynomial fix) of positive degree. . . 0 2 ( x ) . then /(x) = g(x)h(x) for some polynomials g(x) and h(x). . If 0(x) divides the product 0i(x)0 2 (x) • • -0n(x). n). . . . Setting 0(x) = x + 6/a. Now suppose that the conclusion is true for any polynomial with positive degree less than some integer n > 1.7. then f(x) = ax + b for some constants a and 6 with a ^ 0. n 2 . This result states that every polynomial of positive. 0i (x). an + ^ + ^ ) an aTIJ is a representation of f(x) as a product of an and an irreducible monic polynomial. .. 2 . . constant times a product of irreducible monic polynomials. which is used throughout Chapters 5 and 7. n— 1) by the induction hypothesis. and let 0i (x). . If f(x) is irreducible.

k and j — 1 . 0i(x) divides the left side of (10) and hence divides the right side also. so that / and g are equal polynomial functions. But this contradicts that 0 i ( x ) . But f(a) = g(a) for all a G Z2. We conclude that r = k and that. each 0i(x) equals some ipj(x). . Consequently f(x) — g(x)h(x) also factors in this way. Let f(x) and g(x) be polynomials with coefficients from an infinite Geld F. we find that (8) becomes [0i ( x ) r [ 0 2 ( x ) p • • • [0fc(x)r = [Mx)}™1 [Mx)r> • • • IM*)}1 (9) So 0i(x) divides the right side of (9) for i = 1 . we may suppose that i = 1 and ni > mi. . hence c = d. Define h(x) = f(x) — g(x).Lmfortunately. Thus.™ ' [0 2 (x)]" 2 • • • [ 0 f c ( x ) r = [ 0 2 ( x ) p • • • [0fc(x)f (10) Since ni — mi > 0. In this case. the value of / at c G F is f(c) = ancn + \.k. . . It follows from Corollary 2 to . each ij)j(x) equals some 0»(x). . For example. . . if f(x) = x2 and g(x) = x are two polynomials over the field Z 2 (defined in Example 4 of Appendix C). . then f(x) and g(x) are equal. Suppose that n» ^ m-j for some i. . . . .Appendix E Polynomials 569 guarantees that both g(x) and h(x) factor as products of a constant and powers of distinct irreducible monic polynomials. k. we obtain [ 0 i ( x ) p . . for arbitrary fields there is not a one-to-one correspondence between polynomials and polynomial functions. then f(x) and g(x) have different degrees and hence are not equal as polynomials. in either case.2. .. If f(a) = g(a) for all a. k by the corollary to Theorem E. and similarly. 2 . .. 0i(x) = i^i(x) for i = 1. 1 It is often useful to regard a polynomial f(x) = anxn H h aix + ao with coefficients from a field F as a function f:F—*F. . Then by canceling [0i(x)] mi from both sides of (9). r. . Our final result shows that this anomaly cannot occur over an infinite field. So 0i(x) = 0i(x) for some i = 2 . Consequently. Suppose that / ( a ) = g(a) for all a G F. .0fc(x) are distinct. G F. 2 . Suppose that / ( x ) = c [ 0 i ( x ) r [02(x)p • • • [0fc(x)r = d[Mx)}mi[Mx)}m2---lMx)]n (8) where c and d are constants.j are positive integers for i = 1 . T h e o r e m E. by the corollary to Theorem E. . It remains to establish the uniqueness of such a factorization. Clearly both c and d must be the leading coefficient of /(x). . by renumbering if necessary. Proof. Dividing by c.. 0i(x) and i/)j(x) are irreducible monic polynomials.a\C + ao. and suppose that h(x) is of degree n > 1.8. Hence the factorizations of f(x) in (8) are the same. 0 2 ( x ) . and ni and m.8.10. . Without loss of generality. f(x) can be factored as a product of a constant and powers of distinct irreducible monic polynomials. 2 .

it follows that h(x) is the zero polynomial. But h(a) = f(a)-g(a) =0 Appendices for every a G F. Thus h(x) is a constant polynomial.570 Theorem E. contradicting the assumption that h(x) has positive degree. Hence f(x) = g(x).l that h(x) can have at most n zeroes. and since h(a) — 0 for each a G F. 1 .
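The long-division process that underlies the division algorithm of this appendix (Theorem E.1) can be sketched as follows. The code is not part of the text; the function name poly_divmod is illustrative, and the coefficients are taken from the rational or real numbers, where division by the leading coefficient of g(x) is always possible.

def poly_divmod(f, g):
    """Divide f by g, returning (q, r) with f = q*g + r and deg r < deg g.
    Polynomials are lists of coefficients, highest degree first."""
    if len(g) == 0 or g[0] == 0:
        raise ValueError("divisor must have a nonzero leading coefficient")
    q = [0] * max(len(f) - len(g) + 1, 1)
    r = list(f)
    while len(r) >= len(g) and any(r):
        # Cancel the leading term of r with a suitable multiple of g.
        factor = r[0] / g[0]
        shift = len(r) - len(g)
        q[len(q) - 1 - shift] = factor
        r = [a - factor * b for a, b in zip(r, g + [0] * shift)]
        r = r[1:]          # the leading coefficient of r is now zero
    return q, r

# Example: divide x^3 - 2x^2 + 4 by x - 1.
quotient, remainder = poly_divmod([1, -2, 0, 4], [1, -1])
print(quotient, remainder)   # [1.0, -1.0, -1.0] and [3.0]: q = x^2 - x - 1, r = 3

Over a field other than R or the rationals, the division r[0] / g[0] would be replaced by multiplication by the multiplicative inverse of the leading coefficient of g(x) in that field.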

2 1. 2. (VS 4) fails. -1) + s(-2.3 1.5 l \ 1 (e) (g) (5 6 7) 3 < 5y 8.1 1. 7. (a) x = (2. -10) 3.1. 14. -5. 7) + t(-5. No 17.0) + s(9. the set is not closed under addition. -2.15x 13.12.0) SECTION 1. .2) (c) x = (-8. and M22 = 5 6 3 2\ . No. 22. Only the pairs in (b) and (c) are parallel. 2 m n SECTION 1.4) + t(-8. No.9. (a) F 2 .0.0) + t(14. /8 20 -12N 4 (a) (C) ' 1-4 3 9 U 0 28 (e) 2x4 + x3 + 2x2 . M2i = 4. (a) Yes (c) Yes (e) No 11. ( a ) (b) F (c) T (d) F (e) T (c) ( " * J (f) F fj ( g)F (_2 ( _ i ) i the trace i s . -7. 2) + t(0.30x4 + 40x2 .9. (VS 5) fails. (a) T (b) F (c) F (d) F (e) T (f) F (g)F (h)F (i)T (j)T (k)T 3.A n s w e r s t o S e l e c t e d E x e r c i s e s CHAPTER 1 SECTION 1. 15. Yes 15.2.2x + 10 (g) 10rc7 . M13 = 3. Yes 571 . (a) x = (3. -3) (c) x = (3. No.

1 17. n 30.l ) 26. s G R} 3. -2. (a) T (b) F (c) T (d) F (e) T (f) F 2.0.0) + s(-3.0. (a) F (b) T (c) F (d) F (e) T (f)T 2. (a) Yes (c) Yes (e) No 5.x 3 + 2x2 + 4x .ua} 9. dim(W2) = 2.0.02. (a) linearly dependent (c) linearly independent (e) linearly dependent (g) linearly dependent.0.0) :r.1.1) + (5. (a) Yes (c) Yes (e) No 3. n2 . (a) F (b) F (c)F (d)T (e)T (f)T CHAPTER 2 SECTION 2.7 1.4.1.s<= R} (c) There are no solutions.1 1. dim(Wi + W2) = 4.U2.5 1.0) + s(-3. No 5.0. (a) {r( 1.4 1.572 Answers to Selected Exercises SECTION 1. -3.1.04) = aioi + (02 — Oi)«2 + (o. No 7. {u\.5 13. (a) -4x 2 . (a) T (b) F (c)F (d)T (e)F (f)F (g) T (h) F . dim(Wi) = 3.2. o)' \0 1 vo 11. (e) {r(10.x + 8 (c) .i)u4 10.6 1. and dim(Wi n W2) = 1 SECTION 1.1. 2" SECTION 1. | n ( n .5): r.0. (a) F (b) T (c) F (d) F (e)T (f)F (g)F (h) T (i) F (j) T (k)T (1)T 2.3.03.j — a2)«3 + (04 — a. (a) Yes (c) No (e) Yes (g) Yes SECTION 1.0. (a) No (c) Yes (e) No 4.1)} 15.0) + (-4. (a) Yes (c) No (e) No 4. (i) linearly independent T 0\ (0 0N 7. {(1. (oi.

and the rank is 3.11)..1 4 5 V i o i/ (g)(l 0 •-• 0 1) \ (~ 3 2 2 5. anc [UT]2 = 0 0 o) \2 0 6 4 -6 3.3) = (5. and the rank is 2. 3/ (e) / 1\ -2 0 V v 0 0 0 -•\0 0 0 ••• SECTION 2.v „•„ /23 19 A(BD) -(•£) CB = (27 7 9) 1 0 -1 /2 6 0\ 1 . T is not one-to-one but is onto. The nullity is 0. The nullity is 1. (a) F (b) T (g) F (h) F (e) F and (f) F 2.Answers to Selected Exercises 2.l = (e)T f)F / 0 '. T is one-to-one but not onto. (a) 3 4 (c) (2 1 -3) \i 0/ /0 0 ••• 0 1\ 0 0 ••• 1 0 (f) 0 1 ••• 0 0 \1 0 ••• 0 0/ 3. (a) /l 0 0 10.1 \ 2. (a)[T]^= l \ 1 4. 573 SECTION 2. The nullity is 4.3 1. 5. T(2. (a) T (b) T (c) F (d)T (2 . and the rank is 2.I 1\ (d) . (a)A(2£ + 3C7)=( 2 ° . \ (0 1 2 2 (b) 0 0 ^0 0 3 o\ 2 0 2/ 3 3 *-. (a) 4 < 6y /2 3 0\ /I 0 3 6 . T is one-to-one.2 1. m j = /-* -1\ 0 1 V I o/ / l 0 0 0\ 0 0 1 0 0 1 0 0 \0 0 0 1/ 1 0 ••• O N 1 1 ••• 0 0 1 ••• 0 1 1/ (c) F (i) T ^ 0\ (d)T (J)T i) and and [T]. 10. T is neither one-to-one nor onto. No. 4. 12. [U]J = 0 \ 0 0 4/ V (c) (5) { .

y) = SECTION 2.P2(x)}.574 12.5 7. (a) [T]0 = (d)F (d) No (d) No (e)T (e) No (b) No. where g(o + bx) = -3a . = ( z i .5 + x. ) (o[T]s=(-. 7. -f) . (a) No 19.z) = -x + z 5. (a) No. (a) F (g)T 2. (a) T'(f) = g.6 1. SECTION 2. where pi(x) = 2 .z) = x. (a)T(x. 2mx + (m2 . The functions in (a). (a) No 3. The basis for V is {pi(x). h(x.2x and p?(x) = . and (f) are linear functionals.z) = \y.y. Answers to Selected Exercises (f)F (f) Yes SECTION 2. (a) h(x.y.l)y) (b) T (c) T (d) T (e) F (f) T (g) T (h) F 2. (e).4 1.\y. 3. (c). and f3ix.m2)x + 2my.y.46 w v f . (a) F 1+m2 ((1 .

1 1.c) = (\a-b+\c)x + (-\a+\c)x +b ( 1 0 0\ (1 0 1 Oil i o l/lo 0 0\ / l 1 0 0 o i/\o 0 0\ (1 -2 OllO o 1/ \ o 2 0\/l 1 Olio o 1/ Vo 0 ° \ (1 0 1 1 0 10 1 0 .*. and the inverse is 6.e* cos 2t.b. ^ 1 . ( a H e ' 1 4 .l 1/ Vo o i . (e) The rank is 3. (a) T _ 1 (ax2 + bx + c) = -ax2 . -3- (c) The rank is 2. /0 0 1\ / l 0 0\ 3. (a) ( 0 1 0 (c) ( 0 1 0 ) \ 1 0 0/ \2 0 1/ (e) F (f) T 5. and so no inverse exists. e -4* „-2t CHAPTER 3 SECTION 3.-\ + \b+ §c) 1 2 (e) T~ (a.6.*} (c) {e'Ste . (a) F (b) T (b) F (c) F (c) T (d) F (d) T (e) T (e) F (f) F (g) T 575 3.*^} 4. and the inverse is (g) The rank is 4. (a) The rank is 2.(10a + 2b + c) (c) T" 1 (a.^ 2 . (a) T 2.Answers to Selected Exercises SECTION 2. and the inverse is I .7 1.^ / 2 } (e) {e~*.**. (a) {e-*. (a) T (g) T (b) F (h) F (c) T (i) T (d) F (e) T (f) F 2.e t sin2t} ( c ) { l .\a-\c.te .c) = (la-\b+\c. Adding —2 times column 1 to column 2 transforms A into B.(4a + b)x .

and carpenter must have incomes in the proportions 4 : 3 : 4 .8 units of the first commodity and 9. 11. There must be 7. s. . .4 1.1 {(1. T.576 1 3 -2 1 20.H)} = ^ V 0/ +t 2 9 1 9/ G - -2.3 1. The systems in parts (b). (a) (~3\ 1 1 V <v (b) F (c) T (d) F (e)F / 0 0 0 0 0 0 0\ 0 0 0 0 0 0 0 0/ Answers to Selected Exercises (f)F (g)T (h)F ( l (e) (g) -1 \ \ > V i) . (a) F (b) T (c) T (d) T (e) F (f) T (g) T •L .2 V 0 1 SECTION 3.= 2 3 6.1 : t G /? 7. (a) +t -3' tGR 1 (1\ (~2\ (3\ (-l\ 0 1 0 0 + r + .4. t G It 1 ^w V oy ^ 1/ w (g) 0 0 + r w 1 1 : r. (b) (1) . 3. SECTION 3. The farmer. tailor.5 units of the second. (a) F 2. (a) 1 0 0 . (c).s. 13. G /? 0 ' 3N (2) 4. and (d) have solutions.s +t (e) 0 1 0 0 : r.

0). (a) F 2.1 . 0 (c)T 7.2 .«2.1.8 (c) -24 (c) 14 (c) F (d) F (e) T 3. s G . . (1.7 V 3 1 1 0 -9/ 7.0.0.0.2 8 -i .0. (a) ^ 3 + t : te R 1 0 \ z) .Answers to Selected Exercises /4\ (f\ 0 2.s £ R\ (g) < 9 0 9 \o) V <V I l) 1 s /0\ ( 2\ ( l\ 1 2 0 -4 0 + r 1 + s 0 : r.s e R 2 w 577 (c) There are no solutions. (a) -10 + 15i 19. -12 17. 95 (f) F (g) F (h) T (b) T (c) . . (-2. CHAPTER 4 SECTION 4.0.0.1 3 .0.2 1.0.1.0.0). 42 13. 22 (e)F 11. (b) {(1.«5} 11.0.ff > (i) -1 -2 0 J l °y V V VV f 3 ( l\ 1 -1 4. (a) 19 SECTION 4.0. (1. (a) 30 4. -12 15.1 1. {«i.0).1.3 (c) (e) H 1 [ w w t /-23\ /A /-23\ 1 0 0 7 + r 0 +s 6 : r.3 2 21.1.0. (b) {(1. . (a) F 3.-2.0).0. (2. -49 (d)T 9. ( .1.1)} ^ /1\ U +8 : r.0.1.1.1.1.0).2.0. (0.8 (b) T 5. (-3.1)} 13. I w / 1 0 2 1 4\ 5.0). (a) .1.2.

„ (\ an<1 D= y v v r " vs v v) 3 The eigenvalues are 1 and —1. No (e) F (f)T CHAPTER 5 SECTION 5. 16) 24.1 0 0 0 0 -1) The eigenvalues are 4 and — 1.l —1/1 VI —i -l—i/ \Q (b) T (h) T (c) T (i) T (d) F (j) F (e) 3. (a) (e) F (f) F (k) F (-1 0 0N (c) [T] = 0 1 0 3 [ T k 3 = ( _ $ o ) ' "° yes V 0 0 -1. — 11 \ . (a) T (b) T (c) T (d) F (e) F T (h) K (i) T (j) T (k) T 22 (c) 2 . a basis of eigenvectors is 1 M „ / 1 1 .-3. (a) F 3. Yes (d) F 9.5 1.i (g) 95 (f)T (g) F (h) F -6N 37 -16. (a) (c) -1 . (a) 3. (-20.-48.£ : An \ 0 0 -8 18 28 -3i 0 0 (e) 4 -1 + i 0 (g) -20 -21 14 10+16i . a basis of eigenvectors is '2\ ( l M ^ (2 L\ . (4.4 1.0) 5. tn +ari-it' H +tti< + ao /10 0 -An (c) 0 -20 26 . (a) 4. •. „ i .3 0 (c) -49 (e) -28 .<•> ( . (0. o\ -1 1 0 0 0 1 1 [TJ no 0 0 . SECTION 4. and D = . 48 SECTION 4. (a) (g) 2.3 i 3 + 3i. (a) F (b) T (c) F (d) T (e) F (f) T 3.578 SECTION 4. (a) F (g) F 2.1 1.-12.4i -12 (c) -12 (e) 22 (g) . No (b) T 5.5 . Yes (c) T 7.-8) 7.3 Answers to Selected Exercises 1.

.1 . (a) F (g) T (b) F (h) T (c) F (i) F (d) T (c) Q = I / (g) Q = (e) T J (f)F /? = {(3.5). (a) (b) T (h)F 0 0 0 0 (c) (c) F (i)F (d) F (j)T (e) T C I + C 2 0 [ 2ti + c3e l /-I 0 -1\ -4 1 -2 \ 2 0 2/ 6.l .1).1 .8 + x 3 . x + x 2 } 7 4 « _ l / 5 n n + 2 ( . 41% are bedridden. (1. and 14% have died. -1)} (d) (3 = {x .2 1.2)} /? = {(1.4 (b) A = -1.x2. 20% are ambulatory. (a) T (g)T 2.(2.2. 25% of the patients have recovered.2 ^ _ 1 1 ) (c)x(t)=e' SECTION 5.0.^ + c 2 e . 5 26.0).2 (f) A = 1. . l . (a) Not diagonalizable (e) Not diagonalizable 3. (a) A = 3.x2.l o j ' l o .l / ' l l 0 579 0 0 1 0 0 -1 0 1 1 0 0 1 2.(1.1. (g) .3 (h) A = -1.1 (i) A = 1.3 1. . (b)x(t) = c 1 e 3 ^ . x} -1 0\ /0 l \ / l 0 0= o 1/ ' \ o oy ' [o i 1 0\ /0 l \ (-1 0 (3 = i o) ' vo iy ' v i o 0 A (1 0\ / 0 1 (3 = .-1).inr 3\5 -(-l) 2(5" )-2(-l)" 2(5)n + ( .-1)} (3 = {-2 + x.l ) n 14.1.1 (j)A = .(1. One month after arrival.1. (a) Not diagonalizable i i i\ 2 -1 0 V-l 0 -l) (c) Not diagonalizable (e) (3 = {(1. 4 SECTION 5.~ Answers to Selected Exercises 4.1. -4 + x2.x .-1. Eventually | | recover and |£ die.

SECTION 5.3
1. (a) F (b) T (c) F (d) F (e) T (f) T
e^O = I and e^I = eI.
The two-stage and limiting probability vectors are listed for each transition matrix; typical two-stage entries are 0.225, 0.334, and 0.441, each followed by the vector of eventual probabilities.
Only the matrices in (a) and (b) are regular transition matrices.
In 1995, 24% will own large cars, 34% will own intermediate-sized cars, and 42% will own small cars; the corresponding eventual percentages are 10%, 30%, and 60%.
Eventually fixed fractions of the stock are new, once-used, and twice-used.
(c) No limit exists.

SECTION 5.4
The subspaces in (a), (c), and (d) are T-invariant.
The characteristic polynomials of the restrictions factor into terms such as t, t - 1, and t^2 - 3t + 3.
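The Section 5.3 answers report two-stage and limiting probabilities for regular transition matrices. A sketch of the corresponding computation, with an illustrative column-stochastic matrix and initial distribution rather than the data from the exercises:

import numpy as np

# Illustrative regular transition matrix (columns sum to 1); the car-ownership
# and stock examples in the text have this structure but different entries.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
p0 = np.array([0.5, 0.3, 0.2])             # initial distribution

print(np.linalg.matrix_power(P, 2) @ p0)   # distribution after two stages

# The limiting distribution is the probability eigenvector for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
print(v / v.sum())                         # eventual distribution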

The remaining characteristic polynomials are (a) t² - 6t + 6 and (c) -(t + 1)(t² - 6t + 6).

CHAPTER 6

SECTION 6.1
1. (a) F (b) T (c) T (d) F (e) T (f) F (g) T
2. ⟨x, y⟩ = 8 + 5i, ||x|| = √7, ||y|| = √14, and ||x + y|| is computed directly; the norms ||f||, ||g||, and ||f + g|| for the integral inner product are also listed.

SECTION 6.2
1. (a) T (b) T (c) F (d) F (e) F (f) F (g) F
2. For each part the orthonormal basis and the Fourier coefficients are given. The bases come from the Gram-Schmidt process; for example, (c) is {1, 2√3(x - 1/2), 6√5(x² - x + 1/6)}, and the other parts involve normalizing factors such as 1/√5 and 1/√10.
5. (b) No
S0⊥ is the plane through the origin that is perpendicular to x0, and S⊥ is the line through the origin that is perpendicular to the plane containing x1 and x2.
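The Section 6.2 answers list orthonormal bases produced by the Gram-Schmidt process together with Fourier coefficients. The following sketch carries out both steps for the standard inner product on R³, with illustrative input vectors rather than those of the exercises.

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (standard inner product)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)
        basis.append(w / np.linalg.norm(w))
    return basis

# Illustrative input vectors (not the ones from the exercises).
vectors = [np.array([1.0, 1.0, 0.0]),
           np.array([1.0, 0.0, 1.0]),
           np.array([0.0, 1.0, 1.0])]
basis = gram_schmidt(vectors)

# Fourier coefficients of a vector x relative to the orthonormal basis.
x = np.array([2.0, 1.0, 3.0])
coeffs = [float(np.dot(x, u)) for u in basis]
print(coeffs, np.allclose(sum(c * u for c, u in zip(coeffs, basis)), x))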

SECTION 6.3
1. (a) T (b) F (c) F (d) T (e) F (f) T (g) T
2. (a) T*(x) = (11, -12) (c) T*(f(t)) = 12 + 6t; for T(x) = ⟨x, y⟩z the adjoint is T*(x) = ⟨x, z⟩y.
3. (a) The linear function is y = -2t + 5/2 with E = 1, and the quadratic function is y = t²/3 - 4t/3 + 2 with E = 0. (b) The linear function is y = 1.25t + 0.55 with E ≈ 0.22857, and the quadratic function is t²/56 + 15t/14 + 239/280. (c) y = 210x² - 204x + 33.
The spring constant is approximately 2.1.

SECTION 6.4
1. (a) T (b) F (c) F (d) T (e) T (f) T (g) F (h) T
2. (a) T is self-adjoint; an orthonormal basis of eigenvectors is given, with corresponding eigenvalues 6 and 1. (c) T is normal but not self-adjoint. (e) T is self-adjoint.
T_z is normal for every z ∈ C; it is self-adjoint if and only if z is real, and unitary if and only if |z| = 1.

SECTION 6.5
1. (a) T (b) F (c) F (g) F (h) F (j) F
Only the pair of matrices in (d) are unitarily equivalent.
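The least-squares answers in Section 6.3 give the line y = at + b minimizing the error E = Σ(y_i - a t_i - b)². A hedged sketch of that computation with illustrative data (not the data of the exercises):

import numpy as np

# Illustrative data points (t_i, y_i); not the data from the exercises.
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Least-squares line y = a t + b.
A = np.column_stack([t, np.ones_like(t)])
(a, b), residual, *_ = np.linalg.lstsq(A, y, rcond=None)
E = float(np.sum((y - (a * t + b)) ** 2))
print(f"y = {a:.4f} t + {b:.4f},  E = {E:.5f}")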

The new quadratic forms after the rotations of coordinates are 3(x')² - (y')² and, in part (c), 5(x')² - 8(y')².

SECTION 6.6
1. (a) F (b) F (c) T (d) T (e) F (f) F (g) T
The two orthogonal projections in the first part are ½(a + b, a + b) and ½(a - b, -a + b); in the R³ part they are ⅓(a + b + c, a + b + c, a + b + c) and ⅓(2a - b - c, -a + 2b - c, -a - b + 2c).

SECTION 6.7
1. (a) F (b) T (c) T (d) F (e) F
In each part the singular values (values such as √5, √3, and 2 occur) and the corresponding orthonormal vectors u_i and v_i are listed; in the function-space part the singular vectors are multiples of sin x and cos x. One exercise uses W = span({(1, 2)}).
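The Section 6.7 answers list singular values and singular vectors. They can be checked numerically as below; the matrix is illustrative, not one from the exercises.

import numpy as np

# Illustrative matrix (not from the exercises).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print("singular values:", s)                      # here sqrt(3) and 1
print(np.allclose(U @ np.diag(s) @ Vt[:2, :], A)) # A = U Sigma V^t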

i and 1 N -0.4 0 0 0 0 0 5. (a) (c) 0 0 0 0 0 0 0 0 \ . (a) and (b) 18.z 5. (a) T (x.25 2 .-1)} (c) Zx = N(T) l = V and Z2 = R(T) .8 1.V 8.584 i l v/lll (c) I v T T ) V-2* ~4U (e) 2 1-t 2 1 V'2 I V2 o 0 0 1 73 _u\ 2 vTo 2 yio 1 Answers to Selected Exercises v2 y/2 2 1 -i i±i >/« 0 ^6 2 0 0 l+i 2 -I 4-i 2 1 J _ \ / s/&+s/2 -N/S+V/2 2 4.1.(0.2 0 .8 0/ \1 0 0 1/ 17.75.1). (c) /1 > V2 0 U J M 1 » w V2 0 V~3j/ /0 0 (c) Q = I 0 1 \1 0 and . (a) WP = V2 75 \ / 2 1 1 J 1 -V§+^2 2 75 x N/2/ \ 2 f +U +z y. D= -1 0 0 0 0 N 4 0 0 6. (a) Yes (b) No (c) No (d) Yes (e) Yes (f) No / 0 0 0 O N f\ 0 0 l \ 1 0 . (a) No solution 2 VI SECTION 6. (a) Q= (i 0 (b) Q = 1 ""?! -* i.y.2) = (c) T^a + 6 sin a: + ccosx) = T" a 6sinx + ccosx) = a (26 + c) sin re + (— 6 + 2c) cos x 2+" "I" -2 3 1 I -i o > ! " 3 -2 1 I -l w S C r * « S G 1 1 +i i 7. (a) F (b) K (c) T (d) F (e) T (f) F (g)F (h)F (i)T (j)F 4. (a) Zi = N(T) 1 = R 2 and Z2 = R(T) = span{(l. Same as Exercise 17(c) 22. 1.

SECTION 6.9
The answers are expressed in terms of the factor √(1 - v²) appearing in the Lorentz transformation matrices.

SECTION 6.10
1. (a) F (b) F (c) T (d) F (e) F
2. (a) √18 (c) approximately 2.34
3. ||A|| ≈ 84.34, ||A⁻¹|| ≈ 17, and cond(A) ≈ 1441; also ||x - A⁻¹b|| ≤ ||A⁻¹|| · ||Ax - b||, and the relative error satisfies 0.001 ≤ ||δx|| / ||x|| ≤ 10 by the bounds involving cond(A).

SECTION 6.11
1. (a) F (b) T (c) T (d) F (e) T (f) F (g) F (h) F (i) T (j) F
3. (c) There are six possibilities for the axis, depending on φ and ψ: any line through the origin if φ = ψ = 0; particular lines determined by cos and sin of the angles in the degenerate cases φ = 0 with ψ = π, φ = π with ψ ≠ π, ψ = π with φ ≠ π, and φ = ψ = π; and otherwise the line spanned by (sin φ(cos ψ + 1), -sin φ sin ψ, sin ψ(cos φ + 1)).
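The Section 6.10 answers report ||A||, ||A⁻¹||, cond(A), and relative-error bounds. The sketch below computes a condition number and checks the two-sided bound on the relative error for an illustrative matrix, not the matrix of the exercises.

import numpy as np

# Illustrative invertible matrix (not the one from the exercises).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])
x = np.linalg.solve(A, b)

cond = np.linalg.cond(A)                  # spectral condition number
print("cond(A) =", cond)

# Perturb b and compare relative errors, which satisfy
#   (1/cond) * ||db||/||b|| <= ||dx||/||x|| <= cond * ||db||/||b||.
db = np.array([0.0, 1e-3])
dx = np.linalg.solve(A, b + db) - x
ratio = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
print(1 / cond <= ratio <= cond)          # True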

CHAPTER 7

SECTION 7.1
1. (a) T (b) F (f) F (g) T (h) T
2. (a) For λ = 2 a basis of the generalized eigenspace is given; (c) the bases for λ = 1 and λ = -1 are given similarly.
3. (a) For λ = 2 the corresponding Jordan canonical basis is listed.

SECTION 7.2
1. (a) T (b) T (c) F (d) T (e) T (f) F (g) F (h) T
2. (a) J = A1 ⊕ A2 ⊕ A3, where A1 is a 4 x 4 Jordan matrix with eigenvalue 2, A2 is the 4 x 4 Jordan matrix (4 1 0 0; 0 4 1 0; 0 0 4 0; 0 0 0 4), and A3 is a 1 x 1 block.
3. (a) λ1 = 2 (c) λ2 = 3 (d) p1 = 3 and p2 = 1
   (e) (i) rank(U1) = 3 and rank(U2) = 0 (ii) rank(U1²) = 1 and rank(U2²) = 0 (iii) nullity(U1) = 2 and nullity(U2) = 2 (iv) nullity(U1²) = 4 and nullity(U2²) = 2
4. (d) J = (2 1 0 0; 0 2 1 0; 0 0 2 0; 0 0 0 4), with Jordan canonical bases such as β = {2e^t, 2te^t, t²e^t, e^{2t}} for the function-space examples.
5. The Jordan canonical form J and a matrix Q whose columns form a Jordan canonical basis are given explicitly in each part.
The general solutions of the systems of differential equations are combinations such as (c1 + c2 t)v1 + c2 v2 and (c1 + c2 t + c3 t²)v1 + (c2 + 2c3 t)v2 + 2c3 v3 for the indicated vectors v_i.

SECTION 7.3
1. (a) F (b) T (c) F (d) F (e) T (f) F (g) F (h) T (i) T
2. (c) (t - 1)²(t - 2)²
3. (a) t - 2 (c) (t - 1)(t - 2) (d) (t - 1)(t + 1)
4. (a) and (d)
The operators are T0, I, and all operators having both 0 and 1 as eigenvalues.

SECTION 7.4
1. (a) T (b) F
The rational canonical forms C are direct sums of companion matrices of the invariant factors, together with a basis β (one basis consists of 1, x, -2x + x², and a cubic polynomial) and the corresponding change of coordinate matrix P.
(c) For the invariant factor t² - t + 1, C is the direct sum of copies of the companion matrix (0 -1; 1 1).
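The Jordan canonical form answers of Section 7.2 can be checked exactly with a computer algebra system. A minimal sketch using sympy and an illustrative 2 x 2 matrix (not one of the exercises):

from sympy import Matrix

# Illustrative matrix (not one of the exercises).
A = Matrix([[3, 1],
            [-1, 1]])

P, J = A.jordan_form()       # A = P * J * P**-1
print(J)                     # Matrix([[2, 1], [0, 2]]): a single Jordan block
print(P * J * P.inv() == A)  # True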

526 dual. 443 signature. 554 Canonical form Jordan. 335 Annihilator of a subset. 524. 433-435 vector space. 483-516 rational. 424-428 product with a scalar. 43 standard basis for P n (P). 41.Index 589 I n d e x Absolute value of a complex number. 389 Augmented matrix. 131. 446 Cauchy-Schwarz inequality. 482. 526-548 for a symmetric matrix. 444 sum. 204 Associated quadratic form. 112115 Characteristic of a field. 23. 423 rank. 42. 444 invariants. 346-347. see Mu