LINEAR ALGEBRA

STEPHEN H. FRIEDBERG

ARNOLD J. INSEL

LAWRENCE E. SPENCE

List of Symbols

A_ij            the ij-th entry of the matrix A
A^{-1}          the inverse of the matrix A
A^†             the pseudoinverse of the matrix A
A*              the adjoint of the matrix A
Ã_ij            the matrix A with row i and column j deleted
A^t             the transpose of the matrix A
(A|B)           the matrix A augmented by the matrix B
B_1 ⊕ ··· ⊕ B_k   the direct sum of matrices B_1 through B_k
B(V)            the set of bilinear forms on V
β*              the dual basis of β
β_x             the T-cyclic basis generated by x
C               the field of complex numbers
C_i             the ith Gerschgorin disk
cond(A)         the condition number of the matrix A
C^n(R)          the set of functions f on R with f^(n) continuous
C^∞             the set of functions with derivatives of every order
C(R)            the vector space of continuous functions on R
C([0,1])        the vector space of continuous functions on [0,1]
C_x             the T-cyclic subspace generated by x
D               the derivative operator on C^∞
det(A)          the determinant of the matrix A
δ_ij            the Kronecker delta
dim(V)          the dimension of V
e^A             lim_{m→∞} (I + A + A²/2! + ··· + A^m/m!)
e_i             the ith standard vector of F^n
E_λ             the eigenspace of T corresponding to λ
F               a field
f(A)            the polynomial f(x) evaluated at the matrix A
F^n             the set of n-tuples with entries in a field F
f(T)            the polynomial f(x) evaluated at the operator T
F(S, F)         the set of functions from S to a field F
H               the space of continuous complex functions on [0, 2π]
I_n or I        the n × n identity matrix
I_V or I        the identity operator on V
K_λ             the generalized eigenspace of T corresponding to λ
K_φ             {x : (φ(T))^p(x) = 0 for some positive integer p}
L_A             the left-multiplication transformation by the matrix A
lim_{m→∞} A_m   the limit of a sequence of matrices
L(V)            the space of linear transformations from V to V
L(V, W)         the space of linear transformations from V to W
M_{m×n}(F)      the set of m × n matrices with entries in F
ν(A)            the column sum of the matrix A
ν_j(A)          the jth column sum of the matrix A
N(T)            the null space of T
nullity(T)      the dimension of the null space of T
O               the zero matrix
per(M)          the permanent of the 2 × 2 matrix M
P(F)            the space of polynomials with coefficients in F
P_n(F)          the polynomials in P(F) of degree at most n
φ_β             the standard representation with respect to the basis β
R               the field of real numbers
rank(A)         the rank of the matrix A
rank(T)         the rank of the linear transformation T
ρ(A)            the row sum of the matrix A
ρ_i(A)          the ith row sum of the matrix A
R(T)            the range of the linear transformation T

Contents

Preface                                                            ix

1 Vector Spaces                                                     1
  1.1  Introduction                                                 1
  1.2  Vector Spaces                                                6
  1.3  Subspaces                                                   16
  1.4  Linear Combinations and Systems of Linear Equations         24
  1.5  Linear Dependence and Linear Independence                   35
  1.6  Bases and Dimension                                         42
  1.7* Maximal Linearly Independent Subsets                        58
       Index of Definitions                                        62

2 Linear Transformations and Matrices                              64
  2.1  Linear Transformations, Null Spaces, and Ranges             64
  2.2  The Matrix Representation of a Linear Transformation        79
  2.3  Composition of Linear Transformations and Matrix
       Multiplication                                              86
  2.4  Invertibility and Isomorphisms                              99
  2.5  The Change of Coordinate Matrix                            110
  2.6* Dual Spaces                                                119
  2.7* Homogeneous Linear Differential Equations with Constant
       Coefficients                                               127
       Index of Definitions                                       145

3 Elementary Matrix Operations and Systems of Linear Equations    147
  3.1  Elementary Matrix Operations and Elementary Matrices       147
  3.2  The Rank of a Matrix and Matrix Inverses                   152
  3.3  Systems of Linear Equations – Theoretical Aspects          168
  3.4  Systems of Linear Equations – Computational Aspects        182
       Index of Definitions                                       198

4 Determinants                                                    199
  4.1  Determinants of Order 2                                    199
  4.2  Determinants of Order n                                    209
  4.3  Properties of Determinants                                 222
  4.4  Summary – Important Facts about Determinants               232
  4.5* A Characterization of the Determinant                      238
       Index of Definitions                                       244

5 Diagonalization                                                 245
  5.1  Eigenvalues and Eigenvectors                               245
  5.2  Diagonalizability                                          261
  5.3* Matrix Limits and Markov Chains                            283
  5.4  Invariant Subspaces and the Cayley-Hamilton Theorem        313
       Index of Definitions                                       328

6 Inner Product Spaces                                            329
  6.1  Inner Products and Norms                                   329
  6.2  The Gram-Schmidt Orthogonalization Process and
       Orthogonal Complements                                     341
  6.3  The Adjoint of a Linear Operator                           357
  6.4  Normal and Self-Adjoint Operators                          369
  6.5  Unitary and Orthogonal Operators and Their Matrices        379
  6.6  Orthogonal Projections and the Spectral Theorem            398
  6.7* The Singular Value Decomposition and the Pseudoinverse     405
  6.8* Bilinear and Quadratic Forms                               422
  6.9* Einstein's Special Theory of Relativity                    451
  6.10* Conditioning and the Rayleigh Quotient                    464
  6.11* The Geometry of Orthogonal Operators                      472
        Index of Definitions                                      480

7 Canonical Forms                                                 482
  7.1  The Jordan Canonical Form I                                482
  7.2  The Jordan Canonical Form II                               497
  7.3  The Minimal Polynomial                                     516
  7.4* The Rational Canonical Form                                524
       Index of Definitions                                       548

Appendices                                                        549
  A  Sets                                                         549
  B  Functions                                                    551
  C  Fields                                                       552
  D  Complex Numbers                                              555
  E  Polynomials                                                  561

Answers to Selected Exercises                                     571

Index                                                             589

Sections denoted by an asterisk are optional.


Preface

The language and concepts of matrix theory and, more generally, of linear algebra have come into widespread usage in the social and natural sciences, computer science, and statistics. In addition, linear algebra continues to be of great importance in modern treatments of geometry and analysis.

The primary purpose of this fourth edition of Linear Algebra is to present a careful treatment of the principal topics of linear algebra and to illustrate the power of the subject through a variety of applications. Our major thrust emphasizes the symbiotic relationship between linear transformations and matrices. However, where appropriate, theorems are stated in the more general infinite-dimensional case. For example, this theory is applied to finding solutions to a homogeneous linear differential equation and the best approximation by a trigonometric polynomial to a continuous function.

Although the only formal prerequisite for this book is a one-year course in calculus, it requires the mathematical sophistication of typical junior and senior mathematics majors. This book is especially suited for a second course in linear algebra that emphasizes abstract vector spaces, although it can be used in a first course with a strong theoretical emphasis.

The book is organized to permit a number of different courses (ranging from three to eight semester hours in length) to be taught from it. The core material (vector spaces, linear transformations and matrices, systems of linear equations, determinants, diagonalization, and inner product spaces) is found in Chapters 1 through 5 and Sections 6.1 through 6.5. Chapters 6 and 7, on inner product spaces and canonical forms, are completely independent and may be studied in either order. In addition, throughout the book are applications to such areas as differential equations, economics, geometry, and physics. These applications are not central to the mathematical development, however, and may be excluded at the discretion of the instructor.

We have attempted to make it possible for many of the important topics of linear algebra to be covered in a one-semester course. This goal has led us to develop the major topics with fewer preliminaries than in a traditional approach. (Our treatment of the Jordan canonical form, for instance, does not require any theory of polynomials.) The resulting economy permits us to cover the core material of the book (omitting many of the optional sections and a detailed discussion of determinants) in a one-semester four-hour course for students who have had some prior exposure to linear algebra.

Chapter 1 of the book presents the basic theory of vector spaces: subspaces, linear combinations, linear dependence and independence, bases, and dimension. The chapter concludes with an optional section in which we prove


that every infinite-dimensional vector space has a basis.

Linear transformations and their relationship to matrices are the subject of Chapter 2. We discuss the null space and range of a linear transformation, matrix representations of a linear transformation, isomorphisms, and change of coordinates. Optional sections on dual spaces and homogeneous linear differential equations end the chapter.

The application of vector space theory and linear transformations to systems of linear equations is found in Chapter 3. We have chosen to defer this important subject so that it can be presented as a consequence of the preceding material. This approach allows the familiar topic of linear systems to illuminate the abstract theory and permits us to avoid messy matrix computations in the presentation of Chapters 1 and 2. There are occasional examples in these chapters, however, where we solve systems of linear equations. (Of course, these examples are not a part of the theoretical development.) The necessary background is contained in Section 1.4.

Determinants, the subject of Chapter 4, are of much less importance than they once were. In a short course (less than one year), we prefer to treat determinants lightly so that more time may be devoted to the material in Chapters 5 through 7. Consequently we have presented two alternatives in Chapter 4: a complete development of the theory (Sections 4.1 through 4.3) and a summary of important facts that are needed for the remaining chapters (Section 4.4). Optional Section 4.5 presents an axiomatic development of the determinant.

Chapter 5 discusses eigenvalues, eigenvectors, and diagonalization. One of the most important applications of this material occurs in computing matrix limits. We have therefore included an optional section on matrix limits and Markov chains in this chapter even though the most general statement of some of the results requires a knowledge of the Jordan canonical form. Section 5.4 contains material on invariant subspaces and the Cayley-Hamilton theorem.

Inner product spaces are the subject of Chapter 6. The basic mathematical theory (inner products; the Gram-Schmidt process; orthogonal complements; the adjoint of an operator; normal, self-adjoint, orthogonal, and unitary operators; orthogonal projections; and the spectral theorem) is contained in Sections 6.1 through 6.6. Sections 6.7 through 6.11 contain diverse applications of the rich inner product space structure.

Canonical forms are treated in Chapter 7. Sections 7.1 and 7.2 develop the Jordan canonical form, Section 7.3 presents the minimal polynomial, and Section 7.4 discusses the rational canonical form.

There are five appendices. The first four, which discuss sets, functions, fields, and complex numbers, respectively, are intended to review basic ideas used throughout the book. Appendix E on polynomials is used primarily in Chapters 5 and 7, especially in Section 7.4. We prefer to cite particular results from the appendices as needed rather than to discuss the appendices


independently.

The dependencies among the various chapters are as follows: Chapter 1 is prerequisite to Chapter 2, which is prerequisite to Chapter 3, which is prerequisite to Sections 4.1-4.3 or Section 4.4. These are prerequisite to Sections 5.1 and 5.2, which in turn are prerequisite both to Chapter 6 and to Section 5.4, and Section 5.4 is prerequisite to Chapter 7.

One final word is required about our notation. Sections and subsections labeled with an asterisk (*) are optional and may be omitted as the instructor sees fit. An exercise accompanied by the dagger symbol (†) is not optional, however; we use this symbol to identify an exercise that is cited in some later section that is not optional.

DIFFERENCES BETWEEN THE THIRD AND FOURTH EDITIONS

The principal content change of this fourth edition is the inclusion of a new section (Section 6.7) discussing the singular value decomposition and the pseudoinverse of a matrix or a linear transformation between finite-dimensional inner product spaces. Our approach is to treat this material as a generalization of our characterization of normal and self-adjoint operators.

The organization of the text is essentially the same as in the third edition. Nevertheless, this edition contains many significant local changes that improve the book.


Section 5.1 (Eigenvalues and Eigenvectors) has been streamlined, and some material previously in Section 5.1 has been moved to Section 2.5 (The Change of Coordinate Matrix). Further improvements include revised proofs of some theorems, additional examples, new exercises, and literally hundreds of minor editorial changes.

We are especially indebted to Jane M. Day (San Jose State University) for her extensive and detailed comments on the fourth edition manuscript. Additional comments were provided by the following reviewers of the fourth edition manuscript: Thomas Banchoff (Brown University), Christopher Heil (Georgia Institute of Technology), and Thomas Shemanske (Dartmouth College).

To find the latest information about this book, consult our web site on the World Wide Web. We encourage comments, which can be sent to us by e-mail or ordinary post. Our web site and e-mail addresses are listed below.

    web site: http://www.math.ilstu.edu/linalg
    e-mail:   linalg@math.ilstu.edu

Stephen H. Friedberg
Arnold J. Insel
Lawrence E. Spence

1  Vector Spaces

1.1  Introduction
1.2  Vector Spaces
1.3  Subspaces
1.4  Linear Combinations and Systems of Linear Equations
1.5  Linear Dependence and Linear Independence
1.6  Bases and Dimension
1.7* Maximal Linearly Independent Subsets

1.1  INTRODUCTION

Many familiar physical notions, such as forces, velocities,¹ and accelerations, involve both a magnitude (the amount of the force, velocity, or acceleration) and a direction. Any such entity involving both magnitude and direction is called a "vector." A vector is represented by an arrow whose length denotes the magnitude of the vector and whose direction represents the direction of the vector. In most physical situations involving vectors, only the magnitude and direction of the vector are significant; consequently, we regard vectors with the same magnitude and direction as being equal irrespective of their positions. In this section the geometry of vectors is discussed. This geometry is derived from physical experiments that test the manner in which two vectors interact.

Familiar situations suggest that when two like physical quantities act simultaneously at a point, the magnitude of their effect need not equal the sum of the magnitudes of the original quantities. For example, a swimmer swimming upstream at the rate of 2 miles per hour against a current of 1 mile per hour does not progress at the rate of 3 miles per hour, for in this instance the motions of the swimmer and current oppose each other, and the rate of progress of the swimmer is only 1 mile per hour upstream. If, however, the swimmer is moving downstream (with the current), then his or her rate of progress is 3 miles per hour downstream.

¹ The word velocity is being used here in its scientific sense, as an entity having both magnitude and direction. The magnitude of a velocity (without regard for the direction of motion) is called its speed.

Experiments show that if two like quantities act together, their effect is predictable. In this case, the vectors used to represent these quantities can be combined to form a resultant vector that represents the combined effects of the original quantities. This resultant vector is called the sum of the original vectors, and the rule for their combination is called the parallelogram law. (See Figure 1.1.)

Parallelogram Law for Vector Addition. The sum of two vectors x and y that act at the same point P is the vector beginning at P that is represented by the diagonal of the parallelogram having x and y as adjacent sides.

Figure 1.1

Since opposite sides of a parallelogram are parallel and of equal length, the endpoint Q of the arrow representing x + y can also be obtained by allowing x to act at P and then allowing y to act at the endpoint of x. Similarly, the endpoint of the vector x + y can be obtained by first permitting y to act at P and then allowing x to act at the endpoint of y. Thus two vectors x and y that both act at the point P may be added "tail-to-head"; that is, either x or y may be applied at P and a vector having the same magnitude and direction as the other may be applied to the endpoint of the first. If this is done, the endpoint of the second vector is the endpoint of x + y.

The addition of vectors can be described algebraically with the use of analytic geometry. In the plane containing x and y, introduce a coordinate system with P at the origin. Let (a1, a2) denote the endpoint of x and (b1, b2) denote the endpoint of y. Then as Figure 1.2(a) shows, the endpoint Q of x + y is (a1 + b1, a2 + b2). Henceforth, when a reference is made to the coordinates of the endpoint of a vector, the vector should be assumed to emanate from the origin. Moreover, since a vector beginning at the origin is completely determined by its endpoint, we sometimes refer to the point x rather than the endpoint of the vector x if x is a vector emanating from the origin.

Besides the operation of vector addition, there is another natural operation that can be performed on vectors: the length of a vector may be magnified

or contracted. This operation, called scalar multiplication, consists of multiplying the vector by a real number. If the vector x is represented by an arrow, then for any nonzero real number t, the vector tx is represented by an arrow in the same direction if t > 0 and in the opposite direction if t < 0. The length of the arrow tx is |t| times the length of the arrow x. Two nonzero vectors x and y are called parallel if y = tx for some nonzero real number t. (Thus nonzero vectors having the same or opposite directions are parallel.)

To describe scalar multiplication algebraically, again introduce a coordinate system into a plane containing the vector x so that x emanates from the origin. If the endpoint of x has coordinates (a1, a2), then the coordinates of the endpoint of tx are easily seen to be (ta1, ta2). (See Figure 1.2(b).)

Figure 1.2

The algebraic descriptions of vector addition and scalar multiplication for vectors in a plane yield the following properties:

1. For all vectors x and y, x + y = y + x.
2. For all vectors x, y, and z, (x + y) + z = x + (y + z).
3. There exists a vector denoted 0 such that x + 0 = x for each vector x.
4. For each vector x, there is a vector y such that x + y = 0.
5. For each vector x, 1x = x.
6. For each pair of real numbers a and b and each vector x, (ab)x = a(bx).
7. For each real number a and each pair of vectors x and y, a(x + y) = ax + ay.
8. For each pair of real numbers a and b and each vector x, (a + b)x = ax + bx.

Arguments similar to the preceding ones show that these eight properties, as well as the geometric interpretations of vector addition and scalar multiplication, are true also for vectors acting in space rather than in a plane. These results can be used to write equations of lines and planes in space.
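As a quick numerical check of the two coordinate formulas above (the vectors chosen here are arbitrary), let x and y emanate from the origin of a coordinate plane and end at (1, 3) and (4, 1), respectively. Then x + y ends at (1 + 4, 3 + 1) = (5, 4), the vector 2x ends at (2·1, 2·3) = (2, 6), and the vector (-1)x ends at (-1, -3), an arrow of the same length as x pointing in the opposite direction.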


Consider first the equation of a line in space that passes through two distinct points A and B. Let O denote the origin of a coordinate system in space, and let u and v denote the vectors that begin at O and end at A and B, respectively. If w denotes the vector beginning at A and ending at B, then "tail-to-head" addition shows that u + w = v, and hence w = v - u, where -u denotes the vector (-1)u. (See Figure 1.3, in which the quadrilateral OABC is a parallelogram.) Since a scalar multiple of w is parallel to w but possibly of a different length than w, any point on the line joining A and B may be obtained as the endpoint of a vector that begins at A and has the form tw for some real number t. Conversely, the endpoint of every vector of the form tw that begins at A lies on the line joining A and B. Thus an equation of the line through A and B is

    x = u + tw = u + t(v - u),

where t is a real number and x denotes an arbitrary point on the line. Notice also that the endpoint C of the vector v - u in Figure 1.3 has coordinates equal to the difference of the coordinates of B and A.

Figure 1.3

Example 1
Let A and B be points having coordinates (-2, 0, 1) and (4, 5, 3), respectively. The endpoint C of the vector emanating from the origin and having the same direction as the vector beginning at A and terminating at B has coordinates (4, 5, 3) - (-2, 0, 1) = (6, 5, 2). Hence the equation of the line through A and B is

    x = (-2, 0, 1) + t(6, 5, 2).  •

Now let A, B, and C denote any three noncollinear points in space. These points determine a unique plane, and its equation can be found by use of our previous observations about vectors. Let u and v denote vectors beginning at A and ending at B and C, respectively. Observe that any point in the plane containing A, B, and C is the endpoint S of a vector x beginning at A and having the form su + tv for some real numbers s and t. The endpoint of su is the point of intersection of the line through A and B with the line through S


parallel to the line through A and C. (See Figure 1.4.) A similar procedure locates the endpoint of tv. Moreover, for any real numbers s and t, the vector su + tv lies in the plane containing A, B, and C. It follows that an equation of the plane containing A, B, and C is

    x = A + su + tv,

where s and t are arbitrary real numbers and x denotes an arbitrary point in the plane.

Figure 1.4

Example 2
Let A, B, and C be the points having coordinates (1, 0, 2), (-3, -2, 4), and (1, 8, -5), respectively. The endpoint of the vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at B is (-3, -2, 4) - (1, 0, 2) = (-4, -2, 2). Similarly, the endpoint of a vector emanating from the origin and having the same length and direction as the vector beginning at A and terminating at C is (1, 8, -5) - (1, 0, 2) = (0, 8, -7). Hence the equation of the plane containing the three given points is

    x = (1, 0, 2) + s(-4, -2, 2) + t(0, 8, -7).  •

Any mathematical structure possessing the eight properties on page 3 is called a vector space. In the next section we formally define a vector space and consider many examples of vector spaces other than the ones mentioned above.

EXERCISES

1. Determine whether the vectors emanating from the origin and terminating at the following pairs of points are parallel.

   (a) (3, 1, 2) and (6, 4, 2)
   (b) (-3, 1, 7) and (9, -3, -21)
   (c) (5, -6, 7) and (-5, 6, -7)
   (d) (2, 0, -5) and (5, 0, -2)

2. Find the equations of the lines through the following pairs of points in space.
   (a) (3, -2, 4) and (-5, 7, 1)
   (b) (2, 4, 0) and (-3, -6, 0)
   (c) (3, 7, 2) and (3, 7, -8)
   (d) (-2, -1, 5) and (3, 9, 7)

3. Find the equations of the planes containing the following points in space.
   (a) (2, -5, -1), (0, 4, 6), and (-3, 7, 1)
   (b) (3, -6, 7), (-2, 0, -4), and (5, -9, -2)
   (c) (-8, 2, 0), (1, 3, 0), and (6, -5, 0)
   (d) (1, 1, 1), (5, 5, 5), and (-6, 4, 2)

4. What are the coordinates of the vector 0 in the Euclidean plane that satisfies property 3 on page 3? Justify your answer.

5. Prove that if the vector x emanates from the origin of the Euclidean plane and terminates at the point with coordinates (a1, a2), then the vector tx that emanates from the origin terminates at the point with coordinates (ta1, ta2).

6. Show that the midpoint of the line segment joining the points (a, b) and (c, d) is ((a + c)/2, (b + d)/2).

7. Prove that the diagonals of a parallelogram bisect each other.

1.2  VECTOR SPACES

In Section 1.1, we saw that, with the natural definitions of vector addition and scalar multiplication, the vectors in a plane satisfy the eight properties listed on page 3. Many other familiar algebraic systems also permit definitions of addition and scalar multiplication that satisfy the same eight properties. In this section, we introduce some of these systems, but first we formally define this type of algebraic structure.

Definitions. A vector space (or linear space) V over a field F (fields are discussed in Appendix C) consists of a set on which two operations (called addition and scalar multiplication, respectively) are defined so that for each pair of elements x, y,

in V there is a unique element x + y in V, and for each element a in F and each element x in V there is a unique element ax in V, such that the following conditions hold.

(VS 1) For all x, y in V, x + y = y + x (commutativity of addition).

(VS 2) For all x, y, z in V, (x + y) + z = x + (y + z) (associativity of addition).

(VS 3) There exists an element in V denoted by 0 such that x + 0 = x for each x in V.

(VS 4) For each element x in V there exists an element y in V such that x + y = 0.

(VS 5) For each element x in V, 1x = x.

(VS 6) For each pair of elements a, b in F and each element x in V, (ab)x = a(bx).

(VS 7) For each element a in F and each pair of elements x, y in V, a(x + y) = ax + ay.

(VS 8) For each pair of elements a, b in F and each element x in V, (a + b)x = ax + bx.

The elements x + y and ax are called the sum of x and y and the product of a and x, respectively.

The elements of the field F are called scalars and the elements of the vector space V are called vectors. The reader should not confuse this use of the word "vector" with the physical entity discussed in Section 1.1: the word "vector" is now being used to describe any element of a vector space.

A vector space is frequently discussed in the text without explicitly mentioning its field of scalars. The reader is cautioned to remember, however, that every vector space is regarded as a vector space over a given field, which is denoted by F. Occasionally we restrict our attention to the fields of real and complex numbers, which are denoted R and C, respectively.

Observe that (VS 2) permits us to define the addition of any finite number of vectors unambiguously (without the use of parentheses).

In the remainder of this section we introduce several important examples of vector spaces that are studied throughout this text. Observe that in describing a vector space, it is necessary to specify not only the vectors but also the operations of addition and scalar multiplication.

An object of the form (a1, a2, ..., an), where the entries a1, a2, ..., an are elements of a field F, is called an n-tuple with entries from F. The elements


a1, a2, ..., an are called the entries or components of the n-tuple. Two n-tuples (a1, a2, ..., an) and (b1, b2, ..., bn) with entries from a field F are called equal if ai = bi for i = 1, 2, ..., n.

Example 1
The set of all n-tuples with entries from a field F is denoted by F^n. This set is a vector space over F with the operations of coordinatewise addition and scalar multiplication; that is, if u = (a1, a2, ..., an) ∈ F^n, v = (b1, b2, ..., bn) ∈ F^n, and c ∈ F, then

    u + v = (a1 + b1, a2 + b2, ..., an + bn)  and  cu = (ca1, ca2, ..., can).

Thus R^3 is a vector space over R. In this vector space,

    (3, -2, 0) + (-1, 1, 4) = (2, -1, 4)  and  -5(1, -2, 0) = (-5, 10, 0).

Similarly, C^2 is a vector space over C. In this vector space,

    (1 + i, 2) + (2 - 3i, 4i) = (3 - 2i, 2 + 4i)  and  i(1 + i, 2) = (-1 + i, 2i).

Vectors in F^n may be written as column vectors

    ( a1 )
    ( a2 )
    ( ...)
    ( an )

rather than as row vectors (a1, a2, ..., an). Since a 1-tuple whose only entry is from F can be regarded as an element of F, we usually write F rather than F^1 for the vector space of 1-tuples with entry from F.  •

An m × n matrix with entries from a field F is a rectangular array of the form

    ( a11  a12  ...  a1n )
    ( a21  a22  ...  a2n )
    (  .    .         .  )
    ( am1  am2  ...  amn )

where each entry aij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is an element of F. We call the entries aij with i = j the diagonal entries of the matrix. The entries ai1, ai2, ..., ain compose the ith row of the matrix, and the entries a1j, a2j, ..., amj compose the jth column of the matrix. The rows of the preceding matrix are regarded as vectors in F^n, and the columns are regarded as vectors in F^m. The m × n matrix in which each entry equals zero is called the zero matrix and is denoted by O.
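To make the preceding terminology concrete, consider the 2 × 3 matrix with real entries (chosen arbitrarily here)

    ( 1  2  3 )
    ( 4  5  6 ).

Its rows are the vectors (1, 2, 3) and (4, 5, 6) in R^3; its columns, regarded as vectors in R^2, are (1, 4), (2, 5), and (3, 6); and its diagonal entries are 1 and 5. Being 2 × 3, it is not a square matrix.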


In this book, we denote matrices by capital italic letters (e.g., A, B, and C), and we denote the entry of a matrix A that lies in row i and column j by Aij. In addition, if the number of rows and columns of a matrix are equal, the matrix is called square. Two m × n matrices A and B are called equal if all their corresponding entries are equal, that is, if Aij = Bij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Example 2
The set of all m × n matrices with entries from a field F is a vector space, which we denote by M_{m×n}(F), with the following operations of matrix addition and scalar multiplication: For A, B ∈ M_{m×n}(F) and c ∈ F,

    (A + B)ij = Aij + Bij  and  (cA)ij = cAij

for 1 ≤ i ≤ m and 1 ≤ j ≤ n. For instance, in M_{2×3}(R),

    ( 2  0 -1 )   ( -5 -2  6 )   ( -3 -2  5 )
    ( 1 -3  4 ) + (  3  4 -1 ) = (  4  1  3 )

and

    -3 (  1  0 -2 )   ( -3  0  6 )
       ( -3  2  3 ) = (  9 -6 -9 ).

Example 3
Let S be any nonempty set and F be any field, and let F(S, F) denote the set of all functions from S to F. Two functions f and g in F(S, F) are called equal if f(s) = g(s) for each s ∈ S. The set F(S, F) is a vector space with the operations of addition and scalar multiplication defined for f, g ∈ F(S, F) and c ∈ F by

    (f + g)(s) = f(s) + g(s)  and  (cf)(s) = c[f(s)]

for each s ∈ S. Note that these are the familiar operations of addition and scalar multiplication for functions used in algebra and calculus.  •

A polynomial with coefficients from a field F is an expression of the form

    f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0,

where n is a nonnegative integer and each a_k, called the coefficient of x^k, is in F. If f(x) = 0, that is, if a_n = a_{n-1} = ··· = a_0 = 0, then f(x) is called the zero polynomial and, for convenience, its degree is defined to be -1;


otherwise, the degree of a polynomial is defined to be the largest exponent of x that appears in the representation

    f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

with a nonzero coefficient. Note that the polynomials of degree zero may be written in the form f(x) = c for some nonzero scalar c. Two polynomials,

    f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

and

    g(x) = b_m x^m + b_{m-1} x^{m-1} + ··· + b_1 x + b_0,

are called equal if m = n and a_i = b_i for i = 0, 1, ..., n.

When F is a field containing infinitely many scalars, we usually regard a polynomial with coefficients from F as a function from F into F. (See page 569.) In this case, the value of the function

    f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

at c ∈ F is the scalar

    f(c) = a_n c^n + a_{n-1} c^{n-1} + ··· + a_1 c + a_0.

Here either of the notations f or f(x) is used for the polynomial function f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0.

Example 4
Let

    f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0

and

    g(x) = b_m x^m + b_{m-1} x^{m-1} + ··· + b_1 x + b_0

be polynomials with coefficients from a field F. Suppose that m ≤ n, and define b_{m+1} = b_{m+2} = ··· = b_n = 0. Then g(x) can be written as

    g(x) = b_n x^n + b_{n-1} x^{n-1} + ··· + b_1 x + b_0.

Define

    f(x) + g(x) = (a_n + b_n) x^n + (a_{n-1} + b_{n-1}) x^{n-1} + ··· + (a_1 + b_1) x + (a_0 + b_0),

and for any c ∈ F, define

    cf(x) = c a_n x^n + c a_{n-1} x^{n-1} + ··· + c a_1 x + c a_0.

With these operations of addition and scalar multiplication, the set of all polynomials with coefficients from F is a vector space, which we denote by P(F). •
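For instance, taking F = R and the arbitrarily chosen polynomials f(x) = 2x^2 + 1 and g(x) = x - 4 in P(R), the operations just defined give f(x) + g(x) = 2x^2 + x - 3 and 3f(x) = 6x^2 + 3, both of which again lie in P(R).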


We will see in Exercise 23 of Section 2.4 that the vector space defined in the next example is essentially the same as P(F).

Example 5
Let F be any field. A sequence in F is a function σ from the positive integers into F. In this book, the sequence σ such that σ(n) = a_n for n = 1, 2, ... is denoted {a_n}. Let V consist of all sequences {a_n} in F that have only a finite number of nonzero terms a_n. If {a_n} and {b_n} are in V and t ∈ F, define

    {a_n} + {b_n} = {a_n + b_n}  and  t{a_n} = {ta_n}.

With these operations V is a vector space.  •
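As a concrete illustration (with arbitrarily chosen entries), the sequence {a_n} with a_1 = 3, a_2 = 0, a_3 = -2, and a_n = 0 for all n > 3 belongs to V, since only finitely many of its terms are nonzero; by contrast, the constant sequence {b_n} with b_n = 1 for every n is a sequence in R but does not belong to V.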

Our next two examples contain sets on which addition and scalar multiplication are defined, but which are not vector spaces.

Example 6
Let S = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

    (a1, a2) + (b1, b2) = (a1 + b1, a2 - b2)  and  c(a1, a2) = (ca1, ca2).

Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space with these operations.  •

Example 7
Let S be as in Example 6. For (a1, a2), (b1, b2) ∈ S and c ∈ R, define

    (a1, a2) + (b1, b2) = (a1 + b1, 0)  and  c(a1, a2) = (ca1, 0).

Then S is not a vector space with these operations because (VS 3) (hence (VS 4)) and (VS 5) fail.  •
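A short computation (using arbitrarily chosen pairs) makes these failures concrete. In Example 6, (1, 2) + (3, 4) = (1 + 3, 2 - 4) = (4, -2), while (3, 4) + (1, 2) = (3 + 1, 4 - 2) = (4, 2), so addition is not commutative and (VS 1) fails. In Example 7, 1(a1, a2) = (a1, 0), which differs from (a1, a2) whenever a2 ≠ 0, so (VS 5) fails.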

We conclude this section with a few of the elementary consequences of the definition of a vector space.

Theorem 1.1 (Cancellation Law for Vector Addition). If x, y, and z are vectors in a vector space V such that x + z = y + z, then x = y.

Proof. There exists a vector v in V such that z + v = 0 (VS 4). Thus
    x = x + 0 = x + (z + v) = (x + z) + v = (y + z) + v = y + (z + v) = y + 0 = y

by (VS 2) and (VS 3).

Corollary 1. The vector 0 described in (VS 3) is unique.

Proof. Exercise.

Corollary 2. The vector y described in (VS 4) is unique.

Proof. Exercise.

The vector 0 in (VS 3) is called the zero vector of V, and the vector y in (VS 4) (that is, the unique vector such that x + y = 0) is called the additive inverse of x and is denoted by -x.

The next result contains some of the elementary properties of scalar multiplication.

Theorem 1.2. In any vector space V, the following statements are true:
(a) 0x = 0 for each x ∈ V.
(b) (-a)x = -(ax) = a(-x) for each a ∈ F and each x ∈ V.
(c) a0 = 0 for each a ∈ F.

Proof. (a) By (VS 8), (VS 3), and (VS 1), it follows that

    0x + 0x = (0 + 0)x = 0x = 0x + 0 = 0 + 0x.

Hence 0x = 0 by Theorem 1.1.

(b) The vector -(ax) is the unique element of V such that ax + [-(ax)] = 0. Thus if ax + (-a)x = 0, Corollary 2 to Theorem 1.1 implies that (-a)x = -(ax). But, by (VS 8),

    ax + (-a)x = [a + (-a)]x = 0x = 0

by (a). Consequently (-a)x = -(ax). In particular, (-1)x = -x. So, by (VS 6),

    a(-x) = a[(-1)x] = [a(-1)]x = (-a)x.

The proof of (c) is similar to the proof of (a).

EXERCISES

1. Label the following statements as true or false.
   (a) Every vector space contains a zero vector.
   (b) A vector space may have more than one zero vector.
   (c) In any vector space, ax = bx implies that a = b.
   (d) In any vector space, ax = ay implies that x = y.
   (e) A vector in F^n may be regarded as a matrix in M_{n×1}(F).
   (f) An m × n matrix has m columns and n rows.
   (g) In P(F), only polynomials of the same degree may be added.
   (h) If f and g are polynomials of degree n, then f + g is a polynomial of degree n.
   (i) If f is a polynomial of degree n and c is a nonzero scalar, then cf is a polynomial of degree n.


   (j) A nonzero scalar of F may be considered to be a polynomial in P(F) having degree zero.
   (k) Two functions in F(S, F) are equal if and only if they have the same value at each element of S.

2. Write the zero vector of M_{3×4}(F).

3. If

       M = ( 1  2  3 )
           ( 4  5  6 ),

   what are M13, M21, and M22?

4. Perform the indicated operations.
   (a)
   (b)
   (c)
   (d)


   (e) (2x^4 - 7x^3 + 4x + 3) + (8x^3 + 2x^2 - 6x + 7)
   (f) (-3x^3 + 7x^2 + 8x - 6) + (2x^3 - 8x + 10)
   (g) 5(2x^7 - 6x^4 + 8x^2 - 3x)
   (h) 3(x^5 - 2x^3 + 4x + 2)

Exercises 5 and 6 show why the definitions of matrix addition and scalar multiplication (as defined in Example 2) are the appropriate ones.

5. Richard Gard ("Effects of Beaver on Trout in Sagehen Creek, California," J. Wildlife Management, 25, 221-242) reports the following number of trout having crossed beaver dams in Sagehen Creek.

                   Upstream Crossings
                   Fall    Spring    Summer
    Brook trout
    Rainbow trout
    Brown trout

                   Downstream Crossings
                   Fall    Spring    Summer
    Brook trout      9        1         4
    Rainbow trout             0         0
    Brown trout      1        1         0

Record the upstream and downstream crossings in two 3 × 3 matrices, and verify that the sum of these matrices gives the total number of crossings (both upstream and downstream) categorized by trout species and season.

6. At the end of May, a furniture store had the following inventory.

                          Early
                          American   Spanish   Mediterranean   Danish
    Living room suites       1          2            1            3
    Bedroom suites           5          1            1            4
    Dining room suites       3          1            2            6

Record these data as a 3 × 4 matrix M. To prepare for its June sale, the store decided to double its inventory on each of the items listed in the preceding table. Assuming that none of the present stock is sold until the additional furniture arrives, verify that the inventory on hand after the order is filled is described by the matrix 2M. If the inventory at the end of June is described by the matrix

    A = ( 5  3  1  2 )
        ( 6  2  1  5 )
        ( 1  0  3  3 ),

interpret 2M - A. How many suites were sold during the June sale?

7. Let S = {0, 1} and F = R. In F(S, R), show that f = g and f + g = h, where f(t) = 2t + 1, g(t) = 1 + 4t - 2t^2, and h(t) = 5^t + 1.

8. In any vector space V, show that (a + b)(x + y) = ax + ay + bx + by for any x, y ∈ V and any a, b ∈ F.

9. Prove Corollaries 1 and 2 of Theorem 1.1 and Theorem 1.2(c).

10. Let V denote the set of all differentiable real-valued functions defined on the real line. Prove that V is a vector space with the operations of addition and scalar multiplication defined in Example 3.


11. Let V = {0} consist of a single vector 0 and define 0 + 0 = 0 and c0 = 0 for each scalar c in F. Prove that V is a vector space over F. (V is called the zero vector space.)

12. A real-valued function f defined on the real line is called an even function if f(-t) = f(t) for each real number t. Prove that the set of even functions defined on the real line with the operations of addition and scalar multiplication defined in Example 3 is a vector space.

13. Let V denote the set of ordered pairs of real numbers. If (a1, a2) and (b1, b2) are elements of V and c ∈ R, define

    (a1, a2) + (b1, b2) = (a1 + b1, a2 b2)  and  c(a1, a2) = (ca1, a2).

Is V a vector space over R with these operations? Justify your answer.

14. Let V = {(a1, a2, ..., an) : ai ∈ C for i = 1, 2, ..., n}; so V is a vector space over C by Example 1. Is V a vector space over the field of real numbers with the operations of coordinatewise addition and multiplication?

15. Let V = {(a1, a2, ..., an) : ai ∈ R for i = 1, 2, ..., n}; so V is a vector space over R by Example 1. Is V a vector space over the field of complex numbers with the operations of coordinatewise addition and multiplication?

16. Let V denote the set of all m × n matrices with real entries; so V is a vector space over R by Example 2. Let F be the field of rational numbers. Is V a vector space over F with the usual definitions of matrix addition and scalar multiplication?

17. Let V = {(a1, a2) : a1, a2 ∈ F}, where F is a field. Define addition of elements of V coordinatewise, and for c ∈ F and (a1, a2) ∈ V, define c(a1, a2) = (a1, 0). Is V a vector space over F with these operations? Justify your answer.

18. Let V = {(a1, a2) : a1, a2 ∈ R}. For (a1, a2), (b1, b2) ∈ V and c ∈ R, define

    (a1, a2) + (b1, b2) = (a1 + 2b1, a2 + 3b2)

and

    c(a1, a2) = (ca1, ca2).

Is V a vector space over R with these operations? Justify your answer.

) For {«„}. 0) if c = 0 cai.{«. V is a vector space over R.md c(v\.02) in V and c £ R. How many matrices arc there in the vector space M m x n (Z2)? Appendix C. necessary to verify all of the vector space properties to prove that a subset is a subspace.cw1). (VS 6). Prove that.16 Chap. (See 22. Let V be the set of sequences {an} of real numbers.wi) + (v2.02 £ R}.4 subset W of a vector space V over a field F is called a subspace of V if W is a vector space over F with the operations of addition and scalar multiplication defined on V. Let V and W be vector spaces over a field F. Is V a vector space over R with these operations? Justify your answer. note that V and {0} are subspaces. Fortunately it is not. . {6n} £ V and any real number t. 20. Let V = {(01. it is of interest to examine subsets that possess the same structure as the set under consideration.02): 01. with these operations. w\ + w2) . these properties automatically hold for the vectors in any subset. Prove that Z is a vector space over F with the operations (vi.) 1. (VS 7). define .„ + 6„} and t{a„} = {tan}. In any vector space V. . The appropriate notion of substructure for vector spaces is introduced in this section.Define addition of elements of V coordinatewise.w2) = (ci + u2.o. — 1 it c / 0. and for (01.3 SUBSPACES In the study of any algebraic structure.w): v £ Vandv/. £ W}. Definition. vector space V is a subspace of V if and only if the following four properties hold. 1 Vector Spaces 19. Thus a subset W of a. define {<>•„} + {6n} . (VS 2).Wi) = (cvi. Let Z = {(v. Because properties (VS 1). The latter is called the zero subspace of V. and (VS 8) hold for all vectors in the vector space. (See Example 5 for the definition of a sequence. (VS 5). 21.

Hence conditions (!>] and (c) hold. Let V be a vector space and W a subset ofV. a subspace. 1 0 -2 3 5 . 12). It is easily proved that for any matrices A and B and any scalars a and 6. (aA + bB)' = a A' + bB'. (W is closed under scalar multiplication. 1. x+y £ W whenever . a symmetric matrix must be square. Hence W is a subspace of V. I"he zero matrix is equal to its transpose and hence belongs to W. in fact. Proof. I I). If W is a subspace of V. Normally.3 Subspaces 17 1. it is this result that is used to prove that a subset is. and thus 0' . the discussion preceding tins theorem shows thai W is a subspace of V if the additive inverse of each vector in W lies in W.0 by Theorem 1. 4. if conditions (a).W and y i W.) Using this fact. t hen W is a vector space wit h the operat ions of addition and scalar multiplication defined on V.4' = A.Sec.2 (p.1 2 3 1 2 2 3 A symmetric matrix is a matrix . (See Exercise 3. Clearly. and (c) hold. Then W is a subspace ofV if and only if the following three conditions hold for the operations defined in V. (Al)ij = Aj%. So condition (a) holds. Conversely.) 2.3 hold: 1. Theorem 1. For example. then ( -l)x C W by condition (c). The next theorem shows that the zero vector of W must be the same as the zero vector of V ami thai property I is redundant. But also x + 0 = x. (c) ex £ W whenever c . the 2 x 2 matrix displayed above is a symmetric matrix. (a) 0 e W . ex £ W whenever c £ /•" and x C W. (b) x + y £ W whenever x i. But if a i W. The transpose A1 of an m x n matrix A is the n x m matrix obtained from A by interchanging the rows with the columns: that is. W has a zero vector. and —x = ( — 1 )x by Theorem 1. The set W of all symmetric matrices " u :F) is a subspace of M n x „ ( F ) since the conditions of Theorem 1. (W is closed under addition.3. I The preceding theorem provides a simple method for determining whether or not a given subset of a vector space is a subspace. Each vector in W has an additive inverse in W./• c W and y ( W. .) 3. (b).F and x C W.4 such that . and there exists a vector 0' > W Mich that x — 0' = X for each x £ W. For example. we show that the set of symmetric matrices is closed under addition and scalar multiplication.1 (p.

R) by Theorem 1. if all its nondiagonal entries are zero. Se> P„(F) is edoseel under addition and scalar multiplication. • Example 4 The1 trace of an n x n matrix M. We claim that C(/t*) is a subspace of J-(R. • Example 2 Lc1!. Hence A + B and cA are diagonal matrices for any scalar c. it is in P. If A £ W. If A £ W and B € W. denoted tr(M). 3. we have1 (a/1)' = «.„(F).4' = aA. Thus (A + B)1 = A1 + Bf = A + B.2. . that is. • Example 3 An n x n matrix M is called a. Thus aA £ W.4 and B are diagonal v x n matrices. is the. = cO = 0 for any scalar c. Since constant functions are1 continuous. First note that the1 zero of 1F(R. and let P„(F) consist of all polynomials in P(F) having ck'gree1 k'ss than or equal to n. Since the zc. The example's that follow provide further illustrations of the concept of a subspace.. C(7?) denote the set of all continuous real-valued functions defined on R. then A' = A.j = 0 whenever i ^ j.. three are particularly important.3. So C(R) is closed under addition and scalar multiplication and hene-e1 is a subspace of J-(R. then A' = A and B* = B. 1 Vector Spaces 2. the sum of two continuous functions is continuous.+ M n n . (A f B)ij = A. Moreover. Therefore the set of diagonal matrices is a subspace of M„ X „(F) by Theorem 1. The first. that is. if .ro polynomial has degree —1.sum of the diagonal entries of M. diagonal matrix if M. continuous function is continuous.18 Chap. Moreover. + Bi:j = 0 + 0 = 0 1 and 1 (cA)l:J = eA. Clearly C(7?) is a subset of the vector space1 T(R. we. and the product of a scalar and a polynomial of degree le'ss than or equal to n is a polynomial of degree less than or equal te> n. Clearly the1 zero matrix is a diagonal matrix because all of its emtric's are1 0.have / £ C(R). and the product of a real number and a. It therefore follows from Theorem 1.3 that P„(F) is a subspace of P(F).R). then whenever i ^ j. II) is the1 constant function defined by f(t) = 0 for all t € R. so that A + B £ W. the1 sum of two polynomials with degrees less than or equal to n is another polynomial of degree less than or equal to n. tr(M) = Mn + M 2 2 + --. So for any a £ F. Moreover.R) defined in Example 3 of Section 1.3. Example 1 Let it be a nonnegative integer.

• Example 5 The set of matrices in M. Hence X f y and a. As we already have suggested.3. it can be readily shown that the union of two subspaces of V is a subspace of V if and only if one of the subspaces contains the other.r are also contained in W. a natural way to combine two subspaces Wi and W_) to obtain a subspace that contains both W| and W-j.3 Subspaces 19 It follows from Exercise 6 that the set of n x n matrices having trace equal to zero is a subspace of M n x n ( F ) . (d) The intersection of any two subsets of V is a subspace of V. but in general the union of subspaces of V need not be1 closed under addition. (b) The empty set is a subspace of every vector space. it follows that x + y and ax are contained in each subspace in C. and lei W demote' the intersection of the1 subspaces in C.Sec. This idea is explored in Exercise 23. then V contains a subspace W such thai W f V. If V is a vector space and W is a subset of V that is a vector space. 0 £ W. T h e o r e m 1. it is natural to consider whether or not the union of subspaces of V is a subspace of V.y £ W.4.union of subspaces must contain the zero vector and be' closed under scalar multiplication. (c) If V is a vector space other than the zero vector space. Any intersection of subspaces of a vector si>ace V is a subspace ofV. Label the following statements as true or false. In fact. £ /•' and x. then W is a subspace of V.IIXII(R. Because.) having nonnegative entries is not a subspace of Mmxn{R) because it is not closed under scalar multiplication (by negative scalars). Then x and y are' contained in each subspace in C. Proof.) There is.each subspace in C is closed under addition and scalar multiplication. se> that W is a subspace of V by Theorem 1. It is easily seen that the. the key to finding such a subspace is to assure thai it must be closed under addition. (a) . Since1 every subspace contains the zcre> vector. Let C be a collection of subspaces of V. 1. EXERCISES 1. (See Exercise 19. Let c. however. I Having shown that the intersection of subspaces of a vector space V is a subspace of V. • The next theorem shows how te) form a new subspace from other subspaces.

Determine the transpose of each of the matrices that follow. and VV4 be as in Exercise 8.3 a | + 60^ = 0} 9. compute its trace. and W3 n W4. B £ M m x n ( F ) and any a. a2.7o2 + a 3 = 0} R 3 : O] . and observe that each is a subspace of R3. a 2 £ R}. 6 £ F. Wi n W 4 .6 4 7 (d) -2 5 7 0 I 4 1 . Let Wi. Determine whether the following sets are subspaces of R3 under the operations of addition and scalar multiplication defined on R3. a 3 ) {(aj. B £ M n x n ( F ) . 1 Vector Spaces (e) An n x n diagonal matrix can never have more than n nonzero entries.20 Chap. W = {(ai.03) {(01.02. that is. 7. 4. (f) The trace of a square matrix is the product of its diagonal entries. 2. a 3 ) £ £ £ £ £ £ R 3 : ai = 3a2 and 03 = —a2) R 3 : a.3o3 = 1} R 3 : 5a? . Prove that tr(aA + bB) = atr(A) + 6tr(£) for any A.03) {(01. Prove that diagonal matrices are symmetric matrices. (g) Let W be the ay-plane in R3.03) {(ai.4a 2 . a 2 . (a) (b) (c) (d) (e) (f) W) W2 W3 W4 W5 W6 = = = = = {(01. 6. Prove that A + A1 is symmetric for any square matrix A. Prove that (aA + bB)1 = aA* + bB1 for any A. 5. = a 3 + 2} R 3 : 2ai . 8. Prove that (A1)* = A for each A £ M m X n ( F ) . T h e n W = R2.6 (v. W 3 .a 2 .) ( I 13 5) (f) (h) 3.a 3 = 0} R 3 : ai + 2a 2 . Describe Wi n W 3 . Justify your answers. In addition.02.03) {(01.0): a i . (b) 0 3 8 .a 2 . . if the matrix is square.02.

F) is a subspace of ^(S. a n ) £ F" : ai + o 2 + • • • + a n = 0} is a subspace of F n . Let S be a nonempty set and F a. Let C(S. an. w2. Show that the set of convergent sequences {o n } (i. {/ £ T(S. Prove that the set of all even functions in T(F\.e.F(Fi. 13. . a 2 . Prove that C(S. this exercise is essential for a later sectiejii.. 17.R). m x n (F). Let W. .' Prove that if W is a subspace of a vector space V and w\. C W 2 or W 2 C W. n] a subspace of P(F) if n > 1? Justify your answer. Prove that W. wn are in W. 22. 19.) denote the set of all real-valued functions defined on the real line that have a continuous nth derivative. 11. W = {f(x) £ P(F): f(x) = 0 or f(x) has degree. 15. Let Cn(R. . and. 21..s of.F2) is called an even function if g(—t) = g(t) for each t £ F\ and is called an odd function if g(—t) = —g(t) for each t £ Fi. F). a2.F 2 ). 1. and W2 be subspaces of a vector space V. Let F\ and F 2 be fiedds. field. = { ( o i . . 20. 'A dagger means that. Is the set.. • • •.n exists) is a subspace of the vector space V in Exercise 20 of Section 1. Prove that a subset W of a vector space V is a subspace. Prove that for any SQ £ S. . Is the set of all differentiable real-valued functions defined on R a subspace of C(R)? Justify your answer. . An m x n matrix A is called upper triangular if all entries lying below the diagonal entries are zero. field. Prove that a subset W of a vector space. . those for which limn-too a. .. F): / ( s 0 ) = 0}. of V if and only if W ^ 0. Prove that. Let S be a nonempty set and F a. then a\W\ + a2w2 + • • • + anwn £ W for any scalars a\. F) denote the set of all functions / £ T(S. A function g £ T(Fl. F2) and the set of all odd functions in T(I'\.Sec. 16. the upper triangular matrices form a subspace of M. F2) are subspace. Prove that W| U W2 is a subspace of V if and only if W. 12. whenever a £ F and x.3 Subspaces 21 10. V is a subspace of V if and only if 0 £ W and ax + y £ W whenever a £ F and x. y £ W. that is. 18. is a subspace of F(S. . 14. F).2. F) such that f(s) — 0 for all but a finite number of elements of S. an) £ F n : a i + a 2 H h an = 1} is not. Prove that Cn(R) is a subspace of F(R..y £ W.. a 2 . then ax £ W and x + y £ W. if Aij = 0 whenever i > j. but W 2 = {(a \.

26. Wc denote that V is the direct sum of W] and W 2 by writing V = W. . . Let Wi and W 2 be subspaces of a vector space V.1 + • • • + bxx + b0.. Let W[ denote the set of all polynomials /(.. .o 2 . Show that V = Wi © W 2 . m . Chap.. © W 2 . o n ) £ F" : a. Prove that P(F) = W. Prove that Wi + VV2 is a subspace of V that contains both W] and W2."_1 -| h a i a + c/(l. then the sum of Si and S2. . (b) Prove that any subspace of V that contains both Wi and W2 must also contain W t + W 2 . o 2 . o n ) £ F n : o n = 0} and W 2 = {(01. (a) we have o^ = 0 whenever / is even. If S\ and S2 are nonempty subsets of a vector space V.0 whenever i < j}. Let V demote the vector space consisting of all upper triangular n x n matrices (as defined in Exercise 12). and W 2 are subspaces of'W such that W2 D W 2 = {0} and W. Show that. Likewise let W 2 denote the set of all polynomials g(x) in P(F) such that in the representation g(x) = bmx'" + 6 m _ 1 a. where W 2 = {A £ V: Aij = 0 whenever i > j). 24.22 The following definitions are used in Exercises 23 30. and let W| denote the subspace of V consisting of all diagonal matrices. In M m x n ( F ) define W. = { ( o i . is the set [x+y: x £ S\ and y £ S2}. .j . (W. 27. 1 Vector Spaces Definition. is the set of all upper triangular matrices defined in Exercise 12./:) in P(F) such that in the representation f(x) ~ anx" + fln_ia. © W 2 .{A £ M TOXn (F): Ai3 = 0 whenever i > j} and W 2 = {A £ M m X n ( F ) : Ai. = a 2 = • • • = a n _] = ()}.. Definition. F n is the direct sum of the subspaces W. we have 6* = 0 whenever i is odd. + W 2 = V. denoted S\ +S2. A vector space V is called the direct sum of W] and V 2 if V W.) Show that MWXn(F)=W1(DW2. 25. 23. .

3 Subspaces 23 28. A matrix M is called skew-symmetric if Ml = — M. 29. . fiedd. (b) Prove that c. Prove that M „ x n ( F ) = Wi + W 2 . show that if vi + W = v[ + W and v2 + W = v'2 + W. Clearly. It is customary to denote this coset. Compare this exercise with Exercise 28. Both Wi and W2 are subspaces of M n X „(F).v2 £ W. This vector space is called the quotient space of V modulo W and is denoted by V/W. that is. 30. by v + W rather than {v} + W. Define W.v2 £ V and a(v + W) = av + W for all v £ V and a £ F. Let F be a field that is not of characteristic 2. Prove that the set Wi of all skew-symmetric n x n matrices with entries from F is a subspae. Addition and scalar multiplication by scalars of F can be defined in the collection S = [v f W: v £ V} of all cosets of W as follows: (vi + W) + (v2 + W) = (Vj + v2) + W for all v\. 31. Let F be a. (d) Prove that the set S is a vector space with the operations defined in (c). + W = v2 + W if and only if vx . . (c) Prove that the preceding operations are well defined. then (c.e of M n x n ( F ) . Prove that M n x n ( F ) — Wi © W 2 . a skewsymmetric matrix is square. (a) Prove that v + W is a subspace of V if and only if v £ W. Now assume that F is not of characteristic 2 (see Appendix C).\ + x2: where ari £ Wi and x2 £ W 2 . and let W 2 be the subspace of M n x n ( F ) consisting of all symmetric n x n matrices. Let Wi and W2 be subspaces of a vector space V. Prove that V is the direct sum of Wi and W 2 if and only if each vector in V can be uniquely written as x.Sec. Let W be a subspace of a vector space V over a field F. 1.{ A £ F): Aij = 0 whenever i < ?'} and W2 to be the set of all symmetric n x n matrices with entries from /•'. + W) + (v2 + W) = (v[ + W) + (v'2 + W) and a(Vl + W) = a(v[ + W) for all a £ F. For any v £ V the set {v} + W = {v + w: w £ W} is called the coset of W containing v.

the equation of the plane simplifies to x = su + tv.25 0. Thus the zero vector is a.\. (This is proved as Theorem 1.02 0.2 2 0. un in S and scalars ai. Ov — 0 for each v £ V.S.1.01 0.05 4. unpared apples (freshly harvested) Chocolate-coated candy with coconut center Clams (meat.1 4 0.02 0. 1963. and C in space is x = A + su + tv..2 2 Coconut custard pie (baked from mix) 0 0.05 0. an in F such that v = oiU\ + a2u2 + • • • f anun.3 0 Raw wild rice (0) 0.34 0. 1 Vector Spaces LINEAR COMBINATIONS AND SYSTEMS OF LINEAR EQUATIONS In Section 1. In this case.C.03 0.. U. play a central role in the theory of vector spacers.4 0 Cooked spaghetti (unenriched) 0 0. Washington.01 0.4 Chap. Consumer and food Economics Research Division. Definitions.4 0 Raw brown rice (0) 0.5. generalization of such expressions is presented in the following deli nit ions.02 0.06 0. . Department of Agriculture.2 (0) Source: Rernice K.24 1.03 0. . an the coefficients of the linear combination..01 0. . D. (nig) 0. and s and t deneDte arbitrary real numbers. where u and v denote the vectors beginning at. of a vitamin present is either none or too small to measure.7 (0) Soy sauce 0 0. Composition of Foods (Agriculture Handbook Number 8). it was shown that the equaticju of the plane through three noncollinear points A. A vector v £ V is called a linear combination of vectors of S if there exist a finite number of vectors u\..1 (0) Jams and preserves In 0.1 Vitamin Content of 100 Grams of Certain Foods A (units) 0 90 0 1 [ 3 (mg) 0.. Let V be a vector space and S a nonempty subset ofV. A and ending at B and C.01 0.01 0. Observe that in any vector space V.02 0. . linear combination of any nonempty subset of V. a2 ti„ and call a.10 0..07 Niacin C (mg) (nig) 0. An important special case occurs when A is the origin. a Zeros in parentheses indicate that the amount.02 I5_.. Example 1 TABLE 1.45 0. and the set of all points in this plane is a subspace of R3. a2.. Merrill. respectively. The appropriate. u2. only) 100 0. . where s and t are scalars and u and v are vectors..) Expressions of the form su + tv.2 0 Apple butter Raw.3 10 Cupcake from mix (dry form) 0 0.01 0. Watt and Annabel I. B.. In this case we also say that v is a linear combination of ii\.3 0 Cooked farina (unenriched) (0)a 0. a 2 .02 0.03 6.18 L.

Table 1.1 shows the vitamin content of 100 grams of 12 foods with respect to vitamins A, B1 (thiamine), B2 (riboflavin), niacin, and C (ascorbic acid). The vitamin content of 100 grams of each food can be recorded as a column vector in R⁵; for example, the vitamin vector for apple butter is

(0.00, 0.01, 0.02, 0.20, 2.00).

Considering the vitamin vectors for cupcake, coconut custard pie, raw brown rice, and soy sauce, we see that the vitamin vector for raw wild rice equals the sum of the vitamin vectors for cupcake, coconut custard pie, and raw brown rice plus twice the vitamin vector for soy sauce. Thus the vitamin vector for wild rice is a linear combination of the vitamin vectors for cupcake, coconut custard pie, raw brown rice, and soy sauce. So 100 grams of cupcake, 100 grams of coconut custard pie, 100 grams of raw brown rice, and 200 grams of soy sauce provide exactly the same amounts of the five vitamins as 100 grams of raw wild rice. Similarly, the vitamin vector for clams is a linear combination of the vitamin vectors for apple butter, apples, chocolate candy, farina, jam, and spaghetti: 200 grams of apple butter, 100 grams of apples, 100 grams of chocolate candy, 100 grams of farina, 100 grams of jam, and 100 grams of spaghetti provide exactly the same amounts of the five vitamins as 100 grams of clams.  •

Throughout Chapters 1 and 2 we encounter many different situations in which it is necessary to determine whether or not a vector can be expressed as a linear combination of other vectors, and if so, how. This question often reduces to the problem of solving a system of linear equations. In Chapter 3, we discuss a general method for using matrices to solve any system of linear equations.

For now, we illustrate how to solve a system of linear equations by showing how to determine if the vector (2, 6, 8) can be expressed as a linear combination of

u1 = (1, 2, 1), u2 = (-2, -4, -2), u3 = (0, 2, 3), u4 = (2, 0, -3), and u5 = (-3, 8, 16).

Thus we must determine if there are scalars a1, a2, a3, a4, and a5 such that

(2, 6, 8) = a1u1 + a2u2 + a3u3 + a4u4 + a5u5
          = a1(1, 2, 1) + a2(-2, -4, -2) + a3(0, 2, 3) + a4(2, 0, -3) + a5(-3, 8, 16)
          = (a1 - 2a2 + 2a4 - 3a5, 2a1 - 4a2 + 2a3 + 8a5, a1 - 2a2 + 3a3 - 3a4 + 16a5).

Hence (2, 6, 8) can be expressed as a linear combination of u1, u2, u3, u4, and u5 if and only if there is a 5-tuple of scalars (a1, a2, a3, a4, a5) satisfying the system of linear equations

a1 - 2a2        + 2a4 -  3a5 = 2
2a1 - 4a2 + 2a3       +  8a5 = 6
a1 - 2a2 + 3a3 - 3a4 + 16a5 = 8,                                   (1)

which is obtained by equating the corresponding coordinates in the preceding equation.

To solve system (1), we replace it by another system with the same solutions, but which is easier to solve. The procedure to be used expresses some of the unknowns in terms of others by eliminating certain unknowns from all the equations except one. To begin, we eliminate a1 from every equation except the first by adding -2 times the first equation to the second and -1 times the first equation to the third. The result is the following new system:

a1 - 2a2        + 2a4 -  3a5 = 2
           2a3 - 4a4 + 14a5 = 2
           3a3 - 5a4 + 19a5 = 6.                                   (2)

In this case, it happened that while eliminating a1 from every equation except the first, we also eliminated a2 from every equation except the first. This need not happen in general. We now want to make the coefficient of a3 in the second equation equal to 1, and then eliminate a3 from the third equation. To do this, we first multiply the second equation by 1/2, which produces

a1 - 2a2        + 2a4 -  3a5 = 2
            a3 - 2a4 +  7a5 = 1
           3a3 - 5a4 + 19a5 = 6.

Next we add -3 times the second equation to the third, obtaining

a1 - 2a2        + 2a4 - 3a5 = 2
            a3 - 2a4 + 7a5 = 1
                  a4 - 2a5 = 3.                                    (3)

We continue by eliminating a4 from every equation of (3) except the third. This yields

a1 - 2a2       +  a5 = -4
            a3 + 3a5 =  7
            a4 - 2a5 =  3.                                         (4)

System (4) is a system of the desired form: It is easy to solve for the first unknown present in each of the equations (a1, a3, and a4) in terms of the other unknowns (a2 and a5). Rewriting system (4) in this form, we find that

a1 = 2a2 - a5 - 4
a3 =      -3a5 + 7
a4 =       2a5 + 3.

Thus for any choice of scalars a2 and a5, a vector of the form

(a1, a2, a3, a4, a5) = (2a2 - a5 - 4, a2, -3a5 + 7, 2a5 + 3, a5)

is a solution to system (1). In particular, the vector (-4, 0, 7, 3, 0) obtained by setting a2 = 0 and a5 = 0 is a solution to (1). Therefore

(2, 6, 8) = -4u1 + 0u2 + 7u3 + 3u4 + 0u5,

so that (2, 6, 8) is a linear combination of u1, u2, u3, u4, and u5.

The procedure just illustrated uses three types of operations to simplify the original system:

1. interchanging the order of any two equations in the system;
2. multiplying any equation in the system by a nonzero constant;
3. adding a constant multiple of any equation to another equation in the system.

In Chapter 3, we prove that these operations do not change the set of solutions to the original system. Note that we employed these operations to obtain a system of equations that had the following properties:

1. The first nonzero coefficient in each equation is one.
2. If an unknown is the first unknown with a nonzero coefficient in some equation, then that unknown occurs with a zero coefficient in each of the other equations.
3. The first unknown with a nonzero coefficient in any equation has a larger subscript than the first unknown with a nonzero coefficient in any preceding equation.
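The elimination just carried out is easy to mechanize. The following Python sketch is an added illustration, not part of the original text (the function name is arbitrary); it row-reduces the augmented matrix of system (1) with exact rational arithmetic, using only the three operations listed above, and recovers the particular solution (-4, 0, 7, 3, 0) found in the discussion:

from fractions import Fraction

def reduce(aug):
    """Row-reduce an augmented matrix [A | b] using the three operations above."""
    rows, cols = len(aug), len(aug[0])
    pivot_row = 0
    for col in range(cols - 1):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, rows) if aug[r][col] != 0), None)
        if pivot is None:
            continue
        aug[pivot_row], aug[pivot] = aug[pivot], aug[pivot_row]      # operation 1
        p = aug[pivot_row][col]
        aug[pivot_row] = [x / p for x in aug[pivot_row]]             # operation 2
        for r in range(rows):
            if r != pivot_row and aug[r][col] != 0:                  # operation 3
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    return aug

# augmented matrix of system (1): columns a1, ..., a5 and the right-hand side
system = [[Fraction(x) for x in row] for row in
          [[1, -2, 0,  2, -3, 2],
           [2, -4, 2,  0,  8, 6],
           [1, -2, 3, -3, 16, 8]]]

for row in reduce(system):
    print(" ".join(str(x) for x in row))
# The reduced rows correspond exactly to system (4):
#   a1 - 2a2 + a5 = -4,   a3 + 3a5 = 7,   a4 - 2a5 = 3,
# so setting a2 = a5 = 0 gives the solution (-4, 0, 7, 3, 0).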

To help clarify the meaning of these properties, note that none of the following systems meets these requirements:

x1 + 3x2      + x4      = 7
          2x3 - 5x4      = -1                                       (5)

x1 - 2x2 + 3x3      + x5 = -5
           x3      - 2x5 = 9
                x4 + 3x5 = 6                                        (6)

x1      - 2x3      + x5 = 1
                x4 - 6x5 = 0
     x2 + 5x3      - 3x5 = 2.                                       (7)

Specifically, system (5) does not satisfy property 1 because the first nonzero coefficient in the second equation is 2; system (6) does not satisfy property 2 because x3, the first unknown with a nonzero coefficient in the second equation, occurs with a nonzero coefficient in the first equation; and system (7) does not satisfy property 3 because x2, the first unknown with a nonzero coefficient in the third equation, does not have a larger subscript than x4, the first unknown with a nonzero coefficient in the second equation.

Once a system with properties 1, 2, and 3 has been obtained, it is easy to solve for some of the unknowns in terms of the others (as in the preceding example). If, however, in the course of using operations 1, 2, and 3 a system containing an equation of the form 0 = c, where c is nonzero, is obtained, then the original system has no solutions. (See Example 2.) We return to the study of systems of linear equations in Chapter 3. We discuss there the theoretical basis for this method of solving systems of linear equations and further simplify the procedure by use of matrices.

Example 2

We claim that 2x³ - 2x² + 12x - 6 is a linear combination of x³ - 2x² - 5x - 3 and 3x³ - 5x² - 4x - 9 in P3(R), but that 3x³ - 2x² + 7x + 8 is not. In the first case we wish to find scalars a and b such that

2x³ - 2x² + 12x - 6 = a(x³ - 2x² - 5x - 3) + b(3x³ - 5x² - 4x - 9)
                    = (a + 3b)x³ + (-2a - 5b)x² + (-5a - 4b)x + (-3a - 9b).

Thus we are led to the following system of linear equations:

  a + 3b =  2
-2a - 5b = -2
-5a - 4b = 12
-3a - 9b = -6.

Adding appropriate multiples of the first equation to the others in order to eliminate a, we find that

a + 3b =  2
     b =  2
   11b = 22
    0b =  0.

Now adding the appropriate multiples of the second equation to the others yields

a = -4
b =  2
0 =  0
0 =  0.

Hence

2x³ - 2x² + 12x - 6 = -4(x³ - 2x² - 5x - 3) + 2(3x³ - 5x² - 4x - 9).

In the second case, we wish to show that there are no scalars a and b for which

3x³ - 2x² + 7x + 8 = a(x³ - 2x² - 5x - 3) + b(3x³ - 5x² - 4x - 9).

Using the preceding technique, we obtain a system of linear equations

  a + 3b =  3
-2a - 5b = -2
-5a - 4b =  7
-3a - 9b =  8.                                                      (8)

Eliminating a as before yields

a + 3b =  3
     b =  4
   11b = 22
     0 = 17.

But the presence of the inconsistent equation 0 = 17 indicates that (8) has no solutions. Hence 3x³ - 2x² + 7x + 8 is not a linear combination of x³ - 2x² - 5x - 3 and 3x³ - 5x² - 4x - 9.  •
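As a quick check of the first claim in Example 2, the short sketch below (an added illustration only) represents each polynomial in P3(R) by its list of coefficients, constant term first, and verifies that the coefficients a = -4 and b = 2 found above reproduce 2x³ - 2x² + 12x - 6:

# coefficient lists [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3
p = [-3, -5, -2, 1]        # x^3 - 2x^2 - 5x - 3
q = [-9, -4, -5, 3]        # 3x^3 - 5x^2 - 4x - 9
target = [-6, 12, -2, 2]   # 2x^3 - 2x^2 + 12x - 6

def combine(a, b, p, q):
    """Return the coefficient list of a*p + b*q."""
    return [a * pi + b * qi for pi, qi in zip(p, q)]

print(combine(-4, 2, p, q) == target)   # True: a = -4 and b = 2 work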

Throughout this book, we form the set of all linear combinations of some set of vectors. We now name such a set of linear combinations.

Definition. Let S be a nonempty subset of a vector space V. The span of S, denoted span(S), is the set consisting of all linear combinations of the vectors in S. For convenience, we define span(∅) = {0}.

In R³, for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors in R³ that have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b. Thus the span of {(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In this case, the span of the set is a subspace of R³. This fact is true in general.

Theorem 1.5. The span of any subset S of a vector space V is a subspace of V. Moreover, any subspace of V that contains S must also contain the span of S.

Proof. This result is immediate if S = ∅ because span(∅) = {0}, which is a subspace that is contained in any subspace of V.

If S ≠ ∅, then S contains a vector z. So 0z = 0 is in span(S). Let x, y ∈ span(S). Then there exist vectors u1, u2, ..., um, v1, v2, ..., vn in S and scalars a1, a2, ..., am, b1, b2, ..., bn such that

x = a1u1 + a2u2 + ··· + amum  and  y = b1v1 + b2v2 + ··· + bnvn.

Then

x + y = a1u1 + a2u2 + ··· + amum + b1v1 + b2v2 + ··· + bnvn

and, for any scalar c,

cx = (ca1)u1 + (ca2)u2 + ··· + (cam)um

are clearly linear combinations of the vectors in S; so x + y and cx are in span(S). Thus span(S) is a subspace of V.

Now let W denote any subspace of V that contains S. If w ∈ span(S), then w has the form w = c1w1 + c2w2 + ··· + ckwk for some vectors w1, w2, ..., wk in S and some scalars c1, c2, ..., ck. Since S ⊆ W, we have w1, w2, ..., wk ∈ W. Therefore w = c1w1 + c2w2 + ··· + ckwk is in W by Exercise 20 of Section 1.3. Because w, an arbitrary vector in span(S), belongs to W, it follows that span(S) ⊆ W.

Definition. A subset S of a vector space V generates (or spans) V if span(S) = V. In this case, we also say that the vectors of S generate (or span) V.

Example 3

The vectors (1, 1, 0), (1, 0, 1), and (0, 1, 1) generate R³, since an arbitrary vector (a1, a2, a3) in R³ is a linear combination of the three given vectors; in fact, the scalars r, s, and t for which

r(1, 1, 0) + s(1, 0, 1) + t(0, 1, 1) = (a1, a2, a3)

are

r = (1/2)(a1 + a2 - a3),  s = (1/2)(a1 - a2 + a3),  and  t = (1/2)(-a1 + a2 + a3).  •

Example 4

The polynomials x² + 3x - 2, 2x² + 5x - 3, and -x² - 4x + 4 generate P2(R), since each of the three given polynomials belongs to P2(R) and each polynomial ax² + bx + c in P2(R) is a linear combination of these three, namely

(-8a + 5b + 3c)(x² + 3x - 2) + (4a - 2b - c)(2x² + 5x - 3) + (-a + b + c)(-x² - 4x + 4) = ax² + bx + c.  •

Example 5

The matrices

1 1      1 1      1 0      0 1
1 0      0 1      1 1      1 1

generate M2×2(R), since an arbitrary matrix A in M2×2(R) with entries a11, a12, a21, a22 can be expressed as a linear combination of the four given matrices; the respective coefficients are

(1/3)(a11 + a12 + a21 - 2a22),  (1/3)(a11 + a12 - 2a21 + a22),
(1/3)(a11 - 2a12 + a21 + a22),  and  (1/3)(-2a11 + a12 + a21 + a22).

On the other hand, the matrices

1 0      1 1      1 0
0 1      0 1      1 1

do not generate M2×2(R), because each of these matrices has equal diagonal entries. So any linear combination of these matrices has equal diagonal entries. Hence not every 2 × 2 matrix is a linear combination of these three matrices.  •
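The formulas of Example 3 can be checked numerically. The sketch below is an added illustration (rational arithmetic assumed); it computes r, s, and t for a sample vector and confirms that r(1,1,0) + s(1,0,1) + t(0,1,1) reproduces it:

from fractions import Fraction

def coefficients(a1, a2, a3):
    """The scalars r, s, t from Example 3."""
    half = Fraction(1, 2)
    return (half * (a1 + a2 - a3),
            half * (a1 - a2 + a3),
            half * (-a1 + a2 + a3))

a = (Fraction(5), Fraction(-7), Fraction(3))
r, s, t = coefficients(*a)
combo = tuple(r * u + s * v + t * w
              for u, v, w in zip((1, 1, 0), (1, 0, 1), (0, 1, 1)))
print(combo == a)   # True: (5, -7, 3) = r(1,1,0) + s(1,0,1) + t(0,1,1)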

At the beginning of this section we noted that the equation of a plane through three noncollinear points in space, one of which is the origin, is of the form x = su + tv, where u, v ∈ R³ and s and t are scalars. Thus x ∈ R³ is a linear combination of u, v ∈ R³ if and only if x lies in the plane containing u and v. (See Figure 1.5.)

Figure 1.5

Usually there are many different subsets that generate a subspace W. (See Exercise 13.) It is natural to seek a subset of W that generates W and is as small as possible. In the next section we explore the circumstances under which a vector can be removed from a generating set to obtain a smaller generating set.

EXERCISES

1. Label the following statements as true or false.
(a) The zero vector is a linear combination of any nonempty set of vectors.
(b) The span of ∅ is ∅.
(c) If S is a subset of a vector space V, then span(S) equals the intersection of all subspaces of V that contain S.
(d) In solving a system of linear equations, it is permissible to add any multiple of one equation to another.
(e) In solving a system of linear equations, it is permissible to multiply an equation by any constant.
(f) Every system of linear equations has a solution.

T4 = .3x 4 .3x2 4 x + 2.-5).3 ) .T3 + 5x4 = 3 Xi + 2x 2 -Xi 42xi + 5x 2 4xi + l l x 2 Xi 2xi 3xi xi 4.0.2x3 + x 2 + 3x .4. x 3 .4x 5 ~ 4x 4 .1).x 2 + 2x 4 3 x3 .4. .3x3 .-1. determine whether the first polynomial can be expressed as a linear combination of the other two. .x + 1.IOX4 . (a) (b) (c) (d) (e) (f) (-2.(2. ( 1 . ( 1 . For each list of polynomials in P 3 (K). . .-1).1 . 1.1 lx2 4 3x 4 2. 2 . ( .2 .5.2 . .1 + x3 = 8 X3 = 15 + IOX3 = .2 x3 4 x2 + 2x 4-13.3x2 + 4x + 1.2. (1.3 . (1.6x2 + x + 4 -2x3 . x 3 4 2x2 .6 xi 4. x3 . 3 . x 3 .6.2x2 + 4x + 1.2).2.3x 2 4x 3 l()x3 5x 3 7x 3 X4 + x 5 .2x + 3 6x3 . .2x 2 + 2x 3 = 2 xi + 8x 3 + 5.1 4x3 + 2x2 .lx2 + 4x3 = 10 x\ .3x 2 — X3 4.-1) (1.1.1x2 . 2 ) (5.1).5 3.1. x 3 .2x 2 + x2 + x2 4.3 ) . ( .1 . x 3 + 3x2 . 1 ) .2x 5 = 7 = -16 = 2 = 7 / (b) (c) (d) (e) (f) + 6x3 = .0).1 ) (3.3x3 — 3x4 — 6 2xi 4. (1.2.2x3 .3x 4 5.1. determine whether the first vector can be expressed as a linear combination of the other two. x 3 . For each of the following lists of vectors in R3. .4x 4 = 8 xi 4.3x3 = -2 3xi — 3x 2 — 2x3 + 5x4 = 7 X\ — X2 — 2x3 — x 4 = —3 3xi .2x2 + 3x .3 .3x 4 1 - .-2.3).x 5 .2x2 + 3x . x 3 .x 2 + 2x 4 3. Solve the following systems of linear equations by the method introduced in this section. .2x 2 + X3 = 3 2x\ — X2 — 2x 3 = 6 x\ + 2x 2 — X3 + a*4 = 5 xi 4.2x3 .2.8x2 + 4x.4 Linear Combinations and Systems of Linear Equations 33 2.3.x 2 4.-3).0). (a) (b) (c) (d) (e) (f) x3 .(1. (a) 2si . ( 2 .3 . 1 ) (2.2 . ( .3 . 3 ) 4.Sec. ( .4 ) (-2.4x 2 . .

11. 1 .1) generate F 3 . then span(S'iU5.-1. .1.2.1. .1.1)} . Prove that {ei. M 3 } is the set of all symmetric 2 x 2 matrices. . ( . In F n . S = {(1. determine whether the given vector is in the span of S.2).3. Interpret this result geometrically in R3.1.T Prove that span({x}) = {ax: a £ F} for any vector x in a vector space. and (0. .1).1 . .x 2 4-x + l . M 2 .1). 14. . ( . (0.X 4. 1 . . Show that a subset W of a vector space V is a subspace of V if and only if span(W) = W. Show that if Mi = 1 0 0 0 M2 = 0 0 0 1 and M3 = 0 1 1 0 0 0 1 0 0 0 1 0 and 0 0 0 1 then the span of {Mi.1. In each part. .1} 1 2 -3 4 1 0 0 1 s = S = 1 0 -1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1 0 0 6. .1. 1 .2). let ej denote the vector whose j t h coordinate is 1 and whose other coordinates are 0.2).1. .1} 2 x 3 .0. S = {(1. x 2 4-x 4. ^ Show that if Si and S 2 are subsets of a vector space V such that Si C 5 2 . (The sum of two subsets is defined in the exercises of Section 1. then span(Si) C span(5 2 ).34 Chap.1. Show that if Si and S2 are arbitrary subsets of a vector space V. if Si C 5 2 and span(Si) = V. In particular.) . S = {(1.3 ) . 7.1.1).0. x n }. (1. x 4 . S = {x 3 4. 1 ) } (-1. 9.1 . x . 12. (0.x 4. 1 Vector Spaces 5.-1).1. Show that Pn(F) is generated by {1.0. 13. 1 ) } (-1. 8.x 2 x + 3. Show that the vectors (1.1 . Show that the matrices 1 0 0 0 generate M2X2(F)10. deduce that span(5 2 ) — V. S = {(1.1)} ( 2 . x 2 4.-1).x 3 4-2x 2 4-3x 4-3. e 2 .0.0. en} generates F n . (a) (b) (c) (d) (e) (f) (g) (h) (2.l .2) = span(5i)+span(S' 2 ). S= {x 3 + x 2 + x + l .1.0).

15. Let S1 and S2 be subsets of a vector space V. Prove that span(S1 ∩ S2) ⊆ span(S1) ∩ span(S2). Give an example in which span(S1 ∩ S2) and span(S1) ∩ span(S2) are equal and one in which they are unequal.

16. Let V be a vector space and S a subset of V with the property that whenever v1, v2, ..., vn ∈ S and a1v1 + a2v2 + ··· + anvn = 0, then a1 = a2 = ··· = an = 0. Prove that every vector in the span of S can be uniquely written as a linear combination of vectors of S.

17. Let W be a subspace of a vector space V. Under what conditions are there only a finite number of distinct subsets S of W such that S generates W?

1.5 LINEAR DEPENDENCE AND LINEAR INDEPENDENCE

Suppose that V is a vector space over an infinite field and that W is a subspace of V. Unless W is the zero subspace, W is an infinite set. It is desirable to find a "small" finite subset S that generates W, because we can then describe each vector in W as a linear combination of the finite number of vectors in S. Indeed, the smaller that S is, the fewer computations that are required to represent vectors in W.

Consider, for example, the subspace W of R³ generated by S = {u1, u2, u3, u4}, where u1 = (2, -1, 4), u2 = (1, -1, 3), u3 = (1, 1, -1), and u4 = (1, -2, -1). Let us attempt to find a proper subset of S that also generates W. The search for this subset is related to the question of whether or not some vector in S is a linear combination of the other vectors in S. Now u4 is a linear combination of the other vectors in S if and only if there are scalars a1, a2, and a3 such that u4 = a1u1 + a2u2 + a3u3, that is, if and only if there are scalars a1, a2, and a3 satisfying

(1, -2, -1) = (2a1 + a2 + a3, -a1 - a2 + a3, 4a1 + 3a2 - a3).

Thus u4 is a linear combination of u1, u2, and u3 if and only if the system of linear equations

2a1 +  a2 + a3 =  1
-a1 -  a2 + a3 = -2
4a1 + 3a2 - a3 = -1

has a solution. The reader should verify that no such solution exists. This does not, however, answer our question of whether some vector in S is a linear combination of the other vectors in S. It can be shown, in fact, that u3 is a linear combination of u1, u2, and u4, namely, u3 = 2u1 - 3u2 + 0u4.

In the preceding example, checking that some vector in S is a linear combination of the other vectors in S could require that we solve several different systems of linear equations before we determine which, if any, of u1, u2, u3, and u4 is a linear combination of the others. By formulating our question differently, we can save ourselves some work. Note that since u3 = 2u1 - 3u2 + 0u4, we have

-2u1 + 3u2 + u3 - 0u4 = 0.

That is, because some vector in S is a linear combination of the others, the zero vector can be expressed as a linear combination of the vectors in S using coefficients that are not all zero. The converse of this statement is also true: If the zero vector can be written as a linear combination of the vectors in S in which not all the coefficients are zero, then some vector in S is a linear combination of the others. For instance, in the example above, the equation -2u1 + 3u2 + u3 - 0u4 = 0 can be solved for any vector having a nonzero coefficient, so u1, u2, or u3 (but not u4) can be written as a linear combination of the other three vectors. Thus, rather than asking whether some vector in S is a linear combination of the other vectors in S, it is more efficient to ask whether the zero vector can be expressed as a linear combination of the vectors in S with coefficients that are not all zero. This observation leads us to the following definition.

Definition. A subset S of a vector space V is called linearly dependent if there exist a finite number of distinct vectors u1, u2, ..., un in S and scalars a1, a2, ..., an, not all zero, such that

a1u1 + a2u2 + ··· + anun = 0.

In this case we also say that the vectors of S are linearly dependent.

For any vectors u1, u2, ..., un, we have a1u1 + a2u2 + ··· + anun = 0 if a1 = a2 = ··· = an = 0. We call this the trivial representation of 0 as a linear combination of u1, u2, ..., un. Thus, for a set to be linearly dependent, there must exist a nontrivial representation of 0 as a linear combination of vectors in the set. Consequently, any subset of a vector space that contains the zero vector is linearly dependent, because 0 = 1·0 is a nontrivial representation of 0 as a linear combination of vectors in the set.

Example 1

Consider the set

S = {(1, 3, -4, 2), (2, 2, -4, 0), (1, -3, 2, -4), (-1, 0, 1, 0)}

in R⁴. We show that S is linearly dependent and then express one of the vectors in S as a linear combination of the other vectors in S. To show that S is linearly dependent, we must find scalars a1, a2, a3, and a4, not all zero, such that

a1(1, 3, -4, 2) + a2(2, 2, -4, 0) + a3(1, -3, 2, -4) + a4(-1, 0, 1, 0) = 0.

Finding such scalars amounts to finding a nonzero solution to the system of linear equations

 a1 + 2a2 +  a3 - a4 = 0
3a1 + 2a2 - 3a3      = 0
-4a1 - 4a2 + 2a3 + a4 = 0
2a1       - 4a3      = 0.

One such solution is a1 = 4, a2 = -3, a3 = 2, and a4 = 0. Thus S is a linearly dependent subset of R⁴, and

4(1, 3, -4, 2) - 3(2, 2, -4, 0) + 2(1, -3, 2, -4) + 0(-1, 0, 1, 0) = 0.  •

Example 2

In M2×3(R), the set consisting of the three matrices

 1 -3  2        -3  7  4        -2  3 11
-4  0  5         6 -2 -7        -1 -3  2

is linearly dependent, because 5 times the first matrix plus 3 times the second matrix minus 2 times the third matrix is the 2 × 3 zero matrix.  •

Definition. A subset S of a vector space that is not linearly dependent is called linearly independent. As before, we also say that the vectors of S are linearly independent.

The following facts about linearly independent sets are true in any vector space.

1. The empty set is linearly independent, for linearly dependent sets must be nonempty.
2. A set consisting of a single nonzero vector is linearly independent. For if {u} is linearly dependent, then au = 0 for some nonzero scalar a. Thus

u = a^(-1)(au) = a^(-1)·0 = 0.

3. A set is linearly independent if and only if the only representations of 0 as linear combinations of its vectors are trivial representations.

The condition in item 3 provides a useful method for determining whether a finite set is linearly independent. This technique is illustrated in the examples that follow.

Example 3

To prove that the set

S = {(1, 0, 0, -1), (0, 1, 0, -1), (0, 0, 1, -1), (0, 0, 0, 1)}

is linearly independent, we must show that the only linear combination of vectors in S that equals the zero vector is the one in which all the coefficients are zero. Suppose that a1, a2, a3, and a4 are scalars such that

a1(1, 0, 0, -1) + a2(0, 1, 0, -1) + a3(0, 0, 1, -1) + a4(0, 0, 0, 1) = (0, 0, 0, 0).

Equating the corresponding coordinates of the vectors on the left and the right sides of this equation, we obtain the following system of linear equations:

 a1                  = 0
      a2             = 0
           a3        = 0
-a1 - a2 - a3 + a4   = 0.

Clearly the only solution to this system is a1 = a2 = a3 = a4 = 0, and so S is linearly independent.  •

Example 4

For k = 0, 1, ..., n let pk(x) = x^k + x^(k+1) + ··· + x^n. The set

{p0(x), p1(x), ..., pn(x)}

is linearly independent in Pn(F). For if

a0p0(x) + a1p1(x) + ··· + anpn(x) = 0

for some scalars a0, a1, ..., an, where 0 denotes the zero polynomial, then

a0 + (a0 + a1)x + (a0 + a1 + a2)x² + ··· + (a0 + a1 + ··· + an)xⁿ = 0.

By equating the coefficients of x^k on both sides of this equation for k = 0, 1, ..., n, we obtain

a0                     = 0
a0 + a1                = 0
a0 + a1 + a2           = 0
   ...
a0 + a1 + ··· + an     = 0.

Clearly the only solution to this system of linear equations is a0 = a1 = ··· = an = 0.  •
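Deciding whether a finite list of vectors in Fⁿ is linearly dependent amounts to asking whether the associated homogeneous system has a nonzero solution, as in Example 1. The sketch below is an added illustration (rational scalars assumed); it uses the row-reduction rank for this test, since the list is independent exactly when the number of pivots equals the number of vectors:

from fractions import Fraction

def rank(vectors):
    """Number of pivots after Gaussian elimination on the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def linearly_independent(vectors):
    # independent exactly when the number of pivots equals the number of vectors
    return rank(vectors) == len(vectors)

# the set from Example 1: linearly dependent
print(linearly_independent([(1, 3, -4, 2), (2, 2, -4, 0),
                            (1, -3, 2, -4), (-1, 0, 1, 0)]))   # False

# the set from Example 3: linearly independent
print(linearly_independent([(1, 0, 0, -1), (0, 1, 0, -1),
                            (0, 0, 1, -1), (0, 0, 0, 1)]))     # True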

consider the subset S = {u\.3a 3 )» 2 + 04^4.1. Thus the subset S' — {ai. we remarked that the issue of whether S is the smallest generating set for its span is related to the question of whether some vector in S is a linear combination of the other vectors in S. and 114: a\tii 4.0411. Thus the issue of whether S is the smallest generating set for its span is related to the question of whether S is linearly dependent.2 . 11. Theorem 1.\ 4.a. Theorem 1.u 2 . Proof. Therefore every linear combination a\Ui f a2u2 + a3":i + 04114 of vectors in S can be written as a linear combination of U[. Then S U {v} is linearly dependent if and only ifv € span(S).1 . 113 = (1.3). Then some vector v € S can be written as a linear combination of the other vectors in S. u2. Let V be a vector space. If Si is linearly dependent. SI S Corollary.5 Linear Dependence and Linear Independence 39 The following important results are immediate consequences of the definitions of linear dependence and linear independence. 1. Exercise.U3 .OU4. . -2u. Another way to view the preceding statement is given in Theorem 1. Exercise. suppose that S is any linearly dependent set containing two or more vectors.a3?i3 4.U4} of S has the same span as S! More generally.114} of R3.6.2u2 4. Let S be a linearly independent subset of a vector space V.7.4U4 = aiUi + o.4 = (ai 4. .3 = 2ii\ — 3u2 4. then S must be linearly independent. w2 = (1.3u 2 4.2u2 4. Let V be a vector space. .OM4) 4.1*2. For example. and the subset obtained by removing v from S has the same span as S. and let S] C S 2 C V. To see why. U\ or u2) is a linear combination of the other vectors in S.0.2a3)jvi + (a 2 . . then S 2 is linearly dependent. If S 2 is linearly independent.1 ) . 4 ) . and let V be a vector in V that is not in S.-1). where //. then Si is linearly independent.a 3 (2ui — 3u2 4.O//4 = 0. and «4 — ( 1 . —1. Proof.^3. This equation implies that w3 (or alternatively.7. It follows that if no proper subset of S generates the span of S. I Earlier in this section. = ( 2 . We have previously noted that S is linearly dependent: in fact. and let St C S 2 C V.Sec.

and (j) are tedious unless technology is used.a2u2 4. Therefore S U {v} is linearly dependent by Theorem 1. x 3 . If Su{v} is linearly dependent.. Thus aiv 4. 6 m such that v = biVi + b2v2 4. . and so v = a1 (-a2u2 onun) = . u2. . . Since v ^ Vi for i = 1 .xn are linearly independent..• • • 4. Since v is a linear combination of u2.• • • 4. then each vector in S is a linear combination of other vectors in S..x 2 4.anxn = 0 and X i . 1 Vector Spaces Proof. Then there exist vectors Vi. 1 -3 -2 6 in M 2 x 2 (fl) (a) -2 4 4 -8 ^ H .. (f) If aixi 4.l A ) \ 2 _4J)-M2><2^ (c) {x 3 4..1. Determine whether the following sets are linearly dependent or linearly independent. x 2 ..bmvm. then all the scalars a« are zero.x 4. . let v £ span(S). • • • . we have v £ span(S)..( a x a2)u2 (ax 1 an)un.. (a) If S is a linearly dependent set. which are in S. ra.2x 2 . . . . .a2u2 4. (i). equals v.. .• • • 4. Hence 0 = b\vi + b2v2 4-l)v.6. H Linearly independent generating sets are investigated in detail in Section 1.vm in S and scalars 6i. . .. say «i. Conversely. un in S U {v} such that ai^i 4...un.v} is linearly dependent. 2 . .3. (c) The empty set is linearly dependent. and so the set {vi. Label the following statements as true or false. . .anun = 0. one of the u^s. EXERCISES 1. a 2 .40 Chap. 62.2x .• • • + anun = 0 for some nonzero scalars a i .x 2 4. (e) Subsets of linearly independent sets are linearly independent. the coefficient of v in this linear combination is nonzero. . then there are vectors u\.V2.vm. 2. a n . .v2. (b) Any set containing the zero vector is linearly dependent. (h).1} in P3(R) 3 The computations in Exercise 2(g).. (d) Subsets of linearly dependent sets are linearly dependent.a2x2 4. Because S is linearly independent.6..

1. x 3 . 2 .l . x 2 .3x 2 . (1.2 . 8.x 4 4. then S is linearly independent.1)} be a subset of the vector space F 3 .5x . . 5. . .6} in P3(R) (e) { ( l .x 3 + ox 2 .6.x 3 4. 10.3x 3 4. Let S = {(1. . . Recall from Example 3 in Section 1. Show that the set { l . 1. 0 .3. Show that {u. . Prove that { e i . v} is linearly dependent if and only if u or v is a multiple of the other.5x 2 4. 9.2x 4.l . Find a linearly independent set that generates this subspace.x 4.x 3 4.x. let E1* denote the matrix whose only nonzero entry is 1 in the zth row and j t h column. .1).4. 7. In M 3 x 2 (F).3. Give an example of three linearly dependent vectors in R3 such that none of the three is a multiple of another. .x 4 4-x 3 .2x 4 4.5 Linear Dependence and Linear Independence (d) {x 3 . / 4. prove that the set is linearly dependent.6.1. x 4 + 3x 2 . 4 ) } i n R 3 (f) { ( l . x . . e n } is linearly independent. . then S is linearly dependent.2x 4 4. x n } is linearly independent in P n ( F ) . 6.l . ( 2 .2 x 3 4. ( .3 that the set of diagonal matrices in M 2 x2(^) is a subspace.2} in P 4 (R) 4 (j) {x .0.8x 4.8x} in P4(R) 3.8x 4. (a) Prove that if F = R. . e 2 .x 4-1. (0. 2x 2 4. 2 ) .5x 2 .l ) } i n R 3 (g) 1 0 ~2 1 1 0 ~2 1 0 1 0 1 -1 1 -1 1 -1 2 1 0 -1 2 1 0 2 1 -4 4 2 2 1 -2 in M 2 x 2 (tf) in M 2 x 2 ( # ) 41 (h) (i) {x4 .* Let u and v be distinct vectors in a vector space V. . l ) . (b) Prove that if F has characteristic 2.x 3 . In M m X n ( F ) . In F n . l .3x 4. .5.3x 4.0). l ) . . . 1 < j < n) is linearly independent. ( l . x 4 4. let ej denote the vector whose j t h coordinate is 1 and whose other coordinates are 0.5.4x 2 . 2 ) .Sec. Prove that \Exi: 1 < i < m.4x 2 4. ( l . .3x 2 4.5x 2 + 5x .

Prove that {u.u + w.8. A2.u2.u2. Alk} is also linearly independent. 1. v} is linearly independent if and only if {u 4. 19.3) with nonzero diagonal entries. Prove that S is linearly independent. g. Prove that a set S of vectors is linearly independent if and only if each finite subset of S is linearly independent. How many vectors are there in span(S)? Justify your answer. (This property is proved below in Theorem 1. Let S = {ui. 15. 17. • • • .. and w be distinct vectors in V. Let S = {ui.v. (a) Let u and v be distinct vectors in V.6 BASES AND DIMENSION We saw in Section 1.un} be a finite set of vectors. 12. € F(R. where r ^ s. Let / . Prove that {u. .U2.Wn} be a linearly independent subset of a vector space V over the field Z2. Prove that if {Ai.5 that if S is a generating set for a subspace W and no proper subset of S is a generating set for W. • • •.U2. (b) Let u. Prove that the columns of M are linearly independent. v.) It is this property that makes linearly independent generating sets the building blocks of vector spaces. ..v.. un in S such that v is a linear combination of ui. 14.w} is linearly independent. u — v} is linearly independent. v 4. .w} is linearly independent if and only if {u + v.Ak} is a linearly independent subset of Mnxn(-F)j then {A\. R) be the functions defined by f(t) = ert and g(t) = est.. w 2 .A2.42 Chap. 13. . . Let M be a square upper triangular matrix (as defined in Exercise 12 of Section 1. Prove that a set S is linearly dependent if and only if S = {0} or there exist distinct vectors v. • • • .. then S must be linearly independent. R). Uk}) for some k (1 < k < n). 16. . 18.. Prove Theorem 1..un. Let S be a set of nonzero polynomials in P(F) such that no two have the same degree. • • . Let V be a vector space over a field of characteristic not equal to two. Prove that / and g are linearly independent in J-(R. Prove that S is linearly dependent if and only if ui = 0 or Wfc+i € span({tii.. 20. A linearly independent generating set for W possesses a very useful property—every vector in W can be expressed in one and only one way as a linear combination of the vectors in the set. 1 Vector Spaces 11.6 and its corollary.U\.

.. .0. } is a basis.. Let V be a vector space and ft = {ui. later in this section it is shown that no basis for P(F) can be finite. we see that 0 is a basis for the zero vector space. • Example 2 In F n . Let ft be a basis for V. .. x 2 . We call this basis the s t a n d a r d basis for Pn(F). which is used frequently in Chapter 2. In fact..6 Bases and Dimension 43 Definition. • / Example 5 In P(F) the set { 1 .0. . Tims v is a linear combination of the vectors of ft.un} be a subset ofV. we aiso say that the vectors of ft form a basis for V. . e 2 . 1. • Example 3 In MTOXn(F).. a n . . let ei = (1. . Proof. e n } is readily seen to be a basis for F n and is called the standard basis for F n . Suppose that a\U\ +a2u2 4anun and v = b\U\ + b2u2 + • • • + bnut \.. row and j t h column.. Theorem 1. . A basis j3 for a vector space V is a linearly independent subset of V that generates V.1).0. x n } is a basis. Example 1 Recalling that span(0) = {()} and 0 is linearly independent..0. . a 2 . The next theorem. . .Sec. • Observe that Example 5 shows that a basis need not be finite. Then ft is a basis for V if and only if each o G V can be uniquely expressed as a linear combination of vectors of ft. x. .8. let E*i denote the matrix whose only nonzero entry is a 1 in the ith.0. If v e V. .anun . {ei. x 2 . . If ft is a basis for V. •' Example 4 In Pn(F) the set {1. . 0 ) .a2u2 H for unique scalars a\.. . 1 < j < n} is a basis for M m x n (F).0). .u2. .e„ = (0. . . can be expressed in the form v = aiUi 4.1. . . . that is. .. Hence not every vector space has a finite basis. x . establishes the most significant property of a basis.e 2 = (0. then v € span(/?) because span(/i/) = V. Then {E*? : 1 < i < m..

are two such representations of v. Subtracting the second equation from the first gives

0 = (a1 - b1)u1 + (a2 - b2)u2 + ··· + (an - bn)un.

Since β is linearly independent, it follows that a1 - b1 = a2 - b2 = ··· = an - bn = 0. Hence a1 = b1, a2 = b2, ..., an = bn, and so v is uniquely expressible as a linear combination of the vectors of β.

The proof of the converse is an exercise.

Theorem 1.8 shows that if the vectors u1, u2, ..., un form a basis for a vector space V, then every vector in V can be uniquely expressed in the form

v = a1u1 + a2u2 + ··· + anun

for appropriately chosen scalars a1, a2, ..., an. Thus v determines a unique n-tuple of scalars (a1, a2, ..., an) and, conversely, each n-tuple of scalars determines a unique vector v ∈ V by using the entries of the n-tuple as the coefficients of a linear combination of u1, u2, ..., un. This fact suggests that V is like the vector space Fⁿ, where n is the number of vectors in the basis for V. We see in Section 2.4 that this is indeed the case.

In this book, we are primarily interested in vector spaces having finite bases. Theorem 1.9 identifies a large class of vector spaces of this type.

Theorem 1.9. If a vector space V is generated by a finite set S, then some subset of S is a basis for V. Hence V has a finite basis.

Proof. If S = ∅ or S = {0}, then V = {0} and ∅ is a subset of S that is a basis for V. Otherwise S contains a nonzero vector u1. By item 2 on page 37, {u1} is a linearly independent set. Continue, if possible, choosing vectors u2, ..., uk in S such that {u1, u2, ..., uk} is linearly independent. Since S is a finite set, we must eventually reach a stage at which β = {u1, u2, ..., uk} is a linearly independent subset of S, but adjoining to β any vector in S not in β produces a linearly dependent set. We claim that β is a basis for V. Because β is linearly independent by construction, it suffices to show that β spans V. By Theorem 1.5 (p. 30) we need to show that S ⊆ span(β). Let v ∈ S. If v ∈ β, then clearly v ∈ span(β). Otherwise, if v ∉ β, then the preceding construction shows that β ∪ {v} is linearly dependent. So v ∈ span(β) by Theorem 1.7 (p. 39). Thus S ⊆ span(β).

Because of the method by which the basis β was obtained in the proof of Theorem 1.9, this theorem is often remembered as saying that a finite spanning set for V can be reduced to a basis for V. This method is illustrated in the next example.

Example 6

Let S = {(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)}. It can be shown that S generates R³. We can select a basis for R³ that is a subset of S by the technique used in proving Theorem 1.9. To start, select any nonzero vector in S, say (2, -3, 5), to be a vector in the basis. Since 4(2, -3, 5) = (8, -12, 20), the set {(2, -3, 5), (8, -12, 20)} is linearly dependent by Exercise 9 of Section 1.5. Hence we do not include (8, -12, 20) in our basis. On the other hand, (1, 0, -2) is not a multiple of (2, -3, 5) and vice versa, so that the set {(2, -3, 5), (1, 0, -2)} is linearly independent. Thus we include (1, 0, -2) as part of our basis.

Now we consider the set {(2, -3, 5), (1, 0, -2), (0, 2, -1)} obtained by adjoining another vector in S to the two vectors that we have already included in our basis. As before, we include (0, 2, -1) in our basis or exclude it from the basis according to whether this set is linearly independent or linearly dependent. An easy calculation shows that this set is linearly independent, and so we include (0, 2, -1) in our basis. In a similar fashion the final vector in S is included or excluded from our basis according to whether the set

{(2, -3, 5), (1, 0, -2), (0, 2, -1), (7, 2, 0)}

is linearly independent or linearly dependent. Because

2(2, -3, 5) + 3(1, 0, -2) + 4(0, 2, -1) - (7, 2, 0) = (0, 0, 0),

this set is linearly dependent, and so we exclude (7, 2, 0) from our basis. We conclude that

{(2, -3, 5), (1, 0, -2), (0, 2, -1)}

is a subset of S that is a basis for R³.  •
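The selection procedure of Example 6, keep a vector exactly when it is not a linear combination of the vectors already kept, can be sketched computationally as follows (an added illustration assuming rational scalars; rank is the usual row-reduction rank):

from fractions import Fraction

def rank(vectors):
    """Number of pivots after Gaussian elimination on the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

def select_basis(generating_set):
    """Keep a vector only if it enlarges the span of the vectors kept so far."""
    basis = []
    for v in generating_set:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

S = [(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)]
print(select_basis(S))
# keeps (2, -3, 5), (1, 0, -2), (0, 2, -1) and discards (8, -12, 20) and (7, 2, 0),
# in agreement with Example 6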

The corollaries of the following theorem are perhaps the most significant results in Chapter 1.

Theorem 1.10 (Replacement Theorem). Let V be a vector space that is generated by a set G containing exactly n vectors, and let L be a linearly independent subset of V containing exactly m vectors. Then m ≤ n and there exists a subset H of G containing exactly n - m vectors such that L ∪ H generates V.

Proof. The proof is by mathematical induction on m. The induction begins with m = 0; for in this case L = ∅, and so taking H = G gives the desired result.

Now suppose that the theorem is true for some integer m ≥ 0. We prove that the theorem is true for m + 1. Let L = {v1, v2, ..., vm+1} be a linearly independent subset of V consisting of m + 1 vectors. By the corollary to Theorem 1.6 (p. 39), {v1, v2, ..., vm} is linearly independent, and so we may apply the induction hypothesis to conclude that m ≤ n and that there is a subset {u1, u2, ..., un-m} of G such that {v1, v2, ..., vm} ∪ {u1, u2, ..., un-m} generates V. Thus there exist scalars a1, a2, ..., am, b1, b2, ..., bn-m such that

a1v1 + a2v2 + ··· + amvm + b1u1 + b2u2 + ··· + bn-m un-m = vm+1.    (9)

Note that n - m > 0, lest vm+1 be a linear combination of v1, v2, ..., vm, which by Theorem 1.7 (p. 39) contradicts the assumption that L is linearly independent. Hence n > m; that is, n ≥ m + 1. Moreover, some bi, say b1, is nonzero, for otherwise we obtain the same contradiction. Solving (9) for u1 gives

u1 = (-b1^(-1)a1)v1 + (-b1^(-1)a2)v2 + ··· + (-b1^(-1)am)vm + b1^(-1)vm+1 + (-b1^(-1)b2)u2 + ··· + (-b1^(-1)bn-m)un-m.

Let H = {u2, ..., un-m}. Then u1 ∈ span(L ∪ H), and because v1, v2, ..., vm, u2, ..., un-m are clearly in span(L ∪ H), it follows that

{v1, v2, ..., vm, u1, u2, ..., un-m} ⊆ span(L ∪ H).

Because {v1, v2, ..., vm, u1, u2, ..., un-m} generates V, Theorem 1.5 (p. 30) implies that span(L ∪ H) = V. Since H is a subset of G that contains (n - m) - 1 = n - (m + 1) vectors, the theorem is true for m + 1. This completes the induction.

Corollary 1. Let V be a vector space having a finite basis. Then every basis for V contains the same number of vectors.

Proof. Suppose that β is a finite basis for V that contains exactly n vectors, and let γ be any other basis for V. If γ contains more than n vectors, then we can select a subset S of γ containing exactly n + 1 vectors. Since S is linearly independent and β generates V, the replacement theorem implies that n + 1 ≤ n, a contradiction. Therefore γ is finite, and the number m of vectors in γ satisfies m ≤ n. Reversing the roles of β and γ and arguing as above, we obtain n ≤ m. Hence m = n.

If a vector space has a finite basis, Corollary 1 asserts that the number of vectors in any basis for V is an intrinsic property of V. This fact makes possible the following important definitions.

Definitions. A vector space is called finite-dimensional if it has a basis consisting of a finite number of vectors. The unique number of vectors in each basis for V is called the dimension of V and is denoted by dim(V). A vector space that is not finite-dimensional is called infinite-dimensional.

The following results are consequences of Examples 1 through 4.

Example 7
The vector space {0} has dimension zero.  •

Example 8
The vector space Fⁿ has dimension n.  •

Example 9
The vector space Mm×n(F) has dimension mn.  •

Example 10
The vector space Pn(F) has dimension n + 1.  •

The following examples show that the dimension of a vector space depends on its field of scalars.

Example 11
Over the field of complex numbers, the vector space of complex numbers has dimension 1. (A basis is {1}.)  •

Example 12
Over the field of real numbers, the vector space of complex numbers has dimension 2. (A basis is {1, i}.)  •

In the terminology of dimension, the first conclusion in the replacement theorem states that if V is a finite-dimensional vector space, then no linearly independent subset of V can contain more than dim(V) vectors.

From this fact it follows that the vector space P(F) is infinite-dimensional because it has an infinite linearly independent set, namely {1, x, x², ...}. This set is, in fact, a basis for P(F). Yet nothing that we have proved in this section guarantees that an infinite-dimensional vector space must have a basis. In Section 1.7 it is shown, however, that every vector space has a basis.

Just as no linearly independent subset of a finite-dimensional vector space V can contain more than dim(V) vectors, a corresponding statement can be made about the size of a generating set.

Corollary 2. Let V be a vector space with dimension n.
(a) Any finite generating set for V contains at least n vectors.

Moreover.4 and (a) of Corollary 2 that {x 2 4.9 some subset H of G is a basis for V.4} is a basis for P2(R. so that G is a basis for V. (a) Let G be a finite generating set for V.1. 1 Vector Spaces (b) Any linearly independent subset ofV that contains exactly n vectors is a basis for V.3x . Proof. therefore (a) implies that L U H contains exactly n vectors and that L U H is a basis for V. Now L U H contains at most n vectors.1 ) . L is a basis for V.1 ) . .48 Chap.5x . G must contain at least n vectors. and L generates V.1)} is a basis for R4. (c) Every linearly independent subset of V can be extended to a basis for V.5 and (b) of Corollary 2 that {(1. ( : : It follows from Example 3 of Section 1.0. E x a m p l e 15 .0. .4 and (a) of Corollary 2 that : is a basis for M2x2(R). . . • E x a m p l e 14 It follows from Example 5 of Section 1. (c) If L is a linearly independent subset of V containing ra vectors. if G contains exactly n vectors. • . (b) Let L be a linearly independent subset of V containing exactly n vectors. By Theorem 1. (0. (0. Since L is also linearly independent. Thus H = 0 .x 2 . Since a subset of G contains n vectors.0. 1 E x a m p l e 13 It follows from Example 4 of Section 1. then the replacement theorem asserts that there is a subset H of ft containing exactly n — ra vectors such that LU H generates V.0.). Corollary 1 implies that H contains exactly n vectors.1 ) . It follows from the replacement theorem that there is a subset H of ft containing n — n = 0 vectors such that L U H generates V. (0.0.2x 2 4.0. Let ft be a basis for V.1.4x 4.3.)•(.: ^ : ) • ( : : ) . then we must have H = G.2.

6 depicts these relationships. 49 f xn.. Also. each generating set for V contains at least n vectors and can be reduced to a basis for V by excluding appropriately chosen vectors. n. . and V is said to be finite-dimensional. For this reason. bases. It follows from Example 4 A procedure for reducing a generating set to a basis was illustrated in Example 6. every linearly independent subset of V contains no more than n vectors and can be extended to a basis for V by including appropriately chosen vectors. Thus if the dimension of V is n. when we have learned more about solving systems of linear equations. and generating sets.. Moreover. Figure 1.pn(x)} is a basis for P n (F). A basis for a vector space V is a linearly independent subset of V that generates V.Sec.5 and (b) of Corollary 2 that {po(x). . If V has a finite basis. . as (c) of Corollary 2 asserts is possible..6 Bases and Dimension Example 16 For k = 0 .9 as well as the replacement theorem and its corollaries contain a wealth of information about the relationships among linearly independent sets.. let pk(x) = x fc 4-x fc+1 4 of Section 1. we summarize here the main results of this section in order to put them into better/perspective. every basis for V contains exactly n vectors. then every basis for V contains the same number of vectors. In Section 3.6 .4. we will discover a simpler method for reducing a generating set to a basis. An Overview of Dimension and Its Consequences Theorem 1. This number is called the dimension of V. 1 . . 1. The Venn diagram in Figure 1. This procedure also can be used to extend a linearly independent set to a basis.Pi(x).

Proof. .0).3 that the set of symmetric n x n matrices is a subspace W of M n X n ( F ) .11. • . • • • . if dim(W) = dim(V). X in W such & that {xi. 1 Vector Spaces Our next result relates the dimension of a subspace to the dimension of the vector space that contains it. If dim(W) = n.0. .a&) £ F 5 : a. this process must stop at a stage where A < n and { x i . But Corollary 2 of the replacement theorem implies that this basis for W is also a basis for V. 39) implies that { x i . .05 = 0. . It is easily shown that W is a subspace of F5 having {(-1.0. Example 18 The set of diagonal n x n matrices is a subspace W of M n x n ( F ) (see Example 3 of Section 1. and hence it is a basis for W. any other vector from W produces a linearly dependent set.0. then V = W. Since no linearly independent subset of V can contain more than n vectors.1)..7 (p. . 1 Example 17 Let / W = {(o. Thus dim(W) = 3. .3).Enn}. Then W is finite-dimensional and dim(W) < dim(V). (-1.Xfc} generates W. Moreover.a :i .. Therefore dim(W) = k < n.0)} as a basis. Xi. Theorem 1.1. • • • . . (0. A basis for W is {En.\.\ + 03 4. Otherwise.50 The Dimension of Subspaces Chap.a 2 . Theorem 1. then W is finite-dimensional and dim(W) = 0 < n. so {xi} is a linearly independent set.. a2 = 04). x 2 . x 2 . Thus dim(W) = n..0.X2.Xfc} is linearly independent.E22. so W = V. W contains a nonzero vector Xj. Continue choosing vectors.X2. • Example 19 We saw in Section 1.Xfc} is linearly independent but adjoining . A basis for W is {Aij : 1 < i < j < n}.1.0. Let dim(V) = n.0. . If W = {()}. Let W be a subspace of a finite-dimensional vector space V. then a basis for W is a linearly independent subset of V containing n vectors. where E%3 is the matrix in which the only nonzero entry is a 1 in the iih row and j t h column.04.1.

a16, ..., a2, a0 ∈ F, is a subspace W of P18(F). A basis for W is {1, x², x⁴, ..., x^16, x^18}, which is a subset of the standard basis for P18(F).  •

We can apply Theorem 1.11 to determine the subspaces of R² and R³. Since R² has dimension 2, subspaces of R² can be of dimensions 0, 1, or 2 only. The only subspaces of dimension 0 or 2 are {0} and R², respectively. Any subspace of R² having dimension 1 consists of all scalar multiples of some nonzero vector in R² (Exercise 11 of Section 1.4).

If a point of R² is identified in the natural way with a point in the Euclidean plane, then it is possible to describe the subspaces of R² geometrically: a subspace of R² having dimension 0 consists of the origin of the Euclidean plane, a subspace of dimension 1 is a line through the origin, and a subspace of R² having dimension 2 is the entire Euclidean plane.

Similarly, the subspaces of R³ must have dimensions 0, 1, 2, or 3. Interpreting these possibilities geometrically, we see that a subspace of dimension zero must be the origin of Euclidean 3-space, a subspace of dimension 1 is a line through the origin, a subspace of dimension 2 is a plane through the origin, and a subspace of dimension 3 is Euclidean 3-space itself.

Corollary. If W is a subspace of a finite-dimensional vector space V, then any basis for W can be extended to a basis for V.

Proof. Let S be a basis for W. Because S is a linearly independent subset of V, Corollary 2 of the replacement theorem guarantees that S can be extended to a basis for V.

The Lagrange Interpolation Formula

Corollary 2 of the replacement theorem can be applied to obtain a useful formula. Let c0, c1, ..., cn be distinct scalars in an infinite field F. The polynomials f0(x), f1(x), ..., fn(x) defined by

fi(x) = [(x - c0)···(x - c_(i-1))(x - c_(i+1))···(x - cn)] / [(ci - c0)···(ci - c_(i-1))(ci - c_(i+1))···(ci - cn)],

that is, fi(x) is the product of the factors (x - ck)/(ci - ck) taken over all k with 0 ≤ k ≤ n and k ≠ i, are called the Lagrange polynomials (associated with c0, c1, ..., cn). Note that each fi(x) is a polynomial of degree n and hence is in Pn(F). By regarding fi(x) as a polynomial function fi: F → F, we see that

fi(cj) = 0 if i ≠ j, and fi(cj) = 1 if i = j.                      (10)

This property of the Lagrange polynomials can be used to show that β = {f0, f1, ..., fn} is a linearly independent subset of Pn(F). Suppose that

a0f0 + a1f1 + ··· + anfn = 0

for some scalars a0, a1, ..., an, where 0 denotes the zero function. Then

a0f0(cj) + a1f1(cj) + ··· + anfn(cj) = 0 for j = 0, 1, ..., n.

But also, by (10),

a0f0(cj) + a1f1(cj) + ··· + anfn(cj) = aj.

Hence aj = 0 for j = 0, 1, ..., n, so β is linearly independent. Since the dimension of Pn(F) is n + 1, it follows from Corollary 2 of the replacement theorem that β is a basis for Pn(F).

Because β is a basis for Pn(F), every polynomial function g in Pn(F) is a linear combination of polynomial functions of β, say

g = b0f0 + b1f1 + ··· + bnfn.

It follows that

g(cj) = b0f0(cj) + b1f1(cj) + ··· + bnfn(cj) = bj;

so

g = g(c0)f0 + g(c1)f1 + ··· + g(cn)fn

is the unique representation of g as a linear combination of elements of β. This representation is called the Lagrange interpolation formula. Notice that the preceding argument shows that if b0, b1, ..., bn are any n + 1 scalars in F (not necessarily distinct), then the polynomial function

g = b0f0 + b1f1 + ··· + bnfn

is the unique polynomial in Pn(F) such that g(cj) = bj. Thus we have found the unique polynomial of degree not exceeding n that has specified values bj at given points cj in its domain (j = 0, 1, ..., n).
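The Lagrange interpolation formula translates directly into a short computation. The sketch below is an added illustration, not from the text; it evaluates g(x) = b0f0(x) + ··· + bnfn(x) at a point with rational arithmetic and reproduces the example worked out next:

from fractions import Fraction

def lagrange_value(points, x):
    """Evaluate at x the unique polynomial of degree at most n whose graph
    contains the n + 1 given points (c_i, b_i) with distinct c_i."""
    x = Fraction(x)
    cs = [Fraction(c) for c, _ in points]
    bs = [Fraction(b) for _, b in points]
    total = Fraction(0)
    for i, ci in enumerate(cs):
        # f_i(x) = product over k != i of (x - c_k)/(c_i - c_k)
        fi = Fraction(1)
        for k, ck in enumerate(cs):
            if k != i:
                fi *= (x - ck) / (ci - ck)
        total += bs[i] * fi
    return total

pts = [(1, 8), (2, 5), (3, -4)]                        # the example below
print([int(lagrange_value(pts, c)) for c, _ in pts])   # [8, 5, -4]
print(lagrange_value(pts, 0))                          # 5, the constant term of g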

For example, let us construct the real polynomial g of degree at most 2 whose graph contains the points (1, 8), (2, 5), and (3, -4). (Thus, in the notation above, c0 = 1, c1 = 2, c2 = 3, b0 = 8, b1 = 5, and b2 = -4.) The Lagrange polynomials associated with c0, c1, and c2 are

f0(x) = [(x - 2)(x - 3)] / [(1 - 2)(1 - 3)] = (1/2)(x² - 5x + 6),

f1(x) = [(x - 1)(x - 3)] / [(2 - 1)(2 - 3)] = -(x² - 4x + 3),

and

f2(x) = [(x - 1)(x - 2)] / [(3 - 1)(3 - 2)] = (1/2)(x² - 3x + 2).

Hence the desired polynomial is

g(x) = b0f0(x) + b1f1(x) + b2f2(x)
     = 8f0(x) + 5f1(x) - 4f2(x)
     = 4(x² - 5x + 6) - 5(x² - 4x + 3) - 2(x² - 3x + 2)
     = -3x² + 6x + 5.

An important consequence of the Lagrange interpolation formula is the following result: If f ∈ Pn(F) and f(ci) = 0 for n + 1 distinct scalars c0, c1, ..., cn in F, then f is the zero function.

EXERCISES

1. Label the following statements as true or false.
(a) The zero vector space has no basis.
(b) Every vector space that is generated by a finite set has a basis.
(c) Every vector space has a finite basis.
(d) A vector space cannot have more than one basis.

x + 2x 2 . and if S is a subset of V with n vectors.4 .-1).2 + x .5.2.0)} a linearly independent subset of R 3 ? Justify your answer. and u*. .3 . 8 .2 ) .18x .5x .-1)} (c) {(1.u2. Give three different bases for F 2 and for M 2X 2(F).3 .1.9x 2 } 4.1. (6. (a) {(1. (k) If V is a vector space having dimension n.4x 2 — x4-3.2x 2 . that Si is a linearly independent subset of V.3 . The vectors u : = (2.37. (0.2x 4.-1). .1).8).2x . Determine which of the following sets are bases for R3. 4 .3 . Do the polynomials x 3 — 2x 2 4T. Find a subset of the set {ui. 5.0.-3.4x 2 . 3 ) .-2). . l .X4-X2} (c) {1 . (1) If V is a vector space having dimension n. (h) Suppose that V is a finite-dimensional vector space. . then the number of vectors in every basis is the same.2 x 2 .0. uA = (1. u2 = (1. u 3 = ( . .(2. 2. (j) Every subspace of a finite-dimensional space is finite-dimensional.6x 2 } (d) { .2 .x 2 . .3. . and that S 2 is a subset of V that generates V.2 .x 4.0.1).6 ) .x 2 .6x 2 } (e) {1 4. and 3x —2 generate P. Then Si cannot contain more vectors than S 2 .2x 4.4.2x .1 4.2 4.1). (1. (2. Is {(1. 7. 3 ) } (b) {(2. 1 .} that is a basis for R3.10x2.3 ) . (0.(1.3 4-X2.3. then every vector in V can be written as a linear combination of vectors in S in only one way. then V has exactly one subspace with dimension 0 and exactly one subspace with dimension n. 1 . ( . 6.1 4. .3x . .1.8 .2 ) } 3.ur. —17).U3.4. (g) The dimension of M m X n ( F ) is rn 4.4 x 2 } {14-2X4-X2.2).1). ( . .4 ) . (i) If S generates the vector space V. (2. 2 ) } (e) {(1.54 Chap.U4.n.l . -10. .1)} (d) {(-1. 1 Vector Spaces (e) If a vector space has a finite basis.x 2 . (f) The dimension of P n ( F ) is n. (2.2 x 4 .4 . . —5.1).-4. 1 2 . (a) (b) { .3(i?)? Justify your answer.5. ( 0 .8) generate R3.-1). Determine which of the following sets are bases for P2(R). ( . then S is linearly independent if and only if S spans V.4x . = (—3.

(-1. (a) (b) (c) (d) ( . (-1.9 .5 . v + w. v.3 0 ) . 1. . then {u 4. 1 . 6 ) . a 2 .1.Sec.15.a 3 . . w} is also a basis for V.. 1 ) . The vectors wi u3 u5 u7 = = = = (2. .3 .1. . 13.2x 2 4.1) form a basis for F4. .02.15). (1. Find bases for the following subspaces of F 5 : Wi = {(ai.3).1). 14. -3). then both {u 4. Find the unique representation 4 of an arbitrary vector (01. bv} are also bases for V. What are the dimensions of Wi and W2? . 3 .2 ) .a 5 = 0}.2 . u2 = (0.a4. (2. a 5 ) £ F 5 : a 2 = a 3 = a 4 and a x 4. Let W denote the subspace of R5 consisting of all the vectors having coordinates that sum to zero.3 . and U4.3) (-4. 9. (3. v. .1). .a 5 ) £ F 5 : a x . (1.5).-2) ( . 7 .6 .1. . 9 . ( . u2 u4 uQ u8 = = = = ( . . Find a basis for this subspace.10) / 11. Let u and v be distinct vectors of a vector space V. a 4 .2 . (1. (1. -18. u3 = (0. .9). . use the Lagrange interpolation formula to construct the polynomial of smallest degree whose graph contains the following points. (2.1.0. . v} is a basis for V and a and 6 are nonzero scalars. u2. 2 ) .6 ) . In each part.04) in F4 as a linear combination of Mi. (0.1.12). . 7 ) . U3.1. 12.a 3 . W. -12. w} is a basis for V.. (3.2.Ug} that is a basis for generate W. (1. 4 .03. 10. The set of solutions to the system of linear equations xi .9 ..0. and W = (0. .w. 2 .6 ) . Let u. Show that if {u.0.2 .1).v 4.7). a 3 . (-2.a2.8 .a 4 = 0} and W 2 = {(ai. and w be distinct vectors of a vector space V. (3.1.1 .0.0).v.6 Bases and Dimension 55 8. Show that if {u.1 .3 . .24).6 ) . Find a subset of the set {ui. u2.2 .3) (-2.9. (0.x 3 = 0 2xi — 3x 2 4. au} and {au.X3 = 0 is a subspace of R3. The vectors m = (1.

vk.* Let V be a vector space having dimension n. and let S be a subset of V that generates V. Find a basis for W. (b) State and prove a relationship involving dim(Wi) and dim(W 2 ) in the case that dim(Wi) ^ dim(W 2 ). Determine necessary and sufficient conditions on Wi and W 2 so that d i m ( W i n W 2 ) = dim(W 1 ). Let Vi.vk}). 19. Complete the proof of Theorem 1. • • • . (a) Prove that there is a subset of S that is a basis for V.c2f"(x) + • • • 4where f^n'(x) denotes the nth derivative of f(x).vk. What is the dimension of W? 17.) (b) Prove that S contains at least n vectors.2. c\. 23.. Prove that a vector space is infinite-dimensional if and only if it contains an infinite linearly independent subset. W. . Justify your answer.. determine the dimension ofZ. (Be careful not to assume that S is finite.. 24.. 21. 20. Let Wi and W 2 be subspaces of a finite-dimensional vector space V. Find a basis for W. 22.v span({vi. Let V. What Exam is the dimension of W? 16.v}).v2.. 1 Vector Spaces 15. The set of all n x n matrices having trace equal to zero is a subspace W of M n x n ( F ) (see Example 4 of Section 1. What is the dimension of W? 18. and Z be as in Exercise 21 of Section 1. Let f(x) be a polynomial of degree n in Pn(R). Find a basis for W.C l / ' ( x ) 4. Prove that for any g(x) £ Pn(R) there exist scalars Co. Find a basis for the vector space in Example 5 of Section 1. If V and W are vector spaces over F of dimensions m and n. (a) Find necessary and sufficient conditions on v such that dim(Wi) = dim(W 2 ).cn such that g(x) = co/(x) 4.3). .56 Chap.3).. 25. be vectors in a vector space V. The set of all upper triangular n x n matrices is a subspace W of M n X n ( F ) (see Exercise 12 of Section 1. cnf^(x)..2.v2..8. and define Wi = and W 2 = span({vi.... The set of all skew-symmetric n x n matrices is a subspace W of Mnxn(^) (see Exercise 28 of Section 1.3).V2.

27. Let V = M 2 X 2 (F). Let V be a finite-dimensional vector space over C with dimension n. and Wi n W 2 . 30.) Exercises 29-34 require knowledge of the sum and direct sum of subspaces.. (See Examples 11 and 12. and W2 = £\/:a.dim(W 2 ). n W 2 ).b£F Wi = c a eV: a.. Deduce that V is4he direct sum of Wi and W 2 if and only if dim(V) = dim(Wi) 4. and let V = Wi 4..W 2 ) <m + n. 1. 31.. Let Wi and W 2 be the subspaces of P(F) defined in Exercise 25 in Section 1. (b) Prove that dim(Wi 4. Let Wi and W 2 be subspaces of a vector space V having dimensions rn and n.3... then dim V = 2n.. respectively. 29. W 2 . wp} for W 2 .uk...VV2) = m + n. where m > n > 0. (a) Find an example of subspaces Wi and W 2 of R3 with dimensions m and n. For a fixed a £ R. such that dim(Wi 4. (a) Prove that dim(Wi n W 2 ) < n.Sec.6 Bases and Dimension 57 26..W 2 .W 2 is finite-dimensional. where m > n > 0. Hint: Start with a basis {ui..u2..w2. (b) Let Wi and W 2 be finite-dimensional subspaces of a vector space V. (b) Find an example of subspaces Wi and VV2 of R3 with dimensions m and n.. vm} for Wi and to a basis {uii«2.v2.b. . then the subspace Wi 4. Prove that if V is now regarded as a vector space over R. u2.vi.uk} for Wi f) W 2 and extend this set to a basis {ui. Determine the dimensions of the subspaces Wi n P n ( F ) and W 2 n P n (F). such that dim(Wi D W 2 ) = n. 28. . as defined in the exercises of Section 1.wi. (a) Prove that if W] and W 2 are finite-dimensional subspaces of a vector space V.W 2 .c£F —a Prove that W] and W 2 are subspaces of V. determine the dimension of the subspace of Pn(R) defined by {f £ Pn(R): f(a) = 0}...dim(W. . where m> n. uk. 32. Wi 4. and dim(Wi 4-W 2 ) = dim(Wi) 4-dim(W 2 ) .. and find the dimensions of Wi.3.

un 4.u2. There is no obvious way to construct a basis for this space. which played a crucial role in many of the proofs of Section 1. Our principal goal here is to prove that every vector space has a basis.W} is a basis for V/W. Instead. and yet it follows from the results of this section that such a basis does exist.W 2 ) < m 4-n. . for example. Before stating this principle. 1 Vector Spaces 3 (c) Find an example of subspaces Wi and W 2 of R with dimensions m and n. is no longer adequate. Let {ui. The following exercise requires familiarity with Exercise 31 of Section 1. .be a family of sets. 1.. . .3. respectively..u2. Give examples of two different subspaces W 2 and W 2 such that V = Wi © W 2 and V = Wi©W2.. show that fti n £k = 0 and fti U ft2 is a basis for V. dim(W). The difficulty that arises in extending the theorems of the preceding section to infinite-dimensional vector spaces is that the principle of mathematical induction.uk. Let J. wfc+2 4.6. the vector space of real numbers over the field of rational numbers. (b) Derive a formula relating dim(V).... 34. 35. then V = Wi © W 2 . respectively.. (b) Let V = R2 and Wi = {(ai. let fti and ft2 be disjoint bases for subspaces Wi and W 2 .. A member M of T is called maximal (with respect to set inclusion) if M is contained in no member of T other than M itself.0): ai £ R}. of a vector space V.. Consider.W .6 are extended to infinite-dimensional vector spaces.uk} for W. Prove that if fti U ft2 is a basis for V. where rn > n. If fti and ft2 are bases for Wi and W 2 . . (a) Let Wi and W 2 be subspaces of a vector space V such that V = Wi ©W 2 . (a) Prove that {uk+i + W. several significant results from Section 1. (a) Prove that if Wi is any subspace of a finite-dimensional vector space V. we need to introduce some terminology. This result is important in the study of infinite-dimensional vector spaces because it is often difficult to construct an explicit basis for such a space. such that both dim(Wi D W 2 ) < n and dim(Wi 4.un} be an extension of this basis to a basis for V. 33.58 Chap. (b) Conversely. . .7* MAXIMAL LINEARLY INDEPENDENT SUBSETS In this section. . and consider the basis {ui. Definition. then there exists a subspace VV2 of V such that V = Wi©W2.uk+\. a more general result called the maximal principle is needed. Let W be a subspace of a finite-dimensional vector space V. and dim(V/W).

Example 1
Let F be the family of all subsets of a nonempty set S. (This family F is called the power set of S.) The set S is easily seen to be a maximal element of F.

Example 2
Let S and T be disjoint nonempty sets, and let F be the union of their power sets. Then S and T are both maximal elements of F.

Example 3
Let F be the family of all finite subsets of an infinite set S. Then F has no maximal element. For if M is any member of F and s is any element of S that is not in M, then M ∪ {s} is a member of F that contains M as a proper subset.

Definition. A collection of sets C is called a chain (or nest or tower) if for each pair of sets A and B in C, either A ⊆ B or B ⊆ A.

Example 4
For each positive integer n let An = {1, 2, ..., n}. Then the collection of sets C = {An: n = 1, 2, 3, ...} is a chain. In fact, Am ⊆ An if and only if m ≤ n.

With this terminology we can now state the maximal principle.

Maximal Principle (4). Let F be a family of sets. If, for each chain C ⊆ F, there exists a member of F that contains each member of C, then F contains a maximal member.

Because the maximal principle guarantees the existence of maximal elements in a family of sets satisfying the hypothesis above, it is useful to reformulate the definition of a basis in terms of a maximal property. In Theorem 1.12 we show that this is possible; in fact, the concept defined next is equivalent to a basis.

Definition. Let S be a subset of a vector space V. A maximal linearly independent subset of S is a subset B of S satisfying both of the following conditions.
(a) B is linearly independent.
(b) The only linearly independent subset of S that contains B is B itself.

(4) The Maximal Principle is logically equivalent to the Axiom of Choice, which is an assumption in most axiomatic developments of set theory. For a treatment of set theory using the Maximal Principle, see John L. Kelley, General Topology, Graduate Texts in Mathematics Series, Vol. 27, Springer-Verlag, New York, 1991.

Example 5
Example 2 of Section 1.4 shows that {x^3 - 2x^2 - 5x - 3, 3x^3 - 5x^2 - 4x - 9} is a maximal linearly independent subset of S = {2x^3 - 2x^2 + 12x - 6, x^3 - 2x^2 - 5x - 3, 3x^3 - 5x^2 - 4x - 9} in P3(R). In this case, however, any subset of S consisting of two polynomials is easily shown to be a maximal linearly independent subset of S. Thus maximal linearly independent subsets of a set need not be unique.

A basis β for a vector space V is a maximal linearly independent subset of V, because
1. β is linearly independent by definition, and
2. if v ∈ V and v is not in β, then β ∪ {v} is linearly dependent by Theorem 1.7 (p. 39) because span(β) = V.
Our next result shows that the converse of this statement is also true.

Theorem 1.12. Let V be a vector space and S a subset that generates V. If β is a maximal linearly independent subset of S, then β is a basis for V.

Proof. Let β be a maximal linearly independent subset of S. Because β is linearly independent, it suffices to prove that β generates V. We claim that S ⊆ span(β), for otherwise there exists a v ∈ S such that v is not in span(β). Since Theorem 1.7 (p. 39) implies that β ∪ {v} is linearly independent, we have contradicted the maximality of β. Therefore S ⊆ span(β). Because span(S) = V, it follows from Theorem 1.5 (p. 30) that span(β) = V.

Thus a subset of a vector space is a basis if and only if it is a maximal linearly independent subset of the vector space. Therefore we can accomplish our goal of proving that every vector space has a basis by showing that every vector space contains a maximal linearly independent subset. This result follows immediately from the next theorem.

Theorem 1.13. Let S be a linearly independent subset of a vector space V. There exists a maximal linearly independent subset of V that contains S.

Proof. Let F denote the family of all linearly independent subsets of V that contain S. In order to show that F contains a maximal element, we must show that if C is a chain in F, then there exists a member U of F that contains each member of C. We claim that U, the union of the members of C, is the desired set. Clearly U contains each member of C, and so it suffices to prove

that U ∈ F (i.e., that U is a linearly independent subset of V that contains S). Because each member of C is a subset of V containing S, we have S ⊆ U ⊆ V. Thus we need only prove that U is linearly independent. Let u1, u2, ..., un be in U and a1, a2, ..., an be scalars such that a1u1 + a2u2 + ··· + anun = 0. Because ui ∈ U for i = 1, 2, ..., n, there exists a set Ai in C such that ui ∈ Ai. But since C is a chain, one of these sets, say Ak, contains all the others. Thus ui ∈ Ak for i = 1, 2, ..., n. However, Ak is a linearly independent set; so a1u1 + a2u2 + ··· + anun = 0 implies that a1 = a2 = ··· = an = 0. It follows that U is linearly independent.
The maximal principle implies that F has a maximal element. This element is easily seen to be a maximal linearly independent subset of V that contains S.

Corollary. Every vector space has a basis.

It can be shown, analogously to Corollary 1 of the replacement theorem (p. 46), that every basis for an infinite-dimensional vector space has the same cardinality. (Sets have the same cardinality if there is a one-to-one and onto mapping between them.) (See, for example, N. Jacobson, Lectures in Abstract Algebra, vol. 2, Linear Algebra, D. Van Nostrand Company, New York, 1953, p. 240.)

EXERCISES

1. Label the following statements as true or false.
(a) Every family of sets contains a maximal element.
(b) Every chain contains a maximal element.
(c) If a family of sets has a maximal element, then that maximal element is unique.
(d) If a chain of sets has a maximal element, then that maximal element is unique.
(e) A basis for a vector space is a maximal linearly independent subset of that vector space.
(f) A maximal linearly independent subset of a vector space is a basis for that vector space.

2. Show that the set of convergent sequences is an infinite-dimensional subspace of the vector space of all sequences of real numbers. (See Exercise 21 in Section 1.3.)

3. Let V be the set of real numbers regarded as a vector space over the field of rational numbers. Prove that V is infinite-dimensional. Hint: Use the fact that π is transcendental, that is, it is not a zero of any polynomial with rational coefficients.

Exercises 4-7 extend other results from Section 1.6 to infinite-dimensional vector spaces.

4. Let W be a subspace of a (not necessarily finite-dimensional) vector space V. Prove that any basis for W is a subset of a basis for V.

5. Prove the following infinite-dimensional version of Theorem 1.8 (p. 43): Let β be a subset of an infinite-dimensional vector space V. Then β is a basis for V if and only if for each nonzero vector v in V, there exist unique vectors u1, u2, ..., un in β and unique nonzero scalars c1, c2, ..., cn such that v = c1u1 + c2u2 + ··· + cnun.

6. Prove the following generalization of Theorem 1.9 (p. 44): Let S1 and S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly independent and S2 generates V, then there exists a basis β for V such that S1 ⊆ β ⊆ S2. Hint: Apply the maximal principle to the family of all linearly independent subsets of S2 that contain S1, and proceed as in the proof of Theorem 1.13.

7. Prove the following generalization of the replacement theorem. Let β be a basis for a vector space V, and let S be a linearly independent subset of V. There exists a subset S1 of β such that S ∪ S1 is a basis for V.

INDEX OF DEFINITIONS FOR CHAPTER 1

Additive inverse 12; Basis 43; Cancellation law 11; Chain 59; Column vector 8; Degree of a polynomial 9; Diagonal entries of a matrix; Diagonal matrix 18; Dimension 47; Finite-dimensional space 46; Generates 30; Infinite-dimensional space 47; Lagrange interpolation formula 52; Lagrange polynomials 52; Linear combination 24; Linearly dependent 36; Linearly independent 37; Matrix 8; Maximal element of a family of sets 58; Maximal linearly independent subset 59; n-tuple 7; Polynomial 9; Row vector 8; Scalar 7; Scalar multiplication 6; Sequence 11; Span of a subset 30; Spans 30; Square matrix 9; Standard basis for F^n 43; Standard basis for Pn(F) 43; Subspace 16; Subspace generated by the elements of a set 30; Symmetric matrix 17; Trace 18; Transpose 17; Trivial representation of 0 36

1 Index of Definitions Vector 7 Vector addition 6 Vector space 6 Zero matrix 8 Zero polynomial 9 Zero subspace 16 Zero vector 12 Zero vector space 15 63 / .Chap.

2 Linear Transformations and Matrices

2.1 Linear Transformations, Null Spaces, and Ranges
2.2 The Matrix Representation of a Linear Transformation
2.3 Composition of Linear Transformations and Matrix Multiplication
2.4 Invertibility and Isomorphisms
2.5 The Change of Coordinate Matrix
2.6* Dual Spaces
2.7* Homogeneous Linear Differential Equations with Constant Coefficients

In Chapter 1, we developed the theory of abstract vector spaces in considerable detail. It is now natural to consider those functions defined on vector spaces that in some sense "preserve" the structure. These special functions are called linear transformations, and they abound in both pure and applied mathematics.

In calculus, the operations of differentiation and integration provide us with two of the most important examples of linear transformations (see Examples 6 and 7 of Section 2.1). These two examples allow us to reformulate many of the problems in differential and integral equations in terms of linear transformations on particular vector spaces (see Sections 2.7 and 5.2). In geometry, rotations, reflections, and projections (see Examples 2, 3, and 4 of Section 2.1) provide us with another class of linear transformations. Later we use these transformations to study rigid motions in R^n (Section 6.10). In the remaining chapters, we see further examples of linear transformations occurring in both the physical and the social sciences.

Throughout this chapter, we assume that all vector spaces are over a common field F.

2.1 LINEAR TRANSFORMATIONS, NULL SPACES, AND RANGES

In this section, we consider a number of examples of linear transformations. Many of these transformations are studied in more detail in later sections. Recall that a function T with domain V and codomain W is denoted by

T: V → W. (See Appendix B.)

Definition. Let V and W be vector spaces (over F). We call a function T: V → W a linear transformation from V to W if, for all x, y ∈ V and c ∈ F, we have
(a) T(x + y) = T(x) + T(y) and
(b) T(cx) = cT(x).
We often simply call T linear.

If the underlying field F is the field of rational numbers, then (a) implies (b) (see Exercise 37), but, in general, (a) and (b) are logically independent. See Exercises 38 and 39.

The reader should verify the following properties of a function T: V → W. (See Exercise 7.)
1. If T is linear, then T(0) = 0.
2. T is linear if and only if T(cx + y) = cT(x) + T(y) for all x, y ∈ V and c ∈ F.
3. If T is linear, then T(x - y) = T(x) - T(y) for all x, y ∈ V.
4. T is linear if and only if, for x1, x2, ..., xn ∈ V and a1, a2, ..., an ∈ F, we have
T(a1x1 + a2x2 + ··· + anxn) = a1T(x1) + a2T(x2) + ··· + anT(xn).
We generally use property 2 to prove that a given transformation is linear.

Example 1
Define T: R^2 → R^2 by T(a1, a2) = (2a1 + a2, a1). To show that T is linear, let c ∈ R and x, y ∈ R^2, where x = (b1, b2) and y = (d1, d2). Since
cx + y = (cb1 + d1, cb2 + d2),
we have
T(cx + y) = (2(cb1 + d1) + cb2 + d2, cb1 + d1).
Also
cT(x) + T(y) = c(2b1 + b2, b1) + (2d1 + d2, d1) = (2cb1 + cb2 + 2d1 + d2, cb1 + d1).
So T is linear.

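The linearity check in Example 1 is easy to carry out numerically. The following sketch (not part of the text; it assumes the NumPy library and a hypothetical helper named T) samples a few vectors and verifies property 2, T(cx + y) = cT(x) + T(y).

    import numpy as np

    # Illustration of Example 1: T(a1, a2) = (2*a1 + a2, a1).
    def T(v):
        a1, a2 = v
        return np.array([2 * a1 + a2, a1])

    rng = np.random.default_rng(0)
    for _ in range(5):
        x, y = rng.normal(size=2), rng.normal(size=2)
        c = float(rng.normal())
        # Property 2: T(cx + y) = cT(x) + T(y).
        assert np.allclose(T(c * x + y), c * T(x) + T(y))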
Also. (See Figure 2.a 2 ) counterclockwise by 6 if (oi. Then a.02] Chap. observe that this same formula is valid for (01.02) = (0. fa) Rotation T(oi.0).02) = (ai. (See Figure 2.o 2 ) = (oi.rsin(a!4-0)) = (r cos a cos 6 — r sin a sin 0.66 T« (01. Then T#: R2 — R2 is a linear transformation that is called » the rotation by 0. reflection. Example 2 For any angle 0.a 2 ) ^ (0.1(a)).a2) is the vector » obtained by rotating (ai. We leave the proofs of linearity to the reader.0) (c) Projection As we will see in Chapter 6. T is called the reflection » about the x -axis. It follows that Tfl(ai.1 T(ai.0). Tff(ai.a 2 ) £ R2. It is now easy to show... The main reason for this is that most of the important geometrical transformations are linear. r cos a sin 0 + r sin a cos 0) = (ai cos 9 — a2 sin 0.0).a2) = (rcos(a4-0).a 2 ] (01. Fix a nonzero vector (ai.-02).0) = (0.ai sin 0 4. We determine an explicit formula for T#.fl2) (01.02.a2) has length r and makes an angle a + 9 with the positive x-axis. Finally. define T#: R2 — R2 by the rule: Te(ai.o 2 ) = (01.) • Example 4 Define T: R2 — R2 by T(oi.1(b).0). the applications of linear algebra to geometry are wide and varied. T is called the projection on the > x-axis. and let r = ya2 + a2. Let a be the angle that (oi. and T#(0.-02] (b) Reflection Figure 2.a2 cos 9). and projection. Three particular transformations that we now consider are rotation. that T# is linear.i — rcosc* and a2 = rsino.02] (01.1(c). Example 3 Define T: R2 — R2 by T(ai. 2 Linear Transformations and Matrices (oi. as in Example 1.) • • .a 2 ) makes with the positive x-axis (see Figure 2.

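The formula derived in Example 2 can be packaged as a 2 x 2 matrix. The sketch below (an illustration, not from the text; it assumes NumPy and a hypothetical helper named rotation) builds this matrix and checks that rotating by θ and then by φ is the same as rotating by θ + φ, which also follows from the addition formulas used above.

    import numpy as np

    def rotation(theta):
        # Matrix of T_theta with respect to the standard basis of R^2.
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    theta, phi = 0.3, 1.1
    v = np.array([2.0, -1.0])
    # Rotating by theta and then by phi equals rotating by theta + phi.
    assert np.allclose(rotation(phi) @ (rotation(theta) @ v),
                       rotation(theta + phi) @ v)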
We now look at some additional examples of linear transformations.

Example 5
Define T: Mmxn(F) → Mnxm(F) by T(A) = A^t, where A^t is the transpose of A, defined in Section 1.3. Then T is a linear transformation by Exercise 3 of Section 1.3.

Example 6
Define T: Pn(R) → Pn-1(R) by T(f(x)) = f'(x), where f'(x) denotes the derivative of f(x). To show that T is linear, let g(x), h(x) ∈ Pn(R) and a ∈ R. Now
T(ag(x) + h(x)) = (ag(x) + h(x))' = ag'(x) + h'(x) = aT(g(x)) + T(h(x)).
So by property 2 above, T is linear.

Example 7
Let V = C(R), the vector space of continuous real-valued functions on R. Let a, b ∈ R, a < b. Define T: V → R by
T(f) = ∫ from a to b of f(t) dt
for all f ∈ V. Then T is a linear transformation because the definite integral of a linear combination of functions is the same as the linear combination of the definite integrals of the functions.

Two very important examples of linear transformations that appear frequently in the remainder of the book, and therefore deserve their own notation, are the identity and zero transformations. For vector spaces V and W (over F), we define the identity transformation IV: V → V by IV(x) = x for all x ∈ V and the zero transformation T0: V → W by T0(x) = 0 for all x ∈ V. It is clear that both of these transformations are linear. We often write I instead of IV.

We now turn our attention to two very important sets associated with linear transformations: the range and null space. The determination of these sets allows us to examine more closely the intrinsic properties of a linear transformation.

Definitions. Let V and W be vector spaces, and let T: V → W be linear. We define the null space (or kernel) N(T) of T to be the set of all vectors x in V such that T(x) = 0; that is, N(T) = {x ∈ V: T(x) = 0}.
We define the range (or image) R(T) of T to be the subset of W consisting of all images (under T) of vectors in V; that is, R(T) = {T(x): x ∈ V}.

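Example 6 can also be checked numerically by representing a polynomial by its coefficient list. The following sketch is an illustration only; it assumes NumPy's polynomial module, with coefficients listed from the constant term up, and verifies T(ag + h) = aT(g) + T(h) for the differentiation map.

    import numpy as np
    from numpy.polynomial import polynomial as P

    g = np.array([1.0, -2.0, 0.0, 5.0])   # 1 - 2x + 5x^3
    h = np.array([3.0, 4.0, 1.0, 0.0])    # 3 + 4x + x^2
    a = 2.5
    # Differentiation acts on coefficient lists; check property 2 of linearity.
    assert np.allclose(P.polyder(a * g + h), a * P.polyder(g) + P.polyder(h))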
Example 8
Let V and W be vector spaces, and let I: V → V and T0: V → W be the identity and zero transformations, respectively. Then N(I) = {0}, R(I) = V, N(T0) = V, and R(T0) = {0}.

Example 9
Let T: R^3 → R^2 be the linear transformation defined by
T(a1, a2, a3) = (a1 - a2, 2a3).
It is left as an exercise to verify that N(T) = {(a, a, 0): a ∈ R} and R(T) = R^2.

In Examples 8 and 9, we see that the range and null space of each of the linear transformations is a subspace. The next result shows that this is true in general.

Theorem 2.1. Let V and W be vector spaces and T: V → W be linear. Then N(T) and R(T) are subspaces of V and W, respectively.

Proof. To clarify the notation, we use the symbols 0V and 0W to denote the zero vectors of V and W, respectively.
Since T(0V) = 0W, we have that 0V ∈ N(T). Let x, y ∈ N(T) and c ∈ F. Then T(x + y) = T(x) + T(y) = 0W + 0W = 0W, and T(cx) = cT(x) = c0W = 0W. Hence x + y ∈ N(T) and cx ∈ N(T), so that N(T) is a subspace of V.
Because T(0V) = 0W, we have that 0W ∈ R(T). Now let x, y ∈ R(T) and c ∈ F. Then there exist v and w in V such that T(v) = x and T(w) = y. So T(v + w) = T(v) + T(w) = x + y, and T(cv) = cT(v) = cx. Thus x + y ∈ R(T) and cx ∈ R(T), so R(T) is a subspace of W.

The next theorem provides a method for finding a spanning set for the range of a linear transformation. With this accomplished, a basis for the range is easy to discover using the technique of Example 6 of Section 1.6.

Theorem 2.2. Let V and W be vector spaces, and let T: V → W be linear. If β = {v1, v2, ..., vn} is a basis for V, then
R(T) = span(T(β)) = span({T(v1), T(v2), ..., T(vn)}).

Proof. Clearly T(vi) ∈ R(T) for each i. Because R(T) is a subspace, R(T) contains span({T(v1), T(v2), ..., T(vn)}) = span(T(β)) by Theorem 1.5 (p. 30).

Now suppose that w ∈ R(T). Then w = T(v) for some v ∈ V. Because β is a basis for V, we have
v = a1v1 + a2v2 + ··· + anvn for some a1, a2, ..., an ∈ F.
Since T is linear, it follows that
w = T(v) = a1T(v1) + a2T(v2) + ··· + anT(vn) ∈ span(T(β)).
So R(T) is contained in span(T(β)).

It should be noted that Theorem 2.2 is true if β is infinite, that is, R(T) = span({T(v): v ∈ β}). (See Exercise 33.)
The next example illustrates the usefulness of Theorem 2.2.

Example 10
Define the linear transformation T: P2(R) → M2x2(R) by
T(f(x)) = [f(1) - f(2)   0; 0   f(0)].
Since β = {1, x, x^2} is a basis for P2(R), we have
R(T) = span(T(β)) = span({T(1), T(x), T(x^2)})
     = span({[0 0; 0 1], [-1 0; 0 0], [-3 0; 0 0]})
     = span({[0 0; 0 1], [-1 0; 0 0]}).
Thus we have found a basis for R(T), and so dim(R(T)) = 2.

As in Chapter 1, we measure the "size" of a subspace by its dimension. The null space and range are so important that we attach special names to their respective dimensions.

Definitions. Let V and W be vector spaces, and let T: V → W be linear. If N(T) and R(T) are finite-dimensional, then we define the nullity of T, denoted nullity(T), and the rank of T, denoted rank(T), to be the dimensions of N(T) and R(T), respectively.

Reflecting on the action of a linear transformation, we see intuitively that the larger the nullity, the smaller the rank. In other words, the more vectors that are carried into 0, the smaller the range. The same heuristic reasoning tells us that the larger the rank, the smaller the nullity. This balance between rank and nullity is made precise in the next theorem, appropriately called the dimension theorem.

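The dimension count in Example 10 can be confirmed by flattening each image T(1), T(x), T(x^2) of M2x2(R) into a vector of R^4 and letting a rank computation determine the dimension of their span. This sketch is only an illustration and assumes NumPy.

    import numpy as np

    T1  = np.array([[0, 0], [0, 1]])    # T(1)
    Tx  = np.array([[-1, 0], [0, 0]])   # T(x)
    Tx2 = np.array([[-3, 0], [0, 0]])   # T(x^2)
    images = np.vstack([M.ravel() for M in (T1, Tx, Tx2)])
    # The rank of the stacked images equals dim R(T).
    print(np.linalg.matrix_rank(images))   # 2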
Proof. bk+2. The reader should review the concepts of "one-to-one" and "onto" presented in Appendix B..vk} to a basis ft = {vi. T(vn) are distinct.. Suppose that n yi biT(vi) = 0 i=fc+l for 6 fc+ i... we have bi — 0 for all i. . bn £ F.v2. Now we prove that S is linearly independent.11 (p. We claim that S = {T(vk+i).T(v f c + 2 ).. 2 Linear Transformations and Matrices Theorem 2.. Interestingly. If V is finite-dimensional. so nullity(T) = 1.. This is demonstrated in the next two theorems..2 and the fact that T(v-i) — 0 for 1 < i < k. and {vi. T(vn)} = span({T(v/ c + i). dim(N(T)) = k. we may extend {vi.v2... T(vk+2).. .\..2 = 3..70 Chap... T(vk+2). Notice that this argument also shows that T(i>fc+i)..v2...vn} for V. for a linear transformation. T(vn)} is a basis for R(T).rank(T) = dim(V). we have T / So ( J2 biVi) \i=fc+i / =°- ^2 bivi i=k+l Hence there exist c.. Suppose that dim(V) = n. .ck n k ^2 biVi = y]ciVi i=k+l i=l e N T ( )- £ F such that or k n ^2(~Ci)vi + 22 °ivi = °i=l i=k+l Since ft is a basis for V.. By the corollary to Theorem 1.c2. we have that nullity(T) 4. Using Theorem 2... First we prove that S generates R(T). Let V and W be vector spaces. then > nullity(T) 4. Using the fact that T is linear..3 (Dimension Theorem). and let T: V — W be linear... therefore rank(T) = n — k...vk} is a basis for N(T). . we have R(T) = s p a n ( { T M . Hence S is linearly independent.. 51)... B If we apply the dimension theorem to the linear transformation T in Example 9.. T(v2).. both of these concepts are intimately connected to the rank and nullity of the transformation.. T(vn)} = span(S).

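For the matrix transformations studied later in this chapter, the dimension theorem can be observed directly: for T(x) = Ax, rank(T) is the rank of A and nullity(T) is the dimension of the solution space of Ax = 0. The sketch below is only an illustration and assumes NumPy and SciPy.

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)   # T: R^6 -> R^4
    rank = np.linalg.matrix_rank(A)        # dim R(T)
    nullity = null_space(A).shape[1]       # dim N(T)
    assert rank + nullity == A.shape[1]    # equals dim(R^6)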
we have x — 0. The next two examples make use of the preceding theorems in determining whether a given linear transformation is one-to-one or onto. and 21. Proof. 2. Hence N(T) = {0}. I The reader should observe that Theorem 2. Surprisingly. By Theorem 1. 16.5. and let T: V — W be linear. (See Exercises 15. Theorem 2.4 allows us to conclude that the transformation defined in Example 9 is not one-to-one. Proof. we have nullity (T) 4. Then T is one-to-one if and only if N(T) = {0}. I We note that if V is not finite-dimensional and T: V — V is linear. Now assume that N(T) = {0}. Let V and W be vector spaces of equal (finite) dimension.1 Linear Transformations. and suppose that T(x) = T(y). This means that T is one-to-one. and let T: V — W be > linear. and Ranges 71 Theorem 2. this equality is equivalent to R(T) = W. Since T is one-to-one. if and only if nullity(T) = 0.) The linearity of T in Theorems 2. then » it does not follow that one-to-one and onto are equivalent.11 (p. (b) T is onto. and vice versa. • (a) T is one-to-one. we have that T is one-to-one if and only if N(T) = {0}. Example 11 Let T: P2(R) —* P3(R) be the linear transformation defined by T(f(x)) = 2f'(x)-r fX3f(t)dt. Let V and W be vector spaces. Suppose that T is one-to-one and x £ N(T). Then the following are equivalent. 50). Now. and if and only if dim(R(T)) = dim(W). (c) rank(T)=dim(V).4 and 2. the definition of T being onto. but are onto. From the dimension theorem. if and only if rank(T) = dim(W). Therefore x — y £ N(T) = {0}. if and only if rank(T) = dim(V). Then 0 = T(x) — T(y) = T(x — y) by property 3 on page 65.5 is essential.rank(T) = dim(V). Jo . for it is easy to construct examples of functions from R into R that are not one-to-one.Sec. or x = y. So x — y = 0. with the use of Theorem 2. Then T(x) = 0 — T(0).4. the conditions of one-to-one and onto are equivalent in an important special case. Null Spaces.4.

and therefore. • Example 12 Let T : F2 — F2 be the linear transformation defined by + T(oi.6. vn} is a basis for V. It is easy to see that N(T) = {0}.x2}. Clearly T is linear and one-to-one. T is not onto. u 2 .72 Now Chap. nullity(T) + 3 = 3. This technique is exploited more fully later. we transferred a property from the vector space of polynomials to a property in the vector space of 3-tuples. so T is one-to-one.a i x 4-a 2 a: 2 ) = (ao. wn in W.4 that T is one-to-one.Ax 4.5 tells us that T must be onto. . • In Example 13..-x2. We conclude from Theorem 2. N(T) = {0}..\x2.2 ) } is linearly independent in R3. . which follows from the next theorem and corollary. Since {3x. . * .x + x2. n.l . . This result. T(x). is used frequently throughout the book. l ) . rank(T) = 3. Since dim(P3(i?)) = 4. Example 13 Let T: P2(R) —• R3 be the linear transformation defined by T(a 0 4. and suppose that {vi. . o i ) . 2 . From the dimension theorem. ( 0 . 2 4. 0 .1 — 2. . then a subset S is linearly independent if and only if T(S) is linearly independent. Let V and W be vector spaces over F. Then S is linearly independent in P 2 (i?) because T(S) = { ( 2 . l . Example 13 illustrates the use of this result. it is stated that if T is linear and one-to-one. .. Ax + x3}). Let S = {2 — x + 3x2. Theorem 2.x3} is linearly independent. 2 Linear Transformations and Matrices R(T) = span({T(l).. there exists exactly one linear transformation T: V — W such that T(vi) = W{ for i = 1 . w2. Hence Theorem 2.o 2 ). . ( l . For w\. 2 4. • In Exercise 14. So nullity(T) = 0. 3 ) . One of the most important properties of a linear transformation is that it is completely determined by its action on a basis.oi. .a 2 ) = (oi + o 2 . T(x 2 )}) = span({3x. .

.v) — y](dbj 4. v £ V and d £ F. .22 i=i i=l i=l (b) Clearly T(vi) = «... for ?' = 1 . i=\ where a i o 2 . . then U = T. If U. .Sec. bn. Define T: V -» W by n T(x) = J ^ o ^ .c^w. i=i 73 (a) T is linear: Suppose that u. b2. n. and suppose that. o n are unique scalars. . . 2 .v = ^2^bi i=1 So / T(du 4. c2. . • • •. ?. du 4.) = T(f^) for > i = 1 . and Ranges Proof. Null Spaces.. = d y] bjWj 4. CiWj = n v = y CiVi i=l cn. . . . 2 . Then X = y^QtTJj. V has a finite basis {vi. ... n.. I Corollary. vn}. (c) T is unique: Suppose that U: V — W is linear and U(t^) = Wi for * i = 1 . . 2. T: V — W are linear and U(??. Then for x € V with = ^OiUi. Thus + ('i)v>- dT(u) 4. .. . . v2. Let V and W be vector spaces. Then we may write n u — y biVi and i=l for some scalars bi.=i have U(x) = y^QjUfoj) = J^OiWi = T(x).1 Linear Transformations. n. ci.. Let x £ V. . .. . 2 . i=l i=l Hence U = T.T(v).

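Theorem 2.6 is constructive: once the values w1, ..., wn on a basis are chosen, T is determined everywhere by expressing x in terms of the basis and applying the same coefficients to the wi. The sketch below is an illustration only (it assumes NumPy and uses a basis of R^2 chosen for the purpose, not one from the text).

    import numpy as np

    beta = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]          # a basis for R^2
    w = [np.array([0.0, 3.0, 1.0]), np.array([2.0, 0.0, 0.0])]    # chosen images in R^3

    def T(x):
        # Solve for the coordinates of x relative to beta, then use linearity.
        B = np.column_stack(beta)
        a = np.linalg.solve(B, x)
        return a[0] * w[0] + a[1] * w[1]

    # T agrees with the prescribed values on the basis vectors.
    assert np.allclose(T(beta[0]), w[0]) and np.allclose(T(beta[1]), w[1])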
U: V — W are both linear and agree on a basis for V.1)} is a basis for R2.0. This follows from the corollary and from the fact that {(1.2o! . > For Exercises 2 through 6. 4.02. and verify the dimension theorem.y) = T(x) + T(y). then nullity(T) 4.03) = (ai — 02. . > 3. x 2 € V and yi. there exists a linear transformation T: V — W such that T(xi) = yi and T(x 2 ) = y2.3) and > U(l. • EXERCISES 1.f'(x).2). T: R3 — R2 defined by T(oi. T: R2 -> R3 defined by T(a l 5 a 2 ) = (01 4-o 2 . (d) If T is linear.203).o i . (a) If T is linear. Finally. (f) If T is linear. then T preserves sums and scalar products. (1.o 2 ) . and T is a function from V to W. T: P2(R) . (g) If T.y2 £ W. 2. 1) = (1. Then compute the nullity and rank of T.o 2 ) = (2a2 .74 Example 14 Chap.rank(T) = dim(W). prove that T is a linear transformation. (h) Given x i . then > T = U. If we know that U(l.2) = (3. (b) If T(x 4. then T carries linearly independent subsets of V onto linearly independent subsets of W. and find bases for both N(T) and R(T). and suppose that U: R2 — R2 is linear. use the appropriate theorems in this section to determine whether T is one-to-one or onto. 3 a i ) . T: M 2 x 3 ( F ) -» M 2 x 2 ( F ) defined by T On 012 0 2 1 0 22 013 \ 023 /'2an — 012 ai3 + 2ai 2 0 0 5.3). then T(#v) = #w(e) If T is linear. then T is linear.P3(R) defined by T(/(x)) = x / ( x ) 4. Label the following statements as true or false. (c) T is one-to-one if and only if the only vector x such that T(x) = 0 is x = 0. 2 Linear Transformations and Matrices Let T: R2 — R2 be the linear transformation defined by > T(oi. In each part. V and W are finite-dimensional vector spaces (over F). then U = T.

a 2 ) T(ai.o 2 ) T(ai. Let V and W be vector spaces.o 2 ) (oi. What is T(2. T(?. Let V and W be vector spaces and T : V — W be linear. (b) Suppose that T is one-to-one and that 5 is a subset of V. In this exercise. Suppose that T: R2 -> R2 is linear.3) = (1. 0. state why T is not linear.0) (|oi|.v2.Sec.. k . T(1. . Prove that S is linearly independent if and only if T(S) is linearly independent.T(v2). T: R2 — R2 is a function. 2.4).0) = (1. let T: V — W be linear. Prove that the transformations in Examples 2 and 3 are linear. 75 Recall (Example 4. Prove that T linear and one-to-one. > (a) Prove that T is one-to-one if and only if T carries linearly independent subsets of V onto linearly independent subsets of W. .3) that tv(A) = Y/Aii.1)? 13. 9.-1.vk} is chosen so that T(vi) = Wi for i = 1 .3)? Is T one-to-one? 11..it)2. but not onto.o 2 ) (ai 4-1. l ) = (1.a 2 ) T(ai.0. .5). • • • ..2) and T(2.. and 4 on page 65. W h a t / s T(8. and Ranges 6.v2. . Sec- 7...2 .Tl)} is a basis for W. a2) T(oi.1 Linear Transformations.a 2 ) = = = = = (l. T: M n x n ( F ) -¥ F defined by T(A) = tr(A). Null Spaces. . 3.. .6 ) = (2. . For each of the following > parts. then S is linearly independent.wk} be a linearly independent subset of R(T). (c) Suppose ft = {vi. Define T:P(JJ)-P(Jl) by T(/(x)) = f(t)dt. Prove properties 1. and T(l. 14. 1) = (2. 8. . (a) (b) (c) (d) (e) T(oi.1) » and T ( . Is there a linear transformation T: R3 — R2 such that T(l.of) (sinai. 2. 15. .4). Recall the definition of P(R) on page 10.11)? 12. 0 .3) = (1.. a 2 ) 10.. tion 1. 2 .vn} is a basis for V and T is one-to-one and onto. Prove that if S — {vi. and let > {w\. Prove that there exists a linear transformation T: R2 — R3 such that > T ( l . Prove that T(ft) = {T(vi).

76 Chap. Let V be the vector space of sequences described in Example 5 of Section 1. respectively. Can you generalize this result for T: F n —• F? State and prove an analogous result for T . Let V be V such that V = W] © W 2 . for x = T(x) = x\. and c such » that T(x. 22.ai. Define the functions T. Hint: Use Exercise 22. Definition. 18. 21. Give an example of distinct linear transformations T and U such that N(T) = N(U) and R(T) = R(U). Let V and W be finite-dimensional vector spaces and T: V linear. . linear.03. Let T: P(R) -> P{R) be defined by T(/(x)) = f'(x). a 2 . 17. o 2 .a 2 . Give an example of a linear transformation T: R2 N(T) = R(T). T(V ( ) is a subspace of W and that » {x £ V: T(x) £ Wi} is a subspace of V.) and U ( a i . we have 24. y. U: V -» V by T ( a i . but not one-to-one. R2 such that 19. Let T: R — R be linear. 20.3. Let T: R2 — R2.. but not one-to-one. (b) Prove that if dim(V) > dim(W). Prove that T is onto. * . (c) Prove that U is one-to-one. (b) Prove that T is onto.. . Describe geometrically the possibilities for > the null space of T.x2 with xi £ Wi and x2 £ W 2 . If T: V — W is linear. pra 23.. ^ (a) Prove that T and U are linear. Include figures for each of the following parts. then T cannot be one-to-one. . b.) A Wi along VV2 if. (a) Prove that if dim(V) < dim(W). z) = ax + by + cz for all (x.2. but not onto. pn . ) = (0. • • •)• T and U arc called the left shift and right shift operators on V. exercises of Section 1. then T cannot be onto. Show that there exist scalars a. The following definition is used in Exercises 24 27 and in Exercise 30. a vector space and Wi and W2 be subspaces of (Recall the definition of direct sum given in the function T: V — V is called the projection > on xi 4. Let V and W be vector spaces with subspaces Vi and Wi. prove that. . . Let T: R3 — R be linear. z) £ R3. y. . ) = (02. respectively. 2 Linear Transformations and Matrices Recall that T is W be 16.

(a) If T(o. Null Spaces. Definitions. Let V be a vector space. Suppose that V = R(T)©W and W is T-invariant. b. Prove that there exists a subspace W and a function T: V — V > such that T is a projection on W along W . (Recall the definition of direct sum given in the exercises of Section 1.s): s £ R}. assume that T: V — V is > the projection on Wi along W 2 . 29. Using the notation in the definition above. b. 2. b).V . Warning: Do not assume that W is T-invariant or that > T is a projection unless explicitly stated. where T represents the projection on the y-axis along the x-axis. Suppose that T is the projection on W along some subspace W . 30.) (a) . c). Prove that W is T-invariant and that Tw — lw31. V. a): a £ R}. 0). Describe T if Wi is the zero subspace. 27. (b) Give an example of a subspace W of a vector space V such that there are two projections on W along two (distinct) subspaces.0). . and Ranges 77 (a) Find a formula for T(a. 25. 6. (c) If T(a. where T represents the projection on the 2-axis along the xy-plane. A subspace W of V is said to be T-invariant ifT(x) £ W for every x £ W. 0. that is. (b) Find a formula for T(a.3. • Exercises 28-32 assume that W is a subspace of a vector space V and that T: V — V is linear. The following definitions are used in Exercises 28 32. Suppose that W is a subspace of a finite-dimensional vector space V. R(T). prove that Tw is linear. where T represents the projection on the y-axis along the line L — {(s. 7f W is T-iuvariaut. = R(T) and W 2 = N(T). Let T: R: R3. 28. show that T is the projection on the xyplane along the 2-axis.b).c) = (a. Describe T if W. 26.c) = (a — c. Prove that W.Sec.(/-plane along the line L = {(o. T(W) C W. show that T is the projection on the x. Prove that the subspaces {0}. b.1 Linear Transformations.b. If W is T-invariant. and N(T) are all T-invariant. we define the restriction of T on W to be the function T w : W — W defined by T w ( x ) = T(x) for all x £ W. and let T: V —• V be linear. (a) (b) (c) (d) Prove that T is linear and W| = { J G V : T(X) = x}. (b) Find a formula for T(a.

(b) Show that if V is finite-dimensional.T(y) for all x. 33. 37. A function T: V — W between vector spaces V and W is called additive > if T(x + y) = T(x) 4. and > f(z) = z otherwise. By Exercise 34. Prove the following generalization of Theorem 2. 60). 35. Prove that T is » additive (as defined in Exercise 37) but not linear. Prove that if V and W are vector spaces over the field of rational numbers. By the corollary to Theorem 1. Conclude that V being finite-dimensional is also essential in Exercise 35(b). 38. Then for any function f:ft—>\N there exists exactly one linear transformation T: V -+ W such that T(x) = f(x) for all x £ ft. Let x and y be two distinct vectors in ft. then any additive function from V into W is a linear transformation. Prove that there is an additive function T: R — R (as defined in Ex> ercise 37) that is not linear.78 Chap. Prove Theorem 2. then W = N(T). Prove that V = R(T) © N(T). but V is not a direct sum of these two spaces. f(y) = x. Let V and T be as defined in Exercise 21. (c) Show by example that the conclusion of (b) is not necessarily true if V is not finite-dimensional. Thus the result of Exercise 35(a) above cannot be proved without assuming that V is finite-dimensional. 34. Let T: C — C be the function defined by T(z) = z. Let V be a finite-dimensional vector space and T: V — V be linear. R(T) = spm({T(v):v£ft}). (b) Suppose that R(T) n N(T) = {0}. Prove that N(T W ) = N ( T ) n W and R(T W ) = T(W). and define / : ft — V by f(x) = y. Hint: Let V be the set of real numbers regarded as a vector space over the field of rational numbers. there exists a linear transformation . y £ V. and let ft be a basis for V.13 (p. Be careful to say in each part where finite-dimensionality is used. 32. / 36.6: Let V and W be vector spaces over a common field. V has a basis ft.2 for the case that ft is infinite. 2 Linear Transformations and Matrices (a) Prove that W C N ( T ) .3. (b) Find a linear operator T x on V such that R(TT) n N(Ti) = {0} but V is not a direct sum of R(Ti) and N(Ti). » (a) Suppose that V = R(T) + N(T). Prove that V = R(T) © N(T). (a) Prove that V = R(T) + N(T). Exercises 35 and 36 assume the definition of direct sum given in the exercises of Section 1. 39. that is. Suppose that W is T-invariant.

3. for the vector space P n (F). . 63} is an ordered basis. x . . . This identification is provided through the use of coordinate vectors. In fact. 40. (c) Read the proof of the dimension theorem. e n } the standard ordered basis for F n . Use (a) and the dimension theorem to derive a formula relating dim(V).e3} can be considered an ordered basis. Let V be a vector space and W be a subspace of V. Example 1 In F3. ei. and dim(V/W). an ordered basis for V is a finite sequence of linearly independent vectors in V that generates V.2 THE MATRIX REPRESENTATION OF A LINEAR TRANSFORMATION Until now. 2. e 2 . we develop a one-to-one correspondence between matrices and linear transformations that allows us to utilize properties of one to study properties of the other. • For the vector space F n . Then T is additive. The following exercise requires familiarity with the definition of quotient space given in Exercise 31 of Section 1. We first need the concept of an ordered basis for a vector space. (a) Prove that 7 is a linear transformation from V onto V/W and that 7 N(n) = W. Let V be a finite-dimensional vector space. that is. dim(W).Sec. Similarly.2 The Matrix Representation of a Linear Transformation 79 T: V -> V such that T(u) = f(u) for all u £ ft. as introduced next. (b) Suppose that V is finite-dimensional. but ft ^ 7 as ordered bases. we call {1. In this section. . . Also 7 = {e2. we call {ei. . An ordered basis for V is a basis for V endowed with a specific order. Compare the method of solving (b) with the method of deriving the same result as outlined in Exercise 35 of Section 1. T(cx) ^ cT(x). Now that we have the concept of ordered basis. we have studied linear transformations by examining their ranges and null spaces. Define the mapping n: V -» V/W by jj(v) = v + W for v £ V. we embark on one of the most useful approaches to the analysis of a linear transformation on a finite-dimensional vector space: the representation of a linear transformation by a matrix. but for c = y/x. .6. . x n } the standard ordered basis for P n ( F ) . . 2. Definition.e 2 . we can identify abstract vectors in an n-dimensional vector space with n-tuples. ft = {ei.

2 Linear Transformations and Matrices Definition.7 x 2 . . Let T: V — W be linear. 1 < j < n. we call the mxn matrix A defined by Aij = a^ the matrix representation of T in the ordered bases ft and 7 and write A = [T]^.? < n. Suppose that V and W are finite-dimensional vector spaces with ordered bases ft = {v\...102. Also observe that if U: V — W is a linear transformation such that [U]^ = [T]^. a n be the unique scalars such that x = y^QjMj. • • • . and let ft = {1. then we write A = [T]p. Definition. Let ft — {ui. . For x £ V.80 Chap. let oi. by Let us now proceed with the promised matrix representation of a linear transformation. respectively. v2.. It is left as an exercise to show that the correspondence x —• [x]/3 provides us with a linear transformation from V to F n .6 (p. Example 2 Let V = P2(R). . = i=i vector of x relative /ai\ o2 \x a = \On/ Notice that [ui\p = ei in the preceding definition. If V = W and ft = 7. wm}. We study this transformation in Section 2..4 in more detail. If f(x) = 4 + 6 x . there exist + unique scalars Oij £ F. x . a 2 . •. x 2 } be the standard ordered basis for V. denoted [x]a. . then We define the coordinate to ft. then U = T by > the corollary to Theorem 2. such that m T(VJ) = y2 aijwi i=l f° r 1 < . .u2. . 73). Using the notation above. vn} and 7 = {101. We illustrate the computation of [T]2 in the next several examples. 1 <i <m. Notice that the jih column of A is simply [T(vj-)]7. . Then for each j.un} be an ordered basis for a finitedimensional vector space V.

2 The Matrix Representation of a Linear Transformation Example 3 Let T: R2 —* R3 be the linear transformation defined by T(ai. x 2 T(x) = l-l-r-O-x + O-x2 T(x 2 ) = 0 .l 4 . ei}.x 2 . 2oi . then '2 mi' = 1 0 1 Example 4 Let T: P3(-R) —• P2(-#) be the linear transformation defined by T(/(x)) = f'(x). a2) = (oi 4. • .x 4 .4e 3 .2e3 and T(0.4 ) = 3 d + 0e2 .0. Let ft and 7 be the standard ordered bases for P3(R) and P 2 (i?). respectively.1) = (3. 3.3a 2 . Hence If we let 7' = {e3.l 4 .0 . / • Note that when T(x3) is written as a linear combination of the vectors of 7.l + 0 . Now T(l. 81 Let ft and 7 be the standard ordered bases for R2 and R3.x 4 . e 2 .2 . its coefficients give the entries of column j + 1 of [TH.x + 0-x 2 T(x 3 ) = 0 .Sec.3 .0.2) = lei + 0e2 4. Then T(l) = 0 . 2.0. So [T]} / 0 1 0 0' = 0 0 2 0 0 0 0 3.4a 2 ).0 . 0) = (1. -£ 01. . respectively.

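The computation in Example 4 can be automated: column j of the matrix representation is the coordinate vector of T applied to the j-th basis vector. The sketch below is an illustration only; it assumes NumPy, with a polynomial in P3(R) stored as its coefficient list (a0, a1, a2, a3), and rebuilds the same 3 x 4 matrix.

    import numpy as np

    def deriv(c):
        # T(f) = f' on coefficient lists: (a0, a1, a2, a3) -> (a1, 2*a2, 3*a3).
        return np.array([c[1], 2 * c[2], 3 * c[3]])

    basis = np.eye(4)            # coordinate vectors of 1, x, x^2, x^3
    A = np.column_stack([deriv(b) for b in basis])
    print(A)
    # [[0. 1. 0. 0.]
    #  [0. 0. 2. 0.]
    #  [0. 0. 0. 3.]]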
U: V — W be linear transfor> mations.4. So aT 4. Let V and W be vector spaces over F. We define T 4. (b) Using the operations of addition and scalar multiplication in the preceding definition. Theorem 2. (b) Noting that T 0 .U(x) for a/1 x £ V.8. and let T. 2 Linear Transformations and Matrices Now that we have defined a procedure for associating matrices with linear transformations. where n and rn are the dimensions of V and W. II Definitions. (a) For all a. Then .82 Chap. to have the result that both sums and scalar multiples of linear transformations are also linear. In the case that V = W. Let V and W be finite-dimensional vector spaces with ordered bases ft and 7. it is easy to verify that the axioms of a vector space are satisfied. respectively. Let T.aT(y) + V(y) = c(aT + [)){x) + {aT + U){y). and let T.y) = a[T(cx + y)]4-cU(x) + U(y) = a[cT(x) + T(y)] + cU(x)4-U(y) = acT(x) 4. however. To make this more explicit. we write £(V) instead of £(V. W).U is linear.8 that this association "preserves" addition and scalar multiplication. we need some preliminary discussion about the addition and scalar multiplication of linear transformations. the zero transformation. (a) Let x. This identification is easily established by the use of the next theorem. U: V . In Section 2.U(cx 4. respectively.U: V —* W by (T + U)(x) = T(x) 4. Let V and W be vector spaces over a field F. these are just the usual definitions of addition and scalar multiplication of functions. £ F. aT 4. where V and W » are vector spaces over F. W).y) = aT(cx 4. Of course. U: V — W be arbitrary functions. Definition. Theorem 2.7. We are fortunate. Proof. we see a complete identification of £(V. and let a £ F.UWcx 4. and aT: V -> W by (aT)(x) = oT(x) for all x € V.• W be linear. and hence that the collection of all linear transformations from V into W is a vector space over F.y) 4. the collection of all linear transformations from V to W is a vector space over F. we show in Theorem 2.U is linear.cU(x) 4. Then (aT 4. W) with the vector space Mmxn(E). y £ V and c £ F. We denote the vector space of all linear transformations from V into W by £(V. plays the role of the zero vector.

.3a 2 . .3ai 4.U ] J ) i j = n . illustrating Theorem 2.Sec. i j=] Hence (T + U K ^ ^ ^ O y + f c y J t B .2ai. So [T + U ] J . 1 < j < n) such that rn T(t. Proof.0. • which is simply [T]l + [U]^. Let ft and 7 be the standard ordered bases of R2 and R3. . v2. Example 5 Let T: R2 — R3 and U: R2 — R3 be the linear transformations respectively > > defined by T(oi. unique scalars ay and />ij (1 < i < m. and the proof of (b) is similar.a 2 .2 The Matrix Representation of a Linear Transformation (a) [T + % = [T]} + M}and 83 (b) [oT]I = a[T]^ for all scalars a...a 2 ) = (2ai 4.a 2 ) = (ai 4.w2. i=\ Thus ( [ T 4 ... and U '1 j2 .8.2a 2 .2oi — 4a 2 ) and U(oi. Then mj = [0 2 (as computed in Example 3). 01 -A. respectively.5ai .o 2 ) = (ai .. 2. So (a) is proved. vn} and 7 = {wi. If we compute T 4.2a 2 ). we obtain (T 4.j) = yjoij'" .2a 2 ).. U ail( There exist l m U(vj) = 7 ] 6jjt^j i=l for 1 < j < n.U)(a.[ 2 Oj. .2oi. Let ft = {vi. J 4 .U using the preceding definitions.^ = ([T]54-[U]3) u ..3 -r 0I 2.wm}.

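Theorem 2.8(a) can be checked concretely: compute the matrices of two linear maps column by column, then compare the matrix of T + U with the sum of the individual matrices. This sketch is an illustration only (it assumes NumPy) and uses two simple maps from R^2 to R^3 chosen for the purpose.

    import numpy as np

    def matrix_of(f, n):
        # Column j of [f] is f applied to the j-th standard basis vector.
        return np.column_stack([f(e) for e in np.eye(n)])

    T = lambda v: np.array([v[0] + 3 * v[1], 0.0, 2 * v[0] - 4 * v[1]])
    U = lambda v: np.array([v[0] - v[1], 2 * v[0], 5 * v[0] + 2 * v[1]])
    TU = lambda v: T(v) + U(v)

    # [T + U] = [T] + [U] in the standard ordered bases.
    assert np.allclose(matrix_of(TU, 2), matrix_of(T, 2) + matrix_of(U, 2))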
0).02) = (2ai — a2. . .03) = (2a2 4 . ft={l. . (d) [ H . W) is a vector space. aT 4. .03).ai). n m For each linear transformation T: R — R .03) = 2ai 4 a2 — 303. R defined by T(ai. .2). r 3.1). 02. respectively. . respectively.3ai 4-4a2. ai 4. > (a) (b) (c) (d) T:R 2 T:R 3 T:R 3 T:R 3 R3 defined by T(01. n m 2. 7 (g) T: R -* R defined by T ( a i . M2x2(R) P 2 (fl) by T = (a + b) + (2d)x 4. Let and 7 = {l. compute [T]l. 2 Linear Transformations and Matrices EXERCISES 1.02. .2.an. (b) [T]J = [[)]} implies that T = U.1. r n ( 0 T : R -* R defined by T ( a i . and 7 = {1}- . 5. a „ _ i .3)}.x1}. a 2 .3a2 — 03. compute [TTC. U: V — W are linear transformations. n (e) T: R -* R defined by T ( a i . Compute [T]J. .ai +03).3)}. Let T: R2 » R3 be defined by T ( a i . . a n ) = ai 4. a 2 . then [T]^ is an m x n matrix. R3 defined by T(ai.0 3 . .03) = (2ai 4. . 2 a i 4-a 2 ).x. . Let ft and 7 be the standard ordered bases for R and R . .W) = £(W. a i . Let ft be the standard ordered basis for R2 and 7 = {(1. a 2 .V). If a = {(1.84 Chap.x 2 }. . Assume that V and W are finite-dimensional vector spaces with ordered bases ft and 7.U is a linear transformation from V to W. .01). (2. 4. . a 2 ) = (ai — a 2 . (2. Define T: Let 0 = Compute [T]^. Label the following statements as true or false. . R 2 defined by T(ai. a n ) = ( a i . and T. > (a) For any scalar a.1. . a i . a2. .ai).0 ! 4-4a 2 4-5a 3 . . (c) If m = dim(V) and n = dim(W).a„) = ( o n . (0.U ] J = rT]J + [U]J(e) £(V.x. . (f) £(V.6x2.

compute [f(x)]@. . 2.8. there exists a linear transT: V —• V such that T(VJ) = Vj 4. (Recall by Exercise 38 of Section 2. Define T: V — V by T(z) = z. (g) For a £ F. Let V be the vector space of complex numbers over the field R.2 The Matrix Representation of a Linear Transformation (a) Define T: M 2 x 2 ( F ) -+ M 2 x 2 ( F ) by T(A) = AK Compute [T] a .7. 72). Prove part (b) of Theorem 2. (b) Define T: P2(R) .fj-i for j = 1. vn}. Compute [T]L (e) If A = 1 0 -2 4 compute [i4]Q. Define T: V — F n by T(x) = [x]^. / 8.1) having dimension k. Compute [T]2.Sec.i}. and compute [T]^. > 9. and let T: V — V be a linear * transformation. (c) Define T: M 2 x 2 ( F ) -> F by T(A) = tr(A). (d) Define T: P2(R) -* R by T(/(x)) = /(2). By Theorem 2. Prove that T is linear. where ft — {l.. where ' denotes differentiation.x 2 .2.1 that T is not linear if V is regarded as a vector space over the field C. 7. . 6. Suppose that W is a T-invariant subspace of V (see the exercises of Section 2.. = 0. 11.6 (p. where z is the complex conjugate of z.) 10. Complete the proof of part (b) of Theorem 2. compute [a]7. Show that there is a basis ft for V such that [T]^ has the form A O B C where A is a k x k matrix and O is the (n — k) x k zero matrix. Prove > that T is linear. Let V be Define vo formation Compute a vector space with the ordered basis ft = {v\.V2..n. (f) If f(x) = 3 — 6x 4.M2x2(R) by T(/(x)) = ^ 0 ) 85 *£jjj) .* Let V be an n-dimensional vector space with an ordered basis ft. [T]^. • • •. Compute [T]l. Let V be an n-dimensional vector space.

we learned how to associate a matrix with a linear transformation in such a way that both sums and scalar multiples of matrices are associated with the corresponding sums and scalar multiples of the transformations.3 COMPOSITION OF LINEAR TRANSFORMATIONS AND MATRIX MULTIPLICATION In Section 2.T(y)) = all(T(x)) 4.y)) = U(oT(x) 4. W). Let V be a finite-dimensional vector space and T be the projection on W along W .9.W): T(x) = 0 for all x £ S}. . Let V = P(R). then (Vi 4. 15. respectively. Define 5° = {T £ £(V. and for j > 1 define T. Let x. U} is a linearly independent subset of £(V. . W). . Let V and W be vector spaces such that dim(V) = dim(W). prove that {T.V 2 )° = V? n Vg.1 on page 76. (b) If Si and S2 are subsets of V and Si C S2. The attempt to answer this question leads to a definition of matrix multiplication. (a) 5° is a subspace of £(V. 2. Let V and W be vector spaces. Prove that the set {Ti. and let S be a subset of V. T 2 . The question now arises as to how the matrix representation of a composite of linear transformations is related to the matrix representation of each of the associated linear transformations. then S% C 5?.U(T(y)) = o(UT)(x) 4. T h e o r e m 2. 2 Linear Transformations and Matrices 12. 13. We use the more convenient notation of UT rather than U o T for the composite of linear transformations U and T. W.y) = U(T(ox 4.) Find an ordered basis ft for V such that [T]^ is a diagonal matrix. derivative of f(x). where f{j)(x) is the jib. (See the definition in the exercises of Section 2.UT(y). 16. (c) If Vi and V2 are subspaces of V.2. If R(T) n R(U) = {0}.-(/(a. Then UT(ox 4. T n } is a linearly independent subset of £(V) for any positive integer n.86 Chap.) Our first result shows that the composite of linear transformations is linear. 14. and let T and U be nonzero linear transformations from V into W. I . and let T: V -+ W and U: W -» Z be linear. Let V. .)) = fu)(x). (See Appendix B. where W and W are subspaces of V. and Z be vector spaces over the same field F. such that [T]I is a diagonal matrix. Prove the following statements. Show that there exist ordered bases ft and 7 for » V and W. Let V and W be vector spaces. and let T: V — W be linear. Then UT: V -> Z is linear. y € V and a£ F. Proof.

and 7 = {z\.Sec. We define the product of A and B.. Proof.vn}. W. Ui.T (d) a(UjU 2 ) = (alli)U 2 = U. AikBkj for 1 < i < m. m }. and Z. . k=\ Note that (AB)ij is the sum of products of corresponding entries from the ith row of A and the j t h column of B. we have / m \ rn (UT)fe) = \J(T(Vj)) = U ( ] £ Bkjwk = J2 .) Let T: V — W and U: W — Z be linear transformations. Theorem 2. For 1 < j < n. Let A be an m x n matrix and B be an n x p matrix. to be the m x p matrix such that n (AB)ij = Y.. U2 £ £(V)..fc=i k=l = E E ^ i=l \A.u. Let V be a vector space. respectively.v2.i02. Some interesting applications of this definition are presented at the end of this section.)U 2 (c) Tl = IT .. Exercise. (See Exercise 8. ft = {ttfi. Let T. + U 2 ) = T U i + T U 2 a n d ( U ] + U 2 )T = U1T4. Consider the matrix [UT]^. where a = {vi. denoted AB... 1 < j < p.Z2. 2.(aU 2 ) for all scalars a.3 Composition of Linear Transformations and Matrix Multiplication 87 The following theorem lists some of the properties of the composition of linear transformations.zp} are ordered bases for V. We would like to define the product AB of two matrices so that AB — [UT]^. Definition. • • • .. Then (a) T(U.10. BkMwk) B kj . MkBkjfe=i This computation motivates the following definition of matrix multiplication. and let A = [U]^ » • and B = [T]£.U 2 T (b) T(UiU 2 ) = (TU.=1 i=l where Cij — 2_. I A more general result holds for linear transformations that have domains unequal to their codomains.

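The defining formula (AB)ij = sum over k of Aik Bkj translates directly into code. The following sketch is an illustration only (it assumes NumPy just for the comparison at the end); it computes the product entry by entry and compares it with a library multiplication.

    import numpy as np

    def matmul(A, B):
        m, n = len(A), len(A[0])
        n2, p = len(B), len(B[0])
        assert n == n2, "inner dimensions must agree"
        C = [[0] * p for _ in range(m)]
        for i in range(m):
            for j in range(p):
                # (AB)_ij is the sum of products of row i of A with column j of B.
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
        return C

    A = [[1, 2, 1], [0, 4, -1]]
    B = [[4], [2], [5]]
    assert np.allclose(matmul(A, B), np.array(A) @ np.array(B))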
that is. we show that if A is an mxn matrix and B is an nxp matrix. respectively. and 7. Since (AB)tij = (AB)ji and [B*A% = ] £ ( B * ) « ( A % = fc=i fc=i J2BkiA^ = J2AjkBki fc=i we are finished. Let V. then (AB)1 = BtAt. and the two "outer" dimensions yield the size of the product.88 Chap. W. Example 1 We have 1-44-2-24-1-5 0-44-4-24-(-l)-5 Notice again the symbolic relationship ( 2 x 3 ) . Then [UTE = [urjmS- . • As in the case with composition of functions. it need not be true that AB = BA. in order for the product AB to be defined.( 3 x l) = 2 x 1. The next theorem is an immediate consequence of our definition of matrix multiplication. 2 Linear Transformations and Matrices The reader should observe that in order for the product AB to be defined. we have that matrix multiplication is not commutative. there are restrictions regarding the relative sizes of A and B. Let T: V — W and U: W — Z be * * linear transformations. Recalling the definition of the transpose of a matrix from Section 1. ft. and Z be finite-dimensional vector spaces with ordered bases a. Theorem 2. Therefore the transpose of a product is the product of the transposes in the opposite order.11. The following mnemonic device is helpful: "(ra x n)'(n x p) = (ra x p)". Consider the following two products: 1 1 0 0 0 1 1 0 1 1 0 0 and 0 1 1 0 1 1 0 0 0 0 1 1 Hence we see that even if both of the matrix products AB and BA are defined.3. the two "inner" dimensions must be equal.

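The failure of commutativity noted above is easy to reproduce. This sketch (an illustration only; it assumes NumPy) multiplies the two 2 x 2 matrices of the text in both orders.

    import numpy as np

    A = np.array([[1, 1], [0, 0]])
    B = np.array([[0, 1], [1, 0]])
    print(A @ B)   # [[1 1], [0 0]]
    print(B @ A)   # [[0 0], [1 1]]
    assert not np.array_equal(A @ B, B @ A)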
(d) If V is an n-dimensional vector space with an ordered basis ft. 1 0 0 1 and 1% — /i = (l). Let V be a finite-dimensional vector space with an ordered basis ft. the identity transformation on P2(R). (c) ImA = A = AIn. (b) a(AB) = (aA)B — A(aB) for any scalar a. We illustrate Theorem 2. 2. h = The next theorem provides analogs of (a). respectively. The n x n identity matrix In is defined by (In)ij — Oy.12.E)A = DA + EA. we sometimes omit the subscript n from In. T h e o r e m 2.10(b) has its analog in Theorem 2.0 0 0 3. From calculus. When the context is clear. it follows that UT = I. Then [[)T]0 = [V]p[T]p.Sec. To illustrate Theorem 2. and (d) of Theorem 2. Jo Let a and /? be the standard ordered bases of Ps(R) and P2(R). for example. Observe also that part (c) of the next theorem illustrates that the identity matrix acts as a multiplicative identity in M n X n ( F ) . along with a very useful notation. the Kronecker delta. (c).11 in the next example.10.16. B and C be n x p matrices.11. Thus. Let A be an m x n matrix. Then (a) A(B -t-C) = AB + AC and (D 4. We define the Kronecker delta 6ij by Sij = lifi=j and 6ij =0ifi^j. then [lv]/3 = In- . /° 1 0 \° 0 0 1 2 0 o\ 0 0 3 1 / The preceding 3 x 3 diagonal matrix is called an identity matrix and i is defined next. and D and E be q x rn matrices. Definitions.U £ £(V). Example 2 Let U:'P 3 (i2) -> P2(R) and T: P2(R) -* ?3{R) be the linear transformations respectively defined by U(/(x)) = f'(x) and T(/(x)) = f f(t) dt. observe that '0 1 0 0N [UTl/j = [UlSm? = | 0 0 2 0 .3 Composition of Linear Transformations and Matrix Multiplication 89 Corollary. Theorem 2. Let T.

Exercise.. Then .ak be scalars. For an n x n matrix A. . Ci. assume that the cancellation law is valid. . fc=l fc=l Corollary. So A(B + C) = AB + AC. Ak = Ak~l A for k = 2 . from A-A = A2 = O = AO.1 = A. . . For each j (1 < j < p) let Uj and Vj denote the jth columns of AB and B.90 Chap.. Then. which is false. (See Exercise 5. C 2 . (c) We have m m [ImA)ij = y ^\lrn)ikAkj — / Ojfc-^fcj = Ay.) (a) We have [A(B + C)}ij = ] T Aik(B 4. With this notation. A2 = AA. we see that if A = 0 0 1 0/ ' =YjaiABi i=\ A ikCkj then A2 = O (the zero matrix) even though A ^ O. We prove the first half of (a) and (c) and leave the remaining proofs as an exercise. and ai.a2. .. Then Al^OiBi) j=i and / < fc \ fc a C A Y i i ) = X ] OiCi A vi=I / i=l Proof. .(AC)ij = [AB 4.Bk benxp matrices. We define A0 = In. Let A be an ra x n matrix and B be an n x p matrix. Cfc be qxm matrices. . 2 Linear Transformations and Matrices Proof.AikCkj) = J2 AikBkj 4. Let A be an ra x n matrix. we define A. respectively.C)kj = J2 Aik(Bkj + Ckj) k=i fc=i n n n = ^2(AikBkj 4. 3 .Y k=\ k=l fc=l = (AB)ij 4. .. and. To see why.13. Thus the cancellation property for multiplication in fields is not valid for matrices. we would conclude that A — O.• • . A3 — A2A.B2.AC]y. Theorem 2. Bi. . in general.

Theorem 2.13. Let A be an m × n matrix and B be an n × p matrix. For each j (1 ≤ j ≤ p) let u_j and v_j denote the jth columns of AB and B, respectively. Then
(a) u_j = Av_j;
(b) v_j = Be_j, where e_j is the jth standard vector of F^p.

Proof. (a) We have

u_j = \begin{pmatrix} (AB)_{1j} \\ (AB)_{2j} \\ \vdots \\ (AB)_{mj} \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{n} A_{1k}B_{kj} \\ \sum_{k=1}^{n} A_{2k}B_{kj} \\ \vdots \\ \sum_{k=1}^{n} A_{mk}B_{kj} \end{pmatrix} = A \begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix} = Av_j.

Hence (a) is proved. The proof of (b) is left as an exercise. (See Exercise 6.)

It follows (see Exercise 14) from Theorem 2.13 that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B. An analogous result holds for rows; that is, row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A.

The next result justifies much of our past work. It utilizes both the matrix representation of a linear transformation and matrix multiplication in order to evaluate the transformation at any given vector.

Theorem 2.14. Let V and W be finite-dimensional vector spaces having ordered bases β and γ, respectively, and let T: V → W be linear. Then, for each u ∈ V, we have

[T(u)]_γ = [T]_β^γ [u]_β.

Proof. Fix u ∈ V, and define the linear transformations f: F → V by f(a) = au and g: F → W by g(a) = aT(u) for all a ∈ F. Let α = {1} be the standard ordered basis for F. Notice that g = Tf. Identifying column vectors as matrices and using Theorem 2.11, we obtain

[T(u)]_γ = [g(1)]_γ = [g]_α^γ = [Tf]_α^γ = [T]_β^γ [f]_α^β = [T]_β^γ [f(1)]_β = [T]_β^γ [u]_β.

Example 3
Let T: P_3(R) → P_2(R) be the linear transformation defined by T(f(x)) = f′(x), and let β and γ be the standard ordered bases for P_3(R) and P_2(R), respectively. If A = [T]_β^γ, then, from Example 4 of Section 2.2, we have

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

We illustrate Theorem 2.14 by verifying that [T(p(x))]_γ = [T]_β^γ [p(x)]_β, where p(x) ∈ P_3(R) is the polynomial p(x) = 2 − 4x + x^2 + 3x^3. Let q(x) = T(p(x)); then q(x) = p′(x) = −4 + 2x + 9x^2. Hence

[T(p(x))]_γ = [q(x)]_γ = \begin{pmatrix} -4 \\ 2 \\ 9 \end{pmatrix},

but also

[T]_β^γ [p(x)]_β = A[p(x)]_β = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ -4 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \\ 9 \end{pmatrix}.

We complete this section with the introduction of the left-multiplication transformation L_A, where A is an m × n matrix. This transformation is probably the most important tool for transferring properties about transformations to analogous properties about matrices and vice versa. For example, we use it to prove that matrix multiplication is associative.

Definition. Let A be an m × n matrix with entries from a field F. We denote by L_A the mapping L_A: F^n → F^m defined by L_A(x) = Ax (the matrix product of A and x) for each column vector x ∈ F^n. We call L_A a left-multiplication transformation.

Example 4
Let

A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}.

Then A ∈ M_{2×3}(R) and L_A: R^3 → R^2. For each column vector x ∈ R^3, the image L_A(x) is the matrix product Ax, a vector in R^2.
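The verification of Theorem 2.14 above, and the action of a left-multiplication transformation, can be reproduced with a few lines of Python/NumPy (an added illustration, not part of the original text).

```python
import numpy as np

# A = [T] for the differentiation map T: P3(R) -> P2(R) (Example 3).
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]])

# Coordinates of p(x) = 2 - 4x + x^2 + 3x^3 relative to the standard basis.
p = np.array([2, -4, 1, 3])

# Theorem 2.14: A [p] is the coordinate vector of p'(x) = -4 + 2x + 9x^2.
print(A @ p)          # [-4  2  9]

# The same computation viewed as the left-multiplication transformation L_A.
L_A = lambda x: A @ x
print(L_A(p))         # [-4  2  9]
```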

We see in the next theorem that not only is L_A linear, but, in fact, it has a great many other useful properties. These properties are all quite natural and so are easy to remember.

Theorem 2.15. Let A be an m × n matrix with entries from F. Then the left-multiplication transformation L_A: F^n → F^m is linear. Furthermore, if B is any other m × n matrix (with entries from F) and β and γ are the standard ordered bases for F^n and F^m, respectively, then we have the following properties.
(a) [L_A]_β^γ = A.
(b) L_A = L_B if and only if A = B.
(c) L_{A+B} = L_A + L_B and L_{aA} = aL_A for all a ∈ F.
(d) If T: F^n → F^m is linear, then there exists a unique m × n matrix C such that T = L_C. In fact, C = [T]_β^γ.
(e) If E is an n × p matrix, then L_{AE} = L_A L_E.
(f) If m = n, then L_{I_n} = I_{F^n}.

Proof. The fact that L_A is linear follows immediately from Theorem 2.12.
(a) The jth column of [L_A]_β^γ is equal to L_A(e_j). However L_A(e_j) = Ae_j, which is also the jth column of A by Theorem 2.13(b). So [L_A]_β^γ = A.
(b) If L_A = L_B, then we may use (a) to write A = [L_A]_β^γ = [L_B]_β^γ = B. Hence A = B. The proof of the converse is trivial.
(c) The proof is left as an exercise. (See Exercise 7.)
(d) Let C = [T]_β^γ. By Theorem 2.14, we have [T(x)]_γ = [T]_β^γ [x]_β, or T(x) = Cx = L_C(x) for all x ∈ F^n. So T = L_C. The uniqueness of C follows from (b).
(e) For any j (1 ≤ j ≤ p), we may apply Theorem 2.13 several times to note that (AE)e_j is the jth column of AE and that the jth column of AE is also equal to A(Ee_j). So (AE)e_j = A(Ee_j). Thus

L_{AE}(e_j) = (AE)e_j = A(Ee_j) = L_A(Ee_j) = L_A(L_E(e_j)).

Hence L_{AE} = L_A L_E by the corollary to Theorem 2.6 (p. 73).
(f) The proof is left as an exercise. (See Exercise 7.)

We now use left-multiplication transformations to establish the associativity of matrix multiplication.

Theorem 2.16. Let A, B, and C be matrices such that A(BC) is defined. Then (AB)C is also defined and A(BC) = (AB)C; that is, matrix multiplication is associative.

Proof. It is left to the reader to show that (AB)C is defined. Using (e) of Theorem 2.15 and the associativity of functional composition (see Appendix B), we have

L_{A(BC)} = L_A L_{BC} = L_A(L_B L_C) = (L_A L_B)L_C = L_{AB} L_C = L_{(AB)C}.

So from (b) of Theorem 2.15, it follows that A(BC) = (AB)C.
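A quick numerical sanity check of associativity and of property (e) of Theorem 2.15 (a Python/NumPy sketch added for illustration; the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))
B = rng.integers(-3, 4, size=(3, 4))
C = rng.integers(-3, 4, size=(4, 2))

# Theorem 2.16: A(BC) = (AB)C.
print(np.array_equal(A @ (B @ C), (A @ B) @ C))   # True

# Theorem 2.15(e) in coordinates: L_{AB}(x) = L_A(L_B(x)) for a column vector x.
x = rng.integers(-3, 4, size=(4,))
print(np.array_equal((A @ B) @ x, A @ (B @ x)))   # True
```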

Needless to say, this theorem could be proved directly from the definition of matrix multiplication (see Exercise 18). The proof above, however, provides a prototype of many of the arguments that utilize the relationships between linear transformations and matrices.

Applications
A large and varied collection of interesting applications arises in connection with special matrices called incidence matrices. An incidence matrix is a square matrix in which all the entries are either zero or one and, for convenience, all the diagonal entries are zero. If we have a relationship on a set of n objects that we denote by 1, 2, ..., n, then we define the associated incidence matrix A by A_{ij} = 1 if i is related to j, and A_{ij} = 0 otherwise.

To make things concrete, suppose that we have four people, each of whom owns a communication device. If the relationship on this group is "can transmit to," then A_{ij} = 1 if i can send a message to j, and A_{ij} = 0 otherwise. Suppose that

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}.

Then since A_{34} = 1 and A_{14} = 0, we see that person 3 can send to 4 but 1 cannot send to 4.

We obtain an interesting interpretation of the entries of A^2. Consider, for instance,

(A^2)_{31} = A_{31}A_{11} + A_{32}A_{21} + A_{33}A_{31} + A_{34}A_{41}.

Note that any term A_{3k}A_{k1} equals 1 if and only if both A_{3k} and A_{k1} equal 1, that is, if and only if 3 can send to k and k can send to 1. Thus (A^2)_{31} gives the number of ways in which 3 can send to 1 in two stages (or in one relay). Since

A^2 = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 2 & 0 & 0 \\ 2 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix},

we see that there are two ways 3 can send to 1 in two stages. In general, (A + A^2 + ⋯ + A^m)_{ij} is the number of ways in which i can send to j in at most m stages.
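The two-stage count can be confirmed with a short computation (a Python/NumPy sketch, not part of the original text, using the incidence matrix displayed above).

```python
import numpy as np

# Incidence matrix of the "can transmit to" relation among four people.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])

A2 = A @ A
# (A^2)_{31} in the text's 1-based indexing is entry [2, 0] here:
print(A2[2, 0])        # 2, the number of ways person 3 can send to 1 in two stages

# (A + A^2)_{ij} counts the ways i can send to j in at most two stages.
print(A + A2)
```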

described earlier. Now (0 1 A + A2 = 1 1 V2 2 1 0 1 2 0 2 2 2 2 1 1 2 0 2 1\ 0 1 1 V . Consider.A 2 has a row [column] in which each entry is positive except for the diagonal entry. we conclude that there are no cliques in this relationship. can send a message to) the other. it can be shown (see Exercise 21) that the matrix A 4. Our final example of the use of incidence matrices is concerned with the concept of dominance. 2. For such a relation. In other words. For example. and compute B 3 . If we define a new matrix B by By = 1 if i and j can send to each other. given any two people. for example. Since A is an incidence matrix. exactly one of them dominates (or. Aij = 1 if and only if Aji — 0. that is. we form the matrix B. it can be shown that any person who dominates [is dominated by] the greatest number of people in the first stage has this property. suppose that the incidence matrix associated with some relationship is /o 1 A = 1 \1 1 0 1 1 0 1\ 1 0 0 1 1 0/ To determine which people belong to cliques. and By = 0 otherwise. In fact.Sec. then it can be shown (see Exercise 19) that person i belongs to a clique if and only if (B3)u > 0. In this case. An = 0 for all i. using the terminology of our first example.3 Composition of Linear Transformations and Matrix Multiplication 95 belongs to a clique. there is at least one person who dominates [is dominated by] all others in one or two stages. the matrix /o 1 0 0 0 1 A = 1 0 0 0 1 0 \1 1 1 1 0 1 0 0 0\ 0 0 1 o^ The reader should verify that this matrix corresponds to a dominance relation. A relation among a group of people is called a dominance relation if the associated incidence matrix A has the property that for all distinct pairs i and j. (0 1 0 1 0 1 B = 0 1 0 1 0 1 1\ 0 1 o 3 and B (0 A = 0 U A 0 4 0 0 4 0 4 4\ 0 4 0 Since all the diagonal entries of B 3 are zero.
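The clique test described above, in which B_{ij} = 1 exactly when i and j can send to each other and person i belongs to a clique if and only if (B^3)_{ii} > 0, is easy to automate. The following Python/NumPy sketch is an added illustration; the 4 × 4 matrix used here is hypothetical and is not the matrix printed in the text.

```python
import numpy as np

def in_clique(A):
    """Given a 0-1 incidence matrix A with zero diagonal, return a boolean
    array whose i-th entry says whether person i belongs to a clique,
    using the criterion (B^3)_{ii} > 0."""
    B = (A * A.T).astype(int)   # B_ij = 1 exactly when i and j can send to each other
    B3 = B @ B @ B
    return np.diag(B3) > 0

# A hypothetical 4-person relation (illustrative data only):
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 1, 0, 0]])
print(in_clique(A))    # [ True  True  True False]: persons 1, 2, 3 form a clique
```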

In each part.96 Chap. then A = I. A*B. and A(££>). Label the following statements as true or false. • * and A and B denote matrices. (c) [U(w)]p = [\J]P[w]l3 for ail w€\N. (a) [UTfc . ft.3C).) -> P 2 (B) and U: P 2 (B) linear transformations respectively defined by T(/(x)) = f'(x)g(x) R3 be the and C = (4 0 3) . 3. A2 = I implies that A = I or A = -I. BCt. -2 Oj ( 1 Compute A(2B 4. and Z denote vector spaces with ordered (finite) bases a. (b) \T(v)]0 = mSMa for all v € V.b). V. and 7. c. (AB)D.bx + ex2) = (a + b. L ^ + B = LA + LB. Let ft and 7 be the standard ordered bases of P2(R) and R3. CB. a . / A = G C = (-: -?)< 1 B 2. 4. 2. respectively. Let g(x) = 3 4. and 5 dominate (can send messages to) all the others in at most two stages. (a) Let = 0 ? and B = D - ' ) .2/(x) and U(a 4. 3.mg[U]?. 3. while persons 1. 4. T = LA for some matrix A. where O denotes the zero matrix. (b) Let B = Compute A4. [T2]g = ([T]g)2.x. EXERCISES 1. If A is square and Ay = <5y for all i and j. Let T: P2(R. A2 = O implies that A = O. 2 Linear Transformations and Matrices Thus persons 1. T: V — W and U: W — Z denote linear transformations. respectively. . (d) (e) (f) (g) (h) (i) (j) [lv]« = / . and 4 are dominated by (can receive messages from) all the others in at most two stages. and CA.W.

3 Composition of Linear Transformations and Matrix Multiplication 97 (a) Compute [[)]}. and Z be vector spaces. Find linear transformations U. Let V. where A = ( \ (b) * [T(/(x))]a. 6. Let A and B be n x n matrices. 12.14 to compute the following vectors: (a) [T(A)] a .x 2 . and let T: V — W and U: W — Z be linear.x + 2x2. Prove (b) of Theorem 2. then T is one-to-one.2. 13.Use your answer to find matrices A and B such that AB = O but BA^O. (c) [T(A)]7. Must U also be one-to-one? (b) Prove that if UT is onto. Must T also be onto? (c) Prove that if U and T are one-to-one and onto. and let T: V — V be linear. Prove that A is a diagonal matrix if and only if Ay = SijAjj for all i and j.12 and its corollary. 4.Sec.2x 4. where f(x) = 6 . Now state and prove a more general result involving linear transformations with domains unequal/to their codomains.15.14 to verify your result. (a) Prove that if UT is one-to-one. Then use [[)]} from (a) and Theorem 2. then U is onto. (b) Let h(x) = 3 .10. 8. Recall that the trace of A is defined by tr(A) = £ ) A i i . W.r)=4f [T(/(x))] 7 . and [UTg directly.13. . Then use Theorem 2. Complete the proof of Theorem 2. then UT is also. Prove that T 2 = T() > if and only if R(T) C N(T). Compute [h{x)]a and [U(/t(x))]7. For each of the following parts. 10.where/(. [T]^. let T be the linear transformation defined in the corresponding part of Exercise 5 of Section 2. Prove (c) and (f) of Theorem 2. 7.11 to verify your result.T: F2 — F2 such that UT = To (the zero • transformation) but TU ^ To. 2. where A=(l (d) 5. Prove Theorem 2. Let A be an n x n matrix. Use Theorem 2. *=i Prove that tr(AB) = tr(BA) and tr(A) = tr(A'). Let V be a vector space. 9. 11.

. prove that wA is a linear combination of the rows of A with the coefficients in the linear combination being the coordinates of w.(x .. 18. ap)1. . • (a) If rank(T) = rank(T 2 ). For an incidence matrix A with related matrix B defined by By = 1 if i is related to j and j is related to i.13(b) to prove that Bz is a linear combination of the columns of B.98 Chap. and show that V = {y: T(y) = y} © N(T) (see the exercises of Section 1. and By = 0 otherwise. 15. Hint: Note that x = T(x) 4. Assume the notation in Theorem 2.. Use Theorem 2.T(x)) for every x in V. Deduce that V = R(T) e N(T) (see the exercises of Section 1. Hint: Use properties of the transpose operation applied to (a). 16. (d) Prove the analogous result to (b) about rows: Row i of AB is a linear combination of the rows of B with the coefficients in the linear combination being the entries of row i of A.3). prove that the jth column of MA is a linear combination of the corresponding coftimns of MA with the same corresponding coefficients. If the j t h column of A is a linear combination of a set of columns of A. 17. 2 Linear Transformations and Matrices 14. Using only the definition of matrix multiplication. prove that i belongs to a clique if and only if (B3)u > 0. Let V be a vector space..3). prove that R(T) n N(f) = {0}. Let V be a finite-dimensional vector space. Use Exercise 19 to determine the cliques in the relations corresponding to the following incidence matrices. then show that Bz = J2aJUJi=l (b) Extend (a) to prove that column j of AB is a linear combination of the columns of A with the coefficients in the linear combination being the entries of column j of B.13. m (c) For any row vector w £ F . prove that multiplication of matrices is associative. if z = (ai. * Let M and A be matrices for which the product matrix MA is defined. (a) Suppose that z is a (column) vector in F p . 19. 20. (b) Prove that V = R(Tfc) © N(Tfc) for some positive integer k. In particular.a2. Determine all linear transformations T: V — V * such that T = T 2 . and let T: V — V be linear.

of course. Prove that the matrix A 4. The facts about inverse functions presented in Appendix B are. then T is said to be invertible. many of the intrinsic properties of functions are shared by their inverses.4 Invertibility and Isomorphisms (() 1 0 \l 0 1 0 1\ 0 0 1 0 1 0 1 ()) {{) 1 1 \1 0 1 1\ 0 0 1 0 0 1 0 1 0/ 99 (a) (b) 21.3. 22.\ (when it exists) can be used to determine properties of the inverse of the matrix A. This result greatly aids us in the study of inverses of matrices. Definition. As noted in Appendix B. then the inverse of T is unique and is denoted by T~l. Determine the number of nonzero entries of A. .17) that the inverse of a linear transformation is also linear. / 2. 2. In the remainder of this section. Let V and W be vector spaces. and let T: V — W be linear. Let A be an incidence matrix that is associated with a dominance relation. We will see that finite-dimensional vector spaces (over F) of equal dimension may be identified. As one might expect from Section 2. These ideas will be made precise shortly.Sec. For example. Use Exercise 21 to determine which persons dominate [are dominated by] each of the others within two stages. Let A be an n x n incidence matrix that corresponds to a dominance relation.A2 has a row [column] in which each entry is positive except for the diagonal entry. we apply many of the results about invertibility to the concept of isomorphism. Prove that the matrix A = corresponds to a dominance relation. We see in this section (Theorem 2.4 INVERTIBILITY AND ISOMORPHISMS The concept of invertibility is introduced quite early in the study of functions.. > A function U: W —• V is said to be an inverse of T if TU — lw and UT = lvIf T has an inverse. the inverse of the left-multiplication transformation L. true for linear transformations. in calculus we learn that the properties of being continuous or differentiable are generally retained by the inverse functions. Fortunately. we repeat some of the definitions for use in this section. if T is invertible. 23. Nevertheless.

The following facts hold for invertible functions T and U.
1. (TU)^{-1} = U^{-1}T^{-1}.
2. (T^{-1})^{-1} = T; in particular, T^{-1} is invertible.
We often use the fact that a function is invertible if and only if it is both one-to-one and onto. We can therefore restate Theorem 2.5 as follows.
3. Let T: V → W be a linear transformation, where V and W are finite-dimensional spaces of equal dimension. Then T is invertible if and only if rank(T) = dim(V).

Example 1
Let T: P_1(R) → R^2 be the linear transformation defined by T(a + bx) = (a, a + b). The reader can verify directly that T^{-1}: R^2 → P_1(R) is defined by T^{-1}(c, d) = c + (d − c)x. Observe that T^{-1} is also linear. As Theorem 2.17 demonstrates, this is true in general.

Theorem 2.17. Let V and W be vector spaces, and let T: V → W be linear and invertible. Then T^{-1}: W → V is linear.

Proof. Let y_1, y_2 ∈ W and c ∈ F. Since T is onto and one-to-one, there exist unique vectors x_1 and x_2 such that T(x_1) = y_1 and T(x_2) = y_2. Thus x_1 = T^{-1}(y_1) and x_2 = T^{-1}(y_2); so

T^{-1}(cy_1 + y_2) = T^{-1}[cT(x_1) + T(x_2)] = T^{-1}[T(cx_1 + x_2)] = cx_1 + x_2 = cT^{-1}(y_1) + T^{-1}(y_2).

It now follows immediately from Theorem 2.5 (p. 71) that if T is a linear transformation between vector spaces of equal (finite) dimension, then the conditions of being invertible, one-to-one, and onto are all equivalent.

We are now ready to define the inverse of a matrix. The reader should note the analogy with the inverse of a linear transformation.

Definition. Let A be an n × n matrix. Then A is invertible if there exists an n × n matrix B such that AB = BA = I.

If A is invertible, then the matrix B such that AB = BA = I is unique. (If C were another such matrix, then C = CI = C(AB) = (CA)B = IB = B.) The matrix B is called the inverse of A and is denoted by A^{-1}.

Example 2
The reader should verify that the inverse of

\begin{pmatrix} 5 & 2 \\ 7 & 3 \end{pmatrix}   is   \begin{pmatrix} 3 & -2 \\ -7 & 5 \end{pmatrix}.
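The verification requested in Example 2 takes only a few lines (a Python/NumPy sketch added for illustration):

```python
import numpy as np

A = np.array([[5, 2],
              [7, 3]])
B = np.array([[3, -2],
              [-7, 5]])
print(np.array_equal(A @ B, np.eye(2, dtype=int)))   # True
print(np.array_equal(B @ A, np.eye(2, dtype=int)))   # True
print(np.linalg.inv(A))                               # recovers B, up to floating point
```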

In Section 3.2, we learn a technique for computing the inverse of a matrix. At this point, we develop a number of results that relate the inverses of matrices to the inverses of linear transformations.

Lemma. Let T be an invertible linear transformation from V to W. Then V is finite-dimensional if and only if W is finite-dimensional. In this case, dim(V) = dim(W).

Proof. Suppose that V is finite-dimensional. Let β = {x_1, x_2, ..., x_n} be a basis for V. By Theorem 2.2 (p. 68), T(β) spans R(T) = W; hence W is finite-dimensional by Theorem 1.9 (p. 44). Conversely, if W is finite-dimensional, then so is V by a similar argument, using T^{-1}.
Now suppose that V and W are finite-dimensional. Because T is one-to-one and onto, we have nullity(T) = 0 and rank(T) = dim(R(T)) = dim(W). So by the dimension theorem (p. 70), it follows that dim(V) = dim(W).

Theorem 2.18. Let V and W be finite-dimensional vector spaces with ordered bases β and γ, respectively. Let T: V → W be linear. Then T is invertible if and only if [T]_β^γ is invertible. Furthermore, [T^{-1}]_γ^β = ([T]_β^γ)^{-1}.

Proof. Suppose that T is invertible. By the lemma, we have dim(V) = dim(W). Let n = dim(V). So [T]_β^γ is an n × n matrix. Now T^{-1}: W → V satisfies TT^{-1} = I_W and T^{-1}T = I_V. Thus

I_n = [I_V]_β = [T^{-1}T]_β = [T^{-1}]_γ^β [T]_β^γ.

Similarly, [T]_β^γ [T^{-1}]_γ^β = I_n. So [T]_β^γ is invertible and ([T]_β^γ)^{-1} = [T^{-1}]_γ^β.
Now suppose that A = [T]_β^γ is invertible. Then there exists an n × n matrix B such that AB = BA = I_n. By Theorem 2.6 (p. 72), there exists U ∈ £(W, V) such that

U(w_j) = \sum_{i=1}^{n} B_{ij} v_i   for j = 1, 2, ..., n,

where γ = {w_1, w_2, ..., w_n} and β = {v_1, v_2, ..., v_n}. It follows that [U]_γ^β = B. To show that U = T^{-1}, observe that

[UT]_β = [U]_γ^β [T]_β^γ = BA = I_n = [I_V]_β

by Theorem 2.11 (p. 88). So UT = I_V, and similarly, TU = I_W.
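Theorem 2.18 can be checked concretely on Example 1 (a Python/NumPy sketch, not part of the original text): the matrix of T(a + bx) = (a, a + b) with respect to the standard bases is inverted by the matrix of T^{-1}(c, d) = c + (d − c)x.

```python
import numpy as np

T_mat = np.array([[1, 0],      # [T]: columns are the images of 1 and x
                  [1, 1]])
Tinv_mat = np.array([[1, 0],   # [T^{-1}]: columns are the images of e1 and e2
                     [-1, 1]])

# Theorem 2.18: [T^{-1}] = ([T])^{-1}.
print(np.allclose(np.linalg.inv(T_mat), Tinv_mat))   # True
```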

We say that V is isomorphic to W if there exists a linear transformation T: V — W that is invertible. [T . • Corollary 1. Exercise. It is easily checked that T is * an isomorphism. > Such a linear transformation is called an isomorphism from V onto W. these two vector spaces may be considered identical or isomorphic. Let V and W be vector spaces. in terms of the vector space structure. Let V be a finite-dimensional vector space with an ordered basis (3.i(F) by T(a\. Definitions.1 = I-A-'Proof. We leave as an exercise (see Exercise 13) the proof that "is isomorphic to" is an equivalence relation. Then A is invertible if and only if\-A is invertible. Proof. Example 4 Define T: F 2 — P. that is. For example. !§j Corollary 2. and let T: V — V be linear.. that is.102 Example 3 Chap.) So we need only say that V and W are isomorphic. 2 Linear Transformations and Matrices Let 0 and 7 be the standard ordered bases of Pi(R) and R2. It can be verified by matrix multiplication that each matrix is the inverse of the other. c. that certain vector spaces strongly resemble one another except for the form of their vectors. we have TO=(i ' '-']?=(_. ( L a ) .. • .a2) = fli + nix. Exercise. respectively. Let A be ann x n matrix. Furthermore. 6. @ The notion of invertibility may be used to formalize what may already have been observed by the reader. we see that sums and scalar products associate in a similar manner. Then T is invertible if and only if [T]# > is invertible. For T as in Example 1. if we associate to each matrix a c 6 d the 4-tuple (a. in the case of M2x2(^) and F 4 . so F2 is isomorphic to Pi(F). Furthermore.1 ]^ = (PI/3) . (See Appendix A. d).

6.19. •. Hence T is an isomorphism. L . 1 By the lemma to Theorem 2. Let V be a vector space over F. 2 . respectively. as a vector space. is an isomorphism. 71). Then V is isomorphic to W if and only if dim(V) = dim(W). Theorem 2. we have that T is also one-to-one.2 (p. Up to this point. there exists T: V —• W such that T is linear and T(vi) — Wi for i — 1 . We conclude that Ps{R) is isomorphic to M 2x2 ( J R). of dimensions n and and W. and let 0 and 7 be ordered bases for V Then the function <I>: £(V.102.. because dim(P3(/?)) = dim(M 2x2 (i?)). 68). V2. By use of the Lagrange interpolation formula in Section 1. it can be shown (compare with Exercise 22) that T(/) = O only when / is the zero polynomial. . .. we have R(T) = span(T(/?)) = span(7) = W. From Theorem 2. We are now in a position to prove that. then either both of V and W are finite-dimensional or both arc infinite-dimensional. if V and W are isomorphic. Thus T is one-to-one (see Exercise 11). we have that dim(V) = dim(W). 2. the reader may have observed that isomorphic vector spaces have equal dimensions.. Suppose that V is isomorphic to W and that T: V — W is an > isomorphism from V to W. .20.4 Invertibility and Isomorphisms Example 5 Define T:P3(J2)-M2x2(ie) byT(/) = /(I) /(3) 1(2) /(4) 103 It is easily verified that T is linear. respectively. this is no coincidence. 72).n. defined by » £(V.vn} and 7 = {101. Corollary.18. 71).5 (p.18.wn} be bases for V and W. W). Theorem 2. . W) — M m x n ( F ) . $(T) = [T]J for T € Let V and W be finite-dimensional vector spaces over F m. By Theorem 2. the collection of all linear transformations between two given vector spaces may be identified with the appropriate vector space o f m x n matrices. So T is onto. and let 0 — {vi. / Now suppose that dim(V) = dim(W). By the lemma preceding Theorem 2. it follows that T is invertible by Theorem 2. Moreover. • In each of Examples 4 and 5. we have associated linear transformations with their matrix representations. respectively.6 (p. Let V and W be finite-dimensional vector spaces (over the same field). Using Theorem 2. As the next theorem shows. Proof.Sec. . • .5 (p. Then V is isomorphic to F n if and only if dim(V) = n.

. By Theorem 2.104 Chap. The standard representation ofV with respect to 0 is the function <f)p: V — F n defined by <t>p{x) = [x]ff for each x € V. Proof Exercise. there exists a unique linear transformation T: V — W such that * J v ( j) = 5 Z AiJWi for 1 . Thus <> is an isomorphism.19). 2 Linear Transformations and Matrices Proof. I We conclude this section with a result that allows us to see more clearly the relationship between linear transformations defined on abstract finitedimensional vector spaces and linear transformations from F n to F m .4)}. It is easily observed that 0 and 7 are ordered bases for R2. For x = (1. respectively. Let 0 be an ordered basis for an n-dhncnsional vector space V over the fleid F . Let V and W be finite-dimensional vector spaces of dimensions n and rn. 7 = {101. (0. The next theorem tells us much more. . 4>p is an isomorphism. Let 0 — {vi. (3.0). Proof. Hence we must show that $ is one-to-one and onto. Then £(V.103..i' - n 1 But this means that [T]l = A.. $ is linear. we have <f>p (x) = [x](3 f _ 2 j and (j)1(x) = [x\ -5 2 We observed earlier that <j>p is a linear transformation..1)} and 7 = {(1.6 (p.. or $(T) = A.2. . Theorem 2. / Definition. We begin by naming the transformation x — [x]p introduced in Sec* tion 2.19 and the fact that dim(M m X n (F)) = mn. W) is finite-dimensional of dimension mn.21. 82). The proof follows from Theorems 2.-2). 1 This theorem provides us with an alternate proof that an n-dimensional vector space is isomorphic to F n (sec the corollary to Theorem 2. By Theorem 2.8 (p. . For any finite-dimensional vector space V with ordered basis 0.20 and 2. ] Corollary. > Example 6 Let 0 = {(1.. 72).2).vn}. there exists a unique linear transformation T: V —• W such that $(T) = A. This is accomplished if we show that for every mxn matrix A. and let A be a given mxn matrix.wm}.V2.

Let 0 and 7 be the standard ordered bases for P3(R) and P 2 (i*).This diagram allows us to transfer operations on abstract vector spaces to ones on F n and F m . 2.w 105 Let V and W be vector spaces of dimension n and m. We are now able to use 4>@ and 0 7 to study the relationship between the linear transformations T and LA: F n -> F m .F -*. we may conclude that LA4>P = </>7T. Map V into W with T and follow it by 0 7 to obtain the composite 0 7 T. and let T: V — W be a linear transformation. Example 7 Recall the linear transformation T: P-. we may "identify" T with LA. that is.i(R) and P2(/^). this yields the composite LA(J>02. By a simple reformulation of Theorem 2. where 0 and 7 are > arbitrary ordered bases of V and W.2 (T(f(x)) = f'(x)). respectively. Let us first consider Figure 2.14 (p. respectively. the diagram "commutes.2 -*. 0 0 3/ . respectively.4 Invertibility and Isomorphisms V (1) (2) 07 T u Figure 2.2.If A = [T]}.](R) — P2{R) defined in Example 4 of > Section 2. Map V into F n with <fip and follow this transformation with L^. this relationship indicates that after V and W are identified with F" and F m via 0/? and <i>7.Sec. 91). These two composites are depicted by the dashed arrows in the diagram." Heuristically. Notice that there are two composites of linear transformations that map V into Fm: 1. then /o A= I0 \0 1 0 0\ 0 2 0]. and let (f>p: P-S{R) -> R4 and 0 7 : P2(R) -+ R3 be the corresponding standard representations of P-. Define A = [T]l. respectively.

then ( A .1 = A.a 2 . 2. Label the following statements as true or false. 0 7 T(p(x)). » (a) (b) (c) (d) (e) (f ) (g) (h) (i) ([Tig)" 1 £ [T-'lg. A must be square in order to possess an inverse. EXERCISES 1. • Try repeating Example 7 with different polynomials p(x).106 Chap. > R2 — R3 defined by T ( a i . A is invertible if and only if L^ is invertible. where A = [T]g. T: V — W is linear.3ai + 4 a 2 ) . a 2 . (a) (b) (c) (d) T: T: T: T: R2 — R3 defined by T(oi. V and W are vector spaces with ordered (finite) bases a and 0. determine whether T is invertible and justify your answer. If A is invertible. For each of the following linear transformations T. M 2X 3(F) is isomorphic to F 5 . Now / 0 1 0 0s = 0 0 2 0 \0 0 0 3 / LA4>0(p(x)) 2\ 1 -3 V V But since T(p(x)) = v'(x) = 1 — 6x + 15a. \\ = a + 26. 2 Linear Transformations and Matrices We show that Lj\(f)p(p(x)) = Consider the polynomial p(x) = 2+x—3x2+bx3. T = LA.03) = (3ai — 2a3.1 ) .a 2 . AB = I implies that A and B are invertible.a 2 ) = (ai — 2a 2 . 4 a i ) .a 2 . Pn{F) is isomorphic to P m ( F ) if and only if n — rn. T is invertible if and only if T is one-to-one and onto. > P3(R) -> P2{R) defined by T(p{x)) = p'(x). a +b a c c+d (e) T: M 2x2 (i?) -> P2(R) defined by T (a (f) T: M2x2(R) -+ M2x2{R) defined by T . respectively.2. and A and B are matrices. we have -J(vU)) (-61 15 So LAMPW) = ^T(P(*)). > R3 — R3 defined by T(oi.#2.3«i + 4 a 2 ) . In each part.x + (c + d)x2. a 2 ) = (3ai .

21. Could A be invertible? Explain. . (b) Prove A — B~x (and hence B = A~l). (b) Suppose that AB — O for some nonzero n x n matrix B. Which of the following pairs of vector spaces are isomorphic? Justify your answers.4 Invertibility and Isomorphisms 107 3. (a) Use Exercise 9 to conclude that A and B are invertible.18.T Let A and B be n x n matrices such that AB — In. Prove that A1 is invertible and (A1)'1 6. (b) F 4 a n d P 3 ( F ) . (d) V = {A e M2x2(R): tr(A) = 0} and R4. (We are. Let A and B be n x n matrices such that AB is invertible. 11. Let A be an n x n matrix. Prove that AB is invertible and (AB)'1 = B^A'1. Let = (A'1)1.Sec. (a) Suppose that A2 = O. 4J Let A and B be n x n invertible matrices. 9. Prove Theorem 2.* Let A be invertible. Give an example to show that arbitrary matrices A and B need not be invertible if AB is invertible. 7. 2. Prove that if A is invertible and AB = O. 14. Verify that the transformation in Example 5 is one-to-one. in effect. (a) F3 and P 3 (F).) (c) State and prove analogous results for linear transformations defined on finite-dimensional vector spaces. saying that for square matrices. 13." Prove that ~ is an equivalence relation on the class of vector spaces over F. Prove Corollaries 1 and 2 of Theorem 2. Prove that A is not invertible. a "one-sided" inverse is a "two-sided" inverse. (c) M 2x2 (i?) and P 3 (i2). 12. 10. Let ~ mean "is isomorphic to. 8. Prove that A and B are invertible. 5. then B = O. Construct an isomorphism from V to F .

In Example 5 of Section 2. Vn} and 7 = {w\. Suppose that 0 is a basis for V.E12. Prove that T is an isomorphism if and only if T(0) is a basis for W. (a) Prove that T(V 0 ) is a subspace of W.1. there exist linear transformations T. Let V and W be finite-dimensional vector spaces with ordered bases 0 = {vi> v2. (b) Verify that LAMM) / = ^ T ( M ) for A = [T]^ and M = 1 2 3 4 20. . 2 Linear Transformations and Matrices 15. Let B be an n x n invertible matrix.. (a) Compute [Tj/3.j: V — W such > that Tij(vk) = Wi if k = j 0 iik^ j. Let Vn be a subspace of V.-.-.108 Chap. Hint: Apply Exercise 17 to Figure 2. w m }. 21. as noted in Example 3 of Section 1.i). W) — M m X n ( F ) such that > $(Tij) = M y '.6 (p. Prove that rank(T) = rank(Lyi) and that nullity(T) = nullity (L. 18. 17.E22}. W). 16. where A = [T]^. the mapping T: M 2x2 (i?) — M 2x2 (i?) de> fined by T(M) = Ml for each M e M 2x2 (i?) is a linear transformation. respectively. (b) Prove that dim(V 0 ) = dim(T(V 0 )). which is a basis for M2x2(R). and prove that [T^-H = MlK Again by Theorem 2. 19. Prove that $ is an isomorphism.* Let V and W be finite-dimensional vector spaces and T: V — W be an > isomorphism.. First prove that {Tij: 1 < i < m. Let 0 and 7 be ordered bases for V and W. Prove that $ is an isomorphism.2.6. and let T: V — W be a * linear transformation. 1 < j < n} is a basis for £(V. Repeat Example 7 with the polynomial p(x) = 1 + x + 2x2 + x3. Define $ : M n x n ( F ) — M n x n ( F ) > by $(A) = B~XAB.6. there exists a linear transformation $ : £(V. Then let MlJ be the mxn matrix with 1 in the ith row and j t h column and 0 elsewhere. Let V and W be n-dimensional vector spaces. E21. 72). . Let 0 = {EU. respectively. By Theorem 2. . w2. * Let T: V — W be a linear transformation from an n-dimensional vector > space V to an m-dimensional vector space W.

that is. . 24. and let W = P(F). . .3 25. / ( d ) ./(c„)). prove that if v + N(T) = v' + N(T). Prove that T is an isomorphism. The following exercise requires familiarity with the concept of quotient space defined in Exercise 31 of Section 1. (By the corollary to Theorem 1. Let V be a nonzero vector space over a field F.7.2. Let C(S. i=0 where n is the largest integer such that er(n) ^ 0. Let T: V — Z be a linear transformation of a vector space V onto a » vector space Z. Let c n .C„. . .3 and with Exercise 40 of Section 2.13 (p. prove that T = Tr/.3 commutes. then T(v) = T(v').. Let V denote the vector space defined in Example 5 of Section 1..1. . c i .. . 60) in Section 1.. (c) Prove that T is an isomorphism. F) denote the vector space of all functions / € !F(S.Sec. and suppose that S is a basis for V. Prove that T is an isomorphism.Ci.4 Invertibility and Isomorphisms 109 22. c n be distinct scalars from an infinite field F. (d) Prove that the diagram shown in Figure 2. Define T: V — W by T(<r) = j ^ a ( i ) x j . Define the mapping T: V/N(T) -» Z by T(v + N(T)) = T(v) for any coset v + N(T) in V/N(T). V T • Z 4 / 7 T V/N(T) Figure 2. Hint: Use the Lagrange polynomials associated with C 0 . 23. (a) Prove that T is well-defined. . that is. every vector space has a basis). (b) Prove that T is linear. Define T: P n ( F ) -» Fn+1 by T ( / ) = ( / ( c b ) . 2. F) such that f(s) — 0 for all but a finite number .

3. s€S. an x'y'-coordinate system with coordinate axes rotated from the original xy-coordinate axes. 1 . The resulting expression is of such a simple form that an antiderivative is easily recognized: 2xex dx = eudu = eu + c = ex + c. This is done by introducing a new frame of reference./(s)^0 otherwise. and *(/) = E /«*. The unit vectors along the x'-axis and the y'-axis form an ordered basis 1_/-1 ' • { A C y/E\ 2 for R2.110 Chap. (See Exercise 14 of Section 1. Prove that \I> is an isomorphism. a change of variable is used to simplify the appearance of an expression. and the change of variable is actually a change from [P]p = ( j. Similarly.) Let # : C{S.4.F) -> V be defined by * ( / ) = 0 if / is the zero function. in which form it is easily seen to be the equation of an ellipse. Geometrically. in geometry the change of variable 2 . the coordinate vector of P relative to the standard ordered basis 0 = {ei.5 T H E C H A N G E OF COORDINATE MATRIX In many areas of mathematics. in calculus an antiderivative of 2xex can be found by making the change of variable u — x .) We see how this change of variable is determined in Section 6. 2 Linear Transformations and Matrices of vectors in 5. the change of variable is a change in the way that the position of a point P in the plane is described. e 2 }. For example. the new coordinate axes are chosen to lie in the direction of the axes of the ellipse. (See Figure 2. to . In this case. / " = v T 7f' 2 can be used to transform the equation 2x — 4xy + by2 — 1 into the simpler equation (x')2 +6(y')2 = 1. 2. Thus every nonzero vector space can be viewed as a space of functions.5.

Figure 2.4

[P]_{β'} = \begin{pmatrix} x' \\ y' \end{pmatrix}, the coordinate vector of P relative to the new rotated basis β'.

A natural question arises: How can a coordinate vector relative to one basis be changed into a coordinate vector relative to the other? Notice that the system of equations relating the new and old coordinates can be represented by the matrix equation

\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix}.

Notice also that the matrix

Q = \frac{1}{\sqrt{5}} \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}

equals [I]_{β'}^β, where I denotes the identity transformation on R^2. Thus [v]_β = Q[v]_{β'} for all v ∈ R^2. A similar result is true in general.

Theorem 2.22. Let β and β' be two ordered bases for a finite-dimensional vector space V, and let Q = [I_V]_{β'}^β. Then
(a) Q is invertible.
(b) For any v ∈ V, [v]_β = Q[v]_{β'}.

Proof. (a) Since I_V is invertible, Q is invertible by Theorem 2.18 (p. 101).
(b) For any v ∈ V,

[v]_β = [I_V(v)]_β = [I_V]_{β'}^β [v]_{β'} = Q[v]_{β'}

by Theorem 2.14 (p. 91).

The matrix Q = [I_V]_{β'}^β defined in Theorem 2.22 is called a change of coordinate matrix. Because of part (b) of the theorem, we say that Q changes β'-coordinates into β-coordinates. Observe that if β = {x_1, x_2, ..., x_n} and β' = {x'_1, x'_2, ..., x'_n}, then

x'_j = \sum_{i=1}^{n} Q_{ij} x_i

for j = 1, 2, ..., n; that is, the jth column of Q is [x'_j]_β.

Notice that if Q changes β'-coordinates into β-coordinates, then Q^{-1} changes β-coordinates into β'-coordinates. (See Exercise 11.)

Example 1
In R^2, let β = {(1,1), (1,−1)} and β' = {(2,4), (3,1)}. Since

(2,4) = 3(1,1) − 1(1,−1)   and   (3,1) = 2(1,1) + 1(1,−1),

the matrix that changes β'-coordinates into β-coordinates is

Q = \begin{pmatrix} 3 & 2 \\ -1 & 1 \end{pmatrix}.

Thus, for instance,

[(2,4)]_β = Q[(2,4)]_{β'} = Q \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.

For the remainder of this section, we consider only linear transformations that map a vector space V into itself. Such a linear transformation is called a linear operator on V. Suppose now that T is a linear operator on a finite-dimensional vector space V and that β and β' are ordered bases for V. Then T can be represented by the matrices [T]_β and [T]_{β'}. What is the relationship between these matrices? The next theorem provides a simple answer using a change of coordinate matrix.

Theorem 2.23. Let T be a linear operator on a finite-dimensional vector space V, and let β and β' be ordered bases for V. Suppose that Q is the change of coordinate matrix that changes β'-coordinates into β-coordinates. Then

[T]_{β'} = Q^{-1}[T]_β Q.

Proof. Let I be the identity transformation on V. Then T = IT = TI; hence, by Theorem 2.11 (p. 88),

Q[T]_{β'} = [I]_{β'}^β [T]_{β'} = [IT]_{β'}^β = [TI]_{β'}^β = [T]_β [I]_{β'}^β = [T]_β Q.

Therefore [T]_{β'} = Q^{-1}[T]_β Q.
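The change of coordinate matrix of Example 1 can be computed by putting the basis vectors into the columns of two matrices and solving a linear system (a Python/NumPy sketch added for illustration).

```python
import numpy as np

beta       = np.array([[1, 1], [1, -1]]).T   # columns are the vectors of beta
beta_prime = np.array([[2, 4], [3, 1]]).T    # columns are the vectors of beta'

# Column j of Q is [x'_j]_beta, so Q solves  beta * Q = beta'.
Q = np.linalg.solve(beta, beta_prime)
print(Q)                       # [[ 3.  2.]  [-1.  1.]]

# Theorem 2.22(b): [v]_beta = Q [v]_beta' for v = (2,4), whose beta'-coordinates are (1,0).
print(Q @ np.array([1, 0]))    # [ 3. -1.]
```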

b) for any (a. We now derive the less obvious rule for the reflection T about the line y = 2x. [T]/3'=Q-1[T]/3Q=(J I)' To show that this is the correct matrix. we saw that the change of coordinate matrix that changes /^'-coordinates into /^-coordinates is Q = and it is easily verified that W 3 2 .5 The Change of Coordinate Matrix Example 2 Let T be the linear operator on R2 defined by T 113 .23 to compute [T]^. Example 3 Recall the reflection about the rc-axis in Example 3 of Section 2. as the next example shows.1. The rule (x. we can verify that the image under T of each vector of 0' is the linear combination of the vectors of 0' with the entries of the corresponding column as its coefficients. For example. it is completely . b) in R2. the image of the second vector in 0' is T 3 . • It is often useful to apply Theorem 2. —y) is easy to obtain. 2.i f i H f i Notice that the coefficients of the linear combination are the entries of the second column of [T]^. Since T is linear. by Theorem 2. y) —> (x.1 17 ' 5 1 37 ' / Hence.Sec. The reader should verify that [T]a = 3 -1 1 37" In Example 1.23.) We wish to find an expression for T(a. (See Figure 2. (3a — b a + 36 7 ' and let 0 and 0' be the ordered bases in Example 1.5.

114 Chap.2 . 1 ) = .2) = (1.3 a + 46 4 a + 367 ' .H i a)Since 0 is the standard ordered basis. Thus for any (a. T(l.( . We can solve this equation for [T]^ to obtain that [T]li = Q\T\d'Q~lBecause Q 5 I -2 1 the reader can verify that t * . it follows that T is left-multiplication by [T]p. 1 ) = (2. Therefore if we let 0' = 1\ 1-2 27' V i then 0' is an ordered basis for R2 and / H> = o -17' Let 0 be the standard ordered basis for R2.b) in R2. 2 Linear Transformations and Matrices y y = 2x Figure 2.1 ) .2 .2) and T ( . . Clearly.5 determined by its values on a basis for R2. and let Q be the matrix that changes /^'-coordinates into /^-coordinates. we have T 1 /-3 4 4 37 \b 1 / . Then Q = 1 -2 and Q l[^]pQ — [T]/9'.
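The reflection computation sketched above can be verified numerically (a Python/NumPy sketch, not part of the original text), using the basis β' = {(1,2), (−2,1)} in which the reflection about y = 2x has the simple matrix with diagonal entries 1 and −1.

```python
import numpy as np

Q = np.array([[1, -2],                    # columns are the vectors of beta'
              [2,  1]])
T_beta_prime = np.array([[1,  0],
                         [0, -1]])

# [T]_beta = Q [T]_beta' Q^{-1} for the standard ordered basis beta.
T_beta = Q @ T_beta_prime @ np.linalg.inv(Q)
print(T_beta)                             # (1/5) * [[-3, 4], [4, 3]]

# Checks: the reflection fixes (1,2), which lies on the line y = 2x,
# and sends (1,0) to (-3/5, 4/5).
print(T_beta @ np.array([1, 2]))          # [1. 2.]
print(T_beta @ np.array([1, 0]))          # [-0.6  0.8]
```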

Then [Lyt]7 = Q . So we need only say that A and B are similar. Let A e M n x n (F). l M v = Q. Example 4 Let A = and let 7 = which is an ordered basis for R3. . and let 7 be an ordered basis for F". where V is distinct * from W. 2.23 can be generalized to allow T: V — W. In this case. whose proof is left as an exercise.1 AQ. Corollary. 6. At this time. Notice also that in this terminology Theorem 2. and if 0 and 0' are any ordered bases for V. whore Q is the n x n matrix whose jth column is the jth vector of 7. we introduce the name for this relationship. Then and So by the preceding corollary.23 can be stated as follows: If T is a linear operator on a finite-dimensional vector space V. however.Sec.AQ.AQ : Q-1 = I ° = M V 0 2 4 -1 8 6 -1 The relationship between the matrices [T]^' and [T]p in Theorem 2. we can change bases in V as well as in W (see Exercise 8). and 7.23 will be the subject of further study in Chapters 5. Theorem 2.23 is contained in the next corollary.5 The Change of Coordinate Matrix 115 A useful special case of Theorem 2. We say that B is similar to A if there exists an invertible matrix Q such that B = Q. Let A and B be matrices in M n x n ( F ) . Let Q be the 3 x 3 matrix whose jth column is the j t h vector of 7. then [T]p' is similar to [T]p. Definition. Observe that the relation of similarity is an equivalence relation (see Exercise 9).
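The corollary just used, [L_A]_γ = Q^{-1}AQ where the columns of Q are the vectors of γ, is easy to experiment with. The matrix and basis below are hypothetical, chosen only to illustrate the computation (they are not the data of Example 4); the sketch is in Python with NumPy.

```python
import numpy as np

A = np.array([[2, 1,  0],
              [0, 1, -1],
              [1, 0,  3]])
gamma = [np.array([1, 1, 0]), np.array([0, 1, 1]), np.array([1, 0, 1])]

Q = np.column_stack(gamma)                # jth column of Q is the jth vector of gamma
L_A_gamma = np.linalg.inv(Q) @ A @ Q      # the corollary: [L_A]_gamma = Q^{-1} A Q
print(L_A_gamma)

# [L_A]_gamma is similar to A; for instance, the two matrices have the same trace.
print(np.trace(A), round(np.trace(L_A_gamma)))   # 6 6
```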

0)} and 0' = {eue2} and 0' = {(2. ( ..1 ) } {(2.4x2 . 1 ) } 3. x + 1. . 2 : (e) / = {x — a . -x + 2x + 1} and 2 2 0' = { 9 x . E Let T be a linear operator on a finite-dimensional vector space V. Then the jth column of Q is [xj]p>. (a) Suppose that 0 = {x\. x + 3x .B£ M n X n ( F ) are called similar if B = Q*AQ for some Q < M n X n ( F ) .02).2x2 . For each of the following pairs of ordered bases 0 and 0' for P2(R). c2x2 + C\X + CQ} 2 (b) 0 = {l. find the change of coordinate matrix that changes /^'-coordinates into /^-coordinates. (c) Let T be a linear operator on a finite-dimensional vector space V.3} 2 2 2 9 (f) / = {2x . Then [J]p.x + 1.2x . (a) 0 = {x2. x + 1.62a. (5.3 ) } {(-4. let 0 and 0' be ordered bases for V.9 . 2 Linear Transformations and Matrices EXERCISES 1. x + 2 1 x . ( . . (2. [T]^ is similar to [T]7.« Q P V Q " 1 .2x2 + 3} 2 ? . -2x2 + 5x + 5. a — 1} and 2 0' = {5a: . .e 2 } and 0' = { ( .x . (a) (b) (c) (d) 0= 0 = 0= 0 = {ei.x + 1. 2...4 .. Label the following statements as true or false.2 + 61X + 60.1). Let T be the linear operator on R2 defined by T 2a+ 6 a — 36 . (2. c2x2 + c\x + Cn} 2 2 2 2 (c) 0 = {2x .x2.3). 3 x + 5x + 2} 4. (b) Every change of coordinate matrix is invertible. .2.x. x + 1} and 0' = {x2 + x + 4. x } 2 2 (d) 0 = {x . .l} and 0' = {a2x2 + a\x + a 0 . x 2 .62)} and 0' = {(0. For each of the following pairs of ordered bases 0 and 0' for R2. (e) Then for any ordered bases 0 and 7 for V.1 .Sx + 2. x } and 2 0' = {a2x + mx + a0. find the change of coordinate matrix that changes /^'-coordinates into 0coordinates.1 3).x.5). and let Q be the change of coordinate matrix that changes /^'-coordinates into /^-coordinates. 3x + 1. (d) The matrices A.3. .2 .1 ) } {(01.xn} and 0' = { x i .10).116 Chap. x. .6 2 x 2 + 61 a: + 60.x. x } and 0' = {1. x J J are ordered bases for a vector space and Q is the change of coordinate matrix that changes /^'-coordinates into /^-coordinates. (61.

let L be the line y = mx. find [L^]^. 1 — x}. 1\ -ij -1 ( u l 1 2 1 2 (d) A = 7. and let 0' = Use Theorem 2. 2. r2 1 117 2 -1 -1 1 5.) 8. (See the definition of projection in the exercises of Section 2. For each matrix A and ordered basis /?. Let 0 — {l.a:} and 0' = {1 + x. 6. Let T: V — W be • » a linear transformation from a finite-dimensional vector space V to a finite-dimensional vector space W. Use Theorem 2.Sec. Prove the following generalization of Theorem 2. the derivative of p(x). In R2. Let T be the linear operator on Pi(R) defined by T(p(x)) = p'(x). Let 0 and 0' be ordered bases for . where m ^ 0. Also. Find an expression for T(x.23 and the fact that 1 1 tofind[T]p>. where (a) T is the reflection of R2 about L. t/).23.5 The Change of Coordinate Matrix let 0 be the standard ordered basis for R2.1. find an invertible matrix Q such that [LA]P = Q~lAQ. (b) T is the projection on L along the line perpendicular to L.23 and the fact that i 1 to find [T]p'.

then tr(A) = tr(I?). x 2 . W = F m . Prove that 0' is a basis for V and hence that Q is the change of coordinate matrix changing /^'-coordinates into /^-coordinates. and if there exist invertible mxrn and n x n matrices P and Q. then RQ is the change of coordinate matrix that changes o>coordinates into 7-coordinates. x 2 . such that B = P~1AQ. Then [T]j/. . . Hints: Let V = F n . 10. respectively. x n } be an ordered basis for V. Prove that if A and B are similar n x n matrices. Prove the corollary to Theorem 2.0. a. and a linear transformation T: V — W such that > A = [T]} and B = [!]}. and 0 and 7 be the standard ordered bases for F" and F m .* Let V be a finite-dimensional vector space over a field F . Define x'j = 22 Qijxi for 1 < J < ™.0-coordinates into 7-coordinates. = P _ 1 [T]JQ. Prove the converse of Exercise 8: If A and B are each mxn matrices with entries from a field F . ordered bases 0 and 0' for V and 7 and 7' for W. Let Q be an nxn invertible matrix with entries from F. 2 Linear Transformations and Matrices V. . respectively. Let V be a finite-dimensional vector space with ordered bases and 7. x'n}. then Q~l changes /^-coordinates into a-coordinates. . (a) Prove that if Q and R are the change of coordinate matrices that change a-coordinates into ^-coordinates and . 12.118 Chap. (b) Prove that if Q changes a-coordinates into /3-coordinates. . and set 0' — {x[. where Q is the matrix that changes /^'-coordinates into /^-coordinates and P is the matrix that changes 7'-coordinates into 7-coordinates. and let 7 and 7' be ordered bases for W. T = L^. 14.. respectively. 13. Prove that "is similar to" is an equivalence relation on MnXn(F). Hint: Use Exercise 13 of Section 2. 9.3. Now apply the results of Exercise 13 to obtain ordered bases 0' and 7' from 0 and 7 via Q and P . 11. respectively. then there exist an n-dimensional vector space V and an m-dimensional vector space W (both over F). • • •. and let 0 — {x\.23.

• Example 2 Let V = M n X n (F). As we see in Example 1. Example 1 Let V be the vector space of continuous real-valued functions on the interval [0. .2. We generally use the letters f. g. we are concerned exclusively with linear transformations from a vector space V into its field of scalars F . 2 . Fix a function g € V. • Example 3 Let V be a finite-dimensional vector space. the trace of A.6 Dual Spaces 2. Note that if V is finite-dimensional. Such a linear transformation is called a linear functional on V. . 2T T . Then fj is a linear functional on V called the i t h coordinate function with respect to the basis 0. Thus V* is the vector space consisting of all linear functionals on V with the operations of addition and scalar multiplication as defined in Section 2. and let 0 = {xi. define fi(x) = a». we have that f is a linear functional. the definite integral provides us with one of the most important examples of a linear functional in mathematics. Note that U(XJ) = Sij. These linear functionals play an important role in the theory of dual spaces (see Theorem 2. . ..Sec.3. where &y is the Kronecker delta. The function h: V —• R defined by 1 rn h(x) = — y x(t)g(t)dt is a linear functional on V. n. where /«i\ 02 N/3 = \dn) is the coordinate vector of x relative to 0.24).27r]. which is itself a vector space of dimension 1 over F . For a vector space V over F.F)) = dim(V) • dim(F) = dim(V).20 (p. . then by the corollary to Theorem 2. we define the dual space of V to be the vector space £(V. By Exercise 6 of Section 1. and define f: V -> F by f(A) = tr(A). denoted by V*. • Definition. . 2.6* DUAL SPACES 119 In this section. In the cases that g(t) equals sin nt or cos nt.. 104) dim(V*) = dim(£(V. h(x) is often called the n t h Fourier coefficient of x. to denote linear functionals. . x n } be an ordered basis for V.X2. F). h . For each i = 1 . ..

Hence by Theorem 2.19 (p. 103), V and V* are isomorphic. We also define the double dual V** of V to be the dual of V*. In Theorem 2.26, we show, in fact, that there is a natural identification of V and V** in the case that V is finite-dimensional.

Theorem 2.24. Suppose that V is a finite-dimensional vector space with the ordered basis β = {x_1, x_2, ..., x_n}. Let f_i (1 ≤ i ≤ n) be the ith coordinate function with respect to β as just defined, and let β* = {f_1, f_2, ..., f_n}. Then β* is an ordered basis for V*, and, for any f ∈ V*, we have

f = \sum_{i=1}^{n} f(x_i) f_i.

Proof. Let f ∈ V*. Since dim(V*) = n, we need only show that

f = \sum_{i=1}^{n} f(x_i) f_i,

from which it follows that β* generates V*, and hence is a basis by Corollary 2(a) to the replacement theorem (p. 47). Let g = \sum_{i=1}^{n} f(x_i) f_i. For 1 ≤ j ≤ n, we have

g(x_j) = \left( \sum_{i=1}^{n} f(x_i) f_i \right)(x_j) = \sum_{i=1}^{n} f(x_i) f_i(x_j) = \sum_{i=1}^{n} f(x_i) δ_{ij} = f(x_j).

Therefore f = g by the corollary to Theorem 2.6 (p. 72).

Definition. Using the notation of Theorem 2.24, we call the ordered basis β* = {f_1, f_2, ..., f_n} of V* that satisfies f_i(x_j) = δ_{ij} (1 ≤ i, j ≤ n) the dual basis of β.

Example 4
Let β = {(2,1), (3,1)} be an ordered basis for R^2. Suppose that the dual basis of β is given by β* = {f_1, f_2}. To explicitly determine a formula for f_1, we need to consider the equations

1 = f_1(2,1) = f_1(2e_1 + e_2) = 2f_1(e_1) + f_1(e_2),
0 = f_1(3,1) = f_1(3e_1 + e_2) = 3f_1(e_1) + f_1(e_2).

Solving these equations, we obtain f_1(e_1) = −1 and f_1(e_2) = 3; that is, f_1(x, y) = −x + 3y. Similarly, it can be shown that f_2(x, y) = x − 2y.
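The system solved in Example 4 amounts to inverting the matrix whose columns are the vectors of β: the rows of that inverse are the dual basis functionals. A short Python/NumPy sketch (added for illustration):

```python
import numpy as np

# Columns of M are the vectors of beta = {(2,1), (3,1)}.
M = np.array([[2, 3],
              [1, 1]])
dual = np.linalg.inv(M)
print(dual)        # [[-1.  3.]  [ 1. -2.]]: f1(x,y) = -x + 3y, f2(x,y) = x - 2y

# Check the defining property f_i(x_j) = delta_ij:
print(dual @ M)    # the 2 x 2 identity matrix
```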

We now assume that V and W are finite-dimensional vector spaces over F with ordered bases β and γ, respectively. In Section 2.4, we proved that there is a one-to-one correspondence between linear transformations T: V → W and m × n matrices (over F) via the correspondence T ↔ [T]_β^γ. For a matrix of the form A = [T]_β^γ, the question arises as to whether or not there exists a linear transformation U associated with T in some natural way such that U may be represented in some basis as A^t. Of course, if m ≠ n, it would be impossible for U to be a linear transformation from V into W. We now answer this question by applying what we have already learned about dual spaces.

Theorem 2.25. Let V and W be finite-dimensional vector spaces over F with ordered bases β and γ, respectively. For any linear transformation T: V → W, the mapping T^t: W* → V* defined by T^t(g) = gT for all g ∈ W* is a linear transformation with the property that

[T^t]_{γ*}^{β*} = ([T]_β^γ)^t.

Proof. For g ∈ W*, it is clear that T^t(g) = gT is a linear functional on V and hence is in V*. Thus T^t maps W* into V*. We leave the proof that T^t is linear to the reader.
To complete the proof, let β = {x_1, x_2, ..., x_n} and γ = {y_1, y_2, ..., y_m} with dual bases β* = {f_1, f_2, ..., f_n} and γ* = {g_1, g_2, ..., g_m}, respectively. For convenience, let A = [T]_β^γ. To find the jth column of [T^t]_{γ*}^{β*}, we begin by expressing T^t(g_j) as a linear combination of the vectors of β*. By Theorem 2.24, we have

T^t(g_j) = g_j T = \sum_{s=1}^{n} (g_j T)(x_s) f_s.

So the row i, column j entry of [T^t]_{γ*}^{β*} is

(g_j T)(x_i) = g_j(T(x_i)) = g_j\left( \sum_{k=1}^{m} A_{ki} y_k \right) = \sum_{k=1}^{m} A_{ki} g_j(y_k) = \sum_{k=1}^{m} A_{ki} δ_{jk} = A_{ji}.

Hence [T^t]_{γ*}^{β*} = A^t.

The linear transformation T^t defined in Theorem 2.25 is called the transpose of T. It is clear that T^t is the unique linear transformation U such that [U]_{γ*}^{β*} = ([T]_β^γ)^t.
We illustrate Theorem 2.25 with the next example.

so x G V**. we define x: V* — F by x(f) = f(x) for every f G V*. The correspondence x *-* x allows us to define the desired isomorphism between V and V**. and define ip: V — V** by ip(x) = x. Let V be a finite-dimensional vector space. But also (T t (gi))(l) = g i ( T ( l ) ) = g 1 ( l . * It is easy to verify that x is a linear functional on V*. an isomorphism between V and V** that does not depend on any choice of bases for the two vector spaces. Let /?* = {f 1. and d = 2. There is. and let x G V.p(2)). » . Lemma. .. Suppose that [T*]£ = P b \ Then T*(gi) = af.122 Example 5 Chap. m? . Let f = f. x n } for V such that xi = x. So a = 1. directly from the definition. Then fi(x x ) = 1 ^ 0.(1 j) • We compute [T']r.26. Hence a direct computation yields [T / as predicted by Theorem 2. If x(f) = 0 for all f G V*. Let 0 and 7 be the standard ordered bases for Pi(R) and R2. . l ) = l.. So T*( g l )(l) = (ofi + cf 2 )(l) = afi (1) + cf 2 (l) = a(l) + c(0) = a.25. Proof. +cf 2 .f1 1 = ( m ? ) \ 0 2 • We now concern ourselves with demonstrating that any finite-dimensional vector space V can be identified in a natural way with its double dual V**. 2 Linear Transformations and Matrices Define T: ?i{R) -* R2 by T(p(x)) = (p(0). . I Theorem 2.X2. . ... respectively. For a vector x G V. f n } be the dual basis of 0. Let x ^ 0.. b = 1. in fact. f 2 .g 2 }. We show that there exists f G V* such that x(f) ^ 0. . Let {fi. we obtain that c = 0. Choose an ordered basis 0 = {xi. Then ip is an isomorphism. then x = 0. f2} ami 7* = {gi. Let V be a finite-dimensional vector space. Clearly. Using similar computations.

(c) Every vector space is isomorphic to its dual space. By the previous lemma..X2. .. V*. Let {fi. we conclude that x = 0.. We may combine Theorems 2. Therefore -0(cx + y) = ex + y = cip(x) + V(y)- 123 (b) •/ is one-to-one: Suppose that i})(x) is the zero functional on V* for { > some x G V. (f) If T is a linear transformation from V to W. that is. Proof. x ^ | . Then x(f) = 0 for every f G V*..24 and 2. (c) ijj is an isomorphism: This follows from (b) and the fact that dim(V) = dim(V**). (e.y G V and c£ F. EXERCISES 1. for infinite-dimensional vector spaces. (g) If V is isomorphic to W.g. Then every ordered basis for V* is the dual basis for some basis for V.6 Dual Spaces Proof. then V* is isomorphic to W*. the existence of a dual space)... f n } is the dual basis of {xi.. (a) Every linear transformation is a linear functional. f2.Sec.. x n } in V**. and » V** are isomorphic.26 to conclude that for this basis for V* there exists a dual basis {xi.. In fact.. (a) if) is linear: Let x. Label the following statements as true or false. we have V>(cx + y)(f) = f(cx + y) = ef(x) + % ) = cx(f) + y(f) = (cx + y)(f). Let V be a finite-dimensional vector space with dual space V*. only a finite-dimensional vector space is isomorphic to its double dual via the map x — x. Assume that all vector spaces are finite-dimensional. (d) Every vector space is the dual of some other vector space. then the domain of (vy is v**. For f G V*. 2... | Corollary.. Thus {fi. then T(0) = 0*.f n } be an ordered basis for V*. I! Although many of the ideas of this section. 5ij = x~i(fj) = f/(xi) for all i and j. (e) If T is an isomorphism from V onto V* and 0 is a finite ordered basis for V. x2.. . can be extended to the case where V is not finite-dimensional.f 2 .. no two of V. . (b) A linear functional defined on a field may be represented as a 1 x 1 matrix.

Define T: V — W by > T(p(x)) = (p(0) .f2} is a basis for V*. z ) = a . 7.0.p(0) +p'(0)). / 3 = { l . b. (0.y. as in Example 4.2/) = (2x.f(A) = Au 3. (c) Compute [T]p and ([T]^)4./? = {(1.x). find explicit formulas for vectors of the dual basis 0* for V*. x . define fi. .y. f2(x.0. Prove that {fi. f2. f 5. f:i(x. and. Let V = Pi(. Let V = Pi(R) and W = R2 with respective standard ordered bases 0 and 7. f2} is the dual basis.1).f(i4)=tr(i4) R 3 . and compare your results with (b).y) (Zx + 2y.y.y) = (a) Compute T*(f). where 0 is the standard ordered basis for R2 and 0* = {f 1.f(p(x))=J 0 1 p(t)<ft (f) V = M2x2(F).4y) M2x2(F).124 Chap.1).f 2 G V* by fi(p(x))= / p(t)dt Jo and f 2 (p(x))= / Jo p(t)dt. where p'(x) is the derivative of p(x).R). 2. Let V = R3. Define f G (R2)* by f(x. 6. determine which are linear functionals. (b) Compute [T^p*. f(p(x)) = 2p'(0) + p"(l). Prove that {fi. and define fi.2p(l).2. z) = x + y + z. 2 Linear Transformations and Matrices (h) The derivative of a function may be considered as a linear functional on the vector space of differentiable functions.f(x. (a) (b) (c) (d) V= V= V= V= P(R). (1. and then find a basis for V for which it is the dual basis. c.z) = y .Zz. 2 y2 + z2 (e) V = P(/l).f3 G V* as follows: fi(x. For each of the following vector spaces V and bases 0. and d such that T'(fi) = ofi + cf2 and T'(f 2 ) = 6f: + df2. y . for p(x) G V.z) =x-2y. f ( x . x 2 } 4.f2. T3} is a basis for V*. by finding scalars a. (a) V = R 3 . = 2x + y and T: R2 -> R2 by T(x. For the following functions f on a vector space V. where ' denotes differentiation R 2 . and find a basis for V for which it is the dual basis.1)} (b) V = P 2 ( Z ? ) .

Let V = P n (F). Show that every plane through the origin in R may be identified with 3 the null space of a vector in (R )*. .. 3 8. c i .26. fli.f 2 (x).. a n (not necessarily distinct). that is. . (b) Compute [T*]^.. .p n (x) such that Pi(cj) = 5ij for 0 < i < n. compute T'(f).i). define U G V* by U(p(x)) = p(c. Ja X ^P(ci)Pi(x) . . Prove that a function T: F —• F is linear if and only if there exist n fi.6 Dual Spaces 125 (a) For f G W* defined by f (o. rt m 9.--. (c) For any scalars an. Hint: If T is linear.T)(x) for x G F n . and deduce that the first coefficient is zero. without appealing to Theorem 2.gm} is the dual basis of the standard ordered basis for F m .. .. i=0 n (d) Deduce the Lagrange interpolation formula: P( ) = i=0 for any p(x) G V. Prove that {fo. . deduce that there exists a unique polynomial q(x) of degree at most n such that q(ci) — a... • • • .i for 0 < i < n. where {gi. In fact. . Hint: Apply any linear combination of this set that equals the zero transformation to p(x) = (x — ci)(x — c2) • • • (x — cn).f m (x)) for all x G F n . State an analogous result for R2.f 2 . cn be distinct scalars in F . b) = a .fm G (F )* such that T(x) = (fi(x). (a) For 0 < i < n. .. Ja i=0 where di= [ Pi(t)dt. and let c n . 10.. f i . a x ( ) = ^2<hPi(x).26 and (a) to show that there exist unique polynomials po(x). .pi(x).Sec.6. (b) Use the corollary to Theorem 2. fi — T'(gi) for 1 < i < m. .. (e) Prove that r-b n j p(t)dt = ^2p(ci)di. 2. (c) Compute [T]« and its transpose.25. define fj(x) = (g.f n } is a basis for V*.. These polynomials are the Lagrange polynomials defined in Section 1.g2. and compare your results with (b).

. (a) Prove that S° is a subspace of V*. 2 Linear Transformations and Matrices i(b — a) n . xk] of W to an ordered basis 0 = {xi. Prove that if W is a subspace of V.. Let 0* = {fi. prove that ip2T = T " ^ ) . (e) For subspaces Wi and W 2 . For n — 2. V denotes a finite-dimensional vector space over F . prove that there exists f G W° such that f (x) ^ 0. and define » T " = (T*)*. 14. Hint: Extend an ordered basis {xi. show that (Wj + W 2 )° = Wj n W§. In Exercises 13 through 17. For every subset S of V.126 Suppose now that Ci = a-\ Chap.f„} is a basis for W°. Prove that the diagram depicted in Figure 2. V W i>2 V / W Figure 2.X2. . ..ffc+2.. and let tp\ and ip2 be the isomorphisms between V and V** and W and W**.6 commutes (i.26. (b) If W is a subspace of V and x g" W. . 1 . 13.. define the annihilator 5° of 5 as 5° = {f G V*: f(x) = 0 for all x G S}.26. n. x n } of V. 11. this result yields Simpson's rule for evaluating the definite integral of a polynomial. For n = 1. where 'ip is defined as in Theorem 2.26. _i for % = 0 .e. respectively. Let V and W be finite-dimensional vector spaces over F .f2. as defined in Theorem 2.. Let V be a finite-dimensional vector space with the ordered basis 0.X2. . . . where ip is defined in Theorem 2. -. Let T: V — W be linear. Prove that {f fc+ i. prove that Wi = W2 if and only if W? = Wg. Prove that ip(0) = 0**. (c) Prove that (5°)° = span(^'(S'))..f n }. the preceding result yields the trapezoidal rule for evaluating the definite integral of a polynomial. then dim(W) + dim(W°) = dim(V). • • •.... (d) For subspaces Wi and W2.6 12.

Suppose that the weight is now motionless and impose an xy-coordinate system with the weight at the origin and the spring lying on the positive y-axis (see Figure 2.7. every vector space has a basis. Let V and W be nonzero vector spaces over the''same field. let F(t) denote the force acting on the weight and y(t) denote the position of the weight along the y-axis. For example. Suppose that at a certain time.7 Homogeneous Linear Differential Equations with Constant Coefficients 127 15. Let T be a linear operator on V. Prove that W is T-invariant (as defined in the exercises of Section 2. 17. use Exercise 34 of Section 2. 18. F) be the mapping defined by $(f) = fs. Prove that N(T<) = (R(T))°. and let T: V — W be a linear transformation. and let 5 be a basis for V. Use Exercises 14 and 15 to deduce that rank(L^t) = rank(L^) for any A&MmXn(F). W / V ) . the restriction of f to S. 60) in Section 1. and let W be a proper subspace of V (i.1 as well as results about extending linearly independent sets to bases in Section 1. (By the corollary to Theorem 1.7* HOMOGENEOUS LINEAR DIFFERENTIAL EQUATIONS WITH CONSTANT COEFFICIENTS As an introduction to this section.1. 2.Sec. Suppose that W is a finite-dimensional vector space and that T: V — W » is linear. A weight of mass m is attached to a vertically suspended spring that is allowed to stretch until the forces acting on the weight are in equilibrium. say t = 0. consider the following physical problem. (b) Prove that T* is onto if and only if T is one-to-one. I Hint: Apply Exercise 34 of Section 2. y(0) = — s. the weight is lowered a distance s along the y-axis and released. Hint: Parts of the proof require the result of Exercise 19 for the infinitedimensional case.7).) Let $: V* —• J-(S. Let V be a nonzero vector space over a field F . At any time t > 0..1) if and only if W° is ^-invariant. » (a) Prove that T is onto if and only if T* is one-to-one. 2. The spring then begins to oscillate. Prove that there exists a nonzero linear functional f G V* such that f (x) = 0 for all x G W. 16. 19. Prove that <> is an isomorphism. The second derivative of y with respect . We describe the motion of the spring. and let W be a subspace of V. 20.e. Hint: For the infinite-dimensional case. Let V be a nonzero vector space.7.13 (p.

t. is the acceleration of the weight at time t\ hence. Thus (3) is an example of a linear differential equation in which the coefficients are constants and the function / is identically zero. . and that this force satisfies Hooke's law: The force acting on the weight is proportional to its displacement from the equilibrium position. (4) is called homogeneous. A differential equation in an unknown function y = y(t) is an equation involving y. . (4) where an. In this section. a i . and derivatives of y. we apply the linear algebra we have studied to solve homogeneous linear differential equations with constant coefficients.128 Chap. When / is identically zero. If an ^ 0. y"(t). (1) It is reasonable to assume that the force acting on the weight is due totally to the tension of the spring. (2) Combining (1) and (2). . 2 Linear Transformations and Matrices Figure 2. o n and / are functions of t and y^k' denotes the kth derivative of y. The functions ai are called the coefficients of the differential equation (4).7 to time. (3) The expression (3) is an example of a differential equation. If the differential equation is of the form any (n) an-iy (n-1) + •• a\y (i) a>oy = f. then the equation is said to be linear. F(t) = my"(t). but acts in the opposite direction. by Newton's second law of motion. . . If k > 0 is the proportionality constant. then Hooke's law states that F(t) = -ky(t). we obtain my" = —ky or y + — y = 0.

we divide both sides by an to obtain a new. such that x(t) = xi(t) + ix2(t) for t G R. Thus we are concerned with the vector space J-~(R. Thus y(t) — t is not a solution to (3). we define the derivative x' of x by 3s —'dji ~\~ IXo' We illustrate some computations with complex-valued functions in the following example.7 Homogeneous Linear Differential Equations with Constant Coefficients 129 we say that differential equation (4) is of order n. Because of this observation. we must define what it means to differentiate such functions. In order to consider complex-valued functions of a real variable as solutions to differential equations. there exist unique real-valued functions xi and X2 of i. n — 1. 2. that substituting y(t) = t into (3) yields k y"(t) + -y(t) m = k -t. If x is differentiable. we say that x is differentiable ifx\ and x2 are differentiable. equation . . 1 . C) with real part x\ and imaginary part x2. A solution to (4) is a function that when substituted for y reduces (4) to an identity. . . In our study of differential equations. C) (as defined in Example 3 of Section 1. it is useful to regard solutions as complex-valued functions of a real variable even though the solutions that are meaningful to us in a physical sense are real-valued. . however. Notice. we always assume that the coefficient an in (4) is 1. = a . C) of a real variable t. .Sec. Definitions. but equivalent.2). In this case. We call Xi the real part and X2 the imaginary part of x. where i is the imaginary number such that i2 = —1. Given a function x G J-(R.'/ (n) {1) + b0y = 0. / a n for i = 0 . m • which is not identically zero. The convenience of this viewpoint will become clear later. Example 1 The function y(t) = sin y/k/m t is a solution to (3) since y (t) + —y(t) = = sin \ — t + — sin m m m V m m for all f. Given a complex-valued function x G T(R. K-iy (n-1) + • • • + hy where 6.
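A quick numerical check of Example 1 (an illustration added here, not part of the original text, using only the Python standard library): a central difference approximates y'', and the residual y'' + (k/m)y is essentially zero for y(t) = sin(sqrt(k/m) t) but not for y(t) = t. The constants k and m are arbitrary positive choices.

import math

def second_derivative(f, t, h=1e-4):
    # Central finite-difference approximation of f''(t).
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

k, m = 3.0, 2.0                      # arbitrary positive constants
omega = math.sqrt(k / m)

def y_good(t):                       # the solution claimed in Example 1
    return math.sin(omega * t)

def y_bad(t):                        # y(t) = t, which is not a solution
    return t

for t in (0.5, 1.0, 2.0):
    good = second_derivative(y_good, t) + (k / m) * y_good(t)
    bad = second_derivative(y_bad, t) + (k / m) * y_bad(t)
    print(t, good, bad)              # 'good' is near zero; 'bad' equals (k/m)t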

27. then x^k' exists for every positive integer k. in fact. Since j / 4 ' is a constant multiple of a function that we have shown has at least two derivatives. if x is a solution to such an equation. 3. we can show that any solution has derivatives of all orders. however. C) and hence a vector space over C. It is a simple exercise to show that C°° is a subspace of F(R. it also has at least two derivatives. Its proof. Continuing in this manner. • Definition. involves a simple induction argument. to qualify as a solution. C) that have derivatives of all orders. Since x2(t) = (cos 2t + i sin 2t)2 = (cos2 2t . hence y(6> exists.27. Then x'(t) = -2 sin 2t + 2i cos 2t. We use C°° to denote the set of all functions in T(R. • The next theorem indicates that we may limit our investigations to a vector space considerably smaller than T(R.130 Example 2 Chap. then y(2) = -4yThus since y^ is a constant multiple of a function y that has two derivatives. consider the equation / y(2) + 4y = 0. Any solution to a homogeneous linear differential equation with constant coefficients has derivatives of all orders.(4) = -4yM. a function y must have two derivatives. In view of Theorem 2. yW must have two derivatives. which is illustrated in Example 3. Theorem 2.sin2 2t) + i(2 sin 2t cos 2i) = cos4i + isin4t. If y is a solution. which we omit. C). Hence y^ exists. the real part of x2(t) is cos4i. it is this vector space that . Example 3 To illustrate Theorem 2. We next find the real and imaginary parts of x 2 . Clearly.27. that is. 2 Linear Transformations and Matrices Suppose that x(t) — cos 2t + i sin 2t. and the imaginary part is sin4£.

More generally. The order of the differential operator p(D) is the degree of the polynomial p(t).) Definitions. Given the differential equation above. It is easy to show that D is a linear operator. For example. this equation implies the following theorem. consider any polynomial over C of the form p(t) = antn If we define p(D) = anDn + On-iD*.l + a\t + OQ. 2. y{n)+an-iy(n-v + --. For any polynomial p(t) over C of positive degree. mial aiD + ool)(y) = 0. p(D) is called a differential operator. Differential operators are useful since they provide us with a means of reformulating a differential equation in the context of linear algebra. Any homogeneous linear differential equation with constant coefficients. (See Appendix E.Sec. +an-it n . (3) has the auxiliary polynomial Any homogeneous linear differential equation with constant coefficients can be rewritten as p(D)(y) = 0.1 + • • • + o i D + o 0 l. then p(D) is a linear operator on C°°.1 + Definition. For x G C°°. the derivative x' of x also lies in C°°.+ aiy{i) +a0y=(). We can use the derivative operation to define a mapping D: C°° — C°° by + D(x) = x' forxGC00.Dn + o w _ i D n . where p(t) is the auxiliary polynomial associated with the equation. can be rewritten using differential operators as .7 Homogeneous Linear Differential Equations with Constant Coefficients 131 is of interest to us. . the complex polynop(t) = tn + o n _ i i n _ 1 + • • • + oii + a 0 is called the auxiliary polynomial associated with the equation. Clearly.
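To make the operator notation p(D) concrete, here is a small symbolic sketch (added for illustration, not from the text) that applies p(D) to a function by summing the corresponding derivatives; it uses the SymPy library and a helper name of my own choosing.

import sympy as sp

t = sp.symbols('t')

def apply_p_of_D(coeffs, f):
    # Apply p(D) = a_n D^n + ... + a_1 D + a_0 I to the expression f,
    # where coeffs = [a_0, a_1, ..., a_n].
    return sp.simplify(sum(a * sp.diff(f, t, k) for k, a in enumerate(coeffs)))

# p(t) = t^2 + 1, so p(D) = D^2 + I; applied to sin t it gives -sin t + sin t = 0.
print(apply_p_of_D([1, 0, 1], sp.sin(t)))      # 0
# p(t) = t^3 + 2t applied to t^3 gives 6 + 2*(3t^2) = 6t^2 + 6.
print(apply_p_of_D([0, 2, 0, 1], t**3))        # 6*t**2 + 6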

Definition. for c = 2 + i(n/Z). = e (cos-+Jsin-)=e ^- Clearly. For example. For a real number s.28. certain properties of exponentiation. I Corollary. for instance. then we obtain the usual result: ec = e a . 2 Linear Transformations and Matrices Theorem 2. In view of the preceding corollary. We know. we can show by the use of trigonometric identities that ec+d = eced for any complex numbers c and d. We now extend the definition of powers of e to include complex numbers in such a way that these properties are preserved.. we are familiar with the real number e9. we call the set of solutions to a homogeneous linear differential equation with constant coefficients the solution space of the equation. The special case Jb e = cos b + i sin 6 is called Euler's formula. A practical way of describing such a space is in terms of a basis. The set of all solutions to a homogeneous linear differential equation with constant coefficients coincides with the null space ofp(D). and e'c = — . Let 6 = a + ib be a complex number with real part a and imaginary part b.e. if c is real (6 = 0). es+t = eset and e — -r for any real numbers s and t. where p(t) is the auxiliary polynomial associated with the equation. lne = 1). namely. Define ec = ea(cosb + isinb). Using the approach of Example 2. The set of all solutions to a homogeneous linear differential equation with constant coefficients is a subspace of C°°. We now examine a certain class of functions that is of use in finding bases for these solution spaces.132 Chap. where e is the unique number whose natural logarithm is 1 (i. Proof Exercise.
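The definition of e^c agrees with the built-in complex exponential, and the familiar exponent laws survive, as this small check (added for illustration, standard library only) confirms; c = 2 + i(pi/3) appears to be the sample value computed above.

import cmath, math

def complex_exp(c):
    # e^c = e^a (cos b + i sin b) for c = a + bi, following the definition above.
    return math.exp(c.real) * complex(math.cos(c.imag), math.sin(c.imag))

c = complex(2, math.pi / 3)
d = complex(-1, 0.75)

assert abs(complex_exp(c) - cmath.exp(c)) < 1e-12
assert abs(complex_exp(c + d) - complex_exp(c) * complex_exp(d)) < 1e-12   # e^{c+d} = e^c e^d
b = 0.6
assert abs(cmath.exp(1j * b) - complex(math.cos(b), math.sin(b))) < 1e-15  # Euler's formula
print(complex_exp(c))        # e^2 * (1/2 + i sqrt(3)/2)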

A justification of this involves a lengthy. The proof. A function f:R—*C defined by f(t) complex number c is called an exponential function. which relies on the real case. z is a constant function. f'(t) = cect.) Thus there exists a complex number k such that z(t) = eaotx(t) = k So x(t) = ke-aot. | t as a solution. is consistent with the real version. Exercise.tx(t). (Again. involves looking separately at the real and imaginary parts of z. for all t G R. The solution space for (5) is of dimension 1 and has {e~ } as a basis. computation. (Notice that the familiar product rule for differentiation holds for complexvalued functions of a real variable. .a0eaotx(t) = 0. aot (5) Theorem 2. as described in the next theorem. Clearly (5) has e _ a . this fact.29. 1 We can use exponential functions to describe all solutions to a homogeneous linear differential equation of order 1. = (eaot)'x(t) + e°°V(t) = o0e°°*x(*) . Recall that the order of such an equation is the degree of its auxiliary polynomial. The proof involves a straightforward computation.7 Homogeneous Linear Differential Equations with Constant Coefficients 133 Definition.) Since z' is identically zero. well known for real-valued functions. Proof. For any exponential function f(t) = ect. Thus an equation of order 1 is of the form y' + a0y = 0. Theorem 2.30. = ect for a fixed The derivative of an exponential function. 2. Suppose that x(t) is any solution to (5).Sec. Then x'(t) = -a0x(t) Define z(t) = Differentiating z yields z'(t) for all t G R. is also true for complex-valued functions. y eai. which we leave as an exercise. We conclude that any solution to (5) is a linear combination of e~aot. although direct. Proof.
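Theorems 2.29 and 2.30 can be checked numerically (a sketch added for illustration): the symmetric difference quotient of e^{ct} is close to c e^{ct} even for complex c, and y(t) = k e^{-a_0 t} satisfies y' + a_0 y = 0.

import cmath

def numeric_derivative(f, t, h=1e-6):
    # Symmetric difference quotient; works for complex-valued functions of a real variable.
    return (f(t + h) - f(t - h)) / (2 * h)

c = complex(1.5, -2.0)
t0 = 0.7

def f(t):
    return cmath.exp(c * t)

assert abs(numeric_derivative(f, t0) - c * f(t0)) < 1e-6       # Theorem 2.29: f' = c e^{ct}

a0, k = complex(0.5, 1.0), 3.0

def y(t):
    return k * cmath.exp(-a0 * t)

assert abs(numeric_derivative(y, t0) + a0 * y(t0)) < 1e-6      # y' + a_0 y = 0
print("both checks pass")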

Another way of stating Theorem 2.30 is as follows.

Corollary. For any complex number c, the null space of the differential operator D - cI has {e^{ct}} as a basis.

We next concern ourselves with differential equations of order greater than one. Given an nth-order homogeneous linear differential equation with constant coefficients,

    y^(n) + a_{n-1}y^(n-1) + ... + a_1 y^(1) + a_0 y = 0,

its auxiliary polynomial

    p(t) = t^n + a_{n-1}t^{n-1} + ... + a_1 t + a_0

factors into a product of polynomials of degree 1, that is,

    p(t) = (t - c_1)(t - c_2) ... (t - c_n),

where c_1, c_2, ..., c_n are (not necessarily distinct) complex numbers. (This follows from the fundamental theorem of algebra in Appendix D.) Thus

    p(D) = (D - c_1 I)(D - c_2 I) ... (D - c_n I).

The operators D - c_i I commute, and so, by Exercise 9, we have that

    N(D - c_i I) is contained in N(p(D))   for all i.

Since N(p(D)) coincides with the solution space of the given differential equation, we can deduce the following result from the preceding corollary.

Theorem 2.31. Let p(t) be the auxiliary polynomial for a homogeneous linear differential equation with constant coefficients. For any complex number c, if c is a zero of p(t), then e^{ct} is a solution to the differential equation.

Example 4
Given the differential equation

    y'' - 3y' + 2y = 0,

its auxiliary polynomial is

    p(t) = t^2 - 3t + 2 = (t - 1)(t - 2).

Hence e^t and e^{2t} are solutions to the differential equation because c = 1 and c = 2 are zeros of p(t). Since the solution space of the differential equation is a subspace of C^infinity, span({e^t, e^{2t}}) lies in the solution space. It is a simple matter to show that {e^t, e^{2t}} is linearly independent. Thus if we can show that the solution space is two-dimensional, we can conclude that {e^t, e^{2t}} is a basis for the solution space. This result is a consequence of the next theorem.
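Example 4 can be confirmed symbolically (an added SymPy illustration, not part of the text): e^t, e^{2t}, and every linear combination of them are annihilated by D^2 - 3D + 2I, while e^{3t} is not.

import sympy as sp

t, b1, b2 = sp.symbols('t b1 b2')

def residual(y):
    # Left-hand side of y'' - 3y' + 2y for a candidate solution y(t).
    return sp.simplify(sp.diff(y, t, 2) - 3 * sp.diff(y, t) + 2 * y)

print(residual(sp.exp(t)))                               # 0
print(residual(sp.exp(2 * t)))                           # 0
print(residual(b1 * sp.exp(t) + b2 * sp.exp(2 * t)))     # 0 for all b1, b2
print(residual(sp.exp(3 * t)))                           # 2*exp(3*t), so e^{3t} is not a solution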

Clearly.. we can choose for each i (1 < i < p) a vector Wi G V such that U(wi) — m. The differential operator D — c\: C°° — C°° is onto for any > complex number c. and dim(N(TU)) = dim(N(T)) + dim(N(U)). Let V be a vector space. Then the null space of TU is finite-dimensional. Furthermore. To complete the proof of the lemma. Let w(t) = v(t)e~ct for t G R. Then W G C°°.vi. . and the real and imaginary parts of W are W\ and W2.c\)u(t) = u'(t) .v2..vq} contains p + q distinct vectors.wp... . and {ui.. Note that the Wi's are distinct. —* C be defined by W(t) = Wt (t) + iW2(t) for t G R. . Hence the set 0= {wi. Clearly u G C°°. Then w\ and w2 are continuous because they are differentiable..u2. respectively. For any differential operator p(D) of order n.w2. w G C°° because both v and e~ct lie in C°°. the null space ofp(D) is an n-dimensional subspace of C°°. . q = dim(N(U)). Let W\ and w2 be the real and imaginary parts of w.:t = v(t)erctert = v(t). II + W(t)cect cW(f)ect L e m m a 2. let u: R — C be defined by * u(t) = W(t)ect for t G R. Proof. Let v G C°°.7 Homogeneous Linear Differential Equations with Constant Coefficients 135 Theorem 2. and since (D .Sec.. Furthermore. Proof. it suffices to show that 0 is a basis for N(TU). W = w..cu(t) = W'(t)cct = w(t)e. for any i and j.up} and {v\.. Finally. W\ and W2. we have (D — c\)u = v. respectively.Vq] be bases for N(T) and N(U). . 2. and suppose that T and U are linear operators on V such that U is onto and the null spaces of T and U are finite-dimensional. say. Let W: R. Hence they have antiderivatives. We wish to find a u G C°° such that (D — c\)u = v. Wi 7^ Vj. respectively. Lemma 1. Since U is onto. Let p — dim(N(T)).32.32. for otherwise Ui = i)(wi) = U(VJ) = 0—a contradiction.. we establish two lemmas... As a preliminary to the proof of Theorem 2.v2.

a2..(aiwi + a2w2 H or v = aiu/i + 02^2 H f. T U K ) = T(ui) = 0 and TU(^) = T(0) = 0. and consider a differential . any scalars such that aiwi + a 2 u'2 H h apwp + &1V1 + 62^2 H r.vq} implies that the frj's are all zero. .. let ai. it follows that 0 C N(TU). .&9vg = 0.02.aPwp)) = 0. Since {m.6 g v 9 . Thus (6) &if 1 + b2v2 " 1 r bqvq = 0. we obtain oiUi + a2u2 + • • • + apup — 0.apwp + 61 vi + 62^2 H r. To prove that 0 is linearly independent. the a. .v2. We conclude that 0 is a basis for N(TU).. Again. It follows that there exist scalars 61. V apup h aPU(wp) f.b\. 2 Linear Transformations and Matrices We first show that 0 generates N(TU). the linear independence of {v\.o p such that \J(v) = aiu\ + 02^2 + = \J(ttiWi + 02^2 + Hence \J(v — (aiwi + 02^2 H f. Now suppose that v G N(TU).bq be r.ap..up] is linearly independent. .b2. suppose that Theorem 2. b2.npwp).. (6) Applying U to both sides of (6). bq such that v .. The proof is by mathematical induction on the order of the differential operator p(D).. The first-order case coincides with Theorem 2...30...136 Chap.u2.TU(v) = T(U(v))...32 holds for any differential operator of order less than n.. I Proof of Theorem 2. Hence N(TU) is finitedimensional. Then 0 . Since for any Wi and Vj in 0.a p w p ) = 61^1 + &W H 2 2 h &?u9 Therefore /? spans N(TU)..32.. reduces to . v — (a\Wi + 02^2 + • • • + apwp) lies in N(U). and dim(N(TU)) = p + q = dim(N(T)) + dim(N(U)). For some integer n > 1.'s are all zero. = aiU(w\) + a2U(w2) H Consequently... Thus U(v) € N(T). So there exist scalars 01..

For any nth-order homogeneous linear differential equation with constant coefficients.. Theorem 2.c2.eC2t. Proof Exercise.cl)) = (n . any such set must be a basis for the solution space. The corollary to Theorem 2.c2. Thus. The next theorem enables us to find a basis quickly for many such equations.. where q(t) is a polynomial of degree n — 1 and c is a complex number. Given n distinct complex numbers c. and by the corollary to Theorem 2.d ) ) = 1. (See Exercise 10... then {eCl*. Proof.. the set of exponential functions {eClt.. D — cl is onto. we conclude that dim(N(p(D))) = dim(N(<-/(D))) + dim(N(D .30.Cn. Hints for its proof arc provided in the exercises.1) + 1 = n. (See Exercise 10. I Corollary. . dim(N(r/(D)) = n—1.\. by the induction hypothesis. eCn*} is a basis for the solution space of the differential equation. if the auxiliary polynomial has n distinct zeros Ci.. Exercise. Also.ec. The polynomial p(t) can be factored into a product of two polynomials as follows: p(t) = q(t)(t-c).32 reduces the problem of finding all solutions to an r/.Sec.) 1 Corollary. 2.33. II . by Lemma 1..tt} is linearly independent. . The solution space of any nth-order homogeneous linear differential equation with constant coefficients is an n-dimensional subspace of C°°.th-order homogeneous linear differential equation with constant coefficients to finding a set of n linearly independent solutions to the equation. By the results of Chapter 1.. ec'2t.. Now.7 Homogeneous Linear Differential Equations with Constant Coefficients 137 operator p(D) of order n... Thus the given differential operator may be rewritten as p(D) = g(D)(D-cl)..c n .) Example 5 We find all solutions to the differential equation y" + 5y' + 4y=0. d i m ( N ( D . by Lemma 2..
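When the auxiliary polynomial has distinct zeros, the corollary above produces a basis immediately. The sketch below (an added illustration; the third-order equation is my own example, not one taken from the text) finds the zeros numerically with NumPy and verifies one member of the resulting solution space by finite differences.

import numpy as np

# y''' - 6y'' + 11y' - 6y = 0 has auxiliary polynomial t^3 - 6t^2 + 11t - 6.
# numpy.roots expects the coefficients listed from the highest power down.
zeros = np.roots([1, -6, 11, -6])
print(np.sort(zeros))                 # [1. 2. 3.] -- three distinct zeros

# By the corollary, {e^t, e^{2t}, e^{3t}} is a basis for the solution space,
# so every solution has the form b1 e^t + b2 e^{2t} + b3 e^{3t}.
def y(t, b=(1.0, -2.0, 0.5)):
    return b[0] * np.exp(t) + b[1] * np.exp(2 * t) + b[2] * np.exp(3 * t)

t, h = 0.4, 1e-3
d1 = (y(t + h) - y(t - h)) / (2 * h)
d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
d3 = (y(t + 2*h) - 2*y(t + h) + 2*y(t - h) - y(t - 2*h)) / (2 * h**3)
print(d3 - 6*d2 + 11*d1 - 6*y(t))     # near zero (limited only by finite-difference error)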

138 Chap. Lemma.31. By Theorem 2. • + b2e-4t it follows from Exercise 7 that {cos St.e~4t} is a basis for the solution space. it has two distinct zeros. This basis has an advantage over the original one because it consists of the familiar sine and cosine functions and makes no reference to the imaginary number i. Thus {e~L. For a given complex number c and positive integer n. Using this latter basis.e-3**} is a basis for the solution space. Example 6 We find all solutions to the differential equation y" + 9y = 0. So any solution to the given equation is of the form y(t)=ble-t for unique scalars b\ and b2. The reader can verify that te~l is a such a solution.tect. Then the set 0={ect..tn-1ect} is a basis for the solution space of the equation.. e~l is a solution to this equation. Thus {e3*f. suppose that (t — c)n is the auxiliary polynomial of a homogeneous linear differential equation with constant coefficients.. The following lemma extends this result. Since cos 3^ = . . By the corollary to Theorem 2. sin 3t} is also a basis for this solution space. we need a solution that is linearly independent of e _ t . we see that any solution to the given equation is of the form y(t) = b\ cos St + b2 sin 3t for unique scalars &iand b2.32. its solution space is two-dimensional. In order to obtain a basis for the solution space. for which the auxiliary polynomial is (t + l) 2 . 2 Linear Transformations and Matrices Since the auxiliary polynomial factors as (t + 4)(t + 1).( e3/7 + e~ 3 i t ) and sin3£ = —(e :nt 2i -3it ). • Next consider the differential equation y" + 2y' + y = 0. The auxiliary polynomial t2 + 9 factors as (t — 3i)(t + 3?') and hence has distinct zeros Ci = 3i and c2 — —3i. —1 and —4..

n2..4t6 + 6t2 -4t+l = (tl) 4 ... (D-c\)n(tkect) = 0. 2. ktk~1ect.... tef.7 Homogeneous Linear Differential Equations with Constant Coefficients 139 Proof.. Consider any linear combination of vectors in 0 such that . So any solution y to the given differential equation is of the form y(t) = bie1 + bite* + M 2 e* + ht3e* for unique scalars b\.. we obtain &n + M 1 bn^tn~n-l = 0... K ct k-\ ct Lk„ct D . .ck are distinct complex numbers. i3e*} is a basis for the solution space..34.... &i.bi.tni-1ec.t. we can immediately conclude by the preceding lemma that {e*.. It follows that 0 is a subset of the solution space. b2. We next show that 0 is linearly independent.ctKe e = Hence for k < n. So 0 is linearly independent and hence is a basis for the solution space. First.. we need only show that 0 is linearly independent and lies in the solution space. Since the solution space is n-dimensional. .ie Cfct .\. T h e o r e m 2. . Since the auxiliary polynomial is t4 . Given a homogeneous linear differential equation with constant coefficients and auxiliary polynomial (t-Ci)ni(t-c2)n*-'-(t-ck)n\ where n\.teCl\ .tn" eCkt}.Dividing by ect in (7). bj. (8) Thus the left side of (8) must be the zero polynomial function...bn—i are all zero. f] Example 7 We find all solutions to the differential equation y(4) _ 4y (3) + §yJy(2) ...e Cfct .Sec.. 6 n -i.c2.^ ( l ) K +y=0. fte1:. . observe that for any positive integer k.Tl-l„ct b0ect + hie" + • • • + bn-itn-*ea =0 (7) for some scalars bo.. • The most general situation is stated in the following theorem. the following set is a basis for the solution space of the equation: {eClt. We conclude that the coefficients bo.nk arc positive integers and c.c\)(tKect) = ktK~le„ct + ctLk„ct.. and 64. .

l)2(t . (a) The set of solutions to an nth-order homogeneous linear differential equation with constant coefficients is an n-dimensional subspace of C°°. Example 8 The differential equation y(3) Chap. if ci.c C2 *.140 Proof. (d) Any solution to a homogeneous linear differential equation with constant coefficients is of the form aect or atkect. By Theorem 2.--. there exists a homogeneous linear differential equation with constant coefficients whose auxiliary polynomial is p(t). 2 Linear Transformations and Matrices | _ 4y(2) + 5^(1) -2y= 0 has the auxiliary polynomial f 3 .b2..Cfe are the distinct zeros of p(t). (e) Any linear combination of solutions to a given homogeneous linear differential equation with constant coefficients is also a solution to the given equation.eCkt} is a basis for the solution space of the given differential equation. .2). (c) The auxiliary polynomial of a homogeneous linear differential equation with constant coefficients is a solution to the differential equation. {e*.C2.34. where a and c are complex numbers and k is a positive integer.. Exercise.e2<} is a basis for the solution space of the differential equation. Thus any solution y has the form y(t) = 6ie* + b2tet + b3e2t for unique scalars b\. Label the following statements as true or false. and 63. then {e Cl '. (g) Given any polynomial p(t) G P(C). (b) The solution space of a homogeneous linear differential equation with constant coefficients is the null space of a differential operator. • EXERCISES 1. ..4£2 + 5t ..2 = (* . (f) For any homogeneous linear differential equation with constant coefficients having auxiliary polynomial p(t).£e*.

determine whether the statement is true or false. (a) Any finite-dimensional subspace of C°° is the solution space of a homogeneous linear differential equation with constant coefficients. y} is a basis for a vector space over C. 6.b G R. (a) (b) (c) (d) (e) y" + 2y' + y = 0 */" = . C). (e) xyeU(p(D)q(D)).7 Homogeneous Linear Differential Equations with Constant Coefficients 141 2.^-y) 8. then (d) x + yeU(P(D)q(D)). Show that C°° is a subspace of T(R. if a. where a. then so is \{=o + v). (a) Show that D: C°° — C°° is a linear operator. 2.I ) (c) N(D3 + 6D 2 +8D) 5.Sec. Show that {eat cos bt. (b) There exists a homogeneous linear differential equation with constant coefficients whose solution space has the basis {t. eat sin bt} is a basis for the solution space.D . For each of the following parts. 3.I ) (b) N ( D 3 . Given two polynomials p(t) and q(t) in P(C). (c) For any homogeneous linear differential equation with constant coefficients. is a solution to the equation.3 D 2 + 3 D . Consider a second-order homogeneous linear differential equation with constant coefficients in which the auxiliary polynomial has distinct conjugate complex roots a + ib and a — ib. whichever is appropriate.</ ?/4) .(/2> + y = 0 y" + 2y. if x G N(p(D)) and y G N((/(D)). Justify your claim with either a proof or a counterexample. so is its derivative x'. .2.y{2) + 3y{1) +5y = 0 / 4. + y=0 y(3) . (a) N ( D 2 . Find a basis for each of the following subspaces of C°°. Prove that if {x. Find a basis for the solution space of each of the following differential equations. * (b) Show that any differential operator is a linear operator on C01 7. t2}.

10. U2.. Prove Theorem 2. Then prove that the two spaces have the same finite dimension. for any i (1 < i < n). apply mathematical induction on n Verify the theorem for n = 1. as follows. apply the operator (D — Cfcl)"fc to any linear combination of the alleged basis that equals 0. Let V be the solution space of an nth-order homogeneous linear differential equation with constant coefficients having auxiliary polynomial p(t). The case k = 1 is the lemma to Theorem 2. • • •. Hint: Suppose that b\eClt + b2eC2t -\ h bneCnt = 0 (where the c^'s are distinct). Assuming that the theorem holds for k — 1 distinct Cj's. Then verify that this set is linearly independent by mathematical induction on k as follows. To show the b^s are zero. Hint: First prove > o(D)(V) C N(/i(D)). Prove that if p(t) = g(t)h(t).34. operators such that U7Uj = UjUi for all i.j). 13. 2 Linear Transformations and Matrices 9. the linear operator p(D): C°° —• C°° is onto. is true for of the given functions. apply the operator D — c n l to both sides equation to establish the theorem for n distinct exponential 11. where g(t) and h(t) are polynomials of positive degree. Hint: First verify that the alleged basis lies in the solution space.e.142 Chap. A differential equation y ( n ) + an-iy{n-l) + •••+ alV{l) + a0y = x is called a nonhomogeneous linear differential equation with constant coefficients if the a% 's are constant and x is a function that is not identically zero. then N(MD)) = R(o(Dv)) = p(D)(V). Suppose that {Ui.33 and its corollary.32 to show that for any polynomial pit). Hint: Use Lemma 1 to Theorem 2.34. Prove that. . N(U i )CN(UiU 2 ---U n ). U n } is a collection of pairwise commutative linear operators on a vector space V (i. where Dv: V — V is defined by Dv(x) = x' for x G V. (a) Prove that for any x G C°° there exists y G C°° such that y is a solution to the differential equation. Prove Theorem 2. 12. Assuming that the theorem n — 1 functions.

Prove that if z is any solution to the associated nonhomogeneous linear differential equation. Given any nth-order homogeneous linear differential equation with constant coefficients.. there exists exactly one solution. Hint: Use Exercise 14. interpreted so that 6 is positive if the pendulum is to the right and negative if the pendulum is to the . then x = 0 (the zero function). Factor p(t) = q(t)(t — c). x. Show that z(to) = 0 a n d z' — cz — 0 to conclude that z = 0. 2 . Ci. First prove the conclusion for the case n = 1.c n _i (not necessarily distinct)..7 Homogeneous Linear Differential Equations with Constant Coefficients 143 (b) Let V be the solution space for the homogeneous linear equation y ^ + on-i^"-1) •+Oii(/ (1) +a0y= 0. . and any complex numbers Co. any to G R. (b) Prove the following: For any nth-order homogeneous linear differential equation with constant coefficients. for any solution x and any to G R. 14.T(TC_1)(£0) = 0. It is well known that the motion of a pendulum is approximated by the differential equation 0" + ^8= 0. Now apply the induction hypothesis.Sec. n — 1. . . Let V be the solution space of an nth-order homogeneous linear differential equation with constant coefficients. and define a mapping $ : V — C by + ™ * *(*) = x(to) x'(to) \ for each x in V. 2. then the set of all solutions to the nonhomogeneous linear differential equation is [z + y: ye V}. 15.. \xSn-lHto)J (a) Prove that $ is linear and its null space is the zero subspace of V. 16. Next suppose that it is true for equations of order n — 1. if x(t0) = x'(to) = ••• = . and consider an nth-order differential equation with auxiliary polynomial p(t). prove that. to the given differential equation such that x(to) — co and x^(to) = ck for k = 1.8). . where 6(t) is the angle in radians that the pendulum makes with a vertical line at time t (see Figure 2. Pendular Motion. Fix ta G R. Deduce that «£ is an isomorphism. Hint: Use mathematical induction on n as follows. and let z = q((D))x.

18.144 Chap. Periodic Motion of a Spring with Damping.) (c) Prove that it takes 2-Ky/lJg units of time for the pendulum to make one circuit back and forth. The variable t and constants / and g must be in compatible units (e. In reality. (This time is called the period of the pendulum. and g in meters per second per second). 2 Linear Transformations and Matrices Figure 2. Here I is the length of the pendulum and g is the magnitude of acceleration due to gravity. there is a frictional force acting on the motion that is proportional to the speed of motion. . t in seconds. (a) Express an arbitrary solution to this equation as a linear combination of two real-valued solutions. however.g. Periodic Motion of a Spring without Damping. is given by my + ry' + ky — 0. called the damping force. but that acts in the opposite direction. The modification of (3) to account for the frictional force. (b) Find the unique solution to the equation that satisfies the conditions 0(0) = 0O > 0 and 0'(O) = 0. which describes the periodic motion of a spring. Find the general solution to (3). where r > 0 is the proportionality constant. (a) Find the general solution to this equation. (The significance of these conditions is that at time t = 0 the pendulum is released from a position displaced from the vertical by 0n. The ideal periodic motion described by solutions to (3) is due to the ignoring of frictional forces. ignoring frictional forces.) 17.8 left of the vertical line as viewed by the reader. / in meters..

Hint: Use mathematical induction on the number of derivatives possessed by a solution.27. C) and x' — 0. (d) Prove Theorem 2. prove that ec+d = cced and (c) Prove Theorem 2. (f) Prove that if x G J-(R. prove that lim y(t) = 0. then x is a constant function.28. the initial velocity. arc included for the sake of completeness.Chap. which do not involve linear algebra.d G C. 2 Index of Definitions 145 (b) Find the unique solution in (a) that satisfies the initial conditions y(0) — 0 and y'(0) = vo. the product xy is differentiable and (xy)' = x'y + xy'. we have regarded solutions as complex-valued functions even though functions that are useful in describing physical motion are real-valued. In our study of differential equations. Justify this approach. C). (b) For any c. (c) For y(t) as in (b). Hint: Apply the rules of differentiation to the real and imaginary parts of xy.29. show that the amplitude of the oscillation decreases to zero. that is. (a) Prove Theorem 2. t—>oo 19. INDEX OF DEFINITIONS FOR CHAPTER 2 Auxiliary polynomial 131 Change of coordinate matrix 112 Clique 94 Coefficients of a differential equation 128 Coordinate function 119 Coordinate vector relative to a basis 80 Differential equation 128 Differential operator 131 Dimension theorem 69 Dominance relation 95 Double dual 120 Dual basis 120 Dual space 119 Euler's formula 132 Exponential function 133 Fourier coefficient 119 Homogeneous linear differential equation 128 Identity matrix 89 Identity transformation 67 . (e) Prove the product rule for differentiating complex-valued functions of a real variable: For any differentiable functions x and y in T(R. 20. The following parts.

146 Chap. 2 Linear Transformations and Matrices Order of a differential operator 131 Product of matrices 87 Projection on a subspace 76 Projection on the x-axis 66 Range 67 Rank of a linear transformation 69 Reflection about the x-axis 66 Rotation 66 Similar matrices 115 Solution to a differential equation 129 Solution space of a homogeneous differential equation 132 Standard ordered basis for F" 79 Standard ordered basis for Pn(F) 79 Standard representation of a vector space with respect to a basis 104 Transpose of a linear transformation 121 Zero transformation 67 Incidence matrix 94 Inverse of a linear transformation 99 Inverse of a matrix 100 Invertible linear transformation 99 Invertible matrix 100 Isomorphic vector spaces 102 Isomorphism 102 Kronecker delta 89 Left-multiplication transformation 92 Linear functional 119 Linear operator 112 Linear transformation 65 Matrix representing a linear transformation 80 Nonhomogeneous differential equation 142 Nullity of a linear transformation 69 Null space 67 Ordered basis 79 Order of a differential equation 129 / .

3 E l e m e n t a r y O p e r a t i o n s o f L i n e a r M a t r i x a n d S y s t e m s E q u a t i o n s 3. involves the elimination of variables so that a simpler system can be obtained. which was discussed in Section 1.4. 3. 147 .1 3.3. The technique by which the variables are eliminated utilizes three types of operations: 1. 2. These operations provide a convenient computational method for determining all solutions to a system of linear equations. 2. As a consequence of objective 1. interchanging any two equations in the system.2 3. we obtain a simple method for computing the rank of a linear transformation between finite-dimensional vector spaces by applying these rank-preserving matrix operations to a matrix that represents that transformation. we express a system of linear equations as a single matrix equation. adding a multiple of one equation to another. the study of certain "rank-preserving" operations on matrices. The familiar method of elimination for solving systems of linear equations. the three operations above are the "elementary row operations" for matrices. In Section 3.4 Elementary Matrix Operations and Elementary Matrices The Rank of a Matrix and Matrix Inverses Systems of Linear Equations—Theoretical Aspects Systems of Linear Equations—Computational Aspects _£. Solving a system of linear equations is probably the most important application of linear algebra. his chapter is devoted to two related objectives: 1. In this representation of the system. multiplying any equation in the system by a nonzero constant. the application of these operations and the theory of linear transformations to the solution of systems of linear equations.3 3.

3.1 ELEMENTARY MATRIX OPERATIONS AND ELEMENTARY MATRICES

In this section, we define the elementary operations that are used throughout the chapter. In subsequent sections, we use these operations to obtain simple computational methods for determining the rank of a linear transformation and the solution of a system of linear equations. There are two types of elementary matrix operations—row operations and column operations. As we will see, the row operations are more useful. They arise from the three operations that can be used to eliminate variables in a system of linear equations.

Definitions. Let A be an m x n matrix. Any one of the following three operations on the rows [columns] of A is called an elementary row [column] operation:
(1) interchanging any two rows [columns] of A;
(2) multiplying any row [column] of A by a nonzero scalar;
(3) adding any scalar multiple of a row [column] of A to another row [column].
Any of these three operations is called an elementary operation. Elementary operations are of type 1, type 2, or type 3 depending on whether they are obtained by (1), (2), or (3).

Example 1
Let

    A = [ 1  2  3  4 ]
        [ 2  1 -1  3 ]
        [ 4  0  1  2 ].

Interchanging the second row of A with the first row is an example of an elementary row operation of type 1. The resulting matrix is

    B = [ 2  1 -1  3 ]
        [ 1  2  3  4 ]
        [ 4  0  1  2 ].

Multiplying the second column of A by 3 is an example of an elementary column operation of type 2. The resulting matrix is

    C = [ 1  6  3  4 ]
        [ 2  3 -1  3 ]
        [ 4  0  1  2 ].

Adding 4 times the third row of A to the first row is an example of an elementary row operation of type 3. The resulting matrix is

    M = [ 17  2  7 12 ]
        [  2  1 -1  3 ]
        [  4  0  1  2 ].
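The operations of Example 1 are easy to carry out with NumPy; the sketch below (added for illustration, not part of the text) reproduces B, C, and M from A.

import numpy as np

A = np.array([[1, 2,  3, 4],
              [2, 1, -1, 3],
              [4, 0,  1, 2]])

B = A.copy()
B[[0, 1]] = B[[1, 0]]          # type 1: interchange the first two rows

C = A.copy()
C[:, 1] = 3 * C[:, 1]          # type 2: multiply the second column by 3

M = A.copy()
M[0] = M[0] + 4 * M[2]         # type 3: add 4 times the third row to the first row

print(B)
print(C)
print(M)                       # first row is [17  2  7 12], as in Example 1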

Notice that if a matrix Q can be obtained from a matrix P by means of an elementary row operation, then P can be obtained from Q by an elementary row operation of the same type. (See Exercise 8.) So, in Example 1, A can be obtained from M by adding -4 times the third row of M to the first row of M.

Definition. An n x n elementary matrix is a matrix obtained by performing an elementary operation on I_n. The elementary matrix is said to be of type 1, 2, or 3 according to whether the elementary operation performed on I_n is a type 1, 2, or 3 operation, respectively.

For example, interchanging the first two rows of I_3 produces the elementary matrix

    E = [ 0  1  0 ]
        [ 1  0  0 ]
        [ 0  0  1 ].

Note that E can also be obtained by interchanging the first two columns of I_3. In fact, any elementary matrix can be obtained in at least two ways—either by performing an elementary row operation on I_n or by performing an elementary column operation on I_n. (See Exercise 4.) Similarly,

    [ 1  0 -2 ]
    [ 0  1  0 ]
    [ 0  0  1 ]

is an elementary matrix since it can be obtained from I_3 by an elementary column operation of type 3 (adding -2 times the first column of I_3 to the third column) or by an elementary row operation of type 3 (adding -2 times the third row to the first row).

Our first theorem shows that performing an elementary row operation on a matrix is equivalent to multiplying the matrix by an elementary matrix.

Theorem 3.1. Let A be in M_{m x n}(F), and suppose that B is obtained from A by performing an elementary row [column] operation. Then there exists an m x m [n x n] elementary matrix E such that B = EA [B = AE]. In fact, E is obtained from I_m [I_n] by performing the same elementary row [column] operation as that which was performed on A to obtain B. Conversely, if E is

by Exercise 10 of Section 2. we obtain the elementary matrix (\ 0 E = 0 \0 Observe that AE = C. requires verifying Theorem 3.1 for each type of elementary row operation. Then E can be obtained by an elementary row operation on fn. By Theorem 3. 3 Elementary Matrix Operations and Systems of Linear Equations an elementary m x m [n x n] matrix. T h e o r e m 3.2.) The next example illustrates the use of the theorem.150 Chap. Elementary matrices are invertible. E is invertible and E~1 = E. Therefore. Performing this same operation on J3. (See Exercise 7. Example 2 Consider the matrices A and B in Example 1. B is obtained from A by interchanging the first two rows of A. we obtain the elementary matrix E = Note that EA = B. Performing this same operation on I4.4.1. we can transform E back into fn. Proof. C is obtained from A by multiplying the second column of A by 3. In this case. The proof for column operations can then be obtained by using the matrix transpose to transform a column operation into a row operation. In the second part of Example 1. The result is that In can be obtained from E by an elementary row operation of the same type. and the inverse of an elementary matrix is an elementary matrix of the same type. By reversing the steps used to transform J n into E. It is a useful fact that the inverse of an elementary matrix is also an elementary matrix. The proof. I 0 0 0\ 3 0 0 0 1 0 0 0 1/ . then EA [AE] is the matrix obtained from A by performing the same elementary row [column] operation as that which produces E from Im [In]. The details are left as an exercise. there is an elementary matrix E such that EE = fn. Let E be an elementary nxn matrix. which we omit.
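Theorems 3.1 and 3.2 can be verified directly on the matrices of Examples 1 and 2 (an added NumPy illustration): the elementary matrix obtained by performing an operation on the identity reproduces that operation under multiplication, and it is invertible with an elementary inverse.

import numpy as np

A = np.array([[1, 2,  3, 4],
              [2, 1, -1, 3],
              [4, 0,  1, 2]])

# E: interchange the first two rows of I_3 (type 1).
E = np.eye(3)
E[[0, 1]] = E[[1, 0]]
B = A.copy(); B[[0, 1]] = B[[1, 0]]
assert np.array_equal(E @ A, B)                 # Theorem 3.1: B = EA

# E2: multiply the second column of I_4 by 3 (type 2).
E2 = np.eye(4)
E2[1, 1] = 3
C = A.copy(); C[:, 1] = 3 * C[:, 1]
assert np.array_equal(A @ E2, C)                # Theorem 3.1: C = AE2

# Theorem 3.2: elementary matrices are invertible; their inverses are elementary.
assert np.array_equal(E @ E, np.eye(3))         # a row interchange is its own inverse
E2_inv = np.eye(4)
E2_inv[1, 1] = 1 / 3
assert np.allclose(E2 @ E2_inv, np.eye(4))
print("checks pass")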

If B is a matrix that can be obtained by performing an elementary row operation on a matrix A.1. 5. then Bl can be obtained from At by the corresponding elementary column [row] operation.3 3\ /l 1 . Prove Theorem 3. Prove that E is an elementary matrix if and only if Et is. If B is a matrix that can be obtained by performing an elementary row operation on a matrix A. 2. Let A be an m x n matrix. The product of two n x n elementary matrices is an elementary matrix. .1 Elementary Matrix Operations and Elementary Matrices EXERCISES 1. B = ( 1 -2 VI .2 to obtain the inverse of each of the following elementary matrices. Prove the assertion made on page 149: Any elementary nxn matrix can be obtained in at least two ways -either by performing an elementary row operation on In or by performing an elementary column operation on fn. 7. The sum of two nxn elementary matrices is an elementary matrix. The inverse of an elementary matrix is an elementary matrix. and C ^ 0 1/ \l 0 -2 -3 31 -2 1 Find an elementary operation that transforms A into B and an elementary operation that transforms B into C.Sec. 3. The only entries of an elementary matrix are zeros and ones. (a) /o 0 \1 0 1\ 1 0 0 0/ /l (b) ( 0 \0 0 0\ 3 0 0 1/ / 1 0 0\ (c) ( 0 1 0 \ . 3. Label the following statements as true or false.2 0 1/ 4. Use the proof of Theorem 3. then A can be obtained by performing an elementary row operation on B. 6. then B can also be obtained by performing an elementary column operation on A. transform C into I3. The transpose of an elementary matrix is an elementary matrix. Prove that if B can be obtained from A by an elementary row [column] operation. Let A--= /l 0 . The n x n identity matrix is an elementary matrix. By means of several additional operations. (a) (b) (c) (d) (e) (f) (g) (h) 151 (i) An elementary matrix is always square.

9. ffA G Mmxn(F). 102). namely. The next theorem extends this fact to any matrix representation of any linear transformation defined on finite-dimensional vector spaces. 3 Elementary Matrix Operations and Systems of Linear Equations 8. Let T: V — W be a linear transformation between finite> dimensional vector spaces.152 Chap. 3. respectively. we define the rank of A. Thus the rank of the linear transformation LA is the same as the rank of one of its matrix representations. 11. > Many results about the rank of a matrix follow immediately from the corresponding facts about a linear transformation. Prove that if a matrix Q can be obtained from a matrix P by an elementary row operation. Then rank(T) = rank([T]^). A. Prove that any elementary row [column] operation of type 2 can be obtained by dividing some row [column] by a nonzero scalar. Every matrix A is the matrix representation of the linear transformation L. 12. Proof. 100) and Corollary 2 to Theorem 2. Theorem 3. Prove that any elementary row [column] operation of type 1 can be obtained by a succession of three elementary row [column] operations of type 3 followed by one elementary row [column] operation of type 2. We then use elementary operations to compute the rank of a matrix and a linear transformation. is that an n x n matrix is invertible if and only if its rank is n. Definition. 10.3. which follows from Fact 3 (p. Prove that any elementary row [column] operation of type 3 can be obtained by subtracting a multiple of some row [column] from another row [column].2 THE RANK OF A MATRIX AND MATRIX INVERSES In this section. An important result of this type. Prove that there exists a sequence of elementary row operations of types 1 and 3 that transforms A into an upper triangular matrix. This is a restatement of Exercise 20 of Section 2. we define the rank of a matrix. Hint: Treat each type of elementary row operation separately. The section concludes with a procedure for computing the inverse of an invertible matrix. denoted rank(^4).18 (p. and let 0 and 7 be ordered bases for V and W.4 with respect to the appropriate standard ordered bases. | .4. Let A be an m x n matrix. to be the rank of the linear transformation LA • F n — F m . then P can be obtained from Q by an elementary row operation of the same type.

We omit the details. (b) rank(PA) = rank(A). the rank of a matrix is the dimension of the subspace generated by its columns. Let A be an m x n matrix.2 (p. The next theorem and its corollary tell us how to do this. then there exists an elementary matrix E such that B = EA. apply Exercise 17 of Section 2. Proof. If B is obtained from a matrix A by an elementary row operation. Theorem 3. Proof. 3. | ff P and Q are invertible Corollary. and therefore. we need a way of examining a transformed matrix to ascertain its rank.2 The Rank of a Matrix and Matrix Inverses 153 Now that the problem of finding the rank of a linear transformation has been reduced to the problem of finding the rank of a matrix. This establishes (a). then (a) rank(j4<2) = rank(A). The next theorem is the first of several in this direction. we have mnk(PAQ) = rank(PA) = rank(A). First observe that R(L.4 to T = Lp. Elementary row and column operations on a matrix are rankpreserving.5. The rank of any matrix equals the maximum number of its linearly independent columns.Sec.4. respectively. . Finally. To establish (b). that is. and hence rank(Z?) = rank(/4) by Theorem 3. Theorem 3. applying (a) and fab).4Q) = R(L^LQ) = L /V L Q (F") = L A ( L Q ( R ) ) = L ^ R ) = R(LA) since LQ is onto. By Theorem 3. Therefore rank(AQ) = dim(R(L^c))) = dim(R(L yl )) = rank(A).4. (c) rank(PAQ) = rank(A). 150). For any AGMmxn(F). |fi| Now that we have a class of matrix operations that preserve rank. rank(i4) = rank(L A ) = dim(R(Lj4)). E is invertible. Proof. we need a result that allows us to perform rank-preserving operations on matrices. m x m and nxn matrices. The proof that elementary column operations are rank-preserving is left as an exercise.
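Theorem 3.4 and its corollary are easy to spot-check numerically (an added illustration; the 3 x 3 matrix is my own example, not one from the text): multiplying by invertible matrices or applying an elementary operation leaves the computed rank unchanged.

import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1., 2., 1.],
              [1., 0., 3.],
              [2., 2., 4.]])          # rank 2, since the third row is the sum of the first two
print(np.linalg.matrix_rank(A))       # 2

# Random P and Q are invertible with probability 1.
P = rng.standard_normal((3, 3))
Q = rng.standard_normal((3, 3))
assert np.linalg.matrix_rank(P @ A @ Q) == np.linalg.matrix_rank(A)   # Theorem 3.4(c)

E = np.eye(3)
E[2, 0] = -5.0                        # add -5 times row 1 to row 3 (type 3)
assert np.linalg.matrix_rank(E @ A) == 2                              # corollary to Theorem 3.4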

it is frequently useful to postpone the use of Theorem 3. Thus rank(A) = dim(R(L / i)) = dim(span({ai. ..2 (p. Then 0 spans F n and hence. where aj is the j t h column of A. Example 1 Let Observe that the first and second columns of A are linearly independent and that the third column is a linear combination of the first two. . Hence R(LA) = s p a n ( { a i . for any j.. The corollary to Theorem 3. by Theorem 2. K(LA) = span(LA(/?)) = span ({L A (ei). 90) that LA(CJ) = Aej = a. 3 Elementary Matrix Operations and Systems of Linear Equations Let 0 be the standard ordered basis for F n .13(b) (p. Thus rank(. Example 2 Let A = If we subtract the first row of A from rows 2 and 3 (type 3 elementary row operations).a 2 .. o 2 ..154 Chap..a„})). . The next example illustrates this procedure. we have seen in Theorem 2. U(en)}).A) = cjim I span = 2. One such modification of A can be obtained by using elementary row and column operations to introduce zero entries.. LA(e2)..5 until A has been suitably modified by means of appropriate elementary row and column operations so that the number of linearly independent columns is obvious. To compute the rank of a matrix A. the result is . a n } ) . But. . . 68)..4 guarantees that the rank of the modified matrix is the same as the rank of A.
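The simplification strategy used in these examples can be automated: row-reduce and count the pivots, which by Theorem 3.5 equals the maximum number of linearly independent columns. The routine below is an added sketch, applied to a matrix of my own choosing rather than the one printed in the text.

import numpy as np

def rank_by_row_reduction(A, tol=1e-10):
    # Gaussian elimination with partial pivoting; the number of pivots is the rank.
    A = np.array(A, dtype=float)
    m, n = A.shape
    rank = row = 0
    for col in range(n):
        if row >= m:
            break
        pivot = row + np.argmax(np.abs(A[row:, col]))
        if abs(A[pivot, col]) < tol:
            continue                                # no pivot in this column
        A[[row, pivot]] = A[[pivot, row]]           # type 1 operation
        A[row] = A[row] / A[row, col]               # type 2 operation
        for r in range(m):
            if r != row:
                A[r] = A[r] - A[r, col] * A[row]    # type 3 operations
        rank += 1
        row += 1
    return rank

M = [[1, 2, 1, 1],
     [2, 4, 2, 2],
     [1, 0, 3, 1]]
print(rank_by_row_reduction(M))             # 2
print(np.linalg.matrix_rank(np.array(M)))   # 2, for comparison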

6. we can transform A into a matrix D as in Theorem 3. 02. As an aid in following the proof. The power of this theorem can be seen in its corollaries. Let A be an m x n matrix of rank r. Then r < m. The number above each arrow indicates how many elementary operations are involved. Try to identify the nature of each elementary operation (row or column and type) in the following matrix transformations.6. but on several occasions a matrix is transformed from the preceding one by means of several elementary operations. T h e o r e m 3. r < n. 3. is tedious to read. Its proof. and O3 are zero matrices. and. A) 4 8 \6 2 4 2 3 4 4 0 2 2 2\ 8 0 10 2 9 1/ (4 4 0 2 8 2 \6 3 4 4 0 2 8 0\ 2 2 10 2 9 1/ / 1 0 8 \6 1 1 2 4 2 0 3 2 2 0\ 2 2 10 2 9 1/ . Example 3 Consider the matrix /0 2 4 4 A = 8 2 \6 3 4 4 0 2 2 2\ 8 0 10 2 9 1/ By means of a succession of elementary row and column operations. A can be transformed into the matrix D = Ir Q2 OX Oa where 0\. Thus Du = 1 for i < r and Dij = 0 otherwise. we first consider an example. We list many of the intermediate matrices. by means of a finite number of elementary row and column operations.6 and its corollaries are quite important.Sec. Theorem 3. we obtain It is now obvious that the maximum number of linearly independent columns of this matrix is 2. Hence the rank of A is 2. • The next theorem uses this process to transform a matrix into a particularly simple form.2 The Rank of a Matrix and Matrix Inverses 155 If we now subtract twice the first column from the second and subtract the first column from the third (type 3 elementary column operations). though easy to understand.

the conclusion follows with D = A. the number of rows of A. If n — 1. Since A ^ O. By means of at most one elementary row and at most one elementary column . Theorem 3. Proof of Theorem 3. r = 0 by Exercise 3. We now suppose that n > 1. Now suppose that A ^ O and r = rank (A). Note that there is one linearly independent column in D. Suppose that m = 1. and the next several operations (type 3) result in 0's everywhere in the first row and first column except for the 1./I) = rank(D). Next assume that the theorem holds for any matrix with at most m — 1 rows (for some m > 1). j. A can be transformed into a matrix with a 1 in the 1.6 can be established in a manner analogous to that for m = 1 (see Exercise 10).4 0 0\ 1 1 -6 2 . Clearly. Thus the theorem is established for m = 1.4 . rank(D) = 3.3 1/ (I 0 0 3 /l 0 0 \0 0 0 2 4 -6 -8 -3 -4 0\ 1 8 4/ 0 °\ 2 2 -6 2 -3 i/ 0 0 4 2 1 (1 0 0 [p 0 0 1 2 --6 . this matrix can in turn be transformed into the matrix (1 0 0). however. In this case.156 Chap.6. The proof is by mathematical induction on m. If A is the zero matrix.6.1 position. By means of at most one type 1 column operation and at most one type 2 column operation. So rank(D) = rank(A) = 1 by the corollary to Theorem 3.3 . Subsequent elementary operations do not change the first row and first column. By means of at most n — 1 type 3 column operations. Aij ^ 0 for some i. Suppose that A is any m x n matrix. 3 Elementary Matrix Operations and Systems of Linear Equations 1 1 2 »> 2 4 2 2 -6 -8 -6 2 Vo . With this example in mind.4.8 --3 . then r > 0.3 1/ 0 0\ 0 0 0 2 0 4y 2 A 0 0 0 0 1 2 1 ^ 0 0 4 0 \0 0 2 0 0 0 0 ( 0\ 0 2 °> 3 /l 0 0 1 * 0 0 \0 0 0 0 0 0 0\ 0 8 4/ (\ 0 0 0 1 0 0 0 1 ^° 0 2 l /l c 0 0 1 0 0 0 1 v» ( 0 I /l 0 0 0 0 0 <A 1 0 0 0 0 1 0 0 V» 0 0 0 oy By the corollary to Theorem 3.1 position. rank (. We must prove that the theorem holds for any matrix with m rows. • Note that the first two elementary operations in Example 3 result in a 1 in the 1. we proceed with the proof of Theorem 3.4 and by Theorem 3.5.1 position. so rank(^l) = 3.

1 position (just as was done in Example 3). we can move the nonzero entry to the 1. In Example 3.A) = rank(jB) = r. B' can be transformed by a finite number of elementary row and column operations into the (m — 1) x (n— 1) matrix D' such that D' = /r-] On 04 Oe where O4.Sec. D' consists of all zeros except for its first r — 1 diagonal entries. rank(-B') — r — 1. By means of at most one additional type 2 operation. (In Example 3. we can eliminate all nonzero entries in the first row and the first column with the exception of the 1 in the 1. B' has rank one less than B. and OQ are zero matrices.) Thus. Hence r < m and r < n. each by a finite number of elementary operations. Also by the induction hypothesis. 3. / 2 B' =1-6 1-3 4 -8 -4 2 2' -6 2 -3 1 By Exercise 11. However this follows by repeated applications of Exercise 12. we used two row and three column operations to do this.2 The Rank of a Matrix and Matrix Inverses 157 operation (each of type 1). Therefore r — 1 < m — 1 and r — \ < n — 1 by the induction hypothesis. . which are ones. A can be transformed into a matrix /1 0 B = Vo 0 ••• B' ) 0 \ where B' is an (m — 1) x (n — 1) matrix. Thus. O5. we can assure a 1 in the 1. since A can be transformed into B and B can be transformed into D.) By means of at most m — 1 type 3 row operations and at most n — 1 type 3 column operations.1 position. That is. (Look at the second operation in Example 3. Let /1 0 D = \o 0 ••• 0 \ D' / We see that the theorem now follows once we show that D can be obtained from B by means of a finite number of elementary row and column operations. A can be transformed into D by a finite number of elementary operations. for instance. Since rank(.1 position. with a finite number of elementary operations.
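The proof just given is constructive, and it translates directly into a procedure. The sketch below (an added illustration; the test matrix is my own, not the one from Example 3) carries a matrix to the form D of Theorem 3.6 using only elementary row and column operations.

import numpy as np

def reduce_to_canonical_form(A, tol=1e-10):
    # Returns (D, r), where D has I_r in its upper-left corner and zeros elsewhere.
    D = np.array(A, dtype=float)
    m, n = D.shape
    r = 0
    while r < min(m, n):
        block = np.abs(D[r:, r:])
        if block.max() < tol:                      # the remaining block is zero
            break
        i, j = np.unravel_index(np.argmax(block), block.shape)
        D[[r, r + i]] = D[[r + i, r]]              # row interchange (type 1)
        D[:, [r, r + j]] = D[:, [r + j, r]]        # column interchange (type 1)
        D[r] = D[r] / D[r, r]                      # scale the pivot row (type 2)
        for k in range(m):
            if k != r:
                D[k] -= D[k, r] * D[r]             # clear the pivot column (type 3)
        for k in range(n):
            if k != r:
                D[:, k] -= D[r, k] * D[:, r]       # clear the pivot row (type 3)
        r += 1
    return np.round(D, 12), r

A = [[1, 2, 0,  3],
     [2, 4, 1,  7],
     [3, 6, 1, 10]]
D, r = reduce_to_canonical_form(A)
print(r)       # 2 = rank(A)
print(D)       # ones in the first two diagonal positions, zeros elsewhere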

Let A be an m x n matrix.. Suppose that r = rank(A).) ^J .5. Then (a) rank(4*) = rank(^l). 02. I Corollary 2. (c) The rows and columns of any matrix generate subspaces of the same dimension. and O3 are zero matrices. Since B and C are invertible. (b) The rank of any matrix equals the maximum number of its linearly independent rows.158 Chap. A can be transformed by means of a finite number of elementary row and column operations into the matrix D. (See Exercise 13. matrix in which Oi.. where D satisfies the stated conditions of the corollary. Then D* is an n x m matrix with the form of the matrix D in Corollary 1. .4. and hence rank(Z) t ) = r by Theorem 3. where D = is the mxn fr o2 Ol o. such that D = BAC. D contains ones as its first r diagonal entries and zeros elsewhere. the rank of a matrix is the dimension of the subspace generated by its rows. and D = BAC. respectively. Then there exist invertible matrices B and C of sizes mxm and nxn. Let A be an m x n matrix of rank r.-Gq. We can appeal to Theorem 3.4. since D' contains ones as its first r— 1 diagonal entries.EP and elementary nxn matrices Gi. (a) By Corollary 1.4. 3 Elementary Matrix Operations and Systems of Linear Equations Finally.6. This establishes the theorem. By Theorem 3. Taking transposes. that is. we have Dl = (BAC)1 = CtAtBt. each Ej and Gj is invertible. 149) each time we perform an elementary operation. there exist invertible matrices B and C such that D = BAC.. 150). The proofs of (b) and (c) are left as exercises..1 (p.E2.Gq such that D — EpEp-i • • • E2E\AG\G2 • • • Gq.. By Theorem 3. Then B and C are invertible by Exercise 4 of Section 2. Proof.2 (p.. rank(A') = rank(C t A*5*) = rank(D f ). so are Bl and C* by Exercise 5 of Section 2. This establishes (a). Hence by Theorem 3.. numerically equal to the rank of the matrix.G2. Let B — EpEp-\ • • • E\ and C = G\G2. Proof. I Corollary 1. Thus there exist elementary mxm matrices E\. Thus rank(^ 4 ) = rank(Dt) =r = rank(^l).

Corollary 3. Every invertible matrix is a product of elementary matrices.

Proof. If A is an invertible n x n matrix, then rank(A) = n. Hence the matrix D in Corollary 1 equals I_n, and there exist invertible matrices B and C such that I_n = BAC. As in the proof of Corollary 1, note that B = E_p E_{p-1} ... E_1 and C = G_1 G_2 ... G_q, where the E_i's and G_i's are elementary matrices. Thus A = B^{-1} I_n C^{-1} = B^{-1} C^{-1}, so that

    A = E_1^{-1} E_2^{-1} ... E_p^{-1} G_q^{-1} ... G_1^{-1}.

The inverses of elementary matrices are elementary matrices, and hence A is a product of elementary matrices.

We now use Corollary 2 to relate the rank of a matrix product to the rank of each factor. Notice how the proof exploits the relationship between the rank of a matrix and the rank of a linear transformation.

Theorem 3.7. Let T: V -> W and U: W -> Z be linear transformations on finite-dimensional vector spaces V, W, and Z, and let A and B be matrices such that the product AB is defined. Then
(a) rank(UT) <= rank(U).
(b) rank(UT) <= rank(T).
(c) rank(AB) <= rank(A).
(d) rank(AB) <= rank(B).

Proof. We prove these items in the order: (a), (c), (d), and (b).
(a) Clearly, R(T) is contained in W. Hence

    R(UT) = UT(V) = U(T(V)) = U(R(T)), which is contained in U(W) = R(U).

Thus rank(UT) = dim(R(UT)) <= dim(R(U)) = rank(U).
(c) By (a),

    rank(AB) = rank(L_{AB}) = rank(L_A L_B) <= rank(L_A) = rank(A).

(d) By (c) and Corollary 2 to Theorem 3.6,

    rank(AB) = rank((AB)^t) = rank(B^t A^t) <= rank(B^t) = rank(B).

(b) Let alpha, beta, and gamma be ordered bases for V, W, and Z, respectively, and let A' = [U]_beta^gamma and B' = [T]_alpha^beta. Then A'B' = [UT]_alpha^gamma by Theorem 2.11 (p. 88). Hence, by Theorem 3.3 and (d),

    rank(UT) = rank([UT]_alpha^gamma) = rank(A'B') <= rank(B') = rank(T).
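These rank relations are easy to sanity-check numerically. The following is a minimal sketch (an illustration, not part of the text) that assumes NumPy is available; numpy.linalg.matrix_rank estimates rank from the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6))   # a 4 x 6 matrix
B = rng.integers(-3, 4, size=(6, 5))   # a 6 x 5 matrix, so the product AB is defined

rank = np.linalg.matrix_rank

# Corollary 2(a): a matrix and its transpose have the same rank.
assert rank(A) == rank(A.T)

# Theorem 3.7(c) and (d): rank(AB) <= rank(A) and rank(AB) <= rank(B).
assert rank(A @ B) <= min(rank(A), rank(B))
print(rank(A), rank(B), rank(A @ B))
```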

3 Elementary Matrix Operations and Systems of Linear Equations It is important to be able to compute the rank of any matrix.6.160 Chap. We can use the corollary to Theorem 3. Subtracting the first row from the second row. and fourth columns of A are identical and that the first and second columns of A are linearly independent. and the first and second rows are linearly independent. (c) Let A = Using elementary row operations. The object is to perform elementary row and column operations on a matrix to "simplify" it (so that the transformed matrix has many zero entries) to the point where a simple observation enables us to determine how many linearly independent rows or columns the matrix has. note that the first. Theorems 3. third. Thus rank(vl) = 2. Example 4 (a) Let A = 1 2 1 1 1 1 . we obtain 0 0 3 -3 3 1 1 0 0 0 0 Now note that the third row is a multiple of the second row. and Corollary 2 to Theorem 3.4.5 and 3. In this case.1 1 Note that the first and second rows of A are linearly independent since one is not a multiple of the other. and thus to determine its rank. Hence rank(v4) = 2. Thus raxfk(A) — 2.6 to accomplish this goal. there are several ways to proceed. . Suppose that we begin with an elementary row operation to obtain a zero in the 2. (b) Let A/l 3 1 1 1 0 1 1 \ 0 3 0 0.1 position. As an alternative method. we can transform A as follows: 1 0 0 2 -3 0 3 -5 3 r -1 0.

In summary, perform row and column operations until the matrix is simplified enough so that the maximum number of linearly independent rows or columns is obvious.

The Inverse of a Matrix

We have remarked that an n x n matrix is invertible if and only if its rank is n. Since we know how to compute the rank of any matrix, we can always test a matrix to determine whether it is invertible. We now provide a simple technique for computing the inverse of a matrix that utilizes elementary row operations.

Definition. Let A and B be m x n and m x p matrices, respectively. By the augmented matrix (A|B), we mean the m x (n + p) matrix (A B), that is, the matrix whose first n columns are the columns of A, and whose last p columns are the columns of B.

Let A be an invertible n x n matrix, and consider the n x 2n augmented matrix C = (A|I_n). By Exercise 15, we have

    A^{-1}C = (A^{-1}A | A^{-1}I_n) = (I_n | A^{-1}).    (1)

By Corollary 3 to Theorem 3.6, A^{-1} is a product of elementary matrices, say A^{-1} = E_p E_{p-1} ... E_1. Thus (1) becomes

    E_p E_{p-1} ... E_1 (A|I_n) = A^{-1}C = (I_n | A^{-1}).

Because multiplying a matrix on the left by an elementary matrix transforms the matrix by an elementary row operation (Theorem 3.1 p. 149), we have the following result: If A is an invertible n x n matrix, then it is possible to transform the matrix (A|I_n) into the matrix (I_n|A^{-1}) by means of a finite number of elementary row operations.

Conversely, suppose that A is invertible and that, for some n x n matrix B, the matrix (A|I_n) can be transformed into the matrix (I_n|B) by a finite number of elementary row operations. Let E_1, E_2, ..., E_p be the elementary matrices associated with these elementary row operations as in Theorem 3.1; then

    E_p E_{p-1} ... E_1 (A|I_n) = (I_n|B).    (2)

Letting M = E_p E_{p-1} ... E_1, we have from (2) that

    (MA|M) = M(A|I_n) = (I_n|B).

Hence MA = I_n and M = B. It follows that M = A^{-1}, so B = A^{-1}. Thus we have the following result: If A is an invertible n x n matrix, and the matrix (A|I_n) is transformed into a matrix of the form (I_n|B) by means of a finite number of elementary row operations, then B = A^{-1}.

If, on the other hand, A is an n x n matrix that is not invertible, then rank(A) < n. Hence any attempt to transform (A|I_n) into a matrix of the form (I_n|B) by means of elementary row operations must fail, because otherwise A could be transformed into I_n using the same row operations. This is impossible, however, because elementary row operations preserve rank. In fact, A can be transformed into a matrix with a row containing only zero entries, yielding the following result: If A is an n x n matrix that is not invertible, then any attempt to transform (A|I_n) into a matrix of the form (I_n|B) produces a row whose first n entries are zeros.
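The procedure just described translates almost line for line into code. The sketch below is an illustration, not the text's algorithm verbatim; it works over the rationals via Python's fractions.Fraction to avoid round-off, and the function name inverse is ours.

```python
from fractions import Fraction

def inverse(A):
    """Row-reduce the augmented matrix (A | I_n); return B once (I_n | B) is reached."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]                          # the n x 2n matrix (A | I_n)
    for col in range(n):
        # Type 1 operation: bring a nonzero entry into the col,col position.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # No pivot is available: a zero row appears, so A is not invertible.
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Type 2 operation: scale the pivot row so the pivot entry is 1.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        # Type 3 operations: clear the remaining entries of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # the right half is A^{-1}
```

Applied to the matrix of Example 5 below, this routine reproduces the inverse computed there by hand.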

The next two examples demonstrate these comments.

Example 5

We determine whether the matrix

    A = ( 0  2  4 )
        ( 2  4  2 )
        ( 3  3  1 )

is invertible, and if it is, we compute its inverse. We attempt to use elementary row operations to transform

    (A|I) = ( 0  2  4 | 1  0  0 )
            ( 2  4  2 | 0  1  0 )
            ( 3  3  1 | 0  0  1 )

into a matrix of the form (I|B). One method for accomplishing this transformation is to change each column of A successively, beginning with the first column, into the corresponding column of I. Since we need a nonzero entry in the 1,1 position, we begin by interchanging rows 1 and 2. The result is

    ( 2  4  2 | 0  1  0 )
    ( 0  2  4 | 1  0  0 )
    ( 3  3  1 | 0  0  1 ).

In order to place a 1 in the 1,1 position, we must multiply the first row by 1/2; this operation yields

    ( 1  2  1 | 0  1/2  0 )
    ( 0  2  4 | 1   0   0 )
    ( 3  3  1 | 0   0   1 ).

We now complete work in the first column by adding -3 times row 1 to row 3 to obtain

    ( 1   2   1 | 0   1/2   0 )
    ( 0   2   4 | 1    0    0 )
    ( 0  -3  -2 | 0  -3/2   1 ).

In order to change the second column of the preceding matrix into the second column of I, we multiply row 2 by 1/2 to obtain a 1 in the 2,2 position. This operation produces

    ( 1   2   1 |  0    1/2   0 )
    ( 0   1   2 | 1/2    0    0 )
    ( 0  -3  -2 |  0   -3/2   1 ).

We now complete our work on the second column by adding -2 times row 2 to row 1 and 3 times row 2 to row 3. The result is

    ( 1  0  -3 | -1    1/2   0 )
    ( 0  1   2 | 1/2    0    0 )
    ( 0  0   4 | 3/2  -3/2   1 ).

Only the third column remains to be changed. In order to place a 1 in the 3,3 position, we multiply row 3 by 1/4; this operation yields

    ( 1  0  -3 | -1     1/2    0  )
    ( 0  1   2 | 1/2     0     0  )
    ( 0  0   1 | 3/8   -3/8   1/4 ).

Adding appropriate multiples of row 3 to rows 1 and 2 completes the process and gives

    ( 1  0  0 |  1/8  -5/8   3/4 )
    ( 0  1  0 | -1/4   3/4  -1/2 )
    ( 0  0  1 |  3/8  -3/8   1/4 ).

Thus A is invertible, and

    A^{-1} = (  1/8  -5/8   3/4 )
             ( -1/4   3/4  -1/2 )
             (  3/8  -3/8   1/4 ).
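A quick floating-point check of Example 5 (a sketch, assuming NumPy is available):

```python
import numpy as np

A = np.array([[0., 2., 4.],
              [2., 4., 2.],
              [3., 3., 1.]])
A_inv = np.array([[ 1/8, -5/8,  3/4],
                  [-1/4,  3/4, -1/2],
                  [ 3/8, -3/8,  1/4]])

print(np.allclose(A @ A_inv, np.eye(3)))      # True: A times A^{-1} is the identity
print(np.allclose(np.linalg.inv(A), A_inv))   # True: agrees with NumPy's inverse
```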

Being able to test for invertibility and compute the inverse of a matrix allows us, with the help of Theorem 2.18 (p. 101) and its corollaries, to test for invertibility and compute the inverse of a linear transformation. The next example demonstrates this technique.

Example 6

We determine whether the matrix

    A = ( 1  2   1 )
        ( 2  1  -1 )
        ( 1  5   4 )

is invertible, and if it is, we compute its inverse. Using a strategy similar to the one used in Example 5, we attempt to use elementary row operations to transform

    (A|I) = ( 1  2   1 | 1  0  0 )
            ( 2  1  -1 | 0  1  0 )
            ( 1  5   4 | 0  0  1 )

into a matrix of the form (I|B). We first add -2 times row 1 to row 2 and -1 times row 1 to row 3, obtaining

    ( 1   2   1 |  1  0  0 )
    ( 0  -3  -3 | -2  1  0 )
    ( 0   3   3 | -1  0  1 ).

We then add row 2 to row 3. The result,

    ( 1   2   1 |  1  0  0 )
    ( 0  -3  -3 | -2  1  0 )
    ( 0   0   0 | -3  1  1 ),

is a matrix with a row whose first 3 entries are zeros. Therefore A is not invertible.

Example 7

Let T: P_2(R) -> P_2(R) be defined by T(f(x)) = f(x) + f'(x) + f''(x), where f'(x) and f''(x) denote the first and second derivatives of f(x). We use Corollary 1 of Theorem 2.18 (p. 102) to test T for invertibility and compute the inverse if T is invertible. Taking beta to be the standard ordered basis of P_2(R), we have

    [T]_beta = ( 1  1  2 )
               ( 0  1  2 )
               ( 0  0  1 ).

Using the method of Examples 5 and 6, we can show that [T]_beta is invertible with inverse

    ([T]_beta)^{-1} = ( 1  -1   0 )
                      ( 0   1  -2 )
                      ( 0   0   1 ).

Thus T is invertible, and ([T]_beta)^{-1} = [T^{-1}]_beta. Hence by Theorem 2.14 (p. 91), we have

    [T^{-1}(a_0 + a_1 x + a_2 x^2)]_beta = ( 1  -1   0 ) ( a_0 )   ( a_0 - a_1  )
                                           ( 0   1  -2 ) ( a_1 ) = ( a_1 - 2a_2 )
                                           ( 0   0   1 ) ( a_2 )   (    a_2     ).

Therefore

    T^{-1}(a_0 + a_1 x + a_2 x^2) = (a_0 - a_1) + (a_1 - 2a_2)x + a_2 x^2.
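Example 7 can be mirrored numerically by working with coefficient vectors relative to the standard basis {1, x, x^2}. A sketch assuming NumPy; the helper name apply_T_inverse is ours, not the text's.

```python
import numpy as np

T_matrix = np.array([[1., 1., 2.],     # [T]_beta for T(f) = f + f' + f''
                     [0., 1., 2.],
                     [0., 0., 1.]])

print(np.linalg.matrix_rank(T_matrix))  # 3 = dim P_2(R), so T is invertible
T_inv = np.linalg.inv(T_matrix)

def apply_T_inverse(a0, a1, a2):
    """Coefficients of T^{-1}(a0 + a1 x + a2 x^2)."""
    return T_inv @ np.array([a0, a1, a2], dtype=float)

print(apply_T_inverse(1, 2, 3))          # [-1. -4.  3.], i.e. -1 - 4x + 3x^2
```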

EXERCISES

1. Label the following statements as true or false.
(a) The rank of a matrix is equal to the number of its nonzero columns.
(b) The product of two matrices always has rank equal to the lesser of the ranks of the two matrices.
(c) The m x n zero matrix is the only m x n matrix having rank 0.
(d) Elementary row operations preserve rank.
(e) Elementary column operations do not necessarily preserve rank.
(f) The rank of a matrix is equal to the maximum number of linearly independent rows in the matrix.
(g) The inverse of a matrix can be computed exclusively by means of elementary row operations.
(h) The rank of an n x n matrix is at most n.
(i) An n x n matrix having rank n is invertible.

2. Find the rank of the following matrices.

166 Chap. For each of the following linear transformations T.ai + a 3 ). matrix A. Use elementary row and column operations to transform each of the following matrices into a matrix D satisfying the conditions of Theorem 3. For each of the following matrices. rank(A) — 0 if and only if A is the 4. .a2. (c) T: R3 -> R3 defined by T(a\.1 if it exists. 3 Elementary Matrix Operations and Systems of Linear Equations 1 2 1 2 4 2 / 2 0 4 1 6 2 -8 1 /l 2 3 1 4 0 (e) 0 2 .6. / = (ai + 2a2 + as. and then determine the rank of each matrix. and compute T .) f(x). Prove that for any mxn zero matrix. compute the rank and the inverse if it exists. (a) T: P2(R) -* P2(R) defined by T(/(x)) = f"(x) + 2f'(x) (b) T: P2(R) -+ P2(R) defined by T(/(*)) = (x + l)f'(x). -a\ + a2 + 2a-s. (a) (b) 5. / (a) 1 1 2 1 1 2 (g) .a2.3 \1 0 0 1 l\ 3 0 5 1 -3 l) 1 1 0 0 1\ 2 1 0/ (d) 1 2 (f) 3 1-4 (I 1 0 1\ 2 2 0 2 (g) 1 1 0 1 V i o V 3.2 V 3 6. determine whether T is invertible.

167 ( ! » • ) as a product of elementary matrices.6 for the case that A is an m x 1 matrix.tr(At). Complete the proof of the corollary to Theorem 3.Sec. 11. Let A be an m x n matrix.4 by showing that elementary column operations preserve rank. .tr(AE)).03) = (a] + o 2 + o 3 ) + (ai . Express the invertible matrix 0 1 1 0 (tY(A).02. + oix 2 . 10. 9. then rank(£') = r .2 The Rank of a Matrix and Matrix Inverses (d) T: R3 -» P2(R) defined by T(oi. (f) T: M2X2(-R) -> R4 defined by T(A) = where E = 7.1) x (n + 1) matrices respectively defined by ( 1 0 B = U 0 ••• B' 7 0 \ and D— U / 1 0 0 ••• D' 7 0 \ Prove that if B' can be transformed into D' by an elementary row [column] operation. Prove Theorem 3.1. Let / ' 0 B = V0 B' 7 0 ••• 0 \ where B' is an m x n submatrix of B. and let B and D be (ra-f.R3 defined by T(/(x)) = ( / ( . Prove that if c is any nonzero scalar. 3. / ( l ) ) . Let B' and D ' b e m x n matrices. 12. then B can be transformed into D by an elementary row [column] operation. 8. (e) T: P a (H) . Prove that if rank(B) = r. then rank(c.tr(EA).a2 + 03)3. / ( 0 ) .4) = rank(^4).l ) .

3 SYSTEMS OF LINEAR EQUATIONS—THEORETICAL ASPECTS This section and the next are devoted to the study of systems of linear equations. 19.A + B) < rank(A) + rank(5) for any mxn matrices A and B. U: V — W be linear transformations. 3 Elementary Matrix Operations and Systems of Linear Equations 13. 14. we apply results from Chapter 2 to describe the solution sets of . 21. 17. Justify your answer. Prove that if B is a 3 x 1 matrix and C is a 1 x 3 matrix. Conversely. Let A be an m x n matrix and B be an n x p matrix. Let A be an m x n matrix with rank m and B be an n x p matrix with rank n.6. then the 3 x 3 matrix BC has rank at most 1. 20. Let T. In this section. M(A\B) = (MA\MB) for any mxn matrix M. Determine the rank of AB. Prove that there exists an mxn matrix A such that AB = Im. (See the definition of the sum of subsets of a vector space on page 22. Suppose that A and B are matrices having n rows. 22. (c) Deduce from (b) that rank(. Let B be an n x m matrix with rank m.168 Chap. Prove (b) and (c) of Corollary 2 to Theorem 3. 15. Let / Prove that ( 1 -1 A = -2 \ 3 0 -1 3 1 4 1 -1 -5 2 -1 -1 1 l\ 0 3 -e) (a) Find a 5 x 5 matrix M with rank 2 such that AM = O. which arise naturally in both the physical and social sciences. where O is the 4 x 5 zero matrix. 18. Prove that rank(B) < 2. 16. Prove that there exists an nxm matrix B such that AB = Im. Supply the details to the proof of (b) of Theorem 3.) (b) Prove that if W is finite-dimensional. Prove that AB can be written as a sum of n matrices of rank at most one. > (a) Prove that R(T + U) C R(T) + R(U). (b) Suppose that B is a 5 x 5 matrix such that AB = O. 3. then there exist a 3 x 1 matrix B and a 1 x 3 matrix C such that A = BC. Let A be an m x n matrix with rank m.4. then rank(T+U) < rank(T) + rank(U). show that if A is any 3 x 3 matrix having rank 1.

systems of linear equations as subsets of a vector space. In Section 3.4, elementary row operations are used to provide a computational method for finding all solutions to such systems.

The system of equations

    a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
        .............................                 (S)
    a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m,

where a_ij and b_i (1 <= i <= m and 1 <= j <= n) are scalars in a field F and x_1, x_2, ..., x_n are n variables taking values in F, is called a system of m linear equations in n unknowns over the field F.

The m x n matrix

    A = ( a_11  a_12  ...  a_1n )
        ( a_21  a_22  ...  a_2n )
        (  ...   ...   ...  ... )
        ( a_m1  a_m2  ...  a_mn )

is called the coefficient matrix of the system (S). If we let

    x = ( x_1 )                 b = ( b_1 )
        ( x_2 )      and            ( b_2 )   in F^m,
        ( ... )                     ( ... )
        ( x_n )                     ( b_m )

then the system (S) may be rewritten as a single matrix equation Ax = b. To exploit the results that we have developed, we often consider a system of linear equations as this single matrix equation.

A solution to the system (S) is an n-tuple

    s = ( s_1 )
        ( s_2 )
        ( ... )
        ( s_n )   in F^n

such that As = b. The set of all solutions to the system (S) is called the solution set of the system. System (S) is called consistent if its solution set is nonempty; otherwise it is called inconsistent.
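In code, the matrix form Ax = b makes it a one-line test to decide whether a candidate n-tuple is a solution. A small sketch (NumPy assumed), using the system that appears in Example 1(b) below:

```python
import numpy as np

A = np.array([[2., 3., 1.],      # coefficient matrix of 2x1 + 3x2 +  x3 = 1
              [1., -1., 2.]])    #                        x1 -  x2 + 2x3 = 6
b = np.array([1., 6.])

def is_solution(s):
    """True exactly when As = b."""
    return np.allclose(A @ np.asarray(s, dtype=float), b)

print(is_solution((-6, 2, 7)), is_solution((8, -4, -3)))   # True True
```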

Example 1

(a) Consider the system

    x_1 + x_2 = 3
    x_1 - x_2 = 1.

In matrix form the system can be written

    ( 1   1 ) ( x_1 )   ( 3 )
    ( 1  -1 ) ( x_2 ) = ( 1 ),

so that

    A = ( 1   1 )        and        b = ( 3 )
        ( 1  -1 )                       ( 1 ).

By use of familiar techniques, we can solve the preceding system and conclude that there is only one solution: x_1 = 2, x_2 = 1.

(b) Consider

    2x_1 + 3x_2 +  x_3 = 1
     x_1 -  x_2 + 2x_3 = 6,

that is,

    ( 2   3  1 ) ( x_1 )   ( 1 )
    ( 1  -1  2 ) ( x_2 ) = ( 6 ).
                 ( x_3 )

This system has many solutions, such as s = (-6, 2, 7) and s = (8, -4, -3).

(c) Consider

    x_1 + x_2 = 0
    x_1 + x_2 = 1,

that is,

    ( 1  1 ) ( x_1 )   ( 0 )
    ( 1  1 ) ( x_2 ) = ( 1 ).

It is evident that this system has no solutions. Thus we see that a system of linear equations can have one, many, or no solutions.

We must be able to recognize when a system has a solution and then be able to describe all its solutions. This section and the next are devoted to this end.

Definitions. A system Ax = b of m linear equations in n unknowns is said to be homogeneous if b = 0. Otherwise the system is said to be nonhomogeneous.

We begin our study of systems of linear equations by examining the class of homogeneous systems of linear equations. Our first result (Theorem 3.8) shows that the set of solutions to a homogeneous system of m linear equations in n unknowns forms a subspace of F^n. We can then apply the theory of vector spaces to this set of solutions. For example, a basis for the solution space can be found, and any solution can be expressed as a linear combination of the vectors in the basis.

Theorem 3.8. Let Ax = 0 be a homogeneous system of m linear equations in n unknowns over a field F. Let K denote the set of all solutions to Ax = 0. Then K = N(L_A); hence K is a subspace of F^n of dimension n - rank(L_A) = n - rank(A).

Proof. Clearly, K = {s in F^n : As = 0} = N(L_A). The second part now follows from the dimension theorem (p. 70).

Note that any homogeneous system has at least one solution, namely, the zero vector. The next result gives further information about the set of solutions to a homogeneous system.

Corollary. If m < n, the system Ax = 0 has a nonzero solution.

Proof. Suppose that m < n. Then rank(A) = rank(L_A) <= m. Hence

    dim(K) = n - rank(L_A) >= n - m > 0,

where K = N(L_A). Since dim(K) > 0, K is not {0}. Thus there exists a nonzero vector s in K, so s is a nonzero solution to Ax = 0.
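Theorem 3.8 says that the solution set of Ax = 0 is N(L_A), of dimension n - rank(A), so a basis for it can be computed mechanically. A sketch assuming SymPy is available (Matrix.nullspace returns a basis over the rationals), using the coefficient matrix of Example 2(a) below:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1],
               [1, -1, -1]])

print(A.rank())                # 2, so the solution space has dimension 3 - 2 = 1
for v in A.nullspace():        # a basis for {x : Ax = 0}
    print(v.T)                 # Matrix([[1/3, -2/3, 1]]); any nonzero multiple also works
```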

Example 2

(a) Consider the system

    x_1 + 2x_2 + x_3 = 0
    x_1 -  x_2 - x_3 = 0.

Let

    A = ( 1   2   1 )
        ( 1  -1  -1 )

be the coefficient matrix of this system. It is clear that rank(A) = 2. If K is the solution set of this system, then dim(K) = 3 - 2 = 1. Thus any nonzero solution constitutes a basis for K. For example, since (1, -2, 3) is a solution of the system, it is a basis for K, so that

    K = {t(1, -2, 3) : t in R}.

Thus any vector in K is of the form t(1, -2, 3), where t is a real number.

(b) Consider the system x_1 - 2x_2 + x_3 = 0 of one equation in three unknowns. If A = (1  -2  1) is the coefficient matrix, then rank(A) = 1. Hence if K is the solution set, then dim(K) = 3 - 1 = 2. Note that (2, 1, 0) and (-1, 0, 1) are linearly independent vectors in K. Thus they constitute a basis for K.

In Section 3.4, explicit computational methods for finding a basis for the solution set of a homogeneous system are discussed.

We now turn to the study of nonhomogeneous systems. Our next result shows that the solution set of a nonhomogeneous system Ax = b can be described in terms of the solution set of the homogeneous system Ax = 0. We refer to the equation Ax = 0 as the homogeneous system corresponding to Ax = b.

Theorem 3.9. Let K be the solution set of a system of linear equations Ax = b, and let K_H be the solution set of the corresponding homogeneous system Ax = 0. Then for any solution s to Ax = b,

    K = {s} + K_H = {s + k : k in K_H}.

Proof. Let s be any solution to Ax = b. We must show that K = {s} + K_H. If w is in K, then Aw = b = As. Hence

    A(w - s) = Aw - As = b - b = 0,

so w - s is in K_H. Thus there exists k in K_H such that w - s = k. It follows that w = s + k is in {s} + K_H, and therefore K is contained in {s} + K_H.

Conversely, suppose that w is in {s} + K_H; then w = s + k for some k in K_H. But then

    Aw = A(s + k) = As + Ak = b + 0 = b,

so w is in K. Therefore {s} + K_H is contained in K, and thus K = {s} + K_H.

Example 3

(a) Consider the system

    x_1 + 2x_2 + x_3 =  7
    x_1 -  x_2 - x_3 = -4.

The corresponding homogeneous system is the system in Example 2(a). It is easily verified that s = (1, 1, 4) is a solution to the preceding nonhomogeneous system. So by Theorem 3.9 the solution set of the system is

    K = {(1, 1, 4) + t(1, -2, 3) : t in R}.

(b) Consider the system x_1 - 2x_2 + x_3 = 4. The corresponding homogeneous system is the system in Example 2(b). Since (4, 0, 0) is a solution to the given system, the solution set K can be written as

    K = {(4, 0, 0) + t_1(2, 1, 0) + t_2(-1, 0, 1) : t_1, t_2 in R}.
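Theorem 3.9 can be spot-checked on Example 3(a): adding any solution of the homogeneous system to one particular solution again solves Ax = b. A short sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [1., -1., -1.]])
b = np.array([7., -4.])

s = np.array([1., 1., 4.])     # a particular solution of Ax = b
u = np.array([1., -2., 3.])    # spans the homogeneous solution set K_H

assert np.allclose(A @ s, b) and np.allclose(A @ u, 0)
for t in (-2.0, 0.5, 10.0):
    assert np.allclose(A @ (s + t * u), b)   # every s + t*u again lies in K
print("all checks passed")
```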

The following theorem provides us with a means of computing solutions to certain systems of linear equations.

Theorem 3.10. Let Ax = b be a system of n linear equations in n unknowns. If A is invertible, then the system has exactly one solution, namely, A^{-1}b. Conversely, if the system has exactly one solution, then A is invertible.

Proof. Suppose that A is invertible. Substituting A^{-1}b into the system, we have A(A^{-1}b) = (AA^{-1})b = b. Thus A^{-1}b is a solution. If s is an arbitrary solution, then As = b. Multiplying both sides by A^{-1} gives s = A^{-1}b. Thus the system has one and only one solution, namely, A^{-1}b.

Conversely, suppose that the system has exactly one solution s. Let K_H denote the solution set for the corresponding homogeneous system Ax = 0. By Theorem 3.9, {s} = {s} + K_H. But this is so only if K_H = {0}. Thus N(L_A) = {0}, and hence A is invertible.

Example 4

Consider the following system of three linear equations in three unknowns:

           2x_2 + 4x_3 = 2
    2x_1 + 4x_2 + 2x_3 = 3
    3x_1 + 3x_2 +  x_3 = 1.

In Example 5 of Section 3.2, we computed the inverse of the coefficient matrix A of this system. Thus the system has one and only one solution, namely,

    ( x_1 )             (  1/8  -5/8   3/4 ) ( 2 )   ( -7/8 )
    ( x_2 ) = A^{-1}b = ( -1/4   3/4  -1/2 ) ( 3 ) = (  5/4 )
    ( x_3 )             (  3/8  -3/8   1/4 ) ( 1 )   ( -1/8 ).

We use this technique for solving systems of linear equations having invertible coefficient matrices in the application that concludes this section.

In Example 1(c), we saw a system of linear equations that has no solutions. We now establish a criterion for determining when a system has solutions. This criterion involves the rank of the coefficient matrix of the system Ax = b and the rank of the matrix (A|b). (The matrix (A|b) is called the augmented matrix of the system Ax = b.)

Theorem 3.11. Let Ax = b be a system of linear equations. Then the system is consistent if and only if rank(A) = rank(A|b).

Proof. To say that Ax = b has a solution is equivalent to saying that b is in R(L_A). (See Exercise 9.) In the proof of Theorem 3.5 (p. 153), we saw that

    R(L_A) = span({a_1, a_2, ..., a_n}),

the span of the columns of A. Thus Ax = b has a solution if and only if b is in span({a_1, a_2, ..., a_n}). But b is in span({a_1, a_2, ..., a_n}) if and only if

    span({a_1, a_2, ..., a_n}) = span({a_1, a_2, ..., a_n, b}).

This last statement is equivalent to

    dim(span({a_1, a_2, ..., a_n})) = dim(span({a_1, a_2, ..., a_n, b})).

So by Theorem 3.5, the preceding equation reduces to

    rank(A) = rank(A|b).

Example 5

Recall the system of equations

    x_1 + x_2 = 0
    x_1 + x_2 = 1

in Example 1(c). Since

    A = ( 1  1 )        and        (A|b) = ( 1  1  0 )
        ( 1  1 )                           ( 1  1  1 ),

rank(A) = 1 and rank(A|b) = 2. Because the two ranks are unequal, the system has no solutions.

Example 6

We can use Theorem 3.11 to determine whether (3, 3, 2) is in the range of the linear transformation T: R^3 -> R^3 defined by

    T(a_1, a_2, a_3) = (a_1 + a_2 + a_3, a_1 - a_2 + a_3, a_1 + a_3).

Now (3, 3, 2) is in R(T) if and only if there exists a vector s = (x_1, x_2, x_3) in R^3 such that T(s) = (3, 3, 2). Such a vector s must be a solution to the system

    x_1 + x_2 + x_3 = 3
    x_1 - x_2 + x_3 = 3
    x_1       + x_3 = 2.

Since the ranks of the coefficient matrix and the augmented matrix of this system are 2 and 3, respectively, it follows that this system has no solutions. Hence (3, 3, 2) is not in R(T).
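Theorem 3.11 gives a mechanical consistency test: compare rank(A) with rank(A|b). A sketch assuming NumPy, applied to the inconsistent system of Example 5 above:

```python
import numpy as np

def is_consistent(A, b):
    """Ax = b is consistent iff rank(A) = rank(A|b) (Theorem 3.11)."""
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

A = np.array([[1., 1.],
              [1., 1.]])
b = np.array([0., 1.])
print(is_consistent(A, b))     # False: rank(A) = 1 but rank(A|b) = 2
```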

the amount spent by the farmer on food. We begin by considering a simple society composed of three people (industries)-—a farmer who grows all the food.p2. this case is referred to as the closed model. We assume that each person sells to and buys from a central pool and that everything produced is consumed. Farmer Tailor Carpenter Food 0.20p2 + 0. the tailor's income.20p2Moreover. We close this section by applying some of the ideas we have studied to illustrate two special cases of his work.20p3 = PiSimilar equations describing the expenditures of the tailor and carpenter produce the following system of linear equations: 0.40pi + 0.10p2 + 0.20p3 =7 Pi O.70 0. Suppose that the proportion of each of the commodities consumed by each person is given in the following table. 3 Elementary Matrix Operations and Systems of Linear Equations An Application In 1973. tailor. respectively. we require that the consumption of each individual equals his or her income.40pi + 0.70p2 + 0. a tailor who makes all the clothing.20 0.10 Housing 0.20p2 + 0. and housing must equal the farmer's income. where P= . and a carpenter who builds all the housing. Notice that each of the columns of the table must sum to 1. and p% denote the incomes of the farmer. Each of these three individuals consumes all three of the commodities produced in the society. Since no commodities either enter or leave the system.50 Clothing 0. and so we obtain the equation 0. the amount spent by the farmer on clothing is 0.60p3 = PaThis system can be written as Ap = p. To ensure that this society survives.20 0. Note that the farmer consumes 20% of the clothing.10 0.60 Let Pi. clothing.20 0. and carpenter. Wassily Leontief won the Nobel prize in economics for his work in developing a mathematical model that can be used to describe various economic phenomena. Because the total cost of all clothing is p2.40 0.lOpi + 0.176 Chap.50pi + 0.20p3 = Pi 0.

" by Ben Noble. At first. California) is stated below. One solution to the homogeneous system (I — A)x = 0. For otherwise. For vectors 6 = (b\... where A is a matrix with nonnegative entries whose columns sum to 1. 1971. 5 3 » > 5 Z £ ^PS = £ ( J 2 AiA Pi = £ * * ' * 3 which is a contradiction. Proceedings of the Summer Conference for College Teachers on Applied Mathematics. 3. bn) and c — (c\.Sec. . A is called the input-output (or consumption) matrix. and carpenter have incomes in the proportions 25 : 35 : 40 (or 5 : 7 : 8). that is. there exists a k for which pk >^AkjPjj Hence.. CUPM. since the columns of A sum to 1.. but in one that is nonnegative. A useful theorem in this direction (whose proof may be found in "Applications of Matrices to Economic Models and Social Science Relationships. which is equivalent to the equilibrium condition.c2. in fact. the requirement that consumption not exceed production. tailor. c n ) in R n .b2. But.12.. it may seem reasonable to replace the equilibrium condition by the inequality Ap < p.l ) x l positive vector. In this context. is / We may interpret this to mean that the society survives if the farmer. Thus we must consider the question of whether the system (/ — A)x = 0 has a nonnegative solution. . Berkeley. we use the notation b > c [b > c] to mean bi > c-i [bi > Ci] for all i.. Notice that we are not simply interested in any nonzero solution to the system. . Theorem 3. Let A be an n x n input output matrix having the form A = B D C E where D is a l x ( n . Then (I — A)x = 0 has a one-dimensional solution set that is generated by a nonnegative vector. Ap < p implies that Ap — p in the closed model.3 Systems of Linear Equations—Theoretical Aspects 177 and A is the coefficient matrix of the system. and Ap = p is called the equilibrium condition.l ) positive vector and C is an ( n . The vector b is called nonnegative [positive] if b > 0 [b > 0].

3 Elementary Matrix Operations and Systems of Linear Equations Observe that any input-output matrix with all positive entries satisfies the hypothesis of this theorem.178 Chap.25 0.65' 0 0. that the surplus of each of the three commodities must equal the corresponding outside demands. namely.10 . 10 cents worth of clothing. In general. 0.d2. 40 cents worth of clothing.30 0. To illustrate the open model.(Auxx + A12X2 + ^13^3). (I — . suppose that 20 cents worth of food. Finally. 10 cents worth of clothing.25 0 v In the open model. and d > 0. the value of food produced minus the value of food consumed while producing the three commodities.10 \0. Similarly.20 0. then the desired solution is (/ — A)~ld. In this case.20 0.30 A = I 0.3) that the series I + A + A2 + • • • converges to (/ — A)"1 if {A71} converges to the zero matrix. It is easy to see that if (/ — A)-1 exists and is nonnegative. and X3 be the monetary values of food. Recall that for a real number a. suppose that 30 cents worth of food. AijXj = di 3=1 fori = 1. Hence . Then the value of the surplus of food in the society is xi . let Xi. suppose that 30 cents worth of food.35 0. and housing produced with respective outside demands d\..50 0.30/ . clothing. we must find a nonnegative solution to ( / — A)x = d.40 0.{' 1 / . Let A be the 3 x 3 matrix such that Aij represents the amount (in a fixed monetary unit such as the dollar) of commodity i required to produce one monetary unit of commodity j. the series 1 + a + a2 + • • • converges to (1 — o ) _ 1 if \a\ < 1.25 0. are nonnegative.. Then the input output matrix is /0.75 0.. Returning to our simple society. that is. it can be shown (using the concept of convergence of matrices developed in Section 5. The following matrix does also: '0. The assumption that everything produced is consumed gives us a similar equilibrium condition for the open model.A) -1 is nonnegative since the matrices J. and 30 cents worth of housing are required for the production of $1 worth of food.3. A. where A is a matrix with nonnegative entries such that the sum of the entries of each column of A does not exceed one. A2.X2. and 20 cents worth of housing are required for the production of $1 of clothing.30\ 0. Similarly. we assume that there is an outside demand for each of the commodities produced.2. and 30 cents worth of housing are required for the production of $1 worth of housing. and 0*3.

so

    I - A = (  0.70  -0.20  -0.30 )
            ( -0.10   0.60  -0.10 )
            ( -0.30  -0.20   0.70 )

and

    (I - A)^{-1} = ( 2.0  1.0  1.0 )
                   ( 0.5  2.0  0.5 )
                   ( 1.0  1.0  2.0 ).


Since (I - A)^{-1} is nonnegative, we can find a (unique) nonnegative solution to (I - A)x = d for any demand d. For example, suppose that there are outside demands for $30 billion in food, $20 billion in clothing, and $10 billion in housing. If we set

    d = ( 30 )
        ( 20 )
        ( 10 ),

then

    x = (I - A)^{-1}d = ( 90 )
                        ( 60 )
                        ( 70 ).
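The open-model computation above is a single linear solve. A sketch assuming NumPy; the entries of A follow the per-dollar requirements described above for food, clothing, and housing.

```python
import numpy as np

A = np.array([[0.30, 0.20, 0.30],    # input-output matrix (columns: food, clothing, housing)
              [0.10, 0.40, 0.10],
              [0.30, 0.20, 0.30]])
d = np.array([30., 20., 10.])        # outside demands, in billions of dollars

x = np.linalg.solve(np.eye(3) - A, d)   # gross production satisfying (I - A)x = d
print(np.round(x, 10))                   # [90. 60. 70.]
```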

So a gross production of $90 billion of food, $60 billion of clothing, and $70 billion of housing is necessary to meet the required demands. EXERCISES 1. Label the following statements as true or false. (a) Any system of linear equations has at least one solution. (b) Any system of linear equations has at most one solution. (c) Any homogeneous system of linear equations has at least one solution. (d) Any system of n linear equations in n unknowns has at most one solution. (e) Any system of n linear equations in n unknowns has at least one solution. (f) If the homogeneous system corresponding to a given system of linear equations has a solution, then the given system has a solution. (g) If the coefficient matrix of a homogeneous system of n linear equations in n unknowns is invertible, then the system has no nonzero solutions. (h) The solution set of any system of m linear equations in n unknowns is a subspace of F n . 2. For each of the following homogeneous systems of linear equations, find the dimension of and a basis for the solution set.

(a)

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations xi + 3x2 = 0 2xi + 6x 2 = 0 (b) Xi -f X2 — X3 = 0 4xi + X2 - 2x 3 = 0 2xi + x 2 - x 3 = 0 xi - x 2 + x 3 = 0 xi + 2x 2 - 2x 3 = 0

xi + 2x 2 - x 3 = 0 (c) 2xi + x + x = 0 2 3 (e) xi + 2x 2 - 3x 3 + x 4 = 0 Xi + 2X2 + X3 + X4 = 0 X2 — X3 + X4 = 0

(d)

xi + 2x 2 = 0 (f) Xi — x = 0 2

(g)

3. Using the results of Exercise 2, find all solutions to the following systems. xi + 3x2 = 5 (a) 2xi + 6x = 10 2 Xi + 2X2 — X3 = 3 (c) 2xi + x + x = 6 2 3 (b) Xl + X2 - X3 = 1 4xi + x 2 - 2x 3 = 3 2xi + x 2 - x 3 = 5 Xi - X2 + X3 as 1 xi + 2x2 - 2x3 = 4

(d)

(e) xi + 2x2 -^3x3 + x4 = 1 Xi + 2X2 + X3 + X4 = 1 X2 ~ X3 + X4 = 1

xi + 2x 2 = 5 (f) Xi - x = - 1 2

(g)

4. For each system of linear equations with the invertible coefficient matrix A, (1) Compute A"1. (2) Use A-1 to solve the system.
= *l+*2ZX3 \ 2xi - 2x 2 + x 3 = 4 5. Give an example of a system of n linear equations in n unknowns with infinitely many solutions.

W

2x1+5x2 = 3

M

6. Let T: R3 —• R2 be defined by T(a, b, c) = (a + 6,2a — c). Determine T-Kui)7. Determine which of the following systems of linear equations has a solution.

.

Sec. 3.3 Systems of Linear Equations—Theoretical Aspects (a) Xi + X2 — X3 + 2x4 = 2 xi + x 2 + 2x3 = 1 2xi + 2x 2 + x 3 + 2x 4 = 4 (b) X\ -r x2 — Xg = 1 2xi + x 2 + 3x 3 = 2


xi + 2x 2 + 3x 3 = 1 (c) xi + x 2 - x 3 = 0 xi -I- 2x2 + X3 = 3 Xi + 2x2 — X3 = 1 (e) 2xi + x2 + 2x 3 = 3 Xi — 4x2 + 7x3 = 4

X\ + Xl + (d) Xi 4xi +

X2 + 3X3 — X4 = 0 X2+ X3 + X 4 = 1 2x2 + X3 - X 4 = 1 X2 + 8x3 — X4 = 0

8. Let T: R3 -+ R3 be defined by T(o,6,c) = (o + 6,6 - 2c, o + 2c). For each vector v in R3, determine whether v £ R(T). (a) v = ( 1 , 3 , - 2 ) (b) ^ = (2,1,1) 9. Prove that the system of linear equations Ax = b has a solution if and only if 6 e R(L^). 10. Prove or give a counterexample to the following statement: If the coefficient matrix of a system of m linear equations in n unknowns has rank m, then the system has a solution. / 11. In the closed model of Leontief with food, clothing, and housing as the basic industries, suppose that the input-output matrix is 1 2 ft\ (& 5 1 5 A = 16 6 16 1 1 1 14 3 2J At what ratio must the farmer, tailor, and carpenter produce in order for equilibrium to be attained? 12. A certain economy consists of two sectors: goods and services. Suppose that 60% of all goods and 30% of all services are used in the production of goods. What proportion of the total economic output is used in the production of goods? 13. In the notation of the open model of Leontief, suppose that A = and d =

are the input-output matrix and the demand vector, respectively. How much of each commodity must be produced to satisfy this demand?



14. A certain economy consisting of the two sectors of goods and services supports a defense system that consumes $90 billion worth of goods and $20 billion worth of services from the economy but does not contribute to economic production. Suppose that 50 cents worth of goods and 20 cents worth of services are required to produce $1 worth of goods and that 30 cents worth of goods and 60 cents worth of services are required to produce $1 worth of services. What must the total output of the economic system be to support this defense system? 3.4 SYSTEMS OF LINEAR EQUATIONSCOMPUTATIONAL ASPECTS

In Section 3.3, we obtained a necessary and sufficient condition for a system of linear equations to have solutions (Theorem 3.11 p. 174) and learned how to express the solutions to a nonhomogeneous system in terms of solutions to the corresponding homogeneous system (Theorem 3.9 p. 172). The latter result enables us to determine all the solutions to a given system if we can find one solution to the given system and a basis for the solution set of the corresponding homogeneous system. In this section, we use elementary row operations to accomplish these two objectives simultaneously. The essence of this technique is to transform a given system of linear equations into a system having the same solutions, but which is easier to solve (as in Section 1.4). Definition. Two systems of linear equations are called equivalent they have the same solution set. if

The following theorem and corollary give a useful method for obtaining equivalent systems. Theorem 3.13. Let Ax = 6 be a system of m linear equations in n unknowns, and let C be an invertible mxm matrix. Then the system (CA)x = Cb is equivalent to Ax — b. Proof. Let K be the solution set for Ax = 6 and K' the solution set for (CA)x = Cb. If w < K, then Aw = 6. So (CA)w = Cb, and hence w € K'. E Thus KCK'. Conversely, if w € K', then (CA)w = Cb. Hence Aw = C~\CAw) = C~\Cb) = 6;

so w < K. Thus K' C K, and therefore, K = K'. E | Corollary. Let Ax = 6 be a system ofm linear equations in n unknowns. If (A'\b') is obtained from (A\b) by a finite number of elementary row operations, then the system A'x = 6' is equivalent to the original system.

Sec. 3.4 Systems of Linear Equations—Computational Aspects


Proof. Suppose that (A'\b') is obtained from (^4|6) by elementary row operations. These may be executed by multiplying (A\b) by elementary mx m matrices E\, E2,..., Ep. Let C = Ep • • • E2E\; then (A'\b') = C(A\b) = (CA\Cb). Since each Ei is invertible, so is C. Now A' = CA and b' = Cb. Thus by Theorem 3.13, the system A'x = b' is equivalent to the system Ax = b. I We now describe a method for solving any system of linear equations. Consider, for example, the system of linear equations 3xi + 2x 2 + 3x 3 - 2x 4 = 1 xi + x 2 + x 3 =3 xi + 2x 2 + x 3 - x 4 = 2. First, we form the augmented matrix 3 2 3 -2 1 1 1 0 1 2 1 -1 By using elementary row operations, we transform the/augmented matrix into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row. (Recall that matrix A is upper triangular if Aij — 0 whenever i>h) 1. In the leftmost nonzero column, create a 1 in the first row. In our example, we can accomplish this step by interchanging the first and third rows. The resulting matrix is 1 2 1 -1 1 1 1 0 3 2 3 -2 2. By means of type 3 row operations, use the first row to obtain zeros in the remaining positions of the leftmost nonzero column. In our example, we must add —1 times the first row to the second row and then add —3 times the first row to the third row to obtain 0 0 2 1 -1 0 -4 0 -1 1 1 2 1 -5

3. Create a 1 in the next row in the leftmost possible column, without using previous row(s). In our example, the second column is the leftmost

184

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations possible column, and we can obtain a 1 in the second row, second column by multiplying the second row by —1. This operation produces 0 0 2 1 -1 1 0 -1 -4 0 1 2 -1 -5

4. Now use type 3 elementary row operations to obtain zeros below the 1 created in the preceding step. In our example, we must add four times the second row to the third row. The resulting matrix is 2 1 -1 0 1 0 -1 0 0 0 -3 5. Repeat steps 3 and 4 on each succeeding row until no nonzero rows remain. (This creates zeros above the first nonzero entry in each row.) In our example, this can be accomplished by multiplying the third row by — | . This operation produces 1 2 1 -1 0 1 0 -1 1 0 0 0 2 -1 3

We have now obtained the desired matrix. To complete the simplification of the augmented matrix, we must make the first nonzero entry in each row the only nonzero entry in its column. (This corresponds to eliminating certain unknowns from all but one of the equations.) 6. Work upward, beginning with the last nonzero row, and add multiples of each row to the rows above. (This creates zeros above the first nonzero entry in each row.) In our example, the third row is the last nonzero row, and the first nonzero entry of this row lies in column 4. Hence we add the third row to the first and second rows to obtain zeros in row 1, column 4 and row 2, column 4. The resulting matrix is 2 1 0 0 1 0 0 0 0 0 1 7. Repeat the process described in step 6 for each preceding row until it is performed with the second row, at which time the reduction process is complete. In our example, we must add —2 times the second row to the first row in order to make the first row, second column entry become zero. This operation produces 0 0 0 1 0 1 0 0 0 0 1

Sec. 3.4 Systems of Linear Equations—Computational Aspects

185

We have now obtained the desired reduction of the augmented matrix. This matrix corresponds to the system of linear equations x\ + x2 x3 =1 =2 X4 = 3.

Recall that, by the corollary to Theorem 3.13, this system is equivalent to the original system. But this system is easily solved. Obviously X2 = 2 and X4 = 3. Moreover, xi and X3 can have any values provided their sum is 1. Letting X3 = t, we then have xi = 1 — t. Thus an arbitrary solution to the original system has the form fl-t\ f\\ 2 2 = 0 t K ^ j A Observe that 0 1 V <v f-l\ +* 0 1 V °/

/

is a basis for the homogeneous system of equations corresponding to the given system. In the preceding example we performed elementary row operations on the augmented matrix of the system until we obtained the augmented matrix of a system having properties 1, 2, and 3 on page 27. Such a matrix has a special name. Definition. A matrix is said to be in reduced row echelon form if the following three conditions are satisfied. (a) Any row containing a nonzero entry precedes any row in which all the entries are zero (if any). (b) The first nonzero entry in each row is the only nonzero entry in its column. (c) The first nonzero entry in each row is 1 and it occurs in a column to the right of the first nonzero entry in the preceding row. Example 1 (a) The matrix on page 184 is in reduced row echelon form. Note that the first nonzero entry of each row is 1 and that the column containing each such entry has all zeros otherwise. Also note that each time we move downward to

186

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations

a new row, we must move to the right one or more columns to find the first nonzero entry of the new row. (b) The matrix

is not in reduced row echelon form, because the first column, which contains the first nonzero entry in row 1, contains another nonzero entry. Similarly, the matrix '0 1 0 2y 1 0 0 1 ,0 0 1 1 , is not in reduced row echelon form, because the first nonzero entry of the second row is not to the right of the first nonzero entry of the first row. Finally, the matrix

/ is not in reduced row echelon form, because the first nonzero entry of the first row is not 1. • It can be shown (see the corollary to Theorem 3.16) that the reduced row echelon form of a matrix is unique; that is, if different sequences of elementary row operations are used to transform a matrix into matrices Q and Q' in reduced row echelon form, then Q = Q'. Thus, although there are many different sequences of elementary row operations that can be used to transform a given matrix into reduced row echelon form, they all produce the same result. The procedure described on pages 183-185 for reducing an augmented matrix to reduced row echelon form is called Gaussian elimination. It consists of two separate parts. 1. In the forward pass (steps 1-5), the augmented matrix is transformed into an upper triangular matrix in which the first nonzero entry of each row is 1, and it occurs in a column to the right of the first nonzero entry of each preceding row. 2. In the backward pass or back-substitution (steps 6-7), the upper triangular matrix is transformed into reduced row echelon form by making the first nonzero entry of each row the only nonzero entry of its column. P

Sec. 3.4 Systems of Linear Equations—Computational Aspects

187

Of all the methods for transforming a matrix into its reduced row echelon form, Gaussian elimination requires the fewest arithmetic operations. (For large matrices, it requires approximately 50% fewer operations than the Gauss-Jordan method, in which the matrix is transformed into reduced row echelon form by using the first nonzero entry in each row to make zero all other entries in its column.) Because of this efficiency, Gaussian elimination is the preferred method when solving systems of linear equations on a computer. In this context, the Gaussian elimination procedure is usually modified in order to minimize roundoff errors. Since discussion of these techniques is inappropriate here, readers who are interested in such matters are referred to books on numerical analysis. When a matrix is in reduced row echelon form, the corresponding system of linear equations is easy to solve. We present below a procedure for solving any system of linear equations for which the augmented matrix is in reduced row echelon form. First, however, we note that every matrix can be transformed into reduced row echelon form by Gaussian elimination. In the forward pass, we satisfy conditions (a) and (c) in the definition of reduced row echelon form and thereby make zero all entries below the first nonzero entry in each row. Then in the backward pass, we make zero all entries above the first nonzero entry in each row, thereby satisfying condition (b) in the definition of reduced row echelon form. T h e o r e m 3.14. Gaussian elimination transforms any matrix into its reduced row echelon form. We now describe a method for solving a system in which the augmented matrix is in reduced row echelon form. To illustrate this procedure, we consider the system 2xi xi xi 2x, + 3x2 + x2 4- x 2 + 2x2 + + + + X3 + 4x4 — 9x5 = 17 X3 + X4 — 3x5 = 6 X3 + 2x4 - 5x 5 = 8 2x 3 + 3x 4 - 8x5 = 14, -9 -3 -5 -8 17\ 6 8

for which the augmented matrix is (2 3 1 4 1 1 1 1 1

14 V / Applying Gaussian elimination to the augmented matrix of the system produces the following sequence of matrices.

17\ 6 8 UJ

(\ 2 1 V2

1 1 1 3 1 4 1 1 2 2 2 3

-3 -9 -5 -8

6\ 17 8 14,

188

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations /l 1 1 0 1 - 1 0 0 0 \0 0 0 1 -3 2 -3 1 -2 1 -2 6\ 5 2 V <l l 0 1 0 0 ^0 0 1 1 -3 -1 2 -3 0 1 -2 0 0 0 6> 5 2 «y

a i 0 1 0 0 \0 0

1 0 -1 4 ^ -1 0 1 1 0 1 -2 2 0 0 0 <v

/l 0 ~~* 0 \0

0 2 0 - 2 3\ 1 - 1 0 1 1 0 0 1 - 2 2 0 0 0 0 0/

The system of linear equations corresponding to this last matrix is xi + 2x 3 #2 - #3 - 2x5 = 3 + x5 = 1 X4 — 2x5 = 2.

Notice that we have ignored the last row since it consists entirely of zeros. To solve a system for which the augmented matrix is in reduced row echelon form, divide the variables into two sets. The first set consists of those variables that appear as leftmost variables in one of the equations of the system (in this case the set is {xi,X2,X4}). The second set consists of all the remaining variables (in this case, {x3,xs}). To each variable in the second set, assign a parametric value t\,t2,... (X3 = t\, X5 = t2), and then solve for the variables of the first set in terms of those in the second set: xi = - 2 x 3 + 2x 5 + 3 = -2ti + 2*2 + 3 x2= x3 - x5 + 1 = t\ - t2 + 1 x4 = 2x 5 + 2 = 2t2 + 2. Thus an arbitrary solution is of the form /3\ /xA /-2ti + 2*2 + 3\\ x2 h — t2 + 1 1 = 0 -rh x3 = ti 2 x4 2*2 + 2 V^5> \ t2 / w where ti,t2 6 R. Notice that \ ( f-2\ < 2\ 1 -1 ! 1 5 0 0 2 { I o^ I V J f-2\ ( 2\ -1 1 1 + t2 0 2 0 I 0) I 1/

Sec. 3.4 Systems of Linear Equations—Computational Aspects

189

is a basis for the solution set of the corresponding homogeneous system of equations and /3\ 1 0 2 w is a particular solution to the original system. Therefore, in simplifying the augmented matrix of the system to reduced row echelon form, we are in effect simultaneously finding a particular solution to the original system and a basis for the solution set of the associated homogeneous system. Moreover, this procedure detects when a system is inconsistent, for by Exercise 3, solutions exist if and only if, in the reduction of the augmented matrix to reduced row echelon form, we do not obtain a row in which the only nonzero entry lies in the last column. Thus to use this procedure for solving a system Ax = 6 of m linear equations in n unknowns, we need only begin to transform the augmented matrix (.4|6) into its reduced row echelon form (^4'|6') by means of Gaussian elimination. If a row is obtained in which the only nonzero entry lies in the last column, then the original system is inconsistent. Otherwise, discard any zero rows from (A'\b'), and write the corresponding system of equations. Solve this system as described above to obtain an arbitrary solution of the form s = so + hui + t2u2 H + tn_rwn_r,

where r is the number of nonzero rows in A' (r < m). The preceding equation is called a general solution of the system Ax = 6. It expresses an arbitrary solution s of Ax = b in terms of n — r parameters. The following theorem states that s cannot be expressed in fewer than n — r parameters. Theorem 3.15. Let Ax = 6 be a system of r nonzero equations in n unknowns. Suppose that rank(A) — rank(^|6) and that (A\b) is in reduced row echelon form. Then (a) rank(A) = r. (b) If the general solution obtained by the procedure above is of the form s = s0 + hui + t2u2 H \tn-run-r,

then {ui,u2,..., un-r} is a basis for the solution set of the corresponding homogeneous system, and So is a solution to the original system. Proof. Since (A\b) is in reduced row echelon form, (A\b) must have r nonzero rows. Clearly these rows are linearly independent by the definition of the reduced row echelon form, and so rank(yl|6) = r. Thus rank(yl) = r.

190

Chap. 3 Elementary Matrix Operations and Systems of Linear Equations

Let K be the solution set for Ax = b. and let KH be the solution set for Ax = 0. Setting t\ = t2 — • • • — tn-r = 0, we see that s = So G K. But by Theorem 3.9 (p. 172), K = {.s,,} + KH. Hence KH = {so} + K = s p a n ( { u 1 . u 2 , . . . . «„-,})•

Because rank(A) = r, we have dim(Kn) = n — r. Thus since dim(Kn) = n — r and KH is generated by a set {u\,u2,... ,un-r} containing at most n — r vectors, we conclude that this set is a basis for KH1 An Interpretation of the Reduced Row Echelon Form Let A be an m x n matrix with columns Oi,a2, • • • ,an< a i J d let B be the reduced row echelon form of A. Denote the columns of B by b\,b2,...,b„. If the rank of A is r, then the rank of B is also r by the corollary to Theorem 3.4 (p. 153). Because B is in reduced row echelon form, no nonzero row of B can be a linear combination of the other rows of B. Hence B must have exactly r nonzero rows, and if r > 1, the vectors e.\, e2,..., er must occur among the columns of B. For i = 1 , 2 , . . . , r, let j.t denote a column number of B such that bjt = e». We claim that a 7 l , aj2..... a J r , the columns of A corresponding to these columns of B, are linearly independent. For suppose that there are scalars c\,c2,...,cr such that c\Uj1 + c2aj2 -\ 1 crajr = 0. -

Because B can be obtained from A by a sequence of elementary row operations, there exists (as in the proof of the corollary to Theorem 3.13) an invertible mxm matrix M such that MA = B. Multiplying the preceding equation by M yields 3i Since Ma*. = 6, = a, it follows that C\('\ + C2C2 H
CIMOJ, +c2Ma

+ crMa.; = 0.

r Cr('r = 0.

Hence c\ = c2 = • • • = cr = 0. proving that the vectors ctj11Q>ja, • • • ,o-jr are linearly independent. Because B has only r nonzero rows, every column of B has the form

dr 0 W

n. The next theorem summarizes these results. Because the second column of B is 2ei. and fifth columns of B are ei. the same result shows that 04 = 4a\ + (—l)as.Sec. it follows from Theorem 3. so Theorem 3. . Let A be an m x n matrix of rank r. (See Exercisel5. 3. and a$.. 04. and 63. . the rank of A is 3. then column k of A is d\Oj1 + d2aj2 + • • • + dra.) Example 2 Let (2 1 A = 2 \3 4 2 4 6 6 2 3 1 8 0 7 5 4\ 1 0 9/ / M The reduced row echelon form of A is 4 0\ (I 2 0 0 0 1 -1 0 B = 0 1 0 0 0 0 0 0 0/ ^ Since B has three nonzero rows. r. a$. there is a column bji of B such that bjt = e^.. as is easily checked. The reduced row echelon form of a matrix is unique.jr. (c) The columns of A numbered ji. Theorem 3. Moreover. and let B be the reduced row echelon form of A.16(c) asserts that the first.---. and fifth columns of A are linearly independent. 2 . since the fourth column of B is 4ei + (—l)e2. . Proof. jr nre linearly independent. . The first. (d) For each k = 1 .j2. . • . Corollary.16. Let the columns of A be denoted ai. 2 . a2. . M '(di e i +d2e2 H .dr.d2. where r > 0. if column k ofB is d\ei +d2e2 -\ \-drer. + d2M~1bJ2 + • • • + = diajr + d2aj2 + h drajv. third. third. Exercise.16(d) that a2 = 2a-i. (b) For each i = 1 .e2.4 Systems of Linear Equations—Computational Aspects for scalars d\. Then (a) The number of nonzero rows in B is r.. . The corresponding column of A must be \-drer) = d\M 1 191 ei+d2M r-l*e2-t + drM_1( drM~lbjr = diM^bj.

However. . (8. then S would be a basis for R3.2 ) . 0 ) = (0.C3. and 63. But the columns of A other than the last column (which is the zero vector) are vectors in S. 2 .6. 5 ) . Hence 0 = {(2.2 ) + c 4 ( 0 .192 Chap.1 ) . 3 Elementary Matrix Operations and Systems of Linear Equations In Example 6 of Section 1. Theorem 3. confirming that 5 is linearly dependent. 0 . . . Recall that S is linearly dependent if there are scalars Ci. a similar approach can be used to reduce any finite generating set to a basis. . 47) that 0 is a basis for R3. . (1. 2 0 ) + c 3 ( l .0)}.3 . ( 0 . The procedure described there can be streamlined by using Theorem 3. it is instructive to consider the calculation that is needed to determine whether S is linearly dependent or linearly independent. If follows from (b) of Corollary 2 to the replacement theorem (p.12c2 + 2c4 + 2c5 = 0 5ci + 20c2 .0. and fourth columns of B are e. .1 2 .3 . (0. We begin by noting that if S were linearly independent. Nevertheless. Since the first. such that c i ( 2 . (7.1 ) } is a linearly independent subset of S. not all zero.c 4 =0 has a nonzero solution. 2 . we conclude that the first.2.2.2c3 .0. .16. (1.l ) + C 5 ( 7 . Because every finite-dimensional vector space over F is isomorphic to F n for some n. . it is clear that S is linearly dependent because S contains more than dim(R 3 ) = 3 vectors. This technique is illustrated in the next example.\. third.0). and C5.C4. . third. -12. 5 ) . .16(c) gives us additional information. The augmented matrix of this system of equations is A= \ -3 5 -12 20 and its reduced row echelon form is B = Using the technique described earlier in this section.2 ) . we extracted a basis for R3 from the generating set S = {(2.20). . and fourth columns of A are linearly independent.0. Thus S is linearly dependent if and only if the system of linear equations 2ci + 8c2 + c 3 + 7c5 = 0 .3 .C2. In this case. we can find nonzero solutions of the preceding system.3 c i . 2 . e2. 5 ) + C 2 ( 8 .

5). We can then apply the technique described above to reduce this generating set to a basis for V containing S.3). . It is easily verified that V is a subspace of R5 and that S = {(-2.0.4+2x+4x2+6x3. (4.1 ) .1 . Hence {(2. which is the matrix B in Example 2.6 + 3x + 8x 2 + 7x 3 . we see that the first. (4.1 . Our approach is based on the replacement theorem and assumes that we can find an explicit basis 0 for V.2+x+5x3. .0. It follows that {2 + x + 2x 2 + 3x 3 . .8. (-5.0.).0.3.6+3x+8x2+7x3. . (1. and fifth columns of A are linearly independent and the second and fourth columns of A are linear combinations of the first. Example 4 Let V = {(xi. . third.9)} is a basis for the subspace of R4 that is generated by S'.1.1.2. Since 0 C S'. To find a subset of S that is a basis for V.3. Recall that this is always possible by (c) of Corollary 2 to the replacement theorem (p.0.4+x+9x3} generates a subspace V of Ps(R). we consider the subset Sf = {(2.2.1.1.1 ) .x 3 .6).1.x 5 ) e R 5 : xi +7x2 + 5x 3 .X2.2 . (2. 3.1)} is a linearly independent subset of V.4. the set S' generates V.7).X4.1.7).8.4 Systems of Linear Equations—Computational Aspects Example 3 The set 193 S = {2+x+2x2+3x3.3). 47).4 + x + 9x 3 } is a basis for the subspace V of Ps(i?. From the reduced row echelon form of A.1.Sec.0. . third.4x 4 + 2x 5 = 0}.9)} consisting of the images of the polynomials in S under the standard representation of Ps(R) with respect to the standard ordered basis. (6.2. Let S' be the ordered set consisting of the vectors in S followed by those in 0. and fifth columns. (6. • We conclude this section by describing a method for extending a linearly independent subset S of a finite-dimensional vector space V to a basis for V. Note that the 4 x 5 matrix in which the columns are the vectors in S' is the matrix A in Example 2.1. (4.

. 0 .1 . . If X2 = £1.2 . (-5.5 .x 2 . and X5 = £4. Label the following statements as true or false.1 ) . l .0)} is a basis for V containing S. 0 .1.1 .0).X3.0.0.0.0. The matrix whose columns consist of the vectors in S followed by those in 0 is (-2 0 0 -1 ^ 1 -5 -7 -5 4 1 1 1 0 0 -2 0 0 1 0 -1 1 0 0 1 -1 1 0 0 0 -2\ 0 0 0 l ) and its reduced row echelon form is /l 0 0 1 0 0 0 0 \o 0 Thus { ( . ( 1 . Hence 0= {(-7. . we need only write the equation as Xi = —7x2 — 5x3 + 4x4 — 2x5 and assign parametric values to X2.1 ) .1).0.5 0 0 0 1 .0).194 Chap. (a) If (A'\b') is obtained from (A\b) by a finite sequence of elementary column operations.t3. we first obtain a basis 0 for V.7 . 0 . we solve the system of linear equations that defines V.0.15. X4 = £3.1. X3.l).t2. . • 0 0 1 0 0 1 0 1 0 0 1 0 -1\ -.1)} is a basis for V by Theorem 3.t4) = i i ( . and X5.1.0).X4.0. 0 .1. 1 . 0 .0.2 . then the vectors in V have the form (xi. 0 ) + t4(-2.0.0. l .1.1 0 0 oy EXERCISES 1. (4.0. 0 . X4. (4. Since in this case V is defined by a single equation. . then the systems Ax = b and ^4'x = b' are equivalent.0.5 0 0 . 0 .x 5 ) = (-7£i -5t2+4t3 -2t4. (-5. . 0 .1.0. 0 ) + ^ ( . 0 ) + £ 3 ( 4 . To do so.0. 3 Elementary Matrix Operations and Systems of Linear Equations To extend 5 to a basis for V.0.ti. X3 = t2. l . (-2.

then the dimension of the solution set of Ax = 0 is n — r. then the number of nonzero rows in A' equals the rank of A.x5 = (h) 5xi . If this system is consistent.4x2 . (c) If A is an n x n matrix with rank n.X3 2xi . 3.2x3 + 5x4 = -6 (g) xi + 2x2 — X3 + 3x4 = 2 (f) 2xi + 4x2 .5x2 Xi + 5x3 = = = = 1 6 7 9 xi + 2x2 + 2x4 = 6 3xi + 5x2 — X3 + 6x4 = 17 (c) 2xi + 4x2 + x 3 + 2x4 = 12 2xi — 7x3 + HX4 = 7 x\ — x2 — 2x3 + 3x4 = — 7 2xi — X2 + 6x3 + 6x4 = —2 (d) — 2X] + X2 — 4X3 ~ 3X4 = 0 " 3xi — 2x2 + 9^3 + IOX4 = —5 xi . then the system Ax = 6 is consistent. Xi + 2X2 — X3 = —1 (a) 2xi + 2x2 + X3 = 1 3xi + 5x2 — 2 X — — 1 . (e) If (A\b) is in reduced row echelon form.2x2 + x-s .x 3 + 6x4 = 5 x2 + 2x4 = 3 2xi — 2x2 — X3 + 6x4 — 2x5 = 1 .3x4 + 3x5 = 2xi — X2 — 2x4 + X5 = . (g) If a matrix A is transformed by elementary row operations into a matrix A' in reduced row echelon form.x2 .8x2 + X3 — 4x4 = 9 (e) -X! + 4x2 . (f) Let J4X = 6 be a system of m linear equations in n unknowns for which the augmented matrix is in reduced row echelon form. 2.x 3 + x4 = 3 2xi .3 Xi .x 3 .4x2 + 5x3 + 7x4 .2x2 . Use Gaussian elimination to solve the following systems of linear equations. then the systems ^4x = 6 and ^4'x = b' are equivalent.Sec.' — X2 + X3 + 2x4 — X5 = 2 Xi 4xi .4 Systems of Linear Equations—Computational Aspects 195 (b) If (A'\b') is obtained from (^4|6) by a finite sequence of elementary row operations. then the reduced row echelon form of A is In(d) Any matrix can be put in reduced row echelon form by means of a finite sequence of elementary row operations.2x4 .3x2 + x3 (b) 3xi .x 5 = 6 5 2 10 5 3xi — X2 + X3 — X4 + 2x5 = X] . where r equals the number of nonzero rows in A.

apply Exercise 3 to determine whether the system is consistent.x2 + 2x 3 + 4x 4 + x 5 = 2 Xi — x2 + 2X3 + 3X4 + x 5 = — 1 2xi — 3x2 + 6x3 + 9x4 + 4x5 = —5 7xi — 2x2 + 4x3 + 8x4 + X5 = 6 2xi + 3x3 ~ 4x5 — 5 3xi — 4x2 + 8x3 + 3x4 = 8 Xi — x2 + 2x3 + X4 — X5 = 2 —2xi + 5x2 — 9x3 — 3x4 — 5x5 = —8 (i) (j) 3. Let the reduced row echelon form of A be . Determine A if the first. 4. Finally.5 0 .4 / |6 / ) if and only if (^4'|6') contains a row in which the only nonzero entry lies in the last column. X\ + X2 — 3x3 + X4 = —2 xi + 2x2 — X3 + X4 = 2 (b) xi + x 2 + x 3 . If the system is consistent. second.Ax = 6 is transformed into a matrix (A'\b') in reduced row echelon form by a finite sequence of elementary row operations. For each of the systems that follow. Suppose that the augmented matrix of a system .x 3 = 0 xi + 2x2 ~ 3x3 + 2.T4 = 2 ~ (c) Xi + X2 — 3X3 + X4 = 1 Xi + X2 + X3 . (a) Prove that vank(A') ^ rank(.X4 = 2 (a) 2xi + x 2 + x 3 . 3 Elementary Matrix Operations and Systems of Linear Equations 3xi . (b) Deduce that Ax = 6 is consistent if and only if (^4'|6') contains no row in which the only nonzero entry lies in the last column.x 4 = 3 xi + x 2 . find all solutions.196 Chap. 6.3 J) 0 0 1 6. find a basis for the solution set of the corresponding homogeneous system.X4 = 2 Xi + X2 — X3 =0 '10 2 0 -2' 0 1 . and fourth columns of A are and respectively. Let the reduced row echelon form of A be /l 0 0 \0 -3 0 4 0 1 3 0 0 0 0 0 0 0 5\ 0 2 1 -1 0 0/ 5.

U4.u2.3 . (a) Show that S = {(1. (3.1.2. Let W denote the subspace of R5 consisting of all vectors having coordinates that sum to zero. . u4 = ( 2 .1. 7 ) ug} that is a basis for W. .Sec. .7.1.0. 6 ) . (b) Extend S to a basis for V.0)} is a linearly independent subset of V. Find a subset of S that is a basis for W. 8.6 . 1 5 . and u8 = ( 2 . Let V be as in Exercise 10.4. 3. V V 7.1.5 .x 5 ) G R 5 : Xi . and u5 = ( . Let V = {(xi. Find a subset of {u\. 9.1 2 .9 .4. l . w4 = (1.l .6 ) .X4.-9..1 8 . .4 Systems of Linear Equations—Computational Aspects Determine A if the first.1.2 .-2. Let W be the subspace of M2x2(R) consisting of the symmetric 2 x 2 matrices.3. . (b) Extend S to a basis for V.8 .us.x 4 + 2x 5 = 0}. u6 = ( 0 . u2 = ( . 10. . The set S = 1 2 2 3 2 1 1 9 1 -2 -2 4 -1 2 2 -1 generates W. . -17). 12.0. Find a subset of {ui. generate W. 1 2 ) . (1.-2).-5. third. 9 ..X3. (a) Show that S = {(0.-3). 2 . . respectively. Let V denote the set of all solutions to the system of linear equations ^i — #2 + 2x4 — 3x5 + XQ = 0 2xi — X2 — X3 + 3x4 — 4x5 + 4x6 = 0- . It can be shown that the vectors u\ = (2. uz = (-8.12. The vectors «i «3 u5 u7 = = = = (2.0)} is a linearly independent subset of V.. 9 .-2. 11.1.2x 2 + 3x 3 . 8 ) generate R3. and sixth columns of A are / 1\ -2 -1 i *) >—> ( and 1 2 \-4j 3\ -9 2 197 .1)..37. u2 = (1.1). (-1.2).x 2 .-3. —2).4 ) . . u2.3 . —3.2.u$} that is a basis for R3. .

1.0)} is a linearly independent subset of V. and 3 elementary operations 148 .0).2.1. 2.0. (0.1. (a) Show that S = {(1. prove that A is also in reduced row echelon form. (b) Extend 5 to a basis for V. If (A\b) is in reduced row echelon form. INDEX OF DEFINITIONS FOR CHAPTER 3 Augmented matrix 161 Augmented matrix of a system of linear equations 174 Backward pass 186 Closed model of a sirnpic economy 176 Coefficient matrix of a system of linear equations 169 Consistent system of linear equations 169 Elementary column operation 148 Elementary matrix 149 Elementary operation 148 Elementary row operation 148 Equilibrium condition for a simple economy 177 Equivalent systems of linear equations 182 Forward pass 186 Gaussian elimination 186 General solution of a system of linear equations 189 Homogeneous system corresponding to a nonhomogeneous system 172 Homogeneous system of linear equations 171 Inconsistent system of linear equations 169 Input.output matrix 177 Nonhomogeneous system of linear equations 171 Nonnegative vector 177 Open model of a simple economy 178 Positive matrix 177 Rank of a matrix 152 Reduced row echelon form of a matrix 185 Solution to a system of linear equations 169 Solution set of a system of equations 169 System of linear equations 169 Type 1.1. 13.198 Chap. 14. 15. -1. (1. (b) Extend S to a basis for V.1.1.0.1. 3 Elementary Matrix Operations and Systems of Linear Equations (a) Show that S = {(0.1.1. Prove the corollary to Theorem 3.0.0.0)} is a linearly independent subset of V.0).16: The reduced row echelon form of a matrix is unique.1. Let V be as in Exercise 12.

4 Determinants

4.1 Determinants of Order 2
4.2 Determinants of Order n
4.3 Properties of Determinants
4.4 Summary—Important Facts about Determinants
4.5* A Characterization of the Determinant

The determinant, which has played a prominent role in the theory of linear algebra, is a special scalar-valued function defined on the set of square matrices. Although it still has a place in the study of linear algebra and its applications, its role is less central than in former times. Yet no linear algebra book would be complete without a systematic treatment of the determinant, and we present one here. However, the main use of determinants in this book is to compute and establish the properties of eigenvalues, which we discuss in Chapter 5.

Although the determinant is not a linear transformation on M_{n×n}(F) for n > 1, it does possess a kind of linearity (called n-linearity) as well as other properties that are examined in this chapter. In Section 4.1, we consider the determinant on the set of 2 × 2 matrices and derive its important properties and develop an efficient computational procedure. To illustrate the important role that determinants play in geometry, we also include optional material that explores the applications of the determinant to the study of area and orientation. In Sections 4.2 and 4.3, we extend the definition of the determinant to all square matrices and derive its important properties and develop an efficient computational procedure. For the reader who prefers to treat determinants lightly, Section 4.4 contains the essential properties that are needed in later chapters. Finally, Section 4.5, which is optional, offers an axiomatic approach to determinants by showing how to characterize the determinant in terms of three key properties.

4.1 DETERMINANTS OF ORDER 2

In this section, we define the determinant of a 2 × 2 matrix and investigate its geometric significance in terms of area and orientation.

Definition. If

    A = [a  b]
        [c  d]

is a 2 × 2 matrix with entries from a field F, then we define the determinant of A, denoted det(A) or |A|, to be the scalar ad − bc.

Example 1

For the matrices

    A = [1  2]    and    B = [3  2]
        [3  4]               [6  4]

in M_{2×2}(R), we have

    det(A) = 1·4 − 2·3 = −2    and    det(B) = 3·4 − 2·6 = 0.  •

For the matrices A and B in Example 1, we have

    A + B = [4  4]
            [9  8],

and so det(A + B) = 4·8 − 4·9 = −4. Since det(A + B) ≠ det(A) + det(B), the function det: M_{2×2}(R) → R is not a linear transformation. Nevertheless, the determinant does possess an important linearity property, which is explained in the following theorem.

Theorem 4.1. The function det: M_{2×2}(F) → F is a linear function of each row of a 2 × 2 matrix when the other row is held fixed. That is, if u, v, and w are in F² and k is a scalar, then (writing (x; y) for the 2 × 2 matrix whose first row is x and whose second row is y)

    det(u + kv; w) = det(u; w) + k·det(v; w)

and

    det(w; u + kv) = det(w; u) + k·det(w; v).

Proof. Let u = (a₁, a₂), v = (b₁, b₂), and w = (c₁, c₂) be in F², and let k be a scalar. Then

    det(u; w) + k·det(v; w) = det((a₁, a₂); (c₁, c₂)) + k·det((b₁, b₂); (c₁, c₂))
                            = (a₁c₂ − a₂c₁) + k(b₁c₂ − b₂c₁)
                            = (a₁ + kb₁)c₂ − (a₂ + kb₂)c₁
                            = det((a₁ + kb₁, a₂ + kb₂); (c₁, c₂))
                            = det(u + kv; w).

A similar calculation shows that

    det(w; u) + k·det(w; v) = det(w; u + kv).

For the 2 × 2 matrices A and B in Example 1, it is easily checked that A is invertible but B is not. Note that det(A) ≠ 0 but det(B) = 0. We now show that this property is true in general.

Theorem 4.2. Let A ∈ M_{2×2}(F). Then the determinant of A is nonzero if and only if A is invertible. Moreover, if A is invertible, then

    A⁻¹ = (1/det(A)) [ A₂₂  −A₁₂]
                     [−A₂₁   A₁₁].

Proof. If det(A) ≠ 0, then we can define a matrix

    M = (1/det(A)) [ A₂₂  −A₁₂]
                   [−A₂₁   A₁₁].

A straightforward calculation shows that AM = MA = I, and so A is invertible and M = A⁻¹.

Conversely, suppose that A is invertible. A remark on page 152 shows that the rank of

    A = [A₁₁  A₁₂]
        [A₂₁  A₂₂]

must be 2. Hence A₁₁ ≠ 0 or A₂₁ ≠ 0. If A₁₁ ≠ 0, add −A₂₁/A₁₁ times row 1 of A to row 2 to obtain the matrix

    [A₁₁   A₁₂               ]
    [ 0    A₂₂ − A₁₂A₂₁/A₁₁  ].

Because elementary row operations are rank-preserving by the corollary to Theorem 3.4 (p. 153), it follows that

    A₂₂ − A₁₂A₂₁/A₁₁ ≠ 0.

Therefore det(A) = A₁₁A₂₂ − A₁₂A₂₁ ≠ 0. On the other hand, if A₂₁ ≠ 0, we see that det(A) ≠ 0 by adding −A₁₁/A₂₁ times row 2 of A to row 1 and applying a similar argument. Thus, in either case, det(A) ≠ 0.
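Both facts just established are easy to check by machine. The short sketch below (plain Python; the function and variable names are ours, and the rows u, v, w and scalar k in the linearity check are arbitrary illustrative values) verifies the linearity property of Theorem 4.1 for one choice of rows and computes the inverse given by Theorem 4.2 for the matrix A of Example 1.

```python
def det2(r1, r2):
    """Determinant of the 2 x 2 matrix with rows r1 and r2."""
    return r1[0] * r2[1] - r1[1] * r2[0]

# Theorem 4.1: det(u + k*v ; w) = det(u ; w) + k * det(v ; w).
u, v, w, k = (1, 2), (5, -1), (3, 4), 7
lhs = det2((u[0] + k * v[0], u[1] + k * v[1]), w)
rhs = det2(u, w) + k * det2(v, w)
print(lhs == rhs)                      # True

# Theorem 4.2: A^{-1} = (1/det(A)) [[A22, -A12], [-A21, A11]].
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]                   # the matrix A of Example 1, det(A) = -2
M = inverse_2x2(A)
print(M)                               # [[-2.0, 1.0], [1.5, -0.5]]
print([[sum(A[i][t] * M[t][j] for t in range(2)) for j in range(2)]
       for i in range(2)])             # [[1.0, 0.0], [0.0, 1.0]], so A * M = I
```

The same formula also shows why det(B) = 0 rules out invertibility for the matrix B of Example 1: the construction of M would require dividing by zero.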

In Sections 4.2 and 4.3, we extend the definition of the determinant to n × n matrices and show that Theorem 4.2 remains true in this more general context. In the remainder of this section, which can be omitted if desired, we explore the geometric significance of the determinant of a 2 × 2 matrix. In particular, we show the importance of the sign of the determinant in the study of orientation.

The Area of a Parallelogram

By the angle between two vectors in R², we mean the angle with measure θ (0 < θ < π) that is formed by the vectors having the same magnitude and direction as the given vectors but emanating from the origin. (See Figure 4.1.)

Figure 4.1: Angle between two vectors in R²

If β = {u, v} is an ordered basis for R², we define the orientation of β to be the real number

    O(u; v) = det(u; v) / |det(u; v)|.

(The denominator of this fraction is nonzero by Theorem 4.2.) Clearly O(u; v) = ±1. Notice that

    O(e₁; e₂) = 1    and    O(e₁; −e₂) = −1.

Recall that a coordinate system {u, v} is called right-handed if u can be rotated in a counterclockwise direction through an angle θ (0 < θ < π) to coincide with v.
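The orientation just defined, and the area interpretation of |det| developed in the rest of this section, are immediate to compute. A minimal sketch in plain Python (the function names are ours); the pair u = (−1, 5), v = (4, −2) is the one used later in this section, where the parallelogram it determines is shown to have area 18.

```python
def det2(u, v):
    """Determinant of the 2 x 2 matrix whose rows are u and v."""
    return u[0] * v[1] - u[1] * v[0]

def orientation(u, v):
    """Orientation of the ordered basis {u, v}: +1 for a right-handed
    system, -1 for a left-handed one (u, v assumed linearly independent)."""
    return 1 if det2(u, v) > 0 else -1

e1, e2 = (1, 0), (0, 1)
print(orientation(e1, e2))    # 1: the standard ordered basis is right-handed

u, v = (-1, 5), (4, -2)
print(det2(u, v))             # -18
print(orientation(u, v))      # -1: {u, v} is a left-handed system
print(abs(det2(u, v)))        # 18, the area of the parallelogram they determine
```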

1 Determinants of Order 2 203 to coincide with v. (See Figure 4.2 O = 1 if and only if the ordered basis {u.3: Parallelograms determined by u and v is linearly dependent (i. v} forms a right-handed coordinate system. Otherwise {u. we also define () = 1 if {u.e. 4. . if u and r are parallel). For convenience. (See Figure 4. we parallelogram if the set {//. V V I A right-handed coordinate system A left-handed coordinate system Figure 4. v) A A ^ M ' x Figure 4. then the "parallelogram" determined by u and v is actually a line segment.2.) In general (see Exercise 12). which we consider to be a degenerate parallelogram having area zero..) Observe that V in the following origin of R2. v} is called a left-handed system.Sec. Any ordered set {u.3. v} in R2 determines a parallelogram manner. Regarding u and v as arrows emanating from the call the parallelogram having u and v as adjacent sides the determined by u and v.v} is linearly dependent.

204 There is an interesting relationship between A Chap. =0 u\ "l-det \vI W -«**: we may multiply both sides of the desired equation by to obtain the equivalent form ° ' : • * : . although somewhat indirect.* » : . we cannot expect that < But we can prove that A|W v from which it follows that det Our argument that » > : . however.: employs a technique that. Observe first. and det which we now investigate. First. can be generalized to R n . since 0 [l) = H. 4 Determinants the area of the parallelogram determined by u and v.» : • . that since det may be negative.

since the altitude h of the parallelogram determined by u and cv is the same as that in the parallelogram determined by u and v.sr v^ |c|-A -A : A similar argument shows that Si0") v ~e.*(" cv J \v Observe that this equation is valid if c = 0 because '™J = ° l o -A o1 So assume that c ^ 0.t[u \v .4 " • : = ° : .Sec. : c )=c.) Hence Figure 4.4. we see that A[ J = base x altitude = |c|(length of v)(altitude) = \c\ • A ( ).1 Determinants of Order 2 205 We establish this equation by verifying that the three conditions of Exercise 11 are satisfied bv the function < = ° : -A : (a) We begin by showing that for any real number c l")-e.o r : . Regarding cv as the base of the parallelogram determined by u and cv. (See Figure 4.M . 4.

5 = A If a = 0.' \ u +-ic I \-w So the desired conclusion is obtained in either case.//•} is linearly independent. ) =a-6 [ /. •vi +v2 =8 II [O] + 0. Then for any vectors V[. then u 81 ) =a-6 \au + bwj "i.v\. it follows that Figure 4.t'2 € R2 there exist scalars a. 4 Determinants 1.W + bite (i = 1. Choose any vector w € R2 such that {(/. such that Vi = a. then au + bw J u+w = 61 ")=b-of \bw I \w by the first paragraph of (a). and b. )=»•*(" mi + bw J \w for any u.V2 G R2. We are now able to show that s( '•' )=s(u) \Vl+V2j V'"/ + 4 " V'-' for all u. Otherwise. Because the parallelograms determined by u and w and by u and u + w have a common base u and the same altitude (see Figure 4.2)11 + (6i + b2 = (bl+b2)6 ir . 2).5). Since the result is immediate if u = 0. Thus si .. we assume that u / 0. w € R2 and any real numbers a and b.206 We next prove that Chap. if a ^ 0.

U2. then A is invertible. (d) If u and v are vectors in R2 emanating from the origin.Sec.v 6 R2(b) Since A f£\ = 0. (c) Because the parallelogram determined by ei and e2 is the unit square. Label the following statements as true or false.2 ) is det det 4 -2 = 18. . (a) The function det: W\2x2(F) —* F is a linear transformation. d e t r vI \v Thus we see. then the area of the parallelogram having u and v as adjacent sides is det . )+*( !* a\u + biwl \a2u-\-b2wl A similar argument shows that 51U1+U2)=S(U1)+5(U2 for all ui. that the area of the parallelogram determined by u = (-1. 'j '2 e2 Therefore 5 satisfies the three conditions of Exercise 11. (c) If A e M2x2{F) and det(.5) and v = (4. EXERCISES 1.1 Determinants of Order 2 1. it follows that = *(" \V\ 6 C^\ = 0 f£\ • A (^) = 0 for any u G R2. (b) The determinant of a 2 x 2 matrix is a linear function of each row of the matrix when the other row is held fixed. So the area of the parallelogram determined by u and v equals r . and hence 5 = det.4) = 0. for example. 4.

9.4) and v = (2. Be M2X2{F). Aoo -A: 21 -A 1 2 A« 11.6 ) 5. Prove that if B is the matrix obtained by interchanging the rows of a 2 x 2 matrix A. .6 . 2. » (i) 8 is a linear function of each row of the matrix when the other row is held fixed. Prove that det{AB) = det(A) • det(B) for any A. then 8(A) = 0..det(A). 10..4 A 3 + 2* 2 . Prove that det(A') = det(.3) a n d u = (-3. Compute the determinants of the following matrices in M2x2(C). The classical adjoint of A1 is C*. (ii) If the two rows of A E M2x2(F) are identical.2 ) and v = (2. . compute the area of the parallelogram determined by u and v.A) for any A 6 M2X2(-F)8. 5) (b) u = (1. 4 Determinants (e) A coordinate system is right-handed if and only if its orientation equals 1. . then det(^l) equals the product of the diagonal entries of A. If A is invertible.l ) a n d v = ( . Prove that if A e M2 X 2(^) is upper triangular. 7. then dct(A) = 0. then A" 1 = [ d e t ^ ^ C . . For each of the following pairs of vectors u and v in R .208 Chap. Prove that if the two columns of A G M2x2(F) are identical. . (a) u = (3.1) (c) u = ( 4 . (b) / 5-2i (-3+ i 6 + 4A 7* . then det(B) = . v (C) /2i 3 I 4 6i 4. det(C) = det(A). Compute the determinants of the following matrices in M2x2(-ft)(a) 'I 1 ) « ( 1 i) W (a -f 3. The classical adjoint of a 2 x 2 matrix A G M2X2(-F) is the matrix C = Prove that (a) (b) (c) (d) 0 4 = AC = [det(A)]7. Let 8: N\2x2{F) — F be a function with the following three properties. 6.2 ) (d) u = (3. (a) '-1 + i 1 .3 i ] .

(iii) If I is the 2 × 2 identity matrix, then δ(I) = 1.

    Prove that δ(A) = det(A) for all A ∈ M_{2×2}(F). (This result is generalized in Section 4.5.)

12. Let {u, v} be an ordered basis for R². Prove that O(u; v) = 1 if and only if {u, v} forms a right-handed coordinate system. Hint: Recall the definition of a rotation given in Example 2 of Section 2.1.

4.2 DETERMINANTS OF ORDER n

In this section, we extend the definition of the determinant to n × n matrices for n ≥ 3. For this definition, it is convenient to introduce the following notation: Given A ∈ M_{n×n}(F), for n ≥ 2, denote the (n − 1) × (n − 1) matrix obtained from A by deleting row i and column j by Ã_ij. Thus for

    A = [1  2  3]
        [4  5  6]   ∈ M_{3×3}(R),
        [7  8  9]

we have

    Ã₁₁ = [5  6]        and        Ã₁₃ = [4  5]
          [8  9]                         [7  8],

and the same deletion of a row and a column applies to square matrices of any size.

Definitions. Let A ∈ M_{n×n}(F). If n = 1, so that A = (A₁₁), we define det(A) = A₁₁. For n ≥ 2, we define det(A) recursively as

    det(A) = Σ_{j=1}^{n} (−1)^(1+j) A₁ⱼ · det(Ã₁ⱼ).

The scalar det(A) is called the determinant of A and is also denoted by |A|. The scalar (−1)^(i+j) · det(Ã_ij) is called the cofactor of the entry of A in row i, column j. Note that, for 2 × 2 matrices, this definition of the determinant of A agrees with the one given in Section 4.1 because

    det(A) = A₁₁(−1)^(1+1) det(Ã₁₁) + A₁₂(−1)^(1+2) det(Ã₁₂) = A₁₁A₂₂ − A₁₂A₂₁.

Letting

    cᵢⱼ = (−1)^(i+j) · det(Ã_ij)

denote the cofactor of the row i, column j entry of A, we can express the formula for the determinant of A as

    det(A) = A₁₁c₁₁ + A₁₂c₁₂ + ··· + A₁ₙc₁ₙ.

Thus the determinant of A equals the sum of the products of each entry in row 1 of A multiplied by its cofactor. This formula is called cofactor expansion along the first row of A.

Example 1

Let

    A = [ 1   3  −3]
        [−3  −5   2]   ∈ M_{3×3}(R).
        [−4   4  −6]

Using cofactor expansion along the first row of A, we obtain

    det(A) = (−1)^(1+1)(1)·det(Ã₁₁) + (−1)^(1+2)(3)·det(Ã₁₂) + (−1)^(1+3)(−3)·det(Ã₁₃)

           = (1)·det[−5   2]  − (3)·det[−3   2]  + (−3)·det[−3  −5]
                    [ 4  −6]           [−4  −6]            [−4   4]

           = 1[(−5)(−6) − 2(4)] − 3[(−3)(−6) − 2(−4)] − 3[(−3)(4) − (−5)(−4)]

           = 1(22) − 3(26) − 3(−32)

           = 40.  •
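The recursive definition used in Example 1 can be transcribed almost directly into code. The sketch below (plain Python; the helper names are ours) expands along the first row exactly as in the definition; as noted near the end of this section, this approach is far too slow for large matrices and is meant only to mirror the formula.

```python
def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j
    (rows and columns are numbered from 0 here)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

# The matrix of Example 1; the expansion above gives det(A) = 40.
A = [[1, 3, -3],
     [-3, -5, 2],
     [-4, 4, -6]]
print(det(A))   # 40
```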

l ) 5 ( l ) .2 ( . \ 4 -4 4 -ey Using cofactor expansion along the first row of C and the results of Examples 1 and 2.2 .5 GM 3 x 3 (i?).5 2 GM 4 x 4 (fi).3 .Sec. ~\ = (-l)2(0)-det(^ + (-l)4(3)-det = 0 .(-5)(4)] + 3 [ . d e t I .det(C 12 ) + (-1) 4 (0)« det(C 13 ) + (-1) 5 (1).2 Determinants of Order // Example 2 Let B= 0 -2 \ 4 / 1 -3 -4 3\ .3 4 -4 = 2(40)+ 0 + 0 .det(B13) -^+(-l)3(l). 4/ 211 Using cofactor expansion along the first row of B.1(12)+ 3(20) = 48.(-3)(4)] = 0 . • . det(C 14 ) = (-l)2(2)-det 1 3 -3 -5 \-4 4 / -3\ 2 + 0 +0 -6/ -5 ( 0 1 •f ( . we obtain det(B) = (-l)1 +l Bu • det(Bn) + (-l)1+2Bi2- det(BV2) + (-iy+3BV3.4 ) . 4.det(-^ -2 4 -3 -4.3 C = -2 .1 [-2(4) . • Example 3 Let 2 0 0 1\ 0 1 3 .1(48) = 32. det(Cn) + (-1) 3 (0). we obtain det(C) = (-1) 2 (2).

1 (p. That is.212 Example 4 Chap. The result is clearly true for the l x l identity matrix. We now show that a similar property is true for determinants of any size.3. and each a^ are row vectors in F n . \ ar_! v ar+i det o r _i u + kv a r +i = det whenever k is a scalar and u. Proof. Assume that the determinant of the (n — 1) x (n — 1) identity matrix is 1 for some n > 2. and so the determinant of any identity matrix is 1 by the principle of mathematical induction.1) identity matrix. Using cofactor expansion along the first row of J. v. This shows that the determinant of the nxn identity matrix is 1. although the determinant of a 2 x 2 matrix is not a linear transformation. and let I denote the nxn identity matrix. The result is immediate if n = 1.1) x (n . Theorem 4. we obtain det(I) = (-1) 2 (1) • det(Jn) + (-1) 3 (0) • det(/ 1 2 ) + • • • + (-l)l+n(0). We prove this assertion by mathematical induction on n.det(/ln) = l(l) + 0 + --. for 1 < r < n. 200) that. even for matrices as small as 4 x 4 . 4 Determinants The determinant of the n x n identity matrix is 1. Later in this section.+ 0 = 1 because In is the (n . it is a linear function of each row when the other row is held fixed. • As is illustrated in Example 3. The determinant of an n x n matrix is a linear function of each row when the remaining rows are held fixed. the calculation of a determinant using the recursive definition is extremely tedious. but we must first learn more about them. Assume that for some integer n > 2 the determinant of any (n — l ) x ( n — 1) matrix is a linear function of each row when the remaining . The proof is by mathematical induction on n. we have "I \ ( «i \ ar_ i u + fcdet O-r+l ( a. Recall from Theorem 4. we present a more efficient method for evaluating determinants.

then det(B) = (-l)i+k det(Bik). See Exercise 24.. v G F n and some scalar k. Thus since A\j = Z?ij = C\j. Moreover. Our next theorem shows that the determinant of a square matrix can be evaluated by cofactor expansion along any row. a n ..bj+i + fccj+i.. which is the sum of row r — 1 of B\j and fc times row r — 1 of Cij. row r — 1 of Au is (bi+kci. the rows of Aij... Lemma. Proof. b2.. 6„ + fccn). Let A be an n x n matrix with rows a\. we have ar = u + kv for some u.l ) 1 + i A i i • dct(Bl3) + * 2 ( .2 Determinants of Order n 213 rows are held fixed. If A & M n x n ( F ) has a row consisting entirely of zeros... The definition of a determinant requires that the determinant of a matrix be evaluated by cofactor expansion along the first row. We must prove that det(A) = det(B) + fcdet(C). where n > 2. we have det(Aij) = det(Z?ij) + kdet(Cij) by the induction hypothesis. a2.. . Corollary. JBIJ. If row i of B equals e^ for some k (1 < k < n). We leave the proof of this fact to the reader for the case r = 1.Sec. Its proof requires the following technical result.. Since Z?ij and Cij are (n — 1) x (n — 1) matrices. c2. and C y are the same except for row r — 1. bj^i + kCj-i. respectively.. c n ).l ) 1 + i A y * det(Cy) = det(£) + fcdet(C).£ ( . and let £ and C be the matrices obtained from A by replacing row r of A by u and v. 4. For r > 1 and 1 ^ j < n-. This shows that the theorem is true for nxn matrices. and suppose that for some r (1 < r < n).. Let u = (&i. respectively. we have n det(A) = ^2(-l)1+j Au • d e t ( i i j ) = y > i ) ' + ' A ... bn) and v = (c\... Let B G M n x n ( F ) . • • •. and so the theorem is true for all square matrices by mathematical induction. then det(A) = 0.det(fiy) +Ardet(Cij .

if j > k. .det(C.l ) ^ i r d . it follows that (let(£) = (-l) i + f c det(G i f c )This shows that the lemma is true for n x n matrices. The lemma is easily proved for n = 2. and so the lemma is true for all square matrices by mathematical induction. we have (-l)«-i)+(*-i)det(Cy =I 0 V< (-l)(*-i)+*det(Cy) if j <k if j = k )ij>k. and let B be an n x n matrix in which row i of B equals e* for some /c (1 < k < n).3.1 . row i — 1 of Z?ij is the following vector in F n _ 1 : efc_i 0 efc ifj<fc if j = A . let Cij denote the (n — 2) x (n — 2) matrix obtained : from B by deleting rows 1 and i and columns j and k. j=i = ^ ( . Hence by the induction hypothesis and the corollary to Theorem 4. The proof is by mathematical induction on n. 4 Determinants Proof. e t ( C i j<fc + ^(_1)i+0-i)5lj. j>fc j<fc = ^ ( . f Therefore n det(B) = ^ ( .l ) i + > B y .l ) 1 + ^ i r det(By) + £ ( .l ^ Z V [(_i)(*-D+(*-i) det(C^) 3<k + £ ( .l ) 1 + ^ • det(Bn.214 Chap. the lemma is true for (n — 1) x (n — 1) matrices. Suppose therefore that 1 < i < n. The result follows immediately from the definition of the determinant if i = 1. For each j ^ A (1 < j < n).d e t ( B l i .l ) 1 + ^ l j .i+fc ^ ( . [(-!)(*-!)+*det(^) j>fc 1+ = ( . j>fc Because the expression inside the preceding bracket is the cofactor expansion of Bik along the first row. Assume that for some integer n > 3. For each j .

Since each Ay is an (n — 1) x (n — 1) matrix with two identical rows. Next we turn our attention to elementary row operations of type 1 (those in which two rows are interchanged).Sec.. 3=1 I Corollary. So the result is true if i = 1. n det(A) = ^ ( .3 provides this information for elementary row operations of type 2 (those in which a row is multiplied by a nonzero scalar). This completes the proof for n x n matrices.l ) i + J A i r d e t ( A i 3=1 by Theorem 4.2 Determinants of Order n 215 We are now able to prove that cofactor expansion along any row can be used to evaluate the determinant of a square matrix. Theorem 4. and hence det(A) = 0. then for any integer i (I < i < n).3 and the lemma. we can choose an integer i (1 < i < n) other than r and s. The determinant of a square matrix can be evaluated by cofactor expansion along any row.• det(A^).4. 4. i-i Proof. Theorem 4. Because n > 3. Now det(A) = ^ ( . and so the lemma is true for all square matrices by mathematical induction. Row i of A can be written as X)?=i ^ijej.For 1 < j < n. If A G M n X n (F) has two identical rows. We leave the proof of the result to the reader in the case that n = 2. it is true for (n — 1) x (n — 1) matrices. Proof.. let Bj denote the matrix obtained from A by replacing row i of A by ej. The proof is by mathematical induction on n. Assume that for some integer n > 3. It is possible to evaluate determinants more efficiently by combining cofactor expansion with the use of elementary row operations. we need to learn what happens to the determinant of a matrix if we perform an elementary row operation on that matrix.. if A G M n x n ( F ) . the induction hypothesis implies that each det(A^) = 0. . Then by Theorem 4. we have n det(A) =^Aij 3=1 n det(Bj) = ^(-1)* + J A. and let rows r and s of A G M n x n ( F ) be identical for r ^ s.4. Before such a process can be developed. That is.l ) J + M ^ . Fix i > 1. then det(A) = 0. d e t ( A ^ ) . Cofactor expansion along the first row of A gives the determinant of A by definition.

. Let the rows of A G M n X n ( F ) be a\. a2. a n . \ L T-ZZ .4 and Theorem 4. Therefore det(B) = . where r < s. Thus /ai\ a. Proof. Then det(B) = det(A). Let A G M n X n (F).d e t ( A ) . an /aA / \ ( *i \ / a.6. A = and B = ar \anJ \anJ (ttl\ as Consider the matrix obtained from A by replacing rows r and s by ar + at By the corollary to Theorem 4. Theorem 4. we have ( ai ar + as 0 = det ar + a.. det \ + det ar + a. If A G M n x n (F) and B is a matrix obtained from A by interchanging any two rows of A. then det(-B) = — det(A).216 Chap.. 4 Determinants Theorem 4.3. and let B be a matrix obtained by adding a multiple of one row of A to another row of A. det / /ai\ "/• det ar \an/ \an/ \ = det ar + a4 On / /ai\ a.. and let B be the matrix obtained from A by interchanging rows r and s. We now complete our investigation of how an elementary row operation affects the determinant of a matrix by showing that elementary row operations of type 3 do not change the determinant of a matrix. V an fai\ ar = det ar\an/ Va n / = 0 + det(A) + det(B) + 0.5.

3 to row s of B.Sec. then det(B) = det (A). These facts can be used to simplify the evaluation of a determinant. Then bi = ai for i ^ s and bs = as + kar. So there exist scalars Cj such that ar = c\a\ H h Cr-icir-i + cr+iar+i -I h cnan.. an of A are linearly dependent. Consider. det(B) = det(A). Corollary.. bn. Then row r of B consists entirely of zeros. for instance.b2... Let C be the matrix obtained from A by replacing row s with a r . Proof. we obtain det(B) = det(A) + Ardet(C) = det(A) because det(C) = 0 by the corollary to Theorem 4. But by Theorem 4. where r ^ s. I The following rules summarize the effect of an elementary row operation on the determinant of a matrix A G M n X n (F). Let B be the matrix obtained from A by adding — c? times row i to row r for each i ^ r. Applying Theorem 4. we can prove half of the promised generalization of this result in the following corollary. a2... (b) If B is a matrix obtained by multiplying a row of A by a nonzero scalar k. By Exercise 14 of Section 1. then the rows ai. 201). some row of A.4. 4.6. and the rows of B be b\. Hence det(A) = 0 .6... If the rank of A is less than n. (c) If B is a matrix obtained by adding a multiple of one row of A to another row of A. we obtain M = .. Let the rows of A be a\. then det(A) = 0.2 Determinants of Order 217 Proof.7. and so det(B) = 0.. (a) If B is a matrix obtained by interchanging any two rows of A. then det(B) = fcdet(A). row r. If A £ M n x n ( F ) has rank less than n. an. the matrix in Example 1: -3\ 4 -6/ Adding 3 times row 1 of A to row 2 and 4 times row 1 to row 3. say.5. Suppose that B is the nxn matrix obtained from A by adding k times row r to row s. As a consequence of Theorem 4.2 (p. a2. is a linear combination of the other rows. In Theorem 4... then det(B) = -det(A). we proved that a 2 x 2 matrix is invertible if and only if its determinant is nonzero. The converse is proved in the corollary to Theorem 4.

l ) 1 + 1 ( l ) . 4 Determinants Since M was obtained by performing two type 3 elementary row operations on A.l ) 1 + 2 ( 3 ) * det(M 12 ) + (-l)1+3(-3)*det(M13). det(Mn) = ( . it follows that det(A) = 40. as described earlier.10 = 40. The preceding calculation of det(P) illustrates an important general fact.6.1 ) 1 + 1 ( 1 ) . det ( / 6 _~l = l[4(-18)-(-7)(16)]=40. d e t ( P n ) = (-l)1+1(l)-det^ ~ £ ) =1*4. and so we can easily evaluate the determinant of any square matrix. Thus with the use of two elementary row operations of type 3. Both Mi2 and M\3 have a column consisting entirely of zeros.218 Chap.1 ) ^ ( 1 ) . we have reduced the computation of det (A) to the evaluation of one determinant of a 2 x 2 matrix.1 ) 1 + 1 ( 1 ) . The determinant of an upper triangular matrix is the product of its diagonal entries. we have det(A) = det(M). we can transform any square matrix into an upper triangular matrix. The next two examples illustrate this technique. But we can do even better. we have det(P) = ( . The cofactor expansion of M along the first row gives det(M) = ( . det(Mn) + ( . Hence det(M) = ( . Since det(A) = det(M) = det(P). (See Exercise 23. If we add —4 times row 2 of M to row 3 (another elementary row operation of type 3).) By using elementary row operations of types 1 and 3 only. Example 5 To evaluate the determinant of the matrix . we obtain P = Evaluating det(P) by cofactor expansion along the first row. and so det(Mi2) = det(Mi 3 ) = 0 by the corollary to Theorem 4.

4 = 32 Using elementary row operations to evaluate the determinant of a matrix. it follows that det(B) = . we must begin with a row interchange. Consider first the evaluation of a 2 x 2 matrix.4. Example 6 The technique in Example 5 can be used to evaluate the determinant of the matrix C = 2 0 -2 4 0 1 -3 -4 0 3 -5 4 A -3 2 • .v in Example 3.Sec. Interchanging rows 1 and 2 of B produces C = -2 0 4 -3 -5 1 3 -4 4 By means of a sequence of elementary row operations of type 3.2 0 ^ /2 0 0 \° T hus det(C) = 2. Since C was obtained from B by an interchange of rows. 4.det(C) = 48. we can transform C into an upper triangular matrix: Thus det(C) = -2-1-24 = -48. . 1 = ad — be. is far more efficient than using cofactor expansion.2 Determinants of Order n 219 in Example 2. This matrix can be transformed into an upper triangular matrix by means of the following sequence of elementary row operations of type 3: / 2 0 -2 \ 4 0 1 -3 -4 0 3 -5 4 A -3 2 -6) (2 0 0 0 1 -3 -4 0 1 0 0 • 0 3 -5 4 0 3 4 0 A -3 3 -8/ A -3 -6 V f2 0 * 0 • \0 0 1 0 0 0 A 3 -3 4 -6 16 . Since det I .1. as illustrated in Example 6.
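The procedure of Examples 5 and 6 (reach an upper triangular matrix using row interchanges and type 3 operations, then take the product of the diagonal entries, with one sign change per interchange) is the efficient method referred to here. A minimal sketch in plain Python (the function name is ours); the two test matrices are the 3 × 3 and 4 × 4 matrices evaluated in Examples 5 and 6, for which the text obtains 48 and 32.

```python
def det_by_elimination(A):
    """Determinant via reduction to upper triangular form, using only row
    interchanges (each changes the sign) and additions of multiples of one
    row to another (which leave the determinant unchanged)."""
    A = [row[:] for row in A]          # work on a copy
    n = len(A)
    sign = 1
    for j in range(n):
        # Find a row at or below row j with a nonzero entry in column j.
        pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
        if pivot is None:
            return 0                   # a zero column below the diagonal: det = 0
        if pivot != j:
            A[j], A[pivot] = A[pivot], A[j]
            sign = -sign               # type 1 operation negates the determinant
        for i in range(j + 1, n):      # type 3 operations: determinant unchanged
            factor = A[i][j] / A[j][j]
            A[i] = [A[i][t] - factor * A[j][t] for t in range(n)]
    product = 1
    for j in range(n):
        product *= A[j][j]
    return sign * product

B = [[0, 1, 3],
     [-2, -3, -5],
     [4, -4, 4]]        # Example 5: one interchange is needed, det(B) = 48
C = [[2, 0, 0, 1],
     [0, 1, 3, -3],
     [-2, -3, -5, 2],
     [4, -4, 4, -6]]    # Example 6: type 3 operations suffice, det(C) = 32
print(det_by_elimination(B))   # 48.0
print(det_by_elimination(C))   # 32.0
```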

To evaluate the determinant of a 20 x 20 matrix. Thus in all. In this section. In the next section. then det(A) = 0. (g) If A G M n x „ ( F ) has rank n. By contrast. then det(B) = —det(A). (d) If B is a matrix obtained from a square matrix A by interchanging any two rows. which is not large by present standards.4 x 1018 multiplications. we have defined the determinant of a square matrix in terms of cofactor expansion along the first row. we continue this approach to discover additional properties of determinants. . then det(B) = det(A). In addition. then det(A) = 0. Label the following statements as true or false. EXERCISES 1. then det(B) = kdct(A). Thus it would take a computer performing one billion multiplications per second over 77 years to evaluate the determinant of a 20 x 20 matrix by this method. We then showed that the determinant of a square matrix can be evaluated using cofactor expansion along any row. These properties enable us to evaluate determinants much more efficiently. (e) If B is a matrix obtained from a square matrix A by multiplying a row of A by a scalar. (h) The determinant of an upper triangular matrix equals the product of its diagonal entries.220 Chap. (f) If B is a matrix obtained from a square matrix A by adding k times row i to row j. the method using elementary row operations requires only 2679 multiplications for this calculation and would take the same computer less than three-millionths of a second! It is easy to see why most computer programs for evaluating the determinant of an arbitrary matrix do not use cofactor expansion. the evaluation of the determinant of an n x n matrix by cofactor expansion along any row requires over n! multiplications. (c) If two rows of a square matrix A are identical. we showed that the determinant possesses a number of special properties. including properties that enable us to calculate det(B) from det (A) whenever B is a matrix obtained from A by means of an elementary row operation. cofactor expansion along a row requires over 20! ~ 2. whereas evaluating the determinant of an n x n matrix by elementary row operations as in Examples 5 and 6 can be shown to require only (n 3 + 2n — 3)/3 multiplications. 4 Determinants the evaluation of the determinant of a 2 x 2 matrix requires 2 multiplications (and 1 subtraction). > (b) The determinant of a square matrix can be evaluated by cofactor expansion along any row. (a) The function det: M n X n ( F ) — F is a linear transformation. For n > 3. evaluating the determinant of an n x n matrix by cofactor expansion along any row expresses the determinant as a sum of n products involving determinants of (n— 1) x (n— 1) matrices.

+61 b2 + c2 a 2 + c2 a2+62 b3 + c3\ /oj a 3 + c 3 j = /edet I 6i a3 + 6 3 / \d a2 62 c 2 a3' 63 c3/ In Exercises 5-12. Find the value of k that satisfies the following equation: /3a i det I 36i \3ci 3a 2 3o2 3c2 3a 3 \ fa\ 363 = kdct I &i 3c 3 / \d a2 b2 e2 a3\ b3 . c3/ 4.2 Determinants of Order n 2. evaluate the determinant of the given matrix by any legitimate method. evaluate the determinant of the given matrix by cofactor expansion along the indicated row. In Exercises 13-22.Sec. 4./ along the second row / 12. 1 -1 2 -A -3 4 1 . Find the value of k that satisfies the following equation: / det 2a i 36i + 5d \ 7cj 2a 2 362 + 5c2 7c2 2a 3 \ /oj a2 363 + 5c3 = fcdct [ 6X b2 7c3 / \c\ c2 a3\ b3 .1 0 1 1 2 O y v-l along the fourth row (% 2+i 0 \ -1 3 2i \ 10. Find the value of k that satisfies the following equation: /bi+ci det j ai + ci \ a . \ 0 -1 1 -.1 2 .. along the fourth row 11.5 . : J 5.2 2 3 . V 2 3 0/ along the first row 0 -1 7. c3) 221 3. ^-1 along the 0 2\ 1 5 3 0/ third row . . 0 1+i 2 \ -2i 0 1 r« 3 4i 0 y along the third row / 0 2 1 3\ 1 0 . second row !). 2 along the I 2N 0 -3 3 0. -1 3 0/ along the first row 1 0 8.3 8 ^-2 6 -4 I. 1 0 2\ 0 1 5 (i.

Calculate det(Z?) in terms of det(A). This result is very useful in studying the effects on the determinant of applying a sequence of elementary row operations. 25. Prove that the determinant of an upper triangular matrix is the product of its diagonal entries. Under what conditions is det(-A) = det(A)? 27. Prove that det(ArA) = kn det(A) for any A G M n x n ( F ) . 4. Because the determinant ... then det(A) = 0. then det(F') = det(F).222 Chap. 30. Let A G M n x n ( F ) . 28.' Prove that if E is an elementary matrix. . and let B be the matrix in which the rows are a n . we saw that performing an elementary row operation on a matrix can be accomplished by multiplying the matrix by an elementary matrix.1. a n _ i . . 4 Determinants 21... 29. an. Prove that if A G M n X n ( F ) has two identical columns..3 PROPERTIES OF DETERMINANTS In Theorem 3. . Let the rows of A G M n x n ( F ) be a\. 23. 26. Prove the corollary to Theorem 4. 24. a2. is an elementary matrix of type i. Compute det(Fj) if F.aj.3. .

For any A. 159).1 . if A G MnXTl(F) is invertible. The first paragraph of this proof shows that det(AB) = det(F r = det(F r •E2ElB) det(Fm_!---F2Fi£) = det(F m ) = det(A)-det(P). v . if A has rank n. Hence by Theorem 4. (a) If E is an elementary matrix obtained by interchanging any two rows of/. 216). then det(A) = 0 by the corollary to Theorem 4. Theorem 4. we can interpret the statements on page 217 as the following facts about the determinants of elementary matrices. we have det(AB) = 0.det(B). If A is an elementary matrix obtained by interchanging two rows of / . Similar arguments establish the result when A is an elementary matrix of type 2 or type 3.7. We begin by establishing the result when A is an elementary matrix.(F). We now apply these facts about determinants of elementary matrices to prove that the determinant is a multiplicative function. then det(F) = k. 216).) If A is an nxn matrix with rank less than n. det(AP) = — dct(B) = det(A). det(A) Proof. 149). (b) If E is an elementary matrix obtained by multiplying some row of / by the nonzero scalar k. then the rank of A is less than n. then det(F) = 1.1 (p.det(P). then det(A) = . (See Exercise 18.det(P) in this case. det(A _ 1 ) = det(AA _ 1 ) = det(/) = 1 .2).5 (p. 4. then det(A _ 1 ) = -—7—-. 217). AB is a matrix obtained by interchanging two rows of B. So det(A) = 0 by the corollary to Theorem 4. Furthermore. If A G M n X n ( F ) is not invertible. if A is invertible.6 p. On the other hand. But by Theorem 3. Proof. A matrix A G M n x n ( F ) is invertible if and only if det(A) 7^ 0.6 (p.7 (p. then det(A). then A is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3. (c) If E is an elementary matrix obtained by adding a multiple of some row of J to another row.• Sec..6 (p. say.3 Properties of Determinants 223 of the nxn identity matrix is 1 (see Example 4 in Section 4. det(AP) = det(A). thendet(F) = .1 . det(F 2 ) • det(Fi) • det(P) I = det(Fm---F2Fi)-det(P) Corollary. B G . On the other hand. Thus det(AB) = det(A). 159). Since rank(AP>) < rank(A) < n by Theorem 3. A = Em • • • E2Ei.
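The multiplicative property det(AB) = det(A)·det(B), its corollary det(A⁻¹) = 1/det(A), and the equality det(Aᵗ) = det(A) can all be spot-checked numerically. A small sketch using numpy (the library and the random test matrices are ours, not the text's):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 4)).astype(float)
B = rng.integers(-5, 6, size=(4, 4)).astype(float)

# det(AB) = det(A) * det(B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# det(A^t) = det(A)
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))

# If A is invertible (det(A) != 0), then det(A^{-1}) = 1 / det(A).
if not np.isclose(np.linalg.det(A), 0):
    print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))
```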

say A = Em.. Thus det(A') = 0 = det(A) in this case. then rank(A) < n.2.det(F 2 )---. For any A G M n x n (F).det(F2) = d e t ( F 0 .2.. I Among the many consequences of Theorem 4.. we have used only the rows of a matrix. On the other hand.7 we have det(A') = det(E\El2 . Since the rows of A are the columns of A6. Our next result shows that the determinants of A and A1 are always equal. Xk _ det(Mfc) " det(A) ' . If det(A) ^ 0. then tin's system has a unique solution. det(A') = det(A). T h e o r e m 4..9 ( C r a m e r ' s Rule).G (p. Hence det(A) ^ 0 and det(A~ 1 ) = Chap. Since det(F.x2. by Theorem 4. 158).2 used elementary row operations. But rank(A') = rank(A) by Corollary 2 to Theorem 3.7.) We conclude our discussion of determinant properties with a well-known result that relates determinants to the solutions of certain types of systems of linear equations. and that elementary column operations can be used as well as elementary row operations in evaluating a determinant..• <let(F m ) = d e t ( F m ) .224 by Theorem 4.fit ) det(F^) = det(Fi).. if A is invertible.• • E2E\. For example. 4 Determinants det(A) In our discussion of determinants until now. in either case. Let Ax = b be the matrix form of a system of n linear equations in n unknowns. If A is not invertible. • • • • det(F 2 ) •det(Fi) = det(F m • •• F 2 F 0 = det(A). and for each k (k = l.. Proof. det(A') = det(A). this fact enables us to translate any statement about determinants that involves the rows of a matrix into a corresponding statement that involves its columns. and the more efficient method developed in Section 4.8. where x = (x\.n). then A is a product of elementary matrices.xny. and so A1 is not invertible. Theorem 4.) = det(F|) for every i by Exercise 29 of Section 4. (The effect on the determinant of performing an elementary column operation is the same as the effect of performing the corresponding elementary row operation. the recursive definition of a determinant involved cofactor expansion along a row..8 are that determinants can be evaluated by cofactor expansion along a column.. Thus. .

Then by Theorem 2. then the system Ax = b has a unique solution by the corollary to Theorem 4.9 by using Cramer's rule to solve the following system of linear equations: xi + 2x2 + 3x 3 = 2 x\ + x3 = 3 x\ + x2 — x3 = 1. 90). we have '2 2 13 0 1 1 det(A) 2 I1 3 1 det(A) 3> 1 -1 I det(A)-xk= xk- det (Mi) Xi = det (A) 15 6 det(M 2 ) x2 = det(A) 3s 1 -1 -6 = -1. Cramer's rule applies.1 -det(M fc ).Sec. . The matrix form of this system of linear equations is Ax = b.10 (p. If det (A) ^ 0. 174). 225 matrix obtained from A by replacing column k of A Proof.Evaluating Xk by cofactor expansion along row k produces det(X fc ) = xk-det(I„-i) Hence by Theorem 4.9. AXk is the nxn matrix whose zth column is Aei = ai if i ^ k and Ax = b if i = k. let ak denote the kth column of A and Xk denote the matrix obtained from the nxn identity matrix by replacing column k by x. where '1 ll Because det (A) = 6 ^ 0 . Thus AXk = Mk. det(Mfc) = det(AXk) = det(A).det(Xfc) = Therefore xfc = [det(A)]. For each integer k (1 < k < n).7.13 (p. 4. Example 1 We illustrate Theorem 4.3 Properties of Determinants where Mk is the nxn byb. Using the notation of Theorem 4.7 and Theorem 3.

its volume should be \/6*\/2*\/3 = 6. a 2 . Freeman and Company. rather than of computational value. —1). 4 Determinants 6 2 Thus the unique solution to the given system of linear equations is (xx. respectively. 524. as the determinant calculation shows. p. Thus Cramer's rule is primarily of theoretical and aesthetic interest. which was discussed in Section 3. Cramer's rule can be useful because it implies that a system of linear equations with integral coefficients has an integral solution if the determinant of its coefficient matrix is ±1. Hence by the familiar formula for volume. In this situation. If the rows of A are a i . see Jerrold E. Elementary Classical Analysis. . . On the other hand. 1993. (For a proof of a more generalized result.1.0. As in Section 4. W. a n . Marsden and Michael J.-2. we sometimes need to know that there is a solution in which the unknowns are integers. The amount of computation to do this is far greater than that required to solve the system by the method of Gaussian elimination.1).1.1) as adjacent sides is = 6. then |det(A)| is the n-dimensional volume (the generalization of area in R2 and volume in R 3 ) of the parallelepiped having the vectors a i .x3) = ( 2'-1'2 In applications involving systems of linear equations. a 2 = (1.226 and det det(M 3 ) ^3 = det(A) det(A) Chap. a 2 . Cramer's rule is not useful for computation because it requires evaluating n + 1 determinants o f n x n matrices to solve a system of n linear equations in n unknowns. . Note that the object in question is a rectangular parallelepiped (see Figure 4. and a 3 = (1. \/2. it is possible to interpret the determinant of a matrix A G M n x n ( P ) geometrically. Hoffman. .a„ as adjacent sides. .6) with sides of lengths \/6. New York. .) Example 2 The volume of the parallelepiped having the vectors O] = (1.H. .4. we also saw that this . • In our earlier discussion of the geometric significance of the determinant formed from the vectors in an ordered basis for R2. and y/3. .x2.
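Both computations above, the solution of the 3 × 3 system by Cramer's rule and the volume of the parallelepiped as an absolute determinant, take only a few lines. A sketch using numpy (the library choice and the helper name are ours); the system and the three vectors are the ones from the two examples.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; assumes det(A) != 0."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for k in range(len(b)):
        Mk = A.copy()
        Mk[:, k] = b                 # replace column k of A by b
        x[k] = np.linalg.det(Mk) / d
    return x

A = [[1, 2, 3],
     [1, 0, 1],
     [1, 1, -1]]
b = [2, 3, 1]
print(cramer(A, b))                  # approximately [ 2.5 -1.   0.5]
print(np.linalg.solve(A, b))         # the same solution

# Volume of the parallelepiped with adjacent sides (1,-2,1), (1,0,-1), (1,1,1):
P = [[1, -2, 1], [1, 0, -1], [1, 1, 1]]
print(abs(np.linalg.det(P)))         # 6.0, up to rounding
```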

Sec. if 7 is any ordered basis for Rn and (3 is the standard ordered basis for R n .0. then 7 induces a right-handed coordinate system if and only if det(Q) > 0. A similar statement is true in R".1) (1. for instance. 7 = induces a left-handed coordinate system in R3 because whereas 7 = induces a right-handed coordinate system in R3 because r det 2 \0 "2 °\ 1 0 = 5 > 0. 4. determinant is positive if and only if the basis induces a right-handed coordinate system.6: Parallelepiped determined by three vectors in R3. 0 1/ .-1) Figure 4.3 Properties of Determinants 227 (1. Thus. where Q is the change of coordinate matrix changing 7-coordinates into /^-coordinates. Specifically.-2.

2x2 + x3= 0 3xi + 4x2 — 2x3 = —5 2xi + x 2 . (h) Let Ax = b be the matrix form of a system of n linear equations in n unknowns.a i 2 a 2 i ^ 0 2xi + x2 — 3x 3 = 1 x\ .3x 3 = 5 3. matrix is invertible if and only if . if (3 and 7 are two ordered bases for R n . EXERCISES 1. . x 2 .2 x i . (a) (b) (c) (d) (e) (f) If E is an elementary matrix. For any A G M n x n ( F ) . Use Theorem 4.x 2 + 4x 3 = . For any A.2x 2 + x 3 = 10 3x! + 4x 2 .det(A). -8x1 + 3x 2 + x 3 = 0 2xi . . but for columns. . where x = (x\.2x 3 = 0 xi . 212). x\ — x 2 + Ax3 = —2 6. then the unique solution of Ax = b is Xk ~ _ det(M fc ) det(A) i Q v k _ l 2 tar *-*M. then det(F) = ±1. x n ) f .x 2 + x 3 = 6 8.8 4. det(AB) = det(A) • det(B). where Q is the change of coordinate matrix changing 7-coordinates into /^-coordinates. xi .n.228 Chap.3 (p. Label the following statements as true or false. -8x1 + 3x2 + x 3 = 8 2xi — x 2 + x 3 = 0 3xi + x 2 + x 3 = 4 7. In Exercises 2-7. A matrix M G M n X n ( F ) has rank n if and only if det(M) ^ 0.4 5. If det(A) 7^ 0 and if Mk is the nxn matrix obtained from A by replacing row k of A by bl. . 4 Determinants More generally. 2. a-nxi + d\2X2 = b\ a2\X\ + a22x2 = b2 where ana22 . 9. Prove that an upper triangular nxn all its diagonal entries are nonzero.x 2 =12 xi + 2x 2 + x 3 = . . (g) Every system of n linear equations in n unknowns can be solved by Cramer's rule.-.8 to prove a result analogous to Theorem 4.B G M n X n (F). The determinant of a square matrix can be evaluated by cofactor expansion along any column. use Cramer's rule to solve the given system of linear equations. then the coordinate systems induced by (3 and 7 have the same orientation (either both are right-handed or both are left-handed) if and only if det(Q) > 0. A matrix M G M n x n ( F ) is invertible if and only if det(M) = 0. det (A*) = .

Sec. Prove that 0 is a basis for F n if and only if det(B) ^ 0. (a) Prove that det(Af) = det(M). Prove that if Q is orthogonal. A matrix M G M n x n ( C ) is called nilpotent if. then M is not invertible. then det(Q) = ±1. where Q* = Ql. Suppose that A is a lower triangular matrix. u2. Prove that if M is skew-symmetric and n is odd. What happens if n is even? 12. 4. Prove that if M is nilpotent. Describe det (A) in terms of the entries of A. • • •.j. For M G M n x n (C). 16. 17. then det(M) = det(A)« det(C). 20. /. then det(AB) = det(A) • det(B). 15. then det(A) = det(B). Prove that if Q is a unitary matrix. B G M n X n ( F ) are such that AB =-. let M be the matrix such that (M)ij = ~M~~j for all i.3 Properties of Determinants 229 10. 13. (b) A matrix Q G M n X n (C) is called unitary if QQ* = / . A matrix A G M n x n ( F ) is called lower triangular if Atj — 0 for 1 < » < J < Ti.Be Mnxn(F) be such that AB = -BA. B G M n x n ( F ) are similar. Let /3 = {ui. and let B be the matrix in M n X n ( F ) having Uj as column j. 18. Let A. un} be a subset of F n containing n distinct vectors. Suppose that M G M n X n ( F ) can be written in the form M = where A is a square matrix. . Use determinants to prove that if A. then det(M) = 0.* Prove that if A.1 ) . + Prove that if M G M n x n ( F ) can be written in the form M = 'A O B' C wheje A and C are square matrices. Prove that if n is odd and F is not a field of characteristic two. A matrix M G M n x n ( C ) is called skew-symmetric if Ml = — M.7 by showing that if A is an elementary matrix of type 2 or type 3. then | det(Q)| = 114. Complete the proof of Theorem 4. 11. 19. 21. Prove that det(M) = det(A). where Mij is the complex conjugate of Mij. then A is invertible (and hence B = A . A matrix Q G M n x n (i?) is called orthogonal if QQl — I. for some positive integer k. Mk = O. then A or B is not invertible. where O is the nxn zero matrix.

then det(B) = Cjk- . (b) Use Exercise 22 of Section 2. Prove that there exists a A x A submatrix with a nonzero determinant. Prove that rank(A) = k.(F).c n are distinct scalars in an infinite field F. 4 Determinants 22.4 to prove that det(M) •/ 0.• by ej. . 0<i<j<n the product of all terms of the form c3. : : 24. / ( d ) . . (b) Conversely. suppose that rank (A) = k. (c) Prove that det(M)= J] (<*-*). Let A G M „ x n ( F ) be nonzero. an m x m s u b m a t r i x is obtained by deleting any n ~ m rows and any n — m columns of A. Let T: P„(F) — F n + 1 be the linear transformation defined in Exer> cise 22 of Section 2.ci. Let c.230 Chap. Let A G M n X n ( F ) have the form ( 0 1 0 A = \ 0 0 0 0 0 -1 0 0 0 ••• -1 ao \ ax "2 On-l/ Compute det(A + tl).. where J is the nxn identity matrix. — C{ for 0 < i < j < n.jk denote the cofactor of the row j. (a) Show that M = [T]l has the form Co Cf]\ \l C cnJ A matrix with this form is called a V a n d e r m o n d e matrix. where^ cn. (a) Let k (1 < k < n) denote the largest integer such that some k x k submatrix has a nonzero determinant. . . column k entry of the matrix A G M nX7 . (a) Prove that if B is t he matrix obtained from A by replacing column /. Let 0 be the standard ordered basis for P n ( F ) and 7 be the standard ordered basis for F n + 1 .. 23.4 by T ( / ) = (/(co). For any m (1 < m < n).. 25. . f(cn)).

Sec.. 27. The classical adjoint of a . then A" 1 = [det(A)]~lC.3 Properties of Determinants (b) Show that for 1 < j < n. 4. '4 0 0> (b) ( 0 4 0 0 0 4. (d) Show that if det(A) / 0. Find the classical adjoint of each of the following matrices. we have cj2 A \CjnJ = det(A) 231 Hint: Apply Cramer's rule to Ax = ej.square matrix A is the transpose of the matrix whose ij-entry is the ij-cofactor of A.(t) y2(t) v'zit) (n). then C and A . The following definition is used in Exercises 26 27.. 28.yn be linearly independent functions in C°°. y}2'"(t) ••• ••• yn(t)\ y'Jt) y(n\t)I [T(y)](i)=det \ .. (a) det(C) = [detCA)]""1.ji. (c) If A is an invertible upper triangular matrix. (b) Cl is the classical adjoint of A1. For each y G C°°.. Let y\. 26.1 are both upper triangular matrices.y2. (c) Deduce that if C is the nxn matrix such that Cij = c. define T(y) G C°° by / y(t) y'(t) Wn)(t) vi(t) . Definition. then AC = [det(A)]7. Prove the following statements. m in) y\n. Let C be the classical adjoint of A G M r i X n (F).

yn})yn. In the formulas above. where Aij is the (n-l)x(n-l) matrix obtained by deleting row i and column j from A. then det (A) = A n .. If A is 2 x 2. the scalar (—l)i+J' det(Aij) is called the cofactor of the row i column j entry of A.2 and 4. until 2 x 2 matrices are obtained.. \ 4. the determinant of A is evaluated as the sum of terms obtained by multiplying each entry of some row or column of A by the cofactor of that entry. we summarize the important properties of the determinant needed for the remainder of the text.•. 4 Determinants The preceding determinant is called the Wronskian of y. If A is 1 x 1.4 SUMMARY—IMPORTANT FACTS ABOUT DETERMINANTS In this section. 2.3. and can be computed in the following manner: 1. . denoted by det(A) or |A|. y2. the single entry of A.1 ) * + ' A W • det(Aij) (if the determinant is evaluated by the entries of row i of A) or n det(A) = Y^{-tf+JAij' i=l det(iij) (if the determinant is evaluated by the entries of column j of A). the facts presented here are stated without proofs. y\. The determinant of an n x n matrix A having entries from a field F is a scalar in F . • . » (b) Prove that N(T) contains span({yi. In this language. "13- ( 5 3) = ( "1)(3) " (2)(5) = 3. consequently. and so forth.Ai2A2\. then n det(A) = J 3 ( . det For example..232 Chap. then det(A) = AnA 2 2 .. The results contained in this section have been derived in Sections 4. If A is n x n for n > 2. Thus det (A) is expressed in terms of n determinants of (n — 1) x (n — 1) matrices. (a) Prove that T: C°° — C°° is a linear transformation. These determinants are then evaluated in terms of determinants of (n — 2) x (n — 2) matrices. The determinants of the 2 x 2 matrices are then evaluated as in item 2.
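Either formula above, expansion by the entries of any row i or of any column j, produces the same scalar. The sketch below (plain Python; all function names are ours) checks this on the 4 × 4 matrix used as Example 3 of Section 4.2, whose determinant is 32.

```python
def minor(A, i, j):
    """Delete row i and column j (numbered from 0) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Reference value: cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def expand_row(A, i):
    """Cofactor expansion by the entries of row i."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(len(A)))

def expand_col(A, j):
    """Cofactor expansion by the entries of column j."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for i in range(len(A)))

A = [[2, 0, 0, 1],
     [0, 1, 3, -3],
     [-2, -3, -5, 2],
     [4, -4, 4, -6]]      # the 4 x 4 matrix of Example 3 in Section 4.2
print([expand_row(A, i) for i in range(4)])   # [32, 32, 32, 32]
print([expand_col(A, j) for j in range(4)])   # [32, 32, 32, 32]
```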

we must know the cofactors of each entry of that row. We have det(B) = (-lY^(l)det (l43 " J ) + ( . and —13. Thus det(A) = ( .1 A = 2 0 . let US also compute the determinant of A by expansion along the second column. The cofactor of A.l ) 4 + 1 d e t ( £ ) . A 22 .4 .l \ (2 -3 I +(-l)2+2(l)det 2 1 2 / \3 1 .3 1 \3 6 1 2) To evaluate the determinant of A by expanding along the fourth row. We can now evaluate the determinant of A by multiplying each entry of the fourth row by its cofactor: t his gives det(A) = 3(23) + 6(8) + 1(11) + 2(-13) = 102.l ) * + i ( l ) d c t ( J . respectively. 10. A 43 . respectively. \ . and 8.(5)(-3)] + 0 = . -4 -3 -1 1 + (-l)3+2(0)det = 14 + 40 + 0 + 48 = 102. Thus the cofactor of A4X is (— 1 )5(—23) = 23. '2 -l)4+'(6)det I 1 2 1 5N -3 1 1 2.16 + 0 = -23.n = 3 is ( . For the sake of comparison. where Let us evaluate this determinant by expanding along the first column. The reader should verify that the cofactors of A 12 .3 ) ] + (-1)(1)[(1)(1) .l ) 1+2 f\ (l)det 2 \3 2 1 \Z -4 .V .7 . 11.1 1 2.Sec.( .4 Summary—Important Facts about Determinants 233 Let us consider two examples of this technique in evaluating the determinant of the 4 x 4 matrix (2 1 1 5\ 1 1 . the cofactors of A 42 .l ) ( . and A 42 are —14.4 . Similarly. and A44 are 8. 4. * ) + (-l)3+1(0)det( ] _ j ) = 1(1)[(-4)(1) .

.3 1 6 1 2/ /-I det 2 V-9 -1 0 det 2 0 \-9 0 ( 2 X 1 5 \ -6 -5 1 -3 . then det(B) = det(A). l) = 1(-1) 1+2 -5 ~6\ -3 1 • . it is often helpful to introduce zeros into the matrix by means of elementary row operations before computing the determinant. 4 Determinants Of course. it is beneficial to evaluate the determinant of a mat rix by expanding along a row or column of the matrix that contains t he largest number of zero entries. Properties of the Determinant 1. If B is a matrix obtained from an n x n matrix A by adding a multiple of row i to row j or a multiple of column i to column j for i / j. then det(B) = — det(A).1 0 2 ) = 102. Therefore det(A) = l ( .234 Chap. and then to expand along that column. (The elementary row operations used here consist of adding multiples of row 1 to rows 2 and 4.4 . then det(Z?) = k. the fact that t he value 102 is obtained again is no surprise since the value of the determinant of A is independent of the choice of row or column used in the expansion.det(A). 3. This technique utilizes the first three properties of the determinant. 2.5 -28 The resulting determinant of a 3 x 3 matrix can be evaluated in the same manner: Use type 3 elementary row operations to introduce two zeros into the first column.) This procedure yields (2 1 det(A) = det 2 V3 1 1 5\ 1 . which makes it unnecessary to evaluate one of the cofactors (the cofactor of A 32 ).1 0 . Observe that the computation of det(A) is easier when expanded along the second column than when expanded along the fourth row. let us compute the determinant of the 4 x 4 matrix A considered previously.5 -28. As an example of the use of these three properties in evaluating determinants.l ) 1 + 2 ( . and then expand along that column. For this reason. If B is a matrix obtained by multiplying each entry of some row or column of an n x n matrix A by a scalar k. In fact. The difference is the presence of a zero in the second column. If B is a matrix obtained by interchanging any two rows or interchanging any two columns of an nxn matrix A. This results in the value —102. Our procedure is to introduce zeros into the second column of A by employing property 3.

The reader should compare the calculation just given with the earlier cofactor expansions to see how much less work is required when properties 1, 2, and 3 are employed.

In the chapters that follow, we often have to evaluate the determinant of matrices having special forms. The next two properties of the determinant are useful in this regard:

4. The determinant of an upper triangular matrix is the product of its diagonal entries. In particular, $\det(I) = 1$.
5. If two rows (or columns) of a matrix are identical, then the determinant of the matrix is zero.

As an illustration of property 4, notice that

$\det\begin{pmatrix} -3 & 1 & 2 \\ 0 & 4 & 5 \\ 0 & 0 & -6 \end{pmatrix} = (-3)(4)(-6) = 72.$

Property 4 provides an efficient method for evaluating the determinant of a matrix:
(a) Use Gaussian elimination and properties 1, 2, and 3 above to reduce the matrix to an upper triangular matrix.
(b) Compute the product of the diagonal entries.

Perhaps the most significant property of the determinant is that it provides a simple characterization of invertible matrices. (See property 7.) The next two properties of the determinant are used frequently in later chapters.

6. For any $n \times n$ matrices $A$ and $B$, $\det(AB) = \det(A) \cdot \det(B)$.

7. An $n \times n$ matrix $A$ is invertible if and only if $\det(A) \ne 0$. Furthermore, if $A$ is invertible, then $\det(A^{-1}) = [\det(A)]^{-1}$.
8. For any $n \times n$ matrix $A$, the determinants of $A$ and $A^{t}$ are equal; that is, $\det(A^{t}) = \det(A)$.
9. If $A$ and $B$ are similar matrices, then $\det(A) = \det(B)$.

In particular, property 7 guarantees that the matrix $A$ on page 233 is invertible because $\det(A) = 102$. The final property, stated as Exercise 15 of Section 4.3, is used in Chapter 5. It is a simple consequence of properties 6 and 7.

EXERCISES

1. Label the following statements as true or false.
(a) The determinant of a square matrix may be computed by expanding the matrix along any row or column.
(b) In evaluating the determinant of a matrix, it is wise to expand along a row or column containing the largest number of zero entries.
(c) If two rows or columns of $A$ are identical, then $\det(A) = 0$.
(d) If $B$ is a matrix obtained by interchanging two rows or two columns of $A$, then $\det(B) = \det(A)$.
(e) If $B$ is a matrix obtained by multiplying each entry of some row or column of $A$ by a scalar, then $\det(B) = \det(A)$.
(f) If $B$ is a matrix obtained from $A$ by adding a multiple of some row to a different row, then $\det(B) = \det(A)$.
(g) The determinant of an upper triangular $n \times n$ matrix is the product of its diagonal entries.
(h) For every $A \in \mathsf{M}_{n \times n}(F)$, $\det(A^{t}) = -\det(A)$.
(i) If $A, B \in \mathsf{M}_{n \times n}(F)$, then $\det(AB) = \det(A) \cdot \det(B)$.
(j) If $Q$ is an invertible matrix, then $\det(Q^{-1}) = [\det(Q)]^{-1}$.
(k) A matrix $Q$ is invertible if and only if $\det(Q) \ne 0$.

2. Evaluate the determinant of the following $2 \times 2$ matrices.

3. Evaluate the determinant of the following matrices in the manner indicated.

(Parts (a)–(h) of Exercises 3 and 4 list specific $3 \times 3$ and $4 \times 4$ matrices, several with complex entries, to be evaluated along the indicated rows and columns or by any legitimate method.)

4. Evaluate the determinant of the following matrices by any legitimate method.

5. Suppose that $M \in \mathsf{M}_{n \times n}(F)$ can be written in the form

$M = \begin{pmatrix} A & B \\ O & I \end{pmatrix},$

where $A$ is a square matrix. Prove that $\det(M) = \det(A)$.

6. Prove that if $M \in \mathsf{M}_{n \times n}(F)$ can be written in the form

$M = \begin{pmatrix} A & B \\ O & C \end{pmatrix},$

where $A$ and $C$ are square matrices, then $\det(M) = \det(A) \cdot \det(C)$.

4.5* A CHARACTERIZATION OF THE DETERMINANT

In Sections 4.2 and 4.3, we showed that the determinant possesses a number of properties. In this section, we show that three of these properties completely characterize the determinant; that is, the only function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ having these three properties is the determinant. This characterization of the determinant is the one used in Section 4.1 to establish the relationship between $\det\begin{pmatrix} u \\ v \end{pmatrix}$ and the area of the parallelogram determined by $u$ and $v$.

The first of these properties that characterize the determinant is the one described in Theorem 4.3 (p. 212).

Definition. A function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is called an n-linear function if it is a linear function of each row of an $n \times n$ matrix when the remaining $n - 1$ rows are held fixed; that is, $\delta$ is n-linear if, for every $r = 1, 2, \dots, n$, we have

$\delta\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ u + kv \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} = \delta\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ u \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} + k\,\delta\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ v \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix}$

whenever $k$ is a scalar and $u$, $v$, and each $a_i$ are vectors in $\mathsf{F}^n$.

Example 1
The function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ defined by $\delta(A) = 0$ for each $A \in \mathsf{M}_{n \times n}(F)$ is an n-linear function.

Example 2
For $1 \le j \le n$, define $\delta_j\colon \mathsf{M}_{n \times n}(F) \to F$ by $\delta_j(A) = A_{1j}A_{2j}\cdots A_{nj}$ for each $A \in \mathsf{M}_{n \times n}(F)$; that is, $\delta_j(A)$ equals the product of the entries of column $j$ of $A$.

Then each $\delta_j$ is an n-linear function because, for $A \in \mathsf{M}_{n \times n}(F)$ with rows $a_1, a_2, \dots, a_n$, where $a_i = (A_{i1}, A_{i2}, \dots, A_{in})$, and for any $v = (b_1, b_2, \dots, b_n) \in \mathsf{F}^n$ and any scalar $k$, we have

$\delta_j\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ a_r + kv \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} = A_{1j}\cdots A_{(r-1)j}(A_{rj} + kb_j)A_{(r+1)j}\cdots A_{nj}$
$= A_{1j}\cdots A_{(r-1)j}A_{rj}A_{(r+1)j}\cdots A_{nj} + k\,(A_{1j}\cdots A_{(r-1)j}\,b_j\,A_{(r+1)j}\cdots A_{nj})$
$= \delta_j\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ a_r \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} + k\,\delta_j\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ v \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix}.$

Example 3
The function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ defined for each $A \in \mathsf{M}_{n \times n}(F)$ by $\delta(A) = A_{11}A_{22}\cdots A_{nn}$ (i.e., $\delta(A)$ equals the product of the diagonal entries of $A$) is an n-linear function.

Example 4
The function $\delta\colon \mathsf{M}_{n \times n}(R) \to R$ defined for each $A \in \mathsf{M}_{n \times n}(R)$ by $\delta(A) = \mathrm{tr}(A)$ is not an n-linear function for $n \ge 2$. For if $I$ is the $n \times n$ identity matrix and $A$ is the matrix obtained by multiplying the first row of $I$ by 2, then $\delta(A) = n + 1 \ne 2n = 2 \cdot \delta(I)$.

Theorem 4.3 (p. 212) asserts that the determinant is an n-linear function. For our purposes this is the most important example of an n-linear function.

Now we introduce the second of the properties used in the characterization of the determinant.

Definition. An n-linear function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is called alternating if, for each $A \in \mathsf{M}_{n \times n}(F)$, we have $\delta(A) = 0$ whenever two adjacent rows of $A$ are identical.
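Examples 3 and 4 can also be checked numerically. The following sketch is not part of the text; it assumes Python with NumPy, uses the identity-matrix test from Example 4, and verifies linearity in a single row for the diagonal-product function of Example 3 on a randomly chosen matrix.

import numpy as np

def delta_diag(A):
    return float(np.prod(np.diag(A)))     # the function of Example 3

n = 3
I = np.eye(n)
A = I.copy()
A[0] = 2 * I[0]                           # multiply the first row of I by 2
print(delta_diag(A), 2 * delta_diag(I))   # 2.0 2.0  (consistent with n-linearity)
print(np.trace(A), 2 * np.trace(I))       # 4.0 6.0  (trace fails: n + 1 != 2n)

rng = np.random.default_rng(0)
B = rng.integers(-3, 4, size=(n, n)).astype(float)
u, v, k, r = rng.standard_normal(n), rng.standard_normal(n), 1.5, 1
Bu, Bv, Buv = B.copy(), B.copy(), B.copy()
Bu[r], Bv[r], Buv[r] = u, v, u + k * v    # vary only row r, hold the others fixed
print(np.isclose(delta_diag(Buv), delta_diag(Bu) + k * delta_diag(Bv)))   # True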

Theorem 4.10. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function.
(a) If $A \in \mathsf{M}_{n \times n}(F)$ and $B$ is a matrix obtained from $A$ by interchanging any two rows of $A$, then $\delta(B) = -\delta(A)$.
(b) If $A \in \mathsf{M}_{n \times n}(F)$ has two identical rows, then $\delta(A) = 0$.

Proof. (a) Let $A \in \mathsf{M}_{n \times n}(F)$, and let the rows of $A$ be $a_1, a_2, \dots, a_n$. Let $B$ be the matrix obtained from $A$ by interchanging rows $r$ and $s$, where $r < s$. We first establish the result in the case that $s = r + 1$. Because $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is an n-linear function that is alternating, we have

$0 = \delta\begin{pmatrix} \vdots \\ a_r + a_{r+1} \\ a_r + a_{r+1} \\ \vdots \end{pmatrix} = \delta\begin{pmatrix} \vdots \\ a_r \\ a_r \\ \vdots \end{pmatrix} + \delta\begin{pmatrix} \vdots \\ a_r \\ a_{r+1} \\ \vdots \end{pmatrix} + \delta\begin{pmatrix} \vdots \\ a_{r+1} \\ a_r \\ \vdots \end{pmatrix} + \delta\begin{pmatrix} \vdots \\ a_{r+1} \\ a_{r+1} \\ \vdots \end{pmatrix} = 0 + \delta(A) + \delta(B) + 0.$

Thus $\delta(B) = -\delta(A)$.

Next suppose that $s > r + 1$. Beginning with $a_r$ and $a_{r+1}$, successively interchange $a_r$ with the row that follows it until the rows are in the sequence
$a_1, a_2, \dots, a_{r-1}, a_{r+1}, \dots, a_s, a_r, a_{s+1}, \dots, a_n.$
In all, $s - r$ interchanges of adjacent rows are needed to produce this sequence. Then successively interchange $a_s$ with the row that precedes it until the rows are in the order
$a_1, a_2, \dots, a_{r-1}, a_s, a_{r+1}, \dots, a_{s-1}, a_r, a_{s+1}, \dots, a_n.$
This process requires an additional $s - r - 1$ interchanges of adjacent rows and produces the matrix $B$. It follows from the preceding paragraph that
$\delta(B) = (-1)^{(s-r)+(s-r-1)}\,\delta(A) = -\delta(A).$

(b) Suppose that rows $r$ and $s$ of $A \in \mathsf{M}_{n \times n}(F)$ are identical, where $r < s$. If $s = r + 1$, then $\delta(A) = 0$ because $\delta$ is alternating and two adjacent rows of $A$ are identical.

If $s > r + 1$, let $B$ be the matrix obtained from $A$ by interchanging rows $r + 1$ and $s$. Then $\delta(B) = 0$ because two adjacent rows of $B$ are identical. But $\delta(B) = -\delta(A)$ by (a). Hence $\delta(A) = 0$.

Corollary 1. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function. If $B$ is a matrix obtained from $A \in \mathsf{M}_{n \times n}(F)$ by adding a multiple of some row of $A$ to another row, then $\delta(B) = \delta(A)$.

Proof. Let $B$ be obtained from $A \in \mathsf{M}_{n \times n}(F)$ by adding $k$ times row $i$ of $A$ to row $j$, where $j \ne i$, and let $C$ be obtained from $A$ by replacing row $j$ of $A$ by row $i$ of $A$. Then the rows of $A$, $B$, and $C$ are identical except for row $j$. Moreover, row $j$ of $B$ is the sum of row $j$ of $A$ and $k$ times row $j$ of $C$. Since $\delta$ is an n-linear function and $C$ has two identical rows, it follows that
$\delta(B) = \delta(A) + k\,\delta(C) = \delta(A) + k \cdot 0 = \delta(A).$

Corollary 2. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function. If $M \in \mathsf{M}_{n \times n}(F)$ has rank less than $n$, then $\delta(M) = 0$.

Proof. Exercise.

Corollary 3. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function, and let $E_1$, $E_2$, and $E_3$ in $\mathsf{M}_{n \times n}(F)$ be elementary matrices of types 1, 2, and 3, respectively. Suppose that $E_2$ is obtained by multiplying some row of $I$ by the nonzero scalar $k$. Then $\delta(E_1) = -\delta(I)$, $\delta(E_2) = k \cdot \delta(I)$, and $\delta(E_3) = \delta(I)$.

Proof. Exercise.

We wish to show that, under certain circumstances, the only alternating n-linear function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is the determinant, that is, $\delta(A) = \det(A)$ for all $A \in \mathsf{M}_{n \times n}(F)$. In view of Corollary 3 above and the facts on page 223 about the determinant of an elementary matrix, this can happen only if $\delta(I) = 1$. Hence the third condition that is used in the characterization of the determinant is that the determinant of the $n \times n$ identity matrix is 1. Before we can establish the desired characterization of the determinant, we must first show that an alternating n-linear function $\delta$ such that $\delta(I) = 1$ is a multiplicative function.

Theorem 4.11. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function such that $\delta(I) = 1$. For any $A, B \in \mathsf{M}_{n \times n}(F)$, we have $\delta(AB) = \delta(A) \cdot \delta(B)$.

Proof. The proof of this result is identical to the proof of Theorem 4.7 (p. 223), and so it is omitted.
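Since the determinant itself is the motivating example of an alternating n-linear function with value 1 at the identity, the properties used in this characterization can be spot-checked numerically. The following sketch is not part of the text; it assumes Python with NumPy and tests multiplicativity, the interchange rule, and the identical-rows rule on random matrices.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.linalg.det(np.eye(4)))                                               # 1.0

C = A.copy()
C[[0, 1]] = C[[1, 0]]                   # interchange two rows
print(np.isclose(np.linalg.det(C), -np.linalg.det(A)))                        # True

D = A.copy()
D[2] = D[1]                             # two identical adjacent rows
print(np.isclose(np.linalg.det(D), 0.0))                                      # True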

Theorem 4.12. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function such that $\delta(I) = 1$. Then $\delta(A) = \det(A)$ for every $A \in \mathsf{M}_{n \times n}(F)$.

Proof. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an alternating n-linear function such that $\delta(I) = 1$, and let $A \in \mathsf{M}_{n \times n}(F)$. If $A$ has rank less than $n$, then by Corollary 2 to Theorem 4.10, $\delta(A) = 0$. Since the corollary to Theorem 4.6 (p. 217) gives $\det(A) = 0$, we have $\delta(A) = \det(A)$ in this case. If, on the other hand, $A$ has rank $n$, then $A$ is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3.6, p. 159), say $A = E_m \cdots E_2 E_1$. Since $\delta(I) = 1$, it follows from Corollary 3 to Theorem 4.10 and the facts on page 223 that $\delta(E) = \det(E)$ for every elementary matrix $E$. Hence by Theorems 4.11 and 4.7 (p. 223), we have

$\delta(A) = \delta(E_m \cdots E_2 E_1) = \delta(E_m) \cdots \delta(E_2) \cdot \delta(E_1) = \det(E_m) \cdots \det(E_2) \cdot \det(E_1) = \det(E_m \cdots E_2 E_1) = \det(A).$

Theorem 4.12 provides the desired characterization of the determinant: It is the unique function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ that is n-linear, is alternating, and has the property that $\delta(I) = 1$.

EXERCISES

1. Label the following statements as true or false. Justify each answer.
(a) Any n-linear function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is a linear transformation.
(b) Any n-linear function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is a linear function of each row of an $n \times n$ matrix when the other $n - 1$ rows are held fixed.
(c) If $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is an alternating n-linear function and the matrix $A \in \mathsf{M}_{n \times n}(F)$ has two identical rows, then $\delta(A) = 0$.
(d) If $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is an alternating n-linear function and $B$ is obtained from $A \in \mathsf{M}_{n \times n}(F)$ by interchanging two rows of $A$, then $\delta(B) = \delta(A)$.
(e) There is a unique alternating n-linear function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$.
(f) The function $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ defined by $\delta(A) = 0$ for every $A \in \mathsf{M}_{n \times n}(F)$ is an alternating n-linear function.

2. Determine all the 1-linear functions $\delta\colon \mathsf{M}_{1 \times 1}(F) \to F$.

Determine which of the functions $\delta\colon \mathsf{M}_{3 \times 3}(F) \to F$ in Exercises 3–10 are 3-linear functions. Justify each answer.

3. $\delta(A) = k$, where $k$ is any nonzero scalar
4. $\delta(A) = A_{22}$
5. $\delta(A) = A_{11}A_{23}A_{32}$
6. $\delta(A) = A_{11} + A_{23} + A_{32}$
7. $\delta(A) = A_{11}A_{21}A_{32}$
8. $\delta(A) = A_{11}A_{31}A_{32}$
9. $\delta(A) = A_{11}A_{22}A_{33} - A_{11}A_{21}A_{32}$
10. $\delta(A) = A_{11}A_{22}^{2}A_{33}^{3}$
11. Prove Corollaries 2 and 3 of Theorem 4.10.
12. Prove Theorem 4.11.
13. Prove that a linear combination of two n-linear functions is an n-linear function, where the sum and scalar product of n-linear functions are as defined in Example 3 of Section 1.2 (p. 9).
14. Prove that the set of all n-linear functions over a field $F$ is a vector space over $F$ under the operations of function addition and scalar multiplication as defined in Example 3 of Section 1.2 (p. 9).
15. Let $a, b, c, d \in F$. Prove that the function $\delta\colon \mathsf{M}_{2 \times 2}(F) \to F$ defined by $\delta(A) = A_{11}A_{22}a + A_{11}A_{21}b + A_{12}A_{22}c + A_{12}A_{21}d$ is a 2-linear function.
16. Prove that $\det\colon \mathsf{M}_{2 \times 2}(F) \to F$ is a 2-linear function of the columns of a matrix.
17. Prove that $\delta\colon \mathsf{M}_{2 \times 2}(F) \to F$ is a 2-linear function if and only if it has the form $\delta(A) = A_{11}A_{22}a + A_{11}A_{21}b + A_{12}A_{22}c + A_{12}A_{21}d$ for some scalars $a, b, c, d \in F$.
18. Prove that if $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ is an alternating n-linear function, then there exists a scalar $k$ such that $\delta(A) = k\,\det(A)$ for all $A \in \mathsf{M}_{n \times n}(F)$.
19. Let $\delta\colon \mathsf{M}_{n \times n}(F) \to F$ be an n-linear function and $F$ a field that does not have characteristic two. Prove that if $\delta(B) = -\delta(A)$ whenever $B$ is obtained from $A \in \mathsf{M}_{n \times n}(F)$ by interchanging any two rows of $A$, then $\delta(M) = 0$ whenever $M \in \mathsf{M}_{n \times n}(F)$ has two identical rows.
20. Give an example to show that the implication in Exercise 19 need not hold if $F$ has characteristic two.

INDEX OF DEFINITIONS FOR CHAPTER 4

Alternating n-linear function 239
Angle between two vectors 202
Cofactor 210
Cofactor expansion along the first row 210
Cramer's rule 224
Determinant of a 2 x 2 matrix 200
Determinant of a matrix 210
Left-handed coordinate system 203
n-linear function 238
Orientation of an ordered basis 202
Parallelepiped, volume of 226
Parallelogram determined by two vectors 203
Right-handed coordinate system 202

5 Diagonalization

5.1 Eigenvalues and Eigenvectors
5.2 Diagonalizability
5.3* Matrix Limits and Markov Chains
5.4 Invariant Subspaces and the Cayley-Hamilton Theorem

This chapter is concerned with the so-called diagonalization problem. For a given linear operator $\mathsf{T}$ on a finite-dimensional vector space $\mathsf{V}$, we seek answers to the following questions.

1. Does there exist an ordered basis $\beta$ for $\mathsf{V}$ such that $[\mathsf{T}]_{\beta}$ is a diagonal matrix?
2. If such a basis exists, how can it be found?

Since computations involving diagonal matrices are simple, an affirmative answer to question 1 leads us to a clearer understanding of how the operator $\mathsf{T}$ acts on $\mathsf{V}$, and an answer to question 2 enables us to obtain easy solutions to many practical problems that can be formulated in a linear algebra context. We consider some of these problems and their solutions in this chapter; see, for example, Section 5.3.

A solution to the diagonalization problem leads naturally to the concepts of eigenvalue and eigenvector. Aside from the important role that these concepts play in the diagonalization problem, they also prove to be useful tools in the study of many nondiagonalizable operators, as we will see in Chapter 7.

5.1 EIGENVALUES AND EIGENVECTORS

In Example 3 of Section 2.5, we were able to obtain a formula for the reflection of $\mathsf{R}^2$ about the line $y = 2x$. The key to our success was to find a basis $\beta'$ for which $[\mathsf{T}]_{\beta'}$ is a diagonal matrix. We now introduce the name for an operator or matrix that has such a basis.

Definitions. A linear operator $\mathsf{T}$ on a finite-dimensional vector space $\mathsf{V}$ is called diagonalizable if there is an ordered basis $\beta$ for $\mathsf{V}$ such that $[\mathsf{T}]_{\beta}$ is a diagonal matrix.

A square matrix $A$ is called diagonalizable if $\mathsf{L}_A$ is diagonalizable.

We want to determine when a linear operator $\mathsf{T}$ on a finite-dimensional vector space $\mathsf{V}$ is diagonalizable and, if so, how to obtain an ordered basis $\beta = \{v_1, v_2, \dots, v_n\}$ for $\mathsf{V}$ such that $[\mathsf{T}]_{\beta}$ is a diagonal matrix. Note that, if $D = [\mathsf{T}]_{\beta}$ is a diagonal matrix, then for each vector $v_j \in \beta$, we have

$\mathsf{T}(v_j) = \sum_{i=1}^{n} D_{ij} v_i = D_{jj} v_j = \lambda_j v_j,$ where $\lambda_j = D_{jj}$.

Conversely, if $\beta = \{v_1, v_2, \dots, v_n\}$ is an ordered basis for $\mathsf{V}$ such that $\mathsf{T}(v_j) = \lambda_j v_j$ for some scalars $\lambda_1, \lambda_2, \dots, \lambda_n$, then clearly

$[\mathsf{T}]_{\beta} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$

In the preceding paragraph, each vector $v$ in the basis $\beta$ satisfies the condition that $\mathsf{T}(v) = \lambda v$ for some scalar $\lambda$. Moreover, because $v$ lies in a basis, $v$ is nonzero. These computations motivate the following definitions.

Definitions. Let $\mathsf{T}$ be a linear operator on a vector space $\mathsf{V}$. A nonzero vector $v \in \mathsf{V}$ is called an eigenvector of $\mathsf{T}$ if there exists a scalar $\lambda$ such that $\mathsf{T}(v) = \lambda v$. The scalar $\lambda$ is called the eigenvalue corresponding to the eigenvector $v$.
Let $A$ be in $\mathsf{M}_{n \times n}(F)$. A nonzero vector $v \in \mathsf{F}^n$ is called an eigenvector of $A$ if $v$ is an eigenvector of $\mathsf{L}_A$; that is, if $Av = \lambda v$ for some scalar $\lambda$. The scalar $\lambda$ is called the eigenvalue of $A$ corresponding to the eigenvector $v$.

The words characteristic vector and proper vector are also used in place of eigenvector. The corresponding terms for eigenvalue are characteristic value and proper value.

Note that a vector is an eigenvector of a matrix $A$ if and only if it is an eigenvector of $\mathsf{L}_A$. Likewise, a scalar $\lambda$ is an eigenvalue of $A$ if and only if it is an eigenvalue of $\mathsf{L}_A$.

Using the terminology of eigenvectors and eigenvalues, we can summarize the preceding discussion as follows.

Theorem 5.1. A linear operator $\mathsf{T}$ on a finite-dimensional vector space $\mathsf{V}$ is diagonalizable if and only if there exists an ordered basis $\beta$ for $\mathsf{V}$ consisting of eigenvectors of $\mathsf{T}$. Furthermore, if $\mathsf{T}$ is diagonalizable, $\beta = \{v_1, v_2, \dots, v_n\}$ is an ordered basis of eigenvectors of $\mathsf{T}$, and $D = [\mathsf{T}]_{\beta}$, then $D$ is a diagonal matrix and $D_{jj}$ is the eigenvalue corresponding to $v_j$ for $1 \le j \le n$.
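Theorem 5.1 can be illustrated computationally. The following sketch is not part of the text; it assumes Python with NumPy and builds an operator from a chosen basis of eigenvectors, then verifies that the matrix of the operator relative to that basis is diagonal with the eigenvalues on the diagonal.

import numpy as np

rng = np.random.default_rng(0)
eigenvalues = np.array([2.0, -1.0, 3.0])
Q = rng.standard_normal((3, 3))              # columns: a (randomly chosen) basis of R^3
A = Q @ np.diag(eigenvalues) @ np.linalg.inv(Q)   # the columns of Q are eigenvectors of A

print(np.round(np.linalg.inv(Q) @ A @ Q, 10))     # the diagonal matrix diag(2, -1, 3)
for j in range(3):
    v = Q[:, j]
    print(np.allclose(A @ v, eigenvalues[j] * v)) # True: A v_j = lambda_j v_j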

To diagonalize a matrix or a linear operator is to find a basis of eigenvectors and the corresponding eigenvalues. Before continuing our study of the diagonalization problem, we consider three examples of eigenvalues and eigenvectors.

Example 1
Let $A = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}$, $v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, and $v_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$. Since

$\mathsf{L}_A(v_1) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix} = -2 v_1,$

$v_1$ is an eigenvector of $\mathsf{L}_A$, and hence of $A$. Here $\lambda_1 = -2$ is the eigenvalue corresponding to $v_1$. Furthermore,

$\mathsf{L}_A(v_2) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 15 \\ 20 \end{pmatrix} = 5 v_2,$

and so $v_2$ is an eigenvector of $\mathsf{L}_A$, and hence of $A$, with the corresponding eigenvalue $\lambda_2 = 5$. Note that $\beta = \{v_1, v_2\}$ is an ordered basis for $\mathsf{R}^2$ consisting of eigenvectors of both $A$ and $\mathsf{L}_A$, and therefore $A$ and $\mathsf{L}_A$ are diagonalizable, by Theorem 5.1. Moreover,

$[\mathsf{L}_A]_{\beta} = \begin{pmatrix} -2 & 0 \\ 0 & 5 \end{pmatrix}.$

Example 2
Let $\mathsf{T}$ be the linear operator on $\mathsf{R}^2$ that rotates each vector in the plane through an angle of $\pi/2$. It is clear geometrically that for any nonzero vector $v$, the vectors $v$ and $\mathsf{T}(v)$ are not collinear; hence $\mathsf{T}(v)$ is not a multiple of $v$. Therefore $\mathsf{T}$ has no eigenvectors and, consequently, no eigenvalues. Thus there exist operators (and matrices) with no eigenvalues or eigenvectors. Of course, such operators and matrices are not diagonalizable.

Example 3
Let $C^{\infty}(R)$ denote the set of all functions $f\colon R \to R$ having derivatives of all orders. (Thus $C^{\infty}(R)$ includes the polynomial functions, the sine and cosine functions, the exponential functions, etc.) Clearly, $C^{\infty}(R)$ is a subspace of the vector space $\mathcal{F}(R, R)$ of all functions from $R$ to $R$ as defined in Section 1.2. Let $\mathsf{T}\colon C^{\infty}(R) \to C^{\infty}(R)$ be the function defined by $\mathsf{T}(f) = f'$, the derivative of $f$. It is easily verified that $\mathsf{T}$ is a linear operator on $C^{\infty}(R)$. We determine the eigenvalues and eigenvectors of $\mathsf{T}$.

Suppose that $f$ is an eigenvector of $\mathsf{T}$ with corresponding eigenvalue $\lambda$. Then $f' = \mathsf{T}(f) = \lambda f$. This is a first-order differential equation whose solutions are of the form $f(t) = ce^{\lambda t}$ for some constant $c$. Consequently, every real number $\lambda$ is an eigenvalue of $\mathsf{T}$, and $\lambda$ corresponds to eigenvectors of the form $ce^{\lambda t}$ for $c \ne 0$. Note that for $\lambda = 0$, the eigenvectors are the nonzero constant functions.

In order to obtain a basis of eigenvectors for a matrix (or a linear operator), we need to be able to determine its eigenvalues and eigenvectors. The following theorem gives us a method for computing eigenvalues.

Theorem 5.2. Let $A \in \mathsf{M}_{n \times n}(F)$. Then a scalar $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I_n) = 0$.

Proof. A scalar $\lambda$ is an eigenvalue of $A$ if and only if there exists a nonzero vector $v \in \mathsf{F}^n$ such that $Av = \lambda v$, that is, $(A - \lambda I_n)(v) = 0$. By Theorem 2.5 (p. 71), this is true if and only if $A - \lambda I_n$ is not invertible. However, this result is equivalent to the statement that $\det(A - \lambda I_n) = 0$.

Definition. Let $A \in \mathsf{M}_{n \times n}(F)$. The polynomial $f(t) = \det(A - tI_n)$ is called the characteristic polynomial of $A$.

Theorem 5.2 states that the eigenvalues of a matrix are the zeros of its characteristic polynomial. When determining the eigenvalues of a matrix or a linear operator, we normally compute its characteristic polynomial, as in the next example.

Example 4
To find the eigenvalues of

$A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix} \in \mathsf{M}_{2 \times 2}(R),$

we compute its characteristic polynomial:

$\det(A - tI_2) = \det\begin{pmatrix} 1-t & 1 \\ 4 & 1-t \end{pmatrix} = t^2 - 2t - 3 = (t - 3)(t + 1).$

It follows from Theorem 5.2 that the only eigenvalues of $A$ are 3 and $-1$.

The observant reader may have noticed that the entries of the matrix $A - tI_n$ are not scalars in the field $F$. They are, however, scalars in another field $F(t)$, the field of quotients of polynomials in $t$ with coefficients from $F$. Consequently, any results proved about determinants in Chapter 4 remain valid in this context.
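Example 4 can be checked numerically. The following sketch is not part of the text; it assumes Python with NumPy. Note that numpy.poly returns the coefficients of $\det(tI - A)$, which differs from the book's $f(t) = \det(A - tI)$ only by the factor $(-1)^n$ (and so agrees here, since $n = 2$).

import numpy as np

A = np.array([[1, 1], [4, 1]], dtype=float)
coeffs = np.poly(A)            # coefficients of det(tI - A): t^2 - 2t - 3
print(coeffs)                  # [ 1. -2. -3.]
print(np.roots(coeffs))        # 3 and -1 (in some order) -- the eigenvalues of A
print(np.linalg.eigvals(A))    # the same eigenvalues, computed directly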

It is easily shown that similar matrices have the same characteristic polynomial (see Exercise 12). This fact enables us to define the characteristic polynomial of a linear operator as follows.

Definition. Let $\mathsf{T}$ be a linear operator on an n-dimensional vector space $\mathsf{V}$ with ordered basis $\beta$. We define the characteristic polynomial $f(t)$ of $\mathsf{T}$ to be the characteristic polynomial of $A = [\mathsf{T}]_{\beta}$. That is,

$f(t) = \det(A - tI_n).$

The remark preceding this definition shows that the definition is independent of the choice of ordered basis $\beta$. Thus if $\mathsf{T}$ is a linear operator on a finite-dimensional vector space $\mathsf{V}$ and $\beta$ is an ordered basis for $\mathsf{V}$, then $\lambda$ is an eigenvalue of $\mathsf{T}$ if and only if $\lambda$ is an eigenvalue of $[\mathsf{T}]_{\beta}$. We often denote the characteristic polynomial of an operator $\mathsf{T}$ by $\det(\mathsf{T} - t\mathsf{I})$.

Example 5
Let $\mathsf{T}$ be the linear operator on $\mathsf{P}_2(R)$ defined by $\mathsf{T}(f(x)) = f(x) + (x+1)f'(x)$, let $\beta$ be the standard ordered basis for $\mathsf{P}_2(R)$, and let $A = [\mathsf{T}]_{\beta}$. Then

$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}.$

The characteristic polynomial of $\mathsf{T}$ is

$\det(A - tI_3) = \det\begin{pmatrix} 1-t & 1 & 0 \\ 0 & 2-t & 2 \\ 0 & 0 & 3-t \end{pmatrix} = (1-t)(2-t)(3-t) = -(t-1)(t-2)(t-3).$

Hence $\lambda$ is an eigenvalue of $\mathsf{T}$ (or $A$) if and only if $\lambda = 1$, 2, or 3.

Examples 4 and 5 suggest that the characteristic polynomial of an $n \times n$ matrix $A$ is a polynomial of degree $n$. The next theorem tells us even more. It can be proved by a straightforward induction argument.

Theorem 5.3. Let $A \in \mathsf{M}_{n \times n}(F)$.
(a) The characteristic polynomial of $A$ is a polynomial of degree $n$ with leading coefficient $(-1)^n$.
(b) $A$ has at most $n$ distinct eigenvalues.

Proof. Exercise.

Theorem 5.2 enables us to determine all the eigenvalues of a matrix or a linear operator on a finite-dimensional vector space provided that we can compute the zeros of the characteristic polynomial. Our next result gives us a procedure for determining the eigenvectors corresponding to a given eigenvalue.

Theorem 5.4. Let $\mathsf{T}$ be a linear operator on a vector space $\mathsf{V}$, and let $\lambda$ be an eigenvalue of $\mathsf{T}$. A vector $v \in \mathsf{V}$ is an eigenvector of $\mathsf{T}$ corresponding to $\lambda$ if and only if $v \ne 0$ and $v \in \mathsf{N}(\mathsf{T} - \lambda\mathsf{I})$.

Proof. Exercise.

Example 6
To find all the eigenvectors of the matrix

$A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix}$

in Example 4, recall that $A$ has two eigenvalues, $\lambda_1 = 3$ and $\lambda_2 = -1$. We begin by finding all the eigenvectors corresponding to $\lambda_1 = 3$. Let

$B_1 = A - \lambda_1 I = \begin{pmatrix} -2 & 1 \\ 4 & -2 \end{pmatrix}.$

Then $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is an eigenvector corresponding to $\lambda_1 = 3$ if and only if $x \ne 0$ and $x \in \mathsf{N}(\mathsf{L}_{B_1})$; that is, $x \ne 0$ and

$\begin{pmatrix} -2 & 1 \\ 4 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -2x_1 + x_2 \\ 4x_1 - 2x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$

Clearly the set of all solutions to this equation is $\left\{ t\begin{pmatrix} 1 \\ 2 \end{pmatrix}\colon t \in R \right\}$. Hence $x$ is an eigenvector corresponding to $\lambda_1 = 3$ if and only if $x = t\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ for some $t \ne 0$.

Now suppose that $x$ is an eigenvector of $A$ corresponding to $\lambda_2 = -1$. Let

$B_2 = A - \lambda_2 I = \begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}.$
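The procedure of Theorem 5.4 — form $A - \lambda I$ and describe its null space — can also be carried out numerically. The following sketch is not part of the text; it assumes Python with NumPy, obtains a null-space basis from the singular value decomposition, and applies it to the matrix of Examples 4 and 6 for both eigenvalues.

import numpy as np

def null_space_basis(M, tol=1e-10):
    U, s, Vt = np.linalg.svd(M)
    return Vt[np.sum(s > tol):].T        # rows of Vt for zero singular values

A = np.array([[1, 1], [4, 1]], dtype=float)
for lam in (3.0, -1.0):
    N = null_space_basis(A - lam * np.eye(2))
    print(lam, N.ravel())                # multiples of (1, 2) and of (1, -2)

# the eigenvectors found this way diagonalize A (see the discussion that follows)
Q = np.array([[1.0, 1.0], [2.0, -2.0]])  # columns (1, 2) and (1, -2)
print(np.linalg.inv(Q) @ A @ Q)          # [[3, 0], [0, -1]] up to rounding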

We show that v € V is an eigenvector of T corresponding to A if and only if 4>p(v) .2 in Section 2. • Suppose that 0 is a basis for F n consisting of eigenvectors of A. for instance. To find the eigenvectors of a linear operator T on an /". Thus L^. if Q = then ». 5. Of course. Figure 5. Hence N(Lfl.i is a basis for R2 consisting of eigenvectors of A.) = | t ( 2)-teRj. select an ordered basis 0 for V and let A = [T]^. In Example 6. and hence A. the coordinate vector of v relative to 0.4 in which V = W and 0 = 7.Sec.23 assures us that if Q is the nxn matrix whose columns are the vectors in 0. then Q"1 AQ is a diagonal matrix.-dimensional vector space. . The corollary to Theorem 2.i Q~lAQ = 3 0 0 -1 2/' -2 for some t ^ 0.1 is the special case of Figure 2. the diagonal entries of this matrix are the eigenvalues of A that correspond to the respective columns of Q. Thus x is an eigenvector corresponding to A2 = — 1 if and only if x= t Observe that » . <f)p(v) = [v]p. Recall that for v G V. is diagonalizable.1 Eigenvalues and Eigenvectors Then 1 251 'xi x2 ' E N(Lfi2 if and only if rr is a solution to the system 2xi + x2 = 0 4xi + 2x2 = 0.

252 V V </'. since 0/? is an isomorphism. Suppose that v is an eigenvector of T corresponding to A. The next example illustrates this procedure. Recall that T has eigenvalues 1.( F" —bi_» p" Figure 5. Example 7 Let T be the linear operator on P2(R) defined in Example 5. and 3 and that A = [T}0 = / . . then v is an eigenvector of T corresponding to A. Let Ai = 1. 2. and define Bi = A-XJ Then j 3 6 R ! /0 = 0 \0 1 0N 1 2 |. Then T(v) = A?. 1 0\ 0 2 2 . a vector y G F n is an eigenvector of A corresponding to A if and only if 01 ] (y) is an eigenvector of T corresponding to A. 5 Diagonalization is an eigenvector of A corresponding to A. \ 0 0 3/ We consider each eigenvalue separately. 0 2.1 Chap. (See Exercise 13. and so we can establish that if 4>g(v) is an eigenvector of A corresponding to A. Thus we have reduced the problem of finding the eigenvectors of a linear operator on a finite-dimensional vector space to the problem of finding the eigenvectors of a matrix.) An equivalent formulation of the result discussed in the preceding paragraph is that for an eigenvalue A of A (and hence of T). This argument is reversible. Now 4>p(v) 7^ 0. and let 0 be the standard ordered basis for P 2 (i?). Hence A<p0(v) = \-A<j>p{v) = 4>pT(v) = 4>n(Xv) = Xcj>fl(v). hence 0/3 (n) is an eigenvector * of A..

Sec. 5.1 Eigenvalues and Eigenvectors

253

is an eigenvector corresponding to Ai = 1 if and only if x ^ 0 and x G N(LBJ); that is. x is a nonzero solution to the system x2 =0 x2 + 2x 3 = 0 2x3 = 0. Notice that this system has three unknowns, xi, x 2 , and X3, but one of these, xi, does not actually appear in the system. Since the values of xi do not affect the system, we assign xi a parametric value, say xi = a, and solve the system for x 2 and X3. Clearly, x 2 = X3 = 0, and so the eigenvectors of A corresponding to Ai = 1 are of the form

for a ^ 0. Consequently, the eigenvectors of T corresponding to Ai = 1 are of the form 0^ 1 (aei) = a0^ x (ei) = a-1 = a for any a ^ O . Hence the nonzero constant polynomials are the eigenvectors of T corresponding to Ai = 1. Next let A2 = 2, and define ' - 1 1 0N B2 = A - X2I = ( 0 0 2 0 0 1 It is easily verified that

and hence the eigenvectors of T corresponding to A2 = 2 are of the form ^" (ei + e 2 ) = a(l + x) V for 0 ^ 0. Finally, consider A3 = 3 and A-A3/= -2 0 0 1 0X -1 2 0 0

254 Since

Chap. 5 Diagonalization

the eigenvectors of T corresponding to A3 = 3 are of the form 0 ^ [ o | 2 | ) = a0 / ; 1 (e i + 2e2 + e 3 ) = a(l + 2x + x 2 ) V for a ^ 0. For each eigenvalue, select the corresponding eigenvector with a = 1 in the preceding descriptions to obtain 7 = { l , l + x , l + 2 x + x 2 } , which is an ordered basis for P2(R) consisting of eigenvectors of T. Thus T is diagonalizable. and / l 0 0\ [T]7 = 0 2 0 . \ 0 0 3/ W /

We close this section with a geometric description of how a linear operator T acts on an eigenvector in the context of a vector space V over R. Let v be an eigenvector of T and A be the corresponding eigenvalue. We can think of W = span({n}), the one-dimensional subspace of V spanned by v, as a line in V that passes through 0 and v. For any w G W, w = cv for some scalar c, and hence T(iy) = T(cn) = cT(n) = cAn = Xw; so T acts on the vectors in W by multiplying each such vector by A. There are several possible ways for T to act on the vectors in W, depending on the value of A. We consider several cases. (See Figure 5.2.) CASE 1. If A > 1, then T moves vectors in W farther from 0 by a factor of A. CASE 2. If A = 1, then T acts as the identity operator on W. CASE 3. If 0 < A < 1, then T moves vectors in W closer to 0 by a factor of A. CASE 4. If A = 0, then T acts as the zero transformation on W. CASE 5. If A < 0, then T reverses the orientation of W; that is, T moves vectors in W from one side of 0 to the other.

Sec. 5.1 Eigenvalues and Eigenvectors Case 1: A > 1

255

Case 2: A = 1

Case 3: 0 < A < 1

Case 4: A = 0

Case 5: A < 0

Figure 5.2: The action of T on W = span({v}) when v is an eigenvector of T. To illustrate these ideas, we consider the linear operators in Examples 3, 4, and 2 of Section 2.1. For the operator T on R2 defined by T(ai,a 2 ) = (ai, — a2), the reflection about the x-axis, ei and e2 are eigenvectors of T with corresponding eigenvalues 1 and — 1, respectively. Since ei and e 2 span the x-axis and the y-axis, respectively, T acts as the identity on the x-axis and reverses the orientation of the t/-axis. For the operator T on R2 defined by T(ai,a 2 ) = (ai,0), the projection on the x-axis, ei and e 2 are eigenvectors of T with corresponding eigenvalues 1 and 0, respectively. Thus, T acts as the identity on the x-axis and as the zero operator on the y-axis. Finally, we generalize Example 2 of this section by considering the operator that rotates the plane through the angle 9, which is defined by T0(0.1,0,2) = (ai cos# — a2sin*9, ai sin# + a2 cos9). Suppose that 0 < 9 < n. Then for any nonzero vector v, the vectors v and T#(n) are not collinear, and hence T# maps no one-dimensional subspace of R2 into itself. But this implies that T0 has no eigenvectors and therefore no eigenvalues. To confirm this conclusion, let 0 be the standard ordered basis for R2, and note that the characteristic polynomial of T# is det{[Te}3 - tl2) = det cos 9 — t — sin 1 = t - 2 - ( 2 c o s 0 ) t + l, sin# cos 9 — t

256

Chap. 5 Diagonalization

which has no real zeros because, for 0 < 9 < it, the discriminant 4 cos2 9 — 4 is negative. EXERCISES Label the following statements as true or false. (a) Every linear operator on an n-dimensional vector space has n distinct eigenvalues. (b) If a real matrix has one eigenvector, then it has an infinite number of eigenvectors. (c) There exists a square matrix with no eigenvectors. (d) Eigenvalues must be nonzero scalars. (e) Any two eigenvectors are linearly independent. (f) The sum of two eigenvalues of a linear operator T is also an eigenvalue of T. (g) Linear operators on infinite-dimensional vector spaces never have eigenvalues. (h) An n x n matrix A with entries from a field F is similar to a diagonal matrix if and only if there is a basis for F n consisting of eigenvectors of A. (i) Similar matrices always have the same eigenvalues, (j) Similar matrices always have the same eigenvectors, (k) The sum of two eigenvectors of an operator T is always an eigenvector of T. 2. For each of the following linear operators T on a vector space V and ordered bases 0, compute [T]/?, and determine whether 0 is a basis consisting of eigenvectors of T. 1 0 a - 66 , and 0 = (a) V = R2, T 17a - 106 (b) V = Pi (R), J(a + 6x) = (6a - 66) + (12a - 116)x, and 3 = {3 + 4x, 2 + 3x} 3a + 26 - 2c' (c) V = R - 4 a - 36 + 2c . and

(d) V = P 2 (i?),T(a + 6x + cx 2 ) = ( - 4 a + 26 - 2c) - (7a + 36 + 7c)x + (7a + 6 + 5c> 2 , and 0 = {x - x 2 , - 1 + x 2 , - 1 - x + x 2 }

Sec. 5.1 Eigenvalues and Eigenvectors (e) V = P3(R), T(a + 6x + ex2 + dx 3 ) = - d + (-c + d)x + (a + 6 - 2c)x2 + ( - 6 + c - 2d)x 3 , and /? = {1 - x + x 3 , l + x 2 , l , x + x 2 } (f) V = M 2 x 2 (i2),T c 1 0 1 0 d - 7 a - 4 6 + Ac -Ad b , and - 8 a - 46 + 5c - Ad d -1 2 0 0 1 0 2 0 -1 0 0 2

257

0 =

3. For each of the following matrices A G M n X n (F), (i) Determine all the eigenvalues of A. (ii) For each eigenvalue A of A, find the set of eigenvectors corresponding to A.
n (hi; If possible, find a basis for F consisting of eigenvectors of A. (iv; If successful in finding such a basis, determine an invertible matrix Q and a diagonal matrix D such that Q~x AQ = D.

=R

for F = R

for F = C

for F = R 4. For each linear operator T on V, find the eigenvalues of T and an ordered basis 0 for V such that [T]/3 is a diagonal matrix. (a) (b) (c) (d) (e) (f) (g) V= V= V= V= V= V= V= R2 and T(a, 6) = ( - 2 a + 36. - 10a + 96) R3 and T(a, 6, c) = (7a - 46 + 10c, 4a - 36 + 8c, - 2 a + 6 - 2c) R3 and T(a, 6, c) = ( - 4 a + 3 6 - 6c. 6a - 76 + 12c, 6a - 66 + lie) P,(7?) and T(u.r + 6) = ( - 6 a + 26)x + ( - 6 a + 6) P2(R) and T(/(x)) = x/'(x) + /(2)x + /(3) P3(R) and T(/(x)) = f(x) + /(2)x P3(R) and T(/(x)) = xf'(x) + f"(x) - f(2) and T (a ^

(h) V = M2x2(R)

258 (i) V = M 2 x 2 (fi) and T d1 c \a d 6

Chap. 5 Diagonalization

(j) V = M2x2(/?.) and T(A) = A1 + 2 • tr(A) • I2 5. Prove Theorem 5.4. 6. Let T be a linear operator on a finite-dimensional vector space V, and let 0 be an ordered basis for V. Prove that A is an eigenvalue of T if and only if A is an eigenvalue of [T]#. 7. Let T be a linear operator on a finite-dimensional vector space V. We define the d e t e r m i n a n t of T, denoted det(T), as follows: Choose any ordered basis 0 for V, and define det(T) = det([T]/3). (a) Prove that the preceding definition is independent of the choice of an ordered basis for V. That is, prove that if 0 and 7 are two ordered bases for V, then det([T]/3) = det([T] 7 ). (b) Prove that T is invertible if and only if det(T) ^ 0. (c) Prove that if T is invertible, then det(T _ 1 ) = [det(T)]" 1 . (d) Prove that if U is also a linear operator on V, then det(TU) = dct(T)-det(U). (e) Prove that det(T — Aly) = det([T]/j — XI) for any scalar A and any ordered basis 0 for V. 8. (a) Prove that a linear operator T on a finite-dimensional vector space is invertible if and only if zero is not an eigenvalue of T. (b) Let T be an invertible linear operator. Prove that a scalar A is an eigenvalue of T if and only if A~! is an eigenvalue of T - 1 . (c) State and prove results analogous to (a) and (b) for matrices. 9. Prove that the eigenvalues of an upper triangular matrix M are the diagonal entries of M. 10. Let V be a finite-dimensional vector space, and let A be any scalar. (a) For any ordered basis 0 for V, prove that [Aly]/? = XI. (b) Compute the characteristic polynomial of Aly. (c) Show that Aly is diagonalizable and has only one eigenvalue. 11. A scalar m a t r i x is a square matrix of the form XI for some scalar A; that is, a scalar matrix is a diagonal matrix in which all the diagonal entries are equal. (a) Prove that if a square matrix A is similar to a scalar matrix XI, then A- XL (b) Show that a diagonalizable matrix having only one eigenvalue is a scalar matrix.

Sec. 5.1 Eigenvalues and Eigenvectors (c) Prove that I . 1 is not diagonalizable.

259

12. (a) Prove that similar matrices have the same characteristic polynomial. (b) Show that the definition of the characteristic polynomial of a linear operator on a finite-dimensional vector space V is independent of the choice of basis for V. 13. Let T be a linear operator on a finite-dimensional vector space V over a field F, let 0 be an ordered basis for V, and let A = [T]/3. In reference to Figure 5.1, prove the following. (a) If r G V and <t>0(v) is an eigenvector of A corresponding to the eigenvalue A, then v is an eigenvector of T corresponding to A. (b) If A is an eigenvalue of A (and hence of T), then a vector y G F n is an eigenvector of A corresponding to A if and only if 4>~Q (y) is an eigenvector of T corresponding to A. 14.* For any square matrix A, prove that A and A1 have the same characteristic polynomial (and hence the same eigenvalues). 15- (a) Let T be a linear operator on a vector space V, and let x be an eigenvector of T corresponding to the eigenvalue A. For any positive integer m, prove that x is an eigenvector of T m corresponding to the eigenvalue A7", (b) State and prove the analogous result for matrices. 16. (a) Prove that similar matrices have the same trace. Hint: Use Exercise 13 of Section 2.3. (b) How would you define the trace of a linear operator on a finitedimensional vector space? Justify that your definition is welldefined. 17. Let T be the linear operator on M n X n (i?) defined by T(A) = A1. (a) Show that ±1 are the only eigenvalues of T. (b) Describe the eigenvectors corresponding to each eigenvalue of T. (c) Find an ordered basis 0 for M2x2(Z?) such that [T]/3 is a diagonal matrix. (d) Find an ordered basis 0 for Mnx„(R) such that [T]p is a diagonal matrix for n > 2. 18. Let A,B e M„ x n (C). (a) Prove that if B is invertible, then there exists a scalar c G C such that A + cB is not invertible. Hint: Examine det(A + cB).

r 260 Chap. 5 Diagonalization (b) Find nonzero 2 x 2 matrices A and B such that both A and A + cB are invertible for all e G C. 19. * Let A and B be similar nxn matrices. Prove that there exists an ndimensional vector space V, a linear operator T on V, and ordered bases 0 and 7 for V such that A = [T]/3 and B = [T]7. Hint: Use Exercise 14 of Section 2.5. 20. Let A be an nxn matrix with characteristic polynomial + a n - i r " - 1 + • • • + a ^ + a0.

f(t) = (-l)ntn

Prove that /(0) = an = det(A). Deduce that A is invertible if and only if a 0 ^ 0. 21. Let A and f(t) be as in Exercise 20. (a) Prove t h a t / ( 0 = (An-t)(A22-t) ••• (Ann-t) + q(t), where q(t) is a polynomial of degree at most n — 2. Hint: Apply mathematical induction to n. (b) Show that tr(A) = ( - l ) n - 1 a n _ i . 22. t (a) Let T be a linear operator on a vector space V over the field F, and let g(t) be a polynomial with coefficients from F. Prove that if x is an eigenvector of T with corresponding eigenvalue A, then g(T)(x) = g(X)x. That is, x is an eigenvector of o(T) with corresponding eigenvalue g(X). (b) State and prove a comparable result for matrices. (c) Verify (b) for the matrix A in Exercise 3(a) with polynomial g(t) = 2t2 — t + 1, eigenvector x = I „ J, and corresponding eigenvalue A = 4. 23. Use Exercise 22 to prove that if f(t) is the characteristic polynomial of a diagonalizable linear operator T, then /(T) = To, the zero operator. (In Section 5.4 we prove that this result does not depend on the diagonalizability of T.) 24. Use Exercise 21(a) to prove Theorem 5.3. 25. Prove Corollaries 1 and 2 of Theorem 5.3. 26. Determine the number of distinct characteristic polynomials of matrices in M 2 x 2 (Z 2 ).

Sec. 5.2 Diagonalizability 5.2 DIAGONALIZABILITY

261

In Section 5.1, we presented the diagonalization problem and observed that not all linear operators or matrices are diagonalizable. Although we are able to diagonalize operators and matrices and even obtain a necessary and sufficient condition for diagonalizability (Theorem 5.1 p. 246), we have not yet solved the diagonalization problem. What is still needed is a simple test to determine whether an operator or a matrix can be diagonalized, as well as a method for actually finding a basis of eigenvectors. In this section, we develop such a test and method. In Example 6 of Section 5.1, we obtained a basis of eigenvectors by choosing one eigenvector corresponding to each eigenvalue. In general, such a procedure does not yield a basis, but the following theorem shows that any set constructed in this manner is linearly independent. Theorem 5.5. Let T be a linear operator on a vector space V, and let Ai, A 2 ,..., Xk be distinct eigenvalues of T. Ifvi,v2, • • • ,vk are eigenvectors of T such that Xi corresponds to Vi (I < i < k), then {vi,v2,... ,vk} is linearly independent. Proof. The proof is by mathematical induction on k. Suppose that k = 1. Then vi ^ 0 since v\ is an eigenvector, and hence {vi} is linearly independent. Now assume that the theorem holds for k - 1 distinct eigenvalues, where k — 1 > 1, and that we have k eigenvectors v\, v2,..., Vk corresponding to the distinct eigenvalues Ai, A 2 , . . . , Xk. We wish to show that {vi, v2,..., vk} is linearly independent. Suppose that a i , a 2 , . . . , a k are scalars such that aini+a2n2-l Yakvk = 0. (1)

Applying T - A^ I to both sides of (1), we obtain ai(Ai - Xk)vi +a 2 (A 2 - Afc)n2 + ••• +a fc _i(A fe _ 1 - Xk)vk-i By the induction hypothesis { f i , v 2 , . . . ,vk-i} hence = 0.

is linearly independent, and

ai(Xi - Afc) = a 2 (A 2 - Afc) = • • • = a fc _i(A fc -i - Afc) = 0. Since Ai, A 2 ,... , A^ are distinct, it follows that Aj - A^ ^ 0 for 1 < i < k — 1. So ai = a 2 = • • • = afc_i = 0, and (1) therefore reduces to akvk = 0. But vk ^ 0 and therefore ak = 0. Consequently ai = a 2 = • • • = ak = 0, and it follows that {vi,v2,... ,vk} is linearly independent. Corollary. Let T be a linear operator on an n-dimensional vector space V. If T has n distinct eigenvalues, then T is diagonalizable.

262

Chap. 5 Diagonalization

Proof. Suppose that T has n distinct eigenvalues Ai,...,A„. For each i choose an eigenvector Vi corresponding to A;. By Theorem 5.5, {v\,... ,vn} is linearly independent, and since dim(V) = n, this set is a basis for V. Thus, by Theorem 5.1 (p. 246), T is diagonalizable. 1 Example 1 Let A = 1 1 G 1 1 M2x2(R).

The characteristic polynomial of A (and hence of L^) is det (A - tl) = det \-t 1 1 1-rf = t(t - 2),

and thus the eigenvalues of L^ are 0 and 2. Since L^ is a linear operator on the two-dimensional vector space R2, we conclude from the preceding corollary that L.4 (and hence A) is diagonalizable. • The converse of Theorem 5.5 is false. That is, it is not true that if T is diagonalizable. then it has n distinct eigenvalues. For example, the identity operator is diagonalizable even though it has only one eigenvalue, namely, A = l. We have seen that diagonalizability requires the existence of eigenvalues. Actually, diagonalizability imposes a stronger condition on the characteristic polynomial. Definition. A polynomial f(t) in P(F) splits over F if there are scalars c , a i , . . . ,a„ (not necessarily distinct) in F such that f(t) = c(t-ai)(t-a2)---(t-an).

For example, t2 - 1 = (t + l)(t - 1) splits over R, but (t2 + l)(t - 2) does not split over R because t2 + 1 cannot be factored into a product of linear factors. However, (t2 + \)(t - 2) does split over C because it factors into the product (t + i)(t — i)(t — 2). If f(t) is the characteristic polynomial of a linear operator or a matrix over a field F , then the statement that f(t) splits is understood to mean that it splits over F. T h e o r e m 5.6. The characteristic polynomial of any diagonalizable linear operator splits. Proof. Let T be a diagonalizable linear operator on the n-dimensional vector space V. and let 0 be an ordered basis for V such that [T)p = D is a

Sec. 5.2 Diagonalizability diagonal matrix. Suppose that (Xx 0 D= \o o •••
A,,/

263

0 A2

0\ 0

and let f(t) be the characteristic polynomial of T. Then (Xx-t 0 f(t) = det(D - tl) = det V 0 0 ••• A»-t/ I 0 A2 - f ••• ••• 0 \ 0

= (Ai - t)(X2 - t) • • • (A„ - 1 ) = ( - 1 H * - Ai)(t - A 2 )... (t - An). From this theorem, it is clear that if T is a diagonalizable linear operator on an n-dimensional vector space that fails to have distinct eigenvalues, then the characteristic polynomial of T must have repeated zeros. The converse of Theorem 5.6 is false: that is, the character ist i polynomial < of T may split, but T need not be diagonalizable. (See Example 3. which follows.) The following concept helps us determine when an operator whose characteristic polynomial splits is diagonalizable. Definition. Let X be an eigenvalue of a linear operator or matrix with characteristic polynomial f(t). The (algebraic) multiplicity of X is the largest positive integer k for which (t - X)k is a factor of f(t). Example 2 Let

which has characteristic polynomial f(t) = — (t — 3)2(t — A). Hence A = 3 is an eigenvalue of A with multiplicity 2. and A = 4 is an eigenvalue of A with multiplicity 1. • If T is a diagonalizable linear operator on a finite-dimensional vector spate V, then there is an ordered basis 0 for V consisting of eigenvectors of T. We know from Theorem 5.1 (p. 246) that [T]/3 is a diagonal matrix in which the diagonal entries are the eigenvalues of T. Since the characteristic polynomial of T is detQT]/? — tl), it is easily seen that each eigenvalue of T must occur as a diagonal entry of [T]/3 exactly as many times as its multiplicity. Hence

264

Chap. 5 Diagonalization

0 contains as many (linearly independent) eigenvectors corresponding to an eigenvalue as the multiplicity of that eigenvalue. So the number of linearly independent eigenvectors corresponding to a given eigenvalue is of interest in determining whether an operator can be diagonalized. Recalling from Theorem 5.4 (p. 250) that the eigenvectors of T corresponding to the eigenvalue A are the nonzero vectors in the null space of T — AI, we are led naturally to the study of this set. Definition. Let T be a linear operator on a vector space V, and let X be an eigenvalue ofT. Define E\ = {x G V: T(x) = Ax} = N(T — Aly). The set E\ is called the eigenspace of T corresponding to the eigenvalue X. Analogously, we define the eigenspace of a square matrix A to be the eigenspace of LA. Clearly, E\ is a subspace of V consisting of the zero vector and the eigenvectors of T corresponding to the eigenvalue A. The maximum number of linearly independent eigenvectors of T corresponding to the eigenvalue A is therefore the dimension of EA- Our next result relates this dimension to the multiplicity of A. T h e o r e m 5.7. Let T be a linear operator on a finite-dimensional vector space V, and let X be an eigenvalue of T having multiplicity m. Then 1 < dim(EA) < m. Proof. Choose an ordered basis {v\. r2 vp} for EA. extend it to an ordered basis 0 = {vi, t)2, • • •, Up, vp+i,..., vn} for V, and let A = [T]g. Observe that V{ (1 < i < p) is an eigenvector of T corresponding to A, and therefore
A

-{()

C

By Exercise 21 of Section 4.3, the characteristic polynomial of T is f{t) = det(A - tin) = det ({X = det((A - t)Ip) det(C = (X-ty>g(t), J)l" tln-p) B C tln-p

where g(t) is a polynomial. Thus (A — t)p is a factor of f(t), and hence the multiplicity of A is at least p. But dim(EA) = p, and so dim(EA) < rn. Example 3 Let T be the linear operator on P2(R) defined by T(/(x)) = f'(x). The matrix representation of T with respect to the standard ordered basis 0 for

Consequently. and therefore T is not diagonalizable. the characteristic polynomial of T is /-/.) [J}0 = 0 \0 1 0\ 0 2 0 0/ 265 Consequently.Sec.AI) = N(T) is the subspace of P2(R) consisting of the constant polynomials. 5.tl) = (let [ 2 \ I 0 3.1 det([T]/3 . Solving T(/(. -t. . Let > be the standard ordered basis for R3. and hence the characteristic polynomial of T is /4 . det([T]p . So the eigenvalues of T are Ai = 5 and A2 = 3 with multiplicities 1 and 2.= I 2 3 l\ 2 . • Example 4 Let T be the linear operator on R3 defined by A. So {1} is a basis for EA.i/ \ ai + Au->) T We determine the eigenspace of T corresponding to each eigenvalue.. A /4ai + a3\ a 2 I = I 2ai + 3a 2 + 2o3 .£/) = det 0 V 0 1 -/ 0 0N 2 = -t*.) is /. there is no basis for P2(R) consisting of eigenvectors of T.r)) = f'(x) = 0 shows that EA = N(T . respectively.2 Diagonalizability P2(R. and therefore dim(EA) = 1. Then /4 0 [T]. Thus T has only one eigenvalue (A = 0) wit h mult iplicity 3. \".t 0 1N 2 A-t -(t-5)(t-3)2.

x3 = 0. we assign it a parametric value. introducing another parameter t. • . The result is the general solution to the system 0 .) = 1. • Hence dim(EA. Consequently. Since the unknown x 2 does not appear in this system. t G R.2x 2 + 2x 3 = 0 xi . namely. In this case. say. is linearly independent and hence is a basis for R3 consisting of eigenvectors of T. 5 Diagonalization EAX is the solution space of the system of linear equations -xi + x3 = 0 2xi . for s. Similarly. the multiplicity of each eigenvalue Ai is equal to the dimension of the corresponding eigenspace EA^ . It is easily seen (using the techniques of Chapter 3) that is a basis for EA. Observe that the union of the two bases just derived. EA2 = N(T — A2I) is the solution space of the system x\ + x 3 = 0 2xi + 2x3 = 0 x\ + x 3 = 0. X2 = s. T is diagonalizable. and dim(EA2) = 2. It follows that is a basis for EA2. and solve the system for xi and x3.266 Chap.

2. This is indeed true. For each i = 1. = 0. for each /'..5. Therefore.5.. be distinct eigenvalues ot'T.. let Wi = ^ajjVij. Bui each . be a finite linearly independent subset of the eigenspace EA.. suppose that. Wi — 0 for all i. • Then S = Si U S2 u • • • U Sk is a linearly independent subset ofW. and W\ + • • • + wk = 0. Proof. Vi is an eigenvector of T corresponding to A.Sec.scalars {a^} such that ^^a-ijVij i=i j=i For each i. therefore.vini}. and 1 < i < A:}. that Vi = 0 for all i. But this contradicts Theorem 5. for each i < m.. Let T be a linear operator on a vector space V. let Vi G EAJ. we have v. which is a slight variation of Theorem 5.. We conclude that S is lineally independent. by the lemma. and hence a^ — 0 for all j. We conclude. and Vi = 0 for / > m. the eigenspace corresponding to A. A : be distinct A eigenvalues of T. is linearly independent. and let Ai. which states that these r..."s are linearly independent. . T h e o r e m 5.... Then Wi G EA.2.. ^ 0 for 1 < i < m.. Suppose otherwise. Lemma.. Consider an\. . and let X\.A. For each i = 1.. 5.A2 . for 1 < m < k.. We begin with the following lemma. A/..k.2 Diagonalizability 267 Examples 3 and 1 suggest that an operator whose characteristic polynomial splits is diagonalizable if and only if the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue. let S. Suppose that for each i Si = {viUVi2.. //' Vi+v2-\ then V{ = 0 for all i.. Then. A 2 . By renumbering if necessary. as we now show.S'. Then S = [vy : 1 < j < /.8.. Proof. and v\ + v2 H h vm = 0. Let T be a linear operator. rVk = 0.

0 contains = n i=l i=l 2 We regard 0\ U 02 U • • • l)0k as an ordered basis in the natural way. di < 2 . then the vectors in /52 (in the same order as in ii->). for all i..7. Proof. (b) If T is diagonaliy. denote the multiplicity of A. etc. the set of vectors in 0 that are eigenvectors corresponding to At. since d. For each /. Con\crsely.9. A 2 .. let /... let 0i be an ordered basis for E < . d. is a linearly independent subset of a subspace of dimension (/. For each i. by collecting bases for the individual eigeuspaces. = m. let 0r = 0 n E\t.. and let n^ denote the number of vectors in ).8. for each i.) for all i. Let 0 be a basis for V consisting of eigenvectors of T. i — /_. < <l. — di) > 0 for all i.-the vectors in 0\ are listed first (in the same order as in f3\). Thus k k k n n = 2__. and n = dim(V). . Then (a) T is diagonalizable if and only if the multiplicity of Xi is equal to dim(EA.nhie and I. = dim(EA. mi 'I 1=1 2=1 It follows that ^ ( m . namely.). we conclude that ntj = dt. Then n. suj)pose that nij = d.. = n - . For each /. then 0 = 0i U02 U • • • U0k is an ordered basis2 for V consisting of eigenvectors ofT.268 Chap. By Theorem 5. is an ordered basis for EA.. Let Ai. The next theorem tells us when the resulting set is a basis for the entire space. We simultaneously show that T is diagonalizable and prove (b). and di < nrii by Theorem 5. A Furthermore.for all i.d t ) = 0. for all i.. and let 0 = 0\ \J02U • • • U0k.. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. First. Xk be the distinct eigenvalues ofT.. The n^'s sum to n because 0 contains n vectors. 5 Diagonalization Theorem 5. for each / because 1. suppose that T is diagonalizable.. i=i Since (m. T h e o r e m 5. 0 is linearly independent.8 tells us how to construct a linearly independent subset of eigenvectors. The mVs also sum to n because the degree of the characteristic polynomial of T is equal to the sum of the multiplicities of the eigenvalues.

ra\\k(B — Xf). If T is a diagonalizable operator and :j\. we can use the corollary to Theorem 2.rank(T — AI). The characteristic polynomial ofT splits. condition 2 is automatically satisfied for eigenvalues of multiplicity 1. For each eigenvalue A ofT. If the characteristic polynomial of B splits. 0k are ordered bases for the eigenspaces of T.Sec.. The union of these bases is a basis *) for F" consisting of eigenvectors of B. Furthermore. and we conclude that T is diagonalizable. then use condition 2 above to check if the multiplicity of each repeated eigenvalue of B equals n . 115) to find an invertible n x n matrix Q and a diagonal n x n matrix D such that Q~l AQ = D.. If T is diagonalizable and a basis 8 for V consisting of eigenvectors o f T is desired.. 1. We now consider some examples illustrating the preceding ideas. The matrix Q has as its columns the vectors in a basis of eigenvectors of A. then the union 0 = 0iU02U-• -U0k is an ordered basis for V consisting of eigenvectors of T. and hence T. then we first find a basis for each eigenspace of B. (By Theorem 5. 02. We summarize our results. Then T is diagonalizable if and only if both of the following conditions hold. 2. if A is an n x n diagonalizable matrix. is diagonalizable. it is usually easiest to choose a convenient basis Q for V and work with B = [T] a . Example 5 We test t he matrix G M 3x :i(/r) for diagonalizability. . Each vector in 7 is the coordinate vector relative to a of an eigenvector of T. and hence [T]@ is a diagonal matrix. This theorem completes our study of the diagonalization problem.7. The set consisting of these n eigenvectors of T is the desired basis >. Test for Diagonalization Let T be a linear operator on an n-dimensional vector space V.23 (p.. and I) has as its j t h diagonal entry the eigenvalue of A corresponding to the jth column of Q. When testing T for diagonalizability. Therefore 3 is an ordered basis for V consisting of eigenvectors of V. the multiplicity of A equals n .) If so. These same conditions can be used to test if a square matrix A is diagonalizable because diagonalizability of A is equivalent to diagonalizability of the operator L^. then B. 5.2 Diagonalizability 269 vectors.

We first test T for diagonalizability. Example 6 Let T be the linear operator on P2(R) defined by T(/(x)) = / ( l ) + f'(0)x + (/'(()) + /"(0))x 2 . condition 2 is satisfied for X\. Since A| has multiplicity 1. Then B = /l 0 \0 1 l\ 1 0 . 3 . The eigenspace corresponding to Ai = 1 is G R' 0 1 1\ (xi 0 0 0 u 0 1 1/ V'':i Ex. which is not the multiplicity of A2.X2I = I 0 0 0 I \° ° 7 has rank 2. we see that 3 — rank(A — A27) = 1. For this case. Also A has eigenvalues Ai = 4 and A2 = 3 with multiplicities 1 and 2. and so condition 1 of the test for diagonalization is satisfied. Hence condition 1 of the test for diagonalization is satisfied. Also B has the eigenvalues Ai = 1 and A2 = 2 with multiplicities 2 and 1. Therefore T is diagonalizable.270 Chap. and hence of T. and A is therefore not diagonalizable. So we need only verify condition 2 for Ai = 1 . respectively. Let a denote the standard ordered basis for P2(R) and B = [T)n. which splits. 1 l) The characterist ic polynomial of li.t J ) = -(t — A)(t-3)2.rank which is equal to the multiplicity of A|. We now find an ordered basis 7 for R3 of eigenvectors of B. Thus we need only test condition 2 for A2. I) = 3 . is (/ I )2{l 2). Condition 2 is satisfied for A2 because it has multiplicity 1. Thus condition 2 fails for A2. Because /() 1 0\ A . = = 0 . 5 Diagonalization The characteristic polynomial of A is d e t ( A . We consider each eigenvalue separately. respectively.A.rank(£ . which splits.

which is the solution space for the system

x2 + x3 = 0,

and has

γ1 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ −1 \\ 1 \end{pmatrix} \right\}

as a basis. The eigenspace corresponding to λ2 = 2 is

E_{λ2} = \left\{ \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} ∈ R³ : \begin{pmatrix} −1 & 1 & 1 \\ 0 & −1 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\},

which is the solution space for the system

−x1 + x2 + x3 = 0
         x2      = 0,

and has

γ2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}

as a basis. Let

γ = γ1 ∪ γ2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ −1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}.

Then γ is an ordered basis for R³ consisting of eigenvectors of B. Finally, observe that the vectors in γ are the coordinate vectors relative to α of the vectors in the set

β = { 1, −x + x², 1 + x² },

which is an ordered basis for P2(R) consisting of eigenvectors of T. Thus

[T]_β = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
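The rank computations in Examples 5 and 6 are easy to carry out by machine. The following sketch (an illustrative NumPy check, not part of the exposition; the eigenvalue-grouping tolerance is an assumption of the sketch) applies the test for diagonalization by comparing the multiplicity of each eigenvalue of a matrix with n − rank(A − λI).

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Rank test: for each eigenvalue lam of A, check that its
    multiplicity equals n - rank(A - lam*I)."""
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)
    # Group numerically equal eigenvalues to obtain multiplicities.
    distinct = []
    for lam in eigenvalues:
        for entry in distinct:
            if abs(lam - entry[0]) < tol:
                entry[1] += 1
                break
        else:
            distinct.append([lam, 1])
    for lam, mult in distinct:
        rank = np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if mult != n - rank:   # dim E_lam = n - rank(A - lam*I)
            return False
    return True

# The matrix of Example 5: eigenvalue 3 is repeated but dim E_3 = 1.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])
print(is_diagonalizable(A))   # False

# The matrix B of Example 6 passes the test.
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 2.0]])
print(is_diagonalizable(B))   # True
```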

Our next example is an application of diagonalization that is of interest in Section 5.3.

Example 7
Let

A = \begin{pmatrix} 0 & −2 \\ 1 & 3 \end{pmatrix}.

We show that A is diagonalizable and find a 2 × 2 matrix Q such that Q^{-1}AQ is a diagonal matrix. We then show how to use this result to compute A^n for any positive integer n.

First observe that the characteristic polynomial of A is (t − 1)(t − 2), and hence A has two distinct eigenvalues, λ1 = 1 and λ2 = 2. By applying the corollary to Theorem 5.5 to the operator L_A, we see that A is diagonalizable. Moreover,

γ1 = \left\{ \begin{pmatrix} −2 \\ 1 \end{pmatrix} \right\}  and  γ2 = \left\{ \begin{pmatrix} −1 \\ 1 \end{pmatrix} \right\}

are bases for the eigenspaces E_{λ1} and E_{λ2}, respectively. Therefore

γ = γ1 ∪ γ2 = \left\{ \begin{pmatrix} −2 \\ 1 \end{pmatrix}, \begin{pmatrix} −1 \\ 1 \end{pmatrix} \right\}

is an ordered basis for R² consisting of eigenvectors of A. Let

Q = \begin{pmatrix} −2 & −1 \\ 1 & 1 \end{pmatrix},

the matrix whose columns are the vectors in γ. Then, by the corollary to Theorem 2.23 (p. 115),

D = Q^{-1}AQ = [L_A]_γ = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.

To find A^n for any positive integer n, observe that A = QDQ^{-1}. Therefore

A^n = (QDQ^{-1})^n = (QDQ^{-1})(QDQ^{-1})⋯(QDQ^{-1}) = QD^nQ^{-1}
    = Q\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}Q^{-1}
    = \begin{pmatrix} −2 & −1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} −1 & −1 \\ 1 & 2 \end{pmatrix}
    = \begin{pmatrix} 2 − 2^n & 2 − 2^{n+1} \\ −1 + 2^n & −1 + 2^{n+1} \end{pmatrix}.
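The closed form for A^n can be confirmed numerically. The sketch below (an illustrative NumPy computation, not part of Example 7) diagonalizes A and compares QD^nQ^{-1} with the formula just derived and with a direct matrix power.

```python
import numpy as np

A = np.array([[0.0, -2.0],
              [1.0,  3.0]])

# Diagonalize: columns of Q are eigenvectors, D holds the eigenvalues.
eigenvalues, Q = np.linalg.eig(A)
D = np.diag(eigenvalues)

n = 10
A_power = Q @ np.linalg.matrix_power(D, n) @ np.linalg.inv(Q)

# Closed form derived in Example 7.
closed_form = np.array([[ 2 - 2**n,  2 - 2**(n + 1)],
                        [-1 + 2**n, -1 + 2**(n + 1)]])

print(np.allclose(A_power, closed_form))                     # True
print(np.allclose(A_power, np.linalg.matrix_power(A, n)))    # True
```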

We now consider an application that uses diagonalization to solve a system of differential equations.

Systems of Differential Equations
Consider the system of differential equations

x1′ = 3x1 + x2 + x3
x2′ = 2x1 + 4x2 + 2x3
x3′ = −x1 − x2 + x3,

where, for each i, xi = xi(t) is a differentiable real-valued function of the real variable t. Clearly, this system has a solution, namely, the solution in which each xi(t) is the zero function. We determine all of the solutions to this system.

Let x: R → R³ be the function defined by

x(t) = \begin{pmatrix} x1(t) \\ x2(t) \\ x3(t) \end{pmatrix}.

The derivative of x, denoted x′, is defined by

x′(t) = \begin{pmatrix} x1′(t) \\ x2′(t) \\ x3′(t) \end{pmatrix}.

Let

A = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ −1 & −1 & 1 \end{pmatrix}

be the coefficient matrix of the given system, so that we can rewrite the system as the matrix equation x′ = Ax.

It can be verified that for

Q = \begin{pmatrix} −1 & −1 & −1 \\ 1 & 0 & −2 \\ 0 & 1 & 1 \end{pmatrix}  and  D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}

we have Q^{-1}AQ = D. Substitute A = QDQ^{-1} into x′ = Ax to obtain x′ = QDQ^{-1}x or, equivalently, Q^{-1}x′ = DQ^{-1}x. The function y: R → R³ defined by y(t) = Q^{-1}x(t) can be shown to be differentiable, and y′ = Q^{-1}x′ (see Exercise 16). Hence the original system can be written as y′ = Dy.

Since D is a diagonal matrix, the system y′ = Dy is easy to solve. Setting

y(t) = \begin{pmatrix} y1(t) \\ y2(t) \\ y3(t) \end{pmatrix},

we can rewrite y′ = Dy as

\begin{pmatrix} y1′(t) \\ y2′(t) \\ y3′(t) \end{pmatrix} = \begin{pmatrix} 2y1(t) \\ 2y2(t) \\ 4y3(t) \end{pmatrix}.

The three equations

y1′ = 2y1,  y2′ = 2y2,  y3′ = 4y3

are independent of each other, and thus can be solved individually. It is easily seen (as in Example 3 of Section 5.1) that the general solution to these equations is y1(t) = c1e^{2t}, y2(t) = c2e^{2t}, and y3(t) = c3e^{4t}, where c1, c2, and c3 are arbitrary constants. Finally,

\begin{pmatrix} x1(t) \\ x2(t) \\ x3(t) \end{pmatrix} = x(t) = Qy(t) = \begin{pmatrix} −1 & −1 & −1 \\ 1 & 0 & −2 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} c1e^{2t} \\ c2e^{2t} \\ c3e^{4t} \end{pmatrix} = \begin{pmatrix} −c1e^{2t} − c2e^{2t} − c3e^{4t} \\ c1e^{2t} − 2c3e^{4t} \\ c2e^{2t} + c3e^{4t} \end{pmatrix}

yields the general solution of the original system. Note that this solution can be written as

x(t) = e^{2t}\left[ c1\begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix} + c2\begin{pmatrix} −1 \\ 0 \\ 1 \end{pmatrix} \right] + e^{4t}\left[ c3\begin{pmatrix} −1 \\ −2 \\ 1 \end{pmatrix} \right].

The expressions in brackets are arbitrary vectors in E_{λ1} and E_{λ2}, respectively, where λ1 = 2 and λ2 = 4. Thus the general solution of the original system is x(t) = e^{2t}z1 + e^{4t}z2, where z1 ∈ E_{λ1} and z2 ∈ E_{λ2}. This result is generalized in Exercise 15.
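The diagonalization that drives this solution is easy to confirm numerically. The following sketch (an illustrative NumPy check, not part of the example) verifies that Q^{-1}AQ equals the diagonal matrix D used above and that x(t) = Qy(t), with y_i(t) = c_i e^{λ_i t}, satisfies x′ = Ax at a sample point.

```python
import numpy as np

A = np.array([[ 3.0,  1.0, 1.0],
              [ 2.0,  4.0, 2.0],
              [-1.0, -1.0, 1.0]])
Q = np.array([[-1.0, -1.0, -1.0],
              [ 1.0,  0.0, -2.0],
              [ 0.0,  1.0,  1.0]])
D = np.diag([2.0, 2.0, 4.0])

# Columns of Q are eigenvectors of A, so Q^{-1} A Q = D.
print(np.allclose(np.linalg.inv(Q) @ A @ Q, D))   # True

# x(t) = Q y(t) with y_i(t) = c_i exp(lam_i t) satisfies x' = A x.
c = np.array([1.0, -2.0, 0.5])     # arbitrary constants
lam = np.array([2.0, 2.0, 4.0])
t = 0.3
x = Q @ (c * np.exp(lam * t))              # x(t)
x_prime = Q @ (c * lam * np.exp(lam * t))  # x'(t), differentiating each y_i
print(np.allclose(x_prime, A @ x))         # True
```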

Direct Sums*
Let T be a linear operator on a finite-dimensional vector space V. There is a way of decomposing V into simpler subspaces that offers insight into the behavior of T. This approach is especially useful in Chapter 7, where we study nondiagonalizable linear operators. In the case of diagonalizable operators, the simpler subspaces are the eigenspaces of the operator.

Definition. Let W1, W2, ..., Wk be subspaces of a vector space V. We define the sum of these subspaces to be the set

{ v1 + v2 + ⋯ + vk : vi ∈ Wi for 1 ≤ i ≤ k },

which we denote by W1 + W2 + ⋯ + Wk or \sum_{i=1}^{k} Wi.

It is a simple exercise to show that the sum of subspaces of a vector space is also a subspace.

Example 8
Let V = R³, let W1 denote the xy-plane, and let W2 denote the yz-plane. Then R³ = W1 + W2 because, for any vector (a, b, c) ∈ R³, we have

(a, b, c) = (a, 0, 0) + (0, b, c),

where (a, 0, 0) ∈ W1 and (0, b, c) ∈ W2.

Notice that in Example 8 the representation of (a, b, c) as a sum of vectors in W1 and W2 is not unique. For example, (a, b, c) = (a, b, 0) + (0, 0, c) is another representation. Because we are often interested in sums for which representations are unique, we introduce a condition that assures this outcome. The definition of direct sum that follows is a generalization of the definition given in the exercises of Section 1.3.

Definition. Let W1, W2, ..., Wk be subspaces of a vector space V. We call V the direct sum of the subspaces W1, W2, ..., Wk, and write V = W1 ⊕ W2 ⊕ ⋯ ⊕ Wk, if

V = \sum_{i=1}^{k} Wi  and  Wj ∩ \sum_{i ≠ j} Wi = {0}  for each j (1 ≤ j ≤ k).

Example 9
Let V = R⁴, W1 = {(a, b, 0, 0) : a, b ∈ R}, W2 = {(0, 0, c, 0) : c ∈ R}, and W3 = {(0, 0, 0, d) : d ∈ R}. For any (a, b, c, d) ∈ V,

(a, b, c, d) = (a, b, 0, 0) + (0, 0, c, 0) + (0, 0, 0, d) ∈ W1 + W2 + W3.

Thus

V = \sum_{i=1}^{3} Wi.

To show that V is the direct sum of W1, W2, and W3, we must prove that W1 ∩ (W2 + W3) = W2 ∩ (W1 + W3) = W3 ∩ (W1 + W2) = {0}. But these equalities are obvious, and so V = W1 ⊕ W2 ⊕ W3.

Our next result contains several conditions that are equivalent to the definition of a direct sum.

Theorem 5.10. Let W1, W2, ..., Wk be subspaces of a finite-dimensional vector space V. The following conditions are equivalent.
(a) V = W1 ⊕ W2 ⊕ ⋯ ⊕ Wk.
(b) V = \sum_{i=1}^{k} Wi and, for any vectors v1, v2, ..., vk such that vi ∈ Wi (1 ≤ i ≤ k), if v1 + v2 + ⋯ + vk = 0, then vi = 0 for all i.
(c) Each vector v ∈ V can be uniquely written as v = v1 + v2 + ⋯ + vk, where vi ∈ Wi.
(d) If γi is an ordered basis for Wi (1 ≤ i ≤ k), then γ1 ∪ γ2 ∪ ⋯ ∪ γk is an ordered basis for V.
(e) For each i = 1, 2, ..., k, there exists an ordered basis γi for Wi such that γ1 ∪ γ2 ∪ ⋯ ∪ γk is an ordered basis for V.

Proof. Assume (a). We prove (b). Clearly

V = \sum_{i=1}^{k} Wi.

Now suppose that v1, v2, ..., vk are vectors such that vi ∈ Wi for all i and v1 + v2 + ⋯ + vk = 0. Then for any j,

−vj = \sum_{i ≠ j} vi ∈ \sum_{i ≠ j} Wi.

But −vj ∈ Wj, and hence

−vj ∈ Wj ∩ \sum_{i ≠ j} Wi = {0}.

So vj = 0, proving (b).

Now assume (b). We prove (c). Let v ∈ V. By (b), there exist vectors v1, v2, ..., vk such that vi ∈ Wi (1 ≤ i ≤ k) and v = v1 + v2 + ⋯ + vk. We must show

that this representation is unique. Suppose also that v = w1 + w2 + ⋯ + wk, where wi ∈ Wi for all i. Then

(v1 − w1) + (v2 − w2) + ⋯ + (vk − wk) = 0.

But vi − wi ∈ Wi for all i, and therefore vi − wi = 0 for all i by (b). Thus vi = wi for all i, proving the uniqueness of the representation.

Now assume (c). We prove (d). For each i, let γi be an ordered basis for Wi. Since

V = \sum_{i=1}^{k} Wi

by (c), it follows that γ1 ∪ γ2 ∪ ⋯ ∪ γk generates V. To show that this set is linearly independent, consider vectors vij ∈ γi (j = 1, 2, ..., mi and i = 1, 2, ..., k) and scalars aij such that

\sum_{i,j} a_{ij} v_{ij} = 0.

For each i, set

wi = \sum_{j=1}^{m_i} a_{ij} v_{ij}.

Then for each i, wi ∈ span(γi) = Wi and w1 + w2 + ⋯ + wk = 0. Since 0 ∈ Wi for each i and 0 + 0 + ⋯ + 0 = w1 + w2 + ⋯ + wk, (c) implies that wi = 0 for all i. Thus

0 = wi = \sum_{j=1}^{m_i} a_{ij} v_{ij}

for each i. But each γi is linearly independent, and hence aij = 0 for all i and j. Consequently γ1 ∪ γ2 ∪ ⋯ ∪ γk is linearly independent and therefore is a basis for V.

Clearly (e) follows immediately from (d).

Finally, we assume (e) and prove (a). For each i, let γi be an ordered basis for Wi such that γ1 ∪ γ2 ∪ ⋯ ∪ γk is an ordered basis for V. Then

V = span(γ1 ∪ γ2 ∪ ⋯ ∪ γk)

= span(γ1) + span(γ2) + ⋯ + span(γk) = \sum_{i=1}^{k} Wi

by repeated applications of Exercise 14 of Section 1.4.

Fix j (1 ≤ j ≤ k), and suppose that, for some nonzero vector v ∈ V,

v ∈ Wj ∩ \sum_{i ≠ j} Wi.

Then v ∈ Wj = span(γj) and

v ∈ \sum_{i ≠ j} Wi = \sum_{i ≠ j} span(γi) = span\left( \bigcup_{i ≠ j} γi \right).

Hence v is a nontrivial linear combination of both γj and \bigcup_{i ≠ j} γi, so that v can be expressed as a linear combination of γ1 ∪ γ2 ∪ ⋯ ∪ γk in more than one way. But these representations contradict Theorem 1.8 (p. 43), and so we conclude that

Wj ∩ \sum_{i ≠ j} Wi = {0},

proving (a).

With the aid of Theorem 5.10, we are able to characterize diagonalizability in terms of direct sums.

Theorem 5.11. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if V is the direct sum of the eigenspaces of T.

Proof. Let λ1, λ2, ..., λk be the distinct eigenvalues of T.

First suppose that T is diagonalizable, and for each i choose an ordered basis γi for the eigenspace E_{λi}. By Theorem 5.9, γ1 ∪ γ2 ∪ ⋯ ∪ γk is a basis for V, and hence V is a direct sum of the E_{λi}'s by Theorem 5.10.

Conversely, suppose that V is a direct sum of the eigenspaces of T. For each i, choose an ordered basis γi of E_{λi}. By Theorem 5.10, the union γ1 ∪ γ2 ∪ ⋯ ∪ γk is a basis for V. Since this basis consists of eigenvectors of T, we conclude that T is diagonalizable.

Example 10
Let T be the linear operator on R⁴ defined by

T(a, b, c, d) = (a, b, 2c, 3d).

It is easily seen that T is diagonalizable with eigenvalues λ1 = 1, λ2 = 2, and λ3 = 3. Furthermore, the corresponding eigenspaces coincide with the subspaces W1, W2, and W3 of Example 9. Thus Theorem 5.11 provides us with another proof that R⁴ = W1 ⊕ W2 ⊕ W3.
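Theorem 5.11 can be checked concretely for the operator of Example 10. In the sketch below (an illustrative NumPy computation, not part of the text), the standard matrix of T is diagonal, and the dimensions of the eigenspaces E_1, E_2, and E_3 sum to dim(R⁴) = 4, so R⁴ is the direct sum of the eigenspaces.

```python
import numpy as np

# Standard matrix of T(a, b, c, d) = (a, b, 2c, 3d).
A = np.diag([1.0, 1.0, 2.0, 3.0])
n = A.shape[0]

total = 0
for lam in [1.0, 2.0, 3.0]:
    # dim E_lam = n - rank(A - lam*I)
    dim_eigenspace = n - np.linalg.matrix_rank(A - lam * np.eye(n))
    print(f"dim E_{lam:g} = {dim_eigenspace}")
    total += dim_eigenspace

# The eigenspace dimensions sum to dim V, so V is the direct sum of
# the eigenspaces and T is diagonalizable (Theorem 5.11).
print(total == n)   # True
```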

EXERCISES

1. Label the following statements as true or false.
(a) Any linear operator on an n-dimensional vector space that has fewer than n distinct eigenvalues is not diagonalizable.
(b) Two distinct eigenvectors corresponding to the same eigenvalue are always linearly dependent.
(c) If λ is an eigenvalue of a linear operator T, then each vector in E_λ is an eigenvector of T.
(d) If λ1 and λ2 are distinct eigenvalues of a linear operator T, then E_{λ1} ∩ E_{λ2} = {0}.
(e) Let A ∈ M_{n×n}(F) and β = {v1, v2, ..., vn} be an ordered basis for F^n consisting of eigenvectors of A. If Q is the n × n matrix whose jth column is vj (1 ≤ j ≤ n), then Q^{-1}AQ is a diagonal matrix.
(f) A linear operator T on a finite-dimensional vector space is diagonalizable if and only if the multiplicity of each eigenvalue λ equals the dimension of E_λ.
(g) Every diagonalizable linear operator on a nonzero vector space has at least one eigenvalue.
The following two items relate to the optional subsection on direct sums.
(h) If a vector space is the direct sum of subspaces W1, W2, ..., Wk, then Wi ∩ Wj = {0} for i ≠ j.
(i) If

V = \sum_{i=1}^{k} Wi  and  Wi ∩ Wj = {0}  for i ≠ j,

then V = W1 ⊕ W2 ⊕ ⋯ ⊕ Wk.

2. For each of the following matrices A ∈ M_{n×n}(R), test A for diagonalizability, and if A is diagonalizable, find an invertible matrix Q and a diagonal matrix D such that Q^{-1}AQ = D.

(a) \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}   (b) \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}   (c) \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix}

State and prove the matrix version of Theorem 5. (c) V = R3 and T is defined by (d) V = P2(R) and T is defined by T(/(x)) = /(0) + / ( l ) ( x + x 2 ). test T for diagonalizability. 4. (b) V = P2(R) and T is defined by T(ax 2 + bx + c) = ex2 + bx + a. (f) V = M 2 x . and if T is diagonalizable. respectively. Suppose that A G M n X n ( F ) has two distinct eigenvalues. and suppose there exists an ordered basis i for V such that [T]^ is an upper triangular matrix. (b) Formulate the results in (a) for matrices. Let T be a lineal' operator on a finite-dimensional vector space V. (a) . where n is an arbitrary positive integer.4.(/?) and T is defined by T(A) = A1. 9. iz + w). For find an expression for A". . (a) Prove that the characteristic polynomial for T splits. 8. The converse of (a) is treated in Fxorcise 32 of Section 5. For each of the following linear operators T on a vector space V. 6.6. w) = (z + iw.Justify the test for diagonalizability and the method for diagonalization stated in this section. find a basis 0 for V such that [T]a is a diagonal matrix. (b) State and prove an analogous result for matrices. then A is diagonalizable. 7. 5. Prove the matrix version of the corollary to Theorem 5. Prove that A is diagonalizable. (e) V = C2 and T is defined by T(z.280 Chap. Ai and A2. (a) V = P3(R) and T is defined by T(/(x)) = f'(x) + f"(x).5: If A G M r i x „(F) has n distinct eigenvalues. and that dim(EA.) = n — 1. 5 Diagonalization 3.

(A/ 12. . 5. let EA and EA denote the corresponding eigenspaces for A and A*.. . .1 is an eigenvalue of T _ 1 (Exercise 8 of Section 5. . mk.Afc and corresponding multiplicities m i .. 13. . (a) Recall that for any eigenvalue A of T.1 that A and A' have the same characteristic polynomial and hence share the same eigenvalues with the same multiplicities. (a) tr(A) = ^ m i A i i=l (b) det(A) = (Ai)^(A 2 ) Tn2 . Let T be a linear operator on a finite-dimensional vector space V with the distinct eigenvalues Ai.Sec. Let A be an n x n matrix that is similar to an upper triangular matrix and has the distinct eigenvalues Ai. Prove the following statements. . . Let A G M n x n ( F ) .. . A .y x'2 = — 5xi . . m 2 . (a) Show by way of example that for a given common eigenvalue. Prove that the eigenspace of T corresponding to A is the same as the eigenspace of T _ 1 corresponding to A" 1 . (b) Prove that if T is diagonalizable.A 2 . A2. Afc with corresponding multiplicities mi. m 2 . respectively. then A1 is also diagonalizable. . Prove that the diagonal entries of [T]p are Ai.1). Recall from Exercise 14 of Section 5. A 2 . 14.7x 2 (c) 15..Afc and that each Ai occurs mi times (1 < i < k). .. Find the general solution to each system of differential equations. 11. . x'i = 8x1 + 10x2 = x +y (a) (b) = 3x .. dim(EA) = dim(E A ). Let T be an invertible linear operator on a finite-dimensional vector space V. these two eigenspaces need not be the same. For any eigenvalue A of A and Af. .. then T _ 1 is diagonalizable. Suppose that 0 is a basis for V such that [T]p is an upper triangular matrix. .2 Diagonalizability 281 10. (b) Prove that for any eigenvalue A. (c) Prove that if A is diagonalizable. . mk. Let fan a2i \ani ai 2 a22 an2 0>ln\ 0>2n x1 = Xi x2 = x2 Xo = ^3 X3 2x3 A = .

Definitions. then A and B commute.. Exercises 17 through 19 are concerned with simultaneous diagonalization. .e. 5 Diagonalization be the coefficient matrix of the system of differential equations X\ = o-\\X\ + a i 2 x 2 + • • • + ai Tl x n x 2 = a 2 iXi + a 2 2 x 2 + • • • + a2nxn x'n = a n i x i + a n 2 x 2 H h annxn. A. then L. 2 . . Prove that a differentiable function x: R —* Rn is a solution to the system if and only if x is of the form A2t .282 Chap. Let T be a diagonalizable linear operator on a finite-dimensional vector space.j. 19. and let Y be an n x p matrix of differentiable functions.4 and LB are simultaneously diagonalizable linear operators.4. 16. 2 + • • • +eXkizk. (a) Prove that if T and U are simultaneously diagonalizable linear operators on a finite-dimensional vector space V. and let m be any positive integer. 18. . where ( r % = *£ for all i. (a) Prove that if T and U are simultaneously diagonalizable operators. Let C G M m x n (/?). Two linear operators T and U on a finite-dimensional vector space V are called simultaneously diagonalizable if there exists an ordered basis 0 for V such that both [T]p and [U]/? arc diagonal matrices. (b) Prove that if A and B are simultaneously diagonalizable matrices. then the matrices [T]^ and [U]/3 are simultaneously diagonalizable for any ordered basis 0. Prove that T and T m are simultaneously diagonalizable.B£ M n x n ( F ) are called simultaneously diagonalizable if there exists an invertible matrix Q G M n X n ( F ) such that both Q~l AQ and Q _ 1 BQ are diagonal matrices. . . Afc. The converses of (a) and (b) are established in Exercise 25 of Section 5... Prove (CY)' = CY'. 2 where Zi G E\t for i = 1 .Ait x(t) = eAltzi + eA2t. 17. k. Suppose that A is diagonalizable and that the distinct eigenvalues of A are Alr A2. (b) Show that if A and B are simultaneously diagonalizable matrices. then T and U commute (i. Exercises 20 through 23 are concerned with direct sums. Similarly... Use this result to prove that the set of solutions to the system is an n-dimensional real vector space. TU = UT).

Sec..3* MATRIX LIMITS AND MARKOV CHAINS In this section. Such sequences and their limits have practical applications in the natural and social sciences. . 22. . 0i. where A is a square matrix with complex entries. Let T be a linear operator on a finite-dimensional vector space V. /=! 21. Let V be a finite-dimensional vector space with a basis 0.. W2. Let Wi. *=1 Prove that V is the direct sum of W i5 W 2 . . . Wfc be subspaces of a finite-dimensional vector space V such that ] T w . = v. M9 be subspaces of a vector space V such that W! = Ki©K 2 ©---©K p and W 2 = Mi©M 2 ©M(/. .. .02. . Mj. Prove that if Wx n W 2 = {0}. then lim zm— lim rm + i lim sm... 23. . .. K 2 . Let Wi. Prove that span({x G V: x is an eigenvector of T}) = EAX © EA2 © • • • © EAfc. Afc. then W i + W 2 = W i © W 2 = Ki©K2 Kp © Mi © M2 © M9..... A2.. A 2 . 5.0k be a partition of 0 (i.02. where rm and sm are real numbers. and let 0i.. We assume familiarity with limits of sequences of real numbers. .3 Matrix Limits and Markov Chains 283 20.. .0k are subsets of 0 such that 0 = 0i U 02 U • • • U 0k and 0{ n 0j = 0 if i ^ j).. The limit of a sequence of complex numbers {zm: m = 1. . Ki.. m »o —o TO—»oo . and i is the imaginary number such that i2 = —1. .e. Prove that V = span(/?i) © span(/?2) © • • • © span(0k).2. . . . W 2 . we apply what we have learned thus far in Chapter 5 to study the limit of a sequence of powers A. Wfc if and only if k dirn(V) = ^ d i m ( W i ) . An.} can be defined in terms of the limits of the sequences of the real and imaginary parts: If Zm — rm + ism. . . 5. . . Kp. . and suppose that the distinct eigenvalues of T are Ai. m—*oo m *o —o m »o —o provided that lim rm and lim sm exist. .-.. M 2 .

•( 2m+l\ 4J m2 + l T * I m-1 J 1+i then lim A m = TO—»00 1 0 3 + 2z 0 2 e • where e is the base of the natural logarithm. benxp matrices having complex entries. Let A1. is said to converge to the n x p matrix L.. The sequence Ai..— Tm 3 J 3TO2 . A2. x n (C) and Q G M p x s (C).. be a sequence of n x p matrices with complex entries that converges to the matrix L. For any t (1 < % < r) and j (1 < j < p). m »o —o Example 1 If 1 ^... we write lim A m = L. if lim (Am)ij = Lij m • 00 — for all 1 < i < n and 1 < j < p.. TO—»oo \m — o »c Theorem 5. n lim (PATn)ij= lim y^Pik(Am)kj TO—'OO TO—»O0 *• ' fc=l .12. To designate that L is the limit of the sequence. property of matrix limits is contained in the next theorem. but important. Note the analogy with the familiar property of limits of sequences of real numbers that asserts that if lim a m exists. 5 Diagonalization Definition... lim PAm = PL and lim AmQ = LQ. A2. Ai.. A simple. Then for any P G M. Let L.A2. TO—»00 TO—»00 Proof. called the limit of the sequence. then TO—«00 lim ca m = c I lim a.284 Chap.

= \sum_{k=1}^{n} P_{ik} \lim_{m→∞} (A_m)_{kj} = \sum_{k=1}^{n} P_{ik} L_{kj} = (PL)_{ij}.

Hence lim_{m→∞} PA_m = PL. The proof that lim_{m→∞} A_mQ = LQ is similar.

Corollary. Let A ∈ M_{n×n}(C) be such that lim_{m→∞} A^m = L. Then for any invertible matrix Q ∈ M_{n×n}(C),

lim_{m→∞} (QAQ^{-1})^m = QLQ^{-1}.

Proof. Since

(QAQ^{-1})^m = (QAQ^{-1})(QAQ^{-1})⋯(QAQ^{-1}) = QA^mQ^{-1},

we have

lim_{m→∞} (QAQ^{-1})^m = lim_{m→∞} QA^mQ^{-1} = Q\left( \lim_{m→∞} A^m \right)Q^{-1} = QLQ^{-1}

by applying Theorem 5.12 twice.

In the discussion that follows, we frequently encounter the set

S = { λ ∈ C : |λ| < 1 or λ = 1 }.

Geometrically, this set consists of the complex number 1 and the interior of the unit disk (the disk of radius 1 centered at the origin). This set is of interest because if λ is a complex number, then lim_{m→∞} λ^m exists if and only if λ ∈ S. This fact, which is obviously true if λ is real, can be shown to be true for complex numbers also.

The following important result gives necessary and sufficient conditions for the existence of the type of limit under consideration.

Theorem 5.13. Let A be a square matrix with complex entries. Then lim_{m→∞} A^m exists if and only if both of the following conditions hold.
(a) Every eigenvalue of A is contained in S.
(b) If 1 is an eigenvalue of A, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of A.

One proof of this theorem, which relies on the theory of Jordan canonical forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second proof, which makes use of Schur's theorem (Theorem 6.14 of Section 6.4), can be found in the article by S. H. Friedberg and A. J. Insel, "Convergence of matrix powers," Int. J. Math. Educ. Sci. Technol., 23 (1992), pp. 765-769.
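Conditions (a) and (b) of Theorem 5.13 are straightforward to check numerically for a specific matrix. The sketch below (an illustrative NumPy check; the tolerance handling is an assumption of the sketch, not part of the theorem) tests whether every eigenvalue lies in S and, when 1 is an eigenvalue, whether the dimension of its eigenspace matches its multiplicity.

```python
import numpy as np

def limit_of_powers_exists(A, tol=1e-8):
    """Check conditions (a) and (b) of Theorem 5.13 for a square matrix A."""
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)

    # Condition (a): every eigenvalue lies in S = {z : |z| < 1 or z = 1}.
    in_S = all(abs(lam) < 1 - tol or abs(lam - 1) < tol for lam in eigenvalues)
    if not in_S:
        return False

    # Condition (b): if 1 is an eigenvalue, dim E_1 must equal its multiplicity.
    multiplicity = sum(abs(lam - 1) < tol for lam in eigenvalues)
    if multiplicity > 0:
        dim_E1 = n - np.linalg.matrix_rank(A - np.eye(n), tol=tol)
        if dim_E1 != multiplicity:
            return False
    return True

print(limit_of_powers_exists(np.array([[1.0, 1.0],
                                        [0.0, 1.0]])))   # False: dim E_1 = 1 < 2
print(limit_of_powers_exists(np.array([[0.5, 0.0],
                                        [0.0, 1.0]])))   # True
```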

The necessity of condition (a) is easily justified. For suppose that λ is an eigenvalue of A such that λ ∉ S. Let v be an eigenvector of A corresponding to λ. Regarding v as an n × 1 matrix, we see that

lim_{m→∞} (A^m v) = Lv

by Theorem 5.12, where L = lim_{m→∞} A^m. But lim_{m→∞} (A^m v) = lim_{m→∞} (λ^m v) diverges because lim_{m→∞} λ^m does not exist. Hence if lim_{m→∞} A^m exists, then condition (a) of Theorem 5.13 must hold.

Although we are unable to prove the necessity of condition (b) here, we consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix

B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

is (t − 1)², and hence B has eigenvalue λ = 1 with multiplicity 2. It can easily be verified that dim(E_λ) = 1, so that condition (b) of Theorem 5.13 is violated. A simple mathematical induction argument can be used to show that

B^m = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix},

and therefore that lim_{m→∞} B^m does not exist. We see in Chapter 7 that if A is a matrix for which condition (b) fails, then A is similar to a matrix whose upper left 2 × 2 submatrix is precisely this matrix B.

In most of the applications involving matrix limits, the matrix is diagonalizable, and so condition (b) of Theorem 5.13 is automatically satisfied. In this case, Theorem 5.13 reduces to the following theorem, which can be proved using our previous results.

Theorem 5.14. Let A ∈ M_{n×n}(C) satisfy the following two conditions.
(i) Every eigenvalue of A is contained in S.
(ii) A is diagonalizable.
Then lim_{m→∞} A^m exists.

Proof. Since A is diagonalizable, there exists an invertible matrix Q such that Q^{-1}AQ = D is a diagonal matrix. Suppose that

D = \begin{pmatrix} λ1 & 0 & ⋯ & 0 \\ 0 & λ2 & ⋯ & 0 \\ ⋮ & ⋮ & & ⋮ \\ 0 & 0 & ⋯ & λn \end{pmatrix}.

Sec..3 Matrix Limits and Markov Chains 287 Because Ai..14 TO—»00 can be employed in actual computations. as we now illustrate.. we obtain 0 0N -\ o such that Q~ AQ = D.2. either A* = 1 or |Ad < 1.. An are the eigenvalues of A.1 and 5. A 2 . Thus TO—»oo But since / v Dw = V o the sequence D. condition (i) requires that for each *.. Let /7 _9 _15\ (A 4 41 A = Using the methods in Sections 5. The technique for computing lim A m used in the proof of Theorem 5.12.D2. o otherwise.. Hence lim A m = lim (QDQ~l)m = lim QDmQ~1 = Q ( lim Dm) Q'1 l 0 i 0 /I 0 H r lim TO—»<X> 0 1° 0 0 d r .. 5. 0 A271 0 0 \ 0 converges to a limit L. Hence lim A m = lim (QDQ-1)7'1 = QLQ~l TO—>00 TO—»00 by the corollary to Theorem 5.

In our example. Notice that since the entries of A are probabilities. and the entry M^ represents t he probability of moving from state j to state i in one stage. We now determine the Suburbs Figure. For an arbitrary n x n transition matrix M. Specifically.02 0.10. 5. So. Suppose that the population of a certain metropolitan area remains constant but there is a continual movement of people between the city and the suburbs. there are two states (residing in the city and residing in the suburbs).90 0. the rows and columns correspond to n states.10 Currently living in the suburbs 0. they are nonnegative. Moreover. A2i is the probability of moving from the city to the suburbs in one stage. in one year.98 = A For instance. that is.288 Chap. (See . There are two different ways in which such a move can be made: remaining in the city for 1 year and then moving to the suburbs. 5 Diagonalization Next. the probability that someone living in the city (on January 1) will be living in the suburbs next year (on January 1) is 0. or moving to the suburbs during the first year and remaining there the second year.3 probability that a city resident will be living in the suburbs after 2 years. Any square matrix having these two properties (nonnegative entries and columns that sum to 1) is called a t r a n s i t i o n m a t r i x or a stochastic matrix. the assumption of a constant population in the metropolitan area requires that the sum of the entries of each column of A be 1. for example. let the entries of the following matrix A represent the probabilities t hat someone living in the city or in the suburbs on January 1 will be living in each region on January 1 of the next year. we consider an application that uses the limit of powers of a matrix. Currently living in the city Living next year in the city Living next year in the suburbs 0.

70\ \0.10) (0. This argument can be easily extended to show that the coordinates of .Sec.10).70)+(0. represents the proportion of the 2000 metropolitan population that remained in the city during the next year. 5. Similarly. Observe that this number is obtained by the same calculation as that which produces (A 2 ) 21 . the entry (Mm)ij represents the probability of moving from state j to state i in m stages.02)(0. (0. as we have already done with the probability vector P.3.90)(0. We record these data as a column vector: Proportion of city dwellers Proportion of suburb residents /0. such a vector is called a probability vector.30).02) (0. (0. and the second term. each column of a transition matrix is a probability vector. the probability that a city dweller moves to the suburbs in the first year and remains in the suburbs during the second year is the product (0. the second coordinate of = /0.10)(0. Suppose additionally that 70% of the 2000 population of the metropolitan area lived in the city and 30% lived in the suburbs.98). and hence (A 2 ) 2 i represents the probability that a city dweller will be living in the suburbs after 2 years. It is often convenient to regard the entries of a transition matrix or a probability vector as proportions or percentages instead of probabilities.) The probability that a city dweller remains in the city for the first year is 0.70). Thus the probability that a city dweller will be living in the suburbs after 2 years is the sum of these products.188.30). In the vector AP.10. Hence the probability that a city dweller stays in the city for the first year and then moves to the suburbs during the second year is the product (0.30 Notice that the rows of P correspond to the states of residing in the city and residing in the suburbs. Likewise.636 v 0.364 represents the proportion of the metropolitan population that was living in the suburbs in 2001.98) = 0.90. for any transition matrix M.90) (0. The first term of this sum. Observe also that the column vector P contains nonnegative entries that sum to 1. In general. In this terminology.3 Matrix Limits and Markov Chains 289 Figure 5. (0. represents the proportion of the 2000 metropolitan population that moved into the city during the next year. the first coordinate is the sum (0. respectively. Hence the first coordinate of AP represents the proportion of the metropolitan population that was living in the city in 2001. whereas the probability that the city dweller moves to the suburbs during the first year is 0.10) + (0. and that these states are listed in the same order as the listing in the transition matrix A.90) (0.90)(0.

) For example. the product of any two transition matrices is a transition matrix.15. of lim AmP. I of the population will live in the city and | will live in the suburbs each year. We now compute this limit. In fact. it is natural to define the eventual proportion of the city dwellers and suburbanites to be the first and second coordinates.J = Q [\ J1 < r = TO—»oc \U U and D = 0 (I 0. A proof of these facts is a simple corollary of the next theorem. respectively. and LP is independent of the initial choice of probability vector P. (See Exercise 15. In general. and let u G Rn be the column vector in which each coordinate equals 1. eventually. the coordinates of AmP represent the proportion of the metropolitan population that will be living in the city and suburbs. respectively. In analyzing the city-suburb problem. Note that the vector LP satisfies A(LP) = LP. Let M be an nxn matrix having real nonnegative entries. Then lim Q D m Q . the limiting outcome would be the same. Since the eigenspace of A corresponding to the eigenvalue 1 is one-dimensional. In fact.88 . 5 Diagonalization represent the proportions of the metropolitan population that were living in each location in 2002. after m stages (m years after 2000). there is only one such vector. and so there is an invertible matrix Q and a diagonal matrix D such that Q'1 AQ — D. T h e o r e m 5. Hence LP is both a probability vector and an eigenvector of A corresponding to the eigenvalue 1.290 Chap. let v be a column vector in Rn having nonnegative coordinates. and the product of any transition matrix and probability vector is a probability vector. It is easily shown TO—>00 that A is diagonalizable. Will the city eventually be depleted if this trend continues? In view of the preceding discussion. which characterizes transition matrices and probability vectors. Q = Therefore lim A m = •O O Consequently lim AmP = LP = m—*oo Thus. showing that A 2 is a transition matrix and AP is a probability vector. had the 2000 metropolitan population consisted entirely of city dwellers. we gave probabilistic interpretations of A 2 and AP.

Of course.3 Matrix Limits and Markov Chains (a) M is a transition matrix if and only if Mlu = u. Proof. and the states that other objects are in or have been in. Exercise. In general. 50% will become sophomores. and 30% will quit permanently. . or other factors). For freshmen. such a process is called a stochastic process. the data show that 10% will graduate by next fall. Exercise. 40% of the sophomores will graduate. If. a Markov process is usually only an idealization of reality because the probabilities involved are almost never constant over time. If. and gas). (a) The product of two nxn transition matrices is an n x n transition matrix. (b) v is a probability vector if and only if ulv = (1). 20% will remain freshmen. The switching to a particular state is described by a probability. and in general this probability depends on such factors as the state in question. During the present year. we consider another Markov chain. any power of a transition matrix is a transition matrix. With this in mind. (b) The product of a transition matrix and a probability vector is a probability vector. then the Markov process is called a Markov chain. In particular. and the states could be the three physical states in which H 2 0 can exist (solid. however. the object could be an American voter. all four of the factors mentioned above influence the probability that an object is in a particular state at a particular time. 5. from one fall semester to the next. the time in question. For instance. the probability that an object in one state changes to a different state in a fixed interval of time depends only on the two states (and not on the time. some or all of the previous states in which the object has been (including the current state). The school classifies a student as a sophomore or a freshman depending on the number of credits that the student has earned. 291 Corollary.Sec. The city-suburb problem is an example of a process in which elements of a set are each classified as being in one of several fixed states that can switch over time. liquid. or the object could be a molecule of H 2 0 . We treated the citysuburb example as a two-state Markov chain. In these examples. and the state of the object could be his or her preference of political party. A certain community college would like to obtain information about the likelihood that students in various categories will graduate. Data from the school indicate that. the number of possible states is finite. in addition. earlier states. then the stochastic process is called a Markov process. and 20% will quit permanently. Proof. 30% will remain sophomores.

The vector o\ 0.1 0.1 0. The preceding paragraph describes a four-state Markov chain with the following states: 1.5 0 0.3 0.) Moreover. (Notice that students who have graduated or have quit permanently are assumed to remain indefinitely in those respective states.4 0 0. • The given data provide us with the transition matrix / l 0. To answer question 1. Assuming that the trend indicated by the data continues indefinitely. 2. having graduated being a sophomore being a freshman having quit permanently. the percentage who will be freshmen.2 0.25\ 0.2 0\ < °) 0 0. the probability that one of its present students will eventually graduate. 3. 4.5 l) x <V /0.25/ ( .2 0\ 0 0 l) of the Markov chain. we are told that the present distribution of students is half in each of states 2 and 3 and none in states 1 and 4.10 vO. these probabilities are the coordinates of the vector (\ 0. and 3. the percentage who will be sophomores.3 AP = 0 0 \0 0.5 0. the same percentages as in item 1 for the fall semester two years hence.2 0.292 Chap.3 0. Thus a freshman who quits the school and returns during a later semester is not regarded as having changed states.4 0 0.5 0.the student is assumed to have remained in the state of being a freshman during the time he or she was not enrolled. 2. we must determine the probabilities that a present student will be in each state by next fall. 5 Diagonalization 50% of the students at the school are sophomores and 50% are freshmen.5 \ V that describes the initial probability of being in each state is called the initial probability vector for the Markov chain. and the percentage who will quit school permanently by next fall. the percentage of the present students who will graduate. As we have seen.3 A = 0 0 l o 0.40 0.5 P = 0. the school would like to know 1.

2 0 o o iy -7 0 3 and we have Q~*AQ = D.17 0.5 0 i i y 1 o^ u f 112 \ 0 0 53 \112/ ^ and hence the probability that one of the present students will graduate is . where L = lim A m . Thus L= lim Am = Q ( lim Dm) Q -1 TO—»00 \ O *0 T —0 / /l 0 0 \0 a 0 0 0 So (l LP = f 4 -7 0 3 I 0 0 f 1 19 -40 8 13 0\ / l 0 0 0 0 0 0 0 1/ i0 0 0 0 0 0 0\ 0 0 1/ /I 0 0 0 4 7 1 7 0 3 7 27 o \ 56 5 0 7 1 8 0 29 1 56 o\ 0 0 0 0 I 1 0 0 0 0 ^ 56 0 0 29 56 0\ / o\ \ 0 0.25\ 0 0. / l 0.25J /0.5 0 0. 2% will be freshmen. the answer to question 3 is provided by the vector LP.3 0.Sec.3 0. 17% will be sophomores.4 0.40 0 0 0.5 0.3 0 0 0 0.1 0\ /0. 40% will be sophomores. For the matrices (\ 0 Q = 0 \0 19 0\ -40 0 8 0 13 1/ (I 0 D = 0 \o 0 0 0\ 0.39 J A P = A(AP) 2 provides the information needed to answer question 2: within two years 42% of the present students will graduate. 5.10 v0 0. Finally.2 1/ \0.02 [p. and 39% will quit school. Similarly. 10% will be freshmen. 25% of the present students will graduate. and 25% will quit the school permanently.3 Matrix Limits and Markov Chains 293 Hence by next fall.42\ 0.2 0 0.

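The long-run behavior of this chain can also be obtained by brute force: raising the transition matrix to a high power and applying it to the initial probability vector. The sketch below (an illustrative NumPy computation, not part of the text) uses the four-state transition matrix implied by the enrollment data above; it reproduces the one-year and two-year distributions computed there and approximates the eventual probabilities of graduating or quitting.

```python
import numpy as np

# States: 1 graduated, 2 sophomore, 3 freshman, 4 quit permanently.
# Each column gives the one-year transition probabilities out of a state.
A = np.array([[1.0, 0.4, 0.1, 0.0],
              [0.0, 0.3, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.0],
              [0.0, 0.3, 0.2, 1.0]])

# Initial probability vector: half sophomores, half freshmen.
P = np.array([0.0, 0.5, 0.5, 0.0])

print(A @ P)        # one year:  [0.25 0.4  0.1  0.25]
print(A @ A @ P)    # two years: [0.42 0.17 0.02 0.39]

# Approximate LP = (lim A^m) P: 59/112 eventually graduate, 53/112 quit.
LP = np.linalg.matrix_power(A, 200) @ P
print(LP)
print(59 / 112, 53 / 112)
```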
3 0. But even if the limit of powers of the transition matrix exists. 5 Diagonalization In the preceding two examples.13 fails to hold. On the other hand.02 0. however. Example 2 The transit ion matrix 0. the computation of the limit may be quite difficult.4 0 0.4'" is for any power rn. In fact.) Fortunately. For example. (The reader is encouraged to work Exercise 6 to appreciate the truth of the last sentence.1 0. if 0 1 A/ = 1 0 then lim Mm does not exist because odd powers of M equal M and even TO—»00 powers of M equal /.294 Chap.5 0.98 of the Markov chain used in the city suburb problem is clearly regular because each entry is positive. it can be shown (see Exercise 20 of Section 7. The reason that the limit fails to exist is that condition (a) of Theorem 5.2) that the only transition matrices A such that lim Am does not exist are preeiselv those matrices for TO—»00 which condition (a) of Theorem 5.2 o\ 0 0 \) of the Markov chain describing community college enrollments is not regular because the first column of .10 0.ss of nyular transition matrices. the limit of powers of a transition matrix need not exist.3 A = 0 1 ) In 0. the transition matrix / I 0. we saw that lim AmP. where A is the TO—»00 transition matrix and P is the initial probability vector of the Markov chain.13 does not hold for M ( — 1 is an eigenvalue).2 0. . A transition matrix is called regular if some power of the matrix contains only positive entries. Definition.90 0. gives the eventual proportions in each state. In general. there is a large and important class of transition matrices for which this limit exists and is easily computed this is the cla.

denoted p(A).2.| and . denoted f(A).n Pi(A) = 7.4 0. and the column sum of A. • J . In the course of proving this result. Thus n pi(A) = y j A j j . Let A G M nXn (C).9 0. for j = 1. These bounds are given in terms of the sum of the absolute values of the rows and columns of the matrix. . Ps(A) = 6.5 \0. The remainder of this section is devoted to proving that. fori = 1.5 M = 0 0.Sec. For 1 < i. Hence p(A) = 6 + y/h and v(A) = 12. vx(A) = 4 + v% v2(A) = 3. 5. and define Vj(A) to be equal to the sum of the absolute values of the entries of column j of A.. For example.3 Matrix Limits and Markov Chains Observe that a regular transition matrix may contain zero entries.1 0 0N 0. Definitions. • is regular because every entry of M2 is positive.6. for a regular transition matrix A. are defined as p(A) = max{pi(A): 1 < i < n] Example 3 For the matrix A = and u(A) = max{vj(A): 1<j <n}. j < n. From this fact. we obtain some interesting bounds for the magnitudes of eigenvalues of any square matrix.(A) = ^ | A i j | i=i The row sum of A.. it is easy to compute this limit. the limit of the sequence of powers of A exists and has identical columns. /0. The necessary terminology is introduced in the definitions that follow.2. p2(A) = 6 + s/h. and u3(A) = 12. define Pi(A) to be the sum of the absolute values of the entries of row i of A.

for instance. and hence by Exercise 8(c) of section 5. and C2 is the disk with center —3 and radius 2.4. A has no eigenvalue with absolute value greater than 6 + \/5To obtain a geometric view of the following theorem. 5 Diagonalization Our next results show that the smaller of p(A) and v(A) is an upper bound for the absolute values of eigenvalues of A. Ci = {z£ C: \z. In the preceding example. stated below. we see that 0 is not an eigenvalue. (See Figure 5.Au\ < n}. that is. Then every eigenvalue of A is contained in a Gerschgorin disk. Proof.1. we define the ith Gerschgorin disk C{ to be the disk in the complex plane with center An and radius ri — Pi(A) — \An\. A is invertible. For an nxn matrix A. In particular. Ci is the disk with center 1 + 2i and radius 1.4 Gershgorin's disk theorem. we introduce some terminology.) imaginary axis Figure 5. Let A G M nXn (C). Let A be an eigenvalue of A with the corresponding eigenvector (vi\ V2 V = \Vn) .16 (Gerschgorin's Disk Theorem).296 Chap. consider the matrix A = l+2i 2i 1 -3/ ' For this matrix. tells us that all the eigenvalues of A are located within these two disks. Theorem 5. For example.

that is. it suffices to show that |A| < v(A). . .3 Matrix Limits and Markov Chains Then v satisfies the matrix equation Av = An.Akkvk\ = ^AkjVj j=i ^ ^2 \Akj\M = \vk\^2\Akj\ Thus \vk\\Xso \X. |A — Afcfcl < rk for some A.Afcfcl + |Afcfc| <rk + \Akk\ = Pk(A)<p(A). But the rows of A1 are the columns of A. A is an eigenvalue of A1.Sec. By Exercise 14 of Section 5. |A — Akk\ < rk. and so |A| < p(Al) by Corollary 1. | The next corollary is immediate from Corollary 2. .. Proof. n). By Gerschgorin's disk theorem. | Akk\ < \vk\rk. Since |A| < p(A) by Corollary 1. Proof. it follows from (2) that \Xvk .1. Let A be any eigenvalue of A G M n X n (C). We show that A lies in Ck.A u l < rk because \vk\ > 0.Akk) + Akk\ < |A . Let A be any eigenvalue of A G M n X n (C). note that vk / 0 because v is an eigenvector of A. . 5. . Then |A|<min{/)(A). Therefore |A| < v(A). consequently p(Al) = v(A). Corollary 1. Akkvk ^ S J2 I-4** I K I Ak v iJ = \vk\rk. Hence |A| = |(A . which can be written ^2 Ai v 297 ii = Xvi (» = 1» 2 . Then |A| < p(A). Corollary 2. For i = k. (2) Suppose that vk is the coordinate of v having the largest absolute value.i/(A)}.

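Gerschgorin's disk theorem is easy to check computationally. The sketch below (an illustrative NumPy computation, not part of the text) forms the disks for the 2 × 2 matrix used in the example above, with centers 1 + 2i and −3 and radii 1 and 2, and confirms that each computed eigenvalue lies in at least one disk.

```python
import numpy as np

A = np.array([[1 + 2j, 1],
              [2j,    -3]], dtype=complex)
n = A.shape[0]

# The i-th Gerschgorin disk has center A[i, i] and radius equal to the
# sum of |A[i, j]| over j != i.
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
print(centers)   # [1+2j, -3]
print(radii)     # [1., 2.]

for lam in np.linalg.eigvals(A):
    in_some_disk = any(abs(lam - centers[i]) <= radii[i] for i in range(n))
    print(lam, in_some_disk)   # each eigenvalue lies in at least one disk
```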
for instance. the city-suburb problem with P = Theorem 5.vn. Theorem 5. AmP = P for every positive integer m. Let A G M nXn (C) he a matrix in which each entry is positive. where u G C n is the column vector in which each coordinate equals 1. Suppose that vk is the coordinate of v having the largest absolute value. Proof. the three inequalities in (3) are actually equalities. 5 Diagonalization Corollary 3. Then Alu = u by Theorem 5.15. I Suppose that A is a transition matrix for which some eigenvector corresponding to the eigenvalue 1 has only nonnegative coordinates. with coordinates Vi. For in this situation. Every transition matrix has 1 as an eigenvalue. hence the probability of being in each state never changes. Let v be an eigenvector of A corresponding to A. that is. it follows that 1 is also an eigenvalue of A. Then some multiple of this vector is a probability vector P as well as an eigenvector of A corresponding to eigenvalue 1. But since A and A1 have the same eigenvalues. and hence u is an eigenvector of A1 corresponding to the eigenvalue 1.17. Consider. If X is an eigenvalue of a transition matrix. Then X — p(A) and {u} is a basis for E\. and let b — \vk\. The next result asserts that the upper bound in Corollary 3 is attained. and let X be an eigenvalue of A such that |A| = p(A). Let A be an n x n transition matrix.v2.298 Chap. and let u G R" be the column vector in which each coordinate is 1. It is interesting to observe that if P is the initial probability vector of a Markov chain having A as its transition matrix. • • • .\AkJvj\ 3=1 ^ f>Wb' (3) ^ E I4yl fc = p^A)h 3=1 Since |A| = p(A). 3=1 3=1 . then |A| < 1. Proof.18. then the Markov chain is completely static. (a) E Ak3V3 = ^r\Akjvj\. Then |A|6= |A||vfc| = |Anfc| = E ^ 3=1 = E \^\M 3=1 <Y.

.14. we have |Vj | = b for j = 1 . Let A G M n X n (C) be a matrix in which each entry is positive. Therefore. we assume that \z\ — 1. So v2 KVJ (bz\ bz = bzu. . . Proof. Exercise. and hence A > 0. we have Vj — bz for all j.. .Sec. ^ | Afcj 16. Let A G M nxri (C) be a transition matrix in which each entry is positive. Combining (4) and (5). Corollary 2. Moreover.2. .1 that (a) holds if and only if all the terms AkjVj (j — 1 . Then \X\ < 1. and 299 We see in Exercise 15(b) of Section 6. the eigenspace corresponding to the eigenvalue 1 has dimension 1. But Au — Xu. n. c 2 . and let X be an eigenvalue of A such that |A| = i'(A). observe that all of the entries of Au are positive because the same is true for the entries of both A and u. cn such that AkjVj = CjZ. . we obtain 6= \VJ\ (4) — A kj A kj for j = 1.. . . Thus there exist nonnegative real numbers ci. . and let X be any eigenvalue of A other than 1. Without loss of generality. . 2 . Kbz) and hence {u} is a basis for EAFinally. By (b) and the assumption that Akj ^ 0 for all k and j.3 Matrix Limits and Markov Chains (b) E l ^ i l N (c) Pk(A) = p(A). . n) are nonnegative multiples of some nonzero complex number z. . .13 and 5. 2 . | Corollary 1. . . and therefore by (4). and the dimension ofE\ = 1.n. Our next result extends Corollary 2 to regular transition matrices and thus shows that regular transition matrices satisfy condition (a) of Theorems 5. Exercise. Then X — v(A). 5. A = |A| = p(A). Proof.

TO—>00 (d) AL = LA = L. then A = 1. dim(E'A) = 1. So by Corollary 2 to Theorem 5. 5 Diagonalization Theorem 5. corresponding to A = 1. Let A be an n x n regular transition matrix. Thus A = 1. Because A is a transition matrix and the entries of As are positive. Then (a) The multiplicity of 1 as an eigenvalue of A is 1. (a) See Exercise 21 of Section 7.7 (p. which follows immediately from Theorems 5. As with Theorem 5.19.20.13. Let EA and E'x denote the eigenspaces of A and As.18. Proof. Now A m is a transition matrix by the corollary to Theorem 5. | Corollary. In fact.15. by Corollary 2 to Theorem 5. we must show that ulL = u*. (b) If \X\ = 1. and let X be an eigenvalue of A. having absolute value 1. (f) For any probability vector w. then the multiplicity of 1 as an eigenvalue of A is 1. Then lim Am exists. Let A be a regular transition matrix. however.13. As = A s+1 = 1. respectively. respectively.14. we state this result here (leaving the proof until Exercise 21 of Section 7. each column of L is equal to the unique probability vector v that is also an eigenvector of A corresponding to the eigenvalue 1.19 and 5. (c) By Theorem 5. (e) The columns of L are identical. and dim(EA) = 1. the entries of As+1 = AS(A) are positive. Then (a) |A| < 1. Suppose that |A| = 1. the fact that the multiplicity of 1 as an eigenvalue of A is 1 cannot be proved at this time. m—*oc (c) L = lim A m is a transition matrix. Statement (a) was proved as Corollary 3 to Theorem 5.18. TO—>00 The preceding corollary.2. so u'L = ul lim A m = lim utAm = lim u* = u\ . Nevertheless.15. So if A is a regular transition matrix. is not the best possible result. Theorem 5. by Theorem 5. Then As and A s + 1 are eigenvalues of As and As+1. Then EA Q E'X and. m »o —o Proof.19 and 5.300 Chap. it can be shown that if A is a regular transition matrix. In fact. Let A be a regular transition matrix that is diagonalizable.2) and deduce further facts about lim A771 when A is a regular m —o »o transition matrix. condition (b) of Theorem 5. there exists a positive integer s such that As has only positive entries. Hence EA = EA. (b) This follows from (a) and Theorems 5. (b) lim A771 exists. lim (Amw) = v. 264). Thus. (b) Since A is regular. and dim(E A ) = 1.13 is satisfied.16. lim A w exists regardless of whether TO—»00 A is or is not diagonalizable.

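For a regular transition matrix, the limit described in Theorem 5.20 can be computed either by raising A to a large power or by finding the probability vector that is an eigenvector of A corresponding to the eigenvalue 1. The sketch below (an illustrative NumPy computation, not part of the text) does both for the city-suburb transition matrix introduced earlier and recovers the limiting proportions 1/6 and 5/6.

```python
import numpy as np

# City-suburb transition matrix (regular: every entry is positive).
A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# Method 1: every column of lim A^m equals the same probability vector.
L = np.linalg.matrix_power(A, 500)
print(L[:, 0])                      # approximately [1/6, 5/6]

# Method 2: take the eigenvector for eigenvalue 1 and rescale it so that
# its coordinates sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmin(np.abs(eigenvalues - 1.0))   # index of the eigenvalue 1
v = np.real(eigenvectors[:, k])
v = v / v.sum()
print(v)                            # [0.1666..., 0.8333...]
```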
and 10%. (f) Let w be any probability vector. The vector u in Theorem 5.15. by (c). 10%: now preferred a jug of wine.3 Matrix Limits and Markov Chains and it follows that L is a transition matrix. Tims. LA — L.—>oo m—•TC 301 Similarly.20(c) is culled the fixed probability vector or stationary vector of the regular transition matrix A. each column of L is an eigenvector of A corresponding to the eigenvalue 1. 70%: continued to prefer a jug of wine.12. and third states be preferences for bread. and set y = lim A'"w = Lie." Assuming that this trend continues. of those who preferred a jug of wine on the first survey. 5. Moreover. Example 4 A survey in Persia showed that on a particular day 50% of the1 Persians preferred a loaf of bread. loaf of bread. second. Then III >C O y is a probability vector by the corollary to Theorem 5. and 60% continued to prefer "thou. regular transition matrix. 20%. wine.Sec. respectively. now preferred a. jug of wine. each column of L is equal to the unique probability vector v corresponding to the eigenvalue 1 of A. and 20% preferred "thou beside me in the wilderness. AL = A M Am = lim A A"1 . the situation described in the preceding paragraph is a three-state Markov chain in which the slates are the three possible preferences. we sec that the probability vector that gives the initial probability of being in each state is P = .lim A". (e) Since AL — L by (d). and "thou". each column of L is a probability vector. 40% continued to prefer a loaf of bread. by (a). m m—»oo in. We can predict the percentage of Persians in each state for each month following the original survey. Theorem 5. now preferred '•thou": of those who preferred "thou" on the first survey. (d) By Theorem 5. and also Ay — ALw = Lir — y by (d). 20% now preferred a loaf of bread. Hence y is also an eigenvector corresponding to the eigenvalue 1 of A. and 50% preferred "thou".20 can be used to deduce information about the eventual distribution in each state of a Markov chain having a. 20% now preferred a. So y — v by (e). 30% preferred a jug of wine." A subsequent survey 1 month later yielded the following data: Of those who preferred a loaf of bread on the first survey. H Definition. + ] = L. Letting the first.

0.20^3 = 0 0.60/ Chap.10 0.20 .42/ \().2504\ 0.3418 . Since A is regular. A2P = 0. 35% prefer a jug of wine.20e2 + 0. this vector is the unique probability vector v such that (A — I)v = 0. Letting we see that the matrix equation (.30^2 + 0.4.0.20 0.10e-2 . 111/ \0.334 . It is easilv shown that is a basis for the solution space of this system.70 0. + 0.10 \0.50 0.302 and the transition matrix is .0.5 = 0 0.40 0. Hence the unique fixed probability vector for A is /nTTTlA 5+7+8 8 \5+7+8/ Thus. of the Persians prefer a loaf of bread.4 — I)v — 0 yields the following system of linear equations: -O. A*P -.50«i 4.20e.4078/ Note the apparent convergence of A" P.40^3 . and 40% prefer "thou beside me in the wilderness.10v) 0.4 /0. The reader may check that AP = /0.30 .0 . the long-range prediction concerning the Persians" preferences can be found by computing the fixed probability vector for . \0. 25%. in the long run.GOe.20\ 0.252\ /0.32 . and A4P = 0. 5 Diagonalization The probabilities of being in each state m months after the original survey arc the coordinates of the vector A"1 P." .40y \0.30\ /0. 0.26\ /0.

and 100 acres of wheat were planted. In this problem.3 Matrix Limits and Markov Chains Note thai if 303 then G So lim A"1 =Q Q-] = Q i o o\ 0 0 0 Q" \ 0 0 0/ 0. these farmers do not plant the same crop in successive years. This year. respectively. 200 acres of soybeans. is given. soybeans. or wheat.35 0.Sec.25 0.25 I 0. the amount of land devoted to each crop. 5. Thus the (wentrial .40 0. and wheat. soybeans.35 0. By converting these amounts into fractions of the total acreage.40 Farmers in Lamron plant one crop per year either corn. however.35 0. 300 acres of corn. The situation just described is another three-state Markov chain in which the three states correspond to the planting of corn. of the total acreage on which a particular crop is planted. Because they believe in the necessity of rotating their crops. we see that the transition matrix A and the initial probability vector P of the Markov chain are /0 A= J 0 h\ i and 300 \ 600 \ P = •Jim i>(in 100 V 600 fi\ A 0 w The fraction of the total acreage devoted to each crop in m years is given by the coordinates of A'"P. exactly half is planted with each of the other two crops during the succeeding year.40 Example 5 0. in fact.25 0. rather than the percentage of the total acreage (GOO acres). and the eventual proportions of the total acreage1 used for each crop are the coordinates of lim A" P.

the eventual amounts of land used for each crop are the coordinates of G O • lim A'" P.304 Chap.20 shows that lim -4'" III —0 >0 is a matrix L in which each column equals the unique fixed probability vector for A. we have1 concentrated primarily on the theory of regular transition matrices. Theorem 5.lim AmP = Q00LP= \ 200 III —T >C Thus. 5 Diagonalization amounts of land devoted to each crop are found by multiplying this limit by the total acreage. A Markov chain is called an absorbing Markov chain if it is possible to go from an arbitrary state into an absorbing state. Observe that the Markov chain that describes the enrollment pattern in a community college is an absorbing Markov chain with states 1 and 4 as its absorbing states. in a finite number of stages. that is. see Exercise 14. in the long run. Readers interested in learning more about absorbing Markov chains are referred to Introduction to Finite. state is never left once it is entered. Mathematics (third edition) by .) The states corresponding to the identity submatrix are called absorbing states because such a. It is easily seen that the fixed probability vector for A is /i\ i 3 Hence I3 i L = :i 1 :{ so GOO. we expect 200 acres of each crop to be planted each year. O m—-oo Since A is a regular transition matrix.) • •III—>OC In (his section. (For a direct computation of 600 • lim A"1 P. There is another interesting class of transition matrices that can be represented in the form I () D C 3 : \ ? I i 3 :i l i 3 :5 where / is an identify matrix and O is a zero matrix. (Such transition matrices are not regular since the lower left block remains O in any power of the matrix.

5. Roberts (PrenticeHall. Let us consider the probability of offspring ol' each genotype for a male parent of genotype Gg. and that the distribution of each genotype within the population is independent of sex and life expectancy. in humans.Sec. thus the offspring with genotypes GG or Gg are brown-eyed.. Gg. and gg. brown eyes are a dominant characteristic and blue eyes are the corresponding recessive characteristic. certain genotype will stabilize at the fixed probability vector for B. and g represents the recessive characteristic. Englewood Cliffs. 1976).. whereas those of type gg are blue-eyed. Inc.].3 Matrix Limits and Markov Chains 305 J. For example. Snell. one inherited from each parent. and G. Offspring with genotypes GG or Gg exhibit the dominant characteristic.. . This experiment describes a three-state Markov chain with the following transition matrix: Genotype of female parent GG Genotype GG Gg (\ Gg gg °\ 1 . whereas offspring with genotype gg exhibit the recessive characteristic.13. which is f\\ w . by permitting only males of genotype Gg to reproduce. that mating is random with respect to genotype. the characteristics of an offspring with respect to a particular genetic trait are determined by a pair of genes. N. the proportion of offspring in the population having a.). 1974) or Discrete Mathematical Models by Fred S. Thus. (We assume that the population under consideration is large. . N. respectively. which are denoted by G and g. An Application In species that reproduce sexually. The gene (4 represents the dominant characteristic. so II is regular.) Let P = denote the proportion of the adult population with genotypes GG. Thompson (Prentice-Hall. Kemeuy. J.. The genes for a particular trait are of two types. Englewood Cliffs. 2 offspring I 2 I It is easily checked that I}2 contains only positive entries. Inc. at the start of the experiment.

5 Diagonalization Now suppose that similar experiments are to be performed with males of genotypes GG and gg. in the population. As already mentioned. Let //. and /•' denote the proportions of the. form the transition matrix M = pA + qB + rC. In order to consider the case where all male genotypes are permitted to reproduce.306 Chap. we find that (V M = ¥ ¥ ¥ + ¥ ¥ + * ¥ ¥ p ¥ o ¥ 4 r' \ (a1 b' V 0 2' | ¥ 0\ a' b ' . and C weighted by the proportion of males of each genotype. (The numbers a and b represent the proportions of G and g genes. which is the linear combination of A.) Then (a M = b 0 \a \ \b 0\ a b where a + b = p + q+ r = 1. Thus (P M = !« ¥ b n> h kq + r 31 o \ k +r 0 V To simplify the notation. D. q'. we must. Gg. these experiments are threestate Markov chains with transition matrices /() A = 0 \ o> c = t) Vo o 0 respectively. As before. Then / = MP ap+\aq bp+ k + < L 2b<l l \ 4 br J In order to consider the effects of unrestricted niatings among the firstgeneration offspring.first-generation offspring having genotypes GG. let a = p+ \q and b = \q + r. respectively. and gg. a new transition matrix A/ must be determined based upon the distribution of first-generation genotypes. respectively.

we have lim QAmQ~i = QLQ If 2 is an eigenvalue of A € 1 UAe (b) (C).hq' and 1/ = hq' 4.r'. for any invertible matrix 'nx n Q e M n X n (C). then ii x n lim A" does not exist.• • • 4.-(2a6) 4.b2 = b(a + b) = 6. Label the following statements as true or false (a) {(. Thus M = 47: so the distribution of second-generation offspring among the three genotypes is M(MP) a3+a2b \ = M P = a b 4.xn = 1 is a probability vector.) Notice that in the important special case. (c) Anv vector •l'2 such that . (e) The product of a transition matrix and a probability vector is a probability vector.r\ + x-2 4.) and lim A" = L. and genetic equilibrium is achieved in the population after only one generation. . that a — b (or equivalently. (d) The sum of the entries of each row of a transition matrix equals 1. (This result is called the Hardy Weinberg law.ah + ah2 \ ab2 + b3 ) / 2 2 I a2(a + b) \ ab(a 4-14-6) = \ b2(a | 6) / (a2 2ab \lr the same as the first-generation offspring. that p — r). In other words. MP is the fixed probability vector for M. 5.3 Matrix Limits and Markov Chains where a' = p' 4. However a' = a2 + -(2a6) = a(a + b) = a and 307 b' . then. the distribution at equilibrium is rt\ MP 2ab I \¥ EXERCISES 1.Sec.

8 2.6 0. 5 Diagonalization Let z be any complex number such that \z\ < 1.4 1.3 0.0 -0.4 A. A2. Prove that if A € M n X U (C) is diagonalizable and L = lim Am exists.8 (d) -1. then lim (AmY = V. Determine whether lim Am exists for each of the following matrices m »o —o 0.5 4-Gi 6 7 .1 0.7 0. then lim Am exists and has m +o —o rank 1. then lim Am exists. 0. • •. and compute the limit if it exists. is a sequence of n x p matrices with complex entries such that lim Am = L.7 4.5 3 G) .2 (e) -2 4 -1 3 (f) 2.5 4. m >o —o (j) If A is a regular transition matrix. then either L = In or rank(L) < n.7 -1.5 -0.7 0. Then the matrix does not have 3 as an eigenvalue.2z 35 .8 (c) (a) (b) 0. 2.2i 3 .4 0. (i) If A is a transition matrix. (h) No transition matrix can have —1 as an eigenvalue.8 4.i 3 .1 -2. Prove that if A\.308 (f) Chap.8 -0.0 3.20?: 3.1 3 4-62 G 3 . . m »o o m—>oo 4. (g) Every transition matrix has 1 as an eigenvalue.

5 o r 0. 60% of the ambulatory patients have recovered. and 20% have become bedridden.4. 20% have become ambulatory.5 0. 8. and lim (AB)m all exist. m »o —o Bm).5 0 0 \ 0 1 0. This process continues until the marker lands in square 1.5 .5 0. What is the probability of winning this game? Hint: Instead of diagonali/ing the appropriate transition matrix Win 1 Start 2 Lose 4 3 Figure 5.5 1 0 0 0 «/ /l (f) 0 \0 . are ambulatory. and 20% have died. 10% of the bedridden patients have recovered.2 0.3 Matrix Limits and Markov Chains 5. 5. 50%: remain bedridden. and the marker is moved one square to the left if a I or a 2 is rolled and one square to the right if a 3. it is easier to represent e<i as a linear combination of eigenvectors of A and then apply A" to the result. 4. in which case the player loses the game. 20% remain ambulatory.5 0 /0. game of chance by placing a marker in box 2. Which of the following transition matrices are regular? 0.5. and have died 1 month after arrival. (\ (e) 0 °\ 0 1 /0. 6. Determine the percentages of patients who have recovered. 0 0. 7. or in square 4. 5.3 0. After the same amount of time.5 0 1\ 0. Also determine the eventual percentages of patients of each type.7 0. Find 2 x 2 matrices A and B having real entries such that lim Bm.2 0. are bedridden. (See Figure 5. marked Start.3 0. but m—-oc in ->oc lim (AB)m ^ ( lim Am)( lim 309 lim Am.8 (a) (b) (c) (d) 0.5 0. or G is rolled.2 0.5 0 0' 0. A month after arrival. A player begins a.Sec.3 0 0. in which case the player wins the game. A hospital trauma unit has determined that 30% of its patients are ambulatory and 70% are bedridden at the time of arrival at the hospital.) A die is rolled.5 0 1 \ 0 1 0.

1 ().1 0. the liner is washed with the diapers and reused.3 0.1 0.6 0. . eventually what proportions of the diaper liners being used will be new. The probability that the baby will soil any diaper liner is one-third.5 0. In 1940. (a) /0. compute the proportions of objects in each state after two stages and the eventual proportions of objects in each state by determining the fixed probability vector. 10% had become unused.8 0.1 0. compute the percentages of urban. the initial probability vector is For each transition matrix.1 0.5 (c) /0. In all cases.1 \ 0 0.6 (b) /0.1 0. unused. and 20% had become agricultural.3 0.9 0.6/ 0.2 0.2 0 0.6 0. If there are only new diaper liners at first.1 \0. and 20% had become agricultural.2 0. l \ 0. and 40% was agricultural.1 0 . m—»oc 10.l\ 0. the 1945 survey showed that 20% of the agricultural land had become unused while 80% remained agricultural.2 0.7 0.3 0. Five years later.7/ 0.2 0. Likewise. Each of the matrices that follow is a regular transition matrix for a three-state Markov chain. a county land-use survey showed that 10% of the county land was urban. and agricultural land in the county in 1950 and the corresponding eventual percentages. 50%i was unused.310 /0 (g) \ 0 0\ 0 0 (h) V1 4 o o Chap.3 0. 12. then it is discarded and replaced by a new liner.5 0. for each matrix A in Exercise 8.2 0.8 0.6 0.2 0.5 0. 60% had remained unused. Finally.3 0.2 V0. 5 Diagonalization A o o o i I i o o i 9. 20% of the unused land had become urban.1 0.9 0.8/ 0.2\ 0. a follow-up survey revealed that 70%. except that each liner is discarded and replaced after its third use (even if it has never been soiled).4 (d) (e) (f) 11. A diaper liner is placed in each diaper worn by a baby.2 0. If. of the urban land had remained urban. Compute lim A7"' if it exists.2 0 0. Otherwise. Assuming that the trends indicated by the 1945 survey continue.2 0.4 0.8 0.1 0.2 0. the liner is soiled. after a diaper change.4 0.

A" 200 ] V 100 200 2"> 200 IOO': (-1)'"11 (100) 2' 15. . of the small-car owners in 1975.18. Suppose that M and M' are n x n transition matrices. and 20% had changed to small cars in 1985. and twice used? Hint: Assume that a diaper liner ready for use is in one of three states: new. Of those who owned intermediate-sized cars in 1975. and twice used.3 latrix Limits and Markov Chains 311 once used. 20% drove intermediate-sized cars. 16.19. determine the percentages of Americans who own cars of each size in 1995 and the corresponding eventual percentages. 17. 10% owned intermediate-sized cars and 90% owned sma. then W contains a unique probability vector. 14. Prove Theorem 5. A second survey in 1985 showed that 70% of the largecar owners in 1975 still owned large cars in 1985. and 40% drove small cars. 19. 10% had switched to large cars. but 30% had changed to an intermediate-sized car. 70% continued to drive intermediate-sized cars. In 1975. Assuming that these trends continue. 18. 13.Sec. After its use.ll cars in 1985. then ( Deduce that r m m I 1 ''m+1 >'m + i rin rm-\ i J .15 and its corollary. Finally. Prove the corollary of Theorem 5. Show that if A and P are as in Example 5. 5. the automobile industry determined that 40% of American car owners drove Large cars. once used. it then transforms into one of the three states described. r m I 1 7'm-l !'„ r Vm = 3 om- 200 /30(r 600(4'" P) . Prove that if a 1-dimensional subspace W of R" contains a nonzero vector with all nonnegative entries. Prove the two corollaries of Theorem 5.

22. where III. Prove that eA = PeDP~1. < 1 such that M''= cM 4. 5 Diagonalization (a) Prove that if M is regular. respectively. Let P~lAP = D be a diagonal matrix. 21.B e M 2x2 (/?) such that eAeB ^ eA+B. Prove that a differentiable function x: B -^ R" is a solution to the system of differential equations defined in Exercise 15 of Section 5. where () and / denote the n x n zero and identity matrices.. Thus e A2 2! Am is the . and c is a real number such that 0 < c < 1. Compute c° and c1.2 shows that e exists for every Ac M„ X „(C).) 23.. For A 6 M n x „(C). then cM + (1 — c)N is a regular transition matrix.) 20. j. where A is defined in that exercise. Prove that there exists a transition matrix N and a real number c with 0 < c. Let A G M n xn(C) be diagonalizable.> 0 whenever A/.I 4 A (see Exercise 22).sum of the infinite series 1+ A + A*_ 3! and Bm is the mth partial sum of this series. The following definition is used in Exercises 20-24. N is any n x n transition matrix.c)JV. then M is regular if and only if M' is regular. (b) Suppose that for all i.(1 .> 0. (Exercise 21 of Section 7.2 if and only if x(t) = e v for some u £ R". Use the result of Exercise 21 to show that eA exists. (Note the analogy with the power series c" = 1 a:' 4! 2! which is valid for all complex numbers a.—oc ' Bm . (c) Deduce that if the nonzero entries of M and M' occur in the same positions. we have that M[. 24.312 Chap. Definition. define e = lim Bm. Find A. .
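The definition of e^A used in the preceding exercises can be explored numerically. Here is a brief sketch of my own (not from the text): it approximates e^A by the partial sums B_m of the series and, for a diagonalizable sample matrix A, compares the result with P e^D P^{-1}, the formula for diagonalizable matrices given in the exercises above.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # a sample diagonalizable matrix

# Partial sums B_m = I + A + A^2/2! + ... + A^m/m!
expA = np.zeros_like(A)
term = np.eye(2)
for m in range(25):
    expA = expA + term
    term = term @ A / (m + 1)

# If P^{-1} A P = D is diagonal, then e^A = P e^D P^{-1}.
eigvals, P = np.linalg.eig(A)
expA_diag = P @ np.diag(np.exp(eigvals)) @ np.linalg.inv(P)

print(np.allclose(expA, expA_diag))    # True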

It is a. The proofs that these subspaces are T-invariant are left as exercises.6. 5...b + c.T 2 (.for any eigenvalue A of T. we observed that if v is an eigenvector of a linear operator T. In Exercise 41.O): x. Then the . .70. thai is. . we outline a method for using cyclic subspaces to compute the characteristic polynomial of a linear operator without resorting to determinants. Definition. Exercises 28 42 of Section 2.) • Example 2 Let T be the linear operator on R'{ defined by T(a.\. f. Cyclic subspaces also play an important role in Chapter 7. We apply them in this section to establish the Cayley -Hamilton theorem. Then the following subspaces of V arc./'. In fact.T-invariant: 1.. simple matter to show that W is T-invariant. any T-invariant subspace of V containing x must also contain W (see Exercise 11).0): x E R} are T-invariant subspaces of R5.g.1). • Let T be a linear operator on a vector space V./: be a nonzero vector in V./:. {0} V R(T) N(T) E. 3. if _ T(v) £ W for all v 6 W.1.y G R) and the . 5. Example 1 Suppose that T is a linear operator on a vector space V. (Sec Exercise 3..}) is called the T-cyclic subspace of V generated by .. then T maps the span of {r} into itself.r-axis .///-plane = {(. Let T be a linear operator on a vector space V.T(:r). Subspaces that are mapped into themselves are of great importance in the study of linear operators (sec. The? subspace W-span({. e. . when1 we study matrix representations of nondiagonalizable linear operators. W is the "smallest" T-invariant subspace of V containing x.c) = (a + b. Cyclic subspaces have various uses.{(. 2.4 Invariant Subspaces and the Cayley-Hamilton Theorem 5.Sec. That is. 0.r.4 INVARIANT SUBSPACES AND THE CAYLEY-HAMILTON THEOREM 313 In Section 5. and let.vy.0).4 suhspace W of\/ is called a T-invariant subspace of V if T(W) C W.r.

Example 3
Let T be the linear operator on R^3 defined by T(a, b, c) = (-b + c, a + c, 3c). We determine the T-cyclic subspace generated by e1 = (1, 0, 0). Since

   T(e1) = T(1, 0, 0) = (0, 1, 0) = e2     and     T^2(e1) = T(T(e1)) = T(e2) = (-1, 0, 0) = -e1,

it follows that span({e1, T(e1), T^2(e1), ...}) = span({e1, e2}).

Example 4
Let T be the linear operator on P(R) defined by T(f(x)) = f'(x). Then the T-cyclic subspace generated by x^2 is span({x^2, 2x, 2}) = P_2(R).

The existence of a T-invariant subspace provides the opportunity to define a new linear operator whose domain is this subspace. If T is a linear operator on V and W is a T-invariant subspace of V, then the restriction T_W of T to W (see Appendix B) is a mapping from W to W, and it follows that T_W is a linear operator on W (see Exercise 7). As a linear operator, T_W inherits certain properties from its parent operator T. The following result illustrates one way in which the two operators are linked.

Theorem 5.21. Let T be a linear operator on a finite-dimensional vector space V, and let W be a T-invariant subspace of V. Then the characteristic polynomial of T_W divides the characteristic polynomial of T.

Proof. Choose an ordered basis γ = {v1, v2, ..., vk} for W, and extend it to an ordered basis β = {v1, v2, ..., vk, v_{k+1}, ..., vn} for V. Let A = [T]_β and B1 = [T_W]_γ. Then, by Exercise 12, A can be written in the form

   A = ( B1   B2 )
       ( O    B3 ).

Let f(t) be the characteristic polynomial of T and g(t) the characteristic polynomial of T_W. Then

   f(t) = det(A - tI_n) = det ( B1 - tI_k        B2        ) = g(t) * det(B3 - tI_{n-k})
                              (     O       B3 - tI_{n-k}  )

by Exercise 21 of Section 4.3. Thus g(t) divides f(t).

Example 5
Let T be the linear operator on R^4 defined by

   T(a, b, c, d) = (a + b + 2c - d, b + d, 2c - d, c + d),

and let W = {(t, s, 0, 0): t, s in R}. Observe that W is a T-invariant subspace of R^4 because, for any vector (a, b, 0, 0) in W,

   T(a, b, 0, 0) = (a + b, b, 0, 0),

which lies in W. Let γ = {e1, e2}, which is an ordered basis for W. Extend γ to the standard ordered basis β for R^4. Then

   B1 = [T_W]_γ = ( 1  1 )       and       A = [T]_β = ( 1  1  2  -1 )
                  ( 0  1 )                              ( 0  1  0   1 )
                                                        ( 0  0  2  -1 )
                                                        ( 0  0  1   1 )

in the notation of Theorem 5.21. Let f(t) be the characteristic polynomial of T and g(t) be the characteristic polynomial of T_W. Then

   f(t) = det(A - tI_4) = det ( 1-t   1    2    -1  )
                              (  0   1-t   0     1  )   =  g(t) * det ( 2-t   -1  )
                              (  0    0   2-t   -1  )                 (  1    1-t ).
                              (  0    0    1    1-t )

In view of Theorem 5.21, we may use the characteristic polynomial of T_W to gain information about the characteristic polynomial of T itself. In this regard, cyclic subspaces are useful because the characteristic polynomial of the restriction of a linear operator T to a cyclic subspace is readily computable.

Theorem 5.22. Let T be a linear operator on a finite-dimensional vector space V, and let W denote the T-cyclic subspace of V generated by a nonzero vector v in V. Let k = dim(W). Then
(a) {v, T(v), T^2(v), ..., T^{k-1}(v)} is a basis for W.
(b) If a0 v + a1 T(v) + ... + a_{k-1} T^{k-1}(v) + T^k(v) = 0, then the characteristic polynomial of T_W is

   f(t) = (-1)^k (a0 + a1 t + ... + a_{k-1} t^{k-1} + t^k).

Proof. (a) Since v is nonzero, the set {v} is linearly independent. Let j be the largest positive integer for which

   β = {v, T(v), ..., T^{j-1}(v)}

is linearly independent. Such a j must exist because V is finite-dimensional. Let Z = span(β). Then β is a basis for Z. Furthermore, T^j(v) lies in Z by Theorem 1.7 (p. 39). We use this information to show that Z is a T-invariant subspace of V. Let w be in Z. Since w is a linear combination of the vectors of β, there exist scalars b0, b1, ..., b_{j-1} such that

   w = b0 v + b1 T(v) + ... + b_{j-1} T^{j-1}(v),

and hence

   T(w) = b0 T(v) + b1 T^2(v) + ... + b_{j-1} T^j(v).

Thus T(w) is a linear combination of vectors in Z, and hence belongs to Z. So Z is T-invariant. Furthermore, v is in Z. By Exercise 11, W is the smallest T-invariant subspace of V that contains v, so that W is contained in Z. Clearly Z is contained in W, and so we conclude that Z = W. It follows that β is a basis for W, and therefore dim(W) = j. Thus j = k. This proves (a).

(b) Now view β (from (a)) as an ordered basis for W. Let a0, a1, ..., a_{k-1} be the scalars such that

   a0 v + a1 T(v) + ... + a_{k-1} T^{k-1}(v) + T^k(v) = 0.

Observe that

   [T_W]_β = ( 0  0  ...  0  -a0      )
             ( 1  0  ...  0  -a1      )
             ( 0  1  ...  0  -a2      )
             ( .  .       .   .       )
             ( 0  0  ...  1  -a_{k-1} ),

which has the characteristic polynomial f(t) = (-1)^k (a0 + a1 t + ... + a_{k-1} t^{k-1} + t^k) by Exercise 19. Thus f(t) is the characteristic polynomial of T_W, proving (b).

Example 6
Let T be the linear operator of Example 3, and let W = span({e1, e2}), the T-cyclic subspace generated by e1. We compute the characteristic polynomial f(t) of T_W in two ways: by means of Theorem 5.22 and by means of determinants.

(a) By means of Theorem 5.22. From Example 3, {e1, e2} is a basis for W, T(e1) = e2, and T^2(e1) = -e1. Hence

   1*e1 + 0*T(e1) + T^2(e1) = 0.

Therefore, by Theorem 5.22(b),

   f(t) = (-1)^2 (1 + 0*t + t^2) = t^2 + 1.

(b) By means of determinants. Let γ = {e1, e2}, which is an ordered basis for W. Since T(e1) = e2 and T(e2) = -e1, we have

   [T_W]_γ = ( 0  -1 )
             ( 1   0 ),

and therefore

   f(t) = det([T_W]_γ - tI_2) = det ( -t  -1 ) = t^2 + 1.
                                    (  1  -t )

The Cayley-Hamilton Theorem

As an illustration of the importance of Theorem 5.22, we prove a well-known result that is used in Chapter 7. The reader should refer to Appendix E for the definition of f(T), where T is a linear operator and f(x) is a polynomial.

Theorem 5.23 (Cayley-Hamilton). Let T be a linear operator on a finite-dimensional vector space V, and let f(t) be the characteristic polynomial of T. Then f(T) = T_0, the zero transformation. That is, T "satisfies" its characteristic equation.

Proof. We show that f(T)(v) = 0 for all v in V. This is obvious if v = 0 because f(T) is linear; so suppose that v is nonzero. Let W be the T-cyclic subspace generated by v, and suppose that dim(W) = k. By Theorem 5.22(a), there exist scalars a0, a1, ..., a_{k-1} such that

   a0 v + a1 T(v) + ... + a_{k-1} T^{k-1}(v) + T^k(v) = 0.

Hence Theorem 5.22(b) implies that

   g(t) = (-1)^k (a0 + a1 t + ... + a_{k-1} t^{k-1} + t^k)

is the characteristic polynomial of T_W. Combining these two equations yields

   g(T)(v) = (-1)^k (a0 I + a1 T + ... + a_{k-1} T^{k-1} + T^k)(v) = 0.

By Theorem 5.21, g(t) divides f(t); hence there exists a polynomial q(t) such that f(t) = q(t)g(t). So

   f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0.
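A numerical illustration may help tie Theorems 5.22 and 5.23 together. The sketch below (my own, not part of the text) uses the operator of Example 3, T(a, b, c) = (-b + c, a + c, 3c): it builds the T-cyclic basis generated by e1, recovers the coefficients in Theorem 5.22(b), and then checks that the full characteristic polynomial of T annihilates the matrix of T, as the Cayley-Hamilton theorem asserts.

import numpy as np

# Matrix of T(a, b, c) = (-b + c, a + c, 3c) with respect to the standard basis.
T = np.array([[0.0, -1.0, 1.0],
              [1.0,  0.0, 1.0],
              [0.0,  0.0, 3.0]])
e1 = np.array([1.0, 0.0, 0.0])

# Build the T-cyclic basis {v, T(v), ...} generated by e1, stopping at the first dependence.
basis = [e1]
while True:
    nxt = T @ basis[-1]
    if np.linalg.matrix_rank(np.column_stack(basis + [nxt])) == len(basis):
        break
    basis.append(nxt)
print(len(basis))                        # 2: W = span({e1, e2}), as in Example 3

# Theorem 5.22(b): solve a0*v + a1*T(v) + T^2(v) = 0 for a0, a1.
coeffs = np.linalg.lstsq(np.column_stack(basis), -(T @ T @ e1), rcond=None)[0]
print(coeffs)                            # [1, 0]: characteristic polynomial of T_W is t^2 + 1

# Cayley-Hamilton check for T itself: here det(tI - T) = (t - 3)(t^2 + 1).
f = np.poly(T)                           # coefficients of det(tI - T), highest power first
fT = sum(c * np.linalg.matrix_power(T, len(f) - 1 - k) for k, c in enumerate(f))
print(np.allclose(fT, 0))                # True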

Corollary (Cayley-Hamilton Theorem for Matrices). and suppose that V = W| © W-2 © • • • © \Nk.4.2T + 51. See Exercise 15. Similarly. 'This subsection uses optional material on direct sums from Section 5. b) = (a + 2b.24. —2a -1. Invariant Subspaces and Direct Sums* It is useful to decompose a finite-dimensional vector space V into a direct sum of as many T-invariant subspaces as possible because the behavior of T on V can be inferred from its behavior on the direct summands.5.//) . and let (3= {ei.(/) is the characteristic polynomial of Tw. 1 + 5/ = -3 4 4 \ . For example. therefore.4 . The first of these facts is about characteristic polynomials. Proof. we consider alternate ways of decomposing V into direct sums of T-invariant subspaces if T is not diagonalizable. It is easily verified that T 0 = /(T) = T 2 . In Chapter 7. (-2 -4\ 4 -2 -3 / ". Then fi(t)'f2(t) . Then f(A) = O. /(f) = det(. T is diagonalizable if and only if V can be decomposed into a direct sum of one-dimensional T-invariant subspaces (see Exercise 36). Let T be a linear operator on a finite-dimensional vector space V.2t 4.318 Example 7 Chap. and let f(t) be the characteristic polynomial of A. the n x n zero matrix. Let A be an n x n matrix. Theorem 5. Suppose that /. 0 5 . is a Tinvariant subspace of V for each i (1 < i < k).-(/) is the characteristic polynomial of T. The characteristic polynomial of T is. f(A) = A2-2A 0 0 0 ()/' Example 7 suggests the following result.det f ^ t % ) = t2 . Then A= 1 -2 2 1/ ' where A = [T]^. Wre proceed to gather a few facts about direct sums of T-invariant subspaces that are used in Section 7. where W. (1 < i < k).b). 5 Diagonalization Let T be the linear operator on R2 defined by T(a.2.e 2 }.//.

I As an illustration of this result. Since each eigenspace is T-invariant. In what follows. Let .e4}.e 2 }.t. So by the case for k = 2. Let W = Wi I. proving the result for k — 2. c.3\ be an ordered basis for Wj. We conclude that f{t)=g(t)'fk(t) . The proof is by mathematical induction on k. where k — 1 > 2. By Theorem 5.t E R] and W2 .e 4 }. it follows that A= Bi O' O B2y ' where O and O' are zero matrices of the appropriate sizes..24. By Theorem 5. and therefore g(t) — fi(t)-f2(t) fk-i(t) by the induction hypothesis.( A f c .. f(l) denotes the characteristic polynomial of T. and 0 = fa U ft2 == {ei. Now assume t hat the theorem is valid for k — \ summands.().{(s.Clearly W = Wi ®W2©.W2-l hWfc 1. v = w .«/)• det(B 2 . A2 A/.10(d) (p. V is a direct sum of the eigenspaces of T. It follows that the multiplicity of each eigenvalue is equal to the dimension of the corresponding eigenspace. the restriction of T to E\. where g(t) is the characteristic polynomial of Tw.d. Let fa = {ei. Suppose first that k — 2.fi(t)-f2{t) fk(t). For each eigenvalue A.0.W I Wfc. and let W.21.g(t)-fk(t). Example 8 Lei T be the linear operator on R4 denned by T(a.t E R}. suppose that T is a diagonalizable linear operator on a finite-dimensional vector space V with distinct eigenvalues A].tl) = fi(t)-f2(t) as in the proof of Theorem 5. Then /(/) = det(i4 . B\ = [TwJ/Si) a n d J^2 = [Tw2]/V By Exercise 34. Then fa is an .tl) = det(B.24.11 (p. It is easily verified that W is T-invariant and that V . c + d). c .• -©Wfc_i. as expected. sum of k subspaces.. b. f)"1' • where m. and 0 = fa U 02Then 3 is an ordered basis for V by Theorem 5. a \ b. Let A = [T^.* ) m i ( A 2 .. 278). 276).:•••• e w f e .b.Sec. has characteristic polynomial (A.{(0. say.t ) m a " .s.4 Invariant Subspaces and the Cayley-Hamilton Theorem 319 Proof. is the dimension of E^. ©w2••]..t): s.e 3 .()): s. . /(/) . we may view this situation in the context of Theorem 5. fa an ordered basis for W2. . d) = (2a . 02 = {e3. and suppose that V is a direct. 5.e2. the characteristic polynomial f(t) of T is the product /(*) = ( A i . Notice that W] and W 2 are each T-invariant and that R1 = Wi © W 2 ..t ) m * .

Tw.. = and (2 . Bi Bk.( j " ). fa is an ordered basis for VV2. and B2 = [Tw 2 k. Let # . and let B2 E M„xn(F).. If B\.. and B3 = ..j<m <n + m for m + l<i.and Tw2r respectively. (t)-f2(t). Let A = [T]a.j otherwise.320 Chap.tl) = det(5i . 5 Diagonalization ordered basis for Wi..Then B. Bi = [Twjfr. Bk recursively by lh 0 B2 If A = B\ © £ 2 © . as the (m + ri) x (rn + n) matrix A such that ({Bxlij Aij = < (B 2 )(i-m).. B2. and 0 is an ordered basis for R 4 .®Bk = (B1®B2®---(BBk-1. then we define the direct sum of B\.(j-m) [0 fori <i. /i(i)..1 0 0 1 0 0 I \o 0 1 0\ 0 7 2 -1 Bo = -1 1 A= O Bo Let .. • The matrix A in Example 8 can be obtained by joining the mat rices B\ and B2 in the manner explained in the next definition. Definition.B<2. Then f(t) = det(A . Bk are square matrices with entries from F./(f). then we often write (Bx O A = \ O O B2 O Bk) o\ () Example 9 Let Bx . E UmXIII{F). and f2(t) denote the characteristic polynomials of T. We define the direct sum of B\ and B2) denoted B\ OjB2. B2 ..tI)>det(B2 .II) = / .(3).

. It is an extension of Exercise 34 to the case k>2.. then for any r E V the T-cyclic subspace generated by v is the same as the T-cyclic subspace generated by T(r). let . 5. Label the following statements as true or false. Then A = BX © B2 © • • • © Bk. Proof. then v — iv.Sec.\Nk be T-invariant subspaces of V such that V . be an ordered basis for W. and let \Ni. (a) There exists a linear operator T with no T-invariant subspace.\c for i = 1 .\N2. . and let 3 = 0} U 02 U • • • U 0k. Then there exists a polynomial g(t) of degree // such that g(T) = T 0 . then the characteristic polynomial of Tw divides the characteristic polynomial of T.. . and let v and w be in V. k. T h e o r e m 5. 2 . .ria.Wi : WV! • • • I WA. Let T be a linear operator on a finite-dimensional vector space V.. . is a direct sum of k matrices.4. EXERCISES 1. W is the T-cyclic subspace generated by w. (g) If T is a linear operator on a finite-dimensional vector space V. and if V is the direct sum of/. (f) Any polynomial of degree // with leading coefficient (—1)" is the characteristic polynomial of some linear operator. ..T-inva.25. Let A = [T]0 and B. If W is the T-cyclic subspace generated by v. then there is an ordered basis 0 for V such that [Tj:. (d) If T is a linear operator on a finite-dimensional vector space V.. (c) Let T be a linear operator on a finite-dimensional vector space V.nt subspaces. See Exercise 35. and W — W .. For each i. (b) If T is a linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V. (e) Let T be a linear operator on an //-dimensional vector space. = [T w .4 Invariant Subspaces and the Cayley-Hamilton Theorem Then ( 1 1 0 Bi © B> '• lh 0 0 ^ 0 2 321 1 0 0 0 0 0 0 0 0 0 0 3 0 0 0 1 2 0 1 2 0 1 1 0 ^ 0 0 1 3 1J The final result of this section relates direct sums of matrices to direct sums of invariant subspaces.

and let W be a Tinvariant subspace of V. and W = P2(R) (b) V . Prove that the intersection of any collection of T-invariant subspaces of V is a T-invariant subspace of V. then the same is true for T. find an ordered basis for the T-cyclic subspace generated by the vector z. for any eigenvalue A of T 4. b. a 4.). (a) { 0 } a n d V (b) N(T) and R(T) (c) E\. c) = (a 4. Let T be a linear operator on a vector space with a T-invariant.i. Prove that W is p(T)-invariant for any polynomial g(t). T(/(a:)) . a 4-6 4. Prove that the following subspaces are T-invariant. determine whether the given subspace W is a T-invariant subspace of V. T(.Q 2) A and z = 7. 5.322 Chap. Prove that the restriction of a linear operator T to a. Let T be a linear operator on a vector space V.t): te R\ (d) V = C([0. and W = {{Lt. T(tt. and z = cx.b for some a and 6} '0 1' (e) V = M2x2(R). b. T(f(:r)) =.d).R. as in Example 6. Let T be a linear operator on a finite-dimensional vector space V. . 6. b .se 6. c. T(A) = A'\ and z = 1 0 (d) v = . T(a. For each linear operator T and cyclic subspace W in Exerc4. . and W = { / E V: f(l) = ai 4. Let T be a linear operator on a vector space V. T(/(s)) = /"(. compute the characteristic polynomial of Tw in two ways. and W = P2(R) (c) V ./•).' f(x)dx] I. (a) V = P3(R). (a) V = R4.P(R). Prove that if v is an eigenvector of Tw with corresponding eigenvalue A.c. 9. T(A) = A.xf(x). a + c. and W = {4 G V: A* = A) 3./. 2(R). and z 0 1 (c) V = IV!L. T-invariant subspace is a linear operator on that subspace. subspace W. d) = {a + b. For each of the following linear operators T on the vector space V.c.3 (b) V .P3(i?. 8.(R).b + c.4) . 1])..fix). T(/(/)) = [. a + 6 + c). 5 Diagonalization 2. For each linear operator T on the vector space V.

. For any w E V. 13. let u b e a nonzero vector in V.--})) < n.A 2 . Prove that A = Bx O Bi B. 11. Let A be an n x n matrix with characteristic polynomial f(t) = (-l)ntn + an-it n . and let W be the T-cyclic subspace of V generated by v. 15. 17.). 18. Use the Cayley-Hamilton theorem (Theorem 5. then any nontrivial T-invariant subspace of V contains an eigenvector of T.23) to prove its corollary for matrices. (b) Prove that if A is invertible.AI) = det(O) = 0. then An A" 1 = ( .Sec. 5." But this argument is nonsense. (a) Prove that if the characteristic polynomial of T splits. then so does the characteristic polynomial of the restriction of T to any T-invariant subspace of V.21. find the characteristic polynomial f(t) of T. . Let T be a linear operator on a vector space V. let v be a nonzero vector in V. Why? 16. Prove that (a) W is T-invariant. Let T be a linear operator on a finite-dimensional vector space V.l + • ait + OQ. Prove that the polynomial g(i) of Exercise 13 can always be chosen so that its degree is less than dim(W). it is tempting to "prove" that f(A) = O by saying u f(A) = det(A . 14. (b) Deduce that if the characteristic polynomial of T splits.An~2 aiUn\ J. Let T be a linear operator on a vector space V.4 Invariant Subspaces and the Cayley-Hamilton Theorem 323 10. Prove that dim(span({/ n . in the proof of Theorem 5. and verify that the characteristic polynomial of Tw (computed in Exercise 9) divides f(t).l / a o ) [ ( . A. prove that w E W if and only if there exists a polynomial g(t) such that w = g(T)(?.l )nM — 1 n . For each linear operator in Exercise 6. Warning: If f(t) = det (. Let A be an n x n matrix. 12.A — tl) is the characteristic polynomial of A. (a) Prove that A is invertible if and only if an ^ 0. and let W be the T-cyclic subspace of V generated by v. (b) Any T-invariant subspace of V containing v also contains W.

5 Diagonalization for 19. vk are eigenvectors of T corresponding to distinct eigenvalues. Hint: Use mathematical induction on k.+ a fc _. 23. 24. Let A denote the k X k matrix /() 0 ••• 0 1 0 ••• 0 0 1 ••• 0 0 0 \0 0 0 1 -a„ \ -a.ai. Suppose that Vi. Hint: Suppose that V is generated by /. then vi E W for all i. and let W be a T-invariant subspace of V. expanding the determinant along the first row. then U = q(T) for some polvnoniial g(t). Prove that if v\ 4. and suppose that V is a T-cyclic subspace of itself. Let T be a linear operator on a two-dimensional vector space V and suppose that T ^ c\ for any scalar c. Let T be a linear operator on a two-dimensional vector space V. ak-\ are arbitrary scalars. Prove that the characteristic polynomial of A is (-!)*(«<) 4-ai/. 22. 21. 2 -ak-2 -Ok-iJ where ao. • • •. 20. Let T be a linear operator on a vector space V.v2..324 (c) Use (b) to compute A ! Chap. then UT = TU if and only if U = g(T) for some polynomial g(t). Choose g(t) according to Exercise 13 s o t h a t o(T)(f) = U(t>). Hint: Use the result of Exercise 23. Prove that the restriction of a diagonalizable linear operator T to any nontrivial T-invariant subspace is also diagonalizable.vk is in W. Prove that if U is a linear operator on V. Let T be a linear operator on a finite-dimensional vector space V. Prove that either V is a T-cyclic subspace of itself or T = cl for some scalar c.! -o. . Show that if U is any linear operator on V such that UT = TU. + --./ k-\ + f Hint: Use mathematical induction on k.v2 4 (.

Then show that the collection of . (See the definitions in the exercises of Section 5.W E V/W.v n } for V. Prove that f(t) = g(i)h(i). (b) Prove that T is a linear operator on V/W. (a) Prove. and T.) Hint: For any eigenvalue A of T..vk} for W to an ordered basis ft = {vi. Hint: Extend an ordered basis 7 = {v\. that is.v2.W for any v 4.. . T is a fixed linear operator on a finite-dimensional vector space V. Exercise 40 of Section 2. and apply Exercise 24 to obtain a basis for EA of eigenvectors of U. Exercises 27 through 32 require familiarity with quotient spaces as defined in Exercise 31 of Section 1. respectively.. Before attempting these exercises. then T and U are simultaneously diagonalizable. and h(t) be the characteristic polynomials of T. g(t). v2. vk. show that E\ is U-invariant.. the reader should first review the other exercises treating quotient spaces: Exercise 35 of Section 1.4.1 by TJ(V) = v + W.6 28. Prove that V is a T-cyclic subspace of itself. prove that rjT = Tip (This exercise does not require the assumption that V is finite-dimensional.Sec. (a) Prove the converse to Exercise 18(a) of Section 5. Hint: Use Exercise 23 to find a vector v such that {v. Tw.. 27. We require the following definition. and let W be a T-invariant subspace of V.6. Let T be a linear operator on a vector space V. 5..6 commutes. and Exercise 24 of Section 2. That. T(v). Let f(t). show that T(e + W) = T(v' + W) whenever v + \N = v' + W. Vfc+ij • • • . Definition.) V V V/W V/W Figure 5.3. Show that the diagram of Figure 5. that T is well defined.. (c) bet //: V — V/W be the linear transformation defined in Exer> cise 40 of Section 2. is. 26. Let T be a linear operator on an //-dimensional vector space V such that T has n distinct eigenvalues. Define T: V/W -* V/W by T(v + W) = T(v) 4... and W is a nonzero T-invariant subspace of V.4 Invariant Subspaces and the Cayley-Hamilton Theorem 325 25.2. T n _ 1 ( v ) } is linearly independent. (b) State and prove a matrix version of (a)..2: If T and U are diagonalizable linear operators on a finite-dimensional vector space V such that UT = TU. For the purposes of Exercises 27 through 32.1..

2 4. Let T be a linear operator on a vector space V. be. Prove the converse to Exercise 9(a) of Section 5. Prove that if both Tw and T are diagonalizable and have no common eigenvalues. (b) Show that {e2 4 W} is a basis for R . Give a direct proof of Theorem 5.22 and Exercise 28 are useful in devising methods for computing characteristic polynomials without the use of determinants. the number of subspaces. that where £ j = [T]7 and S 3 = [T]f. vk \. Let A = /I . concerned with direct sums. T = LA.326 Chap. and let W be the cyclic subspace V 2 l) of R generated by e±. and let Wi.6 is helpful here. then so isT. T-invariant subspaces of V. then T is diagonalizable.25 for the case k = 2. Prove that W.W . and use this fact to compute the characteristic polynomial of T. (c) Use the results of (a) and (b) to find the characteristic polynomial of A. * Exercise 35(b) of Section 1.i /W. 34.W 2 4 h Wfc is also a T-invariant subspace of V. . -3\ 2 3 4 . and apply the induction hypothesis to T: V/W — V/W. (This result is used in the proof of Theorem 5. 29. Prove Theorem 5. (a) Use Theorem 5. v„ + W} is an ordered basis for V/W.2: If the characteristic polynomial of T splits.. Exercises 33 through 40 arc. 30. . Hints: Apply mathematical induction to diin(V). 5 Diagonalization cosets a = {vk+i 4.j is an upper triangular matrix. then there is an ordered basis 0 for V such that [T].24.25. 31. 4. let.W. and prove. The results of Theorem 5. . W> W/. 33. This is illustrated in the next exercise. Hint: Begin with Exercise 34 and extend it using mathematical induction on k. . 32. 1(4 W = span({e})..) 35. First prove that T has an eigenvector v. . Use the hint in Exercise 28 to prove that if T is diagonalizable.22 to compute1 the characteristic polynomial of Tw.

Prove that T is diagonalizable if and only if V is the direct sum of one-dimensional T-invariant subspaces. 42. Lei T be a linear operator on a finite-dimensional vector space V. be T-invariant subspaces of V such that V W| | Wj : • • • W/. Let B\.( is a diagonal matrix for all T E C if and only if the.Sec. (This is an extension of Exercise 25.n.). + I ir n 4.2 //)}) is L^-invariant. Let C be a collection of diagonalizable linear operators on a finitedimensional vector space V. and let Wi. using the fact that V is the direct sum of the eigenspaces of some operator in C that.'s. Hint: First prove that A has rank 2 and that span({(l. is diagonalizable for all /'. has more than one eigenvalue.. 1 1).. be T-invariant subspaces of V such that V = WiO W2 ] ••• DWfc. find the characteristic polynomial of .W_> WA.) •••<!(•! (T w .4..4 E M„ x „(/i) be the matrix defined by .4 Invariant Subspaces and the Cayley-Hamilton Theorem 327 36. establish the general result by mathematical induction on dim(V)..) Hints for the ease that the operators commute: The result is trivial if each operator has only OIK1 eigenvalue. and let A — B\ B> > ••• i.2 7 Find the characteristic polynomial of A. Let . 5. Let 2 n •2 A = \n2 . finite-dimensional vector space1 V. 41.. 39. . operators of C commute under composition.. Prove that there is an ordered basis .. Otherwise. 37.3 such that [T]. Prove that the characteristic polynomial of A is the product of the characteristic polynomials of the P. and let W|.Bk. (1.. B\. be square matrices with entries in the same field. Prove that det(T) =det(T W | )«let(Tw. Let T be a linear operator on a. — 1 for all i and j.4.W-_> W. 40. Prove that T is diagonalizable if and only if T w . Let T be a linear operator on a finite-dimensional vector space V. B2. 38.

5 Diagonalization INDEX OF DEFINITIONS FOR CHAPTER 5 Absorbing Markov chain 404 Absorbing state 404 Characteristic polynomial of a linear operator 249 Characteristic polynomial of a matrix 218 Column sum of a matrix 295 Convergence of matrices 284 Cyclic subspace . subspace 313 Limit of a sequence of mat rices 284 Markov chain 291 Markov process 291 Multiplicity of an eigenvalue 263 Probability vector 289 Regular transition matrix 294 Row sum of a matrix 295 Splits 262 Stochastic process 288 Sum of subspaces 275 Transition matrix 288 .328 Chap.'514 Diagonalizable linear operator 245 Diagonalizable matrix 246 Direct sum of matrices 320 Direct sum of subspaces 275 Eigenspace of a linear operator 264 Eigenspace of a matrix 264 Eigenvalue of a linear operator 246 Eigenvalue of a matrix 246 Eigenvector of a linear operator 246 Eigenvector of a matrix 246 Fixed probability vector 301 Generator of a cyclic subspace 313 Gerschgorin disk 296 Initial probability vector for a Markov chain 292 Invariant.

6  Inner Product Spaces

6.1   Inner Products and Norms
6.2   The Gram-Schmidt Orthogonalization Process and Orthogonal Complements
6.3   The Adjoint of a Linear Operator
6.4   Normal and Self-Adjoint Operators
6.5   Unitary and Orthogonal Operators and Their Matrices
6.6   Orthogonal Projections and the Spectral Theorem
6.7*  The Singular Value Decomposition and the Pseudoinverse
6.8*  Bilinear and Quadratic Forms
6.9*  Einstein's Special Theory of Relativity
6.10* Conditioning and the Rayleigh Quotient
6.11* The Geometry of Orthogonal Operators

Most applications of mathematics are involved with the concept of measurement and hence of the magnitude or relative size of various quantities. So it is not surprising that the fields of real and complex numbers, which have a built-in notion of distance, should play a special role. Except for Section 6.8, we assume that all vector spaces are over the field F, where F denotes either R or C. (See Appendix D for properties of complex numbers.)

We introduce the idea of distance or length into vector spaces via a much richer structure, the so-called inner product space structure. This added structure provides applications to geometry (Sections 6.5 and 6.11), physics (Section 6.9), conditioning in systems of linear equations (Section 6.10), least squares (Section 6.3), and quadratic forms (Section 6.8).

6.1 INNER PRODUCTS AND NORMS

Many geometric notions such as angle, length, and perpendicularity in R^2 and R^3 may be extended to more general real and complex vector spaces. All of these ideas are related to the concept of inner product.

Definition. Let V be a vector space over F. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F, denoted ⟨x, y⟩,

such that for all x, y, and z in V and all c in F, the following hold:
(a) ⟨x + z, y⟩ = ⟨x, y⟩ + ⟨z, y⟩.
(b) ⟨cx, y⟩ = c⟨x, y⟩.
(c) ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩, where the bar denotes complex conjugation.
(d) ⟨x, x⟩ > 0 if x is nonzero.

Note that (c) reduces to ⟨x, y⟩ = ⟨y, x⟩ if F = R. Conditions (a) and (b) simply require that the inner product be linear in the first component. It is easily shown that if a1, a2, ..., an are in F and y, v1, v2, ..., vn are in V, then

   ⟨ Σ_{i=1}^{n} a_i v_i , y ⟩ = Σ_{i=1}^{n} a_i ⟨v_i, y⟩.

Example 1
For x = (a1, a2, ..., an) and y = (b1, b2, ..., bn) in F^n, define

   ⟨x, y⟩ = Σ_{i=1}^{n} a_i conj(b_i).

The verification that ⟨·, ·⟩ satisfies conditions (a) through (d) is easy. For example, if z = (c1, c2, ..., cn), then

   ⟨x + z, y⟩ = Σ_{i=1}^{n} (a_i + c_i) conj(b_i) = Σ_{i=1}^{n} a_i conj(b_i) + Σ_{i=1}^{n} c_i conj(b_i) = ⟨x, y⟩ + ⟨z, y⟩.

Thus, for x = (1 + i, 4) and y = (2 - 3i, 4 + 5i) in C^2,

   ⟨x, y⟩ = (1 + i)(2 + 3i) + 4(4 - 5i) = 15 - 15i.

The inner product in Example 1 is called the standard inner product on F^n. When F = R the conjugations are not needed, and in early courses this standard inner product is usually called the dot product and is denoted by x·y instead of ⟨x, y⟩.

Example 2
If ⟨x, y⟩ is any inner product on a vector space V and r > 0, we may define another inner product by the rule ⟨x, y⟩' = r⟨x, y⟩. If r < 0, then (d) would not hold.

Example 3
Let V = C([0, 1]), the vector space of real-valued continuous functions on [0, 1]. For f, g in V, define ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt. Since the preceding integral is linear in f, (a) and (b) are immediate, and (c) is trivial. If f is nonzero, then f^2 is bounded away from zero on some subinterval of [0, 1] (continuity is used here), and hence ⟨f, f⟩ = ∫_0^1 [f(t)]^2 dt > 0.

Definition. Let A be an m x n matrix with entries in F. We define the conjugate transpose or adjoint of A to be the n x m matrix A* such that (A*)_{ij} = conj(A_{ji}) for all i, j.

Example 4
Let

   A = ( i        1 + 2i )
       ( 2        3 + 4i ).

Then

   A* = ( -i          2     )
        ( 1 - 2i    3 - 4i  ).

Notice that if x and y are viewed as column vectors in F^n, then ⟨x, y⟩ = y*x.

The conjugate transpose of a matrix plays a very important role in the remainder of this chapter. In the case that A has real entries, A* is simply the transpose of A.

Example 5
Let V = M_{n x n}(F), and define ⟨A, B⟩ = tr(B*A) for A, B in V. (Recall that the trace of a matrix A is defined by tr(A) = Σ_{i=1}^{n} A_{ii}.) We verify that (a) and (d) of the definition of inner product hold and leave (b) and (c) to the reader. For this purpose, let A, B, C be in V. Then (using Exercise 6 of Section 1.3)

   ⟨A + B, C⟩ = tr(C*(A + B)) = tr(C*A + C*B) = tr(C*A) + tr(C*B) = ⟨A, C⟩ + ⟨B, C⟩.

Also

   ⟨A, A⟩ = tr(A*A) = Σ_{i=1}^{n} (A*A)_{ii} = Σ_{i=1}^{n} Σ_{k=1}^{n} (A*)_{ik} A_{ki} = Σ_{i=1}^{n} Σ_{k=1}^{n} conj(A_{ki}) A_{ki} = Σ_{i=1}^{n} Σ_{k=1}^{n} |A_{ki}|^2.

Now if A is nonzero, then A_{ki} is nonzero for some k and i. So ⟨A, A⟩ > 0.
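The inner product of Example 5 is easy to experiment with. The following is a minimal sketch of my own (the two matrices are arbitrary samples); it checks that tr(B*A) agrees with the entrywise sum of conj(B_{ki}) A_{ki}, and that ⟨A, A⟩ is the sum of the squared moduli of the entries, which is why condition (d) holds.

import numpy as np

def frob_inner(A, B):
    # <A, B> = tr(B* A), the inner product of Example 5.
    return np.trace(B.conj().T @ A)

A = np.array([[1 + 1j, 2.0], [0.0, 3 - 1j]])   # arbitrary sample matrices
B = np.array([[2.0, 1j], [1.0, 1.0]])

# tr(B* A) equals the entrywise sum  sum over k, i of conj(B_ki) * A_ki.
print(np.isclose(frob_inner(A, B), np.sum(np.conj(B) * A)))     # True

# <A, A> is the sum of |A_ki|^2, hence positive for nonzero A (condition (d)).
print(np.isclose(frob_inner(A, A), np.sum(np.abs(A) ** 2)))     # True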

The inner product on M_{n x n}(F) in Example 5 is called the Frobenius inner product.

A vector space V over F endowed with a specific inner product is called an inner product space. If F = C, we call V a complex inner product space, whereas if F = R, we call V a real inner product space.

It is clear that if V has an inner product ⟨x, y⟩ and W is a subspace of V, then W is also an inner product space when the same function ⟨x, y⟩ is restricted to the vectors x, y in W. Thus Examples 1, 3, and 5 also provide examples of inner product spaces. For the remainder of this chapter, F^n denotes the inner product space with the standard inner product as defined in Example 1. Likewise, M_{n x n}(F) denotes the inner product space with the Frobenius inner product as defined in Example 5.

The reader is cautioned that two distinct inner products on a given vector space yield two distinct inner product spaces. For instance, it can be shown that both

   ⟨f(x), g(x)⟩_1 = ∫_0^1 f(t)g(t) dt     and     ⟨f(x), g(x)⟩_2 = ∫_{-1}^{1} f(t)g(t) dt

are inner products on the vector space P(R). Even though the underlying vector space is the same, however, these two inner products yield two different inner product spaces. For example, the polynomials f(x) = x and g(x) = x^2 are orthogonal in the second inner product space, but not in the first.

A very important inner product space that resembles C([0, 1]) is the space H of continuous complex-valued functions defined on the interval [0, 2π] with the inner product

   ⟨f, g⟩ = (1/2π) ∫_0^{2π} f(t) conj(g(t)) dt.

The reason for the constant 1/2π will become evident later. This inner product space, which arises often in the context of physical situations, is examined more closely in later sections.

At this point, we mention a few facts about integration of complex-valued functions. First, every complex-valued function f may be written as f = f1 + i f2, where f1 and f2 are real-valued functions. Thus we have

   ∫ f = ∫ f1 + i ∫ f2     and     ∫ conj(f) = conj( ∫ f ).

Second, the imaginary number i can be treated as a constant under the integration sign. From these properties, as well as the assumption of continuity, it follows that H is an inner product space (see Exercise 16(a)).

Some properties that follow easily from the definition of an inner product are contained in the next theorem.

Theorem 6.1. Let V be an inner product space. Then for x, y, z in V and c in F, the following statements are true.
(a) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
(b) ⟨x, cy⟩ = conj(c) ⟨x, y⟩.
(c) ⟨x, 0⟩ = ⟨0, x⟩ = 0.
(d) ⟨x, x⟩ = 0 if and only if x = 0.
(e) If ⟨x, y⟩ = ⟨x, z⟩ for all x in V, then y = z.

Proof. (a) We have

   ⟨x, y + z⟩ = conj(⟨y + z, x⟩) = conj(⟨y, x⟩ + ⟨z, x⟩) = conj(⟨y, x⟩) + conj(⟨z, x⟩) = ⟨x, y⟩ + ⟨x, z⟩.

The proofs of (b), (c), (d), and (e) are left as exercises.

The reader should observe that (a) and (b) of Theorem 6.1 show that the inner product is conjugate linear in the second component.

In order to generalize the notion of length in R^3 to arbitrary inner product spaces, we need only observe that the length of x = (a, b, c) in R^3 is given by sqrt(a^2 + b^2 + c^2) = sqrt(⟨x, x⟩). This leads to the following definition.

Definition. Let V be an inner product space. For x in V, we define the norm or length of x by ||x|| = sqrt(⟨x, x⟩).

Example 6
Let V = F^n. For x = (a1, a2, ..., an) in F^n,

   ||x|| = ||(a1, a2, ..., an)|| = [ Σ_{i=1}^{n} |a_i|^2 ]^{1/2}

is the Euclidean definition of length. Note that if n = 1, we have ||a|| = |a|.

As we might expect, the well-known properties of Euclidean length in R^3 hold in general, as shown next.

Theorem 6.2. Let V be an inner product space over F. Then for all x, y in V and c in F, the following statements are true.
(a) ||cx|| = |c| * ||x||.
(b) ||x|| = 0 if and only if x = 0. In any case, ||x|| >= 0.
(c) (Cauchy-Schwarz Inequality) |⟨x, y⟩| <= ||x|| * ||y||.
(d) (Triangle Inequality) ||x + y|| <= ||x|| + ||y||.

Proof. We leave the proofs of (a) and (b) as exercises.
(c) If y = 0, then the result is immediate. So assume that y is nonzero. For any c in F, we have

   0 <= ||x - cy||^2 = ⟨x - cy, x - cy⟩ = ⟨x, x - cy⟩ - c⟨y, x - cy⟩
                     = ⟨x, x⟩ - conj(c)⟨x, y⟩ - c⟨y, x⟩ + c conj(c)⟨y, y⟩.

In particular, if we set

   c = ⟨x, y⟩ / ⟨y, y⟩,

the inequality becomes

   0 <= ⟨x, x⟩ - |⟨x, y⟩|^2 / ⟨y, y⟩,

from which (c) follows.
(d) We have

   ||x + y||^2 = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨y, x⟩ + ⟨x, y⟩ + ⟨y, y⟩
              = ||x||^2 + 2 Re⟨x, y⟩ + ||y||^2
              <= ||x||^2 + 2|⟨x, y⟩| + ||y||^2
              <= ||x||^2 + 2||x||*||y|| + ||y||^2 = (||x|| + ||y||)^2,

where Re⟨x, y⟩ denotes the real part of the complex number ⟨x, y⟩. Note that we used (c) to prove (d).

The case when equality results in (c) and (d) is considered in Exercise 15.

Example 7
For F^n, we may apply (c) and (d) of Theorem 6.2 to the standard inner product to obtain the following well-known inequalities:

   | Σ_{i=1}^{n} a_i conj(b_i) | <= [ Σ_{i=1}^{n} |a_i|^2 ]^{1/2} [ Σ_{i=1}^{n} |b_i|^2 ]^{1/2}

and

   [ Σ_{i=1}^{n} |a_i + b_i|^2 ]^{1/2} <= [ Σ_{i=1}^{n} |a_i|^2 ]^{1/2} + [ Σ_{i=1}^{n} |b_i|^2 ]^{1/2}.
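Both inequalities are easy to check numerically for the standard inner product. The brief sketch below is my own (the two vectors in C^3 are arbitrary samples, similar to those used in the exercises).

import numpy as np

def inner(x, y):
    # Standard inner product on C^n: <x, y> = sum of a_i * conj(b_i).
    return np.sum(x * np.conj(y))

x = np.array([2.0, 1 + 1j, 4.0 - 1j])   # sample vectors
y = np.array([1j, 2 - 3j, 2.0])

norm = lambda v: np.sqrt(inner(v, v).real)

print(abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12)   # Cauchy-Schwarz: True
print(norm(x + y) <= norm(x) + norm(y) + 1e-12)        # Triangle inequality: True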

The reader may recall from earlier courses that, for x and y in R^3 or R^2, we have ⟨x, y⟩ = ||x||*||y|| cos θ, where θ (0 <= θ <= π) denotes the angle between x and y. This equation implies (c) immediately since |cos θ| <= 1. Notice also that nonzero vectors x and y are perpendicular if and only if cos θ = 0, that is, if and only if ⟨x, y⟩ = 0.

We are now at the point where we can generalize the notion of perpendicularity to arbitrary inner product spaces.

Definitions. Let V be an inner product space. Vectors x and y in V are orthogonal (perpendicular) if ⟨x, y⟩ = 0. A subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A vector x in V is a unit vector if ||x|| = 1. Finally, a subset S of V is orthonormal if S is orthogonal and consists entirely of unit vectors.

Note that if S = {v1, v2, ...}, then S is orthonormal if and only if ⟨v_i, v_j⟩ = δ_{ij}, where δ_{ij} denotes the Kronecker delta. Also, observe that multiplying vectors by nonzero scalars does not affect their orthogonality and that if x is any nonzero vector, then (1/||x||)x is a unit vector. The process of multiplying a nonzero vector by the reciprocal of its length is called normalizing.

Example 8
In F^3, {(1, 1, 0), (1, -1, 1), (-1, 1, 2)} is an orthogonal set of nonzero vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we obtain the orthonormal set

   { (1/sqrt(2))(1, 1, 0), (1/sqrt(3))(1, -1, 1), (1/sqrt(6))(-1, 1, 2) }.

Our next example is of an infinite orthonormal set that is important in analysis. This set is used in later examples in this chapter.

Example 9
Recall the inner product space H (defined on page 332). We introduce an important orthonormal subset S of H. For what follows, i is the imaginary number such that i^2 = -1. For any integer n, let f_n(t) = e^{int}, where 0 <= t <= 2π. (Recall that e^{int} = cos nt + i sin nt.) Now define S = {f_n : n is an integer}. Clearly S is a subset of H. Using the property that conj(e^{it}) = e^{-it} for every real number t, we have, for m different from n,

   ⟨f_m, f_n⟩ = (1/2π) ∫_0^{2π} e^{imt} conj(e^{int}) dt = (1/2π) ∫_0^{2π} e^{i(m-n)t} dt
             = [ 1/(2πi(m-n)) ] e^{i(m-n)t} evaluated from 0 to 2π  =  0.

(2 4.i 0 5. Chap. Label the following statements as true or false. x = (2. An inner product space must. inequality. \\x\\. |j/||.4 1 2 \i and B = 1 4.336 Also. and (A. {/.4 1 -i i 2 Compute (x.(/**).2. 1 4-2/) be vectors in C3. In C2.i. 2. where .1 f i. (a) Complete the proof in Example 5 that (•. and \\f I g\\. be over the field of real or complex numbers. If x.) = Smn. Only square matrices have a conjugate-transpose.i) and y . and ||:r + v/||. • EXERCISES 1. (a) (b) (c) (d) (e) (f) (g) (h) An inner product is a scalar-valued function on the set of ordered pairs of vectors. 3.2i). . spaces. An inner product is linear in both components. show that (x. 1]). 2 + 3i) and y .B) for .. y) for x = (1 . Then verify both the Cauchy Schwarz inequahty and the triangle.y). 4.A||. In C([0. 3 -.„. \\B\\.|. \\y\\. Let.I and g(t) = e'./„. Compute (x.y) — 0 for all x in an inner product space. then y = 0. There is exactly one inner product on the vector space R".z).y) — xAy* is an inner product.y) — (x. Then verify both the Cauchy Schwarz inequality and the triangle inequality. (b) Use the Frobenius inner product to compute ||. '|<y. let f(t) . •) is an inner product (the Frobenius inner product) on MnX. and z are vectors in an inner product space such that (x. If (x. 6 Inner Product Spaces In other words.(2 v'.i. Compute (f\g) (as defined in Example 3). then y — z. y. The triangle inequality only holds in finite-dimensional inner product.

a2. 10.2. 4..2\\y\\2 for all x. Prove that ak be J^^i i=l Ei«. show that \\x + y\\2 + \\x ..n2i=\ 13.B) on U2x2(R). Let V be an inner product space. Prove that (•.y) | = ||rr|| • \\y\\ if and only if one of the vectors x or y is a multiple of the other. Let {v\. and let a. are two inner products on a vector space V. B) = tv(A 4. •)x and (•. then | (x. Prove the parallelogram law on an inner product space V: that is.1. (a) Prove that if V is an inner product space..Sec. <••/. 7.y\\2 = ||a:||2 4. Let A and B be n x n matrices.1 Inner Products and Norms 6. (c) (f(x).ni"..(•. y E V.B*. (b) Prove that if (x. 14. What does this equation state about parallelograms in R2? 12. 11.\. and let c be a scalar. •). z) for all z E 0. Prove that \\x 4.b). Let 0 be a basis for a finite-dimensional inner product space. where ' denotes differentiation. let a = fay) 1 2 : .} be an orthogonal set in V. -) 2 is another inner product on V. Complete the proof of Theorem 6.\\y\\2• Deduce the Pythagorean theorem in R2. Complete the proof of Theorem Ci..c||2 4. (b) (A. scalars.(c. 337 Provide reasons why each of the following is not an inner product on the given vector spaces. Hint: If the identity holds and y -£ 0. •) = (•. 15.. Suppose that (•.. •). z) = (y. then x = y... 9. z) = 0 for all z E ft. v2. and suppose that X and y are orthogonal vectors in V. (a) ((a. 6. Prove that (A + cB)* = A* +c.y\\2 = 2||. then x = 0.d)) =ac bd on R2.g(x)) = / J f'(t)g(t)dt on P(R).. (a) Prove that if (x.

y) — (T(x). Let A be an n x n matrix. Let V be an inner product space. prove1 that if . Prove that T is one-to-one. of the matrix A! Let A be an n x //.\\y\\. 19..•1/2 (. where 3? (x. (a) 20.l]).4 = Bx 4. That is. (b) Derive a similar result for the equality ||x 4. and B2 = A2. (a) Show that the vector space H with (•. y E V.C([().y).i./ Is this an inner product on V? 17. (b) Prove that A\ = . 16. 18.r|| 4. Define Ai = -(A + A*) 2 (a) and A2 = 2/ -(A-A*). where B[ — B\ and B2 . yE V.r|| = \\ay 4. and A = Ax + iA2.B2. and suppose that ||T(. Prove that \\x ± y\\2 = \\x\\2 ± 24? fa y) + \\y\\2 for all x. where F = R or F . if F = R. vectors. y) denotes the real part of the complex number (x.?r)|| = ||. it to the case of /.4| and A2 to be the real and imaginary parts.4.A//).T(y)) defines an inner product on V if and only if T is one-to-one.z\\2 to obtain ||c|| = 0. Would it be reasonable to define .B-2. then B{ = . Prove the polar identities: For all x. = cwnere <2 = ~l 21. A*. Prove that y and z are orthogonal and a = 2 //I Then apply Exercise 10 to ||.4. •). matrix. . 6 Inner Product Spaces and let z = x — ay. (b) Let V ./:|| for all x. prove that (x.and generalize.r jy'| for all x.338 Chap. Prove that the representation in (a) is unique.('. and let W be an inner product space ewer F with inner product (•. (t>) I ikll \W\ i < ll. respectively. and define . •) defined on page 332 is an inner product space. Let V be a vector space over /•'. Let V be an inner product space over F. Let T be a linear operator on an inner product space V. (a) (b) fay) fay) = l\\x + y\\2-\\\x= \ E L I <k\\x + jkyf y\\2 if F f(t)g(t)dt. If T: V —> W is linear.y\\ = ||. = A2.y G V.

11). and let 3 be a basis for V. For any orthonormal basis 0 for V.M m x n ( F ) . (b) V .y) for all x. .Ay) = (Bx. where F is either R or C. (d) Define linear operators T and U on V by T(x) — Ax and U(x) = A*x.1 Inner Products and Norms 339 22. For x. and \\x\\ = 0 if and only if x = 0.4 G V for all / G V (a) an( i y = /^hVji=\ .y E V there exist v\. (b) Suppose that for some B € M n x n ( F ) . Definition. Prove that. 24.bi.v2.C([0. real-valued function on V satisfying the following three conditions for all x. on V and that 0 is ail orthonormal basis for V.y E_ V. defined above is the standard inner product..Sec. Regardless of whether V is or is not an inner product space.y) for all x. The following definition is used in Exercises 24 27. y C V and a E F: (1) j|. \\ • || as a. Prove that B = A*. (b) Prove that if V — R" or V = C" and ft is the standard ordered basis.l] for all . let Q be the n x n matrix whose columns are the vectors in ft... Q* =Q '. 23.r|| > 0. we have (x. and let A E M. we may still define a norm.y C V. Thus every real or complex vector space.„X/I(/•"). (a) V . Prove that the following are norms on the given vector spaces V. may be regarded as an inner product space. Let V = F n . (c) Let a be the standard ordered basis for V. Ay) = (A*x. ||^|| = max \AKj\ \\f\\ = max 1/(01 t€[0. Show that [\J]p — [T]g for any orthonormal basis ft for V. Let V be a real or complex vector space (possibly infinite-dimensional).r||. then the inner product. Prove that (•• •) is an inner product. (a) Prove that (x. (2) \\ax\\ = \a\ • ||. 6. Let V be a vector space over F.v„ E ft such that x = 2_]aivi I I Define fay) = ])2"i. (3) \\x + y\\ < \\x\\ + \\y\\.

dfa x) = 0.y) | = | ( c . and define.u. j6|} 25. Hint: Condition the definition of norm can be helpful. (h) Use the fact that for any c E R. the scalar d.y E V. d(x.(cx. 26. dfa y) f 0 if x ^ y.x) for all x E V. \c — r\ can be made arbitrarily small.y) | < ||.340 (c) V = C([0.((c—r)x.. dfay)< dfaz) + dfay). (a) (b) (c) (d) (c) d(a. y E V.b) E V i. to establish item (b) of the definition of inner product.x||2 = (x. (d) V = R . 6 Inner Product Spaces 11/11=/ \f(t)\dt forall/GV for all (a. (f) Prove | (x.1]). •) on R2 such that \\x\\2 = (x. (d) Prove m(—x.y) = \\x — y\\. every rational number r. . y) = r (x. (g) Prove that for every c E R. y) for all x.y€V. (e) Prove (rx.x) for all x E R2 if the norm is defined as in Exercise 24(d).y) for every positive integer m and x. \cfay) every every every (3) in every . y E V.y) = d(y. y) — (x.y) \ < 2|c-r|||:e||||y||. a. y) for all x.l k . y.x). z E V. 2y) = 2 (x. Define fay) = 4 [lls + 3/ll 2 .(x. y E V. y) + (u.y)>0.r ) fay) . and x.y) = ?i(x. called the distance between x and y.'c||||y|| for every x.y E V.y) for every positive integer n and x. Prove the following results for all x. norm on a real vector space V satisfying the parallelogram law given in Exercise 11. •) defines an inner product on V such that ||. (c) Prove (nx. Let || • || be a. where r varies over the set of rational numbers. y) for every rational number r and x.2 / l l 2 ] • Prove that (•. Use Exercise 20 to show that there is no inner product (•. b)\\ = max{|a|.yEV. (b) Prove (x 4. Let || • |j be a norm on a vector space V.y) = (x. Hints: (a) Prove (x. for each ordered pair of vectors. 27. 2 Chap.

/:. Let (•. that [x. Let [•. 6. we have seen the special role of the standard ordered bases for C" and R". bases that are also orthonormal sets are the building blocks of inner product spaces./. Example 2 The set • is an orthonormal basis for R2. iy] Prove that for x. 30. furthermore. Hint: Apply Exercise 27 to V regarded as a vector space over R.. such that [.y E V. where V is regarded as a vector space over R. •] is a real inner product on V..Sec.ix] = 0 for all x E V. and suppose that [•. ) is a complex inner product on V.lust as bases are the building blocks of vector spaces. y c V. Prove. Let V be a vector space over C. Let V be an inner product space. The special properties of these bases stem from the fact that the basis vectors form an orthonormal set. 6. A subset of V is an orthonormal basis forW if it is an ordered basis that is orthonormal. Prove that there is an inner product (•. •] is an inner product for V.•) be the complex-valued function defined by fa y) = [. Let V be a complex inner product space with an inner product (•. •). 29. where V is regarded as a vector space over R.i[x. •] be the real-valued function such that [x. Prove that [*.x) for all x G V. . y] 4. Example 1 The standard ordered basis for F" is an orthonormal basis for F". Definition.y] is the real part of the complex number (x. Let || • || be a norm (as defined in Exercise 24) on a complex vector space V satisfying the parallelogram law given in Exercise 11.•] = 0 for all ./• G V.y) for all x.. We now name such bases. • .2 THE GRAM-SCHMIDT ORTHOGONALIZATION PROCESS AND ORTHOGONAL COMPLEMENTS In previous chapters. Then apply Exercise 29.2 Gram-Schmidt Orthogonalization Process 341 28.-. •) on V such that ||aj|| = (x.

The next theorem and its corollaries illustrate why orthonormal sets and, in particular, orthonormal bases are so important.

Theorem 6.3. Let V be an inner product space and S = {v1, v2, ..., vk} be an orthogonal subset of V consisting of nonzero vectors. If y is in span(S), then

   y = Σ_{i=1}^{k} ( ⟨y, v_i⟩ / ||v_i||^2 ) v_i.

Proof. Write y = Σ_{i=1}^{k} a_i v_i, where a1, a2, ..., ak are in F. Then, for 1 <= j <= k,

   ⟨y, v_j⟩ = ⟨ Σ_i a_i v_i , v_j ⟩ = Σ_i a_i ⟨v_i, v_j⟩ = a_j ⟨v_j, v_j⟩ = a_j ||v_j||^2.

So a_j = ⟨y, v_j⟩ / ||v_j||^2, and the result follows.

The next corollary follows immediately from Theorem 6.3.

Corollary 1. If, in addition to the hypotheses of Theorem 6.3, S is orthonormal and y is in span(S), then

   y = Σ_{i=1}^{k} ⟨y, v_i⟩ v_i.

If V possesses a finite orthonormal basis, then Corollary 1 allows us to compute the coefficients in a linear combination very easily. (See Example 3.)

Corollary 2. Let V be an inner product space, and let S be an orthogonal subset of V consisting of nonzero vectors. Then S is linearly independent.

Proof. Suppose that v1, v2, ..., vk are in S and Σ_{i=1}^{k} a_i v_i = 0. As in the proof of Theorem 6.3 with y = 0, we have a_j = ⟨0, v_j⟩ / ||v_j||^2 = 0 for all j. So S is linearly independent.

Example 3
By Corollary 2, the orthonormal set

   { (1/sqrt(2))(1, 1, 0), (1/sqrt(3))(1, -1, 1), (1/sqrt(6))(-1, 1, 2) }

obtained in Example 8 of Section 6.1 is an orthonormal basis for R^3. Let x = (2, 1, 3). The coefficients given by Corollary 1 to Theorem 6.3 that express x as a linear combination of the basis vectors are

   a1 = (1/sqrt(2))(2 + 1) = 3/sqrt(2),  a2 = (1/sqrt(3))(2 - 1 + 3) = 4/sqrt(3),  and  a3 = (1/sqrt(6))(-2 + 1 + 6) = 5/sqrt(6).

As a check, we have

   (2, 1, 3) = (3/2)(1, 1, 0) + (4/3)(1, -1, 1) + (5/6)(-1, 1, 2).

Corollary 2 tells us that the vector space H in Section 6.1 contains an infinite linearly independent set, and hence H is not a finite-dimensional vector space.

Of course, we have not yet shown that every finite-dimensional inner product space possesses an orthonormal basis. The next theorem takes us most of the way in obtaining this result. It tells us how to construct an orthogonal set from a linearly independent set of vectors in such a way that both sets generate the same subspace.

Before stating this theorem, let us consider a simple case. Suppose that {w1, w2} is a linearly independent subset of an inner product space (and hence a basis for some two-dimensional subspace). We want to construct an orthogonal set from {w1, w2} that spans the same subspace. Figure 6.1 suggests that the set {v1, v2}, where v1 = w1 and v2 = w2 - c w1, has this property if c is chosen so that v2 is orthogonal to w1. To find c, we need only solve the following equation:

   0 = ⟨v2, w1⟩ = ⟨w2 - c w1, w1⟩ = ⟨w2, w1⟩ - c⟨w1, w1⟩.

So

   c = ⟨w2, w1⟩ / ||w1||^2.

Thus

   v2 = w2 - ( ⟨w2, w1⟩ / ||w1||^2 ) w1.

(Figure 6.1: the vectors w1 = v1, w2, and v2 = w2 − cw1.)

The next theorem shows us that this process can be extended to any finite linearly independent subset.

Theorem 6.4. Let V be an inner product space and S = {w1, w2, ..., wn} be a linearly independent subset of V. Define S' = {v1, v2, ..., vn}, where v1 = w1 and

vk = wk − Σ_{j=1}^{k−1} (⟨wk, vj⟩ / ‖vj‖²) vj   for 2 ≤ k ≤ n.   (1)

Then S' is an orthogonal set of nonzero vectors such that span(S') = span(S).

Proof. The proof is by mathematical induction on n, the number of vectors in S. For k = 1, 2, ..., n, let Sk = {w1, w2, ..., wk}. If n = 1, then the theorem is proved by taking S'1 = S1; i.e., v1 = w1 ≠ 0. Assume then that the set S'_{k−1} = {v1, v2, ..., v_{k−1}} with the desired properties has been constructed by the repeated use of (1). We show that the set S'_k = {v1, v2, ..., vk} also has the desired properties, where vk is obtained from S'_{k−1} by (1). If vk = 0, then (1) implies that wk ∈ span(S'_{k−1}) = span(S_{k−1}), which contradicts the assumption that Sk is linearly independent. For 1 ≤ i ≤ k − 1, it follows from (1) that

⟨vk, vi⟩ = ⟨wk, vi⟩ − Σ_{j=1}^{k−1} (⟨wk, vj⟩ / ‖vj‖²) ⟨vj, vi⟩ = ⟨wk, vi⟩ − (⟨wk, vi⟩ / ‖vi‖²) ‖vi‖² = 0,

since ⟨vj, vi⟩ = 0 if i ≠ j by the induction assumption that S'_{k−1} is orthogonal. Hence S'_k is an orthogonal set of nonzero vectors. Now, by (1), we have that span(S'_k) ⊆ span(Sk). But by Corollary 2 to Theorem 6.3, S'_k is linearly independent; so dim(span(S'_k)) = dim(span(Sk)) = k. Therefore span(S'_k) = span(Sk).

The construction of {v1, v2, ..., vn} by the use of Theorem 6.4 is called the Gram-Schmidt process.
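Formula (1) translates directly into an algorithm. The following Python sketch is our own illustration (it is not part of the text): it assumes the standard inner product on R^n, uses the numpy library, and assumes that the input vectors are linearly independent so that no vk is zero. Examples 4 and 5 below carry out the same computation by hand.

```python
import numpy as np

def gram_schmidt(vectors, normalize=False):
    """Apply formula (1): v_k = w_k - sum_j <w_k, v_j>/||v_j||^2 * v_j."""
    orthogonal = []
    for w in vectors:
        v = np.array(w, dtype=float)
        for u in orthogonal:
            # subtract the component of w along each previously computed v_j
            v = v - (np.dot(w, u) / np.dot(u, u)) * u
        orthogonal.append(v)
    if normalize:
        orthogonal = [v / np.linalg.norm(v) for v in orthogonal]
    return orthogonal

# A small check with hypothetical data (not from the text): w1 = (1,1,0), w2 = (2,0,1)
print(gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([2.0, 0.0, 1.0])]))
# [array([1., 1., 0.]), array([ 1., -1.,  1.])]
```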

Example 4
In R⁴, let w1 = (1, 0, 1, 0), w2 = (1, 1, 1, 1), and w3 = (0, 1, 2, 1). Then {w1, w2, w3} is linearly independent. We use the Gram-Schmidt process to compute the orthogonal vectors v1, v2, and v3, and then we normalize these vectors to obtain an orthonormal set.

Take v1 = w1 = (1, 0, 1, 0). Then

v2 = w2 − (⟨w2, v1⟩ / ‖v1‖²) v1 = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).

Finally,

v3 = w3 − (⟨w3, v1⟩ / ‖v1‖²) v1 − (⟨w3, v2⟩ / ‖v2‖²) v2 = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1) = (−1, 0, 1, 0).

These vectors can be normalized to obtain the orthonormal basis {u1, u2, u3}, where

u1 = (1/‖v1‖) v1 = (1/√2)(1, 0, 1, 0),  u2 = (1/‖v2‖) v2 = (1/√2)(0, 1, 0, 1),

and

u3 = (1/‖v3‖) v3 = (1/√2)(−1, 0, 1, 0).

Example 5
Let V = P(R) with the inner product ⟨f(x), g(x)⟩ = ∫_{−1}^{1} f(t)g(t) dt, and consider the subspace P2(R) with the standard ordered basis β. We use the Gram-Schmidt process to replace β by an orthogonal basis {v1, v2, v3} for P2(R), and then use this orthogonal basis to obtain an orthonormal basis for P2(R).

Take v1 = 1. Then ‖v1‖² = ∫_{−1}^{1} 1² dt = 2, and ⟨x, v1⟩ = ∫_{−1}^{1} t · 1 dt = 0. Thus

v2 = x − (⟨x, v1⟩ / ‖v1‖²) v1 = x − (0/2) · 1 = x.

Furthermore,

⟨x², v1⟩ = ∫_{−1}^{1} t² · 1 dt = 2/3  and  ⟨x², v2⟩ = ∫_{−1}^{1} t² · t dt = 0.

Therefore

v3 = x² − (⟨x², v1⟩ / ‖v1‖²) v1 − (⟨x², v2⟩ / ‖v2‖²) v2 = x² − (1/3) · 1 − 0 · x = x² − 1/3.

We conclude that {1, x, x² − 1/3} is an orthogonal basis for P2(R).

To obtain an orthonormal basis, we normalize v1, v2, and v3 to obtain

u1 = 1 / √(∫_{−1}^{1} 1² dt) = 1/√2,   u2 = x / √(∫_{−1}^{1} t² dt) = √(3/2) x,

and similarly,

u3 = v3 / ‖v3‖ = √(5/8) (3x² − 1).

Thus {u1, u2, u3} is the desired orthonormal basis for P2(R).

If we continue applying the Gram-Schmidt orthogonalization process to the basis {1, x, x², ...} for P(R), we obtain an orthogonal basis whose elements are called the Legendre polynomials. The orthogonal polynomials v1, v2, and v3 in Example 5 are the first three Legendre polynomials.

The following result gives us a simple method of representing a vector as a linear combination of the vectors in an orthonormal basis.

Theorem 6.5. Let V be a nonzero finite-dimensional inner product space. Then V has an orthonormal basis β. Furthermore, if β = {v1, v2, ..., vn} and x ∈ V, then

x = Σ_{i=1}^{n} ⟨x, vi⟩ vi.

Proof. Let β0 be an ordered basis for V. Apply Theorem 6.4 to obtain an orthogonal set β' of nonzero vectors with span(β') = span(β0) = V. By normalizing each vector in β', we obtain an orthonormal set β that generates V. By Corollary 2 to Theorem 6.3, β is linearly independent; therefore β is an orthonormal basis for V. The remainder of the theorem follows from Corollary 1 to Theorem 6.3.

Example 6
We use Theorem 6.5 to represent the polynomial f(x) = 1 + 2x + 3x² as a linear combination of the vectors in the orthonormal basis {u1, u2, u3} for P2(R) obtained in Example 5. Observe that

⟨f(x), u1⟩ = ∫_{−1}^{1} (1/√2)(1 + 2t + 3t²) dt = 2√2,

⟨f(x), u2⟩ = ∫_{−1}^{1} √(3/2) t (1 + 2t + 3t²) dt = (2√6)/3,

and

⟨f(x), u3⟩ = ∫_{−1}^{1} √(5/8)(3t² − 1)(1 + 2t + 3t²) dt = (2√10)/5.

Therefore f(x) = 2√2 u1 + (2√6)/3 u2 + (2√10)/5 u3.

Theorem 6.5 also gives us a simple method for computing the entries of the matrix representation of a linear operator with respect to an orthonormal basis.

Corollary. Let V be a finite-dimensional inner product space with an orthonormal basis β = {v1, v2, ..., vn}. Let T be a linear operator on V, and let A = [T]_β. Then for any i and j, A_ij = ⟨T(vj), vi⟩.

Proof. From Theorem 6.5, we have T(vj) = Σ_{i=1}^{n} ⟨T(vj), vi⟩ vi. Hence A_ij = ⟨T(vj), vi⟩.

The scalars ⟨x, vi⟩ given in Theorem 6.5 have been studied extensively for special inner product spaces. Although the vectors v1, v2, ..., vn were chosen from an orthonormal basis, we introduce a terminology associated with orthonormal sets β in more general inner product spaces.

Definition. Let β be an orthonormal subset (possibly infinite) of an inner product space V, and let x ∈ V. We define the Fourier coefficients of x relative to β to be the scalars ⟨x, y⟩, where y ∈ β.

In the first half of the 19th century, the French mathematician Jean Baptiste Fourier was associated with the study of the scalars

(1/2π) ∫_0^{2π} f(t) sin nt dt  and  (1/2π) ∫_0^{2π} f(t) cos nt dt,

or more generally,

c_n = (1/2π) ∫_0^{2π} f(t) e^{−int} dt,

for a function f. In the context of Example 9 of Section 6.1, we see that c_n = ⟨f, f_n⟩, where f_n(t) = e^{int}; that is, c_n is the nth Fourier coefficient for a continuous function f ∈ V relative to S. These coefficients are the "classical" Fourier coefficients of a function, and the literature concerning the behavior of these coefficients is extensive. We learn more about these Fourier coefficients in the remainder of this chapter.

Example 7
Let S = {e^{int} : n is an integer}. In Example 9 of Section 6.1, S was shown to be an orthonormal set in H. We compute the Fourier coefficients of f(t) = t relative to S. Using integration by parts, we have, for n ≠ 0,

⟨f, f_n⟩ = (1/2π) ∫_0^{2π} t e^{−int} dt = i/n,

and, for n = 0,

⟨f, f_0⟩ = (1/2π) ∫_0^{2π} t (1) dt = π.

As a result of these computations, and using Exercise 16 of this section, we obtain an upper bound for the sum of a special infinite series as follows:

‖f‖² ≥ Σ_{n=−k}^{k} |⟨f, f_n⟩|² = Σ_{n=−k}^{−1} |⟨f, f_n⟩|² + |⟨f, f_0⟩|² + Σ_{n=1}^{k} |⟨f, f_n⟩|² = 2 Σ_{n=1}^{k} 1/n² + π²

for every k. Using the fact that ‖f‖² = (1/2π) ∫_0^{2π} t² dt = 4π²/3, we may let k → ∞ to obtain

4π²/3 ≥ 2 Σ_{n=1}^{∞} 1/n² + π²,  that is,  π²/6 ≥ Σ_{n=1}^{∞} 1/n².

Additional results may be produced by replacing f by other functions.
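These integrals are easy to check numerically. The Python sketch below is our own illustration (the Riemann-sum discretization, the function name, and the use of the numpy library are not from the text); it approximates c_n = (1/2π) ∫_0^{2π} f(t) e^{−int} dt and compares the result with the values i/n and π computed above.

```python
import numpy as np

def fourier_coefficient(f, n, num_points=100000):
    """Approximate (1/(2*pi)) * integral_0^{2pi} f(t) exp(-i*n*t) dt by a Riemann sum."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    dt = 2.0 * np.pi / num_points
    return np.sum(f(t) * np.exp(-1j * n * t)) * dt / (2.0 * np.pi)

f = lambda t: t
print(fourier_coefficient(f, 0))   # approximately pi
print(fourier_coefficient(f, 3))   # approximately i/3, i.e. about 0.333j
```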

We are now ready to proceed with the concept of an orthogonal complement.

Definition. Let S be a nonempty subset of an inner product space V. We define S⊥ (read "S perp") to be the set of all vectors in V that are orthogonal to every vector in S; that is, S⊥ = {x ∈ V : ⟨x, y⟩ = 0 for all y ∈ S}. The set S⊥ is called the orthogonal complement of S.

It is easily seen that S⊥ is a subspace of V for any subset S of V.

Example 8
The reader should verify that {0}⊥ = V and V⊥ = {0} for any inner product space V.

Example 9
If V = R³ and S = {e3}, then S⊥ equals the xy-plane (see Exercise 5).

Exercise 18 provides an interesting example of an orthogonal complement in an infinite-dimensional inner product space.

Consider the problem in R³ of finding the distance from a point P to a plane W. (See Figure 6.2.) Problems of this type arise in many settings. If we let y be the vector determined by 0 and P, we may restate the problem as follows: Determine the vector u in W that is "closest" to y. The desired distance is clearly given by ‖y − u‖. Notice from the figure that the vector z = y − u is orthogonal to every vector in W, and so z ∈ W⊥. The next result presents a practical method of finding u in the case that W is a finite-dimensional subspace of an inner product space.

Theorem 6.6. Let W be a finite-dimensional subspace of an inner product space V, and let y ∈ V. Then there exist unique vectors u ∈ W and z ∈ W⊥ such that y = u + z. Furthermore, if {v1, v2, ..., vk} is an orthonormal basis for W, then

u = Σ_{i=1}^{k} ⟨y, vi⟩ vi.

Proof. Let {v1, v2, ..., vk} be an orthonormal basis for W, let u be as defined in the preceding equation, and let z = y − u. Clearly u ∈ W and y = u + z.

To show that z ∈ W⊥, it suffices to show, by Exercise 7, that z is orthogonal to each vj. For any j, we have

⟨z, vj⟩ = ⟨(y − Σ_{i} ⟨y, vi⟩ vi), vj⟩ = ⟨y, vj⟩ − Σ_{i} ⟨y, vi⟩ ⟨vi, vj⟩ = ⟨y, vj⟩ − ⟨y, vj⟩ = 0.

To show uniqueness of u and z, suppose that y = u + z = u' + z', where u' ∈ W and z' ∈ W⊥. Then u − u' = z' − z ∈ W ∩ W⊥ = {0}. Therefore u = u' and z = z'.

(Figure 6.2: the vector y, its orthogonal projection u on W, and z = y − u in W⊥.)

Corollary. In the notation of Theorem 6.6, the vector u is the unique vector in W that is "closest" to y; that is, for any x ∈ W, ‖y − x‖ ≥ ‖y − u‖, and this inequality is an equality if and only if x = u.

Proof. As in Theorem 6.6, we have that y = u + z, where z ∈ W⊥. Let x ∈ W. Then u − x is orthogonal to z, so, by Exercise 10 of Section 6.1, we

p. It follows that ||w — x\\ = 0.3 .x\\2 + \\z\\2 \y — x\\2 — \\u + z — x\\2 > || 2 || 2 = \\y. • It was shown (Corollary 2 to the replacement theorem.2 Gram-Schmidt Orthogonalization Process have [u .j f(t)g(t)dt for all f(x). and therefore \\u — #|| 2 + ||z|| 2 = ||z|| 2 . The proof of the converse is obvious.g(x)) . E x a m p l e 10 Let V = P.i(R) with the inner product (f(x). U3} ^ V2 V8v is an orthonormal basis for P2(^?). of orthogonal projections of vectors in the application to least squares in Section 6. 6. .x) + z\\2 = \\u .3. </(*).u\\2. Then the inequality above becomes an equality. 1 = 0.U2.Sec. {ui. The vector u in the corollary is called the orthogonal projection of y on W.u3) u3 = -x. We compute the orthogonal projection f\(x) of f(x) = x3 on P2(R)By Example 5.For these vectors. u2) u2 + (f(x). The next theorem provides an interesting analog for an orthonormal subset of a finite-dimensional inner product space. 17) that any linearly independent set in a finite-dimensional vector space can be extended to a basis. u.Ul)=J^t3-j=dt and ~I Hence fx{x) = (f(x). We will see the importance. U2> = .x\\ = \\y — u\\. 351 Now suppose that \\y . + (f(x). and hence x = u. V6 t\l-(3t'-l)dt = 0.g(x) G V. U]) . we have (f(x).

It was shown (Corollary 2 to the replacement theorem, p. 47) that any linearly independent set in a finite-dimensional vector space can be extended to a basis. The next theorem provides an interesting analog for an orthonormal subset of a finite-dimensional inner product space.

Theorem 6.7. Let V be an n-dimensional inner product space, and suppose that S = {v1, v2, ..., vk} is an orthonormal set in V. Then
(a) S can be extended to an orthonormal basis {v1, v2, ..., vk, v_{k+1}, ..., vn} for V.
(b) If W = span(S), then S1 = {v_{k+1}, v_{k+2}, ..., vn} is an orthonormal basis for W⊥ (using the preceding notation).
(c) If W is any subspace of V, then dim(V) = dim(W) + dim(W⊥).

Proof. (a) By Corollary 2 to the replacement theorem (p. 47), S can be extended to an ordered basis S' = {v1, v2, ..., vk, w_{k+1}, ..., wn} for V. Now apply the Gram-Schmidt process to S'. The first k vectors resulting from this process are the vectors in S by Exercise 8, and this new set spans V. Normalizing the last n − k vectors of this set produces an orthonormal set that spans V. The result now follows.

(b) Because S1 is a subset of a basis, it is linearly independent. Since S1 is clearly a subset of W⊥, we need only show that it spans W⊥. Note that, for any x ∈ V, we have

x = Σ_{i=1}^{n} ⟨x, vi⟩ vi.

If x ∈ W⊥, then ⟨x, vi⟩ = 0 for 1 ≤ i ≤ k. Therefore

x = Σ_{i=k+1}^{n} ⟨x, vi⟩ vi ∈ span(S1).

(c) Let W be a subspace of V. It is a finite-dimensional inner product space because V is, and so it has an orthonormal basis {v1, v2, ..., vk}. By (a) and (b), we have

dim(V) = n = k + (n − k) = dim(W) + dim(W⊥).

Example 11
Let W = span({e1, e2}) in F³. Then x = (a, b, c) ∈ W⊥ if and only if 0 = ⟨x, e1⟩ = a and 0 = ⟨x, e2⟩ = b. So x = (0, 0, c), and therefore W⊥ = span({e3}). One can deduce the same result by noting that e3 ∈ W⊥ and, from (c), that dim(W⊥) = 3 − 2 = 1.

EXERCISES

1. Label the following statements as true or false.
(a) The Gram-Schmidt orthogonalization process allows us to construct an orthonormal set from an arbitrary set of vectors.

-4.r. 1) (e) V .0).{(1. (3. 1. 5 . Finally. 7 .2 . S = {(1.4i.5 to verify your result.i .7 6z)}.2 6 •+ 5z) f(t)g(t)dt.1.1) (c) V .(0.g(x)) . 6 + 1 0 i ./• . /. < 1 / 3 5\ (— 9\ (1 -17 and (g) V = M 2 x 2 (/.7i. 1*2.6+9i.5 . 5 . 1).!. S 2 1 3 -16 9.!. and .) are the Fourier coefficients of x. .2 i ) .1 5 + 2 5 / . where S = {(1.1)}.1 2 + 3/. (1.2.0. (0.l -i.6. n 8 6 25 . (-1.l + 7 i . S= {(1.2 .. and x = (-1. . 3 2/. 3 + 4 i ) } .18) (f) V = R4.4 i .R..S') with the inner product {/. ( . ( . ( . and x .2) (b) V . and x = (-11. .•].{(2./}.0.R1. 3 ) . then for any x € V the scalars (x. / i » .(/) = S = {sin/.l .1. I). (0.-2. ( .2. 1 .3.J* f(t)g(t)dt. vn} is a basis for an inner product space V. S= { ( .9-3z. and x = (3 + i.2.1) / \ \.R:\ S .4). I).cos/..4 i .(1.8)}. I). c. and compute the Fourier coefficients of the given vector relative to 0.C . 5 ) ..2. 1 .. .span(.8./.2 Gram-Schmidt Orthogonalization Process (b) (c) (d) (e) (f) (g) 353 Every nonzero finite-dimensional inner product space has an orthonormal basis. use Theorem 6. . (1 -1. Every orthonormal set is linearly independent.4 / ) ./.0.P 2 (/?) with the inner product (f(x). K. .. . S .11)}. 2.).4*)}.3. 1.{(1. (a) V . .1 . l l .I ) .5 t . Every orthogonal set is linearly independent. I. If {(.4. Then normalize the vectors in this basis to obtain an orthonormal basis ri for span(S'). . and h(x) = 1+X (d) V .2i). In each pari. a n d a : = ( 2 I 7i.3 + 5 i . and x = (1.r2}.7.4 ..2 7 . An orthonormal basis must be an ordered basis.-2. apply the Gram Schmidt process to the given subset S of the inner product space V to obtain an orthogonal basis for span(S').4 l Ai) (k) V = C .3)}.S'). 6.( 13 . S -1 27 A= -4 8/ 2 2 4 -12 dA = V = M2x2{R).3.1. (1. 39 11 /. S= {\. 7 6 / . 1).3i. The orthogonal complement of any set is a subspace. and h(t) ~ 21+ I (j) V .Sec. (2 1 3z.1 3 (i) V = span(.

6 .x2} is a linearly independent subset of R3.2 / ' .6.6 and Exercise 10 of Section 0. Describe Sn geometrically. lei " . and let W be a finite-dimensional subspace of V.{.( ( 4 = .2 . Compute S1.2 u. 1)}) in C:{. r 2 /'„ derived from the (Irani Schmidt process satisfy v. prove that there exisls y c V such that y e W J . 5 = -25-38/ 12-78/ >i) .) 11.r.7 1 .. . 7. ) r =N(L. v) = 0 for every /' G 3.where XQ is a nonzero vector in R..3 . Hint: Use Theorem 6.2.3 / .2 .3/ \ / 8i 1 + /' J ' 1 . Let 0 be a basis lor a subspace W of an inner product space V. Hint: Use Theorem 0. for i = 1.7 I 5i 9 . = ir.span({(/'.: Use mathematical induction.9 .0. Find orthonormal bases for W and W^.13/ -7-r24/ 1 /' 2 I 2/ Chap. 10. 5 = 11 .7 / . Let S . Let V be an inner product space.1.1.i). but (. then the vectors V\. . 6 Inner Product Spaces . (Projections are defined in the exercises of Sect ion 2.M 2 x 2 (C). prove that ||T(J-)|| < ll-''| for all .6/ 3 + 18/ . 5. 4= I -1 9' »/0 Find (he Fourier coefficients of (3. In R2.5/ . Let W .1 . w2 ir„ } is an orthogonal set of nonzero vectors.LP = / if and only if the rows of A form an orthonormal basis for C". Prove thai : f W : if and only if (z. Now suppose that S — {x%. Prove that there exists a projection T on W along W thai satisfies |\|(T) = W 1 . Prove thai .354 (1) V = M 2X2 (C). Prove that if { a-\. n ( R ( U .1)} in C3. finite-dimensional subspace of an inner product space V. and let z e V.y) / 0. matrix with complex entries.4) relative to 0 4. Let A be an // x /. In addition. 6.• (f: W.• e V. Describe S geometrically. Hint. Prove that for any matrix A € .3 + 7/ -34-31/ .2 + 8/ 10-10/ -13 \ i 9-0/ and A -II i 2 i V .().132/ 7 126/ I | 3/ . Let S = {(1. If . 9.8i" 1 -r 10?: ./•u}. 8. (1. and A 3. 12. Let W be a.

Sec.y) = 0 for all . S C (6 . If (T(x). *' where {•.) 14. Let V be a finite-dimensional inner product space over F.[ybY = (x. prove that T = To. and let S = {r. Then use Exercise 10 of Section 6. c 2 . prove that Bessel's inequality is an equality if and only if x G span(5).*>!'• Hint: Apply Theorem 6.1 + W2L.. on F". Let V be an inner product space. .3. vn } be an orthonormal subset of V. n W 2 ) L = W. W = (W ' J 1 .. Use Exercise 6. (See the definition of the sum of subsets of a vector space on page 22. respectively. Let {u\ . so span(S) C ( 5 ± ) ± . . (a) (b) (c) (d) So C 5 implies that 5 X C S(f. finite-dimensional inner product space. 1]).. Let T be a linear operator on an inner product space V. then for any x.Vi)(y. ± ) x . Let V = C([-l.6 to x e V and W = span(5). Prove that (W.2 Gram-Schmidt Ort nalization Process 355 13. S and SQ be subsets of V.vn} be an orthonormal basis for V. Let V be an inner product space. (See Exercise 22 . and W be a finite-dimensional subspace of V. Prove that for any x € V we have Ml 2 > £ l <*.. Prove the following results. V — W © W^.v2.y) (b) n ="^2{x. •). (b) In the context of (a). •) is the standard inner product.) Hint for the second equation: Apply Exercise 13(c) to the first equation.1. and W„ denote the subspaces of V consisting of the even and odd functions. y G V prove that (x. .y 6 V (Mx). In fact.y G V. 16. Suppose that W. For any x.My)Y = ([x}0.r. 15. i-l Use (a) to prove that if 0 is an orthonormal basis for V with inner product (•. 17. (a) Bessel's Inequality. Let Wi and W2 be subspaces of a. (a) Parseval's Identity. fltnt. (See the exercises of Section 1. prove this result if the equality holds for all x and y in some basis for V.Vi}. .. 18. .y). 6. + W 2 )' = W f n W ^ and (W.

) Prove that Wg — W 0 . \/t}. 19.356 Chap. Since all but a finite number of terms of n=] the series are zero. and W = P. viewed as a space of functions. .k. 6 Inner Product Spaces of Section 1. Prove that {ci. u = (2. so W £ V. Let V = C ( [ . 23. .3.g) = / J f(t)g(t) dt. (f. •) is an inner product on V. Let V = C([0. For rr. and W = {{x. In each of the following parts.. (a) Find an orthonormal basis for W. 1]) with the inner product (f. (ii) Prove that W 1 = {()}. (a) Prove that (•. where 5n>k is the Kronecker delta.2a. (c) V = P(R) with the inner product.l .7. find the orthogonal projection of the given vector on the given subspace W of the inner product space V.2. y. find the distance from the given vector to the subspace W.' 22. fi G V. and hence V is an inner product space. . (b) Let h(t) = t2. 21. and W = {(x. Let W be the subspace spanned by the linearly independent set {/.1]) with the inner product. let e n be the sequence defined by f-n(k) = <)n. /i(.1]. (R). (a) V = R2. Use the orthonormal basis obtained in Example 5 to compute the "best" (closest) second-degree polynomial approximation of the function h(t) = el on the interval [—1... 20. (b) For each positive integer n. e2. (c) Let an = e\ + en and W = span({<rn : n > 2}.6).) = 4 + 3. y): y = Ax}. where the inner product on V is defined by </>£> = / MMVdt.g(x)) = J0' / ( % ( * ) * .1..fi) = y o-(n)fi(n). Use the orthonormal basis obtained in (a) to obtain the "best" (closest) approximation of //. (f(x). 3). (i) Prove that eA $ W. in W.2z = 0}.2. (b) V = R3. In each part of Exercise 19. we define (a. the space of all sequences a in F (where F = R or F = C) such that o~(n) 7^ 0 for only finitely many positive integers n. u = (2.} is an orthonormal basis for V. and conclude that W ^ (\N1)±. Let V be the vector space defined in Example 5 of Section 1.7. z): x + 3y . the series converges. and let W be the subspace P 2 (i?).g) = f \ f(t)g(t)dt.

Thus the assumption in Exercise 13(c) that W is finite-dimensional is essential.

6.3 THE ADJOINT OF A LINEAR OPERATOR

In Section 6.1, we defined the conjugate transpose A* of a matrix A. For a linear operator T on an inner product space V, we now define a related linear operator on V called the adjoint of T, whose matrix representation with respect to any orthonormal basis β for V is [T]*_β. The analogy between conjugation of complex numbers and adjoints of linear operators will become apparent. We first need a preliminary result.

Let V be an inner product space, and let y ∈ V. The function g: V → F defined by g(x) = ⟨x, y⟩ is clearly linear. More interesting is the fact that if V is finite-dimensional, every linear transformation from V into F is of this form.

Theorem 6.8. Let V be a finite-dimensional inner product space over F, and let g: V → F be a linear transformation. Then there exists a unique vector y ∈ V such that g(x) = ⟨x, y⟩ for all x ∈ V.

Proof. Let β = {v1, v2, ..., vn} be an orthonormal basis for V, and let

y = Σ_{i=1}^{n} g(vi)‾ vi,

where the bar denotes complex conjugation. Define h: V → F by h(x) = ⟨x, y⟩, which is clearly linear. Furthermore, for 1 ≤ j ≤ n we have

h(vj) = ⟨vj, y⟩ = ⟨vj, Σ_{i} g(vi)‾ vi⟩ = Σ_{i} g(vi) ⟨vj, vi⟩ = Σ_{i} g(vi) δ_{ji} = g(vj),

where δ_{ji} is the Kronecker delta. Since g and h both agree on β, we have that g = h by the corollary to Theorem 2.6 (p. 73).

To show that y is unique, suppose that g(x) = ⟨x, y'⟩ for all x ∈ V. Then ⟨x, y⟩ = ⟨x, y'⟩ for all x, so by Theorem 6.1(e) (p. 333) we have y = y'.

Example 1
Define g: R² → R by g(a1, a2) = 2a1 + a2; clearly g is a linear transformation. Let β = {e1, e2}, and let y = g(e1)e1 + g(e2)e2 = 2e1 + e2 = (2, 1), as in the proof of Theorem 6.8. Then g(a1, a2) = ⟨(a1, a2), (2, 1)⟩ = 2a1 + a2.

Theorem 6.9. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Then there exists a unique function T*: V → V such that ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V. Furthermore, T* is linear.

Proof. Let y ∈ V. Define g: V → F by g(x) = ⟨T(x), y⟩ for all x ∈ V. We first show that g is linear. Let x1, x2 ∈ V and c ∈ F. Then

g(cx1 + x2) = ⟨T(cx1 + x2), y⟩ = ⟨cT(x1) + T(x2), y⟩ = c⟨T(x1), y⟩ + ⟨T(x2), y⟩ = cg(x1) + g(x2).

Hence g is linear.

We now apply Theorem 6.8 to obtain a unique vector y' ∈ V such that g(x) = ⟨x, y'⟩; that is, ⟨T(x), y⟩ = ⟨x, y'⟩ for all x ∈ V. Defining T*: V → V by T*(y) = y', we have ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V.

To show that T* is linear, let y1, y2 ∈ V and c ∈ F. Then for any x ∈ V,

⟨x, T*(cy1 + y2)⟩ = ⟨T(x), cy1 + y2⟩ = c̄⟨T(x), y1⟩ + ⟨T(x), y2⟩ = c̄⟨x, T*(y1)⟩ + ⟨x, T*(y2)⟩ = ⟨x, cT*(y1) + T*(y2)⟩.

Since x is arbitrary, T*(cy1 + y2) = cT*(y1) + T*(y2) by Theorem 6.1(e) (p. 333).

Finally, we need to show that T* is unique. Suppose that U: V → V is linear and that it satisfies ⟨T(x), y⟩ = ⟨x, U(y)⟩ for all x, y ∈ V. Then ⟨x, T*(y)⟩ = ⟨x, U(y)⟩ for all x, y ∈ V, so T* = U.

The linear operator T* described in Theorem 6.9 is called the adjoint of the operator T. The symbol T* is read "T star." Thus T* is the unique operator on V satisfying ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V. Note that we also have

⟨x, T(y)⟩ = ⟨T(y), x⟩‾ = ⟨y, T*(x)⟩‾ = ⟨T*(x), y⟩;

so ⟨x, T(y)⟩ = ⟨T*(x), y⟩ for all x, y ∈ V. We may view these equations symbolically as adding a * to T when shifting its position inside the inner product symbol.

For an infinite-dimensional inner product space, the adjoint of a linear operator T may be defined to be the function T* such that ⟨T(x), y⟩ = ⟨x, T*(y)⟩ for all x, y ∈ V, provided it exists. Although the uniqueness and linearity of T* follow as before, the existence of the adjoint is not guaranteed (see Exercise 24). The reader should observe the necessity of the hypothesis of finite-dimensionality in the proof of Theorem 6.9. Many of the theorems we prove about adjoints, nevertheless, do not depend on V being finite-dimensional. Thus, unless stated otherwise, for the remainder of this chapter we adopt the convention that a reference to the adjoint of a linear operator on an infinite-dimensional inner product space assumes its existence.
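For matrices acting on F^n with the standard inner product, the adjoint is realized by the conjugate transpose (this is made precise in Theorem 6.10 below). The following Python sketch is our own numerical illustration of the defining equation ⟨T(x), y⟩ = ⟨x, T*(y)⟩; the matrix and test vectors are arbitrary choices, not data from the text.

```python
import numpy as np

A = np.array([[1 + 2j, 3.0], [4j, 5.0]], dtype=complex)   # T = L_A on C^2
A_star = A.conj().T                                       # conjugate transpose of A

x = np.array([1 + 1j, 2 - 1j])
y = np.array([3j, 4 + 2j])

# standard inner product <a, b> = sum_i a_i * conj(b_i), i.e. np.vdot(b, a)
lhs = np.vdot(y, A @ x)         # <T(x), y>
rhs = np.vdot(A_star @ y, x)    # <x, T*(y)>
print(lhs, rhs, np.isclose(lhs, rhs))   # the two values agree
```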

Theorem 6.10 is a useful result for computing adjoints.

Theorem 6.10. Let V be a finite-dimensional inner product space, and let β be an orthonormal basis for V. If T is a linear operator on V, then [T*]_β = [T]*_β.

Proof. Let A = [T]_β, B = [T*]_β, and β = {v1, v2, ..., vn}. Then from the corollary to Theorem 6.5 (p. 346), we have

B_ij = ⟨T*(vj), vi⟩ = ⟨vi, T*(vj)⟩‾ = ⟨T(vi), vj⟩‾ = A_ji‾ = (A*)_ij.

Hence B = A*.

Corollary. Let A be an n × n matrix. Then L_{A*} = (L_A)*.

Proof. If β is the standard ordered basis for F^n, then, by Theorem 2.16 (p. 93), we have [L_A]_β = A. Hence [(L_A)*]_β = [L_A]*_β = A* = [L_{A*}]_β, and so (L_A)* = L_{A*}.

As an illustration of Theorem 6.10, we compute the adjoint of a specific linear operator.

Example 2
Let T be the linear operator on C² defined by T(a1, a2) = (2i a1 + 3a2, a1 − a2). If β is the standard ordered basis for C², then

[T]_β = ( 2i  3 )
        (  1 −1 ).

So

[T*]_β = [T]*_β = ( −2i  1 )
                  (   3 −1 ).

Hence T*(a1, a2) = (−2i a1 + a2, 3a1 − a2).

The following theorem suggests an analogy between the conjugates of complex numbers and the adjoints of linear operators.

Theorem 6.11. Let V be an inner product space, and let T and U be linear operators on V. Then

. /.((T4 U)(x)..T'(y)) = (x. Since L (/WJ) .(T + U)*(y)> . . the rest are proved similarly. Let x./. Corollary. (d) Similarly. (tm.360 (a) (b) (c) (d) (e) ( T + U ) * .T(y)) = {T*(x). since (x.U*(y)) I U (K)) = ( * .3. respectively.. Proof We prove only (c). provided that the existence of T* and U* is assumed. Let A and D be n x n matrices. T* + U* has the property unique to (T + U)*. the remaining parts can be proved similarly. . (TU)* = U*T*: T** = T: P = I. 6 Inner Product Spaces Proof. the experimenter (x.L A .4 4 B)* =A*+B*. (b) (cA)* =cA* for all e£ F.. ym at times t\....y) = (d) follows. Chap..) From this plot.(LB)*(LA)* = L B . Hence T* + U* = (T + U)*. ( T * 4 U*)(y)>.y G V. (c) r .y) = (T(x). he or she may be measuring unemployment at various times during some period. (cT)* = e. We prove (a) and (d).10. y2). For example.r) | \J(x).„. An alternative proof. Least Squares Approximation Consider the following problem: An experimenter collects data by taking measurements y\.t2. y2. we relied on the corollary to Theorem 6. Suppose that the data (/]... (See Figure 6. = (L AB )* = ( U U ) * .T*(y) . I In the preceding proof.. (d) A**=A. Then (a) (. (c) (ABY = B*A*. y\).T* for any c G F. f <a?.. can be given by appealing directly to the definition of the conjugate transpose of a matrix (see Exercise 5). The same proof works in the infinite-dimensional case.ym) are plotted as points in the plane.T * + U*. (a) Because (x. (t2.y) = <T(.L B M-.y) = (x. which holds even for nonsquare matrices.y) + (U(x). we have (AB)* = B*A*.T**(y)).

d)' >-i y = ct + d Figure 6. One such estimate of fit is to calculate the error E that represents the sum of the squares of the vertical distances from the points to the line. and would like to find the constants c and d so that the line y — ct + d represents the best possible fit to the data collected.Sec.3 The Adjoint of a Linear Operator 361 feels that there exists an essentially linear relationship between y and t. This method not only allows us to find the linear function that best fits the data.3 Thus the problem is reduced to finding the constants c and d that minimize E. for any positive integer k. (For this reason the line y = ct + d is called the least squares line. . given an m x n matrix A. We develop a general method for finding an explicit vector xn G F n that minimizes E: that is. E = X > . but also. that is.) If we let L\ A = t2 \tm 1/ and y = yi \ymJ then it follows that E — \\y — Ax\\2.<*i . say y — ct + d. we find xn £ F n such that \\y — AXQ\\ < \\y — Ax\\ for all vectors x G F n . 6. the best fit using a polynomial of degree at most k.

To develop a practical method for finding such an x0, we need some notation and two simple lemmas. For x, y ∈ F^n, let ⟨x, y⟩_n denote the standard inner product of x and y in F^n. Recall that if x and y are regarded as column vectors, then ⟨x, y⟩_n = y*x.

Lemma 1. Let A ∈ M_{m×n}(F), x ∈ F^n, and y ∈ F^m. Then

⟨Ax, y⟩_m = ⟨x, A*y⟩_n.

Proof. ⟨Ax, y⟩_m = y*(Ax) = (y*A)x = (A*y)*x = ⟨x, A*y⟩_n.

Lemma 2. Let A ∈ M_{m×n}(F). Then rank(A*A) = rank(A).

Proof. By the dimension theorem, we need only show that, for x ∈ F^n, we have A*Ax = 0 if and only if Ax = 0. Clearly, Ax = 0 implies that A*Ax = 0. So assume that A*Ax = 0. Then

0 = ⟨A*Ax, x⟩_n = ⟨Ax, A**x⟩_m = ⟨Ax, Ax⟩_m,

so that Ax = 0.

Corollary. If A is an m × n matrix such that rank(A) = n, then A*A is invertible.

Now let A be an m × n matrix and y ∈ F^m. Define W = {Ax : x ∈ F^n}; that is, W = R(L_A). By the corollary to Theorem 6.6 (p. 350), there exists a unique vector in W that is closest to y. Call this vector Ax0, where x0 ∈ F^n. Then ‖Ax0 − y‖ ≤ ‖Ax − y‖ for all x ∈ F^n; so x0 has the property that E = ‖Ax0 − y‖ is minimal, as desired.

To develop a practical method for finding x0, we note from Theorem 6.6 and its corollary that Ax0 − y ∈ W⊥; so ⟨Ax, Ax0 − y⟩_m = 0 for all x ∈ F^n. Thus, by Lemma 1, we have ⟨x, A*(Ax0 − y)⟩_n = 0 for all x ∈ F^n; that is, A*(Ax0 − y) = 0. So we need only find a solution x0 to A*Ax = A*y. If, in addition, we assume that rank(A) = n, then by Lemma 2 we have x0 = (A*A)⁻¹A*y. We summarize this discussion in the following theorem.

Theorem 6.12. Let A ∈ M_{m×n}(F) and y ∈ F^m. Then there exists x0 ∈ F^n such that (A*A)x0 = A*y and ‖Ax0 − y‖ ≤ ‖Ax − y‖ for all x ∈ F^n. Furthermore, if rank(A) = n, then x0 = (A*A)⁻¹A*y.

Suppose that the experimenter chose the times ti (1 ≤ i ≤ m) to satisfy Σ_{i=1}^{m} ti = 0. Then the two columns of A would be orthogonal, so A*A would be a diagonal matrix (see Exercise 19).

In this case, the computations are greatly simplified.

To return to our experimenter, let us suppose that the data collected are (1, 2), (2, 3), (3, 5), and (4, 7). Then A has rows (1, 1), (2, 1), (3, 1), and (4, 1), and y = (2, 3, 5, 7) regarded as a column vector. Hence

A*A = ( 30 10 )
      ( 10  4 ),

so

(A*A)⁻¹ = (1/20) (   4 −10 )
                 ( −10  30 ).

Therefore

( c )  =  x0 = (A*A)⁻¹A*y = ( 1.7 )
( d )                       (  0  ).

It follows that the line y = 1.7t is the least squares line. The error may be computed directly as E = ‖Ax0 − y‖² = 0.3.

In practice, the m × 2 matrix A in our least squares application has rank equal to two, and hence A*A is invertible by the corollary to Lemma 2. For, otherwise, the first column of A is a multiple of the second column, which consists only of ones. But this would occur only if the experimenter collects all the data at exactly one time.

Finally, the method above may also be applied if, for some k, the experimenter wants to fit a polynomial of degree at most k to the data. For instance, if a polynomial y = ct² + dt + e of degree at most 2 is desired, the appropriate model takes x = (c, d, e) and lets A be the m × 3 matrix whose ith row is (ti², ti, 1), with y = (y1, y2, ..., ym) as before.
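The normal-equation recipe of Theorem 6.12 is easy to carry out by machine. The Python sketch below is our own illustration using the numpy library and the four data points of the example above; it solves (A*A)x0 = A*y directly rather than forming the inverse.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])

A = np.column_stack([t, np.ones_like(t)])   # rows (t_i, 1)
x0 = np.linalg.solve(A.T @ A, A.T @ y)      # normal equations (A*A) x0 = A* y
c, d = x0
E = np.linalg.norm(A @ x0 - y) ** 2

print(c, d, E)    # c = 1.7, d = 0.0, E = 0.3 (up to rounding)
```

Replacing A by the matrix with rows (ti², ti, 1) gives the best quadratic fit in exactly the same way.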

Minimal Solutions to Systems of Linear Equations

Even when a system of linear equations Ax = b is consistent, there may be no unique solution. In such cases, it may be desirable to find a solution of minimal norm. A solution s to Ax = b is called a minimal solution if ‖s‖ ≤ ‖u‖ for all other solutions u. The next theorem assures that every consistent system of linear equations has a unique minimal solution and provides a method for computing it.

Theorem 6.13. Let A ∈ M_{m×n}(F) and b ∈ F^m. Suppose that Ax = b is consistent. Then the following statements are true.
(a) There exists exactly one minimal solution s of Ax = b, and s ∈ R(L_{A*}).
(b) The vector s is the only solution to Ax = b that lies in R(L_{A*}); that is, if u satisfies (AA*)u = b, then s = A*u.

Proof. (a) For simplicity of notation, we let W = R(L_{A*}) and W' = N(L_A). Let x be any solution to Ax = b. By Theorem 6.6, x = s + y for some s ∈ W and y ∈ W⊥. But W⊥ = W' by Exercise 12, and therefore b = Ax = As + Ay = As. So s is a solution to Ax = b that lies in W. To prove (a), we need only show that s is the unique minimal solution. Let v be any solution to Ax = b. By Theorem 3.9 (p. 172), we have that v = s + u, where u ∈ N(L_A) = W⊥. Since s ∈ W, we have

‖v‖² = ‖s + u‖² = ‖s‖² + ‖u‖² ≥ ‖s‖²

by Exercise 10 of Section 6.1. Thus s is a minimal solution. We can also see from the preceding calculation that if ‖v‖ = ‖s‖, then u = 0; hence v = s. Therefore s is the unique minimal solution to Ax = b, proving (a).

(b) Assume that v is also a solution to Ax = b that lies in W. Then

v − s ∈ W ∩ W⊥ = {0};

so v = s. Finally, suppose that (AA*)u = b, and let v = A*u. Then v ∈ W and Av = b. Therefore s = v = A*u by the discussion above.

Example 3
Consider the system

x + 2y + z = 4
x − y + 2z = −11
x + 5y = 19.

To find the minimal solution to this system, we must first find some solution u to AA*x = b. Now

AA* = (  6  1 11 )
      (  1  6 −4 )
      ( 11 −4 26 ),

so we consider the system

6x + y + 11z = 4
x + 6y − 4z = −11
11x − 4y + 26z = 19,

for which one solution is u = (1, −2, 0). (Any solution will suffice.) Hence

s = A*u = (−1, 4, −3)

is the minimal solution to the given system.
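Theorem 6.13(b) is also the basis of a simple computational procedure: find any u with (AA*)u = b and set s = A*u. The Python sketch below is our own illustration for the system of Example 3 (numpy-based; since AA* is singular here, we let a least-squares routine pick one of the many solutions u, and any one of them yields the same s).

```python
import numpy as np

A = np.array([[1.0,  2.0, 1.0],
              [1.0, -1.0, 2.0],
              [1.0,  5.0, 0.0]])
b = np.array([4.0, -11.0, 19.0])

# Solve (A A*) u = b; the particular solution chosen does not matter,
# because A* u is the same for every solution u.
u, *_ = np.linalg.lstsq(A @ A.T, b, rcond=None)
s = A.T @ u

print(s)           # approximately (-1, 4, -3), the minimal solution
print(A @ s - b)   # approximately the zero vector
```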

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every linear operator has an adjoint.
(b) Every linear operator on V has the form x → ⟨x, y⟩ for some y ∈ V.
(c) For every linear operator T on V and every ordered basis β for V, we have [T*]_β = ([T]_β)*.
(d) The adjoint of a linear operator is unique.
(e) For any linear operators T and U and scalars a and b, (aT + bU)* = aT* + bU*.
(f) For any n × n matrix A, we have (L_A)* = L_{A*}.
(g) For any linear operator T, we have (T*)* = T.

2. For each of the following inner product spaces V (over F) and linear transformations g: V → F, find a vector y such that g(x) = ⟨x, y⟩ for all x ∈ V.

36). a2.i)Zy).C . 1 + 2i). b) = (2a + 2 (c) V .y) for all x. (b) If V is finite-dimensional. (For definitions.(x. Prove that if T is invertible. Prove that ||T(x-)|| .366 Chap.9rn / f(t)h(t) dt. (a) R(T*) ± = N(T).].T(f) = f' + 3f. then R(T*) . Let V be an inner product space. = U? and U2 . (b) State a result for nonsquare matrices that is analogous to the corollary to Theorem 6.11. h) - IK a . as in the proof of (c).P . Is the same result true if we assume that TT" = To? 12.T(y)) . = -••.TT*..(l . = (T-1)*. Let T be a linear operator on an inner product space V. Complete the proof of Theorem 6. then T* is invertible and m ! f(t)g(t)dt. x . g(/) = /(0) + / ' ( l ) .2.3 and 2.) 10.T":. Hint: Use Exercise 13(c) of Section 6. Let T be a linear operator on an inner product space V.11.2/. .2n2 -f. (a) Complete the proof of the corollary to Theorem 6.C 2 . 5). For a linear operator T on an inner product space V.1. T(«. T(2. Let lb = T + T* and U2 . and prove it using a matrix argument. and let T be a linear operator on V.N(T) 1 . then T . g ( ~ . z2) = (22. Give an example of a.(3. (c) V . Prove that if V = W ® W^ and T is the projection on W along W .11. linear operator T on an inner product space V such that N(T) ^ N(T*)./o For each of the following inner product spaces V and linear operators T on V. 6 Inner Product Spaces (a) V = R. a-A) — <i\ . (a) V = R2. Prove that U.1..Aa3 (b) V .y G V.P 2 (/f) with (/. and let T be a linear operator on V.IkH for all x C V if and only if (T(x).t. x = (3 . Hint: Use Exercise 20 of Section 6. see the exercises of Sections 1.11 by using Theorem 6.<?)= | fit) = 4 . implies T — To.U2. + iz2. Prove the following results. 9. ( / ? ) with (/. 11. Let V be a finite-dimensional inner product space..i: g(r. prove that T*T = T(. (b) V . evaluate T* at the given vector in V. . Hint: Recall that N(T) = W' .

(b) rank(T) = rank(T*). 19. 20. 16. respectively. respectively. using the preceding definition. Then show that T* exists. •) j and (•. For each of the sets of data that follows.11 using the preceding definition. y. 14. and {•. where V and W » are finite-dimensional inner product spaces with inner products (•. (a) {(-3. 6. (e) For all x G V. where V and W are finite* dimensional inner product spaces. then [T1? = ([T]}Y(c) rank(T*) = rank(T). T*(y))j for all x 6 V and y G W. Prove that det(.3 The Adjoint of a Linear Operator 367 13. Suppose that A is an rnxn matrix in which no two columns are identical. (0. Let V be an inner product space. State and prove a result that extends the first four parts of Theorem 6.9). use the least squares approximation to find the best fits with both (i) a linear function and (ii) a quadratic function. •). Prove the following results. (1. T*T(>) = 0 if and only if T(x) = 0. Definition. •). Prove that (R(T*)) ' = N(T). The following definition is used in Exercises 15-17 and is an extension of the definition of the adjoint of a linear operator.1)} . rank(A*A) = rank(AA*) = rank(^4).T Let A be an n x n matrix.. Prove the following results. Let T: V — W be a linear transformation. 6 ) . ') 2 . Compute the error E in both cases. y) 2 = (x. Let T: V — W be a linear transformation. and let. Define T: V — V by * T(x) — (x.4*) = det(A). 15. T(y)) 2 for all x G W and y G V. Let T: V — W be a linear transformation. (b) If 3 and 7 are orthonormal bases for V and W. and find an explicit expression for it. (a) There is a unique adjoint T* of T. (c) For any n x n matrix A. z G V'. First prove that T is linear. and T* is linear..2 . respectively.Sec. A function T*: W — V is called an adjoint of T if » (T(x). 17.y)z for all x G V. where V and W are finite+ dimensional inner product spaces with inner products {•.y)j = (x.2). (a) N(T*T) = N(T). Deduce from (a) that rank(TT*) = rank(T). Let T be a linear operator on a finite-dimensional inner product space V. (d) (T*(aO. 18. Deduce that rank(T*T) = rank(T). ( . Prove that A* A is a diagonal matrix if and only if every pair of columns of A is orthogonal.

5 5.12 takes the form of the normal equations: ti)d = Y^tiVi £ ' ? < ' £ and ^ L . where c is called the spring constant.i=1 c + rnd-- ^2'yii-1 These equations may also be obtained from the error E by setting the partial derivatives of E with respect to both c and d.9).. (0.-1). In physics. Length X 3. Hooka's law states that (within certain limits) there is a linear relationship between the length x of a spring and the force y applied to (or exerted by) the spring. ij\). — 1 2x . y2). (3.7). (9. Consider the problem of finding the least squares line y = ct + d corresponding to the rn observations (t\.. (a) Show that the equation (A*A)xo = A*y of Theorem 6.ym).0 4. Use the following data to estimate the spring constant (the length is given in inches and the force is given in pounds). . ( 2 ..4).. (f2. (1.2 2. equal to zero.z = 4 x + y + z — w. (tm.z = 1 2x + 3y + 2 = 2 Ax + ly .8 4.368 Chap.3 ) } 21. Find the minimal solution to each of the following systems of linear (xiuations.1).5 4.2). 3 ) .y w =1 (aj x + 2y .3 22.1 . That is.4).12)} (c) {(-2. y = ex + d. (5. (b) x + 2y . 6 Inner Product Spaces (b) {(1. (7.0 Force y 1.0 2. . ( .z = 12 x+y-z=0 2x -y + z = 3 x-y+z=2 (c) (d) 23.

A very important result that helps achieve our goal is Schur's theorem (Theorem 6. . T(e n ) = Yl7=i ei(c) Prove that T has no adjoint. Let T be a linear operator on a finite-dimensional inner prodtict space V. (b) Prove that for any positive integer n.Sec.Al)(ar)> . 0 = (0. Hint: By way of contradiction. e 2 .x) = ((T . (a) Prove that T is a linear operator on V. (T* . 6.14).x) = (v.W)(v). Define T: V -* V by oo T(a)(k) = y a(i) i=k for every positive integer A:.y).Al)*(ar)> = (v. (T . L e m m a . . and any nonzero vector in this null space is an eigenvector of T* with corresponding eigenvalue A. it is necessary and sufficient for the vector space V to possess a basis of eigenvectors. Thus T* — Al has a nonzero null space. } be defined as in Exercise 23 of Section 6. Then for any x G V.2. We begin with a lemma. Proof Suppose that e is an eigenvector of T with corresponding eigenvalue A.4 NORMAL AND SELF-ADJOINT OPERATORS We have seen the importance of diagonalizable operators in Chapter 5. For these operators. So T* — AI is not onto and hence is not one-to-one. it is reasonable to seek conditions that guarantee that V has an orthonormal basis of eigenvectors. Let V and {ei. . suppose that T* exists. and hence v is orthogonal to the range of T* — Al.4 Normal and Self-Adjoint Operators 369 (b) Use the second normal equation of (a) to show that the least squares line must pass through the center of mass. where 1 V^ tnd y = m £ « • i=l 24. As V is an inner product space in this chapter. Prove that for any positive integer n. T*(en)(k) 7^ 0 for infinitely many k. The next section contains the more familiar matrix form. 6. The formulation that follows is in terms of linear operators. Notice that the infinite series in the definition of T converges because a(i) ^ 0 for only finitely many i. (i. If T has an eigenvector. then so does T*. .

x> = <T(v).(y.4) that a subspace W of V is said to be T-invariant if T(W) is contained in W. x n real or complex matrix A is normal if AA* = A* A. then \T]p is a diagonal matrix. z) = cA(0) = 0. is also a diagonal matrix. Then there exists an orthonormal basis 0 forV such that the matrix [T]. Let T be a linear operator on a finitedimensional inner product space V. . (Ini^W 1 ) — n — 1. By Theorem 6. By the lemma. Thus if V possesses 11. basis of eigenvectors of T. It is easy to show (see Theorem 5.Suppose that the characteristic polynomial of T splits. Proof. Because diagonal matrices commute.7(c) (p. Recall from Section 5. Note that if such an orthonormal basis 0 exists.cAz> = cA (y. Definitions. 352). we conclude that T and T* commute.1 and see Section 5. It follows immediately from Theorem 6.10 (p. 0 . so we may apply the induction hypothesis to T w and obtain an orthonormal basis 7 of W 1 such that [Tw-]is upper triangular. and hence [T*]^ = [T]!. Suppose that T*(z) = Xz and that W . M I .(y. So suppose that the result is true for linear operators on (it.cz> =_<y. 359) that T is normal if and only if [T]/3 is normal. If y € W ' and x ~ cz G W. 6 Inner Product Spaces Recall (see the exercises of Section 2. An n. The proof is by mathematical induction on the dimension // of V.370 Chap. we can assume that T* has a unit eigenvector 2. — l)-dimensional inner product spaces whose characteristic polynomials split.2 that a polynomial is said to split if it factors into linear polynomials. 314.i is upper triangular. It is clear that Tw is a linear operator on W. then TT* = T*T . or as a consequence of Exercise 6 of Section 4.span({~}). We now return to our original goal of finding an orthonormal basis of eigenvectors of a linear operator T on a finite-dimensional inner product space V.4) that the characteristic polynomial of T w i divides the characteristic polynomial of T and hence splits. where 0 is an orthonormal basis.14 (Schur). Clearly. Let V be an inner product space. we may define the restriction Tw: W • W by Tw(x) — T(x) for all x G W. We say that T is normal if TT* = T*T. and let T be a linear operator on V.21 p. The result is immediate if n — I. then (T(y). If W is T~ invariant.) L is an orthonormal basis for V such J that [T].-< is upper triangular. So T(y) G W^.cT'(z)> . Theorem 6.T*(cz)) -.11 ortho-normal. We show that W ' is T-invariant.

Example 1
Let T: R² → R² be rotation by θ, where 0 < θ < π. The matrix representation of T in the standard ordered basis is given by

A = ( cos θ  −sin θ )
    ( sin θ   cos θ ).

Note that AA* = I = A*A; so A, and hence T, is normal.

Example 2
Suppose that A is a real skew-symmetric matrix; that is, Aᵗ = −A. Then A is normal because both AAᵗ and AᵗA are equal to −A².

Clearly, the operator T in Example 1 does not even possess one eigenvector. So in the case of a real inner product space, we see that normality is not sufficient to guarantee an orthonormal basis of eigenvectors. All is not lost, however. We show that normality suffices if V is a complex inner product space.

Before we prove the promised result for normal operators, we need some general properties of normal operators.

Theorem 6.15. Let V be an inner product space, and let T be a normal operator on V. Then the following statements are true.
(a) ‖T(x)‖ = ‖T*(x)‖ for all x ∈ V.
(b) T − cI is normal for every c ∈ F.
(c) If x is an eigenvector of T, then x is also an eigenvector of T*. In fact, if T(x) = λx, then T*(x) = λ̄x.
(d) If λ1 and λ2 are distinct eigenvalues of T with corresponding eigenvectors x1 and x2, then x1 and x2 are orthogonal.

Proof. (a) For any x ∈ V, we have

‖T(x)‖² = ⟨T(x), T(x)⟩ = ⟨T*T(x), x⟩ = ⟨TT*(x), x⟩ = ⟨T*(x), T*(x)⟩ = ‖T*(x)‖².

The proof of (b) is left as an exercise.

(c) Suppose that T(x) = λx for some x ∈ V. Let U = T − λI. Then U(x) = 0, and U is normal by (b). Thus (a) implies that

0 = ‖U(x)‖ = ‖U*(x)‖ = ‖(T* − λ̄I)(x)‖ = ‖T*(x) − λ̄x‖.

Hence T*(x) = λ̄x. So x is an eigenvector of T*.

(d) Let λ1 and λ2 be distinct eigenvalues of T with corresponding eigenvectors x1 and x2. Then, using (c), we have

λ1⟨x1, x2⟩ = ⟨λ1x1, x2⟩ = ⟨T(x1), x2⟩ = ⟨x1, T*(x2)⟩ = ⟨x1, λ̄2x2⟩

.XjVj) = A. T(/) = Xf for some A. Assume that V\. Proof. . .15. So by induction. Since V equals the span of S.16 does not extend to infinite-dimensional complex inner product spaces. Suppose that T is normal. Then T is normal if and only if there exists an orthonormal basis for V consisting of eigenvectors of T. . fn-l) ~ (fin.4). (vkfVj) = 0. 1 Theorem 6.x2) • Since Ai ^ A2.5 (p. I Interestingly..Vk-i are eigenvectors of T. Since A is upper triangular. v2. denote the eigenvalue of T corresponding to By Theorem 6. T*(VJ) = XjVj.. Furthermore. as the next example shows. So we may apply Schur's theorem to obtain an orthonormal basis f3 = {v\. We show that T has no eigenvectors. we conclude that (x\. / n ) = (/m+li/n) = <S(m+l). TT* = I = T*T: so T is normal. 347). / and U(/) = f_uf..vn} for V such that [T]tf = A is upper triangular. We claim that vk is also an eigenvector of T.. and hence vk is an eigenvector of T./ .U(/„)) . Ajk = (T(vk).T*(Vj)) = (vk. where am ^ 0.Vj) = (vk. we may write in f = J ^ a-ifi. 6 Inner Product Spaces = (xi. Then T ( / n ) = /„. X2x2) = X2 (xi. We know that V] is an eigenvector of T because A is upper triangular. by the corollary to Theorem 6. Suppose that / is an eigenvector of T.n = ^m.16. It follows that T(vk) = Akkvk. Let V = span(S'). Consider any j < k.{r. and let A. The converse was already proved on page 370. Example 3 Consider the inner product space H with the orthonormal set S from Example 9 in Section 6. and let T and U be the linear operators on V defined by T ( / ) .. It then follows by mathematical induction on k that all of the Uj's are eigenvectors of T.372 Chap.x2) =0. U(/ n ) = / n _ ] It follows that U = T*. Theorem 6. say. the characteristic polynomial of T splits.+ ! and for all integers n. all the vectors in p* are eigenvectors of T. Thus ( T ( / m ) .v2.1. By the fundamental theorem of algebra (Theorem D..-\) = {fm. J oT(vk) = A\kvi + A2kv2 + ••• + AjkVj + ••• + Akkvk. Furthermore. Let T be a linear operator on a hnitc-dimensional complex inner product space V.

Definitions. • • • . Before we state our main result for self-adjoint operators. 6.4]7 = A. the eigenvalues of ~T A are real.Sec. By definition. Proof. For real inner product spaces. So A = A. Lemma. Note that TA is self-adjoint because [T. for real matrices. a linear operator on a real inner product space has only real eigenvalues. by (a). this condition reduces to the requirement that A be symmetric. The lemma that follows shows that the same can *be said for self-adjoint operators on a complex inner product space. (b) Let n. Then A is self-adjoint. Let T be a linear operator on an inner product space V. So. Because a self-adjoint operator is also normal. A is real. we can apply Theorem 6. Similarly.f 5(c) to obtain XX = T(X) = T*(X) = \X. that is. we need some preliminary work. (b) Suppose that V is a real inner product space. where 7 is the standard ordered (orthonormal) basis for C". Then (a) Every eigenvalue o/'T is real. We say that. An n x n real or complex matrix A is self-adjoint (Hermitian) if A = A*.4 Normal and Self-Adjoint Operators Hence J^aifi+1=T(f) = Xf = TXaifi 373 Since am ^ 0. Then the characteristic polynomial of T splits. and the same is true for self-adjoint operators on a real inner product space. (a) Suppose that T(x) = Xx for x -/= 0. / n + i . By the fundamental theorem of algebra. 0 be an orthonormal basis for V. — dim(V). the . and A = [T]p. we can write / m + i as a linear combination of f„. Let T be a self-adjoint operator on a finite-dimensional inner prochict space V. Let T A be the linear operator on C" defined by TA(X) = Ax for all x G C".fmBut this is a contradiction because S is linearly independent. • Example 1 illustrates that normality is not sufficient to guarantee the existence of an orthonormal basis of eigenvectors for real inner product spaces. It follows immediately that if 0 is an orthonormal basis. T is self-adjoint (Hermitian) if T = T*. the characteristic polynomial of every linear operator on a complex inner product space splits. then T is selfadjoint if and only if [T]^ is self-adjoint. we must replace normality by the stronger condition that T = T* in order to guarantee such a basis.

Note that T_A is self-adjoint because [T_A]_γ = A, where γ is the standard ordered (orthonormal) basis for Cⁿ. So, by (a), the eigenvalues of T_A are real. By the fundamental theorem of algebra, the characteristic polynomial of T_A splits into factors of the form t − λ. Since each λ is real, the characteristic polynomial splits over R. But T_A has the same characteristic polynomial as A, which has the same characteristic polynomial as T. Therefore the characteristic polynomial of T splits.

We are now able to establish one of the major results of this chapter.

Theorem 6.17. Let T be a linear operator on a finite-dimensional real inner product space V. Then T is self-adjoint if and only if there exists an orthonormal basis β for V consisting of eigenvectors of T.

Proof. Suppose that T is self-adjoint. By the lemma, we may apply Schur's theorem to obtain an orthonormal basis β for V such that the matrix A = [T]_β is upper triangular. But

A* = [T]*_β = [T*]_β = [T]_β = A.

So A and A* are both upper triangular, and therefore A is a diagonal matrix. Thus β must consist of eigenvectors of T. The converse is left as an exercise.

Theorem 6.17 is used extensively in many areas of mathematics and statistics. We restate this theorem in matrix form in the next section.

Example 4
As we noted earlier, real symmetric matrices are self-adjoint, and self-adjoint matrices are normal. The following matrix A is complex and symmetric:

A = ( i  i )      and      A* = ( −i  −i )
    ( i  1 )                    ( −i   1 ).

But A is not normal, because (AA*)₁₂ = 1 + i and (A*A)₁₂ = 1 − i. Therefore complex symmetric matrices need not be normal.

EXERCISES

1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every self-adjoint operator is normal.
(b) Operators and their adjoints have the same eigenvectors.
(c) If T is an operator on an inner product space V, then T is normal if and only if [T]_β is normal, where β is any ordered basis for V.
(d) A real or complex matrix A is normal if and only if L_A is normal.
(e) The eigenvalues of a self-adjoint operator must all be real.

If possible.and T*-invariant.<?>= / (e) V = M 2x2 (/?) and T is defined by T(A) =•. Let T be a linear operator on an inner product space V. A< (f) V = M 2x2 (i?) and T is defined by T d c d a 6 3. (a) (b) (c) (d) If T is self-adjoint. (c) Prove that T is normal if and only if T | T 2 = T 2 T | . and let T be a linear operator on V.b) = (2a . Let V be a complex inner product space. . by T ( / ) . > • 4. or neither. 6. produce an orthonormal basis of eigenvectors of T for V and list the corresponding eigenvalues.26 + 5c). Aa . self-adjoint. then Tw is self-adjoint./ ' .and T*-invariant and T is normal. 6. a + 26).4 Normal and Self-Adjoint Operators (f) The identity and zero operators are self-adjoint. 56. where f(t)g(t)dt.Sec. W is T'-invariant.15. (h) Every self-adjoint operator is diagonalizable. Prove that TU is self-adjoint if and only if TU = UT. (b) Suppose also that T = U. T(o. Lei T and U be self-adjoint operators on an inner product space V. (/. T(a. 7. If W is both T. 375 2. 2/ (a) Prove that T| and T 2 are self-adjoint and that T = Tj +iT2. Prove (b) of Theorem 6.T*). Prove that Ui = T] and U2 = T 2 . -2a \ 56). 5. 6.i +iU 2 . For each linear operator T on an inner product space V. where U| and U2 are self-adjoint. Define Ti = i ( T + r ) 2 and T 2 = i ( T . and let W be a T-invariant subspace of V.a + b. then Tw is normal. then (Tw)* = 0~*)w If W is both T. (g) Every normal operator is diagonalizable. 6) = (2a + ib. c) = ( . (a) (b) (c) (d) VVVVR2 and T is defined by R:i and T is defined by C2 and T is defined by P 2 (/r) and T is defined T(a.26. Prove the following results. determine whether T is normal. Give an example of a linear operator T on R2 and an ordered basis for R that provides a counterexample to the statement in Exercise 1(c).

y 2 .x) = 0 for all x G V. . .\ is invertible and that [(T . (a) (b) If T is self-adjoint. 10. T be a self-adjoint operator on a finite-dimensional inner product space V./• G V.. Simultaneous Diagonalization. A 2 .. . . x n. Let.. Hint: Apply Theorem 6. we have that W L is both T. linear operator on a complex (not necessarily finitedimensional) inner product space V with an adjoint T*.6. .r 376 Chap. Hint: Replace x by x I y and then by x f iy. Prove that if W is T-invariant. and let W be a. Prove that there exists an orthonormal basis for V consisting of vectors that are eigenvectors of both U and T. r|| 2 -!|T0r)|| 2 + ||^|| 2 . then T = T„. Hint: Use Exercise 24 of Section 5. then (T(x).R(T*).N(T*) and R(T) . Let T be a normal operator on a finite-dimensional real inner product space V whose characteristic polynomial splits. A„. Prove that A is a Gramian matrix if and only if A is symmetric and all of its eigenvalues are nonnegative. then T = T*. (c) If (T(x). . 6 Inner Product Spaces 8. (The complex version of this result appears as Exercise 10 of Section 6.4. Hence prove that T is selfadjoint.6 (p. An n.x) is real for all .17 to T = L4 to obtain an orthonormal basis {(']. 9. 12. and expand the resulting inner products. Let V be a finite-dimensional real inner product space. n>al matrix A is said to be a G r a m i a n matrix if there exists a real (square) matrix 13 such that A — P' B.) Hint: For any eigenspace W — E^ of T. subspace of V.x) is real for all x G V. Prove that for all x C V !|T(. Assume that T is a. .15 and Exercise 12 of Section 6. 13. v„} of eigenvectors with the associated eigenvalues Ai. Define the linear operator U by U (*•'. then W is also T*-invariant. Apply Theorem 6.r)±. Deduce that T . If T satisfies (T(x). Hint: Use Theorem 6. Let T be a normal operator on a finite-dimensional complex inner product.and U-invariant. 14.17 and Theorem 6. Prove that N(T) .and U-invariant. By Exercise 7. space V. . 350).) = \/X~iV./I)" 1 ]* = (T + i\)~l. we have that W is both T. Prove that V has an orthonormal basis of eigenvectors of T. Prove the following results..3. 11. Let T be a normal operator on a finite-dimensional inner product space V. and let U and T be self-adjoint linear operators on V such that UT = TU.

Sec. Let .both diagonal matrices. then TU is positive definite. (e) If T and U are positive definite operators such that TU = UT. we have (Ajj\ . and let A = [T]^.4. where 0 is an orthonormal basis for V.. 6.x) > 0] for all x f 0. Definitions. . Because of (f). Prove the following results. Let T and U be a self-adjoint linear operators on an //-dimensional inner product space V..4 [\Jormal and Self-Adjoint Operators 377 15. if /'(/) is the characteristic polynomial of A. (The general case is proved in Section 5.\.j > 0 for all nonzero //. prove that f{A) — ().. Hint: Use Schur's theorem to show that A may be assumed to be upper triangular.xn matrix A. . . linite-dimeusional inner product space is called positive definite [positive scmidefinitc] if T is self-adjoint and (T(x).x) > 0 [(T(x). o 2 . 16. 17. . i}) for j > 2.-tuples ( a i . That is.4 and B be symmetric n x n matrices such that AB = BA..a n ). in which case /(*) = U(Au -t) i i Now if T = l_4. (d) If T and U are positive semidefinite operators such that T 2 — U2. Prove the Cayley Hamilton theorem for a complex n.) The following definitions are used in Exercises 17 through 23. where {e. results analogous to items (a) through (d) hold for matrices as well as operators. then T = U. (f) T is positive definite [semidefinite] if and only if . ) G span({< h e 2 e. A linear operator T on a. . An ii x n matrix A with entries from R or (' is called positive definite positive semidefinite} HL.T)(e .\ is positive definite [positive semideBmte]. (b) T is positive definite if and only if y AjjO.jU.e2. (c) T is positive semidefinite if and only if A — B* 11 for some square matrix II. Use Exercise 14 to prove that there exists an orthogonal matrix P such that P'AP and PtBP an.. (a) T is positive definite [semidefinite] if and only if all of its eigenvalues are positive [nonnegative].4 is positive definite [semidefinite]. en} is the standard ordered basis for C".

and let (•. (b) If c > 0. •) the inner product on V with respect to which 0 is orthonormal (see Exercise 22(a) of Section 6. Prove that there exists a unique linear operator T on V such that (x. •) be the inner product associated with V. {•. Prove that both TU and UT are diagonalizable linear operators that have only real eigenvalues. 6 Inner Product Spaces 18. (the adjoint is with respect to (•.1). and Ti the positive definite operator according to Exercise 22. and let T and U be self-adjoint operators on V such that T is positive definite. Let U be a diagonalizable linear operator on a finite-dimensional inner product. 0 a basis of eigenvectors for U. where V and W are finite> dimensional inner product spaces.3.vn} be an orthonormal basis for V with respect to (•. Prove the following results. 20. Prove that (x.T 2 .j — A. Let V be a finitedimensional inner product space with inner product (•. (a) T + U is positive definite.•)). (c) T is positive definite. (•. Prove that there exist positive definite linear operators T| and T[ and self-adjoint linear operatorsT 2 and T 2 such that U = T 2 Tj = T'.) (b) rank(T*T) = rank(TT*) = rank(T). Show that U is self-adjoint with respect to ( • . (b) Prove that the operator T of (a) is positive definite with respect to both inner products. Hint: Let. •)• a n < l l°t T oe a positive definite linear operator on V. •) be any other inner product on V. Let T2 = T r 1 U * . This exercise provides a converse to Exercise 20.). and define a matrix A by Ajj = (vj. repeal the argument with T _ ' in place of T. 21. Let T: V — W be a linear transformation. Hint: Let 0 = {vi. Let T and U be positive definite operators on an inner product space V.Vi) for all i and j. Lei T be the unique linear operator on V such that [T].v2... finite-dimensional inner product space. Let V be an inner product space with inner product (•.y) = (T(a. •).y)' = (T(x). . (a) T*T and TT* are positive semidefinite.T ^ ' U ' T .y) = (T(x). To show that TU is self-adjoint. (a) . 23. (See Exercise 15 of Section 6. 22.y). Prove the following results. •).y) for all x and y in V.378 Chap. 19. Hint: Show that UT is self-adjoint with respect to the inner product (x. • ) ' and U . Let V be a.. space V such that all of the eigenvalues of U are real. then cT is positive definite.y) defines another inner product on V.

in fact. If ||T(.Sec. It is now natural to consider those linear operators T on an inner product space that preserve length.^ is an upper triangular matrix. we cail T a unitary operator if F — (' and an orthogonal operator if F — R. As another characterization. Clearly. Prove that [T]7 is an upper triangular matrix. on a finite-dimensional complex inner product space. for example. A complex number 2 has length 1 if zz = 1. and isomorphisms preserve all the vector space structure. In particular. linear operators preserve the operations of vector addition and scalar multiplication. Let 7 be the orthonormal basis for V obtained by applying the Gram Schmidt orthogonalization process to 0 and then normalizing the resulting vectors. 359). We will see that this condition guarantees. linear operator acts similarly to the conjugate of a complex number (see. (a) Suppose that 0 is an ordered basis for V such that [T]. 6. We study these operators in much more detail in Section 6. we prove that. that T preserves the inner product. . we were interested in studying those functions that preserve the structure of the underlying space.4 and (a) to obtain an alternate proof of Schur's theorem.11 p. 6. Let T be a linear operator on a Unite-dimensional inner product space V (over F). If. any rotation or reflection in R2 preserves length and hence is an orthogonal operator. in the infinite-dimensional case. In past chapters. Recall that the adjoint of a. in addition. Theorem 6.5 UNITARY AND ORTHOGONAL OPERATORS AND THEIR MATRICES In this section.5 Unitary and Orthogonal Operators and Their Matrices 379 24.|| for all x G V. Definitions. we continue our analogy between complex numbers and linear operators./. This argument gives another proof of Schur's theorem.z-)j| = ||.1 I. (b) Use Exercise 32 of Section 5. In this section. an operator satisfying the preceding norm requirement is generally called an isornetry. then the operator is called a unitary or orthogonal operator. these are the normal operators whose eigenvalues all have absolute value I.operators T on an inner product space V such that TT* = T*T — I. Let T be a linear opera!or on a finite dimensional inner product space V. the operator is onto (the condition guarantees one-to-one). It should be noted that. we study those lineai. We will see that these are precisely the linear operators that "preserve length" in the sense that ||T(x)|| = ||x|| for all x G V.

Example 1
Let h ∈ H satisfy |h(x)| = 1 for all x. Define the linear operator T on H by T(f) = hf. Then

||T(f)||² = ||hf||² = (1/2π) ∫₀^{2π} h(t)f(t)·conj(h(t)f(t)) dt = ||f||²

since |h(t)|² = 1 for all t. So T is a unitary operator.  •

Theorem 6.18. Let T be a linear operator on a finite-dimensional inner product space V. Then the following statements are equivalent.
(a) TT* = T*T = I.
(b) ⟨T(x), T(y)⟩ = ⟨x, y⟩ for all x, y ∈ V.
(c) If β is an orthonormal basis for V, then T(β) is an orthonormal basis for V.
(d) There exists an orthonormal basis β for V such that T(β) is an orthonormal basis for V.
(e) ||T(x)|| = ||x|| for all x ∈ V.

Thus all the conditions above are equivalent to the definition of a unitary or orthogonal operator. From (a), it follows that unitary or orthogonal operators are normal.

Before proving the theorem, we first prove a lemma. Compare this lemma to Exercise 11(b) of Section 6.4.

Lemma. Let U be a self-adjoint operator on a finite-dimensional inner product space V. If ⟨x, U(x)⟩ = 0 for all x ∈ V, then U = T₀.

Proof. By either Theorem 6.16 (p. 372) or 6.17 (p. 374), we may choose an orthonormal basis β for V consisting of eigenvectors of U. If x ∈ β, then U(x) = λx for some λ. Thus 0 = ⟨x, U(x)⟩ = ⟨x, λx⟩ = λ̄⟨x, x⟩, so λ = 0. Hence U(x) = 0 for all x ∈ β, and thus U = T₀.

Proof of Theorem. We prove first that (a) implies (b). Let x, y ∈ V. Then ⟨x, y⟩ = ⟨T*T(x), y⟩ = ⟨T(x), T(y)⟩.

Second, we prove that (b) implies (c). Let β = {v₁, v₂, ..., vₙ} be an orthonormal basis for V; so T(β) = {T(v₁), T(v₂), ..., T(vₙ)}. It follows that ⟨T(vᵢ), T(vⱼ)⟩ = ⟨vᵢ, vⱼ⟩ = δᵢⱼ. Therefore T(β) is an orthonormal basis for V.

That (c) implies (d) is obvious.

Next we prove that (d) implies (e). Let x ∈ V, and let β = {v₁, v₂, ..., vₙ} be an orthonormal basis for V such that T(β) is an orthonormal basis for V. Now

x = a₁v₁ + a₂v₂ + ⋯ + aₙvₙ

for some scalars aᵢ, and so

||x||² = |a₁|² + |a₂|² + ⋯ + |aₙ|²

since β is orthonormal. Applying the same manipulations to

T(x) = a₁T(v₁) + a₂T(v₂) + ⋯ + aₙT(vₙ),

and using the fact that T(β) is also orthonormal, we obtain

||T(x)||² = |a₁|² + |a₂|² + ⋯ + |aₙ|².

Hence ||T(x)|| = ||x||.

Finally, we prove that (e) implies (a). For any x ∈ V, we have

⟨x, x⟩ = ||x||² = ||T(x)||² = ⟨T(x), T(x)⟩ = ⟨x, T*T(x)⟩.

So ⟨x, (I − T*T)(x)⟩ = 0 for all x ∈ V. Let U = I − T*T; then U is self-adjoint, and ⟨x, U(x)⟩ = 0 for all x ∈ V. So by the lemma, we have T₀ = U = I − T*T, and therefore T*T = I. Since V is finite-dimensional, we may use Exercise 10 of Section 2.4 to conclude that TT* = I.

It follows immediately from the definition that every eigenvalue of a unitary or orthogonal operator has absolute value 1. In fact, even more is true.

Corollary 1. Let T be a linear operator on a finite-dimensional real inner product space V. Then V has an orthonormal basis of eigenvectors of T with corresponding eigenvalues of absolute value 1 if and only if T is both self-adjoint and orthogonal.

Proof. Suppose that V has an orthonormal basis {v₁, v₂, ..., vₙ} such that T(vᵢ) = λᵢvᵢ and |λᵢ| = 1 for all i. By Theorem 6.17 (p. 374), T is self-adjoint. Thus (TT*)(vᵢ) = T(λᵢvᵢ) = λᵢ²vᵢ = vᵢ for each i, since each λᵢ is real with |λᵢ| = 1. So TT* = I, and again by Exercise 10 of Section 2.4, T*T = I; hence T is orthogonal by Theorem 6.18.

If T is self-adjoint, then, by Theorem 6.17 (p. 374), V possesses an orthonormal basis {v₁, v₂, ..., vₙ} such that T(vᵢ) = λᵢvᵢ for all i. If T is also orthogonal, we have

|λᵢ| ||vᵢ|| = ||λᵢvᵢ|| = ||T(vᵢ)|| = ||vᵢ||,

so |λᵢ| = 1 for every i.
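Theorem 6.18 and the remark preceding Corollary 1 can be illustrated numerically. The NumPy sketch below (illustrative only, not part of the text) verifies for a unitary matrix Q, built from the QR factorization of a random complex matrix, that Q*Q = I, that multiplication by Q preserves inner products and norms, and that every eigenvalue of Q has absolute value 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a unitary matrix Q from the QR factorization of a random complex matrix.
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

print(np.allclose(Q.conj().T @ Q, np.eye(4)))                  # (a): Q*Q = I
print(np.allclose(np.vdot(Q @ x, Q @ y), np.vdot(x, y)))       # (b): inner products preserved
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # (e): norms preserved
print(np.allclose(np.abs(np.linalg.eigvals(Q)), 1.0))          # eigenvalues have absolute value 1
```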

It follows that T is an orthogonal operator by Corollary 1 to Theorem 6.18. that ||T(:r)|| = \\x\\ for all x G R2. we obtain this fact from (b) also.18. The proof is similar to the proof of Corollary 1.i) = v\ and T(v2) — —v2. matrix if . consider the operator defined in Example 3 of Section 2. Example 3 Let T be a reflection of R2 about a line L through the origin." that is. where 0 < 0 < n.15 (p. wThich is cos 0 sin 6 — sin i cos i j| reveals that T is not self-adjoint for the given restriction on 9. Select vectors v\ G L and v2 G L1. 371). Thus v\ and v2 are eigenvectors of T with corresponding eigenvalues 1 and — 1. The fact that rotations by a fixed angle preserve perpendicularity not only can be seen geometrically but now follows from (b) of Theorem 6.v2} is an orthonormal basis for R2. A linear operator T on R2 is called a reflection of R2 about L if T(x) = x for all x G L and T(x) — —x for all As an example of a reflection. We may view L as a line in the plane through the origin. {vi. It is seen easily from the preceding matrix that T x is the rotation by —$.such that \\vi || — \\v2\\ = 1. this fact also follows from the geometric observation that T has no eigenvectors and from Theorem 6. Example 2 Let T: R2 — R2 be a rotation by 6 . Perhaps the fact that such a transformation preserves the inner product is not so obvious: however.Then T(?. • Definition. an inspection of the matrix representation of T with respect to the standard ordered basis. Then V has an orthonormal basis of eigenvectors ofT with corresponding eigenvalues of absolute value 1 if and only ifT is unitary: Proof.382 Chap. respectively. We show that T is an orthogonal operator. 6 Inner Product Spaces Corollary 2. Let L be a one-dimensional subspace of R2. Furthermore. A square matrix A is called an orthogonal A* A = AAl = I and unitary if A* A = A A* = I. Let T be a linear operator on a finite-dimensional complex inner product space V. Definitions. As we mentioned earlier.5. It is clear geometrically > > that T "preserves length. Finally. • We now examine the matrices that represent unitary and orthogonal transformations.

Hence 7 = {v\. the columns of the matrix form an orthonormal basis for R2. a real unitary matrix is also orthogonal. Let v\ — (cos a. • v . and v2 G L 1 .23 (p. It also follows from the definition above and from Theorem 6. we have [T]7 = [lA]y = Let Q = cos a sin a — sin a cos a ly the corollary to Theorem 2.. the matrix cos 9 sin 9 — sin 9 cos 9 is clearly orthogonal. Note that the condition AA* = / is equivalent to the statement that the rows of A form an orthonormal basis for F n because Si.10 (p. Similarly. = hj = {AA*)ij = J2Aik(A*)kj = Y^AikAjk. One can easily see that the rows of the matrix form an orthonormal basis for R .i|| = ||v2|| = 1. 6. Because T(fi) = v\ and T(v2) = —v2. let 0 be the standard ordered basis for R2. 359) that a linear operator T on an inner product space V is unitary [orthogonal] if and only if [T]p is unitary [orthogonal] for some orthonormal basis 0 for V. v\ G L. Then T = L^. k-\ k=i and the last term represents the inner product of the ith and jfth rows of A. sin a) and t'2 = (—sin a. A similar remark can be made about the columns of A and the condition A* A = I. and let A = [T]g. Example 5 Let T be a reflection of R2 about a line L through the origin. In this case. Then ||i. Suppose that a is the angle from the positive x-sxis to L. Example 4 From Example 2. cos a). Since T is an orthogonal operator and 0 is an orthonormal basis. A is an orthogonal matrix. We describe A. A= Q[lA\1Q-1 .Sec.5 Unitary and Orthogonal Operators and Their Matrices 383 Since for a real matrix A we have A* = A1.v2} is an orthonormal basis for R2. we call A orthogonal rather than unitary. 115).
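The description of the reflection in Example 5 can be carried out numerically. The sketch below (an illustration, not part of the text; the angle α is arbitrary) forms A = Q[T]_γ Q^{-1} for the reflection about the line making angle α with the positive x-axis and checks that A agrees with the standard double-angle form, that A is orthogonal, and that its eigenvalues are 1 and −1.

```python
import numpy as np

alpha = 0.3  # arbitrary angle from the positive x-axis to the line L

Q = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])   # columns: v1 on L, v2 on L-perp
D = np.diag([1.0, -1.0])                          # [T]_gamma: T(v1) = v1, T(v2) = -v2
A = Q @ D @ Q.T                                   # Q is orthogonal, so Q^{-1} = Q^t

# A agrees with the double-angle form [[cos 2a, sin 2a], [sin 2a, -cos 2a]].
print(np.allclose(A, [[np.cos(2*alpha),  np.sin(2*alpha)],
                      [np.sin(2*alpha), -np.cos(2*alpha)]]))
print(np.allclose(A @ A.T, np.eye(2)))            # A is an orthogonal matrix
print(np.linalg.eigvalsh(A))                      # eigenvalues -1 and 1
```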

where P is a unitary matrix and I) is a diagonal matrix. Then A is normal if and only if A is unitarily equivalent to a diagonal matrix. Proof.4. In this case.4 is similar to a diagonal matrix D. to D. matrix. . then A is normal.4 and II are unitarily equivalent [orthogonally equivalent] if and only if there exists a unitary [orthogonal] matrix P such thai A P' IIP. It is easily seen (see Exercise 18) that this relation is an equivalence relation on M„. The preceding paragraph has proved half of each of the next two theorems. By the corollary to Theorem 2. But since t he columns of Q are an orl lionoiinal basis for F". Suppose that A — P*DP. sin of) We know that. More generally. 1 Example 6 Let . Hence . By the preceding remarks. Similarly.(P*DP){P*D*P) P*DID*P P*DD*P. Let A be a complex n x n matrix. „(li)\. Let A be a real n x /. there exists an orthonormal basis Q for F" consisting of eigenvectors of A.384 cos a sin o shm\ f\ cos a J l o 0 -I Chap. Theorem 6. we have DD* =D*D. 115).20. however. Then AA* = (P*DP)(P*DP)* . we say that A is unitarily equivalent orthogonally equivalent. The proof is similar to the proof of Theorem 6. 6 Inner Product Spaces CS O Q sin a — sine* COSQ cos.„(C) [M„.a sin a 2sinacosa sin 2o j sin a coso — (cos o. for a complex normal real symmetric matrix . the matrix Q whose columns are the vectors in 3 is such that D — Q AC). it follows that Q is unitary [orthogonal . I Theorem 6. we need only prove that if A is unitarily equivalent to a diagonal matrix. Thus AA* = A*A.19 and is left as an exercise. Since D is a diagonal matrix. Then A is symmetric if and only if A is orthogonally equivalent to a real diagonal matrix.19. A*A = P*D*DP.23 (p. I Proof.

1. To find P. The key requirement for such a transformation is that it preserves distances. 1.1 ( 1 . hence the term rigid.0 0 8 Because of Schur's theorem (Theorem 6.1. Definition. . 370). the next "result is immediate.5 Unitary and Orthogonal Operators and Their Matrices 385 Since A is symmetric.4 are 2 and 8.-^(1.0).14 p.15(d) (p. 1)} is a basis for the eigenspace corresponding to 2.0. (-1. Because this set is not orthogonal. Theorem 6.1.Sec. V2 v0 v3 J Thus one possible choice for P is I p = 0 1 i 2 \/6 I v •':. l . 6. we apply the Gram Schmidt process to obtain the orthogonal set {(-1.1) is orthogonal to the preceding two vectors.21 (Schur). we also refer to it as Schur's theorem.-2). As it is the matrix form of Schur's theorem.0). then A is orthogonally equivalent to a real upper triangular matrix. l v/5/ '2 0 if I0 2 0 . (b) If F — R. The set {(1.U l . 371)./•(•'') -f(y)\\ = \\x-y\ . (a) If F = ('. The set {(-1. .1)} is a basis for the eigenspace corresponding to 8. 1 . We find an orthogonal matrix P and a diagonal matrix D such that PtAP= I). Taking the union of these two bases and normalizing the vectors.0). we obtain an orthonormal basis of eigenvectors. It is easy to show that the eigenvalues of . Rigid Motions* The purpose of this application is to characterize the so-called rigid motions of a finite-dimensional real inner product space. as predicted by Theorem 6.1. -2)}. One may think intuitively of such a motion as a transformation that does not affect the shape of a figure under ifs action. Let V be a real inner product space. Theorem 6. then A is unitarily equivalent to a complex upper triangular matrix. Notice that (1.20 tells us that A is orthogonally equivalent to a diagonal matrix.1. A function f: V — V * is called a rigid motion if !. l ) ) . Let A C M„ X „(F) be a matrix whose characteristic polynomial splits over F. we obtain the following orthonormal basis for R'5 consisting of eigenvectors of A: 4 ( 1.
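The computation in Example 6 can be checked with NumPy. The sketch below is illustrative and not part of the text; the matrix entered is the one consistent with the eigenvalues 2 and 8 and the eigenvectors listed in the example. numpy.linalg.eigh returns an orthonormal basis of eigenvectors of a symmetric matrix, and we verify that P^t A P is the diagonal matrix diag(2, 2, 8).

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])   # consistent with eigenvalues 2, 2, 8 in Example 6

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns of P.
eigenvalues, P = np.linalg.eigh(A)

print(eigenvalues)                         # [2. 2. 8.]
print(np.allclose(P.T @ P, np.eye(3)))     # P is an orthogonal matrix
print(np.round(P.T @ A @ P, 10))           # diag(2, 2, 8), as in P^t A P = D
```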

) Thus an orthogonal operator on a finitedimensional real inner product space V followed by a translation on V is a rigid motion on V. Furthermore.T(y)) + ||y| and \\x . Another class of rigid motions are the translations.2 (x. Then there exists a unique orthogonal operator T on V and a unique translation g on V such that f = g o T. T h e o r e m 6. any orthogonal operator on a finite-dimensional real inner product space is a rigid motion. for any x G V ||T(x)|| 2 = !|/(.r) -f(0)\\2 = \\x-0\[2 = \\x\\2.T(x) . in which the translation is by 0. It is a simple exercise to show that translations. T(y)) = (x. Any translation is also a special case. y) for all x. (See Exercise 22. Observe that T is the composite of / and the translation by —f(0). space.r) . y € V. hence T is a rigid motion. as well as composites of rigid motions on a real inner product space.y\\2 = ||x|| 2 . in which the orthogonal operator is the identity operator.T(y)\\2 = ||T(z)|| 2 .y G V. from which it follows that / = g o T. We are now in a position to show that T is a linear transformation. where g is the translation by f(0). ||T(. Let f: V — V be a rigid motion on a Hnite-dimcnsional • real inner product space V.T(x)] . We show that T is an orthogonal operator. Let T: V V be defined by T{x)=f(x)-f{0) for all x G V. Any orthogonal operator is a special case of this composite. and consequently ||T(x)|| = ||x|| for any x G V.y) + \\y\\2. Let x. 6 Inner Product Spaces For example. A function g: V — V.aT(y)|| 2 .2 (T(x). where V is a real inner product.386 for all x. We say that g is the translation by VQ. are also rigid motions. Chap. Then ||T(x + ay) . every rigid motion on V may be characterized in this way.aT(y)\\2 = \\[T(x + ay) . But ||T(x) .T{y)\\2 = \\x . is called a translation if there exists a vector e„ G V such that g(x) = X + VQ for all x G V. y G V. Thus for any x.T(y)) + ||T(y)|| 2 2<T(aO. Proof. and let a G R. Remarkably. y G V.y\\2: so (T(x).22.

v ~' K ' \s\\\9 cosi It follows from Example 1 of Section 6. Because T is an orthogonal operator.ay) = T(. . Either T(e 2 ) = (-sin0.. I Orthogonal Operators on R2 Because of Theorem 6. 6. or T(e 2 ) = (sin0. T is an orthogonal operator.T(y)> - (T(z).y)+a\\y\\2-(x. and hence T -. T(.y)] = = 0. where 0 is the standard ordered basis for R2. T h e o r e m 6.2a[(x + ay. and det(A) = 1. Then exactly one of the following conditions is satisfied: (a) T is a rotation. Since T(e 2 ) is a unit vector and is orthogonal to T(ei).r) for all x G V. there are only two possible choices for T(c2). _ . T(y)> = \\(x + ay) .s i n 9. and let A = [T]^.T(e2)} is an orthonormal basis for R2 by Theorem 6. . . Proof. .tf) = {T(ei).. Also del.2a[{T(x + oy). 0 < 9 < 2TT.y)] for all x G V.os9 —sin i first. result characterizes orthogonal operators on R2.4 —I . suppose that I (e->) — . Since T(ei) is a unit vector.„ . (b) T is a reflection about a line through the origin.U.. and det(A) = —1.T(y))] = a \\y\\ + a \\y[\ . The next.T(x). reduces to T(x) = U(. Thus T(x -f. -cos6>).4 that T is a rotation by the angle 9. Let. Since T also preserves inner products. cos 0). therefore. 1 hen .x\\2 + a2\\yf 2 2 2 2 .(x. Substituting x — 0 in the preceding equation yields W = Vn. .22. suppose that un and VQ are in V and T and U are orthogonal operators on V such that f(x)=T(x)+u0 = U(x)+v0 2a2[\y\\2-2a[(x.r) + oT(y).Sec.4) = cos2 0 + sin2 9 = 1.23.2a (T(x + ay) .y) . . and hence T is linear.1 1. T be an orthogonal operator on R . an understanding of rigid motions requires a characterization of orthogonal operators.18(c). We postpone the case of orthogonal operators on more general spaces to Section 6. . such that T(ei) = (cos9.(. To prove uniqueness.sin9). „ „. Q and hence the translation is unique.T(x)\\2 + a2\\T(y)\\2 . This equation.cos0) r .5 Unitary and Orthogonal Operators and Their Matrices 387 = ||T(ar + ay) . there is a unique angle 9. _ . (e.

Now suppose that T(e 2 ) = (sin0..}. Example 7 Let \/5 -1 75 A = 2 We showjthat LA is the reflection of R2 about a line L through the origin. Clearly AA* = A* A = I. it suffices to find an eigenvector of L4 corresponding to 1.s i n 2 0 = .20.. — cos0). \/5 — 1).22 and 6. and therefore A is an orthogonal matrix. we consider the quadratic equation ax2 + 2bxy + ay2 + dx + ey + f = 0.l . and then describe L. Hence LA is an orthogonal operator. | Combining Theorems 6. L is the line through the origin with slope (\/5 — l)/2. 6 Inner Product Spaces cos 9 sin 0" sin0 — cos0 Comparing this matrix to the matrix A of Example 5. Furthermore.1 . (2) '5.c o s 2 0 .388 Chap.23.= . and hence is the line with the equation y= Conic Sections As an application of Theorem 6. we obtain the following characterization of rigid motions on R2. Any rigid motion on R2 is either a rotation followed by a translation or a reflection about a line through the origin followed by a translation. Since L is the one-dimensional eigenspace corresponding to the eigenvalue I of L4.. Alternatively. One such vector is v = (2. Corollary.23. 1 4 det(>4 = .4 is a reflection of R2 about a line L through the origin by Theorem 6. Furthermore. Then A = det(A) = . Thus L is the span of {?. we see that T is the reflection of R2 about a line L.1 . 5 5 and thus L. so that a = 0/2 is the angle from the positive x-axis to L.
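Theorem 6.23 gives a practical test: an orthogonal operator on R² is a rotation exactly when its determinant is 1, and a reflection about a line through the origin exactly when its determinant is −1. Below is a minimal NumPy sketch of this test (illustrative only; the helper name classify_orthogonal is ours, not the text's), applied to a rotation matrix and a reflection matrix.

```python
import numpy as np

def classify_orthogonal(A, tol=1e-10):
    """Classify a 2 x 2 orthogonal matrix as a rotation or a reflection (Theorem 6.23)."""
    assert np.allclose(A @ A.T, np.eye(2), atol=tol), "A must be orthogonal"
    return "rotation" if np.linalg.det(A) > 0 else "reflection"

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[np.cos(theta),  np.sin(theta)],
                       [np.sin(theta), -np.cos(theta)]])

print(classify_orthogonal(rotation))     # rotation    (det = 1)
print(classify_orthogonal(reflection))   # reflection  (det = -1)
```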

then our » — equation simplifies to ' . Then XlAX = (PX')tA(PXl) = X'f(PtAP)X' = X''DX' = Xx{x')2 XiilJ >\2 Thus the transformation (x. we obtain the various conic sections. which describes a circle with radius A/3 and center at (—1..y) —* (x'. For example. If we consider the transformation of coordinates (x. since P is orthogonal. and / = —1. The remaining conic sections. equivalently. parabola. are obtained by other choices of the coefficients.y'). To accomplish this. where x' = x + 1 and y' = y + 2. namely. by Theorem 6.r2 + Axy + 6y may be written as The fact that A is symmetric is crucial in our discussion. and hence in (2). We now concentrate solely on the elimination of the xy-term.r-t/-coordinate system. form 3. If / = 0. b = d = e = 0. we obtain the circle x 4.y2 = 1 with center at the origin.X). For. we may interchange the columns .y) — (x'.-2) in the . the ellipse. we may choose an orthogonal matrix P and a diagonal matrix D with real diagonal entries Ai and A2 such that P'AP = D. Furthermore.20. This change of variable allows us to eliminate the x.(y + 2)2 = 3. we have by Theorem 6. by PX' = PT>fX = X.8. and hyperbola. if a = c = 1. For example. Quadratic forms are studied in more generality in Section 6. If det(P) = —1.23 (with T = Lp) that det(P) = ±1.Sec. the quadratic then (3) may be written as X'AX = (AX.y') allows us to eliminate the xy-term in (3).5 Unitary and Orthogonal Operators and Their Matrices 389 For special choices of the coefficients in (2).l\2 IV'\2 — 3. we consider the expression ax2 + 2bxy + cy2. then it is easy to graph the > equation by the method of completing the square because the xy-term is absent. Now define X' = by X' = P'X or..and y-terms. the equation x2 + 2x + y2 + Ay + 2 — 0 may be rewritten as (x + I) 2 4. If we let A = a b b e and X— For example. (3) which is called the associated quadratic form of (2). 6.

So. Axy I 5//2. we can take Q for our new /'./•')-' and (//')• are the eigenvalues of A = a b b e This result is a.1.Vv/57 /2_\ . 6 Inner Product Spaces of P to obtain a matrix Q..22 (with T = L/-). if det(P) 1. consider the quadratic equation 2x2 \xy + ryy2 . we obtain the quadratic surfaces the elliptic cone./•//-term in (2) may be eliminated by a rotation of the x-axis and //-axis to new axes x' and //' given by X PX'.•' follows that matrix P represents a rotation. by special choices of the coefficients. in the case // . For example. 4 2 -9 -2 n so that the eigenvalues of A are 1 and 0 with associated eigenvectors As expected (from Theorem 6. are easily extended to quadratic (•(Illations in n variables. etc. these vectors are orthogonal. By Lemma4 to Theorem 6. of course. Because the columns of / ' form an orthonormal basis of eigenvectors of . the coefficients of (. As an illustration of the preceding transformation. the . we may always choose P so that det(P) = I. Furl hermore. the same is true of the columns of Q.390 Chap. Q'AQA2 0 0 A. the ellipsoid! the hyperbolic paraboloid.15(d) p. In summary.3. The corresponding orthonormal basis of eigenvectors r i < . In the notation for which the associated quadratic form is 2x we have been using. restatement of a result known as t he principal axis theort m for R2. 371). where P is an orthogonal matrix and del (P) — 1. Notice that de\(Q) -det(P) = 1.3 6 = 0. The arguments above. consequently. Therefore.

4. respectively. (See Figure 6.V + 4"'V5 y5 we have the new quadratic form (x') I (it//')2.and //'-axes in the directions of the first and second vectors ol 4. Thus the change of variable X PX' can be accomplished by this rotation of the X.5 Unitary and Orthogonal Operators and Their Matrices determines new axes x' and y' as in Figure 6.1.7 = * \/-> v I . There is another possibility . Hence if (2 /' 1 V\ 5 -1\ 2 v'5 / 2 5V1 V -I 2 391 then P'AP I'nder the transformation X / ' A ' or 2 .4 an ellipse.and //-axes. -7?2/ v-> 0 (i .Sec.) Note thai the preceding matrix / ' has the form cos 9 sin0 sin 0 cos0 where 9 = cos' 1 —= ~ 2(i./•')"' M>(//)2 = 36 relative to a new coordinate system with the x'. 6. •'• . Thus the original equation 2x2 — Axy \ by2 36 may be written in the form (. It is clear I hat this equation represents Figure 6.(i . So /' is the matrix representation of a rotal ion v5 of R2 through the angle 0.

find an orthogonal or unitary matrix /' and a diagonal matrix D such thai P"AP = D. II the eigenvector of A corresponding to the eigenvalue 6 is taken to be (I. Even. then we obtain the matrix /J_ -2 V. which is the matrix representation of a rotation through the angle 9 9 63. (i) A linear operator may preserve the norm. for each of the following matrices . If T is an orthogonal operator on V. Label the following statements as true or false. I.and //'-axes. This possibility produces the same ellipse as the VO one in figure 0. 3-3/ 5 W (d) / 0 2 2\ 2 (l 2 \2 2 0/ (e) 3.orthogonal operator is diagonalizable. then [T j is an orthogonal matrix for any ordered basis / for V. Assume that the underlying inner product spaces are finite-dimensional. 6 Inner Product Spaces for P. but not the inner product. t hen t hey are also similar. (0) If all the eigenvalues of a linear operator are I.1. 2) instead of ( 1.4°. 2 I) I) 2 I I' 12 1 VI I 2 (C) 3-3. A matrix is unitary if and only if it is invertible.392 Chap. The sum ol unitary matrices is unitary. EXERCISES 1. (a) (b) (Q) (d) (e) (f) (g) Every unitary operator is normal. however. If two mat rices are unit arily equivalent. but interchanges the names of the x'. and the eigenvalues are interchanged.2).75 _2_\ 1 75. 2. I . then the operator must be unitary or orthogonal. Prove that the composite ol unitary orthogonal operators is unitary ort hogonalb . The adjoint ol a unitary operator is unitary.

Let T be a self-adjoint linear operator on a finite-dimensional inner product space. there exists a unitary operator U such that T = U . Prove that if T is a.<?)= / f(t)gjt)dt. unitary operator on a.r(/l) = ] T X i i-i where the A.|| for all x in some orthonormal basis for V. Prove that (T + /'I)(T — y'l)~' is unitary using Exercise 10 of Section 6. or unitary. Prove that .1] with the inner product </.5 Unitary and Orthogonal Operators and Their Matrices 393 4.'s are the (not necessarily distinct) eigenvalues of A./'. Let U be a linear operator on a finite-dimensional inner product space V. If ||U(. self-adjoint. Which of the following pairs of matrices are unitarily equivalent? 0 1 1 0 and (b) 0 0 '0 and f 0 and and 6. then T has a unitary square. must U be unitary? Justify your answer with a proof or a counterexample. Let V be the inner product space of complex-valued continuous functions on [0. Let A be an n x /.r)|| — ||. Prove that T is a unitary operator if and only if \h(t)\ — I for 0 < t < 1./. 8. 9.4. 10. and tr(A*A) = ^ |A.(</) = 2U. For z 6 C. define T~: C — C by T-. Characterize those z for > which T a is normal. real symmetric or complex normal matrix. . Let h C V. and define T: V — V by T ( / ) = /. 6. 7. root: that is. 5.Sec.|2.. finite-dimensional inner product space V.

Observe that W need not be U-invariant.r) 0 for all x G W . V W :AN 1 .y) for all x. (See t he definit ions in the exercises in Section (i. Suppose that . Prove the following results. (U(:r). 14. 352) and the exercises of Section 1. (b) {U(/']). Show that "is unitarily equivalent to" is an equivalence relation on Mnxn(C).7 (p. Let U be a unitary operator on an inner product space V. where the A. 6 Inner Product Spaces 11.U(r. self-adjoint unitary operator. Prove that if A and 11 are unitarily equivalent matrices.is not U-invariant.4 and 11 are diagonalizable matrices. r-> Uk} is an orthonormal basis for W. 1.r|| for all x C W and U(. | . Let V be a finite-dimensional inner product space. Define U : V -V by U(e. (a) . Let A be an n x n real symmetric or complex normal matrix. G W and v2 G W .) 15..U(//)) = (x. .1.y e W. 12. then A is positive definite [semidefinite] if and only if II is positive definite semidefinite]. 17. Hint: Use Exercise 20 of Section 6. By Theorem 0. 18.] ] A . 20. ^). A linear operator U on V is called a partial isometry if there exists a subspace W of V such that |!U(. Prove or disprove that A is similar to II if and only if A and 11 are unitarily equivalent.3. 19. 16. 13. .'s are the (not necessarily distinct) eigenvalues of A.1. ('ontrast (b) with Exercise 16. and let W be a finite-dimensional U-invariant subspace of V.394 Chap. Prove that n (let 1. Suppose that U is such an operator and {v\. I u2) = V\ . Let W be a finite-dimensional subspace of an inner product space V. Find an example of a unitary operator U on an inner product space and a U-invariant subspace W such thai W . Prove that (a) U(W) W: (b) W. Prove that U is a. Prove that a matrix that is both unitary and upper triangular must be a. where e. diagonal mat rix.V2.r)|| = ||.is U-invariant. Find an orthogonal matrix whose first row is ( | .) U(a-)} is an orthonormal basis for R(U).

T(y)) for all x. There are four cases. E I^I2 = E •1. 6.that the matrices 1 2 2\ i) . Let V be a real inner product space.22: If / : V — V is a rigid > motion on a finite-dimensional real inner product space V... (d) Let {wi. 22. Wj}.y) = (x..3-] t. . 21. This exercise is continued in Exercise 9 of Section 6.j = ] (c) Use (b) to show. m. 24. (b) If T is a rotation and U is a reflection about a line through the origin.y G 0. then UT is a rotation. (a) If T and U are both reflections about lines through the origin.. Let A and B be n x n matrices that.and 0 = {U(ci).d I%I2- fi (l 4 1 are not unitarily equivalent. (b) Prove that the composite of any two rigid motions on V is a rigid motion on V.iu-2.. and T = U*. then both UT and TU arc reflections about lines through the origin. (f) U* is a partial isometry. Prove the following variation of Theorem 6. U(e 2 ). (a) Prove that tr(AM) = (b) Use (a) to prove that ti(B*B). Then 0 is an orthonormal basis for V. Use Theorem 6..mm Sec. • • • • Wj} be an orthonormal basis for R(U)1. 23.5 Unitary and Orthogonal Operators and Their Matrices 395 (c) There exists an orthonormal basis 7 for V such that the first k columns of [U]7 form an orthonormal set and the remaining columns are zero.. W].)) = v\ (1 < i < k) and T(v)i) = 0 (1 < i < 3). are unitarily equivalent. (a) Prove that any translation on V is a rigid motion. then there exists a unique orthogonal operator T on V and a unique translation g on V such that f = T o g. Hint: Show that (D(x).6. Then T is well defined.23 to prove the following results. \J(vk). Let T and U be orthogonal operators on R2.. (e) Let T be the linear operator on V that satisfies T(U(t'..

UT is a rotation.1.y' so that the following quadratic forms can be written as Xi(x') -t. respectively. By Exercise 24. show that A--1 U ifk = I Vk\\Uk + ^ (wk.2 for Wk in terms of u&.r> rn be the orthogonal vectors obtained from W\. 27.1). 29.2.r-axis to L2. Let Ui. Prove A = QR. Define /? G M„>„(F) by '•. Let wi.-.X->(y')2 + X%(z')2.0.ry + y2 28. Find new coordinates x'. and (2. //.ll Rjk = { (wk. 26.396 Chap. respectively.y'. (b) Let A and Q denote the n x // mat rices in which the fcth columns are Wk and Uk. Consider the expression X AX.2\r x2 . Suppose that T and U are orthogonal operators on R such that T is the rotation by the angle o and U is the reflection about the line L through the origin.Ylxy .A2(j/7) . (a) (b) (c) (d) (e) x2 + Axy + y2 2x2 + 2xy 4. . (a) (b) bind the angle 9 from the positive rc-axis to L|.r')2 4. and let L'i. respectively./-2 + 2xy + 3//2 x2 .2.'s.W2 a'../•. by the Gram Schmidt process. QR-Factorization.w„ be linearly independent vectors in F". . By Exercise 24. both UT and TU are reflections about lines l_i and L2.r-axis to L. (2. 6 Inner Product Spaces 25.Ay2 3.4 is as defined in Exercise 2(e). Find its angle of rotation. where X1 = (.0)..uo "„ be the orthonormal basis obtained by normalizing the r.Uj) 0 ifj=* if j <k ifj>Ar.W2. (a) Solving (1) in Section 6.z' so that the preceding expression is of the form A| (. i)U3 [\<k< n). find a change of coordinates x'. z) and . bind the angle 0 from the positive . Suppose that T and U are reflections of R2 about the respective lines L and L' through the origin and that 0 and v are the angles from the positive x-axis to L and L'.1). Let r be the angle from the positive . through the origin. (c) Compute Q and R as in (b) for the 3 x 3 matrix whose columns are the vectors (1.

Hint: Use Exercise 17.) . Decompose A to QR. Define the Householder operator H„: V — V by H„ ('. J. (b) H„(. we have shown that every invertible matrix is the product of a unitary [orthogonal] matrix and an upper triangular matrix.5 Unitary and Orthogonal Operators and Their Matrices 397 (d) Since Q is unitary [orthogonal] and /? is upper triangular in (b). The following definition is used in Exercises 31 and 32.9^ Sec. Suppose that A G M n x n ( F ) is invertible and A = Q1Ri = Q>Il>. however..2x3 . (Later. this method for solving large systems of linear equations with a computer was being advocated as a better method than Gaussian elimination even though it requires about three times as much work. Let V be a finite-dimensional complex [real] inner product.2ar2 4. (Note: If V is a real inner product space.where Q\-Qz £ M„ x „(/'') are unitary and P\. and let u be a unit vector in V. because of its great stability.I I •''2 + X3 = . 30.2 (x. where Q is unitary and />' is upper triangular. and hence H„ is a unitary [orthogonal] operator on V. Then QRx — b.P. and hence Rx = Q'b. Definition. II.r•> ." then it is nearly as stable as the orthogonalization method.I Xj • 2./•) = x . Wilkinson showed that if Gaussian elimination is done "properly. Prove that D = i?2/?j" is a unitary diagonal matrix. then (3 is orthonormal if and only i f ' is orthonormal. Suppose that 3 and *) are ordered bases for an //-dimensional real [complex] inner product space V. it) 11 for all x C V. 31.r. Use the orthogonalization method and (c) to solve the system . This last system can be easily solved since P is upper triangular.11. sj)<>ce. (a) H„ is linear. Prove the following results. Prove thai if Q is an orthogonal [unitary] ii x ii matrix that changes 7-coordinates into .) 'At one time. 6. H„ is a reflection.r) — X if and only if X is orthogonal to u. G M„ x „(/'') are upper triangular. then in the language of Section 6.^-coordinates. 4. (c) Hu(u) = -u. (e) The QR factorization described in (b) provides an orthogonalization method for solving a linear system Ax — b when A is invertible. (d) H* = H„ and H2 = I.1 . by the Gram Schmidt process or other means. Let H„ be a Householder operator on a finite-dimensional inner product space V.
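Exercises 25 and 30 describe the QR-factorization and its use for solving Ax = b via Rx = Q*b. Here is a minimal sketch (illustrative only, with an arbitrary test matrix) of classical Gram-Schmidt producing Q with orthonormal columns and an upper triangular R; in practice numpy.linalg.qr, or the Householder reflections of Exercise 31, is preferred for numerical stability.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR of a matrix with linearly independent columns."""
    A = np.asarray(A, dtype=float)
    n, k = A.shape
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    for j in range(k):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coefficient of the projection onto Q[:, i]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
Q, R = gram_schmidt_qr(A)
x = np.linalg.solve(R, Q.T @ b)          # solves Ax = b via Rx = Q^t b (Exercise 30(e))
print(np.allclose(A, Q @ R), np.allclose(A @ x, b))
```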

.6 ORTHOGONAL PROJECTIONS AND THE SPECTRAL THEOREM In this sect ion. Let x and y be linearly independent vectors in V such that ||cc|| = \\y\\. however.6 (p. 372) and 0. Let V be a finite-dimensional inner product space over F.. Let V be an inner product space. \\x — 9y\\ (b) If F = R. we see that Wi does not uniquely determine T. we need only assume that one of the preceding conditions holds. By Exercise 20 of Section 2. = b e V : T(x) =x d N(T) = W->.9y) is real. 374) to develop an elegant representation of a normal (if /•' — C) or a self-adjoint (if F = R) operator T on a finite-dimensional inner product space. The special case where V is a direct sum of two subspaces is considered in the exercises of Section 1. So V = R(T) © N(T). We prove that T can be written in the form A| T[ 4.•) — 9y. unit vector u in V and a complex number 0 with |0j — 1 such that H„(. Recall from the exercises of Section 2. if V is finite-dimensional. Definition. ©W 2 = W.r) = x\. for example.W.T/. = Because V . In the notation of Theorem 6.398 Chap. We must first develop some results about these special projections.2. We say that T is an orthogonal projection if R(T) = N(T) and N(T) 1 . arc orthogonal project ions. T is uniquely determined by its range. then a linear operator T on V is t he projection on W.1.3) that T is a.N(T). Thus there is no ambiguity if we refer to T as a "projection on W|" or simply as a "projection. whenever . 6. along W2 if.16 (p./• — X] 4-.1 that if V = W] ©W 2 . we have R ( T ) = W .. Now assume that W is a. For an orthogonal projection T. and set u — r.2. finite-dimensional subspace of an inner product space V. then R(T) . prove that there exists a unit vector // in V such thai r\u(x) = y. are the distinct eigenvalues of T and T | . if R(T) . T_>..3. (a) If F — ('. • W:i does not imply that W2 .W..R(T)' ' = N(T) . projection if and only if T = T 2 .r2. it can be shown (see Exercise 17 of Section 2. 350)." fn fact. with x 1 G Wi and x-z C WL>. We assume that the reader is familiar with the results about direct sums developed at the end of Section 5.. r-Ax — 9y). 6 Inner Product Spaces 32. und let T: V — V be a projection. we can define a function . Hint: Choose 9 so that (x...A2T2 + • • • 1 A/. T/.17 (p. we rely heavily on Theorems 6. we have T(.R(T). where A. prove that there exists a. Note that by Exercise 13(c) of Section 6.i. X> A/.

Let / G H. For if T and U are orthogonal projections on W.5. We call T the orthogonal projection of V on W.5 From Figure 6. recall the inner product space H and the orthonormal set S in Example 9 of Section 6.0 (p. It is easy to show that T is an orthogonal projection on W. and since every projection is uniquely determined by its range and null space. Figure: 6.= N(U). this approximation property characterizes T. perpendicular from v on the line y — .a 2 ) = (<i\. that is. then R(T) — W — R(U). and U is a different projection on W. let V = R2 and W = span{(l. In fact. We show that the best approximation to / by a trigonometric polynomial of degree less than or equal to n is the trigonometric polynomial . Hence N(T) = R(T) J . if ir G W.a\). we have T — U. whereas v . 1)}. Note that V .ijt where c/„ or (/ „ is nonzero. Then T is the orthogonal projection of V on W. Define U and T as in Figure 6. To understand the geometric difference between an arbitrary projection on W and the orthogonal projection on W.Sec. 6. we see that T(v) is the "best approximation in W to y".1. 350).U(» £ W ' . then \\w — v[\ > |JT(r) v\\. Define a trigonometric polynomial of degree // to be a function g G H of the form g(t) X "jfj1 E j=-n .T(v) G W-L. As an application to Fourier analysis. These results follow immediately from the corollary to Theorem 0.R(U)J. We can say even more there exists exactly one orthogonal projection on W.6 Orthogonal Projections and the Spectral Theorem 399 T: V — V by T(/y) — u.5./• and U(ai. where T(v) is the foot of a.

T(y) G N(T).{•''! •!!)) and (T(.T'dl Thus y T(y) = 0: that is.r 2 .T*(//)) = (x.= N(T).r).0) = 0.//) = (x.T(y))) = (y.y 2 G N(T).y) = (T*(.//) = (.T(y . Now V = R(T) © N(T) and R(T) 1 N(T).r).y Since // T(y)).y) for all x.//) = (x-i. and let T be a linear operator on V. Lei y G N(T) ± .r 2 .r). and let T be the orthogonal projection of H on W. let W = span({/.(. // . I sing the preceding results. For any y G V.T(y)) .yi) + (x1. and so (x. y .6 (p. 6 Inner Product Spaces whose coefficients are the Fourier coefficients o f / relative to the orthonormal set S. where X\.: \j\ < //}). But also T(y))) = (y. /• C N(T).T ( y ) ) = (y. Let V be an inner product space. For this result. we need only show that T* exists and T = T*./• C RfT)^.r) = 0.y-T(y)> = (y. Si.r C R(T) and y G N(T). T(y) = y.r. Proof.y2) .N(T)~.T(y) G R(T). Now suppose that T = T = T*.y G V. T h e o r e m 6. We must show that y G R(T). and hence we must show that R(T) . Then T is a projection by Exercise 17 of = Section 2. Then x = T(x) = T*(. Then T is an orthogonal projection if and only if T has an adjoint T* and T2 — T = T".<•'•]•</]} • So (. thus T* exists and T = T*. Hence (•'•-T(//)) .T(//)) {T(x). Hence R(T). the first term must equal zero. Now l|y-T(»)||2 = <y-T(y).T(. Let . we have (T(.r 400 Chap.ri 4.y>. we have R(T) ' N(T) 1 . from which it follows that R(T) C N(T) X .!)•>} = (xi.3.y\ G R(T) and . Hence R(T) .//i) = ( l i .y-T(y))-(T(y). and thus. I . Let x. Suppose that T is an orthogonal projection. 350) tells us that the best approximation to / by a function in W is T(/)= £ (fjj)fj An algebraic characterization of orthogonal projections follows in the next theorem. that is. Since T 2 = T because T is a projection. 0) = 0.y ( V.N(T) ' and R(T)' = N(T).yi 4./•). Then we can write x = X\ 4 x-2 and y = yx 4. Therefore x G N(T) 1 .r. Now suppose that .(x.24.2. y i ) + {•f2-!l\} . The corollary to Theorem 6. ( T ( y ) .J(y)) = 0.D N(T) by Exercise 13(b) of Section 0.
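Theorem 6.24 characterizes an orthogonal projection by T = T² = T*. The sketch below (illustrative only; the subspace is an arbitrary example) builds the matrix of the orthogonal projection of R³ on a two-dimensional subspace W from an orthonormal basis of W, then checks that it is idempotent and self-adjoint and that T(v) is the best approximation to v in W.

```python
import numpy as np

# W = span{(1, 0, 1), (0, 1, 0)} in R^3 (an arbitrary example subspace).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q, _ = np.linalg.qr(B)            # orthonormal basis for W as columns of Q
P = Q @ Q.T                       # matrix of the orthogonal projection of R^3 on W

print(np.allclose(P @ P, P))      # P^2 = P
print(np.allclose(P, P.T))        # P = P*, so P is an orthogonal projection (Theorem 6.24)

v = np.array([1.0, 2.0, 3.0])
w = Q @ np.random.default_rng(1).standard_normal(2)          # an arbitrary vector in W
print(np.linalg.norm(w - v) >= np.linalg.norm(P @ v - v))    # best approximation in W
```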

Let V be a finite-dimensional inner product space, let W be a subspace of V, and let T be the orthogonal projection of V on W. We may choose an orthonormal basis β = {v₁, v₂, ..., vₙ} for V such that {v₁, v₂, ..., v_k} is a basis for W. Then [T]_β is a diagonal matrix with ones as the first k diagonal entries and zeros elsewhere; that is, [T]_β has the form

( I_k  O₁ )
( O₂   O₃ ).

If U is any projection on W, we may choose a basis γ for V such that [U]_γ has the form above; however, γ is not necessarily orthonormal.

We are now ready for the principal theorem of this section.

Theorem 6.25 (The Spectral Theorem). Suppose that T is a linear operator on a finite-dimensional inner product space V over F with the distinct eigenvalues λ₁, λ₂, ..., λ_k. Assume that T is normal if F = C and that T is self-adjoint if F = R. For each i (1 ≤ i ≤ k), let Wᵢ be the eigenspace of T corresponding to the eigenvalue λᵢ, and let Tᵢ be the orthogonal projection of V on Wᵢ. Then the following statements are true.
(a) V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ W_k.
(b) If Wᵢ′ denotes the direct sum of the subspaces Wⱼ for j ≠ i, then Wᵢ^⊥ = Wᵢ′.
(c) TᵢTⱼ = δᵢⱼTᵢ for 1 ≤ i, j ≤ k.
(d) I = T₁ + T₂ + ⋯ + T_k.
(e) T = λ₁T₁ + λ₂T₂ + ⋯ + λ_kT_k.

Proof. (a) By Theorems 6.16 (p. 372) and 6.17 (p. 374), T is diagonalizable; so

V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ W_k

by Theorem 5.11 (p. 278).

(b) If x ∈ Wᵢ and y ∈ Wⱼ for some i ≠ j, then ⟨x, y⟩ = 0 by Theorem 6.15(d) (p. 371). It follows easily from this result that Wᵢ′ ⊆ Wᵢ^⊥. From (a), we have

dim(Wᵢ′) = Σ_{j≠i} dim(Wⱼ) = dim(V) − dim(Wᵢ).

On the other hand, we have dim(Wᵢ^⊥) = dim(V) − dim(Wᵢ) by Theorem 6.7(c) (p. 352). Hence Wᵢ′ = Wᵢ^⊥, proving (b).

(c) The proof of (c) is left as an exercise.

(d) Since Tᵢ is the orthogonal projection of V on Wᵢ, it follows from (b) that N(Tᵢ) = R(Tᵢ)^⊥ = Wᵢ^⊥ = Wᵢ′. Hence, for x ∈ V, we have x = x₁ + x₂ + ⋯ + x_k, where Tᵢ(x) = xᵢ ∈ Wᵢ, proving (d).

(e) For x ∈ V, write x = x₁ + x₂ + ⋯ + x_k, where xᵢ ∈ Wᵢ. Then

T(x) = T(x₁) + T(x₂) + ⋯ + T(x_k) = λ₁x₁ + λ₂x₂ + ⋯ + λ_kx_k
     = λ₁T₁(x) + λ₂T₂(x) + ⋯ + λ_kT_k(x)
     = (λ₁T₁ + λ₂T₂ + ⋯ + λ_kT_k)(x).

The set {λ₁, λ₂, ..., λ_k} of eigenvalues of T is called the spectrum of T, the sum I = T₁ + T₂ + ⋯ + T_k in (d) is called the resolution of the identity operator induced by T, and the sum T = λ₁T₁ + λ₂T₂ + ⋯ + λ_kT_k in (e) is called the spectral decomposition of T. The spectral decomposition of T is unique up to the order of its eigenvalues.

With the preceding notation, let β be the union of orthonormal bases of the Wᵢ's and let mᵢ = dim(Wᵢ). (Thus mᵢ is the multiplicity of λᵢ.) Then [T]_β has the form

( λ₁I_{m₁}     O        ⋯      O      )
(    O      λ₂I_{m₂}    ⋯      O      )
(    ⋮         ⋮        ⋱      ⋮      )
(    O         O        ⋯   λ_kI_{m_k} );

that is, [T]_β is a diagonal matrix in which the diagonal entries are the eigenvalues λᵢ of T, and each λᵢ is repeated mᵢ times.

If λ₁T₁ + λ₂T₂ + ⋯ + λ_kT_k is the spectral decomposition of T, then it follows (from Exercise 7) that g(T) = g(λ₁)T₁ + g(λ₂)T₂ + ⋯ + g(λ_k)T_k for any polynomial g. This fact is used below.

We now list several interesting corollaries of the spectral theorem; many more results are found in the exercises. For what follows, we assume that T is a linear operator on a finite-dimensional inner product space V over F.

Corollary 1. If F = C, then T is normal if and only if T* = g(T) for some polynomial g.

Proof. Suppose first that T is normal. Let T = λ₁T₁ + λ₂T₂ + ⋯ + λ_kT_k be the spectral decomposition of T. Taking the adjoint of both sides of the preceding equation and using the fact that each Tᵢ is self-adjoint, we have

T* = λ̄₁T₁ + λ̄₂T₂ + ⋯ + λ̄_kT_k.

Using the Lagrange interpolation formula (see page 52), we may choose a polynomial g such that g(λᵢ) = λ̄ᵢ for 1 ≤ i ≤ k. Then

g(T) = g(λ₁)T₁ + g(λ₂)T₂ + ⋯ + g(λ_k)T_k = λ̄₁T₁ + λ̄₂T₂ + ⋯ + λ̄_kT_k = T*.

Conversely, if T* = g(T) for some polynomial g, then T commutes with T* since T commutes with every polynomial in T. So T is normal.
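The spectral decomposition is easy to compute for a concrete self-adjoint matrix. The NumPy sketch below (illustrative, not part of the text; the matrix is an arbitrary symmetric example) groups the eigenvectors by eigenvalue, forms the orthogonal projection Tᵢ onto each eigenspace, and verifies the resolution of the identity I = T₁ + ⋯ + T_k and the decomposition A = λ₁T₁ + ⋯ + λ_kT_k.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])           # self-adjoint, so the spectral theorem applies

eigenvalues, V = np.linalg.eigh(A)        # orthonormal eigenvectors as columns of V

# Group columns of V by (rounded) eigenvalue; each group gives one orthogonal projection.
projections = {}
for lam, v in zip(np.round(eigenvalues, 8), V.T):
    projections.setdefault(lam, np.zeros_like(A))
    projections[lam] += np.outer(v, v)

I_sum = sum(projections.values())
A_sum = sum(lam * P for lam, P in projections.items())
print(np.allclose(I_sum, np.eye(3)))      # resolution of the identity
print(np.allclose(A_sum, A))              # spectral decomposition of A
print(sorted(projections))                # the spectrum {2.0, 8.0}
```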

Label the following statements as true or false. The converse has been proved in the lemma to Theorem 6. Suppose that every eigenvalue of T is real.• —f. Let T he as in the spectral theorem with spectral decomposition T = AiTi + A2T2 + • • • + AfcTfc.6 Orthogonal Projections and the Spectral Theorem 403 Corollary 2. If T is unitary.XkTk be the spectral decomposition of T. Proof Choose a polynomial gj (1 < j < k) such that #?(Aj) = 6^. 1 Corollary 3. Assume that the underlying inner product spaces are finite-dimensional. Let T = AiTi + A2T2 -fh A^T*. be the spectral decomposition of T.Sec.18 (p. 374). (b) An orthogonal projection is uniquely determined by its range. Then T* = AiTi + A2T2 + . If |A| = 1 for every eigenvalue A of T. Then each Tj is a polynomial in T.• • + AfcTfc = A. I EXERCISES 1.Ti + A2T2 + • • • + XkTk = T. (c) Every self-adjoint operator is a linear combination of orthogonal projections. Proof. Then 9j(V = Pi(Ai)Ti + gj{X2)T2 + ••• + gj{Xk)Tk — 5\jT\ + S2jT2 -\ h SkjTk = Tj. Corollary 4. If F = C.+ |Afc| Tfc = Tj + T 2 + • • • + Tfc 2 2 2 A*Tfc) Hence T is unitary. then T is unitary if and only if T is normal and |A| = 1 for every eigenvalue A ofT. then T is self-adjoint if and only if every eigenvalue of T is real. Let T = Ai T x + A2T2 -I. If F = C and T is normal. TT* = (AiTi + A2T2 + • • • + AfcTfcXAiTi + A 2 T 2 = |A 1 | T 1 + |A 2 | T 2 + --. Proof. then by (c) of the spectral theorem. 6. then T is normal and every eigenvalue of T has absolute value 1 by Corollary 2 to Theorem 6.17 (p. . 382). (a) All projections are self-adjoint.

Prove that if T is a projection. Let T be a normal operator on a finite-dimensional complex inner product space V. Let W be a finite-dimensional subspace of an inner product space V. then T is also an orthogonal projection. (e) T is invertible if and only if Aj ^ 0 for 1 < i < k. Let T be a normal operator on a finite-dimensional inner product space. . Let V = R2. Compute [T]^. (a) If g is a polynomial. 4.5: (1) Verify that LA possesses a spectral decomposition. Give an example of a projection for which this inequality does not hold. What can be concluded about a projection for which the inequality is actually an equality for all x 6 V? (b) Suppose that T is a projection such that ||T(. (f) T is a projection if and only if every eigenvalue of T is 1 or 0. 7. 6. then I — T is the orthogonal projection of V on W -1 . explicitly define the orthogonal projection on the corresponding eigenspace. prove that ||T(a:)|| < ||rr|| for all x e V. Do the same for V = R3 and W = span({(l. For each of the matrices A in Exercise 2 of Section 6. (a) If T is an orthogonal projection. Then U commutes with T if and only if U commutes with each Tj.404 Chap. 5. 2 (d) There exists a normal operator U on V such that U = T. 0. (3) Verify your results using the spectral theorem. 2. Let T be a linear operator on a finite-dimensional inner product space V.2)}). then T = To(c) Let U be a linear operator on V. (e) Every orthogonal projection is a unitary operator. then T(x) is the vector in W that is closest to x. Prove that T is an orthogonal projection. Use the spectral decomposition AiTi + A2T2 + • —h XkTk of T to prove the following results. where T is the orthogonal projection of V on W. then SCO = £ > ( A i ) T < z=i (b) If T" = To for some n.x)|| < ||x|| for x G V. (2) For each eigenvalue of LA. and /? be the standard ordered basis for V.1)}). 3. Show that if T is the orthogonal projection of V on W. 6 Inner Product Spaces (d) If T is a projection on W. W = span({(l.

Prove that there exists an orthonormal basis for V consisting of vectors that are eigenvectors of both T and U.7* THE SINGULAR VALUE DECOMPOSITION AND THE PSEUDOINVERSE In Section 6. p. y).Sec. Another difference is that the numerical invariants in the singular value theorem. Thus any operator T for which (T(x). we characterized normal operators on complex spaces and selfadjoint operators on real spaces in terms of orthonormal bases of eigenvectors and their corresponding eigenvalues (Theorems 6. prove the following facts about a partial isometry U. Prove (c) of the spectral theorem. 6. 9. be a normal or self-adjoint operator. 11. In this section.y). If the two spaces and the two bases are identical. 10.4. in contrast to their counterparts. Let U and T be normal operators on a finite-dimensional complex inner product space V such that TU = UT.26). we establish a comparable theorem whose scope is the entire class of linear transformations on both complex and real finite-dimensional inner product spaces—the singular value theorem for linear transformations (Theorem 6. There are similarities and differences among these theorems. the singular values. Use Corollary 1 of the spectral theorem to show that if T is a normal operator on a complex finite-dimensional inner product space and U is a linear operator that commutes with T. are nonnegative. the eigenvalues. 374). 6. (a) U*U is an orthogonal projection on W. The singular value theorem encompasses both real and complex spaces. and 6.17.5. for all x and y is called unitary for the purposes of this section. Hint: Use the hint of Exercise 14 of Section 6. Ay) = (x. (b) UU*U = U.7 The Singular Value Decomposition and the Pseudoinverse (g) T = —T* if and only if every Aj is an imaginary number.4 along with Exercise 8. 405 8. then the transformation would. Simultaneous diagonalization. in fact. in this section we use the terms unitary operator and unitarymatrix to include orthogonal operators and orthogonal matrices in the context of real spaces. because of its general scope. All rely on the use of orthonormal bases and numerical invariants. p. the singular value theorem is concerned with two (usually distinct) inner product spaces and with two (usually distinct) orthonormal bases. or any matrix A for which (Ax. . However. then U commutes with T*.16.T(y)) = {x. 372. This property is necessary to guarantee the uniqueness of singular values. Referring to Exercise 20 of Section 6. For brevity. for which there is no such restriction.

um} for W and positive scalars o~\ > <r2 > • • • > or such that T{Vi) = crjUj if 1 < i < r ifi > r.4 and 15(d) of Section 6.. Then T(Vj)) ur} (JiO (Vi.26 (Singular Value Theorem for Linear Transformations). For 1 < i < r. the adjoint T* of T is a linear transformation from W to V and [T*]^ = ([T]!)*. Then there exist orthonormal > bases {v\. hence there is an orthonormal basis {v\..v2. . v„} for V consisting of eigenvectors of T*T with corresponding eigenvalues Aj. .. By this exercise. and Aj = 0 for i > r.j (ui.. respectively. T is an orthonormal subset of W. By Exercises 18 of Section 6. where Ai > A2 > • • • > Ar > 0. .. where 3 and 7 are orthonormal bases for V and W. 6 Inner Product Spaces In Exercise 15 of Section 6.o~r arc uniquely determined by T. Suppose I <i. 0 (4) Conversely.vn} for V and {u\.3. and let T: V — W be a linear transformation of rank r. We first establish the existence of the bases and scalars. we begin with the principal result.Uj) = ( .u2.3.. • • •.. Proof.. Furthermore.T(vi).406 Chap.. T*T is a positive semidefinite linear operator of rank r on V. where V and W arc • finite-dimensional inner product spaces.o2.o I"} = Sij. the definition of the adjoint of an operator is extended to any linear transformation T: V — W. suppose that the preceding conditions are satisfied.. the linear operator T*T on V is positive semidefinite and rank(T*T) — rank(T) by Exercise 18 of Section 6.Vj) o.4. Vi is an eigenvector of T*T with corresponding eigenvalue of if 1 < i < r and 0 if i > r. With these facts in mind. . v2. define < j = y/Xi and Ui = —T(vA We show that {u{..\Oi Oj < r. Let V and W be finite-dimensional inner product spaces. Therefore the scalars o\_.. u2. Then for 1 < i < n.. Theorem 6.

the orthonormal bases for V and W given in Theorem 6. o~iVi if i = j < r 0 otherwise..Vj) = (ui.. and so T(vi) = 0 by Exercise 15(d) of Section 6. then the term singular value is extended to include or+i = • • • = ok = 0.u2.. Although the singular values of a linear transformation T are uniquely determined by T. If i > r..7(a) (p.. and o~\ > a2 > • • • > or > 0 satisfy the properties stated in the first part of the theorem.. (T*(ui). To establish uniqueness. {u\. u2.26 are called the singular values of T.. By Theorem 6.T(vj)) ai 0 and hence for any 1 < i < m... Clearly T(vi) = GiUi if 1 < i < r. (5) if i — j < r otherwise. . this set extends to an orthonormal basis {u\. The unique scalars o\.Sec..v2.26 are simply reversed for T*.. Furthermore.. So for i < r.o2. I Definition.um}. ur. 6. If r is less than both m and n. suppose that {v\.. Then for 1 < i < m and 1 < j < n.. um} for W.3.ur} is orthonormal. the orthonormal bases given in the statement of Theorem 6. . the singular values of a linear transformation T: V —* W and its adjoint T* are identical..vn}. . 352)..u2.. Example 1 Let P2(R) and Pi(-R) be the polynomial spaces with inner products defined by (f(x).. T*T(vi) = T*(cTjWj) = <7jT*(uj) = afvi and T*T(VJ) = T*(0) = 0 for i > r.. where k is the minimum of m and n. Therefore each Vi is an eigenvector of T*T with corresponding eigenvalue of if i < r and 0 if i > r.26 are not uniquely determined because there is more than one orthonormal basis of eigenvectors of T*T..g(x))= / -l f{t)g{t)dt. In view of (5). then T*T(^j) = 0.or in Theorem 6.7 The Singular Value Decomposition and the Pseudoinverse 407 and hence {ui...

let ui = \ / g ( 3 ^ .0. respectively. and V-A — —=. and Pi(/?).) = OjUi for i = 1. Let A = m..3) = 0.^3} is an orthonormal basis for P2(R) consisting of eigenvectors of T*T with corresponding eigenvalues Ai. These eigenvalues correspond. to the orthonormal eigenvectors e 3 = (0.l ) .1.x o~\ V 2 and u2 = —T{v2) = -7=.2 and T(?. (See Exercise 15 of Section 6.v3] for P2(R) and 7 = {u\. and A3.) For this purpose. Now set °~i — \ A 7 = N/15 and a2 =• \[X2 — v3.^2. e 2 = (0.1). = (1. we translate this problem into the corresponding problem for a matrix representation of T. = 0 0 \/3 0 0 v 7 !^ Then which has eigenvalues (listed in descending order of size) Ai = 15. v2. respectively. 6 Inner Product Spaces Let T: P 2 (/?) —• P\(P) be the linear transformation defined by T(f(x)) = f'{x). and A3 = 0.0) in R3.2 to obtain orthonormal bases for P2(7?) and P^ft). Caution is advised here because not any matrix representation will do. A2. cr2 s/2 for Pi(/?). we use the results of Exercise 21(a) of Section 6. and take Mi = — T(i'i) = \ . P 2 (i?). Translating everything into the context of T. and c. where o\ > a2 > 0 are the nonzero singular values of T. to obtain the required basis 7 = {ui. the nonzero singular values of T. Find orthonormal bases li = {v\. we must use a matrix representation constructed from orthonormal bases for P2(li) and P\(R) to guarantee that the adjoint of the matrix representation of T is the same as the matrix representation of the adjoint of T.u2} .0. Since the adjoint is defined in terms of inner products.3. A2 = 3. To facilitate the computations.0). v/2 Then (3 — {^1. u2} for Pi(H) such that T(i'.408 Chap.

the unit circle in R2. We characterize S' in terms of an equation relating x[ and x2. we have x\ = x\a\ and x'2 = x2a2.1vi + x2v2) = xiT(t. This is illustrated in the next example.v2} and (3 = {ui.Sec. the equation u = T(v) means that u = T(a. if u = x\u\ + x'2u2.:r2T('<j2) = XiOi'ti\ + x2o2u2. this is the equation of an ellipse with major axis and minor axis oriented along the rc'-axis and the y'-axis. relative to 0. (See Figure 6.7 The Singular Value Decomposition and the Pseudoinverse 409 We can use singular values to describe how a figure is distorted by a linear transformation. Since T is invertible. where the x'-axis contains u\ and the ?/-axis contains u2. For any vector v = xi^i + x2v2 G R 2 . Example 2 Let T be an invertible linear operator on R2 and S'={a:GR 2 :||3.6.) • T V — X\V\ + X1V2 U = X1U1 + X2U2 Figure 6. it has rank equal to 2 and hence has singular values o\ > <r2 > 0.|| = l}. which we shall call the aj'^/'-coordinate system for R .u2} be orthonormal bases for R2 so that T(i>i) = o~iUi and T(f 2 ) = a2u2. We apply Theorem 6. respectively. Then /3 determines a coordinate system.6 . then [u]^ = ( . is the coordinate vector of u X. For any vector u € R2. u € S' if and only if v € S if and only if i^=xl+x22 = l.i) -f. 6. Let {vi. and if a\ > o2. Thus for u = x'xu\ + x'2u2. as in Theorem 6. this is the equation of a circle of radius o\.26 to describe 5 ' = T(5). Furthermore. If o\ = o2.26.
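Example 2 says that the image of the unit circle under an invertible operator on R² is an ellipse whose semi-axes have lengths σ₁ and σ₂. The sketch below (an illustration with an arbitrary invertible matrix, not one taken from the text) checks this by comparing the extreme values of ||Av|| over unit vectors v with the singular values of A.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                     # an arbitrary invertible operator on R^2

t = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack((np.cos(t), np.sin(t)))     # points of S, the unit circle
image = A @ circle                             # points of S' = T(S)
lengths = np.linalg.norm(image, axis=0)

sigma = np.linalg.svd(A, compute_uv=False)     # singular values sigma_1 >= sigma_2
print(np.round([lengths.max(), lengths.min()], 3))
print(np.round(sigma, 3))                      # approximately the semi-axes of the ellipse
```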

The singular value theorem for linear transformations is useful in its matrix form because we can perform numerical computations on matrices. We begin with the definition of the singular values of a matrix.

Definition. Let A be an $m \times n$ matrix. We define the singular values of A to be the singular values of the linear transformation $L_A$.

Theorem 6.27 (Singular Value Decomposition Theorem for Matrices). Let A be an $m \times n$ matrix of rank r with positive singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, and let $\Sigma$ be the $m \times n$ matrix defined by
$$\Sigma_{ij} = \begin{cases} \sigma_i & \text{if } i = j \le r \\ 0 & \text{otherwise.} \end{cases}$$
Then there exists an $m \times m$ unitary matrix U and an $n \times n$ unitary matrix V such that $A = U\Sigma V^*$.

Proof. Let $T = L_A\colon F^n \to F^m$. By Theorem 6.26, there exist orthonormal bases $\beta = \{v_1, v_2, \dots, v_n\}$ for $F^n$ and $\gamma = \{u_1, u_2, \dots, u_m\}$ for $F^m$ such that $T(v_i) = \sigma_i u_i$ for $1 \le i \le r$ and $T(v_i) = 0$ for $i > r$. Let U be the $m \times m$ matrix whose jth column is $u_j$ for all j, and let V be the $n \times n$ matrix whose jth column is $v_j$ for all j. Note that both U and V are unitary matrices. By Theorem 2.13(a) and (b) (p. 90), the jth column of AV is $Av_j = \sigma_j u_j$. Observe that the jth column of $\Sigma$ is $\sigma_j e_j$, where $e_j$ is the jth standard vector of $F^m$. So by Theorem 2.13(a) and (b), the jth column of $U\Sigma$ is $U(\sigma_j e_j) = \sigma_j U(e_j) = \sigma_j u_j$. It follows that AV and $U\Sigma$ are $m \times n$ matrices whose corresponding columns are equal, and hence $AV = U\Sigma$. Therefore $A = AVV^* = U\Sigma V^*$. $\blacksquare$

Definition. Let A be an $m \times n$ matrix of rank r with the positive singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. A factorization $A = U\Sigma V^*$, where U and V are unitary matrices and $\Sigma$ is the $m \times n$ matrix defined as in Theorem 6.27, is called a singular value decomposition of A.

In the proof of Theorem 6.27, the columns of V are the vectors in $\beta$, and the columns of U are the vectors in $\gamma$. Furthermore, the nonzero singular values of A are the same as those of $L_A$; hence they are the square roots of the nonzero eigenvalues of $A^*A$ or of $AA^*$. (See Exercise 9.)
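Numerical libraries compute the factorization of Theorem 6.27 directly. The following sketch is an illustrative aside assuming NumPy (the matrix is arbitrary): it recovers $U$, $\Sigma$, and $V^*$ and verifies that $A = U\Sigma V^*$ with U and V unitary.

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [2.0, 0.0,  1.0]])        # an arbitrary m x n example

U, s, Vh = np.linalg.svd(A)             # U: m x m, Vh = V*, s: singular values

Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)    # embed the sigmas in an m x n matrix

print(np.allclose(A, U @ Sigma @ Vh))                        # True: A = U Sigma V*
print(np.allclose(U.conj().T @ U, np.eye(U.shape[0])))       # U is unitary
print(np.allclose(Vh @ Vh.conj().T, np.eye(Vh.shape[0])))    # V is unitary
```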

to obtain the 1 E = and J_ 1 \ \/2 V^ -1 1 V2 V6 . and A2 = A3 = 0.27.j U3 = _1_ 7e 1 1 -1 -1 411 the set 0 = {^1.^3} is an orthonormal basis for R3 consisting of eigenvectors of A* A with corresponding eigenvalues Ai = 6. the complex number of length 1 is replaced by a unitary matrix.M 2 } for R2. Then y/Q 0 0 0 0 0 /1 V^ 1 V = </Z \V3 Also. and set U = 1 1 —X I \y/2 y/2/ Then A — UT. / \ «i = — L A (Mi) orthonormal basis 7 = {«I. Theorem 6. For any square matrix A. Hence. and the nonnegative number is replaced by a positive semidefinite matrix. o\ = \/6 is the only nonzero singular value of A.28 (Polar Decomposition). we let V be the matrix whose columns are the vectors in 0. we take 1 1 A = —Av\ = —fc o~i Oi V2 Next choose u2 = —j= I J.Sec.7 The Singular Value Decomposition and the Pseudoinverse Example 3 We find a singular value decomposition for A = First observe that for 1 >'\ = 73 and * .V* is the desired singular value decomposition. In the case of matrices.27.£ i . as in Theorem 6. as in the proof of Theorem 6. The Polar Decomposition of a Square Matrix A singular value decomposition of a matrix can be used to factor a square matrix in a manner analogous to the factoring of a complex number as the product of a complex number of length 1 and a nonnegative number. • .^2. there exists a unitary matrix W and a positive semidefinite matrix P such that A = WP. a unit vector orthogonal to Wi. 6. Consequently.

Furthermore, if A is invertible, then the representation is unique.

Proof. By Theorem 6.27, there exist unitary matrices U and V and a diagonal matrix $\Sigma$ with nonnegative diagonal entries such that $A = U\Sigma V^*$. So
$$A = U\Sigma V^* = UV^*V\Sigma V^* = WP,$$
where $W = UV^*$ and $P = V\Sigma V^*$. Since W is the product of unitary matrices, W is unitary, and since $\Sigma$ is positive semidefinite and P is unitarily equivalent to $\Sigma$, P is positive semidefinite by Exercise 14 of Section 6.4.

Now suppose that A is invertible and factors as the products $A = WP = ZQ$, where W and Z are unitary and P and Q are positive semidefinite. Since A is invertible, it follows that P and Q are positive definite and invertible, and therefore $Z^*W = QP^{-1}$. Thus $QP^{-1}$ is unitary, and so
$$I = (QP^{-1})^*(QP^{-1}) = P^{-1}Q^2P^{-1}.$$
Hence $P^2 = Q^2$. Since both P and Q are positive definite, it follows that $P = Q$ by Exercise 17 of Section 6.4. Therefore $W = Z$, and consequently the factorization is unique. $\blacksquare$

The factorization of a square matrix A as WP, where W is unitary and P is positive semidefinite, is called a polar decomposition of A.
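The construction in the proof of Theorem 6.28, $W = UV^*$ and $P = V\Sigma V^*$, is immediate to carry out once a singular value decomposition is known. The following sketch is an illustrative aside assuming NumPy; the square matrix used is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])          # an arbitrary square matrix

U, s, Vh = np.linalg.svd(A)
W = U @ Vh                            # unitary (here, orthogonal) factor
P = Vh.conj().T @ np.diag(s) @ Vh     # positive semidefinite factor

print(np.allclose(A, W @ P))                                  # True: A = WP
print(np.allclose(W.T @ W, np.eye(2)))                        # W is orthogonal
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) >= 0))  # P is psd
```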

Example 4
To find the polar decomposition of
$$A = \begin{pmatrix} 11 & -5 \\ -2 & 10 \end{pmatrix},$$
we begin by finding a singular value decomposition $U\Sigma V^*$ of A. The object is to find an orthonormal basis $\beta$ for $R^2$ consisting of eigenvectors of $A^*A$. It can be shown that
$$v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad\text{and}\qquad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
are orthonormal eigenvectors of $A^*A$ with corresponding eigenvalues $\lambda_1 = 200$ and $\lambda_2 = 50$. So $\beta = \{v_1, v_2\}$ is an appropriate basis. Thus $\sigma_1 = \sqrt{200} = 10\sqrt{2}$ and $\sigma_2 = \sqrt{50} = 5\sqrt{2}$ are the singular values of A, and
$$\Sigma = \begin{pmatrix} 10\sqrt{2} & 0 \\ 0 & 5\sqrt{2} \end{pmatrix}, \qquad V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.$$
Next, we find the columns $u_1$ and $u_2$ of U:
$$u_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{5}\begin{pmatrix} 4 \\ -3 \end{pmatrix} \qquad\text{and}\qquad u_2 = \frac{1}{\sigma_2}Av_2 = \frac{1}{5}\begin{pmatrix} 3 \\ 4 \end{pmatrix}.$$
Thus
$$U = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix}.$$
Therefore, in the notation of Theorem 6.28, we have
$$W = UV^* = \frac{1}{5\sqrt{2}}\begin{pmatrix} 7 & -1 \\ 1 & 7 \end{pmatrix} \qquad\text{and}\qquad P = V\Sigma V^* = \frac{5\sqrt{2}}{2}\begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}. \quad ♦$$

The Pseudoinverse

Let V and W be finite-dimensional inner product spaces over the same field, and let $T\colon V \to W$ be a linear transformation. It is desirable to have a linear transformation from W to V that captures some of the essence of an inverse of T even if T is not invertible. A simple approach to this problem is to focus on the "part" of T that is invertible, namely, the restriction of T to $N(T)^\perp$. Let $L\colon N(T)^\perp \to R(T)$ be the linear transformation defined by $L(x) = T(x)$ for all $x \in N(T)^\perp$. Then L is invertible, and we can use the inverse of L to construct a linear transformation from W to V that salvages some of the benefits of an inverse of T.

Definition. Let V and W be finite-dimensional inner product spaces over the same field, and let $T\colon V \to W$ be a linear transformation. Let $L\colon N(T)^\perp \to R(T)$ be the linear transformation defined by $L(x) = T(x)$ for all $x \in N(T)^\perp$. The pseudoinverse (or Moore–Penrose generalized inverse) of T, denoted by $T^\dagger$, is defined as the unique linear transformation from W to V such that
$$T^\dagger(y) = \begin{cases} L^{-1}(y) & \text{for } y \in R(T) \\ 0 & \text{for } y \in R(T)^\perp. \end{cases}$$

The pseudoinverse of a linear transformation T on a finite-dimensional inner product space exists even if T is not invertible. Furthermore, if T is invertible, then $T^\dagger = T^{-1}$ because $N(T)^\perp = V$, and L (as just defined) coincides with T. As an extreme example, consider the zero transformation $T_0\colon V \to W$ between two finite-dimensional inner product spaces V and W. Then $R(T_0) = \{0\}$, and therefore $T_0^\dagger$ is the zero transformation from W to V.
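Both remarks can be seen concretely. The following sketch is an illustrative aside assuming NumPy (the invertible matrix is arbitrary): the pseudoinverse of an invertible matrix is its ordinary inverse, and the pseudoinverse of a zero matrix is the zero matrix of the transposed size.

```python
import numpy as np

# If T is invertible, the pseudoinverse is the ordinary inverse.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])                                  # arbitrary invertible matrix
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))     # True

# The pseudoinverse of the zero transformation is the zero transformation.
Z = np.zeros((2, 3))
print(np.linalg.pinv(Z))                                    # the 3 x 2 zero matrix
```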

We can use the singular value theorem to describe the pseudoinverse of a linear transformation. Suppose that V and W are finite-dimensional inner product spaces and $T\colon V \to W$ is a linear transformation of rank r. Let $\{v_1, v_2, \dots, v_n\}$ and $\{u_1, u_2, \dots, u_m\}$ be orthonormal bases for V and W, respectively, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of T satisfying (4) in Theorem 6.26. Then $\{v_{r+1}, v_{r+2}, \dots, v_n\}$ is a basis for N(T), and so $\{v_1, v_2, \dots, v_r\}$ is a basis for $N(T)^\perp$. Similarly, $\{u_1, u_2, \dots, u_r\}$ is a basis for R(T), and $\{u_{r+1}, u_{r+2}, \dots, u_m\}$ is a basis for $R(T)^\perp$. Let L be the restriction of T to $N(T)^\perp$, as in the definition of pseudoinverse. Then $L^{-1}(u_i) = \frac{1}{\sigma_i}v_i$ for $1 \le i \le r$. Therefore
$$T^\dagger(u_i) = \begin{cases} \dfrac{1}{\sigma_i}v_i & \text{if } 1 \le i \le r \\ 0 & \text{if } r < i \le m. \end{cases} \qquad (6)$$

Example 5
Let $T\colon P_2(R) \to P_1(R)$ be the linear transformation defined by $T(f(x)) = f'(x)$, as in Example 1, and let $\beta = \{v_1, v_2, v_3\}$ and $\gamma = \{u_1, u_2\}$ be the orthonormal bases for $P_2(R)$ and $P_1(R)$ in Example 1. Then $\sigma_1 = \sqrt{15}$ and $\sigma_2 = \sqrt{3}$ are the nonzero singular values of T. It follows that
$$T^\dagger(u_1) = \frac{1}{\sigma_1}v_1 \qquad\text{and}\qquad T^\dagger(u_2) = \frac{1}{\sigma_2}v_2,$$
and hence $T^\dagger(1) = x$ and $T^\dagger(x) = \frac{1}{6}(3x^2 - 1)$. Thus, for any polynomial $a + bx \in P_1(R)$,
$$T^\dagger(a + bx) = aT^\dagger(1) + bT^\dagger(x) = ax + \frac{b}{6}(3x^2 - 1). \quad ♦$$

The Pseudoinverse of a Matrix

Let A be an $m \times n$ matrix. Then there exists a unique $n \times m$ matrix B such that $(L_A)^\dagger\colon F^m \to F^n$ is equal to the left-multiplication transformation $L_B$. We call B the pseudoinverse of A and denote it by $B = A^\dagger$. Thus $(L_A)^\dagger = L_{A^\dagger}$.
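Formula (6) says that a pseudoinverse simply inverts each nonzero singular value. The sketch below is an illustrative aside assuming NumPy: it computes the pseudoinverse of the matrix representation of $T(f) = f'$ from Example 1 and recovers the conclusion $T^\dagger(1) = x$ of Example 5.

```python
import numpy as np

# Matrix of T(f) = f' from Example 1, relative to the orthonormal bases
# {1/sqrt(2), sqrt(3/2) x, sqrt(5/8)(3x^2 - 1)} and {1/sqrt(2), sqrt(3/2) x}.
A = np.array([[0.0, np.sqrt(3.0), 0.0],
              [0.0, 0.0, np.sqrt(15.0)]])

A_dagger = np.linalg.pinv(A)
print(np.round(A_dagger, 6))
# [[0, 0], [1/sqrt(3), 0], [0, 1/sqrt(15)]]: each nonzero singular value is inverted.

# The constant polynomial 1 has coordinate vector (sqrt(2), 0) relative to the
# basis {1/sqrt(2), sqrt(3/2) x}, so its image under T^dagger has coordinates:
coords = A_dagger @ np.array([np.sqrt(2.0), 0.0])
print(np.round(coords, 6))
# (0, sqrt(2)/sqrt(3), 0) corresponds to (sqrt(2)/sqrt(3)) * sqrt(3/2) x = x,
# confirming T^dagger(1) = x as in Example 5.
```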

Let A be an $m \times n$ matrix of rank r. The pseudoinverse of A can be computed with the aid of a singular value decomposition $A = U\Sigma V^*$. Let $\beta$ and $\gamma$ be the ordered bases whose vectors are the columns of V and U, respectively, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of A. Then $\beta$ and $\gamma$ are orthonormal bases for $F^n$ and $F^m$, respectively, and (4) and (6) are satisfied for $T = L_A$. In this setting we obtain the following result.

Theorem 6.29. Let A be an $m \times n$ matrix of rank r with a singular value decomposition $A = U\Sigma V^*$ and nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Let $\Sigma^\dagger$ be the $n \times m$ matrix defined by
$$\Sigma^\dagger_{ij} = \begin{cases} \dfrac{1}{\sigma_i} & \text{if } i = j \le r \\ 0 & \text{otherwise.} \end{cases}$$
Then $A^\dagger = V\Sigma^\dagger U^*$.

Notice that $\Sigma^\dagger$, as defined in Theorem 6.29, is actually the pseudoinverse of $\Sigma$.

Example 6
We find $A^\dagger$ for the matrix
$$A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}.$$
Since A is the matrix of Example 3, we can use the singular value decomposition $A = U\Sigma V^*$ obtained in that example. By Theorem 6.29,
$$A^\dagger = V\Sigma^\dagger U^* = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}. \quad ♦$$
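Theorem 6.29 can be checked against a library routine. The sketch below is an illustrative aside assuming NumPy: it builds $\Sigma^\dagger$ from an SVD of the matrix of Example 6 and compares $V\Sigma^\dagger U^*$ with `numpy.linalg.pinv`.

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])

U, s, Vh = np.linalg.svd(A)
r = np.sum(s > 1e-12)                      # rank of A

# Sigma^dagger: transpose the shape of Sigma and invert the nonzero sigmas.
Sigma_dag = np.zeros((A.shape[1], A.shape[0]))
Sigma_dag[:r, :r] = np.diag(1.0 / s[:r])

A_dag = Vh.conj().T @ Sigma_dag @ U.conj().T
print(np.round(A_dag, 6))                          # (1/6) * [[1,1],[1,1],[-1,-1]]
print(np.allclose(A_dag, np.linalg.pinv(A)))       # True
```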

The Pseudoinverse and Systems of Linear Equations

Let A be an $m \times n$ matrix with entries in F. Then for any $b \in F^m$, the matrix equation $Ax = b$ is a system of linear equations, and so it either has no solutions, a unique solution, or infinitely many solutions. We know that the system has a unique solution for every $b \in F^m$ if and only if A is invertible, in which case the solution is given by $A^{-1}b$. Furthermore, if A is invertible, then $A^{-1} = A^\dagger$, and so the solution can be written as $z = A^\dagger b$. If, on the other hand, A is not invertible or the system $Ax = b$ is inconsistent, then $A^\dagger b$ still exists. We therefore pose the following question: In general, how is the vector $A^\dagger b$ related to the system of linear equations $Ax = b$? In order to answer this question, we need the following lemma.

Lemma. Let V and W be finite-dimensional inner product spaces, and let $T\colon V \to W$ be linear. Then
(a) $T^\dagger T$ is the orthogonal projection of V on $N(T)^\perp$.
(b) $TT^\dagger$ is the orthogonal projection of W on R(T).

Proof. As in the earlier discussion, we define $L\colon N(T)^\perp \to R(T)$ by $L(x) = T(x)$ for all $x \in N(T)^\perp$. If $x \in N(T)^\perp$, then $T^\dagger T(x) = L^{-1}L(x) = x$, and if $x \in N(T)$, then $T^\dagger T(x) = T^\dagger(0) = 0$. Consequently $T^\dagger T$ is the orthogonal projection of V on $N(T)^\perp$. This proves (a). The proof of (b) is similar and is left as an exercise. $\blacksquare$
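The lemma can be seen concretely in coordinates: $A^\dagger A$ and $AA^\dagger$ are self-adjoint, idempotent, and fix exactly $N(L_A)^\perp$ and $R(L_A)$, respectively. The following sketch is an illustrative aside assuming NumPy, using the matrix of Example 6.

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])
A_dag = np.linalg.pinv(A)

P1 = A_dag @ A          # orthogonal projection of F^3 onto N(L_A)-perp
P2 = A @ A_dag          # orthogonal projection of F^2 onto R(L_A)

# Orthogonal projections are self-adjoint and idempotent.
for P in (P1, P2):
    print(np.allclose(P, P.conj().T), np.allclose(P @ P, P))

# P1 fixes N(L_A)-perp and annihilates N(L_A):
print(np.round(P1 @ np.array([1.0, 1.0, -1.0]), 6))   # unchanged
print(np.round(P1 @ np.array([1.0, -1.0, 0.0]), 6))   # sent to zero
```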

Theorem 6.30. Consider the system of linear equations $Ax = b$, where A is an $m \times n$ matrix and $b \in F^m$. If $z = A^\dagger b$, then z has the following properties.
(a) If $Ax = b$ is consistent, then z is the unique solution to the system having minimum norm. That is, z is a solution to the system, and if y is any solution to the system, then $\|z\| \le \|y\|$ with equality if and only if $z = y$.
(b) If $Ax = b$ is inconsistent, then z is the unique best approximation to a solution having minimum norm. That is, $\|Az - b\| \le \|Ay - b\|$ for any $y \in F^n$, with equality if and only if $Az = Ay$. Furthermore, if $\|Az - b\| = \|Ay - b\|$, then $\|z\| \le \|y\|$ with equality if and only if $z = y$.

Proof. For convenience, let $T = L_A$.
(a) Suppose that $Ax = b$ is consistent, and let $z = A^\dagger b$. Observe that $b \in R(T)$, and therefore $Az = AA^\dagger b = TT^\dagger(b) = b$ by part (b) of the lemma. Thus z is a solution to the system. Now suppose that y is any solution to the system. Then $T^\dagger T(y) = A^\dagger Ay = A^\dagger b = z$, and hence z is the orthogonal projection of y on $N(T)^\perp$ by part (a) of the lemma. Therefore, by the corollary to Theorem 6.6 (p. 350), we have $\|z\| \le \|y\|$ with equality if and only if $z = y$. This proves (a).
(b) Suppose that $Ax = b$ is inconsistent. By part (b) of the lemma, $Az = AA^\dagger b = TT^\dagger(b)$ is the orthogonal projection of b on R(T); therefore, by the corollary to Theorem 6.6 (p. 350), Az is the vector in R(T) nearest b. That is, if Ay is any other vector in R(T), then $\|Az - b\| \le \|Ay - b\|$ with equality if and only if $Az = Ay$. Finally, suppose that y is any vector in $F^n$ such that $Az = Ay = c$. Then $A^\dagger c = A^\dagger Az = A^\dagger AA^\dagger b = A^\dagger b = z$ by Exercise 23; hence we may apply part (a) of this theorem to the system $Ax = c$ to conclude that $\|z\| \le \|y\|$ with equality if and only if $z = y$. $\blacksquare$

Note that the vector $z = A^\dagger b$ in Theorem 6.30 is the vector $x_0$ described in Theorem 6.12 that arises in the least squares application on pages 360–364.

Example 7
Consider the linear systems
$$\begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 1 \end{aligned} \qquad\text{and}\qquad \begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 2. \end{aligned}$$
The first system has infinitely many solutions. Let A be the coefficient matrix of the system (the matrix of Example 6), and let $b = \binom{1}{1}$. By Example 6,
$$z = A^\dagger b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix},$$
and therefore z is the solution of minimal norm by Theorem 6.30(a).

The second system is obviously inconsistent. Let $b = \binom{1}{2}$. Then
$$z = A^\dagger b = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.$$
Thus, although z is not a solution to the second system, it is the "best approximation" to a solution having minimum norm, as described in Theorem 6.30(b). ♦
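Example 7 can be reproduced numerically. The sketch below is an illustrative aside assuming NumPy: for the consistent system it returns the minimum-norm solution, and for the inconsistent one it returns the minimum-norm best approximation, exactly as Theorem 6.30 predicts.

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])

# Consistent system: both right-hand sides equal 1.
b1 = np.array([1.0, 1.0])
z1 = np.linalg.pinv(A) @ b1
print(np.round(z1, 6), np.allclose(A @ z1, b1))   # minimum-norm solution

# Any other solution, e.g. (1, 0, 0), has larger norm.
y = np.array([1.0, 0.0, 0.0])
print(np.linalg.norm(z1), np.linalg.norm(y))

# Inconsistent system: right-hand sides 1 and 2.
b2 = np.array([1.0, 2.0])
z2 = np.linalg.pinv(A) @ b2
print(np.round(z2, 6))              # best approximation to a solution
print(np.linalg.norm(A @ z2 - b2))  # minimal residual, here 1/sqrt(2)
```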

i)z2. g) = C f(t)g(t) dt. X\ — x2) + (b) T: P2(R) . sinx.418 EXERCISES Chap. Find a polar decomposition for each of the following matrices. then A is a singular value of A. (e) If A is an eigenvalue of a self-adjoint matrix A.. if a is a singular value of A. • 2. then |c|a is a singular value of cA. In each of the following. (g) The pseudoinverse of any linear operator exists even if the operator is not invertible. Find a singular value decomposition for each of the following matrices. (b) The singular values of any matrix A are the eigenvalues of A* A. and T is defined by T ( / ) = f + 2 / (d) T: C2 -> C2 defined by T(z1. the vector A*b is a solution to Ax = b. u2. (a) 0 -i) (b > ( 1 j { 5. vn} for V and {u\. find orthonormal bases {vi.x2) = (x\.• Pi(R). where T(f(x)) = f"(x). and the inner products are defined as in Example 1 (c) Let V = W = span({l.z2) = ((1 .. •M . • • •.v2. Label the following statements as true or false. (f) For any rnxn matrix A and any b G F n . (d) The singular values of any linear operator are nonnegative. Find an explicit formula for each of the following expressions.. and the nonzero singular values ai > a 2 > • • • > a r of T such that T(vi) = OiUi for 1 < i < r. Let T: V — W be a linear transformation of rank r. um} for W. 6 Inner Product Spaces 1. (c) For any matrix A and any scalar c. cosx}) with the inner product defined by (/. (1 + i)zx + z2) 3. (a) T: R2 — R3 defined by T{x\. 1 0 1 0 1 -1 /I i\ 0 I 1 0 V (d) (e) 1-M 1 1 — i —i l (0 I (a) (b) (c) V I l 0 -1 1 1 -2 1 1 1 4.. X\ + x2. (a) The singular values of any linear operator on a finite-dimensional vector space are also eigenvalues of the operator. where V and W are finite-dimensional inner product spaces.

as described in Theorem 6. find the unique solution having minimum norm.) (a) X\ + x\ + —X\ -\ x2 = 1 x2 = 2 X2 = 0 (b) X\ + x2 + xs -\.z 2 ). (a) (b) (c) (d) T T T T is the is the is the is the linear linear linear linear transformation transformation transformation transformation of of of of Exercise Exercise Exercise Exercise 2(a) 2(b) 2(c) 2(d) 8.2x 3 + x4 = -1 X\ — X2 + . Use the results of Exercise 3 to find the pseudoinverse of each of the following matrices. . where T is the linear transformation of Exercise 2(c) (d) T + iz\. Let V and W be finite-dimensional inner product spaces over F.7 The Singular Value Decomposition and the Pseudoinverse 419 (a) T+ (x\. where T is the linear transformation of Exercise 2(d) 6..30(b). Let T: V —• W is a linear transformation of rank r...Sec. x2. X3). (i) If the system is consistent. and suppose that ai > a 2 > • • • > a r > 0 arc such that T(Mj) = OjUi if 1 < i < r if r < i.. 0 .u2. find the "best approximation to a solution" having minimum norm.v2. (ii) Describe the subspace Z2 of W such that TT* is the orthogonal projection of W on Z 2 ..vn} and {u\. /l i\ 0 I 1 0 1 (b) (c) (a) 1 0 1 0 V (d) (e) 1-H 1-i 1N (f) I I I ly I I I 0 -l -2 1 1 1 7. and suppose that {v\. For each of the given systems of linear equations.T3 + £ 4 = 2 9. (ii) If the system is inconsistent. where T is the linear transformation of Exercise 2(b) (c) T + (a + 6 sin a: + coos re). » (i) Describe the subspace Zi of V such that T*T is the orthogonal projection of V on Z\. respectively.x4 — 2 Xi . where T is the linear transformation of Exercise 2(a) (b) T + (a + bx + ex2).. For each of the given linear transformations T : V — W. (Use your answers to parts (a) and (f) of Exercise 6.. 6.um} are orthonormal bases for V and W.

. Prove that if A is a positive definite matrix and A — UEV* is a singular value decomposition of A. An. Prove that the singular values of T are |Ai|. .420 Chap. A 2 .. Prove that if A is a positive semidefinite matrix. 15.. (a) Let T be a normal linear operator on an ?i-dimensional inner product space with eigenvalues Ai. including repetitions.V* is a singular value decomposition of A 13. 10. . Let A be a square matrix with a polar decomposition A = WP. then the singular values of A are the same as the eigenvalues of A.vn} and corresponding eigenvalues Ai.5 to obtain another proof of Theorem 6. including repetitions. (d) State and prove a result for matrices analogous to (c). = (b) Let A be an ra x n matrix with real or complex entries. 11. Use Exercise 8 of Section 2. . then U = V. .. .u2. A 2 . the singular value decomposition theorem for matrices.. Let A be a normal matrix with an orthonormal basis of eigenvectors 0 — {v\. then UT. This exercise relates the singular values of a well-behaved linear operator or matrix to its eigenvalues. . . 14. . Let A be a square matrix. |A 2 |.. where A.v2. (c) Prove that TT* and T*T have the same nonzero eigenvalues. A m . (b) Use (a) to prove that A is normal if and only if WP = PW. ... . Prove an alternate form of the polar decomposition for A: There exists a unitary matrix W and a positive semidefinite matrix P such that A = PW. 12.um} is a set of eigenvectors of TT* with corresponding eigenvalues Ai.. 6 Inner Product Spaces (a) Prove that {u\. . 16. Prove that for each i there is a scalar 0j of absolute value 1 such that if U is the n x n matrix with 9iVt as column i and E is the diagonal matrix such that Ejj = |Aj| for each i.27. Prove that the nonzero singular values of A are the positive square roots of the nonzero eigenvalues of AA*. . Let V be the n x n matrix whose columns are the vectors in 0.. (a) Prove that A is normal if and only if WP2 = P2W.. An. . . (b) State and prove a result for matrices analogous to (a). |A n |. A 2 .

* (a) If T is one-to-one. Prove the following results. 19. . State and prove a result for matrices that is analogous to the result of Exercise 21.0). Let V and W be finite-dimensional inner product spaces. (a) The nonzero singular values of A are the same as the nonzero singular values of A*. > (a) TT*T = T. (a) For any ra x ra unitary matrix G. but (AB)^ ^ 18. and let T: V — W be linear. 23.x2) = (xi. 24. Let A be a square matrix such that A2 = O. which are the same as the nonzero singular values of A1. Let V and W be finite-dimensional inner product spaces. and let T: V — W be linear.0) and U(xi. (c) Both T*T and TT* are self-adjoint. (b) Exhibit matrices A and B such that AB is defined. and they characterize the pseudoinverse of a linear transformation as shown in Exercise 22.7 The Singular Value Decomposition and the Pseudoinverse 17. (GAY = A^G*.4*)* = (A)*. (b) For any n x n unitary matrix H. (c) (At)«-(A*)t.x2) T(xi. The preceding three statements are called the P e n r o s e conditions. State and prove a result for matrices that is analogous to the result of Exercise 22. 22. 421 G R2 by (a) Prove that (UT)* ^ T*U*. Let T and U be linear operators on R2 defined for all (xi. then T*T is invertible and T* = (T*T)~ 1 T*. Prove the following results. (b) If T is onto. > and both UT and TU are self-adjoint. 25. 6. Prove the following results.Sec. 20. then TT* is invertible and T* = T*(TT*) _ 1 . 21. UTU = U. (b) (. Let T: V — W > and U: W — V be linear transformations such that TUT = T. Let V and W be finite-dimensional inner product spaces. Prove that U = T*. Let ^4 be an ra x n matrix. (b) T*TT*=T*. (A//)* = ifM*. Let A be a matrix with real or complex entries. Prove the following results. Prove that (A*)2 = O.x 2 ) = (xi + x 2 .

6.8* BILINEAR AND QUADRATIC FORMS

There is a certain class of scalar-valued functions of two variables defined on a vector space that arises in the study of such diverse subjects as geometry and multivariable calculus. This is the class of bilinear forms. We study the basic properties of this class with a special emphasis on symmetric bilinear forms, and we consider some of its applications to quadratic surfaces and multivariable calculus.

Bilinear Forms

Definition. Let V be a vector space over a field F. A function H from the set $V \times V$ of ordered pairs of vectors to F is called a bilinear form on V if H is linear in each variable when the other variable is held fixed; that is, H is a bilinear form on V if
(a) $H(ax_1 + x_2, y) = aH(x_1, y) + H(x_2, y)$ for all $x_1, x_2, y \in V$ and $a \in F$;
(b) $H(x, ay_1 + y_2) = aH(x, y_1) + H(x, y_2)$ for all $x, y_1, y_2 \in V$ and $a \in F$.
We denote the set of all bilinear forms on V by $\mathcal{B}(V)$. Observe that an inner product on a vector space is a bilinear form if the underlying field is real, but not if the underlying field is complex.

Example 1
Define a function $H\colon R^2 \times R^2 \to R$ by
$$H\left( \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right) = 2a_1b_1 + 3a_1b_2 + 4a_2b_1 - a_2b_2 \qquad\text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.$$
We could verify directly that H is a bilinear form on $R^2$. However, it is more enlightening and less tedious to observe that if
$$A = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix},$$
then $H(x, y) = x^t A y$ for all $x, y \in R^2$, where x and y are regarded as column vectors. The bilinearity of H now follows directly from the distributive property of matrix multiplication over matrix addition. ♦
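The observation at the end of Example 1 is worth a quick numerical check. The following sketch is an illustrative aside assuming NumPy: it evaluates $H(x, y) = x^t A y$ with the matrix of Example 1 and verifies linearity in the first variable for randomly chosen vectors.

```python
import numpy as np

# The matrix of Example 1: H(x, y) = 2a1b1 + 3a1b2 + 4a2b1 - a2b2 = x^t A y.
A = np.array([[2.0, 3.0],
              [4.0, -1.0]])

def H(x, y):
    return x @ A @ y

rng = np.random.default_rng(0)
x1, x2, y = rng.standard_normal((3, 2))
a = 1.7

# Linearity in the first variable (the second variable is checked the same way).
print(np.isclose(H(a * x1 + x2, y), a * H(x1, y) + H(x2, y)))   # True
```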

1.y) The following theorem is an immediate consequence of the definitions.Sec.• F by H(x. We identify this matrix with its single entry.7: V x V — F is defined by J(x.w G V. Their proofs are left to the reader (see Exercise 2). H(x + y. w). y) = xlAy for x. z) 4 H(y. the functions Lx. x\)Ay • We list several properties possessed by all bilinear forms. 6. Let V be a vector space. H(x. For any A G M n X n (F). w) + H(y.y) and Rx(y) = H(y. let H\ and H2 be bilinear forms on V. y.y) = a(Hi(x.y G V. for any x G V. where the vectors are considered as column vectors. we have H(axi 4. and let a be a scalar.y) = (axi + x2)tAy = {ax\ 4 = ax\Ay 4 x\Ay = aH{x1.z.y) > form. If.y) 4 H2{x. 3. x) = H(x.y) + H(x2.z + w) = H(x. The bilinearity of H follows as in Example 1. y G V. = H(y. for a G F and x\. z) 4 H(x.y).x) forallyGV. = Hx{x.x2. For all x. H(0. Example 2 423 Let V = F n . define H: V x V . Notice that since x and y are n x 1 matrices and A is an nxn matrix.y)) forallx. then Lx and R^ are linear. 4.8 Bilinear and Quadratic Forms The preceding bilinear form is a special case of the next example.H2){x.y) and (aHi)(x. 2. If . 0) = 0 for all x G V. .x2.x). the followingproperties hold. For any bilinear form i J o n a vector space V over a field F. then J is a bilinear Definitions.y G V. We define the sum Hi 4 H2 and the scalar product aH\ by the equations (f/i 4. For example.y) is a 1 x 1 matrix. Rx: V — F are defined by * Lx(y) = H{x.

(F).v2. The matrix A above is called the matrix representation of H with respect to the ordered basis 0 and is denoted bytppffi). . = 2 4 3 .1 = 8. 2 . We can associate with // an . = 2 + 3 + 4 .4 + 1 = 2. j = 1 . Definition. r and B = $p(H)..424 Chap. . | Let 0 = {vx. Vj) for i. If 7 is the standard ordered basis for R . For any vector space V.vn} be an ordered basis for an n-dimensional vector space V. B(V) is a vector space with respect to these operations. 6 Inner Product Spaces Theorem 6. . and let 0 = Then I3U = H BV2 = H B2\ = H and B22 =a H So MH) = 4 . We can therefore regard ijjp as a mapping from S(V) to M„ x r . the sum of two bilinear forms and the product of a scalar and a bilinear form on V are again bilinear forms on V. the reader can verify that iMH) = '2 4 3N -1 .31.. = 2 . Exercise.2 -1 = 2-3-4-1 =-6. We first consider an example and then show that ipp is an isomorphism. and let H G £(V).. n. Furthermore. Proof./ x n matrix A whose entry in row i and column j is defined by Aij = H{vi. that takes a bilinear form H into its matrix representation ipp(H). Example 3 Consider the bilinear form H of Example 1. . . where F is the field of scalars for V.3 + 441=4.

y) = lMx)]*A\^p(y)] for all x. Exercise.32. H(Vi. 6. xn (F). A slight embellishment of the method of Example 2 can be used to prove that H G B{V). and 0 be the standard ordered basis for F n . namely. By hypothesis. iftp: i?(V) — M r t X n (F) is an isomorphism. sion n2. #(V) lias dimenI The following corollary is easily established by reviewing the proof of Theorem 6. We show that tj>p(H) = A. For any n-dimcnsional vector space V over F and any ordered basis 0 for V. and recall the mapping LVi: V — F . > Proof. we view (f)p(x) G F n as > a column vector. y G F". For any n-dimensional vector space V.Vj) = 0 for all Vj G 0. For x G V. A = ipp{H). Corollary 1. for any i and j. Fix Vi G 0. y) = xlAy for all x. We leave the proof that ipp is linear to the reader. Then for any H G B(fn). To show that i/jp is onto. LVi(vj) = H(vi. The following result is now an immediate consequence of Corollary 2. So H{vi. consider any A G M n X n ( F ) . So H{x. Corollary 3. Let H: V X V — F be the mapping defined by > H(x. x) = LVi (x) = 0 for all x G V and v{ G 0. Proof. which is linear by * property 1 on page 423. y G V. ^cn ij)0{H) = A if and only ifH(x. there exists a unique matrix A G M„. To show that ipp is one-to-one. and recall the linear mapping Ry: V —> F defined in property 1 on page 423. and hence Ry is the zero transformation. n a positive integer. suppose that ij^p(H) = O for some H G B(V). Let F be a field.y) — Ry(x) = 0 for all x. By (7). | We conclude that •tfipiH) = A and V^/3 is onto. Recall the isomorphism <f>p: V — F n defined in Section 2.j\ hence. Thus H is the zero bilinear form.y G V.4. If H G B{\1) and A G M„Xn(-F).y) = 0 for all Vi G 0. such that H(x. and therefore V/3 is one-to-one.= Ai3.Sec. Hence LVi is the zero transformation from V to F. .y) = [^(x)]*A[<f>e(y)] for all x.8 Bilinear and Quadratic Forms 425 T h e o r e m 6. Ry(vi) = H(vi. Then <^(fj) = ej and 4>(i{vj) = e. Let V be an n-dimensional vector space over F with ordered basis 0. Let viyv3 G 0.32.y G V. (7) Next fix an arbitrary y G V.Vj) = [Mvi)]tA[Mvj)} = e\Ae3. Corollary 2.

we have Av = d e t | * '''" Therefore A = {< ()) A 12 = d e t ( j and A22 = det ( f)=l. leaving the other proof for the exercises (see Exercise 13). Definition.y) . . We give the more direct proof here. we have 0 7 (J7) = Q'ipg{H)Q. n h u r ' n : B ^ ^ Chap.. Theorem 6. Then.. Let V be a finite-dimensional vector space with ordered bases 0 = {vi.j) for all i and j ..v2. •. the corresponding question leads to another relation on square matrices.e.y G R2.w2. vn) and 7 = {w\. while the other follows immediately from a clever observation. one can pose the following question: How docs the matrix corresponding to a fixed bilinear form change when the ordered basis is changed? As we have seen.. Let A. One involves a direct computation. Observe that the relation of congruence is an equivalence relation (sec Exercise 12).. and let Q be the change of coordinate matrix changing ^-coordinates into 0-coordinates. Proof. The next theorem relates congruence to the matrix representation of a bilinear form. There are essentially two proofs of this theorem. B G M „ x n ( F ) . for any H G #(V). In the case of bilinear forms. the corresponding question for matrix representations of linear operators leads to the definition of the similarity relation on square matrices.x'Ay for all x.33. As in the case of linear operators. We find the matrix A in Corollary 3 such that H{x.. Since Ai3 — H(ci. relation. 6 Inner Product Spaces ta 6i &MS> It can be shown that H is a bilinear form.426 Example 4 Define a function H: R2 x R2 -^ R by // "l . w7l). 1 1 01 0 1 -1 0 There is an analogy between bilinear forms and linear operators on finitedimensional vector spaces in that both are associated with unique square matrices and the correspondences depend on the choice of an ordered basis for the vector space. the congruence. Then B is said to be congruent to A if there exists an invertible matrix Q G M n x n ( F ) such that B = Q'AQ. Therefore ij^{H) is congruent tod'3{H).

i. and let H be a bilinear form on V.vn}..Wj J n = fc=sl = YL ®kiH I Vk> 5 Z QriVr) fc=l V 7-1 / T t n = 22 ^kl S Q'jH(vk. Qijvi = 7=1 f° r 1 < i < M. if B is congruent to ipp(H). then there exists an ordered basis 7 for V such that ip^{H) = B.v2. then Q changes 7-coordinates into 0-coordinates.8 Bilinear and Quadratic Forms Suppose that A — ipp(H) and B = ipy(H)..w2. Vr) fc=l r=l n n fc=l » fc=l 7i 7=1 7 1 r=l ^QkiH(vk.QkiVk k=\ Thus Bij = H{w. Let 7 = {w\. 6. Suppose that B = Q'ij)ft(H)Q for some invertible matrix Q and that 0 = {v\. Let V he an n-dimensional vector space with ordered basis 0. if B = Qt"iftp(H)Q for some invertible matrix Q. where = 2_. I The following result is the converse of Theorem 6.. .Wj) = H I 'Y^QkiVk. Proof.. Corollary. Then for 1 < i. r=l 427 fc=l T J = y£Qlk(AQhJ fe=i Hence B = Q*AQ...wn].33. . = (QtAQ)lj.Wj) and w3 =^TQrjVr.. = ^. Furthermore.Sec. j < n. For any n xn matrix B.

Definition.y) for all x. Thus C = B. Then H is symmetric.y) = H(.y G V. and let ii be an ordered basis for V. Let H be a diagonalizable bilinear form ou a finite-dimensional vector space V.x) for all x. .y) = H(y. Then II is symmetric if and only ifif)p(H) is symmetric. B = QtMH)Q=<!h(H). L I Definition. A bilinear form H on a vector space V is symmetric H(x. Let . Symmetric Bilinear Forms Like the diagonalization problem for linear operators. Then for 1 < i.x) = J(x.va./ is a bilinear form.428 Chap.32. symmetric bilinear forms correspond to symmetric matrices. suppose that B is symmetric.r.. Corollary. there is a close relationship between diagonalizable bilinear forms and those that are called symmetric.. Therefore. for 1 < i.. As we will see.y) = H(y. be the mapping defined by J(x. the problem of determining those bilinear forms for which there are diagonal matrix representations. v„ } and B = ip0{H). j < n. Let C — ipe{J)Then. there is an analogous diagonalization problem for bilinear forms. Let 6 = {wi. 6 Inner Product Spaces Since Q is invertible. namely. Let H be a bilinear form on a finite-dimensional vector space V.34. we have . CIJ=J(v1./: V x V — F.x) for all x. and therefore H is symmetric.vj) = H(vJ. First assume that H is symmetric. by Theorem 6. 7 is an ordered basis for V./ = //. y G V. A bilinear form H on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis 0 for V such that 0^(H) is a diagonal matrix. and it follows that B is symmetric.vl) = BJI = Bu. Theorem 6. Hence H(y. and Q is the change of coordinate matrix that changes 7-coordinates into /^-coordinates. where F is • the field of scalars for V. if I As the name suggests..y G V.j < n. Conversely. Proof.vj) = H(v3yvi) = Bji. Bij = H(vi. . By property 4 on page 423. Since ijjp is one-to-one.

So by Theorem 6. Clearly H is symmetric. Since the only nonzero scalar of F is 1. if 0 is the standard ordered basis for V. Therefore H is not diagonalizable. 1 Unfortunately. and hence. Its failure to be diagonalizable is due to the fact that the scalar field Z2 is of characteristic two. B = Suppose that QThen = B = Q'AQ a c b d 0 1 1 0 a b c d ac + ac be + ad be + ad bd + bd 1 0 0 1 But p + p = 0 for all p G F. there exists an invertible matrix Q such that B = Q* AQ. by Theorem 6. column 1 entries of the matrices in the equation above. comparing the row 1. the converse is not true. H is symmetric. We show that H is not diagonalizable. D is a symmetric matrix. and consequently the diagonal entries of B are nonzero.\/ = F 2 .34. Then there is an ordered basis 0 for V such that ij)@ (H) = D is a diagonal matrix. hence ac + ac — 0. it follows that rank (73) = rank(^4) = 2. we conclude that 1 = 0.8 Bilinear and Quadratic Forms 429 Proof Suppose that H is diagonalizable. Trivially. then A = #(//) = ^ JV a symmetric matrix. Example 5 Let F = Z2.33. Thus. Since Q is invertible. By way of contradiction. • The bilinear form of Example 5 is an anomaly. In fact. Then there is an ordered basis 7 for V such that B — 0 7 ( i / ) is a diagonal matrix. Recall . as is illustrated by the following example. a contradiction. and H: V x V H F be the bilinear form defined by = a\b2 +a2bx. 6. suppose that H is diagonalizable.Sec.

j < n . by the induction hypothesis. Then H(x. L^ is linear. If n — 1. and suppose that diin(V) = n.n — 1. Then vn <£ N(LX).:: V — F defined by L. Let F be a field that is not of characteristic two. and so 0 ~ {MI. In addition.„) = H(un. and therefore II is diagonalizable. . Theorem 6. Exercise.) = 1. We use mathematical induction on n = dim(V). Before proving the converse of the corollary to Theorem 6.u) + H{u. Recall the function L. we can choose vectors u.u) + H{v. Let H be a nonzero symmetric bilinear form on a vector space V over a field F not of characteristic two.430 Chap. then trivially H is diagonalizable.(/y) = H(x. If F is not of characteristic two. Lemma. . 6 Inner Product Spaces from Appendix C that a field F is of characteristic two if 1 + 1 = 0 in F. Then every symmetric bilinear form on V is diagonalizable. so suppose that 77 is a nonzero symmetric bilinear form on V.. If H(u. .v) ^ 0.x) ^ 0.x) / 0.1). Tims. 1 Corollary. which we denote by 1/2..r) is obviously a symmetric bilinear form on a vector space of dimension // — 1.35. and hence dim(N(L J )) = n . .1.v) + H(v. Furthermore. set x = u + v.v) = 2H(u.u) ^ 0 or H(v. there exists an ordered basis {vi. .v2.vn i} for INKL^) such that II(vi. If // is the zero bilinear form on V.Vj) = 0 for i / j (1 < i. The restriction of 77 to N(L. We conclude that ijipiH) is a diagonal matrix. then A is congruent to a diagonal matrix. . Then there is a vector x in V such that H(x. Proof. Now suppose that the theorem is valid for vector spaces of dimension less than n for some fixed integer n > 1. Let V be a finite-dimensional vector space over a field F not of characteristic two. Consequently. since L a (x) = II(x. x) ^ 0.x) = H(u. there is nothing to prove. .v) y£ 0.. of characteristic two. By property 1 on page 423. Proof. then every element of <B(V) is diagonalizable. I .y) for all y G V. there exists a nonzero vector x in V such that 77(x.v) ^ 0 | because 2 ^ 0 and H{u.Vi) — 0 for i = 1 . By the lemma.. we establish the following Lemma.34 for scalar fields that are not. 2 . Lx is nonzero.U2. .vn) is an ordered basis for V. If A G M„ x „(F) is a symmetric matrix. Set yn = x. rank(L. /7(u. i. Proof. Since H is nonzero. Otherwise. v G V such that H(u. then 1 + 1 = 2 has a multiplicative inverse. v) ^ 0. .

By the corollary to Theorem 6.) Suppose that Q is an invertible matrix and D is a diagonal matrix such that QlAQ = D. we add the first column of A to the second column to produce a zero in row 1 and column 2. 6. (Note that the order of the operations can be reversed because of the associative property of matrix multiplication.1. ElA can be obtained by performing the same operation on the rows of A rather than on its columns. Thus E'AE can be obtained from A by performing an elementary operation on the columns of A and then performing the same operation on the rows of AE. then AE can be obtained by performing an elementary column operation on A. Thus D = Q'AQ = ££££_i • • • E\AE.D G M n x n ( F ) such that Q is invertible. and if Q — E\E2 • .35.Sec. 159). we conclude that by means of several elementary column operations and the corresponding row operations. If E is an elementary nxn matrix. This method requires familiarity with elementary matrices and their properties.• Ek.Ek are the elem. which the reader may wish to review in Section 3. A can be transformed into a diagonal matrix D. . To this end. The elementary matrix that corresponds to this elementary column operation is E. then Q'AQ = D. Furthermore.E2... From the preceding equation. if E\.i(R) defined by We use the procedure just described to find an invertible matrix Q and a diagonal matrix D such that Q'AQ = D. Example 6 Let A be the symmetric matrix in M:ix.. By Corollary 3 to Theorem 3. = .8 Bilinear and Quadratic Forms Diagonalization of Symmetric Matrices 431 Let A be a symmetric n x n matrix with entries from a field F not of characteristic two.E2 • • • Ek. sayQ = E1E2---Ek. By Exercise 21. We begin by eliminating all of the nonzero entries in the first row and first column except for the entry in column 1 and row 1.entary matrices corresponding to these elementary column operations indexed in the order performed.6 (p. and Ql AQ = D. D is diagonal. there are matrices Q. We now give a method for computing Q and D. Q is a product of elementary matrices.

4 \0 0 1/ /I 0 D = I0 1 \0 0 • 0N 0 -24. The corresponding elementary matrix E2 and the result of the elementary operations E^E\AEXE2 are. we subtract 4 times the second column of E2E\AE\E2 from the third column and follow this with the corresponding row operation. Starting with the matrix A of the preceding example.= 0 -3 0 1 0 0 0 1 1 0 0 4 0 1 0 4 -8 and E\E\AEiE2 = Finally. where D is a diagonal matrix and B = Q*. the process is complete. 4 1 We now use the first column of E[AE\ to eliminate the 3 in row 1 column 3. E. 1 0 0 1 0 0 0 -4 1 E^El2E\AElE2E3 0 0 0 1 0 0 0 -24 E3 = and = Since we have obtained a diagonal matrix. 6 Inner Product Spaces We perforin the corresponding elementary operation on the rows of AE\ to obtain /1 E\AEi = 0 \3 0 3N 1 4 | . respectively. respectively. and to obtain the desired diagonalization Q' AQ = D. and follow this operation with the corresponding row operation. It then follows that D = Q'AQ.7 \ Q = E1E2E3 = I 0 1 . this method produces the following sequence of matrices: 1 -1 3 2 1 -1 1 1 3 1 0 3 1 1 1 3 1 1 (A\I)= . The method is inspired by the algorithm for computing the inverse of a matrix developed in Section 3.432 Chap. We use a sequence of elementary column operations and corresponding row operations to change the n x 2n matrix (A\I) into the form (D\B).2. The reader should justify the following method for computing Q without recording each elementary matrix separately. So we let /I 1 . The corresponding elementary matrix E3 and the result of the elementary operations E!iE2E\AE\E2E:i are.

Sec. x) for all x G V.. Let V be a vector space over F. then we can recover 77 from K because H(x...t2.. 6...y) = -[K(x + y) (See Exercise 16.8 Bilinear and Quadratic Forms 433 Therefore F> = Q = l and (l 1 Q= I0 1 VO 0 -V -4 1. define the polynomial f(h.K(y)} (9) .. x) for some symmetric bilinear form 77 on V.tn that take values in a field F not of characteristic two and given (not necessarily distinct) scalars aij (1 < i < j < n).) Example 7 The classic example of a quadratic form is the homogeneous second-degree polynomial of several variables. In fact. if K is a quadratic form on a vector space V over a field F not of characteristic two. and K(x) = 77(x.t2. Given the variables t\. (8) If the field F is not of characteristic two.. Quadratic Forms Associated with symmetric bilinear forms are functions called quadratic forms. A function K: V —• F is called a quadratic form if there exists a symmetric bilinear form 77 G #(V) such that K(x) = H(x. tn) = ^2 a-ijtitji<3 K{x) . there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms given by (8). . Definition.

.v2.36. 0 is orthonormal by Exercise 30 of Section 6.. Then there exists an orthonormal basis 0 for V such that «/'.y G R'\ we see that f(ti. since Q is orthogonal and 7 is orthonormal. Let V be a finite-dimensional real inner product space. 384). Choose any orthonormal basis 7 = {vi. To see this. h) = 2/'f . SI L . then the symmetric bilinear form 77 corresponding to the quadratic form / has the matrix representation ipn(H) = A.e3) = A j from the quadratic form K. In fact.. 7=1 By Theorem 6. there exists an orthogonal matrix Q and a diagonal matrix D such that D — Q' AQ by Theorem 6.t\ + 6tit 2 .4t 2 t 3 with real coefficients.*( 77) is a diagonal matrix. and verify that / is computable from H by (8) using / in place of K. let Setting H(x.t2)t3) = (tut2.y) = x' Ay for all x. . and let A — */>-v(T7).33. . Furthermore. Since A is symmetric.vn} for V. Wn} be defined by ivj = Y J QijVi for 1 < 7 < n. T h e o r e m 6...20 p. For example. given the polynomial / ( ' i . the theory of symmetric bilinear forms and quadratic forms on finite-dimensional vector spaces over 77 is especially nice.. ^#(77) = D.w2.20. 6 Inner Product Spaces Any such polynomial is a quadratic form. The following theorem and its corollary are useful.5. where A 7 — A'* ~ ~ an \ai3 if t = j if i^j..434 Chap. Proof. ' 2 . and let H be a symmetric bilinear form on V.t3)A % ( t2 Quadratic Forms Over the Field R Since symmetric matrices over 7? are orthogonally diagonalizable (see Theorem 6. apply (9) to obtain H(ei. if 3 is the standard ordered basis for F". Let 0 = {wi.

t 2 ) = 5i 2 + 2r. then f{t\. then 0 can be chosen to be any orthonormal basis for V such that il>p(H) is a diagonal matrix. (10) we find an orthonormal basis 7 = {xi...2>. and suppose that x = X^iLi SiVi. . . 6. There exists an orthonormal basis 0 = {v\.Then fsi\ s2 (s1.x 2 } for R2 and scalars Ai and A2 such that if 1 ) G R2 and ( '} ] = SiXi + s 2 x 2 . . A 2 .x) = [<f>p(x)]tD[<f>(i(x)} = = I > f . . . 7=1 I Example 8 For the homogeneous real polynomial of degree 2 defined by / ( t i ..s2. Proof.sn)D K(x)=H(x. . as an expression involving .vn} for V such that tpp(H) is the diagonal matrix /Ai 0 D = \0 0 A2 0 ••• 0\ 0 An/ Let x G V.x) for all x G V.2 + 4 ll .=i then K(x) = J2XiS2. Thus the polynomial f(ti. . Let 77 be the symmetric bilinear form for which K{x) = H(x..i 2 . In fact. An (not necessarily distinct) such that if x G V and n x = ^2siVi.t2) = Xis2 + A2A. We can think of s\ and $2 as the coordinates of (ti...t2).Sec...t2) relative to 7. By Theorem 6. there exists an orthonormal basis 0 — {v\.36.v2. .8 Bilinear and Quadratic Forms 435 Corollary. Si G 7?.v2. Let K be a quadratic form on a finite-dimensional real inner product space V..vn} for V and scalars Ai. if 77 is the symmetric bilinear form determined by K.

= 7i(i) •°d W2 = 7s(-2)' Let 7 = {vi.36. V7(77) = Q*il>p{H)Q = QlAQ = Thus by the corollary to Theorem 6. let 0 be the standard ordered basis for R2.s2) = XiS2 + A2s2 interpreted as an expression involving the coordinates of a point relative to the new ordered basis 7. Then 7 is an orthonormal basis for R2 consisting of eigenvectors of A.*3 + 14 = 0. we find an orthogonal matrix Q such that Q'AQ is a diagonal matrix. Example 9 Let «S be the surface in R3 defined by the equation . • '6 0N y (| L The next example illustrates how the theory of quadratic forms can be applied to the problem of describing quadratic surfaces in R3. Then Next. Consequently. 6 Inner Product Spaces the coordinates of a point with respect to the standard ordered basis for R2. K(x) = 6sl + si for any x = s\v 1 + s2i>2 G R2. So g(si.2*2*3 + 2*1 + 3*i . s2) = 6s2 + s 2 . and let i4 = ^ ( H ) . (11) 2*f + 6*i*2 + 5*1 . setting Q J _ (2 1 we see that Q is an orthogonal matrix and QlAQ = '6 0 0 1 Clearly Q is also a change of coordinate matrix.436 Chap. . Let 77 denote the symmetric bilinear form corresponding to the quadratic form defined by (10). is transformed into a new polynomial g(si.2*2 .v2}. Hence. For this purpose. observe that Ai = 6 and A2 = 1 are the eigenvalues of A with corresponding orthonormal eigenvectors ".

we diagonalize K. We begin with the observation that the terms of second degree on the left side of (11) add to form a quadratic form K on R3: f< I t 2 = 2*2 + 6*i*2 + 5*1 .36. 6. and let A — ipp{H).8 Bilinear and Quadratic Forms 437 Then (11) describes the points of S in terms of their coordinates relative to 0. if x = siVi + s2v2 + S3M3.v2. Then The characteristic polynomial of A is (—1)(* — 2)(* — 7)*. and A3 = 0.2*2*3 + 2*§.Sec. and ^(H) = QtMH)Q ^ = QtAQ= /2 [0 \0 0 7 0 0\ 0). Next.v3} and 1 vTo Q = 0 3 \VTo 3 \/35 5 -1 -3 y/Ti 2 1 W As in Example 8. hence A has the eigenvalues Ai = 2. then K(x) = 2s 2 + 7s|. A2 = 7. Q is a change of coordinate matrix changing 7-coordinates to /3-coordinates. We find a new orthonormal basis 7 for R3 so that the equation describing the coordinates of < relative to 7 is simpler S than (11). Corresponding unit eigenvectors are and V3 = Vi = VTo\3J v2 = Set 7 = {vi. (12) . 0/ By the corollary to Theorem 6. the standard ordered basis for R3. Let 77 be the symmetric bilinear form corresponding to K.

by Theorem 2.S1 s2 '35 S3 VT4 5.438 Chap.s2 \/3o~ 3.s2 *i = —7= • + 10 V35 *2 = and *a = 3. t3) G R3.22 (p. Then. 6 Inner Product Spaces Figure 6.7 We are now ready to transform (11) into an equation involving coordinates relative to 7. I l l ) .S3 2s 3 \/\l . and therefore Si 3. Let x — (ti. t2. and suppose that x = S1V1 -r82v2+s$V3.

rewritten as . . 2000). Upper Saddle River.7 is a sketch of the surface S drawn so that the vectors V\. For a statement and proof of the multivariable version. It is a well-known fact that if / has a local extremum at a point p G R n .Sec. * n ) be a fixed real-valued function of n real variables for which all third-order partial derivatives exist and are continuous. if / has a local extremum at p = (pi. and the preceding equation.p2. z — 7 (xT v ' + 2 coincides with the surface S. we say that / has a local extremum at p. • • • iPn)-. . . 439 Combining (11). and V3.8 Bilinear and Quadratic Forms Thus 14s? 3*i — 2*2 — *3 = — = -V14s 3 . An Introduction to Analysis 2d ed. Likewise. / has a local minimum at p G Rn if there exists a 5 > 0 such that f(p) < f(x) whenever ||x— p\\ < 6. by William R. We recognize S to be an elliptic paraboloid. • The Second Derivative Test for Functions of Several Variables We now consider an application of the theory of quadratic forms to multivariable calculus—the derivation of the second derivative test for local extrema of a function of several variables. . respectively. then x G S if and only if 2s^ + lsA2 . consult. For. then for any i = 1 . Figure 6. the graph of the equation.J. The reader is undoubtedly familiar with the one-variable version of Taylor's theorem. n. We assume an acquaintance with the calculus of functions of several variables to the extent of Taylor's theorem.\/Us3 + 14 = 0 or s3 = ^ s ? + — &\ + Vu. if we draw new axes x'. • • •. for example. Wade (Prentice Hall. N. For practical purposes. we conclude that if x G R3 and x = Sifi + S2^2 + S3M3. n the . 2 . the scale of the z' axis has been adjusted so that the figure fits the page. 2 . If / has either a local minimum or a local maximum at p. Let z — /(*i. v2 and V3 are oriented to lie in the principal directions.'\2 ^iy') 14. and z' in the directions of V\. . . A point p G R n is called a critical point of / if df(p)/dti = 0 for i = 1 . . . Consequently. then p is a critical point of / ..v2. *2. 6.y'. (12). The function / is said to have a local maximum at a point p G R n if there exists a 6 > 0 such that f(p) > f(x) whenever \\x — p|| < 8.

.pi-\. These partials determine a matrix A(p) in which the row i. 1. Let p = (p\. If p ^ 0.pn +*n) .. Let / ( t i . But critical points are not necessarily local extrema. (d) If rank(A(p)) < n and A(p) does not have both positive and negative eigenvalues. . (a) If all eigenvalues of A(p) arc positive. Note that if the thirdorder partial derivatives of / are continuous. (b) If all eigenvalues of A(p) are negative. 2. then f has no local extremum at p (p is called a saddle-point of f)..440 Chap. (c) 7f A(p) has at least one positive and at least one negative eigenvalue. by an elementary single-variable argument.f(p)- ... . . dt Thus p is a critical point of / .t2 The following facts are easily verified.p2. and let A(p) be the Hessian of f at p. * 2 .. The partial derivatives of g at 0 are equal to the corresponding partial derivatives of / at p.pi+i.. .. column j entry is d2f{p) (dti)(dtj)This matrix is called the Hessian matrix of / at p. .pn) has a local extremum at * = pi.t n ) = /(*i +pl.* 2 . . 0). 0 .. Proof.pn) be a critical point of f. 3... and hence A(p) is a symmetric matrix. all of the eigenvalues of A(p) are real. 0 is a critical point of g..t.p2.. Theorem 6.. +p2..37 (The Second Derivative Test). . In this case. then the mixed second-order partials of / at p are independent of the order in which they are taken. 6 Inner Product Spaces function <f>i defined by 4>i(t) = f(pi. then the second derivative test is inconclusive. So. then f has a local minimum at p. *n) be a real-valued function in n real variables for which all third-order partial derivatives exist and arc continuous. . we may define a function g: Rn —• 77 by 0(*i. . .. The second-order partial derivatives of / at a critical point p can often be used to test for a local extremum at p. The function / has a local maximum [minimum] at p if and only if g has a local maximum [minimum] at 0 — ( 0 . df(p) Ok d&ipi] = 0.. then f has a local maximum at p..

.ta. ..Sec..«./ W + EdU T^ ^' + S E ^(dti)(dtj) *z*j+5(*i... K 1 y 2 (15) QtA{p)Q = is a diagonal matrix whose diagonal entries are the eigenvalues of A(p)..20 (p. .vn} be the orthogonal basis for Rn whose ith vector is the ith column of Q.^{dt^dt. we may assume without loss of generality that p = 0 and f(p) = 0...*„) T 2 7=1 1 n a 2 /(fl) *i*j + 5(*i.8 Bilinear and Quadratic Forms 441 In view of these facts..*n (14) Let 7f: R n —• 77 be the quadratic form defined by (ti\ *2 v»/ 77 be the symmetric bilinear form corresponding to K. Now we apply Taylor's theorem to / to obtain the first-order approximation of / around 0. (ti. ) . 6. Then Q is the change of coordinate matrix changing 7coordinates into /^-coordinates.* n ) lim ? = 0. and 0 be the standard ordered basis for R n . Since A(p) is symmetric. We have d2f(0) /(*.*2. 384) implies that there exists an orthogonal matrix Q such that /Ai 0 \0 0 A2 0 0\ 0 XnJ d2f(0) "Cjtj.33 ' * ^ 7 (77) = QlMH)Q = ^QtA(p)Q = 0 0 0 T/ . .. • •.*n)> 2^(^)(a*j (13) where 5 is a real-valued function on Rn such that 5(x) lim x-»0 ||x[| 2 5(*i. Let 7 = {v\. .. Theorem 6..*n)-»0 *i + *i + r.v2. * . It is easy to verify that ip@(H) — $A(p). . .*2. • • • ..* 2 .. and by Theorem 6.

By a similar argument using the right inequality in (17). by (13) \S(x)\<e\\x\ K{x) . Then ^A. Substituting x = svi and x = SVJ into the left inequality and the right inequality of (17). . Then ^Aj . Next. This establishes (a) and (b) of the theorem. Then. and so / has a local minimum at 0. suppose that A(p) has both a positive and a negative eigenvalue. \f(x)-K(x)\ and hence = Chap. we have < S. Let s be any real number such that 0 < |s| < 6. so / has neither a local maximum nor a local minimum at 0. Thus / attains both positive and negative values arbitrarily close to 0. /(0) = 0 < E ( i A .y^Xn 2 7=1 :i6) Combining these equations with (16).e\\x\\2 < f(x) < K{x) + f ||x| Suppose that x = > SiVt. Consider any x G R n and (15).e V s 2 < / ( x ) . we obtain 7=1 ^ ' i=l > ' Now suppose that all eigenvalues of A(p) are positive.f > 0 for all i.442 Suppose that A(p) is not the zero values. Then i=\ Z* and 1 K(x) = . + e)s2 < 0 = f(0). 7=1 ^ ' Thus f(0) < / ( x ) for ||x|| < 6. we have that if all of the eigenvalues of A(p) are negative. there ||x|| < 6. Choose e > 0 such that e < exists S > 0 such that for any x G |S(x)| < e||x|| 2 . then / has a local maximum at 0. say. . by the left inequality in (17). By (14). . we obtain f(0) = 0<(±Xi-e)s2<f(sVi) and f(SVj) < (JA. This establishes (c). respectively. Then |Aj|/2 for all R n satisfying such that 0 < A(p) Aj ^ 0 < ||x|| has nonzero eigen0. 6 Inner Product Spaces matrix.r > 0 and ^Xj + e < 0. and hence. Aj > 0 and Aj < 0 for some i and j.

vr.. negative. Each such form has a diagonal matrix representation in which the diagonal entries may be positive.wr.. negative. Without loss of generality. or zero. We prove the law and apply it to describe the equivalence classes of congruent symmetric real matrices..w2. That is. This result is called Sylvester's law of inertia. . respectively. and zero. Let 77 be a symmetric bilinear form on a finite-dimensional real vector space V.vn} and 7 = {wi.. Then the number of positive diagonal entries and the number of negative diagonal entries in any diagonal matrix representation of H are each independent of the diagonal representation.. . and A(p) = '2 0N 0 0 However..4 and g{tiM) = tl+4 at p = 0. .8 Bilinear and Quadratic Forms 443 To show that the second-derivative test is inconclusive under the conditions stated in (d)..wn}. We confine our analysis to symmetric bilinear forms on finite-dimensional real vector spaces. we show that the number of entries that are positive and the number that are negative are unique. .v2. 6.. It suffices to show that both representations have the same number of positive entries because the number of negative entries is equal to the difference between the rank and the number of positive entries. Let 0 = {v\.vp.* 2 ) = 4 . the function has a critical point at p. Without loss of generality. / does not have a local extremum at 0. We can therefore define the rank of a bilinear form to be the rank of any of its matrix representations..38 (Sylvester's Law of Inertia). we may assume that 0 and 7 are ordered so that on each diagonal the entries are in the order (of positive... Proof.... We suppose that p •£ q and arrive at a contradiction.Sec. consider the functions /(*i..wq. Although these entries are not unique. assume that p < q. If a matrix representation is a diagonal matrix... I Sylvester's Law of Inertia Any two matrix representations of a bilinear form have the same rank because rank is preserved under congruence.. In both cases. Theorem 6. Suppose that 0 and 7 are ordered bases for V that determine diagonal representations of 77. Let p and q be the number of positive diagonal entries in the matrix representations of 77 with respect to 0 and 7. whereas g has a local minimum at 0.... then the rank is equal to the number of nonzero diagonal entries of the matrix. they are independent of the choice of diagonal representation.

vn}). So there exists a nonzero vector VQ such that vo ^ span({?. so that Uj = 0.^aiVi\=Yla>2jH{v3. 6 Inner Product Spaces where r is the rank of 77 and n = dim(V).vo) > 0. H(v0. Thus In 7t \ T t H(vo. Similarly.M. It is easily verified that L is linear and rank(L) < p + r — q.v3)= y=i 7=1 / j=i Furthermore. vr+2. H(v0.v0) = H E ^ ^ ' E / \j=l 7=1 w r E oJ/T(« i . 6j = 0 for q + 1 < i < r. W J ) = E />J^(M'J.Vj) = 0 for i < p and H{vo.Vj) > 0 and H(v{). Since VQ is not in the span of {vr+\. j= P +i = E 6 ? H ^ ' .vp).Vi) = 0..Wi) = 0 for q < i < r..t.Mo) < 0 and H(vo. and signature are called the invariants of the bilinear form because they are invariant with respect to matrix representations. j=i ^2ajH(v3.wq+i).Vi). j )<0.444 u• Chap. But for i < p. These same terms apply to the associated quadratic form. Since v0 G N(L). t'r+2i • • • > Mn}.... We conclude that p — q. it follows that at ^ 0 for some p < i < r.v2). H(x. The three terms rank. Suppose that >o = ^a3v3 j=i For any i < p. The difference between the number of positive and the number of negative diagonal entries in a diagonal representation of a symmetric bilinear form is called the signature of the form. / j=l j=P+l So 77(fn. The number of positive diagonal entries in a diagonal representation of a symmetric bilinear form on a real vector space is called the index of the form.vi).Vi) = 77 I E a i v i J v * I \j-i J = =E'VWi. H{x.. it follows that 77(t'0. but vo G N(L). .. index. Notice that the values of any two of these invariants determine the value of the third. Hence nullity(L) > n — (p + r — q) > n — r.. 1 Definitions. which is a contradiction... we have H{vi.iH{vi.Vi) i-i = a..Vo)=Hl^a3v3. H{x..wr)). Let L: V — RP+r t be the mapping * defined by L(x) = (H(x.J)>0. r +i. H(x.

Corollary 2. 6. Therefore the rank of K is 2.8 Bilinear and Quadratic Forms Example 10 445 The bilinear form corresponding to the quadratic form K of Example 9 has a 3 x 3 diagonal matrix representation with diagonal entries of 2. We can state this result as the matrix version of Sylvester's law. Corollary 1 (Sylvester's Law of Inertia for Matrices). then they are both congruent to the same diagonal matrix. Two real symmetric n x n matrices are congruent if and only if they have the same invariants. As before. Let A be an n x n real symmetric matrix. The number of positive diagonal entries of D is called the index of A. and signature of K are each 2. Then the number of positive diagonal entries and the number of negative diagonal entries in any diagonal matrix congruent to A is independent of the choice of the diagonal matrix. and 0. the index of K is 1.y) = x2 — y2 on R2 with respect to the standard ordered basis is the diagonal matrix with diagonal entries of 1 and — 1. suppose that A and B are nx n symmetric matrices with the same invariants. Definitions.32. Therefore the rank.Sec. and the signature of K is 0. the rank. Any two of these invariants can be used to determine an equivalence class of congruent real symmetric matrices. The difference between the number of positive diagonal entries and the number of negative diagonal entries of D is called the signature of A. If A and B are congruent n x n symmetric matrices. Conversely. Let A be a real symmetric matrix. • Example 11 The matrix representation of the bilinear form corresponding to the quadratic form K(x. and signature of a matrix are called the invariants of the matrix. Therefore Sylvester's law of inertia tells us that D and E have the same number of positive and negative diagonal entries. . index. Let D and E be diagonal matrices congruent to A and B. Let A be a real symmetric matrix. index. we can apply Sylvester's law of inertia to study this relation on the set of real symmetric matrices. and suppose that D and E are each diagonal matrices congruent to A By Corollary 3 to Theorem 6. Proof. • Since the congruence relation is intimately associated with bilinear forms. and let D be a diagonal matrix that is congruent to A.y) = xlAy with respect to the standard ordered basis for R n . A is the matrix representation of the bilinear form 77 on Rn defined by H(x. 7. and the values of any two of these invariants determine the value of the third. and it follows that they have the same invariants.

446

Chap. 6 Inner Product Spaces

respectively, chosen so that the diagonal entries are in the order of positive, negative, and zero. (Exercise 23 allows us to do this.) Since A and B have the same invariants, so do D and E. Let p and r denote the index and the rank, respectively, of both D and E. Let di denote the ith diagonal entry of D, and let Q be the n x n diagonal matrix whose ith diagonal entry qi is given by ( 1 -7= \ di 1 \l-di 1 Then Q DQ — Jpr, where Jpr — I p O *r—p o
t

if 1 < i < p if p < i < r if r < i.

Qi = S

It follows that A is congruent to Jpr- Similarly, B is congruent to Jpr, and hence A is congruent to B. [\ The matrix Jpr acts as a canonical form for the theory of real symmetric matrices. The next corollary, whose proof is contained in the proof of Corollary 2, describes the role of Jpr. Corollary 3. A real symmetric n x n matrix A has index p and rank r if and only if A is congruent to Jpr (as just defined). Example 12 Let A = 1 1 3 ' \ 1 2 1 ' 3 1 1/ I 2 1 77 = 1 2 3 2 \l 2 1 and C =

We apply Corollary 2 to determine which pairs of the matrices A, B, and C are congruent. The matrix A is the 3 x 3 matrix of Example 6, where it is shown that A is congruent to a diagonal matrix with diagonal entries 1, 1, and —24. Therefore, A has rank 3 and index 2. Using the methods of Example 6 (it is not necessary to compute Q), it can be shown that B and C are congruent, respectively, to the diagonal matrices and

\ x- -«

Sec. 6.8 Bilinear and Quadratic Forms

447

It follows that both A and C have rank 3 and index 2, while B has rank 3 and index 1. We conclude that A and C are congruent but that B is congruent to neither A nor C. • EXERCISES 1. Label the following statements as true or false. (a) (b) (c) (d) (e) (f) (g) (h) (i) Every quadratic form is a bilinear form. If two matrices are congruent, they have the same eigenvalues. Symmetric bilinear forms have symmetric matrix representations. Any symmetric matrix is congruent to a diagonal matrix. The sum of two symmetric bilinear forms is a symmetric bilinear form. Two symmetric matrices with the same characteristic polynomial are matrix representations of the same bilinear form. There exists a bilinear form 77 such that H(x,y) ^ 0 for all x and yIf V is a vector space of dimension n, then dim(B(V)) = 2n. Let 77 be a bilinear form on a finite-dimensional vector space V with dim(V) > 1. For any x G V, there exists y G V such that y^0,butH(x,y) = 0. If 77 is any bilinear form on a finite-dimensional real inner product space V, then there exists an ordered basis 0 for V such that ijjp(H) is a diagonal matrix.

(j)

2. Prove properties 1, 2, 3, and 4 on page 423. 3. (a) Prove that the sum of two bilinear forms is a bilinear form. (b) Prove that the product of a scalar and a bilinear form is a bilinear form. (c) Prove Theorem 6.31. 4. Determine which of the mappings that follow are bilinear forms. Justify your answers. (a) Let V = C[0,1] be the space of continuous real-valued functions on the closed interval [0,1]. For / , g G V, define H(f,g)= f Jo f(t)g(t)dt.

(b) Let V be a vector space over F , and let J G B(V) be nonzero. Define 77: V x V ^ F by 77(x, y) = [J(x, y)}2 for all x, y G V.

448

Chap. 6 Inner Product Spaces (c) Define 77: 77 x R -> R by II{h,t2) = *j + 2*2. (d) Consider the vectors of R2 as column vectors, and let 77: R2 —* R be the function defined by H(x,y) = det(x,y), the determinant of the 2 x 2 matrix with columns x and y. (e) Let V be a real inner product space, and let 77: V x V — 7? be the » function defined by H(x,y) = (x.y) for x, y G V. (f) Let V be a complex inner product space, and let 7/: V x V — C * be the function defined by H(x,y) = (x,y) for x.y G V.

5. Verify that each of the given mappings is a bilinear form. Then compute its matrix representation with respect to the given ordered basis 0. (a) H: R3 x R3 -> R, where = a\bi — 2a\b2 + a 2 ^i — 0363 and

(b)

Let V = M2x2(77) and 0 = 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

Define 77: V x V - R by 77(A B) = tr(A) • tr(77). (c) Let 0 = {cos*,sin*,cos2t,sin21}. Then 0 is an ordered basis for V = span(tf), a four-dimensional subspace of the space of all continuous functions on 7?. Let H: V x V 77 be the function defined by H(f,g) = f'(0)-g"(0).
2 6. Let 77: R

R be the function defined by = a i62 + a2b\ for G R'"

// (a) (b)

Prove that 77 is a bilinear form. Find the 2 x 2 matrix A such that 77(x. y) = x' Ay for all x, y G R2.

For a 2 x 2 matrix A7 with columns x and y, the bilinear form 77(A7) = H(x,y) is called the p e r m a n e n t of M. * 7. Let V and W be vector spaces over the same field, and let T: V — W be a linear transformation. For any 77 G #(W), define f (77): V x V - » F by f(H)(x,y) = H(T(x),T(y)) for all x,y G V. Prove the following results.

Sec. 6.8 Bilinear and Quadratic Forms (a) If 77 G B(W), then f (77) G £(V). (b) T : #(W) — B(\f) is a linear transformation. > (c) If T is an isomorphism, then so is T. 8. Assume the notation of Theorem 6.32.

449

(a) Prove that for any ordered basis 0, ipp is linear. (b) Let 0 be an ordered basis for an n-dimensional space V over F , and let 4>p: V — Fn be the standard representation of V with respect * to 0. For A G M n X n (F), define 77: V x V -> F by H(x,y) = [(f>0(x)\tA[(j)fi{y)]. Prove that 77 G #(V). Can you establish this as a corollary to Exercise 7? (c) Prove the converse of (b): Let 77 be a bilinear form on V. If A = MH), then H(x,y) = [M^Y^Mv)]9. (a) Prove Corollary 1 to Theorem 6.32. (b) For a finite-dimensional vector space V, describe a method for finding an ordered basis for <3(V). 10. Prove Corollary 2 to Theorem 6.32. 11. Prove Corollary 3 to Theorem 6.32. 12. Prove that the relation of congruence is an equivalence relation. 13. The following outline provides an alternative proof to Theorem 6.33. (a) Suppose that 0 and 7 are ordered bases for a finite-dimensional vector space V, and let Q be the change of coordinate matrix changing 7-coordinates to /^-coordinates. Prove that <f>p — LQ^, where 4>& and 0 7 are the standard representations of V with respect to 0 and 7, respectively. (b) Apply Corollary 2 to Theorem 6.32 to (a) to obtain an alternative proof of Theorem 6.33. 14. Let V be a finite-dimensional vector space and 77 G #(V). Prove that, for any ordered bases 0 and 7 of V, rank(V^(77)) = rank(?^7(77)). 15. Prove the following results. (a) Any square diagonal matrix is symmetric. (b) Any matrix congruent to a diagonal matrix is symmetric. (c) the corollary to Theorem 6.35 16. Let V be a vector space over a field F not of characteristic two, and let 77 be a symmetric bilinear form on V. Prove that if K(x) = 77(x, x) is the quadratic form associated with 77, then, for all x, y G V, H(x,y) = -[K(x + y)-K(x)-K(y)}.

450

Chap. 6 Inner Product Spaces

17. For each of the given quadratic forms i ^ o n a real inner product space V, find a symmetric bilinear form 77 such that K(x) = 77(x, x) for all x G V. Then find an orthonormal basis 0 for V such that ipp(H) is a diagonal matrix. (a) K: R2 (b) K: R2 (c) K: R3 R defined by K ' t l * *2. "J " ' -2*^+4* 1 * 2 + *|

77 defined by K r1 J = lt\ - 8*i*2 + t\ R defined by K (u\ *2 = 3*? + Zt\ + 3*§ - 2*i*3 VW W

18. Let S be the set of all (*i, i 2 , *3) G R3 for which 3*? + St2, + 3*1 - 2*1*3 + 2v^(*i + *3) + 1 = 0. Find an orthonormal basis 0 for R3 for which the equation relating the coordinates of points of S relative to 0 is simpler. Describe < S geometrically. 19. Prove the following refinement of Theorem 6.37(d). (a) If 0 < rank(^4) < n and A has no negative eigenvalues, then / has no local maximum at p. (b) If 0 < rank(^4) < n and A has no positive eigenvalues, then / has no local minimum at p. 20. Prove the following variation of the second-derivative test for the case n = 2: Define D = (a) (b) (c) (d) If If If If D D D D > > < = \d2f{p)' \d2f(P)] dt\ &i J \d2f(P)] n2 0*10*2

0 and d2f(p)/dt2 > 0, then / has a local minimum at p. 0 and d2f(p)/dt2 < 0, then / has a local maximum at p. 0, then / has no local extremum at p. 0, then the test is inconclusive.

Hint: Observe that, as in Theorem 6.37, D = det(A) — AiA2, where Ai and A2 are the eigenvalues of A. 21. Let A and E be in M n x n ( F ) , with E an elementary matrix. In Section 3.1, it was shown that AE can be obtained from A by means of an elementary column operation. Prove that EtA can be obtained by means of the same elementary operation performed on the rows rather than on the columns of A. Hint: Note that ElA = (AtE)t.

Sec. 6.9 Einstein's Special Theory of Relativity

451

22. For each of the following matrices A with entries from R, find a diagonal matrix D and an invertible matrix Q such that QlAQ = D.

#m£ /or ^ : columns.

Use an elementary operation other than interchanging

23. Prove that if the diagonal entries of a diagonal matrix are permuted, then the resulting diagonal matrix is congruent to the original one. 24. Let T be a linear operator on a real inner product space V, and define H: V x V -> R by H(x, y) = (x, T(y)) for all x, y € V. (a) (b) (c) (d) Prove that H is a bilinear form. Prove that H is symmetric if and only if T is self-adjoint. What properties must T have for H to be an inner product on V? Explain why H may fail to be a bilinear form if V is a complex inner product space.

25. Prove the converse to Exercise 24(a): Let V be a finite-dimensional real inner product space, and let H be a bilinear form on V. Then there exists a unique linear operator T on V such that H(x, y) = (ar, T(y)) for all x, y e V , Hint: Choose an orthonormal basis (3 for V, let A = ipp(H), and let T be the linear operator on V such that [T]^ = A. Apply Exercise 8(c) of this section and Exercise 15 of Section 6.2 (p. 355). 26. Prove that the number of distinct equivalence classes of congruent real symmetric matrices is (n + l)(n + 2) nxn

6.9*

EINSTEIN'S SPECIAL THEORY OF RELATIVITY

As a consequence of physical experiments performed in the latter half of the nineteenth century (most notably the Michelson Morley experiment of 1887), physicists concluded that the results obtained in measuring the speed of light are independent of the velocity of the instrument used to measure the speed of light. For example, suppose that while on Earth, an experimenter measures the speed of light emitted from the sun and finds it to be 186,000 miles per second. Now suppose that the experimenter places the measuring equipment in a spaceship that leaves Earth traveling at 100,000 miles per second in a direction away from the sun. A repetition of the same experiment from the spaceship yields the same result: Light is traveling at 186,000 miles per second

452

Chap. 6 Inner Product Spaces

relative to the spaceship, rather than 86,000 miles per second as one might expect! This revelation led to a new way of relating coordinate systems used to locate events in space—time. The result was Albert Einstein's special theory of relativity. In this section, we develop via a linear algebra viewpoint the essence of Einstein's theory.

S' Figure 6.8 The basic problem is to compare two different inertial (nonaccelerating) coordinate systems S and S' in three-space (R3) that are in motion relative to each other under the assumption that the speed of light is the same when measured in either system. We assume that S' moves at a constant velocity in relation to S as measured from S. (See Figure 6.8.) To simplify matters, let us suppose that the following conditions hold: 1. The corresponding axes of S and S' (x and x', y and y', z and z') are parallel, and the origin of S' moves in the positive direction of the x-axis of S at a constant velocity v > 0 relative to S. 2. Two clocks C and C are placed in space—the first stationary relative to the coordinate system S and the second stationary relative to the coordinate system S'. These clocks are designed to give real numbers in units of seconds as readings. The clocks are calibrated so that at the instant the origins of S and S" coincide, both clocks give the reading zero. 3. The unit of length is the light second (the distance light travels in 1 second), and the unit of time is the second. Note that, with respect to these units, the speed of light is 1 light second per second. Given any event (something whose position and time of occurrence can be described), we may assign a set of space-time coordinates to it. For example,

Sec. 6.9 Einstein's Special Theory of Relativity if p is an event that occurs at position

453

relative to S and at time t as read on clock C, we can assign to p the set of coordinates M y z W This ordered 4-tuple is called the space—time coordinates of p relative to S and C. Likewise, p has a set of space-time coordinates (x'\ y z v«7 relative to S' and C. For a fixed velocity v, let Jv: R4 /x\ y T, z W where (x\ y z fx'\ y z R4 be the mapping defined by fx'\ y z V)

and

are the space—time coordinates of the same event with respect to 5 and C and with respect to S' and C", respectively. Einstein made certain assumptions about T„ that led to his special theory of relativity. We formulate an equivalent set of assumptions. Axioms of the Special Theory of Relativity (R 1) The speed of any light beam, when measured in either coordinate system using a clock stationary relative to that coordinate system, is 1.

454 (R 2) The mapping T„: R1 (R 3) If

Chap. 6 Inner Product Spaces R4 is an isomorphism.

/x\ T, y — then y' = y and z' = z. (R.4) If (x\ T, l)\ -1 V) ,'\ - k u V and

(x'\ y z V)

/x\ Tr \t)

(x"\ = y" z" V")

then x" = x' and t" - t'. (R 5) The origin of S moves in the negative direction of the rr'-axis of Sf at the constant velocity — v < 0 as measured from S'. Axioms (R 3) and (R 4) tell us that for p 6 R4, the second and third coordinates of T„(p) are unchanged and the first and fourth coordinates of Tv(p) are independent of the second and third coordinates of p. As we will see, these five axioms completely characterize T t ,. The operator Tv is called the Lorentz t r a n s f o r m a t i o n in direction :/;. We intend to compute T„ and use it to study the curious phenomenon of time contraction. Theorem 6.39. On R4, the following statements are true. (a) Tv(ei) = e-i for i = 2,3. (b) span({e2, e.3}) is T,,-invariant. (c) span({f'i,e4}) is Tv-invariant. (d) Both span({r; 2l fi 3}) and span({/?i,C4}) are T*-/nvariant. (e) T*v(('i) = et fori = 2,3. Proof, (a) By axiom (R 2), (°) 0 T„ 0 w (o\ 0 0 w

and hence, by axiom (R 4), the first and fourth coordinates of

T, w

V - "M " f

Sec. 6.9 Einstein's Special Theory of Relativity are both zero for any a,b € R. Thus, by axiom (R 3), (°) 1 T, 0 W (0\ 1 0 v^ (0\ 0 Tv 1 vO> /0\ 0 1 v^y

455

and

The proofs of (b), (c), and (d) are left as exercises. (e) For any j / 2, (T*,(e2), ej) - (e2, T ^ e , ) ) = 0 by (a) and (c); for j = 2, (Tl{e2),ej) = <e 2 ,T„(e 2 )) = (e 2 ,e 2 ) = 1 by (a). We conclude that T*(e 2 ) is a multiple of e 2 (i.e., that T*(e 2 ) — ke.2 for some k € R). Thus, 1 = (e 2 ,e 2 ) = (e 2 ,T u (e 2 )) = (T*,(e2),e2) = (fcc2,e2) = k, and hence T*(e 2 ) = e 2 . Similarly, T*(e 3 ) = 63. I

Suppose that, at the instant the origins of S and S' coincide, a light flash is emitted from their common origin. The event of the light flash when measured either relative to S and C or relative to S' and C has space time coordinates M 0 0 w Let P be the set of all events whose space—time coordinates /x\ y z w relative to S and C are such that the flash is observable from the point with coordinates

(as measured relative to S) at the time t (as measured on C). Let us characterize P in terms of x, y, z, and t. Since the speed of light is 1, at any time t > 0 the light flash is observable from any point whose distance to the origin of S (as measured on S) is t • 1 = t. These are precisely the points that lie on the sphere of radius t with center at the origin. The coordinates (relative to

456

Chap. 6 Inner Product Spaces

S) of such points satisfy the equation x2 + y2 + z2 — t2 — 0. Hence an event lies in P if and only if its space time coordinates

[t > 0)

relative to S and C satisfy the equation x2 + y2 + z2 ~ t2 = 0. By virtue of axiom (R 1), we can characterize P in terms of the space time coordinates relative; to S" and C similarly: An event lies in P if and only if, relative to S' and C, its space-time coordinates f.r'\ II w satisfy the equation {x')2 + (;/) 2 + {z')2 - {t')2 = 0. Let /l 0 0 1 A = 0 0 \0 0 0 0 1 0

(t > 0)

-l)

Theorem 6.40. If (U(u>), w) - 0 for some w € R4, then {VvLATv(w),w)=0. Proof. Let 'A (I 'I and suppose that (LA(W).W) — 0. CASE 1. t > 0. Since (L^w), w?) = x2 + y2 + z2 — t2, the vector w gives the coordinates of an event in P relative to S and C. Because /.A and Vl fx'\ y'
VJ

w—

GR 4 ,

Sec. 6.9 Einstein's Special Theory of Relativity

457

are the space—time coordinates of the same event relative to S' and C, the discussion preceding Theorem 6.40 yields {x'Y + {y'Y + (z'Y - {t'Y = 0. Thus (T*vLATv(w),w) = (lATv(w),Tv(w)) = (x')2 + (y')2 + (z')2 - (t/)2 = 0, and the conclusion follows. CASE 2. t < 0. The proof follows by applying case 1 to — w. 1 We now proceed to deduce information about Tv. Let n\ 0 wi = 0 W ( W 0 Wo. = 0 v-v

and

By Exercise 3, {wi,w 2 } is an orthogonal basis for span({ei,e4J), and span({ei,e4}) is T*L^T v -invariant. The next result tells us even more. T h e o r e m 6.41. There exist nonzero scalars a, and b such that (a) TlLATv(wi) = aw2. (b) T*vLATv(u)2) = bwi. Proof, (a) Because (\-A(wi),w\) — 0, (TlLAlv{w\),w1) = 0 by Theorem 6.40. Thus T*L^T w (u'i) is orthogonal to w\. Since span({ei,e4}) = span({w;i,tij2}) is T*LATv-invariant, T*\-ATv{ivi) must lie in this set. But {wi,u>2} is an orthogonal basis for this subspace, and so T^Lj^Jv{w{) must be a multiple of w2- Thus T*L^Tu(w;1) = aw2 for some scalar a. Since Tv and A are invertible, so is T,*L^T„. Thus a ^ 0, proving (a). The proof of (b) is similar to (a). I Corollary. Let Bv — [Tv]0, where ft is the standard ordered basis for R4. Then (a) B*ABV = A. (b) T^L^T. = LA. We leave the proof of the corollary as an exercise. For hints, see Exercise 4. Now consider the situation 1 second after the origins of S and S' have coincided as measured by the clock C. Since the origin of S' is moving along the #-axis at a velocity v as measured in S, its space time coordinates relative to S and C are

Thus. 0 W /o\ 0 0 w for some f > 0.458 Chap.41. (18) By the corollary to Theorem 6. 6 Inner Product Spaces Similarly.LATW = (LATV VI /o\ (0\ 0 0 u 0 i 0 VI VI Combining (19) and (20).TV 0 0 VI = -(tf\2 (20) (v\ 0 i 0 VI fv\ 0 0 VI fv\ 0 ) 0 VI T*LAT„ = (LA = v2-l. (19) T.u * . (v\ 0 0 VI But also /v\ 0 0 VI (v\ 0 5 0 VI = /tl\ fv\ 0 0 . [This fact . o 0 w o 0 0 \Vi=^I / \ or (21) (22) Next recall that the origin of S moves in the negative direction of the x'-axis of S' at the constant velocity — v < 0 as measured from S'. from (18) and (21). we conclude that v — 1 = —(f). t' = VT . we obtain M T. the space—time coordinates for the origin of S' relative to S' and C must be /o\ 0 0 VI for some t' > 0. Thus we have fv\ 0 T.

Suppose that the origin of S' coincides with the space vehicle and the origin of S coincides with a point in the solar system 0 1 0 0 0 0 1 0 0 1 \ VT^ Pwjfl — "v — . /0\ 0 T. consider the coordinate systems S and S' and clocks C and C that we have been studying. Suppose that an astronaut leaves our solar system in a space vehicle traveling at a fixed velocity v as measured relative to our solar system. To establish this result. 6. there exists a time t" > 0 as measured on clock C such that /0\ 0 T. Then { 1 o 0 —v \VT^ Time Contraction A most curious and paradoxical conclusion follows if we accept Einstein's theory.9 Einstein's Special Theory of Relativity 459 is axiom (R 5). 0 W f-vt"\ 0 0 V '" / (23) From (23).Sec. the time that passes on the space vehicle is only t\/\ — v2. from (23) and (24). it follows in a manner similar to the derivation of (22) that t" = hence. at the end of time t as measured on Earth.42. Let 0 be the standard ordered basis for R 4 . 1 second after the origins of S and S' have coincided as measured on clock C. It follows from Einstein's theory that. and Theorem 6.39. (25). Theorem 6. 0 VI -v Vl-v2 0 0 1 WT^v2! / \ (25) 1 vT^2' (24) The following result is now easily proved using (22).] Consequently.

6 Inner Product Spaces (stationary relative to the sun) so that the origins of 5" and S' coincide and clocks C and C read zero at the moment the astronaut embarks on the trip. As viewed from 5. it must follow that fvt\ 0 T. 0 VI Thus ( 1 Vl-v L' v\ft — &v — 2 (o\ 0 0 VI —V \ VT^ 1 0 0 0 1 0 o I 0 0 —v /vt\ 0 0 V^ /0\ 0 0 VI VvT From the preceding equation.460 Chap. as viewed from S'. we obtain t' = -vH v7!3^ + VT^ = f. . the space-time coordinates of the vehicle at any time t > 0 as measured by C are /vt\ 0 0 VI whereas. or (26) ty/l-v2. the space-time coordinates of the vehicle at any time f > 0 as measured by C are /0\ 0 0 VI But if two sets of space -time coordinates M 0 0 VI 0 0 VI and are to describe the same event.

For fi\ 0 W\ = 0 w show that (a) {w\. 4. (c).9 Einstein's Special Theory of Relativity 461 This is the desired result. A dramatic consequence of time contraction is that distances are contracted along the line of motion (see Exercise 9). Complete the proof of Theorem 6. Let c denote the speed of light relative to our chosen units of distance.40 for the case t < 0. Hints: (a) Prove that 0 0 0 1 0 = 0 0 1 0 0 Vi (p ( and u)2 — l 0 0 \ v-v B*ABV 0 0 -p) where a+ b p = —-— and a—b q = —-—. (26) becomes f=«.41. (b) span({ei. and (d) of Theorem 6. Prove the corollary to Theorem 6. 3. Let us make one additional point. .C4}). such as the mile and hour. Thus./i. ? EXERCISES 1. 2.W2} is an orthogonal basis for span({ei. or the kilometer and second.39.Sec. for an arbitrary set of units of distance and time. It is easily seen that if an object travels at a velocity v relative to a set of units. then it is traveling at a velocity v/c in units of light seconds per second.e4}) is T*LATw-invariant. Prove (b). 6. Suppose that we consider units of distance and time more commonly used than the light second and second.

where B. BV. S'. Why would we be surprised if this were not the case? 7. (i.. Consider three coordinate systems S.z'. Assuming that T r .:/. and prove that s/l-v* 0 ' 0 1 Xs/T^I f{)) 0 T„ 0 V'J (25) Hint: Use a technique similar to the derivation of (22)... then [T. (c) Apply Theorem 6.40 to M i w = o VI to show that p = 1. and C" such that C is stationary relative to S. Compute (By)'1.. is of the form given in Theorem 6.9).. and S" with the corresponding axes (j. 5.". Suppose that S' is moving past S at a velocity t'i > 0 (as measured on S). 6.42. all the origins of S. Show ( B r ) _ l = B{ r).462 Chap. Conclude that if S' moves at a negative velocity v relative to S. Suppose. S'. and C" is stationary relative to S". x'-.:/. and :r"-axes coincide...^ = B. S" is moving past S' at a velocity u2 > 0 (as measured on 5').. C is stationary relative to S'. and that there are three clocks C. prove that V3 = Note that substituting v2 = 1 in this equation yields V3 = 1. C.] . that when measured on any of the three clocks. 6 Inner Product Spaces (b) Show that q — 0 by using the fact that B*ABV is self-adjoint.A = BV2BVl). y.e. Assuming Einstein's special theory of .y": and z.. = T„2T.y'. and S" coincide at time 0. This tells \is that the speed of light as measured in 5 or S' is the same. 8.z") parallel and such that the x-.'. Suppose that an astronaut left Earth in the year 2000 and traveled to a star 99 light years away from Earth at 99% of the speed of light and that upon reaching the star immediately turned around and returned to Earth at the same speed. and S" is moving past S at a velocity V3 > 0 (as measured on . Derive (24).

and . Suppose that the vehicle is moving toward a fixed star located on the :r-axis of 5 at a distance b units from the origin of S. A paradox appears in that the astronaut perceives a time span inconsistent with a distance of b and a velocity of v. show that if the astronaut was 20 years old at the time of departure. 9. the distance from the astronaut to the star as measured by the astronaut (see Figure 6. the space—time coordinates of the star relative to S' and C are / b . b — tv x = VT=r . Earthlings (who remain "almost" stationary relative to S) compute the time it takes for the vehicle to reach the star as t = b/v. The paradox is resolved by observing that the distance from the solar system to the star as measured by the astronaut is less than b.vt \ 0 0 t-bv V/T^I (c) For . the space time coordinates of star relative to S and C are fb\ 0 0 VI (b) At time t (as measured on C). Explain the use of Exercise 7 in solving this problem.Sec.9 Einstein's Special Theory of Relativity 463 relativity. (a) At time t (as measured on C). If the space vehicle moves toward the star at velocity v. prove the following results.2 in the year 2200.v2 fv.9) is bJ\ . This result may be interpreted to mean that at time t! as measured by the astronaut. Recall the moving space vehicle considered in the study of time contraction. the astronaut perceives a time span of f = ty/1 — v2 = (b/v)y/l — v2. 6. Assuming that the coordinate systems S and S' and clocks C and C are as in the discussion of time contraction. we have x' = by/1 — v2 — fv. t — bv t = v T ^ ' . Due to the phenomenon of time contraction. then he or she would return to Earth at age 48.

In addition. One might intuitively feel that small relative changes in the coefficients of the system cause. invcrtible matrix since this is the case most freemently encountereel in applications. small relative errors in the solution. complex (or real). i . We now consider several examples of these types of errors. Such systems often arise in applications to the real world. computers introduce roundoff errors. Thus distances along the line of motion of the space vehicle appear to be contracted by a factor of s/l — v2.0.10* CONDITIONING AND THE RAYLEIGH QUOTIENT In Section 3. both m and n are so large that a computer must be used in the calculation of the solution.4.. is v.0) coordinates relative to S (d) Conclude from the preceding equation that (1) the speed of the space vehicle relative to the star. First. Thus two types of errors must be considered. we studied specific techniques that allow us to solve systems of linear equations in the form Ax — b. (2) the distance from Earth to the star. A system that has this property is called well-conditioned: otherwise. we assume that A is a square. and. in the collection of data since no instruments can provide completely accurate measurements. concentrating primarily on changes in 6 rather than on changes in the entries of A. The coefficients in the system are frequently obtained from experimental data. is b\/\ — v2.0) coordinates relative to S' *(star) S' Figure 6. as measured by the astronaut.0. 6 Inner Product Spaces (x'. where A is an m x n matrix and 6 is an m x 1 vector. experimental errors arise.9 (6. 6.464 Chap. the system is called ill-conditioned. in many cases. as measured by the astronaut. Second.

that is. 6. is considered large if the original solution is of the order 10 . the system $i + X2 = 5 X\ — X2 = 1 4. where b is the vector in the original system and b' is the vector in the modified system. say. This modified system has the solution 3.x2 = 1. Of course.10 Conditioning and the Rayleigh Quotient Example 1 Consider the system x\ + x2 = 5 X\ — X2 — 1. Thus we have 8b = We now define the relative change in b to be the scalar ||o"6||/||&||. we are really interested in relative changes since a change in the solution of. Most . but small if the original solution is of the order 106.b).4 in each coordinate of the new solution.Sec. ||6|| = y/(b. We use the notation 5b to denote the vector b' — b. More generally.0001. The solution to this system is 2/- 465 Now suppose that we change the system somewhat and consider the new system x\ + x2 = 5 xi . where || • || denotes the standard norm on C n (or R n ). 10.00005\ 1.99995/ ' We see that a change of 10 4 in one coefficient has caused a change of less than 10 .5 has the solution 3 + 6/2 2-8/2 Hence small changes in b introduce small changes in the solution.2 .

Se> a small change in either line could greatly alter the point of intersection. se> the system is well-conditioned.00001x2 = 3. however. IN '26 and \6x\ as its solution.(105)//' i + (io5)/». the solution to the system. ll-rll while1 ll^l Zy/2 Thus the relative change in x is at least 104 times the relative change in 6! This system is very ill-conditioned.|.00001.00001 + e 5 is 2 . In this example. llfall = 10 5 >/275|/i| > 104|//. Similar definitions holel for the relative change in x. 6 Inner Product Spaces of what follows. The solution to the relateel system x\ + x2 = 3 X] + 1. • Example 2 Consider the system X\ X\ which has + + X2 = 1.466 Chap.G ) '26 0 Thus the relative change in x equals. • . that is. that the lines defineel by the two equations are nearly coincident. coincidentally. the relative change in b. Observe.y • Hence.00001a-2 = 3 3. (3 + (h/2)\ \2 ~ (h/2)J . is true for any norm.

) Definition. i=l Define the I j. The question of whether or not this maximum exists. .— | > \\ ~~ ' II T I ^ * It is easy to see that R(vi) = Aj. a 2 . Definition. p. so we have demonstrated the first half of the theorem. we may choose an orthonormal basis {vi. (See Exercise 24 of Section 6. 384) and 6. . 373. . there exist scalars oi. (Recall that by the lemma to Theorem 6.x) /||x|| 2 . H .Sec. . Intuitively.43. can be answered by the use of the so-called Rayleigh quotient. we have that max R(x) is the largest eigenvalue ofB and min R(x) is the smallest eigenvalue x^O xj^O ofB.20 (p. . as well as the problem of how to compute it. (Euclidean) norm of A by \\Ax\ \A\\ = max xjtO \\x\\ where x G Cn or x G R n . an such that X= Hence R(x) = BX. \\A\\ represents the maximum magnification of a vector by the matrix A. vn} of eigenvectors of B such that Bvi = Ajfj (1 < i < n). where Ai > A2 > • • • > ATO. 384). The Rayleigh quotient for x ^ 0 is defined to be the scalar R(x) = (Bx. . the eigenvalues of B are real. The second half is proved similarly.17. for x G F n .19 (p. For a self-adjoint matrix B G MnXn(F). . .10 Conditioning and the Rayleigh Quotient 467 To apply the full strength of the theory of self-adjoint matrices to the study of conditioning.1 for further results about norms. The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.) Now. Let B be an n x n self-adjoint matrix. 6. By Theorems 6. Theorem 6. we need the notion of the norm of a matrix. v 2 . X) \ E " = 1 aiXiVi' EJ=1 aJV> ^TaiVi. Proof. Let A be a complex (or real) n x n matrix.

equals y/X. we need the following lemma. I For many applications. which by the lemma are the eigenvalues of AA*. Proof. Example 3 Let . Apply A to both sides to obtain (AA*)(Ax) = \(Ax). we have that A is an eigenvalue of AA*. For any square matrix A.1 is an eigenvalue of its inverse. Recall that A is an eigenvalue of an invertible matrix if and only if A . where A is the smallest eigenvalue of A* A. Then there exists x ^ 0 such that A* Ax = \x. The proof of the converse is left as an exercise. Let B be the self-adjoint matrix A*A. For any square matrix A. where A is the largest eigenvalue of A* A.43 that p | | 22 = A. Then ||J4 _1 || = l/\/A. 0< |Ar|| 2 (Ax. Proof. Let A be an eigenvalue of A* A. Ax) (A*Ax. Hence A and A* are not invertible. < £ # . in fact. Suppose now that A ^ O . in the case of vibration problems. For example. for x / 0. then A* A is not invertible. I Corollary 2. it is only the largest and smallest eigenvalues that are of interest. A is an eigenvalue of A* A if and only if A is an eigenvalue of AA*. We see the role of both of these eigenvalues in our study of conditioning. Proof. Observe that the proof of Corollary 1 shows that all the eigenvalues of A* A are nonnegative. Ne>w let Ai > A2 > • • • > ATl be the eigenvalues of A* A. Lemma. For our next result.x) x\ _ it follows from Theorem 6. so that A is also an eigenvalue of AA*.* « > . Let A be an invertible matrix.468 Chap. The proof of the converse is similar. the smallest eigenvalue represents the lowest frequency at which vibrations can occur. Since Ax ^ 0 (lest Xx = 0). Then \\A~~1 \\2 equals the largest eigenvalue of (A'1)*A'1 = (AA*)~\ which equals 1/A n . \\A\\ is finite and. and let A be the largest eigenvalue of B. If A = 0. Since. 6 Inner Product Spaces Corollary 1.

In fact. \\A\\-\\A-i\\ (¥b\\\^ \\Sx\\ < \\\b\\J The number \\A\\ • ||>1 -1 || is called the condition number of A and is denoted cond(yl). we may compute R(x) for the matrix B as 3 > R(x) = (Bx. For a given 8b. let 5x be the vector that satisfies A(x + 8x) = b + 8b. 3.Sec. and 0. x) 2(a2 + b2 + c2 -ab + ac + be) a2 + b2-r c2 Now that we know ||^4|| exists for every square matrix A. which holds for every x.10 Conditioning and the Rayleigh Quotient Then 469 The eigenvalues of B are 3. T h e o r e m 6. Therefore.44. 1 \\5b\\ ll<MI . < II&II/PII < WA-'w-im-wAw \b\\ = « . the only property needed to establish the inequalities above is that \\Ax\\ < \\A\\ • \\x\\ for all x. and Ax = b. where A is invertible and b ^ 0.„ .JI<H>ll (a) For any norm || • ||. It should be noted that the definition of cond(^4) depends on how the norm of A is defined. Assume in what follows that A is invertible. ||^4|| = \/3. . 6. and so 8x = A-1 (8b). Hence . we can make use of the inequality \Ax\ < \\A\\ • \\x\\.\\Ax\\ < \\A\\ • \\x\\ and Thus \8x\\ imi \\8x\\ = H ^ W I I < P " II • \W\. . b ^ 0. We summarize these results in the following theorem. There are many reasonable ways of defining the norm of a matrix./. For the system Ax — b.For any x=\b\^0. ( « ) . Then A(8x) — 8b. the following statements are true. . . Similarly (see Exercise 9). we have cond(A) .

In fact. and (b) follows from Corollaries 1 and 2 to Theorem 6. the situation is more complicated. Charles Cullen (Charles G. some situations in which a usable approximation of cond(^4) can be found. For example. 1 It is clear from Theorem 6. if a computer is used to find A~l. We can see immediately from (a) that if cond(^4) is close to 1. in practice. If there is an error 8A in the coefficient matrix of the system Ax — b. Cullen. PWS Publishing Co.470 Chap. Boston 1994. We have so far considered only errors in the vector b. An Introduction to Numerical Linear Algebra. A + 8A may fail to be invertible. it can be shown that a bound for the relative error in x can be given in terms of cond(A). The norm of A always equals the largest eigenvalue of A. (a) (b) (c) (d) (e) If Ax = b is well-conditioned. If cond(A) is large.4) 8x\ JM It should be mentioned that. Thus. it can be shown with some work that equality can be obtained in (a) by an appropriate choice of b and 8b.44 that cond(^4) > 1. 6 Inner Product Spaces (b) If || • || is the Euclidean norm. then cond(A) = \/Ai/A n . then Ax — b is well-conditioned. then the relative error in x may be small even though the relative error in b is large. where Ai and An are the largest and smallest eigenvalues. the estimate of the relative error in x is based on an estimate of cond(^4). It is left as an exercise to prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix. If cond(A) is large. then \\5. . one never computes cond(v4) from its definition. however. then a small relative error in b forces a small relative error in x. The norm of A equals the Rayleigh quotient. the computed inverse of A in all likelihood only approximates A~l. however. Moreover. 60) shows that if A + 8A is invertible. and the error in the computed inverse is affected by the size of cond(A).43.. Proof. then cond(^4) is small. EXERCISES 1. But under the appropriate assumptions. for it would be an unnecessary waste of time to compute A-1 merely to determine its norm. p. \\8A\ < cond(. then Ax = b is ill-conditioned. of A* A. If cond(^4) is small. cond(^4) merely indicates the potential for large relative errors. So we are caught in a vicious circle! There are. or the relative error in x may be large even though the relative error in b is small! In short. Label the following statements as true or false. For example. respectively. in most cases. Statement (a) follows from the previous inequalities.

Sec. Prove that if B is symmetric. 6. and cond(Z?). If cond(J4) = 100.4 -1 6||/||. Use (a) to determine upper bounds for \\x . and 0. Let B = Compute | B ||. then i / m n -8b.74. then A is an eigenvalue of A* A. Let B b e a symmetric matrix. Let A and A-1 be as follows: A= 6 13 13 29 V—17 . 4 1 0\ s) f1 (b) ( .43. 7. Prove that min R(x) equals the smallest x^O eigenvalue of B. and cond(^l). 6. = The eigenvalues of A are approximately 84.001.1. (a) Approximate ||i4||. ||6|| = 1.i i ) (e) 0 471 (a) % Tz 75 °\ l \° 4. 8. Prove that if A is an eigenvalue of AA*. 0.A _1 &|| (the relative error). \\8x\\ \A\\-\\A -ii v II&II J - . obtain upper and lower bounds for \\x — x||/||x||.1 ||. then ||Z?|| is the largest eigenvalue of B.\ \ < .3 8 / -vr -38 50. This completes the proof of the lemma to Corollary 2 to Theorem 6.A~lb\\ (the absolute error) and \\x . and ||6 — Ax\\ = 0.. ||A .2007.0588. 9.) (b) Suppose that we have vectors x and x such that Ax = b and ||6 — Ax\\ < 0. 5. Suppose that x is the actual solution of Ax = b and that a computer arrives at an approximate solution x. and A l j 3.10 Conditioning and the Rayleigh Quotient 2. Prove that if A is an invertible matrix and Ax — b. (Note Exercise 3. Compute the norms of the following matrices.

(c) Let V be an infinite-dimensional inner product space with an orthonormal basis {vi.where (3 is any orthonormal basis for V. Prove the left inequality of (a) in Theorem 6.. Let T be the linear operator on V such that T(vk) = kvk.11* THE GEOMETRY OF ORTHOGONAL OPERATORS By Theorem 6.44. The operator T is called a rotation if T is the identity on . Prove each of the following results.1. Thus. Definitions. We show that any orthogonal operator on a finite-dimensional real inner product space is the composite of rotations and reflections... Such is the aim of this section. ||At|| = i . 13.. (a) Let A and B be square matrices that are unitarily equivalent. Let A be an n x n matrix e)f rank r with the nonzero singular values o~i > cr2 > • • • > rr r . Prove that \\A\\ = \\B\\.v2. This material assumes familiarity with the results about direct sums developed at the end of Section 5.4) = 1 if and only if A is a scalar multiple of a unitary or orthogonal matrix. Define |T|| = max \T(x) Prove that ||T|| = ||[T]^||.2. 6 Inner Product Spaces 10. then cond(vl) = — . any rigid motion on a finite-dimensional real inner product space is the composite of an orthogonal operator and a translation. 11.472 Chap. (b) Let T be a linear operator on a finite-dimensional inner product space V. 386).}.7. to understand the geometry of rigid motions thoroughly. o~„ 6. (a) (b) ||A||=ai. we must analyze the structure of orthogonal ejperators. Prove that cond(.Prove that ||T|| (defined in (b)) does not exist. and familiarity with the definition and elementary properties of the determinant of a linear operator defined in Exercise 7 of Section 5. The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6. (c) If A is invertible (and hence r = n). Let T be a linear operator on a finite-dimensional real inner product space V.22 (p. 12.

and T(y) . Since T is orthogonal and A is an eigenvalue of T.b. (b) Let T: R3 -» R3 be defined by T(a. A = ±1.b) = (-0. T is called a rotation about W"1. If A = —1. The subspace W-1 is called the axis of rotation. Then V = span({x}).-c). Thus T is either a rotation or a reflection. The following theorem characterizes all orthogonal . Hence T is a reflection of R3 about W-1-. det(T) = — 1. and a real number 0 such that T(xi) = (cos0)xi + (sin0)x 2 . the xy-plane. If A = 1. The principal aim of this section is to establish that the converse is also true.11 The Geometry of Orthogonal Operators 473 V or if there exists a two-dimensional subspace W ol'V. any orthogonal operates on a finite-dimensional real inner product space is the composite of rotations and reflections. an orthonormal basis (3 = {xi. The operator T is called a reflection if there exists a one-dimensional subspace W of V such that T(x) = —x for all x G W and T(y) = y for all y G W-1. Then T(x) = -x for all x G W. In this context. det(T) = 1. and let W = span({ei}). of W and T(y) — y for all y G \N±. In this context. T is called a reflection ofV about W1.b. and hence T is a rotation. x 2 } ^ o r W. then T is the identity on V. Let T be a linear operator on a finite-dimensional real inner product space V. Note that in the first case.= {0}. so T is a reflection of V about V1.6). that is.e 2 }).Sec. Thus T is a reflection of R2 about W1 = span({e 2 }). and so T(x) = Ax for some A G R. Example 1 A Characterization of Orthogonal Operators on a One-Dimensional Real Inner Product Space Let T be an orthogonal operator on a one-dimensional inner product space V. Definitions. • Example 1 characterizes all orthogonal operators on a one-dimensional real inner product space. T(x 2 ) = (—sint9)xi + (cos0)x 2 . • Example 2 Some Typical Reflections (a) Define T: R2 -> R2 by T(a. and let W = span({e 3 }). Rotations are defined in Section 2.c) = (a. and in the second case.x for all x G W. 6. Choose any nonzero vector x in V.1 for the special case that V = R 2 . then T(x) — —x for all x G V.y for all y € W"1 = span({ei. Then T(x) = . the y-axis. It should be noted that rotations and reflections (or composites of these) are orthogonal operators (see Exercise 2). and T(y) = y for all 1 / 6 W 1 .

LA(f>p = <j>p\. then there exists a T-invariant subspace W of V such that 1 < dim(W) < 2. y 2 . 6 Inner Product Spaces operators on a two-dimensional real inner product space V. By Exercise 15 of Section 6. Moreover. the diagram in Figure 6. apply Theorem 2. If we then define W = 4>^l(Z).5.21 (p. det(Ti) = 1 and det(T 2 ) = . so is T. A complete description of the reflections of R2 is given in Section 6. As a consequence.10 .4. det(T) = det(T 2 )« det(Ti) = —1. Fix an ordered basis (3 = {y\. (S| We now study orthogonal operators on spaces of higher dimension. If Ti is a reflection on V and T 2 is a rotation on V. it suffices to show that there exists an L^-invariant subspace Z of R n such that 1 < dim(Z) < 2. . If T is a linear operator on a nonzero finite-dimensional real vector space V. The proof follows from Theorem 6. T is a reflection. T is a rotation if and only if det(T) = 1.474 Chap. Let <f>p: V — R n be the linear transformation defined by <f>p(yi) — e% for > i — 1 . Corollary. 104). For a rigorous justification.n. The Proof. and. where j3 is an orthonormal basis for V.) Theorem 6. The proof for TiT 2 is similar. Rn Figure 6.1 . 2 . Then <f>p is an isomorphism. composite of a reflection and a rotation on V is a reflection on V. Let T = T 2 Tj be the composite.23 (p. by Theorem 6. Let V be a two-dimensional real inner product space. 387) since all two-dimensional real inner product spaces are structurally identical. Furthermore. Since T 2 and Ti are orthogonal.45. Let T he an orthogonal operator on a two-dimensional real inner product space V.45. Lemma. it follows that W satisfies the conclusions of the lemma (see Exercise 13). the resulting isomorphism (j>p: V — R2 preserves inner products. as we have seen in Section 2. and let A = [T]p. Then T is either a rotation or a reflection.10 commutes. then by Theorem 6. . . that is. 2/n} for V. . V V Rn L ^ . » (See Exercise 8. • • •.2. and T is a reflection if and only if det(T) = — 1.45. Proof. Thus.

we conclude that Ax 1 — A1X1 — A2x2 and Ax2 — Ajx2 + A 2 xi.11 The Geometry of Orthogonal Operators 475 The matrix A can be considered as an n x n matrix over C and.x 2 }).. Similarly.?'A2)(xi + 1x2) = (\\X\ — A 2 x 2 ) + i(\\X2 4.A 2 xi). where Ai and A2 are real. Finally. So assume that the result is true whenever dim(V) < n for some fixed integer n > 1. Proof The proof is by mathematical induction on dim(V). (b) v = w 1 e w 2 e . m. . . . II T h e o r e m 6. W 2 . If dim(V) = 1. Since xi ^ 0 or x 2 7^ 0. . 2 . WTO} ofV such that (a) 1 < dim(Wi) <2 for i = 1 . Since U is a linear operator on a finite-dimensional vector space over C. . where xi and x 2 have real entries.46. 6. let Z = span({xi. Then there exists a collection of pairwise orthogonal T-invariant subspaces {Wi. Thus. setting far\ Xi = Vn) and x2 = b2 Vn) we have x = X\ + ix 2 . We may write A = Ai + iA2. Hence U(x) = Ax = (Ai -I. Note that at least one of Xi or x 2 is nonzero since x ^ 0. as such. and the preceding pair of equations shows that Z is L^-invariant. U(x) = ^4(xi + 1x2) = Ax\ + iAx2Comparing the real and imaginary parts of these two expressions for U(x). Thus 1 < dim(Z) < 2. Let T be an orthogonal operator on a nonzero finitedimensional real inner product space V. Let x G C n be an eigenvector corresponding to A. . the result is obvious. .e w T O .. the span being taken as a subspace of R n . . Z is a nonzero subspace. and lax +ih\ a2 + ib2 x = \an + ibn) where the a^'s and tVs are real.Sec. it has an eigenvalue A G C. . can be used to define a linear operator U on C n by U(i>) = Av.

m and W^ = W 2 e W 3 0 • • • 0 W m . Moreover. T is composed of rotations and reflections. . . . These fae.ts are established in the following result. if Tw. is a reflection is even or odd according to whether det(T) = 1 or det(T) = . By Exercise 14. (b) Let E = {x G V: T(x) = —x}. . Otherwise. T h e o r e m 6.„ be as in Theorem 0. Although the number of W.det(T w ."s for which Tw. a contradiction. is a reflection is zero or one according to whether elet(T) = 1 or dct(T) = —1. . Observe that. If Wi = V. . . very little can be said about the uniqueness e)f the decomposition of V in Theorem 6. we can always decompose V so that Tw. Otherwise. i/Tw. 3 . . 6 Inner Product Spaces Suppose elim(V) = n.2. the NA/j's. Unfortunately. = (-1 proving (a).'s for which Tw. But then. For otherwise. exists a nonzero x G W^ for which T(x) = — x. . there.46. by Exercise 15. is a reflection.46. W. Since dim(Wf) < n. Wi. is a reflection. then E is a T-invariant subspace of V. . . . orthogonal. If W = E . . (a) The number ofWi s for which Tw. is a reflection is not unique.46. . . in some sense. W m } of W^. . the result follows.we obtain a collection of pairwise orthogonal T-invariant subspaces {Wi. is a reflection are not unique.) dct(T w. W m } is pairwise. whether this number is even or odd is an intrinsic property of T. \Nj. Applying Example 1 and Theorem 6. Then. Thus. Proof. .476 Chap. V = Wi e VArf = Wi $ W 2 0W. Thus { W i . det(T) = det(T W l )..46 to Tw. . Let T.'s in the decomposition for which Twi is a reflection. . W 2 . W 2 .1 .. (a) Let r denote the number of W. the dimension of each Wi is either 1 or 2.2. For example. m . and the number of W. x G W. 3 . the number rn of Wj's. fi E C E 1 n E = {0}. (b) It is always possible to decompose V as in Theorem 6. W 2 .45 in the context of Theorem 6. then W is T-invariant. for each i — 1. .46 so that the number of Wj s for which Tw. is either a rotation e>r a reflection for each i = 2 . is a rotation. V. So by applying Theorem 6. Furthermore. . . Wj1 is T-invariant and the restriction of T to Wf is orthogonal. .. By the lemma.. . we may apply the induction hypothesis to T w i and conclude that there exists a collection of pairwise orthogonal T-invariant subspaces {Wi.such that 1 < dim(W<) < 2 for i = 2 . If E = {()}. . Wfc} of W such that W = Wi 0 W 2 0 • • • e Wfc and for 1 < i < k. the result is established. then dim(Wi) = 1. we conclude that the restriction of T to W./ {0}. is a reflection for at most one W.k. .47. there exists a T-invariant subspace Wi of V such that 1 < dim(W) < 2. and by Exercise 13(d) of Section 6. Tw.

.-0Wm. As in the proof of Theorem 6. T. is a rotation or a reflection according to whether Tw. . ..) JA =det(-l) = -1.) 1 . and (3r contains two vectors if p is even and one vector if p is odd. . W f c . an orthogonal operator can be factored as a product of rotations and reflections. (e) det(T) = 1 if Ti is a rotation for each i — 1 otherwise. T 2 . . then dim(Wfc+r) = 1 and = det([Tw.. is a rotation for i < m.. {Wi. If (3r consists of one vector.. (c) TiTj = TjT. and (e) are left as exercises. .47(b). is either a reflection or a rotation. Then there exists a collection { T i . and hence Tw. W2. For each i = 1 . is a rotation or a reflection. for all i and j. Thus Twfc+J. then det(T Wfc+i ) = det([Tw. T m } of orthogonal operators on V such that the following statements are true. 6. T. r. we can write V = W10W20. - where Xj G Wj for all j. (See Exercise 16. where Tw. .11 The Geometry of Orthogonal Operators 477 choose an orthonormal basis f3 for E containing p vectors (p > 0).T m . if any $ contains two vectors. It is easily shown that each T$ is an orthogonal operator on V. and V = Wi 0 W 2 © • • • 0 Wfc © • • • 0 Wfc+r Moreover. • • •.Sec. . m. .Xi-i + T(Xi) + X i+ i -I 1 xm. | As a consequence of the preceding theorem. . .. This establishes (a) and (b). (a) For each i. Let T be an orthogonal operator on a finite-dimensional real inner product space V. is a reflection by Theorem 6. 2 .. and we conclude that the decomposition in (27) satisfies the condition of (b). (d) T = T i T 2 . let Wk+i = span(/?i).: V — V + by Tj(xi + x 2 + = Xi+X2-\ r. The proofs of (c). 2 . It is possible to decompose ft into a pairwise disjoint union (3 = (3\ U (32 U • • • U (3r such that each /% contains exactly two vectors for i < r. T% is a reflection. In fact. For each i = 1. . (b) For at most one i. Then. W f c + r } is pairwise orthogonal. (27) So Twfc+i is a rotation. clearly. Proof.46. is a rotation for j < k + r. \fii = det -1 0 0 -1 = 1. . define T. . (d). Corollary..

then V = Wi © W 2 . is not a reflection. is not a reflection. suppose that dim(Wi) = 1 and dim(W 2 ) = 2. (a) Any orthogonal operator is either a rotation or a re-flection. then Tj is the identity on Wj.©W20---0Wm be a decomposition as in Theorem 6. we conclude that T is either a single reflection or the identity (a rotation). m = 2 or m = 3. (d) The composite of any two rotations on a four-dimensional space is a rotation.47(b). (Note that if Tw. is a reflection or the identity on Wi. (b) The composite of any two rotations on a two-dimensional space is a rotation. Since Twi is a reflection for at most one i. Defining Tj and T 2 as in the proof of the corollary to Theorem 6. We show that T can be decomposed into the composite of a rotation and at most one reflection. If m = 2. (f) The composite of two reflections is a reflection. Otherwise.47. Thus Tw. proof of the corollary to Theorem 6. 2.47. then Ti is the identity on V and T = T 2 . Prove that rotations. we have that T = TiT 2 is the composite of a rotation and at most one reflection. . (e) The ielentity operator is a rotation. if det(T) = — 1. Label the following statements as true or false. Assume that the underlying vector spaces are finite-dimensional real inner product spaces. (g) Any orthogonal operator is a composite of rotations. (c) The composite e f any two rotations on a three-dimensional space > is a rotation. (i) Reflections always have eigenvalues. 6 Inner Product Spaces Orthogonal Operators on a Three-Dimensional Real Inner Product Space Let T be an orthogonal operator em a three-dimensional real inner product space V. Clearly. For each i. let Tj be as in the. • EXERCISES 1. then T is a reflection. (j) Rotations always have eigenvalues.478 Example 3 Chap. (h) For any orthogonal operator T. If Tw. reflections. then V = Wx © W 2 © W 3 and ehm(W. Let V = W. anel composites of rotations and re%flections are orthogonal operators. Without loss of generality.) If m = 3. and Tw2 I s a rotation.) = 1 for all i. Tj is a reflection.

(b) Prove that T^T^p = T^+^) for any (p. _ /cos <p — sin ysin</> COS' (a) Prove that any rotation on R2 is of the form T^. Prove that no orthogonal operator can be both a rotation and a reflection. that is. Given real numbers (p and ip. Let ( I 2 v/3 \/3\ 2 1 "2 I 479 A = and B— 0 -1 (a) Prove that VA is a reflection. Prove that the composite of any two rotations on R3 is a rotation on R3. where . For any real number <p.45 using the hints preceding the statement of the theorem. (c) Deduce that any two rotations on R2 commute. 6. 0 sin <b cos and 'cos ip B = | sint/' 0 — sin ip 0^ cos ip 0 0 1.ip G R. define T<p = LA.•^•-T* i. For any real number <> let /. 4. define matrices 1 0 0 A = | 0 cos <p — sin < . the subspace of R2 on which L^ acts as the identity. 7. (b) Prove that LAB is a rotation. for some </>. (c) Prove that \-AB and \-BA are rotations. (c) Find the axis of rotation for LAB8. 6. (b) Find the axis in R2 about which L^ reflects.11 The Geometry of Orthogonal Operators 3. A = cos* Usm< sin* — cos* (a) Prove that LA is a reflection. 5. (b) Find the axis in R2 about which L^ reflects. 9. Prove Theorem 6. (a) Prove that LA and LB are rotations. Sec. .

(/) G W. If W is a T-invariant subspace erf V. of at most | n rotations e>r as the ce)iupe)sitc of one inflection and at most ^(n — 2) rotations.480 Chap. Prove that det(T) = elet(T W| ) • ele>t(Tw. Prove that T is a product e f rotatiems if and only if > dim(V) is even. 16. prove the following rosults. (b) If /? is even. INDEX OF DEFINITIONS FOR CHAPTER 6 Adjoint of a linear operator Adjoint of a matrix 331 Axis of rotation 473 358 Bilinear form 422 Complex inner product space Condition number 469 332 . Give an example of an orthogonal operator that is neither a rcfiection nor a rotation.t V be a finite-dimensional real inner product spae-e. where V is a diroct sum of T-invariant subspae-e. 6 Inner Product Spaces 10. T*(y) = T '(. 18.• -cOW/. 15.s. V = W] © W 2 0 . then T can be1 e'xpresse>el as the' colllpe)site. For any x. Define T: V — V * by T(x) = — x. then T can be e'xpressexl as the1 e-emiposite of at most one reflection anel at most \(n — 1) rotations. > 11. is odd. Hint: Use the fact that. Let V be a real inner product spae*? e)f elimension 2. 14. Suppe)se that T is not the identity.47.or three-dimensional real inner product space'. say. (a) Tw is an orthogonal [unitary] operator on W.. Complete the proof of the1 corollary to The'orcm 6. (a) If a. Let T be' a linear e)perate>r on an n-dimensional real inner product space V.46 by shewing that > W = 0~i1(Z) satisfies the required conelitieais... Complete the proof e f the lemma to Theorem 6. 17.) ele>t(TwJ. y G V sue:h that x / y and ||x|| = \\y\\ = 1. Prove that if V is a two. 12. show that there exists a unique rotation T em V such that T(x) — y. Let T be1 a linear e)perate>r on a finite-eliniensional vector space V. Let T be an orthogonal [unitary] operator on a finite-dimensional real [complex] inner product space V. for any y G W. Lt. then the cemiposite e f two reflcctiems on V is a rotation of V. (c) T w is an orthogonal [unitary] operator on W. (b) W L is a T-invariant subspace of V. Tw is one-to-one and onto te) concluele that. Prove the following results. 13.

t.Chap. matrices 384 Orthogonal matrix 382 Orthogonal operator 379 Orthogonal projection 398 Orthogonal projection on a subspace 351 Orthogonal subset of an inner product space 335 481 Orthogonal vectors 335 Orthonormal basis 341 Orthonormal se. of a 2 x 2 matrix 448 Polar decomposition of a matrix 412 Pseudoinverse of a linear transformation 413 Pseudoinverse of a matrix 414 Quadratic form 433 Rank of a bilinear form 443 Rayleigh quotient 467 Real inner product space 332 Reflection 473 Resolution of the identity operator induced by a linear transformation 402 Rigid motion 385 Rotation 472 Self-adjoint matrix 373 Self-adjoint operator 373 Signature of a form 444 Signature of a matrix 445 Singular value decomposition of a matrix 410 Singular value of a linear transformation 407 Singular value of a matrix 410 Space-time coordinates 453 Spectral decomposition of a linear operator 402 Spectrum of a linear operator 402 Standard inner product 330 Symmetric bilinear form 428 Translation 386 Trigonometric polynomial 399 Unitarily equivalent matrices 384 Unitary matrix 382 Unitary operator 379 Unit vector 335 . 335 Penrose conditions 421 Permanent. 6 Index of Definitions Congruent matrices 426 Conjugate transpose (adjoint) of a matrix 331 Critical point 439 Diagonalizable bilinear form 428 Fourier coefficients of a vector relative to an orthonormal set 348 Frobenius inner product 332 Gram-Schmidt orthogonalization process 344 Hessian matrix 440 Index of a bilinear form 444 Index of a matrix 445 Inner product 329 Inner product space 332 Invariants of a bilinear form 444 Invariants of a matrix 445 Least squares line 361 Legendre polynomials 346 Local extremum 439 Local maximum 439 Local minimum 439 Lorentz transformation 454 Matrix representation of a bilinear form 424 Minimal solution of a system of equations 364 Norm of a matrix 467 Norm of a vector 333 Normal matrix 370 Normal operator 370 Normalizing a vector 335 Orthogonal complement of a subset of an inner product space 349 Orthogonally equivalent.

7  Canonical Forms

7.1  The Jordan Canonical Form I
7.2  The Jordan Canonical Form II
7.3  The Minimal Polynomial
7.4* The Rational Canonical Form

As we learned in Chapter 5, the advantage of a diagonalizable linear operator lies in the simplicity of its description. Such an operator has a diagonal matrix representation, or, equivalently, there is an ordered basis for the underlying vector space consisting of eigenvectors of the operator. However, not every linear operator is diagonalizable, even if its characteristic polynomial splits. Example 3 of Section 5.2 describes such an operator.

It is the purpose of this chapter to consider alternative matrix representations for nondiagonalizable operators. These representations are called canonical forms. There are different kinds of canonical forms, and their advantages and disadvantages depend on how they are applied. The choice of a canonical form is determined by the appropriate choice of an ordered basis. Naturally, the canonical forms of a linear operator are not diagonal matrices if the linear operator is not diagonalizable.

In this chapter, we treat two common canonical forms. The first of these, the Jordan canonical form, requires that the characteristic polynomial of the operator splits. This form is always available if the underlying field is algebraically closed, that is, if every polynomial with coefficients from the field splits. For example, the field of complex numbers is algebraically closed by the fundamental theorem of algebra (see Appendix D). The first two sections deal with this form. The rational canonical form, treated in Section 7.4, does not require such a factorization.

7.1  THE JORDAN CANONICAL FORM I

Let T be a linear operator on a finite-dimensional vector space V, and suppose that the characteristic polynomial of T splits. Recall from Section 5.2 that the diagonalizability of T depends on whether the union of ordered bases for the distinct eigenspaces of T is an ordered basis for V. So a lack of diagonalizability means that at least one eigenspace of T is too "small."
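The following sketch is not part of the text; it assumes the Python library sympy and uses a specific 2 x 2 matrix chosen only for illustration. It shows the phenomenon just described: the characteristic polynomial splits, yet the matrix is not diagonalizable because its only eigenspace is too small.

```python
# Illustrative sketch, not from the text: a matrix whose characteristic
# polynomial splits but which is not diagonalizable, because its only
# eigenspace is one-dimensional.  Assumes the sympy library.
from sympy import Matrix, symbols

t = symbols('t')
A = Matrix([[2, 1],
            [0, 2]])

print(A.charpoly(t).as_expr())   # t**2 - 4*t + 4 = (t - 2)**2, so it splits
print(A.eigenvects())            # eigenvalue 2 has a one-dimensional eigenspace
print(A.is_diagonalizable())     # False
```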
In this section, we extend the definition of eigenspace to generalized eigenspace. From these subspaces, we select ordered bases whose union is an ordered basis β for V such that

    [T]_β =
    | A1  O   ...  O  |
    | O   A2  ...  O  |
    | ...             |
    | O   O   ...  Ak |

where each O is a zero matrix, and each Ai is a square matrix of the form (λ) or

    | λ  1  0  ...  0  0 |
    | 0  λ  1  ...  0  0 |
    | ...                |
    | 0  0  0  ...  λ  1 |
    | 0  0  0  ...  0  λ |

for some eigenvalue λ of T. Such a matrix Ai is called a Jordan block corresponding to λ, and the matrix [T]_β is called a Jordan canonical form of T. We also say that the ordered basis β is a Jordan canonical basis for T. Observe that each Jordan block Ai is "almost" a diagonal matrix; in fact, [T]_β is a diagonal matrix if and only if each Ai is of the form (λ).

Example 1
Suppose that T is a linear operator on C^8, and β = {v1, v2, ..., v8} is an ordered basis for C^8 such that

    J = [T]_β =
    | 2 1 0 0 0 0 0 0 |
    | 0 2 1 0 0 0 0 0 |
    | 0 0 2 0 0 0 0 0 |
    | 0 0 0 2 0 0 0 0 |
    | 0 0 0 0 3 1 0 0 |
    | 0 0 0 0 0 3 0 0 |
    | 0 0 0 0 0 0 0 1 |
    | 0 0 0 0 0 0 0 0 |

is a Jordan canonical form of T. Notice that the characteristic polynomial of T is det(J - tI) = (t - 2)^4 (t - 3)^2 t^2, and hence the multiplicity of each eigenvalue is the number of times that the eigenvalue appears on the diagonal of J. Also observe that v1, v4, v5, and v7 are the only vectors in β that are eigenvectors of T. These are the vectors corresponding to the columns of J with no 1 above the diagonal entry.  •

In Sections 7.1 and 7.2, we prove that every linear operator whose characteristic polynomial splits has a Jordan canonical form that is unique up to the order of the Jordan blocks. Nevertheless, it is not the case that the Jordan canonical form is completely determined by the characteristic polynomial of the operator. For example, let T' be the linear operator on C^8 such that [T']_β = J', where β is the ordered basis in Example 1 and

    J' =
    | 2 0 0 0 0 0 0 0 |
    | 0 2 0 0 0 0 0 0 |
    | 0 0 2 0 0 0 0 0 |
    | 0 0 0 2 0 0 0 0 |
    | 0 0 0 0 3 0 0 0 |
    | 0 0 0 0 0 3 0 0 |
    | 0 0 0 0 0 0 0 0 |
    | 0 0 0 0 0 0 0 0 |

Then the characteristic polynomial of T' is also (t - 2)^4 (t - 3)^2 t^2. But the operator T' has the Jordan canonical form J', which is different from J, the Jordan canonical form of the linear operator T of Example 1.

Consider again the matrix J and the ordered basis β of Example 1. Notice that T(v2) = v1 + 2v2, and therefore (T - 2I)(v2) = v1. Similarly, (T - 2I)(v3) = v2. Since v1 and v4 are eigenvectors of T corresponding to λ = 2, it follows that (T - 2I)^3(vi) = 0 for i = 1, 2, 3, and 4. Similarly, (T - 3I)^2(vi) = 0 for i = 5, 6, and (T - 0I)^2(vi) = 0 for i = 7, 8. Because of the structure of each Jordan block in a Jordan canonical form, we can generalize these observations: If v lies in a Jordan canonical basis for a linear operator T and is associated with a Jordan block with diagonal entry λ, then (T - λI)^p(v) = 0 for sufficiently large p. Eigenvectors satisfy this condition for p = 1. Just as eigenvectors lie in eigenspaces, generalized eigenvectors lie in "generalized eigenspaces."

Definition. Let T be a linear operator on a vector space V, and let λ be a scalar. A nonzero vector x in V is called a generalized eigenvector of T corresponding to λ if (T - λI)^p(x) = 0 for some positive integer p.

Notice that if x is a generalized eigenvector of T corresponding to λ, and p is the smallest positive integer for which (T - λI)^p(x) = 0, then (T - λI)^(p-1)(x) is an eigenvector of T corresponding to λ. Therefore λ is an eigenvalue of T.

In the context of Example 1, each vector in β is a generalized eigenvector of T. In fact, v1, v2, v3, and v4 correspond to the scalar 2, v5 and v6 correspond to the scalar 3, and v7 and v8 correspond to the scalar 0.

Definition. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. The generalized eigenspace of T corresponding to

λ, denoted K_λ, is the subset of V defined by

    K_λ = {x ∈ V : (T - λI)^p(x) = 0 for some positive integer p}.

Note that K_λ consists of the zero vector and all generalized eigenvectors corresponding to λ.

Recall that a subspace W of V is T-invariant for a linear operator T if T(W) ⊆ W. In the development that follows, we assume the results of Exercises 3 and 4 of Section 5.4: the range of a linear operator T is T-invariant, and, for any polynomial g(x), if W is T-invariant, then it is also g(T)-invariant.

Theorem 7.1. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Then
(a) K_λ is a T-invariant subspace of V containing E_λ (the eigenspace of T corresponding to λ).
(b) For any scalar μ ≠ λ, the restriction of T - μI to K_λ is one-to-one.

Proof. (a) Clearly, 0 ∈ K_λ. Suppose that x and y are in K_λ. Then there exist positive integers p and q such that

    (T - λI)^p(x) = (T - λI)^q(y) = 0.

Therefore

    (T - λI)^(p+q)(x + y) = (T - λI)^q(T - λI)^p(x) + (T - λI)^p(T - λI)^q(y) = 0,

and hence x + y ∈ K_λ. The proof that K_λ is closed under scalar multiplication is straightforward.

To show that K_λ is T-invariant, consider any x ∈ K_λ. Choose a positive integer p such that (T - λI)^p(x) = 0. Then

    (T - λI)^p T(x) = T(T - λI)^p(x) = T(0) = 0.

Therefore T(x) ∈ K_λ. Finally, it is a simple observation that E_λ is contained in K_λ.

(b) Let x ∈ K_λ and (T - μI)(x) = 0. By way of contradiction, suppose that x ≠ 0. Let p be the smallest integer for which (T - λI)^p(x) = 0, and let y = (T - λI)^(p-1)(x). Then

    (T - λI)(y) = (T - λI)^p(x) = 0,

and hence y ∈ E_λ. Furthermore,

    (T - μI)(y) = (T - μI)(T - λI)^(p-1)(x) = (T - λI)^(p-1)(T - μI)(x) = 0,

so that y ∈ E_μ. But E_λ ∩ E_μ = {0}, and thus y = 0, contrary to the hypothesis. So x = 0, and the restriction of T - μI to K_λ is one-to-one.

Afcl)m). 317). the result is established whenever T has fewer than k distinct eigenvalues.) = (T-Afcir + 1 (2/)Therefore y G KAfc. Proof. 314). Then /i(Tw) is identically zero by the Cayley-Hamilton theorem (p. and by Theorem 7. Since (T — Afcl)m maps K\t into itself and Afc 7^ Ai. Thus V = K Al . Then v = (T — Afcl)m(y) for some y G V. K\{ is contained in W. Then (Ai — t)m is the characteristic polynomial of T. (a) Let W = KA. 7 Canonical Forms Theorem 7. (b) KA = N ( ( T . and d < m. 317). • • •. Next. Proof. . observe that Afc is not an eigenvalue of Tw. Then. the restriction of T — Afc I to K\i is one-to-one (by Theorem 7.1(b). . Then f(t) = (t — Xk)mg(t) for some polynomial g(t) not divisible by (t .2. | v Theorem 7. One consequence of this is that for i < k. there exist vectors Vi G K\if 1 < i < k. A2. Now let W and h(t) be as in (a). Since every eigenvalue of Tw is an eigenvalue of T. . therefore (T . and let f(t) be the characteristic polynomial of T. For suppose that i < k. Clearly W is T-invariant. such that X — Vi + V2 H hUfe.Hence h(t) = (—l)d(t—X)d. Now suppose that for some integer k > 1.21 (p. Since d < m. and hence (Ail . and it follows that 0 = (T-Afcl)(t. and suppose that T has k distinct eigenvalues. .1(b)) and hence is onto.3. Afc_i. Suppose that A is an eigenvalue of T with multiplicity m.A I ) m ) .Xk\)m(y) = 0. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Let T be a linear operator on a finite-dimensional vector~space V such that the characteristic polynomial of T splits.Al) m ).T) m = To by the Cayley-Hamilton theorem (p. A is the only eigenvalue of Tw. where d = dim(W).486 Chap. The proof is by mathematical induction on the number k of distinct eigenvalues of T.Al)d(x) = 0 for all x G W. and the result follows. Then (a) dim(KA) < m. So by Theorem 7.Al)m) C KA. and let Ai. Afe be the distinct eigenvalues of T. and let h(t) be the characteristic polynomial of T w By Theorem 5.For suppose that T(v) = Afct? for some v G W. Let m be the multiplicity of Afc.Xk). h(t) divides the characteristic polynomial of T. v = (T . A 2 . Let W = R((T . . and hence Xi is an eigenvalue of Tw with corresponding generalized eigenspace K\i. for every x G V. the distinct eigenvalues of T w are Ai. Observe that (T — Afcl)w maps KA{ onto itself for i < k. (b) Clearly N((T . First suppose that k = 1.2. and let m be the multiplicity of Ai. we have KA C N((T .

. For 1 < i < k.Afcl)m(t*) + • • • + (T + vk-i) Xk\)m(vk-i).2(a).mk.4.Afcl)m(x) = W! + w2 + • • • + wk-i. q = > J di < T j m-i = dim(V). Let q be the number of vectors in (3. Then (T . (a) Suppose that x G /% fl f3j C KA4 D KAJ5 where i ^ j. For each i. 7. Afc_i. In this case. By Theorem 7. T — A^l is one-to-one on KA^ . But this contradicts the fact that x G K\t. for 1 < i < k. onto itself for i < k. Then. . 47). and hence there are vectors Wi G K\^ 1 < i < k — 1. Since each Vi is a linear combination of the vectors of pi.Sec. Since T w has the k . and therefore (T — AJ) p (x) ^ 0 for any positive integer p. Therefore (3 spans V. Theorem 7. ..vk. Then the following statements are true.. | x = vi+u2H The next result extends Theorem 5. Since (T — Afcl)m maps KA.Xk\)m(Vl) + (T . Afc be the distinct eigenvalues of T with corresponding multiplicities mi. (a) A n f t = 0 fori^j. Then dim(V) < q. . there exist vectors Vi G KA4 such that x = v\ + v2 + • • • -f. • . and it follows that x . by Theorem 7. such that (T . A2. (b) Let x G V. (c) dim(KAj = mi for all i.Xk\)m(vi) = Wi for i < k. it follows that x is a linear combination of the vectors of (3.9(b) (p.1 distinct eigenvalues Ai.Afcl)m(x) = (T . .m2. let di = dim(KAj).Afcl)m(x) G W.1 The Jordan Canonical Form I 487 Now let x G V. the induction hypothesis applies.3. the eigenspaces are replaced by generalized eigenspaces. Consequently (3 is a basis for V by Corollary 2 to the replacement theorem (p.. there exist vectors Vi G KA. By Theorem 7. Thus (T . The corresponding generalized eigenspace of T ^ for each Xi is K\. Therefore there exists a +vk.(v\ + u2 H vector vk G KAfc such that G KAfc. let Pi be an ordered basis for K\{. . Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. i=i i=i Hence q = dim(V).1(b). and the result follows. . and let Ai... such that (T . 268) to all linear operators whose characteristic polynomials split. (b) (3 = 0i U (32 U • • • U 0k is an ordered basis for V. Proof. A 2 .
(c) Using the notation and result of (b), we see that

    Σ_{i=1}^{k} d_i = Σ_{i=1}^{k} m_i.

But d_i ≤ m_i for each i by Theorem 7.2(a), and therefore d_i = m_i for all i.

Corollary. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Then T is diagonalizable if and only if E_λ = K_λ for every eigenvalue λ of T.

Proof. Combining Theorems 7.4 and 5.9(a) (p. 268), we see that T is diagonalizable if and only if dim(E_λ) = dim(K_λ) for each eigenvalue λ of T. But E_λ ⊆ K_λ, and hence these subspaces have the same dimension if and only if they are equal.

We now focus our attention on the problem of selecting suitable bases for the generalized eigenspaces of a linear operator so that we may use Theorem 7.4 to obtain a Jordan canonical basis for the operator. For this purpose, we consider again the basis β of Example 1. We have seen that the first four vectors of β lie in the generalized eigenspace K_2. Observe that the vectors in β that determine the first Jordan block of J are of the form

    {v1, v2, v3} = {(T - 2I)^2(v3), (T - 2I)(v3), v3}.

Also observe that (T - 2I)(v1) = (T - 2I)^3(v3) = 0. The relation between these vectors is the key to finding Jordan canonical bases. This leads to the following definitions.

Definitions. Let T be a linear operator on a vector space V, and let x be a generalized eigenvector of T corresponding to the eigenvalue λ. Suppose that p is the smallest positive integer for which (T - λI)^p(x) = 0. Then the ordered set

    {(T - λI)^(p-1)(x), (T - λI)^(p-2)(x), ..., (T - λI)(x), x}

is called a cycle of generalized eigenvectors of T corresponding to λ. The vectors (T - λI)^(p-1)(x) and x are called the initial vector and the end vector of the cycle, respectively. We say that the length of the cycle is p.

Notice that the initial vector of a cycle of generalized eigenvectors of a linear operator T is the only eigenvector of T in the cycle. Also observe that if x is an eigenvector of T corresponding to the eigenvalue λ, then the set {x} is a cycle of generalized eigenvectors of T corresponding to λ of length 1.

In Example 1, the subsets β1 = {v1, v2, v3}, β2 = {v4}, β3 = {v5, v6}, and β4 = {v7, v8} are the cycles of generalized eigenvectors of T that occur in β. Notice that β is a disjoint union of these cycles. Furthermore, setting Wi = span(βi) for 1 ≤ i ≤ 4, we see that βi is a basis for Wi, and [T_Wi]_βi is the ith Jordan block of the Jordan canonical form of T. This is precisely the condition that is required for a Jordan canonical basis.

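The following sketch is not part of the text; the matrix, the eigenvalue, and the end vector are hypothetical choices made only for illustration, and the numpy library is assumed. It generates a cycle of generalized eigenvectors from a chosen end vector x by repeatedly applying A - λI, exactly as in the definition above.

```python
# Illustrative sketch with hypothetical data (not from the text): generating a
# cycle of generalized eigenvectors from a chosen end vector x by repeatedly
# applying A - lam*I.  Assumes numpy.
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])     # a single 3 x 3 Jordan block with eigenvalue 2
lam = 2.0
x = np.array([0., 0., 1.])       # chosen end vector

N = A - lam * np.eye(3)
p = 1                            # length of the cycle generated by x
while not np.allclose(np.linalg.matrix_power(N, p) @ x, 0):
    p += 1

# the cycle {(A - lam*I)^(p-1) x, ..., (A - lam*I) x, x}, initial vector first
cycle = [np.linalg.matrix_power(N, p - 1 - i) @ x for i in range(p)]
print(p)        # 3
print(cycle)    # [e1, e2, e3]; the initial vector e1 is an eigenvector of A
```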
vp}. Theorem 7. by the preceding equations. For i > 1. . then the result is clear.*(x) for i < p So ( T . Suppose that 71. II In view of this result. For (b). and suppose that 7 has exactly n vectors. and let X be an eigenvalue ofT. (a) Suppose that 7 corresponds to A. we sec that [Twh is a Jorelan block. where Vi = (T — Al) p. Let U denote the.e that it is also sufficient. and x is the end vector of 7. The proof that 7 is line. So assume that.lq are cycles of generalized eigenvectors of T corresponding to X such that the initial vectors of the "fi 's are distinct and form a linearly independent set. this is a neec. We leave the details as an exercise.s us toward the.. If this number is less than 2. w 2 .. Let T be a linear operator on a vector space V.. and suppose that ft is a basis for V such that ft is a disjoint union of cycles of generalized eigenvectors of T. 7 has length p..5. We will soon s<. (a) For each cycle 7 of generalized eigenvectors contained in ft. 7. restriction of T . simply repeat the arguments of (a) for each cycle in ft in order to obtain [T]^. the result is valid whenever 7 has fewer than n vectors. (T-AI)K) = (T-AI)^i-1>(x) = ^_i. (b) 0 is a Jordan canonical basis for V. Since the characteristic polynomial of a Jordan canonical form splits. is linearly i-l independent. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits.ssary condition. • • • . Then the following statements are true. Proof. and.Sec. and [Tw]7 is a Jordan block. Exercise 5 shows that the 7$'s are disjoint. Therefore T maps W into itself. and hence T(v{) = Xv\. Clearly W is (T — Al)-invariant. The next result mov<. there exist bases that are disjoint unions of cycles of generalized eigenvectors.A I ) M = ( T . desireel existence theorem. under appropriate conditions.AI to W.. for some integer n > 1. W = span(7) is T-invariant.arly independent is by mathematical induct ion on the number of vectors in 7. Proof.A l ) p ( x ) = 0.6. and dim(W) < n.72. Then the 7* 's are disjoint. Then 7 = {v\. and their union 7 = M 7. Let W be the subspace of V generatcxl by 7.1 The Jordan Canonical Form I 489 Theorem 7. and vp — x. we must show that.

The result is clear for n = 1. Theorem 7. and assume that dim(KA) = n.W2. 7 Canonical Forms For each i. it must be a basis for W. then 7^ = 0 . the end vector of 7» is the image under U of a vector Vi G KA. Let 7' = IIT*i Then by the last statement. and so we can extend each 7* to a larger cycle ji = 7* U {vi} of generalized eigenvectors of T corresponding to A.. every nonzero image under U of a vector of 7$ is contained in 7^. Hence dim(R(U)) = n — q. Since the q initial vectors of the 7.. wq} is a linearly independent subset of EA.7 2 ... So suppose that for some integer n > 1 the result is valid whenever dim(KA) < n. I Corollary.'s form a linearly independent set and lie in N(U).Then R(U) is a subspace of KA of lesser dimension. by the induction hypothesis.us} . let 7^ denote the cycle obtained from 7$ by deleting the end vector. • • •.. Since 7 generates W and consists of n vectors. this set can be extended to a basis {wi. .490 Chap. For 1 < i < q. Therefore 7' is a basis for R(U). From these inequalities and the dimension theorem. For 1 < i < q.dim(N(U)) >(n-q) + q — n.U\. Note that if 7* has length one. Therefore. The proof is by mathematical induction on n = dim(KA). corresponding to A for which 7 = M 7^ is a basis for R(U). and let X be an eigenvalue of T. and conversely. Every cycle of generalized eigenvectors of a linear operator is linearly independent. We conclude that dim(W) = n. and 1 hence of T itself. 7' consists of n — q vectors. . Hence 7 is linearly independent. we obtain n > dim(W) = dim(R(U)) . each vector of 7^ is the image under U of a vector in 7. and R(U) is the space of generalized eigenvectors corresponding to A for the restriction of T to R(U). Since {wi. we have dim(N(U)) > q.wQ. •. . Then K\ has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding to X. .. let Wi be the initial vector of 7J (and hence of 7*). . Furthermore. 7 g of generalized eigenvectors of this restriction.U2. Thus we may apply the induction hypothesis to conclude that 7' is linearly independent. and the initial vectors of the 7^'s are also initial vectors of the 7i's.. 7' generates R(U). In the case that 7^ ^ 0. Let T be a linear operator on a finite-dimensional vector space V. Let U denote the restriction of T — AI to KA. w 2 . Proof. there exist disjoint cycles 71.7.

7. .6. 1 We can now compute the Jordan canonical forms of matrices and linear operators in some simple cases. Let A G M n X n (F) be such that the characteristic polynomial of A (and hence of LA) splits. . ..wq. {«i}. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits. By Theorem 7. by Theorem 7. Furthermore. . The next result is an immediate consequence of this definition and Corollary 1. Let ft = ft\ U 02 U • • • U 0kThen. Proof. Afc be the distinct eigenvalues of T. Definition. Corollary 2. 1 The Jordan canonical form also can be studied from the viewpoint of matrices. Therefore dim(KA) = rank(U) + nullity(U) = r + q + s. w 2 . it follows that nullity(U) = q + s.1 The Jordan Canonical Form I 491 for EA. . • • • .Then 71. Exercise. Corollary 1. 7. U\. as is illustrated in the next two examples. It follows that 7 is a basis for KAI The following corollary is immediate. for each i there is an ordered basis fti consisting of a disjoint union of cycles of generalized eigenvectors corresponding to A.Suppose that 7 consists of r = rank(U) vectors. Let A be an n x n matrix whose characteristic polynomial splits. Then T has a Jordan canonical form. . us] is a basis for EA = N(U).72.Sec. A 2 . Let Ai. We show that 7 is a basis for KA. Proof. The tools necessary for computing the Jordan canonical forms in general are developed in the next section.W2. • • •. Then 7 consists of r + q + s vectors. . % . So 7 is a linearly independent subset of KA containing dim(KA) vectors.4(b). ft is an ordered basis for V. and A is similar to J. • • • . Then A has a Jordan canonical form J. . Then the Jordan canonical form of A is defined to be the Jordan canonical form of the linear operator LA on Fn. {^2}. . {'"«} are disjoint cycles of generalized eigenvectors of T corresponding to A such that the initial vectors of these cycles are linearly independent. since {wi. Therefore their union 7 is a linearly independent subset of KA by Theorem 7.

) = 1. To find the Jorelan canonical form for A. need to find a Jordan canonical basis for T = L.ptable candidate for v. and KA. .1) is an eigenvector of T corresponding to A) = 3. this basis is either a union of two cycles of length 1 or a single cycle of length 2.4.2..v} = m .0) is an ae-c.2I)2v = 0. —3.e.2. A vector v is the end vector of such a cycle if and only if (A . and dhn(KA2) = 2. The characteristic polynomial of A is f(t)=det(A-tI) = -(t-Z)(t-2)2. By Theorem 7. = KA. KA. Observe that (—1.2. we have that EA. By Theorem 7. but (A . . respectively. we obtain the cycle of generalizes! eigenvectors 02 = {(A-2I)v. It can easily be shown that is a basis for the solution space of the homogeneous system (A — 2I)2. —1). Since (A — 2I)v = (I. we. Now choose a vector v in this set so that (A — 2I)v ^ 0. 7 Canonical Forms G M 3 x3(rt). The vector v = (-1. Hence Ai = 3 and A2 = 2 arc the eigenvalues of A with multiplicities 1 and 2. Since EAl = N(T-3I).\.492 Example 2 Let A= 3 1 -2 -1 5 0 1 -1 4 Chap. Since dim(KA_. The former case is impossible because the vectors in the basis would be eigenvectors -contradicting the fact that dim(EA2) = 1. therefore ft = is a basis for KA. = N((T-2I) 2 ).) = 2 and a generalized eigenspace has a basis consisting of a union of cycles. = N(T-3I).2I)v f 0. dim(KA. Therefore the desired basis is a single cycle of length 2.v = 0.

Sec. where Q is the matrix whose columns are the vectors in 0. So the desired basis must consist of a single cycle of length 3. Notice that A is similar to J. In any basis for KA.2 = 3 . So ft is a basis for KA.1 The Jordan Canonical Form I 493 as a basis for KA2 • Finally.NOW dim(E A ) = 3 . 0/ Therefore a basis for KA cannot be a union of two or three cycles because the initial vector of each cycle is an eigenvector. J = Q~lAQ. Let ft be the standard ordered basis for P2(R). In fact. there must be a vector that satisfies this condition. we take the union of these two bases to obtain ft = ftiUft2 = which is a Jordan canonical basis for A.1 which is a Jordan canonical form of T.0 -1 0 0 0\ . The end vector h(x) of such a. Thus A = — 1 is the only eigenvalue of T. and hence KA = ?2(R) by Theorem 7. or else .2 = 1. 3 0 0 ~o1 2 1 0 0 2 J = m* = is a Jordan canonical form for A.rank '0 0 . and there do not exist two or more linearly independent eigenvectors.\)2(h(x)) / 0.rank(j4 + 7) = 3 . 7. We find a Jordan canonical form of T and a Jordan canonical basis for T. = /-I 0 \ 0 1 0" -1 1 0 . Therefore.Then m* = which has the characteristic polynomial f(t) — —(t + l) 3 . then 7 determines a single Jordan block [T].4. If 7 is such a cycle. Example 3 Let T be the linear operator on P2(R) defined by T(g(x)) = —g(x) — g'(x). cycle must satisfy (T 4.

prove that Jorelan canonical forms are unique up to the order of the Jordan blocks. then 0i U 02 U • • • U ftk is a Jordan canonical basis for T. The next result. fti is a basis for K./ has Jordan canonical form J. (g) For any Jorelan block . whose. characteristic polynomial splits. we develop a computational approach for finding a Jordan canonical form and a Jordan canonical basis. (e) There is exactly one cycle of generalized eigenvectors corresponding to each eigenvalue of a linear operator on a finite-dimensional ve. 278). (h) Let T be a linear operator on an n-dimensional vector spae. then the eigenspaces and the generalized eigenspaces coincide.8. (b) It is possible. If. • In the next section. extends Theorem 5. for a generalized eigenvector of a linear operator T to correspond to a scalar that is not an eigenvalue of T.11 (p. 2 } is a Jordan canonical basis for T. Proof. EXERCISES 1. we see that h(x) = x is acceptable. contrary to our reasoning.11 to the nondiagonalizable case. If T is diagonalizable. and let Ai. Then V is the direct sum of the generalized eigenspaces of T. Let T be a linear operator on a finite-dimensional vector space V.\. Then. ( . for each i.. distinct eigenvalues of T. (c) Any linear operator on a finite-dimensional ve.494 Chap. Testing the vectors in 0. Therefore 7 = {(T + I ) V ) .e. and suppose that the characteristic polynomial of T splits.. the operator L. (a) Eigenvectors of a linear operator T are also generalized eigenvectors of T. for any eigenvalue A of T. ( T + I)(x 2 ).e:tor space. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial sj>lits. (f) Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits.~2x. In the process. (d) A cycle of generalized eigenvectors is linearly independent. which is optional. Exercise./. T is diagonalizable if and only if V is the direct sum of the eigenspaces of T.. Theorem 7.A2 Afc be the. 7 Canonical Forms no vector in KA satisfies this condition. has a Jorelan canonical form. By Theorem 5.ctor space. Label the following statements as true or false.. we. KA = N((T-AI)").x 2 } = {2.

7 P be cycles of generalized eigenvectors of a linear operator T corresponding to an eigenvalue A. (c) If V = W (so that T is a linear operator on V) and A is an eigenvalue of T. 7. For each matrix A. 4 J Let T be a linear operator on a vector space V. Let 71. (a) N(T) = N(-T).1 *) 11 21 3 -4 -8 -1 -5 -11 0 (b) A = 1 2 3 2 (c) A = (2 1 0 o\ 0 2 1 0 (d) A = 0 0 3 0 ^0 1 . 6. (c) T is the linear operator on M 2 x 2 ( # ) defined by T(A) = I for all A G M 2X 2(#).Sec. 5. te1}.1 3/ 3. Then find a Jordan canonical form J of A. Prove that span(7) is a T-invariant subspace of V. then for any positive integer k N((T-Alv) f c ) = N((Al v -T) f c ). (a) A = 1 l\ .t. (b) N(Tfc) = N((-T) fc ). (a) T is the linear operator on P2(R) defined by T(/(x)) = 2/(x) — (b) V is the real vector space of functions spanned by the set of real valued functions {l. Prove the following results. For each linear operator T. • • • . find a basis for each generalized eigenspace of T consisting of a union of disjoint cycles of generalized eigenvectors.» W b e a linear transformation. and T is the linear operator on V defined by T ( / ) = / ' . Prove that if the initial eigenvectors are distinct. ) -A .e1. (a) N(U) C N(U2) C • • • C N(Ufc) C N(Ufc+1) C • • •.72. t2. Let U be a linear operator on a finite-dimensional vector space V. Then find a Jordan canonical form J of T.1 The Jordan Canonical Form I 495 2. Let T: V . find a basis for each generalized eigenspace of L^ consisting of a union of disjoint cycles of generalized eigenvectors. and let 7 be a cycle of generalized eigenvectors that corresponds to the eigenvalue A. then the cycles are disjoint. Prove the following results. 7. (d) T(A) = 2A + At for all A G M 2x2 (i2).

Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits.. then KA = N((T .. 11. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. . Use (e) to obtain a simpler proof of Exercise 24 of Section 5. and let A be an eigenvalue of T. 12. (d) Prove that if rank((T-Al)' n ) = rank((T-Al) m + 1 ) for some integer rn.5(b). and let Ai.. Prove that J . Afc be the distinct eigenvalues of T. and suppose that J = [T]^ has q Jordan blocks with A in the diagonal positions. Let T be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits.4 on page 320. Exercises 12 and 13 are concerned with direct sums of matrices. Let ft' = ft n KA. let Jt be the Jordan canonical form of the restriction of T to KA^ . (b) Let ft be a Jordan canonical basis for T.4 to prove that the vectors v\. Prove Theorem 7.A. Use Theorem 7.. (a) Suppose that 7 is a basis for KA consist ing of the union of q disjoint cycles of generalized eigenvectors. then (b) If rank(U ) = rank(U m fc rank(U ) = rank(U ) for any positive integer k > m. then fc N(U'") = N(U ) for any positive integer k > rn.7. . A 2 .I)2) for 1 < i < k. . Let T be a linear operator on V.. and let Ai. For each i. Afc be the distinct eigenvalues of T. Then T is diagonalizable if and only if rank(T .. defined in Section 5. that ft is a Jordan canonical basis for T. Prove that q < dim(EA).A.496 Chap. v2.. TO w+1 (c) If rank(U ) = rank(U ) for some positive integer rn.3 are unique.8. 13.J\ © J 2 © • • • © Jfc is the Jordan canonical form of J . (a) Prove Theorem 7. . Prove Corollary 2 to Theorem 7. and let A be an eigenvalue of T.I) = rank((T . Let T be a linear operator on (e) V whose characteristic polynomial splits. A 2 .AI)'")Second Test for Diagonalizability. Prove that q < dim(E A ). (b) Suppose. 8. . then Tw is diagonalizable. . of Theorem 7.Prove that ft' is a basis for KA10. vk in the statement 9. 7 Canonical Forms m m+1 ) for some positive integer rn.4: If (f) T is a diagonalizable linear operator on a finite-dimensional vector space V and W is a T-invariant subspace of V. and let A be an eigenvalue of T.

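The second test for diagonalizability stated in the exercises above lends itself to a direct computation. The sketch below is not from the text; it assumes the sympy library, that the characteristic polynomial splits, and uses two hypothetical matrices chosen only for illustration. It checks whether rank(A - λI) = rank((A - λI)^2) for every eigenvalue λ.

```python
# Illustrative sketch, not from the text: the rank criterion from the exercises
# above.  Assuming the characteristic polynomial splits, the operator is
# diagonalizable iff rank(A - lam*I) = rank((A - lam*I)^2) for every eigenvalue.
from sympy import Matrix, eye

def diagonalizable_by_rank_test(A):
    n = A.rows
    return all((A - lam * eye(n)).rank() == ((A - lam * eye(n)) ** 2).rank()
               for lam in A.eigenvals())

print(diagonalizable_by_rank_test(Matrix([[2, 1], [0, 2]])))   # False
print(diagonalizable_by_rank_test(Matrix([[2, 0], [0, 3]])))   # True
```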
7.2  THE JORDAN CANONICAL FORM II

For the purposes of this section, we fix a linear operator T on an n-dimensional vector space V such that the characteristic polynomial of T splits. Let λ1, λ2, ..., λk be the distinct eigenvalues of T. By Theorem 7.7 (p. 490), each generalized eigenspace K_λi contains an ordered basis βi consisting of a union of disjoint cycles of generalized eigenvectors corresponding to λi. So by Theorems 7.4(b) (p. 487) and 7.6 (p. 489), the union β = β1 ∪ β2 ∪ ··· ∪ βk is a Jordan canonical basis for T. For each i, let Ti be the restriction of T to K_λi, and let Ai = [Ti]_βi. Then Ai is the Jordan canonical form of Ti, and

    J = [T]_β =
    | A1  O   ...  O  |
    | O   A2  ...  O  |
    | ...             |
    | O   O   ...  Ak |

is the Jordan canonical form of T. In this matrix, each O is a zero matrix of appropriate size.

In this section, we compute the matrices Ai and the bases βi, thereby computing J and β as well. While developing a method for finding J, it becomes evident that in some sense the matrices Ai are unique. It then follows that the Jordan canonical form for T is unique up to an ordering of the eigenvalues of T. As we will see, there is no uniqueness theorem for the bases βi or for β. Specifically, we show that for each i, the number ni of cycles that form βi and the length pj (j = 1, 2, ..., ni) of each cycle are completely determined by T. This information determines the matrix Ai. It is in this sense that Ai is unique.

To aid in formulating the uniqueness theorem for J, we adopt the following convention: The basis βi for K_λi will henceforth be ordered in such a way that the cycles appear in order of decreasing length. That is, if βi is a disjoint union of cycles γ1, γ2, ..., γni, and if the length of the cycle γj is pj, we index the cycles so that p1 ≥ p2 ≥ ··· ≥ pni. This ordering of the cycles limits the possible orderings of vectors in βi, which in turn determines the matrix Ai.

Example 1
To illustrate the discussion above, suppose that, for some i, the ordered basis βi for K_λi is the union of four cycles βi = γ1 ∪ γ2 ∪ γ3 ∪ γ4 with respective

lengths p1 = 3, p2 = 3, p3 = 2, and p4 = 1. Then

    Ai =
    | λi  1   0   0   0   0   0   0   0  |
    | 0   λi  1   0   0   0   0   0   0  |
    | 0   0   λi  0   0   0   0   0   0  |
    | 0   0   0   λi  1   0   0   0   0  |
    | 0   0   0   0   λi  1   0   0   0  |
    | 0   0   0   0   0   λi  0   0   0  |
    | 0   0   0   0   0   0   λi  1   0  |
    | 0   0   0   0   0   0   0   λi  0  |
    | 0   0   0   0   0   0   0   0   λi |

To help us visualize each of the matrices Ai, we use an array of dots called a dot diagram of Ti, where Ti is the restriction of T to K_λi. Suppose that βi is a disjoint union of cycles of generalized eigenvectors γ1, γ2, ..., γni with lengths p1 ≥ p2 ≥ ··· ≥ pni, respectively. The dot diagram of Ti contains one dot for each vector in βi, and the dots are configured according to the following rules.

1. The array consists of ni columns (one column for each cycle).
2. Counting from left to right, the jth column consists of the pj dots that correspond to the vectors of γj, starting with the initial vector at the top and continuing down to the end vector.

Denote the end vectors of the cycles by v1, v2, ..., vni. In the following dot diagram of Ti, each dot is labeled with the name of the vector in βi to which it corresponds; the jth column reads, from top to bottom,

    • (T - λiI)^(pj-1)(vj)
    • (T - λiI)^(pj-2)(vj)
      ...
    • (T - λiI)(vj)
    • vj

Notice that the dot diagram of Ti has ni columns (one for each cycle) and p1 rows. Since p1 ≥ p2 ≥ ··· ≥ pni, the columns of the dot diagram become shorter (or at least not longer) as we move from left to right.

Now let rj denote the number of dots in the jth row of the dot diagram. Observe that r1 ≥ r2 ≥ ··· ≥ rp1. Furthermore, the diagram can be reconstructed from the values of the rj's. The proofs of these facts, which are combinatorial in nature, are treated in Exercise 9.

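The following sketch is not part of the text; it is a small Python illustration of the combinatorial remark just made. The row counts rj can be computed from the cycle lengths pj, and the cycle lengths, hence the whole dot diagram, can be recovered from the rj's alone. The cycle lengths (3, 3, 2, 1) are those of the example above.

```python
# Illustrative sketch, not from the text: with cycle lengths p_1 >= p_2 >= ...,
# the j-th row of the dot diagram contains r_j dots, and the diagram can be
# rebuilt from the r_j's alone (the two computations are conjugate operations).
def row_counts(p):
    """r_j = number of cycles of length at least j."""
    return [sum(1 for length in p if length >= j) for j in range(1, max(p) + 1)]

def cycle_lengths(r):
    """p_j = number of rows containing at least j dots."""
    return [sum(1 for count in r if count >= j) for j in range(1, max(r) + 1)]

p = (3, 3, 2, 1)            # the cycle lengths used in the example above
r = row_counts(p)
print(r)                    # [4, 3, 2]
print(cycle_lengths(r))     # [3, 3, 2, 1]: the dot diagram is recovered
```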
the dot diagram of Tj is as follows: • • • Here r\ = 4.. Hence the number of dots in the first r rows of the dot diagram equals nullity((T — AJ)7-).2 The Jordan Canonical Form II 499 In Example 1. For example. using only T and Xi. the number of Jordan blocks corresponding to Xi equals the dimension of Ex. The next three results give us the required method. Let a and b denote the number of vectors in S\ and S2. Then a + b = rrii. By the preceding remarks. Hence in a Jordan canonical form ofT. We now devise a method for computing the dot diagram of Tj using the ranks of linear operators determined by T and A^. 7.. x G Si if and only if x is one of the first r vectors of a cycle. Corollary. To facilitate our arguments. Hence a is the number of dots in the first r rows of the dot diagram. the vectors in fti that arc associated with the dots in the first r rows of the dot diagram of Ti constitute a basis for N((T — AJ) r ). the effect of applying U to x is to move the dot corresponding to x exactly r places up its column to another dot.. and hence it suffices to establish the theorem for U. r 2 = 3. But S\ is a linearly independent subset of N(U) consisting of a vectors.A ? J ) r . N((T — Ajl)r) = N(U). It follows that U maps 5 2 in a one-to-one fashion into fti. from which it follows that it is unique.- . cycles of generalized eigenvectors with lengths p\ > p 2 > • • • > prli. Hence the dot diagram is completely determined by T.A t l ) r ) C KA. and P4 = 1. The dimension of EA. and this is true if and only if x corresponds to a dot in the first r rows of the dot diagram. rather than with fti. Proof. see Exercise 8. For any positive integer r. and r3 = 2.9. Thus {U(x): x G S2} is a basis for R(U) consisting of b vectors. and KXi is invariant under ( T . 0i is not unique. is n«.9 yields the following corollary.) To determine the dot diagram of Ti. Let U denote the restriction of (T — Ajl)7' to KA*. p$ = 2. with n-i = 4.Sec. the number of dots in the j t h row of the dot diagram. For any x & S2. therefore 5] is a basis for N(U). 1 In the case that r = 1. we devise a method for computing each Tj. Theorem 7./. For any x G fti. (It is for this reason that we associate the dot diagram with T. we fix a basis 0i for KA. Now define Si = {x G fti: U(x) = 0} and S2 = {. N ( ( T . G A : U(x) ^ 0}. and so nullity(U) = nii — b = a. Clearly. respectively. and let m-i = din^KAj. On the other hand. Hence rank(U) = b. SO that fti is a disjoint union of n7. p\ = p 2 = 3. Theorem 7.

Then the following statements are true.Ajl)j) = dim(V) . rj = (rL +r2-\ + Tj) .( n + r2 + j h r^-i) I = [dim(V) . Example 2 Let (1 0 A = 0 -l 3 1 -1 V0 0 1\ -1 0 1 0 0 3/ .10.rank((T . 7 Canonical Forms We are now able to devise a method for describing the dot diagram in terms of the ranks of operators.Ajl)^ 1 )] = rank((T .Ail) J ). (a) n = dim(V) . and for j > 1. the restriction ofT to KA4. Hence we have proved the following result. Corollary.9. Proof.Xi\)j).rank((T .500 Proof. We apply these results to find the Jordan canonical forms of two matrices and a linear operator. for 1 < j» < pi.10 shows that the dot diagram of Tj is completely determined by T and Aj. Thus.1 ) . Let rj denote the number of dots in the jth row of the dot diagram of Tj.[dim(V) .Ajl)^ 1 ) . the Jordan canonical form of a linear operator or a matrix is unique up to the ordering of the eigenvalues.rank((T . Chap.A J ) ' .AJ). Theorem 7.rank((T . (b) rj = rank((T . the dot diagram of Ti is unique. Theorem 7.Xi\) )} . Exercise. By Theorem 7.rank((T . we have ri + r2 + • • • + rj = nullity((T .Xi\)j) if j > 1. subject to the convention that the cycles of generalized eigenvectors for the bases of each generalized eigenspace are listed in order of decreasing length. For any eigenvalue Xi ofT. Hence r\ — dim(V) — rank(T — A»l).rank(T .

and consequently any basis 02 for KA2 consists of a single eigenvector corresponding to A2 = 3. let rj denote the number of dots in the jth.) Hence the dot diagram associated with 0i is -1 0 i\ 1 -1 0 = 4 . 487). hence the dot diagram of Ti has three dots. Let Ti and T2 be the restrictions of L^ to the generalized eigenspaces KA. the computation of r 2 is unnecessary in this case because r\ = 2 and the dot diagram only contains three dots.) = 3 by Theorem 7. respectively.1 So M = P"il A (2 1 0' = 0 2 0 V° 0 2.3). it follows that dim^A. The characteristic polynomial of A is det(i4 .rank((. Since A2 = 3 has multiplicity 1.2) 3 (i . we have (2 1 0 2 = 0 0 \0 0 0 0\ 0 0 2 0 0 3/ J=[lAb . with multiplicities 3 and 1.2 The Jordan Canonical Form II 501 We find the Jordan canonical form of A and a Jordan canonical basis for the linear operator T = L^.ti) = (t. Suppose that 0\ is a Jordan canonical basis for Ti. respectively.4(c) (p.21) . 7. As we did earlier.Sec. it follows that dim(KA2) = 1. and KA2.2 = 2. /o 0 n = 4 — rank(i4 — 27) = 4 — rank 0 and r 2 = ram\(A . Therefore A2 = [T2J>a = (3).10. 1 -1 0 0 V V> . Thus A has two distinct eigenvalues. by Theorem 7. (Actually. Ai = 2 and A2 = 3.1 = 1. row of this dot diagram.4 .21 f) = 2 . Since Ai has multiplicity 3. Then. Setting 0 = 0\ U 02.

2I) 2 It is easily seen that f (\\ 0 0 W 2 AA l ' 2 w /o\ I 0 w is a basis for N((T — 2I) ) = KA. 7 Canonical Forms We now find a Jordan canonical basis for T = L^. Suppose that we choose (0\ 1 '••i = 2 w Then /o 0 = 0 V 0 L\ (0\ -i 1 1 -1 0 -1 0 2 1 --1 0 V w -*\ -1 " -1 V-i/ (T-2\)(vi) = (A-2I)(v1) Now simply choose v2 to be a vector in EA. for example.502 and so J is the Jordan canonical form of A. Now /0 0 = 0 \0 -1 1 1 -1 0 1\ -1 0 -1 0 0 /() . Let v\ and u2 denote the end vectors of the first and second cycles. . that is linearly independent of (T .21). Of these three basis vectors. We begin by determining a Jordan canonical basis 0\ for Ti.2I)2) but Vi £ N(T . each corresponding to a cycle of generalized eigenvectors. • (T-2l)(t>i) •v2 Prom this diagram we sec that V\ e N((T .2 0 0 = 0 0 -2 V" 0 0 A-2J and (A . We reprint below the dot diagram with the dots labeled with the names of the vectors to which they correspond. Chap. the last two do not belong to N(T — 21). select /1\ 0 V2 = 0 . there are two such cycles.21) (t>i). and hence we select one of these for v\.Since the dot diagram of Ti has two columns. respectively.

489).1 \-l then J = Q~lAQ.2 The Jordan Canonical Form II Thus we have associated the Jordan canonical basis M1 -1 K-y (°\ i > 2 sPJ (1\ 0 ) 0 v<v 503 0i = with the dot diagram in the following manner. Hence any eigenvector of L^ corresponding to A2 = 3 constitutes an appropriate basis 02. the linear independence of 0\ is guaranteed since t'2 was chosen to be linearly independent of (T — 21)(vi). Notice that if /-l . 0 0 W Thus \\ (°) 1 i 1 ? 2 h is a Jordan canonical basis for L^.1 Q = . dim(KA2) = dim(EA2) = 1. For example.Sec.6 (p. Since A2 = 3 has multiplicity 1. 7. o 1 i\ 1 0 0 2 0 0 0 0 1/ v<v /1\ /i\ 0 0 ! 0 ? 0 W W 02 = 0 = 0lU02 = . /-1\ -1 -1 \rh 0 \ 0 0 w • By Theorem 7.

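Hand computations such as the one in this example can be double-checked with a computer algebra system. The sketch below is not from the text and uses a small hypothetical matrix rather than the matrix A of this example; it assumes the sympy library, whose jordan_form method returns P and J with A = P J P^(-1), so that J = P^(-1) A P as above.

```python
# Illustrative sketch with a hypothetical matrix (not the matrix of this
# example): double-checking a Jordan canonical form computation with sympy.
from sympy import Matrix

A = Matrix([[2, 1, 1],
            [0, 2, 0],
            [0, 0, 3]])
P, J = A.jordan_form()        # sympy returns P and J with A = P * J * P**(-1)
print(J)                      # a Jordan canonical form of A
print(P.inv() * A * P == J)   # True, i.e., J = P^{-1} A P
```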
Let T = L^.504 Example 3 Let ( A = 2 Chap. and hence this dot diagram has the following form: Thus M = [T 2 ] A = 4 0 1 4 .4) 2 . and let T. Let r\ denote the number of dots in the first row of this diagram. We begin by computing the dot diagram of Ti. Since A2 = 4 has multiplicity 2. Therefore ^=[T. we have dim(KA2) = 2.]. a Jordan canonical basis for L^.4 . 7 Canonical Forms -2 -2 V. be the restriction of L^ to KA* for i = 1. The characteristic polynomial of A is dct(A . there is only 4 — 3 = 1 dot in the first row of the diagram. Then r.. 0\ is an arbitrary basis for EAX = N(T — 21). /2\ M 1 1 5 2 0 \v w 01 = Next we compute the dot diagram of T 2 .21) = 4 .2)2(t .2 -4 2 0 1 -2 3 -6 3 2^ 3 3 V We find the Jordan canonical form J of A. In this case.2 = 2. and A2 = 4. Ai = 2. for example. Since rank(. hence the dot diagram of Ti is as follows. = 4 .ti) = (t . = g °) where /?i is any basis corresponding to the dots.rank(.2. and a matrix Q such that J — Q _ 1 AQ.A — 4/) = 3.

One way of finding such a vector was used to select the vector v\ in Example 2.4. -1 -1 \ V Thus ( /0\ 1 ={ 1 I W Therefore /& (2y M 1 1 1 o • 2 < 1 v) 2/ W / -I -1 A 1 l 02 = {(LA-4\)(v).2 The Jordan Canonical Form II 505 where 02 is any basis for KA.4I) 2 ) such that v £ N(T .. The corresponding Jordan canonical form is given by Ax O O An / 2 0 0 \ 0 0 2 0 0 0 0 4 0 0 \ 0 1 4/ •/ = \\-A\B = . we illustrate another method. 02 is a cycle of length 2. corresponding to the dots.v} -i -i \ V o/ 0 = 0lU02 = V <v is a Jordan canonical basis for L.41 is Choose v to be any solution to the system of linear equations /0\ {A . 7. In this case. In this example. The end vector of this cycle is a vector v € K^a = N((T . A simple calculation shows that a basis for the null space of L^ .41).AI)x = 1 W for example.Sec.

3xy + y) = 1 + Ax . . and KA = V.4) = 6 . Example 4 Let V be the vector space of polynomial functions in two real variables x and y of degree at most 2. • 0 1\ 1 -1 1 -1 1 V For example. let rj denote the number of dots in the jib. Then /() 0 0 A = 0 0 vo 1 0 0 0 0 o 0 0 0 0 0 o 0 2 0 0 0 o 0 0 0 0 0 o 0\ 0 1 0 0 oy and hence the characteristic polynomial of T is f-t det(A .y2. We find the Jordan canonical form and a Jordan canonical basis for T. row of the dot diagram of T. n = 6 .3 = 3.xy} is an ordered basis for V. if f(x.506 Chap. then TCffoy)) = j. 7 Canonical Forms Finally. we define Q to be the matrix whose columns are the vectors of 0 listed in the same order. x2.(x + 2x2 .y. y) = x + 2x2 — 3xy + y.10.rank(.3y. Let A = [T]a.y). namely. Then V is a vector space over R and a = {l. 0 1 1 Q = 0 2 V2 0 f2 Then J = Q~lAQ. By Theorem 7.y)) = -^f(x.ti) = det V 0 0 0 0 () I -t 0 0 0 0 0 0 -t 0 0 0 0 2 0 -t 0 0 0 0 0 0 -t 0 0\ 0 1 = ? 0 0 -v Thus A = 0 is the only eigenvalue of T. For each j.x. Let T be the linear operator on V defined by T(f(x.

y)) and (T . Since the first column of the dot diagram of T consists of three dots. = T(h(x. Setting fi(x.y2.x. we see A that (T ..rank(. Because there are a total of six dots in the dot diagram and r\ = 3 and r 2 = 2. So the dot diagram of T is We conclude that the Jordan canonical form of T is / 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 J = 0 0 0 0 1 0 0 0 0 0 \ 0 0 0 0 0 o\ 0 0 0 0 0/ We now find a Jordan canonical basis for T.2 The Jordan Canonical Form II and since A) 0 0 . it follows that r$ = 1.x2. we see that x is a suitable candidate.y)) = 0.xy} for dx K = V. but ^(f2(x.y) d2 such that 2 :fi(x. we must find a polynomial / 2 (a.y) 7^ 0.1 = 2.y. y) such that ~(f2(x.4 2 ) = 3 .y)) = T2(h(x. we must find a polynomial f\(x. .X\)2(j\(x.X\)(h(x.y)) = ^(x2) = 2.y)) = —(x2) = 2x Likewise.y))^0. since the second column of the dot diagram consists of two dots.y) = x .Sec.42 = 0 0 \0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0\ 0 0 0 0 0/ 507 r 2 = rank(4) . 7. Examining the basis a = {l.

The reader can do the same in the exercises. Thus JA = JB by the corollary to Theorem 7. however. Then A is similar to both J A and JB. So we set /2(a. xy.y2} y >xy • ir (T . Hence JA and JB are Jordan canonical forms of L^. each having Jordan canonical forms computed according to the conventions of this section.y)) is a Jordan canonical basis for T.x2. We are successful in these cases because the dimensions of the generalized eigenspaces under consideration are small.y. Then A and B are similar if and only if they have (up to an ordering of their eigenvalues) the same Jordan canonical form. J A and JB are matrix representations of L^.?y) = xy.2x.508 Chap. suppose that A and B are similar. Thus = %-(xy) = y. although one could be devised by following the steps in the proof of the existence of such a basis (Theorem 7.10. The only remaining polynomial in a is y2. •2 • 2x • x2 Thus 0 = {2. So set fz(x. We do not attempt. Lot A and B he n x n matrices. ox Finally.23 (p. by the corollary to Theorem 2.11. and it is suitable here. Therefore we have identified polynomials with the dots in the dot diagram as follows.7 p. to develop a general algorithm for computing Jordan canonical bases. the only choice in a that satisfies these constraints is xy. respectively.. with the same ordering of their eigenvalues.X\)(f2(x>y)) = T(f2(x. • In the three preceding examples. If A and B have the same Jordan canonical form J. The following result may be thought of as a corollary to Theorem 7. 7 Canonical Forms Since our choice must be linearly independent of the polynomials already chosen for the first cycle. Theorem 7. then A and B are each similar to J and hence are similar to each other. Let J A and JR denote the Jordan canonical forms of A and B.y) = y2.10. and therefore. 115). 490). Then A and B have the same eigenvalues. we relied on our ingenuity and the context of the problem to find Jordan canonical bases. Conversely. 1 Example 5 We determine which of the matrices B = . the third column of the dot diagram consists of a single polynomial that lies in the null space of T. Proof.

(e) Every matrix is similar to its Jordan canonical form. Then (see Exercise 4) 0 °\ 0 2 o . Thus a linear operator T on a finite-dimensional vector space V is diagonalizable if and only if its Jordan canonical form is a diagonal matrix. '0 1 2' [ o i l . B. (g) Every linear operator on a finite-dimensional vector space has a unique Jordan canonical basis. respectively. A and C are not diagonalizable because their Jordan canonical forms are not diagonal matrices. Let J A. • The reader should observe that any diagonal matrix is a Jordan canonical form. 7. Label the following statements as true or false. (a) The Jordan canonical form of a diagonal matrix is the matrix itself. 0 0 2 and 0 0 Jc = ° 2 1 Vo 0 2 /I JB = Since J A = Jc-. or C. Thus. If 0 is any basis for V. whereas D has —t(t — l)(t — 2) as its characteristic polynomial.2 The Jordan Canonical Form II and /. using the ordering 1. and C have the same characteristic polynomial —(t — l)(t — 2) 2 . (h) The dot diagrams of a linear operator on a finite-dimensional vector space are unique. B is similar to neither A nor C. B. A is similar to C. then the Jordan canonical form of [T]^ is J . Since JB is different from J A and Jc. Assume that the characteristic polynomial of the matrix or linear operator splits. (b) Let T be a linear operator on a finite-dimensional vector space V that has a Jordan canonical form J. 2 for their common eigenvalues. a n d Jc be the Jordan canonical forms of A. D cannot be similar to A. EXERCISES 1. (c) Linear operators having the same characteristic polynomial are similar. Similar statements can be made about matrices. and C. Because similar matrices have the same characteristic polynomials. (d) Matrices having the same Jordan canonical form are similar. and C in Example 5. JB. Hence T is diagonalizable if and only if the Jordan canonical basis for T consists of eigenvectors of T. B. B. . 509 are similar. (f) Every linear operator with the characteristic polynomial (—l)n(t — A)ra has the same Jordan canonical form.Sec. of the matrices A.0 0 2. Observe that A.

if any.2. = KA/? For each eigenvalue A*.3) are as follows: Ai = 2 A.3 Find the Jordan canonical form J of T. 3. Let T be a linear operator on a finite-dimensional vector space V with Jordan canonical form / 2 1 0 0 2 1 0 0 2 0 0 0 0 0 0 0 0 0 V0 0 0 (a) (b) (c) (d) 0 0 0 2 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 3 0 0N 0 0 0 0 0 3J Find the characteristic polynomial of T.AJ)^). For which eigenvalues A. Find the dot diagram corresponding to each eigenvalue of T. . where Uj denotes the restriction of T . Suppose that Ai = 2.7 = Q~1AQ. find the smallest positive integer pi for which KAi = N((T . (i = 1. = 4 A3 = . (a) A = 1 2\ -1 2 -1 2 1 4/ ( (c) A = (d) A 0 -2 -2 V-2 . docs EA.Ajl to KA. A2 = 4. (b). 7 Canonical Forms 2. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. and A3 = —3 are the distinct eigenvalues of T and that the dot diagrams for the restriction of T to KA. (i) rank(Ui) (ii) rank(U 2 ) (iii) nullity(U?) (iv) nullity(U2) 4. Notice that the matrices in (a).510 Chap.. and (c) are those used in Example 5. find a Jordan canonical form J and an invertible matrix Q such that . For each of the matrices A that follow. (e) Compute the following numbers for each i.

(c) Use (b) to prove that A and At are similar. (a) Prove that [Tw]7' = ([Tw]7) • (b) Let J be the Jordan canonical form of A. 8.tet. 7. find a Jordan canonical form J of T and a Jordan canonical basis 0 for T. show that rank((A — Al)r) = rank((Al — Al)r).y). and T is the linear operator on V defined by T(f(x.y)) = £f(x. 7 be a cycle of generalized eigenvectors corresponding to an eigenvalue A. Hint: For any eigenvalue A of A and A1 and any positive integer r. Let A be an n x n matrix whose characteristic polynomial splits.e2t}. Let A be an n x n matrix whose characteristic polynomial splits. Let T be a linear operator on a finite-dimensional vector space.t2et. (b) T is the linear operator on ?3(R) defined by T(/(a:)) = xf"(x). {ex: x € 0} is a Jordan canonical basis for T. 7. Prove that A and A1 have the same Jordan canonical form. Define 7' to be the ordered set obtained from 7 by reversing the order of the vectors in 7. 6. (e) T is the linear operator on M 2 x 2 (i?) defined by T(A)=(l (f) ty(A-A% V is the vector space of polynomial functions in two real variables x and y of degree at most 2. (c) T is the linear operator on P-$(R) defined by T(f(x)) = f"(x) + 2f(x). and suppose that the characteristic polynomial of T splits.Sec. and conclude that A and A* are similar. Let 0 be a Jordan canonical basis for T. as defined in Example 4.2 The Jordan Canonical Form II 511 5. and T is the linear operator on V defined by T ( / ) = / ' . For each linear operator T. and W be the subspace spanned by 7. Use (a) to prove that J and J* are similar. (a) V is the real vector space of functions spanned by the set of realvalued functions {et. (a) Prove that for any nonzero scalar c. . (d) T is the linear operator on M 2x2 (i?) defined by T(A)-(» ty-A-A*.y) + ^f(x.

and let 0 be an ordered basis for V. then the new union is also a Jordan canonical basis for T. Prove the following results. that is different from the basis given in the example. 7 Canonical Forms (b) Suppose that 7 is one of the cycles of generalized eigenvectors that forms 0. 10. (d) Deduce that the number of dots in each column of a dot diagram is completely determined by the number of dots in the rows. and suppose that 7 corresponds to the eigenvalue A and has length greater than 1. A linear operator T on a vector space V is called if Tp = To for some positive integer p. 9. (a) Prove that dim(KA) is the sum of the lengths of all the blocks corresponding to A in the Jordan canonical form of T. Let x be the end vector of 7. Prove the following results. and let y be a nonzero vector in EA. Prove that any square upper triangular matrix with each diagonal entry equal to zero is nilpotent.512 Chap. Hint: Use mathematical induction on m. nilpotcnt nilpotent 11. where A is the matrix given in Example 2.Let 7' be the ordered set obtained from 7 by replacing x by x + y. Let T be a linear operator whose characteristic polynomial splits. (b) Deduce that EA = KA if and only if all the Jordan blocks corresponding to A are l x l matrices. (a) rn — p\ and k = r\. Definitions. and let A be an eigenvalue of T. (c) Apply (b) to obtain a Jordan canonical basis for LA. (c) n > r 2 > ••• > rm. The following definitions arc used in Exercises 11 19. (a) N ( P ) C N(T' + 1 ) for every positive integer i. and that if 7' replaces 7 in the union that defines 0. Suppose that a dot diagram has k columns and m rows with pj dots in column j and r* dots in row i. An nxn matrix A is called if Ap — O for some positive integer p. Let T be a nilpotent operator on an n-dimensional vector space V. 13. 12. (b) Pj = max {i: Ti > j} for 1 < j < k and r^ = max {j: pj > i} for 1 < i < rn. . Let T be a linear operator on a finite-dimensional vector space V. and suppose that p is the smallest positive integer for which T p = T 0 . Prove that T is nilpotcnt if and only if \T]j3 is nilpotent. Prove that 7' is a cycle of generalized eigenvectors corresponding to A.

486) and Exercise 8 of Section 7. and let J be the Jordan canonical form of T. SU = US. (c) Let 0 = 0P be the ordered basis for N(TP) = V in (b). and let Ai.) 17. 15. 16.) (a) Prove that S is a diagonalizable linear operator on V.2 The Jordan Canonical Form II 513 (b) There is a sequence of ordered bases 0\. that is. Recall from Exercise 13 that A = 0 is the only eigenvalue of T. for each i. 18. Let D be the diagonal matrix whose diagonal entries are the diagonal entries of J. Prove that for any positive integer i.. and let M = J — D. Then [T]^ is an upper triangular matrix with each diagonal entry equal to zero. such that x — v\ + V2 H \-Vfc. Vi is the unique vector in KA.3 (p. and hence V = KA. (b) Let U = T — S. (a) M is nilpotent.. 0P such that 0i is a basis for N(T*) and 0i+\ contains 0i for 1 < i < p — 1. Prove the converse of Exercise 13(d): If T is a linear operator on an ndimensional vector space V and (—l) n t n is the characteristic polynomial of T. Let T be a linear operator on a finite-dimensional vector space V. . Let S: V — V be the mapping defined by * S(x) = Ai?. (d) The characteristic polynomial of T is (—l)ntn. all the vectors associated with that column arc removed from 0.j + X2v2 H h Afc-Ufc. the resulting set is a basis for R(T*). Characterize all such operators. Hence the characteristic polynomial of T splits. Prove that U is nilpotent and commutes with S. diagram of 0. but zero is the only eigenvalue of T. and 0 is the only eigenvalue of T. .(This unique representation is guaranteed by Theorem 7. Let T be a nilpotent linear operator on a finite-dimensional vector space V. Give an example of a linear operator T on a finite-dimensional vector space such that T is not nilpotent.1.Sec. 7. (If a column of the dot diagram contains fewer than i dots. . 14. then T is nilpotent.Let 0 be a Jordan canonical basis for T. where. . Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. A 2 . .. (b) MD = DM. if we delete from 0 the vectors corresponding to the last i dots in each column of a dot. Afc be the distinct eigenvalues of T. Prove the following results. p\..

19. Ar rAT Jr = 0 A? _x r ( r .l ) 2! rX r .. r = £>r + rDr~1M + r{'r~V)Dr-2M2 + ••• + rDMr~l + Mr. 0 (b) For any integer r >m.. .( r ..514 Chap. jr = Dr + rDr~lM + r^~^Dr-2M2 2! r! + (r-p-rl)\(p-l)\ + ••• Dr~p+1Mp-1.l ) . Prove the following results: (a) N m = O. 1 otherwise. and. for any positive integer r < p. for any positive integer r > p.( r . Let (X 0 0 J = 0 \0 0 0 0 0 ••• ••• 1 A/ 1 0 ••• A 1 ••• 0 A ••• 0\ 0 0 be the m x m Jordan block corresponding to A.m + 3) . r _ m + 2 (m-2)! \0 (c) 0 0 AT lim Jr exists if and only if one of the following holds: T—>00 (i) |A| < 1. (ii) A = 1 and m = 1. 7 Canonical Forms (c) If p is the smallest positive integer for which Mp — O. and let N = J — XIm.m + 2)xr_m+1^ \r—m+l (m-1)! r ( r .l ) . and for 1 < r < m.l r ( r .. then.

) 22. (d) Prove Theorem 5. and A is an n x n coefficient matrix as in Exercise 15 of Section 5..xn(t) of the real variable t. (b) There exists a positive number c such that ||J m || < c for every positive integer m. (c) P + S | | < ||A|| + ||£||. For any A G M n X n (C). (d) lim Am exists if and only if 1 is the only eigenvalue of A with m >o —o absolute value 1. . .19. Let A. Afc be the distinct eigenvalues of A.-w. 7. Use Exercise 20(d) to prove that eA exists for every A G M n x n ( C ) . The following definition is used in Exercises 20 and 21. Sec. Prove the following results.20(a) using (c) and Theorem 5. 23. .X2(t). ||AB||<n||A||||S||. A 2 . where x is an n-tuple of differentiable functions x\(t). 21. define the norm of A by \\A\\ = max{|j4jj|: 1 < i.2 The Jordan Canonical Form II 515 (Note that lim Ar exists under these conditions.. (See page 312. See the discusr—*oo sion preceding Theorem 5.13 on page 285. Let P be an invertible matrix such that P~lAP = J. (See Section 5. (a) (b) (d) ||A|| > 0 and ||A|| = 0 if and only if A = O.3. Let A G M n x n ( C ) be a transition matrix. 20. B G M n x n ( C ) . but assume that the characteristic polynomial of A splits.. . Definition. Let x' — Ax be a system of n linear differential equations.2. (a) ||A m || < 1 for every positive integer rn. .) Since C is an algebraically closed field. \\cA\\ = |c|»||i4|| for any scalar c. Prove the following results.j < n}. lim Jr r—»oo is the zero matrix if condition (i) holds and is the l x l matrix (1) if condition (ii) holds. Let Ai. (e) Theorem 5. A has a Jordan canonical form J to which A is similar.) Furthermore. In contrast to that exercise. do not assume that A is diagonalizable. . The next exercise requires knowledge of absolutely convergent series as well as the definition of eA for a matrix A.13 on page 285. however. (c) Each Jordan block of J corresponding to the eigenvalue A = 1 is a l x l matrix.

then for any polynomial f(t) of degree less than p. Hence there is a polynomial of least degree with this property. there is a polynomial f(t) of degree n such that f(T) = To. Theorem 7. 317) tells us that for any linear operator T on an n-dimensional vector space. (b) The minimal polynomial of T is unique. the function ex*[f(t)(A .3 THE MINIMAL POLYNOMIAL The Cayley-Hamilton theorem (Theorem 5. In particular. that is.Xilf-1 + f'(t)(A . y. (See Appendix E.23 p. we can divide g(t) by its leading coefficient to obtain another polynomial p(t) of the same degree with leading coefficient 1. Use Exercise 23 to find the general solution to each of the following systems of linear equations. 7 Canonical Forms (a) Prove that if u is the end vector of a cycle of generalized eigenvectors of LA of length p and u corresponds to the eigenvalue A. (b) Prove that the general solution to x' = Ax is a sum of the functions of the form given in (a). p(t) divides the characteristic polynomial ofT. (a) Let g(t) be a polynomial for which g(T) = TQ. (a) For any polynomial g(t). where x. thenp(t) divides g(t). where the vectors u are the end vectors of the distinct cycles that constitute a fixed Jordan canonical basis for LA24. Let T be a linear operator on a finite-dimensional vector space.12. there exist polynomials q(t) and r(t) such that 9(t) = q(t)p(t)+r(t). the characteristic polynomial of T. The next result shows that it is unique. and z are real-valued differentiable functions of the real variable t. If g(t) is such a polynomial.) Definition. The preceding discussion shows that every linear operator on a finitedimensional vector space has a minimal polynomial. (1) . namely. 562).. x' = 2x + y x' = 2x + y (a) y'= 2yz (b) y'= 2y + z z' = 3z z'= 2z 7. if g(T) = To.516 Chap. By the division algorithm for polynomials (Theorem E. Proof. p(t) is a monic polynomial.l of Appendix E.Xi\)p-2 + ••• + / ^ ( i O J t i is a solution to the system x' = Ax. p. and this degree is at most n. Let p(t) be a minimal polynomial of a linear operator T on a finite-dimensional vector space V. A polynomial p(t) is called a minimal polynomial ofT if p(t) is a monic polynomial of least positive degree for which p(T) — To.

Since p\(t) and P2(t) have the same degree. Let A € M n X n (F). we study primarily minimal polynomials of operators (and hence matrices) whose characteristic polynomials split.13. Definition. [~] Corollary. r(t) must be the zero polynomial. Thus (1) simplifies to g(t) — q(t)p(t). Proof. For any A G Mnxn(F). Let f(t) be the characteristic polynomial of T.12 and all subsequent theorems in this section that are stated for operators are also valid for matrices. (b) Suppose that p\(t) and p2(£) are each minimal polynomials ofT. Then P\(t) divides p2(£) by (a). Then the minimal polynomial ofT is the same as the minimal polynomial of [T]p. I| In view of the preceding theorem and corollary. Theorem 7. Proof. Theorem 7. For the remainder of this section. The minimal polynomial p(t) of A is the monic polynomial of least positive degree for which p(A) = O. Exercise.3 The Minimal Polynomial 517 where r(t) has degree less than the degree of p(i). and let 0 be an ordered basis for V. there exists a polynomial q(t) such that f(t) = q(t)p(t). Hence the characteristic polynomial and the minimal polynomial of T have the same zeros. Theorem 7. we have r(T) = To. Exercise. and let p(t) be the minimal polynomial of T.4. c = 1. hence P\(t) = p 2 (i). Let T be a linear operator on a finite-dimensional vector space V. A more general treatment of minimal polynomials is given in Section 7.14. Because Pi(t) and p 2 (i) are monic. then f(X) = q(X)p(X) = q(X).Sec. . If A is a zero of p(t). D The minimal polynomial of a linear operator has an obvious analog for a matrix. Substituting T into (1) and using that g(T) — p(T) — T 0 . Let T be a linear operator on a finite-dimensional vector space V. A is an eigenvalue of T. that is. Since p(t) divides f(t). 7. we have that p2(t) = cpi(t) for some nonzero scalar c.0 = 0. A scalar X is an eigenvalue ofT if and only ifp(X) = 0. The following results are now immediate. the minimal polynomial of A is the same as the minimal polynomial of LA • Proof. So A is a zero of f(t). proving (a). Since r(t) has degree less than p(t) and p(t) is the minimal polynomial of T.

that A is an eigenvalue of T.518 Chap.2)(t . . and so A is a zero of p(t). . we have 0 = T o (aO=p(T)(aO=p(A)a.(\k-t)n'>. Then [T]p = '2 . 7 Canonical Forms Conversely.14.nifc such that 1 < m» < rii for all i and p ( t ) « ( t . suppose. 1 \ i 2 . hence p(t) is the minimal polynomial of A.A*)™*. Example 1 We compute the minimal polynomial of the matrix 4 = Since A has the characteristic polynomial (3-t -1 0 \ 0 2-t 0 = -(t-2)2{t~3). Corollary. Let T be a linear operator on a finite-dimensional vector space V with minimal polynomial p(t) and characteristic polynomial f(t).6a + b) and 0 be the standard ordered basis for R2. Substituting A into p(t) = (t . . m 2 . Suppose that f(t) factors as f(t) = (\1-t)ni(\2-t)n>-. Afc are the distinct eigenvalues ofT. The following corollary is immediate. . b) = (2a + 56. .7 ) ( * + 4).. Then there exist integers mi. .A a ) m « • • • ( « .1. and let x G V be an eigenvector corresponding to A. .3). we find that p(A) — O. By Exercise 22 of Section 5. Since x ^ 0. I where X\. .A 1 ) m » ( t . > 6 1 and hence the characteristic polynomial of T is /(*)=det(2~' l l t ) = ( * . .. it follows that p(X) = 0.7 the minimal polynomial of A must be either (t — 2)(t — 3) or (t — 2)2(t — 3) by the corollary to Theorem 7. • Thus the minimal polynomial of T is also (t — 7)(t + 4). • / ( t ) = det Example 2 Let T be the linear operator on R2 defined by T(a. A 2 .

We now investigate the other extreme. Then the characteristic polynomial f(t) and the minimal polynomial p(t) have the same degree. Then afc 7^ 0 and g(T)(x) = a0x + axT(x) + ••• + o. the minimal polynomial of D is t. Since 0 is linearly independent. Let g(t) = a0-ra1t-r--+ aktk. This is no coincidence.7.15. or t3.. it follows that g(T)(x) ^ 0. Let T be a linear operator on an n-dimcnsional vector space V such that V is a T -cyclic subspace of itself. there exists an . and so g(T)(x) is a linear combination of the vectors of 0 having at least one nonzero coefficient. and hence f(t) = (-l)np(t). it follows that D2 ^ T 0 . We compute the minimal polynomial of T. it is easily verified that P 2 (i?) is a D-cyclic subspace (of itself).T(x).kTk(x). By Theorem 7. hence g(T) / To. So by the corollary to Theorem 7. Since D2(a.Sec. G V such that 0 = {x.3 The Minimal Polynomial Example 3 519 Let D be the linear operator on P 2 (/?) defined by D(g(x)) — g'(x).14.. 315). Theorem 7. the derivative of g(x). Then [O]0 = /(> 1 0 \ 0 0 2 . afc. the degree of the minimal polynomial of an operator must be greater than or equal to the number of distinct eigenvalues of the operator.14. which is also the degree of the characteristic polynomial of T. Let 0 be the standard ordered basis for P 2 (i?). namely. • In Example 3. \o 0 0 / and it follows that the characteristic polynomial of D is — t3. Since V is a T-cyclic space. Here the minimal and characteristic polynomials are of the same degree. . Therefore the minimal polynomial of T has degree n.Tn-1(x)} is a basis for V (Theorem 5.. be a polynomial of degree k < n.22 p..15 gives a condition under which the degree of the minimal polynomial of an operator is as large as possible. Proof. Theorem 7. 7. t2.2) = 2 ^ 0. The next result shows that the operators for which the degree of the minimal polynomial is as small as possible are precisely the diagonalizable operators. hence the minimal polynomial of D must be t3.

520 Chap. Afc be the distinct Proof.I)(T . 7 Canonical Forms Theorem 7. A . Then 0\ and 02 are disjoint by the previous comment. Suppose that T is diagonalizable.Afcl. .. A2 eigenvalues of T.V2. . Now let ft = {ci. which is clearly diagonalizable.A2) • • • (/ — Afc_i). and define p(t) = (t-\1)(t-\2)•••(tAfe). Let Ai. Furthermore. W2. . Clearly T is diagonalizable for n = 1. Tw is diagonalizable.wp} be a basis for N(T . and lei dim(V) = n and W = R(T . . and consider any '.A2I) • • • (T . Afc such that the minimal polynomial p(t) of T factors as p(t) = (t-X1)(t-X2)•••(*Afc). p(t) divides the minimal polynomial of T.A? divides p(i). Hence by the induction hypothesis.Vn} be a basis for V consisting of eigenvectors of T. Then W is T-invariant. there is a polynomial qj(t) such that p(t) = qj(t)(t . Therefore p(t) is the minimal polynomial of T.. Then (T — Xj\)(Vi) = 0 for some eigenvalue Ay.bv such that a\V\ + a 2 t' 2 .!)(:/. then T = Afc I. and for any x G W. . Conversely. A 2 .. A . and let 02 = {w\. By Theorem 7.14. We apply mathematical induction on n = diin(V). It follows that p(T) = To. We show that Q = 0i U 02 is linearly independent.AAI) = {()}.. the A/'s are eigenvalues of T.. Let T be a linear operator on a finite-dimensional vector space V.A/. .• • . . because Afc is an eigenvalue of T. vm) be a basis for W consisting of eigenvectors of Tw (and hence of T). . Now assume that T is diagonalizable whenever dim(V) < n for some n > 1.A.are the distinct eigenvalues ofT. the eigenspace of T corresponding to Afc. By Theorem 7. Let 0 — {vi.16. If W = {0}. It follows that the minimal polynomial of Tw divides the polynomial (/ — Ai)(r .r. Therefore W n N(T . a m and b\..: G 0.) = 0. • • • . suppose that there are distinct scalars Ai.Afcl). (T . A2 (t-X1)(t-X2)---(t-Xk).-1). .14.Afc_.b2. Since t .. . a 2 . Hence p(T)(vi) = qj(T)(T-Xj\)(vi) = 0.t' m + b\ u-\ + b2ti<2 + • • • + bpwp = 0. Consider scalars a ] . since p(T) takes each vector in a basis for V into 0. Afc is not an eigenvalue of Tw by Theorem 7. Obviously W ^ V. .14.Xj). Then T is diagonalizable if and only if the minimal polynomial ofT is of the form p(t) = where Ai. So suppose that 0 < diin(W) < n. Moreover.'.. m+p = n by the dimension theorem applied to T .

It follows that 0 is a basis for V consisting of eigenvectors of T.3 The Minimal Polynomial Let x = 22 aiVi and */ = ! > < (D. and we conclude that 0 is a linearly independent subset of V consisting of n eigenvectors.l)(t . the minimal polynomial p(t) of A divides g(t). (See Exercise 7 of Section 7.16. and therefore .4. t . then A = / or A ~ 21. Hence the only possible candidates for p(i) are t — \. It follows that x = -y <E W fl N(T — Afcl). We show that A is diagonalizable. and x + y = 0. If p(t) = t . In the case that the characteristic polynomial of the operator splits. the minimal polynomial can be described using the rational canonical form. Since g(t) has no repeated factors.3A-r 21 = O. i=l 521 Then x G W. which we study in the next section.) In the case that the characteristic polynomial does not split. 7. Since 0\ is linearly independent.1 or p(t) =t-2. then A is diagonalizable with eigenvalues 1 and 2. we saw that the minimal polynomial of the differential operator D on P2(7?) is t3.Sec. Thus A is diagonalizable by Theorem 7.3t + 2 = (t .r = 0. • Example 6 In Example 3. y G N(T . 1 In addition to diagonalizable operators.) Example 4 We determine all matrices A G M 2X 2(fl) for which A2 . there arc methods for determining the minimal polynomial of any linear operator on a finite-dimensional vector space.2. Similarly. by Theorem 7. and (I . and hence the minimal polynomial p(t) of A divides g(t). the minimal polynomial can be described using the Jordan canonical form of the operator.16.Afcl). Then g(A) = O.2). • . D is not diagonalizable. (See Exercise 13. neither does p(t). Hence. Let g(t) = t3 . we have that ai = a 2 = • • • = a m = 0. respectively.2). If p(t) = (t — l)(t — 2). Since g(A) = O. b\ = 62 = • • • = 6» = 0.t = t(t + l)(t .1). and consequently T is diagonalizable.l)(t . and hence A is similar to 1 0 0 2 Example 5 Let A £ MnXn(7?) satisfy A3 = A. Let 2 g(t) = t .

Suppose that f(t) splits. b) = (a + b. Then the degree of the minimal polynomial of T equals dim(V).522 EXERCISES Chap. (i) Let T be a linear operator on a vector space V such that T has n distinct eigenvalues. (h) Let T be a linear operator on a vector space V such that V is a T-cyclic subspace of itself. Find the minimal polynomial of each of the following matrices. where n — dim(V). (c) The characteristic polynomial of a linear operator divides the minimal polynomial of that operator. (e) Let T be a linear operator on an n-dimensional vector space V. Determine which of the matrices and operators in Exercises 2 and 3 are diagonalizable. and f(t) be the characteristic polynomial of T. 4. Label the following statements as true or false. . Then the degree of the minimal polynomial of T equals n. (a) Every linear operator T has a polynomial p(t) of largest degree for which 7->(T) = TQ. 7 Canonical Forms 1. Then f(t) divides \p(t)\n(f) The minimal polynomial of a linear operator always has the same degree as the characteristic polynomial of the operator. 2. find the minimal polynomial of T. (a) (b) (c) (d) V= V= V= V= R2 and T(o. (g) A linear operator is diagonalizable if its minimal polynomial splits. a . Hint: Note that T 2 = I.b) P2(R) and T(g(x)) = g'(x) + 2g(x) P2(R) and T(f(x)) = -xf"(x) + f'(x) + 2f(x) M n X n (i?) and T(A) = A*. (d) The minimal and the characteristic polynomials of any diagonalizable operator are equal. Assume that all vector spaces are finite-dimensional. For each linear operator T on V. p(t) be the minimal polynomial of T. (a) 2 1\ 1 2j 4 -14 1 -4 1 -6 5 2 4 (b) 1 0 1 1 (c) 3. Describe all linear operators T on R2 such that T is diagonalizable and T3 _ 2T 2 + T = T 0 . 5. (b) Every linear operator has a unique minimal polynomial.

1 + a n _ . Prove that the minimal polynomial of T is (t-x1r(t-x2y^. and suppose that W is a T-invariant subspace of V. Prove that V is a T-cyclic subspace if and only if each of the eigenspaces of T is one-dimensional. . . Prove the following results. (b) If T is invertible and p(t) = tn + a n _ i £ n _ 1 + • • • + ait + a 0 .2). (b) The minimal polynomial of Dv (the restriction of D to V) is g(t).7). I ) . 7.13 and its corollary. (a) V is a D-invariant subspace. Let Ai. Hint: Use Theorem 2. where D is the differentiation operator on C°°. Prove that the minimal polynomial of Tw divides the minimal polynomial of T. The following exercise requires knowledge of direct sums (see Section 5. 7. . Prove Theorem 7. then the characteristic polynomial of Dv is(-l)ng(t). 11. . T n . 12. Prove the following results. Let g(t) be the auxiliary polynomial associated with a homogeneous linear differential equation with constant coefficients (as defined in Section 2. and for each i let pi be the order of the largest Jordan block corresponding to Xi in a Jordan canonical form of T.1 = -— ( T " . a0 9. 135) for (b) and (c). Prove that there exists no polynomial g(t) for which </(D) = T 0 . (c) If the degree of g(t) is n. and suppose that the characteristic polynomial of T splits. 13. (a) T is invertible if and only if p(0) -£ 0.14.--(t-xk)p^. Let T be a linear operator on a finite-dimensional vector space. the space of polynomials over R. and let p(t) be the minimal polynomial of T. 10. Let T be a diagonalizable linear operator on a finite-dimensional vector space V.2 + • • • + a 2 T + a . 523 8. Prove the corollary to Theorem 7. Let T be a linear operator on a finite-dimensional vector space V. then T . Hence D has no minimal polynomial. . A 2 . Let D be the differentiation operator on P(/?). and let V denote the solution space of this differential equation.Sec.32 (p.3 The Minimal Polynomial 6. Afc be the distinct eigenvalues of T. Let T be a linear operator on a finite-dimensional vector space.

and generalized eigenvectors in our analysis of linear operators with characteristic polynomials that split. eigenvectors. (a) There exists a unique monic polynomial g\ (t) of least positive degree such that 0] (T)(.• ^ Wi. (c) If p(t) is the T-annihilator of x and W is the T-cyclic subspace generated by x. (a) The vector x has a unique T-annihilator. characteristic polynomials need not split. Then gi(t) divides g2(t). (d) The degree of the T-annihilator of x is I if and only if x is an eigenvector of T. 7. Definition. * Let T be a linear operator on a finite-dimensional vector space V. then gi(t) divides h(t). the unique factorization theorem for polynomials (see Appendix E) guarantees that the characteristic polynomial f(t) of any linear operator T on an n-dimensional vector space factors . 16. and indeed. and let Wi be a T-invariant subspace of V.T) G WI. Exercise 15 uses the following definition. Let x G V such that . The polynomial p(t) is called a T-annihilator of x if p(t) is a monic polynomial of least degree for which p(T)(x) = 0.524 Chap. (c) gi(t) divides the minimal and the characteristic polynomials of T. that p\{t)p2(t) is the minimal polynomial of T. 15. Let T be linear operator on a finite-dimensional vector space V. Prove or disprove. and let x be a nonzero vector in V. respectively. (d) Let W 2 be a T-invariant subspace of V such that W 2 C W^ and let g2(t) be the unique monic polynomial of least degree such that g2(T)(x) G W 2 . then />(<) is the minimal polynomial of Tw. and Tw 2 . (b) The T-annihilator of X divides any polynomial git) for which 9(T) = T(). and dim(W) equals the degree of p(t).4* THE RATIONAL CANONICAL FORM Until now we have used eigenvalues. and let x be a nonzero vector in V. Prove the following results. . Prove the following results.(t) is a polynomial for which h(T)(x) G Wi. 7 Canonical Forms 14. Suppose that pi(t) and p2(t) are the minimal polynomials of Tw. and let Wi and W 2 be T-invariant subspaces of V such that V = W| ©W 2 . In general. (b) If fi. operators need not have eigenvalues! However.. T be a linear operator on a finite-dimensional vector space V. Let T be a linear operator on a linite-dimensional vector space V.

In general. Let T be a linear operator on a finite-dimensional vector space V with characteristic polynomial f(t) = ( . For 1 < i < k. Having obtained suitable generalizations of the related concepts of eigenvalue and eigenspace. The one that we study is called the rational canonical form. then the set {^T^.4 The Rational Canonical Form uniquely as f(t) = (-i)n(Mt)r(<h(t))n2---(Mmnk 525 where the </>i(£)'s (1 < i < k) are distinct irreducible monic polynomials and the ni's are positive integers. Definition. Recall (Theorem 5.To distinguish this basis from all other ordered bases for Cx. 315). and let x be a nonzero vector in V. In this section. We briefly review this concept and introduce some new notation and terminology Let T be a linear operator on a finite-dimensional vector space V.. the following definition is the appropriate replacement for eigenspace and generalized eigenspace. Note that if (/>i(t) = t — X is of degree one. we call it the T-cyclic basis g e n e r a t e d by x and denote it by .22 (p. we establish structure theorems based on the irreducible monic factors of the characteristic polynomial instead of eigenvalues. of V by K^. For this reason. each irreducible monic polynomial factor is of the form </»j(i) = t — Xi. and there is a one-to-one correspondence between eigenvalues of T and the irreducible monic factors of the characteristic polynomial.Sec. = {x G V: (<fii(T))p(x) — 0 for some positive integer p). our next task is to describe a canonical form of a linear operator suitable to this context. then K^ is the generalized eigenspace of T corresponding to the eigenvalue A. we define the subset K^. In the case that f(t) splits. 7. but the irreducible monic factors always exist. We use the notation Cx for the T-cyclic subspace generated by x. where Aj is an eigenvalue of T..22) that if dim(Ca. In this context. where the <t>i(t)'s (1 < i < k) arc distinct irreducible monic polynomials and the ni 's are positive integers.^-1^)} is an ordered basis for Cr.T2^). eigenvalues need not exist. the reader should recall the definition of a T-cyclic subspace generated by a vector and Theorem 5. We show that each K^ is a nonzero T-invariant subspace of V.. it can be defined by specifying the form of the ordered bases allowed for these representations. Since a canonical form is a description of a matrix representation of a linear operator.) = k.i r O i ( * ) ) n i (Mt))n* ••• (MW. Here the bases of interest naturally arise from the generators of certain cyclic subspaces..

is the companion is a monic irreducible divisor of the characteristic polynomial of T and m is a positive integer. The next theorem is a simple consequence of the following lemma. the monic polynomial h(t) is also the minimal polynomial of A.22 that /o 0 • • 0 1 0 • • 0 0 1 • • 0 xo A -ao \ -ai -a2 -afc-i/ + Tk(x) = 0. Lemma. By Exercise 15 of Section 7. and suppose that the T-annihilator of x is of the form ((j)(t) )v for some irreducible monic polynomial <j)(t). 519). which relies on the concept of T-annihilator. . and x G K&. Let T be a linear operator on a finite-dimensional vector space V.) By Theorem 7.3.3. introduced in the Exercises of Section 7. (See Exercise 19 of Section 5. It is the object of this section to prove that for every linear operator T on a finite-dimensional vector space V. + ••• + ak-ifk~1 + tk. Recall from the proof of Theorem 5.4. 7 Canonical Forms 0X. and the characteristic polynomial of the companion matrix of a monic polynomial g(t) of degree k is equal to ( — l)kg(t). We call the accompanying basis a rational canonical basis for T. the characteristic polynomial of A is given by det(i4 . 0 1 ak-lTk-l{x) where aox + a\T(x) + Furthermore. there exists an ordered basis 0 for V such that the matrix representation [T]^ is of the form (Cl o O o c2 Cr) \o matrix of a polynomial (<t>(t))m such that <j)(t) o where each C7. h(t) is also the minimal polynomial of this restriction. Let A be the matrix representation of the restriction of T to C x relative to the ordered basis 0X.15 (p. Then (f>(t) divides the minimal polynomial of T. h(t) is also the T-annihilator of x. let x be a nonzero vector in V. Since A is the matrix representation of the restriction of T to Cx. A matrix representation of this kind is called a rational canonical form of T.526 Chap.ti) = (-l) f c (a 0 + axt + • • • + k-\ ak-it The matrix A is called the companion matrix of the monic polynomial h(t) = ao + ai/. Every monic polynomial has a companion matrix.

v3. the characteristic polynomial f(t) of T is the product of the characteristic polynomials of the companion matrices: f(t) = Mt)(h(t))2Mt) = Mt)(Mt))3• .. By Exercise 40 of Section 5.~= Sec.17.4. x G K. respectively. v6} U {v7. Proof.v2} U {v3.17. In the context of Theorem 7.4 The Rational Canonical Form 527 Proof. Exercise.v2.v8}. Example 1 Suppose that T is a linear operator on R8 and 0= {vi. Let T be a linear operator on a finite-dimensional vector space V. (<i>(t))p divides the minimal polynomial of T. Therefore 4>(t) divides the minimal polynomial of T.v4. and Cz are the companion matrices of the polynomials <f>\ (t).v5.v8} Ill is a rational canonical basis for T such that ( o -3 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 -1 0 0 0 -2 1 0 0 0 0 0 0 0 \ 0 0 0 0 0 0 0 0 0 0 0 -1 1 oy C=[Tb = v is a rational canonical form of T. Then 0 is a rational canonical basis for T if and only if 0 is the disjoint union of T-cyclic bases 0Vi.3. where each Vi lies in K^ for some irreducible monic divisor <p(t) of the characteristic polynomial ofT.v6. 0 = 0VI U 0V3 U 0V7 = {vi. and 0 2 (£).V4. (cj)2(t))2. In this case. 0 is the disjoint union of the T-cyclic bases. the submatrices C\. and let 0 be an ordered basis for V.v7. By Exercise 15(b) of Section 7.p by the definition of K^.v5. | Theorem 7. Furthermore. that is. C 2 . 7. where <j)l(t)=t2 -t + 3 and 4>2(t) = t2 + 1.

(T) to K(j>j is one-to-one and onto. and the result follows. then the T-annihilator of x is of the form ((j>i(t))p for some integer p. (a) K0( i.18. and (e) are obvious. Let T lie a linear operator on a finite-dimensional vector space V. Now suppose that k > 1. Since the minimal polynomial of an operator divides tin1 characteristic polynomial of the operator. every linear operator T on a finite dimensional vector space has a rational canonical form C. the minimal polynomial and the characteristic polynomial have the same irreducible divisors. (d) K0. We begin with a result that lists several properties of irreducible divisors of the minimal polynomial. (e) K. while (c) and (d) are vacuously true. . The reader is advised to review the definition of T-annihilator and the accompanying Exercise 15 of Section 7. is invariant under (f>j(T) for i ^ j.528 Chap.. where the <f>i(t)*s (1 < i < k) arc the distinct irreducible monic factors of pit) and the nij 's are positive integers. (b). Theorem 7. each of which is the companion matrix of some power of a monic irreducible divisor of the characteristic polynomial of T. every irreducible divisor of the former is also an irreducible divisor of the latter. If k = 1. (b) If x is a nonzero vector in some K^. Then the following statements are true.. first observe that /. To prove that K^.( because (<i>i(T))mi(x) = (d>i(T))mifi(T)(z)=p(T)(z) q = 0. (c) K0.(/))'"'. Then x G K. and suppose that p(t) = (Mt))mi(Mt)r3---(Mt))mk is the minimal polynomial ofT. where 4>(t) is an irreducible monic divisor of the minimal polynomial of T. and the restriction of 0. In the course of showing that. (a) The proof that Kfl>i is a T-invariant subspace of V is left as an exercise. Hence the T-annihilator of x divides (4>%(t))q by Exercise 15(b) of Section 7. 7 Canonical Forms The rational canonical form C of the operator T in Example 1 is constructed from matrices of the form C\. is nonzero.. Proof. nKlh = {()} fori ^j. We eventually show that the converse is also true: that is. A key role in our analysis is played by the subspaces K^. Furthermore.3.(T))'"<) for each i. Then (<f>i(T)) (x) = 0 for some positive integer q.(/) is a proper divisor of pit): therefore there exists a vector z G V such that x — fi(T)(z) ^ 0. Let fi(t) be the polynomial obtained from p(t) by omitting the factor (0. we show that the companion matrices C.s a nonzero T-invariant subspace of V for each i.3. each such divisor is used in this way at least once. then (a).N((0.. (b) Assume the hypothesis.' . that constitute C are always constructed from powers of the monic irreducible divisors of the characteristic polynomial of T. .

N((0.iT)y"')fiiT)iy) =p(T)(y) = 0. The next major result.(T))'"') C K^.Sec. Then x G K^. and fr(V(vj) = 0 for i / j. Let x G K</)( HK^... so suppose that k > 1.(T) to (2). Then there exists y G K 0i such that fj(T)(y) = x. Suppose that 0_.19. (2) . applying /.(T) to K 0i is onto.19. Since V is finite-dimensional. Let x G Kfa. we need to know when such a union is linearly independent. reduces this problem to the study of T-cyclic bases within K^. we obtain /„(T)(«0 = 0. where <p(t) is an irreducible monic divisor of the minimal polynomial of T. Proof. Since fj(t) is a product of polynomials of the form (pj(i) for j ^ i. Then v{ = 0 for all i. Lemma. (e) Suppose that 1 < i < k. For 1 < i < k. /i(T) is one-to-one on K^. Let fi(t) be the polynomial obtained from p(t) by omitting the factor (<t>i(t))"1' • As a consequence of Theorem 7. We begin with the following lemma. it is also 0y(T)-invariant. and suppose that p(t) = (Mt))mi(Mt))m3---(</>k(t)yrn/. We conclude that x = 0. Consider any i. where the 0. 7. Thus. and suppose that p(t) = (Mt))mi(<h(t))m2---(Mt))mk is the minimal polynomial of T. I and hence x G N ( 0 . D K<t>j = {<)} by (c). I Theorem 7. Let fc(t) be the polynomial defined in (a). we have by (d) that the restriction of /. By (b). = N((^(T))'"')- Since a rational canonical basis for an operator T is obtained from a union of T-cyclic bases. from which it follows that Vi = 0. The result is trivial if k = 1. this restriction is also onto. Theorem 7. and suppose that x ^ 0. Therefore ((MVr-Kx) = «<P. let Vi G K 0i be such that V1+V2 + '" +vk = 0. the T-annihilator of. X G K 0 / ... 's (1 < i < k) are the distinct irreducible monic factors of pit) and the rrii 's are positive integers. But this is impossible because (j>i(t) and <f>j(t) are relatively prime (see Appendix E)./: is a power of both 4>i(t) and <I)j(t). Let T be a linear operator on a finite-dimensional vector space V.4 The Rational Canonical Form 529 (c) Assume i ^ j. Clearly. (d) Assume / ^ j. Therefore the restriction of 0y(T) to K 0i is one-to-one. ( T ) ) m ' ) .-(T)(:/:) = 0 for SOUK. Thus K 0 . Let T be a linear operator on a finite-dimensional vector space V. Since K<Pi is T-invariant.18.

. Furthermore. Then vk be distinct vectors in K. we fix a linear operator T on a finite-dimensional vector space V and an irreducible monic divisor 6(t) of the minimal polynomial of T. Then (a) follows immediately from Theorem 7. where 0(f) is an irreducible monic divisor of the minimal polynomial of T.. Proof.20 and 7.. Now : suppose that k > 1. Let v\. the proof of (b) is identical to the proof of Theorem 5. Si =0VlU is linearly independent. then (a) is vacuously true and (b) is obvious. Then (a) 5j n 5j = 0 for i ^ j (b) S\ U S2 U • • • U Sfc is linearly independent. v2.. 7 Canonical Forms is the minimal polynomial of T.530 Chap. (3) < = 1 j=0 For each i. i=\ (4) . If A — 1. T h e o r e m 7. Consider any linear combination of vectors in 5 2 that sums to zero. let Si be a linearly independent subset of K 0 i ..20. choose wi G V such that 0(T)(w^) = t'j. say. 1=0 Then (3) can be rewritten as £ / < ( T ) ( t m ) = 0.19. i In view of Theorem 7. Proof.8 (p. The next several results give us ways to construct bases for these spaces that are unions of T-cyclic bases. we can focus on bases of individual spaces of the form K(/)(/). S2 = 0un U 0W2 U • • • U 0Wk is also linearly independent. where the 0j 's (1 < i < fc) are the distinct irreducible monic factors ofp(t) and the 7m's are positive integers. For Theorems 7.-(/) be the polynomial defined by / i ( t ) = y w . let /. These results serve the dual purposes of leading to the existence theorem for the rational canonical form and of providing methods for constructing rational canonical bases. 267) with the eigenspaces replaced by the subspaces K^.18(c). For 1 < i < k.21 and the latter's corollary.p such that 0V2 U • • • U 0Vk For each i.

and hence 0(f) divides fi(t) for all i. 0 can be extended to the linearly independent set 0' = 0U0W1 whose span contains N(0(T)). 7. So (4) becomes k k X > ( T ) 0 ( T ) K ) = £ > ( T ) M . Then 0 U 0X is linearly independent.w2. Since Si is linearly independent. (a) Suppose that x G N(0(T)). . (See Exercise 15 of Section 7. 1 We now show that K 0 has a basis consisting of a union of T-cycles. Then the following statements are true.) By Theorem 7. linear independence of 5i requires that /i(T)(tiK)=fl|. 0 = J > 0 > ( T ) ( u .3. but x (£ W.(T)(«i) = 0 far all i. v 2 . But fi(T)(wi) is the result of grouping the terms of the linear combination in (3) that arise from the linearly independent set 0Wi. vk}. (a) Let 0 = {vi. it follows that fi(T)(vi) = 0 for all*. ) = X > ( T ) ( « 0 = »• i=i i=l i=\ 531 This last sum can be rewritten as a linear combination of the vectors in S\ so that each /»(T)(«j) is a linear combination of the vectors in 0Vi. i=l j=l Again. We conclude that for each i. Thus.ws in N(0(T)).18(b). • • • .0. aij = 0 for all j. (b) For some ttfi. Therefore the T-annihilator of vi divides fi(t) for all i. 0(f) divides the T-annihilator of Vi. . . Lemma. . for each i. Proof. Let W be a T-invariant subspace ofK^. and suppose that yZ aiVi + z = 0 i=i and z = 2_\fy~^j (x)' 3=0 U0W2U---U0Wa. and let 0 be a basis for W. . Therefore <S2 is linearly independent.Sec.4 The Rational Canonical Form Apply 0(T) to both sides of (4) to obtain I > 0 O / < ( T ) ( t i . . there exists a polynomial gi(t) such that fi(t) = <?j(f)0(f).

By Theorem 7. Therefore z = 0. Now suppose that. 0\ = 0 U 0Wl is linearly independent. but not in Wi. for some integer m > 1. Apply (b) of the lemma to W = {0} to obtain a linearly independent subset of V of the form 0Vl U 0V2 U • • • U 0Vk./?u. where k < m. .. v 2 .. and therefore d = dim(Q) < dim(C x n W ) < dim(C x ) = d. Proof. Then 2 has 0(f) as its T-annihilator. Let W = span(/3).0W„ to 0. Then R(0(T)) is a T-invariant subspace of V.fc+1.vk are the generating vectors of the T-cyclic bases that constitute this rational canonical basis. It follows that Cx PI W = Cx. w 2 . Let r = rank(0(T)). Let Wi = span(/?i). then there exists a rational canonical basis for T.2 in N(0(f)). . to obtain a linearly independent set 0' = 0Wl U0W2 U •• • U0Wk U • • • U0Wa whose span W contains both W and N(0(T)). ws in N(0(T)) such that the union 0'=0U0WlU0W2U---U0Wa is a linearly independent set whose span contains N(0(T)). The proof is by mathematical induction on m. . and consequently x G W. it follows that a^ = 0 for all i. choose a vector u. For each i. Then W contains R(0(T)). Then z G Cx D W. the union 0 of the sets 0Wi is linearly independent. . choose Wi in V such that Vi = cf>(T)(wi). we eventually obtain vectors w\. contrary to hypothesis. . . where w^ is in N(0(T)) for i > k. Therefore we may apply the induction hypothesis to obtain a rational canonical basis for the restriction of T to R(T). and assume that the minimal polynomial of T is p(t) = (0(f) ) m . and the restriction of T to this subspace has (0(f)) m _ 1 as its minimal polynomial. Suppose that z ^ 0. . | T h e o r e m 7. Since V = N(0(T)). Apply (b) of the lemma and adjoin additional T-cyclic bases Ai. Thus 0 U 0X is linearly independent. so that /32 = 0\ D0W2 = 0U0Wl Li0W2 is linearly independent. . If Wi does not contain N(0(f)).fc+2. Since 0 is linearly independent. 7 Canonical Forms where d is the degree of 0(f). from which it follows that bj — 0 for all j. and hence C2 C Cx n W. . By (a).. (b) Suppose that W does not contain N(0(T)). the result is valid whenever the minimal polynomial of T is of the form (0(T))fc.20. Choose a vector tui G N(0(f)) that is not in W. this set is a rational canonical basis for V. . Continuing this process. Suppose that v\. Suppose that m = 1.532 Chap. whose span contains N(0(T)).21. If the minimal polynomial ofT is of the form p(t) = (0(f) ) m . if necessary.

and hence y G K ^ and x = f> fc (T)) m *(y) = 0 by Theorem 7. K0 has a basis consisting of the union of T-cyclic bases. Theorem 7. 1 < i < k — 1. and let p(t) = (0i(f)) m i (0 2 (*)) m 2 • • • (<f>k(t))mk be the minimal polynomial of T. Every linear operator on a finite-dimensional vector space has a rational canonical basis and. Proof. a rational canonical form. 0fc(f) does not divide q(t). and let q(t) be the minimal polynomial of U.Sec. hence. which is 0(T)-invariant. and T has a rational canonical form. 0\ and 02 are disjoint.22. Therefore 0 is a rational canonical basis. Let U denote the restriction of 0(T) to W . K0fc has a basis 02 consisting of a union of Tcyclic bases. By the corollary to Theorem 7.4 The Rational Canonical Form 533 We show that W = V. and 0' is a rational canonical basis for T. Therefore dim(W') = rank(U) + nullity(U) = rank(0(T)) + nullity(0(T)) = dim(V). there would exist a nonzero vector x G W such that 0fc(U)(:r) = 0 and a vector y G V such that x = (4>k{T))mk(y). Let U be the restriction of T to the T-invariant subspace W = R((0fc(T)mfc).21. U has a rational canonical basis 0\ consisting of a union of U-cyclic bases (and hence T-cyclic bases) of vectors from some of the subspaces K 0i . I I | . Proof. For otherwise. The case k — 1 is proved in Theorem 7. We conclude that 0 is a basis for V. Suppose that the result is valid whenever the minimal polynomial contains fewer than k distinct irreducible factors for some k > 1. Let s denote the number of vectors in 0.19. Let T be a linear operator on a finite-dimensional vector space V.21 to the restriction of T to K^. Then s = dim(R((0fc(T))Wfc)) + dim(K0fc) = rank((0 f c (T)r*) + nullity((0fc(T))TOfc) = n. and suppose that p(t) contains k distinct factors.3. It follows that (MT))mk + 1(y) = 0. The proof is by mathematical induction on k. Furthermore.18(e). a contradiction. By Theorem 7. So by the induction hypothesis. By the way in which W was obtained from R(0(T)). where the 0i(f)'s are the distinct irreducible monic factors of p(t) and mi > 0 for all i. and 0 = 0i U/?2 is linearly independent. Thus q(t) contains fewer than k distinct irreducible divisors. Apply Theorem 7. We are now ready to study the general case. Corollary. Then q(t) divides p(t) by Exercise 10 of Section 7. Thus W = V. it follows that R(U) = R(0(T)) and N(U) = N(0(T)). 7.21.

. we may multiply those characteristic polynomials that arise from the T-cyclic bases in 0i to obtain the factor (0j(f)) ni of /(f). and hence for some integer p. We are now able to relate the rational canonical form to the characteristic polynomial. (c). because this sum is equal to the degree of /(f). i=i .i ) » ( f c ( t ) r (Mt)r ••• (Mt))nk. T has a rational canonical form C. In particular. i=i Now let s denote the number of vectors in 7.22. T h e o r e m 7. 7 Canonical Forms In our study of the rational canonical form. where the (pi(t)'s (I < i < k) are distinct irreducible monic polynomials and the Hi's are positive integers. then 0(f) divides the characteristic polynomial of T because the minimal polynomial divides the characteristic polynomial.4.. then 7 = 71 U 7 2 U • • • U 7fc is a basis for V. Since /(f) is the product of the characteristic polynomials of the companion matrices that compose C. where di is the degree of 0i(f). we relied on the minimal polynomial. then 0i = 0C\ K 0i is a basis for K 0i for each i.. Conversely.diUi. then 7 is a rational canonical basis for T. and hence of T. and (d) Let C — \T]p.19. which is a rational canonical form of T.534 Chap. if 0(f) is an irreducible monic polynomial that divides the minimal polynomial of T. is the product of the characteristic polynomials of the companion matrices that compose C. By Theorem 7. Then the following statements are true. and so 0j(f).23. divides the minimal polynomial of T. Furthermore. 0fc(f) are the irreducible monic factors of the minimal polynomial. and the union of these bases is a linearly independent subset 0i of K 0 i . (b) For each i. if each 7$ is a disjoint union of T-cyclic bases. the characteristic polynomial of C. We conclude that (0j(f)) p . (1 < i < k). n = 2_. By Exercise 40 of Section 5. dim(K 0i ) = diUi. Proof. Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial /(f) = ( .. (a) By Theorem 7. Therefore each irreducible monic divisor 0i(f) of /(f) divides the characteristic polynomial of at least one of the companion matrices. (c) If 0 is a rational canonical basis for T. we have nidi < dim(K 0i ). (d) If 7J is a basis for K 0i for each i. (0j(f)) p is the T-annihilator of a nonzero vector of V. 7 is linearly independent. Since this polynomial has degree nidi. and therefore = y^dini i=i < / ]dim(K^ i ) = s < n. (b). 0 2 (f). Consider any i. (a) 0i(f).

Furthermore. Furthermore. within each such grouping. As in the case of the Jordan canonical form.. Certainly. It follows that 7 is a basis for V and 0i is a basis for K^ for each i. subject to this order. 0(f) is an irreducible monic divisor of the characteristic polynomial of T. the rational canonical form is unique. then the dot diagram is Although each column of a dot diagram corresponds to a T-cyclic basis .. we show that except for these permutations. the rational canonical form of a linear operator is unique up to the arrangement of the irreducible monic divisors. we are now in a position to ask about the extent to which it is unique. To simplify this task. 0Vj contains dpj vectors. if k = 3.. This polynomial has degree dpj\ therefore. and P3 = 2. As in the case of the Jordan canonical form. In what follows. although the rational canonical bases are not. Our task is to show that. T is a linear operator on a finite-dimensional vector space with rational canonical basis 0. A proof that the resulting dot diagrams are completely determined by the operator is also a proof that the rational canonical form is unique. let (0(f))Pj be the annihilator of Vj. and dini = dim(K()f. 0V2. and d is the degree of 0(f).4 The Rational Canonical Form 535 Hence n = s. the rational canonical form of a linear operator T can be modified by permuting the Tcyclic bases that constitute the corresponding rational canonical basis. 0Vl.i) for all i. This has the effect of permuting the companion matrices that make up the rational canonical form. In the case of the rational canonical form.3. we introduce arrays of dots from which we can reconstruct the rational canonical form.Sec. 0Vk are the T-cyclic bases of 0 that are contained in K^. we devised a dot diagram for each eigenvalue of the given operator. 7. we define a dot diagram for each irreducible monic divisor of the characteristic polynomial of the given operator. we adopt the convention of ordering every rational canonical basis so that all the T-cyclic bases associated with the same irreducible monic divisor of the characteristic polynomial are grouped together. For the Jordan canonical form. pi > p 2 > • • • > pk since the T-cyclic bases are arranged in decreasing order of size.. I Uniqueness of the Rational Canonical Form Having shown that a rational canonical form exists. column. p2 = 2. pi = 4. For example. For each j. we arrange the T-cyclic bases in decreasing order of size. by Exercise 15 of Section 7. We define the dot diagram of 0(f) to be the array consisting of k columns of dots with Pj dots in the jib. arranged so that the jfth column begins at the top and terminates after pj dots.

it contributes two companion matrices to the rational canonical form. with only one dot. that the dot diagrams associated with these divisors are as follows: Diagram for 0i(f) Diagram for <j>2(t) Diagram for 03(f) Since the dot diagram for Oi (f) has two columns.. and 4>-sit) = t2 + t + I. and therefore corresponds to the 2 x 2 companion matrix of (0i(f)) 2 = (f — l) 2 . furthermore. each containing a single dot. These two companion matrices are given by Ci = 0 1 -1 2 and C2 = (\). and suppose that the irreducible monic divisors of the characteristic polynomial of T are 01(f)=f-l. there are two dot diagrams to consider.536 Chap.. The second column. Since V3 has T-annihilator (0 2 (f)) 2 and V7 has T-annihilator 0 2 (f). The first column has two dots. Example 3 Let T be a linear operator on a finite-dimensional vector space over R. there are fewer dots in the column than there are vectors in the basis. the dot diagram for 0i(f) consists of a single dot. in the dot diagram of 0 2 (f) we have p\ = 2 and p 2 = 1. corresponds to the l x l companion matrix of 0i(£) = t — 1. 0V3 and 0Vl. This is illustrated in the next example. 7 Canonical Forms 0Vi in K(/. hence this diagram contributes two copies of the 2 x 2 companion . we obtain the rational canonical form of a linear operator from the information provided by dot diagrams. lie in K 02 . These diagrams are as follows: Dot diagram for 0i(f) Dot diagram for <p2(t) • In practice. The dot diagram for 0 2 (f) = f2 + 2 consists of two columns. 0i(f) = t — t + 3 and 0 2 (f) = t + 1. Because 0i(f) is the T-annihilator of V\ and 0Vl is a basis for K 0 . The other two T cyclic bases. Since there are two irreducible monic divisors of the characteristic polynomial of T. Example 2 Recall the linear operator T of Example 1 with the rational canonical basis ii and the rational canonical form C — [T]/j. Suppose. 0 2 (f) = f 2 + 2 .

For 0 < i < d. 0V2. We now relate the two diagrams. be the cycle of generalized eigenvectors of U corresponding to A = 0 with end vector TZ(VJ).. and U has a Jordan canonical form. 7. the characteristic polynomial of U is ( . The dot diagram associated with the Jordan canonical form of U gives us a key to understanding the dot diagram of T that is associated with 0(f). Let U denote the restriction of the linear operator 0(T) to K^.l ) w f m . namely. and suppose again that the T-annihilator of Vj is (4>(t))Pj. we fix a linear operator T on a finite-dimensional vector space and an irreducible monic divisor 0(f) of the characteristic polynomial of T.—z MET Sec. = 0 1 -1 -ly Therefore the rational canonical form of T is the 9 x 9 matrix (Cx O C = O o \o 0 C2 0 o o o -1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O O c3 o O 0 0 1 0 0 0 0 0 0 O O o c4 O 0 0 0 0 1 0 0 0 0 o\ O o o C5J 0 0 0 -2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -2 1 0 0 0 0 0 0 o > 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 1 -1 J ( v We return to the general problem of finding dot diagrams. let 7. Let 0 be a rational canonical basis for T...2. and 0Vl. \Jq = TQ for some positive integer q. Consider one of these T-cyclic bases 0Vj.18(d).. Therefore K^ is the generalized eigenspace of U corresponding to A = 0. Then 0Vj consists of dpj vectors in 0. where m = dim(K^). C-x ~ CA — 537 The dot diagram for 03(f) = f2 4. . By Theorem 7.4 The Rational Canonical Form matrix for 0 2 (f). As we did before. 0Vk be the Tcyclic bases of 0 that are contained in K 0 . by Exercise 12 of Section 7. Consequently.f + 1 consists of a single column with a single dot contributing the single 2 x 2 companion matrix C.

For each j. 7J'S are disjoint. and the. ) . results in a column of pj dots in D\. bearing in mind that in the first case we are considering a rational canonical form and in the second case we are considering a Jordan canonical form. Because a is a union of cycles of generalized eigenvectors of U. Consider any linear combination of these vectors a o W ) ) * . (0(T))^. v Now let otj = 70U71 U .4 (p.1 ^ ) . If] We are now in a position to relate the dot diagram of T corresponding to 0(f) to the dot diagram of U.1 ^ ) + a 1 (0(T))*. By Lemma 1. H Thus we may replace 0V. this basis is .J-lVivj). •. We conclude that otj is a basis for C v . Proof.2 T ' ( r . and the second diagram D2.1 (t. and hence (<l>(t))Pj~lg(t) is a nonzero polynomial with degree less than pjd. For convenience.1 (0(T))^. it suffices to show that the set of initial vectors of these cycles { ( 0 ( T ) r . 7 Canonical Forms (<PiT))Tivj). Exercise 9 implies that a is a basis for K 0 . it follows that (0(T)) p J _1 o(T)(vj) ^ 0. We do this for each j to obtain a subset o: = Qi U Q:2 • • • U afc of K^.. Since 0Vi U 0V2 U • • • U 0Vk is a basis for K^.1 T(t V ) (0(T))^-1P'"1(f/)} is linearly independent. The key to this proof is Theorem 7. where not all of the coefficients are zero.Then g(t) is a nonzero polynomial of degree less than d.1 (p. 7. . By Theorem 7. i ).. Since (0(f)) Pj is the T-annihilator of Vj. is a linearly independent subset of C. Therefore the set of initial vectors is linearly independent.. Proof. Since ctj is the union of cycles of generalized eigenvectors of U corresponding to A = 0. So by Theorem 7. and since span(aj) = span(/3Wi) = CVi. 485). . Then H = {(<f>(T)y. 487). Chap.U 7 d _ i .. by otj as a basis for Cv . otj is linearly independent.4. we designate the first diagram Di.1 T d . we conclude that a is a Jordan canonical basis. Let g(t) be the polynomial defined by g(t) — ao + a\t + • • • + a<i-if d_1 . otj is an ordered basis for CVj. Lemma 1. which has dimension pjd. Consequently. Notice that otj contains pjd vectors.1 T(i. otj consists oipjd linearly independent vectors in Cv . i ) + • • • -r-a d . Lemma 2. a is a Jordan canonical basis for K^. the presence of the T-cyclic basis 0X.Tivj)}. ( 0 ( T ) ) ^ .538 where T°(Vj) = bj.

Under the conventions described earlier. otj determines d columns each containing pj dots in D2. let 0(f) be an irreducible monic divisor of the characteristic polynomial ofT of degree d. Theorem 7. xex sin 2x} .rank((0(T)) i )l for % > 1. each of length Pj.24. We call the number of such occurrences the multiplicity of the elementary divisor. Alternatively. Since a companion matrix may occur more than once in a rational canonical form. d Thus the dot diagrams associated with a rational canonical form of an operator are completely determined by the operator. Corollary. denote the number of dots in the ith row of the dot diagram for 0(f) with respect to a rational canonical basis for T. and let r. the rational canonical form of a linear operator. the polynomials corresponding to the companion matrices that determine this form are also unique. are called the elementary divisors of the linear operator. ex sin 2x. In effect. the elementary divisors and their multiplicities determine the companion matrices and. Thus we have the following result. 7. Then (a) n = i[dim(V) . and all columns in JD2 are obtained in this way. Example 4 Let 0 = {ex cos 2x.10 (p.1 ) . each row in D2 has d times as many dots as the corresponding row in D\.Sec. Since Theorem 7.4 The Rational Canonical Form 539 replaced by the union otj of d cycles of generalized eigenvectors of U. Conversely. we may divide the appropriate expression in this theorem by d to obtain the number of dots in the corresponding row of D\. therefore. Since the rational canonical form of a linear operator is unique.So each column in Di determines d columns in D2 of the same length.rank(0(T))] (b) n = i[rank((0(T)) i . which becomes part of the Jordan canonical basis for U. 500) gives us the number of dots in any row of D2. the same is true for the elementary divisors. we have the following uniqueness condition. which are powers of the irreducible monic divisors. the rational canonical form of a linear operator is unique up to the arrangement of the irreducible monic divisors of the characteristic polynomial. Since the rational canonical form is completely determined by its dot diagrams. xex cos 2x. These polynomials. Let T be a linear operator on a finite-dimensional vector space V.

and the dot diagram is as follows: 4\ 0 0 o/ Hence V is a D-cyclic space generated by a single function with D-annihilator (0(f)) 2 . Since 0(/. we can use A in place of D in the formula given in Theorem 7. is /(f) = ( f 2 . Let D be the linear operator on V defined by D(y) = y'. It follows that the second dot lies in the second row. Then 1 -2 A = 0 ^ 0 ( 2 1 0 0 1 0\ 0 1 1 2 -2 V and the characteristic polynomial of D.24.540 Chap. and let V = span (/if). and 0 is an ordered basis for V.20f + 25. and let A — [D]^. the dot diagram for 0(f) contains only two dots.r a n k ( 0 ( A ) ) ] = i [ 4 . Furthermore. R). the derivative of y. 7 Canonical Forms be viewed as a subset of J-(R. Then V is a four-dimensional subspace oiT(R. R). which is /() 0 1 0 0 1 \0 0 0 0 0 1 -25\ 20 -14 4/ Thus (0(f)) 2 is the only elementary divisor of D.2 ] = l.4i 3 + 14f2 . Because ranks are preserved under matrix representations. its rational canonical form is given by the companion matrix of (0(f)) 2 = f4 . Therefore the dot diagram is determined by ri.) has degree 2 and V is four-dimensional. For the cyclic generator. the space of all real-valued functions defined on R. the number of dots in the first row.2 f + 5) 2 .4 4>(A) = 0 0 0 \0 0 0 and so r 1 = l [ 4 . . it suffices to find a function g in V for which 0(D) (o) ^ 0. Thus 0(f) = f2 — 2f -f-5 is the only irreducible monic divisor of /(f). and it has multiplicity 1. Now (0 0 0 0 0 . and hence of A.

Example 5 For the following real matrix A. 7. By Theorem 7.r a n k ( .4 The Rational Canonical Form 541 Since 0(^4)(e3) ^ 0. Since the degree of 0i (f) is 2. dim(K 0 .23.rank(0i (A))] = i [ 5 . Let A G M n x n (F). Hence 0g = {xex cos 2x.Likewise. Thus the dot diagram of 0i (f) is . D(xex cos 2x).4 3 . then Q~XAQ = C. the elementary divisors and their multiplicities are the same as those of LA • Let A be an n x n matrix. if Q is the matrix whose columns are the vectors of 0 in the same order. Notice that the function h defined by h(x) = xex sin 2x can be chosen in place of g. the total number of dots in the dot diagram of 0i(f) is 4/2 = 2. let C be a rational canonical form of A.•*• Sec. we find the rational canonical form C of A and a matrix Q such that Q~lAQ — C. for A. • It is convenient to refer to the rational canonical form and elementary divisors of a matrix. Definitions. The rational canonical form of A is defined to be the rational canonical form of LA. therefore g(x) = xex cos 2x can be chosen as the cyclic generator.excos2a:) ^ 0. 4 2 + 2/)] = |[5-1]=2. and let 0 be the appropriate rational canonical basis for L^.) = 4 and dim(K02) = 1. it follows that 0(D)(a. Then C = [LA\P. therefore 01 (f) = f2 + 2 and 0 2 (f) = f — 2 are the distinct irreducible monic divisors of /(f). and therefore A is similar to C. which are defined in the obvious way.3 2\ 2 2 2 *) The characteristic polynomial of A is /(f) = — (f2 + 2)2(f — 2). This shows that the rational canonical basis is not unique. D3(xex cos 2x)} is a rational canonical basis for D. 0 2 0 -6 1 -2 0 0 0 1 -3 A = 1 1 -2 1 -1 V1 . D2(xex cos 2x). and the number of dots n in the first row is given by n = 4[dim(R 5 ) . In fact.

Then it can be seen that 2\ -2 0 Av-? = -2 V-4/ and 0Vl U 0V2 is a basis for K 0 ] . = N(0(Lyt)). which contributes the l x l matrix (2). where v\ and V2 each have annihilator 0i(f). we sec that fi)\ 1 1 1 \V Next choose v2 in K^. It can be shown that ((1\ 0 o 0 AA (o\ l 0 0 • 2 0 1 w W ( I w \ °\ 0 -l > 0 X V _ is a basis for N(0i(L. is the union of two cyclic bases 0Vx and 0V2. but not in the span of 0Vi = For example.542 and each column contributes the companion matrix 0 1 -2 0 Chap.Av\}. Av\ .2 lie in N(0i(L / t)). It follows that both v\ and ?. / {v\. V2 = e 2 . Hence 0 2 (f) is an elementary divisor with multiplicity 1.4)).2) = 1.\. Consequently 0i(f) is an elementary divisor with multiplicity 2. then 0D K^. the dot diagram of 0 2 (f) = f — 2 consists of a single dot. Since dim(K0. 7 Canonical Forms for 0i (f) = f2 + 2 to the rational canonical form C. Therefore the rational canonical form C is / 0 -2 0 0 1 0 0 0 C = 0 0 0 -2 0 0 1 0 0 V o 0 0 0 \ 0 0 0 2/ We can infer from the dot diagram of 0i(f) that if 0 is a rational canonical basis for L^. Setting v\ — e.

In this case.2 = 2.Sec.23. and r 3 = rank((0(4)) 2 ) . hence in applying Theorem 7.rank(0(A)) = 4 . any nonzero vector in K02 is an eigenvector of A corresponding to the eigenvalue A = 2. where r^ is the number of dots in the ith row of the dot diagram. we find the rational canonical form C and a matrix Q such that Q~lAQ = C: (2 1 0 0\ 0 2 1 0 A = 0 0 2 0 \0 0 0 2/ Since the characteristic polynomial of A is /(f) = (f — 2) 4 . we obtain n = 4 .rank((0(^)) 3 ) = 1 . Av2.SO setting 2 0\ (1 0 0 0 1 1 -2 1 0 1 Q = 0 1 0 0 1 0 -2 1 \o 1 0 . 0(f) has degree 1. r 2 = rank(0(^)) . choose M I V3= 1 1 w By Theorem 7.0 = 1. Av\.4 The Rational Canonical Form 543 Since the dot diagram of 0 2 (f) = f — 2 consists of a single dot.4 2/ we have Q~lAQ = C. 7. Since there are dim(R 4 ) = 4 dots in the diagram.4)) 2 ) = 2 . 0 = {vi. Example 6 For the following matrix A.rank((0(.V2. and so K 0 = R4. the only irreducible monic divisor of /(f) is 0(f) — f — 2. For example.24 to compute the dot diagram for 0(f). we may terminate these computations • .1 = 1.v%} is a rational canonical basis for LA.

v2 = < 1 satisfies this condition.e2. and v2 G N(L4 .\. Vj £ N((L 4 .e 4 }). and (f — 2) has the companion matrix (2). Thus the dot diagram for A is Chap. preceding dot diagram indicates that there are two vectors V\ and v2 in R4 with annihilators (0(f)) 3 and 0(f).2 l ) 2 ) = span({ei.. 7 Canonical Forms Since (f — 2) has the companion matrix '0 0 1 0 J) 1 8" -12 6. It follows that 0\ Avi = 2 V and n \ A V{ = 4 4 W Next we choose a vector v2 G N(L/i -21) that is not in the span of 0Vi. Thus r/o\ (o\ 1 0 \ 1 > 2 I w W (i\ 1 4 w (0\ 0 0 W . v2} is a rational canonical basis for L. respectively.e4}) and N ( ( L A . The standard vector C3 meets the criteria for v\\ so we set V\ = ('3.21). A«. The. A2vi. the rational canonical form of A is given by / 0 1 C = 0 \ 0 0 8 0 -12 6 1 0 0 0 \ 0 0 2 J Next we find a rational canonical basis for L. and such that 0 = [0Vl U 0Vi} = {v. Clearly. Furthermore.2I)2). It can easily be shown that N(L4-2l)=span({c1.4.544 with r$...
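Each column of the dot diagram contributes the companion matrix of a power of phi(t), so it is convenient to generate companion matrices mechanically. The sketch below follows the book's convention (1's on the subdiagonal and the negatives of the coefficients in the last column); NumPy and the helper name are assumptions of the sketch, not part of the text.

import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial t^k + c_{k-1} t^{k-1} + ... + c_0,
    where coeffs = [c_0, c_1, ..., c_{k-1}]."""
    k = len(coeffs)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)                # 1's on the subdiagonal
    C[:, -1] = [-c for c in coeffs]           # negatives of the coefficients
    return C

# (t - 2)^3 = t^3 - 6t^2 + 12t - 8, the larger elementary divisor of Example 6:
print(companion([-8.0, 12.0, -6.0]))
# [[  0.   0.   8.]
#  [  1.   0. -12.]
#  [  0.   1.   6.]]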

1 1 EXERCISES 1. where each V{ lies in K^ for some irreducible monic divisor 0(f) of the characteristic polynomial ofT.Sec.(f)'s (1 < % < k) are distinct irreducible monic polynomials and the ni 's are positive integers. Label the following statements as true or false.4 The Rational Canonical Form is a rational canonical basis for LA- 545 Finally.17. (a) V = K^ ©K^ 2 ©••• © K ^ . let Q be the matrix whose columns are the vectors of 0 in the same order: ft) 0 1 0\ 0 1 4 0 Q = 1 2 4 0 \0 0 0 1/ ThenC = Q-MQ. Then V is a direct sum of T-cyclic subspaces CVi.26. The next theorem is a simple consequence of Theorem 7. Proof. Then the following statements arc true. fhen C\ © C 2 © • • • © Ck is the rational canonical form ofT. Exercise. Theorem 7. Direct Sums* The next theorem is a simple consequence of Theorem 7. 7. . where the 0. (b) If Ti (1 < i < k) is the restriction ofT to K^. (a) Every rational canonical basis for a linear operator T is the union of T-cyclic bases. Let T be a linear operator on a finite-dimensional vector space V. ( o r ( 0 2 ( o r • • • (0fc(f)r. Theorem 7.23. Exercise. Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial /(f) = ( . and Cj is the rational canonical form of Tj.i r (0.25 (Primary Decomposition Theorem). Proof.

(f) Let 0(f) be an irreducible monic divisor of the characteristic polynomial of a linear operator T. then it is a rational canonical basis for T. 7 Canonical Forms (b) If a basis is the union of T-cyclic bases for a linear operator T. then its Jordan canonical form and rational canonical form are similar. Define T to be the linear operator on V such that T(/) = /'• (c) T is the linear operator on M 2x2 (. any irreducible factor of the characteristic polynomial of T divides the minimal polynomial of T.}. R). 2.546 Chap. (c) There exist square matrices having no rational canonical form. The dots in the diagram used to compute the rational canonical form of the restriction of T to K 0 are in one-to-one correspondence with the vectors in a basis for K^. find the rational canonical form C of A and a matrix Q G M n X n ( F ) such that Q~l AQ = C. the rational canonical form C. (d) A square matrix is similar to its rational canonical form.xsinx. (a) A = F = R (b) A = 0 1 -1 -1 F= R (c) A = (d) A = F = R (e) A = F = R 3. find the elementary divisors.cosa:. (b) Let S — {sinx.xcos3. (a) T is the linear operator on P$(R) defined by T(f(x)) = f(0)x-f'(l). and a rational canonical basis Q. and let V = span(5). For each of the following linear operators T. (g) If a matrix has a Jordan canonical form. (e) For any linear operator T on a finite-dimensional vector space.R) defined by . For each of the following matrices A G M n x „ ( F ) . a subset of F(R.

Let T be a linear operator on a finite-dimensional vector space with minimal polynomial m = (Mt))mi(Mt)y w)r where the 0*(f)'s are distinct irreducible monic factors of /(f). Prove that the rational canonical form of T is a diagonal matrix if and only if T is diagonalizable. 7.4 The Rational Canonical Form T(A) = 0 -1 1 1 •A. Define T to be the linear operator on V such that T(f)(x. (c) Prove that the minimal polynomial of the restriction of T to R(0(T)) equals (0(f)) m ~ 1 . (b) Give an example to show that the subspaces in (a) need not be equal. 6.y) = df(x. ^2 has T-annihilator 0 2 (f). 5. (c) Describe the difference between the matrix representation of T with respect to 0Vl U 0V2 and the matrix representation of T with respect to 0Va.R). 7. Let T be a linear operator on a finite-dimensional vector space V with characteristic polynomial /(f) = (—l) n 0i(f)0 2 (f). and 0Vl U0 V2 is a basis for V. m-i is the number of entries in the first column of the dot diagram for 0i(f). cos x sin y. a subset of F(R x R. .y) dx . df(x. and let V = span(S'). to assure the uniqueness of the rational canonical form. sin x cost/.Sec. (a) Prove that there exist V\.1 ). (b) Prove that there is a vector t^ G V with T-annihilator 0i(f)0 2 (f) such that 0Va is a basis for V.V2 G V such that v\ has T-annihilator <i>i(t). where 0i(f) and 0 2 (f) are distinct irreducible monic polynomials and n — dim(V). (a) Prove that R(0(T)) C N((0(T)) m . Prove that for each i. Thus. Let T be a linear operator on a finite-dimensional vector space V with minimal polynomial (0(f)) m for some positive integer m. we require that the generators of the T-cyclic bases that constitute a rational canonical basis have T-annihilators equal to powers of irreducible monic factors of the characteristic polynomial of T. cos a: cost/}. Let T be a linear operator on a finite-dimensional vector space.y) dy 4. 547 (d) Let S = {sin x sin?/.

. Let V be a vector space and 0i.3. Exercises 11 and 12 arc concerned with direct sums. Prove. . Let T be a linear operator on a finite-dimensional vector space. . and suppose that 0(f) is an irreducible monic factor of the characteristic polynomial of T.02. Hint: Apply Exercise 15 of Section 7. •.• • . then 0(f) divides the characteristic polynomial of T. 10. 7 Canonical Forms 8. 7 2 . that if 0(f) is the T-annihilator of vectors x and y.25. 9. Prove Theorem 7. 11. 7fc are linearly independent subsets of V such that span(7$) = span(/?») for all i.548 Chap. Now suppose that 7i. Let T be a linear operator on a finite-dimensional vector space V. Prove that for any irreducible polynomial 0(f).26. 12. Prove that 71 U 7 2 U • • • U 7fc is also a basis for V. INDEX OF DEFINITIONS FOR CHAPTER 7 Companion matrix 526 Cycle of generalized eigenvectors 488 Cyclic basis 525 Dot diagram for Jordan canonical form 498 Dot diagram lor rational canonical form 535 Elementary divisor of a linear operator 539 Elementary divisor of a matrix 541 End vector of a cycle 488 Generalized eigenspace 484 Generalized eigenvector 484 Generator of a cyclic basis 525 Initial vector of a cycle 488 Jordan block 483 Jordan canonical basis 483 Jordan canonical form of a linear operator 483 Jordan canonical form of a matrix 491 Length of a cycle 488 Minimal polynomial of a linear operator 516 Minimal polynomial of a matrix 517 Multiplicity of an elementary divisor 539 llational canonical basis of a linear operator 526 Rational canonical form for a linear operator 526 Rational canonical form of a matrix 541 . then x G Cy if and only if C x = C«. if 0(T) is not one-to-one. Prove Theorem 7.0k be disjoint subsets of V whose union is a basis for V.

and B / A. By describing the elements of the set in terms of some characteristic property. then B is called a proper subset of A. then we write x E A. Two sets A and B are called equal. the set consisting of the elements 1. is the set of elements that are in A.4} = {3.8. which we denote by R throughout this text.2.2. Note that the order in which the elements of a set are listed is immaterial. if every element of B is an element of A. or both. One set that appears frequently is the set of real numbers. hence {1. 2.4} or as {x: x is a positive integer less than 5}.1.4. Observe that A = B if and only if A C B and B C A. By listing the elements of the set between set braces { }.7. If x is an element of the set A. written A = B. " if Z is the set of integers. Sets may be combined to form other sets in two basic ways.2.6. For example. For example. AU B = {x: x e A or x & B}.A p p e n d i c e s APPENDIX A SETS A set is a collection of objects.6} C {2. called elements of the set. a fact that is often used to prove that two sets are equal. The empty set.1.2. {1. Example 1 Let A denote the set of real numbers between 1 and 2. that is.3.2}. Sets may be described in one of two ways: 1. is the set containing no elements. For example. we write x G A. 2. otherwise. • A set B is called a subset of a set A. denoted A U B.3. or B. then 3 G Z and i ^ Z. and 4 can be written as {1. The union of two sets A and B. written B C A or A 3 B.1}. If B C A.3. The empty set is a subset of every set. 549 . 3. denoted by 0 .4} = {1. Then A may be written as A = {xeR:l<x<2}.if they contain exactly the same elements.

7. j=i Similarly. Example 2 Let A= {1. . Then AuB = {1. by M A tt = {x: X E AQ for some a G A} r*6A and p | Aa = {x: x G An for all a G A}. The union and intersection of more than two sets can be defined analogously. a€A • . that is. . Likewise.2. A fl B = {x: x G A and :r G B}.r G A{ for all i = 1 . n}. is the set of elements that are in both A and B. Specifically.5.8} and A n J? = {1. Thus X and Y are disjoint sets.4.550 Appendices The intersection of two sets A and B.7. 2 . .5. An are sets.3. Two sets are called disjoint if their intersection equals the empty set. and let 4 a = < x e R: — < x < 1 + a I « for each a G A. if Aj.8}.4.2. Then ( J A a = {£ G R: x > . respectively.8} and Y = {3.5}.1 } oeA and f) Aa = {x e R: 0 < x <2}.3. . then the union and intersections of these sets are defined. denoted Ad B. . . then XUY = {1.5. . .8} • and X (lY = 0. the union and intersection of these sets arc defined. respectively. .5} and B = {1.5}. A 2 . . if X = {1. aeA Example 3 Let A = {a € R: a > 1}. .3. 2 . by n M Ai = { x : x G Ai for some i — 1 . if A is an index set and {Aa: a G A} is a collection of sets. .n} 2=1 and p | Ai = {rr: .
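Unions and intersections, including those taken over an indexed family, correspond to built-in set operations in many programming languages. A short Python illustration follows (the particular sets are hypothetical).

A = {1, 3, 5, 8}
B = {1, 2, 3, 5}
print(A | B)   # union: {1, 2, 3, 5, 8}
print(A & B)   # intersection: {1, 3, 5}

family = [{1, 2, 3}, {2, 3, 4}, {2, 5}]        # a finite indexed family {A_i}
print(set().union(*family))                    # {1, 2, 3, 4, 5}
print(set.intersection(*family))               # {2}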

101]. Then A is the domain o f / . If S C A.9 ] U [9.2] and T = [82. and 2 is a preimage of 5. we denote by f(S) the set {/(x): x £ S} of all images of elements of S. x ~ x (refcxivity)." and "is greater than or equal to" are familiar relations. is * a rule that associates to each element. then y ~ x (symmetry).Appendix B Functions 551 By a relation on a set A. then / is called onto. if T C B. / is defined by /(x) = x2 + l. Note that the range of / is a subset of B. and [1. 2. . that is.o(x) for all x G A Example 1 Suppose that A = [—10. then f(S) = [2. for any elements x and y in A. written / : A — B. More precisely. we often write x ~ y in place of (x. A relation 5 on a set A is called an equivalence relation on A if the following three conditions hold: 1. then a function / from A to B. B is called the codomain of / . x stands in a given relationship to y. APPENDIX B FUNCTIONS If A and B are sets. > written / = </. y) G S if and only if x stands in the given relationship to y. if x ^ y implies f(x) ^ f(y)If / : A —• B is a function with range 5 . if we define x ~ y to mean that x — ?/ is divisible by a fixed integer n. that is. Since /(2) = 5. the image of 2 is 5. y) G S. Notice that —2 is another preimage of 5. i? is the codomain o f / . Functions such that each element of the range has a unique preimage are called one-to-one.10]. and the set {fix): x E A} is called the range of / . a relation on A is a set S of ordered pairs of elements of A such that (x. then A is called the domain of / . Finally. 3. • As Example 1 shows. x in A a unique element denoted f(x) in Z?. then ~ is an equivalence.we denote by f~l(T) the set {x G A: / ( x ) G T7} of all preimages of elements in T.1 in R. if /(A) = B. the preimage of an element in the range need not be unique.5] and f~l(T) = [-10. cquivalently. Let / : A —> R be the function that assigns to each element x in A the element x 4.10]. If x ~ y and y ^ z. "is equal to. for instance. and x is called a preimage of f(x) (under / ) . If S is a relation on a set A." "is less than.101] is the range of / . two functions f: A —> B and <?: A — f? are equal. If x ~ y. if / ( x ) = . if S — [1. Likewise. If / : A —• B. that is / : A — B is one-to-one if /(x) = f(y) implies > x = y or. On the set of real numbers. we mean a rule for determining whether or not. relation on the set of integers. Moreover. For example. The element f(x) is called the image of x (under / ) . then x ~ 2 (transitivity). For each x G A. So / is onto if and only if the range of / equals the codomain of/. .
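For a function with a finite domain, the image of a set, the preimage of a set, and the one-to-one and onto properties can all be checked directly. The sketch below (Python; the finite domain and the rule x^2 + 1 are chosen only to echo the discussion, not to reproduce the interval example) represents such a function as a dictionary.

A = {-2, -1, 0, 1, 2}
B = {1, 2, 5, 7}
f = {x: x ** 2 + 1 for x in A}                     # f(-2) = 5, f(0) = 1, f(2) = 5, ...

print({f[x] for x in {1, 2}})                      # f(S) for S = {1, 2}: {2, 5}
print({x for x in A if f[x] in {5}})               # preimage of {5}: {-2, 2}
print(len(set(f.values())) == len(A))              # one-to-one? False (-2 and 2 share the image 5)
print(set(f.values()) == B)                        # onto?  False (7 has no preimage)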

. By following / with g. subtraction. * * called the restriction of / to S. and division) can be defined so that. a field is denned as follows. If / : A . APPENDIX C FIELDS The set of real numbers is an example of an algebraic structure called a field.1]. the sum. f(x) = sinx. can be formed by defining fs(x) = f(x) for each x G S. then fs is both onto and one-to-one. then fr is one-to-one. Example 3 The function f:R—* R denned by fix) — 3x + 1 is one-to-one and onto. and (. hence / is invertible. then h o (p of) — (h o g ) o / .(/. then / ' is invertible./r.•: S — B.1 . and C be sets and f:A-+B and o: B -* C be functions.'?o. multiplication. * A function / : A — B is said to be invertible if there exists a function > g: B — A such that ( / o g)(y) = y for all y G B and (g o /)(x) = x for all > x G A. whereas ( / o g)(x) = f(g(x)) ~ sin(x 2 + 3). If such a function g exists. Basically. then it is unique and is called the inverse of / . Hence. The next example illustrates these concepts. 1]. then q o f is invertible. If / : A —* B and g: B —> C are invertible. if T = [A. difference. Finally. Thus (g o /)(x) = g(f(x)) for all x G A. For example. and quotient of any two elements in the set is an element of the set. 9 ° / 7^ / ° .=. Then a function /<.o(x) = x 2 + 3. • Let A.' = / . if h: C — I) is another function. Example 2 Let / : [-1./-1ory-'. The inverse of /' is the function / _ l : /? —• R defined by/-'(x) = (x-l)/3. and ( / _ 1 ) . and . but not one-to-one since / ( — I) = / ( l ) = 1. Then (g o /)(x) = (. • The following facts about invertible functions are easily proved. 1] be defined by f(x) = x 2 . we obtain a function g of: A —> C called the composite of o and / . with the exception of division by zero. that is. Note that if 5 = [0. We denote the inverse of / (when it exists) by / . 1. product.1] -» [0. It can be shown that / is invertible if and only if / is both one-to-one and onto. B.552 Appendices Let / : A — B be a function and S Q A. This function is onto. but not onto.Functional composition is associative.</(/(•''-')) = sin2 2 + 3. however. More precisely. let A = B = C = R.> B is invertible. 2. a field is a set in which four operations (called addition.

Definitions. A field F is a set on which two operations + and · (called addition and multiplication, respectively) are defined so that, for each pair of elements x, y in F, there are unique elements x + y and x·y in F for which the following conditions hold for all elements a, b, c in F.

(F 1) a + b = b + a and a·b = b·a (commutativity of addition and multiplication)

(F 2) (a + b) + c = a + (b + c) and (a·b)·c = a·(b·c) (associativity of addition and multiplication)

(F 3) There exist distinct elements 0 and 1 in F such that 0 + a = a and 1·a = a (existence of identity elements for addition and multiplication)

(F 4) For each element a in F and each nonzero element b in F, there exist elements c and d in F such that a + c = 0 and b·d = 1 (existence of inverses for addition and multiplication)

(F 5) a·(b + c) = a·b + a·c (distributivity of multiplication over addition)

The elements x + y and x·y are called the sum and product, respectively, of x and y. The elements 0 (read "zero") and 1 (read "one") mentioned in (F 3) are called identity elements for addition and multiplication, respectively, and the elements c and d referred to in (F 4) are called an additive inverse for a and a multiplicative inverse for b, respectively.

Example 1
The set of real numbers R with the usual definitions of addition and multiplication is a field.

Example 2
The set of rational numbers with the usual definitions of addition and multiplication is a field.

Example 3
The set of all real numbers of the form a + b√2, where a and b are rational numbers, with addition and multiplication as in R, is a field.

Example 4
The field Z2 consists of two elements 0 and 1 with the operations of addition and multiplication defined by the equations

0 + 0 = 0,   0 + 1 = 1 + 0 = 1,   1 + 1 = 0,
0·0 = 0,   0·1 = 1·0 = 0,   1·1 = 1.
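Example 4's two-element field is the special case p = 2 of the fields Z_p of integers modulo a prime p. The sketch below (Python, not part of the text; the helper names are mine) implements the four field operations modulo a prime, with the multiplicative inverse obtained from Fermat's little theorem, a fact not proved in this appendix.

p = 5

def add(a, b): return (a + b) % p
def mul(a, b): return (a * b) % p
def neg(a):    return (-a) % p            # additive inverse
def inv(a):    return pow(a, p - 2, p)    # multiplicative inverse of a != 0 (Fermat's little theorem)

print(add(2, 4), mul(2, 4))   # 1 3
print(neg(2), inv(3))         # 3 2   (since 2 + 3 = 5 = 0 and 3 * 2 = 6 = 1 in Z_5)
# With p = 2 these operations reduce to the tables of Example 4; in particular add(1, 1) == 0.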

Example 5
Neither the set of positive integers nor the set of integers with the usual definitions of addition and multiplication is a field, for in either case (F 4) does not hold.

The identity and inverse elements guaranteed by (F 3) and (F 4) are unique; this is a consequence of the following theorem.

Theorem C.1 (Cancellation Laws). For arbitrary elements a, b, and c in a field, the following statements are true.
(a) If a + b = c + b, then a = c.
(b) If a·b = c·b and b ≠ 0, then a = c.

Proof. (a) The proof of (a) is left as an exercise.
(b) If b ≠ 0, then (F 4) guarantees the existence of an element d in the field such that b·d = 1. Multiply both sides of the equality a·b = c·b by d to obtain (a·b)·d = (c·b)·d. Consider the left side of this equality: by (F 2) and (F 3), we have

(a·b)·d = a·(b·d) = a·1 = a.

Similarly, the right side of the equality reduces to c. Thus a = c.

Corollary. The elements 0 and 1 mentioned in (F 3), and the elements c and d mentioned in (F 4), are unique.

Proof. Suppose that 0' ∈ F satisfies 0' + a = a for each a ∈ F. Since 0 + a = a for each a ∈ F, we have 0' + a = 0 + a for each a ∈ F. Thus 0' = 0 by Theorem C.1. The proofs of the remaining parts are similar.

Thus each element b in a field has a unique additive inverse and, if b ≠ 0, a unique multiplicative inverse. (It is shown in the corollary to Theorem C.2 that 0 has no multiplicative inverse.) The additive inverse and the multiplicative inverse of b are denoted by -b and b⁻¹, respectively. Note that -(-b) = b and (b⁻¹)⁻¹ = b.

Subtraction and division can be defined in terms of addition and multiplication by using the additive and multiplicative inverses. Specifically, subtraction of b is defined to be addition of -b, and division by b ≠ 0 is defined to be multiplication by b⁻¹; that is,

a - b = a + (-b)   and   a/b = a·b⁻¹.

In particular, the symbol 1/b denotes b⁻¹. Division by zero is undefined, but, with this exception, the sum, product, difference, and quotient of any two elements of a field are defined.

Many of the familiar properties of multiplication of real numbers are true in any field, as the next theorem shows.

Theorem C.2. Let a and b be arbitrary elements of a field. Then the following statements are true.
(a) a·0 = 0.
(b) (-a)·b = a·(-b) = -(a·b).
(c) (-a)·(-b) = a·b.

Proof. (a) Since 0 + 0 = 0, (F 5) shows that

0 + a·0 = a·0 = a·(0 + 0) = a·0 + a·0.

Thus 0 = a·0 by Theorem C.1.
(b) By definition, -(a·b) is the unique element of F with the property a·b + [-(a·b)] = 0. So in order to prove that (-a)·b = -(a·b), it suffices to show that a·b + (-a)·b = 0. But -a is the element of F such that a + (-a) = 0; so

a·b + (-a)·b = [a + (-a)]·b = 0·b = b·0 = 0

by (F 5) and (a). Thus (-a)·b = -(a·b). The proof that a·(-b) = -(a·b) is similar.
(c) By applying (b) twice, we find that

(-a)·(-b) = -[a·(-b)] = -[-(a·b)] = a·b.

Corollary. The additive identity of a field has no multiplicative inverse.

In an arbitrary field F, it may happen that a sum 1 + 1 + ··· + 1 (p summands) equals 0 for some positive integer p. For example, in the field Z2 (defined in Example 4), 1 + 1 = 0. In this case, the smallest positive integer p for which a sum of p 1's equals 0 is called the characteristic of F; if no such positive integer exists, then F is said to have characteristic zero. Thus Z2 has characteristic two, and R has characteristic zero. Observe that if F is a field of characteristic p ≠ 0, then x + x + ··· + x (p summands) equals 0 for all x ∈ F.

In a field having nonzero characteristic (especially characteristic two), many unnatural problems arise. For this reason, some of the results about vector spaces stated in this book require that the field over which the vector space is defined be of characteristic zero (or, at least, of some characteristic other than two).

Finally, note that in other sections of this book, the product of two elements a and b in a field is usually denoted ab rather than a·b.

APPENDIX D  COMPLEX NUMBERS

For the purposes of algebra, the field of real numbers is not sufficient, for there are polynomials of nonzero degree with real coefficients that have no zeros in the field of real numbers (for example, x² + 1). It is often desirable to have a field in which any polynomial of nonzero degree with coefficients from that field has a zero in that field. It is possible to "enlarge" the field of real numbers to obtain such a field.

Definitions. A complex number is an expression of the form z = a + bi, where a and b are real numbers called the real part and the imaginary part of z, respectively.

The sum and product of two complex numbers z = a + bi and w = c + di (where a, b, c, and d are real numbers) are defined, respectively, as follows:

z + w = (a + bi) + (c + di) = (a + c) + (b + d)i
and
zw = (a + bi)(c + di) = (ac - bd) + (bc + ad)i.

Example 1
The sum and product of z = 3 - 5i and w = 9 + 7i are, respectively,

z + w = (3 - 5i) + (9 + 7i) = (3 + 9) + [(-5) + 7]i = 12 + 2i
and
zw = (3 - 5i)(9 + 7i) = [3·9 - (-5)·7] + [(-5)·9 + 3·7]i = 62 - 24i.

Any real number c may be regarded as a complex number by identifying c with the complex number c + 0i. Observe that this correspondence preserves sums and products; that is,

(c + 0i) + (d + 0i) = (c + d) + 0i   and   (c + 0i)(d + 0i) = cd + 0i.

Any complex number of the form bi = 0 + bi, where b is a nonzero real number, is called imaginary. The product of two imaginary numbers is real since

(bi)(di) = (0 + bi)(0 + di) = (0 - bd) + (b·0 + 0·d)i = -bd.

In particular, for i = 0 + 1i, we have i·i = -1. The observation that i² = i·i = -1 provides an easy way to remember the definition of multiplication of complex numbers: simply multiply two complex numbers as you would any two algebraic expressions, and replace i² by -1. Example 2 illustrates this technique.
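Python's built-in complex type implements exactly this arithmetic, so the sums and products above can be checked directly (a convenience check only; Python is not part of the text).

z = 3 - 5j
w = 9 + 7j
print(z + w)   # (12+2j)
print(z * w)   # (62-24j)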

statements are true. The set of complex numbers with the operations of addition and multiplication previously defined is a Held. regarded as a complex number. the following result is not surprising. The (complex) conjugate of a complex number a + bi is the complex number a — bi. and 6 = 6 + Oi = 6 .5 ( 1 . (a + « ) = a2 + b2 b2 In view of the preceding statements. Proof. Theorem D. is an additive identity element for the complex numbers since (a + bi) + 0 = (a + bi) + (0 + Oi) = (a + 0) + (6 + 0)i = a + bi. Likewise the real number 1. Exercise.5 + 15i + 2i . But also each complex number except 0 has a multiplicative inverse.l ) = 1 + 17i • 557 The real number 0. -3 + 2i = -3-2i. 4-7i = 4 + 7i. Let z and w be complex numbers.Oi = 6.5 + 2i)(l . respectively. In fact.Si) = . regarded as a complex number. 1 Definition. is a multiplicative identity element for the set of complex numbers since (a + 6i)-l = (a + bi)(l+0i) = (a-1 . Theorem D . Every complex number a + bi has an additive inverse.6-0) + (6-1 + a-0)i = a + bi. Example 3 The conjugates of —3 + 2i. and 6 are.Z Appendix D Complex Numbers Example 2 The product of — 5 + 2i and 1 — 3i is ( . namely (—a) + (—b)i. We denote the conjugate of the complex number z byz.2. l . Then the following .Zi) = .Si) + 2i(l . 4 — 7i. • The next theorem contains some important properties of the conjugate of a complex number.6i 2 = —5 + 15i + 2« — 6 ( .

We leave the proofs of (a). Then [z + w) = (a + c) + (6 + d)i = (a + c) . Let z = a + bi. This fact can be used to define the absolute value of a complex number. (b) (z + w) = z + w. (c) ZW — ~Z'W.3. c. then a + 6i c + di Example 4 To illustrate this procedure. d G R.(6 + d)i = (a — bi) + (c — di) = z + w. Let z and w denote any two complex numbers. as the following result shows. (b) Let z — a + bi and w — c + di.b G R. where a.2 i ' 3 + 2i " 9+ 4 5 14. Definition. where a. \wJ w (e) z is a real number if and only if z = z.2i ~ 3 . The absolute value (or modulus) of z is the real number \/a 2 + 62. we compute the quotient (1 + 4i)/(3 — 2i): l + 4 i _ l + 4 i 3 + 2 f _ . Observe that zz = |z| 2 . Proof. (c) For z and to.558 (a) z = z. + d? be — ad . For any complex number z = a + bi. for if c + di ^ 0. for z~z — (a + bi)(a — bi) = a2 + b2. and (e) to the reader. z~z is real and nonnegative.5 + 14i_ 3 . Then the following statements are true. (d) (-) = = ifw ^ 0. We denote the absolute value of z by \z\. 13 + 13*' a + 6i c — di c + di c — di (ac + bd) + (be — ad)i c2+d2 ac + bd. Theorem D. . The fact that the product of a complex number and its conjugate is real provides an easy method for determining the quotient of two complex numbers. 6. r/'2 13 The absolute value of a complex number has the familiar properties of the absolute value of a real number. (d). we have zw = (a + bi)(c + di) = (ac — bd) + (ad + bc)i Appendices — (ac — 6d) — (ad + bc)i — (a — 6i)(c — di) = z-w.
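The conjugate, the modulus, and the conjugate-based recipe for quotients are likewise available through Python's complex type; the numbers below are those of Example 4 (again, an illustrative check only).

z = 1 + 4j
w = 3 - 2j
print(z / w)                              # (-0.3846...+1.0769...j), i.e. (-5 + 14i)/13
print(z * w.conjugate() / abs(w) ** 2)    # the same value, computed as in the text
print(w * w.conjugate(), abs(w) ** 2)     # (13+0j) and 13.0 (up to rounding): w times its conjugate equals |w|^2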

proving (a). w \w\ (c) \z + w\ < \z\ + \w\. Using Theorem D.2 and (a). (a) By Theorem D. Suppose that z — a + bi. Thus x + x is real and satisfies the inequality x + x < 2|x|. 6 G R. we have. as in R2. So \z\ — |t^| < \z + w\. (b) . proving (d). wz + wz < 2\wz\ = 2|w. It is clear that addition of complex numbers may be represented as in R2 using the parallelogram law.2 again gives \z + w\ = iz + w)(z + w) = (z + w)(z + w) = zz + wz + ZW + WW < \zf + 2|z|H + \wf = (|z| By taking square roots.bi) = 2a < 2\/a2 + b2 = 2|x|. and the absolute value of z gives the length of the vector z. We may represent z as a vector in the complex plane (see Figure D. . by Theorem D. there are two axes.||2| = 2|z||tu|. Proof.=11 ifw^O.\w\ < \z + w\. where a and 6 are real numbers. where a. the real axis and the imaginary axis. 1 »' It is interesting as well as useful that complex numbers have both a geometric and an algebraic representation.2. 559 (d) \z\ . we have \zw\2 = izw)izw) = (zw)(z • w) = (zz)(ww) — |2|2|w.|2.B~ Appendix D Complex Numbers (a) \zw\ = \z\-\w\. The real and imaginary parts of z are the first and second coordinates. Notice that. Taking x = wz. apply (a) to the product (— ) w. \w/ (c) For any complex number x = a + 6i.l(a)). (d) From (a) and (c). we obtain (c). (b) For the proof of (b). it follows that \z\ = 1(2 + w) — w\ < \z + w\ + I — w\ = \z + w\ + \w\. observe that x + x = (a + 6i) + (a .

«-i i an . we sec1 that any nonzero complex number z may be depicted as a multiple of a unit vector.r) has a zero. z = |^|e*^. Our next result guarantees that the field of complex numbers has this property./>.. Our motivation for enlarging the set of real numbers to the set of complex numbers is to obtain a field such that every polynomial with nonzero degree having coefficients in that field has a zero. Theorem D. Because of the geometry we have introduced. The special case _ c o s g _|_ j g m 0 js 0£ particular interest.w are two nonzero complex numbers.l(b).„z" + a„ iz" l + ••• + a\z + a{) is a polynomial in P(C) of degree n > 1. We want to find zrj in C such that p(zo) = 0. For \z\ = s > 0.tn . by Walter Rudin (McGraw-Hill Higher Education. we have \p(z)\ •-= \anzn + a „ _ . has a simple geometric interpretation: If z = \z\e and w = \w\e.i(<H a. as well as addition.//• >o we' .l In Section 2.7 and Theorem D. 1976). The following proof is based on one in the book Principles of Mathematical Analysis 3d. New York.4 (The Fundamental Theorem of Algebra). Suppose that p(z) = o.3. el° is the unit vector that makes an angle 6 with the positive real axis. we may represent the vector c'" as in Figure D.560 imaginary axis Appendices (b) Figure D.132). and makes the angle 0 + uj with the positive real axis. then from the properties established in Section 2. Proof. we introduce Filler's formula. Thus multiplication.7 (p.) So zw is the vector whose length is the product of the lengths of z and . where 0 is the angle that the vector z makes with the positive real axis. Let m be the greatest lower bound of {|/>(c)|: 2 G ('}. namely. From this figure. that is. we have (. Then />(.

| A field is called algebraically closed if it has the property that every polynomial of positive degree with coefficients from that field factors as a product of polynomials of degree 1.lon-iU^p" 1 = \an\sn . degree n. Now choose r even smaller. Because D is closed and bounded and p(z) is continuous.| a n _ i | s _ 1 |a 0 | |ao| \a0\s~n].\bk\rk + bk+1rk+1ei(k+l)d + ••• + Choose r small enough so that 1 — |6fc[rfc > 0. q(0) = 1. So we may write q(z) = 1 + bkzk + bk+1zh+1 where 6fc 7^ 0. we may pick a real number 9 bk such that etkd = — ~-. Then q(z) is a polynomial of P(zo) + ••• + bnzn. cn (not necessarily distinct) such that p(z) = an(z . Corollary. It follows that rn is the greatest lower bound of {|p(z)|: z G D).cA)(z .| a n _ i | s n _ 1 = sn[\an\ . so that the expression within the brackets is positive. We want to show that rn = 0. 561 Because the last expression approaches infinity as s approaches infinity. there exists ZQ in D such that |p(zo)| = m.c 2 ) • • • (z . Because — — has modulus one. Proof. • • •.7s Appendix D Complex Numbers > |a n ||2 n | . Thus the preceding corollary asserts that the field of complex numbers is algebraically closed. Exercise. and \q(z)\ > 1 for all z in C. If p(z) = anzn + an-izn~l + • • • + a\z + ao is a polynomial of degree n > 1 with complex coefficients.4 and the division algorithm for polynomials (Theorem E. For any r > 0. we have bk q(rei9) = 1 + bkrkelk6 + & fc+1 r* +1 e«* +1 >' + • • • + = 1 . then there exist complex numbers Ci. I The following important corollary is a consequence of Theorem D. Then \q(rei9)\ < 1 . or etkdbk = — l&J. We argue by contradiction. .l). Assume that rn ^ 0. But this is a contradiction. bnrnein6 bnrnein0. if necessary. c 2 .|6fc|rfc + |6fc + i|r fc+1 + • • • + |6 n |r n = l-rfc[|6fc|-|6fc+i|r |6 n |r"-*]. Let q(z) = :—-—. we may choose a closed disk D about the origin such that \p(z)\ > m + 1 if z is not in D. We obtain that \q(rel0)\ < 1.Cn).
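The corollary guarantees that a complex polynomial of degree n factors as a_n(z - c_1)(z - c_2)···(z - c_n). Numerically, the c_i can be exhibited with a root finder; the sketch below uses NumPy (an illustration only, not part of the text).

import numpy as np

coeffs = [1, 0, 1]          # p(z) = z^2 + 1, coefficients listed from the highest degree down
print(np.roots(coeffs))     # approximately [ i, -i ], so p(z) = (z - i)(z + i)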

APPENDIX E  POLYNOMIALS

In this appendix, we discuss some useful properties of the polynomials with coefficients from a field. For the definition of a polynomial, refer to Section 1.2. Throughout this appendix, we assume that all polynomials have coefficients from a fixed field F.

Definition. A polynomial f(x) divides a polynomial g(x) if there exists a polynomial q(x) such that g(x) = f(x)q(x).

Our first result shows that the familiar long division process for polynomials with real coefficients is valid for polynomials with coefficients from an arbitrary field.

Theorem E.1 (The Division Algorithm for Polynomials). Let f(x) be a polynomial of degree n, and let g(x) be a polynomial of degree m ≥ 0. Then there exist unique polynomials q(x) and r(x) such that

f(x) = q(x)g(x) + r(x),   (1)

where the degree of r(x) is less than m.

Proof. We begin by establishing the existence of q(x) and r(x) that satisfy (1).

Case 1. If n < m, take q(x) = 0 and r(x) = f(x) to satisfy (1).

Case 2. When 0 ≤ m ≤ n, we apply mathematical induction on n. First suppose that n = 0. Then m = 0, and it follows that f(x) and g(x) are nonzero constants. Hence we may take q(x) = f(x)/g(x) and r(x) = 0 to satisfy (1).

Now suppose that the result is valid for all polynomials with degree less than n for some fixed n > 0, and assume that f(x) has degree n. Suppose that

f(x) = a_n x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0
and
g(x) = b_m x^m + b_{m-1} x^{m-1} + ··· + b_1 x + b_0,

and let h(x) be the polynomial defined by

h(x) = f(x) - a_n b_m^{-1} x^{n-m} g(x).   (2)

Then h(x) is a polynomial of degree less than n, and therefore we may apply the induction hypothesis or Case 1 (whichever is relevant) to obtain polynomials q_1(x) and r(x) such that r(x) has degree less than m and

h(x) = q_1(x)g(x) + r(x).   (3)

Combining (2) and (3) and solving for f(x) gives us

f(x) = q(x)g(x) + r(x)   with   q(x) = a_n b_m^{-1} x^{n-m} + q_1(x).

This establishes the existence of q(x) and r(x).

We now show the uniqueness of q(x) and r(x). Suppose that q_1(x), q_2(x), r_1(x), and r_2(x) exist such that r_1(x) and r_2(x) each has degree less than m and

f(x) = q_1(x)g(x) + r_1(x) = q_2(x)g(x) + r_2(x).

Then

[q_1(x) - q_2(x)]g(x) = r_2(x) - r_1(x).   (4)

The right side of (4) is a polynomial of degree less than m. Since g(x) has degree m, it must follow that q_1(x) - q_2(x) is the zero polynomial. Hence q_1(x) = q_2(x); thus r_1(x) = r_2(x) by (4).

In the context of Theorem E.1, we call q(x) and r(x) the quotient and remainder, respectively, for the division of f(x) by g(x). For example, suppose that F is the field of complex numbers. Then the quotient and remainder for the division of

f(x) = (3 + i)x^5 - (1 - i)x^4 + 6x^3 + (-6 + 2i)x^2 + (2 + i)x + 1

by g(x) = (3 + i)x^2 - 2ix + 4 are, respectively,

q(x) = x^3 + ix^2 - 2   and   r(x) = (2 - 3i)x + 9.

Corollary 1. Let f(x) be a polynomial of positive degree, and let a ∈ F. Then f(a) = 0 if and only if x - a divides f(x).

Proof. Suppose that x - a divides f(x). Then there exists a polynomial q(x) such that f(x) = (x - a)q(x). Thus

f(a) = (a - a)q(a) = 0·q(a) = 0.

Conversely, suppose that f(a) = 0. By the division algorithm, there exist polynomials q(x) and r(x) such that r(x) has degree less than one and

f(x) = q(x)(x - a) + r(x).

Substituting a for x in the equation above, we obtain r(a) = 0. Since r(x) has degree less than one, it must be the constant polynomial r(x) = 0. Thus f(x) = q(x)(x - a).
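The division algorithm is constructive: the induction step subtracts a_n b_m^{-1} x^{n-m} g(x) and recurses, which translates directly into a short routine. The sketch below (plain Python; the helper name and the ascending-coefficient convention are mine) reproduces the quotient and remainder of the complex-coefficient example above.

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g.
    Polynomials are lists of coefficients in ascending degree order."""
    f = list(f)
    q = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(abs(c) > 1e-12 for c in f):
        shift = len(f) - len(g)
        coef = f[-1] / g[-1]          # a_n * b_m^{-1}, as in the proof
        q[shift] = coef
        for i, c in enumerate(g):     # subtract coef * x^shift * g(x)
            f[i + shift] -= coef * c
        f.pop()                       # the leading term is now zero
    return q, f

f = [1, 2 + 1j, -6 + 2j, 6, -1 + 1j, 3 + 1j]   # (3+i)x^5 - (1-i)x^4 + 6x^3 + (-6+2i)x^2 + (2+i)x + 1
g = [4, -2j, 3 + 1j]                           # (3+i)x^2 - 2ix + 4
q, r = poly_divmod(f, g)
print(q)   # [(-2+0j), 0, 1j, (1+0j)]  -> q(x) = x^3 + ix^2 - 2
print(r)   # [(9+0j), (2-3j)]          -> r(x) = (2-3i)x + 9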

For any polynomial f(x) with coefficients from a field F, an element a ∈ F is called a zero of f(x) if f(a) = 0. With this terminology, the preceding corollary states that a is a zero of f(x) if and only if x - a divides f(x).

Corollary 2. Any polynomial of degree n ≥ 1 has at most n distinct zeros.

Proof. The proof is by mathematical induction on n. The result is obvious if n = 1. Now suppose that the result is true for some positive integer n, and let f(x) be a polynomial of degree n + 1. If f(x) has no zeros, then there is nothing to prove. Otherwise, if a is a zero of f(x), then by Corollary 1 we may write f(x) = (x - a)q(x) for some polynomial q(x). Note that q(x) must be of degree n; therefore, by the induction hypothesis, it can have at most n distinct zeros. Since any zero of f(x) distinct from a is also a zero of q(x), it follows that f(x) can have at most n + 1 distinct zeros.

Polynomials having no common divisors arise naturally in the study of canonical forms. (See Chapter 7.)

Definition. Two nonzero polynomials are called relatively prime if no polynomial of positive degree divides each of them.

For example, the polynomials with real coefficients f(x) = x^2(x - 1) and h(x) = (x - 1)(x - 2) are not relatively prime because x - 1 divides each of them. On the other hand, consider f(x) and g(x) = (x - 2)(x - 3), which do not appear to have common factors. Could other factorizations of f(x) and g(x) reveal a hidden common factor? We will soon see (Theorem E.9) that the preceding factors are the only ones; thus f(x) and g(x) are relatively prime because they have no common factors of positive degree.

Theorem E.2. If f1(x) and f2(x) are relatively prime polynomials, there exist polynomials q1(x) and q2(x) such that

q1(x)f1(x) + q2(x)f2(x) = 1,

where 1 denotes the constant polynomial with value 1.

Proof. The proof is by mathematical induction on the degree of f2(x). Without loss of generality, assume that the degree of f1(x) is greater than or equal to the degree of f2(x). If f2(x) has degree 0, then f2(x) is a nonzero constant c, and in this case we can take q1(x) = 0 and q2(x) = 1/c.

Now suppose that the theorem holds whenever the polynomial of lesser degree has degree less than n for some positive integer n, and suppose that f2(x) has degree n. By the division algorithm, there exist polynomials q(x) and r(x) such that r(x) has degree less than n and

f1(x) = q(x)f2(x) + r(x).   (5)

Since f1(x) and f2(x) are relatively prime, r(x) is not the zero polynomial. We claim that f2(x) and r(x) are relatively prime. Suppose otherwise; then there exists a polynomial g(x) of positive degree that divides both f2(x) and r(x). Hence, by (5), g(x) also divides f1(x), contradicting the fact that f1(x) and f2(x) are relatively prime. Since r(x) has degree less than n, we may apply the induction hypothesis to f2(x) and r(x). Thus there exist polynomials g1(x) and g2(x) such that

g1(x)f2(x) + g2(x)r(x) = 1.   (6)

Combining (5) and (6), we have

1 = g1(x)f2(x) + g2(x)r(x) = g1(x)f2(x) + g2(x)[f1(x) - q(x)f2(x)] = g2(x)f1(x) + [g1(x) - g2(x)q(x)]f2(x).

Thus, setting q1(x) = g2(x) and q2(x) = g1(x) - g2(x)q(x), we obtain the desired result.

Example 1
Let f1(x) = x^3 - x^2 + 1 and f2(x) = (x - 1)^2. As polynomials with real coefficients, f1(x) and f2(x) are relatively prime. It is easily verified that the polynomials q1(x) = -x + 2 and q2(x) = x^2 - x - 1 satisfy

q1(x)f1(x) + q2(x)f2(x) = 1,

and hence these polynomials satisfy the conclusion of Theorem E.2.

Throughout Chapters 5, 6, and 7, we consider linear operators that are polynomials in a particular operator T and matrices that are polynomials in a particular matrix A. For these operators and matrices, the following notation is convenient.

Definitions. Let f(x) = a_0 + a_1 x + ··· + a_n x^n be a polynomial with coefficients from a field F. If T is a linear operator on a vector space V over F, we define

f(T) = a_0 I + a_1 T + ··· + a_n T^n.

Similarly, if A is an n × n matrix with entries from F, we define

f(A) = a_0 I + a_1 A + ··· + a_n A^n.
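Example 1's identity can be confirmed with NumPy's polynomial helpers, which list coefficients from the highest degree down (the check itself is only an illustration; NumPy is not part of the text).

import numpy as np

f1 = [1, -1, 0, 1]      # x^3 - x^2 + 1
f2 = [1, -2, 1]         # (x - 1)^2
q1 = [-1, 2]            # -x + 2
q2 = [1, -1, -1]        # x^2 - x - 1
print(np.polyadd(np.polymul(q1, f1), np.polymul(q2, f2)))   # [0 0 0 0 1] -> the constant polynomial 1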

Then.3a-36). a + 26) + (4a + 26.3. if A = then f(A) = A2+2A-3I = 1 2 +2 2 l 1 -1 )-3 l -1 r °\0 ° 1 6 3 3 -3 The next three results use this notation.5. and let f(x) = x2 + 2x . a + 26). and let A be an n x n matrix with entries from F. then [f(T)}0 = f(A). Proof. 6) = (5a + 6. Then the following statements are true. Exercise. then there exist polynomials q\(x) and q2(x) with entries from F such that (a) ai(T)/!(T)+o 2 (T)/ 2 (T) = l (b) q1{A)f1(A) + q2{A)f2(A) Proof. 1 Theorem E.3 l ) ( a . Similarly.26) . Proof. Exercise. Theorem E.4.6) = (T 2 + 2 T . It is easily checked that T 2 (a.566 Example 2 Appendices Let T be the linear operator on R2 defined by T(a. (a) / i ( T ) / 2 ( T ) = / 2 (T)/ 1 (T) (b) / 1 (A)/ 2 (A) = / 2 (A)/ 1 (A). and let T be a linear operator on a vector space V over F.6) = (2a + 6. Exercise. 1 Theorem E. (b) If8 is a finite ordered basis for V and A = [T)0.3. 6 ) = (5a + 6. Let f(x) be a polynomial with coefficients from a field F. . for any polynomials fi(x) and / 2 (x) with coefficients from F. and let A be a square matrix with entries from F. a — 6). = I. (a) / ( T ) is a linear operator on V.2a . Let T be a linear operator on a vector space V over a field F. Let T be a linear operator on a vector space V over a field F. so /(T)(a. If f\ (x) and / 2 (x) are relatively prime polynomials with entries from F. 6) = (6a+ 36.3(a.
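Evaluating a polynomial at a matrix is ordinary matrix arithmetic. The sketch below computes f(A) = A^2 + 2A - 3I with NumPy; the 2 × 2 matrix is my reading of the garbled display in Example 2, so treat the specific entries as illustrative.

import numpy as np

A = np.array([[2, 1],
              [1, -1]])
fA = np.linalg.matrix_power(A, 2) + 2 * A - 3 * np.eye(2)
print(fA)   # [[ 6.  3.]
            #  [ 3. -3.]]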

Then 0(x) and f(x) are relatively prime by Theorem E. Exercise. Any two distinct irreducible monic polynomials arc relatively prime. there is a polynomial h(x) such that f(x)g(x) 4>(x)h(x). g(x). the polynomials of degree 1 are the only irreducible polynomials. for polynomials with coefficients from an algebraically closed field. A polynomial f(x) with coefficients from a field F is called monic if its leading coefficient is 1. Since 0(x) divides f(x)g(x). Clearly any polynomial of degree 1 is irreducible.8. The following facts are easily established.6. Exercise. Both of these problems are affected by the factorization of a certain polynomial determined by T (the characteristic polynomial ofT). then 0(x) and f(x) are relatively prime.w Appendix E Polynomials 567 In Chapters 5 and 7. and so there exist polynomials tfi(x) and q2(x) such that l = Ol(x)0(x)+o2(x)/(x). and 0(x) be polynomials. but it is not irreducible over the field of complex numbers since x 2 + 1 = (x + i)(x — i). particular types of polynomials play an important role. Proof. q2(x)h(x)}. i Theorem E. In this setting. If f(x) has positive degree and cannot be expressed as a product of polynomials with coefficients from F each having positive degree. If 0(x) is irreducible and divides the product f(x)g(x). Let 0(x) and f(x) be polynomials. Moreover. Multiplying both sides of this equation by g(x) yields g(x) = ai(x)0(x)p(x) + a 2 (x)/(x)o(x). Proof. then f(x) is called irreducible. Let f(x).6. Definitions. I Theorem E. Theorem E. For example. Thus (7) becomes g(x) = qi(x)(f>(x)g(x) + c 2 (x)0(x)a(x) = 0(x) \qi(x)g(x) + So 0(x) divides g(x). 1 (7) = . Suppose that 0(x) does not divide f(x).7. Observe that whether a polynomial is irreducible depends on the field F from which its coefficients come. f(x) = x 2 + 1 is irreducible over the field of real numbers. If 0(x) is irreducible and 4>(x) does not divide f(x). we are concerned with determining when a linear operator T on a finite-dimensional vector space can be diagonalized and with finding a simple (canonical) representation of T. Proof. then 0(x) divides f(x) or <j>(x) divides g(x).

0 2 ( x ) . Proof. and let 0i (x). . We are now able to establish the unique factorization theorem. 0 n (x) be n irreducible polynomials. . Suppose then that for some n > 1. If 0(x) divides the product 0i(x)0 2 (x) • • -0n(x). This result states that every polynomial of positive. induction hypothesis . and unique positive integers n\. The. In the first case. Then f(x) = anxn H r. 0(x) = 0i(x) for some i (i = 1 . . 0(x) = <f>n(x) by Theorem E. Theorem E. . then f(x) = ax + b for some constants a and 6 with a ^ 0. each of positive degree less than n. . then 0(x) = 0i(x) for some i (i — 1 . If f(x) is irreducible. which is used throughout Chapters 5 and 7.r) • • • (t)n(x) = [0l(x)0 2 (x) • • • 0„_i(x)] 0 n (x). 2 .8. . 4>k(x). .r). nk such that f(x) = <iMx)Y'i[Mx)}n2---[ct>kix)}nk. For any polynomial fix) of positive degree. Let 0(x). consequence of Theorem E. then /(x) = g(x)h(x) for some polynomials g(x) and h(x). degree is uniquely expressible as a. constant times a product of irreducible monic polynomials. the result is an immediate. . 2 .568 Appendices Corollary. . unique distinct irreducible monic polynomials 0i(. . . . the corollary is true for any n — 1 irreducible monie polynomials. Now suppose that the conclusion is true for any polynomial with positive degree less than some integer n > 1. We begin by showing the existence of such a factorization using mathematical induction on the degree of fix).. For n = 1. . <f>n(x) be irreducible monic polynomials.aix + a 0 for some constants a* with a n ^ 0. the result is proved in this case. 0 2 ( x ) . . Proof. . . an + ^ + ^ ) an aTIJ is a representation of f(x) as a product of an and an irreducible monic polynomial. Setting 0(x) = x + 6/a. then 0(x) divides the product 0i(x)0 2 (x) • • • 0„_i (x) or 0(x) divides 0 n (x) by Theorem E. .7. .r" \ + ^i:r»-'+. we have f(x) — a0(x). . n 2 . Since 0(x) is an irreducible monic polynomial. .7. . 0i (x). We prove the corollary by mathematical induction on n. If 0(x) divides 0l(-r)0 2 (. . n). If f(x) is of degree 1. then / W = a„(. .. n— 1) by the induction hypothesis. 0 2 ( x ) . there exist a unique constant c. .9 (Unique Factorization Theorem for Polynomials). If f(x) is not irreducible. and let f(x) be a polynomial of degree n. in the second case.

But f(a) = g(a) for all a G Z2. . So 0i(x) = 0i(x) for some i = 2 ..10. . k and j — 1 . f(x) can be factored as a product of a constant and powers of distinct irreducible monic polynomials. .0fc(x) are distinct. . and suppose that h(x) is of degree n > 1. 0i(x) = i^i(x) for i = 1. each 0i(x) equals some ipj(x). Suppose that / ( a ) = g(a) for all a G F. T h e o r e m E. Dividing by c. It remains to establish the uniqueness of such a factorization. each ij)j(x) equals some 0»(x). Define h(x) = f(x) — g(x). the value of / at c G F is f(c) = ancn + \. .j are positive integers for i = 1 . If f(a) = g(a) for all a. Thus. then f(x) and g(x) are equal. We conclude that r = k and that..™ ' [0 2 (x)]" 2 • • • [ 0 f c ( x ) r = [ 0 2 ( x ) p • • • [0fc(x)f (10) Since ni — mi > 0. 1 It is often useful to regard a polynomial f(x) = anxn H h aix + ao with coefficients from a field F as a function f:F—*F. and ni and m. by renumbering if necessary. Proof. . In this case.a\C + ao. . Suppose that n» ^ m-j for some i.2. and similarly. . It follows from Corollary 2 to . . . Our final result shows that this anomaly cannot occur over an infinite field. we may suppose that i = 1 and ni > mi. 0i(x) divides the left side of (10) and hence divides the right side also. 0i(x) and i/)j(x) are irreducible monic polynomials. . for arbitrary fields there is not a one-to-one correspondence between polynomials and polynomial functions. Consequently f(x) — g(x)h(x) also factors in this way.Appendix E Polynomials 569 guarantees that both g(x) and h(x) factor as products of a constant and powers of distinct irreducible monic polynomials. . Hence the factorizations of f(x) in (8) are the same. Clearly both c and d must be the leading coefficient of /(x). . we find that (8) becomes [0i ( x ) r [ 0 2 ( x ) p • • • [0fc(x)r = [Mx)}™1 [Mx)r> • • • IM*)}1 (9) So 0i(x) divides the right side of (9) for i = 1 . But this contradicts that 0 i ( x ) . r. . Then by canceling [0i(x)] mi from both sides of (9). Let f(x) and g(x) be polynomials with coefficients from an infinite Geld F. 0 2 ( x ) . k. . 2 . 2 . if f(x) = x2 and g(x) = x are two polynomials over the field Z 2 (defined in Example 4 of Appendix C). so that / and g are equal polynomial functions.k. k by the corollary to Theorem E. . in either case. Without loss of generality. Consequently..8. then f(x) and g(x) have different degrees and hence are not equal as polynomials. by the corollary to Theorem E. 2 . we obtain [ 0 i ( x ) p . . . . G F. hence c = d. Suppose that / ( x ) = c [ 0 i ( x ) r [02(x)p • • • [0fc(x)r = d[Mx)}mi[Mx)}m2---lMx)]n (8) where c and d are constants. . .8. For example.Lmfortunately.

contradicting the assumption that h(x) has positive degree. it follows that h(x) is the zero polynomial.570 Theorem E. Thus h(x) is a constant polynomial. But h(a) = f(a)-g(a) =0 Appendices for every a G F. Hence f(x) = g(x).l that h(x) can have at most n zeroes. and since h(a) — 0 for each a G F. 1 .

-2.1. (a) x = (3. (VS 5) fails.0. 14.9. -7. Only the pairs in (b) and (c) are parallel.5 l \ -1 (e) (g) (5 6 7) 3 < 5y 8. (a) x = (2. the set is not closed under addition.0) + s(9.2x + 10 (g) 10rc7 . -5. No. 2) + t(0.0) SECTION 1. 22. (a) Yes (c) Yes (e) No 11.1 1. M13 = 3. M2i = 4. Yes 571 . 2.0) + t(14. 2 m n SECTION 1.4) + t(-8. and M22 = 5 6 3 2\ . (VS 4) fails. 7) + t(-5. No.2 1. (a) F 2 .30x4 + 40x2 .3 1. /8 20 -12N 4 (a) (C) ' 1-4 3 9 U 0 28 (e) 2x4 + x3 + 2x2 . -3) (c) x = (3.15x 13. Yes 15.A n s w e r s t o S e l e c t e d E x e r c i s e s CHAPTER 1 SECTION 1. No. (a) T (b) F (c) F (d) F (e) T (f) F (g)F (h)F (i)T (j)T (k)T 3. -10) 3. 7. -1) + s(-2.2) (c) x = (-8. 15. .9. No 17. ( a ) (b) F (c) T (d) F (e) T (c) ( " * J (f) F fj ( g)F (_2 ( _ i ) i the trace i s .2.12.

0.1.U2. (a) Yes (c) No (e) Yes (g) Yes SECTION 1.0) + (-4.j — a2)«3 + (04 — a.s<= R} (c) There are no solutions.5 1. (a) F (b) T (c) F (d) F (e) T (f)T 2. (a) No (c) Yes (e) No 4.0.1.0) + s(-3. (a) -4x 2 .6 1. (a) F (b) T (c) F (d) F (e)T (f)F (g)F (h) T (i) F (j) T (k)T (1)T 2. (a) Yes (c) Yes (e) No 5.i)u4 10. {(1.1) + (5.1 17.0. s G R} 3.4. | n ( n . (a) F (b) F (c)F (d)T (e)T (f)T CHAPTER 2 SECTION 2. dim(Wi) = 3. No 5.0. (a) Yes (c) Yes (e) No 3.0. (oi. -2. 2" SECTION 1.0.04) = aioi + (02 — Oi)«2 + (o.0) + s(-3.572 Answers to Selected Exercises SECTION 1.5 13.5): r.l ) 26. No 7. -3. dim(Wi + W2) = 4. n2 . o)' \0 1 vo 11.1)} 15.ua} 9. and dim(Wi n W2) = 1 SECTION 1. (a) T (b) F (c) T (d) F (e) T (f) F 2.02. (a) linearly dependent (c) linearly independent (e) linearly dependent (g) linearly dependent.1 1.2. n 30.1. (a) {r( 1. (a) T (b) F (c)F (d)T (e)F (f)F (g) T (h) F .1. {u\.0) :r. (a) Yes (c) No (e) No 4.7 1. (e) {r(10.0. dim(W2) = 2.3.x + 8 (c) . (i) linearly independent T 0\ (0 0N 7.03.x 3 + 2x2 + 4x .4 1.

T is one-to-one. anc [UT]2 = 0 0 o) \2 0 6 4 -6 3. The nullity is 0.3 1.v „•„ /23 19 A(BD) -(•£) CB = (27 7 9) 1 0 -1 /2 6 0\ 1 . No.3) = (5. (a) 4 < 6y /2 3 0\ /I 0 3 6 . (a)[T]^= l \ -1 4. \ (0 1 2 2 (b) 0 0 ^0 0 3 o\ 2 0 2/ 3 3 *-.1 \ 2. (a) T (b) T (c) F (d)T (2 .. T(2. (a) /l 0 0 10. 5. [U]J = 0 \ 0 0 4/ V (c) (5) { . 3/ (e) / 1\ -2 0 V v 0 0 0 -•\0 0 0 ••• SECTION 2. 4. and the rank is 2. The nullity is 4. 573 SECTION 2. 12.Answers to Selected Exercises 2. (a) F (b) T (g) F (h) F (e) F and (f) F 2. (a) 3 4 (c) (2 1 -3) \i 0/ /0 0 ••• 0 1\ 0 0 ••• 1 0 (f) 0 1 ••• 0 0 \1 0 ••• 0 0/ 3. T is neither one-to-one nor onto.1 4 5 V i o i/ (g)(l 0 •-• 0 1) \ (~ 3 2 2 5. 10. T is not one-to-one but is onto. and the rank is 3.11).2 1. and the rank is 2. (a)A(2£ + 3C7)=( 2 ° .l = (e)T f)F / 0 '. T is one-to-one but not onto. m j = /-* -1\ 0 1 V I o/ / l 0 0 0\ 0 0 1 0 0 1 0 0 \0 0 0 1/ 1 0 ••• O N 1 1 ••• 0 0 1 ••• 0 1 1/ (c) F (i) T ^ 0\ (d)T (J)T i) and and [T].I 1\ (d) . The nullity is 1.

3. -f) .m2)x + 2my.l)y) (b) T (c) T (d) T (e) F (f) T (g) T (h) F 2. (e).y.z) = -x + z 5. (a) h(x. = ( z i .P2(x)}.4 1. (a) No 19. h(x. (a) F 1+m2 ((1 .5 7.z) = \y.5 + x. The functions in (a). (a) [T]0 = (d)F (d) No (d) No (e)T (e) No (b) No. (a) F (g)T 2. (c). 2mx + (m2 . and f3ix. Answers to Selected Exercises (f)F (f) Yes SECTION 2. and (f) are linear functionals. (a)T(x. (a) No 3. ) (o[T]s=(-.y.46 w v f .\y. where g(o + bx) = -3a .574 12. where pi(x) = 2 . (a) No.2x and p?(x) = . SECTION 2.z) = x.y) = SECTION 2. 7. The basis for V is {pi(x).y. (a) T'(f) = g.6 1.

l 1/ Vo o i . (a) The rank is 2.^ 2 . and the inverse is I . and the inverse is 6. /0 0 1\ / l 0 0\ 3. (a) T 2. e -4* „-2t CHAPTER 3 SECTION 3. (a) {e-*.*} (c) {e'Ste . (a) ( 0 1 0 (c) ( 0 1 0 ) \ 1 0 0/ \2 0 1/ (e) F (f) T 5. Adding —2 times column 1 to column 2 transforms A into B.**. -3- (c) The rank is 2.Answers to Selected Exercises SECTION 2.e t sin2t} ( c ) { l .*.te . (a) T _ 1 (ax2 + bx + c) = -ax2 .(4a + b)x . (e) The rank is 3.^ / 2 } (e) {e~*. and the inverse is (g) The rank is 4.c) = (\a-b+\c)x + (-\a+\c)x +b ( 1 0 0\ (1 0 1 Oil i o l/lo 0 0\ / l 1 0 0 o i/\o 0 0\ (1 -2 OllO o 1/ \ o 2 0\/l 1 Olio o 1/ Vo 0 ° \ (1 0 1 1 0 10 1 0 .6.c) = (la-\b+\c.\a-\c.1 1. ( a H e ' 1 4 . (a) F (b) T (b) F (c) F (c) T (d) F (d) T (e) T (e) F (f) F (g) T 575 3.(10a + 2b + c) (c) T" 1 (a. and so no inverse exists.-\ + \b+ §c) 1 2 (e) T~ (a.7 1. ^ 1 .*^} 4.e* cos 2t. (a) T (g) T (b) F (h) F (c) T (i) T (d) F (e) T (f) F 2.b.

s +t (e) 0 1 0 0 : r. 13. and carpenter must have incomes in the proportions 4 : 3 : 4 . G /? 0 ' 3N (2) 4. T.2 V 0 1 SECTION 3.1 {(1. t G It 1 ^w V oy ^ 1/ w (g) 0 0 + r w 1 1 : r. . The farmer. (a) F (b) T (c) T (d) T (e) F (f) T (g) T •L . SECTION 3.1 : t G /? 7. . (b) (1) .4.576 1 3 -2 1 20. and (d) have solutions.H)} = ^ V 0/ +t 2 9 1 9/ G - -2.s.5 units of the second. (a) (~3\ 1 1 V <v (b) F (c) T (d) F (e)F / 0 0 0 0 0 0 0\ 0 0 0 0 0 0 0 0/ Answers to Selected Exercises (f)F (g)T (h)F ( l (e) (g) -1 \ \ > V i) . (a) F 2. 3. 11.8 units of the first commodity and 9. The systems in parts (b).= 2 3 6. (a) +t -3' tGR 1 (1\ (~2\ (3\ (-l\ 0 1 0 0 + r + . (a) 1 0 0 . (c). There must be 7. s. tailor.4 1.3 1.

{«i.3 2 95 21.0.1)} ^ /1\ U : r.2 .0.0).0. (b) {(1.1)} 13. (2. (1. -12 15. ( .0). (a) F 2. 22 (e)F 11.3 (c) (e) H 1 [ w w t /-23\ /A /-23\ 1 0 0 7 + r 0 +s 6 : r.0).0.0.8 (b) T 5.0).1. (a) 19 SECTION 4.0.0. I w / 1 0 2 1 4\ 5. (a) F 3.0.1.0.1 . (0. s G .1.4 : r. 42 13. (-3. (a) . (f) F (g) F (h) T (b) T (c) .«5} 11.2.0.1.0.0.0. CHAPTER 4 SECTION 4. (b) {(1.1. -12 17.ff > 0 (i) -1 -2 0 J l °y V V VV f 3 ( l\ 1 -1 4.2 1.1.1 3 .0.0).8 (c) -24 (c) 14 (c) F (d) F (e) T 3.1.-2. . (a) 30 4.s £ R\ (g) < 9 0 9 \o) V <V I l) 1 s /0\ ( 2\ ( l\ 1 2 0 0 + r 1 + s .1.s e R +8 2 w 577 (c) There are no solutions.0.2 8 -i . -49 (d)T 9.7 V 3 1 1 0 -9/ 7.2.1 1. .1. (1.«2. 0 (c)T 7. (-2. (a) ^ 3 + t : te R 1 0 \ z) . (a) -10 + 15i 19.0). .1.Answers to Selected Exercises /4\ (f\ 0 2.

(a) F (b) T (c) F (d) T (e) F (f) T 3. and D = . tn +ari-it' H +tti< + ao /10 0 -An (c) 0 -20 26 .0) 5. (-20.3 0 (c) -49 (e) -28 .4 1. •.<•> ( . (0.3 Answers to Selected Exercises 1. (a) (g) 2.3 i 3 + 3i.1 1. (a) F (g) F 2. (a) 3. (a) T (b) T (c) T (d) F (e) F T (h) K (i) T (j) T (k) T 22 (c) 2 .-3. 16) 24. No (e) F (f)T CHAPTER 5 SECTION 5.£ : An \ 0 0 -8 18 28 -3i 0 0 (e) 4 -1 + i 0 (g) -20 -21 14 10+16i .v r " vs .v an<1 v) 3 The eigenvalues are 1 and —1. o\ -1 1 0 0 0 -1 1 [TJ no 0 0 .-48. (4. 48 SECTION 4. (a) F 3.-12. (a) (e) F (f) F (k) F (-1 0 0N (c) [T]3 = 0 1 0 [ T k 3 = ( _ $ o ) ' "° yes V 0 0 -1.578 SECTION 4. a basis of eigenvectors is '2\ ( l M ^ (2 L\ . Yes (c) T 7. No (b) T 5. „ i .4i -12 (c) -12 (e) 22 (g) . „ (\ D= y v .1 0 0 0 0 -1) The eigenvalues are 4 and — 1. a basis of eigenvectors is 1 M „ / 1 1 . (a) (c) -1 .5 . — 11 \ . SECTION 4.l —1/1 VI —i -l—i/ \Q (b) T (h) T (c) T (i) T (d) F (j) F (e) 3. (a) 4. Yes (d) F 9.5 1.i (g) 95 (f)T (g) F (h) F -6N 37 -16.-8) 7.

-4 + x2. .x2.2. .2)} /? = {(1.1). (g) .1 (i) A = 1.1 .(1. (a) A = 3. l .1.x .4 (b) A = -1.-1. 25% of the patients have recovered. (1.-1).l o j ' l o .3 (h) A = -1. 4 SECTION 5.2 ( .l )n" 2(5)n + ( . (b)x(t) = c 1 e 3 ^ . and 14% have died.1.~ Answers to Selected Exercises 4.2 ^ _ 1 1 ) (c)x(t)=e' SECTION 5.l ) 14.1 (j)A = . -1)} (d) (3 = {x .2 1.inr 3\5 -(-l) 2 ( 5 " ) .0). 41% are bedridden. x} -1 0\ /0 l \ / l 0 0= o 1/ ' \ o oy ' [o i 1 0\ /0 l \ (-1 0 (3 = i o) ' vo iy ' v i o 0 A (1 0\ / 0 1 (3 = . . (a) T (g)T 2. 20% are ambulatory. (a) F (g) T (b) F (h) T (c) F (i) F (d) T (c) Q = I / (g) Q = (e) T J (f)F /? = {(3.3 1. (a) Not diagonalizable i i i\ 2 -1 0 V-l 0 -l) (c) Not diagonalizable (e) (3 = {(1.1.^ + c 2 e . 5 26.1.1 . (a) Not diagonalizable (e) Not diagonalizable 3. (a) (b) T (h)F 0 0 0 0 (c) (c) F (i)F (d) F (j)T (e) T C I + C 2 0 [ 2ti + c3e l /-I 0 -1\ -4 1 -2 \ 2 0 2/ 6.x2.8 + x 3 .0.2 (f) A = 1. x + x 2 } 7 4 « _ l / 5 n n + 2 ( .l .(1.5).(2. One month after arrival.l / ' l l 0 579 0 0 1 0 0 -1 0 1 1 0 0 1 2. Eventually | | recover and |£ die.-1)} (3 = {-2 + x.

'0. JQ once-used.372^ '0.403.0. f .441 after two stages and ^0. (a) -tit2 . (c) A'1 = ± I 0 1 3 z \0 0 -2 6.225' eventually 0. the corresponding eventual percentages are 10%. 20.225 after two stages and 0. (a) tit . (a) (g)T 1) .3t + 3) (c) (t . 3 1 3 \) \ (e) l 2 0 0 V (g) 0 0 l 2 9.580 Answers to Selected Exercises 7. e° = I and e7 = el. l eventually 12. Only the matrices in (a) and (b) are regular transition matrices.334.0. and j§ twice-used 13.t 2 10.4 1. (a) F (b) T (c) F (d) F (e) T (f) T 2. '0.30. 24% will own large cars.337. and 42% will own small cars.50N 0.3t + 3) (c) 1 .4 \ 1 18. yjj new. 10.334 after two stages and .1 (c) 0 ' 1 ' 2 i w W V 2/ 9. and 60%. (a) '0.329N (e) 0. 8. SECTION 5. A\ /i\ / i\ 0 0 .l) 3 (t '2 . (c). 30%. v0.l)(t .20 eventually (c) . and (d) are T-invariant.2 . In 1995. The subspaces in (a). 34% will own intermediate-sized cars. (a) l 3 (c) No limit exists.

y ^ ( k ) { ^ ( . \\y\\ = vT4.4 .f . v/18(2 + i).^ ( .i ( l + i ) . For each part the orthonormal basis and the Fourier coefficients are given. (e) {±(2.« . (b) No SECTION 6. . 7 ) } .^T(l-lsint).w h ( 75 * (b) * I " M . (a) t2 . -4yjl. ^ ( 3 .3 .o) = 1. y/^(l + n). 11 a n d | | / + o| l l + 3e^ 2#. SQ is the plane through the origin that is perpendicular to xo.1 1.4 i ) . 0. i .i).2 1. 6 V 5 ( x . . -18 + 16i.^ .Answers to Selected Exercises 31. V60i-1 + 2i). Ji(2n + 2).2 + i). -9 + 8i.1. 9 .||/|| = ^ .2 + 4i.4). ^ ( . 2 .^cost. 29\ 19 .l .4 .1.v/^'(t+Jcost-f)}. l ) } ) (m) / V Il60(l + i) 1 -4% -7-26i _58 11 — 9i . v / 2 4 6 ( . f (-2. 0 {^I (a-V 5. (a) T (b) T (c) F (d) F (e) F (f) F (g) F 2.8 . y/4l(-l .i ) .1 7 . vTo5 ( c ) { l . (b) { f (1.-2. i -5-118i ^39063 V ~uu L 4.Qt + 6) 581 CHAPTER 6 SECTION 6. 3.(/.i ) . 1 ) . 2 V 3 ( x .1)}. .x + i ) } . f (0. (a) F (b) T (c) T (d) F (e) T (f) F (g) T 2. S1' is the line through the origin that is perpendicular to the plane containing xi and x 2 .! O ' i f c O . and \\x + y\\ = >/&. 4 .y) = 8 + 5i. 3 ^ . 2 (h)T e'-l 101 = ^ ^ . 16. . ^ ( . 3 . l . 5 .1).= s p a n ( { ( i . -1. f. (x. ^ .9 + 8i)}. f. . . i M « f i ( .1).5 i .2 i .3 .6t + 6 (c) -(t + l)(t2 . 10.i J l G =»)' ~ ^ (i){yfSint.i .-1.i. . \\x\\ = V7.

SECTION 6.3
1. (a) T (b) F (c) F (d) T (e) F (f) T (g) T
2. (a) T*(x) = (11, -12) (c) T*(f(t)) = 12 + 6t; for the operator defined by an inner product, T*(x) = <x, z>y.
(c) y = 210x^2 - 204x + 33
(a) The linear function is y = -2t + 5/2 with E = 1, and the quadratic function is y = t^2/3 - 4t/3 + 2 with E = 0. (b) The linear function is y = 1.25t + 0.55, and the quadratic function is y = t^2/56 + 15t/14 + 239/280 with E ≈ 0.22857.
The spring constant is approximately 2.1.

SECTION 6.4
1. (a) T (b) F (c) F (d) T (e) T (f) T (g) F (h) T
2. (a) T is self-adjoint. (c) T is normal but not self-adjoint. (e) T is self-adjoint. In each case an orthonormal basis of eigenvectors and the corresponding eigenvalues (for example, 6 and 1) are given.
T_z is normal for all z in C; T_z is self-adjoint if and only if z is real; and T_z is unitary if and only if |z| = 1.

SECTION 6.5
1. (a) T (b) F (c) F (d) T (e) F (g) F (h) F (i) F
Only the pair of matrices in (d) are unitarily equivalent.
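Least-squares answers like the linear and quadratic fits above (including the associated error E) can be reproduced with a standard least-squares solver. This is a sketch on invented data, not the spring data of the exercise; NumPy is assumed.

```python
import numpy as np

# Hypothetical data points (t_i, y_i); not the text's data.
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Least-squares line y = c1*t + c0.
A = np.column_stack([t, np.ones_like(t)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
E = float(np.sum((A @ coeffs - y) ** 2))       # total squared error, the E of the answers
print("y = %.3f t + %.3f,  E = %.4f" % (coeffs[0], coeffs[1], E))

# A quadratic fit y = a*t**2 + b*t + c works the same way with an extra column.
A2 = np.column_stack([t**2, t, np.ones_like(t)])
coeffs2, *_ = np.linalg.lstsq(A2, y, rcond=None)
print("quadratic coefficients:", coeffs2)
```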

SECTION 6.6
1. (a) F (b) T (c) T (d) F (e) F
2. (a) T_1(a, b) = (1/2)(a + b, a + b) and T_2(a, b) = (1/2)(a - b, -a + b); in the three-variable part, T_1(a, b, c) = (1/3)(2a - b - c, -a + 2b - c, -a - b + 2c) and T_2(a, b, c) = (1/3)(a + b + c, a + b + c, a + b + c).
After a suitable rotation of coordinates one of the quadratic forms becomes 3(x')^2 - (y')^2.

SECTION 6.7
1. (a) F (b) F (c) T (d) T (e) F (f) F (g) T
2. For each part the singular values (values such as √3, √5, √2, and 2 occur) and the corresponding orthonormal bases are given.
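Singular values such as those just listed are the square roots of the eigenvalues of A^T A, and can be checked directly. The matrix below is hypothetical, chosen only so that its singular values are √3 and √2; NumPy is assumed.

```python
import numpy as np

# Hypothetical 3x2 matrix (not an exercise from the text); A^T A = diag(3, 2),
# so its singular values are sqrt(3) and sqrt(2).
A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  0.0]])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma)          # approximately [1.7321, 1.4142]
```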

SECTION 6.7 (continued)
(a) Z_1 = N(T)^⊥ = R^2 and Z_2 = R(T) = span({(1, -1)}) (c) Z_1 = N(T)^⊥ = V and Z_2 = R(T)
The pseudoinverses T^† of the given transformations are computed.

SECTION 6.8
1. (a) F (c) T (d) F (e) T (f) F (g) F (h) F (i) T (j) F
(a) Yes (b) No (c) No (d) Yes (e) Yes (f) No
17. (a) and (b)
22. Same as Exercise 17(c)
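For matrices, the pseudoinverse referred to above is the Moore-Penrose pseudoinverse, and it gives the minimal-norm least-squares solution of Ax = b. The sketch uses a hypothetical inconsistent system, not one of the exercises; NumPy is assumed.

```python
import numpy as np

# Hypothetical inconsistent system Ax = b (not a textbook exercise).
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 1.0])

A_dagger = np.linalg.pinv(A)                # Moore-Penrose pseudoinverse
x = A_dagger @ b                            # minimal-norm least-squares solution of Ax = b
print(x)
print(np.allclose(A @ A_dagger @ A, A))     # True: a defining property of the pseudoinverse
```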

SECTION 6.9
1. (a) F (b) T (c) T (d) F (e) T (f) F (g) F (h) F (i) T (j) F

SECTION 6.10
1. (a) F (b) F (c) T (d) F (e) F
(a) ||A|| ≈ 84.74, ||A^{-1}|| ≈ 17.01, and cond(A) ≈ 1441. (b) This follows from the bound ||x̃ - A^{-1}b|| ≤ ||A^{-1}|| · ||Ax̃ - b||.

SECTION 6.11
4. (a) √18 (c) approximately 2.34
(c) There are six possibilities: any line through the origin if φ = ψ = 0; particular lines through the origin in the special cases in which one or both of the angles φ and ψ equal π; and otherwise the span of (sin φ(cos ψ + 1), -sin φ sin ψ, sin ψ(cos φ + 1)).
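Values such as ||A||, ||A^{-1}||, and cond(A) above use the operator (spectral) norm, and can be checked numerically. A minimal sketch with a hypothetical matrix, assuming NumPy:

```python
import numpy as np

# Hypothetical invertible matrix (not the matrix of the exercise).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

norm_A = np.linalg.norm(A, 2)                        # spectral norm = largest singular value
norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)
print(norm_A, norm_A_inv, norm_A * norm_A_inv)
print(np.linalg.cond(A, 2))                          # cond(A) = ||A|| * ||A^{-1}|| (same value)
```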

CHAPTER 7

SECTION 7.1
2. (a) J = A_1 ⊕ A_2 ⊕ A_3, where A_1 consists of Jordan blocks with eigenvalue 2, A_2 = [4 1 0 0; 0 4 1 0; 0 0 4 0; 0 0 0 4], and A_3 is the 1 × 1 block (-3).

SECTION 7.2
2. (a) For λ = 2, the dot diagram, a Jordan canonical basis, and the Jordan canonical form are given; one of the forms that occurs is J = [2 1 0; 0 2 1; 0 0 2]. (c) The same information is given for λ = -1.
(a) λ_1 = 2 (c) λ_2 = 3 (d) p_1 = 3 and p_2 = 1 (e) (i) rank(U_1) = 3 and rank(U_2) = 0 (ii) rank(U_1^2) = 1 and rank(U_2^2) = 0 (iii) nullity(U_1) = 2 and nullity(U_2) = 2 (iv) nullity(U_1^2) = 4 and nullity(U_2^2) = 2
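Jordan canonical forms like the block matrices above can be verified symbolically. The sketch below uses a hypothetical matrix, not one of the exercises, and assumes SymPy is available.

```python
import sympy as sp

# Hypothetical matrix (not a textbook exercise); its characteristic polynomial is
# (t - 2)^2 but it is not diagonalizable, so its Jordan form is a single 2x2 block.
A = sp.Matrix([[ 3, 1],
               [-1, 1]])

P, J = A.jordan_form()                    # A = P J P^{-1}
sp.pprint(J)                              # [[2, 1], [0, 2]]
sp.pprint(sp.simplify(P * J * P.inv()))   # reproduces A
```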

SECTION 7.2 (continued)
(d) J = [2 1 0 0; 0 2 1 0; 0 0 2 0; 0 0 0 4]
For one of the operator exercises, J and Q are given together with a basis such as β = {2e^t, 2te^t, t^2 e^t}.

SECTION 7.3
1. (a) F (b) T (c) F (d) F (e) T (f) F (g) F (h) T (i) T
2. (c) (t - 1)(t - 2) (d) (t - 1)^2(t - 2)
3. (a) (t - 2)^2 (d) (t - 1)(t + 1)
4. (a) and (d)
5. The operators are T_0, I, and all operators having both 0 and 1 as eigenvalues.
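Minimal polynomials such as (t - 1)(t - 2) and (t - 1)^2(t - 2) can be found by testing the monic divisors of the characteristic polynomial. The sketch below is one way to do this with a hypothetical matrix; it is not the text's method, and SymPy is assumed.

```python
import sympy as sp
from itertools import product

t = sp.symbols('t')

def poly_of_matrix(p, A):
    """Evaluate the polynomial p(t) at the square matrix A (Horner's rule)."""
    result = sp.zeros(A.rows, A.rows)
    for c in sp.Poly(p, t).all_coeffs():          # coefficients, highest degree first
        result = result * A + c * sp.eye(A.rows)
    return result

def minimal_polynomial(A):
    """Monic polynomial of least degree with p(A) = 0, found among divisors of the
    characteristic polynomial that contain every irreducible factor."""
    factors = sp.factor_list(A.charpoly(t).as_expr())[1]   # [(irreducible factor, multiplicity), ...]
    best = None
    for exps in product(*[range(1, m + 1) for _, m in factors]):
        p = sp.expand(sp.Mul(*[f**e for (f, _), e in zip(factors, exps)]))
        if poly_of_matrix(p, A) == sp.zeros(A.rows, A.rows):
            if best is None or sp.degree(p, t) < sp.degree(best, t):
                best = p
    return best

# Hypothetical matrix: eigenvalue 2 twice and 3 once, but diagonalizable,
# so the minimal polynomial factors as (t - 2)(t - 3).
A = sp.Matrix([[2, 0, 0],
               [0, 2, 0],
               [0, 0, 3]])
print(sp.factor(minimal_polynomial(A)))
```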

SECTION 7.4
(c) t^2 - t + 1
The rational canonical forms are direct sums of companion matrices, for example the companion matrix [0 -1; 1 1] of t^2 - t + 1 together with the companion matrix [0 0; 1 0] of t^2, with an accompanying basis such as β = {1, x, -2x + x^2, -3x + x^3}.

Index

Absolute value of a complex number, 558
Absorbing Markov chain, 304
Absorbing state, 304
Addition of matrices, 9
Addition of vectors, 6
Additive function, 78
Additive inverse of an element of a field, 553
Algebraic multiplicity, see Multiplicity of an eigenvalue
Algebraically closed field, 561
Alternating n-linear function
Angle between two vectors
Annihilator of a subset, 120
Approximation property of an orthogonal projection
Area of a parallelogram
Associated quadratic form
Augmented matrix, 161
Auxiliary polynomial, 131
Axioms of the special theory of relativity, 453
Axis of rotation, 473
Back substitution, 186
Backward pass, 186
Basis, 43-49
    cyclic, 526
    dual, 120
    ordered, 79
    orthonormal, 341
    standard basis for F^n, 43
    standard basis for P_n(F), 43
    standard ordered basis for P_n(F), 79
    uniqueness of size, 46
Bessel's inequality, 355
Bilinear form, 422-433
    diagonalizable, 428
    diagonalization, 428-430
    invariants, 444
    matrix representation, 424-428
    product with a scalar, 423
    rank, 443
    signature, 444
    sum, 423
    symmetric, 428
    vector space of bilinear forms, 424
Cancellation law for vector addition, 11
Cancellation laws for a field, 554
Canonical form
    Jordan, 483-516
    rational, 526-548
Cauchy-Schwarz inequality
Chain of sets, 59
Change of coordinate matrix, 112-115
Characteristic of a field