LINEAR ALGEBRA AND MATRIX THEORY
JIMMIE GILBERT
LINDA GILBERT
University of South Carolina at Spartanburg
Spartanburg, South Carolina
ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.
• The complete text would be suitable for a one-year undergraduate course for
mathematics majors and would provide a strong foundation for more abstract
courses at higher levels of instruction.
• A one-semester or one-quarter course could be taught from the first five chapters
together with selections from the remaining chapters. The selections could be
chosen so as to meet the requirements of students in the fields of business, science,
economics, or engineering.
The presentation of material presumes the knowledge and maturity gained from one
calculus course, and a second calculus course is a desirable prerequisite.
It is our opinion that linear algebra is well suited to provide the transition from the intuitive developments of courses at a lower level to the more abstract treatments encountered later. Throughout the treatment here, material is presented from a structural point of view: fundamental algebraic properties of the entities involved are emphasized. This approach is particularly important because the mathematical systems encountered in linear algebra furnish a wealth of examples for the structures studied in more advanced courses.
The unifying concept for the first five chapters is that of elementary operations. This
concept provides the pivot for a concise and efficient development of the basic theory
of vector spaces, linear transformations, matrix multiplication, and the fundamental
equivalence relations on matrices.
A rigorous treatment of determinants from the traditional viewpoint is presented in
Chapter 6. For a class already familiar with this material, the chapter can be omitted.
In Chapters 7 through 10, the central theme of the development is the change in
the matrix representing a vector function when only certain types of basis changes are
admitted. It is from this approach that the classical canonical forms for matrices are
derived.
Numerous examples and exercises are provided to illustrate the theory. Exercises are
included of both computational and theoretical nature. Those of a theoretical nature
amplify the treatment and provide experience in constructing deductive arguments,
while those of a computational nature illustrate fundamental techniques. The amount
of labor in the computational problems is kept to a minimum. Even so, many of them
provide opportunities to utilize current technology, if that is the wish of the instructor.
Answers are provided for about half of the computational problems.
The exercises are intended to develop confidence and deepen understanding. It is
assumed that students grow in maturity as they progress through the text and the
proportion of theoretical problems increases in later chapters.
Since much of the interest in linear algebra is due to its applications, the solution
of systems of linear equations and the study of eigenvalue problems appear early in the
text. Chapters 4 and 7 contain the most important applications of the theory.
ACKNOWLEDGMENTS
We wish to express our appreciation for the support given us by the University of
South Carolina at Spartanburg during the writing of this book, since much of the work
was done while we were on sabbatical leave. We would especially like to thank Sharon
Hahs, Jimmie Cook, and Olin Sansbury for their approval and encouragement of the
project.
This entire text was produced using Scientific Word and Scientific Workplace, software packages from TCI Software Research, Inc. Special thanks are due to Christopher Casey and Fred Osborne for their invaluable assistance throughout the project.
We would like to acknowledge with thanks the helpful suggestions made by the
following reviewers of the text:
We also wish to express our thanks to Dave Pallai for initiating the project at
Academic Press, to Peter Renz for his editorial guidance in developing the book, and to
Michael Early for his patience and encouragement while supervising production of the
book.
Jimmie Gilbert
Linda Gilbert
Chapter 1
Real Coordinate Spaces
1.1 Introduction
There are various approaches to that part of mathematics known as linear algebra. Different approaches emphasize different aspects of the subject such as matrices, applications, or computational methods. As presented in this text, linear algebra is in essence a study of vector spaces, and this study of vector spaces is primarily devoted to finite-dimensional vector spaces. The real coordinate spaces, in addition to being important in many applications, furnish excellent intuitive models of abstract finite-dimensional vector spaces. For these reasons, we begin our study of linear algebra with a study of the real coordinate spaces. Later it will be found that many of the results and techniques employed here will easily generalize to more abstract settings.
Definition 1.1 For each positive integer n, R^n will denote the set of all ordered n-tuples (u1, u2, ..., un) of real numbers u_i. Two n-tuples (u1, u2, ..., un) and (v1, v2, ..., vn) are equal if and only if u_i = v_i for i = 1, 2, ..., n. The set R^n is referred to as an n-dimensional real coordinate space. The elements of R^n are called n-dimensional real coordinate vectors, or simply vectors. The numbers u_i in a vector (u1, u2, ..., un) will be called the components of the vector. The elements of R will be referred to as scalars.¹
¹The terms "vector" and "scalar" are later extended to more general usage, but this will cause no confusion since the context will make the meaning clear.
The real coordinate spaces and the related terminology described in this definition
are easily seen to be generalizations and extensions of the two- and three-dimensional
vector spaces studied in the calculus.
When we use a single letter to represent a vector, the letter will be printed in boldface lower case Roman, such as v, or written with an arrow over it, such as v⃗. In handwritten work with vectors, the arrow notation v⃗ is commonly used. Scalars will be represented by letters printed in lower case italics.
Definition 1.2 Addition in R^n is defined as follows: for any u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in R^n, the sum u + v is given by

u + v = (u1 + v1, u2 + v2, ..., un + vn).

For any scalar a and any vector u = (u1, u2, ..., un) in R^n, the product au is defined by

au = (au1, au2, ..., aun).

The operation that combines the scalar a and the vector u to yield au is referred to as multiplication of the vector u by the scalar a, or simply as scalar multiplication. Also, the product au is called a scalar multiple of u.
The following theorem gives the basic properties of the two operations that we have
defined.
Theorem 1.3 The following properties are valid for any scalars a and b, and any vectors u, v, w in R^n:
1. u + v ∈ R^n. (Closure under addition)
2. (u + v) + w = u + (v + w). (Associative property of addition)
3. There is a vector 0 in R^n such that u + 0 = u for all u ∈ R^n. (Additive identity)
4. For each u ∈ R^n, there is a vector −u in R^n such that u + (−u) = 0. (Additive inverses)
5. u + v = v + u. (Commutative property of addition)
6. au ∈ R^n. (Closure under scalar multiplication)
7. a(bu) = (ab)u. (Associative property of scalar multiplication)
8. a(u + v) = au + av. (Distributive property, vector addition)
9. (a + b)u = au + bu. (Distributive property, scalar addition)
10. 1u = u.
The proofs of these properties are easily carried out using the definitions of vector addition and scalar multiplication, as well as the properties of real numbers. As typical examples, properties 3, 4, and 8 will be proved here. The remaining proofs are left as an exercise.

Proof of Property 3. The vector 0 = (0, 0, ..., 0) is in R^n, and if u = (u1, u2, ..., un),

u + 0 = (u1 + 0, u2 + 0, ..., un + 0) = u.
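Although the text predates such tools, the componentwise operations of Definitions 1.1 and 1.2 translate directly into code. The following Python sketch checks several properties of Theorem 1.3 on sample vectors; the function names add and scale are illustrative choices, not notation from the text.

```python
# Componentwise vector addition and scalar multiplication on n-tuples,
# following Definitions 1.1 and 1.2.
def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(a, u):
    return tuple(a * ui for ui in u)

u, v, w = (1.0, 0.0, 2.0), (3.0, -2.0, 1.0), (0.5, 4.0, -1.0)
zero = (0.0, 0.0, 0.0)

# Property 2: associativity of addition.
assert add(add(u, v), w) == add(u, add(v, w))
# Property 3: additive identity.
assert add(u, zero) == u
# Property 4: the additive inverse -u is (-1)u.
assert add(u, scale(-1, u)) == zero
# Property 8: a(u + v) = au + av.
assert scale(2, add(u, v)) == add(scale(2, u), scale(2, v))
```

A check on three sample vectors is of course not a proof; the proofs reduce, as in the text, to the corresponding properties of real numbers applied in each component.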
1.2 The Vector Spaces R^n
If linear dependence on a finite set B = {v1, v2, ..., vr} is under consideration, the statement in Definition 1.4 is equivalent to requiring that all of the vectors in B be involved in the linear combination. That is, a vector v is linearly dependent on B = {v1, v2, ..., vr} if and only if there are scalars b1, b2, ..., br such that v = Σ_{i=1}^{r} b_i v_i.
v1 = (1, 0, 2, 1), v2 = (0, −2, 2, 3), and v3 = (1, −2, 4, 4),

v = 1·v1 + 0·v2 + 2·v3,

or

v = 3·v1 + 2·v2 + 0·v3.

Either of these combinations shows that v is linearly dependent on B.
To consider a situation involving an infinite set, let A be the set of all vectors in R^3 that have integral components, and let v = (√2, 1/3, 0). Now u1 = (1, 0, 1), u2 = (0, 1, 0), and u3 = (0, 0, 1) are in A, and

v = √2·u1 + (1/3)·u2 − √2·u3.

Thus v is linearly dependent on A. It should be noted that other choices of vectors u_i can be made in order to exhibit this dependence. ■
In order to decide whether a certain vector is linearly dependent on a given set in
R n , it is usually necessary to solve a system of equations. This is illustrated in the
following example.
Example 2 □ Consider the question as to whether (6, 0, −1) is linearly dependent on the set A = {(2, −1, 1), (0, 1, −1), (−2, 1, 0)}. To answer the question, we investigate the conditions on a1, a2, and a3 that are required by the equation

a1(2, −1, 1) + a2(0, 1, −1) + a3(−2, 1, 0) = (6, 0, −1).

Comparing components leads to the system

2a1 − 2a3 = 6
−a1 + a2 + a3 = 0
a1 − a2 = −1.

We decide to work toward the solution of this system by eliminating a1 from two of the equations in the system. As steps toward this goal, we multiply the first equation by 1/2 and we add the second equation to the third equation. These steps yield the system

a1 − a3 = 3
−a1 + a2 + a3 = 0
a3 = −1.

Adding the first equation to the second now results in

a1 − a3 = 3
a2 = 3
a3 = −1.

The solution a1 = 2, a2 = 3, a3 = −1 is now readily obtained. Thus the vector (6, 0, −1) is linearly dependent on the set A. ■
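The elimination above can be checked numerically. The following sketch, using NumPy (a modern tool, not part of the text), solves the same system; the columns of the matrix are the vectors of A.

```python
import numpy as np

# Columns are the vectors of A; solving A_mat @ a = (6, 0, -1) finds the
# coefficients a1, a2, a3 of the linear combination.
A_mat = np.array([[ 2.0,  0.0, -2.0],
                  [-1.0,  1.0,  1.0],
                  [ 1.0, -1.0,  0.0]])
b = np.array([6.0, 0.0, -1.0])
a = np.linalg.solve(A_mat, b)
print(a)  # a1 = 2, a2 = 3, a3 = -1, matching the elimination above
```

Note that np.linalg.solve applies only when the coefficient matrix is square and invertible, as it is here; the general dependence question is a consistency question and may have no solution or many.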
Another important type of dependence is given in the definition below. This time,
the phrase linearly dependent involves only a set instead of involving both a vector and
a set.
Again, the case involving a finite set of vectors is of special interest. It is readily seen that a finite set B = {v1, v2, ..., vr} is linearly dependent if and only if there are scalars b1, b2, ..., br, not all zero, such that Σ_{i=1}^{r} b_i v_i = 0.
Σ_{j=1}^{k} c_j u_j = Σ_{j=1}^{k} c_j (u_{1j}, u_{2j}, ..., u_{nj})
= (Σ_{j=1}^{k} c_j u_{1j}, Σ_{j=1}^{k} c_j u_{2j}, ..., Σ_{j=1}^{k} c_j u_{nj}).

Thus Σ_{j=1}^{k} c_j u_j = 0 if and only if Σ_{j=1}^{k} c_j u_{ij} = 0 for each i = 1, 2, ..., n. This shows that the problem of determining the conditions on the c_j is equivalent to investigating the solutions of a system of n equations in k unknowns. If Σ_{j=1}^{k} c_j u_j = 0 implies c1 = c2 = ··· = ck = 0, then {u1, u2, ..., uk} is linearly independent.
The discussion in the preceding paragraph is illustrated in the next example.
c1(1, 1, 8, 1) + c2(1, 0, 3, 0) + c3(3, 1, 14, 1) = (0, 0, 0, 0).

c1 + c2 + 3c3 = 0
c1 + c3 = 0
8c1 + 3c2 + 14c3 = 0
c1 + c3 = 0.
To solve this system, we first interchange the first two equations to place the equation c1 + c3 = 0 at the top.

c1 + c3 = 0
c1 + c2 + 3c3 = 0
8c1 + 3c2 + 14c3 = 0
c1 + c3 = 0

By adding suitable multiples of the first equation to each of the other equations, we then eliminate c1 from all but the first equation. This yields the system

c1 + c3 = 0
c2 + 2c3 = 0
3c2 + 6c3 = 0
0 = 0.

Subtracting 3 times the second equation from the third equation then gives

c1 + c3 = 0
c2 + 2c3 = 0
0 = 0
0 = 0.
It is now clear that there are many solutions to the system, and they are given by

c1 = −c3
c2 = −2c3
c3 is arbitrary.

In particular, it is not necessary that c3 be zero, so the original set of vectors is linearly dependent. ■
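The dependence found above can also be confirmed by a rank computation, a tool from later chapters sketched here in NumPy as an aside: a set of k vectors is linearly dependent exactly when the matrix having them as columns has rank less than k.

```python
import numpy as np

# The three vectors of the example as columns of a 4 x 3 matrix.
M = np.column_stack([(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)])
print(np.linalg.matrix_rank(M))  # 2, less than 3, so the set is dependent

# One explicit dependence: take c3 = 1, so c1 = -1 and c2 = -2 as above.
c = np.array([-1, -2, 1])
print(M @ c)  # the zero vector
```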
Proof. If the set A = {u1, u2, ..., uk} is linearly dependent, then there are scalars a1, a2, ..., ak such that Σ_{i=1}^{k} a_i u_i = 0 with at least one a_i, say a_j, not zero. This implies that

a_j u_j = −a1 u1 − ··· − a_{j−1} u_{j−1} − a_{j+1} u_{j+1} − ··· − ak uk

so that

u_j = (−a1/a_j) u1 + ··· + (−a_{j−1}/a_j) u_{j−1} + (−a_{j+1}/a_j) u_{j+1} + ··· + (−ak/a_j) uk.

Thus u_j can be written as a linear combination of the remaining vectors in the set.

Now assume that some u_j is a linear combination of the remaining vectors in the set, i.e.,

u_j = b1 u1 + b2 u2 + ··· + b_{j−1} u_{j−1} + b_{j+1} u_{j+1} + ··· + bk uk.

Then

b1 u1 + ··· + b_{j−1} u_{j−1} + (−1) u_j + b_{j+1} u_{j+1} + ··· + bk uk = 0,

and since the coefficient of u_j in this linear combination is not zero, the set is linearly dependent. ■
The different meanings of the word "dependent" in Definitions 1.4 and 1.5 should be noted carefully. These meanings, though different, are closely related. The preceding theorem, for example, could be restated as follows: "A set {u1, ..., uk} is linearly dependent if and only if some u_i is linearly dependent on the remaining vectors." This relation is further illustrated in some of the exercises at the end of this section.
In the last section of this chapter, the following definition and theorem are of primary
importance. Both are natural extensions of Definition 1.4.
associative and commutative properties for vector addition [Theorem 1.3, (2) and (5)] imply that

u = Σ_{i=1}^{k} c_i ( Σ_{j=1}^{m} d_{ij} w_j ) = Σ_{j=1}^{m} ( Σ_{i=1}^{k} c_i d_{ij} ) w_j.
Exercises 1.2

1. For any pair of positive integers i and j, the symbol δ_{ij} is defined by δ_{ij} = 0 if i ≠ j and δ_{ij} = 1 if i = j. This symbol is known as the Kronecker delta.
3. In each case, determine whether or not the given vector v is linearly dependent
on the given set A.
4. Assuming the properties stated in Theorem 1.3, prove the following statements.
(a) A = {(1, 0, −2), (0, 2, 1), (−1, 2, 3)}
(b) A = {(1, 4, 3), (2, 12, 6), (5, 21, 15), (0, 2, −1)}
(c) A = {(1, 2, −1), (−1, 1, 0), (1, 3, −1)}
(d) A = {(1, 0, 1, 2), (2, 1, 0, 0), (4, 5, 6, 0), (1, 1, 1, 0)}
6. Show that the given set is linearly dependent and write one of the vectors as a
linear combination of the remaining vectors.
{(1,1,0),(0,1,1),(1,0,-1),(1,0,1)}
that cannot be written as a linear combination of the other vectors in the set.
10. Prove that if the set {u1, u2, ..., uk} of vectors in R^n contains the zero vector, it is linearly dependent.

11. Prove that a set consisting of exactly one nonzero vector is linearly independent.

12. Prove that a set of two vectors in R^n is linearly dependent if and only if one of the vectors is a scalar multiple of the other.

13. Prove that a set of nonzero vectors {u1, u2, ..., uk} in R^n is linearly dependent if and only if some ur is a linear combination of the preceding vectors.

(a) Prove that the set {u1 − u2, u2 − u3, u1 + u3} is linearly independent.
(b) Prove that the set {u1 − u2, u2 − u3, u1 − u3} is linearly dependent.

16. Let ∅ denote the empty set of vectors in R^n. Determine whether or not ∅ is linearly dependent, and justify your conclusion.

17. Prove that any subset of a linearly independent set A ⊆ R^n is linearly independent.

18. Let A ⊆ R^n. Prove that if A contains a linearly dependent subset, then A is linearly dependent.
1.3 Subspaces of R^n

There are many subsets of R^n that possess the properties stated in Theorem 1.3. A study of these subsets furnishes a great deal of insight into the structure of the spaces R^n, and is of vital importance in subsequent material.
2. (u + v) + w = u + (v + w) for all u, v, w in W.
3. 0 ∈ W.
4. For each u ∈ W, −u is in W.
5. u + v = v + u for all u, v in W.
6. au ∈ W for all a ∈ R and all u ∈ W.
7. a(bu) = (ab)u for all a, b ∈ R and all u ∈ W.
8. a(u + v) = au + av for all a ∈ R and all u, v ∈ W.
9. (a + b)u = au + bu for all a, b ∈ R and all u ∈ W.
10. 1u = u for all u ∈ W.
Before considering some examples of subspaces, we observe that the list of properties in Definition 1.9 can be shortened a great deal. For example, properties (2), (5), (7), (8), (9), and (10) are valid throughout R^n, and hence are automatically satisfied in any subset of R^n. Thus a subset W of R^n is a subspace if and only if properties (1), (3), (4), and (6) hold in W. This reduces the amount of labor necessary in order to determine whether or not a given subset is a subspace, but an even more practical test is given in the following theorem.
4. The set of all vectors (x1, x2, ..., xn) in R^n that satisfy a fixed equation

5. The set of all vectors (x1, x2, ..., xn) in R^n that satisfy the system of equations
Proof for the third set. Let W be the set of all vectors that are dependent on the set A = {u1, u2, ..., uk} of vectors in R^n. From the discussion in the paragraph following Definition 1.4, we know that W is the set of all vectors that can be written in the form Σ_{i=1}^{k} a_i u_i. The set W is nonempty since

0·u1 + 0·u2 + ··· + 0·uk = 0 is in W.

Let u and v be in W, say

u = Σ_{i=1}^{k} c_i u_i and v = Σ_{i=1}^{k} d_i u_i.

Thus we have

au + bv = a ( Σ_{i=1}^{k} c_i u_i ) + b ( Σ_{i=1}^{k} d_i u_i )
= Σ_{i=1}^{k} a c_i u_i + Σ_{i=1}^{k} b d_i u_i
= Σ_{i=1}^{k} (a c_i u_i + b d_i u_i)
= Σ_{i=1}^{k} (a c_i + b d_i) u_i.

Since each a c_i + b d_i is a real number, au + bv is dependent on A and hence is in W.
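The closure computation above can be mirrored numerically. In this NumPy sketch (an illustration, not from the text), the columns of U play the role of u1, u2, u3; the identity a(Σ c_i u_i) + b(Σ d_i u_i) = Σ (a c_i + b d_i) u_i shows that the combination stays inside the span.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.integers(-5, 5, size=(4, 3)).astype(float)  # columns u1, u2, u3
c = np.array([1.0, -2.0, 3.0])   # u = sum c_i u_i
d = np.array([0.5, 4.0, -1.0])   # v = sum d_i u_i
a, b = 2.0, -3.0

u = U @ c
v = U @ d
lhs = a * u + b * v              # au + bv
rhs = U @ (a * c + b * d)        # sum (a c_i + b d_i) u_i, visibly in W
assert np.allclose(lhs, rhs)
```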
Our next theorem has a connection with the sets listed as 4 and 5 in Example 1
that should be investigated by the student. In this theorem, we are confronted with a
situation which involves a collection that is not necessarily finite. In situations such as
this, it is desirable to have available a notational convenience known as indexing.
Let L and T be nonempty sets. Suppose that with each λ ∈ L there is associated a unique element t_λ of T, and that each element of T is associated with at least one λ ∈ L. (That is, suppose that there is given a function with domain L and range T.) Then we say that the set T is indexed by the set L, and refer to L as an index set. We write {t_λ | λ ∈ L} to denote that the collection of t_λ's is indexed by L.

If {M_λ | λ ∈ L} is a collection of sets M_λ indexed by L, then ∪_{λ∈L} M_λ indicates the union of this collection of sets. Thus ∪_{λ∈L} M_λ is the set of all elements that are contained in at least one M_λ. Similarly, ∩_{λ∈L} M_λ denotes the intersection of the sets M_λ, and consists of all elements that are in every M_λ.
Theorem 1.11 The intersection of any nonempty collection of subspaces of R^n is a subspace of R^n.
Thus the operation of intersection can be used to construct new subspaces from
given subspaces.
There are all sorts of subsets in a given subspace W of R n . Some of these have the
important property of being spanning sets for W , or sets that span W . The following
definition describes this property.
Intuitively, the word span is a natural choice in Definition 1.12 because a spanning
set A reaches across (hence spans) the entire subspace when all linear combinations of
A are formed.
In calculus texts, the vectors in ℰ3 are labeled with the standard notation i, j, k.
The coefficients in this equation can be found by inspection if we start with the first
component and work from left to right. ■
We shall see in Theorem 1.15 that the concept of a spanning set is closely related to the set ⟨A⟩ defined as follows.

Definition 1.13 For any nonempty set A of vectors in R^n, ⟨A⟩ is the set of all vectors in R^n that are dependent on A. By definition, ⟨∅⟩ is the zero subspace {0}.

Thus, for A ≠ ∅, ⟨A⟩ is the set of all vectors u that can be written as u = Σ_j a_j u_j with a_j in R and u_j in A. Since any u in A is dependent on A, the subset relation A ⊆ ⟨A⟩ always holds.

In Example 1 of this section, the third set listed is ⟨A⟩ where A = {u1, u2, ..., uk} in R^n. When the notation ⟨A⟩ is combined with the set notation for this A, the result is a somewhat cumbersome notation. For this reason, we write

⟨A⟩ = ⟨(1, 3, 7), (2, 0, 6)⟩

instead of ⟨A⟩ = ⟨{(1, 3, 7), (2, 0, 6)}⟩ to indicate the set of all vectors that are dependent on A.
It is proved in Example 1 that, for a finite subset A = {u1, u2, ..., uk} of R^n, the set ⟨A⟩ is a subspace of R^n. The next theorem generalizes this result to an arbitrary subset A of R^n.

We state the relation between Definitions 1.12 and 1.13 as a theorem, even though the proof is almost trivial.

We will refer to ⟨A⟩ as the subspace spanned by A. Some of the notations used in various texts for this same subspace are
The linear combination is not as apparent with (2, 3, 2, 3), so we place unknown coefficients in the equation

a1(1, 0, 1, 0) + a2(1, 1, 1, 1) + a3(0, 1, 1, 1) = (2, 3, 2, 3).

Using the same procedure as in Example 2 of Section 1.2, we obtain the system of equations

a1 + a2 = 2
a2 + a3 = 3
a1 + a2 + a3 = 2
a2 + a3 = 3.

It is then easy to find the solution a1 = −1, a2 = 3, a3 = 0. That is,

(2, 3, 2, 3) = −1·(1, 0, 1, 0) + 3·(1, 1, 1, 1) + 0·(0, 1, 1, 1).

Thus we have A ⊆ W.
We must now decide if every vector in W is linearly dependent on A, and Theorem 1.8 is of help here. We have W dependent on the set

B = {(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1)}.

b1 + 2b3 = 0
b2 + 3b3 = 1
b1 + 2b3 = 1
b2 + 3b3 = 1.

The first and third equations contradict each other, so there is no solution. Hence W is not dependent on A, and the set A does not span W. ■
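Deciding whether a vector is dependent on a set always reduces to the solvability of a linear system. As a numerical aside (NumPy is an assumption of this illustration), a least-squares fit with a zero residual detects membership in a span; here it recovers the coefficients −1, 3, 0 for (2, 3, 2, 3) with respect to the set B above.

```python
import numpy as np

# Columns are the spanning vectors B from the example above.
B = np.column_stack([(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1)]).astype(float)
v = np.array([2.0, 3.0, 2.0, 3.0])

# v lies in <B> exactly when the least-squares residual is (numerically) zero.
coeffs, residual, rank, _ = np.linalg.lstsq(B, v, rcond=None)
print(coeffs)  # approximately [-1, 3, 0], as found by elimination
assert np.allclose(B @ coeffs, v)
```

For a vector outside the span, the same call succeeds but B @ coeffs differs from v, which is exactly the contradiction met in the b-system above.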
We saw earlier in this section that the operation of intersection can be used to
generate new subspaces from known subspaces. There is another operation, given in the
following definition, that also can be used to form new subspaces from given subspaces.
au + bv = a(u1 + u2) + b(v1 + v2)
= (au1 + bv1) + (au2 + bv2),

W1 = ⟨(1, −1, 0, 0), (0, 0, 0, 1)⟩,
W2 = ⟨(2, −2, 0, 0), (0, 0, 1, 0)⟩.

Then W1 is the set of all vectors of the form

a1(1, −1, 0, 0) + a2(0, 0, 0, 1) = (a1, −a1, 0, a2),

a1(1, −1, 0, 0) + a2(0, 0, 0, 1) + b1(2, −2, 0, 0) + b2(0, 0, 1, 0)
= (a1 + 2b1, −a1 − 2b1, b2, a2).

The last equation describes the vectors in W1 + W2, but it is not the most efficient description possible. Since a1 + 2b1 can take on any real number c1 as a value, we see that W1 + W2 is the set of all vectors of the form

(c1, −c1, c2, c3). ■
Exercises 1.3
1. Prove that each of the sets listed as 1, 2, 4, and 5 in Example 1 of this section is a subspace of R^n.
2. Explain the connection between the sets listed as 4 and 5 in Example 1 and
Theorem 1.11.
3. Let C denote the set of all real numbers λ such that 0 < λ < 1. For each λ ∈ C, let M_λ be the set of all x ∈ R such that |x| < λ. Find ∪_{λ∈C} M_λ and ∩_{λ∈C} M_λ.
4. Let V = R^3.
5. Formulate Definition 1.4 and Definition 1.5 for an indexed set A = {u_λ | λ ∈ L} of vectors in R^n.
6. For each given set A and subspace W, determine whether or not A spans W.

(a) A = {(1, 0, 2), (−1, 1, −3)}, W = ⟨(1, 0, 2)⟩
(b) A = {(1, 0, 2), (−1, 1, −3)}, W = ⟨(1, 1, 1), (2, −1, 5)⟩
(c) A = {(1, −2), (−1, 3)}, W = R^2
(d) A = {(2, 3, 0, −1), (2, 1, −1, 2)}, W = ⟨(0, −2, −1, 3), (6, 7, −1, 0)⟩
(e) A = {(3, −1, 2, 1), (4, 0, 1, 0)}, W = ⟨(3, −1, 2, 1), (4, 0, 1, 0), (0, −1, 0, 1)⟩
(f) A = {(3, −1, 2, 1), (4, 0, 1, 0)}, W = ⟨(3, −1, 2, 1), (4, 0, 1, 0), (1, 1, −1, −1)⟩
(a) u = (−4, 1, −5)
(b) u = (3, 2, −6)
(c) u = (−5, 3, −2)
(d) u = (3, 0, 0)
10. Prove or disprove: ⟨A + B⟩ = ⟨A⟩ + ⟨B⟩ for any nonempty subsets A and B of R^n.
11. Let W be a subspace of R^n. Use condition (ii) of Theorem 1.10 and mathematical induction to show that any linear combination of vectors in W is again a vector in W.
1.4 Geometric Interpretations of R^2 and R^3
12. If A ⊆ R^n, prove that ⟨A⟩ is the intersection of all of the subspaces of R^n that contain A.

14. Prove that ⟨A⟩ = ⟨B⟩ if and only if every vector in A is dependent on B and every vector in B is dependent on A.
Figure 1.1
Figure 1.2 (coordinates (x, y) when n = 2 and (x, y, z) when n = 3)
In making identifications of vectors with directed line segments, we shall follow the
convention that any line segment with the same direction and the same length as the
one we have described may be used to represent the same vector v.
20 Chapter 1 Real Coordinate Spaces
Figure 1.3
Figure 1.4
Figure 1.5
If A = {v1, v2} is independent, then v1 and v2 are not collinear. If P is any point in the plane determined by v1 and v2, then the vector OP from the origin to P is the diagonal of a parallelogram with sides parallel to v1 and v2, as shown in Figure 1.6. In this case, the subspace ⟨A⟩ consists of all vectors in the plane through the origin that contains v1 and v2.

Figure 1.6

If A = {v1, v2, v3} is linearly independent, then v1 and v2 are not collinear and v3 does not lie in the plane of v1 and v2. Vectors v1, v2, and v3 of this type are shown in Figure 1.7. An arbitrary vector OP in R^3 is the diagonal of a parallelepiped with adjacent edges a1v1, a2v2, and a3v3 as shown in Figure 1.7(a). The "heads to tails" construction along the edges of the parallelepiped indicated in Figure 1.7(b) shows that

OP = a1v1 + a2v2 + a3v3.
Figure 1.7
We shall prove in the next section that a subset of R^3 cannot contain more than three linearly independent vectors. Thus the subspaces of R^3 fall into one of four categories:

1. the origin;
2. a line through the origin;
3. a plane through the origin;
4. the entire space R^3.

It is shown in calculus courses that a plane in R^3 consists of all points with rectangular coordinates (x, y, z) that satisfy a linear equation

ax + by + cz = d

in which at least one of a, b, c is not zero. A connection is made in the next example between this fact and our classification of subspaces.
²Note that ordered triples such as (1, 2, 3) are doing double duty here. Sometimes they are coordinates of points, and sometimes they are vectors.
Figure 1.8
Since the points lie in the plane, their coordinates must satisfy the equation

ax + by + cz = d

of the plane. Substituting in order for (0, 0, 0), (1, 2, 3), and (3, 5, 1), we obtain

0 = d
a + 2b + 3c = d
3a + 5b + c = d.

Using d = 0 and subtracting 3 times the second equation from the last one leads to

a + 2b + 3c = 0
−b − 8c = 0.

Thus

a = 13c
b = −8c
c is arbitrary.

With c = 1, we have

13x − 8y + z = 0

as the equation of the plane ⟨A⟩.
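The coefficients (a, b, c) form a normal vector to the plane, so for a plane through the origin they can also be produced at once with the cross product of the two spanning vectors (the cross product appears in the exercises of this section). A NumPy sketch, offered as an aside rather than the text's method:

```python
import numpy as np

# The plane through the origin containing (1,2,3) and (3,5,1) has normal
# vector given by the cross product of the two spanning vectors.
n = np.cross([1, 2, 3], [3, 5, 1])
print(n)  # [-13  8 -1], a scalar multiple of (13, -8, 1)

# Both spanning points satisfy 13x - 8y + z = 0:
for p in [(1, 2, 3), (3, 5, 1)]:
    assert 13 * p[0] - 8 * p[1] + p[2] == 0
```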
In the remainder of our discussion, we shall need the following definition, which
applies to real coordinate spaces in general.
Definition 1.18 For any two vectors u = (u1, u2, ..., un) and v = (v1, v2, ..., vn), the inner product (dot product, or scalar product) of u and v is defined by

u · v = u1v1 + u2v2 + ··· + unvn = Σ_{k=1}^{n} u_k v_k.
The inner product defined in this way is a natural extension of the following definitions that are used in the calculus:

(x1, y1) · (x2, y2) = x1x2 + y1y2,
(x1, y1, z1) · (x2, y2, z2) = x1x2 + y1y2 + z1z2.

The distance formulas used in the calculus lead to formulas for the length ||v|| of a vector v in R^2 or R^3 as follows:

||(x, y)|| = √(x² + y²),   ||(x, y, z)|| = √(x² + y² + z²).
We extend these formulas for length to more general use in the next definition.

Definition 1.19 For any v = (v1, v2, ..., vn) in R^n, the length (or norm) of v is denoted by ||v|| and is defined by

||v|| = √(v1² + v2² + ··· + vn²).
The following properties are direct consequences of the definitions involved, and are
presented as a theorem for convenient reference.
Theorem 1.20 For any u, v, w in R^n and any a in R:

(i) u · v = v · u
(ii) (au) · v = u · (av) = a(u · v)
(iii) u · (v + w) = u · v + u · w
(iv) ||u|| = √(u · u), or u · u = ||u||²
(v) ||au|| = |a| ||u||.
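Each property of Theorem 1.20 is a short componentwise computation; as a numerical spot check (NumPy being an assumption of this illustration, with @ denoting the dot product), the identities hold on sample vectors:

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0, 0.5])
v = np.array([2.0, 0.0, -1.0, 4.0])
w = np.array([0.0, 1.0, 1.0, -3.0])
a = 2.5

assert np.isclose(u @ v, v @ u)                                  # (i) symmetry
assert np.isclose((a * u) @ v, a * (u @ v))                      # (ii) scalars move out
assert np.isclose(u @ (v + w), u @ v + u @ w)                    # (iii) distributivity
assert np.isclose(np.linalg.norm(u), np.sqrt(u @ u))             # (iv) norm from dot
assert np.isclose(np.linalg.norm(a * u), abs(a) * np.linalg.norm(u))  # (v)
```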
Figure 1.9
Theorem 1.21 For any two nonzero vectors u = (u1, u2, u3) and v = (v1, v2, v3) in R^3, u · v = ||u|| ||v|| cos θ, where θ is the angle between the directions of u and v and 0° ≤ θ ≤ 180°.

Proof. Suppose first that θ = 0° or θ = 180°. Then v = cu, where the scalar c is positive if θ = 0° and negative if θ = 180°. We have

u · v = u · (cu) = c(u · u) = c ||u||²

and

||u|| ||v|| cos θ = ||u|| (|c| ||u||) cos θ = c ||u||²,

since |c| cos θ = c in either case. Thus the theorem is true for θ = 0° or θ = 180°.
Suppose now that 0° < θ < 180°. If u − v is drawn from the head of v to the head of u, the vectors u, v, and u − v form a triangle with u − v as the side opposite θ. (See Figure 1.10.)

Figure 1.10 (the triangle formed by u, v, and u − v = (u1 − v1, u2 − v2, u3 − v3))
Thus, by the law of cosines, ||u − v||² = ||u||² + ||v||² − 2 ||u|| ||v|| cos θ, and therefore

||u|| ||v|| cos θ = (1/2)(||u||² + ||v||² − ||u − v||²)
= (1/2)[u1² + u2² + u3² + v1² + v2² + v3²
− ((u1 − v1)² + (u2 − v2)² + (u3 − v3)²)]
= u1v1 + u2v2 + u3v3
= u · v. ■
Corollary 1.22 In R^2 or R^3, two nonzero vectors u and v are perpendicular (or orthogonal) if and only if u · v = 0.

Proof. This follows at once from the fact that u · v = 0 if and only if cos θ = 0. ■
Figure 1.11
Let θ (0° ≤ θ ≤ 180°) denote the angle between the directions of u and v as labeled in Figure 1.11. The number

d = ||v|| cos θ = (u · v)/||u||

is called the scalar projection of v onto u or the scalar component of v along u. From Figure 1.11, it is clear that d is the length of Proj_u v if 0° ≤ θ ≤ 90° and d is the negative of the length of Proj_u v if θ > 90°. Thus d can be regarded as the directed length of Proj_u v.
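The scalar projection d = (u · v)/||u|| and the vector Proj_u v = ((u · v)/(u · u)) u are both one-line computations. A small Python sketch (the function names are illustrative, not from the text):

```python
import numpy as np

def scalar_projection(v, u):
    """Directed length of the projection of v onto u: (u . v) / ||u||."""
    return (u @ v) / np.linalg.norm(u)

def vector_projection(v, u):
    """Proj_u v: the multiple ((u . v)/(u . u)) u of u."""
    return (u @ v) / (u @ u) * u

u = np.array([3.0, 0.0])
v = np.array([3.0, 4.0])
print(scalar_projection(v, u))   # 3.0, the component of v along the x-axis
print(vector_projection(v, u))   # [3. 0.]
```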
The geometry involved in having line segments perpendicular to each other breaks down in R^n if n > 3. Even so, we extend the use of the word orthogonal to all R^n. Two vectors u and v in R^n are said to be orthogonal if u · v = 0.
Exercises 1.4
1. Use Figures 1.2 and 1.3 as patterns and illustrate the parallelogram rule with the vectors u = (1, 6), v = (4, −4), and u + v in an xy-coordinate system.

2. Use Figures 1.2 and 1.4 as patterns and sketch the vectors u = (5, 6), v = (2, −3), and u − v in an xy-coordinate system.

3. For each λ ∈ R, let M_λ be the set of all points in the plane with rectangular coordinates (x, y) that satisfy y = λx. Find ∩_{λ∈R} M_λ and ∪_{λ∈R} M_λ.
4. Find the equation of the plane ⟨A⟩ for the given set A.

(a) A = {(1, 0, 2), (2, −1, 1)}
(b) A = {(1, 0, 2), (2, 1, 5)}
(a) (3, −4, −12)
(b) (2, 3, 6)
(c) (1, −2, 4, 2)
(d) (2, 6, 0, −3)
(e) (1, −2, −4, 3)
(f) (3, 0, −5, 8)

6. Determine x so that (x, 2) is perpendicular to (−3, 9).

7. A vector of length 1 is called a unit vector.

(a) Find a unit vector that has the same direction as (3, −4, 12).
(b) Find a vector in the direction of u = (2, −3, 6) that has length 4 units.

9. Find the length of the projection of the vector (3, 4) onto a vector contained in the line x − 2y = 0.

10. Use projections to write the vector (19, 22) as a linear combination of (3, 4) and (4, −3). (Note that (3, 4) and (4, −3) are perpendicular.)
15. Let u and v be vectors in R^n. Prove that ||u|| = ||v|| if and only if u + v and u − v are orthogonal.
u × v = | u2 u3 | e1 + | u3 u1 | e2 + | u1 u2 | e3,
        | v2 v3 |      | v3 v1 |      | v1 v2 |

        | e1 e2 e3 |
u × v = | u1 u2 u3 |
        | v1 v2 v3 |
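Expanding those 2 × 2 determinants gives the componentwise formula for the cross product, which can be checked against NumPy's built-in routine (the comparison with np.cross is an aside, not part of the text):

```python
import numpy as np

def cross(u, v):
    """Componentwise cross product from the 2x2 determinant expansion."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return np.array([u2 * v3 - u3 * v2,
                     u3 * v1 - u1 * v3,
                     u1 * v2 - u2 * v1])

u, v = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
assert np.array_equal(cross(u, v), np.cross(u, v))
# u x v is orthogonal to both u and v:
assert cross(u, v) @ np.array(u) == 0
assert cross(u, v) @ np.array(v) == 0
```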
Definition 1.23 A set B of vectors is a basis of the subspace W if (i) B spans W and
(ii) B is linearly independent.
The empty set ∅ is regarded as being linearly independent since the condition for linear dependence in Definition 1.5 cannot be satisfied. Thus ∅ is a basis of the zero subspace of R^n.
Example 1 □ Some of our earlier work helps in providing examples concerning bases. We saw in Example 2 of Section 1.3 that each of the sets

c1 = 0
c1 + c2 = 0
c1 + c2 + c3 = 0

A = {(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)}

is linearly dependent. It follows that this set A is not a basis for the subspace that it spans. ■
We are concerned in much of our future work with indexed sets of vectors, and we use a restricted type of equality for this type of set. Two indexed sets A and B are equal if and only if they are indexed A = {u_λ | λ ∈ L} and B = {v_λ | λ ∈ L} by the same index set L such that u_λ = v_λ for each λ ∈ L. In particular, two finite sets A = {u1, u2, ..., uk} and B = {v1, v2, ..., vk} are equal if and only if they consist of the same vectors in the same order.

The equality described in the preceding paragraph is the one we shall use in the remainder of this book. For finite sets A = {u1, u2, ..., uk} of vectors, this equality is actually an equality of ordered sets. For example, if u1 ≠ u2, then

{u1, u2, ..., uk} ≠ {u2, u1, ..., uk}.
When we write

A = {u1, u2, ..., uk},

this notation is meant to imply that A is an ordered set with u1 as the first vector, u2 as the second vector, and so on. Moreover, we make a notational agreement for the remainder of this book that when we list the vectors in a set, this listing from left to right specifies their order. For instance, if we write

A = {(5, −1, 0, 2), (−4, 0, 3, 7), (1, −1, 3, 9)}

this means that (5, −1, 0, 2) is the first vector in A, (−4, 0, 3, 7) is the second vector in A, and (1, −1, 3, 9) is the third vector in A. That is, the vectors in A are automatically indexed with positive integers 1, 2, 3, ... from left to right without this being stated.
Suppose now that B = {v1, v2, ..., vk} is a basis of the subspace W. Then B spans W, so that any v ∈ W can be written as v = Σ_{i=1}^{k} ai vi. As a matter of fact, this expression is unique. For if v = Σ_{i=1}^{k} bi vi as well, we have

Σ_{i=1}^{k} ai vi = Σ_{i=1}^{k} bi vi

and therefore

Σ_{i=1}^{k} (ai - bi) vi = 0.

Since B is linearly independent, this requires ai - bi = 0, that is, ai = bi, for each i.
Theorem 1.24 Let W be a subspace of R^n. Suppose that a finite set A = {u1, u2, ..., ur} spans W, and let B be a linearly independent set of vectors in W. Then B contains at most r vectors.
1.5 Bases and Dimension 31
v1 = a11 u1 + a21 u2 + ··· + ar1 ur,

where a11 ≠ 0. (This assumption is purely for notational convenience. We are assuming that the "suitably chosen" vector in A is the first vector listed in A.) The equation for v1 implies that

a11 u1 = v1 - a21 u2 - ··· - ar1 ur

and therefore

u1 = (1/a11) v1 - (a21/a11) u2 - ··· - (ar1/a11) ur.

Thus u1 is dependent on {v1, u2, ..., ur}, and this clearly implies that A is dependent on {v1, u2, ..., ur}.
Assume now that A is dependent on {v1, v2, ..., vk, u_{k+1}, ..., ur}, where 1 ≤ k < r. Since W is dependent on A, then W is dependent on {v1, v2, ..., vk, u_{k+1}, ..., ur} by Theorem 1.8. In particular,

v_{k+1} = Σ_{i=1}^{k} b_{i,k+1} vi + Σ_{i=k+1}^{r} a_{i,k+1} ui.

At least one of the coefficients a_{i,k+1} of the ui must be nonzero. For if they were all zero, then v_{k+1} would be a linear combination of v1, v2, ..., vk, and this would contradict the linear independence of B. Without loss of generality, we may assume that a_{k+1,k+1} ≠ 0. Hence we obtain

u_{k+1} = Σ_{i=1}^{k} (-b_{i,k+1}/a_{k+1,k+1}) vi + (1/a_{k+1,k+1}) v_{k+1} + Σ_{i=k+2}^{r} (-a_{i,k+1}/a_{k+1,k+1}) ui.
Thus

{v1, v2, ..., vk, u_{k+1}, ..., ur}

is dependent on

{v1, v2, ..., vk, v_{k+1}, u_{k+2}, ..., ur}.

Since A is dependent on {v1, v2, ..., vk, u_{k+1}, ..., ur}, Theorem 1.8 implies that A is dependent on {v1, v2, ..., vk, v_{k+1}, u_{k+2}, ..., ur}.
Letting k = 1, 2, ..., r - 1 in the iterative argument above, we see that each vi in B can be used to replace a suitably chosen vector in A until we obtain the fact that A is dependent on {v1, v2, ..., vr}. But B is dependent on A, so we have B dependent on {v1, v2, ..., vr}. In particular, if B had more than r elements, any v_{r+1} in B would be dependent on {v1, v2, ..., vr}. But this is impossible since B is independent. Therefore, B has at most r elements, and this completes the proof. ■
Corollary 1.25 Any linearly independent set of vectors in R^n contains at most n vectors.

Proof. The set of n vectors e1 = (1, 0, ..., 0), e2 = (0, 1, ..., 0), ..., en = (0, 0, ..., 1) spans R^n since v = (v1, v2, ..., vn) can be written as v = Σ_{i=1}^{n} vi ei. The corollary follows at once from the theorem. ■
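Corollary 1.25 can be illustrated numerically. The sketch below is our own (the `rank` helper is a small Gauss-Jordan elimination in exact rational arithmetic, not a routine from the text); it shows that four vectors in R^3 have rank at most 3 and therefore cannot be independent:

```python
from fractions import Fraction

def rank(rows):
    # rank via Gauss-Jordan elimination with exact rational arithmetic
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(rank(vectors))  # 3: fewer than 4 vectors, so the set is dependent
```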
If we think in terms of geometric models as presented in Section 1.4, the next theorem seems intuitively obvious. It certainly seems obvious in R^2 and R^3, and there is no reason to suspect the situation to be different in R^n for other values of n. On the other hand, there is no compelling reason to suspect that the situation would not be different in R^n for other values of n. At any rate, we refuse to accept such an important statement on faith or intuition, and insist that this result be validated by a logical argument based upon our development up to this point. This attitude or frame of mind is precisely what is meant when one refers to the "axiomatic method" of mathematics.
Theorem 1.26 Every subspace of R^n has a basis with a finite number of elements.
Theorem 1.27 Let W be a subspace of R^n, and let A and B be any two bases for W. Then A and B have the same number of elements.

Proof. If W = {0}, then each of A and B must be the empty set ∅, and the number of elements in both A and B is 0.

Suppose W ≠ {0}. From Corollary 1.25, A and B are both finite. Let A = {u1, u2, ..., ur} and B = {v1, v2, ..., vt}. Since A spans W and B is linearly independent, t ≤ r by Theorem 1.24. But B spans W and A is linearly independent, so r ≤ t by the same theorem. Thus, t = r. ■
The following theorem is somewhat trivial, but it serves to confirm that the preceding
definition of dimension is consistent with our prior experience.
Theorem 1.29 The space R^n has dimension n.

Proof. Consider the set En = {e1, e2, ..., en}, where e1 = (1, 0, ..., 0), e2 = (0, 1, ..., 0), ..., en = (0, 0, 0, ..., 1) are the same as in the proof of Corollary 1.25. It was noted in that proof that an arbitrary vector v = (v1, v2, ..., vn) can be written as

v = Σ_{i=1}^{n} vi ei,

and therefore En spans R^n.

The set En is linearly independent since

Σ_{i=1}^{n} ci ei = 0 implies (c1, c2, ..., cn) = (0, 0, ..., 0)

and therefore all ci = 0. Thus En is a basis of R^n with n elements, and it follows that R^n has dimension n. ■
Definition 1.30 The set En = {e1, e2, ..., en} used in the proof of Theorem 1.29 is called the standard basis of R^n.

The discussion of coefficients just before Theorem 1.24 explains why the coefficients c1, c2, ..., cn in v = Σ_{i=1}^{n} ci vi are unique whenever B = {v1, v2, ..., vn} is a basis of R^n. The scalars ci are called the coordinates of v relative to B. For the special basis En = {e1, e2, ..., en}, the components of v are the same as the coordinates relative to En.
There are several types of problems involving "basis" and "dimension" that occur
often in linear algebra. In dealing with a certain subspace W , it may be necessary to
find the dimension of W , to find a basis of W , or to determine whether or not a given
set is a basis of W . Frequently it is desirable to find a basis of W that has certain
specified properties. The fundamental techniques for attacking problems such as these
are developed in the remainder of this section.
Although the details of the work would vary, the procedure given in the proof above
provides a method for "refining" a basis from a given spanning set. This refinement
procedure is demonstrated in the next example.
Example 3 □ With

A = {(1, 2, 1, 0), (3, -4, 5, 6), (2, -1, 3, 3), (-2, 6, -4, -6)},

we shall use the procedure in the proof of Theorem 1.31 to find a basis of W = ⟨A⟩ that is contained in A.

It is natural to start the procedure by choosing v1 = (1, 2, 1, 0). We see that A is not dependent on {v1} because the second vector in A, (3, -4, 5, 6), is not a multiple of v1. If we let v2 = (3, -4, 5, 6), then {v1, v2} is linearly independent.
We need to check now to see if A is dependent on {v1, v2}. When we set up the equation

c1 (1, 2, 1, 0) + c2 (3, -4, 5, 6) = (2, -1, 3, 3),

this leads to the system of equations

c1 + 3c2 = 2
2c1 - 4c2 = -1
c1 + 5c2 = 3
6c2 = 3.

The solution to this system is easily found to be c1 = c2 = 1/2. Thus the third vector in A is dependent on {v1, v2}. In similar fashion, we find that

(1)(1, 2, 1, 0) + (-1)(3, -4, 5, 6) = (-2, 6, -4, -6).
Thus the fourth vector in A is also dependent on {v1, v2}, and

{(1, 2, 1, 0), (3, -4, 5, 6)}

is a basis of W that is contained in A. ■
In the proof of Theorem 1.31, we have seen how a basis of a subspace W can be
refined or extracted from an arbitrary spanning set. The spanning set is not required
to be a finite set, but it could happen to be finite, of course. If a spanning set is
finite, the natural refining procedure demonstrated in Example 3 can be given a simpler
description: A basis of W can be obtained by deleting all vectors in the spanning set
that are linear combinations of the preceding vectors. Problem 13 of Exercises 1.2
assures us this will lead to an independent set, and Theorem 1.8 assures us this will
lead to a spanning set. Thus a basis will result from the deletion of all vectors in a finite
spanning set that are linear combinations of preceding vectors as listed in the spanning
set.
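The refining procedure just described can be sketched in Python (our own code; `rank` is a small exact Gaussian elimination). Applied to the set A of Example 3 it keeps exactly the two vectors found there:

```python
from fractions import Fraction

def rank(rows):
    # rank via Gauss-Jordan elimination with exact rational arithmetic
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def refine_basis(spanning):
    # delete every vector that is a linear combination of the preceding ones
    basis = []
    for v in spanning:
        if rank(basis + [v]) > rank(basis):
            basis.append(v)
    return basis

A = [(1, 2, 1, 0), (3, -4, 5, 6), (2, -1, 3, 3), (-2, 6, -4, -6)]
print(refine_basis(A))  # [(1, 2, 1, 0), (3, -4, 5, 6)]
```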
Our next theorem looks at a procedure that in a sense is opposite to refining: It
considers extending a linearly independent set to a basis.
Suppose that some uk ∉ ⟨B⟩. Let k1 be the smallest integer such that u_{k1} ∉ ⟨B⟩. Then B1 = {v1, v2, ..., vt, u_{k1}} is linearly independent. If each uk ∈ ⟨B1⟩, then B1 spans W and forms a basis. If some uk ∉ ⟨B1⟩, we repeat the process. After p steps (1 ≤ p ≤ r), we arrive at a set Bp = {v1, v2, ..., vt, u_{k1}, u_{k2}, ..., u_{kp}} such that all vectors of A are dependent on Bp. Thus Bp spans W. Since no vector in Bp is a linear combination of the preceding vectors, Bp is linearly independent by Problem 13 of Exercises 1.2. Therefore Bp is a basis of W. ■
In the next example, we follow the preceding proof to extend a linearly independent
set to a basis.
and
v1 = (1, 0, 1, 0), v2 = (0, 2, 0, 3).
Following the proof of the theorem, we find that

(1, 0, 0, 0) = c1 (1, 0, 1, 0) + c2 (0, 2, 0, 3)

has no solution. Thus u2 = (1, 0, 0, 0) is not in ⟨B⟩, and k1 = 2 is the smallest integer such that u_{k1} ∉ ⟨B⟩. Using the notation in the proof of the theorem, the set

B1 = {v1, v2, u2} = {(1, 0, 1, 0), (0, 2, 0, 3), (1, 0, 0, 0)}

is linearly independent. We check now for a vector u_{k2} in A that is not in ⟨B1⟩. We find that

(0, 0, 1, 0) = (1)(1, 0, 1, 0) + (0)(0, 2, 0, 3) + (-1)(1, 0, 0, 0)

and u3 = (0, 0, 1, 0) is in ⟨B1⟩, but the equation

(0, 1, 0, 1) = c1 (1, 0, 1, 0) + c2 (0, 2, 0, 3) + c3 (1, 0, 0, 0)

has no solution. Thus we obtain the linearly independent set

B2 = {v1, v2, u2, u4}

such that all vectors in A are dependent on B2. According to the proof of Theorem 1.32, this set B2 is a basis of R^4. ■
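The extension procedure of Theorem 1.32 can be sketched in Python (the function and helper names are ours; the candidate vectors below are those visible in the example, since the full spanning set A is not shown in this excerpt):

```python
from fractions import Fraction

def rank(rows):
    # rank via Gauss-Jordan elimination with exact rational arithmetic
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(independent, candidates):
    # append each candidate that is not dependent on the vectors chosen so far
    basis = list(independent)
    for u in candidates:
        if rank(basis + [u]) > rank(basis):
            basis.append(u)
    return basis

B = [(1, 0, 1, 0), (0, 2, 0, 3)]
candidates = [(1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1)]
B2 = extend_to_basis(B, candidates)
print(len(B2))  # 4, so B has been extended to a basis of R^4
```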
Our last two theorems in this section apply to the very special situations where the
number of vectors in a set is the same as the dimension r of the subspace involved. For
sets of this special type, only one of the conditions for a basis needs to be checked. This
is the substance of the following two theorems.
Exercises 1.5
1. Given that each set A below spans R^3, find a basis of R^3 that is contained in A. (Hint: Follow the proof of Theorem 1.31.)
2. Given that each set A is a basis of R^4 and that each B is linearly independent, follow the proof of Theorem 1.32 to extend B to a basis of R^4.
(a) A = {(1, -2, 3), (0, 1, -2), (1, -1, 2)}
(b) A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}
(c) A = {(2, 0, 0), (4, 1, 0), (3, 3, 1)}
(d) A = {(2, -1, 1), (0, 1, -1), (-2, 1, 0)}
4. Show that each of the sets A in Problem 3 is a basis of R^3 by using Theorem 1.34.
5. By direct use of the definition of a basis, show that each of the sets A in Problem 3 is a basis of R^3.
6. Which of the following sets of vectors in R^3 are linearly dependent?
(a) {(1, 3, 1), (1, 3, 0)}
(b) {(1, -1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1)}
(c) {(1, 1, 0), (0, 1, 1), (1, 2, 1), (1, 0, -1)}
(d) {(1, 0, 1), (0, 1, 1), (2, 1, 3)}
(e) {(1, 0, 0), (1, 1, 0), (1, 1, 1)}
(f) {(1, 1, 0), (0, 1, -1), (1, 0, 0)}
(a) {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)}
(b) {(1, 0, 0), (0, 1, 1)}
(c) {(1, 0, 0), (1, 0, 1), (1, 1, 1)}
(d) {(1, 0, 0), (0, 1, 0), (1, 1, 0)}
Elementary Operations on
Vectors
2.1 Introduction
The elementary operations are as fundamental in linear algebra as the operations of differentiation and integration are in the calculus. These elementary operations are indispensable both in the development of the theory of linear algebra and in the applications of this theory.
In many treatments of linear algebra, the elementary operations are introduced after
the development of a certain amount of matrix theory, and the matrix theory is used as
a tool in establishing the properties of the elementary operations. In the presentation
here, this procedure is reversed somewhat. The elementary operations are introduced
as operations on sets of vectors and many of the results in matrix theory are developed
with the aid of our knowledge of elementary operations. This approach has two main
advantages. The material in Chapter 1 can be used to efficiently develop several of the
properties of elementary operations, and the statements of many of these properties are
simpler when formulated in vector terminology.
δij = { 1 if i = j
      { 0 if i ≠ j
(ii) If a type II operation is used to obtain A' from A, then A' has the form

A' = {v1, ..., v_{s-1}, vs + b vt, v_{s+1}, ..., vk},
We see, then, that once an elementary operation is applied to a set A to obtain a set A', we need only apply another elementary operation of the same type to A' in order to obtain A.

It is clear from our discussion above that the inverse of an elementary operation E is unique, and is of the same type as E.
Proof. Suppose that A' is obtained from A by a sequence E1, E2, ..., Et of elementary operations. That is, the operations E1, E2, ..., Et are applied successively, obtaining a new set Ai each time an Ei is applied, until we obtain At = A'. Now consider the sequence Et^{-1}, E_{t-1}^{-1}, ..., E2^{-1}, E1^{-1} applied to A'. Applying Et^{-1} to A' = At, one obtains A_{t-1} since Et yields At when applied to A_{t-1}. Then applying E_{t-1}^{-1} to A_{t-1}, one obtains A_{t-2}. Continuing in this manner, we obtain, successively, the sets A_{t-1}, A_{t-2}, ..., A1, A. Thus A is obtained from A' by a sequence of elementary operations. ■
An illustration of this theorem and its proof is provided in the next example.
by a sequence E1, E2, E3, E4 of elementary operations that can be described as follows:
Utilizing the general discussion preceding Definition 2.1, we formulate the inverse
elementary operations as follows.
and

E1^{-1} applied to A1 yields A.
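This inverse relationship can be made concrete with a small Python sketch (the function name is ours). Applying the operation v2 := v2 + 2·v1 and then the operation v2 := v2 + (-2)·v1 returns the original set; the data are those of Problems 2 and 3 of Exercises 2.2 below:

```python
def add_multiple(vectors, s, t, b):
    # replace v_s by v_s + b*v_t (s != t): one elementary operation
    new = list(vectors)
    new[s] = tuple(x + b * y for x, y in zip(new[s], new[t]))
    return new

A = [(1, 0, 2, 1), (-2, 3, -4, 5), (3, 6, 4, 3)]
A1 = add_multiple(A, 1, 0, 2)      # v2 := v2 + 2*v1
print(A1[1])                       # (0, 3, 0, 7)

back = add_multiple(A1, 1, 0, -2)  # the inverse operation
print(back == A)                   # True
```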
Exercises 2.2
1. Write out the elements of the standard basis of R^5.
2. Find an elementary operation that yields
{(1, 0, 2, 1), (0, 3, 0, 7), (3, 6, 4, 3)}
when applied to {(1, 0, 2, 1), (-2, 3, -4, 5), (3, 6, 4, 3)}.
3. Find an elementary operation that yields
{(1, 0, 2, 1), (-2, 3, -4, 5), (3, 6, 4, 3)}
when applied to {(1, 0, 2, 1), (0, 3, 0, 7), (3, 6, 4, 3)}.
4. Show that the set
{(2, 3, 0, -1), (2, 1, -1, 2)}
can be obtained from the set {(0, -2, -1, 3), (6, 7, -1, 0)} by a sequence of elementary operations.
5. Show that the set
{(0, -2, -1, 3), (6, 7, -1, 0)}
can be obtained from the set {(2, 3, 0, -1), (2, 1, -1, 2)} by a sequence of elementary operations.
6. Assume that the set A' = {v1', v2', v3'} is obtained from the set A = {v1, v2, v3} by the sequence E1, E2, E3 defined as follows.

v1' = 2v1
v2' = 2v2 + 3v3
v3' = v3 + v1.
8. Let A = {v1, v2, v3, v4}, and let A' = {v1', v2', v3', v4'} be sets of vectors in R^3 such that

v1' = v1
v2' = v1 + v2
v3' = v2 + v3
v4' = v3 + v4.

Write out a sequence of elementary operations that yields A' when applied to A.
10. With the sets A and A' as given in Problem 8, write out a sequence of elementary operations that yields A when applied to A'.
11. With the sets A and A' as given in Problem 9, write out a sequence of elementary operations that yields A when applied to A'.
12. Show that the sequence of elementary operations used to obtain A4 from A in Example 2 is not unique by exhibiting a different sequence of elementary operations that yields A4 when applied to A.
13. Show that the identity operation on a set with more than 1 element is an elemen
tary operation of type II.
Theorem 2.3 Suppose that A and A' are sets of vectors in R^n such that A' is obtained from A by applying a single elementary operation. Then A' is linearly independent if and only if A is linearly independent.
2.3 Elementary Operations and Linear Independence 47
Proof. Suppose first that A = {v1, v2, ..., vk} is linearly independent.

If A' is obtained by a type II elementary operation, then

A' = {v1, ..., v_{s-1}, vs + b vt, v_{s+1}, ..., vk},

where s ≠ t. If

Σ_{i≠s} ci vi + cs (vs + b vt) = 0,

then

Σ_{i≠s,t} ci vi + cs vs + (ct + cs b) vt = 0,
At this point, we have developed a somewhat crude method for investigating the linear dependence of a given set A of vectors in R^n. If, by application of a sequence of elementary operations to A, it is possible to obtain a set that contains the zero vector (or any set that is clearly dependent), then the given set is linearly dependent. By the same token, if a set can be obtained that is clearly independent, then A is linearly independent. This method is refined to a systematic procedure later in this chapter.
We conclude this section with a final corollary to Theorem 2.3.

Corollary 2.5 A set of vectors resulting from applying a sequence of elementary operations to a basis of R^n is again a basis of R^n.

Proof. Let A be a basis of R^n. According to Corollary 2.4, any set A' obtained from A by a sequence of elementary operations is a linearly independent set of n vectors, and hence is a basis by Theorem 1.33. ■
Exercises 2.3
{(1, 1, 0), (0, 1, 1), (1, 0, -1), (1, 0, 1)}
is linearly dependent.
3. Show that the set A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} is linearly independent by obtaining A from the standard basis of R^3 by a sequence of elementary operations.
4. Show that the set A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} is linearly independent by obtaining the standard basis of R^3 from A by a sequence of elementary operations.
5. Use elementary operations to determine whether or not the given set is linearly
independent.
(a) {(1, 0, 2), (2, -1, 1), (1, 1, -1)}
(b) {(1, 1, -1), (2, -1, 1), (2, 1, 1)}
(c) {(1, 1, 8, -1), (1, 0, 3, 0), (3, 2, 19, -2)}
(d) {(1, -1, -2, -4), (1, 1, 8, 4), (3, -1, 4, -4)}
(e) {(1, 0, 1, 0), (0, 1, 0, 1), (4, 3, 2, 3), (1, 0, 0, 0)}
(f) {(1, 0, 1, 0), (2, 1, 4, 3), (1, 2, 5, -2), (-1, 3, 5, 4)}
E1 : Replace the second vector by the sum of the second vector and (-2) times the first vector.
E2 : Replace the third vector by the sum of the third vector and (-3) times the second vector.
E3 : Multiply the third vector by 1/2.
According to Definition 2.6,

E2E1(A) = E2(E1(A))
        = E2({(1, 0, 2), (0, 1, 2), (0, 3, 8)})
        = {(1, 0, 2), (0, 1, 2), (0, 0, 2)}

and

E3E2E1(A) = E3(E2E1(A))
          = E3({(1, 0, 2), (0, 1, 2), (0, 0, 2)})
          = {(1, 0, 2), (0, 1, 2), (0, 0, 1)}. ■
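The computation of E3E2E1(A) can be replayed in Python (our own function names; the starting set A = {(1, 0, 2), (2, 1, 6), (0, 3, 8)} is inferred from the displayed value of E1(A), since A itself is not shown in this excerpt):

```python
from fractions import Fraction

def add_multiple(vectors, s, t, b):
    # replace v_s by v_s + b*v_t
    new = list(vectors)
    new[s] = tuple(x + b * y for x, y in zip(new[s], new[t]))
    return new

def scale_vector(vectors, s, c):
    # replace v_s by c*v_s, c != 0
    new = list(vectors)
    new[s] = tuple(c * x for x in new[s])
    return new

A = [(1, 0, 2), (2, 1, 6), (0, 3, 8)]     # assumed starting set
A1 = add_multiple(A, 1, 0, -2)            # E1
A2 = add_multiple(A1, 2, 1, -3)           # E2
A3 = scale_vector(A2, 2, Fraction(1, 2))  # E3
print(A3 == [(1, 0, 2), (0, 1, 2), (0, 0, 1)])  # True
```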
The next theorem in our development is fairly obvious, but it is important enough
to be designated as a theorem. A restricted form of the converse is contained in the last
theorem of this section, but the proof of that theorem must wait until some intermediate
results are established.
Combining this result and Corollary 2.4, we obtain an important corollary concerning
bases of a subspace.
Corollary 2.8 A set of vectors resulting from the application of a sequence of elementary operations to a basis of a subspace W is again a basis of W.
2.4 Standard Bases for Subspaces 51
The next theorem is our first step in standardizing the bases of subspaces. Example
2 appears just after the end of the proof of this theorem, and the work in that example
illustrates the steps described in the proof. If the steps in the proof and the steps in
the example are traced together, this should make each of them easier to follow.
Theorem 2.9 Let A = {v1, v2, ..., vm} be a set of m vectors in R^n that spans the subspace W = ⟨A⟩ of dimension r, where m ≥ r > 0. Then a set A' = {v1', v2', ..., vr', 0, ..., 0} of m vectors can be obtained from A by a finite sequence of elementary operations so that {v1', v2', ..., vr'} has the following properties:

1. The first nonzero component from the left in vj' is a 1 in the kj component for j = 1, 2, ..., r. (This 1 is called a leading one.)
2. k1 < k2 < ··· < kr. (In vectors listed later in A', the leading ones occur in positions that are farther to the right.)
3. vj' is the only vector in A' with a nonzero kj component.
4. {v1', v2', ..., vr'} is a basis of W.
Proof. By Theorem 1.31, A contains a basis of ⟨A⟩. Thus, there is at least one vector in A that is not zero. Let k1 be the smallest positive integer for which some vi has nonzero k1 component. By no more than one interchange of vectors, a spanning set for W can be obtained in which the first vector has a nonzero k1 component. Multiplication of this vector by the reciprocal of its k1 component yields a spanning set of W in which the k1 component of the first vector is 1. Then each of the other vectors can be replaced by the sum of that vector and a suitable multiple of the new first vector to obtain a spanning set

A1 = {v1^{(1)}, v2^{(1)}, ..., vm^{(1)}}

of W in which

(i) the first nonzero component in v1^{(1)} is a 1 in the k1 component, and
(ii) v1^{(1)} is the only vector in A1 with a nonzero number in any of the first k1 positions from the left.
(i) The first nonzero component in v1^{(2)} is a 1 in the k1 component, and the first nonzero component in v2^{(2)} is a 1 in the k2 component,
(ii) k1 < k2,
(iii) v1^{(2)} is the only vector in A2 with a nonzero k1 component, and v2^{(2)} is the only vector in A2 with a nonzero k2 component.

That is, the first two vectors in the set A2 have the first three properties required in the statement of the theorem.
Suppose that a set Ai that spans W has been obtained in which the first i vectors (i < r) satisfy the first three properties listed in the theorem. Then k_{i+1} is chosen to be the least positive integer for which one of the last m - i vectors in the set has a nonzero k_{i+1} component. Such a vector exists, for Ai must contain at least r nonzero vectors. The procedure described to obtain A1 and A2 can then be repeated to obtain the set A_{i+1} that spans W in which the first i + 1 vectors satisfy the first three conditions.

It is clear, then, that a finite sequence of elementary operations can be applied to A to obtain a set Ar = {v1^{(r)}, v2^{(r)}, ..., vm^{(r)}} that spans W and in which the first r vectors satisfy conditions (1), (2), (3). Now assume that there exists a vj^{(r)}, j > r, with nonzero k_{r+1} component. Then it must be that k_{r+1} > kr, and from this it is easily seen that {v1^{(r)}, v2^{(r)}, ..., vr^{(r)}, vj^{(r)}} is linearly independent, contradicting the fact that r is the dimension of W. Thus the remaining m - r vectors in Ar are zero and Ar = A', where A' satisfies the conditions of the theorem. ■
The proof given for Theorem 2.9 is a constructive one in that it describes a method
of obtaining the set A' from a given set A. This is illustrated in the following example.
in which the first vector has a nonzero k1 component. Multiplication of this vector by 1/2 yields the spanning set in which the k1 component of the first vector is 1. Each of the other vectors is now replaced by the sum of that vector and a suitable multiple of the new first vector as follows.

Replace the second vector by the sum of the second vector and (-3) times the first vector.
A1 = {v1^{(1)}, v2^{(1)}, v3^{(1)}, v4^{(1)}}

in which

(ii) v1^{(1)} = (0, 1, -2, 1, 0) is the only vector in A1 with a nonzero component in either of the first two positions.

Now k2 = 4 is the least positive integer for which some vi^{(1)}, i ≠ 1, in A1 has a nonzero k2 component, and we have v2^{(1)} = (0, 0, 0, -2, -2) with a nonzero k2 component. Multiplication of v2^{(1)} by -1/2 yields
Each of the vectors other than the second is now replaced by the sum of that vector
and a suitable multiple of the second vector as follows.
Replace the first vector by the sum of the first vector and (—1) times the second
vector.
Replace the third vector by the sum of the third vector and the second vector.
Replace the fourth vector by the sum of the fourth vector and (—3) times the
second vector.
This yields

A2 = {v1^{(2)}, v2^{(2)}, v3^{(2)}, v4^{(2)}}
   = {(0, 1, -2, 0, -1), (0, 0, 0, 1, 1), (0, 0, 0, 0, 0), (0, 0, 0, 0, 0)}.

It is evident at this point that A2 = A'. Simultaneously with finding A', we have found that W has dimension r = 2. ■
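The reduction carried out in Example 2 is, in matrix language, Gauss-Jordan elimination on the rows. The sketch below is our own code (exact arithmetic via `Fraction`; the final `int` conversion is applied only to entries that are whole numbers, as they all are here) and reproduces the standard basis A':

```python
from fractions import Fraction

def standard_basis(vectors):
    # Gauss-Jordan reduction; the nonzero rows satisfy the four
    # conditions of Theorem 2.9 for the span of the input vectors
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        r += 1
    return [tuple(int(x) if x.denominator == 1 else x for x in row)
            for row in m[:r]]

# the spanning set of Example 2, read off the first array in Example 3
A = [(0, 2, -4, 2, 0), (0, 3, -6, 1, -2), (0, 0, 0, -1, -1), (0, -1, 2, 2, 3)]
print(standard_basis(A))  # [(0, 1, -2, 0, -1), (0, 0, 0, 1, 1)]
```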
The conditions of Theorem 2.9 are very restrictive, and one might expect that there
is only one basis of a given subspace that satisfies these conditions. The next theorem
confirms that this is the case.
Theorem 2.10 There is one and only one basis of a given subspace W that satisfies
the conditions of Theorem 2.9.
Proof. Starting in Theorem 2.9 with a basis A of W, that theorem assures us that a basis A' of W that satisfies the conditions can be obtained from A by a sequence of elementary operations. Thus there is at least one basis of the required type.

Suppose now that A' = {v1', ..., vr'} and A'' = {v1'', ..., vr''} are two bases of W that satisfy the conditions of Theorem 2.9. Let k1', ..., kr' and k1'', ..., kr'' be the sequences of positive integers described in the conditions for A' and A'', respectively.
Assume k1' < k1''. Since A'' spans W, there must exist scalars c_{i1} such that v1' = Σ_{i=1}^{r} c_{i1} vi''. Since each vi'' has zero jth component for each j < k1'', any linear combination such as Σ_{i=1}^{r} c_{i1} vi'' must have zero jth component for each j < k1''. But v1' has a nonzero k1' component, and k1' < k1''. This is a contradiction; hence k1' ≥ k1''. The symmetry of the conditions on A' and A'' implies that k1'' ≥ k1', and thus, k1' = k1''.

Now assume k2' < k2''. Since A'' spans W, there must exist scalars c_{i2} such that v2' = Σ_{i=1}^{r} c_{i2} vi''. Now v1'' is the only vector in A'' that has nonzero k1'' component, so a linear combination Σ_{i=1}^{r} c_{i2} vi'' has zero k1'' component if and only if c_{12} = 0. Since v2' has a zero k1'' component, c_{12} = 0, and we have v2' = Σ_{i=2}^{r} c_{i2} vi''. For i ≥ 2, vi'' has zero jth component for each j < k2'', and thus the linear combination Σ_{i=2}^{r} c_{i2} vi'' has zero jth component for all j < k2''. But v2' has nonzero k2' component, and k2' < k2''. As before, we have a contradiction, and therefore, k2' ≥ k2''. From the symmetry of the conditions, k2'' ≥ k2', and thus, k2' = k2''.

It is clear that this argument may be repeated to obtain kj' = kj'' for j = 1, 2, ..., r.

Now A'' spans W, so for each vj' there must exist scalars c_{ij} such that vj' = Σ_{i=1}^{r} c_{ij} vi''. Since vi'' is the only vector in A'' that has nonzero ki'' component and vi'' has ki'' component equal to 1, c_{ij} is the ki'' component of Σ_{i=1}^{r} c_{ij} vi''. But vj' has zero ki'' component for i ≠ j, so c_{ij} = 0 for i ≠ j, and since vj' has kj'' component equal to 1, c_{jj} = 1. Therefore vj' = vj'', and A' = A''. ■
That is, the standard basis of W is the unique basis {v1, v2, ..., vr} that has the following properties.

1. The first nonzero component from the left in the jth vector vj is a 1 in the kj component. (This 1 is called a leading one.)

Clearly, if r = m = n in Theorem 2.9, the standard basis thus defined is the same as that given in Definition 1.30, and our two definitions are in agreement with each other.
 0  0  0  0
 0  3  2 -1
 0 -6 -4  2
-1  1  2  2
-1 -2  0  3
In composing this array, we have recorded the components of v^ from top to bottom in
the z th column from the left. (It is admittedly more natural to record these components
in rows rather than columns. The reason for the use of columns will become clear in
Chapter 3.) Let us use an arrow from the first array to a second to indicate that the set
represented by the second array is obtained from the first array by application of one
or more elementary operations.
Our work in Example 2 can then be recorded in this manner:
  0  0  0  0        0  0  0  0        0  0  0  0
  2  3  0 -1        1  3  0 -1        1  0  0  0
 -4 -6  0  2   →   -2 -6  0  2   →   -2  0  0  0
  2  1 -1  2        1  1 -1  2        1 -2 -1  3
  0 -2 -1  3        0 -2 -1  3        0 -2 -1  3

       0  0  0  0        0  0  0  0
       1  0  0  0        1  0  0  0
  →   -2  0  0  0   →   -2  0  0  0
       1  1 -1  3        0  1  0  0
       0  1 -1  3       -1  1  0  0
Theorem 2.12 The standard basis of a subspace W can be obtained from any basis of
W by a sequence of elementary operations.
Theorem 2.13 Let A and B be two sets of m vectors each in R^n. Then ⟨A⟩ = ⟨B⟩ if and only if B can be obtained from A by a sequence of elementary operations.
of elementary operations. ■
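Theorem 2.13 yields a practical test for equality of spans. The sketch below is our own (it compares ranks rather than exhibiting the elementary operations): ⟨A⟩ = ⟨B⟩ exactly when neither list of vectors enlarges the span of the other.

```python
from fractions import Fraction

def rank(rows):
    # rank via Gauss-Jordan elimination with exact rational arithmetic
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def same_span(A, B):
    # <A> = <B> iff combining the lists does not raise the rank
    return rank(A) == rank(B) == rank(A + B)

A = [(1, 0, 2, 1), (0, 3, 0, 7), (3, 6, 4, 3)]
B = [(1, 0, 2, 1), (-2, 3, -4, 5), (3, 6, 4, 3)]  # one elementary operation away
print(same_span(A, B))               # True
print(same_span(A, [(1, 0, 0, 0)]))  # False
```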
Exercises 2.4
4. For each of the sets A given below, use rectangular arrays as in Example 3 to find
the standard basis for (A).
5. Determine whether or not each of the sets below is linearly independent by finding
the dimension of the subspace spanned by the set.
8. Let A and A' be as given in Example 2. Write out a complete sequence of elementary operations that will yield A' when applied to A.
9. Given that the sets A and B below span the same subspace W , follow the proof
of Theorem 2.13 to find a sequence of elementary operations that can be used to
obtain B from A.
Matrix Multiplication
3.1 Introduction
In Example 3 of Section 2.4, it was found that rectangular arrays were a useful notational
convenience in recording the results of elementary operations on a set of vectors. These
rectangular arrays are specific examples of the more general concept of a matrix over a
set M, to be defined in the following section. In this chapter, we define the operation
of multiplication on matrices with real numbers as elements and establish the basic
properties of this operation. As mentioned earlier, the results of Chapter 2 are extremely
useful in the development here, particularly in the last two sections.
A = | a11 a12 ··· a1s |
    | a21 a22 ··· a2s |
    |  ·   ·       ·  |
    | ar1 ar2 ··· ars |

where aij denotes the element in the ith row and jth column of the matrix. The numbers r and s are said to be the dimensions of A, and r by s is sometimes written as r × s. The matrix A above may be written more simply as A = [aij]_{r×s} or A = [aij] if the number of rows or columns is not important.
There are several terms that are useful in describing matrices of certain types. An r by r matrix is said to be a square matrix or a matrix of order r. The elements aii of A = [aij] are the diagonal elements of A, and a square matrix A = [aij] with aij = 0 whenever i ≠ j is a diagonal matrix. The matrix I_r = [δij]_{r×r} is the identity matrix of order r. A matrix A = [aij] is a zero matrix if aij = 0 for all pairs i, j. A matrix that has only one row is a row matrix, and a matrix that has only one column is a column matrix.
Definition 3.2 Two matrices A = [aij]_{r×s} and B = [bij]_{p×q} over a set M are equal if and only if r = p, s = q, and aij = bij for all pairs i, j.
A = | 1 7 -3 |   B = | 0 0 0 |   C = | 2 4 6 |   D = | 0 |
    | 0 5 -6 |       | 0 0 0 |                       | 0 |
    | 8 4  9 |       | 0 0 0 |

E = | 5  0 |   F = | 1 0 0 |   G = | 1 0 0 |
    | 0 -3 |       | 0 1 0 |       | 0 1 0 |
                   | 0 0 1 |

The special terms just introduced apply to these matrices in the following ways:
At times, we will denote a zero matrix by the same symbol 0 that we use for a zero
vector. This will not cause confusion, because the context where the symbol is used will
make the meaning clear.
Now consider a set of vectors A = {u1, u2, ..., ur} in R^n and a second set B = {v1, v2, ..., vs} contained in ⟨A⟩. Since A spans ⟨A⟩, there are scalars aij in R such that vj = Σ_{i=1}^{r} aij ui for j = 1, 2, ..., s. The following definition involves these scalars aij.
3.2 Matrices of Transition 61
vj = Σ_{i=1}^{r} aij ui for j = 1, 2, ..., s.
The term matrix of transition applies only to situations involving nonempty finite sets of vectors, and these sets must be ordered. Whenever a set of vectors is listed without indices, it is understood that the index j is to go with the jth vector from the left. Thus the first vector from the left is to have index 1, the second from the left is to have index 2, and so on. This is consistent with the notational agreement made after Example 1 in Section 1.5.
Another point needs to be emphasized in connection with Definition 3.3. The definition of the term transition matrix that is found in many elementary linear algebra texts is not equivalent to the one given here. The one stated in Definition 3.3 is what we need to present our development of matrix multiplication, and it is the one that leads to simpler proofs of major theorems later in this book.
The following example shows that the matrix of transition A is not always uniquely
determined by A and B.
Example 2 □ Let A = {(1, 2, 0), (2, 4, 0), (0, 0, 1)} and B = {(1, 2, 4), (2, 4, 8)} in ⟨A⟩. Now

(1, 2, 4) = (1)(1, 2, 0) + (0)(2, 4, 0) + (4)(0, 0, 1)

and

(2, 4, 8) = (2)(1, 2, 0) + (0)(2, 4, 0) + (8)(0, 0, 1)

so that

A1 = | 1 2 |
     | 0 0 |
     | 4 8 |

is a matrix of transition from A to B.
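The non-uniqueness of the matrix of transition can be checked numerically. In the sketch below (the helper is our own, and the second matrix A2 is an alternative we supply for illustration), each column lists the coefficients expressing one vector of B in terms of the vectors of A:

```python
def combine(coeffs, vectors):
    # form the linear combination sum_i coeffs[i] * vectors[i]
    width = len(vectors[0])
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors)) for j in range(width))

U = [(1, 2, 0), (2, 4, 0), (0, 0, 1)]   # the set A of Example 2
B = [(1, 2, 4), (2, 4, 8)]

cols_A1 = [(1, 0, 4), (2, 0, 8)]    # columns of the transition matrix A1
cols_A2 = [(-1, 1, 4), (0, 1, 8)]   # columns of a second, different transition matrix
ok1 = all(combine(c, U) == v for c, v in zip(cols_A1, B))
ok2 = all(combine(c, U) == v for c, v in zip(cols_A2, B))
print(ok1, ok2)  # True True
```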
Definition 3.4 If A = {u1, u2, ..., ur} is a basis of the subspace W and v = Σ_{i=1}^{r} ci ui, the r by 1 matrix C given by

C = | c1 |
    | c2 |
    |  · |
    | cr |

is called the coordinate matrix of v relative to A, denoted by [v]_A.
For a given basis A, the vector v and the coordinate matrix [v]_A uniquely determine each other. Also, with the notation of the paragraph preceding Definition 3.4, the jth column of A, written as a column matrix

| a1j |
| a2j |
|  ·  |
| arj |
A = | 1 2  1 -3 |
    | 2 1 -4  5 |
    | 3 0  2  1 |
This matrix is the transition matrix from E3 to the set B = {v1, v2, v3, v4} where

v1 = 1·e1 + 2·e2 + 3·e3 = (1, 2, 3),
v2 = 2·e1 + 1·e2 + 0·e3 = (2, 1, 0),
v3 = 1·e1 + (-4)·e2 + 2·e3 = (1, -4, 2),
v4 = (-3)·e1 + 5·e2 + 1·e3 = (-3, 5, 1).
w1 = 1·u1 + 2·u2 + 3·u3 = (5, 4, 3),
w2 = 2·u1 + 1·u2 + 0·u3 = (1, 2, 3),
w3 = 1·u1 + (-4)·u2 + 2·u3 = (-2, 3, -3),
w4 = (-3)·u1 + 5·u2 + 1·u3 = (6, -2, 2).
In this situation, we have an illustration of how the coordinates of a vector change from
1
basis to basis. For the vector vi = W2 = (1,2,3) has the coordinate matrix
Exercises 3.2
1. If

    A = [ 2 ]
        [ 1 ]
        [ 0 ]

is the matrix of transition from ℰ₃ to B, find the standard basis of ⟨B⟩.
2. If

    A = [ 2  4  1   1 ]
        [ 1  2  2   0 ]
        [ 0  0  3  -1 ]
        [ 3  6  0   2 ]

is the matrix of transition from ℰ₄ to B, find the standard basis of ⟨B⟩.
64 Chapter 3 Matrix Multiplication
3. Determine x and y so that

    [ 2x + y    x - 2y ]
    [ 4x - 8y   3x - y ]

is a diagonal matrix.
5. Find the transition matrix from the first set to the second.
6. In each part below, a set A and a matrix A are given. In each case, find B so that
A is the matrix of transition from A to B.

(a) A = {(1,2), (0,1)};
        A = [  4  -1 ]
            [ -4   3 ]

(b) A = {(1,2), (2,1)};
        A = [ 5   7 ]
            [ 2  10 ]

(c) A = {(1,0,2,-1), (0,1,1,2), (1,2,1,4), (2,2,3,0)};
        A = [ 0   2   4   2   0 ]
            [ 0   3  -6   1  -2 ]
            [ 0   0   0  -1  -1 ]
            [ 0  -1   2   2   3 ]

(d) A = {(3,1,2,4), (1,0,1,-1), (0,2,0,-4), (0,-3,1,-3)};
        A = [ 0   1   1 ]
            [ 0  -3   2 ]
            [ 0   1  -2 ]
            [ 0   1   0 ]

(e) A = {(0,1,-2), (-1,1,2), (1,-1,0)};
        A = [  1   1  -1 ]
            [ -1   3  -1 ]
            [ -1   2   0 ]

(f) A = {(2,1,0,-1,0), (0,-2,0,2,0), (1,-1,0,1,0)};
        A = [ 1   2 ]
            [ 1   3 ]
            [ 1  -4 ]
7. Determine whether or not there is a matrix of transition from the first set to the
second, and find such a matrix if it exists.
8. Each of the sets A below is a basis of ⟨A⟩. Find the vector that has the given
matrix C as its coordinate matrix with respect to A.
(a) A = {(1,3), (-2,1)};  C:
    3
(b) A = {(2,-1,1), (0,1,-1), (-2,1,0)};  C:
    1
    -4
    0
(c) A = {(3,2,-1,0), (0,-2,5,0), (2,0,-4,1)};  C:
    -2
    1
    2
(d) A = {(0,3,4,6), (-1,-2,0,2), (4,0,-3,1)};  C:
    -3
    -1
10. Assume that the vector v = (x₁, x₂, x₃) has coordinate matrix

    [v]_A = [ c₁ ]
            [ c₂ ]
            [ c₃ ]

relative to the basis A = {(1,0,1), (0,1,1), (1,1,0)}, and express the components
x₁, x₂, x₃ of v in terms of c₁, c₂, and c₃.
11. Suppose that the vector v = (x₁, x₂, x₃) has coordinate matrix

    [v]_A = [ d₁ ]
            [ d₂ ]
            [ d₃ ]

with respect to the basis A = {(1,1,1), (1,1,0), (1,0,0)}. Express each dᵢ in terms of
x₁, x₂, and x₃.
13. Let A = {u₁, u₂, ..., u_r} and B = {v₁, v₂, ..., v_r} be bases of W, and let P =
[p_{ij}]_{r×r} be the matrix of transition from A to B. If v has coordinates c₁, c₂, ..., c_r
relative to A and d₁, d₂, ..., d_r relative to B, what relation exists between the cᵢ
and the dᵢ?
Definition 3.5 Let A = [a_{ij}]_{r×s} and B = [b_{ij}]_{s×t} be matrices over R, and let W
be a subspace of Rⁿ of dimension r. Suppose that A is the matrix of transition from a basis
A = {u₁, u₂, ..., u_r} of W to a set B = {v₁, v₂, ..., v_s} in W, and that B is a matrix of
transition from B to a set C = {w₁, w₂, ..., w_t} in W. The product AB is defined to
be the matrix of transition from A to C. (See Figure 3.1.)
[Figure 3.1: A → B → C, with A the transition matrix from A to B, B the transition matrix from B to C, and AB the transition matrix from A to C.]
3.3 Properties of Matrix Multiplication 67
The product AB as given in Definition 3.5 involves not only the matrices A and B,
but choices of A and W as well. This means that there is a possibility that the product
AB may not be unique, but may vary with the choices of A and W . The next theorem
shows that this does not happen, and the product AB actually depends only on A and
B.
For j = 1, 2, ..., t we have

    w_j = Σ_{k=1}^{s} b_{kj} v_k
        = Σ_{k=1}^{s} b_{kj} ( Σ_{i=1}^{r} a_{ik} u_i )
        = Σ_{k=1}^{s} Σ_{i=1}^{r} a_{ik} b_{kj} u_i
        = Σ_{i=1}^{r} ( Σ_{k=1}^{s} a_{ik} b_{kj} ) u_i.
If A is a matrix with only one row, A = [a₁, a₂, ..., a_s], and B is a matrix with a
single column,

    B = [ b₁  ]
        [ b₂  ]
        [ ⋮   ]
        [ b_s ]

then AB has only one element, given by a₁b₁ + a₂b₂ + ··· + a_s b_s. This result can be
committed to memory easily by mentally "standing A by B," forming products of pairs
of corresponding elements, and then forming the sum of these products:

      a₁b₁
    + a₂b₂
    + ···
    + a_s b_s.
It is easily seen from the formula in Theorem 3.6 that the element in the ith row and
jth column of AB can be found by multiplying the ith row of A by the jth column of
B, following the routine given for a single row and a single column. This aid to mental
multiplication is known as the row-by-column rule.
The row-by-column rule uses the same pattern as the one for computing the inner
product of two vectors. (See Definition 1.18.) This pattern is illustrated in the following
diagram.
[Diagram: row i of A, multiplied against column j of B, yields the element c_{ij} in row i, column j of C = AB]

where

    c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + ··· + a_{in}b_{nj}.
3.3 Properties of Matrix Multiplication 69
Example 1 □ Let

    A = [ 2  -1 ]        B = [ 7  -2  -3 ]
        [ 3   1 ]            [ 5  -4   8 ]
        [ 6  -5 ]
        [ 0   4 ]

The number of rows in B is the same as the number of columns in A, so the product
AB is defined. Performing the computations, we find that

    AB = [  9    0  -14 ]
         [ 26  -10   -1 ]
         [ 17    8  -58 ]
         [ 20  -16   32 ]

The product BA is not defined because the number of rows in A is 4 and the number
of columns in B is 3. ■
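The row-by-column rule translates directly into a triple loop over the rows of A, the columns of B, and the shared inner index. A minimal Python sketch (our own illustration, not from the text), checked against the A and B of this example:

```python
def mat_mul(A, B):
    """Row-by-column rule: entry (i, j) of AB is the sum of products of
    row i of A with column j of B."""
    r, s, t = len(A), len(B), len(B[0])
    assert all(len(row) == s for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(t)]
            for i in range(r)]

A = [[2, -1], [3, 1], [6, -5], [0, 4]]   # 4 x 2
B = [[7, -2, -3], [5, -4, 8]]            # 2 x 3
print(mat_mul(A, B))
# [[9, 0, -14], [26, -10, -1], [17, 8, -58], [20, -16, 32]]
```

Attempting `mat_mul(B, A)` in the other order trips the dimension assertion, mirroring the remark that BA is not defined here.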
    Σ_{k=1}^{t} d_{ik} c_{kj} = Σ_{k=1}^{t} ( Σ_{m=1}^{s} a_{im} b_{mk} ) c_{kj}
                              = Σ_{k=1}^{t} Σ_{m=1}^{s} (a_{im} b_{mk}) c_{kj}
Example 2 □ Let

    A = [ 1  2 ]        B = [ 3  -1 ]
        [ 4  0 ]            [ 2   1 ]

Then

    AB = [  7   1 ]    and    BA = [ -1  6 ]
         [ 12  -4 ]                [  6  4 ]

so that AB ≠ BA. ■
There are two other fundamental properties of multiplication of real numbers that
are not valid for multiplication of matrices. In general, AB = AC and A ≠ 0 do not
imply B = C, nor do BA = CA and A ≠ 0 imply that B = C. That is, there is no
cancellation property for matrix multiplication. Also, AB = 0 does not imply that one
of A, B must be zero. Thus, the product of two nonzero matrices may be a zero matrix.
Examples of these situations are requested in some of the exercises.
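Both failures are easy to exhibit with small 2 × 2 matrices. A short sketch (the matrices below are our own choices, not the text's):

```python
def mat_mul(A, B):
    # row-by-column rule
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 1], [1, 1]]

# No cancellation: AB = AC although A != 0 and B != C.
B = [[1, 0], [0, 1]]
C = [[0, 1], [1, 0]]
print(mat_mul(A, B) == mat_mul(A, C), B != C)   # True True

# Zero divisors: AZ = 0 although A != 0 and Z != 0.
Z = [[1, -1], [-1, 1]]
print(mat_mul(A, Z))                            # [[0, 0], [0, 0]]
```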
Exercises 3.3
(a) A = [ 2   0  5 ]        B = [ 2   0 ]
        [ 1  -2  4 ]            [ 8   1 ]
        [ 3   1  6 ]            [ 5  -1 ]
(b) A = [  4  -5 ]        B = [ 8  -2  7 ]
        [ -2   1 ]            [ 3   0  6 ]
(c) A = [ 3  2  -1 ]        B = [ 1  -2 ]
        [ 1  4   2 ]            [ 5  -3 ]
                                [ 4   6 ]
(d) A- -2 7 5 =
-1 -il
(e) A = [  2  -5  3 ]        B = [  0  -8  -1 ]
        [ -6   8  1 ]            [ -2  -9   3 ]
1 4 4
2 1
(f) A B = 4
5 0
-5
-2 1 0
4. Given that A = is the transition matrix from A = {(1,1), (0,1)}
0 2 1
1 0
4 -3
to the set B, and that B is the transition matrix from B to C, find
0 -1
-8 6
the matrix of transition from A to C.
5. Give an example of nonzero matrices A and B such that AB = BA.

6. Let A = [a_{ij}]_{r×s}. Under what conditions does Aᵏ exist for every positive integer
k? (A¹ = A, A² = A·A, etc.)
10. For each of the following pairs A, B, let AB = [c_{ij}] and write a formula for c_{ij} in
terms of i and j.

11. Show that the matrix equation

    [ 1  -2   1 ] [ x₁ ]   [ 4 ]
    [ 2   0   3 ] [ x₂ ] = [ 5 ]
    [ 1   4  -1 ] [ x₃ ]   [ 6 ]

is equivalent to a system of linear equations in x₁, x₂, and x₃. (Hint: Use Definition
3.2.)
12. Reversing the procedure in Problem 11, write a matrix equation that is equivalent
to the system of equations

     x₁ + 4x₂ + 3x₃ + 2x₄ =  3
          x₂ +  x₃ + 7x₄ = -5
    2x₁ - 3x₂ +  x₃ -  x₄ =  0.
    Σ_{j=1}^{n} b_j v_j = 0.

Then

    Σ_{i=1}^{n} ( Σ_{j=1}^{n} a_{ij} b_j ) u_i = Σ_{i=1}^{n} 0·u_i = 0,

and this implies that b₁ = b₂ = ··· = b_n = 0 since B is linearly independent. Thus B′
is linearly independent and is a basis of Rⁿ.
An n by n matrix A over R is a nonsingular matrix if and only if the columns of
A record the coordinates of one basis of Rⁿ with respect to a second (not necessarily
different) basis of Rⁿ. That is, A is nonsingular if and only if the jth column of A is the
coordinate matrix of the jth vector in a basis of Rⁿ for j = 1, 2, ..., n.
In the discussion of special types of matrices just before Definition 3.2, an identity
matrix was defined to be a matrix of the form Iₙ = [δ_{ij}]_{n×n}, where δ_{ij} is the Kronecker
delta. There are many identity matrices Iₙ, but only one for each value of n. As
examples,
examples,
1 0 0
1 0
and I3 0 1 0
0 1
0 0 1
From the placing of the 1's and 0's in Iₙ, it is easy to see that Iₙ is the unique matrix
of transition from a basis A to the same basis A, and Iₙ is therefore nonsingular.
Using the fact that Iₙ is the transition matrix from A to A, it follows easily from
Definition 3.5 that

    I_m A = A        and        A Iₙ = A

for any m × n matrix A. In particular,

    Iₙ A = A = A Iₙ

for any square matrix A of order n.
Proof. Suppose that A is a basis of R n and let the nxn matrix A be the transition
matrix from A to a set B of n vectors in R n .
Assume first that A is nonsingular. Then B is a basis of R n and therefore every
vector in A is a linear combination of vectors in B. Hence there exists a transition
matrix B from B to A. By Definition 3.5, AB is the matrix of transition from A to A.
It follows that AB = In since In is the unique matrix of transition from A to A. In
similar fashion, we can show that BA is the matrix of transition from the basis B to B,
and therefore BA = In.
To prove the other part of the theorem, assume that A has an inverse B, so that
AB = Iₙ and BA = Iₙ. With the same notation as in the first paragraph of this proof,
let B be the transition matrix from the set B of n vectors to a set C of n vectors in
R n . Then AB is the transition matrix from A to C, by Definition 3.5. But AB = In,
so the set C is exactly the same as A. This means that B is a transition matrix from B
to A. Therefore A is dependent on B, and this implies that R n is dependent on B, by
Theorem 1.8. That is, B spans R n . It follows then from Theorem 1.34 that B is a basis
of R n and hence A is nonsingular. ■
Corollary 3.11 Suppose A is the matrix of transition from a basis A of Rⁿ to a basis
B of Rⁿ. Then the matrix B is an inverse of A if and only if B is the transition matrix
from B to A.
3.4 Invertible Matrices 75
Proof. The corollary follows at once from the proof of the theorem. ■
Up to this point, we have allowed the possibility that a matrix might have two or
more distinct inverses. The next theorem shows that this possibility does not actually
happen.
Proof. Suppose that B and C are both inverses of the n × n matrix A. Then all of
the equations

    AB = Iₙ = BA        and        AC = Iₙ = CA

are valid. To prove that B = C, we evaluate the product BAC in two different ways.
First, we have

    BAC = B(AC) = B Iₙ = B.

Second, we have

    BAC = (BA)C = Iₙ C = C.

Therefore B = C. ■
Definition 3.13 If A is an invertible matrix, its unique inverse is denoted by A⁻¹.
As an example, the matrix

    A = [ 1  1  1 ]
        [ 0  1  1 ]
        [ 0  0  1 ]

is invertible and is the transition matrix from the basis ℰ₃ = {(1,0,0), (0,1,0), (0,0,1)}
to the basis B = {(1,0,0), (1,1,0), (1,1,1)} of R³. In order to find A⁻¹, it is sufficient
to find the transition matrix from B to ℰ₃. Since

    (1,0,0) =    1·(1,0,0) +    0·(1,1,0) + 0·(1,1,1),
    (0,1,0) = (-1)·(1,0,0) +    1·(1,1,0) + 0·(1,1,1),
    (0,0,1) =    0·(1,0,0) + (-1)·(1,1,0) + 1·(1,1,1),

we see that

    A⁻¹ = [ 1  -1   0 ]
          [ 0   1  -1 ]
          [ 0   0   1 ]
If a given square matrix A is complicated, a more efficient method than the one we used
here is needed to find A⁻¹. Such a method is presented in Section 3.6. ■
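The claim that this transition matrix really inverts A can be confirmed by direct multiplication. A quick check in Python (our own, with a small helper for the product):

```python
def mat_mul(A, B):
    # row-by-column rule
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A     = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
A_inv = [[1, -1, 0], [0, 1, -1], [0, 0, 1]]
I3    = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# both defining equations of the inverse hold
print(mat_mul(A, A_inv) == I3, mat_mul(A_inv, A) == I3)   # True True
```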
Some of our earlier results can be rewritten using the exponential notation in Definition 3.13. The equations AB = Iₙ = BA now read as

    A A⁻¹ = Iₙ = A⁻¹ A.

Also, we noted earlier that inverses occur in pairs: When B = A⁻¹, then A = B⁻¹.
Substituting the value B = A⁻¹ in the equation A = B⁻¹ yields

    A = (A⁻¹)⁻¹.
In the definition and all discussion in this section, it has been required that both of the
equations A A⁻¹ = Iₙ and A⁻¹ A = Iₙ be satisfied by the inverse matrix. Our next two
theorems show that these equations are not independent for square matrices. In fact,
for square matrices, we shall see that either of them implies the other.
Proof. Suppose there is a square matrix B such that AB = Iₙ. Now A is the matrix
of transition from ℰₙ to a set A of n vectors in Rⁿ, B is a matrix of transition from A
to a set B of n vectors in Rⁿ, and AB = Iₙ, so B must be ℰₙ. Thus B is a matrix of
transition from A to ℰₙ, and this means that each vector in ℰₙ is a linear combination
of vectors in A. That is, ℰₙ is dependent on A. And since Rⁿ is dependent on ℰₙ, this
means that Rⁿ is dependent on A. By Theorem 1.34, A is a basis of Rⁿ, and A is
invertible. It follows from Corollary 3.11 that B = A⁻¹. ■
In view of Theorem 3.14, we see that a matrix A of order n is invertible if and only
if there is a square matrix B such that AB = In.
The proof of the next theorem is quite similar to that of Theorem 3.14, and is left
as an exercise.
Theorem 3.15 Let A be a square matrix of order n over R. If there is a square matrix
B over R such that BA = Iₙ, then A is invertible and B = A⁻¹.
Theorem 3.16 If A and B are invertible matrices of the same order, then AB is
invertible and ( A B ) - 1 = B~1A~1.
Proof. If A and B are invertible matrices of order n, then A⁻¹ and B⁻¹ exist. Since

    (AB)(B⁻¹A⁻¹) = A(B B⁻¹)A⁻¹ = A Iₙ A⁻¹ = A A⁻¹ = Iₙ,

it follows from Theorem 3.14 that AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹. ■
Corollary 3.17 If A₁, A₂, ..., A_k are square matrices of order n over R and each Aᵢ is
invertible, then A₁A₂···A_k is invertible and (A₁A₂···A_k)⁻¹ = A_k⁻¹···A₂⁻¹A₁⁻¹.
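The reversal of order in Theorem 3.16 is easy to check numerically. A sketch with matrices of our own choosing whose inverses are known in advance:

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

I2 = [[1, 0], [0, 1]]
A, A_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]
B, B_inv = [[1, 0], [2, 1]], [[1, 0], [-2, 1]]

AB = mat_mul(A, B)
reversed_prod = mat_mul(B_inv, A_inv)      # B^{-1} A^{-1}
same_order    = mat_mul(A_inv, B_inv)      # A^{-1} B^{-1}, the wrong order

print(mat_mul(AB, reversed_prod) == I2)    # True
print(mat_mul(AB, same_order) == I2)       # False
```

The same-order product fails precisely because matrix multiplication is not commutative.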
Definition 3.18 If A and A′ are bases of a subspace W of Rⁿ such that A′ is obtained
from A by a single elementary operation, then the matrix of transition from A to A′ is
an elementary matrix.
    F_k ··· F₂F₁(A) = ℰₙ

and therefore

    A = F₁⁻¹F₂⁻¹ ··· F_k⁻¹(ℰₙ).

If Mᵢ is the elementary matrix of transition that is associated with Fᵢ, then this indirect
procedure will yield

    A = M_k⁻¹ ··· M₂⁻¹M₁⁻¹

since Mᵢ⁻¹ is the elementary matrix associated with the elementary operation Fᵢ⁻¹. Note
that we also have

    A⁻¹ = M₁M₂ ··· M_k.
[Figure 3.2: A → F₁(A) → F₂F₁(A) → ··· → F_k···F₂F₁(A) = ℰₙ, with associated elementary matrices M₁, M₂, ..., M_k and A⁻¹ = M₁M₂···M_k.]
As an example, consider

    A = [ 2  3 ]
        [ 6  4 ]
The following display shows the elementary operations Fᵢ, their associated elementary
matrices Mᵢ, and the inverses Mᵢ⁻¹, and records the resulting factorization:

    [ 2  3 ]   [ 1  0 ] [ 1   0 ] [ 1  3 ] [ 2  0 ]
    [ 6  4 ] = [ 3  1 ] [ 0  -5 ] [ 0  1 ] [ 0  1 ]
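The factorization in the display can be verified by multiplying the four elementary factors back together. A quick Python check (ours):

```python
from functools import reduce

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

factors = [
    [[1, 0], [3, 1]],    # type II: add 3 times row 1 to row 2
    [[1, 0], [0, -5]],   # type I:  multiply row 2 by -5
    [[1, 3], [0, 1]],    # type II: add 3 times row 2 to row 1
    [[2, 0], [0, 1]],    # type I:  multiply row 1 by 2
]
print(reduce(mat_mul, factors))   # [[2, 3], [6, 4]]
```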
Exercises 3.4
1. For each matrix A, determine A so that A is the matrix of transition from ℰ₃ to
A. Then use Definition 3.8 to decide whether or not A is nonsingular.
(a) A = [  2  -6   6 ]        (b) A = [  4  4  4 ]
        [ -5  13   1 ]                [  3  4  2 ]
        [ -2   4  10 ]                [ -6  1  7 ]

(c) A = [ 5  2  7 ]           (d) A = [  1   5 ]
        [ 2  1  0 ]                   [  3   9 ]
        [ 2  9  3 ]                   [ -4     ]
2. Which of the following are elementary matrices?

(a) [ 1  4  0 ]    (b) [ 1  0  0 ]    (c) [ 1  4 ]    (d) [ 1  1  0 ]
    [ 0  1  0 ]        [ 4  1  0 ]        [ 4  1 ]        [ 1  1  0 ]
    [ 0  0  1 ]        [ 0  0  1 ]        [ 0  0 ]        [ 0  0  1 ]
(e) [ 0  0  1 ]    (f) [ 1  1  0 ]    (g) [ 1  0  0 ]    (h) [ 0  1  0 ]
    [ 0  1  0 ]        [ 0  2  0 ]        [ 0  1  0 ]        [ 0  0  1 ]
    [ 1  0  0 ]        [ 0  0  1 ]        [ 0  0  2 ]        [ 1  0  0 ]
5. Given that

    A = [ 1  0 ] [ 1  0 ] [ 0  1 ]
        [ 2  1 ] [ 0  3 ] [ 1  0 ],

write A as a product of elementary matrices.
6. Write each of the following invertible matrices as a product of elementary matrices.

(a) [ 2  4 ]    (b) [ -1  1 ]    (c) [ 3  4 ]    (d) [  0  3 ]
    [ 3  4 ]        [  1  0 ]        [ 2  1 ]        [ -2  6 ]
7. Find the inverse of each matrix in Problem 6 by use of Corollary 3.17 and the
factorization obtained in Problem 6.
8. Each of the matrices A below is nonsingular. Use A as the transition matrix from
the basis A = {(1,-2), (2,1)} of R² to the basis A′ of R². Determine A⁻¹ by
finding the matrix of transition from A′ to A.

(a) A = [  1   3 ]    (b) A = [  2  -7 ]    (c) A = [  2  1 ]    (d) A = [ 2  1 ]
        [ -1  -2 ]            [ -1   4 ]            [ -1  2 ]            [ 5  3 ]
12. Given the matrices

    B = [ 2  4 ]        and        (AB)⁻¹ = [  2   3 ]
        [ 7  8 ]                            [ -1  -2 ],

find A⁻¹.
13. Given that

    B⁻¹ = [  1   3 ]        and        AB = [ -4  -5 ]
          [ -1  -2 ]                        [  2   3 ],

find the matrix A.
-2 1 0
14. Suppose A = is the transition matrix from A = {(1,1), (1,0)}
0 2 1
1 0
4 -3
to the set B, and B = is the transition matrix from B to C.
0 -1
-8 6
15. Given that the matrix

    B = [ 4   6  0 ]
        [ 2  -5  3 ]
        [ 4   8  2 ]

was obtained from the invertible matrix A by adding 3 times the first column to
the second column, find the elementary matrix M such that BM = A. (Hint: Let
B be the matrix of transition from ℰ₃ to B.)
16. Derive a formula for the inverse of a nonsingular A = [a_{ij}]₂ₓ₂ by consideration of
a system of equations obtained from

    [ a₁₁  a₁₂ ] [ x₁  x₂ ]   [ 1  0 ]
    [ a₂₁  a₂₂ ] [ x₃  x₄ ] = [ 0  1 ]
17. Prove that if A = [a_{ij}]_{n×n} is not invertible, then AB is not invertible for any n × p
matrix B. (Hint: Use Theorem 3.14.)

18. Prove that if A = [a_{ij}]_{n×n} is singular, then BA is singular for any p × n matrix
B. (Hint: Use Theorem 3.15.)
[Figure 3.3: transition diagram relating the matrices A, M, and the product AM.]
Definition 3.22 A matrix A = [a_{ij}]_{m×n} over R that satisfies the following conditions
is a matrix in reduced column-echelon form, or a reduced column-echelon matrix.

1. The first nonzero element in column j is a 1 in row k_j for j = 1, 2, ..., r. (This 1
is called a leading one.)
2. k₁ < k₂ < ··· < k_r ≤ m. (That is, for each change in columns from left to right,
the leading one appears in a lower row.)
3. For j = 1, 2, ..., r, the leading one in column j is the only nonzero element in row
k_j.
4. Each of the last n − r columns consists entirely of zeros.
For future use, we note that conditions (1) and (3) can be reworded in the following
ways.

Thus a matrix is in reduced column-echelon form if and only if its nonzero columns
record the components of the standard basis of a subspace, or if its nonzero columns
form the matrix of transition from ℰ_m to the standard basis of a subspace.
3.5 Column Operations and Column-Echelon Forms 85
Example 1 □ Consider the question as to which of the following matrices are in reduced column-echelon form.

    A = [ 1  0  0 ]    B = [ 1  0  0 ]    C = [ 1  0  0 ]    D = [ 1  0  0 ]
        [ 0  2  0 ]        [ 0  0  0 ]        [ 2  0  0 ]        [ 0  0  1 ]
        [ 5  4  3 ]        [ 0  1  0 ]        [ 0  1  0 ]        [ 0  1  0 ]
        [ 0  0  0 ]        [ 3  4  0 ]        [ 3  4  1 ]        [ 0  0  0 ]
The matrix A is not in reduced column-echelon form. It fails to satisfy condition (1) in
the second and third columns because the first nonzero element in each of these columns
is not a 1. The matrix B is in reduced column-echelon form since it satisfies all four
conditions. The matrix C fails on condition (3) because the leading 1 in column 3 is not
the only nonzero element in row k₃ = 4. The matrix D fails on condition (2) because
k₂ = 3 and k₃ = 2 violate k₂ < k₃. That is, the leading 1 in column 3 of D fails to be
in a lower row than the leading 1 in column 2. ■
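The four conditions of Definition 3.22 can be tested mechanically. A sketch of such a checker in Python (our own, written directly from the definition), tried on the four matrices of this example:

```python
def is_reduced_column_echelon(M):
    """Test the four conditions of Definition 3.22 for an m x n matrix M."""
    n = len(M[0])
    cols = [list(c) for c in zip(*M)]
    # r = number of leading (nonzero) columns; condition (4): the rest are zero
    r = n
    for j in range(n):
        if all(x == 0 for x in cols[j]):
            r = j
            break
    if any(any(x != 0 for x in cols[j]) for j in range(r, n)):
        return False
    leads = []
    for j in range(r):
        k = next(i for i, x in enumerate(cols[j]) if x != 0)
        if cols[j][k] != 1:                       # condition (1): leading entry is 1
            return False
        if any(M[k][jj] != 0 for jj in range(n) if jj != j):
            return False                          # condition (3): alone in its row
        leads.append(k)
    return leads == sorted(set(leads))            # condition (2): strictly increasing

A = [[1, 0, 0], [0, 2, 0], [5, 4, 3], [0, 0, 0]]
B = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [3, 4, 0]]
C = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [3, 4, 1]]
D = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
print([is_reduced_column_echelon(M) for M in [A, B, C, D]])
# [False, True, False, False]
```

The results match the case-by-case analysis above: only B passes all four conditions.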
1. The first nonzero component from the left in v_j is a 1 in the k_j th component for
j = 1, 2, ..., r.
Now each of the elementary operations Eᵢ has an associated elementary matrix Qᵢ,
and when Eᵢ is applied to a set, a matrix of transition from that set to the new set is
Qᵢ. As the diagram in Figure 3.4 shows, this means that a matrix of transition from A
to A′ is Q₁Q₂···Qₜ, and that AQ₁Q₂···Qₜ = A′ is the matrix of transition from ℰ_m
to A′. Thus the ith element in column j of A′ = [a′_{ij}] is equal to the ith component of
v′_j, and therefore:

[Figure 3.4: AQ₁Q₂···Qₜ = A′]
We saw in Chapter 2 that the set A′ used in the proof of Theorem 3.23 is uniquely
determined by the set A. It follows from this fact that each matrix A over R has an
associated unique reduced column-echelon form A′ as described in the proof. However,
we have seen in Chapter 2 that the sequence of elementary operations used to obtain A′
from A is not unique. The invertible matrix Q is similarly not unique in spite of the fact
that AQ = A′ is unique for A. An example that demonstrates this lack of uniqueness
is given at the end of this section.

Definition 3.24 The matrix A′ in Theorem 3.23 is called the reduced column-echelon form for A.
We shall now show how the proof of Theorem 3.23 can be interpreted so as to give
a systematic procedure for finding an invertible Q such that AQ is in reduced column-
echelon form. This procedure is closely related to that used in Chapter 2 in finding the
standard basis of a subspace.
Suppose that A = [a_{ij}] is a given m × n matrix over R. Interpreting A as a matrix
of transition from ℰ_m to A is equivalent to obtaining A by recording the components of
vectors in A, as was done in Chapter 2. We have seen that performing an elementary
operation on A corresponds to performing an elementary column operation on A. With
these interpretations, the procedure in Example 3 of Section 2.4 can be regarded as a
method for obtaining the reduced column-echelon form

    A′ = [  0  0  0  0 ]
         [  1  0  0  0 ]
         [ -2  0  0  0 ]
         [  0  1  0  0 ]
         [ -1  1  0  0 ]
    A Iₙ = A
    A Iₙ Q₁ = A Q₁
    A Iₙ Q₁Q₂ = A Q₁Q₂
    ⋮

Let Eᵢ be the elementary column operation that has Qᵢ as its associated matrix. Now
AQ₁ = E₁(A), and AQ₁Q₂···Qᵢ = EᵢEᵢ₋₁···E₁(A) in general. Thus the right members
of the equations above may be found by applying the sequence E₁, E₂, ..., Eₜ to
A. The left members have as factors the products IₙQ₁Q₂···Qᵢ = EᵢEᵢ₋₁···E₁(Iₙ).
Thus Q can be found by applying the same sequence of elementary operations to Iₙ.
What is desired, then, is an efficient method of recording the results Eₜ···E₂E₁(A) and
Eₜ···E₂E₁(Iₙ). This can be done effectively by recording both A and Iₙ in a single
matrix as

    [ A  ]      [ E₁(A)  ]      [ E₂E₁(A)  ]
    [ Iₙ ]  →   [ E₁(Iₙ) ]  →   [ E₂E₁(Iₙ) ]  →  etc.

This procedure is quite valid since the same operations in the same order are to be
applied to each of A and Iₙ.
Recording the matrix A above I₄ and carrying out such a sequence of elementary column
operations (the intermediate steps parallel Example 3 of Section 2.4), we obtain

    [ A  ]   [  0   0   0   0 ]
    [ I₄ ] = [  0   3   2  -1 ]
             [  0  -6  -4   2 ]
             [  1   1   2   2 ]
             [  1  -2   0   3 ]
             [  1   0   0   0 ]
             [  0   1   0   0 ]
             [  0   0   1   0 ]
             [  0   0   0   1 ]

    →  ···  →  [ A′ ]
               [ Q  ]
Thus

    A′ = [  0  0  0  0 ]
         [  1  0  0  0 ]
         [ -2  0  0  0 ]
         [  0  1  0  0 ]
         [ -1  1  0  0 ]
and the matrix Q recorded in the last four rows is an invertible matrix such that
AQ = A′. It is easy to show that Q is not unique by performing elementary operations
that involve only the last two columns of Q. For instance, adding the third column of
Q to its fourth column yields another invertible matrix B such that AB = A′.
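One way to carry out this reduction in code is to row-reduce the transpose and transpose back; the precise row/column connection is made in Section 3.6. A Python sketch (ours) in exact arithmetic, checked against the A and A′ of this example:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan reduction to reduced row-echelon form, in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    lead = 0
    for r in range(rows):
        if lead >= cols:
            break
        i = r
        while M[i][lead] == 0:
            i += 1
            if i == rows:                 # no pivot in this column
                i, lead = r, lead + 1
                if lead == cols:
                    return M
        M[i], M[r] = M[r], M[i]                      # move pivot row up
        M[r] = [x / M[r][lead] for x in M[r]]        # scale pivot to 1
        for i in range(rows):
            if i != r:                               # clear the pivot column
                M[i] = [a - M[i][lead] * b for a, b in zip(M[i], M[r])]
        lead += 1
    return M

def reduced_column_echelon(A):
    # column operations on A correspond to row operations on its transpose
    T = rref([list(col) for col in zip(*A)])
    return [list(row) for row in zip(*T)]

A = [[0, 0, 0, 0], [0, 3, 2, -1], [0, -6, -4, 2], [1, 1, 2, 2], [1, -2, 0, 3]]
expected = [[0, 0, 0, 0], [1, 0, 0, 0], [-2, 0, 0, 0], [0, 1, 0, 0], [-1, 1, 0, 0]]
print(reduced_column_echelon(A) == expected)   # True
```

Because the reduced column-echelon form is unique (Definition 3.24), any correct sequence of operations must arrive at this same A′.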
Exercises 3.5

1. Describe the elementary column operation that is associated with the given elementary matrix M.

(a) M = [ 0  1 ]    (b) M = [ 0  1  0 ]    (c) M = [ 1  0  0 ]    (d) M = [ 1  0  0 ]
        [ 1  0 ]            [ 1  0  0 ]            [ 0  1  0 ]            [ 2  1  0 ]
                            [ 0  0  1 ]            [ 0  2  1 ]            [ 0  0  1 ]
(a) [ 0  0  0 ]    (b) [ 0  1  0 ]    (c) [ 0  0  0 ]
    [ 1  0  0 ]        [ 1  0  0 ]        [ 1  0  0 ]
    [ 0  1  0 ]        [ 0  1  0 ]        [ 1  1  0 ]
    [ 2  3  0 ]        [ 1  0  0 ]        [ 1  1  1 ]
(d) [ 0  1  0 ]    (e) [ 1  1  1 ]    (f) [ 1  0  0 ]
    [ 0  0  1 ]        [ 0  1  1 ]        [ 1  1  0 ]
    [ 0  0  0 ]        [ 0  0  1 ]        [ 0  0  1 ]
    [ 0  0  0 ]        [ 0  0  0 ]
4. Given that

    B = [  1  4  2 ]
        [  2  0  1 ]
        [ -2  1  3 ]

is obtained from the matrix A by multiplying A on the right by

    [ 1  0  0 ]
    [ 0  1  0 ]
    [ 2  0  1 ],

find A.
(a) [ 0  0  0 ]    (b) [  9   6 ]    (c) [  1  0   1   1 ]
    [ 1  2  1 ]        [ -4  -3 ]        [  2  1   3   1 ]
    [ 2  4  2 ]        [ -4  -2 ]        [ -1  0  -1  -1 ]
    [ 1  1  1 ]                          [  3  2   5   1 ]
1 2 1 -1
0 0 0 0 1 2 1-1
2 4 2 -2
2 3 0 -1 2 4 2-2
0 1 2 3
(d) 4 -6 0 2 (e) (f) 0 1 2 3
1 4 5 5
2 1 -1 2 1 4 5 5
0 3 -2 4
0 -2 -1 3 0 3 - 2 4
1 6 1 6
3.6 Row Operations and Row-Echelon Forms 91
(g) [  1  1   3   2  0 ]    (h) [  1   1   3  1  1 ]
    [  1  1  -1   0  0 ]        [  1   0   2  0  1 ]
    [  2  2   6   4  0 ]        [  3  -2   4  0  7 ]
    [ -3  0  -6  -3  1 ]        [ -1   0  -2  1  1 ]
7. In each part of Problem 6, let A be the given matrix and find an invertible matrix
Q such that AQ is in reduced column-echelon form.
Definition 3.25 There are three types of elementary row operations on a matrix A.

(I) An elementary row operation of type I multiplies one of the rows of A by a
nonzero scalar a.
(II) An elementary row operation of type II adds to row t in A the product of b
and row s in A, where s ≠ t.
(III) An elementary row operation of type III interchanges two rows in A.
The descriptions of the products MA, where M is an elementary matrix, are very
much like those obtained for the products AM in Section 3.5, even though the derivations
are fundamentally different. As mentioned earlier, we consider a matrix M of type
II here and leave the other derivations as exercises.
[Figure 3.5: transition diagram for the product MA.]
We have

    v_j = Σ_{i=1}^{m} a_{ij} u_i
        = Σ_{i≠s} a_{ij} u_i + a_{sj} u_s
        = Σ_{i≠s} a_{ij} e_i + a_{sj}(e_s + b e_t)
        = Σ_{i≠t} a_{ij} e_i + (a_{tj} + b a_{sj}) e_t,
so that the coordinates of v_j relative to ℰ_m are the same as the coordinates of v_j
relative to A except for the tth coordinate, and the tth coordinate of v_j relative to ℰ_m
is obtained by adding to the tth coordinate of v_j relative to A the product of b and the
sth coordinate relative to A. Hence multiplying A on the left by M simply adds the
product of b and the sth row to the tth row of A. That is, row t of A is replaced by the
sum of row t of A and b times row s of A.

The descriptions of MA for elementary matrices M of types I and III are as follows.

• If M is obtained from I_m by multiplying row s of I_m by a, then MA is obtained
from A by multiplying row s of A by a.
• If M is obtained from I_m by interchanging rows s and t of I_m (s ≠ t), then MA
is obtained from A by interchanging rows s and t of A.
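These descriptions are easy to confirm numerically: build M from I_m, multiply on the left, and compare with the row operation performed directly. A small sketch (the matrices are our own):

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]

def add_multiple_of_row(m, t, s, b):
    """Type II elementary matrix: I_m with b placed in row t, column s."""
    M = identity(m)
    M[t][s] = b
    return M

A = [[1, 2], [3, 4], [5, 6]]
M = add_multiple_of_row(3, 2, 0, 10)   # add 10 times row 0 to row 2
print(mat_mul(M, A))                   # [[1, 2], [3, 4], [15, 26]]
```

Only the targeted row changes, exactly as the derivation for type II predicts.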
The concept of the transpose of a matrix is extremely useful in obtaining the row
analogue of the reduced column-echelon form. This analogous form is called the reduced
row-echelon form of a matrix.
Definition 3.26 If A = [a_{ij}] is any m × n matrix over R, the transpose of A is the
n × m matrix B = [b_{ij}] with b_{ij} = a_{ji} for i = 1, 2, ..., n; j = 1, 2, ..., m. The transpose of
A is denoted by Aᵀ. If A is a matrix such that Aᵀ = A, then A is called symmetric.
If Aᵀ = −A, then A is skew-symmetric.
    B = [  1  -2   3 ]        and        C = [  0   1  -2 ]
        [ -2  -4   5 ]                       [ -1   0   3 ]
        [  3   5  -6 ]                       [  2  -3   0 ]

Then B is symmetric since Bᵀ = B, and C is skew-symmetric since

    Cᵀ = [  0  -1   2 ]
         [  1   0  -3 ]  =  -C.
         [ -2   3   0 ]
Our next theorem states that the transpose of a product is equal to the product of
the transposes in reverse order.
    A = [ 1  0  0  0 ]    B = [ 1  0  0  3 ]    C = [ 0  0  0  1 ]    D = [ 0  0  1  2 ]
        [ 0  0  0  0 ]        [ 0  1  0  2 ]        [ 1  0  0  0 ]        [ 0  0  0  0 ]
        [ 0  1  0  0 ]        [ 0  0  0  1 ]        [ 0  0  0  0 ]        [ 0  0  0  0 ]
The matrix A is not in reduced row-echelon form since the row of zeros is placed above
a nonzero element, violating condition (4). The matrix B is not in reduced row-echelon
form because the leading 1 in row 3 is not the only nonzero element in column k₃ = 4.
The matrix C fails on condition (2) because the leading 1 in the second row does not
appear farther to the right than the leading 1 in the first row. That is, k₁ = 4 and
k₂ = 1 do not satisfy k₁ < k₂. The matrix D satisfies all four conditions, and D is in
reduced row-echelon form. ■
Conditions (1) and (3) in Definition 3.28 can be reworded in the following ways that
parallel the rewording we used with the definition of reduced column-echelon form:
These alternative wordings are more useful in proving our next theorem.
Proof. Let A = [a_{ij}]_{m×n} and let B = [b_{ij}]_{n×m} = Aᵀ. If we let j denote the row
numbers and i denote the column numbers of elements of A in Definition 3.22, then A
is in reduced column-echelon form if and only if these conditions hold:

The elements in column i of A are the elements in row i of Aᵀ, so the conditions
on A are satisfied if and only if:
From this theorem and the fact that (Aᵀ)ᵀ = A, it follows that Aᵀ is in reduced
column-echelon form if and only if A is in reduced row-echelon form.
Proof. By Theorem 3.23, there is an invertible matrix Q such that AᵀQ is in reduced
column-echelon form, and this means that (AᵀQ)ᵀ = Qᵀ(Aᵀ)ᵀ = QᵀA is in reduced
row-echelon form. But P = Qᵀ is an invertible matrix, so the theorem is proved. ■
In the proof of Theorem 3.31, the reduced column-echelon form for Aᵀ is uniquely
determined by Aᵀ, and therefore the reduced row-echelon form PA is uniquely determined
by A. We make the following definition.

Definition 3.32 The unique matrix PA in the statement of Theorem 3.31 is called the
reduced row-echelon form for A.
Theorem 3.33 A square matrix A = [a_{ij}]_{n×n} is invertible if and only if the reduced
row-echelon form for A is Iₙ.

Proof. Assume that A is invertible, and let P be an invertible matrix such that PA
is in reduced row-echelon form. There must be no rows of zeros in PA, for otherwise
PA would obviously be singular. This means that kᵢ = i for each i, and PA = Iₙ.
If the reduced row-echelon form for A is Iₙ, there is an invertible matrix P such that
PA = Iₙ. Then A is invertible by Theorem 3.15. ■
    P₁ I_m A = P₁ A,    P₂P₁ I_m A = P₂P₁ A,    ...

These equations indicate that if the reduced row-echelon form is obtained by application
of a certain sequence of elementary row operations to A, the matrix P may be obtained
by application of the same sequence of operations, in the same order, to I_m. This can
be done efficiently by recording A and I_m in a single matrix as [A, I_m] and performing
the row operations simultaneously on A and I_m.
As an example, let

    A = [ 2  0   2 ]
        [ 0  1  -3 ]
        [ 2  1   1 ]

and consider the problem of finding an invertible matrix P such that PA is in reduced
row-echelon form. Row reduction of [A, I₃] gives

    [ 2  0   2 | 1  0  0 ]              [ 1  0  0 |  1     1/2  -1/2 ]
    [ 0  1  -3 | 0  1  0 ]  →  ···  →   [ 0  1  0 | -3/2  -1/2   3/2 ]
    [ 2  1   1 | 0  0  1 ]              [ 0  0  1 | -1/2  -1/2   1/2 ]

Thus PA = I₃ is the reduced row-echelon form for A, and

    P = [  1     1/2  -1/2 ]
        [ -3/2  -1/2   3/2 ]
        [ -1/2  -1/2   1/2 ]
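Since A turns out to be invertible here, PA = I₃ simply says that P is the inverse of A, which can be verified by one exact multiplication. A quick Python check (ours):

```python
from fractions import Fraction as F

def mat_mul(A, B):
    # row-by-column rule; works for mixed int/Fraction entries
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 0, 2], [0, 1, -3], [2, 1, 1]]
P = [[F(1),     F(1, 2),  F(-1, 2)],
     [F(-3, 2), F(-1, 2), F(3, 2)],
     [F(-1, 2), F(-1, 2), F(1, 2)]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(mat_mul(P, A) == I3)   # True
```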
Exercises 3.6

1. Describe the elementary row operation that is associated with the given elementary
matrix M.

(a) M = [ 4  0  0 ]    (b) M = [ 1  0  0 ]
        [ 0  1  0 ]            [ 0  1  5 ]
        [ 0  0  1 ]            [ 0  0  1 ]

(c) M = [ 0  0  1 ]    (d) M = [ 1  3  0 ]
        [ 0  1  0 ]            [ 0  1  0 ]
        [ 1  0  0 ]            [ 0  0  1 ]
2. Determine which of the following matrices are in reduced row-echelon form.

(a) [ 1  0  3  0 ]    (b) [ 0  1  0  1 ]    (c) [ 0  1  1  1 ]
    [ 0  2  4  0 ]        [ 1  0  1  0 ]        [ 0  0  1  1 ]
    [ 0  0  1  0 ]        [ 0  0  0  0 ]        [ 0  0  0  1 ]

(d) [ 0  0  0  0 ]    (e) [ 0  0  0 ]    (f) [ 1  0  0  0 ]
    [ 1  0  0  0 ]        [ 1  0  0 ]        [ 1  1  0  0 ]
    [ 0  1  0  0 ]        [ 1  1  0 ]        [ 0  0  1  0 ]
3. Find the reduced row-echelon form for each of the matrices given in Problem 6 of
Exercises 3.5.
4. In each part of Problem 6 in Exercises 3.5, let A be the given matrix and find an
invertible matrix P such that PA is in reduced row-echelon form.
(a) [ 1  2   1 ]    (b) [ -4  -5   3 ]    (c) [ 1  0   1 ]    (d) [  1   2  1 ]
    [ 1  0   1 ]        [  3   3  -2 ]        [ 1  1   2 ]        [ -1  -1  1 ]
    [ 0  1  -1 ]        [ -1  -1   1 ]        [ 3  4  -2 ]        [  0   1  3 ]
6. Given

    A = [ 1  2 ] [ 0  1 ] [ 1  0 ]
        [ 0  1 ] [ 1  0 ] [ 0  3 ],

write Aᵀ as a product of elementary matrices.
7. If

    A = [  1  0   2 ]
        [ -1  1   0 ]
        [  0  1  -1 ]

is the matrix of transition from ℰ₃ to A, (a) find A and (b) find the transition
matrix from A to ℰ₃.
8. Prove that if M is obtained from I_m by multiplying row s of I_m by a, then MA
is obtained from A by multiplying row s by a.

9. Prove that if M is obtained from I_m by interchanging rows s and t of I_m, then
MA is obtained from A by interchanging rows s and t.
10. Given that the matrix
4 2 4
B = I6 -5
0 3 2
is obtained from A by the following elementary operations:
i. First, the second and third rows of A are interchanged.
ii. Next, the first column is multiplied by 2 and added to the second column.
Find the matrix A.
11. Prove that (Aᵀ)ᵀ = A.

12. Which types of elementary matrices are symmetric?

13. Prove that any two diagonal elements of a skew-symmetric matrix over R are
equal.

14. Prove or disprove.

(a) The set of all symmetric n × n matrices over R is closed under multiplication.
(b) The set of all skew-symmetric n × n matrices over R is closed under multiplication.

15. Prove that, for any matrix A over R, AAᵀ is defined and is a symmetric matrix.
16. Extend Theorem 3.27 to any product with a finite number of factors:

    (A₁A₂···A_k)ᵀ = A_kᵀ···A₂ᵀA₁ᵀ.
17. Prove that if A is symmetric, then Aᵏ is symmetric for any positive integer k.
18. Let A = [a_{ij}]_{n×n} with a_{ik} = a_{ij} + a_{jk} for all i, j, k.
The properties (1), (2), and (3) are known as the reflexive, symmetric, and
transitive properties, respectively.
We are interested here in equivalence relations on matrices. There are two such
relations that are intimately connected with the two preceding sections.
The term "column-equivalent" defines a relation on the set of all matrices over R.
The relation consists of the set of all ordered pairs (B, A) such that B = AQ for some
invertible matrix Q. It is clear that if A is m x n, then any matrix that is column-
equivalent to A is m x n.
Theorem 3.37 Let A and B be m × n matrices over R. For any given basis of Rᵐ,
A will be the matrix of transition from the given basis to a set A of n vectors, and B
will be the matrix of transition from the given basis to a set B of n vectors. Each of the
following statements implies the other three:

1. B is column-equivalent to A.
2. B may be obtained from A by a sequence of elementary column operations.
3. ⟨B⟩ = ⟨A⟩.
4. B may be obtained from A by a sequence of elementary operations.
Proof. Statement (3) is true if and only if (4) is true, according to Theorem 2.13.
In view of Theorem 3.19, (1) means the same as the assertion that B = AQ₁Q₂···Qₜ,
where each Qᵢ is elementary. It thus follows from the discussion following Definition
3.21 that (1) and (2) are equivalent. Thus the theorem will be proved if we show that
each of (1) and (4) implies the other.
Suppose first that B is column-equivalent to A. Then B = AQ₁Q₂···Qₜ, where
each Qᵢ is elementary. This means that AQ₁Q₂···Qₜ is the matrix of transition from
the given basis to B. And since A is the matrix of transition from the given basis to A,
Q₁Q₂···Qₜ is a matrix of transition from A to B. Each Qᵢ has an associated elementary
operation Eᵢ, and we have Eₜ···E₂E₁(A) = B. Thus (1) implies (4).
Assume now that B may be obtained from A by a sequence E₁, E₂, ..., Eₜ of elementary
operations Eᵢ. If Qᵢ is the elementary matrix associated with Eᵢ, then Q₁Q₂···Qₜ
is the matrix of transition from A to Eₜ···E₂E₁(A) = B. Since A is the matrix of
transition from the given basis to A, AQ₁Q₂···Qₜ is the matrix of transition from the
given basis to B. But the matrix of transition from the given basis to B is B, and this
matrix is unique. Hence B = AQ₁Q₂···Qₜ, and B is column-equivalent to A. ■
We consider now some examples that illustrate the quantities involved in Theorem
3.37 and its proof.
3.7 Row and Column Equivalence 101

Example 1 □ Consider the matrices

    A = | 2 0 1 |      B = | 1 -3 0 |
        | 1 1 0 |          | 0  1 2 |
        | 8 4 2 |          | 2 -2 8 |
        | 6 0 3 |          | 3 -9 0 |

and the corresponding sets of column vectors

    A = {(2, 1, 8, 6), (0, 1, 4, 0), (1, 0, 2, 3)}
    B = {(1, 0, 2, 3), (-3, 1, -2, -9), (0, 2, 8, 0)}.

Theorem 2.13 assures us that B and A span the same subspace of R^4 if and only if B
can be obtained from A by a sequence of elementary operations. By the end of the next
example, we can see that ⟨B⟩ = ⟨A⟩. ■
Example 2 □ For the matrices A and B in Example 1, we shall show that B is column-
equivalent to A and find an invertible matrix Q such that B = AQ. To show that B is
column-equivalent to A we need only confirm that their reduced column-echelon forms
are equal. But in order to find an invertible Q such that B = AQ, we need first to find
invertible matrices M1 and M2 such that AM1 = A' and BM2 = B' are both in reduced
column-echelon form. Using the same procedure as in Example 2 of Section 3.5, we
obtain the following results.

Column-reducing A while recording the same column operations in I3:

    | 2 0 1 |      | 1 0 2 |      | 1 0 0 |      | 1 0 0 |
    | 1 1 0 |  →   | 0 1 1 |  →   | 0 1 1 |  →   | 0 1 0 |  = A'
    | 8 4 2 |      | 2 4 8 |      | 2 4 4 |      | 2 4 0 |
    | 6 0 3 |      | 3 0 6 |      | 3 0 0 |      | 3 0 0 |

    | 1 0 0 |      | 0 0 1 |      | 0 0  1 |      | 0 0  1 |
    | 0 1 0 |  →   | 0 1 0 |  →   | 0 1  0 |  →   | 0 1 -1 |  = M1
    | 0 0 1 |      | 1 0 0 |      | 1 0 -2 |      | 1 0 -2 |

Thus

    M1 = | 0 0  1 |
         | 0 1 -1 |
         | 1 0 -2 |

is an invertible matrix such that AM1 = A'.
102 Chapter 3 Matrix Multiplication
Similarly, column-reducing B:

    | 1 -3 0 |      | 1 0 0 |      | 1 0 0 |
    | 0  1 2 |  →   | 0 1 2 |  →   | 0 1 0 |  = B'
    | 2 -2 8 |      | 2 4 8 |      | 2 4 0 |
    | 3 -9 0 |      | 3 0 0 |      | 3 0 0 |

    | 1 0 0 |      | 1 3 0 |      | 1 3 -6 |
    | 0 1 0 |  →   | 0 1 0 |  →   | 0 1 -2 |  = M2
    | 0 0 1 |      | 0 0 1 |      | 0 0  1 |

Thus

    M2 = | 1 3 -6 |
         | 0 1 -2 |
         | 0 0  1 |

is an invertible matrix such that

    BM2 = B' = | 1 0 0 |
               | 0 1 0 |
               | 2 4 0 |
               | 3 0 0 |
As mentioned earlier, the fact that A' = B' shows that B is column-equivalent to A.
The equation BM2 = AM1 implies that B = AM1M2^(-1), and thus Q = M1M2^(-1) is an
invertible matrix such that B = AQ. When these computations are performed, we find
that

    M2^(-1) = | 1 -3 0 |      and      Q = M1M2^(-1) = | 0  0  1 |
              | 0  1 2 |                               | 0  1  1 |
              | 0  0 1 |                               | 1 -3 -2 |

is an invertible matrix such that B = AQ. According to Theorem 3.37, the sets B and
A in Example 1 do indeed span the same subspace. ■
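Since every matrix in Examples 1 and 2 is given explicitly, the claim B = AQ can be checked mechanically. A small Python sketch of that check, using plain nested lists and a hand-rolled multiplication helper (our own convention, not the book's notation):

```python
# Matrices A, B and the transition matrices from Examples 1 and 2.
A = [[2, 0, 1], [1, 1, 0], [8, 4, 2], [6, 0, 3]]
B = [[1, -3, 0], [0, 1, 2], [2, -2, 8], [3, -9, 0]]
M1 = [[0, 0, 1], [0, 1, -1], [1, 0, -2]]        # AM1 = A'
M2_inv = [[1, -3, 0], [0, 1, 2], [0, 0, 1]]     # inverse of M2

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

Q = matmul(M1, M2_inv)      # Q = M1 * M2^(-1)
print(Q)                    # [[0, 0, 1], [0, 1, 1], [1, -3, -2]]
print(matmul(A, Q) == B)    # True: B = AQ, so B is column-equivalent to A
```

The comparison with `==` is exact because all entries are integers.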
Example 3 □ Suppose now that we wish to find elementary matrices Q1, Q2, ..., Qt
such that B = AQ1Q2 · · · Qt as described in the proof of Theorem 3.37. These elementary
matrices can be obtained from the work in Example 2 because both A' and B'
were found by performing a single elementary column operation in each step. It is easy
to see that the work done in Example 2 with A can be represented by

    A → AE1 → AE1E2 → AE1E2E3 = A'
    I3 → I3E1 → I3E1E2 → I3E1E2E3 = M1

where

    E1 = | 0 0 1 |      E2 = | 1 0 -2 |      E3 = | 1 0  0 |
         | 0 1 0 |           | 0 1  0 |           | 0 1 -1 |
         | 1 0 0 |           | 0 0  1 |           | 0 0  1 |

Similarly, the work with B is represented by

    B → BF1 → BF1F2 = B'
    I3 → I3F1 → I3F1F2 = M2

where

    F1 = | 1 3 0 |      F2 = | 1 0  0 |
         | 0 1 0 |           | 0 1 -2 |
         | 0 0 1 |           | 0 0  1 |

Therefore

    B = AM1M2^(-1)
      = AE1E2E3(F1F2)^(-1)
      = AE1E2E3F2^(-1)F1^(-1)

so that

    B = A | 0 0 1 | | 1 0 -2 | | 1 0  0 | | 1 0 0 | | 1 -3 0 |
          | 0 1 0 | | 0 1  0 | | 0 1 -1 | | 0 1 2 | | 0  1 0 |
          | 1 0 0 | | 0 0  1 | | 0 0  1 | | 0 0 1 | | 0  0 1 |
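The factorization B = AQ1Q2 · · · Qt can be exercised numerically. A sketch in plain Python, applying the five elementary matrices of this example on the right, one per column operation (the comments name the operation each matrix performs):

```python
A = [[2, 0, 1], [1, 1, 0], [8, 4, 2], [6, 0, 3]]
B = [[1, -3, 0], [0, 1, 2], [2, -2, 8], [3, -9, 0]]
E1 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]       # swap columns 1 and 3
E2 = [[1, 0, -2], [0, 1, 0], [0, 0, 1]]      # add -2*(column 1) to column 3
E3 = [[1, 0, 0], [0, 1, -1], [0, 0, 1]]      # add -1*(column 2) to column 3
F2_inv = [[1, 0, 0], [0, 1, 2], [0, 0, 1]]   # inverse of F2
F1_inv = [[1, -3, 0], [0, 1, 0], [0, 0, 1]]  # inverse of F1

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

prod = A
for Qi in (E1, E2, E3, F2_inv, F1_inv):
    prod = matmul(prod, Qi)   # right-multiplication = one column operation
print(prod == B)              # True
```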
Similar to the situation with Definition 3.35, the term "row-equivalent" defines a
relation on the set of all matrices over R. It is left as an exercise to prove that this
relation is an equivalence relation.
We have seen in Section 3.6 that multiplication of a given matrix on the left by a
product of elementary matrices yields the same result as the application of a sequence
of elementary row operations to the matrix. In combination with Theorem 3.19, this
shows that the following conditions are equivalent:
1. B is row-equivalent to A;
Proof. This theorem follows from Theorem 3.31 and the discussion of uniqueness
just before Definition 3.32. ■
Exercises 3.7

1. In each of the following, determine whether B is row-equivalent to A and whether
   B is column-equivalent to A.

   (a) A = |  1  3 |      B = | 3 -6 |
           | -2 -6 |          | 2 -4 |

   (b) A = |  3 2 |      B = |  5 1 |
           |  6 4 |          | 10 2 |
           | 13 9 |          |  9 2 |
   (c) A = |  1  2 |      B = | 1 0 -1 |
           | -2 -4 |          | 4 1 -2 |
           |  1  0 |          | 3 1 -1 |
           | -1  0 |          | 2 1  0 |

   (d) A = | 1 0 0 1 |      B = | 1 0 0 1 |
           | 1 1 0 2 |          | 2 1 0 2 |
           | 0 1 1 2 |          | 2 2 3 1 |
           | 0 0 1 1 |          | 1 1 3 1 |
   A = |  3 2 |      B = |  5 1 |      P1 =        P2 = |  0  1 -1 |
       |  6 4 |          | 10 2 |                       |  1 -5  5 |
       | 13 9 |          |  9 2 |                       | -2  1  0 |

   A = | 1 3  5 |      B = | 1 0 -1 |      P1 = |  7 -3 0 |
       | 2 7 12 |          | 2 1  0 |           | -2  1 0 |
       | 3 4  5 |          | 1 2  3 |           | 13  5 1 |
5. For each pair of matrices in Problem 1 that are column-equivalent, find an invertible
   matrix Q such that B = AQ.
6. For each pair of matrices in Problem 1 that are row-equivalent, find an invertible
matrix P such that B = PA.
7. For each pair of matrices in Problem 1 that are column-equivalent, find elementary
   matrices Q1, Q2, ..., Qt such that B = AQ1Q2 · · · Qt.
8. Prove that the relation defined by "row-equivalent" is an equivalence relation on
the set of all matrices over R.
9. Justify your answer for each of the following questions.
It is easy to see that A is the matrix of transition from ε_m to a spanning set of the
column space of A.
Proof. Let A = [a_ij]_{m×n} have rank r, and let A' = [a'_ij]_{m×n} be the reduced column-
echelon form of A. Let A = {v1, v2, ..., vn}, where vj = (a_1j, a_2j, ..., a_mj), and let
A' = {v'1, v'2, ..., v'n}, where v'j = (a'_1j, a'_2j, ..., a'_mj). Then ⟨A⟩ and ⟨A'⟩ are the column
spaces of A and A', respectively. It is clear from conditions (1) and (2) of Definition
3.22 that {v'1, v'2, ..., v'r} is linearly independent and therefore is a basis of ⟨A'⟩. But
⟨A⟩ = ⟨A'⟩ by Theorem 3.37 since A and A' are column-equivalent. Thus ⟨A⟩ has
dimension r. ■
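The proof suggests an algorithm: column-reduce and count the nonzero columns. A rough Python sketch using exact rational arithmetic; the helper name and column-wise layout are our own conventions, and the test matrix is the A of Example 1 in Section 3.7, whose reduced column-echelon form has two nonzero columns:

```python
from fractions import Fraction

def column_echelon_rank(M):
    """Column-reduce a copy of M (list of rows) and count nonzero columns."""
    m, n = len(M), len(M[0])
    # Work on the columns directly, with exact Fraction arithmetic.
    cols = [[Fraction(M[i][j]) for i in range(m)] for j in range(n)]
    rank, row = 0, 0
    while row < m and rank < n:
        pivot = next((j for j in range(rank, n) if cols[j][row] != 0), None)
        if pivot is not None:
            cols[rank], cols[pivot] = cols[pivot], cols[rank]
            piv = cols[rank][row]
            cols[rank] = [x / piv for x in cols[rank]]       # normalize pivot
            for j in range(n):
                if j != rank and cols[j][row] != 0:          # clear the row
                    factor = cols[j][row]
                    cols[j] = [a - factor * b
                               for a, b in zip(cols[j], cols[rank])]
            rank += 1
        row += 1
    return rank

A = [[2, 0, 1], [1, 1, 0], [8, 4, 2], [6, 0, 3]]
print(column_echelon_rank(A))   # 2
```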
Proof. Since A and AQ are column-equivalent, their column spaces are equal. Hence
their ranks are equal, by the theorem. ■
Theorem 3.46 Let A be an m x n matrix over R, and let r be the rank of A. There
exist invertible matrices P and Q such that PAQ has the first r diagonal elements equal
to 1, and all other elements zero.
3.8 Rank and Equivalence 107
Figure 3.6
According to Theorem 1.32, the linearly independent set {v'1, v'2, ..., v'r} can be
extended to a basis B = {w1, w2, ..., wm} of R^m, where wi = v'i for i = 1, 2, ..., r. Let P
be the invertible m × m matrix of transition from B to ε_m (see Figure 3.6). Then PAQ
is the matrix of transition from B to A'. Since the first r vectors of A' are the same as
the first r vectors of B, the first r columns of PAQ are the same as the first r columns
of Im. And since the last n - r vectors of A' are zero, the last n - r columns of PAQ
are zero. Thus

    PAQ = | Ir 0 |
          | 0  0 |
applying a sequence of row operations. Thus one may proceed to use column operations
on the stacked matrix

    | A  |                  | A' |
    | In |    to obtain     | Q  |

as we did in Section 3.5, and then use row operations on [A', Im] to obtain [Dr, P],
where Dr = PA' = PAQ. This is illustrated in the following example.
Consider

    A = |  1 2 -1 |
        |  3 6 -3 |
        | -1 1  0 |
        |  2 4 -2 |

First we use column operations on A, recording the same operations in I3, to obtain
A' and Q:

    |  1 2 -1 |      |  1 0  0 |      | 1 0 0 |
    |  3 6 -3 |  →   |  3 0  0 |  →   | 3 0 0 |  = A'
    | -1 1  0 |      | -1 3 -1 |      | 0 1 0 |
    |  2 4 -2 |      |  2 0  0 |      | 2 0 0 |

    | 1 0 0 |      | 1 -2 1 |      |  0 -1 1 |
    | 0 1 0 |  →   | 0  1 0 |  →   |  0  0 1 |  = Q
    | 0 0 1 |      | 0  0 1 |      | -1 -1 3 |
Next we use row operations to transform [A', I4] into [Dr, P], where Dr = PA' =
PAQ:

    [A', I4] = | 1 0 0 | 1 0 0 0 |      | 1 0 0 |  1 0 0 0 |
               | 3 0 0 | 0 1 0 0 |  →   | 0 0 0 | -3 1 0 0 |
               | 0 1 0 | 0 0 1 0 |      | 0 1 0 |  0 0 1 0 |
               | 2 0 0 | 0 0 0 1 |      | 0 0 0 | -2 0 0 1 |

                                        | 1 0 0 |  1 0 0 0 |
                                    →   | 0 1 0 |  0 0 1 0 |  = [Dr, P]
                                        | 0 0 0 | -3 1 0 0 |
                                        | 0 0 0 | -2 0 0 1 |
Thus

    P = |  1 0 0 0 |      and      Q = |  0 -1 1 |
        |  0 0 1 0 |                   |  0  0 1 |
        | -3 1 0 0 |                   | -1 -1 3 |
        | -2 0 0 1 |

are invertible matrices such that

    PAQ = | 1 0 0 |
          | 0 1 0 |  =  | I2 0 |  = D2.
          | 0 0 0 |     | 0  0 |
          | 0 0 0 |
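The end result of this example can be verified by straight multiplication. A minimal sketch with the matrices of the example typed in by hand:

```python
A = [[1, 2, -1], [3, 6, -3], [-1, 1, 0], [2, 4, -2]]
P = [[1, 0, 0, 0], [0, 0, 1, 0], [-3, 1, 0, 0], [-2, 0, 0, 1]]
Q = [[0, -1, 1], [0, 0, 1], [-1, -1, 3]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# P is 4x4, A is 4x3, Q is 3x3, so PAQ is 4x3 with I2 in the top left corner.
D2 = matmul(matmul(P, A), Q)
print(D2)   # [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
```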
Theorem 3.47 Let A and B be m × n matrices over R. Then B is equivalent to A if
and only if B and A have the same rank.

Proof. Let r denote the rank of A, and let r' denote the rank of B.
Suppose that r = r'. Then there are invertible matrices P, Q, P', and Q' such that
PAQ = Dr = Dr' = P'BQ'. The matrices P and P' are m × m, and the matrices Q
and Q' are n × n. Hence the equation PAQ = P'BQ' implies that

    (P')^(-1)PAQ(Q')^(-1) = B,

so that B is equivalent to A.
Suppose now that B is equivalent to A. Then Dr and Dr' are equivalent, and there
are invertible matrices P and Q such that PDrQ = Dr', so that

    PDr = Dr'Q^(-1).

Hence r' = rank(Dr') = rank(Dr'Q^(-1)) = rank(PDr) ≤ r. Similarly,

    r = rank(Dr) = rank(DrQ) = rank(P^(-1)Dr') ≤ r'

so that r = r'. ■
The following theorem and corollary are extremely useful in connection with the
solution of systems of linear equations in Chapter 4.
Theorem 3.48 Let A = [a_ij]_{m×n} over R. Then A and A^T have the same rank.

Proof. Let r denote the rank of A, and let P and Q be invertible matrices such that

    PAQ = | Ir 0 |  = Dr.
          | 0  0 |

Now Q^T and P^T are invertible (Theorem 3.30), and Q^T A^T P^T = Dr^T. The dimension
of the column space of Dr^T is clearly r, so Dr^T has rank r. But A^T and Dr^T are
equivalent, so A^T must have rank r also. ■
Corollary 3.49 If A has rank r, then r is the number of nonzero rows in the reduced
row-echelon form of A.
Proof. Suppose that A has rank r, and let P be an invertible matrix such that
PA is in reduced row-echelon form. By Theorem 3.29, (PA)^T = A^T P^T is in reduced
column-echelon form. Since A, A^T, and A^T P^T all have the same rank r, the number of
nonzero columns in (PA)^T = A^T P^T is r. But the number of nonzero columns in (PA)^T
is the same as the number of nonzero rows in PA. Hence the corollary is proved. ■
This result leads to the next corollary. The proof is requested in Problem 14 of the
exercises.
Exercises 3.8

1. Find the rank of each of the following matrices.

   (a) | 2 4 1 |      (b) |  3 2  3 |
       | 1 2 3 |          | -2 1  4 |
       | 0 0 5 |          | -7 0  5 |
       | 3 6 0 |          |  5 1 -1 |

   (c) | 1 0  2 |      (d) | 1  2  3 |
       | 4 1  3 |          | 2 -4 -6 |
       | 3 1  1 |          | 0  2  4 |
       | 2 1 -1 |          | 0  0  0 |
2. Find the standard basis of the column space of each matrix A in Problem 1.
r
i o-i
η 3 2
. 1- 3- 2- . , 3_ - 6_ 7. , , , , 4 1 - 2
6 4 , D
~ =
A=
2 -6 4
, B
* =
2 -4 5 , c~ =
13 9 ' 3 1 - 1
2 1 0
6. Answer the following questions for the matrices

       A = | 2 1 -3 |      B = | 1 1 -1 |
           | 1 2  0 |          | 2 3 -1 |
           | 1 1 -1 |          | 3 3 -3 |
7. For A = | 1 2 1 |  it is given that  Q = | 1 -2  3 |  is an invertible matrix
           | 2 4 2 |                        | 0  1 -2 |
           | 0 1 2 |                        | 0  0  1 |

   such that AQ = A' = | 1 0 0 |.  Let A' = {(1, 2, 0), (0, 0, 1), (0, 0, 0)}. Find a
                       | 2 0 0 |
                       | 0 1 0 |

   basis B of R^3 such that the matrix of transition from B to A' is D2 = | 1 0 0 |,
                                                                          | 0 1 0 |
                                                                          | 0 0 0 |

   and an invertible P such that PAQ = D2. (Hint: See the proof of Theorem 3.46.)
8. For each matrix A below, follow the proof of Theorem 3.46 step by step to find
   invertible matrices P and Q such that PAQ = Dr. In your development, write
   out the sets A, A', and B.

   (a) A = | 1 0 0 1 |      (b) A = | 3  2  1 0 |
           | 1 1 0 2 |              | 4 -3 -2 0 |
           | 0 1 1 2 |              | 1  0  3 1 |
           | 0 0 1 1 |              | 3 -3  1 1 |
9. For each matrix A, find invertible matrices P and Q such that PAQ has the first
   r elements of the main diagonal equal to 1, and all other elements 0.

   (a) A = | 0 2 4 1 |      (b) A = | -1 |
           | 0 1 2 0 |              |  2 |
           | 0 3 6 1 |              | -3 |
10. In Problem 1 above, let A be the matrix in part (c), and let B be the matrix in
part (d). Given that A and B are equivalent, find invertible matrices P and Q
such that B = PAQ. (Hint: See the proof of Theorem 3.47.)
11. In Problem 1 above, let A be the matrix in part (a), and let B be the matrix in
part (b). Given that A and B are equivalent, find invertible matrices P and Q
such that B = PAQ.
12. Prove that the relation "equivalence of matrices" is an equivalence relation on the
set of all matrices over R.
13. Prove that if B is conformable to A, then rank(AB) ≤ min{rank(A), rank(B)}.
14. Prove Corollary 3.50.
Chapter 4

Vector Spaces, Matrices, and Linear Equations

4.1 Introduction
As promised earlier, the preceding results will now be extended to more general
situations. This extension is followed by an application of these results to the solution
of systems of linear equations.
Definition 4.1 Suppose that F is a set of elements in which a relation of equality and
operations of addition and multiplication, denoted by + and ·, respectively, are defined.
Then F is a field with respect to these operations if the conditions below are satisfied
for all a, b, c in F:

1. a + b is in F. (Closure property for addition)
2. (a + b) + c = a + (b + c). (Associative property of addition)
3. There is an element 0 in F such that a + 0 = a for every a in F. (Additive
   identity)
4. For each a in F, there is an element -a in F such that a + (-a) = 0. (Additive
   inverses)
5. a + b = b + a. (Commutative property of addition)
6. a · b is in F. (Closure property for multiplication)
7. (a · b) · c = a · (b · c). (Associative property of multiplication)
114 Chapter 4 Vector Spaces, Matrices, and Linear Equations
The results in Chapter 1 after Definition 1.23 apply only to finite-dimensional vector
spaces. If R is replaced by F and R^n is replaced by a finite-dimensional vector space V
over F, Theorems 1.24, 1.26, and 1.27 and Definition 1.28 remain valid with the proofs
unchanged except for notation. In particular, any two bases of a finite-dimensional
vector space have the same number of elements, and the number of elements in a basis
is the dimension of the vector space. In Theorems 1.31 through 1.34, R^n is replaced
by an n-dimensional vector space V and W is a subspace of V.
Let us consider now some examples of vector spaces. In each case, F denotes a field.
Example 1 □ For a fixed positive integer n, let F^n denote the set of all n-tuples
(u1, u2, ..., un) with ui in F. Two elements u = (u1, u2, ..., un) and v = (v1, v2, ..., vn)
are equal if and only if ui = vi for i = 1, 2, ..., n. With addition and scalar multiplication
defined in F^n by

    u + v = (u1 + v1, u2 + v2, ..., un + vn)
    a(u1, u2, ..., un) = (au1, au2, ..., aun),

the same techniques used to prove Theorem 1.3 in Section 1.2 can be used to prove that
F^n is a vector space over F. Denoting the multiplicative identity in F by 1, the vectors

    (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, 0, ..., 1)

can be shown to form a basis of F^n, and therefore F^n has dimension n over F. For
n = 1, we can identify (a1) with a1. That is, F is a vector space of dimension one over
F. ■
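A minimal sketch of the F^n operations, taking F = R so that ordinary Python numbers can stand in for field elements (the helper names are our own):

```python
def vec_add(u, v):
    """Componentwise addition in F^n."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def scal_mul(a, u):
    """Scalar multiplication in F^n."""
    return tuple(a * ui for ui in u)

u, v = (1, 2, 3), (4, 5, 6)
print(vec_add(u, v))        # (5, 7, 9)
print(scal_mul(2, u))       # (2, 4, 6)

# The basis vectors (1,0,0), (0,1,0), (0,0,1) of F^3:
e = [tuple(1 if i == j else 0 for j in range(3)) for i in range(3)]
# Every u is the combination u1*e1 + u2*e2 + u3*e3:
combo = vec_add(vec_add(scal_mul(u[0], e[0]), scal_mul(u[1], e[1])),
                scal_mul(u[2], e[2]))
print(combo == u)           # True
```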
Example 2 □ For a fixed nonnegative integer n, let Pn denote the set of all polynomials
of the form

    a0 + a1x + a2x² + ··· + anx^n

with each ai in F. We shall refer to this set as "Pn over F." That is, Pn over F is the
set of all polynomials Σ_{i=0}^{n} ai x^i in the variable x with coefficients in F and degree less
than or equal to n.¹ Let

    p(x) = a0 + a1x + ··· + anx^n

and

    q(x) = b0 + b1x + ··· + bnx^n

¹Note that we are using a0 interchangeably with a0x⁰ in the sigma notation.
denote two elements of Pn. Then p(x) and q(x) are equal if and only if ai = bi for
i = 0, 1, 2, ..., n. With addition and scalar multiplication defined in the usual ways by

    p(x) + q(x) = (a0 + b0) + (a1 + b1)x + ··· + (an + bn)x^n

and

    cp(x) = ca0 + ca1x + ··· + canx^n,

it is easy to verify that Pn is a vector space over F with the zero polynomial

    0 = 0 + 0x + 0x² + ··· + 0x^n

as its additive identity. With p(x) as given above, its additive inverse is the polynomial

    -p(x) = -a0 - a1x - ··· - anx^n = (-1)p(x).

The set B of n + 1 polynomials given by

    B = {1, x, x², ..., x^n}

spans Pn since an arbitrary p(x) = Σ_{i=0}^{n} ai x^i is automatically a linear combination of
vectors in B:

    p(x) = a0(1) + a1(x) + a2(x²) + ··· + an(x^n).

Also, a linear combination of vectors in B yields the zero polynomial 0 if and only if all
coefficients are zero. Thus B is linearly independent and forms a basis of Pn. It follows
from this fact that all bases of Pn have n + 1 elements, and Pn is of dimension n + 1. ■
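Polynomials in Pn can be modeled as coefficient lists, which makes the basis argument concrete. A small sketch (the list encoding is our own convention, not the book's notation):

```python
# Polynomials in P_n represented by coefficient lists [a0, a1, ..., an].
def poly_add(p, q):
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    return [c * a for a in p]

# n = 2: p(x) = 1 + 2x + 3x^2, q(x) = 4 - x.
p, q = [1, 2, 3], [4, -1, 0]
print(poly_add(p, q))       # [5, 1, 3]
print(poly_scale(2, p))     # [2, 4, 6]

# The basis {1, x, x^2} corresponds to the coefficient lists below, so
# dim P_2 = 3 (that is, n + 1), matching the count in the text.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
combo = [0, 0, 0]
for coeff, b in zip(p, basis):
    combo = poly_add(combo, poly_scale(coeff, b))
print(combo == p)           # True
```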
For our next two examples, we draw on topics from the calculus.
Example 3 □ Let V denote the set of all infinite sequences {an} of real numbers, with
equality, addition, and scalar multiplication defined termwise. The additive inverse of
{an} is

    -{an} = {-an} = (-1){an}.

We are not equipped in this text to prove it, but this vector space V is of infinite
dimension over R. ■
4.2 Vector Spaces 117
Example 4 □ Let V be the set of all real-valued functions of the real variable t with
domain the set R of all real numbers. Two functions f and g in V are equal if and
only if f(t) = g(t) for all real numbers t. With respect to the ordinary operations of
addition

    (f + g)(t) = f(t) + g(t)

and scalar multiplication

    (cf)(t) = c(f(t))

used in the calculus, the set V is a vector space over R. The zero vector in V is the
constant function that is identically zero for all values of the variable t. The additive
inverse of f is the function -f given by

    (-f)(t) = -(f(t)).

We are not equipped to prove it here, but this V is an infinite-dimensional vector space
over R. ■
Example 5 □ Let F_{m×n} denote the set of all m by n matrices A = [a_ij]_{m×n} with
elements a_ij in F. With addition and scalar multiplication defined by

    A + B = [a_ij + b_ij]_{m×n}    and    cA = [ca_ij]_{m×n},

F_{m×n} is a finite-dimensional vector space over F. The additive identity in F_{m×n} is the
zero matrix

    O_{m×n} = | 0 0 ··· 0 |
              | 0 0 ··· 0 |  = [0]_{m×n},
              | ⋮         |
              | 0 0 ··· 0 |

and the additive inverse of A = [a_ij]_{m×n} is the matrix

    -A = [-a_ij]_{m×n} = (-1)A.

In many instances, we shall also use the zero vector symbol 0 to indicate a zero matrix.
As illustrations of the operations defined in Example 5, we have

    | 1 -2  0 |  +  |  6  5 -9 |  =  | 7 3 -9 |
    | 7  4 -3 |     | -5 -1  8 |     | 2 3  5 |

and

    3 | 5 -4 7 |  =  | 15 -12 21 |
      | 1  0 2 |     |  3   0  6 |

in the vector space R_{2×3} of all 2 × 3 matrices over R.
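The two illustrations can be reproduced in a few lines of Python; the scalar 3 in the second computation is inferred from the displayed result:

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

# The two illustrations from the text, in R_{2x3}:
S = mat_add([[1, -2, 0], [7, 4, -3]], [[6, 5, -9], [-5, -1, 8]])
print(S)                    # [[7, 3, -9], [2, 3, 5]]
T = mat_scale(3, [[5, -4, 7], [1, 0, 2]])
print(T)                    # [[15, -12, 21], [3, 0, 6]]
```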
Our last example in this section demonstrates that the operations used in a set to
form a vector space are not unique and do not have to be defined in a certain way. Some
operations that can be used as addition and scalar multiplication may look somewhat
strange when first encountered.
Example 6 □ Let V be the set of all ordered pairs of real numbers with the usual
equality:

    (x1, x2) = (y1, y2) if and only if x1 = y1 and x2 = y2.

Addition and scalar multiplication are defined in V as follows:

    (x1, x2) + (y1, y2) = (x1 + y1 + 1, x2 + y2 + 1)
    c(x1, x2) = (c + cx1 - 1, c + cx2 - 1).

We shall systematically check the ten conditions required by Definition 4.2 in order that
V be a vector space over R. To this end, let u = (u1, u2), v = (v1, v2), and w = (w1, w2)
be arbitrary elements in V, and let a and b represent arbitrary real numbers.

(1) The sum

    u + v = (u1 + v1 + 1, u2 + v2 + 1)

is in V since both u1 + v1 + 1 and u2 + v2 + 1 are real numbers.
(2) We have

    (u + v) + w = (u1 + v1 + 1, u2 + v2 + 1) + (w1, w2)
                = (u1 + v1 + w1 + 2, u2 + v2 + w2 + 2)

and

    u + (v + w) = (u1, u2) + (v1 + w1 + 1, v2 + w2 + 1)
                = (u1 + (v1 + w1 + 1) + 1, u2 + (v2 + w2 + 1) + 1)
                = (u1 + v1 + w1 + 2, u2 + v2 + w2 + 2).

Thus addition in V is associative.

(3) The element (-1, -1) in V is an additive identity since

    (v1, v2) + (-1, -1) = (v1 - 1 + 1, v2 - 1 + 1)
                        = (v1, v2).

(4) For each v = (v1, v2) in V, the element (-v1 - 2, -v2 - 2) is an additive inverse
of v since

    (v1, v2) + (-v1 - 2, -v2 - 2) = (v1 - v1 - 2 + 1, v2 - v2 - 2 + 1)
                                  = (-1, -1)
                                  = 0.

(5) Addition in V is commutative since

    u + v = (u1 + v1 + 1, u2 + v2 + 1)
          = (v1 + u1 + 1, v2 + u2 + 1)
          = v + u.
(9) We have

    a(u + v) = a(u1 + v1 + 1, u2 + v2 + 1)
             = (a + au1 + av1 + a - 1, a + au2 + av2 + a - 1)
             = (a + au1 - 1, a + au2 - 1) + (a + av1 - 1, a + av2 - 1)
             = au + av.

(10) We have

    1 · v = (1 + 1(v1) - 1, 1 + 1(v2) - 1)
          = (v1, v2).

We have verified all the conditions in Definition 4.2, so this set V is a vector space
over R with respect to these operations, even though these operations are dramatically
different from the standard operations in the familiar vector space R². ■
Exercises 4.2
1. Verify that the set in the indicated example from this section is actually a vector
space.
(a) Example 1 (b) Example 2 (c) Example 3 (d) Example 4
(e) Example 5
2. Write out a basis for the vector space R_{3×2}.
3. What is the dimension of F_{m×n}?
4. Find a set that forms a basis for the vector space V in Example 6 of this section,
and prove that your set is a basis for V.
In Problems 5-16, assume that equality in the given set is the same as in the example of
this section that involves the same elements, and determine if the given set is a vector
space over R with respect to the operations defined in the problem. If it is not, list all
conditions in Definition 4.2 that fail to hold.
5. The set V of all ordered pairs of real numbers with operations defined by
6. The set V of all ordered pairs of real numbers with operations defined by
7. The set V of all ordered pairs of positive real numbers with operations defined by
8. The set W of all p(x) in Example 2 of this section that have zero constant term,
with operations as defined in Example 2.
9. The set V of all ordered triples of real numbers with operations defined by
10. The set V of all ordered triples of real numbers with operations defined by
11. The set V of all diagonal matrices in R_{n×n} with operations the same as those
    defined in Example 5.
12. The set V of all skew-symmetric matrices in R_{n×n} with operations the same as
    those defined in Example 5.
13. The set V of all ordered pairs of real numbers with operations defined by
    c(x1, x2) = (cx1, cx2).
14. The set V of all ordered pairs of real numbers with operations defined by
15. The set W of all f in Example 4 that are differentiable (that is, have a derivative
    at every real number), with operations as defined in Example 4.
16. The set W of all f in Example 4 such that f(0) = 0, with operations as defined
    in Example 4.
18. Prove that the additive inverse -v of an element v in a vector space is unique.
19. Prove that -v = (-1)v for an arbitrary vector v in a vector space.
20. The vector u - v is defined as the vector w that satisfies the equation v + w = u.
    Prove that u - v = u + (-1)v.
21. Let V be a vector space over F.

    rA + sB = [c_ij]_{n×n},
4.3 Subspaces and Related Concepts 123
where c_ij = ra_ij + sb_ij for i = 1, 2, ..., n; j = 1, 2, ..., n. Thus

    c_ji = ra_ji + sb_ji = ra_ij + sb_ij = c_ij

for all pairs i, j. This means that rA + sB is symmetric and hence a member of W.
Therefore W is a subspace of R_{n×n}. ■
For a nonempty subset A of a general vector space V, the set ⟨A⟩ is the set of all
linear combinations of vectors in A. The same development used in Section 1.3 applies
in V, and ⟨A⟩ is the subspace spanned by A. As in Section 1.3, ⟨0⟩ is the zero
subspace {0} of V.
Example 2 □ In the vector space R_{2×2}, consider the sets of vectors A = {A1, A2} and
B = {B1, B2}, where

    A1 = | 2 -1 | ,  A2 = | 1 2 | ,  B1 = |  3 -4 | ,  B2 = | 4 3 |
         | 0  1 |         | 1 0 |         | -1  2 |         | 2 1 |

To show that B1 is in ⟨A⟩, we seek scalars c1 and c2 such that

    c1 | 2 -1 |  +  c2 | 1 2 |  =  |  3 -4 |
       | 0  1 |        | 1 0 |     | -1  2 |

Equating corresponding entries leads to the system

     2c1 +  c2 =  3
     -c1 + 2c2 = -4
            c2 = -1
      c1       =  2,

so B1 = 2A1 - A2. Similarly, we find that B2 = A1 + 2A2. Thus we have proved that
⟨A⟩ = ⟨B⟩. ■
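The relations B1 = 2A1 - A2 and B2 = A1 + 2A2 can be confirmed directly. A minimal sketch:

```python
def mat_comb(c1, M1, c2, M2):
    """c1*M1 + c2*M2 for matrices given as lists of rows."""
    return [[c1 * a + c2 * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(M1, M2)]

A1 = [[2, -1], [0, 1]]
A2 = [[1, 2], [1, 0]]
B1 = [[3, -4], [-1, 2]]
B2 = [[4, 3], [2, 1]]

print(mat_comb(2, A1, -1, A2) == B1)   # True: B1 = 2*A1 - A2
print(mat_comb(1, A1, 2, A2) == B2)    # True: B2 = A1 + 2*A2
```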
Just after Example 3 in Section 1.5, we noted that a basis of a subspace can be refined
from a finite spanning set for the subspace by deleting all vectors in the spanning set
that are linear combinations of preceding vectors. This procedure is illustrated in the
following example.
Example 3 □ Let

    A = {p1(x), p2(x), p3(x), p4(x), p5(x)}.

After deleting p2(x) we are left with the spanning set

    {p1(x), p3(x), p4(x), p5(x)}

of ⟨A⟩. We see that p3(x) is not a multiple of p1(x), so we then check to see if p4(x) is
a linear combination of p1(x) and p3(x). It is easy to discover that

    p4(x) = p1(x) + p3(x),

so p4(x) can be deleted from the last spanning set of ⟨A⟩ we obtained, leaving

    {p1(x), p3(x), p5(x)}

as a spanning set of ⟨A⟩. Setting up the equation

    p5(x) = c1p1(x) + c2p3(x)

leads to

    1 + x² = c1(1 + x + 2x²) + c2(1 + x² + x³)
           = (c1 + c2) + c1x + (2c1 + c2)x² + c2x³

and the resulting system of equations

    c1 + c2  = 1
    c1       = 0
    2c1 + c2 = 1
    c2       = 0,

which is inconsistent. Hence p5(x) is not a linear combination of p1(x) and p3(x), and
{p1(x), p3(x), p5(x)} is a basis of ⟨A⟩. ■
The concept of a transition matrix from one finite set of vectors to another applies
in an arbitrary vector space. We consider an example now in the vector space P2 of all
polynomials in x with degree ≤ 2 and coefficients in R.

Example 4 □ Consider the bases A = {p1(x), p2(x), p3(x)} and B = {q1(x), q2(x), q3(x)}
of P2 over R, where

    p1(x) = 1 + x,   p2(x) = x,   p3(x) = x + x²,
    q1(x) = 1,   q2(x) = 1 + x,   q3(x) = 1 + x + x².

We shall find the matrix of transition A from A to B. In the matrix A = [a_ij]_{3×3}, the jth
column is the coordinate matrix of qj(x) with respect to A. Thus we must write each
qj(x) as a linear combination of the polynomials in A. The required linear combinations
are given by

    1          = (1)(1 + x) + (-1)(x) + (0)(x + x²),
    1 + x      = (1)(1 + x) + (0)(x) + (0)(x + x²),
    1 + x + x² = (1)(1 + x) + (-1)(x) + (1)(x + x²).

That is,

    q1(x) = (1)p1(x) + (-1)p2(x) + (0)p3(x),
    q2(x) = (1)p1(x) + (0)p2(x) + (0)p3(x),
    q3(x) = (1)p1(x) + (-1)p2(x) + (1)p3(x).

Thus the matrix of transition from A to B is

    A = |  1 1  1 |
        | -1 0 -1 |
        |  0 0  1 |

We note that the coordinates of the qj(x) are entered as columns of A, not as rows of
A. ■
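The transition matrix can be checked by expanding each linear combination in coefficient form. A sketch in which polynomials are coefficient lists [constant, x, x²] (our own encoding), with p1 = 1 + x, p2 = x, p3 = x + x² and q1 = 1, q2 = 1 + x, q3 = 1 + x + x² as in the example:

```python
# Polynomials as coefficient lists [constant, x, x^2].
p = [[1, 1, 0], [0, 1, 0], [0, 1, 1]]   # p1 = 1+x, p2 = x, p3 = x+x^2
q = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]   # q1 = 1, q2 = 1+x, q3 = 1+x+x^2

A = [[1, 1, 1],
     [-1, 0, -1],
     [0, 0, 1]]   # column j holds the coordinates of q_j relative to {p1,p2,p3}

for j in range(3):
    combo = [sum(A[i][j] * p[i][k] for i in range(3)) for k in range(3)]
    print(combo == q[j])    # True for each j
```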
Exercises 4.3
1. Determine whether the given set of vectors is linearly dependent or linearly
   independent in the vector space P2 over R.
   (a) {1 - 2x, x + x², 1 - x + x²}   (b) {1 + x, x + x², 1 + 2x + x²}
   (c) {1 - x², x + 2x², 1 + 3x²}   (d) {1 + x, 1 + x², 1 + x + x²}
   (e) {1 - 2x, 2 - x², 4 + x², 1 + x + x²}   (f) {1 + x, 1 - x}
2. Determine whether the given set of polynomials spans P2 over R.
   (a) {1 - x, 1 + x, 1 - x²}   (b) {1 - x, x + x², 1 + x²}
   (c) {1 + x, x + x², 1 + 2x + x², 1 - x²}   (d) {1 - x, x + x², 1 + x + x², 1 + x²}
3. Determine whether the given set of vectors in R_{2×2} is linearly dependent or linearly
   independent.

   (a) | 1 1 | , | 0 1 | , | 1 0 | , | 1 -1 |
       | 0 0 |   | 1 0 |   | 1 0 |   | 1  0 |

   (b) | 1 0 | , | 0 1 | , | 0 0 | , | 1 1 |
       | 0 0 |   | 0 0 |   | 1 0 |   | 1 0 |

   (c) | 1 1 | , | 1 1 | , | 0 0 |
       | 0 0 |   | 1 0 |   | 1 0 |

   (d) | 1 1 | , | 0 1 | , | 0 0 |
       | 0 0 |   | 1 0 |   | 1 0 |
4. Which of the following sets of vectors are bases for the vector space P2 over R?
   (a) {1 - x, 1 - x + x², 1 + x², 1 - x - x²}   (b) {1 - x, 1 - x², x - x²}
   (c) {1 - x, x - x², 1 + x²}   (d) {1 + x, 1 + x²}
5. Find the transition matrix from the basis A to the basis B in P2 over R.
6. Find a subset of the given set of vectors that forms a basis for the subspace spanned
   by the given set.

   (d) | 1 1 | , |  1 0 | , | 0 1 | , | 0 0 | , | 1 1 |   in R_{2×2}
       | 2 1 |   | -3 1 |   | 5 0 |   | 1 1 |   | 3 2 |
14. The set W of all matrices | a b | in R_{2×2} that have a + b = 0.
                              | c d |

15. The set W of all matrices of the form | 1 a | in R_{2×2}.
                                          | b c |

16. The set W of all matrices in R_{2×2} that have the form | a b | with a, b, c, and
                                                            | c d |
    d integers.

17. The set W that consists of the zero matrix | 0 0 | together with all invertible
                                               | 0 0 |
    matrices in R_{2×2}.

18. The set W of all matrices of the form | 0 a | in R_{2×2}.
                                          | b 0 |

19. The set W of all matrices in R_{2×2} that have the form | a b | where a² = d².
                                                            | c d |
If f is a mapping of S into T, then it may happen that f(s1) = f(s2) even though
s1 ≠ s2. If it is true that f(s1) = f(s2) always implies s1 = s2, then f is called injective
or one-to-one. Another point of interest is that it is not required that every element
of T be an image of an element in S. If it happens that every t in T is the image of at
least one s in S under f, we say that f is a surjective mapping of S into T, or that f
maps S onto T. If f is both injective and surjective, then f is called bijective.
Two mappings f and g of S into T are equal if and only if f(s) = g(s) for every s
in S.
The rule f(x) = sin x defines a mapping of the set R of real numbers into R. This
mapping f is clearly not injective (for example, sin π/4 = sin 3π/4 = √2/2). Also, f is not
surjective since there is no x in R such that f(x) = 2.
If S = {x ∈ R | 0 ≤ x ≤ π} and T = {t ∈ R | 0 ≤ t ≤ 1}, then the rule f(x) = sin x
defines a mapping of S into T that is surjective but is not injective.
If S = {x ∈ R | 0 ≤ x ≤ π/2} and T = {t ∈ R | 0 ≤ t ≤ 1}, the rule f(x) = sin x
defines a mapping of S into T that is both surjective and injective.
These examples illustrate the fact that the surjective and injective properties depend
on the sets S and T as well as the rule that defines the mapping.
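These sine examples can be probed numerically; a small sketch (floating-point, so the equality check is approximate, and sampling a grid is only suggestive, not a proof):

```python
import math

f = math.sin

# Not injective on all of R: two different inputs share an output.
print(math.isclose(f(math.pi / 4), f(3 * math.pi / 4)))       # True

# Not surjective onto R: |sin x| <= 1, so the value 2 is never attained.
print(all(abs(f(x / 100)) <= 1 for x in range(-1000, 1000)))  # True

# On 0 <= x <= pi/2, sin is increasing, hence injective there.
samples = [f(x * math.pi / 200) for x in range(101)]
print(samples == sorted(samples))                             # True
```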
Definition 4.4 Let U and V be vector spaces over the same field F. An isomorphism
from U to V is a bijective mapping f of U into V that has the property that

    f(au + bv) = af(u) + bf(v)

for all u, v in U and all scalars a, b in F.
We are now in a position to prove the principal result of this section. This result
shows that the first example in Section 4.2 furnishes a completely typical pattern for
n-dimensional vector spaces over a field F.
Theorem 4.5 Any n-dimensional vector space over the field F is isomorphic to F^n.

Proof. Let V be an n-dimensional vector space over the field F, and let

    B = {v1, v2, ..., vn}

be a basis of V. Each u in V can then be written uniquely as u = a1v1 + a2v2 + ··· + anvn
with each ai in F, and we define f from V to F^n by

    f(u) = (a1, a2, ..., an).
Exercises 4.4

1. Let V be the vector space in the second example in Section 4.2. Exhibit an
   isomorphism from V to R^{n+1}.
2. Define a mapping f of R_{3×2} into R^6 that is an isomorphism from R_{3×2} to R^6,
   and prove that your mapping is an isomorphism.
3. For each subspace W below, determine the dimension r of W and find an isomorphism
   from W to R^r.

   (g) W spanned in R_{2×2} by |  1 0 | , | -2  0 | , | 1 1 | , | 2 1 | , | 0 1 |
                               | -1 1 |   |  2 -2 |   | 1 1 |   | 0 2 |   | 2 0 |

   (h) W spanned in R_{2×2} by | 1 1 | , | 3 3 | , | 1 2 | , | 2  1 |
                               | 2 0 |   | 6 0 |   | 3 1 |   | 3 -1 |

4. Let V be the subspace spanned by the set {p1(x), p2(x), p3(x), p4(x)} of polynomials
   in Problem 6(e) of Exercises 4.3. Find an isomorphism from V to a subspace
   of R^4.
Theorem 4.6 Let B = {u1, u2, ..., un} be a fixed basis of the vector space V over F, and
let A = {v1, v2, ..., vm} be a set of m vectors in V that spans the subspace W = ⟨A⟩ of
dimension r > 0. Then a set A' = {v'1, v'2, ..., v'r, 0, ..., 0} of m vectors can be obtained

4.5 Standard Bases for Subspaces 131

from A by a finite sequence of elementary operations so that {v'1, v'2, ..., v'r} has the
following properties:

1. The first nonzero coordinate of v'j with respect to B is a 1 for the kj coordinate,
   for j = 1, 2, ..., r. That is, v'j = Σ_{i=kj}^{n} a'_{ij}ui with a'_{kj j} = 1.
2. k1 < k2 < ··· < kr.
3. v'j is the only vector in A' with a nonzero kj coordinate relative to B.
4. {v'1, v'2, ..., v'r} is a basis of W.

Proof. The proof can be obtained from the proof of Theorem 2.9 by replacing R^n
by V, ε_n by B, and e_i by ui, so that a'_{ij} represents the ith coordinate of v'j relative to B
instead of the ith component of v'j. ■
Theorem 4.7 For a fixed basis B, there is one and only one basis of a given subspace
W that satisfies the conditions of Theorem 4.6.

Proof. It follows from Theorem 4.6 that there is at least one such basis of W. Let
A' = {v'1, v'2, ..., v'r} and A'' = {v''1, v''2, ..., v''r} be two bases of W that satisfy the
conditions. Then the same replacements used in the proof of Theorem 4.6 can be used
in the proof of Theorem 2.10 to obtain a proof that A' = A''. ■

Definition 4.8 Let B be a fixed basis of the n-dimensional vector space V over F, and
let W be a subspace of V of dimension r. The basis of W that satisfies the conditions
of Theorem 4.6 is called the standard basis of W relative to B.
That is, the standard basis of W relative to B = {u1, u2, ..., un} is the unique basis
A = {v1, v2, ..., vr} that has the following properties:

1. vj = Σ_{i=kj}^{n} a_{ij}ui with a_{kj j} = 1.
2. k1 < k2 < ··· < kr.
3. vj is the only vector in A that has a nonzero kj coordinate relative to B.

When referring to standard bases, we use the phrase "with respect to B" interchangeably
with "relative to B."
The proofs of the following two theorems can be obtained from those of Theorems
2.12 and 2.13 by making the same changes as were indicated in Theorems 4.6 and 4.7.

Theorem 4.9 Let B be a fixed basis of the finite-dimensional vector space V over F.
For any subspace W of V, the standard basis of W relative to B can be obtained from
any basis of W by a sequence of elementary operations.

Theorem 4.10 Let A and B be two sets of m vectors each in the finite-dimensional
vector space V over F. Then ⟨A⟩ = ⟨B⟩ if and only if B can be obtained from A by a
sequence of elementary operations.
The matrix of coordinates of A relative to B and its reduced column-echelon form
are

    |  2 -4 -1 0 |          |   1    0  0 0 |
    |  1  5  3 7 |    →     |   0    1  0 0 |
    | -1  7  3 5 |          | -6/7  5/7 0 0 |

Thus the standard basis for ⟨A⟩ with respect to B is {v'1, v'2}, where

    v'1 = u1 - (6/7)u3,
    v'2 = u2 + (5/7)u3.
Exercises 4.5
2. Let p1(x) = 2x² + 2, p2(x) = x + 1, p3(x) = 2x² - 3x + 1 in the vector space P2 over
   R. Given that B = {p1(x), p2(x), p3(x)} is a basis of P2 over R, find the standard
   basis of ⟨A⟩ with respect to B if A is given by
3. Let

       u1 = | 1 0 | ,  u2 = | 1 1 | ,  u3 = | 1 1 | ,  u4 = | 1 1 |
            | 0 0 |         | 0 0 |         | 1 0 |         | 1 1 |

   in the vector space R_{2×2}. It is given that B = {u1, u2, u3, u4} is a basis of R_{2×2}.
   For each set A, find the standard basis of ⟨A⟩ relative to B.
   (a) A1 = | 2 1 | ,  A2 = | 1 2 | ,  A3 = | 4 5 |
            | 0 1 |         | 2 1 |         | 4 3 |

   (b) A1 = | -2 -2 | ,  A2 = | -4 -7 | ,  A3 = | 0 -2 | ,  A4 = | 6 7 |
            | -2 -1 |         | -1 -2 |         | 2  0 |         | 5 3 |
4. Determine in each case whether or not the given sets span the same subspace of
   P2 over R.
   (a) Find the matrix of transition from the basis {1, x, x²} of P2 to the basis B in
       Problem 2.
   (b) Find the matrix of transition from the basis B in Problem 2 to the basis
       {1, x, x²} of P2.
134 Chapter 4 Vector Spaces, Matrices, and Linear Equations
a
ml%l + Om2%2 + * ' * + Q'mnXn — 6m
cases, it is more convenient to write these solutions in vector form a s v = (#i, #2»..., # n ),
while in others, the matrix form
Xl
X2
X =
Theorem 4.11 With the notation of the preceding paragraph, let ⟨A⟩ denote the
subspace of Q_n that is spanned by A. Then each solution of the system A is a solution of
every equation in the subspace ⟨A⟩.
or
Therefore
    = c1b1 + c2b2 + ··· + cmbm,
Definition 4.12 Two systems A and B contained in Q_n are equivalent if and only if
they have the same solutions.
Matrices are valuable tools in the solution of systems of linear equations. Consider
the system A given by

    a11x1 + a12x2 + ··· + a1nxn = b1
    a21x1 + a22x2 + ··· + a2nxn = b2
    ································
    am1x1 + am2x2 + ··· + amnxn = bm.

With

    A = [a_ij]_{m×n},    X = | x1 |,    and    B = | b1 |,
                             | x2 |                | b2 |
                             | ⋮  |                | ⋮  |
                             | xn |                | bm |

the system A can be written as the matrix equation AX = B. For example, the system

    x1 + 3x2 + 2x3        -  x5 = 7
    2x1 + 6x2 + 5x3 + 6x4 + 4x5 = 0
    x1 + 3x2 + 2x3 + 2x4 +  x5 = 9

4.7 Systems of Linear Equations 137

can be written as

    | 1 3 2 0 -1 |   | x1 |     | 7 |
    | 2 6 5 6  4 | · | x2 |  =  | 0 |
    | 1 3 2 2  1 |   | x3 |     | 9 |
                     | x4 |
                     | x5 |

that is, as AX = B with

    A = | 1 3 2 0 -1 |,    X = | x1 |,    and    B = | 7 |.
        | 2 6 5 6  4 |         | x2 |                | 0 |
        | 1 3 2 2  1 |         | x3 |                | 9 |
                               | x4 |
                               | x5 |
Definition 4.14 In the system AX = B, the matrix A is called the coefficient ma
trix, X is the matrix of unknowns, and B is the matrix of constants. The matrix
    [A, B] = [ a11 a12 ··· a1n b1 ]
             [ a21 a22 ··· a2n b2 ]
             [ .................... ]
             [ am1 am2 ··· amn bm ]

obtained by adjoining B to A as a last column, is called the augmented matrix of the system.
Each system in a certain set of variables such as A above has a unique augmented
matrix. That is, each system has an augmented matrix, and different systems have
different augmented matrices. Any elementary operation performed on A is reflected
in the augmented matrix as an elementary row operation, and any elementary row
operation on the augmented matrix produces a corresponding elementary operation on
A.
Proof. Let A and A' be the sets of vectors in Qn that consist of the equations in the systems AX = B and A'X = B', respectively.
If the augmented matrices [A, B] and [A', B'] are row-equivalent, then [A', B'] can be obtained from [A, B] by a sequence of elementary row operations. Hence A' can be obtained from A by a sequence of elementary operations, and A and A' have the same solutions, by Theorem 4.13. ■
Proof. This follows from the fact that the augmented matrices [A, B] and [PA, PB] =
P[A,B] are row-equivalent. ■
Let S denote the column space of A, let S* denote the column space of [A, B], and let b = (b1, b2, ..., bm) in Fm. According to the last statement of the preceding paragraph, AX = B has a solution if and only if b is in S. But b is in S if and only if S and S* have the same dimension, i.e., if and only if A and [A, B] have the same rank. ■
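The rank test above is easy to carry out mechanically. The sketch below (a plain Gaussian-elimination rank routine over exact rational arithmetic; the function names and the use of Python's `fractions` module are our choices, not the text's) decides consistency of AX = B by comparing rank([A, B]) with rank(A).

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:   # clear the column above and below
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_consistent(A, B):
    """AX = B has a solution iff rank([A, B]) == rank(A)."""
    augmented = [row + [b] for row, b in zip(A, B)]
    return rank(augmented) == rank(A)
```

For instance, x1 + 2x2 = 3 together with 2x1 + 4x2 = 6 is consistent (both ranks are 1), while changing the second right member to 7 raises the rank of the augmented matrix to 2 and destroys consistency.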
Theorem 4.18 If A is an m x n matrix over F and rank([A, B]) = rank(A) = r, then the solutions to AX = B can be expressed in terms of n - r parameters.
By Corollary 3.49, A' has r nonzero rows. Thus the system A'X = B' has the form

    x_k1 + (terms in the variables other than x_k1, ..., x_kr) = b'_1
    ...
    x_kr + (terms in the variables other than x_k1, ..., x_kr) = b'_r
    0 = b'_{r+1}
    ...
    0 = b'_m.

In this system each variable x_ki, i = 1, 2, ..., r, occurs just once, in the i-th equation, with coefficient 1. Hence each of x_k1, ..., x_kr can be expressed in terms of the remaining n - r variables. ■
The variables x_k1, x_k2, ..., x_kr in the last paragraph are called the leading variables, and the remaining n - r variables are called the parameters in the solution of the system.
The proof of Theorem 4.18 furnishes at one and the same time a method for determining the existence of solutions and a method for obtaining them. To solve the system AX = B, we can use elementary row operations to transform the augmented matrix [A, B] into reduced row-echelon form [A', B']. The condition that rank([A, B]) = rank(A) is reflected in the conditions 0 = b'_{r+1}, ..., 0 = b'_m, since

    rank([A, B]) = rank(P[A, B]) = rank([A', B']).

If rank([A, B]) > rank(A), then rank([A', B']) > r and at least one of the equations 0 = b'_{r+1}, ..., 0 = b'_m will be contradictory. If rank([A, B]) = rank(A), then there are solutions, and they can be obtained by solving for the leading variables in terms of the parameters. This method of solution is called Gauss-Jordan elimination. A system is solved by Gauss-Jordan elimination in Example 1.
    x1 + 2x2 +  x3 -  4x4 + x5 =  1
    x1 + 2x2 -  x3 +  2x4 + x5 =  5
    2x1 + 4x2 +  x3 -  5x4      =  2
    x1 + 2x2 + 3x3 - 10x4 + x5 = -3.
The augmented matrix [A, B] can be transformed to reduced row-echelon form as follows.

    [A, B] = [ 1 2  1  -4  1  1 ]    [ 1 2  1 -4  1  1 ]
             [ 1 2 -1   2  1  5 ] -> [ 0 0 -2  6  0  4 ]
             [ 2 4  1  -5  0  2 ]    [ 0 0 -1  3 -2  0 ]
             [ 1 2  3 -10  1 -3 ]    [ 0 0  2 -6  0 -4 ]

          -> [ 1 2 0 -1  1  3 ]    [ 1 2 0 -1 0  2 ]
             [ 0 0 1 -3  0 -2 ] -> [ 0 0 1 -3 0 -2 ] = [A', B']
             [ 0 0 0  0 -2 -2 ]    [ 0 0 0  0 1  1 ]
             [ 0 0 0  0  0  0 ]    [ 0 0 0  0 0  0 ]
The system A'X = B' is

    x1 + 2x2 - x4 = 2
         x3 - 3x4 = -2
               x5 = 1.

When we solve for the leading variables x1, x3, x5 in terms of the parameters x2, x4, we obtain the solutions to the system:

    x1 = 2 - 2x2 + x4
    x3 = -2 + 3x4
    x5 = 1,

where the parameters x2 and x4 may be assigned arbitrary values.
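The elimination carried out in Example 1 can be sketched in code. The routine below is a plain Gauss-Jordan reduction to reduced row-echelon form, using Python's `fractions` module for exact arithmetic (the routine and its name `rref` are our illustration, not the text's); applied to the augmented matrix of Example 1 it reproduces [A', B'] exactly.

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix to reduced row-echelon form (Gauss-Jordan)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # scale pivot row to a leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:             # clear the rest of the column
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return m

# Augmented matrix [A, B] of the system in Example 1
aug = [[1, 2,  1,  -4, 1,  1],
       [1, 2, -1,   2, 1,  5],
       [2, 4,  1,  -5, 0,  2],
       [1, 2,  3, -10, 1, -3]]
```

Running `rref(aug)` yields the rows [1, 2, 0, -1, 0, 2], [0, 0, 1, -3, 0, -2], [0, 0, 0, 0, 1, 1], and a zero row, matching [A', B'] above.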
There is another point of view to Theorem 4.18 and its proof that is useful in Chapter 5. The fifth set in Example 1 of Section 1.3 generalizes immediately from Rn to Fn. That is, the set of all vectors v = (x1, x2, ..., xn) in Fn with components xi that satisfy a given system of equations AX = 0 is a subspace W of Fn. The n - r parameters in the proof of Theorem 4.18 represent the components of v that can be assigned values arbitrarily. We can solve for the leading variables x_k1, x_k2, ..., x_kr in the equation A'X = 0 and express them in terms of the n - r parameters. If we then replace these leading variables by their values in terms of the parameters, the vector (x1, x2, ..., xn) that represents the general solution of the system can be obtained as a linear combination of the vectors in a basis of W. There will be n - r vectors in this basis since there are n - r parameters present. This is illustrated in the following example.
    x1 + 2x2 +  x3 -  4x4 + x5 = 0
    x1 + 2x2 -  x3 +  2x4 + x5 = 0
    2x1 + 4x2 +  x3 -  5x4      = 0
    x1 + 2x2 + 3x3 - 10x4 + x5 = 0.
The coefficient matrix in this system is the same as the one in Example 1, and
the reduced row-echelon form for this system can be obtained simply by replacing the
constants B' in [A', B'] by a column of zeros to obtain [A', 0]. This gives

    [A', 0] = [ 1 2 0 -1 0 0 ]
              [ 0 0 1 -3 0 0 ]
              [ 0 0 0  0 1 0 ]
              [ 0 0 0  0 0 0 ]

and the corresponding system

    x1 + 2x2 - x4 = 0
         x3 - 3x4 = 0
               x5 = 0

has solutions

    x1 = -2x2 + x4
    x3 = 3x4
    x5 = 0.
Replacing the leading variables by their values in terms of the parameters yields
W = ((-2,1,0,0,0),(1,0,3,1,0)),
and {(—2,1,0,0,0), (1,0,3,1,0)} is a basis for the solution space W . As a final remark
concerning these solutions, we note that the matrix form
    X = x2 [ -2 ]  +  x4 [ 1 ]
           [  1 ]        [ 0 ]
           [  0 ]        [ 3 ]
           [  0 ]        [ 1 ]
           [  0 ]        [ 0 ]
can be easily predicted by listing all the variables in a column with the leading variables
expressed in terms of the parameters:
    x1 = -2x2 + x4
    x2 =   x2
    x3 =         3x4
    x4 =          x4
    x5 = 0.
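The column-listing device above mechanizes directly: reduce A, set one parameter equal to 1 and the others to 0, and read off the leading variables. The sketch below (our illustration; names and the `fractions`-based arithmetic are our choices) recovers the basis {(-2, 1, 0, 0, 0), (1, 0, 3, 1, 0)} of W for the homogeneous system of this example.

```python
from fractions import Fraction

def nullspace_basis(A):
    """Basis of {x : Ax = 0}: one vector per free variable (parameter)."""
    m = [[Fraction(x) for x in row] for row in A]
    n = len(m[0])
    r, pivots = 0, []
    for c in range(n):                  # Gauss-Jordan to reduced row-echelon form
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:                      # set this parameter to 1, the others to 0
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, p in zip(m, pivots):   # leading variable = -(row entry at f)
            v[p] = -row[f]
        basis.append(v)
    return basis

A = [[1, 2,  1,  -4, 1],
     [1, 2, -1,   2, 1],
     [2, 4,  1,  -5, 0],
     [1, 2,  3, -10, 1]]
```

Each returned vector can be checked by substituting it back into AX = 0.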
Exercises 4.7

    x1 + x2 = -2                    2x1 - x3 = 1
    2x1 + 4x2 - x3 = -3             3x1 + x2 - x3 = 2

5.  2x1 + x2 = -4               6.  9x1 - 6x2 = 15
    x1 - x2 = 4                     15x1 - 10x2 = 25
    -3x1 + 3x2 = 2                  6x1 - 4x2 = 10

7.  x1 + 2x2 + 5x3 = 0          8.  4x1 + 4x2 - 7x3 + 3x4 =
    4x1 + 12x2 + 21x3 + 2x4 = 0     3x1 + 3x2 - 5x3 + 2x4 =
    3x1 + 6x2 + 15x3 - 3x4 = 0
12. x1 + x2 + x3 + x4 + x5 = 2
    x1 + x2 + 2x3 + 3x4 + 4x5 = 4
    2x1 + 2x2 + 3x3 + 4x4 + 5x5 = 6

13. x1 + 2x2 + x4 = 1
    2x1 + 4x2 + x3 + 4x4 + 3x5 = 6
    x1 + 2x2 + 2x3 + 4x4 - 2x5 = 1
    -x1 - 2x2 + 3x3 + 5x4 + 4x5 = 6
In Problems 15 and 16, (a) find the rank of the coefficient matrix, (b) find the rank
of the augmented matrix, and (c) determine if the system is consistent by comparing
these ranks. It is not necessary to solve the systems.
15. x1 + 2x2 - x3 = 3           16. x1 + 2x2 + 3x3 + x4 = 0
    x2 + x3 = 1                     x2 + x3 + x4 = 0
    x1 - 3x3 = 0                    x1 + x3 - x4 = 1
17. For the following matrix A, (a) find a basis of the column space of A, and (b)
express each column vector of A as a linear combination of the vectors in your
basis.
3 2 -2 7
6 4 -4 14
1 1 2 3
-5 -4 -2 -13
11 8 -2 27
18. Find all real numbers a and b for which the system of equations below does not
have a solution.
    x1 + x3 = 1
    ax1 + x2 + 2x3 = 0
    3x1 + 4x2 + bx3 = 2
In Problems 19 and 20, find the values of the real number a for which the given system
(a) has no solution, (b) has exactly one solution, (c) has infinitely many solutions.
19. x1 + 2x2 - x3 = 2
    2x1 + 6x2 + 3x3 = 4
    3x1 + 8x2 + (a^2 - 2)x3 = a + 8
20. x1 + x2 + x3 = 3
    2x1 + 3x2 + 3x3 = 8
    3x1 + 3x2 + (a^2 - 6)x3 = a + 6
22. For each of Problems 1-14, let A be the coefficient matrix of the given system. Let W be the subspace consisting of all (x1, x2, ..., xn) in Rn that satisfy the system AX = 0. Find a basis of W in each case.

23. Let W be the subspace of all (x1, x2, ..., xn) in Fn that satisfy the system AX = 0, and let c = (c1, c2, ..., cn), where x1 = c1, x2 = c2, ..., xn = cn is a particular solution to the system AX = B. Prove that c + W is the complete set of solutions to AX = B.
Chapter 5
Linear Transformations
5.1 Introduction
In this chapter, the important concept of a linear transformation of a vector space is
introduced. Matrices prove to be a powerful tool in the study of linear transformations
of finite-dimensional vector spaces. They can be used to classify linear transformations
according to certain equivalence relations that are based on fundamental properties
common to different linear transformations.
146 Chapter 5 Linear Transformations
We recall from Section 4.4 that an isomorphism is a bijective mapping f that has the property f(au + bw) = af(u) + bf(w) required in Definition 5.1. Hence every isomorphism is a linear transformation. However, a linear transformation of U into V may be neither injective nor surjective, even though it preserves linear combinations just as an isomorphism does.
In addition to the isomorphisms, another family of examples of linear transformations
is provided by the zero transformations. For a given pair of vector spaces U and V, the
zero linear transformation is the mapping Z : U —> V defined by Z(u) = 0 for all
u in U.
The following examples provide some more detailed illustrations concerning linear
transformations.
Example 1 □ Consider the mapping T : R2 -> R2 defined by

    T(x, y) = (x + 1, 2x + y).

For arbitrary u = (u1, u2), w = (w1, w2) in R2 and arbitrary a, b in R, we have

    T(au + bw) = (au1 + bw1 + 1, 2(au1 + bw1) + au2 + bw2)

and

    aT(u) + bT(w) = a(u1 + 1, 2u1 + u2) + b(w1 + 1, 2w1 + w2)
                  = (au1 + bw1 + a + b, 2au1 + au2 + 2bw1 + bw2).

We see that the equality

    T(au + bw) = aT(u) + bT(w)

holds if and only if a + b = 1. Since this equation is not always true for a and b in R, we conclude that T is not a linear transformation. ■
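The test used in Example 1 is easy to run numerically: evaluate both sides of T(au + bw) = aT(u) + bT(w) for a particular choice of vectors and scalars. One failing instance already disproves linearity (though passing instances never prove it). The helper below, with the two sample maps, is our hypothetical illustration, not the text's.

```python
def respects_combination(T, u, w, a, b):
    """Check T(a*u + b*w) == a*T(u) + b*T(w) for one choice of vectors/scalars."""
    combo = tuple(a * x + b * y for x, y in zip(u, w))
    left = T(combo)
    right = tuple(a * x + b * y for x, y in zip(T(u), T(w)))
    return left == right

# The map of Example 1: the added constant 1 makes it non-linear.
T_affine = lambda v: (v[0] + 1, 2 * v[0] + v[1])
# A genuinely linear map for comparison (our example).
T_linear = lambda v: (v[0] + v[1], 2 * v[0] - v[1])
```

With a = 2 and b = 5 (so a + b != 1), `respects_combination` fails for `T_affine` but holds for `T_linear`, in agreement with the a + b = 1 condition derived above.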
Example 5 D We saw in Example 4 and Problem 15 of Section 4.2 that the set V of
all real-valued functions of t with domain R and the set W of all differentiable functions
of t with domain R form vector spaces over R with respect to the usual operations of
addition and scalar multiplication. Consider the mapping T : W —> V defined by
    T(f(t)) = f'(t).

That is, T maps each differentiable function onto its derivative. Using familiar facts from the calculus, we get

    T(af(t) + bg(t)) = af'(t) + bg'(t)
                     = aT(f(t)) + bT(g(t)). ■
    (S + T)(u) = S(u) + T(u)   and   (aT)(u) = a(T(u)).
Example 6 □ Let U = R3 and V = R2, and let S and T be defined by

    S(x1, x2, x3) = (2x1 + x3, 3x1 - x2),
    T(x1, x2, x3) = (-x1 + 3x2, 2x1 - x2 + x3).

Then, for example,

    (aT)(x1, x2, x3) = a(-x1 + 3x2, 2x1 - x2 + x3)
                     = (-ax1 + 3ax2, 2ax1 - ax2 + ax3). ■
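Definition 5.2 can be mirrored directly in code: a sum of transformations applies both maps and adds the images, and a scalar multiple scales the image. The sketch below (our illustration) uses the maps S and T of Example 6.

```python
def add_maps(S, T):
    """(S + T)(u) = S(u) + T(u), computed componentwise."""
    return lambda u: tuple(s + t for s, t in zip(S(u), T(u)))

def scale_map(a, T):
    """(aT)(u) = a * T(u)."""
    return lambda u: tuple(a * t for t in T(u))

# S and T from Example 6, written on triples (x1, x2, x3).
S = lambda x: (2 * x[0] + x[2], 3 * x[0] - x[1])
T = lambda x: (-x[0] + 3 * x[1], 2 * x[0] - x[1] + x[2])
```

For u = (1, 2, 3) we get S(u) = (5, 1) and T(u) = (5, 3), so (S + T)(u) = (10, 4) and (2T)(u) = (10, 6).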
With the definitions of addition and scalar multiplication given in Definition 5.2,
the linear transformations of U into V can be regarded as possible vectors. The next
theorem shows that they are indeed vectors.
Theorem 5.3 Let U and V be vector spaces over the same field F. Then the set of all linear transformations of U into V is a vector space over F.

Proof. For a complete proof, each of the ten conditions of Definition 4.2 must be verified. We verify the first six here, leaving the others as exercises. Let T1, T2, and T3 denote arbitrary linear transformations of U into V, let u and w be arbitrary vectors in U, and let a, b, and c be scalars.
Since

    (T1 + Z)(u) = T1(u) + Z(u) = T1(u) + 0 = T1(u)

for all u in U, the zero transformation Z is an additive identity.

The additive inverse of T1 is the linear transformation -T1 of U into V defined by (-T1)(u) = -T1(u), since

    (T1 + (-T1))(u) = T1(u) - T1(u) = 0

for all u in U.

For any u in U,

    (T1 + T2)(u) = T1(u) + T2(u) = T2(u) + T1(u) = (T2 + T1)(u),

so T1 + T2 = T2 + T1.

Since

    (cT1)(au + bw) = c(T1(au + bw))
                   = c(aT1(u) + bT1(w))
                   = a(cT1(u)) + b(cT1(w)),

cT1 is a linear transformation of U into V. ■
Definition 5.5 The subspace T(U) of V is called the range of T. The dimension of T(U) is called the rank of T. The rank of T will be denoted by rank(T).
Near the end of this section we will devise a method for finding the rank of T when
U and V are finite-dimensional.
Definition 5.7 The subspace T^-1(0) is called the kernel of the linear transformation T. The dimension of T^-1(0) is the nullity of T, denoted by nullity(T).
The essence of a method for finding the rank of a linear transformation T is contained
in the proof of Theorem 5.8. For the set T(A) contains a basis of T(U), and the number
of elements in the basis is rank(T). Our next example demonstrates the use of this
method to find a basis for the range of a linear transformation T, and we also find a
basis for the kernel of T. More efficient and systematic methods for finding these bases
are developed in the next section.
    T [ a11 a12 a13 ] = (a11 + a21, 2a11 + a21 - a12, a22 + a13, 0, a23)
      [ a21 a22 a23 ]
is a linear transformation of R 2 x 3 into R 5 . We shall (a) find a basis for the range of T,
(b) find a basis for the kernel of T, and (c) state the rank and nullity of T.
(a) In order to find a basis for T(R2x3), we first obtain a spanning set T(A) as described in the proof of Theorem 5.8. The set A = {A1, A2, A3, A4, A5, A6} forms a basis of R2x3, where
    A1 = [ 1 0 0 ]   A2 = [ 0 1 0 ]   A3 = [ 0 0 1 ]
         [ 0 0 0 ]        [ 0 0 0 ]        [ 0 0 0 ]

    A4 = [ 0 0 0 ]   A5 = [ 0 0 0 ]   A6 = [ 0 0 0 ]
         [ 1 0 0 ]        [ 0 1 0 ]        [ 0 0 1 ]
and we know that T(A) contains a basis of T(R2x3). Using the refinement process from Section 1.5 and Example 3 in Section 4.3, we find that the first three vectors in T(A) are linearly independent and the fourth vector can be written as

    T(A4) = T(A1) + T(A2).

Thus the fourth vector can be deleted from the spanning set T(A). The fifth vector in T(A) is a repetition of the third vector, so it can also be deleted. The last vector is clearly not a linear combination of the preceding vectors since it is the only one with a nonzero fifth component. Thus

    {T(A1), T(A2), T(A3), T(A6)} = {(1, 2, 0, 0, 0), (0, -1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 0, 1)}

is a basis for the range of T.
(b) A matrix in the kernel of T must have entries that satisfy
    a11 + a21 = 0
    2a11 + a21 - a12 = 0
    a22 + a13 = 0
    0 = 0
    a23 = 0
Using Gauss-Jordan elimination, we find that the reduced row-echelon form for the
augmented matrix is
    [ 1 0 -1 0 0 0 0 ]
    [ 0 1  1 0 0 0 0 ]
    [ 0 0  0 1 1 0 0 ]
    [ 0 0  0 0 0 1 0 ]
    [ 0 0  0 0 0 0 0 ]
5.2 Linear Transformations 153
    a11 =  a12
    a21 = -a12
    a12 =  a12
    a22 = -a13
    a13 =  a13
    a23 = 0
    rank(T) = dim(T(R2x3)) = 4

and

    nullity(T) = dim(T^-1(0)) = 2.
We note that the sum of the rank and nullity of T is equal to the dimension of the
domain of T in the example above. Our next theorem states that this equality always
holds.
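The equality observed in the example can be checked mechanically. Writing the T of the example as a 5 x 6 matrix acting on the entries (a11, a12, a13, a21, a22, a23), its rank is the dimension of the range and 6 minus its rank is the nullity. The rank routine below is our sketch (plain Gaussian elimination over exact rationals), not the text's.

```python
from fractions import Fraction

def rank(rows):
    """Rank via forward Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Matrix of T from the example, acting on (a11, a12, a13, a21, a22, a23):
# T = (a11 + a21, 2a11 + a21 - a12, a22 + a13, 0, a23)
M = [[1,  0, 0, 1, 0, 0],
     [2, -1, 0, 1, 0, 0],
     [0,  0, 1, 0, 1, 0],
     [0,  0, 0, 0, 0, 0],
     [0,  0, 0, 0, 0, 1]]
```

Here `rank(M)` is 4 and the nullity is 6 - 4 = 2, so rank + nullity equals the dimension 6 of the domain, as Theorem 5.9 asserts.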
Theorem 5.9 Let T be a linear transformation of U into V. If U has finite dimension, then

    rank(T) + nullity(T) = dim(U).

Proof. Suppose that U has dimension n, and let k be the nullity of T. Choose {u1, u2, ..., uk} to be a basis of the kernel T^-1(0). This linearly independent set can be extended to a basis

    A = {u1, u2, ..., uk, uk+1, ..., un}
of U. According to Theorem 5.8, the set T(A) spans T(U). But T(u1) = T(u2) = ··· = T(uk) = 0, so this means that the set of n - k vectors {T(uk+1), T(uk+2), ..., T(un)} spans T(U). To show that this set is linearly independent, suppose that

    c_{k+1} T(uk+1) + c_{k+2} T(uk+2) + ··· + cn T(un) = 0.

Then

    T(c_{k+1} uk+1 + c_{k+2} uk+2 + ··· + cn un) = 0,

and c_{k+1} uk+1 + ··· + cn un is in T^-1(0). Thus there are scalars d1, d2, ..., dk such that

    d1 u1 + ··· + dk uk = c_{k+1} uk+1 + ··· + cn un

and

    d1 u1 + ··· + dk uk - c_{k+1} uk+1 - ··· - cn un = 0.

Since A is a basis, each ci and each di must be zero. Hence {T(uk+1), T(uk+2), ..., T(un)} is a basis of T(U). Since rank(T) is the dimension of T(U), n - k = rank(T) and

    rank(T) + nullity(T) = (n - k) + k = n = dim(U). ■
Exercises 5.2
9. Find the rank and nullity of the given linear transformation of U into V.

(a) U = V = R4,
    T(x1, x2, x3, x4) = (x1 + x2 + x3 - x4, x1 + 2x2 + x3 - x4,
                         x1 - x2 + 3x3, -x1 + 5x2 - 5x3 - x4)

(b) U = R4, V = R3,
    T(x1, x2, x3, x4) = (x1 + 2x2 + x3 + 3x4, 2x3 - 4x4, x1 + 2x2 + 3x3 - x4)
10. In each part of Problem 9, find the standard basis of the kernel of T by solving
the system of equations that results from setting T(u) = 0.
(a) T(0) = 0
(c) T(u - w) = T(u) - T(w) for all u, w in U
(d) T(Σ ai ui) = Σ ai T(ui) for all scalars ai in F and vectors ui in U
Thus, with each choice of bases A and B, a linear transformation T of U into V determines a unique indexed set {aij} of mn elements of F. These elements make up the matrix of T relative to the bases A and B.
Definition 5.10 Suppose that A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} are bases of U and V, respectively. Let T be a linear transformation of U into V. The matrix of T relative to the bases A and B is the matrix

    A = [aij]mxn = [T]B,A

where

    T(uj) = a1j v1 + a2j v2 + ··· + amj vm

for j = 1, 2, ..., n.
The symbols A = [aij]mxn and [T]B,A in Definition 5.10 denote the same matrix, but the first one places notational emphasis on the elements of the matrix, while the second one places emphasis on T and the bases A and B. This matrix A is also referred to as the matrix of T with respect to A and B, and we say that T is represented by the matrix A.
As mentioned earlier, the elements aij are uniquely determined by T for given bases A and B. Another way to describe A is to observe that the j-th column of A is the coordinate matrix of T(uj) with respect to B. That is,

    [ a1j ]
    [ a2j ] = [T(uj)]B
    [ ... ]
    [ amj ]

and

    A = [T]B,A = [[T(u1)]B, [T(u2)]B, ..., [T(un)]B].
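This column-by-column description translates directly into a computation: for each basis vector uj of A, solve for the B-coordinates of T(uj) and use them as column j. The sketch below does this with exact arithmetic; the particular map T(x, y) = (x + y, x - y) and the bases chosen are our hypothetical illustration, not an example from the text.

```python
from fractions import Fraction

def solve(M, b):
    """Solve the square system M x = b by Gauss-Jordan (M assumed invertible)."""
    n = len(M)
    a = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[p] = a[p], a[c]
        a[c] = [x / a[c][c] for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                a[i] = [u - a[i][c] * v for u, v in zip(a[i], a[c])]
    return [row[n] for row in a]

def matrix_of(T, basis_A, basis_B):
    """Column j of the result is the coordinate matrix [T(u_j)]_B."""
    Bmat = [list(col) for col in zip(*basis_B)]   # basis vectors of B as columns
    cols = [solve(Bmat, T(u)) for u in basis_A]
    return [list(row) for row in zip(*cols)]      # transpose columns into rows

T = lambda u: (u[0] + u[1], u[0] - u[1])          # hypothetical linear map
A_basis = [(1, 1), (0, 1)]
B_basis = [(1, 0), (1, 1)]
```

Here `matrix_of(T, A_basis, B_basis)` gives [[2, 2], [0, -1]], since T(1, 1) = (2, 0) = 2(1, 0) + 0(1, 1) and T(0, 1) = (1, -1) = 2(1, 0) - 1(1, 1).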
    A = {1, 1 - x, x^2} of V2

and

    B = {1, x, 1 - x^2, 1 + x^3} of V3.

We have

    T(1) = 2 + x + x^3
         = (1)(1) + (1)(x) + (0)(1 - x^2) + (1)(1 + x^3),
    T(1 - x) = 2 - x^2 + x^3
         = (0)(1) + (0)(x) + (1)(1 - x^2) + (1)(1 + x^3),
    T(x^2) = 2 + 3x + 2x^2 + x^3
         = (3)(1) + (3)(x) + (-2)(1 - x^2) + (1)(1 + x^3).

Thus

    A = [T]B,A = [[T(1)]B, [T(1 - x)]B, [T(x^2)]B] = [ 1 0  3 ]
                                                     [ 1 0  3 ]
                                                     [ 0 1 -2 ]
                                                     [ 1 1  1 ]
    [T]E3,E4 = [ 1 0 -1 1 ]
               [ 2 1  0 3 ]
               [ 1 2  3 3 ]
We can identify the column matrix

    X = [ x1 ]
        [ x2 ]
        [ .. ]
        [ xn ]

with the n-tuple (x1, x2, ..., xn)
and consider them as being the same entity. This isomorphism leads to a natural
connection between the preceding example and Example 4 of Section 5.2. From the
example in Section 5.2, we know that the matrix transformation S : R4x1 -> R3x1 defined by

    S [ x1 ]   [ 1 0 -1 1 ] [ x1 ]   [ x1 - x3 + x4         ]
      [ x2 ] = [ 2 1  0 3 ] [ x2 ] = [ 2x1 + x2 + 3x4       ]
      [ x3 ]   [ 1 2  3 3 ] [ x3 ]   [ x1 + 2x2 + 3x3 + 3x4 ]
      [ x4 ]                [ x4 ]
is a linear transformation. It is clear at a glance that the matrix transformation S
and the mapping T in Example 2 are the same except for notation. That is, the two
mappings differ only by an isomorphism.
The work in Examples 1 and 2 of this section illustrates that, for given bases A and B, the matrix [T]B,A is uniquely determined by T. On the other hand, for a given matrix A = [aij]mxn and fixed bases A and B, there is only one linear transformation that has A as its matrix relative to A and B. For if A is the matrix of both S and T, then we have S(uj) = Σ aij vi = T(uj) and, for any u = Σ xj uj in U,

    S(u) = Σ xj S(uj) = Σ xj T(uj) = T(u).
But S(u) = T(u) for all u in U means S = T. Thus, for fixed bases A and B, T and A = [T]B,A determine each other uniquely by the rule

    T(uj) = Σ (i = 1 to m) aij vi,

or equivalently

    [ a1j ]
    [ a2j ] = [T(uj)]B.
    [ ... ]
    [ amj ]
    A = [ 1 0  3 ]
        [ 1 0  3 ]
        [ 0 1 -2 ]
        [ 1 1  1 ]

relative to A = {1, 1 - x, x^2} and B = {1, x, 1 - x^2, 1 + x^3}. The matrix A can be transformed to reduced column-echelon form as follows.

    A = [ 1 0  3 ]    [ 1 0  0 ]    [ 1 0 0 ]
        [ 1 0  3 ] -> [ 1 0  0 ] -> [ 1 0 0 ] = A'
        [ 0 1 -2 ]    [ 0 1 -2 ]    [ 0 1 0 ]
        [ 1 1  1 ]    [ 1 1 -2 ]    [ 1 1 0 ]
The nonzero columns of A' give the coordinates of

    (1)(1) + (1)(x) + (0)(1 - x^2) + (1)(1 + x^3) = 2 + x + x^3,
    (0)(1) + (0)(x) + (1)(1 - x^2) + (1)(1 + x^3) = 2 - x^2 + x^3,

and a basis for the range of T is {2 + x + x^3, 2 - x^2 + x^3}. ■
    [T(u)]B = [T]B,A [u]A.

Proof. Let

    [u]A = X = [ x1 ]    and    [T(u)]B = Y = [ y1 ]
               [ x2 ]                         [ y2 ]
               [ .. ]                         [ .. ]
               [ xn ]                         [ ym ]
so that u = Σ (j = 1 to n) xj uj and T(u) = Σ (i = 1 to m) yi vi. Since A = [aij]mxn is the matrix of T relative to A and B, we have

    Σ (i = 1 to m) yi vi = T(u)
                         = T( Σ (j = 1 to n) xj uj )
                         = Σ (j = 1 to n) xj T(uj)
                         = Σ (j = 1 to n) xj ( Σ (i = 1 to m) aij vi )
                         = Σ (i = 1 to m) ( Σ (j = 1 to n) aij xj ) vi.

But the coordinates of T(u) relative to B are unique, so this means that yi = Σ (j = 1 to n) aij xj, and

    y1 = a11 x1 + a12 x2 + ··· + a1n xn
    ...
    ym = am1 x1 + am2 x2 + ··· + amn xn.

That is,

    [T(u)]B = [T]B,A [u]A. ■
Theorem 5.12 shows that the matrix transformation from Rnx1 to Rmx1 defined in Section 5.2 by T(X) = AX generalizes to coordinates with arbitrary linear transformations of finite-dimensional vector spaces. However, it should be kept in mind that X, A, and Y in the equation Y = AX are taken relative to the bases A and B, and consequently change when A and B are changed.
Example 4 □ Let T be the linear transformation of R4 into R3 that has the matrix

    A = [ 1 0 -1 1 ]
        [ 2 1  0 3 ]
        [ 1 2  3 3 ]
5.3 Linear Transformations and Matrices 163
Thus

    [u]A = [ -4 ]
           [  3 ]
           [  5 ]
           [ -1 ]

and therefore

    [T(u)]B = [ 1 0 -1 1 ] [ -4 ]   [ -10 ]
              [ 2 1  0 3 ] [  3 ] = [  -8 ]
              [ 1 2  3 3 ] [  5 ]   [  14 ]
                           [ -1 ]

Using these coordinates with the vectors in the basis B, we find that

    T(3, 4, -1, -4) = (-10)(0, 1, 1) + (-8)(1, 0, 0) + 14(0, 0, 1)
                    = (-8, -10, 4). ■
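The two steps of Example 4, first [T(u)]B = A[u]A and then recombining the coordinates with the vectors of B = {(0, 1, 1), (1, 0, 0), (0, 0, 1)}, can be checked with a few lines of arithmetic (our sketch, using the numbers of Example 4):

```python
def mat_vec(A, x):
    """Matrix-vector product over plain integers."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 0, -1, 1],
     [2, 1,  0, 3],
     [1, 2,  3, 3]]
u_coords = [-4, 3, 5, -1]          # [u]_A from Example 4
y = mat_vec(A, u_coords)           # [T(u)]_B by Theorem 5.12

# Combine the coordinates with the basis B of R^3 from Example 4.
B = [(0, 1, 1), (1, 0, 0), (0, 0, 1)]
T_u = [sum(c * v[i] for c, v in zip(y, B)) for i in range(3)]
```

This reproduces [T(u)]B = (-10, -8, 14) and T(3, 4, -1, -4) = (-8, -10, 4).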
Theorem 5.12 reveals a practical approach to the problem of finding a basis for the kernel T^-1(0). For u is in T^-1(0) if and only if the coordinates X of u satisfy AX = 0. Thus, the solutions to the system of equations AX = 0 furnish the coordinates of the vectors in T^-1(0).
    A = [ 1 1 1 0  2 ]
        [ 0 1 2 1 -1 ]
        [ 0 0 0 1 -1 ]
        [ 1 1 1 0  2 ]
and we need to solve the linear system AX = 0. Using the augmented matrix [A, 0], we have

    [A, 0] = [ 1 1 1 0  2 0 ]    [ 1 1 1 0  2 0 ]
             [ 0 1 2 1 -1 0 ] -> [ 0 1 2 1 -1 0 ]
             [ 0 0 0 1 -1 0 ]    [ 0 0 0 1 -1 0 ]
             [ 1 1 1 0  2 0 ]    [ 0 0 0 0  0 0 ]

          -> [ 1 1 1 0  2 0 ]    [ 1 0 -1 0  2 0 ]
             [ 0 1 2 0  0 0 ] -> [ 0 1  2 0  0 0 ]
             [ 0 0 0 1 -1 0 ]    [ 0 0  0 1 -1 0 ]
             [ 0 0 0 0  0 0 ]    [ 0 0  0 0  0 0 ]
    x1 =  x3 - 2x5
    x2 = -2x3
    x3 =  x3
    x4 =  x5
    x5 =  x5
Using the coordinate matrices [1, -2, 1, 0, 0]^T and [-2, 0, 0, 1, 1]^T with the basis A, we obtain the basis

    {1 - 2x + x^2, -2 + x^3 + x^4}

for the kernel of T. ■
Theorem 5.13 Let A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} be bases of U and V, respectively. If A = [aij]mxn is a matrix such that the equation

    [T(u)]B = A[u]A

is satisfied for all u in U, then A is the matrix of the linear transformation T relative to A and B.
Proof. For each j, the coordinate matrix of uj relative to A is the column with 1 in the j-th position and zeros elsewhere, so A[uj]A is the j-th column of A. The hypothesis then gives

    [T(uj)]B = A[uj]A = [ a1j ]
                        [ a2j ]
                        [ ... ]
                        [ amj ]

Thus T(uj) = Σ (i = 1 to m) aij vi, and A is the matrix of T relative to A and B. ■
Exercises 5.3
1. Let v1 = (2, -1) and v2 = (1, 0) in R2. Find a formula for T(x, y) if T is the linear transformation of R2 into R3 for which T(v1) = (1, 0, 1) and T(v2) = (0, 1, 1).
Γ ~\ \ 1 0
1 \ 0
= 2 andT ) - -1
0 / 1
L J /
3 ) 1 1
FindT
1 0 0 1
,T
0 0 0 0
Γ "I \ 2 / Γ "1 \ 3
0 0 \ / 0 0 \
= 3 ,T = 3
1 0 / 0 1
L J /
2 I\ L J
/
/
0
FindT
In Problems 5-8, find the matrix of the given linear transformation T relative to the bases A and B.

5. T : R3 -> R4, A = E3, B = E4
   T(x1, x2, x3) = (x1 + x2 - x3, 5x2 - 2x3, 4x1 + x3, 2x1 + 3x2 + x3)

6. T : R4 -> R3, A = E4, B = E3
   T(x1, x2, x3, x4) = (x1 + x2 + x3 - x4, x1 + 2x2 + x3 - x4, x1 - x2 + x3)
9. If T is the linear transformation that has the matrix

    [  1 2 ]
    [ -1 1 ]
    [  1 1 ]

relative to the bases {(1, 1), (-3, 1)} of R2 and E3 of R3, find T(-2, 2).
10. Let T be the linear transformation of R3 into R2 that has the matrix

    [ 2 3 1 ]
    [ 1 2 1 ]

relative to the bases {(1, -1, 1), (0, 1, 0), (1, 0, 0)} of R3 and {(3, 2), (2, 1)} of R2. Find T(2, 0, 1).
11. Let

    A = [ 1 0 0 ]
        [ 2 1 1 ]
        [ 3 2 1 ]
        [ 4 3 1 ]

be the matrix of the linear transformation T : V2 -> V3 over R with respect to the bases {4, 1 + x, 1 + x^2} and {1, x, x^2, x^3}. Find T(2 - 2x + x^2).
12. Let

    A = [ 1 2 3 ]
        [ 0 1 2 ]

be the matrix of the linear transformation T : V2 -> V1 with respect to the bases {1 + x + x^2, x + x^2, x^2} and {1, 2 + 3x}. Find T(2 + 5x + x^2).
13. The linear operator T on R2 has the matrix

    [  4 -1 ]
    [ -4  3 ]

relative to the basis A = B = {(1, 2), (0, 1)}. A vector u has coordinates

    [ 1 ]
    [ 1 ]

relative to this basis. Find T(u) in component form (x, y).
14. Suppose the bases A of R2x2 and B of V1 over R are given by

    A = { [ 1 0 ], [ 0 0 ], [ 0 0 ], [ 0 1 ] }    and    B = {1 + x, 1 - x}.
        { [ 0 1 ]  [ 1 1 ]  [ 1 0 ]  [ 0 0 ] }

Let T be the linear transformation T : R2x2 -> V1 with matrix

    [ 2 0  1 0 ]
    [ 1 1 -1 0 ]

relative to A and B. Find T [ 0 0 ].
                            [ 3 2 ]
15. Let T be the linear transformation of R3 into R2 that has the matrix

    A = [ 9 -4 -4 ]
        [   -3 -2 ]

relative to the standard bases E3 and E2. Find the matrix of T relative to the bases {(1, 2, 0), (1, 1, 1), (1, 1, 0)} of R3 and {(1, 0), (1, 1)} of R2.
16. Let T be a linear operator on R 2 that maps (2,1) onto (5,2) and (1,2) onto (7,10).
Determine the matrix of T with respect to the bases A = B = {(3,3), (1, - 1 ) } .
17. Find the matrix of T in Problem 9 relative to the bases {(-1,3), (—1,1)} of R 2
and £3 of R 3 .
18. A linear operator T on R4 has the matrix

    [ 1 0  1  1 ]
    [ 2 1  3  1 ]
    [ 1 0 -1 -1 ]
    [ 3 2  5  1 ]
    (c) A = [ 1 2 -1  2  1 ]    (d) A = [ 1  2 0 1  0 ]
            [ 1 4  4 -3 -1 ]            [ 2  4 1 4  3 ]
            [ 2 6  3 -1  0 ]            [ 1  2 2 5 -2 ]
            [ 3 8  2  1  2 ]            [ 1 -2 3 5  4 ]
20. Find a basis for the kernel of T in each part of Problem 19.
21. Let T be the linear transformation of R 5 into R 3 that has the matrix A relative to
the bases {(1,1,1,1,1), (1,1,1,1,0), (1,1,0,0,0), (1,0,0,0,0), (0,0,0,0,1)} of R 5
and {(1,1,1), (0,1,0), (1,0,0)} of R 3 . Find a basis for the range of T.
    (a) A = [ 1 3 2 0 -1 ]    (b) A = [ 1 -2 0 1 -4 ]
            [ 2 6 4 6  4 ]            [ 2 -4 1 3 -5 ]
            [ 1 3 2 2  1 ]            [ 1 -2 0 0 -2 ]
5.4 Change of Basis 169
22. Find a basis for the kernel of T in each part of Problem 21.
23. Let T be the linear operator on R3 that has the matrix

    [ 1 1 0 ]
    [ 2 2 0 ]
    [ 3 3 0 ]

relative to the basis A = B = {(1, 0, 1), (1, 1, 0), (0, 1, 1)} of R3. Find a basis for the kernel of T.
(a) T^-1(W1 + W2) = T^-1(W1) + T^-1(W2).
(b) T^-1(W1 ∩ W2) = T^-1(W1) ∩ T^-1(W2).
Theorem 5.14 Let C = {w1, w2, ..., wk} and C' = {w1', w2', ..., wk'} be two bases of the vector space W over F. For an arbitrary vector w in W, let

    [w]C = C = [ c1 ]    and    [w]C' = C' = [ c1' ]
               [ c2 ]                        [ c2' ]
               [ .. ]                        [ ..  ]
               [ ck ]                        [ ck' ]

denote the coordinate matrices of w relative to C and C', respectively. If P is the matrix of transition from C to C', then C = PC'. That is,

    [w]C = P[w]C'.
Proof. Let P = [pij]kxk, and assume that the hypotheses of the theorem are satisfied. Then

    wj' = Σ (i = 1 to k) pij wi

for j = 1, 2, ..., k. Combining these equalities, we have

    w = Σ (i = 1 to k) ci wi

and

    w = Σ (j = 1 to k) cj' wj'
      = Σ (j = 1 to k) cj' ( Σ (i = 1 to k) pij wi )
      = Σ (i = 1 to k) ( Σ (j = 1 to k) pij cj' ) wi.

Therefore ci = Σ (j = 1 to k) pij cj', and C = PC'. ■
Example 1 □ Let w = 4 + 3x in V1, and consider the bases C = {x, 2 + x} and C' = {4 + x, 4 - x} of V1 over R. From

    4 + x = (-1)(x) + (2)(2 + x),
    4 - x = (-3)(x) + (2)(2 + x),

the matrix of transition from C to C' is

    P = [ -1 -3 ]
        [  2  2 ]

Since w = 2(4 + x) + (-1)(4 - x), the coordinate matrix of w relative to C' is

    [w]C' = [  2 ]
            [ -1 ]

According to Theorem 5.14, the coordinate matrix [w]C may be found from

    [w]C = P[w]C' = [ -1 -3 ] [  2 ] = [ 1 ]
                    [  2  2 ] [ -1 ]   [ 2 ]
As a check, (1)(x) + (2)(2 + x) = 4 + 3x = w. ■
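The arithmetic of Example 1 takes only a few lines to verify (our sketch, representing each polynomial a + bx by its coefficient pair (a, b)):

```python
def mat_vec(P, x):
    """Matrix-vector product."""
    return [sum(p * c for p, c in zip(row, x)) for row in P]

# Transition matrix from C = {x, 2 + x} to C' = {4 + x, 4 - x}:
# its columns are the C-coordinates of 4 + x and 4 - x.
P = [[-1, -3],
     [ 2,  2]]
w_C_prime = [2, -1]                # [w]_C' for w = 4 + 3x
w_C = mat_vec(P, w_C_prime)        # Theorem 5.14: [w]_C = P [w]_C'

# Reconstruct w from its C-coordinates; x = (0, 1) and 2 + x = (2, 1).
basis_C = [(0, 1), (2, 1)]
w = tuple(sum(c * v[i] for c, v in zip(w_C, basis_C)) for i in range(2))
```

The result is [w]C = (1, 2) and w = (4, 3), i.e., 4 + 3x, in agreement with the check above.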
The following theorem gives a full description of the effect of a change in the bases
A and B.
Theorem 5.15 Suppose that T has matrix A = [aij]mxn relative to the bases A of U and B of V. If Q is the matrix of transition from A to the basis A' of U and P is the matrix of transition from B to the basis B' of V, then the matrix of T relative to A' and B' is P^-1 AQ. (See Figure 5.1.)
[Figure 5.1: T carries U (basis A) to V (basis B) with matrix A, and U (basis A') to V (basis B') with matrix A' = P^-1 AQ.]
Proof. Assume that the hypotheses of the theorem are satisfied. Let u be an arbitrary vector in U, let X and X' denote the coordinate matrices of u relative to A and A', respectively, and let Y and Y' denote the coordinate matrices of T(u) relative to B and B', respectively. According to Theorems 5.12 and 5.14, we have Y = AX, where Y = PY' and X = QX'. Substituting for Y and X, we have

    PY' = AQX',

and therefore

    Y' = (P^-1 AQ)X'.

By Theorem 5.13, P^-1 AQ is the matrix of T relative to A' and B'. ■
Theorem 5.16 Two m x n matrices A and B represent the same linear transformation T of U into V if and only if A and B are equivalent.
The proof of Theorem 5.16 can be modified so as to obtain two similar results concerning row equivalence and column equivalence. For requiring B' = B is the same as requiring P = Im, and requiring A' = A is the same as requiring Q = In. Thus we have the following theorems.

Theorem 5.17 Two m x n matrices A and B represent the same linear transformation T of U into V relative to the same basis of U if and only if they are row-equivalent.

Theorem 5.18 Two m x n matrices A and B represent the same linear transformation T of U into V relative to the same basis of V if and only if they are column-equivalent.
The three preceding theorems give a full exposition of the connection between linear
transformations and the equivalence relations on matrices that were studied in Chapter
3. However, there is one more major application of matrix theory to the study of linear
transformations. This application is contained in the following theorem.
Theorem 5.19 Let T be an arbitrary linear transformation of U into V, and let r be the rank of T. Then there exist bases A' of U and B' of V such that the matrix of T relative to A' and B' has the first r diagonal elements equal to 1, and all other elements zero.

Proof. With the stated hypotheses, suppose that A and B are bases of U and V, respectively, and that T has matrix A relative to A and B. By Theorem 3.46, there exist invertible matrices P^-1 and Q such that P^-1 AQ has the first r diagonal elements equal to 1, and all other elements zero. Let A' and B' be bases such that Q is the matrix of transition from A to A' and P is the matrix of transition from B to B'. Then P^-1 AQ is the matrix of T relative to A' and B', and the theorem is proved. ■
Thus, with suitable choice of bases, each linear transformation T of U into V can
be represented by a matrix of the form
Ir I 0
Dr = —+—
0 I 0
where r is the rank of T. From a different point of view, this means that two linear
transformations of U into V can be represented by the same matrix if and only if they
have the same rank. It is easy to see that the relation of having the same rank is an
equivalence relation on the set of all linear transformations of U into V.
First we use column operations to transform [A over I4] into [AQ over Q]:

    [ A  ]   [  1 -1 0  2 ]    [  1 0 0  0 ]
    [ -- ] = [  1 -1 1  1 ]    [  0 1 0  0 ]
    [ I4 ]   [  2 -2 1  3 ] -> [  1 1 0  0 ]   [ AQ ]
             [  1  0 0  0 ]    [  1 0 1 -2 ] = [ -- ]
             [  0  1 0  0 ]    [  0 0 1  0 ]   [ Q  ]
             [  0  0 1  0 ]    [ -1 1 0  1 ]
             [  0  0 0  1 ]    [  0 0 0  1 ]
Next we use row operations to transform [A', I3] into [Dr, P^-1], where A' = AQ and Dr = P^-1 A' = P^-1 AQ.

    [A', I3] = [ 1 0 0 0 | 1 0 0 ]    [ 1 0 0 0 |  1  0 0 ]
               [ 0 1 0 0 | 0 1 0 ] -> [ 0 1 0 0 |  0  1 0 ] = [Dr, P^-1].
               [ 1 1 0 0 | 0 0 1 ]    [ 0 0 0 0 | -1 -1 1 ]
Thus

    P^-1 = [  1  0 0 ]    and    Q = [  1 0 1 -2 ]
           [  0  1 0 ]               [  0 0 1  0 ]
           [ -1 -1 1 ]               [ -1 1 0  1 ]
                                     [  0 0 0  1 ]

are invertible matrices such that

    P^-1 AQ = [ 1 0 0 0 ]   [ I2 | 0 ]
              [ 0 1 0 0 ] = [ ---+-- ] = D2.
              [ 0 0 0 0 ]   [  0 | 0 ]
According to the proof of Theorem 5.19, the desired bases A' and B' can be found by using Q as the transition matrix from A to A' and P as the transition matrix from B to B'. Using Q to find A', we get
To find B', we first obtain P by taking the inverse of P^-1 and then use P as the transition matrix from B to B'. We find

    P = [  1  0 0 ]^-1   [ 1 0 0 ]
        [  0  1 0 ]    = [ 0 1 0 ]
        [ -1 -1 1 ]      [ 1 1 1 ]

and B' = {1 + x^2, x + x^2, x^2}. The original defining equation for T(a1, a2, a3, a4) can be used to check that D2 is in fact the matrix of T with respect to A' and B'. ■
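The factorization found in the example can be confirmed by direct multiplication (our sketch, using the matrices computed above):

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, -1, 0, 2],
     [1, -1, 1, 1],
     [2, -2, 1, 3]]
P_inv = [[ 1,  0, 0],
         [ 0,  1, 0],
         [-1, -1, 1]]
Q = [[ 1, 0, 1, -2],
     [ 0, 0, 1,  0],
     [-1, 1, 0,  1],
     [ 0, 0, 0,  1]]
D = matmul(P_inv, matmul(A, Q))    # should equal D2
```

The product D is [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]], which is D2 as claimed.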
Exercises 5.4
4. Suppose

    [ 3 5 ]
    [ 1 2 ]

is the transition matrix from the basis A = {(-4, 2), (10, -6)} of R2 to the basis B.
8. Suppose S : R3 -> R2 is the linear transformation with matrix

    [ 1 -3 1 ]
    [ 2 -6 2 ]

relative to the bases E3 and E2. Find the matrix of S with respect to the bases {(1, 0, 1), (1, 0, 0), (1, 1, 0)} and {(1, -1), (2, 0)}.
9. Let S be the linear transformation from R3 to R2 that has the matrix

    [ 2 3 1 ]
    [ 1 2 1 ]

relative to E3 and E2. Given that

    [ 1 1  1 ]
    [ 0 0 -1 ]
    [ 0 1  1 ]

is the transition matrix from E3 to A' and

    [ 2 3 ]
    [ 1 2 ]

is the transition matrix from E2 to B', find the matrix of S relative to A' and B'.
10. Let T be the linear transformation of R3 into R2 that has the matrix

    [ 1 -1 1 ]
    [ 2  1 1 ]

relative to the bases {(1, 2, 0), (1, 1, 1), (1, 1, 0)} of R3 and {(1, 1), (1, -1)} of R2. Find the matrix of T relative to the bases {(2, 3, 0), (1, 1, 1), (2, 3, 1)} of R3 and {(3, -1), (1, -1)} of R2.
11. Let T be the linear operator on V2 over R that has the matrix

    [ 2 1 0 ]
    [ 0 2 0 ]
    [ 2 3 1 ]

relative to the bases A = B = {x + x^2, -1 + x, x}. Find the matrix of T relative to the basis A' = B' = {2 - x + x^2, -6x - 2x^2, x}.
12. Suppose the linear operator T on V1 over R has the matrix

    [ 3 -2 ]
    [ 1  0 ]

with respect to A = B = {1 - x, x}. Find the matrix of T with respect to A' = B' = {2 - x, -1}.
13. The linear operator S : R2 -> R2 has the matrix

    [ 1 4 ]
    [ 2 3 ]

with respect to the bases A = B = {(1, 1), (0, 1)}. Find the matrix of S with respect to A' = B' = {(2, 1), (1, 2)}.
14. Let T be the linear operator in Problem 13 of Exercises 5.3. Use matrices of transition to find the matrix of T relative to E2 and E2. Then use the new matrix to compute T(u), thus making a check on the answer previously obtained.
16. Suppose the linear transformation S : R3 -> R2 has the matrix

    A = [ 1 -1 1 ]
        [ 2  1 0 ]

relative to E3 and E2. Find bases A' of R3 and B' of R2 such that the matrix A' of S relative to A' and B' is the reduced row-echelon form for A.
17. Suppose the linear transformation T : R3 -> R2 has the matrix

    A = [ 2 3 1 ]
        [ 1 2 1 ]
5.5 Composition of Linear Transformations 177
1 0 1
2 -3
relative to £3 and £2· Given that B\ and B<i = 0 1 -1 are
1 2
0 0 1
1 0 0
invertible matrices such that B1AB2 = Ό2 , find bases A! of R 3
0 1 0
and B' of R 2 such that T has matrix Ό2 relative to A' and B'.
18. Suppose T is the linear transformation of R^n into R^m that has the given matrix A relative to En and Em. Find bases A' of R^n and B' of R^m that satisfy the conditions given in Theorem 5.19.

    (a) A = [ 1  3  2  0 -1 ]    (b) A = [ 3 -2 -1 -4 ]
            [ 2  6  5  6  1 ]            [ 1  1 -2 -3 ]
            [ 1  3  2  2  0 ]            [ 2  3 -1  1 ]
Proof. Let A = {u_1, u_2, ..., u_n} and B = {v_1, v_2, ..., v_m} be bases of U and V, respectively. If S has matrix A = [a_ij]_(m×n) and T has matrix B = [b_ij]_(m×n) relative to A and B, then S(u_j) = Σ_{i=1}^{m} a_ij·v_i and T(u_j) = Σ_{i=1}^{m} b_ij·v_i. Hence

    (S + T)(u_j) = Σ_{i=1}^{m} (a_ij + b_ij)·v_i,

    (aT)(u_j) = Σ_{i=1}^{m} (a·b_ij)·v_i,
We turn our attention now to the "third operation" on linear transformations that
was mentioned on page 150. This third operation is the composition of linear transformations, defined by the same kind of rule as used for composite functions in the
calculus.
In the calculus, two given functions / and g are combined to produce the composite
function / o g by the rule
(fog)(x) = f(g(x)).
The domain of / o g is the set of all x in the domain of g such that / is defined at g(x).
In linear algebra, it is common to refer to the composite of two linear transformations
as their product. We adopt this usage here, stated formally in the following definition.
Definition 5.21 Let U, V, and W be vector spaces over the same field F, and suppose
that S is a linear transformation of U into V and that T is a linear transformation of
V into W . Then the product TS is the mapping of U into W defined by
TS(u) = (ToS)(u) = T(S(u))
for each u in U.
Example 1. Consider the linear transformations S : R^(2×2) → R^3 and T : R^3 → V1 over R defined by

    S [ a  b ] = (a + 2b, b - 3c, c + d)
      [ c  d ]

and

    T(a_1, a_2, a_3) = (a_1 - 2a_2 - 6a_3) + (a_2 + 3a_3)x.

Computing the product TS, we have

    TS [ a  b ] = T(a + 2b, b - 3c, c + d)
       [ c  d ]
                = ((a + 2b) - 2(b - 3c) - 6(c + d)) + ((b - 3c) + 3(c + d))x
                = (a - 6d) + (b + 3d)x.  ■
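The computation in Example 1 is easy to check numerically. The sketch below is an aid of this edition rather than part of the text: S and T are written as ordinary Python functions, a polynomial c0 + c1·x is represented by its coefficient pair (c0, c1), and the composite TS is compared with the closed form (a - 6d) + (b + 3d)x.

```python
def S(a, b, c, d):
    # S maps the 2x2 matrix [a b; c d] to a vector in R^3
    return (a + 2*b, b - 3*c, c + d)

def T(a1, a2, a3):
    # T maps R^3 into V_1; the polynomial c0 + c1*x is stored as (c0, c1)
    return (a1 - 2*a2 - 6*a3, a2 + 3*a3)

def TS(a, b, c, d):
    # The product TS is the composite T after S
    return T(*S(a, b, c, d))

# Compare with the closed form (a - 6d) + (b + 3d)x found in Example 1
for (a, b, c, d) in [(1, 0, 0, 0), (0, 1, 0, 0), (2, -3, 5, 7)]:
    assert TS(a, b, c, d) == (a - 6*d, b + 3*d)
```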
One of the exercises for this section asks for verification that the product in Definition
5.21 is associative but not commutative. The most important property, as far as our
study is concerned, is stated in Theorem 5.22.
For the remainder of this section, U, V, and W will denote vector spaces over the same field F. Also, S and T will denote linear transformations of U into V and of V into W, respectively.
    TS(a·u_1 + b·u_2) = T(S(a·u_1 + b·u_2))
                      = T(a·S(u_1) + b·S(u_2))
                      = a·T(S(u_1)) + b·T(S(u_2))
                      = a·TS(u_1) + b·TS(u_2),

which shows that the product TS is a linear transformation of U into W. ■
Theorem 5.23 Suppose that U, V, and W are finite-dimensional vector spaces with
bases *4, B, and C, respectively. If S has matrix A relative to A and B and T has
matrix B relative to B and C, then TS has matrix BA relative to A and C.
    TS(u_j) = Σ_{k=1}^{m} a_kj·T(v_k)
            = Σ_{k=1}^{m} a_kj ( Σ_{i=1}^{p} b_ik·w_i )
            = Σ_{k=1}^{m} Σ_{i=1}^{p} (a_kj·b_ik)·w_i
            = Σ_{i=1}^{p} Σ_{k=1}^{m} (b_ik·a_kj)·w_i
            = Σ_{i=1}^{p} ( Σ_{k=1}^{m} b_ik·a_kj ) w_i,

and the coefficient Σ_{k=1}^{m} b_ik·a_kj of w_i is the (i, j) entry of BA. Thus TS has matrix BA relative to A and C. ■
B = E3, and C = {1, x}. With S and T as in Example 1, we shall find the matrices of S, T, and TS relative to these bases and verify that the matrix of TS is the product of the matrix of T times the matrix of S.
Since

    S [ 1  0 ] = (1,0,0),    S [ 0  1 ] = (2,1,0),
      [ 0  0 ]                 [ 0  0 ]

    S [ 0  0 ] = (0,-3,1),   S [ 0  0 ] = (0,0,1),
      [ 1  0 ]                 [ 0  1 ]

S has matrix

    A = [ 1  2  0  0 ]
        [ 0  1 -3  0 ]
        [ 0  0  1  1 ],

and T has matrix

    B = [ 1 -2 -6 ]
        [ 0  1  3 ].

Since

    TS [ 1  0 ] = (1)(1) + (0)(x),    TS [ 0  1 ] = (0)(1) + (1)(x),
       [ 0  0 ]                          [ 0  0 ]

    TS [ 0  0 ] = (0)(1) + (0)(x),    TS [ 0  0 ] = (-6)(1) + (3)(x),
       [ 1  0 ]                          [ 0  1 ]

TS has matrix

    [ 1  0  0 -6 ]
    [ 0  1  0  3 ].
This agrees with the product

    BA = [ 1 -2 -6 ] [ 1  2  0  0 ]   [ 1  0  0 -6 ]
         [ 0  1  3 ] [ 0  1 -3  0 ] = [ 0  1  0  3 ].
                     [ 0  0  1  1 ]
We note that the product AB is not defined, and this is consistent with the fact that
ST is not defined. ■
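The verification in Example 2 can be repeated by machine. The NumPy sketch below (NumPy is an assumption of this sketch; the book works by hand) enters the matrices A of S and B of T and checks that the product BA reproduces the matrix found for TS, as Theorem 5.23 asserts.

```python
import numpy as np

A = np.array([[1, 2, 0, 0],    # matrix of S relative to the chosen bases
              [0, 1, -3, 0],
              [0, 0, 1, 1]])
B = np.array([[1, -2, -6],     # matrix of T
              [0, 1, 3]])

BA = B @ A                     # matrix of the product TS, by Theorem 5.23
expected = np.array([[1, 0, 0, -6],
                     [0, 1, 0, 3]])
assert (BA == expected).all()

# The product AB is not defined: shapes (3, 4) and (2, 3) do not conform,
# which matches the fact that ST is not defined.
```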
The operations of addition and multiplication of matrices are connected by the dis
tributive property. The statement of this fact is in Theorem 5.24. Proofs are requested
in the exercises.
Theorem 5.24 Let A = [a_ij]_(m×n), B = [b_ij]_(n×p), and C = [c_ij]_(n×p) be matrices over F. Then

    A(B + C) = AB + AC.
There is the possibility in Definition 5.10 that the vector spaces U and V may be
identical, and yet the bases A and B may be different. In many instances, there is
no condition present that requires that A and B be different. In these instances, it is
convenient and conventional to choose A = B. If it is our intention to use just one basis A of V, we simply use the phrase "matrix of T relative to A" rather than "matrix of T relative to A and B, where A and B are equal." Similarly, "A represents T relative to A" means that A represents T relative to A and B, where A and B are equal.
Let us consider the case where U = V = W in Definition 5.21. This allows us to define positive integral powers of T inductively by T^(k+1) = T^k ∘ T for each positive integer k. We define T^0 to be the identity transformation of V. In combination with Definition 5.2, this determines the value of each polynomial a_r·T^r + a_(r-1)·T^(r-1) + ··· + a_1·T + a_0·T^0 in T with coefficients in F, and such a polynomial is always a linear transformation of V into V. In such a polynomial we shall write a_0 in place of a_0·T^0.
If T has matrix A relative to the basis A, then Theorems 5.20 and 5.23 show that a_r·T^r + a_(r-1)·T^(r-1) + ··· + a_1·T + a_0 has matrix a_r·A^r + a_(r-1)·A^(r-1) + ··· + a_1·A + a_0·I relative to A. Consequently, Σ_{i=0}^{r} a_i·T^i is the zero linear transformation if and only if Σ_{i=0}^{r} a_i·A^i is the zero matrix. This means that T and A satisfy the same polynomial equations.
A linear transformation T of U into V is called invertible or nonsingular if there is a mapping S of V into U such that ST(u) = u for all u ∈ U and TS(v) = v for all v ∈ V. Whenever such a mapping S exists, it is denoted by S = T^(-1) and is called the inverse of T. It is left as an exercise to prove that the inverse of T is a linear transformation of V into U. It follows from Theorem 5.23 that if T is invertible and has matrix A relative to the bases A of U and B of V, then T^(-1) has matrix A^(-1) relative to B and A.
The next example shows how polynomials in matrices and linear transformations
can sometimes give interesting and surprising results.
Example 3. Let T be the linear operator on R^3 defined by

    T(x_1, x_2, x_3) = (2x_1 + x_2, 2x_2, 2x_1 + 3x_2 + x_3).

Relative to E3, T has the matrix

    A = [ 2  1  0 ]
        [ 0  2  0 ]
        [ 2  3  1 ],

and direct computation shows that

    A^3 - 5A^2 + 8A - 4I

      = [ 8 12  0 ]     [ 4  4  0 ]     [ 2  1  0 ]     [ 1  0  0 ]
        [ 0  8  0 ] - 5 [ 0  4  0 ] + 8 [ 0  2  0 ] - 4 [ 0  1  0 ]
        [14 31  1 ]     [ 6 11  1 ]     [ 2  3  1 ]     [ 0  0  1 ]

      = [ 0  0  0 ]
        [ 0  0  0 ]
        [ 0  0  0 ].
The equation A^3 - 5A^2 + 8A - 4I = 0 also has some implications concerning positive integral powers of A. For instance,

    A^3 = 5A^2 - 8A + 4I

and this implies that

    A^4 = 5A^3 - 8A^2 + 4A.

Substituting for A^3, we have

    A^4 = 5(5A^2 - 8A + 4I) - 8A^2 + 4A = 17A^2 - 36A + 20I.

This substitution procedure can be repeated so as to express any higher integral power of A as a quadratic polynomial in A, and the corresponding powers of T can be expressed as quadratic polynomials in T. ■
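Both claims of Example 3 are easy to confirm numerically. The following NumPy sketch (NumPy is assumed here, not part of the text) checks the identity A^3 - 5A^2 + 8A - 4I = 0 and the quadratic expression for A^4 obtained by substitution.

```python
import numpy as np

A = np.array([[2, 1, 0],
              [0, 2, 0],
              [2, 3, 1]])
I = np.eye(3, dtype=int)

A2 = A @ A
A3 = A2 @ A
A4 = A3 @ A

# A satisfies the cubic polynomial equation of Example 3
assert (A3 - 5*A2 + 8*A - 4*I == 0).all()

# Substituting A^3 = 5A^2 - 8A + 4I into A^4 = 5A^3 - 8A^2 + 4A
# gives A^4 = 17A^2 - 36A + 20I, a quadratic polynomial in A.
assert (A4 == 17*A2 - 36*A + 20*I).all()
```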
Exercises 5.5
3. Suppose that S and T are linear operators on R^2 such that S has matrix

    [ 1 -1 ]
    [ 0  2 ]

   and T has matrix

    [ 3  0 ]
    [-2  1 ]

   relative to the standard basis E2. Find the matrix that represents TS relative to E2.
4. Let T : R 2 -► R 2 and 5 : R 2 -> R 2 be given by
5. Suppose the linear operator S : R^2 → R^2 has the matrix

    [ 5 -2 ]
    [ 1 -3 ]

   relative to the basis {(1,2), (0,1)} and the linear transformation T : R^2 → R^3 has the matrix

    [ 2  1 ]
    [ 1 -1 ]
    [ 1  1 ]

   relative to the bases {(1,2), (0,1)} of R^2 and E3 of R^3. Find a matrix representation for TS.
6. Find the matrix representation of S^(-1) in Problem 5 with respect to the basis {(1,2), (0,1)}.
    v_1 = u_2 + 2u_3
    v_2 = u_3
    v_3 = u_1 + 2u_2 + 5u_3.
9. Given that T is a linear operator on R^2 with matrix

    [ 3  2 ]
    [-4 -2 ]

   relative to E2, find the matrix of 2T^3 + T^2 - 3T + 7 relative to E2.
10. Let T be the linear operator on R^2 that has the matrix

    [ 1  2 ]
    [ 2 -2 ]

    relative to E2.

    T^3 - 3T^2 + 4T + 6 = 0,
14. Give an example which shows that it may happen that ST ≠ TS, even when both ST and TS are defined.

15. Let T_1 and T_2 be linear transformations of U into V, and let S be a linear transformation of V into W. Suppose that S, T_1, and T_2 have matrices A, B, and C, respectively, relative to certain bases of U, V, and W. Prove that S(T_1 + T_2) has matrix AB + AC relative to these same bases.
17. Use Theorem 3.6 and the definition of addition of matrices to prove Theorem 5.24.
20. Let T be a linear operator on V with matrix A relative to the basis A of V. Prove
that T is invertible if and only if A is invertible.
Chapter 6
Determinants
6.1 Introduction
In this chapter, the fundamentals of the theory of determinants are developed. A
knowledge of this material is necessary in the study of eigenvalues and eigenvectors of
linear transformations, and many of the applications of linear algebra involve a use of
eigenvalues and eigenvectors. These topics will be studied in Chapter 7.
Definition 6.2 The index of a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} is the integer X given by

    X = Σ_{k=1}^{n} X(j_k),

where X(j_k), the index of j_k, is the number of elements that precede j_k in the permutation and are greater than j_k.

For example, the permutation 1, 4, 5, 3, 2, 6 of {1, 2, ..., 6} has index

    X = X(1) + X(4) + X(5) + X(3) + X(2) + X(6)
      = 0 + 0 + 0 + 2 + 3 + 0
      = 5. ■
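The index is entirely mechanical to compute: for each element, count the larger elements that precede it, then sum the counts. A short plain-Python sketch of this rule (the function name is ours, not the book's):

```python
def index(perm):
    """Index of a permutation: for each element, the number of larger
    elements preceding it, summed over all positions (Definition 6.2)."""
    return sum(1
               for k, jk in enumerate(perm)
               for earlier in perm[:k]
               if earlier > jk)

assert index((1, 4, 5, 3, 2, 6)) == 5    # the example above
assert index((1, 2, 3, 4, 5, 6)) == 0    # the natural ordering
```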
where X'(j_k) denotes the index of j_k in the new permutation. It is clear that X'(j_k) = X(j_k) if k is different from m and m+1. If j_m > j_(m+1), then X'(j_m) = X(j_m), X'(j_(m+1)) = X(j_(m+1)) - 1, and consequently X' = X - 1. On the other hand, if j_m < j_(m+1), then X'(j_m) = X(j_m) + 1 and X'(j_(m+1)) = X(j_(m+1)), so that X' = X + 1. In either case, the index is changed by 1, and the theorem is proven. ■
Theorem 6.3 paves the way for obtaining the corresponding result concerning the
interchange of any two (not necessarily adjacent) elements in a permutation.
Theorem 6.4 Any interchange of two elements in a permutation j_1, j_2, ..., j_n of the set {1, 2, ..., n} changes the index by an odd integer.
6.2 Permutations and Indices 189
Proof. Let

    j_1, j_2, ..., j_r, ..., j_s, ..., j_n

be the given permutation, and consider the interchange of j_r and j_s. Let m be the number of elements between j_r and j_s.
Now the permutation that results from the interchange of j_r and j_s can be accomplished by using only interchanges of adjacent elements. The element j_r can be moved
to the position initially occupied by j s by m + 1 interchanges with the adjacent element
on the right. Then j s can be moved to the position that j r initially occupied by m
interchanges with the adjacent element on the left. Thus, the interchange of j r and
j s can be accomplished by 2m + 1 interchanges of adjacent elements. These 2m + 1
interchanges cause 2m + 1 changes of 1 in the index of the ordering, and consequently
the index has changed by an odd number. ■
The main objective of this section is to establish Theorem 6.7 for use in Section 6.3.
The following two lemmas are basic to our proof of Theorem 6.7.
Lemma 6.5 If a given permutation of {1, 2,..., n} is carried into another permutation
by an odd number of interchanges of elements, then the index of the given permutation
differs from the index of the final permutation by an odd number.
Proof. Suppose that a given permutation of {1,2, ...,n} is carried into another
permutation by an odd number of interchanges of elements. According to Theorem 6.4,
each of the interchanges of elements changes the index by an odd number. Thus, the
index of the original permutation differs from the index of the final permutation by an
odd number, since the sum of an odd number of odd integers is an odd integer. ■
Lemma 6.6 If a given permutation of {1, 2, ..., n} is carried into another permutation by an even number of interchanges of elements, then the index of the given permutation differs from the index of the final permutation by an even number.

Proof. The proof is an exact parallel to that of Lemma 6.5, except that the sum of an even number of odd integers is an even integer. ■
Theorem 6.7 The number of interchanges used to carry a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} into the natural ordering is either always odd or always even.

Proof. Since the index of the natural ordering is zero, the difference in the indices of j_1, j_2, ..., j_n and the natural ordering is the same as the index X of j_1, j_2, ..., j_n. If j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by an odd number of interchanges, then X must be odd by Lemma 6.5. If j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by an even number of interchanges, then X must be even by Lemma 6.6. Thus the number of interchanges used to carry j_1, j_2, ..., j_n into 1, 2, ..., n must always be odd if X is odd and must always be even if X is even. ■
We determine whether the index of the permutation

    6, 4, 1, 7, 5, 2, 3

is odd or even by counting the number of interchanges used to carry this permutation into the natural ordering.
Because we have used an odd number of interchanges to carry the original permutation
into the natural ordering, the index X of
6,4,1,7,5,2,3
must be odd by Lemma 6.5, and the number of interchanges used to carry the original
permutation into the natural ordering would always be odd by Theorem 6.7. Although
always odd, this number of interchanges may be different from the index of the permutation. In this example, we used 5 interchanges of elements and the index of

    6, 4, 1, 7, 5, 2, 3

is

    X = X(1) + X(2) + X(3) + X(4) + X(5) + X(6) + X(7)
      = 2 + 4 + 4 + 1 + 2 + 0 + 0
      = 13. ■
6.3 The Definition of a Determinant 191
Exercises 6.2
1. Find the index of the following permutations.
(a) 5,3,1,4,2 (b) 2,4,5,3,1 (c) 6,2,4,5,3,1
(d) 5,3,4,2,1,6 (e) 3,4,6,1,2, 7,5 (f) 2,5,1,4,3, 7,6
2. Determine whether the index of the given permutation is odd or even by counting
the number of interchanges used to carry the given permutation into the natural
ordering.
(a) 3,4,2,5,1 (b) 5,1,3,2,4 (c) 6,2,4,5,3,1
(d) 5,2,4,6,1,3 (e) 4,1,7,5,2,6,3 (f) 3,7,5,4,6,2,1
3. Write out a sequence of interchanges of adjacent elements that will accomplish the
interchange of the given pair of elements in the permutation 6,2,4,5,3,1.
(a) 2 and 3 (b) 6 and 3 (c) 2 and 1 (d) 4 and 1
4. Prove that the number of interchanges used to carry a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} into itself (i.e., into the same permutation) must always be even.
5. If n > 2, what is the index of the permutation n, n — 1, ...,3,2,1? Justify your
answer.
Definition 6.8 If A = [a_ij]_n, the determinant of A is the scalar

    det(A) = |A| = Σ_(j) (-1)^t a_{1,j_1} a_{2,j_2} ··· a_{n,j_n},

where Σ_(j) denotes the sum over all possible permutations j_1, j_2, ..., j_n of 1, 2, ..., n, and t is the number of interchanges used to carry j_1, j_2, ..., j_n into the natural ordering.
Although the number of interchanges used to carry j_1, j_2, ..., j_n into 1, 2, ..., n is not always the same, Theorem 6.7 assures us that this number is either always even or always odd. Hence the sign (-1)^t of each term is well-defined, and det(A) is uniquely determined by A.

We observe that there are n! terms in the sum det(A) since there are n! possible orderings of 1, 2, ..., n. The determinant of an n × n matrix is referred to as an n × n determinant, or a determinant of order n.
By the definition, the determinant of A = [a_ij]_3 is given by

    det(A) = Σ_(j) (-1)^t a_{1,j_1} a_{2,j_2} a_{3,j_3}.

Since 1,2,3 is the natural ordering, we may take t_1 = 0. Since 1,3,2 can be carried into 1,2,3 by the single interchange of 2 and 3, we may take t_2 = 1. The ordering 2,3,1 can be carried into 1,2,3 by an interchange of 2 and 1, followed by an interchange of 2 and 3. Thus we may take t_3 = 2. By the same method, we find t_4 = 1, t_5 = 1, and t_6 = 2. Hence

    det(A) = a_11·a_22·a_33 - a_11·a_23·a_32 + a_12·a_23·a_31
             - a_12·a_21·a_33 - a_13·a_22·a_31 + a_13·a_21·a_32. ■
It is worth noting that the value of det(A) obtained in Example 1 agrees with the value yielded by evaluation routines that are taught in high school algebra. One of the most popular of these routines evaluates a 3 × 3 determinant |A| by reproducing the first two columns to the right of the array and forming signed products along the six diagonals of the resulting array. (The diagram illustrating this diagonal rule is omitted here.) This routine gives a correct value, but it is important to know that there is no similar scheme that works for determinants of order 4 or any order greater than 3.
Definition 6.8 is frequently referred to as the "row" definition of a determinant, since the row subscripts on the factors a_ij are held fixed in the natural ordering. The next theorem presents an alternate formulation (the "column" definition) in which the column subscripts are held fixed in the natural ordering.
Specifically,

    det(A) = Σ_(i) (-1)^s a_{i_1,1} a_{i_2,2} ··· a_{i_n,n},

where Σ_(i) denotes the sum over all possible permutations i_1, i_2, ..., i_n of 1, 2, ..., n, and s is the number of interchanges used to carry i_1, i_2, ..., i_n into the natural ordering.
Proof. Let S = Σ_(i) (-1)^s a_{i_1,1} a_{i_2,2} ··· a_{i_n,n}. Now both S and det(A) have n! terms. Except possibly for sign, each term of S is a term of det(A), and each term of det(A) is a term of S. Thus, S and det(A) consist of the same terms, with a possible difference in sign.

Consider a certain term (-1)^s a_{i_1,1} a_{i_2,2} ··· a_{i_n,n} of S and let (-1)^t a_{1,j_1} a_{2,j_2} ··· a_{n,j_n} be the corresponding term in det(A). Then a_{i_1,1} a_{i_2,2} ··· a_{i_n,n} can be carried into a_{1,j_1} a_{2,j_2} ··· a_{n,j_n} by s interchanges of factors since the permutation i_1, i_2, ..., i_n can be changed into the natural ordering 1, 2, ..., n by s interchanges of elements. This means that the natural ordering 1, 2, ..., n can be changed into the permutation j_1, j_2, ..., j_n by s interchanges since the column subscripts have been interchanged each time the factors were interchanged. But j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by t interchanges, by the definition of det(A). Thus 1, 2, ..., n can be carried into j_1, j_2, ..., j_n and then back into itself by s + t interchanges. Since 1, 2, ..., n can be carried into itself by an even number (zero) of interchanges, s + t is even by Theorem 6.7. Therefore (-1)^(s+t) = 1 and (-1)^s = (-1)^t. Now we have the corresponding terms in det(A) and S with the same sign, and therefore det(A) = S. ■
Exercises 6.3
1. Determine whether t is even or odd in the given term of det(A), where A = [a_ij]_n.

    (a) (-1)^t a_13 a_21 a_34 a_42    (b) (-1)^t a_14 a_21 a_33 a_42    (c) (-1)^t a_14 a_23 a_32 a_41
    (d) (-1)^t a_12 a_24 a_31 a_43    (e) (-1)^t a_24 a_43 a_12 a_31    (f) (-1)^t a_21 a_33 a_14 a_42
7. If A = [a_ij]_n and c is any scalar, express the value of det(cA) in terms of det(A).
Evaluate the following determinants.

    (a) | 4  0  0 |    (b) | 4 -5  7 |    (c) | a_11  0     0    |    (d) | a_11  0     0    |
        | 0 -3  0 |        | 0 -3 -8 |        | a_21  a_22  0    |        | 0     0     a_23 |
        | 0  0  6 |        | 0  0  6 |        | a_31  a_32  a_33 |        | 0     a_32  0    |
Theorem 6.11 If the square matrix B is obtained from the matrix A by an elementary row (column) operation of type III, then det(B) = -det(A).
The main purpose of this section is to establish an expression for the value of a
determinant that is known as "an expansion by cofactors." Some new notation and
terminology are needed to state this expansion by cofactors.
Definition 6.12 The minor of the element a_ij in A = [a_ij]_n is the determinant M_ij of the (n - 1) × (n - 1) submatrix of A obtained by deleting row i and column j of A.
Example 1. To obtain the minor of a_12 in A = [a_ij]_3, we first delete row 1 and column 2 of A. We then evaluate the determinant of the submatrix that remains. The minor M_12 of a_12 is given by

    M_12 = | a_21  a_23 | = a_21·a_33 - a_23·a_31.
           | a_31  a_33 |
is

    M_32 = | 7  9 | = 42 - 36 = 6,
           | 4  6 |

and the cofactor of 2 is

    A_32 = (-1)^(3+2)·M_32 = -M_32 = -6.
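Minors and cofactors are simple to compute mechanically. The sketch below is ours: `minor` deletes row i and column j (1-based subscripts, as in the text) and evaluates the remaining 2 × 2 determinant, and `cofactor` attaches the sign (-1)^(i+j). The sample matrix is hypothetical; only its rows beginning 7 _ 9 and 4 _ 6 are fixed so that M_32 matches the value 6 found above.

```python
def minor(A, i, j):
    """Determinant of the submatrix of a 3x3 matrix A with row i and
    column j deleted (i, j are 1-based, as in the text)."""
    sub = [[A[r][c] for c in range(3) if c != j - 1]
           for r in range(3) if r != i - 1]
    # 2x2 determinant of the remaining submatrix
    return sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]

def cofactor(A, i, j):
    return (-1) ** (i + j) * minor(A, i, j)

# Hypothetical 3x3 matrix; deleting row 3 and column 2 leaves [7 9; 4 6]
A = [[7, 1, 9],
     [4, 5, 6],
     [8, 2, 3]]

assert minor(A, 3, 2) == 6
assert cofactor(A, 3, 2) == -6
```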
Our next theorem shows that the evaluation of an n × n determinant can be reduced
to the evaluation of n determinants of order n — 1. This is of little practical use except
when used in combination with elementary operations. Aside from this fact, however,
the theorem has substantial theoretical value. The expression given in the theorem is
referred to as "an expansion by cofactors" or more precisely as "the expansion about
the ith row." This expansion is the main result of this section.
6.4 Cofactor Expansions 197
    det(A) = a_i1·A_i1 + a_i2·A_i2 + ··· + a_in·A_in.
Proof. For a fixed integer i, we collect all of the terms in the sum det(A) = Σ_(j) (-1)^t a_{1,j_1} a_{2,j_2} ··· a_{n,j_n} that contain a_i1 as a factor in one group, all of the terms that contain a_i2 as a factor in another group, and so on for each column number. This separates the terms in det(A) into n groups with no overlapping since each term contains exactly one factor from row i. In each of the terms containing a_i1, we factor out a_i1 and let F_i1 denote the remaining factor. Repeating this process for each of a_i2, a_i3, ..., a_in in turn, we obtain

    det(A) = a_i1·F_i1 + a_i2·F_i2 + ··· + a_in·F_in.

To finish the proof, we need only show that F_ij = A_ij = (-1)^(i+j)·M_ij, where M_ij is the minor of a_ij.
Consider first the case where i = 1 and j = 1. We shall show that a_11·F_11 = a_11·M_11. Each term in F_11 was obtained by factoring a_11 from a term (-1)^(t_1) a_11 a_{2,j_2} ··· a_{n,j_n} in the expansion of det(A). Thus each term in F_11 has the form (-1)^(t_1) a_{2,j_2} a_{3,j_3} ··· a_{n,j_n}, while each term in M_11 has the form (-1)^(t_2) a_{2,j_2} a_{3,j_3} ··· a_{n,j_n}, where t_2 is the number of interchanges used to carry j_2, j_3, ..., j_n into 2, 3, ..., n. Letting j_2, j_3, ..., j_n range over all permutations of 2, 3, ..., n, we see that each of F_11 and M_11 has (n - 1)! terms. Now 1, j_2, ..., j_n can be carried into the natural ordering by the same interchanges used to carry j_2, ..., j_n into 2, ..., n. That is, we may take t_1 = t_2. This means that F_11 and M_11 have exactly the same terms, yielding F_11 = M_11 and a_11·F_11 = a_11·M_11.

Consider now an arbitrary a_ij. By i - 1 interchanges of the original row i with the adjacent row above and then j - 1 interchanges of column j with the adjacent column on the left, we obtain a matrix B that has a_ij in the first row, first column position. Since the order of the remaining rows and columns of A was not changed, the minor of a_ij in B is the same M_ij as it is in A.
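The expansion about a row reduces an n × n determinant to n determinants of order n - 1, which invites a recursive implementation. A plain-Python sketch (ours, expanding about the first row; 0-based indices, so the sign (-1)^j corresponds to (-1)^(1 + (j+1)) in the text's notation):

```python
def det(A):
    """Determinant by cofactor expansion about the first row:
    det(A) = a_11*A_11 + a_12*A_12 + ... + a_1n*A_1n."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 1 and column j+1
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(sub)
    return total

# The matrix used in Section 6.6, whose determinant is -2
assert det([[11, -6, 2], [3, -2, 1], [2, -2, 2]]) == -2
```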
Theorem 6.14 proves to be extremely useful. For instance, it provides the key step
in establishing our next theorem.
Theorem 6.16 The expression c_1·A_i1 + c_2·A_i2 + ··· + c_n·A_in is equal to the determinant of a matrix which is the same as A = [a_ij]_n except that the elements a_ij of the ith row have been replaced by the scalars c_j.
Proof. Expanding about the ith row,

    det(A) = a_i1·A_i1 + a_i2·A_i2 + ··· + a_in·A_in.

Let B be obtained from A by replacing a_i1 by c_1, a_i2 by c_2, ..., a_in by c_n. Then det(B) can be found from the above expansion by replacing each a_ik by c_k for k = 1, 2, ..., n. In evaluating A_ik, the ith row and kth column are deleted from A. Thus, the values of the A_ik do not depend on any of the elements a_i1, a_i2, ..., a_in. Therefore, the values of the A_ik do not change when each a_ik is replaced by c_k, and

    det(B) = c_1·A_i1 + c_2·A_i2 + ··· + c_n·A_in. ■
The result parallel to Theorem 6.16 has the following formulation in terms of columns.

Theorem 6.17 The expression c_1·A_1j + c_2·A_2j + ··· + c_n·A_nj is equal to the determinant of a matrix which is the same as A = [a_ij]_n except that the elements a_ij of the jth column have been replaced by the scalars c_i.
Theorem 6.18 The determinant of a matrix A = [a_ij]_n that has two identical rows (columns) is zero.
Proof. In contrast to the development thus far in this chapter, we must take into account the field F that contains the elements a_ij of A.

Suppose first that 1 + 1 ≠ 0 in F. If the uth and vth rows of A are identical, let B be the matrix formed from A by the interchange of the uth and vth rows. Then B = A, but det(B) = -det(A) by Theorem 6.11. Thus we have det(A) = -det(A), and (1 + 1)·det(A) = 0. Since 1 + 1 ≠ 0, det(A) = 0.

Consider now the case where 1 + 1 = 0 in F. According to Theorem 6.11, an interchange of two rows in A produces a matrix whose determinant has the value -det(A). But det(A) = -det(A) since det(A) is in F and c + c = (1 + 1)c = 0 for all c in F. Thus, an interchange of rows does not change the value of the determinant, and there is no loss of generality if we assume that the first two rows are equal. That is, a_1j = a_2j for all j. Since -1 = 1 in F, det(A) = Σ_(j) a_{1,j_1} a_{2,j_2} ··· a_{n,j_n}. For each term a_{1,j_1} a_{2,j_2} a_{3,j_3} ··· a_{n,j_n} in det(A), there is a corresponding term a_{1,j_2} a_{2,j_1} a_{3,j_3} ··· a_{n,j_n}. If we group these terms in pairs and sum over only those orderings with j_1 < j_2, we have

    det(A) = Σ (a_{1,j_1}·a_{2,j_2} + a_{1,j_2}·a_{2,j_1})·a_{3,j_3} ··· a_{n,j_n}.

Since a_1j = a_2j for all j, the two terms in each pair are equal, so each pair contributes (1 + 1)(a_{1,j_1}·a_{1,j_2}·a_{3,j_3} ··· a_{n,j_n}) = 0. Hence det(A) = 0. ■
Theorem 6.19 The sum of the products of the elements of a row of A = [a_ij]_n by the cofactors of the corresponding elements of a different row of A is zero. Hence

    a_i1·A_k1 + a_i2·A_k2 + ··· + a_in·A_kn = 0   if i ≠ k.

Proof. By Theorem 6.16, the expression a_i1·A_k1 + a_i2·A_k2 + ··· + a_in·A_kn is the determinant of a matrix B that is the same as A except that the elements of row k have been replaced by a_i1, a_i2, ..., a_in. Thus the matrix B has the ith and kth rows identical, and det(B) = 0 by Theorem 6.18. ■
Theorem 6.20 The sum of the products of the elements of a column of A = [a_ij]_n by the cofactors of the corresponding elements of a different column is zero. Hence

    a_1j·A_1k + a_2j·A_2k + ··· + a_nj·A_nk = 0   if j ≠ k.
Exercises 6.4
1. Prove that if A = [α^] η has a row with all elements zero, then det(A) = 0.
2. Compute the cofactor of the indicated element a_ij in A = [a_ij], where

    A = [ 2 -2  2  1 ]
        [ 2 -1 -2 -1 ]
        [ 0  2 -4 -6 ]
        [ 2 -3 10  4 ]
3. Let

    A = [ 4 -5  3 ]
        [ 3  3 -2 ]
        [ 1 -1  1 ]

   (a) Find the matrix B = [A_ij]^T, where A_ij is the cofactor of a_ij.
   (b) Compute AB, where A and B are as in part (a).
   (c) Use the results of parts (a) and (b) to find A^(-1).
    (c) | 2  3  0 -1 |    (d) | 1  2  1 |
        |-4 -6  0  3 |        | 3  4 -1 |
        | 2  1 -1  2 |        |-2  2 -1 |
        | 0 -2 -1  3 |        | 1 -3 -2 |
    | 1-x    1    -1  |
    |  -1   1-x   -1  | = 0.
    |  -1   -1   1-x  |
Proof. Suppose that B is obtained from A = [a_ij]_n by multiplying each entry of the kth row by c ≠ 0. By Definition 6.8,

    det(B) = Σ_(j) (-1)^t a_{1,j_1} a_{2,j_2} ··· (c·a_{k,j_k}) ··· a_{n,j_n}
           = c · Σ_(j) (-1)^t a_{1,j_1} a_{2,j_2} ··· a_{k,j_k} ··· a_{n,j_n}
           = c·det(A). ■
Proof. Let A = [a_ij]_n and suppose that B = [b_ij]_n is formed by adding to each element a_uj of the uth row of A the product of the scalar c and the corresponding element a_vj of the vth row of A (u ≠ v). Then B and A are the same except in the uth rows, and the cofactor of b_uj in the uth row of B is the same as the cofactor A_uj of the corresponding element a_uj in A. When det(B) is expanded about the uth row, we find

    det(B) = Σ_{j=1}^{n} (a_uj + c·a_vj)·A_uj
           = Σ_{j=1}^{n} a_uj·A_uj + c · Σ_{j=1}^{n} a_vj·A_uj
           = det(A) + c·0
           = det(A). ■
    det(A) = | 5  2  2 15 |
             | 2  2 -4  6 |
             | 2 -4  2  6 |
             | 0  5  7  1 |

By Theorem 6.21,

    det(A) = 2 | 5  2  2 15 |
               | 1  1 -2  3 |
               | 2 -4  2  6 |
               | 0  5  7  1 |
According to Theorem 6.22, the value of the determinant is unchanged if we (a) add to row 1 the product of -5 and row 2 and (b) add to row 3 the product of -2 and row 2. Thus

    det(A) = 2 | 0 -3 12  0 |
               | 1  1 -2  3 |
               | 0 -6  6  0 |
               | 0  5  7  1 |
6.5 Elementary Operations and Cramer's Rule 203
Expanding about the first column, we find

    det(A) = 2(-1)^(2+1) | -3 12  0 | = (-2)(-3)(6) |  1 -4  0 |
                         | -6  6  0 |               | -1  1  0 |
                         |  5  7  1 |               |  5  7  1 |

and expanding about the third column,

    det(A) = (-2)(-3)(6)(-1)^(3+3) |  1 -4 | = (36)(1 - 4) = -108.
                                   | -1  1 |
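The reduction above can be checked against a direct machine evaluation. The NumPy check below is an aid assumed for this sketch, not part of the text:

```python
import numpy as np

A = np.array([[5, 2, 2, 15],
              [2, 2, -4, 6],
              [2, -4, 2, 6],
              [0, 5, 7, 1]])

# Direct evaluation agrees with the value -108 obtained above by
# factoring, row operations, and cofactor expansion.
assert round(np.linalg.det(A)) == -108
```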
Our final result of this section makes an important connection between determinants
and the solution of certain types of systems of linear equations. This theorem presents
a formula for the unknowns in terms of certain determinants. This formula is commonly
known as Cramer's rule.
Proof. We first show that the given values are a solution of the system. Substitution of these values for x_j into the left member of the ith equation of the system yields

    Σ_{j=1}^{n} a_ij·x_j = (1/det(A)) Σ_{j=1}^{n} a_ij ( Σ_{k=1}^{n} b_k·A_kj )
                         = (1/det(A)) Σ_{k=1}^{n} b_k ( Σ_{j=1}^{n} a_ij·A_kj )
                         = (1/det(A)) Σ_{k=1}^{n} b_k·(δ_ik·det(A))
                         = b_i.

To see that the solution is unique, suppose that y_1, y_2, ..., y_n is any solution of the system. If we multiply both members of the ith equation by A_ij (j fixed) and form the sum of these equations, we find that

    Σ_{i=1}^{n} ( Σ_{k=1}^{n} a_ik·y_k ) A_ij = Σ_{i=1}^{n} b_i·A_ij

or

    Σ_{k=1}^{n} ( Σ_{i=1}^{n} a_ik·A_ij ) y_k = Σ_{i=1}^{n} b_i·A_ij.

Since Σ_{i=1}^{n} a_ik·A_ij = δ_kj·det(A), this reduces to det(A)·y_j = Σ_{i=1}^{n} b_i·A_ij, and

    y_j = ( Σ_{i=1}^{n} b_i·A_ij ) / det(A).

Hence these y_j's are the same as the solution given in the statement of the theorem. ■
We note that the sum Σ_{k=1}^{n} b_k·A_kj is the determinant of the matrix obtained by replacing the jth column of A by the column of constants B = [b_1, b_2, ..., b_n]^T. For a system of three equations in x_1, x_2, x_3, this gives

    x_1 = | b_1  a_12  a_13 |    x_2 = | a_11  b_1  a_13 |    x_3 = | a_11  a_12  b_1 |
          | b_2  a_22  a_23 |          | a_21  b_2  a_23 |          | a_21  a_22  b_2 |
          | b_3  a_32  a_33 |          | a_31  b_3  a_33 |          | a_31  a_32  b_3 |
          ------------------          ------------------          ------------------
               |A|                          |A|                          |A|
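Cramer's rule is direct to implement once a determinant routine is available. A NumPy sketch (NumPy and the sample system are our assumptions; the system was chosen only so that det(A) ≠ 0):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule: x_j = det(A_j) / det(A), where A_j
    is A with its jth column replaced by b.  Requires det(A) != 0."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.astype(float).copy()
        Aj[:, j] = b          # replace the jth column by the constants
        x[j] = np.linalg.det(Aj) / d
    return x

A = np.array([[2.0, 1, 1],
              [1, 3, 2],
              [1, 0, 0]])
b = np.array([4.0, 5, 6])
x = cramer_solve(A, b)
assert np.allclose(A @ x, b)                    # the values satisfy the system
assert np.allclose(x, np.linalg.solve(A, b))    # and agree with Gaussian elimination
```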
Exercises 6.5
1 1 1 1 - 1 1 0 2
4 -2-9 2 4-3
2 3 1 -2 10 2 3 -1
(e) 7 2 10 (f) 4 8 9 (g) (h)
4 6 1 4 4 0 1-1
4 1 -3 4 4-9
8 9 1 -8 1 1 1 1
2. Use Cramer's Rule to evaluate two of the unknowns, and then find the remaining
unknown by substitution of these values into one of the equations.
3. Show that

    if | 2  5 -6 |                 | 4 15 -6 |
       |-1  2  4 | = 139,   then   |-2  6  4 | = 834.
       | 3  1  5 |                 | 6  3  5 |
4. Solve for x:

    |  x   0   0  -8 |
    | -1   x   0 -10 |
    |  0  -1   x  -1 | = 0.
    |  0   0  -1   1 |
5. Show that

    | 1    1    1   |
    | a    b    c   | = (a - b)(b - c)(c - a).
    | a^2  b^2  c^2 |
6. Show that

    | a  a  a |
    | a  b  b | = a(a - b)(b - c).
    | a  b  c |
7. Evaluate the following determinant.

    | (a+b)^2    c^2       c^2     |
    |  a^2      (b+c)^2    a^2     |
    |  b^2       b^2      (a+c)^2  |
8. Show that

    |  -x    1    0    0    |
    |   0   -x    1    0    |
    |   0    0   -x    1    | = c_0 + c_1·x + c_2·x^2 + c_3·x^3 + x^4.
    | -c_0 -c_1 -c_2 -c_3-x |
9. Show that

    | a  b  c |   | a  b  c |   | a    b    c   |
    | d  e  f | + | d  e  f | = | d    e    f   |
    | h  i  j |   | k  m  n |   | h+k  i+m  j+n |
10. Prove that an equation of the straight line through two distinct points with rectangular coordinates (x_1, y_1) and (x_2, y_2) is given by

    | x    y    1 |
    | x_1  y_1  1 | = 0.
    | x_2  y_2  1 |
11. Suppose that h_11, h_12, h_21, and h_22 are differentiable functions of x and that g is defined by

    g(x) = | h_11(x)  h_12(x) |
           | h_21(x)  h_22(x) |

    Use calculus formulas and show that

    g'(x) = | h_11'(x)  h_12'(x) | + | h_11(x)   h_12(x)  |
            | h_21(x)   h_22(x)  |   | h_21'(x)  h_22'(x) |
12. Use Theorem 6.21 to express |cA| in terms of |A| for A = [a_ij]_n.
13. Prove the dual statement of Theorem 6.21.
14. Prove the dual statement of Theorem 6.22.
The result of the theorem extends readily to the following corollary, the proof of
which is left as an exercise (Problem 6).
Corollary 6.26 If M_1, M_2, ..., M_k are elementary n × n matrices and A is n × n, then

    det(M_1 M_2 ··· M_k A) = det(M_1) det(M_2) ··· det(M_k) det(A).

Corollary 6.27 If M_1, M_2, ..., M_k are elementary n × n matrices, then

    det(M_1 M_2 ··· M_k) = det(M_1) det(M_2) ··· det(M_k).
Proof. The asserted equality follows at once from Corollary 6.26 with A = In. ■
Corollary 6.27 enables us to extend the result of Theorem 6.24 to arbitrary invertible
matrices, and we are also able to establish the converse.
Since det(A) ≠ 0 and each det(M_i) ≠ 0, we have det(PA) ≠ 0. This implies that PA does not have a row of zeros, and consequently is of rank n by Corollary 3.49. But A and PA have the same rank since they are equivalent, so this means that A has rank n. It follows from Corollary 3.50 that A is invertible. ■
When the theory of matrices and determinants was in its early stages of development,
the word nonsingular was normally used instead of invertible, and the word singular
was used to indicate that a square matrix did not have an inverse. The last theorem
gives the basis for the early terminology: singular corresponds to zero determinant, and
nonsingular corresponds to nonzero determinant.
We are now in a position to prove the main theorem in this section.
Theorem 6.29 If A and B are n × n matrices over F, then

    det(AB) = det(A)·det(B).
Proof. If A is not invertible, then AB is not invertible (for AB invertible implies AB·(AB)^(-1) = I_n, which in turn implies A is invertible by Theorem 3.14). By Theorem 6.28, det(A) = 0 and det(AB) = 0 so that

    det(AB) = det(A)·det(B)

in this case.

Assume that A is invertible. Then A is a product of elementary matrices, A = M_1 M_2 ··· M_k, and Corollary 6.26 implies that

    det(AB) = det(M_1 M_2 ··· M_k B)
            = det(M_1) det(M_2) ··· det(M_k) det(B).
6.6 Determinants and Matrix Multiplication 209
But

    det(A) = det(M_1) det(M_2) ··· det(M_k),

and therefore

    det(AB) = det(A)·det(B). ■
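Theorem 6.29 is also easy to check on random matrices. A quick NumPy sketch (the random trial is our illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.integers(-5, 6, size=(4, 4)).astype(float)
    B = rng.integers(-5, 6, size=(4, 4)).astype(float)
    # det(AB) = det(A) det(B), whether or not A and B are invertible
    assert np.isclose(np.linalg.det(A @ B),
                      np.linalg.det(A) * np.linalg.det(B))
```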
The next theorem is of great interest since it provides a formula for the computation
of the inverse of a matrix. Unfortunately the use of this formula is not very practical for
matrices of higher orders, due to the large number of computations that are necessary.
The formula is conveniently expressed in terms of the adjoint of a matrix, defined
as follows.
Definition 6.30 If A = [a_ij]_n, then the adjoint of A, denoted by adj(A), is the matrix given by adj(A) = [A_ij]^T, where A_ij is the cofactor of a_ij in A.
Example 1. Let

    A = [a_ij]_3 = [ 11 -6  2 ]
                   [  3 -2  1 ]
                   [  2 -2  2 ].

The cofactor of 11 in A is

    A_11 = (-1)^(1+1) | -2  1 | = -4 + 2 = -2,
                      | -2  2 |

and the cofactor of -6 is

    A_12 = (-1)^(1+2) | 3  1 | = (-1)(6 - 2) = -4.
                      | 2  2 |

Computing the remaining cofactors in the same way, we find

    adj(A) = [ -2  -4  -2 ]T   [ -2   8  -2 ]
             [  8  18  10 ]  = [ -4  18  -5 ]
             [ -2  -5  -4 ]    [ -2  10  -4 ].
Theorem 6.31 If A is invertible, then A^(-1) = (1/det(A))·adj(A).
Proof. Let C = [c_ij]_n = A·adj(A), and hence c_ij = Σ_{k=1}^{n} a_ik·A_jk. By Theorem 6.19, Σ_{k=1}^{n} a_ik·A_jk = δ_ij·det(A). Thus

    A·adj(A) = det(A)·I_n.

Since A is invertible, det(A) ≠ 0 by Theorem 6.28, and multiplication by 1/det(A) gives

    A · ( (1/det(A))·adj(A) ) = I_n,

and it follows that

    A^(-1) = (1/det(A))·adj(A). ■
Example 2. For the matrix A of Example 1,

    det(A) = | 11 -6  2 |
             |  3 -2  1 | = -2,
             |  2 -2  2 |

and

    A^(-1) = -(1/2) [ -2   8  -2 ]   [ 1  -4   1  ]
                    [ -4  18  -5 ] = [ 2  -9  5/2 ]
                    [ -2  10  -4 ]   [ 1  -5   2  ].
Example 3. The formula in Theorem 6.31 has its greatest usefulness with 2 × 2 matrices. If

    A = [ a  b ]
        [ c  d ]

and ad - bc ≠ 0, then

    A^(-1) = (1/(ad - bc)) [  d  -b ]
                           [ -c   a ].
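The 2 × 2 adjoint formula translates into a few lines of code. A plain-Python sketch (the function name and the sample matrix are ours):

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via Theorem 6.31:
    A^(-1) = (1 / (ad - bc)) * adj(A), with adj(A) = [[d, -b], [-c, a]]."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]

M = inverse_2x2(2, 1, 5, 3)   # det = 1, so the inverse is [[3, -1], [-5, 2]]
assert M == [[3.0, -1.0], [-5.0, 2.0]]
```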
Exercises 6.6
1. Use the formula of Theorem 6.31 to find the inverse of the given matrix.
1 2 1 -4 -5 1 0 1
(a) 1 0 1 (b) 3 3 (c) 1 1 2
0 1 -1 -1 -1 3 4-2
1 [4 2 3 3 2
(d) -1 (e) 6 -5 (f) 2 4 2
0 3 3 7 10 6
2. Use Theorem 6.29 to express det(A^m) in terms of det(A) for an arbitrary integer m (A^0 = I). In particular, obtain the result det(A^(-1)) = 1/det(A).
3. Given that A and B are square matrices of the same order, det(A) = 2, and det(B) = 6, find the value of det(A^(-1)·B).
4. If A and B are invertible and of the same order, express det((AB)^(-1)) in terms of det(A) and det(B).
5. Express det(adj(A)) in terms of det(A), where A is an n × n matrix.
6. Prove Corollary 6.26.
7. A matrix A is called skew-symmetric if A^T = -A. Prove that any skew-symmetric matrix of odd order is singular if 1 + 1 ≠ 0 in F.
8. Suppose that the matrices A = [a_ij]_n, B = [b_ij]_n, and C = [c_ij]_n are such that a_ij = b_ij = c_ij whenever i ≠ k, and c_kj = a_kj + b_kj for each j. Prove that det(C) = det(A) + det(B).
9. A submatrix of an m × n matrix A is a matrix obtained by deleting certain rows and/or columns of A. For an arbitrary matrix A (not necessarily square), the number p(A) is defined as follows:

    (i) if A = 0, then p(A) = 0;
    (ii) if A ≠ 0, then p(A) is the largest possible order for a square submatrix of A that has a nonzero determinant.

   (a) Prove that if the matrix B is obtained from A by an elementary row operation, then p(B) ≤ p(A). Hence conclude that p(B) = p(A).
   (b) Prove that if the matrix B is row-equivalent to A, then p(B) = p(A).
   (c) Prove that p(A) is the rank of A. (Hint: See Corollary 3.49.)
10. Suppose that A is a nonzero square matrix of order 3 that is singular. What is
the most precise statement that can be made about the rank of A?
11. Find all values of x and y in the field of complex numbers for which the matrix

        x  y
        1  x  y

    has rank 1.
Chapter 7

Eigenvalues and Eigenvectors
7.1 Introduction
In Chapter 5, we studied linear transformations of a vector space U into a vector space V, where U and V were vector spaces over the same field F. We turn our attention now to the special case in which U = V. More precisely, we shall be concerned here with linear operators on a finite-dimensional vector space V. Throughout this chapter, F will denote a field, V will denote a finite-dimensional vector space over F, and T will denote a linear operator on V.
We say that v and λ are associated with each other, or that they correspond to each other. Thus a scalar λ is an eigenvalue of T if and only if there exists a nonzero vector v such that T(v) = λv. The set of all eigenvalues of T is called the spectrum of T.
The term "eigenvalue" in the definition is not completely standardized. Other terms used interchangeably are characteristic value, characteristic root, proper value, and proper number. The German word "eigen" translates into English as "characteristic," but the hybrid word eigenvalue seems to be more widely used than any of the other terms. Similarly, other terms for eigenvector are characteristic vector and proper vector.
214 Chapter 7 Eigenvalues and Eigenvectors
We have already studied the intimate relations between a linear transformation and
the various matrices that represent it relative to different bases. We shall see presently
that the matrices that represent a linear operator are also useful tools in investigating
the eigenvalues of the operator. The principal connection between a linear operator T
and a matrix A that represents it is provided here by the characteristic matrix of A.
Definition 7.2 Let A be an n x n matrix over F, and let x represent an indeterminate scalar. Then the matrix A - xI is called the characteristic matrix of A.
Example 1 □ The characteristic matrix of

        [   8   5   6   0 ]
    A = [   0  -2   0   0 ]
        [ -10  -5  -8   0 ]
        [   2   1   1   2 ]

is

             [ 8-x    5     6    0  ]
    A - xI = [  0   -2-x    0    0  ]
             [ -10   -5   -8-x   0  ]
             [  2     1     1   2-x ].   ■
When det(A - xI) is expanded, it is clear that the term with highest degree is the product of the diagonal elements

    (a_11 - x)(a_22 - x) ··· (a_nn - x).

Hence det(A - xI) is a polynomial in x of degree n with lead coefficient (-1)^n, say

    det(A - xI) = (-1)^n x^n + c_{n-1} x^{n-1} + ··· + c_1 x + c_0.

Upon setting x = 0, we find that c_0 = det(A).
Definition 7.3 For any square matrix A over F, the polynomial det(A - xI) is the characteristic polynomial of A. The equation det(A - xI) = 0 is the characteristic equation of A, and the solutions of det(A - xI) = 0 are called the eigenvalues of A. The set of all eigenvalues of A is called the spectrum of A.
7.2 Eigenvalues and Eigenvectors 215
Example 2 □ For the matrix A of Example 1,

                  | 8-x    5     6    0  |
    det(A - xI) = |  0   -2-x    0    0  | = x^4 - 8x^2 + 16.
                  | -10   -5   -8-x   0  |
                  |  2     1     1   2-x |

The characteristic equation of A is

    x^4 - 8x^2 + 16 = 0,

and since

    x^4 - 8x^2 + 16 = (x + 2)^2 (x - 2)^2,

the eigenvalues of A are -2 and 2.   ■
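The expansion of det(A - xI) can be checked by brute force with the Leibniz formula, since each entry of A - xI is a polynomial of degree at most 1. A sketch (our own helper names, exact integer arithmetic), applied to the matrix above:

```python
from itertools import permutations

def poly_mul(p, q):
    # multiply polynomials given as coefficient lists, lowest degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def char_poly(A):
    # det(A - xI) via the Leibniz formula over all permutations
    n = len(A)
    coeffs = [0] * (n + 1)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):            # sign of the permutation: count inversions
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = [1]
        for i in range(n):
            # entry of A - xI: A[i][i] - x on the diagonal, A[i][j] off it
            entry = [A[i][perm[i]], -1] if perm[i] == i else [A[i][perm[i]]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            coeffs[k] += sign * c
    return coeffs   # coeffs[k] is the coefficient of x**k

A = [[8, 5, 6, 0], [0, -2, 0, 0], [-10, -5, -8, 0], [2, 1, 1, 2]]
p = char_poly(A)
```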
Theorem 7.4 Let A be any matrix that represents the linear operator T. Then T and
A have the same spectrum.
The proof of the preceding theorem rests primarily on the equivalence of the equations

    T(v) = λv   and   AX = λX.

We see that λ is an eigenvalue of A if and only if there exists a nonzero n x 1 matrix X such that AX = λX. This motivates the following definition of eigenvectors of matrices.
Thus the eigenvectors of matrices and linear operators are related in this way: If
the n x n matrix A represents T relative to the basis A of V, then X is an eigenvector
of A corresponding to λ if and only if X is the coordinate matrix relative to A of an
eigenvector of T corresponding to the same eigenvalue.
The method of proof of Theorem 7.4 provides a systematic method for determining the eigenvalues of a given linear operator. Any convenient choice of a basis A of V will determine the matrix A that represents T relative to A, and the eigenvalues of T are precisely the solutions of the characteristic equation det(A - xI) = 0. The eigenvectors v corresponding to a particular eigenvalue λ are just those nonzero vectors v in V with coordinates X relative to A that satisfy (A - λI)X = 0. An illustration of this procedure is given in the next example.
Example 3 □ Consider the linear operator on R^4 defined by

    T(x1, x2, x3, x4) = (8x1 + 5x2 + 6x3, -2x2, -10x1 - 5x2 - 8x3, 2x1 + x2 + x3 + 2x4).

Relative to the standard basis of R^4, T is represented by the matrix A of Example 1, whose characteristic polynomial is x^4 - 8x^2 + 16 = (x + 2)^2 (x - 2)^2, so the eigenvalues of T are -2 and 2. For λ = -2, the system (A + 2I)X = 0 appears as

    [  10   5   6   0 ] [ x1 ]   [ 0 ]
    [   0   0   0   0 ] [ x2 ] = [ 0 ]
    [ -10  -5  -6   0 ] [ x3 ]   [ 0 ]
    [   2   1   1   4 ] [ x4 ]   [ 0 ].

Solving this system, we have

    [  10   5   6   0   0 ]      [ 1  1/2  0   12   0 ]
    [   0   0   0   0   0 ]  →   [ 0   0   1  -20   0 ]
    [ -10  -5  -6   0   0 ]      [ 0   0   0    0   0 ]
    [   2   1   1   4   0 ]      [ 0   0   0    0   0 ],

so the solutions are given by

    x1 = -(1/2)x2 - 12x4
    x2 = x2
    x3 = 20x4
    x4 = x4,

that is,

    (x1, x2, x3, x4) = x2(-1/2, 1, 0, 0) + x4(-12, 0, 20, 1).

The eigenvectors of T corresponding to λ = -2 are the nonzero vectors of this form. For λ = 2, the system (A - 2I)X = 0 appears as

    [   6   5    6   0 ] [ x1 ]   [ 0 ]
    [   0  -4    0   0 ] [ x2 ] = [ 0 ]
    [ -10  -5  -10   0 ] [ x3 ]   [ 0 ]
    [   2   1    1   0 ] [ x4 ]   [ 0 ],

and the reduced augmented matrix is

    [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  1  0  0 ]
    [ 0  0  0  0  0 ].

Hence the eigenvectors of T corresponding to λ = 2 are the nonzero vectors of the form

    (x1, x2, x3, x4) = x4(0, 0, 0, 1).   ■
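The row reductions in Example 3 amount to computing the null spaces of A + 2I and A - 2I. A Gauss-Jordan sketch over the rationals (helper names are ours) reproduces the two eigenspace bases found above:

```python
from fractions import Fraction

def null_space(M):
    """Basis for the null space of M, via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                rows[i] = [a - rows[i][c] * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in zip(rows, pivots):
            v[pc] = -row[f]          # back-substitute the free variable
        basis.append(v)
    return basis

A = [[8, 5, 6, 0], [0, -2, 0, 0], [-10, -5, -8, 0], [2, 1, 1, 2]]
A_plus_2I = [[A[i][j] + 2 * (i == j) for j in range(4)] for i in range(4)]
A_minus_2I = [[A[i][j] - 2 * (i == j) for j in range(4)] for i in range(4)]
V_minus_2 = null_space(A_plus_2I)
V_2 = null_space(A_minus_2I)
```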
The procedure described just before Example 3 is not always as simple and effective as it might seem, for there are several complications that may arise. There is usually no difficulty in determining the matrix A. (In most cases, the matrix A is already known.) But one may encounter trouble in the solution of the resulting characteristic equation det(A - xI) = 0.

First of all, some or all of the solutions to det(A - xI) = 0 may not lie in the field F. If an eigenvalue λ is not in F, then the nonzero coordinates x_i in a solution of AX = λX are not in F. For the nonzero elements of AX are in F whenever those
Example 4 □ Before stating the physical problem to be solved, we recall that the force required to stretch or compress a spring by an amount x is directly proportional to x, and that the constant of proportionality is called the spring constant. Thus F = cx, where F is the force on the spring and x is the change in length caused by F.
Consider now the mechanical system shown in Figure 7.1.

(Figure 7.1: two objects M1 and M2, each connected by a spring along the horizontal line OX, shown at their equilibrium positions x1 = 0 and x2 = 0.)
On the horizontal plane containing the line OX, an object M1 of mass 1 unit is connected to the fixed point P by a first spring with spring constant c1 = 3. A second object M2 of mass 1 unit is then connected to M1 by a second spring with spring constant c2 = 2. The centers of gravity of M1 and M2 lie on a horizontal line through P. The object M1 is displaced 1 unit toward P from its equilibrium position, and the object M2 is displaced from its equilibrium position 2 units away from P. The two objects are released at time t = 0, and it is desired to find the positions of the objects at any subsequent time t. The masses of the springs and frictional forces are to be neglected, and no external forces act on the system.

Let x1 and x2 denote the displacements from the equilibrium positions of M1 and M2, respectively. Each displacement x_i is measured with the positive direction to the right as shown in Figure 7.2.
(Figure 7.2: Objects in Motion. The system of Figure 7.1 with the displacements x1 and x2 measured from x1 = 0 and x2 = 0, positive to the right.)
There are two forces acting on M1 at any time t > 0, one from each spring. The first spring exerts a force F1 given by F1 = -3x1, since F1 acts in the direction opposite to the displacement. The second spring exerts a force F2 given by F2 = 2(x2 - x1), since x2 - x1 is the (directed) change in the distance from the center of gravity of M1 to that of M2. According to Newton's second law, the sum of the forces acting on M1 is equal to the product of its mass and its acceleration. Since M1 has unit mass, this requires that

    d^2 x1/dt^2 = -3x1 + 2(x2 - x1) = -5x1 + 2x2.

The only force acting on the object M2 is the force -2(x2 - x1) due to the second spring, and Newton's second law yields

    d^2 x2/dt^2 = 2x1 - 2x2.
Thus the original problem has been reduced to that of solving the system of differential equations

    d^2 x1/dt^2 = -5x1 + 2x2
    d^2 x2/dt^2 = 2x1 - 2x2.

We assume a solution of the form

    x1 = a1 cos ω1t + a2 cos ω2t
    x2 = b1 cos ω1t + b2 cos ω2t.

(A justification for this assumption would be quite a digression, and we shall see momentarily that it is a valid one, at any rate.)
Substitution of these expressions into the system of differential equations yields

    ω1^2 [ a1 ] cos ω1t + ω2^2 [ a2 ] cos ω2t = [  5  -2 ] [ a1 ] cos ω1t + [  5  -2 ] [ a2 ] cos ω2t.
         [ b1 ]                [ b2 ]           [ -2   2 ] [ b1 ]           [ -2   2 ] [ b2 ]

Since this equation is an identity in t, we must have

    ω1^2 [ a1 ] = [  5  -2 ] [ a1 ]
         [ b1 ]   [ -2   2 ] [ b1 ]

and

    ω2^2 [ a2 ] = [  5  -2 ] [ a2 ]
         [ b2 ]   [ -2   2 ] [ b2 ].

Thus ω1^2 and ω2^2 must be eigenvalues of the matrix

    A = [  5  -2 ]
        [ -2   2 ],

and [ a_i ; b_i ] must be an eigenvector corresponding to ω_i^2.
The eigenvalues of A are found to be λ1 = 1, λ2 = 6. With ω1 = 1 and ω2 = √6, the eigenvectors are given by

    [ a1 ] = a1 [ 1 ]     and     [ a2 ] = b2 [ -2 ]
    [ b1 ]      [ 2 ]             [ b2 ]      [  1 ].
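The eigenvalue computation for the spring system is small enough to check numerically. A sketch using the quadratic formula on the 2 x 2 characteristic polynomial (variable names are ours):

```python
import math

# Stiffness matrix of the two-spring system: x'' = -A x with A = [[5, -2], [-2, 2]]
A = [[5, -2], [-2, 2]]

# Characteristic polynomial of a 2x2 matrix: x^2 - tr(A) x + det(A)
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr - disc) / 2, (tr + disc) / 2   # smaller and larger eigenvalue

omega1, omega2 = math.sqrt(lam1), math.sqrt(lam2)   # the natural frequencies

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

# Mode shapes found in the text
v1 = [1, 2]     # eigenvector for lambda = 1
v2 = [-2, 1]    # eigenvector for lambda = 6
```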
Exercises 7.2
Find the eigenvalues of the given matrix A.
-1 1 2 1 2 1 2
1. A = 2. A = 3. A = 4. A =
-2 2 -2 3 4 5 4
2-2 3 - 1 2 2 1 -1 -1
5. A = 1 1 1 6. A 2 2 2 7. A = 1 3 2
1 3 -1 -3 - 6 -6 -1 - 1 0
1 1 -1 3 1 - 1 2 2
8. ,4 = -1 3 -1 9. Λ - -1 1 10. A = -2 3 2
-12 0 1 1 -1 1 2
3 0 4 4 2 0 0 0
0 -1 0 0 0 1 -1 1
11. ,4 = 12. A =
0 - 4 -1 - 4 1 0 1 0
0 4 0 3 1 0 -1 2
17. The linear operator T on R^2 has the matrix

        [  4  -5 ]
        [ -4   3 ]

    relative to the basis {(1,2), (0,1)}. Find the eigenvalues of T, and obtain an eigenvector corresponding to each eigenvalue.
18. Let T be the linear operator on V2 over R that has the matrix

        A = [ 2  1  0 ]
            [ 0  2  0 ]
            [ 2  3  1 ]

    with respect to the basis {x^2, x - 2, x + 1}. Find the eigenvalues of T, and find an eigenvector of T corresponding to each eigenvalue.
In Problems 19 and 20, find the spectrum of the given linear operator T on V2 over R
and find an eigenvector of T corresponding to each eigenvalue.
19. T(a0 + a1x + a2x^2) = (2a0 + a1) + a1x + (2a0 + 2a1 + a2)x^2
21. Given that T is a linear operator on R^2 with T(2,1) = (5,2) and T(1,2) = (7,10), determine the eigenvalues of T and a corresponding eigenvector for each eigenvalue.
22. Prove that a square matrix A is not invertible if and only if 0 is an eigenvalue of
A.
24. Assume that T is a linear operator on R^3 that has eigenvalues 1, 2, and 3, with associated eigenvectors (2,1,3), (1,4,0), and (1,0,0), respectively. Find the eigenvalues of T^{-1}, and give an eigenvector associated with each eigenvalue.
27. Translate the results of Problem 26 into statements concerning the eigenvalues
and eigenvectors of a polynomial p(A) in a square matrix A.
Theorem 7.6 For each eigenvalue λ of T in F, let V_λ be the set consisting of the zero vector together with all eigenvectors of T in V that are associated with λ. Then each V_λ is a subspace of V.
7.3 Eigenspaces and Similarity 223
Definition 7.7 The subspace V_λ of Theorem 7.6 is called the eigenspace of T that is associated with the eigenvalue λ. The dimension of V_λ is called the geometric multiplicity of λ.
There is another approach that we could have taken in proving the last theorem. For T(v) = λv if and only if (T - λ)v = 0, i.e., if and only if v is in the kernel of T - λ. Thus V_λ is nothing more than the kernel of T - λ, and the geometric multiplicity of λ is precisely the dimension of the kernel of T - λ. This translates the problem of finding an eigenspace into the familiar problem of finding a kernel. In Example 3 of Section 7.2,

    V_{-2} = (T + 2)^{-1}(0) = ((-1/2, 1, 0, 0), (-12, 0, 20, 1))

and

    V_2 = (T - 2)^{-1}(0) = ((0, 0, 0, 1)).
Concerning those eigenvectors of T that are associated with different eigenvalues,
we have the following result.
Theorem 7.8 Let {λ1, λ2, ..., λr} be a set of distinct eigenvalues of the linear operator T. For each i, 1 ≤ i ≤ r, let v_i be an eigenvector of T corresponding to λ_i. Then {v1, v2, ..., vr} is a linearly independent set.
Proof. The proof is by induction on r. The theorem is true for r = 1 since any
eigenvector is nonzero.
Assume that the theorem is true for any set of k distinct eigenvalues. Let {λ1, ..., λk, λk+1} be a set of k + 1 distinct eigenvalues of T, with v_i an eigenvector corresponding to λ_i. Suppose that c1, ..., ck, ck+1 are scalars such that

    c1v1 + c2v2 + ··· + ck+1vk+1 = 0.   (7.1)

Applying T to both sides of (7.1) and then subtracting λk+1 times (7.1) eliminates vk+1 and gives

    Σ_{i=1}^{k} c_i(λ_i - λk+1)v_i = 0.

By the induction hypothesis, c_i(λ_i - λk+1) = 0 for i = 1, ..., k, and since the eigenvalues are distinct, λ_i - λk+1 ≠ 0. Hence

    c1 = c2 = ··· = ck = 0.

Thus in (7.1) we have ck+1vk+1 = 0, and hence ck+1 = 0. This shows that the set of k + 1 eigenvectors is linearly independent, and it follows that the theorem is true for all positive integers r. ■
Corollary 7.9 If V_{λ1}, V_{λ2}, ..., V_{λr} are distinct eigenspaces of T and {v1, v2, ..., vr} is a set of eigenvectors such that v_i ∈ V_{λi} for i = 1, 2, ..., r, then {v1, v2, ..., vr} is linearly independent.

Proof. The eigenvalues λ1, λ2, ..., λr must be distinct in order for the eigenspaces V_{λ1}, V_{λ2}, ..., V_{λr} to be distinct. Since v_i is an eigenvector associated with λ_i, the set {v1, v2, ..., vr} is linearly independent by the theorem. ■
Corollary 7.10 If V_{λ1}, V_{λ2}, ..., V_{λr} are distinct eigenspaces of T, the sum V_{λ1} + V_{λ2} + ··· + V_{λr} is direct.

Proof. Suppose that the sum is not direct, and let v_k be a nonzero vector in V_{λk} that is also contained in Σ_{j≠k} V_{λj}. Then there are vectors v_j in V_{λj} and scalars a_j such that

    v_k = Σ_{j≠k} a_j v_j.

Let A = {v1, ..., vk-1, vk+1, ..., vr}. Since v_k ≠ 0, there are nonzero vectors in A. Let A' = {v'1, v'2, ..., v't} be the nonempty set obtained by deleting all zero vectors from A. Then v_k is dependent on A', so that the set {v'1, v'2, ..., v't, v_k} is linearly dependent. But {v'1, v'2, ..., v't, v_k} is a set of eigenvectors that satisfies the hypothesis of Corollary 7.9. Thus we have a contradiction, and it follows that the sum Σ_{i=1}^{r} V_{λi} is direct. ■
Suppose that T has matrix A relative to the basis A of V, and that T has matrix B relative to the basis A' of V. If P is the (invertible) matrix of transition from A to A', then Theorem 5.15 asserts that B = P^{-1}AP. This leads to the following definition.

Definition 7.11 Let A and B be n x n matrices over F. Then B is similar to A over F if there is an invertible matrix P with elements in F such that B = P^{-1}AP.
It is left as an exercise (Problem 16) to show that this relation of similarity is a true equivalence relation on the set of n x n matrices over F. This relation proves to be a useful tool in the investigation of the eigenvalues of matrices.

The remarks just before Definition 7.11 show that two n x n matrices over F are similar over F if and only if they represent the same linear operator on an n-dimensional vector space V over F.
The strong connection between the relation of similarity and the eigenvalues of matrices becomes apparent in our next theorem.

Theorem 7.12 Similar matrices have the same characteristic polynomial.

Proof. If B is similar to A over F, then there is an invertible matrix P such that B = P^{-1}AP. Thus

    det(B - xI) = det(P^{-1}AP - xI)
                = det(P^{-1}AP - xP^{-1}P)
                = det(P^{-1}(A - xI)P)
                = det(P^{-1}) · det(A - xI) · det(P)
                = det(P^{-1}P) · det(A - xI)
                = det(A - xI),

so that A and B have the same characteristic polynomial. ■
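The invariance is easy to spot-check for 2 x 2 matrices, where the characteristic polynomial x^2 - tr(A)x + det(A) is determined by the trace and determinant. A small sketch with a hand-picked shear P (all names and the specific matrices are ours):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]
P = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(1)]]
Pinv = [[Fraction(1), Fraction(-1)], [Fraction(0), Fraction(1)]]  # inverse of the shear P

B = matmul(matmul(Pinv, A), P)    # B is similar to A

# Equal trace and determinant imply equal characteristic polynomials for 2x2 matrices.
def trace(M): return M[0][0] + M[1][1]
def det2(M): return M[0][0] * M[1][1] - M[0][1] * M[1][0]
```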
Near the beginning of this section, we defined the geometric multiplicity of an eigenvalue of a linear transformation. There is a second type of multiplicity for eigenvalues, the algebraic multiplicity.
Definition 7.14 Let λ be an eigenvalue of T, and let A be any matrix that represents T. The algebraic multiplicity of λ is the multiplicity of λ as a root of det(A - xI) = 0.
From Theorem 7.12 and our discussion concerning Definition 7.11, it is clear that the
algebraic multiplicity of an eigenvalue is well-defined. That is, the algebraic multiplicity
is independent of the choice of the matrix A.
Example 1 □ In Example 3 of Section 7.2, the linear operator T had the two distinct eigenvalues λ1 = -2 and λ2 = 2. Examining the characteristic polynomial of A, we see that the algebraic multiplicity of each eigenvalue is 2. Upon comparing the algebraic and geometric multiplicities, we find that the two are equal for λ1, but that the algebraic multiplicity of λ2 exceeds the geometric multiplicity, which is 1. Our next theorem shows that the situation in this example illustrates the only possibilities. ■
Theorem 7.15 The geometric multiplicity of an eigenvalue does not exceed its algebraic multiplicity.

Proof. Let λ be an eigenvalue of T with geometric multiplicity r, and extend a basis of V_λ to a basis of V. Relative to this basis, T has a matrix of the form

        [ λ  0  ···  0  a_{1,r+1}    ···  a_{1n}    ]
        [ 0  λ  ···  0  a_{2,r+1}    ···  a_{2n}    ]
    A = [ ·  ·       ·  ·                 ·         ]
        [ 0  0  ···  λ  a_{r,r+1}    ···  a_{rn}    ]
        [ 0  0  ···  0  a_{r+1,r+1}  ···  a_{r+1,n} ]
        [ ·  ·       ·  ·                 ·         ],

and hence

                               | a_{r+1,r+1} - x   ···   a_{r+1,n}  |
    det(A - xI) = (λ - x)^r ·  |       ·                     ·      |
                               |   a_{n,r+1}      ···   a_{nn} - x  |.

Thus the algebraic multiplicity of λ is at least r. That is, the geometric multiplicity does not exceed the algebraic multiplicity. ■
Example 2 □ Let T be the linear operator on V2 over R defined by

    T(a0 + a1x + a2x^2) = (-4a0 - a1 - a2) + (4a0 + 2a2)x + (2a0 + a1 - a2)x^2.

We shall find the algebraic and geometric multiplicity of each eigenvalue of T and obtain a basis for each eigenspace. The set {1, x, x^2} is a more convenient choice of basis, but we shall use the basis

    A = {1, 1 + x, 1 + x + x^2}

to emphasize the difference between the eigenvectors of T and the eigenvectors of the matrix A that represents T relative to A.
Since

    T(1) = -4 + 4x + 2x^2 = (-8)(1) + 2(1 + x) + 2(1 + x + x^2),
    T(1 + x) = -5 + 4x + 3x^2 = (-9)(1) + 1(1 + x) + 3(1 + x + x^2),
    T(1 + x + x^2) = -6 + 6x + 2x^2 = (-12)(1) + 4(1 + x) + 2(1 + x + x^2),

the matrix of T relative to A is

    A = [ -8  -9  -12 ]
        [  2   1    4 ]
        [  2   3    2 ].
Then

                  | -8-x   -9   -12 |
    det(A - xI) = |   2   1-x    4  |
                  |   2    3    2-x |

                = -(x^3 + 5x^2 + 8x + 4)
                = -(x + 1)(x^2 + 4x + 4)
                = -(x + 1)(x + 2)^2.

Thus the eigenvalues of T and A are -2 with algebraic multiplicity 2 and -1 with algebraic multiplicity 1.
For the eigenvalue λ = -2, the system (A + 2I)X = 0 appears as

    [ -6  -9  -12 ] [ x1 ]   [ 0 ]
    [  2   3    4 ] [ x2 ] = [ 0 ]
    [  2   3    4 ] [ x3 ]   [ 0 ],

and the augmented matrix reduces to

    [ 1  3/2  2  0 ]
    [ 0   0   0  0 ]
    [ 0   0   0  0 ].

From this, we see that the solutions to (A + 2I)X = 0 are given by x1 = -(3/2)x2 - 2x3, with x2 and x3 arbitrary. That is, the coordinates relative to A of the vectors in the eigenspace V_{-2} are given by

        [ x1 ]   [ -(3/2)x2 - 2x3 ]        [ -3/2 ]        [ -2 ]
    X = [ x2 ] = [       x2       ]  =  x2 [   1  ]  +  x3 [  0 ].
        [ x3 ]   [       x3       ]        [   0  ]        [  1 ]
Thus V_{-2} has dimension 2. We can find coordinates for a basis of V_{-2} by first setting x2 = -2 and x3 = 0 to obtain

    X = [ 3, -2, 0 ]^T,

and then setting x2 = 0 and x3 = -1 to obtain

    X = [ 2, 0, -1 ]^T.

(Any other linearly independent pair of coordinate matrices X would serve as well, of course.) Corresponding to these coordinates, we have the vectors 3(1) + (-2)(1 + x) = 1 - 2x and 2(1) + (-1)(1 + x + x^2) = 1 - x - x^2 that form a basis of V_{-2}.
For the eigenvalue λ = -1, the system (A + I)X = 0 is given by

    [ -7  -9  -12 ] [ x1 ]   [ 0 ]
    [  2   2    4 ] [ x2 ] = [ 0 ]
    [  2   3    3 ] [ x3 ]   [ 0 ],

with reduced augmented matrix

    [ 1  0   3  0 ]
    [ 0  1  -1  0 ]
    [ 0  0   0  0 ].

The solutions are given by

        [ x1 ]        [ -3 ]
    X = [ x2 ] =  x3  [  1 ],
        [ x3 ]        [  1 ]

so V_{-1} has dimension 1, and the vector (-3)(1) + 1(1 + x) + 1(1 + x + x^2) = -1 + 2x + x^2 forms a basis of V_{-1}. ■
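The multiplicity comparison in this example can be verified by computing ranks, since the geometric multiplicity of λ is n - rank(A - λI). A sketch over the rationals (helper names ours):

```python
from fractions import Fraction

def rank(M):
    # row reduction over the rationals; returns the number of pivot rows
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                rows[i] = [a - rows[i][c] * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[-8, -9, -12], [2, 1, 4], [2, 3, 2]]

def shifted(lam):
    return [[A[i][j] - lam * (i == j) for j in range(3)] for i in range(3)]

geo_minus2 = 3 - rank(shifted(-2))   # geometric multiplicity of -2
geo_minus1 = 3 - rank(shifted(-1))   # geometric multiplicity of -1
```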
Exercises 7.3
In Problems 1-6, let T be the linear operator on R^n that is represented by the given matrix A relative to the standard basis of R^n. Find the algebraic multiplicity and the geometric multiplicity of each eigenvalue of T. The matrices here are taken from Problems 1-12 in Exercises 7.2.
1 -1 -1
-1 2 1 2
1. A 2. A = 3. A = 1 3 2
-2 3 2 -2
-1 -1 0
3 0 4 4 2 0 0 0
3 1 -1
0 - 1 0 0 0 1 - 1 1
4. A -1 1 1 5. A 6. A--
0 -4 -1 -4 1 0 1 0
1 1 1
0 4 0 3 1 0 - 1 2
7. For each matrix A below, let T be the linear operator on R 3 that has matrix
A relative to the basis A = {(1,0,0), (1,1,0), (1,1,1)}. Find the algebraic and
geometric multiplicities of each eigenvalue, and a basis for each eigenspace.
    (a) A = [  8   5  -5 ]      (b) A = [ -4  -3  -1 ]
            [  5   8  -5 ]              [ -4   0  -4 ]
            [ 15  15 -12 ]              [  8   4   5 ]

    (c) A = [  3   2   2 ]      (d) A = [   8   5   6 ]
            [  1   4   1 ]              [   0  -2   0 ]
            [ -2  -4  -1 ]              [ -10  -5  -8 ]
8. Let C denote the field of complex numbers, and let C^n be as defined in Example 1 of Section 4.2. Let E_n be the basis E_n = {e1, e2, ..., en} of C^n, where e1 = (1,0,0,...,0), e2 = (0,1,0,...,0), etc. If T is the linear operator that has the given matrix A relative to E_n, find the algebraic and geometric multiplicities of each eigenvalue, and a basis for each eigenspace.
    (a) A = [  2   1-i ]      (b) A = [  5   i ]      (c) A = [ 3  4  2 ]
            [ 1+i   3  ]              [ -i   2 ]              [ 1  3  1 ]
                                                              [ 1  2  2 ]

    (d) A = [  i   0    0  ]      (e) A = [ 1+i   0   0  ]
            [ -2i  i  -2+i ]              [ -2i  1+i  2i ]
            [  0   0   -2  ]              [  i    0   1  ]

    (f) A = [ -2+2i  0  -2+i ]
            [   0   -i    0  ]
            [ 4-2i   0   4-i ]
In Problems 9-12, find the eigenvalues of the given linear operator T on V2 over R. For
each eigenvalue, (a) state the algebraic multiplicity, (b) state the geometric multiplicity,
and (c) find a basis for each eigenspace.
13. Let T be the linear operator on R^2 that has the matrix

        A = [ 1   2 ]
            [ 2  -2 ]

    relative to the standard basis of R^2.
14. The linear operator T on R^2 has the matrix

        [  5  -2 ]
        [ -2   2 ]

    relative to the basis A = {(3,3), (1,-1)}. Find the eigenvalues of T and obtain an eigenvector of T corresponding to each eigenvalue.
15. Suppose that the basis A = {vi, V2,..., v n } of V consists entirely of eigenvectors
of T. Determine the matrix of T relative to A.
16. Prove that the relation of similarity over F is an equivalence relation on the set of all n x n matrices over F.

17. Which n x n matrices over F are similar to I_n?
7.4 Representation by a Diagonal Matrix 231
19. Prove that if A and B are n x n matrices over F with A invertible, then BA is similar over F to AB.
24. For any square matrix A = [a_ij]_n, the trace of A, t(A), is defined by t(A) = Σ_{i=1}^n a_ii. That is, t(A) is the sum of the diagonal elements of A. Prove that if B is similar to A, then t(B) = t(A).
Theorem 7.16 The linear operator T on V can be represented by a diagonal matrix if and only if there is a basis of V that consists entirely of eigenvectors of T.
Proof. Suppose first that A = {v1, v2, ..., vn} is a basis of V that consists of eigenvectors of T, with T(v_i) = λ_i v_i. Then T has the matrix

    [ λ1  0  ···  0  ]
    [  0  λ2 ···  0  ]
    [  ·  ·       ·  ]
    [  0  0  ···  λn ]

relative to A. Thus T is represented by a diagonal matrix relative to A.

On the other hand, if T has a diagonal matrix

        [ d1  0  ···  0  ]
    D = [  0  d2 ···  0  ]
        [  ·  ·       ·  ]
        [  0  0  ···  dn ]

relative to the basis {v1, v2, ..., vn} of V, then T(v_i) = d_i v_i, so that each d_i is an eigenvalue of T with v_i as an associated eigenvector. ■
Proof. This follows at once from the last part of the proof of the theorem. ■
Corollary 7.18 If T has n distinct eigenvalues in F, then T can be represented by a diagonal matrix.

Proof. Suppose that T has n distinct eigenvalues λ1, λ2, ..., λn in F. Consider a set A = {v1, v2, ..., vn} of n vectors in V that contains exactly one eigenvector corresponding to each λ_i. The set A is linearly independent by Theorem 7.8 and therefore forms a basis of the n-dimensional vector space V. ■
Corollary 7.19 If the n x n matrix A over F has n distinct eigenvalues in F, then A is similar over F to a diagonal matrix.
For the next argument, let λ1, λ2, ..., λr be the distinct eigenvalues of T, let n_i denote the geometric multiplicity and m_i the algebraic multiplicity of λ_i, and for each i let {u_{i1}, ..., u_{in_i}} be a basis of the eigenspace V_{λi}. Put

    B = {u_{11}, ..., u_{1n_1}, u_{21}, ..., u_{2n_2}, ..., u_{r1}, ..., u_{rn_r}}.

Then B is a linearly independent set of eigenvectors of T contained in the sum

    V_{λ1} + V_{λ2} + ··· + V_{λr}.

The sum Σ_{i=1}^r V_{λi} is direct by Corollary 7.10, and therefore

    dim( Σ_{i=1}^r V_{λi} ) = Σ_{i=1}^r dim(V_{λi}) = Σ_{i=1}^r n_i.

Hence B is a basis of Σ_{i=1}^r V_{λi}.

Assume that there exists a basis {v1, v2, ..., vn} of eigenvectors of T. Each v_j is in some V_{λj} and therefore dependent on B. This means that B spans V, and consequently B has n elements since it is linearly independent. Thus Σ_{i=1}^r n_i = n and n_i = m_i for i = 1, 2, ..., r.

Assume now that n_i = m_i for i = 1, 2, ..., r. Then Σ_{i=1}^r n_i = n, so that B has n vectors. Since B is linearly independent, B must be a basis of V. And since B is composed of eigenvectors of T, the proof is complete. ■
In the remainder of this chapter, the frequent references to diagonal matrices make
it desirable to have a more compact notation for this type of matrix. This notational
convenience is provided in the next definition.
Definition 7.21 The diagonal matrix D = [d_ij]_n with d_ij = 0 for i ≠ j and d_ii = λ_i will be denoted by D = diag{λ1, λ2, ..., λn}.
We have seen that the problem of finding a diagonal matrix and a basis such that a given linear operator is represented by the diagonal matrix is one type of eigenvalue problem. Since we have a systematic method for finding the eigenvalues and eigenvectors of a linear operator, we are already equipped to solve this type of problem. We also have available from Chapter 5 a method for finding an invertible matrix P such that P^{-1}AP is diagonal. For, with any convenient choice of basis A, P is the matrix of transition from A to a basis A' of eigenvectors of T.
In most eigenvalue problems, the linear operator T is not given explicitly. Instead, one encounters the matrix A and is confronted with the problem of finding an invertible P such that P^{-1}AP is diagonal. In such a situation, the formulation of the problem in terms of linear operators, vectors, and bases is only an encumbrance. It is more efficient to proceed directly to the problem of finding the columns of P. For this procedure, it is desirable to formulate the problem P^{-1}AP = D = diag{λ1, λ2, ..., λn} in the form AP = PD. With A = [a_ij]_n and P = [p_ij]_n, the element in the i-th row and j-th column of AP is Σ_{k=1}^n a_ik p_kj, whereas the corresponding element in PD is p_ij λ_j. With j fixed, we have

    Σ_{k=1}^n a_ik p_kj = p_ij λ_j   for i = 1, 2, ..., n.

That is, the j-th column P_j of P must satisfy AP_j = λ_j P_j, so that each λ_j is an eigenvalue of A and each column P_j is a corresponding eigenvector of A.
Example 1 □ We shall find an invertible matrix P such that P^{-1}AP is diagonal, where

        [  7   3   3   2 ]
    A = [  0   1   2  -4 ]
        [ -8  -4  -5   0 ]
        [  2   1   2   3 ].

The eigenvalues of A are found to be λ1 = λ2 = 3, λ3 = 1, and λ4 = -1. For λ = 3, the system (A - 3I)X = 0 appears as

    [  4   3   3   2 ] [ x1 ]   [ 0 ]
    [  0  -2   2  -4 ] [ x2 ] = [ 0 ]
    [ -8  -4  -8   0 ] [ x3 ]   [ 0 ]
    [  2   1   2   0 ] [ x4 ]   [ 0 ],

and the solutions

         [  3 ]              [  1 ]
    P1 = [ -2 ]   and   P2 = [ -2 ]
         [ -2 ]              [  0 ]
         [  0 ]              [  1 ]

provide two linearly independent columns of P. Repetition of the same procedure yields the solutions

         [  1 ]                               [  1 ]
    P3 = [ -2 ]  for λ3 = 1   and      P4 =   [ -6 ]  for λ4 = -1.
         [  0 ]                               [  4 ]
         [  0 ]                               [ -1 ]

Thus the matrix

                            [  3   1   1   1 ]
    P = [P1, P2, P3, P4] =  [ -2  -2  -2  -6 ]
                            [ -2   0   0   4 ]
                            [  0   1   0  -1 ]

is invertible, and P^{-1}AP = diag{3, 3, 1, -1}. ■
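The claim P^{-1}AP = diag{3, 3, 1, -1} can be verified without inverting P: it suffices to check that AP = PD and that det(P) ≠ 0. A sketch in integer arithmetic (helper names ours):

```python
def matmul(X, Y):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def det(M):
    # Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(n))

A = [[7, 3, 3, 2], [0, 1, 2, -4], [-8, -4, -5, 0], [2, 1, 2, 3]]
P = [[3, 1, 1, 1], [-2, -2, -2, -6], [-2, 0, 0, 4], [0, 1, 0, -1]]
D = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

AP = matmul(A, P)
PD = matmul(P, D)
detP = det(P)
# AP == PD together with det(P) != 0 gives P^{-1} A P = D
```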
Exercises 7.4
1 -1 -1
-1 2 1 2
1. A = 2. A = 3. A = 1 3 2
-2 3 2 -2
-1 -1 0
3 0 4 4 2 0 0 0
3 1 -1
0 - 1 0 0 0 1 - 1 1
4. A = -1 1 1 5. A = 6. A =
0 -4 -1 -4 1 0 1 0
1 1 1
0 4 0 3 1 0 - 1 2
In Problems 7-10, (a) determine whether the given linear operator T can be represented by a diagonal matrix, and (b) whenever possible, find a diagonal matrix and a basis such that T is represented by the diagonal matrix relative to the basis. The linear operators are the same as those in Problems 13-16 in Exercises 7.2.

7. T(x1, x2) = (x1 + 2x2, 2x1 - 2x2) on R^2

8. T(x1, x2) = (2x1 + x2, x1 - x2) on R^2

9. T(x1, x2, x3) = (x1 + x2 - x3, -x1 + 3x2 - x3, -x1 + 2x2) on R^3

10. T(x1, x2, x3, x4) = (x1 + x3 + x4, 2x1 + x2 + 3x3 + x4, -x1 - x3 - x4, 3x1 + 2x2 + 5x3 + x4) on R^4
In Problems 11-14, let T be the linear operator on R 3 that has the given matrix A
relative to the basis A = {(1,0,0), (1,1,0), (1,1,1)}. (a) Determine whether T can be
represented by a diagonal matrix, and (b) whenever possible, find a diagonal matrix
and a basis of R 3 such that T is represented by the diagonal matrix relative to the
basis. These linear operators are the same as those in Problem 7 of Exercises 7.3.
    11. A = [  8   5  -5 ]      12. A = [ -4  -3  -1 ]
            [  5   8  -5 ]              [ -4   0  -4 ]
            [ 15  15 -12 ]              [  8   4   5 ]

    13. A = [  3   2   2 ]      14. A = [   8   5   6 ]
            [  1   4   1 ]              [   0  -2   0 ]
            [ -2  -4  -1 ]              [ -10  -5  -8 ]
In Problems 15-20, (a) determine whether the given matrix A is similar over C to a diagonal matrix, and (b) whenever possible, find an invertible matrix P over C such that P^{-1}AP is a diagonal matrix. The matrices here are the same as in Problem 8 of Exercises 7.3.
    15. A = [  2   1-i ]      16. A = [  5   i ]
            [ 1+i   3  ]              [ -i   2 ]

    17. A = [ 3  4  2 ]       18. A = [  i   0    0  ]
            [ 1  3  1 ]               [ -2i  i  -2+i ]
            [ 1  2  2 ]               [  0   0   -2  ]

    19. A = [ 1+i   0   0  ]  20. A = [ -2+2i  0  -2+i ]
            [ -2i  1+i  2i ]          [   0   -i    0  ]
            [  i    0   1  ]          [ 4-2i   0   4-i ]
22. Whenever possible, perform a check on the work in the indicated problem by computing P^{-1} and verifying that P^{-1}AP is indeed a diagonal matrix.

(a) Problem 1 (b) Problem 2 (c) Problem 3 (d) Problem 4
(e) Problem 5 (f) Problem 6 (g) Problem 15 (h) Problem 16
(i) Problem 17 (j) Problem 18 (k) Problem 19 (l) Problem 20
23. Give an example of a 2 x 2 matrix over R that is not similar over R to a diagonal
matrix.
24. Give an example of two 2 x 2 matrices that have the same characteristic equation
but are not similar.
25. Prove that the matrix

        [  0    1    0   ···    0      ]
        [  0    0    1   ···    0      ]
    C = [  ·    ·    ·         ·       ]
        [  0    0    0   ···    1      ]
        [ -c0  -c1  -c2  ···  -c_{n-1} ]

    has characteristic polynomial det(C - xI) = (-1)^n (x^n + c_{n-1}x^{n-1} + ··· + c1x + c0).
26. Use the result of Problem 25 to write down a matrix with the given polynomial
p(x) as its characteristic polynomial.
(a) p(x) = -x3 + 5x2 - 2 (b) p(x) = x2 - 3x + 2
(c) p(x) = x4 + 5x 2 + 4 (d) p(x) = -x5 + 1
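Writing down such a matrix for Problem 26 is mechanical. A sketch (the function name is ours), shown for p(x) = x^2 - 3x + 2 from part (b):

```python
def companion(coeffs):
    """Matrix whose characteristic polynomial is, up to the sign (-1)^n,
    x^n + c[n-1]x^(n-1) + ... + c[1]x + c[0].

    `coeffs` lists c[0], ..., c[n-1] of the monic polynomial.
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1            # superdiagonal of ones
    for j in range(n):
        C[n - 1][j] = -coeffs[j]   # last row carries the negated coefficients
    return C

# p(x) = x^2 - 3x + 2, i.e. c0 = 2, c1 = -3; the roots of p are 1 and 2
C = companion([2, -3])
```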
27. Suppose that λ1, ..., λr are the distinct eigenvalues of T, and that each λ_i is in F. Prove that T can be represented by a diagonal matrix if and only if

    V = V_{λ1} + V_{λ2} + ··· + V_{λr}.
Chapter 8
Functions of Vectors
8.1 Introduction
There are several standard types of functions defined on a vector space that have found
widespread application. The linear transformation, which we have already studied, is
probably the most important of these, but there are others that are of great value. The
linear functional is central to the study of linear programming. The quadratic form
is frequently useful in statistics, engineering, and physics. We shall encounter each of
these types of functions in this chapter.
Definition 8.1 Let V be a vector space over the field F. A linear transformation of V into F is called a linear functional on V. The set of all linear functionals on V is denoted by V*.
240 Chapter 8 Functions of Vectors
Example 1 □ Let V = R^n, and let c = (c1, c2, ..., cn) be a fixed vector in V. For each v = (x1, x2, ..., xn) ∈ V, define f(v) to be f(v) = Σ_{i=1}^n c_i x_i. Then f is a linear functional on R^n. ■

Example 2 □ Let f be defined on R^2 by f(x1, x2) = √(x1^2 + x2^2). Then f(e1) + f(e2) = 1 + 1 = 2, and

    f(e1 + e2) = f(1,1) = √2.

Thus f is not a linear functional on R^2. ■
Example 3 □ For a fixed positive integer n, let V be the vector space V_n consisting of all polynomials in x with coefficients in the field F and degree ≤ n. For each polynomial p(x) = Σ_{i=0}^n a_i x^i, define f(p(x)) = p(0). The mapping f is clearly scalar-valued. For any p(x) = Σ_{i=0}^n a_i x^i and q(x) = Σ_{i=0}^n b_i x^i in V_n and any a, b ∈ F,

    f(ap(x) + bq(x)) = ap(0) + bq(0) = af(p(x)) + bf(q(x)),

so f is a linear functional on V_n. ■
Example 4 □ Let V be the vector space R_{n×n} of all n x n matrices over R as defined in Chapter 4. For each A = [a_ij]_n ∈ V, the trace of A, denoted by t(A), is given by t(A) = Σ_{i=1}^n a_ii. It is left as an exercise (Problem 2) to verify that t is a linear functional on R_{n×n}. ■
As mentioned earlier, it is already known that V* is a vector space over F, just as V is. The following theorem shows that V* has the same dimension as V whenever V is of finite dimension.
Proof. Suppose that A = {u1, u2, ..., un} is a basis of V. For each j (j = 1, 2, ..., n), let p_j be defined at u = Σ_{i=1}^n x_i u_i in V by

    p_j(u) = x_j.

Since each u can be written uniquely as u = Σ_{i=1}^n x_i u_i, the value p_j(u) is well-defined, and p_j is a mapping of V into F. For any a, b in F and u = Σ_{i=1}^n x_i u_i, v = Σ_{i=1}^n y_i u_i in V,

    p_j(au + bv) = ax_j + by_j = a p_j(u) + b p_j(v),

so each p_j is a linear functional on V. To see that {p1, p2, ..., pn} is linearly independent, suppose that

    c1p1 + c2p2 + ··· + cnpn = z,

the zero functional. Evaluating both members at u_i gives c_i = 0 for each i. Finally, for any f ∈ V* and u = Σ_{i=1}^n x_i u_i,

    f(u) = Σ_{i=1}^n x_i f(u_i) = ( Σ_{i=1}^n f(u_i) p_i )(u),

so that f = Σ_{i=1}^n f(u_i) p_i and {p1, p2, ..., pn} spans V*. ■
For later use we note that the defining property of the coordinate projections p_j is that p_j(u_i) = δ_ij for each base vector u_i.
Whenever V is finite-dimensional, V* is called the dual space of V. If V is of
infinite dimension, then V* is not necessarily isomorphic to V, and the term "dual
space" is not ordinarily used.
For the remainder of this section, V will denote an n-dimensional vector space over a field F. As a linear transformation of V into F, each linear functional has a 1 x n matrix relative to each basis of V. According to Definition 5.10, the matrix of f relative to the basis A = {u1, u2, ..., un} of V is A = [a1, a2, ..., an], where a_j = f(u_j). (We are using the basis {1} of F here, and will adhere to this choice consistently.) The matrix of f provides a convenient method of computing the values f(u). For if u has coordinate matrix X = [x1, x2, ..., xn]^T, then

                                             [ x1 ]
    f(u) = Σ_{i=1}^n a_i x_i = [a1 a2 ··· an][ x2 ] = AX.
                                             [  · ]
                                             [ xn ]
Actually, this result is nothing new, but merely a special case of Theorem 5.12.
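The computation f(u) = AX can be illustrated with the functional of Example 1 on R^3, taking c = (1, 2, 3) relative to the standard basis (the concrete numbers are ours, not the text's):

```python
# f(v) = <c, v> on R^3; relative to the standard basis its matrix is the
# row [c1, c2, c3], and f(u) = A X for the coordinate column X of u.
c = [1, 2, 3]

def f(v):
    return sum(ci * xi for ci, xi in zip(c, v))

# matrix of f: the row of values f(e_j) on the basis vectors
A_row = [f(e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

u = [5, 4, 3]                                   # coordinates of u
value = sum(a * x for a, x in zip(A_row, u))    # f(u) computed as A X
```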
The set {1} is clearly the simplest choice of basis for F, so we do not propose to make any changes here. But there is no reason to restrict ourselves in the choice of basis in V, and Theorem 5.15 describes completely the results of such a change. If f has matrix A relative to the basis A of V, and if Q is the matrix of transition from A to the basis B, then f has matrix B = AQ relative to B. (The space V here is playing the role of U in Theorem 5.15.)
Any basis of V has a dual basis in V*, so a change of basis from A to B in V induces
a corresponding change of basis from A* to B* in V*. Our next theorem describes the
relation between these changes of bases.
Theorem 8.4 If Q is the transition matrix from the basis A = {u1, u2, ..., un} to the basis B = {v1, v2, ..., vn} of V, then (Q^T)^{-1} is the matrix of transition from A* to B* in V*.
Proof. Rather than prove the statement in the conclusion, we shall prove the equivalent assertion that Q^T is the transition matrix from B* to A*.

Let A* = {p1, p2, ..., pn}, and let B* = {q1, q2, ..., qn}. Now p_j = Σ_{k=1}^n c_kj q_k,
8.2 Linear Functionals 243
The ideas developed in this section are illustrated in the following example.
Example 6 □ Let V be the vector space C^3 of all ordered triples of complex numbers. The mapping f given by f(c_1, c_2, c_3) = ic_1 − ic_2 + c_3 is a linear functional on C^3. Relative to the basis A = {u_1 = (1, 0, 0), u_2 = (0, i, 0), u_3 = (0, 1, i)}, f has matrix A = [i, 1, 0]. The coordinates x_i of u = (c_1, c_2, c_3) relative to A are given by x_1 = c_1, x_2 = −ic_2 + c_3, x_3 = −ic_3, and
f(u) = AX = [i, 1, 0][c_1, −ic_2 + c_3, −ic_3]^T = ic_1 − ic_2 + c_3.
The elements of the dual basis A* = {p_1, p_2, p_3} are given by
p_1(c_1, c_2, c_3) = c_1,
p_2(c_1, c_2, c_3) = −ic_2 + c_3,
p_3(c_1, c_2, c_3) = −ic_3.
The matrix
Q = [ 0  −i   0
      1   0   1
      0   0  −i ]
is the matrix of transition from A to B = {v_1, v_2, v_3}, where v_1 = (0, i, 0), v_2 = (−i, 0, 0), v_3 = (0, 0, 1). The matrix of f relative to B is
B = AQ = [1, 1, 1].
The coordinates y_i of u = (c_1, c_2, c_3) relative to B are y_1 = −ic_2, y_2 = ic_1, y_3 = c_3, and
g_1(c_1, c_2, c_3) = −ic_2,
g_2(c_1, c_2, c_3) = ic_1,
g_3(c_1, c_2, c_3) = c_3.
The matrix of transition from A* to B* is
(Q^T)^{-1} = [ 0   i   0
               1   0   0
              −i   0   i ],
so that g_1 = p_2 − ip_3, g_2 = ip_1, and g_3 = ip_3. The reader may verify that these last equalities are correct by evaluating both members for an arbitrary u ∈ C^3. ■
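The computations in Example 6 are easy to check numerically. The following sketch (not part of the text; it assumes numpy is available) builds the matrix of f, the transition matrix Q, and the dual transition matrix (Q^T)^{-1}:

```python
import numpy as np

i = 1j  # the complex unit, written i in the example

def f(c):
    c1, c2, c3 = c
    return i*c1 - i*c2 + c3          # the linear functional of Example 6

# Basis A = {(1,0,0), (0,i,0), (0,1,i)}; the matrix of f is [f(u1), f(u2), f(u3)]
u = [np.array([1, 0, 0], dtype=complex),
     np.array([0, i, 0], dtype=complex),
     np.array([0, 1, i], dtype=complex)]
A_mat = np.array([f(v) for v in u])   # equals [i, 1, 0]

# Transition matrix from A to B = {(0,i,0), (-i,0,0), (0,0,1)}
Q = np.array([[0, -i, 0],
              [1,  0, 1],
              [0,  0, -i]])

B_mat = A_mat @ Q                     # matrix of f relative to B
T = np.linalg.inv(Q.T)                # transition matrix from A* to B*
```

The columns of T confirm that g_1 = p_2 − ip_3, g_2 = ip_1, and g_3 = ip_3.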
Exercises 8.2
4. Let V be the vector space of all real-valued continuous functions defined on the closed interval [0, 1]. For each g ∈ V, put h(g) = ∫_0^1 g(t) dt. Determine whether or not h is a linear functional on V.
7. Let A = {(1, 1, 0), (1, 0, 0), (1, 1, 1)} and A* = {p_1, p_2, p_3}, and let f be the linear functional that has coordinates [1, 2, 3]^T relative to A*. Find the values of f(5, 4, 3) and f(x_1, x_2, x_3).
8. Let u be a fixed vector in the n-dimensional vector space V.
10. Let E_3* = {g_1, g_2, g_3} denote the dual basis of E_3 = {e_1, e_2, e_3}. In each part of Problem 9, find the coordinates of the elements p_i of A* relative to E_3*.
11. Find the matrix of the given linear functional relative to the given basis of R^n.
12. Use the matrices found in Problem 11 to compute f(e_i) for each e_i in the standard basis E_n.
13. Suppose that the linear functional f has matrix A = [a_1, a_2, ..., a_n] relative to the basis A of V. Prove that the coordinate matrix of f relative to A* is A^T.
14. Use the matrix of transition from E_n* to A* to find A* = {p_1, p_2, ..., p_n} for the given basis A of R^n. Write each p_i as a linear combination of the elements g_i of E_n*.
15. Suppose that the linear functional f on V has matrix A = [a_1, a_2, ..., a_n] relative to the basis A of V, and that Q is the matrix of transition from A to B. Without using Theorem 5.15, prove that the matrix of f relative to B is AQ.
T° = {u ∈ V | f(u) = 0 for all f ∈ T}.
19. Prove that (W°)° = W for any subspace W of V, and (T°)° = T for any subspace T of V*.
22. Let V be of dimension n over F. It follows from Theorem 8.2 that (V*)* = V** is an n-dimensional vector space over F.
(Comment: Since V and V** are n-dimensional spaces over F, each is isomorphic to F^n. The isomorphism φ is such a natural one, however, that it is ordinarily used to identify V and V** as being the same space. That is, u and h_u are regarded as the same entity. This point of view is advantageous in certain instances in linear programming.)
For convenience, we shall refer to a real quadratic form in this section simply as a "quadratic form." And since q(v) is a polynomial in x_1, x_2, ..., x_n, we refer to q as a quadratic form in the variables x_1, x_2, ..., x_n. The use of parentheses in both q(v) and v = (x_1, x_2, ..., x_n) leads to the clumsy expression q(v) = q((x_1, x_2, ..., x_n)), so we shall drop one set of parentheses from this notation. Thus a quadratic form in x_1, x_2 is given by
q(x_1, x_2) = ax_1^2 + bx_2^2 + cx_1x_2.
The student has no doubt encountered such expressions as
xy = 1.
Typically, the set of all points (x, y) in R^2 for which a given quadratic form has a constant value is a conic section with center at the origin.
The value of a real quadratic form q(x, y, z) in x, y, z is a polynomial in x, y, z where the coefficients are real numbers. Typically, the set of all points (x, y, z) in R^3 for which a given quadratic form has a constant value is a quadric surface.
A = [ 1   4   0
      0  −2  −3
      0   0   4 ]
248 Chapter 8 Functions of Vectors
a simple computation shows that X^T AX is a matrix with a single element q(x_1, x_2, x_3):
X^T AX = [x_1^2 − 2x_2^2 + 4x_3^2 + 4x_1x_2 − 3x_2x_3].
When q(x_1, x_2, x_3) and this matrix of order 1 are identified as being the same, we have
q(x_1, x_2, x_3) = X^T AX.
In choosing this matrix A, we simply entered the coefficient a_ij of x_i x_j in the i-th row and j-th column of A. Since the cross-product terms can be written in many ways, the matrix A used in q(x_1, x_2, x_3) = X^T AX is not unique. For instance, the term 4x_1x_2 can be written as 4x_2x_1 or as 3x_1x_2 + x_2x_1. These variations would lead one to use
A = [ 1   0   0          A = [ 1   3   0
      4  −2  −3    or          1  −2  −3
      0   0   4 ]              0   0   4 ].
With the cross-product terms split into equal parts, the symmetric matrix
A = [ 1    2    0
      2   −2  −3/2
      0  −3/2   4 ]
would be used. ■
For any q(x_1, x_2, ..., x_n) = Σ_{i=1}^n Σ_{j=1}^n c_ij x_i x_j, it is always possible to write
q(x_1, x_2, ..., x_n) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j,
and with A = [a_ij]_n a symmetric matrix, we say that A represents the quadratic form q, or that A is the matrix of q relative to x_1, x_2, ..., x_n.
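In code, a symmetric representative is obtained from any matrix C with q(x) = X^T CX by averaging C with its transpose. A small numpy sketch (illustrative, not from the text), using the quadratic form of the example above:

```python
import numpy as np

# Any matrix C with q(x) = x^T C x can be replaced by the symmetric
# matrix A = (C + C^T)/2, which represents the same quadratic form.
C = np.array([[1.0, 4.0,  0.0],
              [0.0, -2.0, -3.0],
              [0.0, 0.0,  4.0]])   # q = x1^2 - 2x2^2 + 4x3^2 + 4x1x2 - 3x2x3
A = (C + C.T) / 2                  # the symmetric matrix of q

x = np.array([1.0, 2.0, -1.0])
q_C = x @ C @ x                    # value of q from the non-symmetric matrix
q_A = x @ A @ x                    # same value from the symmetric matrix
```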
Lemma 8.7 If A is a real symmetric matrix of order n and X^T AX = 0 for all choices of X = [x_1, x_2, ..., x_n]^T, then A is a zero matrix.
Proof. Suppose that A is a matrix that satisfies the given conditions. Let x_r = 1 and all other x_i = 0. Then 0 = X^T AX = a_rr. Now let r ≠ s, and put x_r = 1 and x_s = 1, with all other x_i = 0. Then
0 = X^T AX = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j = a_rr + a_rs + a_sr + a_ss = a_rs + a_sr.
But a_rs + a_sr = 2a_rs since A is symmetric, so we have a_rs = 0. Since r and s were arbitrary, A is a zero matrix. ■
Theorem 8.8 The matrix of the real quadratic form q relative to x_1, x_2, ..., x_n is unique.
Proof. Suppose that A and B both represent q relative to x_1, x_2, ..., x_n. Then
X^T AX = q(x_1, x_2, ..., x_n) = X^T BX,
Let us now reexamine the definition of a real quadratic form. As stated in Definition 8.5, q(v) is a scalar that is uniquely determined by the vector v in R^n. Although the discussion up to this point has been primarily in terms of the components x_i of v, this should not obscure the fact that q is a function of the vector variable v. Now the vector v is uniquely determined by its coordinates relative to any given basis of R^n, and so the
value q(v) should also be uniquely determined by these coordinates. Our next theorem
gives q(v) explicitly in terms of these coordinates. Theorem 5.14 provides the key to
the proof.
q(v) = X^T AX,
where A is the matrix of q relative to x_1, x_2, ..., x_n. Let P be the matrix of transition from E_n to the basis A and let Y = [y_1, y_2, ..., y_n]^T be the coordinate matrix of v relative to A. Then
q(v) = Y^T BY,
where B = P^T AP.
q(x_1, x_2, x_3) = [x_1  x_2  x_3] [ 13  −8  −20    [ x_1
                                     −8    5   12     x_2
                                    −20   12   26 ]   x_3 ]
and the matrix of transition
P = [ 1   0   2
      2  −2   5
      0   1  −1 ]
8.3 Real Quadratic Forms 251
B = P^T AP
  = [ 1   2   0    [ 13  −8  −20    [ 1   0   2
      0  −2   1      −8    5   12     2  −2   5
      2   5  −1 ]   −20   12   26 ]   0   1  −1 ]
  = [ 1   0   0
      0  −2   0
      0   0   3 ].
Thus the expression for q(v) in terms of y_1, y_2, y_3 is given by the simple form
q(v) = y_1^2 − 2y_2^2 + 3y_3^2. ■
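The computation above is easy to verify numerically. The following numpy sketch (an illustration, not part of the text) checks that P^T AP is the diagonal matrix diag{1, −2, 3} and that X^T AX = Y^T BY when Y = P^{-1}X:

```python
import numpy as np

A = np.array([[13., -8., -20.],
              [-8.,  5.,  12.],
              [-20., 12., 26.]])
P = np.array([[1., 0.,  2.],
              [2., -2., 5.],
              [0., 1., -1.]])

B = P.T @ A @ P                 # diagonal matrix of the new representation

# The value of q computed from X agrees with the value computed from Y
X = np.array([1.0, -1.0, 2.0])  # an arbitrary coordinate matrix
Y = np.linalg.inv(P) @ X
q_x = X @ A @ X
q_y = Y @ B @ Y
```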
Theorem 8.9 makes available many different expressions for the value q(v) of a quadratic form. From one point of view, it describes the effect of a change of variables from the set x_1, x_2, ..., x_n to the set y_1, y_2, ..., y_n according to the rule that X = PY. Such a change of variables is called a linear change of variables. The terminology of Definition 8.6 can be applied to the variables y_i as well as the x_i.
Corollary 8.10 Suppose that v = (x_1, x_2, ..., x_n) and A represents the quadratic form q relative to x_1, x_2, ..., x_n. If the variables y_1, y_2, ..., y_n are related to the x_i by X = PY with P invertible, then the matrix B = P^T AP represents q relative to y_1, y_2, ..., y_n.
Proof. With B = P^T AP, we have q(v) = Y^T BY from the theorem. This expression is valid for any v = (x_1, x_2, ..., x_n) since Y = P^{-1}X can take on any prescribed value.
Now B is symmetric, since
B^T = (P^T AP)^T = P^T A^T (P^T)^T = P^T AP = B.
The results of the preceding theorem and corollary motivate the introduction of the
relation of congruence on matrices.
Definition 8.11 Let A and B be matrices over the field F. Then B is congruent to A over F if there is an invertible matrix P over F such that B = P^T AP.
It is left as an exercise (Problem 8) to prove that congruence over F is an equivalence relation on the set of all square matrices over F.
The principal objective of this and the next two sections is to show that the polynomial expression for the value of a quadratic form q takes on the particularly simple form of a "sum of squares" whenever the basis A in Theorem 8.9 is chosen appropriately. Example 2 illustrates this simple form, but we have no method for finding an appropriate basis A at this time.
Exercises 8.3
1. Write q(x_1, x_2, ..., x_n) as X^T AX with two different matrices A, one of which is symmetric.
2. Suppose that for each v = (x_1, x_2, ..., x_n) in R^n, q(v) = X^T AX for the given matrix A. For the given basis A of R^n, find the expression for q(v) in terms of the coordinates y_i of v relative to A.
(a) A = [ 1   √2   −1
          √2   1  −√2
         −1  −√2    2 ],   A = {(1, 0, 1), (3, √2, 1), (3√2, −4, √2)}
(b) A = [ 1  −2   0
         −2   2  −2
          0  −2   3 ],   A = {(1, 0, 1), (0, 1, 1), (1, 1, 0)}
(c) A = [ 1  −1  −1
         −1   1  −1
         −1  −1   1 ],   A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}
(d) A = [ 1   1    2
          1   3   1/2
          2  1/2   7 ],   A = {(1, 0, 0), (1, −1, 0), (−11, 3, 4)}
3. Find the matrix that represents the given quadratic form relative to the variables
5. Recall that a matrix A is skew-symmetric if and only if A^T = −A. Prove that if A is skew-symmetric, then X^T AX = 0 for all X = [x_1, x_2, ..., x_n]^T.
6. Prove that the sum and difference of two symmetric matrices of the same order
are symmetric.
7. Prove that any square matrix can be written uniquely as the sum of a symmetric and a skew-symmetric matrix. (Hint: A + A^T is symmetric.)
8. Prove that the relation of congruence over F is an equivalence relation on the set of all square matrices over F.
9. Prove that two symmetric matrices A and B over R are congruent if and only
if they represent the same real quadratic form relative to two sets of n variables
that are related by an invertible linear change of variables.
10. Show that the set of symmetric matrices of order n over R is not closed under
multiplication.
11. Show that AA^T and A^T A are symmetric for any matrix A.
12. Show that if A is an n × n matrix over R such that X^T AX = 0 for all X = [x_1, x_2, ..., x_n]^T, then A is skew-symmetric.
P^T AP = diag{λ_1, λ_2, ..., λ_n}.
This problem has something of the same flavor as the diagonalization problem studied in Section 7.4. The difference is that P^T AP has taken the place of P^{-1}AP. It is not unnatural, then, to consider the possibility of finding a matrix P over R such that P^T AP is diagonal and P^T = P^{-1}. These are clearly strong restrictions to be placed on P, but oddly enough it is true that such a P always exists (whenever A is real and symmetric). Those matrices P that have the property that P^T = P^{-1} are quite useful in many instances, and have a special name.
It would seem natural to expect a connection between the use of the word orthogonal to describe a matrix and the use of the same word to describe a set of vectors in Chapter 1. We recall that an orthogonal set of vectors is a set {u_λ | λ ∈ L} such that u_{λ_1} · u_{λ_2} = 0 whenever λ_1 ≠ λ_2. An orthogonal basis of a subspace W of R^n, then, is a basis of W that is an orthogonal set of vectors. The two uses of the word orthogonal are related, but not exactly in the way that one might expect. Definition 8.13 and Theorem 8.14 explain the relation completely.
Proof. Let P = [p_ij]_{r×r} over R, and let A = {u_1, u_2, ..., u_r} be an orthonormal set in R^n. Then P is the matrix of transition from A to a set B = {v_1, v_2, ..., v_r} of vectors in R^n such that v_j = Σ_{k=1}^r p_kj u_k. Now B is orthonormal if and only if v_i · v_j = δ_ij for
8.4 Orthogonal Matrices 255
all i and j. But
v_i · v_j = (Σ_{k=1}^r p_ki u_k) · (Σ_{m=1}^r p_mj u_m)
          = Σ_{k=1}^r Σ_{m=1}^r p_ki p_mj u_k · u_m
          = Σ_{k=1}^r Σ_{m=1}^r p_ki p_mj δ_km
          = Σ_{k=1}^r p_ki p_kj,
where u_k · u_m = δ_km since A is orthonormal. But this last sum is precisely the element in the i-th row and j-th column of P^T P. Thus v_i · v_j = δ_ij if and only if P^T P = I_r, and the proof is complete. ■
v_1 · w_2 = v_1 · u_2 − (v_1 · u_2)v_1 · v_1
          = v_1 · u_2 − v_1 · u_2
          = 0,
so w_2 is orthogonal to v_1. Put v_2 = w_2/‖w_2‖. Then {v_1, v_2} is an orthonormal set and v_i is a linear combination of u_1, ..., u_i for i = 1, 2.
256 C h a p t e r 8 Functions of Vectors
If r > 2, let
w_3 = u_3 − (v_1 · u_3)v_1 − (v_2 · u_3)v_2.
Then w_3 ≠ 0, since w_3 = 0 would require that u_3 be dependent on {v_1, v_2} and hence on {u_1, u_2}. Since
Then w_{k+1} ≠ 0, since otherwise u_{k+1} would be dependent on {v_1, v_2, ..., v_k} and hence on {u_1, u_2, ..., u_k}. Since
v_i · w_{k+1} = v_i · u_{k+1} − Σ_{j=1}^k (v_j · u_{k+1})v_i · v_j
             = v_i · u_{k+1} − v_i · u_{k+1}
             = 0,
Next we let
w_2 = u_2 − (v_1 · u_2)v_1
    = (0, 0, 4, 1) − 3(2/3, 0, 2/3, 1/3)
    = (−2, 0, 2, 0)
and
v_2 = w_2/‖w_2‖ = (1/(2√2))(−2, 0, 2, 0) = (−1/√2, 0, 1/√2, 0).
To obtain the third vector in our orthonormal basis of ⟨A⟩, we write
w_3 = u_3 − (v_1 · u_3)v_1 − (v_2 · u_3)v_2
    = (8, 0, 3, 5) − 9(2/3, 0, 2/3, 1/3) − (−5/√2)(−1/√2, 0, 1/√2, 0)
    = (8, 0, 3, 5) − (6, 0, 6, 3) + (−5/2, 0, 5/2, 0)
    = (−1/2, 0, −1/2, 2).
Finally, we let
v_3 = w_3/‖w_3‖ = (1/√18)(−1, 0, −1, 4).
Thus the set {v_1, v_2, v_3} is an orthonormal basis of ⟨A⟩, where
v_1 = (1/3)(2, 0, 2, 1),  v_2 = (1/√2)(−1, 0, 1, 0),  v_3 = (1/√18)(−1, 0, −1, 4). ■
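The steps of the Gram-Schmidt process used in this example can be sketched in a few lines of numpy (an illustration, not part of the text); applied to A = {(2,0,2,1), (0,0,4,1), (8,0,3,5)}, it reproduces the vectors v_1, v_2, v_3 above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same subspace.
    Assumes the input vectors are linearly independent."""
    basis = []
    for u in vectors:
        # subtract the projections of u onto the vectors found so far
        w = u - sum((v @ u) * v for v in basis)
        basis.append(w / np.linalg.norm(w))
    return basis

A = [np.array(v, dtype=float) for v in [(2, 0, 2, 1), (0, 0, 4, 1), (8, 0, 3, 5)]]
v1, v2, v3 = gram_schmidt(A)
```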
Exercises 8.4
2. Let A = {(2/3, 2/3, 1/3), (−2/3, 1/3, 2/3)} and
P = [ 1/2    √3/2
     −√3/2   1/2 ].
4. Given that the set A is linearly independent, use the Gram-Schmidt Orthogonalization Process to obtain an orthonormal basis of ⟨A⟩.
(b) A = {(2, 0, −3, 6), (2, 1, 0, 4), (1, 7, −1, 3)}
(c) A = {(2, 2, 1), (0, 4, 1), (8, 3, 5)}
(d) A = {(1, 0, 1, 0), (1, 1, −3, 0), (2, −3, −4, 1), (2, 3, −2, −3)}
5. Prove that the set of all orthogonal matrices of order n is closed under multiplication.
8. Prove that, given any unit vector v_1 in R^n, there is an orthonormal basis of R^n that has v_1 for its first element.
10. (See Problem 9.) Prove that T is orthogonal if and only if T maps an orthonormal basis {v_1, v_2, ..., v_n} of R^n onto an orthonormal basis {T(v_1), T(v_2), ..., T(v_n)} of R^n.
11. (See Problem 10.) Let N = {v_1, v_2, ..., v_n} be an arbitrary orthonormal basis of R^n. Prove that T is orthogonal if and only if the matrix of T relative to N is an orthogonal matrix.
12. (See Problem 11.) Show that the product of two orthogonal linear operators on R^n is an orthogonal linear operator.
13. It follows from Problem 11 that any orthogonal linear operator is invertible. Prove that if T is orthogonal, then T^{-1} is orthogonal.
(A − λ_j I)P_j = 0
8.5 Reduction of Real Quadratic Forms 259
can be taken to be real since A is real. We show in our next theorem that the λ_j's are indeed real because A is symmetric. Before proceeding to this theorem, we introduce
some needed terminology and notation.
Definition 8.16 Let C = [c_rs]_{m×n} be a matrix over the field C of complex numbers. The conjugate of C is the matrix C̄ = [c̄_rs]_{m×n}, where c̄_rs = a_rs − ib_rs is the conjugate of the complex number c_rs = a_rs + ib_rs.
If a and b are real numbers, the notation z̄ for the conjugate a − bi of the complex number z = a + bi is a standard one, and we have merely extended this notation to matrices. The basic properties (z_1z_2)‾ = z̄_1z̄_2 and (z_1 + z_2)‾ = z̄_1 + z̄_2 are valid for matrices: (AB)‾ = ĀB̄ and (A + B)‾ = Ā + B̄. (See Problem 3 of the exercises.)
Proof. Let A be a real symmetric matrix, let λ be any eigenvalue of A in the field C, and let X be an eigenvector over C associated with λ. Then AX = λX, and this implies that
X*AX = λX*X.
We regard this equality of 1 × 1 matrices as an equality of numbers. Now
(X*AX)* = X*A*(X*)* = X*AX
since Ā = A and A^T = A imply that A* = A. Hence X*AX is a real number. Also,
X*X = Σ_{k=1}^n x̄_k x_k = Σ_{k=1}^n |x_k|² > 0,
Proof. The proof is by induction on the order n of A. The theorem is trivially true for n = 1.
Assume that the theorem is true for all k × k matrices, and let A be a real symmetric matrix of order k + 1. Let T be the linear operator on R^{k+1} that has matrix A relative to E_{k+1}. Let λ_1 be any eigenvalue of T (and A). Then λ_1 is real and there is a corresponding eigenvector v_1 in R^{k+1}. Since any multiple of v_1 is in the eigenspace V_{λ_1}, we may assume that v_1 is a unit vector. It follows from Theorem 8.15 that {v_1} can be extended to an orthonormal basis
N = {v_1, v_2, ..., v_{k+1}}
of R^{k+1} (see Problem 8 of Exercises 8.4). The matrix of transition P_1 from E_{k+1} to N is orthogonal (Theorem 8.14), and T is represented by
A_1 = P_1^{-1}AP_1 = [ λ_1 |  0
                        0  | A_2 ]
relative to N. (The elements to the right of λ_1 in the first row must be zero because of symmetry.) The k × k matrix A_2 is real and symmetric, so it follows from our induction hypothesis that there is an orthogonal k × k matrix Q such that Q^{-1}A_2Q = diag{λ_2, ..., λ_{k+1}}. It is readily verified that
P_2 = [ 1 | 0
        0 | Q ]
is orthogonal and that
P_2^{-1}A_1P_2 = P_2^{-1}P_1^{-1}AP_1P_2 = diag{λ_1, λ_2, ..., λ_{k+1}}.
Now P_1P_2 is orthogonal since
(P_1P_2)^T = P_2^T P_1^T = P_2^{-1}P_1^{-1} = (P_1P_2)^{-1},
and thus P = P_1P_2 is the desired matrix. ■
The proof of Theorem 8.19 is not exactly constructive, although it does suggest an iterative procedure to obtain P by beginning with the selection of λ_1, v_1, and P_1. It is not advantageous for us to pursue this lead, as the next theorem leads to a much more efficient procedure.
We observe that if u and v are vectors in R^n with coordinate matrices X = [x_1, x_2, ..., x_n]^T and Y = [y_1, y_2, ..., y_n]^T relative to E_n, then u · v = Σ_{k=1}^n x_k y_k = X^T Y. In particular, u and v are orthogonal if and only if X^T Y = 0.
Theorem 8.21 Let A be a real symmetric matrix. If λ_r and λ_s are distinct eigenvalues of A with associated eigenvectors P_r and P_s, respectively, then P_r^T P_s = 0.
Proof. Since AP_s = λ_s P_s, we have P_r^T AP_s = P_r^T(λ_s P_s) = λ_s P_r^T P_s, and
(P_r^T AP_s)^T = P_s^T AP_r = P_s^T(λ_r P_r) = λ_r P_s^T P_r = λ_r(P_r^T P_s)^T.
But (P_r^T AP_s)^T = P_r^T AP_s and (P_r^T P_s)^T = P_r^T P_s since P_r^T AP_s and P_r^T P_s are matrices of order 1. Hence
λ_r P_r^T P_s = λ_s P_r^T P_s
and
(λ_r − λ_s)P_r^T P_s = 0.
Since λ_r − λ_s ≠ 0, it must be that P_r^T P_s = 0. ■
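Theorem 8.21 is easy to check on a small case (a numpy sketch, not part of the text). For the symmetric matrix used in the example that follows, eigenvectors belonging to the distinct eigenvalues −1 and 2 have zero dot product:

```python
import numpy as np

A = np.array([[1., -1., -1.],
              [-1., 1., -1.],
              [-1., -1., 1.]])

P1 = np.array([1., 1., 1.])    # eigenvector: A P1 = -1 * P1
P2 = np.array([1., -1., 0.])   # eigenvector: A P2 =  2 * P2

dot = P1 @ P2                  # orthogonality of the two eigenvectors
```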
In Section 7.4, we found that P = [P_1, P_2, ..., P_n] was an invertible matrix such that P^{-1}AP was diagonal if and only if the columns P_j of P were the coordinate matrices relative to E_n of a basis of eigenvectors v_j of the associated linear transformation T. Since P_r^T P_s is the element in row r and column s of P^T P, the requirement that P^T = P^{-1} is satisfied if and only if P_r^T P_s = δ_rs. Since P_r^T P_s = v_r · v_s, this is equivalent to requiring that the basis of eigenvectors be orthonormal. Theorem 8.21 assures us that eigenvectors from distinct eigenspaces are automatically orthogonal. Thus, the only modification of the procedure in Section 7.4 that is necessary to make P orthogonal is to choose orthonormal bases of the eigenspaces V_{λ_j}. This is illustrated in the next example.
we find that {(1, 1, 1)} is a basis of the eigenspace V_{−1} and {(1, −1, 0), (1, 0, −1)} is a basis of V_2. Applying the Gram-Schmidt process, we obtain
{(1/√3)(1, 1, 1)}
and
{(1/√2)(1, −1, 0), (1/√6)(1, 1, −2)}
as orthonormal bases of V_{−1} and V_2, respectively. Hence
P = (1/√6) [ √2   √3   1
             √2  −√3   1
             √2    0  −2 ].
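This whole procedure is available in numpy as `numpy.linalg.eigh`, which returns the eigenvalues of a real symmetric matrix together with an orthogonal matrix of eigenvectors, the columns drawn from orthonormal bases of the eigenspaces. A sketch (not part of the text) using the matrix of this example:

```python
import numpy as np

A = np.array([[1., -1., -1.],
              [-1., 1., -1.],
              [-1., -1., 1.]])

# eigh is designed for symmetric (Hermitian) matrices: the matrix P of
# eigenvectors it returns is orthogonal, so P^T A P is diagonal.
eigenvalues, P = np.linalg.eigh(A)

D = P.T @ A @ P   # diag of the eigenvalues -1, 2, 2
```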
x = p_11 x' + p_12 y'
y = p_21 x' + p_22 y'
to a diagonalized form
λ_1(x')² + λ_2(y')².
It is shown in analytic geometry that such a change of variables corresponds to a rotation of the coordinate axes about the origin. The different possibilities for the signs of the eigenvalues correspond to different types of conic sections, with degenerate cases possible in each instance. If λ_1 and λ_2 are nonzero and of the same sign, the conic section is a circle or an ellipse. If λ_1 and λ_2 are nonzero and of opposite sign, the conic section is a hyperbola. If exactly one of λ_1, λ_2 is zero, the conic section is a parabola. If both λ_1 and λ_2 are zero, the graph is a straight line.
As mentioned in Section 8.3, the quadric surfaces can be related to quadratic forms
in three variables. This relation can be analyzed in a manner analogous to that for the
conic sections. However, this analysis becomes somewhat involved, and it is omitted
here for this reason.
Exercises 8.5
(a) A = [ 0  2  2        (b) A = [ 1   2  −4
          2  0  2                  2  −2  −2
          2  2  0 ]               −4  −2   1 ]
(c) A = [ 17   2  −2     (d) A = [ 4  −1   0   1
           2  14   4              −1   5  −1   0
          −2   4  14 ]             0  −1   4  −1
                                   1   0  −1   5 ]
4. Prove that (C̄)^T = (C^T)‾.
8. In the proof of Theorem 8.19, verify that the matrix P_2 is orthogonal and that P_2^{-1}A_1P_2 = diag{λ_1, λ_2, ..., λ_{k+1}}.
9. Let u and v be vectors in R^n with coordinate matrices X = [x_1, x_2, ..., x_n]^T and Y = [y_1, y_2, ..., y_n]^T, respectively, relative to an orthonormal basis {v_1, v_2, ..., v_n} of R^n. Prove that u · v = X^T Y.
10. Let N = {v_1, v_2, ..., v_n} be an orthonormal basis of R^n, and let X = [x_1, x_2, ..., x_n]^T be the coordinate matrix of u relative to N. Prove that x_k = u · v_k for k = 1, 2, ..., n.
11. Prove that if A = B^T B for some matrix B over R, then X^T AX ≥ 0 for all X = [x_1, x_2, ..., x_n]^T over R.
12. Assume that the linear operator T of R^n has a symmetric matrix relative to an orthonormal basis of R^n. Prove that the eigenvectors of T which correspond to distinct eigenvalues are orthogonal.
where
P^T AP = D = diag{λ_1, λ_2, ..., λ_n}.
The matrix Y = [y_1, y_2, ..., y_n]^T is the coordinate matrix of v = (x_1, x_2, ..., x_n) relative to the basis A = {u_1, u_2, ..., u_n}, where P is the transition matrix from E_n to A. An interchange of two vectors u_i and u_j in A amounts to an interchange of the variables y_i and y_j, and such an interchange is reflected in the matrix D by an interchange of λ_i and λ_j. The diagonalized representation can thus be written as
Definition 8.22 The rank of the real quadratic form q with q(v) = X^T AX is the rank of the matrix A.
The discussion above shows that the rank of q is the same as the number of variables having nonzero coefficients in a diagonalized representation of q. We shall examine these nonzero terms in more detail.
Theorem 8.23 In any two diagonalized representations of a real quadratic form q, the
number of variables with positive coefficients is the same and the number of variables
with negative coefficients is the same.
8.6 Classification of Real Quadratic Forms 265
D' = diag{d'_1, ..., d'_k, d'_{k+1}, ..., d'_r, 0, ..., 0}
with the first k elements d'_i positive.
Let A = {u_1, u_2, ..., u_n} and B = {v_1, v_2, ..., v_n} be bases of R^n chosen so that [y_1, y_2, ..., y_n]^T and [z_1, z_2, ..., z_n]^T are the coordinate matrices of v = (x_1, x_2, ..., x_n) relative to A and B, respectively (see Corollary 8.20). Now q(u_i) = d_i > 0 for i = 1, 2, ..., p. Therefore, for any w = Σ_{i=1}^p y_i u_i ∈ W_1 = ⟨u_1, u_2, ..., u_p⟩,
q(w) = Σ_{i=1}^p d_i y_i² > 0 whenever w ≠ 0.
Also, q(v_i) = d'_i ≤ 0 for i = k+1, k+2, ..., n, so that for each w = Σ_{i=k+1}^n z_i v_i ∈ W_2 = ⟨v_{k+1}, v_{k+2}, ..., v_n⟩,
q(w) = Σ_{i=k+1}^n d'_i z_i² ≤ 0.
Definition 8.25 The index of the quadratic form q is the number p of positive coefficients appearing in a diagonalized representation of q. The difference s between p and the number of negative coefficients in a diagonalized representation of q is the signature of q. That is, the signature of q is the number s = p − (r − p) = 2p − r, where r is the rank of q.
Theorem 8.23 shows that the index and signature of q are well-defined terms. The signature of q is a measure of the "positiveness" or "negativeness" of q.
Theorem 8.26 A quadratic form q on R^n with rank r and index p can be represented as
q(v) = z_1² + ··· + z_p² − z_{p+1}² − ··· − z_r²
by a suitable invertible linear change of variables.
Proof. Let
q(v) = d_1y_1² + ··· + d_py_p² + d_{p+1}y_{p+1}² + ··· + d_ry_r²
be a diagonalized representation of q with d_1, ..., d_p positive and d_{p+1}, ..., d_r negative. For i = 1, 2, ..., p, d_i has a positive real square root √d_i, and we put z_i = √d_i y_i. For i = p+1, p+2, ..., r, −d_i has a positive real square root √|d_i|, and we put z_i = √|d_i| y_i. For i = r+1, ..., n, we put z_i = y_i. The linear change of variables
[z_1, z_2, ..., z_n]^T = diag{√d_1, ..., √d_p, √|d_{p+1}|, ..., √|d_r|, 1, ..., 1} [y_1, y_2, ..., y_n]^T
then yields
q(v) = z_1² + ··· + z_p² − z_{p+1}² − ··· − z_r². ■
The form q(v) = z_1² + ··· + z_p² − z_{p+1}² − ··· − z_r² in Theorem 8.26 is called the canonical form for q. There are two corollaries to the theorem concerning symmetric matrices. Proofs are requested in the exercises.
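Since an orthogonal diagonalization P^T AP = diag{λ_1, ..., λ_n} is in particular a diagonalized representation, and Theorem 8.23 guarantees that the counts of positive and negative coefficients are invariant, the rank, index, and signature can be read off from the eigenvalues of A. A numpy sketch (illustrative, not part of the text):

```python
import numpy as np

def rank_index_signature(A, tol=1e-10):
    """Rank, index, and signature of the quadratic form q(v) = X^T A X,
    read off from the eigenvalues of the symmetric matrix A."""
    eigenvalues = np.linalg.eigvalsh(A)    # A is congruent to diag(eigenvalues)
    p = int(np.sum(eigenvalues > tol))     # index: number of positive coefficients
    neg = int(np.sum(eigenvalues < -tol))  # number of negative coefficients
    r = p + neg                            # rank
    return r, p, 2 * p - r                 # rank, index, signature s = 2p - r
```

For example, the matrix A = [1 −1 −1; −1 1 −1; −1 −1 1] of the earlier diagonalization example has eigenvalues −1, 2, 2, giving rank 3, index 2, and signature 1.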
Corollary 8.27 Any real symmetric matrix A is congruent over R to a unique matrix of the form
C = [ I_p      0      0
       0   −I_{r−p}   0
       0       0      0 ],
where r is the rank of A.
The number p of positive 1's in C is called the index of A. Since the linear change of variables in the proof of Theorem 8.26 is clearly not necessarily orthogonal, the matrix P used to obtain P^T AP = C in Corollary 8.27 may not be orthogonal.
Corollary 8.28 Two symmetric n x n matrices over R are congruent over R if and
only if they have the same rank and the same index.
Definition 8.29 Let q be a real quadratic form on R^n with rank r and index p.
(1) If p = r = n, q is called positive definite.
(2) If p = r, q is called positive semidefinite.
(3) If p = 0 and r = n, q is called negative definite.
(4) If p = 0, q is called negative semidefinite.
Each of the conditions in Definition 8.29 can be formulated in terms of the range of values of q. These formulations are given in the next theorem, with proofs requested in Problem 8.
Theorem 8.30 Let q be a real quadratic form on R^n.
(1) q is positive definite if and only if q(v) > 0 for all v ≠ 0 in R^n.
(2) q is positive semidefinite if and only if q(v) ≥ 0 for all v ∈ R^n.
(3) q is negative definite if and only if q(v) < 0 for all v ≠ 0 in R^n.
(4) q is negative semidefinite if and only if q(v) ≤ 0 for all v ∈ R^n.
Example 1 □ In Example 1 of Section 8.5, we obtained the orthogonal matrix
P = (1/√6) [ √2   √3   1
             √2  −√3   1
             √2    0  −2 ]
such that P^T AP = diag{−1, 2, 2} for the symmetric matrix
A = [ 1  −1  −1
     −1   1  −1
     −1  −1   1 ].
P = P_2P_3 = (1/(2√3)) [ 1   √3   2
                         1  −√3   2
                        −2    0   2 ]
is a matrix such that P^T AP = diag{1, 1, −1}. We note that P is not orthogonal. It is clear that q has rank 3, index 2, and signature 1. None of the terms in Definition 8.29 apply to q. ■
Exercises 8.6
1. For each v = (x_1, x_2, ..., x_n) in R^n, a quadratic form q is defined by q(v) = X^T AX for the given A. Find the rank, index, and signature of q.
(a) A = [ 1   √2   −1        (b) A = [ 1  −2   0
          √2   1  −√2                 −2   2  −2
         −1  −√2    2 ]                0  −2   3 ]
(c) A = [ 1  −1  −1          (d) A = [−2   2   2
         −1   1  −1                    2   1   4
         −1  −1   1 ]                  2   4   1 ]
2. Find the canonical form for each of the quadratic forms referred to in Problem 1.
3. For each of the following matrices A, find an invertible real matrix P such that P^T AP is of the form C given in Corollary 8.27.
(a) A = [ 0  2  2        (b) A = [ 1   2  −4
          2  0  2                  2  −2  −2
          2  2  0 ]               −4  −2   1 ]
(c) A = [ 17   2  −2     (d) A = [ 4  −1   0   1
           2  14   4              −1   5  −1   0
          −2   4  14 ]             0  −1   4  −1
                                   1   0  −1   5 ]
4. For each of the matrices A in Problem 3, let q be the quadratic form on R^n with q(v) = X^T AX. Find a basis B of R^n such that q(v) has the form
q(v) = z_1² + ··· + z_p² − z_{p+1}² − ··· − z_r².
(a) Prove that A is positive definite if and only if A = BTB for some real
invertible matrix B.
(b) Prove that A is positive semidefinite if and only if there exists a (possibly
singular) real matrix Q such that A = QTQ.
(c) Prove that A is positive definite if and only if all of the eigenvalues of A are
positive.
f(Σ_{i=1}^m a_i u_i, Σ_{j=1}^n b_j v_j) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j f(u_i, v_j)
for all positive integers m and n. Thus, a function f from U × V to F is a bilinear form on U and V if and only if (iv) is satisfied as an identity.
We are primarily interested in the case where the vector spaces U and V in Definition 8.31 are finite-dimensional. For the remainder of this section, U and V will denote vector spaces over F with dimensions m and n, respectively.
8.7 Bilinear Forms 271
Definition 8.32 Let A = {u_1, u_2, ..., u_m} and B = {v_1, v_2, ..., v_n} be bases of U and V, respectively, and let f be a bilinear form on U and V. The matrix of f relative to A and B is the matrix A = [a_ij]_{m×n}, where
a_ij = f(u_i, v_j).
We say that the matrix A represents the bilinear form f. As with linear functionals, the function values f(u, v) can be expressed compactly by use of the matrix of f.
Theorem 8.33 Let X = [x_1, x_2, ..., x_m]^T denote the coordinate matrix of a vector u ∈ U relative to the basis A of U, and let Y = [y_1, y_2, ..., y_n]^T be the coordinate matrix of a vector v ∈ V relative to the basis B of V. If A = [a_ij]_{m×n}, then A is the matrix of the bilinear form f relative to A and B if and only if the equation
f(u, v) = X^T AY
f(u, v) = Σ_{i=1}^m Σ_{j=1}^n x_i y_j f(u_i, v_j)
        = Σ_{i=1}^m Σ_{j=1}^n x_i y_j a_ij
        = Σ_{i=1}^m x_i (Σ_{j=1}^n a_ij y_j).
Now Σ_{j=1}^n a_ij y_j is the element in the i-th row of the m × 1 matrix AY, and thus f(u, v) = X^T(AY) = X^T AY.
Conversely, suppose that f(u, v) = X^T AY holds identically. For fixed i and j, the coordinate matrices of u_i and v_j are X = [δ_1i, δ_2i, ..., δ_mi]^T and Y = [δ_1j, δ_2j, ..., δ_nj]^T. Hence
f(u_i, v_j) = X^T AY
            = [δ_1i, δ_2i, ..., δ_mi] [Σ_{k=1}^n a_1k δ_kj, Σ_{k=1}^n a_2k δ_kj, ..., Σ_{k=1}^n a_mk δ_kj]^T
            = [δ_1i, δ_2i, ..., δ_mi] [a_1j, a_2j, ..., a_mj]^T
            = Σ_{k=1}^m δ_ki a_kj
            = a_ij,
It follows from Definition 8.32 and Theorem 8.33 that the two bilinear forms f and g on U and V are equal if and only if they have the same matrix relative to fixed bases A of U and B of V.
As we have in similar situations previously, we ask about the effects of changes of bases in U and V. The answer is obtained quite easily.
Theorem 8.34 Let A be the matrix of the bilinear form f relative to the bases A of U and B of V. If Q is the matrix of transition from A to the basis A' of U and P is the matrix of transition from B to the basis B' of V, then Q^T AP is the matrix of f relative to A' and B'.
Proof. Suppose that u ∈ U has coordinate matrix X relative to A and X' relative to A'. Let v ∈ V have coordinate matrix Y relative to B and Y' relative to B'. By Theorem 5.14, X = QX' and Y = PY'. Combined with the "only if" part of Theorem 8.33, this yields
f(u, v) = X^T AY
        = (QX')^T A(PY')
        = (X')^T(Q^T AP)Y'.
But, by the "if" part of Theorem 8.33, this means that Q^T AP is the matrix of f relative to A' and B'. ■
Corollary 8.35 Two m × n matrices A and B over F represent the same bilinear form on U and V relative to certain (not necessarily different) choices of bases of U and V if and only if A and B are equivalent over F.
The rank of a bilinear form f is defined to be the rank r of any matrix that represents f.
Corollary 8.36 Let f be a bilinear form on U and V. With suitable choices of bases in U and V, f can be represented by a matrix D_r that has the first r diagonal elements equal to 1 and all other elements zero.
We obtain the matrix A by computing the values a_ij = f(u_i, v_j). For example,
a_11 = f((1, 0, 0), (1, −1))
     = −7(1)(1) − 10(1)(−1) − 2(0)(1) − 3(0)(−1) + 12(0)(1) + 17(0)(−1)
     = 3.
Similarly,
a_21 = f((1, 1, 0), (1, −1)) = 4,
a_31 = f((1, 1, 1), (1, −1)) = −1,
and so on. Thus
A = [ 3  −4
      4  −5
     −1   2 ].
In order to use A to compute the value of f((2, 3, 1), (0, −1)), we first write
(2, 3, 1) = −1(1, 0, 0) + 2(1, 1, 0) + 1(1, 1, 1)
and
(0, −1) = 2(1, −1) + (−1)(2, −1).
Then the equation f(u, v) = X^T AY yields
f((2, 3, 1), (0, −1)) = [−1, 2, 1] [ 3  −4    [ 2
                                     4  −5     −1 ]  = 12.
                                    −1   2 ]
Q^T = [ 1   0  0
        0   1  0
        3  −2  1 ]   and   P = [ −5  4
                                 −4  3 ],
so that
Q^T AP = [ 1  0
           0  1
           0  0 ].
Using Q as the matrix of transition from A to A' and P as the matrix of transition from B to B', we obtain a representation of f relative to A' and B' by a matrix of the form D_2 described in Corollary 8.36.
Exercises 8.7
1. Find the matrix of the given bilinear form f on U and V with respect to the given bases A and B.
(d) U = R^3, V = R^2, A = {(1, 1, 0), (1, −1, 1), (0, 1, 0)}, B = {(1, 2), (2, 1)},
f((x_1, x_2, x_3), (y_1, y_2)) = 2x_1y_1 + 4x_1y_2 − 6x_2y_1 + 3x_2y_2 + x_3y_1
(e) U = C^3, V = C^2, A = {(i, 0, 0), (1, i, 0), (0, 0, 2i)}, B = {(1 − i, i), (i, −i)},
f((x_1, x_2, x_3), (y_1, y_2)) = 5x_1y_1 + ix_1y_2 − ix_2y_1 + 2x_2y_2 + 2x_3y_1 − x_3y_2
(f) U = C^3, V = C^2, A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}, B = {(1, −1), (2, −1)},
f((x_1, x_2, x_3), (y_1, y_2)) = x_1y_2 + x_2y_1 + 2x_2y_2 − 2x_3y_1 + 2x_3y_2
3. Each of parts (a)-(f) below relates to the corresponding part of Problem 1 above. In each case, a new pair of bases A', B' is given for the vector spaces U, V. Find the matrix of f relative to A' and B' by use of Theorem 8.34.
4. Let f be the bilinear form on R^4 and R^3 that has the given matrix A relative to the standard bases E_4 and E_3. Find bases A' of R^4 and B' of R^3 such that, relative to A' and B', f has a matrix of the form D_r described in Corollary 8.36.
(a) A = [ 1  0   2 ]        (b) A = [  1   2   3 ]
        [ 4  1   3 ]                [ -2  -4  -6 ]
        [ 3  1   1 ]                [  1   0  -1 ]
        [ 2  1  -1 ]                [ -1   0   1 ]

(c) A = [ 0  0  0 ]         (d) A = [ 3  -2  -1 ]
        [ 2  1  3 ]                 [ 0   1   … ]
        [ 4  2  6 ]                 [ 1   0   1 ]
        …                           …
8. For bilinear forms f and g on U and V and a scalar a, define

(f + g)(u, v) = f(u, v) + g(u, v)
(af)(u, v) = a · f(u, v).

Prove that the set of all bilinear forms on U and V is a vector space with respect to these operations of addition and scalar multiplication.
9. (See Problem 8.) Let W be the vector space of all bilinear forms on R^m and R^n. Prove that the mapping m(f) = A that sends a bilinear form onto its matrix A relative to E_m and E_n is an isomorphism of the vector space W onto the vector space R^{m×n} as defined in Section 4.2.
Theorem 8.37 Let A and B be square matrices of order n over F. Then B is congruent to A over F if and only if B and A represent the same bilinear form on V.
Proof. Let A and B be matrices of order n over F, and suppose that A represents the bilinear form f on V relative to the basis A.
Assume first that B is congruent to A over F. Then B = P^T A P for some invertible P over F. Since P is invertible with elements in F, P is the transition matrix from A to a basis A' of V. With U = V, B = A, Q = P, and B' = A' in Theorem 8.34, we have that B = P^T A P is the matrix of f relative to A'.
On the other hand, suppose that B represents f relative to some basis A' of V. If P is the matrix of transition from A to A', then Theorem 8.34 asserts that P^T A P is the matrix of f relative to A'. But this matrix is unique, so it must be that B = P^T A P. Now P is invertible with elements in F since it is the transition matrix from one basis of V to another. Hence B is congruent to A over F. ■
8.8 Symmetric Bilinear Forms
The connection established between bilinear forms and matrices in Theorem 8.33 suggests the possibility of describing properties of bilinear forms in terms of their matrices. Several interesting results along these lines can be obtained whenever the matrices are square.
Definition 8.38 A bilinear form f on V is symmetric if f(u, v) = f(v, u) for all u, v in V.
Theorem 8.39 A bilinear form f on V is symmetric if and only if every matrix that represents f is symmetric.
Proof. Suppose that f is symmetric, and let A = [a_ij]_n be a matrix that represents f. Then a_ij = f(u_i, u_j) = f(u_j, u_i) = a_ji, and A is symmetric.
Suppose now that f is represented by a symmetric matrix A = [a_ij]_n relative to the basis A of V. Let u and v be arbitrary vectors with coordinate matrices X and Y, respectively, relative to A. Then f(u, v) = X^T A Y and f(v, u) = Y^T A X. Since Y^T A X is a 1 by 1 matrix, Y^T A X = (Y^T A X)^T. Thus

f(v, u) = (Y^T A X)^T = X^T A^T Y = X^T A Y = f(u, v),

where A^T = A, since A is symmetric. ■
Our next theorem has an immediate corollary concerning symmetric bilinear forms.
Theorem 8.40 If 1 + 1 ≠ 0 in F, every symmetric matrix A of order n over F is congruent over F to a diagonal matrix.
Proof. The proof is by induction on the order n of A. The theorem is trivially true for n = 1.
Assume that the theorem is true for all symmetric matrices of order k over F, and let A be a symmetric matrix of order k + 1 over F. Let A = {u1, u2, ..., u_{k+1}} be a basis of F^{k+1}, and let f be the symmetric bilinear form on F^{k+1} that has matrix A relative to A. If A is the zero matrix, there is nothing to prove. Assume, then, that a_rs = f(u_r, u_s) ≠ 0 for the pair r, s. If f(u_r, u_r) = 0 and f(u_s, u_s) = 0, then

f(u_r + u_s, u_r + u_s) = 2 f(u_r, u_s) ≠ 0.

Thus there is a vector v1 in F^{k+1} such that d1 = f(v1, v1) ≠ 0. The set {v1} can be extended to a basis B = {v1, v2, ..., v_{k+1}} of F^{k+1}. The set A' = {u'_1, u'_2, ..., u'_{k+1}} is obtained from the basis B as follows: u'_1 = v1 and

u'_j = v_j - (f(u'_1, v_j)/d1) u'_1   for j = 2, ..., k+1.

That is,

u'_1 = v1,
u'_2 = v2 - (f(u'_1, v2)/d1) u'_1,
⋮
u'_{k+1} = v_{k+1} - (f(u'_1, v_{k+1})/d1) u'_1.
For j = 2, ..., k+1, we have

f(u'_1, u'_j) = f(u'_1, v_j) - (f(u'_1, v_j)/d1) f(u'_1, u'_1) = f(u'_1, v_j) - f(u'_1, v_j) = 0.

Letting P1 be the matrix of transition from A to A', we have

A1 = P1^T A P1 = [ d1 | 0  ]
                 [ 0  | A2 ],

where A2 is a symmetric matrix of order k over F. By the induction hypothesis, there is an invertible matrix Q of order k over F such that Q^T A2 Q = diag{d2, ..., d_{k+1}}. Then

P2 = [ 1 | 0 ]
     [ 0 | Q ]

is an invertible matrix, and

(P1 P2)^T A (P1 P2) = P2^T P1^T A P1 P2
                    = P2^T A1 P2
                    = [ 1 | 0   ] [ d1 | 0  ] [ 1 | 0 ]
                      [ 0 | Q^T ] [ 0  | A2 ] [ 0 | Q ]
                    = [ d1 | 0        ]
                      [ 0  | Q^T A2 Q ]
                    = diag{d1, d2, ..., d_{k+1}}.

Thus P = P1P2 is an invertible matrix such that P^T A P is diagonal. The theorem follows by induction. ■
Much the same as with Theorem 8.19, the proof of Theorem 8.40 provides a basis for an approach to the problem of finding an invertible matrix P over F such that P^T A P is diagonal, but this approach is not very efficient. We can proceed more directly in the following manner to obtain such a P. The first column of P is simply the coordinate matrix

P1 = [ p_11 ]
     [ p_21 ]
     [  ⋮   ]
     [ p_n1 ]

of u'_1 relative to A. The vector u'_2 must be chosen so that {u'_1, u'_2} is linearly independent and f(u'_1, u'_2) = 0. This latter condition means that the coordinate matrix

P2 = [ p_12 ]
     [ p_22 ]
     [  ⋮   ]
     [ p_n2 ]

must satisfy P1^T A P2 = 0. That is, x1 = p_12, x2 = p_22, ..., xn = p_n2 must be a solution to the system (P1^T A)X = 0. Similarly, the requirements that f(u'_1, u'_3) = 0 and f(u'_2, u'_3) = 0 are reflected by the conditions (P1^T A)P3 = 0 and (P2^T A)P3 = 0 on the coordinate matrix P3 of u'_3 relative to A, and so on. The method is illustrated in the following example.
Example 1 □ The problem is to determine a real invertible matrix P such that P^T A P is diagonal, where

A = [ 0  1  2 ]
    [ 1  0  0 ]
    [ 2  0  0 ].

Let f be the symmetric bilinear form on R^3 that has matrix A relative to E3. Since f(e_i, e_i) = 0 for i = 1, 2, 3 and f(e1, e2) ≠ 0, we choose u'_1 = e1 + e2 so that f(u'_1, u'_1) ≠ 0. Then P1 = [1, 1, 0]^T and

P1^T A = [ 1  1  2 ].

The choice P2 = [1, 1, -1]^T satisfies (P1^T A)P2 = 0 and makes {u'_1, u'_2} linearly independent. The matrix P2^T A is given by

P2^T A = [ -1  1  2 ].

The conditions (P1^T A)P3 = 0 and (P2^T A)P3 = 0 are then satisfied by the choice P3 = [0, -2, 1]^T, so that

P = [ 1   1   0 ]
    [ 1   1  -2 ]
    [ 0  -1   1 ]

and P^T A P = diag{2, -2, 0}. ■
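The conditions used in Example 1 can be checked numerically. This NumPy sketch (our addition) verifies that each later column annihilates the earlier ones under A and that P^T A P is diagonal:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 0, 0]])

P1 = np.array([1, 1, 0])    # u1' = e1 + e2, so f(u1', u1') = 2 != 0
P2 = np.array([1, 1, -1])   # solves (P1^T A) X = 0
P3 = np.array([0, -2, 1])   # solves (P1^T A) X = 0 and (P2^T A) X = 0

# pairwise conditions f(ui', uj') = 0 for i != j
assert P1 @ A @ P2 == 0 and P1 @ A @ P3 == 0 and P2 @ A @ P3 == 0

P = np.column_stack([P1, P2, P3])
print(P.T @ A @ P)          # diag(2, -2, 0)
```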
The following definition and theorem apply to infinite-dimensional vector spaces as
well as those of finite dimension, although our main interest is in the latter case.
Definition 8.43 Let V be a vector space over F. A mapping q of V into F is a quadratic form on V if and only if there is a symmetric bilinear form f on V such that q(u) = f(u, u) for all u in V.
It should be observed that the definition of a real quadratic form that was given
earlier is entirely consistent with this definition.
It is clear that each symmetric bilinear form f determines a unique associated quadratic form q by the rule that q(u) = f(u, u). The following theorem shows that this correspondence between symmetric bilinear forms and quadratic forms is one-to-one if 1 + 1 ≠ 0 in F.
Theorem 8.44 Let V be a vector space over the field F in which 1 + 1 ≠ 0. If the quadratic form q on V is determined by the symmetric bilinear form f on V, then

f(u, v) = (1/2)[q(u + v) - q(u) - q(v)]

for all u, v in V.
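The identity of Theorem 8.44 is easy to check numerically. In this sketch (our addition) the symmetric matrix is randomly generated, an assumption made only for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(4, 4))
A = M + M.T                       # a symmetric matrix, so f below is symmetric

f = lambda u, v: u @ A @ v        # symmetric bilinear form
q = lambda u: f(u, u)             # the quadratic form determined by f

u = rng.integers(-5, 6, size=4)
v = rng.integers(-5, 6, size=4)

# f(u, v) = (1/2)[q(u + v) - q(u) - q(v)]
assert f(u, v) == (q(u + v) - q(u) - q(v)) / 2
```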
Definition 8.45 Let V be a finite-dimensional vector space over the field F in which 1 + 1 ≠ 0. The matrix of the quadratic form q relative to the basis A of V is the same as the matrix of the symmetric bilinear form f that determines q.
We note that the matrix of a quadratic form is required to be symmetric.
For the remainder of this chapter, we restrict our attention to those vector spaces that have a field of scalars F in which 1 + 1 ≠ 0.
The intimate connection between quadratic forms and symmetric bilinear forms means that many of the results obtained for symmetric bilinear forms translate immediately into statements about quadratic forms. The most important of these are listed next.
1. If q has matrix A relative to the basis A of V and u has coordinate matrix X relative to A, then q(u) = X^T A X (Theorem 8.33).
2. If q has matrix A relative to the basis A of V and P is the matrix of transition from A to A', then P^T A P is the matrix of q relative to A' (Theorem 8.34).
3. Every quadratic form on V can be represented by a diagonal matrix (Corollary 8.41).
4. Every quadratic form on a vector space over C can be represented by a matrix D_r with the first r diagonal elements equal to 1 and all other elements zero (Problem 6).
Exercises 8.8
1. For each matrix A, find an invertible matrix P such that P^T A P is diagonal.

(a) A = [  0  2  -1 ]   (b) A = [  2  -1  1 ]   (c) A = [ 0   2   3 ]
        [  2  1   1 ]           [ -1   3  0 ]           [ 2   0  -2 ]
        [ -1  1  -2 ]           [  1   0  0 ]           [ 3  -2   0 ]

(d) A = [ 1   2   0 ]   (e) A = [  0  -1   2 ]   (f) A = [ …   2   3 ]
        [ 2  -2  -3 ]           [ -1   0  -3 ]           [ …   1  -1 ]
        [ 0  -3   4 ]           [  2  -3   0 ]           [ …  -1   1 ]
2. For each matrix A in Problem 1, let f be the bilinear form on R^3 that has matrix A relative to the basis A = {(1,1,1), (1,0,1), (0,1,-1)}. Use the matrix P from Problem 1 to find a basis of R^3 relative to which f is represented by a diagonal matrix.
3. Prove Corollary 8.41.
4. Prove Corollary 8.42.
5. Translate Corollary 8.42 into a statement concerning symmetric bilinear forms on
vector spaces over C.
6. Use Corollary 8.42 to prove (4) above in the list of properties of quadratic forms
over C.
7. Extend the definition of a quadratic form by letting q(u) = f(u, u), where f is an arbitrary (not necessarily symmetric) bilinear form on V, and call a bilinear form f skew-symmetric if

f(u, v) = -f(v, u)

for all u, v in V.
(a) Prove that f is skew-symmetric if and only if f determines the zero quadratic form on V.
(b) Prove that the two bilinear forms f1 and f2 on V define the same quadratic form on V if and only if f1 - f2 is skew-symmetric.
(c) Does this "extension" actually enlarge the set of quadratic forms on V? Why, or why not?
8.9 Hermitian Forms
Definition 8.46 Let U and V be vector spaces over the field F. A complex bilinear form on U and V is a mapping f of pairs of vectors (u, v) in U × V onto scalars f(u, v) in F that has the properties

(i) f(a1 u1 + a2 u2, v) = ā1 f(u1, v) + ā2 f(u2, v)

and

(ii) f(u, b1 v1 + b2 v2) = b1 f(u, v1) + b2 f(u, v2),

where ā_i denotes the complex conjugate of a_i.
We describe the conditions (i) and (ii) by saying that f is conjugate linear in the first variable and linear in the second variable. It is readily seen that a function f from U × V into F is a complex bilinear form if and only if

f( Σ_{i=1}^r a_i u_i , Σ_{j=1}^s b_j v_j ) = Σ_{i=1}^r Σ_{j=1}^s ā_i b_j f(u_i, v_j)

for all positive integers r, s.
Whenever the field of scalars is real, a complex bilinear form reduces to a bilinear form. In this sense, a complex bilinear form is a generalization of a bilinear form. We must keep in mind, however, that the two concepts are quite distinct whenever F is the field C.
Throughout the remainder of this section, U and V will denote vector spaces over F with dimensions m and n, respectively, and f will denote a complex bilinear form on U and V.
Definition 8.47 Let A = {u1, u2, ..., um} and B = {v1, v2, ..., vn} be bases of U and V, respectively. The matrix of a complex bilinear form f relative to A and B is the matrix A = [a_ij]_{m×n}, where a_ij = f(u_i, v_j) for i = 1, 2, ..., m; j = 1, 2, ..., n.
At this point, a parallel can be perceived between the properties of bilinear forms
and those of complex bilinear forms. Actually, the parallel is so strong that it is quite
repetitious to develop the properties of complex bilinear forms in as much detail as
was done with bilinear forms. At the same time, the adjustments that are necessitated
by complex linearity in the first variable are not altogether obvious. Consequently, we
shall more or less outline the development here with statements of the major results
as theorems, and leave the proofs of most of these theorems as exercises. In all cases,
a proof can be obtained by a suitable modification of the proof of the corresponding
result for bilinear forms.
We recall that the conjugate transpose (Ā)^T of a matrix A is denoted by A*.
Theorem 8.48 Let X = [x1, x2, ..., xm]^T denote the coordinate matrix of u in U relative to the basis A of U, and let Y = [y1, y2, ..., yn]^T be the coordinate matrix of v in V relative to the basis B of V. If A = [a_ij]_{m×n}, then A is the matrix of the complex bilinear form f on U and V if and only if the equation

f(u, v) = X*AY

is satisfied for all choices of u in U, v in V.
For example, the rule

f(u, v) = f((u1, u2), (v1, v2, v3)) = [ ū1  ū2 ] [ 0   3i   -1-3i ] [ v1 ]
                                                [ i   4-i  -4+3i ] [ v2 ]
                                                                   [ v3 ]

shows that f is the complex bilinear form on C^2 and C^3 that has the matrix

A = [ 0   3i   -1-3i ]
    [ i   4-i  -4+3i ].
It follows from Definition 8.47 and Theorem 8.48 that two complex bilinear forms f and g on U and V are equal if and only if they have the same matrix relative to bases A of U and B of V.
Theorem 8.49 Let A be the matrix of f relative to the bases A of U and B of V. If Q is the matrix of transition from A to the basis A' of U and P is the matrix of transition from B to the basis B' of V, then Q*AP is the matrix of f relative to A' and B'.
Suppose now that the matrix of transition from A to A' is

Q = [ 1  i ]
    [ i  0 ]

and the matrix of transition from B to B' is

P = [ 1  1  1 ]
    [ 1  1  0 ]
    [ 1  0  0 ].

Then

Q*AP = [  1  -i ] [ 0   3i   -1-3i ] [ 1  1  1 ]   [ 2  -i  1 ]
       [ -i   0 ] [ i   4-i  -4+3i ] [ 1  1  0 ] = [ i   3  0 ]
                                     [ 1  0  0 ]

is the matrix of f relative to A' and B'.
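The change-of-bases computation Q*AP can be confirmed with NumPy (our addition); `.conj().T` forms the conjugate transpose Q*:

```python
import numpy as np

A = np.array([[0, 3j, -1 - 3j],
              [1j, 4 - 1j, -4 + 3j]])
Q = np.array([[1, 1j],
              [1j, 0]])
P = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])

B = Q.conj().T @ A @ P     # matrix of f relative to A' and B' (Theorem 8.49)
print(B)                   # [[2, -i, 1], [i, 3, 0]]
```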
Our main interest in complex bilinear forms is with the case where V = U. As with bilinear forms and linear transformations, we assume that the bases A and B of V = U are the same unless it is stated otherwise. For the remainder of the section, we shall have V = U, with V an n-dimensional vector space over F.
The equivalence relation that we need to replace congruence of matrices over F is given in our next definition.
Definition 8.50 Let A and B be matrices over the field F. Then B is conjunctive (or hermitian congruent) to A over F if there is an invertible matrix P over F such that B = P*AP.
Theorem 8.49 and Definition 8.50 lead to the following result.
Theorem 8.51 Let A and B be matrices of order n over F. Then B is conjunctive to A over F if and only if B and A represent the same complex bilinear form on V.
The term "hermitian" applies to complex bilinear forms in about the same way as the term "symmetric" applies to bilinear forms. As a matter of fact, when F is the field of real numbers, the two terms are coincident, just as "bilinear" and "complex bilinear" are. In this sense, a hermitian complex bilinear form is a generalization of a symmetric bilinear form.
Definition 8.52 A complex bilinear form f on V is hermitian if and only if

f(u, v) = \overline{f(v, u)}

for all u, v in V.
Definition 8.53 A matrix H over the field F is hermitian if and only if H* = H.
Thus a real matrix A is hermitian if and only if it is symmetric. The relation
between hermitian complex bilinear forms and hermitian matrices is exactly what one
would expect.
Theorem 8.54 A complex bilinear form f on V is hermitian if and only if every matrix that represents f is hermitian.
Our next theorem corresponds to the result in Theorem 8.40 for symmetric matrices.
We include a proof for this theorem, since it is quite important and its proof is a bit
more difficult than the others in this section.
Theorem 8.55 Every hermitian matrix H of order n over C is conjunctive over C to
a diagonal matrix.
Proof. The proof is by induction on the order n of H. The theorem is trivially valid for n = 1.
Assume that the theorem is true for all hermitian matrices of order k over C, and let H be a hermitian matrix of order k + 1 over C. Let A = {u1, u2, ..., u_{k+1}} be a basis of C^{k+1}, and let h be the hermitian complex bilinear form on C^{k+1} that has matrix H relative to A. If H = 0, H is a diagonal matrix already. Assume now that H ≠ 0 and h(u_r, u_s) = a + bi ≠ 0 for the pair r, s. If h(u_r, u_r) = 0 and h(u_s, u_s) = 0, then

h(u_r + u_s, u_r + u_s) = 2a   and   h(u_r + iu_s, u_r + iu_s) = -2b,

and at least one of these is nonzero since a + bi ≠ 0. Thus, there is a vector v1 in C^{k+1} such that d1 = h(v1, v1) ≠ 0. The set {v1} can be extended to a basis B = {v1, v2, ..., v_{k+1}} of C^{k+1}. The set A' = {u'_1, u'_2, ..., u'_{k+1}} is obtained from B as follows:

u'_1 = v1
and

u'_j = v_j - (h(u'_1, v_j)/d1) u'_1   for j = 2, ..., k+1.

Letting P1 be the matrix of transition from A to A', we have

H1 = P1* H P1 = [ d1 | 0  ]
                [ 0  | H2 ],

where H2 is a hermitian matrix of order k. By the induction hypothesis, there is an invertible matrix Q of order k such that Q*H2Q = diag{d2, ..., d_{k+1}}. Then

P2 = [ 1 | 0 ]
     [ 0 | Q ]

is an invertible matrix, and

(P1 P2)* H (P1 P2) = P2* P1* H P1 P2
                   = P2* H1 P2
                   = [ d1 | 0       ]
                     [ 0  | Q* H2 Q ]
                   = diag{d1, d2, ..., d_{k+1}}.

Thus P = P1P2 is an invertible matrix such that P*HP is diagonal, and the theorem follows by induction. ■
It follows from Theorems 8.55 and 8.51 that any hermitian form h on V can be represented by a diagonal matrix. Now the condition H* = H requires that the diagonal elements of a hermitian matrix be real. Hence the diagonal elements in a diagonalized representation of h are always real.
The proof of Theorem 8.23 can be modified so as to obtain a proof of the following theorem. With q(v) replaced by h(v, v) and R^n replaced by C^n, the only other changes necessary are in the expressions for h(v, v). For example, the two diagonalized representations would appear as h(v, v) = Σ_{k=1}^n d_k |y_k|² and h(v, v) = Σ_{k=1}^n c_k |z_k|².
Theorem 8.57 In any two diagonalized representations of a hermitian form h, the number of positive terms is the same and the number of negative terms is the same.
The definitions of index, signature, positive definite, etc., apply to hermitian forms
as they are given in Definitions 8.25 and 8.29. The positive definite hermitian forms
are those that are fundamental to Chapter 9. The index of a hermitian matrix is by
definition the same as that of a hermitian form that it represents.
Corollary 8.58 Every hermitian matrix H of order n over C is conjunctive over C to a matrix of the form

C = [ I_p |    0      | 0 ]
    [  0  | -I_{r-p}  | 0 ]
    [  0  |    0      | 0 ],

where r is the rank of H and p is its index. Two n × n hermitian matrices are conjunctive over C if and only if they have the same rank and index.
Corollary 8.59 With a suitable choice of basis in V, any hermitian form on V can be
represented by a matrix of the form C in Corollary 8.58.
In particular, a hermitian form h on V is positive definite if and only if

h(v, v) > 0

for all v ≠ 0.
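Because conjunctive diagonalization preserves the signs of the diagonal entries (Theorem 8.57), the rank r and index p of a hermitian matrix can also be read from the signs of its eigenvalues. The helper below is a numerical sketch of this idea (our addition, not the text's method):

```python
import numpy as np

def rank_and_index(H, tol=1e-9):
    """Rank r and index p (number of positive entries in any diagonalized
    representation) of the hermitian matrix H, read from its eigenvalues."""
    w = np.linalg.eigvalsh(H)        # eigenvalues of a hermitian H are real
    p = int(np.sum(w > tol))
    r = p + int(np.sum(w < -tol))
    return r, p

H = np.array([[5, 4j, -4],
              [-4j, 3, 5j],
              [-4, -5j, 3]])
print(rank_and_index(H))             # (3, 2): H is conjunctive to diag(1, 1, -1)
```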
Just after Corollary 8.42 in Section 8.8, we described a method for obtaining an invertible matrix P that would reduce a given symmetric matrix A to a diagonal matrix P^T A P, and we illustrated the method in Example 1 of that section. In order to obtain a method for finding an invertible matrix P that will reduce a hermitian matrix H to a diagonal matrix P*HP, the only changes that are necessary in that description are that each P_j^T A be replaced by P_j^* H. The method for obtaining P such that P*HP is of the form C in Corollary 8.58 is entirely analogous to the techniques of Section 8.8. Our next example gives a demonstration of this method.
For the hermitian matrix

H = [  5    4i  -4 ]
    [ -4i    3  5i ]
    [ -4   -5i   3 ],

we shall find an invertible matrix P such that

P*HP = [ I_p |    0      | 0 ]
       [  0  | -I_{r-p}  | 0 ]
       [  0  |    0      | 0 ].

We first find an invertible matrix P1 such that P1*HP1 is diagonal. Since h_11 = 5 is not zero in H = [h_ij], we can use

C1 = [ 1 ]
     [ 0 ]
     [ 0 ]

as the first column of P1. The second column C2 of P1 needs to make {C1, C2} linearly independent and satisfy (C1*H)C2 = 0, which appears as

[ 5  4i  -4 ] C2 = 0.

The choice

C2 = [ 4 ]
     [ 0 ]
     [ 5 ]

satisfies both conditions. The third column C3 of P1 needs to make {C1, C2, C3} linearly independent and satisfy (C1*H)C3 = 0 and (C2*H)C3 = 0. That is, we need a column C3 that is not a linear combination of C1 and C2 and that satisfies both equations

[ 5   4i  -4 ] C3 = 0,
[ 0  -9i  -1 ] C3 = 0.

The choice

C3 = [ -8i ]
     [  1  ]
     [ -9i ]

satisfies these requirements.
Thus

P1 = [ 1  4  -8i ]
     [ 0  0   1  ]
     [ 0  5  -9i ]

is a matrix such that P1*HP1 is diagonal. Performing the multiplication, we find that

P1*HP1 = [ 5   0   0 ]
         [ 0  -5   0 ]
         [ 0   0  16 ].

Interchanging the second and third columns in P1 yields

P2 = [ 1  -8i  4 ]
     [ 0   1   0 ]
     [ 0  -9i  5 ],

so that P2*HP2 = diag{5, 16, -5}.
With P3 = diag{1/√5, 1/4, 1/√5} and P = P2P3, we obtain

P*HP = (P2P3)*H(P2P3)
     = P3*(P2*HP2)P3
     = [ 1/√5   0     0   ] [ 5   0   0 ] [ 1/√5   0     0   ]
       [  0    1/4    0   ] [ 0  16   0 ] [  0    1/4    0   ]
       [  0     0   1/√5  ] [ 0   0  -5 ] [  0     0   1/√5  ]
     = [ 1  0   0 ]
       [ 0  1   0 ]
       [ 0  0  -1 ]. ■
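The whole reduction chain of this example can be verified with NumPy (our addition):

```python
import numpy as np

H = np.array([[5, 4j, -4],
              [-4j, 3, 5j],
              [-4, -5j, 3]])

P1 = np.array([[1, 4, -8j],
               [0, 0, 1],
               [0, 5, -9j]])
assert np.allclose(P1.conj().T @ H @ P1, np.diag([5, -5, 16]))

P2 = P1[:, [0, 2, 1]]                       # interchange second and third columns
P3 = np.diag([1 / np.sqrt(5), 1 / 4, 1 / np.sqrt(5)])
P = P2 @ P3
print(np.round((P.conj().T @ H @ P).real))  # diag(1, 1, -1)
```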
Exercises 8.9
(h) [  3i  0  4i ]
    [  0  2i  -2 ]
    [ -4i  2   i ]
3. For each hermitian matrix H in Problem 1, find an invertible matrix P such that
P*HP is of the form C given in Corollary 8.58.
4. In each of parts (a)-(h) of Problem 1, let f be the complex bilinear form on C^n that has the given matrix relative to E_n.
15. Prove that a hermitian matrix H is positive definite if and only if there exists an
invertible matrix P such that H = P*P.
16. Given that the matrix H in Problem 1(e) is a positive definite hermitian matrix, find an invertible matrix P such that P*P = H.
18. A matrix A is called skew-hermitian if A* = -A. Prove that any square matrix A can be written uniquely as A = B + C with B hermitian and C skew-hermitian.
Chapter 9 Inner Product Spaces
9.1 Introduction
The vector spaces R n have been an intuitive guide in our development thus far, and
we have extended most of the concepts introduced there to more general settings. The
outstanding exceptions are the basic concepts of length and inner product in R n . In this
chapter, we generalize these concepts and study those properties of vector spaces that
are based on an inner product. It is possible to define an inner product for vector spaces
over fields other than the field R of real numbers or the field C of complex numbers, but
our interest is restricted to these cases. Accordingly, throughout this chapter, we shall always have either F = R or F = C, and V shall denote a finite-dimensional vector space over F.
9.2 Inner Products

Definition 9.1 An inner product on V is a mapping f of V × V into F that has the properties
(i) f(a1 u1 + a2 u2, v) = ā1 f(u1, v) + ā2 f(u2, v),
(ii) f(u, v) = \overline{f(v, u)},
(iii) f(v, v) > 0 if v ≠ 0,
for all u, u1, u2, v in V and all a1, a2 in F.
We note that property (ii) requires that / ( v , v) be real and property (iii) requires
that this real number be positive except when v = 0.
If F = R in Definition 9.1, V is called a real inner product space, or a Euclidean space. If F = C, V is called a complex inner product space, or a unitary space.
The term inner product space is used to refer collectively to Euclidean spaces and
unitary spaces.
The properties listed in Definition 9.1 invite a comparison between inner products and hermitian forms. This comparison yields the following theorem, which is important even though the proof is trivial.
Theorem 9.2 A mapping f of V × V into F is an inner product on V if and only if f is a positive definite hermitian form on V.
Proof. Suppose first that f is a positive definite hermitian form. Then f is a complex bilinear form, so it follows from Definition 8.46 that f has property (i) of Definition 9.1. By Definition 8.52, the mapping f has property (ii) of Definition 9.1. Finally, f has property (iii) by Corollary 8.60. Hence f is an inner product on V.
Assume, on the other hand, that f is an inner product on V. The references cited in the preceding paragraph show that f satisfies all of the requirements of a positive definite hermitian form except possibly linearity in the second variable. But

f(u, b1 v1 + b2 v2) = \overline{f(b1 v1 + b2 v2, u)}
                    = \overline{ b̄1 f(v1, u) + b̄2 f(v2, u) }
                    = b1 \overline{f(v1, u)} + b2 \overline{f(v2, u)}
                    = b1 f(u, v1) + b2 f(u, v2),

and the proof is complete. ■
The next example presents the most commonly used inner products in R n and Cn.
Example 1 □ For any two vectors u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in R^n, let the value f(u, v) be given by

f(u, v) = u1v1 + u2v2 + ··· + unvn = Σ_{k=1}^n u_k v_k.

Similarly, let

f(u, v) = ū1v1 + ū2v2 + ··· + ūnvn = Σ_{k=1}^n ū_k v_k,

where u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in C^n. The verification that this mapping is an inner product is left as an exercise (Problem 3). ■
The inner products in Example 1 will be referred to hereafter as the standard inner
products on R n and C n , respectively. Whenever an inner product on R n or Cn is not
specified, it is understood to be the standard inner product.
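In NumPy (our addition), the standard inner products correspond to `dot` on R^n and `vdot` on C^n; `vdot` conjugates its first argument, matching the convention used here:

```python
import numpy as np

# Standard inner product on R^n: f(u, v) = sum of u_k v_k
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])
print(np.dot(x, y))        # 8.0

# Standard inner product on C^n: f(u, v) = sum of conj(u_k) v_k
u = np.array([1 + 1j, -3j, -2])
v = np.array([2j, 6, -1j])
print(np.vdot(u, v))       # (2+22j)
```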
Consider now the mapping f defined on R^2 by

f((u1, u2), (v1, v2)) = [ u1  u2 ] [ 2  1 ] [ v1 ]
                                   [ 1  1 ] [ v2 ],

and since we are dealing with a real vector space, f is a hermitian form. Thus f is an inner product if and only if f is positive definite; that is, if and only if the third property in Definition 9.1 is satisfied. Since

f(v, v) = 2v1² + 2v1v2 + v2² = v1² + (v1 + v2)²,

we see that f(v, v) > 0 whenever v ≠ 0, and f is an inner product on R^2.
The connection established in Theorem 9.2 makes available the results of Section 8.9 for use with inner products. In particular, an inner product f on V has a unique matrix A = [a_ij]_{n×n} relative to each basis A = {v1, v2, ..., vn} of V. This matrix A is determined by the conditions a_ij = f(v_i, v_j), and f(u, v) = X*AY where u and v have coordinate matrices X and Y, respectively, relative to A. If P is the matrix of transition from A to A', then f has matrix P*AP relative to A', by Theorem 8.49. Since f is positive definite, Corollary 8.59 implies that there is a basis A' of V such that the inner product f has matrix I_n relative to A'. Relative to this basis A', f(u, v) is given by

f(u, v) = X*Y = Σ_{k=1}^n x̄_k y_k.

Thus the standard inner products in Example 1 furnish typical examples provided the choice of basis is appropriate.
The results obtained in this chapter are valid for all finite-dimensional inner product spaces, in that they are not dependent on a particular choice of f. For this reason, it is customary to replace the notation f(u, v) by a more convenient one. We choose to drop the f from the notation, and simply write (u, v) instead of f(u, v). This change of notation has an additional advantage in that it reminds us that we are dealing with an inner product, and not just a complex bilinear form.
According to the result in Problem 15 of Exercises 8.9, a hermitian matrix H is positive definite if and only if H = P*P for some invertible matrix P. This result provides an easy way to construct inner products on C^n. For u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in C^n, let

U = [ u1 ]            V = [ v1 ]
    [ u2 ]    and         [ v2 ]
    [ ⋮  ]                [ ⋮  ]
    [ un ]                [ vn ].

We can choose an invertible P of order n, put H = P*P, and then the rule

(u, v) = U*HV

defines an inner product on C^n. We say that the invertible matrix P generates this inner product.
Example 3 □ With

P = [  1   0  0 ]
    [ -i   1  i ]
    [  0   i  1 ],

the rule

(u, v) = [ ū1  ū2  ū3 ] P*P [ v1 ]   = [ ū1  ū2  ū3 ] [  2   i  -1 ] [ v1 ]
                            [ v2 ]                    [ -i   2   0 ] [ v2 ]
                            [ v3 ]                    [ -1   0   2 ] [ v3 ]

defines the inner product on C^3 generated by P. ■
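Example 3 can be replayed numerically (NumPy, our addition); the assertion confirms H = P*P, and the sample value illustrates positive definiteness on one vector:

```python
import numpy as np

P = np.array([[1, 0, 0],
              [-1j, 1, 1j],
              [0, 1j, 1]])
H = P.conj().T @ P                       # H = P*P, positive definite hermitian
assert np.allclose(H, [[2, 1j, -1], [-1j, 2, 0], [-1, 0, 2]])

inner = lambda u, v: np.conj(u) @ H @ v  # (u, v) = U* H V

v = np.array([2j, 6, -1j])
print(inner(v, v).real)                  # 110.0, so (v, v) > 0
```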
Exercises 9.2
1. Using the standard inner product in C3, compute (u,v) and (v, u) for the given
vectors.
4. Let u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) in R^n. Prove or disprove that the given rule defines an inner product on R^n.
5. Given that

H = [ 2  -1  … ]
    …

is a positive definite hermitian matrix, let (u, v) be the inner product on C^3 that has matrix H relative to the basis

A = {(1,0,0), (1,1,0), (1,1,1)}.

Write out the value of (u, v) for arbitrary vectors u = (u1, u2, u3) and v = (v1, v2, v3) in C^3.
6. Find a basis A' of C^3 for which the inner product in Problem 5 has the form (u, v) = X*Y, where u and v have coordinate matrices X and Y, respectively, relative to A'.
8. Let A = {v1, v2, ..., vn} be a basis of the inner product space V, and let u and v be arbitrary vectors with coordinate matrices X and Y, respectively, relative to A. Prove that (u, v) = X*Y for all u, v in V if and only if (v_i, v_j) = δ_ij for all pairs i, j.
For v = (2i, 6, -i), we have

(v, v) = [ -2i  6  i ] [  2   i  -1 ] [ 2i ]
                       [ -i   2   0 ] [ 6  ]  = 110
                       [ -1   0   2 ] [ -i ]

and ||v|| = √110.
9.3 Norms and Distances
The norm ||v|| of a vector v in V is defined by ||v|| = √(v, v).
Theorem 9.4 (Cauchy-Schwarz Inequality) For all u, v in V,

|(u, v)| ≤ ||u|| · ||v||.

Proof. If v = 0, the equality holds with both members zero. Consider the case where v ≠ 0. Since the inner product is a positive definite hermitian form, we have

(u - zv, u - zv) ≥ 0

for any complex number z. Since v ≠ 0, we may let z = (v, u)/(v, v). This yields

\overline{(u, v)} (u, v) ≤ (u, u)(v, v),

or

|(u, v)|² ≤ ||u||² ||v||².

The statement of the theorem follows from taking the positive square root of both members of the last inequality. ■
Example 2 □ We shall verify the Cauchy-Schwarz inequality using the inner product
from Example 3 of the last section with
u = (1 + i, - 3 i , - 2 ) , v = (2i, 6, -i).
We have ||v|| = √110 from our last example. Performing the other required computations, we get

(u, v) = [ 1-i  3i  -2 ] [  2   i  -1 ] [ 2i ]
                         [ -i   2   0 ] [ 6  ]  = 11 + 61i,
                         [ -1   0   2 ] [ -i ]

(u, u) = [ 1-i  3i  -2 ] [  2   i  -1 ] [ 1+i ]
                         [ -i   2   0 ] [ -3i ]  = 40,
                         [ -1   0   2 ] [ -2  ]

and ||u|| = √40. The inequality

√3842 ≤ √40 · √110 = √4400

is a valid one, verifying the Cauchy-Schwarz inequality in this case. ■
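The same verification in NumPy (our addition):

```python
import numpy as np

H = np.array([[2, 1j, -1],
              [-1j, 2, 0],
              [-1, 0, 2]])
inner = lambda u, v: np.conj(u) @ H @ v

u = np.array([1 + 1j, -3j, -2])
v = np.array([2j, 6, -1j])

lhs = abs(inner(u, v))                                        # sqrt(3842)
rhs = np.sqrt(inner(u, u).real) * np.sqrt(inner(v, v).real)   # sqrt(40)*sqrt(110)
assert np.isclose(inner(u, v), 11 + 61j)
assert lhs <= rhs                                             # Cauchy-Schwarz
```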
We now derive the fundamental properties of the norm.
Theorem 9.5 The norm of a vector has the following properties:
(i) ||v|| > 0 if v ≠ 0, and ||0|| = 0;
(ii) ||av|| = |a| · ||v||;
(iii) ||u + v|| ≤ ||u|| + ||v||.
Proof. The first of these follows immediately from ||v|| = √(v, v), since the inner product (v, v) is a positive definite hermitian form.
For any a in F and any v in V,

||av||² = (av, av) = ā a (v, v) = |a|² ||v||²,

so that ||av|| = |a| · ||v||.
For property (iii),

||u + v||² = (u + v, u + v)
           = (u, u) + (u, v) + (v, u) + (v, v)
           ≤ ||u||² + 2|(u, v)| + ||v||²
           ≤ ||u||² + 2 ||u|| ||v|| + ||v||² = (||u|| + ||v||)²,

by the Cauchy-Schwarz inequality, and the triangle inequality follows. ■
There is one more basic concept to be introduced in an inner product space, that of
a distance function.
Definition 9.6 For a given inner product (u, v) on V, the distance d(u, v) between
vectors u and v in V is defined by d(u, v) = ||u — v||.
The distance thus defined has the properties listed in the next theorem. The proof
is left as an exercise.
Theorem 9.7 The distance function d has the following properties:
(i) d(u, v ) > 0 i / u / v , and d(v, v) = 0;
(ii) d(u, v) = d(v,u);
(iii) d(u, v) + d(v, w) > d(u, w).
A set in which there is defined a distance function with the properties (i), (ii), (iii) of Theorem 9.7 is called a metric space, and the distance function is called a metric for the space. The only metrics that we are interested in are those connected with an inner product, but there are more general metrics.
Exercises 9.3
1. Compute the norm of the given v in C^3.
(a) v = ( 3 , - i , l ) (b) v = (0,2i,l)
(c) v = (2i,i,0) (d) v = ( 5 + i,5,0)
2. Using the inner product from Problem 5 of Exercises 9.2, compute the norm of
each v in Problem 1 above.
3. Let u = (u1, u2) and v = (v1, v2). The rule

(u, v) = 2u1v1 + u1v2 + u2v1 + 2u2v2

defines an inner product on R^2.
(a) Use this inner product and verify the Cauchy-Schwarz inequality for u =
(1,2) and v = ( 3 , - 5 ) .
(b) Use this inner product and compute ||w|| for w = (—1,3).
(c) Using this inner product and the vectors from part (a), compute d(u,v).
4. Use the inner product on R^2 generated by

[  1  1 ]
[ -1  2 ]

and verify the Cauchy-Schwarz inequality for u = (3, -1) and v = (2, 4).
5. Using the standard inner product on C2, write out the value of d(u, v) for arbitrary
vectors u, v G C2.
6. Find the value d(u, v) for the distance function determined by the inner product
in Problem 5 of Exercises 9.2.
7. Prove that the equality sign holds in Theorem 9.4 if and only if {u, v} is linearly dependent. (Hint: Consider ((u, u)v - (u, v)u, (u, u)v - (u, v)u) in case u ≠ 0.)
8. Prove Theorem 9.7.
9. Use the Cauchy-Schwarz inequality to prove the following statements.
(a) For any real numbers a1, a2, ..., an and b1, b2, ..., bn,

( Σ_{k=1}^n a_k b_k )² ≤ ( Σ_{k=1}^n a_k² ) ( Σ_{k=1}^n b_k² ).

(b) For any complex numbers a1, a2, ..., an and b1, b2, ..., bn,

| Σ_{k=1}^n ā_k b_k |² ≤ ( Σ_{k=1}^n |a_k|² ) ( Σ_{k=1}^n |b_k|² ).
10. Let V be a Euclidean space.
(a) Prove that

-1 ≤ (u, v) / (||u|| · ||v||) ≤ 1

for any nonzero u, v in V.
(b) We define the angle θ between two nonzero vectors u and v in V by

cos θ = (u, v) / (||u|| · ||v||)

with 0 ≤ θ ≤ π. Prove the law of cosines in V:

||u - v||² = ||u||² + ||v||² - 2 ||u|| · ||v|| cos θ.
11. For u = (1, -4) and v = (2, -3), use the inner product from Problem 3 to compute cos θ, where θ is the angle between u and v.
12. Let U = [ u1  u2 ]  and  V = [ v1  v2 ]. Given that the rule
            [ u3  u4 ]           [ v3  v4 ]

(U, V) = u1v1 + u2v2 + u3v3 + u4v4

defines an inner product on R^{2×2}, use this inner product to compute cos θ, where θ is the angle between

A = [ 1  2 ]   and   B = [ 0  3 ].
    [ 2  0 ]             [ 6  2 ]
13. Assume that the trace function (U, V) = t(U^T V) is an inner product on R^{2×2}, and find the norm of

[ 1  2 ]
[ 0  2 ].
14. Prove the following polarization identities, which recover an inner product from its norm: in a Euclidean space,

(u, v) = (1/4) ||u + v||² - (1/4) ||u - v||²,

and in a unitary space,

(u, v) = (1/4) ||u + v||² - (1/4) ||u - v||² + (i/4) ||u - iv||² - (i/4) ||u + iv||².
Theorem 9.9 Let P be an r × r matrix over R, and let V be a real inner product space. Then P is orthogonal if and only if P is the transition matrix from one orthonormal set of r vectors in V to another orthonormal set of r vectors in V.
relative to the standard basis E3 of C^3, the inner product of u = (u1, u2, u3) and v = (v1, v2, v3) in C^3 is given by

(u, v) = [ ū1  ū2  ū3 ] H [ v1 ]
                          [ v2 ]
                          [ v3 ].
9.4 Orthonormal Bases
P = [ 1/√3   1/√2   1/√6 ]
    [  0      …      …   ]
    [ 2/√3    …    1/√6  ]

and P*HP = I3. We note that the matrix P is not unitary, but it should not be, since E3 is not an orthonormal basis relative to this inner product. ■
Exercises 9.4
1. Show that, with the standard inner product in C 3 , each of the following sets is
orthogonal.
2. As in R^n, a set is called normalized if each vector in the set has norm 1. Normalize the sets in Problem 1.
3. Show that the set {(1,0,0), (1 + 2i, 1 + 2i, 1), (3 + 4i, 3 + 6i, 1 + 4i)} is orthogonal
with respect to the inner product of Problem 5 in Exercises 9.2.
4. With the standard inner product in C3, use the Gram-Schmidt process to find an
orthonormal basis of (A).
(a) A = {(1, - i , 1), (2,0,1 - i)} (b) A = {(0, - i , 1), (1 + i, 2,1)}
5. Let the inner product (A, B) be defined on R^{2×2} as in Problem 7 of Exercises 9.2: (A, B) = t(A^T B). Obtain an orthonormal basis of R^{2×2} by applying the Gram-Schmidt process to the basis
[ 1  0 ]   [ 1  1 ]   [ 1  1 ]   [ 1  1 ]
[ 0  0 ] , [ 0  0 ] , [ 1  0 ] , [ 1  1 ].
8. Given that A = {…} is an orthonormal basis of C^3 relative to the standard inner product, extend each of the orthonormal sets B below to an orthonormal basis of C^3.
11. Prove that the set of all unitary matrices of order n is closed under multiplication.
(a) Prove the Pythagorean theorem in V: ||u||² + ||v||² = ||u + v||² if and only if u is orthogonal to v.
(b) Prove that u + v and u - v are orthogonal if and only if ||u|| = ||v||.
Definition 9.14 Let A be a nonempty subset of V. The set A⊥ (read "A perp") is defined by

A⊥ = {v in V | (u, v) = 0 for all u in A}.

That is, A⊥ consists precisely of all those vectors in V that are orthogonal to every vector in A.
Example 1 □ Let V = R³ with the standard inner product, and let us consider several possibilities for A.
If A = {(0, 0, 0)}, then A⊥ = R³.
If A consists of the single nonzero vector (a₁, a₂, a₃), then A⊥ is the set of all vectors (x₁, x₂, x₃) in the plane a₁x₁ + a₂x₂ + a₃x₃ = 0.
If A consists of two linearly independent vectors, A = {(a₁, a₂, a₃), (b₁, b₂, b₃)}, then A⊥ is the line of intersection of the planes a₁x₁ + a₂x₂ + a₃x₃ = 0 and b₁x₁ + b₂x₂ + b₃x₃ = 0.
If A is a basis of R³, then A⊥ is the zero subspace. ■
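The middle two cases of the example can be checked numerically. This is our own small illustration (the vector a and the helper names are ours): the perp of a single nonzero vector is a plane through the origin, and the perp of two independent vectors is the line spanned by their cross product.

```python
# Sketch: checking the perp-set cases of Example 1 in R^3.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

a = (1.0, 2.0, 3.0)
# two independent solutions of a1*x1 + a2*x2 + a3*x3 = 0, spanning {a}-perp
p1 = (-a[1], a[0], 0.0)
p2 = (-a[2], 0.0, a[0])

b = (0.0, 1.0, 1.0)
line = cross(a, b)        # spans the intersection of the two planes
```

Every vector of the plane spanned by p1, p2 is orthogonal to a, and the cross product is orthogonal to both a and b, as the example's description of {a, b}⊥ requires.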
In each of the cases considered in the preceding example, A⊥ was a subspace of R³. It is not difficult to show that this is always the case.
Proof. If v is a vector such that (u, v) = 0 for all u ∈ ⟨A⟩, then surely (u, v) = 0 for all u ∈ A since A ⊆ ⟨A⟩. Thus ⟨A⟩⊥ ⊆ A⊥.
Now let v ∈ A⊥. Any vector u ∈ ⟨A⟩ can be written as u = Σₖ₌₁ʳ aₖuₖ with uₖ ∈ A, and

(u, v) = (Σₖ₌₁ʳ aₖuₖ, v) = Σₖ₌₁ʳ āₖ(uₖ, v).

But (uₖ, v) = 0 for each k since v ∈ A⊥. Hence (u, v) = 0 and v ∈ ⟨A⟩⊥ since u was arbitrary in ⟨A⟩. This gives A⊥ ⊆ ⟨A⟩⊥, and the proof is complete. ■
Theorem 9.16 shows that there is no loss of generality if we restrict our attention to those W⊥ where W is a subspace of V. In this case, W⊥ is called the orthogonal complement of W. The justification for this terminology is contained in the next theorem.

Theorem 9.17 If W is any subspace of V, then the sum W + W⊥ is direct and W ⊕ W⊥ = V.
Proof. Any v ∈ W ∩ W⊥ must satisfy (v, v) = 0, so W ∩ W⊥ = {0}, and the sum W + W⊥ is direct.
If W = {0} or W = V, the theorem is trivial. Suppose then that {v₁, v₂, ..., vᵣ} is an orthonormal basis of W and r < n. By Corollary 9.13, the set {v₁, v₂, ..., vᵣ} can be extended to an orthonormal basis

A = {v₁, v₂, ..., vᵣ, vᵣ₊₁, ..., vₙ}

of V. For any v = Σₖ₌₁ⁿ aₖvₖ in V, we have (vᵢ, v) = Σₖ₌₁ⁿ aₖδᵢₖ = aᵢ.
Proof. In the proof of the theorem, {v₁, v₂, ..., vᵣ} is a basis of W and {vᵣ₊₁, ..., vₙ} is a basis of W⊥. ■

for all u ∈ W⊥. Since v₁ ∈ W, (v₁, u) = 0 for all u ∈ W⊥. This means that v ∈ W⊥⊥ if and only if (v₂, u) = 0 for all u ∈ W⊥. But v₂ ∈ W⊥, so the last condition holds if and only if (v₂, v₂) = 0. Hence v ∈ W⊥⊥ if and only if v₂ = 0. ■
Exercises 9.5
7. Let f be a fixed linear functional on V with matrix A = [a₁ a₂ ⋯ aₙ] relative to the orthonormal basis 𝓐 of V, and let φ(f) be the vector in V that has coordinate matrix A* relative to 𝓐. Prove that f(v) = (φ(f), v) for all v ∈ V.
8. Prove that the mapping φ that maps each f ∈ V* onto φ(f) in V is a bijective mapping from V* to V, but that φ is not an isomorphism except when V is Euclidean.
9. Prove that, for any subspace W of V, f is in the annihilator W⁰ if and only if φ(f) ∈ W⊥.
10. With φ as in Problem 8, define (f, g) on V* by (f, g) = (φ(f), φ(g)). Prove that (f, g) defines an inner product on V*.
Hence

(T(u), T(v)) = ½{||T(u) + T(v)||² − ||T(u)||² − ||T(v)||²}
             = ½{||u + v||² − ||u||² − ||v||²}
             = (u, v),
and therefore

(u, v) = ½{||u + v||² − ||u||² − ||v||²}.  (9.3)
Proof. Let A = {v₁, v₂, ..., vₙ} be an orthonormal basis of V, and consider the set

T(A) = {T(v₁), T(v₂), ..., T(vₙ)}.

If T is an isometry, then by Theorem 9.21 we have (T(vᵢ), T(vⱼ)) = (vᵢ, vⱼ) = δᵢⱼ, and therefore T(A) is an orthonormal basis of V.
Assume now that T(A) is an orthonormal basis of V. For any u = Σᵢ₌₁ⁿ aᵢvᵢ and v = Σⱼ₌₁ⁿ bⱼvⱼ in V,

(T(u), T(v)) = (T(Σᵢ₌₁ⁿ aᵢvᵢ), T(Σⱼ₌₁ⁿ bⱼvⱼ))
             = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ āᵢbⱼ(T(vᵢ), T(vⱼ))
             = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ āᵢbⱼδᵢⱼ
             = (Σᵢ₌₁ⁿ aᵢvᵢ, Σⱼ₌₁ⁿ bⱼvⱼ)
             = (u, v).
Theorem 9.23 Let T be a linear operator on the unitary space V. Then T is unitary if and only if every matrix that represents T relative to an orthonormal basis of V is a unitary matrix.
A similar application of Theorems 9.22 and 9.9 yields the corresponding result for
orthogonal operators.
Theorem 9.24 Let T be a linear operator on the Euclidean space V. Then T is orthogonal if and only if every matrix that represents T relative to an orthonormal basis of V is an orthogonal matrix.
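Theorem 9.24 can be illustrated numerically. This is our own example (the rotation matrix P and the helper names are ours): a rotation matrix has orthonormal columns, so the operator it represents preserves the standard inner product on R².

```python
import math

# Sketch: a rotation matrix is orthogonal, and the operator it represents
# preserves the standard inner product on R^2.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

t = 0.7
P = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

# columns of P are the images of the standard basis vectors
cols = [[P[0][0], P[1][0]], [P[0][1], P[1][1]]]

u, v = [1.0, 2.0], [3.0, -1.0]
```

The columns form an orthonormal set (P is orthogonal), and (Pu, Pv) = (u, v), which is exactly the two-sided equivalence asserted by the theorem for this operator.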
-* (£3) = | ^ 2 ' ~ 7 2 ' 2 ) ' \ 2' 75' ~ï) ' V~'0' ~~2/ / ' *
Exercises 9.6
(a) [1  1  0]    (b) [2  0  ⋯]
    [1 −1  0]        [0  1  ⋯]
    [0  0  1]        [2  0  ⋯]

(c) ⅓[2 −2  1]   (d) [1  1  1]
     [2  1 −2]       [1  1  1]
     [1  2  2]       [0 −1  2]
2. Determine whether or not each of the given matrices represents a unitary operator
relative to an orthonormal basis of C3.
3 - 4z 3+ 2 i 2 1— 2 0
/
(*)* (z-2)\/2 2(z + l)v 2~ (2i-l)y/2 (b) 1+ 2 3 0
-1 1 + 32 4-3z 0 0 1
1 2 0 y/2 χ/3 1
4. Given that {(1/√2, 1/√2), (1/√2, −1/√2)} is an orthonormal basis of R², find an isometry that maps (−3/5, 4/5) onto (0, −1).
5. Prove that equation (9.4) in the proof of Theorem 9.21 implies that (T(u), T(v)) =
(u,v).
9. Prove that T is an isometry if and only if the image of a unit vector under T is
always a unit vector.
9.7 Normal Matrices
Definition 9.25 If A and B are square matrices over C, then B is unitarily similar to A if and only if there exists a unitary matrix U such that B = U⁻¹AU. If B and A are square matrices over R, then B is orthogonally similar to A if and only if B = P⁻¹AP for some orthogonal matrix P.
Definition 9.26 A matrix B = [bᵢⱼ]ₘₓₙ is upper triangular if bᵢⱼ = 0 for all i > j.
That is, an upper triangular matrix is one that has only zero elements below the
main diagonal. Similarly, a lower triangular matrix is one that has only zero elements
above the main diagonal.
The next proof should be compared with that of Theorem 8.19, for the technique of
proof is much the same.
Theorem 9.27 Every n × n matrix A over C is unitarily similar to an upper triangular matrix B, and the diagonal elements of B are the eigenvalues of A.
B = {v₁, v₂, ..., vₙ}
A₁ = U₁⁻¹AU₁ = [λ  R₁]
               [0  A₂]

where R₁ is 1 × k and A₂ is of order k. By the induction hypothesis, there is a unitary matrix Q such that Q⁻¹A₂Q is upper triangular. The matrix

U₂ = [1  0]
     [0  Q]

is unitary, and

U₂⁻¹A₁U₂ = [1  0  ] [λ  R₁] [1  0]  =  [λ  R₁Q    ]
           [0  Q⁻¹] [0  A₂] [0  Q]     [0  Q⁻¹A₂Q ].
Thus U₂⁻¹A₁U₂ is upper triangular since Q⁻¹A₂Q is upper triangular. The matrix U = U₁U₂ is unitary since U₁ and U₂ are unitary, and

B = U⁻¹AU

is upper triangular since

B = U₂⁻¹U₁⁻¹AU₁U₂ = U₂⁻¹A₁U₂.

It is clear that the diagonal elements of B = [bᵢⱼ] are the eigenvalues of B since

det(B − xI) = (b₁₁ − x)(b₂₂ − x) ⋯ (bₙₙ − x).

But B and A have the same eigenvalues since they are similar. ■
Corollary 9.28 Every linear operator on a unitary space V can be represented by an
upper triangular matrix relative to an orthonormal basis of V.
On the surface, it would appear that there should be a result for real matrices
that corresponds to Theorem 9.27. That is, it would seem likely that every n x n real
matrix would be orthogonally similar to an upper triangular matrix over R. But a
closer examination shows that this is not the case at all. For the diagonal elements of a
triangular matrix are the eigenvalues of that matrix, and the eigenvalues of a real matrix
are not necessarily real (see Problem 1 for an example). However, for real matrices that
have only real eigenvalues, the proof of Theorem 9.27 can be modified so as to prove
the following theorem.
Theorem 9.29 Let A be a real matrix of order n. Then A is orthogonally similar to
an upper triangular matrix if and only if all the eigenvalues of A are real.
Now we are ready to establish the criterion for a matrix to be unitarily similar to a
diagonal matrix.
U⁻¹AU = D = diag{d₁, d₂, ..., dₙ}.

AA* = (UDU*)(UD*U*)
    = UDD*U*
    = UD*DU*
    = UD*U*UDU*
    = A*A.
Assume now that AA* = A*A. By Theorem 9.27, there is a unitary matrix U such that B = U⁻¹AU = U*AU is upper triangular. Now

B*B = U*A*AU and BB* = U*AA*U.

Therefore BB* = B*B, since AA* = A*A. Since bᵣₛ = 0 in B = [bᵢⱼ] whenever r > s, the element in the i-th row and j-th column of BB* is Σₖ₌₁ⁿ bᵢₖb̄ⱼₖ = Σₖ₌ᵢⁿ bᵢₖb̄ⱼₖ. Similarly, the element in the i-th row and j-th column of B*B is Σₖ₌₁ⁿ b̄ₖᵢbₖⱼ = Σₖ₌₁ⁱ b̄ₖᵢbₖⱼ. Equating the diagonal elements of BB* and B*B, we have

Σₖ₌₁ⁱ |bₖᵢ|² = Σₖ₌ᵢⁿ |bᵢₖ|².  (9.5)
Taking i = 1 in (9.5) gives |b₁₁|² = |b₁₁|² + |b₁₂|² + ⋯ + |b₁ₙ|², so b₁ⱼ = 0 for all j > 1. With i = 2, equation (9.5) then gives

|b₂₂|² = |b₁₂|² + |b₂₂|² = |b₂₂|² + |b₂₃|² + ⋯ + |b₂ₙ|².

Therefore b₂ⱼ = 0 for all j > 2, and all elements except b₂₂ in the second row of B are zero. This procedure can be repeated with equation (9.5) to obtain

|bᵢᵢ|² = |bᵢᵢ|² + |bᵢ,ᵢ₊₁|² + ⋯ + |bᵢₙ|²

for i = 1, 2, ..., n. Therefore bᵢⱼ = 0 for all j > i, and B is a diagonal matrix. This completes the proof. ■
Definition 9.32 A square matrix A over a field 𝓕 ⊆ C is called normal if and only if AA* = A*A.
The last theorem says that those matrices over C that are unitarily similar to a
diagonal matrix are precisely the normal matrices. These include the symmetric real
matrices, the hermitian matrices, the orthogonal matrices, and the unitary matrices.
However, there are normal matrices that do not fall into any of these categories. Such
an example is found in Problem 2.
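The normality test AA* = A*A is easy to check numerically. The matrices below are our own illustrations (not the book's Problem 2): a hermitian matrix, a unitary matrix, a circulant matrix that is normal without being symmetric, hermitian, orthogonal, or unitary, and a non-normal matrix for contrast.

```python
# Sketch: testing A A* = A* A for a few sample matrices (our own examples).

def mat_mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def conj_transpose(A):
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def is_normal(A, tol=1e-12):
    L = mat_mul(A, conj_transpose(A))
    R = mat_mul(conj_transpose(A), A)
    return all(abs(L[i][j] - R[i][j]) < tol
               for i in range(len(A)) for j in range(len(A)))

hermitian = [[2, 1 - 1j], [1 + 1j, 3]]
unitary = [[1 / 2 ** 0.5, 1 / 2 ** 0.5], [1j / 2 ** 0.5, -1j / 2 ** 0.5]]
circulant = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # normal, none of the above
not_normal = [[1, 1], [0, 1]]
```

The circulant matrix plays the role described in the text: it passes the normality test yet falls into none of the four special categories listed above.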
Our next theorem gives a simple characterization of those real matrices of order n
that are orthogonally similar to a diagonal matrix.
The results established in this section give a practical way to determine whether
or not a matrix is unitarily similar or orthogonally similar to a diagonal matrix, but
a systematic procedure for finding a matrix that will accomplish the diagonalization is
yet lacking. Such a procedure will be developed at the end of the next section.
Exercises 9.7
2. Determine which of the following matrices are orthogonal, which are unitary, and
which are normal.
-1 3 3
x/2
.& o
3 1 3
2 U
i i—1
(a) 2 2
(b) 0 0 (c)
V2 i+ 1 0
3 3 1
v/2 2 2 J 2 2
1 I 1+i 5 i 2
2 i
(d) (e)| -i 1 (f) -i 2 -1
i 2
-1+i 1+i 0 2 - 1 2
3. For what values of x, y, z is the matrix

A = [a  x  y]
    [0  b  z]
    [0  0  c]

a normal matrix?
4. Prove that unitary similarity is an equivalence relation on the square matrices
over C.
7. Prove that if a matrix U is unitary, then all eigenvalues of U have absolute value
1.
8. Prove that a square matrix A over C is normal if and only if every matrix that is
unitarily similar to A is normal.
and

(v, T(w)) = Σⱼ₌₁ⁿ bⱼ(v, T(vⱼ)).
Thus the set of equations (9.6) is equivalent to the requirement that (u, w) = (v, T(w)) for all w ∈ V. In other words, for each v ∈ V, the value T*(v) is determined by the equation

(T*(v), w) = (v, T(w)) for all w ∈ V.  (9.7)
This leads to the following definition.
Definition 9.36 For each linear operator T on V, the adjoint of T is the mapping T* of V into V that is defined by the equation

(T*(v), w) = (v, T(w))

for all v, w ∈ V.
Theorem 9.37 For any linear operator T on V, the adjoint of T is a linear operator
on V.
Our next theorem provides a basis for the desired interpretation of the results in
Section 9.7.
Theorem 9.38 If the linear operator T on V has matrix A = [aᵢⱼ] relative to the orthonormal basis 𝓐 of V, then T* has matrix A* relative to 𝓐.
Proof. Let 𝓐 = {v₁, v₂, ..., vₙ}. Then T(vᵢ) = Σₖ₌₁ⁿ aₖᵢvₖ since A is the matrix of T relative to 𝓐. Suppose B = [bᵢⱼ]ₙ is the matrix of T* relative to 𝓐, so that T*(vⱼ) = Σₖ₌₁ⁿ bₖⱼvₖ. From the definition of T*, we have (T(vᵢ), vⱼ) = (vᵢ, T*(vⱼ)). But

(T(vᵢ), vⱼ) = (Σₖ₌₁ⁿ aₖᵢvₖ, vⱼ)
            = Σₖ₌₁ⁿ āₖᵢ(vₖ, vⱼ)
            = āⱼᵢ

and

(vᵢ, T*(vⱼ)) = (vᵢ, Σₖ₌₁ⁿ bₖⱼvₖ)
             = Σₖ₌₁ⁿ bₖⱼ(vᵢ, vₖ)
             = bᵢⱼ.

Hence bᵢⱼ = āⱼᵢ for all i and j, and B = A*. ■
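The conclusion of Theorem 9.38 can be verified numerically in C² with the standard inner product. The matrix A below is our own example (the helper names are ours): the conjugate transpose A* satisfies (A*u, v) = (u, Av) for all u and v, which is the defining property of the adjoint.

```python
# Sketch: the conjugate transpose represents the adjoint operator with
# respect to the standard inner product <u, v> = sum conj(u_i) v_i.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):                      # conjugate-linear in the first slot
    return sum(x.conjugate() * y for x, y in zip(u, v))

def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

A = [[1 + 2j, 3], [0, 1 - 1j]]        # a non-hermitian sample matrix
A_star = conj_transpose(A)
```

Since the entries are Gaussian integers, the identity holds exactly in this check, not just up to rounding.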
Theorem 9.41 A linear operator T on the unitary space V is normal if and only if
there exists an orthonormal basis of V that consists entirely of eigenvectors of T.
As was promised in the last section, we proceed now to develop a systematic method
for finding a unitary matrix that will accomplish a desired diagonalization.
Many of the results in Section 7.4 are helpful here, even though we are presently
restricted to orthonormal bases. If T is represented by a diagonal matrix, the elements
on the diagonal are the eigenvalues of T (Corollary 7.17). If T is normal, the geometric
multiplicity of each eigenvalue is equal to the algebraic multiplicity (Theorem 7.20).
For orthogonal similarity with a real matrix A, Theorem 9.34 shows that the treatment in Section 8.5 is complete, and the methods developed there apply unchanged.
For unitary similarity, some changes are necessary. We must first obtain the result for
normal matrices that corresponds to Theorem 8.21 for real symmetric matrices. The
first step in this direction is to relate the eigenvalues of T to those of T*.
Theorem 9.43 Let A be a normal matrix of order n. If λᵣ and λₛ are distinct eigenvalues of A with associated eigenvectors Uᵣ and Uₛ, then Uᵣ*Uₛ = 0.
(T*(vᵣ), vₛ) = (λ̄ᵣvᵣ, vₛ) = λᵣ(vᵣ, vₛ).

We also have

(vᵣ, T(vₛ)) = (vᵣ, λₛvₛ) = λₛ(vᵣ, vₛ).

But (T*(vᵣ), vₛ) = (vᵣ, T(vₛ)), so this means that λᵣ(vᵣ, vₛ) = λₛ(vᵣ, vₛ) and

(λᵣ − λₛ)(vᵣ, vₛ) = 0.
We can now formulate our method for finding a unitary matrix U that will diagonalize a given normal matrix A = [aᵢⱼ]ₙ. Let Uⱼ denote the j-th column of U = [uᵢⱼ]ₙ, so that U = [U₁, U₂, ..., Uₙ]. The requirement that U⁻¹AU = diag{λ₁, λ₂, ..., λₙ} is equivalent to the system of equations

AUⱼ = λⱼUⱼ,  j = 1, 2, ..., n.

That is, each Uⱼ must be an eigenvector of A corresponding to λⱼ. Since Uᵣ*Uₛ is the element in row r and column s of U*U, the requirement that U* = U⁻¹ is satisfied if and only if Uᵣ*Uₛ = δᵣₛ. With the same notation as in the proof of Theorem 9.43, Uᵣ*Uₛ = (vᵣ, vₛ). Thus the columns Uⱼ of U must be the coordinates of an orthonormal basis of eigenvectors of T. Theorem 9.43 assures us that eigenvectors from different eigenspaces are automatically orthogonal. Thus the only modification of the procedure in Section 7.4 that is necessary to make U unitary is to choose orthonormal bases of the eigenspaces V_λⱼ. The Gram-Schmidt process can be used to obtain orthonormal bases of those V_λⱼ that have dimension greater than 1.
Example 1 □ Consider the problem of finding a unitary matrix U such that U⁻¹AU is diagonal, where A is the normal matrix

A = [ (2 + i)/2      0     (−2 + i)/2 ]
    [     0          i          0     ]
    [ (−2 + i)/2     0     (2 + i)/2  ].
The matrix A − 2I leads to the reduced form

[1  0  1  0]
[0  1  0  0]
[0  0  0  0]

for V₂. The matrix A − iI leads to solutions of the form x₂(0, 1, 0) + x₃(1, 0, 1). With x₂ = 1, x₃ = i, we obtain (i, 1, i), and with x₂ = i, x₃ = 1, we obtain (1, i, 1). Application
of the Gram-Schmidt process to the basis {(i, 1, i), (1, i, 1)} leads to the orthonormal basis

{(i/√3, 1/√3, i/√3), (1/√6, 2i/√6, 1/√6)}

of V_i. Thus

U = [ i/√3    1/√6    (√3 − i)/(2√2)  ]
    [ 1/√3    2i/√6         0         ]
    [ i/√3    1/√6    (−√3 + i)/(2√2) ]

is a unitary matrix such that U⁻¹AU = diag{i, i, 2}. ■
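The arithmetic of this example can be double-checked by machine. The matrices below are our transcription of A and U from the example (the helper names are ours); with that transcription, U*U = I and U⁻¹AU = U*AU comes out as diag{i, i, 2}, reflecting the two-dimensional eigenspace for the eigenvalue i.

```python
# Numerical check of the diagonalization example above.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

s3, s6, s8 = 3 ** 0.5, 6 ** 0.5, 8 ** 0.5     # sqrt(8) = 2*sqrt(2)
A = [[(2 + 1j) / 2, 0, (-2 + 1j) / 2],
     [0, 1j, 0],
     [(-2 + 1j) / 2, 0, (2 + 1j) / 2]]
U = [[1j / s3, 1 / s6, (s3 - 1j) / s8],
     [1 / s3, 2j / s6, 0],
     [1j / s3, 1 / s6, (-s3 + 1j) / s8]]

D = mat_mul(conj_transpose(U), mat_mul(A, U))   # U^{-1} = U* since U is unitary
```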
Exercises 9.8
1. For each of the following linear operators on C², write out the value of T*(a₁, a₂).
(a) T(a₁, a₂) = (a₁ + (1 − i)a₂, (1 + i)a₁ + 2a₂)
(b) T(a₁, a₂) = (a₁ + 2i a₂, a₁ − a₂)
(c) T(a₁, a₂) = (i a₁ − 2a₂, a₁)
(d) T(a₁, a₂) = (i a₁ + (i − 1)a₂, (1 + i)a₁)
2. For each linear operator T in Problem 1, find an orthonormal basis of eigenvectors
whenever such a basis exists.
3. Whenever possible, find a unitary matrix U such that U~1AU is diagonal. The
matrices in parts (a)-(e) are from Problem 2 in Exercises 9.7.
-1 3 3 1 _^1 0
y/2 \/2 2 2
(a) A 3 1 3 (b)A = 0 0 1
V2 2 2
^3 1
3
2 2 0
V2
2 i—1
(c)A (d)^ =
2+ I 0
1 2 1+2 3 1 0
(e)A -2 1 -1 + i (f)A- 1-1 5
2 1 - 2
-I + 2 1 +z 0 0 1+2 3
4. Consider the following matrices.
A = [ 1/2    √3/2 ]    B = [ 1/√2    1/√2 ]    C = [1   2]
    [ √3/2  −1/2  ]        [ i/√2   −i/√2 ]        [2  −2]

D = [  2 + i   −2 + i ]    E = [1  2]
    [ −2 + i    2 + i ]        [0  3]
(a) Determine which of these matrices are orthogonal, which are unitary, and
which are normal.
(b) Which of these matrices are unitarily similar to an upper triangular matrix?
(c) Which of these matrices are unitarily similar to a diagonal matrix?
(d) Which of these matrices are similar over C to a diagonal matrix?
11. Prove that a linear operator T on the unitary space V is normal if and only if ||T(v)|| = ||T*(v)|| for all v ∈ V.
12. Prove Theorem 9.42.
13. Prove that a normal linear operator T is self-adjoint if and only if all the eigenvalues of T are real.
14. Prove that if T is normal, then T and T* have the same kernel.
15. Prove that if T is an invertible linear operator on V, then T* is invertible and (T*)⁻¹ = (T⁻¹)*.
16. A linear operator T on an inner product space V is said to be skew-adjoint if T* = −T. (For V unitary, T is called skew-hermitian, and for V Euclidean, T is called skew-symmetric.)
Spectral Decompositions
10.1 Introduction
In this chapter we consider once again the question of diagonalization of a linear operator
on a finite-dimensional vector space V. We have seen in Chapter 7 that a linear operator
T on V is diagonalizable (i.e., can be represented by a diagonal matrix) if and only if
there exists a basis of V that consists of eigenvectors of T. For a unitary vector space
V, those linear operators that can be represented by a diagonal matrix relative to an
orthonormal basis of V are the same as the normal linear operators on V. One of our
main objectives now is to describe the diagonalizable linear operators in terms that are
free of any reference to an inner product. The characterization that we obtain is in
terms of a spectral decomposition. A certain acquaintance with projections is essential
to a formulation of the concept of a spectral decomposition.
For the entire chapter, V shall denote an n-dimensional vector space over a field 𝓕, and T shall denote a linear operator on V.
P²(v₁ + v₂) = P(v₁) = v₁ = P(v₁ + v₂),
Our discussion above shows that the projection P of V onto W₁ along W₂ is idempotent, and that W₁ = P(V), W₂ = P⁻¹(0). The converse is also true: an idempotent linear operator T is always a projection of V onto T(V) along T⁻¹(0).

Proof. Assume that T² = T. For any u ∈ T(V), u = T(v) for some v ∈ V. Hence

T(u) = T²(v) = T(v) = u,

and T acts as the identity transformation on T(V). Thus, for any v in T(V) ∩ T⁻¹(0), we have v = T(v) = 0, and the sum T(V) + T⁻¹(0) is direct. Let v be an arbitrary vector, and let v₁ = T(v), v₂ = (1 − T)(v), where 1 denotes the identity transformation. Now v₁ is clearly in T(V), and v₂ is in T⁻¹(0) since T(v₂) = (T − T²)(v) = Z(v) = 0. Since

v = T(v) + (1 − T)(v) = v₁ + v₂,

we have V = T(V) ⊕ T⁻¹(0), and T is the projection of V onto T(V) along T⁻¹(0). ■
Pᵢ(v₁ + v₂ + ⋯ + vᵣ) = vᵢ.

Then the set {P₁, P₂, ..., Pᵣ} has the following properties:

and hence P₁ + P₂ + ⋯ + Pᵣ = 1.
The three properties of the set {P₁, P₂, ..., Pᵣ} in the preceding paragraph motivate the following definition.
be in

Pᵢ(V) ∩ Σⱼ≠ᵢ Pⱼ(V),

say w = Pᵢ(u₁) = Σⱼ≠ᵢ Pⱼ(u₂). Then

w = Pᵢ(u₁) = Pᵢ²(u₁) = Pᵢ(Σⱼ≠ᵢ Pⱼ(u₂)) = Σⱼ≠ᵢ PᵢPⱼ(u₂) = Σⱼ≠ᵢ Z(u₂) = 0.
Definition 10.4 A complete set of projections {P₁, P₂, ..., Pᵣ} for V and a direct sum decomposition V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ Wᵣ are said to correspond to each other, or to be associated with each other, if Pⱼ(v) = vⱼ whenever v ∈ V is written in the unique form v = v₁ + v₂ + ⋯ + vᵣ with vᵢ ∈ Wᵢ.
for i = 1, 2, ..., n. Then {P₁, P₂, ..., Pₙ} is a complete set of projections for Rⁿ. This situation generalizes readily to an arbitrary n-dimensional vector space V. For any given basis A = {v₁, v₂, ..., vₙ}, let Pᵢ(Σⱼ₌₁ⁿ aⱼvⱼ) = aᵢvᵢ. Then {P₁, P₂, ..., Pₙ} is a complete set of projections for V. ■
Variations in the number of projections in a complete set can easily be made. For example, for each v = (a₁, a₂, a₃) in R³, let the mappings T₁ and T₂ be defined by T₁(v) = (a₁, 0, 0) and T₂(v) = (0, a₂, a₃). Then {T₁, T₂} is a complete set of projections for R³.
(1 − λ)λv = (1 − λ)P(v)
          = (P − λP)(v)
          = (P² − λP)(v)
          = P((P − λ)(v))
          = P(0)
          = 0.
The fact that a projection P has the property P² = P and also acts as the identity transformation on its range might lead one to expect the matrix of a projection to look somehow like the identity matrix. This is not necessarily the case, however. For example, the matrix
example, the matrix
3 -3 -2 3
-4 6 4 -5
A =
3 -3 -2 3
-4 6 4 -5
is such that A² = A, and hence represents a projection. But with an appropriate choice of basis, the matrix of a projection P can be made to take on a form very much like Iₙ. Now P(V) is the same as the eigenspace of P corresponding to the eigenvalue 1, and V = P(V) ⊕ P⁻¹(0). Thus, if a basis {v₁, ..., vᵣ} of P(V) is extended to a basis A = {v₁, ..., vᵣ, vᵣ₊₁, ..., vₙ} of V with {vᵣ₊₁, ..., vₙ} a basis of P⁻¹(0), then the matrix of P relative to A is

Dᵣ = [Iᵣ  0]
     [0   0].

An interchange of the vᵢ in A produces an interchange of the elements on the diagonal of Dᵣ, so the 1's on the diagonal can be placed in any desired diagonal positions.
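The idempotence claimed for the 4 × 4 matrix A above can be checked directly; the computation below (helper name ours) multiplies A by itself with exact integer arithmetic.

```python
# Direct check that the 4x4 matrix A from the text satisfies A^2 = A,
# so it represents a projection despite looking nothing like I_n.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, -3, -2, 3],
     [-4, 6, 4, -5],
     [3, -3, -2, 3],
     [-4, 6, 4, -5]]
```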
Exercises 10.2
10. Let V be an inner product space. Prove that if P is the projection of V onto W₁ along W₂, then P* is the projection of V onto W₂⊥ along W₁⊥.
11. According to the definition in the second paragraph of this section, a projection P on an inner product space V is called an orthogonal projection if and only if P(V) and P⁻¹(0) are orthogonal subspaces. Prove that a projection P is an orthogonal projection if and only if P is self-adjoint.
12. Let P be a projection on the inner product space V. Prove that if ||P(v)|| ≤ ||v|| for all v ∈ V, then P is an orthogonal projection.
13. Prove that a projection on an inner product space V is self-adjoint if and only if
it is normal.
Theorem 10.6 Let {P₁, P₂, ..., Pᵣ} be a complete set of projections for V, and suppose that T = c₁P₁ + c₂P₂ + ⋯ + cᵣPᵣ for some scalars cᵢ. Then each cᵢ is an eigenvalue of T, and each eigenvector vᵢ associated with the eigenvalue 1 of Pᵢ is an eigenvector of T associated with cᵢ.

Proof. If Pᵢ(vᵢ) = vᵢ with vᵢ ≠ 0, then PⱼPᵢ = Z for j ≠ i gives

T(vᵢ) = Σⱼ₌₁ʳ cⱼPⱼ(vᵢ) = cᵢPᵢ(vᵢ) = cᵢvᵢ,
Proof. Let {P₁, P₂, ..., Pᵣ} be a complete set of projections for V. For an arbitrary v ∈ V, consider the vector Pᵢ(v). Since
Theorem 10.8 Let T be a linear operator on V with distinct eigenvalues λ₁, λ₂, ..., λᵣ. Then T is diagonalizable if and only if

T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ,
B = {P₁(v₁), ..., P₁(vₙ), P₂(v₁), ..., P₂(vₙ), ..., Pᵣ(v₁), ..., Pᵣ(vₙ)}.
v = (P₁ + P₂ + ⋯ + Pᵣ)(v)
  = (P₁ + P₂ + ⋯ + Pᵣ)(Σᵢ₌₁ⁿ aᵢvᵢ)
  = Σᵢ₌₁ⁿ Σⱼ₌₁ʳ aᵢPⱼ(vᵢ).
With Wⱼ = V_λⱼ in Definition 10.4, the set of projections Pⱼ defined there is a complete set of projections for V. For each v ∈ V, we have

v = v₁ + v₂ + ⋯ + vᵣ

with vⱼ = Pⱼ(v). And since vⱼ ∈ V_λⱼ, we have T(vⱼ) = λⱼvⱼ. Thus

T(v) = T(v₁) + T(v₂) + ⋯ + T(vᵣ)
     = λ₁v₁ + λ₂v₂ + ⋯ + λᵣvᵣ
     = λ₁P₁(v) + λ₂P₂(v) + ⋯ + λᵣPᵣ(v)
     = (λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ)(v),

and T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ. ■
Definition 10.9 If the linear operator T on V can be written in the form

T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ,

where {P₁, P₂, ..., Pᵣ} is a complete set of projections for V and λ₁, λ₂, ..., λᵣ are the distinct eigenvalues of T, then the expression λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ is called a spectral decomposition of T.
Theorem 10.8 asserts that T is diagonalizable if and only if T has a spectral decomposition.
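A concrete spectral decomposition is easy to exhibit. The 2 × 2 operator below is our own illustration (the matrices T, P1, P2 and the helper names are ours): T has eigenvalues 1 and 3, and the two projections onto its eigenspaces are complete, orthogonal, and recover T as 1·P₁ + 3·P₂.

```python
# Sketch: a spectral decomposition T = 1*P1 + 3*P2 for T = [[2,1],[1,2]],
# whose eigenspaces are spanned by (1,-1) and (1,1).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(c1, P1, c2, P2):          # the operator c1*P1 + c2*P2
    return [[c1 * P1[i][j] + c2 * P2[i][j] for j in range(2)]
            for i in range(2)]

T = [[2.0, 1.0], [1.0, 2.0]]
P1 = [[0.5, -0.5], [-0.5, 0.5]]       # projection onto the eigenspace for 1
P2 = [[0.5, 0.5], [0.5, 0.5]]         # projection onto the eigenspace for 3
```

The same data also illustrates Theorem 10.10 below: with f(x) = x², the matrix f(T) = T² equals f(1)P₁ + f(3)P₂ = 1·P₁ + 9·P₂.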
Theorem 10.10 If T has a spectral decomposition T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ, then f(T) = f(λ₁)P₁ + f(λ₂)P₂ + ⋯ + f(λᵣ)Pᵣ for every polynomial f(x).

Proof. Since PⱼPₘ = Z for j ≠ m and Pⱼ² = Pⱼ,

T² = Σⱼ₌₁ʳ Σₘ₌₁ʳ λⱼλₘPⱼPₘ = Σⱼ₌₁ʳ λⱼ²Pⱼ,

and, by induction, Tⁱ = Σⱼ₌₁ʳ λⱼⁱPⱼ for each positive integer i. Hence, for f(x) = Σᵢ₌₀ˢ cᵢxⁱ,

f(T) = Σᵢ₌₀ˢ cᵢ(Σⱼ₌₁ʳ λⱼⁱPⱼ) = Σⱼ₌₁ʳ (Σᵢ₌₀ˢ cᵢλⱼⁱ)Pⱼ = Σⱼ₌₁ʳ f(λⱼ)Pⱼ.
dim(Pⱼ(V)) ≤ dim(V_λⱼ),

and therefore

dim(Pⱼ(V)) = dim(V_λⱼ)

for j = 1, 2, ..., r. It follows that Pⱼ(V) = V_λⱼ for j = 1, 2, ..., r. Hence Pⱼ is the projection of V onto V_λⱼ along the sum of the remaining eigenspaces. ■
Proof. With the proofs of Theorems 7.16 and 10.8 in mind, it is sufficient to prove
that an orthonormal basis of eigenvectors of T exists if and only if T has a spectral
decomposition in which each projection is self-adjoint.
Suppose first that T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ, with the Pᵢ self-adjoint. Let Bⱼ = {uⱼ₁, uⱼ₂, ..., uⱼₙⱼ} be an orthonormal basis of the eigenspace V_λⱼ for j = 1, 2, ..., r. As in the proof of Theorem 7.20, the set

B = {u₁₁, ..., u₁ₙ₁, u₂₁, ..., u₂ₙ₂, ..., uᵣ₁, ..., uᵣₙᵣ}
(uᵢₜ, uⱼₛ) = (Pᵢ(uᵢₜ), Pⱼ(uⱼₛ))
           = (uᵢₜ, Pᵢ*Pⱼ(uⱼₛ))
           = (uᵢₜ, PᵢPⱼ(uⱼₛ))
           = (uᵢₜ, 0)
           = 0

whenever i ≠ j.
Conversely, suppose that there exists an orthonormal basis of eigenvectors of T. With the same notation as used in the proof of Theorem 10.8, V = V_λ₁ ⊕ V_λ₂ ⊕ ⋯ ⊕ V_λᵣ. The set of projections {P₁, P₂, ..., Pᵣ} is a complete set for V, and T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ. The proof will be complete if we show that each Pᵢ is self-adjoint. Let u = P₁(u) + P₂(u) + ⋯ + Pᵣ(u) and v = P₁(v) + P₂(v) + ⋯ + Pᵣ(v) be any two vectors in V. For i ≠ j, Pᵢ(u) and Pⱼ(v) are in distinct eigenspaces V_λᵢ and V_λⱼ, and so are orthogonal. This implies that

(Pᵢ(u), v) = (Pᵢ(u), Σⱼ₌₁ʳ Pⱼ(v))
           = (Pᵢ(u), Pᵢ(v))
           = (Σⱼ₌₁ʳ Pⱼ(u), Pᵢ(v))
           = (u, Pᵢ(v)),
We shall devise a method for obtaining the projections involved in a spectral decomposition near the end of the next section.
Exercises 10.3
1. Verify that the mappings of R² into R² defined by P₁(x₁, x₂) = (x₁ + x₂, 0) and P₂(x₁, x₂) = (x₂, x₂) are projections, and show that P₁ + P₂ is not a projection.
3. Prove that if T is invertible and has a spectral decomposition, then T⁻¹ has a spectral decomposition with the same complete set of projections.
4. Let {P₁, P₂, ..., Pᵣ} be a complete set of projections for V, and suppose that T = c₁P₁ + c₂P₂ + ⋯ + cᵣPᵣ.
(a) Prove that each eigenvalue of T is equal to at least one of the scalars cⱼ.
(b) Give an example which shows that there may be a vector v such that v is an eigenvector of T associated with cⱼ, but v is not an eigenvector of any Pᵢ.
5. Prove that each projection Pᵢ in the spectral decomposition

T = λ₁P₁ + λ₂P₂ + ⋯ + λᵣPᵣ

of T has rank equal to the geometric multiplicity of the eigenvalue λᵢ of T.
6. Prove that the spectral decomposition of T is unique if it exists.
7. Let T be a normal linear transformation of the inner product space V. Prove that
T* has a spectral decomposition.
8. Let T = Σᵢ₌₁ʳ λᵢPᵢ be a spectral decomposition for the linear operator T on the inner product space V. Suppose that {f₁(x), f₂(x), ..., fᵣ(x)} is a set of polynomials with real coefficients such that fᵢ(λⱼ) = δᵢⱼ. Prove that fᵢ(T) = Pᵢ for i = 1, 2, ..., r.
9. A linear operator T on an inner product space V is called nonnegative if T is self-adjoint and (T(v), v) ≥ 0 for every v ∈ V.
(a) Prove that any nonnegative linear operator T on V has a spectral decomposition T = Σᵢ₌₁ʳ λᵢPᵢ with each λᵢ ≥ 0.
(b) Show that S = √λ₁ P₁ + √λ₂ P₂ + ⋯ + √λᵣ Pᵣ is a nonnegative linear operator such that S² = T (i.e., S is a nonnegative square root of T).
(c) Show that the nonnegative square root S in (b) is unique.
with r(x) either the zero polynomial or a polynomial of degree less than that of p(x). This statement is known as the division algorithm for elements of 𝓕[x]. If r(x) is the zero polynomial, we say that p(x) divides m(x), and that p(x) is a divisor of m(x).
A nonzero polynomial p(x) in 𝓕[x] is called monic if the coefficient of the highest degree term in p(x) is 1. That is, the highest power of x that appears in p(x) has 1 as its coefficient.
A monic polynomial d(x) in 𝓕[x] is called the greatest common divisor of a set of nonzero polynomials q₁(x), q₂(x), ..., qᵣ(x) in 𝓕[x] if
i. d(x) is a divisor of each of the polynomials qᵢ(x), and
ii. every polynomial p(x) that divides each qᵢ(x) is also a divisor of d(x).
Every set of nonzero polynomials q₁(x), q₂(x), ..., qᵣ(x) in 𝓕[x] has a unique monic greatest common divisor d(x) in 𝓕[x]. Moreover, there exists a set of polynomials g₁(x), g₂(x), ..., gᵣ(x) in 𝓕[x] such that

d(x) = g₁(x)q₁(x) + g₂(x)q₂(x) + ⋯ + gᵣ(x)qᵣ(x).
In some of the examples and exercises, we assume that the student is familiar with
the partial fraction decomposition of a quotient of two nonzero polynomials.
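The division algorithm described above yields a Euclidean algorithm for the monic greatest common divisor. The sketch below is our own (the names `poly_divmod` and `poly_gcd` are ours), with polynomials represented as coefficient lists, highest-degree coefficient first.

```python
# Sketch: polynomial long division and the Euclidean algorithm for the
# monic gcd, over the rationals/reals.

def poly_divmod(a, b):
    """Divide a by b; returns (quotient, remainder)."""
    a = [float(c) for c in a]
    if len(a) < len(b):
        return [0.0], a
    q = [0.0] * (len(a) - len(b) + 1)
    for i in range(len(q)):
        c = a[i] / b[0]
        q[i] = c
        for j in range(len(b)):
            a[i + j] -= c * b[j]
    r = a[len(q):]
    return q, (r if r else [0.0])

def poly_gcd(a, b):
    """Monic gcd by repeated division with remainder."""
    while any(abs(c) > 1e-9 for c in b):
        _, r = poly_divmod(a, b)
        while len(r) > 1 and abs(r[0]) < 1e-9:   # strip leading zeros
            r = r[1:]
        a, b = b, r
    return [c / a[0] for c in a]                 # make the result monic
```

For example, the gcd of (x − 1)(x − 2) and (x − 1)(x − 3) comes out as the monic polynomial x − 1, in line with condition i and ii above.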
We have already encountered polynomials p(T) = Σᵢ₌₀ˢ cᵢTⁱ in a linear operator T on V and the corresponding polynomials p(A) = Σᵢ₌₀ˢ cᵢAⁱ in a matrix A that represents T. It has been noted that T and A satisfy the same polynomial equations.
From one point of view, a polynomial p(A) = Σᵢ₌₀ˢ cᵢAⁱ in the square matrix A can be thought of as being obtained from a polynomial p(x) = Σᵢ₌₀ˢ cᵢxⁱ by replacing the powers xⁱ of the indeterminate x by the corresponding powers Aⁱ of the matrix A. This suggests the construction of other types of polynomials involving matrices by making other replacements in p(x). One might treat x as an indeterminate scalar and replace the scalar coefficients cᵢ by matrix coefficients Cᵢ, or one might replace both the cᵢ and the xⁱ by matrix quantities. The algebra connected with this last type of polynomial is quite involved, and we shall not be concerned with it here. Our interest is confined to only two types of polynomials: those of the form Σᵢ₌₀ˢ cᵢAⁱ, with scalar coefficients and powers of a matrix, and those of the form Σᵢ₌₀ˢ Cᵢxⁱ, with matrix coefficients and powers of the indeterminate x.
There is another point of view from which the polynomials of the second type may be regarded. Any polynomial Σᵢ₌₀ˢ Cᵢxⁱ of this type can be considered to be a matrix with elements in 𝓕[x]. For example,

[2  −1] x³ + [−1  0] x + [ 4  5]  =  [2x³ − x + 4   −x³ + 5 ]
[0   4]      [ 3  3]     [−7  0]     [3x − 7        4x³ + 3x].
In either form, this type of matrix is called a matrix polynomial. In our development, we have considered only matrices with elements in a field, and 𝓕[x] is not a field. But 𝓕[x] is contained in the field 𝓕(x) of all rational functions in x over 𝓕, and we can consider matrix polynomials in this context. For future use, we note that two matrix polynomials are equal if and only if they have equal coefficients of each power of x.
Since powers of A commute with scalars and with each other, multiplication is commutative for polynomials of the first type: p(A)q(A) = q(A)p(A) for any p(x), q(x) in 𝓕[x]. In this case, factorizations in 𝓕[x] remain valid: if p(x) = g(x)h(x) in 𝓕[x], then p(A) = g(A)h(A). A certain amount of care must be exercised when working with polynomials of the second type, since multiplication is not commutative then. Factorizations in 𝓕[x] no longer remain valid here. For example, if AB ≠ BA, then (Ix − A)(Ix − B) = Ix² − (A + B)x + AB while (Ix − B)(Ix − A) = Ix² − (A + B)x + BA, so the two products are different.
We recall from Chapter 4 that the set 𝓕ⁿˣⁿ of all n × n matrices over 𝓕 is a vector space over 𝓕. If Eᵢⱼ denotes the n × n matrix that has the element in row i, column j equal to 1 and all other elements 0, then an arbitrary A = [aᵢⱼ] in 𝓕ⁿˣⁿ can be written uniquely as

A = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼEᵢⱼ,

and the set {Eᵢⱼ} is a basis of 𝓕ⁿˣⁿ. Hence 𝓕ⁿˣⁿ has dimension n², and the set

{I = A⁰, A, A², ..., A^(n²)}

is linearly dependent for any n × n matrix A over 𝓕. That is, there is a nonzero polynomial p(x) = Σᵢ₌₀^(n²) cᵢxⁱ such that p(A) = 0. This means that there exists a monic polynomial m(x) of smallest degree such that m(A) = 0.
Theorem 10.13 Let A be an n × n matrix, and let m(x) be a monic polynomial over 𝓕 of smallest degree such that m(A) = 0. For any p(x) in 𝓕[x], p(A) = 0 if and only if m(x) is a factor of p(x).

Proof. If m(x) is a factor of p(x), say p(x) = m(x)q(x), then p(A) = m(A)q(A) = 0. Conversely, suppose p(A) = 0, and write p(x) = m(x)q(x) + r(x) by the division algorithm. Then r(A) = p(A) − m(A)q(A) = 0. This means that r(x) must be the zero polynomial, since otherwise r(A) = 0 would contradict the choice of m(x) as having the smallest possible degree such that m(A) = 0. Hence m(x) is a factor of p(x). ■
This theorem makes it easy to prove that the polynomial m(x) in the hypothesis is
unique (see Problem 6).
Since a linear operator T and any matrix that represents it satisfy the same polynomial equations, the minimal polynomial of T is well-defined.
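The gap between the minimal polynomial and the characteristic polynomial can be seen on a small example of our own: for A = diag(1, 1, 2), the characteristic polynomial is (x − 1)²(x − 2), but the smaller monic polynomial m(x) = (x − 1)(x − 2) already annihilates A.

```python
# Sketch: for A = diag(1, 1, 2), m(x) = (x - 1)(x - 2) satisfies m(A) = 0,
# while neither factor alone does, so m has smaller degree than det(A - xI).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(A, c):                      # the matrix A - c*I
    return [[A[i][j] - (c if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

A = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
m_of_A = mat_mul(shift(A, 1), shift(A, 2))
```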
The discussion preceding Theorem 10.13 is an efficient argument for the existence of the minimal polynomial m(x) of A, but it suggests no practical method for finding m(x). The next theorem is a great help in this direction. It is known as the Hamilton-Cayley theorem.
Theorem 10.15 Let A = [α^] be an arbitrary n x n matrix over T, and let f (x) be
the characteristic polynomial of A. Then f(A) = 0.
A — xl — [dij — xöij]
is a matrix with polynomials as elements, the minor of each a^—xbij is the determinant
of an (n — 1) x (n — 1) matrix with polynomial elements. Hence each element in B —
adj(A — xl) is a polynomial in x. Moreover,
The usual properties of matrix multiplication and matrix addition yield the factorization

    A^k − I x^k = P_k (A − I x),   where   P_k = A^{k−1} + A^{k−2} x + ··· + A x^{k−2} + I x^{k−1},

and

    f(A) = f(x) I + Σ_{k=1}^{n} c_k P_k (A − I x)
         = { B + Σ_{k=1}^{n} c_k P_k } (A − I x).

The matrix inside the braces is a matrix polynomial, say B_t x^t + ··· + B_1 x + B_0. This gives

    f(A) = (B_t x^t + ··· + B_1 x + B_0)(A − I x)
         = −B_t x^{t+1} + (B_t A − B_{t−1}) x^t + ··· + (B_1 A − B_0) x + B_0 A.

Since two matrix polynomials are equal if and only if they have corresponding coefficients that are equal, this requires that f(A) = B_0 A and that the coefficients of each positive power of x on the right side be zero:

    B_t = 0
    B_t A − B_{t−1} = 0
    ⋮
    B_2 A − B_1 = 0
    B_1 A − B_0 = 0.

Successive substitution beginning with the first of these equations gives

    0 = B_t = B_{t−1} = ··· = B_1 = B_0,

and therefore f(A) = B_0 A = 0. ■
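A quick numerical check of the Hamilton-Cayley theorem (an illustration, not part of the text) can be made with NumPy: `np.poly` returns the coefficients of det(xI − A), which has the same roots as f(x) = det(A − xI), so substituting A still gives the zero matrix.

```python
import numpy as np

# Sample matrix with eigenvalues 2, 2, 1.
A = np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 0.0], [2.0, 3.0, 1.0]])

coeffs = np.poly(A)  # characteristic polynomial coefficients, highest degree first
n = A.shape[0]
# Substitute A for x: sum of coeff * A^(degree of that term).
f_of_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.allclose(f_of_A, np.zeros((n, n))))  # True
```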
We shall adopt the notation of this paragraph for the characteristic polynomial f(x) and the minimal polynomial m(x) of T throughout the remainder of the chapter.
There are some further notational conventions that are essential. For each factor (x − λ_j)^{t_j} of m(x), let q_j(x) be the polynomial

    q_j(x) = m(x) / (x − λ_j)^{t_j}
           = (x − λ_1)^{t_1} ··· (x − λ_{j−1})^{t_{j−1}} (x − λ_{j+1})^{t_{j+1}} ··· (x − λ_r)^{t_r}.
Any nonconstant common divisor of q_1(x), q_2(x), ..., q_r(x) would necessarily have a linear factor of the form x − λ_i, since the only linear factors of each q_j(x) are of this type. But x − λ_i is not a factor of q_i(x), so there are no nonconstant common divisors of q_1(x), q_2(x), ..., q_r(x). That is, the greatest common divisor of q_1(x), q_2(x), ..., q_r(x) is 1. Hence there are polynomials g_1(x), g_2(x), ..., g_r(x) such that

    1 = g_1(x)q_1(x) + g_2(x)q_2(x) + ··· + g_r(x)q_r(x).
Since q_j(x) = m(x)/(x − λ_j)^{t_j}, this yields, with p_j(x) = g_j(x)q_j(x) and F_j = p_j(T),

    F_1 + F_2 + ··· + F_r = 1.

For i ≠ j, m(x) is a factor of q_i(x)q_j(x), so that F_i F_j = Z whenever i ≠ j. Hence

    F_j = F_j ( Σ_{i=1}^{r} F_i ) = Σ_{i=1}^{r} F_j F_i = F_j²,

and {F_1, F_2, ..., F_r} is an orthogonal set of projections such that 1 = F_1 + F_2 + ··· + F_r. We shall show that F_j is the projection of V onto K_j along Σ_{i≠j} K_i, and it will follow that F_j ≠ Z since K_j clearly contains the eigenspace V_{λ_j} of T.
Now

    (x − λ_j)^{t_j} p_j(x) = (x − λ_j)^{t_j} g_j(x) q_j(x)
                           = g_j(x) m(x),
10.4 Minimal Polynomials and Spectral Decompositions
    T(W) ⊆ W.

That is, a subspace W of V is T-invariant if and only if T(v) ∈ W for all v ∈ W.

Every linear operator T on V has invariant subspaces. The eigenspaces V_λ of T are invariant under T, as are the zero subspace and V.

If W is a subspace of V that is invariant under T, then T induces a linear transformation T_W of W into W defined by T_W(v) = T(v) for all v ∈ W. That is, as long as v ∈ W, T_W and T map v onto the same vector. The distinction between T_W and T is that T_W is defined only on W. The transformation T_W is called the restriction of T to W.
(a) V = K_1 ⊕ K_2 ⊕ ··· ⊕ K_r;
(b) each K_j is invariant under T;
(c) if T_j denotes the restriction of T to K_j, then the minimal polynomial of T_j is (x − λ_j)^{t_j}.

Proof. Consider the complete set of projections {F_1, F_2, ..., F_r}, where F_j = p_j(T). By Theorem 10.16, F_j is the projection of V onto K_j = F_j(V) along Σ_{i≠j} K_i. Thus V = K_1 ⊕ K_2 ⊕ ··· ⊕ K_r is the direct sum decomposition corresponding to this complete set of projections.
To establish (b), let v ∈ K_j and consider T(v). Since (T − λ_j)^{t_j} is a polynomial in T, it commutes with T to yield

    (T − λ_j)^{t_j} (T(v)) = T ((T − λ_j)^{t_j} (v)) = T(0) = 0,

so that T(v) ∈ K_j and K_j is invariant under T.

As for (c), the minimal polynomial of the restriction T_j divides (x − λ_j)^{t_j} and hence has the form (x − λ_j)^s with s ≤ t_j. Suppose that s < t_j. Any v ∈ V can be written as

    v = v_1 + v_2 + ··· + v_r

with v_i ∈ K_i. And since

    (T − λ_j)^s (v_j) = (T_j − λ_j)^s (v_j) = 0

for all v_j ∈ K_j, this means that p(T) = (T − λ_j)^s q_j(T) is the zero transformation on V. But this is a contradiction, since p(x) = (x − λ_j)^s q_j(x) has degree less than that of m(x) = (x − λ_j)^{t_j} q_j(x). Therefore, s = t_j. ■
Theorem 10.19  The linear operator T on V has a spectral decomposition if and only if the minimal polynomial m(x) of T has the form

    m(x) = (x − λ_1)(x − λ_2) ··· (x − λ_r).

Proof. If the minimal polynomial m(x) has the given form, then each t_i = 1 in equation (10.2). Hence K_i = V_{λ_i} for i = 1, 2, ..., r, and

    V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r}.

It follows from the proof of Theorem 7.20 that T is diagonalizable, and therefore T has a spectral decomposition by Theorem 10.8.

Conversely, suppose that T has a spectral decomposition. Then for any polynomial p(x),

    p(T) = p(λ_1)P_1 + p(λ_2)P_2 + ··· + p(λ_r)P_r

by Theorem 10.10. If p(λ_i) ≠ 0 for some i, p(T) has p(λ_i) as a nonzero eigenvalue by Theorem 10.6. Hence p(T) = Z if and only if p(λ_i) = 0 for i = 1, 2, ..., r. This implies at once that

    m(x) = (x − λ_1)(x − λ_2) ··· (x − λ_r). ■
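This criterion can be tested numerically (a sketch under our own assumptions, not the book's procedure): T has a spectral decomposition exactly when the product of A − λI over the *distinct* eigenvalues λ is zero, i.e., when the minimal polynomial is a product of distinct linear factors.

```python
import numpy as np

def has_spectral_decomposition(A, tol=1e-8):
    """True if prod over distinct eigenvalues of (A - lam*I) vanishes.
    Eigenvalues are rounded to cluster numerically-equal values; this is a
    heuristic adequate for small exact-integer examples."""
    lams = np.unique(np.round(np.linalg.eigvals(A), 6))
    prod = np.eye(A.shape[0], dtype=complex)
    for lam in lams:
        prod = prod @ (A - lam * np.eye(A.shape[0]))
    return np.allclose(prod, 0, atol=tol)

A1 = np.array([[1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]])
A2 = np.array([[2.0, 1.0], [0.0, 2.0]])  # minimal polynomial (x - 2)^2
print(has_spectral_decomposition(A1))  # True
print(has_spectral_decomposition(A2))  # False
```

A1 has minimal polynomial (x + 1)(x − 2) with distinct linear factors, while A2 does not, matching the theorem.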
The result that we have been working toward is now at hand.
of V onto V_{λ_j} along Σ_{i≠j} V_{λ_i}. Since K_j = V_{λ_j} for each j, P_j = F_j = p_j(T) for each j, and the proof is complete. ■
The next corollary follows readily from the fact that each projection in a spectral
decomposition of T is a polynomial in T.
One immediate and desirable consequence of Theorem 10.20 is that we now have a
systematic procedure available for finding the projections in a spectral decomposition.
This procedure, of course, is to determine the P_j by use of P_j = p_j(T).
Example 1 □ Let T be the linear transformation of R³ that has the matrix A relative to the standard basis, where

    A = [  1 −1 −1 ]
        [ −1  1 −1 ]
        [ −1 −1  1 ].
We shall determine whether or not T has a spectral decomposition, and shall obtain such a decomposition if there is one. The characteristic polynomial f(x) is given by

    f(x) = −(x + 1)(x − 2)².

It follows from Theorem 10.19 that T has a spectral decomposition if and only if

    m(x) = (x + 1)(x − 2).

A simple calculation shows that

    m(A) = A² − A − 2I = 0,

so T does indeed have a spectral decomposition. With λ_1 = −1, λ_2 = 2, the polynomials q_j(x) are given by q_1(x) = x − 2, q_2(x) = x + 1. The partial fraction decomposition

    1/m(x) = g_1(x)/(x + 1) + g_2(x)/(x − 2)

leads to g_1(x) = −1/3, g_2(x) = 1/3. Hence p_1(x) = −(1/3)(x − 2) and p_2(x) = (1/3)(x + 1). The projections P_1, P_2 thus have respective matrices E_1, E_2 relative to ε_3 given by

    E_1 = −(1/3)(A − 2I) = (1/3) [ 1 1 1 ]
                                 [ 1 1 1 ]
                                 [ 1 1 1 ]

and

    E_2 = (1/3)(A + I) = (1/3) [  2 −1 −1 ]
                               [ −1  2 −1 ]
                               [ −1 −1  2 ].
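The projection matrices of Example 1 can be verified with a few lines of NumPy (an illustration added here, not part of the text):

```python
import numpy as np

# Matrix of Example 1 and the projections E1 = -(1/3)(A - 2I), E2 = (1/3)(A + I).
A = np.array([[1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]])
I = np.eye(3)
E1 = -(A - 2 * I) / 3
E2 = (A + I) / 3

print(np.allclose(E1 + E2, I))           # complete: E1 + E2 = I
print(np.allclose(E1 @ E2, 0 * I))       # orthogonal: E1 E2 = 0
print(np.allclose(E1 @ E1, E1))          # idempotent projection
print(np.allclose(-1 * E1 + 2 * E2, A))  # spectral decomposition of A
```

All four checks print True, confirming A = (−1)E_1 + 2E_2.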
for T, N is given by

    N = T − D = Σ_{j=1}^{r} (T − λ_j) P_j,

and

    N^k = Σ_{j=1}^{r} (T − λ_j)^k P_j

for all positive integers k. Recall from the proof of Theorem 10.16 that (T − λ_j)^{t_j} P_j = Z, and thus N^k = Z if k ≥ t_j for all j.
Definition 10.22  A linear operator T is called nilpotent if T^k = Z for some positive integer k. The smallest positive integer t such that T^t = Z is the index of nilpotency of T.
Theorem 10.23  If the minimal polynomial of T factors into a product of linear factors (not necessarily distinct) over F, then T can be written as the sum T = D + N of a diagonalizable transformation D and a nilpotent transformation N, where D and N are polynomials in T, and consequently commute.
Proof. Let D and N be as given in the paragraph preceding Definition 10.22. From the expressions there for D and N and the fact that P_j = p_j(T), it is reasonably clear that D and N are polynomials in T and hence commute. Nevertheless, we furnish some additional details. The projections P_j commute with T since P_j = p_j(T). Now

    ND = [ Σ_{i=1}^{r} (T − λ_i)P_i ][ Σ_{j=1}^{r} λ_j P_j ]
       = Σ_{i=1}^{r} Σ_{j=1}^{r} (T − λ_i) λ_j P_i P_j,

and

    DN = [ Σ_{j=1}^{r} λ_j P_j ][ Σ_{i=1}^{r} (T − λ_i) P_i ]
       = Σ_{i=1}^{r} Σ_{j=1}^{r} λ_j P_j (T − λ_i) P_i.

But

    λ_j P_j (T − λ_i) P_i = (T − λ_i) λ_j P_i P_j,

so DN = ND. ■
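The D + N splitting can be demonstrated on a small matrix with eigenvalues 2, 2, 1 (a sketch: the projection polynomials below are derived here by partial fractions for this particular matrix, and are not taken from the text). With m(x) = (x − 2)²(x − 1), partial fractions of 1/m(x) give p_1(x) = (3 − x)(x − 1) and p_2(x) = (x − 2)²:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 0.0], [2.0, 3.0, 1.0]])
I = np.eye(3)
P1 = (3 * I - A) @ (A - I)     # projection onto K_1 (eigenvalue 2)
P2 = (A - 2 * I) @ (A - 2 * I) # projection onto K_2 (eigenvalue 1)

D = 2 * P1 + 1 * P2            # diagonalizable part, D = sum of lambda_j P_j
N = A - D                      # nilpotent part, N = sum of (T - lambda_j) P_j

print(np.allclose(P1 + P2, I))    # complete set of projections
print(np.allclose(N @ N, 0 * I))  # N^2 = 0, since t_1 = 2
print(np.allclose(D @ N, N @ D))  # D and N commute
```

All three checks print True, in agreement with Theorem 10.23.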
    A = λ_1E_1 + λ_2E_2 + ··· + λ_rE_r
Exercises 10.4
(a) A = [ −4 −3 −1 ]      (b) A = [  8   5  −5 ]
        [ −4  0 −4 ]              [  5   8  −5 ]
        [  8  4  5 ]              [ 15  15 −12 ]

(c) A = [   8  5  6  0 ]   (d) A = [  7  3  3  2 ]
        [   0 −2  0  0 ]           [  0  1  2 −4 ]
        [ −10 −5 −8  0 ]           [ −8 −4 −5  0 ]
        [   2  1  1  2 ]           [  2  1  1  3 ]
4. Let T be the linear operator on Rⁿ that has the given matrix A relative to the standard basis ε_n. Find the spectral decomposition of T.
(a) A = [ 8  5  −5 ]      (b) A = [  3  2 ]
        [ 5  8  −5 ]              [  1  4 ]
        [ 5 15 −12 ]              [ −2 −4 ]

(c) A = [  7  3  3  2 ]   (d) A = [  4 −1  0  1 ]
        [  0  1  2 −4 ]           [ −1  5 −1  0 ]
        [ −8 −4 −5  0 ]           [  0 −1  4 −1 ]
        [  2  1  2  3 ]           [  1  0 −1  5 ]
5. Write the given matrix A as the sum of a diagonalizable matrix and a nilpotent matrix.

(a) A = [  7  3  3  2 ]   (b) A = [   8  5  6  0 ]
        [  0  1  2 −4 ]           [   0 −2  0  0 ]
        [ −8 −4 −5  0 ]           [ −10 −5 −8  0 ]
        [  2  1  1  3 ]           [   2  1  1  2 ]

(c) A = [  1  1  0  0 ]
        [  0  1  1  0 ]
        [  0  0  1  1 ]
        [ −1  0  2  1 ]
6. Prove that the polynomial m(x) in the hypothesis of Theorem 10.13 is unique.

7. Prove that, for an arbitrary square matrix A, A and Aᵀ satisfy the same polynomial equations with scalar coefficients.

11. Let {P_1, P_2, ..., P_r} be the complete set of projections determined by P_j = p_j(T). With T = TP_1 + TP_2 + ··· + TP_r, show that (TP_i)·(TP_j) = Z if i ≠ j, and that p(T) = p(TP_1) + p(TP_2) + ··· + p(TP_r) for any polynomial p(x) with zero constant term.
10.5 Nilpotent Transformations
12. Let T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r be a spectral decomposition for the linear operator T on V. Prove that a linear operator S on V commutes with T if and only if every K_j is invariant under S.
    A = [ A_1  0  ]
        [ 0   A_2 ],

where A_1 is the matrix of T_W relative to {v_1, ..., v_k}.

If V = W_1 ⊕ W_2, where each of W_1 and W_2 is invariant under T, and if the basis A = {v_1, ..., v_k, v_{k+1}, ..., v_n} of V is chosen so that {v_1, ..., v_k} is a basis of W_1 and {v_{k+1}, ..., v_n} is a basis of W_2, then we also have T(v_j) = Σ_{i=k+1}^{n} a_{ij} v_i for j = k + 1, ..., n. Hence A_3 = 0, and the matrix of T relative to A is of the form

    A_A = [ A_1  0  ]
          [ 0   A_2 ].
    B = {u_11, ..., u_{1n_1}, u_21, ..., u_{2n_2}, ..., u_{r1}, ..., u_{rn_r}}

of the form

    A = [ A_1  0  ···  0  ]
        [ 0   A_2 ···  0  ]
        [ ⋮        ⋱   ⋮  ]
        [ 0   0  ···  A_r ],

where A_i is the matrix of the restriction of T to W_i. A matrix such as this A is called a diagonal block matrix.
The type of invariant subspace that we shall be mainly concerned with is the cyclic subspace, to be defined shortly.

Let v be any nonzero vector in V. Since a linearly independent subset of V can have at most n elements, there is a unique positive integer k such that {v, T(v), ..., T^{k−1}(v)} is linearly independent and T^k(v) is dependent on {v, T(v), ..., T^{k−1}(v)}. Then there are scalars a_0, a_1, ..., a_{k−1} such that T^k(v) = Σ_{i=0}^{k−1} a_i T^i(v). It follows from this that the subspace

    ⟨v, T(v), ..., T^{k−1}(v)⟩

is T-invariant (Problem 3).
Definition 10.24  Let v be a nonzero vector in V, and let k be the unique positive integer such that {v, T(v), ..., T^{k−1}(v)} is linearly independent and T^k(v) is dependent on {v, T(v), ..., T^{k−1}(v)}. The subspace ⟨v, T(v), ..., T^{k−1}(v)⟩ is denoted by C(v, T) and is called the cyclic subspace of v relative to T, or the cyclic subspace generated by v under T. The particular basis {T^{k−1}(v), ..., T(v), v} of C(v, T) is called the cyclic basis generated by v under T and is denoted by B(v, T).
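A cyclic basis can be generated mechanically (a sketch added here, not from the text; the function name and tolerance are our own):

```python
import numpy as np

def cyclic_basis(A, v, tol=1e-9):
    """Return [A^(k-1) v, ..., A v, v], the cyclic basis of Definition 10.24,
    where k is the largest integer with {v, A v, ..., A^(k-1) v} independent."""
    vecs = [v]
    while True:
        w = A @ vecs[-1]
        trial = np.column_stack(vecs + [w])
        # stop once the next power becomes dependent on the earlier ones
        if np.linalg.matrix_rank(trial, tol=tol) < len(vecs) + 1:
            break
        vecs.append(w)
    return vecs[::-1]  # ordered T^{k-1}(v), ..., T(v), v

# nilpotent example: a single 3 x 3 block with superdiagonal ones
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
basis = cyclic_basis(A, np.array([0.0, 0.0, 1.0]))
print(len(basis))  # 3: C(v, T) is all of R^3 here
```

For this A and v, the cyclic basis comes out as e_1, e_2, e_3, i.e., {A²v, Av, v}.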
and T(T^j(v)) = T^{j+1}(v) for j = 0, 1, ..., k − 2, the restriction of T to C(v, T) has the matrix

    A = [ a_{k−1}  1  0  ···  0 ]
        [ a_{k−2}  0  1  ···  0 ]
        [   ⋮      ⋮  ⋮       ⋮ ]
        [ a_1      0  0  ···  1 ]
        [ a_0      0  0  ···  0 ]

relative to B(v, T). The special form that this matrix A takes on for a nilpotent transformation T is especially useful.
    [ 0  1  0  ···  0 ]
    [ 0  0  1  ···  0 ]
    [ ⋮  ⋮  ⋮       ⋮ ]
    [ 0  0  0  ···  1 ]
    [ 0  0  0  ···  0 ]

relative to B(v, T).
Proof. With the notation of the paragraph just before the theorem, we shall show that T^k(v) = 0. Our proof of this fact is an "overkill": We show that the restriction T_C of T to C(v, T) has minimal polynomial x^k.

Since T is nilpotent on V, T_C is nilpotent, say with index s. The minimal polynomial of T_C must therefore divide x^s. Let x^r denote this minimal polynomial. Since T^{k−1}(v) is in the basis B(v, T), we have T_C^{k−1}(v) = T^{k−1}(v) ≠ 0, and therefore r ≥ k.

Consider the polynomial p(x) = a_0 + a_1 x + ··· + a_{k−1} x^{k−1} − x^k. We shall show that

    S = p(T_C) = a_0 + a_1 T_C + ··· + a_{k−1} T_C^{k−1} − T_C^k
      = 0.

Hence S = p(T_C) is the zero transformation, and x^r divides p(x). This requires that r ≤ k. Therefore r = k, and the minimal polynomial of T_C is x^k. In particular, T^k(v) = T_C^k(v) = 0. ■
Our principal concern is with the connection between cyclic subspaces and the kernels
of the powers of a nilpotent transformation. Before proceeding to our main result in
this direction, some preliminary lemmas are in order. These lemmas by themselves are
of little consequence, but they are essential to the proof of Theorem 10.28.
Proof. The equality {0} = W_0 follows from the fact that T⁰ is the identity transformation, and W_t = V since T has index t on V.

Since T^{j−1}(v) = 0 implies T^j(v) = T(T^{j−1}(v)) = T(0) = 0, we have W_{j−1} ⊆ W_j for j = 1, 2, ..., t. Let u be a vector in V such that T^{t−1}(u) ≠ 0, and let 1 ≤ j ≤ t. Then T^{t−j}(u) is in W_j since T^j(T^{t−j}(u)) = 0, and T^{t−j}(u) is not in W_{j−1} since T^{j−1}(T^{t−j}(u)) = T^{t−1}(u) ≠ 0. Thus W_{j−1} ≠ W_j. ■
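The strictly increasing chain of kernels {0} = W_0 ⊂ W_1 ⊂ ··· ⊂ W_t = V can be tabulated by rank-nullity (a small illustration added here, not part of the text):

```python
import numpy as np

# Nilpotent matrix of index 3: one 3 x 3 superdiagonal block plus a 1 x 1 zero block.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])

n = A.shape[0]
dims = []
for j in range(n + 1):
    Aj = np.linalg.matrix_power(A, j)
    # dim W_j = nullity of A^j = n - rank(A^j)
    dims.append(int(n - np.linalg.matrix_rank(Aj)))
print(dims)  # [0, 2, 3, 4, 4]
```

The dimensions increase strictly (0 < 2 < 3 < 4) until W_t = V is reached at the index t = 3, as Lemma 10.26 predicts.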
Lemma 10.27  Let the subspaces W_i be as in Lemma 10.26, with j ≥ 2. Suppose that {u_1, ..., u_k} is a basis of W_{j−2}, and that {w_1, ..., w_r} is a linearly independent set of vectors in W_j such that

    ⟨w_1, ..., w_r⟩ ∩ W_{j−1} = {0}.

Then {u_1, ..., u_k, T(w_1), ..., T(w_r)} is a linearly independent subset of W_{j−1}.

Proof. The set

    {u_1, ..., u_k, T(w_1), ..., T(w_r)}

is contained in W_{j−1}. Suppose that

    Σ_{i=1}^{k} b_i u_i + Σ_{m=1}^{r} c_m T(w_m) = 0.

Then

    Σ_{m=1}^{r} c_m T(w_m) = − Σ_{i=1}^{k} b_i u_i   is in W_{j−2},

so that T^{j−2}(Σ_{m=1}^{r} c_m T(w_m)) = 0. Thus Σ_{m=1}^{r} c_m w_m is in W_{j−1} ∩ ⟨w_1, ..., w_r⟩ = {0}, and each c_m = 0 since {w_1, ..., w_r} is linearly independent. It follows that Σ_{i=1}^{k} b_i u_i = 0, so each b_i = 0 as well. ■
    A = [ A_1  0  ···  0  ]
        [ 0   A_2 ···  0  ]
        [ ⋮        ⋱   ⋮  ]
        [ 0   0  ···  A_s ],

where each A_i is a square matrix that has all superdiagonal elements 1 and all other elements 0, A_1 is of order t, and order(A_i) ≥ order(A_{i+1}) for each i. The matrix A is uniquely determined by T.
    A = {v_11, ..., v_{1s_1}, v_21, ..., v_{2s_2}, ..., v_{t1}, ..., v_{ts_t}}

of V such that {v_11, ..., v_{js_j}} is a basis of W_j for j = 1, ..., t. Such a basis can be obtained by extending a basis {v_11, ..., v_{1s_1}} of W_1 to a basis

    {v_11, ..., v_{1s_1}, v_21, ..., v_{2s_2}}

of W_2, then extending this basis to a basis of W_3, and so on. For each j, the elements of A that are in W_j but not in W_{j−1} are precisely those in the segment v_{j1}, ..., v_{js_j}. We shall replace each of these segments by a new set of vectors in order to obtain a basis A′ with certain properties. Roughly, the idea is this. The vectors in segment j − 1 will be replaced by the images of the vectors in the next segment to the right, and then this set will be extended to a basis of W_{j−1}.

No change is required in the last s_t vectors of A. That is, we put

    v′_{t1} = v_{t1}, v′_{t2} = v_{t2}, ..., v′_{ts_t} = v_{ts_t}.
If Σ_{m=1}^{s_t} c_m v′_{tm} is in W_{t−1}, then

    Σ_{m=1}^{s_t} c_m v′_{tm} = Σ_{i=1}^{t−1} Σ_{j=1}^{s_i} a_{ij} v_{ij}.

    v′_{t−1,1} = T(v′_{t1}), v′_{t−1,2} = T(v′_{t2}), ..., v′_{t−1,s_t} = T(v′_{ts_t}).
    A′ = {v′_11, ..., v′_{1s_1}, v′_21, ..., v′_{2s_2}, ..., v′_{t1}, ..., v′_{ts_t}}

such that

    v′_{j−1,1} = T(v′_{j1}), ..., v′_{j−1,s_j} = T(v′_{js_j})

for each j.
The vectors in A′ are now rearranged so that the first vectors from each segment are written first, followed in order by the second vectors from each segment, and so on until the vectors in the last segment are exhausted. Whenever a segment of vectors is exhausted, we continue the same procedure with the remaining segments, if there are any. This process leads to the basis

    {v′_11, v′_21, ..., v′_{t1}, ..., v′_{1s_t}, ..., v′_{ts_t}, v′_{1,s_t+1}, ..., v′_{t−1,s_t+1}, ..., v′_{1s_1}, ..., v′_{ks_1}}.
Since

    v′_11 = T(v′_21), v′_21 = T(v′_31), ..., v′_{t−1,1} = T(v′_{t1}),

we have

    v′_11 = T^{t−1}(v′_{t1}), v′_21 = T^{t−2}(v′_{t1}), ..., v′_{t−1,1} = T(v′_{t1}),

and {v′_11, v′_21, ..., v′_{t1}} is the same as the cyclic basis B(v′_{t1}, T) of C(v′_{t1}, T). Similarly, those v′_{ij} with the same second subscript (i.e., with the same position in the segments of A′) are the same as the vectors in the cyclic basis B(v′_{mj}, T) of C(v′_{mj}, T):

    B(v′_{mj}, T) = {v′_{1j}, v′_{2j}, ..., v′_{mj}},

where m is the number of the last segment in A′ that has at least j elements. Thus

    B = {B(v′_{t1}, T), ..., B(v′_{ts_t}, T), B(v′_{t−1,s_t+1}, T), ..., B(v′_{1,s_2+1}, T), ..., B(v′_{1s_1}, T)},
where B(v′_{ij}, T) is the cyclic basis of C(v′_{ij}, T). By Theorem 10.25, the matrix of the restriction of T to C(v′_{ij}, T) is of the form

    A_j = [ 0  1  0  ···  0 ]
          [ 0  0  1  ···  0 ]
          [ ⋮  ⋮  ⋮       ⋮ ]
          [ 0  0  0  ···  1 ]
          [ 0  0  0  ···  0 ].

Since each C(v′_{ij}, T) is T-invariant and V is the direct sum of the C(v′_{ij}, T), the matrix of T with respect to B is of the form A given in the statement of the theorem. There are s_1 submatrices A_j in A, one for each C(v′_{ij}, T), and s_1 = dim(W_1) = nullity(T). The matrix A_1 is of order t, since B(v′_{t1}, T) has t elements. The inequality
    M = [ 3  0 −2 −3  1 ]
        [ 4 −1 −1 −4  0 ]
        [ 4 −1 −1 −4  0 ]
        [ 1  0 −1 −1  1 ]
        [ 2 −1  0 −2  0 ]
of W_2. [Incidentally, the vector (0,1,1,0,1) cannot be used.] Next we replace the segment (1,0,0,1,0), (1,2,2,0,1) by T(−2,−1,−1,−1,0), T(0,1,1,0,0) to obtain the basis

    M = {(−1,−2,−2,0,−1), (−2,−2,−2,−1,−1); (−2,−1,−1,−1,0), (0,1,1,0,0); (0,0,1,0,0)}.

Rearranged as described in the proof of Theorem 10.28, this yields

    B = {(−1,−2,−2,0,−1), (−2,−1,−1,−1,0), (0,0,1,0,0); (−2,−2,−2,−1,−1), (0,1,1,0,0)}.
The matrix of T relative to B is then

    A = [ A_1  0  ]
        [ 0   A_2 ],

where

    A_1 = [ 0 1 0 ]      A_2 = [ 0 1 ]
          [ 0 0 1 ]            [ 0 0 ].
          [ 0 0 0 ]
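The worked example can be verified numerically (an added illustration, not from the text): placing the vectors of B as the columns of a matrix P, the change of basis P⁻¹MP should produce exactly the diagonal block matrix diag(A_1, A_2).

```python
import numpy as np

M = np.array([[3.0,  0.0, -2.0, -3.0, 1.0],
              [4.0, -1.0, -1.0, -4.0, 0.0],
              [4.0, -1.0, -1.0, -4.0, 0.0],
              [1.0,  0.0, -1.0, -1.0, 1.0],
              [2.0, -1.0,  0.0, -2.0, 0.0]])
B = [(-1, -2, -2, 0, -1), (-2, -1, -1, -1, 0), (0, 0, 1, 0, 0),
     (-2, -2, -2, -1, -1), (0, 1, 1, 0, 0)]
P = np.array(B, dtype=float).T          # basis vectors as columns

# Expected block form: one 3 x 3 and one 2 x 2 superdiagonal block.
expected = np.zeros((5, 5))
expected[0, 1] = expected[1, 2] = expected[3, 4] = 1.0

print(np.allclose(np.linalg.inv(P) @ M @ P, expected))  # True
```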
Exercises 10.5
1. Let T be the nilpotent transformation of Rⁿ that has the given matrix relative to the standard basis. Find a basis B of Rⁿ that satisfies the conditions of Theorem 10.28, and exhibit the matrix A described in that theorem.
(a) [  4 −1 −19  3 ]      (b) [ 2  0 −1 −2 ]
    [ −3  0  12 −2 ]          [ 4 −1 −1 −4 ]
    [  1  0  −4  1 ]          [ 4 −1 −1 −4 ]
    [  0  0   0  0 ]          [ 0  0  0  0 ]
10.6 The Jordan Canonical Form
(c) [ 1 −1 −2  0  1 ]      (d) [ 3  0 −2 −2  1 ]
    [ 0  1  3  0 −2 ]          [ 4 −1 −1 −2  0 ]
    [ 2  1  1  0  0 ]          [ 4 −1 −1 −3  0 ]
    [ 1  0  0  0  1 ]          [ 1  0 −1 −1  1 ]
    [ 1  1  2  0 −1 ]          [ 2 −1  0 −2  0 ]
Definition 10.29  The Jordan matrix of order k with eigenvalue λ is the k × k matrix

    [ λ  1  0  ···  0  0 ]
    [ 0  λ  1  ···  0  0 ]
    [ 0  0  λ  ···  0  0 ]
    [ ⋮  ⋮  ⋮       ⋮  ⋮ ]
    [ 0  0  0  ···  λ  1 ]
    [ 0  0  0  ···  0  λ ].
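A Jordan matrix of this form is easy to construct and probe numerically (an added sketch, not from the text). The key property is that J − λI is nilpotent of index k, so the minimal polynomial of J is (x − λ)^k:

```python
import numpy as np

def jordan_block(lam, k):
    """The Jordan matrix of order k with eigenvalue lam (Definition 10.29)."""
    J = lam * np.eye(k)
    J += np.diag(np.ones(k - 1), 1)  # ones on the superdiagonal
    return J

J = jordan_block(-2.0, 3)
N = J + 2.0 * np.eye(3)  # N = J - lam*I is nilpotent of index 3
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # True
print(np.allclose(np.linalg.matrix_power(N, 2), 0))  # False
```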
Theorem 10.30  Let T be a linear operator on V whose characteristic polynomial factors as

    f(x) = (−1)^n (x − λ_1)^{m_1} ··· (x − λ_r)^{m_r}

over F, and let n_i be the geometric multiplicity of λ_i. Then there exists a basis of V such that the matrix of T relative to this basis has the following form:

    J = [ J_1  0  ···  0  ]
        [ 0   J_2 ···  0  ]
        [ ⋮        ⋱   ⋮  ]
        [ 0   0  ···  J_r ],

where

    J_i = [ J_{i1}   0     ···  0       ]
          [ 0      J_{i2}  ···  0       ]
          [ ⋮              ⋱    ⋮       ]
          [ 0      0     ···  J_{in_i} ],

and each J_{ik} is a Jordan matrix with eigenvalue λ_i such that J_{i1} has order t_i and order(J_{ik}) ≥ order(J_{i,k+1}) for all k. For a prescribed ordering of the eigenvalues, the matrix J is uniquely determined by T and is called the Jordan canonical matrix for T.
With the same notation as used in Section 10.4, let K_i denote the kernel of (T − λ_i)^{t_i}, and let F_i = p_i(T). Then V = K_1 ⊕ ··· ⊕ K_r, and F_i is the projection of V onto K_i along Σ_{j≠i} K_j, by Theorem 10.16. Hence any nonzero vector in K_i is an eigenvector of F_i corresponding to the eigenvalue 1, and the restriction of F_i to K_j is the zero transformation if i ≠ j. For the time being, let d_i denote the dimension of K_i. With every choice of bases B_i = {u_{i1}, ..., u_{id_i}} for the subspaces K_i, the set B = B_1 ∪ B_2 ∪ ··· ∪ B_r is a basis of V, and the diagonalizable part D of T has the matrix

    K = [ λ_1 I_{d_1}     0        ···  0        ]
        [ 0          λ_2 I_{d_2}   ···  0        ]
        [ ⋮                        ⋱    ⋮        ]
        [ 0          0        ···  λ_r I_{d_r}   ]

relative to B.
Consider now the matrix of N relative to a basis of this type. Since

    (T − λ_i)F_i(K_i) ⊆ (T − λ_i)(K_i) ⊆ K_i,

each K_i is invariant under N. By Theorem 10.28, a basis of K_i can be chosen so that the restriction N_i of N to K_i has a matrix of the form

    L_i = [ L_{i1}   0     ···  0       ]
          [ 0      L_{i2}  ···  0       ]
          [ ⋮              ⋱    ⋮       ]
          [ 0      0     ···  L_{in_i} ],

where each L_{ij} is of the form

    L_{ij} = [ 0  1  0  ···  0 ]
             [ 0  0  1  ···  0 ]
             [ ⋮  ⋮  ⋮       ⋮ ]
             [ 0  0  0  ···  1 ]
             [ 0  0  0  ···  0 ],

L_{i1} has order t_i, and order(L_{ij}) ≥ order(L_{i,j+1}) for all j. The number of diagonal blocks L_{ij} in L_i is the nullity of N_i. Since N_i is the same as the restriction of T − λ_i to K_i, and since the kernel of T − λ_i is contained in K_i, the nullity of N_i is the geometric multiplicity n_i of λ_i.

Relative to the basis

    B = {u_11, ..., u_{1d_1}, u_21, ..., u_{2d_2}, ..., u_{r1}, ..., u_{rd_r}}

chosen in this way, N has the matrix

    L = [ L_1  0  ···  0  ]
        [ 0   L_2 ···  0  ]
        [ ⋮        ⋱   ⋮  ]
        [ 0   0  ···  L_r ],

since each K_i is invariant under N. Thus T = D + N has the matrix

    J = K + L = [ λ_1 I_{d_1} + L_1     0               ···  0               ]
                [ 0               λ_2 I_{d_2} + L_2     ···  0               ]
                [ ⋮                                     ⋱    ⋮               ]
                [ 0               0               ···  λ_r I_{d_r} + L_r    ].

But det(J − xI) = f(x) since J represents T. Therefore d_i = m_i for each i. Letting J_i = λ_i I_{m_i} + L_i, the matrix J is in the required form. ■
Proof. Let A be a matrix that satisfies the given conditions. With any given choices of an n-dimensional vector space V over F and a basis of V, A determines a unique linear operator T on V. Any such linear operator has f(x) as its characteristic polynomial and m(x) as its minimal polynomial. Since any n-dimensional vector space over F is isomorphic to Fⁿ, we may assume without loss of generality that T is a linear operator on Fⁿ. The projections F_i, the subspaces K_i, and the operators D and N are determined independently of the choice of basis in Fⁿ. Thus the matrix J is uniquely determined by A. ■
The proof of Theorem 10.30 furnishes a method for obtaining the basis and the
matrix J described in that theorem, but several short-cuts can be made in the procedure.
This is illustrated in the following example.
Example □ Let T be the linear operator on R⁶ that has the matrix

    A = [ −1  1  0  0  0  0 ]
        [ −1 −1  1  0  0  0 ]
        [  0  1 −1  0  0  0 ]
        [ −2  1  1 −1  1  0 ]
        [  3  0 −1 −1 −3  0 ]
        [ −1  1  1  0  0 −2 ]

relative to ε_6. We shall (a) find a basis B′ of R⁶ such that T has the Jordan canonical matrix J relative to B′, (b) determine the matrix J, and (c) find a matrix P such that P⁻¹AP = J.
The characteristic polynomial of A is f(x) = (x + 1)³(x + 2)³, so λ_1 = −1 and λ_2 = −2 are the distinct eigenvalues of T. It is actually not necessary to find the minimal polynomial.
Our first step is to find a basis B′_1 for the kernel K_1 of (T + 1)^{t_1} that is of the type in the proof of Theorem 10.30. We have seen that the restriction N_1 of N to K_1 is the same as the restriction of T − λ_1 to K_1. It follows that (N_1)^j is the restriction of (T − λ_1)^j to K_1 for each positive integer j. Since the kernel W_j of (T − λ_1)^j is contained in the kernel W_{j+1} of (T − λ_1)^{j+1}, we begin by finding a basis of the kernel W_1 of T − λ_1, extending to a basis of W_2, and so on. Since the kernel of T − λ_1 is contained in K_1, the kernel of the restriction of T − λ_1 to K_1 is the same as the kernel of T − λ_1.

The reduction of A − λ_1 I = A + I to row-echelon form yields the basis {(1,0,1,0,1,0)} of the kernel W_1. By use of the row-echelon form of (A + I)², this extends to the basis {(1,0,1,0,1,0); (1,1,1,1,0,1)} of W_2. Repetition of this procedure with (A + I)³ produces the basis

    {(1,0,1,0,1,0), (1,1,1,1,0,1), (0,0,1,0,0,0)}

of W_3. Upon finding the dimension of W_4, we discover that W_3 = W_4. By Lemma 10.26, this indicates that the index of N_1 is t_1 = 3. Following the procedure of Theorem 10.28, we replace (1,1,1,1,0,1) by

    N_1(0,0,1,0,0,0) = (T − λ_1)(0,0,1,0,0,0) = (0,1,0,1,−1,1)

and (1,0,1,0,1,0) by

    N_1²(0,0,1,0,0,0) = (1,0,1,0,1,0)

to obtain

    B′_1 = {(1,0,1,0,1,0), (0,1,0,1,−1,1), (0,0,1,0,0,0)}.
The matrix of N_1 relative to this basis is

    [ 0 1 0 ]
    [ 0 0 1 ]
    [ 0 0 0 ].
Following the same procedure with the eigenvalue λ_2 = −2, and letting W_j = ker(N_2^j) = ker(T + 2)^j, we find

    W_1 = ⟨(0,0,0,1,−1,0), (0,0,0,0,0,1)⟩,
    W_2 = ⟨(0,0,0,1,−1,0), (0,0,0,0,0,1); (0,0,0,0,1,0)⟩.

We find that W_2 = W_3, and this indicates that t_2 = 2. We then replace the first vector in the basis of W_2 by

    N_2(0,0,0,0,1,0) = (T + 2)(0,0,0,0,1,0) = (0,0,0,1,−1,0)

to obtain

    B′_2 = {(0,0,0,1,−1,0), (0,0,0,0,1,0), (0,0,0,0,0,1)}.

The matrix of N_2 relative to this basis is

    [ 0 1 0 ]
    [ 0 0 0 ]
    [ 0 0 0 ].
The desired basis B′ is thus given by

    B′ = {(1,0,1,0,1,0), (0,1,0,1,−1,1), (0,0,1,0,0,0), (0,0,0,1,−1,0), (0,0,0,0,1,0), (0,0,0,0,0,1)},

and the Jordan canonical matrix is

    J = [ λ_1 I_{m_1} + L_1        0               ]
        [        0          λ_2 I_{m_2} + L_2     ]

      = [ −1  1  0  0  0  0 ]
        [  0 −1  1  0  0  0 ]
        [  0  0 −1  0  0  0 ]
        [  0  0  0 −2  1  0 ]
        [  0  0  0  0 −2  0 ]
        [  0  0  0  0  0 −2 ].
The matrix of transition from ε_6 to B′ is

    P = [ 1  0  0  0  0  0 ]
        [ 0  1  0  0  0  0 ]
        [ 1  0  1  0  0  0 ]
        [ 0  1  0  1  0  0 ]
        [ 1 −1  0 −1  1  0 ]
        [ 0  1  0  0  0  1 ].
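The computation can be checked mechanically (an added illustration, not part of the text): with the vectors of B′ as the columns of P, the similarity P⁻¹AP must reproduce the Jordan matrix J found above.

```python
import numpy as np

A = np.array([[-1.0,  1.0,  0.0,  0.0,  0.0,  0.0],
              [-1.0, -1.0,  1.0,  0.0,  0.0,  0.0],
              [ 0.0,  1.0, -1.0,  0.0,  0.0,  0.0],
              [-2.0,  1.0,  1.0, -1.0,  1.0,  0.0],
              [ 3.0,  0.0, -1.0, -1.0, -3.0,  0.0],
              [-1.0,  1.0,  1.0,  0.0,  0.0, -2.0]])
P = np.array([[1.0,  0.0, 0.0,  0.0, 0.0, 0.0],
              [0.0,  1.0, 0.0,  0.0, 0.0, 0.0],
              [1.0,  0.0, 1.0,  0.0, 0.0, 0.0],
              [0.0,  1.0, 0.0,  1.0, 0.0, 0.0],
              [1.0, -1.0, 0.0, -1.0, 1.0, 0.0],
              [0.0,  1.0, 0.0,  0.0, 0.0, 1.0]])
J = np.linalg.inv(P) @ A @ P

# Expected Jordan form: a 3-block for -1 and a 2-block plus a 1-block for -2.
J_expected = np.diag([-1.0, -1.0, -1.0, -2.0, -2.0, -2.0])
J_expected[0, 1] = J_expected[1, 2] = J_expected[3, 4] = 1.0
print(np.allclose(J, J_expected))  # True
```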
    A = [ 2 1 0 ]
        [ 0 2 0 ]
        [ 2 3 1 ]
(c) A = [ 1 1 0 0 ]      (d) A = [  3  1  0 0 ]
        [ 0 1 1 0 ]              [  0  0  1 0 ]
        [ 0 0 1 1 ]              [ −1 −3  3 0 ]
        [ 1 0 2 1 ]              [  1  3 −1 2 ]
3. For each part of Problem 2, let T be the linear transformation of R⁴ that has the matrix A relative to ε_4. Find a basis of R⁴ such that the matrix of T relative to this basis is the Jordan canonical matrix J for T, and write down a matrix P such that P⁻¹AP = J.
4. Use the results of Problems 2 and 3 to write each matrix A in Problem 2 as the
sum of a diagonalizable matrix and a nilpotent matrix.
7. (a_1, a_2, a_3) = a_1e_1 + a_2e_2 + a_3e_3
2. The set of all vectors with components that satisfy a given equation in 5 is a subspace of the type in 4. The set of all vectors with components that satisfy the system of equations in 5 is the intersection of these m subspaces, and this intersection is a subspace by Theorem 1.11.
3. ⋃_{λ∈C} M_λ = {x ∈ R | −1 < x < 1},  ⋂_{λ∈C} M_λ = ∅
Answers to Selected Exercises
[Figure: sketch of the vector sum u + v, with the point (4, −4) labeled.]
3. ⋂ M_λ is the origin. ⋃ M_λ is the set of all points except those with coordinates (0, y), where y ≠ 0.
13. {u_1, u_2}, where δ_ij is the Kronecker delta and u_1 = (δ_11, δ_12, ..., δ_1n), u_2 = (δ_21, δ_22, ..., δ_2n)
3. Replace the second vector by the sum of the second vector and (—2) times the
first vector.
9. E_1: replace the third vector by the sum of the third and (2) times the second.
E_2: replace the second vector by the sum of the second and (3) times the first.
E_3: replace the first vector by the sum of the first and (−1) times the fourth.
E_4: multiply the fourth vector by 2.
11. Use the inverses of the elementary operations in Problem 6 in reverse order.
13. The elementary operation of type II which replaces the first vector by the sum of
the first vector and zero times the second vector is the identity operation.
3. (a) 2 (c) 3
6. (a) (A) = (B) (c) (A) φ (B) (e) (A) = (B) 7. (a) 3
9. (a) E_1: replace the second vector by the sum of the second vector and (−1) times the first.
E_2: replace the first vector by the sum of the first and second.
E_3: multiply the second vector by (−1).
E_4: replace the first vector by the sum of the first and (−1/2) times the second.
E_5: multiply the first vector by 2.
(c) E_1: interchange the first and second vectors.
E_2: replace the second vector by the sum of the second vector and (−4) times the first vector.
E_3: replace the first vector by the sum of the first vector and (−1) times the second vector.
E_4: replace the first vector by the sum of the first vector and (−1) times the second vector.
E_5: multiply the second vector by 2.
1 l]
[2 1 1 -1 1 0
5. (a) (c) (e)
[3 2 2 6 1 -1
1 1 1
6. (a) {(4,4), ( - 1 , 1 ) }
(c) {(0,0,0,0), (0,1,4,4), (8, - 2 , 8 , - 1 6 ) , (5,3,10, - 4 ) , (5,2,6, - 8 ) }
(e) { ( 0 , 1 , - 4 ) , (-1,2,4), ( 1 , - 2 , 0 ) }
3 -8 -5
7. (a) is a matrix of transition from the first set to the second set.
0 4 4
(c) No matrix of transition exists.
2
8. (a) (7,14) (c) (2,4,-14,1) 9. (a) (c) -2
-1
[29 - 5 _
9 -18 11 12
1. (a) 6 -6 (c) (e)
29 - 2 -75 - 7
L
[32 - 5
4 - 3 - 4 2
1 -6 -5
- 1 0 32 - 2 5 2
2. (a) BA does not exist. (c) 12 - 2 -11 (e)
- 3 6 69 - 2 4 - 6
18 32 8
-16 12 16 - 8
1 2 2 1
3. (a) n = r (c) n = r and m = t b. A ,B
2 1 1 2
2 1 1 1
7. A = ,B =
2 1 -2 -2
1 2 3 2 1 4
8. A = ,B = ,C =
2 4 2 4 3 3
11. x_1 − 2x_2 + x_3 = 4
    2x_1 + 3x_3 = 5
    x_1 + 4x_2 − x_3 = 6
1 0 0
3. (a) 0 1 0
0 5 1
1 -4 0 0 0 1 1 0 0
4. (a) 0 1 0 (c) Not elementary (e) 0 1 0 (g) 0 1 0
1
0 0 1 1 0 0 0 0 ·>
0 1 1 0 1 0^
5. A~
1 0
.° i . -2 1j
1 0 1 4 -1 0 1 0 -5 0 "i-f" i o]
6. (a) (c)
[θ 4 0 1 0 1
Λ K 0 1 0 1 2 l)
-2 - 3
7. (a) (c) 8. (a) (c)
1 1
1 -3 0
1 -2
13. A 15. 0 1 0
i - l 0
0 0 1
1 -2
3. (a)
0 1
5. Not always. Let

    A = [ 1 0 ]      M = [  1 0 0 ]
        [ 0 0 ]          [ −2 1 0 ]
        [ 1 1 ]          [  0 0 1 ].

Then A is the matrix of transition from ε_3 to A = {(1,0,1), (0,0,1)}, and

    MA = [  1 0 ]
         [ −2 0 ]
         [  1 1 ]

is the matrix of transition from ε_3 to B = {(1,−2,1), (0,0,1)}. Now (1,−2,1) is in B but not in ⟨A⟩ since it has a nonzero second component. Thus ⟨B⟩ ≠ ⟨A⟩ in this case.
1 0 0 0
0 0 0 1 0 0 0 2 0 0 0
1 0 0 0 1 0 0 0 1 0 0
6. (a) (c) (e)
2 0 0 -1 0 0 0 1 2 0 0
0 1 0 -1 2 0 0 0 0 1 0
1 1 1 0
0 0 0 0
1 0 0 0
(g)
0 0 0 0
0 0 1 0 0
7 3 41
1 0 -1 -1 1 8 8 8
-1 2 -1
2 1 -1 1 n 1 1 7
7. (a) 1 1 0 (c) (e) u 4 4 4
0 0 1 0 n 3 1 5
0 0 1 u 8 8 8
0 0 0 1 0 0 0 1
1 1
2 2 -2 -1 0
1 1
2 2 -1 -1 1
(g) 0 0 1 0 1
0 0 0 1 -2
3 3
2 o 1 0 0
1. (a) Multiply the first row by 4. (c) Interchange the first and third rows.
2. (a) Not in reduced row-echelon form. (c) Not in reduced row-echelon form.
(e) Not in reduced row-echelon form.
41
1 0 0 8
7
1 0 1 1 0 1 1 0 1 0 8
5
0 1 0 0 1 1 -1 0 0 1 8
(c) (e)
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0
1 0 2 1 0
0 1 1 1 0
0 0 0 0 1
0 0 0 0 0
0 - 1 0 1 0 0 0
0 1 0 - 2 1 0 0
(c)
0 -2 1 1 0 1 0
1 0 0 1 - 2 0 1
7
1 0 "8 0 0
1
0 0 4 0 0 0 0
3
0 0 8 0-8- 0 0 0
(g)
-1 0 -1 0 1 1 0 1
-1 0 -2 1 0 0 1 0
-2 1 0 0 0 0
10
9
(c)
4 14 -7
3. 6 29 - 1 4
5 25 - 1 2
"l 0 \
5. (a) Q = o-l o an( i? are not column-equivalent.
(c) A and
o 19
1 Z
8
0 0 0-1
0 0 -I
6. (a) A and B are not row-equivalent. (c) P =
o i -1
1 0 -f
1 0 0 0 0 0 0 1 0 0 0
B = A 2
0 1 0 3 0 1 0 0 1 0 1
1 0 1 0 1 2 1 o o A 0 0
1. (a) 2 (c) 2
2. (a) {(1,0, - 1 , 2 ) , (0,1,2, - 1 ) } (c) {(1,0, - 1 , - 2 ) , (0,1,1,1)}
3. In each part except (d), A and B are equivalent.
5. Only A and B are equivalent. 6. (a) No (c) Yes
1 0 0
7. B = {(1,2,0) > (0,0,1),(0,1,0)},P = 0 0 1
-2 1 0
0 0 0 1
1 0 0
0 1 - 2 0
9. (a) P = 0 1 0 ,Q =
0 0 1 0
1 -1 1
1 - 2 0 0
1 0 0 0 0 0 0 1
0 1 0 0 3 _9 I
Δ
11. P ,Q = 2 2
0 0 1 0 0 1 0 0
-1 0 0 1 -5 10 0 -5
Γΐ 0 0 1 0 0 0 0 0 0 0 0
2. I 0 0 î 0 0 > 1 0 î 0 1 î 0 0 î 0 0 3. mn
[o o 0 0 0 0 0 0 1 0 0 1
4. (a) The set is not a basis for P_2. (c) The set is a basis for P_2.
1 0 1
5. (a) 0 1 1
1 0 0
[1 0 0 1]
(c) Î or any subset consisting of two distinct vectors.
Il 1 -1 i l
(e) {pi{x),P2(x),P3(x),P4{x)}
7. (a) {(1,0,0,0), (0,0,1,0)} is a basis for W .
1 2 1 5
15. W is not a subspace of R2X2· The vectors u = and v = are in
3 4 6 7
2 7
W , but u + v = is not in W because its first row, first column element
9 11
is not equal to 1.
17. W is not a subspace of R_{2×2}. The vectors

    u = [ 1 0 ]   and   v = [ 1  0 ]
        [ 0 2 ]             [ 0 −2 ]

are in W, but

    u + v = [ 2 0 ]
            [ 0 0 ]

is not in W since it is nonzero and not invertible.
2 3 2 5
19. W is not a subspace of R2X2· The vectors u = and v = are
4 -2 6 2
4 8
in W, but u 4- v = is not in W since 4 2 φ 0 2
10 0
3. (a) r = 2, f(a_1(1,−1,1) + a_2(0,1,0)) = (a_1, a_2)
(c) r = 3, f(a_1(2,0,4,−1) + a_2(5,−1,11,8) + a_3(0,1,−7,9)) = (a_1, a_2, a_3)
(e) r = 2, f(a_1p_1(x) + a_2p_2(x)) = (a_1, a_2)
1 0 1 1
(g) r = 2,/ U + a2 (01,^2)
-1 1 1 1
[-1 -2 3 3]
2. (a) {p1(x),p2{x),p3(x)} 3. (a) î
1-2 - 1 2 2 1
2 1 1
4. (a) (A) = (B) (b) <Λ) φ (Β) 5. (a) 0 1 -3
2 0 2
21. (b) n + 1
22. 1. 0  3. {(1,−1,−2)}  5. 0  7. {(18,1,−4,0)}
9. {(2,1,0,0), (1,0,−1,1)}  11. {(−3,1,0,0,0), (−2,0,−2,1,0), (−2,0,0,0,1)}
13. {(−2,1,0,0,0)}
1 1 -1 2 4 -2
0 5 -2 -1 -2 0
5. 7. (3,0,2) 11. § + | x - f x 2 - 2 x 3
4 0 1 0 -1 2
2 3 1 0 0 1
4 31
2
Γΐ 0 2
13. (3,5) 15. 17. -1 0
1 0
L
1 3J
3 il
19. (a) {(1,0, - 3 ) , (0,1,2)} (c) {(1,0,1,0), (0,1,1,0), (0,0,0,1)}
20. (a) {(1, - 1 , 1 , 0 ) , (3, - 2 , 0 , 1 ) } (c) {(6, -f, 1,0,0), ( - 7 , f, 0,1,0)}
21. (a) { ( l , 0 , l ) , ( 0 , l , - i ) }
1. 1 + 2x + 3x 2 3. B = {1 + x + x 2 ,2 + x, 2 + x + x2}
1 0 0 0
4 1 0 0
5. (a) (b)(-2,l) 7. 0 1 0 0 9.
-4 0 1 0
0 0 0 0
Ί
3
2 0 0 4 2
[-1 0
11. 1 2 0 13. 15. -1 0
L
0 5J
0 0 1 3 1
Β' = εζ
5 0 - 3 0 4 10 0 0
1. (a) Matrix of S 0 1 6 - 1 , Matrix of T 6 - 1 0 7
2 - 9 5 2 -3 8 0-5
9 10 - 3 0
Matrix of S + T 6 0 6 6
-1 -1 5 -3
- 2 -30 - 6 0
(b) Matrix of 25 - 3T : -18 5 12 - 2 3
13 - 4 2 10 19
3 -3
-2 4
[ 1 1 -7
5. TS has matrix 4 1 relative to the bases {(1,2), (0,1)} of R 2 and £ 3 of R 3 .
6 -5
L
ui = - 2 v i - v 2 4- v 3 0 0 1
-11 - 8
7. (a) u 2 = vi - 2v 2 (b) 1 0 2 9.
16 9
u3 = v2 2 1 5
3. (a) Apply the following interchanges in the given order: 5 and 3, 4 and 3, 2 and
3, 2 and 4, 2 and 5.
(c) Apply the following interchanges in the given order: 3 and 1, 5 and 1, 4 and
1, 2 and 1, 2 and 4, 2 and 5, 2 and 3.
5. (n − 1) + (n − 2) + ··· + 2 + 1 = Σ_{k=1}^{n−1} k = n(n − 1)/2
7. 2 , - 1
7. 2abc(a + b + c)3
- 1 3 2 -10 4 -1 -34 8 36
(a) è 1 -1 0 (c) 8 -5 -1 (e)i20 12 - 4 -8
1 -1 -2 1 -4 1 33 - 6 - 3 2
19. {1,2}; 1,1 - x, x2; 2,1 + 2x2 21. 3, (3,2); 4, (1,1) 25. 1
For Problems 7 and 8, each eigenvalue is followed by its algebraic multiplicity and a
basis for the eigenspace.
11. For λ = 1: (a) 1, (b) 1, (c) {1}; For λ = 2: (a) 2, (b) 1, (c) {2 + x²}
15. [ λ_1  0  ···  0  ]
    [ 0   λ_2 ···  0  ]
    [ ⋮        ⋱   ⋮  ]
    [ 0   0  ···  λ_n ],

where λ_i is the eigenvalue of T that corresponds to v_i.
1 1 1 0
0 1 0 0
a) A is similar over R to a diagonal matrix. (b) P =
- 1 0 0 1
0-1 0-1
2 0
a) T can be represented by a diagonal matrix. (b) ,{(2,1), ( 1 , - 2 ) }
0 -3
9. a) T cannot be represented by a diagonal matrix. (b) Not possible
11. a) T can be represented by a diagonal matrix.
3 0 0
b) | 0 3 0 | , {(0,1,0), (2,1,1), (5,4,3)}
0 0-2
1 0 2
17. a) A is similar over C to a diagonal matrix. (b) P 1 1 1
-3 -2 1
1 0 0
19. (a) A is similar over C to a diagonal matrix. (b) P = 0 1 2
1 0 -1
0 1 0 0
0 1 0
1 1 0 0 1 0
23. 26. (a) 0 0 1 (c)
0 1 0 0 0 1
2 0 5
-4 0-5 0
Exercises 8.2, page 244
1. f is a linear functional. 3. f is not a linear functional. 5. c = (2, −1, 4)
7. f(5,4,3) = 12; f(x_1, x_2, x_3) = 2x_1 − x_2 + 2x_3
9. (a) A* = {p_1, p_2, p_3}, where p_1(u) = x_3 − x_2, p_2(u) = x_2 − x_1, p_3(u) = x_1 for u = (x_1, x_2, x_3).
(c) A* = {p_1, p_2, p_3}, where p_1(u) = (1/5)(2x_1 + x_2 − x_3),
p_2(u) = (1/5)(−3x_1 − x_2 + 3x_3), p_3(u) = (1/5)(−x_1 − x_2 + x_3) for u = (x_1, x_2, x_3).
(c) (1/5)[2 1 −1], (1/5)[−3 −1 3], (1/5)[−1 −1 1]
2. (a) q(v) = -2y\ + 18y| (c) q(v) = y\- 2yiy3 - 4y2y3 - 3y|
-1 0 o] [ 144 -12 0
3. (a) 0-3 0 (c) -12 -12 12
0 0 4 0 12 -16
1. V! = ^ ( l , - l , 0 , l ) , v 2 = ^ ( 3 , 2 , 3 , - l ) , v 3 = ^ ( - 2 , - 1 , 3 , 1 )
3 V3 =
· 2 7 l 5 ( 2 U l + U2 - U
3)
4. (a) {1(1,2,-2,4), ^ ( 0 , - 1 , 3 , 2 ) , ^ ( 4 , 1 , 1 , - 1 ) }
0 2 ^ 0 4
V^ \/3 - 1 -2\/2 3 1
\/2 -y/3 -1J 2\/2 3 -1
| 1 0-2I Γ 2 0 4
3
· (a) P
= 2 ^ l· ">/3 M (c)p=^ -4 3 1
I1 V3 1I [ 4 3 - 1
4
· (a) B = {273^ *· !)» 5(0» - 1 » !)· ä i s i " 2 ' X> X)}
(c) B = { i ( l , -2,2), 1(0,1,1)^(4,1,-1)}
3 5 1 5+M - 5+i
0 2-1
1. (a) (c) 5 5 2 (e) 3 - 6i 3 + 6i
2 1 1
1 2 0 6 + 4z -6
1 0 -6 1 1 -2 1 1 -3
1. (a) P = 1 0 4 (c)P = 1 -1 3 (e)P = 1 -1 2
0 1 5 0 0-2 0 0 1
2. (a) For part (a), P = [1 1; 0 5i]; for part (c), the matrix is not hermitian; for part (e), P = [1 1 -4-i; 0 5i 5-2i; 0 0 9]; for part (g), P = [1 1 -1; 0 0 2i; 0 11 1]
(b) In part (a), r = p = 2; in part (c), the matrix is not hermitian; in part (e), r = p = 3; in part (g), r = p = 3
(c) Those in parts (a) and (b) are conjunctive, and those in parts (e) and (g) are
conjunctive.
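The values r and p in 2(b) are the rank and the index of the hermitian matrix, and both can be read off its (real) eigenvalues. A sketch with a hypothetical hermitian matrix, not one of the matrices from the exercise:

```python
import numpy as np

# Hypothetical hermitian stand-in, not the matrix of the exercise.
A = np.array([[2.0, 1 + 1j],
              [1 - 1j, 3.0]])
assert np.allclose(A, A.conj().T)        # A is hermitian

# r = rank, p = index (number of positive eigenvalues).
eigvals = np.linalg.eigvalsh(A)          # real, since A is hermitian
r = int(np.count_nonzero(~np.isclose(eigvals, 0.0)))
p = int(np.count_nonzero(eigvals > 0))
assert (r, p) == (2, 2)                  # eigenvalues are 1 and 4
```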
3. (a) …  (c) The matrix in 1(c) is not hermitian.  (e) …
4. (a) Those for the matrices in parts (a), (b), (d), (e), (g).
(b) For f in part (a), A′ = {(1, 0), (1, 5i)};
for f in part (e), A′ = {(1, 0, 0), (1, 5i, 0), (4 + i, -5 + 2i, -9)};
for f in part (g), A′ = {(1, 0, 0), (1, 0, 11), (1, -2i, -1)}
9. (a) [5 i; -i 2]  (b) A′ = {…}
13. Those in parts (a), (b), (e), (g).
3. (a) We have |(u, v)| = |-13| = 13 and ‖u‖ · ‖v‖ = √14 · √38 = √532 = 2√133.
Since 13 < 2√133, the inequality is verified.
(b) √…  (c) √78
2. (a) {(1/(2√3))(3, √2, -1), (1/2)(1, -√2, -1), (1/6)(-3, √2, -5)}
…
7. {(1, 1, 1), (0, -1, -1), (-9, -11 - 2i, -5 - 2i)}
8. (a) {…}
Exercises 9.5, page 308
3. All of the matrices A are normal. A matrix U of the required type is given for
parts (a), (c), and (e).
(a) U = …  (c) U = …  (e) U = …
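The statement in exercise 3, that these matrices are normal and hence unitarily diagonalizable, can be illustrated numerically. The matrix below is a stand-in, not one of the A's from the exercise.

```python
import numpy as np

# Hypothetical normal stand-in matrix (a rotation generator),
# not one of the A's from the exercise.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# A is normal: A A* = A* A.  Normal matrices are exactly the
# unitarily diagonalizable ones.
assert np.allclose(A @ A.conj().T, A.conj().T @ A)

# For a normal matrix with distinct eigenvalues, the normalized
# eigenvectors returned by eig are orthonormal, so U is unitary
# and U* A U is diagonal.
eigvals, U = np.linalg.eig(A)
assert np.allclose(U.conj().T @ U, np.eye(2))
D = U.conj().T @ A @ U
assert np.allclose(D, np.diag(eigvals))
```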
3. A = [±√(-1 - bc)  b;  c  ∓√(-1 - bc)], with -bc > 1.
5. (a) A = D + N, where the diagonalizable matrix D and the nilpotent matrix N are given by
D = [5 2 2 2; 4 3 4 -4; -8 -4 -5 0; 0 0 0 3] and N = [2 1 1 0; -4 -2 -2 0; 0 0 0 0; 2 1 1 0]
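The decomposition in 5(a) can be verified directly: N must be nilpotent and must commute with D. Using the matrices as given:

```python
import numpy as np

# D and N as given in answer 5(a).
D = np.array([[ 5,  2,  2,  2],
              [ 4,  3,  4, -4],
              [-8, -4, -5,  0],
              [ 0,  0,  0,  3]])
N = np.array([[ 2,  1,  1, 0],
              [-4, -2, -2, 0],
              [ 0,  0,  0, 0],
              [ 2,  1,  1, 0]])

# N is nilpotent (here N^2 = 0) ...
assert np.array_equal(N @ N, np.zeros((4, 4), dtype=int))
# ... and commutes with D, as the decomposition A = D + N requires.
assert np.array_equal(D @ N, N @ D)
```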
2. (a) B(v, T) = {(4, -3, 1, 0), (1, 0, 0, 0), (0, -1, 0, 0), (1, 1, 0, -1)}, with matrix [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0]
(c) B(v, T) = {(2, -4, 0, 0, -2), (1, 0, -2, 0, -1), (-1, -1, 1, 0, 1)}, with matrix [0 1 0; 0 0 1; 0 0 0]
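Relative to the cyclic basis in 2(a), T is represented by a single nilpotent block with ones on the superdiagonal; its powers can be checked directly:

```python
import numpy as np

# Matrix of T relative to the cyclic basis in 2(a): a single
# nilpotent block with ones on the superdiagonal.
Nmat = np.eye(4, k=1)

# Each power pushes the ones one diagonal further out;
# the third power is still nonzero, the fourth vanishes.
assert np.any(np.linalg.matrix_power(Nmat, 3) != 0)
assert np.array_equal(np.linalg.matrix_power(Nmat, 4), np.zeros((4, 4)))
```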
1. P = [-2 1 0; 0 -2 0; -4 0 1]
2. (a) [3 1 0 0; 0 3 0 0; 0 0 1 0; 0 0 0 -1]  (c) [0 1 0 0; 0 0 0 0; 0 0 2 1; 0 0 0 2]
3. (a) B' = {(1, -2, 0, 1), (0, 2, -1, -1), (1, -2, 0, 0), (0, 1, -1, 0)},
P = [1 0 1 0; -2 2 -2 1; 0 -1 0 -1; 1 -1 0 0]
(c) B' = {(1, -1, 1, -1), (2, -1, 0, 1), (-1, -1, -1, -1), (3, 2, 1, 0)},
P = [1 2 -1 3; -1 -1 -1 2; 1 0 -1 1; -1 1 -1 0]
(a) A = D + N, where D = [5 2 2 2; 4 3 4 -4; -8 -4 -5 0; 0 0 0 3] and N = [2 1 1 0; -4 -2 -2 0; 0 0 0 0; 2 1 1 0]
(c) A = D + N, where D = … and N = …
5. D has matrix K and N has matrix L relative to ℰ, where K = … and L = …
Index
A
Addition
  of equations, 134
  of linear transformations, 148,
  of matrices, 117
  of subsets, 16
  of vectors, 2
Adjoint
  of a linear operator, 318
  of a matrix, 209
Algebraic multiplicity, 225
Annihilator, 245
Associated
  direct sum decomposition, 327
  eigenvector, 213, 215
Augmented matrix, 137

B
Basis, 29
  cyclic, 348
  dual, 241
  orthogonal, 254
  standard, 33, 54, 131
Bijective mapping, 128
Bilinear form, 270
  complex, 283
  hermitian complex, 286
  matrix of, 271
  rank of, 273
  skew-symmetric, 282
  symmetric, 277
Binary relation, 99

C
Canonical form
  for quadratic form, 266
  Jordan, 359
Cartesian product, 270
Cauchy-Schwarz inequality, 299
Change of variable
  linear, 251
  orthogonal, 255
Characteristic
  equation, 214
  matrix, 214
  polynomial, 214
  root, 213
  value, 213
  vector, 213
Coefficient matrix, 137
Cofactor, 196
Column
  equivalent matrices, 99
  matrix, 60
  operation, 83
  space, 106
  vectors, 106
Companion matrix, 238
Complete set of projections, 327
Complex
  bilinear form, 283
  inner product space, 293
Components, 1
Conformable, 67
Congruence of matrices, 251
Conjugate
  of a matrix, 259
  transpose, 259
Conjunctive matrices, 285
Coordinate, 33, 62
  matrix, 62
  projections, 241
D
Determinant, 191
  of order n, 192
Diagonal
  block matrix, 348
  elements, 60
  matrix, 60
Dimension, 33, 59, 115
Direct sum, 39
  associated, 327
Directed line segment, 19
Distance, 301
Division algorithm, 336
Divisor, 336
Dot product, 24
Dual
  basis, 241
  space, 242

E
Eigenspace, 223
Eigenvalue, 213-214
Eigenvector, 213, 215
Elementary
  column operation, 83
  matrix, 77
  operation on vectors, 42
  row operation, 91
Equality
  of equations, 134
  of functions, 117
  of indexed sets, 29
  of mappings, 128
  of matrices, 60
  of polynomials, 116
  of vectors, 1

F
Field, 113
Finite-dimensional vector space,
Form
  bilinear, 270
  complex bilinear, 283
  hermitian, 287
  quadratic, 247, 280
Function, 128

G
Gauss-Jordan elimination, 139
Geometric multiplicity, 223
Gram-Schmidt process, 255, 304
Greatest common divisor, 336

H
Hamilton-Cayley Theorem, 338
Hermitian
  complex bilinear form, 286
  congruent matrices, 285
  form, 287
  matrix, 286
  operator, 319

I
Idempotent
  linear operator, 326
  principal, 345
Identity
  matrix, 60, 74
  operation, 42
Image, 128, 145
  inverse, 128, 145
Index
  of a permutation, 188
  real coordinate, 1
  real inner product, 293
  unitary, 293
  vector, 114
Span, 13
Spectral decomposition
  of a linear operator, 332
  of a matrix, 345
Spectrum, 213-214
Square matrix, 60
Square root of a linear operator, 335
Standard
  basis, 33
  basis of subspace, 54, 131
  inner products, 295
Submatrix, 211
Subspace, 10, 122
  cyclic, 348
  invariant, 341
  spanned by a set, 15, 123
Sum
  direct, 39
  of linear transformations, 148
  of matrices, 117
  of subsets, 16
  of vectors, 2
Superdiagonal elements, 350
Surjective mapping, 128
Symmetric
  bilinear form, 277
  matrix, 92
  operator, 319
  property, 99

T
Trace, 155, 231, 240, 297
Transformation, 128
  linear, 145
  matrix, 147
Transition matrix, 61
Transitive property, 99
Transpose, 92

U
Unit vector, 27
Unitarily similar matrices, 313
Unitary
  matrix, 304
  operator, 309
  similarity, 313
  space, 293
Upper triangular matrix, 313

V
Vector, 1, 114
  column, 106
  component, 1, 26
  geometric interpretation, 19
  norm, 24, 298
  projection, 26
  space, 114
  unit, 27

Z
Zero
  linear transformation, 146
  matrix, 60, 117
  subspace, 11