
LINEAR ALGEBRA

AND MATRIX
THEORY
JIMMIE GILBERT
LINDA GILBERT
University of South Carolina at Spartanburg
Spartanburg, South Carolina

ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1995, 1970 by ACADEMIC PRESS, INC.

All Rights Reserved.


No part of this publication may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopy, recording, or any information
storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc.


A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by


Academic Press Limited
24-28 Oval Road, London NW1 7DX

Library of Congress Cataloging-in-Publication Data

Gilbert, Jimmie, date.


Linear algebra and matrix theory / by Jimmie Gilbert, Linda Gilbert.
p. cm.
Includes indexes.
ISBN 0-12-282970-0
1. Algebras, Linear. I. Gilbert, Linda. II. Title.
QA184.G525 1994
512'.5--dc20 94-38932
CIP

PRINTED IN THE UNITED STATES OF AMERICA


95 96 97 98 99 00 EB 9 8 7 6 5 4 3 2 1
Preface
This text was planned for use with one of the following two options:

• The complete text would be suitable for a one-year undergraduate course for
mathematics majors and would provide a strong foundation for more abstract
courses at higher levels of instruction.

• A one-semester or one-quarter course could be taught from the first five chapters
together with selections from the remaining chapters. The selections could be
chosen so as to meet the requirements of students in the fields of business, science,
economics, or engineering.

The presentation of material presumes the knowledge and maturity gained from one
calculus course, and a second calculus course is a desirable prerequisite.
It is our opinion that linear algebra is well suited to provide the transition from the
intuitive developments of courses at a lower level to the more abstract treatments en­
countered later. Throughout the treatment here, material is presented from a structural
point of view: fundamental algebraic properties of the entities involved are emphasized.
This approach is particularly important because the mathematical systems encountered
in linear algebra furnish a wealth of examples for the structures studied in more ad­
vanced courses.
The unifying concept for the first five chapters is that of elementary operations. This
concept provides the pivot for a concise and efficient development of the basic theory
of vector spaces, linear transformations, matrix multiplication, and the fundamental
equivalence relations on matrices.
A rigorous treatment of determinants from the traditional viewpoint is presented in
Chapter 6. For a class already familiar with this material, the chapter can be omitted.
In Chapters 7 through 10, the central theme of the development is the change in
the matrix representing a vector function when only certain types of basis changes are
admitted. It is from this approach that the classical canonical forms for matrices are
derived.
Numerous examples and exercises are provided to illustrate the theory. Exercises are
included of both computational and theoretical nature. Those of a theoretical nature
amplify the treatment and provide experience in constructing deductive arguments,
while those of a computational nature illustrate fundamental techniques. The amount
of labor in the computational problems is kept to a minimum. Even so, many of them
provide opportunities to utilize current technology, if that is the wish of the instructor.
Answers are provided for about half of the computational problems.
The exercises are intended to develop confidence and deepen understanding. It is
assumed that students grow in maturity as they progress through the text and the
proportion of theoretical problems increases in later chapters.
Since much of the interest in linear algebra is due to its applications, the solution
of systems of linear equations and the study of eigenvalue problems appear early in the
text. Chapters 4 and 7 contain the most important applications of the theory.


ACKNOWLEDGMENTS

We wish to express our appreciation for the support given us by the University of
South Carolina at Spartanburg during the writing of this book, since much of the work
was done while we were on sabbatical leave. We would especially like to thank Sharon
Hahs, Jimmie Cook, and Olin Sansbury for their approval and encouragement of the
project.
This entire text was produced using Scientific Word and Scientific Workplace, soft­
ware packages from TCI Software Research, Inc. Special thanks are due to Christopher
Casey and Fred Osborne for their invaluable assistance throughout the project.
We would like to acknowledge with thanks the helpful suggestions made by the
following reviewers of the text:

Ed Dixon, Tennessee Technological University


Jack Garner, University of Arkansas at Little Rock
Edward Hinson, University of New Hampshire
Melvyn Jeter, Illinois Wesleyan University
Bob Jones, Belhaven College

We also wish to express our thanks to Dave Pallai for initiating the project at
Academic Press, to Peter Renz for his editorial guidance in developing the book, and to
Michael Early for his patience and encouragement while supervising production of the
book.

Jimmie Gilbert
Linda Gilbert
Chapter 1

Real Coordinate Spaces

1.1 Introduction
There are various approaches to that part of mathematics known as linear algebra.
Different approaches emphasize different aspects of the subject such as matrices, ap­
plications, or computational methods. As presented in this text, linear algebra is in
essence a study of vector spaces, and this study of vector spaces is primarily devoted
to finite-dimensional vector spaces. The real coordinate spaces, in addition to being
important in many applications, furnish excellent intuitive models of abstract finite-
dimensional vector spaces. For these reasons, we begin our study of linear algebra with
a study of the real coordinate spaces. Later it will be found that many of the results
and techniques employed here will easily generalize to more abstract settings.

1.2 The Vector Spaces R^n


Throughout this text the symbol R will denote the set of all real numbers. We assume
a knowledge of the algebraic properties of R, and begin with the following definition of
real coordinate spaces.

Definition 1.1 For each positive integer n, R^n will denote the set of all ordered n-tuples (u_1, u_2, ..., u_n) of real numbers u_i. Two n-tuples (u_1, u_2, ..., u_n) and (v_1, v_2, ..., v_n) are equal if and only if u_i = v_i for i = 1, 2, ..., n. The set R^n is referred to as an n-dimensional real coordinate space. The elements of R^n are called n-dimensional real coordinate vectors, or simply vectors. The numbers u_i in a vector (u_1, u_2, ..., u_n) will be called the components of the vector. The elements of R will be referred to as scalars.¹

¹The terms "vector" and "scalar" are later extended to more general usage, but this will cause no confusion since the context will make the meaning clear.


The real coordinate spaces and the related terminology described in this definition
are easily seen to be generalizations and extensions of the two- and three-dimensional
vector spaces studied in the calculus.
When we use a single letter to represent a vector, the letter will be printed in boldface lower case Roman, such as v, or written with an arrow over it, such as v⃗. In handwritten work with vectors, the arrow notation v⃗ is commonly used. Scalars will be represented by letters printed in lower case italics.

Definition 1.2 Addition in R^n is defined as follows: for any u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n) in R^n, the sum u + v is given by

    u + v = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n).

For any scalar a and any vector u = (u_1, u_2, ..., u_n) in R^n, the product au is defined by

    au = (au_1, au_2, ..., au_n).

The operation that combines the scalar a and the vector u to yield au is referred to as
multiplication of the vector u by the scalar a, or simply as scalar multiplication.
Also, the product au is called a scalar multiple of u.
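Definition 1.2 is purely componentwise, so it translates directly into a few lines of code. The sketch below is our own illustration (not part of the text), written with the numpy library:

```python
import numpy as np

# Vectors in R^4, represented as numpy arrays.
u = np.array([1.0, 0.0, 2.0, 1.0])
v = np.array([0.0, -2.0, 2.0, 3.0])

# Addition and scalar multiplication are componentwise (Definition 1.2).
print(u + v)        # [ 1. -2.  4.  4.]
print(3 * u)        # [ 3.  0.  6.  3.]
print(2 * u - v)    # a linear combination, also a vector in R^4
```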
The following theorem gives the basic properties of the two operations that we have
defined.

Theorem 1.3 The following properties are valid for any scalars a and b, and any vectors u, v, w in R^n:
1. u + v ∈ R^n. (Closure under addition)
2. (u + v) + w = u + (v + w). (Associative property of addition)
3. There is a vector 0 in R^n such that u + 0 = u for all u ∈ R^n. (Additive identity)
4. For each u ∈ R^n, there is a vector −u in R^n such that u + (−u) = 0. (Additive inverses)
5. u + v = v + u. (Commutative property of addition)
6. au ∈ R^n. (Absorption under scalar multiplication)
7. a(bu) = (ab)u. (Associative property of scalar multiplication)
8. a(u + v) = au + av. (Distributive property, vector addition)
9. (a + b)u = au + bu. (Distributive property, scalar addition)
10. 1u = u.
The proofs of these properties are easily carried out using the definitions of vector
addition and scalar multiplication, as well as the properties of real numbers. As typical
examples, properties 3, 4, and 8 will be proved here. The remaining proofs are left as
an exercise.
Proof of Property 3. The vector 0 = (0, 0, ..., 0) is in R^n, and if u = (u_1, u_2, ..., u_n),

    u + 0 = (u_1 + 0, u_2 + 0, ..., u_n + 0) = u.

Proof of Property 4. If u = (u_1, u_2, ..., u_n) ∈ R^n, then v = (−u_1, −u_2, ..., −u_n) is in R^n since all real numbers have additive inverses. And since

    u + v = (u_1 + (−u_1), u_2 + (−u_2), ..., u_n + (−u_n)) = 0,

v is the vector −u as required by property (4).


Proof of Property 8. Let u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n). Then

    a(u + v) = a(u_1 + v_1, u_2 + v_2, ..., u_n + v_n)
             = (a(u_1 + v_1), a(u_2 + v_2), ..., a(u_n + v_n))
             = (au_1 + av_1, au_2 + av_2, ..., au_n + av_n)
             = (au_1, au_2, ..., au_n) + (av_1, av_2, ..., av_n)
             = au + av. ■

The associative property of addition in R^n can be generalized so that the terms in a sum such as a_1u_1 + a_2u_2 + ··· + a_ku_k can be grouped with parentheses in any way without changing the value of the result. Such a sum can be written in the compact form Σ_{i=1}^{k} a_iu_i, or Σ_i a_iu_i if the number of terms in the sum is not important. It is always understood that only a finite number of nonzero terms is involved in such a sum.

Definition 1.4 Let A be a nonempty set of vectors in R^n. A vector v in R^n is linearly dependent on the set A if there exist vectors u_1, u_2, ..., u_k in A and scalars a_1, a_2, ..., a_k such that v = Σ_{i=1}^{k} a_iu_i. A vector of the form Σ_{i=1}^{k} a_iu_i is called a linear combination of the u_i.

If linear dependence on a finite set B = {v_1, v_2, ..., v_r} is under consideration, the statement in Definition 1.4 is equivalent to requiring that all of the vectors in B be involved in the linear combination. That is, a vector v is linearly dependent on B = {v_1, v_2, ..., v_r} if and only if there are scalars b_1, b_2, ..., b_r such that v = Σ_{i=1}^{r} b_iv_i.

Example 1 □ As an illustration, let B = {v_1, v_2, v_3}, where

    v_1 = (1, 0, 2, 1),  v_2 = (0, −2, 2, 3),  and  v_3 = (1, −2, 4, 4),

and let v = (3, −4, 10, 9). Now v can be written as

    v = 1·v_1 + 0·v_2 + 2·v_3,

or

    v = 3·v_1 + 2·v_2 + 0·v_3.

Either of these combinations shows that v is linearly dependent on B.

To consider a situation involving an infinite set, let A be the set of all vectors in R^3 that have integral components, and let v = (√2, 5/3, 0). Now u_1 = (1, 0, 1), u_2 = (0, 1, 0), and u_3 = (0, 0, 1) are in A, and

    v = √2 u_1 + (5/3) u_2 − √2 u_3.

Thus v is linearly dependent on A. It should be noted that other choices of vectors u_i can be made in order to exhibit this dependence. ■
In order to decide whether a certain vector is linearly dependent on a given set in
R n , it is usually necessary to solve a system of equations. This is illustrated in the
following example.
Example 2 □ Consider the question as to whether (6, 0, −1) is linearly dependent on the set A = {(2, −1, 1), (0, 1, −1), (−2, 1, 0)}. To answer the question, we investigate the conditions on a_1, a_2, and a_3 that are required by the equation

    a_1(2, −1, 1) + a_2(0, 1, −1) + a_3(−2, 1, 0) = (6, 0, −1).

Performing the indicated scalar multiplications and additions in the left member of this equation leads to

    (2a_1 − 2a_3, −a_1 + a_2 + a_3, a_1 − a_2) = (6, 0, −1).

This vector equation is equivalent to the system of equations

    2a_1 − 2a_3 = 6
    −a_1 + a_2 + a_3 = 0
    a_1 − a_2 = −1.

We decide to work toward the solution of this system by eliminating a_1 from two of the equations in the system. As steps toward this goal, we multiply the first equation by 1/2 and we add the second equation to the third equation. These steps yield the system

    a_1 − a_3 = 3
    −a_1 + a_2 + a_3 = 0
    a_3 = −1.

Adding the first equation to the second now results in

    a_1 − a_3 = 3
    a_2 = 3
    a_3 = −1.

The solution a_1 = 2, a_2 = 3, a_3 = −1 is now readily obtained. Thus the vector (6, 0, −1) is linearly dependent on the set A. ■
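The elimination carried out in Example 2 can be delegated to a numerical solver. The sketch below is our own illustration with numpy (arranging the vectors of A as matrix columns is an assumption of the sketch, not notation from the text); it recovers the same coefficients a_1 = 2, a_2 = 3, a_3 = −1:

```python
import numpy as np

# Columns of A_mat are the vectors of A; we solve A_mat @ a = v for the coefficients.
A_mat = np.column_stack([(2, -1, 1), (0, 1, -1), (-2, 1, 0)])
v = np.array([6, 0, -1])

a = np.linalg.solve(A_mat, v)
print(a)                          # [ 2.  3. -1.]
print(np.allclose(A_mat @ a, v))  # True: v is dependent on A
```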

Another important type of dependence is given in the definition below. This time,
the phrase linearly dependent involves only a set instead of involving both a vector and
a set.

Definition 1.5 A set A of vectors in R^n is linearly dependent if there is a collection of vectors u_1, u_2, ..., u_k in A and scalars c_1, c_2, ..., c_k, not all of which are zero, such that c_1u_1 + c_2u_2 + ··· + c_ku_k = 0. If no such collection of vectors exists in A, then A is called linearly independent.

Again, the case involving a finite set of vectors is of special interest. It is readily seen that a finite set B = {v_1, v_2, ..., v_r} is linearly dependent if and only if there are scalars b_1, b_2, ..., b_r, not all zero, such that Σ_{i=1}^{r} b_iv_i = 0.

Example 3 □ The set B in Example 1 is linearly dependent since v_1 + v_2 − v_3 = 0. The set A is also linearly dependent. For if u_1, u_2, and u_3 are as before and u_4 = (−3, −2, −4), then 3u_1 + 2u_2 + u_3 + u_4 = 0. ■

To determine the linear dependence or linear independence of a set {u_1, u_2, ..., u_k} of vectors in R^n, it is necessary to investigate the conditions on the c_j which are imposed by requiring that Σ_{j=1}^{k} c_ju_j = 0. If u_j = (u_{1j}, u_{2j}, ..., u_{nj}), we have

    Σ_{j=1}^{k} c_ju_j = Σ_{j=1}^{k} c_j(u_{1j}, u_{2j}, ..., u_{nj})
                       = (Σ_{j=1}^{k} c_ju_{1j}, Σ_{j=1}^{k} c_ju_{2j}, ..., Σ_{j=1}^{k} c_ju_{nj}).

Thus Σ_{j=1}^{k} c_ju_j = 0 if and only if Σ_{j=1}^{k} c_ju_{ij} = 0 for each i = 1, 2, ..., n. This shows that the problem of determining the conditions on the c_j is equivalent to investigating the solutions of a system of n equations in k unknowns. If Σ_{j=1}^{k} c_ju_j = 0 implies c_1 = c_2 = ··· = c_k = 0, then {u_1, u_2, ..., u_k} is linearly independent.
The discussion in the preceding paragraph is illustrated in the next example.

Example 4 □ Consider the set of vectors

    {(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)}

in R^4. To determine the linear dependence or linear independence of this set, we investigate the solutions c_1, c_2, c_3 to

    c_1(1, 1, 8, 1) + c_2(1, 0, 3, 0) + c_3(3, 1, 14, 1) = (0, 0, 0, 0).

Performing the indicated vector operations gives the equation

    (c_1 + c_2 + 3c_3, c_1 + c_3, 8c_1 + 3c_2 + 14c_3, c_1 + c_3) = (0, 0, 0, 0).



This vector equation is equivalent to

    c_1 + c_2 + 3c_3 = 0
    c_1 + c_3 = 0
    8c_1 + 3c_2 + 14c_3 = 0
    c_1 + c_3 = 0.

To solve this system, we first interchange the first two equations to place the equation c_1 + c_3 = 0 at the top.

    c_1 + c_3 = 0
    c_1 + c_2 + 3c_3 = 0
    8c_1 + 3c_2 + 14c_3 = 0
    c_1 + c_3 = 0

By adding suitable multiples of the first equation to each of the other equations, we then eliminate c_1 from all but the first equation. This yields the system

    c_1 + c_3 = 0
    c_2 + 2c_3 = 0
    3c_2 + 6c_3 = 0
    0 = 0.

Eliminating c_2 from the third equation gives

    c_1 + c_3 = 0
    c_2 + 2c_3 = 0
    0 = 0
    0 = 0.

It is now clear that there are many solutions to the system, and they are given by

    c_1 = −c_3
    c_2 = −2c_3
    c_3 is arbitrary.

In particular, it is not necessary that c_3 be zero, so the original set of vectors is linearly dependent. ■
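The same conclusion can be reached by a rank computation: a set of k vectors is linearly independent exactly when the matrix having those vectors as columns has rank k. The sketch below is our own numerical illustration for the set of Example 4:

```python
import numpy as np

# Columns are the three vectors of Example 4 (vectors in R^4).
M = np.column_stack([(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)])

rank = np.linalg.matrix_rank(M)
print(rank)                # 2
print(rank == M.shape[1])  # False, so the set is linearly dependent
```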

The following theorem gives an alternative description of linear dependence for a


certain type of set.

Theorem 1.6 A set A = {u_1, u_2, ..., u_k} in R^n that contains at least two vectors is linearly dependent if and only if some u_j is a linear combination of the remaining vectors in the set.

Proof. If the set A = {u_1, u_2, ..., u_k} is linearly dependent, then there are scalars a_1, a_2, ..., a_k such that Σ_{i=1}^{k} a_iu_i = 0 with at least one a_i, say a_j, not zero. This implies that

    a_ju_j = −a_1u_1 − ··· − a_{j−1}u_{j−1} − a_{j+1}u_{j+1} − ··· − a_ku_k

so that

    u_j = (−a_1/a_j)u_1 + ··· + (−a_{j−1}/a_j)u_{j−1} + (−a_{j+1}/a_j)u_{j+1} + ··· + (−a_k/a_j)u_k.

Thus u_j can be written as a linear combination of the remaining vectors in the set.
Now assume that some u_j is a linear combination of the remaining vectors in the set, i.e.,

    u_j = b_1u_1 + b_2u_2 + ··· + b_{j−1}u_{j−1} + b_{j+1}u_{j+1} + ··· + b_ku_k.

Then

    b_1u_1 + ··· + b_{j−1}u_{j−1} + (−1)u_j + b_{j+1}u_{j+1} + ··· + b_ku_k = 0,

and since the coefficient of u_j in this linear combination is not zero, the set is linearly dependent. ■

The different meanings of the word "dependent" in Definitions 1.4 and 1.5 should be noted carefully. These meanings, though different, are closely related. The preceding theorem, for example, could be restated as follows: "A set {u_1, ..., u_k} is linearly dependent if and only if some u_j is linearly dependent on the remaining vectors." This relation is further illustrated in some of the exercises at the end of this section.
In the last section of this chapter, the following definition and theorem are of primary importance. Both are natural extensions of Definition 1.4.

Definition 1.7 A set A ⊆ R^n is linearly dependent on the set B ⊆ R^n if each u ∈ A is linearly dependent on B.

Thus A is linearly dependent on B if and only if each vector in A is a linear combi­


nation of vectors that are contained in B.
Other types of dependence are considered in some situations, but linear dependence
is the only type that we will use. For this reason, we will frequently use the term
"dependent" in place of "linearly dependent."
In the proof of the next theorem, it is convenient to have available the notation Σ_{i=1}^{k} (Σ_{j=1}^{m} u_{ij}) for the sum of the form Σ_{j=1}^{m} u_{1j} + Σ_{j=1}^{m} u_{2j} + ··· + Σ_{j=1}^{m} u_{kj}. The associative and commutative properties for vector addition [Theorem 1.3, (2) and (5)] imply that

    Σ_{i=1}^{k} (Σ_{j=1}^{m} u_{ij}) = Σ_{j=1}^{m} (Σ_{i=1}^{k} u_{ij}).

Theorem 1.8 Let A, B, and C be subsets of R^n. If A is dependent on B and B is dependent on C, then A is dependent on C.

Proof. Suppose that A is dependent on B and B is dependent on C, and let u be an arbitrary vector in A. Since A is dependent on B, there are vectors v_1, v_2, ..., v_m in B and scalars a_1, a_2, ..., a_m such that

    u = Σ_{j=1}^{m} a_jv_j.

Since B is dependent on C, each v_j can be written as a linear combination of certain vectors in C. In general, for different vectors v_j, different sets of vectors from C would be involved in these linear combinations. But each of these m linear combinations (one for each v_j) would involve only a finite number of vectors. Hence the set of all vectors in C that appear in a term of at least one of these linear combinations is a finite set, say {w_1, w_2, ..., w_k}, and each v_j can be written in the form v_j = Σ_{i=1}^{k} b_{ij}w_i. Replacing the v_j's in the above expression for u by these linear combinations, we obtain

    u = Σ_{j=1}^{m} a_jv_j
      = Σ_{j=1}^{m} a_j (Σ_{i=1}^{k} b_{ij}w_i)
      = Σ_{j=1}^{m} (Σ_{i=1}^{k} a_jb_{ij}w_i)
      = Σ_{i=1}^{k} (Σ_{j=1}^{m} a_jb_{ij}) w_i.

Letting c_i = Σ_{j=1}^{m} a_jb_{ij} for i = 1, 2, ..., k, we have u = Σ_{i=1}^{k} c_iw_i. Thus u is dependent on C.
Since u was arbitrarily chosen in A, A is dependent on C and the theorem is proved. ■

Exercises 1.2

1. For any pair of positive integers i and j, the symbol δ_{ij} is defined by δ_{ij} = 0 if i ≠ j and δ_{ij} = 1 if i = j. This symbol is known as the Kronecker delta.

(a) Find the value of Σ_{i=1}^{n} (Σ_{j=1}^{n} δ_{ij}).

(b) Find the value of Σ_{i=1}^{n} (Σ_{j=1}^{n} (1 − δ_{ij})).

(c) Find the value of Σ_{i=1}^{n} (Σ_{j=1}^{n} (−1)δ_{ij}).

(d) Find the value of Σ_{j=1}^{n} δ_{ij}δ_{jk}.

2. Prove the remaining parts of Theorem 1.3.

3. In each case, determine whether or not the given vector v is linearly dependent
on the given set A.

(a) v = (−2, 1, 4), A = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}
(b) v = (−4, 4, 2), A = {(2, 1, −3), (1, −1, 3)}
(c) v = (1, 2, 1), A = {(1, 0, −2), (0, 2, 1), (1, 2, −1), (−1, 2, 3)}
(d) v = (2, 13, −5), A = {(1, 2, −1), (3, 6, −3), (−1, 1, 0), (0, 6, −2), (2, 4, −2)}
(e) v = (0, 1, 2, 0), A = {(1, 0, −1, 1), (−2, 0, 2, −2), (1, 1, 1, 1), (2, 1, 0, 2)}
(f) v = (2, 3, 5, 5), A = {(0, 1, 1, 1), (1, 0, 1, 1), (1, −1, 0, 0), (1, 1, 2, 2)}

4. Assuming the properties stated in Theorem 1.3, prove the following statements.

(a) The vector 0 in property (3) is unique.
(b) The vector −u in R^n which satisfies u + (−u) = 0 is unique.
(c) −u = (−1)u.
(d) The vector u − v is by definition the vector w such that v + w = u. Prove that u − v = u + (−1)v.

5. Determine whether or not the given set A is linearly dependent.

(a) A = {(1, 0, −2), (0, 2, 1), (−1, 2, 3)}
(b) A = {(1, 4, 3), (2, 12, 6), (5, 21, 15), (0, 2, −1)}
(c) A = {(1, 2, −1), (−1, 1, 0), (1, 3, −1)}
(d) A = {(1, 0, 1, 2), (2, 1, 0, 0), (4, 5, 6, 0), (1, 1, 1, 0)}

6. Show that the given set is linearly dependent and write one of the vectors as a
linear combination of the remaining vectors.

(a) {(2, 1, 0), (1, 1, 0), (0, 1, 1), (−1, 1, 1)}
(b) {(1, 2, 1, 0), (3, −4, 5, 6), (2, −1, 3, 3), (−2, 6, −4, −6)}

7. Show that any vector in R^3 is dependent on the set {e_1, e_2, e_3} where e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1).

8. Show that every vector in R^3 is dependent on the set {u_1, u_2, u_3} where u_1 = (1, 0, 0), u_2 = (1, 1, 0), and u_3 = (1, 1, 1).

9. Show that there is one vector in the set

{(1,1,0),(0,1,1),(1,0,-1),(1,0,1)}

that cannot be written as a linear combination of the other vectors in the set.
10. Prove that if the set {u_1, u_2, ..., u_k} of vectors in R^n contains the zero vector, it is linearly dependent.

11. Prove that a set consisting of exactly one nonzero vector is linearly independent.

12. Prove that a set of two vectors in R n is linearly dependent if and only if one of
the vectors is a scalar multiple of the other.

13. Prove that a set of nonzero vectors {u_1, u_2, ..., u_k} in R^n is linearly dependent if and only if some u_r is a linear combination of the preceding vectors.

14. Let A = {u_1, u_2, u_3} be a linearly independent set of vectors in R^n.

(a) Prove that the set {u_1 − u_2, u_2 − u_3, u_1 + u_3} is linearly independent.
(b) Prove that the set {u_1 − u_2, u_2 − u_3, u_1 − u_3} is linearly dependent.

15. Let {u_1, ..., u_{r−1}, u_r, u_{r+1}, ..., u_k} be a linearly independent set of k vectors in R^n, and let u'_r = Σ_{j=1}^{r} a_ju_j with a_r ≠ 0. Prove that {u_1, ..., u_{r−1}, u'_r, u_{r+1}, ..., u_k} is linearly independent.

16. Let ∅ denote the empty set of vectors in R^n. Determine whether or not ∅ is linearly dependent, and justify your conclusion.

17. Prove that any subset of a linearly independent set A ⊆ R^n is linearly independent.

18. Let A ⊆ R^n. Prove that if A contains a linearly dependent subset, then A is linearly dependent.

1.3 Subspaces of R^n
There are many subsets of R n that possess the properties stated in Theorem 1.3. A
study of these subsets furnishes a great deal of insight into the structure of the spaces
R n , and is of vital importance in subsequent material.

Definition 1.9 A set W is a subspace of R^n if W is contained in R^n and if the properties of Theorem 1.3 are valid in W. That is,
1. u + v ∈ W for all u, v in W.
2. (u + v) + w = u + (v + w) for all u, v, w in W.
3. 0 ∈ W.
4. For each u ∈ W, −u is in W.
5. u + v = v + u for all u, v in W.
6. au ∈ W for all a ∈ R and all u ∈ W.
7. a(bu) = (ab)u for all a, b ∈ R and all u ∈ W.
8. a(u + v) = au + av for all a ∈ R and all u, v ∈ W.
9. (a + b)u = au + bu for all a, b ∈ R and all u ∈ W.
10. 1u = u for all u ∈ W.
Before considering some examples of subspaces, we observe that the list of properties
in Definition 1.9 can be shortened a great deal. For example, properties (2), (5), (7),
(8), (9), and (10) are valid throughout R n , and hence are automatically satisfied in any
subset of R n . Thus a subset W of R n is a subspace if and only if properties (1), (3), (4),
and (6) hold in W . This reduces the amount of labor necessary in order to determine
whether or not a given subset is a subspace, but an even more practical test is given in
the following theorem.

Theorem 1.10 Let W be a subset of R^n. Then W is a subspace of R^n if and only if the following conditions hold:
(i) W is nonempty;
(ii) for any a, b ∈ R and any u, v ∈ W, au + bv ∈ W.

Proof. Suppose that W is a subspace of R^n. Then W is nonempty, since 0 ∈ W by property (3). Let a and b be any elements of R, and let u and v be elements of W. By property (6), au and bv are in W. Hence au + bv ∈ W by property (1), and condition (ii) is satisfied.
Suppose, on the other hand, that conditions (i) and (ii) are satisfied in W. From our discussion above, it is necessary only to show that properties (1), (3), (4), and (6) are valid in W. Since W is nonempty, there is at least one vector u ∈ W. By condition (ii), 1u + (−1)u = 0 ∈ W, and property (3) is valid. Again by condition (ii), (−1)u = −u ∈ W, so property (4) is valid. For any u, v in W, 1·u + 1·v = u + v ∈ W, and property (1) is valid. For any a ∈ R and any u ∈ W, au + 0·u = au ∈ W. Thus property (6) is valid, and W is a subspace of R^n. ■

Example 1 □ The following list gives several examples of subspaces of R n . We will


prove that the third set in the list forms a subspace and leave it as an exercise to verify
that the remaining sets are subspaces of R n .

1. The set {0}, called the zero subspace of R^n.

2. The set of all scalar multiples of a fixed vector u ∈ R^n.



3. The set of all vectors that are dependent on a given set {u_1, u_2, ..., u_k} of vectors in R^n.

4. The set of all vectors (x_1, x_2, ..., x_n) in R^n that satisfy a fixed equation

    a_1x_1 + a_2x_2 + ··· + a_nx_n = 0.

For n = 2 or n = 3, this example has a simple geometric interpretation, as we shall see later.

5. The set of all vectors (x_1, x_2, ..., x_n) in R^n that satisfy the system of equations

    a_{11}x_1 + a_{12}x_2 + ··· + a_{1n}x_n = 0
    a_{21}x_1 + a_{22}x_2 + ··· + a_{2n}x_n = 0
    ...
    a_{m1}x_1 + a_{m2}x_2 + ··· + a_{mn}x_n = 0.

Proof for the third set. Let W be the set of all vectors that are dependent on the set A = {u_1, u_2, ..., u_k} of vectors in R^n. From the discussion in the paragraph following Definition 1.4, we know that W is the set of all vectors that can be written in the form Σ_{i=1}^{k} a_iu_i. The set W is nonempty since

    0·u_1 + 0·u_2 + ··· + 0·u_k = 0 is in W.

Let u, v be arbitrary vectors in W, and let a, b be arbitrary scalars. From the definition of W, there are scalars c_i and d_i such that

    u = Σ_{i=1}^{k} c_iu_i  and  v = Σ_{i=1}^{k} d_iu_i.

Thus we have

    au + bv = a(Σ_{i=1}^{k} c_iu_i) + b(Σ_{i=1}^{k} d_iu_i)
            = Σ_{i=1}^{k} ac_iu_i + Σ_{i=1}^{k} bd_iu_i
            = Σ_{i=1}^{k} (ac_iu_i + bd_iu_i)
            = Σ_{i=1}^{k} (ac_i + bd_i)u_i.

The last expression is a linear combination of the elements in A, and consequently au + bv is an element of W since W contains all vectors that are dependent on A. Both conditions in Theorem 1.10 have been verified, and therefore W is a subspace of R^n. ■
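The fourth and fifth sets in the list above, the solution sets of homogeneous equations, can also be explored numerically. The sketch below is our own illustration (the coefficient matrix is made up for the example); it extracts a spanning set for the solution set from the singular value decomposition and spot-checks the closure condition of Theorem 1.10:

```python
import numpy as np

# A hypothetical homogeneous system with coefficient matrix A (two equations, four unknowns).
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  1.0]])

# Rows of Vt beyond the rank span the solution set {x : Ax = 0}.
_, s, Vt = np.linalg.svd(A)
rank = len(s[s > 1e-10])
null_basis = Vt[rank:]
print(null_basis.shape)       # (2, 4): two independent solutions

# Closure check: a linear combination of solutions is again a solution.
x = 3 * null_basis[0] - 2 * null_basis[1]
print(np.allclose(A @ x, 0))  # True
```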

Our next theorem has a connection with the sets listed as 4 and 5 in Example 1 that should be investigated by the student. In this theorem, we are confronted with a situation which involves a collection that is not necessarily finite. In situations such as this, it is desirable to have available a notational convenience known as indexing.
Let 𝓛 and T be nonempty sets. Suppose that with each λ ∈ 𝓛 there is associated a unique element t_λ of T, and that each element of T is associated with at least one λ ∈ 𝓛. (That is, suppose that there is given a function with domain 𝓛 and range T.) Then we say that the set T is indexed by the set 𝓛, and refer to 𝓛 as an index set. We write {t_λ | λ ∈ 𝓛} to denote that the collection of t_λ's is indexed by 𝓛.
If {M_λ | λ ∈ 𝓛} is a collection of sets M_λ indexed by 𝓛, then ∪_{λ∈𝓛} M_λ indicates the union of this collection of sets. Thus ∪_{λ∈𝓛} M_λ is the set of all elements that are contained in at least one M_λ. Similarly, ∩_{λ∈𝓛} M_λ denotes the intersection of the sets M_λ, and consists of all elements that are in every M_λ.
Theorem 1.11 The intersection of any nonempty collection of subspaces of R^n is a subspace of R^n.

Proof. Let {S_λ | λ ∈ 𝓛} be any nonempty collection of subspaces S_λ of R^n, and let W = ∩_{λ∈𝓛} S_λ. Now 0 ∈ S_λ for each λ ∈ 𝓛, so 0 ∈ W and W is nonempty. Let a, b ∈ R, and let u, v ∈ W. Since each of u and v is in S_λ for every λ ∈ 𝓛, au + bv ∈ S_λ for every λ ∈ 𝓛. Hence au + bv ∈ W, and W is a subspace by Theorem 1.10. ■

Thus the operation of intersection can be used to construct new subspaces from
given subspaces.
There are all sorts of subsets in a given subspace W of R n . Some of these have the
important property of being spanning sets for W , or sets that span W . The following
definition describes this property.

Definition 1.12 Let W be a subspace of R^n. A nonempty set A of vectors in R^n spans W if A ⊆ W and if every vector in W is a linear combination of vectors in A. By definition, the empty set ∅ spans the zero subspace.

Intuitively, the word span is a natural choice in Definition 1.12 because a spanning
set A reaches across (hence spans) the entire subspace when all linear combinations of
A are formed.

Example 2 □ We shall show that each of the following sets spans R^3:

    E_3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)},
    A = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.

The set E_3 spans R^3 since an arbitrary (x, y, z) in R^3 can be written as

    (x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1).

In calculus texts, the vectors in E_3 are labeled with the standard notation

    i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

and this is used to write (x, y, z) = xi + yj + zk.
We can take advantage of the way zeros are placed in the vectors of A and write

    (x, y, z) = x(1, 1, 1) + (y − x)(0, 1, 1) + (z − y)(0, 0, 1).

The coefficients in this equation can be found by inspection if we start with the first component and work from left to right. ■
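The left-to-right rule used in Example 2 for the coefficients with respect to A can be checked numerically. A small sketch of our own:

```python
import numpy as np

def coords_in_A(x, y, z):
    # Coefficients of (x, y, z) with respect to A = {(1,1,1), (0,1,1), (0,0,1)},
    # read off from left to right as in Example 2.
    return np.array([x, y - x, z - y])

v = np.array([4.0, -1.0, 7.0])
c = coords_in_A(*v)
A_cols = np.column_stack([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
print(np.allclose(A_cols @ c, v))   # True: the combination reproduces v
```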

We shall see in Theorem 1.15 that the concept of a spanning set is closely related to the set ⟨A⟩ defined as follows.

Definition 1.13 For any nonempty set A of vectors in R^n, ⟨A⟩ is the set of all vectors in R^n that are dependent on A. By definition, ⟨∅⟩ is the zero subspace {0}.

Thus, for A ≠ ∅, ⟨A⟩ is the set of all vectors u that can be written as u = Σ_j a_ju_j with a_j in R and u_j in A. Since any u in A is dependent on A, the subset relation A ⊆ ⟨A⟩ always holds.
In Example 1 of this section, the third set listed is ⟨A⟩ where A = {u_1, u_2, ..., u_k} in R^n. When the notation ⟨A⟩ is combined with the set notation for this A, the result is a somewhat cumbersome notation:

    ⟨A⟩ = ⟨{u_1, u_2, ..., u_k}⟩.

We make a notational agreement for situations like this to simply write

    ⟨A⟩ = ⟨u_1, u_2, ..., u_k⟩.

For example, if A = {(1, 3, 7), (2, 0, 6)}, we would write

    ⟨A⟩ = ⟨(1, 3, 7), (2, 0, 6)⟩

instead of ⟨A⟩ = ⟨{(1, 3, 7), (2, 0, 6)}⟩ to indicate the set of all vectors that are dependent on A.
It is proved in Example 1 that, for a finite subset A = {u_1, u_2, ..., u_k} of R^n, the set ⟨A⟩ is a subspace of R^n. The next theorem generalizes this result to an arbitrary subset A of R^n.

Theorem 1.14 For any subset A of R^n, ⟨A⟩ is a subspace of R^n.



Proof. If A is empty, then ⟨A⟩ is the zero subspace by definition.
Suppose A is nonempty. Then ⟨A⟩ is nonempty, since A ⊆ ⟨A⟩. Let a, b ∈ R, and let u, v ∈ ⟨A⟩. Now au + bv is dependent on {u, v} and each of u and v is dependent on A. Hence {au + bv} is dependent on {u, v} and {u, v} is dependent on A. Therefore {au + bv} is dependent on A by Theorem 1.8. Thus au + bv ∈ ⟨A⟩, and ⟨A⟩ is a subspace. ■

We state the relation between Definitions 1.12 and 1.13 as a theorem, even though
the proof is almost trivial.

Theorem 1.15 Let W be a subspace of R^n, and let A be a subset of R^n. Then A spans W if and only if ⟨A⟩ = W.

Proof. The statement is trivial in case A = ∅. Suppose, then, that A is nonempty.
If A spans W, then every vector in W is dependent on A, so W ⊆ ⟨A⟩. Now A ⊆ W, and repeated application of condition (ii), Theorem 1.10, yields the fact that any linear combination of vectors in A is again a vector in W. Thus, ⟨A⟩ ⊆ W, and we have ⟨A⟩ = W.
If ⟨A⟩ = W, it follows at once that A spans W, and this completes the proof. ■

We will refer to ⟨A⟩ as the subspace spanned by A. Some of the notations used in various texts for this same subspace are

    span(A), sp(A), lin(A), Sp(A), and S[A].

We will use span(A) or ⟨A⟩ in this book.

Example 3 □ With A and W as follows, we shall determine whether or not A spans W.

    A = {(1, 0, 1, 0), (0, 1, 0, 1), (2, 3, 2, 3)}
    W = ⟨(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1)⟩

We first check to see whether A ⊆ W. That is, we check to see if each vector in A is a linear combination of the vectors listed in the spanning set for W. By inspection, we see that (1, 0, 1, 0) is listed in the spanning set, and

    (0, 1, 0, 1) = (−1)(1, 0, 1, 0) + (1)(1, 1, 1, 1) + (0)(0, 1, 1, 1).

The linear combination is not as apparent with (2, 3, 2, 3), so we place unknown coefficients in the equation

    a_1(1, 0, 1, 0) + a_2(1, 1, 1, 1) + a_3(0, 1, 1, 1) = (2, 3, 2, 3).

Using the same procedure as in Example 2 of Section 1.2, we obtain the system of equations

    a_1 + a_2 = 2
    a_2 + a_3 = 3
    a_1 + a_2 + a_3 = 2
    a_2 + a_3 = 3.

It is then easy to find the solution a_1 = −1, a_2 = 3, a_3 = 0. That is,

    (−1)(1, 0, 1, 0) + 3(1, 1, 1, 1) + 0(0, 1, 1, 1) = (2, 3, 2, 3).

Thus we have A ⊆ W.
We must now decide if every vector in W is linearly dependent on A, and Theorem 1.8 is of help here. We have W dependent on the set

    B = {(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1)}.

If B is dependent on A, then W is dependent on A, by Theorem 1.8. On the other hand, if B is not dependent on A, then W is not dependent on A since B ⊆ W. Thus we need only check to see if each vector in B is a linear combination of vectors in A. We see that (1, 0, 1, 0) is listed in A, and

    (1, 1, 1, 1) = (1, 0, 1, 0) + (0, 1, 0, 1).

Considering (0, 1, 1, 1), we set up the equation

    b_1(1, 0, 1, 0) + b_2(0, 1, 0, 1) + b_3(2, 3, 2, 3) = (0, 1, 1, 1).

This is equivalent to the system

    b_1 + 2b_3 = 0
    b_2 + 3b_3 = 1
    b_1 + 2b_3 = 1
    b_2 + 3b_3 = 1.

The first and third equations contradict each other, so there is no solution. Hence W is not dependent on A, and the set A does not span W. ■
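Both checks made in Example 3 reduce to asking whether one vector is a linear combination of others, which a least-squares solve can answer when the system is not square. The sketch below is our own illustration with numpy:

```python
import numpy as np

def depends_on(v, vectors):
    # True if v is a linear combination of the given vectors (residual of the
    # least-squares solution is essentially zero).
    M = np.column_stack(vectors)
    c, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.allclose(M @ c, v)

A = [np.array(t, float) for t in [(1, 0, 1, 0), (0, 1, 0, 1), (2, 3, 2, 3)]]
B = [np.array(t, float) for t in [(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1)]]  # spans W

print(all(depends_on(v, B) for v in A))   # True:  A is contained in W
print(all(depends_on(v, A) for v in B))   # False: (0,1,1,1) is not dependent on A
```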

We saw earlier in this section that the operation of intersection can be used to
generate new subspaces from known subspaces. There is another operation, given in the
following definition, that also can be used to form new subspaces from given subspaces.

Definition 1.16 Let S_1 and S_2 be nonempty subsets of R^n. Then the sum S_1 + S_2 is the set of all vectors u ∈ R^n that can be expressed in the form u = u_1 + u_2, where u_1 ∈ S_1 and u_2 ∈ S_2.

Although this definition applies to nonempty subsets of R n generally, the more


interesting situations are those in which both subsets are subspaces.

Theorem 1.17 If W_1 and W_2 are subspaces of R^n, then W_1 + W_2 is a subspace of R^n.

Proof. Clearly 0 + 0 = 0 is in W_1 + W_2, so that W_1 + W_2 is nonempty.
Let a, b ∈ R, and let u and v be arbitrary elements in W_1 + W_2. From the definition of W_1 + W_2, it follows that there are vectors u_1, v_1 in W_1 and u_2, v_2 in W_2 such that u = u_1 + u_2 and v = v_1 + v_2. Now au_1 + bv_1 ∈ W_1 and au_2 + bv_2 ∈ W_2 since W_1 and W_2 are subspaces. This gives

    au + bv = a(u_1 + u_2) + b(v_1 + v_2)
            = (au_1 + bv_1) + (au_2 + bv_2),

which clearly is an element of W_1 + W_2. Therefore, W_1 + W_2 is a subspace of R^n by Theorem 1.10. ■

Example 4 □ Consider the subspaces W_1 and W_2 as follows:

    W_1 = ⟨(1, −1, 0, 0), (0, 0, 0, 1)⟩,
    W_2 = ⟨(2, −2, 0, 0), (0, 0, 1, 0)⟩.

Then W_1 is the set of all vectors of the form

    a_1(1, −1, 0, 0) + a_2(0, 0, 0, 1) = (a_1, −a_1, 0, a_2),

W_2 is the set of all vectors of the form

    b_1(2, −2, 0, 0) + b_2(0, 0, 1, 0) = (2b_1, −2b_1, b_2, 0),

and W_1 + W_2 is the set of all vectors of the form

    a_1(1, −1, 0, 0) + a_2(0, 0, 0, 1) + b_1(2, −2, 0, 0) + b_2(0, 0, 1, 0)
        = (a_1 + 2b_1, −a_1 − 2b_1, b_2, a_2).

The last equation describes the vectors in W_1 + W_2, but it is not the most efficient description possible. Since a_1 + 2b_1 can take on any real number c_1 as a value, we see that W_1 + W_2 is the set of all vectors of the form

    (c_1, −c_1, c_2, c_3). ■
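The description of W_1 + W_2 obtained in Example 4 can be double-checked by a rank computation on the combined spanning vectors. The sketch below is our own illustration:

```python
import numpy as np

W1 = [(1, -1, 0, 0), (0, 0, 0, 1)]
W2 = [(2, -2, 0, 0), (0, 0, 1, 0)]

M = np.column_stack(W1 + W2)       # all four spanning vectors as columns
print(np.linalg.matrix_rank(M))    # 3: W1 + W2 has three independent directions

# A vector of the form (c1, -c1, c2, c3) lies in the column space of M.
v = np.array([5, -5, 2, 7])
c, *_ = np.linalg.lstsq(M, v, rcond=None)
print(np.allclose(M @ c, v))       # True
```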

Exercises 1.3

1. Prove that each of the sets listed as 1, 2, 4, and 5 in Example 1 of this section is
a subspace of R n .

2. Explain the connection between the sets listed as 4 and 5 in Example 1 and
Theorem 1.11.

3. Let 𝓛 denote the set of all real numbers λ such that 0 < λ < 1. For each λ ∈ 𝓛, let M_λ be the set of all x ∈ R such that |x| < λ. Find ∪_{λ∈𝓛} M_λ and ∩_{λ∈𝓛} M_λ.

4. Let V = R^3.

(a) Exhibit a set of three vectors that spans V.


(b) Exhibit a set of four vectors that spans V.

5. Formulate Definition 1.4 and Definition 1.5 for an indexed set A = {u_λ | λ ∈ 𝓛} of vectors in R^n.

6. For each given set A and subspace W , determine whether or not A spans W .

(a) A = {(1, 0, 2), (−1, 1, −3)}, W = ⟨(1, 0, 2)⟩
(b) A = {(1, 0, 2), (−1, 1, −3)}, W = ⟨(1, 1, 1), (2, −1, 5)⟩
(c) A = {(1, −2), (−1, 3)}, W = R^2
(d) A = {(2, 3, 0, −1), (2, 1, −1, 2)}, W = ⟨(0, −2, −1, 3), (6, 7, −1, 0)⟩
(e) A = {(3, −1, 2, 1), (4, 0, 1, 0)}, W = ⟨(3, −1, 2, 1), (4, 0, 1, 0), (0, −1, 0, 1)⟩
(f) A = {(3, −1, 2, 1), (4, 0, 1, 0)}, W = ⟨(3, −1, 2, 1), (4, 0, 1, 0), (1, 1, −1, −1)⟩

7. Let W_1 = ⟨(2, −1, 5)⟩ and W_2 = ⟨(3, −2, 10)⟩. Determine whether or not the given vector u is in W_1 + W_2.

(a) u = (−4, 1, −5)
(b) u = (3, 2, −6)
(c) u = (−5, 3, −2)
(d) u = (3, 0, 0)

8. Let A = {(1, 2, 0), (1, 1, 1)}, and let B = {(0, 1, −1), (0, 2, 2)}. By direct verification of the conditions of Theorem 1.10 in each case, show that ⟨A⟩, ⟨B⟩, and ⟨A⟩ + ⟨B⟩ are subspaces of R^n.

9. Let A_1 = {u, v} and A_2 = {u, v, w} be subsets of R^n with w = u + v. Show that ⟨A_1⟩ = ⟨A_2⟩ by use of Definition 1.13.

10. Prove or disprove: ⟨A + B⟩ = ⟨A⟩ + ⟨B⟩ for any nonempty subsets A and B of R^n.

11. Let W be a subspace of R^n. Use condition (ii) of Theorem 1.10 and mathematical induction to show that any linear combination of vectors in W is again a vector in W.

12. If A ⊆ R^n, prove that ⟨A⟩ is the intersection of all of the subspaces of R^n that contain A.

13. Let W be a subspace of R^n. Prove that ⟨W⟩ = W.

14. Prove that ⟨A⟩ = ⟨B⟩ if and only if every vector in A is dependent on B and every vector in B is dependent on A.

1.4 Geometric Interpretations of R^2 and R^3


For n = 1,2, or 3, the vector space R n has a useful geometric interpretation in which
a vector is identified with a directed line segment. This procedure is no doubt familiar
to the student from the study of the calculus. In this section, we briefly review this
interpretation of vectors and relate the geometric concepts to our work. The procedure
can be described as follows.
For n = 1, the vector v = (x) is identified with the directed line segment on the real
line that has its initial point (tail) at the origin and its terminal point (head) at x. This
is shown in Figure 1.1.


Figure 1.1

For n = 2 or n = 3, the vector v = (x, y) or v = (x, y, z) is identified with the directed


line segment that has initial point at the origin and terminal point with rectangular
coordinates given by the components of v. This is shown in Figure 1.2.

Figure 1.2 (n = 2 and n = 3)

In making identifications of vectors with directed line segments, we shall follow the
convention that any line segment with the same direction and the same length as the
one we have described may be used to represent the same vector v.

In the remainder of this section, we shall concentrate our attention on R 3 . However,


it should be observed that corresponding results are valid in R 2 .
If u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3), then u + v = (u_1 + v_1, u_2 + v_2, u_3 + v_3). Thus, in the identification above, u + v is the diagonal of a parallelogram which has u and v as two adjacent sides. This is illustrated in Figure 1.3. The vector u + v can be drawn by placing the initial point of v at the terminal point of u and then drawing the directed line segment from the initial point of u to the terminal point of v. The "heads to tails" construction shown in Figure 1.3 is called the parallelogram rule for adding vectors.

Figure 1.3

Now u — v is the vector w satisfying v + w = u, so that u — v is the directed line


segment from the terminal point of v to the terminal point of u, as shown in Figure 1.4.
Since u — v has its head at u and its tail at v, this construction is sometimes referred
to as the "heads minus tails" rule.

Figure 1.4

In approaching the geometric interpretation of subspaces, it is convenient to consider


only those line segments with initial point at the origin. We shall do this in the following
four paragraphs.
As mentioned earlier, the set of all scalar multiples of a fixed nonzero vector v in
R 3 is a subspace of R 3 . From our interpretation above, it is clear that this subspace
(v) consists of all vectors that lie on a line passing through the origin. This is shown in
Figure 1.5.

Figure 1.5

If A = {v_1, v_2} is independent, then v_1 and v_2 are not collinear. If P is any point in the plane determined by v_1 and v_2, then the vector OP from the origin to P is the diagonal of a parallelogram with sides parallel to v_1 and v_2, as shown in Figure 1.6. In this case, the subspace ⟨A⟩ consists of all vectors in the plane through the origin that contains v_1 and v_2.

Figure 1.6

If A = {v_1, v_2, v_3} is linearly independent, then v_1 and v_2 are not collinear and v_3 does not lie in the plane of v_1 and v_2. Vectors v_1, v_2, and v_3 of this type are shown in Figure 1.7. An arbitrary vector OP in R^3 is the diagonal of a parallelepiped with adjacent edges a_1v_1, a_2v_2, and a_3v_3 as shown in Figure 1.7(a). The "heads to tails" construction along the edges of the parallelepiped indicated in Figure 1.7(b) shows that

    OP = a_1v_1 + a_2v_2 + a_3v_3.

Figure 1.7 (a), (b)

We shall prove in the next section that a subset of R 3 cannot contain more than three
linearly independent vectors. Thus the subspaces of R 3 fall into one of four categories:

1. the origin;
2. a line through the origin;
3. a plane through the origin;
4. the entire space R 3 .

It is shown in calculus courses that a plane in R 3 consists of all points with rectan­
gular coordinates (x,y,z) that satisfy a linear equation

ax + by + cz = d

in which at least one of a, 6, c is not zero. A connection is made in the next example
between this fact and our classification of subspaces.

Example 1 □ Consider the problem of finding an equation of the plane ⟨A⟩ if A = {(1, 2, 3), (3, 5, 1)}.
Now the line segments² extending from the origin (0, 0, 0) to (1, 2, 3) and from the origin to (3, 5, 1) lie in the plane ⟨A⟩, so the three points with coordinates (0, 0, 0), (1, 2, 3), and (3, 5, 1) must all lie in the plane. This is shown in Figure 1.8.

²Note that ordered triples such as (1, 2, 3) are doing double duty here. Sometimes they are coordinates of points, and sometimes they are vectors.

Figure 1.8

Since the points lie in the plane, their coordinates must satisfy the equation

    ax + by + cz = d

of the plane. Substituting in order for (0, 0, 0), (1, 2, 3), and (3, 5, 1), we obtain

    0 = d
    a + 2b + 3c = d
    3a + 5b + c = d.

Using d = 0 and subtracting 3 times the second equation from the last one leads to

    a + 2b + 3c = 0
    −b − 8c = 0.

Solving for a and b in terms of c, we obtain the following solutions

    a = 13c
    b = −8c
    c is arbitrary.

With c = 1, we have

    13x − 8y + z = 0

as the equation of the plane ⟨A⟩.
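The same plane can be found without elimination by using the cross product (defined in Exercise 17 at the end of this section): a normal vector to the plane through the origin containing the two given vectors is their cross product. The sketch below is our own illustration:

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([3, 5, 1])

# A normal to the plane through the origin containing v1 and v2.
n = np.cross(v1, v2)
print(n)               # [-13   8  -1], i.e. 13x - 8y + z = 0 after a change of sign
print(n @ v1, n @ v2)  # 0 0: both spanning vectors satisfy the equation
```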

In the remainder of our discussion, we shall need the following definition, which
applies to real coordinate spaces in general.
Definition 1.18 For any two vectors u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n), the inner product (dot product, or scalar product) of u and v is defined by

    u · v = u_1v_1 + u_2v_2 + ··· + u_nv_n = Σ_{k=1}^{n} u_kv_k.

The inner product defined in this way is a natural extension of the following definitions that are used in the calculus:

    (x_1, y_1) · (x_2, y_2) = x_1x_2 + y_1y_2,
    (x_1, y_1, z_1) · (x_2, y_2, z_2) = x_1x_2 + y_1y_2 + z_1z_2.

The distance formulas used in the calculus lead to formulas for the length ||v|| of a vector v in R^2 or R^3 as follows:

    ||(x, y)|| = √(x² + y²),
    ||(x, y, z)|| = √(x² + y² + z²).

We extend these formulas for length to more general use in the next definition.

Definition 1.19 For any v = (v_1, v_2, ..., v_n) in R^n, the length (or norm) of v is denoted by ||v|| and is defined by

    ||v|| = √(v_1² + v_2² + ··· + v_n²).
The following properties are direct consequences of the definitions involved, and are
presented as a theorem for convenient reference.
Theorem 1.20 For any u, v, w in R^n and any a in R:
(i) u · v = v · u
(ii) (au) · v = u · (av) = a(u · v)
(iii) u · (v + w) = u · v + u · w
(iv) ||u|| = √(u · u), or u · u = ||u||²
(v) ||au|| = |a| ||u||.
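Definition 1.18, Definition 1.19, and the properties in Theorem 1.20 are easy to test on particular vectors. The sketch below is our own illustration with numpy:

```python
import numpy as np

u = np.array([1.0, -2.0, 4.0, 2.0])
v = np.array([2.0, 6.0, 0.0, -3.0])
a = 3.0

print(np.dot(u, v))                                    # inner product, Definition 1.18
print(np.linalg.norm(u))                               # length ||u||, Definition 1.19
print(np.isclose(np.dot(u, u), np.linalg.norm(u)**2))  # property (iv)
print(np.isclose(np.linalg.norm(a * u), abs(a) * np.linalg.norm(u)))  # property (v)
```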

Our next theorem gives a geometric interpretation of u · v in R^2 or R^3. In the proof, we use the Law of Cosines from trigonometry: If the sides and angles of an arbitrary triangle are labeled according to the pattern in Figure 1.9, then

    cos C = (a² + b² − c²) / (2ab).

We state and prove the theorem for R^3, but the same result holds in R^2 with a similar proof.

Figure 1.9

Theorem 1.21 For any two nonzero vectors u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) in R^3, u · v = ||u|| ||v|| cos θ, where θ is the angle between the directions of u and v and 0° ≤ θ ≤ 180°.

Proof. Suppose first that θ = 0° or θ = 180°. Then v = cu, where the scalar c is positive if θ = 0° and negative if θ = 180°. We have

    ||u|| ||v|| cos θ = ||u|| (|c| · ||u||) cos θ
                      = |c| cos θ ||u||²
                      = c ||u||²

and

    u · v = (u_1, u_2, u_3) · (cu_1, cu_2, cu_3)
          = c ||u||².

Thus the theorem is true for θ = 0° or θ = 180°.
Suppose now that 0° < θ < 180°. If u − v is drawn from the head of v to the head of u, the vectors u, v and u − v form a triangle with u − v as the side opposite θ. (See Figure 1.10.)
Figure 1.10

From the Law of Cosines, we have

    cos θ = (||u||² + ||v||² − ||u − v||²) / (2 ||u|| ||v||).

Thus

    ||u|| ||v|| cos θ = (1/2)(||u||² + ||v||² − ||u − v||²)
                      = (1/2)[u_1² + u_2² + u_3² + v_1² + v_2² + v_3²
                              − ((u_1 − v_1)² + (u_2 − v_2)² + (u_3 − v_3)²)]
                      = u_1v_1 + u_2v_2 + u_3v_3
                      = u · v. ■

Corollary 1.22 In R^2 or R^3, two nonzero vectors u and v are perpendicular (or orthogonal) if and only if u · v = 0.

Proof. This follows at once from the fact that u · v = 0 if and only if cos θ = 0. ■
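Theorem 1.21 gives a way to compute the angle between two nonzero vectors, and Corollary 1.22 gives the test for orthogonality. The sketch below is our own illustration (the clip call only guards against round-off):

```python
import numpy as np

def angle_deg(u, v):
    # theta from u.v = ||u|| ||v|| cos(theta), as in Theorem 1.21
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

print(angle_deg(np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])))  # approximately 45.0
print(np.dot(np.array([-3, 9]), np.array([3, 1])))  # 0: orthogonal by Corollary 1.22
```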

Suppose that u and v are vectors in R^2 or R^3 represented by directed line segments with the same initial point, as shown in Figure 1.11. The vector labeled Proj_u v in the figure is called the vector projection of v onto u. In order to construct Proj_u v, we first draw the straight line that contains u. Next we draw a perpendicular segment joining the head of v to the line containing u. The vector from the initial point of u to the foot of the perpendicular segment is Proj_u v. The vector Proj_u v is also called the vector component of v along u.

Figure 1.11

Let θ (0° ≤ θ ≤ 180°) denote the angle between the directions of u and v as labeled in Figure 1.11. The number

    d = ||v|| cos θ = (u · v)/||u||

is called the scalar projection of v onto u or the scalar component of v along u. From Figure 1.11, it is clear that d is the length of Proj_u v if 0° ≤ θ ≤ 90° and d is the negative of the length of Proj_u v if θ > 90°. Thus d can be regarded as the directed length of Proj_u v.
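Both projections can be computed from the inner product. The vector projection formula Proj_u v = (u · v / u · u)u used below is not written out in the text, but it follows from the scalar projection d and the direction of u; the sketch is our own illustration:

```python
import numpy as np

def scalar_projection(v, u):
    return np.dot(u, v) / np.linalg.norm(u)    # d = ||v|| cos(theta)

def vector_projection(v, u):
    return (np.dot(u, v) / np.dot(u, u)) * u   # Proj_u v, with directed length d

u = np.array([2.0, 0.0])
v = np.array([3.0, 4.0])
print(scalar_projection(v, u))   # 3.0
print(vector_projection(v, u))   # [3. 0.]
```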
The geometry involved in having line segments perpendicular to each other breaks down in R^n if n > 3. Even so, we extend the use of the word orthogonal to all R^n. Two vectors u, v in R^n are called orthogonal if u · v = 0. A set {u_λ | λ ∈ 𝓛} of vectors in R^n is an orthogonal set if u_{λ_1} · u_{λ_2} = 0 whenever λ_1 ≠ λ_2.

Exercises 1.4

1. Use Figures 1.2 and 1.3 as patterns and illustrate the parallelogram rule with the vectors u = (1, 6), v = (4, −4), and u + v in an xy-coordinate system.

2. Use Figures 1.2 and 1.4 as patterns and sketch the vectors u = (5,6), v = (2, —3),
and u — v in an xy-coordinate system.

3. For each λ ∈ R, let M_λ be the set of all points in the plane with rectangular coordinates (x, y) that satisfy y = λx. Find ∩_{λ∈R} M_λ and ∪_{λ∈R} M_λ.
4. Find the equation of the plane ⟨A⟩ for the given set A.

(a) A = {(1, 0, 2), (2, −1, 1)}
(b) A = {(1, 0, 2), (2, 1, 5)}

5. Find the lengths of the following vectors.

(a) (3, −4, −12)
(b) (2, 3, 6)
(c) (1, −2, 4, 2)
(d) (2, 6, 0, −3)
(e) (1, −2, −4, 3)
(f) (3, 0, −5, 8)
6. Determine x so that (x,2) is perpendicular to (—3,9).
7. A vector of length 1 is called a unit vector.

(a) Find a unit vector that has the same direction as (3, —4,12).
(b) Find a vector in the direction of u = (2, —3,6) that has length 4 units.

8. Find each of the following scalar projections.

(a) The scalar projection of (2,3,1) onto (—1, —2,4).


(b) The scalar projection of (-3,4,12) onto (2,3, - 6 ) .

9. Find the length of the projection of the vector (3,4) onto a vector contained in
the line x — 2y = 0.
10. Use projections to write the vector (19,22) as a linear combination of (3,4) and
(4, —3). (Note that (3,4) and (4, —3) are perpendicular.)

11. Let A = {(1, 0, 2), (2, −1, 1)} and let B = {(1, 1, −1), (2, 1, 1)}.

(a) Find a set of vectors that spans ⟨A⟩ ∩ ⟨B⟩.
(b) Find a set of vectors that spans ⟨A⟩ + ⟨B⟩.

12. Work problem 11 with A = {(3, 1, −2), (−2, 1, 3)} and B = {(0, 1, 0), (1, 1, 1)}.

13. Give an example of an orthogonal set of vectors in R n .

14. Prove that an orthogonal set of nonzero vectors in R n is linearly independent.

15. Let u and v be vectors in R^n. Prove that ||u|| = ||v|| if and only if u + v and u − v are orthogonal.

16. Prove Theorem 1.20.

17. The cross product u × v of two vectors u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) is given by

    u × v = (u_2v_3 − u_3v_2, u_3v_1 − u_1v_3, u_1v_2 − u_2v_1)
          = (u_2v_3 − u_3v_2)e_1 + (u_3v_1 − u_1v_3)e_2 + (u_1v_2 − u_2v_1)e_3,

where e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1). The symbolic determinant below is frequently used as a memory aid, since "expansion" about the first row yields the value of u × v.

            | e_1  e_2  e_3 |
    u × v = | u_1  u_2  u_3 |
            | v_1  v_2  v_3 |

Prove the following facts concerning the cross product.

(a) u × v is perpendicular to each of u, v.
(b) u × u = 0.
(c) u × v = −(v × u).
(d) (au) × v = a(u × v) = u × (av).
(e) u × (v + w) = (u × v) + (u × w).
(f) ||u × v||² = ||u||² ||v||² − (u · v)².
(g) ||u × v|| = ||u|| ||v|| sin θ, where θ is the angle between the directions of u and v, and 0° ≤ θ ≤ 180°.
(h) ||u × v|| is the area of a parallelogram with u and v as adjacent sides.
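Parts (a) through (h) of Exercise 17 can be spot-checked numerically before attempting the proofs; a numerical check is of course not a proof. The sketch below is our own illustration:

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([3.0, 0.0, 2.0])
w = np.cross(u, v)

print(np.isclose(np.dot(w, u), 0), np.isclose(np.dot(w, v), 0))  # (a)
print(np.allclose(np.cross(u, u), 0))                            # (b)
print(np.allclose(np.cross(v, u), -w))                           # (c)
print(np.isclose(np.linalg.norm(w)**2,
                 np.linalg.norm(u)**2 * np.linalg.norm(v)**2 - np.dot(u, v)**2))  # (f)
```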

1.5 Bases and Dimension


We have seen in Section 1.3 that a subset A of the subspace W of R n may be a
spanning set for W and also that A may be a linearly independent set. When both of
these conditions are imposed, they form the requirements necessary for the subset to be
a basis of W .

Definition 1.23 A set B of vectors is a basis of the subspace W if (i) B spans W and (ii) B is linearly independent.

The empty set ∅ is regarded as being linearly independent since the condition for linear dependence in Definition 1.5 cannot be satisfied. Thus ∅ is a basis of the zero subspace of R^n.

Example 1 □ Some of our earlier work helps in providing examples concerning bases. We saw in Example 2 of Section 1.3 that each of the sets

    E_3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)},
    A = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}

spans R^3. To show that the set A is linearly independent, we set up the equation

    c_1(1, 1, 1) + c_2(0, 1, 1) + c_3(0, 0, 1) = (0, 0, 0).

This equation leads directly to the following system of equations.

    c_1 = 0
    c_1 + c_2 = 0
    c_1 + c_2 + c_3 = 0

The only solution to this system is c_1 = 0, c_2 = 0, c_3 = 0, and therefore A is linearly independent. It is even easier to see that E_3 is linearly independent. Thus both E_3 and A are bases for R^3.
We saw in Example 4 of Section 1.2 that the set

    A = {(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)}

is linearly dependent. It follows that this set A is not a basis for the subspace

    W = ⟨(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)⟩

that it spans. ■
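Because a basis of R^3 must span and be independent, a quick numerical check for a candidate basis is that the matrix with the candidate vectors as columns has rank 3. The sketch below is our own illustration applied to the sets of Example 1:

```python
import numpy as np

E3 = np.column_stack([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
A  = np.column_stack([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
print(np.linalg.matrix_rank(E3), np.linalg.matrix_rank(A))  # 3 3: both are bases of R^3

# The dependent set from Example 4 of Section 1.2 has rank 2, so it is not a basis of W.
D = np.column_stack([(1, 1, 8, 1), (1, 0, 3, 0), (3, 1, 14, 1)])
print(np.linalg.matrix_rank(D))                              # 2
```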

We are concerned in much of our future work with indexed sets of vectors, and we use a restricted type of equality for this type of set. Two indexed sets A and B are equal if and only if they are indexed A = {u_λ | λ ∈ 𝓛} and B = {v_λ | λ ∈ 𝓛} by the same index set 𝓛 such that u_λ = v_λ for each λ ∈ 𝓛. In particular, two finite sets A = {u_1, u_2, ..., u_k} and B = {v_1, v_2, ..., v_k} are equal if and only if they consist of the same vectors in the same order.
The equality described in the preceding paragraph is the one we shall use in the remainder of this book. For finite sets A = {u_1, u_2, ..., u_k} of vectors, this equality is actually an equality of ordered sets. For example, if u_1 ≠ u_2, then

    {u_1, u_2, ..., u_k} ≠ {u_2, u_1, ..., u_k}.

When we write

    A = {u_1, u_2, ..., u_k},

this notation is meant to imply that A is an ordered set with u_1 as the first vector, u_2 as the second vector, and so on. Moreover, we make a notational agreement for the remainder of this book that when we list the vectors in a set, this listing from left to right specifies their order. For instance, if we write

    A = {(5, −1, 0, 2), (−4, 0, 3, 7), (1, −1, 3, 9)}

this means that (5, −1, 0, 2) is the first vector in A, (−4, 0, 3, 7) is the second vector in A, and (1, −1, 3, 9) is the third vector in A. That is, the vectors in A are automatically indexed with positive integers 1, 2, 3, ... from left to right without this being stated.
Suppose now that B = {v_1, v_2, ..., v_k} is a basis of the subspace W. Then B spans W, so that any v ∈ W can be written as Σ_{i=1}^{k} a_iv_i. As a matter of fact, this expression is unique. For if v = Σ_{i=1}^{k} b_iv_i as well, we have

    Σ_{i=1}^{k} a_iv_i = Σ_{i=1}^{k} b_iv_i

and therefore

    Σ_{i=1}^{k} (a_i − b_i)v_i = 0.

Since B is linearly independent, this requires that a_i − b_i = 0 and a_i = b_i for each i. In particular, we observe that if v = 0, then all a_i are zero. This uniqueness of coefficients is not valid for spanning sets in general. The set B in Example 1 of Section 1.2 furnishes an illustration of this fact.
Although a given subspace usually has many different bases, it happens that the
number of vectors in different bases of the same subspace is always the same. The
derivation of this result is the principal objective of this section.

Theorem 1.24 Let W be a subspace of R^n. Suppose that a finite set A = {u_1, u_2, ..., u_r} spans W, and let B be a linearly independent set of vectors in W. Then B contains at most r vectors.

Proof. Let W, A, and B be as described in the statement of the theorem. If B contains less than r vectors, the theorem is true. Suppose then that B contains at least r vectors, say {v_1, v_2, ..., v_r} ⊆ B.
Our proof of the theorem follows this plan: We shall show that each of the vectors v_i in B can be used in turn to replace a suitably chosen vector in A, with A dependent on the set obtained after each replacement. The replacement process finally leads to the fact that A is dependent on the set {v_1, v_2, ..., v_r}. We then prove that this set of r vectors must, in fact, be equal to B.
Since A spans W, v_1 = Σ_{i=1}^{r} a_{i1}u_i with at least one a_{i1} ≠ 0 because v_1 ≠ 0. Without loss of generality, we may assume that a_{11} ≠ 0 in the equation

    v_1 = a_{11}u_1 + a_{21}u_2 + ··· + a_{r1}u_r.

(This assumption is purely for notational convenience. We are assuming that the "suitably chosen" vector in A is the first vector listed in A.) The equation for v_1 implies that

    a_{11}u_1 = v_1 − a_{21}u_2 − ··· − a_{r1}u_r

and therefore

    u_1 = (1/a_{11})v_1 + (−a_{21}/a_{11})u_2 + ··· + (−a_{r1}/a_{11})u_r.

Thus u_1 is dependent on {v_1, u_2, ..., u_r}, and this clearly implies that A is dependent on {v_1, u_2, ..., u_r}.
Assume now that A is dependent on {v_1, v_2, ..., v_k, u_{k+1}, ..., u_r}, where 1 ≤ k < r.
Since W is dependent on A, then W is dependent on {v_1, v_2, ..., v_k, u_{k+1}, ..., u_r} by
Theorem 1.8. In particular,

    v_{k+1} = Σ_{i=1}^{k} b_{i,k+1} v_i + Σ_{i=k+1}^{r} a_{i,k+1} u_i.

At least one of the coefficients a_{i,k+1} of the u_i must be nonzero. For if they were all zero,
then v_{k+1} would be a linear combination of v_1, v_2, ..., v_k, and this would contradict the
linear independence of B. Without loss of generality, we may assume that a_{k+1,k+1} ≠ 0.
Solving the equation above for u_{k+1}, we obtain

    u_{k+1} = Σ_{i=1}^{k} (-b_{i,k+1}/a_{k+1,k+1}) v_i + (1/a_{k+1,k+1}) v_{k+1} + Σ_{i=k+2}^{r} (-a_{i,k+1}/a_{k+1,k+1}) u_i.

Thus

    {v_1, v_2, ..., v_k, u_{k+1}, ..., u_r}

is dependent on

    {v_1, v_2, ..., v_k, v_{k+1}, u_{k+2}, ..., u_r}.

Since A is dependent on {v_1, v_2, ..., v_k, u_{k+1}, ..., u_r}, Theorem 1.8 implies that A is
dependent on {v_1, v_2, ..., v_k, v_{k+1}, u_{k+2}, ..., u_r}.
Letting k = 1, 2, ..., r-1 in the iterative argument above, we see that each v_i in
B can be used to replace a suitably chosen vector in A until we obtain the fact that A
is dependent on {v_1, v_2, ..., v_r}. But B is dependent on A, so we have B dependent on
{v_1, v_2, ..., v_r}. In particular, if B had more than r elements, any v_{r+1} in B would be
dependent on {v_1, v_2, ..., v_r}. But this is impossible since B is independent. Therefore,
B has r elements, and this completes the proof. ■

Corollary 1.25 Any linearly independent set of vectors in R^n contains at most n vectors.

Proof. The set of n vectors e_1 = (1,0,...,0), e_2 = (0,1,...,0), ..., e_n = (0,0,...,1)
spans R^n since v = (v_1, v_2, ..., v_n) can be written as v = Σ_{i=1}^{n} v_i e_i. The corollary
follows at once from the theorem. ■

If we think in terms of geometric models as presented in Section 1.4, the next theorem
seems intuitively obvious. It certainly seems obvious in R 2 and R 3 , and there is no
reason to suspect the situation to be different in R n for other values of n. On the
other hand, there is no compelling reason to suspect that the situation would not be
different in R n for other values of n. At any rate, we refuse to accept such an important
statement on faith or intuition, and insist that this result be validated by a logical
argument based upon our development up to this point. This attitude or frame of mind
is precisely what is meant when one refers to the "axiomatic method" of mathematics.

Theorem 1.26 Every subspace of R^n has a basis with a finite number of elements.

Proof. Let W be a subspace of R^n. If W = {0}, then ∅ is a basis of W, and the
theorem is true.
Suppose W ≠ {0}. Then there is at least one nonzero v_1 in W. The set {v_1} is
linearly independent by Problem 11 of Exercises 1.2. Thus, there are nonempty subsets
of W that are linearly independent, and Corollary 1.25 shows that each of these subsets
contains at most n elements. Let T be the set of all positive integers t such that W
contains a set of t linearly independent vectors. Then any t in T satisfies the inequality
1 ≤ t ≤ n. Let r be the largest integer in T, and let B = {v_1, v_2, ..., v_r} be a linearly
independent set of r vectors in W. We shall show that B spans W.
Let v be any vector in W. Then {v_1, v_2, ..., v_r, v} is linearly dependent, from the
choice of r. Thus, there are scalars c_1, c_2, ..., c_r, c_{r+1}, not all zero, such that

    Σ_{i=1}^{r} c_i v_i + c_{r+1} v = 0.

Now c_{r+1} ≠ 0 since B is independent. Therefore, v = Σ_{i=1}^{r} (-c_i/c_{r+1}) v_i. This shows
that B spans W, and hence is a basis of W. ■

This brings us to the main result of this section.

Theorem 1.27 Let W be a subspace of R^n, and let A and B be any two bases for W.
Then A and B have the same number of elements.

Proof. If W = {0}, then each of A and B must be the empty set ∅, and the number
of elements in both A and B is 0.
Suppose W ≠ {0}. From Corollary 1.25, A and B are both finite. Let A =
{u_1, u_2, ..., u_r} and B = {v_1, v_2, ..., v_t}. Since A spans W and B is linearly independent,
t ≤ r by Theorem 1.24. But B spans W and A is linearly independent, so r ≤ t by the
same theorem. Thus, t = r. ■

Definition 1.28 If W is any subspace of R^n, the number of vectors in a basis of W
is called the dimension of W and is abbreviated as dim(W).

The following theorem is somewhat trivial, but it serves to confirm that the preceding
definition of dimension is consistent with our prior experience.

Theorem 1.29 The dimension of R^n is n.

Proof. Consider the set E_n = {e_1, e_2, ..., e_n}, where e_1 = (1,0,...,0), e_2 = (0,1,...,0),
..., e_n = (0,0,...,1) are the same as in the proof of Corollary 1.25. It was noted in
that proof that an arbitrary vector v = (v_1, v_2, ..., v_n) can be written as

    v = Σ_{i=1}^{n} v_i e_i,

and therefore E_n spans R^n.
The set E_n is linearly independent since

    Σ_{i=1}^{n} c_i e_i = 0 implies (c_1, c_2, ..., c_n) = (0, 0, ..., 0)

and therefore all c_i = 0. Thus E_n is a basis of R^n with n elements, and it follows that
R^n has dimension n. ■

Definition 1.30 The set E_n = {e_1, e_2, ..., e_n} used in the proof of Theorem 1.29 is
called the standard basis of R^n.

The discussion of coefficients just before Theorem 1.24 explains why the coefficients
c_1, c_2, ..., c_n in v = Σ_{i=1}^{n} c_i v_i are unique whenever B = {v_1, v_2, ..., v_n} is a basis of
R^n. The scalars c_i are called the coordinates of v relative to B. For the special basis
E_n = {e_1, e_2, ..., e_n}, the components of v are the same as the coordinates relative to
E_n.

Example 2 □ With the basis

    E_3 = {(1,0,0), (0,1,0), (0,0,1)},

the coordinates of v = (x, y, z) relative to E_3 are the same as the components x, y, z,
respectively. But for the basis

    B = {(1,1,1), (0,1,1), (0,0,1)},

the coordinates of v = (x, y, z) relative to B are the numbers x, y - x, z - y because

    (x, y, z) = x(1,1,1) + (y - x)(0,1,1) + (z - y)(0,0,1). ■
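
The coordinates in Example 2 can also be found numerically: relative to a basis, the coordinates of v are the unique solution of the linear system whose coefficient columns are the basis vectors. The sketch below is our own illustration in Python with NumPy (an aid not used in this text) and checks the formula above for one particular choice of (x, y, z).

    import numpy as np

    # Basis B of Example 2, written as the columns of a matrix.
    B = np.column_stack([(1, 1, 1), (0, 1, 1), (0, 0, 1)]).astype(float)

    v = np.array([4.0, 7.0, 9.0])      # v = (x, y, z) with x = 4, y = 7, z = 9

    # The coordinates c satisfy B c = v; they are unique because B is a basis.
    c = np.linalg.solve(B, v)
    print(c)                           # expect [4. 3. 2.], that is, (x, y - x, z - y)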

There are several types of problems involving "basis" and "dimension" that occur
often in linear algebra. In dealing with a certain subspace W , it may be necessary to
find the dimension of W , to find a basis of W , or to determine whether or not a given
set is a basis of W . Frequently it is desirable to find a basis of W that has certain
specified properties. The fundamental techniques for attacking problems such as these
are developed in the remainder of this section.

Theorem 1.31 Every spanning set of a subspace W of R^n contains a basis of W.

Proof. Suppose that W is a subspace of R^n and that A is a spanning set for W. If A
is independent, then A is a basis and the theorem is true. Consider now the possibilities
when A is dependent.
If A = {0}, then W is the zero subspace. But we have already seen that ∅ is
a basis for the zero subspace and ∅ ⊆ {0}. Thus the theorem is true if A = {0}. If
A ≠ {0}, then there exists a v_1 in A that is nonzero. If A is dependent on {v_1},
we have a basis of W. If A is not dependent on {v_1}, then there is a v_2 in A such
that {v_1, v_2} is linearly independent. This procedure can be repeated until we obtain
a set B = {v_1, v_2, ..., v_r}, r ≤ n, that is linearly independent and spans W. For if
we did not obtain such a linearly independent set, we could continue until we had a
linearly independent set in W containing more than n vectors, and this would contradict
Corollary 1.25 since W ⊆ R^n. The set B thus obtained is the required basis of W. ■

Although the details of the work would vary, the procedure given in the proof above
provides a method for "refining" a basis from a given spanning set. This refinement
procedure is demonstrated in the next example.

Example 3 □ With

    A = {(1,2,1,0), (3,-4,5,6), (2,-1,3,3), (-2,6,-4,-6)},

we shall use the procedure in the proof of Theorem 1.31 to find a basis of W = (A)
that is contained in A.
It is natural to start the procedure by choosing v_1 = (1,2,1,0). We see that A is
not dependent on {v_1} because the second vector in A, (3,-4,5,6), is not a multiple
of v_1. If we let v_2 = (3,-4,5,6), then {v_1, v_2} is linearly independent.

We need to check now to see if A is dependent on {v_1, v_2}. When we set up the
equation

    c_1(1,2,1,0) + c_2(3,-4,5,6) = (2,-1,3,3),

this leads to the system of equations

     c_1 + 3c_2 =  2
    2c_1 - 4c_2 = -1
     c_1 + 5c_2 =  3
           6c_2 =  3.

The solution to this system is easily found to be c_1 = c_2 = 1/2. Thus the third vector in
A is dependent on {v_1, v_2}. In similar fashion, we find that

    (1)(1,2,1,0) + (-1)(3,-4,5,6) = (-2,6,-4,-6).

Thus A is dependent on {v_1, v_2}, and

    {(1,2,1,0), (3,-4,5,6)}

is a basis for (A).
The work we have done shows that (A) has dimension 2. After checking to see that
no vector in A is a multiple of another vector in A, we can then see that any pair of
vectors from A forms a basis of (A). ■

In the proof of Theorem 1.31, we have seen how a basis of a subspace W can be
refined or extracted from an arbitrary spanning set. The spanning set is not required
to be a finite set, but it could happen to be finite, of course. If a spanning set is
finite, the natural refining procedure demonstrated in Example 3 can be given a simpler
description: A basis of W can be obtained by deleting all vectors in the spanning set
that are linear combinations of the preceding vectors. Problem 13 of Exercises 1.2
assures us this will lead to an independent set, and Theorem 1.8 assures us this will
lead to a spanning set. Thus a basis will result from the deletion of all vectors in a finite
spanning set that are linear combinations of preceding vectors as listed in the spanning
set.
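
This deletion procedure is easy to mechanize. The sketch below is our own Python/NumPy illustration, not part of the text: it keeps a vector only when it is not a linear combination of the vectors already kept, detecting dependence by comparing matrix ranks, and applied to the set A of Example 3 it keeps exactly the two vectors found there.

    import numpy as np

    def refine_to_basis(vectors):
        # Keep each vector that is not a linear combination of the previously kept vectors.
        basis = []
        for v in vectors:
            candidate = basis + [v]
            # v depends on the kept vectors exactly when adjoining it fails to raise the rank.
            if np.linalg.matrix_rank(np.array(candidate, dtype=float)) == len(candidate):
                basis.append(v)
        return basis

    A = [(1, 2, 1, 0), (3, -4, 5, 6), (2, -1, 3, 3), (-2, 6, -4, -6)]
    print(refine_to_basis(A))   # [(1, 2, 1, 0), (3, -4, 5, 6)], as in Example 3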
Our next theorem looks at a procedure that in a sense is opposite to refining: It
considers extending a linearly independent set to a basis.

Theorem 1.32 Any linearly independent set in a subspace W of R^n can be extended
to a basis of W.

Proof. Let A = {u_1, u_2, ..., u_r} be a basis of W, and let B = {v_1, v_2, ..., v_t} be
a linearly independent set in W. If every u_i is in (B), then A is dependent on B. By
Theorem 1.8, W is dependent on B, and B is a basis of W.

Suppose that some u_i ∉ (B). Let k_1 be the smallest integer such that u_{k_1} ∉ (B).
Then B_1 = {v_1, v_2, ..., v_t, u_{k_1}} is linearly independent. If each u_i ∈ (B_1), then B_1
spans W and forms a basis. If some u_i ∉ (B_1), we repeat the process. After p steps
(1 ≤ p ≤ r), we arrive at a set B_p = {v_1, v_2, ..., v_t, u_{k_1}, u_{k_2}, ..., u_{k_p}} such that all
vectors of A are dependent on B_p. Thus B_p spans W. Since no vector in B_p is a
linear combination of the preceding vectors, B_p is linearly independent by Problem 13
of Exercises 1.2. Therefore B_p is a basis of W. ■

In the next example, we follow the preceding proof to extend a linearly independent
set to a basis.

Example 4 □ Given that

    A = {(1,2,1,3), (1,0,0,0), (0,0,1,0), (0,1,0,1)}

is a basis of R^4 and that B = {(1,0,1,0), (0,2,0,3)} is linearly independent, we shall
extend B to a basis of R^4.
In keeping with the notational agreement made earlier in this section about indexing,
we assume that A and B are indexed with positive integers from left to right so that
the notation in the proof of Theorem 1.32 applies with

    u_1 = (1,2,1,3), u_2 = (1,0,0,0), u_3 = (0,0,1,0), u_4 = (0,1,0,1)

and

    v_1 = (1,0,1,0), v_2 = (0,2,0,3).
Following the proof of the theorem, we find that

    (1,2,1,3) = (1,0,1,0) + (0,2,0,3)

and thus u_1 is dependent on B. By inspection, we see that

    (1,0,0,0) = c_1(1,0,1,0) + c_2(0,2,0,3)

has no solution. Thus u_2 = (1,0,0,0) is not in (B), and k_1 = 2 is the smallest integer
such that u_{k_1} ∉ (B). Using the notation in the proof of the theorem, the set

    B_1 = {v_1, v_2, u_2}
        = {(1,0,1,0), (0,2,0,3), (1,0,0,0)}

is linearly independent. We check now for a vector u_{k_2} in A that is not in (B_1). We
find that

    (0,0,1,0) = (1)(1,0,1,0) + (0)(0,2,0,3) + (-1)(1,0,0,0)

and u_3 = (0,0,1,0) is in (B_1), but the equation

    (0,1,0,1) = c_1(1,0,1,0) + c_2(0,2,0,3) + c_3(1,0,0,0)

has no solution. Thus u_4 = (0,1,0,1) is not in (B_1).
After two steps we have arrived at the set

    B_2 = {v_1, v_2, u_2, u_4}

such that all vectors in A are dependent on B_2. According to the proof of Theorem 1.32,
this set B_2 is a basis of R^4. ■
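
The extension process in the proof of Theorem 1.32 can be sketched in the same style. The following Python/NumPy code is our own illustration (the function name is ours): it runs through the vectors of A in order and adjoins any vector that lies outside the span of the current set, reproducing the result of Example 4.

    import numpy as np

    def extend_to_basis(b, a):
        # Adjoin to the independent set b each vector of a that is outside the current span.
        result = list(b)
        for u in a:
            trial = result + [u]
            # u lies in the span of result exactly when adjoining it fails to raise the rank.
            if np.linalg.matrix_rank(np.array(trial, dtype=float)) == len(trial):
                result.append(u)
        return result

    A = [(1, 2, 1, 3), (1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1)]
    B = [(1, 0, 1, 0), (0, 2, 0, 3)]
    print(extend_to_basis(B, A))
    # [(1, 0, 1, 0), (0, 2, 0, 3), (1, 0, 0, 0), (0, 1, 0, 1)], as in Example 4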

Our last two theorems in this section apply to the very special situations where the
number of vectors in a set is the same as the dimension r of the subspace involved. For
sets of this special type, only one of the conditions for a basis needs to be checked. This
is the substance of the following two theorems.

Theorem 1.33 Let W be a subspace of R^n of dimension r. Then a set of r vectors in
W is a basis of W if and only if it is linearly independent.

Proof. If a set of r vectors in W is a basis, then it is linearly independent (and spans


W as well) according to Definition 1.23.
Let B = {v_1, v_2, ..., v_r} be a set of r linearly independent vectors in W. Then B
can be extended to a basis of W , by Theorem 1.32. Now W has a basis of r elements
since it is of dimension r, and hence all bases of W have r elements. In particular, the
basis to which B is extended has r elements, and therefore is the same as B. ■

Theorem 1.34 Let W be a subspace of R^n of dimension r. Then a set of r vectors in
W is a basis if and only if it spans W.

Proof. If a set of r vectors in W is a basis, then it spans W (and is linearly


independent as well) by Definition 1.23.
Suppose A = {v_1, v_2, ..., v_r} is a set of r vectors which spans W. According to Theorem
1.31, A contains a basis of W. But any basis of W contains r vectors. Therefore,
the basis contained in A is not a proper subset of A, and A is a basis of W. ■

Exercises 1.5

1. Given that each set A below spans R 3 , find a basis of R 3 that is contained in A.
(Hint: Follow the proof of Theorem 1.31.)

(a) A = {(2,6,-3), (5,15,-8), (3,9,-5), (1,3,-2), (5,3,-2)}
(b) A = {(1,0,2), (0,1,1), (2,1,5), (1,1,3), (1,2,1)}
(c) A = {(1,1,0), (2,2,0), (2,4,1), (5,9,2), (7,13,3), (1,2,1)}
(d) A = {(1,1,2), (2,2,4), (1,-1,1), (2,0,3), (3,1,5), (1,1,1)}

2. Given that each set A is a basis of R 4 and that each B is linearly independent,
follow the proof of Theorem 1.32 to extend B to a basis of R 4 .

(a) A = {(1,1,0,0), (0,1,1,0), (0,0,0,1), (0,1,0,1)},


B = {(1,0,2,3), (0,1, - 2 , - 3 ) }
(b) A = {(1,0,0,0), (0,0,1,0), (5,1,11,0), ( - 4 , 0 , - 6 , 1 ) } ,
B = {(1,0,1,0), (0,2,0,3)}
(c) A = {(1,1,1,1), (1,1, - 1 , - 1 ) , (1,0,1,0), (0,1,0, - 1 ) } ,
B = {(1,1,0,0), (0,0,1,1)}
(d) A = {(1,1,0,0), (1,0,4,6), (0,0,0,1), (0,1,0,1)},
B = {(1,0,2,3), ( 0 , 1 , - 2 , - 3 ) }

3. Show that Λ is a basis of R 3 by using Theorem 1.33.

(a) A = {(1,-2,3), (0,1,-2), (1,-1,2)}
(b) A = {(1,0,0), (1,1,0), (1,1,1)}
(c) A = {(2,0,0), (4,1,0), (3,3,1)}
(d) A = {(2,-1,1), (0,1,-1), (-2,1,0)}

4. Show that each of the sets A in Problem 3 is a basis of R 3 by using Theorem 1.34.
5. By direct use of the definition of a basis, show that each of the sets A in Problem
3 is a basis of R 3 .
6. Which of the following sets of vectors in R 3 are linearly dependent?

(a) {(1,3,1),(1,3,0)}
(b) {(1,-1,0), (0,1,1), (1,1,1), (0,0,1)}
(c) {(1,1,0), (0,1,1), (1,2,1), ( 1 , 0 , - 1 ) }
(d) {(1,0,1), (0,1,1), (2,1,3)}
(e) {(1,0,0), (1,1,0), (1,1,1)}
(f) {(1,1,0), ( 0 , 1 , - 1 ) , (1,0,0)}

7. Which of the sets of vectors in Problem 6 span R 3 ?


8. Which of the following sets are bases for R 3 ?

(a) {(1,0,0),(0,1,0),(0,0,1),(1,1,1)}
(b) {(1,0,0), (0,1,1)}
(c) {(1,0,0),(1,0,1),(1,1,1)}
(d) {(1,0,0), (0,1,0), (1,1,0)}

9. Determine whether or not A is a basis of R 4 .

(a) A = {(1,2,1,0), (3, - 4 , 5 , 6 ) , (2, - 1 , 3 , 3 ) , ( - 2 , 6 , - 4 , - 6 ) }


(b) A = {(1, - 1 , 2 , - 3 ) , (1,1,2,0), (3, - 1 , 6 , - 6 ) , (0,2,0,3)}

(c) A = {(1,0,2,-1), (0,1,1,2), (1,2,1,4), (2,2,3,0)}
(d) A = {(1,2,1,-1), (0,1,2,3), (1,4,5,5), (2,7,0,2)}
10. Show that the sets A and Β span the same subspace of R 4 .

(a) A = {(1,1,3, - 1 ) , (1,0, - 2 , 0 ) } , B = {(3,2,4, - 2 ) , (0,1,5, - 1 ) }


(b) A = {(2,3,0, - 1 ) , (2,1, - 1 , 2 ) } , B = {(0, - 2 , - 1 , 3 ) , (6,7, - 1 , 0 ) }
(c) A = {(1,0,3,0), (1,1,8,4)}, B = {(1,-1,-2,-4), (1,1,8,4), (3,-1,4,-4)}
(d) A = {(1, - 1 , - 1 , - 2 ) , (1, - 5 , - 1 , 0 ) } ,
B = {(0,2,0, - 1 ) , (1, - 3 , - 1 , - 1 ) , (3, - 5 , - 3 , - 5 ) }

11. Find a basis of (A) that is contained in A.

(a) A = {(1,0,1, - 1 ) , (3, - 2 , 3 , 5 ) , (2, - 1 , 2 , 2 ) , (5, - 2 , 5 , 3 ) }


(b) A = {(1,0,1,2), (3,1,0,3), (2,1, - 1 , 1 ) , (1, - 1 , 4 , 5 ) }
(c) A = {(2,-1,0,1), (1,2,1,0), (3,-4,-1,2), (5,3,2,1)}
(d) A = {(1,0,1,1), (0,1,-1,1), (1,-1,2,0), (2,1,1,7)}
12. Find the dimension of (A).

(a) A = {(1,2,1,0), (3, - 4 , 5 , 6 ) , (2, - 1 , 3 , 3 ) , ( - 2 , 6 , - 4 , - 6 ) }


(b) A = {(4,3,2,-1), (5,4,3,-1), (-2,-2,-1,2), (11,6,4,1)}
(c) A = {(1,0,1,2,-1), (0,1,-2,1,3), (2,1,0,5,1), (1,-1,3,1,-4)}
(d) A = {(1,2,0,1,0), (2,4,1,4,3), (1,2,2,5,-2), (-1,-2,3,5,4)}
13. Prove that if W_1 and W_2 are subspaces of R^n, then

        dim(W_1 + W_2) = dim(W_1) + dim(W_2) - dim(W_1 ∩ W_2).

    (Hint: Let C = {w_1, ..., w_r} be a basis of W_1 ∩ W_2, and extend C to bases
    A = {w_1, ..., w_r, u_1, ..., u_s} and B = {w_1, ..., w_r, v_1, ..., v_t} of W_1 and W_2, respectively.
    Prove that {w_1, ..., w_r, u_1, ..., u_s, v_1, ..., v_t} is a basis of W_1 + W_2.)

14. The sum W_1 + W_2 + ··· + W_k = Σ_{j=1}^{k} W_j of subspaces W_j of R^n is defined to
    be the set of all vectors of the form v_1 + v_2 + ··· + v_k, with v_i in W_i. The sum
    Σ_{j=1}^{k} W_j is called direct if

        W_i ∩ Σ_{j=1, j≠i}^{k} W_j = {0}

    for i = 1, 2, ..., k. A direct sum is written as W_1 ⊕ W_2 ⊕ ··· ⊕ W_k. Prove that

        dim(W_1 ⊕ W_2 ⊕ ··· ⊕ W_k) = Σ_{j=1}^{k} dim(W_j).
Chapter 2

Elementary Operations on
Vectors

2.1 Introduction
The elementary operations are as fundamental in linear algebra as the operations of
differentiation and integration are in the calculus. These elementary operations are
indispensable both in the development of the theory of linear algebra and in the appli­
cations of this theory.
In many treatments of linear algebra, the elementary operations are introduced after
the development of a certain amount of matrix theory, and the matrix theory is used as
a tool in establishing the properties of the elementary operations. In the presentation
here, this procedure is reversed somewhat. The elementary operations are introduced
as operations on sets of vectors and many of the results in matrix theory are developed
with the aid of our knowledge of elementary operations. This approach has two main
advantages. The material in Chapter 1 can be used to efficiently develop several of the
properties of elementary operations, and the statements of many of these properties are
simpler when formulated in vector terminology.

2.2 Elementary Operations and Their Inverses


We saw in the proof of Theorem 1.29 that R^n has the standard basis E_n = {e_1, e_2, ..., e_n}
in which each vector e_i has a very simple form. The Kronecker delta that was introduced
in Exercises 1.2 can be used to describe the vectors e_i in a concise way. Using the fact
that

    δ_ij = 1 if i = j  and  δ_ij = 0 if i ≠ j,

the vectors e_i can be written as

    e_i = (δ_i1, δ_i2, ..., δ_in)    for i = 1, 2, ..., n.


Example 1 □ The vectors in the standard basis E_4 = {e_1, e_2, e_3, e_4} are given by

    e_1 = (δ_11, δ_12, δ_13, δ_14) = (1,0,0,0),    e_2 = (δ_21, δ_22, δ_23, δ_24) = (0,1,0,0),
    e_3 = (δ_31, δ_32, δ_33, δ_34) = (0,0,1,0),    e_4 = (δ_41, δ_42, δ_43, δ_44) = (0,0,0,1). ■
We shall see in this chapter that every subspace W of R n has a basis with a certain
simple form, and that particular basis is called the standard basis of W . In order to
develop the concept of this standard basis, we shall need certain operations that change
one spanning set of W into another spanning set of W .
More specifically, we define three types of elementary operations on nonempty ordered
finite sets of vectors. These types of elementary operations will be referred to
hereafter as types I, II, or III. Let A = {v_1, v_2, ..., v_k} be a set of vectors in R^n.
(i) An elementary operation of type I multiplies one of the v_i in A by a nonzero scalar.
(ii) An elementary operation of type II replaces one of the vectors v_s by the sum of
v_s and a scalar multiple of a v_t (s ≠ t) in A.
(iii) An elementary operation of type III interchanges two vectors in A.
If the number 1 is used as the scalar in an elementary operation of type I, the
resulting elementary operation is called the identity operation. That is, the identity
operation on a set is that operation that leaves the set unchanged.
Example 2 □ Consider the set of vectors

    A = {v_1 = (1,0,2), v_2 = (2,3,19), v_3 = (0,1,5)}.

Multiplication of the first vector in A by 2 is an elementary operation of type I that
yields the set

    A_1 = {v_11 = (2,0,4), v_12 = (2,3,19), v_13 = (0,1,5)}.

If the vector v_12 in A_1 is replaced by v_12 + (-3)v_13, we have an elementary operation
of type II that yields the set

    A_2 = {v_21 = (2,0,4), v_22 = (2,0,4), v_23 = (0,1,5)}.

An interchange of v_22 and v_23 is an elementary operation of type III that produces

    A_3 = {v_31 = (2,0,4), v_32 = (0,1,5), v_33 = (2,0,4)}.

Application of an elementary operation of type II that replaces v_33 by v_33 + (-1)v_31
gives

    A_4 = {v_41 = (2,0,4), v_42 = (0,1,5), v_43 = (0,0,0)}. ■
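
A direct way to experiment with the three types of operations is to code each one as a function that returns a new list of vectors. The sketch below is our own Python/NumPy illustration (the function names are ours, and positions are counted from 0); it reproduces the four steps of Example 2.

    import numpy as np

    def type_I(vectors, s, c):
        # Multiply the vector in position s by the nonzero scalar c.
        out = [v.copy() for v in vectors]
        out[s] = c * out[s]
        return out

    def type_II(vectors, s, t, c):
        # Replace the vector in position s by itself plus c times the vector in position t (s != t).
        out = [v.copy() for v in vectors]
        out[s] = out[s] + c * out[t]
        return out

    def type_III(vectors, s, t):
        # Interchange the vectors in positions s and t.
        out = [v.copy() for v in vectors]
        out[s], out[t] = out[t], out[s]
        return out

    A  = [np.array(v, dtype=float) for v in [(1, 0, 2), (2, 3, 19), (0, 1, 5)]]
    A1 = type_I(A, 0, 2)           # multiply the first vector by 2
    A2 = type_II(A1, 1, 2, -3)     # second vector plus (-3) times the third vector
    A3 = type_III(A2, 1, 2)        # interchange the second and third vectors
    A4 = type_II(A3, 2, 0, -1)     # third vector plus (-1) times the first vector
    print([v.tolist() for v in A4])  # [[2.0, 0.0, 4.0], [0.0, 1.0, 5.0], [0.0, 0.0, 0.0]]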

As the example above suggests, the application of a sequence of elementary operations
can be used to replace a given set A by a set A' which has a simpler appearance.
Later in this chapter, we shall turn to an investigation of the properties that A and A'
have in common. When these properties are known, we shall see that the elementary
operations can be chosen so as to make the set A' display certain important information
concerning A. In the investigation of these properties, it is convenient to have available
the concept of an inverse of an elementary operation.
Suppose that an elementary operation is performed on A = {v_1, v_2, ..., v_k} to obtain
a new set A', and consider the operation necessary to obtain the original set A from
the new set A'.

(i) If A' is obtained by employing a type I elementary operation, then A' is of the
form

    A' = {v_1, ..., v_{s-1}, a v_s, v_{s+1}, ..., v_k},

where a ≠ 0. It is readily seen that A is obtained from A' by replacing a v_s by
(1/a)(a v_s). Thus in this case A is obtained from A' by an elementary operation of
the same type.

(ii) If a type II operation is used to obtain A' from A, then A' has the form

    A' = {v_1, ..., v_{s-1}, v_s + b v_t, v_{s+1}, ..., v_k},

where s ≠ t. Now v_t is in A', and if v_s + b v_t in A' is replaced by (v_s + b v_t) + (-b)v_t,
the original set A is obtained. This replacement is an elementary operation of type
II, so that, once again, A is obtained from A' by an elementary operation of the
same type as was used in obtaining A' from A.

(iii) If A' is obtained from A by interchanging the vectors v_s and v_t in A, then A is
obtained from A' by the very same operation of interchanging v_s and v_t.

We see, then, that once an elementary operation is applied to a set A to obtain a set
A', we need only apply another elementary operation of the same type to A' in order
to obtain A.

Definition 2.1 When an elementary operation E is applied to a set A to obtain a set
A', the elementary operation that must be applied to A' in order to obtain A is called
the inverse elementary operation of E, and is denoted by E^{-1}.

It is clear from our discussion above that the inverse of an elementary operation E
is unique, and is of the same type as E.

Theorem 2.2 If A' is obtained from A by a sequence of elementary operations, then
A can be obtained from A' by applying the inverses of these elementary operations in
reverse order.

Proof. Suppose that A' is obtained from A by a sequence E\, E2,..., Et of elementary
operations. That is, the operations £Ί, ϋ?2, ···, £* are applied successively, obtaining a
new set Ai each time an 2^ is applied, until we obtain At = A'. Now consider the
sequence £ t _ *, ü ^ i , . . . , E^1, E^1 applied to A!. Applying E^1 to A! = At, one obtains
At-i since Et yields At when applied to At-\- Then applying E ^ to At-i, one obtains
.4t_2. Continuing in this manner, we obtain, successively, the sets

At-$,At-A, ···, A3, Λ2, *4i.

Then applying E^1 to Ai we obtain A. Thus the theorem is proved. ■

An illustration of this theorem and its proof is provided in the next example.

Example 3 □ In Example 2, the set

    A' = A_4 = {v_41 = (2,0,4), v_42 = (0,1,5), v_43 = (0,0,0)}

is obtained from the set

    A = {v_1 = (1,0,2), v_2 = (2,3,19), v_3 = (0,1,5)}

by a sequence E_1, E_2, E_3, E_4 of elementary operations that can be described as follows:

    E_1: multiply the first vector by 2.
    E_2: replace the second vector by the sum of the second vector and (-3) times the
         third vector.
    E_3: interchange the second and third vectors.
    E_4: replace the third vector by the sum of the third vector and (-1) times the
         first vector.

Utilizing the general discussion preceding Definition 2.1, we formulate the inverse
elementary operations as follows.

    E_1^{-1}: multiply the first vector by 1/2.
    E_2^{-1}: replace the second vector by the sum of the second vector and 3 times the
         third vector.
    E_3^{-1}: interchange the second and third vectors.
    E_4^{-1}: replace the third vector by the sum of the third vector and 1 times the first
         vector.

Applying these inverse operations to A' in reverse order, we find that

    E_4^{-1} applied to A' yields A_3,
    E_3^{-1} applied to A_3 yields A_2,
    E_2^{-1} applied to A_2 yields A_1,

and

    E_1^{-1} applied to A_1 yields A.
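
The same kind of sketch gives a direct numerical check of Theorem 2.2: applying the four inverse operations of Example 3 in reverse order to A' recovers A. The code below is again our own Python/NumPy illustration, self-contained and using the same hypothetical helper functions as the sketch after Example 2 of Section 2.2.

    import numpy as np

    def type_I(vs, s, c):
        # Multiply the vector in position s by the nonzero scalar c.
        out = [v.copy() for v in vs]
        out[s] = c * out[s]
        return out

    def type_II(vs, s, t, c):
        # Replace the vector in position s by itself plus c times the vector in position t.
        out = [v.copy() for v in vs]
        out[s] = out[s] + c * out[t]
        return out

    def type_III(vs, s, t):
        # Interchange the vectors in positions s and t.
        out = [v.copy() for v in vs]
        out[s], out[t] = out[t], out[s]
        return out

    A_prime = [np.array(v, dtype=float) for v in [(2, 0, 4), (0, 1, 5), (0, 0, 0)]]

    step = type_II(A_prime, 2, 0, 1)   # E_4^{-1}: third vector plus 1 times the first
    step = type_III(step, 1, 2)        # E_3^{-1}: interchange the second and third vectors
    step = type_II(step, 1, 2, 3)      # E_2^{-1}: second vector plus 3 times the third
    step = type_I(step, 0, 0.5)        # E_1^{-1}: multiply the first vector by 1/2
    print([v.tolist() for v in step])  # [[1.0, 0.0, 2.0], [2.0, 3.0, 19.0], [0.0, 1.0, 5.0]]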

Exercises 2.2
1. Write out the elements of the standard basis of R 5 .
2. Find an elementary operation that yields
{(1,0,2,1),(0,3,0,7),(3,6,4,3)}
when applied to {(1,0,2,1), ( - 2 , 3 , - 4 , 5 ) , (3,6,4,3)}.
3. Find an elementary operation that yields
{(1,0,2,1), (-2,3,-4,5), (3,6,4,3)}
when applied to {(1,0,2,1), (0,3,0,7), (3,6,4,3)}.
4. Show that the set
{(2,3,0,-1),(2,1,-1,2)}
can be obtained from the set {(0, —2, —1,3), (6, 7, —1,0)} by a sequence of elemen­
tary operations.
5. Show that the set
{ ( 0 , - 2 , - 1 , 3 ) , (6, 7 , - 1 , 0 ) }
can be obtained from the set {(2,3,0, —1), (2,1, —1, 2)} by a sequence of elemen­
tary operations.
6. Assume that the set A' = {v'_1, v'_2, v'_3} is obtained from the set A = {v_1, v_2, v_3}
   by the sequence E_1, E_2, E_3 defined as follows.

       E_1: multiply the first vector by -3.
       E_2: replace the second vector by the sum of the second vector and 2 times
            the first vector.
       E_3: replace the third vector by the sum of the third vector and (-2) times
            the second vector.
Write out a sequence of elementary operations that yields A when applied to A'.
7. Let A = {v_1, v_2, v_3} and A' = {v'_1, v'_2, v'_3} be sets of vectors in R^n such that

       v'_1 = 2v_1
       v'_2 = 2v_2 + 3v_3
       v'_3 = v_3 + v_1.

Write out a sequence of elementary operations that yields A! when applied to A.



8. Let A = {v_1, v_2, v_3, v_4}, and let A' = {v'_1, v'_2, v'_3, v'_4} be sets of vectors in R^3
   such that

       v'_1 = v_1
       v'_2 = v_1 + v_2
       v'_3 = v_2 + v_3
       v'_4 = v_3 + v_4.

Write out a sequence of elementary operations that yields A' when applied to A.

9. Let A = {v_1, v_2, v_3, v_4}, and let A' = {v'_1, v'_2, v'_3, v'_4} be sets of vectors in R^3
   such that

       v'_1 = v_1 - v_4
       v'_2 = v_2 + 3v_1
       v'_3 = v_3 + 2v_2
       v'_4 = 2v_4.

Write out a sequence of elementary operations that yields A' when applied to A.

10. With the sets A and A' as given in Problem 8, write out a sequence of elementary
    operations that yields A when applied to A'.

11. With the sets A and A' as given in Problem 9, write out a sequence of elementary
    operations that yields A when applied to A'.

12. Show that the sequence of elementary operations used to obtain A_4 from A in
    Example 2 is not unique by exhibiting a different sequence of elementary operations
    that yields A_4 when applied to A.

13. Show that the identity operation on a set with more than 1 element is an elemen­
tary operation of type II.

2.3 Elementary Operations and Linear Independence


One of the properties that is preserved by application of an elementary operation to
a set is that of linear independence. This important result is established in the next
theorem.

Theorem 2.3 Suppose that A and A' are sets of vectors in R^n such that A' is obtained
from A by applying a single elementary operation. Then A' is linearly independent if
and only if A is linearly independent.

Proof. Suppose first that A = {v_1, v_2, ..., v_k} is linearly independent.
If A' is obtained by a type I elementary operation, then

    A' = {v_1, ..., v_{s-1}, a v_s, v_{s+1}, ..., v_k},

where a ≠ 0. Now suppose that

    Σ_{i≠s} c_i v_i + c_s (a v_s) = 0.

This implies that c_1 = ··· = c_{s-1} = c_s a = c_{s+1} = ··· = c_k = 0 since A is linearly
independent. But c_s a = 0 implies c_s = 0 since a ≠ 0. Therefore, all c_i are zero, and A'
is linearly independent.
If A' is obtained from A by a type II elementary operation, then

    A' = {v_1, ..., v_{s-1}, v_s + b v_t, v_{s+1}, ..., v_k},

where s ≠ t. If

    Σ_{i≠s} c_i v_i + c_s (v_s + b v_t) = 0,

then

    Σ_{i≠s,t} c_i v_i + c_s v_s + (c_t + c_s b) v_t = 0,

and this implies that c_1 = ··· = c_s = ··· = c_t + c_s b = ··· = c_k = 0 since A is linearly
independent. But c_s = 0 and c_t + c_s b = 0 imply that c_t = 0. Hence all c_i are zero, and
A' is linearly independent.
If A' is obtained by an elementary operation of type III, then A' consists of exactly
the same vectors as does A, except that the order of the vectors is different. It is clear,
then, that A' is linearly independent.
Thus, A' is linearly independent if A is linearly independent.
Suppose now that A' is linearly independent. Since A' is obtained from A by a single
elementary operation, A can be obtained from A' by the inverse elementary operation,
which is an elementary operation of the same type. It then follows from the proof of
the first part of the theorem that A is linearly independent. ■

Frequently, information concerning a set of vectors can be obtained by successive


application of a sequence of elementary operations that cannot be obtained by use of a
single elementary operation. The corollary below is extremely useful in this respect.

Corollary 2.4 If a set A' is obtained by applying a sequence of elementary operations


to a set A of vectors in R^n, then A' is linearly independent if and only if A is linearly
independent.

Proof. Suppose that A' is obtained from A by a sequence E_1, E_2, ..., E_t of elementary
operations. Put A_0 = A and let A_i be the set obtained by applying E_i to A_{i-1} for
i = 1, 2, ..., t. Repeated application of Theorem 2.3 yields the following information:

    A_1 is independent if and only if A_0 = A is independent.
    A_2 is independent if and only if A_1 is independent.
    ...
    A' = A_t is independent if and only if A_{t-1} is independent.

Thus A is linearly independent if and only if A' is independent. ■

As an example of the use of this corollary, consider the following.

Example 1 D In Example 2 of Section 2.2, it is shown that the set

Λ' = {(2,0,4), (0,1, 5), (0,0,0)}

can be obtained from the set

A= {(1,0,2), (2,3,19), (0,1,5)}

by a sequence of elementary operations. The set A! is obviously linearly dependent


since it contains the zero vector. It follows that the set A is linearly dependent. ■

At this point, we have developed a somewhat crude method for investigating the
linear dependence of a given set A of vectors in R n . If, by application of a sequence of
elementary operations to A, it is possible to obtain a set that contains the zero vector
(or any set that is clearly dependent), then the given set is linearly dependent. By
the same token, if a set can be obtained that is clearly independent, then A is linearly
independent. This method is refined to a systematic procedure later in this chapter.
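
A quick numerical cross-check of this crude test is available: a finite set of vectors is linearly dependent exactly when the rank of the matrix having those vectors as rows is smaller than the number of vectors. The snippet below is our own NumPy aside, not the systematic procedure developed later in this chapter.

    import numpy as np

    def is_linearly_dependent(vectors):
        # The set is dependent exactly when its rank is less than the number of vectors.
        m = np.array(vectors, dtype=float)
        return np.linalg.matrix_rank(m) < len(vectors)

    # The set A of Example 1, shown above to be dependent by elementary operations.
    print(is_linearly_dependent([(1, 0, 2), (2, 3, 19), (0, 1, 5)]))   # True
    print(is_linearly_dependent([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))    # False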
We conclude this section with a final corollary to Theorem 2.3.

Corollary 2.5 A set of vectors resulting from applying a sequence of elementary operations
to a basis of R^n is again a basis of R^n.

Proof. Let A be a basis of R^n. According to Corollary 2.4, any set A' obtained from
A by a sequence of elementary operations is a linearly independent set of n vectors, and
hence is a basis by Theorem 1.33. ■

Exercises 2.3

1. Show the set A = {(1,2,1,0), (3, - 4 , 5 , 6 ) , (2, - 1 , 3 , 3 ) , (-2,6, - 4 , - 6 ) } is linearly


dependent by applying a sequence of elementary operations to A and obtaining a
set A' that contains the zero vector.

2. Use the method described in Problem 1 to show that the set

{(1,1,0),(0,1,1), ( 1 , 0 , - 1 ) , (1,0,1)}

is linearly dependent.

3. Show that the set A = {(1,0,0), (1,1,0), (1,1,1)} is linearly independent by ob­
taining A from the standard basis of R 3 by a sequence of elementary operations.

4. Show that the set A = {(1,0,0), (1,1,0), (1,1,1)} is linearly independent by ob­
taining the standard basis of R 3 from A by a sequence of elementary operations.

5. Use elementary operations to determine whether or not the given set is linearly
independent.

(a) { ( 1 , 0 , 2 ) , ( 2 , - 1 , 1 ) , ( 1 , 1 , - 1 ) }
(b) {(1,1,-1), ( 2 , - 1 , 1 ) , (2,1,1)}
(c) {(1,1,8,-1),(1,0,3,0),(3,2,19,-2)}
(d) {(1, - 1 , - 2 , - 4 ) , (1,1,8,4), (3, - 1 , 4 , - 4 ) }
(e) {(1,0,1,0), (0,1,0,1), (4,3,2,3), (1,0,0,0)}
(f) {(1,0,1,0), (2,1,4,3), ( 1 , 2 , 5 , - 2 ) , (-1,3,5,4)}

2.4 Standard Bases for Subspaces


In the last section, we found that linear independence is a property that is preserved
by application of elementary operations. As the reader has likely anticipated, we turn
next to an investigation of the application of elementary operations to spanning sets
of a subspace. We then combine and sharpen our results so as to obtain a systematic
method of attacking the types of problems mentioned in Section 1.5. Later, we shall
see that our results have even more applications.
The following definition gives a notation that is useful in describing the result ob­
tained when several elementary operations are applied in a sequence.

Definition 2.6 If A ⊆ R^n and E denotes an elementary operation that may be applied
to A, then E(A) denotes the set that is obtained by applying E to A. If E_1, E_2, ..., E_t is
a sequence of elementary operations that may be applied to A, then E_t E_{t-1} ··· E_2 E_1(A)
is defined inductively by

    E_t E_{t-1} ··· E_2 E_1(A) = E_t(E_{t-1} ··· E_2 E_1(A)).

Example 1 □ Let A = {v_1, v_2, v_3}, where v_1 = (1,0,2), v_2 = (2,1,6), v_3 = (0,3,8),
and consider the sequence E_1, E_2, E_3, where the elementary operations E_i are given by:

    E_1: Replace the second vector by the sum of the second vector and (-2) times
         the first vector.
    E_2: Replace the third vector by the sum of the third vector and (-3) times the
         second vector.
    E_3: Multiply the third vector by 1/2.
According to Definition 2.6,

    E_1(A) = {v_1, v_2 + (-2)v_1, v_3}
           = {(1,0,2), (0,1,2), (0,3,8)},

    E_2 E_1(A) = E_2(E_1(A))
               = E_2({(1,0,2), (0,1,2), (0,3,8)})
               = {(1,0,2), (0,1,2), (0,0,2)},

    E_3 E_2 E_1(A) = E_3(E_2 E_1(A))
                   = E_3({(1,0,2), (0,1,2), (0,0,2)})
                   = {(1,0,2), (0,1,2), (0,0,1)}. ■

The next theorem in our development is fairly obvious, but it is important enough
to be designated as a theorem. A restricted form of the converse is contained in the last
theorem of this section, but the proof of that theorem must wait until some intermediate
results are established.

Theorem 2.7 If A' is obtained by applying a sequence of elementary operations to a
set A of vectors in R^n, then (A') = (A).

Proof. Suppose that A' is obtained by applying the sequence E_1, E_2, ..., E_t of elementary
operations E_i to A. Let A_0 = A and A_i = E_i(A_{i-1}) for i = 1, 2, ..., t, so that
A' = A_t.
Now A_i is obtained from A_{i-1} by application of E_i. If we recall the definitions of
the elementary operations in Section 2.2, it is evident that each vector of A_i is a linear
combination of vectors in A_{i-1}. Thus (A_i) ⊆ (A_{i-1}). But A_{i-1} = E_i^{-1}(A_i), so that each
vector of A_{i-1} is a linear combination of vectors in A_i. Therefore (A_{i-1}) ⊆ (A_i), and
(A_{i-1}) = (A_i). Letting i = 1, 2, ..., t in succession, we have

    (A) = (A_0) = (A_1) = ··· = (A_t) = (A'),

and the theorem is proved. ■

Combining this result and Corollary 2.4, we obtain an important corollary concerning
bases of a subspace.
Corollary 2.8 A set of vectors resulting from the application of a sequence of elemen­
tary operations to a basis of a subspace W is again a basis of W .

The next theorem is our first step in standardizing the bases of subspaces. Example
2 appears just after the end of the proof of this theorem, and the work in that example
illustrates the steps described in the proof. If the steps in the proof and the steps in
the example are traced together, this should make each of them easier to follow.

Theorem 2.9 Let A = {v_1, v_2, ..., v_m} be a set of m vectors in R^n that spans the subspace
W = (A) of dimension r, where m ≥ r > 0. Then a set A' = {v'_1, v'_2, ..., v'_r, 0, ..., 0}
of m vectors can be obtained from A by a finite sequence of elementary operations so
that {v'_1, v'_2, ..., v'_r} has the following properties:

1. The first nonzero component from the left in v'_j is a 1 in the k_j component for
   j = 1, 2, ..., r. (This 1 is called a leading one.)
2. k_1 < k_2 < ··· < k_r. (In vectors listed later in A', the leading ones occur in
   positions that are farther to the right.)
3. v'_j is the only vector in A' with a nonzero k_j component.
4. {v'_1, v'_2, ..., v'_r} is a basis of W.

Proof. By Theorem 1.31, A contains a basis of (A). Thus, there is at least one vector
in A that is not zero. Let k_1 be the smallest positive integer for which some v_i has
nonzero k_1 component. By no more than one interchange of vectors, a spanning set for
W can be obtained in which the first vector has a nonzero k_1 component. Multiplication
of this vector by the reciprocal of its k_1 component yields a spanning set of W in which
the k_1 component of the first vector is 1. Then each of the other vectors can be replaced
by the sum of that vector and a suitable multiple of the new first vector to obtain a
spanning set

    A_1 = {v_1^(1), v_2^(1), ..., v_m^(1)}

of W in which

(i) the first nonzero component in v_1^(1) is a 1 in the k_1 component, and
(ii) v_1^(1) is the only vector in A_1 with a nonzero number in any of the first k_1 positions
     from the left.

If r = 1, the theorem follows trivially at this point.
If r > 1, let k_2 be the least positive integer for which some v_i^(1), i ≠ 1, has nonzero
k_2 component. At least one such v_i^(1) exists, since all remaining vectors would otherwise
be zero. By an interchange of vectors that does not involve the first vector, a spanning
set for W can be obtained in which the second vector has nonzero k_2 component.
Multiplication of this vector by the reciprocal of its k_2 component will yield a spanning
set of W in which the k_2 component of the second vector is 1. Then each of the other
vectors can be replaced by the sum of that vector and a suitable multiple of the new
second vector to obtain a spanning set

    A_2 = {v_1^(2), v_2^(2), ..., v_m^(2)}

of W that has the following properties:

(i) The first nonzero component in v_1^(2) is a 1 in the k_1 component, and the first
    nonzero component in v_2^(2) is a 1 in the k_2 component.
(ii) k_1 < k_2.
(iii) v_1^(2) is the only vector in A_2 with a nonzero k_1 component, and v_2^(2) is the only
    vector in A_2 with a nonzero k_2 component.

That is, the first two vectors in the set A_2 have the first three properties required
in the statement of the theorem.
Suppose that a set A_i that spans W has been obtained in which the first i vectors
(i < r) satisfy the first three properties listed in the theorem. Then k_{i+1} is chosen to be
the least positive integer for which one of the last m - i vectors in the set has a nonzero
k_{i+1} component. Such a vector exists, for A_i must contain at least r nonzero vectors.
The procedure described to obtain A_1 and A_2 can then be repeated to obtain the set
A_{i+1} that spans W in which the first i + 1 vectors satisfy the first three conditions.
It is clear, then, that a finite sequence of elementary operations can be applied to A
to obtain a set A_r = {v_1^(r), v_2^(r), ..., v_m^(r)} that spans W and in which the first r
vectors satisfy the conditions (1), (2), (3). Now assume that some v_j^(r) with j > r is not
zero, and let k_{r+1} be the least positive integer for which such a vector has a nonzero
k_{r+1} component. Then it must be that k_{r+1} > k_r, and from this it is easily seen that
{v_1^(r), v_2^(r), ..., v_r^(r), v_j^(r)} is linearly independent, contradicting the fact that r
is the dimension of W. Thus the remaining m - r vectors in A_r are zero and A_r = A',
where A' satisfies the conditions of the theorem. ■

The proof given for Theorem 2.9 is a constructive one in that it describes a method
of obtaining the set A' from a given set A. This is illustrated in the following example.

Example 2 □ Let A = {v_1, v_2, v_3, v_4} for v_1 = (0,0,0,-1,-1), v_2 = (0,3,-6,1,-2),
v_3 = (0,2,-4,2,0), v_4 = (0,-1,2,2,3), and let W = (A).
Following the proof of the theorem, we see that k_1 = 2 is the smallest positive integer
for which some v_i has nonzero k_1 component. By an interchange of v_1 and v_3, we obtain
the spanning set

    {(0,2,-4,2,0), (0,3,-6,1,-2), (0,0,0,-1,-1), (0,-1,2,2,3)}

in which the first vector has nonzero k_1 component. Multiplication of this vector by 1/2
yields the spanning set

    {(0,1,-2,1,0), (0,3,-6,1,-2), (0,0,0,-1,-1), (0,-1,2,2,3)}

in which the k_1 component of the first vector is 1. Each of the other vectors is now
replaced by the sum of that vector and a suitable multiple of the new first vector as
follows.

    Replace the second vector by the sum of the second vector and (-3) times the
    first vector.

    No change is needed on the third vector. (A "suitable" multiple to add would be
    the zero multiple.)
    Replace the fourth vector by the sum of the fourth vector and 1 times the first
    vector.

This yields the spanning set

    A_1 = {v_1^(1), v_2^(1), v_3^(1), v_4^(1)}
        = {(0,1,-2,1,0), (0,0,0,-2,-2), (0,0,0,-1,-1), (0,0,0,3,3)}

in which

(i) the first nonzero component in v_1^(1) is a 1 in the second component, and
(ii) v_1^(1) = (0,1,-2,1,0) is the only vector in A_1 with a nonzero component in either
     of the first two positions.

Now k_2 = 4 is the least positive integer for which some v_i^(1), i ≠ 1, in A_1 has a
nonzero k_2 component, and we have v_2^(1) = (0,0,0,-2,-2) with a nonzero k_2 component.
Multiplication of v_2^(1) by -1/2 yields

    {(0,1,-2,1,0), (0,0,0,1,1), (0,0,0,-1,-1), (0,0,0,3,3)}.

Each of the vectors other than the second is now replaced by the sum of that vector
and a suitable multiple of the second vector as follows.

    Replace the first vector by the sum of the first vector and (-1) times the second
    vector.
    Replace the third vector by the sum of the third vector and the second vector.
    Replace the fourth vector by the sum of the fourth vector and (-3) times the
    second vector.

This yields

    A_2 = {v_1^(2), v_2^(2), v_3^(2), v_4^(2)}
        = {(0,1,-2,0,-1), (0,0,0,1,1), (0,0,0,0,0), (0,0,0,0,0)}.

It is evident at this point that A_2 = A'. Simultaneously with finding A', we have found
that W has dimension r = 2. ■
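
The reduction carried out in Example 2 follows the proof of Theorem 2.9 step by step, and the whole procedure can be sketched in code. The function below is our own Python/NumPy illustration (its name is ours): it performs the interchange, scaling, and replacement operations on a list of vectors and returns the nonzero vectors that remain, which form the standard basis of the span.

    import numpy as np

    def standard_basis(vectors, tol=1e-12):
        # Reduce a spanning set by elementary operations; return the standard basis of its span.
        a = np.array(vectors, dtype=float)            # one vector per row
        m, n = a.shape
        row = 0
        for col in range(n):                          # col plays the role of k_1, k_2, ...
            pivot = next((i for i in range(row, m) if abs(a[i, col]) > tol), None)
            if pivot is None:
                continue
            a[[row, pivot]] = a[[pivot, row]]         # type III: interchange
            a[row] = a[row] / a[row, col]             # type I: make the leading entry a 1
            for i in range(m):                        # type II: clear the rest of the column
                if i != row:
                    a[i] = a[i] - a[i, col] * a[row]
            row += 1
        return (np.round(a[:row], 12) + 0.0).tolist() # nonzero vectors, tidied for display

    A = [(0, 0, 0, -1, -1), (0, 3, -6, 1, -2), (0, 2, -4, 2, 0), (0, -1, 2, 2, 3)]
    print(standard_basis(A))
    # [[0.0, 1.0, -2.0, 0.0, -1.0], [0.0, 0.0, 0.0, 1.0, 1.0]], as found in Example 2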

The conditions of Theorem 2.9 are very restrictive, and one might expect that there
is only one basis of a given subspace that satisfies these conditions. The next theorem
confirms that this is the case.

Theorem 2.10 There is one and only one basis of a given subspace W that satisfies
the conditions of Theorem 2.9.

Proof. Starting in Theorem 2.9 with a basis A of W, that theorem assures us that
a basis A' of W that satisfies the conditions can be obtained from A by a sequence of
elementary operations. Thus there is at least one basis of the required type.
Suppose now that A' = {v'_1, ..., v'_r} and A'' = {v''_1, ..., v''_r} are two bases of W that
satisfy the conditions of Theorem 2.9. Let k'_1, ..., k'_r and k''_1, ..., k''_r be the sequences of
positive integers described in the conditions for A' and A'', respectively.
Assume k'_1 < k''_1. Since A'' spans W, there must exist scalars c_{i1} such that v'_1 =
Σ_{i=1}^{r} c_{i1} v''_i. Since each v''_i has zero j-th component for each j < k''_1, any linear
combination such as Σ_{i=1}^{r} c_{i1} v''_i must have zero j-th component for each j < k''_1. But v'_1 has
a nonzero k'_1 component, and k'_1 < k''_1. This is a contradiction; hence k'_1 ≥ k''_1. The
symmetry of the conditions on A' and A'' implies that k''_1 ≥ k'_1, and thus, k'_1 = k''_1.
Now assume k'_2 < k''_2. Since A'' spans W, there must exist scalars c_{i2} such that
v'_2 = Σ_{i=1}^{r} c_{i2} v''_i. Now v''_1 is the only vector in A'' that has nonzero k'_1 component, so
a linear combination Σ_{i=1}^{r} c_{i2} v''_i has zero k'_1 component if and only if c_{12} = 0. Since v'_2
has a zero k'_1 component, c_{12} = 0, and we have v'_2 = Σ_{i=2}^{r} c_{i2} v''_i. For i ≥ 2, v''_i has zero
j-th component for each j < k''_2, and thus the linear combination Σ_{i=2}^{r} c_{i2} v''_i has zero j-th
component for all j < k''_2. But v'_2 has nonzero k'_2 component, and k'_2 < k''_2. As before,
we have a contradiction, and therefore, k'_2 ≥ k''_2. From the symmetry of the conditions,
k''_2 ≥ k'_2, and thus, k'_2 = k''_2.
It is clear that this argument may be repeated to obtain k'_j = k''_j for j = 1, 2, ..., r.
Now A'' spans W, so for each v'_j there must exist scalars c_{ij} such that v'_j =
Σ_{i=1}^{r} c_{ij} v''_i. Since v''_i is the only vector in A'' that has nonzero k'_i component and
v''_i has k'_i component equal to 1, c_{ij} is the k'_i component of Σ_{i=1}^{r} c_{ij} v''_i. But v'_j has zero
k'_i component for i ≠ j, so c_{ij} = 0 for i ≠ j, and since v'_j has k'_j component equal to 1,
c_{jj} = 1. Therefore v'_j = v''_j, and A' = A''. ■

Theorem 2.10 allows us to make the following definition.

Definition 2.11 Let W be a subspace of R^n of dimension r. The standard basis of
W is the basis {v_1, v_2, ..., v_r} of W that satisfies the conditions of Theorem 2.9.

That is, the standard basis of W is the unique basis {v_1, v_2, ..., v_r} that has the
following properties.

1. The first nonzero component from the left in the j-th vector v_j is a 1 in the k_j
   component. (This 1 is called a leading one.)
2. k_1 < k_2 < ··· < k_r.
3. The j-th vector v_j is the only vector in the basis with a nonzero k_j component.

Clearly, if r = m = n in Theorem 2.9, the standard basis thus defined is the same as
that given in Definition 1.30, and our two definitions are in agreement with each other.

Example 3 □ In Example 2, we found that the standard basis of the subspace

    W = ((0,0,0,-1,-1), (0,3,-6,1,-2), (0,2,-4,2,0), (0,-1,2,2,3))

is the set {(0,1,-2,0,-1), (0,0,0,1,1)}.


If we are only interested in finding the standard basis of W , the amount of work
that is necessary is much less than that done in Example 2. As a first step, we might
write the vectors of A in a rectangular array as

     0  0  0  0
     0  3  2 -1
     0 -6 -4  2
    -1  1  2  2
    -1 -2  0  3

In composing this array, we have recorded the components of v_i from top to bottom in
the i-th column from the left. (It is admittedly more natural to record these components
in rows rather than columns. The reason for the use of columns will become clear in
Chapter 3.) Let us use an arrow from the first array to a second to indicate that the set
represented by the second array is obtained from the first array by application of one
or more elementary operations.
Our work in Example 2 can then be recorded in this manner:

     0  0  0  0        0  0  0  0        0  0  0  0
     2  3  0 -1        1  3  0 -1        1  0  0  0
    -4 -6  0  2   →   -2 -6  0  2   →   -2  0  0  0
     2  1 -1  2        1  1 -1  2        1 -2 -1  3
     0 -2 -1  3        0 -2 -1  3        0 -2 -1  3

         0  0  0  0        0  0  0  0
         1  0  0  0        1  0  0  0
    →   -2  0  0  0   →   -2  0  0  0
         1  1 -1  3        0  1  0  0
         0  1 -1  3       -1  1  0  0

Primarily for future use, we record the following theorems.

Theorem 2.12 The standard basis of a subspace W can be obtained from any basis of
W by a sequence of elementary operations.

Proof. Let W be a subspace of dimension r, and let A' be the standard basis of W.


If A is any basis of W , then Theorem 2.9 applies with m = r, asserting that A' can be
obtained from A by a finite sequence of elementary operations. ■

Theorem 2.13 Let A and B be two sets of m vectors each in R n . Then (A) = (B) if
and only if B can be obtained from A by a sequence of elementary operations.

Proof. If B can be obtained from A by a sequence of elementary operations, then
(A) = (B) by Theorem 2.7.
Suppose now that (A) = (B) = W, where W has dimension r. Let A' denote the set
of vectors in Theorem 2.9 that can be obtained from A by a sequence E_1, E_2, ..., E_k of
elementary operations. Let B' denote the corresponding set that can be obtained from
B by a sequence F_1, F_2, ..., F_t of elementary operations. Then the last m - r vectors
in both A' and B' are zero, and the first r vectors in both A' and B' constitute the
standard basis of W, which is unique by Theorem 2.10. Thus A' = B'.
Now B can be obtained from B' = A' by the sequence F_t^{-1}, ..., F_2^{-1}, F_1^{-1}. Thus B
can be obtained from A by application of the sequence

    E_1, E_2, ..., E_k, F_t^{-1}, ..., F_2^{-1}, F_1^{-1}

of elementary operations. ■

The value of Theorem 2.13 is mainly theoretical. A practical way to determine
whether or not (A) = (B) is to find and compare the sets A' and B' described in the
proof. Then (A) = (B) if and only if A' = B'.
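
A self-contained numerical shortcut, not the comparison of standard bases described above but equivalent in exact arithmetic, is to compare ranks: (A) = (B) exactly when stacking the two sets together raises the rank of neither. The sketch below is our own NumPy illustration with made-up data.

    import numpy as np

    def same_span(a, b):
        # (A) = (B) exactly when stacking the sets together does not raise the rank of either.
        ra  = np.linalg.matrix_rank(np.array(a, dtype=float))
        rb  = np.linalg.matrix_rank(np.array(b, dtype=float))
        rab = np.linalg.matrix_rank(np.array(list(a) + list(b), dtype=float))
        return ra == rb == rab

    A = [(1, 1, 0), (0, 1, 1)]
    B = [(1, 2, 1), (1, 0, -1)]   # (1,2,1) = (1,1,0) + (0,1,1) and (1,0,-1) = (1,1,0) - (0,1,1)
    print(same_span(A, B))        # True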

Exercises 2.4

1. Let A = { ( - 5 , 1 1 , - 3 , 0 , 1 ) , (0,5,10,1,6), ( 2 , - 4 , 2 , 1 , 1 ) , (2,1,12,2,7)}. Find the


set A' of Theorem 2.9 by the method used in the proof of Theorem 2.9 and
Example 2.
2. Work Problem 1 using the set A given by

A = {(0,1,1,8,-1), (0,1,1,3,0), (0,3,3,19,-2), (0,4,4,22,-2), (0,3,3,14,1)}.

3. For each of the sets A below, find the dimension of (A).

(a) A = {(1,1,3,-1), (1,0,-2,0), (3,2,4,-2)}
(b) A = {(1,2,-1,3), (0,1,0,2), (1,3,-1,5), (1,1,-1,0)}
(c) A = {(0,2,0,4,-1), (0,5,-1,11,8), (0,0,1,-7,9), (0,7,0,8,16)}
(d) A = {(0,3,0,-1,3), (0,-2,0,1,-1), (0,5,0,1,13), (0,4,0,-2,2)}

4. For each of the sets A given below, use rectangular arrays as in Example 3 to find
the standard basis for (A).

(a) A = {(1,1,0,0), (0,1,1,0), (0,0,1,1), (1,2,2,1)}
(b) A = {(1,-1,2,-3), (3,-1,6,-6), (1,0,1,0), (1,1,2,0)}
(c) A = {(3,2,5,1), (-1,0,-1,-1), (2,1,3,1), (1,0,1,1)}

(d) A = {(0,2,0,3), (1,0,1,0), (3, - 1 , 6 , - 6 ) , (1,1,2,0), (1, - 1 , 2 , - 3 ) }

5. Determine whether or not each of the sets below is linearly independent by finding
the dimension of the subspace spanned by the set.

(a) {(1,2,0,1,0), (2,4,1,4,3), (1,2,2,5, - 2 ) , ( - 1 , - 2 , 3 , 5 , 4 ) }


(b) { ( 1 , - 1 , 2 , - 3 ) , ( 3 , - 1 , 6 , - 6 ) , (1,0,1,0), (1,1,2,0)}

6. In each case, determine whether or not (A) = (B).

(a) A = {(1,1,1,1), (0,0,2,2)}, B = {(2,2,3,3), (1,1,2,2)}
(b) A = {(1,1,0,0), (1,0,1,1)}, B = {(2,-1,3,3), (0,1,-1,-1)}
(c) A = {(2,2,0,0), (1,2,1,0), (1,1,1,1)}, B = {(1,1,0,0), (0,1,1,0), (0,0,1,2)}
(d) A = {(1,2,1,-1), (0,1,2,3), (1,4,5,5)}, B = {(2,4,2,-2), (0,3,-2,4)}
(e) A = {(1,1,-1), (1,3,1), (-1,1,2)}, B = {(2,1,2), (1,0,1), (3,2,4), (2,6,3)}
(f) A = {(1,-2,-1), (3,2,-1), (0,4,1), (2,0,-1)}, B = {(5,-2,-3), (1,6,1)}
7. For each pair W_1, W_2, find the dimension of W_1 + W_2.

   (a) W_1 = ((1,1,3,-1), (1,0,-2,0), (3,2,4,-2)), W_2 = ((1,0,0,1), (1,1,7,1))
   (b) W_1 = ((1,-1,2,-3), (1,1,2,0), (3,-1,6,-6)), W_2 = ((2,0,4,-3), (0,0,0,1))

8. Let A and A' be as given in Example 2. Write out a complete sequence of ele­
mentary operations that will yield A' when applied to A.

9. Given that the sets A and B below span the same subspace W , follow the proof
of Theorem 2.13 to find a sequence of elementary operations that can be used to
obtain B from A.

(a) A = {(1,1,0,0), (1,0,1,1)}, B = {(2,-1,3,3), (0,1,-1,-1)}
(b) A = {(1,1,3,-1), (1,0,-2,0)}, B = {(3,2,4,-2), (0,1,5,-1)}
(c) A = {(4,5,2,2), (1,1,1,0)}, B = {(1,-1,5,-4), (0,2,-4,4)}
(d) A = {(1,1,8,-1), (1,0,3,0)}, B = {(1,-1,-2,1), (3,-1,4,1)}
Chapter 3

Matrix Multiplication

3.1 Introduction
In Example 3 of Section 2.4, it was found that rectangular arrays were a useful notational
convenience in recording the results of elementary operations on a set of vectors. These
rectangular arrays are specific examples of the more general concept of a matrix over a
set M, to be defined in the following section. In this chapter, we define the operation
of multiplication on matrices with real numbers as elements and establish the basic
properties of this operation. As mentioned earlier, the results of Chapter 2 are extremely
useful in the development here, particularly in the last two sections.

3.2 Matrices of Transition


The definition of a matrix over a set M is as follows.

Definition 3.1 An r by s matrix over a set M is a rectangular array of elements of
M, arranged in r rows and s columns. Such a matrix will be written in the form

        [ a_11  a_12  ···  a_1s ]
    A = [ a_21  a_22  ···  a_2s ]
        [  ···   ···         ··· ]
        [ a_r1  a_r2  ···  a_rs ]

where a_ij denotes the element in the i-th row and j-th column of the matrix. The numbers
r and s are said to be the dimensions of A, and r by s is sometimes written as r × s.

The matrix A above may be written more simply as A = [a_ij]_{r×s} or A = [a_ij] if the
number of rows or columns is not important.


There are several terms that are useful in describing matrices of certain types. An
r by r matrix is said to be a square matrix or a matrix of order r. The elements
a_ii of A = [a_ij] are the diagonal elements of A, and a square matrix A = [a_ij] with
a_ij = 0 whenever i ≠ j is a diagonal matrix. The matrix I_r = [δ_ij]_{r×r} is the identity
matrix of order r. A matrix A = [a_ij] is a zero matrix if a_ij = 0 for all pairs i, j. A
matrix that has only one row is a row matrix, and a matrix that has only one column
is a column matrix.

Definition 3.2 Two matrices A = [a_ij]_{r×s} and B = [b_ij]_{p×q} over a set M are equal
if and only if r = p, s = q, and a_ij = b_ij for all pairs i, j.

Example 1 □ Consider the following matrices.

        [ 1  7 -3 ]        [ 0  0  0 ]                            [ 0 ]
    A = [ 0  5 -6 ]    B = [ 0  0  0 ]    C = [ 2  4  6 ]    D =  [ 0 ]
        [ 8  4  9 ]        [ 0  0  0 ]

        [ 5  0 ]        [ 1  0  0 ]        [ 1  0  0 ]
    E = [ 0 -3 ]    F = [ 0  1  0 ]    G = [ 0  1  0 ]
                        [ 0  0  1 ]

The special terms just introduced apply to these matrices in the following ways:

    A, B, E, and F are square matrices.
    B, E, and F are diagonal matrices.
    F is an identity matrix.
    Both B and D are zero matrices, but B ≠ D.
    C is a row matrix.
    D is a column matrix.
    None of the special terms apply to G. ■

At times, we will denote a zero matrix by the same symbol 0 that we use for a zero
vector. This will not cause confusion, because the context where the symbol is used will
make the meaning clear.
Now consider a set of vectors A = {u_1, u_2, ..., u_r} in R^n and a second set B =
{v_1, v_2, ..., v_s} contained in (A). Since A spans (A), there are scalars a_ij in R such
that v_j = Σ_{i=1}^{r} a_ij u_i for j = 1, 2, ..., s. The following definition involves these scalars
a_ij.

Definition 3.3 If A = {u_1, u_2, ..., u_r} is a set of vectors in R^n and B = {v_1, v_2, ..., v_s}
is a set of vectors in (A), a matrix of transition (or transition matrix) from A to
B is a matrix A = [a_ij]_{r×s} such that

    v_j = Σ_{i=1}^{r} a_ij u_i    for j = 1, 2, ..., s.

The term matrix of transition applies only to situations involving nonempty finite
sets of vectors, and these sets must be ordered. Whenever a set of vectors is listed
without indices, it is understood that the index j is to go with the j-th vector from the
left. Thus the first vector from the left is to have index 1, the second from the left to
have index 2, and so on. This is consistent with the notational agreement made after
Example 1 in Section 1.5.
Another point needs to be emphasized in connection with Definition 3.3. The def­
inition of the term transition matrix that is found in many elementary linear algebra
texts is not equivalent to the one given here. The one stated in Definition 3.3 is what we
need to present our development of matrix multiplication, and it is the one that leads
to simpler proofs of major theorems later in this book.
The following example shows that the matrix of transition A is not always uniquely
determined by A and B.
Example 2 □ Let A = {(1,2,0), (2,4,0), (0,0,1)} and B = {(1,2,4), (2,4,8)} in ⟨A⟩.
Now
    (1,2,4) = (1)(1,2,0) + (0)(2,4,0) + (4)(0,0,1)
and
    (2,4,8) = (2)(1,2,0) + (0)(2,4,0) + (8)(0,0,1),
so that

    A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 4 & 8 \end{bmatrix}

is a matrix of transition from A to B. But

    (1,2,4) = (-1)(1,2,0) + (1)(2,4,0) + (4)(0,0,1)
and
    (2,4,8) = (0)(1,2,0) + (1)(2,4,0) + (8)(0,0,1),
so that

    A_2 = \begin{bmatrix} -1 & 0 \\ 1 & 1 \\ 4 & 8 \end{bmatrix}

is also a matrix of transition from A to B. Obviously A_1 ≠ A_2, but we note that A is
not linearly independent and thus is not a basis for ⟨A⟩. ■

Note that if A is a basis of a subspace W and B ⊆ W, a transition matrix A = [a_{ij}]_{r×s}
from A to B always exists. Also, the number of columns in A is the same as
the number of vectors in B, and the number of rows in A is the same as the number of
vectors in A. The scalars a_{ij} that are the coefficients in the sum

    v_j = \sum_{i=1}^{r} a_{ij} u_i

are called the coordinates of v_j relative to A. It is evident that the coordinates of v_j
relative to A are recorded in the jth column of the matrix of transition from A to B.
Consequently, the a_{ij} are unique (since A is a basis), and the matrix A = [a_{ij}] is unique.
In particular, any A = [a_{ij}]_{m×n} may be considered as the matrix of transition from the
standard basis E_m to a set of n vectors in R^m, and the components of these vectors are
given by the columns of A.
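For readers who wish to experiment numerically, the following sketch in Python with NumPy (the basis and vectors are our own, chosen only for illustration) computes a transition matrix column by column by solving for the coordinates of each vector of B relative to a basis A, exactly as described above.

    import numpy as np

    # Basis A of R^2 (one vector per row) and two vectors of B in its span.
    A_basis = np.array([[1.0, 2.0],
                        [0.0, 1.0]])
    B_vectors = np.array([[2.0, 3.0],
                          [4.0, 9.0]])

    # Column j of the transition matrix holds the coordinates of the jth vector
    # of B relative to A, i.e. the solution x of (A_basis^T) x = v_j.
    transition = np.column_stack(
        [np.linalg.solve(A_basis.T, v) for v in B_vectors])
    print(transition)        # [[ 2.  4.]
                             #  [-1.  1.]]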

Definition 3.4 If A = {u_1, u_2, ..., u_r} is a basis of the subspace W and v = \sum_{i=1}^{r} c_i u_i,
the r by 1 matrix C given by

    C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_r \end{bmatrix}

is called the coordinate matrix of v relative to A. The coordinate matrix of v relative
to A is denoted by [v]_A, and the phrase "with respect to A" is used interchangeably with
"relative to A."

For a given basis A, the vector v and the coordinate matrix [v]_A uniquely determine
each other. Also, with the notation of the paragraph preceding Definition 3.4, the jth
column of A, written as a column matrix

    \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{bmatrix},

is the coordinate matrix of v_j relative to A.

Example 3 □ Consider the matrix

    A = \begin{bmatrix} 1 & 2 & 1 & -3 \\ 2 & 1 & -4 & 5 \\ 3 & 0 & 2 & 1 \end{bmatrix}.

This matrix is the transition matrix from E_3 to the set B = {v_1, v_2, v_3, v_4} where

    v_1 = 1·e_1 + 2·e_2 + 3·e_3 = (1, 2, 3),
    v_2 = 2·e_1 + 1·e_2 + 0·e_3 = (2, 1, 0),
    v_3 = 1·e_1 + (-4)·e_2 + 2·e_3 = (1, -4, 2),
    v_4 = (-3)·e_1 + 5·e_2 + 1·e_3 = (-3, 5, 1).

Using the basis A = {u_1, u_2, u_3}, where u_1 = (0,1,1), u_2 = (1,0,1), u_3 = (1,1,0), the
same matrix A is the matrix of transition from A to the set C = {w_1, w_2, w_3, w_4} where

    w_1 = 1·u_1 + 2·u_2 + 3·u_3 = (5, 4, 3),
    w_2 = 2·u_1 + 1·u_2 + 0·u_3 = (1, 2, 3),
    w_3 = 1·u_1 + (-4)·u_2 + 2·u_3 = (-2, 3, -3),
    w_4 = (-3)·u_1 + 5·u_2 + 1·u_3 = (6, -2, 2).

In this situation, we have an illustration of how the coordinates of a vector change from
basis to basis. The vector v_1 = w_2 = (1, 2, 3), for example, has the coordinate matrix
(1, 2, 3)^T relative to E_3 and the coordinate matrix (2, 1, 0)^T relative to A. ■
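The change of coordinates in Example 3 is easy to check by machine. The following Python sketch (an illustration only, not part of the original text) forms the vectors w_j = \sum_i a_{ij} u_i and recovers the coordinates of (1,2,3) relative to A.

    import numpy as np

    # The matrix A of Example 3 and the basis vectors u1, u2, u3 as columns of U.
    A = np.array([[1.0, 2.0, 1.0, -3.0],
                  [2.0, 1.0, -4.0, 5.0],
                  [3.0, 0.0, 2.0, 1.0]])
    U = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    # w_j = sum_i a_ij u_i, so the w_j are the columns of U A.
    W = U @ A
    print(W.T)    # rows (5,4,3), (1,2,3), (-2,3,-3), (6,-2,2), as in the example

    # Coordinates of (1,2,3) relative to A: solve U x = (1,2,3).
    print(np.linalg.solve(U, np.array([1.0, 2.0, 3.0])))   # [2. 1. 0.]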

Exercises 3.2
1. If

    A = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}

is the matrix of transition from E_3 to B, find the standard basis of ⟨B⟩.

2. If

    A = \begin{bmatrix} 2 & 4 & 1 & 1 \\ 1 & 2 & 2 & 0 \\ 0 & 0 & 3 & -1 \\ 3 & 6 & 0 & 2 \end{bmatrix}

is the matrix of transition from E_4 to B, find the standard basis of ⟨B⟩.

3. Determine x and y so that \begin{bmatrix} 2x + y & x - 2y \\ 4x - 8y & 3x - y \end{bmatrix} is a diagonal matrix.

4. If C is the matrix of transition from a set of p vectors in R n to a set of q vectors


in R n , how many rows are in C?

5. Find the transition matrix from the first set to the second.

(a) {(1,1,0,0), (0,0,1,1)}; {(2,2,3,3), (1,1,2,2)}


(b) {(2,2,3,3), (1,1,2,2)}; {(1,1,0,0), (0,0,1,1)}
(c) {(1,3),(1,1)}; {(3,5),(5,3)}
(d) {(3,3),(1,-1)}; {(5,2), (7,10)}
(e) {(2,2,0,0), (1,2,1,0), (1,1,1,1), (0,0,0,1)}; {(4,5,2,2), (1,1,-1,0)}
(f) {(2,3), ( 4 , - 2 ) } ; {(6,1), ( 8 , - 4 ) , (2,-5)}

6. In each part below, a set A and a matrix A are given. In each case, find B so that
A is the matrix of transition from A to B.

(a) A = {(1,2), (0,1)};  A = \begin{bmatrix} 4 & -1 \\ -4 & 3 \end{bmatrix}

(b) A = {(1,2), (2,1)};  A = \begin{bmatrix} 5 & 7 \\ 2 & 10 \end{bmatrix}

(c) A = {(1,0,2,-1), (0,1,1,2), (1,2,1,4), (2,2,3,0)};
    A = \begin{bmatrix} 0 & 2 & 4 & 2 & 0 \\ 0 & 3 & -6 & 1 & -2 \\ 0 & 0 & 0 & -1 & -1 \\ 0 & -1 & 2 & 2 & 3 \end{bmatrix}

(d) A = {(3,1,2,4), (1,0,1,-1), (0,2,0,-4), (0,-3,1,-3)};
    A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & -3 & 2 \\ 0 & 1 & -2 \\ 0 & 1 & 0 \end{bmatrix}

(e) A = {(0,1,-2), (-1,1,2), (1,-1,0)};  A = \begin{bmatrix} 1 & 1 & -1 \\ -1 & 3 & -1 \\ -1 & 2 & 0 \end{bmatrix}

(f) A = {(2,1,0,-1,0), (0,-2,0,2,0), (1,-1,0,1,0)};  A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & -4 \end{bmatrix}

7. Determine whether or not there is a matrix of transition from the first set to the
second, and find such a matrix if it exists.

(a) {(1,0,2,0),(0,1,2,1)}; {(3,0,6,0), ( - 8 , 4 , - 8 , 4 ) , ( - 5 , 4 , - 2 , 4 ) }


(b) {(2,4,6,4), (4,6,7,3), ( 2 , 0 , - 4 , - 6 ) } ; {(2,2,1,-1), (0,2,5,5)}
(c) {(2,2,0,0), (1,2,1,0), (1,1,1,1)}; {(1,1,0,0), (0,1,1,0), (0,0,1,2)}
(d) {(2,4,2,-2), ( 0 , 3 , - 2 , 4 ) } ; {(1,2,1,-1), (0,1,2,3), (1,4,5,5)}

8. Each of the sets A below is a basis of ⟨A⟩. Find the vector that has the given
matrix C as its coordinate matrix with respect to A.

(a) A = {(1,3), (-2,1)};  C

(b) A = {(2,-1,1), (0,1,-1), (-2,1,0)};  C = \begin{bmatrix} 3 \\ 1 \\ -4 \end{bmatrix}

(c) A = {(3,2,-1,0), (0,-2,5,0), (2,0,-4,1)};  C = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix}

(d) A = {(0,3,4,6), (-1,-2,0,2), (4,0,-3,1)};  C = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}

9. With A as given in the corresponding part of Problem 8, the given vector v is in


(A). Find the coordinate matrix of v relative to A.
( a ) v = (2,20) ( b ) v = (4,0,-1) (c) v = ( 4 , 8 , - 8 , - 1 ) (d) v = (0,8,3,-9)

10. Assume that the vector v = (x_1, x_2, x_3) has coordinate matrix [v]_A = (c_1, c_2, c_3)^T
relative to the basis A = {(1,0,1), (0,1,1), (1,1,0)}, and express the components
x_1, x_2, x_3 of v in terms of c_1, c_2, and c_3.

11. Suppose that the vector v = (x_1, x_2, x_3) has coordinate matrix [v]_A = (d_1, d_2, d_3)^T with
respect to the basis A = {(1,1,1), (1,1,0), (1,0,0)}. Express each d_i in terms of
x_1, x_2, and x_3.

12. Assume that A = {u, v, w} is a basis of R^3, where u = (u_1, u_2, u_3), v =
(v_1, v_2, v_3), and w = (w_1, w_2, w_3). If a vector (x_1, x_2, x_3) has coordinates d_1, d_2, d_3
relative to A, express the components x_i in terms of the components of u, v, and
w.

13. Let A = {u_1, u_2, ..., u_r} and B = {v_1, v_2, ..., v_r} be bases of W, and let P =
[p_{ij}]_{r×r} be the matrix of transition from A to B. If v has coordinates c_1, c_2, ..., c_r
relative to A and d_1, d_2, ..., d_r relative to B, what relation exists between the c_i
and the d_i?

3.3 Properties of Matrix Multiplication


Let A = [a_{ij}]_{r×s} be a given r by s matrix over R. If W is any subspace of R^n and
A = {u_1, u_2, ..., u_r} is any set of r vectors in W, then A is the matrix of transition from
A to a unique set B = {v_1, v_2, ..., v_s}, where v_j = \sum_{i=1}^{r} a_{ij} u_i. That is, once W and A
are chosen, the matrix A is the matrix of transition from A to one and only one subset
B. Also, as was pointed out in the last section, there is only one matrix of transition
from A to B if A is a basis of W. With these facts in mind, we make the following
definition.

Definition 3.5 Let A = [a_{ij}]_{r×s} and B = [b_{ij}]_{s×t} be matrices over R, and let W
be a subspace of R^n of dimension r. Then A is the matrix of transition from a basis
A = {u_1, u_2, ..., u_r} of W to a set B = {v_1, v_2, ..., v_s} in W, and B is a matrix of
transition from B to a set C = {w_1, w_2, ..., w_t} in W. The product AB is defined to
be the matrix of transition from A to C. (See Figure 3.1.)

    [Figure 3.1: A is the transition from A to B, B is the transition from B to C, and AB is the transition from A to C.]

We observe that if A is r by s and B is s by t, then AB is r by t. Also, the number
of rows in B must be the same as the number of columns in A in order for AB to be
defined. When the product AB is defined, we say that B is conformable to A. The roles
played by the dimensions of the matrices involved are as follows:

    A_{r×s} B_{s×t} = (AB)_{r×t}:  the inner dimensions must be equal, and the outer
    dimensions give the dimension of the product matrix.

The product AB as given in Definition 3.5 involves not only the matrices A and B,
but choices of A and W as well. This means that there is a possibility that the product
AB may not be unique, but may vary with the choices of A and W . The next theorem
shows that this does not happen, and the product AB actually depends only on A and
B.

Theorem 3.6 If A = [a_{ij}]_{r×s}, B = [b_{ij}]_{s×t}, and AB = [c_{ij}]_{r×t}, then c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}
for all pairs i, j.

Proof. Let A = {u_1, u_2, ..., u_r} be a basis of W. Then A = [a_{ij}]_{r×s} is the matrix of
transition from A to a set B = {v_1, v_2, ..., v_s}, and B = [b_{ij}]_{s×t} is a matrix of transition
from B to a set C = {w_1, w_2, ..., w_t}. That is, v_k = \sum_{i=1}^{r} a_{ik} u_i for k = 1, 2, ..., s and
w_j = \sum_{k=1}^{s} b_{kj} v_k for j = 1, 2, ..., t. Thus

    w_j = \sum_{k=1}^{s} b_{kj} v_k
        = \sum_{k=1}^{s} b_{kj} \left( \sum_{i=1}^{r} a_{ik} u_i \right)
        = \sum_{k=1}^{s} \sum_{i=1}^{r} a_{ik} b_{kj} u_i
        = \sum_{i=1}^{r} \left( \sum_{k=1}^{s} a_{ik} b_{kj} \right) u_i.

But AB = [c_{ij}]_{r×t} is the matrix of transition from A to C, so that w_j = \sum_{i=1}^{r} c_{ij} u_i.
Since A is a basis, w_j can be written as a linear combination of u_1, u_2, ..., u_r in only
one way. Therefore, c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj} for all pairs i, j. ■

If A is a matrix with only one row, A = [a_1, a_2, ..., a_s], and B is a matrix with a
single column,

    B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \end{bmatrix},

then AB has only one element, given by a_1 b_1 + a_2 b_2 + ··· + a_s b_s. This result can be
committed to memory easily by mentally "standing A by B," forming products of pairs
of corresponding elements, and then forming the sum of these products:

    a_1 b_1
    + a_2 b_2
    + ···
    + a_s b_s.

It is easily seen from the formula in Theorem 3.6 that the element in the ith row and
jth column of AB can be found by multiplying the ith row of A by the jth column of
B, following the routine given for a single row and a single column. This aid to mental
multiplication is known as the row-by-column rule.

The row-by-column rule uses the same pattern as the one for computing the inner
product of two vectors. (See Definition 1.18.) Multiplying row i of A, with elements
a_{i1}, a_{i2}, ..., a_{in}, by column j of B, with elements b_{1j}, b_{2j}, ..., b_{nj}, gives the element
c_{ij} in row i and column j of the product C = AB, where

    c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j} + ··· + a_{in} b_{nj}.
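As an illustration of the row-by-column rule, the following Python sketch (our own, for illustration only) computes a product entry by entry exactly as the formula above prescribes and compares the result with NumPy's built-in product; the matrices are those of Example 1 below.

    import numpy as np

    def row_by_column_product(A, B):
        """Multiply A (r x s) by B (s x t) using the row-by-column rule."""
        r, s = A.shape
        s2, t = B.shape
        if s != s2:
            raise ValueError("B is not conformable to A")
        C = np.zeros((r, t))
        for i in range(r):
            for j in range(t):
                # c_ij = a_i1 b_1j + a_i2 b_2j + ... + a_is b_sj
                C[i, j] = sum(A[i, k] * B[k, j] for k in range(s))
        return C

    A = np.array([[2.0, -1.0], [3.0, 1.0], [6.0, -5.0], [0.0, 4.0]])
    B = np.array([[7.0, -2.0, -3.0], [5.0, -4.0, 0.0]])
    print(row_by_column_product(A, B))                         # matches Example 1
    print(np.allclose(row_by_column_product(A, B), A @ B))     # True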

Example 1 □ Consider the products AB and BA for the matrices

    A = \begin{bmatrix} 2 & -1 \\ 3 & 1 \\ 6 & -5 \\ 0 & 4 \end{bmatrix},   B = \begin{bmatrix} 7 & -2 & -3 \\ 5 & -4 & 0 \end{bmatrix}.

The number of rows in B is the same as the number of columns in A, so the product
AB is defined. Performing the computations, we find that

    AB = \begin{bmatrix} (2)(7)+(-1)(5) & (2)(-2)+(-1)(-4) & (2)(-3)+(-1)(0) \\ (3)(7)+(1)(5) & (3)(-2)+(1)(-4) & (3)(-3)+(1)(0) \\ (6)(7)+(-5)(5) & (6)(-2)+(-5)(-4) & (6)(-3)+(-5)(0) \\ (0)(7)+(4)(5) & (0)(-2)+(4)(-4) & (0)(-3)+(4)(0) \end{bmatrix}
       = \begin{bmatrix} 9 & 0 & -6 \\ 26 & -10 & -9 \\ 17 & 8 & -18 \\ 20 & -16 & 0 \end{bmatrix}.

The product BA is not defined because the number of rows in A is 4 and the number
of columns in B is 3. ■

The major result concerning properties of matrix multiplication is stated next.

Theorem 3.7 Let A = [a_{ij}]_{r×s}, B = [b_{ij}]_{s×t}, and C = [c_{ij}]_{t×v} be matrices over R.
Then A(BC) = (AB)C.

Proof. By Theorem 3.6, AB = [d_{ij}]_{r×t} where d_{ij} = \sum_{m=1}^{s} a_{im} b_{mj}, and
(AB)C = [\sum_{k=1}^{t} d_{ik} c_{kj}]_{r×v} where

    \sum_{k=1}^{t} d_{ik} c_{kj} = \sum_{k=1}^{t} \left( \sum_{m=1}^{s} a_{im} b_{mk} \right) c_{kj}
                                = \sum_{k=1}^{t} \sum_{m=1}^{s} (a_{im} b_{mk}) c_{kj}.

Similarly, BC = [f_{ij}]_{s×v} where f_{ij} = \sum_{k=1}^{t} b_{ik} c_{kj}, and A(BC) = [\sum_{m=1}^{s} a_{im} f_{mj}]_{r×v}
where

    \sum_{m=1}^{s} a_{im} f_{mj} = \sum_{m=1}^{s} a_{im} \left( \sum_{k=1}^{t} b_{mk} c_{kj} \right)
                                = \sum_{m=1}^{s} \sum_{k=1}^{t} a_{im}(b_{mk} c_{kj})
                                = \sum_{k=1}^{t} \left( \sum_{m=1}^{s} a_{im}(b_{mk} c_{kj}) \right).

But a_{im}(b_{mk} c_{kj}) = (a_{im} b_{mk}) c_{kj} from the associative property of multiplication of real
numbers. Hence A(BC) = (AB)C. ■

If all necessary conformability is assumed, then matrix multiplication is associative.


Now the matrix product BA may fail to exist, even when AB is defined, so the
commutative property of multiplication does not hold in general. In Example 2, matrices
A and B are given for which AB and BA are both defined, and yet AB ≠ BA.

Example 2 □ Let A = \begin{bmatrix} 1 & 2 \\ 4 & 0 \end{bmatrix} and B = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}. Then

    AB = \begin{bmatrix} 7 & 1 \\ 12 & -4 \end{bmatrix}   and   BA = \begin{bmatrix} -1 & 6 \\ 6 & 4 \end{bmatrix},

so that AB ≠ BA. ■

There are two other fundamental properties of multiplication of real numbers that
are not valid for multiplication of matrices. In general, AB = AC and A ≠ 0 do not
imply B = C, nor do BA = CA and A ≠ 0 imply that B = C. That is, there is no
cancellation property for matrix multiplication. Also, AB = 0 does not imply that one
of A, B must be zero. Thus, the product of two nonzero matrices may be a zero matrix.
Examples of these situations are requested in some of the exercises.
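A quick numerical experiment makes these failures concrete. The matrices in the following Python sketch are our own illustrations (they are not the examples requested in the exercises).

    import numpy as np

    A = np.array([[0.0, 1.0], [0.0, 2.0]])
    B = np.array([[1.0, 1.0], [3.0, 4.0]])
    C = np.array([[2.0, 5.0], [3.0, 4.0]])
    print(np.array_equal(A @ B, A @ C))   # True: AB = AC although B != C and A != 0

    D = np.array([[1.0, 1.0], [1.0, 1.0]])
    E = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(D @ E)                          # the zero matrix, although D != 0 and E != 0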

Exercises 3.3

1. Compute the product AB for the given matrices.

(a) A = \begin{bmatrix} 2 & 0 & 5 \\ 1 & -2 & 4 \\ 3 & 1 & 6 \end{bmatrix},  B = \begin{bmatrix} 2 & 0 \\ 8 & 1 \\ 5 & -1 \end{bmatrix}

(b) A = \begin{bmatrix} 4 & -5 \\ -2 & 1 \end{bmatrix},  B = \begin{bmatrix} 8 & -2 & 7 \\ 3 & 0 & 6 \end{bmatrix}

(c) A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 4 & 2 \end{bmatrix},  B = \begin{bmatrix} 1 & -2 \\ 5 & -3 \\ 4 & 6 \end{bmatrix}

(d) A = \begin{bmatrix} -2 & 7 & 5 \end{bmatrix},  B =

(e) A = \begin{bmatrix} 2 & -5 & 3 \\ -6 & 8 & 1 \end{bmatrix},  B = \begin{bmatrix} 0 & -8 & -1 \\ -2 & -9 & 3 \\ 1 & 4 & 4 \end{bmatrix}

(f) A = \begin{bmatrix} 2 & 1 \\ 5 & 0 \end{bmatrix},  B = \begin{bmatrix} 4 \\ -5 \end{bmatrix}

2. Whenever possible, compute the products BA in Problem 1.


3. Let A = [a_{ij}]_{m×n} and B = [b_{ij}]_{r×t} be matrices over R.

(a) What conditions are necessary in order that AB be defined?


(b) What are the dimensions of AB when it exists?
(c) What conditions are necessary in order that both AB and BA be defined?
(d) What conditions are necessary in order that the dimensions of AB be the
same as the dimensions of BA?

-2 1 0
4. Given that A = is the transition matrix from A — {(1,1), (0,1)}
0 2 1
1 0
4 -3
to the set #, and that B — is the transition matrix from B to C, find
0 -1
i -8 6
the matrix of transition from A to C.
5. Give an example of nonzero matrices A and B such that AB = BA.
6. Let A = [a_{ij}]_{r×s}. Under what conditions does A^k exist for every positive integer
k? (A^1 = A, A^2 = A · A, etc.)

7. Give an example of two 2 × 2 matrices A and B over R such that AB = 0 but
A ≠ 0 and B ≠ 0.

8. Give an example of 2 × 2 matrices A, B, and C over R such that AB = AC, but
B ≠ C and A ≠ 0.

9. Let A = [a_{ij}]_{2×3}, B = [b_{ij}]_{3×2}, and C = [c_{ij}]_{2×1} over R. For these particular
matrices, verify that A(BC) = (AB)C without using Theorem 3.7.

10. For each of the following pairs A, B, let AB = [cij] and write a formula for Cij in
terms of i and j .

(a) A = [a_{ij}]_{2×3} where a_{ij} = i + j;  B = [b_{ij}]_{3×4} where b_{ij} = 2i − j

(b) A = [a_{ij}]_{3×2} where a_{ij} = i − j;  B = [b_{ij}]_{2×4} where b_{ij} = i + j

(c) A = [a_{ij}]_{3×2} where a_{ij} = i − j;  B = [b_{ij}]_{2×3} where b_{ij} = 2i + j

(d) A = [a_{ij}]_{2×3} where a_{ij} = i + 2j;  B = [b_{ij}]_{3×2} where b_{ij} = i − 2j

11. Show that the matrix equation

    \begin{bmatrix} 1 & -2 & 1 \\ 2 & 0 & 3 \\ 1 & 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}

is equivalent to a system of linear equations in x_1, x_2, and x_3. (Hint: Use Definition
3.2.)
12. Reversing the procedure in Problem 11, write a matrix equation that is equivalent
to the system of equations

    x_1 + 4x_2 + 3x_3 + 2x_4 = 3
    x_2 + x_3 + 7x_4 = −5
    2x_1 − 3x_2 + x_3 − x_4 = 0.

13. Prove Theorem 3.7 by direct use of Definition 3.5.

3.4 Invertible Matrices


Those matrices that are matrices of transition from one basis to another basis are of
particular importance.

Definition 3.8 A matrix A = [a_{ij}]_{n×n} is nonsingular if and only if A is a matrix of
transition from one basis of R^n to another basis of R^n. A square matrix that is not
nonsingular is called singular.

Suppose that A = [a_{ij}] is a square matrix of order n. For each basis A of R^n, A
is the matrix of transition from A to a set B of n vectors in R^n. Generally, different
choices of A yield different sets B. In order for the term nonsingular to be well-defined,
we must show that if one of the sets B is a basis, then all of them are bases.

Suppose A = [a_{ij}]_{n×n} is the matrix of transition from the basis A = {u_1, u_2, ..., u_n}
of R^n to the basis B = {v_1, v_2, ..., v_n} of R^n. Let A' = {u'_1, u'_2, ..., u'_n} be another basis
of R^n, and let B' = {v'_1, v'_2, ..., v'_n} be the set so that A is the matrix of transition from
A' to B'.

In order to show that B' is a basis of R^n, it is sufficient to show that B' is linearly
independent (Theorem 1.33). Suppose that

    \sum_{j=1}^{n} b_j v'_j = 0.

Since v'_j = \sum_{i=1}^{n} a_{ij} u'_i, we have \sum_{j=1}^{n} b_j \left( \sum_{i=1}^{n} a_{ij} u'_i \right) = 0 and

    \sum_{i=1}^{n} \left( \sum_{j=1}^{n} b_j a_{ij} \right) u'_i = 0.

But A' is linearly independent, so this means that \sum_{j=1}^{n} b_j a_{ij} = 0 for i = 1, 2, ..., n.
Hence

    \sum_{j=1}^{n} b_j v_j = \sum_{j=1}^{n} b_j \left( \sum_{i=1}^{n} a_{ij} u_i \right)
                           = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} b_j a_{ij} \right) u_i
                           = \sum_{i=1}^{n} 0 · u_i
                           = 0,

and this implies that b_1 = b_2 = ··· = b_n = 0 since B is linearly independent. Thus B'
is linearly independent and is a basis of R^n.
An n by n matrix A over R is a nonsingular matrix if and only if the columns of
A record the coordinates of one basis of R n with respect to a second (not necessarily
different) basis of R n . That is, A is nonsingular if and only if the j t h column of A is the
coordinate matrix of the j t h vector in a basis of R n for j = 1, 2,..., n.
In the discussion of special types of matrices just before Definition 3.2, an identity
matrix was defined to be a matrix of the form I_n = [δ_{ij}]_{n×n}, where δ_{ij} is the Kronecker
delta. There are many identity matrices I_n, but only one for each value of n. As
examples,

    I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}   and   I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

From the placing of the 1's and 0's in I_n, it is easy to see that I_n is the unique matrix
of transition from a basis A to the same basis A, and I_n is therefore nonsingular.

Using the fact that I_n is the transition matrix from A to A, it follows easily from
Definition 3.5 that

    I_m A = A   and   A I_n = A

for any m × n matrix A. In particular,

    I_n A = A = A I_n

if A is an n × n matrix. Thus I_n is a multiplicative identity for square matrices of order
n, and it is upon this fact that the term identity matrix is based.
The existence of a multiplicative identity for square matrices of order n leads nat­
urally to the question as to which square matrices have multiplicative inverses. When
working with matrices, it is conventional to use the term inverse to mean "multiplicative
inverse."
Definition 3.9 Let A be a square matrix of order n. An n × n matrix B is an inverse
of A if AB = I_n = BA. Also, a square matrix is called invertible if it has an inverse.
We note that inverses of matrices occur in pairs in this sense: If B is an inverse of
A, then A is also an inverse of B.
Our next theorem gives an answer to the question as to which square matrices are
invertible.
Theorem 3.10 Let A be a square matrix of order n. Then A is invertible if and only
if A is nonsingular.

Proof. Suppose that A is a basis of R n and let the nxn matrix A be the transition
matrix from A to a set B of n vectors in R n .
Assume first that A is nonsingular. Then B is a basis of R n and therefore every
vector in A is a linear combination of vectors in B. Hence there exists a transition
matrix B from B to A. By Definition 3.5, AB is the matrix of transition from A to A.
It follows that AB = In since In is the unique matrix of transition from A to A. In
similar fashion, we can show that BA is the matrix of transition from the basis B to B,
and therefore BA = In.
To prove the other part of the theorem, assume that A has an inverse B, so that
AB = I_n and BA = I_n. With the same notation as in the first paragraph of this proof,
let B be the transition matrix from the set B of n vectors to a set C of n vectors in
R n . Then AB is the transition matrix from A to C, by Definition 3.5. But AB = In,
so the set C is exactly the same as A. This means that B is a transition matrix from B
to A. Therefore A is dependent on B, and this implies that R n is dependent on B, by
Theorem 1.8. That is, B spans R n . It follows then from Theorem 1.34 that B is a basis
of R n and hence A is nonsingular. ■
Corollary 3.11 Suppose A is the matrix of transition from a basis A of R^n to a basis
B of R^n. Then the matrix B is an inverse of A if and only if B is the transition matrix
from B to A.

Proof. The corollary follows at once from the proof of the theorem. ■

Example 1 □ The matrix

    A = \begin{bmatrix} 3 & 15 \\ 4 & 20 \end{bmatrix}

is not invertible. It is the transition matrix from the standard basis E_2 = {(1,0), (0,1)}
to B = {(3,4), (15,20)}, and B is clearly dependent because its second vector is equal
to 5 times its first vector. As an example of an invertible matrix, consider the matrix

    A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.

This matrix is the transition matrix from E_3 = {(1,0,0), (0,1,0), (0,0,1)} to B =
{(1,0,0), (1,1,0), (1,1,1)}, and it is easy to see that B is linearly independent since none
of its vectors is a linear combination of the preceding vectors. Of course, it is not always
so easy to tell whether a matrix is invertible or not. ■

Up to this point, we have allowed the possibility that a matrix might have two or
more distinct inverses. The next theorem shows that this possibility does not actually
happen.

Theorem 3.12 The inverse of an invertible matrix is unique.

Proof. Suppose that B and C are both inverses of the n x n matrix A. Then all of
the equations
AB = In = BA and AC = In = CA
are valid. To prove that B = C, we evaluate the product BAC in two different ways.
First, we have

B(AC) = BIn since AC = In


= B since In is a multiplicative identity.

Second, we have

(BA)C = InC since BA = In


= C since In is a multiplicative identity.

But we proved that matrix multiplication is associative in Theorem 3.7. Therefore
B(AC) = (BA)C, and hence B = C. ■

Theorem 3.12 allows us to make the following definition.



Definition 3.13 If A is an invertible matrix, its unique inverse is denoted by A^{-1}.

Example 2 □ We saw in Example 1 of this section that the matrix

    A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}

is invertible and is the transition matrix from the basis E_3 = {(1,0,0), (0,1,0), (0,0,1)}
to the basis B = {(1,0,0), (1,1,0), (1,1,1)} of R^3. In order to find A^{-1}, it is sufficient
to find the transition matrix from B to E_3. Since

    (1,0,0) = (1)(1,0,0) + (0)(1,1,0) + (0)(1,1,1)
    (0,1,0) = (-1)(1,0,0) + (1)(1,1,0) + (0)(1,1,1)
    (0,0,1) = (0)(1,0,0) + (-1)(1,1,0) + (1)(1,1,1),

we see that

    A^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.

If a given square matrix A is complicated, a more efficient method than the one we used
here is needed to find A^{-1}. Such a method is presented in Section 3.6. ■

Some of our earlier results can be rewritten using the exponential notation in Definition
3.13. The equations AB = I_n = BA now read as

    A A^{-1} = I_n = A^{-1} A.

Also, we noted earlier that inverses occur in pairs: When B = A^{-1}, then A = B^{-1}.
Substituting the value B = A^{-1} in the equation A = B^{-1} yields

    A = (A^{-1})^{-1}.

In the definition and all discussion in this section, it has been required that both of the
equations A A^{-1} = I_n and A^{-1} A = I_n be satisfied by the inverse matrix. Our next two
theorems show that these equations are not independent for square matrices. In fact,
for square matrices, we shall see that either of them implies the other.

Theorem 3.14 Let A be a square matrix of order n over R. If there is a square matrix
B over R such that AB = I_n, then A is invertible and B = A^{-1}.

Proof. Suppose there is a square matrix B such that AB = I_n. Now A is the matrix
of transition from E_n to a set A of n vectors in R^n, B is a matrix of transition from A
to a set B of n vectors in R^n, and AB = I_n, so B must be E_n. Thus B is a matrix of
transition from A to E_n, and this means that each vector in E_n is a linear combination
of vectors in A. That is, E_n is dependent on A. And since R^n is dependent on E_n, this
means that R^n is dependent on A. By Theorem 1.34, A is a basis of R^n, and A is
invertible. It follows from Corollary 3.11 that B = A^{-1}. ■

In view of Theorem 3.14, we see that a matrix A of order n is invertible if and only
if there is a square matrix B such that AB = In.
The proof of the next theorem is quite similar to that of Theorem 3.14, and is left
as an exercise.

Theorem 3.15 Let A be a square matrix of order n over R. If there is a square matrix
B over R such that BA = I_n, then A is invertible and B = A^{-1}.

Theorem 3.16 If A and B are invertible matrices of the same order, then AB is
invertible and (AB)^{-1} = B^{-1} A^{-1}.

Proof. If A and B are invertible matrices of order n, then A^{-1} and B^{-1} exist. Since

    (AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I_n A^{-1} = I_n,

AB is invertible and (AB)^{-1} = B^{-1} A^{-1} by Theorem 3.14. ■

Repeated application of Theorem 3.16 yields the following corollary.

Corollary 3.17 If A_1, A_2, ..., A_k are square matrices of order n over R and each A_i is
invertible, then A_1 A_2 ··· A_k is invertible and (A_1 A_2 ··· A_k)^{-1} = A_k^{-1} ··· A_2^{-1} A_1^{-1}.
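The reversal of order in Theorem 3.16 is easy to confirm numerically. The following Python sketch (an illustration with matrices of our own choosing) checks that (AB)^{-1} = B^{-1}A^{-1}, and that the product of the inverses taken in the original order generally differs.

    import numpy as np

    A = np.array([[2.0, 3.0], [6.0, 4.0]])
    B = np.array([[1.0, 1.0], [0.0, 1.0]])

    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))                                  # True
    print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))  # False for these matrices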

Definition 3.18 If A and A' are bases of a subspace W of R^n such that A' is obtained
from A by a single elementary operation, then the matrix of transition from A to A' is
an elementary matrix.

That is, a matrix M is an elementary matrix if and only if it is a matrix of transition


from a given basis to another basis that is obtained by applying a single elementary
operation E to the original basis. We shall say that the elementary matrix M and
the elementary operation E are associated with each other. The elementary matrix
is classified as type I, II, or III according to the type of elementary operation that is
applied to the original basis.

Theorem 3.19 A square matrix over R is invertible if and only if it is a product of


elementary matrices.

Proof. Suppose that A = M1M2 · · · Mt, where each Mi is an elementary matrix.


Now each Mi is invertible since it is a matrix of transition from one basis to another,
and therefore A is invertible by Corollary 3.17.
Suppose now that A is of order n and invertible. Then A is the matrix of transition
from E_n to a basis A of R^n. By Theorem 2.13, A can be obtained from E_n by a sequence
E_1, E_2, ..., E_t of elementary operations. If M_i is the elementary matrix that is the
matrix of transition for the operation E_i, then M_1 M_2 ··· M_t is a matrix of transition
from E_n to A. But this matrix of transition is unique since E_n is a basis, and therefore
A = M_1 M_2 ··· M_t. ■

If it is desired to write a given invertible n × n matrix A as a product of elementary
matrices, there may be some difficulty in following one of the steps described in the proof
of the last theorem. Usually it is not obvious how the basis A can be obtained from
E_n by a sequence E_1, E_2, ..., E_t of elementary operations, and an indirect approach is
easier to use. If we start with the set A and apply a sequence of elementary operations
F_1, F_2, ..., F_k to find the standard basis of ⟨A⟩, that standard basis will be E_n since
⟨A⟩ = R^n. We will then have

    F_k ··· F_2 F_1(A) = E_n

and therefore

    A = F_1^{-1} F_2^{-1} ··· F_k^{-1}(E_n).

If M_i is the elementary matrix of transition that is associated with F_i, then this indirect
procedure will yield

    A = M_k^{-1} ··· M_2^{-1} M_1^{-1}

since M_i^{-1} is the elementary matrix associated with the elementary operation F_i^{-1}. Note
that we also have

    A^{-1} = M_1 M_2 ··· M_k

since M_1 M_2 ··· M_k is the transition matrix from A to E_n. This is diagrammed in Figure
3.2.
3.2.
    [Figure 3.2: A → F_1(A) → F_2 F_1(A) → ··· → F_k ··· F_2 F_1(A) = E_n, with associated elementary
    matrices M_1, M_2, ..., M_k, so that A^{-1} = M_1 M_2 ··· M_k and A = F_1^{-1} F_2^{-1} ··· F_k^{-1}(E_n).]

Example 3 □ Consider the problem of expressing the invertible matrix

    A = \begin{bmatrix} 2 & 3 \\ 6 & 4 \end{bmatrix}

as a product of elementary matrices. This matrix A is the transition matrix from E_2 to
A = {(2,6), (3,4)}. We start with A and apply a single elementary operation at a time
until we obtain E_2. Using arrows to indicate that an elementary operation F_i has been
applied, this appears as follows.

    A = {(2,6), (3,4)} →(F_1) {(1,3), (3,4)}
                       →(F_2) {(1,3), (0,-5)}
                       →(F_3) {(1,3), (0,1)}
                       →(F_4) {(1,0), (0,1)} = E_2

The following display shows the elementary operations F_i, their associated elementary
matrices M_i, and the inverses M_i^{-1}.

    F_1: Multiply the first vector by 1/2.
         M_1 = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix},   M_1^{-1} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}

    F_2: Replace the second vector by the sum of the second vector and (-3) times the first vector.
         M_2 = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix},   M_2^{-1} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}

    F_3: Multiply the second vector by (-1/5).
         M_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1/5 \end{bmatrix},   M_3^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}

    F_4: Replace the first vector by the sum of the first vector and (-3) times the second vector.
         M_4 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix},   M_4^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
According to the discussion preceding this example, we can now express A as a product
of elementary matrices by using A = M_4^{-1} M_3^{-1} M_2^{-1} M_1^{-1}:

    \begin{bmatrix} 2 & 3 \\ 6 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.  ■
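The factorization in Example 3 is straightforward to verify by machine. The sketch below (Python with NumPy, offered only as an illustration) multiplies out the four elementary matrices and also checks that A^{-1} = M_1 M_2 M_3 M_4.

    import numpy as np

    M1_inv = np.array([[2.0, 0.0], [0.0, 1.0]])
    M2_inv = np.array([[1.0, 3.0], [0.0, 1.0]])
    M3_inv = np.array([[1.0, 0.0], [0.0, -5.0]])
    M4_inv = np.array([[1.0, 0.0], [3.0, 1.0]])

    A = M4_inv @ M3_inv @ M2_inv @ M1_inv
    print(A)                                    # [[2, 3], [6, 4]]

    # A^{-1} = M1 M2 M3 M4, where each Mi is the inverse of Mi_inv above.
    A_inv = (np.linalg.inv(M1_inv) @ np.linalg.inv(M2_inv)
             @ np.linalg.inv(M3_inv) @ np.linalg.inv(M4_inv))
    print(np.allclose(A @ A_inv, np.eye(2)))    # True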
Exercises 3.4
1. For each matrix A, determine A so that A is the matrix of transition from E_3 to
A. Then use Definition 3.8 to decide whether or not A is nonsingular.

(a) A = \begin{bmatrix} 2 & -6 & 6 \\ -5 & 13 & 1 \\ -2 & 4 & 10 \end{bmatrix}   (b) A = \begin{bmatrix} 4 & 4 & 4 \\ 3 & 4 & 2 \\ -6 & 1 & 7 \end{bmatrix}

(c) A = \begin{bmatrix} 5 & 2 & 7 \\ 2 & 1 & 0 \\ 2 & 9 & 3 \end{bmatrix}   (d) A = \begin{bmatrix} 1 & 5 \\ 3 & 9 \\ {} & -4 \end{bmatrix}
2. Which of the following are elementary matrices?

(a) \begin{bmatrix} 1 & 4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (c) \begin{bmatrix} 1 & 4 \\ 4 & 1 \\ 0 & 0 \end{bmatrix}   (d) \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(e) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}   (f) \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (g) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}   (h) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}

3. Suppose that A is a set of three vectors in R 4 and that B is obtained from A by


replacing the second vector by the sum of the second vector and 5 times the third
vector.

(a) Write a matrix of transition from A to B.


(b) Write a matrix of transition from B to A.

4. Write the inverse of each elementary matrix in Problem 2.

1 0 1 0 0 1 1
5. Given that A , write A as a product of elemen­
2 1 0 3 1 0
tary matrices.
6. Write each of the following invertible matrices as a product of elementary matrices.

(a) \begin{bmatrix} 2 & 4 \\ 3 & 4 \end{bmatrix}   (b) \begin{bmatrix} -1 & 1 \\ 1 & 0 \end{bmatrix}   (c) \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}   (d) \begin{bmatrix} 0 & 3 \\ -2 & 6 \end{bmatrix}

7. Find the inverse of each matrix in Problem 6 by use of Corollary 3.17 and the
factorization obtained in Problem 6.
8. Each of the matrices A below is nonsingular. Use A as the transition matrix from
the basis A = {(1,-2), (2,1)} of R^2 to the basis A' of R^2. Determine A^{-1} by
finding the matrix of transition from A' to A.

(a) A = \begin{bmatrix} 1 & 3 \\ -1 & -2 \end{bmatrix}   (b) A = \begin{bmatrix} 2 & -7 \\ -1 & 4 \end{bmatrix}   (c) A = \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}   (d) A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}

9. Prove Theorem 3.15.


10. Let A, B, and C be square matrices of order n. Prove that if A is invertible and
AB = AC, then B = C.
11. Prove Corollary 3.17 by mathematical induction.

12. Given the matrices B = \begin{bmatrix} 2 & 4 \\ 7 & 8 \end{bmatrix} and (AB)^{-1} = \begin{bmatrix} 2 & 3 \\ -1 & -2 \end{bmatrix}, find A^{-1}.

13. Given that B^{-1} = \begin{bmatrix} 1 & 3 \\ -1 & -2 \end{bmatrix} and AB = \begin{bmatrix} -4 & -5 \\ 2 & 3 \end{bmatrix}, find the matrix A.

-2 1 0
14. Suppose A = is the transition matrix from A = {(1,1), (1,0)}
0 2 1
1 0
4 -3
to the set B, and B = is the transition matrix from B to C.
0 -1
-8 6

(a) Find the matrix of transition from A to C.


(b) Find the set C.
(c) Does ( A B ) - 1 exist? Explain the reason for your answer.

15. Given that the invertible matrix

    B = \begin{bmatrix} 4 & 6 & 0 \\ 2 & -5 & 3 \\ 4 & 8 & 2 \end{bmatrix}

was obtained from the invertible matrix A by adding 3 times the first column to
the second column, find the elementary matrix M such that BM = A. (Hint: Let
B be the matrix of transition from £ 3 to B.)
16. Derive a formula for the inverse of a nonsingular A = [a_{ij}]_{2×2} by consideration of
a system of equations obtained from

    \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

17. Prove that if A = [a_{ij}]_{n×n} is not invertible, then AB is not invertible for any n × p
matrix B. (Hint: Use Theorem 3.14.)

18. Prove that if A = [a_{ij}]_{n×n} is singular, then BA is singular for any p × n matrix
B. (Hint: Use Theorem 3.15.)

3.5 Column Operations and Column-Echelon Forms


The elementary matrices of each type can be conveniently described by comparing them
with the identity matrix In. (It is clear that the identity operation has In as its associated
elementary matrix.) The operation of multiplying a matrix by a scalar is helpful in
describing these comparisons. In this section, it is especially useful in situations where
the matrix is a column matrix. Other uses will be found in later sections.

Definition 3.20 For any a ∈ R and any A = [a_{ij}]_{m×n} over R, the product aA is
defined by aA = [a a_{ij}]_{m×n}.

Note that in the product aA, each element of A is multiplied by the scalar a. A
matrix of the form a I_n is called a scalar matrix.
Suppose now that M is an elementary matrix with E as its associated elementary
operation.
If M is of type I, then B = E(A) is obtained from a basis A of R^n by multiplying
the sth vector of A by a nonzero scalar a. This means that M differs from I_n only in
column s, and the sth column of M is the product of a and the sth column of I_n.

If M is of type II, then E replaces one of the vectors v_s in a basis A = {v_1, v_2, ..., v_n}
of R^n by v_s + b v_t, where s ≠ t. This means that M is identical to I_n except in column s.
In column s of M, 1 appears in row s and b appears in row t. Thus, M can be obtained
from I_n by adding to column s of I_n the product of b and column t of I_n, where s ≠ t.

Consider now the case where M is of type III. Then E interchanges the sth and
tth (s ≠ t) vectors in a basis of R^n. This means that M can be obtained from I_n by
interchanging the sth and tth columns (s ≠ t).


We have thus found that each elementary matrix can be obtained from In by per­
forming a suitable operation on the columns of In.

Definition 3.21 Corresponding to the operations performed on I_n above, we define
three types of elementary column operations on a matrix A.

(I) An elementary column operation of type I multiplies one of the columns of
A by a nonzero scalar.

(II) An elementary column operation of type II adds to column s of A the
product of b and column t of A (s ≠ t).

(III) An elementary column operation of type III interchanges two columns of
A.

A more unwieldy but more accurate description of an elementary column operation


of type II would be this statement: an elementary column operation of type II replaces
column s of A by the sum of column s and b times column t (s ≠ t). This wording more
closely matches the descriptions we used with elementary operations on sets of vectors.
Let us consider the effect on an m by n matrix A when A is multiplied on the right
by an elementary matrix M of order n. This effect is diagrammed in Figure 3.3.

    [Figure 3.3: A is the transition from E_m to A, M is the transition from A to B, and
    AM is the transition from E_m to B.]

Now A is the matrix of transition from the basis E_m of R^m to a set A of n vectors
in R^m, M is a matrix of transition from A to a set B of n vectors in R^m, and AM is
the matrix of transition from E_m to B.

If M is of type I, then B is obtained from A by multiplying the sth vector of A by a
nonzero scalar a. This means that AM can be obtained from A by multiplying the sth
column of A by a.

If M is of type II, then B is obtained from A = {v_1, v_2, ..., v_n} by replacing v_s by
v_s + b v_t (s ≠ t). This means that AM is identical to A except in column s, and column
s in AM is obtained by adding to column s of A the product of b and column t.

If M is of type III, then B is obtained from A by interchanging the sth and tth
vectors (s ≠ t) of A. Thus AM is obtained from A by interchanging the sth and tth
columns of A (s ≠ t).

Thus, multiplication of A on the right by an elementary matrix M of a certain type
performs an elementary column operation E of the same type on A. We shall say that
the elementary matrix M and the elementary column operation E are associated with
each other whenever multiplication of A on the right by M performs the elementary
column operation E on A.

In each case above, AM is obtained from A by a single column operation. Hence
performing an elementary operation on A corresponds to performing an elementary
column operation on A. By Theorem 2.7, ⟨A⟩ = ⟨B⟩, so A and AM are matrices of
transition from a given basis to sets of vectors that span the same subspace.
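The correspondence between right multiplication by an elementary matrix and an elementary column operation can be seen directly in a small computation such as the following Python sketch (our own illustration; the matrix and the choice of operation are arbitrary).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

    # Type II elementary matrix of order 3: I_3 with b = 2 placed in row 3, column 1,
    # so that right multiplication adds 2 times column 3 to column 1.
    M = np.eye(3)
    M[2, 0] = 2.0

    AM = A @ M
    B = A.copy()
    B[:, 0] = A[:, 0] + 2.0 * A[:, 2]   # the same column operation performed directly
    print(np.array_equal(AM, B))        # True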
In the remainder of this section, we shall be concerned with the reduction of a given
matrix A to a certain standard form, known as the reduced column-echelon form.

Definition 3.22 A matrix A = [a_{ij}]_{m×n} over R that satisfies the following conditions
is a matrix in reduced column-echelon form, or a reduced column-echelon matrix.

1. The first nonzero element in column j is a 1 in row k_j for j = 1, 2, ..., r. (This 1
is called a leading one.)

2. k_1 < k_2 < ··· < k_r ≤ m. (That is, for each change in columns from left to right,
the leading one appears in a lower row.)

3. For j = 1, 2, ..., r, the leading one in column j is the only nonzero element in row
k_j.

4. Each of the last n − r columns consists entirely of zeros.

For future use, we note that conditions (1) and (3) can be reworded in the following
ways.

1. a_{ij} = 0 for i < k_j, and a_{k_j j} = 1 for j = 1, 2, ..., r.

3. Column j is the only column with a nonzero element in row k_j.

Thus a matrix is in reduced column-echelon form if and only if its nonzero columns
record the components of the standard basis of a subspace, or if its nonzero columns
form the matrix of transition from E_m to the standard basis of a subspace.

Example 1 □ Consider the question as to which of the following matrices are in reduced
column-echelon form.

    A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 5 & 4 & 3 \\ 0 & 0 & 0 \end{bmatrix}   B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix}   C = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}   D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The matrix A is not in reduced column-echelon form. It fails to satisfy condition (1) in
the second and third columns because the first nonzero element in each of these columns
is not a 1. The matrix B is in reduced column-echelon form since it satisfies all four
conditions. The matrix C fails on condition (3) because the leading 1 in column 3 is not
the only nonzero element in row k_3 = 4. The matrix D fails on condition (2) because
k_2 = 3 and k_3 = 2 violate k_2 < k_3. That is, the leading 1 in column 3 of D fails to be
in a lower row than the leading 1 in column 2. ■

Theorem 3.23 If A is any m × n matrix over R, there is an invertible matrix Q over
R such that AQ = A' is a matrix in reduced column-echelon form. The matrix A' is
uniquely determined by A.

Proof. Let A = [a_{ij}] be an m × n matrix over R and consider the set A =
{v_1, v_2, ..., v_n} of n vectors in R^m where v_j = (a_{1j}, a_{2j}, ..., a_{mj}). (That is, A is the
matrix of transition from E_m to A.) By Theorem 2.9 (with m and n interchanged),
there is a sequence E_1, E_2, ..., E_t of elementary operations that can be applied to A to
obtain a set A' = {v'_1, v'_2, ..., v'_r, 0, ..., 0} in which

1. The first nonzero component from the left in v'_j is a 1 in the k_j component for
j = 1, 2, ..., r.

2. k_1 < k_2 < ··· < k_r ≤ m.

3. v'_j is the only vector in A' with a nonzero k_j component.

4. {v'_1, v'_2, ..., v'_r} is a basis of ⟨A⟩.

Now each of the elementary operations E_i has an associated elementary matrix Q_i,
and when E_i is applied to a set, a matrix of transition from that set to the new set is
Q_i. As the diagram in Figure 3.4 shows, this means that a matrix of transition from A
to A' is Q_1 Q_2 ··· Q_t, and that A Q_1 Q_2 ··· Q_t = A' is the matrix of transition from E_m
to A'. Thus the ith element in column j of A' = [a'_{ij}] is equal to the ith component of
v'_j, and therefore:

1. The first nonzero element in column j of A' is a 1 in row k_j, for j = 1, 2, ..., r.

2. k_1 < k_2 < ··· < k_r ≤ m.

3. Column j is the only column with a nonzero element in row k_j.

4. The last n − r columns consist entirely of zeros.

The matrix Q = Q_1 Q_2 ··· Q_t is invertible since each Q_i is invertible, and the matrix
A' is uniquely determined by A since A' is uniquely determined by A. ■
    [Figure 3.4: A → E_1(A) → E_2 E_1(A) → ··· → E_t ··· E_2 E_1(A) = A', with associated elementary
    matrices Q_1, Q_2, ..., Q_t, so that A Q_1 Q_2 ··· Q_t = A'.]
We saw in Chapter 2 that the set A' used in the proof of Theorem 3.23 is uniquely
determined by the set A. It follows from this fact that each matrix A over R has an
associated unique reduced column-echelon form A' as described in the proof. However,
we have seen in Chapter 2 that the sequence of elementary operations used to obtain A'
from A is not unique. The invertible matrix Q is similarly not unique in spite of the fact
that AQ = A' is unique for A. An example that demonstrates this lack of uniqueness
is given at the end of this section.
Definition 3.24 The matrix A' in Theorem 3.23 is called the reduced column-
echelon form for A.
We shall now show how the proof of Theorem 3.23 can be interpreted so as to give
a systematic procedure for finding an invertible Q such that AQ is in reduced column-
echelon form. This procedure is closely related to that used in Chapter 2 in finding the
standard basis of a subspace.
Suppose that A = [a_{ij}] is a given m × n matrix over R. Interpreting A as a matrix
of transition from E_m to A is equivalent to obtaining A by recording the components of
vectors in A, as was done in Chapter 2. We have seen that performing an elementary
operation on A corresponds to performing an elementary column operation on A. With
these interpretations, the procedure in Example 3 of Section 2.4 can be regarded as a
method for obtaining the reduced column-echelon form

    A' = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ -2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 \end{bmatrix}

for the matrix

    A = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 3 & 2 & -1 \\ 0 & -6 & -4 & 2 \\ 1 & 1 & 2 & 2 \\ 1 & -2 & 0 & 3 \end{bmatrix}.

This is an efficient method for finding the reduced column-echelon form, but it does not
yield the matrix Q of Theorem 3.23.
yield the matrix Q of Theorem 3.23.
In order to obtain the matrix Q, one needs a method for recording the products
Q_1 Q_2 ··· Q_i for i = 1, 2, ..., t. Consider the sequence of products

    A I_n = A
    A I_n Q_1 = A Q_1
    A I_n Q_1 Q_2 = A Q_1 Q_2
    ...
    A I_n Q_1 Q_2 ··· Q_t = A Q_1 Q_2 ··· Q_t = A'.

Let E_i be the elementary column operation that has Q_i as its associated matrix. Now
A Q_1 = E_1(A), and A Q_1 Q_2 ··· Q_i = E_i E_{i-1} ··· E_1(A) in general. Thus the right members
of the equations above may be found by applying the sequence E_1, E_2, ..., E_t to
A. The left members have as factors the products I_n Q_1 Q_2 ··· Q_i = E_i E_{i-1} ··· E_1(I_n).
Thus Q can be found by applying the same sequence of elementary operations to I_n.
What is desired, then, is an efficient method of recording the results E_i ··· E_2 E_1(A) and
E_i ··· E_2 E_1(I_n). This can be done effectively by recording both A and I_n in a single
matrix as

    \begin{bmatrix} A \\ I_n \end{bmatrix}.

Instead of applying the elementary column operations separately to A and I_n,

    \begin{bmatrix} A \\ I_n \end{bmatrix} → \begin{bmatrix} E_1(A) \\ E_1(I_n) \end{bmatrix} → \begin{bmatrix} E_2 E_1(A) \\ E_2 E_1(I_n) \end{bmatrix}   etc.,

one can simply apply the operation to the entire matrices

    \begin{bmatrix} A \\ I_n \end{bmatrix},   E_1 \begin{bmatrix} A \\ I_n \end{bmatrix},   E_2 E_1 \begin{bmatrix} A \\ I_n \end{bmatrix},   etc.

This procedure is quite valid since the same operations in the same order are to be
applied to each of A and I_n.
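The bookkeeping just described is easy to mechanize. The following Python sketch (our own illustration; the pivoting strategy is crude and is not the particular sequence used in Example 3 of Section 2.4) column-reduces the stacked matrix [A; I_n] and returns both the reduced column-echelon form and an invertible matrix Q with AQ = A'.

    import numpy as np

    def reduced_column_echelon(A):
        """Column-reduce A, recording the operations in Q so that A @ Q is in
        reduced column-echelon form.  A rough sketch, not optimized."""
        m, n = A.shape
        S = np.vstack([A.astype(float), np.eye(n)])   # the stacked matrix [A; I_n]
        lead_col = 0
        for row in range(m):
            if lead_col >= n:
                break
            pivots = [j for j in range(lead_col, n) if abs(S[row, j]) > 1e-12]
            if not pivots:
                continue
            j = pivots[0]
            S[:, [lead_col, j]] = S[:, [j, lead_col]]   # type III: interchange columns
            S[:, lead_col] /= S[row, lead_col]          # type I: make the leading entry 1
            for k in range(n):                          # type II: clear the rest of the row
                if k != lead_col and abs(S[row, k]) > 1e-12:
                    S[:, k] -= S[row, k] * S[:, lead_col]
            lead_col += 1
        return S[:m, :], S[m:, :]

    A = np.array([[0, 0, 0, 0],
                  [0, 3, 2, -1],
                  [0, -6, -4, 2],
                  [1, 1, 2, 2],
                  [1, -2, 0, 3]], dtype=float)
    A_prime, Q = reduced_column_echelon(A)
    print(A_prime)                          # the matrix A' displayed above
    print(np.allclose(A @ Q, A_prime))      # True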

Example 2 ü Using the matrix A in Example 3 of Section 2.4, we have

0 0 0 0 0 0 0 0
0 3 2 -1 2 3 0-1
0 -6 -4 2 -4-6 0 2
1 1 2 2 2 1 - 1 2
A 1 -2 0 3 0-2-1 3
~"*
In
1 0 0 0 0 0 1 0
0 1 0 0 0 1 0 0
0 0 1 0 1 0 0 0
0 0 0 1 0 0 0 1

0 0 0 0 0 0 0 0 0 0 0 0
1 3 0 -1 1 0 0 0 1 0 0 0
2 -6 0 2 2 0 0 0 - 2 0 0 0
1 1 -1 2 1 -2 -1 3 0 1 0 0
0 -2 -1 3 0 -2 -1 3 - 1 1 0 0
~~> ~~*

0 0 1 0 0 0 1 0 0 0 1 0
1 1 1 3
0 1 0 0 0 1 0 0 2 2 2 2
1 1 3 1 1 3 3 7
2 0 0 0 2 2 0 2 4 4 4 4

0 0 0 1 0 0 0 1 0 0 0 1

Thus

    A' = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ -2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 \end{bmatrix}

and
0 0

0 0 0 1

is an invertible matrix such that AQ = A'. It is easy to show that Q is not unique by
performing elementary operations that involve only the last two columns of Q. For
instance, adding the third column of Q to its fourth column yields the matrix

0 0

B =
! -i
0 0 0

such that AB = AQ = A'.

Exercises 3.5

1. Describe the elementary column operation that is associated with the given
elementary matrix M.

(a) M = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}   (b) M = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(c) M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}   (d) M = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

2. Which of the following are in reduced column-echelon form?

(a) \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 3 & 0 \end{bmatrix}   (b) \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}   (c) \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}

(d) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}   (e) \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (f) \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

3. Write the elementary matrix of transition from Λ to B.

(a) .4 = { ( 1 , 2 , - 1 ) , (2,1,3)}, B = {(1,2, - 1 ) , (0, - 3 , 5 ) }

(b) A = {(1,0,1), (0,1,1), (1,1,0)}, B = {(1,1,0), (0,1,1), (1,0,1)}

4. Given that B = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 0 & 1 \\ -2 & 1 & 3 \end{bmatrix} is obtained from the matrix A by multiplying A
on the right by \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}, find A.

5. If M is an elementary matrix, are A and MA matrices of transition from a given


basis to sets of vectors that span the same subspace? Illustrate with an example.

6. Find the reduced column-echelon form for each matrix.

(a) \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 1 & 1 \end{bmatrix}   (b) \begin{bmatrix} 9 & 6 \\ -4 & -3 \\ -4 & -2 \end{bmatrix}   (c) \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & 1 & 3 & 1 \\ -1 & 0 & -1 & -1 \\ 3 & 2 & 5 & 1 \end{bmatrix}

1 2 1 -1
0 0 0 0 1 2 1-1
2 4 2 -2
2 3 0 -1 2 4 2-2
0 1 2 3
(d) 4 -6 0 2 (e) (f) 0 1 2 3
1 4 5 5
2 1 -1 2 1 4 5 5
0 3 -2 4
0 -2 -1 3 0 3 - 2 4
1 6 1 6

(g) \begin{bmatrix} 1 & 1 & 3 & 2 & 0 \\ 1 & 1 & -1 & 0 & 0 \\ 2 & 2 & 6 & 4 & 0 \\ -3 & 0 & -6 & -3 & 1 \end{bmatrix}   (h) \begin{bmatrix} 1 & 1 & 3 & 1 & 1 \\ 1 & 0 & 2 & 0 & 1 \\ 3 & -2 & 4 & 0 & 7 \\ -1 & 0 & -2 & 1 & 1 \end{bmatrix}

7. In each part of Problem 6, let A be the given matrix and find an invertible matrix
Q such that AQ is in reduced column-echelon form.

3.6 Row Operations and Row-Echelon Forms


In this section we shall develop elementary row operations that are analogous to the
column operations of the preceding section. In later chapters, these row operations will
be equally as important as column operations.
The connection that was made between columns and vectors led to a natural and
simple correspondence between elementary operations on sets of vectors and elementary
column operations. The interpretation of elementary row operations is not as simple,
and a difference in the method of investigation is necessary. The fundamental technique
is much the same for the different types of operations, with the most complicated sit­
uations occurring with operations of type II. For this reason, the derivation is carried
out for operations of type II, and those for the other types are left as exercises.
Let us return to the comparisons of elementary matrices with In that were made
earlier. If we direct our attention to the rows of the elementary matrices rather than
the columns, the resulting descriptions are as follows.
If M is the elementary matrix of type I on page 83, then M is obtained from I_n by
multiplying row s of I_n by a nonzero scalar a.

If M is the elementary matrix of type II on page 83, then M is obtained from I_n by
adding to row t of I_n the product of b and row s of I_n, where s ≠ t.

If M is the elementary matrix of type III on page 83, then M is obtained from I_n
by interchanging rows s and t (s ≠ t).

Thus, each elementary matrix can be obtained from I_n by performing a suitable
operation on the rows of I_n.

Definition 3.25 There are three types of elementary row operations on a matrix A.

(I) An elementary row operation of type I multiplies one of the rows of A by a
nonzero scalar a.

(II) An elementary row operation of type II adds to row t in A the product of b
and row s in A, where s ≠ t.

(III) An elementary row operation of type III interchanges two rows in A.
The descriptions of the products MA, where M is an elementary matrix, are very
much like those obtained for the products AM in Section 3.5, even though the deriva­
tions are fundamentally different. As mentioned earlier, we consider a matrix M of type
II here and leave the other derivations as exercises.

Suppose that M is an m × m elementary matrix of type II and A = [a_{ij}] is m × n over
R. The matrix M is the matrix of transition from E_m to a basis A = {u_1, u_2, ..., u_m}
of R^m, where u_i = e_i for i ≠ s and u_s = e_s + b e_t for s ≠ t. Also, A is the matrix of
transition from A to a set B = {v_1, v_2, ..., v_n}, and MA is the matrix of transition from
E_m to B. This is shown in Figure 3.5.

    [Figure 3.5: M is the transition from E_m to A, A is the transition from A to B, and
    MA is the transition from E_m to B.]

We have

    v_j = \sum_{i=1}^{m} a_{ij} u_i
        = \sum_{i=1, i≠s}^{m} a_{ij} u_i + a_{sj} u_s
        = \sum_{i=1, i≠s}^{m} a_{ij} e_i + a_{sj}(e_s + b e_t)
        = \sum_{i=1, i≠t}^{m} a_{ij} e_i + (a_{tj} + b a_{sj}) e_t,

so that the coordinates of v_j relative to E_m are the same as the coordinates of v_j
relative to A except for the tth coordinate, and the tth coordinate of v_j relative to E_m
is obtained by adding to the tth coordinate of v_j relative to A the product of b and the
sth coordinate relative to A. Hence multiplying A on the left by M simply adds the
product of b and the sth row to the tth row of A. That is, row t of A is replaced by the
sum of row t of A and b times row s of A.
The descriptions of MA for elementary matrices M of types I and III are as follows.

• If M is obtained from I_m by multiplying row s of I_m by a, then MA is obtained
from A by multiplying row s of A by a.

• If M is obtained from I_m by interchanging rows s and t of I_m (s ≠ t), then MA
is obtained from A by interchanging rows s and t of A.
The concept of the transpose of a matrix is extremely useful in obtaining the row
analogue of the reduced column-echelon form. This analogous form is called the reduced
row-echelon form of a matrix.
Definition 3.26 If A = [a_{ij}] is any m × n matrix over R, the transpose of A is the
n × m matrix B = [b_{ij}] with b_{ij} = a_{ji} for i = 1, 2, ..., n; j = 1, 2, ..., m. The transpose of
A is denoted by A^T. If A is a matrix such that A^T = A, then A is called symmetric.
If A^T = −A, then A is skew-symmetric.

Thus row i of AT is composed of the elements from column i of A, in order from


left to right rather than from top to bottom. We say that AT is obtained from A "by
interchanging rows and columns."

Example 1 □ Consider the matrices

    B = \begin{bmatrix} 1 & -2 & 3 \\ -2 & -4 & 5 \\ 3 & 5 & -6 \end{bmatrix}   and   C = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 3 \\ 2 & -3 & 0 \end{bmatrix}.

It is easy to see that B^T = B and

    C^T = \begin{bmatrix} 0 & -1 & 2 \\ 1 & 0 & -3 \\ -2 & 3 & 0 \end{bmatrix} = -C.

Thus B is a symmetric matrix and C is a skew-symmetric matrix. ■

Our next theorem states that the transpose of a product is equal to the product of
the transposes in reverse order.

Theorem 3.27 If A = [a_{ij}]_{m×n} and B = [b_{ij}]_{n×r} over R, then (AB)^T = B^T A^T.

Proof. Since AB = [c_{ij}]_{m×r} with c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, we have (AB)^T =
[\sum_{k=1}^{n} a_{jk} b_{ki}]_{r×m}. Also, B^T = [d_{ij}]_{r×n} with d_{ij} = b_{ji}, and A^T = [f_{ij}]_{n×m} with f_{ij} = a_{ji}.
Thus B^T A^T = [\sum_{k=1}^{n} d_{ik} f_{kj}]_{r×m} = [\sum_{k=1}^{n} b_{ki} a_{jk}]_{r×m} = (AB)^T. ■
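Theorem 3.27 is another identity that is convenient to check numerically; the matrices in this Python sketch are our own and serve only as an illustration of the reversal of order.

    import numpy as np

    A = np.array([[1.0, -2.0, 3.0],
                  [0.0, 4.0, 5.0]])     # 2 x 3
    B = np.array([[2.0, 1.0],
                  [0.0, -1.0],
                  [3.0, 2.0]])          # 3 x 2

    lhs = (A @ B).T
    rhs = B.T @ A.T                     # note the reversed order of the factors
    print(np.array_equal(lhs, rhs))     # True
    print(lhs.shape)                    # (2, 2); A.T @ B.T would instead be 3 x 3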


Definition 3.28 An m × n matrix A = [a_{ij}] over R is in reduced row-echelon form
if and only if the following conditions are satisfied:

1. The first nonzero element in row i is a 1 in column k_i for i = 1, 2, ..., r. (This 1
is called a leading one.)

2. k_1 < k_2 < ··· < k_r ≤ n. (That is, for each change in rows from upper to lower,
the leading one appears farther to the right.)

3. For i = 1, 2, ..., r, the leading one in row i is the only nonzero element in column
k_i.

4. Each of the last m − r rows consists entirely of zeros.


Example 2 □ Consider the question as to which of the following matrices are in reduced
row-echelon form.

    A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}   B = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}   C = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}   D = \begin{bmatrix} 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

The matrix A is not in reduced row-echelon form since the row of zeros is placed above
a nonzero element, violating condition (4). The matrix B is not in reduced row-echelon
form because the leading 1 in row 3 is not the only nonzero element in column k_3 = 4.
The matrix C fails on condition (2) because the leading 1 in the second row does not
appear farther to the right than the leading 1 in the first row. That is, k_1 = 4 and
k_2 = 1 do not satisfy k_1 < k_2. The matrix D satisfies all four conditions, and D is in
reduced row-echelon form. ■

Conditions (1) and (3) in Definition 3.28 can be reworded in the following ways that
parallel the rewording we used with the definition of reduced column-echelon form:

1. a_{ij} = 0 for j < k_i, and a_{i k_i} = 1 for i = 1, 2, ..., r.

3. Row i is the only row with a nonzero element in column k_i for i = 1, 2, ..., r.

These alternative wordings are more useful in proving our next theorem.

Theorem 3.29 A matrix A is in reduced column-echelon form if and only if A^T is in
reduced row-echelon form.

Proof. Let A = [a_{ij}]_{m×n} and let B = [b_{ij}]_{n×m} = A^T. If we let j denote the row
numbers and i denote the column numbers of elements of A in Definition 3.22, then A
is in reduced column-echelon form if and only if these conditions hold:

1. a_{ji} = 0 for j < k_i, and a_{k_i i} = 1 for i = 1, 2, ..., r.

2. k_1 < k_2 < ··· < k_r ≤ m.

3. Column i is the only column of A with a nonzero element in row k_i.

4. Each of the last n − r columns of A consists entirely of zeros.

The elements in column i of A are the elements in row i of A^T, so the conditions
on A are satisfied if and only if:

1. b_{ij} = a_{ji} = 0 for j < k_i, and b_{i k_i} = 1 for i = 1, 2, ..., r.

2. k_1 < k_2 < ··· < k_r ≤ m.

3. Row i is the only row of A^T with a nonzero element in column k_i.

4. Each of the last n − r rows of A^T consists entirely of zeros.

Since A^T is an n × m matrix, these conditions are equivalent to the assertion that A^T
is in reduced row-echelon form. ■

From this theorem and the fact that (AT)T = A, it follows that AT is in reduced
column-echelon form if and only if A is in reduced row-echelon form.

Theorem 3.30 If A is invertible, then A^T is invertible and (A^T)^{-1} = (A^{-1})^T.



Proof. If A is invertible, then A^{-1} A = I_n and therefore (A^T)(A^{-1})^T = I_n^T = I_n by
Theorem 3.27. According to Theorem 3.14, A^T is invertible and (A^T)^{-1} = (A^{-1})^T. ■
Theorem 3.31 If A is an m × n matrix over R, there exists an invertible matrix P
over R such that PA is in reduced row-echelon form.

Proof. By Theorem 3.23, there is an invertible matrix Q such that A^T Q is in reduced
column-echelon form, and this means that (A^T Q)^T = Q^T (A^T)^T = Q^T A is in reduced
row-echelon form. But P = Q^T is an invertible matrix, so the theorem is proved. ■

In the proof of Theorem 3.31, the reduced column-echelon form for AT is uniquely
determined by AT, and therefore the reduced row-echelon form PA is uniquely deter­
mined by A. We make the following definition.
Definition 3.32 The unique matrix PA in the statement of Theorem 3.31 is called the
reduced row-echelon form for A.
Theorem 3.33 A square matrix A = [a_{ij}]_{n×n} is invertible if and only if the reduced
row-echelon form for A is I_n.

Proof. Assume that A is invertible, and let P be an invertible matrix such that PA
is in reduced row-echelon form. There must be no rows of zeros in PA, for otherwise
PA would be singular, contradicting the fact that PA is a product of invertible matrices.
This means that k_i = i for each i, and PA = I_n.

If the reduced row-echelon form for A is I_n, there is an invertible matrix P such that
PA = I_n. Then A is invertible by Theorem 3.15. ■

We have seen that a matrix is invertible if and only if it is a product of elementary
matrices. Thus the last theorem implies that there are elementary matrices P_1, P_2, ..., P_s
such that P_s P_{s-1} ··· P_2 P_1 A is in reduced row-echelon form. But multiplication of a given
matrix on the left by P_i performs an elementary row operation on the given matrix.
Thus a sequence of elementary row operations may be applied to A in order to obtain
a reduced row-echelon matrix. In much the same manner as with column operations,
this gives rise to a systematic method for finding the invertible matrix P.

With the notation of the preceding paragraph, consider the sequence of equations

    P_1 I_m A = P_1 A
    ...
    P_s ··· P_2 P_1 I_m A = P_s ··· P_2 P_1 A = PA.

These equations indicate that if the reduced row-echelon form is obtained by application
of a certain sequence of elementary row operations to A, the matrix P may be obtained
by application of the same sequence of operations, in the same order, to I_m. This can
be done efficiently by recording A and I_m in a single matrix as [A, I_m] and performing
the row operations simultaneously on A and I_m.

Example 3 □ To illustrate the procedure, let

    A = \begin{bmatrix} 2 & 0 & 2 \\ 0 & 1 & -3 \\ 2 & 1 & 1 \end{bmatrix}

and consider the problem of finding an invertible matrix P such that PA is in reduced
row-echelon form.

    [A, I_m] = \begin{bmatrix} 2 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & -3 & 0 & 1 & 0 \\ 2 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}
             → \begin{bmatrix} 1 & 0 & 1 & 1/2 & 0 & 0 \\ 0 & 1 & -3 & 0 & 1 & 0 \\ 0 & 1 & -1 & -1 & 0 & 1 \end{bmatrix}
             → \begin{bmatrix} 1 & 0 & 1 & 1/2 & 0 & 0 \\ 0 & 1 & -3 & 0 & 1 & 0 \\ 0 & 0 & 2 & -1 & -1 & 1 \end{bmatrix}
             → \begin{bmatrix} 1 & 0 & 0 & 1 & 1/2 & -1/2 \\ 0 & 1 & 0 & -3/2 & -1/2 & 3/2 \\ 0 & 0 & 1 & -1/2 & -1/2 & 1/2 \end{bmatrix}

Thus, I_3 is the reduced row-echelon form for A, and

    P = \begin{bmatrix} 1 & 1/2 & -1/2 \\ -3/2 & -1/2 & 3/2 \\ -1/2 & -1/2 & 1/2 \end{bmatrix}

is an invertible matrix such that PA = I_3. ■

In Example 3, we found P such that PA = I_3. According to Theorem 3.33, A
is invertible when its reduced row-echelon form is an identity matrix. However, the
equation PA = I_n implies more: It implies that P = A^{-1}, by Theorem 3.15. Thus
the procedure illustrated in Example 3 is an efficient and systematic way of finding
A^{-1} when it exists.

Suppose now that P is invertible and PA is in reduced row-echelon form but different
from I_n. Then PA must contain at least one row of zeros, and therefore PA is not
invertible. This implies that A is not invertible, by Theorem 3.16.

Summarizing our discussion of the procedure in Example 3, we can say that one of
the following possibilities must happen:

1. If the procedure leads to PA = I_n, then A is invertible and P = A^{-1}.

2. If the procedure leads to a row of zeros in PA, then A is not invertible.
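This procedure translates directly into a short program. The Python sketch below (our own illustration of the method, not taken from the text) row-reduces [A, I_n] and reports either the inverse or the fact that A is singular; it reproduces the matrix P found in Example 3.

    import numpy as np

    def inverse_by_row_reduction(A):
        """Row-reduce [A, I_n]; return A^{-1}, or None if A is not invertible."""
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])
        for i in range(n):
            # choose a row at or below row i with a nonzero entry in column i (type III)
            pivot = next((r for r in range(i, n) if abs(M[r, i]) > 1e-12), None)
            if pivot is None:
                return None                  # no leading 1 possible: A is singular
            M[[i, pivot]] = M[[pivot, i]]
            M[i] /= M[i, i]                  # type I: make the leading entry 1
            for r in range(n):               # type II: clear the rest of column i
                if r != i:
                    M[r] -= M[r, i] * M[i]
        return M[:, n:]

    A = np.array([[2.0, 0.0, 2.0],
                  [0.0, 1.0, -3.0],
                  [2.0, 1.0, 1.0]])
    P = inverse_by_row_reduction(A)
    print(P)                                 # [[1, 0.5, -0.5], [-1.5, -0.5, 1.5], [-0.5, -0.5, 0.5]]
    print(np.allclose(P @ A, np.eye(3)))     # True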

Exercises 3.6

1. Describe the elementary row operation that is associated with the given elementary
matrix M.

    (a) M = [ 4 0 0 ]      (b) M = [ 1 0 0 ]
            [ 0 1 0 ]              [ 0 1 5 ]
            [ 0 0 1 ]              [ 0 0 1 ]

    (c) M = [ 0 0 1 ]      (d) M = [ 1 3 0 ]
            [ 0 1 0 ]              [ 0 1 0 ]
            [ 1 0 0 ]              [ 0 0 1 ]

2. Which of the following are in reduced row-echelon form?

    (a) [ 1 0 3 0 ]    (b) [ 0 1 0 1 ]    (c) [ 0 1 1 1 ]
        [ 0 2 4 0 ]        [ 1 0 1 0 ]        [ 0 0 1 1 ]
        [ 0 0 1 0 ]        [ 0 0 0 0 ]        [ 0 0 0 1 ]

    (d) [ 0 0 0 0 ]    (e) [ 0 0 0 ]      (f) [ 1 0 0 0 ]
        [ 1 0 0 0 ]        [ 1 0 0 ]          [ 1 1 0 0 ]
        [ 0 1 0 0 ]        [ 1 1 0 ]          [ 0 0 1 0 ]

3. Find the reduced row-echelon form for each of the matrices given in Problem 6 of
Exercises 3.5.

4. In each part of Problem 6 in Exercises 3.5, let A be the given matrix and find an
invertible matrix P such that PA is in reduced row-echelon form.

5. Find the inverse of each of the following invertible matrices.

    (a) [ 1 2  1 ]    (b) [ -4 -5  3 ]    (c) [ 1 0  1 ]    (d) [  1  2 1 ]
        [ 1 0  1 ]        [  3  3 -2 ]        [ 1 1  2 ]        [ -1 -1 1 ]
        [ 0 1 -1 ]        [ -1 -1  1 ]        [ 3 4 -2 ]        [  0  1 3 ]

6. Given

        A = [ 1 2 ] [ 0 1 ] [ 1 0 ],
            [ 0 1 ] [ 1 0 ] [ 0 3 ]

    write A^T as a product of elementary matrices.

7. If

        A = [  1 0  2 ]
            [ -1 1  0 ]
            [  0 1 -1 ]

    is the matrix of transition from ε_3 to 𝒜, (a) find 𝒜 and (b) find the transition
    matrix from 𝒜 to ε_3.
8. Prove that if M is obtained from I_m by multiplying row s of I_m by a, then MA
is obtained from A by multiplying row s by a.
9. Prove that if M is obtained from I_m by interchanging rows s and t of I_m, then
MA is obtained from A by interchanging rows s and t.
10. Given that the matrix

        B = [ 4 2  4 ]
            [ 1 6 -5 ]
            [ 0 3  2 ]
is obtained from A by the following elementary operations:
i. First, the second and third rows of A are interchanged.
ii. Next, the first column is multiplied by 2 and added to the second column.
Find the matrix A.
11. Prove that (A^T)^T = A.
12. Which types of elementary matrices are symmetric?
13. Prove that any two diagonal elements of a skew-symmetric matrix over R are
equal.
14. Prove or disprove.

(a) The set of all symmetric nxn matrices over R is closed under multiplication.
(b) The set of all skew-symmetric nxn matrices over R is closed under multi­
plication.

15. Prove that, for any matrix A over R, AA^T is defined and is a symmetric matrix.
16. Extend Theorem 3.27 to any product with a finite number of factors:

    (A_1 A_2 ··· A_k)^T = A_k^T ··· A_2^T A_1^T.

17. Prove that if A is symmetric, then A^k is symmetric for any positive integer k.
18. Let A = [a_ij]_{n×n} with a_ik = a_ij + a_jk for all i, j, k.

(a) Prove that A is skew-symmetric.


(b) Show that A is determined by its first row.

3.7 Row and Column Equivalence


Sometimes elements of sets are associated with each other in ways that have some of the
basic properties of equality. These associations between elements are called relations,
and certain types of relations are called equivalence relations. A knowledge of the various
equivalence relations on a set adds depth to our understanding of the structure of the
set.
The formal definition of a relation can be made as follows: A relation (more pre­
cisely, a binary relation) on a set T is a nonempty set "~", of ordered pairs (a, b) of
elements a and b of T. If (a, b) is in "~", then we say that a has the relation ~ to b.
We write a ~ b to indicate that a has the relation ~ to b.
If the relation under consideration is ordinary equality, then the set ~ consists of all
ordered pairs (a, a), and we write a = b for a ~ b.

Definition 3.34 A relation ~ on a set T is an equivalence relation if and only if
the following conditions are satisfied for arbitrary a, b, c in T:

1. a ~ a is true for all a ∈ T.

2. If a ~ b is true, then b ~ a is true.

3. If a ~ b is true and b ~ c is true, then a ~ c is true.

The properties (1), (2), and (3) are known as the reflexive, symmetric, and
transitive properties, respectively.

We are interested here in equivalence relations on matrices. There are two such
relations that are intimately connected with the two preceding sections.

Definition 3.35 Let A and B be matrices over R. The matrix B is column-equivalent
to A if and only if there exists an invertible matrix Q such that B = AQ.

The term "column-equivalent" defines a relation on the set of all matrices over R.
The relation consists of the set of all ordered pairs (B, A) such that B = AQ for some
invertible matrix Q. It is clear that if A is m x n, then any matrix that is column-
equivalent to A is m x n.

Theorem 3.36 The relation defined as column-equivalence is an equivalence relation


on the set of all matrices over R. That is:

1. For any matrix A, A is column-equivalent to A.


2. If A is column-equivalent to B, then B is column-equivalent to A.
3. If A is column-equivalent to B and B is column-equivalent to C, then A is column-
equivalent to C.

Proof. The statement (1) follows from A = AI_n.
If A = BQ, where Q is invertible, then B = AQ^{-1} and Q^{-1} is invertible. Thus (2)
is valid.
Assume A = BQ and B = CP, where Q and P are invertible. Then A = (CP)Q =
C(PQ), and PQ is invertible. Thus (3) is valid. ■

The relation of column-equivalence can be described in several alternate ways. Four


of these are stated in the next theorem, and examples illustrating these alternate de­
scriptions follow Theorem 3.38.

Theorem 3.37 Let A and B be m × n matrices over R. For any given basis of R^m,
A will be the matrix of transition from the given basis to a set 𝒜 of n vectors, and B
will be the matrix of transition from the given basis to a set ℬ of n vectors. Each of the
following statements implies the other three:
1. B is column-equivalent to A.
2. B may be obtained from A by a sequence of elementary column operations.
3. (ℬ) = (𝒜).
4. ℬ may be obtained from 𝒜 by a sequence of elementary operations.

Proof. Statement (3) is true if and only if (4) is true, according to Theorem 2.13.
In view of Theorem 3.19, (1) means the same as the assertion that B = AQ_1 Q_2 ··· Q_t,
where each Q_i is elementary. It thus follows from the discussion following Definition
3.21 that (1) and (2) are equivalent. Thus the theorem will be proved if we show that
each of (1) and (4) implies the other.
Suppose first that B is column-equivalent to A. Then B = AQ_1 Q_2 ··· Q_t, where
each Q_i is elementary. This means that AQ_1 Q_2 ··· Q_t is the matrix of transition from
the given basis to ℬ. And since A is the matrix of transition from the given basis to 𝒜,
Q_1 Q_2 ··· Q_t is a matrix of transition from 𝒜 to ℬ. Each Q_i has an associated elementary
operation E_i, and we have E_t ··· E_2 E_1(𝒜) = ℬ. Thus (1) implies (4).
Assume now that ℬ may be obtained from 𝒜 by a sequence E_1, E_2, ..., E_t of elementary
operations E_i. If Q_i is the elementary matrix associated with E_i, then Q_1 Q_2 ··· Q_t
is the matrix of transition from 𝒜 to E_t ··· E_2 E_1(𝒜) = ℬ. Since A is the matrix of
transition from the given basis to 𝒜, AQ_1 Q_2 ··· Q_t is the matrix of transition from the
given basis to ℬ. But the matrix of transition from the given basis to ℬ is B, and this
matrix is unique. Hence B = AQ_1 Q_2 ··· Q_t, and B is column-equivalent to A. ■

Theorem 3.38 Any matrix over R is column-equivalent to a unique matrix in reduced


column-echelon form.

Proof. This is a restatement of Theorem 3.23. ■

We consider now some examples that illustrate the quantities involved in Theorem
3.37 and its proof.

Example 1 □ Let the matrices A and B be given by

    A = [ 2  0  1 ]        B = [ 1 -3  0 ]
        [ 1  1  0 ]            [ 0  1  2 ]
        [ 8  4  2 ]            [ 2 -2  8 ]
        [ 6  0  3 ]            [ 3 -9  0 ]

Then A is the transition matrix from ε_4 to

    𝒜 = {(2, 1, 8, 6), (0, 1, 4, 0), (1, 0, 2, 3)}

and B is the transition matrix from ε_4 to

    ℬ = {(1, 0, 2, 3), (-3, 1, -2, -9), (0, 2, 8, 0)}.

Theorem 2.13 assures us that ℬ and 𝒜 span the same subspace of R^4 if and only if ℬ
can be obtained from 𝒜 by a sequence of elementary operations. By the end of the next
example, we can see that (ℬ) = (𝒜). ■

Example 2 □ For the matrices A and B in Example 1, we shall show that B is column-
equivalent to A and find an invertible matrix Q such that B = AQ. To show that B is
column-equivalent to A we need only confirm that their reduced column-echelon forms
are equal. But in order to find an invertible Q such that B = AQ, we need first to find
invertible matrices M_1 and M_2 such that AM_1 = A' and BM_2 = B' are both in reduced
column-echelon form. Using the same procedure as in Example 2 of Section 3.5, we
obtain the following results.

    [ A ]     [ 2  0  1 ]     [ 1  0  2 ]     [ 1  0  0 ]     [ 1  0  0 ]
    [---]     [ 1  1  0 ]     [ 0  1  1 ]     [ 0  1  1 ]     [ 0  1  0 ]
    [I_3]  =  [ 8  4  2 ]     [ 2  4  8 ]     [ 2  4  4 ]     [ 2  4  0 ]
              [ 6  0  3 ]  →  [ 3  0  6 ]  →  [ 3  0  0 ]  →  [ 3  0  0 ]
              [---------]     [---------]     [---------]     [---------]
              [ 1  0  0 ]     [ 0  0  1 ]     [ 0  0  1 ]     [ 0  0  1 ]
              [ 0  1  0 ]     [ 0  1  0 ]     [ 0  1  0 ]     [ 0  1 -1 ]
              [ 0  0  1 ]     [ 1  0  0 ]     [ 1  0 -2 ]     [ 1  0 -2 ]

Thus

    M_1 = [ 0  0  1 ]
          [ 0  1 -1 ]
          [ 1  0 -2 ]

is an invertible matrix such that

    AM_1 = A' = [ 1  0  0 ]
                [ 0  1  0 ]
                [ 2  4  0 ]
                [ 3  0  0 ].

Working in similar fashion with B, we have

    [ B ]     [ 1 -3  0 ]     [ 1  0  0 ]     [ 1  0  0 ]
    [---]     [ 0  1  2 ]     [ 0  1  2 ]     [ 0  1  0 ]
    [I_3]  =  [ 2 -2  8 ]     [ 2  4  8 ]     [ 2  4  0 ]
              [ 3 -9  0 ]  →  [ 3  0  0 ]  →  [ 3  0  0 ]
              [---------]     [---------]     [---------]
              [ 1  0  0 ]     [ 1  3  0 ]     [ 1  3 -6 ]
              [ 0  1  0 ]     [ 0  1  0 ]     [ 0  1 -2 ]
              [ 0  0  1 ]     [ 0  0  1 ]     [ 0  0  1 ]

Thus

    M_2 = [ 1  3 -6 ]
          [ 0  1 -2 ]
          [ 0  0  1 ]

is an invertible matrix such that

    BM_2 = B' = [ 1  0  0 ]
                [ 0  1  0 ]
                [ 2  4  0 ]
                [ 3  0  0 ].

As mentioned earlier, the fact that A' = B' shows that B is column-equivalent to A.
The equation BM_2 = AM_1 implies that B = AM_1 M_2^{-1}, and thus Q = M_1 M_2^{-1} is an
invertible matrix such that B = AQ. When these computations are performed, we find
that

    M_2^{-1} = [ 1 -3  0 ]
               [ 0  1  2 ]
               [ 0  0  1 ]

and

    Q = M_1 M_2^{-1} = [ 0  0  1 ]
                       [ 0  1  1 ]
                       [ 1 -3 -2 ]

is an invertible matrix such that B = AQ. According to Theorem 3.37, the sets ℬ and
𝒜 in Example 1 do indeed span the same subspace. ■
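The computations of Example 2 are easy to reproduce by machine. The sketch below is our own illustration (assuming the SymPy library is available; the helper name reduced_column_echelon is ours): it finds matrices M_1 and M_2 by row-reducing transposes, confirms that the reduced column-echelon forms agree, and checks that Q = M_1 M_2^{-1} satisfies B = AQ. The particular M_1, M_2, and Q produced need not be the same matrices displayed above, since such matrices are not unique.

    from sympy import Matrix, eye

    # A and B from Example 1; their columns are the vectors of the sets above.
    A = Matrix([[2, 0, 1], [1, 1, 0], [8, 4, 2], [6, 0, 3]])
    B = Matrix([[1, -3, 0], [0, 1, 2], [2, -2, 8], [3, -9, 0]])

    def reduced_column_echelon(M):
        """Return (M', N) with MN = M' in reduced column-echelon form.
        Column operations on M are row operations on M^T, so we row-reduce
        [M^T, I] and transpose the two blocks back."""
        stacked = M.T.row_join(eye(M.cols))
        R, _ = stacked.rref()
        return R[:, :M.rows].T, R[:, M.rows:].T

    A1, M1 = reduced_column_echelon(A)     # A*M1 = A', reduced column-echelon form
    B1, M2 = reduced_column_echelon(B)     # B*M2 = B'
    print(A1 == B1)                        # True, so B is column-equivalent to A
    Q = M1 * M2.inv()
    print(A * Q == B)                      # True; Q is one invertible matrix with B = AQ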
Example 3 □ Suppose now that we wish to find elementary matrices Q_1, Q_2, ..., Q_t
such that B = AQ_1 Q_2 ··· Q_t as described in the proof of Theorem 3.37. These elementary
matrices can be obtained from the work in Example 2 because both A' and B'
were found by performing a single elementary column operation in each step. It is easy
to see that the work done in Example 2 with A can be represented by

    [ A  ]      [ AE_1   ]      [ AE_1E_2   ]      [ AE_1E_2E_3   ]
    [----]  →   [--------]  →   [-----------]  →   [--------------]
    [I_3 ]      [I_3 E_1 ]      [I_3 E_1E_2 ]      [I_3 E_1E_2E_3 ]

where E_1, E_2, and E_3 are elementary matrices given by

    E_1 = [ 0 0 1 ]    E_2 = [ 1 0 -2 ]    E_3 = [ 1 0  0 ]
          [ 0 1 0 ]          [ 0 1  0 ]          [ 0 1 -1 ]
          [ 1 0 0 ]          [ 0 0  1 ]          [ 0 0  1 ]

and M_1 = E_1 E_2 E_3. Similarly, the work with B can be represented by

    [ B  ]      [ BF_1   ]      [ BF_1F_2   ]
    [----]  →   [--------]  →   [-----------]
    [I_3 ]      [I_3 F_1 ]      [I_3 F_1F_2 ]

where F_1 and F_2 are elementary matrices given by

    F_1 = [ 1 3 0 ]    F_2 = [ 1 0  0 ]
          [ 0 1 0 ]          [ 0 1 -2 ]
          [ 0 0 1 ]          [ 0 0  1 ]

and M_2 = F_1 F_2. To find the desired Q_1, Q_2, ..., Q_t, we write

    B = AM_1 M_2^{-1}
      = AE_1 E_2 E_3 (F_1 F_2)^{-1}
      = AE_1 E_2 E_3 F_2^{-1} F_1^{-1}.

Thus Q_1 = E_1, Q_2 = E_2, Q_3 = E_3, Q_4 = F_2^{-1}, Q_5 = F_1^{-1} are elementary matrices such
that B = AQ_1 Q_2 Q_3 Q_4 Q_5. Writing out all the elementary matrices involved, we have

    B = A [ 0 0 1 ] [ 1 0 -2 ] [ 1 0  0 ] [ 1 0 0 ] [ 1 -3 0 ]
          [ 0 1 0 ] [ 0 1  0 ] [ 0 1 -1 ] [ 0 1 2 ] [ 0  1 0 ]
          [ 1 0 0 ] [ 0 0  1 ] [ 0 0  1 ] [ 0 0 1 ] [ 0  0 1 ].   ■
As might be expected, there is an equivalence relation connected with row operations


that parallels the one connected with column operations.

Definition 3.39 Let A and B be matrices over R. The matrix B is row-equivalent
to A if and only if there exists an invertible matrix P such that B = PA.

Similar to the situation with Definition 3.35 the term "row-equivalent" defines a
relation on the set of all matrices over R. It is left as an exercise to prove that this
relation is an equivalence relation.
We have seen in Section 3.6 that multiplication of a given matrix on the left by a
product of elementary matrices yields the same result as the application of a sequence
of elementary row operations to the matrix. In combination with Theorem 3.19 this
shows that the following conditions are equivalent:

1. B is row-equivalent to A;

2. B may be obtained from A by applying a sequence of elementary row operations.

Theorem 3.40 Any matrix over R is row-equivalent to a unique matrix in reduced


row-echelon form.

Proof. This theorem follows from Theorem 3.31 and the discussion of uniqueness
just before Definition 3.32. ■

Exercises 3.7

1. Determine whether or not the given matrices are column-equivalent.

    (a) A = [  1  3 ]      B = [ 3 -6 ]
            [ -2 -6 ]          [ 2 -4 ]

    (b) A = [  3  2 ]      B = [  5  1 ]
            [  6  4 ]          [ 10  2 ]
            [ 13  9 ]          [  9  2 ]

1 2 1 0 -1
-2 -4 4 1 -2
(c) A B
1 0 3 1 -1
-1 0 2 1 0

    (d) A = [ 1 0 0 1 ]      B = [ 1 0 0 1 ]
            [ 1 1 0 2 ]          [ 2 1 0 2 ]
            [ 0 1 1 2 ]          [ 2 2 3 1 ]
            [ 0 0 1 1 ]          [ 1 1 3 1 ]

2. Which of the pairs of matrices in Problem 1 are row-equivalent?



3. In the following list of matrices, A and B are row-equivalent. Also, P\ and P2


are invertible and such that P\A and P2B are both in reduced row-echelon form.
Find an invertible P such that B = PA.

3 2 5 1
A = 6 4 , B = 10 2

1 13 9 9 2

0 1 -1
Pl = P<2 = 1 -5 5
-2 1 0

4. In the following list of matrices, A and B are row-equivalent and P_1 is an invertible
matrix such that P_1 A is in reduced row-echelon form. Find an invertible matrix
P such that PA = B.

    A = [ 1 3  5 ]      B = [ 1 0 -1 ]      P_1 = [   7 -3  0 ]
        [ 2 7 12 ]          [ 2 1  0 ]            [  -2  1  0 ]
        [ 3 4  5 ]          [ 1 2  3 ]            [ -13  5  1 ]

5. For each pair of matrices in Problem 1 that are column-equivalent, find an in­
vertible matrix Q such that B = AQ.
6. For each pair of matrices in Problem 1 that are row-equivalent, find an invertible
matrix P such that B = PA.
7. For each pair of matrices in Problem 1 that are column-equivalent, find elementary
matrices Q_1, Q_2, ..., Q_t such that B = AQ_1 Q_2 ··· Q_t.
8. Prove that the relation defined by "row-equivalent" is an equivalence relation on
the set of all matrices over R.
9. Justify your answer for each of the following questions.

(a) Which square matrices over R are column-equivalent to In?


(b) Which square matrices over R are row-equivalent to In?

10. Prove that B is column-equivalent to A if and only if BT is row-equivalent to AT.


11. Prove that A and B are column-equivalent if and only if they have the same
reduced column-echelon form.
12. Assuming that A and B are row-equivalent, describe a procedure for finding elementary
matrices P_1, ..., P_t such that B = P_1 ··· P_t A.

3.8 Rank and Equivalence


There is one more equivalence relation on matrices that we wish to consider at this
time. This relation is of fundamental importance in the study of linear transformations,
and it has traditionally been referred to by the term "equivalent," although other terms
have been used.

Definition 3.41 Let A and B be matrices over R. The matrix B is equivalent to A


if and only if there are invertible matrices P and Q such that B = PAQ.

Thus, B is equivalent to A if B is row-equivalent to A or if B is column-equivalent to


A. We shall see in Problem 3 of the exercises for this section that B may be equivalent
to A, and yet be neither row-equivalent nor column-equivalent to A.
It is readily verified that the relation in Definition 3.41 is a true equivalence relation.
From this point on, "equivalence of matrices" will refer to this relation. Equivalence of
matrices is intimately connected with the rank of a matrix.

Definition 3.42 If A is any m x n matrix over R, the rank of A is the number r of


nonzero columns in the reduced column-echelon form of A. The rank of A will be denoted
by rank(A).

Definition 3.43 Let A = [a_ij]_{m×n} over R, and let v_j = (a_1j, a_2j, ..., a_mj) for j =
1, 2, ..., n. The subspace of R^m spanned by {v_1, v_2, ..., v_n} is called the column space
of A, and the vectors v_j are called the column vectors of A.

It is easy to see that A is the matrix of transition from ε_m to a spanning set of the
column space of A.

Theorem 3.44 The rank of A is the dimension of the column space of A.

Proof. Let A = [a_ij]_{m×n} have rank r, and let A' = [a'_ij]_{m×n} be the reduced column-
echelon form of A. Let 𝒜 = {v_1, v_2, ..., v_n}, where v_j = (a_1j, a_2j, ..., a_mj), and let
𝒜' = {v'_1, v'_2, ..., v'_n}, where v'_j = (a'_1j, a'_2j, ..., a'_mj). Then (𝒜) and (𝒜') are the column
spaces of A and A', respectively. It is clear from conditions (1) and (2) of Definition
3.22 that {v'_1, v'_2, ..., v'_r} is linearly independent and therefore is a basis of (𝒜'). But
(𝒜) = (𝒜') by Theorem 3.37 since A and A' are column-equivalent. Thus (𝒜) has
dimension r. ■

Corollary 3.45 If A is m x n over R and Q is any invertible n x n matrix, then


rank (A) = rank(AQ).

Proof. Since A and AQ are column-equivalent, their column spaces are equal. Hence
their ranks are equal, by the theorem. ■

Theorem 3.46 Let A be an m x n matrix over R, and let r be the rank of A. There
exist invertible matrices P and Q such that PAQ has the first r diagonal elements equal
to 1, and all other elements zero.

Proof. Let Q be an invertible matrix such that AQ = A' = [a'_ij]_{m×n} is in reduced
column-echelon form. As in the proof of the preceding theorem, the set {v'_1, v'_2, ..., v'_r}
is a basis of the column space (𝒜) = (𝒜') of A.

    [Figure 3.6: transition diagram relating A, Q, P, and PAQ]

According to Theorem 1.32, the linearly independent set {v'_1, v'_2, ..., v'_r} can be extended
to a basis ℬ = {w_1, w_2, ..., w_m} of R^m, where w_i = v'_i for i = 1, 2, ..., r. Let P
be the invertible m × m matrix of transition from ℬ to ε_m (see Figure 3.6). Then PAQ
is the matrix of transition from ℬ to 𝒜'. Since the first r vectors of 𝒜' are the same as
the first r vectors of ℬ, the first r columns of PAQ are the same as the first r columns
of I_m. And since the last n - r vectors of 𝒜' are zero, the last n - r columns of PAQ
are zero. Thus

    PAQ = [ I_r | 0 ]
          [  0  | 0 ]

and the theorem is proved. ■


Let
Ir | 0
Or —+—
0 I 0
denote the matrix PAQ in the proof of Theorem 3.46 above. The proof of the theorem
can be followed literally step by step so as to obtain invertible matrices P and Q such
that PAQ = Dr. The only difficulty is in finding the matrix of transition P from B
to 6m. This is probably most easily done by writing out the matrix P~l (which is the
matrix of transition from <?m to #), and then finding the inverse of P~l. However,
we have seen earlier that multiplication on the right by Q is equivalent to applying
a sequence of column operations, and multiplication on the left by P is equivalent to
108 Chapter 3 Matrix Multiplication

applying a sequence of row operations. Thus one may proceed to use column operations
A'
on to obtain as we did in Section 3.5, and then use row operations on
In Q
[Α', Im] to obtain [Dr, P], where Dr = PA' = PAQ. This is illustrated in the following
example.

Example 1 □ For the matrix

    A = [  1  2 -1 ]
        [  3  6 -3 ]
        [ -1  1  0 ]
        [  2  4 -2 ]

we shall find invertible matrices P and Q such that PAQ = D_r.
We first use column operations to transform

    [ A  ]             [ A' ]
    [----]    into     [----]
    [I_3 ]             [ Q  ]

where A' = AQ is in reduced column-echelon form.

    [  1  2 -1 ]     [  1  0  0 ]     [  1  0  0 ]     [  1  0  0 ]
    [  3  6 -3 ]     [  3  0  0 ]     [  3  0  0 ]     [  3  0  0 ]
    [ -1  1  0 ]     [ -1  3 -1 ]     [ -1 -1  3 ]     [  0  1  0 ]   [ A' ]
    [  2  4 -2 ]  →  [  2  0  0 ]  →  [  2  0  0 ]  →  [  2  0  0 ] = [----]
    [----------]     [----------]     [----------]     [----------]   [ Q  ]
    [  1  0  0 ]     [  1 -2  1 ]     [  1  1 -2 ]     [  0 -1  1 ]
    [  0  1  0 ]     [  0  1  0 ]     [  0  0  1 ]     [  0  0  1 ]
    [  0  0  1 ]     [  0  0  1 ]     [  0  1  0 ]     [ -1 -1  3 ]

Next we use row operations to transform [A', I_4] into [D_r, P], where D_r = PA' =
PAQ.

    [A', I_4] = [ 1  0  0   1  0  0  0 ]     [ 1  0  0    1  0  0  0 ]
                [ 3  0  0   0  1  0  0 ]     [ 0  0  0   -3  1  0  0 ]
                [ 0  1  0   0  0  1  0 ]  →  [ 0  1  0    0  0  1  0 ]
                [ 2  0  0   0  0  0  1 ]     [ 0  0  0   -2  0  0  1 ]

                                             [ 1  0  0    1  0  0  0 ]
                                             [ 0  1  0    0  0  1  0 ]
                                          →  [ 0  0  0   -3  1  0  0 ]  =  [D_r, P]
                                             [ 0  0  0   -2  0  0  1 ]

Thus

    P = [  1  0  0  0 ]              [  0 -1  1 ]
        [  0  0  1  0 ]    and   Q = [  0  0  1 ]
        [ -3  1  0  0 ]              [ -1 -1  3 ]
        [ -2  0  0  1 ]

are invertible matrices such that

    PAQ = [ 1  0  0 ]     [ I_2 | 0 ]
          [ 0  1  0 ]  =  [-----+---]  =  D_2.   ■
          [ 0  0  0 ]     [  0  | 0 ]
          [ 0  0  0 ]
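The two-stage procedure of Example 1 (column operations first, then row operations) can be sketched in a few lines of Python with SymPy; this is our own illustration, and the P and Q it produces may differ from those displayed above, since such matrices are not unique.

    from sympy import Matrix, eye

    A = Matrix([[1, 2, -1], [3, 6, -3], [-1, 1, 0], [2, 4, -2]])
    m, n = A.shape

    # Column operations on A are row operations on A^T: row-reduce [A^T, I_n]
    # to get A' (the reduced column-echelon form of A) and Q with AQ = A'.
    R, _ = A.T.row_join(eye(n)).rref()
    A1, Q = R[:, :m].T, R[:, m:].T

    # Row operations on [A', I_m] give D_r and P with PA' = D_r, so PAQ = D_r.
    S, _ = A1.row_join(eye(m)).rref()
    D, P = S[:, :n], S[:, n:]

    r = A.rank()
    print(D == Matrix(m, n, lambda i, j: 1 if i == j and i < r else 0))   # True
    print(P * A * Q == D)                                                 # True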
Theorem 3.47 Let A and B be m x n matrices over R. Then B is equivalent to A if
and only if B and A have the same rank.

Proof. Let r denote the rank of A, and let r' denote the rank of B.
Suppose that r = r'. Then there are invertible matrices P, Q, P', and Q' such that
PAQ = D_r = D_{r'} = P'BQ'. The matrices P and P' are m × m, and the matrices Q
and Q' are n × n. Hence the equation PAQ = P'BQ' implies that

    (P')^{-1} PAQ (Q')^{-1} = B,

where (P')^{-1} P and Q(Q')^{-1} are invertible matrices. Therefore B is equivalent to A.

Assume now that B is equivalent to A. Let D_r be the diagonal matrix equivalent to
A that has the form described in Theorem 3.46 and let D_{r'} be the corresponding matrix
that is equivalent to B. Since A and B are equivalent, D_r and D_{r'} are equivalent.
Therefore there are invertible matrices P and Q such that PD_rQ = D_{r'} and

    PD_r = D_{r'} Q^{-1}.

Now PD_r has n - r columns of zeros so that rank(PD_r) ≤ r. Thus

    r' = rank(D_{r'}) = rank(D_{r'} Q^{-1}) = rank(PD_r) ≤ r.

Similarly,

    r = rank(D_r) = rank(D_r Q) = rank(P^{-1} D_{r'}) ≤ r'

so that r = r'. ■

The following theorem and corollary are extremely useful in connection with the
solution of systems of linear equations in Chapter 4.

Theorem 3.48 Let A = [a_ij]_{m×n} over R. Then A and A^T have the same rank.

Proof. Let r denote the rank of A, and let P and Q be invertible matrices such that

    PAQ = [ I_r | 0 ]  =  D_r.
          [  0  | 0 ]

Now Q^T and P^T are invertible (Theorem 3.30), and Q^T A^T P^T = D_r^T. The dimension
of the column space of D_r^T is clearly r, so D_r^T has rank r. But A^T and D_r^T are
equivalent, so A^T must have rank r also. ■

Corollary 3.49 If A has rank r, then r is the number of nonzero rows in the reduced
row-echelon form of A.

Proof. Suppose that A has rank r, and let P be an invertible matrix such that
PA is in reduced row-echelon form. By Theorem 3.29, (PA)^T = A^T P^T is in reduced
column-echelon form. Since A, A^T, and A^T P^T all have the same rank r, the number of
nonzero columns in (PA)^T = A^T P^T is r. But the number of nonzero columns in (PA)^T
is the same as the number of nonzero rows in PA. Hence the corollary is proved. ■

This result leads to the next corollary. The proof is requested in Problem 14 of the
exercises.

Corollary 3.50 An n x n matrix A is invertible if and only if A has rank n.
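As a quick numerical illustration of Theorem 3.48 and Corollary 3.49 (our own sketch, assuming the SymPy library), the rank of a matrix, the rank of its transpose, the number of pivot columns, and the number of nonzero rows in the reduced row-echelon form all agree:

    from sympy import Matrix

    A = Matrix([[1, 2, -1], [3, 6, -3], [-1, 1, 0], [2, 4, -2]])   # matrix of Example 1

    rref_A, pivots = A.rref()
    nonzero_rows = sum(1 for i in range(rref_A.rows) if any(rref_A.row(i)))
    print(A.rank(), A.T.rank(), len(pivots), nonzero_rows)          # 2 2 2 2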



Exercises 3.8

1. Find the rank of each of the following matrices.

2 4 1 3 2 3
1 2 3 -2 1 4
(a) (b)
0 0 5 -7 0 5
3 6 0 2 1-2 5 1 -1

1 0 2 1 2 3
4 1 3 2 -4 -6
(c) (d)
3 1 1 0 2 4
2 1 -1 0 0 0

2. Find the standard basis of the column space of each matrix A in Problem 1.

3. Which of the pairs A, B of matrices in Problem 1 of Exercises 3.7 are equivalent?


(Compare this with the results of Problems 1 and 2 of Exercises 3.7.)

4. Which of the matrices in Problem 1 above are equivalent?

5. Which of the following matrices are equivalent?

r
i o-i
η 3 2
. 1- 3- 2- . , 3_ - 6_ 7. , , , , 4 1 - 2
6 4 , D
~ =
A=
2 -6 4
, B
* =
2 -4 5 , c~ =
13 9 ' 3 1 - 1
2 1 0
6. Answer the following questions for the matrices

    A = [ 2 1 -3 ]      B = [ 1 1 -1 ]
        [ 1 2  0 ]          [ 2 3 -1 ]
        [ 1 1 -1 ]          [ 3 3 -3 ]

and give a reason for each answer.

(a) Are A and B column-equivalent?


(b) Are A and B row-equivalent?
(c) Are A and B equivalent?

7. For

        A = [ 1 2 1 ]
            [ 2 4 2 ]
            [ 0 1 2 ]

    it is given that

        Q = [ 1 -2  3 ]
            [ 0  1 -2 ]
            [ 0  0  1 ]

    is an invertible matrix such that

        AQ = A' = [ 1 0 0 ]
                  [ 2 0 0 ]
                  [ 0 1 0 ].

    Let 𝒜' = {(1, 2, 0), (0, 0, 1), (0, 0, 0)}. Find a basis ℬ of R^3 such that the matrix
    of transition from ℬ to 𝒜' is

        D_2 = [ 1 0 0 ]
              [ 0 1 0 ]
              [ 0 0 0 ]

    and an invertible P such that PAQ = D_2. (Hint: See the proof of Theorem 3.46.)
8. For each matrix A below, follow the proof of Theorem 3.46 step by step to find
invertible matrices P and Q such that PAQ = D_r. In your development, write
out the sets 𝒜, 𝒜', and ℬ.

    (a) A = [ 1 0 0 ]      (b) A = [ 1 3  2  1 0 ]
            [ 1 1 0 ]              [ 2 4 -3 -2 0 ]
            [ 0 1 1 ]              [ 2 1  0  3 1 ]
            [ 0 0 1 ]              [ 1 3 -3  1 1 ]

9. For each matrix A, find invertible matrices P and Q such that PAQ has the first
r elements of the main diagonal equal to 1, and all other elements 0.
0 2 4 1 -1
(aM = 0 1 2 0 (b)A = 2
0 3 6 1 -3
10. In Problem 1 above, let A be the matrix in part (c), and let B be the matrix in
part (d). Given that A and B are equivalent, find invertible matrices P and Q
such that B = PAQ. (Hint: See the proof of Theorem 3.47.)
11. In Problem 1 above, let A be the matrix in part (a), and let B be the matrix in
part (b). Given that A and B are equivalent, find invertible matrices P and Q
such that B = PAQ.
12. Prove that the relation "equivalence of matrices" is an equivalence relation on the
set of all matrices over R.
13. Prove that if B is conformable to A, then rank(AB) ≤ min{rank(A), rank(B)}.
14. Prove Corollary 3.50.
Chapter 4

Vector Spaces, Matrices, and


Linear Equations

4.1 Introduction
As promised earlier, the preceding results will now be extended to more general situ­
ations. This extension is followed by an application of these results to the solution of
systems of linear equations.

4.2 Vector Spaces


The theory developed thus far depended basically on the fact that the set of all real
numbers forms a field. Our first objective is to extend this theory to other fields and
other vector spaces.

Definition 4.1 Suppose that F is a set of elements in which a relation of equality and
operations of addition and multiplication, denoted by + and ·, respectively, are defined.
Then F is a field with respect to these operations if the conditions below are satisfied
for all a, b, c in F:
1. a + b is in F. (Closure property for addition)
2. (a + b) + c = a + (b + c). (Associative property of addition)
3. There is an element 0 in F such that a + 0 = a for every a in F. (Additive
identity)
4. For each a in F, there is an element -a in F such that a + (-a) = 0. (Additive
inverses)
5. a + b = b + a. (Commutative property of addition)
6. a · b is in F. (Closure property for multiplication)
7. (a · b) · c = a · (b · c). (Associative property of multiplication)


8. There is an element 1 in F such that a · 1 = a for every a in F. (Multiplicative
identity)
9. For each a ≠ 0 in F, there is an element a^{-1} in F such that a · a^{-1} = 1.
(Multiplicative inverses)
10. a · b = b · a. (Commutative property of multiplication)
11. (a + b) · c = a · c + b · c. (Distributive property)
The notation ab will be used interchangeably with a · b to indicate multiplication.
Elements of a field will be referred to as scalars.
There are many fields other than the real numbers. Two familiar examples are
provided by the set of all rational numbers or the set of all complex numbers. Fields
that are subsets of the complex numbers are called number fields. There are many
fields other than the number fields, and some of them may be known to the student. In
the material from here on, the results obtained may be interpreted in as much generality
as the student's background permits. If that background included no knowledge of fields
other than number fields, T may be regarded as being a number field throughout the
development.
The following definition should be compared with Definition 1.2 and Theorem 1.3
of Chapter 1.
Definition 4.2 Let F be a field, and suppose that V is a set of elements such that for
each a in F and each v in V, there is a product av defined. The operation that yields
this product is called scalar multiplication. Moreover, suppose that an operation of
addition is defined in V. Then V is a vector space over F with respect to these
operations if the conditions below are satisfied for any a, b in F and any u, v, w in V:
1. u + v is in V. (Closure of V under addition)
2. (u + v) + w = u + (v + w). (Associative property of addition in V)
3. There is an element 0 in V such that v + 0 = v for all v in V. (Additive identity)
4. For each v in V, there is an element -v in V such that v + (-v) = 0. (Additive
inverses in V)
5. u + v = v + u. (Commutative property of addition in V)
6. av is in V. (Closure of V under scalar multiplication)
7. a(bv) = (ab)v. (Associative property of scalar multiplication)
8. a(u + v) = au + av. (Distributive property, addition in V)
9. (a + b)v = av + bv. (Distributive property, addition in F)
10. 1 · v = v.
The elements of V are called vectors if V is a vector space over F.
With the exception of Section 1.4, all of the definitions, theorems, and proofs in
Chapter 1 through Definition 1.23 apply to arbitrary vector spaces V. Each corre­
sponding statement may be obtained by replacing R by an arbitrary field T and R n by
an arbitrary vector space V over T. Some restrictions are necessary in the remainder
of Chapter 1.

Definition 4.3 Let V be a vector space over T. Then V is a finite-dimensional


vector space (or a vector space of finite dimension over T) if there exists a
basis of V with a finite number of elements. If no such basis exists, V is infinite-
dimensional, or of infinite dimension over T.

The results in Chapter 1 after Definition 1.23 apply only to finite-dimensional vector
spaces. If R is replaced by T and R n is replaced by a finite-dimensional vector space V
over T, Theorems 1.24, 1.26, and 1.27 and Definition 1.28 remain valid with the proofs
unchanged except for notation. In particular, any two bases of a finite-dimensional
vector space have the same number of elements, and the number of elements in a basis
is the dimension of the vector space. In Theorems 1.31 through 1.34, R n is replaced
by an n-dimensional vector space V and W is a subspace of V.
Let us consider now some examples of vector spaces. In each case, T denotes a field.

Example 1 □ For a fixed positive integer n, let F^n denote the set of all n-tuples
(u_1, u_2, ..., u_n) with u_i in F. Two elements u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n)
are equal if and only if u_i = v_i for i = 1, 2, ..., n. With addition defined in F^n by

    u + v = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n)

and scalar multiplication by

    a(u_1, u_2, ..., u_n) = (au_1, au_2, ..., au_n),

the same techniques used to prove Theorem 1.3 in Section 1.2 can be used to prove that
F^n is a vector space over F. Denoting the multiplicative identity in F by 1, the vectors

    (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, 0, ..., 1)

can be shown to form a basis of F^n, and therefore F^n has dimension n over F. For
n = 1, we can identify (a_1) with a_1. That is, F is a vector space of dimension one over
F. ■

Example 2 □ For a fixed nonnegative integer n, let P_n denote the set of all polynomials
of the form

    a_0 + a_1 x + a_2 x^2 + ··· + a_n x^n

with each a_i in F. We shall refer to this set as "P_n over F." That is, P_n over F is the
set of all polynomials Σ_{i=0}^{n} a_i x^i in the variable x with coefficients in F and degree less
than or equal to n.¹ Let

    p(x) = a_0 + a_1 x + ··· + a_n x^n

and

    q(x) = b_0 + b_1 x + ··· + b_n x^n

denote two elements of P_n. Then p(x) and q(x) are equal if and only if a_i = b_i for
i = 0, 1, 2, ..., n. With addition and scalar multiplication defined in the usual ways by

    p(x) + q(x) = (a_0 + b_0) + (a_1 + b_1)x + ··· + (a_n + b_n)x^n

and

    cp(x) = ca_0 + ca_1 x + ··· + ca_n x^n,

it is easy to verify that P_n is a vector space over F with the zero polynomial

    0 = 0 + 0x + 0x^2 + ··· + 0x^n

as its additive identity. With p(x) as given above, its additive inverse is the polynomial

    -p(x) = (-a_0) + (-a_1)x + ··· + (-a_n)x^n
          = (-1)p(x).

The set B of n + 1 polynomials given by

    B = {1, x, x^2, ..., x^n}

spans P_n since an arbitrary p(x) = Σ_{i=0}^{n} a_i x^i is automatically a linear combination of
vectors in B:

    p(x) = a_0(1) + a_1(x) + a_2(x^2) + ··· + a_n(x^n).

Also, a linear combination of vectors in B yields the zero polynomial 0 if and only if all
coefficients are zero. Thus B is linearly independent and forms a basis of P_n. It follows
from this fact that all bases of P_n have n + 1 elements, and P_n is of dimension n + 1. ■

¹Note that we are using a_0 interchangeably with a_0 x^0 in the sigma notation.
For our next two examples, we draw on topics from the calculus.
Example 3 □ Let V denote the set of all infinite sequences

    {a_n} = a_1, a_2, ..., a_n, ...

of real numbers a_n. Two sequences {a_n} and {b_n} in V are equal if and only if a_n = b_n
for all positive integers n. This set V is a vector space over R with respect to the
operation of addition

    {a_n} + {b_n} = {a_n + b_n}

and scalar multiplication

    c{a_n} = {ca_n}

for c in R. The zero vector in V is the sequence {c_n} with c_n = 0 for all n, and the
additive inverse of {a_n} in V is

    -{a_n} = {-a_n}
           = (-1){a_n}.

We are not equipped in this text to prove it, but this vector space V is of infinite
dimension over R. ■

Example 4 □ Let V be the set of all real-valued functions of the real variable t with
domain the set R of all real numbers. Two functions f and g in V are equal if and
only if f(t) = g(t) for all real numbers t. With respect to the ordinary operations of
addition

    (f + g)(t) = f(t) + g(t)

and scalar multiplication

    (cf)(t) = c(f(t))

used in the calculus, the set V is a vector space over R. The zero vector in V is the
constant function that is identically zero for all values of the variable t. The additive
inverse of f is the function -f given by

    (-f)(t) = -(f(t)).

We are not equipped to prove it here, but this V is an infinite-dimensional vector space
over R. ■
Example 5 □ Let F_{m×n} denote the set of all m by n matrices A = [a_ij]_{m×n} with
elements a_ij in F. With addition defined by

    A + B = [a_ij]_{m×n} + [b_ij]_{m×n} = [a_ij + b_ij]_{m×n}

and scalar multiplication defined by

    cA = c[a_ij]_{m×n} = [ca_ij]_{m×n},

F_{m×n} is a finite-dimensional vector space over F. The additive identity in F_{m×n} is the
zero matrix

    0_{m×n} = [ 0  0  ···  0 ]
              [ 0  0  ···  0 ]
              [ ⋮  ⋮       ⋮ ]  =  [0]_{m×n},
              [ 0  0  ···  0 ]

and the additive inverse of A = [a_ij]_{m×n} is the matrix

    -A = [-a_ij]_{m×n}
       = (-1)A.

In many instances, we shall also use the zero vector symbol 0 to indicate a zero matrix.
As illustrations of the operations defined in Example 5, we have

    [ 1 -2  0 ]   [  6  5 -9 ]   [ 7  3 -9 ]
    [ 7  4 -3 ] + [ -5 -1  8 ] = [ 2  3  5 ]

and

    3 [ 5 -4  7 ]   [ 15 -12  21 ]
      [ 1  0  2 ] = [  3   0   6 ]

in the vector space R_{2×3} of all 2 × 3 matrices over R.
Our last example in this section demonstrates that the operations used in a set to
form a vector space are not unique and do not have to be defined in a certain way. Some
operations that can be used as addition and scalar multiplication may look somewhat
strange when first encountered.

Example 6 □ Let V be the set of all ordered pairs of real numbers with the usual
equality:

    (x_1, x_2) = (y_1, y_2) if and only if x_1 = y_1 and x_2 = y_2.

Addition and scalar multiplication are defined in V as follows:

    (x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1, x_2 + y_2 + 1),

    c(x_1, x_2) = (c + cx_1 - 1, c + cx_2 - 1).

We shall systematically check the ten conditions required by Definition 4.2 in order that
V be a vector space over R. To this end, let u = (u_1, u_2), v = (v_1, v_2), and w = (w_1, w_2)
be arbitrary elements in V, and let a and b represent arbitrary real numbers.
(1) The sum

    u + v = (u_1 + v_1 + 1, u_2 + v_2 + 1)

is in V since both u_1 + v_1 + 1 and u_2 + v_2 + 1 are real numbers.
(2) We have

    (u + v) + w = (u_1 + v_1 + 1, u_2 + v_2 + 1) + (w_1, w_2)
                = ((u_1 + v_1 + 1) + w_1 + 1, (u_2 + v_2 + 1) + w_2 + 1)
                = (u_1 + v_1 + w_1 + 2, u_2 + v_2 + w_2 + 2)

and

    u + (v + w) = (u_1, u_2) + (v_1 + w_1 + 1, v_2 + w_2 + 1)
                = (u_1 + (v_1 + w_1 + 1) + 1, u_2 + (v_2 + w_2 + 1) + 1)
                = (u_1 + v_1 + w_1 + 2, u_2 + v_2 + w_2 + 2).

Thus addition in V is associative.
(3) The element (-1, -1) in V is an additive identity since

    (v_1, v_2) + (-1, -1) = (v_1 - 1 + 1, v_2 - 1 + 1)
                          = (v_1, v_2)

for all v = (v_1, v_2) in V. Thus we write 0 = (-1, -1).
(4) The additive inverse of v = (v_1, v_2) is (-v_1 - 2, -v_2 - 2) in V since

    (v_1, v_2) + (-v_1 - 2, -v_2 - 2) = (v_1 - v_1 - 2 + 1, v_2 - v_2 - 2 + 1)
                                      = (-1, -1)
                                      = 0.

(5) Addition in V is commutative since

    u + v = (u_1 + v_1 + 1, u_2 + v_2 + 1)
          = (v_1 + u_1 + 1, v_2 + u_2 + 1)
          = v + u.

(6) The product

    a(v_1, v_2) = (a + av_1 - 1, a + av_2 - 1)

is always in V since both a + av_1 - 1 and a + av_2 - 1 are real numbers.
(7) We have

    a(bv) = a(b + bv_1 - 1, b + bv_2 - 1)
          = (a + a(b + bv_1 - 1) - 1, a + a(b + bv_2 - 1) - 1)
          = (a + ab + a(bv_1) - a - 1, a + ab + a(bv_2) - a - 1)
          = (ab + (ab)v_1 - 1, ab + (ab)v_2 - 1)
          = (ab)v,

so scalar multiplication has the associative property.
(8) Verifying the distributive property with addition in V, we find that

    a(u + v) = a(u_1 + v_1 + 1, u_2 + v_2 + 1)
             = (a + au_1 + av_1 + a - 1, a + au_2 + av_2 + a - 1)
             = (a + au_1 - 1, a + au_2 - 1) + (a + av_1 - 1, a + av_2 - 1)
             = au + av.

(9) The distributive property with addition in R is valid since

    (a + b)v = ((a + b) + (a + b)v_1 - 1, (a + b) + (a + b)v_2 - 1)
             = (a + b + av_1 + bv_1 - 1, a + b + av_2 + bv_2 - 1)
             = (a + av_1 - 1, a + av_2 - 1) + (b + bv_1 - 1, b + bv_2 - 1)
             = av + bv.

(10) We have

    1 · v = (1 + 1(v_1) - 1, 1 + 1(v_2) - 1)
          = (v_1, v_2).

We have verified all the conditions in Definition 4.2, so this set V is a vector space
over R with respect to these operations, even though these operations are dramatically
different from the standard operations in the familiar vector space R^2. ■
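A direct way to gain confidence in computations like those above is to test the ten conditions on sample values. The following sketch (our own, in Python; the names add and smul are ours) spot-checks conditions (2)-(5) and (7)-(10) for the operations of Example 6 on a handful of rational points; conditions (1) and (6) hold because sums and products of rational numbers are rational. A check of this kind is not a proof, but it quickly exposes an operation that fails one of the conditions.

    from fractions import Fraction
    from itertools import product

    def add(u, v):
        # addition as defined in Example 6
        return (u[0] + v[0] + 1, u[1] + v[1] + 1)

    def smul(c, v):
        # scalar multiplication as defined in Example 6
        return (c + c * v[0] - 1, c + c * v[1] - 1)

    ZERO = (-1, -1)                         # the additive identity found in (3)
    samples = [tuple(map(Fraction, p)) for p in [(0, 0), (1, -2), (3, 5), (-4, 7)]]
    scalars = [Fraction(c) for c in (-2, 0, 1, 3)]

    for u, v, w in product(samples, repeat=3):
        assert add(add(u, v), w) == add(u, add(v, w))                 # (2)
        assert add(u, v) == add(v, u)                                 # (5)
        assert add(v, ZERO) == v                                      # (3)
        assert add(v, (-v[0] - 2, -v[1] - 2)) == ZERO                 # (4)
    for a, b in product(scalars, repeat=2):
        for u, v in product(samples, repeat=2):
            assert smul(a, smul(b, v)) == smul(a * b, v)              # (7)
            assert smul(a, add(u, v)) == add(smul(a, u), smul(a, v))  # (8)
            assert smul(a + b, v) == add(smul(a, v), smul(b, v))      # (9)
            assert smul(1, v) == v                                    # (10)
    print("conditions (2)-(5) and (7)-(10) hold on the sample values")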

Exercises 4.2

1. Verify that the set in the indicated example from this section is actually a vector
space.
(a) Example 1 (b) Example 2 (c) Example 3 (d) Example 4
(e) Example 5
2. Write out a basis for the vector space R3 X 2 .
3. What is the dimension of J-mxn?
4. Find a set that forms a basis for the vector space V in Example 6 of this section,
and prove that your set is a basis for V.

In Problems 5-16, assume that equality in the given set is the same as in the example of
this section that involves the same elements, and determine if the given set is a vector
space over R with respect to the operations defined in the problem. If it is not, list all
conditions in Definition 4.2 that fail to hold.

5. The set V of all ordered pairs of real numbers with operations defined by

    (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, 0),

    c(x_1, x_2) = (cx_1, cx_2).

6. The set V of all ordered pairs of real numbers with operations defined by

    (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2),

    c(x_1, x_2) = (cx_1, x_2).

7. The set V of all ordered pairs of positive real numbers with operations defined by

    (x_1, x_2) + (y_1, y_2) = (x_1 y_1, x_2 y_2),

    c(x_1, x_2) = (x_1^c, x_2^c).

8. The set W of all p(x) in Example 2 of this section that have zero constant term,
with operations as defined in Example 2.

9. The set V of all ordered triples of real numbers with operations defined by

( χ ΐ , £ 2 , £ 3 ) + (2/1,2/2,2/3) = (Zl +2/1,^2 +2/2,^3 +2/3),


c(xi,x 2 ,Z3) = (cxi,0,0).

10. The set V of all ordered triples of real numbers with operations defined by

(X1,X2,^3) + (2/1,2/2,2/3) = (Sl + 2 / 1 , ^ 2 + 2 / 2 , ^ 3 + 2/3),

c(xi,X 2 ,Z3) = (^ϊ,^2»^3)·

11. The set V of all diagonal matrices in Hnxn with operations the same as those
defined in Example 5.

12. The set V of all skew-symmetric matrices in R n x n with operations the same as
those defined in Example 5.

13. The set V of all ordered pairs of real numbers with operations defined by

    (x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1, x_2 + y_2 + 1),

    c(x_1, x_2) = (cx_1, cx_2).

14. The set V of all ordered pairs of real numbers with operations defined by

    (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2),

    c(x_1, x_2) = (x_1, x_2).

15. The set W of all / in Example 4 that are differentiate (that is, have a derivative
at every real number), with operations as defined in Example 4.

16. The set W of all / in Example 4 such that /(0) = 0, with operations as defined
in Example 4.

17. Prove that the additive identity in a vector space is unique.

18. Prove that the additive inverse — v of an element v in a vector space is unique.
19. Prove that — v = (—l)v for an arbitrary vector v in a vector space.

20. The vector u — v is defined as the vector w that satisfies the equation v + w = u.
Prove that u — v = u+(—l)v.
21. Let V be a vector space over T.

(a) Prove that Ov = 0 for an arbitrary vector v G V.



(b) If 0 denotes the additive identity in V, prove that cO = 0 for all c in T.


22. Suppose V is a vector space over T. Prove that if cv = 0 for c in T and v in V,
then either c = 0 or v = 0.
23. As mentioned just after Definition 4.1, the set C of all complex numbers is a field.
Assuming this fact, prove that each of the sets below is a field.
(a) The set of all complex numbers c that can be written in the form c — a-\-by/2
with a and b rational numbers.
(b) The set of all complex numbers c that can be written as c — a + bi with a
and b rational numbers. (The symbol i denotes the complex number such
that i2 = -1.)
24. Let R,2x2 be as defined in Example 5, and let T be the subset of R,2x2 that consists
of all invertible matrices in R2x2> together with the zero matrix of order 2. With
multiplication as usual and addition as given in Example 5, determine whether or
not T is a field, and justify your answer.

4.3 Subspaces and Related Concepts


Some of the discussion in the last section outlined the way that the theory in Chapter 1
extends to general vector spaces. In order to add depth and meaning to this outline, we
shall illustrate here the concepts of subspace, spanning set, basis, and transition matrix
in vector spaces other than R n . We begin with the concept of a subspace.
If W is a subset of the vector space V over T", then W may also be a vector space
over T. The terminology used in R n generalizes to arbitrary vector spaces, and such a
subset is called a subspace of V.
Theorem 1.10 also extends to an arbitrary vector space V over T. That is, a subset
W of the vector space V is a subspace of V if and only if the following conditions hold:
i. W is nonempty;
ii. for any a, b ∈ F and any u, v ∈ W, au + bv ∈ W.
Example 1 □ Let W be the set of all symmetric matrices in the vector space R_{n×n}
of all square matrices of order n over R. That is, for an n × n matrix A over R,

    A ∈ W if and only if A^T = A.

We shall show that W is a subspace of R_{n×n}. The set W is nonempty since the
zero matrix 0_{n×n} = [0]_{n×n} is in W. Let r and s be arbitrary real numbers, and let A =
[a_ij]_{n×n} and B = [b_ij]_{n×n} be arbitrary elements in W. Then A and B are symmetric,
so that a_ji = a_ij and b_ji = b_ij for i = 1, 2, ..., n; j = 1, 2, ..., n. Evaluating rA + sB, we
have

    rA + sB = [ra_ij]_{n×n} + [sb_ij]_{n×n}
            = [c_ij]_{n×n}
            = C,

where c_ij = ra_ij + sb_ij for i = 1, 2, ..., n; j = 1, 2, ..., n. Thus

    c_ji = ra_ji + sb_ji
         = ra_ij + sb_ij     since a_ji = a_ij and b_ji = b_ij
         = c_ij

for all pairs i, j. This means that rA + sB is symmetric and hence a member of W.
Therefore W is a subspace of R_{n×n}. ■

For a nonempty subset 𝒜 of a general vector space V, the set (𝒜) is the set of all
linear combinations of vectors in 𝒜. The same development used in Section 1.3 applies
in V, and (𝒜) is the subspace spanned by 𝒜. As in Section 1.3, (∅) is the zero
subspace {0} of V.

Example 2 □ In the vector space R_{2×2}, consider the sets of vectors 𝒜 = {A_1, A_2} and
ℬ = {B_1, B_2}, where

    A_1 = [ 2 -1 ]   A_2 = [ 1 2 ]   B_1 = [  3 -4 ]   B_2 = [ 4 3 ]
          [ 0  1 ]         [ 1 0 ]         [ -1  2 ]         [ 2 1 ]

We shall show that (𝒜) = (ℬ).
By inspection, we see that A_2 is not a multiple of A_1. Hence 𝒜 = {A_1, A_2} is
linearly independent and (𝒜) has dimension 2. Similarly, B_2 is not a multiple of B_1 and
therefore (ℬ) has dimension 2. Knowing that (𝒜) and (ℬ) have the same dimension,
it is sufficient to show that (ℬ) ⊆ (𝒜) in order to prove that (ℬ) = (𝒜). But (ℬ) is
dependent on ℬ, so we need only show that ℬ ⊆ (𝒜), by Theorem 1.8. That is, we
need only demonstrate that each of B_1 and B_2 is a linear combination of A_1 and A_2.
Setting up the equation c_1 A_1 + c_2 A_2 = B_1, we have

    c_1 [ 2 -1 ] + c_2 [ 1 2 ] = [  3 -4 ]
        [ 0  1 ]       [ 1 0 ]   [ -1  2 ]

This leads to the system of equations

    2c_1 + c_2 = 3
    -c_1 + 2c_2 = -4
    c_2 = -1
    c_1 = 2,

so B_1 = 2A_1 - A_2. Similarly, we find that B_2 = A_1 + 2A_2. Thus we have proved that
(𝒜) = (ℬ). ■
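Comparing entries, as in the computation above, turns each matrix equation c_1 A_1 + c_2 A_2 = B_j into a system of four linear equations in c_1 and c_2. A short sketch (our own illustration, assuming the SymPy library) solves both systems:

    from sympy import Matrix, symbols, linsolve

    c1, c2 = symbols('c1 c2')
    A1 = Matrix([[2, -1], [0, 1]]);  A2 = Matrix([[1, 2], [1, 0]])
    B1 = Matrix([[3, -4], [-1, 2]]); B2 = Matrix([[4, 3], [2, 1]])

    def entries(M):
        # flatten a 2 x 2 matrix into its list of entries
        return list(M)

    eqs1 = [c1 * a + c2 * b - t for a, b, t in zip(entries(A1), entries(A2), entries(B1))]
    eqs2 = [c1 * a + c2 * b - t for a, b, t in zip(entries(A1), entries(A2), entries(B2))]
    print(linsolve(eqs1, c1, c2))     # {(2, -1)}:  B1 = 2*A1 - A2
    print(linsolve(eqs2, c1, c2))     # {(1, 2)}:   B2 = A1 + 2*A2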

Just after Example 3 in Section 1.5, we noted that a basis of a subspace can be refined
from a finite spanning set for the subspace by deleting all vectors in the spanning set
that are linear combinations of preceding vectors. This procedure is illustrated in the
following example.

Example 3 □ Let

    p_1(x) = 1 + x + 2x^2,  p_2(x) = 6 + 6x + 12x^2,  p_3(x) = 1 + x^2 + x^3,
    p_4(x) = 2 + x + 3x^2 + x^3,  p_5(x) = 1 + x^2

in the vector space P_3 over R, and let

    𝒜 = {p_1(x), p_2(x), p_3(x), p_4(x), p_5(x)}.

We shall find a subset of 𝒜 that forms a basis of (𝒜).
Employing the procedure described just before this example, we see that p_2(x) =
6p_1(x), so p_2(x) can be deleted from 𝒜 to obtain the spanning set

    {p_1(x), p_3(x), p_4(x), p_5(x)}

of (𝒜). We see that p_3(x) is not a multiple of p_1(x), so we then check to see if p_4(x) is
a linear combination of p_1(x) and p_3(x). It is easy to discover that

    p_4(x) = p_1(x) + p_3(x),

so p_4(x) can be deleted from the last spanning set of (𝒜) we obtained, leaving

    {p_1(x), p_3(x), p_5(x)}

as a spanning set of (𝒜). Setting up the equation

    p_5(x) = c_1 p_1(x) + c_2 p_3(x)

leads to

    1 + x^2 = c_1(1 + x + 2x^2) + c_2(1 + x^2 + x^3)
            = (c_1 + c_2) + c_1 x + (2c_1 + c_2)x^2 + c_2 x^3

and the resulting system of equations

    c_1 + c_2 = 1
    c_1 = 0
    2c_1 + c_2 = 1
    c_2 = 0,

which clearly has no solution. Thus {p_1(x), p_3(x), p_5(x)} is linearly independent and
therefore forms a basis of (𝒜) that is contained in 𝒜. ■
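The deletion procedure of Example 3 can be mechanized by representing each polynomial by its coefficient vector and keeping a polynomial only when it increases the rank of the collection retained so far. The sketch below (our own illustration, assuming the SymPy library) reproduces the choices made above:

    from sympy import Matrix

    # Coefficient vectors (constant, x, x^2, x^3) of p1, ..., p5 from Example 3.
    polys = {'p1': [1, 1, 2, 0], 'p2': [6, 6, 12, 0], 'p3': [1, 0, 1, 1],
             'p4': [2, 1, 3, 1], 'p5': [1, 0, 1, 0]}

    kept = []                        # coefficient vectors retained for the basis
    for name, coeffs in polys.items():
        candidate = kept + [coeffs]
        # keep the polynomial only if it is not a combination of the preceding ones
        if Matrix(candidate).rank() == len(candidate):
            kept.append(coeffs)
            print('keep', name)
        else:
            print('drop', name)      # prints: keep p1, drop p2, keep p3, drop p4, keep p5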

The concept of a transition matrix from one finite set of vectors to another in an
arbitrary vector space applies in every vector space. We consider an example now in
the vector space P_2 of all polynomials in x with degree ≤ 2 and coefficients in R.

Example 4 □ Consider the bases 𝒜 = {p_1(x), p_2(x), p_3(x)} and ℬ = {q_1(x), q_2(x), q_3(x)}
of P_2 over R, where

    p_1(x) = 1 + x,  p_2(x) = x,  p_3(x) = x + x^2,
    q_1(x) = 1,  q_2(x) = 1 + x,  q_3(x) = 1 + x + x^2.

We shall find the matrix of transition A from 𝒜 to ℬ. In the matrix A = [a_ij]_{3×3}, the jth
column is the coordinate matrix of q_j(x) with respect to 𝒜. Thus we must write each
q_j(x) as a linear combination of the polynomials in 𝒜. The required linear combinations
are given by

    1 = (1)(1 + x) + (-1)(x) + (0)(x + x^2),
    1 + x = (1)(1 + x) + (0)(x) + (0)(x + x^2),
    1 + x + x^2 = (1)(1 + x) + (-1)(x) + (1)(x + x^2).

That is,

    q_1(x) = (1)p_1(x) + (-1)p_2(x) + (0)p_3(x),
    q_2(x) = (1)p_1(x) + (0)p_2(x) + (0)p_3(x),
    q_3(x) = (1)p_1(x) + (-1)p_2(x) + (1)p_3(x).

Thus the transition matrix from 𝒜 to ℬ is

    A = [  1  1  1 ]
        [ -1  0 -1 ]
        [  0  0  1 ].

We note that the coordinates of the q_j(x) are entered as columns of A, not as rows of
A. ■
Exercises 4.3
1. Determine whether the given set of vectors is linearly dependent or linearly inde­
pendent in the vector space V2 over R.
(a) {1 - 2x, x + x 2 ,1 - x + x2} (b) {1 + x, x + x 2 ,1 + 2x + x 2 }
(c) {1 - x 2 , x + 2x 2 ,1 + 3x 2 } (d) {1 + x, 1 + x 2 ,1 + x + x 2 }
(e) {1 - 2x, 2 - x ,4 + x ,1 + x + x 2 }
2 2
(f) {1 + x, 1 - x}
2. Determine whether the given set of polynomials spans V2 over R.
(a) {1 - x, 1 + x, 1 - x 2 } (b) {1 - x, x + x 2 , 1 + x 2 }
(c) {1 + x, x + x 2 ,1 + 2x + x 2 ,1 - x 2 } (d) {1 - x, x + x 2 ,1 + x + x 2 ,1 + x 2 }

3. Determine whether the given set of vectors in R2X2 is linearly dependent or linearly
independent.

1 1 0 1 1 0 1 -1
(a) 7 5
0 0 ·) 1 0 1 0 1 0

1 0 0 1 0 0 1 1
(b) 5 5
0 0 0 0 ·) 1 0 1 0

1 1 1 1 0 0
(c)
0 0 1 0 1 0

1 1 0 1 0 0
(d)
0 0 1 0 1 0

4. Which of the following sets of vectors are bases for the vector space V2 over R?
(a) {1 - x, 1 - x + x 2 ,1 + x 2 ,1 - x - x2} (b) {1 - x, 1 - x 2 , x - x2}
(c) {1 - x, x - x 2 ,1 + x 2 } (d) {1 + x, 1 + %2}

5. Find the transition matrix from the basis A to the basis B in V2 over R.

(a) A= {p1(x),p2{x),P3{x)},B = {q1(x),q2(x),q3(x)}, where


2
pi(x) = 1 - x,p2(x) = x + x ,pz(x) = x,
<7i(x) = 1,22(2) = X + X2,<730E) = 1 + x2
(b) A = {pi (x), P2 (x), Ρ3 (x)}, B = {qi (x) ,q2 (x), #3 {%)}, where
pi(x) = x + x 2 ,p 2 (#) = x,P3(x) = 1 + x 2 ,
qi(x) = x,q2(x) = x — x2,qz(x) = 1 + x - x2

6. Find a subset of the given set of vectors that forms a basis for the subspace spanned
by the given set.

(a) pi(x) — x + x2,p2{x) = 1 + x + x 2 ,p 3 (z) = l,P4(x) = 1 + 2x + 2x 2 , in V2


over R
(b) pi(x) = 1 +x 2 ,p2(^) = x + x 2 ,P3(x) = 1 — X,PA(X) = l + x + 2x 2 , in V2 over
R
1 0 0 1 1 -1 2 1
(c) in R<•2x2
1 1 -1 1 2 0 1 3

1 1 1 0 0 1 0 0 1 1
(d) 5 5 1 inR<■2x2
2 1 -3 1 5 0 1 1 3 2

(e) pi(x) = x2 + x + 1,P2(#) — x2 — x — 2,ps(x) = x3 + x — Ι , ρ ^ χ ) = x — 1 in


P3 over R
(f) pi(x) — x2 — Sx + 2,p 2 (#) = x 2 + x — 2,p 3 (x) = x 3 — l,p 4 (x) = x — 1 in P3
over R
7. Verify that the given set W is a subspace of R 4 and find a basis for W .
(a) W is the set of all vectors in R 4 of the form (a, 0,6,0).
(b) W is the set of all vectors in R 4 of the form (a, 6, c, d) with c = a and
d = a + b.
In Problems 8-19, determine which of the given sets W are subspaces of the indicated
vector space. If a set is not a subspace, state a reason.
8. The set W of all vectors in R 3 of the form (a, &, c), where a + b — 1.
9. The set W of all vectors in R 3 of the form (a, 6, c), where b — a2.
10. The set W of all vectors in R 3 of the form (a, b, c), where c = 2a + b.
11. The set W of all vectors in R 3 of the form (a, 6, c), where c = ab.
12. The set W of all polynomials ao + CL\X + «2^ 2 in P2 over R that have a\ = 0.
13. The set W of all polynomials ao + a\x + 02#2 in P2 over R that have ao + a\ = 0 .

a b
14. The set W of all matrices in R2X2 that have a + b = 0.
c d

1 a
15. The set W of all matrices of the form i n R .2x2·
b c

a b
16. The set W of all matrices in R2X2 that have the form with a, b, c, and
c d
d integers.
r
o 0
17. The set W that consists of the zero matrix together with all invertible
0 0
matrices in R2X2·

0 a
18. The set W of all matrices of the form inR-'2x2·
b 0

a b
19. The set W of all matrices in R2X2 that have the form where a? — d2.
c d

4.4 Isomorphisms of Vector Spaces


The objective of this section is to derive the most important single result concerning
finite-dimensional vector spaces. The concept of isomorphism between two vector spaces
is essential to an understanding of this fundamental result, which is contained in Theo­
rem 4.5. For this reason it is necessary to first consider the concept of an isomorphism.
An isomorphism is a certain type of mapping, so we begin with a discussion of mappings
in general.
A mapping (or function, or transformation) from the set S into the set T is a
set f of ordered pairs (s, t) of elements s ∈ S, t ∈ T that has the following property:
for each s in S, there is exactly one element t in T such that (s, t) ∈ f. The notation
t = f(s) indicates that t is the unique element of T that is associated with s by the rule
that (s, t) ∈ f. We say that f(s) is the image of s, and that s is an inverse image of
f(s).
If f is a mapping of S into T, then it may happen that f(s_1) = f(s_2), even though
s_1 ≠ s_2. If it is true that f(s_1) = f(s_2) always implies s_1 = s_2, then f is called injective
or one-to-one. Another point of interest is that it is not required that every element
of T be an image of an element in S. If it happens that every t in T is the image of at
least one s in S under f, we say that f is a surjective mapping of S into T, or that f
maps S onto T. If f is both injective and surjective, then f is called bijective.
Two mappings f and g of S into T are equal if and only if f(s) = g(s) for every s
in S.
The rule f(x) = sin x defines a mapping of the set R of real numbers into R. This
mapping f is clearly not injective (for example, sin(π/4) = sin(3π/4) = √2/2). Also, f is not
surjective since there is no x in R such that f(x) = 2.
If S = {x ∈ R | 0 ≤ x ≤ π} and T = {t ∈ R | 0 ≤ t ≤ 1}, then the rule f(x) = sin x
defines a mapping of S into T that is surjective but is not injective.
If S = {x ∈ R | 0 ≤ x ≤ π/2} and T = {t ∈ R | 0 ≤ t ≤ 1}, the rule f(x) = sin x
defines a mapping of S into T that is both surjective and injective.
These examples illustrate the fact that the surjective and injective properties depend
on the sets S and T as well as the rule that defines the mapping.

Definition 4.4 Let U and V be vector spaces over the same field F. An isomorphism
from U to V is a bijective mapping f of U into V that has the property that

    f(au + bv) = af(u) + bf(v)

for all a, b in F and all u, v in U. If an isomorphism from U to V exists, U and V are
said to be isomorphic vector spaces.

Since an isomorphism f from U to V is a bijective mapping, it pairs the elements of
U and V in a one-to-one fashion. In view of the property f(au + bv) = af(u) + bf(v),
f is said to "preserve linear combinations." As particular instances of this property,
f(u + v) = f(u) + f(v) and f(au) = af(u). Thus f preserves sums and scalar products,
and the vector spaces U and V are structurally the same.

We are now in a position to prove the principal result of this section. This result
shows that the first example in Section 4.2 furnishes a completely typical pattern for
n-dimensional vector spaces over a field F.

Theorem 4.5 Any n-dimensional vector space over the field F is isomorphic to F^n.

Proof. Let V be an n-dimensional vector space over the field F, and let

    ℬ = {v_1, v_2, ..., v_n}

be a basis of V. Each u ∈ V can be written uniquely as u = Σ_{i=1}^{n} a_i v_i, so the rule

    f(u) = (a_1, a_2, ..., a_n)

defines an injective mapping of V into F^n. It is clear that f is surjective.
Let u = Σ_{i=1}^{n} a_i v_i and v = Σ_{i=1}^{n} b_i v_i be arbitrary vectors in V, and let a and b be
arbitrary scalars in F. Then

    f(au + bv) = f( Σ_{i=1}^{n} (aa_i + bb_i)v_i )
               = (aa_1 + bb_1, aa_2 + bb_2, ..., aa_n + bb_n)
               = a(a_1, a_2, ..., a_n) + b(b_1, b_2, ..., b_n)
               = af(u) + bf(v),

so that f is an isomorphism from V to F^n. ■
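The isomorphism in the proof of Theorem 4.5 simply takes coordinates relative to the chosen basis. The following sketch (our own illustration, assuming the SymPy library) computes f for V = R^3 with a non-standard basis and checks that f preserves a linear combination:

    from sympy import Matrix

    # Columns are the basis vectors (1,0,2), (2,1,6), (0,3,8) of R^3.
    basis = Matrix([[1, 2, 0], [0, 1, 3], [2, 6, 8]])

    def f(v):
        """Coordinates of v relative to the basis: the a_i with v = sum a_i v_i."""
        return tuple(basis.solve(Matrix(list(v))))

    u, v = (4, -2, 2), (6, 26, 78)
    a, b = 3, -2
    print(f(u), f(v))                                    # (2, 1, -1) and (-4, 5, 7)
    lhs = f(tuple(a * x + b * y for x, y in zip(u, v)))
    rhs = tuple(a * p + b * q for p, q in zip(f(u), f(v)))
    print(lhs == rhs)                                    # True: f preserves linear combinations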

Exercises 4.4

1. Let V be the vector space in the second example in Section 4.2. Exhibit an
isomorphism from V to Ρη+ι.
2. Define a mapping / of R3X2 into R 6 that is an isomorphism from R3X2 to R 6 ,
and prove that your mapping is an isomorphism.
3. For each subspace W below, determine the dimension r of W and find an isomor­
phism from W to R r .

(a) W = ((1, - 1 , 1 ) , (2, - 2 , 2 ) , (0,1,0), (1,0,1)) in R 3


(b) W = ((4, - 2 , 5 ) , (0, - 4 , 0 ) , (12, -18,15), (4,0,5)) in R 3
(c) W = ((2,0,4, - 1 ) , (5, - 1 , 1 1 , 8 ) , (0,1, - 7 , 9 ) , (7,0,8,16)) in R 4
(d) W = ((4, - 2 , 5 , 5 ) , (0, - 4 , 0 , 4 ) , (12, -18,15,27), (4,0,5,3)) in R 4
(e) W = (pi(x),p2(x),P3(x),Pi(x)) in V3 over R, where

Pi(x) = x3 + x2 + 1, P2(x) = x2 +x+ 1,

P3(x) = 2x3 + Sx2 + x + 3, P4(x) = x3 — x.



(f) W = (pi(x),P2(x),P3(x),P4(x)} in ^3 over R, where

Pi(x) = x3 + x + 1, P2{x) = x2,

p3(x) = 2x3 + 3x 2 + 2x + 2, p 4 (x) = x + 1.

1 0 -2 0 1 1 2 1 0 1
(g) W : 7 7 5 5 inR<•2x2
-1 1 2 -2 1 1 0 2 2 0

1 1 3 3 1 2 2 1
(h) W 7 5 inR-•2x2
2 0 6 0 3 1 3 -1

4. Let V be the subspace spanned by the set {p\ (x), p2 (x) » P3 (#) > P4 (#)} of polynomi­
als in Problem 6(e) of Exercises 4.3. Find an isomorphism from V to a subspace
ofR4.

5. Let V be the subspace spanned by the set {pi(x),P2(x),P3(x),P4(x)} of polyno­


mials in Problem 6(f) of Exercises 4.3. Find an isomorphism from V to a subspace
ofR4.
6. Prove that the relation of being isomorphic is an equivalence relation on the set
of all vector spaces over T.

4.5 Standard Bases for Subspaces


The definitions of the elementary operations on sets of vectors as given in Chapter 2
apply unchanged to arbitrary vector spaces. The following statements and proofs also
apply unchanged, and may be used in arbitrary vector spaces: Definition 2.1, Theorem
2.2, Theorem 2.3, Corollary 2.4, Definition 2.6, Theorem 2.7, and Corollary 2.8. These
results may be summarized in the following statement: If A and A! are sets of vectors in
a vector space V such that A! is obtained from A by a sequence of elementary operations,
then (i) A and A! span the same subspace of V and (ii) A! is linearly independent if
and only if A is linearly independent. The significance of this statement lies in the fact
that elementary operations may be used to obtain simpler forms of spanning sets for a
subspace and also to investigate linear independence.
From Corollary 2.8 to the end of Chapter 2, the statements and proofs of results
are intimately connected with R n . Thus more changes are necessary in formulating
the corresponding development for subspaces of arbitrary vector spaces. However, the
proofs of the theorems are basically unchanged. For this reason these theorems are
stated here with only indications as to the changes in notation necessary in order to
obtain the proofs.

Theorem 4.6 Let ℬ = {u_1, u_2, ..., u_n} be a fixed basis of the vector space V over F, and
let 𝒜 = {v_1, v_2, ..., v_m} be a set of m vectors in V that spans the subspace W = (𝒜) of
dimension r > 0. Then a set 𝒜' = {v'_1, v'_2, ..., v'_r, 0, ..., 0} of m vectors can be obtained
from 𝒜 by a finite sequence of elementary operations so that {v'_1, v'_2, ..., v'_r} has the
following properties:
1. The first nonzero coordinate of v'_j with respect to ℬ is a 1 for the k_j coordinate
for j = 1, 2, ..., r. That is, v'_j = Σ_{i=k_j}^{n} a'_{ij} u_i with a'_{k_j j} = 1.
2. k_1 < k_2 < ··· < k_r.
3. v'_j is the only vector in 𝒜' with a nonzero k_j coordinate relative to ℬ.
4. {v'_1, v'_2, ..., v'_r} is a basis of W.

Proof. The proof can be obtained from the proof of Theorem 2.9 by replacing R^n
by V, ε_n by ℬ, and e_i by u_i, so that a'_{ij} represents the ith coordinate of v'_j relative to ℬ
instead of the ith component of v'_j. ■

Theorem 4.7 For a fixed basis ℬ, there is one and only one basis of a given subspace
W that satisfies the conditions of Theorem 4.6.

Proof. It follows from Theorem 4.6 that there is at least one such basis of W. Let
𝒜' = {v'_1, v'_2, ..., v'_r} and 𝒜'' = {v''_1, v''_2, ..., v''_r} be two bases of W that satisfy the
conditions. Then the same replacements used in the proof of Theorem 4.6 can be used
in the proof of Theorem 2.10 to obtain a proof that 𝒜' = 𝒜''. ■

Definition 4.8 Let B be a fixed basis of the n-dimensional vector space V over T, and
let W be a subspace 0 / V of dimension r. The basis o / W that satisfies the conditions
of Theorem 1^.6 is called the standard basis o / W relative to B.

That is, the standard basis of W relative to B — {ιΐχ, 112,..., u n } is the unique basis
A = {vi, V2,..., v r } that has the following properties:
V
!· J = YTi=ka aiJUi with a
k3j = 1
2. hi < k2 < · · · < kr
3. Vj is the only vector in A that has a nonzero kj coordinate relative to B.

When referring to standard bases, we use the phrase "with respect to 23" inter­
changeably with "relative to 23."
The proofs of the following two theorems can be obtained from those of Theorems
2.12 and 2.13 by making the same changes as were indicated in Theorems 4.6 and 4.7.

Theorem 4.9 Let B be a fixed basis of the finite-dimensional vector space V over F. For any subspace W of V, the standard basis of W relative to B can be obtained from any basis of W by a sequence of elementary operations.

Theorem 4.10 Let A and B be two sets of m vectors each in the finite-dimensional vector space V over F. Then ⟨A⟩ = ⟨B⟩ if and only if B can be obtained from A by a sequence of elementary operations.

Example 1 □ The set B = {u1 = (1,0,2), u2 = (2,1,6), u3 = (0,3,8)} is a basis of R^3. Let A = {(4,-2,2), (6,26,78), (5,12,40), (14,22,82)} and consider the problem of finding the standard basis of ⟨A⟩ relative to B.
In order to obtain the coordinates a'_ij as in Theorem 4.6, we determine the coordinates of the vectors in A with respect to B and then record these coordinates in the columns of a matrix. (That is, we follow the same procedure as in Chapter 2, using coordinates rather than components.) The vectors of A are found to be given by

    (4,-2,2)   =  2u1 +  u2 -  u3,
    (6,26,78)  = -4u1 + 5u2 + 7u3,
    (5,12,40)  =  -u1 + 3u2 + 3u3,
    (14,22,82) =        7u2 + 5u3,

so that the matrix of coordinates is

    A = [ 2  -4  -1   0]
        [ 1   5   3   7]
        [-1   7   3   5]

By using elementary column operations on the matrix A (which corresponds to using elementary operations on the set A), we find the reduced column-echelon form for A is

    A' = [   1     0   0   0]
         [   0     1   0   0]
         [-6/7   5/7   0   0]

Thus the standard basis for ⟨A⟩ with respect to B is {v1', v2'}, where

    v1' = u1 - (6/7)u3 = (1, -18/7, -34/7),
    v2' = u2 + (5/7)u3 = (2, 22/7, 82/7).    ■
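Computations of this kind can be checked with software. The following is a minimal sketch, assuming Python with the SymPy library (the library choice and names are assumptions of this illustration): it finds the coordinates of the vectors of A relative to B, column-reduces the coordinate matrix, and rebuilds the standard basis vectors in component form.

    from sympy import Matrix

    # Basis B of R^3 and the spanning set A from Example 1
    B = [Matrix([1, 0, 2]), Matrix([2, 1, 6]), Matrix([0, 3, 8])]
    A = [Matrix([4, -2, 2]), Matrix([6, 26, 78]),
         Matrix([5, 12, 40]), Matrix([14, 22, 82])]

    P = Matrix.hstack(*B)                                # columns are the basis vectors
    coords = Matrix.hstack(*[P.solve(v) for v in A])     # coordinates of A relative to B

    # Reduced column-echelon form = transpose of the RREF of the transpose
    cef = coords.T.rref()[0].T
    print(cef)                       # columns (1, 0, -6/7), (0, 1, 5/7), 0, 0

    std = P * cef                    # standard basis vectors of <A> in component form
    print(std[:, :cef.rank()])       # (1, -18/7, -34/7) and (2, 22/7, 82/7)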

Exercises 4.5

1. Using B = {(1,0,0,0), (1,1,0,0), (1,1,1,0), (1,1,1,1)} as the fixed basis of R^4, find the standard basis of ⟨A⟩ relative to B for each set A below. Write your answers in component form.

    (a) A = {(4,4,3,1), (7,7,5,1), (4,4,3,1)}
    (b) A = {(5,4,2,3), (3,3,2,2), (8,7,4,5), (2,1,0,1)}
    (c) A = {(-1,-2,-1,-3), (4,3,2,0), (2,-1,0,-6), (3,1,1,-3), (1,1,1,1)}
    (d) A = {(4,3,2,-1), (-1,-2,-2,0), (7,4,2,-2), (2,1,1,1), (10,9,8,1)}

2. Let p1(x) = 2x^2 + 2, p2(x) = x + 1, p3(x) = 2x^2 - 3x + 1 in the vector space V2 over R. Given that B = {p1(x), p2(x), p3(x)} is a basis of V2 over R, find the standard basis of ⟨A⟩ with respect to B if A is given by

    (a) A = {q1(x), q2(x), q3(x)}, where q1(x) = 2p1(x) - 3p2(x) - p3(x),
        q2(x) = p1(x) - p2(x) - p3(x), and q3(x) = 2p1(x) - p2(x) - p3(x).
    (b) A = {q1(x), q2(x), q3(x), q4(x)}, where q1(x) = -p3(x),
        q2(x) = 3p1(x) - 6p2(x) + p3(x), q3(x) = 2p1(x) - 4p2(x) + 2p3(x), and
        q4(x) = -p1(x) + 2p2(x) + 2p3(x).

3. Let

        u1 = [1  0]   u2 = [1  1]   u3 = [1  1]   u4 = [1  1]
             [0  0]        [0  0]        [1  0]        [1  1]

in the vector space R^(2x2). It is given that B = {u1, u2, u3, u4} is a basis of R^(2x2). For each set A, find the standard basis of ⟨A⟩ relative to B.

    (a) A = {A1, A2, A3}, where

        A1 = [2  1]   A2 = [1  2]   A3 = [4  5]
             [0  1]        [2  1]        [4  3]

    (b) A = {A1, A2, A3, A4}, where

        A1 = [-2  -2]   A2 = [-4  -7]   A3 = [0  -2]   A4 = [6  7]
             [-2  -1]        [-1  -2]        [2   0]        [5  3]

4. Determine in each case whether or not the given sets span the same subspace of
V2 over R.

    (a) A = {2x^2 + 2, 2x^2 - 2x + 2},  B = {4x^2 - 6x + 4, 2x, x^2 + 1}.
    (b) A = {2x^2 + 4x + 4, 4x^2 - 7x + 7, 18x^2 - 10x + 18},
        B = {4x^2 + 8x + 8, 14x^2 - 14x + 8}.

5. Let V be the vector space V2 over R.

    (a) Find the matrix of transition from the basis {1, x, x^2} of V to the basis B in Problem 2.
    (b) Find the matrix of transition from the basis B in Problem 2 to the basis {1, x, x^2} of V.

4.6 Matrices over an Arbitrary Field


At this point the generalization of Chapters 1 and 2 to arbitrary vector spaces is complete. This makes available the basis for the theory of multiplication of matrices over an arbitrary field F. An examination of the theory in Chapter 3 reveals that the entire development rests only on the facts that R is a field and R^n is an n-dimensional vector space over R. With a single exception, in order to obtain the general form of a result, it is necessary only to replace R by F and R^n by an n-dimensional vector space over F. The single exception is in the definition of the column space of a matrix in Definition 3.43. In this definition, R^n is replaced by F^n.
From this point on, we shall make free use of the general forms of all the properties
of matrix multiplication derived in Chapter 3.

4.7 Systems of Linear Equations


Let Q_n denote the set of all linear equations in n unknowns x1, x2, ..., xn with coefficients in a field F. That is,

    Q_n = { a1 x1 + a2 x2 + ... + an xn = b | a_i in F and b in F },

and an element of Q_n is an equation a1 x1 + a2 x2 + ... + an xn = b. Two elements of Q_n are equal if their corresponding coefficients are equal and their constant terms are equal. Addition in Q_n is defined by

    [a1 x1 + a2 x2 + ... + an xn = b] + [a1' x1 + a2' x2 + ... + an' xn = b']
        = [(a1 + a1') x1 + (a2 + a2') x2 + ... + (an + an') xn = (b + b')].

Scalar multiplication is given by

    a · [a1 x1 + a2 x2 + ... + an xn = b] = [a a1 x1 + a a2 x2 + ... + a an xn = ab].

It is readily verified that Q_n is a vector space over F.
A system of linear equations in x1, x2, ..., xn with coefficients in F such as

    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2
    ...
    am1 x1 + am2 x2 + ... + amn xn = bm

can be regarded as a set of vectors A = {v1, v2, ..., vm} in Q_n, where v_i is the i-th equation in the system. A solution of the system A is a set of values for x1, x2, ..., xn that satisfies each equation in A. The set of all solutions is the solution set of A. In some cases it is more convenient to write these solutions in vector form as v = (x1, x2, ..., xn), while in others the matrix form

    X = [x1]
        [x2]
        [..]
        [xn]
is more convenient to use.

Theorem 4.11 With the notation of the preceding paragraph, let ⟨A⟩ denote the subspace of Q_n that is spanned by A. Then each solution of the system A is a solution of every equation in the subspace ⟨A⟩.

Proof. Any equation v in ⟨A⟩ is a linear combination of the equations in A, say v = Σ_{i=1}^{m} c_i v_i. Thus v has the form

      c1[a11 x1 + ... + a1n xn = b1]
    + c2[a21 x1 + ... + a2n xn = b2]
    + ...
    + cm[am1 x1 + ... + amn xn = bm],

or

    (Σ_{j=1}^{m} c_j a_j1) x1 + (Σ_{j=1}^{m} c_j a_j2) x2 + ... + (Σ_{j=1}^{m} c_j a_jn) xn = Σ_{j=1}^{m} c_j b_j.

If x1 = d1, x2 = d2, ..., xn = dn is a solution of the system A, then

    a_i1 d1 + a_i2 d2 + ... + a_in dn = b_i   for i = 1, 2, ..., m.

Therefore

    (Σ_j c_j a_j1) d1 + (Σ_j c_j a_j2) d2 + ... + (Σ_j c_j a_jn) dn
        = c1(a11 d1 + ... + a1n dn) + c2(a21 d1 + ... + a2n dn) + ...
          + cm(am1 d1 + ... + amn dn)
        = c1 b1 + c2 b2 + ... + cm bm,

and x1 = d1, x2 = d2, ..., xn = dn is a solution of v. ■

Definition 4.12 Two systems A and B contained in Q_n are equivalent if and only if they have the same solutions.

The concept of elementary operations on sets of vectors applies in Q_n and is a useful tool in solving systems of equations.

Theorem 4.13 Let A be a system of m equations contained in Q_n. If the set B in Q_n is obtained from A by a sequence of elementary operations, then B and A are equivalent systems of equations.

Proof. Suppose that B is obtained from A by a sequence of elementary operations. Then B ⊆ ⟨A⟩, so that every solution of A is a solution of B by Theorem 4.11.
But since B is obtained from A by a sequence of elementary operations, A can be obtained from B by a sequence of elementary operations (Theorem 2.2). Therefore, A ⊆ ⟨B⟩, and every solution of B is a solution of A. This completes the proof. ■

Matrices are valuable tools in the solution of systems of linear equations. Consider the system A given by

    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2
    ...
    am1 x1 + am2 x2 + ... + amn xn = bm.

This system can be written compactly as a single matrix equation AX = B, where

    A = [a_ij]_(m x n),   X = [x1, x2, ..., xn]^T,   and   B = [b1, b2, ..., bm]^T.

Example 1 □ The system of linear equations

     x1 + 3x2 + 2x3        -  x5 = 7
    2x1 + 6x2 + 5x3 + 6x4 + 4x5 = 0
     x1 + 3x2 + 2x3 + 2x4 +  x5 = 9

is equivalent to the single matrix equation

    [ x1 + 3x2 + 2x3        -  x5]   [7]
    [2x1 + 6x2 + 5x3 + 6x4 + 4x5] = [0]
    [ x1 + 3x2 + 2x3 + 2x4 +  x5]   [9],

and this matrix equation can be written in factored form as

    [1  3  2  0  -1] [x1]   [7]
    [2  6  5  6   4] [x2] = [0]
    [1  3  2  2   1] [x3]   [9].
                     [x4]
                     [x5]

Thus we have a matrix equation of the form AX = B, with

    A = [1  3  2  0  -1]        [x1]            [7]
        [2  6  5  6   4],   X = [x2],   and B = [0].
        [1  3  2  2   1]        [x3]            [9]
                                [x4]
                                [x5]
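The factored form can be verified numerically. The sketch below assumes Python with NumPy (an assumption of this illustration): for any trial values of the unknowns, the product AX reproduces the left-hand sides of the three equations.

    import numpy as np

    # Coefficient matrix and right-hand side from Example 1
    A = np.array([[1, 3, 2, 0, -1],
                  [2, 6, 5, 6,  4],
                  [1, 3, 2, 2,  1]])
    b = np.array([7, 0, 9])

    x = np.array([1.0, 2.0, -1.0, 1.0, 0.0])   # arbitrary trial values for x1, ..., x5
    print(A @ x)                               # left-hand sides of the three equations
    print(np.allclose(A @ x, b))               # True only if x solves the system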

Definition 4.14 In the system AX = B, the matrix A is called the coefficient matrix, X is the matrix of unknowns, and B is the matrix of constants. The matrix

    [A, B] = [a11  a12  ...  a1n  b1]
             [a21  a22  ...  a2n  b2]
             [ ..   ..        ..  ..]
             [am1  am2  ...  amn  bm]

is called the augmented matrix of the system.

Each system in a certain set of variables such as A above has a unique augmented
matrix. That is, each system has an augmented matrix, and different systems have
different augmented matrices. Any elementary operation performed on A is reflected
in the augmented matrix as an elementary row operation, and any elementary row
operation on the augmented matrix produces a corresponding elementary operation on
A.

Theorem 4.15 Two systems AX = B and A'X = B' of m linear equations in n unknowns x1, x2, ..., xn are equivalent systems if their augmented matrices [A, B] and [A', B'] are row-equivalent.

Proof. Let A and A' be the sets of vectors in Q_n that consist of the equations in the systems AX = B and A'X = B', respectively.
If the augmented matrices [A, B] and [A', B'] are row-equivalent, then [A', B'] can be obtained from [A, B] by a sequence of elementary row operations. Hence A' can be obtained from A by a sequence of elementary operations, and A and A' have the same solutions, by Theorem 4.13. ■

Corollary 4.16 Let A be m x n in the system AX = B, and let P be any invertible


m x m matrix. Then AX = B and PAX = PB have the same solutions.

Proof. This follows from the fact that the augmented matrices [A, B] and [PA, PB] =
P[A,B] are row-equivalent. ■

Theorem 4.17 The system AX = B has a solution if and only if

rank ([A, B]) = rank(A).

Proof. The system AX = B can be rewritten in the form

    x1 [a11]  + x2 [a12]  + ... + xn [a1n]   =  [b1]
       [a21]       [a22]             [a2n]      [b2]
       [ ..]       [ ..]             [ ..]      [..]
       [am1]       [am2]             [amn]      [bm]

Thus the system has a solution if and only if there are scalars x1, x2, ..., xn in F such that

    x1(a11, a21, ..., am1) + x2(a12, a22, ..., am2) + ...
        + xn(a1n, a2n, ..., amn) = (b1, b2, ..., bm).

Let S denote the column space of A, let S* denote the column space of [A, B], and let b = (b1, b2, ..., bm) in F^m. According to the last statement of the preceding paragraph, AX = B has a solution if and only if b is in S. But b is in S if and only if S and S* have the same dimension, i.e., if and only if A and [A, B] have the same rank. ■
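The rank condition of Theorem 4.17 is easy to test with software. The following is a minimal sketch, assuming Python with NumPy (the function name is invented for this illustration); it compares the rank of the coefficient matrix with the rank of the augmented matrix.

    import numpy as np

    def is_consistent(A, b):
        """Theorem 4.17: AX = B has a solution iff rank([A, B]) == rank(A)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float).reshape(-1, 1)
        return np.linalg.matrix_rank(np.hstack([A, b])) == np.linalg.matrix_rank(A)

    # A quick check with an obviously inconsistent system: x1 + x2 = 1, x1 + x2 = 2
    print(is_consistent([[1, 1], [1, 1]], [1, 2]))   # False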

Theorem 4.18 If A is an m x n matrix over F and rank([A, B]) = rank(A) = r, then the solutions to AX = B can be expressed in terms of n - r parameters.

Proof. Let P be an invertible matrix such that PA = A' is in reduced row-echelon form. The original system is equivalent to A'X = B' where PB = B' (Theorem 4.15).

By Corollary 3.49, A' has r nonzero rows. Thus the system A'X = B' has the form

    x_k1 + a'_{1,k1+1} x_{k1+1} + ... + a'_{1,n} x_n = b1'
           x_k2 + a'_{2,k2+1} x_{k2+1} + ... + a'_{2,n} x_n = b2'
                  ...
                  x_kr + ... + a'_{r,n} x_n = br'
                                           0 = b'_{r+1}
                                           ...
                                           0 = b'_m.

In this system each variable x_ki, i = 1, 2, ..., r, occurs just once, in the i-th equation, with coefficient 1. Hence each of x_k1, ..., x_kr can be expressed in terms of the remaining n - r variables. ■

The variables x_k1, x_k2, ..., x_kr in the last paragraph are called the leading variables, and the remaining n - r variables are called the parameters in the solution of the system.
The proof of Theorem 4.18 furnishes at one and the same time a method for determining the existence of solutions and a method for obtaining them. To solve the system AX = B, we can use elementary row operations to transform the augmented matrix [A, B] into reduced row-echelon form [A', B']. The condition that rank([A, B]) = rank(A) is reflected in the conditions 0 = b'_{r+1}, ..., 0 = b'_m, since

    rank(A) = rank(PA) = rank(A')

and

    rank([A, B]) = rank(P[A, B]) = rank([A', B']).

If rank([A, B]) > rank(A), then rank([A', B']) > r and at least one of the equations 0 = b'_{r+1}, ..., 0 = b'_m will be contradictory. If rank([A, B]) = rank(A), then there are solutions, and they can be obtained by solving for the leading variables in terms of the parameters. This method of solution is called Gauss-Jordan elimination. A system is solved by Gauss-Jordan elimination in the following example.

Example 2 □ Consider the system of equations

     x1 + 2x2 +  x3 -  4x4 + x5 =  1
     x1 + 2x2 -  x3 +  2x4 + x5 =  5
    2x1 + 4x2 +  x3 -  5x4      =  2
     x1 + 2x2 + 3x3 - 10x4 + x5 = -3.

The augmented matrix [A, B] can be transformed to reduced row-echelon form as follows.

    [A, B] = [1  2   1   -4   1    1]      [1  2   1   -4   1    1]
             [1  2  -1    2   1    5]  ~   [0  0  -2    6   0    4]
             [2  4   1   -5   0    2]      [0  0  -1    3  -2    0]
             [1  2   3  -10   1   -3]      [0  0   2   -6   0   -4]

          ~  [1  2   0   -1   1    3]      [1  2   0   -1   0    2]
             [0  0   1   -3   0   -2]  ~   [0  0   1   -3   0   -2]   =  [A', B']
             [0  0   0    0  -2   -2]      [0  0   0    0   1    1]
             [0  0   0    0   0    0]      [0  0   0    0   0    0]

The reduced row-echelon form [A', B'] corresponds to the system

    x1 + 2x2       -  x4      =  2
                x3 - 3x4      = -2
                           x5 =  1.

When we solve for the leading variables x1, x3, x5 in terms of the parameters x2, x4, we obtain the solutions to the system:

    x1 =  2 - 2x2 + x4
    x3 = -2 + 3x4
    x5 =  1,

where x2 and x4 are arbitrary. ■
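The row reduction above can be reproduced exactly with a computer algebra system. The sketch below is an assumption of this illustration and uses Python with SymPy, whose rref routine returns the reduced row-echelon form together with the pivot columns (the leading variables).

    from sympy import Matrix

    # Augmented matrix [A, B] of Example 2
    M = Matrix([[1, 2,  1,  -4, 1,  1],
                [1, 2, -1,   2, 1,  5],
                [2, 4,  1,  -5, 0,  2],
                [1, 2,  3, -10, 1, -3]])

    R, pivots = M.rref()
    print(R)        # the reduced row-echelon form [A', B'] shown above
    print(pivots)   # (0, 2, 4): x1, x3, x5 are the leading variables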

There is another point of view toward Theorem 4.18 and its proof that is useful in Chapter 5. The fifth set in Example 1 of Section 1.3 generalizes immediately from R^n to F^n. That is, the set of all vectors v = (x1, x2, ..., xn) in F^n with components x_i that satisfy a given system of equations AX = 0 is a subspace W of F^n. The n - r parameters in the proof of Theorem 4.18 represent the components of v that can be assigned values arbitrarily. We can solve for the leading variables x_k1, x_k2, ..., x_kr in the equation A'X = 0 and express them in terms of the n - r parameters. If we then replace these leading variables by their values in terms of the parameters, the vector (x1, x2, ..., xn) that represents the general solution of the system can be obtained as a linear combination of the vectors in a basis of W. There will be n - r vectors in this basis since there are n - r parameters present. This is illustrated in the following example.

Example 3 □ Consider the system of equations

     x1 + 2x2 +  x3 -  4x4 + x5 = 0
     x1 + 2x2 -  x3 +  2x4 + x5 = 0
    2x1 + 4x2 +  x3 -  5x4      = 0
     x1 + 2x2 + 3x3 - 10x4 + x5 = 0.

The coefficient matrix in this system is the same as the one in Example 2, and the reduced row-echelon form for this system can be obtained simply by replacing the constants B' in [A', B'] by a column of zeros to obtain [A', 0]. This gives

    [A', 0] = [1  2  0  -1  0  0]
              [0  0  1  -3  0  0]
              [0  0  0   0  1  0]
              [0  0  0   0  0  0],

which corresponds to the system

    x1 + 2x2      -  x4      = 0
               x3 - 3x4      = 0
                          x5 = 0.

Solving for the leading variables, we get

    x1 = -2x2 + x4
    x3 = 3x4
    x5 = 0.

Replacing the leading variables by their values in terms of the parameters yields

    (x1, x2, x3, x4, x5) = (-2x2 + x4, x2, 3x4, x4, 0)
                         = (-2x2, x2, 0, 0, 0) + (x4, 0, 3x4, x4, 0)
                         = x2(-2, 1, 0, 0, 0) + x4(1, 0, 3, 1, 0).

Thus the subspace W of all solutions to AX = 0 is given by

    W = ⟨(-2, 1, 0, 0, 0), (1, 0, 3, 1, 0)⟩,

and {(-2, 1, 0, 0, 0), (1, 0, 3, 1, 0)} is a basis for the solution space W. As a final remark concerning these solutions, we note that the matrix form

    X = x2 [-2]   +  x4 [1]
           [ 1]         [0]
           [ 0]         [3]
           [ 0]         [1]
           [ 0]         [0]

can be easily predicted by listing all the variables in a column with the leading variables expressed in terms of the parameters:

    x1 = -2x2 + x4
    x2 =   x2
    x3 =         3x4
    x4 =          x4
    x5 =   0.
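A basis of the solution space of a homogeneous system can also be obtained directly from software. The following minimal sketch assumes Python with SymPy; its nullspace routine sets each parameter to 1 in turn, which produces exactly the basis vectors found above.

    from sympy import Matrix

    # Coefficient matrix of the homogeneous system in Example 3
    A = Matrix([[1, 2,  1,  -4, 1],
                [1, 2, -1,   2, 1],
                [2, 4,  1,  -5, 0],
                [1, 2,  3, -10, 1]])

    for v in A.nullspace():
        print(v.T)   # (-2, 1, 0, 0, 0) and (1, 0, 3, 1, 0)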

Exercises 4.7

In Problems 1-14, find all solutions of the given system of equations.

 1.  x1 + 2x2 +  x3 = 5             2.  x1 +  x2 -  x3 = 4
    -x1 -  x2 +  x3 = 2                3x1 + 3x2 - 2x3 = 11
           x2 + 3x3 = 1                4x1 + 5x2 - 3x3 = 17

 3. 2x1       +  x3 =  1            4.  x1 -  x2 -  x3 = 0
     x1 +  x2       = -2               2x1        -  x3 = 1
    2x1 + 4x2 -  x3 = -3               3x1 +  x2 -  x3 = 2

 5.  2x1 +  x2 = -4                 6.  9x1 -  6x2 = 15
      x1 -  x2 =  4                    15x1 - 10x2 = 25
    -3x1 + 3x2 =  2                     6x1 -  4x2 = 10

 7.  x1 +  2x2 +  5x3        = 0    8. 4x1 + 4x2 - 7x3 + 3x4
    4x1 + 12x2 + 21x3 + 2x4 = 0        3x1 + 3x2 - 5x3 + 2x4
    3x1 +  6x2 + 15x3 - 3x4 = 0

 9.  x1 - 2x2 - 2x3 - 3x4 = 1
    2x1 - 4x2 + 2x3       = 2
    3x1 - 6x2 +  x3 - 2x4 = 3
                 x3 +  x4 = 0

10.  4x1 + 3x2 + 2x3 -  x4 =  4
     5x1 + 4x2 + 3x3 -  x4 =  4
    -2x1 - 2x2 -  x3 + 2x4 = -3
    11x1 + 6x2 + 4x3 +  x4 = 11

11.  x1 + 3x2 -  x3        + 2x5 =  2
    2x1 + 6x2 +  x3 + 6x4 + 4x5 = 13
    -x1 - 3x2        - 2x4 - 2x5 = -5

12.  x1 +  x2 +  x3 +  x4 +  x5 = 2
     x1 +  x2 + 2x3 + 3x4 + 4x5 = 4
    2x1 + 2x2 + 3x3 + 4x4 + 5x5 = 6

13.  x1 + 2x2        +  x4        = 1
    2x1 + 4x2 +  x3 + 4x4 + 3x5 = 6
     x1 + 2x2 + 2x3 + 4x4 - 2x5 = 1
    -x1 - 2x2 + 3x3 + 5x4 + 4x5 = 6

14.  x1 +  x2 + 3x3 + 2x4        = 0
     x1 -  x2 +  x3              = 2
    2x1 + 2x2 + 6x3 + 4x4        = 0
    3x1        + 6x3 + 3x4 -  x5 = 3

In Problems 15 and 16, (a) find the rank of the coefficient matrix, (b) find the rank of the augmented matrix, and (c) determine if the system is consistent by comparing these ranks. It is not necessary to solve the systems.

15.  x1 + 2x2 -  x3 = 3             16.  x1 + 2x2 + 3x3 + x4 = 0
           x2 +  x3 = 1                        x2 +  x3 + x4 = 0
     x1       - 3x3 = 0                  x1       +  x3 - x4 = 1

17. For the following matrix A, (a) find a basis of the column space of A, and (b) express each column vector of A as a linear combination of the vectors in your basis.

        A = [ 3   2  -2    7]
            [ 6   4  -4   14]
            [ 1   1   2    3]
            [-5  -4  -2  -13]
            [11   8  -2   27]

18. Find all real numbers a and b for which the system of equations below does not have a solution.

     x1        +  x3 = 1
    ax1 +  x2 + 2x3 = 0
    3x1 + 4x2 + bx3 = 2

In Problems 19 and 20, find the values of the real number a for which the given system (a) has no solution, (b) has exactly one solution, (c) has infinitely many solutions.

19.  x1 + 2x2 -          x3 = 2
    2x1 + 6x2 +         3x3 = 4
    3x1 + 8x2 + (a^2 - 2)x3 = a + 8

20.  x1 +  x2 +          x3 = 3
    2x1 + 3x2 +         3x3 = 8
    3x1 + 3x2 + (a^2 - 6)x3 = a + 6

21. Let B = {[x1 = 0], [x2 = 0], ..., [xn = 0], [x1 = 1]}.

    (a) Prove that B is a basis of Q_n.
    (b) What is the dimension of Q_n over F?

22. For each of Problems 1-14, let A be the coefficient matrix of the given system. Let W be the subspace consisting of all (x1, x2, ..., xn) in R^n that satisfy the system AX = 0. Find a basis of W in each case.

23. Let W be the subspace of all (x1, x2, ..., xn) in F^n that satisfy the system AX = 0, and let c = (c1, c2, ..., cn), where x1 = c1, x2 = c2, ..., xn = cn is a particular solution to the system AX = B. Prove that c + W is the complete set of solutions to AX = B.
Chapter 5

Linear Transformations

5.1 Introduction
In this chapter, the important concept of a linear transformation of a vector space is
introduced. Matrices prove to be a powerful tool in the study of linear transformations
of finite-dimensional vector spaces. They can be used to classify linear transformations
according to certain equivalence relations that are based on fundamental properties
common to different linear transformations.

5.2 Linear Transformations


Suppose that f is a mapping of S into T, and let A be an arbitrary subset of S. The set

    f(A) = {t | t = f(s) for some s in A}

is called the image of A under f. Thus, f is a surjective mapping of S into T if and only if f(S) = T. For any subset B of T, the set

    f^(-1)(B) = {s in S | f(s) in B}

is called the inverse image of B. In particular, if B consists of a single element t,

    f^(-1)(t) = {s in S | f(s) = t}.

Thus, f is injective if, for every t in f(S), f^(-1)(t) consists of exactly one element. We write f : S -> T to indicate that f is a mapping of S into T.
Our interest in this chapter is with those mappings of one vector space into another. Throughout the chapter, U and V will denote vector spaces over the same field F.

Definition 5.1 A linear transformation T is a mapping T : U -> V which has the property that

    T(au + bw) = aT(u) + bT(w)

for all u, w in U and all a, b in F.


We recall from Section 4.4 that an isomorphism is a bijective mapping f that has the property f(au + bw) = af(u) + bf(w) required in Definition 5.1. Hence every
isomorphism is a linear transformation. However, a linear transformation of U into V
may be neither injective nor surjective, even though it preserves linear combinations
just as an isomorphism does.
In addition to the isomorphisms, another family of examples of linear transformations
is provided by the zero transformations. For a given pair of vector spaces U and V, the
zero linear transformation is the mapping Z : U —> V defined by Z(u) = 0 for all
u in U.
The following examples provide some more detailed illustrations concerning linear
transformations.
Example 1 □ Consider the mapping T : R^2 -> R^2 defined by¹

    T(x, y) = (4x + 5y, 6x - y).

For arbitrary u = (u1, u2), w = (w1, w2) in R^2 and arbitrary a, b in R, we have

    T(au + bw) = T(au1 + bw1, au2 + bw2)
               = (4(au1 + bw1) + 5(au2 + bw2), 6(au1 + bw1) - (au2 + bw2))
               = (4au1 + 5au2, 6au1 - au2) + (4bw1 + 5bw2, 6bw1 - bw2)
               = aT(u) + bT(w),

and therefore T is a linear transformation of R^2 into R^2. ■

¹For notational convenience, the outer parentheses in T((x1, x2, ..., xn)) are dropped for mappings defined on R^n.
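A quick numerical experiment can lend evidence to such a verification, although it cannot replace the algebraic proof above. The following minimal sketch assumes Python with NumPy and simply tests the defining property at randomly chosen vectors and scalars.

    import numpy as np

    def T(v):
        x, y = v
        return np.array([4 * x + 5 * y, 6 * x - y])

    rng = np.random.default_rng(0)
    u, w = rng.standard_normal(2), rng.standard_normal(2)
    a, b = rng.standard_normal(2)

    # For a linear transformation, T(au + bw) must equal aT(u) + bT(w)
    print(np.allclose(T(a * u + b * w), a * T(u) + b * T(w)))   # True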


A linear transformation of a vector space V into itself is called a linear operator. The mapping T in Example 1 provides an example of a linear operator.

Example 2 □ Let Vn denote the vector space that consists of all polynomials in x with degree n or less and coefficients in R, and consider the mapping T : V1 -> V2 defined by

    T(p(x)) = (1 + x)p(x)

for all p(x) in V1. A specific computation of a value of T is provided by

    T(3 + 2x) = (1 + x)(3 + 2x)
              = 3 + 5x + 2x^2.

For arbitrary p(x), q(x) in V1 and arbitrary a, b in R, the polynomials T(p(x)) and T(q(x)) are in V2 and

    T(ap(x) + bq(x)) = (1 + x)(ap(x) + bq(x))
                     = a(1 + x)p(x) + b(1 + x)q(x)
                     = aT(p(x)) + bT(q(x)).

Thus T is a linear transformation of V1 into V2. ■

Example 3 □ Let the mapping T : R^2 -> R^2 be defined by

    T(x, y) = (x + 1, 2x + y).

For any u = (u1, u2), w = (w1, w2) in R^2 and scalars a, b in R,

    T(au + bw) = T(au1 + bw1, au2 + bw2)
               = (au1 + bw1 + 1, 2au1 + 2bw1 + au2 + bw2)

and

    aT(u) + bT(w) = a(u1 + 1, 2u1 + u2) + b(w1 + 1, 2w1 + w2)
                  = (au1 + bw1 + a + b, 2au1 + au2 + 2bw1 + bw2).

We see that the equality

    T(au + bw) = aT(u) + bT(w)

holds if and only if a + b = 1. Since this equation is not always true for a and b in R, we conclude that T is not a linear transformation. ■

In order to indicate some of the variety in linear transformations, we consider some


more examples.
Example 4 □ Let A = [a_ij]_(m x n) be a fixed (that is, constant) matrix over R and define T : R^(n x 1) -> R^(m x 1) by

    T(X) = AX

for all

    X = [x1]
        [x2]
        [..]
        [xn]

in R^(n x 1). Let X, Y be arbitrary vectors in R^(n x 1), and let a, b be arbitrary real numbers. We have both T(X) = AX and T(Y) = AY in R^(m x 1), and

    T(aX + bY) = A(aX + bY)
               = A(aX) + A(bY)
               = aAX + bAY
               = aT(X) + bT(Y).

Thus T is a linear transformation of R^(n x 1) into R^(m x 1). This type of linear transformation is called a matrix transformation. ■

Example 5 □ We saw in Example 4 and Problem 15 of Section 4.2 that the set V of all real-valued functions of t with domain R and the set W of all differentiable functions of t with domain R form vector spaces over R with respect to the usual operations of addition and scalar multiplication. Consider the mapping T : W -> V defined by

    T(f(t)) = d/dt f(t) = f'(t).

That is, T maps each differentiable function onto its derivative. Using familiar facts from the calculus, we get

    T(af(t) + bg(t)) = d/dt (af(t) + bg(t))
                     = d/dt (af(t)) + d/dt (bg(t))
                     = aT(f(t)) + bT(g(t)).

Thus the process of differentiating functions in W is a linear transformation from W to V. ■
In this chapter, we study linear transformations as mathematical quantities themselves. In our study, we need to define three operations on them. Addition and multiplication by a scalar are considered first.

Definition 5.2 If S and T are linear transformations of U into V, the sum S + T is defined by

    (S + T)(u) = S(u) + T(u)

for all u in U. Also, for each linear transformation T of U into V and each scalar a in F, we define the product of a and T to be the mapping aT of U into V given by

    (aT)(u) = a(T(u)).
Example 6 □ Let U = R^3 and V = R^2, and let S and T be defined by

    S(x1, x2, x3) = (2x1 + x3, 3x1 - x2),
    T(x1, x2, x3) = (-x1 + 3x2, 2x1 - x2 + x3).

It is easy to verify that S and T are linear transformations. For example, if u = (u1, u2, u3), w = (w1, w2, w3) and a, b are in R, then

    S(au + bw) = S(au1 + bw1, au2 + bw2, au3 + bw3)
               = (2(au1 + bw1) + au3 + bw3, 3(au1 + bw1) - (au2 + bw2))
               = (2au1 + au3, 3au1 - au2) + (2bw1 + bw3, 3bw1 - bw2)
               = a(2u1 + u3, 3u1 - u2) + b(2w1 + w3, 3w1 - w2)
               = aS(u) + bS(w).

The sum S + T is given by

    (S + T)(x1, x2, x3) = (2x1 + x3, 3x1 - x2) + (-x1 + 3x2, 2x1 - x2 + x3)
                        = (x1 + 3x2 + x3, 5x1 - 2x2 + x3),

and the product aT is given by

    (aT)(x1, x2, x3) = a(-x1 + 3x2, 2x1 - x2 + x3)
                     = (-ax1 + 3ax2, 2ax1 - ax2 + ax3). ■

With the definitions of addition and scalar multiplication given in Definition 5.2, the linear transformations of U into V can be regarded as possible vectors. The next theorem shows that they are indeed vectors.

Theorem 5.3 Let U and V be vector spaces over the same field F. Then the set of all linear transformations of U into V is a vector space² over F.

²The notation L(U, V) is often used to denote this vector space.

Proof. For a complete proof, each of the ten conditions of Definition 4.2 must be verified. We verify the first six here, leaving the others as exercises. Let T1, T2, and T3 denote arbitrary linear transformations of U into V, let u and w be arbitrary vectors in U, and let a, b, and c be scalars.
Since

    (T1 + T2)(au + bw) = T1(au + bw) + T2(au + bw)
                       = aT1(u) + bT1(w) + aT2(u) + bT2(w)
                       = a[T1(u) + T2(u)] + b[T1(w) + T2(w)]
                       = a(T1 + T2)(u) + b(T1 + T2)(w),

T1 + T2 is a linear transformation of U into V.
Addition is associative, since

    (T1 + (T2 + T3))(u) = T1(u) + (T2 + T3)(u)
                        = T1(u) + [T2(u) + T3(u)]
                        = [T1(u) + T2(u)] + T3(u)
                        = (T1 + T2)(u) + T3(u)
                        = ((T1 + T2) + T3)(u).

The zero linear transformation Z is an additive identity, since

    (T1 + Z)(u) = T1(u) + Z(u) = T1(u) + 0 = T1(u)

for all u in U.
The additive inverse of T1 is the linear transformation -T1 of U into V defined by (-T1)(u) = -T1(u), since

    (T1 + (-T1))(u) = T1(u) + (-T1(u)) = 0

for all u in U.
For any u in U,

    (T1 + T2)(u) = T1(u) + T2(u) = T2(u) + T1(u) = (T2 + T1)(u),

so T1 + T2 = T2 + T1.
Since

    (cT1)(au + bw) = c(T1(au + bw))
                   = c(aT1(u) + bT1(w))
                   = a(cT1(u)) + b(cT1(w)),

cT1 is a linear transformation of U into V. ■

There is a third operation involving transformations that we will consider later.

Theorem 5.4 Let T be a linear transformation of U into V. If U1 is any subspace of U, then T(U1) is a subspace of V.

Proof. Let U1 be a subspace of U. Then 0 is in U1 and T(0) = 0, so T(U1) is nonempty.
Let v1, v2 be in T(U1), and a1, a2 be in F. There exist vectors u1, u2 in U1 such that T(u1) = v1 and T(u2) = v2. Since U1 is a subspace of U, a1u1 + a2u2 is in U1 and

    T(a1u1 + a2u2) = a1T(u1) + a2T(u2)
                   = a1v1 + a2v2.

Thus, a1v1 + a2v2 is in T(U1), and T(U1) is a subspace of V. ■

Definition 5.5 The subspace T(U) of V is called the range of T. The dimension of T(U) is called the rank of T. The rank of T will be denoted by rank(T).

Near the end of this section we will devise a method for finding the rank of T when U and V are finite-dimensional.

Theorem 5.6 Let T be a linear transformation of U into V. If W is any subspace of V, the inverse image T^(-1)(W) is a subspace of U.
Proof. Let W be a subspace of V. Then 0 is in W and hence 0 is in T^(-1)(W) since T(0) = 0. Thus T^(-1)(W) is nonempty. Let u1, u2 be in T^(-1)(W), and let a1, a2 be in F. Then v1 = T(u1) and v2 = T(u2) are in W, and this means that

    a1v1 + a2v2 = a1T(u1) + a2T(u2) = T(a1u1 + a2u2)

is in W. Therefore, a1u1 + a2u2 is in T^(-1)(W), and T^(-1)(W) is a subspace of U. ■

Definition 5.7 The subspace T^(-1)(0) is called the kernel of the linear transformation T. The dimension of T^(-1)(0) is the nullity of T, denoted by nullity(T).

We concentrate our attention now on the case where U is finite-dimensional.

Theorem 5.8 If T is a linear transformation of U into V and A = {u1, u2, ..., un} is a basis of U, then T(A) spans T(U).

Proof. Suppose that A = {u1, u2, ..., un} is a basis of U, and consider the set T(A) = {T(u1), T(u2), ..., T(un)}. For any vector v in T(U), there is a vector u in U such that T(u) = v. The vector u can be written as u = Σ_{i=1}^{n} a_i u_i since A is a basis of U. This gives v = T(Σ_{i=1}^{n} a_i u_i) = Σ_{i=1}^{n} a_i T(u_i), and T(A) spans T(U). ■

The essence of a method for finding the rank of a linear transformation T is contained in the proof of Theorem 5.8. For the set T(A) contains a basis of T(U), and the number of elements in the basis is rank(T). Our next example demonstrates the use of this method to find a basis for the range of a linear transformation T, and we also find a basis for the kernel of T. More efficient and systematic methods for finding these bases are developed in the next section.

Example 7 □ It is given that the mapping T : R^(2x3) -> R^5 defined by

    T [a11  a12  a13]  =  (a11 + a21, 2a11 + a21 - a12, a22 + a13, 0, a23)
      [a21  a22  a23]

is a linear transformation of R^(2x3) into R^5. We shall (a) find a basis for the range of T, (b) find a basis for the kernel of T, and (c) state the rank and nullity of T.
(a) In order to find a basis for T(R^(2x3)), we first obtain a spanning set T(A) as described in the proof of Theorem 5.8. The set A = {A1, A2, A3, A4, A5, A6} forms a basis of R^(2x3), where

    A1 = [1  0  0]   A2 = [0  1  0]   A3 = [0  0  1]
         [0  0  0]        [0  0  0]        [0  0  0]

    A4 = [0  0  0]   A5 = [0  0  0]   A6 = [0  0  0]
         [1  0  0]        [0  1  0]        [0  0  1]

The set T(A) is given by

    T(A) = {(1,2,0,0,0), (0,-1,0,0,0), (0,0,1,0,0), (1,1,0,0,0), (0,0,1,0,0), (0,0,0,0,1)},

and we know that T(A) contains a basis of T(R^(2x3)). Using the refinement process from Section 1.5 and Example 3 in Section 4.3, we find that the first three vectors in T(A) are linearly independent and the fourth vector can be written as

    (1,1,0,0,0) = (1)(1,2,0,0,0) + (1)(0,-1,0,0,0) + (0)(0,0,1,0,0).

Thus the fourth vector can be deleted from the spanning set T(A). The fifth vector in T(A) is a repetition of the third vector, so it can also be deleted. The last vector is clearly not a linear combination of the preceding vectors since it is the only one with a nonzero fifth component. Thus

    {(1,2,0,0,0), (0,-1,0,0,0), (0,0,1,0,0), (0,0,0,0,1)}

is a basis for the range of T.


(b) To find a basis for T^(-1)(0), we set T(v) = 0 and obtain the system of equations

     a11        + a21       = 0
    2a11 + a21  - a12       = 0
            a13 + a22       = 0
                          0 = 0
                        a23 = 0.

Using Gauss-Jordan elimination with the unknowns ordered a11, a21, a12, a22, a13, a23, we find that the reduced row-echelon form for the augmented matrix is

    [1  0  -1  0  0  0  0]
    [0  1   1  0  0  0  0]
    [0  0   0  1  1  0  0]
    [0  0   0  0  0  1  0]
    [0  0   0  0  0  0  0]

Solving for the leading variables in terms of the parameters, we get

    a11 =  a12
    a21 = -a12
    a12 =  a12
    a22 = -a13
    a13 =  a13
    a23 =  0.

Substituting for the leading variables leads to

    [a11  a12  a13]  =  [ a12   a12  a13]
    [a21  a22  a23]     [-a12  -a13    0]

                     =  a12 [ 1  1  0]  +  a13 [0   0  1]
                            [-1  0  0]         [0  -1  0]

Thus the set

    { [ 1  1  0]    [0   0  1] }
    { [-1  0  0],   [0  -1  0] }

forms a basis for the kernel T^(-1)(0).
(c) From the number of vectors in our bases, we see that

    rank(T) = dim(T(R^(2x3))) = 4

and

    nullity(T) = dim(T^(-1)(0)) = 2.   ■

We note that the sum of the rank and nullity of T is equal to the dimension of the
domain of T in the example above. Our next theorem states that this equality always
holds.
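Anticipating Theorem 5.11 in the next section, the rank of T in Example 7 can also be read off from the matrix whose columns are the images of the basis vectors A1, ..., A6. The sketch below is an assumption of this illustration and uses Python with NumPy to confirm that rank plus nullity equals the dimension of the domain.

    import numpy as np

    # Columns are T(A1), ..., T(A6) for the basis of R^(2x3) used in Example 7
    M = np.array([[1,  0, 0, 1, 0, 0],
                  [2, -1, 0, 1, 0, 0],
                  [0,  0, 1, 0, 1, 0],
                  [0,  0, 0, 0, 0, 0],
                  [0,  0, 0, 0, 0, 1]])

    rank = np.linalg.matrix_rank(M)
    print(rank, M.shape[1] - rank)   # 4 and 2: rank(T) + nullity(T) = 6 = dim(R^(2x3))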

Theorem 5.9 Let T be a linear transformation of U into V. If U has finite dimension, then

    rank(T) + nullity(T) = dim(U).

Proof. Suppose that U has dimension n, and let k be the nullity of T. Choose {u1, u2, ..., uk} to be a basis of the kernel T^(-1)(0). This linearly independent set can be extended to a basis

    A = {u1, u2, ..., uk, u_{k+1}, ..., un}

of U. According to Theorem 5.8, the set T(A) spans T(U). But T(u1) = T(u2) = ... = T(uk) = 0, so this means that the set of n - k vectors {T(u_{k+1}), T(u_{k+2}), ..., T(un)} spans T(U). To show that this set is linearly independent, suppose that

    c_{k+1} T(u_{k+1}) + c_{k+2} T(u_{k+2}) + ... + c_n T(un) = 0.

Then

    T(c_{k+1} u_{k+1} + c_{k+2} u_{k+2} + ... + c_n un) = 0,

and Σ_{i=k+1}^{n} c_i u_i is in T^(-1)(0). Thus there are scalars d1, d2, ..., dk such that Σ_{i=1}^{k} d_i u_i = Σ_{i=k+1}^{n} c_i u_i and

    Σ_{i=1}^{k} d_i u_i - Σ_{i=k+1}^{n} c_i u_i = 0.

Since A is a basis, each c_i and each d_i must be zero. Hence {T(u_{k+1}), T(u_{k+2}), ..., T(un)} is a basis of T(U). Since rank(T) is the dimension of T(U), n - k = rank(T) and

    rank(T) + nullity(T) = n = dim(U). ■

In the next section, a systematic method for finding a basis of T^(-1)(0) is described.

Exercises 5.2

1. Let S and T be the mappings of R^2 into R^2 defined by S(x1, x2) = (x1 + x2, x1 - x2) and T(x1, x2) = (-x2, -x1).

    (a) Prove that each of S and T is a linear operator.
    (b) Find S + T and 2S - 3T.

2. Determine whether the given mapping T : R^2 -> R^2 is a linear operator.

    (a) T(x, y) = (x - y, 0)        (b) T(x, y) = (xy, x)          (c) T(x, y) = (x + 1, y - 1)
    (d) T(x, y) = (x + y, x - y)    (e) T(x, y) = x(2, 1)          (f) T(x, y) = (x - y)(x + y, 0)

3. Determine whether the given mapping T : R^3 -> R^2 is a linear transformation.

    (a) T(x, y, z) = (2x + y, x + z)           (b) T(x, y, z) = (x - y, x^2 - y^2)
    (c) T(x, y, z) = (x + y + 1, x + y - 1)    (d) T(x, y, z) = (x + y + z, 0)

4. Determine which of the following mappings T : V1 -> V1 are linear operators.

    (a) T(a0 + a1x) = a0x                     (b) T(a0 + a1x) = a0 + a1(x + 1)
    (c) T(a0 + a1x) = a0a1 + a0x              (d) T(a0 + a1x) = a0 + a1 + a0x

5. Determine which of the following mappings T : R^(2x2) -> R^(2x2) are linear operators. In parts (a)-(c), A denotes a constant nonzero 2 x 2 matrix over R.

    (a) T(X) = A + AX      (b) T(X) = AX - XA
    (c) T(X) = AXA         (d) T(X) = 2X

6. Determine whether the given mapping is a linear transformation.

    (a) T : R^(m x n) -> R^(n x m), T(X) = X^T, where X^T is the transpose of X
    (b) T : R^(m x n) -> R^(n x n), T(X) = X^T X, where X^T is the transpose of X
    (c) T : R^(n x n) -> R, T(X) = Σ_{i=1}^{n} x_ii, where X = [x_ij]_(n x n)  (The value of this mapping is called the trace of X.)
    (d) T : V2 -> V1, T(a0 + a1x + a2x^2) = (a0 + a1) + (a1 + a2)x
    (e) T : V2 -> V2, T(p(x)) = p(x - 1)
    (f) T : V2 -> V3, T(p(x)) = xp(x) + p(1)

7. Let T be the linear transformation of R^4 into R^3 given by

    T(x1, x2, x3, x4) = (3x1 - 2x2 - x3 - 4x4, x1 + x2 - 2x3 - 3x4, 2x1 - 3x2 + x3 - x4).

    (a) Find a basis of T(R^4).
    (b) Find two linearly independent vectors in the kernel of T.

8. Let T be the mapping of R^(2x2) into R^(2x2) defined by

    T [a11  a12]  =  [2a11 + a12    2a21 - a22]
      [a21  a22]     [a11 + 3a22    a21 - 3a12]

    (a) Prove that T is a linear transformation.
    (b) Find a basis for the range of T.

9. Find the rank and nullity of the given linear transformation of U into V.

    (a) U = V = R^4,
        T(x1, x2, x3, x4) = (x1 + x2 + x3 - x4, x1 + 2x2 + x3 - x4,
                             x1 - x2 + 3x3, -x1 + 5x2 - 5x3 - x4)
    (b) U = R^4, V = R^3,
        T(x1, x2, x3, x4) = (x1 + 2x2 + x3 + 3x4, 2x3 - 4x4, x1 + 2x2 + 3x3 - x4)

10. In each part of Problem 9, find the standard basis of the kernel of T by solving the system of equations that results from setting T(u) = 0.

11. Let T be a linear transformation of U into V. Prove the following statements.

    (a) T(0) = 0
    (b) T(-u) = -T(u) for all u in U
    (c) T(u - w) = T(u) - T(w) for all u, w in U
    (d) T(Σ_{i=1}^{n} a_i u_i) = Σ_{i=1}^{n} a_i T(u_i) for all scalars a_i in F and vectors u_i in U

12. Let T be a linear transformation of U into V, and let U1 be a subspace of U. Prove that if A spans U1, then T(A) spans T(U1). (Note: A is not necessarily finite.)

13. In each part below, T is a linear transformation of R^4 into R^3. For the given subspace U1 and the basis B of R^3, find the standard basis of T(U1) relative to B.

    (a) T(x1, x2, x3, x4) = (x1 + 3x2 + x4, 3x1 + 5x2 - x3 + 2x4, 5x1 + 2x2 - 2x3 - 2x4),
        U1 = ⟨(1,0,1,0), (1,0,2,0), (0,1,0,-1)⟩,  B = {(1,0,1), (0,1,1), (1,1,1)}
    (b) T(x1, x2, x3, x4) = (x1 - x3 + 3x4, x2 + 2x4, 2x1 - x2 - 2x3 + 4x4),
        U1 = ⟨(2,0,1,1), (-1,1,0,0), (4,0,-2,0)⟩,  B = {(1,2,0), (1,0,2), (0,0,1)}

14. Complete the proof of Theorem 5.3.

15. Let T be a linear transformation of U into V with nullity 0. Prove that if T(u4) is dependent on {T(u1), T(u2), T(u3)}, then u4 is dependent on {u1, u2, u3}.

16. Let T be a linear transformation of U into V, and let U1 and U2 be subspaces of U. Prove that T(U1 + U2) = T(U1) + T(U2).

17. Let T be a linear transformation of U into V, and let U1 and W denote subspaces of U and V, respectively. Prove or disprove the statements below.

    (a) T(T^(-1)(W)) = W        (b) T^(-1)(T(U1)) = U1

5.3 Linear Transformations and Matrices


In this and the following section, U and V will denote vector spaces of dimension n and m, respectively, over the same field F, and T will denote a linear transformation of U into V.
Suppose that A = {u1, u2, ..., un} is a basis of U. Any u in U can be written uniquely in the form u = Σ_{j=1}^{n} x_j u_j, and

    T(u) = Σ_{j=1}^{n} x_j T(u_j).

This shows that the value of T at every u in U is determined by the values of T at the basis vectors u1, u2, ..., un. If B = {v1, v2, ..., vm} is a basis of V, then each T(u_j) can be written uniquely as

    T(u_j) = Σ_{i=1}^{m} a_ij v_i.

Thus, with each choice of bases A and B, a linear transformation T of U into V determines a unique indexed set {a_ij} of mn elements of F. These elements make up the matrix of T relative to the bases A and B.

Definition 5.10 Suppose that A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} are bases of U and V, respectively. Let T be a linear transformation of U into V. The matrix of T relative to the bases A and B is the matrix

    A = [a_ij]_(m x n) = [T]_{B,A}

where the a_ij are determined by the conditions

    T(u_j) = Σ_{i=1}^{m} a_ij v_i
           = a_1j v1 + a_2j v2 + ... + a_mj vm

for j = 1, 2, ..., n.

The symbols A = [a_ij]_(m x n) and [T]_{B,A} in Definition 5.10 denote the same matrix, but the first one places notational emphasis on the elements of the matrix, while the second one places emphasis on T and the bases A and B. This matrix A is also referred to as the matrix of T with respect to A and B, and we say that T is represented by the matrix A.
As mentioned earlier, the elements a_ij are uniquely determined by T for given bases A and B. Another way to describe A is to observe that the j-th column of A is the coordinate matrix of T(u_j) with respect to B. That is,

    [a_1j]
    [a_2j]  =  [T(u_j)]_B
    [ ...]
    [a_mj]

and

    A = [T]_{B,A} = [ [T(u1)]_B, [T(u2)]_B, ..., [T(un)]_B ].

Example 1 □ With R as the field of scalars, let T : V2 -> V3 be the linear transformation defined by

    T(a0 + a1 x + a2 x^2) = (2a0 + 2a2) + (a0 + a1 + 3a2)x + (a1 + 2a2)x^2 + (a0 + a2)x^3.

We shall find the matrix A of T relative to the bases

    A = {1, 1 - x, x^2} of V2

and

    B = {1, x, 1 - x^2, 1 + x^3} of V3.

To find the first column of A, we compute T(1) and write it as a linear combination of the vectors in B.

    T(1) = 2 + x + x^3
         = (1)(1) + (1)(x) + (0)(1 - x^2) + (1)(1 + x^3).

Thus the first column of A is

    [T(1)]_B = [1]
               [1]
               [0]
               [1]

For the remaining columns of A, we follow the same procedure with the second and third basis vectors in A.

    T(1 - x) = 2 - x^2 + x^3
             = (0)(1) + (0)(x) + (1)(1 - x^2) + (1)(1 + x^3)

    T(x^2) = 2 + 3x + 2x^2 + x^3
           = (3)(1) + (3)(x) + (-2)(1 - x^2) + (1)(1 + x^3).

Thus the matrix of T with respect to A and B is given by

    A = [T]_{B,A} = [ [T(1)]_B, [T(1-x)]_B, [T(x^2)]_B ] = [1  0   3]
                                                           [1  0   3]
                                                           [0  1  -2]
                                                           [1  1   1]
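The columns of [T]_{B,A} can be produced mechanically by solving for the coordinates of each image T(u_j) relative to B. The following is a minimal sketch, assuming Python with SymPy; polynomials are represented by their coefficient columns, a convention adopted only for this illustration.

    from sympy import Matrix

    def T(a):                        # a = (a0, a1, a2); result as coefficients of 1, x, x^2, x^3
        a0, a1, a2 = a
        return Matrix([2*a0 + 2*a2, a0 + a1 + 3*a2, a1 + 2*a2, a0 + a2])

    A_basis = [Matrix([1, 0, 0]), Matrix([1, -1, 0]), Matrix([0, 0, 1])]     # 1, 1 - x, x^2
    B_basis = [Matrix([1, 0, 0, 0]), Matrix([0, 1, 0, 0]),
               Matrix([1, 0, -1, 0]), Matrix([1, 0, 0, 1])]                  # 1, x, 1 - x^2, 1 + x^3

    PB = Matrix.hstack(*B_basis)
    cols = [PB.solve(T(u)) for u in A_basis]     # [T(u_j)]_B for each basis vector of A
    print(Matrix.hstack(*cols))                  # the 4 x 3 matrix [T]_{B,A} found above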

If T is a linear transformation of R^n into R^m, it is usually easier to find the matrix of T relative to E_n and E_m than with any other choice of bases, because the coordinates of a vector are the same as the components when working with the standard bases. However, we shall see in the next section that other choices of bases may give a much simpler matrix for T.

Example 2 □ Consider the linear transformation T : R^4 -> R^3 given by

    T(x1, x2, x3, x4) = (x1 - x3 + x4, 2x1 + x2 + 3x4, x1 + 2x2 + 3x3 + 3x4).

Using the standard bases E_4 and E_3, we compute

    T(1,0,0,0) = (1,2,1)      T(0,1,0,0) = (0,1,2)
    T(0,0,1,0) = (-1,0,3)     T(0,0,0,1) = (1,3,3).

Thus the matrix of T relative to E_4 and E_3 is

    [T]_{E_3,E_4} = [1  0  -1  1]
                    [2  1   0  3]
                    [1  2   3  3]

The mapping f : R^n -> R^(n x 1) defined by

    f(x1, x2, ..., xn) = [x1]
                         [x2]
                         [..]
                         [xn]

is a natural isomorphism. It is so natural, in fact, that some texts make no distinction between

    (x1, x2, ..., xn)   and   [x1]
                              [x2]
                              [..]
                              [xn]

and consider them as being the same entity. This isomorphism leads to a natural connection between the preceding example and Example 4 of Section 5.2. From that example, we know that the matrix transformation S : R^(4 x 1) -> R^(3 x 1) defined by

    S( [x1] )   [1  0  -1  1] [x1]   [ x1 - x3 + x4         ]
     ( [x2] ) = [2  1   0  3] [x2] = [ 2x1 + x2 + 3x4       ]
     ( [x3] )   [1  2   3  3] [x3]   [ x1 + 2x2 + 3x3 + 3x4 ]
     ( [x4] )                 [x4]

is a linear transformation. It is clear at a glance that the matrix transformation S and the mapping T in Example 2 are the same except for notation. That is, the two mappings differ only by an isomorphism.
The work in Examples 1 and 2 of this section illustrates that, for given bases A and B, the matrix [T]_{B,A} is uniquely determined by T. On the other hand, for a given matrix A = [a_ij]_(m x n) and fixed bases A and B, there is only one linear transformation that has A as its matrix relative to A and B. For if A is the matrix of both S and T, then we have S(u_j) = Σ_{i=1}^{m} a_ij v_i = T(u_j) and, for any u = Σ_{j=1}^{n} x_j u_j in U,

    S(u) = S(Σ_{j=1}^{n} x_j u_j) = Σ_{j=1}^{n} x_j S(u_j) = Σ_{j=1}^{n} x_j T(u_j) = T(Σ_{j=1}^{n} x_j u_j) = T(u).

But S(u) = T(u) for all u in U means S = T. Thus, for fixed bases A and B, T and A = [T]_{B,A} determine each other uniquely by the rule

    T(u_j) = Σ_{i=1}^{m} a_ij v_i,    or    [a_1j]
                                            [a_2j]  =  [T(u_j)]_B.
                                            [ ...]
                                            [a_mj]
Generally speaking, a change in either or both of the bases A and B is reflected


in a change in the matrix of the linear transformation. With each particular choice of
A and B, we say that the matrix A of T relative to A and B represents T relative to
these bases. Thus different matrices may represent the same linear transformation with
different choices of bases. These matrices, though different, have certain properties in
common. The exact relationship between these matrices is revealed in Section 5.4, but
the next theorem gives some useful information on this subject.

Theorem 5.11 If A is any matrix that represents the linear transformation T of U into V, then rank(A) = rank(T). That is, rank([T]_{B,A}) = rank(T).

Proof. Let A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} be bases of U and V, respectively, and suppose that A = [a_ij]_(m x n) represents T relative to these bases. Then T(u_j) = Σ_{i=1}^{m} a_ij v_i, so that A is the matrix of transition from B to T(A). According to Theorem 5.8, T(A) spans T(U). This means that the columns of A record the coordinates relative to B of a spanning set for T(U). By Theorem 3.37, the reduced column-echelon form A' of A also records the coordinates relative to B of a spanning set for T(U). Hence the number of nonzero columns in A' is the dimension of T(U), and we have rank(A) = rank(T). ■

Thus, a convenient method for finding the rank of a linear transformation T is to find the rank of a matrix that represents T relative to a pair of bases A and B. As a matter of fact, the reduced column-echelon form of such a matrix will disclose not only the rank of T, but also the standard basis of T(U) relative to B.

Example 3 □ In Example 1 of this section, we found that T : V2 -> V3 defined by

    T(a0 + a1 x + a2 x^2) = (2a0 + 2a2) + (a0 + a1 + 3a2)x + (a1 + 2a2)x^2 + (a0 + a2)x^3

has the matrix

    A = [1  0   3]
        [1  0   3]
        [0  1  -2]
        [1  1   1]

relative to A = {1, 1 - x, x^2} and B = {1, x, 1 - x^2, 1 + x^3}. The matrix A can be transformed to reduced column-echelon form as follows.

    A = [1  0   3]     [1  0   0]     [1  0  0]
        [1  0   3]  ~  [1  0   0]  ~  [1  0  0]  = A'
        [0  1  -2]     [0  1  -2]     [0  1  0]
        [1  1   1]     [1  1  -2]     [1  1  0]

Thus rank(T) = rank(A) = 2, and the computations

    (1)(1) + (1)(x) + (0)(1 - x^2) + (1)(1 + x^3) = 2 + x + x^3
    (0)(1) + (0)(x) + (1)(1 - x^2) + (1)(1 + x^3) = 2 - x^2 + x^3

show that the standard basis of T(V2) relative to B is

    {2 + x + x^3, 2 - x^2 + x^3}. ■
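The column reduction in Example 3 can be carried out by software as well. The sketch below assumes Python with SymPy and obtains the reduced column-echelon form by row-reducing the transpose; the nonzero columns then give the coordinates, relative to B, of the standard basis of T(V2).

    from sympy import Matrix

    A = Matrix([[1, 0,  3],
                [1, 0,  3],
                [0, 1, -2],
                [1, 1,  1]])

    # Reduced column-echelon form = transpose of the RREF of the transpose
    cef = A.T.rref()[0].T
    print(cef)          # nonzero columns (1, 1, 0, 1) and (0, 0, 1, 1)
    print(A.rank())     # 2 = rank(T)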

Theorem 5.12 Let A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} be bases of U and V, respectively, and let A = [a_ij]_(m x n) = [T]_{B,A} be the matrix of T relative to the bases A and B. If u is an arbitrary vector in U, then

    [T(u)]_B = [T]_{B,A} [u]_A.

Proof. Let

    [u]_A = X = [x1]              [T(u)]_B = Y = [y1]
                [x2]     and                     [y2]
                [..]                             [..]
                [xn]                             [ym]

so that u = Σ_{j=1}^{n} x_j u_j and T(u) = Σ_{i=1}^{m} y_i v_i. Since A = [a_ij]_(m x n) is the matrix of T relative to A and B, we have

    Σ_{i=1}^{m} y_i v_i = T(u)
                        = Σ_{j=1}^{n} x_j T(u_j)
                        = Σ_{j=1}^{n} x_j ( Σ_{i=1}^{m} a_ij v_i )
                        = Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_ij x_j ) v_i.

But the coordinates of T(u) relative to B are unique, so this means that y_i = Σ_{j=1}^{n} a_ij x_j, and

    Y = [y1]   [a11 x1 + a12 x2 + ... + a1n xn]   [a11  a12  ...  a1n] [x1]
        [y2] = [a21 x1 + a22 x2 + ... + a2n xn] = [a21  a22  ...  a2n] [x2] = AX.
        [..]   [               ...            ]   [ ..   ..        ..] [..]
        [ym]   [am1 x1 + am2 x2 + ... + amn xn]   [am1  am2  ...  amn] [xn]

That is,

    [T(u)]_B = [T]_{B,A} [u]_A. ■

Theorem 5.12 shows that the matrix transformation from R^(n x 1) to R^(m x 1) defined in Section 5.2 by T(X) = AX generalizes to coordinates with arbitrary linear transformations of finite-dimensional vector spaces. However, it should be kept in mind that X, A, and Y in the equation Y = AX are taken relative to the bases A and B, and consequently change when A and B are changed.

Example 4 □ Let T be the linear transformation of R^4 into R^3 that has the matrix

    A = [1  0  -1  1]
        [2  1   0  3]
        [1  2   3  3]

with respect to the bases A = {(1,1,1,1), (1,1,1,0), (1,1,0,0), (1,0,0,0)} of R^4 and B = {(0,1,1), (1,0,0), (0,0,1)} of R^3. Suppose we wish to use this matrix A to find the value of T(3,4,-1,-4).
To find the coordinates of u = (3,4,-1,-4) relative to A, we write u as a linear combination of the vectors in A. The strategic placement of zeros in the vectors of A allows us to obtain the coefficients of these vectors by inspection: the first vector in A is the only one with a nonzero fourth component, the first and second vectors in A are the only ones with nonzero third components, and so on. We obtain

    (3,4,-1,-4) = (-4)(1,1,1,1) + (3)(1,1,1,0) + 5(1,1,0,0) + (-1)(1,0,0,0).

Thus

    [u]_A = [-4]
            [ 3]
            [ 5]
            [-1]

and therefore

    [T(u)]_B = [1  0  -1  1] [-4]   [-10]
               [2  1   0  3] [ 3] = [ -8]
               [1  2   3  3] [ 5]   [ 14]
                             [-1]

Using these coordinates with the vectors in the basis B, we find that

    T(3,4,-1,-4) = (-10)(0,1,1) + (-8)(1,0,0) + 14(0,0,1)
                 = (-8, -10, 4). ■
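The coordinate calculation of Example 4 is a single matrix-vector product, so it is easy to check numerically. The following minimal sketch assumes Python with NumPy and reproduces both [T(u)]_B and the component form of T(u).

    import numpy as np

    A = np.array([[1, 0, -1, 1],
                  [2, 1,  0, 3],
                  [1, 2,  3, 3]])

    coords_u = np.array([-4, 3, 5, -1])      # [u]_A found above for u = (3, 4, -1, -4)
    coords_Tu = A @ coords_u
    print(coords_Tu)                         # [-10  -8  14] = [T(u)]_B

    B_basis = np.array([[0, 1, 1], [1, 0, 0], [0, 0, 1]])   # rows are the vectors of B
    print(coords_Tu @ B_basis)               # (-8, -10, 4) = T(3, 4, -1, -4)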

Theorem 5.12 reveals a practical approach to the problem of finding a basis for the kernel T^(-1)(0). For u is in T^(-1)(0) if and only if the coordinates X of u satisfy AX = 0. Thus, the solutions to the system of equations AX = 0 furnish the coordinates of the vectors in T^(-1)(0).

Example 5 □ With R as the field of scalars for V4, let T : V4 -> R^4 be the linear transformation defined by

    T(a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4) =
        (a0 + a1 + a2 + 2a4, a1 + 2a2 + a3 - a4, a3 - a4, a0 + a1 + a2 + 2a4).

A natural choice of bases is A = {1, x, x^2, x^3, x^4} for V4 and the standard basis E_4 = {e1, e2, e3, e4} for R^4. Straightforward computations yield

    T(1) = (1,0,0,1),       T(x) = (1,1,0,1),
    T(x^2) = (1,2,0,1),     T(x^3) = (0,1,1,0),
    T(x^4) = (2,-1,-1,2).

Thus the matrix of T relative to A and E_4 is

    A = [1  1  1  0   2]
        [0  1  2  1  -1]
        [0  0  0  1  -1]
        [1  1  1  0   2]

and we need to solve the linear system AX = 0. Using the augmented matrix [A, 0], we have

    [A, 0] = [1  1  1  0   2  0]      [1  1  1  0   2  0]
             [0  1  2  1  -1  0]  ~   [0  1  2  1  -1  0]
             [0  0  0  1  -1  0]      [0  0  0  1  -1  0]
             [1  1  1  0   2  0]      [0  0  0  0   0  0]

          ~  [1  1  1  0   2  0]      [1  0  -1  0   2  0]
             [0  1  2  0   0  0]  ~   [0  1   2  0   0  0]
             [0  0  0  1  -1  0]      [0  0   0  1  -1  0]
             [0  0  0  0   0  0]      [0  0   0  0   0  0]

The solutions to AX = 0 are given by

    x1 =   x3 - 2x5
    x2 = -2x3
    x3 =   x3
    x4 =   x5
    x5 =   x5

and in matrix form by

    X = x3 [ 1]   +  x5 [-2]
           [-2]         [ 0]
           [ 1]         [ 0]
           [ 0]         [ 1]
           [ 0]         [ 1]

Using the coordinate matrices [1, -2, 1, 0, 0]^T and [-2, 0, 0, 1, 1]^T with the basis A, we obtain the basis

    {1 - 2x + x^2, -2 + x^3 + x^4}

for the kernel of T. ■
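The coordinate vectors of the kernel basis can also be read off from a null-space computation. The sketch below assumes Python with SymPy; its nullspace routine produces exactly the two coordinate columns used above, which are then interpreted against the basis {1, x, x^2, x^3, x^4}.

    from sympy import Matrix

    A = Matrix([[1, 1, 1, 0,  2],
                [0, 1, 2, 1, -1],
                [0, 0, 0, 1, -1],
                [1, 1, 1, 0,  2]])

    for v in A.nullspace():
        print(v.T)   # (1, -2, 1, 0, 0) and (-2, 0, 0, 1, 1):
                     # coordinates of 1 - 2x + x^2 and -2 + x^3 + x^4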

Theorem 5.13 Let A = {u1, u2, ..., un} and B = {v1, v2, ..., vm} be bases of U and V, respectively. If A = [a_ij]_(m x n) is a matrix such that the equation

    [T(u)]_B = A [u]_A

is satisfied for all u in U, then A is the matrix of the linear transformation T relative to A and B.

Proof. For u = u_j, we have

    [u_j]_A = [δ_1j]
              [δ_2j]
              [ ...]
              [δ_nj]

where δ_kj = 1 if k = j and δ_kj = 0 otherwise, and

    [T(u_j)]_B = A [u_j]_A = [Σ_{k=1}^{n} a_1k δ_kj]   [a_1j]
                             [Σ_{k=1}^{n} a_2k δ_kj] = [a_2j]
                             [         ...         ]   [ ...]
                             [Σ_{k=1}^{n} a_mk δ_kj]   [a_mj]

Thus T(u_j) = Σ_{i=1}^{m} a_ij v_i, and A is the matrix of T relative to A and B. ■

Exercises 5.3

1. Let v1 = (2,-1) and v2 = (1,0) in R^2. Find a formula for T(x, y) if T is the linear transformation of R^2 into R^3 for which T(v1) = (1,0,1) and T(v2) = (0,1,1).

2. Suppose T : R2xi ^ Rßxi is a linear transformation such that

Γ ~\ \ 1 0
1 \ 0
= 2 andT ) - -1
0 / 1
L J /
3 ) 1 1

FindT

3. Consider the basis {p1(x), p2(x), p3(x)} for V2 over R, where p1(x) = 1 + x + x^2, p2(x) = x + x^2, p3(x) = x^2. Find a formula for T(a0 + a1x + a2x^2) if T : V2 -> R^2 is the linear transformation such that T(p1(x)) = (1,0), T(p2(x)) = (1,0), and T(p3(x)) = (0,1).
4. Let T : R.2x2 —► R-3xi be a linear transformation for which it is known that

1 0 0 1
,T
0 0 0 0

Γ "I \ 2 / Γ "1 \ 3
0 0 \ / 0 0 \
= 3 ,T = 3
1 0 / 0 1
L J /
2 I\ L J
/
/
0

FindT

In Problems 5-8, find the matrix of the given linear transformation T relative to the bases A and B.

5. T : R^3 -> R^4, A = E_3, B = E_4,
   T(x1, x2, x3) = (x1 + x2 - x3, 5x2 - 2x3, 4x1 + x3, 2x1 + 3x2 + x3)

6. T : R^4 -> R^3, A = E_4, B = E_3,
   T(x1, x2, x3, x4) = (x1 + x2 + x3 - x4, x1 + 2x2 + x3 - x4, x1 - x2 + x3)

7. T : V2 -> V3 over R, A = {1, 1 - x, x - x^2}, B = {1, 1 + x, 1 - x^2, x + x^3},
   T(p(x)) = (1 - x)p(x)

8. T : V2 -> V1 over R, A = {1 + x^2, 1 + x, 1}, B = {1 - x, x},
   T(a0 + a1x + a2x^2) = (a0 + a1 + a2) + (a1 - 2a2)x

1 2
9. If T is the linear transformation that has the matrix -1 1 relative to the
1 1
bases {(1,1), (-3,1)} of R 2 and £ 3 of R 3 , find T ( - 2 , 2 ) .

2 3 1
10. Let T be the linear transformation of R 3 into R 2 that has the matrix
1 2 1
relative to the bases {(1,-1,1), (0,1,0), (1,0,0)} of R 3 and {(3,2), (2,1)} o f R 2 .
Find T(2,0,1).

1 0 0
2 1 1
11. LetA = be the matrix of the linear transformation T : V2 —> V3 over
3 2 1
4 3 1
R with respect to the bases {4,1+x, 1 + x 2 } and { l , x , x 2 , x 3 } . Find T ( 2 - 2 x + x 2 ) .

1 2 3
12. Let A = be the matrix of the linear transformation T : V2 —> Pi
0 1 2
with respect to the bases { l + x + x 2 , x + x 2 , x 2 } and { l , 2 + 3 x } . Find T ( 2 + 5 x + x 2 ) .

4 -1
13. The linear operator T on R 2 has the matrix relative to the basis
-4 3
1
A = B = {(1,2), (0,1)}. A vector u has coordinates relative to this basis.
1
Find T(u) in component form (#,?/).
14. Suppose the bases A of R2X2 and # of V\ over R are given by

1 0 0 0 0 0 0 1
A = 1 1 1

0 1 1 1 1 0 0 0

B= {1 + χ , Ι - χ } .

2 0 1 0
Let T be the linear transformation T : R2X2 —* Pi with matrix
1 1 - 1 0
0 0
relative to A and i3. Find T
3 2
168 Chapter 5 Linear Transformations

15. Let T be the linear transformation of R 3 into R 2 whose matrix is

9 -4 -4
A =
-3 - 2

relative to the standard bases £3 and £2· Find the matrix of T relative to the
bases {(1,2,0), (1,1,1), (1,1,0)} of R 3 and {(1,0), (1,1)} of R 2 .
16. Let T be a linear operator on R 2 that maps (2,1) onto (5,2) and (1,2) onto (7,10).
Determine the matrix of T with respect to the bases A = B = {(3,3), (1, - 1 ) } .
17. Find the matrix of T in Problem 9 relative to the bases {(-1,3), (—1,1)} of R 2
and £3 of R 3 .
18. A linear operator T on R 4 has the matrix

1 0 1 1
2 1 3 1
1 ■0 -1 -1
3 2 5 1

with respect to the bases A = B = £4. Find a basis for T(R 4 ).


19. Suppose T is the linear transformation of R n into R m that has the given matrix
A relative to the standard bases £ n and £ m . Find a basis for the range of T.
1 1 0 -1 3 -2 -1 -4
(aM 2 3 1 0 (b)A. 1 1-2-3
1 3 2 3 -2 3-1 1

1 2 -1 2 1 1 2 0 1 0
1 4 4 -3 -1 2 4 1 4 3
(C)A- (d)A:
2 6 3 -1 0 1 2 2 5 -2
3 8 2 1 2 1 -2 3 5 4
20. Find a basis for the kernel of T in each part of Problem 19.
21. Let T be the linear transformation of R 5 into R 3 that has the matrix A relative to
the bases {(1,1,1,1,1), (1,1,1,1,0), (1,1,0,0,0), (1,0,0,0,0), (0,0,0,0,1)} of R 5
and {(1,1,1), (0,1,0), (1,0,0)} of R 3 . Find a basis for the range of T.

1 3 2 0 -1 1-2 0 1-4
(aM = 2 6 4 6 4 (b)A 2-4 1 3-5
1 3 2 2 1 1-2 0 0-2

22. Find a basis for the kernel of T in each part of Problem 21.

23. Let T be the linear operator on R^3 that has the matrix

        [1  1  0]
        [2  2  0]
        [3  3  0]

    relative to the basis A = B = {(1,0,1), (1,1,0), (0,1,1)} of R^3. Find a basis for the kernel of T.

24. Let T be a linear transformation of U into V, and let U1 and U2 be subspaces of U. Prove or disprove that T(U1 ∩ U2) = T(U1) ∩ T(U2).

25. Let T be a linear transformation of U into V, and let W1 and W2 denote subspaces of V. Prove or disprove the statements below.

    (a) T^(-1)(W1 + W2) = T^(-1)(W1) + T^(-1)(W2)
    (b) T^(-1)(W1 ∩ W2) = T^(-1)(W1) ∩ T^(-1)(W2)

5.4 Change of Basis


It is the purpose of this section to give a complete description of the relation between those matrices that represent the same linear transformation. This description is found by examining the effect that a change in the bases of U and V has on the matrix of T.

Theorem 5.14 Let C = {w1, w2, ..., wk} and C' = {w1', w2', ..., wk'} be two bases of the vector space W over F. For an arbitrary vector w in W, let

    [w]_C = C = [c1]             [w]_{C'} = C' = [c1']
                [c2]     and                     [c2']
                [..]                             [...]
                [ck]                             [ck']

denote the coordinate matrices of w relative to C and C', respectively. If P is the matrix of transition from C to C', then C = PC'. That is,

    [w]_C = P [w]_{C'}.

Proof. Let P = [p_ij]_(k x k), and assume that the hypotheses of the theorem are satisfied. Then

    w = Σ_{i=1}^{k} c_i w_i,      w = Σ_{j=1}^{k} c_j' w_j',

and

    w_j' = Σ_{i=1}^{k} p_ij w_i.

Combining these equalities, we have

    w = Σ_{j=1}^{k} c_j' w_j'
      = Σ_{j=1}^{k} c_j' ( Σ_{i=1}^{k} p_ij w_i )
      = Σ_{i=1}^{k} ( Σ_{j=1}^{k} p_ij c_j' ) w_i.

Therefore c_i = Σ_{j=1}^{k} p_ij c_j' and

    C = [c1]   [p11 c1' + p12 c2' + ... + p1k ck']
        [c2] = [p21 c1' + p22 c2' + ... + p2k ck'] = PC'. ■
        [..]   [                ...              ]
        [ck]   [pk1 c1' + pk2 c2' + ... + pkk ck']

Example 1 □ Consider the bases C = {x, 2 + x} and C' = {4 + x, 4 - x} of the vector space W = V1 over R. Since

    4 + x = (-1)(x) + (2)(2 + x),
    4 - x = (-3)(x) + (2)(2 + x),

the matrix of transition P from C to C' is given by

    P = [-1  -3]
        [ 2   2]

The vector w = 4 + 3x can be written as

    4 + 3x = 2(4 + x) + (-1)(4 - x),

so

    [w]_{C'} = [ 2]
               [-1]

According to Theorem 5.14, the coordinate matrix [w]_C may be found from

    [w]_C = P [w]_{C'} = [-1  -3] [ 2]  =  [1]
                         [ 2   2] [-1]     [2]

This result can be checked by using the basis vectors in C. We get

    (1)(x) + 2(2 + x) = 4 + 3x = w,

so the value for [w]_C is correct. ■
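Since the change of coordinates in Theorem 5.14 is a single matrix-vector product, it is simple to reproduce with software. The following minimal sketch assumes Python with NumPy and repeats the computation of Example 1.

    import numpy as np

    P = np.array([[-1, -3],        # matrix of transition from C = {x, 2 + x} to C' = {4 + x, 4 - x}
                  [ 2,  2]])

    w_Cprime = np.array([2, -1])   # coordinates of w = 4 + 3x relative to C'
    print(P @ w_Cprime)            # [1 2] = coordinates of w relative to C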

The following theorem gives a full description of the effect of a change in the bases A and B.

Theorem 5.15 Suppose that T has matrix A = [a_ij]_(m x n) relative to the bases A of U and B of V. If Q is the matrix of transition from A to the basis A' of U and P is the matrix of transition from B to the basis B' of V, then the matrix of T relative to A' and B' is P^(-1)AQ. (See Figure 5.1.)

[Figure 5.1: T maps U, with basis A, into V, with basis B, via the matrix A; relative to the new bases A' of U and B' of V, the same T is represented by A' = P^(-1)AQ.]

Proof. Assume that the hypotheses of the theorem are satisfied. Let u be an arbitrary vector in U, let X and X' denote the coordinate matrices of u relative to A and A', respectively, and let Y and Y' denote the coordinate matrices of T(u) relative to B and B', respectively. According to Theorems 5.12 and 5.14, we have Y = AX, where Y = PY' and X = QX'. Substituting for Y and X, we have

    PY' = AQX',

and therefore

    Y' = (P^(-1)AQ)X'.

By Theorem 5.13, P^(-1)AQ is the matrix of T relative to A' and B'. ■

Theorem 5.16 Two m × n matrices A and B represent the same linear transformation T of U into V if and only if A and B are equivalent.

Proof. If A and B represent T relative to the sets of bases A, B and A', B', respectively, then B = P⁻¹AQ, where Q is the matrix of transition from A to A' and P is the matrix of transition from B to B'. Hence A and B are equivalent.
If B is equivalent to A, then B = P⁻¹AQ for invertible P and Q. If A represents T relative to A and B, then B represents T relative to A' and B', where Q is the matrix of transition from A to A' and P is the matrix of transition from B to B'. ■

The proof of Theorem 5.16 can be modified so as to obtain two similar results concerning row equivalence and column equivalence. For requiring B' = B is the same as requiring P = I_m, and requiring A' = A is the same as requiring Q = I_n. Thus we have the following theorems.

Theorem 5.17 Two m × n matrices A and B represent the same linear transformation T of U into V relative to the same basis of U if and only if they are row-equivalent.

Theorem 5.18 Two m × n matrices A and B represent the same linear transformation T of U into V relative to the same basis of V if and only if they are column-equivalent.
The three preceding theorems give a full exposition of the connection between linear
transformations and the equivalence relations on matrices that were studied in Chapter
3. However, there is one more major application of matrix theory to the study of linear
transformations. This application is contained in the following theorem.
Theorem 5.19 Let T be an arbitrary linear transformation of U into V, and let r be the rank of T. Then there exist bases A' of U and B' of V such that the matrix of T relative to A' and B' has the first r diagonal elements equal to 1, and all other elements zero.

Proof. With the stated hypotheses, suppose that A and B are bases of U and V, respectively, and that T has matrix A relative to A and B. By Theorem 3.46, there exist invertible matrices P⁻¹ and Q such that P⁻¹AQ has the first r diagonal elements equal to 1, and all other elements zero. Let A' and B' be bases such that Q is the matrix of transition from A to A' and P is the matrix of transition from B to B'. Then P⁻¹AQ is the matrix of T relative to A' and B', and the theorem is proved. ■

Thus, with a suitable choice of bases, each linear transformation T of U into V can be represented by a matrix of the form

    D_r = [ I_r | 0 ]
          [ ----+-- ]
          [  0  | 0 ],

where r is the rank of T. From a different point of view, this means that two linear transformations of U into V can be represented by the same matrix if and only if they have the same rank. It is easy to see that the relation of having the same rank is an equivalence relation on the set of all linear transformations of U into V.

Example 2 □ Let T be the linear transformation from R⁴ to P_2 over R defined by

    T(a_1, a_2, a_3, a_4) = (a_1 - a_2 + 2a_4) + (a_1 - a_2 + a_3 + a_4)x + (2a_1 - 2a_2 + a_3 + 3a_4)x².

We shall find bases A' of R⁴ and B' of P_2 such that the matrix of T relative to A' and B' has the form

    D_r = [ I_r | 0 ]
          [ ----+-- ]
          [  0  | 0 ].

We first find the matrix A of T with respect to E_4 and B = {1, x, x²}. Since

    T(1,0,0,0) = (1)(1) + (1)(x) + (2)(x²),
    T(0,1,0,0) = (-1)(1) + (-1)(x) + (-2)(x²),
    T(0,0,1,0) = (0)(1) + (1)(x) + (1)(x²),
    T(0,0,0,1) = (2)(1) + (1)(x) + (3)(x²),

the matrix A is given by

    A = [ 1 -1  0  2 ]
        [ 1 -1  1  1 ]
        [ 2 -2  1  3 ].

Following the proof of Theorem 5.19 and using the same procedure as in Section 3.8, we first find an invertible matrix Q such that AQ = A' is in reduced column-echelon form. Performing column operations on A and recording them below in a copy of I_4, we obtain

    [ A  ]     1 -1  0  2                    [ AQ ]     1  0  0  0
    [ -- ]  =  1 -1  1  1                    [ -- ]  =  0  1  0  0
    [ I_4]     2 -2  1  3         →          [ Q  ]     1  1  0  0
               ----------                               ----------
               1  0  0  0                               1  0  1 -2
               0  1  0  0                               0  0  1  0
               0  0  1  0                              -1  1  0  1
               0  0  0  1                               0  0  0  1

Next we use row operations to transform [A', I_3] into [D_r, P⁻¹], where D_r = P⁻¹A' = P⁻¹AQ.

    [A', I_3] =  1  0  0  0 | 1  0  0         1  0  0  0 |  1  0  0
                 0  1  0  0 | 0  1  0    →    0  1  0  0 |  0  1  0    =  [D_r, P⁻¹].
                 1  1  0  0 | 0  0  1         0  0  0  0 | -1 -1  1

Thus

    P⁻¹ = [  1  0  0 ]        and        Q = [  1  0  1 -2 ]
          [  0  1  0 ]                       [  0  0  1  0 ]
          [ -1 -1  1 ]                       [ -1  1  0  1 ]
                                             [  0  0  0  1 ]

are invertible matrices such that

    P⁻¹AQ = [ 1  0  0  0 ]
            [ 0  1  0  0 ]   =   D_2.
            [ 0  0  0  0 ]

According to the proof of Theorem 5.19, the desired bases A' and B' can be found by using Q as the transition matrix from A to A' and P as the transition matrix from B to B'. Using Q to find A', we get

    A' = {(1,0,-1,0), (0,0,1,0), (1,1,0,0), (-2,0,1,1)}.

To find B', we first obtain P by taking the inverse of P⁻¹ and then use P as the transition matrix from B to B'. We find

    P = [  1  0  0 ]⁻¹   =   [ 1  0  0 ]
        [  0  1  0 ]         [ 0  1  0 ]
        [ -1 -1  1 ]         [ 1  1  1 ]

and B' = {1 + x², x + x², x²}. The original defining equation for T(a_1, a_2, a_3, a_4) can be used to check that D_2 is in fact the matrix of T with respect to A' and B'. ■
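As a numerical cross-check of Example 2 (a sketch only, written with plain Python lists rather than any particular matrix library), the following code multiplies out P⁻¹AQ and confirms that it equals D_2, and then applies the defining formula for T to the vectors of A'.

    def matmul(X, Y):
        # Multiply matrices given as lists of rows.
        return [[sum(X[i][k]*Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]

    A     = [[1, -1, 0, 2], [1, -1, 1, 1], [2, -2, 1, 3]]
    Q     = [[1, 0, 1, -2], [0, 0, 1, 0], [-1, 1, 0, 1], [0, 0, 0, 1]]
    P_inv = [[1, 0, 0], [0, 1, 0], [-1, -1, 1]]

    print(matmul(P_inv, matmul(A, Q)))
    # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  -- the matrix D_2

    def T(a1, a2, a3, a4):
        # Coefficients of T(a1,a2,a3,a4) relative to {1, x, x^2}.
        return (a1 - a2 + 2*a4, a1 - a2 + a3 + a4, 2*a1 - 2*a2 + a3 + 3*a4)

    A_prime = [(1, 0, -1, 0), (0, 0, 1, 0), (1, 1, 0, 0), (-2, 0, 1, 1)]
    print([T(*v) for v in A_prime])
    # [(1, 0, 1), (0, 1, 1), (0, 0, 0), (0, 0, 0)]
    # T sends the first two vectors of A' to 1 + x^2 and x + x^2 (the first two
    # vectors of B') and the last two to zero, so the matrix of T is D_2.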

Exercises 5.4

1. Find v if [v]_B =        and B = {x + x², 1 + x², x} in P_2 over R.

2. Let B = {(2,-1,1), (0,1,-1), (-2,1,0)}. Find the coordinate matrix of (6,0,-1) relative to the basis B of R³.

3. Suppose the basis A of P_2 over R is given by A = {x + x², 2 + x, 1}. Given that

        1  0  1
        0  1  0
        1  0  2

   is the transition matrix from A to B, find the basis B of P_2.

4. Suppose

        [ 3  5 ]
        [ 1  2 ]

   is the transition matrix from the basis A = {(-4,2), (10,-6)} of R² to the basis B.

   (a) Find B.    (b) Find the transition matrix from B to A.

5. Let A = {(1,1), (2,0)} and B = {(0,2), (2,1)} in R².

   (a) Find [u]_A if [u]_B = [  3 ]        (b) Find v if [v]_B = [  1 ]
                             [ -2 ].                             [ -1 ].

6. Let A be the basis of P_2 over R given by A = {x², 1 + x, x + x²}. Given that

        P = [  1  0  0 ]
            [  0  1  1 ]
            [ -1  0  1 ]

   is the transition matrix from B to A and that [u]_B =        , find u.

7. Let A = E_4 in R⁴ and B = {x², x, 1} in P_2 over R. If T is the linear transformation that is represented by

        [ 1  1  0  1 ]
        [ 0  0  1 -1 ]
        [ 1  1  0  1 ]

   relative to A and B, find the matrix that represents T with respect to A' and B', where

        A' = {(1,0,0,0), (0,0,1,0), (1,-1,0,0), (0,-1,1,1)},
        B' = {x² + 1, x, 1}.

8. Suppose S : R³ → R² is the linear transformation with matrix

        [ 1 -3  1 ]
        [ 2 -6  2 ]

   relative to the bases E_3 and E_2. Find the matrix of S with respect to the bases {(1,0,1), (1,0,0), (1,1,0)} and {(1,-1), (2,0)}.

9. Let S be the linear transformation from R³ to R² that has the matrix

        [ 2  3  1 ]
        [ 1  2  1 ]

   relative to E_3 and E_2. Given that

        [ 1  1  1 ]
        [ 0  0 -1 ]
        [ 0  1  1 ]

   is the transition matrix from E_3 to A' and

        [ 2  3 ]
        [ 1  2 ]

   is the transition matrix from E_2 to B', find the matrix of S relative to A' and B'.

10. Let T be the linear transformation of R³ into R² that has the matrix

        [ 1 -1  1 ]
        [ 2  1  1 ]

    relative to the bases {(1,2,0), (1,1,1), (1,1,0)} of R³ and {(1,1), (1,-1)} of R². Find the matrix of T relative to the bases {(2,3,0), (1,1,1), (2,3,1)} of R³ and {(3,-1), (1,-1)} of R².

11. Let T be the linear operator on P_2 over R that has the matrix

        [ 2  1  0 ]
        [ 0  2  0 ]
        [ 2  3  1 ]

    relative to the bases A = B = {x + x², -1 + x, x}. Find the matrix of T relative to the basis A' = B' = {2 - x + x², -6x - 2x², x}.

12. Suppose the linear operator T on P_1 over R has the matrix

        [ 3 -2 ]
        [ 1  0 ]

    with respect to A = B = {1 - x, x}. Find the matrix of T with respect to A' = B' = {2 - x, -1}.

13. The linear operator S : R² → R² has the matrix

        [ 1  4 ]
        [ 2  3 ]

    with respect to the bases A = B = {(1,1), (0,1)}. Find the matrix of S with respect to A' = B' = {(2,1), (1,2)}.

14. Let T be the linear operator in Problem 13 of Exercises 5.3. Use matrices of transition to find the matrix of T relative to E_2 and E_2. Then use the new matrix to compute T(u), thus making a check on the answer previously obtained.

15. Work Problem 17 of Exercises 5.3 using matrices of transition.

16. Suppose the linear transformation S : R³ → R² has the matrix

        A = [ 1 -1  1 ]
            [ 2  1  0 ]

    relative to E_3 and E_2. Find bases A' of R³ and B' of R² such that the matrix A' of S relative to A' and B' is the reduced row-echelon form for A.

17. Suppose the linear transformation T : R³ → R² has the matrix

        A = [ 2  3  1 ]
            [ 1  2  1 ]

    relative to E_3 and E_2. Given that

        B_1 = [  2 -3 ]        and        B_2 = [ 1  0  1 ]
              [ -1  2 ]                         [ 0  1 -1 ]
                                                [ 0  0  1 ]

    are invertible matrices such that

        B_1 A B_2 = D_2 = [ 1  0  0 ]
                          [ 0  1  0 ],

    find bases A' of R³ and B' of R² such that T has matrix D_2 relative to A' and B'.

18. Suppose T is the linear transformation of Rⁿ into Rᵐ that has the given matrix A relative to E_n and E_m. Find bases A' of Rⁿ and B' of Rᵐ that satisfy the conditions given in Theorem 5.19.

    (a) A = [ 1  3  2  0 -1 ]        (b) A = [ 3 -2 -1 -4 ]
            [ 2  6  5  6  1 ]                [ 1  1 -2 -3 ]
            [ 1  3  2  2  0 ]                [ 2  3 -1  1 ]

5.5 Composition of Linear Transformations


In this section we conclude our study of the relation between a linear transformation
and its associated matrix. We shall see that there are simple and direct connections
between performing binary operations on linear transformations and performing binary
operations on their associated matrices.
At various points, we have considered operations of addition and scalar multipli­
cation on both matrices and linear transformations (see Example 5 of Section 4.2 and
Definition 5.2). The intimate relation between these operations is described in the
following theorem.
Theorem 5.20 Let U and V be vector spaces over F of dimensions n and m, respectively, and let S and T be linear transformations of U into V. If S has matrix A and T has matrix B relative to certain bases of U and V, then S + T has matrix A + B relative to these same bases. Also, for any a in F, aT has matrix aB relative to these bases.

Proof. Let A = {u_1, u_2, ..., u_n} and B = {v_1, v_2, ..., v_m} be bases of U and V, respectively. If S has matrix A = [a_ij]_{m×n} and T has matrix B = [b_ij]_{m×n} relative to A and B, then S(u_j) = Σ_{i=1}^{m} a_ij v_i and T(u_j) = Σ_{i=1}^{m} b_ij v_i. Hence

    (S + T)(u_j) = Σ_{i=1}^{m} (a_ij + b_ij) v_i,

and S + T has matrix A + B relative to A and B. Also,

    (aT)(u_j) = Σ_{i=1}^{m} (a b_ij) v_i,

and aT has matrix aB relative to A and B. ■

We turn our attention now to the "third operation" on linear transformations that
was mentioned on page 150. This third operation is the composition of linear trans­
formations, defined by the same kind of rule as used for composite functions in the
calculus.
In the calculus, two given functions / and g are combined to produce the composite
function / o g by the rule
(fog)(x) = f(g(x)).
The domain of / o g is the set of all x in the domain of g such that / is defined at g(x).
In linear algebra, it is common to refer to the composite of two linear transformations
as their product. We adopt this usage here, stated formally in the following definition.
Definition 5.21 Let U, V, and W be vector spaces over the same field F, and suppose that S is a linear transformation of U into V and that T is a linear transformation of V into W. Then the product TS is the mapping of U into W defined by

    TS(u) = (T ∘ S)(u) = T(S(u))

for each u in U.
Example 1 □ Consider the linear transformations S : R^{2×2} → R³ and T : R³ → P_1 over R defined by

    S( [ a  b ] ) = (a + 2b, b - 3c, c + d)
       [ c  d ]

and

    T(a_1, a_2, a_3) = (a_1 - 2a_2 - 6a_3) + (a_2 + 3a_3)x.

Computing the product TS, we have

    TS( [ a  b ] ) = T( S( [ a  b ] ) )
        [ c  d ]           [ c  d ]
                   = T(a + 2b, b - 3c, c + d)
                   = (a + 2b - 2(b - 3c) - 6(c + d)) + ((b - 3c) + 3(c + d))x
                   = (a - 6d) + (b + 3d)x. ■

One of the exercises for this section asks for verification that the product in Definition
5.21 is associative but not commutative. The most important property, as far as our
study is concerned, is stated in Theorem 5.22.
For the remainder of this section, U, V, and W will denote vector spaces over the
same field T. Also, S and T will denote linear transformations of U into V and V into
W , respectively.

Theorem 5.22 The product of two linear transformations is a linear transformation.



Proof. It is clear from Definition 5.21 that TS is a mapping of U into W. To show that TS is a linear transformation, let u_1, u_2 ∈ U and a, b ∈ F. Then

    TS(au_1 + bu_2) = T(S(au_1 + bu_2))
                    = T(aS(u_1) + bS(u_2))
                    = aT(S(u_1)) + bT(S(u_2))
                    = aTS(u_1) + bTS(u_2),

and TS is indeed a linear transformation of U into W. ■

The relation between multiplication of linear transformations and multiplication of matrices is given in the next theorem.

Theorem 5.23 Suppose that U, V, and W are finite-dimensional vector spaces with bases A, B, and C, respectively. If S has matrix A relative to A and B and T has matrix B relative to B and C, then TS has matrix BA relative to A and C.

Proof. Assume that the hypotheses are satisfied with A = {u_1, u_2, ..., u_n}, B = {v_1, v_2, ..., v_m}, C = {w_1, w_2, ..., w_p}, A = [a_ij]_{m×n}, and B = [b_ij]_{p×m}. Let C = [c_ij]_{p×n} be the matrix of TS relative to A and C. For j = 1, 2, ..., n we have

    Σ_{i=1}^{p} c_ij w_i = TS(u_j)
                         = T( Σ_{k=1}^{m} a_kj v_k )
                         = Σ_{k=1}^{m} a_kj T(v_k)
                         = Σ_{k=1}^{m} a_kj ( Σ_{i=1}^{p} b_ik w_i )
                         = Σ_{k=1}^{m} Σ_{i=1}^{p} a_kj b_ik w_i
                         = Σ_{i=1}^{p} ( Σ_{k=1}^{m} b_ik a_kj ) w_i,

and consequently c_ij = Σ_{k=1}^{m} b_ik a_kj for all values of i and j. Therefore C = BA and the theorem is proved. ■

Example 2 □ Let the bases A of R^{2×2}, B of R³, and C of P_1 be given by

    A = { [ 1  0 ] ,  [ 0  1 ] ,  [ 0  0 ] ,  [ 0  0 ] },
          [ 0  0 ]    [ 0  0 ]    [ 1  0 ]    [ 0  1 ]

B = E_3, and C = {1, x}. With S and T as in Example 1, we shall find the matrices of S, T, and TS relative to these bases and verify that the matrix of TS is the product of the matrix of T times the matrix of S.
Since

    S( [ 1  0 ] ) = (1,0,0),        S( [ 0  1 ] ) = (2,1,0),
       [ 0  0 ]                        [ 0  0 ]

    S( [ 0  0 ] ) = (0,-3,1),       S( [ 0  0 ] ) = (0,0,1),
       [ 1  0 ]                        [ 0  1 ]

S has matrix

    A = [ 1  2  0  0 ]
        [ 0  1 -3  0 ]
        [ 0  0  1  1 ]

relative to A and B. Since

    T(1,0,0) = (1)(1) + (0)(x),
    T(0,1,0) = (-2)(1) + (1)(x),
    T(0,0,1) = (-6)(1) + (3)(x),

the matrix B of T relative to B and C is

    B = [ 1 -2 -6 ]
        [ 0  1  3 ].

For the product TS, we have

    TS( [ 1  0 ] ) = (1)(1) + (0)(x),        TS( [ 0  1 ] ) = (0)(1) + (1)(x),
        [ 0  0 ]                                 [ 0  0 ]

    TS( [ 0  0 ] ) = (0)(1) + (0)(x),        TS( [ 0  0 ] ) = (-6)(1) + (3)(x),
        [ 1  0 ]                                 [ 0  1 ]

so the matrix of TS relative to A and C is

    [ 1  0  0 -6 ]
    [ 0  1  0  3 ].

Routine arithmetic verifies that

    BA = [ 1 -2 -6 ] [ 1  2  0  0 ]   =   [ 1  0  0 -6 ]
         [ 0  1  3 ] [ 0  1 -3  0 ]       [ 0  1  0  3 ].
                     [ 0  0  1  1 ]

We note that the product AB is not defined, and this is consistent with the fact that ST is not defined. ■
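A small Python sketch can be used to confirm the matrices found in Example 2 and the relation "matrix of TS equals BA" from Theorem 5.23. It uses only the standard library; flattening each 2 × 2 matrix in the domain to a 4-tuple (a, b, c, d) is an encoding chosen here for convenience, not notation from the text.

    def S(a, b, c, d):
        return (a + 2*b, b - 3*c, c + d)

    def T(a1, a2, a3):
        # Coefficients of T(a1,a2,a3) relative to C = {1, x}.
        return (a1 - 2*a2 - 6*a3, a2 + 3*a3)

    def TS(a, b, c, d):
        return T(*S(a, b, c, d))

    def matmul(X, Y):
        return [[sum(X[i][k]*Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]

    basis_A = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

    # Columns of each matrix are the coordinates of the images of the basis vectors.
    A = [list(col) for col in zip(*[S(*u) for u in basis_A])]
    B = [list(col) for col in zip(*[T(*e) for e in [(1,0,0), (0,1,0), (0,0,1)]])]
    C = [list(col) for col in zip(*[TS(*u) for u in basis_A])]

    print(A)                   # [[1, 2, 0, 0], [0, 1, -3, 0], [0, 0, 1, 1]]
    print(B)                   # [[1, -2, -6], [0, 1, 3]]
    print(C)                   # [[1, 0, 0, -6], [0, 1, 0, 3]]
    print(C == matmul(B, A))   # True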
The operations of addition and multiplication of matrices are connected by the distributive property. The statement of this fact is in Theorem 5.24. Proofs are requested in the exercises.

Theorem 5.24 Let A = [a_ij]_{m×n}, B = [b_ij]_{n×p}, and C = [c_ij]_{n×p} over F. Then A(B + C) = AB + AC.

There is the possibility in Definition 5.10 that the vector spaces U and V may be identical, and yet the bases A and B may be different. In many instances, there is no condition present that requires that A and B be different. In these instances, it is convenient and conventional to choose A = B. If it is our intention to use just one basis A of V, we simply use the phrase "matrix of T relative to A" rather than "matrix of T relative to A and B, where A and B are equal." Similarly, "A represents T relative to A" means that A represents T relative to A and B, where A and B are equal.
Let us consider the case where U = V = W in Definition 5.21. This allows us to define positive integral powers of T inductively by T^{k+1} = T^k ∘ T for each positive integer k. We define T^0 to be the identity transformation of V. In combination with Definition 5.2, this determines the value of each polynomial a_r T^r + a_{r-1} T^{r-1} + ··· + a_1 T + a_0 T^0 in T with coefficients in F, and such a polynomial is always a linear transformation of V into V. In such a polynomial we shall write a_0 in place of a_0 T^0.
If T has matrix A relative to the basis A, then Theorems 5.20 and 5.23 show that a_r T^r + a_{r-1} T^{r-1} + ··· + a_1 T + a_0 has matrix a_r A^r + a_{r-1} A^{r-1} + ··· + a_1 A + a_0 I relative to A. Consequently, Σ_{i=0}^{r} a_i T^i is the zero linear transformation if and only if Σ_{i=0}^{r} a_i A^i is the zero matrix. This means that T and A satisfy the same polynomial equations.
A linear transformation T of U into V is called invertible or nonsingular if there is a mapping S of V into U such that ST(u) = u for all u ∈ U and TS(v) = v for all v ∈ V. Whenever such a mapping S exists, it is denoted by S = T⁻¹ and is called the inverse of T. It is left as an exercise to prove that the inverse of T is a linear transformation of V into U. It follows from Theorem 5.23 that if T is invertible and has matrix A relative to the bases A of U and B of V, then T⁻¹ has matrix A⁻¹ relative to B and A.
The next example shows how polynomials in matrices and linear transformations
can sometimes give interesting and surprising results.
Example 3 □ Let T be the linear operator on R³ defined by

    T(x_1, x_2, x_3) = (2x_1 + x_2, 2x_2, 2x_1 + 3x_2 + x_3).

Then T has the matrix

    A = [ 2  0  2 ]
        [ 1  2  3 ]
        [ 0  0  1 ]

relative to the standard basis E_3. Straightforward computations show that A is a zero of the polynomial x³ - 5x² + 8x - 4. That is,

    A³ - 5A² + 8A - 4I = [ 8  0 14 ]      [ 4  0  6 ]      [ 2  0  2 ]      [ 1  0  0 ]
                         [12  8 31 ] - 5  [ 4  4 11 ] + 8  [ 1  2  3 ] - 4  [ 0  1  0 ]
                         [ 0  0  1 ]      [ 0  0  1 ]      [ 0  0  1 ]      [ 0  0  1 ]

                       = [ 0  0  0 ]
                         [ 0  0  0 ]
                         [ 0  0  0 ].

This equation implies that

    A³ - 5A² + 8A = 4I

and therefore

    A( (1/4)A² - (5/4)A + 2I ) = I.

By Theorem 3.14, A is invertible and

    A⁻¹ = (1/4)A² - (5/4)A + 2I.

It follows from this that T⁻¹ exists and T⁻¹ can be expressed as a polynomial in T:

    T⁻¹ = (1/4)T² - (5/4)T + 2.

The equation A³ - 5A² + 8A - 4I = 0 also has some implications concerning positive integral powers of A. For instance,

    A³ = 5A² - 8A + 4I

and this implies that

    A⁴ = 5A³ - 8A² + 4A.

Substituting for A³, we have

    A⁴ = 5(5A² - 8A + 4I) - 8A² + 4A
       = 17A² - 36A + 20I.

This substitution procedure can be repeated so as to express any higher integral power of A as a quadratic polynomial in A, and the corresponding powers of T can be expressed as quadratic polynomials in T. ■
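The claims in Example 3 are easy to confirm mechanically. One possible Python check (plain lists, no external packages; the helper names are ours) is sketched below; it verifies the polynomial identity and the formula for A⁻¹.

    def matmul(X, Y):
        return [[sum(X[i][k]*Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]

    def add(X, Y):
        return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

    def scale(c, X):
        return [[c*X[i][j] for j in range(len(X[0]))] for i in range(len(X))]

    A = [[2, 0, 2], [1, 2, 3], [0, 0, 1]]
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    print(add(add(A3, scale(-5, A2)), add(scale(8, A), scale(-4, I))))
    # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   -- A satisfies x^3 - 5x^2 + 8x - 4 = 0

    A_inv = add(add(scale(0.25, A2), scale(-1.25, A)), scale(2, I))
    print(matmul(A, A_inv))
    # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]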

Exercises 5.5

1. Let S and T be the linear transformations of R⁴ into R³ given by

    S(x_1, x_2, x_3, x_4) = (5x_1 - 3x_3, x_2 + 6x_3 - x_4, 2x_1 - 9x_2 + 5x_3 + 2x_4)

and

    T(x_1, x_2, x_3, x_4) = (4x_1 + 10x_2, 6x_1 - x_2 + 7x_4, -3x_1 + 8x_2 - 5x_4).

    (a) Find the matrices of S, T, and S + T relative to E_4 and E_3.
    (b) Find the matrix of 2S - 3T relative to E_4 and E_3.

2. Let S and T be as given in Problem 1. Consider the bases A and B of R⁴ and R³, respectively, where A = {(1,1,1,1), (0,1,1,1), (0,0,1,1), (0,0,0,1)} and B = {(1,1,1), (1,1,0), (1,0,0)}.

    (a) Find the matrices of S, T, and S + T relative to A and B.
    (b) Find the matrix of 2S - 3T relative to A and B.

3. Suppose that S and T are linear operators on R² such that S has matrix

        [ 1 -1 ]
        [ 0  2 ]

   and T has matrix

        [ 3  0 ]
        [-2  1 ]

   relative to the standard basis E_2. Find the matrix that represents TS relative to E_2.

4. Let T : R² → R² and S : R² → R² be given by

        T(x, y) = (x + y, 2x),        S(x, y) = (3y, x - y).

    (a) Find a formula for ST(x, y).
    (b) Find matrix representations for each of T, S, and ST.

5. Suppose the linear operator S : R² → R² has the matrix

        [ 5 -2 ]
        [ 1 -3 ]

   relative to the basis {(1,2), (0,1)} and the linear transformation T : R² → R³ has the matrix

        [ 2  1 ]
        [ 1 -1 ]
        [ 1  1 ]

   relative to the bases {(1,2), (0,1)} of R² and E_3 of R³. Find a matrix representation for TS.

6. Find the matrix representation of S⁻¹ in Problem 5 with respect to the basis {(1,2), (0,1)}.

7. Assume that A = {u_1, u_2, u_3} is a basis for R³, and let

        v_1 = u_2 + 2u_3
        v_2 = u_3
        v_3 = u_1 + 2u_2 + 5u_3.

    (a) Prove that B = {v_1, v_2, v_3} is a basis, and express each u_i as a linear combination of v_1, v_2, and v_3.
    (b) Let T : R³ → R³ be defined by T(u_i) = v_i for i = 1, 2, 3. Find the matrix that represents T relative to A.

8. Let A, B, and T be as defined in Problem 7. Define S : R³ → R³ by S(v_i) = u_i for i = 1, 2, 3.

    (a) Find the matrix that represents S relative to B.
    (b) Find the matrix that represents TS relative to A.

9. Given that T is a linear operator on R² with matrix

        [ 3  2 ]
        [-4 -2 ]

   relative to E_2, find the matrix of 2T³ + T² - 3T + 7 relative to E_2.

10. Let T be the linear operator on R² that has the matrix

        [ 1  2 ]
        [ 2 -2 ]

    relative to E_2.

    (a) Find the matrix of T² + 3T - 6 relative to E_2.
    (b) Given that T² + T - 6 = 0, write T⁻¹ as a polynomial in T.
    (c) Write T³ as a first-degree polynomial in T.

11. Given that T is a linear operator satisfying the polynomial equation

        T³ - 3T² + 4T + 6 = 0,

    write each of T⁴ and T⁵ as polynomials in T with degree less than 3.

12. Suppose A is any matrix such that A² - 5A + 12I = 0.

    (a) Prove that A is invertible and that A⁻¹ is a polynomial in A.
    (b) Given that

            A = [  1  2 ]
                [ -4  4 ]

        is such a matrix, find A⁻¹ by writing A⁻¹ as a polynomial in A.
13. Prove the associative property for multiplication of linear transformations.

14. Give an example which shows that it may happen that ST ≠ TS, even when both ST and TS are defined.

15. Let T_1 and T_2 be linear transformations of U into V, and let S be a linear transformation of V into W. Suppose that S, T_1, and T_2 have matrices A, B, and C, respectively, relative to certain bases of U, V, and W. Prove that S(T_1 + T_2) has matrix AB + AC relative to these same bases.

16. Use Problem 15 to prove Theorem 5.24.

17. Use Theorem 3.6 and the definition of addition of matrices to prove Theorem 5.24.

18. Prove that if T is an invertible linear transformation of U into V, then T⁻¹(u) = v if and only if T(v) = u.

19. Let T be an invertible linear transformation of U into V. Prove that T⁻¹ is a linear transformation of V into U.

20. Let T be a linear operator on V with matrix A relative to the basis A of V. Prove that T is invertible if and only if A is invertible.

21. Let S and T be invertible linear operators on V. Prove that ST is invertible, and that (ST)⁻¹ = T⁻¹S⁻¹.

22. Let T be the linear operator in Problem 11. Write T⁻¹ as a polynomial in T.
Chapter 6

Determinants

6.1 Introduction
In this chapter, the fundamentals of the theory of determinants are developed. A
knowledge of this material is necessary in the study of eigenvalues and eigenvectors of
linear transformations, and many of the applications of linear algebra involve a use of
eigenvalues and eigenvectors. These topics will be studied in Chapter 7.

6.2 Permutations and Indices


The definition of a determinant presented in Section 6.3 depends on the concept of a permutation of a set of integers and on certain properties of these permutations. A permutation of a set {x_1, x_2, ..., x_n} is simply an arrangement of the elements of the set into a particular sequence or order. For example, the arrangements 1,4,5,3,2,6 and 6,3,2,1,4,5 are permutations of the set {1,2,3,4,5,6}. If an element x_i appears to the left of an element x_j in a permutation of {x_1, x_2, ..., x_n}, then we say that x_i precedes x_j.
Our interest in permutations is limited to the permutations of the first n positive integers. The permutation 1, 2, 3, ..., n is referred to as the natural ordering.

Definition 6.1 The index of an integer j_k in a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} is the number of integers greater than j_k that precede j_k in the permutation j_1, j_2, ..., j_n. The index of j_k is denoted by I(j_k).

Example 1 □ In the permutation 1,4,5,3,2,6, the index of 3 is given by I(3) = 2 since 4 and 5 are greater than 3 and precede 3 in the permutation. No integer greater than 4 precedes 4 in this permutation, so I(4) = 0. Since the three integers 4, 5, and 3 are greater than 2 and precede 2 in the permutation, I(2) = 3. ■


Definition 6.2 The index of a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} is the integer I given by

    I = Σ_{k=1}^{n} I(j_k),

where I(j_k) is the index of j_k in the permutation j_1, j_2, ..., j_n.

That is, the index of a given permutation is the sum of the indices of all of the elements in that permutation.

Example 2 □ The index of the permutation 1,4,5,3,2,6 in Example 1 is

    I = I(1) + I(4) + I(5) + I(3) + I(2) + I(6)
      = 0 + 0 + 0 + 2 + 3 + 0
      = 5. ■
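Definition 6.2 translates directly into a short computation. The following Python sketch (the function name is our own, not notation from the text) counts, for each position, the earlier entries that are larger and sums these counts, reproducing the values of Examples 1 and 2.

    def index(perm):
        # Index of a permutation: for each j_k, count the larger entries that
        # precede it, then sum the counts (Definitions 6.1 and 6.2).
        return sum(sum(1 for earlier in perm[:k] if earlier > perm[k])
                   for k in range(len(perm)))

    print(index((1, 4, 5, 3, 2, 6)))      # 5
    print(index((6, 4, 1, 7, 5, 2, 3)))   # 13 (see Example 4 below)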

Theorem 6.3 Any single interchange of adjacent elements in a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} changes the index of the permutation by 1.

Proof. Let I = Σ_{k=1}^{n} I(j_k) be the index of the given permutation

    j_1, j_2, ..., j_m, j_{m+1}, ..., j_n

and consider the index I' of the permutation

    j_1, j_2, ..., j_{m+1}, j_m, ..., j_n

that results from interchanging j_m and j_{m+1} in the original permutation. Now

    I' = Σ_{k=1}^{m-1} I'(j_k) + I'(j_{m+1}) + I'(j_m) + Σ_{k=m+2}^{n} I'(j_k),

where I'(j_k) denotes the index of j_k in the new permutation. It is clear that I'(j_k) = I(j_k) if k is different from m and m + 1. If j_m > j_{m+1}, then I'(j_m) = I(j_m), I'(j_{m+1}) = I(j_{m+1}) - 1, and consequently I' = I - 1. On the other hand, if j_m < j_{m+1}, then I'(j_m) = I(j_m) + 1 and I'(j_{m+1}) = I(j_{m+1}), so that I' = I + 1. In either case, the index is changed by 1, and the theorem is proven. ■

Theorem 6.3 paves the way for obtaining the corresponding result concerning the
interchange of any two (not necessarily adjacent) elements in a permutation.

Theorem 6.4 Any interchange of two elements in a permutation j_1, j_2, ..., j_n of the set {1, 2, ..., n} changes the index by an odd integer.

Proof. Let

    j_1, j_2, ..., j_r, ..., j_s, ..., j_n

be the given permutation, and consider the interchange of j_r and j_s. Let m be the number of elements between j_r and j_s.
Now the permutation that results from the interchange of j_r and j_s can be accomplished by using only interchanges of adjacent elements. The element j_r can be moved to the position initially occupied by j_s by m + 1 interchanges with the adjacent element on the right. Then j_s can be moved to the position that j_r initially occupied by m interchanges with the adjacent element on the left. Thus, the interchange of j_r and j_s can be accomplished by 2m + 1 interchanges of adjacent elements. These 2m + 1 interchanges cause 2m + 1 changes of 1 in the index of the ordering, and consequently the index has changed by an odd number. ■

Example 3 □ As an illustration of the proof of Theorem 6.4, we shall accomplish the interchange of j_r = 4 and j_s = 2 in the permutation 1,4,5,3,2,6 by using only interchanges of adjacent elements. We note that there are m = 2 elements between 4 and 2.

    1,4,5,3,2,6 → 1,5,4,3,2,6
                → 1,5,3,4,2,6
                → 1,5,3,2,4,6
                → 1,5,2,3,4,6
                → 1,2,5,3,4,6

We first moved 4 to the position initially occupied by 2 using 3 interchanges with the adjacent element on the right, and then we moved 2 to the position that 4 initially occupied by using 2 interchanges with the adjacent element on the left. The total number of interchanges of adjacent elements was 3 + 2 = 5, an odd integer. ■

The main objective of this section is to establish Theorem 6.7 for use in Section 6.3.
The following two lemmas are basic to our proof of Theorem 6.7.

Lemma 6.5 If a given permutation of {1, 2,..., n} is carried into another permutation
by an odd number of interchanges of elements, then the index of the given permutation
differs from the index of the final permutation by an odd number.

Proof. Suppose that a given permutation of {1,2, ...,n} is carried into another
permutation by an odd number of interchanges of elements. According to Theorem 6.4,
each of the interchanges of elements changes the index by an odd number. Thus, the
index of the original permutation differs from the index of the final permutation by an
odd number, since the sum of an odd number of odd integers is an odd integer. ■

Lemma 6.6 If a given permutation of {1, 2, ..., n} is carried into another permutation by an even number of interchanges of elements, then the index of the given permutation differs from the index of the final permutation by an even number.

Proof. The proof is an exact parallel to that of Lemma 6.5, except that the sum of
an even number of odd integers is an even integer. ■

Theorem 6.7 The number of interchanges used to carry a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} into the natural ordering is either always odd or always even.

Proof. Since the index of the natural ordering is zero, the difference in the indices of j_1, j_2, ..., j_n and the natural ordering is the same as the index I of j_1, j_2, ..., j_n. If j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by an odd number of interchanges, then I must be odd by Lemma 6.5. If j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by an even number of interchanges, then I must be even by Lemma 6.6. Thus the number of interchanges used to carry j_1, j_2, ..., j_n into 1, 2, ..., n must always be odd if I is odd and must always be even if I is even. ■

Example 4 □ We shall determine whether the index of the permutation

    6,4,1,7,5,2,3

is odd or even by counting the number of interchanges used to carry this permutation into the natural ordering.

    6,4,1,7,5,2,3 → 1,4,6,7,5,2,3
                  → 1,2,6,7,5,4,3
                  → 1,2,3,7,5,4,6
                  → 1,2,3,4,5,7,6
                  → 1,2,3,4,5,6,7

Because we have used an odd number of interchanges to carry the original permutation into the natural ordering, the index I of

    6,4,1,7,5,2,3

must be odd by Lemma 6.5, and the number of interchanges used to carry the original permutation into the natural ordering would always be odd by Theorem 6.7. Although always odd, this number of interchanges may be different from the index of the permutation. In this example, we used 5 interchanges of elements and the index of

    6,4,1,7,5,2,3

is

    I = I(1) + I(2) + I(3) + I(4) + I(5) + I(6) + I(7)
      = 2 + 4 + 4 + 1 + 2 + 0 + 0
      = 13. ■
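Theorem 6.7 can also be checked mechanically: however a permutation is sorted into the natural ordering, the parity of the number of interchanges agrees with the parity of the index. The Python sketch below uses one particular strategy of our own choosing (repeatedly swapping the smallest remaining entry into place) and compares the result with the index, for the permutation of Example 4.

    def interchanges_to_sort(perm):
        # Count interchanges made while putting the smallest remaining entry
        # into place by a single swap at each step.
        p = list(perm)
        count = 0
        for i in range(len(p)):
            j = p.index(min(p[i:]), i)
            if j != i:
                p[i], p[j] = p[j], p[i]
                count += 1
        return count

    perm = (6, 4, 1, 7, 5, 2, 3)
    swaps = interchanges_to_sort(perm)
    idx = sum(sum(1 for e in perm[:k] if e > perm[k]) for k in range(len(perm)))
    print(swaps, idx)                 # 5 13
    print(swaps % 2 == idx % 2)       # True -- both are odd, as Theorem 6.7 predicts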

Exercises 6.2
1. Find the index of the following permutations.
    (a) 5,3,1,4,2      (b) 2,4,5,3,1      (c) 6,2,4,5,3,1
    (d) 5,3,4,2,1,6    (e) 3,4,6,1,2,7,5  (f) 2,5,1,4,3,7,6

2. Determine whether the index of the given permutation is odd or even by counting the number of interchanges used to carry the given permutation into the natural ordering.
    (a) 3,4,2,5,1      (b) 5,1,3,2,4      (c) 6,2,4,5,3,1
    (d) 5,2,4,6,1,3    (e) 4,1,7,5,2,6,3  (f) 3,7,5,4,6,2,1

3. Write out a sequence of interchanges of adjacent elements that will accomplish the interchange of the given pair of elements in the permutation 6,2,4,5,3,1.
    (a) 2 and 3    (b) 6 and 3    (c) 2 and 1    (d) 4 and 1

4. Prove that the number of interchanges used to carry a permutation j_1, j_2, ..., j_n of {1, 2, ..., n} into itself (i.e., into the same permutation) must always be even.

5. If n > 2, what is the index of the permutation n, n - 1, ..., 3, 2, 1? Justify your answer.

6.3 The Definition of a Determinant


With the results of Section 6.2 at our disposal, we can proceed to our definition of a determinant. Only square matrices are considered in this chapter since the determinant is defined only for this type of matrix. We shall simplify our notation from A = [a_ij]_{n×n} to A = [a_ij]_n or even A = [a_ij] if the order of A is not significant.

Definition 6.8 The determinant of the square matrix A = [a_ij]_n over F is the scalar det(A) defined by

    det(A) = Σ_{(j)} (-1)^t a_{1j_1} a_{2j_2} ··· a_{nj_n},

where Σ_{(j)} denotes the sum of all terms of the form (-1)^t a_{1j_1} a_{2j_2} ··· a_{nj_n} as j_1, j_2, ..., j_n assumes all possible permutations of the numbers 1, 2, ..., n, and the exponent t is the number of interchanges used to carry j_1, j_2, ..., j_n into the natural ordering 1, 2, ..., n.

The notations det A and |A| are used interchangeably with det(A). When the elements of A = [a_ij]_n are written as a rectangular array, we would have

    det(A) = |A| = | a_11  a_12  ···  a_1n |
                   | a_21  a_22  ···  a_2n |
                   | ...                   |
                   | a_n1  a_n2  ···  a_nn |.

Although the number of interchanges used to carry j_1, j_2, ..., j_n into 1, 2, ..., n is not always the same, Theorem 6.7 assures us that this number is either always even or always odd. Hence the sign (-1)^t of each term is well-defined, and det(A) is uniquely determined by A.
We observe that there are n! terms in the sum det(A) since there are n! possible orderings of 1, 2, ..., n. The determinant of an n × n matrix is referred to as an n × n determinant, or a determinant of order n.

Example 1 □ Consider a 3 × 3 matrix

    A = [ a_11  a_12  a_13 ]
        [ a_21  a_22  a_23 ]
        [ a_31  a_32  a_33 ].

By the definition,

    det(A) = (-1)^{t_1} a_11 a_22 a_33 + (-1)^{t_2} a_11 a_23 a_32 + (-1)^{t_3} a_12 a_23 a_31
           + (-1)^{t_4} a_12 a_21 a_33 + (-1)^{t_5} a_13 a_22 a_31 + (-1)^{t_6} a_13 a_21 a_32.

Since 1,2,3 is the natural ordering, we may take t_1 = 0. Since 1,3,2 can be carried into 1,2,3 by the single interchange of 2 and 3, we may take t_2 = 1. The ordering 2,3,1 can be carried into 1,2,3 by an interchange of 2 and 1, followed by an interchange of 2 and 3. Thus we may take t_3 = 2. By the same method, we find t_4 = 1, t_5 = 1, and t_6 = 2. Hence

    det(A) = a_11 a_22 a_33 - a_11 a_23 a_32 + a_12 a_23 a_31
           - a_12 a_21 a_33 - a_13 a_22 a_31 + a_13 a_21 a_32. ■
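Definition 6.8 can be implemented almost verbatim, which gives an independent check of the six-term expansion above. The brute-force Python sketch below (practical only for small n, since it sums over all n! permutations; itertools is from the standard library and the helper names are ours) uses the index of a permutation to supply the sign, which has the same parity as t by Theorem 6.7.

    from itertools import permutations

    def sign(perm):
        # (-1)**t, computed from the index of the permutation (same parity as t).
        t = sum(sum(1 for earlier in perm[:k] if earlier > perm[k])
                for k in range(len(perm)))
        return -1 if t % 2 else 1

    def det(A):
        # Sum of signed products a_{1 j_1} a_{2 j_2} ... a_{n j_n} (Definition 6.8).
        n = len(A)
        total = 0
        for p in permutations(range(n)):
            prod = 1
            for i, j in enumerate(p):
                prod *= A[i][j]
            total += sign(p) * prod
        return total

    print(det([[2, 2, 0], [2, 1, 1], [-7, 2, -3]]))   # -12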

It is worth noting that the value of det(A) obtained in Example 1 agrees with the value yielded by evaluation routines that are taught in high school algebra. One of the most popular of these routines evaluates a 3 × 3 determinant |A| by reproducing the first two columns to the right of the array and forming signed products along the diagonals: the three products parallel to the main diagonal are added, and the three products parallel to the opposite diagonal are subtracted, so that

    det(A) = a_11 a_22 a_33 + a_12 a_23 a_31 + a_13 a_21 a_32
           - a_13 a_22 a_31 - a_11 a_23 a_32 - a_12 a_21 a_33.

A similar diagram for a 2 × 2 determinant gives the correct value

    det(A) = | a_11  a_12 |  =  a_11 a_22 - a_12 a_21,
             | a_21  a_22 |

but it is important to know that there is no similar scheme that works for determinants of order 4 or any order greater than 3.
Definition 6.8 is frequently referred to as the "row" definition of a determinant, since the row subscripts on the factors a_{ij} are held fixed in the natural ordering.
The next theorem presents an alternate formulation (the "column" definition) in which the column subscripts are held fixed in the natural ordering.

Theorem 6.9 For any matrix A = [a_ij]_n, det(A) is given by

    det(A) = Σ_{(i)} (-1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n},

where Σ_{(i)} denotes the sum over all possible permutations i_1, i_2, ..., i_n of 1, 2, ..., n, and s is the number of interchanges used to carry i_1, i_2, ..., i_n into the natural ordering.

Proof. Let S = Σ_{(i)} (-1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n}. Now both S and det(A) have n! terms. Except possibly for sign, each term of S is a term of det(A), and each term of det(A) is a term of S. Thus, S and det(A) consist of the same terms, with a possible difference in sign.
Consider a certain term (-1)^s a_{i_1 1} a_{i_2 2} ··· a_{i_n n} and let (-1)^t a_{1j_1} a_{2j_2} ··· a_{nj_n} be the corresponding term in det(A). Then a_{i_1 1} a_{i_2 2} ··· a_{i_n n} can be carried into a_{1j_1} a_{2j_2} ··· a_{nj_n} by s interchanges of factors since the permutation i_1, i_2, ..., i_n can be changed into the natural ordering 1, 2, ..., n by s interchanges of elements. This means that the natural ordering 1, 2, ..., n can be changed into the permutation j_1, j_2, ..., j_n by s interchanges since the column subscripts have been interchanged each time the factors were interchanged. But j_1, j_2, ..., j_n can be carried into 1, 2, ..., n by t interchanges, by the definition of det(A). Thus 1, 2, ..., n can be carried into j_1, j_2, ..., j_n and then back into itself by s + t interchanges. Since 1, 2, ..., n can be carried into itself by an even number (zero) of interchanges, s + t is even by Theorem 6.7. Therefore (-1)^{s+t} = 1 and (-1)^s = (-1)^t. Now we have the corresponding terms in det(A) and S with the same sign, and therefore det(A) = S. ■

We recall that the transpose of an m × n matrix A = [a_ij] is the n × m matrix A^T with a_ji as the element in the ith row and jth column.

Theorem 6.10 If A = [a_ij]_n, then det(A^T) = det(A).



Proof. Let B = A^T, so that b_ij = a_ji for all pairs i, j. Thus

    det(B) = Σ_{(j)} (-1)^t b_{1j_1} b_{2j_2} ··· b_{nj_n}

by the definition of det(B), and

    det(B) = Σ_{(j)} (-1)^t a_{j_1 1} a_{j_2 2} ··· a_{j_n n}.

Therefore det(B) = det(A), by Theorem 6.9. ■

Exercises 6.3

1. Determine whether t is even or odd in the given term of det(A), where A = [a_ij]_n.
    (a) (-1)^t a_13 a_21 a_34 a_42    (b) (-1)^t a_14 a_21 a_33 a_42    (c) (-1)^t a_14 a_23 a_32 a_41
    (d) (-1)^t a_12 a_24 a_31 a_43    (e) (-1)^t a_24 a_43 a_12 a_31    (f) (-1)^t a_21 a_33 a_14 a_42
    (g) (-1)^t a_34 a_21 a_42 a_13    (h) (-1)^t a_34 a_41 a_12 a_23

2. Write out the complete expression for det(A) if A = [a_ij]_4.

3. If A is a 3 × 3 matrix, find the value of det(A + A) in terms of det(A).

4. Work Problem 3 for an n × n matrix A.

5. If A = [a_ij]_n is a diagonal matrix, what is the value of det(A)?

6. Evaluate det(A) if A = [δ_ij]_n, where δ_ij is the Kronecker delta.

7. If A = [a_ij]_n and c is any scalar, express the value of det(cA) in terms of det(A).

8. Find the value of the given determinant by use of Definition 6.8.

    (a) | 4  0  0 |    (b) | 4 -5  7 |    (c) | a_11   0     0   |    (d) | a_11   0     0   |
        | 0 -3  0 |        | 0 -3 -8 |        | a_21  a_22   0   |        |  0     0    a_23 |
        | 0  0  6 |        | 0  0  6 |        | a_31  a_32  a_33 |        |  0    a_32   0   |

9. Prove or disprove each of the given statements.
    (a) |A + B| = |A| + |B|.
    (b) |A + A^T| = |A| + |A^T|.

6.4 Cofactor Expansions


A comparison of Definition 6.8 and Theorem 6.9 shows that there is a certain duality in
the determinant of a matrix, since emphasis can be placed on either the row subscripts
or the column subscripts. This duality is complete in that the entire theory can be
developed with either point of emphasis. For each property formulated in terms of
rows, there is a dual formulation in terms of columns, and vice versa. With a few
adjustments, the derivation of a property from one point of view can be changed into a
derivation from the other point of view. It is for this reason that we include derivations
from only one point of view. The approach that is adopted here is to use a formulation
in terms of rows. Each result stated in terms of rows has a dual result stated in terms
of columns. In some instances it is necessary to state the dual result as a separate
theorem. In others, the dual result is indicated by the insertion of the word "column"
in parentheses at appropriate places. In these instances, the dual statement is obtained
simply by replacing the word "row" with the word "column."
In the course of our development, it will become apparent that the elementary op­
erations play a fundamental role in the theory. In addition, they furnish a valuable tool
for use in the evaluation of determinants.
The following theorem describes the effect on the determinant if an elementary row
or column operation of type III is applied to a matrix. This result is invaluable in the
derivation of the cofactor expansion in Theorem 6.14.

Theorem 6.11 If the square matrix B is obtained from the matrix A by an elementary row (column) operation of type III, then det(B) = -det(A).

Proof. Let B be obtained from A by interchanging rows u and v of A, so that b_uj = a_vj and b_vj = a_uj for all j. Then

    det(B) = Σ_{(j)} (-1)^{t_1} b_{1j_1} ··· b_{uj_u} ··· b_{vj_v} ··· b_{nj_n}
           = Σ_{(j)} (-1)^{t_1} a_{1j_1} ··· a_{vj_u} ··· a_{uj_v} ··· a_{nj_n},

where t_1 is the number of interchanges used to carry

    j_1, ..., j_u, ..., j_v, ..., j_n

into 1, 2, ..., n. Now

    det(A) = Σ_{(j)} (-1)^{t_2} a_{1j_1} ··· a_{uj_v} ··· a_{vj_u} ··· a_{nj_n},

where t_2 is the number of interchanges used to carry j_1, ..., j_v, ..., j_u, ..., j_n into 1, 2, ..., n. Except for the exponents t_1 and t_2, det(B) would be equal to det(A). Since only one interchange is necessary to obtain j_1, ..., j_u, ..., j_v, ..., j_n from j_1, ..., j_v, ..., j_u, ..., j_n, t_1 and t_2 must differ by an odd number, and (-1)^{t_1} = (-1)(-1)^{t_2}. Therefore det(B) = -det(A), and the proof is complete. ■

The main purpose of this section is to establish an expression for the value of a
determinant that is known as "an expansion by cofactors." Some new notation and
terminology are needed to state this expansion by cofactors.
Definition 6.12 The minor of the element a_ij in A = [a_ij]_n is the determinant M_ij of the (n - 1) × (n - 1) submatrix of A obtained by deleting row i and column j of A.

Example 1 □ To obtain the minor of a_12 in A = [a_ij]_3, we first delete row 1 and column 2 of A. We then evaluate the determinant of the submatrix that remains. The minor M_12 of a_12 is given by

    M_12 = | a_21  a_23 |  =  a_21 a_33 - a_23 a_31. ■
           | a_31  a_33 |

Definition 6.13 The cofactor of a_ij in A = [a_ij]_n is the product of (-1)^{i+j} and the minor of a_ij. The cofactor of a_ij is denoted by A_ij.

Example 2 □ In a 3 × 3 matrix A with a_32 = 2, deleting the third row and the second column leaves the submatrix

    [ 7  9 ]
    [ 4  6 ],

so the minor of a_32 = 2 is

    M_32 = | 7  9 |  =  42 - 36  =  6,
           | 4  6 |

and the cofactor of 2 is

    A_32 = (-1)^{3+2} M_32 = -M_32 = -6. ■
Our next theorem shows that the evaluation of an n × n determinant can be reduced to the evaluation of n determinants of order n - 1. This is of little practical use except when used in combination with elementary operations. Aside from this fact, however, the theorem has substantial theoretical value. The expression given in the theorem is referred to as "an expansion by cofactors" or more precisely as "the expansion about the ith row." This expansion is the main result of this section.

Theorem 6.14 If A = [a_ij]_n, then

    det(A) = a_i1 A_i1 + a_i2 A_i2 + ··· + a_in A_in.

Proof. For a fixed integer i, we collect all of the terms in the sum det(A) = Σ_{(j)} (-1)^t a_{1j_1} a_{2j_2} ··· a_{nj_n} that contain a_i1 as a factor in one group, all of the terms that contain a_i2 as a factor in another group, and so on for each column number. This separates the terms in det(A) into n groups with no overlapping since each term contains exactly one factor from row i. In each of the terms containing a_i1, we factor out a_i1 and let F_i1 denote the remaining factor. Repeating this process for each of a_i2, a_i3, ..., a_in in turn, we obtain

    det(A) = a_i1 F_i1 + a_i2 F_i2 + ··· + a_in F_in.

To finish the proof, we need only show that F_ij = A_ij = (-1)^{i+j} M_ij, where M_ij is the minor of a_ij.
Consider first the case where i = 1 and j = 1. We shall show that a_11 F_11 = a_11 M_11. Each term in F_11 was obtained by factoring a_11 from a term (-1)^{t_1} a_11 a_{2j_2} ··· a_{nj_n} in the expansion of det(A). Thus each term in F_11 has the form (-1)^{t_1} a_{2j_2} a_{3j_3} ··· a_{nj_n}. Let t_2 be the number of interchanges used to carry j_2, j_3, ..., j_n into 2, 3, ..., n. Letting j_2, j_3, ..., j_n range over all permutations of 2, 3, ..., n, we see that each of F_11 and M_11 has (n - 1)! terms. Now 1, j_2, ..., j_n can be carried into the natural ordering by the same interchanges used to carry j_2, ..., j_n into 2, ..., n. That is, we may take t_1 = t_2. This means that F_11 and M_11 have exactly the same terms, yielding F_11 = M_11 and a_11 F_11 = a_11 M_11.
Consider now an arbitrary a_ij. By i - 1 interchanges of the original row i with the adjacent row above and then j - 1 interchanges of column j with the adjacent column on the left, we obtain a matrix B that has a_ij in the first row, first column position. Since the order of the remaining rows and columns of A was not changed, the minor of a_ij in B is the same M_ij as it is in A. By Theorem 6.11,

    det(B) = (-1)^{i-1+j-1} det(A) = (-1)^{i+j} det(A).

This gives det(A) = (-1)^{i+j} det(B).
The sum of all the terms in det(B) that contain a_ij as a factor is a_ij M_ij, from our first case. Since det(A) = (-1)^{i+j} det(B), the sum of all the terms in det(A) that contain a_ij as a factor is (-1)^{i+j} a_ij M_ij. Thus a_ij F_ij = (-1)^{i+j} a_ij M_ij = a_ij A_ij, and the theorem is proved. ■

The dual statement for Theorem 6.14 is as follows.

Theorem 6.15 If A = [a_ij]_n, then

    det(A) = a_1j A_1j + a_2j A_2j + ··· + a_nj A_nj.


Example 3 □ If A = [a_ij]_3, the expansion of det(A) about the 2nd row is given by

    | a_11  a_12  a_13 |
    | a_21  a_22  a_23 |  =  a_21 A_21 + a_22 A_22 + a_23 A_23
    | a_31  a_32  a_33 |

        =  a_21 (-1)^{2+1} | a_12  a_13 |  +  a_22 (-1)^{2+2} | a_11  a_13 |
                           | a_32  a_33 |                     | a_31  a_33 |

           +  a_23 (-1)^{2+3} | a_11  a_12 |
                              | a_31  a_32 |

        =  -a_21 (a_12 a_33 - a_13 a_32) + a_22 (a_11 a_33 - a_13 a_31)
           - a_23 (a_11 a_32 - a_12 a_31).

This value agrees with that given in Example 1 of Section 6.3. ■
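The cofactor expansion of Theorem 6.14 is also easy to state as a recursive procedure. The Python sketch below (a naive illustration of the theorem, not an efficient algorithm; the helper names are ours) expands about the first row and cross-checks the 3 × 3 value obtained by the permutation-based sketch in Section 6.3.

    def minor(A, i, j):
        # Submatrix of A with row i and column j deleted (0-based indices).
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        # Cofactor expansion about the first row (Theorem 6.14 with i = 1).
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1)**j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

    print(det([[2, 2, 0], [2, 1, 1], [-7, 2, -3]]))   # -12, as before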

Theorem 6.14 proves to be extremely useful. For instance, it provides the key step
in establishing our next theorem.

Theorem 6.16 The expression c_1 A_i1 + c_2 A_i2 + ··· + c_n A_in is equal to the determinant of a matrix which is the same as A = [a_ij]_n except that the elements a_ij of the ith row have been replaced by the scalars c_j.

Proof. By Theorem 6.14,

    det(A) = a_i1 A_i1 + a_i2 A_i2 + ··· + a_in A_in.

Let B be obtained from A by replacing a_i1 by c_1, a_i2 by c_2, ..., a_in by c_n. Then det(B) can be found from the above expansion by replacing each a_ik by c_k for k = 1, 2, ..., n. In evaluating A_ik, the ith row and kth column are deleted from A. Thus, the values of the A_ik do not depend on any of the elements a_i1, a_i2, ..., a_in. Therefore, the values of the A_ik do not change when each a_ik is replaced by c_k, and

    det(B) = c_1 A_i1 + c_2 A_i2 + ··· + c_n A_in. ■

The result parallel to Theorem 6.16 has the following formulation in terms of columns.

Theorem 6.17 The expression c_1 A_1j + c_2 A_2j + ··· + c_n A_nj is equal to the determinant of a matrix which is the same as A = [a_ij]_n except that the elements a_ij of the jth column have been replaced by the scalars c_i.

Theorem 6.18 The determinant of a matrix A = [a_ij]_n that has two identical rows (columns) is zero.

Proof. In contrast to the development thus far in this chapter, we must take into account the field F that contains the elements a_ij of A.
Suppose first that 1 + 1 ≠ 0 in F.¹ If the uth and vth rows of A are identical, let B be the matrix formed from A by the interchange of the uth and vth rows. Then B = A, but det(B) = -det(A) by Theorem 6.11. Thus we have det(A) = -det(A), and (1 + 1) det(A) = 0. Since 1 + 1 ≠ 0, det(A) = 0.
Consider now the case where 1 + 1 = 0 in F. According to Theorem 6.11, an interchange of two rows in A produces a matrix whose determinant has the value -det(A). But det(A) = -det(A) since det(A) is in F and c + c = (1 + 1)c = 0 for all c in F. Thus, an interchange of rows does not change the value of the determinant, and there is no loss of generality if we assume that the first two rows are equal. That is, a_1j = a_2j for all j. Since -1 = 1 in F, det(A) = Σ_{(j)} a_{1j_1} a_{2j_2} ··· a_{nj_n}. For each term a_{1j_1} a_{2j_2} a_{3j_3} ··· a_{nj_n} in det(A), there is a corresponding term a_{1j_2} a_{2j_1} a_{3j_3} ··· a_{nj_n}. If we group these terms in pairs and sum over only those orderings with j_1 < j_2, we have

    det(A) = Σ_{j_1 < j_2} ( a_{1j_1} a_{2j_2} a_{3j_3} ··· a_{nj_n} + a_{1j_2} a_{2j_1} a_{3j_3} ··· a_{nj_n} )
           = Σ_{j_1 < j_2} ( a_{1j_1} a_{2j_2} a_{3j_3} ··· a_{nj_n} + a_{2j_2} a_{1j_1} a_{3j_3} ··· a_{nj_n} )
           = Σ_{j_1 < j_2} (1 + 1) a_{1j_1} a_{2j_2} a_{3j_3} ··· a_{nj_n}
           = 0.

This completes the proof. ■

Theorem 6.19 The sum of the products of the elements of a row of A = [a_ij]_n by the cofactors of the corresponding elements of a different row of A is zero. Hence

    a_i1 A_k1 + a_i2 A_k2 + ··· + a_in A_kn = δ_ik det(A),

where δ_ik is the Kronecker delta.

Proof. If i = k, the equality follows from Theorem 6.14.
Suppose that i ≠ k. According to Theorem 6.16,

    a_i1 A_k1 + a_i2 A_k2 + ··· + a_in A_kn

is the determinant of a matrix B that is the same as A except that the elements of row k have been replaced by a_i1, a_i2, ..., a_in. Thus the matrix B has the ith and kth rows identical, and det(B) = 0 by Theorem 6.18. ■

The dual statement of Theorem 6.19 is given next.

¹A field in which 1 + 1 = 0 is a field of characteristic 2. For a discussion of characteristics of a field, the reader may consult any standard text in abstract algebra.

Theorem 6.20 The sum of the products of the elements of a column of A = [a_ij]_n by the cofactors of the corresponding elements of a different column is zero. Hence

    a_1j A_1k + a_2j A_2k + ··· + a_nj A_nk = δ_jk det(A).

Exercises 6.4

1. Prove that if A = [a_ij]_n has a row with all elements zero, then det(A) = 0.

2. Compute the cofactor of the indicated element a_ij in A = [a_ij], where

        A = [ 2 -2   2   1 ]
            [ 2 -1  -2  -1 ]
            [ 0  2  -4  -6 ]
            [ 2 -3  10   4 ].

    (a) a_23    (b) a_34    (c) a_42    (d) a_31    (e) a_14    (f) a_32

3. Evaluate det(A) with A as in Problem 2.

4. Let A = [a_ij] be given by

        A = [ 4 -5  3 ]
            [ 3  3 -2 ]
            [ 1 -1  1 ].

    (a) Find the matrix B = [A_ij]^T, where A_ij is the cofactor of a_ij.
    (b) Compute AB, where A and B are as in part (a).
    (c) Use the results of parts (a) and (b) to find A⁻¹.

5. For an arbitrary matrix A = [a_ij]_3, let B = [A_ij]^T.

    (a) Evaluate AB.
    (b) From the product AB in part (a), deduce a formula for A⁻¹ whenever det(A) ≠ 0 and A = [a_ij]_3.

6. Evaluate the given determinant by a cofactor expansion.

    (a) | 2 -1  2  1 |      (b) |  2  2  0 |
        | 0  1 -4 -2 |          |  2  1  1 |
        | 0  0  4 -2 |          | -7  2 -3 |
        | 0  0  4  1 |

    (c) |  2  3  0 -1 |      (d) |  1  2  1    |
        | -4 -6  0  3 |          |  3  4 -1    |
        |  2  1 -1  2 |          | -2  2 -1    |
        |  0 -2 -1  3 |          |  1 -3 -2    |

7. Find the values of x for which

        | 1-x   1    -1  |
        | -1   1-x   -1  |  =  0.
        | -1   -1   1-x  |

8. Prove Theorem 6.11 for an elementary column operation of type III.

9. Prove Theorem 6.15.

10. Prove Theorem 6.17.

11. Prove the dual statement of Theorem 6.18.

12. Prove Theorem 6.20.

6.5 Elementary Operations and Cramer's Rule


In Theorem 6.11 of the preceding section, we have seen that the application of a single
operation of type III to a square matrix has the effect of changing the sign of the
determinant. We propose now to investigate the effects of the other types of elementary
operations, and to indicate the usefulness of these operations in combination with the
cofactor expansions.
For an elementary operation of type I, we have the following result.
Theorem 6.21 If the square matrix B is obtained from the matrix A by an elementary row (column) operation of type I that multiplies every element of a row (column) by c ≠ 0, then det(B) = c det(A).

Proof. Suppose that B is obtained from A = [a_ij]_n by multiplying each entry of the kth row by c ≠ 0.
By Definition 6.8,

    det(B) = Σ_{(j)} (-1)^t b_{1j_1} b_{2j_2} ··· b_{kj_k} ··· b_{nj_n},

and since b_ij = a_ij if i ≠ k and b_{kj_k} = c a_{kj_k}, we have

    det(B) = c Σ_{(j)} (-1)^t a_{1j_1} a_{2j_2} ··· a_{kj_k} ··· a_{nj_n}
           = c det(A). ■

For elementary operations of type II, we have the following description.

Theorem 6.22 If B is obtained from the square matrix A by an elementary row (column) operation of type II, then det(B) = det(A).

Proof. Let A = [a_ij]_n and suppose that B = [b_ij]_n is formed by adding to each element a_uj of the uth row of A the product of the scalar c and the corresponding element a_vj of the vth row of A (u ≠ v). Then B and A are the same except in the uth rows, and the cofactor of b_uj in the uth row of B is the same as the cofactor A_uj of the corresponding element a_uj in A. When det(B) is expanded about the uth row, we find

    det(B) = b_u1 A_u1 + b_u2 A_u2 + ··· + b_un A_un
           = (a_u1 + c a_v1) A_u1 + (a_u2 + c a_v2) A_u2 + ··· + (a_un + c a_vn) A_un
           = a_u1 A_u1 + a_u2 A_u2 + ··· + a_un A_un
             + c (a_v1 A_u1 + a_v2 A_u2 + ··· + a_vn A_un).

By Theorems 6.14 and 6.19,

    det(B) = det(A) + c(0) = det(A). ■
Example 1 □ As an illustration of the usefulness of the elementary operations, consider the evaluation of the determinant

    det(A) = | 5   2   2  15 |
             | 2   2  -4   6 |
             | 2  -4   2   6 |
             | 0   5   7   1 |.

By Theorem 6.21,

    det(A) = 2 | 5   2   2  15 |
               | 1   1  -2   3 |
               | 2  -4   2   6 |
               | 0   5   7   1 |.

According to Theorem 6.22, the value of the determinant is unchanged if we (a) add to row 1 the product of -5 and row 2 and (b) add to row 3 the product of -2 and row 2. Thus

    det(A) = 2 | 0  -3  12   0 |
               | 1   1  -2   3 |
               | 0  -6   6   0 |
               | 0   5   7   1 |.

Expanding about the first column, we have

    det(A) = 2(-1)^{2+1} | -3  12  0 |   =   (-2)(-3)(6) |  1  -4  0 |
                         | -6   6  0 |                   | -1   1  0 |
                         |  5   7  1 |                   |  5   7  1 |.

Expanding now about the third column, we have

    det(A) = (-2)(-3)(6)(-1)^{3+3} |  1  -4 |   =   -108.
                                   | -1   1 |
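If the recursive det function sketched after Example 3 of Section 6.4 is in scope, the value found in Example 1 can be confirmed in one line, and the sign change predicted by Theorem 6.11 can be seen by swapping two rows. This is offered only as a plausibility check of the elementary-operation computation above.

    print(det([[5, 2, 2, 15],
               [2, 2, -4, 6],
               [2, -4, 2, 6],
               [0, 5, 7, 1]]))     # -108

    print(det([[2, 2, -4, 6],      # rows 1 and 2 interchanged
               [5, 2, 2, 15],
               [2, -4, 2, 6],
               [0, 5, 7, 1]]))     # 108, the sign change of Theorem 6.11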

Our final result of this section makes an important connection between determinants and the solution of certain types of systems of linear equations. This theorem presents a formula for the unknowns in terms of certain determinants. This formula is commonly known as Cramer's rule.

Theorem 6.23 Consider a system of linear equations AX = B, in which A = [a_ij]_{n×n}, X = [x_1, x_2, ..., x_n]^T, and B = [b_1, b_2, ..., b_n]^T. If det(A) ≠ 0, the unique solution of the system is given by

    x_j = ( Σ_{k=1}^{n} b_k A_kj ) / det(A),        j = 1, 2, ..., n.

Proof. We first show that the given values are a solution of the system. Substitution of these values for x_j into the left member of the ith equation of the system yields

    a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n
        = (1/det(A)) [ a_i1 ( Σ_{k=1}^{n} b_k A_k1 ) + ··· + a_in ( Σ_{k=1}^{n} b_k A_kn ) ]
        = (1/det(A)) Σ_{j=1}^{n} ( Σ_{k=1}^{n} a_ij b_k A_kj )
        = (1/det(A)) Σ_{k=1}^{n} ( b_k Σ_{j=1}^{n} a_ij A_kj )
        = (1/det(A)) Σ_{k=1}^{n} b_k ( δ_ik det(A) )
        = b_i.

Thus, the values

    x_j = ( Σ_{k=1}^{n} b_k A_kj ) / det(A)

furnish a solution of the system.
To prove uniqueness, suppose that x_j = y_j, j = 1, 2, ..., n, represents any solution to the system. Then the ith equation Σ_{k=1}^{n} a_ik y_k = b_i is satisfied for i = 1, 2, ..., n. If we multiply both members of the ith equation by A_ij (j fixed) and form the sum of these equations, we find that

    Σ_{i=1}^{n} ( Σ_{k=1}^{n} a_ik A_ij y_k ) = Σ_{i=1}^{n} b_i A_ij

or

    Σ_{k=1}^{n} ( Σ_{i=1}^{n} a_ik A_ij ) y_k = Σ_{i=1}^{n} b_i A_ij.

But, for each k, Σ_{i=1}^{n} a_ik A_ij = δ_kj det(A). Thus

    Σ_{k=1}^{n} δ_kj det(A) y_k = Σ_{i=1}^{n} b_i A_ij,

and

    y_j = ( Σ_{i=1}^{n} b_i A_ij ) / det(A).

Hence these y_j's are the same as the solution given in the statement of the theorem. ■

We note that the sum Σ_{k=1}^{n} b_k A_kj is the determinant of the matrix obtained by replacing the jth column of A by the column of constants B = [b_1, b_2, ..., b_n]^T.

Example 2 □ For a system in three unknowns with det(A) ≠ 0,

    a_11 x_1 + a_12 x_2 + a_13 x_3 = b_1
    a_21 x_1 + a_22 x_2 + a_23 x_3 = b_2
    a_31 x_1 + a_32 x_2 + a_33 x_3 = b_3,

the solutions stated in Theorem 6.23 can be written as

    x_1 = | b_1  a_12  a_13 |          x_2 = | a_11  b_1  a_13 |          x_3 = | a_11  a_12  b_1 |
          | b_2  a_22  a_23 | / |A|,         | a_21  b_2  a_23 | / |A|,         | a_21  a_22  b_2 | / |A|.  ■
          | b_3  a_32  a_33 |                | a_31  b_3  a_33 |                | a_31  a_32  b_3 |
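Cramer's rule is straightforward to turn into a program once a determinant routine is at hand. The self-contained Python sketch below (helper names and the sample system are our own, chosen only for illustration) forms each numerator by replacing a column of A with the column of constants, exactly as noted above.

    def minor(A, i, j):
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1)**j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

    def cramer_solve(A, b):
        # Solve AX = B by Cramer's rule (Theorem 6.23); requires det(A) != 0.
        d = det(A)
        solution = []
        for j in range(len(A)):
            # Replace column j of A by the column of constants b.
            Aj = [row[:j] + [b[i]] + row[j+1:] for i, row in enumerate(A)]
            solution.append(det(Aj) / d)
        return solution

    A = [[1, 2, 0],
         [0, 1, 1],
         [2, 0, 1]]
    b = [3, 2, 3]
    print(cramer_solve(A, b))    # [1.0, 1.0, 1.0]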
Exercises 6.5

1. Evaluate the given determinant by an appropriate combination of elementary operations and expansion by cofactors.

    (a) | 1  0  1 |    (b) | 4 -5  3 |    (c) | a  -b |    (d) | a-b  2a-b |
        | 1  1  2 |        | 3  3 -2 |        | b   a |        | b-a  2b-a |
        | 3  4 -2 |        | 1 -1  1 |

    (e) | 4 -2 -9 |    (f) | 2  4 -3 |    (g) | 1  1  1  1 |    (h) | -1  1  0  2 |
        | 7  2 10 |        | 4  8  9 |        | 2  3  1 -2 |        | 10  2  3 -1 |
        | 4  1 -3 |        | 4  4 -9 |        | 4  6  1  4 |        |  4  0  1 -1 |
                                              | 8  9  1 -8 |        |  1  1  1  1 |

2. Use Cramer's Rule to evaluate two of the unknowns, and then find the remaining unknown by substitution of these values into one of the equations.

    (a) 2x_1        + 2x_3 =  2        (b) 4x_1 - 2x_2 +  9x_3 = 1
             x_2   - 3x_3 = -4             7x_1 + 2x_2 - 10x_3 = 0
        2x_1 + x_2 +  x_3 =  6             4x_1 +  x_2 +  3x_3 = 2

    (c) 4x_1 + 2x_2 + 4x_3 =  2        (d) 3x_1 - 4x_2 +  6x_3 = 1
        6x_1 - 5x_2 + 8x_3 = -3            9x_1 + 8x_2 - 12x_3 = 3
               3x_2 + 2x_3 =  5            9x_1 - 4x_2 + 12x_3 = 4

    (e) 2x_1 -  x_2 + 3x_3 = 17        (f) 2x_1 + 4x_2 +  x_3 =  17
        5x_1 - 2x_2 + 4x_3 = 28            5x_1 + 3x_2 -  x_3 =   4
        3x_1 + 3x_2 -  x_3 = -1            2x_1 - 7x_2 + 3x_3 = -36

3. Explain how one can use elementary operations to conclude that

    if | 2  5 -6 |              then |  4  15 -6 |
       |-1  2  4 |  =  139,          | -2   6  4 |  =  834.
       | 3  1  5 |                   |  6   3  5 |

4. Solve for x:

    |  x   0   0   -8 |
    | -1   x   0  -10 |
    |  0  -1   x   -1 |  =  0.
    |  0   0  -1    1 |

5. Show that

    | 1   1   1  |
    | a   b   c  |  =  (a - b)(b - c)(c - a).
    | a²  b²  c² |

6. Show that

    | a  a  a |
    | a  b  b |  =  a(a - b)(b - c).
    | a  b  c |

7. Evaluate the following determinant.

    | (a+b)²    c²       c²    |
    |   a²    (b+c)²     a²    |
    |   b²      b²     (a+c)²  |

8. Show that

    | -x    1    0    0   |
    |  0   -x    1    0   |
    |  0    0   -x    1   |  =  c_0 + c_1 x + c_2 x² + c_3 x³ + x⁴.
    | -c_0 -c_1 -c_2 -c_3-x|

9. Show that

    | a  b  c |     | a  b  c |     |  a    b    c  |
    | d  e  f |  +  | d  e  f |  =  |  d    e    f  |
    | h  i  j |     | k  m  n |     | h+k  i+m  j+n |.

10. Prove that an equation of the straight line through two distinct points with rectangular coordinates (x_1, y_1) and (x_2, y_2) is given by

    | x    y    1 |
    | x_1  y_1  1 |  =  0.
    | x_2  y_2  1 |

11. Suppose that f_11, f_12, f_21, and f_22 are differentiable functions of x and that g is defined by

    g(x) = | f_11(x)  f_12(x) |
           | f_21(x)  f_22(x) |.

Use calculus formulas and show that

    g'(x) = | f'_11  f'_12 |  +  | f_11   f_12  |
            | f_21   f_22  |     | f'_21  f'_22 |.

12. Use Theorem 6.21 to express |cA| in terms of |A| for A = [a_ij]_n.

13. Prove the dual statement of Theorem 6.21.

14. Prove the dual statement of Theorem 6.22.

6.6 Determinants and Matrix Multiplication


We turn now to an investigation of the determinant of a product of two matrices.
Our principal goal is to obtain the result that det(AB) = det(A) det(B). Although the
statement of this result is simple enough, the derivation is a bit involved. We adopt an
approach that utilizes elementary matrices in this derivation.
Theorem 6.24 The determinant of any elementary matrix is not zero.

Proof. Let M be an elementary matrix. If M is of type I, then |M| = c|I_n| = c, where c ≠ 0, by Theorem 6.21. If M is of type II, then |M| = |I_n| = 1 by Theorem 6.22. If M is of type III, then |M| = -|I_n| = -1 by Theorem 6.11. ■
Theorem 6.25 If A and M are n × n matrices and M is an elementary matrix, then det(MA) = det(M) det(A).

Proof. We recall that multiplying A on the left by an elementary matrix M performs the same row operation on A as that used to obtain M from the identity matrix I_n (see page 92).
If M is of type I as in Theorem 6.21, then

    det(MA) = c det(A) = det(M) det(A),

where c ≠ 0. If M is of type II, then

    det(MA) = det(A) = det(M) det(A),

by Theorem 6.22. If M is of type III as in Theorem 6.11, then

    det(MA) = -det(A) = (-1) det(A) = det(M) det(A).

Thus the theorem is valid in all cases. ■

The result of the theorem extends readily to the following corollary, the proof of which is left as an exercise (Problem 6).

Corollary 6.26 If M_1, M_2, ..., M_k are elementary n × n matrices and A is n × n, then

    det(M_1 M_2 ··· M_k A) = det(M_1) det(M_2) ··· det(M_k) det(A).

Corollary 6.27 If M_1, M_2, ..., M_k are elementary n × n matrices, then

    det(M_1 M_2 ··· M_k) = det(M_1) det(M_2) ··· det(M_k).

Proof. The asserted equality follows at once from Corollary 6.26 with A = In. ■

Corollary 6.27 enables us to extend the result of Theorem 6.24 to arbitrary invertible
matrices, and we are also able to establish the converse.

Theorem 6.28 An n x n matrix A is invertible if and only if det(A) ≠ 0.

Proof. Assume first that A is invertible. Then A is a product of elementary matrices (Theorem 3.19), say A = M_1 M_2 · · · M_k. From Corollary 6.27,

    det(A) = det(M_1) det(M_2) · · · det(M_k),

and this product is not zero by Theorem 6.24.
Assume now that det(A) ≠ 0. Let P be an invertible matrix such that PA is in reduced row-echelon form. Now P is a product of elementary matrices, P = M_1 M_2 · · · M_k, so that

    det(PA) = det(M_1 M_2 · · · M_k A) = det(M_1) det(M_2) · · · det(M_k) det(A).

Since det(A) ≠ 0 and each det(M_i) ≠ 0, det(PA) ≠ 0. This implies that PA does not
have a row of zeros, and consequently is of rank n by Corollary 3.49. But A and PA
have the same rank since they are equivalent, so this means that A has rank n. It follows
from Corollary 3.50 that A is invertible. ■

When the theory of matrices and determinants was in its early stages of development,
the word nonsingular was normally used instead of invertible, and the word singular
was used to indicate that a square matrix did not have an inverse. The last theorem
gives the basis for the early terminology: singular corresponds to zero determinant, and
nonsingular corresponds to nonzero determinant.
We are now in a position to prove the main theorem in this section.

Theorem 6.29 If A and B are n x n matrices, then

    det(AB) = det(A) det(B).

Proof. If A is not invertible, then AB is not invertible (for AB invertible implies AB(AB)^{-1} = I_n, which in turn implies A is invertible by Theorem 3.14). By Theorem 6.28, det(A) = 0 and det(AB) = 0, so that

    det(AB) = det(A) det(B)

in this case.
Assume that A is invertible. Then A is a product of elementary matrices, A = M_1 M_2 · · · M_k, and Corollary 6.26 implies that

    det(AB) = det(M_1 M_2 · · · M_k B)
            = det(M_1) det(M_2) · · · det(M_k) det(B).

But

    det(A) = det(M_1) det(M_2) · · · det(M_k)

by Corollary 6.27, and we have

    det(AB) = det(A) det(B). ■
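
The multiplicative property of Theorem 6.29 is easy to observe numerically. The following sketch is not part of the text; it assumes Python with NumPy is available and simply compares det(AB) with det(A) det(B) for a pair of random matrices.

```python
import numpy as np

# Minimal numerical check of Theorem 6.29: det(AB) = det(A) det(B).
# Floating-point arithmetic gives agreement only up to round-off.
rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 4)).astype(float)
B = rng.integers(-5, 6, size=(4, 4)).astype(float)

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(lhs, rhs)
assert np.isclose(lhs, rhs)
```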

The next theorem is of great interest since it provides a formula for the computation
of the inverse of a matrix. Unfortunately the use of this formula is not very practical for
matrices of higher orders, due to the large number of computations that are necessary.
The formula is conveniently expressed in terms of the adjoint of a matrix, defined
as follows.

Definition 6.30 If A = [a_ij]_n, then the adjoint of A, denoted by adj(A), is the matrix given by adj(A) = [A_ij]_n^T, where A_ij is the cofactor of a_ij in A.

Example 1 □ Let

                    [ 11  -6  2 ]
    A = [a_ij]_3 =  [  3  -2  1 ] .
                    [  2  -2  2 ]

The cofactor of 11 in A is

    A_11 = (-1)^{1+1} | -2  1 |  =  -4 + 2  =  -2,
                      | -2  2 |

and the cofactor of -6 is

    A_12 = (-1)^{1+2} | 3  1 |  =  (-1)(6 - 2)  =  -4.
                      | 2  2 |

Continuing in this fashion, we find that

               [ -2  -4  -2 ]^T     [ -2   8  -2 ]
    adj(A)  =  [  8  18  10 ]    =  [ -4  18  -5 ] .
               [ -2  -5  -4 ]       [ -2  10  -4 ]

Theorem 6.31 If A is invertible, then A^{-1} = (1/det(A)) adj(A).

Proof. Let A adj(A) = [c_ij]_n. Now

                [ a_11  a_12  · · ·  a_1n ] [ A_11  A_21  · · ·  A_n1 ]
    A adj(A) =  [ a_21  a_22  · · ·  a_2n ] [ A_12  A_22  · · ·  A_n2 ]
                [  ·                   ·  ] [  ·                   ·  ]
                [ a_n1  a_n2  · · ·  a_nn ] [ A_1n  A_2n  · · ·  A_nn ]

and hence c_ij = Σ_{k=1}^{n} a_ik A_jk. By Theorem 6.19, Σ_{k=1}^{n} a_ik A_jk = δ_ij det(A). Thus

    A adj(A) = det(A) I_n.

Since det(A) ≠ 0, this yields

    A [ (1/det(A)) adj(A) ] = I_n,

and it follows that

    A^{-1} = (1/det(A)) adj(A). ■

Corollary 6.32 For any A = [a_ij]_n, A adj(A) = det(A) I_n.

Example 2 □ Using the matrices A and adj(A) from Example 1, we have

              | 11  -6  2 |
    det(A) =  |  3  -2  1 |  =  -2
              |  2  -2  2 |

and

                      [ -2   8  -2 ]     [ 1  -4   1  ]
    A^{-1} = -(1/2)   [ -4  18  -5 ]  =  [ 2  -9  5/2 ] .
                      [ -2  10  -4 ]     [ 1  -5   2  ]

Example 3 □ The formula in Theorem 6.31 has its greatest usefulness with 2 x 2 matrices. If

    A = [ a  b ]
        [ c  d ]

and ad - bc ≠ 0, then

    A^{-1} = 1/(ad - bc) [  d  -b ]
                         [ -c   a ] .
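
As a computational companion to Definition 6.30 and Theorem 6.31, the sketch below is our own (it assumes Python with NumPy, which the text does not use): it builds the adjoint from cofactors and reproduces the inverse found in Example 2.

```python
import numpy as np

def adjoint(A):
    """Transpose of the matrix of cofactors of A (adj(A) as in Definition 6.30)."""
    n = A.shape[0]
    cof = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[11.0, -6.0, 2.0],
              [ 3.0, -2.0, 1.0],
              [ 2.0, -2.0, 2.0]])          # the matrix of Example 1
adjA = adjoint(A)
print(np.round(adjA))                      # [[-2, 8, -2], [-4, 18, -5], [-2, 10, -4]]
A_inv = adjA / np.linalg.det(A)            # Theorem 6.31: A^{-1} = adj(A)/det(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```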

Exercises 6.6

1. Use the formula of Theorem 6.31 to find the inverse of the given matrix.
1 2 1 -4 -5 1 0 1
(a) 1 0 1 (b) 3 3 (c) 1 1 2
0 1 -1 -1 -1 3 4-2

1 [4 2 3 3 2
(d) -1 (e) 6 -5 (f) 2 4 2
0 3 3 7 10 6

2. Use Theorem 6.29 to express det(A^m) in terms of det(A) for an arbitrary integer m (A^0 = I). In particular, obtain the result det(A^{-1}) = 1/det(A).

3. Given that A and B are square matrices of the same order, det (A) 2, and
det(B) = 6. Find the value of d e t ^ - 1 ^ ) .
λ
4. If A and B are invertible and of the same order, express det ({AB) ) in terms
of det (A) a n d d e t ( £ ) .
5. Express det (adj(^l)) in terms of det (A), where A is an n x n matrix.
6. Prove Corollary 6.26.
7. A matrix A is called skew-symmetric if AT — —A. Prove that any skew-
symmetric matrix of odd order is singular if 1 + 1 φ 0 in T.
8. Suppose that the matrices A — [α^·] η ,β = [bij]n, and C = [cij]n are such that
x
r/ bij Cij whenever i φ fc, and c^j — α^ + bkj for each j . Prove that
det(C)=det(A)+det(£).
9. A submatrix of an m x n matrix A is a matrix obtained by deleting certain rows
and/or columns of A. For an arbitrary matrix A (not necessarily square), the
number p{A) is defined as follows:
(i) if A = 0, then p{A) = 0,
(ii) if A φ 0, then p{A) is the largest possible order for a square submatrix
of A that has a nonzero determinant.
(a) Prove that if the matrix B is obtained from A by an elementary row opera­
tion, then p{B) < p{A). Hence conclude that p{B) = p{A).
(b) Prove that if the matrix B is row-equivalent to A, then p{B) = p{A).
(c) Prove that p{A) is the rank of A. {Hint: See Corollary 3.49.)

10. Suppose that A is a nonzero square matrix of order 3 that is singular. What is
the most precise statement that can be made about the rank of A?

11. Find all values of x and y in the field of complex numbers for which the matrix
x y
has rank 1.
1 x y
Chapter 7

Eigenvalues and Eigenvectors

7.1 Introduction
In Chapter 5, we studied linear transformations of a vector space U into a vector space V, where U and V were vector spaces over the same field F. We turn our attention now to the special case in which U = V. More precisely, we shall be concerned here with linear operators on a finite-dimensional vector space V. Throughout this chapter, F will denote a field, V will denote a finite-dimensional vector space over F, and T will denote a linear operator on V.

7.2 Eigenvalues and Eigenvectors


In many applications of linear algebra, particularly in engineering, those vectors in a
vector space that are mapped onto multiples of themselves by a certain linear operator
are of key importance. This type of vector also arises with the use of quadratic forms
in statistics and other areas of mathematics.

Definition 7.1 An eigenvector of T is a nonzero vector v such that T(v) = λv for some scalar λ. The scalar λ is called an eigenvalue of T.

We say that v and λ are associated with each other, or that they correspond to each other. Thus a scalar λ is an eigenvalue of T if and only if there exists a nonzero vector v such that T(v) = λv. The set of all eigenvalues of T is called the spectrum of T.
of T.
The term "eigenvalue" in the definition is not completely standardized. Other terms
used interchangeably are characteristic value, characteristic root, proper value,
and proper number. The German word "eigen" translates into English as "charac­
teristic," but the hybrid word eigenvalue seems to be more widely used than any of
the other terms. Similarly, other terms for eigenvector are characteristic vector and
proper vector.


We have already studied the intimate relations between a linear transformation and
the various matrices that represent it relative to different bases. We shall see presently
that the matrices that represent a linear operator are also useful tools in investigating
the eigenvalues of the operator. The principal connection between a linear operator T
and a matrix A that represents it is provided here by the characteristic matrix of A.
Definition 7.2 Let A be an n x n matrix over F, and let x represent an indeterminate scalar. Then the matrix A - xI is called the characteristic matrix of A.
Example 1 □ The characteristic matrix of

         [  8   5   6  0 ]
    A =  [  0  -2   0  0 ]
         [-10  -5  -8  0 ]
         [  2   1   1  2 ]

is the matrix given by

              [ 8-x    5     6     0  ]
    A - xI =  [  0   -2-x    0     0  ]
              [ -10   -5   -8-x    0  ]
              [  2     1     1    2-x ]

If we concentrate on the degrees of the terms in the definition of

                   | a_11 - x    a_12     · · ·    a_1n    |
    det(A - xI) =  |  a_21     a_22 - x   · · ·    a_2n    |
                   |   ·                             ·     |
                   |  a_n1       a_n2     · · ·  a_nn - x  |

it is clear that the term with highest degree is the product of the diagonal elements

    (a_11 - x)(a_22 - x) · · · (a_nn - x).

Hence det(A - xI) is a polynomial in x of degree n with lead coefficient (-1)^n, say

    det(A - xI) = (-1)^n x^n + c_{n-1} x^{n-1} + · · · + c_1 x + c_0.

Upon setting x = 0, we find that c_0 = det(A).
Definition 7.3 For any square matrix A over F, the polynomial det(A - xI) is the characteristic polynomial of A. The equation det(A - xI) = 0 is the characteristic equation of A, and the solutions of det(A - xI) = 0 are called the eigenvalues of A. The set of all eigenvalues of A is called the spectrum of A.

Example 2 □ With A as in Example 1,

                   | 8-x    5     6     0  |
    det(A - xI) =  |  0   -2-x    0     0  |  =  x^4 - 8x^2 + 16
                   | -10   -5   -8-x    0  |
                   |  2     1     1    2-x |

is the characteristic polynomial of A. The equation

    x^4 - 8x^2 + 16 = 0

is the characteristic equation of A. Since

    x^4 - 8x^2 + 16 = (x + 2)^2 (x - 2)^2,

the eigenvalues of A are given by λ_1 = λ_2 = -2, λ_3 = λ_4 = 2. The spectrum of A is {-2, 2}. ■
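
For readers who wish to check such computations by machine, the sketch below is our own addition (it assumes Python with NumPy, which the text does not use); it recovers the characteristic polynomial and the spectrum of the matrix A of Examples 1 and 2.

```python
import numpy as np

A = np.array([[  8,  5,  6, 0],
              [  0, -2,  0, 0],
              [-10, -5, -8, 0],
              [  2,  1,  1, 2]], dtype=float)

# np.poly returns the coefficients of det(xI - A); since n = 4 is even,
# this equals det(A - xI) = x^4 - 8x^2 + 16.
print(np.poly(A))             # approximately [1, 0, -8, 0, 16]
print(np.linalg.eigvals(A))   # -2, -2, 2, 2, so the spectrum is {-2, 2}
```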
The important connection between the eigenvalues of linear operators and those of
matrices is given in our next theorem.

Theorem 7.4 Let A be any matrix that represents the linear operator T. Then T and
A have the same spectrum.

Proof. Suppose that T is represented by the matrix A relative to the basis A of V.
Let λ be an eigenvalue of T, and let X = [x_1, x_2, ..., x_n]^T be the coordinate matrix relative to A of a corresponding eigenvector v. Then T(v) = λv and therefore AX = λX since A represents T. This gives AX - λX = 0, and (A - λI)X = 0. Since X is a nonzero matrix, the system of equations (A - λI)X = 0 has a nontrivial solution given by the coordinates recorded in X. By Theorems 6.28 and 4.18, this implies that det(A - λI) = 0, and λ is an eigenvalue of A.
On the other hand, if det(A - λI) = 0, then the system (A - λI)X = 0 has a nontrivial solution by Theorems 6.28 and 4.18. This solution provides the coordinate matrix X relative to A of a nonzero vector v. Since AX = λX, T(v) = λv, and λ is an eigenvalue of T. ■

The proof of the preceding theorem rests primarily on the equivalence of the equations

    T(v) = λv    and    AX = λX.

We see that λ is an eigenvalue of A if and only if there exists a nonzero n x 1 matrix X such that AX = λX. This motivates the following definition of eigenvectors of matrices.

Definition 7.5 Let λ be an eigenvalue of the n x n matrix A. Then an eigenvector of A associated with λ is a nonzero n x 1 matrix X such that AX = λX.

Thus the eigenvectors of matrices and linear operators are related in this way: If
the n x n matrix A represents T relative to the basis A of V, then X is an eigenvector
of A corresponding to λ if and only if X is the coordinate matrix relative to A of an
eigenvector of T corresponding to the same eigenvalue.
The method of proof of Theorem 7.4 provides a systematic method for determining
the eigenvalues of a given linear operator. Any convenient choice of a basis A of V will
determine the matrix A that represents T relative to A, and the eigenvalues of T are
precisely the solutions of the characteristic equation det(v4 — xl) = 0. The eigenvectors
v corresponding to a particular eigenvalue λ are just those nonzero vectors v in V with
coordinates X relative to A that satisfy (A — \I)X = 0. An illustration of this procedure
is given in the next example.
Example 3 □ Consider the linear operator on R^4 defined by

    T(x_1, x_2, x_3, x_4)
        = (8x_1 + 5x_2 + 6x_3, -2x_2, -10x_1 - 5x_2 - 8x_3, 2x_1 + x_2 + x_3 + 2x_4).

The matrix of T relative to the standard basis of R^4 is the matrix

         [  8   5   6  0 ]
    A =  [  0  -2   0  0 ]
         [-10  -5  -8  0 ]
         [  2   1   1  2 ]
considered in Examples 1 and 2. From Example 2, we know that the characteristic polynomial of A is

    det(A - xI) = x^4 - 8x^2 + 16 = (x + 2)^2 (x - 2)^2,

and the eigenvalues of T are λ_1 = λ_2 = -2 and λ_3 = λ_4 = 2.
To determine the eigenvectors corresponding to λ_1 = -2, we consider the system of equations (A + 2I)X = 0:

    [ 10   5   6   0 ] [ x_1 ]   [ 0 ]
    [  0   0   0   0 ] [ x_2 ] = [ 0 ]
    [-10  -5  -6   0 ] [ x_3 ]   [ 0 ]
    [  2   1   1   4 ] [ x_4 ]   [ 0 ]
Solving this system, we have

    [ 10   5   6   0 | 0 ]      [ 1  1/2  0   12 | 0 ]
    [  0   0   0   0 | 0 ]      [ 0   0   1  -20 | 0 ]
    [-10  -5  -6   0 | 0 ]  →   [ 0   0   0    0 | 0 ]
    [  2   1   1   4 | 0 ]      [ 0   0   0    0 | 0 ]

The solutions to this system are given by

    x_1 = -(1/2)x_2 - 12x_4
    x_2 = x_2
    x_3 = 20x_4
    x_4 = x_4,

where x_2 and x_4 are arbitrary. The eigenvectors of T corresponding to the eigenvalue -2 are those vectors of the form

    (x_1, x_2, x_3, x_4) = (-(1/2)x_2 - 12x_4, x_2, 20x_4, x_4)
                         = x_2(-1/2, 1, 0, 0) + x_4(-12, 0, 20, 1).

For the eigenvalue λ_3 = λ_4 = 2, the system of equations (A - 2I)X = 0 is given by

    [  6   5    6   0 ] [ x_1 ]   [ 0 ]
    [  0  -4    0   0 ] [ x_2 ] = [ 0 ]
    [-10  -5  -10   0 ] [ x_3 ]   [ 0 ]
    [  2   1    1   0 ] [ x_4 ]   [ 0 ]

The reduced row-echelon form for the augmented matrix here is

    [ 1  0  0  0 | 0 ]
    [ 0  1  0  0 | 0 ]
    [ 0  0  1  0 | 0 ]
    [ 0  0  0  0 | 0 ]

so the solutions are given by x_1 = 0, x_2 = 0, x_3 = 0, x_4 arbitrary. The eigenvectors of T corresponding to the eigenvalue 2 are those vectors of the form

    (x_1, x_2, x_3, x_4) = x_4(0, 0, 0, 1). ■
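
A quick machine check of Example 3 can be made with the following sketch of ours (assuming Python with NumPy, which is not part of the text): it confirms that the coordinate vectors found above are mapped to the stated multiples of themselves.

```python
import numpy as np

A = np.array([[  8,  5,  6, 0],
              [  0, -2,  0, 0],
              [-10, -5, -8, 0],
              [  2,  1,  1, 2]], dtype=float)

# Eigenvectors found in Example 3, paired with their eigenvalues.
for lam, X in [(-2, np.array([-0.5, 1.0,  0.0, 0.0])),
               (-2, np.array([-12.0, 0.0, 20.0, 1.0])),
               ( 2, np.array([  0.0, 0.0,  0.0, 1.0]))]:
    print(np.allclose(A @ X, lam * X))   # True in each case
```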

The procedure described just before Example 3 is not always as simple and effective
as it might seem, for there are several complications that may arise. There is usually no
difficulty in determining the matrix A. (In most cases, the matrix A is already known.)
But one may encounter trouble in the solution of the resulting characteristic equation det(A - xI) = 0.
First of all, some or all of the solutions to det(A - xI) = 0 may not lie in the field F. If an eigenvalue λ is not in F, then the nonzero coordinates x_i in a solution of AX = λX are not in F. For the nonzero elements of AX are in F whenever those

of X are, whereas those of XX are not. Thus there are no eigenvectors of T in V


corresponding to λ whenever λ £ T.
Even when the eigenvalues are all in ?*, difficulties may be encountered in the solution
of the polynomial equation det(A-xI) = 0. This is the typical situation in applications,
and a large number of numerical methods have been devised to obtain approximate
solutions to such problems.
Problems that call for the determination of eigenvalues or eigenvectors of a certain
linear transformation or matrix are called eigenvalue problems. They are one of the
most common types of problems encountered in the applications of linear algebra. Un­
fortunately, most of them are quite complicated in their formulation, and their solution
frequently involves a knowledge of several areas of mathematics. One of the simplest
types that occurs in physical situations is illustrated in our next example.

Example 4 □ Before stating the physical problem to be solved, we recall that the force
required to stretch or compress a spring by an amount x is directly proportional to x,
and that the constant of proportionality is called the spring constant. Thus F — ex,
where F is the force on the spring and x is the change in length caused by F.
Consider now the mechanical system shown in Figure 7.1.

[Figure 7.1 (Objects in Equilibrium Positions): the masses M_1 and M_2, connected by springs to the fixed point P, shown at their equilibrium positions x_1 = 0 and x_2 = 0.]

On the horizontal plane containing the line OX, an object M_1 of mass 1 unit is connected to the fixed point P by a first spring with spring constant c_1 = 3. A second object M_2 of mass 1 unit is then connected to M_1 by a second spring with spring constant c_2 = 2. The centers of gravity of M_1 and M_2 lie on a horizontal line through P. The object M_1 is displaced 1 unit toward P from its equilibrium position, and the object M_2 is displaced from its equilibrium position 2 units away from P. The two objects are released at time t = 0, and it is desired to find the positions of the objects at any subsequent time t. The masses of the springs and frictional forces are to be neglected, and no external forces act on the system.
Let x_1 and x_2 denote the displacements from the equilibrium positions of M_1 and M_2, respectively. Each displacement x_i is measured with the positive direction to the right as shown in Figure 7.2.

[Figure 7.2 (Objects in Motion): the displacements x_1 and x_2 of M_1 and M_2, measured from the equilibrium positions with the positive direction to the right along OX.]

There are two forces acting on M_1 at any time t > 0, one from each spring. The first spring exerts a force F_1 given by F_1 = -3x_1 since F_1 acts in the direction opposite to the displacement. The second spring exerts a force F_2 given by F_2 = 2(x_2 - x_1) since x_2 - x_1 is the (directed) change in the distance from the center of gravity in M_1 to that in M_2. According to Newton's second law, the sum of the forces acting on M_1 is equal to the product of its mass and its acceleration. Since M_1 has unit mass, this requires that

    d^2 x_1 / dt^2 = -3x_1 + 2(x_2 - x_1) = -5x_1 + 2x_2.

The only force acting on the object M_2 is the force -2(x_2 - x_1) due to the second spring, and Newton's second law yields

    d^2 x_2 / dt^2 = 2x_1 - 2x_2.

Thus the original problem has been reduced to that of solving the system of differential equations:

    d^2 x_1 / dt^2 = -5x_1 + 2x_2
    d^2 x_2 / dt^2 = 2x_1 - 2x_2.

We assume the existence of a solution of the form

    x_1 = a_1 cos ω_1 t + a_2 cos ω_2 t
    x_2 = b_1 cos ω_1 t + b_2 cos ω_2 t.

(A justification for this assumption would be quite a digression, and we shall see momentarily that it is a valid one, at any rate.)

Substitution for x_1, x_2, d^2 x_1/dt^2, and d^2 x_2/dt^2 in the system yields

    -a_1 ω_1^2 cos ω_1 t - a_2 ω_2^2 cos ω_2 t
        = -5a_1 cos ω_1 t - 5a_2 cos ω_2 t + 2b_1 cos ω_1 t + 2b_2 cos ω_2 t,

    -b_1 ω_1^2 cos ω_1 t - b_2 ω_2^2 cos ω_2 t
        = 2a_1 cos ω_1 t + 2a_2 cos ω_2 t - 2b_1 cos ω_1 t - 2b_2 cos ω_2 t.

In matrix form, we have

    ω_1^2 [ a_1 ] cos ω_1 t + ω_2^2 [ a_2 ] cos ω_2 t
          [ b_1 ]                   [ b_2 ]

        = [ 5  -2 ] [ a_1 ] cos ω_1 t + [ 5  -2 ] [ a_2 ] cos ω_2 t.
          [-2   2 ] [ b_1 ]             [-2   2 ] [ b_2 ]

Since this equation is an identity in t, we must have

    ω_1^2 [ a_1 ]  =  [ 5  -2 ] [ a_1 ]
          [ b_1 ]     [-2   2 ] [ b_1 ]

and

    ω_2^2 [ a_2 ]  =  [ 5  -2 ] [ a_2 ]
          [ b_2 ]     [-2   2 ] [ b_2 ] .

Thus ω_1^2 and ω_2^2 must be eigenvalues of the matrix A = [ 5  -2 ; -2  2 ], and [ a_i ; b_i ] must be an eigenvector corresponding to ω_i^2.
The eigenvalues of A are found to be λ_1 = 1, λ_2 = 6. With ω_1 = 1 and ω_2 = √6, the eigenvectors are given by

    [ a_1 ]  =  a_1 [ 1 ]      and      [ a_2 ]  =  b_2 [ -2 ]
    [ b_1 ]         [ 2 ]               [ b_2 ]         [  1 ] .

When the initial conditions x_1 = -1, dx_1/dt = 0, x_2 = 2, dx_2/dt = 0 are imposed, we find that a_1 = 3/5, b_2 = 4/5. The desired solution is given by

    x_1 = (3/5) cos t - (8/5) cos √6 t,
    x_2 = (6/5) cos t + (4/5) cos √6 t. ■
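
The small eigenvalue problem in Example 4 can be verified directly. The sketch below is ours (it assumes Python with NumPy): it finds the eigenvalues of the coefficient matrix and solves for the constants a_1 and b_2 from the initial conditions.

```python
import numpy as np

M = np.array([[ 5.0, -2.0],
              [-2.0,  2.0]])
vals, vecs = np.linalg.eig(M)
print(np.sort(vals))        # [1., 6.], so omega_1 = 1 and omega_2 = sqrt(6)

# Initial conditions x1(0) = a1 - 2*b2 = -1 and x2(0) = 2*a1 + b2 = 2,
# using the eigenvector forms (a1, b1) = a1*(1, 2) and (a2, b2) = b2*(-2, 1).
a1, b2 = np.linalg.solve(np.array([[1.0, -2.0], [2.0, 1.0]]),
                         np.array([-1.0, 2.0]))
print(a1, b2)               # 0.6 and 0.8, i.e. a1 = 3/5 and b2 = 4/5
```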

Exercises 7.2
Find the eigenvalues of the given matrix A.

-1 1 2 1 2 1 2
1. A = 2. A = 3. A = 4. A =
-2 2 -2 3 4 5 4

2-2 3 - 1 2 2 1 -1 -1
5. A = 1 1 1 6. A 2 2 2 7. A = 1 3 2
1 3 -1 -3 - 6 -6 -1 - 1 0

1 1 -1 3 1 - 1 2 2
8. ,4 = -1 3 -1 9. Λ - -1 1 10. A = -2 3 2
-12 0 1 1 -1 1 2

3 0 4 4 2 0 0 0
0 -1 0 0 0 1 -1 1
11. ,4 = 12. A =
0 - 4 -1 - 4 1 0 1 0
0 4 0 3 1 0 -1 2

In Problems 13-16, find the spectrum of the given linear operator on R n .


13. T(xi,x 2 ) = ( x i + 2 x 2 , 2 x ! - 2x 2 )
14. T(xi,x 2 ) = (2xi + x 2 , x i - x 2 )
15. T ( x i , x 2 , x 3 ) = (xi + x 2 - X3,-#i + 3 x 2 - x 3 , - x i + 2x 2 )
16. T ( x i , x 2 , x 3 , x 4 ) = (xi + x 3 + X4,2xi + x 2 + 3x 3 + x 4 , - x i - x 3 - x 4 ,
3xi + 2x 2 + 5x 3 + x 4 )

17. The linear operator T on R^2 has the matrix

        [  4  -5 ]
        [ -4   3 ]

    relative to the basis {(1,2), (0,1)}. Find the eigenvalues of T, and obtain an eigenvector corresponding to each eigenvalue.

2 1 0
18. Let T be the linear operator on V2 over R that has the matrix A 0 2 0
2 3 1
with respect to the basis {x 2 , x — 2, x + 1}. Find the eigenvalues of T, and find an
eigenvector of T corresponding to each eigenvalue.

In Problems 19 and 20, find the spectrum of the given linear operator T on V2 over R
and find an eigenvector of T corresponding to each eigenvalue.

19. T(a0 + a\X + a2x2) = (2a0 + a>\) + Q>ix + (2a 0 + 2a\ + a2)x2

20. T(a 0 + a\x + a 2 x 2 ) = (3ao — 2a\ + 02) + 2αχχ + (a 0 — 2ai + 3a2)x2

21. Given that T is a linear operator on R 2 with T(2,1) = (5,2) and T(l,2) -
(7,10), determine the eigenvalues of T and a corresponding eigenvector for each
eigenvalue.

22. Prove that a square matrix A is not invertible if and only if 0 is an eigenvalue of
A.

23. Let A be an invertible matrix. Prove that if Λ is an eigenvalue of A, then λ _ 1 is


an eigenvalue of A - 1 .

24. Assume that T is a linear operator on R 3 that has eigenvalues 1, 2, and 3, with
associated eigenvectors (2,1,3), (1,4,0), and (1,0,0), respectively. Find the eigen­
values of T _ 1 , and give an eigenvector associated with each eigenvalue.

25. If 2 is an eigenvalue of T and v = (2, —1,3) is a corresponding eigenvector, find


an eigenvalue of the linear transformation S = T2 — 2T + 1. (1 = T° as in Section
5.5.)

26. Prove that if λ is an eigenvalue of T and if S is a polynomial in T given by


S = p(T) — Σ Ι = ο α ^ > t n e n PW = Σ Γ = ο α ^ ls a n eigenvalue of S. Find an
eigenvector of S corresponding to the eigenvalue ρ(λ).

27. Translate the results of Problem 26 into statements concerning the eigenvalues
and eigenvectors of a polynomial p(A) in a square matrix A.

7.3 Eigenspaces and Similarity


If v is an eigenvector associated with the eigenvalue λ of T, then T(cv) — cT(v) = cAv =
A(cv) for every choice of the scalar c. Thus, there are many eigenvectors associated with
the same eigenvalue. The next theorem gives a great deal of insight into this collection
of eigenvectors.

Theorem 7.6 For each eigenvalue XofTin T, let N\ be the set consisting of the zero
vector together with all eigenvectors of T in V that are associated with X. Then each
V\ is a subspace of V.

Proof. The zero vector is in V_λ, so V_λ is nonempty. Let v_1, v_2 ∈ V_λ and a, b ∈ F. Then

    T(av_1 + bv_2) = aT(v_1) + bT(v_2) = aλv_1 + bλv_2 = λ(av_1 + bv_2),

so av_1 + bv_2 ∈ V_λ, and V_λ is a subspace of V. ■

Definition 7.7 The subspace V_λ of Theorem 7.6 is called the eigenspace of T that is associated with the eigenvalue λ. The dimension of V_λ is called the geometric multiplicity of λ.

There is another approach that we could have taken in proving the last theorem. For T(v) = λv if and only if (T - λ)v = 0, i.e., if and only if v is in the kernel of T - λ. Thus V_λ is nothing more than the kernel of T - λ, and the geometric multiplicity of λ is precisely the dimension of the kernel of T - λ. This translates the problem of finding an eigenspace into the familiar problem of finding a kernel. In Example 3 of Section 7.2,

    V_{-2} = (T + 2)^{-1}(0) = ⟨(-1/2, 1, 0, 0), (-12, 0, 20, 1)⟩

and

    V_2 = (T - 2)^{-1}(0) = ⟨(0, 0, 0, 1)⟩.
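
Since an eigenspace is just the null space of A - λI, it can be computed mechanically. The sketch below is our own (assuming Python with NumPy, not anything in the text); it extracts a basis of the null space from the singular value decomposition and applies it to λ = -2 and the matrix A of Example 3.

```python
import numpy as np

def null_space_basis(M, tol=1e-10):
    """Orthonormal basis of the null space of M, obtained from the SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T            # columns span the null space of M

A = np.array([[  8,  5,  6, 0],
              [  0, -2,  0, 0],
              [-10, -5, -8, 0],
              [  2,  1,  1, 2]], dtype=float)
N = null_space_basis(A - (-2) * np.eye(4))
print(N.shape[1])                 # 2: the geometric multiplicity of -2
print(np.allclose(A @ N, -2 * N)) # True: every column lies in V_{-2}
```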
Concerning those eigenvectors of T that are associated with different eigenvalues,
we have the following result.

Theorem 7.8 Let {λ_1, λ_2, ..., λ_r} be a set of distinct eigenvalues of the linear operator T. For each i, 1 ≤ i ≤ r, let v_i be an eigenvector of T corresponding to λ_i. Then {v_1, v_2, ..., v_r} is a linearly independent set.

Proof. The proof is by induction on r. The theorem is true for r = 1 since any eigenvector is nonzero.
Assume that the theorem is true for any set of k distinct eigenvalues. Let {λ_1, ..., λ_k, λ_{k+1}} be a set of k + 1 distinct eigenvalues of T, with v_i an eigenvector corresponding to λ_i. Suppose that c_1, ..., c_k, c_{k+1} are scalars such that

    c_1 v_1 + · · · + c_k v_k + c_{k+1} v_{k+1} = 0.                          (7.1)

Applying T to each member and using T(v_i) = λ_i v_i, we have

    c_1 λ_1 v_1 + · · · + c_k λ_k v_k + c_{k+1} λ_{k+1} v_{k+1} = 0.          (7.2)

Multiplying (7.1) by λ_{k+1}, we have

    c_1 λ_{k+1} v_1 + · · · + c_k λ_{k+1} v_k + c_{k+1} λ_{k+1} v_{k+1} = 0.  (7.3)

Subtracting (7.3) from (7.2), we obtain

    c_1(λ_1 - λ_{k+1}) v_1 + · · · + c_k(λ_k - λ_{k+1}) v_k = 0.



From the induction hypothesis, {v_1, ..., v_k} is linearly independent so that

    c_i(λ_i - λ_{k+1}) = 0

for i = 1, 2, ..., k. Since λ_i - λ_{k+1} ≠ 0 for i = 1, 2, ..., k, we conclude that

    c_1 = c_2 = · · · = c_k = 0.

Thus in (7.1) we have c_{k+1} v_{k+1} = 0, and hence c_{k+1} = 0. This shows that the set of k + 1 eigenvectors is linearly independent, and it follows that the theorem is true for all positive integers r. ■

Corollary 7.9 If V_{λ_1}, V_{λ_2}, ..., V_{λ_r} are distinct eigenspaces of T and {v_1, v_2, ..., v_r} is a set of eigenvectors such that v_i ∈ V_{λ_i} for i = 1, 2, ..., r, then {v_1, v_2, ..., v_r} is linearly independent.

Proof. The eigenvalues λ_1, λ_2, ..., λ_r must be distinct in order for the eigenspaces V_{λ_1}, V_{λ_2}, ..., V_{λ_r} to be distinct. Since v_i is an eigenvector associated with λ_i, the set {v_1, v_2, ..., v_r} is linearly independent by the theorem. ■

We recall that a sum W_1 + W_2 + · · · + W_r of subspaces W_i of V is called direct if

    W_i ∩ Σ_{j≠i} W_j = {0}

for i = 1, 2, ..., r.

Corollary 7.10 If V_{λ_1}, V_{λ_2}, ..., V_{λ_r} are distinct eigenspaces of T, the sum V_{λ_1} + V_{λ_2} + · · · + V_{λ_r} is direct.

Proof. Suppose that the sum is not direct, and let v_k be a nonzero vector in V_{λ_k} that is also contained in Σ_{j≠k} V_{λ_j}. Then there are vectors v_j in V_{λ_j} and scalars a_j such that

    v_k = Σ_{j≠k} a_j v_j.

That is, v_k is linearly dependent on the set

    A = {v_1, ..., v_{k-1}, v_{k+1}, ..., v_r}.

Since v_k ≠ 0, there are nonzero vectors in A. Let A' = {v'_1, v'_2, ..., v'_t} be the nonempty set obtained by deleting all zero vectors from A. Then v_k is dependent on A' so that the set {v'_1, v'_2, ..., v'_t, v_k} is linearly dependent. But {v'_1, v'_2, ..., v'_t, v_k} is a set of eigenvectors that satisfies the hypothesis of Corollary 7.9. Thus we have a contradiction, and it follows that the sum Σ_{i=1}^{r} V_{λ_i} is direct. ■

Suppose that T has matrix A relative to the basis A of V, and that T has matrix B relative to the basis A' of V. If P is the (invertible) matrix of transition from A to A', then Theorem 5.15 asserts that B = P^{-1}AP. This leads to the following definition.

Definition 7.11 Let A and B be n x n matrices over F. Then B is similar to A over F if there is an invertible matrix P with elements in F such that B = P^{-1}AP.

It is left as an exercise (Problem 16) to show that this relation of similarity is a true
equivalence relation on the set of n x n matrices over T. This relation proves to be a
useful tool in the investigation of the eigenvalues of matrices.
The remarks just before Definition 7.11 show that two nxn matrices over T are
similar over T if and only if they represent the same linear operator on an n-dimensional
vector space V over T.
The strong connection between the relation of similarity and the eigenvalues of
matrices becomes apparent in our next theorem.

Theorem 7.12 Similar matrices have the same characteristic polynomial.

Proof. If B is similar to A over F, then there is an invertible matrix P such that B = P^{-1}AP. Thus

    det(B - xI) = det(P^{-1}AP - xI)
                = det(P^{-1}AP - xP^{-1}P)
                = det(P^{-1}(A - xI)P)
                = det(P^{-1}) · det(A - xI) · det(P)
                = det(P^{-1}P) · det(A - xI)
                = det(A - xI),

so that B and A have the same characteristic polynomial. ■

Corollary 7.13 Similar matrices have the same spectrum.
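
Theorem 7.12 and Corollary 7.13 are also easy to observe numerically. The sketch below is our own (assuming Python with NumPy): it conjugates a matrix by a random invertible P and compares the resulting characteristic polynomials.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))     # almost surely invertible
B = np.linalg.inv(P) @ A @ P        # B is similar to A

# Same characteristic polynomial (hence the same spectrum), up to round-off.
print(np.allclose(np.poly(A), np.poly(B)))   # True
```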

Near the beginning of this section, we defined the geometric multiplicity of an eigen­
value of a linear transformation. There is a second type of multiplicity for eigenvalues,
the algebraic multiplicity.

Definition 7.14 Let λ be an eigenvalue of T, and let A be any matrix that represents T. The algebraic multiplicity of λ is the multiplicity of λ as a root of det(A - xI) = 0.

From Theorem 7.12 and our discussion concerning Definition 7.11, it is clear that the
algebraic multiplicity of an eigenvalue is well-defined. That is, the algebraic multiplicity
is independent of the choice of the matrix A.

Example 1 □ In Example 3 of Section 7.2, the linear operator T had the two distinct eigenvalues λ_1 = -2 and λ_3 = 2. Examining the characteristic polynomial of A, we see that the algebraic multiplicity of each eigenvalue is 2. Upon comparing the algebraic and geometric multiplicities, we find that the two are equal for λ_1, but that the algebraic multiplicity of λ_3 exceeds the geometric multiplicity, which is 1. Our next theorem shows that the situation in this example illustrates the only possibilities. ■
Theorem 7.15 The geometric multiplicity of an eigenvalue does not exceed its algebraic
multiplicity.

Proof. Suppose that the geometric multiplicity of an eigenvalue λ of T is r, and let {v_1, v_2, ..., v_r} be a basis of V_λ. This basis of V_λ can be extended to a basis A = {v_1, v_2, ..., v_r, ..., v_n} of V. The matrix of T relative to this basis is

        [ λ  0  · · ·  0  a_{1,r+1}    · · ·  a_{1n}    ]
        [ 0  λ  · · ·  0  a_{2,r+1}    · · ·  a_{2n}    ]
    A = [ ·                 ·                    ·      ]
        [ 0  0  · · ·  λ  a_{r,r+1}    · · ·  a_{rn}    ]
        [ 0  0  · · ·  0  a_{r+1,r+1}  · · ·  a_{r+1,n} ]
        [ ·                 ·                    ·      ]
        [ 0  0  · · ·  0  a_{n,r+1}    · · ·  a_{nn}    ]

and

                              | a_{r+1,r+1} - x  · · ·   a_{r+1,n}   |
    det(A - xI) = (λ - x)^r · |       ·                      ·       |
                              |   a_{n,r+1}      · · ·  a_{nn} - x   |

Thus the algebraic multiplicity of λ is at least r. That is, the geometric multiplicity does not exceed the algebraic multiplicity. ■
Example 2 □ Let T be the linear operator on V_2 over R defined by

    T(a_0 + a_1 x + a_2 x^2) = (-4a_0 - a_1 - a_2) + (4a_0 + 2a_2)x + (2a_0 + a_1 - a_2)x^2.

We shall find the algebraic and geometric multiplicity of each eigenvalue of T and obtain a basis for each eigenspace. The set {1, x, x^2} is a more convenient choice of basis, but we shall use the basis

    A = {1, 1 + x, 1 + x + x^2}

to emphasize the difference between the eigenvectors of T and the eigenvectors of the matrix A that represents T relative to A.
Since

    T(1) = -4 + 4x + 2x^2
         = (-8)(1) + 2(1 + x) + 2(1 + x + x^2),

    T(1 + x) = -5 + 4x + 3x^2
             = (-9)(1) + 1(1 + x) + 3(1 + x + x^2),

    T(1 + x + x^2) = -6 + 6x + 2x^2
                   = (-12)(1) + 4(1 + x) + 2(1 + x + x^2),

the matrix of T relative to A is

        [ -8  -9  -12 ]
    A = [  2   1    4 ] .
        [  2   3    2 ]

We find that the characteristic polynomial of A is

                   | -8-x   -9   -12 |
    det(A - xI) =  |   2   1-x     4 |
                   |   2     3   2-x |

                = -(x^3 + 5x^2 + 8x + 4)
                = -(x + 1)(x^2 + 4x + 4)
                = -(x + 1)(x + 2)^2.

Thus the eigenvalues of T and A are -2 with algebraic multiplicity 2 and -1 with algebraic multiplicity 1.
For the eigenvalue λ = -2, the system (A + 2I)X = 0 appears as

    [ -6  -9  -12 ] [ x_1 ]   [ 0 ]
    [  2   3    4 ] [ x_2 ] = [ 0 ]
    [  2   3    4 ] [ x_3 ]   [ 0 ]

The reduced row-echelon form for the corresponding augmented matrix is

    [ 1  3/2  2  0 ]
    [ 0   0   0  0 ]
    [ 0   0   0  0 ]

From this, we see that the solutions to (A + 2I)X = 0 are given by x_1 = -(3/2)x_2 - 2x_3, with x_2 and x_3 arbitrary. That is, the coordinates of vectors in the eigenspace V_{-2} are

given by the eigenvectors

        [ x_1 ]   [ -(3/2)x_2 - 2x_3 ]        [ -3/2 ]        [ -2 ]
    X = [ x_2 ] = [        x_2       ] = x_2  [   1  ]  + x_3 [  0 ]
        [ x_3 ]   [        x_3       ]        [   0  ]        [  1 ]

of A. Thus V_{-2} has dimension 2. We can find coordinates for a basis of V_{-2} by first setting x_2 = -2 and x_3 = 0 to obtain

        [  3 ]
    X = [ -2 ]
        [  0 ]

and then setting x_2 = 0 and x_3 = -1 to obtain

        [  2 ]
    X = [  0 ] .
        [ -1 ]

(Any other linearly independent pair of coordinate matrices X would serve as well, of course.) Corresponding to these coordinates, we have the vectors 3(1) + (-2)(1 + x) = 1 - 2x and 2(1) + (-1)(1 + x + x^2) = 1 - x - x^2 that form a basis of V_{-2}.
For the eigenvalue λ = -1, (A + I)X = 0 is given by

    [ -7  -9  -12 ] [ x_1 ]   [ 0 ]
    [  2   2    4 ] [ x_2 ] = [ 0 ]
    [  2   3    3 ] [ x_3 ]   [ 0 ]

The reduced row-echelon form this time is

    [ 1  0   3  0 ]
    [ 0  1  -1  0 ]
    [ 0  0   0  0 ]

and the coordinates of vectors in V_{-1} are given by

        [ x_1 ]        [ -3 ]
    X = [ x_2 ] = x_3  [  1 ] .
        [ x_3 ]        [  1 ]

Choosing x_3 = -1 yields X = [3, -1, -1]^T and {1 - 2x - x^2} as a basis of V_{-1}.
Summarizing, the eigenvalue λ = -2 has algebraic multiplicity 2, geometric multiplicity 2, and {1 - 2x, 1 - x - x^2} is a basis for the eigenspace V_{-2}. The eigenvalue λ = -1 has algebraic multiplicity 1, geometric multiplicity 1, and {1 - 2x - x^2} is a basis for V_{-1}. ■
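
As a check on Example 2, the sketch below (ours, assuming Python with NumPy; not part of the text) confirms the characteristic polynomial and the multiplicities found above from the matrix A of T relative to the basis A.

```python
import numpy as np

A = np.array([[-8, -9, -12],
              [ 2,  1,   4],
              [ 2,  3,   2]], dtype=float)

print(np.poly(A))             # [1, 5, 8, 4]: det(xI - A) = (x + 1)(x + 2)^2
print(np.linalg.eigvals(A))   # -2 (twice) and -1

# Geometric multiplicity of -2 is the nullity of A + 2I.
rank = np.linalg.matrix_rank(A + 2 * np.eye(3))
print(3 - rank)               # 2, equal to the algebraic multiplicity
```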

Exercises 7.3

In Problems 1-6, let T be the linear operator on R n that is represented by the given
matrix A relative to the standard basis of R n . Find the algebraic multiplicity and
the geometric multiplicity of each eigenvalue of T. The matrices here are taken from
Problems 1-12 in Exercises 7.2.

1 -1 -1
-1 2 1 2
1. A 2. A = 3. A = 1 3 2
-2 3 2 -2
-1 -1 0

3 0 4 4 2 0 0 0
3 1 -1
0 - 1 0 0 0 1 - 1 1
4. A -1 1 1 5. A 6. A--
0 -4 -1 -4 1 0 1 0
1 1 1
0 4 0 3 1 0 - 1 2

7. For each matrix A below, let T be the linear operator on R 3 that has matrix
A relative to the basis A = {(1,0,0), (1,1,0), (1,1,1)}. Find the algebraic and
geometric multiplicities of each eigenvalue, and a basis for each eigenspace.

8 5 - 5 -4 -3 -1
(a) A 5 8 - 5 (b)A = -4 0 -4
1 15 15 - 1 2 8 4 5

3 2 2 8 5 6
(c)A- 1 4 1 (d)A = 0-2 0
-2 -4 -1 -10 - 5 -8

8. Let C denote the field of complex numbers, and let Cn be as defined in Example
1 of Section 4.2. Let £n be the basis Sn = {ei,e2, ...,e n } of C n , where ei =
(1,0,0,..., 0), e2 = (0,1,0,..., 0), etc. If T is the linear operator that has the given
matrix A relative to £ n , find the algebraic and geometric multiplicities of each
eigenvalue, and a basis for each eigenspace.

2 1-2 5 i
(aM = (h)A = (c)A
1+ 2 3 -i 2

i 0 0 1+ 2 0 0
(d)A = -2i i -2 +2 (e)A = -22 1 + 2 22
0 0 - 2 2 0 1

- 2 + 2i 0 - 2 + i
(f)A = 0 -i 0
4 — 2z 0 4-z

In Problems 9-12, find the eigenvalues of the given linear operator T on V2 over R. For
each eigenvalue, (a) state the algebraic multiplicity, (b) state the geometric multiplicity,
and (c) find a basis for each eigenspace.

9. T(ao + CL\X + Ü2X2) = (3ao + a\ + (22) + 2a\X + 2a2X2

10. T(ao + ΟΊΧ + Ö2^ 2 ) = 2ao + (3ao + a\ + 2^2)^ + (3ao — ai + 4a2)x 2


11. T(a0 + aix + a 2 x 2 ) = (a 0 + 3ai + 2a 2 ) + 2a\x + [a\ + 2a 2 )x 2
12. Γ(αο + a\x + a 2 x 2 ) = (2a 0 + ai) + 2aix + (2ao + 3ai + a2)x2

1 2
13. Let T be the linear operator on R 2 that has the matrix A = relative to
2 -2
the standard basis of R 2 .

(a) Find eigenvectors Vi, V2 of T such that {vi, V2} is a basis of R 2 .


(b) Find the matrix of T relative to this basis.

5 -2
14. The linear operator T on R 2 has the matrix relative to the basis
-2 2
A — {(3,3), (1, —1)}· Find the eigenvalues of T and obtain an eigenvector of T
corresponding to each eigenvalue.
15. Suppose that the basis A = {vi, V2,..., v n } of V consists entirely of eigenvectors
of T. Determine the matrix of T relative to A.
16. Prove that the relation of similarity over T is an equivalence relation on the set
of all n x n matrices over J7.
17. Which n x n matrices over T are similar to ΙηΊ

18. Prove that if B is similar to A, then BT is similar to AT.

19. Prove that if A and B are n x n matrices over T with A invertible, then BA is
similar over T to AB.

20. Prove that if B is similar to ^4 over T, then p(B) = Σι=$α>ΐΒτ is similar to


P(^) = Σ ί = ο α * ^ * for a n
y αο,αχ,...,α/e G J 7 .
21. Prove that similar matrices have the same rank.

22. Prove that if B is similar to A, then det(B) = det(A).

23. Let B = P~lAP and suppose that X is an eigenvector of A corresponding to


the eigenvalue λ. Show that λ is an eigenvalue of B, and find a corresponding
eigenvector.

24. For any square matrix A = [a^] n , the trace of A, t(A), is defined by t(A) =
ΣΓ=ι α " · That is, t(A) is the sum of the diagonal elements of A. Prove that if B
is similar to A, then t(B) — t(A).

7.4 Representation by a Diagonal Matrix


The simplest form that the matrix of a linear operator can have is that of a scalar matrix. For if cI_n is the matrix of T relative to the basis A = {v_1, v_2, ..., v_n} of V, then T(v_i) = cv_i for each i, and this implies that for any v = Σ_{i=1}^{n} a_i v_i in V,

    T(v) = Σ_{i=1}^{n} a_i T(v_i) = Σ_{i=1}^{n} a_i c v_i = c Σ_{i=1}^{n} a_i v_i = cv.

That is, c is an eigenvalue of T and every nonzero vector in V is an eigenvector corresponding to c. If c > 0, T can be described geometrically as an expansion about the origin if c > 1 and as a contraction about the origin if c < 1. If c < 0, T can be described as a reflection through the origin followed by an expansion or a contraction.
The next simplest form for a matrix is a diagonal matrix (of which the scalar matrix
is a special case). If a linear operator has a diagonal matrix relative to a certain basis,
this diagonal matrix displays at a glance the essential features of the transformation.
With v confined to an eigenspace V_λ, T(v) = λv so that T maps V_λ in the same fashion as a scalar linear operator maps the entire space V.
Although the class of linear operators that can be represented by a diagonal matrix
is quite large, not all linear operators can be represented in this way. Our primary
objective in this section is to characterize those linear operators that can be represented
by a diagonal matrix. This characterization is given in the next theorem.

Theorem 7.16 The linear operator T onV can be represented by a diagonal matrix if
and only if there is a basis of V that consists entirely of eigenvectors of T.

Proof. Suppose that A = {v_1, v_2, ..., v_n} is a basis of V such that each v_i is an eigenvector of T with λ_i as the corresponding eigenvalue. Then T(v_i) = λ_i v_i and the matrix of T relative to A is

    [ λ_1   0   · · ·   0  ]
    [  0   λ_2  · · ·   0  ]
    [  ·               ·   ]
    [  0    0   · · ·  λ_n ]

Thus T is represented by a diagonal matrix relative to A.
On the other hand, if T has a diagonal matrix

        [ d_1   0   · · ·   0  ]
    D = [  0   d_2  · · ·   0  ]
        [  ·               ·   ]
        [  0    0   · · ·  d_n ]

relative to the basis {v_1, v_2, ..., v_n} of V, then T(v_i) = d_i v_i so that each d_i is an eigenvalue of T with v_i as an associated eigenvector. ■

There are several corollaries that are worthy of mention.


Corollary 7.17 If T is represented by a diagonal matrix, the elements on the diagonal are the eigenvalues of T.

Proof. This follows at once from the last part of the proof of the theorem. ■
Corollary 7.18 If T has n distinct eigenvalues in F, then T can be represented by a diagonal matrix.

Proof. Suppose that T has n distinct eigenvalues λι, λ 2 ,..., λ η in T. Consider a set
A = {vi, V2,..., v n } of n vectors in V that contains exactly one eigenvector correspond­
ing to each λ^. The set A is linearly independent by Theorem 7.8 and therefore forms a
basis of the n-dimensional vector space V. ■
Corollary 7.19 If the n x n matrix A over F has n distinct eigenvalues in F, then A is similar over F to a diagonal matrix.

Proof. If A is an n x n matrix over T with n distinct eigenvalues in T, then any


linear operator that A represents has n distinct eigenvalues, by Theorem 7.4. But such
a linear operator can be represented by a diagonal matrix, and this diagonal matrix is
similar to A (by Definition 7.11 and Theorem 5.15). ■
Theorem 7.20 Suppose that all eigenvalues of T are in F. Then T can be represented by a diagonal matrix if and only if the geometric multiplicity of each eigenvalue of T is equal to the algebraic multiplicity.

Proof. Let T be a linear operator on the n-dimensional vector space V. In view of Theorem 7.16, it is sufficient to show that there exists a basis of eigenvectors of T if and only if the geometric and algebraic multiplicities of each eigenvalue are equal. As stated in the theorem, all eigenvalues of T are assumed to be in F.
Let λ_1, λ_2, ..., λ_r be the distinct eigenvalues of T, let n_i = dim(V_{λ_i}) be the geometric multiplicity of λ_i, and let m_i be the algebraic multiplicity of λ_i. Since m_i is the multiplicity of λ_i as a zero of the nth-degree polynomial det(A - xI), Σ_{i=1}^{r} m_i = n.
According to Theorem 7.15, 0 < n_i ≤ m_i for each i. Hence Σ_{i=1}^{r} n_i = n if and only if n_i = m_i for each i.
Now let B_i = {u_{i1}, u_{i2}, ..., u_{i n_i}} be a basis of V_{λ_i} for each i, and put

    B = {u_{11}, ..., u_{1 n_1}, u_{21}, ..., u_{2 n_2}, ..., u_{r1}, ..., u_{r n_r}}.

The set B contains Σ_{i=1}^{r} n_i vectors and clearly spans

    V_{λ_1} + V_{λ_2} + · · · + V_{λ_r}.

The sum Σ_{i=1}^{r} V_{λ_i} is direct by Corollary 7.10, and therefore

    dim( Σ_{i=1}^{r} V_{λ_i} ) = Σ_{i=1}^{r} dim(V_{λ_i}) = Σ_{i=1}^{r} n_i.

Hence B is a basis of Σ_{i=1}^{r} V_{λ_i}.
Assume that there exists a basis {v_1, v_2, ..., v_n} of eigenvectors of T. Each v_j is in some V_{λ_i} and therefore dependent on B. This means that B spans V and consequently has n elements since it is linearly independent. Thus Σ_{i=1}^{r} n_i = n and n_i = m_i for i = 1, 2, ..., r.
Assume now that n_i = m_i for i = 1, 2, ..., r. Then Σ_{i=1}^{r} n_i = n so that B has n vectors. Since B is linearly independent, B must be a basis of V. And since B is composed of eigenvectors of T, the proof is complete. ■
In the remainder of this chapter, the frequent references to diagonal matrices make
it desirable to have a more compact notation for this type of matrix. This notational
convenience is provided in the next definition.

Definition 7.21 The diagonal matrix D = [d_ij]_n with d_ij = 0 for i ≠ j and d_ii = λ_i will be denoted by D = diag{λ_1, λ_2, ..., λ_n}.

We have seen that the problem of finding a diagonal matrix and a basis such that a given linear operator is represented by the diagonal matrix is one type of eigenvalue problem. Since we have a systematic method for finding the eigenvalues and eigenvectors of a linear operator, we are already equipped to solve this type of problem. We also have available from Chapter 5 a method for finding an invertible matrix P such that P^{-1}AP is diagonal. For, with any convenient choice of basis A, P is the matrix of transition from A to a basis A' of eigenvectors of T.

In most eigenvalue problems, the linear operator T is not given explicitly. Instead, one encounters the matrix A and is confronted with the problem of finding an invertible P such that P^{-1}AP is diagonal. In such a situation, the formulation of the problem in terms of linear operators, vectors, and bases is only an encumbrance. It is more efficient to proceed directly to the problem of finding the columns of P. For this procedure, it is desirable to formulate the problem P^{-1}AP = D = diag{λ_1, λ_2, ..., λ_n} in the form AP = PD. With A = [a_ij]_n and P = [p_ij]_n, the element in the ith row and jth column of AP is Σ_{k=1}^{n} a_ik p_kj, whereas the corresponding element in PD is p_ij λ_j. With j fixed, we have

    [ Σ_{k=1}^{n} a_1k p_kj ]     [ p_1j λ_j ]          [ p_1j ]
    [ Σ_{k=1}^{n} a_2k p_kj ]  =  [ p_2j λ_j ]  =  λ_j  [ p_2j ]
    [           ·           ]     [     ·    ]          [   ·  ]
    [ Σ_{k=1}^{n} a_nk p_kj ]     [ p_nj λ_j ]          [ p_nj ]

But the left-hand member of this equation is the same as the product

    [ a_11  a_12  · · ·  a_1n ] [ p_1j ]
    [ a_21  a_22  · · ·  a_2n ] [ p_2j ]   =  A P_j,
    [   ·                  ·  ] [   ·  ]
    [ a_n1  a_n2  · · ·  a_nn ] [ p_nj ]

where P_j is the jth column of P. Thus, we find that the equation

    P^{-1}AP = diag{λ_1, λ_2, ..., λ_n}

is equivalent to the system of equations

    A P_j = λ_j P_j,    j = 1, 2, ..., n,

where P_j is the jth column of P. But this system says precisely that the jth column of P is an eigenvector of A corresponding to λ_j. The requirement that P be invertible is equivalent to the requirement that these eigenvectors of A be linearly independent. Eigenvectors that are associated with distinct eigenvalues automatically form a linearly independent set (as in Theorem 7.8), but care must be taken to ensure independence whenever the geometric multiplicity of an eigenvalue exceeds 1. The procedure is illustrated in our next example.
Example 1 □ Consider the problem of finding a real invertible matrix P such that P^{-1}AP is diagonal, where

        [  7   3   3   2 ]
    A = [  0   1   2  -4 ]
        [ -8  -4  -5   0 ]
        [  2   1   2   3 ]

As the initial step, we find the characteristic equation of A, given by

    (x - 3)^2 (x - 1)(x + 1) = 0.

We consider first the repeated eigenvalue λ_1 = λ_2 = 3, for if the geometric multiplicity of this eigenvalue is less than 2, then no matrix P of the required type exists (Theorem 7.20). The equation (A - 3I)X = 0 is given by

    [  4   3   3   2 ] [ x_1 ]   [ 0 ]
    [  0  -2   2  -4 ] [ x_2 ] = [ 0 ]
    [ -8  -4  -8   0 ] [ x_3 ]   [ 0 ]
    [  2   1   2   0 ] [ x_4 ]   [ 0 ]

Straightforward computations show that the solutions here are given by

    [ x_1 ]        [ -3/2 ]        [  1 ]
    [ x_2 ]  = x_3 [   1  ]  + x_4 [ -2 ]
    [ x_3 ]        [   1  ]        [  0 ]
    [ x_4 ]        [   0  ]        [  1 ]

Hence the eigenvalue 3 has geometric multiplicity 2, and

          [  3 ]               [  1 ]
    P_1 = [ -2 ]    and  P_2 = [ -2 ]
          [ -2 ]               [  0 ]
          [  0 ]               [  1 ]

provide two linearly independent columns of P. Repetition of the same procedure yields the solutions

          [  1 ]                             [  1 ]
    P_3 = [ -2 ]  for λ_3 = 1    and   P_4 = [ -6 ]  for λ_4 = -1.
          [  0 ]                             [  4 ]
          [  0 ]                             [ -1 ]

Thus the matrix

                               [  3   1   1   1 ]
    P = [P_1, P_2, P_3, P_4] = [ -2  -2  -2  -6 ]
                               [ -2   0   0   4 ]
                               [  0   1   0  -1 ]

is an invertible matrix such that P^{-1}AP = diag{3, 3, 1, -1}.
As a check on this solution, the student may verify that AP = P · diag{3, 3, 1, -1} or that

             [  1   1/2   1/2   0 ]
    P^{-1} = [ 1/2  1/4   1/2   1 ]
             [ -3   -2   -5/2  -1 ]
             [ 1/2  1/4   1/2   0 ]

and P^{-1}AP = diag{3, 3, 1, -1}. ■
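
The check suggested at the end of Example 1 is easily automated. The sketch below is ours (assuming Python with NumPy, not part of the text); it verifies P^{-1}AP = diag{3, 3, 1, -1} for the matrices A and P found above.

```python
import numpy as np

A = np.array([[ 7,  3,  3,  2],
              [ 0,  1,  2, -4],
              [-8, -4, -5,  0],
              [ 2,  1,  2,  3]], dtype=float)
P = np.array([[ 3,  1,  1,  1],
              [-2, -2, -2, -6],
              [-2,  0,  0,  4],
              [ 0,  1,  0, -1]], dtype=float)

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))                                            # diag(3, 3, 1, -1)
print(np.allclose(A @ P, P @ np.diag([3.0, 3.0, 1.0, -1.0])))     # True
```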
Exercises 7.4
In Problems 1-6, (a) determine whether the given matrix A is similar over R to a
diagonal matrix, and (b) whenever possible, find an invertible matrix P over R such
that P~lAP is a diagonal matrix. The matrices here are the same as in Problems 1-6
of Exercises 7.3.

1 -1 -1
-1 2 1 2
1. A = 2. A = 3. A = 1 3 2
-2 3 2 -2
-1 -1 0

3 0 4 4 2 0 0 0
3 1 -1
0 - 1 0 0 0 1 - 1 1
4. A = -1 1 1 5. A = 6. A =
0 -4 -1 -4 1 0 1 0
1 1 1
0 4 0 3 1 0 - 1 2

In Problems 7-10, (a) determine whether the given linear operator T can be represented
by a diagonal matrix, and (b) whenever possible, find a diagonal matrix and a basis such
that T is represented by the diagonal matrix relative to the basis. The linear operators
are the same as those in Problems 13-16 in Exercises 7.2.
7. T(xi,x2) = (xi + 2x 2 ,2x x - 2x 2 ) on R 2
8. T(x l 5 x 2 ) = (2a:i + x 2 ,xi - x2) on R 2
9. T(xi,x2,x3) = (xi + x2 - x 3 , —xi + 3x 2 - #3, —x\ + 2x2) on R 3
10. Τ{χι,Χ2,Χ$,Χ4) = (Xi + X3 +#4,2X1 + X2 +3#3 +X4, - # 1 - #3 - #4,
3#i + 2x2 + 5x3 + X4) on R 4
In Problems 11-14, let T be the linear operator on R 3 that has the given matrix A
relative to the basis A = {(1,0,0), (1,1,0), (1,1,1)}. (a) Determine whether T can be
represented by a diagonal matrix, and (b) whenever possible, find a diagonal matrix
and a basis of R 3 such that T is represented by the diagonal matrix relative to the
basis. These linear operators are the same as those in Problem 7 of Exercises 7.3.

8 5 - 5 -4 -3 -1
11. A 5 8 - 5 12. A = -4 0 -4
15 15 - 1 2 8 4 5

3 2 2 8 5 6
13. A = 1 4 1 14. A 0-2 0
-2 -4 -1 -10 - 5 -8

In Problems 15-20, (a) determine whether the given matrix A is similar over C to a
diagonal matrix, and (b) whenever possible, find an invertible matrix P over C such
that P~lAP is a diagonal matrix. The matrices here are the same as in Problem 8 of
Exercises 7.3.

2 1 - 2 5 z
15. A = 16. A
1+ i 3 -i 2

3 4 2 i 0 0
17. A 1 3 1 18. A: -2z 2 - 2 - h i

1 2 2 0 0 - 2

1+ 2 0 0 - 2 + 2i 0 -2 + z
19. A -2i 1 + 2 2i 20. A 0 - 2 0
i 0 1 4-2i 0 4-2

2 1 . Whenever possible, perform a check on the work in the indicated problem by


verifying that AP — PD, where D is the diagonal matrix that is similar to A.
(a) Problem 1 (b) Problem 2 (c) Problem 3 (d) Problem 4
(e) Problem 5 (f) Problem 6 (g) Problem 15 (h) Problem 16
(i) Problem 17 (j) Problem 18 (k) Problem 19 (1) Problem 20

22. Whenever possible, perform a check on the work in the indicated problem by
computing P - 1 and verifying that P~l AP is indeed a diagonal matrix.
(a) Problem 1 (b) Problem 2 (c) Problem 3 (d) Problem 4
(e) Problem 5 (f) Problem 6 (g) Problem 15 (h) Problem 16
(i) Problem 17 (j) Problem 18 (k) Problem 19 (1) Problem 20

23. Give an example of a 2 x 2 matrix over R that is not similar over R to a diagonal
matrix.

24. Give an example of two 2 x 2 matrices that have the same characteristic equation
but are not similar.

25. Show that the characteristic polynomial of the matrix

        [   0     1     0   · · ·     0     ]
        [   0     0     1   · · ·     0     ]
    C = [   ·                         ·     ]
        [   0     0     0   · · ·     1     ]
        [ -c_0  -c_1  -c_2  · · ·  -c_{n-1} ]

    is p(x) = (-1)^n (x^n + c_{n-1} x^{n-1} + · · · + c_1 x + c_0). The matrix C is called the companion matrix of the polynomial p(x). (Hint: Expand det(C - xI) about the last row.)

26. Use the result of Problem 25 to write down a matrix with the given polynomial
p(x) as its characteristic polynomial.
(a) p(x) = -x3 + 5x2 - 2 (b) p(x) = x2 - 3x + 2
(c) p(x) = x4 + 5x 2 + 4 (d) p(x) = -x5 + 1

27. Suppose that λχ,..., λΓ are the distinct eigenvalues of T, and that each λ^ is in J7.
Prove that T can be represented by a diagonal matrix if and only if

V = V> 'Xr'
Chapter 8

Functions of Vectors

8.1 Introduction
There are several standard types of functions defined on a vector space that have found
widespread application. The linear transformation, which we have already studied, is
probably the most important of these, but there are others that are of great value. The
linear functional is central to the study of linear programming. The quadratic form
is frequently useful in statistics, engineering, and physics. We shall encounter each of
these types of functions in this chapter.

8.2 Linear Functionals


We recall from Chapter 5 that whenever U and V are vector spaces over the same field F, the set of all linear transformations of U into V is a vector space over F (Theorem 5.3). Also, we have seen in Example 1 of Section 4.2 that a field F may be regarded as a vector space over itself. Thus, for any vector space V over F, the set of all linear transformations of V into F is a vector space over F. This type of linear transformation is of such importance that it has a special name.

Definition 8.1 Let V be a vector space over the field F. A linear transformation of V into F is called a linear functional on V. The set of all linear functionals on V is denoted by V*.

Thus, a linear functional on V is a scalar-valued function f defined on V that has the property

    f(au + bv) = af(u) + bf(v)

for all u, v ∈ V. As mentioned above, V* is a vector space over F. There are several interesting relations between V and V* that we will investigate later, but first let us consider some examples illustrating Definition 8.1.


Example 1 □ Let V = R^n, and let c = (c_1, c_2, ..., c_n) be a fixed vector in V. For each v = (x_1, x_2, ..., x_n) ∈ V, define f(v) to be

    f(v) = c · v = c_1 x_1 + c_2 x_2 + · · · + c_n x_n.

The function f so defined is scalar-valued, and

    f(au + bv) = c · (au + bv) = a(c · u) + b(c · v) = af(u) + bf(v).

Thus f is a linear functional on R^n. ■
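
A functional of the kind in Example 1 is simply the dot product against a fixed vector. The short sketch below is ours (assuming Python with NumPy; the text contains no code) and spot-checks the defining property f(au + bv) = af(u) + bf(v).

```python
import numpy as np

c = np.array([1.0, -2.0, 3.0])          # fixed vector defining f(v) = c . v
f = lambda v: float(np.dot(c, v))

u = np.array([4.0, 0.0, -1.0])
v = np.array([2.0, 5.0, 7.0])
a, b = 3.0, -2.0
print(np.isclose(f(a * u + b * v), a * f(u) + b * f(v)))   # True
```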

Example 2 □ Let V = R^2, and define f(v) = ||v|| for each v ∈ V. Then f is scalar-valued, but f is not a linear transformation. For example,

    f(e_1) + f(e_2) = f(1, 0) + f(0, 1) = 2

and

    f(e_1 + e_2) = f(1, 1) = √2.

Thus f is not a linear functional on R^2. ■

Example 3 □ For a fixed positive integer n, let V be the vector space V_n consisting of all polynomials in x with coefficients in the field F and degree ≤ n. For each polynomial p(x) = Σ_{i=0}^{n} a_i x^i, define f(p(x)) = p(0). The mapping f is clearly scalar-valued. For any p(x) = Σ_{i=0}^{n} a_i x^i and q(x) = Σ_{i=0}^{n} b_i x^i in V_n and any a, b ∈ F,

    f(ap(x) + bq(x)) = f( Σ_{i=0}^{n} (aa_i + bb_i) x^i )
                     = aa_0 + bb_0
                     = af(p(x)) + bf(q(x)).

This shows that f is a linear functional on V_n. ■

Example 4 □ Let V be the vector space R_{n×n} of all n x n matrices over R as defined in Chapter 4. For each A = [a_ij]_n ∈ V, the trace of A, denoted by t(A), is given by t(A) = Σ_{i=1}^{n} a_ii. It is left as an exercise (Problem 2) to verify that t is a linear functional on R_{n×n}. ■

Example 5 □ The set of all convergent sequences of real numbers is a subspace W of the vector space in Example 3 in Section 4.2. The function f defined by f({a_n}) = lim_{n→∞} a_n is a linear functional on W. ■

As mentioned earlier, it is already known that V* is a vector space over F, just as V is. The following theorem shows that V* has the same dimension as V whenever V is of finite dimension.

Theorem 8.2 If V is a vector space of finite dimension n over F, then V* is also of dimension n over F.

Proof. Suppose that A = {u_1, u_2, ..., u_n} is a basis of V. For each j (j = 1, 2, ..., n), let p_j be defined at u = Σ_{i=1}^{n} x_i u_i in V by

    p_j(u) = x_j.

Since each u can be written uniquely as u = Σ_{i=1}^{n} x_i u_i, the value p_j(u) is well-defined, and p_j is a mapping of V into F. For any a, b in F and u = Σ_{i=1}^{n} x_i u_i, v = Σ_{i=1}^{n} y_i u_i in V,

    p_j(au + bv) = p_j( Σ_{i=1}^{n} (ax_i + by_i) u_i )
                 = ax_j + by_j
                 = ap_j(u) + bp_j(v).

Thus each p_j is contained in V*.
The contention is that the set A* = {p_1, p_2, ..., p_n} is a basis of V*. To show that A* spans V*, let f be any linear functional on V. If we put a_i = f(u_i), then for any u = Σ_{i=1}^{n} x_i u_i in V,

    f(u) = Σ_{i=1}^{n} x_i f(u_i)
         = Σ_{i=1}^{n} x_i a_i
         = Σ_{i=1}^{n} a_i p_i(u)
         = ( Σ_{i=1}^{n} a_i p_i )(u).

Hence f = Σ_{i=1}^{n} a_i p_i, and A* spans V*. To see that A* is linearly independent, suppose that c_1, c_2, ..., c_n is a set of scalars such that

    c_1 p_1 + c_2 p_2 + · · · + c_n p_n = Z,

where Z is the zero linear functional. Then ( Σ_{i=1}^{n} c_i p_i )(u) = 0 for each u ∈ V. In particular, for each u_j,

    ( Σ_{i=1}^{n} c_i p_i )(u_j) = Σ_{i=1}^{n} c_i p_i(u_j) = Σ_{i=1}^{n} c_i δ_ij = c_j.

That is, each c_j = 0, and this completes the proof. ■
Definition 8.3 The basis A* in the proof of Theorem 8.2 is called the dual basis of
A. The linear functionals pj in A* are called the coordinate projections relative to
A.

For later use we note that the defining property of the coordinate projections p_j is that p_j(u_i) = δ_ij for each base vector u_i.
Whenever V is finite-dimensional, V* is called the dual space of V. If V is of infinite dimension, then V* is not necessarily isomorphic to V, and the term "dual space" is not ordinarily used.
For the remainder of this section, V will denote an n-dimensional vector space over a field F. As a linear transformation of V into F, each linear functional has a 1 x n matrix relative to each basis of V. According to Definition 5.10, the matrix of f relative to the basis A = {u_1, u_2, ..., u_n} of V is A = [a_1, a_2, ..., a_n], where a_j = f(u_j). (We are using the basis {1} of F here, and will adhere to this choice consistently.) The matrix of f provides a convenient method of computing the values f(u). For if u has coordinate matrix X = [x_1, x_2, ..., x_n]^T, then
    f(u) = Σ_{i=1}^{n} a_i x_i
                                   [ x_1 ]
         = [ a_1  a_2  · · ·  a_n ] [ x_2 ]
                                   [  ·  ]
                                   [ x_n ]
         = AX.

Actually, this result is nothing new, but merely a special case of Theorem 5.12.
The set {1} is clearly the simplest choice of basis for F, so we do not propose to make any changes here. But there is no reason to restrict ourselves in the choice of basis in V, and Theorem 5.15 describes completely the results of such a change. If f has matrix A relative to the basis A of V, and if Q is the matrix of transition from A to the basis B, then f has matrix B = AQ relative to B. (The space V here is playing the role of U in Theorem 5.15.)
Any basis of V has a dual basis in V*, so a change of basis from A to B in V induces a corresponding change of basis from A* to B* in V*. Our next theorem describes the relation between these changes of bases.

Theorem 8.4 If Q is the transition matrix from the basis A = {u_1, u_2, ..., u_n} to the basis B = {v_1, v_2, ..., v_n} of V, then (Q^T)^{-1} is the matrix of transition from A* to B* in V*.

Proof. Rather than prove the statement in the conclusion, we shall prove the equivalent assertion that Q^T is the transition matrix from B* to A*.
Let A* = {p_1, p_2, ..., p_n}, and let B* = {g_1, g_2, ..., g_n}. Now p_j = Σ_{k=1}^{n} c_kj g_k,

where C = [c_ij]_n is the matrix of transition from B* to A*. Then

    p_j(v_i) = Σ_{k=1}^{n} c_kj g_k(v_i) = Σ_{k=1}^{n} c_kj δ_ki = c_ij.

But v_i = Σ_{k=1}^{n} q_ki u_k since Q = [q_ij]_n is the transition matrix from A to B. Hence

    p_j(v_i) = Σ_{k=1}^{n} q_ki p_j(u_k) = Σ_{k=1}^{n} q_ki δ_jk = q_ji,

so we have c_ij = p_j(v_i) = q_ji, and C = Q^T. ■

The ideas developed in this section are illustrated in the following example.

Example 6 □ Let V be the vector space C^3 of all ordered triples of complex numbers.¹ The mapping f given by f(c_1, c_2, c_3) = ic_1 - ic_2 + c_3 is a linear functional on C^3. Relative to the basis A = {u_1 = (1, 0, 0), u_2 = (0, i, 0), u_3 = (0, 1, i)}, f has matrix A = [i, 1, 0]. The coordinates x_i of u = (c_1, c_2, c_3) relative to A are given by x_1 = c_1, x_2 = -ic_2 + c_3, x_3 = -ic_3, and

                      [     c_1      ]
    AX = [ i  1  0 ]  [ -ic_2 + c_3  ]  =  ic_1 - ic_2 + c_3  =  f(u).
                      [    -ic_3     ]

The elements of the dual basis A* = {p_1, p_2, p_3} are given by

    p_1(c_1, c_2, c_3) = c_1,
    p_2(c_1, c_2, c_3) = -ic_2 + c_3,
    p_3(c_1, c_2, c_3) = -ic_3.

The matrix

        [ 0  -i   0 ]
    Q = [ 1   0   1 ]
        [ 0   0  -i ]

is the matrix of transition from A to B = {v_1, v_2, v_3}, where v_1 = (0, i, 0), v_2 = (-i, 0, 0), v_3 = (0, 0, 1). The matrix of f relative to B is

    B = AQ = [1, 1, 1].

The coordinates y_i of u = (c_1, c_2, c_3) relative to B are y_1 = -ic_2, y_2 = ic_1, y_3 = c_3, and

    BY = -ic_2 + ic_1 + c_3 = f(u).

¹The symbol i here denotes the square root of -1.

The dual basis is B* = {g_1, g_2, g_3}, where

    g_1(c_1, c_2, c_3) = -ic_2,
    g_2(c_1, c_2, c_3) = ic_1,
    g_3(c_1, c_2, c_3) = c_3.

The matrix (Q^T)^{-1} is given by

                 [  0  i  0 ]
    (Q^T)^{-1} = [  1  0  0 ]
                 [ -i  0  i ]

so that g_1 = p_2 - ip_3, g_2 = ip_1, and g_3 = ip_3. The reader may verify that these last equalities are correct by evaluating both members for an arbitrary u ∈ C^3. ■
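
Example 6 can also be checked by machine. The sketch below is ours (assuming Python with NumPy and its complex arithmetic, which the text does not use); it reproduces B = AQ and the matrix (Q^T)^{-1} that carries A* to B*.

```python
import numpy as np

i = 1j
A = np.array([i, 1, 0])              # matrix of f relative to the basis A
Q = np.array([[0, -i,  0],
              [1,  0,  1],
              [0,  0, -i]])          # transition matrix from A to B

print(A @ Q)                         # [1, 1, 1] = B, as in the example
print(np.linalg.inv(Q.T))            # [[0, i, 0], [1, 0, 0], [-i, 0, i]]
```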

Exercises 8.2

1. In Example 3 of this section, replace f(p(x)) — p(0) by f(p(x)) = p(c) for a


constant c. Determine if the resulting function is a linear functional.

2. Verify that t(aA + bB) = at (A) + bt(B) in Example 4.

3. Define / on the vector space R n x n in Example 5 of Section 4.2 by f(A) = det(A).


Determine whether or not / is a linear functional.

4. Let V be the vector space of all real-valued continuous functions defined on the
closed interval [0, 1]. For each g ∈ V, put h(g) = \int_0^1 g(t)\,dt. Determine whether
or not h is a linear functional on V.

5. If f(x_1, x_2, x_3) = 2x_1 − x_2 + 4x_3, find a vector c ∈ R^3 such that f(u) = c · u for
all u ∈ R^3.

6. Let A = {u_1, u_2, u_3} where u_1 = (1,2,0), u_2 = (1,0,2), u_3 = (0,1,2), and let
A* = {p_1, p_2, p_3}. Find the three expressions that give the values of p_i(x_1, x_2, x_3)
for i = 1, 2, 3.

7. Let A = {(1,1,0), (1,0,0), (1,1,1)} and A* = {p_1, p_2, p_3}, and let f be the linear
functional that has coordinates [1, 2, 3]^T relative to A*. Find the values of f(5,4,3)
and f(x_1, x_2, x_3).
8. Let u be a fixed vector in the n-dimensional vector space V.

(a) Prove that there is a nonzero f ∈ V* such that f(u) = 0.

(b) Let u = (1, 2, 3) ∈ R^3. Find a linear functional f ≠ 0 such that f(u) = 0.

9. For the given basis A of R 3 , find the dual basis A*.


(a) A = {(0,0,1), (0,1,1), (1,1,1)} (b) A = {(0,1,1), (1,1,0), (1,0,1)}
(c) A = {(2,0,2), (0,1,1), (2, - 3 , 1 ) } (d) A = {(1, - 1 , 0 ) , (2, - 1 , 1 ) , (1,1,3)}

10. Let E_3* = {g_1, g_2, g_3} denote the dual basis of E_3 = {e_1, e_2, e_3}. In each part of
Problem 9, find the coordinates of the elements p_i of A* relative to E_3*.

11. Find the matrix of the given linear functional relative to the given basis of R^n.

(a) f(x_1, x_2, x_3) = 3x_1 − 2x_2 + 7x_3, A = {(1,0,1), (1,1,0), (0,1,1)}

(b) f(x_1, x_2, x_3) = 6x_1 + 5x_2 − 8x_3, A = {(4,−3,1), (5,−3,1), (3,−2,1)}
(c) f(x_1, x_2, x_3, x_4) = 2x_1 + 4x_3 + 12x_4,
    A = {(1,0,0,0), (1,1,0,0), (1,1,1,0), (1,1,1,1)}
(d) f(x_1, x_2, x_3, x_4) = 9x_1 − 6x_2 + 3x_4,
    A = {(1,0,1,0), (1,0,0,1), (0,0,1,1), (0,1,1,0)}

12. Use the matrices found in Problem 11 to compute f(e_i) for each e_i in the standard
basis E_n.

13. Suppose that the linear functional f has matrix A = [a_1, a_2, ..., a_n] relative to
the basis A of V. Prove that the coordinate matrix of f relative to A* is A^T.

14. Use the matrix of transition from E_n* to A* to find A* = {p_1, p_2, ..., p_n} for the
given basis A of R^n. Write each p_i as a linear combination of the elements g_i of E_n*.

(a) A = {(4,−6,4), (0,3,2), (6,−17,−13)}

(b) A = {(1,1,3), (2,−1,1), (1,−1,0)}
(c) A = {(0,1,0,0), (1,−2,0,−1), (0,1,0,1), (2,−7,1,−2)}
(d) A = {(3,6,0,2), (2,4,1,1), (2,5,−1,1), (1,3,0,1)}

15. Suppose that the linear functional f on V has matrix A = [a_1, a_2, ..., a_n] relative
to the basis A of V, and that Q is the matrix of transition from A to B. Without
using Theorem 5.15, prove that the matrix of f relative to B is AQ.

16. Let V be a finite-dimensional vector space. For any nonempty subset M of V,
the annihilator of M is the set M° of linear functionals given by

    M° = {f ∈ V* | f(u) = 0 for all u ∈ M}.

(a) Prove that M° is a subspace of V*.

(b) Prove that if W is a subspace of V, then dim(W) + dim(W°) = dim(V).

17. Find a basis for M°.

(a) M = {(1,2, - 2 , 4 ) , (1,1,1,6), (2,3, -1,10)}


(b) M = {(2,0, - 3 , 6 ) , (2,1,0,4), (0,1,3, - 2 ) }
(c) M = {(2,2,1,0), (0,4,1,0), (4,8,3,0)}
(d) M = {(1,0,1,0), (2, - 3 , - 4 , 1 ) , (1,1, - 3 , 0 ) }

18. For any nonempty subset T of V*,

    T° = {u ∈ V | f(u) = 0 for all f ∈ T}.

(a) Prove that T° is a subspace of V.

(b) Prove that if T is a subspace of V*, then dim(T) + dim(T°) = dim(V).

19. Prove that (W°)° = W for any subspace W of V, and (Τ°)° = T for any
subspace T of V*.

20. Prove that if M_1 ⊆ M_2, then M_1° ⊇ M_2°.

21. Let W_1 and W_2 be subspaces of V.

(a) Prove that (W_1 + W_2)° = W_1° ∩ W_2°.

(b) Prove that (W_1 ∩ W_2)° = W_1° + W_2°.

22. Let V be of dimension n over F. It follows from Theorem 8.2 that (V*)* = V**
is an n-dimensional vector space over F.

(a) For each u ∈ V, define the function h_u on V* by h_u(f) = f(u) for all f in
V*. Prove that h_u ∈ V**.

(b) Prove that the mapping from V to V** defined by φ(u) = h_u is an isomor-
phism from V to V**.

(Comment: Since V and V** are n-dimensional spaces over F, each is isomorphic
to F^n. The isomorphism φ is such a natural one, however, that it is ordinarily used
to identify V and V** as being the same space. That is, u and h_u are regarded as
the same entity. This point of view is advantageous in certain instances in linear
programming.)

8.3 Real Quadratic Forms


In this section we turn our attention to a second type of vector function, the quadratic
form. For the time being, we are concerned only with real quadratic forms. We consider
quadratic forms in a more general setting in Section 8.9.
Definition 8.5 Let q be a mapping of R^n into R. Then q is a real quadratic form
if there exist constants c_{ij} in R such that

    q(v) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_i x_j

for each v = (x_1, x_2, ..., x_n) in R^n.

For convenience, we shall refer to a real quadratic form in this section simply as a
"quadratic form." And since q(v) is a polynomial in x_1, x_2, ..., x_n, we refer to q as a
quadratic form in the variables x_1, x_2, ..., x_n. The use of parentheses in both q(v) and
v = (x_1, x_2, ..., x_n) leads to the clumsy expression q(v) = q((x_1, x_2, ..., x_n)), so we shall
drop one set of parentheses from this notation. Thus a quadratic form in x_1, x_2 is given
by

    q(x_1, x_2) = ax_1^2 + bx_2^2 + cx_1x_2.
The student has no doubt encountered such expressions as

    q(x, y) = ax^2 + by^2 + cxy

in analytic geometry or the calculus. For example, the left-hand member of

    9x^2 + 16y^2 = 144

defines a quadratic form in x and y, as does the left-hand member of

    xy = 1.
Typically, the set of all points (x,y) in R 2 for which a given quadratic form has a
constant value is a conic section with center at the origin.
The value of a real quadratic form q(x, y, z) in x, y, z is a polynomial

    q(x, y, z) = ax^2 + by^2 + cz^2 + dxy + exz + fyz,

where the coefficients are real numbers. Typically, the set of all points (x,y,z) in R 3
for which a given quadratic form has a constant value is a quadric surface.

Example 1 □ Consider the real quadratic form q defined on R^3 by

    q(x_1, x_2, x_3) = x_1^2 − 2x_2^2 + 4x_3^2 + 4x_1x_2 − 3x_2x_3.

If we let X = [x_1, x_2, x_3]^T and

    A = \begin{bmatrix} 1 & 4 & 0 \\ 0 & -2 & -3 \\ 0 & 0 & 4 \end{bmatrix},

a simple computation shows that X^T A X is a matrix with a single element q(x_1, x_2, x_3):

    X^T A X = [x_1^2 − 2x_2^2 + 4x_3^2 + 4x_1x_2 − 3x_2x_3].

When q(x_1, x_2, x_3) and this matrix of order 1 are identified as being the same, we have
q(x_1, x_2, x_3) = X^T A X.

In choosing this matrix A, we simply entered the coefficient a_{ij} of x_i x_j in the i-th row
and j-th column of A. Since the cross-product terms can be written in many ways, the
matrix A used in q(x_1, x_2, x_3) = X^T A X is not unique. For instance, the term 4x_1x_2
can be written as 4x_2x_1 or as 3x_1x_2 + x_2x_1. These variations would lead one to use

    A = \begin{bmatrix} 1 & 0 & 0 \\ 4 & -2 & -3 \\ 0 & 0 & 4 \end{bmatrix}  or  A = \begin{bmatrix} 1 & 3 & 0 \\ 1 & -2 & -3 \\ 0 & 0 & 4 \end{bmatrix}.

With the cross-product terms split into equal parts, the symmetric matrix

    A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & -2 & -3/2 \\ 0 & -3/2 & 4 \end{bmatrix}

would be used. ■
For any q(x_1, x_2, ..., x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_i x_j, it is always possible to write

    q(x_1, x_2, ..., x_n) = X^T A X

with X = [x_1, x_2, ..., x_n]^T and with A a symmetric n × n matrix as in the foregoing
example. For any r ≠ s, the sum of the two terms c_{rs} x_r x_s + c_{sr} x_s x_r can be split into
two parts a_{rs} x_r x_s and a_{sr} x_s x_r with a_{rs} = a_{sr} = (c_{rs} + c_{sr})/2. Together with a_{rr} = c_{rr},
this yields

    q(x_1, x_2, ..., x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j,

where A = [a_{ij}]_n is a symmetric matrix.


Definition 8.6 Whenever q(x_1, x_2, ..., x_n) is written as

    q(x_1, x_2, ..., x_n) = X^T A X

with

    X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}

and with A = [a_{ij}]_n a symmetric matrix, we say that A represents the quadratic form
q, or that A is the matrix of q relative to x_1, x_2, ..., x_n.

According to this definition, a matrix must be symmetric in order to be called the
matrix of a quadratic form, or to say that the matrix represents the quadratic form.
Unless stated otherwise, it will be assumed from now on that each X^T A X is written
with A symmetric. Under this restriction, the matrix that represents a certain quadratic
form is unique in a given set of variables x_1, x_2, ..., x_n. As a first step in establishing this
result, we prove the following lemma.

Lemma 8.7 If A is a real symmetric matrix of order n and X^T A X = 0 for all choices
of X = [x_1, x_2, ..., x_n]^T, then A is a zero matrix.

Proof. Suppose that A is a matrix that satisfies the given conditions. Let x_r = 1
and all other x_i = 0. Then 0 = X^T A X = a_{rr}. Now let r ≠ s, and put x_r = 1 and
x_s = 1, with all other x_i = 0. Then

    0 = X^T A X = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = a_{rr} + a_{rs} + a_{sr} + a_{ss} = a_{rs} + a_{sr},

since a_{rr} = a_{ss} = 0 by the first part of the argument. But a_{rs} + a_{sr} = 2a_{rs} since A is
symmetric, so we have a_{rs} = 0. Since r and s were arbitrary, A is a zero matrix. ■

Theorem 8.8 The matrix of the real quadratic form q relative to x_1, x_2, ..., x_n is unique.

Proof. Suppose that A and B both represent q relative to x_1, x_2, ..., x_n. Then

    X^T A X = q(x_1, x_2, ..., x_n) = X^T B X,

and this implies that

    X^T (A − B) X = 0

for all choices of X = [x_1, x_2, ..., x_n]^T. Now A − B is symmetric since A and B are
symmetric (see Problem 6), and therefore A = B by the lemma. ■

Let us now reexamine the definition of a real quadratic form. As stated in Definition
8.5, q(v) is a scalar that is uniquely determined by the vector v in R^n. Although the
discussion up to this point has been primarily in terms of the components x_i of v, this
should not obscure the fact that q is a function of the vector variable v. Now the vector
v is uniquely determined by its coordinates relative to any given basis of R^n, and so the
value q(v) should also be uniquely determined by these coordinates. Our next theorem
gives q(v) explicitly in terms of these coordinates. Theorem 5.14 provides the key to
the proof.

Theorem 8.9 Suppose that, for each v = (x_1, x_2, ..., x_n) in R^n,

    q(v) = X^T A X,

where A is the matrix of q relative to x_1, x_2, ..., x_n. Let P be the matrix of transition
from E_n to the basis A and let Y = [y_1, y_2, ..., y_n]^T be the coordinate matrix of v relative
to A. Then

    q(v) = Y^T B Y,

where B = P^T A P.

Proof. Since X = [x_1, x_2, ..., x_n]^T is the coordinate matrix of v = (x_1, x_2, ..., x_n)
relative to E_n, X = PY by Theorem 5.14. Hence

    q(v) = X^T A X = (PY)^T A (PY) = Y^T (P^T A P) Y. ■

Example 2 □ Let q be the quadratic form given by

    q(x_1, x_2, x_3) = 13x_1^2 + 5x_2^2 + 26x_3^2 − 16x_1x_2 − 40x_1x_3 + 24x_2x_3,

and let Y = [y_1, y_2, y_3]^T be the coordinate matrix of v = (x_1, x_2, x_3) relative to the
basis A = {(1,2,0), (0,−2,1), (2,5,−1)}. We shall find the expression for q(v) in terms
of y_1, y_2, y_3.

Written in matrix form, q(v) appears as

    q(x_1, x_2, x_3) = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 13 & -8 & -20 \\ -8 & 5 & 12 \\ -20 & 12 & 26 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.

The matrix of transition from E_3 to A is

    P = \begin{bmatrix} 1 & 0 & 2 \\ 2 & -2 & 5 \\ 0 & 1 & -1 \end{bmatrix},

so Theorem 8.9 assures us that q(v) = Y^T B Y, where

    B = P^T A P
      = \begin{bmatrix} 1 & 2 & 0 \\ 0 & -2 & 1 \\ 2 & 5 & -1 \end{bmatrix} \begin{bmatrix} 13 & -8 & -20 \\ -8 & 5 & 12 \\ -20 & 12 & 26 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 2 & -2 & 5 \\ 0 & 1 & -1 \end{bmatrix}
      = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.

Thus the expression for q(v) in terms of y_1, y_2, y_3 is given by the simple form

    q(v) = y_1^2 − 2y_2^2 + 3y_3^2. ■
Theorem 8.9 makes available many different expressions for the value q(v) of a
quadratic form. From one point of view, it describes the effect of a change of variables
from the set x_1, x_2, ..., x_n to the set y_1, y_2, ..., y_n according to the rule X = PY.
Such a change of variables is called a linear change of variables. The terminology of
Definition 8.6 can be applied to the variables y_i as well as the x_i.

Corollary 8.10 Suppose that v = (x_1, x_2, ..., x_n) and A represents the quadratic form
q relative to x_1, x_2, ..., x_n. If the variables y_1, y_2, ..., y_n are related to the x_i by X = PY
with P invertible, then the matrix B = P^T A P represents q relative to y_1, y_2, ..., y_n.

Proof. With B = P^T A P, we have q(v) = Y^T B Y from the theorem. This expression
is valid for any v = (x_1, x_2, ..., x_n) since Y = P^{-1} X can take on any prescribed value.
Now B is symmetric, since

    B^T = (P^T A P)^T = P^T A^T (P^T)^T = P^T A P = B.

Hence B represents q relative to y_1, y_2, ..., y_n. ■

The results of the preceding theorem and corollary motivate the introduction of the
relation of congruence on matrices.

Definition 8.11 Let A and B be matrices over the field F. Then B is congruent to
A over F if there is an invertible matrix P over F such that B = P^T A P.

It is left as an exercise (Problem 8) to prove that congruence over F is an equivalence
relation on the set of all square matrices over F.
The principal objective of this and the next two sections is to show that the poly-
nomial expression for the value of a quadratic form q takes on the particularly simple
form of a "sum of squares"

    q(v) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2

whenever the basis A in Theorem 8.9 is chosen appropriately. Example 2 illustrates this
simple form, but we have no method for finding an appropriate basis A at this time.

Exercises 8.3

1. Write q(x_1, x_2, ..., x_n) as X^T A X with two different matrices A, one of which is
symmetric.

(a) q(xi, x 2 ) = 4xf + 6xix 2 - 1x\


(b) q{x\,x2) = Sri 4-5x2
(c) q{xi,x2,X3) = —2x\ — 3xiX3 + 8x2X3 - IOX2X1 — lx\
(d) q{x\,x2,xz) — Ylx\ — 8x2X1 + 4 χ χ χ 3 — 5x2
(e) q(xi,x2,x3) = 6x%-9xix2 -7x1
(f) <7(xi,X2,#3,#4) = — 3x2 + Ax\ — IIX1X4 + 5x2X4 + 18xix 2 + 16x4
with
(g) <7(xi,x 2 ,x 3 ) = Σα=ι Σ^=ι ûij^i^j ûii = « + 3
x x witn α
(h) ρ(χι, χ 2 , Χ3) = Σ*=ι Σ^=ι ^%3 % i ΰ' = * - 3

2. Suppose that for each v = (x_1, x_2, ..., x_n) in R^n, q(v) = X^T A X for the given
matrix A. For the given basis A of R^n, find the expression for q(v) in terms of
the coordinates y_i of v relative to A.

1 V2 - i
(a) A v/2 1 -v/2 , .4 = {(1,0,1), (3, V2,1), (3χ/2, - 4 , ^2)}
v z
2 2

1 -2 0
(b) A = -2 2 -2 Λ = {(1,0,1), (0,1,1), (1,1,0)}
0-2 3

1 -1 -1
(c) A -1 1 -1 Λ = {(1,0,0), (1,1,0), (1,1,1)}
-1 -1 1

1 1 2
(d) ,4 1 3 \ Λ={(1,0,0),(1,-1,0),(-11,3,4)}
2 I 7

3. Find the matrix that represents the given quadratic form relative to the variables
y_1, y_2, y_3.

(a) q{xi,X2,X3) = 4xiX2 - 2χχχ3 + x\ +2x2X3 - 2x§,


xi = 2/1
X2 = -V\ + 2/2 + 2y3
^3 = ~2/i + 2t/2 + 22/3
(b) g(xi,X2,^3) = x\ + 4x§ - 2x x x 2 + 4χ χ χ 3 - 6x2X3,
xi = - % i - 52/2 + 32/3
^2 = 32/1 + 32/2 - 22/3
^3 = -2/1 -2/2 + 2/3
(c) <?(xi,X2,£3) = 4x x x 2 + 4xix 3 + 4x2X3,
xi = 22/1 + 2/2 - 2y3
£2 = 32/1 + 2/3
^3 = 62/1 - 32/2 + 22/3
(d) q(xi,X2,X3) = - £ ? + 2 x | - 6 x § + 4x x x 2 -#1X3 - 4x2X3,
Xi = 2/1 + 22/2 + 2/3
£2 = -2/1 -2/2+2/3
£3 = 2/2 + 32/3

4. Verify that if a_{rs} = a_{sr} = \frac{1}{2}(c_{rs} + c_{sr}) for r, s = 1, 2, ..., n, then
\sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_i x_j = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j.

5. Recall that a matrix A is skew-symmetric if and only if A^T = −A. Prove that if
A is skew-symmetric, then X^T A X = 0 for all X = [x_1, x_2, ..., x_n]^T.

6. Prove that the sum and difference of two symmetric matrices of the same order
are symmetric.

7. Prove that any square matrix can be written uniquely as the sum of a symmetric
and a skew-symmetric matrix. (Hint: A + AT is symmetric.)

8. Prove that the relation of congruence over T is an equivalence relation on the set
of all square matrices over T.

9. Prove that two symmetric matrices A and B over R are congruent if and only
if they represent the same real quadratic form relative to two sets of n variables
that are related by an invertible linear change of variables.

10. Show that the set of symmetric matrices of order n over R is not closed under
multiplication.

11. Show that A A^T and A^T A are symmetric for any matrix A.

12. Show that if A is an n × n matrix over R such that X^T A X = 0 for all X =
[x_1, x_2, ..., x_n]^T, then A is skew-symmetric.

8.4 Orthogonal Matrices


Corollary 8.10 shows that the problem of reducing the expression for q(v) by a linear
change of variables to the form

    q(v) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2

is equivalent to that of finding an invertible matrix P such that

    P^T A P = diag{\lambda_1, \lambda_2, ..., \lambda_n}.

This problem has something of the same flavor as the diagonalization problem studied
in Section 7.4. The difference is that P^T A P has taken the place of P^{-1} A P. It is not
unnatural, then, to consider the possibility of finding a matrix P over R such that
P^T A P is diagonal and P^T = P^{-1}. These are clearly strong restrictions to be placed
on P, but oddly enough it is true that such a P always exists (whenever A is real and
symmetric). Those matrices P that have the property that P^T = P^{-1} are quite useful
in many instances, and have a special name.

Definition 8.12 A matrix P over R is orthogonal if P^T = P^{-1}.

It would seem natural to expect a connection between the use of the word orthogonal
to describe a matrix and the use of the same word to describe a set of vectors in Chapter
1. We recall that an orthogonal set of vectors is a set {u_λ | λ ∈ L} such that u_{λ_1} · u_{λ_2} = 0
whenever λ_1 ≠ λ_2. An orthogonal basis of a subspace W of R^n, then, is a basis of W
that is an orthogonal set of vectors. The two uses of the word orthogonal are related,
but not exactly in the way that one might expect. Definition 8.13 and Theorem 8.14
explain the relation completely.

Definition 8.13 A set of vectors {u_λ | λ ∈ L} is orthonormal if the set is orthogonal
and if each u_λ has length 1.

The word "orthonormal" is a fusion of the words "orthogonal" and "normal." A


normalized set of vectors is a set in which each vector has unit length.

Theorem 8.14 Let P be an r × r matrix over R. Then P is orthogonal if and only if
P is the matrix of transition from one orthonormal set of r vectors in R^n to another
orthonormal set of r vectors in R^n.

Proof. Let P = [p_{ij}]_{r×r} over R, and let A = {u_1, u_2, ..., u_r} be an orthonormal set
in R^n. Then P is the matrix of transition from A to a set B = {v_1, v_2, ..., v_r} of vectors
in R^n such that v_j = \sum_{k=1}^{r} p_{kj} u_k. Now B is orthonormal if and only if v_i · v_j = \delta_{ij} for
all pairs i, j. We have

    v_i · v_j = \left( \sum_{k=1}^{r} p_{ki} u_k \right) \cdot \left( \sum_{m=1}^{r} p_{mj} u_m \right)
             = \sum_{k=1}^{r} \sum_{m=1}^{r} p_{ki} p_{mj} u_k \cdot u_m
             = \sum_{k=1}^{r} \sum_{m=1}^{r} p_{ki} p_{mj} \delta_{km}
             = \sum_{k=1}^{r} p_{ki} p_{kj},

where u_k · u_m = \delta_{km} since A is orthonormal. But this last sum is precisely the element
in the i-th row and j-th column of P^T P. Thus v_i · v_j = \delta_{ij} for all pairs i, j if and only
if P^T P = I_r, and the proof is complete. ■

Thus a matrix P is orthogonal if and only if it preserves the property of being


orthonormal from one basis to another. Whether the bases are orthonormal or not, we
say that there is an orthogonal change of basis whenever the matrix of transition
from one basis to the other is orthogonal. The associated linear change of variable
X = PY is called an orthogonal change of variable.
There is a question lurking in the background here that should be brought out into
the open and answered. The question is this: Does every nonzero subspace of R n have
an orthonormal basis? Theorem 8.15 shows that the answer is affirmative, and the proof
describes a procedure for obtaining an orthonormal basis from any given basis. This
procedure is known as the Gram-Schmidt Orthogonalization Process.
Theorem 8.15 Let A = {u_1, u_2, ..., u_r} be a basis of the subspace W of R^n. There
exists an orthonormal basis {v_1, v_2, ..., v_r} of W such that each v_i is a linear
combination of u_1, u_2, ..., u_i.

Proof. The proof is by induction on the dimension r of W. In order to describe the
procedure clearly, the routine is presented in full for r = 1, 2, 3.

Since A is linearly independent, each u_i ≠ 0 and ||u_i|| > 0. Let v_1 = u_1/||u_1||. Then
v_1 is a unit vector, and the proof is complete if r = 1.

For r > 1, let

    w_2 = u_2 − (v_1 · u_2) v_1.

Then w_2 ≠ 0 since {v_1, u_2} is linearly independent. We have

    v_1 · w_2 = v_1 · u_2 − (v_1 · u_2) v_1 · v_1 = v_1 · u_2 − v_1 · u_2 = 0,

so w_2 is orthogonal to v_1. Put v_2 = w_2/||w_2||. Then {v_1, v_2} is an orthonormal set and
v_i is a linear combination of u_1, ..., u_i for i = 1, 2.
If r > 2, let

    w_3 = u_3 − (v_1 · u_3) v_1 − (v_2 · u_3) v_2.

Then w_3 ≠ 0, since w_3 = 0 would require that u_3 be dependent on {v_1, v_2} and hence
on {u_1, u_2}. Since

    v_1 · w_3 = v_1 · u_3 − (v_1 · u_3) v_1 · v_1 − (v_2 · u_3) v_1 · v_2 = v_1 · u_3 − v_1 · u_3 = 0,

w_3 is orthogonal to v_1. Similarly, w_3 is orthogonal to v_2. The vector v_3 = w_3/||w_3|| is a
unit vector and {v_1, v_2, v_3} is an orthonormal set such that v_i is a linear combination
of u_1, ..., u_i for i = 1, 2, 3.

Assume the theorem is true for all subspaces with r = k. Let A = {u_1, ..., u_k, u_{k+1}}
be a basis of the (k + 1)-dimensional subspace W. By the induction hypothesis, the
subspace <u_1, u_2, ..., u_k> has an orthonormal basis {v_1, v_2, ..., v_k} such that v_i is a linear
combination of u_1, ..., u_i for each i. Let

    w_{k+1} = u_{k+1} − (v_1 · u_{k+1}) v_1 − (v_2 · u_{k+1}) v_2 − \cdots − (v_k · u_{k+1}) v_k
            = u_{k+1} − \sum_{j=1}^{k} (v_j · u_{k+1}) v_j.

Then w_{k+1} ≠ 0, since otherwise u_{k+1} would be dependent on {v_1, v_2, ..., v_k} and hence
on {u_1, u_2, ..., u_k}. Since

    v_i · w_{k+1} = v_i · u_{k+1} − \sum_{j=1}^{k} (v_j · u_{k+1}) v_i · v_j = v_i · u_{k+1} − v_i · u_{k+1} = 0,

w_{k+1} is orthogonal to v_i for i = 1, 2, ..., k. Thus v_{k+1} = w_{k+1}/||w_{k+1}|| is a unit vector
and {v_1, ..., v_k, v_{k+1}} is an orthonormal set such that v_i is a linear combination of
u_1, ..., u_i for i = 1, 2, ..., k + 1. Since {v_1, v_2, ..., v_{k+1}} is a linearly independent set of
k + 1 vectors in W, it is a basis of W by Theorem 1.33, and this completes the proof. ■
Example 1 □ We shall use the Gram-Schmidt Orthogonalization Process to obtain an
orthonormal basis of <A>, where A is the linearly independent set given by

    A = {u_1 = (2,0,2,1), u_2 = (0,0,4,1), u_3 = (8,0,3,5)}.

Using the same notation as in the proof of Theorem 8.15, we begin by writing

    v_1 = u_1/||u_1|| = (1/3)(2,0,2,1) = (2/3, 0, 2/3, 1/3).

Next we let

    w_2 = u_2 − (v_1 · u_2) v_1
        = (0,0,4,1) − 3(2/3, 0, 2/3, 1/3)
        = (−2, 0, 2, 0)

and

    v_2 = w_2/||w_2|| = (1/(2√2))(−2,0,2,0) = (−1/√2, 0, 1/√2, 0).

To obtain the third vector in our orthonormal basis of <A>, we write

    w_3 = u_3 − (v_1 · u_3) v_1 − (v_2 · u_3) v_2
        = (8,0,3,5) − 9(2/3, 0, 2/3, 1/3) − (−5/√2)(−1/√2, 0, 1/√2, 0)
        = (8,0,3,5) − (6,0,6,3) + (−5/2, 0, 5/2, 0)
        = (−1/2, 0, −1/2, 2).

Finally, we let

    v_3 = w_3/||w_3|| = (1/(3√2))(−1, 0, −1, 4).

Thus the set {v_1, v_2, v_3} is an orthonormal basis of <A>, where

    v_1 = (1/3)(2,0,2,1),  v_2 = (1/√2)(−1,0,1,0),  v_3 = (1/(3√2))(−1,0,−1,4). ■

Exercises 8.4

1. Given that {u_1 = (1,−1,0,1), u_2 = (4,1,3,0), u_3 = (2,−3,9,4)} is linearly in-
dependent, find an orthonormal set {v_1, v_2, v_3} such that each v_i is a linear
combination of u_1, ..., u_i.

1 \/3
2 2
2. Let A = {(§,§, i ) , ( - § , i , § ) } and P
y/3 1
2 2

(a) Verify that Λ is an orthonormal set.


(b) Verify that P is an orthogonal matrix.
(c) Suppose that P is the transition matrix from A to {νχ, v 2 } . Find νχ and v 2 ,
and verify that {vi, v 2 } is orthonormal.

3. Write v 3 in Problem 1 as a linear combination of u i , u 2 , u 3 .

4. Given that the set A is linearly independent, use the Gram-Schmidt Orthogonal-
ization Process to obtain an orthonormal basis of (A).

(a) A= {(1,2, - 2 , 4 ) , (1,1,1,6), (5,2,2,5)}



(b) ,4 = { ( 2 , 0 , - 3 , 6 ) , (2,1,0,4), ( 1 , 7 , - 1 , 3 ) }
(c) .A = { ( 2 , 2 , 1 ) , (0,4,1), (8,3,5)}
(d) A = {(1,0,1,0), (1,1, - 3 , 0 ) , (2, - 3 , - 4 , 1 ) , (2,3, - 2 , - 3 ) }

5. Prove that the set of all orthogonal matrices of order n is closed under multipli­
cation.

6. Prove that P~l is orthogonal whenever P is orthogonal.

7. Prove that if P is orthogonal, then det(P) = ± 1 .

8. Prove that, given any unit vector v_1 in R^n, there is an orthonormal basis of R^n
that has v_1 for its first element.

9. A linear operator T on R n is called orthogonal if and only if ||T(u)|| = ||u|| for


every u G R n . Prove that T is orthogonal if and only if T(u) · T(v) = u · v for all
u, v e R n . (Hint: ||u + v|| 2 - ||u|| 2 - ||v|| 2 = 2u · v.)

10. (See Problem 9.) Prove that T is orthogonal if and only if T maps an orthonormal
basis {vi, V2,..., v n } of R n onto an orthonormal basis {T(vi),T(v2), ...,T(v n )}
of R n .

11. (See Problem 10.) Let M = {vi, v 2 ,..., v n } be an arbitrary orthonormal basis of
R n . Prove that T is orthogonal if and only if the matrix of T relative to λί is an
orthogonal matrix.

12. (See Problem 11.) Show that the product of two orthogonal linear operators on
R n is an orthogonal linear operator.

13. It follows from Problem 11 that any orthogonal linear operator is invertible. Prove
that if T is orthogonal, then T _ 1 is orthogonal.

8.5 Reduction of Real Quadratic Forms


We return now to the problem posed at the beginning of Section 8.4, that of reducing
a real quadratic form q(v) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j to the form

    q(v) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2

by an orthogonal linear change of variables X = PY. This is equivalent to finding an
orthonormal basis A of R^n such that the linear operator T with matrix A = [a_{ij}]_n
relative to E_n is represented by P^{-1} A P = diag{\lambda_1, \lambda_2, ..., \lambda_n} relative to A. This is the
same situation as that considered in Section 7.4, except that now P must be orthogonal.
If the eigenvalues λ_j are real, then the solutions P_j to the equation

    (A − \lambda_j I) P_j = 0

can be taken to be real since A is real. We show in our next theorem that the λ_j's are
indeed real because A is symmetric. Before proceeding to this theorem, we introduce
some needed terminology and notation.
Definition 8.16 Let C = [c_{rs}]_{m×n} be a matrix over the field C of complex numbers.
The conjugate of C is the matrix \overline{C} = [\overline{c_{rs}}]_{m×n}, where \overline{c_{rs}} = a_{rs} − i b_{rs} is the conjugate
of the complex number c_{rs} = a_{rs} + i b_{rs} (here a_{rs} and b_{rs} are real numbers, and i denotes
the square root of −1).

If a and b are real numbers, the notation \overline{z} for the conjugate a − bi of the complex
number z = a + bi is a standard one, and we have merely extended this notation to
matrices. The basic properties \overline{z_1 z_2} = \overline{z_1}\,\overline{z_2} and \overline{z_1 + z_2} = \overline{z_1} + \overline{z_2} are valid for matrices:
\overline{AB} = \overline{A}\,\overline{B} and \overline{A + B} = \overline{A} + \overline{B}. (See Problem 3 of the exercises.)
ÄB = ~AB and A + B = ~A + Έ. (See Problem 3 of the exercises.)

Definition 8.17 If C = [c_{rs}]_{m×n} is a matrix over C, then the conjugate transpose
of C is the matrix C^* = (\overline{C})^T.

The proofs of the equalities C^* = \overline{(C^T)}, (A_1 + A_2 + \cdots + A_n)^* = A_1^* + A_2^* + \cdots + A_n^*,
and (A_1 A_2 \cdots A_n)^* = A_n^* \cdots A_2^* A_1^* are left as exercises.
Theorem 8.18 The spectrum of a real symmetric matrix is a set of real numbers. That
is, all of the eigenvalues of a real symmetric matrix are real.

Proof. Let A be a real symmetric matrix, let λ be any eigenvalue of A in the field
C, and let X be an eigenvector over C associated with λ. (We cannot prove it here, but
every polynomial of degree n with coefficients in C has exactly n zeros, all of which are
in C; in particular, det(A − λI) has n zeros in C if A has order n.) Then AX = λX,
and this implies that

    X^* A X = \lambda X^* X.

We regard this equality of 1 by 1 matrices as an equality of numbers. Now

    (X^* A X)^* = X^* A^* (X^*)^* = X^* A X

since \overline{A} = A and A^T = A imply that A^* = A. Hence X^* A X is a real number. Also,

    X^* X = \sum_{k=1}^{n} \overline{x_k} x_k = \sum_{k=1}^{n} |x_k|^2,

which is a positive real number since X ≠ 0. This means that

    \lambda = \frac{X^* A X}{X^* X},

a quotient of real numbers. ■

We proceed now to the main result of this section.


Theorem 8.19 If A is a real symmetric matrix of order n, there exists an orthogonal
matrix P over R such that P^{-1} A P is a diagonal matrix.

Proof. The proof is by induction on the order n of A. The theorem is trivially true
for n = 1.

Assume that the theorem is true for all k × k matrices, and let A be a real symmetric
matrix of order k + 1. Let T be the linear operator on R^{k+1} that has matrix A relative to
E_{k+1}. Let λ_1 be any eigenvalue of T (and A). Then λ_1 is real and there is a corresponding
eigenvector v_1 in R^{k+1}. Since any multiple of v_1 is in the eigenspace V_{λ_1}, we may assume
that v_1 is a unit vector. It follows from Theorem 8.15 that {v_1} can be extended to an
orthonormal basis

    N = {v_1, v_2, ..., v_{k+1}}

of R^{k+1} (see Problem 8 of Exercises 8.4). The matrix of transition P_1 from E_{k+1} to N
is orthogonal (Theorem 8.14), and T is represented by

    A_1 = P_1^{-1} A P_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_2 \end{bmatrix}

relative to N. (The elements to the right of λ_1 in the first row must be zero because
of symmetry.) The k × k matrix A_2 is real and symmetric, so it follows from our
induction hypothesis that there is an orthogonal k × k matrix Q such that Q^{-1} A_2 Q =
diag{\lambda_2, ..., \lambda_{k+1}}. It is readily verified that

    P_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}

is orthogonal and that

    P_2^{-1} A_1 P_2 = P_2^{-1} P_1^{-1} A P_1 P_2 = diag{\lambda_1, \lambda_2, ..., \lambda_{k+1}}.

Now P_1 P_2 is orthogonal since

    (P_1 P_2)^T = P_2^T P_1^T = P_2^{-1} P_1^{-1} = (P_1 P_2)^{-1},

and thus P = P_1 P_2 is the desired matrix. ■

It is important to note that the diagonal elements of the matrix

    P^{-1} A P = diag{\lambda_1, \lambda_2, ..., \lambda_n}

are the eigenvalues of A.

Corollary 8.20 Any real quadratic form q with q(v) = X^T A X can be reduced to a
diagonalized representation

    q(v) = Y^T (P^T A P) Y = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2

by an orthogonal change of variables X = PY. For each v = (x_1, x_2, ..., x_n) ∈ R^n,
Y = [y_1, y_2, ..., y_n]^T is the coordinate matrix of v relative to A, where P is the matrix
of transition from E_n to A.
Proof. The proof is left as an exercise in Problem 13. ■

The proof of Theorem 8.19 is not exactly constructive, although it does suggest an
iterative procedure to obtain P by beginning with the selection of λ_1, v_1, and P_1. It is
not advantageous for us to pursue this lead, as the next theorem leads to a much more
efficient procedure.

We observe that if u and v are vectors in R^n with coordinate matrices X =
[x_1, x_2, ..., x_n]^T and Y = [y_1, y_2, ..., y_n]^T relative to E_n, then u · v = \sum_{k=1}^{n} x_k y_k = X^T Y.
In particular, u and v are orthogonal if and only if X^T Y = 0.

Theorem 8.21 Let A be a real symmetric matrix. If λ_r and λ_s are distinct eigenvalues
of A with associated eigenvectors P_r and P_s, respectively, then P_r^T P_s = 0.

Proof. Suppose that λ_r and λ_s are distinct eigenvalues of A with P_r and P_s as
corresponding eigenvectors. Then A P_r = λ_r P_r and A P_s = λ_s P_s. Now

    P_r^T A P_s = P_r^T (\lambda_s P_s) = \lambda_s P_r^T P_s

and

    (P_r^T A P_s)^T = P_s^T A P_r = P_s^T (\lambda_r P_r) = \lambda_r P_s^T P_r = \lambda_r (P_r^T P_s)^T.

But (P_r^T A P_s)^T = P_r^T A P_s and (P_r^T P_s)^T = P_r^T P_s since P_r^T A P_s and P_r^T P_s are matrices
of order 1. Hence

    \lambda_r P_r^T P_s = \lambda_s P_r^T P_s

and

    (\lambda_r − \lambda_s) P_r^T P_s = 0.

Since λ_r − λ_s ≠ 0, it must be that P_r^T P_s = 0. ■

In Section 7.4, we found that P = [P_1, P_2, ..., P_n] was an invertible matrix such that
P^{-1} A P was diagonal if and only if the columns P_j of P were the coordinate matrices
relative to E_n of a basis of eigenvectors v_j of the associated linear transformation T. Since
P_r^T P_s is the element in row r and column s of P^T P, the requirement that P^T = P^{-1}
is satisfied if and only if P_r^T P_s = \delta_{rs}. Since P_r^T P_s = v_r · v_s, this is equivalent to
requiring that the basis of eigenvectors be orthonormal. Theorem 8.21 assures us that
eigenvectors from distinct eigenspaces are automatically orthogonal. Thus, the only
modification of the procedure in Section 7.4 that is necessary to make P orthogonal
is to choose orthonormal bases of the eigenspaces V_{λ_j}. This is illustrated in the next
example.

Example 1 □ Consider the problem of finding an orthogonal matrix P such that
P^T A P is diagonal, where

    A = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}.

As explained above, the basic problem is to find an orthonormal basis of eigenvectors of
the linear transformation T that has matrix A relative to E_3. The characteristic equation
of A is −(x + 1)(x − 2)^2 = 0. By solving the systems

    (A − (−1)I)X = 0  and  (A − 2I)X = 0,

we find that {(1,1,1)} is a basis of the eigenspace V_{−1} and {(1,−1,0), (1,0,−1)} is a
basis of V_2. Applying the Gram-Schmidt process, we obtain

    {(1/√3)(1,1,1)}

and

    {(1/√2)(1,−1,0), (1/√6)(1,1,−2)}

as orthonormal bases of V_{−1} and V_2, respectively. Hence

    P = \frac{1}{√6} \begin{bmatrix} √2 & √3 & 1 \\ √2 & -√3 & 1 \\ √2 & 0 & -2 \end{bmatrix}

is an orthogonal matrix such that P^T A P = diag{−1, 2, 2}. ■

Since the introduction of quadratic forms was partially motivated by references to
the conic sections and quadric surfaces, it is of interest to relate our results here to these
geometric quantities.

A conic section always has an equation of the form

    ax^2 + bxy + cy^2 + dx + ey + f = 0

in rectangular coordinates x, y. According to Corollary 8.20, the quadratic form q with
q(x, y) = ax^2 + bxy + cy^2 can be reduced by an orthogonal change of variables

    x = p_{11} x' + p_{12} y'
    y = p_{21} x' + p_{22} y'

to a diagonalized form

    \lambda_1 (x')^2 + \lambda_2 (y')^2.

It is shown in analytic geometry that such a change of variables corresponds to a rotation
of the coordinate axes about the origin. The different possibilities for the signs of the
eigenvalues correspond to different types of conic sections, with degenerate cases possible
in each instance. If λ_1 and λ_2 are nonzero and of the same sign, the conic section is a
circle or an ellipse. If λ_1 and λ_2 are nonzero and of opposite sign, the conic section is
a hyperbola. If exactly one of λ_1, λ_2 is zero, the conic section is a parabola. If both λ_1
and λ_2 are zero, the graph is a straight line.

As mentioned in Section 8.3, the quadric surfaces can be related to quadratic forms
in three variables. This relation can be analyzed in a manner analogous to that for the
conic sections. However, this analysis becomes somewhat involved, and it is omitted
here for this reason.

Exercises 8.5

1. Find an orthogonal matrix P such that PTAP is diagonal.

Γθ 2 2 1 2 -4
(aM = 2 0 2 (b)A = 2 -2 -2
[ 2 2 0 -4 -2 1

4 -1 (3 1
17 2 - '2
- 1 5 - 1 0
(c)A = 2 14 <4 (dM =
0 - 1 <4 - 1
! -2 4 l·4 1 0 -1 5

2. Let z_1, z_2, ..., z_n and a_1, a_2, ..., a_n be complex numbers.

(a) Verify that \overline{z_1 z_2} = \overline{z_1}\,\overline{z_2} and \overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}.

(b) By induction, extend the results of part (a) to \overline{z_1 z_2 \cdots z_n} = \overline{z_1}\,\overline{z_2} \cdots \overline{z_n} and
    \overline{z_1 + z_2 + \cdots + z_n} = \overline{z_1} + \overline{z_2} + \cdots + \overline{z_n}.
(c) Prove that \overline{\sum_{k=1}^{n} a_k z_k} = \sum_{k=1}^{n} \overline{a_k}\,\overline{z_k}.

3. Let A = [a_{rs}]_{m×n} and B = [b_{rs}]_{n×p} over C.

(a) Prove that \overline{AB} = \overline{A}\,\overline{B}.

(b) Prove that \overline{A + B} = \overline{A} + \overline{B} whenever A + B is defined.

4. Prove that C^* = \overline{(C^T)}.

5. Prove that (A_1 + A_2 + \cdots + A_n)^* = A_1^* + A_2^* + \cdots + A_n^*.

6. Prove that (A_1 A_2 \cdots A_n)^* = A_n^* \cdots A_2^* A_1^*.

7. Prove that (A^*)^{-1} = (A^{-1})^*.

8. In the proof of Theorem 8.19, verify that the matrix P_2 is orthogonal and that
P_2^{-1} A_1 P_2 = diag{\lambda_1, \lambda_2, ..., \lambda_{k+1}}.
9. Let u and v be vectors in R^n with coordinate matrices X = [x_1, x_2, ..., x_n]^T and
Y = [y_1, y_2, ..., y_n]^T, respectively, relative to an orthonormal basis {v_1, v_2, ..., v_n}
of R^n. Prove that u · v = X^T Y.

10. Let N = {v_1, v_2, ..., v_n} be an orthonormal basis of R^n, and let X = [x_1, x_2, ..., x_n]^T
be the coordinate matrix of u relative to N. Prove that x_k = u · v_k for k =
1, 2, ..., n.

11. Prove that if A = B^T B for some matrix B over R, then X^T A X ≥ 0 for all
X = [x_1, x_2, ..., x_n]^T over R.
12. Assume that the linear operator T of R n has a symmetric matrix relative to an
orthonormal basis of R n . Prove that the eigenvectors of T which correspond to
distinct eigenvalues are orthogonal.

13. Prove Corollary 8.20.

8.6 Classification of Real Quadratic Forms


We have seen in Section 8.5 that an arbitrary real quadratic form q with q(v) = X^T A X
can be reduced by an invertible (even orthogonal) linear change of variables X = PY
to a diagonalized representation

    q(v) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2,

where

    P^T A P = D = diag{\lambda_1, \lambda_2, ..., \lambda_n}.

The matrix Y = [y_1, y_2, ..., y_n]^T is the coordinate matrix of v = (x_1, x_2, ..., x_n) relative
to the basis A = {u_1, u_2, ..., u_n}, where P is the transition matrix from E_n to A. An
interchange of two vectors u_i and u_j in A amounts to an interchange of the variables
y_i and y_j, and such an interchange is reflected in the matrix D by an interchange of λ_i
and λ_j. The diagonalized representation can thus be written as

    q(v) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_r y_r^2,

where each of λ_1, λ_2, ..., λ_r is nonzero. The matrices A and D have the same rank since
they are equivalent. Consequently, the number of nonzero λ_i's is always the same as
the rank r of A.

Definition 8.22 The rank of the real quadratic form q with q(v) = X^T A X is the rank
of the matrix A.

The discussion above shows that the rank of q is the same as the number of variables
having nonzero coefficients in a diagonalized representation of q. We shall examine these
nonzero terms in more detail.

Theorem 8.23 In any two diagonalized representations of a real quadratic form q, the
number of variables with positive coefficients is the same and the number of variables
with negative coefficients is the same.
Proof. Suppose that q has rank r, and let

    q(v) = d_1 y_1^2 + d_2 y_2^2 + \cdots + d_r y_r^2

and

    q(v) = d'_1 z_1^2 + d'_2 z_2^2 + \cdots + d'_r z_r^2

be any two diagonalized representations of q. (These two diagonalized representations
correspond to two diagonal matrices D and D' that are congruent to the original matrix
A of q: D = P_1^T A P_1 and D' = P_2^T A P_2 for invertible P_1 and P_2. That is, we are
dealing here with congruence of matrices, not with orthogonal similarity.) We may
assume without loss of generality that the diagonal elements in both D =
diag{d_1, d_2, ..., d_r, 0, ..., 0} and D' = diag{d'_1, d'_2, ..., d'_r, 0, ..., 0} are arranged so that the
positive elements come first, followed by the negative elements. That is,

    D = diag{d_1, ..., d_p, d_{p+1}, ..., d_r, 0, ..., 0}

with the first p elements d_i positive, and

    D' = diag{d'_1, ..., d'_k, d'_{k+1}, ..., d'_r, 0, ..., 0}

with the first k elements d'_i positive.

Let A = {u_1, u_2, ..., u_n} and B = {v_1, v_2, ..., v_n} be bases of R^n chosen so that
[y_1, y_2, ..., y_n]^T and [z_1, z_2, ..., z_n]^T are the coordinate matrices of v = (x_1, x_2, ..., x_n)
relative to A and B, respectively (see Corollary 8.20). Now q(u_i) = d_i > 0 for i =
1, 2, ..., p. Therefore, for any w = \sum_{i=1}^{p} y_i u_i ∈ W_1 = <u_1, u_2, ..., u_p>,

    q(w) = \sum_{i=1}^{p} d_i y_i^2 > 0  if w ≠ 0.

Also, q(v_i) = d'_i ≤ 0 for i = k+1, k+2, ..., n, so that for each w = \sum_{i=k+1}^{n} z_i v_i ∈
W_2 = <v_{k+1}, v_{k+2}, ..., v_n>,

    q(w) = \sum_{i=k+1}^{n} d'_i z_i^2 ≤ 0.

Thus W_1 ∩ W_2 = {0} and

    dim(W_1 + W_2) = dim(W_1) + dim(W_2).

But dim(W_1 + W_2) ≤ n and

    dim(W_1) + dim(W_2) = p + n − k.

Therefore, p + n − k ≤ n and p ≤ k.

From the symmetry of the conditions on p and k, it follows that k ≤ p and k = p.
The second part of the conclusion follows immediately from r − k = r − p. ■
Corollary 8.24 Let A be a real symmetric matrix. Any two diagonal matrices that are
congruent to A over R have the same number of positive elements and the same number
of negative elements on the diagonal.

Proof. See Problem 5. ■

Definition 8.25 The index of the quadratic form q is the number p of positive coeffi­
cients appearing in a diagonalized representation of q. The difference s between p and
the number of negative coefficients in a diagonalized representation of q is the signa­
ture of q. That is, the signature of q is the number s = p — (r — p) = 2p — r, where r is
the rank of q.

Theorem 8.23 shows that the index and signature of q are well-defined terms. The
signature of q is a measure of the "positiveness" or "negativeness" of q.

Theorem 8.26 A quadratic form q on R^n with rank r and index p can be represented
as

    q(v) = z_1^2 + \cdots + z_p^2 − z_{p+1}^2 − \cdots − z_r^2

by a suitable invertible linear change of variables.

Proof. Let

    q(v) = d_1 y_1^2 + \cdots + d_p y_p^2 + d_{p+1} y_{p+1}^2 + \cdots + d_r y_r^2

be a diagonalized representation of q with d_1, ..., d_p positive and d_{p+1}, ..., d_r negative.
For i = 1, 2, ..., p, d_i has a positive real square root \sqrt{d_i}, and we put z_i = \sqrt{d_i}\,y_i. For
i = p+1, p+2, ..., r, −d_i has a positive real square root \sqrt{|d_i|}, and we put z_i = \sqrt{|d_i|}\,y_i.
For i = r+1, ..., n, we put z_i = y_i. The linear change of variables

    \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = diag{\sqrt{d_1}, ..., \sqrt{d_p}, \sqrt{|d_{p+1}|}, ..., \sqrt{|d_r|}, 1, ..., 1} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

is clearly invertible, and

    q(v) = d_1 y_1^2 + \cdots + d_p y_p^2 − (−d_{p+1} y_{p+1}^2) − \cdots − (−d_r y_r^2)
         = (\sqrt{d_1}\,y_1)^2 + \cdots + (\sqrt{d_p}\,y_p)^2 − (\sqrt{|d_{p+1}|}\,y_{p+1})^2 − \cdots − (\sqrt{|d_r|}\,y_r)^2
         = z_1^2 + \cdots + z_p^2 − z_{p+1}^2 − \cdots − z_r^2. ■

The form q(v) = z_1^2 + \cdots + z_p^2 − z_{p+1}^2 − \cdots − z_r^2 in Theorem 8.26 is called the canonical
form for q. There are two corollaries to the theorem concerning symmetric matrices.
Proofs are requested in the exercises.
Corollary 8.27 Any real symmetric matrix A is congruent over R to a unique matrix
of the form

    C = \begin{bmatrix} I_p & 0 & 0 \\ 0 & -I_{r-p} & 0 \\ 0 & 0 & 0 \end{bmatrix},

where r is the rank of A.

The number p of positive 1's in C is called the index of A. Since the linear change
of variable in the proof of Theorem 8.26 is clearly not necessarily orthogonal, the matrix
P used to obtain P^T A P = C in Corollary 8.27 may not be orthogonal.

Corollary 8.28 Two symmetric n × n matrices over R are congruent over R if and
only if they have the same rank and the same index.
Definition 8.29 Let q be a real quadratic form on R^n with rank r and index p.
(1) If p = r = n, q is called positive definite.
(2) If p = r, q is called positive semidefinite.
(3) If p = 0 and r = n, q is called negative definite.
(4) If p = 0, q is called negative semidefinite.
Each of the conditions in Definition 8.29 can be formulated in terms of the range of
values of q. These formulations are given in the next theorem, with proofs requested in
Problem 8.
Theorem 8.30 Let q be a real quadratic form on R^n.
(1) q is positive definite if and only if q(v) > 0 for all v ≠ 0 in R^n.
(2) q is positive semidefinite if and only if q(v) ≥ 0 for all v ∈ R^n.
(3) q is negative definite if and only if q(v) < 0 for all v ≠ 0 in R^n.
(4) q is negative semidefinite if and only if q(v) ≤ 0 for all v ∈ R^n.
Example 1 □ In Example 1 of Section 8.5, we obtained the orthogonal matrix

    P_1 = \frac{1}{√6} \begin{bmatrix} √2 & √3 & 1 \\ √2 & -√3 & 1 \\ √2 & 0 & -2 \end{bmatrix}

such that P_1^T A P_1 = diag{−1, 2, 2}, where

    A = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}.

Thus the change of variables

    x_1 = (1/√6)(√2 y_1 + √3 y_2 + y_3),
    x_2 = (1/√6)(√2 y_1 − √3 y_2 + y_3),
    x_3 = (1/√6)(√2 y_1 − 2y_3)

reduces the quadratic form

    q(v) = q(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 − 2x_1x_2 − 2x_1x_3 − 2x_2x_3

to the form

    q(v) = −y_1^2 + 2y_2^2 + 2y_3^2.

In the terminology of the first paragraph of this section, an interchange of the variables
y_1 and y_3 corresponds to an interchange of u_1 and u_3, or an interchange of columns one
and three in P_1. Thus

    P_2 = \frac{1}{√6} \begin{bmatrix} 1 & √3 & √2 \\ 1 & -√3 & √2 \\ -2 & 0 & √2 \end{bmatrix}

is an orthogonal matrix such that P_2^T A P_2 = diag{2, 2, −1}. This means that the or-
thogonal change of variable [x_1, x_2, x_3]^T = P_2 [y'_1, y'_2, y'_3]^T reduces q(v) to the form

    q(v) = 2(y'_1)^2 + 2(y'_2)^2 − (y'_3)^2.

Following the method of the proof of Theorem 8.26, we write y'_1 = (1/√2) z_1, y'_2 = (1/√2) z_2,
and y'_3 = z_3. This reduces q to the canonical form

    q(v) = z_1^2 + z_2^2 − z_3^2.

The matrix of transition P_3 corresponding to this last change of variables [y'_1, y'_2, y'_3]^T =
P_3 [z_1, z_2, z_3]^T is

    P_3 = diag{1/√2, 1/√2, 1}.

Combining, we have the change of variable [x_1, x_2, x_3]^T = P_2 P_3 [z_1, z_2, z_3]^T, and

    P = P_2 P_3 = \frac{1}{2√3} \begin{bmatrix} 1 & √3 & 2 \\ 1 & -√3 & 2 \\ -2 & 0 & 2 \end{bmatrix}

is a matrix such that P^T A P = diag{1, 1, −1}. We note that P is not orthogonal. It is
clear that q has rank 3, index 2, and signature 1. None of the terms in Definition 8.29
apply to q. ■

Exercises 8.6

1. For each v = (x_1, x_2, ..., x_n) in R^n, a quadratic form q is defined by q(v) = X^T A X
for the given A. Find the rank, index, and signature of q.

I V2 -\ 1 -2 0
(a) A y/2 1 -s/2 (b)A -2 2 -2
-y/2 0 -2 3

1 -1 -1 -2 2 2
(c)A -1 1 -1 (d)A. 2 1 4
-1 - 1 1 2 4 1

2. Find the canonical form for each of the quadratic forms referred to in Problem 1.
3. For each of the following matrices A, find an invertible real matrix P such that
PTAP is of the form C given in Corollary 8.27.
0 2 2 2 -4
(a) A: 2 0 2 (b)A = -2 - 2
2 2 0 -2 1

4 -1 0 1
17 2 -2
-1 5 -1 0
(cM = 2 14 4 (dM =
0 -1 4 -1
-2 4 14
1 0 -1 5

4. For each of the matrices A in Problem 3, let q be the quadratic form on R^n with
q(v) = X^T A X. Find a basis B of R^n such that q(v) has the form

    q(v) = z_1^2 + \cdots + z_p^2 − z_{p+1}^2 − \cdots − z_r^2

with [z_1, z_2, ..., z_n]^T as the coordinate matrix of v relative to B.


5. Prove Corollary 8.24.
6. Prove Corollary 8.27.
7. Prove Corollary 8.28.
8. Prove Theorem 8.30.
9. A real symmetric matrix A is defined to be positive definite if the quadratic form
q(xi,X2, ...,Χη) = XTAX is positive definite. The terms positive semidefinite,
etc., are defined similarly.

(a) Prove that A is positive definite if and only if A = BTB for some real
invertible matrix B.
(b) Prove that A is positive semidefinite if and only if there exists a (possibly
singular) real matrix Q such that A = QTQ.
(c) Prove that A is positive definite if and only if all of the eigenvalues of A are
positive.

8.7 Bilinear Forms


The Cartesian product S × T of two sets S, T is the set of all ordered pairs of elements
from S and T:

    S × T = {(s, t) | s ∈ S and t ∈ T}.

Few concepts of such simplicity have been put to such a wide range of uses in mathemat-
ics. For this is the basic concept behind the coordinate systems, the definitions of binary
relation and function, and the constructions of several number systems. Although not
stated explicitly in our development, the idea is implicit in the familiar vector spaces
F^n. For the next three sections, we shall be concerned with Cartesian products of sets
of vectors, or rather, with functions defined on such sets of vectors.
Definition 8.31 Let U and V be vector spaces over the same field F. A bilinear form
on U and V is a mapping f of pairs of vectors (u, v) in U × V onto scalars f(u, v) in
F that has the properties

(i) f(a_1 u_1 + a_2 u_2, v) = a_1 f(u_1, v) + a_2 f(u_2, v)

and

(ii) f(u, b_1 v_1 + b_2 v_2) = b_1 f(u, v_1) + b_2 f(u, v_2).

The conditions (i) and (ii) are described by saying that f is a linear function of each
of the vector variables. These conditions (i) and (ii) are together equivalent to the single
requirement that

(iii) f(a_1 u_1 + a_2 u_2, b_1 v_1 + b_2 v_2)
      = a_1 b_1 f(u_1, v_1) + a_1 b_2 f(u_1, v_2) + a_2 b_1 f(u_2, v_1) + a_2 b_2 f(u_2, v_2).
These linearity conditions extend readily to

(iv) f\left( \sum_{i=1}^{m} a_i u_i, \sum_{j=1}^{n} b_j v_j \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j f(u_i, v_j)

for all positive integers m and n. Thus, a function f from U × V to F is a bilinear form
on U and V if and only if (iv) is satisfied as an identity.

We are primarily interested in the case where the vector spaces U and V in Definition
8.31 are finite-dimensional. For the remainder of this section, U and V will denote vector
spaces over F with dimensions m and n, respectively.

Definition 8.32 Let A = {u_1, u_2, ..., u_m} and B = {v_1, v_2, ..., v_n} be bases of U and
V, respectively, and let f be a bilinear form on U and V. The matrix of f relative to
A and B is the matrix A = [a_{ij}]_{m×n}, where

    a_{ij} = f(u_i, v_j)

for i = 1, 2, ..., m; j = 1, 2, ..., n.

We say that the matrix A represents the bilinear form f. As with linear functionals,
the function values f(u, v) can be expressed compactly by use of the matrix of f.

Theorem 8.33 Let X = [x_1, x_2, ..., x_m]^T denote the coordinate matrix of a vector
u ∈ U relative to the basis A of U, and let Y = [y_1, y_2, ..., y_n]^T be the coordinate matrix
of a vector v ∈ V relative to the basis B of V. If A = [a_{ij}]_{m×n}, then A is the matrix of
the bilinear form f relative to A and B if and only if the equation

    f(u, v) = X^T A Y

is satisfied for all choices of u ∈ U, v ∈ V.

Proof. Assume first that A = [a_{ij}]_{m×n} is the matrix of f relative to A and B. This
means that a_{ij} = f(u_i, v_j) for i = 1, 2, ..., m; j = 1, 2, ..., n. For any u = \sum_{i=1}^{m} x_i u_i and
v = \sum_{j=1}^{n} y_j v_j, the condition (iv) stated previously implies that

    f(u, v) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j f(u_i, v_j)
            = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i a_{ij} y_j
            = \sum_{i=1}^{m} x_i \left( \sum_{j=1}^{n} a_{ij} y_j \right).

Now \sum_{j=1}^{n} a_{ij} y_j is the element in the i-th row of the m × 1 matrix AY, and thus

    f(u, v) = \sum_{i=1}^{m} x_i \left( \sum_{j=1}^{n} a_{ij} y_j \right) = X^T A Y.

Assume, on the other hand, that f(u, v) = X^T A Y is satisfied for all u ∈ U, v ∈ V.
For the particular choices u = u_i and v = v_j, we have X = [\delta_{1i}, \delta_{2i}, ..., \delta_{mi}]^T and
Y = [\delta_{1j}, \delta_{2j}, ..., \delta_{nj}]^T. Hence

    f(u_i, v_j) = X^T A Y
               = \begin{bmatrix} \delta_{1i} & \delta_{2i} & \cdots & \delta_{mi} \end{bmatrix} \begin{bmatrix} \sum_{k=1}^{n} a_{1k} \delta_{kj} \\ \sum_{k=1}^{n} a_{2k} \delta_{kj} \\ \vdots \\ \sum_{k=1}^{n} a_{mk} \delta_{kj} \end{bmatrix}
               = \begin{bmatrix} \delta_{1i} & \delta_{2i} & \cdots & \delta_{mi} \end{bmatrix} \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}
               = \sum_{k=1}^{m} \delta_{ki} a_{kj}
               = a_{ij},

and A = [a_{ij}]_{m×n} is the matrix of f relative to A and B. ■

It follows from Definition 8.32 and Theorem 8.33 that the two bilinear forms / and
g on U and V are equal if and only if they have the same matrix relative to fixed bases
A of U and B of V.
As we have in similar situations previously, we ask about the effects of changes of
bases in U and V. The answer is obtained quite easily.

Theorem 8.34 Let A be the matrix of the bilinear form f relative to the bases A of U
and B of V. If Q is the matrix of transition from A to the basis A' of U and P is the
matrix of transition from B to the basis B' of V, then Q^T A P is the matrix of f relative
to A' and B'.

Proof. Suppose that u ∈ U has coordinate matrix X relative to A and X' relative
to A'. Let v ∈ V have coordinate matrix Y relative to B and Y' relative to B'. By
Theorem 5.14, X = QX' and Y = PY'. Combined with the "only if" part of Theorem
8.33, this yields

    f(u, v) = X^T A Y = (QX')^T A (PY') = (X')^T (Q^T A P) Y'.

But, by the "if" part of Theorem 8.33, this means that Q^T A P is the matrix of f relative
to A' and B'. ■
Corollary 8.35 Two m × n matrices A and B over F represent the same bilinear form
on U and V relative to certain (not necessarily different) choices of bases of U and V
if and only if A and B are equivalent over F.

Proof. See Problem 5. ■

The rank of a bilinear form f is defined to be the rank r of any matrix that represents f.

Corollary 8.36 Let f be a bilinear form on U and V. With suitable choices of bases
in U and V, f can be represented by a matrix D_r that has the first r diagonal elements
equal to 1 and all other elements zero.

Proof. See Problem 6. ■

The results of this section are illustrated in our next example.

Example 1 □ Let f be the bilinear form on U = R^3 and V = R^2 that is defined by

    f((x_1, x_2, x_3), (y_1, y_2)) = −7x_1y_1 − 10x_1y_2 − 2x_2y_1 − 3x_2y_2 + 12x_3y_1 + 17x_3y_2.

We consider the following problems.

(a) Write the matrix A of f relative to A = {(1,0,0), (1,1,0), (1,1,1)} and B =
{(1,−1), (2,−1)}.
(b) Use the matrix A to compute the value of f((2,3,1), (0,−1)).
(c) Determine bases A' of U and B' of V such that the matrix of f relative to
A' and B' is of the form D_r described in Corollary 8.36.

We obtain the matrix A by computing the values a_{ij} = f(u_i, v_j). For example,

    a_{11} = f((1,0,0), (1,−1))
          = −7(1)(1) − 10(1)(−1) − 2(0)(1) − 3(0)(−1) + 12(0)(1) + 17(0)(−1)
          = 3.

Similarly,

    a_{21} = f((1,1,0), (1,−1)) = 4,
    a_{31} = f((1,1,1), (1,−1)) = −1,

and so on. Thus

    A = \begin{bmatrix} 3 & -4 \\ 4 & -5 \\ -1 & 2 \end{bmatrix}.

In order to use A to compute the value of f((2,3,1), (0,−1)), we first write

    (2,3,1) = (−1)(1,0,0) + (2)(1,1,0) + (1)(1,1,1)

and

    (0,−1) = 2(1,−1) + (−1)(2,−1).

Then the equation f(u, v) = X^T A Y yields

    f((2,3,1), (0,−1)) = \begin{bmatrix} -1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & -4 \\ 4 & -5 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = 12.

By the method of Section 3.8, we find that

    Q^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & -2 & 1 \end{bmatrix}  and  P = \begin{bmatrix} -5 & 4 \\ -4 & 3 \end{bmatrix}

are invertible matrices such that

    Q^T A P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.

Using Q as the matrix of transition from A to A' and P as the matrix of transition
from B to B', we obtain

    A' = {(1,0,0), (1,1,0), (2,−1,1)}  and  B' = {(−13,9), (10,−7)}. ■

Exercises 8.7

1. Find the matrix of the given bilinear form / on U and V with respect to the given
bases A and B.

(a) U = R 2 , V = R 3 , ^ = <f2,o = Î3,


/ ( ( x i , x 2 ) , (2/1,2/2,2/3)) = 2χι2/2 - X12/3 + 2^22/1 + #22/2 + ^22/3
(b) \J = V = R3,A = B = 63,
/ ( ( x i , x 2 , £3), (2/1,2/2,2/3)) = xiyi - x\V2 + 2x22/1 + x22/3 + 2x32/2 +X32/3
(c) U = V = R 3 M = B - {(1,1,1), (1,1,0), (1,0,0)},
/((xi,x 2 ,x 3 ),(2/i 5 2/2,2/3)) = 2xi2/2 - xi2/3 + 2x22/i + £22/2 + ^22/3 ~ X32/1 +
X32/2 - 2x32/3

(d) U = R 3 , V = R 2 M = { ( 1 , 1 , 0 ) , ( 1 , - 1 , 1 ) , ( 0 , 1 , 0 ) } , B = { ( 1 , 2 ) , ( 2 , 1 ) } ,
/ ( ( x i , £ 2 , ^ 3 ) , (2/1,2/2)) = 2xiyi +4:XXy2 - 6x22/i + 3x22/2 + X3ÎJ1
(e) U =C 3 , V =C 2 , A={(h 0,0), (1, i, 0), (0,0,2i)}, S = {(1 - i, i), (i, - * ) } ,
/ ( ( x ! ^ ^ ) , (2/1,2/2)) = 5xi2/i + ixi2/2 - ^ 2 2 / 1 +2x22/2 + 2x32/i - ^32/2
(f) U = C 3 ,V = C 2 M = { ( l , 0 , 0 ) , ( l , l , 0 ) , ( l , l , l ) } , ß = { ( l , - l ) , ( 2 , « l ) } ,
/ ( ( X 1 , X 2 , X 3 ) , (2/1,2/2)) = X l 2 / 2 + X 2 2 / l +2X22/2 - 2 X 3 2 / 1 +2X32/2

2. Use the matrix obtained in the corresponding part of Problem 1 to compute / ( u , v)


for the given vectors.
(a) u = ( 3 , - l ) , v = ( 0 , 4 , - 1 ) (b) u = ( 5 , - 6 , 3 ) , v = ( 2 , - 1 , 0 )
(c) u = (1, - 2 , 2 ) , v = (0,0,1) (d) u = (0,3, - 1 ) , v = (8,7)
(e) u = (i, 0, i), v = (2,0) (f) u = (1,0, - 1 ) , v - (1,0)

3. Each of parts (a)-(f) below relate to the corresponding part of Problem 1 above.
In each case, a new pair of bases A', B' is given for the vector spaces U, V. Find
the matrix of / relative to A' and B' by use of Theorem 8.34.

(a) Λ' = { ( 4 , - 1 ) , ( - 1 , 5 ) } , β ' = { ( 1 , 2 , - 4 ) , ( 1 , - 1 , - 1 ) , ( - 4 , - 2 , 1 ) }


(b) A' = B' = {(17,2, - 2 ) , (1,7,2), (-1,2,7)}
(c) A' = B' = {(5,4,2), (1, - 1 , - 2 ) , (1, - 1 , 1 ) }
(d) A1 = {(3,1,2), (3, - 1 , 1 ) , (0,5, - 2 ) } , B' = {(1, - 1 ) , (0,3)}
(e) A' = {(-1,0,0), ( - 1 , -i, 0), (0,0,1)},B' = { ( 2 , 1 + i), (1, - 1 ) }
(f) A' = {(1,0,0), (0,1,0), ( - 6 , 2 , 1 ) } , # ' = {(-2,1), (1,0)}

4. Let / be the bilinear form on R 4 and R 3 that has the given matrix A relative to
£4 and £3. Find bases A' of R 4 and B' of R 3 such that, relative to A' and #', /
has a matrix of the form Dr described in Corollary 8.36.

1 0 2 1 2 3
4 1 3 -2 --4 - 6
(aM = (b)A =
3 1 1 1 0 -1
2 1 -1 -1 0 1

0 0 0
2
(c)A
2 1 3
{d) A-
0 1
1
4 2 6 3 -2 - i
1 0 1

5. Prove Corollary 8.35.



6. Prove Corollary 8.36.


7. Prove that the rank of a bilinear form as defined in this section is unique.
8. If f and g are two bilinear forms on U and V, then f + g is defined by

    (f + g)(u, v) = f(u, v) + g(u, v)

for all (u, v) ∈ U × V. Also, for any a ∈ F, the product af is defined by

    (af)(u, v) = a · f(u, v).

Prove that the set of all bilinear forms on U and V is a vector space with respect
to these operations of addition and scalar multiplication.
9. (See Problem 8.) Let W be the vector space of all bilinear forms on R m and R n .
Prove that the mapping m(f) = A that sends a bilinear form onto its matrix A
relative to £ m and £n is an isomorphism of the vector space W onto the vector
space Rmxn as defined in Section 4.2.

8.8 Symmetric Bilinear Forms


As in the case of linear transformations, there is a great deal of interest in those bilinear
forms for the special case where V = U. In this case, we shorten the phrase "bilinear
form on V and V" to "bilinear form on V." We also agree that, without a statement
to the contrary, the bases A and B are the same whenever V = U. This means that
a_{ij} = f(u_i, u_j) in the matrix of f relative to A. Throughout this section, we shall have
V = U and V an n-dimensional vector space over F.

The imposition of the conditions that U = V and A = B means that our bilinear
forms are now represented by square matrices only. As a more interesting consequence
of these conditions, we have the next theorem. This result should be compared with
Corollary 8.35 in the preceding section.

Theorem 8.37 Let A and B be square matrices of order n over F. Then B is congruent
to A over F if and only if B and A represent the same bilinear form on V.

Proof. Let A and B be matrices of order n over F, and suppose that A represents
the bilinear form f on V relative to the basis A.

Assume first that B is congruent to A over F. Then B = P^T A P for some invertible
P over F. Since P is invertible with elements in F, P is the transition matrix from A
to a basis A' of V. With U = V, A = B, Q = P, and A' = B' in Theorem 8.34, we have
that B = P^T A P is the matrix of f relative to A'.

On the other hand, suppose that B represents f relative to some basis A' of V. If P
is the matrix of transition from A to A', then Theorem 8.34 asserts that P^T A P is the
matrix of f relative to A'. But this matrix is unique, so it must be that B = P^T A P.
Now P is invertible with elements in F since it is the transition matrix from one basis
of V to another. Hence B is congruent to A over F. ■
The connection established between bilinear forms and matrices in Theorem 8.33
suggests the possibility of describing properties of bilinear forms in terms of their matri-
ces. Several interesting results along these lines can be obtained whenever the matrices
are square.

Definition 8.38 A bilinear form f on V is symmetric if f(u, v) = f(v, u) for all
u, v ∈ V.

Theorem 8.39 A bilinear form f on V is symmetric if and only if every matrix that
represents f is symmetric.

Proof. Suppose that f is symmetric, and let A = [a_{ij}]_n be a matrix that represents
f. Then a_{ij} = f(u_i, u_j) = f(u_j, u_i) = a_{ji}, and A is symmetric.

Suppose now that f is represented by a symmetric matrix A = [a_{ij}]_n relative to
the basis A of V. Let u and v be arbitrary vectors with coordinate matrices X and Y,
respectively, relative to A. Then f(u, v) = X^T A Y and f(v, u) = Y^T A X. Since Y^T A X
is a 1 by 1 matrix, Y^T A X = (Y^T A X)^T. Thus

    f(v, u) = (Y^T A X)^T = X^T A^T (Y^T)^T = X^T A Y = f(u, v),

where A^T = A, since A is symmetric. ■

Our next theorem has an immediate corollary concerning symmetric bilinear forms.
Theorem 8.40 If 1 + 1 ≠ 0 in T, every symmetric matrix A of order n over T is
congruent over T to a diagonal matrix.

Proof. The proof is by induction on the order n of A. The theorem is trivially true
for n = 1.
Assume that the theorem is true for all symmetric matrices of order k over T, and
let A be a symmetric matrix of order k + 1 over T. Let A = {u_1, u_2, ..., u_{k+1}} be a
basis of T^{k+1}, and let f be the symmetric bilinear form on T^{k+1} that has matrix A
relative to A. If A is the zero matrix, there is nothing to prove. Assume, then, that
a_{rs} = f(u_r, u_s) ≠ 0 for the pair r, s. If f(u_r, u_r) = 0 and f(u_s, u_s) = 0, then
f(u_r + u_s, u_r + u_s) = 2f(u_r, u_s) ≠ 0.
Thus there is a vector v_1 ∈ T^{k+1} such that d_1 = f(v_1, v_1) ≠ 0. The set {v_1} can
be extended to a basis B = {v_1, v_2, ..., v_{k+1}} of T^{k+1}. The set A' = {u'_1, u'_2, ..., u'_{k+1}}
is obtained from the basis B as follows: u'_1 = v_1 and

u'_j = v_j − (f(u'_1, v_j)/d_1) u'_1

for j = 2, ..., k + 1. That is,
u'_1 = v_1,
u'_2 = v_2 − (f(u'_1, v_2)/d_1) u'_1,
⋮
u'_{k+1} = v_{k+1} − (f(u'_1, v_{k+1})/d_1) u'_1.

It is clear, then, that A' can be obtained from B by a sequence of elementary operations.
Therefore, A' is a basis of T^{k+1}. Let A_1 denote the matrix of f relative to A'. The matrix
A_1 is symmetric since f is symmetric. Now f(u'_1, u'_1) = f(v_1, v_1) = d_1, and if j > 1,

f(u'_1, u'_j) = f(u'_1, v_j − (f(u'_1, v_j)/d_1) u'_1)
             = f(u'_1, v_j) − (f(u'_1, v_j)/d_1) f(u'_1, u'_1)
             = 0.

Thus A_1 is of the form

A_1 = [ d_1   0  ]
      [ 0    A_2 ]

where A_2 is a symmetric k × k matrix. If Q_1 denotes the matrix of transition from A
to B and Q_2 denotes the matrix of transition from B to A', then P_1 = Q_1 Q_2 denotes
the invertible matrix of transition from A to A', and A_1 = P_1^T A P_1 by Theorem 8.34.
By the induction hypothesis, there is an invertible matrix Q of order k such that
Q^T A_2 Q = diag{d_2, ..., d_{k+1}}. Hence

P_2 = [ 1   0 ]
      [ 0   Q ]

is an invertible matrix, and

(P_1 P_2)^T A (P_1 P_2) = P_2^T P_1^T A P_1 P_2
                        = P_2^T A_1 P_2
                        = [ 1   0  ] [ d_1   0  ] [ 1   0 ]
                          [ 0  Q^T ] [ 0    A_2 ] [ 0   Q ]
                        = [ d_1     0       ]
                          [ 0    Q^T A_2 Q  ]
                        = diag{d_1, d_2, ..., d_{k+1}}.

Thus P = P_1 P_2 is an invertible matrix such that P^T A P is diagonal. The theorem
follows by induction. ■
8.8 Symmetrie Bilinear Forms 279

Corollary 8.41 If 1 + 1 ≠ 0 in T, then every symmetric bilinear form on V can be
represented by a diagonal matrix.

Proof. The proof is left as an exercise (Problem 3). ■

Corollary 8.42 Every symmetric matrix over C is congruent over C to a diagonal


matrix Dr with the first r diagonal elements equal to 1 and all other elements zero.

Proof. See Problem 4. ■

Much the same as with Theorem 8.19, the proof of Theorem 8.40 provides a basis for
an approach to the problem of finding an invertible matrix P over T such that P^T A P
is diagonal, but this approach is not very efficient. We can proceed more directly in the
following manner to obtain such a P. The first column of P is simply the coordinate
matrix
P_1 = [p_{11}, p_{21}, ..., p_{n1}]^T
relative to A of a vector u'_1 chosen such that f(u'_1, u'_1) ≠ 0.
The vector u'_2 must be chosen so that {u'_1, u'_2} is linearly independent and f(u'_1, u'_2) = 0.
This latter condition means that the coordinate matrix
P_2 = [p_{12}, p_{22}, ..., p_{n2}]^T
of u'_2 relative to A must satisfy the equation
P_1^T A P_2 = 0.
That is, x_1 = p_{12}, x_2 = p_{22}, ..., x_n = p_{n2} must be a solution to the system (P_1^T A)X = 0.
Similarly, the requirements that f(u'_1, u'_3) = 0 and f(u'_2, u'_3) = 0 are reflected by the
conditions (P_1^T A)P_3 = 0 and (P_2^T A)P_3 = 0 on the coordinate matrix P_3 of u'_3 relative
to A, and so on. The method is illustrated in the following example.
280 Chapter 8 Functions of Vectors

Example 1 □ The problem is to determine a real invertible matrix P such that P^T A P
is diagonal, where
A = [ 0  1  2 ]
    [ 1  0  0 ]
    [ 2  0  0 ].
Let f be the symmetric bilinear form on R^3 that has matrix A relative to £3. Since
f(e_i, e_i) = 0 for i = 1, 2, 3 and f(e_1, e_2) ≠ 0, we choose u'_1 = e_1 + e_2 so that
f(u'_1, u'_1) ≠ 0. Then P_1 = [1, 1, 0]^T and
P_1^T A = [ 1  1  2 ].
The choice P_2 = [1, 1, -1]^T satisfies (P_1^T A)P_2 = 0 and makes {u'_1, u'_2} linearly
independent. The matrix P_2^T A is given by
P_2^T A = [ -1  1  2 ].
The choice P_3 = [0, -2, 1]^T satisfies (P_1^T A)P_3 = 0 and (P_2^T A)P_3 = 0, and makes
{u'_1, u'_2, u'_3} linearly independent. The matrix P = [P_1, P_2, P_3] is given by
P = [ 1   1   0 ]
    [ 1   1  -2 ]
    [ 0  -1   1 ]
and P^T A P = diag{2, -2, 0}. ■
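The computation in Example 1 is easy to check by machine. The following Python/NumPy sketch (our illustration, not part of the original text) verifies the orthogonality conditions used to build the columns of P and then forms P^T A P; the names A, P1, P2, P3 simply mirror the quantities worked out above.

    import numpy as np

    # Matrix of the symmetric bilinear form f of Example 1.
    A = np.array([[0, 1, 2],
                  [1, 0, 0],
                  [2, 0, 0]])

    P1 = np.array([1, 1, 0])    # coordinates of u'_1 = e_1 + e_2
    P2 = np.array([1, 1, -1])   # must satisfy (P1^T A) P2 = 0
    P3 = np.array([0, -2, 1])   # must satisfy (P1^T A) P3 = 0 and (P2^T A) P3 = 0

    # Check the conditions used in the column-by-column construction.
    assert P1 @ A @ P2 == 0
    assert P1 @ A @ P3 == 0
    assert P2 @ A @ P3 == 0

    P = np.column_stack([P1, P2, P3])
    print(P.T @ A @ P)          # expected: diag(2, -2, 0)

The same pattern, choosing one column and then solving (P_j^T A)X = 0 for the later columns, carries over to any symmetric matrix over a field in which 1 + 1 ≠ 0.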
The following definition and theorem apply to infinite-dimensional vector spaces as
well as those of finite dimension, although our main interest is in the latter case.
Definition 8.43 Let V be a vector space over T. A mapping q of V into T is a
quadratic form on V if and only if there is a symmetric bilinear form f on V such
that q(u) = f(u, u) for all u ∈ V.
It should be observed that the definition of a real quadratic form that was given
earlier is entirely consistent with this definition.
It is clear that each symmetric bilinear form f determines a unique associated
quadratic form q by the rule that q(u) = f(u, u). The following theorem shows that this
correspondence between symmetric bilinear forms and quadratic forms is one-to-one if
1 + 1 ≠ 0 in T.
Theorem 8.44 Let V be a vector space over the field T in which 1 + 1 ≠ 0. If the
quadratic form q on V is determined by the symmetric bilinear form f on V, then
f(u, v) = ½[q(u + v) − q(u) − q(v)]
for all u, v in V.

Proof. The theorem follows from the simplification

½[q(u + v) − q(u) − q(v)] = ½[f(u + v, u + v) − f(u, u) − f(v, v)]
                          = ½[f(u, u) + f(u, v) + f(v, u) + f(v, v) − f(u, u) − f(v, v)]
                          = ½[2f(u, v)]
                          = f(u, v). ■

Definition 8.45 Let V be a finite-dimensional vector space over the field T in which
1 + 1 ≠ 0. The matrix of the quadratic form q relative to the basis A of V is the same
as the matrix of the symmetric bilinear form f that determines q.
We note that the matrix of a quadratic form is required to be symmetric.
For the remainder of this chapter, we restrict our attention to those vector spaces
that have a field of scalars T in which 1 + 1 ≠ 0.
The intimate connection between quadratic forms and symmetric bilinear forms
means that many of the results obtained for symmetric bilinear forms translate imme­
diately into statements about quadratic forms. The most important of these are listed
next.
1. If q has matrix A relative to the basis A of V and u has coordinate matrix X
relative to A, then q(u) = XTAX (Theorem 8.33).
2. If q has matrix A relative to the basis A of V and P is the matrix of transition
from A to A', then PTAP is the matrix of q relative to A! (Theorem 8.34).
3. Every quadratic form on V can be represented by a diagonal matrix (Corollary
8.41).
4. Every quadratic form on a vector space over C can be represented by a matrix Dr
with the first r diagonal elements equal to 1 and all other elements zero (Problem
6).

Exercises 8.8
1. For each matrix A, find an invertible matrix P such that PTAP is diagonal.
(a) A = [ 0   2  -1 ]      (b) A = [ 2  -1  1 ]      (c) A = [ 0   2   3 ]
        [ 2   1   1 ]              [ -1  3  0 ]              [ 2   0  -2 ]
        [ -1  1  -2 ]              [ 1   0  0 ]              [ 3  -2   0 ]

(d) A = [ 1   2   0 ]      (e) A = [ 0   -1   2 ]      (f) A = [ 1   2   3 ]
        [ 2  -2  -3 ]              [ -1   0  -3 ]              [ 2   1  -1 ]
        [ 0  -3   4 ]              [ 2   -3   0 ]              [ 3  -1   1 ]
282 Chapter 8 Functions of Vectors

2. For each matrix A in Problem 1, let / be the bilinear form on R 3 that has matrix
A relative to the basis A = {(1,1,1), (1,0,1), (0,1, —1)}. Use the matrix P from
Problem 1 to find a basis of R 3 relative to which / is represented by a diagonal
matrix.
3. Prove Corollary 8.41.
4. Prove Corollary 8.42.
5. Translate Corollary 8.42 into a statement concerning symmetric bilinear forms on
vector spaces over C.
6. Use Corollary 8.42 to prove (4) above in the list of properties of quadratic forms
over C.
7. Let q represent an arbitrary quadratic form.

(a) Prove or disprove: q(au + bv) = aq(u) + bq(v).


(b) Prove that q(u + v) + q(u - v) = 2[q(u) + q(v)}.

8. Let q be a quadratic form on R^n with symmetric matrix A relative to £n:
q(x_1, x_2, ..., x_n) = X^T A X. If λ_1 ≥ λ_2 ≥ ··· ≥ λ_n are the (necessarily real) eigen­
values of A, prove that λ_1 ≥ q(v) ≥ λ_n for all v = (x_1, x_2, ..., x_n) with length
1.
9. A bilinear form f on V is by definition skew-symmetric if and only if

f(u, v) = −f(v, u)

for all u, v ∈ V. Prove that a bilinear form on V is skew-symmetric if and only if
any matrix that represents f is skew-symmetric.
10. Let V be a vector space over a field T in which 1 + 1 ≠ 0, and let f be a bilinear
form on V. Prove that f is skew-symmetric if and only if f(v, v) = 0 for all v ∈ V.
11. Prove that if 1 + 1 ≠ 0 in T, every bilinear form f on V can be written uniquely
as the sum of a symmetric and a skew-symmetric bilinear form. (See Problem 8
of Exercises 8.7.)
12. Assume that 1 + 1 ≠ 0 in T. The definition of a quadratic form on V can be
"extended" so as to include those mappings q of V into T such that q(v) = f(v, v)
for some (not necessarily symmetric) bilinear form on V.

(a) Prove that f is skew-symmetric if and only if f determines the zero quadratic
form on V.
(b) Prove that the two bilinear forms f_1 and f_2 on V define the same quadratic
form on V if and only if f_1 − f_2 is skew-symmetric.
(c) Does this "extension" actually enlarge the set of quadratic forms on V? Why,
or why not?
8.9 Hermitian Forms 283

8.9 Hermitian Forms


We restrict our attention now to those vector spaces with a scalar field T that is either
R or the field C of complex numbers. Throughout this section, each statement involving
T may be read in only two ways: with T = R, and with T = C.
The development of Chapter 9 is based on the interesting and important concept
of an inner product on a vector space. We shall see there that, so long as our work is
with real vector spaces, symmetric bilinear forms are sufficient for our needs. But with
complex vector spaces, we shall have need for another type of function, the hermitian
form. A hermitian form has somewhat the same relation to a complex bilinear form as
a symmetric bilinear form has to a bilinear form.

Definition 8.46 Let U and V be vector spaces over the field T. A complex bilinear
form on U and V is a mapping f of pairs of vectors (u, v) ∈ U × V onto scalars
f(u, v) ∈ T that has the properties

(i) f(a_1u_1 + a_2u_2, v) = ā_1 f(u_1, v) + ā_2 f(u_2, v)

and
(ii) f(u, b_1v_1 + b_2v_2) = b_1 f(u, v_1) + b_2 f(u, v_2),
where ā_i denotes the complex conjugate of a_i.

We describe the conditions (i) and (ii) by saying that f is complex linear in the
first variable and linear in the second variable. It is readily seen that a function f from
U × V into T is a complex bilinear form if and only if

f(Σ_{i=1}^r a_i u_i, Σ_{j=1}^s b_j v_j) = Σ_{i=1}^r Σ_{j=1}^s ā_i b_j f(u_i, v_j)

for all positive integers r, s.
Whenever the field of scalars is real, a complex bilinear form reduces to a bilinear
form. In this sense, a complex bilinear form is a generalization of a bilinear form. We
must keep in mind, however, that the two concepts are quite distinct whenever T is the
field C.
Throughout the remainder of this section, U and V will denote vector spaces over
T with dimensions m and n, respectively, and / will denote a complex bilinear form on
U and V.

Definition 8.47 Let A = {u_1, u_2, ..., u_m} and B = {v_1, v_2, ..., v_n} be bases of U and
V, respectively. The matrix of a complex bilinear form f relative to A and B is
the matrix A = [a_{ij}]_{m×n}, where a_{ij} = f(u_i, v_j) for i = 1, 2, ..., m; j = 1, 2, ..., n.

At this point, a parallel can be perceived between the properties of bilinear forms
and those of complex bilinear forms. Actually, the parallel is so strong that it is quite
repetitious to develop the properties of complex bilinear forms in as much detail as

was done with bilinear forms. At the same time, the adjustments that are necessitated
by complex linearity in the first variable are not altogether obvious. Consequently, we
shall more or less outline the development here with statements of the major results
as theorems, and leave the proofs of most of these theorems as exercises. In all cases,
a proof can be obtained by a suitable modification of the proof of the corresponding
result for bilinear forms.
We recall that the conjugate transpose (A)T of a matrix A is denoted by A*.

Theorem 8.48 Let X = [x_1, x_2, ..., x_m]^T denote the coordinate matrix of u ∈ U rela­
tive to the basis A of U, and let Y = [y_1, y_2, ..., y_n]^T be the coordinate matrix of v ∈ V
relative to the basis B of V. If A = [a_{ij}]_{m×n}, then A is the matrix of the complex bilinear
form f on U and V if and only if the equation

f(u, v) = X^* A Y

is satisfied for all choices of u ∈ U, v ∈ V.

Proof. See Problem 7. ■

Example 1 □ Consider the mapping f defined from C^2 × C^3 into C by

f(u, v) = 3i ū_1v_2 + (−1 − 3i) ū_1v_3 + i ū_2v_1 + (4 − i) ū_2v_2 + (−4 + 3i) ū_2v_3,

where u = (u_1, u_2) and v = (v_1, v_2, v_3). Guided by the equation a_{ij} = f(e_i, e_j) in
Definition 8.47, we see that the value of f(u, v) can be written in the form

f(u, v) = f((u_1, u_2), (v_1, v_2, v_3))
        = [ ū_1  ū_2 ] [ 0   3i     −1 − 3i ] [ v_1 ]
                       [ i   4 − i  −4 + 3i ] [ v_2 ]
                                              [ v_3 ].

Hence f is the complex bilinear form on C^2 and C^3 that has the matrix

A = [ 0   3i     −1 − 3i ]
    [ i   4 − i  −4 + 3i ]

relative to the standard bases £2 of C^2 and £3 of C^3. ■

It follows from Definition 8.47 and Theorem 8.48 that two complex bilinear forms /
and g on U and V are equal if and only if they have the same matrix relative to bases
A of U and B of V.

Theorem 8.49 Let A be the matrix of f relative to the bases A of U and B of V. If Q
is the matrix of transition from A to the basis A' of U and P is the matrix of transition
from B to the basis B' of V, then Q^*AP is the matrix of f relative to A' and B'.

Proof. See Problem 8. ■

Example 2 □ Let f be the same complex bilinear form as in Example 1, and consider
the problem of finding the matrix of f relative to the bases A' = {(1, i), (i, 0)} of C^2
and B' = {(1,1,1), (1,1,0), (1,0,0)} of C^3.
The matrix of transition from A to A' is

Q = [ 1  i ]
    [ i  0 ]

and the matrix of transition from B to B' is

P = [ 1  1  1 ]
    [ 1  1  0 ]
    [ 1  0  0 ].

By Theorem 8.49, the matrix of f relative to A' and B' is

Q^*AP = [ 1  −i ] [ 0   3i     −1 − 3i ] [ 1  1  1 ]
        [ −i  0 ] [ i   4 − i  −4 + 3i ] [ 1  1  0 ]
                                         [ 1  0  0 ]
      = [ 2  −i  1 ]
        [ i   3  0 ].

Our main interest in complex bilinear forms is with the case where V = U. As with
bilinear forms and linear transformations, we assume that the bases A and B of V = U
are the same unless it is stated otherwise. For the remainder of the section, we shall
have V = U and V an n-dimensional vector space over T.
The equivalence relation that we need to replace congruence of matrices over T is
given in our next definition.
Definition 8.50 Let A and B be matrices over the field ?\ Then B is conjunctive
(or hermitian congruent) to A over T if there is an invertible matrix P over T such
that B = P*AP.
Theorem 8.49 and Definition 8.50 lead to the following result.
Theorem 8.51 Let A and B be matrices of order n over T. Then B is conjunctive to
A over T if and only if B and A represent the same complex bilinear form on V.
The term "hermitian" applies to complex bilinear forms in about the same way as
the term "symmetric" applies to bilinear forms. As a matter of fact, when T is the field
of real numbers, the two terms are coincident, just as "bilinear" and "complex bilinear"
are. In this sense, a hermitian complex bilinear form is a generalization of a symmetric
bilinear form.
286 Chapter 8 Functions of Vectors

Definition 8.52 A complex bilinear form f on V is hermitian if and only if

f(u, v) = \overline{f(v, u)}

for all u, v ∈ V.
Definition 8.53 A matrix H over the field T is hermitian if and only if H^* = H.
Thus a real matrix A is hermitian if and only if it is symmetric. The relation
between hermitian complex bilinear forms and hermitian matrices is exactly what one
would expect.
Theorem 8.54 A complex bilinear form f on V is hermitian if and only if every matrix
that represents f is hermitian.

Proof. See Problem 10. ■

Our next theorem corresponds to the result in Theorem 8.40 for symmetric matrices.
We include a proof for this theorem, since it is quite important and its proof is a bit
more difficult than the others in this section.
Theorem 8.55 Every hermitian matrix H of order n over C is conjunctive over C to
a diagonal matrix.

Proof. The proof is by induction on the order n of H. The theorem is trivially valid
for n = 1.
Assume that the theorem is true for all hermitian matrices of order k over C, and
let H be a hermitian matrix of order k + 1 over C. Let A = {u_1, u_2, ..., u_{k+1}} be a basis
of C^{k+1}, and let h be the hermitian complex bilinear form on C^{k+1} that has matrix H
relative to A. If H = 0, H is a diagonal matrix already. Assume now that H ≠ 0 and
h(u_r, u_s) = a + bi ≠ 0 for the pair r, s. If h(u_r, u_r) = 0 and h(u_s, u_s) = 0, then

h(u_r + iu_s, u_r + iu_s) = h(u_r, u_r) + i h(u_r, u_s) − i h(u_s, u_r) + h(u_s, u_s)
                         = i[h(u_r, u_s) − h(u_s, u_r)]
                         = i[(a + bi) − (a − bi)]
                         = −2b.
A similar computation shows that

h(u_r + u_s, u_r + u_s) = h(u_r, u_s) + h(u_s, u_r) = 2a.

Thus, there is a vector v_1 ∈ C^{k+1} such that d_1 = h(v_1, v_1) ≠ 0. The set {v_1} can be
extended to a basis B = {v_1, v_2, ..., v_{k+1}} of C^{k+1}. The set A' = {u'_1, u'_2, ..., u'_{k+1}} is
obtained from B as follows:
u'_1 = v_1
and
u'_j = v_j − (h(u'_1, v_j)/d_1) u'_1
for j = 2, ..., k + 1. Since A' is obtained from B by elementary operations, A' is a basis
of C^{k+1}. The matrix H_1 of h relative to A' is hermitian, since h is hermitian. Now
h(u'_1, u'_1) = d_1, and a simple computation shows that h(u'_1, u'_j) = 0 for j = 2, ..., k + 1.
Hence H_1 is of the form

H_1 = [ d_1   0  ]
      [ 0    H_2 ]

where H_2 is a k × k hermitian matrix. The matrix of transition P_1 from A to A' is
invertible over C, and H_1 = P_1^* H P_1 by Theorem 8.49.
By the induction hypothesis, there is an invertible matrix Q of order k over C such
that Q^* H_2 Q = diag{d_2, ..., d_{k+1}}. Now

P_2 = [ 1   0 ]
      [ 0   Q ]

is an invertible matrix over C, and

(P_1 P_2)^* H (P_1 P_2) = P_2^* P_1^* H P_1 P_2
                        = P_2^* H_1 P_2
                        = [ d_1     0       ]
                          [ 0    Q^* H_2 Q  ]
                        = diag{d_1, d_2, ..., d_{k+1}}.

Thus, P = P_1 P_2 is an invertible matrix over C such that P^* H P is diagonal. The
theorem follows from the principle of mathematical induction. ■

Definition 8.56 A hermitian complex bilinear form on V is called a hermitian form


on V.

It follows from Theorems 8.55 and 8.51 that any hermitian form h on V can be
represented by a diagonal matrix. Now the condition H* = H requires that the diag­
onal elements of a hermitian matrix must be real. Hence the diagonal elements in a
diagonalized representation of h are always real.
The proof of Theorem 8.23 can be modified so as to obtain a proof of the follow­
ing theorem. With q(v) replaced by h(v, v) and R^n replaced by C^n, the only other
changes necessary are in the expressions for h(v, v). For example, the two diagonalized
representations would appear as h(v, v) = Σ_{k=1}^n d_k|y_k|² and h(v, v) = Σ_{k=1}^n c_k|z_k|².
Theorem 8.57 In any two diagonalized representations of a hermitian form h, the
number of positive terms is the same and the number of negative terms is the same.

The definitions of index, signature, positive definite, etc., apply to hermitian forms
as they are given in Definitions 8.25 and 8.29. The positive definite hermitian forms
are those that are fundamental to Chapter 9. The index of a hermitian matrix is by
definition the same as that of a hermitian form that it represents.

Corollary 8.58 Any hermitian matrix H is conjunctive over C to a unique matrix of
the form

C = [ I_p    0          0 ]
    [ 0     −I_{r−p}    0 ]
    [ 0      0          0 ]

where r is the rank of H and p is its index. Two n × n hermitian matrices are conjunctive
over C if and only if they have the same rank and index.

Proof. See Problem 12. ■

Corollary 8.59 With a suitable choice of basis in V, any hermitian form on V can be
represented by a matrix of the form C in Corollary 8.58.

Proof. See Problem 13. ■

Corollary 8.60 A hermitian form h is positive definite if and only if

h(v, v) > 0

for all v ≠ 0.

Proof. See Problem 14. ■

Just after Corollary 8.42 in Section 8.8, we described a method for obtaining an
invertible matrix P that would reduce a given symmetric matrix A to a diagonal matrix
P^T A P, and we illustrated the method in Example 1 of that section. In order to obtain
a method for finding an invertible matrix P that will reduce a hermitian matrix H to
a diagonal matrix P^* H P, the only changes that are necessary in that description are
that each P_j^T A be replaced by P_j^* H. The method for obtaining P such that P^* H P is
of the form C in Corollary 8.58 is entirely analogous to the techniques of Section 8.8.
Our next example gives a demonstration of this method.
8.9 Hermitian Forms 289

Example 3 □ For the hermitian matrix

H = [ 5     4i   −4 ]
    [ −4i   3    5i ]
    [ −4   −5i    3 ]

we shall find an invertible matrix P such that

P^*HP = [ I_p    0          0 ]
        [ 0     −I_{r−p}    0 ]
        [ 0      0          0 ].

We first find an invertible matrix P_1 such that P_1^* H P_1 is diagonal. Since h_{11} = 5 is not
zero in H = [h_{ij}], we can use

C_1 = [1, 0, 0]^T

as the first column of P_1. The second column C_2 of P_1 needs to make {C_1, C_2} linearly
independent and satisfy (C_1^* H)C_2 = 0, which appears as

[ 5  4i  −4 ] C_2 = 0.
The choice

C_2 = [4, 0, 5]^T

satisfies both conditions. The third column C_3 of P_1 needs to make {C_1, C_2, C_3} linearly
independent and satisfy (C_1^* H)C_3 = 0 and (C_2^* H)C_3 = 0. That is, we need a column
C_3 that is not a linear combination of C_1 and C_2 and that satisfies both equations

[ 5  4i  −4 ] C_3 = 0,
[ 0  −9i  −1 ] C_3 = 0.
The choice

C_3 = [−8i, 1, −9i]^T

satisfies all the conditions. Thus

P_1 = [ 1   4   −8i ]
      [ 0   0    1  ]
      [ 0   5   −9i ]

is a matrix such that P_1^* H P_1 is diagonal. Performing the multiplication, we find that

P_1^* H P_1 = [ 5   0   0  ]
              [ 0  −5   0  ]
              [ 0   0   16 ].
Interchanging the second and third columns in P_1 yields

P_2 = [ 1  −8i   4 ]
      [ 0   1    0 ]
      [ 0  −9i   5 ]

such that P_2^* H P_2 = diag{5, 16, −5}. If we now put

P_3 = [ 1/√5   0     0   ]
      [ 0     1/4    0   ]
      [ 0      0    1/√5 ]

and P = P_2 P_3, we get

P^* H P = (P_2 P_3)^* H (P_2 P_3)
        = P_3^* (P_2^* H P_2) P_3
        = [ 1/√5   0    0   ] [ 5   0   0 ] [ 1/√5   0    0   ]
          [ 0     1/4   0   ] [ 0  16   0 ] [ 0     1/4   0   ]
          [ 0      0   1/√5 ] [ 0   0  −5 ] [ 0      0   1/√5 ]
        = [ 1  0   0 ]
          [ 0  1   0 ]
          [ 0  0  −1 ],

and this matrix has the form described in Corollary 8.58.
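As a quick numerical check of Example 3, the following Python/NumPy sketch (an illustration we have added, not part of the original text; the matrices are those obtained above) verifies that P^*HP has the form described in Corollary 8.58.

    import numpy as np

    H = np.array([[5, 4j, -4],
                  [-4j, 3, 5j],
                  [-4, -5j, 3]])

    P2 = np.array([[1, -8j, 4],
                   [0, 1, 0],
                   [0, -9j, 5]])
    P3 = np.diag([1/np.sqrt(5), 1/4, 1/np.sqrt(5)])
    P = P2 @ P3

    print(np.round(P.conj().T @ H @ P, 10))   # expected: diag(1, 1, -1)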


8.9 Hermitian Forms 291

Exercises 8.9

1. Which of the following matrices are hermitian?


(a) [ 5   i ]        (b) [ 2      1 − i ]        (c) [ 1   1 ]        (d) [ 0  1  1 ]
    [ −i  2 ]            [ 1 + i  3     ]            [ −1  1 ]            [ 1  0  1 ]
                                                                          [ 1  1  0 ]

(e) [ 5   i   2 ]        (f) [ 0   2   1 ]        (g) [ 1  9 ]
    [ −i  2  −1 ]            [ −2  1  −1 ]            [ 9  1 ]
    [ 2  −1   2 ]            [ −1  1  −2 ]

(h) [ 3i    0   4i ]
    [ 0    2i   −2 ]
    [ −4i   2    i ]

2. For each hermitian matrix H in Problem 1, answer the following questions.

(a) Find an invertible matrix P such that P*HP is diagonal.


(b) State the rank and index of H.
(c) Which of these matrices are conjunctive over C?

3. For each hermitian matrix H in Problem 1, find an invertible matrix P such that
P*HP is of the form C given in Corollary 8.58.
4. In each of parts (a)-(h) of Problem 1, let / be the complex bilinear form on Cn
that has the given matrix relative to Sn.

(a) Which of these functions / are hermitian forms?


(b) For each / that is a hermitian form, find a basis A! of Cn with respect to
which / is represented by a diagonal matrix.

5. Prove the following corollary to Theorem 8.49. Two m x n matrices over T


represent the same complex bilinear form on U and V if and only if A and B are
equivalent over T.
6. Let / be a complex bilinear form on U and V over C. Prove that, with suitable
choices of bases in U and V, / can be represented by a matrix Dr that has the
first r diagonal elements equal to 1 and all other elements zero.
7. Prove Theorem 8.48.
8. Prove Theorem 8.49.
9. Prove Theorem 8.51.
292 Chapter 8 Functions of Vectors

10. Prove Theorem 8.54.

11. Write out the details of the proof of Theorem 8.57.

12. Prove Corollary 8.58.

13. Prove Corollary 8.59.

14. Prove Corollary 8.60.

15. Prove that a hermitian matrix H is positive definite if and only if there exists an
invertible matrix P such that H = P*P.

16. Given that the matrix H in Problem 1(e) is a positive definite hermitian matrix,
find an invertible matrix P such that P^*P = H.

17. Let A and B be hermitian matrices of the same dimensions.

(a) Prove that A + B and A — B are hermitian.


(b) Prove that if AB = BA, then AB is hermitian.
(c) Prove that AT is hermitian.
(d) Prove that if A is invertible, then A"1 is hermitian.

18. A matrix A is called skew-hermitian if A* = —A. Prove that any square matrix
A can be written uniquely as A = B + C with B hermitian and C skew-hermitian.
Chapter 9

Inner Product Spaces

9.1 Introduction
The vector spaces R n have been an intuitive guide in our development thus far, and
we have extended most of the concepts introduced there to more general settings. The
outstanding exceptions are the basic concepts of length and inner product in R n . In this
chapter, we generalize these concepts and study those properties of vector spaces that
are based on an inner product. It is possible to define an inner product for vector spaces
over fields other than the field R of real numbers or the field C of complex numbers, but
our interest is restricted to these cases. Accordingly, throughout this chapter, we shall
always have either T = R or T — C, and V shall denote a finite-dimensional vector
space over T.

9.2 Inner Products


We begin with the definition of an inner product. As mentioned above, T denotes a field
that is either R o r C , and V is a finite-dimensional vector space over T. The conjugate
of the complex number z is denoted by ~z.

Definition 9.1 An inner product on V is a mapping f of pairs of vectors (u, v) in


V × V onto scalars f(u, v) ∈ T that has the following properties:
(i) f(a_1u_1 + a_2u_2, v) = ā_1 f(u_1, v) + ā_2 f(u_2, v)
(ii) f(u, v) = \overline{f(v, u)}
(iii) f(v, v) > 0 if v ≠ 0 and f(0, 0) = 0.

We note that property (ii) requires that / ( v , v) be real and property (iii) requires
that this real number be positive except when v = 0.
If T = R in Definition 9.1, V is called a real inner product space, or a Euclidean
space. If T = C, V is called a complex inner product space, or a unitary space.


The term inner product space is used to refer collectively to Euclidean spaces and
unitary spaces.
The properties listed in Definition 9.1 invite a comparison between inner products
and hermitian forms. This comparison yields the following theorem, which is important
even though the proof is trivial.

Theorem 9.2 A mapping f of V x V into T is an inner product on V if and only if


f is a positive definite hermitian form.

Proof. Suppose first that / is a positive definite hermitian form. Then / is a complex
bilinear form, so it follows from Definition 8.46 that / has property (i) of Definition 9.1.
By Definition 8.52, the mapping / has property (ii) of Definition 9.1. Finally, / has
property (iii) by Corollary 8.60. Hence / is an inner product on V.
Assume, on the other hand, that / is an inner product on V. The references cited
in the preceding paragraph show that / satisfies all of the requirements of a positive
definite hermitian form except possibly the condition that

/ ( u , &ivi + 6 2 v 2 ) = 6i/(u, v x ) + 6 2 /(u, v 2 )

in Definition 8.46. But

/ ( u , 6 i v i + δ 2 ν 2 ) = /(&ivi + 6 2 v 2 , u )
= 6 i / ( v i , u ) + 6 2 /(v 2 ,u)
= &l/(Vi,u)+&2/(V2,u)

= &i/(u,v 1 ) + 6 2 / ( u , v 2 ) ,

so that / is indeed a positive definite hermitian form. ■

The next example presents the most commonly used inner products in R n and Cn.

Example 1 □ For any two vectors u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n) in R^n,
let the value f(u, v) be given by

f(u, v) = u_1v_1 + u_2v_2 + ··· + u_nv_n = Σ_{k=1}^n u_k v_k.

Then f(u, v) is the familiar scalar product u · v of Chapter 1. The properties required
in Definition 9.1 follow easily from Theorem 1.20.
The corresponding inner product on C^n is given by

g(u, v) = ū_1v_1 + ū_2v_2 + ··· + ū_nv_n = Σ_{k=1}^n ū_k v_k,

where u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n) in C^n. The verification that this map­
ping is an inner product is left as an exercise (Problem 3). ■
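For readers who wish to experiment, the two standard inner products of Example 1 can be written in a few lines of Python/NumPy (our sketch, not part of the original text). Note that np.vdot conjugates its first argument, which matches the convention used throughout this chapter.

    import numpy as np

    def inner_R(u, v):
        # standard inner product on R^n
        return float(np.dot(u, v))

    def inner_C(u, v):
        # standard inner product on C^n: sum of conj(u_k) * v_k
        return complex(np.vdot(u, v))

    print(inner_R(np.array([1.0, 2.0]), np.array([3.0, -5.0])))      # -7.0
    print(inner_C(np.array([2j, 6, -1j]), np.array([2j, 6, -1j])))   # (41+0j)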
9.2 Inner Products 295

The inner products in Example 1 will be referred to hereafter as the standard inner
products on R n and C n , respectively. Whenever an inner product on R n or Cn is not
specified, it is understood to be the standard inner product.

Example 2 □ Let u = (u_1, u_2) and v = (v_1, v_2) in R^2. We shall determine if the
following rule determines an inner product on R^2:

f(u, v) = 2u_1v_1 + u_1v_2 + u_2v_1 + u_2v_2.

We see that the value of f can be written as

f((u_1, u_2), (v_1, v_2)) = [ u_1  u_2 ] [ 2  1 ] [ v_1 ]
                                        [ 1  1 ] [ v_2 ],

and since we are dealing with a real vector space, f is a hermitian form. Thus f is
an inner product if and only if f is positive definite; that is, if and only if the third
property in Definition 9.1 is satisfied. Since

f(v, v) = 2v_1² + 2v_1v_2 + v_2²
        = v_1² + (v_1 + v_2)²,

we see that f(v, v) ≥ 0 and f(v, v) = 0 if and only if v_1 = 0 and v_1 + v_2 = 0, that is,
if and only if v = 0. Thus f is an inner product on R^2, one that looks quite different
from the standard inner product. ■

The connection established in Theorem 9.2 makes available the results of Section
8.9 for use with inner products. In particular, an inner product f on V has a unique
matrix A = [a_{ij}]_{n×n} relative to each basis A = {v_1, v_2, ..., v_n} of V. This matrix A is
determined by the conditions a_{ij} = f(v_i, v_j), and f(u, v) = X^*AY where u and v have
coordinate matrices X and Y, respectively, relative to A. If P is the matrix of transition
from A to A', then f has matrix P^*AP relative to A', by Theorem 8.49. Since f is
positive definite, Corollary 8.59 implies that there is a basis A' of V such that the inner
product f has matrix I_n relative to A'. Relative to this basis A', f(u, v) is given by
f(u, v) = X^*Y = Σ_{k=1}^n x̄_k y_k. Thus the standard inner products in Example 1 furnish
typical examples provided the choice of basis is appropriate.
The results obtained in this chapter are valid for all finite-dimensional inner product
spaces, in that they are not dependent on a particular choice of f. For this reason, it
is customary to replace the notation f(u, v) by a more convenient one. We choose to
drop the f from the notation, and simply write (u, v) instead of f(u, v). This change
of notation has an additional advantage in that it reminds us that we are dealing with
an inner product, and not just a complex bilinear form.
According to the result in Problem 15 of Exercises 8.9, a hermitian matrix H is
positive definite if and only if H = P^*P for some invertible matrix P. This result

provides an easy way to construct inner products on C^n. For u = (u_1, u_2, ..., u_n) and
v = (v_1, v_2, ..., v_n) in C^n, let

U = [u_1, u_2, ..., u_n]^T  and  V = [v_1, v_2, ..., v_n]^T.

We can choose an invertible P of order n, put H = P^*P, and then the rule

(u, v) = U^*HV

defines an inner product on C^n. We say that the invertible matrix P generates this
inner product.

Example 3 □ As a specific example of the preceding discussion, the matrix

P = [ 1   0  0 ]
    [ −i  1  i ]
    [ 0   i  1 ]

generates the inner product on C^3 defined as follows. For u = (u_1, u_2, u_3) and v =
(v_1, v_2, v_3) in C^3,

(u, v) = [ ū_1  ū_2  ū_3 ] P^*P [ v_1 ]
                                [ v_2 ]
                                [ v_3 ]

       = [ ū_1  ū_2  ū_3 ] [ 2   i  −1 ] [ v_1 ]
                           [ −i  2   0 ] [ v_2 ]
                           [ −1  0   2 ] [ v_3 ]

       = 2ū_1v_1 + iū_1v_2 − ū_1v_3 − iū_2v_1 + 2ū_2v_2 − ū_3v_1 + 2ū_3v_3. ■
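A small Python/NumPy sketch of this construction (ours, not the book's) takes the matrix P of Example 3, forms H = P^*P, and evaluates the resulting inner product; the value 110 computed here reappears in Example 1 of Section 9.3.

    import numpy as np

    P = np.array([[1, 0, 0],
                  [-1j, 1, 1j],
                  [0, 1j, 1]])
    H = P.conj().T @ P            # H = P*P, a positive definite hermitian matrix

    def inner(u, v):
        return complex(u.conj() @ H @ v)

    print(np.round(H, 10))        # [[2, i, -1], [-i, 2, 0], [-1, 0, 2]]
    print(inner(np.array([2j, 6, -1j]), np.array([2j, 6, -1j])))   # (110+0j)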

Exercises 9.2

1. Using the standard inner product in C3, compute (u,v) and (v, u) for the given
vectors.

(a) u = (2 - z, 1 - 5z, 1 + 2i), v = (2 + z, - 3 , - 5 + i)


(b) u = ( l l , - 6 t , - l ) , v = ( 2 i , 6 , - i )
9.2 Inner Products 297

2. Repeat Problem 1 using the inner product in Example 3 of this section.

3. Let / : R n x R n —> R and g : Cn x Cn —► C be the standard inner products as


defined in Example 1.

(a) Prove that / is an inner product on R n .


(b) Prove that g is an inner product on Cn.

4. Let u = (u\,u2, ...,tt n ) and v = (vi,v2, •••^n) in R n . Prove or disprove that the
given rule defines an inner product on R n .

(a) (u, v) = 2uiV\ + u\v2 + u2v\ + 2u2v2 on R 2


(b) (u, v) = u\v2 + u2v\ on R 2
(c) (u, v) = u\V\ + u\v2 + ^2^i + ^2^2 on R 2
(d) (u, v) = 2u\V\ + ^2^2 on R 2
(e) (u, v) = u\V\ + ^3^3 on R 3
(f) (U, V) = u\v\ + W2^! + ^3 V 3 on
R-3

5. Assume that the matrix

H
2 -1

is a positive definite hermitian matrix, and let (u, v) be the inner product on C3
that has matrix H relative to the basis

^={(1,0,0),(1,1,0),(1,1,1)}.

Write out the value of (u, v) for arbitrary vectors u = (u\, u2,us) and v =
(^1,^2,^3) i n C 3 .

6. Find a basis A' of C3 for which the inner product in Problem 5 has the form
(u, v) = X*y, where u and v have coordinate matrices X and V, respectively,
relative to A'.

7. Recall that the trace of a matrix A — [ α ^ ] η Χ η is the number t(A) = Σ™=ιθ>α·


Prove that (A, B) = t(ATB) is an inner product on the vector space R n x n of all
n x n matrices over R.

8. Let A = {vi, V2,..., v n } be a basis of the inner product space V, and let u and
v be arbitrary vectors with coordinate matrices X and Y, respectively, relative to
A. Prove that (u,v) = X*Y for all u, v G V if and only if (v;,Vj) = 6ij for all
pairs i,j.
298 Chapter 9 Inner Product Spaces

9. Assume that the function f defined on C^2 × C^2 by

f((a_1, a_2), (b_1, b_2)) = 5ā_1b_1 + iā_1b_2 − iā_2b_1 + 2ā_2b_2

is an inner product on C^2.

(a) Find the matrix of f relative to the basis £2 = {(1,0), (0,1)}.

(b) Find a basis A = {v_1, v_2} of C^2 such that (v_i, v_j) = δ_ij for all i, j.

10. Show that (v, 0) = 0 for all v G V.


11. Prove that if (u, v) = 0 for all v G V, then u = 0.
12. Prove that if u and v are two vectors in V such that (u, w) = (v, w) for all w in
V, then u = v.
13. In each of parts (a)-(h) of Problem 1 of Exercises 8.9, let / be the complex bilinear
form on Cn that has the given matrix relative to £ n . Which of these functions are
inner products on C n ?
14. The definition of an inner product in Definition 9.1 applies equally well to infinite-
dimensional vector spaces V. Assuming the usual properties of the definite inte­
gral, prove that (f, g) = ∫_0^1 f(x)g(x) dx defines an inner product on the vector
space of all continuous real-valued functions on [0,1].

9.3 Norms and Distances


The concept of an inner product leads to the notion of the length or norm of a vector.
Definition 9.3 For a given inner product (u, v) on V, the length or norm of a vector
v in V is the real number ||v|| given by ||v|| = √(v, v).
Example 1 □ We shall compute ||v|| for v = (2i, 6, −i) in C^3 by using the standard
inner product and also by using the inner product in Example 3 of Section 9.2.
Using the standard inner product, we have
(v, v) = (−2i)(2i) + (6)(6) + (i)(−i) = 41
and
||v|| = √41.
Using the inner product from Example 3 of Section 9.2, we have

(v, v) = [ −2i  6  i ] [ 2   i  −1 ] [ 2i ]
                       [ −i  2   0 ] [ 6  ]
                       [ −1  0   2 ] [ −i ]  = 110

and
||v|| = √110. ■
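The two values of ||v|| in Example 1 can be reproduced numerically with the following Python/NumPy sketch (our illustration, not part of the original text; H is the matrix of the inner product from Example 3 of Section 9.2).

    import numpy as np

    v = np.array([2j, 6, -1j])

    H = np.array([[2, 1j, -1],
                  [-1j, 2, 0],
                  [-1, 0, 2]])

    norm_standard = np.sqrt(np.vdot(v, v).real)        # sqrt(41)
    norm_H = np.sqrt((v.conj() @ H @ v).real)          # sqrt(110)
    print(norm_standard, norm_H)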
9.3 Norms and Distances 299

The norm of a vector as given in Definition 9.3 is readily seen to be a generalization


of Definition 1.19. Our last example illustrates that the number obtained as the length
of a vector depends on the particular inner product that is being used. However, we
shall see in Theorem 9.5 that this length has the properties that one would naturally
expect. In the derivation of these properties, the famous Cauchy-Schwarz inequality is
of invaluable aid.

Theorem 9.4 (The Cauchy-Schwarz inequality) For any vectors u, v in V,

|(u, v)| ≤ ||u|| · ||v||.

Proof. If v = 0, the equality holds with both members zero. Consider the case where
v ≠ 0. Since the inner product is a positive definite hermitian form, we have

0 ≤ (u − zv, u − zv)
  = (u, u) − z(u, v) − z̄(v, u) + z̄z(v, v)

for any complex number z. Since v ≠ 0, we may let z = (v, u)/(v, v). This yields

0 ≤ (u, u) − (u, v)(v, u)/(v, v).

Multiplication of both members by (v, v) yields

(u, v)(v, u) ≤ (u, u)(v, v)

or
|(u, v)|² ≤ ||u||² ||v||².

The statement of the theorem follows from taking the positive square root of both
members of the last inequality. ■

It is left as an exercise to prove that the equality holds in the Cauchy-Schwarz


inequality if and only if {u, v} is linearly dependent (Problem 7).

Example 2 □ We shall verify the Cauchy-Schwarz inequality using the inner product
from Example 3 of the last section with

u = (1 + i, −3i, −2),  v = (2i, 6, −i).

We have ||v|| = √110 from our last example. Performing the other required computa­
tions, we get

(u, v) = [ 1 − i  3i  −2 ] [ 2   i  −1 ] [ 2i ]
                           [ −i  2   0 ] [ 6  ]
                           [ −1  0   2 ] [ −i ]  = 11 + 61i,

|(u, v)| = √(121 + 3721) = √3842,

(u, u) = [ 1 − i  3i  −2 ] [ 2   i  −1 ] [ 1 + i ]
                           [ −i  2   0 ] [ −3i   ]
                           [ −1  0   2 ] [ −2    ]  = 40,

||u|| = √40.
The inequality
√3842 ≤ √40 √110 = √4400
is a valid one, verifying the Cauchy-Schwarz inequality in this case. ■
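The computations of Example 2 are easily verified by machine. The Python/NumPy sketch below (ours, not part of the text) evaluates |(u, v)| and ||u|| ||v|| for the vectors above and confirms the inequality.

    import numpy as np

    H = np.array([[2, 1j, -1], [-1j, 2, 0], [-1, 0, 2]])

    def inner(u, v):
        return u.conj() @ H @ v

    u = np.array([1 + 1j, -3j, -2])
    v = np.array([2j, 6, -1j])

    lhs = abs(inner(u, v))                                   # sqrt(3842)
    rhs = np.sqrt(inner(u, u).real * inner(v, v).real)       # sqrt(4400)
    print(lhs, rhs, lhs <= rhs)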
We now derive the fundamental properties of the norm.
Theorem 9.5 The norm of a vector has the following properties:
(i) ||v|| > 0 if v ≠ 0, and ||0|| = 0;
(ii) ||av|| = |a| · ||v||;
(iii) ||u + v|| ≤ ||u|| + ||v||.

Proof. The first of these follows immediately from ||v|| = √(v, v), since the inner
product (v, v) is a positive definite hermitian form.
For any a ∈ T and any v ∈ V,

||av|| = √(av, av) = √(āa(v, v)) = |a| √(v, v).

This proves (ii).
For any complex number z = a + bi with a, b ∈ R, z + z̄ = 2a ≤ 2|z|. Hence

||u + v||² = (u + v, u + v)
           = (u, u) + (u, v) + (v, u) + (v, v)
           ≤ ||u||² + 2|(u, v)| + ||v||².

But |(u, v)| ≤ ||u|| · ||v|| by the Cauchy-Schwarz inequality, so

||u + v||² ≤ ||u||² + 2||u|| · ||v|| + ||v||² = (||u|| + ||v||)².

This result is equivalent to property (iii), and the proof is complete. ■

There is one more basic concept to be introduced in an inner product space, that of
a distance function.

Definition 9.6 For a given inner product (u, v) on V, the distance d(u, v) between
vectors u and v in V is defined by d(u, v) = ||u — v||.
The distance thus defined has the properties listed in the next theorem. The proof
is left as an exercise.
Theorem 9.7 The distance function d has the following properties:
(i) d(u, v ) > 0 i / u / v , and d(v, v) = 0;
(ii) d(u, v) = d(v,u);
(iii) d(u, v) + d(v, w) > d(u, w).

Proof. See Problem 8. ■

A set in which there is defined a distance function with the properties (i), (ii), (iii)
of Theorem 9.7 is called a metric space, and the distance function is called a metric
or norm for the space. The only norms that we are interested in are those connected
with an inner product, but there are more general norms.

Exercises 9.3
1. Compute the norm of the given v e C3.
(a) v = ( 3 , - i , l ) (b) v = (0,2i,l)
(c) v = (2i,i,0) (d) v = ( 5 + i,5,0)
2. Using the inner product from Problem 5 of Exercises 9.2, compute the norm of
each v in Problem 1 above.
3. Let u = (1^1,^2) and v = (^1,^2)· The rule
(u, v) = 2u\V\ + uiv2 + u2vi + 2u2v2
defines an inner product on R 2 .

(a) Use this inner product and verify the Cauchy-Schwarz inequality for u =
(1,2) and v = ( 3 , - 5 ) .
(b) Use this inner product and compute ||w|| for w = (—1,3).
(c) Using this inner product and the vectors from part (a), compute d(u,v).

1 1
4. Use the inner product on R 2 generated by and verify the Cauchy-
-1 2
Schwarz inequality for u = (3, —1) and v = (2, 4).
302 Chapter 9 Inner Product Spaces

5. Using the standard inner product on C2, write out the value of d(u, v) for arbitrary
vectors u, v G C2.
6. Find the value d(u, v) for the distance function determined by the inner product
in Problem 5 of Exercises 9.2.
7. Prove that the equality sign holds in Theorem 9.4 if and only if {u, v} is linearly
dependent. (Hint: Consider ((u, u)v — (u, v)u, (u, u)v — (u, v)u) in case u φ 0.)
8. Prove Theorem 9.7.
9. Use the Cauchy-Schwarz inequality to prove the following statements.

(a) For any real numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

(Σ_{k=1}^n a_k b_k)² ≤ (Σ_{k=1}^n a_k²)(Σ_{k=1}^n b_k²).

(b) For any complex numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

|Σ_{k=1}^n ā_k b_k|² ≤ (Σ_{k=1}^n |a_k|²)(Σ_{k=1}^n |b_k|²).

10. Let V be a Euclidean space.

(a) Show that the Cauchy-Schwarz inequality implies that

−1 ≤ (u, v)/(||u|| · ||v||) ≤ 1

for any nonzero u, v in V.
(b) We define the angle θ between two nonzero vectors u and v in V by

cos θ = (u, v)/(||u|| · ||v||)

with 0 ≤ θ ≤ π. Prove the law of cosines in V:

||u − v||² = ||u||² + ||v||² − 2||u|| · ||v|| cos θ.

11. For u = (1, −4) and v = (2, −3), use the inner product from Problem 3 to compute
cos θ, where θ is the angle between u and v.

12. Let U = [ u_1  u_2 ]  and  V = [ v_1  v_2 ].  Given that the rule
            [ u_3  u_4 ]           [ v_3  v_4 ]

(U, V) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4

defines an inner product on R_{2×2}, use this inner product to compute cos θ, where
θ is the angle between A = [ 1  2 ]  and  B = [ 0  3 ].
                           [ 2  0 ]           [ 6  2 ]

13. Assume that the trace function (U, V) = t(U^T V) is an inner product on R_{2×2},
and find the norm of [ 1  2 ].
                     [ 0  2 ]

14. Prove that for any u, v in a Euclidean space V,

||u + v||² + ||u − v||² = 2(||u||² + ||v||²).

15. Prove that for any u, v in a Euclidean space V,

(u, v) = ¼||u + v||² − ¼||u − v||².

16. Prove that for any u, v in a unitary space V,

(u, v) = ¼||u + v||² + (i/4)||u − iv||² − ¼||u − v||² − (i/4)||u + iv||².

9.4 Orthonormal Bases


With the generalizations of the concepts of inner product and length of a vector, there
are corresponding generalizations of many of the results in Chapter 8.

Definition 9.8 A set {v\ \ λ G £ } of vectors in V is orthogonal if (VA 1? VA 2 ) = 0


whenever λι φ À2· The set \y\ \ λ G £} is orthonormal if it is orthogonal and if each
VA has norm 1.

It is easily proved that an orthogonal set of nonzero vectors in V is necessarily


linearly independent (see Problem 9).
The definition of an orthogonal matrix over R remains unchanged from Definition
8.12: P is orthogonal if and only if PT = P " 1 .
The proof of Theorem 8.14 modifies easily to yield the following result and Theorem
9.11 as well.

Theorem 9.9 Let P be anrxr matrix over R, and let V be a real inner product space.
Then P is orthogonal if and only if P is the transition matrix from one orthonormal set
of r vectors in V to another orthonormal set of r vectors in V.
304 Chapter 9 Inner Product Spaces

Proof. See Problem 10. ■


Definition 9.10 A matrix U over C is unitary if and only if U* = U~l.
As transition matrices, unitary matrices play the same role for complex inner product
spaces as orthogonal matrices do for Euclidean spaces.
Theorem 9.11 Let U be anr xr matrix overC, and let V be a complex inner product
space. Then U is unitary if and only if U is the transition matrix from one orthonormal
set of r vectors in V to another orthonormal set of r vectors in V.

Proof. See Problem 12. ■

If Vj · Ufc+i is replaced by (VJ, Ufc+i) in the Gram-Schmidt orthogonalization process


(Theorem 8.15), there results a proof of the following theorem. The verification of this
is requested in Problem 13.
Theorem 9.12 Let V be an inner product space, and let A = {u_1, u_2, ..., u_r} be a basis
of the subspace W of V. There exists an orthonormal basis N = {v_1, v_2, ..., v_r} of W
such that each v_k is a linear combination of u_1, u_2, ..., u_k.
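A minimal Python/NumPy sketch of the Gram-Schmidt process of Theorem 9.12 is given below (our illustration; the function gram_schmidt and its arguments are names we have chosen, not the book's). It works with any inner product of the form (u, v) = u^*Hv for a positive definite hermitian matrix H, the standard inner product being the case H = I.

    import numpy as np

    def gram_schmidt(vectors, H):
        inner = lambda u, v: u.conj() @ H @ v
        basis = []
        for u in vectors:
            w = u.astype(complex)
            for v in basis:                       # subtract the projections onto
                w = w - inner(v, w) * v           # the vectors already constructed
            basis.append(w / np.sqrt(inner(w, w).real))
        return basis

    # Example: the standard inner product on C^2 corresponds to H = I.
    for v in gram_schmidt([np.array([1, 1j]), np.array([1, 0])], np.eye(2)):
        print(np.round(v, 4))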
As an immediate corollary, every finite-dimensional inner product space has an or­
thonormal basis.
Corollary 9.13 Any orthonormal set {v_1, v_2, ..., v_r} of vectors in V can be extended
to an orthonormal basis {v_1, v_2, ..., v_r, v_{r+1}, ..., v_n} of V.

Proof. See Problem 14. ■

A basis A = {v_1, v_2, ..., v_n} of V is orthonormal if and only if (v_i, v_j) = δ_ij. This
condition is equivalent to the requirement that (u, v) = X^*Y, where u and v have
coordinate matrices X and Y, respectively, relative to A (see Problem 8 of Exercises
9.2). Thus, the orthonormal bases of V relative to a certain inner product are precisely
those bases for which the inner product has I_n as its matrix (see Section 9.2).
Example 1 □ With the inner product on C^3 that has the matrix

H = [ 11/9   −2i/3   −1/9 ]
    [ 2i/3     2     −i/3 ]
    [ −1/9    i/3     5/9 ]

relative to the standard basis £3 of C^3, the inner product of u = (u_1, u_2, u_3) and v =
(v_1, v_2, v_3) in C^3 is given by

(u, v) = [ ū_1  ū_2  ū_3 ] H [ v_1 ]
                             [ v_2 ]
                             [ v_3 ].

If we let v_1 = (1/√3)(1, 0, 2), v_2 = (1/√2)(−1, i, 1), v_3 = (1/√6)(1, i, −1), it is readily verified that
A = {v_1, v_2, v_3} is an orthonormal basis of C^3 relative to this inner product. The
matrix of transition from £3 to A is given by

P = [ 1/√3   −1/√2    1/√6 ]
    [ 0       i/√2    i/√6 ]
    [ 2/√3    1/√2   −1/√6 ]

and P^*HP = I_3. We note that the matrix P is not unitary, but it should not be since
£3 is not an orthonormal basis relative to this inner product. ■
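The claim P^*HP = I_3 in Example 1 can be checked numerically with the sketch below (our illustration; the entries of H and P are those written out above, so any transcription error there would show up here).

    import numpy as np

    H = np.array([[11/9, -2j/3, -1/9],
                  [2j/3, 2, -1j/3],
                  [-1/9, 1j/3, 5/9]])
    P = np.column_stack([np.array([1, 0, 2]) / np.sqrt(3),
                         np.array([-1, 1j, 1]) / np.sqrt(2),
                         np.array([1, 1j, -1]) / np.sqrt(6)])
    print(np.round(P.conj().T @ H @ P, 10))   # expected: the identity I_3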

Exercises 9.4

1. Show that, with the standard inner product in C 3 , each of the following sets is
orthogonal.

(a) {(3/2, √2, −1/2), (√2, −2, −√2), (−3, √2, −5)}

(b) {(3i, 2i√2, −i), (−i, 2i√2, 5i), (6i√2, −7i, 4i√2)}

2. As in R n , a set is call normalized if each vector in the set has norm 1. Normalize
the sets in Problem 1.
3. Show that the set {(1,0,0), (1 + 2i, 1 + 2i, 1), (3 + 4i, 3 + 6i, 1 + 4i)} is orthogonal
with respect to the inner product of Problem 5 in Exercises 9.2.
4. With the standard inner product in C3, use the Gram-Schmidt process to find an
orthonormal basis of (A).
(a) A = {(1, - i , 1), (2,0,1 - i)} (b) A = {(0, - i , 1), (1 + i, 2,1)}
5. Let the inner product (A, B) be defined on R_{2×2} as in Problem 7 of Exercises
9.2: (A, B) = t(A^T B). Obtain an orthonormal basis of R_{2×2} by applying the
Gram-Schmidt process to the basis

{ [ 1  0 ],  [ 1  1 ],  [ 1  1 ],  [ 1  1 ] }
  [ 0  0 ]   [ 0  0 ]   [ 1  0 ]   [ 1  1 ]

6. Find an orthonormal basis of R 3 that has (§»—§?§) as its first element.


7. Using the inner product as in Problem 3, find another orthogonal set of three
nonzero vectors in C3.

8. Given that A= { ( è ' ~ 7 i ' è ) ' ( " H T i ' - ! ) ' ί 1 ^ ' 0 ' ^Σ^) } is an orthonormal
basis of C3 relative to the standard inner product, extend each of the orthonormal
sets B below to an orthonormal basis of C3.
306 Chapter 9 Inner Product Spaces

9. Prove that every orthogonal set {v\ \ X G £} of nonzero vectors in V is linearly


independent.

10. Prove Theorem 9.9.

11. Prove that the set of all unitary matrices of order n is closed under multiplication.

12. Prove Theorem 9.11.

13. Write out a detailed proof of Theorem 9.12.

14. Prove Corollary 9.13.

15. Let V be a Euclidean space, and let u, v G V.

(a) Prove the Pythagorean theorem in V : ||u|| 2 + ||v|| 2 =,||u + v|| 2 if and only if
u is orthogonal to v.
(b) Prove that u + v and u — v are orthogonal if and only if ||u|| = ||v||.

9.5 Orthogonal Complements


The inner product in V forms the basis for a special type of decomposition of V as a
direct sum with a given subspace as a summand. This type of decomposition involves
the orthogonal complement of a subspace, a term that we shall define shortly.

Definition 9.14 Let Abe a nonempty subset ofV. The set AL (read A perp) is defined
by
A1- = {v G V | (u, v) = 0 for all u G A }.

That is, A1- consists precisely of all those vectors in V that are orthogonal to every
vector in A.

Example 1 D Let V = R 3 with the standard inner product, and let us consider several
possibilities for A.
liA= {(0,0,0)}, then A1- = R 3 .
If A consists of the single nonzero vector (αχ, α2,03), then A1- is the set of all vectors
(xi,X2,X3) in the plane a\X\ -t-a2#2 + ^3X3 = 0.
If A consists of two linearly independent vectors, A — {(«i, «2, «3), (61,62, ^3)}5 then
A1- is the line of intersection of the planes αιΧι+02X2+^3X3 = 0 and 61X1+62X2+63X3 =
0.
If A is a basis of R 3 , then A1- is the zero subspace. ■

In each of the cases considered in the preceding example, A1- was a subspace of R 3 .
It is not difficult to show that this is always the case.

Theorem 9.15 For any nonempty subset A of V, A1- is a subspace of V.


9.5 Orthogonal Complements 307

Proof. See Problem 1. ■

Theorem 9.16 For any nonempty subset A of V, A⊥ = ⟨A⟩⊥.

Proof. If v is a vector such that (u, v) = 0 for all u ∈ ⟨A⟩, then surely (u, v) = 0 for
all u ∈ A since A ⊆ ⟨A⟩. Thus ⟨A⟩⊥ ⊆ A⊥.
Now let v ∈ A⊥. Any vector u ∈ ⟨A⟩ can be written as u = Σ_{k=1}^r a_k u_k with
u_k ∈ A, and
(u, v) = (Σ_{k=1}^r a_k u_k, v) = Σ_{k=1}^r ā_k (u_k, v).
But (u_k, v) = 0 for each k since v ∈ A⊥. Hence (u, v) = 0 and v ∈ ⟨A⟩⊥ since u was
arbitrary in ⟨A⟩. This gives A⊥ ⊆ ⟨A⟩⊥, and the proof is complete. ■

Theorem 9.16 shows that there is no loss of generality if we restrict our attention
to those W x where W is a subspace of V. In this case, W x is called the orthogonal
complement of W. The justification for this terminology is contained in the next
theorem.

Theorem 9.17 If W is any subspace of V, then the sum W + W⊥ is direct and
W ⊕ W⊥ = V.

Proof. Now v ∈ W ∩ W⊥ must satisfy (v, v) = 0, so W ∩ W⊥ = {0}, and the sum
W + W⊥ is direct.
If W = {0} or W = V, the theorem is trivial. Suppose then that {v_1, v_2, ..., v_r} is
an orthonormal basis of W and r < n. By Corollary 9.13, the set {v_1, v_2, ..., v_r} can be
extended to an orthonormal basis

A = {v_1, v_2, ..., v_r, v_{r+1}, ..., v_n}

of V. We shall show that B = {v_{r+1}, ..., v_n} is a basis of W⊥. As part of a basis, B is
linearly independent, so it is necessary only to show that B spans W⊥. To do this, let
v = Σ_{k=1}^n a_k v_k be any vector in W⊥. Since A is orthonormal, (v_i, v_k) = δ_ik. Thus, for
i = 1, 2, ..., r, we have

0 = (v_i, v) = Σ_{k=1}^n a_k δ_ik = a_i.

This means that v = Σ_{k=r+1}^n a_k v_k, and therefore B spans W⊥. ■

Corollary 9.18 If W has dimension r, then W- 1 has dimension n — r.


308 Chapter 9 Inner Product Spaces

Proof. In the proof of the theorem, {vi, V2,..., v r } is a basis of W and {v r + 1,..., v n }
is a basis of W - 1 . ■

Whenever v ∈ V is written as v = v_1 + v_2 with v_1 ∈ W and v_2 ∈ W⊥, v_1 is called
the orthogonal projection of v onto W.
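Using an orthonormal basis of W, the orthogonal projection is easy to compute explicitly. The Python/NumPy sketch below (ours; the function name project and the sample vectors are our choices) computes v_1 and v_2 for the standard inner product on R^3.

    import numpy as np

    def project(v, W_basis):
        # W_basis: a list of orthonormal vectors spanning W
        return sum(np.vdot(w, v) * w for w in W_basis)

    v = np.array([3.0, 1.0, 2.0])
    w1 = np.array([1.0, 0.0, 0.0])
    w2 = np.array([0.0, 1.0, 0.0])
    v1 = project(v, [w1, w2])      # component in W
    v2 = v - v1                    # component in W-perp
    print(v1, v2)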
Since the orthogonal complement W⊥ of a subspace W is again a subspace, it in turn
has an orthogonal complement (W⊥)⊥ = W⊥⊥. This procedure of forming orthogonal
complements can be repeated endlessly, but the next theorem shows that such repetition
leads only back and forth between the same two subspaces.

Theorem 9.19 For any subspace W of V, W⊥⊥ = W.

Proof. Since W ⊕ W⊥ = V, any v ∈ V can be written uniquely as v = v_1 + v_2
with v_1 ∈ W and v_2 ∈ W⊥. To prove that W⊥⊥ = W, it is sufficient to show that
v ∈ W⊥⊥ if and only if v_2 = 0.
Now v ∈ W⊥⊥ if and only if

(v_1, u) + (v_2, u) = (v_1 + v_2, u) = (v, u) = 0

for all u ∈ W⊥. Since v_1 ∈ W, (v_1, u) = 0 for all u ∈ W⊥. This means that v ∈ W⊥⊥
if and only if (v_2, u) = 0 for all u ∈ W⊥. But v_2 ∈ W⊥, so the last condition holds if
and only if (v_2, v_2) = 0. Hence v ∈ W⊥⊥ if and only if v_2 = 0. ■

Exercises 9.5

1. Prove Theorem 9.15.

2. Prove that (W_1 + W_2)⊥ = W_1⊥ ∩ W_2⊥ for subspaces W_1 and W_2 of V.

3. Prove that (W_1 ∩ W_2)⊥ = W_1⊥ + W_2⊥ for subspaces W_1 and W_2 of V.

4. Let V = R 4 , and let W = ((1, - 2 , 2 , - 7 ) , (1, - 2 , 3 , - 9 ) ) .

(a) Find an orthonormal basis of W1-.


(b) Show how to write an arbitrary vector v G V as v = vi + V2, with vi G W
and v 2 G W±.

5. Find an orthonormal basis of W 1 - in C3.


(a) W = ( ( l , i , l - i ) , ( i , - l , 0 ) ) (b) W = ( ( l , i , f - l ) , ( l , l + i , 2 i - l ) >

6. Let V be Euclidean, and let W be a subspace of V. Prove that whenever v ∈ V
is written as v_1 + v_2 with v_1 ∈ W and v_2 ∈ W⊥, then ||v||² = ||v_1||² + ||v_2||².

7. Let / be a fixed linear functional on V v/ith matrix A = [αχ, α2,..., an] relative to
the orthonormal basis Λ of V, and let φ(/) be the vector in V that has coordinate
matrix A* relative to A. Prove that / ( v ) = (<£(/), v) for all v G V.
9.6 Isomet ries 309

8. Prove that the mapping φ that maps each / G V* onto </>(/) in V is a bijective
mapping from V* to V, but that φ is not an isomorphism except when V is
Euclidean.
9. Prove that, for any subspace W of V, / is in the annihilator W ° if and only if
Φ(ί) e w±.
10. With φ as in Problem 8, define (f,g) on V* by (f,g) = (</>{f),<l>(g)). Prove that
(f,g) defines an inner product on V*.

9.6 Isometries


For the remainder of the chapter, (u, v) will denote an arbitrarily chosen but fixed inner
product on V. With this fixed inner product, we direct our attention to those linear
operators on V that preserve distances, or leave the lengths of vectors unchanged. These
linear operators are called isometries. More precisely, we have the following definition.
Definition 9.20 A linear operator T on V is an isometry if ||T(v)|| = ||v|| for all
v G V. An isometry on a Euclidean space is called an orthogonal operator, and an
isometry on a unitary space is called a unitary operator.
Since ||v|| = y/(v, v), one would expect that a linear operator would preserve the
norm if and only if it preserved the inner product. Our next theorem shows that this is
actually the case.
Theorem 9.21 A linear operator T on V is an isometry if and only if
(u,v) = (T(u),T(v))
for all u, v in V.

Proof. If (u, v) = (T(u), T(v)) for all u, v in V, then

||T(v)|| = √(T(v), T(v)) = √(v, v) = ||v||,

and T is an isometry.
Suppose conversely that T is an isometry. We first observe that
||u + v||² = ||u||² + (u, v) + (v, u) + ||v||².   (9.1)
If V is a Euclidean space, then (v, u) = (u, v), and
(u, v) = ½{||u + v||² − ||u||² − ||v||²}.

Hence
(T(u), T(v)) = ½{||T(u) + T(v)||² − ||T(u)||² − ||T(v)||²}
             = ½{||u + v||² − ||u||² − ||v||²}
             = (u, v),
and the proof is complete for this case.
If V is a unitary space, we have

(u, v) + (v, u) = ||u + v||² − ||u||² − ||v||²   (9.2)

from equation (9.1), and another relation is required. We find that

||u + iv||² = ||u||² + i(u, v) − i(v, u) + ||v||²,

and therefore
(u, v) − (v, u) = −i{||u + iv||² − ||u||² − ||v||²}.   (9.3)

Equations (9.2) and (9.3) combine to yield

(u, v) = ½{||u + v||² − i||u + iv||² + (i − 1)(||u||² + ||v||²)}.   (9.4)

It is left as an exercise (Problem 5) to show that this equation implies (T(u), T(v)) =
(u, v). ■

Another characterization of an isometry can be given in terms of the images of


orthonormal bases.

Theorem 9.22 A linear operator T on V is an isometry if and only if T maps an


orthonormal basis of Y onto an orthonormal basis ofY.

Proof. Let A = {v_1, v_2, ..., v_n} be an orthonormal basis of V, and consider the set
T(A) = {T(v_1), T(v_2), ..., T(v_n)}.
If T is an isometry, then by Theorem 9.21 we have (T(v_i), T(v_j)) = (v_i, v_j) = δ_ij,
and therefore T(A) is an orthonormal basis of V.
Assume now that T(A) is an orthonormal basis of V. For any u = Σ_{i=1}^n a_i v_i and
v = Σ_{j=1}^n b_j v_j in V,

(T(u), T(v)) = (T(Σ_{i=1}^n a_i v_i), T(Σ_{j=1}^n b_j v_j))
             = Σ_{i=1}^n Σ_{j=1}^n ā_i b_j (T(v_i), T(v_j))
             = Σ_{i=1}^n Σ_{j=1}^n ā_i b_j δ_ij
             = (Σ_{i=1}^n a_i v_i, Σ_{j=1}^n b_j v_j)
             = (u, v).

Therefore T is an isometry by Theorem 9.21. ■


9.6 Isometries 311

Yet another characterization of an isometry can be made in terms of its matrix


relative to an orthonormal basis. This time it is necessary to separate the real and
complex cases.

Theorem 9.23 Let T be a linear operator on the unitary space V. Then T is unitary
if and only if every matrix that represents T relative to an orthonormal basis of Y is a
unitary matrix.

Proof. Let A — [ α ^ ] η χ η be the matrix of T relative to an orthonormal basis A =


{vi> v2> ····> v n } of V. Then T(VJ) = Y^l=\0,ij^i, so that A is the matrix of transition
from A to T{A). By Theorem 9.22, T is unitary if and only if T(A) is orthonormal. But
T(A) is orthonormal if and only if A is unitary by Theorem 9.11. ■

A similar application of Theorems 9.22 and 9.9 yields the corresponding result for
orthogonal operators.

Theorem 9.24 Let T be a linear operator on the Euclidean space V. Then T is or­
thogonal if and only if every matrix that represents T relative to an orthonormal basis
ofV is an orthogonal matrix.

Example 1 □ Let T be the linear operator on C^3 given by

T(v_1, v_2, v_3) = ((1/2)v_1 − (i/2)v_2 + ((1 + i)/2)v_3, −(i/√2)v_1 + (1/√2)v_2, (1/2)v_1 − (i/2)v_2 − ((1 + i)/2)v_3).

The standard basis £3 is an orthonormal basis relative to the standard inner product
on C^3. The matrix of T with respect to £3 is

A = [ 1/2      −i/2     (1 + i)/2  ]
    [ −i/√2     1/√2     0         ]
    [ 1/2      −i/2    −(1 + i)/2  ].

According to Theorem 9.23, T is a unitary operator if and only if A is a unitary matrix.
Checking to see that A^* = A^{-1}, we compute A^*A and find that

A^*A = [ 1  0  0 ]
       [ 0  1  0 ]
       [ 0  0  1 ].

Therefore A is a unitary matrix, and T is a unitary operator. Illustrating Theorem 9.22,
we see that T maps the orthonormal basis £3 onto the orthonormal basis

T(£3) = {(1/2, −i/√2, 1/2), (−i/2, 1/√2, −i/2), ((1 + i)/2, 0, −(1 + i)/2)}. ■
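The check A^*A = I_3 in Example 1 can be reproduced with the short Python/NumPy sketch below (our illustration; the entries of A are those displayed above).

    import numpy as np

    A = np.array([[1/2, -1j/2, (1 + 1j)/2],
                  [-1j/np.sqrt(2), 1/np.sqrt(2), 0],
                  [1/2, -1j/2, (-1 - 1j)/2]])
    print(np.round(A.conj().T @ A, 10))   # expected: the identity I_3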
312 Chapter 9 Inner Product Spaces

Exercises 9.6

1. Determine whether or not each of the matrices below represents an orthogonal


operator relative to an orthonormal basis of R 3 .

1 1 0 2 0
(a) 1 -1 0 (b) 0 1
1
0 0 1 2 0

2 -2 1 1 1 1

(c)è 2 1 -2 (d) 1 1 1
1 2 2 0 -1 2

2. Determine whether or not each of the given matrices represents a unitary operator
relative to an orthonormal basis of C3.
3 - 4z 3+ 2 i 2 1— 2 0
/
(*)* (z-2)\/2 2(z + l)v 2~ (2i-l)y/2 (b) 1+ 2 3 0
-1 1 + 32 4-3z 0 0 1

1 2 0 y/2 χ/3 1

(c) 2—2 2 (<*H


^6 v^ -v/3 1
-1 1 1 y/2 0-2

3. Find a symmetric orthogonal matrix with [^, — | , | ] in the first column.

4. Given that \ ( "To > "To ) ' ( "75 ' — 775 ) ί i s a n orthonormal basis of R 2 , find an isom­
etry that maps ί - ^ , 4 j j onto (0, — 1).

5. Prove that equation (9.4) in the proof of Theorem 9.21 implies that (T(u), T(v)) =
(u,v).

6. Prove Theorem 9.24.

7. Prove that if A is either orthogonal or unitary, then |det(A)| = 1.

8. Prove that the product of two isometries is an isometry.

9. Prove that T is an isometry if and only if the image of a unit vector under T is
always a unit vector.

9.7 Normal Matrices


In Chapter 7, we formulated necessary and sufficient conditions in order for a linear
operator on V to be represented by a diagonal matrix. At that time, our investigation
was concerned with vector spaces in general, whereas now we are concerned only with
inner product spaces. This prompts us to pose a restricted form of the original question
by asking which linear operators can be represented by a diagonal matrix relative to an
orthonormal basis of V.
A given linear operator T on V will have a unique matrix A relative to a particular orthonormal basis of V. If P is the transition matrix from this basis to a new basis of V, the new matrix of T is P⁻¹AP. When V is unitary, the new basis is orthonormal if and only if P is a unitary matrix. Thus, for a unitary space V, T can be represented by a diagonal matrix relative to an orthonormal basis if and only if there exists a unitary matrix P such that P⁻¹AP is diagonal. Similarly, for a Euclidean space V, T can be represented by a diagonal matrix relative to an orthonormal basis if and only if there exists an orthogonal matrix P such that P⁻¹AP is diagonal.

Definition 9.25 If A and B are square matrices over C, then B is unitarily similar to A if and only if there exists a unitary matrix U such that B = U⁻¹AU. If B and A are square matrices over R, then B is orthogonally similar to A if and only if B = P⁻¹AP for some orthogonal matrix P.

Unitary similarity is an equivalence relation on the square matrices over C, and


orthogonal similarity is an equivalence relation on the square matrices over R (Problems
4 and 5).

Definition 9.26 A matrix B = [b_ij]_{m×n} is upper triangular if b_ij = 0 for all i > j.

That is, an upper triangular matrix is one that has only zero elements below the
main diagonal. Similarly, a lower triangular matrix is one that has only zero elements
above the main diagonal.
The next proof should be compared with that of Theorem 8.19, for the technique of
proof is much the same.

Theorem 9.27 Every n × n matrix A over C is unitarily similar to an upper triangular matrix B, and the diagonal elements of B are the eigenvalues of A.

Proof. The theorem is trivial for n = 1. We proceed by induction on n.
Assume that the theorem is true for all k × k matrices over C, and let A be a matrix of order k + 1 over C. Let V be a unitary space of dimension k + 1, and let A be an orthonormal basis of V. The matrix A represents a unique linear operator T on V relative to the basis A. Let λ be an eigenvalue of T with v a corresponding eigenvector of norm 1. By Corollary 9.13, there is an orthonormal basis

    B = {v_1, v_2, ..., v_{k+1}}

of V with v_1 = v. The matrix of transition U_1 from A to B is unitary, and the matrix of T relative to B is of the form

    A_1 = U_1⁻¹AU_1 = [ λ   R_1 ]
                      [ 0   A_2 ],

where R_1 is 1 by k and A_2 is of order k. By the induction hypothesis, there is a unitary matrix Q such that Q⁻¹A_2Q is upper triangular. The matrix

    U_2 = [ 1   0 ]
          [ 0   Q ]

is unitary, and

    U_2⁻¹A_1U_2 = [ 1   0   ] [ λ   R_1 ] [ 1   0 ]   [ λ   R_1Q    ]
                  [ 0   Q⁻¹ ] [ 0   A_2 ] [ 0   Q ] = [ 0   Q⁻¹A_2Q ].

Thus U_2⁻¹A_1U_2 is upper triangular since Q⁻¹A_2Q is upper triangular. The matrix U = U_1U_2 is unitary since U_1 and U_2 are unitary, and

    B = U⁻¹AU

is upper triangular since

    B = U_2⁻¹U_1⁻¹AU_1U_2 = U_2⁻¹A_1U_2.

It is clear that the diagonal elements of B are the eigenvalues of B = [b_ij] since

    det(B − xI) = (b_11 − x)(b_22 − x)···(b_nn − x).

But B and A have the same eigenvalues since they are similar. ■
Corollary 9.28 Every linear operator on a unitary space V can be represented by an
upper triangular matrix relative to an orthonormal basis of V.
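The induction in the proof of Theorem 9.27 translates directly into a computation. The following sketch is not from the text; assuming NumPy, it finds a unit eigenvector, extends it to an orthonormal basis with a QR factorization, and recurses on the trailing block, producing a unitary U with U⁻¹AU upper triangular.

```python
import numpy as np

def unitary_triangularize(A):
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.astype(complex)
    w, V = np.linalg.eig(A)
    v = V[:, [0]] / np.linalg.norm(V[:, 0])      # eigenvector of norm 1
    # Extend v to an orthonormal basis of C^n; Q plays the role of U1.
    Q, _ = np.linalg.qr(np.hstack([v, np.eye(n, dtype=complex)[:, 1:]]))
    A1 = Q.conj().T @ A @ Q                      # block form [[lambda, R1], [0, A2]]
    Qsub, _ = unitary_triangularize(A1[1:, 1:])
    U2 = np.eye(n, dtype=complex)
    U2[1:, 1:] = Qsub
    return Q @ U2, U2.conj().T @ A1 @ U2         # U = U1 U2,  B = U^{-1} A U

A = np.array([[2, 1j], [3, 1 - 1j]], dtype=complex)
U, B = unitary_triangularize(A)
print(np.allclose(U.conj().T @ U, np.eye(2)))        # U is unitary
print(np.allclose(np.tril(B, -1), 0, atol=1e-10))    # B is upper triangular
```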
On the surface, it would appear that there should be a result for real matrices
that corresponds to Theorem 9.27. That is, it would seem likely that every n x n real
matrix would be orthogonally similar to an upper triangular matrix over R. But a
closer examination shows that this is not the case at all. For the diagonal elements of a
triangular matrix are the eigenvalues of that matrix, and the eigenvalues of a real matrix
are not necessarily real (see Problem 1 for an example). However, for real matrices that
have only real eigenvalues, the proof of Theorem 9.27 can be modified so as to prove
the following theorem.
Theorem 9.29 Let A be a real matrix of order n. Then A is orthogonally similar to
an upper triangular matrix if and only if all the eigenvalues of A are real.

Proof. See Problem 6. ■

Corollary 9.30 A linear operator T on a Euclidean space V can be represented by


an upper triangular matrix relative to an orthonormal basis of V if and only if all the
eigenvalues ofT are real.

Now we are ready to establish the criterion for a matrix to be unitarily similar to a
diagonal matrix.

Theorem 9.31 A square matrix A over C is unitarily similar to a diagonal matrix if


and only if AA* = A*A.

Proof. Assume that there exists a unitary matrix U such that

    U⁻¹AU = D = diag{d_1, d_2, ..., d_n}.

We first note that

    DD* = D*D = diag{|d_1|², |d_2|², ..., |d_n|²}.

Since A = UDU⁻¹ and U⁻¹ = U*, we have

    AA* = (UDU*)(UD*U*)
        = UDD*U*
        = UD*DU*
        = UD*U*UDU*
        = A*A.

Assume now that AA* = A*A. By Theorem 9.27, there is a unitary matrix U such that B = U⁻¹AU = U*AU is upper triangular. Now

    BB* = U*AUU*A*U = U*AA*U,

and a similar calculation shows that

    B*B = U*A*AU.

Therefore BB* = B*B, since AA* = A*A. Since b_rs = 0 in B = [b_ij] whenever r > s, the element in the i-th row and j-th column of BB* is Σ_{k=1}^n b_ik \overline{b_jk} = Σ_{k=i}^n b_ik \overline{b_jk}. Similarly, the element in the i-th row and j-th column of B*B is Σ_{k=1}^n \overline{b_ki} b_kj = Σ_{k=1}^i \overline{b_ki} b_kj. Equating the diagonal elements of BB* and B*B, we have

    Σ_{k=1}^i |b_ki|² = Σ_{k=i}^n |b_ik|².                                            (9.5)

For i = 1, this yields

    |b_11|² = |b_11|² + |b_12|² + ··· + |b_1n|².

Therefore b_1j = 0 for all j > 1, and all elements in the first row of B except b_11 are zero. In particular, b_12 = 0. With i = 2 in equation (9.5), we now have

    |b_22|² = |b_12|² + |b_22|² = |b_22|² + |b_23|² + ··· + |b_2n|².

Therefore b_2j = 0 for all j > 2, and all elements except b_22 in the second row of B are zero. This procedure can be repeated with equation (9.5) to obtain

    |b_ii|² = |b_ii|² + |b_{i,i+1}|² + ··· + |b_in|²

for i = 1, 2, ..., n. Therefore b_ij = 0 for all j > i, and B is a diagonal matrix. This completes the proof. ■

Definition 9.32 A square matrix A over a field F ⊆ C is called normal if and only if AA* = A*A.

Corollary 9.33 A linear operator T on a unitary space V can be represented by a diagonal matrix relative to an orthonormal basis of V if and only if any matrix representing T relative to an orthonormal basis of V is normal.

The last theorem says that those matrices over C that are unitarily similar to a
diagonal matrix are precisely the normal matrices. These include the symmetric real
matrices, the hermitian matrices, the orthogonal matrices, and the unitary matrices.
However, there are normal matrices that do not fall into any of these categories. Such
an example is found in Problem 2.
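One such matrix is also easy to exhibit and to test. The sketch below is not from the text; assuming NumPy and SciPy, it takes a real circulant matrix that is neither symmetric, hermitian, orthogonal, nor unitary, verifies that it is normal, and checks that its complex Schur form (the upper triangular matrix promised by Theorem 9.27) is in fact diagonal, as Theorem 9.31 predicts.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=complex)            # circulant: normal, but not
                                                    # symmetric, hermitian, or unitary
print(np.allclose(A @ A.conj().T, A.conj().T @ A))  # AA* = A*A

T, U = schur(A, output='complex')                   # U unitary, T upper triangular
print(np.allclose(U.conj().T @ U, np.eye(3)))       # U is unitary
print(np.allclose(T, np.diag(np.diag(T))))          # T is diagonal, so A is
                                                    # unitarily similar to a diagonal matrix
```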
Our next theorem gives a simple characterization of those real matrices of order n
that are orthogonally similar to a diagonal matrix.

T h e o r e m 9.34 A real matrix of order n is orthogonally similar to a diagonal matrix


if and only if it is symmetric.

Proof. If A is real and symmetric, then A is orthogonally similar to a diagonal matrix by Theorem 8.19.
Assume that the real matrix A is orthogonally similar to a diagonal matrix D, and let P be an orthogonal matrix such that P⁻¹AP = PᵀAP = D. Then A = PDPᵀ, so that

    Aᵀ = PDᵀPᵀ = PDPᵀ = A,

and A is symmetric. ■

Corollary 9.35 A linear operator T on a Euclidean space V can be represented by a


diagonal matrix relative to an orthonormal basis of V if and only if every matrix that
represents T relative to an orthonormal basis of V is symmetric.

The results established in this section give a practical way to determine whether
or not a matrix is unitarily similar or orthogonally similar to a diagonal matrix, but
a systematic procedure for finding a matrix that will accomplish the diagonalization is
yet lacking. Such a procedure will be developed at the end of the next section.

Exercises 9.7

1. Determine which of the following real matrices are orthogonally similar to an upper triangular matrix.

   (a) [ 1  2 ]     (b) [  1  1 ]     (c) [ 1   1 ]     (d) [  7  −6 ]
       [ 3  4 ]         [ −1  1 ]         [ 1  −1 ]         [ −6  −2 ]

2. Determine which of the following matrices are orthogonal, which are unitary, and
which are normal.
-1 3 3
x/2
.& o
3 1 3
2 U
i i—1
(a) 2 2
(b) 0 0 (c)
V2 i+ 1 0
3 3 1
v/2 2 2 J 2 2

1 I 1+i 5 i 2
2 i
(d) (e)| -i 1 (f) -i 2 -1
i 2
-1+i 1+i 0 2 - 1 2

3. For what values of x, y, z is the matrix

       A = [ a  x  y ]
           [ 0  b  z ]
           [ 0  0  c ]

   a normal matrix?
4. Prove that unitary similarity is an equivalence relation on the square matrices
over C.

5. Prove that orthogonal similarity is an equivalence relation on the square matrices


over R.

6. Prove Theorem 9.29.

7. Prove that if a matrix U is unitary, then all eigenvalues of U have absolute value
1.

8. Prove that a square matrix A over C is normal if and only if every matrix that is
unitarily similar to A is normal.

9. Let B = [b_ij] be an upper triangular matrix that is square. Prove that B² is an upper triangular matrix.

10. Let C be any square matrix over C.

    (a) Prove that C can be written as C = A + Bi with A and B hermitian.

    (b) Prove that C is normal if and only if AB = BA.

9.8 Normal Linear Operators


In the preceding section, we found that a linear operator T on V can be represented
by a diagonal matrix relative to an orthonormal basis if and only if every matrix that
represents T relative to an orthonormal basis is normal. Our main purpose in this
section is to interpret this requirement as a condition on the linear operator T. This
turns naturally to an investigation of the linear operator that has matrix A* whenever
T has matrix A relative to an orthonormal basis A.
Let A = {v_1, v_2, ..., v_n} be an orthonormal basis of V. For an arbitrary vector v = Σ_{i=1}^n x_i v_i in V, we have (v_j, v) = Σ_{i=1}^n x_i (v_j, v_i) = x_j. That is, v = Σ_{i=1}^n (v_i, v) v_i. Thus a vector v is determined uniquely whenever the inner product of v with each base vector v_j is specified.
Suppose now that T is a certain fixed linear operator on V. For a fixed v in V, the set of equations

    (v_j, u) = (T(v_j), v),   j = 1, 2, ..., n,                                       (9.6)

determines a unique vector u in V since it specifies the inner product of u with each v_j in A. Thus the rule T*(v) = u defines a mapping T* of V into V. For an arbitrary w = Σ_{j=1}^n b_j v_j in V,

    (u, w) = Σ_{j=1}^n b_j (u, v_j)

and

    (v, T(w)) = Σ_{j=1}^n b_j (v, T(v_j)).

Thus the set of equations (9.6) is equivalent to the requirement that (u, w) = (v, T(w)) for all w ∈ V. In other words, for each v ∈ V, the value T*(v) is determined by the equation

    (T*(v), w) = (v, T(w))   for all w ∈ V.                                           (9.7)

This leads to the following definition.
This leads to the following definition.
Definition 9.36 For each linear operator T on V, the adjoint ofT is the mapping T*
of V into V that is defined by the equation
(T*(v),w) = (v,T(w))
for all v, w € V.
Theorem 9.37 For any linear operator T on V, the adjoint of T is a linear operator
on V.

Proof. See Problem 7. ■

Our next theorem provides a basis for the desired interpretation of the results in
Section 9.7.
Theorem 9.38 If the linear operator T on V has matrix A = [a_ij] relative to the orthonormal basis A of V, then T* has matrix A* relative to A.

Proof. Let A = {v_1, v_2, ..., v_n}. Then T(v_i) = Σ_{k=1}^n a_ki v_k since A is the matrix of T relative to A. Suppose B = [b_ij]_n is the matrix of T* relative to A, so that T*(v_j) = Σ_{k=1}^n b_kj v_k. From the definition of T*, we have (T(v_i), v_j) = (v_i, T*(v_j)). But

    (T(v_i), v_j) = (Σ_{k=1}^n a_ki v_k, v_j)
                  = Σ_{k=1}^n \overline{a_ki} (v_k, v_j)
                  = \overline{a_ji}

and

    (v_i, T*(v_j)) = (v_i, Σ_{k=1}^n b_kj v_k)
                   = Σ_{k=1}^n b_kj (v_i, v_k)
                   = b_ij.

Thus b_ij = \overline{a_ji} and B = A*. ■
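A quick numerical check of Theorem 9.38, not part of the text: assuming NumPy, and with the standard inner product on C² (conjugate-linear in the first argument), the operator with matrix A and the operator with matrix A* satisfy the defining equation (9.7) of the adjoint.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
w = rng.standard_normal(2) + 1j * rng.standard_normal(2)

Astar = A.conj().T
# Equation (9.7):  (T*(v), w) = (v, T(w)),  with (x, y) = sum conj(x_k) y_k.
print(np.allclose(np.vdot(Astar @ v, w), np.vdot(v, A @ w)))   # True
```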

It is left as an exercise to prove that (T*)* = T for any linear operator T on V. A


linear operator T is called self-adjoint if T* = T. It follows from the preceding theorem
that if V is unitary, then T is self-adjoint if and only if every matrix that represents
T relative to an orthonormal basis of V is hermitian. If V is Euclidean, then T is
self-adjoint if and only if every matrix that represents T relative to an orthonormal
basis is symmetric. A self-adjoint linear operator of a unitary space is called a hermi­
tian operator, and a self-adjoint linear operator on a Euclidean space is a symmetric
operator.
Definition 9.39 A linear operator T on V is normal if and only if TT* = T*T.

If T has matrix A relative to an orthonormal basis A of V, then TT* has matrix AA* and T*T has matrix A*A relative to A. Thus T is a normal linear transformation of V if and only if every matrix of T relative to an orthonormal basis is a normal matrix.
Theorem 9.40 A linear operator T on the unitary space V can be represented by a
diagonal matrix relative to an orthonormal basis of V if and only if T is normal.

Proof. This is a restatement of Corollary 9.33. ■

Theorem 9.41 A linear operator T on the unitary space V is normal if and only if
there exists an orthonormal basis of V that consists entirely of eigenvectors of T.

Proof. By the preceding theorem, T is normal if and only if T can be represented


by a diagonal matrix relative to an orthonormal basis. But, according to the proof of
Theorem 7.16, A — {vi, v 2 , .·., v n } is a basis of eigenvectors of T if and only if the
matrix of T relative to A is diagonal. ■

As was promised in the last section, we proceed now to develop a systematic method
for finding a unitary matrix that will accomplish a desired diagonalization.
Many of the results in Section 7.4 are helpful here, even though we are presently
restricted to orthonormal bases. If T is represented by a diagonal matrix, the elements
on the diagonal are the eigenvalues of T (Corollary 7.17). If T is normal, the geometric
multiplicity of each eigenvalue is equal to the algebraic multiplicity (Theorem 7.20).
For orthogonal similarity with a real matrix A, Theorem 9.34 shows that the treat­
ment in Section 8.5 is complete, and the methods developed there apply unchanged.
For unitary similarity, some changes are necessary. We must first obtain the result for
normal matrices that corresponds to Theorem 8.21 for real symmetric matrices. The
first step in this direction is to relate the eigenvalues of T to those of T*.

Theorem 9.42 A number λ is an eigenvalue of the normal linear operator T with v as an associated eigenvector if and only if \overline{λ} is an eigenvalue of T* with the same v as an associated eigenvector.

Proof. See Problem 12. ■

Theorem 9.43 Let A be a normal matrix of order n. If λ_r and λ_s are distinct eigenvalues of A with associated eigenvectors U_r and U_s, then U_r*U_s = 0.

Proof. Let V be an n-dimensional inner product space, and let A be an orthonormal basis of V. Let T be the linear operator on V that has matrix A relative to A. An eigenvector of A corresponding to a certain eigenvalue is the coordinate matrix of an eigenvector of T corresponding to the same eigenvalue. Hence U_r and U_s are the coordinate matrices of eigenvectors v_r and v_s that correspond to the eigenvalues λ_r and λ_s, respectively. By Problem 8 of Exercises 9.2, U_r*U_s = 0 if and only if (v_r, v_s) = 0.
According to Theorem 9.42, \overline{λ_r} is an eigenvalue of T* and T*(v_r) = \overline{λ_r} v_r. Hence

    (T*(v_r), v_s) = (\overline{λ_r} v_r, v_s) = λ_r (v_r, v_s).

We also have

    (v_r, T(v_s)) = (v_r, λ_s v_s) = λ_s (v_r, v_s).

But (T*(v_r), v_s) = (v_r, T(v_s)), so this means that λ_r (v_r, v_s) = λ_s (v_r, v_s) and

    (λ_r − λ_s)(v_r, v_s) = 0.

Since λ_r − λ_s ≠ 0, it must be that (v_r, v_s) = 0, and the proof is complete. ■

We can now formulate our method for finding a unitary matrix U that will diagonalize a given normal matrix A = [a_ij]_n. Let U_j denote the j-th column of U = [u_ij]_n, so that U = [U_1, U_2, ..., U_n]. The requirement that U⁻¹AU = diag{λ_1, λ_2, ..., λ_n} is equivalent to the system of equations

    AU_j = λ_j U_j,   j = 1, 2, ..., n.

That is, each U_j must be an eigenvector of A corresponding to λ_j. Since U_r*U_s is the element in row r and column s of U*U, the requirement that U* = U⁻¹ is satisfied if and only if U_r*U_s = δ_rs. With the same notation as in the proof of Theorem 9.43, U_r*U_s = (v_r, v_s). Thus the columns U_r of U must be the coordinates of an orthonormal basis of eigenvectors of T. Theorem 9.43 assures us that eigenvectors from different eigenspaces are automatically orthogonal. Thus the only modification of the procedure in Section 7.4 that is necessary to make U unitary is to choose orthonormal bases of the eigenspaces V_{λ_j}. The Gram-Schmidt process can be used to obtain orthonormal bases of those V_{λ_j} that have dimension greater than 1.

Example 1 □ Consider the problem of finding a unitary matrix U such that U⁻¹AU is diagonal, where A is the normal matrix

    A = [ (2+i)/2    0    (−2+i)/2 ]
        [  0         i     0       ]
        [ (−2+i)/2   0    (2+i)/2  ].

The characteristic polynomial is found to be

    det(A − xI) = −(x − i)²(x − 2).

The augmented matrix [A − 2I, 0] reduces by row operations to

    [ 1  0  1  0 ]
    [ 0  1  0  0 ],
    [ 0  0  0  0 ]

so the corresponding eigenvectors in V_2 ⊆ C³ are of the form x_3(−1, 0, 1). This means that V_2 has an orthonormal basis consisting of a real unit vector. But in order to exhibit a solution with a unitary U that is not orthogonal, we choose x_3 = −√3 + i and obtain the orthonormal basis

    { ((√3 − i)/(2√2), 0, (−√3 + i)/(2√2)) }

for V_2. The matrix A − iI leads to solutions of the form x_2(0, 1, 0) + x_3(1, 0, 1). With x_2 = 1, x_3 = i, we obtain (i, 1, i), and with x_2 = i, x_3 = 1, we obtain (1, i, 1). Application of the Gram-Schmidt process to the basis {(i, 1, i), (1, i, 1)} leads to the orthonormal basis

    { (i/√3, 1/√3, i/√3), (1/√6, 2i/√6, 1/√6) }

of V_i. Thus

    U = [ i/√3    1/√6     (√3 − i)/(2√2)  ]
        [ 1/√3    2i/√6     0              ]
        [ i/√3    1/√6     (−√3 + i)/(2√2) ]

is a unitary matrix such that U⁻¹AU = diag{i, i, 2}. ■
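The procedure of this section can be carried out mechanically. The sketch below is not from the text; assuming NumPy, it repeats Example 1 by grouping the computed eigenvectors of A by eigenvalue, orthonormalizing each eigenspace with a QR factorization in place of the Gram-Schmidt hand computation, and verifying that the resulting U is unitary with U⁻¹AU = diag{i, i, 2}.

```python
import numpy as np

A = np.array([[(2 + 1j) / 2, 0, (-2 + 1j) / 2],
              [0, 1j, 0],
              [(-2 + 1j) / 2, 0, (2 + 1j) / 2]])

eigvals, eigvecs = np.linalg.eig(A)
columns, diag = [], []
for lam in [1j, 2]:                       # the distinct eigenvalues of A
    idx = np.isclose(eigvals, lam)
    Q, _ = np.linalg.qr(eigvecs[:, idx])  # orthonormal basis of the eigenspace
    columns.append(Q)
    diag.extend([lam] * Q.shape[1])
U = np.hstack(columns)

print(np.allclose(U.conj().T @ U, np.eye(3)))             # U is unitary
print(np.allclose(U.conj().T @ A @ U, np.diag(diag)))     # diag{i, i, 2}
```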
Exercises 9.8
1. For each of the following linear operators on C2, write out the value of Τ*(αι, α 2 ).
(a) Τ(αι,α 2 ) = («i + (1 - ϊ)α2, (1 + 2>ι + 2α2)
(b) Τ(αι,α 2 ) = (αχ + 2*α2,αι - α2)
(c) Γ(αι,α 2 ) = (mi - 2 α 2 , α ι )
(d) Γ(αι,α 2 ) = ( ^ ι + (2 - 1)α2, (1 + 2>ι)
2. For each linear operator T in Problem 1, find an orthonormal basis of eigenvectors
whenever such a basis exists.
3. Whenever possible, find a unitary matrix U such that U~1AU is diagonal. The
matrices in parts (a)-(e) are from Problem 2 in Exercises 9.7.

-1 3 3 1 _^1 0
y/2 \/2 2 2
(a) A 3 1 3 (b)A = 0 0 1
V2 2 2
^3 1
3
2 2 0
V2
2 i—1
(c)A (d)^ =
2+ I 0

1 2 1+2 3 1 0
(e)A -2 1 -1 + i (f)A- 1-1 5
2 1 - 2

-I + 2 1 +z 0 0 1+2 3
4. Consider the following matrices.
1 1 1 1 2
A =
2
\/3 _ I
2
, B = V2 V2 , c=
2 2
i i 2 -2
L V2 v/2 J
2+ 2 -2 + i 1 2
D , E =
-2 + 2 2+ i 0 3

(a) Determine which of these matrices are orthogonal, which are unitary, and
which are normal.
(b) Which of these matrices are unitarily similar to an upper triangular matrix?
(c) Which of these matrices are unitarily similar to a diagonal matrix?
(d) Which of these matrices are similar over C to a diagonal matrix?

5. Let T be a linear operator on V over F.

   (a) Prove that (T*)* = T.

   (b) Prove that (aT)* = \overline{a}T* for all a ∈ F.

6. Let S and T be two linear operators on V.

(a) Prove that (5 + T)* = 5* + T*.


(b) Prove that (ST)* = T*S*.

7. Prove Theorem 9.37.

8. Show that T + T* is self-adjoint.


9. Let T be a linear operator on the unitary space V. Prove that T is hermitian if
and only if (T(v), v) is real for all v G V.

10. Show that T is an isometry if and only if T* = T⁻¹.

11. Prove that a linear operator T on the unitary space V is normal if and only if ||T(v)|| = ||T*(v)|| for all v ∈ V.
12. Prove Theorem 9.42.

13. Prove that a normal linear operator T is self-adjoint if and only if all the eigen­
values of T are real.

14. Prove that if T is normal, then T and T* have the same kernel.

15. Prove that if T is an invertible linear operator on V, then T* is invertible and (T*)⁻¹ = (T⁻¹)*.

16. A linear operator T on an inner product space V is said to be skew-adjoint if T* = −T. (For V unitary, T is called skew-hermitian, and for V Euclidean, T is called skew-symmetric.)

(a) Prove that, for any linear operator T on V, T — T* is skew-adjoint.


(b) Show that any linear operator T on V can be written uniquely as the sum
of a self-adjoint linear operator and a skew-adjoint linear operator.
Chapter 10

Spectral Decompositions

10.1 Introduction
In this chapter we consider once again the question of diagonalization of a linear operator
on a finite-dimensional vector space V. We have seen in Chapter 7 that a linear operator
T on V is diagonalizable (i.e., can be represented by a diagonal matrix) if and only if
there exists a basis of V that consists of eigenvectors of T. For a unitary vector space
V, those linear operators that can be represented by a diagonal matrix relative to an
orthonormal basis of V are the same as the normal linear operators on V. One of our
main objectives now is to describe the diagonalizable linear operators in terms that are
free of any reference to an inner product. The characterization that we obtain is in
terms of a spectral decomposition. A certain acquaintance with projections is essential
to a formulation of the concept of a spectral decomposition.
For the entire chapter, V shall denote an n-dimensional vector space over a field F,
and T shall denote a linear operator on V.

10.2 Projections and Direct Sums


In Section 9.5, we have seen that a finite-dimensional inner product space V can be decomposed as a direct sum V = W ⊕ W⊥, where W is an arbitrary subspace of V and W⊥ is the orthogonal complement of W. Each vector v ∈ V can be written uniquely as v = v_1 + v_2 with v_1 ∈ W and v_2 ∈ W⊥, and the linear operator P defined by P(v) = v_1 is called the projection of V onto W along W⊥ (see Problem 1).
The situation in the preceding paragraph generalizes to any direct sum decomposition V = W_1 ⊕ W_2 of an arbitrary (not necessarily an inner product space) V. Each v in V can be written uniquely as v = v_1 + v_2 with v_i ∈ W_i, and the mapping P(v) = v_1 is again a linear operator on V. Corresponding to the situation above, P is called the projection of V onto W_1 along W_2. (In order to make a distinction for the case where W_2 = W_1⊥, P is frequently called the orthogonal projection of V onto W_1 when W_2 = W_1⊥.) For any v = v_1 + v_2 in V,

    P²(v_1 + v_2) = P(v_1) = v_1 = P(v_1 + v_2),

so P has the property that P² = P.

Definition 10.1 A linear operator T on V is called idempotent if T² = T.

Our discussion above shows that the projection P of V onto W_1 along W_2 is idempotent, and that W_1 = P(V), W_2 = P⁻¹(0). The converse is also true: an idempotent linear operator T is always a projection of V onto T(V) along T⁻¹(0).

Theorem 10.2 If T is an idempotent linear operator on V, then T is the projection of V onto T(V) along T⁻¹(0).

Proof. Assume that T² = T. For any u ∈ T(V), u = T(v) for some v ∈ V. Hence

    T(u) = T²(v) = T(v) = u,

and T acts as the identity transformation on T(V). Thus, for any v in T(V) ∩ T⁻¹(0), we have v = T(v) = 0, and the sum T(V) + T⁻¹(0) is direct. Let v be an arbitrary vector, and let v_1 = T(v), v_2 = (1 − T)(v), where 1 denotes the identity transformation. Now v_1 is clearly in T(V), and v_2 is in T⁻¹(0) since¹ T(v_2) = (T − T²)(v) = Z(v) = 0. Since

    v = T(v) + (1 − T)(v) = v_1 + v_2,

we have V = T(V) ⊕ T⁻¹(0), and T is the projection of V onto T(V) along T⁻¹(0). ■

In view of Theorem 10.2, an idempotent linear operator is referred to simply as a projection: T is a projection if and only if T² = T.
The relation between projections and direct sums that was described in the first two
paragraphs of this section can be extended to direct sums that involve more than two
terms.
Let V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_r, and let v = v_1 + v_2 + ··· + v_r denote the decomposition of each v ∈ V with v_i ∈ W_i. For i = 1, 2, ..., r, let P_i be defined by

    P_i(v_1 + v_2 + ··· + v_r) = v_i.

Then the set {P_1, P_2, ..., P_r} has the following properties:

(a) each P_i is a projection;

(b) P_iP_j = Z whenever i ≠ j;

(c) P_1 + P_2 + ··· + P_r = 1.

¹The symbol Z denotes the zero linear transformation.

Each P_i is a projection since P_i²(v) = P_i(v_i) = v_i = P_i(v). If i ≠ j, P_iP_j(v) = P_i(v_j) = 0 = Z(v) for all v ∈ V, so (b) holds. For any v = v_1 + v_2 + ··· + v_r in V, we have

    v = P_1(v) + P_2(v) + ··· + P_r(v),

and hence P_1 + P_2 + ··· + P_r = 1.
The three properties of the set {P_1, P_2, ..., P_r} in the preceding paragraph motivate the following definition.

Definition 10.3 A set of projections {P_1, P_2, ..., P_r} is an orthogonal set if P_iP_j = Z whenever i ≠ j. A complete set of projections for V is an orthogonal set {P_1, P_2, ..., P_r} of nonzero projections with the property that P_1 + P_2 + ··· + P_r = 1.

Two projections P_1, P_2 are called orthogonal if {P_1, P_2} forms an orthogonal set, that is, if P_1P_2 = Z = P_2P_1.
We have seen above that every direct sum decomposition V = W_1 ⊕ ··· ⊕ W_r determines a complete set of projections {P_1, ..., P_r} for V such that P_j(v) = v_j whenever v = v_1 + ··· + v_r with v_i ∈ W_i. The converse is also true, in that if {P_1, ..., P_r} is a complete set of projections for V, then V = P_1(V) ⊕ ··· ⊕ P_r(V) and P_j(v) = v_j whenever v = v_1 + ··· + v_r with v_i ∈ P_i(V). Now v = P_1(v) + ··· + P_r(v) for each v ∈ V since P_1 + ··· + P_r = 1, and therefore V = P_1(V) + ··· + P_r(V). To see that the sum is direct, let

    w = P_i(u_1) = Σ_{j≠i} P_j(u_2)

be in

    P_i(V) ∩ Σ_{j≠i} P_j(V).

Then

    w = P_i(u_1) = P_i²(u_1) = P_i(Σ_{j≠i} P_j(u_2)) = Σ_{j≠i} P_iP_j(u_2) = Σ_{j≠i} Z(u_2) = 0.

It is clear that P_j(v) = v_j whenever v = v_1 + ··· + v_r with v_i ∈ P_i(V).

Definition 10.4 A complete set of projections {P_1, P_2, ..., P_r} for V and a direct sum decomposition V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_r are said to correspond to each other, or to be associated with each other, if P_j(v) = v_j whenever v ∈ V is written in the unique form v = v_1 + v_2 + ··· + v_r with v_j ∈ W_j.

Thus, V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_r and {P_1, P_2, ..., P_r} correspond if and only if P_j(V) = W_j and P_j acts as the identity on W_j.
Let us consider now some simple examples of complete sets of projections. The
reader should note in each case the corresponding direct sum decomposition.

Example 1 □ Let V = Rⁿ, and define P_i by

    P_i(a_1, a_2, ..., a_n) = a_ie_i = (0, ..., 0, a_i, 0, ..., 0)

for i = 1, 2, ..., n. Then {P_1, P_2, ..., P_n} is a complete set of projections for Rⁿ. This situation generalizes readily to an arbitrary n-dimensional vector space V. For any given basis A = {v_1, v_2, ..., v_n}, let P_i(Σ_{j=1}^n a_j v_j) = a_i v_i. Then {P_1, P_2, ..., P_n} is a complete set of projections for V. ■
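A small numerical illustration of Example 1 (not from the text, assuming NumPy): each coordinate projection P_i is idempotent, products of distinct P_i vanish, and the P_i sum to the identity, which are exactly properties (a), (b), and (c) above.

```python
import numpy as np

n = 4
P = [np.diag((np.arange(n) == i).astype(float)) for i in range(n)]
print(all(np.allclose(Pi @ Pi, Pi) for Pi in P))                   # (a) idempotent
print(all(np.allclose(P[i] @ P[j], 0) for i in range(n)
          for j in range(n) if i != j))                            # (b) orthogonal set
print(np.allclose(sum(P), np.eye(n)))                              # (c) sum to 1
```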

Variations in the number of projections in a complete set can easily be made. For example, for each v = (a_1, a_2, a_3) in R³, let the mappings T_1 and T_2 be defined by T_1(v) = (a_1, 0, 0) and T_2(v) = (0, a_2, a_3). Then {T_1, T_2} is a complete set of projections for R³.

Theorem 10.5 Each eigenvalue of a projection is either 0 or 1.

Proof. Suppose that λ is an eigenvalue of the projection P, and let v be an associated eigenvector. Then (P − λ)(v) = 0, and

    (1 − λ)λv = (1 − λ)P(v)
              = (P − λP)(v)
              = (P² − λP)(v)
              = P((P − λ)(v))
              = P(0)
              = 0.

Since v ≠ 0, it must be that (1 − λ)λ = 0, and λ is either 0 or 1. ■

The fact that a projection P has the property P² = P and also acts as the identity transformation on its range might lead one to expect the matrix of a projection to look somehow like the identity matrix. This is not necessarily the case, however. For example, the matrix

    A = [  3  −3  −2   3 ]
        [ −4   6   4  −5 ]
        [  3  −3  −2   3 ]
        [ −4   6   4  −5 ]

is such that A² = A, and hence represents a projection. But with an appropriate choice of basis, the matrix of a projection P can be made to take on a form very much like I_n. Now P(V) is the same as the eigenspace of P corresponding to the eigenvalue 1, and V = P(V) ⊕ P⁻¹(0). Thus, if a basis {v_1, ..., v_r} of P(V) is extended to a basis A = {v_1, ..., v_r, v_{r+1}, ..., v_n} of V with {v_{r+1}, ..., v_n} a basis of P⁻¹(0), then the matrix of P relative to A is

    D_r = [ I_r  0 ]
          [ 0    0 ].

An interchange of the v_i in A produces an interchange of the elements on the diagonal of D_r, so the 1's on the diagonal can be placed in any desired diagonal positions.
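These claims about the matrix A above are easy to verify numerically. The following sketch is not part of the text; assuming NumPy, it checks that A² = A and exhibits a change of basis that turns A into D_r (here r = 2), using the fact that I − A is the complementary projection whose range is P⁻¹(0) (compare Problem 3 below).

```python
import numpy as np

A = np.array([[ 3, -3, -2,  3],
              [-4,  6,  4, -5],
              [ 3, -3, -2,  3],
              [-4,  6,  4, -5]], dtype=float)
print(np.allclose(A @ A, A))                  # A^2 = A, so A represents a projection

# Columns of A span P(V); columns of I - A span P^{-1}(0).  The first two
# columns of each happen to be linearly independent, so together they form a basis.
basis = np.column_stack([A[:, 0], A[:, 1],
                         (np.eye(4) - A)[:, 0], (np.eye(4) - A)[:, 1]])
D = np.linalg.inv(basis) @ A @ basis
print(np.round(D))                            # diag{1, 1, 0, 0} = D_2
```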

Exercises 10.2

1. If V = W i Θ W 2 , prove that the projection of V onto W i along W 2 is a linear


operator on V.

2. Let T be a linear operator on V. Prove that T is a projection if and only if 1 — T


is a projection.

3. Prove that if P is the projection of V onto W i along W2, then 1 — P is the


projection of V onto W 2 along W i .

4. Let P_1, P_2, P_3 be nonzero projections of V such that P_1 + P_2 + P_3 = 1, and assume that 1 + 1 ≠ 0 in F. Prove that {P_1, P_2, P_3} is a complete set of projections for V.

5. Give an example of two projections P_1, P_2 such that P_1P_2 = Z but P_2P_1 ≠ Z.

6. Prove that if P is a projection on the inner product space V, then P* is a projection


on V.

7. Prove that any complete set of projections for V is linearly independent.

8. Let V = W_1 + W_2 + ··· + W_r, where each W_i is a subspace of V. Prove that V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_r if and only if v_1 + v_2 + ··· + v_r = 0 and v_i ∈ W_i imply that each v_i = 0.

9. Let P_1 be the projection of V onto S_1 along S_2, and let P_2 be the projection of V onto W_1 along W_2, and assume that 1 + 1 ≠ 0 in F.

   (a) Prove that P_1 + P_2 is a projection if and only if {P_1, P_2} is orthogonal.

   (b) Prove that if P_1 and P_2 are orthogonal projections, then P_1 + P_2 is the projection of V onto S_1 ⊕ W_1 along S_2 ∩ W_2.

10. Let V be an inner product space. Prove that if P is the projection of V onto W_1 along W_2, then P* is the projection of V onto W_2⊥ along W_1⊥.

11. According to the definition in the second paragraph of this section, a projection P on an inner product space V is called an orthogonal projection if and only if P(V) and P⁻¹(0) are orthogonal subspaces. Prove that a projection P is an orthogonal projection if and only if P is self-adjoint.

12. Let P be a projection on the inner product space V. Prove that if ||P(v)|| < ||v||
for all v G V, then P is an orthogonal projection.

13. Prove that a projection on an inner product space V is self-adjoint if and only if
it is normal.

10.3 Spectral Decompositions


With the contents of the preceding section at hand, we are prepared to formulate a new
characterization of the diagonalizable linear operators on V. It is clear that in order for
T to be diagonalizable, all of its eigenvalues must be in T. That is, the characteristic
polynomial of T must factor into linear factors over T. The problem, then is to determine
what additional conditions are necessary in order that T be diagonalizable.
The characterization of diagonalizable operators that we shall obtain is phrased in
terms of projections. The projections are a very special type of operator, so it is natural
to ask how the set of projections fits into the vector space that consists of all linear
operators on V. The sum of two projections is usually not a projection (see Problem 1),
so the projections do not form a subspace. However, our main interest is not with the
set of all projections, but rather with those subspaces that are spanned by a complete
set of projections. That is, we are interested primarily in those linear operators that
are linear combinations of a complete set of projections. Our first result concerns the
eigenvectors of this type of operator.

Theorem 10.6 Let {P_1, P_2, ..., P_r} be a complete set of projections for V, and suppose that T = c_1P_1 + c_2P_2 + ··· + c_rP_r for some scalars c_i. Then each c_i is an eigenvalue of T, and each eigenvector v_i associated with the eigenvalue 1 of P_i is an eigenvector of T associated with c_i.

Proof. Let v_i be an eigenvector associated with the eigenvalue 1 of P_i, so that P_i(v_i) = v_i. For any j ≠ i, we have P_j(v_i) = P_j(P_i(v_i)) = 0. Therefore

    T(v_i) = c_1P_1(v_i) + c_2P_2(v_i) + ··· + c_rP_r(v_i)
           = c_iP_i(v_i)
           = c_iv_i,

and c_i is an eigenvalue of T with v_i as an associated eigenvector. ■

Theorem 10.7 The eigenvectors of a complete set of projections span V.



Proof. Let {P_1, P_2, ..., P_r} be a complete set of projections for V. For an arbitrary v ∈ V, consider the vector P_i(v). Since P_i(P_i(v)) = P_i(v), P_i(v) is an eigenvector of P_i unless P_i(v) = 0. Since 1 = P_1 + P_2 + ··· + P_r,

    v = P_1(v) + P_2(v) + ··· + P_r(v).

But P_1(v) + P_2(v) + ··· + P_r(v) is a linear combination of eigenvectors of P_1, P_2, ..., P_r, and the proof is complete. ■

We are now in a position to prove the main theorem of this section.

Theorem 10.8 Let T be a linear operator on V with distinct eigenvalues λ_1, λ_2, ..., λ_r. Then T is diagonalizable if and only if

    T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r,

where {P_1, P_2, ..., P_r} is a complete set of projections for V.

Proof. Suppose first that T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r, where {P_1, P_2, ..., P_r} is a complete set of projections for V. Let A = {v_1, v_2, ..., v_n} be a basis of V, and consider the set of nr vectors

    B = {P_1(v_1), ..., P_1(v_n), P_2(v_1), ..., P_2(v_n), ..., P_r(v_1), ..., P_r(v_n)}.

For any v = Σ_{i=1}^n a_i v_i, we have

    v = (P_1 + P_2 + ··· + P_r)(v)
      = (P_1 + P_2 + ··· + P_r)(Σ_{i=1}^n a_i v_i)
      = Σ_{i=1}^n Σ_{j=1}^r a_i P_j(v_i).

Thus B spans V, and therefore contains a basis C = {u_1, u_2, ..., u_n} of V. Each u_k is a nonzero vector of the form u_k = P_j(v_i), and hence is an eigenvector of P_j associated with the eigenvalue 1. By Theorem 10.6, u_k is an eigenvector of T associated with the eigenvalue λ_j. Thus C is a basis of eigenvectors of T, and T is diagonalizable by Theorem 7.16.
Suppose now that T is diagonalizable, and let B = {w_1, w_2, ..., w_n} be a basis of V such that T is represented by a diagonal matrix D relative to B. As in the proof of Theorem 7.16, the w_i are eigenvectors of T, and the diagonal elements of D are the eigenvalues λ_j of T. By Theorem 7.20, the geometric multiplicity of λ_j is the same as the algebraic multiplicity m_j. That is, V_{λ_j} is of dimension m_j. By Corollary 7.10, the sum Σ_{j=1}^r V_{λ_j} is direct. Since Σ_{j=1}^r m_j = n, it must be that V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r}.

With W_j = V_{λ_j} in Definition 10.4, the set of projections P_j defined there is a complete set of projections for V. For each v ∈ V, we have

    v = v_1 + v_2 + ··· + v_r

with v_j = P_j(v). And since v_j ∈ V_{λ_j}, we have T(v_j) = λ_jv_j. Thus

    T(v) = T(v_1) + T(v_2) + ··· + T(v_r)
         = λ_1v_1 + λ_2v_2 + ··· + λ_rv_r
         = λ_1P_1(v) + λ_2P_2(v) + ··· + λ_rP_r(v)
         = (λ_1P_1 + λ_2P_2 + ··· + λ_rP_r)(v),

and T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r. ■
Definition 10.9 If the linear operator T on V can be written in the form

    T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r,

where {P_1, P_2, ..., P_r} is a complete set of projections for V and λ_1, λ_2, ..., λ_r are the distinct eigenvalues of T, then the expression λ_1P_1 + λ_2P_2 + ··· + λ_rP_r is called a spectral decomposition of T.

Theorem 10.8 asserts that T is diagonalizable if and only if T has a spectral decomposition.
Theorem 10.10 If T has a spectral decomposition T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r, then f(T) = f(λ_1)P_1 + f(λ_2)P_2 + ··· + f(λ_r)P_r for every polynomial f(x).

Proof. We first show that Tⁱ = Σ_{j=1}^r λ_jⁱ P_j. This is trivial for i = 0 and i = 1. Assuming that Tᵏ = Σ_{j=1}^r λ_jᵏ P_j, we have

    Tᵏ⁺¹ = TᵏT = (Σ_{j=1}^r λ_jᵏ P_j)(Σ_{m=1}^r λ_m P_m)
         = Σ_{j=1}^r Σ_{m=1}^r λ_jᵏ λ_m P_jP_m
         = Σ_{j=1}^r λ_jᵏ⁺¹ P_j.

By induction, Tⁱ = Σ_{j=1}^r λ_jⁱ P_j for each nonnegative integer i. For any polynomial f(x) = Σ_{i=0}^s c_i xⁱ, we thus have

    f(T) = Σ_{i=0}^s c_i Tⁱ = Σ_{i=0}^s c_i (Σ_{j=1}^r λ_jⁱ P_j)
         = Σ_{j=1}^r (Σ_{i=0}^s c_i λ_jⁱ) P_j
         = Σ_{j=1}^r f(λ_j) P_j. ■

Theorem 10.10 will prove to be useful in the next section.
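For matrices, Theorems 10.8 and 10.10 can be tested directly. The sketch below is not part of the text; assuming NumPy, it builds the spectral projections of a diagonalizable symmetric matrix from its eigenvector matrix and checks both A = Σ λ_jP_j and f(A) = Σ f(λ_j)P_j for f(x) = x³ + 2x.

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [1., 3., 1.],
              [1., 1., 3.]])                 # diagonalizable; eigenvalues 2, 2, 5

eigvals, V = np.linalg.eig(A)
Vinv = np.linalg.inv(V)

projections = {}
for lam in np.unique(np.round(eigvals, 8)):
    idx = np.isclose(eigvals, lam)
    # Projection onto the eigenspace of lam along the sum of the other eigenspaces.
    projections[lam] = V[:, idx] @ Vinv[idx, :]

print(np.allclose(sum(lam * P for lam, P in projections.items()), A))    # T = sum lam_j P_j
f = lambda lam: lam**3 + 2 * lam
print(np.allclose(A @ A @ A + 2 * A,
                  sum(f(lam) * P for lam, P in projections.items())))    # Theorem 10.10
```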

Theorem 10.11 If T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r is a spectral decomposition of T, then V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r}, and P_j is the projection of V onto V_{λ_j} along Σ_{i≠j} V_{λ_i}.

Proof. Let T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r be a spectral decomposition of T. Then T is diagonalizable by Theorem 10.8, and V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r} as in the proof of that theorem. Since 1 = P_1 + P_2 + ··· + P_r, we have v = P_1(v) + P_2(v) + ··· + P_r(v) for any v in V. But P_j(v) is in V_{λ_j} since

    T(P_j(v)) = (Σ_{i=1}^r λ_iP_i)(P_j(v)) = λ_jP_j²(v) = λ_jP_j(v).

Thus we have P_j(V) ⊆ V_{λ_j}, and hence

    dim(P_j(V)) ≤ dim(V_{λ_j}).

This implies the inequalities

    n = Σ_{j=1}^r dim(P_j(V)) ≤ Σ_{j=1}^r dim(V_{λ_j}) = n,

and therefore

    dim(P_j(V)) = dim(V_{λ_j})

for j = 1, 2, ..., r. It follows that P_j(V) = V_{λ_j} for j = 1, 2, ..., r. Hence P_j is the projection of V onto V_{λ_j} along the sum of the remaining eigenspaces. ■

As might be expected, a restriction to orthonormal bases of an inner product space


V corresponds to a restriction on the projections in the spectral decomposition. The
exact description is as follows.

Theorem 10.12 A linear operator T on an n-dimensional inner product space V can


be represented by a diagonal matrix relative to an orthonormal basis of V if and only if
T has a spectral decomposition in which the projections are self-adjoint.

Proof. With the proofs of Theorems 7.16 and 10.8 in mind, it is sufficient to prove that an orthonormal basis of eigenvectors of T exists if and only if T has a spectral decomposition in which each projection is self-adjoint.
Suppose first that T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r, with the P_i self-adjoint. Let B_j = {u_{j1}, u_{j2}, ..., u_{jn_j}} be an orthonormal basis of the eigenspace V_{λ_j} for j = 1, 2, ..., r. As in the proof of Theorem 7.20, the set

    B = {u_{11}, ..., u_{1n_1}, u_{21}, ..., u_{2n_2}, ..., u_{r1}, ..., u_{rn_r}}

is a basis of V. This basis is orthonormal if and only if (u_{it}, u_{js}) = 0 whenever i ≠ j.
According to Theorem 10.11, P_j is the projection of V onto V_{λ_j} along Σ_{i≠j} V_{λ_i}. Hence P_j(v) = v for each v in V_{λ_j}. In other words, each nonzero vector of V_{λ_j} is an eigenvector of P_j associated with 1. Consequently, we have

    (u_{it}, u_{js}) = (P_i(u_{it}), P_j(u_{js}))
                     = (u_{it}, P_i*P_j(u_{js}))
                     = (u_{it}, P_iP_j(u_{js}))
                     = (u_{it}, 0)
                     = 0

whenever i ≠ j.
Conversely, suppose that there exists an orthonormal basis of eigenvectors of T. With the same notation as used in the proof of Theorem 10.8, V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r}. The set of projections {P_1, P_2, ..., P_r} is a complete set for V, and T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r. The proof will be complete if we show that each P_i is self-adjoint. Let u = P_1(u) + P_2(u) + ··· + P_r(u) and v = P_1(v) + P_2(v) + ··· + P_r(v) be any two vectors in V. For i ≠ j, P_i(u) and P_j(v) are in distinct eigenspaces V_{λ_i} and V_{λ_j}, and so are orthogonal. This implies that

    (P_i(u), v) = (P_i(u), Σ_{j=1}^r P_j(v))
                = (P_i(u), P_i(v))
                = (Σ_{j=1}^r P_j(u), P_i(v))
                = (u, P_i(v)),

and therefore P_i* = P_i. ■

We shall devise a method for obtaining the projections involved in a spectral decom­
position near the end of the next section.

Exercises 10.3

1. Verify that the mappings of R² into R² defined by P_1(x_1, x_2) = (x_1 + x_2, 0) and P_2(x_1, x_2) = (x_2, x_2) are projections, and show that P_1 + P_2 is not a projection.

2. Let P\ and P 2 be the projections defined on C2 by P\(x\,X2) = (#2»#i) and


P 2 (xi,x 2 ) = (^2,^2).

(a) Let T = 3P\ + 4P 2 and determine if T is diagonalizable.


(b) Determine whether or not P\ + P 2 is a projection.

3. Prove that if T is invertible and has a spectral decomposition, then T⁻¹ has a spectral decomposition with the same complete set of projections.
4. Let {P_1, P_2, ..., P_r} be a complete set of projections for V, and suppose that T = c_1P_1 + c_2P_2 + ··· + c_rP_r.

   (a) Prove that each eigenvalue of T is equal to at least one of the scalars c_j.

   (b) Give an example which shows that there may be a vector v such that v is an eigenvector of T associated with c_j, but v is not an eigenvector of any P_i.

5. Prove that each projection P_i in a spectral decomposition

       T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r

   of T has rank equal to the geometric multiplicity of the eigenvalue λ_i of T.
6. Prove that the spectral decomposition of T is unique if it exists.
7. Let T be a normal linear transformation of the inner product space V. Prove that
T* has a spectral decomposition.
8. Let T = Σ_{i=1}^r λ_iP_i be a spectral decomposition for the linear operator T on the inner product space V. Suppose that {f_1(x), f_2(x), ..., f_r(x)} is a set of polynomials with real coefficients such that f_i(λ_j) = δ_ij. Prove that f_i(T) = P_i for i = 1, 2, ..., r.
9. A linear operator T on an inner product space V is called nonnegative if T is self-adjoint and (T(v), v) ≥ 0 for every v ∈ V.

   (a) Prove that any nonnegative linear operator T on V has a spectral decomposition T = Σ_{i=1}^r λ_iP_i with each λ_i ≥ 0.

   (b) Show that S = √λ_1 P_1 + √λ_2 P_2 + ··· + √λ_r P_r is a nonnegative linear operator such that S² = T (i.e., S is a nonnegative square root of T).

   (c) Show that the nonnegative square root S of T in (b) is unique.

10.4 Minimal Polynomials and Spectral Decompositions

In this section, we shall need to use certain properties of addition and multiplication of polynomials in x over a field F (i.e., polynomials in x with coefficients in F). The set of all polynomials in x over F is denoted by F[x]. A derivation of the information that we need about F[x] would constitute a major digression at this point. For this reason, we shall simply state without proofs the results that are needed.
If p(x) and m(x) are in F[x] with p(x) nonzero, then there exist polynomials q(x), r(x) in F[x] such that

    m(x) = p(x)q(x) + r(x)

with r(x) either the zero polynomial or a polynomial of degree less than that of p(x). This statement is known as the division algorithm for elements of F[x]. If r(x) is the zero polynomial, we say that p(x) divides m(x), and that p(x) is a divisor of m(x).
A nonzero polynomial p(x) in F[x] is called monic if the coefficient of the highest degree term in p(x) is 1. That is, the highest power of x that appears in p(x) has 1 as its coefficient.
A monic polynomial d(x) in F[x] is called the greatest common divisor of a set of nonzero polynomials q_1(x), q_2(x), ..., q_r(x) in F[x] if

i. d(x) is a divisor of each of the polynomials q_i(x), and

ii. every polynomial p(x) that divides each q_i(x) is also a divisor of d(x).

Every set of nonzero polynomials q_1(x), q_2(x), ..., q_r(x) in F[x] has a unique monic greatest common divisor d(x) in F[x]. Moreover, there exists a set of polynomials g_1(x), g_2(x), ..., g_r(x) in F[x] such that

    d(x) = g_1(x)q_1(x) + g_2(x)q_2(x) + ··· + g_r(x)q_r(x).

In some of the examples and exercises, we assume that the student is familiar with
the partial fraction decomposition of a quotient of two nonzero polynomials.
We have already encountered polynomials p(T) = Σ_{i=0}^s c_iTⁱ in a linear operator T on V and the corresponding polynomials p(A) = Σ_{i=0}^s c_iAⁱ in a matrix A that represents T. It has been noted that T and A satisfy the same polynomial equations.
From one point of view, a polynomial p(A) = Σ_{i=0}^s c_iAⁱ in the square matrix A can be thought of as being obtained from a polynomial p(x) = Σ_{i=0}^s c_ixⁱ by replacing the powers xⁱ of the indeterminate x by the corresponding powers Aⁱ of the matrix A. This suggests the construction of other types of polynomials involving matrices by making other replacements in p(x). One might treat x as an indeterminate scalar and replace the scalar coefficients c_i by matrix coefficients C_i, or one might replace both the c_i and the xⁱ by matrix quantities. The algebra connected with this last type of polynomial is quite involved, and we shall not be concerned with it here. Our interest is confined to only two types of polynomials:

1. Those obtained by leaving the scalar coefficients c_i unchanged and replacing x by a square matrix A.

2. Those obtained by treating x as an indeterminate scalar and replacing the scalar coefficients c_i by matrix coefficients C_i.

There is another point of view from which the polynomials of the second type may be regarded. Any polynomial Σ_{i=0}^s C_ixⁱ of this type can be considered to be a matrix with elements in F[x]. For example,

    [ 2  −1 ] x³ + [ −1  0 ] x + [  4  5 ]  =  [ 2x³ − x + 4     −x³ + 5  ]
    [ 0   4 ]      [  8  3 ]     [ −7  0 ]     [ 8x − 7          4x³ + 3x ].

In either form, this type of matrix is called a matrix polynomial. In our development, we have considered only matrices with elements in a field, and F[x] is not a field. But F[x] is contained in the field F(x) of all rational functions in x over F, and we can consider matrix polynomials in this context. For future use, we note that two matrix polynomials are equal if and only if they have equal coefficients of each power of x.
Since powers of A commute with scalars and with each other, multiplication is commutative for polynomials of the first type: p(A)q(A) = q(A)p(A) for any p(x), q(x) in F[x]. In this case, factorizations in F[x] remain valid: if p(x) = g(x)h(x) in F[x], then p(A) = g(A)h(A). A certain amount of care must be exercised when working with polynomials of the second type, since multiplication is not commutative then. Factorizations in F[x] no longer remain valid here. For example, if AB ≠ BA, then

    (Ax − B)(Ax + B) = A²x² + (AB − BA)x − B²
                     ≠ A²x² − B².

We recall from Chapter 4 that the set F^{n×n} of all n × n matrices over F is a vector space over F. If E_ij denotes the n × n matrix that has the element in row i, column j equal to 1 and all other elements 0, then an arbitrary A = [a_ij] in F^{n×n} can be written uniquely as

    A = Σ_{i=1}^n Σ_{j=1}^n a_ij E_ij,

and the set {E_ij} is a basis of F^{n×n}. Hence F^{n×n} has dimension n², and the set

    {A⁰, A, A², ..., A^{n²}}

is linearly dependent for any n × n matrix A over F. That is, there is a polynomial p(x) = Σ_{i=0}^{n²} c_ixⁱ such that p(A) = 0. This means that there exists a monic polynomial m(x) of smallest degree such that m(A) = 0.
Theorem 10.13 Let A be an n × n matrix, and let m(x) be a monic polynomial over F of smallest degree such that m(A) = 0. For any p(x) in F[x], p(A) = 0 if and only if m(x) is a factor of p(x).

Proof. If p(x) = m(x)h(x), then p(A) = m(A)h(A) = 0. Suppose conversely that p(A) = 0. Let q(x) and r(x) be the quotient and remainder upon division of p(x) by m(x):

    p(x) = m(x)q(x) + r(x),

where either r(x) is zero, or the degree of r(x) is less than that of m(x). Then

    r(A) = m(A)q(A) + r(A) = p(A) = 0.

This means that r(x) must be the zero polynomial, since otherwise r(A) = 0 would contradict the choice of m(x) as having the smallest possible degree such that m(A) = 0. Hence m(x) is a factor of p(x). ■

This theorem makes it easy to prove that the polynomial m(x) in the hypothesis is
unique (see Problem 6).

Definition 10.14 Let A be an n × n matrix over F. The unique monic polynomial m(x) of smallest degree such that m(A) = 0 is called the minimal polynomial of A. The minimal polynomial of a linear operator T on V is by definition the same as the minimal polynomial of any matrix that represents T.

Since a linear operator T and any matrix that represents it satisfy the same poly­
nomial equations, the minimal polynomial of T is well-defined.
The discussion preceding Theorem 10.13 is an efficient argument for the existence of
the minimal polynomial m(x) of A, but it suggests no practical method for finding m(x).
The next theorem is a great help in this direction. It is known as the Hamilton-Cayley theorem.

Theorem 10.15 Let A = [a_ij] be an arbitrary n × n matrix over F, and let f(x) be the characteristic polynomial of A. Then f(A) = 0.

Proof. Since the characteristic matrix

    A − xI = [a_ij − xδ_ij]

is a matrix with polynomials as elements, the minor of each a_ij − xδ_ij is the determinant of an (n − 1) × (n − 1) matrix with polynomial elements. Hence each element in B = adj(A − xI) is a polynomial in x. Moreover,

    B(A − xI) = adj(A − xI) · (A − xI) = det(A − xI)I = f(x)I.

If f(x) = c_nxⁿ + ··· + c_1x + c_0, then

    f(A) − f(x)I = (c_nAⁿ + ··· + c_1A + c_0I) − (c_nIxⁿ + ··· + c_1Ix + c_0I)
                 = c_n(Aⁿ − Ixⁿ) + ··· + c_2(A² − Ix²) + c_1(A − Ix).

The usual properties of matrix multiplication and matrix addition yield the factorization Aᵏ − Ixᵏ = P_k(A − Ix), where

    P_k = Ixᵏ⁻¹ + Axᵏ⁻² + ··· + Aᵏ⁻²x + Aᵏ⁻¹

is a matrix polynomial of degree k − 1. Hence

    f(A) − f(x)I = Σ_{k=1}^n c_kP_k(A − Ix),

and

    f(A) = f(x)I + Σ_{k=1}^n c_kP_k(A − Ix)
         = adj(A − Ix) · (A − Ix) + Σ_{k=1}^n c_kP_k(A − Ix)
         = {adj(A − Ix) + Σ_{k=1}^n c_kP_k}(A − Ix).

The matrix inside the braces is a matrix polynomial, say B_txᵗ + ··· + B_1x + B_0. This gives

    f(A) = (B_txᵗ + ··· + B_1x + B_0)(A − Ix)
         = −B_txᵗ⁺¹ + (B_tA − B_{t−1})xᵗ + ··· + (B_1A − B_0)x + B_0A.

Since two matrix polynomials are equal if and only if they have corresponding coefficients that are equal, this requires that f(A) = B_0A and the coefficients of each positive power of x on the right side be zero:

    B_t = 0
    B_tA − B_{t−1} = 0
    ⋮
    B_2A − B_1 = 0
    B_1A − B_0 = 0.

Solving for B_t, B_{t−1}, ..., B_1, B_0 in order, we find

    0 = B_t = B_{t−1} = ··· = B_1 = B_0.

But this gives f(A) = B_0A = 0A = 0, and the theorem is proved. ■
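A one-matrix numerical check of the Hamilton-Cayley theorem, not from the text: assuming NumPy, the characteristic polynomial of A, evaluated at A itself, is the zero matrix.

```python
import numpy as np

A = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 2]], dtype=float)
coeffs = np.poly(A)          # coefficients of det(xI - A), highest power first
f_of_A = sum(c * np.linalg.matrix_power(A, len(coeffs) - 1 - k)
             for k, c in enumerate(coeffs))
print(np.allclose(f_of_A, np.zeros_like(A)))     # True: f(A) = 0
```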

The minimal polynomial of a diagonalizable operator T can be used to obtain the projections for a spectral decomposition. Since the existence of a spectral decomposition requires that the eigenvalues of T be in F, we assume for the remainder of this chapter that the characteristic polynomial f(x) of T factors into linear factors over F:

    f(x) = (−1)ⁿ(x − λ_1)^{m_1}(x − λ_2)^{m_2}···(x − λ_r)^{m_r},                    (10.1)

where the m_j are the algebraic multiplicities of the distinct eigenvalues λ_j. Since the minimal polynomial m(x) divides f(x), this means that

    m(x) = (x − λ_1)^{t_1}(x − λ_2)^{t_2}···(x − λ_r)^{t_r}.                          (10.2)

We shall adopt the notation of this paragraph for the characteristic polynomial f(x) and the minimal polynomial m(x) of T throughout the remainder of the chapter.

There are some further notational conventions that are essential. For each factor (x − λ_j)^{t_j} of m(x), let q_j(x) be the polynomial

    q_j(x) = m(x)/(x − λ_j)^{t_j}
           = (x − λ_1)^{t_1}···(x − λ_{j−1})^{t_{j−1}}(x − λ_{j+1})^{t_{j+1}}···(x − λ_r)^{t_r}.

Any nonconstant common divisor of q_1(x), q_2(x), ..., q_r(x) would necessarily have a linear factor of the form x − λ_i since the only linear factors of each q_j(x) are of this type. But x − λ_i is not a factor of q_i(x), so there are no nonconstant common divisors of q_1(x), q_2(x), ..., q_r(x). That is, the greatest common divisor of q_1(x), q_2(x), ..., q_r(x) is 1. Hence there are polynomials g_1(x), g_2(x), ..., g_r(x) such that

    1 = g_1(x)q_1(x) + g_2(x)q_2(x) + ··· + g_r(x)q_r(x).

Since q_j(x) = m(x)/(x − λ_j)^{t_j}, this yields

    1/m(x) = g_1(x)/(x − λ_1)^{t_1} + g_2(x)/(x − λ_2)^{t_2} + ··· + g_r(x)/(x − λ_r)^{t_r}.

Let p_j(x) = g_j(x)q_j(x) for j = 1, 2, ..., r. The set of polynomials p_1(x), p_2(x), ..., p_r(x) has the following properties:

(i) p_i(x)p_j(x) has m(x) as a factor if i ≠ j, say p_i(x)p_j(x) = h_ij(x)m(x);

(ii) 1 = p_1(x) + p_2(x) + ··· + p_r(x).
Theorem 10.16 Let F_j = p_j(T), and let K_j denote the kernel of (T − λ_j)^{t_j} for j = 1, 2, ..., r. Then {F_1, F_2, ..., F_r} is a complete set of projections for V, and F_j is the projection of V onto K_j along Σ_{i≠j} K_i.

Proof. Since p_i(x)p_j(x) = h_ij(x)m(x) for i ≠ j and m(T) = Z, we have

    F_iF_j = p_i(T)p_j(T) = h_ij(T)m(T) = Z

whenever i ≠ j. It follows from (ii) above that 1 = F_1 + F_2 + ··· + F_r. Thus

    F_j = F_j(Σ_{i=1}^r F_i) = Σ_{i=1}^r F_jF_i = F_j²,

and {F_1, F_2, ..., F_r} is an orthogonal set of projections such that 1 = F_1 + F_2 + ··· + F_r. We shall show that F_j is the projection of V onto K_j along Σ_{i≠j} K_i, and it will follow that F_j ≠ Z since K_j clearly contains the eigenspace V_{λ_j} of T.
Now

    (x − λ_j)^{t_j} p_j(x) = (x − λ_j)^{t_j} g_j(x)q_j(x) = g_j(x)m(x),

so (T − λ_j)^{t_j} F_j = (T − λ_j)^{t_j} p_j(T) = g_j(T)m(T) = Z. Hence (T − λ_j)^{t_j} F_j(v) = 0 for all v, and F_j(V) ⊆ K_j. Now let v ∈ K_j, so that (T − λ_j)^{t_j}(v) = 0. For i ≠ j, (T − λ_j)^{t_j} is a factor of F_i = p_i(T) since (x − λ_j)^{t_j} is a factor of p_i(x). Hence F_i(v) = 0 for all i ≠ j, and

    v = (F_1 + F_2 + ··· + F_r)(v) = F_j(v).

Thus v ∈ F_j(V), and K_j = F_j(V). This completes the proof of the theorem. ■

Definition 10.17 A subspace W of V is invariant under T, or T-invariant, if T(W) ⊆ W.

That is, a subspace W of V is T-invariant if and only if T(v) ∈ W for all v ∈ W. Every linear operator T on V has invariant subspaces. The eigenspaces V_λ of T are invariant under T, as are the zero subspace and V.
If W is a subspace of V that is invariant under T, then T induces a linear transformation T_W of W into W defined by T_W(v) = T(v) for all v ∈ W. That is, as long as v ∈ W, T_W and T map v onto the same vector. The distinction between T_W and T is that T_W is defined only on W. The transformation T_W is called the restriction of T to W.

Theorem 10.18 Let K_j be the kernel of (T − λ_j)^{t_j} for j = 1, 2, ..., r. Then

(a) V = K_1 ⊕ K_2 ⊕ ··· ⊕ K_r;

(b) each K_j is invariant under T;

(c) if T_j denotes the restriction of T to K_j, then the minimal polynomial of T_j is (x − λ_j)^{t_j}.

Proof. Consider the complete set of projections {F_1, F_2, ..., F_r} where F_j = p_j(T). By Theorem 10.16, F_j is the projection of V onto K_j = F_j(V) along Σ_{i≠j} K_i. Thus V = K_1 ⊕ K_2 ⊕ ··· ⊕ K_r is the direct sum decomposition corresponding to this complete set of projections.
To establish (b), let v ∈ K_j and consider T(v). Since (T − λ_j)^{t_j} is a polynomial in T, it commutes with T to yield

    (T − λ_j)^{t_j}(T(v)) = T((T − λ_j)^{t_j}(v)) = T(0) = 0.

Hence T(v) is in K_j, and K_j is invariant under T.
Now (T_j − λ_j)^{t_j} is the zero transformation of K_j since (T − λ_j)^{t_j}(v) = 0 for all v ∈ K_j. Therefore, the minimal polynomial of T_j divides (x − λ_j)^{t_j}. That is, the minimal polynomial of T_j is (x − λ_j)^s for some s ≤ t_j. Suppose that s < t_j. Now each v ∈ V can be written uniquely as

    v = v_1 + v_2 + ··· + v_r

with v_i ∈ K_i. Since

    (T − λ_i)^{t_i}(v_i) = 0

for each i, then

    q_j(T)(v) = q_j(T)(v_j).

And since

    (T − λ_j)^s(v_j) = (T_j − λ_j)^s(v_j) = 0

for all v_j ∈ K_j, this means that p(T) = (T − λ_j)^s q_j(T) is the zero transformation on V. But this is a contradiction, since p(x) = (x − λ_j)^s q_j(x) has degree less than that of m(x) = (x − λ_j)^{t_j} q_j(x). Therefore, s = t_j. ■

Theorem 10.19 The linear operator T on V has a spectral decomposition if and only if the minimal polynomial m(x) of T has the form

    m(x) = (x − λ_1)(x − λ_2)···(x − λ_r),

where λ_1, λ_2, ..., λ_r are the distinct eigenvalues of T.

Proof. If the minimal polynomial m(x) has the given form, then each t_i = 1 in equation (10.2). Hence K_i = V_{λ_i} for i = 1, 2, ..., r, and

    V = V_{λ_1} ⊕ V_{λ_2} ⊕ ··· ⊕ V_{λ_r}.

It follows from the proof of Theorem 7.20 that T is diagonalizable, and therefore T has a spectral decomposition by Theorem 10.8.
Conversely, suppose that T has a spectral decomposition

    T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r.

For any polynomial p(x),

    p(T) = p(λ_1)P_1 + p(λ_2)P_2 + ··· + p(λ_r)P_r

by Theorem 10.10. If p(λ_i) ≠ 0 for some i, p(T) has p(λ_i) as a nonzero eigenvalue by Theorem 10.6. Hence p(T) = Z if and only if p(λ_i) = 0 for i = 1, 2, ..., r. This implies at once that

    m(x) = (x − λ_1)(x − λ_2)···(x − λ_r). ■
The result that we have been working toward is now at hand.

Theorem 10.20 If T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r is a spectral decomposition, then P_j = p_j(T).

Proof. Let T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r be a spectral decomposition of T. By Theorem 10.19,

    m(x) = (x − λ_1)(x − λ_2)···(x − λ_r).

That is, each t_j in equation (10.2) is 1, and K_j = V_{λ_j} for each j. With F_j = p_j(T), then {F_1, F_2, ..., F_r} is a complete set of projections for V, and F_j is the projection of V onto K_j along Σ_{i≠j} K_i by Theorem 10.16. By Theorem 10.11, P_j is the projection

of V onto V_{λ_j} along Σ_{i≠j} V_{λ_i}. Since K_j = V_{λ_j} for each j, P_j = F_j = p_j(T) for each j, and the proof is complete. ■

The next corollary follows readily from the fact that each projection in a spectral
decomposition of T is a polynomial in T.

Corollary 10.21 If T = λ_1P_1 + λ_2P_2 + ··· + λ_rP_r is a spectral decomposition of T, then a linear operator S on V commutes with T if and only if S commutes with each P_j.

Proof. See Problem 9. ■

One immediate and desirable consequence of Theorem 10.20 is that we now have a
systematic procedure available for finding the projections in a spectral decomposition.
This procedure, of course, is to determine the P_j by use of P_j = p_j(T).
Example 1 □ Let T be the linear transformation of R^3 that has the matrix A relative to the standard basis, where

    A = [  1  -1  -1 ]
        [ -1   1  -1 ]
        [ -1  -1   1 ].

We shall determine whether or not T has a spectral decomposition, and shall obtain such a decomposition if there is one. The characteristic polynomial f(x) is given by

    f(x) = -(x + 1)(x - 2)^2.

It follows from Theorem 10.19 that T has a spectral decomposition if and only if

    m(x) = (x + 1)(x - 2).

A simple calculation shows that

    m(A) = A^2 - A - 2I = 0,

so T does indeed have a spectral decomposition. With λ_1 = -1, λ_2 = 2, the polynomials q_j(x) are given by q_1(x) = x - 2, q_2(x) = x + 1. The partial fraction decomposition

    1/m(x) = g_1(x)/(x + 1) + g_2(x)/(x - 2)

leads to g_1(x) = -1/3, g_2(x) = 1/3. Hence p_1(x) = -(1/3)(x - 2) and p_2(x) = (1/3)(x + 1). The projections P_1, P_2 thus have respective matrices E_1, E_2 relative to the standard basis ε_3 given by

    E_1 = -(1/3)(A - 2I) = (1/3) [ 1  1  1 ]
                                 [ 1  1  1 ]
                                 [ 1  1  1 ]

    E_2 = (1/3)(A + I) = (1/3) [  2  -1  -1 ]
                               [ -1   2  -1 ]
                               [ -1  -1   2 ].

It is easily checked that E_1^2 = E_1, E_2^2 = E_2, and A = λ_1 E_1 + λ_2 E_2. ■
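A decomposition like the one in Example 1 is easy to check numerically. The following sketch (Python with NumPy; the code and variable names are ours, not part of the text) verifies that m(A) = 0, that E_1 and E_2 are idempotent and sum to I, and that A = λ_1 E_1 + λ_2 E_2:

```python
import numpy as np

# The matrix A of Example 1, relative to the standard basis.
A = np.array([[ 1., -1., -1.],
              [-1.,  1., -1.],
              [-1., -1.,  1.]])
I = np.eye(3)

# Minimal polynomial check: m(A) = A^2 - A - 2I = 0.
print(np.allclose(A @ A - A - 2 * I, 0))                   # True

# Projection matrices E1 = p1(A) and E2 = p2(A).
E1 = -(A - 2 * I) / 3
E2 =  (A + I) / 3

print(np.allclose(E1 @ E1, E1), np.allclose(E2 @ E2, E2))  # idempotent
print(np.allclose(E1 + E2, I))                             # complete set of projections
print(np.allclose(-1 * E1 + 2 * E2, A))                    # A = λ1*E1 + λ2*E2
```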

In case T does not have a spectral decomposition, a somewhat similar decomposition can be obtained that in many cases is quite useful. We have seen that the mappings P_j = F_j = p_j(T) form a complete set of projections {P_1, P_2, ..., P_r} for V. Consider the linear operator

    D = λ_1 P_1 + λ_2 P_2 + ... + λ_r P_r,

where λ_1, λ_2, ..., λ_r are the distinct eigenvalues of T. Since D has a spectral decomposition, it is diagonalizable. Let N = T - D. Using the representation

    T = T(P_1 + P_2 + ... + P_r) = TP_1 + TP_2 + ... + TP_r

for T, N is given by

    N = (T - λ_1)P_1 + (T - λ_2)P_2 + ... + (T - λ_r)P_r.

It is left as an exercise (Problem 10) to prove that

    N^k = (T - λ_1)^k P_1 + (T - λ_2)^k P_2 + ... + (T - λ_r)^k P_r

for all positive integers k. Recall from the proof of Theorem 10.16 that (T - λ_j)^{t_j} P_j = Z, and thus N^k = Z if k ≥ t_j for all j.

Definition 10.22 A linear operator T is called nilpotent if T^k = Z for some positive integer k. The smallest positive integer t such that T^t = Z is the index of nilpotency of T.
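In matrix terms, the index of nilpotency can be found by computing successive powers. The brief sketch below (Python with NumPy; an illustration of the definition, with a matrix chosen here rather than taken from the text) returns the smallest t with A^t = 0:

```python
import numpy as np

def nilpotency_index(A):
    """Smallest t with A^t = 0, or None if A is not nilpotent."""
    n = A.shape[0]
    P = np.eye(n)
    for t in range(1, n + 1):      # a nilpotent n x n matrix satisfies A^n = 0
        P = P @ A
        if np.allclose(P, 0):
            return t
    return None

N = np.array([[0., 1., 2.],
              [0., 0., 3.],
              [0., 0., 0.]])
print(nilpotency_index(N))          # 3
print(nilpotency_index(np.eye(3)))  # None: the identity is not nilpotent
```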

Theorem 10.23 If the minimal polynomial of T factors into a product of linear factors (not necessarily distinct) over F, then T can be written as the sum T = D + N of a diagonalizable transformation D and a nilpotent transformation N, where D and N are polynomials in T, and consequently commute.

Proof. Let D and N be as given in the paragraph preceding Definition 10.22. From the expressions there for D and N and the fact that P_j = p_j(T), it is reasonably clear that D and N are polynomials in T and hence commute. Nevertheless, we furnish some additional details. The projections P_j commute with T since P_j = p_j(T). Now

    ND = [Σ_{i=1}^{r} (T - λ_i)P_i][Σ_{j=1}^{r} λ_j P_j] = Σ_{i=1}^{r} Σ_{j=1}^{r} (T - λ_i)λ_j P_i P_j,

and

    DN = [Σ_{j=1}^{r} λ_j P_j][Σ_{i=1}^{r} (T - λ_i)P_i] = Σ_{i=1}^{r} Σ_{j=1}^{r} λ_j P_j (T - λ_i)P_i.

But

    λ_j P_j (T - λ_i)P_i = (T - λ_i)λ_j P_i P_j,

so DN = ND. ■

In case V is a unitary space, the hypothesis of Theorem 10.23 is automatically satisfied, and every linear operator on V is the sum of a diagonalizable and a nilpotent transformation.
Spectral decompositions of linear operators have the usual parallel statements for matrices. If T has matrix A relative to a certain basis and the projections P_i in T = λ_1 P_1 + λ_2 P_2 + ... + λ_r P_r have matrices E_i for i = 1, 2, ..., r, then

    A = λ_1 E_1 + λ_2 E_2 + ... + λ_r E_r

is called a spectral decomposition for A. The E_i are called projection matrices or principal idempotents for A. It is left as an exercise to prove that AE_i = λ_i E_i.
If T, D, and N in T = D + N have matrices A, B, and C, respectively, then A = B + C. The matrix B is diagonalizable, and C is called a nilpotent matrix.
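To make the matrix statement concrete, here is a small sketch (Python with NumPy; the 3 × 3 matrix below is chosen for illustration and is not one of the text's examples). Its minimal polynomial is (x - 1)^2(x - 2), so it is not diagonalizable; the partial fraction method of this section gives p_1(x) = -x(x - 2) and p_2(x) = (x - 1)^2, and then B = 1·E_1 + 2·E_2 and C = A - B give the desired sum of a diagonalizable matrix and a nilpotent matrix:

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 2.]])
I = np.eye(3)

E1 = -A @ (A - 2 * I)      # E1 = p1(A), projection matrix onto K1 = ker (A - I)^2
E2 = (A - I) @ (A - I)     # E2 = p2(A), projection matrix onto K2 = ker (A - 2I)

B = 1 * E1 + 2 * E2        # diagonalizable part
C = A - B                  # nilpotent part

print(np.allclose(E1 + E2, I))      # complete set of projections
print(B)                            # diag(1, 1, 2)
print(C, np.allclose(C @ C, 0))     # C^2 = 0
print(np.allclose(B @ C, C @ B))    # B and C commute
```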

Exercises 10.4

1. Verify the Hamilton-Cayley theorem for each matrix A.

   (a) A = [ -4  -3  -1 ]      (b) A = [  8   5  -5 ]
           [ -4   0  -4 ]              [  5   8  -5 ]
           [  8   4   5 ]              [ 15  15 -12 ]

   (c) A = [   8   5   6   0 ]  (d) A = [  7   3   3   2 ]
           [   0  -2   0   0 ]          [  0   1   2  -4 ]
           [ -10  -5  -8   0 ]          [ -8  -4  -5   0 ]
           [   2   1   1   2 ]          [  2   1   1   3 ]

2. Find the minimal polynomial of each matrix A in Problem 1.

3. Determine all real 2 × 2 matrices A such that A^2 = -I.

4. Let T be the linear operator on R^n that has the given matrix A relative to the standard basis ε_n. Find the spectral decomposition of T.

   (a) A = [  8   5  -5 ]      (b) A = [  3   2 ]
           [  5   8  -5 ]              [  1   4 ]
           [ 15  15 -12 ]              [ -2  -4 ]

   (c) A = [  7   3   3   2 ]  (d) A = [  4  -1   0   1 ]
           [  0   1   2  -4 ]          [ -1   5  -1   0 ]
           [ -8  -4  -5   0 ]          [  0  -1   4  -1 ]
           [  2   1   2   3 ]          [  1   0  -1   5 ]

5. Write the given matrix A as the sum of a diagonalizable matrix and a nilpotent matrix.

   (a) A = [  7   3   3   2 ]  (b) A = [   8   5   6   0 ]
           [  0   1   2  -4 ]          [   0  -2   0   0 ]
           [ -8  -4  -5   0 ]          [ -10  -5  -8   0 ]
           [  2   1   1   3 ]          [   2   1   1   2 ]

   (c) A = [  1   1   0   0 ]
           [  0   1   1   0 ]
           [  0   0   1   1 ]
           [ -1   0   2   1 ]

6. Prove that the polynomial m(x) in the hypothesis of Theorem 10.13 is unique.

7. Prove that, for an arbitrary square matrix A, A and A^T satisfy the same polynomial equations with scalar coefficients.

8. Prove that similar matrices have the same minimal polynomial.

9. Prove Corollary 10.21.

10. Let N = (T - λ_1)P_1 + (T - λ_2)P_2 + ... + (T - λ_r)P_r, where {P_1, P_2, ..., P_r} is a complete set of projections. Prove that

    N^k = (T - λ_1)^k P_1 + (T - λ_2)^k P_2 + ... + (T - λ_r)^k P_r

    for all positive integers k.

11. Let {P_1, P_2, ..., P_r} be the complete set of projections determined by P_j = p_j(T). With T = TP_1 + TP_2 + ... + TP_r, show that (TP_i)(TP_j) = Z if i ≠ j, and that p(T) = p(TP_1) + p(TP_2) + ... + p(TP_r) for any polynomial p(x) with zero constant term.

12. Let T = λ_1 P_1 + λ_2 P_2 + ... + λ_r P_r be a spectral decomposition for the linear operator T on V. Prove that a linear operator S on V commutes with T if and only if every K_j is invariant under S.

13. Prove that if A = λ_1 E_1 + λ_2 E_2 + ... + λ_r E_r is a spectral decomposition for A, then AE_i = λ_i E_i for each i.

10.5 Nilpotent Transformations


We have seen in the preceding section that if the minimal polynomial of T factors into linear factors over F, then T can be written as T = D + N, where D is diagonalizable and
N is nilpotent. The degree of simplicity that can be obtained in a matrix representation
of T thus depends on the type of representation that is possible for N. The representation
of T that is accepted as being simplest, or nearest to a diagonal form, is the Jordan
canonical form. The derivation of the Jordan canonical form is the principal objective of
the remainder of this chapter. This derivation hinges on an investigation of the invariant
subspaces of an operator.
Suppose that W is an invariant subspace of V under T. Let {v_1, ..., v_k} be a basis of W, and extend this set to a basis A = {v_1, ..., v_k, v_{k+1}, ..., v_n} of V. Since W is invariant under T, T(v_j) = Σ_{i=1}^{k} a_{ij} v_i for j = 1, 2, ..., k. Hence the matrix of T relative to A is of the form

    [ A_1  A_3 ]
    [  0   A_2 ]

where A_1 is the matrix of T_W relative to {v_1, ..., v_k}.
If V = W_1 ⊕ W_2, where each of W_1 and W_2 is invariant under T, and if the basis A = {v_1, ..., v_k, v_{k+1}, ..., v_n} of V is chosen so that {v_1, ..., v_k} is a basis of W_1 and {v_{k+1}, ..., v_n} is a basis of W_2, then we also have T(v_j) = Σ_{i=k+1}^{n} a_{ij} v_i for j = k + 1, ..., n. Hence A_3 = 0, and the matrix of T relative to A is of the form

    A = [ A_1   0  ]
        [  0   A_2 ]

where A_i is the matrix of the restriction of T to W_i.


The results of the preceding paragraph generalize readily to direct sums with more than two terms. If W_1, W_2, ..., W_r are T-invariant subspaces of V such that V is the direct sum V = W_1 ⊕ W_2 ⊕ ... ⊕ W_r, and if the basis

    B = {u_{11}, ..., u_{1n_1}, u_{21}, ..., u_{2n_2}, ..., u_{r1}, ..., u_{rn_r}}

of V is such that {u_{i1}, ..., u_{in_i}} is a basis of W_i, then the matrix of T relative to B is of the form

    A = [ A_1   0  ...   0  ]
        [  0   A_2 ...   0  ]
        [  .    .        .  ]
        [  0    0  ...  A_r ]

where A_i is the matrix of the restriction of T to W_i. A matrix such as this A is called a diagonal block matrix.
The type of invariant subspace that we shall be mainly concerned with is the cyclic subspace, to be defined shortly.
Let v be any nonzero vector in V. Since a linearly independent subset of V can have at most n elements, there is a unique positive integer k such that {v, T(v), ..., T^{k-1}(v)} is linearly independent and T^k(v) is dependent on {v, T(v), ..., T^{k-1}(v)}. Then there are scalars a_0, a_1, ..., a_{k-1} such that T^k(v) = Σ_{i=0}^{k-1} a_i T^i(v). It follows from this that the subspace

    ⟨v, T(v), ..., T^{k-1}(v)⟩

is T-invariant (Problem 3).

Definition 10.24 Let v be a nonzero vector in V, and let k be the unique positive integer such that {v, T(v), ..., T^{k-1}(v)} is linearly independent and T^k(v) is dependent on {v, T(v), ..., T^{k-1}(v)}. The subspace ⟨v, T(v), ..., T^{k-1}(v)⟩ is denoted by C(v, T) and is called the cyclic subspace of v relative to T, or the cyclic subspace generated by v under T. The particular basis {T^{k-1}(v), ..., T(v), v} of C(v, T) is called the cyclic basis generated by v under T and is denoted by B(v, T).

The arrangement of the elements of B(v, T) according to descending rather than ascending powers of T seems a bit unnatural, but this is done so that the restriction of T to C(v, T) will have a special form of matrix relative to B(v, T). Since

    T(T^{k-1}(v)) = T^k(v) = a_{k-1}T^{k-1}(v) + ... + a_1 T(v) + a_0 v

and T(T^j(v)) = T^{j+1}(v) for j = 0, 1, ..., k - 2, the restriction of T to C(v, T) has the matrix

    A = [ a_{k-1}  1  0  ...  0 ]
        [ a_{k-2}  0  1  ...  0 ]
        [    .     .  .       . ]
        [   a_1    0  0  ...  1 ]
        [   a_0    0  0  ...  0 ]

relative to B(v, T). The special form that this matrix A takes on for a nilpotent transformation T is especially useful.

Theorem 10.25 If T is a nilpotent transformation of V, then the restriction of T to C(v, T) has the matrix

    [ 0  1  0  ...  0 ]
    [ 0  0  1  ...  0 ]
    [ .  .  .       . ]
    [ 0  0  0  ...  1 ]
    [ 0  0  0  ...  0 ]

relative to B(v, T).

Proof. With the notation of the paragraph just before the theorem, we shall show that T^k(v) = 0. Our proof of this fact is an "overkill": we show that the restriction T_C of T to C(v, T) has minimal polynomial x^k.
Since T is nilpotent on V, T_C is nilpotent, say with index s. The minimal polynomial of T_C must therefore divide x^s. Let x^r denote this minimal polynomial. Since T^{k-1}(v) is in the basis B(v, T), then r ≥ k.
Consider the polynomial p(x) = a_0 + a_1 x + ... + a_{k-1}x^{k-1} - x^k. We shall show that

    S = p(T_C) = a_0 + a_1 T_C + ... + a_{k-1}T_C^{k-1} - T_C^k

is the zero transformation of C(v, T). To do this, it is sufficient to show that S takes on the value 0 at each vector in B(v, T). Since T_C(v) = T(v),

    S(v) = a_0 v + a_1 T(v) + ... + a_{k-1}T^{k-1}(v) - T^k(v) = 0

by the choice of the scalars a_i. For j = 1, ..., k - 1, we have

    S(T^j(v)) = a_0 T^j(v) + ... + a_{k-1}T^{k-1}(T^j(v)) - T^k(T^j(v))
              = T^j(a_0 v + ... + a_{k-1}T^{k-1}(v) - T^k(v))
              = T^j(0)
              = 0.

Hence S = p(T_C) is the zero transformation, and x^r divides p(x). This requires that r ≤ k. Therefore r = k and p(x) = -x^r = -x^k. ■

Our principal concern is with the connection between cyclic subspaces and the kernels
of the powers of a nilpotent transformation. Before proceeding to our main result in
this direction, some preliminary lemmas are in order. These lemmas by themselves are
of little consequence, but they are essential to the proof of Theorem 10.28.

Lemma 10.26 Suppose that T is a nilpotent transformation of index t on V, and let W_j be the kernel of T^j. Then {0} = W_0 ⊆ W_1 ⊆ ... ⊆ W_t = V, and W_{j-1} ≠ W_j for j = 1, 2, ..., t.

Proof. The equality {0} = W_0 follows from the fact that T^0 is the identity transformation, and W_t = V since T has index t on V.
Since T^{j-1}(v) = 0 implies T^j(v) = T(0) = 0, we have W_{j-1} ⊆ W_j for j = 1, 2, ..., t. Let u be a vector in V such that T^{t-1}(u) ≠ 0, and let 1 ≤ j ≤ t. Then T^{t-j}(u) is in W_j since T^j(T^{t-j}(u)) = 0, and T^{t-j}(u) is not in W_{j-1} since T^{j-1}(T^{t-j}(u)) = T^{t-1}(u) ≠ 0. Thus W_{j-1} ≠ W_j. ■

Lemma 10.27 Let the subspaces W_i be as in Lemma 10.26, with j ≥ 2. Suppose that {u_1, ..., u_k} is a basis of W_{j-2}, and that {w_1, ..., w_r} is a linearly independent set of vectors in W_j such that

    ⟨w_1, ..., w_r⟩ ∩ W_{j-1} = {0}.

Then {u_1, ..., u_k, T(w_1), ..., T(w_r)} is a linearly independent subset of W_{j-1}.

Proof. Since T^{j-1}(T(w_i)) = T^j(w_i) = 0 for each i, the set

    {u_1, ..., u_k, T(w_1), ..., T(w_r)}

is contained in W_{j-1}. Assume that

    Σ_{i=1}^{k} b_i u_i + Σ_{m=1}^{r} c_m T(w_m) = 0.

Then

    Σ_{m=1}^{r} c_m T(w_m) = -Σ_{i=1}^{k} b_i u_i is in W_{j-2},

so that

    T^{j-2}(Σ_{m=1}^{r} c_m T(w_m)) = 0.

This implies that T^{j-1}(Σ_{m=1}^{r} c_m w_m) = 0, and therefore Σ_{m=1}^{r} c_m w_m is in W_{j-1}. Since

    ⟨w_1, ..., w_r⟩ ∩ W_{j-1} = {0},

it must be that Σ_{m=1}^{r} c_m w_m = 0. Hence each c_m = 0, and this implies Σ_{i=1}^{k} b_i u_i = 0. This requires in turn that all b_i = 0 since {u_1, ..., u_k} is linearly independent. Therefore, {u_1, ..., u_k, T(w_1), ..., T(w_r)} is linearly independent. ■

The superdiagonal elements in a matrix A = [a_{ij}]_{m×n} are those elements a_{i,i+1} for i = 1, 2, ..., m - 1. That is, the superdiagonal elements are those immediately above the diagonal elements.

Theorem 10.28 Let T be a nilpotent operator of index t on V, let W_j be the kernel of T^j for j = 1, 2, ..., t, and let s = nullity(T). There exists a basis B of V such that the matrix of T relative to B is given by

    A = [ A_1   0  ...   0  ]
        [  0   A_2 ...   0  ]
        [  .    .        .  ]
        [  0    0  ...  A_s ]

where each A_i is a square matrix that has all superdiagonal elements 1 and all other elements 0, A_1 is of order t, and order(A_i) ≥ order(A_{i+1}) for each i. The matrix A is uniquely determined by T.

Proof. By Lemma 10.27, there is a basis

    A = {v_{11}, ..., v_{1s_1}, v_{21}, ..., v_{2s_2}, ..., v_{t1}, ..., v_{ts_t}}

of V such that {v_{11}, ..., v_{js_j}} is a basis of W_j for j = 1, ..., t. Such a basis can be obtained by extending a basis {v_{11}, ..., v_{1s_1}} of W_1 to a basis

    {v_{11}, ..., v_{1s_1}, v_{21}, ..., v_{2s_2}}

of W_2, then extending this basis to a basis of W_3, and so on. For each j, the elements of A that are in W_j but not in W_{j-1} are precisely those in the segment v_{j1}, ..., v_{js_j}. We shall replace each of these segments by a new set of vectors in order to obtain a basis A′ with certain properties. Roughly, the idea is this. The vectors in segment j - 1 will be replaced by the images of the vectors in the next segment to the right, and then this set will be extended to a basis of W_{j-1}.
No change is required in the last s_t vectors of A. That is, we put

    v′_{t1} = v_{t1}, v′_{t2} = v_{t2}, ..., v′_{ts_t} = v_{ts_t}.

If Σ_{m=1}^{s_t} c_m v′_{tm} is in W_{t-1}, then

    Σ_{m=1}^{s_t} c_m v′_{tm} = Σ_{i=1}^{t-1} Σ_{j=1}^{s_i} a_{ij} v_{ij}.

The linear independence of A requires that each a_{ij} = 0 and each c_m = 0. Hence

    ⟨v′_{t1}, ..., v′_{ts_t}⟩ ∩ W_{t-1} = {0}.

Let

    v′_{t-1,1} = T(v′_{t1}), v′_{t-1,2} = T(v′_{t2}), ..., v′_{t-1,s_t} = T(v′_{ts_t}).

By Lemma 10.27, the set

    {v_{11}, ..., v_{1s_1}, ..., v_{t-2,1}, ..., v_{t-2,s_{t-2}}, v′_{t-1,1}, ..., v′_{t-1,s_t}}

is linearly independent, and so can be extended to a basis

    {v_{11}, ..., v_{1s_1}, ..., v_{t-2,s_{t-2}}, v′_{t-1,1}, ..., v′_{t-1,s_t}, ..., v′_{t-1,s_{t-1}}}

of W_{t-1}. At this point, the last two segments in the basis

    {v_{11}, ..., v_{1s_1}, ..., v_{t-2,s_{t-2}}, v′_{t-1,1}, ..., v′_{t-1,s_{t-1}}, v′_{t1}, ..., v′_{ts_t}}

of V have the desired form.


The procedure is repeated, replacing the segment v_{j-1,1}, ..., v_{j-1,s_{j-1}} by the images

    v′_{j-1,1} = T(v′_{j1}), v′_{j-1,2} = T(v′_{j2}), ..., v′_{j-1,s_j} = T(v′_{js_j}),

and then adjoining elements v′_{j-1,s_j+1}, ..., v′_{j-1,s_{j-1}} so as to form a basis of W_{j-1}. This repetition leads finally to a basis

    A′ = {v′_{11}, ..., v′_{1s_1}, v′_{21}, ..., v′_{2s_2}, ..., v′_{t1}, ..., v′_{ts_t}}

such that

    v′_{j-1,1} = T(v′_{j1}), ..., v′_{j-1,s_j} = T(v′_{js_j})

for each j.
The vectors in A′ are now rearranged so that the first vectors from each segment are written first, followed in order by the second vectors from each segment, and so on until the vectors in the last segment are exhausted. Whenever a segment of vectors is exhausted, we continue the same procedure with the remaining segments, if there are any. This process leads to the basis

    B = {v′_{11}, v′_{21}, ..., v′_{t1}, ..., v′_{1s_t}, ..., v′_{ts_t}, v′_{1,s_t+1}, ..., v′_{t-1,s_t+1}, ..., v′_{1s_1}}.

Since

    v′_{11} = T(v′_{21}), v′_{21} = T(v′_{31}), ..., v′_{t-1,1} = T(v′_{t1}),

we have

    v′_{11} = T^{t-1}(v′_{t1}), v′_{21} = T^{t-2}(v′_{t1}), ..., v′_{t-1,1} = T(v′_{t1}),

and {v′_{11}, v′_{21}, ..., v′_{t1}} is the same as the cyclic basis B(v′_{t1}, T) of C(v′_{t1}, T). Similarly, those v′_{ij} with the same second subscript (i.e., with the same position in the segments of A′) are the same as the vectors in the cyclic basis B(v′_{mj}, T) of C(v′_{mj}, T):

    B(v′_{mj}, T) = {v′_{1j}, v′_{2j}, ..., v′_{mj}},

where m is the number of the last segment in A′ that has at least j elements. Thus

    B = {B(v′_{t1}, T), ..., B(v′_{ts_t}, T), B(v′_{t-1,s_t+1}, T), ..., B(v′_{1,s_2+1}, T), ..., B(v′_{1s_1}, T)},

where B(v′_{ij}, T) is the cyclic basis of C(v′_{ij}, T). By Theorem 10.25, the matrix of the restriction of T to C(v′_{ij}, T) is of the form

    A_j = [ 0  1  0  ...  0 ]
          [ 0  0  1  ...  0 ]
          [ .  .  .       . ]
          [ 0  0  0  ...  1 ]
          [ 0  0  0  ...  0 ].

Since each C(v′_{ij}, T) is T-invariant and V is the direct sum of the C(v′_{ij}, T), the matrix of T with respect to B is of the form A given in the statement of the theorem. There are s_1 submatrices A_j in A, one for each C(v′_{ij}, T), and s_1 = dim(W_1) = nullity(T). The matrix A_1 is of order t, since B(v′_{t1}, T) has t elements. The inequality

    order(A_i) ≥ order(A_{i+1})

is clear from the construction of the basis B.
Now T determines the subspaces W_j, and the dimensions of the W_j determine the matrices A_j. Hence A is uniquely determined by T. ■

Example 1 □ Let T be the linear transformation of R^5 that has matrix

    M = [ 3   0  -2  -3   1 ]
        [ 4  -1  -1  -4   0 ]
        [ 4  -1  -1  -4   0 ]
        [ 1   0  -1  -1   1 ]
        [ 2  -1   0  -2   0 ]

relative to the standard basis. It is easily verified that T is nilpotent of index 3. We follow the proof of Theorem 10.28 to obtain a basis B of R^5 such that T has a matrix A of the form described in the theorem.
The row-echelon form of M leads to the basis {(1,0,0,1,0), (1,2,2,0,1)} of the kernel W_1 of T. The row-echelon form for M^2 shows that the kernel W_2 of T^2 is the set of all (x_1, x_2, x_3, x_4, x_5) in R^5 such that x_2 = x_3. The basis of W_1 can then be extended to the basis

    {(1,0,0,1,0), (1,2,2,0,1); (0,1,1,0,0), (0,1,1,0,1)}

of W_2. (The semicolon separates the segments in the basis.) This basis can in turn be extended to the basis

    A = {(1,0,0,1,0), (1,2,2,0,1); (0,1,1,0,0), (0,1,1,0,1); (0,0,1,0,0)}


of the desired type.


As the first step in modifying the basis, we find T(0,0,1,0,0) = (-2,-1,-1,-1,0) and replace the segment (0,1,1,0,0), (0,1,1,0,1) in the basis of W_2 by the vector (-2,-1,-1,-1,0). The vector (0,1,1,0,0) can be used to extend this set to the basis

    {(1,0,0,1,0), (1,2,2,0,1); (-2,-1,-1,-1,0), (0,1,1,0,0)}

of W_2. [Incidentally, the vector (0,1,1,0,1) cannot be used.] Next we replace the segment (1,0,0,1,0), (1,2,2,0,1) by T(-2,-1,-1,-1,0), T(0,1,1,0,0) to obtain the basis

    A′ = {(-1,-2,-2,0,-1), (-2,-2,-2,-1,-1); (-2,-1,-1,-1,0), (0,1,1,0,0); (0,0,1,0,0)}

as described in the proof of the theorem. By rearrangement of the vectors according to their positions in the segments, we obtain the basis

    B = {(-1,-2,-2,0,-1), (-2,-1,-1,-1,0), (0,0,1,0,0); (-2,-2,-2,-1,-1), (0,1,1,0,0)}.

It is easily verified that the matrix of T relative to B is

    A = [ A_1   0  ]
        [  0   A_2 ]

where

    A_1 = [ 0  1  0 ]        A_2 = [ 0  1 ]
          [ 0  0  1 ]              [ 0  0 ].
          [ 0  0  0 ]
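The computations in this example can be checked directly. The following sketch (Python with SymPy; our own verification code, not part of the text) confirms the index of nilpotency, the dimensions of the kernels W_j, and the block form of the matrix of T relative to B:

```python
from sympy import Matrix, zeros

M = Matrix([[3, 0, -2, -3, 1],
            [4, -1, -1, -4, 0],
            [4, -1, -1, -4, 0],
            [1, 0, -1, -1, 1],
            [2, -1, 0, -2, 0]])

# Nilpotent of index 3: M^2 is not zero, but M^3 is.
print(M**2 == zeros(5, 5), M**3 == zeros(5, 5))      # False True

# dim W_j = 5 - rank(M^j): expect 2, 4, 5.
print([5 - (M**j).rank() for j in (1, 2, 3)])

# Columns of P are the vectors of the basis B found above.
B_vectors = [[-1, -2, -2,  0, -1],
             [-2, -1, -1, -1,  0],
             [ 0,  0,  1,  0,  0],
             [-2, -2, -2, -1, -1],
             [ 0,  1,  1,  0,  0]]
P = Matrix(B_vectors).T
print(P.inv() * M * P)    # block diagonal: a 3x3 and a 2x2 shift block
```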

Exercises 10.5

1. Let T be the nilpotent transformation of R^n that has the given matrix relative to the standard basis. Find a basis B of R^n that satisfies the conditions of Theorem 10.28, and exhibit the matrix A described in that theorem.

   (a) [  4  -1  -19   3 ]   (b) [ 2   0  -1  -2 ]
       [ -3   0   12  -2 ]       [ 4  -1  -1  -4 ]
       [  1   0   -4   1 ]       [ 4  -1  -1  -4 ]
       [  0   0    0   0 ]       [ 0   0   0   0 ]

   (c) [ 1  -1  -2  0   1 ]   (d) [ 3   0  -2  -2  1 ]
       [ 0   1   3  0  -2 ]       [ 4  -1  -1  -2  0 ]
       [ 2   1   1  0   0 ]       [ 4  -1  -1  -3  0 ]
       [ 1   0   0  0   1 ]       [ 1   0  -1  -1  1 ]
       [ 1   1   2  0  -1 ]       [ 2  -1   0  -2  0 ]

2. In each part below, let T be the nilpotent transformation in the corresponding part of Problem 1. Find B(v, T) for the given v, and find the matrix relative to B(v, T) of the restriction of T to C(v, T).

   (a) v = (1, 1, 0, -1)       (b) v = (1, 1, 1, 1)
   (c) v = (-1, -1, 1, 0, 1)   (d) v = (1, 2, -1, 1, 2)

3. Prove that the subspace C(v, T) in Definition 10.24 is T-invariant.

4. Prove that T is nilpotent of index t if and only if the minimal polynomial of T is x^t.

5. Prove that T is nilpotent if and only if the characteristic polynomial of T is x^n.

6. A square matrix A is called nilpotent if A^k = 0 for some positive integer k. The index of nilpotency of A is the least positive integer t such that A^t = 0. Prove that if A is nilpotent of index t, then any matrix that is similar to A is also nilpotent of index t.

7. Show that the zero operator Z is the only normal linear operator on a unitary space V that is nilpotent.

8. Prove that the cyclic subspace of v relative to T is the set of all vectors of the form g(T)(v), where g(T) is a polynomial in T.

9. Prove that C(v, T) is the intersection of all T-invariant subspaces that contain v.

10.6 The Jordan Canonical Form


Throughout the development, a good portion of our effort has been connected in one
way or another with representations of linear operators by diagonal matrices, and this
is in keeping with the importance of the topic. A person's knowledge of this area might
be regarded as a good rough measure of his knowledge of the fundamentals of linear
algebra.
Not all linear operators can be represented by a diagonal matrix, but a standard form
can be obtained that comes very close to being a diagonal matrix. This standard form
is known as the Jordan canonical form. In a sense, it is the ultimate result concerning
diagonalization, and this is the main reason for its inclusion here. There are situations
in which it is of great value.
We begin with the definition of a Jordan matrix.

Definition 10.29 The Jordan matrix of order k with eigenvalue λ is the k × k matrix

    [ λ  1  0  ...  0  0 ]
    [ 0  λ  1  ...  0  0 ]
    [ 0  0  λ  ...  0  0 ]
    [ .  .  .       .  . ]
    [ 0  0  0  ...  λ  1 ]
    [ 0  0  0  ...  0  λ ].

Theorem 10.30 Let T have characteristic polynomial

    f(x) = (-1)^n (x - λ_1)^{m_1} ... (x - λ_r)^{m_r}

and minimal polynomial

    m(x) = (x - λ_1)^{t_1} ... (x - λ_r)^{t_r}

over F, and let n_i be the geometric multiplicity of λ_i. Then there exists a basis of V such that the matrix of T relative to this basis has the following form:

    J = [ J_1   0  ...   0  ]
        [  0   J_2 ...   0  ]
        [  .    .        .  ]
        [  0    0  ...  J_r ]

with J_i a square matrix of order m_i given by

    J_i = [ J_{i1}    0    ...     0     ]
          [   0     J_{i2} ...     0     ]
          [   .       .            .     ]
          [   0       0    ...  J_{in_i} ]

where each J_{ik} is a Jordan matrix with eigenvalue λ_i such that J_{i1} has order t_i and order(J_{ik}) ≥ order(J_{i,k+1}) for all k. For a prescribed ordering of the eigenvalues, the matrix J is uniquely determined by T and is called the Jordan canonical matrix for T.

Proof. Let T = D + N be the expression of T as the sum of a diagonalizable transformation D and a nilpotent transformation N as in the proof of Theorem 10.23.

With the same notation as used in Section 10.4, let K_i denote the kernel of (T - λ_i)^{t_i}, and let F_i = p_i(T). Then V = K_1 ⊕ ... ⊕ K_r, and F_i is the projection of V onto K_i along Σ_{j≠i} K_j, by Theorem 10.16. Hence any nonzero vector in K_i is an eigenvector of F_i corresponding to the eigenvalue 1, and the restriction of F_i to K_j is the zero transformation if i ≠ j. For the time being, let d_i denote the dimension of K_i. With every choice of bases B_i = {u_{i1}, ..., u_{id_i}} for the subspaces K_i, the set

    B = {u_{11}, ..., u_{1d_1}, u_{21}, ..., u_{2d_2}, ..., u_{r1}, ..., u_{rd_r}}

is a basis of V. Since F_i(K_i) = K_i and F_i(K_j) = {0} if i ≠ j, each K_i is invariant under D = λ_1 F_1 + ... + λ_r F_r, and the restriction of D to K_i has matrix λ_i I_{d_i} relative to B_i. It follows from the discussion at the beginning of Section 10.5 that D has the matrix

    K = [ λ_1 I_{d_1}      0       ...      0       ]
        [      0       λ_2 I_{d_2} ...      0       ]
        [      .           .                .       ]
        [      0           0       ...  λ_r I_{d_r} ]

relative to B.
Consider now the matrix of N relative to a basis of this type. Since

    (T - λ_i)F_i(K_i) ⊆ (T - λ_i)(K_i) ⊆ K_i

and (T - λ_i)F_i(K_j) = {0} if i ≠ j, each K_i is invariant under

    N = (T - λ_1)F_1 + ... + (T - λ_r)F_r.

Now the restriction N_i of N to K_i is the same as the restriction of T - λ_i to K_i. By part (c) of Theorem 10.18, the minimal polynomial of the restriction of T to K_i is (x - λ_i)^{t_i}. Therefore, the restriction of T - λ_i to K_i has minimal polynomial x^{t_i}. This means that the minimal polynomial of N_i is x^{t_i}, and that N_i is nilpotent of index t_i. According to Theorem 10.28, there exists a basis B′_i = {u′_{i1}, ..., u′_{id_i}} of K_i such that N_i has a d_i × d_i matrix of the form

    L_i = [ L_{i1}    0    ...     0     ]
          [   0     L_{i2} ...     0     ]
          [   .       .            .     ]
          [   0       0    ...  L_{in_i} ]

where each L_{ij} has the form

    L_{ij} = [ 0  1  0  ...  0 ]
             [ 0  0  1  ...  0 ]
             [ .  .  .       . ]
             [ 0  0  0  ...  1 ]
             [ 0  0  0  ...  0 ],

L_{i1} has order t_i, and order(L_{ij}) ≥ order(L_{i,j+1}) for all j. The number of diagonal blocks L_{ij} in L_i is the nullity of N_i. Since N_i is the same as the restriction of T - λ_i to K_i, and since the kernel of T - λ_i is contained in K_i, we have

    nullity(N_i) = nullity(T - λ_i) = dim((T - λ_i)^{-1}(0)) = n_i.

Relative to the basis

    B′ = {u′_{11}, ..., u′_{1d_1}, u′_{21}, ..., u′_{2d_2}, ..., u′_{r1}, ..., u′_{rd_r}},

N has the matrix

    L = [ L_1   0  ...   0  ]
        [  0   L_2 ...   0  ]
        [  .    .        .  ]
        [  0    0  ...  L_r ]

since each K_i is invariant under N. Thus T has the matrix

    J = K + L = [ λ_1 I_{d_1} + L_1          0          ...          0          ]
                [        0          λ_2 I_{d_2} + L_2   ...          0          ]
                [        .                   .                       .          ]
                [        0                   0          ...  λ_r I_{d_r} + L_r  ]

relative to B′. Now J is an upper triangular matrix, so

    det(J - xI) = (-1)^n (x - λ_1)^{d_1} ... (x - λ_r)^{d_r}.

But det(J - xI) = f(x) since J represents T. Therefore d_i = m_i for each i. Letting J_i = λ_i I_{m_i} + L_i, the matrix J is in the required form.

Now K is uniquely determined by f(x) whenever the ordering of the eigenvalues is designated. By Theorem 10.28, N_i determines the matrix L_i, and a prescribed ordering of the eigenvalues then determines L. Hence J = K + L is unique if the order of the eigenvalues is specified. ■

Theorem 10.31 If an n × n matrix A over F has characteristic polynomial f(x) and minimal polynomial m(x) that factor over F as given in Theorem 10.30, then A is similar over F to a unique matrix J as described in that theorem. The matrix J is called the Jordan canonical form for A.

Proof. Let A be a matrix that satisfies the given conditions. With any given choices of an n-dimensional vector space V over F and a basis of V, A determines a unique linear operator T on V. Any such linear operator has f(x) as its characteristic polynomial and m(x) as its minimal polynomial. Since any n-dimensional vector space over F is isomorphic to F^n, we may assume without loss of generality that T is a linear operator on F^n. The projections F_i, the subspaces K_i, and the operators D and N are determined independently of the choice of basis in F^n. Thus the matrix J is uniquely determined by A. ■

The proof of Theorem 10.30 furnishes a method for obtaining the basis and the
matrix J described in that theorem, but several short-cuts can be made in the procedure.
This is illustrated in the following example.

Example 1 □ Consider the linear operator T on R^6 that has the matrix

    A = [ -1   1   0   0   0   0 ]
        [ -1  -1   1   0   0   0 ]
        [  0   1  -1   0   0   0 ]
        [ -2   1   1  -1   1   0 ]
        [  3   0  -1  -1  -3   0 ]
        [ -1   1   1   0   0  -2 ]

relative to ε_6. We shall (a) find a basis B′ of R^6 such that T has the Jordan canonical matrix J relative to B′, (b) determine the matrix J, and (c) find a matrix P such that P^{-1}AP = J.
The characteristic polynomial of A is f(x) = (x + 1)^3(x + 2)^3, so λ_1 = -1 and λ_2 = -2 are the distinct eigenvalues of T. It is actually not necessary to find the minimal polynomial.
Our first step is to find a basis B′_1 for the kernel K_1 of (T + 1)^{t_1} that is of the type in the proof of Theorem 10.30. We have seen that the restriction N_1 of N to K_1 is the same as the restriction of T - λ_1 to K_1. It follows that (N_1)^j is the restriction of (T - λ_1)^j to K_1 for each positive integer j. Since the kernel W_j of (T - λ_1)^j is contained in the kernel W_{j+1} of (T - λ_1)^{j+1}, we begin by finding a basis of the kernel W_1 of T - λ_1, extending to a basis of W_2, and so on.

Since the kernel of T - λ_1 is contained in K_1, the kernel of the restriction of T - λ_1 to K_1 is the same as the kernel of T - λ_1.
The reduction of A - λ_1 I = A + I to row-echelon form yields the basis {(1,0,1,0,1,0)} of the kernel W_1. By use of the row-echelon form of (A + I)^2, this extends to the basis {(1,0,1,0,1,0); (1,1,1,1,0,1)} of W_2. Repetition of this procedure with (A + I)^3 produces the basis

    {(1,0,1,0,1,0); (1,1,1,1,0,1); (0,0,1,0,0,0)}

of W_3. Upon finding the dimension of W_4, we discover that W_3 = W_4. By Lemma 10.26, this indicates that the index of N_1 is t_1 = 3. Following the procedure of Theorem 10.28, we replace (1,1,1,1,0,1) by

    N_1(0,0,1,0,0,0) = (T - λ_1)(0,0,1,0,0,0)

and

    (1,0,1,0,1,0) by N_1^2(0,0,1,0,0,0)

to obtain

    B′_1 = {(1,0,1,0,1,0), (0,1,0,1,-1,1), (0,0,1,0,0,0)}.

The matrix of N_1 relative to this basis is

    L_1 = [ 0  1  0 ]
          [ 0  0  1 ]
          [ 0  0  0 ].

Following the same procedure with the eigenvalue λ_2 = -2, and letting W_j = ker(N_2^j) = ker(T + 2)^j, we find

    W_1 = ⟨(0,0,0,1,-1,0), (0,0,0,0,0,1)⟩,
    W_2 = ⟨(0,0,0,1,-1,0), (0,0,0,0,0,1); (0,0,0,0,1,0)⟩.

We find that W_2 = W_3, and this indicates that t_2 = 2. We then replace the first vector in the basis of W_2 by

    N_2(0,0,0,0,1,0) = (T + 2)(0,0,0,0,1,0) = (0,0,0,1,-1,0)

and extend this to a basis of W_1 to obtain the basis

    {(0,0,0,1,-1,0), (0,0,0,0,0,1); (0,0,0,0,1,0)}

of W_2. Rearrangement of the vectors in this basis yields

    B′_2 = {(0,0,0,1,-1,0), (0,0,0,0,1,0), (0,0,0,0,0,1)}.


The matrix of N_2 relative to B′_2 is

    L_2 = [ 0  1  0 ]
          [ 0  0  0 ]
          [ 0  0  0 ].

The desired basis B′ is thus given by

    B′ = {(1,0,1,0,1,0), (0,1,0,1,-1,1), (0,0,1,0,0,0), (0,0,0,1,-1,0), (0,0,0,0,1,0), (0,0,0,0,0,1)},

and the matrix of T relative to this basis is

    J = [ λ_1 I_{m_1} + L_1          0          ]
        [         0          λ_2 I_{m_2} + L_2 ]

      = [ -1   1   0   0   0   0 ]
        [  0  -1   1   0   0   0 ]
        [  0   0  -1   0   0   0 ]
        [  0   0   0  -2   1   0 ]
        [  0   0   0   0  -2   0 ]
        [  0   0   0   0   0  -2 ].

The matrix of transition from ε_6 to B′ is

    P = [ 1   0   0   0   0   0 ]
        [ 0   1   0   0   0   0 ]
        [ 1   0   1   0   0   0 ]
        [ 0   1   0   1   0   0 ]
        [ 1  -1   0  -1   1   0 ]
        [ 0   1   0   0   0   1 ].

It is easily verified that P^{-1}AP = J. ■
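For readers who want to check such computations with software, here is a brief sketch (Python with SymPy, which is not part of the text) that verifies P^{-1}AP = J for Example 1 and also computes a Jordan form directly; note that jordan_form may order the blocks differently from the ordering chosen above.

```python
from sympy import Matrix

A = Matrix([[-1,  1,  0,  0,  0,  0],
            [-1, -1,  1,  0,  0,  0],
            [ 0,  1, -1,  0,  0,  0],
            [-2,  1,  1, -1,  1,  0],
            [ 3,  0, -1, -1, -3,  0],
            [-1,  1,  1,  0,  0, -2]])

# Columns of P are the vectors of the basis B' found in Example 1.
P = Matrix([[1,  0, 0,  0, 0, 0],
            [0,  1, 0,  0, 0, 0],
            [1,  0, 1,  0, 0, 0],
            [0,  1, 0,  1, 0, 0],
            [1, -1, 0, -1, 1, 0],
            [0,  1, 0,  0, 0, 1]])

print(P.inv() * A * P)       # the Jordan canonical matrix J of Example 1

# SymPy can also produce a Jordan form directly: A == Q * J2 * Q.inv().
Q, J2 = A.jordan_form()
print(J2)
```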


Exercises 10.6

1. Given that the minimal polynomial of

       A = [ 2  1  0 ]
           [ 0  2  0 ]
           [ 2  3  1 ]

   is m(x) = (x - 2)^2(x - 1), find an invertible matrix P such that P^{-1}AP is in Jordan canonical form.
2. Find the Jordan canonical form for each matrix A.

   (a) A = [  7   3   3   2 ]  (b) A = [   8   5   6   0 ]
           [  0   1   2  -4 ]          [   0  -2   0   0 ]
           [ -8  -4  -5   0 ]          [ -10  -5  -8   0 ]
           [  2   1   1   3 ]          [   2   1   1   2 ]

   (c) A = [  1   1   0   0 ]  (d) A = [  3   1   0   0 ]
           [  0   1   1   0 ]          [  0   0   1   0 ]
           [  0   0   1   1 ]          [ -1  -3   3   0 ]
           [ -1   0   2   1 ]          [  1   3  -1   2 ]

3. For each part of Problem 2, let T be the linear transformation of R^4 that has the matrix A relative to ε_4. Find a basis of R^4 such that the matrix of T relative to this basis is the Jordan canonical matrix J for T, and write down a matrix P such that P^{-1}AP = J.

4. Use the results of Problems 2 and 3 to write each matrix A in Problem 2 as the sum of a diagonalizable matrix and a nilpotent matrix.

5. Write the transformation T in Example 1 of this section as T = D + N, where D is diagonalizable and N is nilpotent.

6. Suppose A is a matrix that has characteristic polynomial (-1)^7(x - 2)^4(x - 7)^3 and minimal polynomial (x - 2)^2(x - 7)^2. List all the possible Jordan canonical forms that A might have with λ_1 = 2 and λ_2 = 7.
Answers to Selected Exercises

Exercises 1.2, page 8

1. (a) Minimum of m and n (c) 12 3. (a) Yes (c) No (e) Yes

5. (a) Yes (c) No

6. (a) (1,1,0) + (0,1,1) - (-1,1,1) - (2,1,0) = (0,0,0)


(2,1,0) = ( 1 , 1 , 0 ) + ( 0 , 1 , 1 ) - ( - 1 , 1 , 1 )

7. (01,02,03) = a i e i 4 - 0 2 6 2 + 0363

8. (aua2, a 3 ) = (a x - a 2 )(l, 0,0) + (a 2 - a 3 )(l, 1,0) + a 3 (l, 1,1)

9. (1,0,1) is such a vector.

Exercises 1.3, page 17

2. The set of all vectors with components that satisfy a given equation in 5. is a
subspace of the type in 4· The set of all vectors with components that satisfy
the system of equations in 5. is the intersection of these m subspaces, and this
intersection is a subspace by Theorem 1.11.
3. U Μχ = {x G R I - 1 < x < 1}, Π M\ = 0
xec xec

4. (a) {(1,0,0), (0,1,0), (0,0,1)} (b) {(1,0,0), (0,1,0), (0,0,1), (1,1,1)}


5. Definition 1.4 Let A = {\ΐχ \ X G £ } be a nonempty set of vectors in R n . A
vector v is linearly dependent on A if there exist vectors \ΐχ1, UA 2 , ···, ^xk in Λ and
αχ, a2,..., a*; in R such that v = αι\ΐχλ + α2ΐΐλ2 H h ûfcuÀte · A vector of the form
a u can< a
Σ ί = ι i Xi is linear combination of the vectors in A.
Definition 1.5 Let A = {ιΐχ \ X G £ } be a nonempty set of vectors in R n .
Then A is linearly dependent if there exist vectors UA 1 5 UA 2 , ...,UAfc and scalars
ai, a2,..., afc, not all zero, such that αχ\ΐχ1 -f α2ΐΐλ2 + · · · + CLk^xk = 0. If A is not
linearly dependent, it is linearly independent

363

6. (a) No (c) Yes (e) No 7. (a) Yes (c) No

Exercises 1.4, page 27

1. [Figure: sketch in the plane showing the vector u + v; the point (4, -4) is marked.]

3 . P) Λΐλ is the origin. (J M\ is the set of all points except those with coordinates
(0, y), where y ψ 0.

4. (a) 2x + 3 y - z = 0 5. (a) 13 (c) 5 (e) v/30 7. (a) ^ ( 3 , - 4 , 1 2 )

8. (a) -4Λ/2Ϊ/21 9. 2y/î

11. (a) {(1,0,2)} (b) {(1,0,2), (2, - 1 , 1 ) , ( 1 , 1 , - 1 ) }

13. {ui, U2}, where 6ij is the Kronecker delta and ui = (<5n, 6u,..., ^ΐη);
U 2 = {621,^22, ■■■,δ2η)

Exercises 1.5, page 37

1. (a) {(2,6, - 3 ) , (5,15, - 8 ) , (5,3, - 2 ) } (c) {(1,1,0), (2,4,1), (1,2,1)}

2. (a) {(1,0,2,3), (0,1, - 2 , - 3 ) , (0,1,1,0), (0,0,0,1)}


(c) {(1,1,0,0), (0,0,1,1), (1,0,1,0), ( 0 , 1 , 0 , - 1 ) }
6. (a) Linearly independent (c) Linearly dependent
(e) Linearly independent
7. (a) Does not span R 3 (c) Does not span R 3 (e) Spans R 3
8. (a) Not a basis for R 3 (c) Basis for R 3

9. (a) Not a basis of R 4 (c) Basis of R 4


11. (a) {(1,0,1, - 1 ) , (3, - 2 , 2 , 5 ) } (c) {(2, - 1 , 0 , 1 ) , (1,2,1,0), (5,3,2,1)}
12. (a) 2 (c) 2

Exercises 2.2, page 45

1. {(1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), (0,0,0,0,1)}

3. Replace the second vector by the sum of the second vector and (—2) times the
first vector.

5. Apply the sequence £Ί, E2, E3, E4 where


Ei : replace the second vector by the sum of the second and (—1) times the first,
E<i : interchange the two vectors,
Es : multiply the second vector by 3,
E4 : replace the second vector by the sum of the second and the first.

7. Ei : multiply the second vector by 2.


E<i : replace the second vector by the sum of the second vector and 3 times the
third vector.
Es : replace the third vector by the sum of the third vector and the first vector.
£4 : multiply the first vector by 2.

9. Ei : replace the third vector by the sum of the third and (2) times the second.
E2 : replace the second vector by the sum of the second and (3) times the first.
Es : replace the first vector by the sum of the first and (—1) times the fourth.
E4 : multiply the fourth vector by 2.

11. Use the inverses of the elementary operations in Problem 6 in reverse order.

13. The elementary operation of type II which replaces the first vector by the sum of
the first vector and zero times the second vector is the identity operation.

Exercises 2.3, page 48

5. (a) Linearly independent (c) Linearly dependent (e) Linearly dependent

Exercises 2.4, page 56

1. A' = {(1,0,5,0,2), (0,1,2,0,1), (0,0,0,1,1), (0,0,0,0,0)}

3. (a) 2 (c) 3

4. (a) {(1,0,0,1), (0,1,0, - 1 ) , (0,0,1,1)} (c) {(1,0,1,1), (0,1,1, - 1 ) }

5. (a) 3, so the set is linearly dependent.

6. (a) (A) = (B) (c) (A) φ (B) (e) (A) = (B) 7. (a) 3

8. Ei : interchange the first and third vectors.


E2 : multiply the first vector by | .
Es : replace the second vector by the sum of the second and (—3) times the first.
£4 : replace the fourth vector by the sum of the fourth and the first.
£5 : multiply the second vector by (—\).
EQ : replace the third vector by the sum of the third and the second.
£7 : replace the fourth vector by the sum of the fourth and (—3) times the second.
Eg : replace the first vector by the sum of the first and (—1) times the second.

9. (a) Ei : replace the second vector by the sum of the second vector and (—1)
times the first.
£2 : replace the first vector by the sum of the first and second.
Es : multiply the second vector by (—1).
£4 : replace the first vector by the sum of the first and (—|) times the second.
£5 : multiply the first vector by 2.
(c) £1 : interchange the first and second vectors.
£2 : replace the second vector by the sum of the second vector and (—4)
times the first vector.
£3 : replace the first vector by the sum of the first vector and (—1) times the
second vector.
£4 : replace the first vector by the sum of the first vector and (—1) times the
second vector.
£5 : multiply the second vector by 2.

Exercises 3.2, page 63

1. {(1,0,0), (0,1,0), (0,0,1)} 3. x = 2y

1 l]
[2 1 1 -1 1 0
5. (a) (c) (e)
[3 2 2 6 1 -1
1 1 1

6. (a) {(4,4), ( - 1 , 1 ) }
(c) {(0,0,0,0), (0,1,4,4), (8, - 2 , 8 , - 1 6 ) , (5,3,10, - 4 ) , (5,2,6, - 8 ) }
(e) { ( 0 , 1 , - 4 ) , (-1,2,4), ( 1 , - 2 , 0 ) }

3 -8 -5
7. (a) is a matrix of transition from the first set to the second set.
0 4 4
(c) No matrix of transition exists.

2
8. (a) (7,14) (c) (2,4,-14,1) 9. (a) (c) -2
-1

11. d\ = £3, di = X2 - X3, dz = xi - X2 13. Q = JZ Pijdj for i = 1,2,..., r


J= l

Exercises 3.3, page 70

[29 - 5 _
9 -18 11 12
1. (a) 6 -6 (c) (e)
29 - 2 -75 - 7
L
[32 - 5

4 - 3 - 4 2
1 -6 -5
- 1 0 32 - 2 5 2
2. (a) BA does not exist. (c) 12 - 2 -11 (e)
- 3 6 69 - 2 4 - 6
18 32 8
-16 12 16 - 8

1 2 2 1
3. (a) n = r (c) n = r and m = t b. A ,B
2 1 1 2

2 1 1 1
7. A = ,B =
2 1 -2 -2

1 2 3 2 1 4
8. A = ,B = ,C =
2 4 2 4 3 3

10. (a) a, = 28 + 12i - 6j - 3ij (c) cy = 6i - 3j + 2ij - 10

11. χχ — 2x2 + £3 = 4
2xi + 3x 3 = 5
xi + 4x 2 - x3 = 6

Exercises 3.4, page 80

1. (a) Singular (c) Nonsingular

2. (a) Elementary (c) Not elementary (e) Elementary (g) Elementary



1 0 0
3. (a) 0 1 0
0 5 1

1 -4 0 0 0 1 1 0 0
4. (a) 0 1 0 (c) Not elementary (e) 0 1 0 (g) 0 1 0
1
0 0 1 1 0 0 0 0 ·>

0 1 1 0 1 0^
5. A~
1 0
.° i . -2 1j

1 0 1 4 -1 0 1 0 -5 0 "i-f" i o]
6. (a) (c)
[θ 4 0 1 0 1
Λ K 0 1 0 1 2 l)

-2 - 3
7. (a) (c) 8. (a) (c)
1 1

1 -3 0
1 -2
13. A 15. 0 1 0
i - l 0
0 0 1

Exercises 3.5, page 89

1. (a) Interchange the first and second columns.


(c) Replace the second column by the sum of the second column and 2 times the
third column.

2. (a) In reduced column-echelon form. (c) Not in reduced row-echelon form.


(e) Not in reduced column-echelon form.

1 -2
3. (a)
0 1

[l 0 1 0 0
5. Not always. Let A = 0 0 and M -2 1 0 . Then A is the matrix of
11 1j 0 0 1
1 0
transition from 83 to A = {(1,0,1), (0,0,1)} and MA = -2 0 is the matrix
1
of transition form £ 3 to B = {(1, - 2 , 1 ) , (0,0,1)}. Now (1, - 2 , 1 ) is in B but not
in (A) since it has a nonzero second component. Thus (B) Φ (A) in this case.

1 0 0 0
0 0 0 1 0 0 0 2 0 0 0
1 0 0 0 1 0 0 0 1 0 0
6. (a) (c) (e)
2 0 0 -1 0 0 0 1 2 0 0
0 1 0 -1 2 0 0 0 0 1 0
1 1 1 0

0 0 0 0
1 0 0 0
(g)
0 0 0 0
0 0 1 0 0

7 3 41
1 0 -1 -1 1 8 8 8
-1 2 -1
2 1 -1 1 n 1 1 7
7. (a) 1 1 0 (c) (e) u 4 4 4

0 0 1 0 n 3 1 5
0 0 1 u 8 8 8

0 0 0 1 0 0 0 1
1 1
2 2 -2 -1 0
1 1
2 2 -1 -1 1
(g) 0 0 1 0 1
0 0 0 1 -2
3 3
2 o 1 0 0

Exercises 3.6, page 97

1. (a) Multiply the first row by 4. (c) Interchange the first and third rows.

2. (a) Not in reduced row-echelon form. (c) Not in reduced row-echelon form.
(e) Not in reduced row-echelon form.

41
1 0 0 8
7
1 0 1 1 0 1 1 0 1 0 8
5
0 1 0 0 1 1 -1 0 0 1 8
(c) (e)
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0

1 0 2 1 0
0 1 1 1 0
0 0 0 0 1
0 0 0 0 0

0 - 1 0 1 0 0 0
0 1 0 - 2 1 0 0
(c)
0 -2 1 1 0 1 0
1 0 0 1 - 2 0 1
7
1 0 "8 0 0
1
0 0 4 0 0 0 0
3
0 0 8 0-8- 0 0 0
(g)
-1 0 -1 0 1 1 0 1
-1 0 -2 1 0 0 1 0
-2 1 0 0 0 0

10
9

(c)

7. (a) A = {(1, - 1 , 0 ) , (0,1,1), (2,0, - 1 ) } (b)



Exercises 3.7, ]page 104

1. (a) A and B are column-equivalent. (c) A and B are not column-equivalent.


2. (a) A and B are not row-equivalent (c) A and B are row-equivalent.

4 14 -7
3. 6 29 - 1 4
5 25 - 1 2

"l 0 \
5. (a) Q = o-l o an( i? are not column-equivalent.
(c) A and
o 19
1 Z
8

0 0 0-1
0 0 -I
6. (a) A and B are not row-equivalent. (c) P =
o i -1
1 0 -f

7. (a) A solution is given by

1 0 0 0 0 0 0 1 0 0 0
B = A 2
0 1 0 3 0 1 0 0 1 0 1
1 0 1 0 1 2 1 o o A 0 0

(c) A and B are not column-equivalent.


9. (a) All nonsingular n x n matrices. (b) All nonsingular n x n matrices.
Exercises 3.8, page 111

1. (a) 2 (c) 2
2. (a) {(1,0, - 1 , 2 ) , (0,1,2, - 1 ) } (c) {(1,0, - 1 , - 2 ) , (0,1,1,1)}
3. In each part except (d), A and B are equivalent.
5. Only A and B are equivalent. 6. (a) No (c) Yes

1 0 0
7. B = {(1,2,0) > (0,0,1),(0,1,0)},P = 0 0 1
-2 1 0

8. (a) A = {(1,1,0,0), (0,1,1,0), (0,0,1,1), (1,2,2,1)}

.A'= {(1,0,0,1), ( 0 , 1 , 0 , - 1 ) , (0,0,1,1), (0,0,0,0)]}


B = {(1,0,0,1), (0,1,0, - 1 ) , (0,0, l, l), (0,0,0,1)}
1 0 0 0 1 0 0-1
0 1 0 0 -1 1 0 - 1
,Q =
0 0 1 0 1-1 1-1
1 1 -1 1 0 0 0 1

0 0 0 1
1 0 0
0 1 - 2 0
9. (a) P = 0 1 0 ,Q =
0 0 1 0
1 -1 1
1 - 2 0 0

1 0 0 0 0 0 0 1
0 1 0 0 3 _9 I
Δ
11. P ,Q = 2 2
0 0 1 0 0 1 0 0
-1 0 0 1 -5 10 0 -5

Exercises 4.2, page 120

Γΐ 0 0 1 0 0 0 0 0 0 0 0
2. I 0 0 î 0 0 > 1 0 î 0 1 î 0 0 î 0 0 3. mn

[o o 0 0 0 0 0 0 1 0 0 1

5. Conditions 3, 4, and 9 fail to hold.


7. V is a vector space with respect to the given operations.
9. Condition 10 fails to hold.
11. V is a vector space with respect to the given operations.
13. Conditions 8 and 9 fail to hold.

15. W is a vector space with respect to the given operations.

Exercises 4.3, page 125

1. (a) Linearly dependent (c) Linearly independent (e) Linearly dependent


2. (a) The set spans P^. (c) The set does not span P^.

3. (a) Linearly dependent (c) Linearly dependent

4. (a) The set is not a basis for P2. (c) The set is a basis for P2·

1 0 1
5. (a) 0 1 1
1 0 0

6. (a) {pi(x),P2(x)} or any subset consisting of two distinct vectors.

[1 0 0 1]
(c) Î or any subset consisting of two distinct vectors.
Il 1 -1 i l
(e) {pi{x),P2(x),P3(x),P4{x)}
7. (a) {(1,0,0,0), (0,0,1,0)} is a basis for W .

9. W is not a subspace of R 3 . The vectors u = (1,1,0) and v = (2,4,6) are in W ,


but u + v = (3,5,6) is not in W since 5 Φ 3 2 .

11. W is not a subspace of R 3 . The vectors u = (2,3,6) and v = (4,5,20) are in W ,


but u + v =(6,8,26) is not in W since 26 φ (6)(8).

13. W is a subspace of P2 over R.

1 2 1 5
15. W is not a subspace of R2X2· The vectors u = and v = are in
3 4 6 7
2 7
W , but u + v = is not in W because its first row, first column element
9 11
is not equal to 1.

1 0 1 0
17. W is not a subspace of R2X2· The vectors u and v = are
0 2 0 -2
1 0
in W , but u + v = is not in W since it is nonzero and not invertible.
0 0

2 3 2 5
19. W is not a subspace of R2X2· The vectors u = and v = are
4 -2 6 2
4 8
in W, but u 4- v = is not in W since 4 2 φ 0 2
10 0

Exercises 4.4, page 129

1· / ί Σ °>ίχί ) = ( α ο, du ■», On)

3. (a) Γ = 2 , / ( α ι ( 1 , - 1 , 1 ) + α 2 ( 0 , 1 , 0 ) ) = (αι,α 2 )
(c) r = 3, / (αι(2,0,4, - 1 ) + a 2 (5, -1,11,8) + a 3 (0,1, - 7 , 9 ) ) = (a l5 a 2 , a 3 )
(e) r = 2 , / ( a i p i ( x ) + a 2 p 2 (x)) = (ai,a 2 )

1 0 1 1
(g) r = 2,/ U + a2 (01,^2)
-1 1 1 1

5. / (aipi(x) + a 2 p 2 (x) + a 3 p 3 (x)) = (ai, a 2 , a 3 )

Exercises 4.5, page 132

1. (a) {(3,3,2,0), (1,1,1,1)} (c) {(3,2,2,0), (1,1,0,0), (1,1,1,1)}

[-1 -2 3 3]
2. (a) {p1(x),p2{x),p3(x)} 3. (a) î
1-2 - 1 2 2 1

2 1 1
4. (a) (A) = (B) (b) <Λ) φ (Β) 5. (a) 0 1 -3
2 0 2

Exercises 4.7, page 142

1. xi = —27, x 2 = 19, x 3 = —6 3. No solution 5. No solution

7. xi = — | x 3 , x 2 = —^x3,X4 = 0;x 3 arbitrary


9. xi = 1 + 2x 2 + X4, x 3 = —X4; x 2 , X4 arbitrary
11. Xi = 5 — 3x 2 — 2x4 — 2x5, x 3 = 3 — 2x4; x 2 , X4, X5 arbitrary
13. Xi = 1 — 2x 2 , x 3 = 1, X4 = 0, X5 = 1; x 2 arbitrary
15. (a) 2 (b) 3 (c) The system is inconsistent since 2 ^ 3 .
17. (a) {(1,2,0,-1,3), (0,0,1,-2,2)}
(b) (3,6,1, -5,11) = (3) (1,2,0, - 1 , 3 ) + (1)(0,0,1, - 2 , 2 )
(2,4,1, - 4 , 8 ) = (2)(1,2,0, - 1 , 3 ) + (1)(0,0,1, - 2 , 2 )
( - 2 , - 4 , 2 , - 2 , - 2 ) = (-2)(1,2,0, - 1 , 3 ) + (2)(0,0,1, - 2 , 2 )
(7,17,3, -13,27) = (7)(1,2,0, - 1 , 3 ) + (3)(0,0,1, - 2 , 2 )
19. (a) a = 2 (b) All real numbers except 2 and —2 (c) a = — 2

21. (b) n + 1
22. 1. 0 3. { ( 1 , - 1 , - 2 ) } 5. 0 7. {(18,1,-4,0)}
9. {(2,1,0,0), (1,0, - 1 , 1 ) } 1 1 . { ( - 3 , 1 , 0 , 0 , 0 ) , ( - 2 , 0 , - 2 , 1 , 0 ) , (-2,0,0,0,1)}
13. {(-2,1,0,0,0)}

Exercises 5.2, page 154

1. (b) (S + T)(xux2) = (xu-x2), (2S-3T)(xux2) = {2x1^bx2^x1-2x2)


2. (a) T is a linear operator. (c) T is not a linear operator.
(e) T is a linear operator.

3. (a) T is a linear transformation. (c) T is not a linear transformation.

4. (a) T is a linear operator. (c) T is not a linear operator.

5. (a) T is not a linear operator. (c) T is a linear operator.

6. (a) T is a linear transformation. (c) T is a linear transformation.


(e) T is a linear transformation.

7. (a) {(3,1,2), ( - 2 , 1 , - 3 ) } (b) {(1,1,1,0), (2,1,0,1)}

9. (a) rank(T) = 3,nullity(T) = 1 10. (a) {(1,0, - § , §)}

13. (a) {(1,2,3), (1,1,1)}

Exercises 5.3, page 165

1. T(x,y) = (-y,x + 2y,x + y)) 3. T(a0 + αιχ + α 2 χ 2 ) = (αι,ο 2 - Οχ)

1 1 -1 2 4 -2
0 5 -2 -1 -2 0
5. 7. (3,0,2) 11. § + | x - f x 2 - 2 x 3
4 0 1 0 -1 2
2 3 1 0 0 1

4 31
2
Γΐ 0 2
13. (3,5) 15. 17. -1 0
1 0
L
1 3J
3 il
19. (a) {(1,0, - 3 ) , (0,1,2)} (c) {(1,0,1,0), (0,1,1,0), (0,0,0,1)}
20. (a) {(1, - 1 , 1 , 0 ) , (3, - 2 , 0 , 1 ) } (c) {(6, -f, 1,0,0), ( - 7 , f, 0,1,0)}

21. (a) { ( l , 0 , l ) , ( 0 , l , - i ) }

22. (a) { ( - 2 , - 2 , - 2 , - 2 , - 3 ) , ( - 1 , - 1 , - 2 , - 2 , - 2 ) , (0,1,1,1,2)}

23. {(0,1,0), (0,0,1)}

Exercises 5.4, page 174

1. 1 + 2x + 3x 2 3. B = {1 + x + x 2 ,2 + x, 2 + x + x2}

1 0 0 0
4 1 0 0
5. (a) (b)(-2,l) 7. 0 1 0 0 9.
-4 0 1 0
0 0 0 0
Ί
3
2 0 0 4 2
[-1 0
11. 1 2 0 13. 15. -1 0
L
0 5J
0 0 1 3 1

17. Λ' = {(1,0,0), (0,1,0), (1, - 1 , 1 ) } , Β' = {(2,1), (3,2)}

18. (a) Λ' = {(-2,0,1,0, - 1 ) , (-2,0,1,0,0), (7,0, - 3 , 0 , 1 ) ,


( - 2 , 0 , 0 , 1 , - 2 ) , (-3,1,0,0,0)}

Β' = εζ

Exercises 5.5, page 183

5 0 - 3 0 4 10 0 0
1. (a) Matrix of S 0 1 6 - 1 , Matrix of T 6 - 1 0 7
2 - 9 5 2 -3 8 0-5
9 10 - 3 0
Matrix of S + T 6 0 6 6
-1 -1 5 -3

- 2 -30 - 6 0
(b) Matrix of 25 - 3T : -18 5 12 - 2 3
13 - 4 2 10 19

3 -3
-2 4

[ 1 1 -7
5. TS has matrix 4 1 relative to the bases {(1,2), (0,1)} of R 2 and £ 3 of R 3 .

6 -5
L

ui = - 2 v i - v 2 4- v 3 0 0 1
-11 - 8
7. (a) u 2 = vi - 2v 2 (b) 1 0 2 9.
16 9
u3 = v2 2 1 5

11. T 4 = 5T 2 - 1ST - 18, T 5 = - 3 T 2 - 38T - 30

Exercises 6.2, page 191

1. (a) 7 (c) 11 (e) 8 2. (a) Even (c) Odd (e) Even

3. (a) Apply the following interchanges in the given order: 5 and 3, 4 and 3, 2 and
3, 2 and 4, 2 and 5.
(c) Apply the following interchanges in the given order: 3 and 1, 5 and 1, 4 and
1, 2 and 1, 2 and 4, 2 and 5, 2 and 3.

5. 1= £ l ( f c ) = ( n - l ) + ( n - 2 ) + --. + 2 + l = 2 i ^ l i
fc=l

Exercises 6.3, page 194

1. (a) Odd (c) Even (e) Odd (g) Odd 3 . 8det(A)


5. a n a 2 2 ·· · α η η 7. det(cA) = cn det(A) 8. (a) —72 (c) αιια 22 α 3 3

Exercises 6.4, page 200

2. (a) 0 (c) 32 (e) - 3 2 3. 24

5. (a) AB = det(A)I3 (b) A " 1 = ^ [Aio]T 6. (a) 24 (c) 0

7. 2 , - 1

Exercises 6.5, page 204

1. (a) - 9 (c) a2 +b2 (e) - 1 7 7 (g) 60


2. (a) xi = —3,x2 = 8,x 3 = 4 (c) χχ = - l , x 2 = l , x 3 = 1
(e) xi =2,x2 = -1,x3 = 4
3. The second determinant is obtained from the first by multiplying the first column
by 2 and the second column by 3. Therefore the second determinant has the value
(2)(3)(139)=834.

7. 2abc(a + b + c)3

Exercises 6.6, page 211

- 1 3 2 -10 4 -1 -34 8 36
(a) è 1 -1 0 (c) 8 -5 -1 (e)i20 12 - 4 -8
1 -1 -2 1 -4 1 33 - 6 - 3 2

2. det(Am) = [det(A)}171 3. 3 5. det(adj(A)) = [det(A)] n _ 1

11. x = 2, y = 4; x = - 1 + iy/3, y = 2 ( - 1 - i>/3) ; x = - 1 - z\/3, y = 2 ( - 1 + ζγ^)

Exercises 7.2, page 221

1. 1,1 3. 5±^5-^/33 5 1?_253 7. 1,1,2 9. 1,2,2

11. - 1 , - 1 , 3 , 3 13. { 2 , - 3 } 15. {1,2} 17. 8, ( - 5 , - 6 ) ; - 1 , (1,3)

19. {1,2}; 1,1 - x, x2; 2,1 + 2x2 21. 3, (3,2); 4, (1,1) 25. 1

Exercises 7.3, page 229


For Problems 1, 3, and 5, each eigenvalue is followed by its algebraic multiplicity and
geometric multiplicity, in that order.

1. 1,2,1 3. 2,1,1; 1,2,1 5. - 1 , 2 , 2 ; 3,2,2

For Problems 7 and 8, each eigenvalue is followed by its algebraic multiplicity and a
basis for the eigenspace.

7. (a) 3,2, {(0,1,0), (2,1,1)}; - 2 , 1 , {(5,4,3)}


(c) 1,1, {(0,1,1)}; 2,1{(-1,1,0)}; 3,1, {(0,0,1)}

8. (a) 1,1, {(1 - i, - 1 ) } ; 4,1, {(1,1 +1)}


(c) 1,2, {(1,1, - 3 ) , (0,1, - 2 ) } ; 6,1, {(2,1,1)}
(e) 1 + i, 2, {(1,0,1), (0,1,0)}; 1,1, { ( 0 , 2 , - 1 ) }

9. For λ = 3 : (a) 1, (b) 1, (c) {1};


For λ = 2 : (a) 2, (b) 2, (c) {1 - x, 1 - x2}

11. For λ = 1 : (a) 1, (b) 1, (c) {1}; For λ = 2 : (a) 2, (b) 1, (c) {2 + x 2 }

13. (a) { ( 1 , - 2 ) , (2,1)} (b) | " 3


0 2

λχ 0 0
0 λ2 0
15. , where Xi is the eigenvalue of T that corresponds to Vj.

0 0 ··· λ „ ^

17. Only In 23. P~lX is a corresponding eigenvector.


Exercises 7.4, page 236
1. a) A is not similar over R to a diagonal matrix. (b) Not possible.
3. a) A is not similar over R to a diagonal matrix. (b) Not possible.

1 1 1 0
0 1 0 0
a) A is similar over R to a diagonal matrix. (b) P =
- 1 0 0 1
0-1 0-1

2 0
a) T can be represented by a diagonal matrix. (b) ,{(2,1), ( 1 , - 2 ) }
0 -3
9. a) T cannot be represented by a diagonal matrix. (b) Not possible
11. a) T can be represented by a diagonal matrix.
3 0 0
b) | 0 3 0 | , {(0,1,0), (2,1,1), (5,4,3)}

0 0-2

13. a) T can be represented by a diagonal matrix


o]
[ l 00 ol
b) 0 2 0 | ,{(0,1,1), ( - 1 , 1 , 0 ) , (0,0,1)}
[o o 3 J
1-i 1
15. a) A is similar over C to a diagonal matrix. (b) P =
-1 1+ i

1 0 2
17. a) A is similar over C to a diagonal matrix. (b) P 1 1 1
-3 -2 1

1 0 0
19. (a) A is similar over C to a diagonal matrix. (b) P = 0 1 2
1 0 -1

0 1 0 0
0 1 0
1 1 0 0 1 0
23. 26. (a) 0 0 1 (c)
0 1 0 0 0 1
2 0 5
-4 0-5 0
Exercises 8.2, page 244
1. / is a linear functional. 3. / is not a linear functional. 5. c = (2, —1,4)
7. /(5,43) = 12, /(xi, x 2 , x3) = 2xx - x 2 + 2x3
9. (a) A* = {pi,p 2 ,p 3 }, where pi(u) = x 3 - x 2 ,p 2 (u) = x2 - xi,p 3 (u) = xx for
u = (xi,X2,x 3 )·
(c) A* = {pi,P2,P3>, where pi(u) = |(2x x + x 2 - x 3 ),
p 2 (u) = | ( - 3 x i - x 2 + 3x 3 ),p 3 (u) = | ( - x i - x 2 + x3) for u = (xi,x 2 ,x 3 ).

10. (a) [ 0 - 1 1 ] , [ - 1 1 θ ] ,[ΐΟθ]

(c) § [ 2 1 - l ] ,5 [ - 3 - 1 3] ,§[-l-ll]

11. (a) [ 10 1 5 ] (c) [ 2 2 6 18 ]


12. (a) / ( e i ) = 3, /(e 2 ) = - 2 , /(e 3 ) = 7
(c) / ( β θ = 2, /(e 2 ) = 0, /(e 3 ) = 4, /(e 4 ) = 12
14. (a) Pl = 4j(5gi - 12g2 + 18g 3 ),p 2 = i(73gi + 38g2 - 16g3),
P3 = ^ ( 6 g i + 2 g 2 - 3 g 3 )
(C) Pl = gl + g2 + 3g 3 - g 4 , P2 = gl - 2g 3 , P3 = gl + g4, P4 = g3
17. (a) {4Λ - 3/ a - / s , 8/1 - 2/ 2 - / 4 } (c) {Λ + /„ - 4/ 3 , / 4 }
Exercises 8.3, page 252
1. Only the required symmetric matrix A is given.
9
-2 -5 -f -7 2
0 2 3 4
4 3 9
(a) (c) -5 0 4 (e) 2
6 0 (g) 3 4 5
3 -2
f 4 -7 0 0 0 4 5 6

2. (a) q(v) = -2y\ + 18y| (c) q(v) = y\- 2yiy3 - 4y2y3 - 3y|

-1 0 o] [ 144 -12 0
3. (a) 0-3 0 (c) -12 -12 12
0 0 4 0 12 -16

Exercises 8.4, page 257

1. V! = ^ ( l , - l , 0 , l ) , v 2 = ^ ( 3 , 2 , 3 , - l ) , v 3 = ^ ( - 2 , - 1 , 3 , 1 )

3 V3 =
· 2 7 l 5 ( 2 U l + U2 - U
3)

4. (a) {1(1,2,-2,4), ^ ( 0 , - 1 , 3 , 2 ) , ^ ( 4 , 1 , 1 , - 1 ) }

(c) {±(2,2,1), ^ ( - 1 , 1 , 0 ) , ^ ( - 1 , -1,4)}

Exercises 8.5, page 263

0 2 ^ 0 4
V^ \/3 - 1 -2\/2 3 1
\/2 -y/3 -1J 2\/2 3 -1

Exercises 8.6, page 269

1. (a) r = 2,p=l,s =0 (c) r■ = 3,p = 2,s = 1


~2
2. (a) q(v) = z\ - z\ (c) q(y) = z\ + zf - zf

| 1 0-2I Γ 2 0 4
3
· (a) P
= 2 ^ l· ">/3 M (c)p=^ -4 3 1
I1 V3 1I [ 4 3 - 1

4
· (a) B = {273^ *· !)» 5(0» - 1 » !)· ä i s i " 2 ' X> X)}
(c) B = { i ( l , -2,2), 1(0,1,1)^(4,1,-1)}

Exercises 8.7, page 274

3 5 1 5+M - 5+i
0 2-1
1. (a) (c) 5 5 2 (e) 3 - 6i 3 + 6i
2 1 1
1 2 0 6 + 4z -6

2. (a) 24 (c) - 7 (e) Ui

84 0 -15 -9-i -5+i


32 - 4 - 1 1
3. (a) c
( ) 0 -3 3 (e) —9 — 3i - 6 + 3i
-8 1 -40
15 3 -9 3-i 3

4. (a) A = {(1,0,0,0), (-4,1,0,0), (1, - 1 , 1 , 0 ) , (2, - 1 , 0 , 1 ) } ,


B' = {(1,0,0), (0,1,0), ( - 2 , 5 , 1 ) }
(c) A' = {(0,0,0,1), (0,1,0, - 2 ) , (0, - 2 , 1 , 0 ) , (1,0,0,0)},
ff = {(1,0,0), (0,1,0), ( - 1 , - 1 , 1 ) }
Exercises 8.8, page 281

1 0 -6 1 1 -2 1 1 -3
1. (a) P = 1 0 4 (c)P = 1 -1 3 (e)P = 1 -1 2
0 1 5 0 0-2 0 0 1

2. (a) {(2,1,2), (0,1, - 1 ) , ( - 2 , - 1 , - 7 ) } (c) {(2,1,2), (0,1,0), (1, - 4 , 3 ) }


(e) {(2,1,2), (0,1,0), ( - 1 , - 2 , - 2 ) }

Exercises 8.9, page 291

1. Those in parts (a), (b), (d), (e), (g).

1 1
2. (a) For part (a), P = ; for part (c), the matrix is not hermitian;
0 hi
1 1 -4-i 1 1 - 1
for part (e), P = 0 bi 5 - 2z ; for part (g), P = 0 0 2i
0 0 9 0 11 1
(b) In part (a), r = p = 2;in part (c), the matrix is not hermitian; in part (e),
r = p = 3; in part (g), r = p = 3
(c) Those in parts (a) and (b) are conjunctive, and those in parts (e) and (g) are
conjunctive.

3 0
3· (a) =i
3Λ/5
(c) The matrix in 1(c) is not hermitian.
0 5i

3 1 -4-< 3Λ/6 1 -vTl

(·) 3VE
ό 0 U 5 - 2i (e)ds
V66 0 0 2iv/TÏ
0 0 9 0 11 Λ/ΪΪ

4. (a) Those for the matrices in parts (a), (b), (d), (e), (g).
(b) For / in part (a), A! = {(1,0), (1,5<)} ;
for / in part (e), M = {(1,0,0), (1,5i, 0), (4 + i, - 5 + 2i, - 9 ) } ;
for / in part (g), A! = {(1,0,0), (1,0,11), (1, - 2 i , - 1 ) }

Exercises 9.2, page 296

1. (a) (u,v) = (v,u) = - 3

2. (a) (u, v) = 15 + 3z, (v, u) = 15 - Si

4. (a) The rule defines an inner product on R 2 .


(c) The rule does not define an inner product on R 2 . If v = (1, —1), then (v, v) =
0, and this contradicts the requirement that (v, v) > 0 if v ^ 0.
(e) The rule does not define an inner product on R 3 . If v = (0,1,0), then (v, v) =
0, contradicting the property that requires (v, v) > 0 if v φ 0.

5. (u, v) = hü\V\ + ( - 5 + i)ü\v<i + (2 — ΐ)ϋ\ν$ + ( - 5 - i)ü2vi + TÜ2V2 + ( - 5 + i)Ü2V3


+(2 + i)Û3Vi + ( - 5 - i)ûsV2 + 6Û 3 Î;3

ΐ . ^ = { ^ ( 1 > 0 , 0 ) ^ δ ( - 3 , - 5 , - 5 ) , ^ δ ( 9 > 1 1 + « , 5 + 2·)}

5 i
9. (a)
-i 2
(bM={0?.o), (**#)}
13. Those in parts (a), (b), (e), (g).

Exercises 9.3, page 301

1. (a) Vïï (c) y/5 2. (a) y/ÏÂ (c) y/ï

3. (a) We have |(u, v)| = | — 13| = 13 and ||u|| · ||v|| = x/Ï4\/38 = Λ/532 = 2>/Ï33.
Since 13 < 2Λ/133, the inequality is verified.
(b) y/Ü (c) χ/78

5. d(u, v) = y/\u\ — vi\2 + \u2 — V2I2, where u = (1/1,1x2) and v = (^1,^2)


17
1 1 . COS0: 2ν/9Ϊ
13. 3

Exercises 9.4, page 305

2. (a) ^(3^V2,-l)^{l,-y/2,-l)^(-3,V2,-5)]

4. (a) { ^ ( i > - i > i ) > 5 J s ( 3 + i , l + 3 i > - 2 i ) }



[l 0 0 1 0 0 o ol
5 5 5
0 0 0 0 1 0 0 1 1
7. {(1,1,1), (0, - 1 , - 1 ) , ( - 9 , - 1 1 - 2t, - 5 - 2<)}

8
* (a) {(i'Ti'iJ'v~2'v5>""iJ'( 2 Z '°'^~^);
Exercises 9.5, page 308

4. (a) { ^ ( 2 , 1 , 0 , 0 ) , ^ ( 3 , - 6 , 1 0 , 5 ) } 5. (a) { ^ (i.1,0)}

Exercises 9.6, page 312


1 _2 2
3 3 3
1. (a) No (c) Yes 2. (a) Yes (c) No 3. 2 _2 _1
3 3 3
2 _1
3 3

Exercises 9.7, page 317

1. Those in parts (a) ,(c), (d).


2. The matrix in (b) is orthogonal. Those in parts (b) and (e) are unitary. All are
normal.
3. Only x = y = z = 0

Exercises 9.8, page 322

1. (a) Τ*(αι,α 2 ) = (αι + ( 1 - ί ) α 2 , ( 1 + ί ) α ι + 2 α 2 )


(c) Τ*(αι,α 2 ) = (-iai +α2,ζαι)

2. (a) | - 7 ^ ( - l + h 1), -7^(1,1 + i)\ (c) No such basis exists.

3. All of the matrices A are normal. A matrix U of the required type is given for
parts (a), (c), and (e).

1 1 1
z — 1 —i
(a) U ( ^ = 75
y/2 2 2
1 1 _1
z 1+i
L y/2 2 2

y/2 1-i 2
(e) U. 2y/2
\/2i 1+i -2i
-y/2 -\/2i 2 0

Exercises 10.2, page 329

5. Let Pi,P2 be defined on R 2 by P\{x\,X2) = ( ^ ι , ^ ι ) , and P2(x\,X2) — (0, X1+X2)

Exercises 10.3, page 334

2. (a) T is diagonahzable. (b) Pi + P 2 is not a projection.

Exercises 10.4, page 345

2. (a) (x - 2)2(x + 3) (c) (x - 2) 2 (x + 2)

±V-l-bc b
3. A = , with —be > 1.
c =F\/-1 - be

4. (a) T = 3Pi — 2P 2 , where Pi, P2 have matrices E\,Ei relative to 83 given by


2 1 -1 -1 -1 1
E,= 1 2 -1 5^2 -1 -1 1
3 3-2 -3 -3 3

   (c) T = P₁ - P₂ + 3P₃, where P₁, P₂, P₃ have matrices E₁, E₂, E₃ relative to ℰ₄ given by

       E₁ = -(1/2) [   6    4    5    2 ]        E₂ = (1/4) [   2    1    2   0 ]
                   [ -12   -8  -10   -4 ]                   [ -12   -6  -12   0 ]
                   [   0    0    0    0 ]                   [   8    4    8   0 ]
                   [   0    0    0    0 ]                   [  -2   -1   -2   0 ]

       E₃ = (1/4) [  14    7    8    4 ]
                  [ -12   -6   -8   -8 ]
                  [  -8   -4   -4    0 ]
                  [   2    1    2    4 ]
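The matrices Eᵢ in 4(a) and 4(c) form complete sets of projections: each Eᵢ is idempotent, products of distinct ones are zero, and they sum to the identity. A numerical check for part (a), assuming the entries above are transcribed correctly.

import numpy as np

E1 = np.array([[ 2,  1, -1],
               [ 1,  2, -1],
               [ 3,  3, -2]])
E2 = np.array([[-1, -1,  1],
               [-1, -1,  1],
               [-3, -3,  3]])

print(np.array_equal(E1 @ E1, E1))                    # True
print(np.array_equal(E2 @ E2, E2))                    # True
print(np.array_equal(E1 @ E2, np.zeros((3, 3))))      # True
print(np.array_equal(E1 + E2, np.eye(3)))             # True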

5. (a) A = D + N, where the diagonalizable matrix D and the nilpotent matrix N are given by

       D = [  5   2   2   2 ]        N = [  2   1   1   0 ]
           [  4   3   4  -4 ]            [ -4  -2  -2   0 ]
           [ -8  -4  -5   0 ]            [  0   0   0   0 ]
           [  0   0   0   3 ]            [  2   1   1   0 ]
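The decomposition can be verified directly: N is nilpotent (here N² = 0) and D commutes with N. A numerical check for part (a), assuming the entries above are transcribed correctly.

import numpy as np

D = np.array([[ 5,  2,  2,  2],
              [ 4,  3,  4, -4],
              [-8, -4, -5,  0],
              [ 0,  0,  0,  3]])
N = np.array([[ 2,  1,  1,  0],
              [-4, -2, -2,  0],
              [ 0,  0,  0,  0],
              [ 2,  1,  1,  0]])

print(np.array_equal(N @ N, np.zeros((4, 4))))   # True: N is nilpotent
print(np.array_equal(D @ N, N @ D))              # True: D and N commute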

   (c) A = D + N, where the diagonalizable matrix D and the nilpotent matrix N are given by

       D = [  2   3   0  -1 ]        N = [  0  -1   0   1 ]
           [  1   2   1   0 ]            [ -1   0   1   0 ]
           [  0   1   2   1 ]            [  0  -1   0   1 ]
           [ -1   0   3   2 ]            [ -1   0   1   0 ]

Exercises 10.5, page 354

1. (a) B = {(-4, 3, -1, 0), (-5, 3, -1, 0), (3, -2, 1, 0), (0, 0, 0, 1)},

       A′ = [ 0 1 0 0 ]
            [ 0 0 1 0 ]
            [ 0 0 0 1 ]
            [ 0 0 0 0 ]

   (c) B = {(-2, 4, 0, 0, 2), (-1, 0, 2, 1/2, 1), (1, 0, 0, 0, 0); (1, -2, 0, 1, -1), (0, 0, 0, 0, 1)},

       A′ = [ A₁  0 ]   with  A₁ = [ 0 1 0 ]   and  A₂ = [ 0 1 ]
            [ 0  A₂ ]              [ 0 0 1 ]             [ 0 0 ]
                                   [ 0 0 0 ]

2. (a) B(v, T) = {(4, -3, 1, 0), (1, 0, 0, 0), (0, -1, 0, 0), (1, 1, 0, -1)},

       [ 0 1 0 0 ]
       [ 0 0 1 0 ]
       [ 0 0 0 1 ]
       [ 0 0 0 0 ]

   (c) B(v, T) = {(2, -4, 0, 0, -2), (1, 0, -2, 0, -1), (-1, -1, 1, 0, 1)},

       [ 0 1 0 ]
       [ 0 0 1 ]
       [ 0 0 0 ]
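The matrices in these answers are the canonical nilpotent blocks: 1's on the superdiagonal and 0's elsewhere. In Python such a block can be produced with np.eye(n, k=1); for example:

import numpy as np

print(np.eye(4, k=1))   # the 4 x 4 block appearing in 1(a) and 2(a)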

Exercises 10.6, page 361

1. P = [ -2   1   0 ]
       [  0  -2   0 ]
       [ -4   0   1 ]

2. (a) [ 3  1  0   0 ]        (c) [ 0  1  0  0 ]
       [ 0  3  0   0 ]            [ 0  0  0  0 ]
       [ 0  0  1   0 ]            [ 0  0  2  1 ]
       [ 0  0  0  -1 ]            [ 0  0  0  2 ]
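Jordan canonical forms such as these can be checked with a computer algebra system. For example, sympy's Matrix.jordan_form returns a transition matrix P and the Jordan matrix J with A = P J P⁻¹; the matrix below is a placeholder, not one of the exercise matrices.

from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])       # placeholder matrix

P, J = A.jordan_form()        # J = P**-1 * A * P
print(J)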

3. (a) B′ = {(1, -2, 0, 1), (0, 2, -1, -1), (1, -2, 0, 0), (0, 1, -1, 0)},

       P = [  1   0   1   0 ]
           [ -2   2  -2   1 ]
           [  0  -1   0  -1 ]
           [  1  -1   0   0 ]

   (c) B′ = {(1, -1, 1, -1), (2, -1, 0, 1), (-1, -1, -1, -1), (3, 2, 1, 0)},

       P = [  1   2  -1   3 ]
           [ -1  -1  -1   2 ]
           [  1   0  -1   1 ]
           [ -1   1  -1   0 ]

4. (a) A = [  5   2   2   2 ]   [  2   1   1   0 ]
           [  4   3   4  -4 ] + [ -4  -2  -2   0 ]
           [ -8  -4  -5   0 ]   [  0   0   0   0 ]
           [  0   0   0   3 ]   [  2   1   1   0 ]
   (c) A = D + N, where the entries of D and N are not legible in this reproduction
5. D has matrix K and N has matrix L relative to ℰ₆, where
-1 0 0 0 0 0 1 0 0 0 0
0 1 0 0 0 -1 0 1 0 0 0
0 0 -1 0 0 0 1 0 0 0 0
K and L
0 1 0-2 0 -2 0 1 1 1 0
1 -1 0 0-2 2 1 -1 -1 -1 0
0 1 0 0 0-2 -1 0 1 0 0 0
Index
A
Addition
  of equations, 134
  of linear transformations, 148
  of matrices, 117
  of subsets, 16
  of vectors, 2
Adjoint
  of a linear operator, 318
  of a matrix, 209
Algebraic multiplicity, 225
Annihilator, 245
Associated
  direct sum decomposition, 327
  eigenvector, 213, 215
Augmented matrix, 137

B
Basis, 29
  cyclic, 348
  dual, 241
  orthogonal, 254
  standard, 33, 54, 131
Bijective mapping, 128
Bilinear form, 270
  complex, 283
  hermitian complex, 286
  matrix of, 271
  rank of, 273
  skew-symmetric, 282
  symmetric, 277
Binary relation, 99

C
Canonical form
  for quadratic form, 266
  Jordan, 359
Cartesian product, 270
Cauchy-Schwarz inequality, 299
Change of variable
  linear, 251
  orthogonal, 255
Characteristic
  equation, 214
  matrix, 214
  polynomial, 214
  root, 213
  value, 213
  vector, 213
Coefficient matrix, 137
Cofactor, 196
Column
  equivalent matrices, 99
  matrix, 60
  operation, 83
  space, 106
  vectors, 106
Companion matrix, 238
Complete set of projections, 327
Complex
  bilinear form, 283
  inner product space, 293
Components, 1
Conformable, 67
Congruence of matrices, 251
Conjugate
  of a matrix, 259
  transpose, 259
Conjunctive matrices, 285
Coordinate, 33, 62
  matrix, 62
  projections, 241


Corresponding
  direct sum decomposition, 327
  eigenvector, 213, 216
Cramer's rule, 203
Cross product, 28
Cyclic
  basis, 348
  subspace, 348

D
Determinant, 191
  of order n, 192
Diagonal
  block matrix, 348
  elements, 60
  matrix, 60
Dimension, 33, 59, 115
Direct sum, 39
  associated, 327
Directed line segment, 19
Distance, 301
Division algorithm, 336
Divisor, 336
Dot product, 24
Dual
  basis, 241
  space, 242

E
Eigenspace, 223
Eigenvalue, 213-214
Eigenvector, 213, 215
Elementary
  column operation, 83
  matrix, 77
  operation on vectors, 42
  row operation, 91
Equality
  of equations, 134
  of functions, 117
  of indexed sets, 29
  of mappings, 128
  of matrices, 60
  of polynomials, 116
  of vectors, 1
Equivalence, 106
  column, 99
  relation, 99
  row, 103
Equivalent
  matrices, 106
  systems of equations, 135
Euclidean space, 293

F
Field, 113
Finite-dimensional vector space
Form
  bilinear, 270
  complex bilinear, 283
  hermitian, 287
  quadratic, 247, 280
Function, 128

G
Gauss-Jordan elimination, 139
Geometric multiplicity, 223
Gram-Schmidt process, 255, 304
Greatest common divisor, 336

H
Hamilton-Cayley Theorem, 338
Hermitian
  complex bilinear form, 286
  congruent matrices, 285
  form, 287
  matrix, 286
  operator, 319

I
Idempotent
  linear operator, 326
  principal, 345
Identity
  matrix, 60, 74
  operation, 42
Image, 128, 145
  inverse, 128, 145
Index
  of a permutation, 188

  of an integer, 187
  of hermitian form, 288
  of nilpotency of a matrix, 355
  of nilpotency of an operator, 344
  of quadratic form, 266
  of real symmetric matrix, 267
  set, 13
Inequality
  Cauchy-Schwarz, 299
Infinite-dimensional vector space, 115
Injective mapping, 128
Inner product, 24, 293
  generated by a matrix, 296
  space, 293
  standard, 295
Invariant subspace, 341
Inverse
  image, 128, 145
  of elementary operation, 43
  of linear transformation, 181
  of matrix, 74
Invertible linear transformation, 181
Invertible matrix, 74
Isometry, 309
Isomorphic vector spaces, 128
Isomorphism, 128

J
Jordan
  canonical form, 359
  canonical matrix, 356
  matrix, 356

K
Kernel, 151
Kronecker delta, 8, 41

L
Leading one, 51, 54, 84, 93
Leading variables, 139
Length, 24, 298
Linear
  change of variable, 251
  combination, 3
  dependence, 7
  functional, 239
  operator, 146
  transformation, 145
Linearly
  dependent, 3, 5, 7
  independent, 5
Lower triangular matrix, 313

M
Mapping, 128
  bijective, 128
  injective, 128
  one-to-one, 128
  onto, 128
  surjective, 128
Matrix, 59
  augmented, 137
  characteristic, 214
  coefficient, 137
  column, 60
  companion, 238
  congruence, 251
  conjugate, 259
  conjugate transpose, 259
  coordinate, 62
  diagonal, 60
  diagonal block, 348
  elementary, 77
  hermitian, 286
  identity, 60, 74
  inverse, 74
  invertible, 74
  Jordan, 356
  lower triangular, 313
  nilpotent, 345, 355
  nonsingular, 72
  normal, 316
  of bilinear form, 271
  of complex bilinear form, 283
  of constants, 137
  of linear transformation, 157
  of quadratic form, 249, 281
  of unknowns, 137
  orthogonal, 254
  polynomial, 337

  positive definite, 269
  positive semidefinite, 269
  projection, 345
  reduced column-echelon, 84, 86
  row, 60
  scalar, 83
  similarity, 225
  singular, 72
  skew-hermitian, 292
  skew-symmetric, 92, 211, 253
  square, 60
  symmetric, 92
  transformation, 147
  transition, 61
  unitary, 304
  upper triangular, 313
  zero, 60, 117
Metric, 301
  space, 301
Minimal polynomial, 338
Minor, 196
Monic polynomial, 336
Multiplication
  of a scalar and a matrix, 83, 117
  of a scalar and a vector, 2, 114
  of a scalar and an equation, 134
  of linear transformations, 178
  of matrices, 66
Multiplicity
  algebraic, 225
  geometric, 223

N
Natural ordering, 187
Negative
  definite hermitian form, 288
  definite quadratic form, 267
  semidefinite hermitian form, 288
  semidefinite quadratic form, 267
Nilpotent
  linear operator, 344
  matrix, 345, 355
Nonnegative linear operator, 335
Nonsingular
  linear transformation, 181
  matrix, 72
Norm, 24, 298, 301
Normal
  linear operator, 319
  matrix, 316
Normalized set of vectors, 254, 305
Nullity, 151
Number field, 114

O
One-to-one mapping, 128
Order
  of a determinant, 192
  of a matrix, 60
Orthogonal
  basis, 254
  change of basis, 255
  change of variable, 255
  complement, 307
  linear operator, 258, 309
  matrix, 254
  operator, 309
  projection, 308, 325
  set, 27, 303
  set of projections, 327
  similarity, 313
  vectors, 26-27, 303
Orthogonally similar matrices, 313
Orthonormal set of vectors, 254, 303

P
Parallelogram rule, 20
Parameters, 139
Permutation, 187
Perp, 306
Polynomial
  characteristic, 214
  matrix, 337
  minimal, 338
  monic, 336
Positive
  definite hermitian form, 288
  definite quadratic form, 267
  semidefinite hermitian form, 288
  semidefinite quadratic form, 267

Principal idempotents, 345
Product
  cartesian, 270
  cross, 28
  dot, 24
  inner, 24, 293
  of a scalar and a linear transformation, 148, 177
  of a scalar and a matrix, 83, 117
  of a scalar and a vector, 2, 114
  of linear transformations, 178
  of matrices, 66
  scalar, 24
Projection, 26, 241, 325-326
  matrices, 345
  orthogonal, 308, 325
  scalar, 26
Proper
  number, 213
  value, 213
  vector, 213

Q
Quadratic form, 247, 280
  index of, 266
  matrix of, 281
  negative definite, 267
  negative semidefinite, 267
  positive definite, 267
  positive semidefinite, 267
  rank of, 264
  real, 247
  signature of, 266

R
Range, 150
Rank
  of bilinear form, 273
  of linear transformation, 150
  of matrix, 106
  of real quadratic form, 264
Real
  coordinate space, 1
  coordinate vectors, 1
  inner product space, 293
  quadratic form, 247
Reduced column-echelon form, 84, 86
Reduced row-echelon form, 93, 95
Reflexive property, 99
Relation
  binary, 99
  equivalence, 99
Restriction of a linear operator, 341
Row
  equivalent matrices, 103
  matrix, 60
  operation, 91

S
Scalar, 1, 114
  component, 26
  matrix, 83
  multiple, 2, 83
  multiplication, 2, 114, 134
  product, 24
  projection, 26
Self-adjoint linear operator, 319
Signature
  of hermitian form, 288
  of quadratic form, 266
Similar matrices, 225
Singular matrix, 72
Skew-adjoint linear operator, 323
Skew-hermitian
  linear operator, 323
  matrix, 292
Skew-symmetric
  bilinear form, 282
  linear operator, 323
  matrix, 92, 211, 253
Solution
  of a system of equations, 134
  set, 134
Space
  column, 106
  complex inner product, 293
  dual, 242
  Euclidean, 293
  metric, 301

  real coordinate, 1
  real inner product, 293
  unitary, 293
  vector, 114
Span, 13
Spectral decomposition
  of a linear operator, 332
  of a matrix, 345
Spectrum, 213-214
Square matrix, 60
Square root of a linear operator, 335
Standard
  basis, 33
  basis of subspace, 54, 131
  inner products, 295
Submatrix, 211
Subspace, 10, 122
  cyclic, 348
  invariant, 341
  spanned by a set, 15, 123
Sum
  direct, 39
  of linear transformations, 148
  of matrices, 117
  of subsets, 16
  of vectors, 2
Superdiagonal elements, 350
Surjective mapping, 128
Symmetric
  bilinear form, 277
  matrix, 92
  operator, 319
  property, 99

T
Trace, 155, 231, 240, 297
Transformation, 128
  linear, 145
  matrix, 147
Transition matrix, 61
Transitive property, 99
Transpose, 92

U
Unit vector, 27
Unitarily similar matrices, 313
Unitary
  matrix, 304
  operator, 309
  similarity, 313
  space, 293
Upper triangular matrix, 313

V
Vector, 1, 114
  column, 106
  component, 1, 26
  geometric interpretation, 19
  norm, 24, 298
  projection, 26
  space, 114
  unit, 27

Z
Zero
  linear transformation, 146
  matrix, 60, 117
  subspace, 11
