
Jin Ho Kwak

Sungpyo Hong

Linear Algebra
Second Edition

Springer Science+Business Media, LLC


Contents

Preface to the Second Edition  v
Preface to the First Edition  ix

1 Linear Equations and Matrices  1
  1.1 Systems of linear equations  1
  1.2 Gaussian elimination  4
  1.3 Sums and scalar multiplications of matrices  11
  1.4 Products of matrices  15
  1.5 Block matrices  19
  1.6 Inverse matrices  21
  1.7 Elementary matrices and finding A^{-1}  23
  1.8 LDU factorization  29
  1.9 Applications  34
    1.9.1 Cryptography  34
    1.9.2 Electrical network  36
    1.9.3 Leontief model  37
  1.10 Exercises  40

2 Determinants  45
  2.1 Basic properties of the determinant  45
  2.2 Existence and uniqueness of the determinant  50
  2.3 Cofactor expansion  56
  2.4 Cramer's rule  61
  2.5 Applications  64
    2.5.1 Miscellaneous examples for determinants  64
    2.5.2 Area and volume  67
  2.6 Exercises  72

3 Vector Spaces  75
  3.1 The n-space ℝ^n and vector spaces  75
  3.2 Subspaces  79
  3.3 Bases  82
  3.4 Dimensions  88
  3.5 Row and column spaces  91
  3.6 Rank and nullity  96
  3.7 Bases for subspaces  100
  3.8 Invertibility  106
  3.9 Applications  108
    3.9.1 Interpolation  108
    3.9.2 The Wronskian  110
  3.10 Exercises  112

4 Linear Transformations  117
  4.1 Basic properties of linear transformations  117
  4.2 Invertible linear transformations  122
  4.3 Matrices of linear transformations  126
  4.4 Vector spaces of linear transformations  131
  4.5 Change of bases  134
  4.6 Similarity  138
  4.7 Applications  143
    4.7.1 Dual spaces and adjoint  143
    4.7.2 Computer graphics  148
  4.8 Exercises  152

5 Inner Product Spaces  157
  5.1 Dot products and inner products  157
  5.2 The lengths and angles of vectors  160
  5.3 Matrix representations of inner products  163
  5.4 Gram-Schmidt orthogonalization  164
  5.5 Projections  168
  5.6 Orthogonal projections  170
  5.7 Relations of fundamental subspaces  175
  5.8 Orthogonal matrices and isometries  177
  5.9 Applications  181
    5.9.1 Least squares solutions  181
    5.9.2 Polynomial approximations  186
    5.9.3 Orthogonal projection matrices  190
  5.10 Exercises  196
1
Linear Equations and Matrices

1.1 Systems of linear equations


One of the central motivations for linear algebra is solving a system of linear equations.
We begin with the problem of finding the solutions of a system of m linear equations
in n unknowns of the following form:

    a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
        ...
    a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m,

where x_1, x_2, ..., x_n are the unknowns and the a_ij's and b_i's denote constant (real or
complex) numbers.
A sequence of numbers (s_1, s_2, ..., s_n) is called a solution of the system if x_1 =
s_1, x_2 = s_2, ..., x_n = s_n satisfy each equation in the system simultaneously. When
b_1 = b_2 = ... = b_m = 0, we say that the system is homogeneous.
The central topic of this chapter is to examine whether or not a given system has
a solution, and to find the solution if it has one. For instance, every homogeneous
system always has at least one solution x_1 = x_2 = ... = x_n = 0, called the trivial
solution. Naturally, one may ask whether such a homogeneous system has a nontrivial
solution or not. If so, we would like to have a systematic method of finding all the
solutions. A system of linear equations is said to be consistent if it has at least one
solution, and inconsistent if it has no solution.
For example, suppose that the system has only one linear equation

    a_1 x_1 + a_2 x_2 + ... + a_n x_n = b.

If a_i = 0 for i = 1, ..., n, then the equation becomes 0 = b. Thus it has no solution
if b ≠ 0 (nonhomogeneous), or has infinitely many solutions (any n numbers x_i's
can be a solution) if b = 0 (homogeneous).
In any case, if all the coefficients of an equation in a system are zero, the equation
is vacuously trivial. In this book, when we speak of a system of linear equations, we


always assume that not all the coefficients in each equation of the system are zero
unless otherwise specified.

Example 1.1 The system of one equation in two unknowns x and y is

    ax + by = c,

in which at least one of a and b is nonzero. Geometrically this equation represents a


straight line in the xy-plane. Therefore, a point P = (x, y) (actually, the coordinates
x and y) is a solution if and only if the point P lies on the line . Thus there are infinitely
many solutions which are all the points on the line.

Example 1.2 The system of two equations in two unknowns x and y is

    a_1 x + b_1 y = c_1
    a_2 x + b_2 y = c_2.

Solution: (I) Geometric method. Since the equations represent two straight lines in
the xy-plane, only three types are possible as shown in Figure 1.1.

    Case (1)              Case (2)             Case (3)

    {  x -  y = -1        { x - y = -1         { x + y = 1
    { 2x - 2y = -2        { x - y =  0         { x - y = 0

                Figure 1.1. Three types of solution sets

Since a solution is a point lying on both lines simultaneously, by looking at the


graphs in Figure 1.1, one can see that only the following three types of solution sets
are possible:
(1) the straight line itself if they coincide,
(2) the empty set if the lines are parallel and distinct,
(3) only one point if they cross at a point.

(II) Algebraic method. Case (1) (two lines coincide): Let the two equations rep-
resent the same straight line, that is, one equation is a nonzero constant multiple of
the other. This condition is equivalent to

    a_2 = λa_1,   b_2 = λb_1,   c_2 = λc_1   for some λ ≠ 0.

In this case, if a point (s, t) satisfies one equation, then it automatically satisfies the
other too. Thus, there are infinitely many solutions which are all the points on the
line.
Case (2) (two lines are parallel but distinct): In this case, a_2 = λa_1, b_2 = λb_1,
but c_2 ≠ λc_1 for λ ≠ 0. (Note that the first two equalities are equivalent to
a_1 b_2 - a_2 b_1 = 0.) Then no point (s, t) can satisfy both equations simultaneously, so
that there are no solutions.
Case (3) (two lines cross at a point): Let the two lines have distinct slopes, which
means a_1 b_2 - a_2 b_1 ≠ 0. In this case, they cross at a point (the only solution), which
can be found by the elementary method of elimination and substitution. The following
computation shows how to do this:
Without loss of generality, one may assume a_1 ≠ 0 by interchanging the two
equations if necessary. (If both a_1 and a_2 are zero, the system reduces to a system of
one variable.)

(1) Elimination: The variable x can be eliminated from the second equation by adding
-a_2/a_1 times the first equation to the second, to get

    { a_1 x + b_1 y = c_1
    {     ((a_1 b_2 - a_2 b_1)/a_1) y = (a_1 c_2 - a_2 c_1)/a_1.

(2) Since a_1 b_2 - a_2 b_1 ≠ 0, y can be found by multiplying the second equation by
the nonzero number a_1/(a_1 b_2 - a_2 b_1) to get

    y = (a_1 c_2 - a_2 c_1)/(a_1 b_2 - a_2 b_1).

(3) Substitution: Now, x is solved by substituting the value of y into the first equation,
and we obtain the solution to the problem:

    x = (b_2 c_1 - b_1 c_2)/(a_1 b_2 - a_2 b_1),    y = (a_1 c_2 - a_2 c_1)/(a_1 b_2 - a_2 b_1).

Note that the condition a_1 b_2 - a_2 b_1 ≠ 0 is necessary for the system to have only
one solution. □
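The closed-form solution just derived is easy to check numerically. The following short Python sketch is only an illustration (the function name solve_2x2 and the sample system are not from the text); it assumes a_1 b_2 - a_2 b_1 ≠ 0, exactly the condition singled out above.

    def solve_2x2(a1, b1, c1, a2, b2, c2):
        """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 when a1*b2 - a2*b1 != 0."""
        det = a1 * b2 - a2 * b1
        if det == 0:
            raise ValueError("a1*b2 - a2*b1 = 0: no unique solution")
        x = (b2 * c1 - b1 * c2) / det
        y = (a1 * c2 - a2 * c1) / det
        return x, y

    # Case (3) of Figure 1.1: x + y = 1, x - y = 0, whose unique solution is x = y = 1/2.
    print(solve_2x2(1, 1, 1, 1, -1, 0))    # (0.5, 0.5)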

In Example 1.2, the original system of equations has been transformed into a
simpler one through certain operations, called elimination and substitution, whose
solution is just the solution of the given system. That is, if (x, y) satisfies the original system
of equations, then it also satisfies the simpler system in (3), and vice-versa. As in
Example 1.2, we will see later that any system of linear equations may have either no
solution, exactly one solution, or infinitely many solutions. (See Theorem 1.6.)
Note that an equation ax + by + cz = d, (a, b, c) ≠ (0, 0, 0), in three unknowns
represents a plane in the 3-space ℝ^3. The solution set includes

    {(x, y, 0) | ax + by = d} in the xy-plane,
    {(x, 0, z) | ax + cz = d} in the xz-plane,
    {(0, y, z) | by + cz = d} in the yz-plane.

One can also examine the various possible types of the solution set of a system of
three equations in three unknowns. Figure 1.2 illustrates three possible cases.

    Infinitely many solutions        Only one solution        No solutions

                    Figure 1.2. Three planes in ℝ^3

Problem 1.1 For a system of three linear equations in three unknowns

    { a_11 x + a_12 y + a_13 z = b_1
    { a_21 x + a_22 y + a_23 z = b_2
    { a_31 x + a_32 y + a_33 z = b_3,

describe all the possible types of the solution set in the 3-space ℝ^3.

1.2 Gaussian elimination


A basic idea for solving a system of linear equations is to transform the given system
into a simpler one, keeping the solution set unchanged, and Example 1.2 shows an
idea of how to do it. In fact, the basic operations used in Example 1.2 are essentially
only the following three operations , called elementary operations:

(1) multiply a nonzero constant throughout an equation,


(2) interchange two equations,
(3) add a constant multiple of an equation to another equation.
It is not hard to see that none of these operations alters the solutions. That is, if the
x_i's satisfy the original equations, then they also satisfy those equations altered by
the three operations, and vice-versa.
Moreover, each of the three elementary operations has its inverse operation which
is also an elementary operation:

(1') multiply the equation with the reciprocal of the same nonzero
constant,
(2') interchange two equations again,
(3') add the negative of the same constant multiple of the equation to
the other.
Therefore, by applying a finite sequence of the elementary operations to the given
original system, one obtains another new system, and by applying these inverse oper-
ations in reverse order to the new system, one can recover the original system. Since
none of the three elementary operations alters the solutions, the two systems have
the same set of solutions. In fact, a system may be solved by transforming it into a
simpler system using the three elementary operations finitely many times.
These arguments can be formalized in mathematical language. Observe that in
performing any of these three elementary operations, only the coefficients of the
variables are involved in the operations, while the variables x_1, x_2, ..., x_n and the
equal sign "=" are simply repeated. Thus, keeping the places of the variables and "="
in mind, we just pick up the coefficients only from the given system of equations and
make a rectangular array of numbers as follows:

    [ a_11  a_12  ...  a_1n  b_1 ]
    [ a_21  a_22  ...  a_2n  b_2 ]
    [  ...                       ]
    [ a_m1  a_m2  ...  a_mn  b_m ]

This matrix is called the augmented matrix of the system. The term matrix means
just any rectangular array of numbers, and the numbers in this array are called the
entries of the matrix. In the following sections, we shall discuss matrices in general.
For the moment, we restrict our attention to the augmented matrix of a system.
Within an augmented matrix, the horizontal and vertical subarrays

    [ a_i1  a_i2  ...  a_in  b_i ]    and    [ a_1j  a_2j  ...  a_mj ]^T

are called the i-th row (matrix), which represents the i-th equation, and the j-th
column (matrix), which are the coefficients of the j-th variable x_j, of the augmented
matrix, respectively. The matrix consisting of the first n columns of the augmented
matrix

    [ a_11  a_12  ...  a_1n ]
    [ a_21  a_22  ...  a_2n ]
    [  ...                  ]
    [ a_m1  a_m2  ...  a_mn ]

is called the coefficient matrix of the system.


One can easily see that there is a one-to-one correspondence between the columns
of the coefficient matrix and variables of the system. Note also that the last column
[b_1 b_2 ... b_m]^T of the augmented matrix represents homogeneity of the system and
so no variable corresponds to it.
Since each row of the augmented matrix contains all the information of the cor-
responding equation of the system, we may deal with this augmented matrix instead
of handling the whole system of linear equations, and the elementary operations may
be applied to an augmented matrix just like they are applied to a system of equations .
But in this case, the elementary operations are rephrased as the elementary row
operations for the augmented matrix:

(1st kind) multiply a nonzero constant throughout a row,
(2nd kind) interchange two rows,
(3rd kind) add a constant multiple of a row to another row.

The inverse row operations which are also elementary row operations are

(1st kind) multiply the row by the reciprocal of the same constant,
(2nd kind) interchange two rows again,
(3rd kind) add the negative of the same constant multiple of
the row to the other.

Definition 1.1 Two augmented matrices (or systems of linear equations) are said to
be row-equivalent if one can be transformed to the other by a finite sequence of
elementary row operations .

Note that, if a matrix B can be obtained from a matrix A by these elementary


row operations, then one can obviously recover A from B by applying the inverse
elementary row operations to B in reverse order. Therefore, the two systems have the
same solutions:

Theorem 1.1 If two systems of linear equations are row-equivalent, then they have
the same set of solutions.

The general procedure for finding the solutions will be illustrated in the following
example :

Example 1.3 Solve the system of linear equations :

    {      2y + 4z =  2
    {  x + 2y + 2z =  3
    { 3x + 4y + 6z = -1.

Solution: One could work with the augmented matrix only. However, to compare
the operations on the system of linear equations with those on the augmented matrix,
we work on the system and the augmented matrix in parallel. Note that the associated
augmented matrix for the system is

    [ 0  2  4   2 ]
    [ 1  2  2   3 ]
    [ 3  4  6  -1 ].

(1) Since the coefficient of x in the first equation is zero while that in the second
equation is not zero, we interchange these two equations :

    {  x + 2y + 2z =  3        [ 1  2  2   3 ]
    {      2y + 4z =  2        [ 0  2  4   2 ]
    { 3x + 4y + 6z = -1        [ 3  4  6  -1 ]

(2) Add -3 times the first equation to the third equation:

    {  x + 2y + 2z =   3        [ 1   2  2    3 ]
    {      2y + 4z =   2        [ 0   2  4    2 ]
    {    - 2y      = -10        [ 0  -2  0  -10 ]

Thus , the first variable x is eliminated from the second and the third equations. In this
process , the coefficient 1 of the first unknown x in the first equation (row) is called
the first pivot.
Consequently, the second and the third equations have only the two unknowns y
and z. Leave the first equation (row) alone, and the same elimination procedure can
be applied to the second and the third equations (rows): The pivot to eliminate y from
the last equation is the coefficient 2 of y in the second equation (row).
(3) Add 1 times the second equation (row) to the third equation (row):

    {  x + 2y + 2z =  3        [ 1  2  2   3 ]
    {      2y + 4z =  2        [ 0  2  4   2 ]
    {           4z = -8        [ 0  0  4  -8 ]

The elimination process (i.e., (1) : row interchange , (2): elimination of x from the
last two equations (rows), and then (3): elimination of y from the last equation (row))
done so far to obtain this result is called forward elimination. After this forward
elimination, the leftmost nonzero entries in the nonzero rows are called the pivots .
Thus the pivots of the second and third rows are 2 and 4, respectively.
(4) Normalize nonzero rows by dividing them by their pivots. Then the pivots are
replaced by 1:
    {  x + 2y + 2z =  3        [ 1  2  2   3 ]
    {       y + 2z =  1        [ 0  1  2   1 ]
    {            z = -2        [ 0  0  1  -2 ]
The resulting matrix on the right-hand side is called a row-echelon form of the
augmented matrix, and the 1's at the pivotal positions are called the leading 1's. The
process so far is called Gaussian elimination.
The last equation gives z = -2. Substituting z = -2 into the second equation
gives y = 5. Now, putting these two values into the first equation, we get x = -3. This
process is called back substitution. The computation is shown below: i.e., eliminating
the numbers above the leading 1's:
(5) Add -2 times the third row to the second and the first rows:

    {  x + 2y      =  7        [ 1  2  0   7 ]
    {       y      =  5        [ 0  1  0   5 ]
    {            z = -2        [ 0  0  1  -2 ]

(6) Add -2 times the second row to the first row:

    {  x           = -3        [ 1  0  0  -3 ]
    {       y      =  5        [ 0  1  0   5 ]
    {            z = -2        [ 0  0  1  -2 ]

This resulting matrix is called the reduced row-echelon form of the augmented
matrix, which is row-equivalent to the original augmented matrix and gives the
solution to the system. The whole process to obtain the reduced row-echelon form is
called Gauss-Jordan elimination.   □
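For readers who wish to experiment, here is a minimal Python sketch of the Gauss-Jordan elimination just carried out. It is only an illustration, not part of the text: it uses floating-point arithmetic, makes no attempt to control round-off error, and simply skips a column when no pivot is available in it.

    import numpy as np

    def gauss_jordan(aug):
        """Reduce an augmented matrix to its reduced row-echelon form."""
        A = np.array(aug, dtype=float)
        rows, cols = A.shape
        pivot_row = 0
        for col in range(cols - 1):                      # last column is the right-hand side
            nonzero = np.nonzero(A[pivot_row:, col])[0]
            if nonzero.size == 0:
                continue                                 # no pivot here: a free variable
            r = pivot_row + nonzero[0]
            A[[pivot_row, r]] = A[[r, pivot_row]]        # 2nd kind: interchange two rows
            A[pivot_row] /= A[pivot_row, col]            # 1st kind: create the leading 1
            for i in range(rows):
                if i != pivot_row:                       # 3rd kind: clear the rest of the column
                    A[i] -= A[i, col] * A[pivot_row]
            pivot_row += 1
            if pivot_row == rows:
                break
        return A

    # The system of Example 1.3:
    aug = [[0, 2, 4,  2],
           [1, 2, 2,  3],
           [3, 4, 6, -1]]
    print(gauss_jordan(aug))    # last column reads x = -3, y = 5, z = -2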

In summary, by applying a finite sequence of elementary row operations , the aug-


mented matrix for a system of linear equations can be transformed into its reduced
row-echelon form, which is row equivalent to the original one. Hence the two corre-
sponding systems have the same solutions. From the reduced row-echelon form, one
can easily decide whether the system has a solution or not, and find the solution of
the given system if it is consistent.

Definition 1.2 A row-echelon form of an augmented matrix is of the following form:


(1) The zero rows, if they exist, come last in the order of rows.
(2) The first nonzero entries in the nonzero rows are 1, called leading 1's.
(3) Below each leading 1 is a column of zeros. Thus, in any two consecutive nonzero
rows, the leading 1 in the lower row appears farther to the right than the leading
1 in the upper row.
The reduced row-echelon form of an augmented matrix is of the form:
(4) Above each leading 1 is a column of zeros, in addition to a row-echelon form.

Example 1.4 The first three augmented matrices below are in reduced row-echelon
form, and the last one is just in row-echelon form.

    [Four augmented matrices are displayed here.]

Recall that in an augmented matrix [A b], the last column b does not correspond
to any variable. Thus, if the reduced row-echelon form of an augmented matrix for a
nonhomogeneous system has a row of the form [0 0 ... 0 b] with b ≠ 0, then the
associated equation is 0x_1 + 0x_2 + ... + 0x_n = b with b ≠ 0, which means the
system is inconsistent. If b = 0, then it has a row containing only 0's, which can be
neglected and deleted. In this example, the third matrix shows the former case, and
the first two matrices show the latter case.   □
In the following example, we use Gauss-Jordan elimination again to solve a
system which has infinitely many solutions .
Example 1.5 Solve the following system of linear equations by Gauss-Jordan elim-
ination.

    {  x_1 + 3x_2 - 2x_3        =  3
    { 2x_1 + 6x_2 - 2x_3 + 4x_4 = 18
    {        x_2 +  x_3 + 3x_4  = 10.

Solution: The augmented matrix for the system is

    [ 1  3  -2  0   3 ]
    [ 2  6  -2  4  18 ]
    [ 0  1   1  3  10 ].

The Gaussian elimination begins with:


(1) Adding -2 times the first row to the second produces

    [ 1  3  -2  0   3 ]
    [ 0  0   2  4  12 ]
    [ 0  1   1  3  10 ].

(2) Note that the coefficient of X2 in the second equation is zero and that in the
third equation is not. Thus, interchanging the second and the third rows produces

    [ 1  3  -2  0   3 ]
    [ 0  1   1  3  10 ]
    [ 0  0   2  4  12 ].

(3) Dividing the third row by the pivot 2 produces a row-echelon form

    [ 1  3  -2  0   3 ]
    [ 0  1   1  3  10 ]
    [ 0  0   1  2   6 ].

We now continue the back-substitution :


(4) Adding -1 times the third row to the second, and 2 times the third row to the
first produces
    [ 1  3  0  4  15 ]
    [ 0  1  0  1   4 ]
    [ 0  0  1  2   6 ].
(5) Finally, adding -3 times the second row to the first produces the reduced
row-echelon form:
    [ 1  0  0  1  3 ]
    [ 0  1  0  1  4 ]
    [ 0  0  1  2  6 ].

The corresponding system of equations is

    { x_1              +  x_4 = 3
    {       x_2        +  x_4 = 4
    {             x_3  + 2x_4 = 6.

This system can be rewritten as follows:

    { x_1 = 3 -  x_4
    { x_2 = 4 -  x_4
    { x_3 = 6 - 2x_4.

Since there is no other condition on x_4, one can see that all the other variables x_1, x_2,
and x_3 may be uniquely determined if an arbitrary real value t ∈ ℝ is assigned to x_4
(ℝ denotes the set of real numbers): thus the solutions can be written as

    (x_1, x_2, x_3, x_4) = (3 - t, 4 - t, 6 - 2t, t),   t ∈ ℝ.   □
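One can confirm that every value of the free variable gives a solution by substituting the family (3 - t, 4 - t, 6 - 2t, t) back into the system. The small numpy check below is only an illustration, using the coefficient matrix and right-hand side of Example 1.5.

    import numpy as np

    A = np.array([[1, 3, -2, 0],
                  [2, 6, -2, 4],
                  [0, 1,  1, 3]], dtype=float)
    b = np.array([3, 18, 10], dtype=float)

    for t in (-2.0, 0.0, 1.5, 7.0):                  # a few arbitrary values of t
        x = np.array([3 - t, 4 - t, 6 - 2 * t, t])
        assert np.allclose(A @ x, b)                 # every choice of t solves the system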


Note that if we look at the reduced row-echelon form in Example 1.5, the variables
x_1, x_2, and x_3 correspond to the columns containing leading 1's, while the column
corresponding to x_4 contains no leading 1.
An augmented matrix of a system of linear equations may have more than one
row-echelon form, but it has only one reduced row-echelon form (see Remark (2) on
page 97 for a concrete proof) . Thus the number of leading 1's in a system does not
depend on the Gaussian elimination.

Definition 1.3 Among the variables in a system, the ones corresponding to the
columns containing leading 1's are called the basic variables, and the ones cor-
responding to the columns without leading 1's, if there are any, are called the free
variables.

Clearly the sum of the number of basic variables and that of free variables is equal
to the total number of unknowns: the number of columns .
In Example 1.4, the first and the last augmented matrices have only basic variables
but no free variables, while the second one has two basic variables x_1 and x_3, and two
free variables x_2 and x_4. The third one has two basic variables x_1 and x_2, and only
one free variable x_3.
In general, as we have seen in Example 1.5, a consistent system has infinitely
many solutions if it has at least one free variable, and has a unique solution if it has no
free variable. In fact, if a consistent system has a free variable (which always happens
when the number of equations is less than that of unknowns), then by assigning an
arbitrary value to the free variable, one always obtains infinitely many solutions.

Theorem 1.2 If a homogeneous system has more unknowns than equations, then it
has infinitely many solutions.
Problem 1.2 Suppose that the augmented matrices for some systems of linear equations have
been reduced to reduced row-echelon forms as below by elementary row operations. Solve the
systems:

(1)
[
1 0
0 I
o 0
o
o
o
5]
-2 ,
4
(2) [bo 0 I 3
-I ]
6
2
.

Problem 1.3 Solve the following systems of equations by Gaussian elimination. What are the
pivots?
-x + y + 2z 0 2y z I
(1)1 3x + 4y + z 0 (2) 14X lOy + 3z 5
2x + 5y + 3z = O. 3x 3y 6.
w + x + y = 3
-3w 17x + y + 2z I
(3)\ 4w 17x + 8y 5z I
5x 2y + z l.

Problem 1.4 Determine the condition on bi so that the following system has a solution.
+ 2y + 6z bl + 3y 2z bl
(1) 1 3y 2z = b2 (2) { + 3z
Y b2
3x y + 4z b3· 4x + 2y + Z b3·

1.3 Sums and scalar multiplications of matrices

Rectangular arrays of real numbers arise in many real-world problems. Historically,


it was an English mathematician J.J. Sylvester who first introduced the word "matrix"
in the year 1848. It was the Latin word for womb, as a name for an array of numbers.

Definition 1.4 An m by n (written mxn) matrix is a rectangular array of numbers


arranged into m (horizontal) rows and n (vertical) columns . The size of a matrix is
specified by the number m of the rows and the number n of the columns .

In general, a matrix is written in the following form:

    A = [ a_11  a_12  ...  a_1n ]
        [ a_21  a_22  ...  a_2n ]
        [  ...                  ]
        [ a_m1  a_m2  ...  a_mn ],

or just A = [a_ij] if the size of the matrix is clear from the context. The number a_ij is
called the (i, j)-entry of the matrix A, and is written as a_ij = [A]_ij.
An m x 1 matrix is called a column (matrix) or sometimes a column vector, and
a 1 xn matrix is called a row (matrix), or a row vector. In general, we use capital
letters like A, B, C for matrices and small boldface letters like x, y, z for column
or row vectors.

Definition 1.5 Let A = [a_ij] be an m x n matrix. The transpose of A is the n x m
matrix, denoted by A^T, whose j-th column is taken from the j-th row of A: that is,
[A^T]_ij = [A]_ji.

Example 1.6 (1) If A = [ 1  3  5 ],  then  A^T = [ 1  2 ]
                       [ 2  4  6 ]               [ 3  4 ]
                                                 [ 5  6 ].

(2) The transpose of a column vector is a row vector and vice-versa:

    x = [ x_1  x_2  ...  x_n ]^T    ⟺    x^T = [ x_1  x_2  ...  x_n ].
                                                                        □
Definition 1.6 A matrix A = [aij] is called a square matrix of order n if the number
of rows and the number of columns are both equal to n.

Definition 1.7 Let A be a square matrix of order n.


(1) The entries a_11, a_22, ..., a_nn are called the diagonal entries of A.
(2) A is called a diagonal matrix if all the entries except for the diagonal entries are
zero.
(3) A is called an upper (lower) triangular matrix if all the entries below (above,
respectively) the diagonal are zero.

The following matrices U and L are the general forms of the upper triangular and
lower triangular matrices, respectively:

    U = [ a_11  a_12  ...  a_1n ]        L = [ a_11   0    ...   0   ]
        [  0    a_22  ...  a_2n ]            [ a_21  a_22  ...   0   ]
        [  ...                  ]  ,         [  ...                  ]
        [  0     0    ...  a_nn ]            [ a_n1  a_n2  ...  a_nn ].

Note that a matrix which is both upper and lower triangular must be a diagonal matrix,
and the transpose of an upper (lower, respectively) triangular matrix is lower (upper,
respectively) triangular.

Definition 1.8 Two matrices A and B are said to be equal, written A = B , if their
sizes are the same and their corresponding entries are equal : i.e., [A]ij = [B]ij for
all i and j.

This definition allows us to write a matrix equation. A simple example is (A^T)^T =
A by definition.
Let M_{m x n}(ℝ) denote the set of all m x n matrices with entries of real numbers.
Among the elements of M_{m x n}(ℝ), one can define two operations, called the scalar
multiplication and the sum of matrices, as follows:

Definition 1.9 (1) (Scalar multiplication) For an m x n matrix A = [a_ij] ∈ M_{m x n}(ℝ)
and a scalar k ∈ ℝ (which is simply a real number), the scalar multiplication of k
and A is defined to be the matrix kA such that [kA]_ij = k[A]_ij for all i and j: i.e., in
an expanded form:

    k [ a_11  ...  a_1n ]   [ ka_11  ...  ka_1n ]
      [  ...            ] = [   ...             ]
      [ a_m1  ...  a_mn ]   [ ka_m1  ...  ka_mn ].

(2) (Sum of matrices) For two matrices A = [a_ij] and B = [b_ij] in M_{m x n}(ℝ), the
sum of A and B is defined to be the matrix A + B such that [A + B]_ij = [A]_ij + [B]_ij
for all i and j: i.e., in an expanded form:

    [ a_11  ...  a_1n ]   [ b_11  ...  b_1n ]   [ a_11 + b_11  ...  a_1n + b_1n ]
    [  ...            ] + [  ...            ] = [     ...                       ]
    [ a_m1  ...  a_mn ]   [ b_m1  ...  b_mn ]   [ a_m1 + b_m1  ...  a_mn + b_mn ].


The resulting matrices kA and A + B from these two operations also belong to
M_{m x n}(ℝ). In this sense, we say M_{m x n}(ℝ) is closed under the two operations. Note
that matrices of different sizes cannot be added; for example, a sum

    [ 1  2 ]   [ a  b  c ]
    [ 3  4 ] + [ d  e  f ]

cannot be defined.
If B is any matrix , then - B is by definition the multiplication (-1) B. Moreover, if
A and B are two matrices of the same size, then the subtraction A - B is by definition
the sum A + (-1) B. A matrix whose entries are all zeros is called a zero matrix,
denoted by the symbol 0 (or Omxn when the size is emphasized).
Clearly, matrix sum has the same properties as the sum of real numbers. The real
numbers in the context here are traditionally called scalars even though "numbers"
is a perfectly good name and "scalar" sounds more technical. The following theorem
lists the basic arithmetic properties of the sum and scalar multiplication of matrices.

Theorem 1.3 Suppose that the sizes of A, B and C are the same. Then the following
arithmetic rules of matrices are valid:
(1) (A + B) + C = A + (B + C), (written as A + B + C) (Associativity),
(2) A + 0 = A = 0 + A,
(3) A + (-A) = (-A) + A = 0,
(4) A + B = B + A, (Commutativity),
(5) k(A + B) = kA + kB,
(6) (k + ℓ)A = kA + ℓA,
(7) (kℓ)A = k(ℓA).

Proof: We prove only the equality (5) and the remaining ones are left for exercises.
For any (i, j),

    [k(A + B)]_ij = k[A + B]_ij = k([A]_ij + [B]_ij) = [kA]_ij + [kB]_ij
                  = [kA + kB]_ij.                                       □

In particular, A + A = 2A, A + (A + A) = 3A = (A + A) + A, and inductively
nA = (n - 1)A + A for any positive integer n.

Definition 1.10 A square matrix A is said to be symmetric if A^T = A, or skew-
symmetric if A^T = -A.
For example, the matrices

    A = [ a  b ],        B = [  0   1  2 ]
        [ b  c ]             [ -1   0  3 ]
                             [ -2  -3  0 ]

are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries
of a skew-symmetric matrix must be zero, since a_ii = -a_ii.
By a direct computation, one can easily verify the following properties of the
transpose of matrices:
Theorem 1.4 Let A and B be m x n matrices. Then
    (kA)^T = kA^T    and    (A + B)^T = A^T + B^T.
Problem 1.5 Prove the remaining parts of Theorem 1.3.

Problem 1.6 Find a matrix B such that A + B^T = (A - B)^T, where

    A = [  2  -3  0 ]
        [  4  -1  3 ]
        [ -1   0  1 ].

Problem 1.7 Find a , b , c and d such that

[ ac db] = 2[a2 a+c + [2+b


c+d
3]
a+9]
b .

1.4 Products of matrices


The sum and the scalar multiplication of matrices were introduced in Section 1.3. In
this section, we introduce the product of matrices. Unlike the sum of two matrices,
the product of matrices is a little bit more complicated, in the sense that it can be
defined for two matrices of different sizes. The product of matrices will be defined in
three steps:
Step (1) Product of vectors: For a 1 x n row vector a = [a_1 a_2 ... a_n] and an
n x 1 column vector x = [x_1 x_2 ... x_n]^T, the product ax is a 1 x 1 matrix (i.e., just
a number) defined by the rule

    ax = [a_1 a_2 ... a_n][x_1 x_2 ... x_n]^T = [a_1 x_1 + a_2 x_2 + ... + a_n x_n] = [Σ_{i=1}^n a_i x_i].

Note that the number of entries of the first row vector is equal to the number of entries
of the second column vector, so that an entry-wise multiplication is possible.
Step (2) Product of a matrix and a vector: For an m x n matrix

    A = [ a_11  a_12  ...  a_1n ]   [ a_1 ]
        [ a_21  a_22  ...  a_2n ] = [ a_2 ]
        [  ...                  ]   [ ... ]
        [ a_m1  a_m2  ...  a_mn ]   [ a_m ]

with the row vectors a_i's and for an n x 1 column vector x = [x_1 ... x_n]^T, the
product Ax is by definition an m x 1 matrix, whose m rows are computed according
to Step (1):

    Ax = [ a_1 x ]   [ Σ_{i=1}^n a_1i x_i ]
         [ a_2 x ] = [ Σ_{i=1}^n a_2i x_i ]
         [  ...  ]   [        ...         ]
         [ a_m x ]   [ Σ_{i=1}^n a_mi x_i ].

Therefore, for a system of m linear equations in n unknowns, by writing the n un-


knowns as an n x 1 column matrix x and the coefficients as an m x n matrix A, the
system may be expressed as a matrix equation Ax = b .
Step (3) Product of matrices: Let A be an m x n matrix and B an n x r matrix
with columns b_1, b_2, ..., b_r, written as B = [b_1 b_2 ... b_r]. The product AB
is defined to be an m x r matrix whose r columns are the products of A and the r
columns of B, each computed according to Step (2) in corresponding order. That is,

    AB = [ Ab_1  Ab_2  ...  Ab_r ],

which is an m x r matrix. Therefore, the (i, j)-entry [AB]_ij of AB is:

    [AB]_ij = a_i b_j = a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj = Σ_{k=1}^n a_ik b_kj.


This can be easily memorized as the sum of entry-wise multiplications of the boxed
vectors in Figure 1.3.

    Figure 1.3. The entry [AB]_ij

Example 1.7 Consider the matrices

    A = [ 2  3 ],        B = [ 1   2  0 ]
        [ 4  0 ]             [ 5  -1  0 ].

The columns of AB are the product of A and each column of B:

    A [ 1 ] = [ 2·1 + 3·5 ] = [ 17 ],   A [  2 ] = [ 2·2 + 3·(-1) ] = [ 1 ],   A [ 0 ] = [ 2·0 + 3·0 ] = [ 0 ].
      [ 5 ]   [ 4·1 + 0·5 ]   [  4 ]      [ -1 ]   [ 4·2 + 0·(-1) ]   [ 8 ]      [ 0 ]   [ 4·0 + 0·0 ]   [ 0 ]

Therefore, AB is

    AB = [ 2  3 ] [ 1   2  0 ] = [ 17  1  0 ]
         [ 4  0 ] [ 5  -1  0 ]   [  4  8  0 ].

Since A is a 2 x 2 matrix and B is a 2 x 3 matrix, the product AB is a 2 x 3 matrix.


If we concentrate, for example, on the (2, 1)-entry of AB, we single out the second
row from A and the first column from B, and then we multiply corresponding entries
together and add them up, i.e., 4·1 + 0·5 = 4.   □
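The entry formula [AB]_ij = Σ_k a_ik b_kj translates directly into nested sums. The sketch below is only an illustration in plain Python (numpy is used just to double-check the result); it reproduces the product of Example 1.7.

    import numpy as np

    def mat_mul(A, B):
        """Product of an m x n and an n x r matrix via [AB]_ij = sum_k A[i][k] * B[k][j]."""
        m, n, r = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "columns of A must equal rows of B"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
                for i in range(m)]

    A = [[2, 3],
         [4, 0]]
    B = [[1,  2, 0],
         [5, -1, 0]]
    print(mat_mul(A, B))                                          # [[17, 1, 0], [4, 8, 0]]
    print(np.allclose(np.array(A) @ np.array(B), mat_mul(A, B)))  # True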

Note that the product AB of A and B is not defined if the number of columns of
A and the number of rows of B are not equal.

Remark: In step (2), instead of defining a product of a matrix and a vector, one can
define alternatively the product of a 1 x n row matrix a and an n x r matrix B using
the same rule defined in step (1) to have a 1 x r row matrix aB. Accordingly, in step
(3) an appropriate modification produces the same definition of the product of two
matrices. We suggest that readers complete the details. (See Example 1.10.)
The identity matrix of order n, denoted by I_n (or I if the order is clear from the
context), is a diagonal matrix whose diagonal entries are all 1, i.e.,

    I_n = [ 1  0  ...  0 ]
          [ 0  1  ...  0 ]
          [ ...          ]
          [ 0  0  ...  1 ].

By a direct computation, one can easily see that AI_n = A = I_nA for any n x n matrix
A.
The operations of scalar multiplication, sum and product of matrices satisfy many,
but not all, of the same arithmetic rules that real or complex numbers have. The matrix
O_{m x n} plays the role of the number 0, and I_n plays that of the number 1 in the set of
usual numbers.
The rule that does not hold for matrices in general is commutativity AB = BA
of the product, while commutativity of the matrix sum A + B = B + A always holds.
The following example illustrates noncommutativity of the product of matrices.

Example 1.8 (Noncommutativity of the matrix product)

Let A = _ and B = l Then,

AB = l BA =

which shows AB ≠ BA.   □

The following theorem lists the basic arithmetic rules that hold in the matrix
product.

Theorem 1.5 Let A, B, C be arbitrary matrices for which the matrix operations
below can be defined, and let k be an arbitrary scalar. Then
(1) A(BC) = (AB)C, (written as ABC) (Associativity),
(2) A(B + C) = AB + AC, and (A + B)C = AC + BC, (Distributivity),
(3) IA = A = AI,
(4) k(BC) = (kB)C = B(kC),
(5) (AB)^T = B^T A^T.

Proof: Each equality can be shown by direct computation of each entry on both
sides of the equalities. We illustrate this by proving (1) only, and leave the others to
the reader.
Assume that A = [a_ij] is an m x n matrix, B = [b_kl] is an n x p matrix, and
C = [c_st] is a p x r matrix. We now compute the (i, j)-entry of each side of the
equation. Note that BC is an n x r matrix whose (i, j)-entry is [BC]_ij = Σ_{λ=1}^p b_iλ c_λj.
Thus

    [A(BC)]_ij = Σ_{μ=1}^n a_iμ [BC]_μj = Σ_{μ=1}^n a_iμ Σ_{λ=1}^p b_μλ c_λj = Σ_{μ=1}^n Σ_{λ=1}^p a_iμ b_μλ c_λj.

Similarly, AB is an m x p matrix with the (i, j)-entry [AB]_ij = Σ_{μ=1}^n a_iμ b_μj, and

    [(AB)C]_ij = Σ_{λ=1}^p [AB]_iλ c_λj = Σ_{λ=1}^p Σ_{μ=1}^n a_iμ b_μλ c_λj = Σ_{μ=1}^n Σ_{λ=1}^p a_iμ b_μλ c_λj.

This clearly shows that [A(BC)]_ij = [(AB)C]_ij for all i, j, and consequently
A(BC) = (AB)C as desired.                                               □

Problem 1.8 Give an example of matrices A and B such that (AB)^T ≠ A^T B^T.

Problem 1.9 Prove or disprove: If A is not a zero matrix and AB = AC, then B = C. Similarly,
is it true or not that AB = 0 implies A = 0 or B = 0?

Problem 1.10 Show that any triangular matrix A satisfying AA^T = A^T A is a diagonal matrix.

Problem 1.11 For a square matrix A, show that
(1) AA^T and A + A^T are symmetric,
(2) A - A^T is skew-symmetric, and
(3) A can be expressed as the sum of its symmetric part B = ½(A + A^T) and skew-symmetric
part C = ½(A - A^T), so that A = B + C.

As an application of our results on matrix operations, one can prove the following
important theorem:

Theorem 1.6 Any system of linear equations has either no solution, exactly one
solution, or infinitely many solutions.

Proof: We have seen that a system of linear equations may be written in matrix form
as Ax = b. This system may have either no solution or a solution. If it has only one
solution, then there is nothing to prove. Suppose that the system has more than one
solution and let x_1 and x_2 be two different solutions so that Ax_1 = b and Ax_2 = b.
Let x_0 = x_1 - x_2. Then x_0 ≠ 0, and Ax_0 = A(x_1 - x_2) = 0. Thus

    A(x_1 + kx_0) = Ax_1 + kAx_0 = b.

This means that x_1 + kx_0 is also a solution of Ax = b for any k. Since there are
infinitely many choices for k, Ax = b has infinitely many solutions.   □

Problem 1.12 For which values of a does each of the following systems have no solution,
exactly one solution, or infinitely many solutions?

    (1)  {  x + 2y - 3z = 4                   (2)  {  x -  y +  z = 1
         { 3x -  y + 5z = 2                        {  x + 3y + az = 2
         { 4x +  y + (a^2 - 14)z = a + 2.          { 2x + ay + 3z = 3.

1.5 Block matrices


In this section we introduce some techniques that may be helpful in manipulations of
matrices. A submatrix of a matrix A is a matrix obtained from A by deleting certain
rows and/or columns of A. Using some horizontal and vertical lines, one can partition
a matrix A into submatrices, called blocks, of A as follows: Consider a matrix

    A = [ a_11  a_12  a_13 | a_14 ]
        [ a_21  a_22  a_23 | a_24 ]
        [ ------------------------]
        [ a_31  a_32  a_33 | a_34 ]

divided up into four blocks by the dotted lines shown. Now, if we write

    A_11 = [ a_11  a_12  a_13 ],   A_12 = [ a_14 ],   A_21 = [ a_31  a_32  a_33 ],   A_22 = [ a_34 ],
           [ a_21  a_22  a_23 ]           [ a_24 ]

then A can be written as

    A = [ A_11  A_12 ]
        [ A_21  A_22 ],

called a block matrix.


The product of matrices partitioned into blocks also follows the matrix product
formula, as if the blocks Aij were numbers: If

    A = [ A_11  A_12 ]    and    B = [ B_11  B_12 ]
        [ A_21  A_22 ]               [ B_21  B_22 ]

are block matrices and the number of columns in A_ik is equal to the number of rows
in B_kj, then

    AB = [ A_11 B_11 + A_12 B_21    A_11 B_12 + A_12 B_22 ]
         [ A_21 B_11 + A_22 B_21    A_21 B_12 + A_22 B_22 ].

This will be true only if the columns of A are partitioned in the same way as the rows
of B.
It is not hard to see that the matrix product by blocks is correct. Suppose, for
example, that we have a 3x3 matrix A and partition it as

    A = [ a_11  a_12 | a_13 ]   [ A_11  A_12 ]
        [ a_21  a_22 | a_23 ] = [ A_21  A_22 ],
        [ a_31  a_32 | a_33 ]

and a 3 x 2 matrix B which we partition as

    B = [ b_11  b_12 ]   [ B_11 ]
        [ b_21  b_22 ] = [ B_21 ].
        [ b_31  b_32 ]

Then the entries of C = [c_ij] = AB are

    c_ij = (a_i1 b_1j + a_i2 b_2j) + a_i3 b_3j.

The quantity a_i1 b_1j + a_i2 b_2j is simply the (i, j)-entry of A_11 B_11 if i ≤ 2, and is
the (i, j)-entry of A_21 B_11 if i = 3. Similarly, a_i3 b_3j is the (i, j)-entry of A_12 B_21 if
i ≤ 2, and of A_22 B_21 if i = 3. Thus AB can be written as

    AB = [ C_11 ] = [ A_11 B_11 + A_12 B_21 ]
         [ C_21 ]   [ A_21 B_11 + A_22 B_21 ].
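The block formula is easy to check numerically: partition two conformable matrices, multiply block by block, and compare with the ordinary product. The numpy sketch below is only an illustration; the sizes and the partition points are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 3)).astype(float)
    B = rng.integers(-5, 5, size=(3, 2)).astype(float)

    # Partition A after its second row and second column, and B after its second row.
    A11, A12 = A[:2, :2], A[:2, 2:]
    A21, A22 = A[2:, :2], A[2:, 2:]
    B11, B21 = B[:2, :], B[2:, :]

    # Block product: AB = [[A11 B11 + A12 B21], [A21 B11 + A22 B21]]
    top    = A11 @ B11 + A12 @ B21
    bottom = A21 @ B11 + A22 @ B21

    print(np.allclose(np.vstack([top, bottom]), A @ B))    # True: the block formula agrees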
Example 1.9 If an m x n matrix A is partitioned into blocks of column vectors: i.e.,
A = [c_1 c_2 ... c_n], where each block c_j is the j-th column, then the product Ax
with x = [x_1 ... x_n]^T is the sum of the block matrices (or column vectors) with
coefficients x_j's:

    Ax = [c_1 c_2 ... c_n][x_1 ... x_n]^T = x_1 c_1 + x_2 c_2 + ... + x_n c_n,

where x_j c_j = x_j [a_1j a_2j ... a_mj]^T. Hence, a matrix equation Ax = b is nothing
but the vector equation x_1 c_1 + x_2 c_2 + ... + x_n c_n = b.   □
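The identity Ax = x_1 c_1 + x_2 c_2 + ... + x_n c_n can also be verified in a couple of lines; the following numpy check is only an illustration, reusing the matrix of Example 1.5.

    import numpy as np

    A = np.array([[1, 3, -2, 0],
                  [2, 6, -2, 4],
                  [0, 1,  1, 3]], dtype=float)
    x = np.array([3.0, 4.0, 6.0, 0.0])

    combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))    # x_1 c_1 + ... + x_n c_n
    print(np.allclose(A @ x, combo))                          # True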
Example 1.10 Let A be an m x n matrix partitioned into the row vectors a_1, a_2, ...,
a_m as its blocks, and let B be an n x r matrix so that their product AB is well defined.
By considering the matrix B as a block, the product AB can be written
    AB = [ a_1 ]     [ a_1 B ]   [ a_1 b_1   a_1 b_2   ...   a_1 b_r ]
         [ a_2 ] B = [ a_2 B ] = [ a_2 b_1   a_2 b_2   ...   a_2 b_r ]
         [ ... ]     [  ...  ]   [   ...                             ]
         [ a_m ]     [ a_m B ]   [ a_m b_1   a_m b_2   ...   a_m b_r ],

where b_1, b_2, ..., b_r denote the columns of B. Hence, the row vectors of AB are
the products of the row vectors of A and the column vectors of B.   □
Problem 1.13 Compute AB using block multiplication, where

A=
1211
-3
0] , B=
1 012]
.
[ o 01 2 -1 [
3 -2 I 1

1.6 Inverse matrices


As shown in Section 1.4, a system of linear equations can be written as Ax = b in
matrix form. This form resembles one of the simplest linear equations in one variable
ax = b whose solution is simply x = a^{-1}b when a ≠ 0. Thus it is tempting to write
the solution of the system as x = A^{-1}b. However, in the case of matrices we first have
to assign a meaning to A^{-1}. To discuss this we begin with the following definition.

Definition 1.11 For an m x n matrix A, an n x m matrix B is called a left inverse
of A if BA = I_n, and an n x m matrix C is called a right inverse of A if AC = I_m.

Example 1.11 (One-sided inverse) From a direct calculation for two matrices

    A = [ 1  2  -1 ],        B = [  1  -3 ]
        [ 2  0   1 ]             [ -1   5 ]
                                 [ -2   7 ],

we have AB = I_2, and

    BA = [ -5   2  -4 ]
         [  9  -2   6 ]  ≠  I_3.
         [ 12  -4   9 ]

Thus, the matrix B is a right inverse but not a left inverse of A, while A is a left inverse
but not a right inverse of B.   □

A matrix A has a right inverse if and only if A^T has a left inverse, since (AB)^T =
B^T A^T and I^T = I. In general, a matrix with a left (right) inverse need not have a
right (left, respectively) inverse. However, the following lemma shows that if a matrix
has both a left inverse and a right inverse, then they must be equal:

Lemma 1.7 If an n x n square matrix A has a left inverse B and a right inverse C,
then B and C are equal, i.e., B = C.

Proof: A direct calculation shows that

    B = BI_n = B(AC) = (BA)C = I_nC = C.                                □



By Lemma 1.7, one can say that if a matrix A has both left and right inverses,
then any two left inverses must be both equal to a right inverse C, and hence to each
other. By the same reason, any two right inverses must be both equal to a left inverse
B, and hence to each other. So there exists only one left and only one right inverse,
which must be equal.
We will show later (Theorem 1.9) that if A is a square matrix and has a left inverse,
then it has also a right inverse, and vice-versa. Moreover, Lemma 1.7 says that the left
inverse and the right inverse must be equal. However, we shall also show in Chapter 3
that any non-square matrix A cannot have both a right inverse and a left inverse: that
is, a non-square matrix may have only a one-sided inverse. The following example
shows that such a matrix may have infinitely many one-sided inverses.

Example 1.12 (Infinitely many one-sided inverses) A non-square matrix A = [ ... ]
can have more than one left inverse. In fact, for any x, y ∈ ℝ the matrix
B = [ ... ] is a left inverse of A.   □

Definition 1.12 An n x n square matrix A is said to be invertible (or nonsingular)
if there exists a square matrix B of the same size such that

    AB = I_n = BA.

Such a matrix B is called the inverse of A, and is denoted by A^{-1}. A matrix A is said
to be singular if it is not invertible.

Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is
why we call B 'the' inverse of A. For instance, consider a 2 x 2 matrix

    A = [ a  b ].
        [ c  d ]

If ad - bc ≠ 0, then it is easy to verify that

    A^{-1} = 1/(ad - bc) [  d  -b ],
                         [ -c   a ]

since AA^{-1} = I_2 = A^{-1}A. Note that any zero matrix is singular.
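The 2 x 2 inverse formula is short enough to code directly. The helper below is only an illustration (its name inverse_2x2 is not from the text); it raises an error in the singular case ad - bc = 0.

    import numpy as np

    def inverse_2x2(a, b, c, d):
        """Inverse of [[a, b], [c, d]], assuming ad - bc != 0."""
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is singular")
        return np.array([[d, -b], [-c, a]], dtype=float) / det

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(np.allclose(inverse_2x2(1, 2, 3, 4) @ A, np.eye(2)))    # True: A^{-1} A = I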


Problem 1.14 Let A be an invertible matrix and k any nonzero scalar. Show that
(1) A-I is invertible and (A-I)-I =
A;
(2) the matrix kA is invertible and (kA)-I = tA-I;
(3) AT is invertible and (AT)-I = (A-I)T.

Theorem 1.8 The product of invertible matrices is also invertible, whose inverse is
the product of the individual inverses in reversed order:

    (AB)^{-1} = B^{-1} A^{-1}.

Proof: Suppose that A and B are invertible matrices of the same size. Then
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I, and similarly
(B^{-1}A^{-1})(AB) = I. Thus, AB has the inverse B^{-1}A^{-1}.                □

The inverse of A is written as 'A to the power -1', so one can give the meaning
of A^k for any integer k: Let A be a square matrix. Define A^0 = I. Then, for any
positive integer k, we define the power A^k of A inductively as

    A^k = A A^{k-1}.

Moreover, if A is invertible, then the negative integer power is defined as

    A^{-k} = (A^{-1})^k.

It is easy to check that A^{k+l} = A^k A^l whenever the right-hand side is defined. (If
A is not invertible, A^{3+(-1)} is defined but A^{-1} is not.)

Problem 1.15 Prove:

(1) If A has a zero row, so does AB .


(2) If B has a zero column, so does AB.
(3) Any matrix with a zero row or a zero column cannot be invertible.

Problem 1.16 Let A be an invertible matrix. Is it true that (A^k)^T = (A^T)^k for any integer k?
Justify your answer.

1.7 Elementary matrices and finding A^{-1}

We now return to the system of linear equations Ax = b. If A has a right inverse B
so that AB = I_m, then x = Bb is a solution of the system since

    Ax = A(Bb) = (AB)b = b.

(Compare with Problem 1.23). In particular, if A is an invertible square matrix, then
it has only one inverse A^{-1}, and x = A^{-1}b is the only solution of the system. In this
section, we discuss how to compute A^{-1} when A is invertible.
Recall that Gaussian elimination is a process in which the augmented matrix is
transformed into its row-echelon form by a finite number of elementary row op-
erations. In the following, one can see that each elementary row operation can be
expressed as a nonsingular matrix, called an elementary matrix, so that the process
of Gaussian elimination is the same as multiplying a finite number of corresponding
elementary matrices to the augmented matrix.

Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix
In by executing only one elementary row operation.

For example, the following matrices are three elementary matrices corresponding
to each type of the three elementary row operations:

(1st kind)  [ 1   0  0 ]    the second row of I_3 is multiplied by -5;
            [ 0  -5  0 ]
            [ 0   0  1 ]

(2nd kind)  [ 1  0  0  0 ]  the second and the fourth rows of I_4
            [ 0  0  0  1 ]  are interchanged;
            [ 0  0  1  0 ]
            [ 0  1  0  0 ]

(3rd kind)  [ 1  0  3 ]     3 times the third row is added to the first
            [ 0  1  0 ]     row of I_3.
            [ 0  0  1 ]

It is an interesting fact that, if E is an elementary matrix obtained by executing a


certain elementary row operation on the identity matrix 1m , then for any m x n matrix
A, the product EA is exactly the matrix that is obtained when the same elementary
row operation in E is executed on A.
The following example illustrates this argument. (Note that AE is not what we
want. For this, see Problem 1.18.)

Example 1.13 (Elementary operation by an elementary matrix) Let b = [b_1 b_2 b_3]^T
be a 3 x 1 column matrix. Suppose that we want to execute a third kind of elementary
operation 'adding (-2) x the first row to the second row' on the matrix b. First, we
execute this operation on the identity matrix I_3 to get an elementary matrix E:

    E = [  1  0  0 ]
        [ -2  1  0 ]
        [  0  0  1 ].

Multiplying this elementary matrix E to b on the left produces the desired result:

    Eb = [  1  0  0 ] [ b_1 ]   [ b_1         ]
         [ -2  1  0 ] [ b_2 ] = [ b_2 - 2b_1  ]
         [  0  0  1 ] [ b_3 ]   [ b_3         ].

Similarly, the second kind of elementary operation 'interchanging the first and
the third rows' on the matrix b can be achieved by multiplying an elementary matrix
P, obtained from I_3 by interchanging the two rows, to b on the left:

    Pb = [ 0  0  1 ] [ b_1 ]   [ b_3 ]
         [ 0  1  0 ] [ b_2 ] = [ b_2 ]
         [ 1  0  0 ] [ b_3 ]   [ b_1 ].
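The effect of left multiplication by an elementary matrix can also be verified numerically. The sketch below is only an illustration: it builds E and P as in Example 1.13 and applies them to a particular column b.

    import numpy as np

    b = np.array([[1.0], [2.0], [3.0]])     # a sample column b = [b_1 b_2 b_3]^T

    E = np.eye(3)
    E[1, 0] = -2.0                          # add (-2) x the first row to the second row
    print(E @ b)                            # [[1.], [0.], [3.]] : b_2 becomes b_2 - 2 b_1

    P = np.eye(3)[[2, 1, 0]]                # interchange the first and third rows of I_3
    print(P @ b)                            # [[3.], [2.], [1.]]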

Recall that each elementary row operation has an inverse operation, which is
also an elementary operation, that brings the elementary matrix back to the original
identity matrix. In other words, if E denotes an elementary matrix and if E' denotes
the elementary matrix corresponding to the 'inverse' elementary row operation of E,
then E' E = 1, because

(1) if E multiplies a row by c ≠ 0, then E' multiplies the same row by 1/c;
(2) if E interchanges two rows, then E' interchanges them again;
(3) if E adds a multiple of one row to another, then E' subtracts it from the same
row.
Furthermore, one can say that E'E = I = EE': every elementary matrix is invertible
and its inverse matrix E^{-1} = E' is also an elementary matrix.

Example 1.14 (Inverse ofan elementary matrix) If

EI = 01 0c O
0 ] , E2 = [ 1
0 0
100 ] , E 3 = [01
[ 001 301 0

then

Definition 1.14 A permutation matrix is a square matrix obtained from the identity
matrix by permuting the rows.

In Example 1.14, E3 is a permutation matrix, but E2 is not.


Problem 1.17 Prove:
(1) A permutation matrix is the product of a finite number of elementary matrices each of
which corresponds to the 'row-interchanging' elementary row operation.
(2) Every permutation matrix P is invertible and P^{-1} = P^T.
(3) The product of any two permutation matrices is a permutation matrix.
(4) The transpose of a permutation matrix is also a permutation matrix.

Problem 1.18 Define the elementary column operations for a matrix by just replacing 'row'
by 'column' in the definition of the elementary row operations. Show that if A is an m x n
matrix and if E is a matrix obtained by executing an elementary column operation on I_n, then
AE is exactly the matrix that is obtained from A when the same column operation is executed
on A. In particular, if D is an n x n diagonal matrix with diagonal entries d_1, d_2, ..., d_n,
then AD is obtained by multiplication of the columns of A by d_1, d_2, ..., d_n, while DA is
obtained by multiplication of the rows of A by d_1, d_2, ..., d_n.

The next theorem establishes some fundamental relations between n xn square


matrices and systems of n linear equations in n unknowns.

Theorem 1.9 Let A be an n x n matrix. The following are equivalent:


(1) A has a left inverse;
(2) Ax = 0 has only the trivial solution x = 0;

(3) A is row-equivalent to I_n;
(4) A is a product of elementary matrices;
(5) A is invertible;
(6) A has a right inverse.

Proof: (1) ⇒ (2): Let x be a solution of the homogeneous system Ax = 0, and let
B be a left inverse of A. Then

    x = I_n x = (BA)x = B(Ax) = B0 = 0.

(2) ⇒ (3): Suppose that the homogeneous system Ax = 0 has only the trivial
solution x = 0:

    x_1 = 0,  x_2 = 0,  ...,  x_n = 0.

This means that the augmented matrix [A 0] of the system Ax = 0 is reduced to the
system [I_n 0] by Gauss-Jordan elimination. Hence, A is row-equivalent to I_n.
(3) ⇒ (4): Assume A is row-equivalent to I_n, so that A can be reduced to I_n by a
finite sequence of elementary row operations. Thus, one can find elementary matrices
E_1, E_2, ..., E_k such that

    E_k ... E_2 E_1 A = I_n.

By multiplying successively both sides of this equation by E_k^{-1}, ..., E_2^{-1}, E_1^{-1} on
the left, we obtain

    A = E_1^{-1} E_2^{-1} ... E_k^{-1} I_n = E_1^{-1} E_2^{-1} ... E_k^{-1},

which expresses A as the product of elementary matrices.
(4) ⇒ (5) is trivial, because any elementary matrix is invertible. In fact, A^{-1} =
E_k ... E_2 E_1.
(5) ⇒ (1) and (5) ⇒ (6) are trivial.
(6) ⇒ (5): If B is a right inverse of A, then A is a left inverse of B and one can
apply (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (5) to B and conclude that B is invertible, with A as
its unique inverse, by Lemma 1.7. That is, B is the inverse of A and so A is invertible.
                                                                        □

If a triangular matrix A has a zero diagonal entry, then the system Ax = 0 has at
least one free variable, so that it has infinitely many solutions . Hence, one can have
the following corollary.
Corollary 1.10 A triangular matrix is invertible if and only if it has no zero diagonal
entry.

From Theorem 1.9, one can see that a square matrix is invertible if it has a one-
sided inverse. In particular, if a square matrix A is invertible, then x = A^{-1}b is a
unique solution to the system Ax = b.

Problem 1.19 Find the inverse of the product

[ o -c 1
[ -b 0 1 0 0

As an application of Theorem 1.9, one can find a practical method for finding
the inverse A^{-1} of an invertible n x n matrix A. If A is invertible, then A is row-
equivalent to I_n and so there are elementary matrices E_1, E_2, ..., E_k such that
E_k ... E_2 E_1 A = I_n. Hence,

    A^{-1} = E_k ... E_2 E_1 = E_k ... E_2 E_1 I_n.

This means that A^{-1} can be obtained by performing on I_n the same sequence of the
elementary row operations that reduces A to I_n. Practically, one first constructs an
n x 2n augmented matrix [A | I_n] and then performs a Gauss-Jordan elimination
that reduces A to I_n on [A | I_n] to get [I_n | A^{-1}]: that is,

    [A | I_n] → [E_ℓ ... E_1 A | E_ℓ ... E_1 I] = [U | K]
              → [F_k ... F_1 U | F_k ... F_1 K] = [I | A^{-1}],

where E_ℓ ... E_1 represents a Gaussian elimination that reduces A to a row-echelon
form U and F_k ... F_1 represents the back substitution. The following example illus-
trates the computation of an inverse matrix.
Example 1.15 (Computing A^{-1} by Gauss-Jordan elimination) Find the inverse of

    A = [ 1  2  3 ]
        [ 2  3  5 ]
        [ 1  0  2 ].

Solution: Apply Gauss-Jordan elimination to

    [A | I] = [ 1  2  3 | 1  0  0 ]
              [ 2  3  5 | 0  1  0 ]    (-2)row 1 + row 2
              [ 1  0  2 | 0  0  1 ]    (-1)row 1 + row 3

            → [ 1  2  3 |  1  0  0 ]
              [ 0 -1 -1 | -2  1  0 ]    (-1)row 2
              [ 0 -2 -1 | -1  0  1 ]

            → [ 1  2  3 |  1  0  0 ]
              [ 0  1  1 |  2 -1  0 ]    (2)row 2 + row 3
              [ 0 -2 -1 | -1  0  1 ]

            → [ 1  2  3 |  1  0  0 ]
              [ 0  1  1 |  2 -1  0 ]
              [ 0  0  1 |  3 -2  1 ].

This is [U | K] obtained by Gaussian elimination. Now continue the back substitution
to reduce [U | K] to [I | A^{-1}].

    [U | K] = [ 1  2  3 |  1  0  0 ]    (-1)row 3 + row 2
              [ 0  1  1 |  2 -1  0 ]    (-3)row 3 + row 1
              [ 0  0  1 |  3 -2  1 ]

            → [ 1  2  0 | -8  6 -3 ]    (-2)row 2 + row 1
              [ 0  1  0 | -1  1 -1 ]
              [ 0  0  1 |  3 -2  1 ]

            → [ 1  0  0 | -6  4 -1 ]
              [ 0  1  0 | -1  1 -1 ]  =  [I | A^{-1}].
              [ 0  0  1 |  3 -2  1 ]

Thus, we get

    A^{-1} = [ -6  4 -1 ]
             [ -1  1 -1 ]
             [  3 -2  1 ].

(The reader should verify that AA^{-1} = I = A^{-1}A.)   □
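The same [A | I] → [I | A^{-1}] procedure is easy to automate. The Python sketch below is only an illustration (it assumes A is square and invertible and ignores round-off issues); applied to the matrix of Example 1.15 it returns the inverse found above.

    import numpy as np

    def inverse_by_gauss_jordan(A):
        """Reduce [A | I] to [I | A^{-1}] by elementary row operations."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        aug = np.hstack([A, np.eye(n)])                       # the n x 2n matrix [A | I]
        for col in range(n):
            pivot = col + np.argmax(np.abs(aug[col:, col]))   # choose a nonzero pivot
            aug[[col, pivot]] = aug[[pivot, col]]             # interchange rows
            aug[col] /= aug[col, col]                         # create the leading 1
            for i in range(n):
                if i != col:
                    aug[i] -= aug[i, col] * aug[col]          # clear the rest of the column
        return aug[:, n:]                                     # the right block is A^{-1}

    A = [[1, 2, 3],
         [2, 3, 5],
         [1, 0, 2]]
    print(inverse_by_gauss_jordan(A))
    # [[-6.  4. -1.]
    #  [-1.  1. -1.]
    #  [ 3. -2.  1.]]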

Note that if A is not invertible, then, at some step in Gaussian elimination, a
zero row shows up on the left-hand side of [U | K]. For example, the matrix

    A = [  1  6  4 ]
        [  2  4 -1 ]
        [ -1  2  5 ]

is row-equivalent to

    [ 1  6  4 ]
    [ 0 -8 -9 ],
    [ 0  0  0 ]

which has a zero row and is not invertible.
By Theorem 1.9, a square matrix A is invertible if and only if Ax = 0 has only
the trivial solution. That is, a square matrix A is noninvertible if and only if Ax = 0
has a nontrivial solution, say x_0. Now, for any column vector b = [b_1 ... b_n]^T, if x_1
is a solution of Ax = b for a noninvertible matrix A, so is kx_0 + x_1 for any k, since

    A(kx_0 + x_1) = k(Ax_0) + Ax_1 = k0 + b = b.

This argument strengthens Theorem 1.6 as follows when A is a square matrix:

Theorem 1.11 If A is an invertible n x n matrix, then for any column vector b =
[b_1 ... b_n]^T, the system Ax = b has exactly one solution x = A^{-1}b. If A is not
invertible, then the system has either no solution or infinitely many solutions according
to the consistency of the system.

Problem 1.20 Express A^{-1} as a product of elementary matrices for A given in Example 1.15.

Problem 1.21 When is a diagonal matrix

    D = [ d_1       0 ]
        [     ...     ]
        [ 0       d_n ]

nonsingular, and what is D^{-1}?

Problem 1.22 Write the system of linear equations

    {  x + 2y + 2z = 10
    { 2x - 2y + 3z =  1
    { 4x - 3y + 5z =  4

in matrix form Ax = b and solve it by finding A^{-1}b.

Problem 1.23 True or false: If the matrix A has a left inverse C so that CA = I_n, then x = Cb
is a solution of the system Ax = b. Justify your answer.

1.8 LDU factorization

In this section, we show that the forward elimination for solving a system of linear
equations Ax = b can be expressed by some invertible lower triangular matrix, so
that the matrix A can be factored as a product of two or more triangular matrices.
We first assume that no permutations of rows (2nd kind of operation) are necessary
throughout the whole process of forward elimination on the augmented matrix [A b].
Then the forward elimination is just multiplications of the augmented matrix [A b]
by finitely many elementary matrices E_k, ..., E_1: that is,

    E_k ... E_1 [A b] = [U y],

where each E_i is a lower triangular elementary matrix whose diagonal entries are all
1's and [U y] is a row-echelon form of [A b] without divisions of the rows by the
pivots. (Note that if A is a square matrix, then U must be an upper triangular matrix.)
Therefore, if we set L = (E_k ... E_1)^{-1} = E_1^{-1} ... E_k^{-1}, then we have A = LU,
where L is a lower triangular matrix whose diagonal entries are all 1's. (In fact, each
E_i^{-1} is also a lower triangular matrix, and a product of lower triangular matrices is
also lower triangular (see Problem 1.25).) Such a factorization A = LU is called an LU
factorization or an LU decomposition of A. For example,

    A = [ 1  0  0 ] [ d_1  *   *  ]
        [ *  1  0 ] [  0  d_2  *  ]  =  LU,
        [ *  *  1 ] [  0   0  d_3 ]

where the d_i's are the pivots.


Now, let A = LU be an LU factorization. Then the system Ax = b can be written
as LUx = b. Let Ux = y. Thus, the system

    Ax = LUx = b

can be solved by the following two steps:


Step 1 Solve Ly = b for y.
Step 2 Solve Ux = y by back substitution.
The following example illustrates the convenience of an LU factorization of a matrix
A for solving the system Ax = b.
Example 1.16 Solve the system of linear equations

    Ax = [  2  1  1  0 ] [ x_1 ]   [  1 ]
         [  4  1  0  1 ] [ x_2 ] = [ -2 ]
         [ -2  2  1  1 ] [ x_3 ]   [  7 ]
                         [ x_4 ]

by using an LU factorization of A.

Solution: The elementary matrices for the forward elimination on the augmented
matrix [A b] are easily found to be

    E_1 = [  1  0  0 ],    E_2 = [ 1  0  0 ],    E_3 = [ 1  0  0 ],
          [ -2  1  0 ]           [ 0  1  0 ]           [ 0  1  0 ]
          [  0  0  1 ]           [ 1  0  1 ]           [ 0  3  1 ]

so that

    E_3 E_2 E_1 A = [ 2   1   1  0 ]
                    [ 0  -1  -2  1 ]  =  U.
                    [ 0   0  -4  4 ]

Thus, if we set

    L = E_1^{-1} E_2^{-1} E_3^{-1} = [  1   0  0 ],
                                     [  2   1  0 ]
                                     [ -1  -3  1 ]

which is a lower triangular matrix with 1's on the diagonal, then

    A = LU = [  1   0  0 ] [ 2   1   1  0 ].
             [  2   1  0 ] [ 0  -1  -2  1 ]
             [ -1  -3  1 ] [ 0   0  -4  4 ]

Now, the system

= 1
Ly =b : Y2 = -2
3Y2 + Y3 = 7

can be easily solved inductively to get y = (1, -4, - 4) and the system
1.8. WU factorization 31

= 1
Ux=y: = -4
= -4

also can be solved by back substitution to get

for t E JR, which is the solution for the original system. o


As shown in Example 1.16, it is a simple computation to solve the systems Ly = b
=
and Ux y, because the matrix L is lower triangular and the matrix U is the matrix
obtained from A after forward elimination so that most entries of the lower-left side
are zero.

Remark: For a system Ax = b, the Gaussian elimination may be described as an


LU factorization of a matrix A. Let us assume that one needs to solve several systems
of linear equations Ax = bi for i = 1, 2, . . . , l with the same coefficient matrix A.
Instead ofperforming the Gaussian elimination process l times to solve these systems,
one can use an LU factorization of A and can solve first Ly = b, for i = 1,2, ... , l
to get solutions Yi and then the solutions of Ax = bi are just those of Ux = Yi . From
an algorithmic point of view, the suggested method based on the LU factorization
of A is much efficient than doing the Gaussian elimination repeatedly, in particular,
when l is large .
Problem 1.24 Determine an LU factorization of the matrix

1 -1 0 ]
A = -1 2 -1 ,
[ o -1 2

from which solve Ax = b for (1) b = [1 1 If and (2) b = [2 0 - l]T .

Problem 1.25 Let A and B be two lower triangular matrices. Prove that
(1) their product AB is also a lower triangular matrix;
(2) if A is invertible , then its inverse is also a lower triangular matrix ;
(3) if the diagonal entries of A and B are all 1 's, then the same holds for their product AB
and their inverses.

Note that the same holds for upper triangular matrices, and for the product of more than two
matrices .

The matrix U in the decomposition A = LU of A can further be factored as the


product U = DO, where D is a diagonal matrix whose diagonal entries are the pivots
32 Chapter 1. Linear Equations and Matrices

of U or zeros and U is a row-echelon form of A with leading 1's, so that A = LDU.

u n * *]
For example,

0 0 * * *
1 0 0 dz *
A * * = LU
* 1 0 0 0 d3 *
0 0 0 o 0

un n
* *
* * *
=
0 0
1 0
0
dz
0
0
dI dI
*
d2 * d2
0 1 .!. * .]
* 1 0 d3 0 0 0 1 *
d3
* * 0 0 0 0 0 0 0
= LDU.
For notational convention, we replace U again by U and write A = LDU. This
decomposition of A is called an LDU factorization or an LDU decomposition of
A.
For example, the matrix A in Example 1.16 was factored as

1 ° 0] [2 1-210]
A = [ -12 -31 01 0-1 1 = LU .
0 0 -4 4
It can be further factored as A = LDU by taking
2 1 10] [200][1
° ° 1/2 1/2 °]
[ -1 -2 1
00-44 o
= 0 -1
0 0 -4
0
0
1 2
1
-1
-1
= DU.

The LDU factorization of a matrix A is always possible when no row interchange


is needed in the forward elimination process . In general, if a permutation matrix for
a row interchange is necessary in the forward elimination process , then an LDU
factorization may not be possible.

Example 1.17 (The LDU factorization cannot exist) Consider a matrix A =


b] ' For forward elimination, it is necessary to interchange the first row with
the second row. Without this interchange, A has no LU or LDU factorization . In fact,
one can show that it cannot be expressed as a product of any lower triangular matrix
L and any upper triangular matrix U. 0
Suppose now that a row interchange is necessary during the forward elimination
on the augmented matrix [A b]. In this case, one can first do all the row interchanges
before doing any other type of elementary row operations, since the interchange of
rows can be done at any time, before or after the other elementary operations, with the
same effect on the solution . Those 'row-interchanging' elementary matrices altogether
form a permutation matrix P so that no more row interchanges are needed during the
forward elimination on PA. Now, the matrix PA can have an LDU factorization.
e
1.8. LDU factorization 33

E[Xrr t]18;:,
102
:::::ro:
with the third row, that is, we need to multiply A by the permutation matrix P =
so that
100

PA = = = LDU.
012 011 00200

Note that U is a row-echelon form of the matrix A. 0

Of course, if we choose a different permutation p i, then the L DU factorization of


p iA may be different from that of PA, even if there is another permutation matrix P"
that changes p i A to P A. Moreo ver, as the following example shows, even if a per-
mutation matrix is not necessary in the Gaussian elimination , the LDU factorization
of A need not be unique.

Example1.19 (Infinitely many LDU factorizations) The matrix

110]
B=
[ 000
1 3 0

has the LDU factorization


1 0
B= 1 1 ] = LDU
[ o 0 1 OO x 000

for any value x . It shows that a singular matrix B has infinitely many LDU factor-
izations . 0

However, if the matrix A is invertible and if the permutation matrix P is fixed


when it is necessary, then the matrix PA has a unique LDU factorization.

Theorem 1.12 Let A be an invertible matrix. Thenfor a fixed suitable permutation


matrix P the matrix PA has a unique LDU factorization.

Proof: Suppose that PA = LjD j U , = L zDzUz , wheretheL 's are lower triangular,
the U's are upper triangular whose diagonals are all l's, and the D's are diagonal
matrices with no zeros on the diagonal. One needs to show L , = Lz, D; = Dz , and
U, = Uz for the uniqueness .
Note that the inverse of a lower triangular matrix is also lower triangular, and the
inverse of an upper triangular matrix is also upper triangular. And the inverse of a
34 Chapter 1. Linear Equations and Matrices

diagonal matrix is also diagonal. Therefore, by multiplying (LIDI)-I = D I I L I I


on the left and V:;I on the right, the equation LI DI VI = L2D2V2 becomes

VIV:;I = DIILIIL2D2 '

The left-hand side is upper triangular, while the right-hand side is lower triangular.
Hence, both sides must be diagonal. However, since the diagonal entries of the upper
triangular matrix VI V:;I are all 1's, it must be the identity matrix I (see Problem
1.25). Thus VI V:; 1 = I, i.e., VI = V2. Similarly, L I I L2 = DID:;I implies that
LI = L2 and DI = D2. 0

In particular, if an invertiblematrix A is symmetric (i.e., A = A T) , and if it can


be factored into A = LDV without row interchanges,then we have

and thus, by the uniqueness of factorizations, we have V = L T and A = L DL T •

Problem 1.26 FUuI the factors L , D , .nd U for A =[ -! -1 0 ]


2 -1
-1
.
2
What is the solution to Ax = b for b = [1 0 - If?

probl[em/2; all possible permutation matrices P, find the LDU factorization of PA for

A= 2 4 2 .
111

1.9 Applications

1.9.1 Cryptography

Cryptographyis the study of sendingmessagesin disguisedform (secretcodes) so that


only the intended recipients can remove the disguise and read the message; modem
cryptography uses advanced mathematics. As an application of invertible matrices,
we introduce a simple coding. Suppose we associate a prescribed number with every
letter in the alphabet; for example,

ABC D x Y Z Blank ?
t t t t t t t t t t
° 1 2 3 23 24 25 26 27 28.

Suppose that we wantto send the message"GOOD LUCK." Replacethis message


by
6, 14, 14, 3, 26, 11, 20, 2, 10
1.9.1. Application: Cryptography 35

according to the preceding substitution scheme. To use a matrix technique, we first


break the message into three vectors in ]R3 each with three components, by adding
extra blanks if necessary:

Next, choose a nonsingular 3 x 3 matrix A, say

I 00]
A=
[2 I 0 ,
I I I

which is supposed to be known to both sender and receiver. Then, as a matrix multi-
plication, A translates our message into

By putting the components of the resulting vectors consecutively, we transmit

6, 26, 34, 3, 32, 40, 20, 42, 32.

To decode this message, the receiver may follow the following process . Suppose
that we received the following reply from our correspondent:

19, 45, 26, 13, 36, 41.

To decode it, first break the message into two vectors in ]R3 as before:

un un
We want to find two vectors XI , X2 such that AXi is the i-th vector of the above two
vectors : i.e.,

AX! Ax,

Since A is invertible, the vectors XI , X2 can be found by multiplying the inverse of A


to the two vectors given in the message. By an easy computation, one can find

I 00]
A-I =
[ -2 I 0 .
1 -I I

Therefore,
36 Chapter 1. Linear Equations and Matrices

=[ -21 01 00] [ 45
19 ] = [197 ]
°
XI ,
1 -1 1 26
The numbers one obtains are

19, 7, 0, 13, 10, 18.

Using our correspondence between letters and numbers, the message we have received
is "THANKS."

Problem 1.28 Encode ''TAKEUFO" using the same matrix A used in the above example.

1.9.2 Electrical network

In an electrical network, a simple current flow may be illustrated by a diagram like the
one below. Such a network involves only voltage sources , like batteries, and resistors,
like bulbs , motors, or refrigerators. The voltage is measured in volts, the resistance in
ohms, and the current flow in amperes (amps, in short). For such an electrical network,
current flow is governed by the following three laws:

Ohm's Law: The voltage drop V across a resistor is the product of the current I
and the resistance R: V = I R.
Kirchhoff's Current Law (KCL): The current flow into a node equals the current
flow out of the node.
Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage drops around
a closed loop equals the total voltage sources in the loop .

Example 1.20 Determine the currents in the network given in Figure 104.

2 ohms 2 ohms
P

18 volts

Figure 1.4. A circuit network


1.9.3. Application: Leontief model 37

40 ohms 20 volts 1 ohm

30 ohms

5 volts

40 ohms 40 volts 4 volts


(1) (2)

Figure 1.5. Two circuit networks

Solution: By applying KCL to nodes P and Q, we get equations

h + h = 12 at P,
12 = h + ls at Q.
Observe that both equations are the same, and one of them is redundant. By applying
KVL to each of the loops in the network clockwise direction, we get

6h + 212 = 0 from the left loop,


212 + 3h = 18 from the right loop.

Collecting all the equations, we get a system of linear equations:

j h -
6h
212
+ 212
h +
+ st,
h =
=
O
0
18.

By solving it, the currents are h = -1 amp, 12 = 3 amps and /3 = 4 amps. The
negative sign for h means that the current h flows in the direction opposite to that
shown in the figure. 0

Problem 1.29 Determine the currents in the networks given in Figure 1.5.

1.9.3 Leontiefmodel

Another significant application of linear algebra is to a mathematical model in eco-


nomics. In most nations, an economic society may be divided into many sectors that
produce goods or services, such as the automobile industry, oil industry, steelindustry,
38 Chapter 1. Linear Equations and Matrices

communication industry, and so on. Then a fundamental problem in economics is to


find the equilibrium of the supply and the demand in the economy.
There are two kind of demands for the goods: the intermediate demand from the
industries themselves (or the sectors) that are needed as inputs for their own pro-
duction, and the extra demand from the consumer, the governmental use, surplus
production, or exports. Practically, the interrelation between the sectors is very com-
plicated, and the connection between the extra demand and the production is unclear.
A natural question is whether there is a production level such that the total amounts
produced (or supply) will exactly balance the total demandfor the production, so that
the equality

{Total output} = {Total demand}


= {Intermediate demand} + {Extra demand}

holds. This problem can be described by a system of linear equations, which is called
the LeontiefInput-Output Model . To illustrate this, we show a simple example.
Suppose that a nation's economy consists of three sectors: It = automobile in-
dustry, h == steel industry, and h = oil industry.
Let x = [Xl Xzx3f denote the production vector (or production level) in ]R3 ,
where each entry Xi denotes the total amount (in a common unit such as 'dollars'
rather than quantities such as 'tons' or 'gallons') of the output that the industry Ii
produces per year.
The intermediate demand may be explained as follows. Suppose that, for the total
output Xz units of the steel industry h 20% is contributed by the output of It. 40%
by that of hand 20% by that of h Then we can write this as a column vector, called
a unit consumption vector of h :

0.2 ]
Cz = 0.4 .
[ 0.2

For example , if h decides to produce 100 units per year, then it will order (or demand)
20 units from It , 40 units from hand 20 units from h: i.e., the consumption vector
of h for the production Xz = 100 units can be written as a column vector: 100cz =
[2040 20]T. From the concept of the consumption vector, it is clear that the sum of
decimal fractions in the column cz must be ::: 1.
In our example, suppose that the demands (inputs) of the outputs are given by the
following matrix , called an input-output matrix:

output
It h h
A = input
t,
h
[0.3
0.1
0.2
0.4
03 ]
0.1 .
h 0.3 0.2 0.3
t t t
c\ cz C3
1.9.3. Application: Leontiefmodel 39

In this matrix, an industry looks down a column to see how much it needs from
where to produce its total output, and it looks across a row to see how much of its
output goes to where. For example, the second row says that, out of the total output
Xz units of the steel industry lz. as the intermediate demand, the automobile industry
h demands 10% of the output Xl, the steel industry /z demands 40% of the output
Xz and the oil industry h demands 10% of the output X3. Therefore , it is now easy to
see that the intermediate demand of the economy can be written as

0.3 0.2 0.3] [ Xl] [ 0.3Xl + 0.2xz + 0.3X3 ]


Ax = 0.1 0.4 0.1 Xz = O.lXl + O.4xz + 0.lX3 .
[ 0.3 0.2 0.3 X3 0 3Xl + 0.2xz + 0.3X3

Suppose that the extra demand in our example is given by d = [dl, di, d3f =
[30,20, ioi" , Then the problem for this economy is to find the production vector x
satisfying the following equation:

x = Ax+d.

Another form of the equation is (l - A)x = d, where the matrix I - A is called the
Leontief matrix. If I - A is not invertible, then the equation may have no solution
or infinitely many solutions depending on what d is. If I - A is invertible, then the
equation has the unique solution x = (l - A)-ld. Now, our example can be written

[]
as

[ X3
] = +[ ].
0.3 0.2 0.3 X3 10
In this example, it turns out that the matrix I - A is invertible and

2.0 1.0 1.0]


(l - A)-l = 0.5 2.0 0.5 .
[ 1.0 1.0 2.0

l
Therefore,

x (l - AJ-1d = [

which gives the total amount of product Xi of the industry I i for one year to meet the
required demand .

Remark: (1) Under the usual circumstances, the sum ofthe entries in a column of the
consumption matrix A is less than one because a sector should require less than one
units worth of inputs to produce one unit of output. This actually implies that I - A
is invertible and the production vector x is feasible in the sense that the entries in x
are all nonnegative as the following argument shows.
(2) In general, by using induction one can easily verify that for any k = 1, 2, . . . ,
40 Chapter 1. Linear Equations and Matrices

If the sums of column entries of A are all strictly less than one, then limk-+oo A k = 0
(see Section 6.4 for the limit of a sequence of matrices). Thus, we get (l - A)(l +
A + ... + A k + ...) = I, that is,

(l - A)-I = I + A + ...+ Ak + ... .


This also shows a practicalway of computing (l - A)-I sinceby takingk sufficiently
large the right-handside may be made veryclose to (l- A)-I. In Chapter6, an easier
method of computing A k will be shown.
In summary, if A and d have nonnegative entries and if the sum of the entries of
each column of A is less than one, then I - A is invertibleand the inverseis given as
the above formula. Moreover, as the formula shows the entries of the inverse are all
nonnegative, and so are those of the production vectorx = (l - A)-Id.
Problem 1.30 Determine the total demand for industries It, 12 and 13 for the input-output
matrix A and the extra demand vector d given below:

0.1 0.7 0.2]


A= 0.5 0.1 0.6 with d = O.
[ 0.4 0.2 0.2

Problem 1.31 Suppose that an economy is divided into three sectors : II = services, h =
=
manufacturing industries, and h agriculture . For each unit of output, II demands no services
from 1], 0.4 units from lz. and 0.5 units from 13. For each unit of output. lz requires 0.1 units
from sector II of services. 0.7 units from other parts in sector lz. and no product from sector
13. For each unit of output. 13 demands 0.8 units of services 110 0.1 units of manufacturing
products from lz. and 0.1 units of its own output from lJ. Determine the production level to
balance the economy when 90 units of services, 10 units of manufacturing. and 30 units of
agriculture are required as the extra demand.

1.10 Exercises
1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?

0 0 0
0 1 0
-3 ]
4 •
h[l n- 0
O
0
0
1
0
0
0
1

-n D=[l
0 0 1 2
0 0 0
0 0 0
1 0 0
0 1 2
0 1 1

nF=[l
0 0 1
0 0 1
0 0 0
0 0 0

E=[l -H
1 0 0
1 0 0
0 1 0
0 0 1
1 0 -2
0 0 0
1.10. Exercises 41

1.2. Find a row-echelon form of each matrix.

[I -3 212]
H
2 3 4
3 4 5
(1) 3 -9 10 2 9 (2) 4 5 1
2 -6 4 2 4 '
5 1 2
2 -6 8 1 7
1 2 3

1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination.

(1)
3XI + 2X2
: X3
+;:
X4
-
I

I
Xl + X2 + 3X3 3X4 -8 .

(2) + z
2x + 4z 1.
What are the pivots in each of 3rd kind elementary operations?
1.5. Which of the following systems has a nontrivial solution?

I
X + 2y + 3z = 0 12x + y - z 0
(1) 2y + 2z = 0 (2) x - 2y - 3z = 0
x + 2y + 3z = O. 3x + y - 2z = O.
1.6. Determine all values of the b, that make the following system consistent:

I
x + y - z = bl
2y + z = b2
Y-Z=b3·

1.7. Determine the condition on b; so that the following system has no solution :

2x + Y + 7z bl

1 6x
2x -
- 2y
y
+
+
l lz
3z
b2
b3.

1.8. Let A and B be matrices of the same size.


(I) Show that, if Ax = 0 for all x, then A is the zero matrix.
(2) Show that, if Ax = Bx for all x, then A = B.
1.9. Compute ABC and CAB for

:j. -I]
1.10. Prove that if A is a 3 x 3 matrix such that AB = BA for every 3 x 3 matrix B , then
A = clJ for some constant c.
1.11. Let A = Find A k for all integers k.
001
42 Chapter 1. Linear Equations and Matrices

1.12. Compute (2A - B)C and CC T for

A = B = [-; C =[
I 0 I 0 0 1 -2 2 I
1.13. Let f(x) =anxn + an_IX n- 1 + ...+ alx + ao be a polynomial. For any square matrix
A, a matrix polynomial f (A) is defined as

f(A) = anAn + an_lAn-I + ...+ alA + aol.


For f(x) = 3x 3

-i
+

n
x 2 - 2x + 3, find f(A) for

1.14. Find the symmetric part and the skew-symmetric part of each of the following matrices .

(I) A =[ ; ;
-1 3 2
(2) A =
0 0
; -1].3

1.15. Find AA T and A T A for the matrix A = [; -


2
i].
8 4 0

1.16. Let A -I =

U:J.
421

(I) Flod. matrix B such that AB

(2) Find a matrix C such that AC = A2 + A.

1.17. Find all possible choices of a, band c so that A = [; has an inverse matrix such

that A-I = A.
1.18. Decide whether or not each of the following matrices is invertible. Find the inverses for
invertible ones.

11I]
:], B= [ 05 52 3I , ; -;].
2 I
o 0 0 4
1.19. Find the inverse of each of the following matrices:

6411 1248 OOlk


1.20. Suppose A is a 2 x I matrix and B is a I x 2 matrix. Prove that the product AB is not

i
invertible.
3 4 ]
1.21. Find three matrices which are row equivalent to A - 2 -I .
-3 4
1.10. Exercises 43

1.22. Write the following systems of equations as matrix equations Ax = b and solve them by
computing A -I b:
2x1 - X2 + 2
3X3 =
=
XI -
XI +
IX2 + X3
=
5
(1)
1 X2 -
2x1 + X2 - 2X3
5 (2)
4X3
7, = 4xI
X2
3X2 + 2x3
X3

1.23. Find the LDU factorization for each of the following matrices:
=
-1
-3.

(1) A = J. (2) A = l
UH
1.24. Find the LDL T factorization of the following symmetric matrices :

(1) A (2) A [: n
1.25. Solve Ax = b with A = LU , where Land U are given as
L = [ .: U = b = [ -; ] .

n
o -1 I 0 0 1 4
Forward elimination is the same as Lc = b, and back-substitution is Ux = c.

: ,00

(1) Solve Ax = b by Gauss-Jordan elimination.


(2) Find the LDU factorizat ion of A.
(3) Write A as a product of elementary matrice s.
(4) Find the inverse of A.
1.27. A square matrix A is said to be nilpotent if A k = 0 for a positive integer k.
(1) Show that any invertible matrix is not nilpotent.
(2) Show that any triangular matrix with zero diagonal is nilpoten t.
(3) Show that if A is a nilpotent with Ak = 0, then I - A is invertible with its inverse
I + A + ' " + Ak- I .
1.28 . A square matrix A is said to be idempotent if A 2 = A.
(l) Find an example of an idempotent matrix other than 0 or I .
(2) Show that, if a matrix A is both idempotent and invertible, then A = I.
1.29. Determine whether the following statements are true or false, in general, and justify your
answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible if and only if B
is invertible .
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that A 2 = I and B2 = I, then (A B) -I = BA.
(4) If A and B are invertible matrices, A + B is also invertible .
(5) If A, Band AB are symmetric, then AB = BA .
(6) If A and B are symmetric and of the same size, then AB is also symmetric.
44 Chapter 1. Linear Equations and Matrices

(7) If A is invertible and symmetric, then A -\ is also symmetric.


(8) Let AB T = I . Then A is invertible if and only if B is invertible .
(9) If a square matrix A is not invertible, then neither is AB for any B.
(10) If E\ and E2 are elementary matrices , then E\E2 = E2E\ .
(11) The inverse of an invertible upper triangular matrix is upper triangular.
(12) Any invertible matrix A can be written as A = LU, where L is lower triangular and
U is upper triangular.
2
Determinants

2.1 Basic properties of the determinant


Our primary interest in Chapter 1 was in the solvability or finding solutions of a
system Ax = b of linear equations . For an invertible matrix A, Theorem 1.9 shows
that the system has a unique solution x = A-I b for any b.
Now the question is how to determine whether or not a square matrix A is invert-
ible. In this chapter, we introduce the notion of determinant as a real-valued function
of square matrices that satisfies certain axiomatic rules, and then show that a square
matrix A is invertible if and only if the determinant of A is not zero. In fact, it was
shown in Chapter I that a 2 x 2 matrix A = is invertible if and only if
ad - be f:; O. This number is called the determinant of A, written det A, and is defined
formally as follows :

Definition 2.1 For a 2 x 2 matrix A = E the determinant of


A is defined as det A = ad - be.

Geometrically, it turns out that the determinant of a 2 x 2 matrix A represents, up


to sign, the area of a parallelogram in the xy-plane whose edges are constructed
by the row vectors of A (see Theorem 2.10). Naturally, one can expect to define
a determinant function on higher order square matrices so that it has a geometric
interpretation similar to the 2 x 2 case . However, the formula itself in Definition 2.1
does not provide any clue of how to extend this idea of determinant to higher order
matrices. Hence, we first examine some fundamental properties of the determinant
function defined in Definition 2.1.
By a direct computation, one can easily verify that the function det in Definition 2.1
satisfies the following lemma.

Lemma 2.1 (1) det 1.

HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
46 Chapter 2. Determinants

Proof: (2) det : ] = be - ad = -(ad - be) = - det [

(3) det [ ka la' kb lb' ] = (ka + la')d - (kb + lb')e

= k(ad - be) + l(a'd - b'e)

= k det [ ae db] + l det [ a'e b'


d .
]
o

In Lemma 2.5, it will be shown that if a function f : M2x2(lR) lRsatisfies the


properties (l)-{3) in Lemma 2.1, then it must be the function detdefined in Definition
2.1, that is, f(A) =ad -be. The properties (1)-(3) in Lemma 2.1 of the determinant
on M2x2(lR) enable us to define the determinant function for any square matrix.

Definition 2.2 A real-valued function f : Mnxn(lR) lRofalln xn square matrices


is called a determinant if it satisfies the following three rules:
(Rl) The value of f of the identity matrix is I , i.e., fUn) 1; =
(R2) the value of f changes sign if any two rows are interchanged;
(R3) f is linear in the first row: that is, by definition,

krl ] = [ l
... +
.. ...
f kf . f ,
[
rn rn rn

where r i's denote the row vectors [ail . . . ain] of a matrix .

Remark: (1) To be familiar with the linearity rule (R3), note that all row vectors of
any n x n matrix belong to the set Mlxn(lR), on which a matrix sum and a scalar
multiplication are well defined.A real-valued function f : M 1 xn (R) lR is said to be
linear if it preserves these two operations: that is, for any two vectors x, y E M1 xn (R)
and scalar k,

f(x + y) = f(x) + f(y) and f(kx) = kf(x),


or, equivalently f(kx+ly) = kf(x)+lf(y). Such a linear function will be discussed
again in Chapter 4.
(2) The determinant is not defined for a non-square matrix .

Itis already shown that the det on 2x2 matrices satisfies the rules (Rl)-{R3). In the
next section, one can see that for each positive integer n there always exists a function
2.1. Basic properties of the determinant 47

f : MnxnOR) IR satisfying the three rules (Rl)-(R3) and such a function is unique
(existence and uniqueness). Therefore, we say 'the' determinant and designate it as
'det' in any order.
Let us first derive some direct consequences of the rules (Rl)-(R3).

Theorem 2.2 The determinant satisfies the following properties.


(1) The determinant is linear in each row.
(2) If A has either a zero row or two identical rows, then det A = O.
(3) The elementary row operation that adds a constant multiple ofone row to another
row leaves the determinant unchanged.

Proof: (I) Any row can be placed in the first row by interchanging rows with a change
of sign in the determinant by the rule (R2), and then using the linearity rule (R3) and
(R2) again by interchanging the same rows.
(2) If A has a zero row, then this row is zero times the zero row so that det A 0=
by (1). If A has two identical rows, then interchanging those two identical rows does
not change the matrix itself but det A = - det A by the rule (R2), so that det A = O.
(3) By a direct computation using (1), one can get

ri+krj r, rj
det = det + k det

in which the second term on the right-hand side is zero by (2). o


The rule (R2) of the determinant function is said to be the alternating property,
and the property (1) in Theorem 2.2 is said to be multilinearity.
It is now easy to see the effect of elementary row operations on evaluations of the
determinant. The first elementary row operation that 'multiplies a row by a constant
k' changes the determinant to k times the determinant, by Theorem 2.2(1). The rule
(R2) explains the effect of the second elementary row operation that 'interchanges
two rows '. The third elementary row operation that 'adds a constant multiple of a row
to another' is explained in Theorem 2.2(3). In summary, one can see that

det(EA) = det E det A for any elementary matrix E.

For example, if E is the elementary matrix obtained from the identity matrix by
'multiplies a row by a constant k', then det(EA) =
k det A and det E k by =
Theorem 2.2(1) , so that det(EA) = det E detA . As a consequence, if two matrices
A and B are row-equivalent, then det A = k det B for some nonzero number k.
48 Chapter 2. Determinants

Example 2.1 Consider a matrix

1
A= [ b
b+c c+a

If one adds the second row to the third, then the third row becomes

[a+b+c a+b+c a+b+c],

which is a scalar multiple of the first row. Thus , det A = O. o

Problem 2.1 Show that, for an n x n matrix A and k E JR, det(kA) = k n det A.
Problem 2.2 Explain why det A = 0 for
a + l a+4 a+7]
(1) A =[ a+2 a+5 a+ 8 •
a+3 a+6 a+9

Recall that any square matrix can be transformed into an upper triangular matrix
by forward elimination, possibly with row interchanges. Further properties of the
determinant are obtained in the following theorem.

Theorem 2.3 The determinant satisfies the following properties.


(1) The determinant ofa triangular matrix is the product of the diagonal entries.
(2) The matrix A is invertible if and only if det A =1= O.
(3) For any two n x n matrices A and B, det(AB) = det A det B.
(4) detA T = detA.

Proof: (1) If A is a diagonal matrix, then it is clear that det A =


all ... ann by the
multilinearity in Theorem 2.2(1) and rule (Rj). Suppose that A is a lower triangular
matrix . If A has a zero diagonal entry, then a forward elimination, which does not
change the determinant, produces a zero row, so that det A = O. If A does not have
a zero diagonal entry, a forward elimination makes A row equivalent to the diagonal
matrix D whose diagonal entries are exactly those of A, so that det A = det D=
all ' " ann. Similar arguments can be applied to an upper triangular matrix.
(2) A square matrix A is row equivalent to an upper triangular matrix U through
a forward elimination possibly with row interchanges : that is, A = PLU for some
permutation matrix P and a lower triangular matrix L whose diagonal entries are all
1'S o Thus det A = ± det U , and the invertibility of U and A are equivalent. However,
U is invertible if and only if U has no zero diagonal entry by Corollary 1.10, which
is equivalent to det U =1= 0 by (1).
2.1. Basic properties of the determinant 49

(3) If A is not invertible, then neither is AB, and so det(AB) = 0 = det A det B.
If A is invertible, it can be written as a product of elementary matrices by Theorem
1.9, say A = EIE2 ' " Ei, Then by induction on k,

det(AB) = det(EIE2'" EkB)


= det El det E2 ... det Ek det B
= det(El E2.. . Ek) det B
= detAdet B.

(4) Clearly, A is not invertibleif and only if AT is not. Thus, for a singular matrix
A we have det AT = 0 = det A. If A is invertible, then write it again as a product of
elementary matrices, say A = E; E2.. . Ei , But, det E = det E T for any elementary
matrix E. In fact, if E is an elementary matrix obtained from the identity matrix by
row interchange, then det E T = -1 = det E by (R2), and all elementary matrices of
other types are triangular, so that det E = det E T . Hence, we have by (3)

detA T = det(EI E2'" Ek)T


= det(E[ .. . EI Ef)
= det E[ . . . det EI det Ef
= det Ek' . . det E2 det El
= detA. 0

Remark: From the equality det A = det AT, one could define the determinant in
terms of columns instead of rows in Definition2.2, and Theorem 2.2 is also true with
'columns' instead of 'rows' .
Example 2.2 (Computing det A by a forward elimination) Evaluate the determinant
of
2-4 0 0]
1 -3 0 1
A = [ 1
3 -4
0 -1
3-1
2 .

Solution: By using forward elimination, A can be transformed to an upper triangu-


lar matrix U . Since the forward elimination does not change the determinant, the
determinant of A is simply the product of the diagonal entries of U:

= = det
2-4
o 0 0] =
-1 0 1
2 . ( _ 1)2 . 13 = 26. o
det A det U
[o 0 0 -1
0 0 13
4

Problem 2.3 Prove that if A is invertible, then det A-I = 1/ det A.


50 Chapter 2. Determinants

Problem 2.4 Evaluate the determinant of each of the following matrices:

1 4 2] [11 13 24
12 23
21 22 14] [ x13
(1) 3 1 1 , (2) 31 32 33 34 , (3) x 2
[
-2 2 3 41 42 43 44 x

2.2 Existence and uniqueness of the determinant


Throughout this section, we prove the following fundamental theorem for the deter-
minant.

Theorem 2.4 For any natural number n,


(1) (Existence)thereexistsareal-valuedfunetionf: Mnxn(lR) IRwhiehsatisfies
the three rules (RI)-(R3) in Definition 2.2.
(2) (Uniqueness) Suehfunetion is unique.

Clearly, it is true when n = 1 with conclusion det[a] = a.


For 2 x 2 matrices: When n = 2, the existence theorem comes from Lemma
2.1. The next lemma shows that any function f : M2x2(lR) IR satisfying the three
rules (RI)-(R3) must be the det in Definition 2.1, which implies the uniqueness of
the determinant function on M2x2(1R).

Lemma 2.5 If a function f : M2x2(lR) R satisfies the rules (Rj)-(R3), then


f = ad - be. That is, f(A) = detA.

Proof: First, note that f b] = - 1 by the rules (Rj) and (R2).

f(A) = f = f [ + 0 0+ ]

=
ad + 0+ 0 - be = ad - be,

where the third and fourth equalities come from the multilinearity in Theorem 2.2(1).
o

For 3 x 3 matrices: For n = 3, the same process as in the case of n = 2 can


be applied . That is, by repeated use of the three rules (Rj)-(R3) as in the proof of
2.2. Existence and uniqueness of the determinant 51

Lemma 2.5, one can derive an explicit formula for det A of a matrix A = [aij] in
M3x3(lR) as follows :

[all a\2
det a21 a22 a23
a\,]
a31 an a33

= det
[all 0
a22 ]+de{ a\2o 0] [0
a23
0
+ det a21 0 af]
0 a33 a31 o 0 0 an

+det
[ ÿn
0
0 a23
o ] + det [0
a21
a\20 0] [0 0
0 + det 0 a22 ]
a32 o 0 0 a33 a31 0

The first equality is obtained by the multilinearity in Theorem 2.2(1): First, by


applying it to the first row with

[all a \2 an] = [all 0 0] + [0 a\2 0] + [0 0 an],

det A becomes the sum of determinants of three matrices . Observe that, in each of the
three matrices, the first row has just one entry from A and all others zero. Subsequently,
by applying the same multilinearity to the second and the third rows of each of the
three matrices, one gets the sum of the determinants of 33 = 27 matrices, each of
which has exactly three entries from A, one in each of three rows, and all other entries
zero. In each of those 27 matrices, if any two of the three entries of A are in the
same column, then the matrix contains a zero column so that its determinant is zero.
Consequently, the determinants of six matrices are left to get the first equality.
The second equality is just the computation of the six determinants by using the
rules (R2) and Theorem 2.3(1). In fact, in each of those six matrices, no two entries
from A are in the same row or in the same column, and thus one can take suitable
'column interchanges' to convert it to a diagonal matrix. Thus the determinant of each
of them is just the product of the three entries with ± sign which is determined by
the number of column interchanges.

Remark: The explicit formula for the determinant of a 3 x 3 matrix can easily be
memorized by the following scheme. Copy the first two columns and put them on
the right of the matrix , and compute the determinant by multiplying entries on six
diagonals with a + sign or a - sign as in Figure 2.1. This is known as Sarrus's method
for 3 x 3 matrices. It has no analogue for matrices of higher order n ::: 4.
The computation of the explicit formula for det A shows that, if any real-valued
function f : M3x3(lR) -+ lR satisfies the rules (Rl)-(R3) , then f(A) = detA for
any matrix A = [aij] E M3x3(lR). This proves the uniqueness theorem when n = 3.
On the other hand, one can easily show that the given explicit formula for det A of
a matrix A E M3x3(lR) satisfies the three rules, which proves the existence when
52 Chapter 2. Determinants

Figure 2.1. Sarrus's method

n = 3. Therefore, for n = 3, it shows both the uniqueness and the existence of the
determinant function on M3x3(lR), which proves Theorem 2.4 when n = 3.
Problem 2.5 Showthatthegivenexplicitformula of thedeterminant for3 x 3 matrices satisfies
the three rules (RIHR3).

Problem 2.6 Use Sarrus's methodto evaluate the determinants of

(1) A =[ 142] ,
3 1 1
-2 2 3
(2) A = [4
1
-2

Now, a reader might have an idea how to prove the uniqueness and the existence of
the determinant function on Mn xn(R) for n > 3. If so, that reader may omit reading its
continued proof below and rather concentrate on understanding the explicit formula
of det A in Theorem 2.6.
For matrices of higher order n > 3: Again , we repeat the same procedure as for
the 3 x 3 matrices to get an explicit formula for det A of any square matrix A = [aij]
of order n.
(Step 1) Just as the case for n = 3, use the multilinearity in each row of A to
get det A as the sum of the determinants of nn matrices. Notice that each one of n"
matrices has exactly n entries from A, one in each of n rows . However, if any two
of the n entries from A are in the same column, then it must have a zero column, so
that its determinant is zero and it can be neglected in the summation. Now, in each
remaining matrix, the n entries from A must be in different columns: that is, no two
of the n entries from A are in the same row or in the same column .
(Step 2) Now, we aim to estimate how many of them remain. From the observation
in Step 1, in each of the remaining matrices, those n entries from A are of the form

with some column indices i , i, k, .. . , l. Since no two of these n entries are in the
same column, the column indices i , l, k, .. . , £ are just a rearrangement of I, 2, .. . , n
without repetition or omissions. It is not hard to see that there are just n! ways of such
rearrangements, so that n! matrices remain for further consideration. (Here n! =
n(n - 1) .. ·2· I, called n factorial.)
2.2. Existence and uniquenessof the determinant 53

Remark: In fact, the n! remaining matrices can be constructed from the matrix A =
[aij] as follows: First , choose anyone entry from the first row of A, say ali, in the
i -th column. Then all the other n - I entries a2j, a3k. .. . , ani should be taken from
the columns different from the i -th column. That is, they should be chosen from
the submatrix of A obtained by deleting the row and the column containing ali . If
the second entry a2j is taken from the second row, then the third entry a3k should be
taken from the submatrix of A obtained by deleting the two rows and the two columns
containing ali and a 2j, and so on. Finally, if the first n - 1 entries

are chosen, then there is no alternative choice for the last one an i since it is the one
left after deleting n - 1 rows and n - I columns from A.
(Step 3) We now compute the determinant of each of the n! remaining matrices.
Since each of those matrices has just n entries from A so that no two of them are in
the same row or in the same column, one can convert it into a diagonal matrix by
' suitable ' column interchanges. Then the determinant will be just the product of the
n entries from A with '±' sign , which will be determined by the number (actually
the parity) of the column interchanges. To determine the sign , let us once again look
back to the case of n = 3.

Example 2.3 (Convert intoa diagonal matrixby column interchanges) Suppose that
one of the six matrices is of the form:

0 0 a 13]
o a22 0 .
[ a31 0 0
Then , one can convert this matrix into a diagonal matrix by interchanging the first
and the third columns. That is,

0 0 a13 ] [ a13
det 0 a 22 0 = - det 0
[
a31 0 0 0
Note that a column interchange is the same as an interchange of the corresponding
column indices. Moreover, in each diagonal entry of a matrix, the row index must be
the same as its column index . Hence, to convert such a matrix into a diagonal matrix,
one has to convert the given arrangement of the column indices (in the example we
have 3, 2, 1) to the standard order 1, 2, 3 to be matched with the arrangement 1,2,3
of the row indices.
In this case, there may be several ways of column interchanges to convert the
given matrix to a diagonal matrix. For example, to convert the given arrangement 3,
2, 1 of the column indices to the standard order 1, 2, 3, one can take either just one
interchanging of 3 and 1, or three interchanges: 3 and 2, 3 and 1, and then 2 and 1. In
either case, the parity is odd so that the "-" sign in the computation of the determinant
came from (-1 ) I = (_ 1)3, where the exponents mean the numbers of interchanges
of the column indices. 0
54 Chapter 2. Determinants

To formalize our discussions, we introduce a mathematical terminology for a


rearrangement of n objects:

Definition 2.3 A permutation of n objects is a one-to-one function from the set of


n objects onto itself.

=
In most cases, we use the set of integers Nn {I, 2, .. . , n} for a set of n objects.
A permutation a of Nn assigns a numbera(i) in Nn to each number i in Nn , and this
permutation a is usually denoted by

a = (a(I), a(2), .. . , a(n») = :: :

Here , the first row is the usual lay-out of Nn as the domain set, and the second row
is the image set showing an arrangement of the numbers in Nn in a certain order
without repetitions or omissions. If Sn denotes the set of all permutations of Nn,
then, as mentioned previously, one can see that Sn has exactly n! permutations. For
example, S2 has 2 = 2!, S3 has 6 = 3!, and S4 has 24 = 4! permutations.

Definition 2.4 A permutation a = (iI, h, .. . , jn) is said to have an inversion if


i,> jt for s < t (i.e., a larger number precedes a smaller number).

For example, the permutation a = (3,1,5,4,2) has five inversions, since 3 pre-
cedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity (1, 2, . .. , n)
is the only one without inversions .

Definition 2.5 A permutation is said to be even if it has an even number of inversions,


and it is said to be odd if it has an odd number of inversions. For a permutation a in
Sn, the sign of a is defined as

_ { 1 if a is an even permutation } _ (_I)k


sgn (a ) - -1 if a is an odd permutation - ,

where k is the number of inversions of a.

For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2)
are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd.
In general, one can convert a permutation a = (a(1), a(2), . . . , a(n») in Sn into
the identity permutation (1, 2, .. . , n) by transposing each inversion of a . However,
the number of necessary transpositions to convert the given permutation into the
identity permutation need not be unique as shown in Example 2.3. An interesting
fact is that, even though the number of necessary transpositions is not unique, the
parity (even or odd) is always the same as that of the number of inversions. (This
may not be clear and the readers are suggested to convince themselves with a couple
of examples.)
We now go back to Step 3 to compute the determinants of those remaining
n! matrices. Each of them has n entries of the form al 17 ( I ) , a2u(2), . . . , an17 (n)
2.2. Existence and uniqueness of the determinant 55

for a permutation 0' E Sn. Moreover, this can be converted into a diagonal ma-
trix by column interchanges corresponding to the inversions in the permutation
0' = (O'(l), 0'(2), . . . , O'(n)}. Hence, its determinant is equal to

sgn(0')alu(1)a2a(2) . . . anu(n)'

This is called a signed elementary product of A.


Our discussions can be summarized as follows to get an explicit formula for det A:

Theorem 2.6 For an n x n matrix A,

det A = L sgn(0')al u(1)a2a(2) . . . anu(n) '


UES n

That is, det A is the sum ofall signed elementary products of A.

This shows that the determinant must be unique if it exists. On the other hand, one
can show that the explicit formula for det A in Theorem 2.6 satisfies the three rules
(Rl)-(R3). Therefore, we have both existence and uniqueness for the determinant
function of square matrices of any order n 2: 1, which proves Theorem 2.4.

As the last part of this section , we add an example to demonstrate that any per-
mutation 0' can be converted into the identity permutation by the same number of
transpositions as the number of inversions in 0'.

Example 2.4 (Convert into the identity permutation by transpositions) Consider a


permutation 0' = (3,1 ,5,4, 2) in Ss. It has five inversions, and it can be converted
to the identity permutation by composing five transpositions successively:

0' = {3, 1,5,4, 2} -+ {I, 3, 5, 4, 2} -+ {I, 3, 5, 2, 4} -+ {I, 3, 2, 5, 4}


-+ {I, 2, 3, 5, 4} -+ {I, 2, 3, 4 , 5}.

It was done by moving the number 1 to the first position, and then 2 to the second
position by transpositions, and so on. In fact, the bold faced numbers are interchanged
in each one of five steps, and the five transpositions used to convert 0' into the identity
permutation are shown below :

0'{2, 1,3,4, 5){1, 2, 3,5, 4}{1,2, 4, 3, 5}{1,3, 2, 4, 5}{1, 2, 3, 5, 4}= (I, 2, 3,4,5).

Here, two permutations are composed from right to left for notational convention : i.e .,
= =
if we denote T (2, 1,3,4,5), then O'T 0' 0 T. For example, O'T(2) 0'(1) = 3.=
Also, note that 0' can be converted to the identity permutation by composing the
following three transpositions successively:

0' (2,1,3,4, 5}{1 , 5,3,4, 2}{1,2, 5,4, 3)={1, 2, 3, 4, 5). o


56 Chapter 2. Determinants

It is not hard to see that the number of even permutations is equal to that of odd
permutations, so it is if.
In the case n = 3, one can notice that there are three terms
with + sign and three terms with - sign in det A.

Problem 2.7 Show that the number of even permutations and the number of odd permutations
in S« are equal.

Problem 2.8 Let A [CI = cn] be an n x n matrix with the column vectors c/s. Show that
det[c j CI .. . Cj_1 Cj+1 cn] = (-1)j -1 detlc] ... Cj .. . cn]. Note that the same kind of
equality holds when A is written in row vectors.

2.3 Cofactor expansion

Even ifone has found an explicit formula for the determinant as shown in Theorem 2.6,
it is not much help in computing because one has to sum up n! terms, which becomes
a very large number as n gets large. Thus, we reformulate the formula by rewriting it
in an inductive way, by which the summing time can be reduced.
The first factor ala (I) in each of the n! terms is one of al1, a12, ... ,al n in the first
row of A. Hence, one can divide the n! terms of the expansion of det A into n groups
according to the value of a(1): Say,

det A = L sgn(a)ala(l)a2a(2) ... ana(n)


«es,
al1AII + al2 AI2 + .. . + alnAln,

where, for j = 1,2, ... , n, Al j is defined as

Alj = L sgn(a)a2a(2) . .. ana(n) '


aESn,a(I)=j

This number A Ij will tum out to be the determinant, up to a µ sign, ofthe submatrix of
A obtained by deleting the l-st row and j -th column. This submatrix will be denoted
by Mlj and called the minor of the entry al j.

Remark: If we replace the entries all, a 12, . .. , al n in the first row of A as unknown
variables XI, X2 , • • . , X n , then det A is a polynomial of the variables Xi , and the number
A I j becomes the coefficient of the variable X j in this polynomial.
We now aim to compute Alj for j = 1,2, .. . , n . Clearly, when j = 1,
Al1 = L sgn(a)a2a(2)" ·ana(n) = Lsgn('r)a2T(2)" . anT(n) ,
aESn,a(l)=1 T

summing over all permutations r of the numbers 2, 3, . .. , n. Note that each term in
A II contains no entries from the first row or from the first column of A. Hence, all
2.3. Cofactor expansion 57

the (n - I)! terms in the sum of All are just the signed elementary products of the
submatrix Mll of A obtained by deleting the first row and the first column of A, so
that
All = det Mll .
To compute the number Alj for j > 1, let A = [CI . . . cn] with the column vectors
cj's and let B = [Cj CI . . . Cj-l Cj+1 ... cn] be the matrix obtained from A by
interchanging the j-th column with its preceding j - 1 columns one by one up to the
first. Then, det A = (-1 )j-I det B (see Problem 2.8). Write

det B = bllBll + b12B\2 + . . . + bi« BIn

as the expansion of det B. Then, alj = bll and the number BII is the coefficient of
the entry bll in the formula of det B. By noting A Ij is the coefficient of the entry alj
in the formula of det A, one can have Alj = (-I)j-1 BII. Moreover, the minor Mlj
of the entry ai] is the same as the minor NIl of the entry bll. Now, by applying the
previous conclusion All = det Mll to the matrix B, one can obtain Bll = det Nll
and then
. I . I . I
Alj = (-I)J- Bll = (-I)J- detNll = (-I)J - detMlj'

In summary, one can get an expansion of det A with respect to the first row:

detA = allAll+a\2AI2+·· ·+alnAln


= all det Mll - a\2 det MI2 + .. . + (_l) I+na ln det MIn .

This is called the cofactor expansion of det A along the first row.
There is a similar expansion with respect to any other row, say the i-th row. To
show this, first construct a new matrix C from A by moving the i-th row of A up
to the first row by interchanges with its preceding i - I rows one by one. Then
det A = (_1)i-1 det C as before . Now, the expansion of det C with respect to the
first row [ail .. . ain] is

detC = ailCII +ai2C\2 + . .. +ainCln,

where Clj =:. (-I)j-1 det Mlj and Mlj denotes the minor of Clj in the matrix C.
Noting that Mlj = Mij as minors, we have

Aij = (-I)i- IClj = (-I/+ j detMij


as before and then

det A = ailAil + ai2Ai2 + .. . + ainAin.

The Mij is called the minor of the entry aij and the number Aij
( _1)1 + J det Mij is called the cofactor of the entry aij.
Also, one can do the same with the column vectors because det AT = det A. This
gives the following theorem:
58 Chapter 2. Determinants

Theorem 2.7 Let A be an n x n matrix and let Aij be the cofactor of the entry aij'
Then,
(1) for each 1 s i s n,
det A = ail Ail + aizAiz + . . .+ ainAin,
called the cofactor expansion of det A along the i -th row.
(2) For each 1 s sj n,

det A = aljAlj + a2jAzj + . . . + anjAnj,


called the cofactor expansion of det A along the j -tb column.

This cofactor expansion gives an alternative way of defining the determinant


inductively.
Remark: The sign (-1 )i+ j of the cofactor can be explained as follows:

+ + (_l)l+n
+ (_l)2+n
+ + (_1)3+n

(_l)n+1 (_l)n+Z (_l)n+3 (_l)n+n

Therefore, the determinant of an n x n matrix A is the sum of the products of the


entries in any given row (or, any given column) with their cofactors.
Example 2.5 (Computing det A by a cofactor expansion) Let

23]
5 6 .
8 9
Then the cofactors of all , al2 and al3 are

All = (_1)1+1 det [ ; 5 ·9-8 ·6=-3,

AI2 = (-l)I+Zdet[ = (-1)(4·9-7 .6)=6,

AI3 = (_1)1+3det[ = 4·8-7 ·5=-3,

respectively. Hence the expansion of det A along the first row is

detA = allAII + alzAI2 + al3AI3 = 1 . (-3) + 2·6 + 3· (-3) = O. 0


2.3. Cofactor expansion 59

The cofactor expansion formula of det A suggests that the evaluation of Aij can be
avoided whenever aij = 0, because the product aij Aij is zero regardless of the value
of Aij. Therefore, the computation of the determinant will be simplified by making
the cofactor expansion along a row or a column that contains as many zero entries
as possible. Moreover, by using the elementary row (or column) operations which
do not alter the determinant, a matrix A may be simplified into another one having
more zero entries in a row or in a column whose computation of the determinant may
be simpler. For example, a forward elimination to a square matrix A will produce an
upper triangular matrix U, and so the determinant of A will be just the product of the
diagonal entries of U up to the sign caused by possible row interchanges. The next
examples illustrate this method for an evaluation of the determinant.

Example 2.6 (Computing det A by aforward elimination and a cofactor expansion)


Evaluate the determinant of

I -1]
I -1 2
-3 4 1 -1
A -- 2 -5 -3 8 .
-2 6-4 I

Solution: Apply the elementary operations:

3 x row 1 + row 2,
(-2) x row 1 + row 3,
2 x row 1 + row 4

to A; then

det A = det
1 -1
o
0 -3
1 2-1] = det
[1 7-4]
-3 -7 10 .
[
o 4 o -1 4 0-1

Now apply the operation : 1 x row 1 + row 2, to the matrix on the right-hand side,
and take the cofactor expansion along the second column to get

7
-4 ]
-7 10 = det -:]
o -1 4 0 -1

= (_1)1+2.7. det [

-7(2 - 24) = 154.

Thus, det A = 154. o


60 Chapter 2. Determinants

Example 2.7 Show that det A = (x - y)(x - z)(x - w)(y - z)(y - w)(z - w) for
the Vandermonde matrix of order 4:

Solution: Use Gaussian elimination. To begin with, add (-1) x row 1 to rows 2, 3,
and 4 of A :
3

detA = det
[ 01 y -x x y2 x'
_ x 2 y3 x_ x3 J
o z-x Z2 - x 2 Z3 _ x 3
o w-x w 2 _ x 2 w 3 _x 3

[ y-x y' _ x' y3 - x 3 ]


= det Z - x Z2 _ x 2 Z3 _ x 3
w-x w 2 _x 2 w 3 _x 3

y+x y' +xy +x' ]


= (y - x)(z - x)(w - x) det [ : z+x Z2 +xz +x 2
w+x w 2 +xw +x 2

y+x y' +xy +x' ]


(x - y)(x - Z)(w - x)det [ z-y (z - y)(z + y + x)
w-y (w - y)(w + y + x)

(x - y)(x - z)(w _ x) det [ Z - Y (z - y)(z + y + x) ]


w-y (w-y)(w+y+x)

= (x - y)(x - z)(x - w)(y - z)(w - y) det [ z+y+x]


w+y+x

= (x - y)(x - z)(x - w)(y - z)(y - w)(Z - w). 0

Problem 2.9 Use cofactor expansions along a row or a column to evaluate the determinants of
the following matrices:

(1) A = 2 2 0 1 '
2 220

Problem 2.10 Evaluate the determinant of


2.4. Cramer's rule 61

0
-a
a
0 be]
dc
(2) B ==
[
-b-d o 1 .
-c -e -I 0

2.4 Cramer's rule


In Chapter 1, we have studied two ways for solving a system of linear equations
Ax = b: (i) by Gauss-Jordan elimination (or by LDU factorization) or (ii) by using
A -I if A is invertible. In this section, we introduce another method for solving the
system Ax = b for an invertible matrix A .
The cofactor expansion of the determinant gives a method for computing the
inverse of an invertible matrix A. For i f. j, let A * be the matrix A with the j-th row
replaced by the i-th row. Then the determinant of A* must be zero, because the i-th
and j-th rows are the same. Moreover, with respect to the j-th row, the cofactors of
A* are the same as those of A : that is, Ajk = Ajk for all k = 1, . .. , n. Therefore,
we have

O=detA* = ailAjl+aizAjZ+ .. ·+ainAjn


= ailAjl + «ns n + ... + ainAjn .

This proves the following lemma.


Lemma 2.8
ifi=j
ifif.j .
Definition 2.6 Let A be an n x n matrix and let A ij denote the cofactor of aij. Then
the new matrix

[ :::
.':: •
is called the matrix of cofactors of A. Its transpose is called the adjugate of A and
is denoted by adj A.

It follows from Lemma 2.8 that


detA 0
o detA
A · adj A = ] = (detA)/.
[
detA

If A is invertible, then det A f. 0 and we may write A A adj A) = I , Thus


62 Chapter 2. Determinants

and A = (det A) adj (A -I)

by replacing A with A -I.

Example2.8 (Computing A-I with adj A) For a matrix A = adj A =

[ -cd-b]
a
, and if det A = ad - be f:. 0, then

A-I _ 1 [ d -ab ] .
- ad -bc -c

Problem 2.11 Compute adj A and A -I for A =[ ; i


2 -2 1

Problem 2.12 Show that A is invertible if and only if adj A is invertible, and that if A is
invertible, then
(adj A)-I = -A- = adj(A- 1 ) .
detA

Problem 2.13 Let A be an n x n invertible matrix with n > 1. Show that


(1) det(adj A) = (detA)n-l ;
(2) adj(adj A) = (det A)n-2 A.

Problem 2.14 For invertible matrices A and B, show that


(1) adj AB =
adj B . adj A;
(2) adj QAQ-I =
Q(adj A)Q-l for any invertible matrix Q;
(3) if AB = =
BA , then (adj A)B B(adj A) .
In fact, these three properties are satisfied for any (invertible or not) two square matrices A and
B. (See Exercise 6.5.)

The next theorem establishes a formula for the solution of a system of n equations
in n unknowns . It may not be useful as a practical method but can be used to study
properties of the solution without solving the system.

Theorem 2.9 (Cramer's rule) Let Ax = b be a system ofn linear equations in n


unknowns such that det A f:. O. Then the system has the unique solution given by

detCj
x'--- j = 1,2, ... , n,
]- detA '

where C j is the matrix obtainedfrom A by replacing the j -th column with the column
matrixb = [bl b2 .. . bnf.
2.4. Crame r 's rule 63

Proof: If det A =1= 0, then A is invertible and x =


A-I b is the unique solution of
Ax = b. Since
x = A-l b = _ I_ (adj A)b,
detA
it follows that
det Cj
detA
o

Exam ple 2.9 Use Cramer 's rule to solve

I
Xl + 2X2 + X3 = 50
2Xl + 2X2 + X3 = 60
Xl + 2 X2 + 3X3 = 90.

Solution:

[ ; 2 I
2I] ,

n
2 3

Cl =
[ 50
2
60 2
90 2
[i 50 1]
60 I ,
90 3
[i 2
50 ]
2 60 .
2 90
Therefore,
det CI det C2 detC3
Xl = - - = 10
det A '
X2 = - - = 10,
detA
X3 = - - = 20.
det A
o
Cramer's rule provides a convenient method for writing down the solution of a
system of n linear equations in n unknowns in terms of determina nts. To find the
solution, however, one must evaluate n + I determinants of order n, Evaluating even
two of these determi nants generally involves more computations than solving the
system by using Gauss-Jordan elimination.
Problem 2.15 Use Cramer 's rule to solve the following systems.

I
4x2 + 3X3 -2
(I) 3Xl + 4X2 + 5X3 6
- 2x 1 + 5X2 2X3 1.
2 3 5
- + - 3
x y z
4 7 2
(2) + -y + - 0
x z
2 I
- - - 2.
y z
Problem 2.16 Let A be the matrix obtained from the identity matrix In with i-th column
replaced by the column vector x = [Xl .. . xn f. Compute det A.
64 Chapter 2. Determinants

2.5 Applications

2.5.1 Miscellaneous examples for determinants

Example 2.10 Let A be the Vandermonde matrix of order n:

1 Xl Xf ...
1 X2 X22 . . . X2n-l
A= . . . . .. '
[ .. . .
"

1 Xn x n2 x nn - l

Its determinant can be computed by the same method as in Example 2.7 as follows:

detA = TI (Xj -Xi) .

Example 2.11 Let A_ij denote the cofactor of a_ij in a matrix A = [a_ij]. If n > 1, then

    det [ 0    x_1   x_2   ···  x_n
          x_1  a_11  a_12  ···  a_1n
          x_2  a_21  a_22  ···  a_2n
          ⋮     ⋮     ⋮           ⋮
          x_n  a_n1  a_n2  ···  a_nn ]  =  − Σ_{i=1}^{n} Σ_{j=1}^{n} A_ij x_i x_j.

Solution: First take the cofactor expansion along the first row, and then compute the cofactor expansion along the first column of each n × n submatrix. □

Example 2.12 Let f_1, f_2, ..., f_n be n real-valued differentiable functions on ℝ. For the matrix

    [ f_1(x)          f_2(x)          ···  f_n(x)
      f_1'(x)         f_2'(x)         ···  f_n'(x)
      ⋮                ⋮                     ⋮
      f_1^{(n-1)}(x)  f_2^{(n-1)}(x)  ···  f_n^{(n-1)}(x) ],

its determinant is called the Wronskian for {f_1(x), f_2(x), ..., f_n(x)}. For example,

    det [ x    sin x    cos x
          1    cos x   −sin x
          0   −sin x   −cos x ]  =  −x,

but

    det [ x    sin x    x + sin x
          1    cos x    1 + cos x
          0   −sin x     −sin x  ]  =  0.

In general, if one of f_1, f_2, ..., f_n is a constant multiple of another, or a sum of such multiples, then the Wronskian must be zero.
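The two Wronskians above can be reproduced symbolically. The following SymPy sketch (the helper name `wronskian` is our own choice here) builds the matrix of derivatives row by row and takes its determinant:

```python
import sympy as sp

x = sp.symbols('x')

def wronskian(funcs):
    """Determinant of the matrix whose i-th row holds the i-th derivatives."""
    n = len(funcs)
    W = sp.Matrix(n, n, lambda i, j: sp.diff(funcs[j], x, i))
    return sp.simplify(W.det())

print(wronskian([x, sp.sin(x), sp.cos(x)]))        # -x
print(wronskian([x, sp.sin(x), x + sp.sin(x)]))    # 0
```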

Example 2.13 An n × n matrix A is called a circulant matrix if the i-th row of A is obtained from the first row of A by a cyclic shift of i − 1 steps, i.e., the general form of a circulant matrix is

    A = [ a_1      a_2    a_3   ···  a_n
          a_n      a_1    a_2   ···  a_{n-1}
          a_{n-1}  a_n    a_1   ···  a_{n-2}
          ⋮                           ⋮
          a_2      a_3    a_4   ···  a_1 ].

For instance, when n = 3,

    det A = (a_1 + a_2 + a_3)(a_1 + a_2 ω + a_3 ω^2)(a_1 + a_2 ω^2 + a_3 ω^4),

where ω = e^{2πi/3} is the primitive cube root of unity. In general, for n > 1,

    det A = ∏_{j=0}^{n-1} (a_1 + a_2 ω_j + a_3 ω_j^2 + ··· + a_n ω_j^{n-1}),

where ω_j = e^{2πij/n}, j = 0, 1, ..., n − 1, are the n-th roots of unity. (See Example 8.18 for the proof.)
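The factorization over the n-th roots of unity is easy to check numerically for a particular first row. In the sketch below (Python/NumPy; the sample entries a_1, ..., a_4 are arbitrary), each factor is the value a_1 + a_2 ω_j + ··· + a_n ω_j^{n-1}:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])          # first row a_1, ..., a_n
n = len(a)
# Build the circulant matrix: row i is the first row shifted i steps to the right.
C = np.array([np.roll(a, i) for i in range(n)])

w = np.exp(2j * np.pi * np.arange(n) / n)   # the n-th roots of unity
factors = [np.sum(a * wj ** np.arange(n)) for wj in w]
print(np.isclose(np.linalg.det(C), np.prod(factors).real))   # True
```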

Example 2.14 A tridiagonal matrix is a square matrix of the form

    T_n = [ a_1  b_1  0    ···  0        0        0
            c_1  a_2  b_2  ···  0        0        0
            0    c_2  a_3  ···  0        0        0
            ⋮                                     ⋮
            0    0    0    ···  a_{n-2}  b_{n-2}  0
            0    0    0    ···  c_{n-2}  a_{n-1}  b_{n-1}
            0    0    0    ···  0        c_{n-1}  a_n ].

The determinant of this matrix can be computed by a recurrence relation: Set D_0 = 1 and D_k = det T_k for k ≥ 1. By expanding T_k with respect to the k-th row, one obtains the recurrence relation

    D_k = a_k D_{k-1} − b_{k-1} c_{k-1} D_{k-2}.
The following two special cases are interesting.

Case (1) Let all a_i, b_j, c_k be the same, say a_i = b_j = c_k = b > 0. Then D_1 = b, D_2 = 0, and

    D_n = b D_{n-1} − b^2 D_{n-2}   for n ≥ 3.

Successively, one can find D_3 = −b^3, D_4 = −b^4, D_5 = 0, .... In general, the n-th term D_n of the recurrence relation is given by

    D_n = b^n [ cos(nπ/3) + (1/√3) sin(nπ/3) ].

Later, in Section 6.3.1, it will be discussed how to find the n-th term of a given recurrence relation. (See Exercise 8.6.)
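The recurrence and the closed form for Case (1) can be compared directly. The Python sketch below (the helper name `tridiag_det` and the value b = 2 are arbitrary choices) runs the recurrence D_k = a_k D_{k-1} − b_{k-1} c_{k-1} D_{k-2} and checks it against the closed form:

```python
import numpy as np

def tridiag_det(a, b, c):
    """D_k = a_k D_{k-1} - b_{k-1} c_{k-1} D_{k-2}, with D_0 = 1, D_1 = a_1."""
    D = [1.0, a[0]]
    for k in range(1, len(a)):
        D.append(a[k] * D[-1] - b[k - 1] * c[k - 1] * D[-2])
    return D[1:]                      # [D_1, ..., D_n]

b, n = 2.0, 8
D = tridiag_det([b] * n, [b] * (n - 1), [b] * (n - 1))
closed = [b**k * (np.cos(k*np.pi/3) + np.sin(k*np.pi/3)/np.sqrt(3))
          for k in range(1, n + 1)]
print(np.allclose(D, closed))         # True
```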
Case (2) Let all b_j = 1 and all c_k = −1, and let us write

    (a_1 a_2 ··· a_n) = det [ a_1   1    0   ···  0        0
                              −1    a_2  1   ···  0        0
                              0    −1    a_3 ···  0        0
                              ⋮                             ⋮
                              0     0    0   ···  a_{n-1}  1
                              0     0    0   ···  −1       a_n ].

Then,

    (a_1 a_2 ··· a_n) / (a_2 a_3 ··· a_n)  =  a_1 + 1/(a_2 + 1/(a_3 + ··· + 1/(a_{n-1} + 1/a_n))).

Proof: Let us prove it by induction on n. Clearly, a_1 + 1/a_2 = (a_1 a_2)/(a_2). It remains to show that

    a_1 + (a_3 a_4 ··· a_n)/(a_2 a_3 ··· a_n) = (a_1 a_2 ··· a_n)/(a_2 a_3 ··· a_n),

i.e., a_1 (a_2 ··· a_n) + (a_3 ··· a_n) = (a_1 a_2 ··· a_n). But this identity follows from the previous recurrence relation, since (a_1 a_2 ··· a_n) = (a_n ··· a_2 a_1). □
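The continued-fraction identity of Case (2) can also be checked by computing the bracket determinants through the recurrence D_k = a_k D_{k-1} + D_{k-2} (here b_j = 1 and c_k = −1). In the Python sketch below the entries a_1, ..., a_5 are arbitrary, and exact rational arithmetic is used to avoid rounding:

```python
from fractions import Fraction

def bracket(a):
    """(a_1 a_2 ... a_n): tridiagonal determinant with b_j = 1, c_k = -1."""
    D_prev, D = Fraction(1), Fraction(a[0])
    for ak in a[1:]:
        D_prev, D = D, ak * D + D_prev        # D_k = a_k D_{k-1} + D_{k-2}
    return D

def continued_fraction(a):
    value = Fraction(a[-1])
    for ak in reversed(a[:-1]):
        value = ak + 1 / value
    return value

a = [Fraction(k) for k in (2, 3, 1, 4, 2)]
print(bracket(a) / bracket(a[1:]) == continued_fraction(a))   # True
```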

Example 2.15 (Binet-Cauchy formula) Let A and B be matrices of size n × m and m × n, respectively, with n ≤ m. Then

    det(AB) = Σ_{1 ≤ k_1 < k_2 < ··· < k_n ≤ m} A_{k_1···k_n} B_{k_1···k_n},

where A_{k_1···k_n} is the minor obtained from the columns of A whose numbers are k_1, ..., k_n, and B_{k_1···k_n} is the minor obtained from the rows of B whose numbers

are k_1, ..., k_n. In other words, det(AB) is the sum of products of the corresponding majors of A and B, where a major of a matrix is, by definition, the determinant of a maximal-order minor in the matrix.

Proof: Let C = AB, so that c_ij = Σ_{k=1}^{m} a_ik b_kj. Then

    det C = Σ_{σ ∈ S_n} (−1)^σ c_{1σ(1)} ··· c_{nσ(n)}
          = Σ_{k_1,...,k_n = 1}^{m} a_{1k_1} ··· a_{nk_n} Σ_{σ ∈ S_n} (−1)^σ b_{k_1 σ(1)} ··· b_{k_n σ(n)}
          = Σ_{k_1,...,k_n = 1}^{m} a_{1k_1} ··· a_{nk_n} B_{k_1···k_n}.

The minor B_{k_1···k_n} is nonzero only if the numbers k_1, ..., k_n are distinct. Thus, the summation can be performed over distinct numbers k_1, ..., k_n. Since B_{τ(k_1)···τ(k_n)} = (−1)^τ B_{k_1···k_n} for any permutation τ of the numbers k_1, ..., k_n, we have

    det(AB) = Σ_{1 ≤ k_1 < ··· < k_n ≤ m} A_{k_1···k_n} B_{k_1···k_n}.    □

For example, if

    A = [ 1 −1  3  3
          2  2 −1  2 ]    and    B = [  1  2
                                        2 −1
                                       −3  1
                                        1  2 ],

then

    det(AB) = det[ 1 −1 ; 2 2 ]·det[ 1 2 ; 2 −1 ] + det[ 1 3 ; 2 −1 ]·det[ 1 2 ; −3 1 ]
            + det[ 1 3 ; 2 2 ]·det[ 1 2 ; 1 2 ] + det[ −1 3 ; 2 −1 ]·det[ 2 −1 ; −3 1 ]
            + det[ −1 3 ; 2 2 ]·det[ 2 −1 ; 1 2 ] + det[ 3 3 ; −1 2 ]·det[ −3 1 ; 1 2 ]

            = −167.
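Using the matrices of this example, the Binet-Cauchy sum over all C(4, 2) = 6 column selections can be compared with det(AB) directly. A short Python/NumPy sketch (rounding only cleans up floating-point noise):

```python
import numpy as np
from itertools import combinations

A = np.array([[1, -1, 3, 3], [2, 2, -1, 2]], dtype=float)
B = np.array([[1, 2], [2, -1], [-3, 1], [1, 2]], dtype=float)

n, m = A.shape            # n = 2, m = 4
total = sum(np.linalg.det(A[:, list(cols)]) * np.linalg.det(B[list(cols), :])
            for cols in combinations(range(m), n))
print(round(total), round(np.linalg.det(A @ B)))   # -167 -167
```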

2.5.2 Area and volume

In this section, we demonstrate a geometrical interpretation of the determinant of a square matrix A as the volume (or the area for n = 2) of the parallelepiped P(A) spanned by the row vectors of A. For visualization we restrict our attention to the cases n = 2 and 3, although a similar argument can be applied for n > 3.

For an n × n square matrix A, its row vectors r_i = [a_i1 ··· a_in] can be considered as elements of ℝ^n. The set

    P(A) = { Σ_{i=1}^{n} t_i r_i : 0 ≤ t_i ≤ 1, i = 1, 2, ..., n }

is called a parallelogram if n = 2, or a parallelepiped if n ≥ 3. Note that the row vectors of A form the edges of P(A), and changing the order of the row vectors does not alter the shape of P(A).

Theorem 2.10 The determinant det A of an n × n matrix A is the volume of P(A) up to sign. In fact, the volume of P(A) is equal to |det A|.

Proof: We give the proof for the case n = 2 only, and leave the case n = 3 to the readers. Let

    A = [ r_1 ; r_2 ],

where r_1, r_2 are the row vectors of A. Let Area(A) denote the area of the parallelogram P(A) (see Figure 2.3). Note that

    Area [ r_1 ; r_2 ] = Area [ r_2 ; r_1 ],   but   det [ r_1 ; r_2 ] = − det [ r_2 ; r_1 ].

Thus, one can expect in general that

    det [ r_1 ; r_2 ] = ± Area [ r_1 ; r_2 ],

which explains why we say 'up to sign'. To determine the sign ±, we first define the orientation of A = [ r_1 ; r_2 ] to be

    ρ(A) = det A / |det A|   if det A ≠ 0,   and   ρ(A) = 1   if det A = 0.

For det A ≠ 0, ρ(A) = 1 if and only if det A > 0: in this case, we say the ordered pair (r_1, r_2) is positively oriented, while ρ(A) = −1 if and only if det A < 0: in this case, the pair is negatively oriented. See Figure 2.2 (next page).

For example, ρ of the identity matrix (rows e_1, e_2 in this order) is 1, while the matrix with the rows interchanged has orientation −1.

To finish the proof, it is sufficient to show that the function D(A) = ρ(A) Area(A) satisfies the rules (R1)-(R3) of the determinant, so that det = D = ±Area, or Area(A) = |det A|. However,

Figure 2.2. Orientation of vectors

(1) it is clear that D(I_2) = 1.

(2) D [ r_2 ; r_1 ] = − D [ r_1 ; r_2 ], because ρ([ r_2 ; r_1 ]) = − ρ([ r_1 ; r_2 ]) and

    Area [ r_2 ; r_1 ] = Area [ r_1 ; r_2 ].

(3) D [ k r_1 ; r_2 ] = k D [ r_1 ; r_2 ] for any k. Indeed, if k = 0, it is clear. Suppose k ≠ 0. Then, as illustrated in Figure 2.3, the bottom edge r_1 of P(A) is elongated by |k| while the height h remains unchanged. Thus

    Area [ k r_1 ; r_2 ] = |k| Area [ r_1 ; r_2 ].

Figure 2.3. The parallelogram P([ k r_1 ; r_2 ])

On the other hand,

    ρ([ k r_1 ; r_2 ]) = (k/|k|) ρ([ r_1 ; r_2 ]).

Therefore, we have

    D [ k r_1 ; r_2 ] = ρ([ k r_1 ; r_2 ]) Area [ k r_1 ; r_2 ]
                      = (k/|k|) ρ([ r_1 ; r_2 ]) · |k| Area [ r_1 ; r_2 ] = k D [ r_1 ; r_2 ].

(4) D [ r_1 + r_2 ; u ] = D [ r_1 ; u ] + D [ r_2 ; u ] for any u, r_1 and r_2 in ℝ^2. If u = 0, there is nothing to prove. Assume that u ≠ 0. Choose any vector v ∈ ℝ^2 such that {u, v} is a basis for ℝ^2 and the pair (u, v) is positively oriented. Then r_i = a_i u + b_i v, i = 1, 2, and

    D [ r_1 + r_2 ; u ] = D [ (a_1 + a_2)u + (b_1 + b_2)v ; u ]
                        = D [ (b_1 + b_2)v ; u ] = (b_1 + b_2) D [ v ; u ]
                        = D [ a_1 u + b_1 v ; u ] + D [ a_2 u + b_2 v ; u ]
                        = D [ r_1 ; u ] + D [ r_2 ; u ].

The second equality follows from (2) and Figure 2.4.    □
Remark: (1) Note that if we had constructed the parallelepiped P(A) using the column vectors of A, then the shape of the parallelepiped would in general be totally different from the one constructed using the row vectors. However, det A = det A^T means that their volumes are the same, which is a totally nontrivial fact.
(2) For n ≥ 3, the volume of P(A) can be defined by induction on n, and exactly the same argument as in the proof can be applied to show that the volume is the determinant. However, there is another way of looking at this fact. Let {c_1, c_2, ..., c_n} be the n column

vectors of an m × n matrix A. They constitute an n-dimensional parallelepiped in ℝ^m:

    P(A) = { Σ_{i=1}^{n} t_i c_i : 0 ≤ t_i ≤ 1, i = 1, 2, ..., n }.

A formula for the volume of this parallelepiped may be expressed as follows. We first consider a two-dimensional parallelepiped (a parallelogram) determined by two column vectors c_1 and c_2 of A = [c_1 c_2] in ℝ^3. The area of this parallelogram is

Figure 2.5. A parallelogram in ℝ^3

simply Area(P(A)) = ||c_1|| h, where h = ||c_2|| sin θ and θ is the angle between c_1 and c_2. Therefore, we have

    Area(P(A))^2 = ||c_1||^2 ||c_2||^2 sin^2 θ = ||c_1||^2 ||c_2||^2 (1 − cos^2 θ)
                 = (c_1 · c_1)(c_2 · c_2) ( 1 − (c_1 · c_2)^2 / ((c_1 · c_1)(c_2 · c_2)) )
                 = (c_1 · c_1)(c_2 · c_2) − (c_1 · c_2)^2
                 = det [ c_1 · c_1   c_1 · c_2
                         c_2 · c_1   c_2 · c_2 ]
                 = det ( [ c_1^T ; c_2^T ] [ c_1  c_2 ] ) = det(A^T A),

where "." denotes the dot product. In general, let CI, • . . • Cn be n column vectors
of an m x n (not necessarily square) matrix A. Then one can show (for a proof see
Exercise 5.17) that the volume of the n-dimensional parallelepiped P(A) determined
by those n column vectors C/s in R m is

vol(P(A)) = Jdet(A T A) .
In particular, if A is an m x m square matrix. then

vol(P(A)) = Jdet(AT A) = Jdet(AT)det(A) = [det A],


as expected.
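The formula vol(P(A)) = √(det(AᵀA)) is convenient to use numerically. In the Python/NumPy sketch below (the column vectors are arbitrary sample values), the first matrix has two columns spanning a parallelogram in ℝ^3, and the second, square, matrix shows the reduction to |det A|:

```python
import numpy as np

# Two columns spanning a parallelogram sitting inside R^3.
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [4.0, 2.0]])
print(np.sqrt(np.linalg.det(A.T @ A)))        # area of the parallelogram

# For a square matrix the same formula reduces to |det A|.
S = np.array([[4.0, 3.0], [7.0, 5.0]])
print(np.isclose(np.sqrt(np.linalg.det(S.T @ S)),
                 abs(np.linalg.det(S))))      # True
```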

Problem 2.17 Show that the area of a triangle ABC in the plane ℝ^2, where A = (x_1, y_1), B = (x_2, y_2), C = (x_3, y_3), is equal to the absolute value of

    (1/2) det [ x_1  y_1  1
                x_2  y_2  1
                x_3  y_3  1 ].

2.6 Exercises

2.1. Determine the values of k for which det 2:] = o.


2.2. Evaluate det(A^2 B A^{-1}) and det(B^{-1} A^3) for the following matrices:
A = [-; -; { ] , B= ;] .
o 1 0 2 I 3
2.3. Evaluate the determinant of

A =
2 -3 -5 8
2.4. Evaluate det A for an n × n matrix A = [a_ij] when
(1) a_ij = 0 if i = j, and a_ij = 1 if i ≠ j;    (2) a_ij = i + j.

2.5. Find all solutions of the equation det(AB) = 0 for


A=[X;2
2.6. Prove that if A is an n × n skew-symmetric matrix and n is odd, then det A = 0. Give an example of a 4 × 4 skew-symmetric matrix A with det A ≠ 0.
2.7. Use the determinant function to find
(1) the area of the parallelogram with edges determined by (4, 3) and (7, 5),
(2) the volume of the parallelepiped with edges determined by the vectors (1, 0, 4), (0, −2, 2) and (3, 1, −1).
2.8. Use Cramer's rule to solve each system.

(1)  x_1 + x_2 =  3
     x_1 − x_2 = −1.

(2)  x_1 +  x_2 + x_3 =  2
     x_1 + 2x_2 + x_3 = −4
     x_1 + 3x_2 − x_3 =  2.

X2 + X4 = -1
Xl + X3 = 3
(3)/ Xl x2 x3 X4 2
Xl + X2 + X3 + X4 O.
2.6. Exercises 73

(U:
2.9. Use Cramer's rule to solve the given system :

(l)[l 2J
2.10. Find a constant k so that the system of linear equations

    kx − 2y − z = 0
    (k + 1)y + 4z = 0
    (k − 1)z = 0

has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by using Gaussian elimination:

[1 :
2.12. Solve the following system of equations by using Cramer's rule:

    3x + 2y = 3z + 1
    3x + 2z = 8 − 5y
    3z − 1 = x − 2y.
2.13. Calculate the cofactors A_11, A_12, A_13 and A_33 for the matrix A:

; ;],
211 312
(3)A=[-;
32
;1]'
2.14. Let A be the n × n matrix whose entries are all 1. Show that
(1) det(A − nI_n) = 0;
(2) (A − nI_n)_ij = (−1)^{n−1} n^{n−2} for all i, j, where (A − nI_n)_ij denotes the cofactor of the (i, j)-entry of A − nI_n.
2.15. Show that if A is symmetric, so is adj A. Moreover, if A is invertible, then the inverse of A is also symmetric.
2.16 . Use the adjugate formula to compute the inverses of the following matrices:

;],
4 1 -1 sine 0 cos s
2.17. Compute adj A, det A, det(adj A), A^{-1}, and verify A · adj A = (det A)I for

(1) A = [-; (2) A =


3 -2 1 1 5 7
2.18. Let A, B be invertible matrices. Show that adj(AB) = adj B · adj A. (The reader may also try to prove this equality for noninvertible matrices.)
2.19. For an m x n matrix A and n x m matrix B, show that

det ] = det(AB) .
2.20. Find the area of the triangle with vertices at (0,0), (1,3) and (3,1) in ]R2 .

2.21. Find the area of the triangle with vertices at (0, 0, 0), (1, 1,2) and (2,2,1 ) in ]R3.

2.22. For A, B, C, D ∈ M_{n×n}(ℝ), show that det [ A  B ; O  D ] = det A · det D. But, in general, det [ A  B ; C  D ] ≠ det A · det D − det B · det C.
2.23. Determine whether or not the following statements are true in general, and justify your answers.
(1) For any square matrices A and B of the same size, det(A + B) = det A + det B.
(2) For any square matrices A and B of the same size, det(AB) = det(BA).
(3) If A is an n × n square matrix, then for any scalar c, det(cI_n − A) = c^n − det A.
(4) If A is an n × n square matrix, then for any scalar c, det(cI_n − A^T) = det(cI_n − A).
(5) If E is an elementary matrix, then det E = ±1.
(6) There is no matrix A of order 3 such that A^2 = −I_3.
(7) Let A be a nilpotent matrix, i.e., A^k = 0 for some natural number k. Then det A = 0.
(8) det(kA) = k det A for any square matrix A.
(9) The multilinearity holds in any two rows at the same time:

    det [ a+u  b+v  c+w ; d+x  e+y  f+z ; l  m  n ] = det [ a b c ; d e f ; l m n ] + det [ u v w ; x y z ; l m n ].

(10) Any system Ax = b has a solution if and only if det A ≠ 0.
(11) For any n × 1 (n ≥ 2) column vectors u and v, det(uv^T) = 0.
(12) If A is a square matrix with det A = 1, then adj(adj A) = A.
(13) If the entries of A are all integers and det A = 1 or −1, then the entries of A^{-1} are also integers.
(14) If the entries of A are 0's or 1's, then det A = 1, 0, or −1.
(15) Every system of n linear equations in n unknowns can be solved by Cramer's rule.
(16) If A is a permutation matrix, then A^T = A.
3
Vector Spaces

3.1 The n-space ℝ^n and vector spaces

We have seen that the Gauss-Jordan elimination is the most basic technique for solving a system Ax = b of linear equations, and that it can be written in matrix notation as an LDU factorization. Moreover, the questions of the existence or the uniqueness of the solution are much easier to answer after the Gauss-Jordan elimination. In particular, if det A ≠ 0, then x = 0 is the unique solution of Ax = 0. In general, the set of solutions of Ax = 0 has a kind of mathematical structure, called a vector space, and with this concept one can characterize the uniqueness of the solution of a system Ax = b of linear equations in a more systematic way.
In this chapter, we introduce the notion of a vector space, which is an abstraction of the usual algebraic structure of the 3-space ℝ^3, and then elaborate our study of a system of linear equations in this framework.
Usually, many physical quantities, such as length, area, mass, temperature are
described by real numbers as magnitudes. Other physical quantities like force or
velocity have directions as well as magnitudes . Such quantities with direction are
called vectors, while the numbers are called scalars. For instance, a vector (or a point) x in the 3-space ℝ^3 is usually represented as a triple of real numbers:

    x = (x_1, x_2, x_3),

where x_i ∈ ℝ, i = 1, 2, 3, which are called the coordinates of x. This expression


provides a rectangular coordinate system in a natural way. On the other hand, pictor-
ially such a point in the 3-space ]R3 can also be represented by an arrow from the
origin to x. In this way, a point in the 3-space ]R3 can be understood as a vector. The
direction of the arrow specifies the direction of the vector, and the length of the arrow
describes its magnitude.
In order to have a more general definition of vectors, we extract the most basic
properties of those arrows in 1R3 . Note that for all vectors (or points) in ]R3, there are
two algebraic operations: the sum of two vectors and scalar multiplication of a vector


by a scalar. That is, for two vectors x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) in ℝ^3 and a scalar k, we define

    x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3),
    kx = (kx_1, kx_2, kx_3).

Then a vector x = (x_1, x_2, x_3) in ℝ^3 may be written as

    x = x_1 i + x_2 j + x_3 k,

where i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1), which were introduced as the rectangular coordinate system in vector calculus. The sum of vectors and the scalar multiplication of vectors in the 3-space ℝ^3 are illustrated in Figure 3.1:

Figure 3.1. A vector sum and a scalar multiplication

Even though our geometric visualization of vectors does not go beyond the 3-space ℝ^3, it is possible to extend these algebraic operations of vectors in the 3-space ℝ^3 to the n-space ℝ^n for any positive integer n. It is defined to be the set of all ordered n-tuples (a_1, a_2, ..., a_n) of real numbers, called vectors: i.e.,

    ℝ^n = { (a_1, a_2, ..., a_n) : a_i ∈ ℝ for i = 1, 2, ..., n }.

For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in the n-space ℝ^n, and a scalar k, the sum x + y and the scalar multiplication kx are vectors in ℝ^n defined by

    x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n),
    kx = (kx_1, kx_2, ..., kx_n).

It is easy to verify the following arithmetical rules of the operations:

Theorem 3.1 For any scalars k and ℓ, and vectors x = (x_1, x_2, ..., x_n), y = (y_1, y_2, ..., y_n), and z = (z_1, z_2, ..., z_n) in the n-space ℝ^n, the following rules hold:

(1) x + Y = Y + x,
(2) x + (y + z) = (x + y) + z,
(3) x + 0 = x = 0 + x,
(4) x+(-I)x=O,
(5) k(x + y) = kx + ky,
(6) (k + ℓ)x = kx + ℓx,
(7) k(ℓx) = (kℓ)x,
(8) 1x = x,
where 0 = (0, 0, ... , 0) is the zero vector.

We usually identify a vector (a_1, a_2, ..., a_n) in the n-space ℝ^n with an n × 1 column vector

    [ a_1 ; a_2 ; ⋮ ; a_n ].

Sometimes a vector in ℝ^n is also identified with a 1 × n row vector (see Section 3.5). Then the two operations of the matrix sum and the scalar multiplication of column matrices coincide with those of vectors in ℝ^n, and Theorem 3.1 rephrases Theorem 1.3.
These rules of arithmetic of vectors are the most important ones because they
are the only rules that we need to manipulate vectors in the n-space lR n . Hence,
an (abstract) vector space can be defined with respect to these rules of operations of
vectors in the n-space lRn so that lRn itself becomes a vector space. In general, a vector
space is defined to be a set with two operations: an addition and a scalar multiplication
which satisfy the rules (1)-(8) in Theorem 3.1.
Definition 3.1 A (real) vector space is a nonempty set V of elements, called vectors,
with two algebraic operations that satisfy the following rules.
(A) There is an operation called vector addition that associates to every pair x and
y of vectors in V a unique vector x + y in V, called the sum of x and y, so that the
following rules hold for all vectors x, y, z in V :
(1) x + Y = Y+ x (commutativity in addition),
(2) x + (y + z) = (x + y) + z (= x + y + z) (associativity in addition),
(3) there is a unique vector 0 in V such that x + 0 = x = 0 + x for all x E V (it is
called the zero vector) ,
(4) for any x E V, there is a vector -x E V, called the negative of x, such that
x + (-x) = (-x) + x = O.
(B) There is an operation called the scalar multiplication that associates to each
vector x in V and each scalar k a unique vector kx in V so that the following rules
hold for all vectors x, y, z in V and all scalars k, ℓ:
(5) k(x + y) = kx + ky (distributivity with respect to vector addition),
(6) (k + ℓ)x = kx + ℓx (distributivity with respect to scalar addition),

(7) k(ℓx) = (kℓ)x (associativity in scalar multiplication),
(8) 1x = x.
Clearly, the n-space ℝ^n is a vector space by Theorem 3.1. A complex vector space is obtained if, instead of real numbers, we take complex numbers for scalars. For example, the set ℂ^n of all ordered n-tuples of complex numbers is a complex vector space. In Chapter 7 we shall discuss complex vector spaces, but until then we will discuss only real vector spaces unless otherwise stated.

Example 3.1 (Miscellaneous examples of vector spaces)


(1) For any two positive integers m and n, the set Mmxn(JR) of all m x n matrices
forms a vector space under the matrix sum and the scalar multiplication defined in
Section 1.3. The zero vector in this space is the zero matrix Omxn , and -A is the
negative of a matrix A.
(2) Let A be an m x n matrix. Then it is easy to show that the set of solutions
of the homogeneous system Ax = 0 is a vector space (under the sum and the scalar
multiplication of matrices).
(3) Let C(ℝ) denote the set of real-valued continuous functions defined on the real line ℝ. For two functions f and g, and a real number k, the sum f + g and the scalar multiplication kf are defined by

    (f + g)(x) = f(x) + g(x),
    (kf)(x) = kf(x).

Then the set C(ℝ) is a vector space under these operations. The zero vector in this space is the constant function whose value at each point is zero.
(4) Let S(ℝ) denote the set of real-valued functions defined on the set of integers. A function f ∈ S(ℝ) can be written as a doubly infinite sequence of real numbers:

    ..., x_{-2}, x_{-1}, x_0, x_1, x_2, ...,

where x_k = f(k) for each k. This kind of sequence appears frequently in engineering, and is called a discrete or a digital signal. One can define the sum of two functions and the scalar multiplication of a function by a scalar just as in C(ℝ) in (3), so that S(ℝ) becomes a vector space. □

Theorem 3.2 Let V be a vector space and let x, y be vectors in V. Then


(1) x + Y = Y implies x = 0,
(2) Ox = 0,
(3) kO = 0 lor any k E R
(4) -x is unique and -x = (-l)x,
(5) ifkx = 0, then k = 0 orx = O.

Proof: (1) By adding -y to both sides ofx + y = y, we have

x = x + 0 = x + Y+ (-y) =y+ (-y) = O.



(2) 0x = (0 + 0)x = 0x + 0x implies 0x = 0 by (1).
(3) This is an easy exercise.
(4) The uniqueness of the negative −x of x can be shown by a simple modification of Lemma 1.7. In fact, if x' is another negative of x such that x + x' = 0, then

    −x = −x + 0 = −x + (x + x') = (−x + x) + x' = 0 + x' = x'.

On the other hand, the equation

    x + (−1)x = 1x + (−1)x = (1 − 1)x = 0x = 0

shows that (−1)x is another negative of x, and hence −x = (−1)x by the uniqueness of −x.
(5) Suppose kx = 0 and k ≠ 0. Then x = 1x = (1/k)(kx) = (1/k)0 = 0. □

Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an addition and
scalar multiplication of pairs are defined by

(x, y) + (u , v) = (x + 2u, y + 2v), k(x, y) = (kx, ky).

Is the set V a vector space under those operations? Justify your answer.

3.2 Subspaces
Definition 3.2 A subset W of a vector space V is called a subspace of V if W itself
is a vector space under the addition and the scalar multiplication defined in V.

In order to show that a subset W is a subspace of a vector space V, it is not


necessary to verify all the arithmetic rules of the definition of a vector space. One
only needs to check whether a given subset is closed under the same vector addition
and scalar multiplication as in V . This is due to the fact that certain rules satisfied in
the larger space are automatically satisfied in every subset.

Theorem 3.3 A nonempty subset W ofa vector space V is a subspace if and only if
x + y and kx are contained in W (or equivalently, x + ky E W) for any vectors x
and y in Wand any scalar k E JR.

Proof: We need only to prove the sufficiency. Assume both conditions hold and let
x be any vector in W. Since W is closed under scalar multiplication, 0 = Ox and
-x = (-I)x are in W, so rules (3) and (4) for a vector space hold . All the other rules
for a vector space are clear. 0

A vector space V itself and the zero vector {OJ are trivially subspaces. Some
nontrivial subspaces are given in the following examples.

Example 3.2 (Which planes in ℝ^3 can be a subspace?) Let

    W = { (x, y, z) ∈ ℝ^3 : ax + by + cz = 0 },

where a, b, c are constants. If x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) are points in W, then clearly x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3) is also a point in W, because it satisfies the equation in W. Similarly, kx also lies in W for any scalar k. Hence, W is a subspace of ℝ^3, which is a plane passing through the origin in ℝ^3. □

Example 3.3 (The solutions of Ax = 0 form a subspace) Let A be an m x n matrix.


Then, as shown in Example 3.1(2), the set

    W = { x ∈ ℝ^n : Ax = 0 }

of solutions of the homogeneous system Ax = 0 is a vector space. Moreover, since the operations in W and in ℝ^n coincide, W is a subspace of ℝ^n. □

Example 3.4 For a nonnegative integer n, let P_n(ℝ) denote the set of all real polynomials in x with degree ≤ n. Then P_n(ℝ) is a subspace of the vector space C(ℝ) of all continuous functions on ℝ. □

Example 3.5 (The space of symmetric or skew-symmetric matrices) Let W be the set of all n × n real symmetric matrices. Then W is a subspace of the vector space M_{n×n}(ℝ) of all n × n matrices, because the sum of two symmetric matrices is symmetric and a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of all n × n skew-symmetric matrices is also a subspace of M_{n×n}(ℝ). □

Problem 3.2 Which of the following sets are subspaces of the 3-space ℝ^3? Justify your answer.
(1) W = =
{(x, y, z) E JR3 : x yz O} ,
(2) W =
{(2t , 3t , 4t) E JR3 : t E JR},
(3) W = {(x, y, z) ∈ ℝ^3 : x^2 + y^2 − z^2 = 0},
(4) W = {x ∈ ℝ^3 : x^T u = 0 = x^T v}, where u and v are any two fixed nonzero vectors in ℝ^3.

Can you describe all subspaces of the 3-space JR3 ?

Problem 3.3 Let V = C(JR) be the vector space of all continuous functions on lR. Which of
the following sets Ware subspaces of V ? Justify your answer.
(1) W is the set of all differentiable functions on ℝ.
(2) W is the set of all bounded continuous functions on ℝ.
(3) W is the set of all continuous nonnegative-valued functions on ℝ, i.e., f(x) ≥ 0 for any x ∈ ℝ.
(4) W is the set of all continuous odd functions on ℝ, i.e., f(−x) = −f(x) for any x ∈ ℝ.
(5) W is the set of all polynomials with integer coefficients.

Definition 3.3 Let U and W be two subspaces of a vector space V.



(1) The sum of U and W is defined by

    U + W = { u + w ∈ V : u ∈ U, w ∈ W }.

(2) A vector space V is called the direct sum of two subspaces U and W, written as V = U ⊕ W, if V = U + W and U ∩ W = {0}.

It is easy to see that U + W and U ∩ W are also subspaces of V. If V = ℝ^2 (the xy-plane), U = {xi : x ∈ ℝ} (the x-axis), and W = {yj : y ∈ ℝ} (the y-axis), then it is easy to see that ℝ^2 = U ⊕ W = ℝ ⊕ ℝ, by considering the x-axis (and also the y-axis) as ℝ. Similarly, one can easily be convinced that ℝ^3 = ℝ^2 ⊕ ℝ^1 = ℝ^1 ⊕ ℝ^1 ⊕ ℝ^1.

Problem 3.4 Let U and W be subspaces of a vector space V.

(l) Suppose that Z is a subspace of V contained in both U and W . Show that Z is also
contained in U n W.
(2) Suppose that Z is a subspace of V containing both U and W. Show that Z also contains
U + W as a subspace.

Theorem 3.4 A vector space V is the direct sum of subspaces U and W, i.e., V = U ⊕ W, if and only if for any v ∈ V there exist unique u ∈ U and w ∈ W such that v = u + w.

Proof: (⇒) Suppose that V = U ⊕ W. Then, for any v ∈ V, there exist vectors u ∈ U and w ∈ W such that v = u + w, since V = U + W. To show the uniqueness, suppose that v is also expressed as a sum u' + w' for u' ∈ U and w' ∈ W. Then u + w = u' + w' implies

    u − u' = w' − w ∈ U ∩ W = {0}.

Hence, u = u' and w = w'.
(⇐) Clearly, V = U + W. Suppose that there exists a nonzero vector v in U ∩ W. Then v can be written as a sum of vectors in U and W in many different ways:

    v = v + 0 = 0 + v = (1/2)v + (1/2)v = (1/3)v + (2/3)v ∈ U + W.    □
Example 3.6 (Sum, but not direct sum) In the 3-space ℝ^3, consider the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). These three vectors are also well known as i, j and k, respectively. Let U = {ai + ck : a, c ∈ ℝ} be the xz-plane, and let W = {bj + ck : b, c ∈ ℝ} be the yz-plane, which are both subspaces of ℝ^3. Then a vector in U + W is of the form

    (ai + c_1 k) + (bj + c_2 k) = ai + bj + (c_1 + c_2)k = ai + bj + ck = (a, b, c),

where c = c_1 + c_2 and a, b, c can be arbitrary numbers. Thus U + W = ℝ^3. However, ℝ^3 ≠ U ⊕ W since clearly k ∈ U ∩ W ≠ {0}. In fact, the vector k ∈ ℝ^3 can be written as many linear combinations of vectors in U and W:

    k = (1/2)k + (1/2)k = (1/3)k + (2/3)k ∈ U + W.

Note that if we had taken W = {yj : y ∈ ℝ} to be the y-axis, then it would be easy to see that ℝ^3 = U ⊕ W. Note also that there are many choices for W so that ℝ^3 = U ⊕ W. □
Problem 3.5 Let U and W be the subspaces of the vector space M_{n×n}(ℝ) consisting of all symmetric matrices and all skew-symmetric matrices, respectively. Show that M_{n×n}(ℝ) = U ⊕ W. Therefore, the decomposition of a square matrix A given in (3) of Problem 1.11 is unique.

3.3 Bases
As we know, a vector in the 3-space ℝ^3 is of the form (x_1, x_2, x_3), and it can be written as

    (x_1, x_2, x_3) = x_1(1, 0, 0) + x_2(0, 1, 0) + x_3(0, 0, 1).

That is, any vector in ℝ^3 can be expressed as a sum of scalar multiples of the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). The following definition gives a name to such an expression.

Definition 3.4 Let V be a vector space and let {x_1, x_2, ..., x_m} be a set of vectors in V. Then a vector y in V of the form

    y = a_1 x_1 + a_2 x_2 + ··· + a_m x_m,

where a_1, a_2, ..., a_m are scalars, is called a linear combination of the vectors x_1, x_2, ..., x_m.

The next theorem shows that the set of all linear combinations of a finite set of
vectors in a vector space forms a subspace .

Theorem 3.5 Let x_1, x_2, ..., x_m be vectors in a vector space V. Then the set W = {a_1 x_1 + a_2 x_2 + ··· + a_m x_m : a_i ∈ ℝ} of all linear combinations of x_1, x_2, ..., x_m is a subspace of V. It is called the subspace of V spanned by x_1, x_2, ..., x_m. Or, we say that x_1, x_2, ..., x_m span the subspace W.

Proof: It is necessary to show that W is closed under the vector sum and the scalar multiplication. Let u and w be any two vectors in W. Then

    u = a_1 x_1 + a_2 x_2 + ··· + a_m x_m,
    w = b_1 x_1 + b_2 x_2 + ··· + b_m x_m

for some scalars a_i's and b_i's. Therefore,

    u + w = (a_1 + b_1)x_1 + (a_2 + b_2)x_2 + ··· + (a_m + b_m)x_m,

and, for any scalar k,

    ku = (ka_1)x_1 + (ka_2)x_2 + ··· + (ka_m)x_m.

Thus, u + w and ku are linear combinations of x_1, x_2, ..., x_m and consequently contained in W. Therefore, W is a subspace of V. □

Example 3.7 (A space can be spanned by many different sets)


(1) For a nonzero vector v in a vector space V, a linear combination of v is simply a scalar multiple of v. Thus the subspace W of V spanned by v is W = {kv : k ∈ ℝ}. Note that this subspace W can be spanned by any kv, k ≠ 0.
(2) Consider three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and v = e_1 + e_2 = (1, 1, 0) in ℝ^3. The subspace W_1 spanned by e_1 and e_2 is written as

    W_1 = { a_1 e_1 + a_2 e_2 = (a_1, a_2, 0) : a_i ∈ ℝ },

and the subspace W_2 spanned by e_1, e_2 and v is written as

    W_2 = { a_1 e_1 + a_2 e_2 + a_3 v = (a_1 + a_3, a_2 + a_3, 0) : a_i ∈ ℝ }.

Clearly, W_1 ⊆ W_2. On the other hand, since v = e_1 + e_2 ∈ W_1, we also have W_2 ⊆ W_1. Thus W_1 = W_2, which is the xy-plane in ℝ^3. In general, a subspace in a vector space can have many different spanning sets. □
Example 3.8 (For any m ≤ n, ℝ^m is a subspace of ℝ^n) Let

    e_1 = (1, 0, 0, ..., 0),
    e_2 = (0, 1, 0, ..., 0),
    ⋮
    e_n = (0, 0, 0, ..., 1)

be the n vectors in the n-space ℝ^n (n ≥ 3). Then a linear combination of e_1, e_2, e_3 is of the form

    a_1 e_1 + a_2 e_2 + a_3 e_3 = (a_1, a_2, a_3, 0, ..., 0).

Hence, the set

    W = { (a_1, a_2, a_3, 0, ..., 0) : a_i ∈ ℝ }

is the subspace of the n-space ℝ^n spanned by the vectors e_1, e_2, e_3. Note that the subspace W can be identified with the 3-space ℝ^3 through the identification

    (a_1, a_2, a_3, 0, ..., 0) ≡ (a_1, a_2, a_3)

with a_i ∈ ℝ. In general, for m ≤ n, the m-space ℝ^m can be identified with a subspace of the n-space ℝ^n. □

Example 3.9 (All Ax's form the column space) Let A = [c_1 c_2 ··· c_n] be an m × n matrix with columns c_i's. Then the column vectors c_i are in ℝ^m, and the matrix product Ax represents the linear combination of the column vectors c_i whose coefficients are the components of x ∈ ℝ^n, i.e., Ax = x_1 c_1 + x_2 c_2 + ··· + x_n c_n (see Example 1.9). Therefore, the set

    W = { Ax ∈ ℝ^m : x ∈ ℝ^n }

of all linear combinations of the column vectors of A is a subspace of ℝ^m, called the column space of A. Consequently, Ax = b has a solution (x_1, x_2, ..., x_n) in ℝ^n if and only if the vector b belongs to the column space of A. □
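The identity Ax = x_1 c_1 + ··· + x_n c_n behind this example can be seen concretely in a few lines of Python/NumPy (the matrix A and the vector x below are arbitrary sample values, not from the text):

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])
x = np.array([2.0, -1.0, 3.0])

# Ax equals the linear combination x_1 c_1 + x_2 c_2 + x_3 c_3 of the columns of A.
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(np.allclose(A @ x, combo))   # True
```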
Remark: One can take another point of view on the equation Ax = b. For any vector
x E lRn, A assigns another vector b in lRm which is given as Ax. That is, A can be
considered as a function from lRn into lRm and the column space of A is nothing but
the image of this function.
Problem 3.6 Let x_1, x_2, ..., x_m be vectors in a vector space V and let W be the subspace spanned by x_1, x_2, ..., x_m. Show that W is the smallest subspace of V containing x_1, x_2, ..., x_m. In other words, if U is a subspace of V containing x_1, x_2, ..., x_m, then W ⊆ U.

As we saw in Theorem 3.5 and Example 3.7, any nonempty subset of a vector
space V spans a subspace through the linear combinations of the vectors, and two
different subsets may span the same subspace. This means that a vector can be written
as linear combinations in various ways.
However, for some sets of vectors in a vector space V , any vector in V can be
expressed uniquely as a linear combination of the set. Such a set of vectors is called
a basis for V. In the following we will make this notion clear and show how to find
such a basis .

Definition 3.5 A set of vectors {x_1, x_2, ..., x_m} in a vector space V is said to be linearly independent if the vector equation, called the linear dependence of the x_i's,

    c_1 x_1 + c_2 x_2 + ··· + c_m x_m = 0,

has only the trivial solution c_1 = c_2 = ··· = c_m = 0. Otherwise, it is said to be linearly dependent.

By definition, a set of vectors {x_1, x_2, ..., x_m} is linearly dependent if and only if the linear dependence

    c_1 x_1 + c_2 x_2 + ··· + c_m x_m = 0

has a nontrivial solution (c_1, c_2, ..., c_m). For example, if c_m ≠ 0, the equation can be rewritten as

    x_m = −(c_1/c_m) x_1 − (c_2/c_m) x_2 − ··· − (c_{m-1}/c_m) x_{m-1}.

That is, a set of vectors is linearly dependent if and only if at least one of the vectors in the set can be written as a linear combination of the others. It means that at least

one of the vectors can be expressed as a linear combination of the set in two different
ways.
Example 3.10 (Linear independence in ℝ^3) Let x = (1, 2, 3) and y = (3, 2, 1) be two vectors in the 3-space ℝ^3. Then clearly y ≠ λx for any λ ∈ ℝ (or: ax + by = 0 is possible only when a = b = 0). This means that {x, y} is linearly independent in ℝ^3. If w = (3, 6, 9), then {x, w} is linearly dependent since w − 3x = 0. In general, if x, y are non-collinear vectors in the 3-space ℝ^3, the set of all linear combinations of x and y determines a plane W through the origin in ℝ^3, i.e., W = {ax + by : a, b ∈ ℝ}. Let z be another nonzero vector in the 3-space ℝ^3. If z ∈ W, then there are some scalars a, b ∈ ℝ, not both zero, such that z = ax + by, that is, the set {x, y, z} is linearly dependent. If z ∉ W, then ax + by + cz = 0 is possible only when a = b = c = 0 (prove it). Therefore, the set {x, y, z} is linearly independent if and only if z does not lie in W. □
By abuse of language, it is sometimes convenient to say that "the vectors
xr, X2, . .. , xm are linearly independent," although this is really a property of a
set.
Example 3.11 The columns of the matrix

    A = [ 1 −2 −1 0
          4  2  6 8
          2 −1  1 3 ]

are linearly dependent in the 3-space ℝ^3, since the third column is the sum of the first and the second. □
As shown in Example 3.11, the concept of linear dependence can be applied to
the row or column vectors of any matrix.
Example 3.12 Consider an upper triangular matrix

    A = [ 2 3 5
          0 1 6
          0 0 4 ].

The linear dependence of the column vectors of A may be written as

    c_1 [ 2 ; 0 ; 0 ] + c_2 [ 3 ; 1 ; 0 ] + c_3 [ 5 ; 6 ; 4 ] = [ 0 ; 0 ; 0 ],

which, in matrix notation, may be written as the homogeneous system Ac = 0 with c = (c_1, c_2, c_3). From the third row, c_3 = 0; from the second row, c_2 = 0; and substitution of these into the first row forces c_1 = 0, i.e., the homogeneous system has only the trivial solution, so the column vectors are linearly independent. □
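For column vectors in ℝ^m, this test — Ax = 0 has only the trivial solution — is the practical way to check linear independence. The Python/NumPy sketch below (using `matrix_rank` to count pivots) applies it to the matrices of Examples 3.11 and 3.12:

```python
import numpy as np

# Example 3.12: upper triangular with nonzero diagonal -> independent columns.
A = np.array([[2.0, 3.0, 5.0],
              [0.0, 1.0, 6.0],
              [0.0, 0.0, 4.0]])
print(np.linalg.matrix_rank(A) == A.shape[1])   # True: columns independent

# Example 3.11: the third column equals c1 + c2 -> dependent columns.
B = np.array([[1.0, -2.0, -1.0, 0.0],
              [4.0,  2.0,  6.0, 8.0],
              [2.0, -1.0,  1.0, 3.0]])
print(np.linalg.matrix_rank(B) == B.shape[1])   # False: columns dependent
```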

The following theorem can be proved by the same argument.

Theorem 3.6 The nonzero rows of a matrix in row-echelon form are linearly independent, and so are the columns that contain leading 1's.

In particular, the rows of any triangular matrix with nonzero diagonal entries are linearly independent, and so are the columns.

If V = ℝ^m and v_1, v_2, ..., v_n are n vectors in ℝ^m, then they form an m × n matrix A = [v_1 v_2 ··· v_n]. On the other hand, Example 3.9 shows that the linear dependence c_1 v_1 + c_2 v_2 + ··· + c_n v_n = 0 of the v_i's is nothing but the homogeneous equation Ax = 0, where x = (c_1, c_2, ..., c_n). Thus, one can say that
(1) the column vectors v_i's of A are linearly independent in ℝ^m if and only if the homogeneous system Ax = 0 has only the trivial solution, and
(2) they are linearly dependent if and only if Ax = 0 has a nontrivial solution.
If U is the reduced row-echelon form of A, then we know that Ax = 0 and Ux = 0 have the same set of solutions. Moreover, a homogeneous system Ax = 0 with more unknowns than equations always has a nontrivial solution by Theorem 1.2. This proves the following lemma.

Lemma 3.7 (1) Ifn > m, any set ofn vectors in the m-space lRm is linearly depen-
dent.
(2) If U is the reduced row-echelon form of A, then the columns of U are linearly
independent if and only if the columns of A are linearly independent.

Example 3.13 Consider the vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1) in the 3-space ℝ^3. The matrix A = [e_1 e_2 e_3] is the identity matrix, so Ax = 0 has only the trivial solution. Thus, the set of vectors {e_1, e_2, e_3} is linearly independent and also spans ℝ^3. □
Example 3.14 (The standard basis for ℝ^n) The vectors e_1, e_2, ..., e_n in ℝ^n are clearly linearly independent (see Theorem 3.6). Moreover, they span the n-space ℝ^n: in fact, a vector x = (x_1, x_2, ..., x_n) ∈ ℝ^n is a linear combination of the vectors e_i's:

    x = x_1 e_1 + x_2 e_2 + ··· + x_n e_n.    □
Definition 3.6 Let V be a vector space. A basis for V is a set of linearly independent
vectors that spans V.

For example, as in Example 3.14, the set [ej , e2, .. . , en} forms a basis, called the
standard basis for the n-space lRn . Of course, there are many other bases for lRn .

Example 3.15 (A basis or not)


(1) The set of vectors (1, 1, 0), (0, −1, 1), and (1, 0, 1) is not a basis for the 3-space ℝ^3, since this set is linearly dependent (the third is the sum of the first two vectors) and cannot span ℝ^3. (The vector (1, 0, 0) cannot be obtained as a linear combination of them (prove it).) This set does not have enough vectors to span ℝ^3.

(2) The set of vectors (1, 0, 0), (0,1,1), (1, 0,1) and (0,1,0) is not a basis either,
since they are not linearly independent (the sum of the first two minus the third makes
the fourth) even though they span jR3 . This set of vectors has some redundant vectors
spanning jR3.
(3) The set of vectors (1,1,1), (0,1,1), and (0, 0,1) is linearly independent and
also spans jR3. That is, it is a basis for jR3, different from the standard basis. This set
has the proper number of vectors spanning jR3, since the set cannot be reduced to a
smaller set nor does it need any additional vector to span jR3 . 0

By definition, in order to show that a set of vectors in a vector space is a basis, one
needs to show two things: it is linearly independent, and it spans the whole space.
The following theorem shows that a basis for a vector space represents a coordinate
system just like the rectangular coordinate system by the standard basis for jRn.

Theorem 3.8 Let α = {v_1, v_2, ..., v_n} be a basis for a vector space V. Then each vector x in V can be uniquely expressed as a linear combination of v_1, v_2, ..., v_n, i.e., there are unique scalars a_i's, i = 1, 2, ..., n, such that

    x = a_1 v_1 + a_2 v_2 + ··· + a_n v_n.

Proof: If x can also be expressed as x = b_1 v_1 + b_2 v_2 + ··· + b_n v_n, then we have

    0 = (a_1 − b_1)v_1 + (a_2 − b_2)v_2 + ··· + (a_n − b_n)v_n.

By the linear independence of the v_i's, a_i = b_i for all i = 1, 2, ..., n. □

Example 3.16 (Two different bases for ℝ^3) Let α = {e_1, e_2, e_3} be the standard basis for ℝ^3, and let β = {v_1, v_2, v_3} with v_1 = (1, 1, 1) = e_1 + e_2 + e_3, v_2 = (0, 1, 1) = e_2 + e_3, v_3 = (0, 0, 1) = e_3. Then β is also a basis for ℝ^3 (see Example 3.15(3)). For any x = (x_1, x_2, x_3) ∈ ℝ^3, one can easily verify that

    x = x_1 e_1 + x_2 e_2 + x_3 e_3
      = x_1 v_1 + (x_2 − x_1) v_2 + (x_3 − x_2) v_3.    □
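Finding the coefficients with respect to β amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A short Python/NumPy check (the sample vector x is arbitrary):

```python
import numpy as np

# Basis vectors v1, v2, v3 of Example 3.16 written as the columns of B.
B = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
x = np.array([3.0, 5.0, 9.0])

# The coordinates of x with respect to beta solve B c = x.
c = np.linalg.solve(B, x)
print(c)          # [3. 2. 4.] = (x1, x2 - x1, x3 - x2), as in the example
```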


Problem 3.7 Show that the vectors v_1 = (1, 2, 1), v_2 = (2, 9, 0) and v_3 = (3, 3, 4) in the 3-space ℝ^3 form a basis.

Problem 3.8 Show that the set {1, x, x^2, ..., x^n} is a basis for P_n(ℝ), the vector space of all polynomials of degree ≤ n with real coefficients.

Problem 3.9 Let x_k denote the vector in ℝ^n whose first k − 1 coordinates are zero and whose last n − k + 1 coordinates are 1. Show that the set {x_1, x_2, ..., x_n} is a basis for ℝ^n.

3.4 Dimensions
We often say that the line ℝ^1 is one-dimensional, the plane ℝ^2 is two-dimensional and the space ℝ^3 is three-dimensional, etc. This is mostly due to the fact that the freedom in choosing coordinates for each element in the space is 1, 2 or 3, respectively. This means that the concept of dimension is closely related to the concept of a basis. Note that for a vector space in general there is no unique way of choosing a basis. However, there is something common to all bases, and this is related to the notion of dimension. We first need the following lemma, from which one can define the dimension of a vector space.

Lemma 3.9 Let V be a vector space and let α = {x_1, x_2, ..., x_m} be a set of m vectors in V.
(1) If α spans V, then every set of vectors with more than m vectors cannot be linearly independent.
(2) If α is linearly independent, then any set of vectors with fewer than m vectors cannot span V.

Proof: Since (2) follows from (1) directly, we prove only (1). Let β = {y_1, y_2, ..., y_n} be a set of n vectors in V with n > m. We will show that β is linearly dependent. Indeed, since each vector y_j is a linear combination of the vectors in the spanning set α, i.e., for j = 1, 2, ..., n,

    y_j = a_1j x_1 + a_2j x_2 + ··· + a_mj x_m = Σ_{i=1}^{m} a_ij x_i,

we have

    c_1 y_1 + c_2 y_2 + ··· + c_n y_n
      = c_1 (a_11 x_1 + a_21 x_2 + ··· + a_m1 x_m)
      + c_2 (a_12 x_1 + a_22 x_2 + ··· + a_m2 x_m)
      + ··· + c_n (a_1n x_1 + a_2n x_2 + ··· + a_mn x_m)
      = (a_11 c_1 + a_12 c_2 + ··· + a_1n c_n) x_1
      + (a_21 c_1 + a_22 c_2 + ··· + a_2n c_n) x_2
      + ··· + (a_m1 c_1 + a_m2 c_2 + ··· + a_mn c_n) x_m.

Thus, β is linearly dependent if and only if the system of linear equations

    c_1 y_1 + c_2 y_2 + ··· + c_n y_n = 0

has a nontrivial solution (c_1, c_2, ..., c_n) ≠ (0, 0, ..., 0). This is true if all the coefficients of the x_i's are zero but not all of the c_i's are zero. It means that the homogeneous system of linear equations in the c_i's

    a_11 c_1 + a_12 c_2 + ··· + a_1n c_n = 0
    a_21 c_1 + a_22 c_2 + ··· + a_2n c_n = 0
    ⋮
    a_m1 c_1 + a_m2 c_2 + ··· + a_mn c_n = 0

must have a nontrivial solution. But this is guaranteed by Lemma 3.7, since m < n. □

It is clear by Lemma 3.9 that if a set α = {x_1, x_2, ..., x_n} of n vectors is a basis for a vector space V, then no other set β = {y_1, y_2, ..., y_r} of r vectors can be a basis for V if r ≠ n. This means that all bases for a vector space V have the same number of vectors, even though there may be many different bases for a vector space. Therefore, we obtain the following important result:

Theorem 3.10 If a basis for a vector space V consists of n vectors, so does every
other basis.

Definition 3.7 The dimension of a vector space V is the number, say n, of vectors
in a basis for V , denoted by dim V = n. When V has a basis of a finite number of
vectors, V is said to be finite dimensional.

Example 3.17 (Computing the dimension) The following are trivial:

(1) If V has only the zero vector: V = {0}, then dim V = 0.
(2) If V = ℝ^n, then the standard basis {e_1, e_2, ..., e_n} for V implies dim ℝ^n = n.
(3) If V = P_n(ℝ), the space of all polynomials of degree less than or equal to n, then dim P_n(ℝ) = n + 1, since {1, x, x^2, ..., x^n} is a basis for V.
(4) If V = M_{m×n}(ℝ), the space of all m × n matrices, then dim M_{m×n}(ℝ) = mn, since {E_ij : i = 1, ..., m, j = 1, ..., n} is a basis for V, where E_ij is the m × n matrix whose (i, j)-th entry is 1 and all others are zero. □

If V = C(IR) of all real-valued continuous functions defined on the real line,


then one can show that V is not finite dimensional. A vector space V is infinite
dimensional if it is not finite dimensional. In this book, we are concerned only with
finite-dimensional vector spaces unless otherwise stated.

Theorem 3.11 Let V be a finite-dimensional vector space.


(1) Any linearly independent set in V can be extended to a basis by adding more
vectors if necessary.
(2) Any set of vectors that spans V can be reduced to a basis by discarding vectors
if necessary.

Proof: We prove (1) and leave (2) as an exercise. Let α = {x_1, x_2, ..., x_k} be a linearly independent set in V. If α spans V, then α is a basis. If α does not span V, then there exists a vector, say x_{k+1}, in V that is not contained in the subspace spanned by the vectors in α. Now {x_1, ..., x_k, x_{k+1}} is linearly independent (check why). If {x_1, ..., x_k, x_{k+1}} spans V, then this is a basis for V. If it does not span V, then the same procedure can be repeated, yielding a linearly independent set that spans V, i.e., a basis for V. This procedure must stop in a finite number of steps, because of Lemma 3.9 for a finite-dimensional vector space V. □

Theorem 3.11 shows that a basis for a vector space V is a set of vectors in V which
is maximally independent and minimally spanning in the above sense. In particular,
if W is a subspace of V, then any basis for W is linearly independent also in V, and
can be extended to a basis for V . Thus dim W ::: dim V.

Corollary 3.12 Let V be a vector space of dimension n. Then
(1) any set of n vectors that spans V is a basis for V, and
(2) any set of n linearly independent vectors is a basis for V.

Proof: Again we prove (1) only. If a spanning set of n vectors were not linearly independent, then the set could be reduced to a basis having fewer than n vectors. □

Corollary 3.12 means that if it is known that dim V = n and if a set of n vectors either is linearly independent or spans V, then it is already a basis for the space V.

Example 3.18 (Constructing a basis) Let W be the subspace of ℝ^4 spanned by the vectors

    x_1 = (1, −2, 5, −3),   x_2 = (0, 1, 1, 4),   x_3 = (1, 0, 1, 0).

Find a basis for W and extend it to a basis for ℝ^4.

Solution: Note that dim W ≤ 3, since W is spanned by three vectors x_i's. Let A be the 3 × 4 matrix whose rows are x_1, x_2 and x_3:

    A = [ 1 −2 5 −3
          0  1 1  4
          1  0 1  0 ].

Reduce A to a row-echelon form U:

    U = [ 1 −2  5 −3
          0  1  1  4
          0  0 −6 −5 ].

The three nonzero row vectors of U are clearly linearly independent, and they also span W because the vectors x_1, x_2 and x_3 can be expressed as linear combinations of these three nonzero row vectors of U. Hence, the three nonzero row vectors of U provide a basis for W. (Note that this implies dim W = 3, and hence {x_1, x_2, x_3} is

also a basis for W by Corollary 3.12. The linear independence of the x_i's is a by-product of this fact.)
To extend it to a basis for ℝ^4, just add any nonzero vector of the form x_4 = (0, 0, 0, t) to the rows of U. □
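The same computation can be delegated to a computer algebra system. The SymPy sketch below row-reduces the matrix of Example 3.18 and reports the pivot columns (exact rational arithmetic, so no rounding issues):

```python
import sympy as sp

A = sp.Matrix([[1, -2, 5, -3],
               [0,  1, 1,  4],
               [1,  0, 1,  0]])
U, pivots = A.rref()
print(U)              # nonzero rows of the reduced row-echelon form span W
print(len(pivots))    # 3, so dim W = 3 and x1, x2, x3 themselves form a basis of W
```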

Problem 3.10 Let W be a subspace of a vector space V. Show that if dim W = dim V , then
W=V .

Problem 3.11 Find a basis and the dimension of each of the following subspaces of Mn xn (R)
of all n x n matrice s.
(1) The space of all n x n diagonal matrices whose traces are zero.
(2) The space of all n x n symmetric matrices .
(3) The space of all n x n skew-symmetric matrices.

As a direct consequence of Theorem 3.11 and the definition of the direct sum of
subspaces, one can show the following corollary.

Corollary 3.13 For any subspace U of V, there is a subspace W of V such that V = U ⊕ W.

Proof: Choose a basis {u_1, ..., u_k} for U, and extend it to a basis {u_1, ..., u_k, u_{k+1}, ..., u_n} for V. Then the subspace W spanned by {u_{k+1}, ..., u_n} satisfies the requirement. □

Problem 3.12 Let {v_1, v_2, ..., v_n} be a basis for a vector space V and let W_i = {rv_i : r ∈ ℝ} be the subspace of V spanned by v_i. Show that V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_n.

3.5 Row and column spaces

In this section, we go back to systems of linear equations and study them in terms of the concepts introduced in the previous sections. Note that an m × n matrix A can be abbreviated by its row vectors or its column vectors as follows:

    A = [ r_1 ; r_2 ; ⋮ ; r_m ] = [ c_1  c_2  ···  c_n ],

where r_i is the i-th row vector of A in ℝ^n, and c_j is the j-th column vector of A in ℝ^m.

Definition 3.8 Let A be an m × n matrix with row vectors {r_1, r_2, ..., r_m} and column vectors {c_1, c_2, ..., c_n}.

(1) The row space of A is the subspace of ℝ^n spanned by the row vectors {r_1, r_2, ..., r_m}, denoted by R(A).
(2) The column space of A is the subspace of ℝ^m spanned by the column vectors {c_1, c_2, ..., c_n}, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the null space of A, denoted by N(A).

Note that the null space N(A) is a subspace of the n-space ℝ^n. Its dimension is called the nullity of A. Since the row vectors of A are just the column vectors of its transpose A^T and the column vectors of A are the row vectors of A^T, the row (column) space of A is just the column (row) space of A^T; that is,

    R(A) = C(A^T)   and   C(A) = R(A^T).

Since Ax = x_1 c_1 + x_2 c_2 + ··· + x_n c_n for any vector x = (x_1, x_2, ..., x_n) ∈ ℝ^n, we get

    C(A) = { Ax : x ∈ ℝ^n }.

Thus, for a vector b ∈ ℝ^m, the system Ax = b has a solution if and only if b ∈ C(A) ⊆ ℝ^m. In other words, the column space C(A) is the set of vectors b ∈ ℝ^m for which Ax = b has a solution.
It is quite natural to ask what the dimensions of those subspaces are, and how one can find bases for them. This will help us to understand the structure of all the solutions of the equation Ax = b. Since the set of the row vectors and the set of the column vectors of A are spanning sets for the row space and the column space, respectively, a minimally spanning subset of each of them will be a basis. This is not a difficult problem for a matrix in (reduced) row-echelon form.

Example 3.19 (Find a basis for the null space) Let U be in a reduced row-echelon form given as

    U = [ 1 0 0  2  2
          0 1 0 −1  3
          0 0 1  4 −1
          0 0 0  0  0 ].

Clearly, the first three nonzero row vectors containing leading 1's are linearly independent and they form a basis for the row space R(U), so that dim R(U) = 3. On the other hand, the first three columns containing leading 1's are also linearly independent (see Theorem 3.6), and the last two column vectors can be expressed as linear combinations of them. Hence, they form a basis for C(U), and dim C(U) = 3. To find a basis for the null space N(U), we first solve the system Ux = 0 to get the solution

    x = s n_s + t n_t,

where n_s = (−2, 1, −4, 1, 0), n_t = (−2, −3, 1, 0, 1), and s and t are arbitrary values of the free variables x_4 and x_5, respectively. It shows that these two vectors n_s and n_t span the null space N(U), and they are clearly linearly independent (see their last two entries). Hence, the set {n_s, n_t} is a basis for the null space N(U). □
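The same basis {n_s, n_t} is produced by SymPy's nullspace routine, which assigns the value 1 to one free variable at a time exactly as above (a short sketch in exact arithmetic):

```python
import sympy as sp

U = sp.Matrix([[1, 0, 0,  2,  2],
               [0, 1, 0, -1,  3],
               [0, 0, 1,  4, -1],
               [0, 0, 0,  0,  0]])
for v in U.nullspace():      # one basis vector per free variable (x4 and x5)
    print(v.T)               # (-2, 1, -4, 1, 0) and (-2, -3, 1, 0, 1)
```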
For any matrix A, we first investigate the row space R(A) and the null space N(A) of A by comparing them with those of the reduced row-echelon form U of A. Since Ax = 0 and Ux = 0 have the same solution set by Theorem 1.1, we clearly have N(A) = N(U).
Let {r_1, ..., r_m} be the row vectors of an m × n matrix A. The three elementary row operations change A into matrices A_i of the following three types: A_1 is obtained from A by replacing a row r_i by kr_i for some k ≠ 0; A_2 is obtained by interchanging two rows r_i and r_j, i < j; and A_3 is obtained by replacing r_i by r_i + kr_j.
It is clear that the row vectors of the three matrices A_1, A_2 and A_3 are linear combinations of the row vectors of A. On the other hand, by the inverse elementary row operations, these matrices can be changed back into A. Thus, the row vectors of A can also be written as linear combinations of those of the A_i's. This means that if matrices A and B are row equivalent, then their row spaces must be equal, i.e., R(A) = R(B).
Now the nonzero row vectors in the (reduced) row-echelon form U are always linearly independent and span the row space of U (see Theorem 3.6). Thus they form a basis for the row space R(A) of A. This gives the following theorem.
Theorem 3.14 Let U be a (reduced) row-echelon form of a matrix A. Then

R (A) = R(U) and N(A) = N(U ).

Moreover, if U has r nonzero row vectors containing leading 1's, then they form a
basis for the row space R(A), so that the dimension ofR(A) is r.

The following example shows how to find bases for the row and the null spaces,
and at the same time how to find a basis for the column space.
Example 3.20 (Find bases for the row space and the column space of A) Let A be the matrix given as

    A = [  1  2  0  2  5
          −2 −5  1 −1 −8
           0 −3  3  4  1
           3  6  0 −7  2 ]  =  [ r_1 ; r_2 ; r_3 ; r_4 ].

Find bases for the row space R(A), the null space N(A), and the column space C(A) of A.

Solution: (1) Find a basis for R(A): By Gauss-Jordan elimination on A, we get the reduced row-echelon form U:

    U = [ 1 0  2 0 1
          0 1 −1 0 1
          0 0  0 1 1
          0 0  0 0 0 ].

Since the three nonzero row vectors

    v_1 = (1, 0, 2, 0, 1),
    v_2 = (0, 1, −1, 0, 1),
    v_3 = (0, 0, 0, 1, 1)

of U are linearly independent, they form a basis for the row space R(U) = R(A), so dim R(A) = 3. (Note that in the process of Gaussian elimination, we did not use a permutation matrix. This means that the three nonzero rows of U were obtained from the first three row vectors r_1, r_2, r_3 of A, and the fourth row r_4 of A turned out to be a linear combination of them. Thus the first three row vectors of A also form a basis for the row space.)
(2) Find a basis for N(A): It is enough to solve the homogeneous system Ux = 0, since N(A) = N(U). That is, neglecting the fourth zero equation, the equation Ux = 0 takes the form of the system of equations

    x_1        + 2x_3        + x_5 = 0
           x_2 −  x_3        + x_5 = 0
                         x_4 + x_5 = 0.

Since the first, the second and the fourth columns of U contain the leading 1's, we see that the basic variables are x_1, x_2, x_4, and the free variables are x_3, x_5. As in Example 3.19, by assigning arbitrary values s and t to the free variables x_3 and x_5, one can find the solution x of Ux = 0 as

    x = s n_s + t n_t,

where n_s = (−2, 1, 1, 0, 0) and n_t = (−1, −1, 0, −1, 1). In fact, the two vectors n_s and n_t are the solutions when (x_3, x_5) = (s, t) is (1, 0) and when (x_3, x_5) = (s, t) is (0, 1), respectively. They must be linearly independent, since (1, 0) and (0, 1), as the (x_3, x_5)-coordinates of n_s and n_t respectively, are linearly independent. Since any solution of Ux = 0 is a linear combination of them, the set {n_s, n_t} is a basis for the null space N(U) = N(A). Thus dim N(A) = 2 = the number of free variables in Ux = 0.
(3) Find a basis for C(A): Let c_1, c_2, c_3, c_4, c_5 denote the column vectors of A in the given order. Since these column vectors of A span C(A), we only need to discard those columns that can be expressed as linear combinations of other column vectors. But the linear dependence

    x_1 c_1 + x_2 c_2 + x_3 c_3 + x_4 c_4 + x_5 c_5 = 0,   i.e.,  Ax = 0,


holds if and only if x = (x_1, ..., x_5) ∈ N(A). By taking x = n_s = (−2, 1, 1, 0, 0) or x = n_t = (−1, −1, 0, −1, 1), the basis vectors of N(A) given in (2), we obtain two nontrivial linear dependencies of the c_j's:

    −2c_1 + c_2 + c_3 = 0,
    −c_1 − c_2 − c_4 + c_5 = 0,

respectively. Hence, the column vectors c_3 and c_5, corresponding to the free variables in Ax = 0, can be written as

    c_3 = 2c_1 − c_2,
    c_5 = c_1 + c_2 + c_4.

That is, the column vectors c_3, c_5 of A are linear combinations of the column vectors c_1, c_2, c_4, which correspond to the basic variables in Ax = 0. Hence, {c_1, c_2, c_4} spans the column space C(A).
We claim that {c_1, c_2, c_4} is linearly independent. Let Ā = [c_1 c_2 c_4] and Ū = [u_1 u_2 u_4] be the submatrices of A and U, respectively, where u_j is the j-th column vector of the reduced row-echelon form U of A obtained in (1):

    Ū = [ 1 0 0
          0 1 0
          0 0 1
          0 0 0 ]    and    Ā = [  1  2  2
                                   −2 −5 −1
                                    0 −3  4
                                    3  6 −7 ].

Then clearly Ū is the reduced row-echelon form of Ā, so that N(Ā) = N(Ū). Since the vectors u_1, u_2, u_4 are just the columns of U containing leading 1's, they are linearly independent by Theorem 3.6, and Ūx = 0 has only the trivial solution. This means that Āx = 0 also has only the trivial solution, so {c_1, c_2, c_4} is linearly independent. Therefore, it is a basis for the column space C(A), and dim C(A) = 3 = the number of basic variables. That is, the column vectors of A corresponding to the basic variables in Ux = 0 form a basis for the column space C(A). □

In summary, given a matrix A, we first find the (reduced) row-echelon form U of A by Gauss-Jordan elimination. Then
- a basis for R(A) = R(U) is the set of nonzero row vectors of U,
- a basis for N(A) = N(U) can be found by solving Ux = 0,
- for a basis for the column space C(A), one notices that C(U) ≠ C(A) in general, since the column space of A is not preserved by Gauss-Jordan elimination (see Problem 3.16). However, we have dim C(A) = dim C(U), and a basis for C(A) can be formed by selecting the columns of A, not of U, which correspond to the basic variables (or the leading 1's in U).

Alternatively, a basis for the column space C(A) can also be found with the ele-
mentary column operations, which is the same as finding a basis for the row space
R(A T) of AT .
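All three bases can be read off mechanically. The SymPy sketch below does this for the matrix A of Example 3.20: rref gives the nonzero rows for R(A) and the pivot positions, nullspace gives N(A), and the pivot columns of A itself give C(A):

```python
import sympy as sp

A = sp.Matrix([[ 1,  2, 0,  2,  5],
               [-2, -5, 1, -1, -8],
               [ 0, -3, 3,  4,  1],
               [ 3,  6, 0, -7,  2]])

U, pivots = A.rref()
row_basis  = [U.row(i) for i in range(A.rank())]   # basis for R(A) = R(U)
null_basis = A.nullspace()                         # basis for N(A)
col_basis  = [A.col(j) for j in pivots]            # basis for C(A): pivot columns of A
print(row_basis, null_basis, col_basis, sep="\n")
```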
Problem 3.13 Let A be the matrix given in Example 3.20. Find the conditions on a, b, c, d so that the vector x = (a, b, c, d) belongs to C(A).

Problem 3.14 Find bases for R(A) and N(A) of the matrix

    A = [ 1 −2  0  0
          2 −5 −3 −2
          0  5 15 10
          2  6 18  8 ].

Also find a basis for C(A) by finding a basis for R(A^T).

Problem 3.15 Let A and B be two n × n matrices. Show that AB = O if and only if the column space of B is a subspace of the null space of A.

Problem 3.16 Find an example of a matrix A and its row-echelon form U such that C(A) ≠ C(U). What is wrong in C(A) = R(A^T) = R(U^T) = C(U)?

3.6 Rank and nullity


The argument in Example 3.20 is so general that it can be used to prove the following
theorem, which is one of the most fundamental results in linear algebra. The proof
given here is just a repetition of the argument in Example 3.20 in a general case, and
so it may be skipped at the reader's discretion.

Theorem 3.15 (The fundamental theorem) For any m x n matrix A , the row space
and the column space of A have the same dimension ; that is, dim R(A) = dim C(A).

Proof: Let dim R(A) = r and let U be the reduced row-echelon form of A . Then
r is the number of the nonzero row (or column) vectors of U containing leading 1's,
which is equal to the number of basic variables in Ux = 0 or Ax = O. We shall prove
that the r columns of A corresponding to the leading l's (or basic variables) form a
basis for C(A), so that dim C(A) = r = dim R(A).
(1) They are linearly independent: Let Ā denote the submatrix of A whose columns
are those of A corresponding to the r basic variables (or leading 1's) in U, and let
Ū denote the submatrix of U consisting of the r columns containing leading 1's.
Then, it is clear that Ū is the reduced row-echelon form of Ā, so that Āx = 0 if
and only if Ūx = 0. However, Ūx = 0 has only the trivial solution since the columns
of Ū containing the leading 1's are linearly independent by Theorem 3.6. Therefore,
Āx = 0 also has only the trivial solution, so the columns of Ā are linearly independent.
(2) They span C(A): Note that the columns of A corresponding to the free variables
are not contained in Ā, and each of these column vectors of A can be written as a
linear combination of the column vectors of Ā (see Example 3.20). To show this,
let {c_{i1}, c_{i2}, ..., c_{ik}} be the columns of A (not contained in Ā) corresponding to the
free variables {x_{i1}, x_{i2}, ..., x_{ik}}, and let x_{ij} be any of these free variables. Then, by
assigning the value 1 to x_{ij} and 0 to all the other free variables, one can get a nontrivial
solution of
Ax = x1c1 + x2c2 + ··· + xncn = 0.
When such a solution is substituted into this equation, one can see that the column
c_{ij} of A corresponding to x_{ij} = 1 is written as a linear combination of the columns
of Ā. This can be done for each free variable x_{ij}, j = 1, 2, ..., k, so the columns of
A corresponding to those free variables are redundant in the spanning set of C(A). □

Remark: (1) In the proof of Theorem 3.15, once we have shown that the columns in Ā
are linearly independent as in step (1), we may replace step (2) by the following argu-
ment: One can easily see that dim C(A) ≥ dim R(A) by Theorem 3.11. On the other
hand, since this inequality holds for arbitrary matrices, applying it in particular to Aᵀ we
get dim C(Aᵀ) ≥ dim R(Aᵀ). Moreover, C(Aᵀ) = R(A) and R(Aᵀ) = C(A) imply
dim C(A) ≤ dim R(A), which means dim C(A) = dim R(A). This also means
that the column vectors of Ā span C(A), and so form a basis.
(2) The proof (2) of Theorem 3.15 also shows that the reduced row-echelon form
of a system is unique, which was stated on page 10. In fact, if U1 and U2 are two
reduced row-echelon forms of an m × n matrix A, then the columns of U1 and U2
corresponding to the basic variables (i.e., containing leading 1's) must be the same
and of the form [0 ··· 0 1 0 ··· 0]ᵀ by the definition of the reduced row-echelon
form. If there are no free variables, then it is quite clear that

U1 = [ 1 0 ··· 0
       0 1 ··· 0
       ⋮       ⋮
       0 0 ··· 1
       0 0 ··· 0
       ⋮       ⋮
       0 0 ··· 0 ] = U2.

Suppose that there is a free variable. Since U1x = 0 if and only if U2x = 0, one can
easily check that the columns of U1 and U2 corresponding to each free variable must
also be the same, so that U1 = U2.

In summary, the following equalities are now clear from Theorems 3.14 and 3.15:

dim N(A) = dim N(U)
         = the number of free variables in Ux = 0,
dim R(A) = dim R(U)
         = the number of nonzero row vectors of U
         = the maximal number of linearly independent row vectors of A
         = the number of basic variables in Ux = 0
         = the maximal number of linearly independent column vectors of A
         = dim C(A).

Definition 3.9 For an m x n matrix A, the rank of A is defined to be the dimension


of the row space (or the column space), denoted by rank A.

Clearly, rank In = n and rank A = rank Aᵀ. And for an m × n matrix A, since
dim R(A) ≤ m and dim C(A) ≤ n, we have the following corollary:

Corollary 3.16 If A is an m × n matrix, then rank A ≤ min{m, n}.


Since dim R(A) = dim C(A) = rank A is the number of basic variables in
Ax = 0, and dim N(A) = nullity of A is the number of free variables in Ax = 0, we
have the following theorem.

Theorem 3.17 (Rank Theorem) For any m x n matrix A,

dim R(A) + dim N(A) = rank A + nullity of A = n,


dimC(A) + dimN(A T) = rank A + nullity of AT = m.
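As a quick numerical sanity check of the Rank Theorem, the following sketch assumes the SymPy library (not used in the text); the matrix A is an arbitrary example, not one from the book.

```python
from sympy import Matrix

# An arbitrary 3 x 5 example matrix (illustration only).
A = Matrix([[1, 2, 3, 0, 1],
            [2, 4, 6, 1, 3],
            [0, 0, 0, 1, 1]])

m, n = A.shape
rank = A.rank()                    # = dim R(A) = dim C(A)
nullity = len(A.nullspace())       # = dim N(A)
nullity_T = len(A.T.nullspace())   # = dim N(A^T)

assert rank + nullity == n         # rank A + nullity of A   = n
assert rank + nullity_T == m       # rank A + nullity of A^T = m
```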

If dim N(A) = 0 (or N(A) = {0}), then dim R(A) = n (or R(A) = ℝⁿ), which
means that A has exactly n linearly independent rows and n linearly independent
columns. In particular, if A is a square matrix of order n, then the row vectors are
linearly independent if and only if the column vectors are linearly independent. There-
fore, Ax = 0 has only the trivial solution, and by Theorem 1.9 we get the following
corollary.

Corollary 3.18 Let A be an n x n square matrix. Then A is invertible if and only if


rank A = n.

Example 3.21 (Find the rank and the nullity) For a 4 × 5 matrix A, find the rank
and the nullity of A.

Solution: Suppose that Gaussian elimination reduces A to a row-echelon form

U = [ 1 2 0 2 1
      0 0 1 3 1
      0 0 0 0 1
      0 0 0 0 0 ].

The first three nonzero rows containing leading 1's in U form a basis for R(U) =
R(A). Therefore, rank A = dim R(A) = dim C(A) = 3, and the nullity of A =
dim N(A) = 5 − dim R(A) = 2. □

Problem 3.17 Find the nullity and the rank of each of the following matrices :

(1) A =[ ; - ], (2) A = ].
-I -2 0-5 2 I 5 -2

For each of the matrices. show that dim R(A) = dim C(A) directly by finding their bases.
Problem 3.18 Show that a system of linear equations Ax = b has a solution if and only if
rank A = rank [A b], where [A b] denotes the augmented matrix for Ax = b.

Theorem 3.19 For any two matrices A and B for which AB can be defined,
(1) N(AB) ⊇ N(B),
(2) N((AB)ᵀ) ⊇ N(Aᵀ),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).

Proof: (1) is clear, since Bx = 0 implies (AB)x = A(Bx) = 0; similarly, (2) holds
since Aᵀx = 0 implies (AB)ᵀx = Bᵀ(Aᵀx) = 0.
(3) For an m × n matrix A and an n × p matrix B,

C(AB) = {ABx : x ∈ ℝᵖ} ⊆ {Ay : y ∈ ℝⁿ} = C(A),

because Bx ∈ ℝⁿ for any x ∈ ℝᵖ. (See Example 3.9.)

(4) R(AB) = C((AB)ᵀ) = C(BᵀAᵀ) ⊆ C(Bᵀ) = R(B). □

Corollary 3.20 rank(AB) ≤ min{rank A, rank B}.


In some particular cases, the equality holds. In fact, it will be shown later in
Theorem 5.25 that for any square matrix A, rank(A T A) = rank A = rank(AA T ) . The
following problem illustrates another such case.
Problem 3.19 Let A be an invertible square matrix. Show that, for any matrix B, rank(AB) =
rank B = rank(B A) .
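The inequality of Corollary 3.20 and the claim of Problem 3.19 are easy to probe numerically. The following sketch assumes NumPy; the matrices are random examples, and the stated ranks hold only generically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random matrices of prescribed (generic) ranks, for illustration only.
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))   # rank 3, generically
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2, generically

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
rAB = np.linalg.matrix_rank(A @ B)
print(rA, rB, rAB)                          # expect rAB <= min(rA, rB)

# Multiplying by an invertible matrix preserves the rank (Problem 3.19).
P = rng.standard_normal((5, 5))             # invertible with probability 1
print(np.linalg.matrix_rank(P @ A) == rA)
```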

Theorem 3.21 Let A be an m x n matrix of rank r. Then


(1) for every submatrix C of A, rank C ≤ r, and
(2) the matrix A has at least one r × r submatrix of rank r; that is, A has an invertible
submatrix of order r.

Proof: (1) Consider an intermediate matrix B which is obtained from A by removing
the rows that are not wanted in C. Then clearly R(B) ⊆ R(A) and hence rank B ≤
rank A. Moreover, since the columns of C are taken from those of B, C(C) ⊆ C(B)
and rank C ≤ rank B.
(2) Note that one can find r linearly independent row vectors of A, which form a
basis for the row space of A. Let B be the matrix whose row vectors consist of these
vectors. Then rank B = r and the column space of B must be of dimension r. By
taking r linearly independent column vectors of B, one can find an r x r submatrix
C of A with rank r. 0

Problem 3.20 Prove that the rank of a matrix is equal to the largest order of its invertible
submatrices .

Problem 3.21 For each of the matrices given in Problem 3.17, find an invertible submatrix of
the largest order.

3.7 Bases for subspaces


In this section, we introduce two ways of finding bases for V + W and V ∩ W of
two subspaces V and W of the n-space ℝⁿ, and then derive an important relationship
between the dimensions of those subspaces in terms of the dimensions of V and W.
Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W,
respectively. Let Q be the n × (k + ℓ) matrix whose columns are those basis vectors:

Q = [v1 ··· vk w1 ··· wℓ]_{n×(k+ℓ)}.

Theorem 3.22 Let V and W be two subspaces of ℝⁿ, and Q be the matrix defined
above.
(1) C(Q) = V + W, so that a basis for the column space C(Q) is a basis for V + W.
(2) N(Q) can be identified with V ∩ W, so that dim(V ∩ W) = dim N(Q).

Proof: (1) It is clear that C(Q) = V + W.

(2) Let x = (a1, ..., ak, b1, ..., bℓ) ∈ N(Q) ⊆ ℝ^{k+ℓ}. Then

Qx = a1v1 + ··· + akvk + b1w1 + ··· + bℓwℓ = 0,

from which we get a1v1 + ··· + akvk = −(b1w1 + ··· + bℓwℓ). If we set

y = a1v1 + ··· + akvk = −(b1w1 + ··· + bℓwℓ),

then y ∈ V ∩ W, since the first right-hand side a1v1 + ··· + akvk is in V as a linear
combination of the basis vectors in α and the second right-hand side −(b1w1 + ··· +
bℓwℓ) is in W as a linear combination of the basis vectors in β. That is, to each
x ∈ N(Q), there corresponds a vector y in V ∩ W.
On the other hand, if y ∈ V ∩ W, then y can be written in two linear combinations
by the bases for V and W separately as

y = a1v1 + ··· + akvk ∈ V,
y = b1w1 + ··· + bℓwℓ ∈ W,

for some a1, ..., ak and b1, ..., bℓ. Let x = (a1, ..., ak, −b1, ..., −bℓ) ∈ ℝ^{k+ℓ}.
Then it is quite clear that Qx = 0, i.e., x ∈ N(Q). Therefore, the correspondence of x
in N(Q) ⊆ ℝ^{k+ℓ} to a vector y in V ∩ W ⊆ ℝⁿ gives us a one-to-one correspondence
between the sets N(Q) and V ∩ W.
Moreover, if xi, i = 1, 2, correspond to yi, then one can easily check that x1 + x2
corresponds to y1 + y2, and kx1 corresponds to ky1. This means that the two vector
spaces N(Q) and V ∩ W can be identified as vector spaces (see Section 4.2 for an exact
meaning of this identification). In particular, for a basis for N(Q), the corresponding
set in V ∩ W is a basis for V ∩ W; that is, if the set of vectors

x1 = (a11, ..., a1k, b11, ..., b1ℓ),
 ⋮
xs = (as1, ..., ask, bs1, ..., bsℓ)

is a basis for N(Q), then the set of vectors

y1 = a11v1 + ··· + a1kvk,            y1 = −(b11w1 + ··· + b1ℓwℓ),
 ⋮                          or        ⋮
ys = as1v1 + ··· + askvk,            ys = −(bs1w1 + ··· + bsℓwℓ),

is a basis for V ∩ W, and vice versa. This implies that

dim N(Q) = dim(V ∩ W). □
Note that dim(V + W) ≠ dim V + dim W, in general. The following theorem
gives a relation between them.

Theorem 3.23 For any subspaces V and W of the n-space ℝⁿ,

dim(V + W) + dim(V ∩ W) = dim V + dim W.

Proof: Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W,
respectively. Let Q be the n × (k + ℓ) matrix whose columns are the previous basis
vectors:
Q = [v1 ··· vk w1 ··· wℓ]_{n×(k+ℓ)}.
Then, by the Rank Theorem and Theorem 3.22, we have

k + ℓ = dim C(Q) + dim N(Q) = dim(V + W) + dim(V ∩ W). □

In particular, dim(V + W) = dim V + dim W if and only if V ∩ W = {0}. In this
case, V + W = V ⊕ W.

Example 3.22 (Find a basis for a subspace) Let V and W be two subspaces of ℝ⁵
with bases

v1 = (1, 3, −2, 2, 3),     w1 = (2, 3, −1, −2, 9),
v2 = (1, 4, −3, 4, 2),     w2 = (1, 5, −6, 6, 1),
v3 = (1, 3, 0, 2, 3),      w3 = (2, 4, 4, 2, 8),

respectively. Find bases for V + W and V ∩ W.

Solution: The matrix Q takes the following form:

Q = [  1  1  1  2  1  2
       3  4  3  3  5  4
      −2 −3  0 −1 −6  4
       2  4  2 −2  6  2
       3  2  3  9  1  8 ].

The Gauss-Jordan elimination gives

U = [ 1 0 0  5  0 0
      0 1 0 −3  2 0
      0 0 1  0 −1 0
      0 0 0  0  0 1
      0 0 0  0  0 0 ].

From this, one can directly see that dim(V + W) = 4, and the columns v1, v2, v3, w3
corresponding to the basic variables in Qx = 0 (or leading 1's in U) form a basis for
C(Q) = V + W. Moreover, dim N(Q) = dim(V ∩ W) = 2, corresponding to the two
free variables x4 and x5 in Qx = 0.
To find a basis for V ∩ W, we solve Ux = 0 for (x4, x5) = (1, 0) and (x4, x5) =
(0, 1) respectively to obtain a basis for N(Q):

x1 = (−5, 3, 0, 1, 0, 0)   and   x2 = (0, −2, 1, 0, 1, 0).

From Qxi = 0, we obtain two equations:

−5v1 + 3v2 + w1 = 0,
−2v2 + v3 + w2 = 0.

Therefore, {y1, y2} is a basis for V ∩ W, where

y1 = 5v1 − 3v2 = w1 = (2, 3, −1, −2, 9),
y2 = 2v2 − v3 = w2 = (1, 5, −6, 6, 1).

Clearly, one can check

dim(V + W) + dim(V ∩ W) = 4 + 2 = 3 + 3 = dim V + dim W. □
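Theorem 3.22 translates directly into a computation. The sketch below assumes SymPy (not used in the text) and rebuilds the bases of Example 3.22 from the column space and null space of Q.

```python
from sympy import Matrix

# Basis vectors of V and W from Example 3.22, used as the columns of Q.
v1, v2, v3 = [1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [1, 3, 0, 2, 3]
w1, w2, w3 = [2, 3, -1, -2, 9], [1, 5, -6, 6, 1], [2, 4, 4, 2, 8]

Q = Matrix([v1, v2, v3, w1, w2, w3]).T     # 5 x 6, columns v1, ..., w3

# C(Q) = V + W: the pivot columns of Q form a basis.
sum_basis = Q.columnspace()
print(len(sum_basis))                      # dim(V + W), expect 4

# Each null-space vector (a1, a2, a3, b1, b2, b3) of Q yields the
# intersection vector a1*v1 + a2*v2 + a3*v3 in V ∩ W.
Vmat = Matrix([v1, v2, v3]).T              # 5 x 3, columns v1, v2, v3
inter_basis = [Vmat * x[:3, :] for x in Q.nullspace()]
print(len(inter_basis))                    # dim(V ∩ W), expect 2
```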

Remark: (Another method for finding bases) Example 3.22 illustrates a method for
finding bases for V + W and V ∩ W for given subspaces V and W of ℝⁿ by constructing
a matrix Q whose columns are basis vectors for V and basis vectors for W. There is
another method for finding their bases by constructing a matrix Q whose rows are
basis vectors for V and basis vectors for W. In this case, clearly V + W = R(Q).
By finding a basis for the row space R(Q), one can get a basis for V + W.
On the other hand, a basis for V ∩ W can be found as follows: Let A be the k × n
matrix whose rows are basis vectors for V, and B the ℓ × n matrix whose rows are
basis vectors for W. Then V = R(A) and W = R(B). Let Ā denote the matrix A
with an additional unknown vector x = (x1, x2, ..., xn) ∈ ℝⁿ attached as the bottom
row, i.e.,

Ā = [ A
      x ],

and the matrix B̄ is defined similarly. Then it is clear that R(Ā) = R(A) and R(B̄) =
R(B) if and only if x ∈ V ∩ W = R(A) ∩ R(B). This means that the row-echelon
form of Ā and that of A should be the same via the same Gaussian elimination.
Thus, by comparing the row vectors of the row-echelon form of Ā with those of A,
one can obtain a system of linear equations for x = (x1, x2, ..., xn). By the same
argument applied to B̄ and B, one gets another system of linear equations for the same
x = (x1, x2, ..., xn). Common solutions of these two systems together will provide
us with a basis for V ∩ W.
The following example illustrates how one can apply this argument to find bases
for V + W and V ∩ W.

Example 3.23 (Find a basis for a subspace) Let V be the subspace of ℝ⁵ spanned
by

v1 = (1, 3, −2, 2, 3),
v2 = (1, 4, −3, 4, 2),
v3 = (2, 3, −1, −2, 10),

and W the subspace spanned by

w1 = (1, 3, 0, 2, 1),
w2 = (1, 5, −6, 6, 3),
w3 = (2, 5, 3, 2, 1).

Find a basis for V + W and for V ∩ W.


Solution: Note that the matrix A whose row vectors are the vi's is reduced to a row-
echelon form

[ 1 3 −2 2  3
  0 1 −1 2 −1
  0 0  0 0  1 ],

so that dim V = 3. Similarly, the matrix B whose row vectors are the wi's is reduced to
a row-echelon form

[ 1 3  0 2 1
  0 2 −6 4 2
  0 0  0 0 0 ],

so that dim W = 2.
Now, if Q denotes the 6 × 5 matrix whose row vectors are the vi's and wi's, then
V + W = R(Q). By Gaussian elimination, Q is reduced to a row-echelon form,
excluding zero rows:

[ 1 3 −2 2  3
  0 1 −1 2 −1
  0 0  1 0 −1
  0 0  0 0  1 ].

Thus, the four nonzero row vectors

(1, 3, −2, 2, 3), (0, 1, −1, 2, −1), (0, 0, 1, 0, −1), (0, 0, 0, 0, 1)

form a basis for V + W, so that dim(V + W) = 4.


We now find a basis for V ∩ W. A vector x = (x1, x2, x3, x4, x5) ∈ ℝ⁵ is contained
in V ∩ W if and only if x is contained in both the row space of A and that of B.
Let Ā be A with x attached as the last row:

Ā = [  1  3 −2  2  3
       1  4 −3  4  2
       2  3 −1 −2 10
      x1 x2 x3 x4 x5 ].

Then, by the same Gaussian elimination used to reduce A to its row-echelon form, Ā is
reduced to

[ 1 3 −2 2  3
  0 1 −1 2 −1
  0 0  0 0  1
  0 0  −x1+x2+x3  4x1−2x2+x4  0 ].

Therefore, x ∈ R(A) = V if and only if R(Ā) = R(A). By comparing the row
vectors of the row-echelon form of Ā with those of A, one can say that x ∈ R(A) if
and only if the last row vector of the row-echelon form of Ā is the zero vector, that
is, x is a solution of the homogeneous system of equations

{ −x1 + x2 + x3 = 0
  4x1 − 2x2 + x4 = 0.

The same calculation with B̄ gives another homogeneous system of linear equations
for x:

{ −9x1 + 3x2 + x3 = 0
   4x1 − 2x2 + x4 = 0
   2x1 − x2 + x5 = 0.

Solving these two homogeneous systems together yields

V ∩ W = {t(1, 4, −3, 4, 2) : t ∈ ℝ}.

Hence, {(1, 4, −3, 4, 2)} is a basis for V ∩ W and dim(V ∩ W) = 1. □
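The dimensions found in Example 3.23 can also be checked with plain rank computations, using Theorem 3.23 to recover dim(V ∩ W). A sketch assuming NumPy:

```python
import numpy as np

# Row vectors spanning V and W (Example 3.23).
V = np.array([[1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [2, 3, -1, -2, 10]])
W = np.array([[1, 3, 0, 2, 1], [1, 5, -6, 6, 3], [2, 5, 3, 2, 1]])

dim_V = np.linalg.matrix_rank(V)
dim_W = np.linalg.matrix_rank(W)
dim_sum = np.linalg.matrix_rank(np.vstack([V, W]))   # dim(V + W) = rank of stacked rows
dim_int = dim_V + dim_W - dim_sum                     # Theorem 3.23

print(dim_V, dim_W, dim_sum, dim_int)                 # expect 3, 2, 4, 1
```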

Problem 3.22 Let V and W be the subspaces of the vector space P3(ℝ) spanned by

3 x + 4x 2 + x3
5 + 5x 2 + x3
5 5x + lOx 2 + 3x 3
and

I
WI (x) = + 3x 2 + 2x 3
9 - 3x
W2(X) = 5 - + 2x 2 + x 3 x
W3(X) = 6 + 4x 2 + x 3
respectively. Find the dimensions and bases for V + W and V ∩ W.

Problem 3.23 Let

V = {(x, y, z, u) ∈ ℝ⁴ : y + z + u = 0},
W = {(x, y, z, u) ∈ ℝ⁴ : x + y = 0, z = 2u}

be two subspaces of ℝ⁴. Find bases for V, W, V + W, and V ∩ W.

3.8 Invertibility

In Chapter 1, we have seen that a non-square matrix A may have only one-sided
(right or left) inverses. In this section , it will be shown that the existence of a one-
sided inverse (right or left) of A implies the existence or the uniqueness of the solutions
of a system Ax = b.

Theorem 3.24 (Existence) Let A be an m xn matrix. Then the following statements


are equivalent.
(1) For each b ∈ ℝᵐ, Ax = b has at least one solution x in ℝⁿ.
(2) The column vectors of A span ℝᵐ, i.e., C(A) = ℝᵐ.
(3) rank A = m (hence m ≤ n).
(4) A has a right inverse (i.e., B such that AB = Im).

Proof: (1) ⇔ (2): In general, C(A) ⊆ ℝᵐ. For any b ∈ ℝᵐ, there is a solution
x ∈ ℝⁿ of Ax = b if and only if b is a linear combination of the column vectors of
A, i.e., b ∈ C(A). Thus ℝᵐ = C(A).
(2) ⇔ (3): C(A) = ℝᵐ if and only if dim C(A) = m ≤ n (see Problem 3.10). But
dim C(A) = rank A = dim R(A) ≤ min{m, n}.
(1) ⇒ (4): Let e1, e2, ..., em be the standard basis for ℝᵐ. Then for each ei, one
can find an xi ∈ ℝⁿ such that Axi = ei by the hypothesis (1). If B is the n × m matrix
whose columns are these xi's, i.e., B = [x1 x2 ··· xm], then, by matrix multiplication,
AB = [Ax1 Ax2 ··· Axm] = [e1 e2 ··· em] = Im.
(4) ⇒ (1): If B is a right inverse of A, then for any b ∈ ℝᵐ, x = Bb is a solution
of Ax = b, since A(Bb) = (AB)b = Im b = b. □

Condition (2) means that A has m linearly independent column vectors, and con-
dition (3) implies that there exist m linearly independent row vectors of A, since rank
A = m = dim R(A).
Note that if C(A) ≠ ℝᵐ, then Ax = b has no solution for b ∉ C(A).

Theorem 3.25 (Uniqueness) Let A be an m x n matrix. Then thefollowing statements


are equivalent.
(1) For each b E jRm, Ax = b has at most one solution x in jRn.
(2) The column vectors of A are linearly independent.
(3) dim C(A) = rank A = n (hence n ≤ m).
(4) R(A) = ℝⁿ.
(5) N(A) = {0}.
(6) A has a left inverse (i.e., C such that C A = In).

Proof: (1) => (2): Note that the column vectors of A are linearly independent if
and only if the homogeneous equation Ax = 0 has only the trivial solution . However,
3.8. Invertibility 107

Ax = 0 has always the trivial solution x = 0, and the statement (1) implies that it is
the only one.
(2) ⇔ (3): Clear, because all the column vectors are linearly independent if and
only if they form a basis for C(A), or dim C(A) = n ≤ m.
(3) ⇔ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and only if
R(A) = ℝⁿ (see Problem 3.10).
(4) ⇔ (5): Clear, since dim R(A) + dim N(A) = n.
(2) ⇒ (6): Suppose that the columns of A are linearly independent so that rank
A = n. Extend these column vectors of A to a basis for ℝᵐ by adding m − n additional
independent vectors to them. Construct an m × m matrix S with those basis vectors
in its columns. Then the matrix S has rank m, and hence it is invertible. Let C be the
n × m matrix obtained from S⁻¹ by throwing away the last m − n rows. Since the
first n columns of S constitute the matrix A, we have CA = In.

(6) ⇒ (1): Let C be a left inverse of A. If Ax = b has no solution, then we are
done. Suppose that Ax = b has two solutions, say x1 and x2. Then

x1 = CAx1 = Cb = CAx2 = x2.


Hence, the system can have at most one solution. o

Remark: (1) We have proved that an m × n matrix A has a right inverse if and only
if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if
m ≠ n, A cannot have both left and right inverses.
(2) For a practical way of finding a right or a left inverse of an m × n matrix A,
we will show later (see Remark (1) below Theorem 5.26) that if rank A = m, then
(AAᵀ)⁻¹ exists and Aᵀ(AAᵀ)⁻¹ is a right inverse of A, and if rank A = n, then
(AᵀA)⁻¹ exists and (AᵀA)⁻¹Aᵀ is a left inverse of A (see Theorem 5.26).
(3) Note that if m = n so that A is a square matrix, then A has a right inverse (and
a left inverse) if and only if rank A = m = n. Moreover, in this case the inverses are
the same (see Theorem 1.9). Therefore, a square matrix A has rank n if and only if
A is invertible. This means that for a square matrix "Existence = Uniqueness," and
the ten statements listed in Theorems 3.24-3.25 are all equivalent. In particular, for
the invertibility of a square matrix it is enough to show the existence of a one-sided
inverse.
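The formulas quoted in Remark (2) are easy to try numerically. The following sketch assumes NumPy; the matrix A is an arbitrary full-row-rank example, not one from the text.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 1.]])           # 2 x 3, rank 2 = m, so a right inverse exists

# Right inverse via the formula quoted above: A^T (A A^T)^{-1}.
R = A.T @ np.linalg.inv(A @ A.T)
print(np.allclose(A @ R, np.eye(2)))   # True

B = A.T                                # 3 x 2, rank 2 = n, so a left inverse exists
L = np.linalg.inv(B.T @ B) @ B.T       # (B^T B)^{-1} B^T
print(np.allclose(L @ B, np.eye(2)))   # True
```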

Problem 3.24 For each of the following matrices, find all vectors b such that the system of
linear equations Ax = b has at least one solution . Also, discuss the uniqueness of the solution.

(I) A [J 3 -2 5
4 1 3
7 - 3 6 13
(2) A [ ;
-6
-;
1
l
(3) A [l 2 -3 -2
3 -2
8 -7 -2
0

1 -9 -10
-3]
-4
, (4) A = [ 1 1
5
2 -2
;l
Summarizing all the results obtained so far about solvability of a system, one can
obtain several characterizations of the invertibility of a square matrix. The following
theorem is a collection of the results proved in Theorems 1.9,3.24, and 3.25.

Theorem 3.26 For a square matrix A of order n, the following statements are equiv-
alent.
(1) A is invertible.
(2) det A ≠ 0.
(3) A is row equivalent to In.
(4) A is a product of elementary matrices.
(5) Elimination can be completed: PA = LDU, with all di ≠ 0.
(6) Ax = b has a solution for every b ∈ ℝⁿ.
(7) Ax = 0 has only the trivial solution, i.e., N(A) = {0}.
(8) The columns of A are linearly independent.
(9) The columns of A span ℝⁿ, i.e., C(A) = ℝⁿ.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span ℝⁿ, i.e., R(A) = ℝⁿ.
(14) A has a right inverse.
(15)* The linear transformation A : ℝⁿ → ℝⁿ via A(x) = Ax is injective.
(16)* The linear transformation A : ℝⁿ → ℝⁿ is surjective.
(17)* Zero is not an eigenvalue of A.

Proof: Exercise: locate where each of these claims has been proved, and prove any
that has not yet been covered. The statements with asterisks will be explained in the
following places: (15) and (16) in the Remark on page 133 and (17) in Theorem 6.1. □

3.9 Applications
3.9.1 Interpolation

In many scientific experiments , a scientist wants to find the precise functional rela-
tionship between input data and output data. That is, in his experiment, he puts various

input values into his experimental device and obtains output values corresponding to
those input values. After his experiment, what he has is a table of inputs and out-
puts. The precise functional relationship might be very complicated, and sometimes
it might be very hard or almost impossible to find the precise function. In this case,
one thing he can do is to find a polynomial whose graph passes through each of the
data points and comes very close to the function he wanted to find. That is, he is
looking for a polynomial that approximates the precise function. Such a polynomial
is called an interpolating polynomial. This problem is closely related to systems of
linear equations.
Let us begin with a set of given data: Suppose that for n + 1 distinct experimental
input values x0, x1, ..., xn, we obtained n + 1 output values y0 = f(x0), y1 =
f(x1), ..., yn = f(xn). The output values are supposed to be related to the inputs by
a certain (unknown) function f. We wish to construct a polynomial p(x) of degree less
than or equal to n which interpolates f(x) at x0, x1, ..., xn: i.e., p(xi) = yi = f(xi)
for i = 0, 1, ..., n.
Note that if there is such a polynomial, it must be unique. Indeed, if q(x) is another
such polynomial, then h(x) = p(x) − q(x) is also a polynomial of degree less than
or equal to n vanishing at the n + 1 distinct points x0, x1, ..., xn. Hence h(x) must be
the identically zero polynomial, so that p(x) = q(x) for all x ∈ ℝ.
To find such a polynomial p(x), let

p(x) = a0 + a1x + ··· + anxⁿ

with n + 1 unknowns ai. Then

p(xi) = a0 + a1xi + ··· + anxiⁿ = yi = f(xi)

for i = 0, 1, ..., n. In matrix notation,

[ 1 x0 ··· x0ⁿ ] [ a0 ]   [ y0 ]
[ 1 x1 ··· x1ⁿ ] [ a1 ] = [ y1 ]
[ ⋮  ⋮       ⋮ ] [ ⋮  ]   [ ⋮  ]
[ 1 xn ··· xnⁿ ] [ an ]   [ yn ].

The coefficient matrix A is a square matrix of order n + 1, known as Vandermonde's
matrix (see Example 2.10), whose determinant is

det A = ∏_{0≤i<j≤n} (xj − xi).

Since the xi's are all distinct, det A ≠ 0. Hence, A is nonsingular, and Ax = b has a
unique solution, which determines the unique polynomial p(x) of degree ≤ n passing
through the given n + 1 points (x0, y0), (x1, y1), ..., (xn, yn) in the plane ℝ².

Example 3.24 (Finding an interpolating polynomial) Given four points

(0, 3), (1, 0), (−1, 2), (3, 6)

in the plane ℝ², let p(x) = a0 + a1x + a2x² + a3x³ be the polynomial passing through
the given four points. Then, we have a system of equations

{ a0 = 3
  a0 + a1 + a2 + a3 = 0
  a0 − a1 + a2 − a3 = 2
  a0 + 3a1 + 9a2 + 27a3 = 6.

Solving this system, one can get a0 = 3, a1 = −2, a2 = −2, a3 = 1, and the unique
polynomial is p(x) = 3 − 2x − 2x² + x³. □
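The Vandermonde system of this section can be solved directly with a numerical library. The sketch below assumes NumPy and recovers the coefficients of Example 3.24.

```python
import numpy as np

# Data points of Example 3.24.
x = np.array([0., 1., -1., 3.])
y = np.array([3., 0., 2., 6.])

# Vandermonde system A a = y with A[i, j] = x_i**j for j = 0, 1, 2, 3.
A = np.vander(x, N=4, increasing=True)
a = np.linalg.solve(A, y)
print(a)   # expect approximately [3, -2, -2, 1], i.e. p(x) = 3 - 2x - 2x^2 + x^3
```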

Problem 3.25 Let f(x) = sin x. Then at x = 0, π/4, π/2, 3π/4, π, the values of f are y =
0, √2/2, 1, √2/2, 0. Find the polynomial p(x) of degree ≤ 4 that passes through these five
points. (One may need to use a computer to avoid messy computation.)

Problem 3.26 Find a polynomial p(x) =a + bx + cx 2 + d x 3 that satisfies p(O) = 1,


p'(O) = 2, p(l) = 4, p'(l) = 4.

Problem 3.27 Find the equation ofa circle that passes through the three points (2, -2), (3, 5),
and (-4, 6) in the plane ]R2 .

Remark: Note that the interpolating polynomial p(x) of degree n is uniquely


determined when we have the correct data, i.e., when we are given precisely n + 1
values of y at n + 1 distinct points Xo , XI , . .. , x n .
However, if we are given fewer data, then the polynomial is under-determined:
i.e., if we have m values of y with m < n + 1 at m distinct points XI, X2, ... , X m,
then there are as many interpolating polynomials as the null space of A since in this
case A is an m x (n + 1) matrix with m < n + 1. (See the Existence Theorem 3.24.)
On the other hand , if we are given more than n + 1 data, then the polynomial
is over-determined: i.e., if we have m values of y with m > n + 1 at m distinct
points XI, X2, • •. , X m, then there may not exist an interpolating polynomial since
the system could be inconsistent. (See the Uniqueness Theorem 3.25.) In this case,
the best one can do is to find a polynomial of degree n to which the data is closest,
called the least square solution. It will be reviewed again in Sections 5.9-5.9.2.

3.9.2 The Wronskian

Let y1, y2, ..., yn be n vectors in an m-dimensional vector space V. To check the
independence of the yi's, consider a linear dependence:

c1y1 + c2y2 + ··· + cnyn = 0.

Let α = {x1, x2, ..., xm} be a basis for V. By expressing each yj as a linear combi-
nation of the basis vectors xi, i.e., yj = Σ_{i=1}^m aij xi, the linear dependence of the yi's
can be written as a linear combination of the basis vectors xi:

0 = c1y1 + c2y2 + ··· + cnyn
  = (a11c1 + a12c2 + ··· + a1ncn)x1
  + (a21c1 + a22c2 + ··· + a2ncn)x2
  + ···
  + (am1c1 + am2c2 + ··· + amncn)xm,

so that all of the coefficients (which are linear combinations of the ci's) must be zero.
This gives a homogeneous system of linear equations in the ci's, say Ac = 0 with the
m × n matrix A = [aij], as in the proof of Lemma 3.9.

Recall that the vectors Yi 's are linearly independent if and only if the system Ac = 0
has only the trivial solution. Hence, the linear independence of a set of vectors in a
finite dimensional vector space can be tested by solving a homogeneous system of
linear equations .

If V is not finite dimensional, this test for the linear independence of a set of vectors
cannot be applied. In this section, we introduce a test for the linear independence of
a set of functions . For our purpose, let V be the vector space of all functions on lR
which are differentiable infinitely many times . Then one can easily see that V is an
infinite dimensional vector space .

Let f1, f2, ..., fn be n functions in V. The n functions are linearly independent
in V if
c1f1 + c2f2 + ··· + cnfn = 0
implies that all ci = 0. Note that the zero function 0 takes the value zero at all the
points in the domain. Thus they are linearly independent if

c1f1(x) + c2f2(x) + ··· + cnfn(x) = 0

for all x ∈ ℝ implies that all ci = 0. By differentiating n − 1 times, one can obtain
n equations:

c1f1^{(k)}(x) + c2f2^{(k)}(x) + ··· + cnfn^{(k)}(x) = 0,   k = 0, 1, ..., n − 1,

for all x ∈ ℝ. Or, in matrix form:

[ f1(x)         f2(x)         ···  fn(x)         ] [ c1 ]   [ 0 ]
[ f1′(x)        f2′(x)        ···  fn′(x)        ] [ c2 ] = [ ⋮ ]
[   ⋮              ⋮                  ⋮          ] [ ⋮  ]   [   ]
[ f1^{(n−1)}(x)  f2^{(n−1)}(x) ···  fn^{(n−1)}(x) ] [ cn ]   [ 0 ].

The determinant of the coefficient matrix is called the Wronskian for {f1(x), f2(x),
..., fn(x)} and denoted by W(x). Therefore, if there is a point x0 ∈ ℝ such that
W(x0) ≠ 0, then the coefficient matrix is nonsingular at x = x0, and so all ci = 0.
Therefore,

if the Wronskian W(x) ≠ 0 for at least one x ∈ ℝ,
then f1, f2, ..., fn are linearly independent.

However, the Wronskian W(x) = 0 for all x does not imply linear dependence of the
given functions fi. In fact, W(x) = 0 means that the functions are linearly dependent
at each point x ∈ ℝ, but the constants ci giving a nontrivial linear dependence may
vary as x varies in the domain. (See Example 3.25 (2).)

Example 3.25 (Test the linear independence of functions by the Wronskian)

(1) For the sets of functions F1 = {x, cos x, sin x} and F2 = {x, eˣ, e⁻ˣ}, the
Wronskians are

W1(x) = det [ x   cos x    sin x
              1  −sin x    cos x
              0  −cos x   −sin x ] = x

and

W2(x) = det [ x   eˣ    e⁻ˣ
              1   eˣ   −e⁻ˣ
              0   eˣ    e⁻ˣ ] = 2x.

Since Wi(x) ≠ 0 for x ≠ 0, both F1 and F2 are linearly independent.
(2) For the set of functions {x|x|, x²} on ℝ, the Wronskian is

W(x) = det [ x|x|   x²
             2|x|   2x ] = 0

for all x. These two functions are linearly dependent on each of (−∞, 0] and [0, ∞),
since x|x| = −x² on (−∞, 0] and x|x| = x² on [0, ∞). But they are clearly linearly
independent functions on ℝ. □
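The Wronskian test is convenient to automate with a computer algebra system. The sketch below assumes SymPy; the helper function wronskian is our own, built directly from the matrix of derivatives described above, and is not part of the text.

```python
import sympy as sp

x = sp.symbols('x')

def wronskian(funcs, var):
    """Determinant of the matrix whose k-th row holds the k-th derivatives of the functions."""
    n = len(funcs)
    M = sp.Matrix(n, n, lambda i, j: sp.diff(funcs[j], var, i))
    return sp.simplify(M.det())

print(wronskian([x, sp.cos(x), sp.sin(x)], x))     # expect x   (Example 3.25 (1))
print(wronskian([x, sp.exp(x), sp.exp(-x)], x))    # expect 2*x (Example 3.25 (1))
```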

Problem 3.28 Show that 1, x, x 2 , .. . , x n are linearly independent in the vectorspace C(lR)
of continuous functions.

3.10 Exercises
3.1. Let V be the set of all pairs (x, y) of real numbers. Define

(x, y) + (x1, y1) = (x + x1, y + y1),
k(x, y) = (kx, y).

Is V a vector space with these operations?

3.2. For x, y ∈ ℝⁿ and k ∈ ℝ, define two operations as

x ⊕ y = x − y,    k · x = −kx.

The operations on the right-hand sides are the usual ones. Which of the rules in the
definition of a vector space are satisfied for (ℝⁿ, ⊕, ·)?
3.3. Determine whether the given set is a vector space with the usual addition and scalar
multiplication of functions .
(1) The set of all continuous functions f defined on the interval [−1, 1] such that f(0) = 1.
(2) The set of all continuous functions f defined on the real line ℝ such that lim_{x→∞} f(x) = 0.
(3) The set of all twice differentiable functions f defined on ℝ such that f″(x) + f(x) = 0.
3.4. Let C²[−1, 1] be the vector space of all functions with continuous second derivatives on
the domain [−1, 1]. Which of the following subsets is a subspace of C²[−1, 1]?
(1) W = {f(x) ∈ C²[−1, 1] : f″(x) + f(x) = 0, −1 ≤ x ≤ 1}.
(2) W = {f(x) ∈ C²[−1, 1] : f″(x) + f(x) = x², −1 ≤ x ≤ 1}.
3.5. Which of the following subsets of C[−1, 1] is a subspace of the vector space C[−1, 1]
of continuous functions on [−1, 1]?
(1) W = {f(x) ∈ C[−1, 1] : f(−1) = −f(1)}.
(2) W = {f(x) ∈ C[−1, 1] : f(x) ≥ 0 for all x in [−1, 1]}.
(3) W = {f(x) ∈ C[−1, 1] : f(−1) = −2 and f(1) = 2}.
(4) W = {f(x) ∈ C[−1, 1] : f(1/2) = 0}.
3.6 . Show that the set of all matrices of the form AB - BA cannot span the vector space
Mnxn(lR).
3.7. Does the vector (3, -1, 0, -1) belong to the subspace oflR.4 spanned by the vectors
(2, -1 , 3, 2) , (-1, 1, 1, -3) and (1, 1, 9, -5) ?
3.8. Expres s the given function as a linear combination of functions in the given set Q.
(1) p(x) = -1 - 3x + 3x 2 and Q = {PI (x), P2(X), P3( X)} , where
PI(X)=1+2x+x 2, P2(X)=2+5x, P3(X)=3+8x-2x 2 .
(2) p(x) -2 - 4x + x 2 and Q {PI (x) , P2(x), P3(X), P4(X)}, where
= =
PI(X) = I+2x 2+x3, P2(X) = I+x+2x 3, P3(X) = - I - 3 x - 4x3,
P4(X) I +2x _x 2 +x 3.
=
3.9. Is {cos2 x , sin 2 x , 1, eX} linearly independent in the vector space C(JR)?
3.10. In the n-space JRn , determine whether or not the set

is linearly dependent.
3.11. Show that the given sets of functions are linearly independent in the vector space
C[−π, π].
(1) {1, x, x², x³, x⁴}
(2) {1, eˣ, e²ˣ, e³ˣ}
(3) {1, sin x, cos x, ..., sin kx, cos kx}
3.12. Are the vectors

VI = (I, I, 2, 4), V2 = (2, -I, -5,2),


V3=(1, -1, -4, 0), V4=(2, 1, 1,6)
linearly independent in the 4-space ]R4?
3.13. In the 3-space ]R3 , let W be the set of all vectors (XI , X2, X3) that satisfy the equation
XI - X2 - X3 = O. Prove that W is a subspace of]R3. Find a basis for the subspace W.
3.14. Let W be the subspace of C[ -1£ , 1£) consisting of functions of the form f(x) = a sinx +
b cos x . Determine the dimension of W.
3.15. Let V denote the set of all infinite sequences of real numbers:
V = {x : x = (x1, x2, x3, ...), xi ∈ ℝ}.
If x = {xi} and y = {yi} are in V, then x + y is the sequence {xi + yi}. If c is a real
number, then cx is the sequence {cxi}.
(1) Prove that V is a vector space.
(2) Prove that V is not finite dimensional.
3.16. For two matrices A and B for which AB can be defined, prove the following statements:
(1) If both A and B have linearly independent column vectors, then the column vectors
of AB are also linearly independent.
(2) If both A and B have linearly independent row vectors, then the row vectors of AB
are also linearly independent.
(3) If the column vectors of B are linearly dependent, then the column vectors of AB are
also linearly dependent.
(4) If the row vectors of A are linearly dependent, then the row vectors of AB are also
linearly dependent.
3.17. Let U = {(x, y, z) : 2x + 3y + z = O} and V = {(x, y, z) : x + 2y - z = O} be
subspaces of]R3.
(1) Find a basis for U n V.
(2) Determine the dimension of U + V.
(3) Describe U , V, un V and U + V geometrically.
3.18. How many 5 x 5 permutation matrices are there? Are they linearly independent? Do they
span the vector space MSxS(]R) ?
3.19. Find bases for the row space, the column space, and the null space for each of the following
matrices .

(1) A [i IS]2
4 -3 0 , (2) B [! 2
1 -2
1
-52 ] ,

n
2 -I 1 5 0 0

[i
1 -1

1
1 -1
3
-23 '1 ]
(3) C 6 1 -1 8 3 .
(4) D [
9 0 -2 2 1
5 -5 5 10
n

3.20 Hnd the rank of A as a function of x A U:


3.21 . Find the rank and the largest invertible submatrix of each of the following matrices .

(1)
ooooJ
! i1. (2) [ ;
1114
;], (3) [i i1.
IOOOJ
3.22. For any nonzero column vectors u and v, show that the matrix A = uv T has rank 1.
Conversely, every matrix of rank 1 can be written as uv T for some vectors u and v.
3.23. Determine whether the following statements are true or false, and justify your answers.
(1) The set of all n × n matrices A such that Aᵀ = A⁻¹ is a subspace of the vector space
M_{n×n}(ℝ).
(2) If α and β are linearly independent subsets of a vector space V, so is their union
α ∪ β.
(3) If U and W are subspaces of a vector space V with bases α and β respectively, then
the intersection α ∩ β is a basis for U ∩ W.
(4) Let U be the row-echelon form of a square matrix A. If the first r columns of U are
linearly independent, so are the first r columns of A.
(5) Any two row-equivalent matrices have the same column space.
(6) Let A be an m × n matrix with rank m. Then the column vectors of A span ℝᵐ.
(7) Let A be an m × n matrix with rank n. Then Ax = b has at most one solution.
(8) If U is a subspace of V and x, y are vectors in V such that x + y is contained in U,
then x ∈ U and y ∈ U.
(9) Let U and V be vector spaces. Then U is a subspace of V if and only if dim U ≤ dim V.
(10) For any m × n matrix A, dim C(Aᵀ) + dim N(Aᵀ) = m.
4
Linear Transformations

4.1 Basic properties of linear transformations


As shown in Chapter 3, there are many different vector spaces even with the same
dimension . The question now is how one can determine whether or not two given
vector spaces have the 'same' structure as vector spaces , or can be identified as the
same vector space. To answer the question , one has to compare them first as sets, and
then see whether their arithmetic rules are the same or not. A usual way of comparing
two sets is to define afunction between them. When a function f is given between the
underlying sets of vector spaces , one can compare the arithmetic rules of the vector
spaces by examining whether the function f preserves two algebraic operations: the
vector addition and the scalar multiplic ation, that is, f(x + y) = f(x) + f(y) and
f(kx) = kf(x) for any vectors x, y and any scalar k. In this chapter, we discuss this
kind of functions between vector spaces .

Definition 4.1 Let V and W be vector spaces. A function T : V -+ W is called a


linear transformation from V to W if for all x, y E V and scalar k the following
conditions hold:
(1) T(x + y) = T(x) + T(y),
(2) T(kx) = kT(x) .

We often call T simply linear. It is easy to see that the two conditions for a linear
transformation can be combined into a single requirement

T(x + ky) = T(x) + kT(y).

Geometrically, the linearity is just the requirement for a straight line to be transformed
to a straight line, since x + ky represents a straight line through x in the direction y
in V , and its image T(x) + kT(y) also represents a straight line through T(x) in the
direction of T(y) in W .


Example 4.1 (Lin ear or not ) Consider the following functions:


(1) f : jR -+ jR defined by f(x ) = 2x ;
(2) g : jR -+ jR defined by g (x) = x 2 - x;
(3) h : ℝ² → ℝ² defined by h(x, y) = (x − y, 2x);
(4) k : ℝ² → ℝ² defined by k(x, y) = (xy, x² + 1).
One can easily see that g and k are not linear, while f and h are linear. Moreover, on
the 1-space ℝ, no polynomial of degree greater than one defines a linear function. □

Example 4.2 (A matrix A as a linear transformation)


(1) For an m x n matrix A, the transformation T : jRn -+ jRm defined by the
matrix product
T(x) = Ax
is a linear transformation by the distributive law A(x + ky) = Ax + kAy for any
x, y E jRn and for any scalar k E IR. Therefore, a matrix A, identified with the
transformation T, may be considered as a linear transformation from jRn to jRm.
(2) For a vector space V, the identity transformation id : V -+ V is defined
by id(x) = x for all x E V. If W is another vector space, the zero transformation
To : V -+ W is defined by To(x) = 0 (the zero vector) for all x E V. Clearly, both
transformations are linear. 0

The following theorem is a direct consequence of the definition, and the proof is
left for an exercise.

Theorem 4.1 Let T : V -+ W be a linear transformation. Then


(1) T(O) = O.
(2) For any x1, x2, ..., xn ∈ V and scalars k1, k2, ..., kn,
T(k1x1 + k2x2 + ··· + knxn) = k1T(x1) + k2T(x2) + ··· + knT(xn).

Nontrivial important examples of linear transformations are the rotations , reflec-


tions, and projections in a geometry.

Example 4.3 (The rotations, reflections and projections in a geometry)


(1) Let θ denote the angle between the x-axis and a fixed vector in ℝ². Then the
matrix
R_θ = [ cos θ  −sin θ
        sin θ   cos θ ]
defines a linear transformation on ℝ² that rotates any vector in ℝ² through the angle
θ about the origin. It is called a rotation by the angle θ.
(2) The projection on the x-axis is the linear transformation P : ℝ² → ℝ²
defined by, for x = (x, y) ∈ ℝ²,

P(x) = P(x, y) = (x, 0).

Figure 4.1. Three linear transformations on ℝ²

(3) The linear transformation T : ℝ² → ℝ² defined by, for x = (x, y),

T(x) = T(x, y) = (x, −y),

is the reflection about the x-axis. □


Problem 4.1 Find the matrix of the reflection about the line y = x in the plane ]R2 .
Example 4.4 (Differentiations and integrations in calculus) In calculus, it is well
known that the two transformations defined by differentiation and integration,

D(f)(x) = f′(x),    I(f)(x) = ∫₀ˣ f(t) dt,

satisfy linearity, and so they are linear transformations . Many problems related with
differential and integral equations may be reformulated in terms of linear transforma-
tions . 0

Definition 4.2 Let V and W be two vector spaces, and let T : V → W be a linear
transformation from V into W.
(1) Ker(T) = {v ∈ V : T(v) = 0} ⊆ V is called the kernel of T.
(2) Im(T) = {T(v) ∈ W : v ∈ V} = T(V) ⊆ W is called the image of T.

Example 4.5 Let V and W be vector spaces and let id : V -+ V and To : V -+ W


be the identity and the zero transformations, respectively. Then it is easy to see that
Ker(id) = {0}, Im(id) = V, Ker(T0) = V and Im(T0) = {0}. □

Theorem 4.2 Let T : V -+ W be a linear transformation from a vector space V to


a vector space W. Then the kernel Ker(T) and the image Im(T) are subspaces of V
and W, respectively.

Proof: Since T(O) = 0, each of Ker(T) and Im (T) is nonempty having O.


(1) For any x, y E Ker(T) and for any scalar k ,

T(x + ky) = T(x) + kT(y) = 0 + kO = O.


Hence x + ky E Ker(T), so that Ker(T) is a subspace of V .
(2) If v, W E Im(T), then there exist x and y in V such that T(x) = v and
T(y) = w. Thus, for any scalar k,

v + kw = T(x) + kT(y) = T(x + ky) .

Thus v + kw E Im(T), so that Im(T) is a subspace of W. o


Example 4.6 (Ker(A) = N(A) and Im(A) = C(A) for any matrix A) Let A : ℝⁿ →
ℝᵐ be the linear transformation defined by an m × n matrix A as in Example 4.2(1).
The kernel Ker(A) of A consists of all solutions of the homogeneous system Ax = 0.
Hence, the kernel Ker(A) of A is nothing but the null space N(A) of the matrix A,
and the image Im(A) of A is just the column space C(A) = Im(A) ⊆ ℝᵐ of the
matrix A. Recall that Ax is a linear combination of the column vectors of A. □
Example 4.7 (The trace is linear) The trace is the function tr : M_{n×n}(ℝ) → ℝ defined
by the sum of the diagonal entries,
tr(A) = a11 + a22 + ··· + ann = Σ_{i=1}^n aii
for A = [aij] ∈ M_{n×n}(ℝ), and tr(A) is called the trace of the matrix A. It is easy to
show that
tr(A + B) = tr(A) + tr(B) and tr(kA) = k tr(A)
for any two matrices A and B in M_{n×n}(ℝ), which means that 'tr' is a linear transfor-
mation from M_{n×n}(ℝ) to the 1-space ℝ. In addition, one can easily show that the set
of all n × n matrices with trace 0 is a subspace of the vector space M_{n×n}(ℝ). □

Problem 4.2 Let W = {A ∈ M_{n×n}(ℝ) : tr(A) = 0}. Show that W is a subspace, and find a
basis for W.

Problem 4.3 Show that, for any matrices A and B in M_{n×n}(ℝ), tr(AB) = tr(BA).
One of the most important properties of linear transformations is that they are
completely determined by their values on a basis.

Theorem 4.3 Let V and W be vector spaces. Let {VI , V2, ... , vnl be a bas is for V
and let WI, W2, . . . , Wn be any vectors (possibly repeated) in W. Then there exists a
unique linear transformation T : V --+ W such that T (Vi) =
Wi for i = 1, 2, . . . , n.

Proof: Let x ∈ V. Then it has a unique expression x = Σ_{i=1}^n aivi for some scalars
a1, a2, ..., an. Define

T : V → W by T(x) = Σ_{i=1}^n aiwi.

In particular, T(vi) = wi for i = 1, 2, ..., n.
Linearity: For x = Σ_{i=1}^n aivi, y = Σ_{i=1}^n bivi ∈ V and k a scalar, we have
x + ky = Σ_{i=1}^n (ai + kbi)vi. Then

T(x + ky) = Σ_{i=1}^n (ai + kbi)wi = Σ_{i=1}^n aiwi + k Σ_{i=1}^n biwi = T(x) + kT(y).

Uniqueness: Suppose that S : V → W is linear and S(vi) = wi for i =
1, 2, ..., n. Then for any x ∈ V with x = Σ_{i=1}^n aivi, we have

S(x) = Σ_{i=1}^n aiS(vi) = Σ_{i=1}^n aiwi = T(x).

Hence, we have S = T. □
The uniqueness in Theorem 4.3 may be rephrased as the following corollary.

Corollary 4.4 Let V and W be vector spaces and let {v1, v2, ..., vn} be a basis for V.
If S, T : V → W are linear transformations and S(vi) = T(vi) for i = 1, 2, ..., n,
then S = T, i.e., S(x) = T(x) for all x ∈ V.

Example 4.8 (Linear extension of a transformation defined on a basis) Let w1 =
(1, 0), w2 = (2, −1), w3 = (4, 3) be three vectors in ℝ².
(1) Let α = {e1, e2, e3} be the standard basis for the 3-space ℝ³, and let T : ℝ³ →
ℝ² be the linear transformation defined by T(ei) = wi for i = 1, 2, 3.
Find a formula for T(x1, x2, x3), and then use it to compute T(2, −3, 5).
(2) Let β = {v1, v2, v3} be another basis for ℝ³, where v1 = (1, 1, 1), v2 =
(1, 1, 0), v3 = (1, 0, 0), and let T : ℝ³ → ℝ² be the linear transformation defined
by T(vi) = wi for i = 1, 2, 3.
Find a formula for T(x1, x2, x3), and then use it to compute T(2, −3, 5).

Solution: (1) For x = (x1, x2, x3) = x1e1 + x2e2 + x3e3,

T(x) = Σ_{i=1}^3 xiT(ei) = Σ_{i=1}^3 xiwi
     = x1(1, 0) + x2(2, −1) + x3(4, 3)
     = (x1 + 2x2 + 4x3, −x2 + 3x3).

Thus, T(2, −3, 5) = (16, 18). In matrix notation, this can be written as

[ 1  2  4 ] [ x1 ]   [ x1 + 2x2 + 4x3 ]
[ 0 −1  3 ] [ x2 ] = [   −x2 + 3x3    ].
            [ x3 ]

(2) In this case, we need to express x = (x1, x2, x3) as a linear combination of
v1, v2, v3, i.e.,

(x1, x2, x3) = Σ_{i=1}^3 kivi = k1(1, 1, 1) + k2(1, 1, 0) + k3(1, 0, 0).

By equating corresponding components we obtain a system of equations

{ k1 + k2 + k3 = x1
  k1 + k2      = x2
  k1           = x3.

The solution is k1 = x3, k2 = x2 − x3, k3 = x1 − x2. Therefore,

(x1, x2, x3) = x3v1 + (x2 − x3)v2 + (x1 − x2)v3, and

T(x1, x2, x3) = x3T(v1) + (x2 − x3)T(v2) + (x1 − x2)T(v3)
             = x3(1, 0) + (x2 − x3)(2, −1) + (x1 − x2)(4, 3)
             = (4x1 − 2x2 − x3, 3x1 − 4x2 + x3).

From this formula, we obtain T(2, -3, 5) = (9, 23) . o
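The computation in part (2), namely solving for the coordinates of x in the basis β and then using linearity, can be packaged as a small routine. A sketch assuming NumPy, with the data of Example 4.8:

```python
import numpy as np

V = np.array([[1., 1., 1.],    # columns are v1, v2, v3
              [1., 1., 0.],
              [1., 0., 0.]])
W = np.array([[1., 2., 4.],    # columns are w1 = T(v1), w2 = T(v2), w3 = T(v3)
              [0., -1., 3.]])

def T(x):
    # Coordinates k of x with respect to {v1, v2, v3}: solve V k = x.
    k = np.linalg.solve(V, np.asarray(x, dtype=float))
    # Linearity: T(x) = k1*T(v1) + k2*T(v2) + k3*T(v3).
    return W @ k

print(T([2., -3., 5.]))        # expect [9, 23], as computed in the example
```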


Problem 4.4 Is there a linear transformation T : ]R3 -l- ]R2 such that T(3 , I, 0) = (1, I)and
T (-6, -2. 0) = (2, I) ? If yes, can you find an expression of T (x) for x = (xj , x2. X3) in
]R3?

Problem 4.5 Let V and W be vector spaces and T : V W be linear.Let {WI, w2•. . .• Wk}
be a linearly independentsubset of the image Im(T) £ W. Suppose that ex = {V]. v2, . . .• vkl
is chosen so that T(Vi) = Wi for i = I. 2, .. . , k. Prove that ex is linearly independent.

4.2 Invertible linear transformations


A function f from a set X to a set Y is said to be invertible if there is a function g
from Y to X such that their compositions satisfy g ∘ f = id and f ∘ g = id. Such a
function g is called the inverse function of f and denoted by g = f⁻¹. One can notice
that if there exists an invertible function from a set X into another set Y, then it gives
a one-to-one correspondence between these two sets so that they can be identified
as sets. A useful criterion for a function between two given sets to be invertible is
that it is one-to-one and onto. Recall that a function f : X → Y is one-to-one (or
injective) if f(u) = f(v) in Y implies u = v in X, and is onto (or surjective) if for
each element y in Y there exists an element x in X such that f(x) = y. A function is
said to be bijective if it is both one-to-one and onto, that is, if for each element y in
Y there is a unique element x in X such that f(x) = y.

Lemma 4.5 A function f : X → Y is invertible if and only if it is bijective (or
one-to-one and onto).

Proof: Suppose f : X → Y is invertible, and let g : Y → X be its inverse. If
f(u) = f(v), then u = g(f(u)) = g(f(v)) = v. Thus f is one-to-one. For each
y ∈ Y, let g(y) = x in X. Then f(x) = f(g(y)) = y. Thus it is onto.
Conversely, suppose f is bijective. Then, for each y ∈ Y, there is a unique x ∈ X
such that f(x) = y. Now for each y ∈ Y define g(y) = x. Then one can easily check
that g : Y → X is well defined, and that f ∘ g = id and g ∘ f = id, i.e., g is the
inverse function of f. □

If T : V → W and S : W → Z are linear transformations, then it is easy to
show that their composition (S ∘ T)(v) = S(T(v)) is also a linear transformation. In
particular, if two linear transformations are defined by matrices A : ℝⁿ → ℝᵐ and
B : ℝᵐ → ℝᵏ as in Example 4.2(1), then their composition is nothing but the matrix
product BA, i.e., (B ∘ A)(x) = B(Ax) = (BA)x.
The following lemma shows that if a given function is an invertible linear trans-
formation from a vector space into another, then the linearity is preserved by the
inversion.

Lemma 4.6 Let V and W be vector spaces. If T : V → W is an invertible linear
transformation, then its inverse T⁻¹ : W → V is also linear.

Proof: Let w1, w2 ∈ W, and let k be any scalar. Since T is invertible, there exist
unique vectors v1 and v2 in V such that T(v1) = w1 and T(v2) = w2. Then

T⁻¹(w1 + kw2) = T⁻¹(T(v1) + kT(v2))
              = T⁻¹(T(v1 + kv2))
              = v1 + kv2
              = T⁻¹(w1) + kT⁻¹(w2). □
Definition 4.3 A linear transformation T : V → W from a vector space V to another
W is called an isomorphism if it is invertible (or one-to-one and onto). In this case,
we say that V and W are isomorphic to each other.

Example 4.9 (The vector space Pn(ℝ) is isomorphic to ℝⁿ⁺¹) Consider the vector
space P2(ℝ) = {a + bx + cx² : a, b, c ∈ ℝ} of all polynomials of degree ≤ 2
with real coefficients. To each polynomial a + bx + cx² in the space P2(ℝ), one
can assign the column vector [a b c]ᵀ in ℝ³. Then it is not hard to see that this is
an isomorphism from the vector space P2(ℝ) to the 3-space ℝ³, by which one can
identify the polynomial a + bx + cx² with the column vector [a b c]ᵀ. It means
that these two vector spaces can be considered as the same vector space through the
isomorphism. In this sense, one often says that a vector space can be identified with
another if they are isomorphic to each other. In general, the vector space Pn(ℝ) can
be identified with the (n + 1)-space ℝⁿ⁺¹. □

It is clear from Lemma 4.6 that if T is an isomorphism, then its inverse T⁻¹
is also an isomorphism with (T⁻¹)⁻¹ = T. In particular, if a linear transformation
A : ℝⁿ → ℝⁿ is defined by an invertible n × n matrix A as in Example 4.2(1), then
the inverse matrix A⁻¹ plays the role of the inverse linear transformation, so that it is also an
isomorphism on ℝⁿ. That is, a linear transformation A : ℝⁿ → ℝⁿ defined by an
n × n square matrix A is an isomorphism if and only if A is invertible, that is, rank
A = n.
Problem 4.6 Suppose that S and T are linear transformations whose composition S ∘ T is well
defined. Prove that
(1) if S ∘ T is one-to-one, so is T,
(2) if S ∘ T is onto, so is S,
(3) if S and T are isomorphisms, so is S ∘ T,
(4) if A and B are two n × n matrices of rank n, so is AB.

Problem 4.7 Let T : V → W be a linear transformation. Prove that

(1) T is one-to-one if and only if Ker(T) = {0}.
(2) If V = W, then T is one-to-one if and only if T is onto.

Theorem 4.7 Two vector spaces V and Ware isomorphic if and only if dim V =
dimW.

Proof: Let T : V → W be an isomorphism, and let {v1, v2, ..., vn} be a basis for
V. Then we show that the set {T(v1), T(v2), ..., T(vn)} is a basis for W, so that
dim W = n = dim V.
(1) It is linearly independent: Since T is one-to-one, the equation

c1T(v1) + c2T(v2) + ··· + cnT(vn) = T(c1v1 + c2v2 + ··· + cnvn) = 0

implies that 0 = c1v1 + c2v2 + ··· + cnvn. Since the vi's are linearly independent,
we have ci = 0 for all i = 1, 2, ..., n.
(2) It spans W: Since T is onto, for any y ∈ W there exists an x ∈ V such that
T(x) = y. Write x = Σ_{i=1}^n aivi. Then

y = T(x) = Σ_{i=1}^n aiT(vi),

i.e., y is a linear combination of T(v1), T(v2), ..., T(vn).

Conversely, if dim V = dim W = n, then one can choose bases {v1, v2, ..., vn}
and {w1, w2, ..., wn} for V and W, respectively. By Theorem 4.3, there exist linear
transformations T : V → W and S : W → V such that T(vi) = wi and S(wi) = vi
for i = 1, 2, ..., n. Clearly, (S ∘ T)(vi) = vi and (T ∘ S)(wi) = wi for i = 1, 2, ..., n,
which implies that S ∘ T and T ∘ S are the identity transformations on V and W,
respectively, by the uniqueness in Corollary 4.4. Hence, T and S are isomorphisms,
and consequently V and W are isomorphic. □

Corollary 4.8 Let V and W be vector spaces.


(1) If dim V = n, then V is isomorphic to the n-space jRn.
(2) If dim V = dim W, any bijective function from a basis for V to a basis for W
can be extended to an isomorphism from V to w.
An isomorphism between a vector space V and jRn in Corollary 4.8 depends on the
choices of bases for two spaces as shown in Theorem 4.7. However, an isomorphism
is uniquely determined if we fix the bases in which the order of the vectors is also
fixed.
An ordered basis for a vector space is a basis endowed with a specific order.
For example, in the 3-space jR3 , two bases {eI, e2, e3} with the order eI, e2, e3 and
{e2, eI, e3} with the order e2, ej , e3 are clearly different as ordered bases, but the
same as unordered ones . The basis {el , e2, ... , en} with the order ej , e2, . .. , en is
called the standard ordered basis for IRn. However, we often say simply a basis for
an ordered basis if there is no ambiguity in the context.
Let V be a vector space of dimension n with an ordered basis α = {v1, v2, ..., vn},
and let β = {e1, e2, ..., en} be the standard ordered basis for ℝⁿ. Then the isomor-
phism Φ : V → ℝⁿ defined by Φ(vi) = ei is called the natural isomorphism with
respect to the basis α. By this isomorphism, a vector in V can be identified with a
column vector in ℝⁿ. In fact, for any x = Σ_{i=1}^n aivi ∈ V, the image of x under this
natural isomorphism is written as

Φ(x) = Φ(Σ_{i=1}^n aivi) = Σ_{i=1}^n aiΦ(vi) = (a1, ..., an) = [ a1
                                                                ⋮
                                                                an ] ∈ ℝⁿ,

which is called the coordinate vector of x with respect to the ordered basis α, and it
is denoted by [x]_α. Clearly [vi]_α = ei.

Example 4.10 (1) Recall that, from Example 4.3, the rotation by the angle θ of ℝ²
is given by the matrix
R_θ = [ cos θ  −sin θ
        sin θ   cos θ ].

Clearly, it is invertible and hence is an isomorphism of ℝ². In fact, the inverse R_θ⁻¹ is
another rotation, R_{−θ}.
(2) Let α = {e1, e2} be the standard ordered basis, and let β = {v1, v2}, where
vi = R_θ ei, i = 1, 2. Then β is also a basis for ℝ². The coordinate vectors of the vi
with respect to α are

[v1]_α = [ cos θ ],    [v2]_α = [ −sin θ ],
           [ sin θ ]               [  cos θ ]

while

[v1]_β = [ 1 ],    [v2]_β = [ 0 ].
          [ 0 ]              [ 1 ]

If we choose α′ = {e2, e1} as a different ordered basis for ℝ², then the coordinate
vectors of the vi with respect to α′ are

[v1]_{α′} = [ sin θ ],    [v2]_{α′} = [  cos θ ]. □
             [ cos θ ]                 [ −sin θ ]


Example 4.11 (All reflections are of the form R_θ ∘ T ∘ R_{−θ}) In the plane ℝ², the
reflection about the line y = x can be obtained as the composition of the rotation
by −π/4 of the plane, the reflection about the x-axis, and the rotation by π/4. Actually,
it is a product of the matrices given in (1) and (3) of Example 4.3 with θ = π/4. Note that
the rotation by π/4 is

R_{π/4} = [ cos π/4  −sin π/4 ] = (1/√2) [ 1 −1 ],
          [ sin π/4   cos π/4 ]           [ 1  1 ]

the reflection about the x-axis is [ 1 0 ; 0 −1 ], and R_{−π/4} = R_{π/4}⁻¹. Hence, the matrix
for the reflection about the line y = x is

R_{π/4} [ 1 0 ; 0 −1 ] R_{π/4}⁻¹
   = (1/√2) [ 1 −1 ] [ 1  0 ] (1/√2) [  1 1 ] = [ 0 1 ].
            [ 1  1 ] [ 0 −1 ]        [ −1 1 ]   [ 1 0 ]

In general, the reflection about a line ℓ through the origin in the plane can be expressed as the
composition R_θ ∘ T ∘ R_{−θ}, where T is the reflection about the x-axis and θ is the
angle between the x-axis and the line ℓ (see Figure 4.2). □
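The factorization R_θ ∘ T ∘ R_{−θ} can be verified numerically. A sketch assuming NumPy, reproducing the reflection about y = x computed above:

```python
import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

T = np.diag([1., -1.])          # reflection about the x-axis
theta = np.pi / 4               # the line y = x makes angle pi/4 with the x-axis

reflection = rotation(theta) @ T @ rotation(-theta)
print(np.round(reflection, 10))  # expect [[0, 1], [1, 0]]
```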

Problem 4.8 Find the matrix of the reflection about the line y = √3 x in ℝ².


Problem 4.9 Find the coordinate vector of 5 + 2x + 3x² with respect to the given ordered basis
α for P2(ℝ):
(1) α = {1, x, x²};    (2) α = {1 + x, 1 + x², x + x²}.

4.3 Matrices of linear transformations


We have seen that the product of an m x n matrix A and an n x 1 column matrix x
gives rise to a linear transformation from IR n to IR m . Conversely, one can show that a
linear transformation of a vector space into another can be represented by a matrix via

Figure 4.2. The reflection R_θ ∘ T ∘ R_{−θ}

the natural isomorphism between an n-dimensional vector space V and the n-space
jRn, which will be shown in this section.
Let T : V → W be a linear transformation from an n-dimensional vector space V
to an m-dimensional vector space W, and let α = {v1, ..., vn} and β = {w1, ..., wm}
be any ordered bases for V and W, respectively, which will be fixed throughout this
section. Then by Theorem 4.3 the linear transformation T is completely determined
by its values on the basis α. Write them as

{ T(v1) = a11w1 + a21w2 + ··· + am1wm
  T(v2) = a12w1 + a22w2 + ··· + am2wm
    ⋮
  T(vn) = a1nw1 + a2nw2 + ··· + amnwm,

or, in a short form,

T(vj) = Σ_{i=1}^m aijwi   for 1 ≤ j ≤ n,

for some scalars aij (i = 1, 2, ..., m; j = 1, 2, ..., n).


Now,for any vector x = LJ=I XjVj E V,

Equivalently, the coordinate vector of T(x) with respect to the basis ,B in W is

LJ=I aljx j] [all' .. aln] [ XI ]


[T(x)]p =
[
n: =: : : = A[x]a.
Lj=1 amjxj amI amn Xn
That is, for any x E V the coordinatevector[T(x)]p of T(x) in W isjust the productof
a fixedmatrix A and the coordinatevector[x]a of x. This situationcan be incorporated

Figure 4.3. The associated matrix A = [T]_α^β for T

in the commutative diagram in Figure 4.3 with the natural isomorphisms Φ and Ψ
defined in Section 4.2. Note that the commutativity of the diagram means that
A ∘ Φ = Ψ ∘ T.
Note that

A = [ [T(v1)]_β  [T(v2)]_β  ···  [T(vn)]_β ]

is the matrix whose column vectors are just the coordinate vectors [T(vj)]_β of T(vj)
with respect to the basis β. In fact, A = [aij] is just the transpose of the coefficient
matrix in the expression of the T(vi) with respect to the basis β in W. Note that this
matrix is unique since the coordinate expression of a vector with respect to a
fixed basis is unique.
Definition 4.4 The matrix A is called the associated matrix for T (or the matrix
representation of T) with respect to the ordered bases α and β, and denoted by
A = [T]_α^β. When V = W and α = β, we simply write [T]_α for [T]_α^α.
Now, the argument so far can be summarized in the following theorem.
Theorem 4.9 Let T : V → W be a linear transformation from an n-dimensional
vector space V to an m-dimensional vector space W. For fixed ordered bases α =
{v1, v2, . . . , vn} for V and β for W, there corresponds a unique associated m × n
matrix [T]_α^β for T such that for any vector x ∈ V the coordinate vector [T(x)]_β of
T(x) with respect to β is given as a matrix product of the associated matrix [T]_α^β for
T and the coordinate vector [x]_α, i.e.,

    [T(x)]_β = [T]_α^β [x]_α.

The associated matrix [T]_α^β is given as

    [T]_α^β = [ [T(v1)]_β  [T(v2)]_β  · · ·  [T(vn)]_β ].

The following examples illustrate the computation of the associated matrices for
linear transformations.

Example 4.12 (The associated matrix [id]_α) Let id : V → V be the identity trans-
formation on a vector space V. Then for any ordered basis α for V, the matrix
[id]_α = I, the identity matrix, because if α = {v1, v2, . . . , vn}, then

    id(v1) = 1·v1 + 0·v2 + · · · + 0·vn
    id(v2) = 0·v1 + 1·v2 + · · · + 0·vn
        ⋮
    id(vn) = 0·v1 + 0·v2 + · · · + 1·vn.   □
Example 4.13 (The associated matrix [T]_α^β) Let T : P1(ℝ) → P2(ℝ) be the linear
transformation defined by

    (T(p))(x) = x p(x).

Find the associated matrix [T]_α^β with respect to the ordered bases α = {1, x} and
β = {1, x, x²} for P1(ℝ) and P2(ℝ), respectively.

Solution: Clearly,

    (T(1))(x) = x  = 0·1 + 1·x + 0·x²
    (T(x))(x) = x² = 0·1 + 0·x + 1·x².

Hence, the associated matrix for T is the transpose of the coefficient matrix in this
expression, that is,

    [T]_α^β = [ 0  0 ]
              [ 1  0 ]
              [ 0  1 ].   □

Example 4.14 (The associated matrices with respect to the standard bases) Let T : ℝ² → ℝ³ be the
linear transformation defined by T(x, y) = (x + 2y, 0, 2x + 3y), with respect to
the standard bases α and β for ℝ² and ℝ³, respectively. Then

    T(e1) = T(1, 0) = (1, 0, 2) = 1e1 + 0e2 + 2e3
    T(e2) = T(0, 1) = (2, 0, 3) = 2e1 + 0e2 + 3e3.

Hence, if β = {e1, e2, e3}, then

    [T]_α^β = [ 1  2 ]
              [ 0  0 ]
              [ 2  3 ].   □
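For the standard bases, the columns of the associated matrix are simply the images T(ej). A small NumPy sketch (my own, not from the text) that reads off the matrix of Example 4.14 this way:

    import numpy as np

    def T(x, y):
        # the linear transformation of Example 4.14
        return np.array([x + 2*y, 0, 2*x + 3*y])

    A = np.column_stack([T(1, 0), T(0, 1)])   # columns are T(e1), T(e2)
    print(A)                                  # [[1 2], [0 0], [2 3]]

    x = np.array([3, -1])
    print(A @ x, T(*x))                       # both give [1 0 3]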

Example 4.15 (The associated matrix [T]_α for the standard basis α) Let T : ℝ² → ℝ²
be a linear transformation given by T(1, 1) = (0, 1) and T(-1, 1) = (2, 3). Find
the matrix representation [T]_α of T with respect to the standard basis α = {e1, e2}.

Solution: Note (a, b) = a e1 + b e2 for any (a, b) ∈ ℝ². Thus, the definition of T
shows

     T(e1) + T(e2) = T(e1 + e2)  = T(1, 1)  = (0, 1) = e2,
    -T(e1) + T(e2) = T(-e1 + e2) = T(-1, 1) = (2, 3) = 2e1 + 3e2.

By solving these equations, we obtain

    T(e1) = -e1 - e2,
    T(e2) =  e1 + 2e2.

Therefore, [T]_α = [ -1  1 ].   □
                   [ -1  2 ]

Example 4.16 (The associated matrix [T]_β for a non-standard basis β) Let T be
the linear transformation given in Example 4.15. Find [T]_β for the basis β = {v1, v2},
where v1 = (0, 1) and v2 = (2, 3).

Solution: From Example 4.15,

    T(v1) = [ -1  1 ] [ 0 ] = [ 1 ] = [T(v1)]_α,    T(v2) = [ -1  1 ] [ 2 ] = [ 1 ] = [T(v2)]_α.
            [ -1  2 ] [ 1 ]   [ 2 ]                          [ -1  2 ] [ 3 ]   [ 4 ]

To write these vectors as linear combinations of the basis vectors in β, we put

    [ 1 ] = a v1 + b v2 = [   2b   ],    [ 1 ] = c v1 + d v2 = [   2d   ].
    [ 2 ]                 [ a + 3b ]     [ 4 ]                 [ c + 3d ]

Solving for a, b, c and d, we obtain

    [T(v1)]_β = [ a ] = [ 1/2 ]   and   [T(v2)]_β = [ c ] = [ 5/2 ].
                [ b ]   [ 1/2 ]                     [ d ]   [ 1/2 ]

Therefore, [T]_β = (1/2) [ 1  5 ].   □
                         [ 1  1 ]
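Doing the same computation for both columns at once amounts to the matrix product P⁻¹[T]_α P, where P has v1, v2 as columns — the similarity that reappears in Section 4.6. A small NumPy sketch (not from the text):

    import numpy as np

    A_alpha = np.array([[-1., 1.],     # [T]_alpha from Example 4.15
                        [-1., 2.]])
    P = np.array([[0., 2.],            # columns are v1 = (0, 1), v2 = (2, 3)
                  [1., 3.]])

    # [T(v_j)]_beta solves P c = T(v_j); doing this for all j at once:
    T_beta = np.linalg.solve(P, A_alpha @ P)
    print(T_beta)                      # [[0.5 2.5], [0.5 0.5]] = (1/2)[[1 5],[1 1]]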
Remark: (1) Recall that any m × n matrix A can be considered as a linear trans-
formation from the n-space ℝⁿ to the m-space ℝᵐ via x ↦ Ax. Clearly, its matrix
representation with respect to the standard bases α for ℝⁿ and β for ℝᵐ is the matrix
A itself, i.e., A = [A]_α^β. (Note that A ej is just the j-th column vector of A.) In partic-
ular, if A is an invertible n × n square matrix, then the column vectors c1, c2, . . . , cn
form another basis γ for ℝⁿ. Thus, A is simply the linear transformation on ℝⁿ that
takes the standard basis α to γ; in fact, A ej = cj is
the j-th column of A, so that its matrix representation [A]_α^γ is the identity matrix.
(2) Let V and W be vector spaces with bases α and β, respectively, and let
T : V → W be a linear transformation with the matrix representation A = [T]_α^β. Then
it is clear that Ker(T) and Im(T) are isomorphic to the null space N(A) and the column
space C(A), respectively, via the natural isomorphisms. In particular, if V = ℝⁿ and
W = ℝᵐ with the standard bases, then Ker(T) = N(A) and Im(T) = C(A).
Therefore, from Theorem 3.17, we have

    dim Ker(T) + dim Im(T) = dim V.

(3) Let Ax = b be a system of linear equations with an m × n coefficient matrix A. By
considering the matrix A as a linear transformation from ℝⁿ to ℝᵐ, one can have other
equivalent conditions to those mentioned in Theorems 3.24 and 3.25: The conditions
in Theorem 3.24 (e.g., C(A) = ℝᵐ) are equivalent to the condition that A is surjective,
and those in Theorem 3.25 (e.g., N(A) = {0}) are equivalent to the condition that A
is one-to-one. This observation gives the proof of (15)-(16) in Theorem 3.26.

Problem 4.10 Find the matrix representations [T]_α and [T]_β of each of the following linear
transformations T on ℝ³ with respect to the standard basis α = {e1, e2, e3} and another
ordered basis β = {e3, e2, e1}:
(1) T(x, y, z) = (2x - 3y + 4z, 5x - y + 2z, 4x + 7y),
(2) T(x, y, z) = (2y + z, x - 4y, 3x).

Also, find the matrix representation [T]_α^β of each of the linear transformations T.

Problem 4.11 Let T : ℝ⁴ → ℝ³ be the linear transformation defined by

    T(x, y, z, u) = (x + 2y, x - 3z + u, 2y + 3z + 4u).

Let α and β be the standard bases for ℝ⁴ and ℝ³, respectively. Find [T]_α^β.

Problem 4.12 Let id : ℝⁿ → ℝⁿ be the identity transformation. Let xk denote the vector in
ℝⁿ whose first k - 1 coordinates are zero and whose last n - k + 1 coordinates are 1. Then clearly
β = {x1, . . . , xn} is a basis for ℝⁿ (see Problem 3.9). Let α = {e1, . . . , en} be the standard
basis for ℝⁿ. Find the matrix representations [id]_β^α and [id]_α^β.

4.4 Vector spaces of linear transformations


Let V and W be two vector spaces of dimensions n and m. Let L(V; W) denote the
set of all linear transformations from V to W, i.e.,

    L(V; W) = {T : T is a linear transformation from V to W}.

For any two linear transformations S and T in L(V; W) and λ ∈ ℝ, we define the
sum S + T and the scalar multiplication λS by

    (S + T)(v) = S(v) + T(v)   and   (λS)(v) = λ(S(v))

for any v ∈ V. Clearly, the sum S + T and the scalar multiplication λS are also linear
and satisfy the operational rules of a vector space, so that L(V; W) becomes a vector
space.
Let α and β be two ordered bases for V and W, respectively, and let T : V → W
be a linear transformation. Then the associated matrix of T with respect to these
bases is uniquely determined by Theorem 4.9. That is, the function φ : L(V; W) →
M_{m×n}(ℝ) defined by

    φ(T) = [T]_α^β ∈ M_{m×n}(ℝ)

for T ∈ L(V; W) is well defined (see Section 4.3).

Lemma 4.10 The function φ : L(V; W) → M_{m×n}(ℝ) is a one-to-one correspon-
dence between L(V; W) and M_{m×n}(ℝ).

Proof: (1) It is one-to-one: If [S]_α^β = [T]_α^β for S and T in L(V; W), then we have
S = T by Corollary 4.4.
(2) It is onto: For any m × n matrix A (considered as a linear transformation from
ℝⁿ to ℝᵐ), define a linear transformation T : V → W by T = Ψ⁻¹ ∘ A ∘ Φ as the
composition of A with the natural isomorphisms Φ : V → ℝⁿ and Ψ : W → ℝᵐ.
Then clearly [T]_α^β = A, i.e., φ is onto.   □

Furthermore, the following lemma shows that φ is linear, so that it is in fact an
isomorphism from L(V; W) to M_{m×n}(ℝ).

Lemma 4.11 Let V and W be vector spaces with ordered bases α and β, respectively,
and let S, T : V → W be linear. Then we have

    [S + T]_α^β = [S]_α^β + [T]_α^β   and   [λS]_α^β = λ[S]_α^β.

Proof: Let α = {v1, . . . , vn} and β = {w1, . . . , wm}. Then we have unique expres-
sions S(vj) = Σ_{i=1}^{m} aij wi and T(vj) = Σ_{i=1}^{m} bij wi for each 1 ≤ j ≤ n, so that
[S]_α^β = [aij] and [T]_α^β = [bij]. Hence

    (S + T)(vj) = Σ_{i=1}^{m} aij wi + Σ_{i=1}^{m} bij wi = Σ_{i=1}^{m} (aij + bij) wi.

Thus

    [S + T]_α^β = [aij + bij] = [S]_α^β + [T]_α^β.

The proof of the second equality [λS]_α^β = λ[S]_α^β is similar and left as an exercise.   □

In particular, if V = ℝⁿ and W = ℝᵐ, then the vector space M_{m×n}(ℝ) of m × n
matrices may be identified with the vector space L(ℝⁿ; ℝᵐ), since such a matrix A is
a linear transformation and A itself is the matrix representation of itself with respect
to the standard bases of ℝⁿ and ℝᵐ.
One can summarize our discussions in the following theorem:

Theorem 4.12 For vector spaces V of dimension n and W of dimension m, the vector
space L(V; W) of all linear transformations from V to W is isomorphic to the vector
space M_{m×n}(ℝ) of all m × n matrices, and

    dim L(V; W) = dim M_{m×n}(ℝ) = mn = dim V · dim W.


Remark: With an isomorphism from the vector space L(V; W) to the vector space
M_{m×n}(ℝ) as mentioned in Theorem 4.12, one can prove that the following conditions
for a linear transformation T on a vector space V are equivalent, as mentioned in
Theorem 3.26:
(1) T is an isomorphism,
(2) T is one-to-one,
(3) T is surjective.
(One can also prove this directly by using the definition of a basis for V. See Prob-
lem 4.7.)

The next theorem shows that the one-to-one correspondence between L(V; W)
and M_{m×n}(ℝ) preserves not only the vector space structure but also the composition
of linear transformations. Let V, W and Z be vector spaces. Suppose that S : V → W
and T : W → Z are linear transformations. Then the composition T ∘ S : V → Z is
also linear.

Theorem 4.13 Let V, W and Z be vector spaces with ordered bases α, β, and γ,
respectively. Suppose that S : V → W and T : W → Z are linear transformations.
Then

    [T ∘ S]_α^γ = [T]_β^γ [S]_α^β.

Proof: Let α = {v1, . . . , vn}, β = {w1, . . . , wm} and γ = {z1, . . . , zl}, and write
[T]_β^γ = [ajk] and [S]_α^β = [bki]. Then, for 1 ≤ i ≤ n,

    (T ∘ S)(vi) = T(S(vi)) = T(Σ_{k=1}^{m} bki wk) = Σ_{k=1}^{m} bki T(wk)
                = Σ_{k=1}^{m} bki (Σ_{j=1}^{l} ajk zj) = Σ_{j=1}^{l} (Σ_{k=1}^{m} ajk bki) zj.

It shows that [T ∘ S]_α^γ = [T]_β^γ [S]_α^β.   □
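Numerically, the theorem says that applying S and then T to a coordinate vector gives the same result as multiplying by the product of the two matrices. A small NumPy check (my own sketch, using the data of Problem 4.13 below):

    import numpy as np

    # S and T of Problem 4.13, written as matrices w.r.t. the standard basis
    S = np.column_stack([(2, 2, 1), (0, 1, 2), (-1, 2, 1)])
    T = np.column_stack([(1, 0, 1), (0, 1, 1), (1, 1, 2)])

    x = np.array([1.0, -2.0, 3.0])
    print(T @ (S @ x))     # apply S, then T
    print((T @ S) @ x)     # same result, since [T o S] = [T][S]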



Problem 4.13 Let α be the standard basis for ℝ³, and let S, T : ℝ³ → ℝ³ be two linear
transformations given by

    S(e1) = (2, 2, 1), S(e2) = (0, 1, 2), S(e3) = (-1, 2, 1),
    T(e1) = (1, 0, 1), T(e2) = (0, 1, 1), T(e3) = (1, 1, 2).

Compute [S + T]_α, [2T - S]_α and [T ∘ S]_α.

Problem 4.14 Let T : P2(ℝ) → P2(ℝ) be the linear transformation defined by T(f) = (3 +
x)f' + 2f, and let S : P2(ℝ) → ℝ³ be the one defined by S(a + bx + cx²) = (a - b, a + b, c).
For the basis α = {1, x, x²} for P2(ℝ) and the standard basis β = {e1, e2, e3} for ℝ³, compute
[S]_α^β, [T]_α and [S ∘ T]_α^β.

Theorem 4.14 Let V and W be vector spaces with ordered bases α and β, respec-
tively, and let T : V → W be an isomorphism. Then [T⁻¹]_β^α = ([T]_α^β)⁻¹.

Proof: Since T is invertible, dim V = dim W, and the matrices [T]_α^β and [T⁻¹]_β^α
are square and of the same size. Thus,

    [T⁻¹]_β^α [T]_α^β = [T⁻¹ ∘ T]_α = [id]_α = I

is the identity matrix. Hence, [T⁻¹]_β^α = ([T]_α^β)⁻¹.   □

In particular, if a linear transformation T : V → W is an isomorphism, then
[T]_α^β is an invertible matrix for any bases α for V and β for W.

Problem 4.15 For the vector spaces P1(ℝ) and ℝ², choose the bases α = {1, x} for P1(ℝ) and
β = {e1, e2} for ℝ², respectively. Let T : P1(ℝ) → ℝ² be the linear transformation defined
by T(a + bx) = (a, a + b).
(1) Show that T is invertible. (2) Find [T]_α^β and [T⁻¹]_β^α.

4.5 Change of bases

In Section 4.2, we have seen that, in an n-dimensional vector space V with a fixed
basis α, any vector x can be identified with a column vector [x]_α in the n-space ℝⁿ via
the natural isomorphism Φ. Of course, one may get a different column vector [x]_β if
another basis β is given instead of α. Thus, one may naturally ask what the relation
between [x]_α and [x]_β is for two different bases α and β.
To answer this question, let us begin with an example in the plane ℝ². The coor-
dinate expression of x = (x, y) ∈ ℝ² with respect to the standard basis α = {e1, e2}
is x = x e1 + y e2, so that [x]_α = [x, y]^T.


Figure 4.4. Coordinates with respect to {e1, e2} and {e1', e2'}

Now let β = {e1', e2'} be another basis for ℝ² obtained by rotating α counterclock-
wise through an angle θ as in Figure 4.4, and suppose that the coordinate expression
of x ∈ ℝ² with respect to β is written as x = x'e1' + y'e2', or [x]_β = [x', y']^T. Then,
the expressions of the vectors in β with respect to α are

    e1' = id(e1') =  cos θ e1 + sin θ e2
    e2' = id(e2') = -sin θ e1 + cos θ e2,

so

    [e1']_α = [ cos θ ],    [e2']_α = [ -sin θ ].
              [ sin θ ]               [  cos θ ]

Therefore, from x = x e1 + y e2 and

    x = x'e1' + y'e2' = (x' cos θ - y' sin θ)e1 + (x' sin θ + y' cos θ)e2,

one can have the matrix equation

    [ x ] = [ cos θ  -sin θ ] [ x' ],   or   [x]_α = [id]_β^α [x]_β,
    [ y ]   [ sin θ   cos θ ] [ y' ]

where

    [id]_β^α = [ [e1']_α  [e2']_α ] = [ cos θ  -sin θ ].
                                      [ sin θ   cos θ ]

It means that the two coordinate vectors [x]_α and [x]_β in the 2-space ℝ² are related
by the associated matrix [id]_β^α for the identity transformation id on ℝ². Note that

    [id]_α^β = ([id]_β^α)⁻¹ = [  cos θ  sin θ ]
                              [ -sin θ  cos θ ]

by Theorem 4.14.

In general, if α = {v1, v2, . . . , vn} and β = {w1, w2, . . . , wn} are two ordered
bases for an n-dimensional vector space V, then any vector x ∈ V has two expressions:

    x = Σ_{i=1}^{n} xi vi = Σ_{j=1}^{n} yj wj.

In particular, each vector in β is expressed as a linear combination of the vectors in
α: say wj = id(wj) = Σ_{i=1}^{n} qij vi for j = 1, 2, . . . , n, so that [wj]_α = [q1j, . . . , qnj]^T.
Then for any x ∈ V,

    x = Σ_{i=1}^{n} xi vi = Σ_{j=1}^{n} yj wj = Σ_{j=1}^{n} yj Σ_{i=1}^{n} qij vi = Σ_{i=1}^{n} (Σ_{j=1}^{n} qij yj) vi.

This is equivalent to the matrix equation

    [x]_α = [ Σ_{j=1}^{n} q1j yj ]
            [         ⋮          ]  =  [id]_β^α [x]_β,
            [ Σ_{j=1}^{n} qnj yj ]

where

    [id]_β^α = [ q11 · · · q1n ]
               [  ⋮         ⋮  ]  =  [ [w1]_α  · · ·  [wn]_α ].
               [ qn1 · · · qnn ]

This means that any two coordinate vectors of a vector in V with respect to two
different ordered bases α and β are related by the matrix representation [id]_β^α of
the identity transformation on V, and this can be incorporated in the commutative
diagram in Figure 4.5.

Definition 4.5 The matrix representation [id]_β^α of the identity transformation id :
V → V with respect to any two ordered bases α and β is called the basis-change
matrix (or the coordinate-change matrix) from β to α.

Since the identity transformation id : V → V is invertible, the basis-change
matrix Q = [id]_β^α is also invertible by Theorem 4.14. If we had taken the expressions
of the vectors in the basis α with respect to the basis β, say vj = id(vj) = Σ_{i=1}^{n} pij wi
for j = 1, 2, . . . , n, then [pij] = [id]_α^β = Q⁻¹ and

    [x]_β = [id]_α^β [x]_α = ([id]_β^α)⁻¹ [x]_α.

[Figure 4.5. The basis-change matrix Q = [id]_β^α]

Example 4.17 (Analytic interpretation of a basis-change matrix) Consider the curve
xy = 1 in the plane ℝ². Find the quadric equation of the curve which is obtained
from the curve xy = 1 by rotating it around the origin clockwise through an angle π/4.

Solution: Let β = {e1', e2'} be the basis for ℝ² obtained by rotating the standard
basis α = {e1, e2} counterclockwise through an angle π/4, and let [x]_α = [x, y]^T and
[x]_β = [x', y']^T. Then,

    [ x ] = [id]_β^α [ x' ] = [ cos π/4  -sin π/4 ] [ x' ] = (1/√2) [ x' - y' ].
    [ y ]            [ y' ]   [ sin π/4   cos π/4 ] [ y' ]          [ x' + y' ]

Hence, the equation xy = 1 is transformed to

    1 = xy = (1/2)(x' - y')(x' + y') = (x')²/2 - (y')²/2,

which is a hyperbola. (See Figure 4.6.)   □
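A quick numerical check (my own sketch, not from the text): take a few points on xy = 1, rotate them clockwise by π/4, and verify that the rotated points satisfy (x')²/2 - (y')²/2 = 1.

    import numpy as np

    t = np.linspace(0.5, 2.0, 5)
    points = np.vstack([t, 1.0 / t])             # points (x, y) on the curve xy = 1

    c, s = np.cos(-np.pi/4), np.sin(-np.pi/4)    # clockwise rotation by pi/4
    R = np.array([[c, -s],
                  [s,  c]])
    Xp, Yp = R @ points                          # rotated points (x', y')

    print(Xp**2 / 2 - Yp**2 / 2)                 # all values equal 1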

Figure 4.6. The graphs of xy = 1 and (x')²/2 - (y')²/2 = 1

Example 4.18 (Computing a basis-change matrix) Let the 3-space ℝ³ be equipped
with the standard xyz-coordinate system, i.e., with the standard basis α = {e1, e2, e3}.
Take a new x'y'z'-coordinate system by rotating the xyz-system around its z-axis
counterclockwise through an angle θ, i.e., we take a new basis β = {e1', e2', e3'}
obtained by rotating the basis α about the z-axis through θ. Then we get

    [e1']_α = [ cos θ ],   [e2']_α = [ -sin θ ],   [e3']_α = [ 0 ].
              [ sin θ ]              [  cos θ ]              [ 0 ]
              [   0   ]              [    0   ]              [ 1 ]

Hence, the basis-change matrix from β to α is

    Q = [id]_β^α = [ cos θ  -sin θ  0 ]
                   [ sin θ   cos θ  0 ]
                   [   0       0    1 ],

so that [x]_α = Q[x]_β. Moreover, Q = [id]_β^α is invertible and the basis-change matrix
from α to β is

    Q⁻¹ = [id]_α^β = [  cos θ  sin θ  0 ]
                     [ -sin θ  cos θ  0 ]
                     [    0      0    1 ],

so that

    [ x' ]   [  cos θ  sin θ  0 ] [ x ]
    [ y' ] = [ -sin θ  cos θ  0 ] [ y ].   □
    [ z' ]   [    0      0    1 ] [ z ]
Problem 4.16 Find the basis-change matrix from a basis α to another basis β for the 3-space
ℝ³, where α = {(1, 0, 1), (1, 1, 0), (0, 1, 1)} and β = {(2, 3, 1), (1, 2, 0), (2, 0, 3)}.

4.6 Similarity

The coordinate expression of a vector in a vector space V depends on the choice of
an ordered basis. Hence, the matrix representation of a linear transformation also
depends on the choice of bases.
Let V and W be two vector spaces of dimensions n and m with two ordered bases
α and β, respectively, and let T : V → W be a linear transformation. In Section 4.3,
we discussed how to find the associated matrix [T]_α^β. If one takes different bases α'
and β' for V and W, respectively, then one may get another associated matrix
[T]_{α'}^{β'} of T. In fact, we have two different expressions

    [x]_α and [x]_{α'} in ℝⁿ for each x ∈ V,
    [T(x)]_β and [T(x)]_{β'} in ℝᵐ for T(x) ∈ W.

They are related by the basis-change matrices as follows:

    [x]_{α'} = [id_V]_α^{α'} [x]_α   and   [T(x)]_{β'} = [id_W]_β^{β'} [T(x)]_β.

On the other hand, by Theorem 4.9, we have

    [T(x)]_β = [T]_α^β [x]_α   and   [T(x)]_{β'} = [T]_{α'}^{β'} [x]_{α'}.

Therefore, we get

    [T]_{α'}^{β'} [x]_{α'} = [T(x)]_{β'} = [id_W]_β^{β'} [T(x)]_β = [id_W]_β^{β'} [T]_α^β [x]_α
                           = [id_W]_β^{β'} [T]_α^β [id_V]_{α'}^α [x]_{α'}.

This equation looks messy. However, by Theorem 4.13, this relation can be obtained
directly from T = id_W ∘ T ∘ id_V as

    [T]_{α'}^{β'} = [id_W ∘ T ∘ id_V]_{α'}^{β'} = [id_W]_β^{β'} [T]_α^β [id_V]_{α'}^α.

Note that [T]_α^β and [T]_{α'}^{β'} are m × n matrices, [id_V]_{α'}^α is an n × n matrix and
[id_W]_β^{β'} is an m × m matrix.
The relation can also be incorporated in the diagram in Figure 4.7, in which all
rectangles are commutative.

[Figure 4.7. Relating the two associated matrices [T]_α^β and [T]_{α'}^{β'}]

Our discussion is summarized in the following theorem.



Theorem 4.15 Let T : V → W be a linear transformation from a vector space V
with bases α and α' to another vector space W with bases β and β'. Then

    [T]_{α'}^{β'} = P⁻¹ [T]_α^β Q,

where Q = [id_V]_{α'}^α and P = [id_W]_{β'}^β are the basis-change matrices.

In particular, if we take W = V, α = β and α' = β', then P = Q and we get
the following corollary.
Corollary 4.16 Let T : V → V be a linear transformation on a vector space V and
let α and β be ordered bases for V. Let Q = [id]_β^α be the basis-change matrix from
β to α. Then
(1) Q is invertible, and Q⁻¹ = [id]_α^β.
(2) For any x ∈ V, [x]_α = Q[x]_β.
(3) [T]_β = Q⁻¹[T]_α Q.
The relation (3) between [T]_β and [T]_α in Corollary 4.16 is called a similarity. In
general, we have the following definition.
Definition 4.6 For any square matrices A and B, A is said to be similar to B if there
exists a nonsingular matrix Q such that B = Q⁻¹AQ.
Note that if A is similar to B, then B is also similar to A. Thus we simply say that
A and B are similar. We saw in Corollary 4.16 that if A and B are n × n matrices
representing the same linear transformation T on a vector space V, then A and B are
similar.
Example 4.19 (Two similar associated matrices) Let β = {v1, v2, v3} be a basis for
the 3-space ℝ³ consisting of v1 = (1, 1, 0), v2 = (1, 0, 1) and v3 = (0, 1, 1). Let
T be the linear transformation on ℝ³ given by the matrix

    [T]_β = [  2  1  -1 ]
            [  1  2   3 ]
            [ -1  1   1 ].

Let α = {e1, e2, e3} be the standard basis. Find the basis-change matrix [id]_β^α and
[T]_α.

Solution: Since v1 = e1 + e2, v2 = e1 + e3, v3 = e2 + e3, we have

    [id]_β^α = [ 1  1  0 ]                                           [  1   1  -1 ]
               [ 1  0  1 ]   and   [id]_α^β = ([id]_β^α)⁻¹ = (1/2) [  1  -1   1 ].
               [ 0  1  1 ]                                           [ -1   1   1 ]

Therefore,

    [T]_α = [id]_β^α [T]_β [id]_α^β = (1/2) [  4   2  2 ].   □
                                            [  3  -1  1 ]
                                            [ -1   1  7 ]

Example 4.20 (Computing an associated matrix) Let T : ℝ³ → ℝ³ be the linear
transformation defined by

    T(x1, x2, x3) = (2x1 + x2, x1 + x2 + 3x3, -x2).

Let α = {e1, e2, e3} be the standard ordered basis for ℝ³, and let β = {v1, v2, v3}
be another ordered basis consisting of v1 = (-1, 0, 0), v2 = (2, 1, 0), and
v3 = (1, 1, 1). Find the associated matrices [T]_α and [T]_β for T. Also, show that
T(vj) is the linear combination of the basis vectors in β with the entries of the j-th
column of [T]_β as its coefficients for j = 1, 2, 3.

Solution: One can easily show that

    [T]_α = [ 2  1  0 ]                       [ -1  2  1 ]
            [ 1  1  3 ]   and   [id]_β^α =    [  0  1  1 ].
            [ 0 -1  0 ]                       [  0  0  1 ]

Thus, with the inverse [id]_α^β = ([id]_β^α)⁻¹ = [ -1  2  -1 ]
                                                 [  0  1  -1 ],
                                                 [  0  0   1 ]
it follows that

    [T]_β = [id]_α^β [T]_α [id]_β^α = [  0  2  8 ].
                                      [ -1  4  6 ]
                                      [  0 -1 -1 ]

To show the second statement, let j = 2. Then T(v2) = T(2, 1, 0) = (5, 3, -1). On
the other hand, the coefficients of [T(v2)]_β are just the entries of the second column
of [T]_β. Therefore,

    T(v2) = 2v1 + 4v2 - v3 = 2(-1, 0, 0) + 4(2, 1, 0) - (1, 1, 1) = (5, 3, -1),

as expected.   □
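The computation [T]_β = [id]_α^β [T]_α [id]_β^α is exactly the similarity Q⁻¹AQ; a small NumPy sketch (my own, not from the text) reproducing the numbers of Example 4.20:

    import numpy as np

    A = np.array([[2., 1., 0.],     # [T]_alpha of Example 4.20
                  [1., 1., 3.],
                  [0., -1., 0.]])
    Q = np.array([[-1., 2., 1.],    # columns are v1, v2, v3, i.e. Q = [id]_beta^alpha
                  [0., 1., 1.],
                  [0., 0., 1.]])

    T_beta = np.linalg.inv(Q) @ A @ Q
    print(T_beta)                   # second column is (2, 4, -1), so T(v2) = 2v1 + 4v2 - v3

    v1, v2, v3 = Q.T
    print(A @ v2, 2*v1 + 4*v2 - v3) # both equal (5, 3, -1)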
The next theorem shows that two similar matrices can be matrix representations
of the same linear transformation.

Theorem 4.17 Suppose that an n × n matrix A represents a linear transformation T :
V → V on a vector space V with respect to an ordered basis α = {v1, v2, . . . , vn},
i.e., [T]_α = A. If B = Q⁻¹AQ for some nonsingular matrix Q, then there exists a
basis β for V such that B = [T]_β and Q = [id]_β^α.

Proof: Let Q = [qij] and let w1, w2, . . . , wn be the vectors in V defined by

    w1 = q11 v1 + q21 v2 + · · · + qn1 vn
    w2 = q12 v1 + q22 v2 + · · · + qn2 vn
        ⋮
    wn = q1n v1 + q2n v2 + · · · + qnn vn.

Then the nonsingularity of Q = [qij] implies that β = {w1, w2, . . . , wn} is an ordered
basis for V, and Corollary 4.16(3) shows that [T]_β = Q⁻¹[T]_α Q = Q⁻¹AQ = B
with Q = [id]_β^α.   □

Example 4.21 (A matrix similar to an associated matrix is also an associated matrix)
Let D be the differential operator on the vector space P2(ℝ). Given the ordered basis
α = {1, x, x²}, first note that

    D(1)  = 0  = 0·1 + 0·x + 0·x²
    D(x)  = 1  = 1·1 + 0·x + 0·x²
    D(x²) = 2x = 0·1 + 2·x + 0·x².

Hence, the matrix representation of D with respect to α is given by

    [D]_α = [ 0  1  0 ]
            [ 0  0  2 ]
            [ 0  0  0 ].

Choose a nonsingular matrix

    Q = [ 1  0  -2 ]
        [ 0  2   0 ]
        [ 0  0   4 ].

Let

    B = Q⁻¹[D]_α Q = [ 0  2  0 ]
                     [ 0  0  4 ]
                     [ 0  0  0 ].

Now, we are going to find a basis β = {v1, v2, v3} so that B = [D]_β. If there is such
a basis, the matrix Q must be the basis-change matrix [id]_β^α, and then

    v1 = 1·1 + 0·x + 0·x² = 1,
    v2 = 0·1 + 2·x + 0·x² = 2x,
    v3 = -2·1 + 0·x + 4·x² = -2 + 4x².

Clearly, one can obtain

    D(1) = 0          = 0·1 + 0·2x + 0·(-2 + 4x²),
    D(2x) = 2         = 2·1 + 0·2x + 0·(-2 + 4x²),
    D(-2 + 4x²) = 8x  = 0·1 + 4·2x + 0·(-2 + 4x²),

and

    [D]_β = [ 0  2  0 ]
            [ 0  0  4 ]
            [ 0  0  0 ];

thus, as expected, [D]_β = B = Q⁻¹[D]_α Q.   □
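A short NumPy sketch (my own, not from the text) that reproduces the similarity Q⁻¹[D]_α Q = [D]_β of Example 4.21:

    import numpy as np

    D_alpha = np.array([[0., 1., 0.],   # derivative on P2 w.r.t. alpha = {1, x, x^2}
                        [0., 0., 2.],
                        [0., 0., 0.]])
    Q = np.array([[1., 0., -2.],        # columns: coordinates of 1, 2x, -2 + 4x^2 in alpha
                  [0., 2., 0.],
                  [0., 0., 4.]])

    B = np.linalg.inv(Q) @ D_alpha @ Q
    print(B)                            # [[0 2 0], [0 0 4], [0 0 0]] = [D]_beta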

Problem 4.17 Let T : ]R3 ]R3 be the linear transformation defined by

Let a be the standard basis, and let fJ = {VI, V2, V3} be another ordered basis consisting of
VI = (1, 0, 0), v2 = (1, 1, 0), and v3 = (1, 1, 1) for ]R3. Find the associated matrix of T
with respect to a and the associated matrix of T with respect to fJ. Are they similar?

Problem 4.18 Suppose that A and B are similar n × n matrices. Show that
(1) det A = det B,
(2) tr(A) = tr(B),
(3) rank A = rank B.

Problem 4.19 Let A and B be n × n matrices. Show that if A is similar to B, then A² is similar
to B². In general, Aⁿ is similar to Bⁿ for all n ≥ 2.

4.7 Applications

4.7.1 Dual spaces and adjoint

Note that the space of all scalars is a one-dimensional vector space ℝ, and the set of
all linear transformations from V to ℝ is the vector space L(V; ℝ), whose dimension
is equal to the dimension of V (see Theorem 4.12), so that the two vector spaces
L(V; ℝ) and V are isomorphic. In this section, we are concerned exclusively with
such linear transformations from V to the scalar space ℝ.

Definition 4.7 Let V be a vector space.
(1) The vector space L(V; ℝ) of all linear transformations from V to ℝ is called the
dual space of V and is denoted by V*.
(2) An element (i.e., a linear transformation) in the dual space L(V; ℝ) is called a
linear functional of V.

From the definition, one can say that any vector space is isomorphic to its dual
space.

Example 4.22 The trace function tr : M_{n×n}(ℝ) → ℝ is a linear functional of
M_{n×n}(ℝ).   □

The definite integral of continuous functions is one of the most important examples
of a linear functional in mathematics.

Example 4.23 (Fourier coefficients are linear functionals) Let C[a, b] be the vec-
tor space of all continuous real-valued functions on the interval [a, b]. The definite
integral I : C[a, b] → ℝ defined by

    I(f) = ∫_a^b f(t) dt

is a linear functional of C[a, b]. In particular, if the interval is [0, 2π] and n is an
integer, then

    An(f) = (1/π) ∫_0^{2π} f(t) cos nt dt   and   Bn(f) = (1/π) ∫_0^{2π} f(t) sin nt dt

are linear functionals, called the n-th Fourier coefficients of f.   □


For a matrix A regarded as a linear transformation A : ℝⁿ → ℝᵐ, the transpose
Aᵀ of A is another linear transformation Aᵀ : ℝᵐ → ℝⁿ. For a linear transformation
T : V → W from a vector space V to W, one can naturally ask what its transpose is
and how it should be defined. In this section, we discuss this problem.
Recall that a linear functional T : V → ℝ is completely determined by its values
on a basis for V. Let α = {v1, v2, . . . , vn} be a basis for a vector space V. For each
i = 1, 2, . . . , n, define a linear functional

    vi* : V → ℝ

by vi*(vj) = δij for each j = 1, 2, . . . , n. Then, for any x = Σ aj vj ∈ V, we have
vi*(x) = ai, which is the i-th coordinate of x with respect to α. Thus, the functional
vi* is called the i-th coordinate function with respect to the basis α.
Theorem 4.18 The set α* = {v1*, v2*, . . . , vn*} of coordinate functions forms a basis
for the dual space V*, and for any T ∈ V* we have

    T = Σ_{i=1}^{n} T(vi) vi*.

Proof: Clearly, the set α* = {v1*, v2*, . . . , vn*} is linearly independent, since 0 =
Σ_{i=1}^{n} ci vi* implies 0 = Σ_{i=1}^{n} ci vi*(vj) = cj for each j = 1, 2, . . . , n. Because
dim V* = dim V = n, these n linearly independent vectors in α* must form a basis.
Now, for any T ∈ V*, let T = Σ_{i=1}^{n} ci vi*. Then, T(vj) = Σ_{i=1}^{n} ci vi*(vj) = cj.
It gives T = Σ_{i=1}^{n} T(vi) vi*.   □

Definition 4.8 For a basis α = {v1, v2, . . . , vn} for a vector space V, the basis
α* = {v1*, v2*, . . . , vn*} for V* is called the dual basis of α.

Example 4.24 (Computing a dual basis) Let α = {v1, v2} be a basis for ℝ², where
v1 = (1, 2) and v2 = (1, 3). To find the dual basis α* = {v1*, v2*} of α, we consider
the equations

    1 = v1*(v1) = v1*(e1) + 2 v1*(e2),
    0 = v1*(v2) = v1*(e1) + 3 v1*(e2).

Solving these equations, we obtain v1*(e1) = 3 and v1*(e2) = -1. Thus v1*(x, y) =
3x - y. Similarly, it can be shown that v2*(x, y) = -2x + y.   □
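Since vi*(vj) = δij, the coefficient rows of the dual basis functionals are precisely the rows of the inverse of the matrix whose columns are v1, v2. A small NumPy sketch (my own, not from the text):

    import numpy as np

    P = np.column_stack([(1, 2), (1, 3)])   # columns are v1, v2
    dual = np.linalg.inv(P)                 # row i gives the coefficients of v_i*
    print(dual)                             # [[ 3. -1.], [-2.  1.]]
                                            # v1*(x, y) = 3x - y,  v2*(x, y) = -2x + y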

The following example shows a natural isomorphism between the n-space ℝⁿ and
its dual space ℝⁿ*.

Example 4.25 (The dual basis vector ei* is the i-th coordinate function) For the standard basis
α = {e1, e2, . . . , en} for the n-space ℝⁿ, its dual basis vector ei* is just the i-th
coordinate function. In fact, for any vector x = (x1, x2, . . . , xn) = x1e1 + x2e2 +
· · · + xnen ∈ ℝⁿ, we have ei*(x) = ei*(x1e1 + x2e2 + · · · + xnen) = xi for all i.
On the other hand, when we write a vector in ℝⁿ as x = (x1, x2, . . . , xn) with
variables xi, it means that for a given point a = (a1, a2, . . . , an) ∈ ℝⁿ, each xi
gives us the i-th coordinate of a, that is, xi(a) = ai for all i. In this sense, one can
identify ei* = xi for i = 1, 2, . . . , n, so that ℝⁿ* = ℝⁿ, and these are called coordinate
functions.   □

Problem 4.20 Let α = {(1, 0, 1), (1, 2, 1), (0, 0, 1)} be a basis for ℝ³. Find the dual basis
α*.

For a given linear transformation T : V → W, one can define T* : W* → V*
by T*(g) = g ∘ T for any g ∈ W*. In fact, for any linear functional g ∈ W*, i.e.,
g : W → ℝ, the composition g ∘ T : V → ℝ given by (g ∘ T)(x) = g(T(x)) for x ∈ V
defines a linear functional on V, i.e., T*(g) = g ∘ T ∈ V*.

Lemma 4.19 The transformation T* : W* → V* defined by T*(g) = g ∘ T for
g ∈ W* is a linear transformation. It is called the adjoint (or transpose) of T.

Proof: For any f, g ∈ W*, a, b ∈ ℝ and x ∈ V,

    T*(af + bg)(x) = (af + bg)(T(x)) = a f(T(x)) + b g(T(x)) = (aT*(f) + bT*(g))(x).   □

Example 4.26 ((id_V)* = id_{V*} and (T ∘ S)* = S* ∘ T*)
(1) Let id : V → V be the identity transformation on a vector space V. Then for
any g ∈ V*, id*(g) = g ∘ id = g. Hence, the adjoint id* : V* → V* is the identity
transformation on V*, i.e., id* = id.
(2) Let S : U → V and T : V → W be two linear transformations. Then for any
g ∈ W*, we have

    (T ∘ S)*(g) = g ∘ (T ∘ S) = (g ∘ T) ∘ S
                = T*(g) ∘ S = S*(T*(g)) = (S* ∘ T*)(g).

It shows that (T ∘ S)* = S* ∘ T*.   □


Now, if S : V → W is an isomorphism, then (S⁻¹)* ∘ S* = (S ∘ S⁻¹)* = id* = id
shows that S* : W* → V* is also an isomorphism.
Note that the linear transformation * : V → V* defined by assigning to a basis for V
its dual basis is an isomorphism, so that the composition ** : V → V** is also an
isomorphism. However, an isomorphism between V and V** can be defined without
choosing a basis for V. In fact, for each x ∈ V, one can first define x̂ : V* → ℝ by
x̂(f) = f(x) for every f ∈ V*. It is easy to verify that x̂ is a linear functional on
V*, so that x̂ ∈ V**. The following theorem shows that the mapping Φ : V → V**
defined by Φ(x) = x̂ is an isomorphism, and it does not depend on the choice of a basis
for V.

Theorem 4.20 The mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism
from V to V**.

Proof: To show the linearity of Φ, let x, y ∈ V and let k be a scalar. Then, for any f ∈ V*,

    Φ(x + ky)(f) = f(x + ky) = f(x) + k f(y) = x̂(f) + k ŷ(f) = (Φ(x) + kΦ(y))(f).

Hence, Φ(x + ky) = Φ(x) + kΦ(y).
To show that Φ is injective, suppose x ∈ Ker(Φ). Then Φ(x) = x̂ = 0 in V**,
i.e., x̂(f) = 0 for all f ∈ V*. It implies that x = 0: In fact, if x ≠ 0, one can choose
a basis α = {v1, v2, . . . , vn} for V such that v1 = x. Let α* = {v1*, v2*, . . . , vn*} be
the dual basis of α. Then

    0 = x̂(v1*) = v1*(x) = v1*(v1) = 1,

which is a contradiction. Thus, x = 0 and Ker(Φ) = {0}.
Since dim V = dim V**, Φ is an isomorphism.   □

Problem 4.21 Let V = ℝ³ and define fi ∈ V* as follows:

    f1(x, y, z) = x - 2y,   f2(x, y, z) = x + y + z,   f3(x, y, z) = y - 3z.

Prove that {f1, f2, f3} is a basis for V*, and then find a basis for V for which it is the dual.

We now consider the matrix representation of the transpose S* : W* → V*
of a linear transformation S : V → W. Let α = {v1, v2, . . . , vn} and β =
{w1, w2, . . . , wm} be bases for V and W, with their dual bases α* = {v1*, v2*, . . . , vn*}
and β* = {w1*, w2*, . . . , wm*}, respectively.

Theorem 4.21 The matrix representation of the transpose S* : W* → V* is the
transpose of the matrix representation of S : V → W, that is, [S*]_{β*}^{α*} = ([S]_α^β)ᵀ.

Proof: Let S(vi) = Σ_{k=1}^{m} aki wk, so that [S]_α^β = [aij]. Then

    [S*]_{β*}^{α*} = [ [S*(w1*)]_{α*}  · · ·  [S*(wm*)]_{α*} ].

Note that, for 1 ≤ j ≤ m,

    S*(wj*) = Σ_{i=1}^{n} S*(wj*)(vi) vi* = Σ_{i=1}^{n} aji vi*,

since

    S*(wj*)(vi) = (wj* ∘ S)(vi) = wj*(S(vi)) = wj* (Σ_{k=1}^{m} aki wk) = Σ_{k=1}^{m} aki wj*(wk) = aji.

Hence, we get [S*]_{β*}^{α*} = ([S]_α^β)ᵀ.   □

Example 4.27 (The transpose Aᵀ is the adjoint transformation of A) Let A : ℝⁿ →
ℝᵐ be a linear transformation defined by an m × n matrix A. Let α and β be the
standard bases for ℝⁿ and ℝᵐ, respectively. Then [A]_α^β = A. By Theorem 4.21, we
have [A*]_{β*}^{α*} = ([A]_α^β)ᵀ. Thus, with the identification ℝᵏ* = ℝᵏ via α* = α and
β* = β as in Example 4.25, we have [A*]_{β*}^{α*} = A* and A* = Aᵀ. In this sense, we
see that the transpose Aᵀ is the adjoint transformation of A.   □
As the final part of the section, we consider the dual space of a subspace. Let
V be a vector space of dimension n, and let U be a subspace of V of dimension k.
Then U* = {T : U → ℝ : T is linear on U} is not a subspace of V*. However,
one can extend each T ∈ U* to a linear functional on V as follows. Choose a basis
α = {u1, u2, . . . , uk} for U. Then by definition α* = {u1*, u2*, . . . , uk*} is its dual
basis for U*. Now extend α to a basis β = {u1, u2, . . . , uk, uk+1, . . . , un} for V. For
each T ∈ U*, let T̄ : V → ℝ be the linear functional on V defined by

    T̄(ui) = T(ui)  if i ≤ k,        T̄(ui) = 0  if k + 1 ≤ i ≤ n.

Then clearly T̄ ∈ V* and the restriction T̄|_U of T̄ to U is simply T: i.e., T̄|_U = T ∈
U*. It is easy to see that (T + kS)‾ = T̄ + k S̄. In particular, it is also easy to see that
{ū1*, ū2*, . . . , ūk*} is linearly independent in V* and ūi*|_U = ui* ∈ U*, i = 1, 2, . . . , k.
Therefore, one obtains a one-to-one linear transformation

    φ : U* → V*

given by φ(T) = T̄ for all T ∈ U*. The image φ(U*) is now a subspace of V*. By
identifying U* with the image φ(U*), one can say that U* is a subspace of V*.
Problem 4.22 Let U and W be subspaces of a vector space V. Show that U ⊆ W if and only
if W* ⊆ U*.

Let S be an arbitrary subset of V, and let ⟨S⟩ denote the subspace of V spanned
by the vectors in S. Let S⊥ = {f ∈ V* : f(x) = 0 for any x ∈ S}. Then it is easy
to show that S⊥ is a subspace of V*, S⊥ = ⟨S⟩⊥, and dim⟨S⟩ + dim S⊥ = n.
Let R be a subset of V*. Then R⊥ = {x ∈ V : f(x) = 0 for any f ∈ R} is again
a subspace of V such that R⊥ = ⟨R⟩⊥ and dim R⊥ + dim⟨R⟩ = n.

Problem 4.23 For subspaces U and W of a vector space V, show that
(1) (U + W)⊥ = U⊥ ∩ W⊥,   (2) (U ∩ W)⊥ = U⊥ + W⊥.

4.7.2 Computer graphics

One of the simplest applications of a linear transformation is to animation or the graphical
display of pictures on a computer screen. For a simple display of the idea, let us
consider a picture in the 2-plane ℝ². Note that a picture or an image on a screen
usually consists of a number of points, lines or curves connecting some of them, and
information about how to fill the regions bounded by the lines and curves. Assuming
that the computer has information about how to connect the points and curves, a figure
can be defined by a list of points.
For example, consider the capital letters 'LA' as in Figure 4.8. They can be repre-
sented by a matrix with the coordinates of the vertices. For example, the coordinates of
the 6 vertices of 'L' form a matrix:

[Figure 4.8. The letters 'LA' on a screen]

    vertices         1    2    3    4    5    6
    x-coordinate  [ 0.0  0.0  0.5  0.5  2.0  2.0 ]  =  A.
    y-coordinate  [ 0.0  2.0  2.0  0.5  0.5  0.0 ]

Of course, we assume that the computer knows which vertices are connected to
which by lines via some algorithm. We know that line segments are transformed
to other line segments by a matrix, considered as a linear transformation. Thus, by
multiplying A by a matrix, the vertices are transformed to another set of vertices,
and the line segments connecting the vertices are preserved. For example, the matrix
B = [1 0.25; 0 1] transforms the matrix A to the following form, which represents
the new coordinates of the vertices:

    vertices        1    2    3     4      5     6
    BA = [ 0.0  0.5  1.0  0.625  2.125  2.0 ]
         [ 0.0  2.0  2.0  0.5    0.5    0.0 ].

Now, the computer connects these vertices properly by lines according to the given
algorithm and displays on the screen the changed figure, as on the left-hand side of
Figure 4.9. Multiplying BA by the matrix C = [0.5 0; 0 1] shrinks the width
of BA by half, producing the right-hand side of Figure 4.9. Thus, changes in the shape
of a figure may be obtained by compositions of appropriate linear transformations.

[Figure 4.9. Tilting and shrinking: the vertices BA and (CB)A]

Now, it is suggested that the reader try various matrices such as reflections,
rotations, or any other linear transformations, and multiply A by them to see how the
shape of the figure changes.
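A small NumPy sketch (my own, not from the text) reproducing the tilting and shrinking of the letter 'L':

    import numpy as np

    # vertex coordinates of the letter 'L' (columns are the 6 vertices)
    A = np.array([[0.0, 0.0, 0.5, 0.5, 2.0, 2.0],
                  [0.0, 2.0, 2.0, 0.5, 0.5, 0.0]])

    B = np.array([[1.0, 0.25],    # shear: tilts the letter to the right
                  [0.0, 1.0 ]])
    C = np.array([[0.5, 0.0],     # scaling: shrinks the width by half
                  [0.0, 1.0]])

    print(B @ A)          # tilted vertices, as in Figure 4.9 (left)
    print(C @ (B @ A))    # tilted and shrunk vertices, as in Figure 4.9 (right)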

Problem 4.24 For the given matrices A and B above, if you take the matrix BᵀA
instead of BA, what kind of figure do you get?

Remark: Incidentally, one can see that the composition of a rotation by π followed
by a reflection about the x-axis is the same as the composition of the reflection
followed by the rotation (see Figure 4.10). In general, a rotation and a reflection are
not commutative, and neither are a reflection and another reflection.
The above argument generally applies to a figure in any dimension. For instance,
a 3 × 3 matrix may be used to convert a figure in ℝ³, since each point has three
components.

Example 4.28 (Classifying all rotations in ℝ³) It is easy to see that the matrices

    R(x,α) = [ 1    0       0    ]   R(y,β) = [ cos β  0  -sin β ]   R(z,γ) = [ cos γ  -sin γ  0 ]
             [ 0  cos α  -sin α ]            [   0    1     0   ]            [ sin γ   cos γ  0 ]
             [ 0  sin α   cos α ],           [ sin β  0   cos β ],           [   0       0    1 ]

are the rotations about the x-, y-, z-axes by the angles α, β and γ, respectively.

[Figure 4.10. Commutativity of a rotation by π and the reflection about the x-axis]

In general, the matrix that rotates ℝ³ about a given axis appears frequently
in many applications. One can easily express such a general rotation as a composition
of the basic rotations R(x,α), R(y,β) and R(z,γ):
First, note that by choosing a unit vector u in ℝ³, one can determine an axis for a
rotation by taking the line passing through u and the origin O. (In fact, the vectors u and -u
determine the same line.) Let u = (cos α cos β, cos α sin β, sin α) for angles α and β,
the spherical coordinates of u. To find the matrix R(u,θ) of the rotation
about the u-axis by θ, we first rotate the u-axis about the z-axis into the xz-plane by
R(z,-β), and then into the x-axis by the rotation R(y,-α) about the y-axis. Then the
rotation about the u-axis is the same as the rotation about the x-axis followed by the
inverses of the above rotations: i.e., take the rotation R(x,θ) about the x-axis, and then
get back to the rotation about the u-axis via R(y,α) and R(z,β). In summary,

    R(u,θ) = R(z,β) R(y,α) R(x,θ) R(y,-α) R(z,-β).

[Figure 4.11. A rotation about the u-axis]

Problem 4.25 Find the matrix R(u, π/4) for the rotation by π/4 about the line determined by
u = (1, 1, 1).

So far, we have seen rotations, reflections, tilting (or shear) and scaling (shrinking
or enlargement), and their compositions, as linear transformations on the plane ℝ² or
the space ℝ³ for computer graphics. However, another indispensable transformation
for computer graphics is a translation: A translation is by definition a transformation
T : ℝⁿ → ℝⁿ defined by T(x) = x + x0 for any x ∈ ℝⁿ, where x0 is a fixed vector
in ℝⁿ. Unfortunately, a translation is not linear if x0 ≠ 0 and hence it cannot be
represented by an n × n matrix. To escape this disadvantage, we introduce a new coordinate
system, called a homogeneous coordinate. For brevity, we will consider only the
3-space ℝ³.
A point x = (x, y, z) ∈ ℝ³ in rectangular coordinates can be viewed as the
set of vectors (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in homogeneous
coordinates. Most of the time, we use (x, y, z, 1) as a representative of this set. Conversely,
a point (x, y, z, h), h ≠ 0, in the 4-space ℝ⁴ in homogeneous coordinates
corresponds to the point (x/h, y/h, z/h) ∈ ℝ³ in rectangular coordinates. Now,
it is possible to represent all of our transformations, including translations, as 4 × 4
matrices by using homogeneous coordinates, and this will be shown case by case as
follows.
(1) Translations: A translation T : ℝ³ → ℝ³ defined by T(x) = x + x0, where
x0 = (x0, y0, z0), can be represented by a matrix multiplication in homogeneous
coordinates as

    [ 1  0  0  x0 ] [ x ]   [ x + x0 ]
    [ 0  1  0  y0 ] [ y ] = [ y + y0 ]
    [ 0  0  1  z0 ] [ z ]   [ z + z0 ].
    [ 0  0  0  1  ] [ 1 ]   [   1    ]
(2) Rotations: With the notations in Example 4.28,

    R(x,α) = [ 1    0       0    0 ]   R(y,β) = [ cos β  0  -sin β  0 ]   R(z,γ) = [ cos γ  -sin γ  0  0 ]
             [ 0  cos α  -sin α 0 ]            [   0    1     0    0 ]            [ sin γ   cos γ  0  0 ]
             [ 0  sin α   cos α 0 ]            [ sin β  0   cos β  0 ]            [   0       0    1  0 ]
             [ 0    0       0   1 ],           [   0    0     0    1 ],           [   0       0    0  1 ]

are the rotations about the x-, y-, z-axes by the angles α, β and γ in homogeneous
coordinates, respectively.
(3) Reflections: An xy-reflection is represented by a matrix multiplication in
homogeneous coordinates as

    [ 1  0   0  0 ] [ x ]   [  x ]
    [ 0  1   0  0 ] [ y ] = [  y ].
    [ 0  0  -1  0 ] [ z ]   [ -z ]
    [ 0  0   0  1 ] [ 1 ]   [  1 ]

Similarly, one can have an xz-reflection and a yz-reflection.
(4) Shear: An xy-shear can be represented by a matrix multiplication in
homogeneous coordinates by a matrix of the form

    [ 1  a  0  0 ]
    [ b  1  0  0 ]
    [ 0  0  1  0 ],
    [ 0  0  0  1 ]

which adds a multiple of y to x and a multiple of x to y. Similarly, one can have an
xz-shear and a yz-shear with matrices of the analogous form.
(5) Scaling: A scaling is represented by a matrix multiplication in homogeneous
coordinates as

    [ sx  0   0   0 ] [ x ]   [ sx·x ]
    [ 0   sy  0   0 ] [ y ] = [ sy·y ].
    [ 0   0   sz  0 ] [ z ]   [ sz·z ]
    [ 0   0   0   1 ] [ 1 ]   [   1  ]

In summary, all of these transformations can be represented by matrix multiplica-
tions in homogeneous coordinates, and their compositions can likewise be carried out by
the corresponding matrix multiplications.
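As a sketch (my own, not from the text) of how such 4 × 4 matrices compose, the following NumPy code rotates a point about the z-axis and then translates it, all in homogeneous coordinates; the helper names translation and rotation_z are hypothetical, not from the text.

    import numpy as np

    def translation(x0, y0, z0):
        # 4 x 4 translation matrix in homogeneous coordinates
        T = np.eye(4)
        T[:3, 3] = [x0, y0, z0]
        return T

    def rotation_z(gamma):
        # 4 x 4 rotation about the z-axis in homogeneous coordinates
        c, s = np.cos(gamma), np.sin(gamma)
        return np.array([[c, -s, 0, 0],
                         [s,  c, 0, 0],
                         [0,  0, 1, 0],
                         [0,  0, 0, 1]])

    p = np.array([1.0, 2.0, 3.0, 1.0])            # the point (1, 2, 3)
    M = translation(5, 0, 0) @ rotation_z(np.pi / 2)
    print(M @ p)                                  # rotate, then translate: (3, 1, 3, 1)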

4.8 Exercises
4.1. Which of the following functions T are linear transformations?
(1) T(x, y) =
(x 2 _ i,x 2 + y2) .
(2) T(x, y, z) = (x + y, 0, 2x + 4z).
(3) T(x , y) =
(sin x , y) .
(4) T(x, y) (x + I, 2y, x + y ) .
=
(5) T(x, y, z) =
(]Ÿ], 0) .
4.2. Let T : P2(lR) P3(lR) be a linear transformation such that T(l) = 1, T(x) = x 2 and
T(x 2 ) = x 3 + x . Find T(ax 2 + bx + c) .
4.3. Find SoT and/or T 0 S whenever it is defined.

(1) T(x , y,z) = (x - y+ z, x +z), S(x, y ) = (x, x - y , y) ;


(2) T(x, y ) = (x , 3y +x, 2x - 4y , y) , S(x, y, z) = (2x, y ).
4.4. Let S : C(R) --+ C(]R) be the function on the vector space C(]R)defined by, for f E C(]R),

S(f )(x ) = f(x) _ uf (u)du .

Show that S is a linear transformation on the vector space C(]R).


4.5. Let T be a linear transformation on a vector space V such that T 2 = id and T f= id. Let
U = (v E V : T(v) = v} and W = (v E V : T(v ) = -v} . Show that
(1) at least one of U and W is a nonzero subspace of V;
(2) U nW = {O};
(3) V =U + W.
4.6. If T : ]R3 --+ ]R3 is defined by T(x , y, z) = (2x - z, 3x - 2y , x - 2y + z),
(1) determine the null space N (T) of T,
(2) determine whether T is one-to-one,
(3) find a basis for N(T) .
4.7. Show that each of the following linear transformations T on ]R3 is invertible, and find a
formula for T- 1:
(1) T (x,y ,z) = (3x , x- y, 2x+y + z).
(2) T(x, y, z) = (2x , 4x - y, 2x + 3y - z) .
4.8. Let S, T : V --+ V be linear transformations on a vector space V.
(1) Show that if T 0 S is one-to-one , then T is an isomorphism.
(2) Show that if T 0 S is onto, then T is an isomorphism.
(3) Show that if T k is an isomorphism for some positive k, then T is an isomorphism.
4.9. Let T be a linear transformation from ]R3 to ]R2, and let S be a linear transformation from
]R2 to ]R3 . Prove that the compo sition SoT is not invertible.
4.10. Let T be a linear transformation on a vector space V satisfying T - T 2 = id . Show that
T is invertible.
4.11. Let A be an n x n matrix, which is a linear transformat ion on the n-space R" by the matrix
multiplication Ax for any x E R", Suppose that rj , r2, . . . , r n are linearly independent
vectors in ]Rn constituting a parallelepiped (see Remark (2) on page 70). Then A trans-
forms this parallelepiped into another parallelepiped determined by Arl, Ar2, . . . , Arn .
Suppose that we denote the n x n matrix whose j -th column is r j by B , and the n x n
matrix whose j -th column is Ar j by C . Prove that

vol(P(CŸ = I det AI vol(P(B» .

(This means that, for a square matrix A considered as a linear transformation, the absolute
value of the determinant of A is the ratio between the volumes of a parallelepiped P(B)
and its image parallelepiped P(C) under the transformation by A.1f det A = 0, then the
image P(C) is a parallelepiped in a subspace of dimension less than n).
4.12 . Let T : ]R3 --+ ]R3 be the linear transformation given by

T (x, y ,z) = (x + y,y + z ,x + z ).

Let C denote the unit cube in ]R3 determined by the standard basis ej , e2, e3. Find the
volume of the image parallelepiped T (C) of C under T.

4.13. With respect to the ordered basis a = {I, x, x 2 } for the vector space P2(lR), find the
coordinate vector of the following polynomials:

(1) f(x) =x2 - x + I, (2) f(x) = x 2 + 4x - I, (3) f(x) = 2x + 5.


4.14. Let T : P3 (R) P3 (R) be the linear transformation defined by

Tf(x) = f"(x) - 4f'(x) + f(x).

Find the matrix [Tl a for the basis a = {x, 1 + x, x + x 2 ,


x 3 }.
2
4.15. Let T be the linear transformation on lR defined by T(x, y) (-y, x). =
(1) What is the matrix of T with respect to an ordered basis a = {VI, V2}, where vI =
(1, 2), V2 = (1, -I)?
(2) Show that for every real number c the linear transformation T - c id is invertible.
4.16. Find the matrix representation of each of the following linear transformations T on lR2
with respect to the standard basis {el, e2}.
(1) T(x , y) =
(2y, 3x - y) .
(2) T(x, y) = (3x - 4y, x + 5y).

4.17. Let M = i l
(1) Find the unique linear transformation T : lR3 lR2 so that M is the associated

!Ul [il UJ I·
matrix of T with respect to the bases

"1
(2) Find T(x, y, z).
"2 ([ n[: ]J
4.18. Find the matrix representation of each of the following linear transformations T on P2 (lR)
with respect to the basis {I, x , x 2 }.
(1) T: p(x) p(x + 1).
(2) T: p(x) p' (x) .
(3) T: p(x) p(O)x .
(4) T : p(x) p(x) - p(O).
x
4.19. Consider the following ordered bases of lR3: a = {el , e2, ej] the standard basis and
fJ = {UI = (I, I , 1), U2 = (1, 1, 0), U3 = (1,0, O)}.
(1) Find the basis-change matrix P from a to fJ .
(2) Find the basis-change matrix Q from fJ to a.
(3) Verify that Q = P- l .
(4) Show that [vlfJ = P[vl er for any vector V E lR3.
(5) Show that [T]fJ = Q-I [Tl a Q for the linear transformation T defined by
T(x,y,z)=(2y+x, x-4y, 3x).
4.20 . There are no matrices A and B in Mnxn(lR) such that AB - BA = In.

4.21. Let T : ]R3 -+ ]RZ be the linear transformation defined by

T (x , y, z) = (3x + 2y - 4z , x - 5y + 3z),
and let a ={(l , I , 1), (1,1, D), (I, D, D) } and d = {(l, 3), (2, 5)} bebasesfor]R3
and ]RZ, respect ively.
(l) Find the associated matrix for T .
(2) Verify = [T (v)]/l for any v E ]R3.
4.22 . Find the basis-change matrix from a to .8, when
(l) a = {(2, 3) , (D , I )}, .8 = {(6, 4) , (4,8)};
(2) a = {(5, 1), (I ,2) }, .8 = {(l , D), (D, I )};
(3) a= {(l, 1, 1), (l, I , D), (I, D, D)} , .8 = {(2, D, 3), (- 1, 4, I) , (3, 2, 5)};
(4) a={t , 1, t Z}, .8 = {3 +2t + t Z , t Z-4 , 2 +t}.

4.23. Show that all matri ces of the form Ae = [ BB sin BB ] are similar.
Sin - cos

4.24. Show that the matrix A = cannot be similar to a diagonal matrix.

4.25. Are the matrices i


I DID D 3
and [ - similar?

4.26. For a linear transformation T on a vector space V, show that T is one-to -one if and only
if its tran spose T * is one-to-on e.
4.27. Let T : ]R3 -+ ]R3 be the linear transformation defined by
T (x, y , z)= (2y+ z , -x+4y+ z , x +z) .
Compute [Tl a and [T*la* for the standard basis a = [e j , eZ, ej} ,
4.28. Let T be the linea r transformation from ]R3 into ]Rz defined by
T (XI, X2 , X3) = (XI +x2, 2X3 - XI ) ·
(l) For the standard ordered bases a and .8 for ]R3 and ]Rz respectivel y, find the associated
matrix for T with respect to the bases a and .8.
(2) Let a = {XI, XZ , x3} and .8 = {YI , yz}, where XI = (l, D, -I), xz = (1, 1, 1),
X3 = (1 , D, D), and YI = (D, I), yz = (I , D). Find the associated matrices and

4.29 . Let T be the linear transformation from ]R3 to ]R4 defined by


T (x , y , z )= (2x+ y+4z , x +y +2z. y + 2z, x +y + 3z) .
Find the image and the kernel of T . What is the dimen sion of Im(T)? Find and
where
a = {(1 , D, D), (D, 1, D) , (D, D, I)},
fJ = {(l ,D,D,D), (I , I , D, D), (1 , I , I ,D), (1, 1, 1, I)}.
4.30. Let T be the linear transforma tion on V = ]R3, for which the associated matrix with
respect to the standard ordered basis is

-I 3 4
Find the bases for the kernel and the image of the transpose T* on V* .

4.31. Define three linear functionals on the vector space V = P2(lR) by


II (p) = Jd p(x)dx, h(p) = J02
p(x)dx , f3(p) = Jo- p(x)dx.
I

Show that (II, 12, 13j is a basis for V* by finding its dual basis for V .
4.32. Determine whether or not the following statements are true in general, and justify your
answers .
(1) For a linear transformation T : jRn --+ jRm, Ker(T) = {OJ if m > n.
(2) For a linear transformation T : jRn --+ jRm, Ker(T) =1= {OJ if m < n .
(3) A linear transformation T : jRn --+ jRm is one-to-one if and only if the nullspace of
[T]g is {OJ for any basis ex for jRn and any basis f3 for jRm.
(4) For any linear transformation T on jRn, the dimension of the image of T is equal to
that of the row space of [T]a for any basis ex for jRn .
(5) For any two linear transformations T : V --+ W and S : W --+ Z, if Ker(S 0 T) = 0,
then Ker(T) = O.
(6) Any polynomial p(x) is linear if and only if the degree of p(x) is less than or equal
to 1.
(7) Let T : jR3 --+ jR2 be a function given as T(x) = (TI (x) , T2(X» for any x E jR3.
Then T is linear if and only if their coordinate functions Ti , i = 1, 2, are linear.
(8) For a linear transformation T : jRn --+ jRn , if [T]g = In for some bases ex and f3 of
R" , then T must be the identity transformation.
(9) If a linear transformation T : jRn --+ jRn is one-to-one , then any matrix representation
of T is nonsingular.
(10) Any m x n matrix A can be a matrix representation of a linear transformation T :
jRn --+ jRm.

(11) Every basis-change matrix is invertible.


(12) A matrix similar to a basis-change matrix is also a basis-change matrix.
(13) det : Mn xn(jR) --+ jR is a linear functional.
(14) Every translation in jRn is a linear transformation.
5
Inner Product Spaces

5.1 Dot products and inner products


To study the geometry of a vector space, we go back to the case of the 3-space ℝ³. The dot
(or Euclidean inner) product of two vectors x = (x1, x2, x3) and y = (y1, y2, y3)
in ℝ³ is the number defined by the formula

    x · y = x1y1 + x2y2 + x3y3 = [x1 x2 x3] [y1, y2, y3]^T = xᵀy,

where xᵀy is the matrix product of xᵀ and y, which is also a number identified with
the 1 × 1 matrix xᵀy. Using the dot product, the length (or magnitude) of a vector
x = (x1, x2, x3) is defined by

    ‖x‖ = (x · x)^{1/2} = √(x1² + x2² + x3²),


and the Euclidean distance between two vectors x and Y in ]R3 is defined by

d(x, y) = IIx - YII .


In this way, the dot product can be considered to be a ruler for measuring the length
of a line segment in the 3-space ]R3. Furthermore, it can also be used to measure the
angle between two nonzero vectors: in fact, the angle 0 between two vectors x and Y
in ]R3 is measured by the formula involving the dot product
x ·y
cos s = - - - 0 0 it ,
IIxllllYII '
since the dot product satisfies the formula

x· Y = IIxllllYII cos s,
In particular, two vectors x and Y are orthogonal (i.e ., they form a right angle
o= tt /2) if and only if the Pythagorean theorem holds:

HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
158 Chapter 5. Inner Product Spaces

By rewriting this formula in terms of the dot product, we obtain another equivalent
condition:

    x · y = x1y1 + x2y2 + x3y3 = 0.

In fact, this dot product is one of the most important structures with which ℝ³ is
equipped. Euclidean geometry begins with the vector space ℝ³ together with the dot
product, because the Euclidean distance can be defined by the dot product.
The dot product has a direct extension to the n-space ℝⁿ of any dimension n:
for any two vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in ℝⁿ, their dot
product, also called the Euclidean inner product, and the length (or magnitude)
of a vector are defined similarly as

    x · y = x1y1 + x2y2 + · · · + xnyn = xᵀy,
    ‖x‖ = (x · x)^{1/2} = √(x1² + x2² + · · · + xn²).

To extend this notion of the dot product to a (real) vector space, we extract the
most essential properties that the dot product in ℝⁿ satisfies and take these properties
as axioms for an inner product on a vector space V.

Definition 5.1 An inner product on a real vector space V is a function that associates
a real number ⟨x, y⟩ to each pair of vectors x and y in V in such a way that the following
rules are satisfied: For any vectors x, y and z in V and any scalar k in ℝ,
(1) ⟨x, y⟩ = ⟨y, x⟩ (symmetry),
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (additivity),
(3) ⟨kx, y⟩ = k⟨x, y⟩ (homogeneity),
(4) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 ⇔ x = 0 (positive definiteness).
A pair (V, ⟨ , ⟩) of a (real) vector space V and an inner product ⟨ , ⟩ is called a (real)
inner product space. In particular, the pair (ℝⁿ, ·) is called the Euclidean n-space.

Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the
second variable: i.e.,
(2') ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩,
(3') ⟨x, ky⟩ = k⟨x, y⟩.
It is easy to show that ⟨0, y⟩ = 0⟨0, y⟩ = 0 and also ⟨x, 0⟩ = 0.

Remark: In Definition 5.1, the rules (2) and (3) mean that the inner product is linear
in the first variable; and the rules (2') and (3') above mean that the inner product is
also linear in the second variable. In this sense, the inner product is called bilinear.

Example 5.1 (A non-Euclidean inner product on ℝ²) For any two vectors x = (x1, x2)
and y = (y1, y2) in ℝ², define

    ⟨x, y⟩ = a x1y1 + c(x1y2 + x2y1) + b x2y2 = [x1 x2] [ a  c ] [ y1 ]  =  xᵀAy,
                                                        [ c  b ] [ y2 ]

where a, b and c are arbitrary real numbers. Then this function ⟨ , ⟩ clearly satisfies
the first three rules of an inner product, i.e., ⟨ , ⟩ is symmetric and bilinear. Moreover,
if a > 0 and det A = ab - c² > 0 hold, then it also satisfies rule (4), the positive
definiteness of the inner product. (Hint: The inequality ⟨x, x⟩ = a x1² + 2c x1x2 + b x2² ≥ 0
holds if and only if either x2 = 0 or the discriminant of ⟨x, x⟩/x2², as a quadratic
polynomial in x1/x2, is nonpositive.) In the
case of c = 0, this reduces to ⟨x, y⟩ = a x1y1 + b x2y2. Notice also that a = ⟨e1, e1⟩,
b = ⟨e2, e2⟩ and c = ⟨e1, e2⟩ = ⟨e2, e1⟩.   □

Problem 5.1 In Example 5.1, the converse is also true: Prove that if ⟨x, y⟩ = xᵀAy is an inner
product on ℝ², then a > 0 and ab - c² > 0.

Example 5.2 (A case where x ≠ 0 ≠ y but ⟨x, y⟩ = 0) Let V = C[0, 1] be the vector
space of all real-valued continuous functions on [0, 1]. For any two functions f and
g in V, define

    ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.

Then ⟨ , ⟩ is an inner product on V (verify this). Let

    f(x) = { 1 - 2x  if 0 ≤ x ≤ 1/2,        g(x) = { 0       if 0 ≤ x ≤ 1/2,
           { 0       if 1/2 ≤ x ≤ 1,               { 2x - 1  if 1/2 ≤ x ≤ 1.

Then f ≠ 0 ≠ g, but ⟨f, g⟩ = 0.   □
By a subspace W of an inner product space V, we mean a subspace of the vector
space V together with the inner product that is the restriction of the inner product on
V to W.

Example 5.3 (A subspace as an inner product space) The set W = D¹[0, 1] of all
real-valued differentiable functions on [0, 1] is a subspace of V = C[0, 1]. The
restriction to W of the inner product on V defined in Example 5.2 makes W an inner
product subspace of V. However, one can define another inner product on W by the
following formula: For any two functions f(x) and g(x) in W,

    ⟨⟨f, g⟩⟩ = ∫₀¹ f(x)g(x) dx + ∫₀¹ f'(x)g'(x) dx.

Then ⟨⟨ , ⟩⟩ is also an inner product on W, which is different from the restriction to W
of the inner product of V, and hence W with this new inner product is not a subspace
of the inner product space V.   □

Remark: From vector calculus, most readers may already be familiar with the dot
product (or inner product) and the cross product (or outer product) in the
3-space ℝ³. The concept of the dot product is extended to a higher-dimensional
Euclidean space ℝⁿ in this section. However, it is known in advanced mathematics
that the cross product in the 3-space ℝ³ cannot be extended to a higher-dimensional
Euclidean space ℝⁿ. In fact, it is known that if there is a bilinear function f : ℝⁿ ×
ℝⁿ → ℝⁿ, f(x, y) = x × y, satisfying the properties that x × y is perpendicular to x and
y, and ‖x × y‖² = ‖x‖²‖y‖² - (x · y)², then n = 3 or 7. Hence, the cross product (or
outer product) will not be introduced in linear algebra.

5.2 The lengths and angles of vectors

In this section, we study a geometry of an inner product space by introducing a length ,


an angle or a distance between two vectors. The following inequality will enable us
to define an angle between two vectors in an inner product space V.

Theorem 5.1 (Cauchy-Schwarz inequality) If x and y are vectors in an inner
product space V, then

    ⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩.

Proof: If x = 0, it is clear. Assume x ≠ 0. For any scalar t, we have

    0 ≤ ⟨tx + y, tx + y⟩ = ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩.

This inequality implies that the polynomial ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩ in t has either
no real roots or a repeated real root. Therefore, its discriminant must be nonpositive:

    ⟨x, y⟩² - ⟨x, x⟩⟨y, y⟩ ≤ 0,

which implies the inequality.   □


Problem 5.2 Prove that the equalityin the Cauchy-Schwarz inequality holdsif and onlyif the
vectorsx and yare linearlydependent.

The lengths of vectors and angles between two vectors in an inner product space
are defined in a similar way to the case of the Euclidean n-space.

Definition 5.2 Let V be an inner product space.
(1) The magnitude ‖x‖ (or the length) of a vector x is defined by

    ‖x‖ = √⟨x, x⟩.

(2) The distance d(x, y) between two vectors x and y is defined by

    d(x, y) = ‖x - y‖.

(3) From the Cauchy-Schwarz inequality, we have -1 ≤ ⟨x, y⟩/(‖x‖‖y‖) ≤ 1 for any two
nonzero vectors x and y. Hence, there is a unique number θ ∈ [0, π] such that

    cos θ = ⟨x, y⟩/(‖x‖‖y‖)   or   ⟨x, y⟩ = ‖x‖‖y‖ cos θ.

Such a number θ is called the angle between x and y.

For example, the dot product in the Euclidean 3-space ]R3 defines the Euclidean
distance in ]R3. However, one can define infinitely many non-Euclidean distances in
]R3 as shown in the following example.

Example 5.4 (Infinitely many different inner products on ℝ² or ℝ³)

(1) In ℝ² equipped with an inner product (x, y) = 2x₁y₁ + 3x₂y₂, the angle
between x = (1, 1) and y = (1, 0) is computed as

    cos θ = (x, y)/(||x|| ||y||) = 2/(√5 √2) = 0.6324....

Thus, θ = cos⁻¹(2/√10). Notice that in the Euclidean 2-space ℝ² with the dot product,
the angle between x = (1, 1) and y = (1, 0) is clearly π/4 and cos(π/4) = 1/√2 =
0.7071.... It shows that the angle between two vectors actually depends on the inner
product on the vector space.

(2) For any diagonal matrix A = diag(d₁, d₂, d₃) with all dᵢ > 0, the formula

    (x, y) = xᵀAy = d₁x₁y₁ + d₂x₂y₂ + d₃x₃y₃

defines an inner product on ℝ³. Thus, there are infinitely many different inner products
on ℝ³. Moreover, an inner product in the 3-space ℝ³ may play the roles of a ruler and
a protractor in our physical world ℝ³.                                              □
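In NumPy, one can verify the computation in part (1) with a minimal sketch; the weighted
inner product is encoded by the diagonal matrix A = diag(2, 3), and the function name
angle is chosen only for illustration.

    import numpy as np

    def angle(x, y, A):
        """Angle between x and y under the inner product <x, y> = x^T A y."""
        ip = lambda u, v: u @ A @ v
        return np.arccos(ip(x, y) / np.sqrt(ip(x, x) * ip(y, y)))

    x, y = np.array([1.0, 1.0]), np.array([1.0, 0.0])

    A = np.diag([2.0, 3.0])        # weighted inner product of Example 5.4(1)
    I = np.eye(2)                  # the usual dot product

    print(np.cos(angle(x, y, A)))  # 0.6324... = 2/sqrt(10)
    print(np.cos(angle(x, y, I)))  # 0.7071... = 1/sqrt(2)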
Problem 5.3 In Example 5.4(2), show that xᵀAy cannot be an inner product if A has a negative
diagonal entry dᵢ < 0.

Problem 5.4 Prove the following properties of length in an inner product space V: For any
vectors x, y ∈ V,
(1) ||x|| ≥ 0,
(2) ||x|| = 0 if and only if x = 0,
(3) ||kx|| = |k| ||x||,
(4) ||x + y|| ≤ ||x|| + ||y|| (triangle inequality).

Problem 5.5 Let V be an inner product space. Show that for any vectors x, y and z in V,
(1) d(x, y) ≥ 0,
(2) d(x, y) = 0 if and only if x = y,
(3) d(x, y) = d(y, x),
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).

Definition 5.3 Two vectors x and y in an inner product space are said to be orthogonal
(or perpendicular) if (x, y) = 0.

Note that for nonzero vectors x and y, (x, y) = 0 if and only if θ = π/2.

Lemma 5.2 Let V be an inner product space and let x ∈ V. Then, the vector x is
orthogonal to every vector y in V (i.e., (x, y) = 0 for all y in V) if and only if x = 0.

Proof: If x = 0, clearly (x, y) = 0 for all y in V. Suppose that (x, y) = 0 for all y
in V. Then (x, x) = 0, implying x = 0 by positive definiteness.                     □

Corollary 5.3 Let V be an inner product space, and let α = {v₁, ..., vₙ} be a basis
for V. Then, a vector x in V is orthogonal to every basis vector vᵢ in α if and only if
x = 0.

Proof: If (x, vᵢ) = 0 for i = 1, 2, ..., n, then (x, y) = Σⁿᵢ₌₁ yᵢ(x, vᵢ) = 0 for any
y = Σⁿᵢ₌₁ yᵢvᵢ ∈ V.                                                                 □

Example 5.5 (Pythagorean theorem) Let V be an inner product space, and let x
and y be any two nonzero vectors in V with the angle θ between them. Then, (x, y) =
||x|| ||y|| cos θ gives the equality

    ||x + y||² = ||x||² + ||y||² + 2||x|| ||y|| cos θ.

Moreover, it deduces the Pythagorean theorem: ||x + y||² = ||x||² + ||y||² for any
orthogonal vectors x and y.                                                         □
Theorem 5.4 If x₁, x₂, ..., xₖ are nonzero mutually orthogonal vectors in an inner
product space V (i.e., each vector is orthogonal to every other vector), then they are
linearly independent.

Proof: Suppose c₁x₁ + c₂x₂ + ... + cₖxₖ = 0. Then for each i = 1, 2, ..., k,

    0 = (0, xᵢ) = (c₁x₁ + ... + cₖxₖ, xᵢ)
      = c₁(x₁, xᵢ) + ... + cᵢ(xᵢ, xᵢ) + ... + cₖ(xₖ, xᵢ)
      = cᵢ||xᵢ||²,

because x₁, x₂, ..., xₖ are mutually orthogonal. Since each xᵢ is not the zero vector,
||xᵢ|| ≠ 0; so cᵢ = 0 for i = 1, 2, ..., k.                                          □

Problem 5.6 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove

(1) [∫₀¹ f(x)g(x)dx]² ≤ [∫₀¹ f²(x)dx] [∫₀¹ g²(x)dx],

(2) [∫₀¹ (f(x) + g(x))²dx]^(1/2) ≤ [∫₀¹ f²(x)dx]^(1/2) + [∫₀¹ g²(x)dx]^(1/2).

Problem 5.7 Let V = C[0, 1] be the inner product space of all real-valued continuous func-
tions on [0, 1] equipped with the inner product

    (f, g) = ∫₀¹ f(x)g(x) dx   for any f and g in V.

For the following two functions f and g in V, compute the angle between them: For any natural
numbers k, ℓ,

(1) f(x) = kx and g(x) = ℓx,
(2) f(x) = sin 2πkx and g(x) = sin 2πℓx,
(3) f(x) = cos 2πkx and g(x) = cos 2πℓx.

5.3 Matrix representations of inner products


Let A be an n × n diagonal matrix with positive diagonal entries. Then one can
show that (x, y) = xᵀAy defines an inner product on the n-space ℝⁿ, as shown in
Example 5.4(2). The converse is also true: every inner product on a vector space can be
expressed in such a matrix product form. Let (V, ( , )) be an inner product space, and
let α = {v₁, v₂, ..., vₙ} be a fixed ordered basis for V. Then for any x = Σⁿᵢ₌₁ xᵢvᵢ
and y = Σⁿⱼ₌₁ yⱼvⱼ in V,

    (x, y) = Σⁿᵢ₌₁ Σⁿⱼ₌₁ xᵢyⱼ (vᵢ, vⱼ)

holds. If we set aᵢⱼ = (vᵢ, vⱼ) for i, j = 1, 2, ..., n, then these numbers constitute
a symmetric matrix A = [aᵢⱼ], since (vᵢ, vⱼ) = (vⱼ, vᵢ). Thus, in matrix notation, the
inner product may be written as

    (x, y) = Σⁿᵢ₌₁ Σⁿⱼ₌₁ xᵢyⱼaᵢⱼ = [x]αᵀ A [y]α.

The matrix A is called the matrix representation of the inner product ( , ) with
respect to the basis α.

Example 5.6 (Matrix representation of an inner product)

(1) With respect to the standard basis {e₁, ..., eₙ} for the Euclidean n-
space ℝⁿ, the matrix representation of the dot product is the identity matrix, since
eᵢ · eⱼ = δᵢⱼ. Thus, for x = Σᵢ xᵢeᵢ and y = Σⱼ yⱼeⱼ ∈ ℝⁿ, the dot product is just
the matrix product xᵀy.

(2) On V = P₂([0, 1]), we define an inner product on V as

    (f, g) = ∫₀¹ f(x)g(x)dx.

Then for a basis α = {f₁(x) = 1, f₂(x) = x, f₃(x) = x²} for V, one can easily
find its matrix representation A = [aᵢⱼ]. For instance,

    a₂₃ = (f₂, f₃) = ∫₀¹ f₂(x)f₃(x)dx = ∫₀¹ x · x² dx = 1/4.
                                                                                    □
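The full matrix representation in part (2) can be checked with a short NumPy sketch; the
entries aᵢⱼ = ∫₀¹ fᵢ(x)fⱼ(x)dx come out as 1/(i + j − 1), and the helper name ip below is
chosen only for illustration.

    import numpy as np
    from numpy.polynomial import polynomial as P

    # Basis f1 = 1, f2 = x, f3 = x^2, given by coefficient lists.
    basis = [np.array([1.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0, 1.0])]

    def ip(f, g):
        """Inner product <f, g> = integral over [0, 1] of f(x) g(x) dx."""
        anti = P.polyint(P.polymul(f, g))          # antiderivative of the product
        return P.polyval(1.0, anti) - P.polyval(0.0, anti)

    A = np.array([[ip(f, g) for g in basis] for f in basis])
    print(A)     # [[1, 1/2, 1/3], [1/2, 1/3, 1/4], [1/3, 1/4, 1/5]]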

The expression of the dot product as a matrix product is very useful in stating or
proving theorems in the Euclidean space.
For any symmetric matrix A and for a fixed basis α, the formula (x, y) =
[x]αᵀ A [y]α seems to give rise to an inner product on V. In fact, the formula clearly is
symmetric and bilinear, but does not necessarily satisfy the fourth rule, positive defi-
niteness. The following theorem gives a necessary condition for a symmetric matrix
A to give rise to an inner product. Some necessary and sufficient conditions will be
discussed in Chapter 8.

Theorem 5.5 The matrix representation A of an inner product (with respect to any
basis) on a vector space V is invertible. That is, det A ≠ 0.

Proof: Let (x, y) = [x]αᵀ A [y]α be an inner product on a vector space V with respect
to a basis α, and consider A[y]α = 0 as a homogeneous system of linear equations. Then

    (y, y) = [y]αᵀ A [y]α = 0.

It implies that A[y]α = 0 has only the trivial solution y = 0, or equivalently A is
invertible by Theorem 1.9.                                                          □

Recall that the conditions a > 0 and det A = ab − c² > 0 in Example 5.1 are
sufficient for A to give rise to an inner product on ℝ².

5.4 Gram-Schmidt orthogonalization


The standard basis for the Euclidean n-space Rn has a special property: The basis
vectors are mutually orthogonal and are of length 1. In this sense, it is called the

rectangular coordinate system for ℝⁿ. In an inner product space, a vector of
length 1 is called a unit vector. If x is a nonzero vector in an inner product space
V, the vector x/||x|| is a unit vector. The process of obtaining a unit vector from a
nonzero vector by multiplying it by the reciprocal of its length is called normalization.
Thus, if there is a set of mutually orthogonal vectors (or a basis) in an inner product
space, then the vectors can be converted to unit vectors by normalizing them without
losing their mutual orthogonality.

Definition 5.4 A set of vectors x₁, x₂, ..., xₖ in an inner product space V is said
to be orthonormal if
    (xᵢ, xⱼ) = 0 for i ≠ j    (orthogonality),
    ||xᵢ|| = 1 for each i      (normality).

A set {x₁, x₂, ..., xₙ} of vectors is called an orthonormal basis for V if it is a basis
and orthonormal.
Problem 5.8 Determine whether each of the following sets of vectors in ℝ² is orthogonal,
orthonormal, or neither with respect to the Euclidean inner product.

(I) {[ ] ,[ ]}

(3) {[ l [- ]} (4) I/J2 ] [-I/J2]}


{[ 1/J2 ' 1/J2

It will be shown later in Theorem 5.6 that every inner product space has an
orthonormal basis , just like the standard basis for the Euclidean n-space jRn.
The following example illustrates how to construct such an orthonormal basis.

Example 5.7 (How to construct an orthonormal basis?) For the matrix

    A = [c₁ c₂ c₃] with columns c₁ = (1, 1, 1, 1), c₂ = (1, 2, 0, 1), c₃ = (2, 2, 4, 0),

find an orthonormal basis for the column space C(A) of A.

Solution: Let c₁, c₂ and c₃ be the column vectors of A in the order from left to
right. It is easily verified that they are linearly independent, so they form a basis for
the column space C(A) of dimension 3 in ℝ⁴. For notational convenience, we denote
by Span{x₁, ..., xₖ} the subspace spanned by {x₁, ..., xₖ}.
(1) First normalize c₁ to get

    u₁ = c₁/||c₁|| = c₁/2 = (1/2, 1/2, 1/2, 1/2),

which is a unit vector. Then Span{u₁} = Span{c₁}, because one is a scalar multiple
of the other.
(2) Noting that the vector c₂ − (u₁, c₂)u₁ = c₂ − 2u₁ = (0, 1, −1, 0) is a
nonzero vector orthogonal to u₁, we set

    u₂ = (c₂ − 2u₁)/||c₂ − 2u₁|| = (0, 1/√2, −1/√2, 0).

Then, {u₁, u₂} is orthonormal and Span{u₁, u₂} = Span{c₁, c₂}, because each uᵢ is
a linear combination of c₁ and c₂, and the converse is also true.
(3) Finally, note that c₃ − (u₁, c₃)u₁ − (u₂, c₃)u₂ = c₃ − 4u₁ + √2 u₂ =
(0, 1, 1, −2) is also a nonzero vector orthogonal to both u₁ and u₂. In fact,

    (u₁, c₃ − 4u₁ + √2 u₂) = (u₁, c₃) − 4(u₁, u₁) + √2(u₁, u₂) = 0,
    (u₂, c₃ − 4u₁ + √2 u₂) = (u₂, c₃) − 4(u₂, u₁) + √2(u₂, u₂) = 0.

By normalization, the vector

    u₃ = (0, 1, 1, −2)/√6 = (0, 1/√6, 1/√6, −2/√6)

is a unit vector, and one can also show that Span{u₁, u₂, u₃} = Span{c₁, c₂, c₃} =
C(A). Consequently, {u₁, u₂, u₃} is an orthonormal basis for C(A).                  □
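The three steps above can be verified with a minimal Gram-Schmidt sketch in NumPy; the
function gram_schmidt is written here only for illustration and assumes its input vectors are
linearly independent.

    import numpy as np

    def gram_schmidt(columns):
        """Return an orthonormal list obtained from linearly independent vectors."""
        ortho = []
        for c in columns:
            v = c - sum((u @ c) * u for u in ortho)   # subtract projections onto earlier u's
            ortho.append(v / np.linalg.norm(v))       # normalize
        return ortho

    c1 = np.array([1.0, 1.0, 1.0, 1.0])
    c2 = np.array([1.0, 2.0, 0.0, 1.0])
    c3 = np.array([2.0, 2.0, 4.0, 0.0])

    u1, u2, u3 = gram_schmidt([c1, c2, c3])
    print(u1)   # [0.5 0.5 0.5 0.5]
    print(u2)   # (0, 1/sqrt(2), -1/sqrt(2), 0)
    print(u3)   # (0, 1/sqrt(6), 1/sqrt(6), -2/sqrt(6))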

The orthonormalization process in Example 5.7 indicates how to prove the fol-
lowing general case, called the Gram-Schmidt orthogonalization.

Theorem 5.6 Every inner product space has an orthonormal basis.

Proof: [Gram-Schmidt orthogonalization process] Let {x₁, x₂, ..., xₙ} be a basis
for an n-dimensional inner product space V. Let

    u₁ = x₁/||x₁||.

Of course, x₂ − (x₂, u₁)u₁ ≠ 0, because {x₁, x₂} is linearly independent. Generally,
one can define, by induction on k = 1, 2, ..., n,

    uₖ = vₖ/||vₖ||,   where vₖ = xₖ − (xₖ, u₁)u₁ − ... − (xₖ, uₖ₋₁)uₖ₋₁.

Then, as Example 5.7 shows, the vectors u₁, u₂, ..., uₙ are orthonormal in the
n-dimensional vector space V. Since every orthonormal set is linearly independent,
it is an orthonormal basis for V.                                                   □

Problem 5.9 Use the Gram-Schmidt orthogonalization on the Euclidean space ℝ⁴ to transform
the basis
    {(0, 1, 1, 0), (−1, 1, 0, 0), (1, 2, 0, −1), (−1, 0, 0, −1)}
into an orthonormal basis.

Problem 5.10 Find an orthonormal basis for the subspace W of the Euclidean space ℝ³ given
by x + 2y − z = 0.

Problem 5.11 Let V = C[0, 1] with the inner product

    (f, g) = ∫₀¹ f(x)g(x)dx   for any f and g in V.

Find an orthonormal basis for the subspace spanned by 1, x and x².

The next theorem shows that an orthonormal basis acts just like the standard
basis for the Euclidean n-space ℝⁿ.

Theorem 5.7 Let {u₁, u₂, ..., uₖ} be an orthonormal basis for a subspace U in an
inner product space V. Then, for any vector x in U,

    x = (u₁, x)u₁ + (u₂, x)u₂ + ... + (uₖ, x)uₖ.

Proof: For any vector x ∈ U, one can write x = x₁u₁ + x₂u₂ + ... + xₖuₖ as a
linear combination of the basis vectors. However, for each i = 1, ..., k,

    (uᵢ, x) = (uᵢ, x₁u₁ + ... + xₖuₖ)
            = x₁(uᵢ, u₁) + ... + xᵢ(uᵢ, uᵢ) + ... + xₖ(uᵢ, uₖ)
            = xᵢ,

because {u₁, u₂, ..., uₖ} is orthonormal.                                           □

In particular, if α = {v₁, v₂, ..., vₙ} is an orthonormal basis for V, then any
vector x in V can be written uniquely as

    x = (v₁, x)v₁ + (v₂, x)v₂ + ... + (vₙ, x)vₙ.

Moreover, one can identify an n-dimensional inner product space V with the
Euclidean n-space ℝⁿ. Let α = {v₁, v₂, ..., vₙ} be an orthonormal basis for the
space V. With this orthonormal basis α, the natural isomorphism Φ : V → ℝⁿ given
by Φ(vᵢ) = [vᵢ]α = eᵢ, i = 1, 2, ..., n, preserves the inner product on vectors: For a
vector x = Σⁿᵢ₌₁ xᵢvᵢ in V with xᵢ = (x, vᵢ), the coordinate vector of x with respect
to α is the column matrix [x]α = (x₁, x₂, ..., xₙ) ∈ ℝⁿ.
Moreover, for another vector y = Σⁿᵢ₌₁ yᵢvᵢ in V,

    (x, y) = Σⁿᵢ₌₁ xᵢyᵢ.

The right-hand side of this equation is just the dot product of the coordinate vectors in
the Euclidean space ℝⁿ. That is,

    (x, y) = [x]α · [y]α = Φ(x) · Φ(y)

for any x, y ∈ V. Hence, the natural isomorphism Φ preserves the inner product,
and we have the following theorem (compare with Corollary 4.8(1)).

Theorem 5.8 Any n-dimensional inner product space V with an inner product ( , ) is
isomorphic to the Euclidean n-space ℝⁿ with the dot product, in the sense that the
isomorphism preserves the inner product.

In this sense, one may restrict the study of an inner product space to the
case of the Euclidean n-space ℝⁿ with the dot product.
A special kind of linear transformation that preserves the inner product, such as
the natural isomorphism from V to ℝⁿ, plays an important role in linear algebra, and
it will be studied in detail in Section 5.8.

5.5 Projections

Let U be a subspace of a vector space V. Then, by Corollary 3.13 there is another
subspace W of V such that V = U ⊕ W, so that any x ∈ V has a unique expression as
x = u + w for u ∈ U and w ∈ W. As an easy exercise, one can show that the function
T : V → V defined by T(x) = T(u + w) = u is a linear transformation, whose
image Im(T) = T(V) is the subspace U and whose kernel Ker(T) is the subspace W.

Definition 5.5 Let U and W be subspaces of a vector space V. A linear transformation
T : V → V is called the projection of V onto the subspace U along W if V = U ⊕ W
and T(x) = u for x = u + w ∈ U ⊕ W.

Example 5.8 (Infinitely many different projections of ℝ² onto the x-axis) Let X, Y
and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ² spanned by the
vectors e₁, e₂, and v = e₁ + e₂ = (1, 1), respectively:

    X = {re₁ : r ∈ ℝ} = x-axis,
    Y = {re₂ : r ∈ ℝ} = y-axis,
    Z = {r(e₁ + e₂) : r ∈ ℝ}.

Since the pairs {e₁, e₂} and {e₁, v} are linearly independent, the space ℝ² can be
expressed as the direct sum in two ways: ℝ² = X ⊕ Y = X ⊕ Z.

    Figure 5.1. Two decompositions of ℝ²

Thus, a vector x = (2, 1) ∈ ℝ² may be written in two ways:

    x = (2, 1) = 2(1, 0) + (0, 1) ∈ X ⊕ Y = ℝ², or
    x = (2, 1) = (1, 0) + (1, 1) ∈ X ⊕ Z = ℝ².

Let Tx and Sx denote the projections of ℝ² onto X along Y and Z, respectively. Then

    Tx(x) = 2(1, 0) ∈ X,   Ty(x) = (0, 1) ∈ Y,   and
    Sx(x) = (1, 0) ∈ X,    Sz(x) = (1, 1) ∈ Z.

This shows that a projection of ℝ² onto the subspace X depends on the choice of
complementary subspace of X. For example, by choosing Zₙ = {r(n, 1) : r ∈ ℝ}
for any integer n as a complementary subspace, one can construct infinitely many
different projections of ℝ² onto the x-axis.                                        □

Note that for a given subspace U of V, a projection T of V onto U depends on
the choice of a complementary subspace W of U, as shown in Example 5.8. However,
by definition, T(u) = u for any u ∈ U and for any choice of W. That is, T ∘ T = T
for every projection T of V.
The following theorem gives an algebraic characterization of when a linear transfor-
mation is a projection.

Theorem 5.9 A linear transformation T : V → V is a projection if and only if
T = T² (= T ∘ T by definition).

Proof: The necessity is clear, because T ∘ T = T for any projection T.
For the sufficiency, suppose T² = T. It suffices to show that V = Im(T) ⊕ Ker(T)
and T(u + w) = u for any u + w ∈ Im(T) ⊕ Ker(T). First, one needs to prove
Im(T) ∩ Ker(T) = {0} and V = Im(T) + Ker(T). Indeed, if y ∈ Im(T) ∩ Ker(T),
then there exists x ∈ V such that T(x) = y and T(y) = 0. It implies

    y = T(x) = T²(x) = T(T(x)) = T(y) = 0.

The hypothesis T² = T also shows that T(v) ∈ Im(T) and v − T(v) ∈ Ker(T)
for any v ∈ V. It implies V = Im(T) + Ker(T). Finally, note that T(u + w) =
T(u) + T(w) = T(u) = u for any u + w ∈ Im(T) ⊕ Ker(T).                              □

Let T : V → V be a projection, so that V = Im(T) ⊕ Ker(T). It is not difficult
to show that Im(id_V − T) = Ker(T) and Ker(id_V − T) = Im(T) for the identity
transformation id_V on V.

Corollary 5.10 A linear transformation T : V → V is a projection if and only if
id_V − T is a projection. Moreover, if T is the projection of V onto a subspace U
along W, then id_V − T is the projection of V onto W along U.

Proof: It is enough to show that (id_V − T) ∘ (id_V − T) = id_V − T. But

    (id_V − T) ∘ (id_V − T) = (id_V − T) − (T − T²) = id_V − T.                     □

Problem 5.12 For V = U ⊕ W, let T_U denote the projection of V onto U along W, and let
T_W denote the projection of V onto W along U. Prove the following.

(1) For any x ∈ V, x = T_U(x) + T_W(x).
(2) T_U ∘ (id_V − T_U) = 0.
(3) T_U ∘ T_W = T_W ∘ T_U = 0.
(4) For any projection T : V → V, Im(id_V − T) = Ker(T) and Ker(id_V − T) = Im(T).

5.6 Orthogonal projections


Let U be a subspace of a vector space V. As shown in Example 5.8, we learn that there
are infinitely many projections of V onto U which depend on a choice of complemen-
tary subspace W of U. However, if V is an inner product space, there is a particular
choice of complementary subspace W, called the orthogonal complement of U, along
which the projection onto U is called the orthogonal projection, defined below. To
show this, we first extend the orthogonality of two vectors to an orthogonality of two
subspaces.

Definition 5.6 Let U and W be subspaces of an inner product space V.

(1) Two subspaces U and W are said to be orthogonal, written U ⊥ W, if (u, w) = 0
for each u ∈ U and w ∈ W.
(2) The set of all vectors in V that are orthogonal to every vector in U is called the
orthogonal complement of U, denoted by U^⊥, i.e.,

    U^⊥ = {v ∈ V : (v, u) = 0 for all u ∈ U}.



One can easily show that U^⊥ is a subspace of V, and v ∈ U^⊥ if and only if
(v, u) = 0 for every u ∈ β, where β is a basis for U. Moreover, W ⊥ U if and only
if W ⊆ U^⊥.

Problem 5.13 Let U and W be subspaces of an inner product space V. Show that

(1) If U ⊥ W, then U ∩ W = {0}.   (2) U ⊆ W if and only if W^⊥ ⊆ U^⊥.

Theorem 5.11 Let U be a subspace of an inner product space V. Then

(1) (U^⊥)^⊥ = U.
(2) V = U ⊕ U^⊥: that is, for each x ∈ V, there exist unique vectors x_U ∈ U and
x_{U^⊥} ∈ U^⊥ such that x = x_U + x_{U^⊥}. This is called the orthogonal decomposition
of V (or of x) by U.

Proof: Let dim U = k. To show (U^⊥)^⊥ = U, take an orthonormal basis for U, say
α = {v₁, v₂, ..., vₖ}, by the Gram-Schmidt orthogonalization, and then extend it to
an orthonormal basis for V, say β = {v₁, v₂, ..., vₖ, vₖ₊₁, ..., vₙ}, which is always
possible. Then, clearly γ = {vₖ₊₁, ..., vₙ} forms an (orthonormal) basis for U^⊥,
which means that (U^⊥)^⊥ = U and V = U ⊕ U^⊥.                                        □

Definition 5.7 Let U be a subspace of an inner product space V, and let {u₁, u₂, ...,
uₘ} be an orthonormal basis for U. The orthogonal projection Proj_U from V onto
the subspace U is defined by

    Proj_U(x) = (u₁, x)u₁ + (u₂, x)u₂ + ... + (uₘ, x)uₘ

for any x ∈ V.

Clearly, Proj_U is linear and a projection, because Proj_U ∘ Proj_U = Proj_U. More-
over, Proj_U(x) ∈ U and x − Proj_U(x) ∈ U^⊥, because

    (x − Proj_U(x), uᵢ) = (x, uᵢ) − (Proj_U(x), uᵢ) = (x, uᵢ) − (uᵢ, x) = 0

for every basis vector uᵢ. Hence, by Theorem 5.7, we have

Corollary 5.12 The orthogonal projection Proj_U is the projection of V onto a sub-
space U along its orthogonal complement U^⊥.

Therefore, in Definition 5.7, the projection Proj_U(x) is independent of the choice
of an orthonormal basis for the subspace U. In this sense, it is called the orthogonal
projection from the inner product space V onto the subspace U.
Almost all projections used in linear algebra are orthogonal projections.

Example 5.9 (The orthogonal projection from ℝ³ onto the xy-plane) In the Euclidean
3-space ℝ³, let U be the xy-plane with the orthonormal basis α = {e₁, e₂}. Then, the
orthogonal projection Proj_U(x) = (e₁, x)e₁ + (e₂, x)e₂ is the orthogonal projection
onto the xy-plane in the usual sense in geometry, and x − Proj_U(x) ∈ U^⊥, which is
the z-axis. It actually means that Proj_U(x₁, x₂, x₃) = (x₁, x₂, 0) for any x = (x₁, x₂, x₃) ∈ ℝ³.
                                                                                    □

Example 5.10 (The orthogonal projection from ℝ² onto the x-axis) As in Exam-
ple 5.8, let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space
ℝ² spanned by the vectors e₁, e₂, and v = e₁ + e₂ = (1, 1), respectively. Then
clearly Y = X^⊥ and Z ≠ X^⊥. And, for the projections Tx and Sx of ℝ² given in
Example 5.8, Tx is the orthogonal projection, but Sx is not, so that Tx = Proj_X and
Sx ≠ Proj_X.                                                                         □
Theorem 5.13 Let U be a subspace of an inner product space V, and let x ∈ V.
Then, the orthogonal projection Proj_U(x) of x satisfies

    ||x − Proj_U(x)|| ≤ ||x − y||

for all y ∈ U. The equality holds if and only if y = Proj_U(x).

Proof: First, note that for any vector x ∈ V, we have Proj_U(x) ∈ U and x −
Proj_U(x) ∈ U^⊥. Thus, for all y ∈ U,

    ||x − y||² = ||(x − Proj_U(x)) + (Proj_U(x) − y)||²
              = ||x − Proj_U(x)||² + ||Proj_U(x) − y||²
              ≥ ||x − Proj_U(x)||²,

where the second equality comes from the Pythagorean theorem for the orthogonality
(x − Proj_U(x)) ⊥ (Proj_U(x) − y). (See Figure 5.2.)                                 □

It follows from Theorem 5.13 that the orthogonal projection Proj_U(x) of x is the
unique vector in U that is closest to x in the sense that it minimizes the distance
from x to the vectors in U. It also shows that, in Definition 5.7, the vector Proj_U(x) is
independent of the choice of an orthonormal basis for the subspace U. Geometrically,
Figure 5.2 depicts the vector Proj_U(x).

Problem 5.14 Find the point on the plane x − y − z = 0 that is closest to p = (1, 2, 0).

Problem 5.15 Let U ⊂ ℝ⁴ be the subspace of the Euclidean 4-space ℝ⁴ spanned by
(1, 1, 0, 0) and (1, 0, 1, 0), and let W ⊂ ℝ⁴ be the subspace spanned by (0, 1, 0, 1) and
(0, 0, 1, 1). Find a basis for and the dimension of each of the following subspaces:
(1) U + W,   (2) U^⊥,   (3) U^⊥ + W^⊥,   (4) U ∩ W.

    Figure 5.2. Orthogonal projection Proj_U

Problem 5.16 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)^⊥ = U^⊥ ∩ W^⊥.   (2) (U ∩ W)^⊥ = U^⊥ + W^⊥.

As a particular case, let V = ℝⁿ be the Euclidean n-space with the dot product,
and let U = {ru : r ∈ ℝ} be a 1-dimensional subspace determined by a unit vector
u. Then for a vector x in ℝⁿ, the orthogonal projection of x onto U is

    Proj_U(x) = (u · x)u = (uᵀx)u = u(uᵀx) = (uuᵀ)x.

(Here, the last two equalities come from the facts that u · x = uᵀx is a scalar and the
associativity of the matrix product uuᵀx, respectively.) This equation shows that the
matrix representation of the orthogonal projection Proj_U with respect to the standard
basis α is

    [Proj_U]_α = uuᵀ.

If U is an m-dimensional subspace of ℝⁿ with an orthonormal basis {u₁, u₂, ...,
uₘ}, then for any x ∈ ℝⁿ,

    Proj_U(x) = (u₁ · x)u₁ + (u₂ · x)u₂ + ... + (uₘ · x)uₘ
              = u₁(u₁ᵀx) + u₂(u₂ᵀx) + ... + uₘ(uₘᵀx)
              = (u₁u₁ᵀ + u₂u₂ᵀ + ... + uₘuₘᵀ)x.

Thus, the matrix representation of the orthogonal projection Proj_U with respect to the
standard basis α is

    [Proj_U]_α = u₁u₁ᵀ + u₂u₂ᵀ + ... + uₘuₘᵀ.

Definition 5.8 The matrix representation [Proj_U]_α of the orthogonal projection
Proj_U : ℝⁿ → ℝⁿ of ℝⁿ onto a subspace U with respect to the standard basis α
is called the (orthogonal) projection matrix on U.
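As a quick numerical illustration of the formula [Proj_U]_α = u₁u₁ᵀ + ... + uₘuₘᵀ, the
following minimal sketch uses the orthonormal vectors u₁, u₂ of Example 5.7 and also checks
the properties Pᵀ = P and P² = P that will be characterized later in Theorem 5.29.

    import numpy as np

    u1 = np.array([0.5, 0.5, 0.5, 0.5])
    u2 = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)

    # Projection matrix onto U = Span{u1, u2}: P = u1 u1^T + u2 u2^T.
    P = np.outer(u1, u1) + np.outer(u2, u2)

    x = np.array([1.0, 2.0, 3.0, 4.0])
    print(P @ x)                      # Proj_U(x)
    print(np.allclose(P, P.T))        # True: P is symmetric
    print(np.allclose(P @ P, P))      # True: P is idempotent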

Further discussions about the orthogonal projection matrices will be continued in


Section 5.9.3.

Example 5.11 (Distance from a point to a line) Let ax + by + c = 0 be a line L in the
plane ℝ². (Note that the line L cannot be a subspace of ℝ² if c ≠ 0.) For any two points
Q = (x₁, y₁) and R = (x₂, y₂) on the line, the equality a(x₂ − x₁) + b(y₂ − y₁) = 0
shows that the nonzero vector n = (a, b) is perpendicular to the line L, that is,
QR ⊥ n.
Let P = (x₀, y₀) be any point in the plane ℝ². Then the distance d between the
point P and the line L is simply the length of the orthogonal projection of QP onto
n, for any point Q = (x₁, y₁) on the line. Thus,

    d = ||Proj_n(QP)||
      = |QP · n/||n|||   (the dot product)
      = |a(x₀ − x₁) + b(y₀ − y₁)| / √(a² + b²)
      = |ax₀ + by₀ + c| / √(a² + b²).

    Figure 5.3. Distance from a point to a line

Note that the last equality is due to the fact that the point Q is on the line (i.e.,
ax₁ + by₁ + c = 0).
To find the orthogonal projection matrix, let u = n/||n|| = (a, b)/√(a² + b²). Then the
orthogonal projection matrix onto U = {ru : r ∈ ℝ} is

    uuᵀ = 1/(a² + b²) [a; b][a  b] = 1/(a² + b²) [a²  ab; ab  b²].

Thus, if x = (1, 1) ∈ ℝ², then

    Proj_U(x) = (uuᵀ)x = 1/(a² + b²) [a² + ab; ab + b²] = (a + b)/(a² + b²) (a, b).
                                                                                    □
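The distance formula can be checked numerically; in the following minimal sketch the
particular line x + y − 2 = 0 and the point (0, 0) are chosen only for illustration.

    import numpy as np

    def distance_point_to_line(a, b, c, p):
        """Distance from point p = (x0, y0) to the line ax + by + c = 0 in R^2."""
        x0, y0 = p
        return abs(a * x0 + b * y0 + c) / np.hypot(a, b)

    # The line x + y - 2 = 0 and the point (0, 0); the distance is 2/sqrt(2).
    print(distance_point_to_line(1.0, 1.0, -2.0, (0.0, 0.0)))   # 1.4142...

    # Projection matrix onto the direction u = n/||n|| with n = (a, b) = (1, 1).
    n = np.array([1.0, 1.0])
    u = n / np.linalg.norm(n)
    P = np.outer(u, u)
    print(P @ np.array([1.0, 1.0]))   # projection of (1, 1) onto Span{n}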

Problem 5.17 Let V = P₃(ℝ) be the vector space of polynomials of degree ≤ 3 equipped
with the inner product

    (f, g) = ∫₀¹ f(x)g(x) dx   for any f and g in V.

Let W be the subspace of V spanned by {1, x}, and define f(x) = x². Find the orthogonal
projection Proj_W(f) of f onto W.

5.7 Relations of fundamental subspaces

We now go back to the study of a system Ax = b of linear equations with an m × n
matrix A. One of the most important applications of the orthogonal projection of
vectors onto a subspace is the decomposition of the domain space and the image
space of A by the four fundamental subspaces N(A), R(A) in ℝⁿ and C(A), N(Aᵀ)
in ℝᵐ (see Theorem 5.16). From these decompositions, one can completely determine
the solution set of a consistent system Ax = b.

Lemma 5.14 For an m × n matrix A, the null space N(A) and the row space R(A)
are orthogonal: i.e., N(A) ⊥ R(A) in ℝⁿ. Similarly, N(Aᵀ) ⊥ C(A) in ℝᵐ.

Proof: Note that w ∈ N(A) if and only if Aw = 0, i.e., for every row vector r in A,
r · w = 0. For the second statement, do the same with Aᵀ.                           □

From Lemma 5.14, it is clear that

    N(A) ⊆ R(A)^⊥ (or R(A) ⊆ N(A)^⊥), and
    N(Aᵀ) ⊆ C(A)^⊥ (or C(A) ⊆ N(Aᵀ)^⊥).

Moreover, by comparing the dimensions of these subspaces and by using Theo-
rem 5.11 and the Rank Theorem 3.17, we have

    dim R(A) + dim N(A) = n = dim R(A) + dim R(A)^⊥,
    dim C(A) + dim N(Aᵀ) = m = dim C(A) + dim C(A)^⊥.

This means that the inclusions are actually equalities.

Lemma 5.15 (1) N(A) = R(A)^⊥ (or R(A) = N(A)^⊥).
(2) N(Aᵀ) = C(A)^⊥ (or C(A) = N(Aᵀ)^⊥).

Thus, the row space R(A) is the orthogonal complement of the null space
N(A) in ℝⁿ, and vice versa. Similarly, the same thing happens for the column space
C(A) and the null space N(Aᵀ) of Aᵀ in ℝᵐ. Hence, by Theorem 5.11, we have the
following orthogonal decomposition.

Theorem 5.16 For any m × n matrix A,

(1) N(A) ⊕ R(A) = ℝⁿ,
(2) N(Aᵀ) ⊕ C(A) = ℝᵐ.

Note that if rank A = r, so that dim R(A) = r = dim C(A), then dim N(A) =
n − r and dim N(Aᵀ) = m − r. Considering the matrix A as a linear transformation
A : ℝⁿ → ℝᵐ, Figure 5.4 depicts Theorem 5.16.

    Figure 5.4. Relations of the four fundamental subspaces

Corollary 5.17 The set of solutions of a consistent system Ax = b is precisely
x₀ + N(A), where x₀ is any solution of Ax = b.

Proof: Let x₀ ∈ ℝⁿ be a solution of the system Ax = b. Now consider the set x₀ +
N(A), which is just a translation of N(A) by x₀.
(1) Any vector x₀ + n in x₀ + N(A) is also a solution, because A(x₀ + n) =
Ax₀ + An = Ax₀ = b.
(2) If x is another solution, then clearly x − x₀ is in the null space N(A), so that
x = x₀ + n for some n ∈ N(A), i.e., x ∈ x₀ + N(A).                                   □

In particular, if rank A = m (so that m ≤ n), then C(A) = ℝᵐ. Thus, for any
b ∈ ℝᵐ, the system Ax = b has a solution in ℝⁿ. (This is the case of the existence
Theorem 3.24.)
On the other hand, if rank A = n (so that n ≤ m), then N(A) = {0} and
R(A) = ℝⁿ. Therefore, the system Ax = b has at most one solution, that is, it
has a unique solution x in R(A) if b ∈ C(A), and has no solution if b ∉ C(A).
(This is the case of the uniqueness Theorem 3.25.) The latter case may occur when
m > r = rank A; that is, N(Aᵀ) is a nontrivial subspace of ℝᵐ, and will be discussed
later in Section 5.9.1.

Problem 5.18 Prove the following statements.

(1) If Ax = b and Aᵀy = 0, then yᵀb = 0, i.e., y ⊥ b.
(2) If Ax = 0 and Aᵀy = c, then xᵀc = 0, i.e., x ⊥ c.

Problem 5.19 Given two vectors (1, 2, 1, 2) and (0, −1, −1, 1) in ℝ⁴, find all vectors in
ℝ⁴ that are perpendicular to them.

Problem 5.20 Find a basis for the orthogonal complement of the row space of A :

1 28] 001]
(1) A =
[ 23 0-1 61 , (2) A =
[0 0 I .
111

5.8 Orthogonal matrices and isometries


In Chapter 4, we saw that a linear transformation can be associated with a matrix, and
vice versa. In this section, we are mainly interested in those linear transformations
(or matrices) that preserve the lengths of vectors in an inner product space.
Let A = [c₁ ... cₙ] be an n × n square matrix with columns c₁, ..., cₙ. Then,
a simple computation shows that the (i, j)-entry of AᵀA is cᵢᵀcⱼ, the dot product of
the i-th and j-th columns of A.
Hence, if the column vectors are orthonormal, cᵢᵀcⱼ = δᵢⱼ, then AᵀA = Iₙ, that is,
Aᵀ is a left inverse of A, and vice versa. Since A is a square matrix, this left inverse
must be the right inverse of A, i.e., AAᵀ = Iₙ. Equivalently, the row vectors of A are
also orthonormal. This argument can be summarized as follows.

Lemma 5.18 For an n × n matrix A, the following are equivalent.

(1) The column vectors of A are orthonormal.
(2) AᵀA = Iₙ.
(3) Aᵀ = A⁻¹.
(4) AAᵀ = Iₙ.
(5) The row vectors of A are orthonormal.

Definition 5.9 A square matrix A is called an orthogonal matrix if A satisfies one


(and hence all) of the statements in Lemma 5.18.

Clearly, A is orthogonal if and only if AT is orthogonal.



Example 5.12 (Rotations and reflections are orthogonal) The matrices

    A = [cos θ  −sin θ; sin θ  cos θ],   B = [cos θ  sin θ; sin θ  −cos θ]

are orthogonal, and satisfy

    A⁻¹ = Aᵀ = [cos θ  sin θ; −sin θ  cos θ],   B⁻¹ = Bᵀ = [cos θ  sin θ; sin θ  −cos θ] = B.

Note that the linear transformation T : ℝ² → ℝ² defined by T(x) = Ax is a rotation
through the angle θ, while S : ℝ² → ℝ² defined by S(x) = Bx is the reflection about
the line passing through the origin that forms an angle θ/2 with the positive x-axis. □
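A short numerical check that the rotation matrix above is orthogonal and preserves lengths
(the angle and the test vector below are arbitrary choices for illustration):

    import numpy as np

    theta = 0.7                     # an arbitrary angle
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s],
                  [s,  c]])         # rotation through theta

    print(np.allclose(A.T @ A, np.eye(2)))            # True: A^T A = I, so A is orthogonal
    x = np.array([3.0, -4.0])
    print(np.linalg.norm(A @ x), np.linalg.norm(x))   # equal lengths: 5.0 and 5.0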

Example 5.13 (All 2 × 2 orthogonal matrices) Show that every 2 × 2 orthogonal
matrix must be one of the forms

    [cos θ  −sin θ; sin θ  cos θ]   or   [cos θ  sin θ; sin θ  −cos θ].

Solution: Suppose that A = [a  b; c  d] is an orthogonal matrix, so that AAᵀ = I₂ =
AᵀA. The first equality gives a² + b² = 1, ac + bd = 0, and c² + d² = 1. The
second equality gives a² + c² = 1, ab + cd = 0, and b² + d² = 1. Thus, b = ±c.
If b = −c, then we get a = d. If b = c, then we get a = −d. Now, choose θ so that
a = cos θ and b = sin θ.                                                            □

Problem 5.21 Find the inverse of each of the following matrices.

(1) [1  0  0; 0  cos θ  sin θ; 0  −sin θ  cos θ],   (2) [1/√2  −1/√2  0; ...].

What are they as linear transformations on ℝ³: rotations, reflections, or others?

Problem 5.22 Find eight 2 × 2 orthogonal matrices which transform the square −1 ≤ x, y ≤ 1
onto itself.

As shown in Examples 5.12 and 5.13, all rotations and reflections on the Euclidean
2-space ℝ² are orthogonal and intuitively preserve both the lengths of vectors and the
angle between two vectors. In fact, every orthogonal matrix A preserves the lengths
of vectors:

    ||Ax||² = Ax · Ax = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀx = ||x||².



Definition 5.10 Let V and W be two inner product spaces. A linear transformation
T : V → W is called an isometry, or an orthogonal transformation, if it preserves
the lengths of vectors, that is, for every vector x ∈ V,

    ||T(x)|| = ||x||.

Clearly, any orthogonal matrix is an isometry as a linear transformation. If T :
V → W is an isometry, then T is one-to-one, since the kernel of T is trivial: T(x) = 0
implies ||x|| = ||T(x)|| = 0. Thus, if dim V = dim W, then an isometry is also an
isomorphism.
The following theorem gives an interesting characterization of an isometry.
The following theorem gives an interesting characterization of an isometry .

Theorem 5.19 Let T : V → W be a linear transformation from an inner product
space V to another W. Then, T is an isometry if and only if T preserves inner
products, that is,

    (T(x), T(y)) = (x, y)

for any vectors x, y in V.

Proof: Let T be an isometry. Then ||T(x)||² = ||x||² for any x ∈ V. Hence,

    (T(x + y), T(x + y)) = ||T(x + y)||² = ||x + y||² = (x + y, x + y)

for any x, y ∈ V. On the other hand,

    (T(x + y), T(x + y)) = (T(x), T(x)) + 2(T(x), T(y)) + (T(y), T(y)),
    (x + y, x + y) = (x, x) + 2(x, y) + (y, y),

from which we get (T(x), T(y)) = (x, y).
The converse is quite clear by choosing y = x.                                      □
Theorem 5.20 Let A be an n × n matrix. Then, A is an orthogonal matrix if and only
if A : ℝⁿ → ℝⁿ, as a linear transformation, preserves the dot product. That is, for
any vectors x, y ∈ ℝⁿ,

    Ax · Ay = x · y.

Proof: The necessity is clear. For the sufficiency, suppose that A preserves the dot
product. Then for any vectors x, y ∈ ℝⁿ,

    xᵀAᵀAy = Ax · Ay = x · y = xᵀy.

Take x = eᵢ and y = eⱼ. Then, this equation is just [AᵀA]ᵢⱼ = δᵢⱼ.                   □

Since d(x, y) = ||x − y|| for any x and y in V, one can easily derive the following
corollary.

Corollary 5.21 A linear transformation T : V → W is an isometry if and only if

    d(T(x), T(y)) = d(x, y)

for any x and y in V.

Recall that if θ is the angle between two nonzero vectors x and y in an inner
product space V, then for any isometry T : V → V,

    cos θ = (x, y)/(||x|| ||y||) = (Tx, Ty)/(||Tx|| ||Ty||).

Hence, we have

Corollary 5.22 An isometry preserves the angle.

The converse of Corollary 5.22 is not true in general. The linear transformation
T(x) = 2x on the Euclidean space ℝⁿ preserves the angle but not the lengths of
vectors (i.e., it is not an isometry). Such a linear transformation is called a dilation.
We have seen that any orthogonal matrix is an isometry as the linear transforma-
tion T(x) = Ax. The following theorem says that the converse is also true, that is,
the matrix representation of an isometry with respect to orthonormal bases is an
orthogonal matrix.

Theorem 5.23 Let T : V → W be an isometry from an inner product space V to
another W of the same dimension. Let α = {v₁, ..., vₙ} and β = {w₁, ..., wₙ}
be orthonormal bases for V and W, respectively. Then, the matrix [T] of T with
respect to the bases α and β is an orthogonal matrix.

Proof: Note that the k-th column vector of the matrix [T] is just [T(vₖ)]β. Since
T preserves inner products and α, β are orthonormal, we get

    [T(vₖ)]β · [T(vₗ)]β = (T(vₖ), T(vₗ)) = (vₖ, vₗ) = δₖₗ,

which shows that the column vectors of [T] are orthonormal.                          □


Remark: In summary, for a linear transformation T : V → W, the following are
equivalent:
(1) T is an isometry: that is, T preserves the lengths of vectors.
(2) T preserves the inner product.
(3) T preserves the distance.
(4) The matrix [T] of T with respect to orthonormal bases α and β is an orthogonal matrix.
Any one (hence all) of these conditions implies that T preserves the angle, but the
converse is not true.

Problem 5.23 Find values r > 0, S > 0, a > 0, band c such that matrix Q is orthogonal.

(1) Q =
[
° -s2ssa]
r

r
b ,
c
-s a]
3s b
-15 c
.

Problem 5.24 (Bessel's Inequality) Let V be an inner product space, and let {v₁, ..., vₘ} be
a set of orthonormal vectors in V (not necessarily a basis for V). Prove that for any x in V,
||x||² ≥ Σᵐᵢ₌₁ |(x, vᵢ)|².

Problem 5.25 Determine whether the following linear transformations on the Euclidean space
]R3are orthogonal.

(1) T(x, y, z) = (z, :!,fx + iY, ! - :!,fy).


(2) T(x, y, z) = (U5 x + UZ'
11 12
UY 5
- UZ' x .
)

5.9 Applications

5.9.1 Least squares solutions

In the previous section, we completely determined the solution set for a system
Ax = b when b ∈ C(A). In this section, we discuss what we can do when the system
Ax = b is inconsistent, that is, when b ∉ C(A) ⊆ ℝᵐ. Certainly, there exists no
solution in this case, but one can find a 'pseudo'-solution in the following sense.
Note that for any vector x in ℝⁿ, Ax ∈ C(A). Hence, the best we can do is to find
a vector x₀ ∈ ℝⁿ so that Ax₀ is the closest to the given vector b ∈ ℝᵐ: i.e., ||Ax₀ − b||
is as small as possible. Such a vector x₀ will give us the best approximation Ax to b
over all vectors x in ℝⁿ, and it is called a least squares solution of Ax = b.
To find a least squares solution, we first need to find a vector in C(A) that is
closest to b. However, from the orthogonal decomposition ℝᵐ = C(A) ⊕ N(Aᵀ),
any b ∈ ℝᵐ has the unique orthogonal decomposition

    b = b_c + b_n ∈ C(A) ⊕ N(Aᵀ) = ℝᵐ,

where b_c = Proj_{C(A)}(b) ∈ C(A) and b_n = b − b_c ∈ N(Aᵀ). Here, the vector
b_c = Proj_{C(A)}(b) ∈ C(A) has two basic properties:
(1) There always exists a solution x₀ ∈ ℝⁿ of Ax = b_c, since b_c ∈ C(A),
(2) b_c is the closest vector to b among the vectors in C(A) (see Theorem 5.13).
Therefore, a least squares solution x₀ ∈ ℝⁿ of Ax = b is just a solution of Ax = b_c.
Furthermore, if x₀ ∈ ℝⁿ is a least squares solution, then the set of all least squares
solutions is x₀ + N(A), by Corollary 5.17.
In particular, if b ∈ C(A), then b = b_c, so that the least squares solutions are just
the 'true' solutions of Ax = b. The second property of b_c means that a least squares

solution x₀ ∈ ℝⁿ of Ax = b gives the best approximation Ax₀ = b_c to b: i.e., for
any vector x in ℝⁿ,

    ||Ax₀ − b|| ≤ ||Ax − b||.

In summary, to find a least squares solution of Ax = b, the first step is to find
the orthogonal projection b_c = Proj_{C(A)}(b) ∈ C(A) of b, and then solve Ax = b_c as
usual.
One can find b_c from b ∈ ℝᵐ by using the orthogonal projection if we have an
orthonormal basis for C(A). But such a computation of b_c could be cumbersome,
because the only way we know so far to find an orthonormal basis for C(A) is the
Gram-Schmidt orthogonalization (whose computation may be tedious).
However, there is a bypass that avoids the Gram-Schmidt orthogonalization. For
this, let us examine a least squares solution once again. If x₀ ∈ ℝⁿ is a least squares
solution of Ax = b, then

    Ax₀ − b = b_c − b = −b_n ∈ N(Aᵀ)

holds, since Ax₀ = b_c. Thus, Aᵀ(Ax₀ − b) = Aᵀ(−b_n) = 0, or equivalently AᵀAx₀ =
Aᵀb; that is, x₀ is a solution of the equation

    AᵀAx = Aᵀb.

This equation is very interesting because it is also a sufficient condition for a least
squares solution, as the next theorem shows, and it is called the normal equation of
Ax = b.

Theorem 5.24 Let A be an m × n matrix, and let b ∈ ℝᵐ be any vector. Then, a
vector x₀ ∈ ℝⁿ is a least squares solution of Ax = b if and only if x₀ is a solution of
the normal equation AᵀAx = Aᵀb.

Proof: We only need to show the sufficiency. Let x₀ be a solution of the normal
equation AᵀAx = Aᵀb. Then, Aᵀ(Ax₀ − b) = 0, so Ax₀ − b ∈ N(Aᵀ). Say
Ax₀ − b = n ∈ N(Aᵀ), and let b = b_c + b_n ∈ C(A) ⊕ N(Aᵀ). Then, Ax₀ − b_c =
n + b_n ∈ N(Aᵀ). Since Ax₀ − b_c is also contained in C(A) and N(Aᵀ) ∩ C(A) = {0},
Ax₀ = b_c = Proj_{C(A)}(b), i.e., x₀ is a least squares solution of Ax = b.             □
Example 5.14 (The best approximated solution of an inconsistent system Ax = b)
Find all the least squares solutions of Ax = b, and then determine the orthogonal
projection b_c of b onto the column space C(A), where

1 -2
2 -3
A = -1 1
[
3 -5

Solution: (The reader may check that Ax = b has no solutions).

-3
-1
2
-1 3] [
1 -5
2 0 -1
3

and

ATb =
[ 1 2-1 3]
-2 -3
1 -1
1 -5
2 0
= [ 0] -1
3
.

From the normal equation, a least squares solution of Ax = b is a solution of AᵀAx =
Aᵀb, i.e.,

=[ ].
[ -3 3 6 X3 3
By solving this system of equations (left as an exercise), one can obtain all the least
squares solutions, which are of the form x = x₀ + t[5 3 1]ᵀ for any number t ∈ ℝ,
where x₀ = [−8/3 5/3 0]ᵀ.
Note that the set of least squares solutions is x₀ + N(A), where N(A) =
{t[5 3 1]ᵀ : t ∈ ℝ}.                                                                □
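The normal-equation recipe can be tried on any small inconsistent system; in the following
minimal sketch the matrix and right-hand side are illustrative choices (they are not the data of
Example 5.14), and the same least squares solution is also obtained from a library routine.

    import numpy as np

    # A small inconsistent system, chosen only for illustration.
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([6.0, 0.0, 0.0])

    # Least squares solution from the normal equation A^T A x = A^T b.
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # The same solution from a library routine.
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(x_normal, x_lstsq)          # both give [5., -3.]
    print(A @ x_normal)               # b_c, the projection of b onto C(A)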

Problem 5.26 Find all least squares solutions x in ]R3 of Ax = b, where

10 2]
o 2 2 [3]
-3
A = [ -1
-1 2
1 -1
0
' b =
-3
O '

Note that the normal equation is always consistent by construction, and, as
Example 5.14 shows, a least squares solution can be found by the Gauss-Jordan
elimination even though AᵀA is not invertible. If b ∈ C(A) or, even better, if the rows
of A are linearly independent (thus, rank A = m and C(A) = ℝᵐ), then b_c = b, so that
the system Ax = b is always consistent and the least squares solutions coincide with
the true solutions. Therefore, for any given system Ax = b, consistent or inconsistent,
by solving the normal equation AᵀAx = Aᵀb, one can obtain either the true solutions
or the least squares solutions.
If the square matrix AᵀA is invertible, then the normal equation AᵀAx = Aᵀb of
the system Ax = b resolves to x = (AᵀA)⁻¹Aᵀb, which is a least squares solution.
In particular, if AᵀA = Iₙ, or equivalently the columns of A are orthonormal (see
Lemma 5.18), then the normal equation reduces to the least squares solution x = Aᵀb.
The following theorem gives a condition for AᵀA to be invertible.
The following theorem gives a condition for A T A to be invertible.
Theorem 5.25 For any m × n matrix A, AᵀA is a symmetric n × n square matrix
and rank(AᵀA) = rank A.

Proof: Clearly, AᵀA is square and symmetric. Since the numbers of columns of A
and AᵀA are both n, we have

    rank A + dim N(A) = n = rank(AᵀA) + dim N(AᵀA).

Hence, it suffices to show that N(A) = N(AᵀA), so that dim N(A) = dim N(AᵀA).
It is trivial to see that N(A) ⊆ N(AᵀA), since Ax = 0 implies AᵀAx = 0. Con-
versely, suppose that AᵀAx = 0. Then

    Ax · Ax = (Ax)ᵀ(Ax) = xᵀ(AᵀAx) = xᵀ0 = 0.

Hence Ax = 0, and x ∈ N(A).                                                         □
It follows from Theorem 5.25 that AᵀA is invertible if and only if rank A = n,
that is, the columns of A are linearly independent. In this case, N(A) = {0} and so
the system Ax = b has a unique least squares solution x₀ in R(A) = ℝⁿ, which is
x₀ = (AᵀA)⁻¹Aᵀb. This can be summarized in the following theorem:

Theorem 5.26 Let A be an m × n matrix. If rank A = n, or equivalently the columns
of A are linearly independent, then
(1) AᵀA is invertible so that (AᵀA)⁻¹Aᵀ is a left inverse of A,
(2) the vector x₀ = (AᵀA)⁻¹Aᵀb is the unique least squares solution of a system
Ax = b, and
(3) Ax₀ = A(AᵀA)⁻¹Aᵀb = b_c = Proj_{C(A)}(b), that is, the orthogonal projection
of ℝᵐ onto C(A) is Proj_{C(A)} = A(AᵀA)⁻¹Aᵀ.

Remark: (1) For an m × n matrix A, by applying Theorem 5.26 to Aᵀ, one can say
that rank A = m if and only if AAᵀ is invertible. In this case Aᵀ(AAᵀ)⁻¹ is a right
inverse of A (cf. the Remark after Theorem 3.25). Moreover, AAᵀ is invertible if and
only if the rows of A are linearly independent, by Theorem 5.25.

(2) If the matrix A has orthonormal columns (so that AᵀA = Iₙ), then the columns
u₁, ..., uₙ of A form an orthonormal basis for the column space C(A), so that for any
b ∈ ℝᵐ,

    b_c = (u₁ · b)u₁ + ... + (uₙ · b)uₙ = (u₁u₁ᵀ + ... + uₙuₙᵀ)b,

and the projection matrix is

    Proj_{C(A)} = u₁u₁ᵀ + ... + uₙuₙᵀ.

In fact, this result coincides with Theorem 5.26: If AᵀA = Iₙ, then

    Proj_{C(A)} = A(AᵀA)⁻¹Aᵀ = AAᵀ = u₁u₁ᵀ + ... + uₙuₙᵀ,

and the least squares solution is

    x₀ = Aᵀb = (u₁ · b, ..., uₙ · b),

which is the coordinate expression of Ax₀ = b_c = Proj_{C(A)}(b) with respect to the
orthonormal basis {u₁, ..., uₙ} for C(A).
In general, the columns of A need not be orthonormal, in which case the above
formula is not applicable. In Section 5.9.3, we will discuss more about this general case.
(3) If rank A = r < n, one can reduce the columns of A to a basis for the column
space and work with this reduced matrix Ã (thus, ÃᵀÃ is invertible) to find the
orthogonal projection Proj_{C(A)} = Ã(ÃᵀÃ)⁻¹Ãᵀ of ℝᵐ onto the column space C(A).
However, the least squares solutions of Ax = b should be found from the original
normal equation directly, since the least squares solution x̃₀ = (ÃᵀÃ)⁻¹Ãᵀb of
Ãx = b_c has only r components, so that it cannot be a solution of Ax = b_c.
Example 5.15 (Solving an inconsistent system Ax = b by the normal equation) Find
the least squares solutions of the system:

Determine also the orthogonal projection b_c of b onto the column space C(A).

Solution: Clearly, the two columns of A are linearly independent and C(A) is the
xy-plane. Thus, b ∉ C(A). Note that

which is invertible. By a simple computation one can obtain the least squares solution

    x₀ = (AᵀA)⁻¹Aᵀb = [14/3, −1/3]ᵀ,

which is unique since N(A) = {0}. The orthogonal projection of b onto C(A) is

    b_c = Ax₀.                                                                      □

Problem 5.27 Find all the least squares solutions of the following inconsistent system of linear
equations:

5.9.2 Polynomial approximations

In this section, one can find a reason for the name of the "least squares" solutions,
and the following example illustrates an application of the least squares solution to
the determination of the spring constants in physics .

Example 5.16 Hooke's law for springs in physics says that for a uniform spring, the
length stretched or compressed is a linear function of the force applied, that is, the
force F applied to the spring is related to the length x stretched or compressed by the
equation
F=a+kx ,
where a and k are some constants determined by the spring .
Suppose now that, given a spring of length 6.1 inches, we want to determine the
constants a and k from the following experimental data: the lengths are measured to be 7.6,
8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to
the spring. However, by plotting these data

    (x, F) = (6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6),


in the xF-plane, one can easily recognize that they are not on a straight line of the
form F = a + kx in the xF-plane, which may be caused by experimental errors. This
means that the system of linear equations

    F₁ = a + 6.1k  = 0
    F₂ = a + 7.6k  = 2
    F₃ = a + 8.7k  = 4
    F₄ = a + 10.4k = 6

is inconsistent (i.e., it has no solutions, so the second equality in each equation may not
be a true equality). It means that if we put b = (0, 2, 4, 6) and F = (F₁, F₂, F₃, F₄)
as vectors in ℝ⁴ representing the data and the points on the line at the xᵢ's, respectively,
then ||b − F|| is not zero. Thus, the best thing one can do is to determine the straight
line a + kx = F that 'fits' the data best: that is, to minimize the sum of the squares of
the vertical distances from the line to the data (xᵢ, yᵢ) for i = 1, 2, 3, 4 (see Figure
5.5) (this is the reason why we say least squares):

    (0 − F₁)² + (2 − F₂)² + (4 − F₃)² + (6 − F₄)² = ||b − F||².

Thus, for the original inconsistent system

    Ax = [1  6.1; 1  7.6; 1  8.7; 1  10.4] [a; k] = [0; 2; 4; 6] = b ∉ C(A),

we are looking for F ∈ C(A), which is the projection of b onto the column space
C(A) of A, and the least squares solution x₀, which satisfies Ax₀ = F.

    Figure 5.5. Least squares fitting

It is now easily computed (by solving the normal equation AᵀAx = Aᵀb) as

    [a; k] = x = (AᵀA)⁻¹Aᵀb ≈ [−8.6; 1.4].

It gives F = −8.6 + 1.4x.                                                           □

In general, a common problem in experimental work is to obtain a polynomial
relation y = f(x) between two variables x and y that best 'fits' the data of various
values of y determined experimentally for inputs x, say

    (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ),

plotted in the xy-plane. Some possible fitting polynomials are

(1) a straight line: y = a + bx,
(2) a quadratic polynomial: y = a + bx + cx², or
(3) a polynomial of degree k: y = a₀ + a₁x + ... + aₖxᵏ, etc.

As a general case, suppose that we are looking for a polynomial y = f(x) =
a₀ + a₁x + a₂x² + ... + aₖxᵏ of degree k that passes through the given data. Then
we obtain a system of linear equations,

    f(x₁) = a₀ + a₁x₁ + a₂x₁² + ... + aₖx₁ᵏ = y₁
    f(x₂) = a₀ + a₁x₂ + a₂x₂² + ... + aₖx₂ᵏ = y₂
    ...
    f(xₙ) = a₀ + a₁xₙ + a₂xₙ² + ... + aₖxₙᵏ = yₙ,

or, in matrix form, the system may be written as Ax = b:

    [1  x₁  x₁²  ...  x₁ᵏ] [a₀]   [y₁]
    [1  x₂  x₂²  ...  x₂ᵏ] [a₁]   [y₂]
    [       ...         ] [...] = [...]
    [1  xₙ  xₙ²  ...  xₙᵏ] [aₖ]   [yₙ]

The left-hand side Ax represents the values of the polynomial at the xᵢ's, and the right-hand
side represents the data obtained from the inputs xᵢ's in the experiment.
If n ≤ k + 1, then these cases have already been discussed in Section 3.9.1. If
n > k + 1, this kind of system may be inconsistent. Therefore, the best thing one can
do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical
distances between the graph of the polynomial and the data. But this is equivalent to
finding the least squares solution of the system Ax = b, because for any c ∈ C(A) of
the form

    c = [1  x₁  ...  x₁ᵏ; ...; 1  xₙ  ...  xₙᵏ] [a₀; a₁; ...; aₖ]
      = [a₀ + a₁x₁ + ... + aₖx₁ᵏ; ...; a₀ + a₁xₙ + ... + aₖxₙᵏ],

we have

    ||b − c||² = (y₁ − a₀ − a₁x₁ − ... − aₖx₁ᵏ)² + ... + (yₙ − a₀ − a₁xₙ − ... − aₖxₙᵏ)².

The previous theory says that the orthogonal projection b_c of b onto the column space
of A minimizes this quantity and shows how to find b_c and a least squares solution
x₀.

Example 5.17 Find a straight line y = a + bx that best fits the given experimental
data (1, 0), (2, 3), (3, 4) and (4, 4).

Solution: We are looking for a line y = a + bx that minimizes the sum of the squares
of the vertical distances |yᵢ − a − bxᵢ| from the line y = a + bx to the data (xᵢ, yᵢ).
By adopting the matrix notation

    A = [1  1; 1  2; 1  3; 1  4],   x = [a; b],   b = [0; 3; 4; 4],

we have Ax = b and want to find a least squares solution of Ax = b. But the columns
of A are linearly independent, and the least squares solution is x = (AᵀA)⁻¹Aᵀb.
Now,

    AᵀA = [4  10; 10  30],   (AᵀA)⁻¹ = [3/2  −1/2; −1/2  1/5],   Aᵀb = [11; 34].

Hence, we have

    [a; b] = (AᵀA)⁻¹Aᵀb = [−1/2; 13/10],

that is, the best fitting line is y = −0.5 + 1.3x.                                  □
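A quick check of this computation, as a minimal sketch with the same data and the normal
equation solved numerically:

    import numpy as np

    xs = np.array([1.0, 2.0, 3.0, 4.0])
    ys = np.array([0.0, 3.0, 4.0, 4.0])

    A = np.column_stack([np.ones_like(xs), xs])      # columns: 1 and x
    coef = np.linalg.solve(A.T @ A, A.T @ ys)        # normal equation A^T A x = A^T y
    print(coef)                                      # [-0.5  1.3]

    # np.polyfit gives the same line (coefficients in decreasing degree).
    print(np.polyfit(xs, ys, 1))                     # [ 1.3 -0.5]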
Problem 5.28 From Newton's second law of motion, a body near the surface of the earth falls
vertically downward according to the equation

    s(t) = s₀ + v₀t + (1/2)gt²,

where s(t) is the distance that the body has travelled in time t, s₀ and v₀ are the initial displacement
and velocity, respectively, of the body, and g is the gravitational acceleration at the earth's
surface. Suppose a weight is released, and the distances that the body has fallen from some
reference point were measured to be s = −0.18, 0.31, 1.03, 2.48, 3.73 feet at times t =
0.1, 0.2, 0.3, 0.4, 0.5 seconds, respectively. Determine approximate values of s₀, v₀, g
using these data.

5.9.3 Orthogonal projection matrices

In Section 5.9.1, we have seen that the orthogonal projection Proj_{C(A)} of the Euclidean
space ℝᵐ onto the column space C(A) of an m × n matrix A plays an important role in
finding a least squares solution of an inconsistent system Ax = b. Also, the orthogonal
projection Proj_{C(A)} is the main tool in the Gram-Schmidt orthogonalization.
In general, for a given subspace U of ℝᵐ, the computation of the orthogonal
projection Proj_U of ℝᵐ onto U appears quite often in applied science and engineering
problems. The least squares solution method can also be used to find the orthogonal
projection: Indeed, by first taking a basis for U and then forming an m × n matrix A with
these basis vectors as columns, one clearly gets U = C(A), and so by Theorem 5.26

    Proj_U = Proj_{C(A)} = A(AᵀA)⁻¹Aᵀ.

In fact, this formula itself gives the orthogonal projection matrix, that is, the matrix
representation of Proj_U with respect to the standard basis α:

    [Proj_U]_α = A(AᵀA)⁻¹Aᵀ.

Note that this projection matrix is independent of the choice of a basis for U,
due to the uniqueness of the matrix representation of a linear transformation with
respect to a fixed basis.

Example 5.18 Find the projection matrix P on the plane 2x − y − 3z = 0 in the
space ℝ³ and calculate Pb for b = (1, 0, 1).

Solution: Choose any basis for the plane 2x − y − 3z = 0, say

    v₁ = (0, 3, −1)  and  v₂ = (1, 2, 0).

Let A = [v₁ v₂] be the 3 × 2 matrix with v₁ and v₂ as columns. Then

    (AᵀA)⁻¹ = [10  6; 6  5]⁻¹ = (1/14) [5  −6; −6  10].

The orthogonal projection matrix P = Proj_{C(A)} is

    P = A(AᵀA)⁻¹Aᵀ = (1/14) [10  2  6; 2  13  −3; 6  −3  5],

and

    Pb = (1/14) [16; −1; 11].                                                       □

If an orthonormal basis for U is known, then the computation of Proj_U =
A(AᵀA)⁻¹Aᵀ is easy, as shown in Remark (2) on page 185: For an orthonormal
basis β = {u₁, u₂, ..., uₙ} for U, the orthogonal projection matrix onto the subspace
U is given as

    [Proj_U]_α = u₁u₁ᵀ + u₂u₂ᵀ + ... + uₙuₙᵀ.

Example 5.19 If A = [c₁ c₂], where c₁ = (1, 0, 0), c₂ = (0, 1, 0), then the col-
umn vectors of A are orthonormal, C(A) is the xy-plane, and the projection of
b = (x, y, z) ∈ ℝ³ onto C(A) is b_c = (x, y, 0). In fact,

    A(AᵀA)⁻¹Aᵀ = [1  0  0; 0  1  0; 0  0  0],

which is equal to c₁c₁ᵀ + c₂c₂ᵀ.                                                     □

Note that, if we denote by Proj_{uᵢ} the orthogonal projection of ℝᵐ onto the subspace
spanned by the basis vector uᵢ for each i, then its projection matrix is uᵢuᵢᵀ, and so

    Proj_U = Proj_{u₁} + Proj_{u₂} + ... + Proj_{uₙ},

and

    Proj_{uᵢ} ∘ Proj_{uⱼ} = 0 if i ≠ j,   Proj_{uᵢ} ∘ Proj_{uⱼ} = Proj_{uᵢ} if i = j.

Problem 5.29 Let u = (1/√2, 1/√2) be a vector in ℝ², which determines the 1-dimensional
subspace U = {au = (a/√2, a/√2) : a ∈ ℝ}. Show that the matrix

    uuᵀ = [1/2  1/2; 1/2  1/2],

considered as a linear transformation on ℝ², is the orthogonal projection onto the subspace U.

Problem 5.30 Show that if {v₁, v₂, ..., vₘ} is an orthonormal basis for ℝᵐ, then v₁v₁ᵀ +
v₂v₂ᵀ + ... + vₘvₘᵀ = Iₘ.

In general, if a basis {c₁, c₂, ..., cₙ} for U is given but is not orthonormal, then one
has to compute directly

    Proj_U = A(AᵀA)⁻¹Aᵀ,

where A = [c₁ c₂ ... cₙ] is an m × n matrix whose columns are the given basis
vectors cᵢ's.
Sometimes it is necessary to compute an orthonormal basis from the given basis for
U by the Gram-Schmidt orthogonalization. Its computation gives us a decomposition
of the matrix A into an orthogonal part and an upper triangular part, from which the
computation of the projection matrix might be easier.
QR decomposition method: Let {c₁, c₂, ..., cₙ} be an arbitrary basis for a sub-
space U. The Gram-Schmidt orthogonalization process applied to this basis may be written
as the following steps:
(1) From the basis {c₁, c₂, ..., cₙ}, find an orthogonal basis {q₁, q₂, ..., qₙ} for
U by

    q₁ = c₁,
    q₂ = c₂ − ((q₁, c₂)/(q₁, q₁)) q₁,
    ...
    qₙ = cₙ − ((qₙ₋₁, cₙ)/(qₙ₋₁, qₙ₋₁)) qₙ₋₁ − ... − ((q₁, cₙ)/(q₁, q₁)) q₁.

(2) By normalization of these vectors, uᵢ = qᵢ/||qᵢ||, one obtains an orthonormal
basis {u₁, ..., uₙ} for U.
(3) By rewriting the equations in (1), one gets

    c₁ = q₁ = b₁₁u₁,
    c₂ = a₁₂q₁ + q₂ = b₁₂u₁ + b₂₂u₂,
    ...
    cₙ = a₁ₙq₁ + ... + aₙ₋₁,ₙqₙ₋₁ + qₙ = b₁ₙu₁ + ... + bₙₙuₙ,

where aᵢⱼ = (qᵢ, cⱼ)/(qᵢ, qᵢ) for i < j, aᵢᵢ = 1, and

    bᵢⱼ = aᵢⱼ||qᵢ|| = ((qᵢ, cⱼ)/(qᵢ, qᵢ)) ||qᵢ|| = (uᵢ, cⱼ)

for i ≤ j, which is just the component of cⱼ in the uᵢ direction.
(4) Let A = [c₁ c₂ ... cₙ]. Then, the equations in (3) can be written in matrix
notation as

    A = [c₁ c₂ ... cₙ] = [u₁ u₂ ... uₙ] [b₁₁  b₁₂  ...  b₁ₙ; 0  b₂₂  ...  b₂ₙ; ...; 0  0  ...  bₙₙ] = QR,

where Q = [u₁ u₂ ... uₙ] is the m × n matrix whose orthonormal columns are
obtained from the cⱼ's by the Gram-Schmidt orthogonalization, and R = [bᵢⱼ] is the
n × n upper triangular matrix.
(5) Note that rank A = rank Q = n ≤ m, and C(Q) = U = C(A), which is
of dimension n in ℝᵐ. Moreover, the matrix R is an invertible n × n matrix, since
each bⱼⱼ = (uⱼ, cⱼ) is equal to ||cⱼ − Proj_{U_{j−1}}(cⱼ)||, where U_{j−1} is the subspace of
ℝᵐ spanned by {c₁, c₂, ..., c_{j−1}} (or equivalently, by {u₁, u₂, ..., u_{j−1}}), and so
bⱼⱼ ≠ 0 for all j because cⱼ ∉ U_{j−1}.

One of the byproducts of this computation is the following theorem.

Theorem 5.27 Any m × n matrix of rank n can be factored into a product QR, where
Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper
triangular matrix.

Definition 5.11 The decomposition A = QR is called the QR factorization or
the QR decomposition of an m × n matrix A (rank A = n), where the matrix
Q = [u₁ u₂ ... uₙ] is called the orthogonal part of A, and the matrix R = [bᵢⱼ] is
called the upper triangular part of A.

Remark: In the QR factorization A = QR, the orthonormality of the column
vectors of Q means QᵀQ = Iₙ, and the j-th column of the matrix R is simply the
coordinate vector of cⱼ with respect to the orthonormal basis β = {u₁, u₂, ..., uₙ}
for U: i.e.,

    [cⱼ]_β = (b₁ⱼ, b₂ⱼ, ..., bⱼⱼ, 0, ..., 0).

With this QR decomposition of A, the projection matrix and the least squares
solution of Ax = b for b ∈ ℝᵐ can be calculated easily as

    P = A(AᵀA)⁻¹Aᵀ = QR(RᵀQᵀQR)⁻¹RᵀQᵀ = QQᵀ,
    x₀ = (AᵀA)⁻¹Aᵀb = (RᵀQᵀQR)⁻¹RᵀQᵀb = R⁻¹Qᵀb.

Corollary 5.28 Let A be an m × n matrix of rank n and let A = QR be its QR
factorization. Then,
(1) the projection matrix on the column space of A is [Proj_{C(A)}]_α = QQᵀ.
(2) The least squares solution of the system Ax = b is given by x₀ = R⁻¹Qᵀb,
which can be found by applying back substitution to the system Rx = Qᵀb.

Example 5.20 (QR decomposition of A) Find the QR factorization A = QR and
the orthogonal projection matrix P = [Proj_{C(A)}]_α for

    A = [c₁ c₂ c₃] = [1  1  0; 1  0  1; 0  1  1; 0  0  1].

Solution: We first find the decomposition of A into Q and R, the orthogonal part and
the upper triangular part. Use the Gram-Schmidt orthogonalization to get the column
vectors of Q:

    q₁ = c₁ = (1, 1, 0, 0),
    q₂ = c₂ − ((c₂ · q₁)/(q₁ · q₁)) q₁ = (1/2, −1/2, 1, 0),
    q₃ = c₃ − ((c₃ · q₂)/(q₂ · q₂)) q₂ − ((c₃ · q₁)/(q₁ · q₁)) q₁ = (−2/3, 2/3, 2/3, 1),

and ||q₁|| = √2, ||q₂|| = √(3/2), ||q₃|| = √(7/3). Hence,

    u₁ = q₁/||q₁|| = (1/√2, 1/√2, 0, 0),
    u₂ = q₂/||q₂|| = (1/√6, −1/√6, 2/√6, 0),
    u₃ = q₃/||q₃|| = (−2/√21, 2/√21, 2/√21, 3/√21).

Then, c₁ = √2 u₁, c₂ = (1/√2)u₁ + √(3/2) u₂, c₃ = (1/√2)u₁ + (1/√6)u₂ + √(7/3) u₃. In
fact, these equations can also be derived from A = QR with an upper triangular matrix R.
(It gives that the (i, j)-entry of R is bᵢⱼ = (uᵢ, cⱼ).) Therefore,

    A = [u₁ u₂ u₃] [√2  1/√2  1/√2; 0  √(3/2)  1/√6; 0  0  √(7/3)] = QR,

where

    Q = [1/√2  1/√6  −2/√21; 1/√2  −1/√6  2/√21; 0  2/√6  2/√21; 0  0  3/√21],

and

    P = QQᵀ = [6/7  1/7  1/7  −2/7; 1/7  6/7  −1/7  2/7; 1/7  −1/7  6/7  2/7; −2/7  2/7  2/7  3/7].
                                                                                    □
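The hand computation above can be cross-checked with a library QR routine, as in the
following sketch; note that np.linalg.qr may return Q with some columns negated relative to
the Gram-Schmidt construction, but the projection matrix QQᵀ is unchanged.

    import numpy as np

    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    Q, R = np.linalg.qr(A)          # reduced QR: Q is 4x3 with orthonormal columns
    P = Q @ Q.T                     # projection matrix onto C(A)

    print(np.round(7 * P))          # matches the matrix 7P of Example 5.20
    print(np.allclose(Q @ R, A))    # True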

Problem 5.31 Find the 2 x 2 matrix P that projects the xy -plane onto the line y =x .

Problem 5.32 Find the projection matrix P of the Euclidean 3-space R^3 onto the column space of the matrix
[i l
Problem 5.33 Find the projection matrix P on the x1, x2, x4 coordinate subspace of the Euclidean 4-space R^4.

8
Problem 5.34 Find the QR factorization of the matrix [ sin 88 cos ]
cos 0 .

As the last part of this section, we introduce a characterization of the orthogonal projection matrices.

Theorem 5.29 A square matrix P is an orthogonal projection matrix if and only if it is symmetric and idempotent, i.e., P^T = P and P^2 = P.

Proof: Let P be an orthogonal projection matrix. Then, the matrix P can be written as P = A(A^T A)^{-1} A^T for a matrix A whose column vectors form a basis for the column space of P. It gives

    P^T = (A(A^T A)^{-1} A^T)^T = A((A^T A)^T)^{-1} A^T = A(A^T A)^{-1} A^T = P,
    P^2 = PP = (A(A^T A)^{-1} A^T)(A(A^T A)^{-1} A^T) = A(A^T A)^{-1} A^T = P.
In fact, this second equation was already shown in Theorem 5.9.
Conversely, by Theorem 5.16, one has the orthogonal decomposition R^m = C(P) ⊕ N(P^T). But N(P^T) = N(P) since P^T = P. Thus, for any u + v ∈ C(P) ⊕ N(P^T) = R^m, P(u + v) = Pu + Pv = Pu = u, because P^2 = P implies Pu = u for u ∈ C(P). It shows that P is an orthogonal projection matrix. (Alternatively, one can use Theorem 5.9 directly.) □
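A small numerical check of Theorem 5.29 (an addition, not part of the original text): for a matrix A with linearly independent columns, P = A(A^T A)^{-1} A^T is symmetric and idempotent, and b - Pb is orthogonal to C(P). The matrix A and vector b below are arbitrary sample choices.

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 0.],
                  [0., 2.]])                     # any matrix with independent columns (sample choice)

    P = A @ np.linalg.solve(A.T @ A, A.T)        # P = A (A^T A)^{-1} A^T

    print(np.allclose(P, P.T))                   # symmetric:  P^T = P
    print(np.allclose(P @ P, P))                 # idempotent: P^2 = P

    b = np.array([3., -1., 2.])                  # arbitrary vector
    print(np.allclose(A.T @ (b - P @ b), 0))     # b - Pb is orthogonal to C(A) = C(P)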

From Corollary 5.10, if P is a projection matrix on C(P), then I - P is a projection matrix on the null space N(P) (= C(I - P)), which is orthogonal to C(P) (= N(I - P)).

Example 5.21 Let P_i : R^m → R^m be defined by

    P_i(x1, ..., xm) = (0, ..., 0, x_i, 0, ..., 0),

for i = 1, 2, ..., m. Then, each P_i is the projection of R^m onto the i-th axis, whose matrix form looks like

    P_i = diag(0, ..., 0, 1, 0, ..., 0),    I - P_i = diag(1, ..., 1, 0, 1, ..., 1),

with the single 1 (respectively the single 0) in the i-th diagonal position.
When we restrict the image to R, P_i is an element of the dual space (R^m)*, and is usually denoted by x_i as the i-th coordinate function (see Example 4.25). □
Problem 5.35 Show that any square matrix P that satisfies P^T P = P is a projection matrix.

5.10 Exercises
5.1. Decide which of the following functions on R^2 are inner products and which are not. For x = (x1, x2), y = (y1, y2) in R^2:
(1) (x, y) = x1y1x2y2,
(2) (x, y) = 4x1y1 + 4x2y2 - x1y2 - x2y1,
(3) (x, y) = x1y2 - x2y1,
(4) (x, y) = x1y1 + 3x2y2,
(5) (x, y) = x1y1 - x1y2 - x2y1 + 3x2y2.
5.2. Show that the function (A, B) = tr(A^T B) for A, B ∈ M_{n×n}(R) defines an inner product on M_{n×n}(R).
5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ]R5.
5.4. Determine the values of k so that the given vectors are orthogonal with respect to the Euclidean inner product in R^4.
(l)IUlun (2)IUH -nl

5.5. Consider the space C[0, 1] with the inner product defined by

    (f, g) = ∫_0^1 f(x)g(x) dx.

Compute the length of each vector and the cosine of the angle between each pair of vectors
in each of the following :
(1) f(x) = 1, g(x) = x;
(2) f(x) = x^m, g(x) = x^n, where m, n are positive integers;
(3) f(x) = sin mπx, g(x) = cos nπx, where m, n are positive integers.
5.6. Prove that

    (a1 + a2 + ... + an)^2 ≤ n(a1^2 + a2^2 + ... + an^2)

for any real numbers a1, a2, ..., an. When does equality hold?

5.7. Let V = P2([0, 1]) be the vector space of polynomials of degree ≤ 2 on [0, 1] equipped with the inner product

    (f, g) = ∫_0^1 f(t)g(t) dt.

(1) Compute (f, g) and 1If11 for f(x) = x + 2 and g(x) = x 2 - 2x - 3.


(2) Find the orthogonal complement of the subspace of scalar polynomials.
5.8. Find an orthonormal basis for the Euclidean 3-space R^3 by applying the Gram-Schmidt orthogonalization to the three vectors x1 = (1, 0, 1), x2 = (1, 0, -1), x3 = (0, 3, 4).
5.9. Let W be the subspace of the Euclidean space R^3 spanned by the vectors v1 = (1, 1, 2) and v2 = (1, 1, -1). Find Proj_W(b) for b = (1, 3, -2).
5.10. Show that if u is orthogonal to v, then every scalar multiple of u is also orthogonal to v. Find a unit vector orthogonal to v1 = (1, 1, 2) and v2 = (0, 1, 3) in the Euclidean 3-space R^3.
5.11. Determine the orthogonal projection of VI onto v2 for the following vectors in the n-space
]Rn with the Euclidean inner product.
(1) VI = (1, 2, 3), v2 = (1, 1, 2),
(2) VI = (1,2, 1), V2 = (2, 1, -1),
(3) VI = (1, 0, 1, 0) , v2 = (0, 2, 2, 0).
5.12. Let S = {v;}, where Vi'S are given below. For each S, find a basis for Sol with respect to
the Euclidean inner product on ]Rn.
(1) VI = (0, 1, 0), v2 = (0, 0, 1),
(2) VI = (1, 1, 0), V2 = (1, 1, 1),
(3) VI = (1, 0, 1,2), v2 = (1, 1, 1, 1), vs = (2, 2, 0, 1).
5.13. Which of the following matrices are orthogonal?

1/2 -1/3 ] 4/5 -3/5 ]


(1) [ -1/2 1/3' (2) [ -3/5 4/5'

1/./2
° ° -1/./2] 1/ ./2 1/./3 -1/..;'6]

°
(3) -1/./2 1/./2 , (4) 1/./2 -1/./3 1/..;'6 .
[ [
-1/./2 1/./2
° 1/./3 2/..;'6

5.14. Let W be the subspace of the Euclidean 4-space ]R4 consisting of all vectors that are
orthogonal to both x = (1, 0, -1 , 1) and Y = (2, 3, -1, 2). Find a basis for the
subspace W.
5.15. Let V be an inner product space. For vectors x and y in V, establish the following identities:
(1) (x, y) = (1/4)||x + y||^2 - (1/4)||x - y||^2 (polarization identity),
(2) (x, y) = (1/2)(||x + y||^2 - ||x||^2 - ||y||^2) (polarization identity),
(3) ||x + y||^2 + ||x - y||^2 = 2(||x||^2 + ||y||^2) (parallelogram equality).


5.16. Show that x + y is perpendicular to x - y if and only if ||x|| = ||y||.

Figure 5.6. n-dimensional parallelepiped P(A)

5.17. Let A be the m x n matrix whose columns are c1, c2, ..., cn in the Euclidean m-space R^m. Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors cj's in R^m is given by

    vol(P(A)) = √(det(A^T A)).

(Note that the volume of the n-dimensional parallelepiped determined by the vectors c1, c2, ..., cn in R^m is by definition the product of the volume of the (n - 1)-dimensional parallelepiped (base) determined by c2, ..., cn and the height of c1 from the plane W which is spanned by c2, ..., cn. Here, the height is the length of the vector c = c1 - Proj_W(c1), which is orthogonal to W. (See Figure 5.6.) If the vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n.)
5.18. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space jR4 whose
vertices are at (0,0,0,0), (1,0,0,0), (0, 1,2,2) and (0,0, 1,2).
5.19. For an orthogonal matrix A, show that det A = ±1. Give an example of an orthogonal matrix A for which det A = -1.
5.20. Find orthonormal bases for the row space and the null space of each of the following
matrices.

243] 1 40]
(I)
[21 1 I ,
0 I
(2)
[ -2 -3 1
002
,

5.21. Let A be an m x n matrix of rank r. Find a relation among m, nand r so that Ax = b has
infinitely many solutions for every b E jRm .
5.22. Find the equation of the straight line that fits best the data of the four points (0, 1), (1, 3),
(2, 4), and (3, 4).
5.23. Find the cubic polynomial that fits best the data of the five points
(-1 , -14), (0, -5), (1, -4), (2, 1), and (3, 22).
5.24. Let W be the subspace of the Euclidean 4-space IR4 spanned by the vectors Xj 'S given in
each of the following problems. Find the projection matrix P for the subspace W and the
null space N(P) of P. Compute Pb for b given in each problem.

(1) x1 = (1, 1, 1, 1), x2 = (1, -1, 1, -1), x3 = (-1, 1, 1, 0), and b = (1, 2, 1, 1).
(2) x1 = (0, -2, 2, 1), x2 = (2, 0, -1, 2), and b = (1, 1, 1, 1).
(3) x1 = (2, 0, 3, -6), x2 = (-3, 6, 8, 0), and b = (-1, 2, -1, 1).
5.25. Find the matrix for orthogonal projection from the Euclidean 3-space jR3 to the plane
spanned by the vectors (1, 1, 1) and (1, 0, 2).
5.26. Find the projection matrix for the row space and the null space of each of the following

[{s -f ],
matrices:

[ 24 I] 1 4 0]
(1) (2) 1 1 I ' (3) 0 0 2 .
[ 2 3 -1
../5 ../5
5.27. Consider the space C[-1, 1] with the inner product defined by

    (f, g) = ∫_{-1}^{1} f(x)g(x) dx.

A function f ∈ C[-1, 1] is even if f(-x) = f(x), and odd if f(-x) = -f(x). Let U and V be the sets of all even functions and odd functions in C[-1, 1], respectively.
(1) Prove that U and V are subspaces and C[-1, 1] = U + V.
(2) Prove that U ⊥ V.
(3) Prove that for any f ∈ C[-1, 1], ||f||^2 = ||h||^2 + ||g||^2, where f = h + g ∈ U ⊕ V.
5.28. Determine whether the following statements are true or false, in general, and justify your answers.
(1) An inner product can be defined on any vector space.
(2) Two nonzero vectors x and y in an inner product space are linearly independent if and only if the angle between x and y is not zero.
(3) If V is perpendicular to W, then V^⊥ is perpendicular to W^⊥.
(4) Let V be an inner product space. Then ||x - y|| ≥ ||x|| - ||y|| for any vectors x and y in V.
(5) Every permutation matrix is an orthogonal matrix.
(6) For any n x n symmetric matrix A, x^T Ay defines an inner product on R^n.
(7) A square matrix A is a projection matrix if and only if A^2 = I.
(8) For a linear transformation T : R^n → R^n, T is an orthogonal projection if and only if id_{R^n} - T is an orthogonal projection.
(9) For any m x n matrix A, the row space R(A) and the column space C(A) are orthogonal.
(10) A linear transformation T is an isomorphism if and only if it is an isometry.
(11) For any m x n matrix A and b ∈ R^m, A^T Ax = A^T b always has a solution.
(12) The least squares solution of Ax = b is unique for any matrix A.
(13) The least squares solution of Ax = b is the orthogonal projection of b on the column space of A.
Selected Answers and Hints

Chapter1
Problems
1.2 (1) Inconsistent.
(2) (x1, x2, x3, x4) = (-1 - 4t, 6 - 2t, 2 - 3t, t) for any t ∈ R.
1.3 (1) (x , y, z) = (1, -1 , 1). (3) (w, x , y, z) = (2, 0, 1, 3)
1.4 (1) bl + bZ - b3 = O. (2) For any bj's.
1.7 a = -¥, b = ¥, c = .!j, d = -4.
1.9 Consider the matrices : A = [; : l B = [; J. C = J-
1.10 Compare the diagonal entries of AA T and AT A.
1.12 (1) Infinitely many for a = 4, exactly one for a ≠ ±4, and none for a = -4.
(2) Infinitely many for a = 2, none for a = -3, and exactly one otherwise.
1.14 (3) I = I^T = (AA^{-1})^T = (A^{-1})^T A^T means by definition (A^T)^{-1} = (A^{-1})^T.
1.16 Use Problem 1.14(3).
1 .17 Any permutation on n objects can be obtained by taking a finite number of interchangings
of two objects .
1.21 Consider the case that some dj is zero.
1.22 X = 2, y = 3, Z = 1.
1.23 True if Ax = b is consistent, but not true in general.
1.24 L =[ -
o
-I I
U=
0
[b - - ].
0 I
1.25 (1) Consider (i, j)-entries of AB for i < j .
(2) A can be written as a product of lower triangular elementary matrices .

1 01 0] [20 u=
1.26 L=
[ -1/2 0
o -2/3 1
,D=
0 o 4/3
] ,
0 0 1
1.27 There are four possibilities for P.
1.29 (1) II = 0.5, 12 = 6, 13 = 0.55. (2) II = 0, h = 13 = 1,14 = 15 = 5.
0.35 ]
1.30 x = k 0.40 for k > O.
[ 0.25

0.0 0.1 0.8] [ 90 ]


1.31 A = 0.4 0.7 0.1 with d = 10 .
[ 0.5 0.0 0.1 30

Exercises

1.1 Row-echelon forms are A, B, D, F . Reduced row-echelon forms are A, B, F.


1 -3 2 1 2]
1.2 (1) .
[
o 0 0 0 0
1 -3 0 3/2 1/2]
1.3 (1) 0 0 1 -1/4 3/4
[
o 0 0 0 0 .
o 0 0 0 0
1.4 (1) XI = 0, X2 = 1, X3 = -1, X4 = 2. (2) X = 17/2, y = 3, Z = -4.
1.5 (1) and (2) .
1.6 For any bi'S.
1.7 bl - 2b2 + 5b3 i= O.
1.8 (1) Take x the transpose of each row vector of A.
1.10 Try it with several kinds of diagonal matrices for B.
1 2k 3k(k - 1) ]
1.11 Ak = 0 1 3k .
[ o 0 1

1.13 (2)
5
0
-2227
101]
-60 .
[ o 0 87
1.14 See Problem 1.9.
1.16 (1) A-lAB = B. (2) A-lAC =C = A + I.
1.17 a = 0, c- I = b i= O.

1 -1 0 0] [13/8
11 8 A-I _ 0 1/2 -1/2 0 B-1 = -15/8
. - 0 0 1/3 -1/3 '
[
o 0 0 1/4 5/4

= -Is
8 -23
-19 2]
1.19 A-I
[4
1 4 .
-2 1

1.22 (I)x=A-Ib= [;] =


-1/3 -2/3 1/3 7 - 5/ 3
2]
1.23 (1) A = Ii = LDU , (2) L = A, D = U = I.

1.24 (1) A = ]
3 1 1 0 0 -1 0 0 1

(2) d_ ][ l
1.25 c=[2 -I3f,x=[423]T.

1.26 (2) A =
111002001
1.27 (1) (Ak)- I = (A-Ii . (2) An-I = 0 if A E Mnxn.
(3) (l- A)(l + A + ... + Ak- I) = I - Ak.

1.28 (1) A = 1 (2) A = r l


AZ = r l
A= I.

l
1.29 Exactly seven of them are true.
(1) See Theorem 1.9. (2) Consider A =

::: T]1f [? r) = BT AT =
BA .

5 3 1 1 3 2
(7) (A-Il = (AT)-I = A-I. (8) If A-I exists, A-I = A-I (AB T) = B T.
(9) If AB has the (right) inverse C, then A-I = BC.
(10) Con sider EI = and EZ = Ul
(12) Consider a permutation matrix

Chapter 2
Problems
2.2 (1) Note: 2nd column - 1st column = yd column - 2nd column.
2.4 (1) -27, (2) 0, (3) (1 - x 4)3 .
2.7 Let a be a transposition in Sn. Then the composition of a with an even (odd) permutation
in Sn is an odd (even, respectively) permutation.
2.9 (1) -14. (2) O.
2.10 (I)(y-x)(-x+z)(z-y)(w-x)(w-y)(w- z)(w+y+x+z) .
(2) (fa - be + cd) (fa + cd - eb) .

2.11 A-I = I =1]; adjA=


-5 2]
-1 I .
2 -! ! 6 8 -5

2.12 If A = 0, then clearly adj A = O. Otherwise , use A . adj A = (det A)/.


2.13 Use adj A . adj(adj A) = det(adj A) I .
2.14 If A and B are invertible matrices , then (AB)-l = B-1 A-I . Since for any invertible
= =
matrix A we have adj A (det A)A- 1, (1) and (2) are obvious. To show (3), let AB BA
and A be invertible. Then A-I B = A- 1(BA)A- 1 = A- 1(AB)A-1 = BA- 1, which
gives (3).
2.15 (1) xl = 4, X2 = 1, x3 = -2.
10 5 5
(2) x = 23' y = 6' z = 2'

2.16 The solution of the system id(x) = x is Xi = = det A.


Exercises

2.1 k = 0 or 2.
2.2 It is not necessary to compute A 2 or A 3 .
2.3 -37.
2.4 (1) det A = (_1)n-1 (n - 1). (2) O.
2.5 -2,0,1,4.
2.6 Consider L a1u(l) . .. anu(n)'
ueS.
2.7 (1) 1, (2) 24.
2.8 (3) Xl = 1, X2 = -1, x3 =2, X4 = - 2.
2.9 (2) x = (3,0, 4/11)T.
2.10 k = 0 or ±1.
2.11 x = (-5, 1,2, 3)T .
2.12 x = 3, y = -1, z = 2.
2.13 (3) All = - 2, A12 = 7, An = -8, A33 = 3.

2.16 A-I = -h. ] .


6 14 -18

2.17 (1) adj A = [


-4
i= 7 5
, det A = -7, det(adj A) = 49,

A-I = -+adj A . (2) adj A =


7 -3 -1
det A = 2, det(adj A) = 4, A-I = !-adj A .

2.19 Multiply

= [; 11 det AI = 4.
1
2.20 If we set A then the area is

2.21 If we set A = [i then the area is !.fl "'I(AT A)I 1#.



2.22 Use det A = L sgn(a)al 17(I)'" a n17(n)'


17eS n

l
2.23 Exactly seven of them are true.

(I) Consider A = [ ; ; ] and B = ;


(2) det(AB) = det A det B = det B det A .
(3) Consider A = [ ; ; ] and c = 3. (4) (cIn - A)T = cln - AT .

(5) Consider E = ] . (6) and (7) Compare their determinants.


(8) If A = h ? (9) Find its counter example.
(10) What happened for A = O?
(11) UyT = U[VI . .. vn] = [VIU' " vnu] ;
det(uyT) = VI . . . Vn det([u ··· uJ) = O.

: : ::::, 1 1o
'J': A : :l IT dot A 0'
(16) At = A -I for any permutation matrix A.

Chapter 3
Problems

3.1 Check the commutativity in addition.


3.2 (2), (4).
3.3 (1), (2), (4).
3.5 See Problem 1.11.
3.6 Note that any vector yin W is of the form aIxI + a2x2 + ... + amXm which is a vector
inU .
3.7 Use Lemma 3.7(2) .
3.9 Use Theorem 3.6
3.10 Any basis for W must be a basis for V already, by Corollary 3.12.
3.11 (I) dim= n - 1, (2) dim= n(nt) , (3) dim= n(yll.
3.13 63a + 39b - 13c + 5d = O.
3.15 If bj , . .. , b n denote the column vectors of B, then AB = [Abl ... Ab n].
3.16 Consider the matrix A from Example 3.20.
3.17 (I) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.
3.18 Ax = b has a solution if and only if b E C(A) .
3.19 A-I(AB) = B implies rank B = rank A-1(AB)::: rank(AB) .
3.20 By (2) of Theorem 3.21 and Corollary 3.18, a matrix A of rank r must have an invertible
submatrix C of rank r. By (1) of the same theorem, the rank of C must be the largest.
3.22 dim(V + W) = 4 and dim(V n W) = 1.

3.23 A basis for V is ÿ1 ,0,0,0), (0, -1 , 1,0), (0, -1,0, I)},


for W : ÿ-1 , 1,0,0), (0,0,2, I)}, and for V n W : ÿ3, -3,2, I)}. Thus,
dim(V + W) = 4 means V + W =]R4 and any basis for]R4 works for V + W.
3.26
1 0

A= °1 °1
[
° 1

Exercises

3.1 Consider 0(1, 1).


3.2 (5).
3.3 (2), (3). For (1), if f(O) = 1, 2f(O) = 2.
3.4 (1).
3.5 (1), (4).
3.6 tr(AB - BA) = 0.
3.7 No.
3.8 (1) p(x) = -PI (x) + 3P2(X) - 2P3(X).
3.9 No.
3.10 Linearly dependent.
3.12 No.
3.13 ((1,1,0) , (1,0, I)}.
3.14 2.
3 {} OO }
.15 Consider (ej = ai i=l where ai =
{I° if i = j ,
otherwise.

+ ... +cpAbs = A(qb l + . .. + cpb S ) implies qb l + .. . + cpb P = 0


3.16 (1) 0 = qAb l
°
since N (A) = 0, and this also implies Ci = for all i = 1, . . . , P since columns of Bare
linearly independent.
(2) B has a right inverse . (3) and (4) : Look at (1) and (2) above.
3.17 (1) {(-5, 3, I)}. (2) 3.
3.18 5f, and dependent.
3.19 (1) 'R,(A) = (1,2,0,3) , (0,0,1 ,2Ÿ), C(A) = (5,0,1), (0,5 ,2Ÿ),
N(A) = (-2, 1,0,0), (-3,0, -2, 1Ÿ) .
(2) 'R,(B) =
(1 ,1 , -2,2), (0,2,1, -5), (0,0,0, IŸ), C(B) =
((1, -2,0), (0,1,1),
(0,0,1»), N(B) = (5, -1,2,0Ÿ).
3.20 rank = 2 when x = -3, rank = 3 when x =f. - 3.
3.22 See Exercise 2.23: Each column vector of UyT is of the form ViU , that is, U spans the
column space . Conversely, if A is of rank I, then the column space is spanned by anyone
column of A, say the first column u of A, and the remaining columns are of the form ViU,
i = 2, . .. , n . Take v = [1 V2 ... vnf . Then one can easily see that A = UyT.
3.23 Four of them are true.
(1) A - A =1 or 2A = 1
(2) In]R2 , let a = [ej , e2} and f3 = [ej , -e2}.

(3) Even U = W, (X n fJ can be an empty set.


(4) How can you find a basis for C(A). See Example 3.20.
.
(5) Consider A = [0 0 ] and B = [ 1
1 0 0 0
0] .
(6) See Theorem 3.24. (7) See Theorem 3.25. (8) Ifx = -s.
=
(9) In]R2, Consider U ]R2 x 0 and V 0 x ]R2. =
(10) Note dim C(A T ) =
dim'R(A) dim C(A). =
(II) By the fundamental Theorem and the Rank Theorem.

Chapter 4
Problems

4.1 l since it is simply the change of coordinates x and y .

4.2 To show W is a subspace, see Theorem 4.2. Let Eij be the matrix with I at the (i, j)-th
position and 0 at others . Let Fk be the matrix with 1 at the (k, k) -th position, -I at the
(n, n)-thposition and 0 at others. Then theset{Eij, Fk : I :::: i f= j :::: n, k = I, ... , n-I}
is a basis for W . Thus dim W = n2 - 1.
4.3 tr(AB) = I:i=1 I:k=1 aikbki = I:k=1 I:i=1 bkiaik = tr(BA) .
4.4 If yes, (2, I) = T(-6 , -2, 0) = -2T(3 , I, 0) = (-2, -2).
4.5 If aivi + a2v2 + . ..+ akvk = 0, then
o = T(al VI + a2v2 + .., + akvk) = al WI + a2w2 + . .. + akwk implies ai = 0 for
i=I , ... ,k.
4.6 (I) If T(x) = T(y) , then S 0 T(x) = So T(y) implies x = y. (4) They are invertible.
4.7 (1) T(x) = T(y) if and only if T(x - y) = 0, i.e., x - y E Ker(T) .
(2) Let{VI, . . . , vn} be a basis for V. 1fT is one-to-one, then the set{T(vI), . .. , T(vn )} is
linearly independent as the proof of Theorem 4.7 shows. Corollary 3.12 shows it is a basis
=
for V . Thus, for any y E V, we can write it as y I:I=I aiT(vi) =
T(I:I=I aivi) . Set
x = I:I=I aivi E V . Then clearly T(x) = y so that T is onto. If T is onto, then for each
i = =
I , .. . , n there exists xi E V such that T(Xi) Vi.Then the set Ixj , ... ,xn} is linearly
independent in V,since, ifI:l=l aixi = = =
0, then 0 T(I:I=I aixi) I:I=I aiT(xi) =
I:I=I aivi implies ai = 0 for all i = I, . . . ,n. Thus it is a basis by Corollary 3.12 again.
If T(x) = 0 for x = I:I=I aixi E V , then 0 = T(x) =
I:I=I aiT(xi) =
I:I=I aivi
implies ai = =
0 for all i =
I , . . . , n, that is x O.Thus Ker (T) =
{O}.

4.8 Use rotation R!f andrefiection about the x-axis.

4.9 (1) (5, 2, 3). (2) (2, 3, 0).

4.10 (I)[Tla=[;
4
=i
7 0 4 -3
245 ].

4.11 =
o 2 3 4
U Un

413

4.14 = - , [Tl a =
o 0 1 0 0 4

4.15 (2) = [T -1l p =[ - l


4.16 -;
2 2 1 1 1 1 -2

4.17
1 0 4 1 1 5
4.18 Write B = Q-I AQ with some invertible matrix Q.
(1) det B = det(Q-I AQ) = det Q-I det Adet Q = det A. (2) tr(B) = tr(Q-I AQ) =
tr(QQ-I A) = tr(A) (see Problem 4.3). (3) Use Problem 3.19.
4.20 a* = {fI (x, y, z) = x- !y, !2(x, y, z) = !y, f3(X, y, z) = -x + z},
4.24 By BA, we get a tilting along the x-axis; while by BT A, one can get a tilting along the
y-axis .

Exercises

4.1 (2).
4.2 ax 3 + bx 2 + ax + c.
4.4 S is linear because the integration satisfies the linearity.
4.5 (1) Consider the decomposition ofv = v+I(V) + v-I(v).
4.6 (1) {(x, 2x) E lR3 : x E R}.
4.7 (2) T-1(r, s, t) = (! r, 2r - s, 7r - 3s - r) ,
4.8 (1) Since T 0 S is one-to-one from V into V, T 0 S is also onto and so T is onto . Moreover,
if S(u) = S(v), then T 0 S(u) = T 0 S(v) implies u = v. Thus, S is one-to-one, and
so onto . This implies T is one-to-one. In fact, if T(u) = T(v), then there exist x and y
such that S(x) = u and S(y) = v. Thus T 0 S(x) = T 0 S(y) implies x = y and so
u = T(x) = T(y) =v.
4.9 Note that T cannot be one-to-one and S cannot be onto.
4.12 vol(T(CŸ = Idet(A)lvo1(C), for the matrix representation A of T.
4.13 (3) (5, 2, 0).
5 4
-4 -3
4.14 0 0
[
o 0 o 1
-1/3
4.15 (1) [ -5/3 2/3 ]
1/3 .

4.16 (1) _ i l [i (2) 1


4.17 (1) T(l , 0, 0) = (4, 0), T(1, I, 0) = (1, 3), T(1 , I , I) = (4, 3).
(2) T(x, y, z) = (4x - 2y + z, Y+ 2z) .

4.18

i iJ
001 000

(I) h U Q [: tr:' ,

4.20 Compute the trace of AB - BA .

4.21 (1) [-7 -13]


4
-33
19 8'

-2/3 1/3 4/3]


4.22 (2) ;], (4) 2/3 -1/3 -1/3 .
[ 7/3 -2/3 -8/3
4.23 ' All represents reflection in a line at angle ()/2 to the x -axis . And , any two such reflections
are similar (by a rotation).
4.25 Compute their determinants.

4.27 [T]a = = ([T*Ja*)T .


1 0 1

4.28 (1) = [-i ; 1


4J9 I , 0, I), 0, I, I , I) , 14,2,2,3Ÿ, [

4.31 PI (x) =1 + x - P2(x) = -i + P3(X) = -j. + x -


4.32 Five of them are false.
(1) Consider T : jR2 -+ jR3 defined by T(x , y) = (x, 0, 0) .
(2) Note dim Ker(T) + dim 1m(T) = dim jRn = n ,
(3) dim Ker(T) = dim
(4) dim Im(T) =
dimC([T]a) dimjR([TJa). =
(5) Ker(T) CKer(S 0 T) .
(6) P (x) = 2 is not linear.
(7) Use the definition of a linear transformation.
(8) and (10) See Remark (1) in Section 4.3.
(9) T : jRn -+ jRn is one-to-one iff T is an isomorphism. See Remark in Section 4.4 and
Theorem 4.14.
(11) By Definition 4.5 and Theorem 4.14.
(12) Cf. Theorem 4.17. (13) det(A + B) i= det(A) + det(B) in general.
(14) T(O) i= (0) in general for a translation T.

Chapter 5
Problems
5.1 Let (x, y) = x^T Ay = a x1y1 + c(x1y2 + x2y1) + b x2y2 be an inner product. Then, for x = (1, 0), (x, x) = a > 0. Similarly, for any x = (x1, 1), (x, x) > 0 implies ab - c^2 > 0.
5.2 (x, y)^2 = (x, x)(y, y) if and only if ||tx + y||^2 = (x, x)t^2 + 2(x, y)t + (y, y) = 0 has a repeated real root t0.
5.3 If dl < 0, then for x = (1, 0, 0), (x, x) = dl < 0: Impossible.
5.4 (4) Compute the square of both sides and then use Cauchy-Schwarz inequality.
5.5 (4) Use Problem 5.4(4): triangle inequality in the length.
5.6 (f, g) = ∫_0^1 f(x)g(x) dx defines an inner product on C[0, 1]. Use the Cauchy-Schwarz inequality.
5.7 (2)-(3): Use fJ
f(x)g(x) dx = 0 if k f. i; and = if k = e.
5.8 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.
5.10 Clearly, x1 = (1, 0, 1), x2 = (0, 1, 2) are in W. The Gram-Schmidt orthogonalization gives u1 = (1/√2)(1, 0, 1), u2 = (1/√3)(-1, 1, 1), which form a basis.

5.11 {1, √3(2x - 1), √5(6x^2 - 6x + 1)}.

5.12 (4) Im(id_V - T) ⊆ Ker(T) because T(id_V - T)(x) = T(x) - T^2(x) = 0. Im(id_V - T) ⊇ Ker(T) because if T(x) = 0, then x = x - 0 = x - T(x) = (id_V - T)x.
5.13 (1) (x , x) = 0 for only x = O. (2) Use Definition 5.6(2).
5.16 1) is just the definition, and use (1) to prove (2).
5.17
5.18 (1) b E C(A) and y E N(A T) .

5.19 The null space of the matrix [b -i i] is


x=t[I - I l O]T +s[-4IOIffort,sElR.
5.20 Note: R(A)l. = N(A).
5.22 There are 4 rotations and 4 reflections.
5.23 (1) r = s= a = JJ ' b = - JJ ' C = - JJ.
5.24 Extend {v1, ..., vm} to an orthonormal basis {v1, ..., vm, ..., vn}. Then ||x||^2 = Σ_{i=1}^{m} |(x, vi)|^2 + Σ_{j=m+1}^{n} |(x, vj)|^2.
5.25 (1) orthogonal. (2) not orthogonal.
5.26 x = (1, -1 , 0) + t(2, 1, -1) for any number t .

5.28 [ ] =x= (AT A)-IATb=


'!g 16.1
5.30 For x ∈ R^m, x = (v1, x)v1 + ... + (vm, x)vm = (v1 v1^T)x + ... + (vm vm^T)x.
5.31 The line is a subspace with an orthonormal basis {(1, 1)/√2}, or is the column space of

5.32 P = [ 7 ; - ].
3 1 - I 2
5.33 Note that {e1, e2, e4} is an orthonormal basis for the subspace.
5.35 Hint: First, show that P is symmetric.
Exercises
5.1 Inner products are (2) , (4), (5).
5.2 For the last condition of the definition, note that (A, A) = tr(A^T A) = Σ_{i,j} a_ij^2 = 0 if and only if a_ij = 0 for all i, j.
5.4 (I)k = 3.
5.5 (3) ||f|| = ||g|| = √(1/2). The angle is 0 if n = m, and π/2 if n ≠ m.
5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with x = (a1, ..., an) and y = (1, ..., 1) in (R^n, ·).
5.7 (1) -37/4, √(19/3).
(2) If (h, g) = h(a/3 + b/2 + c) = 0 with h ≠ 0 a constant and g(x) = ax^2 + bx + c, then (a, b, c) is on the plane a/3 + b/2 + c = 0 in R^3.
5.9 Hint: For A = [V) V2], two columns are linearly independent, and its column space
is W.
5.11 (1) (3/2)v2, (2) (1/2)v2.
5.13 Orthogonal: (4). Nonorthogonal: (I ), (2), (3).
5.17 Use induction on n. Let B be the matrix A with the first column c1 replaced by c = c1 - Proj_W(c1), and write Proj_W(c1) = a2c2 + ... + ancn for some ai's. Show that √(det(A^T A)) = √(det(B^T B)) = ||c|| vol(c2, ..., cn) = vol(P(A)).

5.18 Let A be the 4 x 3 matrix whose columns are (1, 0, 0, 0), (0, 1, 2, 2), (0, 0, 1, 2). Then the volume of the tetrahedron is (1/3!)√(det(A^T A)) = 1/2.
5.19 A^T A = I and det A^T = det A imply (det A)^2 = 1, so det A = ±1. The matrix

    A = [ cos θ   sin θ ]
        [ sin θ  -cos θ ]

is orthogonal with det A = -1.

5.21 Ax = b has a solution for every b E jRm if r = m. It has infinitely many solutions if
nullity =n - r =n - m > O.

5.22 Find a least squares solution of Ax = b for the line y = a + bx, where the rows of A are (1, xi) and the entries of b are the yi for the four data points. Then y = x + 3/2.

5.23 Follow Exercise 5.22 with the rows of A being (1, xi, xi^2, xi^3) for xi = -1, 0, 1, 2, 3. Then y = 2x^3 - 4x^2 + 3x - 5.

5.27 (1) Let h(x) = (1/2)(f(x) + f(-x)) and g(x) = (1/2)(f(x) - f(-x)). Then f = h + g.
(2) For f ∈ U and g ∈ V, (f, g) = ∫_{-1}^{1} f(x)g(x)dx = ∫_{-1}^{1} f(-t)g(-t)dt = -∫_{-1}^{1} f(t)g(t)dt = -(f, g), by the change of variable x = -t; hence (f, g) = 0.
(3) Expand the length in the inner product.
5.28 Five of them are true.
(1) Possible via a natural isomorphism. See Theorem 5.8.
(2) Consider (I , 0) and (-1 , 0).
(3) Consider two subspaces U and W of R 3 spanned by el and e2, respectively.
(4) ||x - y|| + ||y|| ≥ ||x|| by the triangle inequality.
(5) The columns of any permutation matrix P are {el, ... , en} in some order.
(6) See Theorem 5.5.
(7) A is a projection iff A^2 = A. (See Theorem 5.9.)
(8) By Corollaries 5.10 and 5.12.
(9) R(A) and C(A) are not subspaces of the same vector space .
(10) A dilation is an isomorphism.
(11) ATb E C(A T A) always.
(12) The solution set is Xo + N(A) by Corollary 5.17.

Chapter 6

l
Problems

6.3 Consider the matrices and

6.4 Check with A = l


6.5 (1) Use det A = λ1 ⋯ λn. (2) Ax = λx if and only if x = λA^{-1}x.
6.6 If A is invertible, then AB = A (B A)A -I. For two singular matrices A =
and B = A Band BA are not similar, but they have the same eigenvalues .
6.7 (1) If Q =
[XI X2 X3] diagonalizes A, then the diagonal matrix must be AI and
=
A Q AQI . Expand this equation and compare the corresponding columns of the
equation to find a contradiction on the invertibility of Q.

6.8 Q = l D = J Then A = QDQ-I = :l


6.9 (1) The eigenvalues of A are 1, 1, -3, and their associated eigenvectors are (1, 1, 0), (-1, 0, 1) and (1, 3, 1), respectively.
(2) If f(x) = x^10 + x^7 + 5x, then f(1), f(1) and f(-3) are the eigenvalues of A^10 + A^7 + 5A.

6.11 Note that [


an-I
] [i
=
0 1 0 a n -2
] .TheeigenvaluesareI ,2,-Iand

eigenvectors are (1, I, 1), (4, 2,1) and (1, -1 , 1), respectively. It turns out that an =
2 2n
2 - (_I)n_ - -.
3 3
6.13 Its characteristic polynomial is f(λ) = (λ + 1)(λ - 2)^2; so (-1)^n, 2^n, n2^n form a fundamental set.
