Professional Documents
Culture Documents
Sungpyo Hong
Linear Algebra
Second Edition
2 Determinants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 45
2.1 Basic properties of the determinant. . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Existence and uniqueness of the determinant. . . . . . . . . . . . . . . . . . .. 50
2.3 Cofactor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 56
2.4 Cramer's rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 61
2.5 Applications .... . ....... ... . . . .. . ... ............. ... ... .... 64
2.5.1 Miscellaneous examples for determinants. . . . . . . . . . . . . . .. 64
2.5.2 Area and volume. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 67
2.6 Exercises ............. ..... . . .. . . .. . ...... .... .. .. . .. .. . . . 72
xii Contents
3 Vector Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 75
3.1 The n-space jRn and vector spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2 Subspaces. . . . .. . . . ... . . .. . . . .. . . . . . ... . ... ... . . . . ... .... .. 79
3.3 Bases. . . .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4 Dimensions . ... . ... .. .. ... .. .. . .. .. . . . . .... . .. ... . . . ... . .. 88
3.5 Rowand column spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 91
3.6 Rank and nullity 96
3.7 Bases for subspaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 100
3.8 Invertibility.... .. ...... ... .. ... . .. . . ... . . . . . . . . . . . . . . . . . . . 106
3.9 Applications 108
3.9.1 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.9.2 The Wronskian 110
3.10 Exercises 112
I
allXI + a12 X2 + + alnXn = bl
a2lXI + a22x2 + + a2nXn = b2
where Xl , X2, . . . , Xn are the unknowns and aij's and b.'s denote constant (real or
complex) numbers.
A sequence of numbers (Sl, S2, .. . ,sn) is called a solution of the system if Xl =
sj , X2 = S2, , X n = Sn satisfy each equation in the system simultaneously. When
bl = b2 = = b m = 0, we say that the system is homogeneous .
The central topic of this chapter is to examine whether or not a given system has
a solution , and to find the solution if it has one. For instance, every homogeneous
system always has at least one solution XI = X2 = ... = Xn = 0, called the trivial
solution . Naturally, one may ask whether such a homogeneous system has a nontrivial
solution or not. If so, we would like to have a systematic method of finding all the
solutions. A system of linear equations is said to be consistent if it has at least one
solution, and inconsistent if it has no solution .
For example, suppose that the system has only one linear equation
HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
2 Chapter 1. Linear Equations and Matrices
always assume that not all the coefficients in each equation of the system are zero
unless otherwise specified.
ax +by = c,
a\x + b\y = c\
{ a2 X + b2Y = C2·
Solution: (I) Geometric method. Since the equations represent two straight lines in
the xy-plane, only three types are possible as shown in Figure 1.1.
{2xx-y -1 {x- y = -1
{x+ y 1
-2y = -2 x-y = 0 x-y = 0
y y y
x x
(II) Algebraic method. Case (1) (two lines coincide): Let the two equations rep-
resent the same straight line, that is, one equation is a nonzero constant multiple of
the other. This condition is equivalent to
1.1. Systemsof linear equations 3
In this case, if a point (s, t) satisfies one equation, then it automatically satisfies the
other too. Thus, there are infinitely many solutions which are all the points on the
line.
Case (2) (two lines are parallel but distinct) : In this case, a2 = Aa" bi =
Ab" but C2 : /= ACl for A : /= O. (Note that the first two equalities are equivalent to
a,b2 - a2b, = 0). Then no point (s, t) can satisfy both equations simultaneously, so
that there are no solutions .
Case (3) (two lines cross at a point) : Let the two lines have distinct slopes, which
means aib: - a2b, ::/= O. In this case, they cross at a point (the only solution), which
can be found by the elementary method of elimination and substitution. The following
computation shows how to do this:
Without loss of generality, one may assume a,: /= 0 by interchanging the two
equations if necessary. (If both a, and a2 are zero, the system reduces to a system of
one variable.)
(1) Elimination : The variable x can be eliminated from the second equation by adding
- a2 times the first equation to the second, to get
a,
c,
a,c2 -a2c,
a,b2 - a2b,
(3) Substitution: Now, x is solved by substituting the value of y into the first equation,
and we obtain the solution to the problem:
b2Cl - b,C2
aib: - a2b,
Note that the condition a, b2 - a2b, : /= 0 is necessary for the system to have only
one solution. D
In Example 1.2, the original system of equations has been transformed into a
simpler one through certain operations, called elimination and substitution, which is
4 Chapter 1. Linear Equations and Matrices
just the solution of the given system . That is, if (x, y) satisfies the original system
of equations, then it also satisfies the simpler system in (3), and vice-versa. As in
Example 1.2, we will see later that any system of linear equations may have either no
solution, exactly one solution, or infinitely many solutions. (See Theorem 1.6.)
Note that an equation ax + by + cz = d, (a, b, c) i- (0,0,0) , in three unknowns
represents a plane in the 3-space ]R3. The solution set includes
One can also examine the various possible types of the solution set of a system of
three equations in three unknown. Figure 1.2 illustrates three possible cases.
describe all the possible types of the solution set in the 3-space ]R3 .
(I') multiply the equation with the reciprocal of the same nonzero
constant ,
(2') interchange two equations again,
(3') add the negative of the same constant multiple of the equation to
the other.
Therefore, by applying a finite sequence of the elementary operations to the given
original system, one obtains another new system, and by applying these inverse oper-
ations in reverse order to the new system, one can recover the original system. Since
none of the three elementary operations alters the solutions, the two systems have
the same set of solutions. In fact, a system may be solved by transforming it into a
simpler system using the three elementary operations finitely many times.
These arguments can be formalized in mathematical language. Observe that in
performing any of these three elementary operations , only the coefficients of the
variables are involved in the operations, while the variables Xl, X2 , • • •, X n and the
equal sign U= " are simply repeated . Thus, keeping the places of the variables and U= "
in mind, we just pick up the coefficients only from the given system of equations and
make a rectangular array of numbers as follows:
This matrix is called the augmented matrix of the system. The tenn matrix means
just any rectangular array of numbers, and the numbers in this array are called the
entries of the matrix. In the following sections, we shall discuss matrices in general.
For the moment, we restrict our attention to the augmented matrix of a system.
Within an augmented matrix, the horizontal and vertical subarrays
are called the i-th ro w (matrix), which represents the i-th equation , and the j-th
column (matrix), which are the coefficients of j -th variable X i - of the augmented
6 Chapter 1. Linear Equations and Matrices
matrix, respectively. The matrix consisting of the first n columns of the augmented
matrix
The inverse row operations which are also elementary row operations are
(1st kind) multiply the row by the reciprocal of the same constant,
(2Dd kind) interchange two rows again,
(3rd kind) add the negative of the same constant multiple of
the row to the other.
Definition 1.1 Two augmented matrices (or systems of linear equations) are said to
be row-equivalent if one can be transformed to the other by a finite sequence of
elementary row operations .
Theorem 1.1 If two systems of linear equations are row-equivalent, then they have
the same set ofsolutions.
The general procedure for finding the solutions will be illustrated in the following
example :
1.2. Gaussian elimination 7
I
2y + 4z = 2
x + 2y + 2z = 3
3x + 4Y + 6z = -1.
Solution: One could work with the augmented matrix only. However, to compare
the operations on the system of linear equations with those on the augmented matrix,
we work on the system and the augmented matrix in parallel. Note that the associated
augmented matrix for the system is
3 4 6 -1
(1) Since the coefficient of x in the first equation is zero while that in the second
equation is not zero, we interchange these two equations :
I [b;; ;].
x + 2y + 2z = 3
2y + 4z = 2
3x + 4Y + 6z = -1 3 4 6 -1
I
x +
-
2y
2y
2y
+
+
2z =
4z =
=
3
2
-10 [
1
o
o
2
2
-2
;o ;].
-10
Thus , the first variable x is eliminated from the second and the third equations. In this
process , the coefficient 1 of the first unknown x in the first equation (row) is called
the first pivot.
Consequently, the second and the third equations have only the two unknowns y
and z. Leave the first equat ion (row) alone, and the same elimination procedure can
be applied to the second and the third equations (rows): The pivot to eliminate y from
the last equation is the coefficient 2 of y in the second equation (row).
(3) Add 1 times the second equation (row) to the third equation (row):
I [b;; ;] .
x + 2y + 2z = 3
2y + 4z = 2
4z = -8 o 0 4 -8
The elimination process (i.e., (1) : row interchange , (2): elimination of x from the
last two equations (rows), and then (3): elimination of y from the last equation (row))
done so far to obtain this result is called forward elimination. After this forward
elimination, the leftmost nonzero entries in the nonzero rows are called the pivots .
Thus the pivots of the second and third rows are 2 and 4, respectively.
(4) Normalize nonzero rows by dividing them by their pivots. Then the pivots are
replaced by 1:
r
8 Chapter 1. Linear Equations and Matrices
+ +
2y
y +
2z
2z
z
=
=
= -2
3
I
o 0 I -2
iJ .
The resulting matrix on the right-hand side is called a row-echelon form of the
augmented matrix, and the I's at the pivotal positions are called the leading I's. The
process so far is called Gaussian elimination.
The last equation gives z = -2. Substituting z = -2 into the second equation
gives y = 5. Now, putting these two values into the first equation, we get x = -3. This
process is called back substitution.The computation is shown below: i.e., eliminating
numbers above the leading I's,
(5) Add -2 times the third row to the second and the first rows:
{X +2; = 7 2 0
z
=
= -2
(6) Add -2 times the second row to the first row:
5
U I 0
o I -2
= -3 0 0
J
{ x y
z
=
=
5
-2 U
I 0 -35
o I -2
.
This resulting matrix is called the reduced row-echelon form of the augmented
matrix, which is row-equivalent to the original augmented matrix and gives the so-
lution to the system . The whole process to obtain the reduced row-echelon form is
called Gauss-Jordan elimination. 0
Example 1.4 The first three augmented matrices below are in reduced row-echelon
form, and the last one is just in row-echelon form.
000 000 0 0
i 000 1 001 3
Recall that in an augmented matrix [A b], the last column b does not correspond
to any variable. Thus , if the reduced row-echelon form of an augmented matrix for a
nonhomogeneous system has a row of the form [ 0 0 ... 0 b ] with b f:. 0, then the
associated equation is OXj + OX2 + . .. + OXn = b with b f:. 0, which means the
system is inconsistent. If b = 0, then it has a row containing only O's, which can be
neglected and deleted . In this example, the third matrix shows the former case, and
the first two matrices show the latter case. 0
In the following example, we use Gauss-Jordan elimination again to solve a
system which has infinitely many solutions .
Example 1.5 Solve the following system of linear equations by Gauss-Jordan elim-
I
ination.
Xj + 3X2 - 2X3 = 3
2xj + 6X2 - 2X3 + 4X4 = 18
X2 + X3 + 3X4 = 10.
13-2 0 3]
[ 2 6 -2 4 18
o 1 1 3 10
.
13-2 0 3]
[ o
o
0
1
2 4 12
1 3 10
.
(2) Note that the coefficient of X2 in the second equation is zero and that in the
third equation is not. Thus, interchanging the second and the third rows produces
13-2 0 3]
[ 011310
o 0 2 4 12
.
(3) Dividing the third row by the pivot 2 produces a row-echelon form
10 Chapter 1. Linear Equations and Matrices
Xl + X4 = 3
+ = 4
1
X2 X4
x3 + 2X4 = 6.
:
1 X3 = 6 2X4 .
Since there is no other condition on X4, one can see that all the other variables x}, X2,
and X3 may be uniquely determined if an arbitrary real value t e lRis assigned to X4
(R denotes the set of real numbers): thus the solutions can be written as
Definition 1.3 Among the variables in a system, the ones corresponding to the
columns containing leading l's are called the basic variables, and the ones cor-
responding to the columns without leading 1's, if there are any, are called the free
variables.
Clearly the sum of the number of basic variables and that of free variables is equal
to the total number of unknowns: the number of columns .
In Example 1.4, the first and the last augmented matrices have only basic variables
but no free variables, while the second one has two basic variables Xl and X3, and two
1.3. Sums and scalar multiplications of matrices 11
free variables X2 and X4. The third one has two basic variables XI and X2, and only
one free variable X3.
In general, as we have seen in Example 1.5, a consistent system has infinitely
many solutions if it has at least one free variable, and has a unique solution if it has no
free variable. In fact, if a consistent system has a free variable (which always happens
when the number of equations is less than that of unknowns), then by assigning
arbitrary value to the free variable, one always obtains infinitely many solutions.
Theorem 1.2 If a homogeneous system has more unknowns than equations, then it
has infinitely many solutions.
Problem 1.2 Suppose that the augmented matrices for some systems of linear equations have
been reduced to reduced row-echelon forms as below by elementary row operations. Solve the
systems:
(1)
[
1 0
0 I
o 0
o
o
o
5]
-2 ,
4
(2) [bo 0 I 3
-I ]
6
2
.
Problem 1.3 Solve the following systems of equations by Gaussian elimination. What are the
pivots?
-x + y + 2z 0 2y z I
(1)1 3x + 4y + z 0 (2) 14X lOy + 3z 5
2x + 5y + 3z = O. 3x 3y 6.
w + x + y = 3
-3w 17x + y + 2z I
(3)\ 4w 17x + 8y 5z I
5x 2y + z l.
Problem 1.4 Determine the condition on bi so that the following system has a solution.
+ 2y + 6z bl + 3y 2z bl
(1) 1 3y 2z = b2 (2) { + 3z
Y b2
3x y + 4z b3· 4x + 2y + Z b3·
or just A = [aij] if the size of the matrix is clear from the context. The number aij is
called the (i, j)-entry of the matrix A , and is written as aij = [A]ij.
An m x 1 matrix is called a column (matrix) or sometimes a column vector, and
a 1 xn matrix is called a row (matrix), or a row vector. In general, we use capital
letters like A, B, C for matrices and small boldface letters like x, y, z for column
or row vectors.
Definition 1.5 Let A = [aij] be an mxn matrix. The transpose of A is the nxm
matrix, denoted by AT, whose j -th column is taken from the j -th row of A: that is,
[AT]ij = [Alji.
o
Definition 1.6 A matrix A = [aij] is called a square matrix of order n if the number
of rows and the number of columns are both equal to n.
The following matrices U and L are the general forms of the upper triangular and
lower triangular matrices, respectively:
Jl
ln
a2n
a
. . ]
, L=
[
.. : a n2
1.3. Sums and scalar multiplications of matrices 13
Note that a matrix which is both upper and lower triangular must be a diagonal matrix,
and the transpose of an upper (lower, respectively) triangular matrix is lower (upper,
respectively) triangular.
Definition 1.8 Two matrices A and B are said to be equal, written A = B , if their
sizes are the same and their corresponding entries are equal : i.e., [A]ij = [B]ij for
all i and j.
a ll . ..
k . .
[
. .
amI
(2)(Sum ofmatrices) For two matrices A = [aij] and B = [bij] in Mmxn(R), the
sum of A and B is defined to be the matrix A + B such that [A + B]ij [A]ij + [B]ij
=
for all i and j : i.e., in an expanded form:
aln]
:. +
[b
11
:.
.. .
'. '
al
n
7
bIn ].
124]+[ad e fc]
[3 b
cannot be defined.
If B is any matrix , then - B is by definition the multiplication (-1) B. Moreover, if
A and B are two matrices of the same size, then the subtraction A - B is by definition
the sum A + (-1) B. A matrix whose entries are all zeros is called a zero matrix,
denoted by the symbol 0 (or Omxn when the size is emphasized).
Clearly, matrix sum has the same properties as the sum of real numbers. The real
numbers in the context here are traditionally called scalars even though "numbers"
is a perfectly good name and "scalar" sounds more technical. The following theorem
lists the basic arithmetic properties of the sum and scalar multiplication of matrices.
14 Chapter 1. Linear Equations and Matrices
Theorem 1.3 Suppose that the sizes of A, Band C are the same. Then the following
arithmetic rules ofmatrices are valid:
(1) (A + B) + C = A + (B + C) , (written as A + B + C) (Associativity),
(2) A+ 0 0+ A
= A, =
(3) A+(-A)=(-A)+A=O,
(4) A + B = B + A, (Commutativity),
(5) k (A + B) = kA + k.B,
(6) (k + £)A = kA + lA,
(7) (kl)A = k (fA).
Proof: We prove only the equality (S) and the remaining ones are left for exercises.
For any (i, j),
ij = k[A + B]ij = k([A]ij + [B]ij) = [kA] ij + [kB]ij
= [kA +kB]ij . o
In particular, A + A = 2A , A + (A + A) = 3A = (A + A) + A, and inductively
nA = (n - l)A + A for any positive integer n,
A = b ] , B = ] ,
sc
b c -2 -3 0
are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries
of a skew-symmetric matrix must be zero, since aii = -aii .
By a direct computation, one can easily verify the following properties of the
transpose of matrices:
Theorem 1.4 Let A and B be m x n matrices. Then
(kA)T = kAT and (A + B)T = AT + B T.
Problem 1.5 Prove the remaining parts of Theorem 1.3.
2-3 0]
A = [ 4 -1 3
-1 0 1
.
Note that the number of entries of the first row vector is equal to the number of entries
of the second column vector, so that an entry-wise multiplication is possible.
Step (2) Product of a matrix and a vector: For an m x n matrix
with the row vectors a i'S and for an n x 1 column vector x = [XI'" xn]T, the
product Ax is by definition an m x 1 matrix , whose m rows are computed according
to Step (1 ):
n
Ax =
al
oi« ] [ XI]
X2 a2X [I:?=l
[alX] I:?=I aloux.ix i ]
ami a m2 • = = I:?=I:amiXi .
AB = [ Abl
16 Chapter 1. Linear Equations and Matrices
:
blj bl r
[ bll b2j b2r
AB
B= [51 - 20]
1 0 .
2 · 1 + 3 . 5 ] _ [ 17 ]
[ 4 ·1+0 ·5 - 4 '
2 ·2 + 3 . (-1) ] _ [ 1 ]
[-i] = [ 4 ·2+0·(-1) - 8 '
2 .0 + 3.0 ] _ [ 0 ]
[ 4 ·0+0 ·0 - 0 .
Therefore , AB is
2 3] [1 2 0] = [17 1 0]
[ 4 0 5 -1 0 4 8 0 .
Note that the product AB of A and B is not defined if the number of columns of
A and the number of rows of B are not equal.
1.4. Products of matrices 17
Remark: In step (2), instead of defining a product of a matrix and a vector, one can
define alternatively the product of a 1 x n row matrix a and an n x r matrix Busing
the same rule defined in step (1) to have a 1 x r row matrix aB. Accordingly, in step
(3) an appropriate modification produces the same definition of the product of two
matrices . We suggest that readers complete the details. (See Example 1.10.)
The identity matrix of order n, denoted by In (or I if the order is clear from the
context) , is a diagonal matrix whose diagonal entries are all 1, i.e.,
By a direct computation, one can easily see that AIn = A = InA for any n x n matrix
A.
The operations of scalar multiplication, sum and product of matrices satisfy many,
but not all, of the same arithmetic rules that real or complex numbers have. The matrix
Om xn plays the role of the number 0, and In plays that of the number 1 in the set of
usual numbers.
The rule that does not hold for matrices in general is commutativity AB = BA
of the product, while commutativity of the matrix sum A + B = B + A always holds .
The follow ing example illustrates noncommutativity of the product of matrices.
AB = l BA =
The following theorem lists the basic arithmetic rules that hold in the matrix
product.
Theorem 1.5 Let A, B, C be arbitrary matrices for which the matrix operations
below can be defined, and let k be an arbitrary scalar. Then
(1) A(BC) = (AB)C, (written as ABC) (Associativity),
(2) A(B + C) = AB + AC, and (A + B)C = AC + BC, (Distributivity),
(3) I A = A = AI,
(4) k(BC) = (kB)C = B(kC),
(5) (AB)T =B T AT.
18 Chapter 1. Linear Equations and Matrices
Proof: Each equality can be shown by direct computation of each entry on both
sides of the equalities. We illustrate this by proving (I) only, and leave the others to
the reader.
Assume that A = [aij] is an mxn matrix, B = [bkl] is an nxp matrix, and
C = [cstl is a pxr matrix. We now compute the (i, i)-entry of each side of the
equation. Note that BC is an n xr matrix whose (i , i)-entry is [BC]ij = L:f=l bnc:..l :
Thus
n n p n p
p p n n p
This clearly shows that [A(BC)]ij = [(AB)C]ij for all i, i. and consequently
=
A(BC) (AB)C as desired. 0
As an application of our results on matrix operations, one can prove the following
important theorem:
Theorem 1.6 Any system of linear equations has either no solution, exactly one
solution, or infinitely many solutions.
Proof: We have seen that a system of linear equations may be written in matrix form
as Ax = b. This system may have either no solution or a solution. If it has only one
solution, then there is nothing to prove. Suppose that the system has more than one
solution and let Xl and X2 be two different solutions so that AXI = band AX2 = b.
Let Xo = XI - X2· Then xo f= 0, and Axo = A(XI - X2) = O.Thus
1.5. Block matrices 19
This means that Xl + kxo is also a solution of Ax = b for any k. Since there are
infinitely many choices for k, Ax = b has infinitely many solutions. 0
Problem 1.12 For which values of a does each of the following systems have no solution,
exactly one solution, or infinitely many solutions?
I
x + 2y 3z 4
(1) 3x y + Sz 2
4x + y + (a 2 - 14)z a + 2.
I
x - y + z = 1
(2) x + 3y + az 2
2x + ay + 3z 3.
divided up into four blocks by the dotted lines shown. Now, if we write
are block matrices and the number of columns in Aik is equal to the number of rows
in Bkj, then
20 Chapter 1. Linear Equations and Matrices
This will be true only if the columns of A are partitioned in the same way as the rows
of B.
It is not hard to see that the matrix product by blocks is correct. Suppose, for
example, that we have a 3x3 matrix A and partition it as
a ll al2 I a 13] [ A
A = a21 a22 I a23 = All
[ - - - 21
a31 an I a33
and a 3 x 2 matrix B which we partition as
B= = [ ] .
E3i1J3i
Then the entries of C = [Cij] = A B are
The quantity ailblj + ai2b2j is simply the (i, j)-entry of AllBll if i .:::: 2, and is
the (i , j)-entry of A21 Bll if i = 3. Similarly, ai3b3j is the (i, j)-entry of A l2B21 if
i .:::: 2, and of A22B21 if i = 3. Thus AB can be written as
AB =
at
a2
.
J =[ alB
a2B
..
J= [alb
a2bl
.
l
alb2
a2b2
.. B
. ..
[
am amB ambl amb2
1.6. Inverse matrices 21
where b., b2, . .. , b, denote the columns of B . Hence, the row vectors of AB are
the products of the row vectors of A and the column vectors of B. 0
Problem 1.13 Compute AB using block multiplication, where
A=
1211
-3
0] , B=
1 012]
.
[ o 01 2 -1 [
3 -2 I 1
Example 1.11 (One -sided inverse) From a direct calculation for two matrices
A = [1 2-1]
2 0 1
-3 ]
5 ,
-2 7
we have AB = h , and BA = -; - : ] f= 13 .
12 -4 9
Thus, the matrix B is a right inverse but not a left inverse of A , while A is a left inverse
but not a right inverse of B. 0
A matrix A has a right inverse if and only if AT has a left inverse, since (AB)T =
B T AT and IT = I. In general, a matrix with a left (right) inverse need not have a
right (left , respectively) inverse. However, the following lemma shows that if a matrix
has both a left inverse and a right inverse, then they must be equal:
Lemma 1.7 If an n x n square matrix A has a left inverse B and a right inverse C,
then Band C are equal, i.e., B = C.
By Lemma 1.7, one can say that if a matrix A has both left and right inverses,
then any two left inverses must be both equal to a right inverse C, and hence to each
other. By the same reason, any two right inverses must be both equal to a left inverse
B, and hence to each other. So there exists only one left and only one right inverse,
which must be equal.
We will show later (Theorem 1.9) that if A is a square matrix and has a left inverse,
then it has also a right inverse, and vice-versa. Moreover, Lemma 1.7 says that the left
inverse and the right inverse must be equal. However, we shall also show in Chapter 3
that any non-square matrix A cannot have both a right inverse and a left inverse: that
is, a non-square matrix may have only a one-sided inverse. The following example
shows that such a matrix may have infinitely many one-sided inverses.
!]
Example 1.12 (Infinitely many one-sided inverses) A non-square matrix A =
can have more than one left inverse, In fact, for any x, Y E Ill. the matrix
B= is a left inverse of A. 0
AB = In = BA.
Such a matrix B is called the inverse of A, and is denoted by A-I. A matrix A is said
to be singular if it is not invertible.
Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is
why we call B 'the' inverse of A. For instance, consider a2x2 matrix A =
If ad - be :/= 0, then it is easy to verify that
l
d -b
A-I _ 1
- ad -be
[d-c -b]
a -
_[ ad-c- be ad - be
a
ad - be ad -be
Theorem 1.8 The product of invertible matrices is also invertible, whose inverse is
the product of the individual inverses in reversed order:
1.7. Elementary matrices and finding A-I 23
Proof: Suppose that A and B are invertible matrices of the same size. Then
(A B)(B - I A- I ) = A(BB-I)A- I = AlA-I = AA- I = I , and similarly
(B - 1A-I) (AB) = I. Thus , AB has the inverse B- 1A-I . o
The inverse of A is written as ' A to the power -1 ' , so one can give the meaning
of Ak for any integer k : Let A be a square matrix. Define A0 = I. Then, for any
positive integer k, we define the power A k of A inductively as
It is easy to check that AkH = AkA l whenever the right-hand side is defined. (If
A is not invertible, A3+(-I) is defined but A-I is not.)
Problem 1.16 Let A be an invertible matrix. Is it true that (Ak) T = (A T) k for any integer k?
Justify your answer.
Ax = A (Bb) = (AB)b = b .
(Compare with Problem 1.23). In particular, if A is an invertible square matrix, then
it has only one inverse A- I, and x = A- Ib is the only solution of the system. In this
section, we discuss how to compute A-I when A is invertible.
Recall that Gaussian elimination is a process in which the augmented matrix is
transformed into its row-echelon form by a finite number of elementary row op-
erations. In the following, one can see that each elementary row operation can be
expressed as a nonsingular matrix, called an elementary matrix, so that the process
of Gaussian elimination is the same as multiplying a finite number of corresponding
elementary matrices to the augmented matrix.
Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix
In by executing only one elementary row operation.
24 Chapter 1. Linear Equations and Matrices
For example, the following matrices are three elementary matrices corresponding
to each type of the three elementary row operations:
(3 rd kind) [bo 0 I
3 times the third row is added to the first
row of h-
100] .
E =
[ -2 1 0
001
Multiplying this elementary matrix E to b on the left produces the desired result:
Similarly, the second kind of elementary operation 'interchanging the first and
the third rows' on the matrix b can be achieved by multiplying an elementary matrix
P obtained from h by interchanging the two rows, to b on the left:
Recall that each elementary row operation has an inverse operation, which is
also an elementary operation, that brings the elementary matrix back to the original
identity matrix. In other words, if E denotes an elementary matrix and if E' denotes
the elementary matrix corresponding to the 'inverse' elementary row operation of E,
then E' E = 1, because
1.7. Elementary matrices and finding A -I 25
(1) if E multiplies a row by c =f. 0, then E' multiplies the same row by
(2) if E interchanges two rows, then E' interchanges them again;
(3) if E adds a multiple of one row to another, then E' subtracts it from the same
row.
Furthermore, one can say that E' E I = =E E' : every elementary matrix is invertible
and inverse matrix E- 1 = E' is also an elementary matrix.
EI = 01 0c O
0 ] , E2 = [ 1
0 0
100 ] , E 3 = [01
[ 001 301 0
then
Definition 1.14 A permutation matrix is a square matrix obtained from the identity
matrix by permuting the rows.
Problem 1.18 Define the elementary column operations for a matrix by just replacing 'row '
by 'column' in the definition of the elementary row operations . Show that if A is an m x n
matrix and if E is a matrix obtained by executing an elementary column operation on In, then
AE is exactly the matrix that is obtained from A when the same column operation is executed
on A. In particular, if D is an n x n diagonal matrix with diagonal entries d I, d2, . . . , dn,
then AD is obtained by multiplication by dl, d2, ... , dn of the columns of A, while DA is
obtained by multiplication by dl' d2' . . . , dn of the rows of A .
(3) A is row-equivalent to In ;
(4) A is a product of elementary matrices;
(5) A is invertible;
(6) A has a right inverse.
Proof: (1) =} (2): Let x be a solution of the homogeneous system Ax = 0, and let
B be a left inverse of A. Then
I
solution x 0:=
XI = 0
X2 = 0
Xn = O.
This means that the augmented matrix [A 0] of the system Ax = 0 is reduced to the
system [In 0] by Gauss-Jordan elimination. Hence, A is row-equivalent to In.
(3) =} (4): Assume A is row-equivalent to In, so that A can be reduced to In by a
finite sequence of elementary row operations . Thus, one can find elementary matrices
EI, E2, "" e, such that
Ek .. . E2EIA = In.
By multiplying successively both sides of this equation by s;', ..., 2
E 1, E)I on
the left, we obtain
If a triangular matrix A has a zero diagonal entry, then the system Ax = 0 has at
least one free variable, so that it has infinitely many solutions . Hence, one can have
the following corollary.
Corollary 1.10 A triangular matrix is invertible if and only if it has no zero diagonal
entry.
From Theorem 1.9, one can see that a square matrix is invertible if it has a one-
sided inverse. In particular, if a square matrix A is invertible, then x = A -I b is a
unique solution to the system Ax = b.
1.7. Elementary matrices and finding A -I 27
[ o -c 1
[ -b 0 1 0 0
As an application of Theorem 1.9, one can find a practical method for finding
the inverse A-I of an invertible nxn matrix A. If A is invertible, then A is row
equivalent to In and so there are elementary matrices EI, Ez, .. . , Ek such that
Ek'" EzEIA = In. Hence,
A-I = Ek '" EzEI = Ek'" EzElln .
This means that A-I can be obtained by performing on In the same sequence ofthe
elementary row operations that reduces A to In . Practically, one first constructs an
n x 2n augmented matrix [A I In] and then performs a Gaussian-Jordan elimination
that reduces A to In on [A I In] to get [In I A -I]: that is,
A=
[I 23]
235 .
102
n
0 2
U
2 3 I a
-+ -1 -1 -2 1 (-l)row 2
-2 -1 -1 a
-+
U
2
1
3
1
1 0
2 -1
n (2)row 2 + row 3
n
-2 -1 -1 a
U
2 3 1 0
-+ 1 1 2 -1
a 1 3 -2
28 Chapter 1. Linear Equations and Matrices
2 3
1 00] (-l)row 3 + row 2
[U I K] =
U 1 1
0 1
2 -1 0
3 -2 1
6
(-3)row 3 + row 1
2 0 -8
-3 ] (-2)row 2 + row 1
-+
U 1 0
0 1
-1 1 - 1
3 -2 1
-i-1 ] =
0 0 -6 4
U
I].
-+ 1 0 -1 1 [IIA-
0 1 3 -2
Thus, we get
Problem 1.20 Express A-I as a product of elementary matrices for A given in Example 1.15.
1.8. LDU factorization 29
dO
l ...
Problem 1.21 When is a diagonal matrix D = 0 ] . nonsingular, and what is
[
dn
I
x + 2y + 2z = 10
2x - 2y + 3z 1
4x - 3y + 5z = 4
Problem 1.23 True or false: If the matrix A has a left inverse C so that C A = In, then x = Cb
is a solution of the system Ax = b . Justify your answer.
1.8 WU factorization
In this section, we show that the forward elimination for solving a system of linear
equations Ax = b can be expressed by some invertible lower triangular matrix, so
that the matrix A can be factored as a product of two or more triangular matrices.
We first assume that no permutations of rows (2nd kind of operation) are necessary
throughout the whole process of forward elimination on the augmented matrix [A b].
Then the forward elimination is just multiplications of the augmented matrix [A b]
by finitely many elementary matrices Ek, .. . , E 1: that is,
where each E, is a lower triangular elementary matrix whose diagonal entries are all
l's and [U y] is a row-echelon form of [A b] without divisions of the rows by the
pivots. (Note that if A is a square matrix, then U must be an upper triangular matrix).
Therefore, if we set L = (Ek' " E1)-1 = Ell . . . Ei: 1, then we have A = LU,
where L is a lowertriangularmatrix whosediagonalentriesareall l's. (In fact, each
Ei l
is also a lower triangular matrix, and a product of lower triangular matrices is
also lower triangular (see Problem 1.25). Such factorization A = LU is called an LU
factorization or an LU decomposition of A. For example,
Ax= LUx=b
2 1 1
Ax = 4 1 0
[ -2 2 1
by using an LU factorization of A.
Solution: The elementary matrices for the forward elimination on the augmented
matrix [A b] are easily found to be
0]
o1 0 ,
3 1
so that
.: ., 0 = U.
o -4
41 ]
Thus, if we set
A = LU = [ ; -2 1
10] .
-1 -3 1 0 0 -4 4
= 1
Ly =b : Y2 = -2
3Y2 + Y3 = 7
can be easily solved inductively to get y = (1, -4, - 4) and the system
1.8. WU factorization 31
= 1
Ux=y: = -4
= -4
1 -1 0 ]
A = -1 2 -1 ,
[ o -1 2
Problem 1.25 Let A and B be two lower triangular matrices. Prove that
(1) their product AB is also a lower triangular matrix;
(2) if A is invertible , then its inverse is also a lower triangular matrix ;
(3) if the diagonal entries of A and B are all 1 's, then the same holds for their product AB
and their inverses.
Note that the same holds for upper triangular matrices, and for the product of more than two
matrices .
u n * *]
For example,
0 0 * * *
1 0 0 dz *
A * * = LU
* 1 0 0 0 d3 *
0 0 0 o 0
un n
* *
* * *
=
0 0
1 0
0
dz
0
0
dI dI
*
d2 * d2
0 1 .!. * .]
* 1 0 d3 0 0 0 1 *
d3
* * 0 0 0 0 0 0 0
= LDU.
For notational convention, we replace U again by U and write A = LDU. This
decomposition of A is called an LDU factorization or an LDU decomposition of
A.
For example, the matrix A in Example 1.16 was factored as
1 ° 0] [2 1-210]
A = [ -12 -31 01 0-1 1 = LU .
0 0 -4 4
It can be further factored as A = LDU by taking
2 1 10] [200][1
° ° 1/2 1/2 °]
[ -1 -2 1
00-44 o
= 0 -1
0 0 -4
0
0
1 2
1
-1
-1
= DU.
E[Xrr t]18;:,
102
:::::ro:
with the third row, that is, we need to multiply A by the permutation matrix P =
so that
100
PA = = = LDU.
012 011 00200
110]
B=
[ 000
1 3 0
for any value x . It shows that a singular matrix B has infinitely many LDU factor-
izations . 0
Proof: Suppose that PA = LjD j U , = L zDzUz , wheretheL 's are lower triangular,
the U's are upper triangular whose diagonals are all l's, and the D's are diagonal
matrices with no zeros on the diagonal. One needs to show L , = Lz, D; = Dz , and
U, = Uz for the uniqueness .
Note that the inverse of a lower triangular matrix is also lower triangular, and the
inverse of an upper triangular matrix is also upper triangular. And the inverse of a
34 Chapter 1. Linear Equations and Matrices
The left-hand side is upper triangular, while the right-hand side is lower triangular.
Hence, both sides must be diagonal. However, since the diagonal entries of the upper
triangular matrix VI V:;I are all 1's, it must be the identity matrix I (see Problem
1.25). Thus VI V:; 1 = I, i.e., VI = V2. Similarly, L I I L2 = DID:;I implies that
LI = L2 and DI = D2. 0
probl[em/2; all possible permutation matrices P, find the LDU factorization of PA for
A= 2 4 2 .
111
1.9 Applications
1.9.1 Cryptography
ABC D x Y Z Blank ?
t t t t t t t t t t
° 1 2 3 23 24 25 26 27 28.
I 00]
A=
[2 I 0 ,
I I I
which is supposed to be known to both sender and receiver. Then, as a matrix multi-
plication, A translates our message into
To decode this message, the receiver may follow the following process . Suppose
that we received the following reply from our correspondent:
To decode it, first break the message into two vectors in ]R3 as before:
un un
We want to find two vectors XI , X2 such that AXi is the i-th vector of the above two
vectors : i.e.,
AX! Ax,
I 00]
A-I =
[ -2 I 0 .
1 -I I
Therefore,
36 Chapter 1. Linear Equations and Matrices
=[ -21 01 00] [ 45
19 ] = [197 ]
°
XI ,
1 -1 1 26
The numbers one obtains are
Using our correspondence between letters and numbers, the message we have received
is "THANKS."
Problem 1.28 Encode ''TAKEUFO" using the same matrix A used in the above example.
In an electrical network, a simple current flow may be illustrated by a diagram like the
one below. Such a network involves only voltage sources , like batteries, and resistors,
like bulbs , motors, or refrigerators. The voltage is measured in volts, the resistance in
ohms, and the current flow in amperes (amps, in short). For such an electrical network,
current flow is governed by the following three laws:
Ohm's Law: The voltage drop V across a resistor is the product of the current I
and the resistance R: V = I R.
Kirchhoff's Current Law (KCL): The current flow into a node equals the current
flow out of the node.
Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage drops around
a closed loop equals the total voltage sources in the loop .
Example 1.20 Determine the currents in the network given in Figure 104.
2 ohms 2 ohms
P
18 volts
30 ohms
5 volts
h + h = 12 at P,
12 = h + ls at Q.
Observe that both equations are the same, and one of them is redundant. By applying
KVL to each of the loops in the network clockwise direction, we get
j h -
6h
212
+ 212
h +
+ st,
h =
=
O
0
18.
By solving it, the currents are h = -1 amp, 12 = 3 amps and /3 = 4 amps. The
negative sign for h means that the current h flows in the direction opposite to that
shown in the figure. 0
Problem 1.29 Determine the currents in the networks given in Figure 1.5.
1.9.3 Leontiefmodel
holds. This problem can be described by a system of linear equations, which is called
the LeontiefInput-Output Model . To illustrate this, we show a simple example.
Suppose that a nation's economy consists of three sectors: It = automobile in-
dustry, h == steel industry, and h = oil industry.
Let x = [Xl Xzx3f denote the production vector (or production level) in ]R3 ,
where each entry Xi denotes the total amount (in a common unit such as 'dollars'
rather than quantities such as 'tons' or 'gallons') of the output that the industry Ii
produces per year.
The intermediate demand may be explained as follows. Suppose that, for the total
output Xz units of the steel industry h 20% is contributed by the output of It. 40%
by that of hand 20% by that of h Then we can write this as a column vector, called
a unit consumption vector of h :
0.2 ]
Cz = 0.4 .
[ 0.2
For example , if h decides to produce 100 units per year, then it will order (or demand)
20 units from It , 40 units from hand 20 units from h: i.e., the consumption vector
of h for the production Xz = 100 units can be written as a column vector: 100cz =
[2040 20]T. From the concept of the consumption vector, it is clear that the sum of
decimal fractions in the column cz must be ::: 1.
In our example, suppose that the demands (inputs) of the outputs are given by the
following matrix , called an input-output matrix:
output
It h h
A = input
t,
h
[0.3
0.1
0.2
0.4
03 ]
0.1 .
h 0.3 0.2 0.3
t t t
c\ cz C3
1.9.3. Application: Leontiefmodel 39
In this matrix, an industry looks down a column to see how much it needs from
where to produce its total output, and it looks across a row to see how much of its
output goes to where. For example, the second row says that, out of the total output
Xz units of the steel industry lz. as the intermediate demand, the automobile industry
h demands 10% of the output Xl, the steel industry /z demands 40% of the output
Xz and the oil industry h demands 10% of the output X3. Therefore , it is now easy to
see that the intermediate demand of the economy can be written as
Suppose that the extra demand in our example is given by d = [dl, di, d3f =
[30,20, ioi" , Then the problem for this economy is to find the production vector x
satisfying the following equation:
x = Ax+d.
Another form of the equation is (l - A)x = d, where the matrix I - A is called the
Leontief matrix. If I - A is not invertible, then the equation may have no solution
or infinitely many solutions depending on what d is. If I - A is invertible, then the
equation has the unique solution x = (l - A)-ld. Now, our example can be written
[]
as
[ X3
] = +[ ].
0.3 0.2 0.3 X3 10
In this example, it turns out that the matrix I - A is invertible and
l
Therefore,
x (l - AJ-1d = [
which gives the total amount of product Xi of the industry I i for one year to meet the
required demand .
Remark: (1) Under the usual circumstances, the sum ofthe entries in a column of the
consumption matrix A is less than one because a sector should require less than one
units worth of inputs to produce one unit of output. This actually implies that I - A
is invertible and the production vector x is feasible in the sense that the entries in x
are all nonnegative as the following argument shows.
(2) In general, by using induction one can easily verify that for any k = 1, 2, . . . ,
40 Chapter 1. Linear Equations and Matrices
If the sums of column entries of A are all strictly less than one, then limk-+oo A k = 0
(see Section 6.4 for the limit of a sequence of matrices). Thus, we get (l - A)(l +
A + ... + A k + ...) = I, that is,
Problem 1.31 Suppose that an economy is divided into three sectors : II = services, h =
=
manufacturing industries, and h agriculture . For each unit of output, II demands no services
from 1], 0.4 units from lz. and 0.5 units from 13. For each unit of output. lz requires 0.1 units
from sector II of services. 0.7 units from other parts in sector lz. and no product from sector
13. For each unit of output. 13 demands 0.8 units of services 110 0.1 units of manufacturing
products from lz. and 0.1 units of its own output from lJ. Determine the production level to
balance the economy when 90 units of services, 10 units of manufacturing. and 30 units of
agriculture are required as the extra demand.
1.10 Exercises
1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?
0 0 0
0 1 0
-3 ]
4 •
h[l n- 0
O
0
0
1
0
0
0
1
-n D=[l
0 0 1 2
0 0 0
0 0 0
1 0 0
0 1 2
0 1 1
nF=[l
0 0 1
0 0 1
0 0 0
0 0 0
E=[l -H
1 0 0
1 0 0
0 1 0
0 0 1
1 0 -2
0 0 0
1.10. Exercises 41
[I -3 212]
H
2 3 4
3 4 5
(1) 3 -9 10 2 9 (2) 4 5 1
2 -6 4 2 4 '
5 1 2
2 -6 8 1 7
1 2 3
1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination.
(1)
3XI + 2X2
: X3
+;:
X4
-
I
I
Xl + X2 + 3X3 3X4 -8 .
(2) + z
2x + 4z 1.
What are the pivots in each of 3rd kind elementary operations?
1.5. Which of the following systems has a nontrivial solution?
I
X + 2y + 3z = 0 12x + y - z 0
(1) 2y + 2z = 0 (2) x - 2y - 3z = 0
x + 2y + 3z = O. 3x + y - 2z = O.
1.6. Determine all values of the b, that make the following system consistent:
I
x + y - z = bl
2y + z = b2
Y-Z=b3·
1.7. Determine the condition on b; so that the following system has no solution :
2x + Y + 7z bl
1 6x
2x -
- 2y
y
+
+
l lz
3z
b2
b3.
:j. -I]
1.10. Prove that if A is a 3 x 3 matrix such that AB = BA for every 3 x 3 matrix B , then
A = clJ for some constant c.
1.11. Let A = Find A k for all integers k.
001
42 Chapter 1. Linear Equations and Matrices
A = B = [-; C =[
I 0 I 0 0 1 -2 2 I
1.13. Let f(x) =anxn + an_IX n- 1 + ...+ alx + ao be a polynomial. For any square matrix
A, a matrix polynomial f (A) is defined as
-i
+
n
x 2 - 2x + 3, find f(A) for
1.14. Find the symmetric part and the skew-symmetric part of each of the following matrices .
(I) A =[ ; ;
-1 3 2
(2) A =
0 0
; -1].3
1.16. Let A -I =
U:J.
421
1.17. Find all possible choices of a, band c so that A = [; has an inverse matrix such
that A-I = A.
1.18. Decide whether or not each of the following matrices is invertible. Find the inverses for
invertible ones.
11I]
:], B= [ 05 52 3I , ; -;].
2 I
o 0 0 4
1.19. Find the inverse of each of the following matrices:
i
invertible.
3 4 ]
1.21. Find three matrices which are row equivalent to A - 2 -I .
-3 4
1.10. Exercises 43
1.22. Write the following systems of equations as matrix equations Ax = b and solve them by
computing A -I b:
2x1 - X2 + 2
3X3 =
=
XI -
XI +
IX2 + X3
=
5
(1)
1 X2 -
2x1 + X2 - 2X3
5 (2)
4X3
7, = 4xI
X2
3X2 + 2x3
X3
1.23. Find the LDU factorization for each of the following matrices:
=
-1
-3.
(1) A = J. (2) A = l
UH
1.24. Find the LDL T factorization of the following symmetric matrices :
(1) A (2) A [: n
1.25. Solve Ax = b with A = LU , where Land U are given as
L = [ .: U = b = [ -; ] .
n
o -1 I 0 0 1 4
Forward elimination is the same as Lc = b, and back-substitution is Ux = c.
: ,00
HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
46 Chapter 2. Determinants
krl ] = [ l
... +
.. ...
f kf . f ,
[
rn rn rn
Remark: (1) To be familiar with the linearity rule (R3), note that all row vectors of
any n x n matrix belong to the set Mlxn(lR), on which a matrix sum and a scalar
multiplication are well defined.A real-valued function f : M 1 xn (R) lR is said to be
linear if it preserves these two operations: that is, for any two vectors x, y E M1 xn (R)
and scalar k,
Itis already shown that the det on 2x2 matrices satisfies the rules (Rl)-{R3). In the
next section, one can see that for each positive integer n there always exists a function
2.1. Basic properties of the determinant 47
f : MnxnOR) IR satisfying the three rules (Rl)-(R3) and such a function is unique
(existence and uniqueness). Therefore, we say 'the' determinant and designate it as
'det' in any order.
Let us first derive some direct consequences of the rules (Rl)-(R3).
Proof: (I) Any row can be placed in the first row by interchanging rows with a change
of sign in the determinant by the rule (R2), and then using the linearity rule (R3) and
(R2) again by interchanging the same rows.
(2) If A has a zero row, then this row is zero times the zero row so that det A 0=
by (1). If A has two identical rows, then interchanging those two identical rows does
not change the matrix itself but det A = - det A by the rule (R2), so that det A = O.
(3) By a direct computation using (1), one can get
ri+krj r, rj
det = det + k det
For example, if E is the elementary matrix obtained from the identity matrix by
'multiplies a row by a constant k', then det(EA) =
k det A and det E k by =
Theorem 2.2(1) , so that det(EA) = det E detA . As a consequence, if two matrices
A and B are row-equivalent, then det A = k det B for some nonzero number k.
48 Chapter 2. Determinants
1
A= [ b
b+c c+a
If one adds the second row to the third, then the third row becomes
Problem 2.1 Show that, for an n x n matrix A and k E JR, det(kA) = k n det A.
Problem 2.2 Explain why det A = 0 for
a + l a+4 a+7]
(1) A =[ a+2 a+5 a+ 8 •
a+3 a+6 a+9
Recall that any square matrix can be transformed into an upper triangular matrix
by forward elimination, possibly with row interchanges. Further properties of the
determinant are obtained in the following theorem.
(3) If A is not invertible, then neither is AB, and so det(AB) = 0 = det A det B.
If A is invertible, it can be written as a product of elementary matrices by Theorem
1.9, say A = EIE2 ' " Ei, Then by induction on k,
(4) Clearly, A is not invertibleif and only if AT is not. Thus, for a singular matrix
A we have det AT = 0 = det A. If A is invertible, then write it again as a product of
elementary matrices, say A = E; E2.. . Ei , But, det E = det E T for any elementary
matrix E. In fact, if E is an elementary matrix obtained from the identity matrix by
row interchange, then det E T = -1 = det E by (R2), and all elementary matrices of
other types are triangular, so that det E = det E T . Hence, we have by (3)
Remark: From the equality det A = det AT, one could define the determinant in
terms of columns instead of rows in Definition2.2, and Theorem 2.2 is also true with
'columns' instead of 'rows' .
Example 2.2 (Computing det A by a forward elimination) Evaluate the determinant
of
2-4 0 0]
1 -3 0 1
A = [ 1
3 -4
0 -1
3-1
2 .
= = det
2-4
o 0 0] =
-1 0 1
2 . ( _ 1)2 . 13 = 26. o
det A det U
[o 0 0 -1
0 0 13
4
1 4 2] [11 13 24
12 23
21 22 14] [ x13
(1) 3 1 1 , (2) 31 32 33 34 , (3) x 2
[
-2 2 3 41 42 43 44 x
f(A) = f = f [ + 0 0+ ]
=
ad + 0+ 0 - be = ad - be,
where the third and fourth equalities come from the multilinearity in Theorem 2.2(1).
o
Lemma 2.5, one can derive an explicit formula for det A of a matrix A = [aij] in
M3x3(lR) as follows :
[all a\2
det a21 a22 a23
a\,]
a31 an a33
= det
[all 0
a22 ]+de{ a\2o 0] [0
a23
0
+ det a21 0 af]
0 a33 a31 o 0 0 an
+det
[ ÿn
0
0 a23
o ] + det [0
a21
a\20 0] [0 0
0 + det 0 a22 ]
a32 o 0 0 a33 a31 0
det A becomes the sum of determinants of three matrices . Observe that, in each of the
three matrices, the first row has just one entry from A and all others zero. Subsequently,
by applying the same multilinearity to the second and the third rows of each of the
three matrices, one gets the sum of the determinants of 33 = 27 matrices, each of
which has exactly three entries from A, one in each of three rows, and all other entries
zero. In each of those 27 matrices, if any two of the three entries of A are in the
same column, then the matrix contains a zero column so that its determinant is zero.
Consequently, the determinants of six matrices are left to get the first equality.
The second equality is just the computation of the six determinants by using the
rules (R2) and Theorem 2.3(1). In fact, in each of those six matrices, no two entries
from A are in the same row or in the same column, and thus one can take suitable
'column interchanges' to convert it to a diagonal matrix. Thus the determinant of each
of them is just the product of the three entries with ± sign which is determined by
the number of column interchanges.
Remark: The explicit formula for the determinant of a 3 x 3 matrix can easily be
memorized by the following scheme. Copy the first two columns and put them on
the right of the matrix , and compute the determinant by multiplying entries on six
diagonals with a + sign or a - sign as in Figure 2.1. This is known as Sarrus's method
for 3 x 3 matrices. It has no analogue for matrices of higher order n ::: 4.
The computation of the explicit formula for det A shows that, if any real-valued
function f : M3x3(lR) -+ lR satisfies the rules (Rl)-(R3) , then f(A) = detA for
any matrix A = [aij] E M3x3(lR). This proves the uniqueness theorem when n = 3.
On the other hand, one can easily show that the given explicit formula for det A of
a matrix A E M3x3(lR) satisfies the three rules, which proves the existence when
52 Chapter 2. Determinants
n = 3. Therefore, for n = 3, it shows both the uniqueness and the existence of the
determinant function on M3x3(lR), which proves Theorem 2.4 when n = 3.
Problem 2.5 Showthatthegivenexplicitformula of thedeterminant for3 x 3 matrices satisfies
the three rules (RIHR3).
(1) A =[ 142] ,
3 1 1
-2 2 3
(2) A = [4
1
-2
Now, a reader might have an idea how to prove the uniqueness and the existence of
the determinant function on Mn xn(R) for n > 3. If so, that reader may omit reading its
continued proof below and rather concentrate on understanding the explicit formula
of det A in Theorem 2.6.
For matrices of higher order n > 3: Again , we repeat the same procedure as for
the 3 x 3 matrices to get an explicit formula for det A of any square matrix A = [aij]
of order n.
(Step 1) Just as the case for n = 3, use the multilinearity in each row of A to
get det A as the sum of the determinants of nn matrices. Notice that each one of n"
matrices has exactly n entries from A, one in each of n rows . However, if any two
of the n entries from A are in the same column, then it must have a zero column, so
that its determinant is zero and it can be neglected in the summation. Now, in each
remaining matrix, the n entries from A must be in different columns: that is, no two
of the n entries from A are in the same row or in the same column .
(Step 2) Now, we aim to estimate how many of them remain. From the observation
in Step 1, in each of the remaining matrices, those n entries from A are of the form
with some column indices i , i, k, .. . , l. Since no two of these n entries are in the
same column, the column indices i , l, k, .. . , £ are just a rearrangement of I, 2, .. . , n
without repetition or omissions. It is not hard to see that there are just n! ways of such
rearrangements, so that n! matrices remain for further consideration. (Here n! =
n(n - 1) .. ·2· I, called n factorial.)
2.2. Existence and uniquenessof the determinant 53
Remark: In fact, the n! remaining matrices can be constructed from the matrix A =
[aij] as follows: First , choose anyone entry from the first row of A, say ali, in the
i -th column. Then all the other n - I entries a2j, a3k. .. . , ani should be taken from
the columns different from the i -th column. That is, they should be chosen from
the submatrix of A obtained by deleting the row and the column containing ali . If
the second entry a2j is taken from the second row, then the third entry a3k should be
taken from the submatrix of A obtained by deleting the two rows and the two columns
containing ali and a 2j, and so on. Finally, if the first n - 1 entries
are chosen, then there is no alternative choice for the last one an i since it is the one
left after deleting n - 1 rows and n - I columns from A.
(Step 3) We now compute the determinant of each of the n! remaining matrices.
Since each of those matrices has just n entries from A so that no two of them are in
the same row or in the same column, one can convert it into a diagonal matrix by
' suitable ' column interchanges. Then the determinant will be just the product of the
n entries from A with '±' sign , which will be determined by the number (actually
the parity) of the column interchanges. To determine the sign , let us once again look
back to the case of n = 3.
Example 2.3 (Convert intoa diagonal matrixby column interchanges) Suppose that
one of the six matrices is of the form:
0 0 a 13]
o a22 0 .
[ a31 0 0
Then , one can convert this matrix into a diagonal matrix by interchanging the first
and the third columns. That is,
0 0 a13 ] [ a13
det 0 a 22 0 = - det 0
[
a31 0 0 0
Note that a column interchange is the same as an interchange of the corresponding
column indices. Moreover, in each diagonal entry of a matrix, the row index must be
the same as its column index . Hence, to convert such a matrix into a diagonal matrix,
one has to convert the given arrangement of the column indices (in the example we
have 3, 2, 1) to the standard order 1, 2, 3 to be matched with the arrangement 1,2,3
of the row indices.
In this case, there may be several ways of column interchanges to convert the
given matrix to a diagonal matrix. For example, to convert the given arrangement 3,
2, 1 of the column indices to the standard order 1, 2, 3, one can take either just one
interchanging of 3 and 1, or three interchanges: 3 and 2, 3 and 1, and then 2 and 1. In
either case, the parity is odd so that the "-" sign in the computation of the determinant
came from (-1 ) I = (_ 1)3, where the exponents mean the numbers of interchanges
of the column indices. 0
54 Chapter 2. Determinants
In most cases, we use the set of integers N_n = {1, 2, ..., n} for a set of n objects.
A permutation σ of N_n assigns a number σ(i) in N_n to each number i in N_n, and this
permutation σ is usually denoted by

σ = \begin{pmatrix} 1 & 2 & \cdots & n \\ σ(1) & σ(2) & \cdots & σ(n) \end{pmatrix}.

Here, the first row is the usual layout of N_n as the domain set, and the second row
is the image set showing an arrangement of the numbers in N_n in a certain order
without repetitions or omissions. If S_n denotes the set of all permutations of N_n,
then, as mentioned previously, one can see that S_n has exactly n! permutations. For
example, S_2 has 2 = 2!, S_3 has 6 = 3!, and S_4 has 24 = 4! permutations.
An inversion occurs in a permutation whenever a larger number precedes a smaller
one. For example, the permutation σ = (3, 1, 5, 4, 2) has five inversions, since 3 pre-
cedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity (1, 2, ..., n)
is the only permutation without inversions.
A permutation is called even or odd according to whether its number of inversions
is even or odd. For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2)
are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd.
In general, one can convert a permutation σ = (σ(1), σ(2), ..., σ(n)) in S_n into
the identity permutation (1, 2, ..., n) by transposing each inversion of σ. However,
the number of transpositions needed to convert a given permutation into the
identity permutation need not be unique, as shown in Example 2.3. An interesting
fact is that, even though the number of necessary transpositions is not unique, its
parity (even or odd) is always the same as that of the number of inversions. (This
may not be obvious, and the readers are suggested to convince themselves with a couple
of examples.)
We now go back to Step 3 to compute the determinants of those remaining
n! matrices. Each of them has n entries of the form a_{1σ(1)}, a_{2σ(2)}, ..., a_{nσ(n)}
for a permutation σ ∈ S_n. Moreover, this can be converted into a diagonal ma-
trix by column interchanges corresponding to the inversions in the permutation
σ = (σ(1), σ(2), ..., σ(n)). Hence, its determinant is equal to

sgn(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}.
This shows that the determinant must be unique if it exists. On the other hand, one
can show that the explicit formula for det A in Theorem 2.6 satisfies the three rules
(R1)–(R3). Therefore, we have both existence and uniqueness for the determinant
function of square matrices of any order n ≥ 1, which proves Theorem 2.4.
As the last part of this section, we add an example to demonstrate that a per-
mutation σ can be converted into the identity permutation by the same number of
transpositions as the number of inversions in σ. Take σ = (3, 1, 5, 4, 2), which has
five inversions. The conversion can be done by moving the number 1 to the first position,
then 2 to the second position by transpositions, and so on. The five transpositions used
to convert σ into the identity permutation are shown below:

σ (2, 1, 3, 4, 5)(1, 2, 3, 5, 4)(1, 2, 4, 3, 5)(1, 3, 2, 4, 5)(1, 2, 3, 5, 4) = (1, 2, 3, 4, 5).
Here, two permutations are composed from right to left by notational convention: i.e.,
if we denote τ = (2, 1, 3, 4, 5), then στ = σ ∘ τ. For example, στ(2) = σ(1) = 3.
Also, note that σ can be converted to the identity permutation by composing
three suitable transpositions successively: the number of transpositions used is not
unique, but its parity always is.
It is not hard to see that the number of even permutations is equal to that of odd
permutations, so each is n!/2.
In the case n = 3, one can notice that there are three terms
with + sign and three terms with - sign in det A.
Problem 2.7 Show that the number of even permutations and the number of odd permutations
in S_n are equal.
Problem 2.8 Let A = [c_1 ··· c_n] be an n × n matrix with the column vectors c_j's. Show that
det[c_j c_1 ··· c_{j−1} c_{j+1} ··· c_n] = (−1)^{j−1} det[c_1 ··· c_j ··· c_n]. Note that the same kind of
equality holds when A is written in row vectors.
2.3 Cofactor expansion

Even if one has found an explicit formula for the determinant as shown in Theorem 2.6,
it is not of much help in computation, because one has to sum up n! terms, which becomes
a very large number as n gets large. Thus, we reformulate the formula by rewriting it
in an inductive way, by which the amount of computation can be reduced.
The first factor a_{1σ(1)} in each of the n! terms is one of a_{11}, a_{12}, ..., a_{1n} in the first
row of A. Hence, one can divide the n! terms of the expansion of det A into n groups
according to the value of σ(1): say,

det A = a_{11}A_{11} + a_{12}A_{12} + ··· + a_{1n}A_{1n},

where A_{1j} denotes the sum, with a_{1j} factored out, of all the terms containing a_{1j}.
This number A_{1j} will turn out to be the determinant, up to a ± sign, of the submatrix of
A obtained by deleting the first row and the j-th column. This submatrix will be denoted
by M_{1j} and called the minor of the entry a_{1j}.
Remark: If we regard the entries a_{11}, a_{12}, ..., a_{1n} in the first row of A as unknown
variables x_1, x_2, ..., x_n, then det A is a polynomial in the variables x_i, and the number
A_{1j} is the coefficient of the variable x_j in this polynomial.
We now aim to compute A_{1j} for j = 1, 2, ..., n. Clearly, when j = 1,

A_{11} = \sum_{σ ∈ S_n, σ(1)=1} sgn(σ) a_{2σ(2)} ··· a_{nσ(n)} = \sum_{τ} sgn(τ) a_{2τ(2)} ··· a_{nτ(n)},

summing over all permutations τ of the numbers 2, 3, ..., n. Note that each term in
A_{11} contains no entries from the first row or from the first column of A. Hence, all
the (n − 1)! terms in the sum A_{11} are just the signed elementary products of the
submatrix M_{11} of A obtained by deleting the first row and the first column of A, so
that

A_{11} = det M_{11}.
To compute the number A_{1j} for j > 1, let A = [c_1 ··· c_n] with the column vectors
c_j's and let B = [c_j c_1 ··· c_{j−1} c_{j+1} ··· c_n] be the matrix obtained from A by
interchanging the j-th column with its preceding j − 1 columns one by one up to the
first. Then, det A = (−1)^{j−1} det B (see Problem 2.8). Write

det B = b_{11}B_{11} + b_{12}B_{12} + ··· + b_{1n}B_{1n}

as the expansion of det B. Then a_{1j} = b_{11}, and the number B_{11} is the coefficient of
the entry b_{11} in the formula of det B. By noting that A_{1j} is the coefficient of the entry a_{1j}
in the formula of det A, one can see that A_{1j} = (−1)^{j−1} B_{11}. Moreover, the minor M_{1j}
of the entry a_{1j} is the same as the minor N_{11} of the entry b_{11}. Now, by applying the
previous conclusion A_{11} = det M_{11} to the matrix B, one obtains B_{11} = det N_{11}
and then

A_{1j} = (−1)^{j−1} B_{11} = (−1)^{j−1} det N_{11} = (−1)^{j−1} det M_{1j}.
In summary, one gets an expansion of det A with respect to the first row:

det A = a_{11}A_{11} + a_{12}A_{12} + ··· + a_{1n}A_{1n} = \sum_{j=1}^{n} (−1)^{1+j} a_{1j} det M_{1j}.

This is called the cofactor expansion of det A along the first row.
There is a similar expansion with respect to any other row, say the i-th row. To
show this, first construct a new matrix C from A by moving the i-th row of A up
to the first row by interchanges with its preceding i − 1 rows one by one. Then
det A = (−1)^{i−1} det C as before. Now, the expansion of det C with respect to the
first row [a_{i1} ··· a_{in}] is

det C = \sum_{j=1}^{n} a_{ij} C_{1j},

where C_{1j} = (−1)^{j−1} det \bar{M}_{1j} and \bar{M}_{1j} denotes the minor of a_{ij} in the matrix C.
Noting that \bar{M}_{1j} = M_{ij} as minors, we have

det A = (−1)^{i−1} det C = \sum_{j=1}^{n} (−1)^{i+j} a_{ij} det M_{ij}.

The M_{ij} is called the minor of the entry a_{ij}, and the number A_{ij} =
(−1)^{i+j} det M_{ij} is called the cofactor of the entry a_{ij}.
Also, one can do the same with the column vectors because det A^T = det A. This
gives the following theorem:
Theorem 2.7 Let A be an n × n matrix and let A_{ij} be the cofactor of the entry a_{ij}.
Then,
(1) for each 1 ≤ i ≤ n,

det A = a_{i1}A_{i1} + a_{i2}A_{i2} + ··· + a_{in}A_{in},

called the cofactor expansion of det A along the i-th row;
(2) for each 1 ≤ j ≤ n,

det A = a_{1j}A_{1j} + a_{2j}A_{2j} + ··· + a_{nj}A_{nj},

called the cofactor expansion of det A along the j-th column.

For example, let

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.

Then the cofactors of a_{11}, a_{12} and a_{13} are

A_{11} = det \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix} = −3, \quad A_{12} = −det \begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix} = 6, \quad A_{13} = det \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} = −3,

so that det A = 1·(−3) + 2·6 + 3·(−3) = 0.
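The cofactor expansion along the first row translates directly into a recursive procedure. The following Python sketch (ours, for illustration) implements det A = Σ_j (−1)^{1+j} a_{1j} det M_{1j}; it is correct but, like the n!-term formula, far too slow for large n:

```python
def det_cofactor(A):
    # Cofactor expansion along the first row:
    # det A = sum over j of (-1)**(1+j) * a_1j * det M_1j.
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1, column j
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as computed above
```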
The cofactor expansion formula of det A suggests that the evaluation of A_{ij} can be
avoided whenever a_{ij} = 0, because the product a_{ij}A_{ij} is zero regardless of the value
of A_{ij}. Therefore, the computation of the determinant is simplified by taking
the cofactor expansion along a row or a column that contains as many zero entries
as possible. Moreover, by using the elementary row (or column) operations which
do not alter the determinant, a matrix A may be simplified into another one having
more zero entries in a row or in a column, for which the computation of the determinant may
be simpler. For example, a forward elimination applied to a square matrix A produces an
upper triangular matrix U, and so the determinant of A is just the product of the
diagonal entries of U, up to the sign caused by possible row interchanges. The next
examples illustrate this method for the evaluation of the determinant.
Example 2.6 Let

A = \begin{bmatrix} 1 & −1 & 2 & −1 \\ −3 & 4 & 1 & −1 \\ 2 & −5 & −3 & 8 \\ −2 & 6 & −4 & 1 \end{bmatrix}.

Solution: Apply the elementary row operations

3 × row 1 + row 2, \quad (−2) × row 1 + row 3, \quad 2 × row 1 + row 4

to A; then

det A = det \begin{bmatrix} 1 & −1 & 2 & −1 \\ 0 & 1 & 7 & −4 \\ 0 & −3 & −7 & 10 \\ 0 & 4 & 0 & −1 \end{bmatrix} = det \begin{bmatrix} 1 & 7 & −4 \\ −3 & −7 & 10 \\ 4 & 0 & −1 \end{bmatrix}.

Now apply the operation 1 × row 1 + row 2 to the matrix on the right-hand side,
and take the cofactor expansion along the second column to get

det A = det \begin{bmatrix} 1 & 7 & −4 \\ −2 & 0 & 6 \\ 4 & 0 & −1 \end{bmatrix} = (−1)^{1+2} · 7 · det \begin{bmatrix} −2 & 6 \\ 4 & −1 \end{bmatrix} = −7(2 − 24) = 154. □
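For larger matrices, the elimination method of this example is the practical one. Here is a small Python sketch (ours, not from the text) that computes a determinant as the product of pivots, with a sign flip for each row interchange, and checks the example above:

```python
import numpy as np

def det_by_elimination(A):
    # Forward elimination with partial pivoting; det A is the product of
    # the pivots, with a sign flip for every row interchange.
    A = np.array(A, dtype=float)
    n, sign, det = A.shape[0], 1.0, 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(A[j:, j]))
        if A[p, j] == 0:
            return 0.0
        if p != j:
            A[[j, p]] = A[[p, j]]
            sign = -sign
        det *= A[j, j]
        A[j + 1:] -= np.outer(A[j + 1:, j] / A[j, j], A[j])
    return sign * det

A = [[1, -1, 2, -1], [-3, 4, 1, -1], [2, -5, -3, 8], [-2, 6, -4, 1]]
print(det_by_elimination(A))   # 154.0
print(np.linalg.det(A))        # agrees up to rounding
```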
Example 2.7 Show that det A = (x − y)(x − z)(x − w)(y − z)(y − w)(z − w) for
the Vandermonde matrix of order 4:

A = \begin{bmatrix} 1 & x & x^2 & x^3 \\ 1 & y & y^2 & y^3 \\ 1 & z & z^2 & z^3 \\ 1 & w & w^2 & w^3 \end{bmatrix}.

Solution: Use Gaussian elimination. To begin with, add (−1) × row 1 to rows 2, 3,
and 4 of A:

det A = det \begin{bmatrix} 1 & x & x^2 & x^3 \\ 0 & y − x & y^2 − x^2 & y^3 − x^3 \\ 0 & z − x & z^2 − x^2 & z^3 − x^3 \\ 0 & w − x & w^2 − x^2 & w^3 − x^3 \end{bmatrix}.
Problem 2.9 Use cofactor expansions along a row or a column to evaluate the determinants of
the following matrices:
(1) A = 2 2 0 1 '
2 220
(2) B = \begin{bmatrix} 0 & a & b & c \\ −a & 0 & d & e \\ −b & −d & 0 & f \\ −c & −e & −f & 0 \end{bmatrix}.
The matrix

\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}

is called the matrix of cofactors of A. Its transpose is called the adjugate of A and
is denoted by adj A.
For example, if A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, then adj A = \begin{bmatrix} d & −b \\ −c & a \end{bmatrix}, and if det A = ad − bc ≠ 0, then

A^{−1} = \frac{1}{ad − bc} \begin{bmatrix} d & −b \\ −c & a \end{bmatrix}.
Problem 2.12 Show that A is invertible if and only if adj A is invertible, and that if A is
invertible, then

(adj A)^{−1} = \frac{A}{det A} = adj(A^{−1}).
The next theorem establishes a formula for the solution of a system of n equations
in n unknowns. It may not be useful as a practical method, but it can be used to study
properties of the solution without solving the system: if A is an invertible n × n matrix,
then the unique solution of Ax = b is given by

x_j = \frac{det C_j}{det A}, \quad j = 1, 2, ..., n,

where C_j is the matrix obtained from A by replacing the j-th column with the column
matrix b = [b_1 b_2 ··· b_n]^T.
Example Solve the following system by Cramer's rule:

x_1 + 2x_2 + x_3 = 50
2x_1 + 2x_2 + x_3 = 60
x_1 + 2x_2 + 3x_3 = 90.
Solution: Here

A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 2 & 3 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 50 & 2 & 1 \\ 60 & 2 & 1 \\ 90 & 2 & 3 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 1 & 50 & 1 \\ 2 & 60 & 1 \\ 1 & 90 & 3 \end{bmatrix}, \quad
C_3 = \begin{bmatrix} 1 & 2 & 50 \\ 2 & 2 & 60 \\ 1 & 2 & 90 \end{bmatrix}.
Therefore,

x_1 = \frac{det C_1}{det A} = 10, \quad x_2 = \frac{det C_2}{det A} = 10, \quad x_3 = \frac{det C_3}{det A} = 20. □
Cramer's rule provides a convenient method for writing down the solution of a
system of n linear equations in n unknowns in terms of determinants. To find the
solution, however, one must evaluate n + 1 determinants of order n. Evaluating even
two of these determinants generally involves more computation than solving the
system by Gauss-Jordan elimination.
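For completeness, here is a direct Python transcription of Cramer's rule (an illustration of ours; numpy's det is used for the n + 1 determinants), applied to the example above:

```python
import numpy as np

def cramer(A, b):
    # Solve Ax = b by Cramer's rule: x_j = det(C_j) / det(A), where C_j is
    # A with its j-th column replaced by b.  Requires det(A) != 0.
    A, b = np.array(A, dtype=float), np.array(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        C = A.copy()
        C[:, j] = b
        x[j] = np.linalg.det(C) / d
    return x

A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
b = [50, 60, 90]
print(cramer(A, b))  # [10. 10. 20.], matching the example above
```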
Problem 2.15 Use Cramer's rule to solve the following systems.

(1) \begin{cases} 4x_2 + 3x_3 = −2 \\ 3x_1 + 4x_2 + 5x_3 = 6 \\ −2x_1 + 5x_2 − 2x_3 = 1. \end{cases}

(2) \begin{cases} \dfrac{2}{x} + \dfrac{3}{y} + \dfrac{5}{z} = 3 \\ \dfrac{4}{x} + \dfrac{7}{y} + \dfrac{2}{z} = 0 \\ \dfrac{2}{y} − \dfrac{1}{z} = 2. \end{cases}
Problem 2.16 Let A be the matrix obtained from the identity matrix I_n by replacing its i-th column
with the column vector x = [x_1 ··· x_n]^T. Compute det A.
2.5 Applications
2.5.1 Miscellaneous examples for determinants

The Vandermonde matrix of order n is

A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n−1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n−1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n−1} \end{bmatrix}.

Its determinant can be computed by the same method as in Example 2.7:

det A = \prod_{1 ≤ i < j ≤ n} (x_j − x_i).
Example 2.11 Let A_{ij} denote the cofactor of a_{ij} in a matrix A = [a_{ij}]. If n > 1,
then

det \begin{bmatrix} 0 & x_1 & x_2 & \cdots & x_n \\ x_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ x_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ x_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = −\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.

Solution: First take the cofactor expansion along the first row, and then compute the
cofactor expansion along the first column of each n × n submatrix. □
For n functions f_1(x), f_2(x), ..., f_n(x), each differentiable n − 1 times, one can form the matrix

\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n−1)}(x) & f_2^{(n−1)}(x) & \cdots & f_n^{(n−1)}(x) \end{bmatrix};

its determinant is called the Wronskian of {f_1(x), f_2(x), ..., f_n(x)}.
Next, consider the n × n matrix (called a circulant matrix)

A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n−1} \\ a_{n−1} & a_n & a_1 & \cdots & a_{n−2} \\ \vdots & \vdots & \vdots & & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{bmatrix}.

Its determinant is

det A = \prod_{j=0}^{n−1} \left( a_1 + a_2 ω_j + a_3 ω_j^2 + ··· + a_n ω_j^{n−1} \right),

where ω_j = e^{2πij/n}, j = 0, 1, ..., n − 1, are the n-th roots of unity. (See Example 8.18
for the proof.)
Consider the n × n tridiagonal matrix

T_n = \begin{bmatrix} a_1 & b_1 & 0 & \cdots & 0 & 0 \\ c_1 & a_2 & b_2 & \cdots & 0 & 0 \\ 0 & c_2 & a_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{n−1} & b_{n−1} \\ 0 & 0 & 0 & \cdots & c_{n−1} & a_n \end{bmatrix}.

The cofactor expansion along the last row shows that D_n = det T_n satisfies the recurrence
D_n = a_n D_{n−1} − b_{n−1} c_{n−1} D_{n−2}.

Case (1) If all a_i = b_j = c_k = 1, then D_n = D_{n−1} − D_{n−2}, and one can check that

D_n = \cos\frac{nπ}{3} + \frac{\sqrt{3}}{3} \sin\frac{nπ}{3}.
Later in Section 6.3.1, it will be discussed how to find the n-th term of a given
recurrence relation. (See Exercise 8.6.)
Case (2) Let all b_j = 1 and all c_k = −1, and let us write

(a_1 a_2 ··· a_n) = det \begin{bmatrix} a_1 & 1 & 0 & \cdots & 0 & 0 \\ −1 & a_2 & 1 & \cdots & 0 & 0 \\ 0 & −1 & a_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{n−1} & 1 \\ 0 & 0 & 0 & \cdots & −1 & a_n \end{bmatrix}.

Then,

\frac{(a_1 a_2 ··· a_n)}{(a_2 a_3 ··· a_n)} = a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_{n−1} + \cfrac{1}{a_n}}}}}.
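This identity is easy to test numerically. The sketch below (ours, using exact rational arithmetic) computes (a_1 ··· a_n) by the recurrence D_n = a_n D_{n−1} + D_{n−2}, which follows from the cofactor expansion along the last row in Case (2), and compares the ratio with the continued fraction:

```python
from fractions import Fraction

def bracket(a):
    # (a_1 a_2 ... a_n): determinant of the tridiagonal matrix with the a_i
    # on the diagonal, 1's above and -1's below; it satisfies
    # D_n = a_n * D_{n-1} + D_{n-2}.
    d_prev, d = Fraction(1), Fraction(a[0])
    for ai in a[1:]:
        d_prev, d = d, ai * d + d_prev
    return d

def continued_fraction(a):
    # a_1 + 1/(a_2 + 1/(... + 1/a_n)), evaluated from the inside out.
    value = Fraction(a[-1])
    for ai in reversed(a[:-1]):
        value = ai + 1 / value
    return value

a = [2, 3, 1, 4, 2]
assert continued_fraction(a) == bracket(a) / bracket(a[1:])
print(bracket(a), bracket(a[1:]))  # 95 and 42: the fraction is 95/42
```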
For an n × m matrix A and an m × n matrix B with n ≤ m, one has the Binet–Cauchy formula

det(AB) = \sum_{1 ≤ k_1 < k_2 < ··· < k_n ≤ m} A_{k_1 ··· k_n} B_{k_1 ··· k_n},

where A_{k_1...k_n} is the minor obtained from the columns of A whose numbers are
k_1, ..., k_n and B_{k_1...k_n} is the minor obtained from the rows of B whose numbers
are k_1, ..., k_n. In other words, det(AB) is the sum of products of the corresponding
majors of A and B, where a major of a matrix is, by definition, the determinant of a
minor of maximal order in the matrix.
Indeed,

det(AB) = \sum_{k_1,...,k_n=1}^{m} a_{1k_1} ··· a_{nk_n} \sum_{σ ∈ S_n} (−1)^{σ} b_{k_1 σ(1)} ··· b_{k_n σ(n)}
        = \sum_{k_1,...,k_n=1}^{m} a_{1k_1} ··· a_{nk_n} B_{k_1 ... k_n}.

The minor B_{k_1...k_n} is nonzero only if the numbers k_1, ..., k_n are distinct. Thus, the
summation can be performed over distinct numbers k_1, ..., k_n. Since B_{τ(k_1)...τ(k_n)} =
(−1)^{τ} B_{k_1...k_n} for any permutation τ of the numbers k_1, ..., k_n, we obtain the formula
above by collecting the terms with k_1 < k_2 < ··· < k_n.
For example, if A = \begin{bmatrix} 1 & −1 & 3 & 3 \\ 2 & 2 & −1 & 2 \end{bmatrix} and B is a 4 × 2 matrix, then det(AB)
is the sum of the six products A_{k_1 k_2} B_{k_1 k_2} over the column pairs 1 ≤ k_1 < k_2 ≤ 4;
for the matrix B of this example the sum works out to det(AB) = −167.
For an n × n square matrix A, its row vectors r_i = [a_{i1} ··· a_{in}] can be considered
as elements in the n-space ℝ^n. The set

P(A) = \left\{ \sum_{i=1}^{n} t_i r_i : 0 ≤ t_i ≤ 1, \ i = 1, 2, ..., n \right\}

is called a parallelogram if n = 2, or a parallelepiped if n ≥ 3. Note that the row
vectors of A form the edges of P(A), and changing the order of the row vectors does not
alter the shape of P(A).
Theorem 2.10 The determinant det A of an n × n matrix A is the volume of P(A)
up to sign. In fact, the volume of P(A) is equal to |det A|.
Proof: We give the proof for the case n = 2 only, and leave the case n = 3 to the
readers. Let

A = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix},

where r_1, r_2 are the row vectors of A. Let Area(A) denote the area of the parallelo-
gram P(A) (see Figure 2.3). Note that

Area \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = Area \begin{bmatrix} r_2 \\ r_1 \end{bmatrix}, \quad but \quad det \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = −det \begin{bmatrix} r_2 \\ r_1 \end{bmatrix}.

Thus, one can expect in general that

det \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = ±Area \begin{bmatrix} r_1 \\ r_2 \end{bmatrix},

which explains why we say 'up to sign'. To determine the sign ±, we first define the
orientation of A = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} to be p(A) = det A / |det A| when det A ≠ 0.
To finish the proof, it is sufficient to show that the function D(A) = p(A)Area(A)
satisfies the rules (R1)–(R3) of the determinant, so that det = D = ±Area, or
Area(A) = |det A|. The rules are verified as follows:
Since Area \begin{bmatrix} r_2 \\ r_1 \end{bmatrix} = Area \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} while the orientation changes sign under a
row interchange, D changes sign under a row interchange.
(3) D \begin{bmatrix} k r_1 \\ r_2 \end{bmatrix} = kD \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} for any k. Indeed, if k = 0, it is clear. Suppose k ≠ 0.
Then, as illustrated in Figure 2.3, the bottom edge r_1 of P(A) is elongated by |k|
while the height h remains unchanged. Therefore, we have

D \begin{bmatrix} k r_1 \\ r_2 \end{bmatrix} = p\left(\begin{bmatrix} k r_1 \\ r_2 \end{bmatrix}\right) Area \begin{bmatrix} k r_1 \\ r_2 \end{bmatrix}
= \frac{k}{|k|} \, p\left(\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}\right) |k| \, Area \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = kD \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}.
D \begin{bmatrix} u + v \\ r_2 \end{bmatrix} = p \, Area \begin{bmatrix} u + v \\ r_2 \end{bmatrix}
= p \, Area \begin{bmatrix} u \\ r_2 \end{bmatrix} + p \, Area \begin{bmatrix} v \\ r_2 \end{bmatrix}
= D \begin{bmatrix} u \\ r_2 \end{bmatrix} + D \begin{bmatrix} v \\ r_2 \end{bmatrix}.

The second equality follows from (2) and Figure 2.4. □
Remark: (1) Note that if we had constructed the parallelepiped P(A) using the
column vectors of A, then the shape of the parallelepiped would in general be totally different from
the one constructed using the row vectors. However, det A = det A^T means their
volumes are the same, which is a totally nontrivial fact.
(2) For n ≥ 3, the volume of P(A) can be defined by induction on n, and exactly the
same argument as in the proof can be applied to show that the volume is the determinant.
However, there is another way of looking at this fact. Let {c_1, c_2, ..., c_n} be n column
Figure 2.5. A parallelogram in ℝ^3
simply Area(P(A)) = ‖c_1‖h, where h = ‖c_2‖ sin θ and θ is the angle between c_1
and c_2. Therefore, we have

Area(P(A))^2 = ‖c_1‖^2 ‖c_2‖^2 sin^2 θ = ‖c_1‖^2 ‖c_2‖^2 (1 − cos^2 θ)
= ‖c_1‖^2 ‖c_2‖^2 − (c_1 · c_2)^2
= det \begin{bmatrix} c_1 · c_1 & c_1 · c_2 \\ c_2 · c_1 & c_2 · c_2 \end{bmatrix},
where "." denotes the dot product. In general, let CI, • . . • Cn be n column vectors
of an m x n (not necessarily square) matrix A. Then one can show (for a proof see
Exercise 5.17) that the volume of the n-dimensional parallelepiped P(A) determined
by those n column vectors C/s in R m is
vol(P(A)) = Jdet(A T A) .
In particular, if A is an m x m square matrix. then
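The formula vol(P(A)) = √det(AᵀA) is convenient computationally, since it works for non-square A as well. A minimal Python sketch (ours):

```python
import numpy as np

def volume(A):
    # Volume of the parallelepiped spanned by the columns of the m x n
    # matrix A, computed as sqrt(det(A^T A)).
    A = np.asarray(A, dtype=float)
    return np.sqrt(np.linalg.det(A.T @ A))

# Two columns in R^3: the area of the parallelogram they span.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(volume(A))                         # sqrt(3), about 1.732

# For a square matrix this reduces to |det A|.
B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(volume(B), abs(np.linalg.det(B)))  # both 2.0
```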
Problem 2.17 Show that the area of a triangle ABC in the plane ℝ^2, where A = (x_1, y_1), B =
(x_2, y_2), C = (x_3, y_3), is equal to the absolute value of

\frac{1}{2} det \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}.
2.6 Exercises
A =
2 -3 -5 8
2.4. Evaluate det A for an n × n matrix A = [a_{ij}] when

(1) a_{ij} = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } i ≠ j, \end{cases} \qquad (2) a_{ij} = i + j.
2.5. Solve the following systems of linear equations:

(1) \begin{cases} x_1 + x_2 = 3 \\ x_1 − x_2 = −1. \end{cases}

(2) \begin{cases} x_1 + x_2 + x_3 = 2 \\ x_1 + 2x_2 + x_3 = −4 \\ x_1 + 3x_2 − x_3 = 2. \end{cases}

(3) \begin{cases} x_2 + x_4 = −1 \\ x_1 + x_3 = 3 \\ x_1 − x_2 − x_3 − x_4 = 2 \\ x_1 + x_2 + x_3 + x_4 = 0. \end{cases}
2.9. Use Cramer's rule to solve the given system :
(l)[l 2J
2.10. Find a constant k so that the system of linear equations

\begin{cases} kx − 2y − z = 0 \\ (k + 1)y + 4z = 0 \\ (k − 1)z = 0 \end{cases}

has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by using Gaussian elimination:
[1 :
2.12. Solve the following system of equations by using Cramer's rule:

\begin{cases} 3x + 2y = 3z + 1 \\ 3x + 2z = 8 − 5y \\ 3z − 1 = x − 2y. \end{cases}
2.13. Calculate the cofactors A_{11}, A_{12}, A_{13} and A_{33} for the matrix A:
; ;],
211 312
(3)A=[-;
32
;1]'
2.14. Let A be the n × n matrix whose entries are all 1. Show that
(1) det(A − nI_n) = 0.
(2) (A − nI_n)_{ij} = (−1)^{n−1} n^{n−2} for all i, j, where (A − nI_n)_{ij} denotes the cofactor of
the (i, j)-entry of A − nI_n.
2.15. Show that if A is symmetric, so is adj A. Moreover, if A is invertible, then the inverse of
A is also symmetric.
2.16. Use the adjugate formula to compute the inverses of the following matrices:
;],
4 1 -1 sine 0 cos s
2.17. Compute adj A, det A, det(adj A), A^{−1}, and verify A · adj A = (det A)I for
det ] = det(AB) .
2.20. Find the area of the triangle with vertices at (0, 0), (1, 3) and (3, 1) in ℝ^2.
2.21. Find the area of the triangle with vertices at (0, 0, 0), (1, 1, 2) and (2, 2, 1) in ℝ^3.
2.22. For A, B, C, D ∈ M_{n×n}(ℝ), show that det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = det A det D. But, in general,

det \begin{bmatrix} A & B \\ C & D \end{bmatrix} ≠ det A det D − det B det C.
2.23. Determine whether or not the following statements are true in general, and justify your
answers.
(1) For any square matrices A and B of the same size, det(A + B) = det A + det B.
(2) For any square matrices A and B of the same size, det(AB) = det(BA) .
(3) If A is an n × n square matrix, then for any scalar c, det(cI_n − A) = c^n − det A.
(4) If A is an n × n square matrix, then for any scalar c, det(cI_n − A^T) = det(cI_n − A).
(5) If E is an elementary matrix, then det E = ±1.
(6) There is no matrix A of order 3 such that A^2 = −I_3.
(7) Let A be a nilpotent matrix, i.e., A^k = 0 for some natural number k. Then det A = 0.
(8) det(kA) = k det A for any square matrix A.
(9) The multilinearity holds in any two rows at the same time:

det \begin{bmatrix} a + u & b + v & c + w \\ d + x & e + y & f + z \\ l & m & n \end{bmatrix} = det \begin{bmatrix} a & b & c \\ d & e & f \\ l & m & n \end{bmatrix} + det \begin{bmatrix} u & v & w \\ x & y & z \\ l & m & n \end{bmatrix}.
(10) Any system Ax = b has a solution if and only if det A ≠ 0.
(11) For any n × 1 (n ≥ 2) column vectors u and v, det(uv^T) = 0.
(12) If A is a square matrix with det A = 1, then adj(adj A ) = A .
(13) If the entries of A are all integers and det A = 1 or −1, then the entries of A^{−1} are
also integers.
(14) If the entries of A are O's or 1's, then det A = 1,0, or -1.
(15) Every system of n linear equations in n unknowns can be solved by Cramer's rule.
(16) If A is a permutation matrix, then AT = A.
3
Vector Spaces
We have seen that the Gauss-Jordan elimination is the most basic technique for solving
a system Ax = b of linear equations, and that it can be written in matrix notation as an
LDU factorization. Moreover, the questions of the existence or the uniqueness of the
solution are much easier to answer after the Gauss-Jordan elimination. In particular,
if det A ≠ 0, x = 0 is the unique solution of Ax = 0. In general, the set of solutions
of Ax = 0 has a kind of mathematical structure, called a vector space, and with this
concept one can characterize the uniqueness of the solution of a system Ax = b of
linear equations in a more systematic way.
In this chapter, we introduce the notion of a vector space, which is an abstraction
of the usual algebraic structure of the 3-space ℝ^3, and then elaborate our study of a
system of linear equations in this framework.
3.1 The n-space ℝ^n and vector spaces

Usually, many physical quantities, such as length, area, mass and temperature, are
described by real numbers as magnitudes. Other physical quantities, like force or
velocity, have directions as well as magnitudes. Such quantities with direction are
called vectors, while the numbers are called scalars. For instance, a vector (or a
point) x in the 3-space ℝ^3 is usually represented as a triple of real numbers:

x = (x_1, x_2, x_3).
by a scalar. That is, for two vectors x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) in ℝ^3 and k a
scalar, we define

x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3), \quad kx = (kx_1, kx_2, kx_3).

Figure 3.1. A vector sum and a scalar multiplication
Even though our geometric visualization of vectors does not go beyond the
3-space ℝ^3, it is possible to extend these algebraic operations of vectors in the 3-
space ℝ^3 to the n-space ℝ^n for any positive integer n. The n-space is defined to be the set of all
ordered n-tuples (a_1, a_2, ..., a_n) of real numbers, called vectors: i.e.,

ℝ^n = {(a_1, a_2, ..., a_n) : a_i ∈ ℝ, i = 1, 2, ..., n}.

For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in the n-space ℝ^n,
and a scalar k, the sum x + y and the scalar multiplication kx of them are vectors in
ℝ^n defined by

x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n), \quad kx = (kx_1, kx_2, ..., kx_n).
Theorem 3.1 For any scalars k and ℓ, and vectors x = (x_1, x_2, ..., x_n), y =
(y_1, y_2, ..., y_n), and z = (z_1, z_2, ..., z_n) in the n-space ℝ^n, the following rules
hold:
(1) x + y = y + x,
(2) x + (y + z) = (x + y) + z,
(3) x + 0 = x = 0 + x,
(4) x + (−1)x = 0,
(5) k(x + y) = kx + ky,
(6) (k + ℓ)x = kx + ℓx,
(7) k(ℓx) = (kℓ)x,
(8) 1x = x,
where 0 = (0, 0, ..., 0) is the zero vector.
We usually identify a vector (a_1, a_2, ..., a_n) in the n-space ℝ^n with an n × 1
column vector

\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.

Sometimes a vector in ℝ^n is also identified with a 1 × n row vector (see Section 3.5).
Then, the two operations of the matrix sum and the scalar multiplication of column
matrices coincide with those of vectors in ℝ^n, and Theorem 3.1 rephrases Theorem 1.3.
These rules of arithmetic of vectors are the most important ones because they
are the only rules that we need to manipulate vectors in the n-space ℝ^n. Hence,
an (abstract) vector space can be defined with respect to these rules of operations of
vectors in the n-space ℝ^n so that ℝ^n itself becomes a vector space. In general, a vector
space is defined to be a set with two operations, an addition and a scalar multiplication,
which satisfy the rules (1)-(8) in Theorem 3.1.
Definition 3.1 A (real) vector space is a nonempty set V of elements, called vectors,
with two algebraic operations that satisfy the following rules.
(A) There is an operation called vector addition that associates to every pair x and
y of vectors in V a unique vector x + y in V, called the sum of x and y, so that the
following rules hold for all vectors x, y, z in V :
(1) x + y = y + x (commutativity in addition),
(2) x + (y + z) = (x + y) + z (= x + y + z) (associativity in addition),
(3) there is a unique vector 0 in V such that x + 0 = x = 0 + x for all x E V (it is
called the zero vector) ,
(4) for any x E V, there is a vector -x E V, called the negative of x, such that
x + (-x) = (-x) + x = O.
(B) There is an operation called the scalar multiplication that associates to each
vector x in V and each scalar k a unique vector kx in V so that the following rules
hold for all vectors x, y, z in V and all scalars k, ℓ:
(5) k(x + y) = kx + ky (distributivity with respect to vector addition),
(6) (k + ℓ)x = kx + ℓx (distributivity with respect to scalar addition),
(7) k(ℓx) = (kℓ)x,
(8) 1x = x for all x ∈ V.
Then the set C(ℝ) is a vector space under these operations. The zero vector in this
space is the constant function whose value at each point is zero.
(4) Let S(ℝ) denote the set of real-valued functions defined on the set of integers.
A function f ∈ S(ℝ) can be written as a doubly infinite sequence of real numbers:

..., x_{−2}, x_{−1}, x_0, x_1, x_2, ...,

where x_k = f(k) for each k. Such sequences appear frequently in engineering,
where they are called discrete or digital signals. One can define the sum of two functions
and the scalar multiplication of a function with a scalar just as in C(ℝ) in (3), so that
S(ℝ) becomes a vector space. □
If x′ is another vector in V such that x + x′ = 0, then

x′ = x′ + 0 = x′ + (x + (−x)) = (x′ + x) + (−x) = 0 + (−x) = −x.

On the other hand, the equation

x + (−1)x = 1x + (−1)x = (1 + (−1))x = 0x = 0

shows that (−1)x is another negative of x, and hence −x = (−1)x by the uniqueness
of −x.
(5) Suppose kx = 0 and k ≠ 0. Then x = 1x = (1/k)(kx) = (1/k)0 = 0. □
Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an addition and
scalar multiplication of pairs are defined by
Is the set V a vector space under those operations? Justify your answer.
3.2 Subspaces
Definition 3.2 A subset W of a vector space V is called a subspace of V if W itself
is a vector space under the addition and the scalar multiplication defined in V.
Theorem 3.3 A nonempty subset W of a vector space V is a subspace if and only if
x + y and kx are contained in W (or equivalently, x + ky ∈ W) for any vectors x
and y in W and any scalar k ∈ ℝ.
Proof: We need only to prove the sufficiency. Assume both conditions hold, and let
x be any vector in W. Since W is closed under scalar multiplication, 0 = 0x and
−x = (−1)x are in W, so rules (3) and (4) for a vector space hold. All the other rules
for a vector space are clear. □
A vector space V itself and the zero space {0} are trivially subspaces. Some
nontrivial subspaces are given in the following examples.
W = {(x, y, z) ∈ ℝ^3 : ax + by + cz = 0},

W = {x ∈ ℝ^n : Ax = 0}
Example 3.4 For a nonnegative integer n, let P_n(ℝ) denote the set of all real poly-
nomials in x with degree ≤ n. Then P_n(ℝ) is a subspace of the vector space C(ℝ) of
all continuous functions on ℝ. □
Example 3.5 (The space of symmetric or skew-symmetric matrices) Let W be the set
of all n × n real symmetric matrices. Then W is a subspace of the vector space M_{n×n}(ℝ)
of all n × n matrices, because the sum of two symmetric matrices is symmetric and
a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of
all n × n skew-symmetric matrices is also a subspace of M_{n×n}(ℝ). □
Problem 3.2 Which of the following sets are subspaces of the 3-space ℝ^3? Justify your answer.
(1) W = {(x, y, z) ∈ ℝ^3 : xyz = 0},
(2) W = {(2t, 3t, 4t) ∈ ℝ^3 : t ∈ ℝ},
(3) W = {(x, y, z) ∈ ℝ^3 : x^2 + y^2 − z^2 = 0},
(4) W = {x ∈ ℝ^3 : x^T u = 0 = x^T v}, where u and v are any two fixed nonzero vectors in
ℝ^3.
Problem 3.3 Let V = C(ℝ) be the vector space of all continuous functions on ℝ. Which of
the following sets W are subspaces of V? Justify your answer.
(1) W is the set of all differentiable functions on ℝ.
(2) W is the set of all bounded continuous functions on ℝ.
(3) W is the set of all continuous nonnegative-valued functions on ℝ, i.e., f(x) ≥ 0 for any
x ∈ ℝ.
(4) W is the set of all continuous odd functions on ℝ, i.e., f(−x) = −f(x) for any x ∈ ℝ.
(5) W is the set of all polynomials with integer coefficients.
U + W = {u + w : u ∈ U, w ∈ W}.
(2) A vector space V is called the direct sum of two subspaces U and W, written as
V = U ⊕ W, if V = U + W and U ∩ W = {0}.
(1) Suppose that Z is a subspace of V contained in both U and W. Show that Z is also
contained in U n W.
(2) Suppose that Z is a subspace of V containing both U and W. Show that Z also contains
U + W as a subspace.
Theorem 3.4 A vector space V is the direct sum of subspaces U and W, i.e., V =
U ⊕ W, if and only if for any v ∈ V there exist unique u ∈ U and w ∈ W such that
v = u + w.
Proof: (⇒) Suppose that V = U ⊕ W. Then, for any v ∈ V, there exist vectors
u ∈ U and w ∈ W such that v = u + w, since V = U + W. To show the uniqueness,
suppose that v is also expressed as a sum u′ + w′ for u′ ∈ U and w′ ∈ W. Then
u + w = u′ + w′ implies u − u′ = w′ − w ∈ U ∩ W = {0}, so that u = u′ and w = w′.
(⇐) If the expression is always unique, then clearly V = U + W; and if there were a
nonzero v ∈ U ∩ W, the expressions

v = v + 0 = 0 + v = \frac{1}{2}v + \frac{1}{2}v = \frac{1}{3}v + \frac{2}{3}v ∈ U + W

would not be unique, a contradiction. □
Example 3.6 (Sum, but not direct sum) In the 3-space ℝ^3, consider the three vectors
e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). These three vectors are also well
known as i, j and k, respectively. Let U = {ai + ck : a, c ∈ ℝ} be the xz-plane, and
let W = {bj + ck : b, c ∈ ℝ} be the yz-plane, which are both subspaces of ℝ^3. Then
a vector in U + W is of the form

ai + c_1k + bj + c_2k = ai + bj + (c_1 + c_2)k,

so that U + W = ℝ^3, since any vector in ℝ^3 is of the form ai + bj + ck = (a, b, c).
However, the sum is not direct, since U ∩ W = {ck : c ∈ ℝ} ≠ {0}.
Note that if we had taken W = {yj : y ∈ ℝ} to be the y-axis, then it would be
easy to see that ℝ^3 = U ⊕ W. Note also that there are many choices for W such that
ℝ^3 = U ⊕ W. □
Problem 3.5 Let U and W be the subspaces of the vector space M_{n×n}(ℝ) consisting of all
symmetric matrices and all skew-symmetric matrices, respectively. Show that M_{n×n}(ℝ) =
U ⊕ W. Therefore, the decomposition of a square matrix A given in (3) of Problem 1.11 is
unique.
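The decomposition in Problem 3.5 is explicit: A = (A + Aᵀ)/2 + (A − Aᵀ)/2, with the first summand symmetric and the second skew-symmetric. A quick Python check (ours):

```python
import numpy as np

A = np.array([[1.0, 4.0, 2.0],
              [0.0, 3.0, 5.0],
              [6.0, 1.0, 2.0]])
S = (A + A.T) / 2      # symmetric part
K = (A - A.T) / 2      # skew-symmetric part
assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(A, S + K)   # the decomposition, and it is unique
```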
3.3 Bases
As we know, a vector in the 3-space ℝ^3 is of the form (x_1, x_2, x_3), and it can be
written as

(x_1, x_2, x_3) = x_1(1, 0, 0) + x_2(0, 1, 0) + x_3(0, 0, 1).

That is, any vector in ℝ^3 can be expressed as a sum of scalar multiples of the three vectors
e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). The following definition gives a
name to such an expression.
Definition 3.4 Let V be a vector space and let {x_1, x_2, ..., x_m} be a set of vectors in
V. Then a vector y in V of the form

y = a_1x_1 + a_2x_2 + ··· + a_mx_m

for some scalars a_i is called a linear combination of x_1, x_2, ..., x_m.
The next theorem shows that the set of all linear combinations of a finite set of
vectors in a vector space forms a subspace .
Theorem 3.5 Let x_1, x_2, ..., x_m be vectors in a vector space V. Then the set W =
{a_1x_1 + a_2x_2 + ··· + a_mx_m : a_i ∈ ℝ} of all linear combinations of x_1, x_2, ..., x_m
is a subspace of V. It is called the subspace of V spanned by x_1, x_2, ..., x_m; or,
x_1, x_2, ..., x_m are said to span the subspace W.
Proof: It is necessary to show that W is closed under the vector sum and the scalar
multiplication. Let u and w be any two vectors in W. Then

u = a_1x_1 + a_2x_2 + ··· + a_mx_m,
w = b_1x_1 + b_2x_2 + ··· + b_mx_m,

so that u + w = (a_1 + b_1)x_1 + ··· + (a_m + b_m)x_m and ku = (ka_1)x_1 + ··· + (ka_m)x_m
are again linear combinations of the x_i's, and hence belong to W. □
For example, let e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)
be the n standard vectors in the n-space ℝ^n (n ≥ 3). Then a linear combination of e_1, e_2, e_3 is of
the form

a_1e_1 + a_2e_2 + a_3e_3 = (a_1, a_2, a_3, 0, ..., 0),

and W = {a_1e_1 + a_2e_2 + a_3e_3 : a_i ∈ ℝ}
is the subspace of the n-space ℝ^n spanned by the vectors e_1, e_2, e_3. Note that the
subspace W can be identified with the 3-space ℝ^3 through the identification
(a_1, a_2, a_3, 0, ..., 0) ↔ (a_1, a_2, a_3).
Example 3.9 (All Ax's form the column space) Let A = [c_1 c_2 ··· c_n] be an m × n
matrix with columns c_i's. Then the column vectors c_i are in ℝ^m, and the matrix product
Ax represents the linear combination of the column vectors c_i whose coefficients are
the components of x ∈ ℝ^n, i.e., Ax = x_1c_1 + x_2c_2 + ··· + x_nc_n (see Example 1.9).
Therefore, the set

W = {Ax ∈ ℝ^m : x ∈ ℝ^n}

of all linear combinations of the column vectors of A is a subspace of ℝ^m, called the
column space of A. Consequently, Ax = b has a solution (x_1, x_2, ..., x_n) in ℝ^n if
and only if the vector b belongs to the column space of A. □
Remark: One can take another point of view on the equation Ax = b. For any vector
x ∈ ℝ^n, A assigns a vector b = Ax in ℝ^m. That is, A can be
considered as a function from ℝ^n into ℝ^m, and the column space of A is nothing but
the image of this function.
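Viewing C(A) as the image of x ↦ Ax also gives a practical membership test: b ∈ C(A) exactly when appending b as an extra column does not increase the rank. A small Python sketch (ours; the example matrix is our own):

```python
import numpy as np

def in_column_space(A, b):
    # b lies in C(A) exactly when appending b as a new column
    # does not increase the rank.
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(np.hstack([A, b])) == np.linalg.matrix_rank(A)

A = [[1, 0], [0, 1], [1, 1]]
print(in_column_space(A, [2, 3, 5]))  # True:  (2,3,5) = 2c1 + 3c2
print(in_column_space(A, [1, 1, 0]))  # False: Ax = b has no solution
```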
Problem 3.6 Let x_1, x_2, ..., x_m be vectors in a vector space V and let W be the sub-
space spanned by x_1, x_2, ..., x_m. Show that W is the smallest subspace of V containing
x_1, x_2, ..., x_m. In other words, if U is a subspace of V containing x_1, x_2, ..., x_m, then
W ⊆ U.
As we saw in Theorem 3.5 and Example 3.7, any nonempty subset of a vector
space V spans a subspace through the linear combinations of the vectors, and two
different subsets may span the same subspace. This means that a vector can be written
as linear combinations in various ways.
However, for some sets of vectors in a vector space V , any vector in V can be
expressed uniquely as a linear combination of the set. Such a set of vectors is called
a basis for V. In the following we will make this notion clear and show how to find
such a basis .
Definition 3.5 A set of vectors {x_1, x_2, ..., x_m} in a vector space V is said to be
linearly independent if the vector equation, called the linear dependence of the x_i's,

c_1x_1 + c_2x_2 + ··· + c_mx_m = 0

has only the trivial solution c_1 = c_2 = ··· = c_m = 0; otherwise, the set is said to be
linearly dependent.
By definition, a set of vectors {x_1, x_2, ..., x_m} is linearly dependent if and only
if the linear dependence has a nontrivial solution (c_1, c_2, ..., c_m). For example, if
c_m ≠ 0, the equation can be rewritten as

x_m = −\frac{c_1}{c_m}x_1 − \frac{c_2}{c_m}x_2 − ··· − \frac{c_{m−1}}{c_m}x_{m−1}.
That is, a set ofvectors is linearly dependent if and only if at least one of the vectors
in the set can be written as a linear combination of the others. It means that at least
one of the vectors can be expressed as a linear combination of the set in two different
ways.
Example 3.10 (Linear independence in ℝ^3) Let x = (1, 2, 3) and y = (3, 2, 1) be
two vectors in the 3-space ℝ^3. Then clearly y ≠ λx for any λ ∈ ℝ (or: ax + by = 0 is
possible only when a = b = 0). This means that {x, y} is linearly independent in ℝ^3.
If w = (3, 6, 9), then {x, w} is linearly dependent, since w − 3x = 0. In general, if
x, y are non-collinear vectors in the 3-space ℝ^3, the set of all linear combinations of x
and y determines a plane W through the origin in ℝ^3, i.e., W = {ax + by : a, b ∈ ℝ}.
Let z be another nonzero vector in the 3-space ℝ^3. If z ∈ W, then there are some
scalars a, b ∈ ℝ, not both zero, such that z = ax + by; that is, the set
{x, y, z} is linearly dependent. If z ∉ W, then ax + by + cz = 0 is possible only
when a = b = c = 0 (prove it). Therefore, the set {x, y, z} is linearly independent
if and only if z does not lie in W. □
By abuse of language, it is sometimes convenient to say that "the vectors
x_1, x_2, ..., x_m are linearly independent," although this is really a property of a
set.
Example 3.11 The columns of the matrix

A = \begin{bmatrix} 1 & −2 & −1 & 0 \\ 2 & −1 & 1 & 3 \\ 4 & 2 & 6 & 8 \end{bmatrix}

are linearly dependent in the 3-space ℝ^3, since the third column is the sum of the first
and the second. □
As shown in Example 3.11, the concept of linear dependence can be applied to
the row or column vectors of any matrix.

Example 3.12 Consider an upper triangular matrix

A = \begin{bmatrix} 2 & 3 & 5 \\ 0 & 1 & 6 \\ 0 & 0 & 4 \end{bmatrix}.

The linear dependence of the column vectors of A may be written as

c_1 \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.

From the third row, c_3 = 0; from the second row, c_2 = 0; and substitution of these
into the first row forces c_1 = 0. That is, the homogeneous system has only the trivial
solution, so that the column vectors are linearly independent. □
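The same test, whether Ax = 0 has only the trivial solution, can be run numerically. A small Python sketch (ours), applied to the matrices of Examples 3.11 and 3.12:

```python
import numpy as np

def columns_independent(A):
    # The columns of A are linearly independent exactly when the
    # homogeneous system Ax = 0 has only the trivial solution, i.e.
    # when the rank equals the number of columns.
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(columns_independent([[2, 3, 5], [0, 1, 6], [0, 0, 4]]))              # True
print(columns_independent([[1, -2, -1, 0], [2, -1, 1, 3], [4, 2, 6, 8]]))  # False
```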
Theorem 3.6 The nonzero rows of a matrix in row-echelon form are linearly inde-
pendent, and so are the columns that contain leading 1's.
In particular, the rows of any triangular matrix with nonzero diagonals are linearly
independent, and so are the columns.
If V = ℝ^m and v_1, v_2, ..., v_n are n vectors in ℝ^m, then they form an m × n
matrix A = [v_1 v_2 ··· v_n]. On the other hand, Example 3.9 shows that the linear
dependence c_1v_1 + c_2v_2 + ··· + c_nv_n = 0 of the v_i's is nothing but the homogeneous
equation Ax = 0, where x = (c_1, c_2, ..., c_n). Thus, one can say that
(1) the column vectors v_i's of A are linearly independent in ℝ^m if and only if the
homogeneous system Ax = 0 has only the trivial solution, and
(2) they are linearly dependent if and only if Ax = 0 has a nontrivial solution.
If U is the reduced row-echelon form of A, then we know that Ax = 0 and Ux = 0
have the same set of solutions. Moreover, a homogeneous system Ax = 0 with more
unknowns than equations always has a nontrivial solution by Theorem 1.2. This proves
the following lemma.
Lemma 3.7 (1) If n > m, any set of n vectors in the m-space ℝ^m is linearly depen-
dent.
(2) If U is the reduced row-echelon form of A, then the columns of U are linearly
independent if and only if the columns of A are linearly independent.
Example 3.13 Consider the vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1)
in the 3-space ℝ^3. The matrix A = [e_1 e_2 e_3] is the identity matrix, and so Ax = 0 has
only the trivial solution. Thus, the set of vectors {e_1, e_2, e_3} is linearly independent
and also spans ℝ^3. □

Example 3.14 (The standard basis for ℝ^n) The vectors e_1, e_2, ..., e_n in ℝ^n are
clearly linearly independent (see Theorem 3.6). Moreover, they span the n-space ℝ^n:
in fact, a vector x = (x_1, x_2, ..., x_n) ∈ ℝ^n is a linear combination of the vectors e_i:

x = x_1e_1 + x_2e_2 + ··· + x_ne_n. □
Definition 3.6 Let V be a vector space. A basis for V is a set of linearly independent
vectors that spans V.
For example, as in Example 3.14, the set {e_1, e_2, ..., e_n} forms a basis, called the
standard basis for the n-space ℝ^n. Of course, there are many other bases for ℝ^n.
(2) The set of vectors (1, 0, 0), (0, 1, 1), (1, 0, 1) and (0, 1, 0) is not a basis either,
since the vectors are not linearly independent (the sum of the first two minus the third makes
the fourth), even though they span ℝ^3. This set has some redundant vectors
spanning ℝ^3.
(3) The set of vectors (1, 1, 1), (0, 1, 1) and (0, 0, 1) is linearly independent and
also spans ℝ^3. That is, it is a basis for ℝ^3, different from the standard basis. This set
has the proper number of vectors spanning ℝ^3, since the set cannot be reduced to a
smaller set, nor does it need any additional vector, to span ℝ^3. □
By definition, in order to show that a set of vectors in a vector space is a basis, one
needs to show two things: it is linearly independent, and it spans the whole space.
The following theorem shows that a basis for a vector space represents a coordinate
system just like the rectangular coordinate system by the standard basis for jRn.
Theorem 3.8 Let α = {v_1, v_2, ..., v_n} be a basis for a vector space V. Then each
vector x in V can be uniquely expressed as a linear combination of v_1, v_2, ..., v_n;
i.e., there are unique scalars a_i, i = 1, 2, ..., n, such that

x = a_1v_1 + a_2v_2 + ··· + a_nv_n.
Example 3.16 (Two different bases for ℝ^3) Let α = {e_1, e_2, e_3} be the standard basis
for ℝ^3, and let β = {v_1, v_2, v_3} with v_1 = (1, 1, 1) = e_1 + e_2 + e_3, v_2 = (0, 1, 1) =
e_2 + e_3, v_3 = (0, 0, 1) = e_3. Then β is also a basis for ℝ^3 (see Example 3.15(3)).
For any x = (x_1, x_2, x_3) ∈ ℝ^3, one can easily verify that

x = x_1e_1 + x_2e_2 + x_3e_3 = x_1v_1 + (x_2 − x_1)v_2 + (x_3 − x_2)v_3.
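Finding the coordinates of x with respect to β amounts to solving a linear system whose coefficient matrix has v_1, v_2, v_3 as columns. A quick Python check (ours) for the formula above:

```python
import numpy as np

# Coordinates of x with respect to the basis beta = {v1, v2, v3}:
# solve [v1 v2 v3] c = x for the coefficient vector c.
V = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)   # columns are v1, v2, v3
x = np.array([4.0, 7.0, 9.0])            # (x1, x2, x3)
c = np.linalg.solve(V, x)
print(c)  # [4. 3. 2.] = (x1, x2 - x1, x3 - x2)
```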
Problem 3.8 Show that the set {1, x, x^2, ..., x^n} is a basis for P_n(ℝ), the vector space of all
polynomials of degree ≤ n with real coefficients.
Problem 3.9 Let x_k denote the vector in ℝ^n whose first k − 1 coordinates are zero and whose
last n − k + 1 coordinates are 1. Show that the set {x_1, x_2, ..., x_n} is a basis for ℝ^n.
3.4 Dimensions
We often say that the line ]R 1 is one-dimensional, the plane]R2 is two-dimensional and
the space R' is three-dimensional, etc. This is mostly due to the fact that the freedom
in choosing coordinates for each element in the space is 1,2 or 3, respectively. This
means that the concept of dimension is closely related to the concept of bases . Note
that for a vector space in general there is no unique way in choosing a basis . However,
there is something common to all bases, and this is related to the notion of dimension.
We first need the following lemma from which one can define the dimension of a
vector space.
Lemma 3.9 Let V be a vector space and let α = {x_1, x_2, ..., x_m} be a set of m
vectors in V.
(1) If α spans V, then every set of vectors with more than m vectors cannot be linearly
independent.
(2) If α is linearly independent, then any set of vectors with fewer than m vectors
cannot span V.
Proof: Since (2) follows from (1) directly, we prove only (1). Let β = {y_1, y_2, ..., y_n}
be a set of n vectors in V with n > m. We will show that β is linearly dependent.
Indeed, since each vector y_j is a linear combination of the vectors in the spanning set
α, i.e., for j = 1, 2, ..., n,

y_j = \sum_{i=1}^{m} a_{ij} x_i,

the linear dependence c_1y_1 + c_2y_2 + ··· + c_ny_n = 0 reduces to a homogeneous system
of m linear equations \sum_{j=1}^{n} a_{ij} c_j = 0 in the n unknowns c_j, which
must have a nontrivial solution. But this is guaranteed by Lemma 3.7, since m < n. □
It is clear by Lemma 3.9 that if a set α = {x_1, x_2, ..., x_n} of n vectors is a basis
for a vector space V, then no other set β = {y_1, y_2, ..., y_r} of r vectors can be a basis
for V if r ≠ n. This means that all bases for a vector space V have the same number
of vectors, even if there are many different bases for a vector space. Therefore, we
obtain the following important result:
Theorem 3.10 If a basis for a vector space V consists of n vectors, so does every
other basis.
Definition 3.7 The dimension of a vector space V is the number, say n, of vectors
in a basis for V , denoted by dim V = n. When V has a basis of a finite number of
vectors, V is said to be finite dimensional.
Proof: We prove (1) and leave (2) as an exercise. Let α = {x_1, x_2, ..., x_k} be a
linearly independent set in V. If α spans V, then α is a basis. If α does not span V,
then there exists a vector, say x_{k+1}, in V that is not contained in the subspace spanned
by the vectors in α. Now {x_1, ..., x_k, x_{k+1}} is linearly independent (check why). If
{x_1, ..., x_k, x_{k+1}} spans V, then this is a basis for V. If it does not span V, then the
same procedure can be repeated, yielding a linearly independent set that spans V, i.e.,
a basis for V . This procedure must stop in a finite number of steps because of Lemma
3.9 for a finite-dimensional vector space V. 0
Theorem 3.11 shows that a basis for a vector space V is a set of vectors in V which
is maximally independent and minimally spanning in the above sense. In particular,
if W is a subspace of V, then any basis for W is linearly independent also in V, and
can be extended to a basis for V. Thus dim W ≤ dim V.
Proof: Again we prove (1) only. If a spanning set of n vectors were not linearly
independent, then the set would be reduced to a basis that has a number of vectors
smaller than n. 0
Corollary 3.12 means that if it is known that dim V = n and if a set of n vectors
either is linearly independent or spans V, then it is already a basis for the space V.
Example 3.18 (Constructing a basis) Let W be the subspace of ℝ^4 spanned by the
vectors x_1, x_2, x_3 given below. Find a basis for W and extend it to a basis for ℝ^4.
Solution: Note that dim W ≤ 3, since W is spanned by the three vectors x_i. Let A be
the 3 × 4 matrix whose rows are x_1, x_2 and x_3:
-2 5
-3 ]
1 1 4 .
o 1 o
Reduce A to a row-echelon form :
The three nonzero row vectors of U are clearly linearly independent, and they also
span W because the vectors x_1, x_2 and x_3 can be expressed as linear combinations
of these three nonzero row vectors of U. Hence, the three nonzero row vectors of U
provide a basis for W. (Note that this implies dim W = 3, and hence {x_1, x_2, x_3} is
also a basis for W by Corollary 3.12. The linear independence of the x_i's is a by-product
of this fact.)
To extend it to a basis for ℝ^4, just add any nonzero vector of the form x_4 =
(0, 0, 0, t) to the rows of U. □
Problem 3.10 Let W be a subspace of a vector space V. Show that if dim W = dim V, then
W = V.
Problem 3.11 Find a basis and the dimension of each of the following subspaces of M_{n×n}(ℝ),
the vector space of all n × n matrices.
(1) The space of all n × n diagonal matrices whose traces are zero.
(2) The space of all n × n symmetric matrices.
(3) The space of all n × n skew-symmetric matrices.
As a direct consequence of Theorem 3.11 and the definition of the direct sum of
subspaces, one can show the following corollary.
Proof: Choose a basis {u_1, ..., u_k} for U, and extend it to a basis {u_1, ..., u_k, u_{k+1},
..., u_n} for V. Then the subspace W spanned by {u_{k+1}, ..., u_n} satisfies the require-
ment. □
Problem 3.12 Let {v_1, v_2, ..., v_n} be a basis for a vector space V and let W_i = {rv_i : r ∈ ℝ}
be the subspace of V spanned by v_i. Show that V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_n.
3.5 Row and column spaces

An m × n matrix A can be written in terms of its rows or its columns:

A = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = [c_1 c_2 ··· c_n],

where r_i is the i-th row vector of A in ℝ^n, and c_j is the j-th column vector of A in
ℝ^m.
Definition 3.8 Let A be an m × n matrix with row vectors {r_1, r_2, ..., r_m} and column
vectors {c_1, c_2, ..., c_n}.
(1) The row space of A is the subspace in ℝ^n spanned by the row vectors {r_1, r_2, ...,
r_m}, denoted by R(A).
(2) The column space of A is the subspace in ℝ^m spanned by the column vectors
{c_1, c_2, ..., c_n}, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the null space
of A, denoted by N(A).
Note that the null space N(A) is a subspace of the n-space ℝ^n. Its dimension is
called the nullity of A. Since the row vectors of A are just the column vectors of
its transpose A^T and the column vectors of A are the row vectors of A^T, the row
(column) space of A is just the column (row) space of A^T; that is,

R(A) = C(A^T) \quad and \quad C(A) = R(A^T).
Since Ax = x_1c_1 + x_2c_2 + ··· + x_nc_n for any vector x = (x_1, x_2, ..., x_n) ∈ ℝ^n, we
get

C(A) = {Ax : x ∈ ℝ^n}.

Thus, for a vector b ∈ ℝ^m, the system Ax = b has a solution if and only if b ∈
C(A) ⊆ ℝ^m. In other words, the column space C(A) is the set of vectors b ∈ ℝ^m for
which Ax = b has a solution.
It is quite natural to ask what the dimensions of those subspaces are, and how
one can find bases for them. This will help us to understand the structure of all the
solutions of the equation Ax = b. Since the set of the row vectors and the set of
the column vectors of A are spanning sets for the row space and the column space,
respectively, a minimally spanning subset of each of them will be a basis for each of
them. This is not a difficult problem for a matrix in (reduced) row-echelon form.
Example 3.19 (Find a basis for the null space) Let U be in a reduced row-echelon
form given as

U = \begin{bmatrix} 1 & 0 & 0 & 2 & 2 \\ 0 & 1 & 0 & −1 & 3 \\ 0 & 0 & 1 & 4 & −1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Clearly, the first three nonzero row vectors containing leading 1's are linearly inde-
pendent and they form a basis for the row space R(U), so that dim R(U) = 3. On the
other hand, the first three columns containing leading 1's are also linearly indepen-
dent (see Theorem 3.6), and the last two column vectors can be expressed as linear
combinations of them. Hence, they form a basis for C(U), and dim C(U) = 3. To find
a basis for the null space N(U), we first solve the system Ux = 0 to get the solution

x = s n_s + t n_t,

where n_s = (−2, 1, −4, 1, 0), n_t = (−2, −3, 1, 0, 1), and s and t are arbitrary
values for the free variables x_4 and x_5, respectively. It shows that these two vectors
n_s and n_t span the null space N(U), and they are clearly linearly independent (see
their last two entries). Hence, the set {n_s, n_t} is a basis for the null space N(U). □
For any matrix A, we first investigate the row space R(A) and the null space
N(A) of A by comparing them with those of the reduced row-echelon form U of A.
Since Ax = 0 and Ux = 0 have the same solution set by Theorem 1.1, we clearly
have N(A) = N(U).
Let {r_1, ..., r_m} be the row vectors of an m × n matrix A. The three elementary
row operations change A into matrices A_i of the following three types:
It is clear that the row vectors of the three matrices A_1, A_2 and A_3 are linear com-
binations of the row vectors of A. On the other hand, by the inverse elementary row
operations, these matrices can be changed back into A. Thus, the row vectors of A can also
be written as linear combinations of those of the A_i's. This means that if matrices A and
B are row equivalent, then their row spaces must be equal, i.e., R(A) = R(B).
Now the nonzero row vectors in the reduced row-echelon form U are always
linearly independent and span the row space of U (see Theorem 3.6). Thus they form
a basis for the row space R (A) of A. It gives the following theorem.
Theorem 3.14 Let U be a (reduced) row-echelon form of a matrix A. Then

R(A) = R(U) \quad and \quad N(A) = N(U).

Moreover, if U has r nonzero row vectors containing leading 1's, then they form a
basis for the row space R(A), so that the dimension of R(A) is r.
The following example shows how to find bases for the row and the null spaces,
and at the same time how to find a basis for the column space.
Example 3.20 (Find bases for the row space and the column space of A) Let A be a
matrix given as

A = \begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ −2 & −5 & 1 & −1 & −8 \\ 0 & 3 & −3 & 4 & 7 \\ 3 & 6 & 0 & −7 & 2 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix}.

Find bases for the row space R(A), the null space N(A), and the column space C(A)
of A.
Solution: (1) Find a basis for R(A): By Gauss-Jordan elimination on A, we get the
reduced row-echelon form U:

U = \begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & −1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Since the three nonzero rows of U are linearly independent, they form a basis for the
row space R(U) = R(A), so dim R(A) = 3. (Note that in the process of Gaussian
elimination, we did not use a permutation matrix. This means that the three nonzero
rows of U were obtained from the first three row vectors r_1, r_2, r_3 of A, and the fourth
row r_4 of A turned out to be a linear combination of them. Thus the first three row
vectors of A also form a basis for the row space.)
(2) Find a basis for N(A): It is enough to solve the homogeneous system Ux = 0,
since N(A) = N(U). That is, neglecting the fourth zero equation, the equation
Ux = 0 is the following system of equations:

x_1 + 2x_3 + x_5 = 0
x_2 − x_3 + x_5 = 0
x_4 + x_5 = 0.
Since the first, the second and the fourth columns of U contain the leading 1's, we
see that the basic variables are x_1, x_2, x_4, and the free variables are x_3, x_5. As in
Example 3.19, by assigning arbitrary values s and t to the free variables x_3 and x_5,
one can find the solution x of Ux = 0 as

x = s n_s + t n_t,

where n_s = (−2, 1, 1, 0, 0) and n_t = (−1, −1, 0, −1, 1) form a basis for N(A).
(3) Find a basis for C(A): Note that the linear dependence x_1c_1 + x_2c_2 + ··· + x_5c_5 = Ax = 0
holds if and only if x = (x_1, ..., x_5) ∈ N(A). By taking x = n_s = (−2, 1, 1, 0, 0)
or x = n_t = (−1, −1, 0, −1, 1), the basis vectors of N(A) given in (2), we obtain
two nontrivial linear dependencies of the c_j's:
−2c_1 + c_2 + c_3 = 0,
−c_1 − c_2 − c_4 + c_5 = 0,

respectively. Hence, the column vectors c_3 and c_5 corresponding to the free variables
in Ax = 0 can be written as

c_3 = 2c_1 − c_2,
c_5 = c_1 + c_2 + c_4.
That is, the column vectors c_3, c_5 of A are linear combinations of the column vectors
c_1, c_2, c_4, which correspond to the basic variables in Ax = 0. Hence, {c_1, c_2, c_4}
spans the column space C(A).
We claim that {c_1, c_2, c_4} is linearly independent. Let Ā = [c_1 c_2 c_4] and Ū =
[u_1 u_2 u_4] be submatrices of A and U, respectively, where u_j is the j-th column
vector of the reduced row-echelon form U of A obtained in (1):
Ū = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad and \quad Ā = \begin{bmatrix} 1 & 2 & 2 \\ −2 & −5 & −1 \\ 0 & 3 & 4 \\ 3 & 6 & −7 \end{bmatrix}.
Then clearly Ū is the reduced row-echelon form of Ā, so that N(Ā) = N(Ū). Since
the vectors u_1, u_2, u_4 are just the columns of U containing leading 1's, they are linearly
independent by Theorem 3.6, and Ūx = 0 has only the trivial solution. This means
that Āx = 0 also has only the trivial solution, so {c_1, c_2, c_4} is linearly independent.
Therefore, it is a basis for the column space C(A), and dim C(A) = 3 = the number of
basic variables. That is, the column vectors of A corresponding to the basic variables
in Ux = 0 form a basis for the column space C(A). □
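The whole computation of Example 3.20 can be reproduced with a computer algebra system. A Python/SymPy sketch (ours):

```python
import sympy as sp

A = sp.Matrix([[ 1,  2,  0,  2,  5],
               [-2, -5,  1, -1, -8],
               [ 0,  3, -3,  4,  7],
               [ 3,  6,  0, -7,  2]])
U, pivots = A.rref()          # reduced row-echelon form and pivot columns
print(U)                      # the matrix U found above
print(pivots)                 # (0, 1, 3): columns c1, c2, c4 are basic
print(A.nullspace())          # basis for N(A): the vectors n_s and n_t
# The nonzero rows of U form a basis for R(A); the pivot columns of A
# form a basis for C(A), so rank A = 3 and the nullity is 2.
```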
Alternatively, a basis for the column space C(A) can also be found with the ele-
mentary column operations, which is the same as finding a basis for the row space
R(A^T) of A^T.
Problem 3.13 Let A be the matrix given in Example 3.20. Find the conditions on a, b, c, d
so that the vector x = (a, b, c, d) belongs to C(A).
Problem 3.14 Find a basis for the row space of the matrix

A = \begin{bmatrix} 1 & −2 & 0 & 0 \\ 2 & −5 & −3 & −2 \\ 0 & 5 & 15 & 10 \\ 2 & 6 & 18 & 8 \end{bmatrix}.

Also find a basis for C(A) by finding a basis for R(A^T).
Problem 3.15 Let A and B be two n × n matrices. Show that AB = 0 if and only if the column
space of B is a subspace of the null space of A.
Problem 3.16 Find an example of a matrix A and its row-echelon form U such that C(A) ≠
C(U). What is wrong with "C(A) = R(A^T) = R(U^T) = C(U)"?
3.6 Rank and nullity

Theorem 3.15 (The fundamental theorem) For any m × n matrix A, the row space
and the column space of A have the same dimension; that is, dim R(A) = dim C(A).
Proof: Let dim R(A) = r and let U be the reduced row-echelon form of A. Then
r is the number of the nonzero row (or column) vectors of U containing leading 1's,
which is equal to the number of basic variables in Ux = 0 or Ax = 0. We shall prove
that the r columns of A corresponding to the leading 1's (or basic variables) form a
basis for C(A), so that dim C(A) = r = dim R(A).
(1) They are linearly independent: Let Ā denote the submatrix of A whose columns
are those of A corresponding to the r basic variables (or leading 1's) in U, and let
Ū denote the submatrix of U consisting of the r columns containing leading 1's.
Then, it is clear that Ū is the reduced row-echelon form of Ā, so that Āx = 0 if
and only if Ūx = 0. However, Ūx = 0 has only the trivial solution, since the columns
of Ū containing the leading 1's are linearly independent by Theorem 3.6. Therefore,
Āx = 0 also has only the trivial solution, so the columns of Ā are linearly independent.
(2) They span C(A): Note that the columns of A corresponding to the free variables
are not contained in Ā, and each of these column vectors of A can be written as a
linear combination of the column vectors of Ā (see Example 3.20). To show this,
let {c_{i_1}, c_{i_2}, ..., c_{i_k}} be the columns of A (not contained in Ā) corresponding to the
free variables {x_{i_1}, x_{i_2}, ..., x_{i_k}}, and let x_{i_j} be any of these free variables. Then, by
assigning the value 1 to x_{i_j} and 0 to all the other free variables, one can get a nontrivial
solution of

Ax = x_1c_1 + x_2c_2 + ··· + x_nc_n = 0.

When such a solution is substituted into this equation, one can see that the column
c_{i_j} of A corresponding to x_{i_j} = 1 is written as a linear combination of the columns
of Ā. This can be done for each free variable x_{i_j}, j = 1, 2, ..., k, so the columns of
A corresponding to those free variables are redundant in the spanning set of C(A). □
Remark: (1) In the proof of Theorem 3.15, once we have shown that the columns in Ā
are linearly independent as in step (1), we may replace step (2) by the following argu-
ment: One can easily see that dim C(A) ≥ dim R(A) by Theorem 3.11. On the other
hand, since this inequality holds for arbitrary matrices, applying it to A^T in particular we
get dim C(A^T) ≥ dim R(A^T). Moreover, C(A^T) = R(A) and R(A^T) = C(A) im-
ply dim C(A) ≤ dim R(A), which means dim C(A) = dim R(A). This also means
that the column vectors of Ā span C(A), and so form a basis.
(2) The proof (2) of Theorem 3.15 also shows that the reduced row-echelon form
of a system is unique, which was stated on page 10. In fact, if U_1 and U_2 are two
reduced row-echelon forms of an m × n matrix A, then the columns of U_1 and U_2
corresponding to the basic variables (i.e., containing leading 1's) must be the same
and of the form [0 ··· 0 1 0 ··· 0]^T by the definition of the reduced row-echelon
form. If there are no free variables, then it is quite clear that
form . If there are no free variables , then it is quite clear that
1 0 0
0 1 0
0
VI= 1 = V2 ·
0 0 0
0 0 ... 0
Suppose that there is a free variable. Since U_1x = 0 if and only if U_2x = 0, one can
easily check that the columns of U_1 and U_2 corresponding to each free variable must
also be the same, so that U_1 = U_2.
In summary, the following equalities are now clear from Theorems 3.14 and 3.15:

dim N(A) = dim N(U) = the number of free variables in Ux = 0;

dim R(A) = dim R(U)
= the number of nonzero row vectors of U
= the maximal number of linearly independent row vectors of A
= the number of basic variables in Ux = 0
= the maximal number of linearly independent column vectors of A
= dim C(A).
This common dimension dim R(A) = dim C(A) is called the rank of A, denoted by rank A.
Clearly, rank I_n = n and rank A = rank A^T. And for an m × n matrix A, since
dim R(A) ≤ m and dim C(A) ≤ n, we have the following corollary:
If dim N(A) = 0 (or N(A) = {0}), then dim R(A) = n (or R(A) = ℝ^n), which
means that A has exactly n linearly independent rows and n linearly independent
columns. In particular, if A is a square matrix of order n, then the row vectors are
linearly independent if and only if the column vectors are linearly independent. There-
fore, Ax = 0 has only the trivial solution, and by Theorem 1.9 we get the following
corollary.
Example 3.21 (Find the rank and the nullity) For a 4 × 5 matrix A, suppose that
Gaussian elimination produces a row-echelon form U with three nonzero rows.
The three nonzero rows containing leading 1's in U form a basis for R(U) =
R(A). Therefore, rank A = dim R(A) = dim C(A) = 3, and the nullity of A =
dim N(A) = 5 − dim R(A) = 2. □
Problem 3.17 Find the nullity and the rank of each of the following matrices :
(1) A =[ ; - ], (2) A = ].
-I -2 0-5 2 I 5 -2
For each of the matrices, show that dim R(A) = dim C(A) directly by finding their bases.
Problem 3.18 Show that a system of linear equations Ax = b has a solution if and only if
rank A = rank [A b], where [A b] denotes the augmented matrix for Ax = b.
Theorem 3.19 For any two matrices A and B for which AB can be defined,
(1) N(AB) ⊇ N(B),
(2) N((AB)^T) ⊇ N(A^T),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).
Proof: (1) and (2) are clear, since Bx = 0 implies (AB)x = A(Bx) = 0.
(3) For an m × n matrix A and an n × p matrix B, C(AB) = {(AB)x : x ∈ ℝ^p} =
{A(Bx) : x ∈ ℝ^p} ⊆ {Ay : y ∈ ℝ^n} = C(A); and (4) follows by taking transposes in (3).
Problem 3.20 Prove that the rank of a matrix is equal to the largest order of its invertible
submatrices .
Problem 3.21 For each of the matrices given in Problem 3.17, find an invertible submatrix of
the largest order.
3.7 Bases for subspaces

Let V and W be two subspaces of ℝ^n with bases α = {v_1, ..., v_k} and β = {w_1, ..., w_ℓ},
respectively, and let Q = [v_1 ··· v_k w_1 ··· w_ℓ] be the n × (k + ℓ) matrix whose columns
are these basis vectors.

Theorem 3.22 Let V and W be two subspaces of ℝ^n, and let Q be the matrix defined
above.
(1) C(Q) = V + W, so that a basis for the column space C(Q) is a basis for V + W.
(2) N(Q) can be identified with V ∩ W, so that dim(V ∩ W) = dim N(Q).
Proof of (2): Let x = (a_1, ..., a_k, b_1, ..., b_ℓ) ∈ N(Q), i.e., Qx = a_1v_1 + ··· + a_kv_k +
b_1w_1 + ··· + b_ℓw_ℓ = 0. If we set

y = a_1v_1 + ··· + a_kv_k = −(b_1w_1 + ··· + b_ℓw_ℓ),

then y ∈ V ∩ W, since the first expression a_1v_1 + ··· + a_kv_k is in V as a linear
combination of the basis vectors in α, and the second expression −(b_1w_1 + ··· +
b_ℓw_ℓ) is in W as a linear combination of the basis vectors in β. That is, to each
x ∈ N(Q), there corresponds a vector y in V ∩ W.
On the other hand, if y ∈ V ∩ W, then y can be written in two ways as a linear
combination of the bases for V and W separately:

y = a_1v_1 + ··· + a_kv_k ∈ V,
y = b_1w_1 + ··· + b_ℓw_ℓ ∈ W,

for some a_1, ..., a_k and b_1, ..., b_ℓ. Let x = (a_1, ..., a_k, −b_1, ..., −b_ℓ) ∈ ℝ^{k+ℓ}.
Then it is quite clear that Qx = 0, i.e., x ∈ N(Q). Therefore, the correspondence of x
in N(Q) ⊆ ℝ^{k+ℓ} to a vector y in V ∩ W ⊆ ℝ^n gives us a one-to-one correspondence
between the sets N(Q) and V ∩ W.
Moreover, if x_i, i = 1, 2, correspond to y_i, then one can easily check that x_1 + x_2
corresponds to y_1 + y_2, and kx_1 corresponds to ky_1. This means that the two vector
spaces N(Q) and V ∩ W can be identified as vector spaces (see Section 4.2 for an exact
meaning of this identification). In particular, for a basis for N(Q), the corresponding
set in V ∩ W is a basis for V ∩ W: that is, if the set of vectors {x_1, ..., x_s} is a basis
for N(Q), then the corresponding vectors
1 Ys =
is a basis for V
a slvl + .. . + askvk,
n W, and vice-versa.
Ys =
This implies that
-(bsIWI + ...+ bsewt)
dimN(Q ) = dim(V n W) . o
Note that dim(V + W) =1= dim V + dim W , in general. The following theorem
gives a relation between them.
102 Chapter 3. VectorSpaces
Proof: Let a = {VI, V2, . •. , vd and f3 = {WI , W2, . . . , we} be bases for V and W ,
respectively. Let Q be the n x (k + f) matrix whose columns are the previous basis
vectors:
Q = [VI ' .. Vk WI . . . welnx(k+l)'
Then, by the Rank Theorem and Theorem 3.22, we have
Example 3.22 (Find a basis for a subspace) Let V and W be two subspaces of IRs
r r
with bases
U =
1 0 0
0 1 0 -3
5 00]
2 0
[ 001
o -1 0 .
000 001
From this, one can directly see that dim(V + W) = 4, and the columns VI, V2, V3, W3
corresponding to the basic variables in Qx = 0 (or leading 1's in U) form a basis for
C(Q) = V + W. Moreover, dimN(Q) = dim(V n W) = 2, corresponding to two
free variables X4 and xs in Qx = O.
To find a basis for V n w, we solve Ux = 0 for (X4, xs) = (1,0) and (X4, xs) =
(0,1) respectively to obtain a basis for N(Q) :
3.7. Bases for subspaces 103
-5VI + 3V2 + WI = 0,
-2V2 + V3 + W2 = 0.
Remark: (Another method for finding bases) Example 3.22 illustrates a method for
findingbases for V + Wand V n W for givensubspaces V and W oflRn by constructing
a matrix Q whose columns are basis vectors for V and basis vectors for W. There is
another method for finding their bases by constructing a matrix Q whose rows are
basis vectors for V and basis vectors for W. In this case, clearly V + W = R(Q) .
By finding a basis for the row space R(Q), one can get a basis for V + W.
On the other hand, a basis for V n W can be found as follows: Let A be the k x n
matrix whose rows are basis vectors for V, and B the e x n matrix whose rows are
basis vectors for W. Then V = R(A) and W = R(B). Let A denote the matrix A
with an additional unknown vector X = (Xl , x2, . .. ,Xn ) E IRn attached as the bottom
row, i.e.,
and the matrix B is defined similarly.Then it is clear that R(A) = R(A) and R(B) =
R(B) if and only if X E V n W = R(A) n R(B). This means that the row-echelon
form of A and that of A should be the same via the same Gaussian elimination.
Thus, by comparing the row vectors of the row-echelon form of A with those of A,
one can obtain a system of linear equations for x = (Xl, x2, . . . ,xn ) . By the same
argument applied to Band B, one gets another system oflinear equations for the same
x = (Xl, X2, ••. ,xn ) . Common solutions of these two systems together will provide
us a basis for V n W.
The following example illustrates how one can apply this argument to find bases
for V + Wand V n W.
Example 3.23 (Find a basis for a subspace) Let V be the subspace of 1R5 spanned
by
104 Chapter 3. Vector Spaces
WI = (1, 3, 0, 2, I) ,
W2 = ( I , 5, -6, 6, 3),
W3 = (2, 5, 3, 2, I ) .
[
°
I 3 -2 2
1 -1 2
3]
-1 ,
°° °° 1
so that dim V = 3. Similarly, the matrix B whose row vectors are w/s is reduced to
a row-echelon form
[
°
1 3 0 2
2 -6 4
1]
2 ,
so that dim W = 2.
°° °°°
Now, if Q denotes the 6 x 5 matrix whose row vectors are Vi 'S and w/s, then
V + W = R (Q ). By Gaussian elimination, Q is reduced to a row-echelon form,
excluding zero rows:
I 3 -2
° -1 22 -13 ]
° ° °°
I
I -I .
[
°° ° I
Thus, the four nonzero row vectors
A- = 2
3 -2
4 -3 4 22 3]
3 - I - 2 10 .
XI X2 X3 X4 X5
3.7. Bases for subspaces 105
[
1 3
o
o
o
1
0
0 -XI+X2+X3
-2
-1
0 -i]
1
o
.
- XI + X2 + X3 = 0
{ 4xI - 2X2 + X4 = o.
The same calculation with iJ gives another homogeneous system of linear equations
for x:
- 9X I + 3X2 + X3 0
+
1
4xI - 2X2 X4 = 0
2xI - X2 + X5 = O.
Solving these two homogeneous systems together yields
Problem 3.22 Let V and W be the subs paces of the vector space P3 (R) spann ed by
3 x + 4x 2 + x3
5 + 5x 2 + x3
5 5x + lOx 2 + 3x 3
and
I
WI (x) = + 3x 2 + 2x 3
9 - 3x
W2(X) = 5 - + 2x 2 + x 3 x
W3(X) = 6 + 4x 2 + x 3
respectively. Find the dimensions and bases for V + Wand V n w.
V = {(x,Y,Z,U)ElR4 : y + z+ u = O},
W {(X , Y, z, u) E
4
lR : x + Y = 0, Z = 2U}
be two subs paces of lR4 . Find bases for V. W, V + W, and V n w.
106 Chapter 3. Vector Spaces
3.8 Invertibility
In Chapter 1, we have seen that a non-square matrix A may have only one-sided
(right or left) inverses. In this section , it will be shown that the existence of a one-
sided inverse (right or left) of A implies the existence or the uniqueness of the solutions
of a system Ax = b.
Proof: (1) <=> (2): In general, C(A) jRm. For any b E jRm, there is a solution
x E jRn of Ax = b if and only if b is a linear combination of the column vectors of
A, i.e., b E C(A) . Thus jRm = C(A).
(2) <=> (3): C(A) = jRm if and only if dimC(A) = m :s n (see Problem 3.10). But
dimC(A) = rank A = dim R(A) :s min{m, n},
(1) => (4): Let ej , e2, ... , em be the standard basis for jRm. Then for each ei, one
can find an Xi E jRn such that AXi = ei by the hypothesis (1). If B is the n x m matrix
whose columns are these Xi'S: i.e., B = [Xj X2 • . . xm ] , then, by matrix multiplication,
(4) => (1): If B is a right inverse of A, then for any b s jRm, X = Bb is a solution
0
Condition (2) means that A has m linearly independent column vectors, and con-
dition (3) implies that there exist m linearly independent row vectors of A, since rank
= =
A m dim R(A) .
Note that if C(A) jRm, then Ax = b has no solution for b ¢ C(A) .
Proof: (1) => (2): Note that the column vectors of A are linearly independent if
and only if the homogeneous equation Ax = 0 has only the trivial solution . However,
3.8. Invertibility 107
Ax = 0 has always the trivial solution x = 0, and the statement (1) implies that it is
the only one.
(2) {} (3): Clear, because all the column vectors are linearly independent if and
only if they form a basis for C(A), or dim C(A) = n m.
(3) {} (4): Clear, because dim R(A) = rank A = dimC(A) = n if and only if
R(A) = jRn (see Problem 3.10).
(4) {} (5): Clear, since dim R(A) + dimN(A) = n.
(2) ::::} (6): Suppose that the columns of A are linearly independent so that rank
A = n. Extend these column vectors of A to a basis for jRm by adding m -n additional
independent vectors to them. Construct an m x m matrix S with those basis vectors
in its columns. Then the matrix S has rank m, and hence it is invertible. Let C be the
n x m matrix obtained from S-I by throwing away the last m - n rows. Since the
first n columns of S constitute the matrix A, we have C A = In.
(6) ::::} (1): Let C be a left inverse of A. If Ax = b has no solution, then we are
done . Suppose that Ax = b has two solutions, say Xt and X2. Then
Remark: (1) We have proved that an m x n matrix A has a right inverse if and only
if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if
m 1= n, A cannot have both left and right inverses.
(2) For a practical way of finding a right or a left inverse of an m xn matrix A,
we will show later (see Remark (1) below Theorem 5.26) that if rank A = m, then
(AAT )-I exists and AT(AAT)-I is a right inverse of A, and if rank A = n, then
(AT A)-I exists and ( A T A)-I AT is a left inverse of A (see Theorem 5.26) .
(3) Note that if m = n so that A is a square matrix, then A has a right inverse (and
a left inverse) if and only if rank A = m = n. Moreover, in this case the inverses are
the same (see Theorem 1.9). Therefore, a square matrix A has rank n if and only if
=
A is invertible. This means that for a square matrix "Existence Uniqueness," and
the ten statements listed in Theorems 3.24-3.25 are all equivalent. In particular, for
the invertibility of a square matrix it is enough to show the existence of a one-sided
inverse.
Problem 3.24 For each of the following matrices, find all vectors b such that the system of
linear equations Ax = b has at least one solution . Also, discuss the uniqueness of the solution.
108 Chapter 3. Vector Spaces
(I) A [J 3 -2 5
4 1 3
7 - 3 6 13
(2) A [ ;
-6
-;
1
l
(3) A [l 2 -3 -2
3 -2
8 -7 -2
0
1 -9 -10
-3]
-4
, (4) A = [ 1 1
5
2 -2
;l
Summarizing all the results obtained so far about solvability of a system, one can
obtain several characterizations of the invertibility of a square matrix. The following
theorem is a collection of the results proved in Theorems 1.9,3.24, and 3.25.
Theorem 3.26 For a square matrix A oforder n, the following statements are equiv-
alent.
(1) A is invertible.
(2) det A =I O.
(3) A is row equivalent to In .
(4) A is a product of elementary matrices.
(5) =
Elimination can be completed: PA LDU, with all d; =I O.
(6) Ax = b has a solution for every bE jRn .
(7) Ax = 0 has only a trivial solution, i.e., N(A) = (OJ.
(8) The columns of A are linearly independent.
(9) The columns of A span jRn, i.e., C(A) = jRn.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span jRn, i.e., R(A) = R".
(14) A has a right inverse.
(15)* The linear transformation A : jRn -+ jRn via A(x) = Ax is injective.
(16)* The linear transformation A : jRn -+ jRn is surjective.
(17)* Zero is not an eigenvalue of A.
Proof: Exercise; where have we proved which claim? Prove any not covered. The
numbers with asterisks will be explained in the following places: (15) and (16) in
Remark on page 133 and (17) in Theorem 6.1. 0
3.9 Applications
3.9.1 Interpolation
In many scientific experiments , a scientist wants to find the precise functional rela-
tionship between input data and output data. That is, in his experiment, he puts various
3.9.1. Application: Interpolation 109
input values into his experimental device and obtains output values corresponding to
those input values. After his experiment, what he has is a table of inputs and out-
puts. The precise functional relationship might be very complicated, and sometimes
it might be very hard or almost impossible to find the precise function. In this case,
one thing he can do is to find a polynomial whose graph passes through each of the
data points and comes very close to the function he wanted to find. That is, he is
looking for a polynomial that approximates the precise function. Such a polynomial
is called an interpolating polynomial. This problem is closely related to systems of
linear equations.
Let us begin with a set of given data: Suppose that for n + I distinct experimental
input values xo, Xl, ... , x n, we obtained n + I output values YO = I(xo), YI =
I(XI), . .. ,Yn = I(xn) . The output values are supposed to be related to the inputs by
a certain (unknown) function I. We wish to construct a polynomial p(x) of degree less
than or equal to n which interpolates I (x) atxo, Xl, . . . , Xn: i.e., p(Xi) = Yi = I (Xi)
for i = 0, I , . . . , n.
Note that if there is such a polynomial, it must be unique. Indeed, if q (x) is another
such polynomial, then hex) = p(x) - q(x) is also a polynomial of degree less than
or equal to n vanishing at n + I distinct points Xo, Xl, . .. , Xn. Hence h (x) must be
the identically zero polynomial so that p(x) = q(x) for all X E JR.
To find such a polynomial p(x), let
I Xl
XQ .,
. . .. Xl [ ao]
al [ Yo
Yl ]
... .. . ..
. ... - .
..
.
[
I Xn an Yn
detA = n
O;S.i<};S.n
(X) -Xi) .
Since the Xi'S are all distinct, det A =1= O. Hence, A is nonsingular, and Ax = b has a
unique solution which determines the unique polynomial p(x) of degree g n passing
through the given n + I points (xo, Yo), (Xl , Yl) , .. . , (Xn, Yn) in the plane ]R2.
in the plane ]R2, let p(x) = ao+alx+a2x2+a3x3 be the polynomial passing through
the given four points . Then , we have a system of equations
l
ao = 3
ao + al + a2 + a3 = 0
ao - al + a2 a3 2
ao + 3al + 9a2 + 27a3 = 6.
Solving this system, one can get ao = 3, al = -2, a2 = -2, a3 = I, and the unique
polynomial is p(x) = 3 - 2x - 2x 2 + x 3. 0
Problem 3.25 Let f(x) = sinx . Then at x = 0, t, zr, the values of fare y =
0, 4-, O. Find the polynomial p(x) of degree g 4 that passes through these five
points. (One may need to use a computer to avoid messy computation.)
Problem 3.27 Find the equation ofa circle that passes through the three points (2, -2), (3, 5),
and (-4, 6) in the plane ]R2 .
so that all of the coefficients (which are also linear combinations of c; 's) must be zero.
It gives a homogeneous system of linear equations in c; 's, say Ac = 0 with an m x n
matrix A, as in the proof of Lemma 3.9:
Recall that the vectors Yi 's are linearly independent if and only if the system Ac = 0
has only the trivial solution. Hence, the linear independence of a set of vectors in a
finite dimensional vector space can be tested by solving a homogeneous system of
linear equations .
If V is not finite dimensional, this test for the linear independence of a set of vectors
cannot be applied. In this section, we introduce a test for the linear independence of
a set of functions . For our purpose, let V be the vector space of all functions on lR
which are differentiable infinitely many times . Then one can easily see that V is an
infinite dimensional vector space .
for all x E lRimplies that all c; = O. By taking a differentiations n - 1 times, one can
obtain n equations:
!I (x) h(x)
f{(x) fn (x) C2
] [ Cl] [ 0 ]
[
fl(n-:l) (x) j 2(n - l) (
x) .. ·
fn(n-:l)(x) = b.
112 Chapter 3. VectorSpaces
The determinant of the coefficient matrix is called the Wronskian for {II (x), h(x),
. .. , fn(x)} and denoted by W(x) . Therefore, if there is a point Xo E JR such that
W(xo) ::j:. 0, then the coefficient matrix is nonsingular at x = xo, and so all ci = O.
Therefore,
However, the Wronsk ian W (x) = 0 for all x does not imply linear dependence of the
given functions f; 's. In fact, W (x) = 0 means that the functions are linearly dependent
at each point x E JR, but the constants Cj'S giving nontrivial linear dependence may
vary as x varies in the domain . (See Example 3.25 (2).)
X cosx sinx ]
WI (x) = det [ 1 - sinx =x
o -cosx -SlllX
and
x
X eX e- ]
W2(X) = det 1 eX _e- x
[ o e"
= 2x .
e- X
Since Wj(x) ::j:. 0 for x ::j:. 0, both FI and h are linearly independent.
(2) For the set offunctions [x]x], x 2 } on JR, the Wronskian for them is
2
xlxl x ]
W(x) = det [ 21xl 2x = 0
for all x. These two functions are linearly dependent on each of (-00,0] and [0, 00),
since x]x] = -x 2 on (-00,0] and xlxl = x 2 on [0, 00). But they are clearly linearly
independent functions on R 0
Problem 3.28 Show that 1, x, x 2 , .. . , x n are linearly independent in the vectorspace C(lR)
of continuous functions.
3.10 Exercises
3.1. Let V be the set of all pairs (x, y) of real numbers. Define
Is V a vectorspacewiththeseoperations?
3.10. Exercises 113
xE9y=x-y, k ·x = -kx.
The operations on the right-hand sides are the usual ones . Which of the rules in the
definition of a vector space are satisfied for (JR n , E9, .)?
3.3. Determine whether the given set is a vector space with the usual addition and scalar
multiplication of functions .
(1) The set of all continuous functions f defined on the interval [-1, 1] such that f (0) =
1.
(2) The set of all continuous functions f defined on the real line JR such that Iimx->oo
f(x) = O.
(3) The set of all twice differentiable functions f defined on JR such that fl/ (x) + f (x) =
O.
3.4. Let C 2 [ -1, 1] be the vector space of all functions with continuous second derivative s on
the domain [-1, 1] . Which of the following subsets is a subspace of C 2[-1, I]?
(1) W = {I(x) E C 2[ -I , 1] : fl/ (x ) + f(x) = 0, -1::: x::: I}.
(2) W = {I(x) E C 2[-I, 1] : fl/(x) + f(x) = x 2, -1::: x::: I}.
3.5 . Which of the following subsets of C[ -1, 1] is a subspace of the vector space C[ -I, 1]
of continuous functions on [-1, I]?
(1) W = {I(x) E C[-I, 1] : f(-I) = -f(1)}.
(2) W = {I(x) E C[-I , 1] : f(x) 0 for all x in [-1, I]}.
(3) W = {I(x) E C[-I , 1] : fe-I) = -2 and f(1) = 2} .
(4) W = {I(x) E C[-I , 1] : f(i) = OJ.
3.6 . Show that the set of all matrices of the form AB - BA cannot span the vector space
Mnxn(lR).
3.7. Does the vector (3, -1, 0, -1) belong to the subspace oflR.4 spanned by the vectors
(2, -1 , 3, 2) , (-1, 1, 1, -3) and (1, 1, 9, -5) ?
3.8. Expres s the given function as a linear combination of functions in the given set Q.
(1) p(x) = -1 - 3x + 3x 2 and Q = {PI (x), P2(X), P3( X)} , where
PI(X)=1+2x+x 2, P2(X)=2+5x, P3(X)=3+8x-2x 2 .
(2) p(x) -2 - 4x + x 2 and Q {PI (x) , P2(x), P3(X), P4(X)}, where
= =
PI(X) = I+2x 2+x3, P2(X) = I+x+2x 3, P3(X) = - I - 3 x - 4x3,
P4(X) I +2x _x 2 +x 3.
=
3.9. Is {cos2 x , sin 2 x , 1, eX} linearly independent in the vector space C(JR)?
3.10. In the n-space JRn , determine whether or not the set
is linearly dependent.
3.11. Show that the given sets of functions are linearly independent in the vector space
C[-]f, ]fl.
(1) {I , x, x 2, x 3 , x 4}
114 Chapter 3. Vector Spaces
(1) A [i IS]2
4 -3 0 , (2) B [! 2
1 -2
1
-52 ] ,
n
2 -I 1 5 0 0
[i
1 -1
1
1 -1
3
-23 '1 ]
(3) C 6 1 -1 8 3 .
(4) D [
9 0 -2 2 1
5 -5 5 10
n
3.10. Exercises 115
(1)
ooooJ
! i1. (2) [ ;
1114
;], (3) [i i1.
IOOOJ
3.22. For any nonzero column vectors u and v, show that the matrix A = uv T has rank 1.
Conversely, every matrix of rank 1 can be written as uv T for some vectors u and v.
3.23. Determine whether the following statements are true or false, and justify your answers.
(1) The set of all n x n matrices A such that AT = A -1 is a subspace of the vector space
Mnxn(lR).
(2) If Ct and {3 are linearly independent subsets of a vector space V, so is their union
a U {3.
(3) If U and Ware subspaces of a vector space V with bases Ct and {3 respectively, then
the intersection Ct n {3 is a basis for U n W.
(4) Let U be the row-echelon form of a square matrix A. lithe first r columns of U are
linearly independent, so are the first r columns of A.
(5) Any two row-equivalent matrices have the same column space.
(6) Let A be an m x n matrix with rank m. Then the column vectors of A span jRm .
(7) Let A be an m x n matrix with rank n. Then Ax = b has at most one solution .
(8) If U is a subspace of V and x, y are vectors in V such that x + y is contained in U,
then x E U andy E U.
(9) Let U and V are vector spaces . Then U is a subspace of V if and only if dim U
dimV .
(10) Forany m x n matrix A , dimC(A T) + dimN(A T) = m.
4
Linear Transformations
We often call T simply linear. It is easy to see that the two conditions for a linear
transformation can be combined into a single requirement
Geometrically, the linearity is just the requirement for a straight line to be transformed
to a straight line, since x + ky represents a straight line through x in the direction y
in V , and its image T(x) + kT(y) also represents a straight line through T(x) in the
direction of T(y) in W .
HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
118 Chapter 4. Linear Transformations
The following theorem is a direct consequence of the definition, and the proof is
left for an exercise.
P(x) = [ ; ] = [ ] .
4.1. Basic properties of linear transformation 119
y y y.
Ro(x)
x x
x
x
P(x) rex)
Figure 4.1. Three linear transformations on ]R2
(3) The linear transformation T : ]R2 -+ ]R2 defined by, for X = (x, y),
f(t)dt,
satisfy linearity, and so they are linear transformations . Many problems related with
differential and integral equations may be reformulated in terms of linear transforma-
tions . 0
Definition 4.2 Let V and W be two vector spaces, and let T : V -+ W be a linear
transformation from V into W.
(1) Ker(T) = (v E V : T(v) = O} V is called the kernel of T.
(2) Im(T) = (T(v) E W : v E V} = T(V) W is called the image of T .
Problem 4.2 Let W = (A E Mnxn(R) : tr(A) = OJ. Show that W is a subspace, and find a
basis for W .
Problem 4.3 Showthat, for any matrices A and B in Mn xn (R) , tr(AB) = tr(BA) .
One of the most important properties of linear transformations is that they are
completely determined by their values on a basis.
Theorem 4.3 Let V and W be vector spaces. Let {VI , V2, ... , vnl be a bas is for V
and let WI, W2, . . . , Wn be any vectors (possibly repeated) in W. Then there exists a
unique linear transformation T : V --+ W such that T (Vi) =
Wi for i = 1, 2, . . . , n.
Proof: Let x E V . Then it has a unique expression: x = 2:7=1 aiv, for some scalars
aI, a2, .. . , an. Define
4.1. Basic properties of linear transformation 121
n
T :V W by T(x) = I>iWi.
i=1
In particular, T(Vi)= Wi for i =
1,2, . . . , n.
Linearity : For x= =
2:7=1 aivi, Y 2:7=1 biv, E V and k a scalar. we have
=
x+ky 2:7=I(a i +kbi)Vi . Then
n n n
T(x +ky) = L(ai +kbi)Wi = Laiwi +k LbiWi = T(x) +kT(y).
i=1 i=1 i=1
Uniqueness: Suppose that S : V W is linear and S(Vi) = Wi for i =
1.2• . .. , n. Then for any x E V with x = 2:7=1 aiv ], we have
n n
S(x) = LaiS(Vi) = Laiwi = T(x) .
i=1 i=1
Hence. we have S = T. o
The uniqueness in Theorem 4.3 may be rephrased as the following corollary.
Find a formula for T(XI , X2. X3) . and then use it to compute T(2 . -3. 5).
(2) Let f3 = {VI,V2 , V3} be another basis for ]R3. where VI = (1, 1. 1). V2 =
(1. 1. 0). V3 = (1. O. 0). and let T : ]R3 ]R2 be the linear transformation defined
by
Find a formula for T(XI, X2. X3). and then use it to compute T(2. -3, 5).
3 3
T(x) = LXiT(ei) = LXiWi
i=1 i=1
= XI(l ,0)+X2(2, -1)+X3(4, 3)
(XI + 2X2 + 4X3, -X2 + 3X3).
122 Chapter 4. Linear Transformations
Thus, T(2, -3, 5) = (16, 18). In matrix notation, this can be written as
[ oI -I24]
3
[ ]= [
X3
Xl + 2X2 + 4X3 ] .
- X2 + 3X3
(2) In this case, we need to express x = (Xl, X2, X3) as a linear combination of
VI, V2, V3, i.e.,
3
(XI ,X2,X3) = LkiVi = kl(l, I, 1)+k2(1, I, 0)+k3(1,0,0)
i=l
= Xl
= X2
= X3 ·
T(XI,X2,X3) = X3T(VI)+(X2-X3)T(V2)+(XI-X2)T(V3)
= x3(1, 0) + (X2 - x3)(2, -I) + (X] - x2)(4, 3)
= (4XI - 2X2 - X3, 3XI - 4X2 + X3).
Problem 4.5 Let V and W be vector spaces and T : V W be linear.Let {WI, w2•. . .• Wk}
be a linearly independentsubset of the image Im(T) £ W. Suppose that ex = {V]. v2, . . .• vkl
is chosen so that T(Vi) = Wi for i = I. 2, .. . , k. Prove that ex is linearly independent.
Proof: Let wj , W2 E W, and let k be any scalar. Since T is invertible, there exist
unique vectors VI and V2 in V such that T(VI) = WI and T(V2) = W2. Then
Example 4.9 (The vector space Pn (R) is isomorphic to lRn + 1) Consider the vector
space P2(lR) = {a + bx + cx 2 : a , b, c E R] of all polynomials of degree 2
with real coefficients. To each polynomial a + bx + cx 2 in the space P2(lR) , one
can assign the column vector [a b c f in lR3 . Then it is not hard to see that it is
124 Chapter 4. Linear Transformations
an isomorphism from the vector space P2(JR) to the 3-space JR3 , by which one can
identify the polynomial a + bx + cx 2 with the column vector [a b cf. It means
that these two vector spaces can be considered as the same vector space through the
isomorphism . In this sense, one often says that a vector space can be identified with
another if they are isomorphic to each other. In general, the vector space Pn (JR) can
be identified with the (n + I)-space JRn+l. 0
Theorem 4.7 Two vector spaces V and Ware isomorphic if and only if dim V =
dimW.
Proof: Let T : V -+ W be an isomorphism, and let (VI, V2, .. " vnl be a basis for
V. Then we show that the set (T(vj), T(V2), . . . , T(v n )} is a basis for W so that
dim W = n = dim V.
(l) It is linearly independent: Since T is one-to-one, the equation
implies that 0 = CI VI + C2V2 + + Cn Vn. Since the Vi'S are linearly independent,
we have c, = 0 for all i = 1,2, ,n.
(2) It spans W: Since T is onto, for any yEW there exists an x E V such that
T(x) = y. Write x = L:?=I aiVi. Then
<t>(x) = n
= n
= (al, . . . , an) = [ al: ]
E IRn,
.=1 1=1 an
which is called the coordinate vector of x with respect to the ordered basis a, and it
is denoted by [x]a. Clearly [vila = er.
Example 4.10 (1) Recall that, from Example 4.3, the rotation by the angle () of IR2
is given by the matrix
R = [ cos () - sin () ]
IJ sin () cos () .
Clearly, it is invertible and hence is an isomorphism of IR2. In fact, the inverse I isR;
another rotation R_IJ .
(2) Let a = [ej , e2} be the standard ordered basis, and let f3 = {VI, V2}, where
Vi = RlJei , i = 1,2. Then f3 is also a basis for IR2. The coordinate vectors of Vi with
respect to a are
126 Chapter 4. Linear Transformations
COS 0 ] [ - sin 0 ]
[vtJa =[ sin s ' [vzl a = cosO'
l l
while
[vIlp =[ [vzlp =[
If we choose a' = {ez, eil as a different ordered basis for IR z , then the coordinate
vectors of Vi with respect to sx' are
R" =
COS f - sin frr ] -_ 1
- 1
]'
'4 [ sin!!.4 cos
'4 .J2.J2
and the reflection aboutthex-axis is _ land R_1- = Rjl. Hence, the matrix
for the reflection about the line y = x is
In general, the reflection about a line .e in the plane can be expressed as the
composition Re 0 T 0 R-e , where T is the reflection about the x-axis and 0 is the
angle between the x-axis and the line .e (see Figure 4.2). 0
x y
R_o(x)
y = RO 0 T 0 R_o(x)
x T
Ro To R_O(x)
the natural isomorphism between an n-dimensional vector space V and the n-space
jRn, which will be shown in this section.
Let T : V -+ W be a linear transformationfrom an n-dimensionalvectorspace V
toanm-dimensional vectorspace W, and let ex = {VI, . . . , vn } and,B = {WI, . . . , wm }
be any ordered bases for V and W, respectively, which will be fixed throughout this
section. Then by Theorem 4.3 the linear transformation T is completely determined
by its values on a basis ex: Write them as
T
V W
X f--""> T(x)
j T T j
[x]a f--""> [T(x»)p
jRn jRm ,
A = [T]g
Figure 4.3. The associated matrix for T
in the commutative diagram in Figure 4.3 with the natural isomorphisms ct> and
\II, defined in Section 4.2. Note that the commutativity of the diagram means that
A 0 ct> = \II 0 T.
Note that
is the matrix whose column vectors are just the coordinate vectors [T (v j)]p of T (v j )
with respect to the basis (3. In fact, A= [aU] is just the transpose of the coefficient
matrix in the expression of T(V i) with respect to the basis {3 in W . Note that this
matrix is unique since the coordinate expression of a vector with respect to a
fixed basis is unique.
Definition 4.4 The matrix A is called the associated matrix for T (or the matrix
representation of T) with respect to the ordered bases (¥ and {3 , and denoted by
A = [T]g. When V = Wand a = {3, we simply write [T]a for
Now, the argument so far can be summarized in the following theorem.
Theorem 4.9 Let T : V W be a linear transformation from an n-dimensional
vector space V to an m-dimensional vector space W. Forfixed ordered bases a =
{VI, V2, ... , vnl for V and {3 for W, there corresponds a unique associated m x n
matrix [T]g for T such that for any vector x E V the coordinate vector [T(x)]p of
T (x) with respect to {3 is given as a matrix product ofthe associated matrix [T]g for
T and the coordinate vector [x]a. i.e.,
[T(x)]p = [T]g[x]a . I
The associated matrix [T]g is given as
4.3. Matrices of linear transformations 129
The following examples illustrate the computation of the associated matrices for
linear transformations.
Example 4.12 (The associated matrix [id]a) Let id : V V be the identity trans-
formation on a vector space V. Then for any ordered basis a for V, the matrix
[id]a = I, the identity matrix , because if a = {VI , V2 , " " vn } , then
j
id(V2) = OVI + IV2 + + OVm
·· ..
· .
id(vn ) = OVI + OV2 + + Iv m • o
Example 4.13 (The associated matrix Let T : PI (lR) P2(lR) be the linear
transformation defined by
(T(p»)(x) = xp(x) .
Find the associated matrix with respected to ordered bases a = {I, x} and
f3 = {I , x, x 2 } for PI(lR) and P2(lR),respectiveiy.
Solution: Clearly,
(T(I»(x) = x = O· I + Ix + Ox2
{ (T(x»(x) = x 2 = O· I + Ox + Ix 2.
[! n.
Hence, the associated matrix for T is the transpose of the coefficient matrix in this
Example 4.14 (The associated matrices and Let T : lR2 lR3 be the
linear transformation defined by T(x , y) = (x + 2y, 0, 2x + 3y) with respect to
the standard bases a and f3 for lR2 and lR3, respectively. Then
i:
T(O, I) = (2, 0, 3) = 2el Oe2 3e3.
Example 4.15 (The associated matrix [T]a for the standard basis a) Let T : lR2
lR 2bealineartransformationgivenbyT(l , I) = (0, l)andT(-I, I) = (2, 3).Find
the matrix representation [T]a of T with respect to the standard basis a = [ej , e2}.
130 Chapter 4. Linear Transformations
Solution: Note (a, b) = ae) + be2 for any (a, b) E jR2. Thus, the definition of T
shows
T(el)
-T(e» ++T(e2) = T(-e) +
= T(el + e2) = T(l, 1) =
e2) = T(-l , 1) =
(0 , 1) =
(2, 3) = 2el + 3e2.
T(el) = -el
T(e2) = el
Therefore, rn, = [ = ;l 0
Example 4.16 (The associated matrix [T]p for a non-standard basis 13) Let T be
the linear transformation given in Example 4.15. Find [TJp for a basis 13 {VI,V2}, =
where VI = (0, 1) and V2 = (2, 3).
T(v» = ;] [ ] = [ ; ] = [T(VI)]a ,
T(V2) = ;] [; ] = [ ] = [T(V2)]a.
the j-th column of A, so that its matrix representation is the identity matrix.
(2) Let V and W be vector spaces with bases a and {3, respectively , and let
T : V --+ W be a linear transformation with the matrix representation A. Then =
itis clear that Ker(T) and Im(T) are isomorphic to the null spaceN(A) and the column
space C(A) , respectively , via the natural isomorphisms. In particular, if V = JRn and
W = JRm with the standard bases, then Ker(T) = N(A) and Im(T) = C(A) .
Therefore, from Theorem 3.17, we have
Problem 4.10 Find the matrix representations [TJa and [Tl,8 of each of the following linear
transformations T on jR3 with respect to the standard basis a. = {eI, e2, e3} and another
fJ = [ej , e2, ej}:
(1) T(x, y, z) = (2x - 3y + 4z, Sx - y + 2z , 4x + 7y),
(2) T(x, y, z) = (2y + z, x - 4y , 3x).
Let a. and fJ be the standard bases for jR4 and jR3 , respectively . Find
Problem 4.12 Let id : jRn _ jRn be the identity transformation. Let Xk denote the vector in
jRn whose first k - 1 coordinates are zero and the last n - k + 1 coordinates are 1. Then clearly
fJ = [xj , . .. , xnl is a basis for jRn (see Problem 3.9) . Let a. = {eI, . . . , en} be the standard
basis for jRn. Find the matrix representations and
For any two linear transformations Sand T in £(V; W) and A E JR, we define the
sum S + T and the scalar multiplication 'AS by
132 Chapter 4. Linear Transformations
Lemma 4.10 The function </J : L(V; W) --+ MmxnOR) is a one-to-one correspon-
dence between L(V; W) and Mmxn(IR).
Furthermore, the following lemma shows that </J is linear, so that it is in fact an
isomorphism from L(V; W) to Mm xn(IR).
Lemma 4.11 Let V and W be vector spaces with ordered bases a and fJ, respectively,
and let S, T : V --+ W be linear. Then we have
[S + = + and =
Proof: Leta = {VI , .. . , vn } andfJ = {WI, ... , w m }. Then we have unique expres-
sions S(Vj) = Lr=1
aijw; and T(vj) = Lr=1
bijw; for each I s j s n, so that
= [aij] and = [bij]' Hence
m m m
(S + T)(Vj) = I>ijW; + I )ijW; = L(aij + bij)w;.
;=1 ;=1 ;=1
Thus
[S + = +
The proof of the second equality = is similar and left as an exercise. 0
a linear transformation and A itself is the matrix representation of itself with respect
to the standard bases of lRn and lRm,
One can summarize our discussions in the following theorem:
Theorem 4.12 For vector spaces V ofdimension nand W ofdimension m, the vector
space ,C(V; W) ofall linear transformations from V to W is isomorphic to the vector
space Mmxn(lR) of all m x n matrices, and
The next theorem shows that the one-to-one correspondence between £(V ; W)
and Mm xn (R) preserves not only the vector space structure but also the compositions
oflinear transformations, Let V, Wand Z be vector spaces. Suppose that S : V --+ W
and T : W --+ Z are linear transformations, Then the composition T 0 S : V --+ Z is
also linear.
Theorem 4.13 Let V, Wand Z be vector spaces with ordered bases a , {3, and y ,
respectively. Suppose that S : V --+ Wand T : W --+ Z are linear transformations.
Then
[T 0 S]!; =
Problem 4.13 Let ex be the standard basis for lR3, and let S, T : lR3 lR3 be two linear
transformations given by
Problem 4.14 Let T : P2(JR) P2(JR) be the linear transformation defined by T (f) (3 + =
x)J' +2J,andletS : P2(lR) lR3 be the one definedby S(a+bx+cx 2) =
(a-b , a+b , c).
Fora basis ex = {l,x , x 2} for P2(lR) and the standard basis f3 = {el,e2, e3}for]R3, compute
p p
[Sla, [T]a and [S 0 T]a .
Theorem 4.14 Let V and W be vector spaces with ordered bases a and {3, respec-
tively, and let T : V W be an isomorphism. Then
Proof: Since T is invertible, dim V = dim W, and the matrices and [T-I]p
are square and of the same size. Thus,
Problem 4.15 For the vector spaces PI (JR) and lR2, choose the bases ex = {I , x } for PI (R) and
f3= {el, e2} for lR2, respectively. Let T : PI (R) ]R2 be the linear transformation defined
by T(a + bx) = (a, a + b) .
(1) Show that T is invertible. (2) Find and [T-Il p'
In Section 4 .2, we have seen that, in an n-dimensional vector space V with a fixed
basis a, any vector x can be identified with a column vector [x]a in the n-space jRn via
the natural isomorphism <1>. Of course, one may get a different column vector [x]p if
another basis {3 is given instead of a . Thus, one may naturally ask what the relation
between [x], and [x]p is for two different bases a and {3.
To answer this question, let us begin with an example in the plane jR2. The coor-
dinate expression ofx = (x, y) E jR2 with respect to the standard basis a = Iei. e2}
l
wise through an angle 8 as in Figure 4.4, and suppose that the coordinate expression
ofx E ]R2 with respect to fJ is written as x = x'e; + y'e 2, or [xl,B =[ Then,
the expressions of the vectors in fJ with respect to ex are
[ x]
y
= [ 8 - sin 8 ] [
sm8 cos8 y'
x' ], or [x], = [id]p[xl,B,
where
[i d]a = [[ e' ] [e']] = [ 8 - sin 8 ] .
,B I a 2a sin 8 cos 8
It means that two coordinate vectors [x]a and [x],B in the 2-space ]R2 are related
by the associated matrix [id]p for the identity transformation id on ]R2. Note that
[xla = [t J=I
quY j] = [idlp[xlfl
or
where
qll ... qln]
[id]p = '. = [[wIla .. . [Wn]a]·
[
qnl qnn
This means that any two coordinate vectors of a vector in V with respect to two
different ordered bases a and f3 are related by the matrix representation [id]p of
the identity transformation on V, and this can be incorporated in the commutative
diagram in Figure 4.5 (next page).
Definition 4.5 The matrix representation [id]p of the identity transformation id :
V -+ V with respect to any two ordered bases a and f3 is called the basis-change
matrix or the coordinate-change matrix from f3 to a.
Since the identity transformation id : V -+ V is invertible, the basis-change
matrix Q = [id]p is also invertible by Theorem 4.14. If we had taken the expressions
of the vectors in the basis a with respect to the basis f3 : Vj = id(vj) = L:7=1 PijWi
for j = 1,2, . . . , n, then [Pij] = = Q-I and
[x]fl = = ([id]p)-I[X]a'
4.5. Change of bases 137
id
I
V V
x 1--- ...... x
T T
[xlp [xla
jRn jRn.
Q = [idl p
Solution: .Let f3 = {e; , e;l be the basis for ]R2 obtained by rotating the standard
basis a = {et , e2l counterclockwis e through an angle n /4, and let[x]a = [ ] and
[x]p = [ ] . Then,
[ ] = (i [ ] = [ f - f][ ] = - ] [ l
Hence, the equation xy = I is transformed to
1= xy = _ + = (x:/ _
which is a hyperbola. (See Figure 4.6.) o
- Sin()]
[e2]a =
[
o ,
n
[
o 0
[n
so
Ixl, Q[x],
Q-I = = -
COS ()
() cose
sin o 0]
0 ,
[
o 1
0] [
so that
X' ] [COS () sin () x ]
= - () o o
[
Problem 4.16 Find the basis-change matrix from a basis a to another basis f3 for the 3-space
1R3,wherea = {(I , 0,1 ), (1,1 ,0) , (0, 1, I)}, f3 = {(2, 3,1) , (1,2,0), (2,0, 3)).
4.6 Similarity
I fJ'
[x]a' = [x]a , and [T(x)]fJ' = [idw]fJ [T(x)],8.
Therefore, we get
= [idw [idv
This equation looks messy. However, by Theorem 4.13, this relation can be obtained
directly from T =idw 0 T 0 id v as
jRn jRm
T
;/
(V, a') • (W , fJ /)
jRn
7 jRm
The relation (3) of [T]p and [T]a in Corollary 4.16 is called a similarity. In
general, we have the following definition.
Definition 4.6 For any square matrices A and B, A is said to be similar to B if there
exists a nonsingular matrix Q such that B = Q-I A Q.
Note that if A is similar to B, then B is also similar to A. Thus we simply say that
A and B are similar. We saw in Corollary 4.16 that if A and B are n x n matrices
representing the same linear transformation T on a vector space V , then A and B are
similar.
Example 4.19 (Two similar associated matrices) Let 13 = {VI , V2, V3} be a basis for
the 3-space ]R3 consisting of VI = (l, 1, 0), V2 = (1, 0, 1) and V3 = (0, 1, 1) . Let
T be the linear transformation on ]R3 given by the matrix
21-1]
[T]p =
[ 1 2
-1 1
3
1
.
Let a = {el, e2, e3} be the standard basis. Find the basis-change matrix and
[T]a.
-: n
1 1
Therefore,
[T]a = = -1 [ 43
2 -1 o
4.6. Similarity 141
Example 4.20 (Computing an associated matrix) Let T : ]R3 JR.3 be the linear
transformation defined by
[T]a =
2 10]
[o 1 1 3
-1 0
and [id]p =
[-1 21]
0 1 1
001
.
- 1 2
-1 ]
Thus, with the inverse = ([id]p)-I = 0 1 - , it follows that
[ o 0
To show the second statement,let j = 2. Then T(V2) = T(2, 1,0) = (5,3, -1). On
the other hand, the coefficients of [T (V2)]/l are just the entries of the second column
of [T]/l' Therefore,
Proof: Let Q = [qij] and let WI, W2, . .. , Wn be the vectors in V defined by
D(l) = 0 = 0 . 1+ 0 . x + 0 . x2
D(x) = 1 =1·1+0·x+O·x 2
D(x 2 )= 2x = O· 1 + 2· x + 0 . x 2 •
0I 0]
[Dl a =
[ 000
0 0 2 .
Q=
o 0 4
Let
Now, we are going to find a basis tl = {VI, V2, V3} so that B = [Dlp. But, if it is, the
matrix Q must be the basis-changematrix [idlfj, and then
VI = 1·1 + o· x + O· x 2 = I ,
V2 = 0·1 + 2 · x + O· x 2 = 2x,
V3 = -2.1+0·x+4·x 2 = -2+4x 2 .
D(l) = 0 = O· 1 + 0 . 2x + 0 . (-2 + 4x 2 ) ,
D(2x) = 2 = 2.1+0.2x+O .(-2+4x 2 ) ,
D( -2 + 4x ) = 8x = O· 1 + 4 . 2x + 0 . (-2 + 4x 2 ) ,
2
and
020]
[Dlp = [ 0 0 4
000
, o
thus, as expected, that [Dlp = B = Q-I[DlaQ.
4.7.1. Application: Dual spaces and adjoint 143
Let a be the standard basis, and let fJ = {VI, V2, V3} be another ordered basis consisting of
VI = (1, 0, 0), v2 = (1, 1, 0), and v3 = (1, 1, 1) for ]R3. Find the associated matrix of T
with respect to a and the associated matrix of T with respect to fJ. Are they similar?
Problem 4.18 Suppose that A and B are similar n x n matrices. Show that
=
(1) det A det B,
(2) tr(A) =tr(B),
(3) rank A =
rank B.
Problem 4.19 Let A and B be n x n matrices. Show that if A is similar to B, then A2 is similar
to B 2 . In general, An is similar to B" for all n 2.
4.7 Applications
Note that the space of all scalars is a one-dimensional vector space JR, and the set of
all linear transformations from V to JR is the vector space .c(V; JR) whose dimension
is equal to the dimension of V (see Theorem 4.12), so that the two vector spaces
.c( V; JR) and V are isomorphic. In this section, we are concerned exclusively with
such linear transformations from V to the scalar space R
From the definition, one can say that any vector space is isomorphic to its dual
space .
Example 4.23 (Fourier coefficients are linear functionals) Let C[a , b] be the vec-
tor space of all continuous real-valued functions on the interval [a , b]. The definite
integral I : C[a, b] JR defined by
144 Chapter 4. Linear Transformations
I(f) = l b
f(t)dt
An(f) = - 1 1
rr 0
21r
f(t) cosnt dt and Bn(f) = - 1
rr 0
1 21r
f(t)sinntdt
vt: V -+ IR
Definition 4.8 For a basis a {VI,V2 , .. . , vn } for a vector space V , the basis
a* = {vr, vi, ... , for V* is called the dual basis of a.
Example 4.24 (Computing a dual basis) Let a = {VI , V2} be a basis for 1R2, where
VI = (1, 2) and V2 = (1, 3). To find its dual basis a* = {vi, vi} of a, we consider
the equations
The following example shows a natural isomorphism between the n-space JRn and
its dual space JRn*.
Example 4.25 (The dual basis et is the coordinate function) For the standard basis
a = {el, e2, ... , en} for the n-space JRn , its dual basis vector et is just the i -th
coordinate function. In fact, for any vector x = (XI , X2 , ,xn) = XI el + X2e2 +
.. . + xnen E JRn, we have et (x) = et (XI el + X2e2 + + xnen) = Xi for all i,
On the other hand, when we write a vector in JRn as x = (XI, X2, . . . ,xn ) with
variables Xi, it means that for a given point a = (ai, a2, ... , an) E JRn , each Xi
gives us the i-th coordinate of a, that is, xi(a) = a, for all i, In this sense, one can
identify et = Xi for i = 1, 2, ... , n , so that JRn * = JRn and they are called coordinate
functions . 0
(T 0 S)*(g) = go (T 0 S) = (g 0 T) 0 S
= T*(g) 0 S = S*(T*(g)) = (S* 0 T*)(g).
Theorem 4.20 The mapping ct> : V V** defined by ct>(x) = xis an isomorphism
from V to V**.
Proof: To show the linearity of ct>, let x, y E V and k a scalar. Then , for any f E V*,
..--....-
ct>(x + ky)(f) = (x + ky)(f) = f(x + ky)
= f(x) + kf(y) = x(f) + ky(f)
= (x + ky)(f) = (ct>(x) + kct>(yŸ (I).
Prove that {fl, Iz. 13} is a basis for V*, and then find a basis for V for which it is the dual .
Then
4.7.1. Application: Dual spaces and adjoint 147
since
= wj (takiwk) = Lakiwj(Wk) =
k=1 k=1
ifi k ,
if k + 1 i n.
Then clearly T E V* and the restriction Tlu ofT on U is simply T: i.e., Tlu = T E
U*. It is easy to see that (T + kS) T + kS. In particular, it is also easy to see that
=
{uj, u2' .. ., uk} is linearly independent in V* and uilu = ui E U*, i = 1,2, .. . ,k.
Therefore, one obtains a one-to -one linear transformation
tp : U* V*
given by cp(T) = T for all T E U*. The image cp(U*) is now a subspace of V*. By
identifying U* with the image cp(U*) , one can say U* is a subspace of V*.
Problem 4.22 Let U and W be subspaces of a vector space V. Show that U £:: W if and only
ifW* £:: U*.
148 Chapter 4. Linear Transformations
Let S be an arbitrary subset of V, and let (S} denote the subspace of V spanned
by the vectors in S. Let Sol = (f E V* : f(x) = 0 for any XES} . Then it is easy
to show that Sol is a subspace of V*, Sol = (S}ol, and dim(S} + dim Sol = n.
Let R be a subset of V*. Then Rol = {x E V : f(x) = 0 for any fER} is again
a subspace of V such that Rol = (R}ol and dim Rol + dim(R} = n.
o
Figure 4.8. Letter L.A. on a screen
sented by a matrix with coordinates of the vertices. For example, the coordinates of
the 6 vertices of 'L' form a matrix:
vertices 1 2 3 4 5 6
x-coordinate [ 0.0 0.0 0.5 0.5 2.0
y-coordinate 0.0 2.0 2.0 0.5 0.5 0.0
2.0] = A.
Of course, we assume that the computer knows which vertices are connected to
which by lines via some algorithm. We know that line segments are transformed
to other line segments by a matrix, considered as a linear transformation. Thus, by
multiplying A by a matrix, the vertices are transformed to the other set of vertices,
and the line segments connecting the vertices are preserved. For example, the matrix
B = [b 0.;5] transforms the matrix A to the following form, which represents
new coordinates of the vertices:
4.7.2. Application: Computer graphics 149
vertices 1 2 3 4 5 6
0.0 0.5 1.0 0.625 2.125 2.0]
BA = [ 0.0 2.0 2.0 0.5 0.5 0.0 .
Now, the computer connects these vertices properly by lines according to the given
algorithm and displays on the screen the changed figure as the left-hand side of the
Figure 4.9. The multiplication ofthe matrix C = [005 to BA shrinks the width
BA n
L!!=Js
3
(CB)A
1 6
of BA by half producing the right-hand side of Figure 4.9. Thus, changes in the shape
of a figure may be obtained by compositions of appropriate linear transformations.
Now, it is suggested that the readers try to find various matrices such as reflections ,
rotations, or any other linear transformations, and multiply A by them to see how the
shape of the figure changes.
Problem 4.24 For the given matrices A and B = ] above, by the matrix B T A
instead of BA, what kind of figure can you have ?
Remark: Incidentally, one can see that the composition of a rotation by 1( followed
by a reflection about the x-axis is the same as the composition of the reflection
followed by the rotation (see Figure 4.10). In general, a rotation and a reflection are
not commutative, neither are a reflection and another reflection.
The above argument generally applie s to a figure in any dimension. For instance,
a 3 x 3 matrix may be used to convert a figure in )R3 since each point has three
components.
Example 4.28 (Classifying all rotations in )R3) It is easy to see that the matrices
1 0 - sin f3 ]
R(x .a) = 0 coso
[ o sin o
o ,
cos f3
COS Y
-sin y
R( z,y) = Y cos y
[
o
are the rotations about the x, y, z-axes by the angles a, f3 and y, respectively .
150 Chapter 4. Linear Transformations
!t 1t
Rotationby 7f
Reflection Reflection
\f Rotation by x
2\
Figure 4.10. Commutativity of a rotationand a reflection
In general, the matrix that rotates JR3 with respect to a given axis appears frequently
in many applications. One can easily express such a general rotation as a composition
of basic rotations such as R(x ,a) , R(y ,{3) and R(z ,y) :
First, note that by choosing a unit vector u in JR3, one can determine an axis for a
rotation by taking a line passing through u and the origin O. (In fact, vectors u and -u
determine the same line) . Let u = (cosacos{3, cos a sin {3, sin a), a
o {3 2rr in the spherical coordinates. To find the matrix R(u ,o) of the rotation
about the u-axis bye, we first rotate the u-axis about the z-axis into the x z-plane by
R(z,_{3) and then into the x-axis by the rotation R (y ,-a) about the y-axis , Then the
rotation about the u-axis is the same as the rotation about the x-axis followed by the
inverses of the above rotations, i.e., take the rotation R(x ,o) about the x-axis, and then
get back to the rotation about the u-axis via R(y,a) and R(z,{3) ' In summary,
z
u
Problem 4.25 Find the matrix R (0 , t ) for the rotation aboutthe line determined by u = 0 , 1, 1)
T(
by 4'
So far, we have seen rotations, reflections, tilting (or say shear) or scaling (shrink-
ing or enlargement) or their compositions as linear transformations on the plain ]R2 or
the space ]R3 for computer graphics. However, another indispensable transformation
for computer graphics is a translation: A translation is by definition a transformation
T : jRn --+ ]Rn defined by T (x) x + Xo for any x E jRn , where Xo is a fixed vector
=
in ]Rn . Unfortunately, a translation is not linear if Xo =1= 0 and hence it cannot be
represented by a matrix . To escape this disadvantage , we introduce a new coordinate
system , called a homogeneous coordinate. For brevity, we will consider only the
3-space ]R3.
A point x = (x, y, z) E ]R3 in the rectangular coordinate can be viewed as the
set of vectors x = (hx , hy, hz; h), h =1= 0 in the 4-space ]R4 in a homogeneous
coordinate. Most time, we use (x, y , z, I) as a representative of this set. Conversely,
a point (hx, hy, hz, h), (h i= 0) in the 4-space ]R4 in a homogeneous coordinate
corresponds to the point (x / h, y / h, z/ h) E ]R3 in the rectangular coordinate. Now,
it is possible to represent all of our transformations including translations as 4 x 4
matrices by using the homogeneous coordinate and it will be shown case by case as
follows.
(1) Translations: A translation T : jR3 --+ ]R3 defined by T (x) = x + xo, where
Xo = (xo, yo , zo). can be represented by a matrix multiplication in the homogeneous
coordinate as
n
(2) Rotations: With the notations in Example 4.28 ,
- sinf3
o
cos f3
o
COS Y - sin y 0 0]
sin y cos y 0 0
R (z,y )= 0 0 I 0
[
o 0 0 I
are the rotations about the x , y, z-axes by the angles a , f3 and y in the homogeneous
coordinate, respectively.
(3) Reflections: An xy- reflection is represented by a matrix multiplication in the
homogeneous coordinate as
152 Chapter 4. Linear Transformations
Similarly, one can have an xz-shear and a yz-shear with matrices of the form
!
001
respectively.
(5) Scaling: A scaling is represented by a matrix multiplication in the homoge-
neous coordinate as
4.8 Exercises
4.1. Which of the following functions T are linear transformations?
(1) T(x, y) =
(x 2 _ i,x 2 + y2) .
(2) T(x, y, z) = (x + y, 0, 2x + 4z).
(3) T(x , y) =
(sin x , y) .
(4) T(x, y) (x + I, 2y, x + y ) .
=
(5) T(x, y, z) =
(]Ÿ], 0) .
4.2. Let T : P2(lR) P3(lR) be a linear transformation such that T(l) = 1, T(x) = x 2 and
T(x 2 ) = x 3 + x . Find T(ax 2 + bx + c) .
4.3. Find SoT and/or T 0 S whenever it is defined.
4.8. Exercises 153
(This means that, for a square matrix A considered as a linear transformation, the absolute
value of the determinant of A is the ratio between the volumes of a parallelepiped P(B)
and its image parallelepiped P(C) under the transformation by A.1f det A = 0, then the
image P(C) is a parallelepiped in a subspace of dimension less than n).
4.12 . Let T : ]R3 --+ ]R3 be the linear transformation given by
Let C denote the unit cube in ]R3 determined by the standard basis ej , e2, e3. Find the
volume of the image parallelepiped T (C) of C under T.
154 Chapter 4. Linear Transformations
4.13. With respect to the ordered basis a = {I, x, x 2 } for the vector space P2(lR), find the
coordinate vector of the following polynomials:
4.17. Let M = i l
(1) Find the unique linear transformation T : lR3 lR2 so that M is the associated
!Ul [il UJ I·
matrix of T with respect to the bases
"1
(2) Find T(x, y, z).
"2 ([ n[: ]J
4.18. Find the matrix representation of each of the following linear transformations T on P2 (lR)
with respect to the basis {I, x , x 2 }.
(1) T: p(x) p(x + 1).
(2) T: p(x) p' (x) .
(3) T: p(x) p(O)x .
(4) T : p(x) p(x) - p(O).
x
4.19. Consider the following ordered bases of lR3: a = {el , e2, ej] the standard basis and
fJ = {UI = (I, I , 1), U2 = (1, 1, 0), U3 = (1,0, O)}.
(1) Find the basis-change matrix P from a to fJ .
(2) Find the basis-change matrix Q from fJ to a.
(3) Verify that Q = P- l .
(4) Show that [vlfJ = P[vl er for any vector V E lR3.
(5) Show that [T]fJ = Q-I [Tl a Q for the linear transformation T defined by
T(x,y,z)=(2y+x, x-4y, 3x).
4.20 . There are no matrices A and B in Mnxn(lR) such that AB - BA = In.
4.8. Exercises 155
T (x , y, z) = (3x + 2y - 4z , x - 5y + 3z),
and let a ={(l , I , 1), (1,1, D), (I, D, D) } and d = {(l, 3), (2, 5)} bebasesfor]R3
and ]RZ, respect ively.
(l) Find the associated matrix for T .
(2) Verify = [T (v)]/l for any v E ]R3.
4.22 . Find the basis-change matrix from a to .8, when
(l) a = {(2, 3) , (D , I )}, .8 = {(6, 4) , (4,8)};
(2) a = {(5, 1), (I ,2) }, .8 = {(l , D), (D, I )};
(3) a= {(l, 1, 1), (l, I , D), (I, D, D)} , .8 = {(2, D, 3), (- 1, 4, I) , (3, 2, 5)};
(4) a={t , 1, t Z}, .8 = {3 +2t + t Z , t Z-4 , 2 +t}.
4.23. Show that all matri ces of the form Ae = [ BB sin BB ] are similar.
Sin - cos
4.26. For a linear transformation T on a vector space V, show that T is one-to -one if and only
if its tran spose T * is one-to-on e.
4.27. Let T : ]R3 -+ ]R3 be the linear transformation defined by
T (x, y , z)= (2y+ z , -x+4y+ z , x +z) .
Compute [Tl a and [T*la* for the standard basis a = [e j , eZ, ej} ,
4.28. Let T be the linea r transformation from ]R3 into ]Rz defined by
T (XI, X2 , X3) = (XI +x2, 2X3 - XI ) ·
(l) For the standard ordered bases a and .8 for ]R3 and ]Rz respectivel y, find the associated
matrix for T with respect to the bases a and .8.
(2) Let a = {XI, XZ , x3} and .8 = {YI , yz}, where XI = (l, D, -I), xz = (1, 1, 1),
X3 = (1 , D, D), and YI = (D, I), yz = (I , D). Find the associated matrices and
-I 3 4
Find the bases for the kernel and the image of the transpose T* on V* .
156 Chapter 4. Linear Transformations
Show that (II, 12, 13j is a basis for V* by finding its dual basis for V .
4.32. Determine whether or not the following statements are true in general, and justify your
answers .
(1) For a linear transformation T : jRn --+ jRm, Ker(T) = {OJ if m > n.
(2) For a linear transformation T : jRn --+ jRm, Ker(T) =1= {OJ if m < n .
(3) A linear transformation T : jRn --+ jRm is one-to-one if and only if the nullspace of
[T]g is {OJ for any basis ex for jRn and any basis f3 for jRm.
(4) For any linear transformation T on jRn, the dimension of the image of T is equal to
that of the row space of [T]a for any basis ex for jRn .
(5) For any two linear transformations T : V --+ W and S : W --+ Z, if Ker(S 0 T) = 0,
then Ker(T) = O.
(6) Any polynomial p(x) is linear if and only if the degree of p(x) is less than or equal
to 1.
(7) Let T : jR3 --+ jR2 be a function given as T(x) = (TI (x) , T2(X» for any x E jR3.
Then T is linear if and only if their coordinate functions Ti , i = 1, 2, are linear.
(8) For a linear transformation T : jRn --+ jRn , if [T]g = In for some bases ex and f3 of
R" , then T must be the identity transformation.
(9) If a linear transformation T : jRn --+ jRn is one-to-one , then any matrix representation
of T is nonsingular.
(10) Any m x n matrix A can be a matrix representation of a linear transformation T :
jRn --+ jRm.
where x T Y is the matrix product of x T and Y, which is also a number identified with
the 1 x 1 matrix x T y. Using the dot product, the length (or magnitude) of a vector
x =(XI , X2, X3) is defined by
x· Y = IIxllllYII cos s,
In particular, two vectors x and Y are orthogonal (i.e ., they form a right angle
o= tt /2) if and only if the Pythagorean theorem holds:
HF I u _i cr _j,*J gl c_p?jec` p_
» @gpi f _s qcp@mqrml 0. . 2
158 Chapter 5. Inner Product Spaces
By rewriting this formula in terms of the dot product, we obtain another equivalent
condition:
X. Y = XIYI + xzYz + X3Y3 = O.
In fact, this dot product is one of the most important structures with which JR3 is
equipped. Euclidean geometry begins with the vector space JR3 together with the dot
product, because the Euclidean distance can be defined by the dot product.
The dot product has a direct extension to the n-space JRn of any dimension n:
for any two vectors x = (XI , Xz , . . . , xn) and y = (YI , Yz, .. . , Yn) in JRn , their dot
product, also called the Euclidean inner product, and the length (or magnitude)
of a vector are defined similarly as
[x] = (x.x)! =
To extend this notion of the dot product to a (real) vector space, we extract the
most essential properties that the dot product in JRn satisfies and take these properties
as axioms for an inner product on a vector space V .
Definition 5.1 An inner product on a real vector space V is a function that associates
a real number (x, y) to each pair of vectors x and y in V in such a way that the following
rules are satisfied: For any vectors x, y and z in V and any scalar k in JR,
(1) (x, y) = (y , x) (symmetry),
(2) (x + y, z) = (x, z) + (y, z) (additivity),
(3) (kx , y) = k(x, y) (homogeneity),
(4) (x, x) ::: 0, and (x, x) = 0 ¢> x = 0 (positive definiteness) .
A pair (V, ( ,)) of a (real) vector space V and an inner product ( , ) is called a (real)
inner product space. In particular, the pair (JRn , .) is called the Euclidean n-space.
Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the
second variable: i.e.,
Example 5.1 (Non-Euclidean inner product on JRz) For any two vectors x = (XI, xz)
and y = (YI , yz) in JR z , define
5.1. Dot products and inner products 159
where a, band C are arbitrary real numbers. Then , this function { , } clearly satisfies
the first three rules of the inner product , i.e., {, } is symmetric and bilinear. Moreover,
° °
if a > and det A = ab - c2 >
Problem 5.1 In Example 5.1, the conver se is also true : Prove that if (x, y) = x T Ay is an inner
product in ]R2, then a > 0 and ab - c2 > O.
Example 5.2 (Case ofx f. 0 f. Y but {x, y} = 0) Let V = C [0, 1] be the vector
space of all real-valued continuous functions on [0, 1]. For any two functions f and
g in V , define
{f, g} = 1\ f (x) g(x)dx .
I - 2x if ° s x ::: i,
(° °s s i,
°
if x
f(x) =
(
. I
If 2 sxs I,
and g(x) =
2x - I if i s x s I.
Then f i= 0 i= g, but (f, g ) = O. o
By a subspace W of an inner product space V , we mean a subspace of the vector
space V together with the inner product that is the restriction of the inner product on
Vto W.
Example 5.3 (A subspace as an inner product space) The set W = D¹[0, 1] of all real-valued differentiable functions on [0, 1] is a subspace of V = C[0, 1]. The restriction to W of the inner product on V defined in Example 5.2 makes W an inner product subspace of V. However, one can define another inner product on W by the following formula: for any two functions f(x) and g(x) in W,
⟨⟨f, g⟩⟩ = ∫₀¹ f(x)g(x) dx + ∫₀¹ f′(x)g′(x) dx.
Then ⟨⟨ , ⟩⟩ is also an inner product on W, which is different from the restriction to W of the inner product of V, and hence W with this new inner product is not a subspace of the inner product space V. □
Remark: From vector calculus, most readers might already be familiar with the dot product (or the inner product) and the cross product (or the outer product) in the 3-space ℝ³. The concept of the dot product is extended to a higher dimensional Euclidean space ℝⁿ in this section. However, it is known in advanced mathematics that the cross product in the 3-space ℝ³ cannot be extended to a higher dimensional Euclidean space ℝⁿ. In fact, it is known that if there is a bilinear function f : ℝⁿ × ℝⁿ → ℝⁿ, f(x, y) = x × y, satisfying the properties that x × y is perpendicular to both x and y and that ||x × y||² = ||x||²||y||² − (x · y)², then n = 3 or 7. Hence, the cross product or the outer product will not be introduced in linear algebra.
For any x, y in an inner product space and any t ∈ ℝ, we have 0 ≤ ⟨tx + y, tx + y⟩ = ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩. This inequality implies that the polynomial ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩ in t has either no real roots or a repeated real root. Therefore, its discriminant must be nonpositive: ⟨x, y⟩² − ⟨x, x⟩⟨y, y⟩ ≤ 0, which is the Cauchy-Schwarz inequality |⟨x, y⟩| ≤ ||x|| ||y||.
The lengths of vectors and the angles between two vectors in an inner product space are defined in a similar way to the case of the Euclidean n-space.
(1) The length (or magnitude) ||x|| of a vector x is defined by ||x|| = ⟨x, x⟩^(1/2).
(2) The distance d(x, y) between two vectors x and y is defined by d(x, y) = ||x − y||.
(3) By the Cauchy-Schwarz inequality, for nonzero vectors x and y there is a unique number 0 ≤ θ ≤ π such that
cos θ = ⟨x, y⟩/(||x|| ||y||), or ⟨x, y⟩ = ||x|| ||y|| cos θ.
Such a number θ is called the angle between x and y.
For example, the dot product in the Euclidean 3-space ℝ³ defines the Euclidean distance in ℝ³. However, one can define infinitely many non-Euclidean distances in ℝ³, as shown in the following example.
Example 5.4 (1) Under a non-Euclidean inner product on ℝ² of the kind in Example 5.1, the angle θ between x = (1, 1) and y = (1, 0) can be computed as
cos θ = ⟨x, y⟩/(||x|| ||y||) = 2/√(5 · 2) = 0.6324….
Thus θ = cos⁻¹(2/√10). Notice that in the Euclidean 2-space ℝ² with the dot product, the angle between x = (1, 1) and y = (1, 0) is clearly π/4, and cos(π/4) = 1/√2 = 0.7071…. It shows that the angle between two vectors actually depends on the inner product on the vector space.
(2) For any diagonal matrix A = [d1 0 0; 0 d2 0; 0 0 d3] with all di > 0, the formula ⟨x, y⟩ = xᵀAy = d1x1y1 + d2x2y2 + d3x3y3
defines an inner product on ℝ³. Thus, there are infinitely many different inner products on ℝ³. Moreover, an inner product in the 3-space ℝ³ may play the roles of a ruler and a protractor in our physical world ℝ³. □
Problem 5.3 In Example 5.4(2), show that xᵀAy cannot be an inner product if A has a negative diagonal entry di < 0.
Problem 5.4 Prove the following properties of length in an inner product space V: for any vectors x, y ∈ V and any scalar k,
(1) ||x|| ≥ 0,
(2) ||x|| = 0 if and only if x = 0,
(3) ||kx|| = |k| ||x||,
(4) ||x + y|| ≤ ||x|| + ||y|| (triangle inequality).
Problem 5.5 Let V be an inner product space. Show that for any vectors x, y and z in V,
(1) d(x, y) ≥ 0,
(2) d(x, y) = 0 if and only if x = y,
(3) d(x, y) = d(y, x),
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
Corollary 5.3 Let V be an inner product space, and let α = {v1, …, vn} be a basis for V. Then, a vector x in V is orthogonal to every basis vector vi in α if and only if x = 0.
Proof: If ⟨x, vi⟩ = 0 for i = 1, 2, …, n, then ⟨x, y⟩ = Σᵢ₌₁ⁿ yi⟨x, vi⟩ = 0 for any y = Σᵢ₌₁ⁿ yivi ∈ V; in particular ⟨x, x⟩ = 0, so x = 0. The converse is clear. □
Example 5.5 (Pythagorean theorem) Let V be an inner product space, and let x and y be any two nonzero vectors in V with angle θ. Then ⟨x, y⟩ = ||x|| ||y|| cos θ gives the equality
||x + y||² = ||x||² + ||y||² + 2||x|| ||y|| cos θ.
Moreover, it deduces the Pythagorean theorem: ||x + y||² = ||x||² + ||y||² for any orthogonal vectors x and y. □
Theorem 5.4 If x1, x2, …, xk are nonzero mutually orthogonal vectors in an inner product space V (i.e., each vector is orthogonal to every other vector), then they are linearly independent.
Proof: Suppose c1x1 + c2x2 + ⋯ + ckxk = 0. Then for each i = 1, 2, …, k,
0 = ⟨0, xi⟩ = ⟨c1x1 + ⋯ + ckxk, xi⟩ = c1⟨x1, xi⟩ + ⋯ + ci⟨xi, xi⟩ + ⋯ + ck⟨xk, xi⟩ = ci||xi||²,
because x1, x2, …, xk are mutually orthogonal. Since each xi is not the zero vector, ||xi|| ≠ 0; so ci = 0 for i = 1, 2, …, k. □
Problem 5.6 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove that (∫₀¹ f(x)g(x) dx)² ≤ (∫₀¹ f(x)² dx)(∫₀¹ g(x)² dx).
Problem 5.7 Let V = C[0, 1] be the inner product space of all real-valued continuous functions on [0, 1] equipped with the inner product
⟨f, g⟩ = ∫₀¹ f(x)g(x) dx for any f and g in V.
For the following two functions f and g in V, compute the angle between them, for any natural numbers k, ℓ:
Let α = {v1, v2, …, vn} be a basis for an inner product space V, and write x = Σᵢ₌₁ⁿ xivi and y = Σⱼ₌₁ⁿ yjvj. Then, by bilinearity,
⟨x, y⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ xiyj⟨vi, vj⟩.
Hence, setting aij = ⟨vi, vj⟩ and A = [aij], we obtain
⟨x, y⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ xiyjaij = [x]αᵀ A [y]α.
The matrix A is called the matrix representation of the inner product ⟨ , ⟩ with respect to the basis α.
For the Euclidean n-space ℝⁿ with the dot product and the standard basis {e1, …, en}, we have ei · ej = δij, so the matrix representation is the identity. Thus, for x = Σᵢ xiei and y = Σⱼ yjej ∈ ℝⁿ, the dot product is just the matrix product xᵀy.
For example, consider the space V of polynomials of degree ≤ 2 on [0, 1] with the inner product
⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.
Then for the basis α = {f1(x) = 1, f2(x) = x, f3(x) = x²} for V, one can easily find its matrix representation A = [aij]: for instance, a12 = ⟨f1, f2⟩ = ∫₀¹ x dx = 1/2.
The expression of the dot product as a matrix product is very useful in stating or
proving theorems in the Euclidean space.
For any symmetric matrix A and for a fixed basis α, the formula ⟨x, y⟩ = [x]αᵀ A [y]α seems to give rise to an inner product on V. In fact, the formula clearly is symmetric and bilinear, but it does not necessarily satisfy the fourth rule, positive definiteness. The following theorem gives a necessary condition for a symmetric matrix A to give rise to an inner product. Some necessary and sufficient conditions will be discussed in Chapter 8.
Theorem 5.5 The matrix representation A of an inner product (with respect to any basis) on a vector space V is invertible. That is, det A ≠ 0.
Proof: Let ⟨x, y⟩ = [x]αᵀ A [y]α be an inner product on a vector space V with respect to a basis α, and consider the homogeneous system of linear equations A[y]α = 0. If A[y]α = 0, then
⟨y, y⟩ = [y]αᵀ A [y]α = 0,
so y = 0 by positive definiteness, and hence [y]α = 0. Thus the homogeneous system has only the trivial solution, which means that A is invertible. □
Recall that the conditions a > 0 and det A = ab − c² > 0 in Example 5.1 are sufficient for A to give rise to an inner product on ℝ².
rectangular coordinate system for ℝⁿ. In an inner product space, a vector with length 1 is called a unit vector. If x is a nonzero vector in an inner product space V, the vector x/||x|| is a unit vector. The process of obtaining a unit vector from a nonzero vector by multiplying by the reciprocal of its length is called normalization. Thus, if there is a set of mutually orthogonal vectors (or a basis) in an inner product space, then the vectors can be converted to unit vectors by normalizing them without losing their mutual orthogonality.
Definition 5.4 A set of vectors x1, x2, …, xk in an inner product space V is said to be orthonormal if
⟨xi, xj⟩ = 0 for i ≠ j (orthogonality),
||xi|| = 1 for i = 1, …, k (normality).
It will be shown later in Theorem 5.6 that every inner product space has an orthonormal basis, just like the standard basis for the Euclidean n-space ℝⁿ. The following example illustrates how to construct such an orthonormal basis.
Solution: Let c1, c2 and c3 be the column vectors of A in the order from left to right. It is easily verified that they are linearly independent, so they form a basis for the column space C(A) of dimension 3 in ℝ⁴. For notational convention, we denote by Span{x1, …, xk} the subspace spanned by {x1, …, xk}.
(1) First normalize c1 to get
u1 = c1/||c1|| = (1/2, 1/2, 1/2, 1/2),
which is a unit vector. Then Span{u1} = Span{c1}, because one is a scalar multiple of the other.
(2) Noting that the vector c2 − ⟨u1, c2⟩u1 = c2 − 2u1 = (0, 1, −1, 0) is a nonzero vector orthogonal to u1, we set
u2 = (0, 1, −1, 0)/||(0, 1, −1, 0)|| = (0, 1/√2, −1/√2, 0).
Then {u1, u2} is orthonormal and Span{u1, u2} = Span{c1, c2}, because each ui is a linear combination of c1 and c2, and the converse is also true.
(3) Finally, note that c3 − ⟨u1, c3⟩u1 − ⟨u2, c3⟩u2 = c3 − 4u1 + √2 u2 = (0, 1, 1, −2) is also a nonzero vector orthogonal to both u1 and u2. In fact,
u3 = (0, 1, 1, −2)/||(0, 1, 1, −2)|| = (0, 1/√6, 1/√6, −2/√6)
is a unit vector, and one can also show that Span{u1, u2, u3} = Span{c1, c2, c3} = C(A). Consequently, {u1, u2, u3} is an orthonormal basis for C(A). □
The orthonormalization process in Example 5.7 indicates how to prove the following general case, called the Gram-Schmidt orthogonalization. Starting from a basis {x1, x2, …, xn}, set
u1 = x1/||x1||.
Of course, x2 − ⟨x2, u1⟩u1 ≠ 0, because {x1, x2} is linearly independent. Generally, one can define by induction on k = 1, 2, …, n,
uk = (xk − Σᵢ₌₁^(k−1) ⟨xk, ui⟩ui) / ||xk − Σᵢ₌₁^(k−1) ⟨xk, ui⟩ui||.
Then, as Example 5.7 shows, the vectors UI, Uz, ... , Un are orthonormal in the
n-dimensional vector space V . Since every orthonormal set is linearly independent,
it is an orthonormal basis for V . 0
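The process in the proof translates directly into an algorithm. The following sketch (Python with numpy; note that the matrix A of Example 5.7 is not shown in this copy, so the columns below are inferred from the intermediate vectors appearing in that example) orthonormalizes a list of vectors with respect to the dot product:

    import numpy as np

    def gram_schmidt(vectors, eps=1e-12):
        # Orthonormalize linearly independent vectors (dot product).
        basis = []
        for v in vectors:
            w = v - sum(np.dot(u, v) * u for u in basis)   # remove components along basis
            norm = np.linalg.norm(w)
            if norm > eps:                                 # guard against dependence
                basis.append(w / norm)
        return basis

    c1 = np.array([1.0, 1, 1, 1])      # columns consistent with Example 5.7 (inferred)
    c2 = np.array([1.0, 2, 0, 1])
    c3 = np.array([2.0, 2, 4, 0])
    u1, u2, u3 = gram_schmidt([c1, c2, c3])
    print(u1)                          # [0.5 0.5 0.5 0.5], as in step (1)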
Problem 5.9 Use the Gram-Schmidt orthogonalization on the Euclidean space ℝ⁴ to transform the basis
{(0, 1, 1, 0), (−1, 1, 0, 0), (1, 2, 0, −1), (−1, 0, 0, −1)}
into an orthonormal basis.
Problem 5.10 Find an orthonormal basis for the subspace W of the Euclidean space ℝ³ given by x + 2y − z = 0.
The next theorem shows that an orthonormal basis acts just like the standard basis for the Euclidean n-space ℝⁿ.
Theorem 5.7 Let {u1, u2, …, uk} be an orthonormal basis for a subspace U of an inner product space V. Then, for any vector x in U,
x = ⟨x, u1⟩u1 + ⟨x, u2⟩u2 + ⋯ + ⟨x, uk⟩uk.
Proof: For any vector x ∈ U, one can write x = x1u1 + x2u2 + ⋯ + xkuk as a linear combination of the basis vectors. However, for each i = 1, …, k,
⟨x, ui⟩ = x1⟨u1, ui⟩ + ⋯ + xk⟨uk, ui⟩ = xi,
since the ui's are orthonormal. □
Moreover, one can identify an n-dimensional inner product space V with the Euclidean n-space ℝⁿ. Let α = {v1, v2, …, vn} be an orthonormal basis for the space V. With this orthonormal basis α, the natural isomorphism Φ : V → ℝⁿ given by Φ(vi) = [vi]α = ei, i = 1, 2, …, n, preserves the inner product on vectors: for a vector x = Σᵢ₌₁ⁿ xivi in V with xi = ⟨x, vi⟩, the coordinate vector of x with respect to α is the column matrix [x]α = [x1; x2; ⋮; xn].
Then ⟨x, y⟩ = Σᵢ₌₁ⁿ xiyi. The right-hand side of this equation is just the dot product of vectors in the Euclidean space ℝⁿ. That is,
⟨x, y⟩ = [x]α · [y]α = Φ(x) · Φ(y)
for any x, y ∈ V. Hence, the natural isomorphism Φ preserves the inner product, and we have the following theorem (compare with Corollary 4.8(1)).
Theorem 5.8 Any n-dimensional inner product space V with an inner product ⟨ , ⟩ is isomorphic to the Euclidean n-space ℝⁿ with the dot product, via an isomorphism that preserves the inner product.
In this sense, one may restrict the study of inner product spaces to the case of the Euclidean n-space ℝⁿ with the dot product.
A special kind of linear transformation that preserves the inner product, such as the natural isomorphism from V to ℝⁿ, plays an important role in linear algebra; it will be studied in detail in Section 5.8.
5.5 Projections
Let U be a subspace of a vector space V. Then, by Corollary 3.13 there is another subspace W of V such that V = U ⊕ W, so that any x ∈ V has a unique expression as x = u + w for u ∈ U and w ∈ W. As an easy exercise, one can show that the function T : V → V defined by T(x) = T(u + w) = u is a linear transformation, whose image Im(T) = T(V) is the subspace U and whose kernel Ker(T) is the subspace W.
Definition 5.5 Let U and W be subspaces of a vector space V. A linear transformation T : V → V is called the projection of V onto the subspace U along W if V = U ⊕ W and T(x) = u for x = u + w ∈ U ⊕ W.
Since the pairs {e1, e2} and {e1, v} are linearly independent, the space ℝ² can be expressed as a direct sum in two ways: ℝ² = X ⊕ Y = X ⊕ Z.
[Figure: the projections T_X (onto X along Y) and S_X (onto X along Z) send the same vector x to different points of X; for instance, T_X(x) = (2, 0).]
This shows that a projection of ℝ² onto the subspace X depends on the choice of a complementary subspace of X. For example, by choosing Z_n = {r(n, 1) : r ∈ ℝ} for any integer n as a complementary subspace, one can construct infinitely many different projections of ℝ² onto the x-axis. □
Problem 5.12 For V = U ⊕ W, let T_U denote the projection of V onto U along W, and let T_W denote the projection of V onto W along U. Prove the following.
One can easily show that U⊥ is a subspace of V, and v ∈ U⊥ if and only if ⟨v, u⟩ = 0 for every u ∈ β, where β is a basis for U. Moreover, W ⊥ U if and only if W ⊆ U⊥.
Problem 5.13 Let U and W be subspaces of an inner product space V. Show that
Proof: Let dim U = k. To show (U⊥)⊥ = U, take an orthonormal basis for U, say α = {v1, v2, …, vk}, by the Gram-Schmidt orthogonalization, and then extend it to an orthonormal basis for V, say β = {v1, v2, …, vk, v_{k+1}, …, vn}, which is always possible. Then, clearly γ = {v_{k+1}, …, vn} forms an (orthonormal) basis for U⊥, which means that (U⊥)⊥ = U and V = U ⊕ U⊥. □
Definition 5.7 Let U be a subspace of an inner product space V, and let {u1, u2, …, um} be an orthonormal basis for U. The orthogonal projection Proj_U from V onto the subspace U is defined by
Proj_U(x) = ⟨u1, x⟩u1 + ⟨u2, x⟩u2 + ⋯ + ⟨um, x⟩um
for any x ∈ V.
Clearly, Proj_U is linear and a projection, because Proj_U ∘ Proj_U = Proj_U. Moreover, Proj_U(x) ∈ U and x − Proj_U(x) ∈ U⊥, because ⟨x − Proj_U(x), ui⟩ = ⟨x, ui⟩ − ⟨x, ui⟩ = 0 for each i = 1, 2, …, m.
Corollary 5.12 The orthogonal projection Proj_U is the projection of V onto a subspace U along its orthogonal complement U⊥.
Example 5.9 (The orthogonal projection from ℝ³ onto the xy-plane) In the Euclidean 3-space ℝ³, let U be the xy-plane with the orthonormal basis α = {e1, e2}. Then the orthogonal projection Proj_U(x) = ⟨e1, x⟩e1 + ⟨e2, x⟩e2 is the orthogonal projection onto the xy-plane in the usual sense in geometry, and x − Proj_U(x) ∈ U⊥, which is the z-axis. It actually means that Proj_U(x1, x2, x3) = (x1, x2, 0) for any x = (x1, x2, x3) ∈ ℝ³. □
Example 5.10 (The orthogonal projection from ℝ² onto the x-axis) As in Example 5.8, let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ² spanned by the vectors e1, e2, and v = e1 + e2 = (1, 1), respectively. Then clearly Y = X⊥ and Z ≠ X⊥. And, for the projections T_X and S_X of ℝ² given in Example 5.8, T_X is the orthogonal projection, but S_X is not, so that T_X = Proj_X and S_X ≠ Proj_X. □
Theorem 5.13 Let U be a subspace of an inner product space V, and let x ∈ V. Then, the orthogonal projection Proj_U(x) of x satisfies
||x − Proj_U(x)|| ≤ ||x − y|| for all y ∈ U.
Proof: First, note that for any vector x ∈ V, we have Proj_U(x) ∈ U and x − Proj_U(x) ∈ U⊥. Thus, for all y ∈ U,
||x − y||² = ||x − Proj_U(x)||² + ||Proj_U(x) − y||² ≥ ||x − Proj_U(x)||²,
where the second equality comes from the Pythagorean theorem for the orthogonality (x − Proj_U(x)) ⊥ (Proj_U(x) − y). (See Figure 5.2.) □
It follows from Theorem 5.13 that the orthogonal projection Proj_U(x) of x is the unique vector in U that is closest to x, in the sense that it minimizes the distance from x to the vectors in U. It also shows that in Definition 5.7, the vector Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. Geometrically, Figure 5.2 depicts the vector Proj_U(x).
Problem 5.14 Find the point on the plane x − y − z = 0 that is closest to p = (1, 2, 0).
Problem 5.15 Let U ⊆ ℝ⁴ be the subspace of the Euclidean 4-space ℝ⁴ spanned by (1, 1, 0, 0) and (1, 0, 1, 0), and let W ⊆ ℝ⁴ be the subspace spanned by (0, 1, 0, 1) and (0, 0, 1, 1). Find a basis for, and the dimension of, each of the following subspaces:
(1) U + W, (2) U⊥, (3) U⊥ + W⊥, (4) U ∩ W.
[Figure 5.2: the orthogonal projection Proj_U(x) of x onto U; x − Proj_U(x) lies in U⊥, and for any y ∈ U the vector x − y is at least as long as x − Proj_U(x).]
Problem 5.16 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)⊥ = U⊥ ∩ W⊥, (2) (U ∩ W)⊥ = U⊥ + W⊥.
As a particular case, let V = ℝⁿ be the Euclidean n-space with the dot product, and let U = {ru : r ∈ ℝ} be a 1-dimensional subspace determined by a unit vector u. Then for a vector x in ℝⁿ, the orthogonal projection of x into U is
Proj_U(x) = (u · x)u = (uuᵀ)x.
Example 5.11 (Distance from a point to a line) Let ax + by + c = 0 be a line L in the plane ℝ². (Note that the line L cannot be a subspace of ℝ² if c ≠ 0.) For any two points Q = (x1, y1) and R = (x2, y2) on the line, the equality a(x2 − x1) + b(y2 − y1) = 0 shows that the nonzero vector n = (a, b) is perpendicular to the line L, that is, QR ⊥ n.
Let P = (x0, y0) be any point in the plane ℝ². Then the distance d between the point P and the line L is simply the length of the orthogonal projection of QP onto n, for any point Q = (x1, y1) on the line. Thus,
d = ||Proj_n(QP)|| = |QP · n| / ||n|| = |a(x0 − x1) + b(y0 − y1)| / √(a² + b²) = |ax0 + by0 + c| / √(a² + b²).
Note that the last equality is due to the fact that the point Q is on the line (i.e., ax1 + by1 + c = 0).
To find the orthogonal projection matrix, let u = n/||n|| = (1/√(a² + b²))(a, b). Then the orthogonal projection matrix onto U = {ru : r ∈ ℝ} is
uuᵀ = (1/(a² + b²)) [a; b][a b] = (1/(a² + b²)) [a² ab; ab b²].
Thus, if x = (1, 1) ∈ ℝ², then
Proj_U(x) = (uuᵀ)x = (1/(a² + b²)) [a² ab; ab b²][1; 1] = (1/(a² + b²)) (a² + ab, ab + b²). □
Problem 5.17 Let V = P3(ℝ) be the vector space of polynomials of degree ≤ 3 equipped with the inner product
⟨f, g⟩ = ∫₀¹ f(x)g(x) dx for any f and g in V.
Let W be the subspace of V spanned by {1, x}, and define f(x) = x². Find the orthogonal projection Proj_W(f) of f onto W.
Lemma 5.14 For an m × n matrix A, the null space N(A) and the row space R(A) are orthogonal, i.e., N(A) ⊥ R(A) in ℝⁿ. Similarly, N(Aᵀ) ⊥ C(A) in ℝᵐ.
Proof: Note that w ∈ N(A) if and only if Aw = 0, i.e., for every row vector r in A, r · w = 0. For the second statement, do the same with Aᵀ. □
We show that the row space R(A) is the orthogonal complement of the null space N(A) in ℝⁿ, and vice versa. Similarly, the same thing happens for the column space C(A) and the null space N(Aᵀ) of Aᵀ in ℝᵐ. Hence, by Theorem 5.11, we have the following orthogonal decomposition.
[Figure: under multiplication by A, ℝⁿ decomposes as R(A) ⊕ N(A) with N(A) ≅ ℝ^(n−r), and ℝᵐ as C(A) ⊕ N(Aᵀ); A carries R(A) onto C(A) and sends N(A) to 0.]
In particular, if rank A = m (so that m ≤ n), then C(A) = ℝᵐ. Thus, for any b ∈ ℝᵐ, the system Ax = b has a solution in ℝⁿ. (This is the case of the existence Theorem 3.24.)
On the other hand, if rank A = n (so that n ≤ m), then N(A) = {0} and R(A) = ℝⁿ. Therefore, the system Ax = b has at most one solution; that is, it has a unique solution x in R(A) if b ∈ C(A), and no solution if b ∉ C(A). (This is the case of the uniqueness Theorem 3.25.) The latter case may occur when m > r = rank A; that is, N(Aᵀ) is a nontrivial subspace of ℝᵐ. This will be discussed later in Section 5.9.1.
Problem 5.19 Given two vectors (1, 2, 1, 2) and (0, −1, −1, 1) in ℝ⁴, find all vectors in ℝ⁴ that are perpendicular to them.
Problem 5.20 Find a basis for the orthogonal complement of the row space of A:
(1) A = [1 2 8; 2 3 0; −1 6 1], (2) A = [0 0 1; 0 0 1; 1 1 1].
AᵀA = [c1ᵀ; c2ᵀ; ⋮; cnᵀ][c1 c2 ⋯ cn], whose (i, j)-entry is cᵢᵀcⱼ = cᵢ · cⱼ.
Hence, if the column vectors are orthonormal, i.e., cᵢ · cⱼ = δij, then AᵀA = In; that is, Aᵀ is a left inverse of A, and vice versa. Since A is a square matrix, this left inverse must also be the right inverse of A, i.e., AAᵀ = In. Equivalently, the row vectors of A are also orthonormal. This argument can be summarized as follows.
A = [cos θ −sin θ; sin θ cos θ],  B = [cos θ sin θ; sin θ −cos θ].
Note that the linear transformation T : ℝ² → ℝ² defined by T(x) = Ax is a rotation through the angle θ, while S : ℝ² → ℝ² defined by S(x) = Bx is the reflection about the line passing through the origin that forms an angle θ/2 with the positive x-axis. □
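A quick numerical confirmation (Python with numpy; θ = 0.7 is an arbitrary sample angle) that both matrices are orthogonal, with determinants +1 for the rotation and −1 for the reflection:

    import numpy as np

    t = 0.7                                        # sample angle theta
    A = np.array([[np.cos(t), -np.sin(t)], [np.sin(t),  np.cos(t)]])   # rotation
    B = np.array([[np.cos(t),  np.sin(t)], [np.sin(t), -np.cos(t)]])   # reflection
    for M in (A, B):
        assert np.allclose(M.T @ M, np.eye(2))     # orthonormal columns
    print(np.linalg.det(A), np.linalg.det(B))      # 1.0 and -1.0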
(1) [1 0 0; 0 cos θ sin θ; 0 −sin θ cos θ],  (2) [1/√2 −1/√2 0; …].
Problem 5.22 Find eight 2 × 2 orthogonal matrices which transform the square −1 ≤ x, y ≤ 1 onto itself.
As shown in Examples 5.12 and 5.13, all rotations and reflections on the Euclidean 2-space ℝ² are orthogonal, and intuitively they preserve both the lengths of vectors and the angle between two vectors. In fact, every orthogonal matrix A preserves the lengths of vectors:
||Ax||² = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀx = ||x||² for every x ∈ ℝⁿ.
Definition 5.10 Let V and W be two inner product spaces. A linear transformation T : V → W is called an isometry, or an orthogonal transformation, if it preserves the lengths of vectors, that is, ||T(x)|| = ||x|| for every vector x ∈ V.
Proof: The necessity is clear. For the sufficiency, suppose that A preserves the dot product. Then for any vectors x, y ∈ ℝⁿ,
Recall that if θ is the angle between two nonzero vectors x and y in an inner product space V, then for any isometry T : V → V,
Proof: Note that the k-th column vector of the matrix is just [T(vk)]β. Since T preserves inner products and α, β are orthonormal, we get
Problem 5.23 Find values r > 0, s > 0, a > 0, b and c such that the matrix Q is orthogonal:
(1) Q = [0 2s a; r s b; r −s c],  (2) Q = [… −s a; … 3s b; … −s c].
Problem 5.24 (Bessel's inequality) Let V be an inner product space, and let {v1, …, vm} be a set of orthonormal vectors in V (not necessarily a basis for V). Prove that for any x in V,
||x||² ≥ Σᵢ₌₁ᵐ |⟨x, vi⟩|².
Problem 5.25 Determine whether the following linear transformations on the Euclidean space ℝ³ are orthogonal.
5.9 Applications
In the previous section, we have completely determined the solution set for a system Ax = b when b ∈ C(A). In this section, we discuss what we can do when the system Ax = b is inconsistent, that is, when b ∉ C(A) ⊆ ℝᵐ. Certainly, there exists no solution in this case, but one can find a 'pseudo'-solution in the following sense.
Note that for any vector x in ℝⁿ, Ax ∈ C(A). Hence, the best we can do is to find a vector x0 ∈ ℝⁿ so that Ax0 is the closest to the given vector b ∈ ℝᵐ, i.e., ||Ax0 − b|| is as small as possible. Such a vector x0 will give us the best approximation Ax0 to b among Ax for all vectors x in ℝⁿ, and it is called a least squares solution of Ax = b.
To find a least squares solution, we first need to find the vector in C(A) that is closest to b. From the orthogonal decomposition ℝᵐ = C(A) ⊕ N(Aᵀ), any b ∈ ℝᵐ has the unique orthogonal decomposition b = bc + bn with bc ∈ C(A) and bn ∈ N(Aᵀ), and bc is the vector in C(A) closest to b. Hence a vector x0 is a least squares solution precisely when Ax0 = bc, and then the equality Ax0 − b = −bn holds since Ax0 = bc. Thus Aᵀ(Ax0 − b) = Aᵀ(−bn) = 0, or equivalently AᵀAx0 = Aᵀb; that is, x0 is a solution of the equation
AᵀAx = Aᵀb.
This equation is very interesting because it is also a sufficient condition for a least squares solution, as the next theorem shows, and it is called the normal equation of Ax = b.
Theorem 5.24 Let A be an m × n matrix, and let b ∈ ℝᵐ be any vector. Then, a vector x0 ∈ ℝⁿ is a least squares solution of Ax = b if and only if x0 is a solution of the normal equation AᵀAx = Aᵀb.
Proof: We only need to show the sufficiency. Let x0 be a solution of the normal equation AᵀAx = Aᵀb. Then Aᵀ(Ax0 − b) = 0, so Ax0 − b ∈ N(Aᵀ). Say Ax0 − b = n ∈ N(Aᵀ), and let b = bc + bn ∈ C(A) ⊕ N(Aᵀ). Then Ax0 − bc = n + bn ∈ N(Aᵀ). Since Ax0 − bc is also contained in C(A) and N(Aᵀ) ∩ C(A) = {0}, Ax0 = bc = Proj_{C(A)}(b), i.e., x0 is a least squares solution of Ax = b. □
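A minimal computational sketch of Theorem 5.24 (Python with numpy; A and b here are sample data, not from the text): solving the normal equation agrees with the library's least squares routine.

    import numpy as np

    A = np.array([[1.0, 0], [0, 1], [1, 1]])
    b = np.array([1.0, 1, 0])

    x0 = np.linalg.solve(A.T @ A, A.T @ b)        # normal equation; A^T A invertible here
    x1, *_ = np.linalg.lstsq(A, b, rcond=None)    # built-in least squares
    print(x0, x1)                                 # both give [1/3, 1/3]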
Example 5.14 (The best approximated solution of an inconsistent system Ax = b) Find all the least squares solutions of Ax = b, and then determine the orthogonal projection bc of b into the column space C(A), where
A = [1 −2 1; 2 −3 −1; −1 1 2; 3 −5 0].
Solution: One computes
AᵀA = [15 −24 −3; −24 39 3; −3 3 6] and Aᵀb = [0; −1; 3],
so the normal equation AᵀAx = Aᵀb reads
[15 −24 −3; −24 39 3; −3 3 6][x1; x2; x3] = [0; −1; 3].
By solving this system of equations (left for an exercise), one can obtain all the least squares solutions, which are of the form
x = [−8/3; −5/3; 0] + t[5; 3; 1], t ∈ ℝ.
Note that the set of least squares solutions is x0 + N(A), where x0 = [−8/3 −5/3 0]ᵀ and N(A) = {t[5 3 1]ᵀ : t ∈ ℝ}. □
Problem 5.26 Find all the least squares solutions of the system Ax = b, where
A = [1 0 2; 0 2 2; −1 −1 2; 1 −1 0],  b = [3; −3; −3; 0].
Note that the normal equation is always consistent by construction, and, as Example 5.14 shows, a least squares solution can be found by Gauss-Jordan elimination even when AᵀA is not invertible. If b ∈ C(A) or, even better, if the rows of A are linearly independent (thus rank A = m and C(A) = ℝᵐ), then bc = b, so that the system Ax = b is always consistent and the least squares solutions coincide with the true solutions. Therefore, for any given system Ax = b, consistent or inconsistent, by solving the normal equation AᵀAx = Aᵀb, one can obtain either the true solutions or the least squares solutions.
If the square matrix AᵀA is invertible, then the normal equation AᵀAx = Aᵀb of the system Ax = b resolves to x = (AᵀA)⁻¹Aᵀb, which is a least squares solution. In particular, if AᵀA = In, or equivalently the columns of A are orthonormal (see Lemma 5.18), then the normal equation reduces to the least squares solution x = Aᵀb. The following theorem gives a condition for AᵀA to be invertible.
Theorem 5.25 For any m × n matrix A, AᵀA is a symmetric n × n square matrix and rank(AᵀA) = rank A.
Remark: (1) For an m × n matrix A, by applying Theorem 5.26 to Aᵀ, one can say that rank A = m if and only if AAᵀ is invertible. In this case Aᵀ(AAᵀ)⁻¹ is a right inverse of A (cf. the remark after Theorem 3.25). Moreover, AAᵀ is invertible if and only if the rows of A are linearly independent, by Theorem 5.25.
In fact, this result coincides with Theorem 5.26: if A is orthogonal, so that AᵀA = In, then
x = Aᵀb = [u1ᵀ; ⋮; unᵀ] b = [u1 · b; ⋮; un · b],
where u1, …, un are the (orthonormal) columns of A.
Solution: Clearly, the two columns of A are linearly independent and C(A) is the xy-plane; thus b ∉ C(A). Hence
x̄ = [14/3; −1/3]
is a least squares solution, which is unique since N(A) = {0}. The orthogonal projection of b in C(A) is bc = Ax̄. □
Problem 5.27 Find all the least squares solutions of the following inconsistent system of linear
equations:
In this section, one can find a reason for the name of the "least squares" solutions, and the following example illustrates an application of the least squares solution to the determination of spring constants in physics.
Example 5.16 Hooke's law for springs in physics says that for a uniform spring, the
length stretched or compressed is a linear function of the force applied, that is, the
force F applied to the spring is related to the length x stretched or compressed by the
equation
F = a + kx,
where a and k are some constants determined by the spring.
Suppose now that, given a spring of length 6.1 inches, we want to determine the constants a and k from the following experimental data: the lengths are measured to be 7.6, 8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to the spring. However, by plotting the data points (6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6) in the xF-plane, one can easily recognize that they are not on a straight line of the form F = a + kx, which may be caused by experimental errors. This means that the system of linear equations
F1 = a + 6.1k = 0
F2 = a + 7.6k = 2
F3 = a + 8.7k = 4
F4 = a + 10.4k = 6
is inconsistent (i.e., it has no solutions, so the second equality in each equation may not be a true equality). It means that if we put b = (0, 2, 4, 6) and F = (F1, F2, F3, F4) as vectors in ℝ⁴ representing the data and the points on the line at the xi's, respectively, then ||b − F|| is not zero. Thus, the best thing one can do is to determine the straight line a + kx = F that 'fits' the data best: that is, to minimize the sum of the squares of the vertical distances from the line to the data (xi, yi) for i = 1, 2, 3, 4 (see Figure 5.5) (this is the reason why we say least squares):
(0 − F1)² + (2 − F2)² + (4 − F3)² + (6 − F4)² = ||b − F||².
Since
Ax = [1 6.1; 1 7.6; 1 8.7; 1 10.4][a; k] = F and b = [0; 2; 4; 6],
we are looking for F ∈ C(A), which is the projection of b onto the column space C(A) of A, and the least squares solution x0, which satisfies Ax0 = F.
[Figure 5.5: the data points and the least squares line F = a + kx in the xF-plane.]
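Under the setup above, the constants a and k are obtained by solving the normal equation for the 4 × 2 matrix A. A short sketch (Python with numpy; only the data of Example 5.16 are used):

    import numpy as np

    x = np.array([6.1, 7.6, 8.7, 10.4])        # measured lengths (inches)
    F = np.array([0.0, 2.0, 4.0, 6.0])         # applied forces (kilograms)

    A = np.column_stack([np.ones_like(x), x])  # columns 1 and x, as in the text
    a, k = np.linalg.solve(A.T @ A, A.T @ F)   # least squares via A^T A x = A^T F
    print(a, k)                                # fitted constants of F = a + k x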
More generally, suppose we are given n data points (x1, y1), (x2, y2), …, (xn, yn) from an experiment and look for a polynomial f(x) = a0 + a1x + a2x² + ⋯ + akx^k that fits them:
f(x1) = a0 + a1x1 + a2x1² + ⋯ + akx1^k = y1
f(x2) = a0 + a1x2 + a2x2² + ⋯ + akx2^k = y2
  ⋮
f(xn) = a0 + a1xn + a2xn² + ⋯ + akxn^k = yn.
In matrix notation this is Ax = b, with
A = [1 x1 x1² ⋯ x1^k; 1 x2 x2² ⋯ x2^k; ⋮; 1 xn xn² ⋯ xn^k], x = [a0; a1; ⋮; ak], b = [y1; y2; ⋮; yn].
The left-hand side Ax represents the values of the polynomial at the xi's, and the right-hand side represents the data obtained from the inputs xi's in the experiment.
If n ≤ k + 1, these cases have already been discussed in Section 3.9.1. If n > k + 1, this kind of system may be inconsistent. Therefore, the best thing one can do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. This is equivalent to finding the least squares solution of the system Ax = b, because for any c ∈ C(A) of the form
c = A[a0; a1; ⋮; ak] = [a0 + a1x1 + ⋯ + akx1^k; a0 + a1x2 + ⋯ + akx2^k; ⋮; a0 + a1xn + ⋯ + akxn^k],
we have
||b − c||² = Σᵢ₌₁ⁿ (yi − (a0 + a1xi + ⋯ + akxi^k))²,
which is precisely such a sum of squares.
The previous theory says that the orthogonal projection bc of b into the column space of A minimizes this quantity, and shows how to find bc and a least squares solution x0.
Example 5.17 Find a straight line y = a + bx that fits best the given experimental
data, (1,0), (2,3), (3,4) and (4,4).
Solution: We are looking for a line y = a + bx that minimizes the sum of squares of the vertical distances |yi − a − bxi| from the line y = a + bx to the data (xi, yi). By adopting the matrix notation
A = [1 1; 1 2; 1 3; 1 4], x = [a; b], b = [0; 3; 4; 4],
we have Ax = b and want to find a least squares solution of Ax = b. But the columns of A are linearly independent, and the least squares solution is x = (AᵀA)⁻¹Aᵀb.
Now,
AᵀA = [4 10; 10 30], (AᵀA)⁻¹ = (1/20)[30 −10; −10 4] = [3/2 −1/2; −1/2 1/5], Aᵀb = [11; 34].
Hence, we have x = (AᵀA)⁻¹Aᵀb = [−1/2; 13/10]; that is, the line y = −1/2 + (13/10)x fits the given data best. □
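The result can be confirmed with a one-line library fit (Python with numpy; np.polyfit returns the coefficients from the highest degree down):

    import numpy as np

    xs = np.array([1.0, 2, 3, 4])
    ys = np.array([0.0, 3, 4, 4])
    slope, intercept = np.polyfit(xs, ys, 1)   # degree-1 least squares fit
    print(intercept, slope)                    # -0.5 and 1.3, i.e., y = -1/2 + (13/10)x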
Problem 5.28 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation
s(t) = s0 + v0t + (1/2)gt²,
where s(t) is the distance that the body has travelled in time t, s0 and v0 are the initial displacement and velocity, respectively, of the body, and g is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be s = −0.18, 0.31, 1.03, 2.48, 3.73 feet at times t = 0.1, 0.2, 0.3, 0.4, 0.5 seconds, respectively. Determine approximate values of s0, v0, g using these data.
In Section 5.9.1, we have seen that the orthogonal projection Proj_{C(A)} of the Euclidean space ℝᵐ on the column space C(A) of an m × n matrix A plays an important role in finding a least squares solution of an inconsistent system Ax = b. Also, the orthogonal projection Proj_{C(A)} is the main tool in the Gram-Schmidt orthogonalization.
In general, for a given subspace U of ℝᵐ, the computation of the orthogonal projection Proj_U of ℝᵐ onto U appears quite often in applied science and engineering problems. The least squares solution method can also be used to find the orthogonal projection: indeed, by taking first a basis for U and then forming an m × n matrix A with these basis vectors as columns, one clearly gets U = C(A), and so by Theorem 5.26,
Proj_U = Proj_{C(A)} = A(AᵀA)⁻¹Aᵀ.
Note that this projection matrix Proju is independent of the choice of a basis for U
due to the uniqueness of the matrix representation of a linear transformation with
respect to a fixed basis.
Example 5.18 Find the projection matrix P of ℝ³ onto U = C(A) for A = [1 0; 0 1; 3 2].
Solution: Here AᵀA = [10 6; 6 5], so
(AᵀA)⁻¹ = [10 6; 6 5]⁻¹ = (1/14)[5 −6; −6 10],
and
P = A(AᵀA)⁻¹Aᵀ = (1/14)[1 0; 0 1; 3 2][5 −6; −6 10][1 0 3; 0 1 2] = (1/14)[5 −6 3; −6 10 2; 3 2 13],
from which Pb is obtained immediately for any b ∈ ℝ³. □
Example 5.19 If A = [c1 c2], where c1 = (1, 0, 0), c2 = (0, 1, 0), then the column vectors of A are orthonormal, C(A) is the xy-plane, and the projection of b = (x, y, z) ∈ ℝ³ onto C(A) is bc = (x, y, 0). In fact, P = A(AᵀA)⁻¹Aᵀ = AAᵀ = [1 0 0; 0 1 0; 0 0 0].
Note that, if we denote by Proj_{ui} the orthogonal projection of ℝᵐ on the subspace spanned by the basis vector ui for each i, then its projection matrix is uiuiᵀ, so that Proj_U = u1u1ᵀ + u2u2ᵀ + ⋯ + umumᵀ, and
Proj_{ui} ∘ Proj_{uj} = 0 if i ≠ j, and Proj_{ui} ∘ Proj_{uj} = Proj_{ui} if i = j.
Problem 5.29 Let u = (1/√2, 1/√2) be a vector in ℝ², which determines the 1-dimensional subspace U = {au = (a/√2, a/√2) : a ∈ ℝ}. Show that the matrix uuᵀ = [1/2 1/2; 1/2 1/2] is the projection matrix of ℝ² onto U.
Problem 5.30 Show that if {v1, v2, …, vm} is an orthonormal basis for ℝᵐ, then v1v1ᵀ + v2v2ᵀ + ⋯ + vmvmᵀ = Im.
In general, if a basis {c1, c2, …, cn} for U is given, but not orthonormal, then one has to directly compute
Proj_U = A(AᵀA)⁻¹Aᵀ,
where A = [c1 c2 ⋯ cn] is an m × n matrix whose columns are the given basis vectors ci's.
Sometimes it is necessary to compute an orthonormal basis from the given basis for U by the Gram-Schmidt orthogonalization. Its computation gives us a decomposition of the matrix A into an orthogonal part and an upper triangular part, from which the computation of the projection matrix might be easier.
QR decomposition method: Let {c1, c2, …, cn} be an arbitrary basis for a subspace U. The Gram-Schmidt orthogonalization applied to this basis may be written as the following steps:
(1) From the basis {c1, c2, …, cn}, find an orthogonal basis {q1, q2, …, qn} for U by
q1 = c1,
q2 = c2 − (⟨q1, c2⟩/⟨q1, q1⟩) q1,
and, in general, qk = ck − Σᵢ₌₁^(k−1) (⟨qi, ck⟩/⟨qi, qi⟩) qi for k = 2, …, n.
(2) Normalize the qi's to obtain an orthonormal basis {u1, …, un}, with ui = qi/||qi||. Rewriting the equations in (1), one obtains
c1 = q1 = b11 u1,
c2 = a12 q1 + q2 = b12 u1 + b22 u2, …
where aij = ⟨qi, cj⟩/⟨qi, qi⟩ for i < j, aii = 1, and
bij = aij ||qi|| = ⟨qi, cj⟩/||qi|| = ⟨ui, cj⟩.
Theorem 5.27 Any m × n matrix of rank n can be factored into a product QR, where Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper triangular matrix.
and so
A = [c1 c2 ⋯ cn] = [u1 u2 ⋯ un][b11 b12 ⋯ b1n; 0 b22 ⋯ b2n; ⋮ ⋱ ⋮; 0 0 ⋯ bnn] = QR.
With this QR decomposition of A, the projection matrix and the least squares solution of Ax = b for b ∈ ℝᵐ can be calculated easily: since AᵀA = RᵀQᵀQR = RᵀR,
Proj_{C(A)} = A(AᵀA)⁻¹Aᵀ = QQᵀ and x0 = (AᵀA)⁻¹Aᵀb = R⁻¹Qᵀb.
Example 5.20 Find the QR decomposition of A and the projection matrix P = QQᵀ of ℝ⁴ onto C(A), where
A = [c1 c2 c3] = [1 1 0; 1 0 1; 0 1 1; 0 0 1].
Solution: We first find the decomposition of A into Q and R, the orthogonal part and
the upper triangular part. Use the Gram-Schmidt orthogonalization to get the column
vectors of Q:
q1 = c1 = (1, 1, 0, 0),
q2 = c2 − (c2 · q1 / q1 · q1) q1 = (1/2, −1/2, 1, 0),
q3 = c3 − (c3 · q2 / q2 · q2) q2 − (c3 · q1 / q1 · q1) q1 = (−2/3, 2/3, 2/3, 1),
and normalize:
u1 = q1/||q1|| = (1/√2, 1/√2, 0, 0),
u2 = q2/||q2|| = (1/√6, −1/√6, 2/√6, 0),
u3 = q3/||q3|| = (−2/√21, 2/√21, 2/√21, 3/√21).
Then c1 = √2 u1, c2 = (1/√2)u1 + (√3/√2)u2, and c3 = (1/√2)u1 + (1/√6)u2 + (√7/√3)u3. In fact, these equations can also be derived from A = QR with an upper triangular matrix R. (It gives that the (i, j)-entry of R is bij = ⟨ui, cj⟩.) Therefore,
A = [1 1 0; 1 0 1; 0 1 1; 0 0 1]
  = [1/√2 1/√6 −2/√21; 1/√2 −1/√6 2/√21; 0 2/√6 2/√21; 0 0 3/√21][√2 1/√2 1/√2; 0 √3/√2 1/√6; 0 0 √7/√3] = QR,
and
P = QQᵀ = (1/7)[6 1 1 −2; 1 6 −1 2; 1 −1 6 2; −2 2 2 3]. □
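The same numbers can be checked with a library QR routine (Python with numpy; numpy's Q and R may differ from those above by signs, but the projection QQᵀ is the same):

    import numpy as np

    A = np.array([[1.0, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]])
    Q, R = np.linalg.qr(A)          # reduced QR decomposition; Q is 4 x 3
    P = Q @ Q.T                     # projection matrix onto C(A)
    print(np.round(7 * P))          # the integer matrix 7P displayed above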
Problem 5.31 Find the 2 × 2 matrix P that projects the xy-plane onto the line y = x.
Problem 5.32 Find the projection matrix P of the Euclidean 3-space ℝ³ onto the column space
Problem 5.33 Find the projection matrix P onto the x1, x2, x4 coordinate subspace of the Euclidean 4-space ℝ⁴.
Problem 5.34 Find the QR factorization of the matrix [sin θ cos θ; cos θ 0].
Proof: Let P be an orthogonal projection matrix. Then the matrix P can be written as P = A(AᵀA)⁻¹Aᵀ for a matrix A whose column vectors form a basis for the column space of P. It gives
Pᵀ = (A(AᵀA)⁻¹Aᵀ)ᵀ = A(AᵀA)⁻¹Aᵀ = P. □
For i = 1, 2, …, m, let Pi be the projection of ℝᵐ onto the i-th axis, whose matrix form looks like
Pi = eieiᵀ = diag(0, …, 0, 1, 0, …, 0), with the single 1 in the (i, i)-th position.
When we restrict the image to ℝ, Pi is an element of the dual space (ℝᵐ)*, usually denoted by xi as the i-th coordinate function (see Example 4.25). □
Problem 5.35 Show that any square matrix P that satisfies PᵀP = P is a projection matrix.
5.10 Exercises
5.1. Decide which of the following functions on ℝ² are inner products and which are not. For x = (x1, x2), y = (y1, y2) in ℝ²:
(1) ⟨x, y⟩ = x1y1x2y2,
(2) ⟨x, y⟩ = 4x1y1 + 4x2y2 − x1y2 − x2y1,
(3) ⟨x, y⟩ = x1y2 − x2y1,
(4) ⟨x, y⟩ = x1y1 + 3x2y2,
(5) ⟨x, y⟩ = x1y1 − x1y2 − x2y1 + 3x2y2.
5.2. Show that the function ⟨A, B⟩ = tr(AᵀB) for A, B ∈ M_{n×n}(ℝ) defines an inner product on M_{n×n}(ℝ).
5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ℝ⁵.
5.4. Determine the values of k so that the given vectors are orthogonal with respect to the
5.5. Consider the space C[0, 1] with the inner product defined by
⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.
Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following:
(1) f(x) = 1, g(x) = x;
(2) f(x) = x^m, g(x) = x^n, where m, n are positive integers;
(3) f(x) = sin mπx, g(x) = cos nπx, where m, n are positive integers.
5.6. Prove that
(a1 + a2 + ⋯ + an)² ≤ n(a1² + ⋯ + an²)
for any real numbers a1, a2, …, an. When does equality hold?
5.7. Let V = P2([0, 1]) be the vector space of polynomials of degree ≤ 2 on [0, 1] equipped with the inner product
⟨f, g⟩ = ∫₀¹ f(t)g(t) dt.
(3) [1/√2 0 −1/√2; −1/√2 1/√2 0; 0 −1/√2 1/√2], (4) [1/√2 1/√3 −1/√6; 1/√2 −1/√3 1/√6; 0 1/√3 2/√6].
5.14. Let W be the subspace of the Euclidean 4-space ℝ⁴ consisting of all vectors that are orthogonal to both x = (1, 0, −1, 1) and y = (2, 3, −1, 2). Find a basis for the subspace W.
5.15. Let V be an inner product space. For vectors x and y in V, establish the following identities:
5.17. Let A be the m × n matrix whose columns are c1, c2, …, cn in the Euclidean m-space ℝᵐ. Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors cj's in ℝᵐ is given by
vol(P(A)) = √(det(AᵀA)).
(Note that the volume of the n-dimensional parallelepiped determined by the vectors c1, c2, …, cn in ℝᵐ is by definition the product of the volume of the (n − 1)-dimensional parallelepiped (base) determined by c2, …, cn and the height of c1 from the plane W spanned by c2, …, cn. Here, the height is the length of the vector c = c1 − Proj_W(c1), which is orthogonal to W. (See Figure 5.6.) If the vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n.)
5.18. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space ℝ⁴ whose vertices are at (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 2, 2) and (0, 0, 1, 2).
5.19. For an orthogonal matrix A, show that det A = ±1. Give an example of an orthogonal matrix A for which det A = −1.
5.20. Find orthonormal bases for the row space and the null space of each of the following matrices.
(1) [2 4 3; 2 1 1; 1 0 1], (2) [1 4 0; −2 −3 1; 0 0 2].
5.21. Let A be an m × n matrix of rank r. Find a relation among m, n and r so that Ax = b has infinitely many solutions for every b ∈ ℝᵐ.
5.22. Find the equation of the straight line that fits best the data of the four points (0, 1), (1, 3),
(2, 4), and (3, 4).
5.23. Find the cubic polynomial that fits best the data of the five points
(-1 , -14), (0, -5), (1, -4), (2, 1), and (3, 22).
5.24. Let W be the subspace of the Euclidean 4-space ℝ⁴ spanned by the vectors xi's given in each of the following problems. Find the projection matrix P for the subspace W and the null space N(P) of P. Compute Pb for the b given in each problem.
matrices:
(1) [1/√5 −2/√5; 2/√5 1/√5], (2) [2 4 1; 1 1 1; 2 3 −1], (3) [1 4 0; 0 0 2].
5.27. Consider the space C[−1, 1] with the inner product defined by
⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx.
Selected Answers and Hints
Chapter 1
Problems
1.2 (1) Inconsistent.
(2) (x1, x2, x3, x4) = (−1 − 4t, 6 − 2t, 2 − 3t, t) for any t ∈ ℝ.
1.3 (1) (x, y, z) = (1, −1, 1). (3) (w, x, y, z) = (2, 0, 1, 3).
1.4 (1) b1 + b2 − b3 = 0. (2) For any bi's.
1.7 a = -¥, b = ¥, c = .!j, d = -4.
1.9 Consider the matrices : A = [; : l B = [; J. C = J-
1.10 Compare the diagonal entries of AA T and AT A.
1.12 (1) Infinitely many for a = 4, exactly one for a ≠ ±4, and none for a = −4.
(2) Infinitely many for a = 2, none for a = −3, and exactly one otherwise.
1.14 (3) I = IT = (AA-I)T = (A-I)T AT means by definition (AT)-I = (A-I)T.
1.16 Use Problem 1.14(3).
1.17 Any permutation on n objects can be obtained by taking a finite number of interchanges of two objects.
1.21 Consider the case that some dj is zero.
1.22 x = 2, y = 3, z = 1.
1.23 True if Ax = b is consistent, but not true in general.
1.24 L =[ -
o
-I I
U=
0
[b - - ].
0 I
1.25 (1) Consider (i, j)-entries of AB for i < j .
(2) A can be written as a product of lower triangular elementary matrices .
1 01 0] [20 u=
1.26 L=
[ -1/2 0
o -2/3 1
,D=
0 o 4/3
] ,
0 0 1
1.27 There are four possibilities for P.
1.29 (1) I1 = 0.5, I2 = 6, I3 = 0.55. (2) I1 = 0, I2 = I3 = 1, I4 = I5 = 5.
1.30 x = k(0.35, 0.40, 0.25)ᵀ for k > 0.
Exercises
1.13 (2)
5
0
-2227
101]
-60 .
[ o 0 87
1.14 See Problem 1.9.
1.16 (1) A⁻¹(AB) = B. (2) A⁻¹(AC) = C = A + I.
1.17 a = 0, c⁻¹ = b ≠ 0.
1.18 A⁻¹ = [1 −1 0 0; 0 1/2 −1/2 0; 0 0 1/3 −1/3; 0 0 0 1/4]; B⁻¹ = [13/8 …; −15/8 …; ⋮; 5/4 …].
= -Is
8 -23
-19 2]
1.19 A-I
[4
1 4 .
-2 1
1.24 (1) A = ]
3 1 1 0 0 -1 0 0 1
(2) d_ ][ l
1.25 c = [2 −1 3]ᵀ, x = [4 2 3]ᵀ.
1.26 (2) A =
111002001
1.27 (1) (Aᵏ)⁻¹ = (A⁻¹)ᵏ. (2) Aⁿ⁻¹ = 0 if A ∈ M_{n×n}.
(3) (I − A)(I + A + ⋯ + Aᵏ⁻¹) = I − Aᵏ.
l
1.29 Exactly seven of them are true.
(1) See Theorem 1.9. (2) Consider A =
::: T]1f [? r) = BT AT =
BA .
5 3 1 1 3 2
(7) (A-Il = (AT)-I = A-I. (8) If A-I exists, A-I = A-I (AB T) = B T.
(9) If AB has the (right) inverse C, then A-I = BC.
(10) Con sider EI = and EZ = Ul
(12) Consider a permutation matrix
Chapter 2
Problems
2.2 (1) Note: 2nd column − 1st column = 3rd column − 2nd column.
2.4 (1) -27, (2) 0, (3) (1 - x 4)3 .
2.7 Let σ be a transposition in Sn. Then the composition of σ with an even (odd) permutation in Sn is an odd (even, respectively) permutation.
2.9 (1) -14. (2) O.
2.10 (1) (y − x)(z − x)(z − y)(w − x)(w − y)(w − z)(w + y + x + z).
(2) (fa − be + cd)(fa + cd − eb).
2.1 k = 0 or 2.
2.2 It is not necessary to compute A 2 or A 3 .
2.3 -37.
2.4 (1) det A = (_1)n-1 (n - 1). (2) O.
2.5 -2,0,1,4.
2.6 Consider Σ_{σ∈Sn} a_{1σ(1)} ⋯ a_{nσ(n)}.
2.7 (1) 1, (2) 24.
2.8 (3) x1 = 1, x2 = −1, x3 = 2, x4 = −2.
2.9 (2) x = (3,0, 4/11)T.
2.10 k = 0 or ±1.
2.11 x = (-5, 1,2, 3)T .
2.12 x = 3, y = -1, z = 2.
2.13 (3) A11 = −2, A12 = 7, A22 = −8, A33 = 3.
2.19 Multiply
= [; 11 det AI = 4.
1
2.20 If we set A then the area is
l
2.23 Exactly seven of them are true.
(16) Aᵀ = A⁻¹ for any permutation matrix A.
Chapter 3
Problems
A= °1 °1
[
° 1
Exercises
Chapter 4
Problems
4.2 To show W is a subspace, see Theorem 4.2. Let Eij be the matrix with 1 at the (i, j)-th position and 0 elsewhere. Let Fk be the matrix with 1 at the (k, k)-th position, −1 at the (n, n)-th position and 0 elsewhere. Then the set {Eij, Fk : 1 ≤ i ≠ j ≤ n, k = 1, …, n − 1} is a basis for W. Thus dim W = n² − 1.
4.3 tr(AB) = Σᵢ₌₁ⁿ Σₖ₌₁ⁿ aikbki = Σₖ₌₁ⁿ Σᵢ₌₁ⁿ bkiaik = tr(BA).
4.4 If yes, (2, 1) = T(−6, −2, 0) = −2T(3, 1, 0) = (−2, −2).
4.5 If a1v1 + a2v2 + ⋯ + akvk = 0, then
0 = T(a1v1 + a2v2 + ⋯ + akvk) = a1w1 + a2w2 + ⋯ + akwk implies ai = 0 for i = 1, …, k.
4.6 (1) If T(x) = T(y), then S ∘ T(x) = S ∘ T(y) implies x = y. (4) They are invertible.
4.7 (1) T(x) = T(y) if and only if T(x − y) = 0, i.e., x − y ∈ Ker(T).
(2) Let {v1, …, vn} be a basis for V. If T is one-to-one, then the set {T(v1), …, T(vn)} is linearly independent, as the proof of Theorem 4.7 shows, and Corollary 3.12 shows it is a basis for V. Thus, for any y ∈ V, we can write y = Σᵢ₌₁ⁿ aiT(vi) = T(Σᵢ₌₁ⁿ aivi). Set x = Σᵢ₌₁ⁿ aivi ∈ V. Then clearly T(x) = y, so that T is onto. If T is onto, then for each i = 1, …, n there exists xi ∈ V such that T(xi) = vi. The set {x1, …, xn} is linearly independent in V, since if Σᵢ₌₁ⁿ aixi = 0, then 0 = T(Σᵢ₌₁ⁿ aixi) = Σᵢ₌₁ⁿ aiT(xi) = Σᵢ₌₁ⁿ aivi implies ai = 0 for all i = 1, …, n. Thus it is a basis by Corollary 3.12 again. If T(x) = 0 for x = Σᵢ₌₁ⁿ aixi ∈ V, then 0 = T(x) = Σᵢ₌₁ⁿ aiT(xi) = Σᵢ₌₁ⁿ aivi implies ai = 0 for all i = 1, …, n, that is, x = 0. Thus Ker(T) = {0}.
4.10 (I)[Tla=[;
4
=i
7 0 4 -3
245 ].
4.11 =
o 2 3 4
U Un
413
4.14 = - , [Tl a =
o 0 1 0 0 4
4.17
1 0 4 1 1 5
4.18 Write B = Q⁻¹AQ with some invertible matrix Q.
(1) det B = det(Q⁻¹AQ) = det Q⁻¹ det A det Q = det A. (2) tr(B) = tr(Q⁻¹AQ) = tr(QQ⁻¹A) = tr(A) (see Problem 4.3). (3) Use Problem 3.19.
4.20 α* = {f1(x, y, z) = x − (1/2)y, f2(x, y, z) = (1/2)y, f3(x, y, z) = −x + z}.
4.24 By BA, we get a tilting along the x-axis; while by BT A, one can get a tilting along the
y-axis .
Exercises
4.1 (2).
4.2 ax 3 + bx 2 + ax + c.
4.4 S is linear because the integration satisfies the linearity.
4.5 (1) Consider the decomposition v = (v + T(v))/2 + (v − T(v))/2.
4.6 (1) {(x, 2x) ∈ ℝ² : x ∈ ℝ}.
4.7 (2) T⁻¹(r, s, t) = ((1/2)r, 2r − s, 7r − 3s − t).
4.8 (1) Since T ∘ S is one-to-one from V into V, T ∘ S is also onto, and so T is onto. Moreover, if S(u) = S(v), then T ∘ S(u) = T ∘ S(v) implies u = v. Thus S is one-to-one, and so onto. This implies T is one-to-one. In fact, if T(u) = T(v), then there exist x and y such that S(x) = u and S(y) = v. Thus T ∘ S(x) = T ∘ S(y) implies x = y, and so u = T(x) = T(y) = v.
4.9 Note that T cannot be one-to-one and S cannot be onto.
4.12 vol(T(C)) = |det(A)| vol(C), for the matrix representation A of T.
4.13 (3) (5, 2, 0).
5 4
-4 -3
4.14 0 0
[
o 0 o 1
-1/3
4.15 (1) [ -5/3 2/3 ]
1/3 .
4.18
i iJ
001 000
(I) h U Q [: tr:' ,
Chapter 5
Problems
5.1 Let ⟨x, y⟩ = xᵀAy = ax1y1 + c(x1y2 + x2y1) + bx2y2 be an inner product. Then, for x = (1, 0), ⟨x, x⟩ = a > 0. Similarly, for any x = (x1, 1), ⟨x, x⟩ = ax1² + 2cx1 + b > 0 implies ab − c² > 0.
5.2 ⟨x, y⟩² = ⟨x, x⟩⟨y, y⟩ if and only if ||tx + y||² = ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩ = 0 has a repeated real root t0.
5.3 If d1 < 0, then for x = (1, 0, 0), ⟨x, x⟩ = d1 < 0: impossible.
5.4 (4) Compute the square of both sides and then use the Cauchy-Schwarz inequality.
5.5 (4) Use Problem 5.4(4): the triangle inequality for the length.
5.6 ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx defines an inner product on C[0, 1]. Use the Cauchy-Schwarz inequality.
5.7 (2)-(3): Use ∫₀¹ f(x)g(x) dx = 0 if k ≠ ℓ; and = … if k = ℓ.
5.8 (1): Orthogonal; (2) and (3): none; (4): Orthonormal.
5.10 Clearly, x1 = (1, 0, 1) and x2 = (0, 1, 2) are in W. The Gram-Schmidt orthogonalization gives u1 = (1/√2)(1, 0, 1), u2 = (1/√3)(−1, 1, 1), which form a basis.
5.12 (4) Im(id_V − T) ⊆ Ker(T) because T(id_V − T)(x) = T(x) − T²(x) = 0. Im(id_V − T) ⊇ Ker(T) because if T(x) = 0, then x = x − 0 = x − T(x) = (id_V − T)(x).
5.13 (1) ⟨x, x⟩ = 0 only for x = 0. (2) Use Definition 5.6(2).
5.16 (1) is just the definition; use (1) to prove (2).
5.17
5.18 (1) b E C(A) and y E N(A T) .
5.32 P = [ 7 ; - ].
3 1 - I 2
5.33 Note that {e1, e2, e4} is an orthonormal basis for the subspace.
5.35 Hint: First, show that P is symmetric.
Exercises
5.1 Inner products are (2) , (4), (5).
5.2 For the last condition of the definition, note that ⟨A, A⟩ = tr(AᵀA) = Σᵢ,ⱼ aij² = 0 if and only if aij = 0 for all i, j.
5.4 (1) k = 3.
5.5 (3) ||f|| = ||g|| = 1/√2. The angle is 0 if n = m, … if n ≠ m.
5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with x = (a1, …, an) and y = (1, …, 1) in (ℝⁿ, ·).
5.7 (1) 37/4, JT97J.
(2) If ⟨h, g⟩ = h(a/3 + b/2 + c) = 0 with h ≠ 0 a constant and g(x) = ax² + bx + c, then (a, b, c) is on the plane a/3 + b/2 + c = 0 in ℝ³.
5.9 Hint: For A = [V) V2], two columns are linearly independent, and its column space
is W.
5.11 (1) (3/2)√2, (2) (1/2)√2.
5.13 Orthogonal: (4). Nonorthogonal: (I ), (2), (3).
5.17 Use induction on n. Let B be the matrix A with the first column c1 replaced by c = c1 − Proj_W(c1), and write Proj_W(c1) = a2c2 + ⋯ + ancn for some ai's. Show that √(det(AᵀA)) = √(det(BᵀB)) = ||c|| vol(c2, …, cn) = vol(P(A)).
5.18 Let A = [c1 c2 c3] with c1 = (1, 0, 0, 0), c2 = (0, 1, 2, 2), c3 = (0, 0, 1, 2). Then the volume of the tetrahedron is (1/3!)√(det(AᵀA)) = 1/2.
5.19 AᵀA = I and det Aᵀ = det A imply (det A)² = 1, so det A = ±1.
The matrix A = [cos θ sin θ; sin θ −cos θ] is orthogonal with det A = −1.
5.21 Ax = b has a solution for every b ∈ ℝᵐ if r = m. It has infinitely many solutions if nullity = n − r = n − m > 0.
5.22 Let y = a + bx. Then y = x + 3/2.
Chapter 6
Problems
eigenvectors are (1, 1, 1), (4, 2, 1) and (1, −1, 1), respectively. It turns out that an = (2/3)·2ⁿ − (2/3)·(−1)ⁿ.
6.13 Its characteristic polynomial is f(λ) = (λ + 1)(λ − 2)²; so (−1)ⁿ, 2ⁿ, n2ⁿ form a fundamental set.