# Linear Equations and Matrices

Nishan Krikorian
Northeastern University
December 2009
Copyright c ￿ 2009 by Nishan Krikorian
PREFACE
These notes are intended to serve as an introduction to linear equations and
matrices, or as the subject is usually called, linear algebra. Linear algebra has two
distinct personalities. On the one hand it serves as a computational device, a prob-
lem solving tool indispensable in all quantitative disciplines. This is its algebraic
side. On the other hand its concepts can be seen, its relationships visualized. This
is its geometric side. The structure of these notes follows this basic dichotomy. In
Part 1 we focus on how matrices provide convenient tools for systematizing laborious
calculations by providing a compact notation for storing information and for describ-
ing relationships. In Part 2 we present those concepts of linear algebra that are
best understood geometrically, and we show how matrices describe transformations
of physical space.
The style of these notes is informal, meaning that the main consideration is
pedagogical clarity, not mathematical generality. The treatment of proofs varies.
Some proofs are given in full, some proofs are given only partially, some proofs are
given by example, and some proofs are omitted entirely. Whenever possible, ideas
are illustrated by computational examples and are given geometric interpretations.
Some of the important and essential applications of linear algebra are also presented,
including cubic splines, ODE’s, Markov matrices, and least squares.
These notes should be thought of as course notes or lecture notes, not as a course
text. The distinction is that we take a fairly direct path through the material with
certain speciﬁc goals in mind. These goals include the solution of linear systems, the
structure of linear transformations, least squares approximations, orthogonal trans-
formations, and the three basic matrix factorizations: LU, diagonal, and QR. There
is very little deviation from this path or extraneous material presented. The text is
lean, and Sections 6, 11, 12, 13, and 14 can be skipped without loss of continuity.
Almost all the exercises are important and help to develop subsequent material.
Linear algebra is a beautiful and elegant subject, but its practical side is equally
compelling. Stated in starkest terms, linear problems are solvable while nonlinear
problems are not.
PART 1: Algebra
1. Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4. Row Exchanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5. Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6. Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7. Systems with Many Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9. Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
10. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11. Matrix Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
12. Diﬀerential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
13. The Complex Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
14. Diﬀerence Equations and Markov Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
PART 2: Geometry
15. Vector Spaces, Subspaces and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
16. Linear Independence, Basis, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
17. Dot Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
18. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
19. Row Space, Column Space, and Null Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
20. Least Squares and Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization . . . . . . . . . . . . . . . 126
22. Diagonalization of Symmetric and Orthogonal Matrices. . . . . . . . . . . . . . . . . . . . 136
23. Quadratic Forms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
24. Positive Deﬁnite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Answers to Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
1. Gaussian Elimination 1
PART 1: ALGEBRA
1. GAUSSIAN ELIMINATION
The central problem of linear algebra is to ﬁnd the solutions of systems of linear
equations. We begin with a simple system of three equations and three unknowns:
2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9 .
The problem is to ﬁnd the unknown values of u, v, and w, which are themselves
called unknowns or variables. To do this we use Gaussian elimination.
The ﬁrst step of Gaussian elimination is to use the coeﬃcient 2 of u in the ﬁrst
equation to eliminate the u from the second and third equations. To accomplish this,
subtract 2 times the ﬁrst equation from the second equation and 3 times the ﬁrst
equation from the third equation. The result is
2u + v − w = 5
− 4v + 2w = −10
− 10v + 4w = −24 .
This completes the ﬁrst elimination step. The coeﬃcient 2 of u in the ﬁrst equation is
called the pivot for this step. Next use the coeﬃcient −4 of v in the second equation to
eliminate the v from the third equation. Just subtract 2.5 times the second equation
from the third equation to get
2u + v − w = 5
− 4v + 2w = −10
− w = 1 .
This completes the second elimination step. The coeﬃcient −4 of v in the second
equation is the pivot for this step. The coeﬃcient −1 of w in the third equation is
the pivot of the third elimination step, which did not have to be performed. The
elimination process is now complete. The resulting system is equivalent to the original
one, and its simple triangular form suggests an obvious method of solution: The third
equation gives w = −1; substituting this into the second equation −4v+2(−1) = −10
gives v = 2; and substituting both into the ﬁrst equation 2u + (2) − (−1) = 5 gives
u = 1. This simple process is called back substitution.
How did we determine the multipliers 2 and 3 in the ﬁrst step and 2.5 in the
second? Each is just the leading coeﬃcient of the row being subtracted from, divided
by the pivot for that step. For example, in the second step, 2.5 equals the coeﬃcient
−10 divided by the pivot −4.
2 1. Gaussian Elimination
We said that the triangular system obtained above is equivalent to the original
system, but what does this mean? It means simply that the two systems have the
same solution. This is clear since any solution of the original system must also be
a solution of each system obtained after each step of Gaussian elimination. This
is because Gaussian elimination amounts to nothing more than the subtraction of
equals from equals. Therefore any solution of the original system must also be a
solution of the ﬁnal triangular system. And by reversing this argument we see that
any solution of the ﬁnal triangular system must also be a solution of the original
system. Both systems must therefore have the same solutions.
We can simplify Gaussian elimination by noticing that there is no need to carry
the symbols for the unknowns u, v, w along in each step. We can instead represent
the system as an array:

2 1 −1
4 −2 0
6 −7 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

5
0
−9
¸
¸
.
The numbers multiplying the unknowns in the equations are called coeﬃcients and
are determined by their position in the array. They are separated from the right-hand
sides of the equations by a vertical line. The ﬁrst elimination step gives

2 1 −1
0 −4 2
0 −10 4 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

5
−10
−24
¸
¸
,
and the second gives

2 1 −1
0 −4 2
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

5
−10
1
¸
¸
.
Note that the coeﬃcient part of the array is now in triangular form with the pivots
on the diagonal. Back substitution gives the solution.
Can this process ever fail? It is clear that as long as the pivots are not zero
at each step, Gaussian elimination and back substitution will produce a solution.
But if a pivot is ever zero, Gaussian elimination will have to stop. This can happen
suddenly and unpredictably. In the example above, the second pivot was −4, but we
did not know this until we completed the ﬁrst elimination step. It could have turned
out to be zero thereby stopping the process. In fact this would have happened if the
coeﬃcient of v in the ﬁrst equation were −1 instead of 1. In general, we don’t know
what the pivot for a particular elimination step is going to be until we complete the
previous step, so we don’t know ahead of time if the process is going to succeed. In
most cases the problem of a zero pivot can be ﬁxed by exchanging two equations. In
some cases the zero pivot represents a true breakdown, meaning that there is either
no solution or inﬁnitely many solutions. We will consider these possibilities later. For
now we assume our systems have only nonzero pivots and thus have unique solutions.
1. Gaussian Elimination 3
A comment on terminology: The oﬃcial mathematical deﬁnition of a pivot re-
quires it to be nonzero. Therefore to say “nonzero pivot” is redundant, and to say
“zero pivot” is contradictory. For the latter we really should say “a zero in the pivot
position” or “a zero in the diagonal position.” But since it is simpler and clearer just
to say “nonzero pivot” or “zero pivot”, we will continue to do so. We will however
discuss this point further in Section 7 where the exact deﬁnition of a pivot will be
given.
We have seen how Gaussian elimination puts the coeﬃcient part of the array into
triangular form so that back substitution will give the solution. But, instead of back
substitution, we can also use Gaussian elimination from the bottom up to get the
solution. For the example above, this is done as follows: Use Gaussian elimination
to get the array into triangular form as before:

2 1 −1
0 −4 2
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

5
−10
1
¸
¸
.
Next subtract −2 times the third row from the second row and 1 times the third row
from the ﬁrst row to obtain

2 1 0
0 −4 0
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

4
−8
1
¸
¸
,
and then subtract −.25 times the second row from the ﬁrst to obtain

2 0 0
0 −4 0
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
−8
1
¸
¸
.
Clearly the purpose of these steps is to introduce zeros above the diagonal entries.
The coeﬃcient part of the array is now in diagonal form, and the solution u = 1,
v = 2, w = −1 is obvious. This method of using Gaussian elimination forwards
and backwards is called Gauss-Jordan elimination. It can be used for solving small
problems by hand, but it is ineﬃcient for large problems. We will see later (Section 3)
that ordinary Gaussian elimination with back substitution requires fewer operations
and is therefore preferable.
EXERCISES
1. Solve the following systems using Gaussian elimination in array form.
(a) u − 6v = −8
3u − 2v = −8
4 1. Gaussian Elimination
(b) 5u − v = −1
−3u + 2v = −5
(c) 2u + v + 3w = −4
−2u + 5v + w = 18
4u + 2v + 4w = −6
(d) 4u − 2v + 4w = −24
2u + 3v − w = 17
−8u + 2v + 5w = −1
(e) 3u + 5v = 3
− 2v − 3w = −6
6w + 2x = 14
− w − 2x = −4
2. Solve the system below. When a zero pivot arises, exchange the equation with the
one below it and continue.
u + v + w = −2
3u + 3v − w = 6
u − v + w = −1
3. Try to solve the system below. Why won’t the trick in the previous problem work
here?
u + v + w = −2
3u + 3v − w = 6
u + v + w = −1
4. A farmer has two breeds of chickens, Rhode Island Red and Leghorn. In one year,
one Rhode Island Red hen will yield 10 dozen eggs and 4 pounds of meat, and one
Leghorn hen will yield 12 dozen eggs and 3 pounds of meat. The farmer has a market
for 2700 dozen eggs and 900 pounds of meat. How many hens of each breed should
he have to meet the demand of the market exactly?
5. Suppose a man wants to consume exactly his minimum daily requirements of
70.5 grams of protein and 300 grams of carbohydrates on a diet of bread and peanut
butter. How many grams of each should he eat if bread is 10% protein and 50%
carbohydrates and peanut butter is 25% protein and 20% carbohydrates?
6. A nutritionist determines her minimum daily needs for energy (1,800 kcal), protein
(92 g), and calcium (470 mg). She chooses three foods, pasta, chicken, and broccoli,
and she collects the following data on the nutritive value per serving of each.
1. Gaussian Elimination 5
energy (kcal) protein (g) calcium (mg)
pasta 150 5 10
chicken 200 30 10
broccoli 25 3 90
She then asks how many servings per day of pasta, chicken, and broccoli must she
consume in order to satisfy her minimum daily needs for energy, protein, and calcium
exactly.
7. Find the cubic polynomial y = ax
3
+bx
2
+cx+d that interpolates (that is, whose
graph pass through) the points (−1,5), (0,5), (1,1), (2,−1).
8. Find the cubic polynomial function f(x) = ax
3
+ bx
2
+ cx + d such that f(0) =
2, f ￿

(0) = 1, f(1) = 1, f ￿

(1) = 0. (This is called cubic Hermite interpolation). Sketch
its graph.
6 2. Matrix Notation
2. MATRIX NOTATION
As is common in multidimensional calculus, points in space can be represented
by vectors. For example
b =

5
0
−9
¸
¸
is the column vector that represents the point (5, 0, −9) in three-dimensional space.
The basic operations on vectors are multiplication by scalars (real numbers for the
time being)
3b =

15
0
−27
¸
¸

5
−2
2
¸
¸
+

−4
−3
4
¸
¸
=

1
−5
6
¸
¸
.
Two vectors can be added together as long as they are the same size.
We deﬁne a matrix to be an array of column vectors of the same size. For
example
C =

2 1 −1 5
4 −2 0 0
6 −7 1 −9
¸
¸
is a 3 × 4 matrix (read “three by four matrix”). It has three rows and four columns.
Two basic operations on matrices are multiplication by scalars
3C =

6 3 −3 15
12 −6 0 0
18 −21 3 −27
¸
¸

2 1
−3 2
0 4
−1 0
¸
¸
¸
+

−3 6
4 −2
4 −1
3 0
¸
¸
¸
=

−1 7
1 0
4 3
2 0
¸
¸
¸
.
Two matrices can be added together as long as they have the same dimensions.
How do we multiply matrices? We ﬁrst answer the question for two special
matrices; namely, the product of a 1 × n matrix, which is a row vector, and an n
× 1 matrix, which is a column vector. Multiplication for these matrices is done as
follows:
[ 4 1 3 ]

3
1
0
¸
¸
= [ 4 · 3 + 1 · 1 + 3 · 0 ] = [ 13 ] .
2. Matrix Notation 7
This is just the familiar dot product of two vectors. To extend the deﬁnition to the
product of a matrix and a column vector, take the product of each row of the matrix
with the column vector and stack the results to form a new column vector:

4 1 3
2 6 8
1 0 9
2 2 1
¸
¸
¸

3
1
0
¸
¸
=

4 · 3 + 1 · 1 + 3 · 0
2 · 3 + 6 · 1 + 8 · 0
1 · 3 + 0 · 1 + 1 · 0
2 · 3 + 2 · 1 + 1 · 0
¸
¸
¸
=

13
12
3
8
¸
¸
¸
.
Note that the number of columns of the matrix must equal the number of components
of the vector being multiplied. As an application, we note that the system of equations
considered in the previous section
2u + v − w = 5
4u − 2v = 0
6u − 7v + w = −9
can now be represented as a matrix multiplying an unknown vector so as to equal a
known one:

2 1 −1
4 −2 0
6 −7 1
¸
¸

u
v
w
¸
¸
=

5
0
−9
¸
¸
.
This is an equation of the form Ax = b where the known matrix A, called the
coeﬃcient matrix of the system, multiplies the unknown vector x and equals the
known vector b. The problem is to ﬁnd x. The solution vector
x =

u
v
w
¸
¸
=

1
2
−1
¸
¸
,
obtained by Gaussian elimination, satisﬁes the equation

2 1 −1
4 −2 0
6 −7 1
¸
¸

1
2
−1
¸
¸
=

5
0
−9
¸
¸
.
Finally, to multiply two matrices, just multiply the left matrix times each column
of the right matrix and line up the resulting two vectors in a new matrix. For example

4 1 3
2 6 8
1 0 9
2 2 1
¸
¸
¸

3 5
1 0
0 1
¸
¸
=

13 23
12 18
3 14
8 11
¸
¸
¸
.
8 2. Matrix Notation
Note again that the number of columns of the left factor must equal the number of
rows of the right factor for this to make sense. In this example we multiplied a 4 ×3
matrix by a 3 ×2 matrix and obtained a 4 ×2 matrix. In general, if A is m×n and
B is n ×p, then AB is m×p.
Matrix multiplication satisﬁes the associative law (AB)C = A(BC) and the two
distributive laws A(B +C) = AB +AC and (B +C)D = BD+CD. (The proofs of
these properties are tedious and will be omitted.) It does not, however, satisfy the
commutative law. That is, in general AB ￿= BA. For example ￿

2 3
1 2 ￿
￿
0 1
1 1 ￿ ￿

= ￿

0 1
1 1 ￿
￿
2 3
1 2 ￿

.
In fact for many pairs of matrices AB is deﬁned whereas BA is not. (See Exercise
2.)
For every n there is a special n ×n matrix, which we call I, with ones down its
diagonal (also called its main diagonal) and zeros everywhere else. For example, in
the 3 × 3 case
I =

1 0 0
0 1 0
0 0 1
¸
¸
.
It is easy to see that for any 3 × 3 matrix A we have IA = AI = A and that this
property carries over to the n×n case. For this reason I is called the identity matrix.
The notation for a general matrix A with m rows and n columns is
A =

a
11
a
12
a
13
. . . a
1n
a
21
a
22
a
23
. . . a
2n
a
31
a
32
a
33
. . . a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
m1
a
m2
a
m3
. . . a
mn
¸
¸
¸
¸
¸
¸
where a
ij
denotes the entry in the ith row and the jth column. Using this notation
we can deﬁne matrix multiplication as follows. Let A be m × n and B be n × p,
then C is the m × p matrix with ijth coeﬃcient c
ij
=
n ￿

k=1
a
ik
b
kj
. We will try to
avoid expressions like this, but it is important to understand them when writing
computer programs to perform matrix computations. In fact, we can write a very
simple program that uses Gaussian elimination and back substitution to solve an
arbitrary linear system of n equations and n unknowns. First express the system in
array form:

a
11
a
12
a
13
. . . a
1n
a
21
a
22
a
23
. . . a
2n
a
31
a
32
a
33
. . . a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
. . . a
nn ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

a
1 n+1
a
2 n+1
a
3 n+1
.
.
.
a
n n+1
¸
¸
¸
¸
¸
¸
.
2. Matrix Notation 9
Then Gaussian elimination would look like
for k = 1 to n −1 do
if a
kk
= 0 then signal failure and stop
for i = k + 1 to n do
m = a
ik
/a
kk
a
ik
= 0
for j = k + 1 to n + 1 do a
ij
= a
ij
−ma
kj
.
(Note that the program stops when a zero pivot is encountered.) And back substi-
tution would look like
for k = n down to 1 do
t = a
k n+1
for j = k + 1 to n do t = t −a
kj
x
j
x
k
= t/a
kk
.
Finally, we summarize the algebraic laws satisﬁed by matrix addition and multi-
plication. (The following equalities assume that all indicated operations make sense.)
1. A+B = B +A (commutative law for addition)
2. A+ (B +C) = (A+B) +C (associative law for addition)
3. r(sA) = (rs)A
4. r(A+B) = rA+rB
5. (−1)A = −A
6. A(BC) = (AB)C (associative law for multiplication)
7. A(B +C) = AB +AC (left-distributive law for multiplication)
8. (B +C)A = BA+CA (right-distributive law for multiplication)
9. r(AB) = (rA)B = A(rB)
EXERCISES
1. Compute the following:
(a) 2 ￿

5 7 −1
4 −2 0 ￿

(b)

6 2
7 1
1 2
¸
¸
+

1 −2
3 6
5 −7
¸
¸
(c)

4 0 −1
0 1 0
2 −2 1
¸
¸

3
4
−5
¸
¸
10 2. Matrix Notation
(d) [ 3 4 −5 ]

4 0 −1
0 1 0
2 −2 1
¸
¸
(e) [ 1 2 3 ]

4
5
6
¸
¸
(f)

4
5
6
¸
¸
[ 1 2 3 ]
(g)

2 −1 3
5 0 7
0 −1 0
¸
¸

0 1
2 1
−2 3
¸
¸
(h)

4 0 −1
0 1 0
2 −2 1
¸
¸

2 −1 3
5 0 7
0 −1 0
¸
¸
(i)

4 0 −1
0 1 0
2 −2 1
¸
¸

1 0 0
0 1 0
0 0 1
¸
¸
(j)

2 0 0
0 1 0
0 0 3
¸
¸
5
(k)

0 1 2
0 0 1
0 0 0
¸
¸
3
2. Which of the expressions 2A, A+B, AB, and BA makes sense for the two matrices
below? Which do not?
A = ￿

5 7 −1
4 −2 0 ￿

B = ￿

2 3
1 2 ￿

3. Give 3 × 3 matrices that are examples of the following.
(a) diagonal matrix: a
ij
= 0 for all i ￿= j.
(b) symmetric matrix: a
ij
= a
ji
for all i and j.
(c) upper triangular matrix: a
ij
= 0 for all i > j.
2. Matrix Notation 11
4. Show with a 3 × 3 example that the product of two upper triangular matrices is
upper triangular.
5. Find examples of 2 × 2 matrices such that
(a) A
2
= −I
(b) B
2
= 0 where no entry of B is zero.
(c) AB = AC but B ￿= C. (Zero matrices not allowed!)
6. For any matrix A, we deﬁne its transpose A
T
to be the matrix whose columns are
the corresponding rows of A.
(a) What is the transpose of

2 −1 3
5 0 7
0 −1 0
¸
¸
?
(b) Illustrate the formula (A+B)
T
= A
T
+B
T
with a 2 × 2 example.
(c) The formula (AB)
T
= B
T
A
T
holds as long as the product AB makes sense.
(This requires a proof, which we omit.) Illustrate this with a 2 × 2 example,
and use it to prove the formula (ABC)
T
= C
T
B
T
A
T
.
(d) If a matrix satisﬁes A
T
= A, then what kind of matrix is it? (See Exercise 3
above.)
(e) Show that for any matrix C (not necessarily square) the matrix C
T
C is sym-
metric. (Use (c) and (d) above.)
(f) Show if A and B are square matrices and A is symmetric, then B
T
AB is sym-
metric.
(g) Show with a 2×2 example that the product of two symmetric matrices may not
be symmetric.
7. Verify

4 0 −1
0 1 0
2 −2 1
¸
¸

c
1
c
2
c
3
¸
¸
= c
1

4
0
2
¸
¸
+c
2

0
1
−2
¸
¸
+c
3

−1
0
1
¸
¸
.
8. Verify
[ c
1
c
2
c
3
]

4 0 −1
0 1 0
2 −2 1
¸
¸
= c
1
[ 4 0 −1 ] +c
2
[ 0 1 0 ] +c
3
[ 2 −2 1 ] .
9. The matrix (A+B)
2
is always equal to which of the following.
(a) A(A+B) +B(A+B)
(b) (A+B)A+ (A+B)B
(c) A
2
+AB +BA+B
2
(d) (B +A)
2
(e) A(A+B) + (A+B)B
12 2. Matrix Notation
(f) A
2
+ 2AB +B
2
10. Convince yourself that the product AB of two matrices can be thought of as A
multiplying the columns of B to produce the columns of AB or
A

.
.
.
.
.
.
.
.
.
b
1
b
2
· · · b
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
Ab
1
Ab
2
· · · Ab
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
.
11. Assuming the operations make sense, which are symmetric matrices?
(a) A
T
A
(b) A
T
AA
T
(c) A
T
+A
3. The LU Factorization 13
3. THE LU FACTORIZATION
If we run Gaussian elimination on the coeﬃcient matrix of Section 1, we obtain
A =

2 1 −1
4 −2 0
6 −7 1
¸
¸

2 1 −1
0 −4 2
0 −10 4
¸
¸

U =

2 1 −1
0 −4 2
0 0 −1
¸
¸
.
We call the resulting upper triangular matrix U. The following equations describe
exactly how the Gaussian steps turn the rows of A into the rows of U.
row 1 of U = row 1 of A
row 2 of U = row 2 of A −2(row 1 of U)
row 3 of U = row 3 of A −3(row 1 of U) −2.5(row 2 of U)
Note that once a row is used as “pivotal row,” it never changes from then on. It
therefore can be considered as a row of U. We can solve these equations for the rows
of A to obtain
row 1 of A = 1(row 1 of U)
row 2 of A = 2(row 1 of U) + 1(row 2 of U)
row 3 of A = 3(row 1 of U) + 2.5(row 2 of U) + 1(row 3 of U) .
Using the property of matrix multiplication illustrated in Section 2 Exercise 8, this
is just an expression of the matrix equation

row 1 of A
row 2 of A
row 3 of A
¸
¸
=

1 0 0
2 1 0
3 2.5 1
¸
¸

row 1 of U
row 2 of U
row 3 of U
¸
¸
or

2 1 −1
4 −2 0
6 −7 1
¸
¸
=

1 0 0
2 1 0
3 2.5 1
¸
¸

2 1 −1
0 −4 2
0 0 −1
¸
¸
.
14 3. The LU Factorization
We write this equation as A = LU and call the product on the right the LU factor-
ization of A. Note that L is the lower triangular matrix with ones down its diagonal,
with the multipliers 2 and 3 from the ﬁrst Gaussian step in its ﬁrst column, and
with the multiplier 2.5 from the second Gaussian step in its second column. The
pattern is the same for every matrix. Any square matrix can be factored by Gaussian
elimination into a product of a lower triangular L with ones down its diagonal and
an upper triangular U, under the proviso that all pivots are nonzero.
How can the LU factorization of A be used to solve the original system Ax = b?
First we replace A by LU in the system to get LUx = b. Then we note that this
system can be solved by solving the two systems Ly = b and Ux = y in order. Letting
y =

r
s
t
¸
¸
, the ﬁrst system Ly = b is

1 0 0
2 1 0
3 2.5 1
¸
¸

r
s
t
¸
¸
=

5
0
−9
¸
¸
,
which can be solved by forward substitution to get

r
s
t
¸
¸
=

5
−10
1
¸
¸
.
And letting x =

u
v
w
¸
¸
the second system Ux = y is

2 1 −1
0 −4 2
0 0 −1
¸
¸

u
v
w
¸
¸
=

5
−10
1
¸
¸
,
which can be solved by back substitution to get

u
v
w
¸
¸
=

1
2
−1
¸
¸
.
Therefore, in the matrix form of Gaussian elimination, we use elimination to factor
A into LU and then solve Ly = b by forward substitution and Ux = y by back
substitution.
If you have just one system Ax
1
= b
1
, then there is no advantage of this method
over the array form of Gaussian elimination presented in Section 1. In fact, it is
3. The LU Factorization 15
slightly harder since there is an extra forward substitution step. But now suppose you
have a second system Ax
2
= b
2
with a diﬀerent right-hand side. The LU factorization
method would factor A into LU and then solve LUx
1
= b
1
and LUx
2
= b
2
both by
forward and back substitution. On the other hand, the array method would have to
run through the entire Gaussian elimination process twice, once for each system. So
if you have several systems to solve, all of which diﬀer only in their right-hand sides,
then the LU factorization method is preferable.
By counting operations, we can compare the relative expense in computer time of
elimination verses forward and back substitution. We will count only multiplications
and divisions since they take much more time than addition and subtraction. In
the ﬁrst elimination step for an n × n matrix, a multiplier (one division) times the
second through nth entries of the ﬁrst row (n − 1 multiplications) is subtracted
from a row below the ﬁrst. This results in n operations. Since there are n − 1
rows to be subtracted from, the total number of operations for the ﬁrst step is
(n−1)n = n
2
−n. The second step is exactly like the ﬁrst except that it is performed
on an (n −1) ×(n −1) matrix and therefore requires (n −1)
2
−(n −1) operations.
Continuing in this manner we see that the total number of operations required for
Gaussian elimination is
n ￿

k=1
(k
2
−k) =
n
3
−n
3
,
and since n is negligible compared to n
3
for large n, we conclude the number of oper-
ations required to compute the LU factorization of an n ×n matrix is approximately
n
3
/3. Back substitution is much faster since the number of operations required is
easily seen to be
n ￿

k=1
k =
n(n + 1)
2
,
which is approximately n
2
/2. Forward substitution is the same. A 50 ×50 matrix
would therefore require 41,666 operations for its LU factorization but only 1,275
for forward and back substitution. By a similar operation count, we can show that
Gauss-Jordan elimination requires n
3
/2 operations, which is 50% more than straight
Gaussian elimination with back substitution. Gauss-Jordan elimination on a 50 ×
50 matrix would therefore require 62,500 operations.
EXERCISES
1. Find the LU factorizations of
(a) A = ￿

4 −6
3 5 ￿

16 3. The LU Factorization
(b) B =

2 1 3
−2 5 1
4 2 4
¸
¸
(c) C =

1 3 2 −1
2 5 3 2
−3 2 −1 2
1 1 3 1
¸
¸
¸
(d) D =

2 1 0 0 0
4 5 3 0 0
0 3 4 1 0
0 0 −1 1 1
0 0 0 4 3
¸
¸
¸
¸
¸
2. Use the LU factorizations above and forward and back substitution to solve
(a) Ax = ￿

−8
13 ￿

(b) Bx =

12
−6
18
¸
¸
(c) Cx =

0
4
−1
2
¸
¸
¸
(d) Dx =

4
5
−4
2
3
¸
¸
¸
¸
¸
3. If your computer performs 10
6
operations/sec and costs \$500/hour to run, then
how large a linear system can you solve with a budget of \$2? Of \$200?
4. Row Exchanges 17
4. ROW EXCHANGES
We now return to the question of what happens when we run into zero pivots.
Example 1: We ﬁrst consider the system
u + 2v + 3w = 1
2u + 4v + 9w = 5
2u + 6v + 7w = 4 .
Using Gaussian elimination on the corresponding array

1 2 3
2 4 9
2 6 7 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
5
4
¸
¸
the ﬁrst elimination step gives

1 2 3
0 0 3
0 2 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
3
2
¸
¸
.
A zero pivot has appeared. But note that there is a nonzero entry lower down in the
second column, in this case the 2 in the third row. The problem can therefore be
ﬁxed by just exchanging the second and third rows:

1 2 3
0 2 1
0 0 3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
2
3
¸
¸
.
This has the harmless eﬀect of exchanging the second and third equations. In this
case we are done with elimination since the array is now ready for back substitution.
Example 2: Now let’s look at another system:
u + 2v + 3w = 1
2u + 4v + 9w = 5
3u + 6v + 7w = 5 .
Using Gaussian elimination on the corresponding array

1 2 3
2 4 9
3 6 7 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
5
5
¸
¸
18 4. Row Exchanges
the ﬁrst elimination step gives

1 2 3
0 0 3
0 0 −2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
3
2
¸
¸
.
But now a row exchange will not produce a nonzero pivot in the second row. Gaussian
elimination breaks down through no fault of its own simply because this system has
no solution. The last two equations, 3w = 3 and −2w = 2, cannot be satisﬁed
simultaneously. We can also see this by extending Gaussian elimination a little. Use
the 3 in the second equation to eliminate the −2 in the third equation. This will
produce

1 2 3
0 0 3
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
3
4
¸
¸
.
The third equation, 0 = 4, signals the impossibility of a solution.
Example 3: In the example above, suppose the right-hand side of the third equation
is equal to 1 instead of 5, then the elimination gives

1 2 3
0 0 3
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
3
0
¸
¸
.
What we really have here is two equations with three unknowns. Back substitution
breaks down since the ﬁrst equation cannot determine both u and v by itself. In this
case there are inﬁnitely many solutions to the original system. (See Section 7.)
We conclude that when we run into a zero pivot, we should look for a nonzero
entry in the column below the zero pivot. If we ﬁnd one, we make a row exchange
and continue. If we don’t, then we must stop; a unique solution to the system does
not exist. A matrix for which Gaussian elimination possibly with row exchanges
produces a triangular system with nonzero pivots is called nonsingular. Otherwise
the matrix is called singular.
What happens to the LU factorization of A when there are row exchanges? The
answer is that the product of the L and U we obtain no longer equals the original
matrix A but equals A with row exchanges. Suppose we knew what row exchanges
would be necessary before we started. Then if we performed those exchanges on A
ﬁrst, we would get the normal LU factorization of this altered A. The altered version
of A is realized by premultiplying A by a permutation matrix P, which is just the
identity matrix with some of its rows exchanged. We would then obtain the equation
PA = LU. For the ﬁrst example of this section this looks like

1 0 0
0 0 1
0 1 0
¸
¸

1 2 3
2 4 9
2 6 7
¸
¸
=

1 0 0
2 1 0
2 0 1
¸
¸

1 2 3
0 2 1
0 0 3
¸
¸
.
4. Row Exchanges 19
The PA = LU factorization can still be used to solve the system Ax = b as before.
Just apply P to both sides to get PAx = Pb. The LU factorization of PA gives
LUx = Pb. Forward and back substitution then give the solution x.
Since row exchanges are unpleasant when factoring matrices, we will try to re-
strict our attention to nonsingular matrices that do not need them. In any case, we
restate the central fact (really a deﬁnition): For a linear system Ax = b whose coef-
ﬁcient matrix A is nonsingular, Gaussian elimination, possibly with row exchanges,
will produce a triangular system with nonzero pivots on the diagonal, and back sub-
stitution will produce the unique solution. (From now on, we will take “Gaussian
elimination” to mean “Gaussian elimination possibly with row exchanges.”)
EXERCISES
1. Solve by the array method

1 4 2
−2 −8 3
0 1 1
¸
¸

u
v
w
¸
¸
=

−2
32
1
¸
¸
2. Which of the following matrices is singular? Why?
(a)

1 4 2
6 −8 2
−2 −8 −4
¸
¸
(b)

1 4 2
−2 −8 −3
−1 −4 5
¸
¸
(c)

1 3 2 0
0 5 0 2
0 0 10 2
0 0 0 11
¸
¸
¸
(d)

1 3 2 −1
0 5 3 2
0 0 0 2
0 0 0 10
¸
¸
¸
3. How many solutions do each of the following systems have?
(a)

0 1 −1
1 −1 0
1 0 −1
¸
¸

u
v
w
¸
¸
=

2
2
2
¸
¸
20 4. Row Exchanges
(b)

0 1 −1
1 −1 0
1 0 −1
¸
¸

u
v
w
¸
¸
=

0
0
0
¸
¸
4. Prove that if A is nonsingular, then the only solution of the system Ax = 0 is
x = 0. (A system of the form Ax = 0 is called homogeneous.)
5. Inverses 21
5. INVERSES
A square matrix A is invertible if there is a matrix B of the same size such
that their product in either order is the identity matrix: AB = BA = I. If there
is such a B, then there is at most one. We write it as A
−1
and call it the inverse
of A. Therefore AA
−1
= A
−1
A = I. We can easily prove that there cannot be
more than one inverse of a given matrix: If B and C are both inverses of A, then
B = BI = B(AC) = (BA)C = IC = C. As an example, the inverse of the matrix ￿

1 −1
1 2 ￿

is ￿

2
3
1
3

1
3
1
3 ￿

since ￿

1 −1
1 2 ￿ ￿

2
3
1
3

1
3
1
3 ￿

= ￿

1 0
0 1 ￿

.
Some matrices do not have inverses. For example the matrix ￿

1 0
2 0 ￿

cannot have
an inverse since
￿
1 0
2 0 ￿
￿
a b
c d ￿

= ￿

a b
2a 2b ￿

,
so there is no choice of a, b, c, d that will make the right-hand side equal to the identity
matrix.
How can we tell if a matrix has an inverse, and, if it does have an inverse, then
how do we compute it? We answer the second question ﬁrst. Let’s try to ﬁnd the
inverse of
A =

2 −3 2
1 −1 1
3 2 2
¸
¸
.
This means that we are looking for a matrix B such that AB = I or

2 −3 2
1 −1 1
3 2 2
¸
¸

b
11
b
12
b
13
b
21
b
22
b
23
b
31
b
32
b
33
¸
¸
=

1 0 0
0 1 0
0 0 1
¸
¸
.
Let B
1
, B
2
, B
3
be the columns of B and I
1
, I
2
, I
3
be the columns of I. Then we can
see that this is really the problem of solving the three separate linear systems
AB
1
= I
1
AB
2
= I
2
AB
3
= I
3
.
Since the coeﬃcient matrix is the same for all three systems, we can just ﬁnd the LU
factorization of A and then use forward and back substitution three times to ﬁnd the
three solution vectors. These vectors, when lined up, will form the columns of B.
22 5. Inverses
If we want to ﬁnd the solution by hand, we can use the array method and a trick
to avoid running through Gaussian elimination three times. First set up the array

2 −3 2
1 −1 1
3 2 2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1 0 0
0 1 0
0 0 1
¸
¸
and use Gaussian elimination to get

2 −3 2
0 .5 0
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1 0 0
−.5 1 0
5 −13 1
¸
¸
.
Now in this situation we would normally use back substitution three times. But we
could also use Gauss-Jordan elimination. That is, use the −1 in the third row to
eliminate the entries in the column above it by subtracting multiples of the third row
from the second (unnecessary since that entry is already zero) and from the ﬁrst.
This gives

2 −3 0
0 .5 0
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

11 −26 2
−.5 1 0
5 −13 1
¸
¸
.
Then use the .5 in the second row to eliminate the −3 in the ﬁrst row.

2 0 0
0 .5 0
0 0 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

8 −20 2
−.5 1 0
5 −13 1
¸
¸
.
Finally divide each row by its leading nonzero entry to get

1 0 0
0 1 0
0 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

4 −10 1
−1 2 0
−5 13 −1
¸
¸
.
The three columns on the right are the solutions to the three linear systems, so
A
−1
=

4 −10 1
−1 2 0
−5 13 −1
¸
¸
.
These two methods for ﬁnding inverses, (1) LU factorization and forward and
back substitution n times and (2) Gauss-Jordan elimination, each require n
3
oper-
ations. Either method will work as long as A is nonsingular. Gauss-Jordan elimi-
nation does, however, present some organizational clarity when ﬁnding inverses by
5. Inverses 23
hand. Furthermore, since Gauss-Jordan elimination is just the array method per-
formed several times at once, row exchanges can be made without aﬀecting the ﬁnal
Once we have the inverse of a matrix, what can we do with it? It might seem
at ﬁrst glance that A
−1
can be used to solve the system Ax = b directly. Just
apply A
−1
to both sides to obtain x = A
−1
b. This turns out to be much inferior to
ordinary Gaussian elimination with back substitution for two reasons: (1) It takes
n
3
operations to ﬁnd A
−1
as compared with n
3
/3 operations to solve Ax = b by
Gaussian elimination. (2) Computing inverses, by whatever method, is subject to
much more numerical instability and round-oﬀ error than is Gaussian elimination.
Inverses are valuable in theory and for conceptualization. In some areas of statistics
and linear programming it is occasionally necessary to actually compute an inverse.
But for most large-scale applications, the computation of matrix inverses can and
should be avoided.
We end this section with a major result, which we state and prove formally. It
basically says that for any matrix A the three questions, (1) does Gaussian elimination
work, (2) does A have an inverse, and (3) does Ax = b have a unique solution, all
Theorem. For any square matrix A the following statements are equivalent (all are
true or all are false).
(a) A is nonsingular (that is, Gaussian elimination, possibly with row exchanges,
produces nonzero pivots)
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.
Proof: We show (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a).
(a) ⇒ (b): The point of this section has been to show that if A is nonsingular, then
Gaussian elimination can be used to ﬁnd its inverse.
(b) ⇒ (c): Apply A
−1
to both sides of Ax = b to obtain a solution x = A
−1
b. Let y
be a diﬀerent solution. Then apply A
−1
to both sides of Ay = b to obtain y = A
−1
b.
Therefore x = y and the solution is unique.
(c) ⇒ (d): Clearly x = 0 is a solution of Ax = 0, and by (c) it must be the only
solution.
(d) ⇒ (a): We prove this by assuming (a) is false, that is A is singular, and we show
that this implies (d) is false, that is Ax = 0 has nonzero solutions in x. (Recall
that the statement “(d) ⇒ (a)” is logically equivalent to the statement “not (a) ⇒
not (d).”) Consider the system Ax = 0. Since we are assuming A is singular, if we
apply Gaussian elimination, at some point we will run into a zero pivot. Using the
language and results of Section 7, we can immediately say that there must exist a
free variable and therefore conclude that there are nonzero solutions to Ax = 0. But
since we haven’t got to Section 7 yet, we’ll try to prove this directly. When we run
24 5. Inverses
into the zero pivot, we will have a situation that looks something like

∗ ∗ ∗ ∗ ∗
0 ∗ ∗ ∗ ∗
0 0 0 ∗ ∗
0 0 0 ∗ ∗
0 0 0 ∗ ∗
¸
¸
¸
¸
¸

x
1
x
2
x
3
x
4
x
5
¸
¸
¸
¸
¸
=

0
0
0
0
0
¸
¸
¸
¸
¸
.
But if we set x
5
= x
4
= 0 and x
3
= 1 and solve for x
2
and x
1
by back substitution,
we will get a nonzero solution to Ax = 0. This shows that (d) is false. The pattern is
the same in all cases. If A is any singular matrix, then Gaussian elimination applied
to Ax = 0 will produce a system that in exactly the same way can be shown to have
nonzero solutions. This proves the theorem.
Note that the ﬁrst statement of the theorem is equivalent to the fact that A has
PA = LU factorization, which we can now write A = P
−1
LU. (We skip the proof
that P
−1
exists.)
EXERCISES
1. Use Gauss-Jordan elimination to ﬁnd the inverses of the following matrices.
(a) ￿

1 4
2 7 ￿

(b)

2 0 0
0 .1 0
0 0 −5
¸
¸
(c)

2 6 10
0 2 5
0 0 5
¸
¸
(d)

1 1 1
2 3 2
3 8 2
¸
¸
(e)

1 1 1
−1 3 2
2 1 1
¸
¸
(f)

1 2 3 1
1 3 3 2
2 4 3 3
1 1 1 1
¸
¸
¸
5. Inverses 25
(g) ￿

a b
c d ￿

2. From Exercises 1(b) and 1(c) what can you say about the inverse of a diagonal
matrix and of an upper triangular matrix?
3. Let A be the matrix of Exercise 1(e). Solve the system Ax =

11
23
13
¸
¸
by using A
−1
4. Which of the following matrices is invertible? Why? (See Section 4 Exercise 2.)
(a)

1 4 2
6 −8 2
−2 −8 −4
¸
¸
(b)

1 4 2
−2 −8 −3
−1 −4 5
¸
¸
(c)

1 3 2 0
0 5 0 2
0 0 10 2
0 0 0 11
¸
¸
¸
(d)

1 3 2 −1
0 5 3 2
0 0 0 2
0 0 0 10
¸
¸
¸
5. If A, B, C are invertible, then prove
(a) (AB)
−1
= B
−1
A
−1
.
(b) (ABC)
−1
= C
−1
B
−1
A
−1
.
(c) (A
T
)
−1
= (A
−1
)
T
.
6. Prove that if A and B are nonsingular, then so is AB.
7. Give 2 ×2 examples of the following.
(a) The sum of two invertible matrices may not be invertible.
(b) The sum of two noninvertible matrices may be invertible.
8. There is a slight hole in our proof of (a) ⇒ (b) in the theorem of this section.
To ﬁnd the inverse of A, we applied Gauss-Jordan elimination to the array [A, I] to
obtain [I, B]. We then concluded AB = I so that B is a right-inverse of A. But how
26 5. Inverses
do we know that B is also a left-inverse of A? Prove that it is, that is, prove that
BA = I by applying the reverse of the same Gauss-Jordan steps in reverse order to
the array [B,I] to obtain [I,A].
9. More generally, it is true that if a matrix has a one-sided inverse, then it must have
a two-sided inverse. Or more simply stated, AB = I ⇒BA = I. To prove this, argue
as follows: AB = I ⇒ B is nonsingular ⇒ B is invertible ⇒ A = B
−1
⇒ BA = I.
Fill in the details.
10. True or false?
(a) “Every nonsingular matrix has an LU factorization.”
(b) “If A is singular, then the homogeneous system Ax = 0 has nonzero solutions.”
6. Tridiagonal Matrices 27
6. TRIDIAGONAL MATRICES
When coeﬃcient matrices arise in applications, they usually have special pat-
terns. In such cases Gaussian elimination often simpliﬁes. We now illustrate this
by looking at tridiagonal matrices, which are the simplest kind of band matrices. A
matrix is tridiagonal if all of its nonzero elements are either on the main diagonal or
adjacent to the main diagonal. Here is an example (from Section 3 Exercise 1(d)):

2 1 0 0 0
4 5 3 0 0
0 3 4 1 0
0 0 −1 1 1
0 0 0 4 3
¸
¸
¸
¸
¸
If we run Gaussian elimination on this matrix, we obtain

2 1 0 0 0
0 3 3 0 0
0 0 1 1 0
0 0 0 2 1
0 0 0 0 1
¸
¸
¸
¸
¸
.
This example reveals three properties of tridiagonal matrices and Gaussian elimina-
tion. (1) There is at most one nonzero multiplier in each Gaussian step. (2) The
superdiagonal entries (that is, the entries just above the main diagonal) don’t change.
And (3) the ﬁnal upper triangular matrix has nonzero entries only on its diagonal
and superdiagonal. If we count the number of operations required to triangulate a
tridiagonal matrix, we ﬁnd it is equal to n instead of the usual n
3
/3. We conclude
that large systems involving tridiagonal matrices are very easy to solve. In fact, we
can write a quick and eﬃcient program that will solve tridiagonal systems directly:

d
1
c
1
a
2
d
2
c
2
a
3
d
3
c
3
. . .
a
n
d
n
¸
¸
¸
¸
¸

x
0
x
1
x
2
.
.
.
x
n
¸
¸
¸
¸
¸
¸
=

b
0
b
1
b
2
.
.
.
b
n
¸
¸
¸
¸
¸
¸
for k = 2 to n do
if d
k−1
= 0 then signal failure and stop
m = a
k
/d
k−1
d
k
= d
k
−mc
k−1
b
k
= b
k
−mb
k−1
if d
n
= 0 then signal failure and stop
x
n
= b
n
/d
n
for k = n −1 down to 1 do x
k
= (b
k
−c
k
x
k+1
)/d
k
28 6. Tridiagonal Matrices
Tridiagonal matrices arise in many situations: electrical circuits, heat ﬂow prob-
lems, the deﬂection of beams, and so on. Here we show how tridiagonal matrices are
used in cubic spline interpolation. We are given data (x
0
, y
0
), (x
1
, y
1
), · · · , (x
n
, y
n
).
The points x
1
, x
2
, · · · , x
n−1
are called interior nodes, and x
0
and x
n
are called bound-
ary nodes. The problem is to ﬁnd a cubic polynomial on each of the intervals
[x
0
, x
1
], [x
1
, x
2
], · · · , [x
n−1
, x
n
] such that, at each interior node, the cubic on the left
and the cubic on the right have the same heights, the same slopes, and the same
curvature (that is to say, the same second derivative). If we glue these cubics to-
gether we will obtain a cubic spline, which is a smooth curve passing through all the
data. To make the problem completely determined, we need conditions at the two
boundary nodes. Often these are taken to be the requirement that the spline has no
curvature (zero second derivatives) at the boundary nodes. The spline thus obtained
is called a natural spline. Splines have applications in CAD-CAM, font design, and
modeling.
(x
0
, y
0
)
(x
1
, y
1
)
(x
2
, y
2
)
(x
3
, y
3
)
(x
4
, y
4
)
x
0
x
1
x
2
x
3
x
4
s
0
s
1
s
2
s
3
s
4
FIGURE 1
How do we ﬁnd the cubic polynomials that make up the spline? If we knew what
slopes the spline curve should have at its nodes, then we could ﬁnd the cubic poly-
nomial on each interval using the method of Section 1 Exercise 8. Let s
0
, s
1
, · · · , s
n
be the unknown slopes at the nodes. For simplicity assume that the data is equally
spaced, that is, x
1
−x
0
= x
2
−x
1
= · · · = x
n
−x
n−1
= h. Then with some algebraic
eﬀort it is possible to show that the conditions described above force the slopes to
6. Tridiagonal Matrices 29
satisfy the following linear system:

2 1
1 4 1
1 4 1
1 4 1
.
.
.
1 4 1
1 4 1
1 4 1
1 2
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸

s
0
s
1
s
2
s
3
.
.
.
s
n−3
s
n−2
s
n−1
s
n
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
=
3
h

y
1
−y
0
y
2
−y
0
y
3
−y
1
y
4
−y
2
.
.
.
y
n−2
−y
n−4
y
n−1
−y
n−3
y
n
−y
n−2
y
n
−y
n−1
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
.
The ﬁrst and last equations come from the conditions at the boundary nodes. All
the other equations come from the conditions at the interior nodes. The system is
tridiagonal and therefore easy to solve, even when there is a large number of nodes.
Once the slopes s
0
, s
1
, · · · , s
n
are known, the cubic polynomial on each interval can
be found as a cubic Hermite interpolant. (See Section 1 Exercise 8.)
EXERCISES
1. Write down the system that the slopes of the natural spline interpolant of the
data (0,0), (1,1), (2,4), (3,1), (4,0) must satisfy. Solve it. Sketch the resulting spline
curve.
2. Run Gaussian elimination on the tridiagonal n + 1 ×n + 1 matrix of this section
and show that the pivots for rows 1,· · · , n − 1 are all > 3. This proves that not
only is this matrix nonsingular but also that no row exchanges are necessary. The
tridiagonal algorithm can therefore be used.
30 7. Systems with Many Solutions
7. SYSTEMS WITH MANY SOLUTIONS
Consider the single linear equation in one unknown ax = b. At ﬁrst glance we
would say that the solution is just x = b/a. But in fact there are three cases:
(1) If a ￿= 0, there is exactly one solution: x = b/a. For example, if 2x = 6, then
x = 6/2 = 3.
(2) If a = 0 and b ￿= 0, there is no solution. For example, 0x = 6 is not satisﬁed by
any x.
(3) If a = b = 0, there are inﬁnitely many solutions because 0x = 0 is satisﬁed by
every x.
It is a striking fact that exactly the same three cases are the only possibilities that
exist for systems of equations. We ﬁrst look at 2 × 2 examples.
Example 1: The system
u + v = 2
u − v = 0
in array form ￿

1 1
1 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
0 ￿

reduces by Gaussian elimination to ￿

1 1
0 −2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
−2 ￿

.
The unique solution is therefore v = 1 and u = 1. This is the nonsingular case we
have been considering in these notes up to now.
Example 2: The system
u + v = 2
2u + 2v = 0
in array form ￿

1 1
2 2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
0 ￿

reduces by Gaussian elimination to ￿

1 1
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
−4 ￿

.
Clearly we cannot use back substitution. Even worse, the second equation, 0u+0v =
−4, has no solution. This indicates that the entire system has no solution. The
coeﬃcient matrix is of course singular, and the system is said to be inconsistent.
7. Systems with Many Solutions 31
Example 3: The system
u + v = 2
2u + 2v = 4
in array form ￿

1 1
2 2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
4 ￿

reduces by Gaussian elimination to ￿

1 1
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
0 ￿

.
This time the second equation is trivially satisﬁed for all u and v. So we set v = c
where c is an arbitrary constant and try to continue with back substitution. The ﬁrst
equation then gives u = 2 −c. The solution is therefore
u = 2 −c
v = c
or written in vector form is
￿
u
v ￿

= ￿

2 −c
c ￿

or alternatively ￿

u
v ￿

= ￿

2
0 ￿

+c ￿

−1
1 ￿

.
We see that we have obtained an inﬁnite number of solutions parametrized by an
arbitrary constant. The coeﬃcient matrix is still singular as before, but the system
is said to be underdetermined.
In the following examples we present a systematic method for ﬁnding solutions of
more complicated systems. The method is an extension of Gauss-Jordan elimination.
Example 4: Suppose we have the 3 × 3 system

1 2 −1
2 4 1
3 6 −2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
7
7
¸
¸
.
Gaussian elimination produces

1 2 −1
0 0 3
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
3
0
¸
¸
,
32 7. Systems with Many Solutions
and Gauss-Jordan produces

1 2 0
0 0 3
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

3
3
0
¸
¸
.
This is as far as we can go. There is no way to get rid of the 2 in the ﬁrst equation.
The variables u, v, w now fall into two groups: leading variables, those that correspond
to columns that have a leading nonzero entry for some row, and free variables, those
that do not. In this case, u and w are leading variables and v is a free variable. Free
variables are set to arbitrary constants, so v = c. Leading variables are solved for in
terms of free variables. Working from the bottom up, we obtain the solution
u = 3 −2c
v = c
w = 1
or in vector form

u
v
w
¸
¸
=

3
0
1
¸
¸
+c

−2
1
0
¸
¸
.
Example 5: Suppose we have the 3 × 4 system

1 2 1 3
0 0 0 1
0 1 1 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
2
3
¸
¸
.
One step of Gaussian elimination, an exchange of the second and third rows, will
produce the staircase form

1 2 1 3
0 1 1 1
0 0 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
3
2
¸
¸
.
Now apply two steps of Gauss-Jordan to obtain

1 0 −1 0
0 1 1 0
0 0 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

−6
1
2
¸
¸
.
(These Gauss-Jordan steps are really not necessary, but they usually make the answer
somewhat easier to write down.) The free variable is w. The solution is therefore
u = −6 +c
v = 1 −c
w = c
x = 2
7. Systems with Many Solutions 33
or in vector form

u
v
w
x
¸
¸
¸
=

−6
1
0
2
¸
¸
¸
+c

1
−1
1
0
¸
¸
¸
.
Example 6: Suppose we have a 3 × 4 system that reduces to

1 2 0 3
0 0 2 1
0 0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
2
0
¸
¸
.
There are two free variables, v and x. Each is set to a diﬀerent arbitrary constant.
The solution is therefore
u = 2 −3c −2d
v = d
w = 1 −.5c
x = c
or in vector form

u
v
w
x
¸
¸
¸
=

2
0
1
0
¸
¸
¸
+c

−3
0
−.5
1
¸
¸
¸
+d

−2
1
0
0
¸
¸
¸
.
This time we have a an inﬁnite number of solutions parametrized by two arbitrary
constants.
In general, Gaussian elimination will put the array into echelon form

• ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 • ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 0 0 • ∗ ∗ ∗ ∗ ∗
0 0 0 0 0 0 0 0 •
0 0 0 0 0 0 0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

¸
¸
¸
¸
¸
,
and Gauss-Jordan elimination will put the array into row-reduced echelon form

• 0 ∗ 0 ∗ ∗ ∗ ∗ 0
0 • ∗ 0 ∗ ∗ ∗ ∗ 0
0 0 0 • ∗ ∗ ∗ ∗ 0
0 0 0 0 0 0 0 0 •
0 0 0 0 0 0 0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

¸
¸
¸
¸
¸
.
34 7. Systems with Many Solutions
In either case, we get a staircase pattern where the ﬁrst nonzero entry in each row
(indicated by bullets above) is a pivot. This is the precise mathematical deﬁnition of
pivot. For square nonsingular matrices, all pivots occur on the main diagonal. For
singular matrices, at least one pivot occurs to the right of the main diagonal. (Up to
now we have been referring to this informally as the case of a “zero pivot.”)
EXERCISES
1. Solutions can be written in many equivalent ways. Show that the following
expressions represent the same set of solutions.
(a) ￿

u
v ￿

= ￿

2
0 ￿

+c ￿

−1
1 ￿

and ￿

u
v ￿

= ￿

2
0 ￿

+c ￿ ￿

8
−8 ￿

(b)

u
v
w
¸
¸
=

3
0
0
¸
¸
+c

0
1
0
¸
¸
+d

0
0
1
¸
¸
and

u
v
w
¸
¸
=

3
0
0
¸
¸
+c ￿

0
1
0
¸
¸
+d ￿

0
1
1
¸
¸
2. Find the solutions of
(a)

1 2 3
1 4 5
1 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

3
4
2
¸
¸
(b)

2 2 2
2 6 4
4 8 6 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
6
10
¸
¸
(c)

1 2 −1
2 4 −2
−3 −6 3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

3
6
−9
¸
¸
(d)

1 2 1 8
0 1 0 2
2 5 2 20 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
−1
1
¸
¸
(e)

2 4 2 3
1 2 2 2
4 8 0 4 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
−1
8
¸
¸
(f)

2 3
4 6
4 3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

−9
−18
−3
¸
¸
7. Systems with Many Solutions 35
3. Solve each of the following 2 × 2 systems. Then graph each equation as a line and
give a geometric reason for the number of solutions of each system.
(a) ￿

1 3
3 2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
1 ￿

(b) ￿

2 1
−6 −3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

−1
−4 ￿

(c) ￿

3 −1
−6 2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
−4 ￿

4. Solve each of the following 3 × 3 systems. Then graph each equation as a plane
and give a geometric reason for the number of solutions of each system.
(a)

1 1 0
1 −1 0
0 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
0
0
¸
¸
(b)

2 0 0
0 0 3
0 0 3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

2
0
6
¸
¸
(c)

1 0 0
0 1 0
1 1 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
1
¸
¸
(d)

1 0 0
0 1 0
1 1 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
0
1
¸
¸
(e)

1 1 1
2 2 2
3 3 3 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

1
2
3
¸
¸
5. Explain why the following statements are true.
(a) If the system Ax = b has more unknowns than equations, then it has either no
solution or inﬁnitely many solutions. (Hint: There must be some free variables.)
(b) If the homogeneous system Ax = 0 has more unknowns than equations, then it
has inﬁnitely many solutions. (Hint: Why can’t the no solution case occur?)
6. If Ax = b has inﬁnitely many solutions, then Ax = c (diﬀerent right-hand side)
has how many possible solutions: none, one, or inﬁnitely many?
36 7. Systems with Many Solutions
7. Show with 3 × 2 examples that if a system Ax = b has more equations than
unknowns, then any one of the three cases of no solution, one solution, or inﬁnitely
many solutions can occur. (Hint: Just expand the 2 × 2 examples at the beginning
of this section to 3 ×2 examples.)
8. A nutritious breakfast drink can be made by mixing whole egg, milk, and orange
juice in a blender. The food energy and protein for these ingredients are given below.
How much of each should be blended to produce a drink with 560 calories of energy
and 24 grams of protein?
energy (kcal) protein (g)
1 egg 80 6
1 cup milk 180 9
1 cup orange juice 100 3
9. Consider the chemical reaction
a NO
2
+b H
2
O = c HNO
2
+d HNO
3
.
The reaction must be balanced, that is, the number of atoms of each element must
be the same before and after the reaction. For oxygen, for example, this would mean
2a +b = 2c +3d. While there are many possible choices for a, b, c, d that balance the
reaction, it is customary to use the smallest possible positive integers. Find such a
solution.
10. Find the equation of the circle in the form c
1
(x
2
+y
2
) +c
2
x +c
3
y +c
4
= 0 that
passes through the points (2,6), (2,0), (5,3).
8. Determinants 37
8. DETERMINANTS
Determinants have been known and studied for 300 years. Today, however, there
is far less emphasis on them than in the past. In modern mathematics, determinants
play an important but narrow role in theory and almost no role at all in computations.
We will make use of them in our study of eigenvalues in Section 9. The determinant
det(A) is a number associated with a square matrix A. For 2 × 2 and 3 × 3 matrices
it is deﬁned as follows
det ￿

a
11
a
12
a
21
a
22 ￿

= a
11
a
22
−a
21
a
12
det

a
11
a
12
a
13
a
21
a
22
a
23
a
31
a
32
a
33
¸
¸
=
a
11
a
22
a
33
+a
12
a
23
a
31
+a
13
a
32
a
21
−a
31
a
22
a
13
−a
21
a
12
a
33
−a
32
a
23
a
11
.
These are the familiar diagonal rules from high school. These rules cannot be extended
to larger matrices! For such matrices we must use the general deﬁnition:
det(A) = ￿

σ
sign(σ)a
1σ(1)
a
2σ(2)
a
3σ(3)
· · · a
nσ(n)
,
where σ(1), · · · , σ(n) is a permutation or rearrangement of the numbers 1, 2, · · · , n.
This means the determinant is the sum of all possible products of n entries of A,
where each product consists of entries taken from unique rows and columns. In
particular, it is easy to see that this is true for the 3 × 3 case written out above. For
example, the second term in the high-school formula comes from

∗ a
12

∗ ∗ a
23
a
31
∗ ∗
¸
¸
.
The symbol sign(σ) is equal to +1 or −1 depending on how the rows and columns
are chosen. We intentionally leave this deﬁnition of the determinant vague since it is
hard to understand, diﬃcult to motivate, and impossible to compute. It is important
to us only because from it the following properties of the determinant can be proved.
We will omit the proofs since in this section we want to get through the determinant
as quickly as possible. In a later section we present another approach that will make
clear where the mysterious determinant formula comes from and how the properties
are dertived.
38 8. Determinants
(1) The determinant of the identity matrix is 1.
det

1 0 0
0 1 0
0 0 1
¸
¸
= 1
(2) If A has a zero row or two equal rows or two rows that are multiples of each other,
then det(A) = 0.
det

1 4 2
0 0 0
5 7 1
¸
¸
= 0 det

1 4 2
3 5 2
1 4 2
¸
¸
= 0 det

1 4 2
3 5 2
2 8 4
¸
¸
= 0
(3) The determinant changes sign when two rows are exchanged.
det

1 2 2
5 7 1
3 1 3
¸
¸
= −det

3 1 3
5 7 1
1 2 2
¸
¸
(4) The typical Gaussian elimination operation of subtracting a multiple of one row
from another leaves the determinant unchanged.
det

1 2 2
3 1 3
5 7 1
¸
¸
= det

1 2 2
0 −5 −3
5 7 1
¸
¸
(5) If all the entries in a row have a common factor, then that factor can be taken
outside the determinant
det

6 3 12
5 7 1
2 5 2
¸
¸
= 3 det

2 1 4
5 7 1
2 5 2
¸
¸
(6) The determinant of the transpose of a matrix is the same as the determinant of
the matrix itself: det(A
T
) = det(A).
det

1 2 2
5 5 5
3 1 3
¸
¸
= det

1 5 3
2 5 1
2 5 3
¸
¸
8. Determinants 39
(7) The determinant of a (lower or upper) triangular matrix is the product of its
diagonal entries.
det

2 3 7
0 5 2
0 0 3
¸
¸
= 2 · 5 · 3 = 30
(8) The determinant of a product is the product of the determinants.
det(AB) = det(A) det(B)
(9) A is nonsingular if and only if det(A) ￿= 0.
Note that property 6 means that all the properties about rows also hold for
columns. Note also that property 9 can be added to the theorem of Section 5 to
obtain
Theorem. For any square matrix A the following statements are equivalent.
(a) A is nonsingular
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) Ax = 0 has the unique solution x = 0.
(e) det(A) ￿= 0.
Proof: We show (a) ⇔(e). If Ais nonsingular, then Gaussian elimination will produce
an upper triangular matrix with nonzero pivots. Since by properties 3 and 4 Gaussian
elimination changes at most the sign of the determinant, we have that det(A) ￿= 0.
If A is singular, then Gaussian elimination will produce an upper triangular matrix
with at least one zero pivot. By the same argument, det(A) = 0.
If the determinant is to have any practical value, there must be an eﬃcient way to
compute it. We could try to use the formula in the deﬁnition of the determinant. But
as we saw, the formula consists of a sum of products of n entries of A, where in each
product each factor is an entry from a diﬀerent row and a diﬀerent column. Since the
ﬁrst entry in a product can be chosen in n ways, the second in n − 1 ways, the third in
n − 2 ways, and so on, there are therefore n(n−1)(n−2)(n−3) · · · (2)(1) = n! diﬀerent
products in the sum. This means there are n! products that must be summed up,
each of which requires n − 1 multiplications, resulting in (n − 1)n! multiplications
in all. For a 25 × 25 matrix there would be 24·25! or 3.7×10
26
multiplications. A
computer that can perform a million multiplications a second would take 10
13
years
to compute this determinant! This is clearly unacceptable.
An alternate approach is suggested by the proof of property (10). Use Gaussian
elimination to triangulate the matrix. Then the determinant is the product of the
40 8. Determinants
diagonal entries (the pivots!) times +1 or −1 depending upon whether there was an
even or odd number of row exchanges. For example, the matrix at the beginning of
Section 4 was reduced to an upper triangular matrix by Gaussian elimination with
one row exchange.

1 2 3
2 4 9
2 6 7
¸
¸

1 2 3
0 0 3
0 2 1
¸
¸

1 2 3
0 2 1
0 0 3
¸
¸
We therefore have
det

1 2 3
2 4 9
2 6 7
¸
¸
= −det

1 2 3
0 2 1
0 0 3
¸
¸
= −1 · 2 · 3 = −6.
Since this method uses only Gaussian elimination, it requires n
3
/3 operations. For a
25 × 25 matrix this is only 5208 operations or only 0.005 seconds on our hypothetical
computer! The method above is an excellent way to compute the determinant, but it
takes just as many steps as Gaussian elimination. In fact, it is Gaussian elimination!
Why do we want to compute a determinant in the ﬁrst place? What can it tell
us about a matrix? Whether or not the matrix is singular? But we can determine
that just by doing Gaussian elimination. If we run into a zero pivot that cannot
be cured by row exchanges, then we know the matrix is singular. Otherwise we get
its LU factorization. So do we ever need to compute a determinant in practice?
No! Determinants are rarely computed outside a classroom. They are important,
however, for theoretical developments as we will see in the next section.
The determinant can be evaluated in other ways. In particular, there is the
cofactor expansion of the determinant. It expresses the determinant of a matrix as a
sum of determinants of smaller matrices. Here we use it to ﬁnd the determinant of
the matrix above:
det

1 2 3
2 4 9
2 6 7
¸
¸
= 1 det ￿

4 9
6 7 ￿

−2 det ￿

2 9
2 7 ￿

+ 3 det ￿

2 4
2 6 ￿

= 1(28 −54) −2(14 −18) + 3(12 −8)
= −26 + 8 + 12
= −6.
In words, the determinant of the matrix on the left is the sum of the entries of its
ﬁrst row times the cofactors of its ﬁrst row. A cofactor is the determinant of the
2 ×2 matrix obtained from the original matrix by crossing out a particular row and
a column, with an appropriate sign placed in front of the determinant. In particular,
the cofactor of the ﬁrst entry is the determinant of the matrix obtained by crossing
8. Determinants 41
out the ﬁrst row and ﬁrst column; the cofactor of the second entry is the determinant
of the matrix obtained by crossing out the ﬁrst row and the second column with a
negative sign in front; and the cofactor of the third entry is the determinant of the
matrix obtained by crossing out the ﬁrst row and the third column. Here is another
cofactor expansion of the same matrix:
det

1 2 3
2 4 9
2 6 7
¸
¸
= −2 det ￿

2 9
2 7 ￿

+ 4 det ￿

1 3
2 7 ￿

−6 det ￿

1 3
2 9 ￿

= −2(14 −18) + 4(7 −6) −6(9 −6)
= 8 + 4 −18
= −6.
This time we expanded with respect to the second column. Note that the 2 ×2 ma-
trices arise in the same way, by crossing out the row and column of the corresponding
entry. Note also the signs. In general the signs in the deﬁnition of the cofactors form
a checkerboard pattern:

+ − + − · · ·
− + − + · · ·
+ − + − · · ·
− + − + · · ·
.
.
.
.
.
.
.
.
.
.
.
.
¸
¸
¸
¸
¸
¸
.
Here’s an example of a cofactor expansion of the determinant of a 4 ×4 matrix:
det

1 1 2 4
1 0 4 2
1 −1 0 0
2 2 2 6
¸
¸
¸
= 1 det

0 4 2
−1 0 0
2 2 6
¸
¸
−1 det

1 4 2
1 0 0
2 2 6
¸
¸
+2 det

1 0 2
1 −1 0
2 2 6
¸
¸
−4 det

1 0 4
1 −1 0
2 2 2
¸
¸
.
We expanded with respect to the ﬁrst row. In this case we are now faced with
ﬁnding four 3 ×3 determinants. We could use either cofactor expansion or the high-
school formula on each of these smaller determinants. (Note that we should have
expanded with respect to the third row because then we would have had only two
3 × 3 determinants to evaluate.) It is becoming clear that the method of cofactor
expansion requires a great deal of computation. Just think about the 5 ×5 case! In
fact, it generally requires exactly the same number of multiplications as the formula
that deﬁned the determinant in the ﬁrst place. It is therefore extremely impractical.
It does, however, have some value in theoretical considerations and in the hand
42 8. Determinants
computation of determinants of matrices that contain algebraic expressions. For
example, to compute
det

x y 1
2 8 1
4 7 1
¸
¸
(ignoring that fact that we have the high-school formula for this!) we would use
cofactor expansion with respect to the ﬁrst row. We would deﬁnitely not want to use
Gaussian elimination here.
As long as we have come this far we might as well write down the general formula
for the cofactor expansion of the determinant of a matrix with respect to its ith row.
It is
det A = a
i1
[(−1)
i+1
det M
i1
] +a
i2
[(−1)
i+2
det M
i2
] + · · · +a
in
[(−1)
i+n
det M
in
]
where M
ij
is the submatrix formed by deleting the ith row and jth column of A. (The
formula for expansion with respect to columns is similar.) Note that the cofactor is
oﬃcially deﬁned as the entire quantity in brackets, that is, as the determinant of the
submatrix M
ij
times (−1)
i+j
. The formula is not very illuminating, and we make no
attempt to prove it.
EXERCISES
1. Compute the determinants by Gaussian elimination
(a)

1 3 1
1 1 4
0 2 0
¸
¸
(b)

1 1 1
3 3 −1
2 −2 2
¸
¸
(c)

2 1 3
−2 5 1
4 2 4
¸
¸
(d)

1 1 2 4
1 0 4 2
1 −1 0 0
2 2 2 6
¸
¸
¸
(e)

1 2 3 1
1 3 3 2
2 4 3 3
1 1 1 1
¸
¸
¸
8. Determinants 43
(f)

0 0 1 0
1 0 0 0
0 0 0 1
0 1 0 0
¸
¸
¸
(Use property (3) for a quick solution.)
2. Give 2 × 2 examples of the following.
(a) A ￿= 0 and det(A) = 0.
(b) A ￿= B and det(A) = det(B)
(c) det(A+B) ￿= det(A) + det(B)
3. Prove the following.
(a) det(A
k
) = (det(A))
k
for any positive integer k.
(b) det(A
−1
) = 1/ det(A)
(c) det(BAB
−1
) = det(A)
(d) det(cA) = c
n
det(A) where A is n ×n.
4. For the matrix in Exercise 1(a) ﬁnd det(A
−1
) and det(A
T
) without doing any
work.
5. Use cofactor expansions to evaluate the following determinants.
(a)

3 7 5 7
0 3 6 0
1 1 7 2
0 0 1 0
¸
¸
¸
(b)

1 1 2 4
1 0 4 2
1 −1 0 0
2 2 2 6
¸
¸
¸
(c)

x y 1
2 8 1
4 7 1
¸
¸
(d)

2 −x 2 2
1 2 −x 0
1 0 2 −x
¸
¸
6. Suppose we have a square n × n matrix that looks like A = ￿

B C
0 D ￿

where B,
C, and D are submatrices of sizes p ×p, p ×(n−p), and (n−p) ×(n−p). (A is said
to be partitioned into blocks.) Show that det(A) = det(B) det(D). (Use Gaussian
elimination.)
7. True or false? “If det(A) = 0, then the homogeneous system Ax = 0 has nonzero
solutions.”
44 9. Eigenvalues
9. EIGENVALUES
There are many problems in engineering and science where, given a square matrix
A, it is necessary to know if there is a number λ (read “lambda”) and a nonzero vector
x such that Ax = λx. The number λ is called an eigenvalue of A and the vector x
is called an eigenvector associated with λ. (“Eigen” is a German word meaning “its
own” or “peculiar to it.”) For example

5 4 4
−7 −3 −1
7 4 2
¸
¸

1
−1
1
¸
¸
= 5

1
−1
1
¸
¸
.
So that 5 is an eigenvalue of the matrix above and

1
−1
1
¸
¸
is an associated eigenvector.
Note that any multiple of this vector is also an eigenvector. That is, any vector of
the form c

1
−1
1
¸
¸
is an eigenvector associated with the eigenvalue 5. So what we
actually have is an inﬁnite family of eigenvectors. Note also that this inﬁnite family
can be represented in many other ways such as, for example, c ￿

−2
2
−2
¸
¸
.
Suppose we want to ﬁnd the eigenvalues of a matrix A. We start by rewriting
the equation Ax = λx as Ax=λIx or Ax − λIx=0 or (A−λI)x=0. We therefore
want to ﬁnd those numbers λ for which the homogeneous system (A−λI)x = 0 has
nonzero solutions x. By the theorem of the previous section, this is equivalent to
asking for those numbers λ that make the matrix A−λI singular or, in other words,
for which det(A − λI) = 0. This equation is called the characteristic equation of A.
The left-hand side is a polynomial in λ and is called the characteristic polynomial of
A.
Example 1: Find the eigenvalues of the matrix
A = ￿

4 2
−1 1 ￿

.
First set
A−λI = ￿

4 2
−1 1 ￿

− ￿

λ 0
0 λ ￿

= ￿

4 −λ 2
−1 1 −λ ￿

.
The characteristic equation of A is det(A−λI) = 0, which can be rewritten as follows:
det ￿

4 −λ 2
−1 1 −λ ￿

=0
9. Eigenvalues 45
(4 −λ)(1 −λ) −2(−1)=0
λ
2
−5λ + 6=0
(λ −2)(λ −3)=0.
The eigenvalues of A are therefore λ = 2 and λ = 3. We can go further and ﬁnd the
associated eigenvectors. For the case λ = 2 we wish to ﬁnd nonzero solutions of the
system (A−2I)x = 0, which can be rewritten as ￿

4 −2 2
−1 1 −2 ￿
￿
u
v ￿

= ￿

0
0 ￿

.
We use Gaussian elimination in array form ￿

2 2
−1 −1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

to get ￿

2 2
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

.
The solution is v = c and u = −c, or in vector form ￿

u
v ￿

= c ￿

−1
1 ￿

.
The case λ = 3 is similar. Write (A−3I)x = 0 as ￿

4 −3 2
−1 1 −3 ￿
￿
u
v ￿

= ￿

0
0 ￿

and then solve
￿
1 2
−1 −2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

to get ￿

1 2
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

.
The solution is v = c and u = −2c, or in vector form ￿

u
v ￿

= c ￿

−2
1 ￿

.
Therefore, for each of the two eigenvalues we have found an inﬁnite family of eigen-
vectors parametrized by a single arbitrary constant.
46 9. Eigenvalues
Example 2: Things can become more complicated as the size of the matrix increases.
Consider the matrix
A =

2 3 0
4 3 0
0 0 6
¸
¸
.
Proceeding as before we have the characteristic equation det(A −λI) = 0 rewritten
as
det

2 −λ 3 0
4 3 −λ 0
0 0 6 −λ
¸
¸
= 0,
(2 −λ)(3 −λ)(6 −λ) −3 · 4(6 −λ)=0,
[(2 −λ)(3 −λ) −3 · 4](6 −λ)=0,

2
−5λ −6](6 −λ)=0,
−(λ −6)
2
(λ + 1)=0.
Here we have two eigenvalues, λ = 6 and λ = −1. To ﬁnd the eigenvectors for λ = −1,
solve (A−(−1)I)x = 0 or

2 + 1 3 0
4 3 + 1 0
0 0 6 + 1
¸
¸

u
v
w
¸
¸
=

0
0
0
¸
¸
by reducing

3 3 0
4 4 0
0 0 7 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
¸
¸
to

3 3 0
0 0 0
0 0 7 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
¸
¸
.
The solution is w = 0, v = c, and u = −c, or in vector form

u
v
w
¸
¸
= c

−1
1
0
¸
¸
.
For the case λ = 6, solve (A−6I)x = 0 or

2 −6 3 0
4 3 −6 0
0 0 6 −6
¸
¸

u
v
w
¸
¸
=

0
0
0
¸
¸
9. Eigenvalues 47
by reducing

−4 3 0
4 −3 0
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
¸
¸
to

−4 3 0
0 0 0
0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
¸
¸
.
The solution is w = c, v = d, and u =
3
4
d, or in vector form

u
v
w
¸
¸
= c

0
0
1
¸
¸
+d

3
4
1
0
¸
¸
.
We have therefore obtained an inﬁnite family of eigenvectors parametrized by two
arbitrary constants. Note that this inﬁnite family can be represented in many other
ways such as, for example,

u
v
w
¸
¸
= c ￿

0
0
2
¸
¸
+d ￿

3
4
1
¸
¸
.
So in this example we have a 3 × 3 matrix with only two distinct eigenvalues. We
write λ = −1, 6, 6 to indicate that 6 is repeated root of the characteristic equation,
and we say that 6 has multiplicity 2. For λ = 6 we found two linearly indepen-
dent eigenvectors such that arbitrary linear combinations of them generate all other
eigenvectors. Intuitively, “linearly independent” means “essentially diﬀerent.” We
will not discuss the precise mathematical meaning of independence here (see Section
16), except to say that this does not always happen as in the following example.
Example 3: It is possible for a repeated eigenvalue to have only one independent
eigenvector. Consider the matrix
A = ￿

2 1
0 2 ￿

,
which is easily seen to have characteristic equation (λ − 2)
2
= 0 and therefore the
repeated eigenvalue λ = 2, 2. But in solving the system (A−2I)x = 0 we obtain ￿

2 −2 1
0 2 −2 ￿
￿
u
v ￿

= ￿

0
0 ￿

48 9. Eigenvalues
or
￿
0 1
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

.
So u = c and v = 0 or in vector form ￿

u
v ￿

= c ￿

1
0 ￿

.
Therefore the eigenvalue λ = 2, 2 has only one independent eigenvector.
Example 4: Even worse, a matrix can have no (real) eigenvalues at all. For example
the matrix
A = ￿

0 1
−1 0 ￿

has characteristic equation λ
2
+ 1 = 0 which has no real solutions.
In this section we have seen that, in order to understand eigenvalues, we have to know
something about determinants. In fact, the characteristic polynomial is deﬁned as a
determinant. Because of this, in practice it is very diﬃcult to compute characteristic
polynomials for large matrices. Even when this can be done, the problem of ﬁnding
the roots of a high degree polynomial is numerically unstable. For practical computa-
tions, a much more sophisticated algorithm called the QR method, which has nothing
to do with characteristic polynomials, is used to ﬁnd eigenvalues and eigenvectors.
Although the characteristic polynomial is important in theory, in practice it is rarely,
if ever, computed.
EXERCISES
1. Find the eigenvalues and eigenvectors of the following matrices.
(a) ￿

1 1
0 2 ￿

(b)

5 0 2
0 1 0
−4 0 −1
¸
¸
(c)

2 2 2
1 2 0
1 0 2
¸
¸
9. Eigenvalues 49
(d)

6 4 4
−7 −2 −1
7 4 3
¸
¸
Hint: Expand in cofactors of the ﬁrst row.
(e)

0 2 2
2 0 −2
2 −2 0
¸
¸
(f)

−2 0 0 0
0 −2 5 −5
0 0 3 0
0 0 0 3
¸
¸
¸
2. Suppose you and I are computing eigenvectors. We get the results below. Explain
in what sense we got the same answers, or not.
(a) You get

−3
9
6
¸
¸
and I get

4
−12
−8
¸
¸
.
(b) You get

1
1
0
¸
¸
,

0
1
1
¸
¸
and I get

1
2
1
¸
¸
,

1
0
−1
¸
¸
.
(c) You get

1
1
0
¸
¸
,

0
1
1
¸
¸
and I get

1
2
1
¸
¸
,

1
0
−1
¸
¸
,

1
1
0
¸
¸
.
(d) You get

1
1
0
¸
¸
,

0
1
1
¸
¸
and I get

1
2
1
¸
¸
,

2
4
2
¸
¸
.
3. Prove the following.
(a) A and A
T
have the same eigenvalues. (Hint: They have the same characteristic
polynomials.)
(b) A and BAB
−1
have the same eigenvalues. (Hint: They have the same charac-
teristic polynomials.)
(c) A = S

λ
1
λ
2
.
.
.
λ
n
¸
¸
¸
¸
S
−1
has eigenvalues λ
1
, λ
2
, · · · , λ
n
.
(d) If Ax = λx, then A
2
x = λ
2
x, A
3
x = λ
3
x, · · ·.
(e) If Ax = λx and A is nonsingular, then A
−1
x =
1
λ
x
(f) If A is singular, then λ = 0 must be an eigenvalue of A.
50 9. Eigenvalues
(g) If A is triangular, then its eigenvalues are its diagonal entries a
11
, a
22
, · · · , a
nn
.
4. If A = ￿

B C
0 D ￿

is the matrix of Section 8 Exercise 6, then show that the
eigenvalues of A are the eigenvalues of B together with the eigenvalues of D. (Hint:
Show det(A−λI) = det(B −λI) det(D −λI).)
5. Find the eigenvalues and associated eigenvectors of each of the following matrices.
(a)

2 0 0
0 2 0
0 0 2
¸
¸
(b)

2 1 0
0 2 0
0 0 2
¸
¸
(c)

2 1 0
0 2 1
0 0 2
¸
¸
10. Diagonalization 51
10. DIAGONALIZATION
Example 1: Let’s look back at Example 1 of the previous section
A = ￿

4 2
−1 1 ￿

which had two eigenvalues, λ = 2 and λ = 3. If we write the two equations Ax
1
= 2x
1
and Ax
2
= 3x
2
, where x
1
and x
2
are the associated eigenvectors, we obtain ￿

4 2
−1 1 ￿
￿
−1
1 ￿

= 2 ￿

−1
1 ￿

and ￿

4 2
−1 1 ￿
￿
−2
1 ￿

= 3 ￿

−2
1 ￿

.
The two eigenvectors can be lined up to form the columns of a matrix S so that the
two equations above can be combined into one matrix equation AS = SD where D
is the diagonal matrix of eigenvalues: ￿

4 2
−1 1 ￿
￿
−1 −2
1 1 ￿

= ￿

−1 −2
1 1 ￿
￿
2 0
0 3 ￿

.
This equation can be rewritten as A = SDS
−1
: ￿

4 2
−1 1 ￿

= ￿

−1 −2
1 1 ￿
￿
2 0
0 3 ￿
￿
−1 −2
1 1 ￿

−1
.
What just happened in this example is so important that we will illustrate it
for the general case. Suppose the n × n matrix A has eigenvalues λ
1
, λ
2
, · · · , λ
n
with linearly independent associated eigenvectors v
1
, v
2
· · · , v
n
, then the equations
Av
1
= λ
1
v
1
, Av
2
= λ
2
v
2
, · · · , Av
n
= λ
n
v
n
can be written in matrix form as
A

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
Av
1
Av
2
· · · Av
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
λ
1
v
1
λ
2
v
2
· · · λ
n
v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸

λ
1
λ
2
.
.
.
λ
n
¸
¸
¸
¸
52 10. Diagonalization
This matrix equation is of the form AS = SD. By multiplying on the right by
S
−1
we obtain A = SDS
−1
or
A =

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸

λ
1
λ
2
.
.
.
λ
n
¸
¸
¸
¸

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
−1
.
This last step is possible only if S is invertible. S will in fact be invertible if its
columns, which are the eigenvectors v
1
, v
2
· · · , v
n
, are linearly independent. This of
course leaves a giant gap in our discussion since at this point we still don’t know what
“linear independent” means. We will ﬁll this gap in Sections 16 and 19. Our method
for ﬁnding eigenvectors, which is to solve (A − λI)x = 0 by Gaussian elimination,
does in fact produce linearly independent eigenvectors, one for each free variable.
The only question is are there enough linearly independent eigenvectors to form a
square matrix S? If the answer is yes, then A can be factored into A = SDS
−1
where
S is invertible and D is diagonal, and A is called diagonalizable. If the answer is no,
then A is not diagonalizable.
Example 2: The matrix of Example 2 of the previous section is diagonalizable. Just
line up its eigenvectors to form the columns of S and write

2 3 0
4 3 0
0 0 6
¸
¸
=

−1 0
3
4
1 0 1
0 1 0
¸
¸

−1 0 0
0 6 0
0 0 6
¸
¸

−1 0
3
4
1 0 1
0 1 0
¸
¸
−1
.
Note that the diagonal factorization of a matrix is not completely unique. For exam-
ple,

2 3 0
4 3 0
0 0 6
¸
¸
=

1 0 3
−1 0 4
0 1 0
¸
¸

−1 0 0
0 6 0
0 0 6
¸
¸

1 0 3
−1 0 4
0 1 0
¸
¸
−1
is an equally valid factorization.
Whether or not a matrix can be diagonalized has important consequences for
the matrix and what we can do with it. It is one of the paramount questions in linear
algebra. We now give some conditions that insure that a matrix can be diagonalized.
1. An n × n matrix is diagonalizable if and only if it has n linearly independent
eigenvectors. In Example 2 above, even though there are only two eigenvalues, there
are three independent eigenvectors, and they are used to form the columns of S. If
a matrix does not have enough independent eigenvectors, as in Section 9 Examples
3 and 4, then it is not diagonalizable. Such matrices are called defective.
2. An n ×n matrix is diagonalizable if it has n real and distinct eigenvalues. In Ex-
ample 1 above, there are two distinct eigenvalues, each eigenvalue has an associated
10. Diagonalization 53
eigenvector, and these eigenvectors can be used to form the columns of S since they
are independent. But why do distinct eigenvalues insure diagonability in general?
This follows from the fact, to be proved later, that eigenvectors associated with dis-
tinct eigenvalues are always independent. (See Section 22.)
3. It would be helpful if we could decide if a matrix is diagonalizable just by looking
at it, without having to go through the tedious process of determining if it has enough
independent eigenvectors. Unfortunately there is no simple way to do this. But there
is an important class of matrices that are automatically diagonalizable. These are
the symmetric matrices. A deep theorem in linear algebra, called The Spectral The-
orem, says in part that all symmetric matrices are diagonalizable. (See Section 22.)
A nonsymmetric matrix may or may not be diagonalizable, but, fortunately, many
of the matrices that arise in physics and engineering are symmetric and are therefore
diagonalizable.
EXERCISES
1. Write diagonal factorizations for each of the matrices in Section 9 Exercise 1.
2. If A = SDS
−1
, then show A
n
= SD
n
S
−1
.
3. Decide which of the following matrices are diagonalizable just by looking at them.
(a)

0 −2 2
−2 0 −2
2 2 2
¸
¸
(b)

0 2 2
2 0 −2
2 −2 0
¸
¸
(c)

0 2 2
−2 0 2
2 −2 0
¸
¸
4. If A is 2 × 2 with eigenvalues λ
1
= 6 and λ
2
= 7 and associated eigenvectors
v
1
= ￿

5
9 ￿

and v
2
= ￿

2
4 ￿

, then ﬁnd the following.
(a) The characteristic polynomial of A.
(b) det(A)
(c) A
(d) The eigenvalues of A
2
.
(e) det(A
2
)
54 11. Matrix Exponential
11. MATRIX EXPONENTIAL
So far we have developed a simple algebra for square matrices. We can add,
subtract, and multiply them, and therefore expressions like I +2A−3A
2
+A
3
make
sense. Of course we cannot divide matrices, but A
−1
can be thought of as the
reciprocal of a matrix (deﬁned only if A is nonsingular). Is it possible for us to go
further and give meaning to expressions like

A, e
A
, ln(A), sin(A), cos(A), . . .?
Under certain conditions we can, but, because of its importance in applications, we
will focus only on the matrix exponential e
A
. To deﬁne it we use the Taylor series
for the real exponential function:
e
x
=
∞ ￿

n=0
x
n
n!
= 1 +x +
x
2
2!
+
x
3
3!
+ · · · for −∞< x < ∞.
This inﬁnite series converges to e
x
for any value of x and therefore can be taken as
the deﬁnition of e
x
. We use it as the starting point for the matrix exponential by
simply deﬁning
e
A
=
∞ ￿

n=0
1
n!
A
n
= I +A+
1
2!
A
2
+
1
3!
A
3
+ · · ·
for a square matrix A. Does this make sense? Let’s try an example:
exp ￿￿

0 0
0 0 ￿￿

= ￿

1 0
0 1 ￿

+ ￿

0 0
0 0 ￿

+
1
2! ￿

0 0
0 0 ￿

2
+ · · · = ￿

1 0
0 1 ￿

(Note that e
A
is also written as exp(A).) The exponential of the zero matrix is
therefore the identity matrix. Let’s try another example:
exp ￿￿

2 0
0 3 ￿￿

= ￿

1 0
0 1 ￿

+ ￿

2 0
0 3 ￿

+
1
2! ￿

2 0
0 3 ￿

2
+
1
3! ￿

2 0
0 3 ￿

3
+ · · ·
= ￿

1 0
0 1 ￿

+ ￿

2 0
0 3 ￿

+ ￿

2
2
2!
0
0
3
2
2! ￿

+ ￿

2
3
3!
0
0
3
3
3! ￿

+ · · ·
=

∞ ￿

n=0
2
n
n!
0
0
∞ ￿

n=0
3
n
n!
¸
¸
¸
= ￿

e
2
0
0 e
3 ￿

.
It is clear that to exponentiate a diagonal matrix you just exponentiate its diagonal
entries. Note that in both computations above the inﬁnite series of matrices con-
verged (trivially in the ﬁrst example). Does this always happen? Yes! It can be
11. Matrix Exponential 55
shown that the inﬁnite series for e
A
converges for any square matrix A whatever.
(We omit the proof.) Therefore e
A
exists for any square matrix A.
Accepting this, we still have the problem of how to compute e
A
for more compli-
cated matrices than those in the two previous examples. We can use two properties of
the matrix exponential to help us. The ﬁrst is that if AB = BA then e
A+B
= e
A
e
B
.
(We omit the proof.) This just says that, if A and B commute, then for these matri-
ces the matrix exponential satisﬁes the familiar law of exponents. We use this fact
to compute the following:
exp ￿￿

2 3
0 2 ￿￿

= exp ￿￿

2 0
0 2 ￿

+ ￿

0 3
0 0 ￿￿

= exp ￿￿

2 0
0 2 ￿￿

exp ￿￿

0 3
0 0 ￿￿

= ￿

e
2
0
0 e
2 ￿ ￿ ￿

1 0
0 1 ￿

+ ￿

0 3
0 0 ￿

+
1
2! ￿

0 3
0 0 ￿

2
+ · · · ￿

= ￿

e
2
0
0 e
2 ￿
￿￿
1 0
0 1 ￿

+ ￿

0 3
0 0 ￿

+
1
2! ￿

0 0
0 0 ￿

+ · · · ￿

= ￿

e
2
0
0 e
2 ￿
￿
1 3
0 1 ￿

= ￿

e
2
3e
2
0 e
2 ￿

.
(Don’t forget to ﬁrst show the two matrices above commute in order to justify the
use of the law of exponents.)
The second helpful property of matrix exponentials is that if A = SDS
−1
then
e
A
= Se
D
S
−1
. The proof is so simple we exhibit it here:
e
A
=
∞ ￿

n=0
1
n!
(SDS
−1
)
n
=
∞ ￿

n=0
1
n!
SD
n
S
−1
(See Section 10 Exercise 2.)
= S ￿

∞ ￿

n=0
1
n!
D
n ￿

S
−1
= Se
D
S
−1
Given the diagonal factorization ￿

4 −5
2 −3 ￿

= ￿

1 5
1 2 ￿
￿
−1 0
0 2 ￿
￿
1 5
1 2 ￿

−1
56 11. Matrix Exponential
we can therefore immediately write down
exp ￿￿

4 −5
2 −3 ￿￿

= ￿

1 5
1 2 ￿
￿
e
−1
0
0 e
2 ￿
￿
1 5
1 2 ￿

−1
.
We could multiply out the right-hand side, or we might just want to leave it in this
form. If A is defective, that is, if A doesn’t have a diagonalization factorization, then
there are more sophisticated ways to compute e
A
. We will not pursue them here. In
applications to ODE’s we will need to compute matrix exponentials of the form e
At
.
But this is easy for diagonalizable matrices like the one above since ￿

4 −5
2 −3 ￿

t = ￿

1 5
1 2 ￿
￿
−t 0
0 2t ￿
￿
1 5
1 2 ￿

−1
and therefore
exp ￿￿

4 −5
2 −3 ￿

t ￿

) = ￿

1 5
1 2 ￿
￿
e
−t
0
0 e
2t ￿
￿
1 5
1 2 ￿

−1
.
There is one more property of matrix exponentials that we will need in appli-
cations. It is analogous to the derivative formula
d
dt
e
at
= ae
at
. For the matrix
exponential it is just
d
dt
e
At
= Ae
At
. The proof follows:
d
dt
e
At
=
d
dt
∞ ￿

n=0
1
n!
(At)
n
=
d
dt
∞ ￿

n=0
1
n!
A
n
t
n
=
∞ ￿

n=1
1
n!
A
n
nt
n−1
= A
∞ ￿

n=1
1
(n −1)!
A
n−1
t
n−1
= A
∞ ￿

n=1
1
(n −1)!
(At)
n−1
= A
∞ ￿

n=0
1
n!
(At)
n
= Ae
At
.
11. Matrix Exponential 57
EXERCISES
1. Find e
A
where A is equal to each of the matrices of Section 9 Exercise 1.
2. Find e
At
where A is equal to each of the matrices of Section 9 Exercise 1.
3. Show exp ￿￿

2 3
0 2 ￿

t ￿

= ￿

e
2t
3te
2t
0 e
2t ￿

.
4. Verify the formula
d
dt
e
At
= Ae
At
where A is equal to the following matrices.
(a) ￿

2 0
0 3 ￿

(b) ￿

2 3
0 2 ￿

5. Prove the following equalities.
(a) exp ￿￿

0 β
−β 0 ￿￿

= ￿

cos β sin β
−sin β cos β ￿

(Use the series deﬁnition of the matrix
exponential.)
(b) exp ￿￿

α β
−β α ￿￿

= ￿

e
α
cos β e
α
sin β
−e
α
sin β e
α
cos β ￿

(Use the law of exponents.)
6. If Av = λv, then show e
A
v = e
λ
v.
7. Prove (e
A
)
−1
= e
−A
and conclude that e
A
is nonsingular for any square matrix
A.
58 12. Diﬀerential Equations
12. DIFFERENTIAL EQUATIONS
We recall the diﬀerential equation ˙ y = ay that governs exponential growth and
decay. The general solution is y(t) = Ce
at
. This fact will serve as a model for all that
follows.
Example 1: Suppose we want to solve the following linear system of ﬁrst-order ordi-
nary diﬀerential equations with initial conditions:.
˙ x = 4x −5y
˙ y = 2x −3y
x(0) = 8
y(0) = 5
We can write this system in matrix notation as ￿

˙ x
˙ y ￿

= ￿

4 −5
2 3 ￿￿

x
y ￿
￿
x(0)
y(0) ￿

= ￿

8
5 ￿

or letting u(t) = ￿

x(t)
y(t) ￿

we can write it as
˙ u = ￿

4 −5
2 3 ￿

u u(0) = ￿

8
5 ￿

.
If we let A be the matrix deﬁned above (called the coeﬃcient matrix), then the
system becomes simply ˙ u = Au. The solution of the system ˙ u = Au with initial con-
dition u(0) is u(t) = e
At
u(0). This fact follows immediately from the computations
d
dt
(e
At
u(0)) = A(e
At
u(0)) and e
A0
u(0) = Iu(0) = u(0). For the example above, the
solution would just be
u(t) = exp ￿￿

4 −5
2 3 ￿

t ￿￿

8
5 ￿

.
Since the coeﬃcient matrix has the diagonal factorization ￿

4 −5
2 −3 ￿

= ￿

1 5
1 2 ￿￿

−1 0
0 2 ￿￿

1 5
1 2 ￿

−1
,
we have
u(t) = ￿

1 5
1 2 ￿￿

e
−t
0
0 e
2t ￿￿

1 5
1 2 ￿

−1 ￿

8
5 ￿

.
To ﬁnd the ﬁnal solution it looks like we are going to have to compute an inverse.
But in fact this can be avoided by writing ￿

1 5
1 2 ￿

−1 ￿

8
5 ￿

= ￿

c
1
c
2 ￿

12. Diﬀerential Equations 59
as
￿
1 5
1 2 ￿￿

c
1
c
2 ￿

= ￿

8
5 ￿

,
which is just a linear system. Solving by Gaussian elimination we obtain ￿

c
1
c
2 ￿

= ￿

3
1 ￿

.
And putting this back into u(t) we get
u(t) = ￿

1 5
1 2 ￿￿

e
−t
0
0 e
2t ￿￿

3
1 ￿

= ￿

1 5
1 2 ￿￿

3e
−t
e
2t ￿

= ￿

3e
−t
+ 5e
2t
3e
−t
+ 2e
2t ￿

= 3e
−t ￿

1
1 ￿

+e
2t ￿

5
2 ￿

.
The solution in terms of the individual functions x and y is
x(t) = 3e
−t
+ 5e
2t
y(t) = 3e
−t
+ 2e
2t
.
If no initial conditions are given, then c
1
and c
2
would have to be carried through to
the end. The solution would then look like
u(t) = ￿

1 5
1 2 ￿￿

e
−t
0
0 e
2t ￿￿

c
1
c
2 ￿

= ￿

c
1
e
−t
+ 5c
2
e
2t
c
1
e
−t
+ 2c
2
e
2t ￿

= c
1
e
−t ￿

1
1 ￿

+c
2
e
2t ￿

5
2 ￿

.
We have expressed the solution in matrix form and in vector form. Note that the
vector form is a linear combination of exponentials involving the eigenvalues times
the associated eigenvectors. In fact if we set t = 0 in the vector form, then from the
initial conditions we obtain
c
1 ￿

1
1 ￿

+c
2 ￿

5
2 ￿

= ￿

8
5 ￿

60 12. Diﬀerential Equations
or
￿
1 5
1 2 ￿
￿
c
1
c
2 ￿

= ￿

8
5 ￿

,
which is the same system for the c’s that we obtained above. So the vector form of
the solution carries all the information we need. This suggests that we really don’t
need the matrix factorization at all. To ﬁnd the solution to ˙ u = Au, just ﬁnd the
eigenvalues and eigenvectors of A, and, assuming there are enough eigenvectors, write
down the solution in vector form.
Example 2: Let’s try another system:

˙ x
˙ y
˙ z
¸
¸
=

2 3 0
4 3 0
0 0 6
¸
¸

x
y
z
¸
¸
.
Since the coeﬃcient matrix has the diagonal factorization

2 3 0
4 3 0
0 0 6
¸
¸
=

−1 0 3
1 0 4
0 1 0
¸
¸

−1 0 0
0 6 0
0 0 6
¸
¸

−1 0 3
1 0 4
0 1 0
¸
¸
−1
we can immediately write down the solution as

x
y
z
¸
¸
=

−1 0 3
1 0 4
0 1 0
¸
¸

e
−t
0 0
0 e
6t
0
0 0 e
6t
¸
¸

−1 0 3
1 0 4
0 1 0
¸
¸
−1

x(0)
y(0)
z(0)
¸
¸
=

−1 0 3
1 0 4
0 1 0
¸
¸

e
−t
0 0
0 e
6t
0
0 0 e
6t
¸
¸

c
1
c
2
c
3
¸
¸
=

−1 0 3
1 0 4
0 1 0
¸
¸

c
1
e
−t
c
2
e
6t
c
3
e
6t
¸
¸
= c
1
e
−t

−1
1
0
¸
¸
+c
2
e
6t

0
0
1
¸
¸
+c
3
e
6t

3
4
0
¸
¸
.
Since no initial conditions were given, we have arbitrary constants in the solution.
Note that once we recognize the general form of the solution, we can just write it
down without going through the matrix exponential at all. In general, it is clear that
if A is diagonalizable, that is, if it has eigenvalues λ
1
, λ
2
, · · · , λ
n
and independent
eigenvectors v
1
, v
2
, · · · , v
n
, then the solution to ˙ u = Au has the form
u(t) = c
1
e
λ
1
t
v
1
+c
2
e
λ
2
t
v
2
+ · · · +c
n
e
λ
n
t
v
n
.
12. Diﬀerential Equations 61
It is also clear that the eigenvalues decide how the solutions behave as t →∞. If all
the eigenvalues are negative, then all the solutions consist only of linear combinations
of dying exponentials, and therefore u(t) → 0 as t → ∞. In this case the matrix A
is called stable. If at least one eigenvalue is positive, then there are solutions u(t)
containing at least one growing exponential and therefore those u(t) →∞as t →∞.
In this case the matrix A is called unstable . This is the situation with both systems
above. There is also a third possibility. If all the eigenvalues are negative or zero with
at least one actually equal to zero, then the solutions consist of linear combinations
of dying exponentials and at least one constant function, and therefore all solutions
stay bounded as t → ∞. In this case the matrix A is called neutrally stable. The
eigenvalues therefore determine the qualitative nature of the solution.
All this is clear enough for diagonalizable matrices, but what about defective
matrices? Consider the following example: ￿

˙ x
˙ y ￿

= ￿

2 3
0 2 ￿￿

x
y ￿

.
The solution is ￿

x
y ￿

= exp ￿￿

2 3
0 2 ￿

t ￿￿

x(0)
y(0) ￿

= ￿

e
2t
3te
2t
0 e
2t ￿
￿
x(0)
y(0) ￿

.
(See Section 11 Exercise 3.) A term of the form te
2t
has appeared. This is typical of
defective systems. Note that this term does not change the qualitative nature of the
solution u(t) as t → ∞. In general, terms of the form t
n
e
λt
arise, but they tend to
zero or inﬁnity as t →∞ depending on whether λ is negative or positive. The factor
t
n
ultimately has no eﬀect. It can be shown that this behavior holds for all defective
matrices. That is, the deﬁnitions of stable, unstable, and neutrally stable and their
implications about the long-term behavior of solutions hold for these matrices also.
(Actually a more precise statement has to be made in the case that zero is a multiple
eigenvalue, but we will ignore this possibility.) All of this will become clearer when
we consider the Jordan form of a matrix in a later section.
EXERCISES
1. Find the general solution of ˙ u = Au where A is equal to each of the matrices in
Section 9 Exercise 1.
2. Find the solutions of the systems above with the initial conditions below.
(a) ￿

x(0)
y(0) ￿

= ￿

3
2 ￿

(b)

x(0)
y(0)
z(0)
¸
¸
=

1
2
−3
¸
¸
(c)

x(0)
y(0)
z(0)
¸
¸
=

0
1
3
¸
¸
62 12. Diﬀerential Equations
(d)

x(0)
y(0)
z(0)
¸
¸
=

0
0
1
¸
¸
(e)

x(0)
y(0)
z(0)
¸
¸
=

4
3
4
¸
¸
(f)

x(0)
y(0)
z(0)
w(0)
¸
¸
¸
=

2
2
1
2
¸
¸
¸
3. Decide the stability properties of the following matrices.
(a) ￿

44 −28
77 −49 ￿

(b) ￿

47 −30
75 −48 ￿

(c) ￿

8 −6
15 −11 ￿

4. Here is another way to derive the general form of the solution of the system
˙ u = Au, assuming the diagonal factorization A = SDS
−1
. Make the change of
variables w = S
−1
u, and show that the system then becomes ˙ w = Dw. This is just a
simple system of n individual of ODE’s of the form ˙ w
1
= λ
1
w
1
, ˙ w
2
= λ
2
w
2
, · · · , ˙ w
n
=
λ
n
w
n
. These equations are well-known to have solutions w
1
(t) = c
1
e
λ
1
t
, w
2
(t) =
c
2
e
λ
2
t
, · · · , w
n
(t) = c
n
e
λ
n
t
. Write this as
w(t) =

c
1
e
λ
1
t
c
2
e
λ
2
t
.
.
.
c
n
e
λ
n
t
¸
¸
¸
¸
and conclude that the solution of the original system is
u(t) = Sw(t) = c
1
e
λ
1
t
v
1
+c
2
e
λ
2
t
v
2
+ · · · +c
n
e
λ
n
t
v
n
,
where the v’s are the columns of S, that is, the eigenvectors of A. This alternate
approach avoids the matrix exponential, but it does not generalize so easily to the
complex case or the case of defective matrices.
13. The Complex Case 63
13. THE COMPLEX CASE
We can no longer avoid complex numbers. When we considered real systems
Ax = b, the solution x was automatically real. There was no need to consider
complex numbers. But in the eigenvalue problem we have seen that there are real
matrices whose characteristic equations have complex roots. Does this mean that we
have to consider complex eigenvalues, complex eigenvectors, and complex diagonal
factorizations? The answer is yes, and not just for theoretical reasons. The com-
plex case is essential in solving linear systems of diﬀerential equations that describe
oscillations.
First we give a brief review of the most basic facts about complex numbers.
Recall that a complex number has the form a+ib where a and b are real numbers and
i is a quantity that satisﬁes the equation i
2
= −1. (You can think of i as denoting

−1, but don’t try to give any metaphysical meaning to it!) If z = a +ib, then a is
the real part of z and b is the imaginary part of z. Two complex numbers are equal
if and only if their real and imaginary parts are equal. Complex numbers are added
and multiplied much like real numbers, but you must keep in mind that i
2
= −1.
For example:
(2 +i) + (3 −i2) = 5 −i
(2 +i)(3 −i2) = 6 −i4 +i3 + 2 = 8 −i
Dividing complex numbers is a little more troublesome. First we take the reciprocal
of a complex number:
1
3 −i2
=
1
3 −i2
(3 +i2)
(3 +i2)
=
3 +i2
9 + 4
=
3
13
+i
2
13
We just multiplied the numerator and denominator by 3 +i2. We use the same trick
to divide two complex numbers:
2 +i
3 −i2
=
2 +i
3 −i2
(3 +i2)
(3 +i2)
=
6 +i4 +i3 −2
9 + 4
=
4 +i7
13
=
4
13
+i
7
13
We say that the complex conjugate of a complex number a + ib is a − ib and write
a +ib = a − ib. In both cases above we multiplied the numerator and denominator
by the complex conjugate of the denominator. Complex conjugation commutes with
multiplication, that is, wz = wz (Exercise 1).
We can deﬁne complex matrices in the same way as real matrices. It is possible,
but tedious, to show that the algebra of matrices, Gaussian elimination, inverses,
determinants, eigenvalues and eigenvectors, diagonalization, and so on carry over to
complex matrices.
Now we can go to work. Let’s consider the eigenvalue problem for the matrix
A = ￿

3 −2
1 1 ￿

.
64 13. The Complex Case
We compute the characteristic equation in the usual way and obtain λ
2
−4λ+5 = 0.
The roots are 2 + i and 2 − i, a complex conjugate pair. (In fact, all the complex
roots of any real polynomial equation occur in complex conjugate pairs.) Since we
are now in the complex world, we can consider these two complex numbers as the
eigenvalues of A. Now let’s look for the eigenvectors. First we take the eigenvalue
2 +i and, as usual, use Gaussian elimination to solve the system (A−(2 +i)I)x = 0: ￿

3 −(2 +i) −2
1 1 −(2 +i) ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿ ￿

1 −i −2
1 −1 −i ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿ ￿

1 −1 −i
1 −i −2 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿ ￿

1 −1 −i
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

.
(In the second step we exchanged the two rows to avoid a complex division.) Solving
this we obtain the eigenvector ￿

1 +i
1 ￿

. For the eigenvalue 2 −i the computation is
almost the same, and we obtain the eigenvector ￿

1 −i
1 ￿

. (Note that this vector is
the complex conjugate of the previous eigenvector. See Exercise 2.) Now we simply
line up these vectors in the usual way and obtain ￿

3 −2
1 1 ￿
￿
1 +i 1 −i
1 1 ￿

= ￿

1 +i 1 −i
1 1 ￿
￿
2 +i 0
0 2 −i ￿

,
and therefore we have the complex diagonal factorization ￿

3 −2
1 1 ￿

= ￿

1 +i 1 −i
1 1 ￿
￿
2 +i 0
0 2 −i ￿
￿
1 +i 1 −i
1 1 ￿

−1
.
Everything worked exactly as in the real case. Of course, complex arithmetic is
involved, so this isn’t something we would want to do for large systems, but at least
is troubling. The three matrices on the right are complex and the matrix on the left
is real. How can this be? Somehow or other, when the three matrices on the right are
multiplied out, all the imaginary parts of the complex numbers appearing in them
must cancel out! From this we might suspect that it shouldn’t really be necessary to
introduce complex numbers in order to obtain a useful factorization of a real matrix.
It turns out that it is possible to transform the complex diagonal factorization
into one which is real and almost diagonal. To describe this we introduce some
13. The Complex Case 65
notation. We write the ﬁrst eigenvalue and associated eigenvector as λ = 2 + i and
v = ￿

1 +i
1 ￿

. Then the second eigenvalue and associated eigenvector are λ = 2 − i
and v = ￿

1 −i
1 ￿

. Clearly they are just complex conjugates of the ﬁrst eigenvalue
and eigenvector and therefore don’t add any new information. We can ignore them.
Now identify the real and imaginary parts of λ and v as λ = α + iβ = 2 + i and
v = x+iy = ￿

1
1 ￿

+i ￿

1
0 ￿

. Then the basic equation Av = λv can be written A(x+iy) =
(α+iβ)(x+iy). When multiplied out it becomes Ax+iAy = (αx−βy) +i(βx+αy).
Since complex numbers are equal if and only if their real and imaginary parts are
equal, this equation implies that Ax = αx − βy and Ay = βx + αy. These two
equations can be written simultaneously in matrix form as
A ￿

x
1
y
1
x
2
y
2 ￿

= ￿

x
1
y
1
x
2
y
2 ￿
￿
α β
−β α ￿

or
A = ￿

x
1
y
1
x
2
y
2 ￿
￿
α β
−β α ￿
￿
x
1
y
1
x
2
y
2 ￿

−1
.
Therefore for the matrix of our example we obtain ￿

3 −2
1 1 ￿

= ￿

1 1
1 0 ￿
￿
2 1
−1 2 ￿
￿
1 1
1 0 ￿

−1
.
This our desired factorization. Everything on the right side is real. The middle factor
is no longer diagonal, but it exhibits the real and imaginary parts of the eigenvalue
in a nice pattern. (The question of the independence of the vectors x and y will be
settled in Section 16 Exercise 7.)
Let’s look at another example. Let
B =

−2 −2 −2 −2
1 0 −2 −1
0 0 1 −2
0 0 1 3
¸
¸
¸
.
The eigenvalues of B are 2+i, 2−i, −1+i, −1−i. (These are not so easy to compute
by hand since the characteristic polynomial of B is of fourth degree.) First we ﬁnd
the eigenvector associated with 2 + i by solving the system (B − (2 + i)I)x = 0. In
array form

−4 −i −2 −2 −2
1 −2 −i −2 −1
0 0 −1 −i −2
0 0 1 1 −i ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
0
¸
¸
¸
66 13. The Complex Case
by Gaussian elimination becomes

1 0 0 0
0 1 0 i
0 0 1 1 −i
0 0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
0
¸
¸
¸
which gives the eigenvector

0
−i
−1 +i
1
¸
¸
¸
. Similarly, we ﬁnd the eigenvector associated
with −1 +i by solving the system (B −(−1 +i)I)x = 0. In array form

−1 −i −2 −2 −2
1 1 −i −2 −1
0 0 2 −i −2
0 0 1 4 −i ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
0
¸
¸
¸
by Gaussian elimination becomes

1 1 −i 0 0
0 0 1 0
0 0 0 1
0 0 0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0
0
0
¸
¸
¸
which gives the eigenvector

−1 +i
1
0
0
¸
¸
¸
. We are essentially done. All we have to do
now is write down the answers. The complex diagonal factorization is
B =

0 0 −1 +i −1 −i
−i i 1 1
−1 +i −1 −i 0 0
1 1 0 0
¸
¸
¸

2 +i 0 0 0
0 2 −i 0 0
0 0 −1 +i 0
0 0 0 −1 −i
¸
¸
¸

0 0 −1 +i −1 −i
−i i 1 1
−1 +i −1 −i 0 0
1 1 0 0
¸
¸
¸
−1
.
13. The Complex Case 67
And the corresponding real diagonal-like factorization is
B =

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸

2 1 0 0
−1 2 0 0
0 0 −1 1
0 0 −1 −1
¸
¸
¸

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸
−1
.
Note how each of the two complex conjugate eigenvalues ￿

α +iβ 0
0 α −iβ ￿

in the
diagonal matrix of the ﬁrst factorization expand to 2 × 2 blocks ￿

α β
−β α ￿

in the
diagonal-like matrix of the second factorization. From now on we will call such
diagonal-like matrices block diagonal matrices.
Now we apply all this to solving diﬀerential equations. Suppose we have the
following system:
˙ x = 3x − 2y
˙ y = x + y
The coeﬃcient matrix is just A of the ﬁrst example above. To solve the system we
have to compute e
At
. Using the real block-diagonal factorization of A computed
above and the result of Section 11 Exercise 5(b), we get ￿

x(t)
y(t) ￿

= exp ￿￿

3 −2
1 1 ￿

t ￿￿

x(0)
y(0) ￿

= ￿

1 1
1 0 ￿

exp ￿￿

2 1
−1 2 ￿

t ￿￿

1 1
1 0 ￿

−1 ￿

x(0)
y(0) ￿

= ￿

1 1
1 0 ￿￿

e
2t
cos t e
2t
sin t
−e
2t
sin t e
2t
cos t ￿
￿
c
1
c
2 ￿

= ￿

1 1
1 0 ￿￿

c
1
e
2t
cos t +c
2
e
2t
sin t
−c
1
e
2t
sin t +c
2
e
2t
cos t ￿

= e
2t
(c
1
cos t +c
2
sin t) ￿

1
1 ￿

+e
2t
(−c
1
sin t +c
2
cos t) ￿

1
0 ￿

.
Now consider the larger system
˙ w = − 2w − 2x − 2y − 2z
˙ x = w − 2y − z
˙ y = y − 2z
˙ z = y + 3z.
The coeﬃcient matrix is just B of the second example. We solve the system in the
same way as above using the real block-diagonal factorization of B and obtain
68 13. The Complex Case

w(t)
x(t)
y(t)
z(t)
¸
¸
¸
= exp

¸
¸

−2 −2 −2 −2
1 0 −2 −1
0 0 1 −2
0 0 1 3
¸
¸
¸
t

w(0)
x(0)
y(0)
z(0)
¸
¸
¸
=

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸
exp

¸
¸

2 1 0 0
−1 2 0 0
0 0 −1 1
0 0 −1 −1
¸
¸
¸
t

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸
−1

w(0)
x(0)
y(0)
z(0)
¸
¸
¸
=

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸

e
2t
cos t e
2t
sin t 0 0
−e
2t
sin t e
2t
cos t 0 0
0 0 e
−t
cos t e
−t
sin t
0 0 −e
−t
sin t e
−t
cos t
¸
¸
¸

c
1
c
2
c
3
c
4
¸
¸
¸
=

0 0 −1 1
0 −1 1 0
−1 1 0 0
1 0 0 0
¸
¸
¸

c
1
e
2t
cos t +c
2
e
2t
sin t
−c
1
e
2t
sin t +c
2
e
2t
cos t
c
3
e
−t
cos t +c
4
e
−t
sin t
−c
3
e
−t
sin t +c
4
e
−t
cos t
¸
¸
¸
= (c
1
e
2t
cos t +c
2
e
2t
sin t)

0
0
−1
1
¸
¸
¸
+ (−c
1
e
2t
sin t +c
2
e
2t
cos t)

0
−1
1
0
¸
¸
¸
+(c
3
e
−t
cos t +c
4
e
−t
sin t)

−1
1
0
0
¸
¸
¸
+ (−c
3
e
−t
sin t +c
4
e
−t
cos t)

1
0
0
0
¸
¸
¸
.
(The third equality requires a slight generalization of Section 11 Exercise 5(b).)
Now we can see the pattern. If λ = α + iβ, v = x + iy is a complex eigenvalue-
eigenvector pair for the coeﬃcient matrix, then so is λ = α−iβ, v = x−iy, and they
together will contribute terms like
· · · + (c
1
e
αt
cos βt +c
2
e
αt
sin βt)x + (−c
1
e
αt
sin βt +c
2
e
αt
cos βt)y + · · ·
to the solution. When t = 0 these terms become · · · c
1
x+c
2
y+· · · and are equated to
the initial conditions. Terms of the form e
αt
cos βt and e
αt
sin βt describe oscillations.
13. The Complex Case 69
The imaginary part β of the eigenvalue controls the frequency of the oscillations. The
real part α of the eigenvalue determines whether the oscillations grow without bound
or die out. We can therefore extend the language of the real case and say that a
matrix is stable if all of its eigenvalues have negative real parts, is unstable if one of
its eigenvalues has positive real part, and is neutrally stable if all of its eigenvalues
have nonpositive real parts with at least one with real part actually equal to zero.
What about defective matrices? These are matrices with repeated complex
eigenvalues that do not provide enough independent eigenvectors with which to con-
struct a diagonalization. It is still possible by more general kinds of factorizations to
compute exponentials of such matrices. In systems of diﬀerential equations such ma-
trices will produce solutions containing terms of the form t
n
e
αt
cos βt and t
n
e
αt
sin βt.
Just as in the real case, the factor of t
n
doesn’t have any eﬀect on the long-term qual-
itative behavior of such solutions. Stability or instability and the oscillatory behavior
of the solutions is still determined by the eigenvalues. Therefore, if you know the
eigenvalues of a system of diﬀerential equations, you know a lot about the behavior
of the solutions of that system without actually solving it.
Finally we present an application that describes vibrations in mechanical and
electrical systems. In modeling mass-spring systems, Newton’s second law of motion
and Hooke’s law lead to the second-order diﬀerential equation m¨ x(t) + kx(t) = 0,
where m = the mass, k = the spring constant, and x(t) = the displacement of the
mass as a function of time. For simplicity, divide by m and let ω
2
= k/m, so the
equation becomes ¨ x+ω
2
x = 0. In order to use the machinery that we have built up,
we have to cast this second-order equation into a ﬁrst-order system. To do this let
y
1
= x and y
2
= ˙ x. We then obtain the system
˙ y
1
= y
2
˙ y
2
= − ω
2
y
1
or in matrix form
￿
˙ y
1
˙ y
2 ￿

= ￿

0 1
−ω
2
0 ￿￿

y
1
y
2 ￿

.
To solve the system we have to diagonalize the coeﬃcient matrix. The eigenvalues
are λ = ±iω. Using Gaussian elimination to solve (A−iωI)x = 0 ￿

−iω 1
−ω
2
−iω ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿ ￿

−iω 1
0 0 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿

0
0 ￿

we obtain the eigenvector ￿

1
iω ￿

= ￿

1
0 ￿

+ i ￿

0
ω ￿

. The solution of the system is
therefore ￿

y
1
(t)
y
2
(t) ￿

= (c
1
cos ωt +c
2
sin ωt) ￿

1
0 ￿

+ (−c
1
sin ωt +c
2
cos ωt) ￿

0
ω ￿

.
70 13. The Complex Case
It follows that the solution of the original problem is x(t) = y
1
(t) = c
1
cos ωt +
c
2
sin ωt. This is the mathematical representation of simple harmonic motion.
EXERCISES
1. Verify
1
a +ib
= ￿

a
a
2
+b
2 ￿

+i ￿

−b
a
2
+b
2 ￿

and (a +ib)(c +id) = (a +ib)(c +id).
2. Show if A is real and Av = λv, then Av = λv. Conclude that if λ, v is a complex
eigenvalue-eigenvector pair for A, then so is λ, v.
3. Find the eigenvalues of the matrix ￿

α β
−β α ￿

.
4. Find the complex diagonal factorizations, the real block-diagonal factorizations,
and the stability of the following matrices.
(a) ￿

9 −10
4 −3 ￿

(b)

−1 0 3
−5 1 1
−3 0 −1
¸
¸
5. Find the general solutions of the following systems of diﬀerential equations.
(a) ˙ x = 9x − 10y
˙ y = 4x − 3y
(b) ˙ x = − x + 3z
˙ y = − 5x + y + z
˙ z = − 3x − z
6. Find the solutions of the systems in Exercise 5 with the following initial conditions.
(a) ￿

x(0)
y(0) ￿

= ￿

3
1 ￿

(b)

x(0)
y(0)
z(0)
¸
¸
=

−2
−1
3
¸
¸
14. Diﬀerence Equations and Markov Matrices 71
14. DIFFERENCE EQUATIONS AND MARKOV MATRICES
In this section we investigate how eigenvalues can be used to solve diﬀerence
equations. Diﬀerence equations are discrete analogues of diﬀerential equations. They
occur in a wide variety of applications and are used to desacribe relationships in
physics, chemistry, engineering, biology, ecology, and demographics.
Let A be an n ×n matrix and u
0
be an n ×1 column vector, then the following
inﬁnite sequence of column vectors can be generated:
u
1
= Au
0
u
2
= Au
1
u
3
= Au
2
.
.
.
The general relationship between consecutive terms of this sequence is expressed as
a diﬀerence equation:
u
k
= Au
k−1
.
The basic challenge posed by a diﬀerence equation is to describe the behavior of the
sequence u
0
, u
1
, u
2
, u
3
, · · ·. Speciﬁcally, (1) determine if the sequence has a limit and
if so then ﬁnd it, and (2) ﬁnd an explicit formula for u
k
in terms of u
0
. To this end
we observe that
u
1
= Au
0
u
2
= Au
1
= A(Au
0
) = A
2
u
0
u
3
= Au
2
= A(A
2
u
0
) = A
3
u
0
.
.
.
so the sequence becomes u
0
, Au
0
, A
2
u
0
, A
3
u
0
, · · ·. The problem of ﬁnding the solu-
tion u
k
= A
k
u
0
of the diﬀerence equation at the kth stage then reduces to computing
the matrix A
k
and determining its behavior as k becomes large. Suppose A has the
diagonal factorization A = SDS
−1
, then we can use the fact that A
k
= SD
k
S
−1
(Section 10 Exercise 2). Let A have eigenvalues λ
1
, λ
2
, · · · , λ
n
and associated eigen-
72 14. Diﬀerence Equations and Markov Matrices
vectors v
1
, v
2
, · · · , v
n
, and let c = S
−1
u
0
, then
u
k
= A
k
u
0
= SD
k
S
−1
u
0
= SD
k
c
=

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸

λ
k
1
λ
k
2
.
.
.
λ
k
n
¸
¸
¸
¸

c
1
c
2
.
.
.
c
n
¸
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
v
1
v
2
· · · v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸

c
1
λ
k
1
c
2
λ
k
2
.
.
.
c
n
λ
k
n
¸
¸
¸
¸
= c
1
λ
k
1

.
.
.
v
1
.
.
.
¸
¸
¸
+c
2
λ
k
2

.
.
.
v
2
.
.
.
¸
¸
¸
+ · · · +c
n
λ
k
n

.
.
.
v
n
.
.
.
¸
¸
¸
= c
1
λ
k
1
v
1
+c
2
λ
k
2
v
2
+ · · · +c
n
λ
k
n
v
n
This is then the general solution of the diﬀerence equation. (Note its similarity to
the general solution of a system of ODE’s in Section 12.) The c’s are determined by
the equation c = S
−1
u
0
. We can avoid the taking of an inverse by multiplying this
equation by S to obtain the linear system Sc = u
0
, which can be solved by Gaussian
elimination. This can also be seen by letting k = 0 in the general solution to obtain
u
0
= c
1
v
1
+c
2
v
2
+ · · · +c
n
v
n
, which is again Sc = u
0
.
To determine the long-term behavior of u
k
, let the eigenvalues be ordered so
that |λ
1
| ≥ |λ
2
| ≥ · · · ≥ |λ
n
|. Then from the general solution u
k
= c
1
λ
k
1
v
1
+c
2
λ
k
2
v
2
+
· · · + c
n
λ
k
n
v
n
it is clear that the behavior of u
k
as k → ∞ is determined by the size
of λ
1
. To be speciﬁc,

1
| < 1 ⇒u
k
→0

1
| = 1 ⇒u
k
bounded, may have a limit

1
| > 1 ⇒u
k
blows up.
(We are assuming that c
1
￿= 0. In general, the long-term behavior of u
k
is determined
by the largest λ
i
for which c
i
￿= 0.) We now illustrate these ideas with the following
examples.
Example 1: Find u
k
= A
k
u
0
where A = ￿

0 2
−.5 2.5 ￿

and u
0
= ￿

2
5 ￿

. Since A has the
14. Diﬀerence Equations and Markov Matrices 73
diagonal factorization
A = ￿

0 2
−.5 2.5 ￿

= ￿

2 4
2 1 ￿
￿
2 0
0 .5 ￿
￿
2 4
2 1 ￿

−1
,
we have
u
k
= ￿

0 2
−.5 2.5 ￿

k ￿

2
5 ￿

= ￿

2 4
2 1 ￿
￿
2 0
0 .5 ￿

k ￿

2 4
2 1 ￿

−1 ￿

2
5 ￿

= ￿

2 4
2 1 ￿
￿
2
k
0
0 .5
k ￿
￿
c
1
c
2 ￿

= c
1
(2)
k ￿

2
2 ￿

+c
2
(.5)
k ￿

4
1 ￿

.
(Of course, we could have written down the solution in this form as soon as we knew
the eigenvaluse and eigenvectors. We really didn’t need the diagonal factorization.
We only have to make sure that there are enough independent eigenvectors to insure
that the diagonal factorization exists.) And since the system ￿

2 4
2 1 ￿
￿
c
1
c
2 ￿

= ￿

2
5 ￿

has the solution c
1
= 3, c
2
= −1, we obtain
u
k
= 3(2)
k ￿

2
2 ￿

+ (−1)(.5)
k ￿

4
1 ￿

= ￿

6(2)
k
−4(.5)
k
6(2)
k
−(.5)
k ￿

.
It is also clear that u
k
becomes unbounded as k →∞.
Example 2: Each year 2/10 of the people in California move out and 1/10 of the
people outside California move in. Let I
k
and O
k
be the numbers of people inside
and outside California in the kth year. The initial populations are I
0
= 20 million
and O
0
= 202 million. The relationship between the populations in successive years
is given by
I
k+1
= .8I
k
+.1O
k
O
k+1
= .2I
k
+.9O
k
or ￿

I
k+1
O
k+1 ￿

= ￿

.8 .1
.2 .9 ￿
￿
I
k
O
k ￿

.
The problem is to ﬁnd the population distribution u
k
= ￿

I
k
O
k ￿

and to determine if
it tends to a stable limit. This is, of course, the problem of solving the diﬀerence
equation u
k
= Au
k−1
where A is the matrix above. As usual we ﬁnd the diagonal
factorization of A ￿

.8 .1
.2 .9 ￿

= ￿

1 1
2 −1 ￿
￿
1 0
0 .7 ￿
￿
1 1
2 −1 ￿

−1
74 14. Diﬀerence Equations and Markov Matrices
and solve the system ￿

1 1
2 −1 ￿
￿
c
1
c
2 ￿

= ￿

20
202 ￿

to obtain c
1
= 74 and c
2
= −54. We can then write the solution as
u
k
= 74(1)
k ￿

1
2 ￿

−54(.7)
k ￿

1
−1 ￿

= ￿

74 −54(.7)
k
148 + 54(.7)
k ￿

.
This is then the population distribution for any year. Note that as k → ∞ the
population distribution tends to ￿

74
148 ￿

.
This example exhibits two esssential properties that hold in many chemical, bio-
logical, and economic processes: (1) the total quantity in question is always constant,
and (2) the individual quantities are never negative. As a consequence of these two
properties, note that the columns of the matrix A above are nonnegative and add to
one. This can be interpreted as saying that each year all the people inside California
have to either remain side or move out (⇒ the ﬁrst column adds to one), and all the
people outside California have to either move in or remain outside (⇒ the second
column adds to one). Any matrix with nonnegative entries whose columns add to
one is called a Markov matrix and the process it describes is called a Markov process.
Markov matrices have several important properties, which we state but do not prove
in the following theorem.
Theorem. Any Markov matrix A has the following properties.
(a) All the eigenvalues of A satisfy |λ| ≤ 1.
(b) λ = 1 is always an eigenvalue and there exists an associated eigenvector v
1
with
all entries ≥ 0.
(c) If any power of A has all entries positive, then multiples of v
1
are the only
eigenvectors associated with λ = 1 and A
k
u
0
→c
1
v
1
for any u
0
.
We cannot prove this theorem completely with the tools we have developed so far,
but we can make parts of it plausible. First, since the columns of A sum to one, we
have A
T
v = v where v is the column vector consisting only of one’s. This means
that one is an eigenvalue of A
T
and therefore of A also since both matrices have
the same eigenvalues (Section 9 Problem 3(a)). Second, assume A has a diagonal
factorization and λ
2
, · · · , λ
n
all have absolute value < 1. Then as usual we have
A
k
u
0
= c
1
(1)
k
v
1
+c
2
λ
k
2
v
2
+· · · +c
n
λ
k
n
v
n
, so that clearly A
k
u
0
→c
1
v
1
. This is exactly
what happened in the example above. Note also that, since the limiting vector c
1
v
1
is a multiple of the eigenvector associated with λ = 1, we have A(c
1
v
1
) = c
1
v
1
. We
therefore say c
1
v
1
is a stable distribution or it represents a steady state. In terms of
14. Diﬀerence Equations and Markov Matrices 75
the population example this means ￿

.8 .1
.2 .9 ￿
￿
74
148 ￿

= ￿

74
148 ￿

. In other words, if the
initial population distribution is ￿

74
148 ￿

, it will remain as such forever. And if the
initial population distribution is something else, it will tend to ￿

74
148 ￿

in the long
run.
EXERCISES
1. For the diﬀerence equation u
k
= Au
k−1
where the matrix A and the starting
vector u
0
are as given below, compute u
k
and comment upon its behavior as k →∞.
(a) A = ￿

.5 .25
.5 .75 ￿

and u
0
= ￿

128
64 ￿

.
(b) A = ￿

−2.5 4.5
−1 2 ￿

and u
0
= ￿

18
10 ￿

.
(c) A = ￿

1 4
1 1 ￿

and u
0
= ￿

−1
2 ￿

.
2. Suppose multinational companies in the U.S., Japan, and Europe have total assets
of \$4 trillion. Initially the distribution of assets is \$2 trillion in the U.S., \$0 in Japan,
and \$2 trillion in Europe. Each year the distribution changes according to

US
k+1
J
k+1
E
k+1
¸
¸
=

.5 .5 .5
.25 .5 0
.25 0 .5
¸
¸

US
k
J
k
E
k
¸
¸
.
(We are implicitly making the completely false assumption that the world economy
is a zero-sum game!)
(a) Find the diagonal factorization of A.
(b) Find the distribution of assets in year k.
(c) Find the limiting distribution of assets.
(d) Show the limiting distribution is stable.
3. A truck rental company has centers in New York, Los Angeles, and Chicago.
Every month half of the trucks in New York and Los Angeles go to Chicago, the
other half stay where they are, and the trucks in Chicago are split evenly between
New York and Los Angeles. Initially the distribution of trucks is 90, 30, and 30 in
New York, Los Angeles, and Chicago respectively.

NY
k+1
LA
k+1
C
k+1
¸
¸
=

∗ ∗ ∗
∗ ∗ ∗
∗ ∗ ∗
¸
¸

NY
k
LA
k
C
k
¸
¸
.
76 14. Diﬀerence Equations and Markov Matrices
(a) Find the Markov matrix A that describes this process.
(b) Find the diagonal factorization of A.
(c) Find the distribution of trucks in month k.
(d) Find the limiting distribution of trucks.
(e) Show the limiting distribution is stable.
4. Suppose there is an epidemic in which every month half of those are well become
sick, a quarter of those who are sick get well, and another quarter of those who are
sick die. Find the corresponding Markov matrix and ﬁnd its stable distribution.

D
k+1
S
k+1
W
k+1
¸
¸
=

∗ ∗ ∗
∗ ∗ ∗
∗ ∗ ∗
¸
¸

D
k
S
k
W
k
¸
¸
5. In species that reproduce sexually, the characteristics of an oﬀspring are deter-
mined by a pair of genes, one inherited from each parent. The genes of a particular
trait (say eye color) are of two types, the dominant G (brown eyes) and the recessive
g (blue eyes). Oﬀspring with genotype GG or Gg exhibit the dominant trait, whereas
those of type gg exhibit the recessive trait. Now suppose we allow only males of type
gg to reproduce. Let the initial distribution of genotypes be u
1
=

p
q
r
¸
¸
. The entries
p, q, and r respectively represent the proportions of GG, Gg, and gg genotypes in
the initial generation. (They must be nonnegative and sum to one.) Show that the
Markov matrix
A =

0 0 0
1 .5 0
0 .5 1
¸
¸
represents how the distribution of genotypes in one generation transforms to the next
under our restrictive mating policy (that is, only blue-eyed males can reproduce).
What is the limiting distribution?
6. Suppose in the setup of the previous problem we allow males of all genotypes to
reproduce. Let G and g respectively represent the proportion of G genes and g genes
in the initial generation. (They also must be nonnegative and sum to one.) Show
that G = p +q/2 and g = r +q/2. Show that the Markov matrix
A =

G .5G 0
g .5 G
0 .5g g
¸
¸
14. Diﬀerence Equations and Markov Matrices 77
represents how the distribution of genotypes in the ﬁrst generation transforms to the
second. Show that u
2
=

G
2
2Gg
g
2
¸
¸
. Show that G and g again respectively represent the
proportion of G genes and g genes in the second generation. The matrix A therefore
represents how the distribution of genotypes in the second generation transforms
to the third. Show that u
3
= u
2
. Genetic equilibrium is therefore reached after
only one generation. (What does this say in the important special case where p =
r?) This result is the Hardy-Weinberg law and is at the foundation of the modern
science of population genetics. It says that in a large, ramdom-mating population,
the distribution of genotypes and the proportion of dominant and recessive genes
tend to remain constant from generation to generation, unless outside forces such
as selection, mutation, or migration come into play. In this way, even the rarest of
genes, which one would expect to disappear, are preserved.
78 15. Vector Spaces, Subspaces, and Span
PART 2: GEOMETRY
15. VECTOR SPACES, SUBSPACES, AND SPAN
The presentation so far has been entirely algebraic. Matrices have been added
and multiplied, equations have been solved, but nothing of a geometric nature has
been considered. Yet there is a natural geometric approach to matrices that is at
least as important as the algebraic approach. The mechanics of Gaussian elimination
has produced for us one kind of understanding of linear systems, but for a diﬀerent
and deeper understanding we must look to geometry.
We will assume some familiarity with, lines, planes, and geometrical vectors
in two and three dimensional physical space. We now want to examine what is
really at the heart of these concepts. To do this, we deﬁne an abstract model of
a vector space and then show how this idea can be used to develop concepts and
properties that are valid in all concrete instances of vector spaces. A vector space V
is a collection of objects, called vectors, on which two operations are deﬁned, addition
and multiplication by scalars (numbers). If the scalars are real numbers, the vector
space is a real vector space, and if the scalars are complex numbers, the vector space
is a complex vector space. V must be closed under addition and scalar multiplication.
This means that if x and y are vectors in V and if a is a scalar, then x + y and ax
are also vectors in V . The operations must also satisfy the following rules:
1. x +y = y +x
2. x + (y +z) = (x +y) +z
3. There is a “zero” vector 0 such that x + 0 = x for all x.
4. For each vector x, there is a unique vector −x such that x + (−x) = 0.
5. 1x = x
6. (ab)x = a(bx)
7. a(x +y) = ax +ay
8. (a +b)x = ax +bx
To put meat on this abstract deﬁnition we need some examples. For us the most
important vector spaces are the real Euclidean spaces R
1
, R
2
, R
3
, . . .. The space R
n
consists of all n×1 column matrices with the familiar deﬁnitions of addition and scalar
multiplication of matrices. (We have been calling such matrices column vectors all
along.) That these spaces are vector spaces follows directly from the properties of
matrices. The ﬁrst three spaces can be identiﬁed with familiar geometric objects: R
1
is represented by the real line, R
2
by the real plane, and R
3
by physical 3-space. The
representations are clear. For example, the point (x
1
, x
2
, x
3
) in 3-space corresponds
to the vector

x
1
x
2
x
3
¸
¸
in R
3
. Likewise, a vector in a higher dimensional Euclidean
15. Vector Spaces, Subspaces, and Span 79
space is completely determined by its components, even though the geometry is hard
to visualize.
x
1
x
2
x
3
x
1
x
2
x
3
(x
1
, x
2
, x
3
)
a point a vector
x
1
x
2
x
3
FIGURE 2
If we take column vectors whose components we allow to be complex numbers, we
obtain the complex Euclidean spaces: C
1
, C
2
, C
3
, · · ·. (We were actually in the world
of complex spaces in Section 13.) Even more abstract vector spaces that cannot be
visualized as any kind of Euclidean space are functions spaces. A particular example
is C
0
[0, 1], the collection of all real valued functions deﬁned and continuous on [0,1].
It is easy to see that C
0
[0, 1] is a real vector space, but it is impossible to see it
geometrically. For now, since we want to keep things as concrete as possible, we will
concentrate on real Euclidean spaces.
One nice thing about the ﬁrst three Euclidean spaces R
1
, R
2
, and R
3
is that for
them addition and scalar multiplication have simple geometric interpretations: The
sum x + y is the diagonal of the parallelogram with sides formed by x and y. The
diﬀerence x−y is the other side of the parallelogram with one side side y and diagonal
x. (Note that the line segment from y to x is not the vector x −y and in fact is not
a vector at all!) The product ax is the vector obtained from x by multiplying its
length by a. And the vector −x has the same length as x but points in the opposite
direction. This geometric desciption even extends to higher dimensional Euclidean
spaces.
x
x + y
y
x
y
x - y
FIGURE 3
80 15. Vector Spaces, Subspaces, and Span
It turns out that the vector spaces that we will need most occur inside the
standard spaces R
n
. We formalize this idea by saying that a subset S of a vector
space V is a subspace of V if S has the following properties:
1. S contains the zero vector.
2. If x and y are vectors in S, then x +y is also a vector in S.
3. If x is a vector in S and a is any scalar, then ax is also a vector in S.
Since addition and scalar multiplication in S follow the rules of the host space V ,
there is no need to verify the rules for a vector space for S. It is automatically a
vector space in its own right. We now look at some examples of subspaces of R
n
.
Example 1: Consider all vectors ￿

x
1
x
2 ￿

in R
2
whose components satisfy the equation
x
1
+2x
2
= 0. Clearly they are represented by points in R
2
that lie on a line through
the origin. These vectors form a subspace of R
2
since sums and scalar products of
vectors that satisfy the equation must also satisfy the equation. (We will prove this
using matrix notation later.) Furthermore, we can ﬁnd all such vectors explicitly.
We just write the equation in matrix form
[ 1 2 ] ￿

x
1
x
2 ￿

= [ 0 ]
and solve as usual; that is, we write the array [ 1 2 | | 0 ] , run Gaussian elimination
(unnecessary here of course), assign leading and free variables, and express the solu-
tion in vector form c ￿

−2
1 ￿

. We get all multiples of one vector, clearly a line through
the origin. It is easy to show that such vectors are closed under addition and scalar
multiplication (proved in greater generality later), thereby giving another veriﬁcation
that we have a subspace.
x
1
x
2
-2
1
c
FIGURE 4
If we change the equation to x
1
+ 2x
2
= 2 we still have a line. Vectors that
satisfy this equation, however, cannot form a subspace since the sum of two such
15. Vector Spaces, Subspaces, and Span 81
vectors does not satisfy the equation. If we solve the equation we obtain vectors of
the form ￿

2
0 ￿

+c ￿

−2
1 ￿

. Again we see that we do not have a subspace because these
vectors are not closed under addition. We can also see this geometrically by adding
two vectors that point to the line and noting that the result no longer points to the
line. Even more simply, the line does not pass through the origin, so the zero vector
is not even included.
Example 2: Consider all vectors

x
1
x
2
x
3
¸
¸
in R
3
whose components satisfy the equation
x
1
− x
2
+ x
3
= 0. This equation deﬁnes a plane in R
3
passing through the origin.
Vectors that satisfy this equation are closed under addition and scalar multiplication,
and the plane is therefore a subspace. We can ﬁnd all such vectors by writing the
equation in matrix form
[ 1 −1 1 ]

x
1
x
2
x
3
¸
¸
= [ 0 ]
and solving. We use the array [ 1 −1 1 | | 0 ] to obtain the solution
c

1
1
0
¸
¸
+d

−1
0
1
¸
¸
in vector form. This is the vector representation of the plane. Again, vectors of
this form are closed under addition and scalar multiplication and therefore form a
subspace.
1
1
0
-1
0
1
x
1
x
2
x
3
FIGURE 5
82 15. Vector Spaces, Subspaces, and Span
As in Example 1, if we change the equation to x
1
−x
2
+x
3
= 2, we still get a plane,
but for the same reasons as before it is no longer a subspace.
Example 3: This time we want all vectors in R
3
that satisfy the two equations
x
1
− x
2
= 0
x
2
− x
3
= 0
simultaneously. Again, vectors that satisfy both equations are closed under addition
and scalar multiplication. To ﬁnd all such vectors we write the equations in matrix
form ￿

1 −1 0
0 1 −1 ￿

x
1
x
2
x
3
¸
¸
= ￿

0
0 ￿

.
and solve to obtain c

1
1
1
¸
¸
. All multiples of this single vector generate a line in R
3
passing through the origin. This makes sense since each equation deﬁnes a plane in
R
3
and their intersection must be a line. This also suggests the general fact that the
intersection of any number of subspaces of a vector space is itself a subspace. The
conditions of closure under addition and scalar multiplication are easily veriﬁed.
Example 4: Finally, consider all vectors in R
4
whose components satisfy the equation
x
1
+ x
2
− x
3
+ x
4
= 0. We might expect that this equation deﬁnes some kind of
geometric plane passing through the origin. If we solve it we obtain
a

−1
1
0
0
¸
¸
¸
+b

1
0
1
0
¸
¸
¸
+c

−1
0
0
1
¸
¸
¸
.
These vectors do form a subspace, but it is hard to visualize. Later we will give
precise meaning to the notion that this subspace is a “three dimensional hyperplane
in four space.”
All of the examples above have the same form, which can be expressed more
simply in matrix notation. Each deﬁnes a set S as the collection of all vectors x
in R
n
that satisfy a system of equations Ax = 0. The problem is to show that S
is a subspace. We can do this directly as follows: If Ax = 0 and Ay = 0, then
A(x +y) = Ax +Ay = 0 + 0 = 0 and A(cx) = c(Ax) = c(0) = 0. Thus vectors that
satisfy the system Ax = 0 are closed under addition and scalar multiplication. The
second way to verify that S is a subspace is to solve the system Ax = 0 as we did
in the examples. The solution in vector form will look like a
1
v
1
+a
2
v
2
+. . . +a
n
v
n
,
15. Vector Spaces, Subspaces, and Span 83
where the a’s are arbitrary constants and the v’s are vectors. Vectors of this form
(a
1
v
1
+a
2
v
2
+. . . +a
n
v
n
) + (b
1
v
1
+b
2
v
2
+. . . +b
n
v
n
) =
(a
1
+b
1
)v
1
+ (a
2
+b
2
)v
2
+. . . + (a
n
+b
n
)v
n
and under scalar multiplication since
c(a
1
v
1
+a
2
v
2
+. . . +a
n
v
n
) = (ca
1
)v
1
+ (ca
2
)v
2
+. . . + (ca
n
)v
n
.
So again we see that S is a subspace.
Vectors of the form a
1
v
1
+a
2
v
2
+. . . +a
n
v
n
are said to be linear combinations of
the vectors v
1
, v
2
, . . . , v
n
. The subspace S of all linear combinations of v
1
, v
2
, . . . , v
n
is called the span of v
1
, v
2
, . . . , v
n
. We also say the vectors v
1
, v
2
, . . . , v
n
span or
generate the subspace S, and we write S = span{v
1
, v
2
, . . . , v
n
}.
EXERCISES
1. Show that C
0
[0, 1] is a real vector space. Show that C
1
[0, 1], which is the set of all
functions that are continuous and have continuous derivatives on [0,1], is a subspace
of C
0
[0, 1].
2. Show directly that the following are subspaces of R
3
.
(a) All vectors

x
1
x
2
x
3
¸
¸
that satisfy the equation x
1
−x
2
+x
3
= 0
(b) All vectors of the form c

1
1
0
¸
¸
+d

0
0
1
¸
¸
3. None of the following subsets of vectors ￿

x
1
x
2 ￿

in R
2
is a subspace. Why?
(a) All vectors where x
1
= 1.
(b) All vectors where x
1
= 0 or x
2
= 0.
(c) All vectors where x
1
≥ 0.
(d) All vectors where x
1
and x
2
are both ≥ 0 or both ≤ 0.
(e) All vectors where x
1
and x
2
are both integers.
4. Describe geometrically the subspace of R
3
spanned by the following vectors.
(a)

1
1
0
¸
¸
,

0
0
1
¸
¸
84 15. Vector Spaces, Subspaces, and Span
(b)

1
1
1
¸
¸
,

0
0
1
¸
¸
(c)

1
1
0
¸
¸
,

1
1
1
¸
¸
(d)

1
1
0
¸
¸
,

1
1
1
¸
¸
,

0
0
1
¸
¸
5. Find examples of subspaces of R
4
that satisfy the following conditions.
(a) Two “two dimensional planes” that intersect only at the origin.
(b) A line and a “three dimensional hyperplane” that intersect only at the origin.
6. Find vector representations for the following geometric objects, or said another
way, ﬁnd spanning sets of vectors for each of the following subspaces.
(a) 3x
1
−x
2
= 0 in R
2
.
(b) x
1
+x
2
+x
3
= 0 in R
3
.
(c) x
1
+x
2
+x
3
= 0 and x
1
−x
2
+x
3
= 0 in R
3
.
(d) x
1
−2x
2
+ 3x
3
−4x
4
= 0 in R
4
.
(e) x
1
+ 2x
2
−x
3
= 0, x
1
−2x
2
+x
4
= 0, x
2
−x
5
= 0 in R
5
.
7. Find vector representations for the following geometric objects and describe them.
(a) 3x
1
−x
2
= 3 in R
2
.
(b) x
1
+x
2
+x
3
= 1 in R
3
.
16. Linear Independence, Basis, and Dimension 85
16. LINEAR INDEPENDENCE, BASIS, AND DIMENSION
It is possible for diﬀerent sets of vectors to span the same subspace. For example,
it is easy to see geometrically that the two sets of vectors

1
1
0
¸
¸
,

1
1
1
¸
¸
,

0
0
1
¸
¸

and

1
1
0
¸
¸
,

0
0
1
¸
¸

generate the plane x
1
− x
2
= 0 in R
3
. The mathematical reason for this is that the
second vector in the ﬁrst set can be written as a linear combination of the ﬁrst and
third vectors:

1
1
1
¸
¸
=

1
1
0
¸
¸
+

0
0
1
¸
¸
Since the second vector can be regenerated from the other two, it is really not needed
and therefore can be dropped from the spanning set. The question arises how in
general can we reduce a spanning set to one of minimal size and still have it span the
same subspace.
x
1
x
2
x
3
1
1
1
0
0
1
1
1
0
FIGURE 6
Example 1: Suppose S = span{v
1
, v
2
, v
3
, v
4
, v
5
, v
6
}, and suppose we discover among
the spanning vectors the linear relationship v
2
−v
3
−3v
4
+ 2v
6
= 0. If we solve this
equation for v
2
we obtain v
2
= v
3
+ 3v
4
− 2v
6
. Since v
2
can be regenerated from
the other vectors in the spanning set, it can be removed from the spanning set. The
remaining vectors will still generate the same subspace S = span{v
1
, v
3
, v
4
, v
5
, v
6
}.
Of course, we could have solved the equation for v
3
or for v
4
or for v
6
, so any one of
86 16. Linear Independence, Basis, and Dimension
those vectors could have been the one to have been dropped. Now suppose among the
remaining vectors we ﬁnd another linear relationship, say 2v
1
+v
4
−4v
5
= 0. Then we
can solve for v
1
or for v
4
or for v
5
and can therefore drop any one of these vectors from
the spanning set. Suppose we drop v
4
. We then obtain S = span{v
1
, v
3
, v
5
, v
6
}. At
this point suppose there does not exist any linear relationship between the remaining
vectors. Then this process of shrinking the spanning set will have to stop.
On the basis of these observations we make some deﬁnitions. We say that a
collection of vectors v
1
, v
2
, . . . , v
n
is linearly dependent if there exists a linear combi-
nation of them that equals zero
a
1
v
1
+a
2
v
2
+. . . +a
n
v
n
= 0
where at least some of the coeﬃcients a
1
, a
2
, . . . , a
n
are not zero, and we say that
they are linearly independent if the only linear combination of them that equals zero
is the trivial one
0v
1
+ 0v
2
+. . . + 0v
n
= 0.
If a set of vectors v
1
, v
2
, . . . , v
n
is (1) linearly independent and (2) spans a subspace
S, then we say these vectors form a basis for S. (We state these deﬁnitions for
subspaces, but, since any vector space is a subspace of itself, they also hold for vector
spaces.)
two dependent
vectors
two independent
vectors
three dependent
vectors
v
1
v
1
v
1
v
2
v
2
v
2
v
3
FIGURE 7
The process described in the example above can now be expressed in the lan-
guage of linear independence and basis as follows: Suppose a set of vectors span a
subspace S. If these vectors are linearly dependent, then there is a nontrivial linear
combination of them that equals zero. In this case one of the vectors can be dropped
from the spanning set. (Any vector that appears in the linear combination with a
nonzero coeﬃcient can be chosen.) The remaining vectors will still span S. This
process of successively dropping dependent vectors can be continued until the set of
spanning vectors is linearly independent. The resulting spanning set is therefore a
basis for S. Although this process of successively dropping vectors from spanning set
16. Linear Independence, Basis, and Dimension 87
is not a practical way to actually ﬁnd a basis for a subspace, it does prove that every
subspace has a basis.
The importance of a basis to a subspace lies in the fact that not only can every
vector in a subspace be represented as a linear combination of the vectors in its basis,
but, even further, that this representation is unique. If for a basis v
1
, v
2
, . . . , v
n
we
have v = a
1
v
1
+a
2
v
2
+. . .+a
n
v
n
and also v = b
1
v
1
+b
2
v
2
+. . .+b
n
v
n
, then subtraction
gives 0 = (a
1
− b
1
)v
1
+ (a
2
− b
2
)v
2
+ . . . + (a
n
− b
n
)v
n
. But since v
1
, v
2
, . . . , v
n
are
linearly independent, all the coeﬃcients (a
i
− b
i
) = 0, and therefore a
i
= b
i
. We
conclude that there is only one way to write a vector as a linear combination of basis
vectors.
Example 2: The following three vectors

1
3
−1
−1
¸
¸
¸
,

2
6
0
4
¸
¸
¸
,

1
3
1
5
¸
¸
¸
generate a subspace S of R
4
. Our goal is to ﬁnd a basis for S. We accomplish this
by forming the matrix
A =

1 3 −1 −1
2 6 0 4
1 3 1 5
¸
¸
,
whose rows consist of these three vectors, and running Gaussian elimination (or
Gauss-Jordan elimination as we do here) to obtain
U =

1 3 0 2
0 0 1 3
0 0 0 0
¸
¸
.
The two nonzero rows of U, when made into column vectors, will form a basis for S:

1
3
0
2
¸
¸
¸
,

0
0
1
3
¸
¸
¸
Why does this work? First note that, because of the nature of Gaussian operations,
every row of U is a linear combination of the rows of A. Furthermore, since A can
be reconstructed from U by reversing the sequence of Gaussian operations, every
row of A is a linear combination of the rows of U. We can now draw a number of
conclusions. First, the rows of U must span the same subspace as the rows of A.
Second, since there is some linear combination of the rows of A that results in the
88 16. Linear Independence, Basis, and Dimension
third row of U, which is the zero vector, the rows of A must therefore be linearly
dependent. And ﬁnally, because of the echelon form of U, the nonzero rows of U are
automatically linearly independent. (See Exercise 4.)
Now that we have a basis for S, we can express any vector in S as a unique
linear combination of the basis vectors. For example, to express the vector

2
6
−3
−5
¸
¸
¸
in terms of the basis we must solve the equation
a

1
3
0
2
¸
¸
¸
+b

0
0
1
3
¸
¸
¸
=

2
6
−3
−5
¸
¸
¸
.
If the given vector is in S, then as we have seen there will be exactly one solution,
otherwise there will be no solution. We make the extremely important observation
that this equation is equivalent to the linear system

1 0
3 0
0 1
2 3
¸
¸
¸ ￿

a
b ￿

=

2
6
−3
−5
¸
¸
¸
(See Section 2 Exercise 7), which we can solve by Gaussian elimination. In this case
we obtain the solution a = 2 and b = −3 so that
2

1
3
0
2
¸
¸
¸
−3

0
0
1
3
¸
¸
¸
=

2
6
−3
−5
¸
¸
¸
.
Example 3: Find a basis for the subspace S of all vectors in R
4
whose components
satisfy the equation x
1
+ x
2
− x
3
+ x
4
= 0. This was Example 4 of the previous
section. There we found S consisted of all vectors of the form
a

−1
1
0
0
¸
¸
¸
+b

1
0
1
0
¸
¸
¸
+c

−1
0
0
1
¸
¸
¸
.
16. Linear Independence, Basis, and Dimension 89
The three column vectors clearly span S, and in fact they are also linearly indepen-
dent. This is true because if
a

−1
1
0
0
¸
¸
¸
+b

1
0
1
0
¸
¸
¸
+c

−1
0
0
1
¸
¸
¸
=

−a +b −c
a
b
c
¸
¸
¸
=

0
0
0
0
¸
¸
¸
,
then clearly a = b = c = 0. These three vectors therefore form a basis for S. This
holds in general. That is, if we solve a homogeneous system Ax = 0 by Gaussian
elimination, set the free variables equal to arbitrary constants, and write the solution
in vector form, then we obtain a linear combination of independent vectors, one for
each free variable. Therefore, in all the examples of the previous section, we were
actually ﬁnding not just spanning sets but bases! Furthermore, the comment in
Section 10,“ Our method for ﬁnding eigenvectors, which is to solve (A − λI)x = 0
by Gaussian elimination, does in fact produce linearly independent eigenvectors, one
for each free variable,” is justiﬁed.
There is no unique choice of a basis for a subspace. In fact, there are inﬁnitely
many possibilities. For example, the each of three sets of vectors

1
1
0
¸
¸
,

0
0
1
¸
¸

,

1
1
0
¸
¸
,

1
1
1
¸
¸

,

1
1
1
¸
¸
,

0
0
1
¸
¸

are bases for the plane x
1
− x
2
= 0 in R
3
. You can no doubt think of many more.
For the Euclidean spaces R
n
, however, there is the following natural choice of basis:

1
0
.
.
.
0
0
¸
¸
¸
¸
¸
¸
,

0
1
.
.
.
0
0
¸
¸
¸
¸
¸
¸
, . . . ,

0
0
.
.
.
1
0
¸
¸
¸
¸
¸
¸
,

0
0
.
.
.
0
1
¸
¸
¸
¸
¸
¸
These are the vectors that point along the coordinate axes, so we will call them
coordinate vectors. They clearly span and are linearly independent and therefore
form a basis for R
n
.
Even though the set of vectors in a basis is not unique, it is true that the number
of vectors in a basis is unique. This number we deﬁne to be the dimension of the
subspace. Clearly the Euclidean space R
n
has dimension n. It now makes sense to
talk about things like “a three dimensional hyperplane passing through the origin in
four space.” We state this important property of bases formally as:
90 16. Linear Independence, Basis, and Dimension
Theorem. Any two bases for a subspace contain the same number of vectors.
Proof: It is enough to show that in a subspace S the number of vectors in any linearly
independent set must be less than or equal to the number of vectors in any spanning
set. Since a basis is both linearly independent and spans, this means that any two
bases must contain exactly the same number of vectors. We now illustrate the proof
in a special case. The general case will then be clear. Suppose v
1
, v
2
, v
3
span the
subspace S and w
1
, w
2
, w
3
, w
4
is some larger set of vectors in S. We show that the
w’s must be linearly dependent. Since the v’s span, each w can be written as a linear
combination of the v’s:
w
1
= a
11
v
1
+a
12
v
2
+a
13
v
3
w
2
= a
21
v
1
+a
22
v
2
+a
23
v
3
w
3
= a
31
v
1
+a
32
v
2
+a
33
v
3
w
4
= a
41
v
1
+a
42
v
2
+a
43
v
3
.
In matrix terms this is

.
.
.
.
.
.
.
.
.
.
.
.
w
1
w
2
w
3
w
4
.
.
.
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
v
1
v
2
v
3
.
.
.
.
.
.
.
.
.
¸
¸
¸

a
11
a
21
a
31
a
41
a
12
a
22
a
32
a
42
a
13
a
23
a
33
a
43
¸
¸
,
which we write as W = V A. Since Ahas fewer rows than columns, there are nontrivial
solutions to the homogeneous system Ax = 0 (see Section 7 Exercise 5(b)), that is,
there is a nonzero vector c such that Ac = 0. We then have Wc = (V A)c = V (Ac) =
V 0 = 0. But the equation Wc = 0 when written out is just c
1
w
1
+ c
2
w
2
+ c
3
w
3
+
c
4
w
4
= 0 and is therefore a nontrivial linear combination of the w’s. The w’s are
therefore linearly dependent and we are done.
We see that a basis is a maximal independent set of vectors in the sense that it
cannot be made larger without losing independence. It is also a minimal spanning
set of vectors since it cannot be made smaller and still span the space. Note that we
have been implicitly assuming that the number of vectors in a basis is ﬁnite. It is
possible to extend the discussion above to the inﬁnite dimensional case, but we will
not do this.
EXERCISES
1. Decide the dependence or independence of the following sets of vectors.
(a) ￿

1
2 ￿

, ￿

2
1 ￿

(b)

1
3
2
¸
¸
,

3
3
1
¸
¸
,

3
6
5
¸
¸
16. Linear Independence, Basis, and Dimension 91
(c)

2
1
2
¸
¸
,

1
1
2
¸
¸
,

3
2
4
¸
¸
(d)

1
2
1
1
¸
¸
¸
,

2
1
2
1
¸
¸
¸
,

2
2
2
2
¸
¸
¸
(e)

1
2
1
1
¸
¸
¸
,

2
1
2
1
¸
¸
¸
,

3
3
3
2
¸
¸
¸
,

1
−1
1
0
¸
¸
¸
2. Find bases for the subspaces spanned by the sets of vectors in Exercise 1 above.
In each case indicate the dimension.
3. Find bases for the subspaces deﬁned by the equations in Section 15 Exercise 6. In
each case indicate the dimension.
4. Show directly from the deﬁnition that the nonzero rows of

1 3 0 2
0 0 1 3
0 0 0 0
¸
¸
are
linearly independent.
5. Express each vector as a linear combination of the vectors in the indicated sets.
(a)

5
−1
4
¸
¸

3
1
2
¸
¸
,

2
2
1
¸
¸

(b)

−3
1
4
¸
¸

3
1
2
¸
¸
,

2
2
1
¸
¸

(c)

10
−2
8
¸
¸

3
1
2
¸
¸
,

2
2
1
¸
¸
,

−1
1
−1
¸
¸

(d) ￿

8
13 ￿
￿￿
2
1 ￿

, ￿

1
2 ￿￿

For this case draw a picture!
6. Suppose we have three sets of vectors, U = {u
1
, . . . , u
4
}, V = {v
1
, . . . , v
5
}, W =
{w
1
, . . . , w
6
}, in R
5
. For each set answer the following.
(a) The set (is) (is not) (might be) linearly independent.
(b) The set (does) (does not) (might) span R
5
.
92 16. Linear Independence, Basis, and Dimension
(c) The set (is) (is not) (might be) a basis for R
5
.
7. If the complex vectors v and v are linearly independent over the complex numbers
and if v = x + iy, then show that the real vectors x and y are linearly independent
over the complex numbers. (Hint: Assume ax + by = 0 and use x =
v +v
2
and
x =
v −v
2
to show a = b = 0.) This settles a technnical question about complex
vectors from Section 13.
17. Dot Product and Orthogonality 93
17. DOT PRODUCT AND ORTHOGONALITY
So far, in our discussion of vector spaces, there has been no mention of “length”
or “angle.” This is because the deﬁnition of a vector space does not require such con-
cepts. For many vector spaces however, especially for Euclidean spaces, there is a nat-
ural way to establish these notions that is often quite useful. In two-dimensional space
the physical length of the vector x = ￿

x
1
x
2 ￿

is by the Pythagorean Theorem equal to ￿

x
2
1
+x
2
2
, and in three-dimensional space the physical length of the vector x =

x
1
x
2
x
3
¸
¸
is by two applications of the Pythagorean Theorem equal to ￿

x
2
1
+x
2
2
+x
2
3
. It seems
reasonable therefore to deﬁne the length or norm of a vector x in R
n
, which we denote
as ￿x￿, in the following way: ￿
x￿ = ￿

x
2
1
+x
2
2
+ · · · +x
2
n
(There are situations and applications where other measures of length are more ap-
propriate. But this one will be adequate for our purposes.) Note that since our
vectors are column vectors, the length of a vector can also be written in matrix nota-
tion as ￿x￿ =

x
T
x. It is easy to see that the length function satisﬁes the following
two properties:
1. ￿ax￿ = |a| ￿x￿
2. ￿x￿ ≥ 0 and = 0 ⇔x = 0.
Note also that if we multiply any vector x by the reciprocal of its length, we get
1 ￿
x￿
x, which is a vector of length one. We say this is the unit vector in the direction
of x. With this notion of length we can immediately deﬁne the distance between two
points x and y in R
n
as ￿x − y￿. This corresponds to the usual physical distance
between points in two and three-dimensional space.
How can we decide if two vectors are perpendicular? In order to help us do this,
we deﬁne the dot product x · y of two vectors x and y in R
n
as the number
x · y = x
1
y
1
+x
2
y
2
+ · · · +x
n
y
n
.
In matrix notation we can also write x · y = x
T
y. The dot product satisﬁes the
following properties:
1. x · y = y · x
2. (ax +by) · z = ax · z +by · z
3. z · (ax +by) = az · x +bz · y
4. x · x = ￿x￿
2
.
94 17. Dot Product and Orthogonality
They can be veriﬁed by direct computation. The second and third properties follow
from the distributivity of matrix multiplication. Other terms for dot product are
scalar product and inner product.
Now we will see how to determine if two vectors x and y in R
n
are perpendicular.
First note that, assuming they are independent, they span a two-dimensional sub-
space of R
n
. When endowed with the length function ￿ ￿, this subspace satisﬁes all
the axioms of the Euclidean plane. We therefore have all the constructs of Euclidean
geometry in this plane including lines, circles, lengths, and angles. In particular, we
have the Pythagorean Theorem, which says that the sides of a triangle are in the
relation a
2
+b
2
= c
2
if and only if the angle opposite side c is a right angle. (It goes
x
y
|| x ||
|| y ||
|| y - x ||
FIGURE 8
If we write this equation for the triangle formed by the two vectors x and y in vector
notation and use the properties of the dot product, we have ￿
x￿
2
+￿y￿
2
= ￿x −y￿
2
= (x −y) · (x −y)
= x · x −x · y −y · x +y · y
= ￿x￿
2
−2x · y +￿y￿
2
.
Canceling, we obtain 0 = −2x · y or x · y = 0. We therefore conclude that the vectors
x and y are perpendicular if and only if their dot product x · y = 0. Another term
for perpendicular is orthogonal. In mathematical shorthand we write the statement
“x is orthogonal to y” as x ⊥ y. Therefore the result above can be written as
x ⊥ y ⇔x · y = 0.
17. Dot Product and Orthogonality 95
Example 1: The vectors x =

2
2
1
¸
¸
and y =

−2
1
2
¸
¸
are orthogonal because x · y = 0.
Each has length

4 + 4 + 1 = 3. The unit vector in the direction of x is
1 ￿
x￿
x =

2
3
2
3
1
3
¸
¸
¸
.
Even though it is not necessary for linear algebra, the dot product can also tell us
the angle between any two vectors, orthogonal or not. For this we need the Law of
Cosines, which also appears in Euclid and which says that the sides of any triangle
are in the relation a
2
+b
2
= c
2
+2ab cos θ where θ is the angle opposite side c. Again
writing this equation for the triangle formed by the two vectors x and y in vector
notation ￿x￿
2
+ ￿y￿
2
= ￿x − y￿
2
+ 2￿x￿￿y￿ cos θ and computing (Exercise 9) we
obtain x · y = ￿x￿￿y￿ cos θ or
cos θ =
x · y ￿
x￿￿y￿
.
x
y
|| y ||
|| y - x ||
!
|| x ||
FIGURE 9
Example 2: The angle between the vectors

2
2
1
¸
¸
and

1
3
2
¸
¸
is determined by cos θ =
10

9

14
= 0.89087, so θ = arccos(0.89087) = 27.02

.
We are now in a position to compute the projection of one vector onto another.
Suppose we wish to ﬁnd the vector p which is the geometrically perpendicular pro-
jection of the vector y onto the vector x. To be precise, we should say that we are
seeking the projection p of the vector y onto the direction deﬁned by x or onto the line
96 17. Dot Product and Orthogonality
generated by x. Since we can do geometry in the plane deﬁned by the two vectors x
and y, we immediately see from the ﬁgure below that p must have the property that
x ⊥ (y −p), so 0 = x · (y −p) = x · y −x · p or x · p = x · y. Also, since p lies on the
line generated by x, it must be some constant multiple of x, so p = cx. Substituting
this into the previous equation we obtain c(x · x) = x · y or c = (x · y)/(x · x). The
ﬁnal result is therefore
p =
x · y ￿
x￿
2
x.
We should think of the vector p as the component of y in the direction of x. In fact,
if we write y = p + (y −p), we have resolved y into the sum of its component in the
direction of x and its component perpendicular to x.
y
x
p
y - p
FIGURE 10
Example 3: To resolve y =

5
5
−2
¸
¸
into its components in the direction of and
perpendicular to

2
2
1
¸
¸
, just compute p =
18
9

2
2
1
¸
¸
=

4
4
2
¸
¸
and obtain
y = p + (y −p) =

4
4
2
¸
¸
+

¸

5
5
−2
¸
¸

4
4
2
¸
¸

=

4
4
2
¸
¸
+

1
1
−4
¸
¸
Having completed our discussion of orthogonality of vectors, we now turn to
subspaces. We say that two subspaces V and W are orthogonal subspaces if every
vector in V is orthogonal to every vector in W. For example, the z-axis is orthogonal
to the xy-plane in R
3
. But note that xz-plane and the xy-plane are not orthogonal.
That is, a wall of a room is not perpendicular to the ﬂoor! This is because the x-
coordinate vector

1
0
0
¸
¸
is in both subspaces but is not orthogonal to itself. It is easy
17. Dot Product and Orthogonality 97
to check the orthogonality of subspaces if we have spanning sets for each subspace.
Just verify that every vector in one spanning set is orthogonal to every vector in
the other. For example, if V = span{v
1
, v
2
} and W = span{w
1
, w
2
} and the v’s are
orthogonal to the w’s, then any vector in V is orthogonal to any vector in W, because
(a
1
v
1
+a
2
v
2
) · (b
1
w
1
+b
2
w
2
) = a
1
b
1
v
1
· w
1
+a
2
b
1
v
2
· w
1
+a
1
b
2
v
1
· w
2
+a
2
b
2
v
2
· w
2
= 0
We make one more deﬁnition. The set W of all vectors perpendicular to a
subspace V is called the orthogonal complement of V and is written as W = V

. It is
easy to see that W is in fact a subspace (Exercise 12). It is also follows automatically,
but not so easily, that V is the perpendicular complement of W or V = W

(Exercise
13). In other words, the relationship is symmetric, and we are justiﬁed in saying that
V and W are orthogonal complements of each other. For example, the xy-plane
and the z-axis are orthogonal complements, but the x-axis and the y-axis are not.
Orthogonal complements are easy to compute.
Example 4: Find the orthogonal complement of the line generated by the vector

1
2
3
¸
¸
, and ﬁnd the equations of the line. Here the ﬁrst problem is to ﬁnd all vectors
y orthogonal to the given generating vector, that is, to ﬁnd all vectors y whose dot
product with the given vector is zero. Expressed in matrix notation this is just
[ 1 2 3 ]

y
1
y
2
y
3
¸
¸
= 0.
We solve this linear system and obtain
y = c

−2
1
0
¸
¸
+d

−3
0
1
¸
¸
.
The two vectors above therefore span the plane that is the orthogonal complement
of the given line. In fact, these two vectors are a basis for that plane. Now to ﬁnd
the equations of the line itself, note that a vector x lies in the line if and only if x is
orthogonal to the plane we just found. In other words, the dot product of x with each
of the two vectors that generate that plane must be zero. Therefore x must satisfy
the equations −2x
1
+x
2
= 0 and −3x
1
+x
3
= 0. These are then the equations that
deﬁne the given line.
Example 5: Find the equations of the plane generated by the two vectors

1
1
1
¸
¸
and

1
−1
1
¸
¸
. Again we look for all vectors orthogonal to the generating vectors. We
98 17. Dot Product and Orthogonality
therefore set up the linear system ￿

1 1 1
1 −1 1 ￿

y
1
y
2
y
3
¸
¸
= ￿

0
0 ￿

and get the solution
y = c

−1
0
1
¸
¸
,
which generates the line orthogonal to the given plane. Now to ﬁnd the equation
form of the plane, note that any vector x in the plane must be orthogonal to the
orthogonal complement of the plane, that is, to the line just obtained. This means
that the dot product of x with the vector that generates the orthogonal line must be
zero. Therefore −x
1
+x
3
= 0 is the equation of the given plane.
Note that in Section 15 we learned how to go from the equation form of a subspace
to its vector form. We now know how to go in the reverse direction, that is, from its
vector form to its equation form.
EXERCISES
1. For the two vectors x =

1
2
−2
−4
¸
¸
¸
and y =

−6
−2
2
9
¸
¸
¸
(a) Find their lengths.
(b) Find the unit vectors in the directions they deﬁne.
(c) Find the angle between them.
(d) Find the projection of y onto x.
(e) Resolve y into components in the direction of and perpendicular to x.
2. In R
2
ﬁnd the point on the line generated by the vector ￿

2
3 ￿

closest to the point
(8,
11
2
).
3. Find all vectors orthogonal to ￿

α
β ￿

in R
2
.
4: Show that the line generated by the vector

2
2
1
¸
¸
is orthogonal to the plane gener-
ated by the two vectors

1
1
−4
¸
¸
and

2
0
−4
¸
¸
.
17. Dot Product and Orthogonality 99
5. Find the orthogonal complements of the subspaces generated by the following
vectors.
(a)

1
1
1
¸
¸
(b)

1
1
1
¸
¸
,

1
3
7
¸
¸
(c)

1
1
1
2
¸
¸
¸
,

2
1
1
1
¸
¸
¸
(d)

1
1
0
2
¸
¸
¸
,

2
0
1
1
¸
¸
¸
,

2
2
0
3
¸
¸
¸
6. Find equations deﬁning the subspaces in Exercise 5 above.
7. True or false?
(a) If two subspaces V and W are orthogonal, then so are their orthogonal comple-
ments.
(b) If U is orthogonal to V and V is orthogonal to W, then U is orthogonal to W.
8. Show that the length of
1 ￿
x￿
x is one.
9. Derive x · y = ￿x￿￿y￿ cos θ from the Law of Cosines.
10. Show x · y =
1
4
(￿x +y￿
2
−￿x −y￿
2
).
11. Show that if the vectors v
1
, v
2
, v
3
are all orthogonal to one another, then they
must be linearly independent. (Hint: Write c
1
v
1
+c
2
v
2
+c
3
v
3
= 0 and show the c’s
are all zero by dotting both sides with each of the v’s.) Of course this result extends
to arbitrary numbers of vectors v
1
, v
2
, . . . , v
n
.
12. If V is a subspace, then show W = V

is also a subspace, that is, show W is
closed under addition and scalar multiplication.
13. Let V be a subspace of R
8
and W = V

. We wish to show that W

= V , or,
what is the same thing, (V

)

= V.
100 17. Dot Product and Orthogonality
(a) Suppose V has a basis v
1
, v
2
, v
3
. Let
A =

· · · v
1
· · ·
· · · v
2
· · ·
· · · v
3
· · ·
¸
¸
,
and by counting leading and free variables in the system Ax = 0 show that
V

= W has a basis w
1
, w
2
, w
3
, w
4
, w
5
.
(b) Let
B =

· · · w
1
· · ·
· · · w
2
· · ·
· · · w
3
· · ·
· · · w
4
· · ·
· · · w
5
· · ·
¸
¸
¸
¸
¸
,
and by counting leading and free variables in the system Bx = 0 show that W

has dimension 3.
(c) Observe that each of the three vectors v
1
, v
2
, v
3
satisfy Bx = 0 and therefore are
in W

. Since they are also independent, conclude that W

= span{v
1
, v
2
, v
3
} =
V.
14. Show if v · w = ±||v||||w||, then v = cw for some constant c. Hint: Expand
||v − cw||
2
and show that it equals zero if c = ±||v||/||w||. Interpret this as saying
that if the angle between two vectors is 0 or π, then one vector is a multiple of the
other.
18. Linear Transformations 101
18. LINEAR TRANSFORMATIONS
Many problems in the physical sciences involve transformations, that is, the
way in which input data is changed into output data. It often happens that the
transformations in question are linear. In this section we present some of the ba-
sic terminology and facts about linear transformations. As usual we consider only
Euclidean spaces.
We deﬁne a transformation to be a function that takes points in R
n
as input and
produces points in R
m
as output, or, in other words, maps points in R
n
to points in
R
m
. For example, S(x
1
, x
2
) = (x
2
1
, x
2
+ 1) is a transformation that maps R
2
to R
2
.
Instead of mapping points to points, we can think of transformations as mapping
vectors to vectors. We can therefore write S as S ￿￿

x
1
x
2 ￿￿

= ￿

x
2
1
x
2
+ 1 ￿

. This is the
view we will take from now on. The picture we should keep in mind is that in general
a transformation T maps the vector x in R
n
to the vector T(x) in R
m
.
R
n
x
T(x)
R
m
T
FIGURE 11
We further deﬁne a transformation T from R
n
to R
m
to be a linear transforma-
tion if for all vectors x and y and constants c it satisﬁes the properties:
1. T(x +y) = T(x) +T(y)
2. T(cx) = cT(x)
Note that if we take c = 0 in property 2 we have T(0) = 0. A linear transformation
must therefore take the origin to the origin. (The transformation S above is therefore
not linear.) Let’s try to view these two properties geometrically. Property 1 says
that under the map T the images of x and y when added together should be the same
as the image of x + y, and property 2 says that the image of x when multiplied by
c should be the same as the image of cx. We can think of property 1 as saying that
T must take the vertices of the parallelogram deﬁned by x and y into the vertices of
the parallelogram deﬁned by T(x) and T(y).
102 18. Linear Transformations
T
x
x + y
y
T(x)
T(y)
T(x + y)
FIGURE 12
It is an immediate consequence of the deﬁnition that a linear transformation takes
subspaces to subspaces. In other words, if S is a subspace of R
n
, then T(S), which is
the set of all vectors of the form T(x), is a subspace of R
m
. It is a further consequence
of the deﬁnition that every linear transformation must have a certain special form.
We now determine what that form must be.
First, we can create linear transformations by using matrices. Suppose A is
an m × n matrix. Then we can deﬁne the transformation T(x) = Ax. Because of
the way matrix multiplication works, the input vector x is in R
n
and the output
vector Ax is in R
m
. This transformation is linear because T(x + y) = A(x + y) =
Ax + Ay = T(x) + T(y) and T(cx) = A(cx) = cAx = cT(x), which both follow
from the properties of matrix multiplication. Therefore every m×n matrix induces
a linear transformation from R
n
to R
m
.
Second, every linear transformation is induced by some matrix. Suppose T is a
linear transformation that maps from R
n
to R
m
. Then we can write
18. Linear Transformations 103
T

¸
¸
¸

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸

= T

¸
¸
¸
x
1

1
0
.
.
.
0
¸
¸
¸
¸
+x
2

0
1
.
.
.
0
¸
¸
¸
¸
+ · · · +x
n

0
0
.
.
.
1
¸
¸
¸
¸

= x
1
T

¸
¸
¸

1
0
.
.
.
0
¸
¸
¸
¸

+x
2
T

¸
¸
¸

0
1
.
.
.
0
¸
¸
¸
¸

+ · · · +x
n
T

¸
¸
¸

0
0
.
.
.
1
¸
¸
¸
¸

= x
1

a
11
a
21
.
.
.
a
m1
¸
¸
¸
¸
+x
2

a
12
a
22
.
.
.
a
m2
¸
¸
¸
¸
+ · · · +x
n

a
1n
a
2n
.
.
.
a
mn
¸
¸
¸
¸
=

a
11
a
12
. . . a
1n
a
21
a
22
. . . a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
m1
a
m2
. . . a
mn
¸
¸
¸
¸

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸
.
(The second equality follows from the linearity of T. The fourth equality follows
from Section 2 Exercise 7.) Therefore every linear transformation T has a matrix
representation as T(x) = Ax.
Note also that
T

¸
¸
¸

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸

=

a
11
x
1
+a
12
x
2
+. . . +a
1n
x
x
a
21
x
1
+a
22
x
2
+. . . +a
2n
x
n
.
.
.
a
m1
x
1
+a
m2
x
2
+. . . +a
mn
x
n
¸
¸
¸
¸
.
So every linear transformation must have this form. From now on, we will forget
about the formal linear transformation T and instead just consider the matrix A as
a transformation from one Euclidean space to another. Note that A is completely
determined by what it does to the coordinate vectors. This follows either from
the computation above or just from matrix multiplication. For example, if A = ￿

3 −1 1
1 5 2 ￿

, then A

1
0
0
¸
¸
= ￿

3
1 ￿

, A

0
1
0
¸
¸
= ￿

−1
5 ￿

, and A

0
0
1
¸
¸
= ￿

1
2 ￿

.
Let S be a linear transformation from R
n
to R
q
and T be a linear transformation
from R
q
to R
n
. The the composition T ◦ S is deﬁned to be the transformation
(T ◦ S)(x) = T(S(x)) that takes R
n
to R
m
. It is a linear transformation since
T(S(x + y)) = T(S(x) + (S(y)) = T(S(x)) + T(S(y)) and T(S(c)) = T(cS(x)) =
cT(S(x)). If S has matrix A and T has matrix B, then the question arises, what is
104 18. Linear Transformations
the matrix for the composition T ◦ S? If we compute T(S(x)) = B(Ax) = (BA)x,
we see immediately that that answer is that it is the product matrix BA. The key to
this observation is the relation B(Ax) = (BA)x, which follows from the associativity
of matrix multiplication.
Since this result is so important, we will again compute the matrix of the com-
position, but this time directly. To ﬁnd the jth column of the matrix for T ◦ S we
know that all we have to do is see what it does to the jth coordinate vector.
T

¸
¸
¸
¸
¸
¸
S

¸
¸
¸
¸
¸
¸

0
.
.
.
1
j
.
.
.
0
¸
¸
¸
¸
¸
¸
¸

= T

¸
¸
¸

a
1j
a
2j
.
.
.
a
qj
¸
¸
¸
¸

= a
1j
T

¸
¸
¸

1
0
.
.
.
0
¸
¸
¸
¸

+a
2j
T

¸
¸
¸

0
1
.
.
.
0
¸
¸
¸
¸

+ · · · +a
qj
T

¸
¸
¸

0
0
.
.
.
1
¸
¸
¸
¸

= a
1j

b
11
b
21
.
.
.
b
m1
¸
¸
¸
¸
+a
2j

b
12
b
22
.
.
.
b
m2
¸
¸
¸
¸
+ · · · +a
qj

b
1q
b
2q
.
.
.
b
mq
¸
¸
¸
¸
=

b
11
a
1j
+b
12
a
2j
+ · · · +b
1q
a
qj
b
21
a
1j
+b
22
a
2j
+ · · · +b
2q
a
qj
.
.
.
b
m1
a
1j
+b
m2
a
2j
+ · · · +b
mq
a
qj
¸
¸
¸
¸
This is exactly the jth column of the product matrix BA.
Now we investigate the geometry of several speciﬁc linear transformations in
order to build up our intuition. In all of the examples below, the matrix is be square
and is therefore a map between Euclidean spaces of the same dimension. It can
therefore also be thought of as a map from one Euclidean space to itself.
Example 1: Let A = ￿

2 0
0 2 ￿

, then A ￿

x
y ￿

= ￿

2x
2y ￿

= 2 ￿

x
y ￿

. The eﬀect of this
matrix is to stretch every vector by a factor of 2.
Example 2: Let A = ￿

2 0
0 3 ￿

, then A ￿

x
y ￿

= ￿

2x
3y ￿

. This matrix stretches in the
x-direction by a factor of 2 and in the y-direction by a factor of 3.
18. Linear Transformations 105
Example 3: Let A = ￿

1 0
0 −1 ￿

, then A ￿

x
y ￿

= ￿

x
−y ￿

. This matrix reﬂects the plane
R
2
across the x-axis.
Example 4: Let A = ￿

1 0
0 0 ￿

, then A ￿

x
y ￿

= ￿

x
0 ￿

. This matrix perpendicularly
projects the plane R
2
onto the x-axis.
Example 5: Let A = ￿

0 −1
1 0 ￿

, then A ￿

1
0 ￿

= ￿

0
1 ￿

and A ￿

0
1 ￿

= ￿

−1
0 ￿

. Clearly A
rotates the coordinate vectors by 90

, but does this mean that it rotates every vector
by this amount? Yes, as we will see in the next example.
Example 6: Let’s consider the transformation that rotates the plane R
2
by an angle
θ. The ﬁrst thing we must do is to show that this transformation is linear. Since any
rotation T takes the parallelogram deﬁned by x and y to the congruent parallelogram
deﬁned by T(x) and T(y), it takes the vertex x + y to the vertex T(x) + T(y).
Therefore it satisﬁes the property T(x) + T(y) = T(x + y), which is Property 1 for
linear transformations. Property 2 can be veriﬁed in the same way.
x
x + y
y
T(x)
T(y)
T(x + y)
FIGURE 13
We conclude that a rotation is a linear transformation. We are therefore justiﬁed in
asking for its matrix representation A. To ﬁnd A all we have to do is to compute
where the coordinate vectors go. Clearly A ￿

1
0 ￿

= ￿

cos θ
sin θ ￿

and A ￿

0
1 ￿

= ￿

−sin θ
cos θ ￿

and therefore A = ￿

cos θ −sin θ
sin θ cos θ ￿

.
106 18. Linear Transformations
!
!
1
0
0
1
cos !
sin !
-sin !
cos !
T
0
1
=
1
0
T =
FIGURE 14
Example 7: Now consider reﬂection across an arbitrary line through the origin. A
reﬂection clearly takes the parallelogram deﬁned by x and y to the congruent par-
allelogram deﬁned by T(x) and T(y) and therefore satisﬁes Property 1. Property 2
can be veriﬁed in the same way.
x
x + y
y
T(x)
T(x + y)
T(y)
FIGURE 15
A reﬂection is therefore a linear transformation and so has a matrix representation
determined by where it takes the coordinate vectors. For example, if A reﬂects
R
2
across the line y = x, then A ￿

1
0 ￿

= ￿

0
1 ￿

and A ￿

0
1 ￿

= ￿

1
0 ￿

and therefore
A = ￿

0 1
1 0 ￿

.
Example 8: To show that a perpendicular projection of R
2
onto an arbitrary line
through the origin is a linear transformation is a little more diﬃcult. The parallelo-
gram deﬁned by x and y is projected perpendicularly onto the line. By the congru-
ence of the two shaded triangles in the ﬁgure below we see that ￿T(x)￿ +￿T(y)￿ =
18. Linear Transformations 107 ￿
T(x + y)￿, and since these vectors all lie on the same line and point in the same
direction, we conclude that T(x) + T(y) = T(x + y). The other two cases when the
line passes through the parallelogram or when x and y project to opposite sides of
the origin are similar. Property 2 can be veriﬁed in the same way.
y
x + y
x
T(y)
T(x)
T(x + y)
FIGURE 16
A projection is therefore a linear transformation and so has a matrix representation
determined by where it takes the coordinate vectors. For example, if A is the matrix
of the projection of R
2
onto the line y = x, then A ￿

1
0 ￿

= ￿

1
2
1
2 ￿

and A ￿

0
1 ￿

= ￿

1
2
1
2 ￿

,
and therefore A = ￿

1
2
1
2
1
2
1
2 ￿

.
Example 9: Let A = ￿

1 2
0 1 ￿

. In this case, even though we know where the coordinate
vectors go, it is still not easy to see what the transformation does. But if we ﬁx y = c
then A ￿

x
c ￿

= ￿

x + 2c
c ￿

shows us that the horizontal line at level c is shifted 2c units
to the right (if c is positive, to the left otherwise). This is a horizontal shear.
A
FIGURE 17
108 18. Linear Transformations
Example 10: Let A = ￿

4 2
−1 1 ￿

. Again the images of the coordinate vectors do not
tell us much. It turns out that to see the geometrical eﬀect of this matrix we will
need to compute its diagonal factorization. We will take up this approach in Section
22. Most matrices are in fact like this one or worse requiring even more sophisticated
factorizations.
Example 11: First rotate the plane R
2
by 90

and then reﬂect across the 45

line. This
is a typical example of the composition of two linear transformations. The rotation is
A = ￿

0 −1
1 0 ￿

(Example 5) and the reﬂection is B = ￿

0 1
1 0 ￿

(Example 7). To apply
them in the correct order to an arbitrary vector x we must write B(A(x)) which by
the associativity of matrix multiplication is the same as (BA)x. So we just compute
the product
BA = ￿

0 1
1 0 ￿
￿
0 −1
1 0 ￿

= ￿

1 0
0 −1 ￿

,
which is a reﬂection across the x-axis. Note that it is extremely important to perform
the multiplication in the correct order. The reverse order would result in
AB = ￿

0 −1
1 0 ￿
￿
0 1
1 0 ￿

= ￿

−1 0
0 1 ￿

,
which is a reﬂection across the y-axis. This is incorrect!
EXERCISES
1. Prove that linear transformations take subspaces to subspaces.
2. Describe the geometrical eﬀect of each of the following transformations (where
α
2

2
= 1 in (g) and (l)).
(a) ￿

0 −1
−1 0 ￿

(b) ￿

0 0
0 1 ￿

(c) ￿

1
2

1
2

1
2
1
2 ￿

(d) ￿

1

2

1

2
1

2
1

2 ￿

(e) ￿

1
2

3
2

3
2
1
2 ￿

(f) ￿

1
2

3
2

3
2

1
2 ￿

(g) ￿

α −β
β α ￿

(h)

0 −1 0
1 0 0
0 0 1
¸
¸
(i)

0 0 −1
0 1 0
1 0 0
¸
¸
(j)

1 0 0
0 1 0
0 0 0
¸
¸
(k)

0 −1 0
1 0 0
0 0 −1
¸
¸
(l)

α −β 0
β α 0
0 0 −1
¸
¸
3. Find the 3 ×3 matrix that
18. Linear Transformations 109
(a) reverses the direction of every vector.
(b) projects R
3
onto the xz-plane.
(c) reﬂects R
3
across the plane x = y.
(d) rotates R
3
around the x-axis by 45

.
4. Find the image of the unit circle x
2
+ y
2
= 1 under transformations induced by
the two matrices below. What are the image curves? (Hint: Let (¯ x, ¯ y) be the image
of (x, y) where x
2
+y
2
= 1, and ﬁnd an equation satisﬁed by (¯ x, ¯ y).)
(a) ￿

2 0
0 2 ￿

(b) ￿

2 0
0 3 ￿

5. Describe how the following two matrices transform the grid consisting of horizontal
and vertical lines at each integral point of the x and y-axes.
(a) ￿

1 0
3 1 ￿

(b) ￿

3 1
1 3 ￿

6. The matrix ￿

1 1
0 0 ￿

maps R
2
onto the x-axis but is not a projection. Why?
7. In each case below ﬁnd the matrix that represents the resulting transformation
and describe it geometrically.
(a) Transform R
2
by ﬁrst rotating by −90

and then reﬂecting in the line x+y = 0.
(b) Transform R
2
by ﬁrst rotating by 30

, then reﬂecting across the 135

line, and
then rotating by −60

.
(c) Transform R
3
by ﬁrst rotating the xy-plane, then the xz-plane, then the yz-
plane, all through 90

.
8. Interpret the equality ￿

cos β −sin β
sin β cos β ￿
￿
cos α −sin α
sin α cos α ￿

= ￿

cos (α +β) −sin (α +β)
sin (α +β) cos (α +β) ￿

geometrically. Obtain the trigonometric equalities
cos (α +β) = cos αcos β −sin αsin β
sin (α +β) = sin αcos β + cos αsin β.
110 18. Linear Transformations
9. Show that the matrix that reﬂects R
2
across the line through the origin that
makes an angle θ with the x-axis is ￿

cos 2θ sin 2θ
sin 2θ −cos 2θ ￿

. (Hint: Compute where the
coordinate vectors go.)
10. Show that the matrix that projects R
2
onto the line through the origin that
makes an angle θ with the x-axis is ￿

cos
2
θ cos θ sin θ
cos θ sin θ sin
2
θ ￿

. (Hint: Compute where
the coordinate vectors go.)
11. Interpret the equality ￿

cos θ sin θ
sin θ −cos θ ￿
￿
1 0
0 −1 ￿

= ￿

cos θ −sin θ
sin θ cos θ ￿

geometri-
cally. Conclude that any rotation can be written as the product of two reﬂections.
12. Prove the converse of the result of the previous exercise, that is, prove the product
of any two reﬂections is a rotation. (Use the results of Exercises 8 and 9.)
13. Find the matrix that represents the linear transformation T(x
1
, x
2
, x
3
, x
4
) =
(x
2
, x
4
+ 2x
3
, x
1
+x
3
, 2x
3
).
14. If T

1
0
0
¸
¸
= ￿

4
5 ￿

, T

0
1
0
¸
¸
= ￿

0
−2 ￿

, T

0
0
1
¸
¸
= ￿

−3
1 ￿

, then ﬁnd the matrix of T.
15. If T ￿

5
4 ￿

= ￿

6
−2 ￿

, T ￿

3
2 ￿

= ￿

7
1 ￿

, then ﬁnd its matrix.
16. If T rotates R
2
by 30

and dilates it by a factor of 5, then ﬁnd its matrix.
17. If T reﬂects R
3
in the xy-plane and dilates it by a factor of
1
2
, then ﬁnd its
matrix.
19. Row Space, Column Space, Null Space 111
19. ROW SPACE, COLUMN SPACE, NULL SPACE
In the previous section we considered matrices as linear transformations. All of
the examples we looked at were square matrices. Now we consider rectangular ma-
trices and try to understand the geometry of the linear transformations they induce.
To do this, we deﬁne three fundamental subspaces associated with any matrix. Let
A be an m×n matrix. We view A as a map from R
n
to R
m
and make the following
deﬁnitions. The subspace of R
n
spanned by the rows of A (thought of as column
vectors) is called the row space of A and is written row(A). The subspace of R
m
spanned by the columns of A is called the column space of A and is written col(A).
The set of vectors x in R
n
such that Ax = 0 is called the null space of A and is
written null(A). In fact, null(A) is a subspace of R
n
. (This follows from Section 15
where it was shown that the set of solutions to Ax = 0 is closed under addition and
scalar multiplication.)
A
R
n
R
m
col(A)
null(A)
row(A)
FIGURE 18
Now we will show how to compute each of these subspaces for any given ma-
trix. By “compute these subspaces”, we mean “ﬁnd bases for these subspaces.” To
illustrate, we will use the example
A =

1 2 0 4 1
0 0 0 2 2
1 2 0 6 3
¸
¸
.
1. row(A): To ﬁnd a basis for row(A), we use the method of Section 16 Example
2. Recall that to ﬁnd a basis for a subspace spanned by a set of vectors we just
write them as rows of a matrix and then do Gaussian elimination. In this case, the
spanning vectors are already the rows of a matrix, so running Gaussian elimination
(actually Gauss-Jordan elimination) on A we obtain
U =

1 2 0 0 −3
0 0 0 2 2
0 0 0 0 0
¸
¸
.
112 19. Row Space, Column Space, Null Space
Since row(A) = row(U), the two nonzero independent rows of U form a basis for
row(A), so
row(A) has basis

1
2
0
0
−3
¸
¸
¸
¸
¸
,

0
0
0
2
2
¸
¸
¸
¸
¸

.
2. col(A): We have just seen that A and U have the same row spaces. Do they
also have the same column spaces? No, this is not true! What is true is that the
columns of A that form a basis for col(A) are exactly those columns that correspond
to the columns of U that form a basis for col(U). In this example they are columns
1 and 4. The reason for this is as follows: The two systems Ac = 0 and Uc = 0
have exactly the same solutions. Furthermore, linear combinations of the columns
of A can be written as Ac and of U as Uc. This implies that independence and
dependence relations between the columns of U correspond to independence and
dependence relations between the corresponding columns of A. Therefore, since the
pivot columns of U are linearly independent (because no such vector is a linear
combination of the vectors that preceed it), the same is true of the pivot columns
of A. And likewise, since every nonpivot column of U is a linear combination of the
pivot columns, the same is true of A. That is, for the U of our example, columns 1
and 4 are independent, and any other columns are dependent on these two (Exercise
8). Therefore the same can be said of A. We conclude that
col(A) has basis

1
0
1
¸
¸
,

4
2
6
¸
¸

.
3. null(A): We want to ﬁnd a basis for all solutions of Ax = 0. But we have done
this before (Section 16 Example 3). We just solve Ux = 0 and obtain
x = a

−2
1
0
0
0
¸
¸
¸
¸
¸
+b

0
0
1
0
0
¸
¸
¸
¸
¸
+c

3
0
0
−1
1
¸
¸
¸
¸
¸
.
We conclude that
null(A) has basis

−2
1
0
0
0
¸
¸
¸
¸
¸
,

0
0
1
0
0
¸
¸
¸
¸
¸
,

3
0
0
−1
1
¸
¸
¸
¸
¸

.
19. Row Space, Column Space, Null Space 113
R
5
R
3
null(A)
row(A)
1
0
1
col(A)
0
0
0
2
2
0
0
1
0
0
3
0
0
-1
1
-2
1
0
0
0
1
2
0
0
-3
4
2
3 A
FIGURE 19
We make a series of observations about these three fundamental subspaces.
1. From the example above, we immediately see that the number of leading variables
in U, which is called the rank of A, determines the number of vectors in the bases of
both row(A) and col(A). We therefore have
dim(col(A)) = dim(row(A)) = rank(A).
2. The number of free variables in U determines the number of vectors in the basis of
null(A). Since (the number of leading variables) + (the number of free variables) =
n, we have
dim(row(A)) + dim(null(A) = n.
3. If x is any vector in null(A), then Ax = 0, which when written out looks like
A =

row 1 of A
row 2 of A
.
.
.
row m of A
¸
¸
¸
¸

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸
=

0
0
.
.
.
0
¸
¸
¸
¸
.
Because of the way matrix multiplication works, this means that x is orthogonal
to each row of A and therefore to row(A). Therefore null(A) is the orthogonal
complement of row(A). We write row(A) = null(A)

and conclude that null(A) and
row(A) are orthogonal complements of each other. (See Section 17.) This is the
114 19. Row Space, Column Space, Null Space
reason that Figure 18 was drawn the way that it was, that is, with the line null(A)
perpendicular to the plane row(A).
4. As we have seen many times before, the equation Ax = b can be written as
x
1

a
11
a
21
.
.
.
a
m1
¸
¸
¸
¸
+x
2

a
12
a
22
.
.
.
a
m2
¸
¸
¸
¸
+ · · · +x
n

a
1n
a
2n
.
.
.
a
mn
¸
¸
¸
¸
= b.
This immediately says that the system Ax = b has a solution if and only if b is in
col(A). Another way of saying this is that col(A) consists of all those vectors b for
which there exists a vector x such that Ax = b, or in other words col(A) is the image
of R
n
under the transformation A.
5. If x
0
is a solution of the system Ax = b, then any other solution can be written
as x
0
+w where w is any vector in null(A). For suppose y is another solution, then
A(x
0
−y) = Ax
0
−Ay = b −b = 0 ⇒x
0
−y = w where w is some vector in null(A),
so we have y = x
0
+ w. Note that when we solve Ax = b by Gaussian elimination,
we get all solutions expressed in this form automatically.
6. Suppose null(A) = {0}, that is, the null space of A consists of only the zero
vector. (In this case say that the null space is trivial, not empty. A null space can
never be empty. It must always contain at least the zero vector.) Then A has several
important properties which we summarize in a theorem:
Theorem. For any matrix A the following statements are equivalent.
(a) null(A) = {0}
(b) A is one-one (that is, A takes distinct vectors to distinct vectors).
(c) If Ax = b has a solution x, it must be unique.
(d) A takes linearly independent sets to linearly independent sets.
(e) The columns of A are linearly independent.
Proof: We prove (a) ⇒(b) ⇒(c) ⇒(d) ⇒(e) ⇒(a)
(a) ⇒(b): x ￿= y ⇒x −y ￿= 0 ⇒Ax −Ay = A(x −y) ￿= 0 ⇒Ax ￿= Ay.
(b) ⇒(c): Suppose Ax = b and Ay = b, then Ax = Ay ⇒x = y.
(c) ⇒ (d): If v
1
, v
2
, . . . , v
n
are linearly independent, then c
1
Ax
1
+ c
2
Ax
2
+ · · · +
c
n
Ax
n
= 0 ⇒ A(c
1
x
1
+ c
2
x
2
+ · · · + c
n
x
n
) = 0 = A0 ⇒ c
1
x
1
+ c
2
x
2
+ · · · + c
n
x
n
=
0 ⇒c
1
= c
2
= · · · = c
n
= 0.
(d) ⇒ (e): A maps the set of coordinate vectors, which are independent, to the set
of its own columns, which therefore must also be independent.
(e) ⇒ (a): The equation Ax = 0 can be interpreted as a linear combination of the
columns of A equaling zero. Since the columns of A are independent, this can happen
only if x = 0. This ends the proof.
If A is a square matrix, this theorem can be combined with the theorem of Section
9 as follows.
19. Row Space, Column Space, Null Space 115
Theorem. For an n ×n matrix A the following statements are equivalent.
(a) A is nonsingular
(b) A is invertible.
(c) Ax = b has a unique solution for any b.
(d) null(A) = {0}
(e) det(A) ￿= 0.
(f) A has rank n.
(g) The columns of A are linearly independent.
(h) The rows of A are linearly independent.
Proof: From the theorem of Section 8 we have the equivalence of (a), (b), (c), (d),
and (e). Then (d) ⇔ (g) follows from the previous theorem, and (f) ⇔ (g) ⇔ (h) is
obvious from dim(row(A)) = dim(col(A)) = rank(A).
EXERCISES
1. For each matrix below ﬁnd bases for the row, column, and null spaces and ﬁll in the
blanks in the sentence “As a linear transformation, A maps from dimensional
Euclidean space to dimensional Euclidean space and has rank equal to .”
(a) ￿

1 2
2 4 ￿

(b) ￿

1 2
2 3 ￿

(c)

2 4 2
0 4 2
2 8 4
¸
¸
(d)

3 2 −1
6 3 5
−3 −1 8
0 −1 7
¸
¸
¸
(e)

1 2 −1 −4 1
2 4 −1 −3 5
3 6 −3 −12 3
¸
¸
(f)

2 8 4 0 0
2 7 2 1 −2
−2 −6 0 −1 6
0 2 4 −2 4
¸
¸
¸
116 19. Row Space, Column Space, Null Space
2. The 3×3 matrix A has null space generated by the vector

1
1
1
¸
¸
and column space
equal to the xy-plane.
(a) Is

−3
−3
−3
¸
¸
in null(A)? What does A

−3
−3
−3
¸
¸
equal?
(b) Is

−3
13
0
¸
¸
in col(A)? Is it in the image of A?
(c) Is Ax =

−5
−5
2
¸
¸
solvable?
(d) Is

−4
6
−2
¸
¸
in row(A)?
3. The 2×3 matrix A has row space generated by the vector

1
2
9
¸
¸
and column space
generated by the vector ￿

2
−1 ￿

(a) Is

−2
−4
−8
¸
¸
in row(A)?
(b) Is

−2
−1
2
¸
¸
in null(A)?
(c) Find a basis for null(A).
(d) Is ￿

−3
3 ￿

in col(A)?
(e) Is Ax = ￿

−4
2 ￿

solvable?
4. Describe the row, column, and null spaces of the following kinds of transformations
of R
2
.
(a) rotations
(b) reﬂections
(c) projections
19. Row Space, Column Space, Null Space 117
5. For each case below explain why it is not possible for a matrix to exist with the
stated properties.
(a) Row space and null space both contain the vector

1
2
3
¸
¸
.
(b) Column space has basis

3
2
1
¸
¸
and null space has basis

1
3
1
¸
¸
.
(c) Column space = R
4
and row space = R
3
.
6. Show that if null(A) = {0}, then A takes subspaces into subspaces of the same
dimension. In particular, A takes all of R
n
into an n-dimensional subspace of R
m
.
7. Prove the following assertions for an m×n matrix A.
(a) rank(A) ≤ n and m.
(b) If rank(A) = n, then n ≤ m (A is tall and skinny) and A is one-one.
(c) If rank(A) = m, then n ≥ m (A is short and fat) and Ax = b has at least one
solution for any b.
8. Show directly from the deﬁnition that columns 1 and 4 of

1 2 0 0 −3
0 0 0 2 2
0 0 0 0 0
¸
¸
are linearly independent, while columns 1, 4, and any other columns are linearly
dependent.
9. Write down all possible row echelon forms for 2 ×3 matrices.
10. Give examples of matrices A such that
(a) null(A) = span

1
2
3
¸
¸

(b) null(A) = span

1
2
3
¸
¸

(c) col(A) = span

1
2
3
¸
¸

(d) A is 4 ×5 and dim(null(A)) = 3
118 20. Least Squares and Projection
20. LEAST SQUARES AND PROJECTIONS
When a scientist wants to ﬁt a mathematical model to data, he often samples a
greater number of data points than the number of unknowns in the model. The result
is an overdetermined inconsistent system (one with more equations than unknowns
and no solution). We illustrate this situation with the following two examples.
Example 1: We want to ﬁt a straight line y = c +dx to the data (0, 1), (1, 4), (2, 2),
(3, 5). This means we must ﬁnd the c and d that satisfy the equations
c +d · 0 = 1
c +d · 1 = 4
c +d · 2 = 2
c +d · 3 = 5
or the system

1 0
1 1
1 2
1 3
¸
¸
¸ ￿

c
d ￿

=

1
4
2
5
¸
¸
¸
.
This is an example of a curve ﬁtting problem.
y = c + dx
x
y
(0,1)
(1,4)
(2,2)
(3,5)
FIGURE 20
Example 2: Suppose we have experimentally determined the molecular weights of
the following six oxides of nitrogen:
NO N
2
O NO
2
N
2
O
3
N
2
O
5
N
2
O
4
30.006 44.013 46.006 76.012 108.010 92.011
20. Least Squares and Projections 119
We want to use this information to compute the atomic weights of nitrogen and
oxygen as accurately as possible. This means that we must ﬁnd the N and O that
satisfy the equations
1 · N + 1 · O = 30.006
2 · N + 1 · O = 44.013
1 · N + 2 · O = 46.006
2 · N + 3 · O = 76.012
2 · N + 5 · O = 108.010
2 · N + 4 · O = 92.011
or the system

1 1
2 1
1 2
2 3
2 5
2 4
¸
¸
¸
¸
¸
¸
¸ ￿

N
O ￿

=

30.006
44.013
46.006
76.012
108.010
92.011
¸
¸
¸
¸
¸
¸
¸
.
Each of these problems requires the solution of an overdetermined system Ax =
b. We know that a system can have no solution, one solution, or inﬁnitely many
solutions. But in practice, when a system with more equations than unknowns arises
from experimental data, it is extremely unlikely that the second or third cases will
occur. We are therefore faced with the problem of “solving” an overdetermined
inconsistent system of equations – an impossibility!
Since there is no hope of ﬁnding a solution to the system in the normal sense,
the only thing we can do is to ﬁnd x’s that satisfy Ax ≈ b. The “best” x would be the
one that makes this approximate equality as close to an exact equality as possible.
To give meaning to this last statement, we rewrite the system as Ax − b ≈ 0. The
left-hand side of this “equation” is a vector. Our goal then is to ﬁnd an x that makes
this vector as close to zero as possible, or, in other words, as small as possible. Since
we measure the size of a vector by its length, we come to a formulation of the least
squares problem for Ax = b: Find the vector x that makes ￿Ax − b￿ as small as
possible. The vector x that does this is called the least squares solution to Ax = b.
If we write out ￿Ax −b￿ for Example 1 above we get ￿ ￿ ￿ ￿ ￿ ￿ ￿

c + 0d −1
c + 1d −4
c + 2d −2
c + 3d −5
¸
¸
¸ ￿ ￿ ￿ ￿ ￿ ￿ ￿

= ￿

(c + 0d −1)
2
+ (c + 1d −4)
2
+ (c + 2d −2)
2
+ (c + 3d −5)
2
.
Each term under the square root can be interpreted as the square of the vertical
distance by which the line y = c+dx misses each data point. Our goal is to minimize
the sum of the squares of these errors. This is why such problems are called least
squares problems. (In statistics they are called linear regression problems.)
120 20. Least Squares and Projection
How do we ﬁnd the x that minimizes ￿Ax −b￿? First we view A as a map from
R
n
to R
m
. Then b and col(A) both lie in R
m
. Note that b does not lie in col(A)
otherwise Ax = b would be solvable exactly. The matrix A takes vectors x to vectors
Ax in col(A).
x A
col(A)
b
Ax
Ax-b
R
n
R
m
FIGURE 21
Our problem is to ﬁnd the Ax that makes Ax−b as short as possible, or said another
way, to ﬁnd a vector of the form Ax that is as close to b as possible. Intuitively
this occurs when Ax −b is orthogonal to col(A). (For a proof see Exercise 10.) And
this holds if and only if Ax − b is orthogonal to the columns of A, that is, if the
dot product Ax − b with each column of A is zero. If we write the columns of A
horizontally, we can express these conditions all at once as

col 1 of A
col 2 of A
.
.
.
col n of A
¸
¸
¸
¸

.
.
.
Ax −b
.
.
.
¸
¸
¸
=

0
0
.
.
.
0
¸
¸
¸
¸
.
This is just A
T
(Ax −b) = 0, which can be rewritten as A
T
Ax −A
T
b = 0 or as
A
T
Ax = A
T
b.
These are called the normal equations for the least squares problem Ax = b. They
form an n × n linear system that can be solved by Gaussian elimination. We sum-
marize: The least squares solution to the overdetermined inconsistent linear system
Ax ≈ b is deﬁned to be that vector x that minimizes the length of the vector Ax −b.
It is found as the exact solution to the normal equations A
T
Ax = A
T
b. We can now
solve the two problems at the beginning of this section.
20. Least Squares and Projections 121
Example 1 again: The normal equations for this problem are ￿

1 1 1 1
0 1 2 3 ￿

1 0
1 1
1 2
1 3
¸
¸
¸ ￿

c
d ￿

= ￿

1 1 1 1
0 1 2 3 ￿

1
4
2
5
¸
¸
¸
or multiplied out are ￿

4 6
6 14 ￿
￿
c
d ￿

= ￿

12
23 ￿

and the solution by Gaussian elimination is ￿

c
d ￿

= ￿

1.5
1 ￿

.
So the best ﬁt line in the least squares sense is y = 1.5 +x.
Example 2 again: The normal equations for this problem are ￿

1 2 1 2 2 2
1 1 2 3 5 4 ￿

1 1
2 1
1 2
2 3
2 5
2 4
¸
¸
¸
¸
¸
¸
¸ ￿

N
O ￿

= ￿

1 2 1 2 2 2
1 1 2 3 5 4 ￿

30.006
44.013
46.006
76.012
108.010
92.011
¸
¸
¸
¸
¸
¸
¸
or multiplied out are ￿

18 29
29 56 ￿
￿
N
O ￿

= ￿

716.104
1302.161 ￿

and the solution by Gaussian elimination is ￿

N
O ￿

= ￿

14.0069
15.9993 ￿

.
It is clear that the matrix A
T
A is square and symmetric (see Section 2 Exercise
6(e)). But when we said that the least squares solution is the solution of the normal
equations, we were implicitly assuming that the normal equations could be solved,
that is, that A
T
A is nonsingular. This is true if the columns of A are independent,
because in that case we have A
T
Ax = 0 ⇒ x
T
A
T
Ax = 0 ⇒ (Ax)
T
(Ax) = 0 ⇒ ￿
Ax￿
2
= 0 ⇒ Ax = 0 ⇒ x = 0. But if the columns of A are not independent, then
A
T
A will be singular. In fact, for large scale problems A
T
A is usually singular, or is
so close to being singular that Gaussian elimination tends to give very inaccurate an-
swers. For such problems it is necessary to use more numerically stable methods such
as the QR factorization (see the next section) or the singular value decomposition.
122 20. Least Squares and Projection
In solving the least squares problem, we have inadvertently found the solution
to a seemingly unrelated problem: the computation of projection matrices. From
our geometrical considerations, the vector p = Ax is the orthogonal projection of the
vector b onto the subspace col(A). Solving the normal equations for x we obtain x =
(A
T
A)
−1
A
T
b, and putting this expression back into p we obtain p = A(A
T
A)
−1
A
T
b.
Therefore, to ﬁnd the projection of any vector b onto col(A), we simply multiply b
by the matrix P = A(A
T
A)
−1
A
T
. We conclude that
P = A(A
T
A)
−1
A
T
is the matrix that projects R
m
onto the subspace col(A).
Example 3: Find the matrix that projects R
3
onto the plane spanned by the vectors

1
0
1
¸
¸
and

2
1
1
¸
¸
. First line up the two vectors (in any order) to form the matrix
A =

1 2
0 1
1 1
¸
¸
, and then compute
P = A(A
T
A)
−1
A
T
=

1 2
0 1
1 1
¸
¸

¸ ￿

1 0 1
2 1 1 ￿

1 2
0 1
1 1
¸
¸

−1 ￿

1 0 1
2 1 1 ￿

=

1 2
0 1
1 1
¸
¸ ￿￿

2 3
3 6 ￿￿

−1 ￿

1 0 1
2 1 1 ￿

=

1 2
0 1
1 1
¸
¸ ￿

2 −1
−1
2
3 ￿
￿
1 0 1
2 1 1 ￿

=

2
3
1
3
1
3
1
3
2
3

1
3
1
3

1
3
2
3
¸
¸
¸
.
Just as in the case of least squares, the columns of A must be independent for this
to work; that is, the two given vectors must form a basis for the subspace to be
projected onto.
Note that P in the example above is symmetric. It turns out that this is true of any
projection matrix (Exercise 9(a)). Furthermore, projection matrices also satisfy the
property P
2
= P (Exercise 9(a)). These observations also go in the other direction;
20. Least Squares and Projections 123
that is, any matrix P that satisﬁes P
T
= P and P
2
= P is the projection matrix
of R
m
onto col(P). We need only verify that Px − x is orthogonal to col(P) for
any vector x. We check all the required dot products at once with the computation
P
T
(Px −x) = P(Px −x) = P
2
x −Px = Px −Px = 0.
Projection matrices can be used to compute reﬂection matrices. First we have
to precisely deﬁne what we mean by a reﬂection. Let S be a subspace of R
m
.
Any vector x can be written as x = Px + (x − Px) where Px is the projection
of x onto S and x − Px is the component of x orthogonal to S. If we reverse the
direction of x − Px we get a new vector y = Px − (x − Px) which we deﬁne to
be the reﬂection of x across the subspace S. Note that y can then be written as
y = Px − x + Px = 2Px − x = (2P − I)x, and therefore the matrix R = 2P − I
reﬂects R
m
across the subspace S.
S
x
x - Px
-(x - Px)
y
Px
FIGURE 22
The equation x = Px + (x −Px) above also shows that any vector x can be re-
solved into a component in S and a component in S

. Furthermore, since orthogonal
vectors are linearly independent (Section 17 Exercise 11), this resolution is unique.
From this we can see more precisely how any matrix A behaves as a linear transfor-
mation from one Euclidean space to another. Let S = null(A), so that S

= row(A).
Then any vector x can be expressed uniquely as x = n+r, where n is in null(A) and
r is in col(A). Applying A to x we obtain Ax = An +Ar = 0 +Ar. This shows that
A essentially projects x onto r in row(A) and then maps r to a unique vector Ar
in col(A). Any matrix can therefore be visualized as a projection onto its row space
followed by one-one linear transformation of its row space onto its column space.
EXERCISES
1. Solve Ax = b in the least squares sense for the two cases below.
(a) A =

1 0
0 1
1 1
1 2
¸
¸
¸
and b =

5
4
6
4
¸
¸
¸
124 20. Least Squares and Projection
(b) A =

1 4 −1
2 3 1
0 3 1
1 2 −1
¸
¸
¸
and b =

−1
2
−1
1
¸
¸
¸
2. For each case below ﬁnd the line or surface of the indicated type that best ﬁts the
given data in the least squares sense.
(a) y = ax: (1, 5), (2, 3), (−1, 3), (3, 4), (0, 1)
(b) y = a +bx: (0, 0), (1, 1), (3, 2), (4, 5)
(c) z = a +bx +cy: (0, 1, 6), (1, 0, 5), (0, 0, 1), (1, 1, 6)
(d) z = a +bx
2
+cy
2
: (0, 1, 10), (0, 2, 5), (−1, 1, 20), (1, 0, 15)
(e) y = a +bt +ct
2
: (1, 5), (0, −6), (2, 8), (−1, 5)
(f) y = a +b cos t +c sin t: (0, 3), (
π
2
, 5), (−
π
2
, 3), (π, −3)
3. We want to use the following molecular weights of sulﬁdes of copper and iron to
compute the atomic weights of copper, iron, and sulfur.
Cu
2
S CuS FeS Fe
3
S
4
Fe
2
S
3
FeS
2
159.15 95.61 87.92 295.81 207.90 119.98
Express this problem as an overdetermined linear system. Write down the normal
equations. Do not solve them!
4. Find the projection matrices for the indicated subspaces below.
(a) R
2
onto the line generated by ￿

1
3 ￿

.
(b) R
3
onto the line generated by

2
1
2
¸
¸
.
(c) R
3
onto the plane spanned by

1
1
1
¸
¸
,

−2
1
1
¸
¸
.
(d) R
3
onto the plane spanned by

1
0
1
¸
¸
,

1
1
1
¸
¸
.
(e) R
4
onto the plane spanned by

1
1
0
1
¸
¸
¸
,

0
0
1
0
¸
¸
¸
.
5. Find the projection of the vector

1
2
3
¸
¸
onto the plane in Exercise 4(c) above.
20. Least Squares and Projections 125
6. Find the reﬂection matrix of R
3
across the plane in Exercise 4(c) above.
7. Find the projection matrices for the indicated subspaces below.
(a) R
2
onto the line y = 2x.
(b) R
3
onto the plane x −y −2z = 0.
8. Show that as transformations the matrices below have the following geometric
interpretations.
(a) ￿

−1 0
0 −1 ￿

(i) Reﬂection through the origin, (ii) rotation by π radians, (iii)
reﬂection across the x-axis and reﬂection across the y-axis.
(b)

−1 0 0
0 −1 0
0 0 1
¸
¸
(i) Reﬂection across the z-axis and (ii) rotation by π radians
around the z-axis.
(c)

−1 0 0
0 −1 0
0 0 −1
¸
¸
(i) Reﬂection through the origin and (ii) rotation by π radians
around the z-axis and reﬂection across the xy-plane.
9. Use matrix algebra to prove
(a) if P = A(A
T
A)
−1
A
T
, then P
T
= P and P
2
= P.
(b) if R = 2P −I, then R
T
= R and R
2
= I.
10. If S is a subspace of R
n
, b is a vector not in S, and w is a vector in S such that
b −w is orthogonal to S, then show ￿b −w￿ ≤ ￿b −z￿ where z is any other vector in
S. (Use the Pythagorean Theorem on the right triangle with sides b −w and z −w.)
Conclude that w is the unique point in S closest to b.
b
w
S
z
FIGURE 23
126 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
21. ORTHOGONAL MATRICES, GRAM-SCHMIDT, AND QR FACTORIZA-
TION
A set of vectors q
1
, q
2
, · · · , q
n
is orthogonal if every pair of vectors in the set is
orthogonal, that is, q
i
· q
j
= 0 for i ￿= j. Furthermore the set is orthonormal if all
the vectors in the set are unit vectors, that is, ￿q
1 ￿
= ￿q
2 ￿
= · · · = ￿q
n ￿
= 1. We
know that such a set of vectors is linearly independent (Section 17 Exercise 11). We
say that it forms an orthogonal or orthonormal basis (whichever the case) for the
subspace that it spans.
Example 1: In R
2
the coordinate vectors ￿

1
0 ￿

and ￿

0
1 ￿

form an orthonormal basis,
while the vectors ￿

3
4 ￿

and ￿

4
−3 ￿

form an orthogonal basis. If we divide the second
two vectors by their lengths to make them unit vectors (this is called normalizing the
vectors), we obtain the orthonormal basis ￿

3
5
4
5 ￿

and ￿

4
5

3
5 ￿

. Since we have a basis,
we should be able to express any vector in R
2
as a linear combination of these two
vectors. Suppose, for example, we want to write ￿

2
7 ￿

= c ￿

3
5
4
5 ￿

+ d ￿

4
5

3
5 ￿

. As we
have done many times before, we rewrite this as ￿

3
5
4
5
4
5

3
5 ￿ ￿

c
d ￿

= ￿

2
7 ￿

and solve
by Gaussian elimination. But this time the coeﬃcient matrix has a special form: its
columns are orthonormal. We will see in a moment that this fact will enable us to
solve the system much more easily than by using Gaussian elimination.
We say that a square matrix Q is an orthogonal matrix if its columns are orthonormal.
(It is not called an orthonormal matrix even though that might make more sense.)
Clearly the columns of Q are orthonormal if and only if Q
T
Q = I, which can therefore
be taken as the deﬁning condition for a matrix to be orthogonal.
Example 2: Here are some orthogonal matrices. These are especially nice ones be-
cause they don’t involve square roots. ￿

3
5
4
5
4
5

3
5 ￿

2
3
2
3

1
3
2
3

1
3
2
3

1
3
2
3
2
3
¸
¸
¸

3
7
2
7
6
7
6
7
3
7
2
7
2
7

6
7
3
7
¸
¸
¸

4
9

8
9

1
9
7
9
4
9

4
9
4
9
1
9
8
9
¸
¸
¸

10
15
10
15
5
15
10
15

11
15
2
15

5
15

2
15
14
15
¸
¸
¸

1
2

1
2

1
2

1
2

1
2
1
2

1
2

1
2

1
2

1
2
1
2

1
2

1
2

1
2

1
2
1
2
¸
¸
¸
¸
¸
21. Orthogonal matrices, Gram-Schmidt, and QR Factorization 127
Now we make a series of observations about orthogonal matrices.
1. From the deﬁning condition for an orthogonal matrix Q
T
Q = I we immediately
have Q
−1
= Q
T
. This suggests that to solve a system Qx = b with an orthogonal
coeﬃcient matrix like that in Example 1 above, we just multiply both sides by Q
T
to
obtain Q
T
Qx = Q
T
b or x = Q
T
b. Thus a linear system with an orthogonal coeﬃcient
matrix can by solved by a simple matrix multiplication.
2. Since Q
T
is the inverse of Q, we also have QQ
T
= QQ
−1
= I. This immediately
says that the rows of an orthogonal matrix are orthonormal as well as the columns!
3. The matrix Q =

2
3
2
3
2
3

1
3

1
3
2
3
¸
¸
¸
has orthonormal columns but is not an orthogonal
matrix because it is not square. Note that Q
T
Q = I but QQ
T ￿
= I. Check it!
4. As a transformation an orthogonal matrix Q preserves length, distance, dot prod-
uct, and angles. Let’s consider each separately.
(a) length: ￿Qx￿
2
= (Qx)
T
(Qx) = x
T
Q
T
Qx = x
T
x = ￿x￿
2
⇒￿Qx￿ = ￿x￿.
(b) distance: From (a) and ￿Qx −Qy￿ = ￿Q(x −y)￿ = ￿x −y￿.
(c) dot product: Qx · Qy = (Qx)
T
(Qy) = x
T
Q
T
Qy = x
T
y = x · y.
(d) angles: The angle between Qx and Qy is given by arccos((Qx · Qy)/(￿Qx￿￿Qy￿))
which from (a) and (c) equals arccos((x · y)/(￿x￿￿y￿)) which is the angle between x
and y.
5. If a matrix Q preserves length, it must be orthogonal. This is the converse of 4(a)
above. Since Q preserves length, it preserves distance (as in 4(b) above). By the
SSS congruence theorem of Euclidean geometry this implies that Q takes triangles
into congruent triangles and therefore preserves angles. Another way to prove this
is to show that Q must preserve dot products and, since angles can be expressed in
terms of the dot product, must preserve angles also. (See Exercise 18 where even
more is proved.) Since Q preserves lengths and angles, it takes orthonormal sets
into orthonormal sets. In particular Q takes the coordinate vectors of R
n
into an or-
thonormal set, but this set consists of the columns of Q. Therefore Q has orthonormal
columns and so is an orthogonal matrix.
We leave orthogonal matrices for a moment and consider a seemingly unrelated
problem: Given a basis v
1
, v
2
, . . . , v
n
of a subspace V , ﬁnd an orthonormal basis
q
1
, q
2
, . . . , q
n
for V . We will use an example to illustrate a method for doing this.
For simplicity, instead of a subspace, we will take all of R
3
. Suppose we are given
the following basis for R
3
:
v
1
=

−2
−2
1
¸
¸
, v
2
=

2
8
2
¸
¸
, v
3
=

7
7
1
¸
¸
.
We will ﬁrst ﬁnd an orthogonal basis p
1
, p
2
, p
3
, and then normalize it to get the
128 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
orthonormal basis q
1
, q
2
, q
3
. The ﬁrst step is to set p
1
= v
1
:
p
1
=

−2
−2
1
¸
¸
.
The second step is to ﬁnd a vector p
2
that is orthogonal to p
1
and such that
span{p
1
, p
2
} = span{v
1
, v
2
}. We can accomplish this by deﬁning p
2
to be the com-
ponent of v
2
orthogonal to p
1
. Just subtract from v
2
its projection onto p
1
:
p
2
= v
2

v
2
· p
1
p
1
· p
1
p
1
=

2
8
2
¸
¸

−18
9

−2
−2
1
¸
¸
=

−2
4
4
¸
¸
.
v
1
= p
1
v
2
p
2
v
3
p
3
FIGURE 24
The third step is to ﬁnd a vector p
3
that is orthogonal to p
1
and p
2
and such that
span{p
1
, p
2
, p
3
} = span{v
1
, v
2
, v
3
}. We can accomplish this by deﬁning p
3
to be the
21. Orthogonal matrices, Gram-Schmidt, and QR Factorization 129
component of v
3
orthogonal to span{p
1
, p
2
}. Just subtract from v
3
its projection
onto span{p
1
, p
2
}. To ﬁnd this projection we don’t have to compute a projection
matrix as might be expected. All we have to do is to subtract oﬀ the projection of
v
3
onto p
1
and p
2
separately. (This works because p
1
and p
2
are orthogonal. See
Exercise 12.)
p
3
= v
3

v
3
· p
1
p
1
· p
1
p
1

v
3
· p
2
p
2
· p
2
p
2
=

7
7
1
¸
¸

−27
9

−2
−2
1
¸
¸

18
36

−2
4
4
¸
¸
=

2
−1
2
¸
¸
.
At each stage the p’s and the v’s just are linear combinations of each other, so
we have span{p
1
} = span{v
1
}, span{p
1
, p
2
} = span{v
1
, v
2
}, and span{p
1
, p
2
, p
3
} =
span{v
1
, v
2
, v
3
}. Finally we normalize the p’s to obtain the orthonormal q’s:
q
1
=

2
3

2
3
1
3
¸
¸
¸
, q
2
=

1
3
2
3
2
3
¸
¸
¸
, q
3
=

2
3

1
3
2
3
¸
¸
¸
.
The method that we have just illustrated is called the Gram-Schmidt process. It
should be clear how to extend it to larger numbers of vectors.
We can also express the result of the Gram-Schmidt process in terms of matrices.
First note that
v
1
is in span{q
1
}
v
2
is in span{q
1
, q
2
}
v
3
is in span{q
1
, q
2
, q
3
}.
Using matrices this can be written as

.
.
.
.
.
.
.
.
.
v
1
v
2
v
3
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
q
1
q
2
q
3
.
.
.
.
.
.
.
.
.
¸
¸
¸

∗ ∗ ∗
0 ∗ ∗
0 0 ∗
¸
¸
If we deﬁne A to be the matrix with columns v
1
, v
2
, v
3
, Q to be the matrix with
columns q
1
, q
2
, q
3
, and R to be the appropriate upper triangular matrix, then we
have A = QR. We can interpret this as a factorization of the matrix A into an
orthogonal matrix times an upper triangular matrix. For our example this looks like

−2 2 7
−2 8 7
1 2 1
¸
¸
=

2
3

1
3
2
3

2
3
2
3

1
3
1
3
2
3
2
3
¸
¸
¸

∗ ∗ ∗
0 ∗ ∗
0 0 ∗
¸
¸
.
130 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
It is easy to ﬁnd R. Just multiply the equation A = QR by Q
T
on the left to obtain
R = Q
T
A:

∗ ∗ ∗
0 ∗ ∗
0 0 ∗
¸
¸
=

· · · q
1
· · ·
· · · q
2
· · ·
· · · q
3
· · ·
¸
¸

.
.
.
.
.
.
.
.
.
v
1
v
2
v
3
.
.
.
.
.
.
.
.
.
¸
¸
¸
=

2
3

2
3
1
3

1
3
2
3
2
3
2
3

1
3
2
3
¸
¸
¸

−2 2 7
−2 8 7
1 2 1
¸
¸
=

3 −6 −9
0 6 3
0 0 3
¸
¸
.
We ﬁnally have

−2 2 7
−2 8 7
1 2 1
¸
¸
=

2
3

1
3
2
3

2
3
2
3

1
3
1
3
2
3
2
3
¸
¸
¸

3 −6 −9
0 6 3
0 0 3
¸
¸
.
This shows that any square matrix A with independent columns has a factorization
A = QR into an orthogonal Q and an upper triangular R. In fact, we can make an
even more general statement. Suppose that we had started with the matrix
B =

−2 2
−2 8
1 2
¸
¸
Then we would have had the factorization

−2 2
−2 8
1 2
¸
¸
=

2
3

1
3

2
3
2
3
1
3
2
3
¸
¸
¸ ￿

3 −6
0 6 ￿

.
We see that B = QR where now Q has orthonormal columns but is not orthogonal!
Fortunately Q
T
Q = I is still true so the method above to ﬁnd R still works. We
conclude that any matrix A with independent columns has a factorization of the
form A = QR where Q has orthonormal columns and R is upper triangular. This is
called the QR factorization and is the third great matrix factorization that we have
seen (after the LU and diagonal factorizations). Actually, it is possible to obtain
a QR-like factorization for any matrix whatever, but we will stop here. Note that
Gram-Schmidt process, on which all this is based, is the ﬁrst truly new computational
technique we have had since we ﬁrst introduced Gaussian elimination! In fact, there
21. Orthogonal matrices, Gram-Schmidt, and QR Factorization 131
are eﬃcient algorithms that can perform Gram-Schmidt in
2n
3
3
operations, which
makes it competitive with Gaussian elimination in many situations.
The QR factorization has a wide range of applications. We mention two. For the
ﬁrst, recall an overdetermined inconsistent system Ax = b has a least squares solution
given by the normal equations A
T
Ax = A
T
b. Suppose we have the QR factorization
A = QR. Then plugging into the normal equations we obtain (QR)
T
QRx = (QR)
T
b
or R
T
Q
T
QRx = R
T
Q
T
b or R
T
Rx = R
T
Q
T
b. Since R
T
is nonsingular (it’s triangular
with nonzeros down its diagonal), we can multiply through by (R
T
)
−1
to obtain
Rx = Q
T
b.
This equation is another matrix expression of the normal equations. Since R is upper
triangular, it can be solved simply by back substitution. Of course, most of the work
was done in ﬁnding the QR factorization of A in the ﬁrst place. In practice the QR
method preferable to solving the normal equations directly since the Gram-Schmidt
process for ﬁnding the QR factorization is more numerically stable than Gaussian
elimination.
Example 3: Recall the system

1 0
1 1
1 2
1 3
¸
¸
¸ ￿

c
d ￿

=

1
4
2
5
¸
¸
¸
.
from the line ﬁtting problem of Section 20. We ﬁnd the QR factorization of the
coeﬃcient matrix:

1 0
1 1
1 2
1 3
¸
¸
¸
=

1
2

3

20
1
2

1

20
1
2
1

20
1
2
3

20
¸
¸
¸
¸
¸
¸
¸ ￿

2 3
0

5 ￿

.
This gives the normal equations in the form ￿

2 3
0

5 ￿
￿
c
d ￿

= ￿

1
2
1
2
1
2
1
2

3

20

1

20
1

20
3

20 ￿

1
4
2
5
¸
¸
¸
= ￿

6

5 ￿

.
The solution is c = 1.5 and d = 1 as before.
132 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
The second application of the QR factorization is to computing projection ma-
trices. If we have the QR factorization A = QR, the projection matrix P of R
n
onto
col(A) becomes
P = A(A
T
A)
−1
A
T
= QR((QR)
T
(QR))
−1
(QR)
T
= QR(R
T
Q
T
QR)
−1
R
T
Q
T
= QR(R
T
R)
−1
R
T
Q
T
= QRR
−1
(R
T
)
−1
R
T
Q
T
= QQ
T
.
So the projection matrix assumes a very simple form: P = QQ
T
. Of course, again
all the work has been done earlier in ﬁnding the QR factorization of A.
Example 4: Suppose we want the projection matrix P of R
3
onto the subspace
spanned by

1
0
1
¸
¸
and

2
1
1
¸
¸
. (This is Section 20 Example 3.) We construct the
matrix A with these two vectors as its columns and ﬁnd its QR factorization:

1 2
0 1
1 1
¸
¸
=

1

2
1

6
0
2

6
1

2

1

6
¸
¸
¸
¸

2
3

2
0

3

2
¸
¸
.
Then the projection matrix is
P =

1

2
1

6
0
2

6
1

2

1

6
¸
¸
¸
¸ ￿

1

2
0
1

2
1

6
2

6

1

6 ￿

=

2
3
1
3
1
3
1
3
2
3

1
3
1
3

1
3
2
3
¸
¸
¸
.
EXERCISES
1. Use the Gram-Schmidt process to orthonormalize the following sets of vectors.
(a) ￿

5
12 ￿

, ￿

−22
−19 ￿

21. Orthogonal matrices, Gram-Schmidt, and QR Factorization 133
(b)

−3
6
2
¸
¸
,

1
−9
4
¸
¸

1
5
11
¸
¸
(c)

1
−1
−1
−1
¸
¸
¸
,

1
−1
1
1
¸
¸
¸

0
−2
0
−2
¸
¸
¸

1
0
0
−1
¸
¸
¸
(d)

−10
11
2
¸
¸
,

20
−7
26
¸
¸
(e)

1
−1
−1
−1
¸
¸
¸
,

2
−2
−1
−1
¸
¸
¸

3
0
−1
2
¸
¸
¸
2. Find the QR factorizations of the following matrices.
(a) ￿

5 −22
12 −19 ￿

(b)

−3 1 1
6 −9 5
2 4 11
¸
¸
(c)

1 1 0 1
−1 −1 −2 0
−1 1 0 0
−1 1 −2 −1
¸
¸
¸
(d)

−10 20
11 −7
2 26
¸
¸
(e)

1 2 3
−1 −2 0
−1 −1 −1
−1 −1 2
¸
¸
¸
3. Express

3
9
3
¸
¸
as a linear combination of the vectors

2
3
2
3

1
3
¸
¸
¸
,

2
3

1
3
2
3
¸
¸
¸
,

1
3
2
3
2
3
¸
¸
¸
.
134 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization
4. Extend the orthonormal set

8
9
4
9
1
9
¸
¸
¸
,

4
9
7
9
4
9
¸
¸
¸
to a basis of R
3
, or, what is the same
thing, ﬁnd a third column that makes the matrix

8
9
4
9

4
9
7
9

1
9
4
9

¸
¸
¸
orthogonal.
5. Use the QR factorization to ﬁnd the least squares solution of

1 0
0 1
1 1
1 2
¸
¸
¸ ￿

x
y ￿

=

5
4
6
4
¸
¸
¸
.
6. Use the QR factorization to ﬁnd the projection matrix of R
4
onto the plane
spanned by the vectors

1
−1
−1
1
¸
¸
¸
and

0
0
−2
−2
¸
¸
¸
.
7. Show that if Q is an orthogonal matrix then det(Q) = ±1.
8. Show that if Q
1
and Q
2
are orthogonal matrices, then so is Q
1
Q
2
.
9. Show that if Q is an orthogonal matrix, then Q
T
AQ has the same eigenvalues as
A.
10. Which of the following transformations are orthogonal: rotations, reﬂections, or
projections?
11. Let Q = ￿

α ∗
β ∗ ￿

be an orthogonal matrix.
(a) Show that the only unit vectors that are orthogonal to ￿

α
β ￿

are ￿

−β
α ￿

and ￿

β
−α ￿

. Conclude that Q = ￿

α −β
β α ￿

or ￿

α β
β −α ￿

.
(b) Show that Q must be a rotation by arctan ￿

β
α ￿

or a reﬂection in the line that
makes an angle of
1
2
arctan ￿

β
α ￿

with the x-axis.
(c) Conclude that any orthogonal transformation of R
2
must be a rotation or a
reﬂection.
21. Orthogonal matrices, Gram-Schmidt, and QR Factorization 135
12. If p
1
, p
2
, · · · , p
m
is an orthogonal basis for a subspace S of R
n
, v is a vector outside
S, and w = ￿

v · p
1
p
1
· p
1 ￿

p
1
+ ￿

v · p
2
p
2
· p
2 ￿

p
2
+ · · · ￿

v · p
m
p
m
· p
m ￿

p
m
, then show v − w ⊥ S.
(Hint: Verify v − w ⊥ p
i
for all i.) Conclude that w is the orthogonal projection of
v onto S.
13. How would you extend an orthonormal basis v
1
, v
2
, · · · , v
p
of a subspace V of R
n
to an orthonormal basis v
1
, v
2
, · · · , v
p
, v
p+1
, · · · , v
n
of all of R
n
?
14. If P is a projection, show (2P −I)
T
(2P −I) = I. Conclude that any reﬂection
is an orthogonal transformation.
15. If Q is orthogonal, then Q
−1
= ?
16. If T ￿

1
3 ￿

= ￿

−3
1 ￿

and T ￿

1
2 ￿

= ￿

−2
−1 ￿

, then is T orthogonal?
17. If A is n ×n and Q is n ×n orthogonal, then is
(a) AA
T
symmetric?
(b) AA
T
invertible?
(c) AA
T
orthogonal?
(d) Q
T
symmetric?
(e) Q
T
invertible?
(f) Q
T
orthogonal?
18. If T is any transformation of R
n
to itself that preserves distance and such that
T(0) = 0, then T is linear and can be represented as T(x) = Qx where Q is an
orthogonal matrix. This can be proved in the following way. (1) T preserves distance
and the origin ⇒ ||T(x)|| = ||x||, ||T(y)|| = ||y||, and ||T(x) − T(y)||
2
= ||x − y||
2
.
Expand this to show that T(x) · T(y) = x · y. (2) Expand ||cT(x) −T(cx)||
2
and use
(1) to show that it equals zero. (3) Expand ||T(x +y) −T(x) −T(y)||
2
and use (1)
to show that it equals zero. Conclude that T is linear and preserves dot products.
Interpret this as saying that any transformation that preserves length and the origin
must be linear and can be represented by an orthogonal matrix.
136 22. Diagonalization of Symmetric and Orthogonal Matrices
22. DIAGONALIZATION OF SYMMETRIC AND ORTHOGONAL MATRICES
In Sections 9 and 10 we learned how to ﬁnd eigenvalues, eigenvectors, and diag-
onal factorizations. Our point of view was purely algebraic. Now we consider these
concepts geometrically. The ﬁrst thing to mention is that all eigenvectors v associ-
ated with a particular eigenvalue λ of a matrix A form a subspace that we call the
eigenspace of A for the eigenvalue λ. (See Exercise 1)
Example 1: We now illustrate the geometry of diagonalization with the matrix ￿

11
5

3
5
2
5
4
5 ￿

, which has eigenvalues λ = 2 and λ = 1 with associated eigenvectors ￿

3
1 ￿

and ￿

1
2 ￿

. If we think in terms of how this matrix operates on its eigenvectors
we have ￿

11
5

3
5
2
5
4
5 ￿ ￿

3
1 ￿

= 2 ￿

3
1 ￿

and ￿

11
5

3
5
2
5
4
5 ￿ ￿

1
2 ￿

= 1 ￿

1
2 ￿

.
In this case the eigenspaces are the two lines generated by the two eigenvectors. A
maps each line to itself but stretches one by a factor of 2 and the other by a factor of
1. All other vectors are moved in more complicated ways. We can see how they are
moved by observing that, since the two eigenvectors form a basis for R
2
, any vector in
R
2
can be written as a ￿

3
1 ￿

+b ￿

1
2 ￿

. The numbers a and b are the coordinates of the
vector with respect to the skewed coordinate system deﬁned by the two eigenvectors.
Since A maps a ￿

3
1 ￿

+ b ￿

1
2 ￿

to 2a ￿

3
1 ￿

+ b ￿

1
2 ￿

, we see that the eﬀect of A is very
simple when viewed in this new coordinate system.
A
FIGURE 25
22. Diagonalization of Symmetric and Orthogonal Matrices 137
The diagonal factorization A = SDS
−1
, which in this case looks like ￿

11
5

3
5
2
5
4
5 ￿

= ￿

3 1
1 2 ￿
￿
2 0
0 1 ￿
￿
3 1
1 2 ￿

−1
,
also has a geometric interpretation illustrated by the diagram below. The diagram
means that a vector can be mapped horizontally by A (transcontinental railroad) or
around the horn by SDS
−1
(clipper ship). In either case it will arrive at the same
destination. In particular we can watch how the eigenvectors are mapped. Since
S ￿

1
0 ￿

= ￿

3
1 ￿

and S ￿

0
1 ￿

= ￿

1
2 ￿

,
we have
S
−1 ￿

3
1 ￿

= ￿

1
0 ￿

and S
−1 ￿

1
2 ￿

= ￿

0
1 ￿

.
Therefore we see that the two eigenvectors are ﬁrst taken to the two coordinate
vectors, then stretched by factors of 2 and 1, and ﬁnally sent back to stretched
versions of the original two eigenvectors.
A
S S
-1
D
1
2
3
1
0
1
1
0
0
1
1
0
1
2 3
1
2
2
FIGURE 26
Now we cover some points that were skipped over in Section 11.
1. To construct the diagonal factorization A = SDS
−1
we need n linearly indepen-
dent eigenvectors to serve as the columns of S. The independence of the columns
138 22. Diagonalization of Symmetric and Orthogonal Matrices
will insure that S
−1
exists (see Section 19). The problem of diagonalization therefore
reduces to the question of whether there are enough independent eigenvectors.
2. Eigenvectors that are associated with distinct eigenvalues are linearly independent.
In other words, if v
1
, v
2
, · · · , v
n
are eigenvectors for A with associated eigenvalues
λ
1
, λ
2
, · · · , λ
n
where λ
i
￿= λ
j
for all i ￿= j, then all the v’s are linearly independent.
To see this, assume it is not true and ﬁnd the ﬁrst vector v
i
right) that can be written as a linear combination of the v’s to its left. Suppose this
vector is v
5
. Then we know that v
1
, v
2
, v
3
, v
4
are linearly independent, and therefore
we have an equation of the form v
5
= c
1
v
1
+c
2
v
2
+c
3
v
3
+c
4
v
4
. Multiply one copy of
this equation by A to obtain λ
5
c
5
= c
1
λ
1
v
1
+c
2
λ
2
v
2
+c
3
λ
3
v
3
+c
4
λ
4
v
4
and another
copy by λ
5
to obtain λ
5
v
5
= c
1
λ
5
v
1
+c
2
λ
5
v
2
+c
3
λ
5
v
3
+c
4
λ
5
v
4
. Subtracting one from
the other gives 0 = c
1

1
−λ
5
)v
1
c
1
+c
2

2
−λ
5
)v
2
+c
3

3
−λ
5
)v
3
+c
4

4
−λ
5
)v
4
. Since
v
1
, v
2
, v
3
, v
4
are independent, all the coeﬃcients in this equation must equal zero. But
since all the λ’s are diﬀerent, the only way this can happen is if c
1
= c
2
= c
3
= c
4
= 0.
But this means that v
5
= 0, a contradiction. From this result we see that an n × n
matrix is diagonalizable if there are n real and distinct eigenvalues.
3. Unfortunately there are many interesting matrices that have repeated eigenvalues.
For example the shear matrix ￿

2 3
0 2 ￿

and the diagonal matrix ￿

2 0
0 2 ￿

both have
eigenvalues λ = 2, 2 (meaning that the eigenvalue is repeated), but the shear matrix
has only one independent eigenvector whereas the diagonal matrix has two. What
is the relationship in general between the number of independent eigenvectors asso-
ciated with a particular eigenvalue λ
0
of a matrix A and the number of times λ
0
is
repeated as a root of the characteristic polynomial of A? If we deﬁne the ﬁrst number
to be the geometric multiplicity of λ
0
and the second to be the algebraic multiplicity
of λ
0
, then we can state the answer to this question formally as follows.
Theorem. For any eigenvalue, geometric multiplicity ≤ algebraic multiplicity.
Proof: Suppose λ
0
has geometric multiplicity p, meaning that there are p inde-
pendent eigenvectors v
1
, v
2
, · · · , v
p
for λ
0
. Expand this set of vectors to a basis
v
1
, v
2
, · · · , v
p
, · · · , v
n
for R
n
. Then we have
A

.
.
.
.
.
.
.
.
.
v
1
v
p
v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
¸
=

.
.
.
.
.
.
.
.
.
v
1
v
p
v
n
.
.
.
.
.
.
.
.
.
¸
¸
¸
¸

λ
0
· · · 0
.
.
.
.
.
.
.
.
. D
0 · · · λ
0
0 · · · 0
.
.
.
.
.
. E
0 · · · 0
¸
¸
¸
¸
¸
¸
¸
¸
¸
,
which can be written A = SBS
−1
where S is the matrix of column vectors and
B is the matrix on the extreme right. Then B has the form ￿

C D
0 E ￿

, and so
the characteristic polynomial det(A − λI) = det(B − λI) = det(C − λI) det(E −
22. Diagonalization of Symmetric and Orthogonal Matrices 139
λI) = (λ
0
− λ)
p
det(E − λI) (see Section 9 Exercise 3), meaning that the algebraic
multiplicity of λ
0
is at least p. This ends the proof.
There are important classes of matrices that always have diagonal factorizations.
In particular we will now investigate symmetric and orthogonal matrices and show
that they always have especially nice diagonal, or at least diagonal-like, factorizations.
Example 2: Consider the symmetric matrix A = ￿

41 −12
−12 34 ￿

. As usual, we com-
pute the eigenvalues 25 and 50, the corresponding eigenvectors ￿

3
4 ￿

and ￿

4
−3 ￿

, and
set up the factorization
A = ￿

3 4
4 −3 ￿
￿
25 0
0 50 ￿
￿
3 4
4 −3 ￿

−1
But note that the two eigenvectors have a very special property: they are orthogonal.
We can therefore normalize them so that the factorization becomes
A = ￿

3
5
4
5
4
5

3
5 ￿ ￿

25 0
0 50 ￿ ￿

3
5
4
5
4
5

3
5 ￿

−1
,
which has the form A = QDQ
−1
where Q is an orthogonal matrix. Because Q is
orthogonal, we also have Q
−1
= Q
T
, so we can write the factorization as A = QDQ
T
or as
A = ￿

3
5
4
5
4
5

3
5 ￿ ￿

25 0
0 50 ￿ ￿

3
5
4
5
4
5

3
5 ￿

T
.
As in Example 1 the eigenvectors set up a coordinate system with respect to which
the action of A is very simple. The diﬀerence is that this time the coordinate system
is rectangular.
Example 3: Consider the symmetric matrix A =

4 −2 −2
−2 4 2
−2 2 4
¸
¸
. We compute the
eigenvalues λ = 8, 2, 2 and the corresponding eigenvectors

−1
1
1
¸
¸
,

1
1
0
¸
¸
,

1
0
1
¸
¸
.
The ﬁrst vector is orthogonal to the second and third, but those two are not or-
thogonal to each other. They are however both associated with the eigenvalue 2, so
140 22. Diagonalization of Symmetric and Orthogonal Matrices
they generate the eigenspace, in this case a plane, of the eigenvalue 2. If we run the
Gram-Schmidt process on these two eigenvectors, we will stay within the eigenspace
and generate the two orthonormal eigenvectors

1

2
1

2
0
¸
¸
¸

1

6
−1

6
2

6
¸
¸
¸
¸
.
If we normalize the ﬁrst eigenvector and assemble all the pieces, we obtain the fac-
torization
A =

1

3
1

2
1

6
1

3
1

2

1

6
1

3
0
2

6
¸
¸
¸
¸

8 0 0
0 2 0
0 0 2
¸
¸
¸

1

3
1

2
1

6
1

3
1

2

1

6
1

3
0
2

6
¸
¸
¸
¸
T
Again it has the form QDQ
T
where Q is an orthogonal matrix.
In the previous two examples, eigenvectors that come from diﬀerent eigenvalues
seemed to be automatically orthogonal. This is in fact true for any symmetric matrix
A. We prove this by letting Av = λv and Aw = µw where λ ￿= µ and noting that
λv · w = w
T
λv = w
T
Av = (w
T
Av)
T
= v
T
Aw = v
T
µw = µv · w ⇒(λ−µ)v · w = 0 ⇒
v · w = 0. (Justify each step.)
Can every symmetric matrix be factored as in the previous two examples? That
is, does every symmetric have a diagonal factorization through orthogonal matrices,
or said another way, does every symmetric matrix have an orthonormal basis of eigen-
vectors? The answer is yes, and such a factorization is called a spectral factorization.
We state this formally in the following theorem, which is one of the most important
results of linear algebra.
The Spectral Theorem. If A is a symmetric n × n matrix, then A has n real
eigenvalues (counting multiplicities) λ
1
, λ
2
, · · · , λ
n
and its corresponding eigenvectors
form an orthonormal basis with respect to which A takes the form

λ
1
λ
2
.
.
.
λ
n
¸
¸
¸
¸
or, in orther words, A can be expressed as A = QDQ
T
where Q is orthogonal and D
as above.
Proof: We have to temporarily view A as a transformation of complex n-dimensional
space C
n
. Since the characteristic equation det(A−λI) = a
n
λ
n
+a
n−1
λ
n−1
+ · · · +
22. Diagonalization of Symmetric and Orthogonal Matrices 141
a
0
= 0 involves a polynomial of degree n, the Fundamental Theorem of Algebra
tells us that there are n (possibly complex) roots. If λ
0
is one such root, then it
is an eigenvalue of A, so there is a vector v such that Av = λ
0
v. If λ
0
and v are
complex, then taking complex conjugates we have Av = λ
0
v (Section 13 Exercise 2),
so that λ
0
is also an eigenvalue with eigenvector v. We therefore have the equality
λ
0
v
T
v = v
T
Av = (v
T
Av)
T
= v
T
Av = λ
0
v
T
v. Canceling v
T
v (justiﬁed by Exercise 9)
we get λ
0
= λ
0
. Since λ
0
equals its own conjugate, it must be real. (The eigenvector
v may not be real, but the fact that λ
0
is a real eigenvalue ⇒det(A−λ
0
I) = 0 ⇒the
real matrix A − λ
0
I is singular ⇒ there is some real eigenvector for λ
0
.) Therefore
every symmetric n ×n matrix has n real eigenvalues (counting multiplicities).
The rest of the proof takes place in the real world and proceeds in steps. To
illustrate the proof, we let A be a 4 × 4 matrix. A has an eigenvalue λ
1
(which
could, in the worst case, be repeated four times) with eigenvector v
1
. Normalize v
1
and expand it to an orthonormal basis of R
4
. Let Q
1
be the orthogonal matrix with
these vectors as its columns. (The ﬁrst column is v
1
.) Then we have
AQ
1
= Q
1

λ
1
∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
¸
¸
¸
.
But since Q
T
1
AQ
1
is symmetric (see Section 2 Exercise 6(f)), we can conclude that
AQ
1
= Q
1

λ
1
0 0 0
0 ∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
¸
¸
¸
.
Let A
2
be the 3 × 3 matrix in the lower right corner of the last factor on the right.
Then A
2
is symmetric and, except for λ
1
, has the same eigenvalues as A (see Section
9 Exercise 3). This ends step one.
Since A
2
is symmetric, it has an eigenvalue λ
2
with eigenvector v
2
. Normalize
v
2
and expand it to an orthonormal basis of R
3
. Let U
2
be the orthogonal matrix
with these vectors as its columns. (The ﬁrst column is v
2
.) Then as above we have
A
2
U
2
= U
2

λ
2
0 0
0 ∗ ∗
0 ∗ ∗
¸
¸
.
Putting this together with the result of step one, we have

1 0 0 0
0
0 U
2
0
¸
¸
¸
T
Q
T
1
AQ
1

1 0 0 0
0
0 U
2
0
¸
¸
¸
142 22. Diagonalization of Symmetric and Orthogonal Matrices
=

1 0 0 0
0
0 U
2
0
¸
¸
¸
T

λ
1
0 0 0
0
0 A
2
0
¸
¸
¸

1 0 0 0
0
0 U
2
0
¸
¸
¸
=

λ
1
0 0 0
0 λ
2
0 0
0 0 ∗ ∗
0 0 ∗ ∗
¸
¸
¸
or letting Q
2
equal the product of Q
1
and the matrix containing U
2
we have
Q
T
2
AQ
2
=

λ
1
0 0 0
0 λ
2
0 0
0 0 ∗ ∗
0 0 ∗ ∗
¸
¸
¸
Q
2
is the product of orthogonal matrices and is therefore orthogonal (Section 21
Exercise 8). Let A
3
be the 2 ×2 matrix in the lower right corner of the last factor on
the right. Then A
3
is symmetric and, except for λ
1
and λ
2
, has the same eigenvalues
as A. This ends step two. In general, we continue in this manner until we obtain
Q
T
AQ =

λ
1
λ
2
.
.
.
λ
n
¸
¸
¸
¸
This proves the Spectral Theorem.
The Spectral Theorem has many applications, which we will not pursue here.
Instead we will end with a spectral-like factorization for orthogonal matrices. Of
course, orthogonal matrices are not necessarily symmetric, so the Spectral Theorem
does not apply. In fact, most orthogonal matrices are not diagonalizable at all as
in the case of the rotation matrix ￿

0 −1
1 0 ￿

. But let’s push ahead anyway with the
following example.
Example 4: We consider the orthogonal matrix
A =

2
3
2
3

1
3

1
3
2
3
2
3
2
3

1
3
2
3
¸
¸
¸
.
The characteristic equation for A is x
3
−2x
2
+2x −1 = 0. We ﬁnd its roots and use
Gaussian elimination with complex arithmetic as in Section 13 to obtain the following
three eigenvalue-eigenvector pairs:
22. Diagonalization of Symmetric and Orthogonal Matrices 143
1

1
1
1
¸
¸
,
1
2
+i

3
2

3 +i

3 +i
−2i
¸
¸
,
1
2
−i

3
2

3 −i

3 −i
2i
¸
¸
.
We put all this together to obtain the complex diagonal factorization
A =

1

3 +i

3 −i
1 −

3 +i −

3 −i
1 −2i 2i
¸
¸

1 0 0
0
1
2
+i

3
2
0
0 0
1
2
−i

3
2
¸
¸
¸

1

3 +i

3 −i
1 −

3 +i −

3 −i
1 −2i 2i
¸
¸
−1
.
The equations for the second and third eigenvalue-eigenvector pairs can be written
as Av = λv and Av = λv. Just as in Section 13, we can therefore rewrite the
factorization in real form. Recall from that section that the equation Av = λv can
be written as A(x + iy) = (α + iβ)(x + iy), which when multiplied out becomes
Ax + iAy = (αx − βy) + i(βx + αy). Equating real and imaginary parts we obtain
Ax = αx−βy and Ay = βx+αy. This gives us the real block-diagonal factorization
A =

1

3 1
1 −

3 1
1 0 −2
¸
¸

1 0 0
0
1
2

3
2
0 −

3
2
1
2
¸
¸
¸

1

3 1
1 −

3 1
1 0 −2
¸
¸
−1
Note that the columns of the ﬁrst factor on the right are orthogonal, so that if we
normalize each column, we will have an orthogonal matrix. But we must be careful
that when we divide by lengths, the equations Ax = αx − βy and Ay = βx + αy
remain true. This can only be done if we divide x and y by the same number. In our
case, fortunately, both the second and third columns, which correspond to x and y,
have length =

6. Therefore we are justiﬁed in writing
A =

1

3
1

2
1

6
1

3

1

2
1

6
1

3
0 −
2

6
¸
¸
¸
¸

1 0 0
0
1
2

3
2
0 −

3
2
1
2
¸
¸
¸

1

3
1

2
1

6
1

3

1

2
1

6
1

3
0 −
2

6
¸
¸
¸
¸
T
We have a factorization of the form A = QDQ
T
where Q is orthogonal and D is
block-diagonal. We can now see the geometrical eﬀect of A as a transformation of
R
3
. The three columns of Q deﬁne an orthonormal basis, and A rotates R
3
around
the axis deﬁned by the ﬁrst eigenvector by an angle of −π/3.
The kind of factorization we have just obtained can be realized for any orthogonal
matrix. We call it a real block-diagonal factorization .
144 22. Diagonalization of Symmetric and Orthogonal Matrices
Theorem. If A is an orthogonal matrix, then there is an orthonormal basis with
respect to which A takes the form

α
1
β
1
−β
1
α
1
.
.
.
α
p
β
p
−β
p
α
p
−1
.
.
.
−1
1
.
.
.
1
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
or, in other words, A = QDQ
T
where Q is orthogonal and D is as above.
Proof: First we investigate the nature of the eigenvalues. If λ is a possibly com-
plex eigenvalue of A, then Av = λv and Av = λv. From the computation v
T
v =
v
T
A
T
Av = (Av)
T
(Av) = λλv
T
v, cancelling v
T
v we obtain λλ = 1 or |λ| = 1. There-
fore A has n eigenvalues each of which is either ±1 or a complex number and its
complex conjugate both of length 1.
We will just give a sketch of the rest of the proof since it is very similar to that
of the Spectral Theorem. The proof proceeds in steps, and each step consists of two
cases. First, suppose A has eigenvalue λ = ±1 with eigenvector v. Normalize v and
expand it to an orthonormal basis of R
n
, and let Q be the orthogonal matrix with
these vectors as its columns. Then we have
AQ
=
Q

±1 ∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
¸
¸
¸
.
But since Q
T
AQ is orthogonal, we can conclude
AQ = Q

±1 0 0 0
0 ∗ ∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
¸
¸
¸
.
Let A
2
be the matrix in the lower right corner of the last factor on the right. Then
A
2
is orthogonal and, except for λ, has the same eigenvalues as A.
The second possibility is that λ is complex. Let x and y be the real and imaginary
parts of the eigenvector v. Assume for a moment that ￿x￿ = ￿y￿ and x· y = 0. Then
22. Diagonalization of Symmetric and Orthogonal Matrices 145
we can normalize x and y and still maintain the equations Ax = αx − βy and
Ay = βx + αy. Expand x and y into an orthonormal basis and let Q be the matrix
with these vectors as its columns. Then we have
AQ
=
Q

α β ∗ ∗
−β α ∗ ∗
0 0 ∗ ∗
0 0 ∗ ∗
¸
¸
¸
.
But since Q
T
AQ is orthogonal, we can conclude (Exercise 10)
AQ = Q

α β 0 0
−β α 0 0
0 0 ∗ ∗
0 0 ∗ ∗
¸
¸
¸
.
Let A
2
be as above, then A
2
is orthogonal and, except for λ and λ, has the same
eigenvalues as A. This ends the ﬁrst step. Continue in the obvious way as in the
Spectral Theorem.
We still have to prove ￿x￿ = ￿y￿ and x · y = 0. It is enough to show v
T
v = 0,
since then we would have v
T
v = (x+iy)
T
(x+iy) = x· x−y· y+i2x· y = 0 ⇒x· y = 0
and x · x = y · y or ￿x￿ = ￿y￿. To show v
T
v = 0 we compute v
T
v = v
T
A
T
Av =
(Av)
T
(Av) = λ
2
v
T
v. If v
T
v ￿= 0, then we could cancel it from both sides obtaining
λ
2
= 1. But the only solutions to the equation λ
2
= 1 are λ = ±1 (Exercise 11),
contradicting the assumption that λ is complex. Therefore v
T
v = 0. This ends the
proof.
Note that each consecutive pair of −1’s on the diagonal can be considered as a
plane rotation of π radians, and therefore they can be placed in the sequence of αβ
blocks. The block-diagonal matrix D then assumes the form

α
1
β
1
−β
1
α
1
.
.
.
α
q
β
q
−β
q
α
q
±1
1
.
.
.
1
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
.
So we can say that an orthogonal transformation in R
n
produces a rotation through
a certain angle in each of q mutually orthogonal planes and at most one reﬂection
146 22. Diagonalization of Symmetric and Orthogonal Matrices
that reverses one direction orthogonal to these planes. In R
3
the only possibilities
are

α β 0
−β α 0
0 0 1
¸
¸

−1 0 0
0 1 0
0 0 1
¸
¸

α β 0
−β α 0
0 0 −1
¸
¸
,
that is, a pure rotation, a pure reﬂection, or a rotation and reﬂection perpendicular
to the plane of rotation.
Finally we leave symmetric and orthogonal matrices and consider two important
scalar functions of arbitrary square matrices. They are the determinant and the
trace. The determinant of a matrix we already know something about. The trace of
a matrix A is deﬁned as the sum of its diagonal elements
tr(A) = a
11
+a
22
+ · · · +a
nn
.
They both have simple and useful expressions in terms of the eigenvalues of A, which
are summarized in the following.
Theorem. The determinant of a matrix is equal to the product of its eigenvalues,
and the trace of a matrix is equal to the sum of its eigenvalues, both taken over the
complex numbers.
Proof: Consider the characteristic polynomial det(A−λI) of A.
det

a
11
−λ a
12
· · · a
1n
a
21
a
22
−λ · · · a
2n
.
.
.
.
.
.
.
.
.
a
n1
a
n2
· · · a
nn
−λ
¸
¸
¸
¸
= (a
11
−λ)(a
22
−λ) · · · (a
nn
−λ) + expressions in λ
n−2
, λ
n−3
, · · · , λ + constants
= (−λ)
n
+ tr(A)(−λ)
n−1
+ · · · + det(A)
The ﬁrst equality follows from the determinant formula. Note that the ﬁrst term
contains all expressions involving λ
n
and λ
n−1
. The second equality follows by simple
computation and the fact that det(A − 0I) = det(A). If λ
1
, λ
2
, · · · , λ
n
are all the
eigenvalues of A, then the characteristic polynomial can also be written in factored
form as
det(A−λI) = C(λ
1
−λ)(λ
2
−λ) · · · (λ
n
−λ)
= C[(−λ)
n
+ (λ
1

2
+ · · · +λ
n
)(−λ)
n−1
+ · · · +λ
1
λ
2
· · · λ
n
]
Equating the two forms of the characteristic polynomial, we see that C = 1 and
therefore det(A) = λ
1
λ
2
· · · λ
n
and tr(A) = λ
1

2
+ · · · +λ
n
.
These facts are useful in analyzing orthogonal transformations of R
3
. Suppose
A is 3 × 3 orthogonal, so det(A) = ±1. From the considerations above A is a pure
22. Diagonalization of Symmetric and Orthogonal Matrices 147
rotation if and only if det(A) = 1. In this case tr(A) = 1+2α. Since α = cos θ where
θ is the angle of rotation, we have
cos θ =
tr(A) −1
2
.
This means that the angle of rotation can be computed without ﬁnding eigenvalues.
In particular, for the matrix
A =

2
3
2
3

1
3

1
3
2
3
2
3
2
3

1
3
2
3
¸
¸
¸
of the earlier example, we have det(A) = 1, so A is a pure rotation such that cos θ =
(6/3 − 1)/2 = 1/2 and therefore θ = π/3. To ﬁnd the axis and direction of the
rotation, it is still necessary to compute the eigenvectors.
EXERCISES
1. Show that an eigenspace of a matrix is a subspace.
2. Describe the eigenspaces of the following matrix and how the matrix acts on each.
What are the algebraic and geometric multiplicities of the eigenvalues?
A =

2 3 0
4 3 0
0 0 6
¸
¸
=

−1 0
3
4
1 0 1
0 1 0
¸
¸

−1 0 0
0 6 0
0 0 6
¸
¸

−1 0
3
4
1 0 1
0 1 0
¸
¸
−1
3. Find the diagonal factorizations of the following matrices and sketch a diagram
that geometrically describes the eﬀect of each.
(a) ￿

1 4
1 −2 ￿

(b) ￿

2 −2
−2 −1 ￿

(c)

2 1 0
0 3 0
0 0 3
¸
¸
4. Find the spectral factorizations of the following symmetric matrices.
148 22. Diagonalization of Symmetric and Orthogonal Matrices
(a) ￿

2 −2
−2 −1 ￿

(b)

3 −2 0
−2 0 0
0 0 1
¸
¸
(c)

4 0 −2
0 5 0
−2 0 1
¸
¸
(d)

0 2 2
2 0 −2
2 −2 0
¸
¸
5. Find the spectral factorizations of the following transformations and reconstruct
their matrices.
(a) Projection of R
2
onto the line deﬁned by ￿

3
1 ￿

.
(b) Reﬂection of R
2
across the line deﬁned by ￿

3
1 ￿

.
6. Find the real block-diagonal factorizations of the following orthogonal matrices
and describe geometrically the transformations they deﬁne.
(a)

1
3

2
3

2
3

2
3
1
3

2
3

2
3

2
3
1
3
¸
¸
¸
(b)

2
3

1
3
2
3
2
3
2
3

1
3

1
3
2
3
2
3
¸
¸
¸
(c)

0 0 0 1
0 0 −1 0
0 1 0 0
−1 0 0 0
¸
¸
¸
7. Construct the orthogonal matrix that rotates R
3
around the axis deﬁned by the
vector

−1
0
1
¸
¸
by 90

by writing down block-diagonal factorization of the matrix and
multiplying it out.
22. Diagonalization of Symmetric and Orthogonal Matrices 149
8. If Ax = αx −βy and Ay = βx +αy, then
A

.
.
.
.
.
.
x y
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
x y
.
.
.
.
.
.
¸
¸
¸ ￿

α β
−β α ￿

or
A

.
.
.
.
.
.
y x
.
.
.
.
.
.
¸
¸
¸
=

.
.
.
.
.
.
y x
.
.
.
.
.
.
¸
¸
¸ ￿

α −β
β α ￿

.
What does each equation say about the direction of the rotation of the plane spanned
by x and y? (Of course, they must say the same thing.)
9. If v is a nonzero (possibly complex) vector, then show v
T
v ￿= 0.
10. Show that if a vector

c
1
c
2
c
3
¸
¸
is orthogonal to the two vectors

α
−β
0
¸
¸
and

β
α
0
¸
¸
,
then c
1
= c
2
= 0.
11. Show that, even in the world of complex numbers, the only solutions to the
equation λ
2
= 1 are λ = ±1. (Hint: Let λ = α +iβ and reach a contradiction.)
12. If Q is an orthogonal matrix such that det Q = −1, then what can you say about
Q as a transformation?
13. Fix the center of a basketball and choose n axes v
1
, v
2
, · · · , v
n
and angles
θ
1
, θ
2
, · · · , θ
n
. Rotate the basketball around v
1
by an angle θ
1
, around v
2
by an
angle θ
2
, · · ·, and around v
n
by an angle θ
n
. You could have achieved the same result
with one rotation around a certain axis and by a certain angle. Discuss why this is
true and how you could ﬁnd the one axis and angle that will do the job. This is The
Larry Bird Theorem.
14. State one signiﬁcant fact about the eigenvalues of
(a) a symmetric matrix.
(b) an orthogonal matrix.
(c) a stable matrix.
(d) a defective matrix.
(e) a singular matrix.
(g) a projection matrix.
(h) a reﬂection matrix.
150 22. Diagonalization of Symmetric and Orthogonal Matrices
15. For each matrix below decide if it is symmetric, orthogonal, invertible, a projec-
tion, or diagonalizable.
A =

0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
¸
¸
¸
B =
1
4

1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
¸
¸
¸
Find their eigenvalues.
16. Show tr(A) + tr(B) = tr(A+B), tr(AB) = tr(BA), and tr(B
−1
AB) = tr(A)
17. Show that A = SBS
−1
⇒ A and B have the same trace, determinant, eigen-
values, characteristic polynomial, and rank. Find a counterexample for the converse
(⇐). Hint: Try A = ￿

1 0
0 1 ￿

and B = ￿

1 1
0 1 ￿

After linear functions, which we have already studied extensively in the form
of linear equations and linear transformations, quadratic functions are next in level
of complexity. Such functions arise in diverse applications, including geometry, me-
chanical vibrations, statistics, and electrical engineering, but matrix methods allow a
uniﬁed study of their properties. A quadratic equation in two variables is an equation
of the form
ax
2
+bxy +cy
2
+dx +ey +f = 0
where at least one of the coeﬃcients a, b, c is not zero. From analytic geometry, we
know that the graph of a quadratic equation is a conic section, that is, a circle, a
parabola, an ellipse, a hyperbola, a pair of lines, a single line, a point, or the empty
set. A quadratic equation may be expressed with matrices as
[ x y ] ￿

a b/2
b/2 c ￿
￿
x
y ￿

+ [ d e ] ￿

x
y ￿

+f = 0.
The second degree terms
ax
2
+bxy +cy
2
= [ x y ] ￿

a b/2
b/2 c ￿
￿
x
y ￿

determine the type of conic section that the equation represents and are called the
quadratic form associated with the equation. Note that although the matrix above
is symmetric, the same quadratic form can be generated by many other diﬀerent
matrices such as ￿

a b
0 c ￿

and ￿

a 3b
−2b c ￿

. A quadratic equation in three variables is
an equation of the form
ax
2
+by
2
+cz
2
+dxy +exz +fyz +gx +hy +iz +j = 0
or
[ x y z ]

a d/2 e/2
d/2 b f/2
e/2 f/2 c
¸
¸

x
y
z
¸
¸
+ [ g h i ]

x
y
z
¸
¸
+j = 0
where at least one of the coeﬃcients a, b, c, d, e, f is not zero. The graphs of such equa-
tions are quadric surfaces, which include ellipsoids, hyperboloids, and paraboloids of
various types. Again the terms of second degree constitute the quadratic form asso-
ciated with the equation.
In general, a quadratic form in n variables is an expression of the form
n ￿

i=1
n ￿

j=1
a
ij
x
i
x
j
= [ x
1
x
2
· · · x
n
]

a
11
a
12
. . . a
1n
a
21
a
22
. . . a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
. . . a
nn
¸
¸
¸
¸

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸
= x
T
Ax,
and a quadratic equation in n variables has the representation x
T
Ax +b
T
x +c = 0.
We were able to express the quadratic forms in two and three variables above by
means of symmetric matrices. Can this always be done? Yes, since x
T
Ax = x
T
A
T
x
(Exercise 1), we have x
T
Ax =
1
2
(x
T
A
T
x + x
T
A
T
x) = x
T
(
1
2
(A + A
T
))x, and the
matrix
1
2
(A + A
T
) is symmetric. (Exercise 2.) This just amounts to replacing the
oﬀ-diagonal elements a
ij
and a
ji
by
1
2
(a
ij
+a
ji
). We can therefore always assume A is
symmetric. If A is also nonsingular, the quadratic form x
T
Ax is called nondegenerate.
We now turn to the question of how to recognize the graph of a quadratic equation.
Example 1: Suppose we have the quadratic equation 41x
2
1
−24x
1
x
2
+ 34x
2
2
= 1. We
can write this equation in the form x
T
Ax = 1 or
[ x
1
x
2
] ￿

41 −12
−12 34 ￿
￿
x
1
x
2 ￿

= 1.
Since A is symmetric, it has a spectral factorization A = QDQ
T
, which from Section
22 is
A = ￿

3
5
4
5
4
5

3
5 ￿
￿
25 0
0 50 ￿
￿
3
5
4
5
4
5

3
5 ￿

T
.
If we substitute this into x
T
Ax we obtain x
T
QDQ
T
x = (Q
T
x)
T
D(Q
T
x) = y
T
Dy
where y = Q
T
x or
[ x
1
x
2
] ￿

3
5
4
5
4
5

3
5 ￿
￿
25 0
0 50 ￿
￿
3
5
4
5
4
5

3
5 ￿

T
￿
x
1
x
2 ￿

= [
3
5
x
1
+
4
5
x
2
4
5
x
1

3
5
x
2
] ￿

25 0
0 50 ￿
￿
3
5
x
1
+
4
5
x
2
4
5
x
1

3
5
x
2 ￿

= [ y
1
y
2
] ￿

25 0
0 50 ￿
￿
y
1
y
2 ￿

= 25y
2
1
+ 50y
2
2
.
The y-coordinates are therefore y
1
=
3
5
x
1
+
4
5
x
2
and y
2
=
4
5
x
1

3
5
x
2
equation expressed in these coordinates becomes 25y
2
1
+ 50y
2
2
= 1, which is just an
ellipse. The x-coordinates and the y-coordinates are related by Q, which provides an
orthogonal transformation from y-space to x-space. Since orthogonal transformations
preserve distance, angle, and therefore congruence, the original quadratic equation
must also represent an ellipse. Furthermore Q takes the coordinate vectors ￿

1
0 ￿

, ￿

0
1 ￿

in y-space to the eigenvectors ￿

3
5
4
5 ￿

, ￿

4
5
−3
5 ￿

in x-space, which just amounts to a simple
rotation. Therefore 41x
2
1
− 24x
1
x
2
+ 34x
2
2
= 1 is a rotated ellipse with major and
minor axes along the eigenvectors of A.
Example 2: To ﬁnd the graph of the quadratic equation 4x
1
x
2
+ 4x
1
x
3
−4x
2
x
3
= 1
we ﬁrst write it as
[ x
1
x
2
x
3
]

0 2 2
2 0 −2
2 −2 0
¸
¸

x
1
x
2
x
3
¸
¸
= 1.
From Section 22 Exercise 5(d) the spectral factorization A = QDQ
T
for this matrix
looks like

0 2 2
2 0 −2
2 −2 0
¸
¸
=

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸

2 0 0
0 2 0
0 0 −4
¸
¸

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸
T
.
Therefore setting y = Q
T
x so that

y
1
y
2
y
3
¸
¸
=

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸
T

x
1
x
2
x
3
¸
¸
=

1

2
x
1

1

2
x
3
1

6
x
1
+
2

6
x
2

1

6
x
3

1

3
x
1

1

3
x
2
+
1

3
x
3
¸
¸
¸
,
the quadratic equation in terms of the y-coordinates takes the form 2y
2
1
+2y
2
2
−4y
2
3
=
1. This is a hyperboloid of revolution around the y
3
equation 4x
1
x
2
+ 4x
1
x
3
− 4x
2
x
3
= 1 describes a hyperboloid of revolution around
the axis deﬁned by the third column of Q.
The method just illustrated obviously works in general. We therefore have for
T
Ax, there is an orthogonal change of variables y = Q
T
x with
respect to which the quadratic form becomes λ
1
y
2
1

2
y
2
2
+· · ·+λ
n
y
2
n
. (A is symmetric
with eigenvalues λ
1
, λ
2
, · · · , λ
n
and Q is orthogonal.) This is called the Principal Axis
Theorem. It is really just the Spectral Theorem in another form.
EXERCISES
1. Show that x
T
Ax = x
T
A
T
x. (Hint: Since x
T
Ax is a 1 × 1 matrix, it must equal
its own transpose.)
2. Show that A+A
T
is symmetric for any square matrix A.
3. For the each of the following quadratic equations, ﬁnd a rotation of the coordinates
so that the resulting quadratic form is in standard form, and identify and sketch the
curve or surface.
(a) x
2
1
+x
1
x
2
+x
2
2
= 6
(b) 7x
2
1
+ 7x
2
2
− 5x
2
3
− 32x
1
x
2
− 16x
1
x
3
+ 16x
2
x
3
= 1 (Hint: The eigenvalues are
−9, −9, 27.)
4. For the quadratic equation 6x
2
1
−6x
1
x
2
+14x
2
2
−2x
1
+x
2
= 0, (a) ﬁnd a rotation of
the coordinates so that the resulting quadratic form is in standard form, (b) eliminate
the linear terms by completing the square in each variable and making a translation
of the coordinates, and (c) identify and sketch the curve.
5. Identify the following conics.
(a) 14x
2
−16xy + 5y
2
= 6
(b) 2x
2
+ 4xy + 2y
2
+x −3y = 1
(a) 2x
2
+ 2y
2
+ 3z
2
+ 4yz = 3
(b) 2x
2
+ 2y
2
+z
2
+ 4xz = 4
24. Positive Deﬁnite Matrices 155
24. POSITIVE DEFINITE MATRICES
Now we investigate how quadratic forms arise in the problem of maximizing and
minimizing functions of several variables. Suppose we want to determine the nature
of the critical points of a real valued function z = f(x, y). Assume for simplicity
a critical point occurs at (0, 0) and f(x, y) can be expanded in a Taylor series in a
neighborhood of that point. Then we have f(x, y) =
f(0, 0) +f
x
(0, 0)x +f
y
(0, 0)y +
1
2!
(f
xx
(0, 0)x
2
+ 2f
xy
(0, 0)xy +f
yy
(0, 0)y
2
) + · · · .
Since (0, 0) is a critical point, we must have f
x
(0, 0) = f
y
(0, 0) = 0. Putting this
back into the Taylor series and rewriting the second order terms, we have
f(x, y) −f(0, 0) = ax
2
+bxy +cy
2
+ higher order terms.
This means that f(x, y) behaves near (0, 0) like its second order terms ax
2
+bxy+cy
2
.
That is to say, if the quadratic form ax
2
+ bxy + cy
2
is positive for every nonzero
choice of (x, y) then f(x, y) has a minimum at (0, 0), and if ax
2
+ bxy + cy
2
is
negative for every nonzero choice of (x, y) then f(x, y) has a maximum at (0, 0). In
general, an arbitrary quadratic form ax
2
+bxy +cy
2
will assume positive, negative,
and zero values for various values of (x, y). But there are cases like 2x
2
+ 3y
2
and
x
2
− 2xy + 2y
2
= (x − y)
2
+ y
2
that are positive for all nonzero values of (x, y), or
like −x
2
−6y
2
and −x
2
+ 4xy −4y
2
= −(x −2y)
2
that are negative for all nonzero
values of (x, y).
We are therefore led to the following deﬁnition. A symmetric matrix A is positive
deﬁnite if its associated quadratic form x
T
Ax > 0 for every x ￿= 0. We also say A is
negative deﬁnite if −A is positive deﬁnite, that is if x
T
Ax < 0 for every x ￿= 0. How
can we tell if a symmetric matrix is positive deﬁnite? There are ﬁve ways to answer
this question, and we present them all in the following theorem. Its proof is long but
instructive. First we need a deﬁnition: For any square matrix
A =

a
11
a
12
. . . a
1n
a
21
a
22
. . . a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
. . . a
nn
¸
¸
¸
¸
,
we deﬁne the leading principal submatrices of A to be
A
1
= [ a
11
] A
2
= ￿

a
11
a
12
a
21
a
22 ￿

A
3
=

a
11
a
12
a
13
a
21
a
22
a
23
a
31
a
32
a
33
¸
¸
· · ·
Now for the characterization of positive deﬁnite matrices.
156 24. Positive Deﬁnite Matrices
Theorem. For any symmetric n ×n matrix A the following statements are equava-
lent.
(a) A is positive deﬁnite.
(b) All the eigenvalues of A are positive.
(c) All the leading principal submatrices A
1
, A
2
, · · · , A
n
of A have positive determi-
nants.
(d) A can be reduced to upper triangular form with all pivots positive by using only
the Gaussian operation of multiplying one row by a scalar and subtracting from
another row (no row exchanges or scalar multiplications of rows are necessary).
(e) There is a matrix R (not necessarily square) with independent columns such
that A = R
T
R.
Proof: We show (a) ⇔ (b), (a) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).
(a) ⇒ (b): If A is positive deﬁnite and Ax = λx, then 0 < x
T
Ax = x
T
λx = λ￿x￿
2
and therefore 0 < λ.
(b) ⇒ (a): By the Principal Axis Theorem x
T
Ax = λ
1
y
2
1

2
y
2
2
+ · · · +λ
n
y
2
n
where
y = Q
T
x and Q is orthogonal. Therefore, if all the eigenvalues λ
1
, λ
2
, · · · , λ
n
are
positive, then x
T
Ax > 0 for any x ￿= 0.
(a) ⇒(c): Since A is positive deﬁnite, then so are all the leading principal submatrices
A
1
, A
2
, · · · , A
n
. This follows for A
2
for example from the equality
[ x
1
x
2
] ￿

a
11
a
12
a
21
a
22 ￿
￿
x
1
x
2 ￿

= [ x
1
x
2
0 · · · 0 ]

a
11
a
12
. . . a
1n
a
21
a
22
. . . a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
. . . a
nn
¸
¸
¸
¸

x
1
x
2
0
.
.
.
0
¸
¸
¸
¸
¸
¸
> 0.
There are similar equalities for all the other leading principal submatrices. Therefore,
since det(A
i
) equals the product of its eigenvalues (by the symmetry of A
i
and Section
22 Exercise 6), which are all positive by (b) above, we have det(A
i
) > 0.
(c) ⇒(d): We ﬁrst note that the Gaussian step of multiplying one row by a scalar and
substracting from another row has no eﬀect on the determinant of a matrix or on the
determinant of its leading principal submatrices. We now illustrate the implication
of this for the 4 ×4 case. Initially A looks like

p
11
∗ ∗ ∗
∗ ∗ ∗ ∗
∗ ∗ ∗ ∗
∗ ∗ ∗ ∗
¸
¸
¸
24. Positive Deﬁnite Matrices 157
and we have p
11
= det(A
1
) > 0. We run one Gaussian step and obtain

p
11
∗ ∗ ∗
0 p
22
∗ ∗
0 ∗ ∗ ∗
0 ∗ ∗ ∗
¸
¸
¸
.
Then p
11
p
22
= det(A
2
) > 0 ⇒p
22
> 0. We run another Gaussian step and obtain

p
11
∗ ∗ ∗
0 p
22
∗ ∗
0 0 p
33

0 0 ∗ ∗
¸
¸
¸
.
Then p
11
p
22
p
33
= det(A
3
) > 0 ⇒ p
33
> 0. Finally we run one more Gaussian step
and obtain

p
11
∗ ∗ ∗
0 p
22
∗ ∗
0 0 p
33

0 0 0 p
44
¸
¸
¸
.
Then p
11
p
22
p
33
p
44
= det(A
4
) > 0 ⇒ p
44
> 0. Note that no row exchanges are
necessary. The general case is now clear.
(d) ⇒(e): This is the hard one! We need a preliminary result: If A is symmetric and
has an LU-factorization A = LU, then it has a factorization of the form A = LDL
T
where D is diagonal. We quickly indicate the proof. If we divide each row of U by its
pivot and place the pivots into a diagonal matrix D, we immediately have A = LDM
where M is upper triangular with ones down its diagonal. Our goal is to show
L
T
= M or L
T
M
−1
= I. Since A is symmetric, A
T
= A ⇒ M
T
DL
T
= LDM ⇒
L
T
M
−1
= D
−1
(M
T
)
−1
LD. In the last equation, the left side is upper triangular since
it is a product of upper triangular matrices, and the right side is lower triangular since
it is a product of lower triangular and diagonal matrices. Both sides are therefore
diagonal. Furthermore, since L
T
and M
−1
are each upper triangular with ones down
their diagonals, then the same is true of M(L
T
)
−1
(Exercise 1). We conclude that
M(L
T
)
−1
= I. Now we use this result. Since A is symmetric with positive pivots, we
have A = LDL
T
where the diagonal entries of D are all positive. We can therefore
deﬁne

D to be the diagonal matrix with diagonal entries equal to the square roots
of the corresponding diagonal entries of D. We then have A = (L

D)(

DL
T
),
which has the form A = R
T
R.
(e) ⇒ (a): Since R has independent columns, Rx = 0 ⇔ x = 0. Therefore x ￿= 0 ⇒
x
T
Ax = x
T
R
T
Rx = (Rx)
T
(Rx) = ￿Rx￿
2
> 0. This ends the proof.
The factorization A = (L

D)(

DL
T
) is called the Cholesky factorization of the
symmetric positive deﬁnite matrix A. It is useful in numerical applications and can
be computed by a simple variant of Gaussian elimination.
158 24. Positive Deﬁnite Matrices
From this theorem we can also characterize negative deﬁnite matrices. The
equivalent statements are (a) A is negative deﬁnite, (b) all the eigenvalues of A are
negative, (c) det(A
1
) < 0, det(A
2
) > 0, det(A
3
) < 0, · · · (Exercise 2), (d) all the
pivots of A are negative, and (e) A = −R
T
R for some matrix R with independent
columns.
Example 3: Let’s check each of the conditions above for the quadratic form 2x
2
1
+
2x
2
2
+ 2x
2
3
−2x
1
x
2
−2x
1
x
3
+ 2x
2
x
3
. First we write it in the form x
T
Ax where
A =

2 −1 −1
−1 2 1
−1 1 2
¸
¸
.
The the spectral factorization of A is
A =

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸

1 0 0
0 1 0
0 0 4
¸
¸

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸
T
.
All the eigenvalues are positive, and therefore A is positive deﬁnite. The leading
principal submatrices have determinants det(A
1
) = 2, det(A
2
) = 3, det(A
3
) = 4 and
are therefore all positive as they should be. The LU factorization of A is
A =

1 0 0

1
2
1 0

1
2
1
3
1
¸
¸

2 −1 −1
0
3
2
1
2
0 0
4
3
¸
¸
.
The pivots are all positive, so we have the factorization A = LDL
T
or
A =

1 0 0

1
2
1 0

1
2
1
3
1
¸
¸

2 0 0
0
3
2
0
0 0
4
3
¸
¸

1 −
1
2

1
2
0 1
1
3
0 0 1
¸
¸
.
We therefore can write A = (L

D)(

DL
T
) = (

DL
T
)
T
(

DL
T
) or
A =

2 0 0

1

2 ￿

3
2
0

1

2
1

6
2

3
¸
¸
¸

2 −
1

2

1

2
0 ￿

3
2
1

6
0 0
2

3
¸
¸
¸
,
which has the form A = R
T
R. There is nothing unique about R. For example, we
can also take the square root of the diagonal matrix in the spectral factorization of
A to obtain A = (Q

D)(

DQ
T
) = (

DQ)
T
(

DQ) or
A =

1

2
1

6

1

3
0
2

6
1

3
2

2

2

6
2

3
¸
¸
¸

1

2
0
2

2
1

6
2

6

2

6

1

3
1

3
2

3
¸
¸
¸
,
24. Positive Deﬁnite Matrices 159
which also has the form A = R
T
R. There are many other such R’s, not even
necessarily square, for example
A =

1 1 0 0
−1 0 0 1
0 −1 0 1
¸
¸

1 −1 0
1 0 −1
0 0 0
0 1 1
¸
¸
¸
.
In fact, the product R
T
R should look familiar. It appears in the normal equations
A
T
Ax = A
T
b. We conclude that least squares problems invariably lead to positive
deﬁnite matrices.
Now let’s return to the problem of maximizing or minimizing a function of two
variables. We have seen that the question comes down to the positive or negative
1
2!
(f
xx
(0, 0)x
2
+ 2f
xy
(0, 0)xy +f
yy
(0, 0)y
2
)
or of the matrix
￿
f
xx
(0, 0) f
xy
(0, 0)
f
xy
(0, 0) f
yy
(0, 0) ￿

.
From the characterization of positive and negative deﬁnite matrices in terms of the
signs of the determinants of their principal leading submatrices, we immediately
obtain that (0, 0) is
a minimum point if f
xx
(0, 0) > 0 and f
xx
(0, 0)f
yy
(0, 0) −(f
xy
(0, 0))
2
> 0,
a maximum point if f
xx
(0, 0) < 0 and f
xx
(0, 0)f
yy
(0, 0) −(f
xy
(0, 0))
2
> 0.
This is just the second derivative test from the calculus of several variables. In the
n-variable case, if a function f(x
1
, x
2
, · · · , x
n
) has a critical point at (0, 0, · · · , 0), then
f
x
1
(0, 0, · · · , 0) = f
x
2
(0, 0, · · · , 0) = · · · = f
x
n
(0, 0, · · · , 0) = 0 and locally we have
f(x
1
, x
2
, · · · , x
n
)
= f(0, 0, · · · , 0)
+
1
2!
[ x
1
x
2
· · · x
n
]

f
x
1
x
1
f
x
1
x
2
· · · f
x
1
x
n
f
x
2
x
1
f
x
2
x
2
· · · f
x
2
x
n
.
.
.
.
.
.
.
.
.
.
.
.
f
x
n
x
1
f
x
n
x
2
· · · f
x
n
x
n
¸
¸
¸
¸
(0,0,···,0)

x
1
x
2
.
.
.
x
n
¸
¸
¸
¸
+ higher order terms
The matrix of second derivatives is called the Hessian of f(x
1
, x
2
, · · · , x
n
). If the
Hessian evaluated at (0, 0, · · · , 0) is positive or negative deﬁnite, then a maximum or
minimum occurs at (0, 0, · · · , 0). To determine if a large matrix is positive deﬁnite, it
160 24. Positive Deﬁnite Matrices
is obviously not eﬃcient to use the determinant test as we did for the 2×2 case above.
It is much better to check the signs of the pivots, because they are easily found by
Gaussian elimination. So we have come full circle. Gauss reigns supreme here as in
every other domain of linear algebra. That is the paramount and overriding principle
of the subject and of these notes.
EXERCISES
1. Show by example that the set of upper triangular matrices with ones down their
diagonals is closed under multiplication and inverse.
2. Why does the determinant test for negative deﬁniteness look like det(A
1
) <
0, det(A
2
) > 0, det(A
3
) < 0, · · ·?
3. Let A and B be symmetric positive deﬁnite, C be nonsingular, E be nonsingular
and symmetric, and F just symmetric. Prove that
(a) A+B is positive deﬁnite. (Use the deﬁnition.)
(b) A is nonsingular and A
−1
is positive deﬁnite. (Use the eigenvalue test and the
Spectral Theorem.)
(c) C
T
AC is positive deﬁnite. (Use the deﬁnition.)
(d) E
2
is positive deﬁnite. (Use the eigenvalue test and the Spectral Theorem.)
(e) e
F
is positive deﬁnite. (Use the eigenvalue test and the Spectral Theorem.)
(f) The diagonal elements a
ii
of A are all positive. (Take x to be a coordinate vector
in the deﬁnition.)
4. Show by an example that the product of two positive deﬁnite symmetric matrices
may not deﬁne a positive deﬁnite quadratic form.
5. Write the quadratic form 3x
2
1
+ 4x
2
2
+ 5x
2
3
+ 4x
1
x
2
+ 4x
2
x
3
in the form x
T
Ax
and verify all the statements in the theorem on positive deﬁnite matrices. That is,
show A has all eigenvalues positive and all pivots positive and obtain two diﬀerent
factorizations of the form A = R
T
R, one from A = QDQ
T
and the other from
A = LDL
T
. Describe the quadric surface 3x
2
1
+ 4x
2
2
+ 5x
2
3
+ 4x
1
x
2
+ 4x
2
x
3
= 16
(Hint: λ = 1, 4, 7)
6. For positive deﬁnite matrices A, make a reasonable deﬁnition of

A, and compute
it for A =

3 2 0
2 4 2
0 2 5
¸
¸
. (See Exercise 5 above.)
7. Decide if the each of the indicated critical points is a maximum or minimum.
(a) f(x, y) = −1 + 4(e
x
−x) −5xsin y + 6y
2
at the point (0, 0).
24. Positive Deﬁnite Matrices 161
(b) f(x, y) = (x
2
−2x) cos y at the point (1, π).
8. Test the following matrix for positive deﬁniteness the easiest way you can.

1 0 1 0
0 2 1 1
1 1 3 1
0 1 1 2
¸
¸
¸
9. A symmetric matrix A is positive semideﬁnite if its associated quadratic form
x
T
Ax ≥ 0 for every x ￿= 0. Characterize positive semideﬁnite matrices in terms of
their eigenvalues.
SECTION 1
1. (a) ￿

−2
1 ￿

(b) ￿

−1
−4 ￿

(c)

−2
3
−1
¸
¸
(d)

−.5
5
−3
¸
¸
(e)

1
0
2
1
¸
¸
¸
2.

1.5
−.5
−3
¸
¸
4. 150, 100
5. 580, 50
6. 10 servings of pasta, 1 serving of chicken, 4 servings of broccoli
7. y = x
3
−2x
2
−3x + 5
8. y = 3x
3
−5x
2
+x + 2
SECTION 2
1. (a) ￿

10 14 −2
8 −4 0 ￿

(b)

7 0
10 7
6 −5
¸
¸
(c)

17
4
−7
¸
¸
(d) [ 2 14 −8 ]
(e) [ 32 ] (f)

4 8 12
5 10 15
6 12 18
¸
¸
(g)

−8 10
−14 26
−2 −1
¸
¸
(h)

8 −3 12
5 0 7
−6 −3 −8
¸
¸
(i)

4 0 −1
0 1 0
2 −2 1
¸
¸
(j)

32 0 0
0 1 0
0 0 243
¸
¸
(k)

0 0 0
0 0 0
0 0 0
¸
¸
6. (a)

2 5 0
−1 0 −1
3 7 0
¸
¸
9. All but the last two.
SECTION 3
1. (a) ￿

1 0
.75 1 ￿￿

4 −6
0 9.5 ￿

(b)

1 0 0
−1 1 0
2 0 1
¸
¸

2 1 3
0 6 4
0 0 −2
¸
¸
(c)

1 0 0 0
2 1 0 0
−3 −11 1 0
1 2 −.5 1
¸
¸
¸

1 3 2 −1
0 −1 −1 4
0 0 −6 43
0 0 0 15.5
¸
¸
¸
(d)

1 0 0 0 0
2 1 0 0 0
0 1 1 0 0
0 0 −1 1 0
0 0 0 2 1
¸
¸
¸
¸
¸

2 1 0 0 0
0 3 3 0 0
0 0 1 1 0
0 0 0 2 1
0 0 0 0 1
¸
¸
¸
¸
¸
2. (a) ￿

1
2 ￿

(b)

2
−1
3
¸
¸
(c)

1
0
0
1
¸
¸
¸
(d)

2
0
−1
0
1
¸
¸
¸
¸
¸
3. 350, 1628
SECTION 4
1.

2
−3
4
¸
¸
2. all except (c)
3. (a) none, (b) inﬁnitely many
SECTION 5
1. (a) ￿

−7 4
2 −1 ￿

(b)

.5 0 0
0 10 0
0 0 −.2
¸
¸
(c)

.5 −1.5 .5
0 .5 −.5
0 0 .2
¸
¸
(d)

10 −6 1
−2 1 0
−7 5 −1
¸
¸
(e)

−1 0 1
−5 1 3
7 −1 −4
¸
¸
(f)

1 −2 1 0
1 −2 2 −3
0 1 −1 1
−2 3 −2 3
¸
¸
¸
(g)
1

d −b
−c a ￿

3.

2
7
2
¸
¸
4. only (c)
10. (a) False. (b) True.
SECTION 6
1.

2 1 0 0 0
1 4 1 0 0
0 1 4 1 0
0 0 1 4 1
0 0 0 1 2
¸
¸
¸
¸
¸

s
0
s
1
s
2
s
3
s
4
¸
¸
¸
¸
¸
=

3
12
0
−12
−3
¸
¸
¸
¸
¸
s
0
= s
2
= s
4
= 0, s
1
= 3, s
3
= −3
SECTION 7
2. (a)

2
.5
0
¸
¸
+c

−1
−1
1
¸
¸
(b) no solution
(c)

3
0
0
¸
¸
+c

1
0
1
¸
¸
+d

−2
1
0
¸
¸
(d)

3
−1
0
0
¸
¸
¸
+c

−1
0
1
0
¸
¸
¸
(e)

2
0
−1.5
0
¸
¸
¸
+c

−1
0
−.5
1
¸
¸
¸
+d

−2
1
0
0
¸
¸
¸
(f) ￿

3
−5 ￿

3. (a) two intersecting lines
(b) two parallel lines
(c) one line
4. (a) three planes intersecting in a point
(b) one plane intersecting two parallel planes
(c) three nonparallel planes with no intersection
(d) a line of intersection
(e) a plane of intersection
8. eggs = −2 +c, milk = 4 −c, orangejuice = c where 2 ≤ c ≤ 4
9. a = 2, b = c = d = 1
10. x
2
+y
2
−4x −6y + 4 = 0
SECTION 8
1. (a) −6 (b) −16 (c) −24 (d) −12 (e) −1 (f) −1
4. −
1
6
, −6
5. (a) 3 (b) −12 (c) x + 2y −18 (d) −x
3
+ 6x
2
−8x
7. True.
SECTION 9
1. (a) for λ = 1: ￿

1
0 ￿

, for λ = 2: ￿

1
1 ￿

(b) for λ = 1:

0
1
0
¸
¸
and

1
0
−2
¸
¸
, for λ = 3:

−1
0
1
¸
¸
(c) for λ = 0:

−2
1
1
¸
¸
, for λ = 2:

0
−1
1
¸
¸
, for λ = 4:

2
1
1
¸
¸
(d) for λ = −1 :

0
−1
1
¸
¸
, for λ = 2:

1
−2
1
¸
¸
, for λ = 6:

1
−1
1
¸
¸
(e) for λ = 2:

1
0
1
¸
¸
and

1
1
0
¸
¸
, for λ = −4:

−1
1
1
¸
¸
(f) for λ = −2:

1
0
0
0
¸
¸
¸
and

1
1
0
0
¸
¸
¸
, for λ = 3:

0
1
1
0
¸
¸
¸
and

0
0
1
1
¸
¸
¸
5. (a)

1
0
0
¸
¸
,

0
1
0
¸
¸
,

0
0
1
¸
¸
(b)

1
0
0
¸
¸

0
0
1
¸
¸
(c)

0
0
1
¸
¸
SECTION 10
1. (a) ￿

1 1
0 2 ￿

= ￿

1 1
0 1 ￿
￿
1 0
0 2 ￿
￿
1 1
0 1 ￿

−1
(b)

5 0 2
0 1 0
−4 0 −1
¸
¸
=

0 1 −1
1 0 0
0 −2 1
¸
¸

1 0 0
0 1 0
0 0 3
¸
¸

0 1 −1
1 0 0
0 −2 1
¸
¸
−1
(c)

2 2 2
1 2 0
1 0 2
¸
¸
=

−2 0 2
1 −1 1
1 1 1
¸
¸

0 0 0
0 2 0
0 0 4
¸
¸

−2 0 2
1 −1 1
1 1 1
¸
¸
−1
(d)

6 4 4
−7 −2 −1
7 4 3
¸
¸
=

0 1 1
−1 −2 −1
1 1 1
¸
¸

−1 0 0
0 2 0
0 0 6
¸
¸

0 1 1
−1 −2 −1
1 1 1
¸
¸
−1
(e)

0 2 2
2 0 −2
2 −2 0
¸
¸
=

1 1 −1
0 1 1
1 0 1
¸
¸

2 0 0
0 2 0
0 0 −4
¸
¸

1 1 −1
0 1 1
1 0 1
¸
¸
−1
(f)

−2 0 0 0
0 −2 5 −5
0 0 3 0
0 0 0 3
¸
¸
¸
=

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸

−2 0 0 0
0 −2 0 0
0 0 3 0
0 0 0 3
¸
¸
¸

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸
−1
3. (a) Maybe. (In fact it is.) (b) Yes, since symmetric. (c) Maybe. (In fact it is
not.)
SECTION 11
1. (a) exp ￿￿

1 2
0 2 ￿￿

= ￿

1 1
0 1 ￿
￿
e 0
0 e
2 ￿
￿
1 1
0 1 ￿

−1
(b) exp

¸

5 0 2
0 1 0
−4 0 −1
¸
¸

=

0 1 −1
1 0 0
0 −2 1
¸
¸

e 0 0
0 e 0
0 0 e
3
¸
¸

0 1 −1
1 0 0
0 −2 1
¸
¸
−1
(c) exp

¸

2 2 2
1 2 0
1 0 2
¸
¸

=

−2 0 2
1 −1 1
1 1 1
¸
¸

1 0 0
0 e
2
0
0 0 e
4
¸
¸

−2 0 2
1 −1 1
1 1 1
¸
¸
−1
(d) exp

¸

6 4 4
−7 −2 −1
7 4 3
¸
¸

=

0 1 1
−1 −2 −1
1 1 1
¸
¸

e
−1
0 0
0 e
2
0
0 0 e
6
¸
¸

0 1 1
−1 −2 −1
1 1 1
¸
¸
−1
(e) exp

¸

0 2 2
2 0 −2
2 −2 0
¸
¸

=

1 1 −1
0 1 1
1 0 1
¸
¸

e
2
0 0
0 e
2
0
0 0 e
−4
¸
¸

1 1 −1
0 1 1
1 0 1
¸
¸
−1
(f) exp

¸
¸

−2 0 0 0
0 −2 5 −5
0 0 3 0
0 0 0 3
¸
¸
¸

=

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸

e
−2
0 0 0
0 e
−2
0 0
0 0 e
3
0
0 0 0 e
3
¸
¸
¸

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸
−1
2. (a) exp ￿￿

1 2
0 2 ￿

t ￿

= ￿

1 1
0 1 ￿
￿
e
t
0
0 e
2t ￿
￿
1 1
0 1 ￿

−1
(b) exp

¸

5 0 2
0 1 0
−4 0 −1
¸
¸
t

=

0 1 −1
1 0 0
0 −2 1
¸
¸

e
t
0 0
0 e
t
0
0 0 e
3t
¸
¸

0 1 −1
1 0 0
0 −2 1
¸
¸
−1
(c) exp

¸

2 2 2
1 2 0
1 0 2
¸
¸
t

=

−2 0 2
1 −1 1
1 1 1
¸
¸

1 0 0
0 e
2t
0
0 0 e
4t
¸
¸

−2 0 2
1 −1 1
1 1 1
¸
¸
−1
(d) exp

¸

6 4 4
−7 −2 −1
7 4 3
¸
¸
t

=

0 1 1
−1 −2 −1
1 1 1
¸
¸

e
−t
0 0
0 e
2t
0
0 0 e
6t
¸
¸

0 1 1
−1 −2 −1
1 1 1
¸
¸
−1
(e) exp

¸

0 2 2
2 0 −2
2 −2 0
¸
¸
t

=

1 1 −1
0 1 1
1 0 1
¸
¸

e
2t
0 0
0 e
2t
0
0 0 e
−4t
¸
¸

1 1 −1
0 1 1
1 0 1
¸
¸
−1
(f) exp

¸
¸

−2 0 0 0
0 −2 5 −5
0 0 3 0
0 0 0 3
¸
¸
¸
t

=

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸

e
−2t
0 0 0
0 e
−2t
0 0
0 0 e
3t
0
0 0 0 e
3t
¸
¸
¸

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
¸
¸
¸
−1
SECTION 12
1. (a) c
1
e
t ￿

1
0 ￿

+c
2
e
2t ￿

1
1 ￿

(b) c
1
e
t

0
1
0
¸
¸
+c
2
e
t

1
0
−2
¸
¸
+c
3
e
3t

−1
0
1
¸
¸
(c) c
1

−2
1
1
¸
¸
+c
2
e
2t

0
−1
1
¸
¸
+c
3
e
4t

2
1
1
¸
¸
(d) c
1
e
−t

0
−1
1
¸
¸
+c
2
e
2t

1
−2
1
¸
¸
+c
3
e
6t

1
−1
1
¸
¸
(e) c
1
e
2t

1
0
1
¸
¸
+c
2
e
2t

1
1
0
¸
¸
+c
3
e
−4t

−1
1
1
¸
¸
(f) c
1
e
−2t

1
0
0
0
¸
¸
¸
+c
2
e
−2t

1
1
0
0
¸
¸
¸
+c
3
e
3t

0
1
1
0
¸
¸
¸
+c
4
e
3t

0
0
1
1
¸
¸
¸
2. (a) e
t ￿

1
0 ￿

+ 2e
2t ￿

1
1 ￿

(b) 2e
t

0
1
0
¸
¸
+ 2e
t

1
0
−2
¸
¸
+e
3t

−1
0
1
¸
¸
(c)

−2
1
1
¸
¸
+e
2t

0
−1
1
¸
¸
+e
4t

2
1
1
¸
¸
(d) e
−t

0
−1
1
¸
¸
−e
2t

1
−2
1
¸
¸
+e
6t

1
−1
1
¸
¸
(e) 3e
2t

1
0
1
¸
¸
+ 2e
2t

1
1
0
¸
¸
+ 1e
−4t

−1
1
1
¸
¸
(f) e
−2t

1
0
0
0
¸
¸
¸
+e
−2t

1
1
0
0
¸
¸
¸
+e
3t

0
1
1
0
¸
¸
¸
+ 2e
3t

0
0
1
1
¸
¸
¸
3. (a) neutrally stable (b) unstable (c) stable
SECTION 13
3. α ±iβ
4. (a) ￿

3 +i 3 −i
2 2 ￿
￿
3 +i2 0
0 3 −i2 ￿
￿
3 +i 3 −i
2 2 ￿

−1
= ￿

3 1
2 0 ￿
￿
3 2
−2 3 ￿
￿
3 1
2 0 ￿

−1
(b)

−i i 0
1 −i 1 +i 1
1 1 0
¸
¸

−1 +i3 0 0
0 −1 −i3 0
0 0 1
¸
¸

−i i 0
1 −i 1 +i 1
1 1 0
¸
¸
−1
=

0 −1 0
1 −1 1
1 0 0
¸
¸

−1 3 0
−3 −1 0
0 0 1
¸
¸

0 −1 0
1 −1 1
1 0 0
¸
¸
−1
5. (a) (c
1
e
3t
cos 2t +c
2
e
3t
sin 2t) ￿

3
2 ￿

+ (−c
1
e
3t
sin 2t +c
2
e
3t
cos 2t) ￿

1
0 ￿

(b) (c
1
e
−t
cos 3t +c
2
e
−t
sin 3t)

0
1
1
¸
¸
+
(−c
1
e
−t
sin 3t +c
2
e
−t
cos 3t)

−1
−1
0
¸
¸
+c
3
e
t

0
1
0
¸
¸
6. (a) c
1
= 2, c
2
= −3 (b) c
1
= 1, c
2
= 2, c
3
= 3
SECTION 14
1. (a) u
k
= 64(1)
k ￿

1
2 ￿

−64(.25)
k ￿

−1
1 ￿

= 64 ￿

1 + (.25)
k
2 −(.25)
k ￿

→64 ￿

1
2 ￿

= ￿

64
128 ￿

(b) u
k
= 1(−1)
k ￿

6
2 ￿

+ 2(.5)
k ￿

6
4 ￿

= ￿

6(−1)
k
+ 12(.5)
k
2(−1)
k
+ 8(.5)
k ￿

, bounded, no limit
(c) u
k
=
3
4
(3)
k ￿

2
1 ￿

+
5
4
(−1)
k ￿

−2
1 ￿

=
1
4 ￿

6(3)
k
−10(−1)
k
3(3)
k
+ 5(−1)
k ￿

, blows up
2.

.5 .5 .5
.25 .5 0
.25 0 .5
¸
¸
=

2 2 0
1 −1 −1
1 −1 1
¸
¸

1 0 0
0 0 0
0 0 .5
¸
¸

2 2 0
1 −1 −1
1 −1 1
¸
¸
−1
,
1(1)
k

2
1
1
¸
¸
+ (.5)
k

0
−1
1
¸
¸
=

2
1 −(.5)
k
1 + (.5)
k
¸
¸

2
1
1
¸
¸
3.

.5 0 .5
0 .5 .5
.5 .5 0
¸
¸
=

−1 1 −1
1 1 −1
0 1 2
¸
¸

.5 0 0
0 1 0
0 0 −.5
¸
¸

−1 1 −1
1 1 −1
0 1 2
¸
¸
−1
,
−30(.5)
k

−1
1
0
¸
¸
+ 50(1)
k

1
1
1
¸
¸
−10(−.5)
k

−1
−1
2
¸
¸

50
50
50
¸
¸
4.

1 .25 0
0 .5 .5
0 .25 .5
¸
¸
,

1
0
0
¸
¸
, everyone dies!
5. Everyone has blue eyes!
SECTION 15
3. (a) not closed under addition or scalar multiplication
(c) not closed under scalar multiplication
(e) not closed under scalar multiplication
4. All span the plane of x
1
−x
2
= 0 in R
3
.
6. (a) c ￿

1
3 ￿

(b) c

−1
0
1
¸
¸
+d

−1
1
0
¸
¸
(c) c

−1
0
1
¸
¸
(d) c

4
0
0
1
¸
¸
¸
+d

−3
0
1
0
¸
¸
¸
+e

2
1
0
0
¸
¸
¸
(e) c

2
1
4
0
1
¸
¸
¸
¸
¸
+d

−1
0
−1
1
0
¸
¸
¸
¸
¸
7. (a) ￿

1
0 ￿

+c ￿

1
3
1 ￿

(b)

1
0
0
¸
¸
+c

−1
0
1
¸
¸
+d

−1
1
0
¸
¸
SECTION 16
1. (a) independent
(b) independent
(c) dependent
(d) independent
(e) dependent
2. (a) ￿

1
0 ￿

, ￿

0
1 ￿

(b)

1
0
0
¸
¸
,

0
1
0
¸
¸
,

0
0
1
¸
¸
(c)

1
0
0
¸
¸
,

0
1
2
¸
¸
(d)

1
0
1
0
¸
¸
¸
,

0
1
0
0
¸
¸
¸
,

0
0
0
1
¸
¸
¸
(e)

3
0
3
1
¸
¸
¸
,

0
3
0
1
¸
¸
¸
3. Same answers as for Section 15 Exercise 6.
5. (a) 3

3
1
2
¸
¸
−2

2
2
1
¸
¸
(b) no solution
(c) (6 +c)

3
1
2
¸
¸
+ (−4 −c)

2
2
1
¸
¸
+c

−1
1
−1
¸
¸
many solutions
(d) ) ￿

2
1 ￿

+ 6 ￿

1
2 ￿

6. (a) U and V might be, W is not.
(b) U does not, V and W might.
(c) U and W are not, V might be.
SECTION 17
1. (a) ￿x￿ = 5, ￿y￿ = 5

5
(b)

1
5
2
5

2
5

4
5
¸
¸
¸
¸
¸
,

6
5

5

2
5

5
2
5

5
9
5

5
¸
¸
¸
¸
¸
¸
¸
(c) 153.43

(d)

−2
−4
4
8
¸
¸
¸
(e)

−2
−4
4
8
¸
¸
¸
+

−4
2
−2
1
¸
¸
¸
2. (5, 15/2)
3. c ￿

−β
α ￿

5. (a) c

−1
1
0
¸
¸
+d

−1
0
1
¸
¸
(b) c

2
−3
1
¸
¸
(c) c

1
−3
0
1
¸
¸
¸
+d

0
−1
1
0
¸
¸
¸
(d) c

−1
1
2
0
¸
¸
¸
6. (a) −x
1
+x
2
= 0, −x
1
+x
3
= 0
(b) 2x
1
−3x
2
+x
3
= 0
(c) x
1
−3x
2
+x
4
= 0, −x
2
+x
3
= 0
(d) −x
1
+x
2
+ 2x
3
= 0
7. (a) False. (b) False.
SECTION 18
2. (a) Reﬂection of R
2
in 135

line.
(b) Projection of R
2
onto y-axis.
(c) Projection of R
2
onto 135

line.
(d) Rotation of R
2
by 45

.
(e) Rotation of R
2
by −60

.
(f) Reﬂection of R
2
in 150

line.
(g) Rotation of R
2
by arctan ￿

β
α ￿

.
(h) Rotation of R
3
around z-axis by 90

.
(i) Rotation of R
3
around y-axis by −90

.
(j) Projection of R
3
onto xy-plane.
(k) Rotation of R
3
around z-axis by 90

and reﬂection in xy-plane.
(l) Rotation of R
3
around z-axis by arctan ￿

β
α ￿

and reﬂection in xy-plane.
3. (a)

−1 0 0
0 −1 0
0 0 −1
¸
¸
(b)

1 0 0
0 0 0
0 0 1
¸
¸
(c)

0 1 0
1 0 0
0 0 1
¸
¸
(d)

1 0 0
0
1

2

1

2
0
1

2
1

2
¸
¸
4. (a) x
2
+y
2
= 4, a circle of radius 2.
(b) ￿

x
2 ￿

2
+ ￿

y
3 ￿

2
= 1, an ellipse.
7. (a) ￿

1 0
0 −1 ￿

, reﬂects in the x-axis.
(b) ￿

−1 0
0 1 ￿

, reﬂects in the y-axis.
(c)

0 0 1
0 −1 0
1 0 0
¸
¸
, rotates by 180

around the line deﬁned by the vector

1
0
1
¸
¸
.
13.

0 1 0 0
0 0 2 1
1 0 1 0
0 0 2 0
¸
¸
¸
SECTION 19
1. (a) ￿

1
2 ￿

; ￿

1
2 ￿

; ￿

−2
1 ￿

; R
2
→R
2
; rank = 1
(b) ￿

1
0 ￿

, ￿

0
1 ￿

; ￿

1
2 ￿

, ￿

2
3 ￿

; ￿

0
0 ￿

; R
2
→R
2
; rank = 2
(c)

1
0
0
¸
¸
,

0
2
1
¸
¸
;

2
0
2
¸
¸
,

4
4
8
¸
¸
;

0
1
−2
¸
¸
; R
3
→R
3
; rank = 2
(d)

1
0
0
¸
¸
,

0
1
0
¸
¸
,

0
0
1
¸
¸
;

3
6
−3
0
¸
¸
¸
,

2
3
−1
−1
¸
¸
¸
,

−1
5
8
7
¸
¸
¸
;

0
0
0
¸
¸
; R
3
→R
4
; rank = 3
(e)

1
2
0
1
4
¸
¸
¸
¸
¸
,

0
0
1
5
3
¸
¸
¸
¸
¸
;

1
2
3
¸
¸
,

−1
−1
−3
¸
¸
;

−4
0
−3
0
1
¸
¸
¸
¸
¸
,

−1
0
−5
1
0
¸
¸
¸
¸
¸
,

−2
1
0
0
0
¸
¸
¸
¸
¸
; R
5
→R
3
; rank = 2
(f)

1
0
−6
0
−16
¸
¸
¸
¸
¸
,

0
1
2
0
4
¸
¸
¸
¸
¸
,

0
0
0
1
2
¸
¸
¸
¸
¸
;

2
2
−2
0
¸
¸
¸
,

8
7
−6
2
¸
¸
¸
,

0
1
−1
−2
¸
¸
¸
;

16
−4
0
−2
1
¸
¸
¸
¸
¸
,

6
−2
1
0
0
¸
¸
¸
¸
¸
; R
5
→ R
4
;
rank = 3
2. (a) Yes,

0
0
0
¸
¸
. (b) Yes, yes. (c) No. (d) Yes.
3. (a) No. (b) No. (c)

−9
0
1
¸
¸
,

−2
1
0
¸
¸
. (d) No. (e) Yes.
5. (a) Since row(A) ⊥ null(A).
(b) Since dim(row(A)) + dim(null(A)) = 3 and dim(col(A)) = dim(row(A)).
(c) Since dim(col(A)) = dim(row(A)).
6. None or inﬁnitely many.
SECTION 20
1. (a) ￿

4
1 ￿

(b)

5
3

1
2
1
4
¸
¸
2. (a) y =
4
3
x
(b) y = −.2 + 1.1x
(c) z = 2 + 2x + 3y
(d) z = 10 + 8x
2
−y
2
(e) y = −
3
2

3
2
t +
7
2
t
2
(f) y = 2 + 3 cos t + sin t
3.

5 0 3
0 15 21
3 21 32
¸
¸

Cu
Fe
S
¸
¸
=

413.91
1511.13
2389.58
¸
¸
The solution is

Cu
Fe
S
¸
¸
=

63.543
55.851
32.065
¸
¸
!
4. (a) ￿

.1 .3
.3 .9 ￿

(b)

4
9
2
9
4
9
2
9
1
9
2
9
4
9
2
9
4
9
¸
¸
¸
(c)

1 0 0
0
1
2
1
2
0
1
2
1
2
¸
¸
¸
(d)

1
2
0
1
2
0 1 0
1
2
0
1
2
¸
¸
¸
(e)

1
3
1
3
0
1
3
1
3
1
3
0
1
3
0 0 1 0
1
3
1
3
0
1
3
¸
¸
¸
¸
¸
5.

1
5
2
5
2
¸
¸
¸
6.

1 0 0
0 0 1
0 1 0
¸
¸
7. (a) ￿

.2 .4
.4 .8 ￿

(b)

5
6
1
6
1
3
1
6
5
6

1
3
1
3

1
3
1
3
¸
¸
¸
SECTION 21
1. (a) ￿

5
13
12
13 ￿

, ￿

12
13
5
13 ￿

(b)

3
7
6
7
2
7
¸
¸
¸
,

2
7

3
7
6
7
¸
¸
¸
,

6
7
2
7
3
7
¸
¸
¸
(c)

1
2

1
2

1
2

1
2
¸
¸
¸
¸
¸
,

1
2

1
2
1
2
1
2
¸
¸
¸
¸
¸
,

1
2

1
2
1
2

1
2
¸
¸
¸
¸
¸
,

1
2
1
2
1
2

1
2
¸
¸
¸
¸
¸
(d)

2
3
11
15
2
15
¸
¸
¸
,

1
3
2
15
14
15
¸
¸
¸
(e)

1
2

1
2

1
2

1
2
¸
¸
¸
¸
¸
,

1
2

1
2
1
2
1
2
¸
¸
¸
¸
¸
,

1
2
1
2

1
2
1
2
¸
¸
¸
¸
¸
2. (a) ￿

5
13

12
13
12
13
5
13 ￿ ￿

13 −26
0 13 ￿

(b)

3
7

2
7
6
7
6
7

3
7
2
7
2
7
6
7
3
7
¸
¸
¸

7 −7 7
0 7 7
0 0 7
¸
¸
(c)

1
2
1
2

1
2
1
2

1
2

1
2

1
2
1
2

1
2
1
2
1
2
1
2

1
2
1
2

1
2

1
2
¸
¸
¸
¸
¸

2 0 2 1
0 2 0 0
0 0 2 0
0 0 0 1
¸
¸
¸
(d)

2
3
1
3
11
15
2
15
2
15
14
15
¸
¸
¸ ￿

15 −15
0 30 ￿

(e)

1
2
1
2
1
2

1
2

1
2
1
2

1
2
1
2

1
2

1
2
1
2
1
2
¸
¸
¸
¸
¸

2 3 1
0 1 2
0 0 3
¸
¸
3.

7
1
7
¸
¸
4.

1
9
4
9

8
9
¸
¸
¸
or

1
9

4
9
8
9
¸
¸
¸
5. ￿

4
1 ￿

6.

1
4

1
4

1
4
1
4

1
4
1
4
1
4

1
4

1
4
1
4
3
4
1
4
1
4

1
4
1
4
3
4
¸
¸
¸
¸
¸
SECTION 22
3. (a) ￿

4 −1
1 1 ￿
￿
2 0
0 −3 ￿
￿
4 −1
1 1 ￿

−1
(b) ￿

−2 1
1 2 ￿
￿
3 0
0 −2 ￿
￿
−2 1
1 2 ￿

−1
(c)

1 1 0
0 1 0
0 0 1
¸
¸

2 0 0
0 3 0
0 0 3
¸
¸

1 1 0
0 1 0
0 0 1
¸
¸
−1
4. (a) ￿

2

5
1

5
1

5
2

5 ￿ ￿

3 0
0 −2 ￿ ￿

2

5
1

5
1

5
2

5 ￿
T
(b)

2

5
1

5
0
1

5
2

5
0
0 0 1
¸
¸
¸

4 0 0
0 −1 0
0 0 1
¸
¸

2

5
1

5
0
1

5
2

5
0
0 0 1
¸
¸
¸
T
(c)

0
2

5
1

5
−1 0 0
0 −
1

5
2

5
¸
¸
¸

5 0 0
0 5 0
0 0 0
¸
¸

0
2

5
1

5
−1 0 0
0 −
1

5
2

5
¸
¸
¸
T
(d)

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸
¸

2 0 0
0 2 0
0 0 −4
¸
¸

1

2
1

6

1

3
0
2

6
1

3
1

2

1

6
1

3
¸
¸
¸
¸
T
5. (a) ￿

3
2

5
1
2

5
1
2

5
−3
2

5 ￿ ￿

1 0
0 0 ￿ ￿

3
2

5
1
2

5
1
2

5
−3
2

5 ￿
T
= ￿

.9 .3
.3 .1 ￿

(b) ￿

3
2

5
1
2

5
1
2

5
−3
2

5 ￿ ￿

1 0
0 −1 ￿ ￿

3
2

5
1
2

5
1
2

5
−3
2

5 ￿
T
= ￿

.8 .6
.6 −.8 ￿

6. (a)

1

2
1

6
1

3

1

2
1

6
1

3
0 −
2

6
1

3
¸
¸
¸
¸

1 0 0
0 1 0
0 0 −1
¸
¸

1

2
1

6
1

3

1

2
1

6
1

3
0 −
2

6
1

3
¸
¸
¸
¸
T
(b)

1

6
1

2
1

3

1

6

1

2
1

3
2

6
0
1

3
¸
¸
¸
¸

1
2

3
2
0

3
2
1
2
0
0 0 1
¸
¸
¸
¸

1

6
1

2
1

3

1

6

1

2
1

3
2

6
0
1

3
¸
¸
¸
¸
T
(c)

0 −1 0 0
0 0 0 1
0 0 1 0
1 0 0 0
¸
¸
¸

0 1 0 0
−1 0 0 0
0 0 0 1
0 0 −1 0
¸
¸
¸

0 −1 0 0
0 0 0 1
0 0 1 0
1 0 0 0
¸
¸
¸
T
7.

1

2
1

2
0
0 0 1
1

2
1

2
0
¸
¸
¸

1 0 0
0 0 −1
0 1 0
¸
¸

1

2
1

2
0
0 0 1
1

2
1

2
0
¸
¸
¸
T
=

1
2

1

2

1
2
1

2
0
1

2

1
2

1

2
1
2
¸
¸
¸
¸
15. Eigenvalues for A are ±1, ±i; for B are 0, 0, 0, 1.

PREFACE These notes are intended to serve as an introduction to linear equations and matrices, or as the subject is usually called, linear algebra. Linear algebra has two distinct personalities. On the one hand it serves as a computational device, a problem solving tool indispensable in all quantitative disciplines. This is its algebraic side. On the other hand its concepts can be seen, its relationships visualized. This is its geometric side. The structure of these notes follows this basic dichotomy. In Part 1 we focus on how matrices provide convenient tools for systematizing laborious calculations by providing a compact notation for storing information and for describing relationships. In Part 2 we present those concepts of linear algebra that are best understood geometrically, and we show how matrices describe transformations of physical space. The style of these notes is informal, meaning that the main consideration is pedagogical clarity, not mathematical generality. The treatment of proofs varies. Some proofs are given in full, some proofs are given only partially, some proofs are given by example, and some proofs are omitted entirely. Whenever possible, ideas are illustrated by computational examples and are given geometric interpretations. Some of the important and essential applications of linear algebra are also presented, including cubic splines, ODE’s, Markov matrices, and least squares. These notes should be thought of as course notes or lecture notes, not as a course text. The distinction is that we take a fairly direct path through the material with certain speciﬁc goals in mind. These goals include the solution of linear systems, the structure of linear transformations, least squares approximations, orthogonal transformations, and the three basic matrix factorizations: LU, diagonal, and QR. There is very little deviation from this path or extraneous material presented. The text is lean, and Sections 6, 11, 12, 13, and 14 can be skipped without loss of continuity. Almost all the exercises are important and help to develop subsequent material. Linear algebra is a beautiful and elegant subject, but its practical side is equally compelling. Stated in starkest terms, linear problems are solvable while nonlinear problems are not.

PART 1: Algebra 1. Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2. Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4. Row Exchanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5. Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 6. Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 7. Systems with Many Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 8. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 9. Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 10. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 11. Matrix Exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 12. Diﬀerential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 13. The Complex Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 14. Diﬀerence Equations and Markov Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 PART 2: Geometry 15. Vector Spaces, Subspaces and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 16. Linear Independence, Basis, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 17. Dot Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 18. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 19. Row Space, Column Space, and Null Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 20. Least Squares and Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 21. Orthogonal Matrices, Gram-Schmidt, and QR Factorization . . . . . . . . . . . . . . . 126 22. Diagonalization of Symmetric and Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . 136 23. Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 24. Positive Deﬁnite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Answers to Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

v. To do this we use Gaussian elimination. The problem is to ﬁnd the unknown values of u. This completes the second elimination step. and w. We begin with a simple system of three equations and three unknowns: 2u + v − w = 5 4u − 2v = 0 6u − 7v + w = −9 .5 equals the coeﬃcient −10 divided by the pivot −4. in the second step. The resulting system is equivalent to the original one. Gaussian Elimination 1 PART 1: ALGEBRA 1. The coeﬃcient −1 of w in the third equation is the pivot of the third elimination step. The coeﬃcient −4 of v in the second equation is the pivot for this step. How did we determine the multipliers 2 and 3 in the ﬁrst step and 2. divided by the pivot for that step. and its simple triangular form suggests an obvious method of solution: The third equation gives w = −1.5 times the second equation from the third equation to get 2u + v − w = 5 − 4v + 2w = −10 − w= 1. This completes the ﬁrst elimination step.5 in the second? Each is just the leading coeﬃcient of the row being subtracted from. To accomplish this. The elimination process is now complete. For example.1. This simple process is called back substitution. which did not have to be performed. The ﬁrst step of Gaussian elimination is to use the coeﬃcient 2 of u in the ﬁrst equation to eliminate the u from the second and third equations. 2. substituting this into the second equation −4v+2(−1) = −10 gives v = 2. The coeﬃcient 2 of u in the ﬁrst equation is called the pivot for this step. and substituting both into the ﬁrst equation 2u + (2) − (−1) = 5 gives u = 1. GAUSSIAN ELIMINATION The central problem of linear algebra is to ﬁnd the solutions of systems of linear equations. subtract 2 times the ﬁrst equation from the second equation and 3 times the ﬁrst equation from the third equation. The result is 2u + v − w = 5 − 4v + 2w = −10 − 10v + 4w = −24 . Next use the coeﬃcient −4 of v in the second equation to eliminate the v from the third equation. Just subtract 2. which are themselves called unknowns or variables. .

2

1. Gaussian Elimination

We said that the triangular system obtained above is equivalent to the original system, but what does this mean? It means simply that the two systems have the same solution. This is clear since any solution of the original system must also be a solution of each system obtained after each step of Gaussian elimination. This is because Gaussian elimination amounts to nothing more than the subtraction of equals from equals. Therefore any solution of the original system must also be a solution of the ﬁnal triangular system. And by reversing this argument we see that any solution of the ﬁnal triangular system must also be a solution of the original system. Both systems must therefore have the same solutions. We can simplify Gaussian elimination by noticing that there is no need to carry the symbols for the unknowns u, v, w along in each step. We can instead represent the system as an array: ￿   2 1 −1 ￿ 5 ￿  4 −2 0 ￿ 0  . ￿ 6 −7 1 ￿ −9 The numbers multiplying the unknowns in the equations are called coeﬃcients and are determined by their position in the array. They are separated from the right-hand sides of the equations by a vertical line. The ﬁrst elimination step gives ￿   2 1 −1 ￿ 5 ￿  0 −4 2 ￿ −10  , ￿ 0 −10 4 ￿ −24 and the second gives ￿  2 1 −1 ￿ 5 ￿  0 −4 2 ￿ −10  . ￿ 0 0 −1 ￿ 1 

Note that the coeﬃcient part of the array is now in triangular form with the pivots on the diagonal. Back substitution gives the solution. Can this process ever fail? It is clear that as long as the pivots are not zero at each step, Gaussian elimination and back substitution will produce a solution. But if a pivot is ever zero, Gaussian elimination will have to stop. This can happen suddenly and unpredictably. In the example above, the second pivot was −4, but we did not know this until we completed the ﬁrst elimination step. It could have turned out to be zero thereby stopping the process. In fact this would have happened if the coeﬃcient of v in the ﬁrst equation were −1 instead of 1. In general, we don’t know what the pivot for a particular elimination step is going to be until we complete the previous step, so we don’t know ahead of time if the process is going to succeed. In most cases the problem of a zero pivot can be ﬁxed by exchanging two equations. In some cases the zero pivot represents a true breakdown, meaning that there is either no solution or inﬁnitely many solutions. We will consider these possibilities later. For now we assume our systems have only nonzero pivots and thus have unique solutions.

1. Gaussian Elimination

3

A comment on terminology: The oﬃcial mathematical deﬁnition of a pivot requires it to be nonzero. Therefore to say “nonzero pivot” is redundant, and to say “zero pivot” is contradictory. For the latter we really should say “a zero in the pivot position” or “a zero in the diagonal position.” But since it is simpler and clearer just to say “nonzero pivot” or “zero pivot”, we will continue to do so. We will however discuss this point further in Section 7 where the exact deﬁnition of a pivot will be given. We have seen how Gaussian elimination puts the coeﬃcient part of the array into triangular form so that back substitution will give the solution. But, instead of back substitution, we can also use Gaussian elimination from the bottom up to get the solution. For the example above, this is done as follows: Use Gaussian elimination to get the array into triangular form as before: ￿   2 1 −1 ￿ 5 ￿  0 −4 2 ￿ −10  . ￿ 0 0 −1 ￿ 1 Next subtract −2 times the third row from the second row and 1 times the third row from the ﬁrst row to obtain ￿   2 1 0 ￿ 4 ￿  0 −4 0 ￿ −8  , ￿ 0 0 −1 ￿ 1 and then subtract −.25 times the second  2 0  0 −4 0 0 row from the ﬁrst to obtain ￿  0 ￿ 2 ￿ 0 ￿ −8  . ￿ −1 ￿ 1

Clearly the purpose of these steps is to introduce zeros above the diagonal entries. The coeﬃcient part of the array is now in diagonal form, and the solution u = 1, v = 2, w = −1 is obvious. This method of using Gaussian elimination forwards and backwards is called Gauss-Jordan elimination. It can be used for solving small problems by hand, but it is ineﬃcient for large problems. We will see later (Section 3) that ordinary Gaussian elimination with back substitution requires fewer operations and is therefore preferable. EXERCISES 1. Solve the following systems using Gaussian elimination in array form. (a) u − 6v = −8 3u − 2v = −8

4 (b) (c)

1. Gaussian Elimination

5u − v = −1 −3u + 2v = −5 2u + v + 3w = −4 −2u + 5v + w = 18 4u + 2v + 4w = −6 4u − 2v + 4w = −24 2u + 3v − w = 17 −8u + 2v + 5w = −1 3u + 5v − 2v − 3w 6w + 2x − w − 2x = 3 = −6 = 14 = −4

(d)

(e)

2. Solve the system below. When a zero pivot arises, exchange the equation with the one below it and continue. u + v + w = −2 3u + 3v − w = 6 u − v + w = −1 3. Try to solve the system below. Why won’t the trick in the previous problem work here? u + v + w = −2 3u + 3v − w = 6 u + v + w = −1 4. A farmer has two breeds of chickens, Rhode Island Red and Leghorn. In one year, one Rhode Island Red hen will yield 10 dozen eggs and 4 pounds of meat, and one Leghorn hen will yield 12 dozen eggs and 3 pounds of meat. The farmer has a market for 2700 dozen eggs and 900 pounds of meat. How many hens of each breed should he have to meet the demand of the market exactly? 5. Suppose a man wants to consume exactly his minimum daily requirements of 70.5 grams of protein and 300 grams of carbohydrates on a diet of bread and peanut butter. How many grams of each should he eat if bread is 10% protein and 50% carbohydrates and peanut butter is 25% protein and 20% carbohydrates? 6. A nutritionist determines her minimum daily needs for energy (1,800 kcal), protein (92 g), and calcium (470 mg). She chooses three foods, pasta, chicken, and broccoli, and she collects the following data on the nutritive value per serving of each.

8. (2.5). chicken. Find the cubic polynomial function f (x) = ax3 + bx2 + cx + d such that f (0) = 2. (0.1). f ￿ (1) = 0. whose graph pass through) the points (−1.1. and calcium exactly. Gaussian Elimination 5 energy (kcal) protein (g) calcium (mg) pasta 150 5 10 chicken 200 30 10 broccoli 25 3 90 She then asks how many servings per day of pasta. protein. (This is called cubic Hermite interpolation). and broccoli must she consume in order to satisfy her minimum daily needs for energy. 7.5). f ￿ (0) = 1. Find the cubic polynomial y = ax3 + bx2 + cx + d that interpolates (that is. (1.−1). . f (1) = 1. Sketch its graph.

0. and an n × 1 matrix. which is a row vector. namely. Two basic operations on matrices are multiplication by scalars   6 3 −3 15 3C =  12 −6 0 0  18 −21 3 −27 and addition      2 1 −3 6 −1  −3 2   4 −2   1  + = 0 4 4 −1 4 −1 0 3 0 2  7 0 . 3 0 Two matrices can be added together as long as they have the same dimensions. It has three rows and four columns. 4 6 2 Two vectors can be added together as long as they are the same size. −9) in three-dimensional space. Multiplication for these matrices is done as follows:   3  1  = [ 4 · 3 + 1 · 1 + 3 · 0 ] = [ 13 ] . We deﬁne a matrix to be an array of column vectors of the same size. which is a column vector. the product of a 1 × n matrix.6 2. Matrix Notation 2. For example   5 b= 0  −9 is the column vector that represents the point (5. points in space can be represented by vectors. [4 1 3] 0 . For example   2 1 −1 5 C =  4 −2 0 0  6 −7 1 −9 is a 3 × 4 matrix (read “three by four matrix”). How do we multiply matrices? We ﬁrst answer the question for two special matrices. MATRIX NOTATION As is common in multidimensional calculus. The basic operations on vectors are multiplication by scalars (real numbers for the time being)   15 3b =  0  −27 and addition       5 −4 1  −2  +  −3  =  −5  .

For example  4 2  1 2 1 6 0 2  3  3 8  1 9 0 1   13 5  12 0 =  3 1 8  23 18  . 14 11 . take the product of each row of the matrix with the column vector and stack the results to form a new column vector:  4 2  1 2 1 6 0 2      3   4·3+1·1+3·0 13 3 8     2 · 3 + 6 · 1 + 8 · 0   12   1 =  =  . to multiply two matrices. The problem is to ﬁnd x. To extend the deﬁnition to the product of a matrix and a column vector. just multiply the left matrix times each column of the right matrix and line up the resulting two vectors in a new matrix. called the coeﬃcient matrix of the system. satisﬁes the equation      2 1 −1 1 5  4 −2 0   2  =  0  . 1 w −9 This is an equation of the form Ax = b where the known matrix A. multiplies the unknown vector x and equals the known vector b. As an application. 9 1·3+0·1+1·0 3 0 1 2·3+2·1+1·0 8 Note that the number of columns of the matrix must equal the number of components of the vector being multiplied. we note that the system of equations considered in the previous section 2u + v − w = 5 4u − 2v = 0 6u − 7v + w = −9 can now be represented as a matrix known one:  2 1  4 −2 6 −7 multiplying an unknown vector so as to equal a     −1 u 5 0  v  =  0 . The solution vector    u 1 x =  v  =  2 .2. Matrix Notation 7 This is just the familiar dot product of two vectors. 6 −7 1 −1 −9 Finally. w −1  obtained by Gaussian elimination.

. Matrix multiplication satisﬁes the associative law (AB)C = A(BC) and the two distributive laws A(B + C) = AB + AC and (B + C)D = BD + CD.   . . which we call I. . . 0 0 1 It is easy to see that for any 3 × 3 matrix A we have IA = AI = A and that this property carries over to the n × n case. . a1n ￿ a1 n+1 ￿  a21 a22 a23 .  . in the 3 × 3 case   1 0 0 I = 0 1 0.  . The notation for a general matrix A with m rows and n columns is   a11 a12 a13 . we can write a very simple program that uses Gaussian elimination and back substitution to solve an arbitrary linear system of n equations and n unknowns. if A is m × n and B is n × p. Matrix Notation Note again that the number of columns of the left factor must equal the number of rows of the right factor for this to make sense. . n ￿ then C is the m × p matrix with ijth coeﬃcient cij = aik bkj . a3n   . 1 2 1 1 1 1 1 2 In fact for many pairs of matrices AB is deﬁned whereas BA is not. . That is. For example ￿ ￿￿ ￿ ￿ ￿￿ ￿ 2 3 0 1 0 1 2 3 ￿= .   . with ones down its diagonal (also called its main diagonal) and zeros everywhere else. . amn where aij denotes the entry in the ith row and the jth column. . . . .) For every n there is a special n × n matrix. . Let A be m × n and B be n × p. For this reason I is called the identity matrix. ￿ an1 an2 an3 . . . . . . . am1 am2 am3 . We will try to k=1 avoid expressions like this. .  . In fact. . . in general AB ￿= BA. . . . then AB is m × p. a1n  a21 a22 a23 . . however. ￿ . . ￿ . satisfy the commutative law.) It does not. (The proofs of these properties are tedious and will be omitted. First express the system in array form: ￿   a11 a12 a13 . . . . . (See Exercise 2. . . . but it is important to understand them when writing computer programs to perform matrix computations. For example. a2n ￿ a2 n+1   ￿   a31 a32 a33 . . Using this notation we can deﬁne matrix multiplication as follows. ann an n+1 . . ￿ . a2n    A =  a31 a32 a33 . a3n ￿ a3 n+1  . In this example we multiplied a 4 × 3 matrix by a 3 × 2 matrix and obtained a 4 × 2 matrix. . . .8 2. In general. .

(Note that the program stops when a zero pivot is encountered. 2. Finally. 6. 4. we summarize the algebraic laws satisﬁed by matrix addition and multiplication. Compute the following: ￿ ￿ 5 7 −1 (a) 2 4 −2 0 (b)   6 7 1 (c)   4 0 −1 3 0 1 0  4  2 −2 1 −5    2 1 −2 1 + 3 6  2 5 −7 . A + B = B + A (commutative law for addition) A + (B + C) = (A + B) + C (associative law for addition) r(sA) = (rs)A r(A + B) = rA + rB (−1)A = −A A(BC) = (AB)C (associative law for multiplication) A(B + C) = AB + AC (left-distributive law for multiplication) (B + C)A = BA + CA (right-distributive law for multiplication) r(AB) = (rA)B = A(rB) EXERCISES 1. (The following equalities assume that all indicated operations make sense. 5. 3. 7. 8. 9.) And back substitution would look like for k = n down to 1 do t = ak n+1 for j = k + 1 to n do t = t − akj xj xk = t/akk . Matrix Notation 9 Then Gaussian elimination would look like for k = 1 to n − 1 do if akk = 0 then signal failure and stop for i = k + 1 to n do m = aik /akk aik = 0 for j = k + 1 to n + 1 do aij = aij − makj .) 1.2.

Matrix Notation  4 0 −1 [ 3 4 −5 ]  0 1 0  2 −2 1 [1   4 2 3]5 6 2 3]  1 1 3  (e) (f)   4 5[1 6      (g) 2 −1 5 0 0 −1 (h)  4 0 −1 2 −1 0 1 0 5 0 2 −2 1 0 −1  4 0 −1 1 0 1 0 0 2 −2 1 0 2 0 0 0 0 0 0 1 0 1 0 0 5 0 0 3 3 2 1 0 0 1 0  3 0  2 7 0 −2 (i)  0 0 1  3 7 0 (j) (k) 2. Which of the expressions 2A. . (a) diagonal matrix: aij = 0 for all i ￿= j. (b) symmetric matrix: aij = aji for all i and j. AB. and BA makes sense for the two matrices below? Which do not? ￿ ￿ ￿ ￿ 5 7 −1 2 3 A= B= 4 −2 0 1 2 3. A+B. Give 3 × 3 matrices that are examples of the following.10 (d) 2. (c) upper triangular matrix: aij = 0 for all i > j.

and use it to prove the formula (ABC)T = C T B T AT . (Use (c) and (d) above. then what kind of matrix is it? (See Exercise 3 above. (This requires a proof. Find examples of 2 × 2 matrices such that (a) A2 = −I (b) B 2 = 0 where no entry of B is zero. Verify [ c1  4 0 −1 c3 ]  0 1 0  = c1 [ 4 2 −2 1   c2 0 −1 ] + c2 [ 0 1 0 ] + c3 [ 2 −2 1]. then B T AB is symmetric.) Illustrate this with a 2 × 2 example.   2 −1 3 (a) What is the transpose of  5 0 7 ? 0 −1 0 (b) Illustrate the formula (A + B)T = AT + B T with a 2 × 2 example.2. 5. 9. (g) Show with a 2 × 2 example that the product of two symmetric matrices may not be symmetric. (c) The formula (AB)T = B T AT holds as long as the product AB makes sense. For any matrix A. Show with a 3 × 3 example that the product of two upper triangular matrices is upper triangular. (c) AB = AC but B ￿= C. The matrix (A + B)2 is always equal to which of the following.) (f) Show if A and B are square matrices and A is symmetric. (Zero matrices not allowed!) 6. which we omit. Verify  0 1 0   c2  = c1  0  + c2  1  + c3  0  . Matrix Notation 11 4. (d) If a matrix satisﬁes AT = A. (a) A(A + B) + B(A + B) (b) (A + B)A + (A + B)B (c) A2 + AB + BA + B 2 (d) (B + A)2 (e) A(A + B) + (A + B)B .         4 0 −1 c1 4 0 −1 7. 2 −2 1 c3 2 −2 1 8. we deﬁne its transpose AT to be the matrix whose columns are the corresponding rows of A.) (e) Show that for any matrix C (not necessarily square) the matrix C T C is symmetric.

which are symmetric matrices? (a) AT A (b) AT AAT (c) AT + A . b2 . .12 2. . . Ab2 . . . . . .  · · · Abn  . 11. .   . . . . . .  A  b1 . . . . Convince yourself that the product AB of two matrices can be thought of as A multiplying the columns of B to produce the columns of AB or  . . . . . .   =  Ab1 · · · bn  . . . . . . . Matrix Notation (f) A2 + 2AB + B 2 10. Assuming the operations make sense. .  .

The following equations describe exactly how the Gaussian steps turn the rows of A into the rows of U .5   0 2 1 −1 0   0 −4 2  .” it never changes from then on. 0 0 −1 ↓ ↓ We call the resulting upper triangular matrix U . we obtain   2 1 −1 A =  4 −2 0  6 −7 1    2 1 −1  0 −4 2  0 −10 4  2 1 −1 U =  0 −4 2  . It therefore can be considered as a row of U . this is just an expression of the matrix equation      row 1 of A 1 0 0 row 1 of U  row 2 of A  =  2 1 0   row 2 of U  row 3 of A 3 2.3.5(row 2 of U ) + 1(row 3 of U ) . THE LU FACTORIZATION If we run Gaussian elimination on the coeﬃcient matrix of Section 1. The LU Factorization 13 3. We can solve these equations for the rows of A to obtain row 1 of A = 1(row 1 of U ) row 2 of A = 2(row 1 of U ) + 1(row 2 of U ) row 3 of A = 3(row 1 of U ) + 2. 1 0 0 −1 . row 1 of U = row 1 of A row 2 of U = row 2 of A − 2(row 1 of U ) row 3 of U = row 3 of A − 3(row 1 of U ) − 2. Using the property of matrix multiplication illustrated in Section 2 Exercise 8.5 1 row 3 of U or    2 1 −1 1  4 −2 0  =  2 6 −7 1 3 0 1 2.5(row 2 of U ) Note that once a row is used as “pivotal row.

Then we note that this system can be solved by solving the two systems Ly = b and U x = y in order. with the multipliers 2 and 3 from the ﬁrst Gaussian step in its ﬁrst column. and with the multiplier 2. Letting   r  s . Any square matrix can be factored by Gaussian elimination into a product of a lower triangular L with ones down its diagonal and an upper triangular U . under the proviso that all pivots are nonzero. the ﬁrst system Ly = b is y= t  1 2 3 0 1 2. The LU Factorization We write this equation as A = LU and call the product on the right the LU factorization of A. In fact.5     0 r 5 0s =  0 . How can the LU factorization of A be used to solve the original system Ax = b? First we replace A by LU in the system to get LU x = b. it is . 1 t −9 which can be solved by forward substitution to get  u And letting x =  v  the second system U x = y is w       r 5  s  =  −10  . The pattern is the same for every matrix.14 3. w −1 Therefore. t 1 which can be solved by back substitution to get      2 1 −1 u 5  0 −4 2   v  =  −10  . 0 0 −1 w 1    u 1  v  =  2 . then there is no advantage of this method over the array form of Gaussian elimination presented in Section 1. in the matrix form of Gaussian elimination.5 from the second Gaussian step in its second column. Note that L is the lower triangular matrix with ones down its diagonal. we use elimination to factor A into LU and then solve Ly = b by forward substitution and U x = y by back substitution. If you have just one system Ax1 = b1 .

we conclude the number of operations required to compute the LU factorization of an n × n matrix is approximately n3 /3.666 operations for its LU factorization but only 1. the total number of operations for the ﬁrst step is (n − 1)n = n2 − n. we can compare the relative expense in computer time of elimination verses forward and back substitution. 3 k=1 and since n is negligible compared to n3 for large n. But now suppose you have a second system Ax2 = b2 with a diﬀerent right-hand side. In the ﬁrst elimination step for an n × n matrix. A 50 ×50 matrix would therefore require 41. Find the LU factorizations of ￿ ￿ 4 −6 (a) A = 3 5 . So if you have several systems to solve. all of which diﬀer only in their right-hand sides. By a similar operation count.500 operations. The second step is exactly like the ﬁrst except that it is performed on an (n − 1) × (n − 1) matrix and therefore requires (n − 1)2 − (n − 1) operations. Back substitution is much faster since the number of operations required is easily seen to be n ￿ n(n + 1) k= . On the other hand. By counting operations. a multiplier (one division) times the second through nth entries of the ﬁrst row (n − 1 multiplications) is subtracted from a row below the ﬁrst. This results in n operations. EXERCISES 1. we can show that Gauss-Jordan elimination requires n3 /2 operations. 2 k=1 which is approximately n2 /2. Since there are n − 1 rows to be subtracted from. then the LU factorization method is preferable. Continuing in this manner we see that the total number of operations required for Gaussian elimination is n ￿ n3 − n 2 (k − k) = . which is 50% more than straight Gaussian elimination with back substitution.275 for forward and back substitution. The LU Factorization 15 slightly harder since there is an extra forward substitution step. once for each system. Gauss-Jordan elimination on a 50 × 50 matrix would therefore require 62. the array method would have to run through the entire Gaussian elimination process twice.3. The LU factorization method would factor A into LU and then solve LU x1 = b1 and LU x2 = b2 both by forward and back substitution. We will count only multiplications and divisions since they take much more time than addition and subtraction. Forward substitution is the same.

16 (b) 3. The LU Factorization 2  −2 B= 4 1  2 C= −3 1 2 4  D = 0  0 0    (c)  3 2 −1 5 3 2   2 −1 2 1 3 1 1 0 5 3 3 4 0 −1 0 0  1 3 5 1 2 4 (d)  0 0 0 0  1 0  1 1 4 3 2. then how large a linear system can you solve with a budget of \$2? Of \$200? . Use the LU factorizations above and forward and back substitution to solve ￿ ￿ −8 (a) Ax = 13 (b)  12 Bx =  −6  18  0  4  Cx =   −1 2    (c) (d)  4  5    Dx =  −4    2 3 3. If your computer performs 106 operations/sec and costs \$500/hour to run.

Using Gaussian elimination on the corresponding array ￿   1 2 3￿1 ￿ 2 4 9￿5 ￿ 3 6 7￿5 This has the harmless eﬀect of exchanging the second and third equations. in this case the 2 in the third row. Using Gaussian elimination on the corresponding array ￿   1 2 3￿1 ￿ 2 4 9￿5 ￿ 2 6 7￿4 the ﬁrst elimination step gives  1 0 0 2 0 2 ￿  3￿1 ￿ 3￿3. ￿ 0 0 3￿3 Example 2: Now let’s look at another system: u + 2v + 3w = 1 2u + 4v + 9w = 5 3u + 6v + 7w = 5 . ￿ 1￿2 A zero pivot has appeared. .4. Row Exchanges 17 4. The problem can therefore be ﬁxed by just exchanging the second and third rows: ￿   1 2 3￿1 ￿ 0 2 1￿2. But note that there is a nonzero entry lower down in the second column. Example 1: We ﬁrst consider the system u + 2v + 3w = 1 2u + 4v + 9w = 5 2u + 6v + 7w = 4 . ROW EXCHANGES We now return to the question of what happens when we run into zero pivots. In this case we are done with elimination since the array is now ready for back substitution.

The last two equations. We can also see this by extending Gaussian elimination a little. which is just the identity matrix with some of its rows exchanged. Gaussian elimination breaks down through no fault of its own simply because this system has no solution. then the elimination gives ￿   1 2 3￿1 ￿ 0 0 3￿3. Otherwise the matrix is called singular. For the ﬁrst example of this section this looks like       1 0 0 1 2 3 1 0 0 1 2 3 0 0 12 4 9 = 2 1 00 2 1.) We conclude that when we run into a zero pivot. 0 = 4. Row Exchanges the ﬁrst elimination step gives  But now a row exchange will not produce a nonzero pivot in the second row. A matrix for which Gaussian elimination possibly with row exchanges produces a triangular system with nonzero pivots is called nonsingular. ￿ 0 0 0￿4 The third equation. we make a row exchange and continue. we should look for a nonzero entry in the column below the zero pivot. then we must stop. Suppose we knew what row exchanges would be necessary before we started. 0 1 0 2 6 7 2 0 1 0 0 3 .18 4. In this case there are inﬁnitely many solutions to the original system. What happens to the LU factorization of A when there are row exchanges? The answer is that the product of the L and U we obtain no longer equals the original matrix A but equals A with row exchanges. ￿ 0 −2 ￿ 2 What we really have here is two equations with three unknowns. signals the impossibility of a solution. We would then obtain the equation P A = LU . cannot be satisﬁed simultaneously. 3w = 3 and −2w = 2. suppose the right-hand side of the third equation is equal to 1 instead of 5. If we ﬁnd one. The altered version of A is realized by premultiplying A by a permutation matrix P . This will produce ￿   1 2 3￿1 ￿ 0 0 3￿3. If we don’t. we would get the normal LU factorization of this altered A. Example 3: In the example above. Use the 3 in the second equation to eliminate the −2 in the third equation. ￿ 0 0 0￿0 1 0 0 ￿  2 3 ￿1 ￿ 0 3 ￿3. Then if we performed those exchanges on A ﬁrst. (See Section 7. Back substitution breaks down since the ﬁrst equation cannot determine both u and v by itself. a unique solution to the system does not exist.

and back substitution will produce the unique solution. How many solutions do each of the following systems have? (a)     0 1 −1 u 2  1 −1 0   v  =  2  1 0 −1 w 2 . will produce a triangular system with nonzero pivots on the diagonal. Solve by the array method  −2 −8 3   v  =  32  w 1 0 1 1 2.4. Just apply P to both sides to get P Ax = P b. In any case. Since row exchanges are unpleasant when factoring matrices. Gaussian elimination. The LU factorization of P A gives LU x = P b. Forward and back substitution then give the solution x. (From now on. Row Exchanges 19 The P A = LU factorization can still be used to solve the system Ax = b as before.”) EXERCISES     1 4 2 u −2 1. we restate the central fact (really a deﬁnition): For a linear system Ax = b whose coefﬁcient matrix A is nonsingular. we will try to restrict our attention to nonsingular matrices that do not need them. Which of the following matrices is singular? Why? (a)      1 4 2  6 −8 2  −2 −8 −4  1 4 2  −2 −8 −3  −1 −4 5 1 0  0 0 1 0  0 0  3 5 0 0 3 5 0 0 2 0 10 0  (b) (c) (d)  2 −1 3 2   0 2 0 10  0 2   2 11 3. we will take “Gaussian elimination” to mean “Gaussian elimination possibly with row exchanges. possibly with row exchanges.

Row Exchanges  4. Prove that if A is nonsingular. then the only solution of the system Ax = 0 is x = 0. (A system of the form Ax = 0 is called homogeneous.)     0 1 −1 u 0  1 −1 0   v  =  0  1 0 −1 w 0 .20 (b) 4.

1 Let B1 . then B = BI = B(AC) = ￿ (BA)C = IC = C. We can easily prove that there cannot be more than one inverse of a given matrix: If B and C are both inverses of A. I2 . then how do we compute it? We answer the second question ﬁrst. B2 . Then we can see that this is really the problem of solving the three separate linear systems AB1 = I1 AB2 = I2 AB3 = I3 . and.5. B3 be the columns of B and I1 . Since the coeﬃcient matrix is the same for all three systems. . we can just ﬁnd the LU factorization of A and then use forward and back substitution three times to ﬁnd the three solution vectors. As an example. For example the matrix an inverse since ￿ 1 2 0 0 ￿￿ ￿ ￿ ￿ a b a b = . Therefore AA−1 = A−1 A = I. I3 be the columns of I. the inverse of the matrix ￿ ￿ ￿ 2 1 1 −1 3 3 is since 1 1 2 −3 1 3 ￿ 1 −1 1 2 ￿￿ 2 3 −1 3 1 3 1 3 ￿ 1 = 0 ￿ ￿ 0 . How can we tell if a matrix has an inverse. then there is at most one. d that will make the right-hand side equal to the identity matrix. These vectors. 1 ￿ 1 2 0 0 ￿ cannot have Some matrices do not have inverses. INVERSES A square matrix A is invertible if there is a matrix B of the same size such that their product in either order is the identity matrix: AB = BA = I. 3 2 2 This means that we are looking for a matrix B such that AB = I or  2 −3  1 −1 3 2  2 b11   b21 1 2 b31 b12 b22 b32   b13 1  = 0 b23 b33 0 0 1 0  0 0. will form the columns of B. when lined up. We write it as A−1 and call it the inverse of A. c. Inverses 21 5. b. Let’s try to ﬁnd the inverse of   2 −3 2 A =  1 −1 1  . c d 2a 2b so there is no choice of a. if it does have an inverse. If there is such a B.

so These two methods for ﬁnding inverses. That is. ￿ 0 0 −1 ￿ 5 −13 1 Finally divide each row by its leading nonzero entry to get ￿   1 0 0 ￿ 4 −10 1 ￿  0 1 0 ￿ −1 2 0 .5 in the second row to eliminate the −3 in the ﬁrst row. we can use the array method and a trick to avoid running through Gaussian elimination three times. Either method will work as long as A is nonsingular.5 1 0. −5 13 −1  The three columns on the right are the solutions to the three linear systems.5 1 0. Gauss-Jordan elimination does. (1) LU factorization and forward and back substitution n times and (2) Gauss-Jordan elimination. This gives ￿   2 −3 0 ￿ 11 −26 2 ￿  0 . But we could also use Gauss-Jordan elimination. present some organizational clarity when ﬁnding inverses by . First set up the array ￿   2 −3 2 ￿ 1 0 0 ￿  1 −1 1 ￿ 0 1 0  ￿ 3 2 2￿0 0 1 and use Gaussian elimination to get  ￿  2 −3 2 ￿ 1 0 0 ￿  0 . ￿ ￿ 5 0 0 −1 −13 1 Then use the .5 0 ￿ −. each require n3 operations.22 5.5 0 ￿ −. use the −1 in the third row to eliminate the entries in the column above it by subtracting multiples of the third row from the second (unnecessary since that entry is already zero) and from the ﬁrst.5 1 0. however.5 0 ￿ −. ￿ 0 0 −1 ￿ 5 −13 1 Now in this situation we would normally use back substitution three times. ￿ 0 0 1 ￿ −5 13 −1 A−1  4 −10 1 =  −1 2 0 . Inverses If we want to ﬁnd the solution by hand. ￿   2 0 0 ￿ 8 −20 2 ￿  0 .

(Recall that the statement “(d) ⇒ (a)” is logically equivalent to the statement “not (a) ⇒ not (d). (1) does Gaussian elimination work. (d) ⇒ (a): We prove this by assuming (a) is false. But for most large-scale applications. at some point we will run into a zero pivot. This turns out to be much inferior to ordinary Gaussian elimination with back substitution for two reasons: (1) It takes n3 operations to ﬁnd A−1 as compared with n3 /3 operations to solve Ax = b by Gaussian elimination. (d) Ax = 0 has the unique solution x = 0. It basically says that for any matrix A the three questions. Let y be a diﬀerent solution.5. and (3) does Ax = b have a unique solution. row exchanges can be made without aﬀecting the ﬁnal answer. Then apply A−1 to both sides of Ay = b to obtain y = A−1 b. and by (c) it must be the only solution. (2) does A have an inverse. Inverses 23 hand. We end this section with a major result. Once we have the inverse of a matrix. Just apply A−1 to both sides to obtain x = A−1 b. the computation of matrix inverses can and should be avoided. (a) A is nonsingular (that is. When we run . Furthermore. (2) Computing inverses. we can immediately say that there must exist a free variable and therefore conclude that there are nonzero solutions to Ax = 0. Inverses are valuable in theory and for conceptualization. Since we are assuming A is singular. since Gauss-Jordan elimination is just the array method performed several times at once. In some areas of statistics and linear programming it is occasionally necessary to actually compute an inverse. (c) Ax = b has a unique solution for any b.”) Consider the system Ax = 0. For any square matrix A the following statements are equivalent (all are true or all are false). Gaussian elimination. But since we haven’t got to Section 7 yet. by whatever method. (b) ⇒ (c): Apply A−1 to both sides of Ax = b to obtain a solution x = A−1 b. that is Ax = 0 has nonzero solutions in x. then Gaussian elimination can be used to ﬁnd its inverse. Therefore x = y and the solution is unique. all have the same answer. (c) ⇒ (d): Clearly x = 0 is a solution of Ax = 0. if we apply Gaussian elimination. (a) ⇒ (b): The point of this section has been to show that if A is nonsingular. possibly with row exchanges. Theorem. Using the language and results of Section 7. Proof: We show (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). that is A is singular. what can we do with it? It might seem at ﬁrst glance that A−1 can be used to solve the system Ax = b directly. is subject to much more numerical instability and round-oﬀ error than is Gaussian elimination. which we state and prove formally. we’ll try to prove this directly. and we show that this implies (d) is false. produces nonzero pivots) (b) A is invertible.

1 0  (b) 0 0 −5   2 6 10 0 2 5  (c) 0 0 5   1 1 1 2 3 2 (d) 3 8 2   1 1 1  −1 3 2  (e) 2 1 1 (f)  1 1  2 1 2 3 4 1 3 3 3 1  1 2  3 1 a situation that looks something like     ∗ ∗ ∗ x1 0 ∗ ∗ ∗   x2   0      0 ∗ ∗   x3  =  0  . ￿ ￿ 1 4 (a) 2 7   2 0 0  0 . This proves the theorem. then Gaussian elimination applied to Ax = 0 will produce a system that in exactly the same way can be shown to have nonzero solutions. Inverses into the zero pivot.     0 ∗ ∗ x4 0 0 ∗ ∗ x5 0 . we will get a nonzero solution to Ax = 0. This shows that (d) is false. (We skip the proof that P −1 exists. The pattern is the same in all cases. If A is any singular matrix. which we can now write A = P −1 LU .) EXERCISES 1. we will have  ∗ ∗ 0 ∗  0 0  0 0 0 0 But if we set x5 = x4 = 0 and x3 = 1 and solve for x2 and x1 by back substitution. Note that the ﬁrst statement of the theorem is equivalent to the fact that A has P A = LU factorization.24 5. Use Gauss-Jordan elimination to ﬁnd the inverses of the following matrices.

If A. Let A be the matrix of Exercise 1(e). 6. From Exercises 1(b) and 1(c) what can you say about the inverse of a diagonal matrix and of an upper triangular matrix?   11  23  by using A−1 3. Inverses 25 (g) ￿ a b c d ￿ 2. I] to obtain [I. Give 2 × 2 examples of the following. then so is AB. Prove that if A and B are nonsingular. we applied Gauss-Jordan elimination to the array [A. There is a slight hole in our proof of (a) ⇒ (b) in the theorem of this section. But how  2 −1 3 2   0 2 0 10  0 2   2 11 . 7. C are invertible. Solve the system Ax = 13 4. (b) The sum of two noninvertible matrices may be invertible.)   1 4 2 (a)  6 −8 2  −2 −8 −4 (b)     1 4 2  −2 −8 −3  −1 −4 5 1 0  0 0 1 0  0 0 3 5 0 0 3 5 0 0 2 0 10 0 (c) (d) 5. (a) The sum of two invertible matrices may not be invertible. (c) (AT )−1 = (A−1 )T . 8. then prove (a) (AB)−1 = B −1 A−1 . To ﬁnd the inverse of A.5. Which of the following matrices is invertible? Why? (See Section 4 Exercise 2. B]. (b) (ABC)−1 = C −1 B −1 A−1 . We then concluded AB = I so that B is a right-inverse of A. B.

9. 10. it is true that if a matrix has a one-sided inverse. Or more simply stated. prove that BA = I by applying the reverse of the same Gauss-Jordan steps in reverse order to the array [B. To prove this.I] to obtain [I.A]. argue as follows: AB = I ⇒ B is nonsingular ⇒ B is invertible ⇒ A = B −1 ⇒ BA = I.26 5. then the homogeneous system Ax = 0 has nonzero solutions. Inverses do we know that B is also a left-inverse of A? Prove that it is. Fill in the details. AB = I ⇒ BA = I.” (b) “If A is singular. True or false? (a) “Every nonsingular matrix has an LU factorization. that is. then it must have a two-sided inverse. More generally.” .

A matrix is tridiagonal if all of its nonzero elements are either on the main diagonal or adjacent to the main diagonal. .  .   . (2) The superdiagonal entries (that is. Tridiagonal Matrices 27 6. In fact.6.  . And (3) the ﬁnal upper triangular matrix has nonzero entries only on its diagonal and superdiagonal. they usually have special patterns. which are the simplest kind of band matrices.  2 1 0 0 0 3 3 0  0 0 1 1  0 0 0 2 0 0 0 0 we obtain  0 0  0. We conclude that large systems involving tridiagonal matrices are very easy to solve.  .  1 1 This example reveals three properties of tridiagonal matrices and Gaussian elimination. we can write a quick and eﬃcient program that will solve tridiagonal systems directly:    x0   b0  d1 c1  a2 d2 c2   x1   b1     x2   b2  a3 d3 c3    =     . an dn xn bn for k = 2 to n do if dk−1 = 0 then signal failure and stop m = ak /dk−1 dk = dk − mck−1 bk = bk − mbk−1 if dn = 0 then signal failure and stop xn = bn /dn for k = n − 1 down to 1 do xk = (bk − ck xk+1 )/dk . In such cases Gaussian elimination often simpliﬁes. Here is an example (from Section 3 Exercise 1(d)):   2 1 0 0 0 4 5 3 0 0   0 3 4 1 0   0 0 −1 1 1 0 0 0 4 3 If we run Gaussian elimination on this matrix. the entries just above the main diagonal) don’t change. (1) There is at most one nonzero multiplier in each Gaussian step. . If we count the number of operations required to triangulate a tridiagonal matrix. We now illustrate this by looking at tridiagonal matrices. . TRIDIAGONAL MATRICES When coeﬃcient matrices arise in applications.   . we ﬁnd it is equal to n instead of the usual n3 /3.

y4 ) x0 x1 x2 x3 x4 FIGURE 1 How do we ﬁnd the cubic polynomials that make up the spline? If we knew what slopes the spline curve should have at its nodes. · · · . which is a smooth curve passing through all the data. then we could ﬁnd the cubic polynomial on each interval using the method of Section 1 Exercise 8. xn−1 are called interior nodes. Often these are taken to be the requirement that the spline has no curvature (zero second derivatives) at the boundary nodes. To make the problem completely determined. y1 ) (x3 . the deﬂection of beams. the cubic on the left and the cubic on the right have the same heights. heat ﬂow problems. and x0 and xn are called boundary nodes. the same second derivative). x1 ]. yn ). and so on. xn ] such that. (xn . [xn−1 . The spline thus obtained is called a natural spline. and modeling. font design. Tridiagonal Matrices Tridiagonal matrices arise in many situations: electrical circuits. The problem is to ﬁnd a cubic polynomial on each of the intervals [x0 . that is. y2 ) s4 (x4 . x1 − x0 = x2 − x1 = · · · = xn − xn−1 = h. y1 ). · · · . y3 ) s3 (x0 . x2 ]. s1 . The points x1 . For simplicity assume that the data is equally spaced. We are given data (x0 . [x1 . we need conditions at the two boundary nodes. s1 (x1 . If we glue these cubics together we will obtain a cubic spline. and the same curvature (that is to say. y0 ). the same slopes. y0 ) s0 s2 (x2 .28 6. sn be the unknown slopes at the nodes. Let s0 . Then with some algebraic eﬀort it is possible to show that the conditions described above force the slopes to . Here we show how tridiagonal matrices are used in cubic spline interpolation. at each interior node. x2 . · · · . Splines have applications in CAD-CAM. (x1 . · · · .

s1 . Once the slopes s0 . n − 1 are all > 3. the cubic polynomial on each interval can be found as a cubic Hermite interpolant. Run Gaussian elimination on the tridiagonal n + 1 × n + 1 matrix of this section and show that the pivots for rows 1. .1).4).· · · . (See Section 1 Exercise 8. (1.0) must satisfy. . Solve it. Tridiagonal Matrices 29 satisfy the following linear system:  2 1 1 4 1   1 4 1   1 4                               3   =  . · · · .6. .   h      4 1   sn−3   yn−2 − yn−4      1 4 1   sn−2   yn−1 − yn−3    1 4 1   sn−1  yn − yn−2 1 2 sn yn − yn−1  s0 s1 s2 s3 . sn are known. The system is tridiagonal and therefore easy to solve. Write down the system that the slopes of the natural spline interpolant of the data (0.   y1 − y0 y2 − y0 y3 − y1 y4 − y2 .0). This proves that not only is this matrix nonsingular but also that no row exchanges are necessary. .  1 . The tridiagonal algorithm can therefore be used.1). All the other equations come from the conditions at the interior nodes.. (4. . even when there is a large number of nodes.) EXERCISES 1. 2. Sketch the resulting spline curve. (2. . 1 The ﬁrst and last equations come from the conditions at the boundary nodes. (3.

0 −2 ￿ −2 in array form reduces by Gaussian elimination to Clearly we cannot use back substitution. This indicates that the entire system has no solution. Systems with Many Solutions 7. has no solution. there is exactly one solution: x = b/a. (2) If a = 0 and b ￿= 0. This is the nonsingular case we have been considering in these notes up to now. We ﬁrst look at 2 × 2 examples.30 7. the second equation. 0u + 0v = −4. (3) If a = b = 0. Example 2: The system u+ v=2 2u + 2v = 0 ￿ ￿ 1 0 1 2 ￿ ￿ 1￿2 ￿ 2￿0 ￿ ￿ 1 1 ￿ 2 ￿ . Even worse. then x = 6/2 = 3. and the system is said to be inconsistent. At ﬁrst glance we would say that the solution is just x = b/a. there is no solution. SYSTEMS WITH MANY SOLUTIONS Consider the single linear equation in one unknown ax = b. The coeﬃcient matrix is of course singular. 0x = 6 is not satisﬁed by any x. For example. For example. there are inﬁnitely many solutions because 0x = 0 is satisﬁed by every x. Example 1: The system u+v=2 u−v=0 ￿ ￿ ￿ 1 1 ￿2 ￿ 1 −1 ￿ 0 in array form reduces by Gaussian elimination to ￿ The unique solution is therefore v = 1 and u = 1. 0 ￿ −4 . if 2x = 6. It is a striking fact that exactly the same three cases are the only possibilities that exist for systems of equations. ￿ ￿ 1￿ 2 ￿ . But in fact there are three cases: (1) If a ￿= 0.

The solution is therefore u=2−c v=c or written in vector form is ￿ ￿ ￿ ￿ u 2−c = v c ￿ ￿ ￿ ￿ ￿ ￿ u 2 −1 = +c . In the following examples we present a systematic method for ﬁnding solutions of more complicated systems. The ﬁrst equation then gives u = 2 − c. The method is an extension of Gauss-Jordan elimination. but the system is said to be underdetermined. Systems with Many Solutions 31 Example 3: The system u+ v=2 2u + 2v = 4 ￿ ￿ 1 2 ￿ ￿ 1￿2 ￿ 2￿4 in array form reduces by Gaussian elimination to 1 0 This time the second equation is trivially satisﬁed for all u and v. So we set v = c where c is an arbitrary constant and try to continue with back substitution.7. The coeﬃcient matrix is still singular as before. 0￿0 or alternatively We see that we have obtained an inﬁnite number of solutions parametrized by an arbitrary constant. ￿ 0 0 ￿0 . v 0 1 ￿ ￿ 1￿2 ￿ . Example 4: Suppose we have the 3 ×  1 2 3 Gaussian elimination produces  1 0 0 3 system ￿  2 −1 ￿ 2 ￿ 4 1 ￿7. ￿ 6 −2 ￿ 7 ￿  2 −1 ￿ 2 ￿ 0 3 ￿3.

those that do not. ￿ 0 0 0 1￿ 2 u = −6 + c v =1−c w=c x=2 (These Gauss-Jordan steps are really not necessary. The solution is therefore . w 1 0 system ￿  1 3￿2 ￿ 0 1￿2. Free variables are set to arbitrary constants. The variables u. w now fall into two groups: leading variables. will produce the staircase form ￿   1 2 1 3￿2 ￿ 0 1 1 1￿3. but they usually make the answer somewhat easier to write down. Leading variables are solved for in terms of free variables. so v = c. v. we obtain the solution u = 3 − 2c v=c w=1 or in vector form       u 3 −2  v  = 0 + c 1 . In this case.32 7. ￿ 1 1￿3 1 0 0 2 0 0 ￿  0￿3 ￿ 3￿3.) The free variable is w. u and w are leading variables and v is a free variable. Working from the bottom up. and free variables. an exchange of the second and third rows. ￿ 0￿0 Example 5: Suppose we have the 3 × 4  1 2 0 0 0 1 One step of Gaussian elimination. those that correspond to columns that have a leading nonzero entry for some row. There is no way to get rid of the 2 in the ﬁrst equation. ￿ 0 0 0 1￿2 Now apply two steps of Gauss-Jordan to obtain ￿   1 0 −1 0 ￿ −6 ￿ 0 1 1 0￿ 1 . Systems with Many Solutions and Gauss-Jordan produces  This is as far as we can go.

￿ 0￿0 There are two free variables. Each is set to a diﬀerent arbitrary constant. ￿   0 0 0 0 0 0 0 0 •￿∗ ￿ 0 0 0 0 0 0 0 0 0 ∗  ￿  • 0 ∗ 0 ∗ ∗ ∗ ∗ 0￿∗ ￿ 0 • ∗ 0 ∗ ∗ ∗ ∗ 0￿∗  ￿  0 0 0 • ∗ ∗ ∗ ∗ 0￿∗. Systems with Many Solutions 33 or in vector form       u −6 1 v  1   −1   =  + c .5c x=c or in vector form         u 2 −3 −2  v  0  0   1    =   + c  + d .5 0 x 0 1 0 This time we have a an inﬁnite number of solutions parametrized by two arbitrary constants. v and x. w 1 −. The solution is therefore u = 2 − 3c − 2d v=d w = 1 − .7. In general. Gaussian elimination will put the array into echelon form ￿  • ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗￿∗ ￿ 0 • ∗ ∗ ∗ ∗ ∗ ∗ ∗￿∗  ￿  0 0 0 • ∗ ∗ ∗ ∗ ∗￿∗. w 0 1 x 2 0 Example 6: Suppose we have a 3 × 4 system that reduces to  1 0 0 2 0 0 0 2 0 ￿  3￿2 ￿ 1￿2. ￿   0 0 0 0 0 0 0 0 •￿∗ ￿ 0 0 0 0 0 0 0 0 0 ∗  and Gauss-Jordan elimination will put the array into row-reduced echelon form .

we get a staircase pattern where the ﬁrst nonzero entry in each row (indicated by bullets above) is a pivot. Show that the following expressions represent the same set of solutions. at least one pivot occurs to the right of the main diagonal. all pivots occur on the main diagonal. For singular matrices.34 7. ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ u 2 −1 u 2 8 ￿ (a) = +c and = +c v 0 1 v 0 −8                 u 3 0 0 u 3 0 0  v  = 0+c1+d0  v  =  0  + c￿  1  + d￿  1  (b) and 0 0 1 w 0 0 1 w 2. Find the  1 (a)  1 1 (b)      2 2 4 solutions of ￿  2 3￿3 ￿ 4 5￿4 ￿ 0 1￿2 2 6 8 ￿  2￿ 1 ￿ 4￿ 6  ￿ 6 ￿ 10 (c) ￿  1 2 −1 ￿ 3 ￿  2 4 −2 ￿ 6  ￿ −3 −6 3 ￿ −9 1 0 2 2 1 4 2 4 4 2 1 5 4 2 8 1 0 2 2 2 0 ￿  8 ￿ 1 ￿ 2 ￿ −1  ￿ 20 ￿ 1 ￿  3￿ 1 ￿ 2 ￿ −1  ￿ 4￿ 8 (d) (e) (f) ￿  3 ￿ −9 ￿ 6 ￿ −18  ￿ 3 ￿ −3 .”) EXERCISES 1. Systems with Many Solutions In either case. For square nonsingular matrices. Solutions can be written in many equivalent ways. This is the precise mathematical deﬁnition of pivot. (Up to now we have been referring to this informally as the case of a “zero pivot.

Then graph each equation as a plane and give a geometric reason for the number of solutions of each system. or inﬁnitely many? . ￿   1 1 0￿1 ￿ (a)  1 −1 0 ￿ 0  ￿ 0 0 1￿0 (b)     2 0 0 1 0 1 1 0 1 1 2 3 0 0 0 0 1 1 0 1 1 1 2 3 ￿  0￿2 ￿ 3￿0 ￿ 3￿6 ￿  0￿0 ￿ 0￿0 ￿ 0￿1 ￿  0￿1 ￿ 0￿0 ￿ 0￿1 ￿  1￿1 ￿ 2￿2 ￿ 3￿3 (c) (d) (e) 5. Explain why the following statements are true. then Ax = c (diﬀerent right-hand side) has how many possible solutions: none. If Ax = b has inﬁnitely many solutions. then it has inﬁnitely many solutions. Systems with Many Solutions 35 3. (Hint: There must be some free variables. ￿ ￿ ￿ 1 3￿2 ￿ (a) 3 2￿1 (b) ￿ ￿ ￿ ￿ 2 1 ￿ −1 ￿ −6 −3 ￿ −4 ￿ ￿ 3 −1 ￿ 2 ￿ −6 2 ￿ −4 (c) 4. (Hint: Why can’t the no solution case occur?) 6. one. Solve each of the following 3 × 3 systems.7. (a) If the system Ax = b has more unknowns than equations. then it has either no solution or inﬁnitely many solutions. Then graph each equation as a line and give a geometric reason for the number of solutions of each system.) (b) If the homogeneous system Ax = 0 has more unknowns than equations. Solve each of the following 2 × 2 systems.

or inﬁnitely many solutions can occur.3). and orange juice in a blender.36 7. 10. that is. one solution. (5. Systems with Many Solutions 7. How much of each should be blended to produce a drink with 560 calories of energy and 24 grams of protein? energy (kcal) protein (g) 1 egg 80 6 1 cup milk 180 9 1 cup orange juice 100 3 9. this would mean 2a + b = 2c + 3d. Find such a solution. Show with 3 × 2 examples that if a system Ax = b has more equations than unknowns. For oxygen. then any one of the three cases of no solution. d that balance the reaction. milk. c. While there are many possible choices for a. b. it is customary to use the smallest possible positive integers. for example. (2. Find the equation of the circle in the form c1 (x2 + y 2 ) + c2 x + c3 y + c4 = 0 that passes through the points (2.0). (Hint: Just expand the 2 × 2 examples at the beginning of this section to 3 × 2 examples.6).) 8. The food energy and protein for these ingredients are given below. Consider the chemical reaction a NO2 + b H2 O = c HNO2 + d HNO3 . the number of atoms of each element must be the same before and after the reaction. A nutritious breakfast drink can be made by mixing whole egg. The reaction must be balanced. .

For example. This means the determinant is the sum of all possible products of n entries of A. diﬃcult to motivate. It is important to us only because from it the following properties of the determinant can be proved. · · · . Determinants 37 8. The determinant det(A) is a number associated with a square matrix A. there is far less emphasis on them than in the past. DETERMINANTS Determinants have been known and studied for 300 years. . We will omit the proofs since in this section we want to get through the determinant as quickly as possible. determinants play an important but narrow role in theory and almost no role at all in computations. where σ(1). For 2 × 2 and 3 × 3 matrices it is deﬁned as follows ￿ ￿ a11 a12 det = a11 a22 − a21 a12 a21 a22 a11 det  a21 a31  a12 a22 a32  a13 a23  = a33 a11 a22 a33 + a12 a23 a31 + a13 a32 a21 − a31 a22 a13 − a21 a12 a33 − a32 a23 a11 . We will make use of them in our study of eigenvalues in Section 9. n. Today. and impossible to compute. it is easy to see that this is true for the 3 × 3 case written out above. σ(n) is a permutation or rearrangement of the numbers 1. These rules cannot be extended to larger matrices! For such matrices we must use the general deﬁnition: det(A) = ￿ σ sign(σ)a1σ(1) a2σ(2) a3σ(3) · · · anσ(n) . In modern mathematics. These are the familiar diagonal rules from high school. 2. We intentionally leave this deﬁnition of the determinant vague since it is hard to understand. In particular. where each product consists of entries taken from unique rows and columns.8. In a later section we present another approach that will make clear where the mysterious determinant formula comes from and how the properties are dertived. the second term in the high-school formula comes from  ∗  ∗ a31 a12 ∗ ∗  ∗ a23  . · · · . ∗ The symbol sign(σ) is equal to +1 or −1 depending on how the rows and columns are chosen. however.

then that factor can be taken   12 2 1  = 3 det  5 2 2 1 7 5  4 1 2 (6) The determinant of the transpose of a matrix is the same as the determinant of the matrix itself: det(AT ) = det(A). 1 det  5 3  2 7 1   2 3 1  = − det  5 3 1 1 7 2  3 1 2 (4) The typical Gaussian elimination operation of subtracting a multiple of one row from another leaves the determinant unchanged.38 8. 1 0 det 5   4 2 0 0 = 0 7 1 1 3 det 1  4 5 4  2 2 = 0 2 1 3 det 2  4 5 8  2 2 = 0 4 (3) The determinant changes sign when two rows are exchanged. 1 3 det 5  2 1 7    2 1 2 2 3  = det  0 −5 −3  1 5 7 1 (5) If all the entries in a row have outside the determinant  6 3 det  5 7 2 5 a common factor. 1 5 det 3  2 5 1   2 1  = det  2 5 3 2 5 5 5  3 1 3 . Determinants (1) The determinant of the identity matrix is 1. 1 det  0 0  0 1 0  0 0 = 1 1 (2) If A has a zero row or two equal rows or two rows that are multiples of each other. then det(A) = 0.

resulting in (n − 1)n! multiplications in all. But as we saw. each of which requires n − 1 multiplications. A computer that can perform a million multiplications a second would take 1013 years to compute this determinant! This is clearly unacceptable. then Gaussian elimination will produce an upper triangular matrix with nonzero pivots. Note that property 6 means that all the properties about rows also hold for columns. the second in n − 1 ways. Determinants 39 (7) The determinant of a (lower diagonal entries. If A is singular.8. For any square matrix A the following statements are equivalent. Use Gaussian elimination to triangulate the matrix. where in each product each factor is an entry from a diﬀerent row and a diﬀerent column. then Gaussian elimination will produce an upper triangular matrix with at least one zero pivot. there must be an eﬃcient way to compute it. the third in n − 2 ways. For a 25 × 25 matrix there would be 24·25! or 3. Note also that property 9 can be added to the theorem of Section 5 to obtain Theorem. Since by properties 3 and 4 Gaussian elimination changes at most the sign of the determinant.7×1026 multiplications. Proof: We show (a) ⇔ (e). the formula consists of a sum of products of n entries of A. We could try to use the formula in the deﬁnition of the determinant. If A is nonsingular. (d) Ax = 0 has the unique solution x = 0. (e) det(A) ￿= 0. there are therefore n(n−1)(n−2)(n−3) · · · (2)(1) = n! diﬀerent products in the sum.  2 det  0 0 or upper) triangular matrix is the product of its 3 5 0  7 2  = 2 · 5 · 3 = 30 3 (8) The determinant of a product is the product of the determinants. (a) A is nonsingular (b) A is invertible. (c) Ax = b has a unique solution for any b. This means there are n! products that must be summed up. det(AB) = det(A) det(B) (9) A is nonsingular if and only if det(A) ￿= 0. det(A) = 0. An alternate approach is suggested by the proof of property (10). we have that det(A) ￿= 0. If the determinant is to have any practical value. Then the determinant is the product of the . and so on. Since the ﬁrst entry in a product can be chosen in n ways. By the same argument.

it is Gaussian elimination! Why do we want to compute a determinant in the ﬁrst place? What can it tell us about a matrix? Whether or not the matrix is singular? But we can determine that just by doing Gaussian elimination. there is the cofactor expansion of the determinant. They are important. A cofactor is the determinant of the 2 × 2 matrix obtained from the original matrix by crossing out a particular row and a column. for theoretical developments as we will see in the next section. the cofactor of the ﬁrst entry is the determinant of the matrix obtained by crossing . Here we use it to ﬁnd the determinant of the matrix above:   ￿ ￿ ￿ ￿ ￿ ￿ 1 2 3  2 4 9  = 1 det 4 9 − 2 det 2 9 + 3 det 2 4 det 6 7 2 7 2 6 2 6 7 = 1(28 − 54) − 2(14 − 18) + 3(12 − 8) = −26 + 8 + 12 = −6. In fact. So do we ever need to compute a determinant in practice? No! Determinants are rarely computed outside a classroom. 3 Since this method uses only Gaussian elimination. however. In words. In particular. it requires n3 /3 operations. but it takes just as many steps as Gaussian elimination. In particular. the determinant of the matrix on the left is the sum of the entries of its ﬁrst row times the cofactors of its ﬁrst row. Otherwise we get its LU factorization. For example. then we know the matrix is singular. The determinant can be evaluated in other ways. the matrix at the beginning of Section 4 was reduced to an upper triangular matrix by Gaussian elimination with one row exchange. It expresses the determinant of a matrix as a sum of determinants of smaller matrices.40 8. For a 25 × 25 matrix this is only 5208 operations or only 0. If we run into a zero pivot that cannot be cured by row exchanges. with an appropriate sign placed in front of the determinant.005 seconds on our hypothetical computer! The method above is an excellent way to compute the determinant.       1 2 3 1 2 3 1 2 3 2 4 9 → 0 0 3 → 0 2 1 2 6 7 0 2 1 0 0 3 We therefore have 1 det  2 2  2 4 6   3 1 2 9  = − det  0 2 7 0 0  3 1  = −1 · 2 · 3 = −6. Determinants diagonal entries (the pivots!) times +1 or −1 depending upon whether there was an even or odd number of row exchanges.

. . . . . it generally requires exactly the same number of multiplications as the formula that deﬁned the determinant in the ﬁrst place. and the cofactor of the third entry is the determinant of the matrix obtained by crossing out the ﬁrst row and the third column.8. . have some value in theoretical considerations and in the hand   2 1 0  + 2 det  1 −1 0 6 2 2   2 1 0  − 4 det  1 −1 0 6 2 2  4 0. . . It is therefore extremely impractical. It does.   − + − + ··· . We could use either cofactor expansion or the highschool formula on each of these smaller determinants. by crossing out the row and column of the corresponding entry. (Note that we should have expanded with respect to the third row because then we would have had only two 3 × 3 determinants to evaluate. however. Here is another cofactor expansion of the same matrix: 1 det  2 2  2 4 6  ￿ 3 2 9  = −2 det 2 7 ￿ ￿ 9 1 + 4 det 7 2 ￿ ￿ 3 1 − 6 det 7 2 3 9 ￿ = −2(14 − 18) + 4(7 − 6) − 6(9 − 6) = 8 + 4 − 18 = −6. Here’s an example of a cofactor expansion of the determinant of a 4 × 4 matrix: 1 1 1 0 det  1 −1 2 2 4 0 2   2 4 4 2  0 0 2 6   0 4 2 1  −1 0 0  − 1 det  1 = 1 det 2 2 6 2  We expanded with respect to the ﬁrst row. . 2 . In this case we are now faced with ﬁnding four 3 × 3 determinants. Note that the 2 × 2 matrices arise in the same way.) It is becoming clear that the method of cofactor expansion requires a great deal of computation. Just think about the 5 × 5 case! In fact. . the cofactor of the second entry is the determinant of the matrix obtained by crossing out the ﬁrst row and the second column with a negative sign in front. In general the signs in the deﬁnition of the cofactors form a checkerboard pattern:   + − + − ··· − + − + ···   + − + − ···. This time we expanded with respect to the second column. . Note also the signs. Determinants 41 out the ﬁrst row and ﬁrst column.

to compute  x 2 det 4 that contain algebraic expressions. and we make no attempt to prove it. Determinants computation of determinants of matrices example. For y 8 7  1 1 1 (ignoring that fact that we have the high-school formula for this!) we would use cofactor expansion with respect to the ﬁrst row. as the determinant of the submatrix Mij times (−1)i+j .) Note that the cofactor is oﬃcially deﬁned as the entire quantity in brackets. As long as we have come this far we might as well write down the general formula for the cofactor expansion of the determinant of a matrix with respect to its ith row. Compute the determinants by Gaussian elimination   1 3 1 (a)  1 1 4  0 2 0   1 1 1 (b)  3 3 −1  2 −2 2   2 1 3 (c)  −2 5 1  4 2 4   1 1 2 4 1 0 4 2 (d)   1 −1 0 0 2 2 2 6   1 2 3 1 1 3 3 2 (e)   2 4 3 3 1 1 1 1 . (The formula for expansion with respect to columns is similar. The formula is not very illuminating. We would deﬁnitely not want to use Gaussian elimination here. that is.42 8. EXERCISES 1. It is det A = ai1 [(−1)i+1 det Mi1 ] + ai2 [(−1)i+2 det Mi2 ] + · · · + ain [(−1)i+n det Min ] where Mij is the submatrix formed by deleting the ith row and jth column of A.

Give 2 × 2 examples of the following.   3 7 5 7 0 3 6 0 (a)   1 1 7 2 0 0 1 0   1 1 2 4 1 0 4 2 (b)   1 −1 0 0 2 2 2 6   x y 1 (c)  2 8 1  4 7 1   2−x 2 2 (d)  1 2−x 0  1 0 2−x ￿ ￿ B C 6. Use cofactor expansions to evaluate the following determinants. (b) det(A−1 ) = 1/ det(A) (c) det(BAB −1 ) = det(A) (d) det(cA) = cn det(A) where A is n × n. and (n − p) × (n − p). (A is said to be partitioned into blocks. (a) A ￿= 0 and det(A) = 0. True or false? “If det(A) = 0. 4.) Show that det(A) = det(B) det(D). For the matrix in Exercise 1(a) ﬁnd det(A−1 ) and det(AT ) without doing any work.8.) 7. Prove the following.” . (a) det(Ak ) = (det(A))k for any positive integer k. then the homogeneous system Ax = 0 has nonzero solutions. Suppose we have a square n × n matrix that looks like A = where B. 5.) 1 0 3. Determinants 43 2. p × (n − p). (Use Gaussian elimination. and D are submatrices of sizes p × p. (b) A ￿= B and det(A) = det(B) (c) det(A + B) ￿= det(A) + det(B) 0 1 (f)  0 0  0 0 0 1 1 0 0 0  0 0  (Use property (3) for a quick solution. 0 D C.

The number λ is called an eigenvalue of A and the vector x is called an eigenvector associated with λ. (“Eigen” is a German word meaning “its own” or “peculiar to it. given a square matrix A. So what we 1 actually have is an inﬁnite family of eigenvectors. The left-hand side is a polynomial in λ and is called the characteristic polynomial of A. this is equivalent to asking for those numbers λ that make the matrix A − λI singular or. for example. any vector of   1 the form c  −1  is an eigenvector associated with the eigenvalue 5.44 9. c￿  2 . Note also that this inﬁnite family   −2 can be represented in many other ways such as. This equation is called the characteristic equation of A. 1 Note that any multiple of this vector is also an eigenvector. We therefore want to ﬁnd those numbers λ for which the homogeneous system (A − λI)x = 0 has nonzero solutions x. in other words. That is. for which det(A − λI) = 0. By the theorem of the previous section. 1−λ The characteristic equation of A is det(A−λI) = 0. 7 4 2 1 1   1 So that 5 is an eigenvalue of the matrix above and  −1  is an associated eigenvector. −2 Suppose we want to ﬁnd the eigenvalues of a matrix A. it is necessary to know if there is a number λ (read “lambda”) and a nonzero vector x such that Ax = λx. EIGENVALUES There are many problems in engineering and science where. −1 1 First set 4 A − λI = −1 ￿ ￿ ￿ 2 λ − 1 0 ￿ ￿ 0 4−λ = λ −1 ￿ 2 .”) For example      5 4 4 1 1  −7 −3 −1   −1  = 5  −1  . Example 1: Find the eigenvalues of the matrix ￿ ￿ 4 2 A= . which can be rewritten as follows: ￿ ￿ 4−λ 2 det =0 −1 1−λ . Eigenvalues 9. We start by rewriting the equation Ax = λx as Ax=λIx or Ax − λIx=0 or (A − λI)x=0.

Eigenvalues 45 (4 − λ)(1 − λ) − 2(−1)=0 (λ − 2)(λ − 3)=0. v 0 We use Gaussian elimination in array form ￿ ￿ ￿ 2 2 ￿0 ￿ −1 −1 ￿ 0 to get ￿ 2 0 ￿ ￿ 2￿0 ￿ . or in vector form ￿ ￿ ￿ ￿ u −2 =c . v 1 Therefore. ￿ ￿ ￿ 1 2 ￿0 ￿ −1 −2 ￿ 0 1 0 ￿ ￿ 2￿0 ￿ . Write (A − 3I)x = 0 as ￿ 4−3 2 −1 1 − 3 ￿ ￿￿ ￿ ￿ ￿ u 0 = v 0 and then solve to get The solution is v = c and u = −2c. for each of the two eigenvalues we have found an inﬁnite family of eigenvectors parametrized by a single arbitrary constant. v 1 The case λ = 3 is similar. For the case λ = 2 we wish to ﬁnd nonzero solutions of the system (A − 2I)x = 0. We can go further and ﬁnd the associated eigenvectors. or in vector form ￿ ￿ ￿ ￿ u −1 =c . λ2 − 5λ + 6=0 The eigenvalues of A are therefore λ = 2 and λ = 3.9. which can be rewritten as ￿ 4−2 2 −1 1 − 2 ￿￿ ￿ ￿ ￿ u 0 = . 0￿0 . 0￿0 The solution is v = c and u = −c.

λ = 6 and λ = −1.46 9. [λ2 − 5λ − 6](6 − λ)=0. To ﬁnd the eigenvectors for λ = −1. v = c. Eigenvalues Example 2: Things can become more complicated as the size of the matrix increases. w 0 3 0 0 ￿  0￿0 ￿ 0￿0. −(λ − 6)2 (λ + 1)=0. 0 0 6−λ (2 − λ)(3 − λ)(6 − λ) − 3 · 4(6 − λ)=0. and u = −c. solve (A − (−1)I)x = 0 or      2+1 3 0 u 0  4  v  = 0 3+1 0 0 0 6+1 w 0 by reducing   3 4 0 3 4 0 3 0 0 ￿  0￿0 ￿ 0￿0 ￿ 7￿0 to The solution is w = 0. ￿ 7￿0 For the case λ = 6. solve (A − 6I)x = 0 or      2−6 3 0 u 0  4  v  = 0 3−6 0 0 0 6−6 w 0 . Here we have two eigenvalues. Consider the matrix   2 3 0 A = 4 3 0. or in vector form     u −1  v  = c 1 . 0 0 6 Proceeding as before we have the characteristic equation det(A − λI) = 0 rewritten as   2−λ 3 0 det  4 3−λ 0  = 0. [(2 − λ)(3 − λ) − 3 · 4](6 − λ)=0.

Eigenvalues 47 by reducing  to −4 3  4 −3 0 0  −4  0 0 The solution is w = c. 6. Example 3: It is possible for a repeated eigenvalue to have only one independent eigenvector. and we say that 6 has multiplicity 2. But in solving the system (A − 2I)x = 0 we obtain ￿ 2−2 1 0 2−2 ￿￿ ￿ ￿ ￿ u 0 = v 0 . 6 to indicate that 6 is repeated root of the characteristic equation. v = d. Intuitively. 2 which is easily seen to have characteristic equation (λ − 2)2 = 0 and therefore the repeated eigenvalue λ = 2. or in vector form 4     3 u 0 4  v  = c0 + d 1 .       u 0 3  v  = c￿  0  + d￿  4  . Note that this inﬁnite family can be represented in many other ways such as. w 1 0 ￿  3 0￿0 ￿ 0 0￿0.” We will not discuss the precise mathematical meaning of independence here (see Section 16). For λ = 6 we found two linearly independent eigenvectors such that arbitrary linear combinations of them generate all other eigenvectors. ￿ 0 0￿0 ￿  0￿0 ￿ 0￿0 ￿ 0￿0 We have therefore obtained an inﬁnite family of eigenvectors parametrized by two arbitrary constants. w 2 1 So in this example we have a 3 × 3 matrix with only two distinct eigenvalues. except to say that this does not always happen as in the following example. “linearly independent” means “essentially diﬀerent. Consider the matrix 2 A= 0 ￿ ￿ 1 . and u = 3 d. 2. for example. We write λ = −1.9.

a much more sophisticated algorithm called the QR method. in order to understand eigenvalues. v 0 Therefore the eigenvalue λ = 2. Although the characteristic polynomial is important in theory.48 or 9. computed. the problem of ﬁnding the roots of a high degree polynomial is numerically unstable. For practical computations. if ever. in practice it is very diﬃcult to compute characteristic polynomials for large matrices. the characteristic polynomial is deﬁned as a determinant. is used to ﬁnd eigenvalues and eigenvectors. 0￿0 EXERCISES 1. Even when this can be done. ￿ ￿ 1￿0 ￿ . Example 4: Even worse. a matrix can have no (real) eigenvalues at all. 2 has only one independent eigenvector. we have to know something about determinants. ￿ ￿ 1 1 (a) 0 2 (b)    5 0 2  0 1 0  −4 0 −1 2 1 1 2 2 0  2 0 2 (c) . In fact. In this section we have seen that. For example the matrix ￿ ￿ 0 1 A= −1 0 has characteristic equation λ2 + 1 = 0 which has no real solutions. Find the eigenvalues and eigenvectors of the following matrices. Because of this. which has nothing to do with characteristic polynomials. in practice it is rarely. Eigenvalues ￿ 0 0 So u = c and v = 0 or in vector form ￿ ￿ ￿ ￿ u 1 =c .

7 4 3  0 2 2  2 0 −2  2 −2 0 −2 0  0 −2  0 0 0 0 (e) (f)  0 0 5 −5   3 0 0 3 2.     −3 4 (a) You get  9  and I get  −12 . We get the results below. (d) You get 1 2 0 1 3. (Hint: They have the same characteristic polynomials. then A−1 x = λ x (f) If A is singular.  0  . then A2 x = λ2 x. Suppose you and I are computing eigenvectors. or not. · · ·. · · · .. Explain in what sense we got the same answers.   . (c) A = S  . .  0  . 6 −8         1 0 1 1 (b) You get  1  .  1  . λn . then λ = 0 must be an eigenvalue of A. (c) You get 1 −1 0 0 1         1 0 1 2  1  .)   λ1 λ2   −1  S has eigenvalues λ1 .  1  and I get  2  . 1 1 −1 0           1 0 1 1 1  1  .  1  and I get  2  . Eigenvalues 49 (d)     6 4 4  −7 −2 −1  Hint: Expand in cofactors of the ﬁrst row. (a) A and AT have the same eigenvalues. λn (d) If Ax = λx. 1 (e) If Ax = λx and A is nonsingular.  4  .  1  and I get  2  . Prove the following.) (b) A and BAB −1 have the same eigenvalues. (Hint: They have the same characteristic polynomials. λ2 .9. A3 x = λ3 x.

) 5. · · · . (Hint: Show det(A − λI) = det(B − λI) det(D − λI). then its eigenvalues are its diagonal entries a11 . Eigenvalues ￿ B C 4. If A = is the matrix of Section 8 Exercise 6.50 9. a22 . then show that the 0 D eigenvalues of A are the eigenvalues of B together with the eigenvalues of D. ￿ .   2 0 0 (a)  0 2 0  0 0 2   2 1 0 0 2 0 (b) 0 0 2   2 1 0 0 2 1 (c) 0 0 2 (g) If A is triangular. Find the eigenvalues and associated eigenvectors of each of the following matrices. ann .

  =  λ1 v1 λ2 v2 · · · λn vn  . . . vn . . . . . . . −1 1 1 1 −1 1 1 1 The two eigenvectors can be lined up to form the columns of a matrix S so that the two equations above can be combined into one matrix equation AS = SD where D is the diagonal matrix of eigenvalues: ￿ ￿￿ ￿ ￿ ￿￿ ￿ 4 2 −1 −2 −1 −2 2 0 = . where x1 and x2 are the associated eigenvectors. . λ2 .   . .  λ2   =  v1 v2 · · · vn    . λn with linearly independent associated eigenvectors v1 . . . . . . . . then the equations Av1 = λ1 v1 . . λn     . Suppose the n × n matrix A has eigenvalues λ1 . . . Diagonalization 51 10. DIAGONALIZATION Example 1: Let’s look back at Example 1 of the previous section ￿ ￿ 4 2 A= −1 1 which had two eigenvalues. . . .    · · · vn  =  Av1 Av2 · · · Avn  . . .   . . What just happened in this example is so important that we will illustrate it for the general case. . . . Av2 = λ2 v2 . . −1 1 1 1 1 1 0 3 This equation can be rewritten as A = SDS −1 : ￿ 4 −1 ￿ ￿ ￿￿ 2 −1 −2 2 = 1 1 1 0 0 3 ￿￿ −1 −2 1 1 ￿−1 . . . . . . . . · · · . λ = 2 and λ = 3. ..  . .  . .  A  v1 .10. . . . .  λ1 . Avn = λn vn can be written in matrix form as  . .  . · · · . . . . . . v2 · · · . . we obtain ￿ ￿￿ ￿ ￿ ￿ ￿ ￿￿ ￿ ￿ ￿ 4 2 −1 −1 4 2 −2 −2 =2 and =3 . . . . . v2 . . . . . . . . If we write the two equations Ax1 = 2x1 and Ax2 = 3x2 .

If the answer is no. . . An n × n matrix is diagonalizable if it has n real and distinct eigenvalues. . . and they are used to form the columns of S.      −1 2 3 0 1 0 3 −1 0 0 1 0 3  4 3 0  =  −1 0 4   0 6 0   −1 0 4  0 0 6 0 1 0 0 0 6 0 1 0 is an equally valid factorization. . . . We now give some conditions that insure that a matrix can be diagonalized. . even though there are only two eigenvalues. . 1. v2 · · · . . there are two distinct eigenvalues. each eigenvalue has an associated  0 0 −1 6 0 1 0 6 0 0 0 1 1 0 3 4 −1 . We will ﬁll this gap in Sections 16 and 19. which are the eigenvectors v1 .52 −1 10. −1 . Diagonalization S This last step is possible only if S is invertible. does in fact produce linearly independent eigenvectors. . .  λ1 . Just line up its eigenvectors to form the columns of S and write  2 4 0 3 3 0   0 −1 0 =  1 6 0 0 0 1 −1 1  0 0 0 3 4 This matrix equation is of the form AS = SD. then it is not diagonalizable. The only question is are there enough linearly independent eigenvectors to form a square matrix S? If the answer is yes. λn on the right by . For example. . which is to solve (A − λI)x = 0 by Gaussian elimination. This of course leaves a giant gap in our discussion since at this point we still don’t know what “linear independent” means. . . S will in fact be invertible if its columns.. .  Note that the diagonal factorization of a matrix is not completely unique. Whether or not a matrix can be diagonalized has important consequences for the matrix and what we can do with it. . . If a matrix does not have enough independent eigenvectors. . Example 2: The matrix of Example 2 of the previous section is diagonalizable. Such matrices are called defective. . In Example 1 above.  . .  vn  . then A is not diagonalizable. . . . . In Example 2 above. An n × n matrix is diagonalizable if and only if it has n linearly independent eigenvectors. and A is called diagonalizable. . By multiplying we obtain A = SDS −1 or    . as in Section 9 Examples 3 and 4. then A can be factored into A = SDS −1 where S is invertible and D is diagonal.     v1 v2 · · · A =  v1 v2 · · · vn   . Our method for ﬁnding eigenvectors. λ2  . vn . . .  . . are linearly independent. 2. there are three independent eigenvectors. . It is one of the paramount questions in linear algebra. one for each free variable. .

  0 −2 2 (a)  −2 0 −2  2 2 2 (b)    0 2 2  2 0 −2  2 −2 0 0 2  −2 0 2 −2  2 2 0 (c) 4. (See Section 22. These are the symmetric matrices. Decide which of the following matrices are diagonalizable just by looking at them. fortunately. 9 4 (a) The characteristic polynomial of A. says in part that all symmetric matrices are diagonalizable.10. Write diagonal factorizations for each of the matrices in Section 9 Exercise 1. Unfortunately there is no simple way to do this. and these eigenvectors can be used to form the columns of S since they are independent. Diagonalization 53 eigenvector. It would be helpful if we could decide if a matrix is diagonalizable just by looking at it. but. without having to go through the tedious process of determining if it has enough independent eigenvectors. that eigenvectors associated with distinct eigenvalues are always independent.) A nonsymmetric matrix may or may not be diagonalizable. If A = SDS −1 . many of the matrices that arise in physics and engineering are symmetric and are therefore diagonalizable. But there is an important class of matrices that are automatically diagonalizable.) 3. But why do distinct eigenvalues insure diagonability in general? This follows from the fact. (b) det(A) (c) A (d) The eigenvalues of A2 . (e) det(A2 ) . (See Section 22. If ￿A ￿ 2 × 2 with eigenvalues λ1 = 6 and λ2 = 7 and associated eigenvectors is ￿ ￿ 5 2 v1 = and v2 = . 3. to be proved later. 2. EXERCISES 1. then show An = SDn S −1 . then ﬁnd the following. called The Spectral Theorem. A deep theorem in linear algebra.

) The exponential of the zero matrix is therefore the identity matrix. eA . but A−1 can be thought of as the reciprocal of a matrix (deﬁned only if A is nonsingular). and multiply them. Does this make sense? Let’s try an example: ￿￿ ￿￿ ￿ ￿ ￿ ￿ ￿ ￿2 ￿ 1 0 0 0 0 1 0 0 0 1 exp = + + + ··· = 0 0 0 1 0 0 0 2! 0 0 0 1 ￿ (Note that eA is also written as exp(A). subtract. n! 2! 3! n=0 x This inﬁnite series converges to ex for any value of x and therefore can be taken as the deﬁnition of ex . MATRIX EXPONENTIAL So far we have developed a simple algebra for square matrices.? Under certain conditions we can. because of its importance in applications. 0 e3 It is clear that to exponentiate a diagonal matrix you just exponentiate its diagonal entries. . Is it possible for us to go √ further and give meaning to expressions like A.54 11. . Of course we cannot divide matrices. Does this always happen? Yes! It can be . but. Note that in both computations above the inﬁnite series of matrices converged (trivially in the ﬁrst example). To deﬁne it we use the Taylor series for the real exponential function: ∞ ￿ xn x2 x3 e = =1+x+ + + · · · for − ∞ < x < ∞. Let’s try another example: ￿￿ ￿￿ ￿ ￿ ￿ ￿ ￿ ￿2 ￿ ￿3 1 2 0 1 2 0 2 0 1 0 2 0 exp = + + + + ··· 0 3 0 1 0 3 2! 0 3 3! 0 3 ￿ ￿ ￿ ￿ ￿ 22 ￿ ￿ 23 ￿ 0 0 1 0 2 0 2! 3! = + + + + ··· 2 3 0 1 0 3 0 3 0 3 2! 3! ￿ n  ∞ 2 0 n!   =  n=0 ∞ ￿ 3n  0 n! ￿ 2 ￿ n=0 e 0 = . cos(A). We can add. . ln(A). sin(A). We use it as the starting point for the matrix exponential by simply deﬁning eA = ∞ ￿ 1 1 1 An = I + A + A2 + A3 + · · · n! 2! 3! n=0 for a square matrix A. and therefore expressions like I + 2A − 3A2 + A3 make sense. we will focus only on the matrix exponential eA . Matrix Exponential 11.

Accepting this. (We omit the proof.11. we still have the problem of how to compute eA for more complicated matrices than those in the two previous examples. The ﬁrst is that if AB = BA then eA+B = eA eB .) The second helpful property of matrix exponentials is that if A = SDS −1 then eA = SeD S −1 . (We omit the proof. then for these matrices the matrix exponential satisﬁes the familiar law of exponents.) n! n=0 ￿∞ ￿ ￿ 1 =S Dn S −1 n! n=0 = SeD S −1 Given the diagonal factorization ￿ ￿ ￿ 4 −5 1 = 2 −3 1 5 2 ￿￿ −1 0 0 2 ￿￿ 1 1 5 2 ￿−1 . if A and B commute.) This just says that. 0 e2 (Don’t forget to ﬁrst show the two matrices above commute in order to justify the use of the law of exponents. We use this fact to compute the following: ￿￿ ￿￿ ￿￿ ￿ ￿ ￿￿ 2 3 2 0 0 3 exp = exp + 0 2 0 2 0 0 ￿￿ ￿￿ ￿￿ ￿￿ 2 0 0 3 = exp exp 0 2 0 0 ￿ ￿ 2 ￿ ￿￿ ￿ ￿ ￿ ￿ ￿2 1 0 3 e 0 1 0 0 3 = + + + ··· 0 e2 0 1 0 0 2! 0 0 ￿ 2 ￿ ￿￿ ￿ ￿ ￿ ￿ ￿ ￿ 1 0 0 e 0 1 0 0 3 + ··· = + + 0 e2 0 1 0 0 2! 0 0 ￿ 2 ￿￿ ￿ e 0 1 3 = 0 e2 0 1 ￿ 2 ￿ e 3e2 = . The proof is so simple we exhibit it here: ∞ ￿ 1 e = (SDS −1 )n n! n=0 A ∞ ￿ 1 = SDn S −1 (See Section 10 Exercise 2. We can use two properties of the matrix exponential to help us. Matrix Exponential 55 shown that the inﬁnite series for eA converges for any square matrix A whatever.) Therefore eA exists for any square matrix A.

. Matrix Exponential we can therefore immediately write down exp ￿￿ 4 −5 2 −3 ￿￿ 1 = 1 ￿ 5 2 ￿￿ e−1 0 0 e2 ￿￿ 1 1 5 2 ￿−1 . The proof follows: dt ∞ d At d ￿ 1 e = (At)n dt dt n=0 n! = = ∞ ￿ 1 An ntn−1 n! n=1 ∞ ￿ ∞ ￿ ∞ d ￿ 1 n n A t dt n=0 n! =A =A 1 An−1 tn−1 (n − 1)! n=1 1 (At)n−1 (n − 1)! n=1 ∞ ￿ 1 =A (At)n n! n=0 = AeAt . We could multiply out the right-hand side. But this is easy for diagonalizable matrices like the one above since ￿ and therefore exp ￿￿ ￿ ￿ ￿ 4 −5 1 t )= 2 −3 1 5 2 ￿￿ e−t 0 0 e2t ￿￿ 1 1 5 2 ￿−1 . ￿ ￿ 4 −5 1 t= 2 −3 1 5 2 ￿￿ −t 0 0 2t ￿￿ 1 1 5 2 ￿−1 There is one more property of matrix exponentials that we will need in applid at cations. In applications to ODE’s we will need to compute matrix exponentials of the form eAt . For the matrix dt d exponential it is just eAt = AeAt . if A doesn’t have a diagonalization factorization. If A is defective. then there are more sophisticated ways to compute eA . that is. or we might just want to leave it in this form. We will not pursue them here. It is analogous to the derivative formula e = aeat .56 11.

2. . If Av = λv. ￿￿ ￿￿ ￿ 0 β cos β (a) exp = −β 0 − sin β exponential. e2t d 4. Find eAt where A is equal to each of the matrices of Section 9 Exercise 1. dt ￿ ￿ 2 0 (a) 0 3 (b) ￿ 2 0 3 2 ￿ 5. 3. then show eA v = eλ v. Find eA where A is equal to each of the matrices of Section 9 Exercise 1.) 6.) (b) exp ￿￿ α −β β α ￿￿ eα cos β = −eα sin β ￿ sin β cos β ￿ (Use the series deﬁnition of the matrix ￿ eα sin β eα cos β (Use the law of exponents. Prove the following equalities. 7.11. Matrix Exponential 57 EXERCISES 1. Prove (eA )−1 = e−A and conclude that eA is nonsingular for any square matrix A. Verify the formula eAt = AeAt where A is equal to the following matrices. Show exp ￿￿ 2 0 ￿ ￿ ￿ 2t 3 e t = 2 0 ￿ 3te2t .

58

12. Diﬀerential Equations

12. DIFFERENTIAL EQUATIONS We recall the diﬀerential equation y = ay that governs exponential growth and ˙ at decay. The general solution is y(t) = Ce . This fact will serve as a model for all that follows. Example 1: Suppose we want to solve the following linear system of ﬁrst-order ordinary diﬀerential equations with initial conditions:. x = 4x − 5y ˙ y = 2x − 3y ˙ We can write this system in matrix notation as ￿ ￿ ￿ ￿￿ ￿ x ˙ 4 −5 x = y ˙ 2 3 y ￿ ￿ x(t) or letting u(t) = we can write it as y(t) ￿ ￿ 4 −5 u= ˙ u 2 3 x(0) = 8 y(0) = 5 ￿ ￿ ￿ ￿ x(0) 8 = y(0) 5 ￿

￿ 8 u(0) = . 5

If we let A be the matrix deﬁned above (called the coeﬃcient matrix), then the system becomes simply u = Au. The solution of the system u = Au with initial con˙ ˙ At dition u(0) is u(t) = e u(0). This fact follows immediately from the computations d At (e u(0)) = A(eAt u(0)) and eA0 u(0) = Iu(0) = u(0). For the example above, the dt solution would just be ￿￿ ￿ ￿￿ ￿ 4 −5 8 u(t) = exp t . 2 3 5 Since the coeﬃcient matrix has the diagonal factorization ￿ ￿ ￿ ￿￿ ￿￿ ￿−1 4 −5 1 5 −1 0 1 5 = , 2 −3 1 2 0 2 1 2 we have 1 u(t) = 1 ￿ 5 2 ￿￿ e−t 0 0 e2t ￿￿ 1 1 5 2

To ﬁnd the ﬁnal solution it looks like we are going to have to compute an inverse. But in fact this can be avoided by writing ￿ ￿−1 ￿ ￿ ￿ ￿ 1 5 8 c = 1 1 2 5 c2 ￿

−1 ￿ ￿ 8 . 5

12. Diﬀerential Equations

59

as

which is just a linear system. Solving by Gaussian elimination we obtain ￿ ￿ c1 c2 ￿ ￿ ￿ 3 = . 1 ￿

1 1

5 2 ￿￿

c1 c2 ￿ ￿

￿ 8 = , 5

And putting this back into u(t) we get ￿￿ ￿￿ ￿ 1 5 e−t 0 3 u(t) = 1 2 0 e2t 1 ￿ ￿￿ ￿ 1 5 3e−t = 1 2 e2t ￿ −t ￿ . 3e + 5e2t = 3e−t + 2e2t ￿ ￿ ￿ ￿ −t 1 2t 5 = 3e +e 1 2 The solution in terms of the individual functions x and y is x(t) = 3e−t + 5e2t y(t) = 3e−t + 2e2t . If no initial conditions are given, then c1 and c2 would have to be carried through to the end. The solution would then look like ￿ ￿￿ ￿￿ ￿ 1 5 e−t 0 c1 u(t) = 2t 1 2 0 e c2 ￿ ￿ c e−t + 5c2 e2t = 1 −t c1 e + 2c2 e2t ￿ ￿ ￿ ￿ −t 1 2t 5 = c1 e + c2 e . 1 2 We have expressed the solution in matrix form and in vector form. Note that the vector form is a linear combination of exponentials involving the eigenvalues times the associated eigenvectors. In fact if we set t = 0 in the vector form, then from the initial conditions we obtain ￿ ￿ ￿ ￿ ￿ ￿ 1 5 8 c1 + c2 = 1 2 5

60 or

12. Diﬀerential Equations

which is the same system for the c’s that we obtained above. So the vector form of the solution carries all the information we need. This suggests that we really don’t need the matrix factorization at all. To ﬁnd the solution to u = Au, just ﬁnd the ˙ eigenvalues and eigenvectors of A, and, assuming there are enough eigenvectors, write down the solution in vector form. Example 2: Let’s try another system:    x ˙ 2 3 y  = 4 3 ˙ z ˙ 0 0 Since the coeﬃcient  2 3 4 3 0 0   0 x 0  y . 6 z ￿

1 1

5 2 ￿￿

c1 c2 ￿ ￿

￿ 8 = , 5

we can immediately write down the solution as     −t  −1   x −1 0 3 e 0 0 −1 0 3 x(0)  y  =  1 0 4  0 e6t 0  1 0 4   y(0)  z 0 1 0 0 0 e6t 0 1 0 z(0)   −t   −1 0 3 e 0 0 c1  1 0 4  0 e6t 0  c2  = 0 1 0 0 0 e6t c3    −1 0 3 c1 e−t =  1 0 4  c2 e6t  c3 e6t 0 1 0       −1 0 3 = c1 e−t  1  + c2 e6t  0  + c3 e6t  4 . 0 1 0

matrix has the diagonal factorization     0 −1 0 3 −1 0 0 −1  =  1 0 4  0 6 0  1 0 0 6 0 1 0 0 0 6

−1 0 3 0 4 1 0

Since no initial conditions were given, we have arbitrary constants in the solution. Note that once we recognize the general form of the solution, we can just write it down without going through the matrix exponential at all. In general, it is clear that if A is diagonalizable, that is, if it has eigenvalues λ1 , λ2 , · · · , λn and independent eigenvectors v1 , v2 , · · · , vn , then the solution to u = Au has the form ˙ u(t) = c1 eλ1 t v1 + c2 eλ2 t v2 + · · · + cn eλn t vn .

The factor tn ultimately has no eﬀect. In this case the matrix A is called stable.) A term of the form te2t has appeared. terms of the form tn eλt arise. In this case the matrix A is called neutrally stable. and therefore all solutions stay bounded as t → ∞. the deﬁnitions of stable. If all the eigenvalues are negative or zero with at least one actually equal to zero. If all the eigenvalues are negative. then the solutions consist of linear combinations of dying exponentials and at least one constant function. If at least one eigenvalue is positive. In general. There is also a third possibility. and therefore u(t) → 0 as t → ∞. EXERCISES 1. Find the general solution of u = Au where A is equal to each of the matrices in ˙ Section 9 Exercise 1. then there are solutions u(t) containing at least one growing exponential and therefore those u(t) → ∞ as t → ∞. Diﬀerential Equations 61 It is also clear that the eigenvalues decide how the solutions behave as t → ∞. (Actually a more precise statement has to be made in the case that zero is a multiple eigenvalue. then all the solutions consist only of linear combinations of dying exponentials. 2.         ￿ ￿ ￿ ￿ x(0) 1 x(0) 0 x(0) 3  y(0)  =  2   y(0)  =  1  (a) = (b) (c) y(0) 2 z(0) −3 z(0) 3 . Note that this term does not change the qualitative nature of the solution u(t) as t → ∞. It can be shown that this behavior holds for all defective matrices.12. but we will ignore this possibility. In this case the matrix A is called unstable . That is. Find the solutions of the systems above with the initial conditions below. unstable. This is typical of defective systems. y(0) (See Section 11 Exercise 3. but what about defective matrices? Consider the following example: ￿ ￿ ￿ ￿￿ ￿ x ˙ 2 3 x = . This is the situation with both systems above. y ˙ 0 2 y The solution is ￿ ￿ ￿￿ x 2 = exp y 0 ￿ ￿￿ ￿ ￿ 2t 3 x(0) e t = 2 y(0) 0 3te2t e2t ￿￿ ￿ x(0) . All this is clear enough for diagonalizable matrices. and neutrally stable and their implications about the long-term behavior of solutions hold for these matrices also. The eigenvalues therefore determine the qualitative nature of the solution.) All of this will become clearer when we consider the Jordan form of a matrix in a later section. but they tend to zero or inﬁnity as t → ∞ depending on whether λ is negative or positive.

This is just a ˙ simple system of n individual of ODE’s of the form w1 = λ1 w1 . ￿ ￿ 44 −28 (a) 77 −49 (b) (c) ￿ ￿ 47 75 8 15 −30 −48 −6 −11 ￿ ￿ 4. w2 (t) = c2 eλ2 t . but it does not generalize so easily to the complex case or the case of defective matrices. This alternate approach avoids the matrix exponential. Here is another way to derive the general form of the solution of the system u = Au. wn (t) = cn eλn t . cn eλn t and conclude that the solution of the original system is u(t) = Sw(t) = c1 eλ1 t v1 + c2 eλ2 t v2 + · · · + cn eλn t vn .  . wn = ˙ ˙ ˙ λn wn . assuming the diagonal factorization A = SDS −1 . Decide the stability properties of the following matrices.62  12. w2 = λ2 w2 . Diﬀerential Equations   x(0) 0 (d)  y(0)  =  0  z(0) 1    x(0) 4 (e)  y(0)  =  3  z(0) 4      x(0) 2  y(0)   2  (f)  =  z(0) 1 w(0) 2  3. Make the change of ˙ variables w = S −1 u. · · · . and show that the system then becomes w = Dw. that is. Write this as  c1 eλ1 t  c2 eλ2 t  w(t) =  . · · · . the eigenvectors of A.  . where the v’s are the columns of S.   . These equations are well-known to have solutions w1 (t) = c1 eλ1 t .

but tedious. In both cases above we multiplied the numerator and denominator by the complex conjugate of the denominator. and not just for theoretical reasons. For example: (2 + i) + (3 − i2) = 5 − i Dividing complex numbers is a little more troublesome.13. Complex conjugation commutes with multiplication. but don’t try to give any metaphysical meaning to it!) If z = a + ib. Complex numbers are added and multiplied much like real numbers. and so on carry over to complex matrices. Recall that a complex number has the form a + ib where a and b are real numbers and i is a quantity that satisﬁes the equation i2 = −1. 1 1 . Now we can go to work. that is. Does this mean that we have to consider complex eigenvalues. diagonalization. Gaussian elimination. but you must keep in mind that i2 = −1. It is possible. We use the same trick to divide two complex numbers: 2+i 2 + i (3 + i2) 6 + i4 + i3 − 2 4 + i7 4 7 = = = = +i 3 − i2 3 − i2 (3 + i2) 9+4 13 13 13 We say that the complex conjugate of a complex number a + ib is a − ib and write a + ib = a − ib. The complex case is essential in solving linear systems of diﬀerential equations that describe oscillations. (You can think of i as denoting √ −1. When we considered real systems Ax = b. First we take the reciprocal of a complex number: 1 1 (3 + i2) 3 + i2 3 2 = = = +i 3 − i2 3 − i2 (3 + i2) 9+4 13 13 (2 + i)(3 − i2) = 6 − i4 + i3 + 2 = 8 − i We just multiplied the numerator and denominator by 3 + i2. But in the eigenvalue problem we have seen that there are real matrices whose characteristic equations have complex roots. and complex diagonal factorizations? The answer is yes. determinants. We can deﬁne complex matrices in the same way as real matrices. There was no need to consider complex numbers. Two complex numbers are equal if and only if their real and imaginary parts are equal. to show that the algebra of matrices. inverses. Let’s consider the eigenvalue problem for the matrix ￿ ￿ 3 −2 A= . THE COMPLEX CASE We can no longer avoid complex numbers. the solution x was automatically real. eigenvalues and eigenvectors. The Complex Case 63 13. First we give a brief review of the most basic facts about complex numbers. complex eigenvectors. wz = w z (Exercise 1). then a is the real part of z and b is the imaginary part of z.

(In fact. complex arithmetic is involved. so this isn’t something we would want to do for large systems. however.64 13.) Since we are now in the complex world. For the eigenvalue 2 − i the computation is 1 ￿ ￿ 1−i almost the same.) Solving two ￿ 1+i this we obtain the eigenvector . Of course. How can this be? Somehow or other. The Complex Case We compute the characteristic equation in the usual way and obtain λ2 − 4λ + 5 = 0. The roots are 2 + i and 2 − i. 0 0 ￿0 (In the second step we exchanged the ￿ rows to avoid a complex division.) Now we simply line up these vectors in the usual way and obtain ￿ ￿￿ ￿ ￿ ￿￿ ￿ 3 −2 1+i 1−i 1+i 1−i 2+i 0 = . use Gaussian elimination to solve the system (A − (2 + i)I)x = 0: ￿ ￿ ￿ ￿0 3 − (2 + i) −2 ￿ 1 1 − (2 + i) ￿ 0 ￿ ￿ ￿ 1−i −2 ￿ 0 ￿ 1 −1 − i ￿ 0 ￿ ￿ ￿ 1 −1 − i ￿ 0 ￿ 1−i −2 ￿ 0 ￿ ￿ ￿ 1 −1 − i ￿ 0 ￿ . we can consider these two complex numbers as the eigenvalues of A. a complex conjugate pair. 1 1 1 1 0 2−i 1 1 Everything worked exactly as in the real case. First we take the eigenvalue 2 + i and. See Exercise 2. It turns out that it is possible to transform the complex diagonal factorization into one which is real and almost diagonal. and we obtain the eigenvector . something about this factorization that is troubling. To describe this we introduce some . Now let’s look for the eigenvectors. all the imaginary parts of the complex numbers appearing in them must cancel out! From this we might suspect that it shouldn’t really be necessary to introduce complex numbers in order to obtain a useful factorization of a real matrix. when the three matrices on the right are multiplied out. as usual. 1 1 1 1 1 1 0 2−i and therefore we have the complex diagonal factorization ￿ ￿ ￿ ￿￿ ￿￿ ￿−1 3 −2 1+i 1−i 2+i 0 1+i 1−i = . but at least the same principles hold. There is. all the complex roots of any real polynomial equation occur in complex conjugate pairs. (Note that this vector is 1 the complex conjugate of the previous eigenvector. The three matrices on the right are complex and the matrix on the left is real.

We can ignore them. The middle factor is no longer diagonal. The Complex Case 65 notation. (These are not so easy to compute by hand since the characteristic polynomial of B is of fourth degree. Everything on the right side is real. Then the second eigenvalue and associated eigenvector are λ = 2 − i 1￿ ￿ 1−i and v = . Since complex numbers are equal if and only if their real and imaginary parts are equal. When multiplied out it becomes Ax + iAy = (αx − βy) + i(βx + αy). Let  −2 −2 −2 −2 0 −2 −1   1 B= .) Let’s look at another example. −1 + i. Therefore for the matrix of our example we obtain ￿ ￿ 3 −2 1 = 1 1 1 1 0 ￿￿ 2 −1 1 2 1 1 ￿−1 . 0 0 1 −2 0 0 1 3  The eigenvalues of B are 2 + i.13. (The question of the independence of the vectors x and y will be settled in Section 16 Exercise 7. this equation implies that Ax = αx − βy and Ay = βx + αy. but it exhibits the real and imaginary parts of the eigenvalue in a nice pattern. This our desired factorization. In array form ￿   −4 − i −2 −2 −2 ￿ 0 ￿ −2 − i −2 −1 ￿ 0   1 ￿   0 0 −1 − i −2 ￿ 0 ￿ 0 0 1 1−i 0 .) First we ﬁnd the eigenvector associated with 2 + i by solving the system (B − (2 + i)I)x = 0. Then the basic equation Av = λv can be written A(x+iy) = 1 0 (α + iβ)(x + iy). These two equations can be written simultaneously in matrix form as ￿ ￿ ￿ ￿￿ ￿ x1 y1 x1 y1 α β A = x2 y2 x2 y2 −β α or x1 A= x2 ￿ ￿ y1 y2 ￿￿ α −β β α ￿￿ x1 x2 ￿￿ y1 y2 ￿−1 1 0 . −1 − i. We write the ﬁrst eigenvalue and associated eigenvector as λ = 2 + i and ￿ ￿ 1+i v= . 2 − i. Now identify the real and imaginary parts of λ and v as λ = α + iβ = 2 + i and ￿ ￿ ￿ ￿ 1 1 v = x+iy = +i . Clearly they are just complex conjugates of the ﬁrst eigenvalue 1 and eigenvector and therefore don’t add any new information.

In array form ￿  −1 − i −2 −2 −2 ￿ 0 ￿ 1 − i −2 −1 ￿ 0   1 ￿   0 0 2 − i −2 ￿ 0 ￿ 0 0 1 4−i 0    1 0  0 0 1−i 0 0 0 0 1 0 0 ￿  0￿0 ￿ 0￿0 ￿  1￿0 ￿ 0 0 by Gaussian elimination becomes  −1 + i  1  which gives the eigenvector  . All we have to do 0 0 now is write down the answers. The Complex Case by Gaussian elimination becomes   1 0  0 0 0 1 0 0 0 0 1 0 ￿  0 ￿0 ￿ i ￿0 ￿  1 − i￿0 ￿ 0 0  0  −i  which gives the eigenvector  . we ﬁnd the eigenvector associated −1 + i 1 with −1 + i by solving the system (B − (−1 + i)I)x = 0. Similarly. −1 + i −1 − i 0 0 1 1 0 0  .66 13. We are essentially done. The complex diagonal factorization is  0 0 −1 + i −1 − i i 1 1   −i B=  −1 + i −1 − i 0 0 1 1 0 0   2+i 0 0 0 2−i 0 0   0   0 0 −1 + i 0 0 0 0 −1 − i  −1 0 0 −1 + i −1 − i i 1 1   −i   .

We solve the system in the same way as above using the real block-diagonal factorization of B and obtain . 0 −1 1 −1 1 0 0 0 −1 −1 1 0 0 0 ￿ ￿ α + iβ 0 Note how each of the two complex conjugate eigenvalues in the 0 ￿ α − iβ ￿ α β diagonal matrix of the ﬁrst factorization expand to 2 × 2 blocks in the −β α diagonal-like matrix of the second factorization. = 2t 2t 1 0 −e sin t e cos t c2 ￿ ￿￿ ￿ 1 1 c1 e2t cos t + c2 e2t sin t = 1 0 −c1 e2t sin t + c2 e2t cos t ￿ ￿ ￿ ￿ 1 1 2t 2t = e (c1 cos t + c2 sin t) + e (−c1 sin t + c2 cos t) 1 0 Now consider the larger system w ˙ x ˙ y ˙ z ˙ = − 2w − 2x − 2y − 2z = w − 2y − z = y − 2z = y + 3z. From now on we will call such diagonal-like matrices block diagonal matrices.13. we get ￿ ￿ ￿￿ ￿ ￿￿ ￿ x(t) 3 −2 x(0) = exp t y(t) 1 1 y(0) ￿ ￿ ￿￿ ￿ ￿￿ ￿−1 ￿ ￿ 1 1 2 1 1 1 x(0) = exp t 1 0 −1 2 1 0 y(0) ￿ ￿￿ 2t ￿￿ ￿ 1 1 e cos t e2t sin t c1 . Using the real block-diagonal factorization of A computed above and the result of Section 11 Exercise 5(b). The Complex Case 67 And the corresponding real diagonal-like factorization is  −1 1 0 0 0 0 −1 1 2 0 0   0 −1 1 0    .  0 0 −1 1 2  0 −1 1 0   −1 B=  −1 1 0 0 0 1 0 0 0 0  The coeﬃcient matrix is just B of the second example. Now we apply all this to solving diﬀerential equations. To solve the system we have to compute eAt . Suppose we have the following system: x = 3x − 2y ˙ y= x+ y ˙ The coeﬃcient matrix is just A of the ﬁrst example above.

0 0 0 0 2  −1 exp  0 0    1 0 0 2 0 0    t 0 −1 1 0 −1 −1  (The third equality requires a slight generalization of Section 11 Exercise 5(b).68  13. then so is λ = α − iβ. If λ = α + iβ.) Now we can see the pattern. v = x − iy. The Complex Case   w(t) −2  x(t)   1   = exp  y(t) 0 z(t) 0   0 0 −1 1  0 −1 1 0  =  −1 1 0 0 1 0 0 0    −2 −2 −2 w(0) 0 −2 −1    x(0)   t   0 1 −2 y(0) 0 1 3 z(0) −1   0 0 −1 1 w(0)  0 −1 1 0   x(0)      −1 1 0 0 y(0) 1 0 0 0 z(0)    2t   0 0 −1 1 e cos t e2t sin t 0 0 c1 2t 2t 0 0  0 −1 1 0   −e sin t e cos t   c2  =    −1 1 0 0 0 0 e−t cos t e−t sin t c3 −t −t 1 0 0 0 0 0 −e sin t e cos t c4    0 0 −1 1 c1 e2t cos t + c2 e2t sin t  0 −1 1 0   −c1 e2t sin t + c2 e2t cos t  =   −1 1 0 0 c3 e−t cos t + c4 e−t sin t 1 0 0 0 −c3 e−t sin t + c4 e−t cos t     0 0  0   −1  = (c1 e2t cos t + c2 e2t sin t)   + (−c1 e2t sin t + c2 e2t cos t)   −1 1 1 0     −1 1 1   0 +(c3 e−t cos t + c4 e−t sin t)   + (−c3 e−t sin t + c4 e−t cos t)   . and they together will contribute terms like · · · + (c1 eαt cos βt + c2 eαt sin βt)x + (−c1 eαt sin βt + c2 eαt cos βt)y + · · · to the solution. When t = 0 these terms become · · · c1 x + c2 y + · · · and are equated to the initial conditions. . v = x + iy is a complex eigenvalueeigenvector pair for the coeﬃcient matrix. Terms of the form eαt cos βt and eαt sin βt describe oscillations.

In order to use the machinery that we have built up. The eigenvalues are λ = ±iω. Finally we present an application that describes vibrations in mechanical and electrical systems. divide by m and let ω 2 = k/m. In modeling mass-spring systems. x where m = the mass. you know a lot about the behavior of the solutions of that system without actually solving it. Therefore. k = the spring constant. Newton’s second law of motion and Hooke’s law lead to the second-order diﬀerential equation m¨(t) + kx(t) = 0. Just as in the real case. The real part α of the eigenvalue determines whether the oscillations grow without bound or die out. The solution of the system is iω 0 ω therefore ￿ ￿ ￿ ￿ ￿ ￿ y1 (t) 1 0 = (c1 cos ωt + c2 sin ωt) + (−c1 sin ωt + c2 cos ωt) .13. so the equation becomes x + ω 2 x = 0. We then obtain the system ˙ y 1 = y2 ˙ y 2 = − ω 2 y1 ˙ ￿ y1 ˙ y2 ˙ ￿ 0 = −ω 2 ￿ 1 0 ￿￿ ￿ y1 . y2 or in matrix form To solve the system we have to diagonalize the coeﬃcient matrix. and is neutrally stable if all of its eigenvalues have nonpositive real parts with at least one with real part actually equal to zero. is unstable if one of its eigenvalues has positive real part. The Complex Case 69 The imaginary part β of the eigenvalue controls the frequency of the oscillations. Using Gaussian elimination to solve (A − iωI)x = 0 ￿ ￿ ￿ −iω 1 ￿0 ￿ −ω 2 −iω ￿ 0 ￿ ￿ ￿ −iω 1 ￿ 0 ￿ 0 0￿0 ￿ ￿ ￿ ￿ ￿ ￿ 1 1 0 we obtain the eigenvector = +i . and x(t) = the displacement of the mass as a function of time. It is still possible by more general kinds of factorizations to compute exponentials of such matrices. y2 (t) 0 ω . In systems of diﬀerential equations such matrices will produce solutions containing terms of the form tn eαt cos βt and tn eαt sin βt. To do this let y1 = x and y2 = x. the factor of tn doesn’t have any eﬀect on the long-term qualitative behavior of such solutions. For simplicity. ¨ we have to cast this second-order equation into a ﬁrst-order system. Stability or instability and the oscillatory behavior of the solutions is still determined by the eigenvalues. We can therefore extend the language of the real case and say that a matrix is stable if all of its eigenvalues have negative real parts. if you know the eigenvalues of a system of diﬀerential equations. What about defective matrices? These are matrices with repeated complex eigenvalues that do not provide enough independent eigenvectors with which to construct a diagonalization.

Show if A is real and Av = λv. the real block-diagonal factorizations. v. ￿ ￿ 9 −10 (a) 4 −3   −1 0 3 (b)  −5 1 1  −3 0 −1 5. then Av = λv. This is the mathematical representation of simple harmonic motion. Find the complex diagonal factorizations. (a) (b) x = 9x − 10y ˙ y = 4x − 3y ˙ x=− x ˙ + 3z y = − 5x + y + z ˙ z = − 3x ˙ − z 6. Find the general solutions of the following systems of diﬀerential equations. α 4. Conclude that if λ. then so is λ. and the stability of the following matrices. The Complex Case It follows that the solution of the original problem is x(t) = y1 (t) = c1 cos ωt + c2 sin ωt. ￿ ￿ ￿ ￿ x(0) 3 (a) = y(0) 1     x(0) −2 (b)  y(0)  =  −1  z(0) 3 .70 13. Find the eigenvalues of the matrix ￿ α −β ￿ β . EXERCISES 1 1. Find the solutions of the systems in Exercise 5 with the following initial conditions. 2. Verify = a + ib ￿ a 2 + b2 a ￿ +i ￿ −b 2 + b2 a ￿ and (a + ib)(c + id) = (a + ib)(c + id). v is a complex eigenvalue-eigenvector pair for A. 3.

A3 u0 . u1 . . A2 u0 . λ2 . Diﬀerence equations are discrete analogues of diﬀerential equations.14. and (2) ﬁnd an explicit formula for uk in terms of u0 . · · · . Diﬀerence Equations and Markov Matrices 71 14. then we can use the fact that Ak = SDk S −1 (Section 10 Exercise 2). λn and associated eigen- . · · ·. ecology. The general relationship between consecutive terms of this sequence is expressed as a diﬀerence equation: uk = Auk−1 . To this end we observe that u1 = Au0 u2 = Au1 = A(Au0 ) = A2 u0 u3 = Au2 = A(A2 u0 ) = A3 u0 . biology. . u2 . They occur in a wide variety of applications and are used to desacribe relationships in physics. DIFFERENCE EQUATIONS AND MARKOV MATRICES In this section we investigate how eigenvalues can be used to solve diﬀerence equations. The basic challenge posed by a diﬀerence equation is to describe the behavior of the sequence u0 . The problem of ﬁnding the solution uk = Ak u0 of the diﬀerence equation at the kth stage then reduces to computing the matrix Ak and determining its behavior as k becomes large. so the sequence becomes u0 . then the following inﬁnite sequence of column vectors can be generated: u1 = Au0 u2 = Au1 u3 = Au2 . Au0 . Let A have eigenvalues λ1 . Let A be an n × n matrix and u0 be an n × 1 column vector. . engineering. . Speciﬁcally. · · ·. Suppose A has the diagonal factorization A = SDS −1 . chemistry. u3 . (1) determine if the sequence has a limit and if so then ﬁnd it. and demographics.

cn λk n  . In general.5 k ￿ ￿ ￿ ￿ 2 2 and u0 = . . . .  . .  λk n  c1   c2   . Then from the general solution uk = c1 λk v1 + c2 λk v2 + 1 2 · · · + cn λk vn it is clear that the behavior of uk as k → ∞ is determined by the size n of λ1 . (Note its similarity to the general solution of a system of ODE’s in Section 12.  . the long-term behavior of uk is determined by the largest λi for which ci ￿= 0. λk 2 .       = c1 λk  v1  + c2 λk  v2  + · · · + cn λk  vn  1 2 n . v2 . .   · · · vn    . . . . 0 Example 1: Find uk = A u0 where A = −.72 14. . Since A has the 2.  λ1 . · · · . (We are assuming that c1 ￿= 0. . Diﬀerence Equations and Markov Matrices vectors v1 . . . may have a limit |λ1 | > 1 ⇒ uk blows up. . . .5 5 . 1 1 . = c1 λk v1 + c2 λk v2 + · · · + cn λk vn 1 2 n This is then the general solution of the diﬀerence equation. . . To determine the long-term behavior of uk . which can be solved by Gaussian elimination.  c2 λk  2   =  v1 v2 · · · vn   ..  =  v1 . .  k . . .) The c’s are determined by the equation c = S −1 u0 . .   . . .   . . .  . . and let c = S −1 u0 .  .   . which is again Sc = u0 . . . . . then uk = Ak u0 = SDk S −1 u0 = SDk c  . |λ1 | < 1 ⇒ uk → 0 |λ1 | = 1 ⇒ uk bounded.) We now illustrate these ideas with the following examples. . cn  c λk   . . . To be speciﬁc. . v2 . . vn . We can avoid the taking of an inverse by multiplying this equation by S to obtain the linear system Sc = u0 . . . This can also be seen by letting k = 0 in the general solution to obtain u0 = c1 v1 + c2 v2 + · · · + cn vn . let the eigenvalues be ordered so that |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn |.   . . . . .

14.5 5 ￿ ￿￿ ￿k ￿ ￿−1 ￿ ￿ 2 4 2 0 2 4 2 = 2 1 0 . Example 2: Each year 2/10 of the people in California move out and 1/10 of the people outside California move in. we could have written down the solution in this form as soon as we knew the eigenvaluse and eigenvectors.9 2 −1 0 . This is.1 1 1 1 0 1 1 = . We really didn’t need the diagonal factorization. (Of course.5 we have ￿ ￿ ￿ ￿ ￿￿ ￿￿ 2 2 4 2 0 2 = 2.2 .8Ik + .5)k . Diﬀerence Equations and Markov Matrices 73 diagonal factorization 0 A= −.5 2 1 0 . We only have to make sure that there are enough independent eigenvectors ￿ insure ￿ ￿ ￿ to ￿ ￿ 2 4 c1 2 that the diagonal factorization exists.5)k It is also clear that uk becomes unbounded as k → ∞.1Ok Ik+1 .5 2 4 1 ￿−1 . 2 1 6(2)k − (.2Ik + . The relationship between the populations in successive years is given by ￿ ￿ ￿ ￿￿ ￿ Ik+1 = .8 . of course.) And since the system = 2 1 c2 5 has the solution c1 = 3. As usual we ﬁnd the diagonal factorization of A ￿ ￿ ￿ ￿￿ ￿￿ ￿−1 . 2 1 ￿ Ik The problem is to ﬁnd the population distribution uk = and to determine if Ok it tends to a stable limit.2 .5) = .5 2.9Ok ￿ ￿k ￿ ￿ 0 2 2 uk = −. c2 = −1.7 2 −1 . Let Ik and Ok be the numbers of people inside and outside California in the kth year.8 .1 Ik or = . we obtain ￿ ￿ ￿ ￿ ￿ ￿ 6(2)k − 4(. The initial populations are I0 = 20 million and O0 = 202 million.9 Ok Ok+1 = . Ok+1 .5k c2 ￿ ￿ ￿ ￿ 2 4 = c1 (2)k + c2 (.5 2 1 5 ￿ ￿￿ k ￿￿ ￿ 2 4 2 0 c1 = 2 1 0 . the problem of solving the diﬀerence equation uk = Auk−1 where A is the matrix above.5)k k 2 k 4 uk = 3(2) + (−1)(.

and economic processes: (1) the total quantity in question is always constant. As a consequence of these two properties. so that clearly Ak u0 → c1 v1 . We cannot prove this theorem completely with the tools we have developed so far. we have AT v = v where v is the column vector consisting only of one’s. We can then write the solution as ￿ ￿ ￿ ￿ 1 1 k uk = 74(1) − 54(.7)k k ￿ 1 1 2 −1 ￿￿ c1 c2 ￿ 20 = 202 ￿ ￿ This is then the population distribution for any year. Any Markov matrix A has the following properties.7) = . (c) If any power of A has all entries positive. 148 + 54(. Then as usual we have Ak u0 = c1 (1)k v1 +c2 λk v2 +· · ·+cn λk vn . which we state but do not prove in the following theorem. Second. This is exactly n 2 what happened in the example above. note that the columns of the matrix A above are nonnegative and add to one. but we can make parts of it plausible. Note that as k → ∞ the ￿ ￿ 74 population distribution tends to . assume A has a diagonal factorization and λ2 .7) 2 −1 ￿ ￿ k 74 − 54(. Any matrix with nonnegative entries whose columns add to one is called a Markov matrix and the process it describes is called a Markov process. since the columns of A sum to one. Theorem. λn all have absolute value < 1. In terms of . Note also that. Markov matrices have several important properties. we have A(c1 v1 ) = c1 v1 . First. We therefore say c1 v1 is a stable distribution or it represents a steady state. Diﬀerence Equations and Markov Matrices and solve the system to obtain c1 = 74 and c2 = −54. and all the people outside California have to either move in or remain outside (⇒ the second column adds to one). (a) All the eigenvalues of A satisfy |λ| ≤ 1. (b) λ = 1 is always an eigenvalue and there exists an associated eigenvector v1 with all entries ≥ 0.74 14. · · · . This means that one is an eigenvalue of AT and therefore of A also since both matrices have the same eigenvalues (Section 9 Problem 3(a)). then multiples of v1 are the only eigenvectors associated with λ = 1 and Ak u0 → c1 v1 for any u0 . since the limiting vector c1 v1 is a multiple of the eigenvector associated with λ = 1. 148 This example exhibits two esssential properties that hold in many chemical. biological. and (2) the individual quantities are never negative. This can be interpreted as saying that each year all the people inside California have to either remain side or move out (⇒ the ﬁrst column adds to one).

9 74 initial population distribution is .25 .25 0 . and the trucks in Chicago are split evenly between New York and Los Angeles.S. Ck+1 ∗ ∗ ∗ Ck ￿ .5 4. Each year the distribution changes according to      U Sk+1 . and Europe have total assets of \$4 trillion.5 U Sk  Jk+1  =  .5 0   Jk  . 3. Diﬀerence Equations and Markov Matrices 75 ￿￿ ￿ ￿ ￿ . And if the 148 ￿ ￿ 74 initial population distribution is something else. EXERCISES 1.75 ￿ ￿ −2.      N Yk+1 ∗ ∗ ∗ N Yk  LAk+1  =  ∗ ∗ ∗   LAk  . For the diﬀerence equation uk = Auk−1 where the matrix A and the starting vector u0 ￿ as given below. it will remain as such forever.25 128 (a) A = and u0 = . Los Angeles. (d) Show the limiting distribution is stable.5 .5 . and \$2 trillion in Europe. \$0 in Japan. In other words.S. Suppose multinational companies in the U. the other half stay where they are. A truck rental company has centers in New York.2￿ . Los Angeles. Initially the distribution of trucks is 90.1 74 74 the population example this means = .8 .. 1 1 2 2. (c) Find the limiting distribution of assets. and Chicago. 64 ￿ ￿ .5 .5 Ek (We are implicitly making the completely false assumption that the world economy is a zero-sum game!) (a) Find the diagonal factorization of A. 30. Initially the distribution of assets is \$2 trillion in the U. (b) Find the distribution of assets in year k. if the 148 148 ￿ .14. it will tend to in the long 148 run. Japan. Ek+1 . and 30 in New York. −1 ￿ 2 ￿ ￿ ￿ 10 1 4 −1 (c) A = and u0 = . Every month half of the trucks in New York and Los Angeles go to Chicago.5 .5 18 (b) A = and u0 = . are ￿ ￿ ￿ .. compute uk and comment upon its behavior as k → ∞. and Chicago respectively.

Now suppose we allow only males of type   p  q  . the characteristics of an oﬀspring are determined by a pair of genes. Show the limiting distribution is stable.5 0  0 . Suppose in the setup of the previous problem we allow males of all genotypes to reproduce. Find the diagonal factorization of A. q. What is the limiting distribution? 6. 4. and gg genotypes in the initial generation. Find the distribution of trucks in month k. and r respectively represent the proportions of GG. Let the initial distribution of genotypes be u1 = r p.) Show that the Markov matrix   0 0 0 A =  1 . (They must be nonnegative and sum to one.5 . The entries gg to reproduce. In species that reproduce sexually.    Dk+1 ∗  Sk+1  =  ∗ Wk+1 ∗ ∗ ∗ ∗   ∗ Dk ∗   Sk  ∗ Wk 5. Let G and g respectively represent the proportion of G genes and g genes in the initial generation.76 (a) (b) (c) (d) (e) 14.) Show that G = p + q/2 and g = r + q/2. (They also must be nonnegative and sum to one. The genes of a particular trait (say eye color) are of two types. Diﬀerence Equations and Markov Matrices Find the Markov matrix A that describes this process. Find the corresponding Markov matrix and ﬁnd its stable distribution.5g  0 G g . a quarter of those who are sick get well. and another quarter of those who are sick die. Gg.5 1 represents how the distribution of genotypes in one generation transforms to the next under our restrictive mating policy (that is. whereas those of type gg exhibit the recessive trait. only blue-eyed males can reproduce).5G . Find the limiting distribution of trucks. Suppose there is an epidemic in which every month half of those are well become sick. the dominant G (brown eyes) and the recessive g (blue eyes). Show that the Markov matrix G g A= 0  . one inherited from each parent. Oﬀspring with genotype GG or Gg exhibit the dominant trait.

ramdom-mating population. In this way.14. even the rarest of genes. The matrix A therefore represents how the distribution of genotypes in the second generation transforms to the third. Show that u2 =  2Gg  . the distribution of genotypes and the proportion of dominant and recessive genes tend to remain constant from generation to generation. which one would expect to disappear. or migration come into play. mutation. (What does this say in the important special case where p = r?) This result is the Hardy-Weinberg law and is at the foundation of the modern science of population genetics. . are preserved. unless outside forces such as selection. Genetic equilibrium is therefore reached after only one generation. It says that in a large. Show that u3 = u2 . Show that G and g again respectively represent the g2 proportion of G genes and g genes in the second generation. Diﬀerence Equations and Markov Matrices 77 represents how the distribution genotypes in the ﬁrst generation transforms to the  2 of G second.

The operations must also satisfy the following rules: 1. and Span PART 2: GEOMETRY 15. x2 .78 15. called vectors. Yet there is a natural geometric approach to matrices that is at least as important as the algebraic approach. 5. For example. on which two operations are deﬁned. a vector in a higher dimensional Euclidean x3 .. the point (x1 . x3 ) in 3-space corresponds  x1 to the vector  x2  in R3 . This means that if x and y are vectors in V and if a is a scalar. lines. (We have been calling such matrices column vectors all along. If the scalars are real numbers.) That these spaces are vector spaces follows directly from the properties of matrices. R2 . We now want to examine what is really at the heart of these concepts. 3. The representations are clear. Likewise. . the vector space is a complex vector space. The space Rn consists of all n×1 column matrices with the familiar deﬁnitions of addition and scalar multiplication of matrices. 2. VECTOR SPACES. A vector space V is a collection of objects. R3 . but for a diﬀerent and deeper understanding we must look to geometry. . 8. addition and multiplication by scalars (numbers). but nothing of a geometric nature has been considered. 4. 1x = x (ab)x = a(bx) a(x + y) = ax + ay (a + b)x = ax + bx To put meat on this abstract deﬁnition we need some examples. To do this. R2 by the real plane. there is a unique vector −x such that x + (−x) = 0. SUBSPACES. . For us the most important vector spaces are the real Euclidean spaces R1 . Subspaces. we deﬁne an abstract model of a vector space and then show how this idea can be used to develop concepts and properties that are valid in all concrete instances of vector spaces. and geometrical vectors in two and three dimensional physical space. Vector Spaces. V must be closed under addition and scalar multiplication. 6. AND SPAN The presentation so far has been entirely algebraic. The ﬁrst three spaces can be identiﬁed with familiar geometric objects: R1 is represented by the real line. We will assume some familiarity with. then x + y and ax are also vectors in V . Matrices have been added and multiplied. The mechanics of Gaussian elimination has produced for us one kind of understanding of linear systems. and R3 by physical 3-space. 7. planes. For each vector x. x+y =y+x x + (y + z) = (x + y) + z There is a “zero” vector 0 such that x + 0 = x for all x. and if the scalars are complex numbers. equations have been solved. the vector space is a real vector space.

we obtain the complex Euclidean spaces: C 1 .15. the collection of all real valued functions deﬁned and continuous on [0.1]. · · ·. C 3 . A particular example is C 0 [0. and Span 79 space is completely determined by its components. 1]. R2 . y x+y y x x x-y FIGURE 3 . x3 ) x3 x1 x2 x3 x2 a vector x2 x1 a point x1 FIGURE 2 If we take column vectors whose components we allow to be complex numbers. (We were actually in the world of complex spaces in Section 13.) Even more abstract vector spaces that cannot be visualized as any kind of Euclidean space are functions spaces. And the vector −x has the same length as x but points in the opposite direction. This geometric desciption even extends to higher dimensional Euclidean spaces. Subspaces. (Note that the line segment from y to x is not the vector x − y and in fact is not a vector at all!) The product ax is the vector obtained from x by multiplying its length by a. x2 . Vector Spaces. C 2 . 1] is a real vector space. and R3 is that for them addition and scalar multiplication have simple geometric interpretations: The sum x + y is the diagonal of the parallelogram with sides formed by x and y. x3 (x1 . The diﬀerence x−y is the other side of the parallelogram with one side side y and diagonal x. but it is impossible to see it geometrically. we will concentrate on real Euclidean spaces. It is easy to see that C 0 [0. For now. even though the geometry is hard to visualize. One nice thing about the ﬁrst three Euclidean spaces R1 . since we want to keep things as concrete as possible.

Vectors that satisfy this equation. These vectors form a subspace of R2 since sums and scalar products of vectors that satisfy the equation must also satisfy the equation. x2 c -2 1 x1 FIGURE 4 If we change the equation to x1 + 2x2 = 2 we still have a line.) Furthermore. cannot form a subspace since the sum of two such . however. we can ﬁnd all such vectors explicitly. We formalize this idea by saying that a subset S of a vector space V is a subspace of V if S has the following properties: 1. there is no need to verify the rules for a vector space for S. Since addition and scalar multiplication in S follow the rules of the host space V . 2. (We will prove this using matrix notation later. Clearly they are represented by points in R2 that lie on a line through the origin. We now look at some examples of subspaces of Rn . If x and y are vectors in S. then x + y is also a vector in S. clearly a line through 1 the origin. run Gaussian elimination (unnecessary here of ￿ course).80 15. thereby giving another veriﬁcation that we have a subspace. and express the solu￿ −2 tion in vector form c . assign leading and free variables. Subspaces. If x is a vector in S and a is any scalar. S contains the zero vector. and Span It turns out that the vector spaces that we will need most occur inside the standard spaces Rn . that is. then ax is also a vector in S. We just write the equation in matrix form [1 x 2] 1 x2 ￿ ￿ = [0] ￿ and solve as usual. we write the array [ 1 2 | 0 ] . It is easy to show that such vectors are closed under addition and scalar multiplication (proved in greater generality later). It is automatically a vector space in its own right. 3. Vector Spaces. ￿ x1 Example 1: Consider all vectors in R2 whose components satisfy the equation x2 x1 + 2x2 = 0. We get all multiples of one vector.

Vector Spaces. so the zero vector is not even included.   x1 Example 2: Consider all vectors  x2  in R3 whose components satisfy the equation x3 x1 − x2 + x3 = 0. Subspaces. Vectors that satisfy this equation are closed under addition and scalar multiplication. We can ﬁnd all such vectors by writing the equation in matrix form   x1 [ 1 −1 1 ]  x2  = [ 0 ] x3 and solving. and Span 81 vectors does ￿ ￿ not satisfy the equation. We can also see this geometrically by adding two vectors that point to the line and noting that the result no longer points to the line. We use the array [ 1 −1     1 −1 c1 + d 0  0 1 1 | 0 ] to obtain the solution in vector form. If we solve the equation we obtain vectors of ￿ ￿ 2 −2 the form +c . This equation deﬁnes a plane in R3 passing through the origin. vectors of this form are closed under addition and scalar multiplication and therefore form a subspace. the line does not pass through the origin. Again. and the plane is therefore a subspace.15. Again we see that we do not have a subspace because these 0 1 vectors are not closed under addition. This is the vector representation of the plane. Even more simply. x3 -1 0 1 x2 1 1 0 x1 FIGURE 5 .

We might expect that this equation deﬁnes some kind of geometric plane passing through the origin. The problem is to show that S is a subspace. All multiples of this single vector generate a line in R3 and solve to obtain c 1 passing through the origin. we still get a plane.82 15. We can do this directly as follows: If Ax = 0 and Ay = 0. . Subspaces. but for the same reasons as before it is no longer a subspace. Thus vectors that satisfy the system Ax = 0 are closed under addition and scalar multiplication. . The solution in vector form will look like a1 v1 + a2 v2 + . Vector Spaces. Each deﬁnes a set S as the collection of all vectors x in Rn that satisfy a system of equations Ax = 0. Example 4: Finally. To ﬁnd all such vectors we write the equations in matrix form   ￿ ￿ x1 ￿ ￿ 1 −1 0   0 x2 = .  . The conditions of closure under addition and scalar multiplication are easily veriﬁed. consider all vectors in R4 whose components satisfy the equation x1 + x2 − x3 + x4 = 0. Example 3: This time we want all vectors in R3 that satisfy the two equations x1 − x2 = 0 x2 − x3 = 0 simultaneously. which can be expressed more simply in matrix notation.” All of the examples above have the same form. This makes sense since each equation deﬁnes a plane in R3 and their intersection must be a line. vectors that satisfy both equations are closed under addition and scalar multiplication. if we change the equation to x1 − x2 + x3 = 2. then A(x + y) = Ax + Ay = 0 + 0 = 0 and A(cx) = c(Ax) = c(0) = 0. Again. 0 1 −1 0 x3   1  1 . but it is hard to visualize. 0 1 0 0 0 1 These vectors do form a subspace. and Span As in Example 1. Later we will give precise meaning to the notion that this subspace is a “three dimensional hyperplane in four space. If we solve it we obtain      −1 1 −1  1  0  0  a  + b  + c . This also suggests the general fact that the intersection of any number of subspaces of a vector space is itself a subspace. + an vn . The second way to verify that S is a subspace is to solve the system Ax = 0 as we did in the examples.

v2 . . Vector Spaces. .  0  0 1 . + an vn are said to be linear combinations of the vectors v1 . + bn vn ) = (a1 + b1 )v1 + (a2 + b2 )v2 + . . . + (an + bn )vn and under scalar multiplication since c(a1 v1 + a2 v2 + . . . + an vn ) + (b1 v1 + b2 v2 + . .1]. Describegeometrically the subspace of R3 spanned by the following vectors. . . Show directly that the following are subspaces of R3 . ≥ 0. 1]. vn . So again we see that S is a subspace.   x1  x2  that satisfy the equation x1 − x2 + x3 = 0 (a) All vectors x3     1 0 (b) All vectors of the form c  1  + d  0  0 1 3. . . vn is called the span of v1 . . . vn span or generate the subspace S. and Span 83 where the a’s are arbitrary constants and the v’s are vectors. 2. .15. . Show that C 0 [0. Subspaces. and x2 are both integers. vn . is a subspace of C 0 [0. Vectors of the form a1 v1 + a2 v2 + . ￿ x1 x2 ￿ in R2 is a subspace. + (can )vn . . v2 . . . . . = 0 or x2 = 0. . 1]. . None of the following subsets of vectors (a) (b) (c) (d) (e) All All All All All vectors vectors vectors vectors vectors where where where where where x1 x1 x1 x1 x1 = 1. The subspace S of all linear combinations of v1 . 1] is a real vector space. Why? 4. .    1 0 (a)  1  . . . vn }. . Vectors of this form are closed under addition since (a1 v1 + a2 v2 + . . and x2 are both ≥ 0 or both ≤ 0. which is the set of all functions that are continuous and have continuous derivatives on [0. v2 . EXERCISES 1. . We also say the vectors v1 . v2 . and we write S = span{v1 . . . + an vn ) = (ca1 )v1 + (ca2 )v2 + . . Show that C 1 [0. v2 . .

(c) x1 + x2 + x3 = 0 and x1 − x2 + x3 = 0 in R3 . (b) x1 + x2 + x3 = 0 in R3 . (a) Two “two dimensional planes” that intersect only at the origin.  0  0 1 1 .84 15. x2 − x5 = 0 in R5 . and Span 5. Find examples of subspaces of R4 that satisfy the following conditions.  1  0 1       1 1 0 (d)  1  . Subspaces. ﬁnd spanning sets of vectors for each of the following subspaces. or said another way. x1 − 2x2 + x4 = 0. (d) x1 − 2x2 + 3x3 − 4x4 = 0 in R4 . 7. 6. Find vector representations for the following geometric objects. (b) x1 + x2 + x3 = 1 in R3 .  1  . Vector Spaces. (b) A line and a “three dimensional hyperplane” that intersect only at the origin.     1 0 1. Find vector representations for the following geometric objects and describe them. (a) 3x1 − x2 = 3 in R2 . (e) x1 + 2x2 − x3 = 0.0 (b) 1 1     1 1 (c)  1  . (a) 3x1 − x2 = 0 in R2 .

v2 . so any one of . v6 }. v4 . v5 . The remaining vectors will still generate the same subspace S = span{v1 .16. BASIS. it is easy to see geometrically that the two sets of vectors       1 0   1 1. Basis. and Dimension 85 16. AND DIMENSION It is possible for diﬀerent sets of vectors to span the same subspace. it can be removed from the spanning set. v3 . we could have solved the equation for v3 or for v4 or for v6 . v5 . Since v2 can be regenerated from the other vectors in the spanning set. The mathematical reason for this is that the second vector in the ﬁrst set can be written as a linear combination of the ﬁrst and third vectors:       1 1 0 1 = 1 + 0 1 0 1 Since the second vector can be regenerated from the other two. v3 .0   1 1 0 and     0   1 1. v6 }. it is really not needed and therefore can be dropped from the spanning set. v4 . x3 0 0 1 1 1 1 x2 x1 1 1 0 FIGURE 6 Example 1: Suppose S = span{v1 . For example. LINEAR INDEPENDENCE.0   0 1 generate the plane x1 − x2 = 0 in R3 .1. The question arises how in general can we reduce a spanning set to one of minimal size and still have it span the same subspace. If we solve this equation for v2 we obtain v2 = v3 + 3v4 − 2v6 . Linear Independence. and suppose we discover among the spanning vectors the linear relationship v2 − v3 − 3v4 + 2v6 = 0. Of course.

. Linear Independence. an are not zero. v6 }. If these vectors are linearly dependent.) v1 v2 v2 v3 v2 v1 two dependent vectors two independent vectors v1 three dependent vectors FIGURE 7 The process described in the example above can now be expressed in the language of linear independence and basis as follows: Suppose a set of vectors span a subspace S. . and Dimension those vectors could have been the one to have been dropped. a2 .) The remaining vectors will still span S. Now suppose among the remaining vectors we ﬁnd another linear relationship. . . (Any vector that appears in the linear combination with a nonzero coeﬃcient can be chosen. . v5 . . . . say 2v1 +v4 −4v5 = 0. . Although this process of successively dropping vectors from spanning set . . If a set of vectors v1 . . . then there is a nontrivial linear combination of them that equals zero. then we say these vectors form a basis for S. and we say that they are linearly independent if the only linear combination of them that equals zero is the trivial one 0v1 + 0v2 + . . This process of successively dropping dependent vectors can be continued until the set of spanning vectors is linearly independent. but. v2 . Then this process of shrinking the spanning set will have to stop. We say that a collection of vectors v1 . Basis. Then we can solve for v1 or for v4 or for v5 and can therefore drop any one of these vectors from the spanning set. since any vector space is a subspace of itself. We then obtain S = span{v1 . + an vn = 0 where at least some of the coeﬃcients a1 . v3 . (We state these deﬁnitions for subspaces. Suppose we drop v4 . At this point suppose there does not exist any linear relationship between the remaining vectors. The resulting spanning set is therefore a basis for S. vn is (1) linearly independent and (2) spans a subspace S. vn is linearly dependent if there exists a linear combination of them that equals zero a1 v1 + a2 v2 + . . they also hold for vector spaces. . + 0vn = 0. v2 . . In this case one of the vectors can be dropped from the spanning set.86 16. On the basis of these observations we make some deﬁnitions.

Our goal is to ﬁnd a basis for S.  −1 0 1 −1 4 5 generate a subspace S of R4 . 0 whose rows consist of these three vectors. . since A can be reconstructed from U by reversing the sequence of Gaussian operations. because of the nature of Gaussian operations. . . . v2 . + (an − bn )vn . vn are linearly independent.16. all the coeﬃcients (ai − bi ) = 0. Furthermore. . Example 2: The following three vectors       1 2 1  3  6 3  . Basis. . . every row of A is a linear combination of the rows of U . will form a basis for S:     1 0 3 0  . Linear Independence. 1 3 1 5   2 3. it does prove that every subspace has a basis. We conclude that there is only one way to write a vector as a linear combination of basis vectors. and running Gaussian elimination (or Gauss-Jordan elimination as we do here) to obtain 1 U = 0 0 3 0 0 0 1 0 The two nonzero rows of U . . . If for a basis v1 . v2 . every row of U is a linear combination of the rows of A.+bn vn . and therefore ai = bi . We accomplish this by forming the matrix   1 3 −1 −1 A = 2 6 0 4 . then subtraction gives 0 = (a1 − b1 )v1 + (a2 − b2 )v2 + . The importance of a basis to a subspace lies in the fact that not only can every vector in a subspace be represented as a linear combination of the vectors in its basis. even further. but. . . . vn we have v = a1 v1 +a2 v2 +. the rows of U must span the same subspace as the rows of A. and Dimension 87 is not a practical way to actually ﬁnd a basis for a subspace. But since v1 . .  0 1 2 3 Why does this work? First note that. . that this representation is unique. . when made into column vectors. First. We can now draw a number of conclusions. since there is some linear combination of the rows of A that results in the .+an vn and also v = b1 v1 +b2 v2 +. Second.

then as we have seen there will be exactly one solution.88 16. the rows of A must therefore be linearly dependent. In this case we obtain the solution a = 2 and b = −3 so that       1 0 2 3 0  6  2  − 3  =  . which we can solve by Gaussian elimination. otherwise there will be no solution. 0 1 −3 2 3 −5 Example 3: Find a basis for the subspace S of all vectors in R4 whose components satisfy the equation x1 + x2 − x3 + x4 = 0. 0 1 0 0 0 1  . the nonzero rows of U are automatically linearly independent. We make the extremely important observation that this equation is equivalent to the linear system  1 3  0 2    0 ￿ ￿ 2 0 a  6  =   1 b −3 3 −5 (See Section 2 Exercise 7). This was Example 4 of the previous section. which is the zero vector.) Now that we have a basis for S. There we found S consisted of all vectors of the form      −1 1 −1  1  0  0  a  + b  + c . because of the echelon form of U . For example. (See Exercise 4. to express the vector   2  6    −3 −5 in terms of the basis we must solve the equation       1 0 2 3 0  6  a  + b  =  . And ﬁnally. Basis. and Dimension third row of U . Linear Independence. 0 1 −3 2 3 −5 If the given vector is in S. we can express any vector in S as a unique linear combination of the basis vectors.

Basis. the comment in Section 10. . and in fact they are also linearly independent.   0 1     1   1 1. which is to solve (A − λI)x = 0 by Gaussian elimination. . It now makes sense to talk about things like “a three dimensional hyperplane passing through the origin in four space.. There is no unique choice of a basis for a subspace. it is true that the number of vectors in a basis is unique. For example.      . This is true because if          −1 1 −1 −a + b − c 0 a  1  0  0    0 a  + b  + c =  =  .” We state this important property of bases formally as: . and write the solution in vector form.   0 1     0   1 1. Clearly the Euclidean space Rn has dimension n. . This number we deﬁne to be the dimension of the subspace. . we were actually ﬁnding not just spanning sets but bases! Furthermore. Therefore. if we solve a homogeneous system Ax = 0 by Gaussian elimination.1 .” is justiﬁed.“ Our method for ﬁnding eigenvectors.0 .. For the Euclidean spaces Rn . one for each free variable.  . That is. In fact. . the each of three sets of vectors     0   1 1.. set the free variables equal to arbitrary constants.  . They clearly span and are linearly independent and therefore form a basis for Rn . . in all the examples of the previous section. This holds in general. 0 1 0 b 0 0 0 1 c 0 then clearly a = b = c = 0. 0 0 1 0 0 0 0 1 These are the vectors that point along the coordinate axes. . does in fact produce linearly independent eigenvectors. Even though the set of vectors in a basis is not unique. Linear Independence. . You can no doubt think of many more. .. These three vectors therefore form a basis for S. . then we obtain a linear combination of independent vectors. and Dimension 89 The three column vectors clearly span S. there is the following natural choice of basis:         0 0 0 1 0 1 0 0 . one for each free variable. there are inﬁnitely many possibilities. .16.0   1 1  are bases for the plane x1 − x2 = 0 in R3 . however. so we will call them coordinate vectors.

Decide the dependence or independence of the following sets of vectors. .6 (b) 1 5 2 . . this is . this means that any two bases must contain exactly the same number of vectors. . .   . EXERCISES 1. v3 span the subspace S and w1 . . . . The general case will then be clear. . In matrix terms  . but we will not do this. We now illustrate the proof in a special case. Any two bases for a subspace contain the same number of vectors. It is also a minimal spanning set of vectors since it cannot be made smaller and still span the space.3. . Proof: It is enough to show that in a subspace S the number of vectors in any linearly independent set must be less than or equal to the number of vectors in any spanning set. . a21 a22 a23 a31 a32 a33  a41 a42  . Suppose v1 . Basis.   w4  =  v1 . w3 . Note that we have been implicitly assuming that the number of vectors in a basis is ﬁnite. v2 . v2 . Linear Independence. . . . there are nontrivial solutions to the homogeneous system Ax = 0 (see Section 7 Exercise 5(b)). We then have W c = (V A)c = V (Ac) = V 0 = 0. We see that a basis is a maximal independent set of vectors in the sense that it cannot be made larger without losing independence. . . and Dimension Theorem. . . Since a basis is both linearly independent and spans.  v3   a12 . w3 . . The w’s are therefore linearly dependent and we are done. But the equation W c = 0 when written out is just c1 w1 + c2 w2 + c3 w3 + c4 w4 = 0 and is therefore a nontrivial linear combination of the w’s. . . . . . We show that the w’s must be linearly dependent. .  w1  . that is.  . each w can be written as a linear combination of the v’s: w1 = a11 v1 + a12 v2 + a13 v3 w2 = a21 v1 + a22 v2 + a23 v3 w3 = a31 v1 + a32 v2 + a33 v3 w4 = a41 v1 + a42 v2 + a43 v3 . there is a nonzero vector c such that Ac = 0. a11 . 2 1       1 3 3 3. . w4 is some larger set of vectors in S.90 16. Since the v’s span. w2 . . w2 . Since A has fewer rows than columns. . . . . a43 which we write as W = V A. a13 . It is possible to extend the discussion above to the inﬁnite dimensional case. ￿ ￿ ￿ ￿ 1 2 (a) .

.16.   . In each case indicate the dimension.1. Suppose we have three sets of vectors.   . Find bases for the subspaces deﬁned by the equations in Section 15 Exercise 6. . U = {u1 . . w6 }.2 (c) 2 2 4       1 2 2 2 1 2 (d)   . (b) The set (does) (does not) (might) span R5 . Basis. 5.   . . Find bases for the subspaces spanned by the sets of vectors in Exercise 1 above. v5 }. W = {w1 . In each case indicate the dimension. . V = {v1 . .2 (b)   4 2 1         10 2 −1   3 1. (a) The set (is) (is not) (might be) linearly independent. . . and Dimension 91       2 1 3 1. Show directly from the deﬁnition that the nonzero rows of  0 0 1 3  are 0 0 0 0 linearly independent. 1  (c)  −2    8 2 1 −1 ￿ ￿ ￿￿ ￿ ￿ ￿￿ 8 2 1 (d) . . For this case draw a picture! 13 1 2 6. Express each vector as a linear combination of the vectors in the indicated sets. 3. For each set answer the following. . . .   1 2 3 1 1 1 2 0 2.       5 2   3  −1  1.2 (a)   1 4 2       −3 2   3  1  1.   1 2 2 1 1 2         1 2 3 1  2   1   3   −1  (e)   . in R5 .2. . u4 }. Linear Independence.   1 3 0 2 4.

Basis. Linear Independence. then show that the real vectors x and y are linearly independent v+v over the complex numbers.) This settles a technnical question about complex 2 vectors from Section 13.92 16. and Dimension (c) The set (is) (is not) (might be) a basis for R5 . If the complex vectors v and v are linearly independent over the complex numbers and if v = x + iy. . (Hint: Assume ax + by = 0 and use x = and 2 v−v x = to show a = b = 0. 7.

which we denote as ￿x￿. Dot Product and Orthogonality 93 17. But this one will be adequate for our purposes. 4. there is a natural way to establish these notions that is often quite useful.17. we get 1 x. and in three-dimensional space the physical length of the vector x =  x2  1 2 x3 ￿ 2 + x2 + x2 .” This is because the deﬁnition of a vector space does not require such concepts. which is a vector of length one. especially for Euclidean spaces. DOT PRODUCT AND ORTHOGONALITY So far. . How can we decide if two vectors are perpendicular? In order to help us do this. With this notion of length we can immediately deﬁne the distance between two points x and y in Rn as ￿x − y￿. x·y =y·x (ax + by) · z = ax · z + by · z z · (ax + by) = az · x + bz · y x · x = ￿x￿2 . It seems is by two applications of the Pythagorean Theorem equal to x1 2 3 reasonable therefore to deﬁne the length or norm of a vector x in Rn . we deﬁne the dot product x · y of two vectors x and y in Rn as the number x · y = x1 y1 + x2 y2 + · · · + xn yn . For many vector spaces however.) Note that since our vectors are column vectors. in our discussion of vector spaces. We say this is the unit vector in the direction ￿x￿ of x. the length of a vector can also be written in matrix nota√ tion as ￿x￿ = xT x. The dot product satisﬁes the following properties: 1. ￿x￿ ≥ 0 and = 0 ⇔ x = 0. ￿ax￿ = |a| ￿x￿ 2. In matrix notation we can also write x · y = xT y. This corresponds to the usual physical distance between points in two and three-dimensional space. 2. Note also that if we multiply any vector x by the reciprocal of its length. 3. In two-dimensional space ￿ ￿ x1 the physical length of the vector x = is by the Pythagorean Theorem equal to x2   x1 ￿ x2 + x2 . It is easy to see that the length function satisﬁes the following two properties: 1. there has been no mention of “length” or “angle. in the following way: ￿ ￿x￿ = x2 + x2 + · · · + x2 n 1 2 (There are situations and applications where other measures of length are more appropriate.

and angles. assuming they are independent. Another term for perpendicular is orthogonal. circles.x || || y || x || x || FIGURE 8 If we write this equation for the triangle formed by the two vectors x and y in vector notation and use the properties of the dot product. Other terms for dot product are scalar product and inner product. In mathematical shorthand we write the statement “x is orthogonal to y” as x ⊥ y. When endowed with the length function ￿ ￿. First note that. Therefore the result above can be written as x ⊥ y ⇔ x · y = 0. In particular. we have ￿x￿2 + ￿y￿2 = ￿x − y￿2 = (x − y) · (x − y) =x·x−x·y−y·x+y·y = ￿x￿2 − 2x · y + ￿y￿2 . The second and third properties follow from the distributivity of matrix multiplication. (It goes both directions. which says that the sides of a triangle are in the relation a2 + b2 = c2 if and only if the angle opposite side c is a right angle. we obtain 0 = −2x · y or x · y = 0. We therefore have all the constructs of Euclidean geometry in this plane including lines. lengths. we have the Pythagorean Theorem. check your Euclid!) y || y . We therefore conclude that the vectors x and y are perpendicular if and only if their dot product x · y = 0. Canceling. this subspace satisﬁes all the axioms of the Euclidean plane. Dot Product and Orthogonality They can be veriﬁed by direct computation. . Now we will see how to determine if two vectors x and y in Rn are perpendicular.94 17. they span a two-dimensional subspace of Rn .

x || || y || x ! || x || FIGURE 9     2 1  2  and  3  is determined by cos θ = Example 2: The angle between the vectors 1 2 10 √ √ = 0. 1 2 √ 1 Each has length 4 + 4 + 1 = 3. we should say that we are seeking the projection p of the vector y onto the direction deﬁned by x or onto the line . ￿x￿￿y￿ y || y . For this we need the Law of Cosines.02◦ .89087.17. Dot Product and Orthogonality 95     2 −2 Example 1: The vectors x =  2  and y =  1  are orthogonal because x · y = 0. Suppose we wish to ﬁnd the vector p which is the geometrically perpendicular projection of the vector y onto the vector x. the dot product can also tell us the angle between any two vectors. The unit vector in the direction of x is x= ￿x￿      . 2 3 2 3 1 3 Even though it is not necessary for linear algebra. so θ = arccos(0. Again writing this equation for the triangle formed by the two vectors x and y in vector notation ￿x￿2 + ￿y￿2 = ￿x − y￿2 + 2￿x￿￿y￿ cos θ and computing (Exercise 9) we obtain x · y = ￿x￿￿y￿ cos θ or x·y cos θ = . which also appears in Euclid and which says that the sides of any triangle are in the relation a2 + b2 = c2 + 2ab cos θ where θ is the angle opposite side c. To be precise. 9 14 We are now in a position to compute the projection of one vector onto another. orthogonal or not.89087) = 27.

we now turn to subspaces. In fact. we immediately see from the ﬁgure below that p must have the property that x ⊥ (y − p).96 17. Dot Product and Orthogonality generated by x. if we write y = p + (y − p). But note that xz-plane and the xy-plane are not orthogonal. so 0 = x · (y − p) = x · y − x · p or x · p = x · y. The ﬁnal result is therefore x·y p= x. a wall of  room is not perpendicular to the ﬂoor! This is because the xa  1  0  is in both subspaces but is not orthogonal to itself. It is easy coordinate vector 0 . Substituting this into the previous equation we obtain c(x · x) = x · y or c = (x · y)/(x · x). Also. That is. For example. it must be some constant multiple of x. since p lies on the line generated by x. we have resolved y into the sum of its component in the direction of x and its component perpendicular to x. so p = cx. just compute p = 18  2  =  4  and obtain perpendicular to 9 1 1 2           4 5 4 4 1 y = p + (y − p) =  4  +  5  −  4  =  4  +  1  2 −2 2 2 −4  FIGURE 10  Having completed our discussion of orthogonality of vectors. the z-axis is orthogonal to the xy-plane in R3 . ￿x￿2 We should think of the vector p as the component of y in the direction of x. y p y-p x 5  5  into its components in the direction of and Example 3: To resolve y = −2       2 2 4  2 . Since we can do geometry in the plane deﬁned by the two vectors x and y. We say that two subspaces V and W are orthogonal subspaces if every vector in V is orthogonal to every vector in W .

to ﬁnd all vectors y whose dot product with the given vector is zero. In other words.17. the dot product of x with each of the two vectors that generate that plane must be zero. Example 4: Find the orthogonal complement of the line generated by the vector   1  2 . For example. These are then the equations that deﬁne the given line. because (a1 v1 + a2 v2 ) · (b1 w1 + b2 w2 ) = a1 b1 v1 · w1 + a2 b1 v2 · w1 + a1 b2 v1 · w2 + a2 b2 v2 · w2 = 0 We make one more deﬁnition. w2 } and the v’s are orthogonal to the w’s. the xy-plane and the z-axis are orthogonal complements. Again we look for all vectors orthogonal to the generating vectors. Expressed in matrix notation this is just   y1  y2  = 0. It is easy to see that W is in fact a subspace (Exercise 12). 0 1 The two vectors above therefore span the plane that is the orthogonal complement of the given line. but not so easily. the relationship is symmetric. that V is the perpendicular complement of W or V = W ⊥ (Exercise 13). The set W of all vectors perpendicular to a subspace V is called the orthogonal complement of V and is written as W = V ⊥ . note that a vector x lies in the line if and only if x is orthogonal to the plane we just found. For example. that is. [1 2 3] y3 We solve this linear system and obtain     −2 −3 y = c 1  + d 0 . these two vectors are a basis for that plane. then any vector in V is orthogonal to any vector in W . Therefore x must satisfy the equations −2x1 + x2 = 0 and −3x1 + x3 = 0. if V = span{v1 . Orthogonal complements are easy to compute.   1  1  and Example 5: Find the equations of the plane generated by the two vectors 1   1  −1 . Now to ﬁnd the equations of the line itself. Just verify that every vector in one spanning set is orthogonal to every vector in the other. Dot Product and Orthogonality 97 to check the orthogonality of subspaces if we have spanning sets for each subspace. It is also follows automatically. We 1 . and we are justiﬁed in saying that V and W are orthogonal complements of each other. In fact. In other words. but the x-axis and the y-axis are not. and ﬁnd the equations of the line. v2 } and W = span{w1 . Here the ﬁrst problem is to ﬁnd all vectors 3 y orthogonal to the given generating vector.

Therefore −x1 + x3 = 0 is the equation of the given plane. that is. We now know how to go in the reverse direction. β   2  2  is orthogonal to the plane gener4: Show that the line generated by the vector 1     1 2 ated by the two vectors  1  and  0 . (d) Find the projection of y onto x. 1   ￿ y1 ￿ ￿ 1   0 y2 = 1 0 y3  . (e) Resolve y into components in the direction of and perpendicular to x.98 17. For the two vectors x =   and y =   −2 2 −4 9 (a) Find their lengths. (b) Find the unit vectors in the directions they deﬁne. Find all vectors orthogonal to in R2 . (c) Find the angle between them. This means that the dot product of x with the vector that generates the orthogonal line must be zero. Dot Product and Orthogonality therefore set up the linear system ￿ 1 1 1 −1 and get the solution which generates the line orthogonal to the given plane. In R ﬁnd the point on the line generated by the vector closest to the point 3 (8. −4 −4   −1 y = c 0 . to the line just obtained. from its vector form to its equation form. EXERCISES    1 −6  2   −2  1. 11 ). that is. 2 ￿ ￿ α 3. Now to ﬁnd the equation form of the plane. note that any vector x in the plane must be orthogonal to the orthogonal complement of the plane. Note that in Section 15 we learned how to go from the equation form of a subspace to its vector form. ￿ ￿ 2 2 2.

. show W is closed under addition and scalar multiplication. 8. then they must be linearly independent. . 10. 7. v2 . Show x · y = 1 (￿x + y￿2 − ￿x − y￿2 ). then U is orthogonal to W . 12.  3  (b) 1 7     1 2 1 1 (c)  . v3 are all orthogonal to one another. True or false? (a) If two subspaces V and W are orthogonal. what is the same thing. We wish to show that W ⊥ = V .17. (V ⊥ )⊥ = V. (b) If U is orthogonal to V and V is orthogonal to W . ￿x￿ 9. Derive x · y = ￿x￿￿y￿ cos θ from the Law of Cosines. vn . 13. then show W = V ⊥ is also a subspace.) Of course this result extends to arbitrary numbers of vectors v1 . Show that if the vectors v1 .   0 1 0 2 1 3 6. Find equations deﬁning the subspaces in Exercise 5 above. Find the orthogonal complements of the subspaces generated by the following vectors. Show that the length of 1 x is one. that is. 4 11. . or.  . . (Hint: Write c1 v1 + c2 v2 + c3 v3 = 0 and show the c’s are all zero by dotting both sides with each of the v’s. Dot Product and Orthogonality 99 5.   1 (a)  1  1     1 1  1 . Let V be a subspace of R8 and W = V ⊥ . then so are their orthogonal complements.   1 1 2 1       1 2 2 1 0 2 (d)  . If V is a subspace. . v2 .

v3 . w5 . v2 . Dot Product and Orthogonality (a) Suppose V has a basis v1 . Hint: Expand ||v − cw||2 and show that it equals zero if c = ±||v||/||w||. w2 . ··· and by counting leading and free variables V ⊥ = W has a basis w1 . v3 satisfy Bx = 0 and therefore are in W ⊥ . then v = cw for some constant c.100 17. (b) Let  · · · w1  · · · w2  B =  · · · w3  · · · w4 · · · w5 in the system Ax = 0 show that  ··· ···  ···. w4 . (c) Observe that each of the three vectors v1 . 14. v2 . w3 . Let · · · v1 A =  · · · v2 · · · v3   ··· ···. then one vector is a multiple of the other. v2 . conclude that W ⊥ = span{v1 . Since they are also independent. . v3 } = V.  ··· ··· and by counting leading and free variables in the system Bx = 0 show that W ⊥ has dimension 3. Interpret this as saying that if the angle between two vectors is 0 or π. Show if v · w = ±||v||||w||.

x2 + 1) is a transformation that maps R2 to R2 . in other words. T (x + y) = T (x) + T (y) 2. . The picture we should keep in mind is that in general a transformation T maps the vector x in Rn to the vector T (x) in Rm . Linear Transformations 101 18. 1 Instead of mapping points to points. the way in which input data is changed into output data. We can think of property 1 as saying that T must take the vertices of the parallelogram deﬁned by x and y into the vertices of the parallelogram deﬁned by T (x) and T (y). It often happens that the transformations in question are linear. We deﬁne a transformation to be a function that takes points in Rn as input and produces points in Rm as output. S(x1 . Rm Rn x T T(x) FIGURE 11 We further deﬁne a transformation T from Rn to Rm to be a linear transformation if for all vectors x and y and constants c it satisﬁes the properties: 1. (The transformation S above is therefore not linear. and property 2 says that the image of x when multiplied by c should be the same as the image of cx. we can think of transformations as mapping ￿￿ ￿￿ ￿ ￿ x1 x2 1 vectors to vectors. We can therefore write S as S = . As usual we consider only Euclidean spaces. LINEAR TRANSFORMATIONS Many problems in the physical sciences involve transformations. T (cx) = cT (x) Note that if we take c = 0 in property 2 we have T (0) = 0. In this section we present some of the basic terminology and facts about linear transformations.) Let’s try to view these two properties geometrically. This is the x2 x2 + 1 view we will take from now on. Property 1 says that under the map T the images of x and y when added together should be the same as the image of x + y. For example. or. that is.18. x2 ) = (x2 . A linear transformation must therefore take the origin to the origin. maps points in Rn to points in Rm .

is a subspace of Rm . Suppose T is a linear transformation that maps from Rn to Rm . We now determine what that form must be. It is a further consequence of the deﬁnition that every linear transformation must have a certain special form. if S is a subspace of Rn . Second. This transformation is linear because T (x + y) = A(x + y) = Ax + Ay = T (x) + T (y) and T (cx) = A(cx) = cAx = cT (x). then T (S). In other words. Then we can write . we can create linear transformations by using matrices. every linear transformation is induced by some matrix.102 18. Then we can deﬁne the transformation T (x) = Ax. which is the set of all vectors of the form T (x). Suppose A is an m × n matrix. which both follow from the properties of matrix multiplication. Because of the way matrix multiplication works. the input vector x is in Rn and the output vector Ax is in Rm . Linear Transformations T(y) y T(x + y) T x+y T(x) x FIGURE 12 It is an immediate consequence of the deﬁnition that a linear transformation takes subspaces to subspaces. Therefore every m × n matrix induces a linear transformation from Rn to Rm . First.

18. Linear Transformations

103

 x1  x2  T  .  = T  .  . xn



0 0 1     1 0  0   1  = x1 T  .  + x2 T  .  + · · · + xn T  .   .  . .  = x1     a11 a21   .  + x2   .  . a12 a22 . . . ... ... . . . ... 0   a12 a22   .  + · · · + xn   .  .  x1   x2   . .  .  . xn   0 

      1 0 0  0 1  0  x1  .  + x2  .  + · · · + xn  .   . .  .  . . .

am1

(The second equality follows from the linearity of T . The fourth equality follows from Section 2 Exercise 7.) Therefore every linear transformation T has a matrix representation as T (x) = Ax. Note also that   x1 a11 x1 + a12 x2 + . . . + a1n xx  x2   a21 x1 + a22 x2 + . . . + a2n xn T  .  =  .  .   . . . xn    . 

a11  a21 = .  . .

am2 a1n a2n . . .

amn

a1n a2n  .  .  .

  0  0   .   .  . 1

am1

am2

amn

am1 x1 + am2 x2 + . . . + amn xn

So every linear transformation must have this form. From now on, we will forget about the formal linear transformation T and instead just consider the matrix A as a transformation from one Euclidean space to another. Note that A is completely determined by what it does to the coordinate vectors. This follows either from the computation above or just from   multiplication. For example, if A = matrix    ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 1 0 0 3 −1 1 3 −1 1 , then A  0  = , A1 = , and A  0  = . 1 5 2 1 5 2 0 0 1 Let S be a linear transformation from Rn to Rq and T be a linear transformation from Rq to Rn . The the composition T ◦ S is deﬁned to be the transformation (T ◦ S)(x) = T (S(x)) that takes Rn to Rm . It is a linear transformation since T (S(x + y)) = T (S(x) + (S(y)) = T (S(x)) + T (S(y)) and T (S(c)) = T (cS(x)) = cT (S(x)). If S has matrix A and T has matrix B, then the question arises, what is

104

18. Linear Transformations

the matrix for the composition T ◦ S? If we compute T (S(x)) = B(Ax) = (BA)x, we see immediately that that answer is that it is the product matrix BA. The key to this observation is the relation B(Ax) = (BA)x, which follows from the associativity of matrix multiplication. Since this result is so important, we will again compute the matrix of the composition, but this time directly. To ﬁnd the jth column of the matrix for T ◦ S we know that all we have to do is see what it does to the jth coordinate vector.  0   .  .   .     T S  1j  = T   .    .  . 0    a 
1j

  1  0  = a1j T  .  + a2j T  .  .  0  

 a2j   .   .  . aqj

b  b11 b12 1q b2q   b21   b22   = a1j  .  + a2j  .  + · · · + aqj  .   .   .   .  . . . bm1 bm2 bmq  b a + b a + ··· + b a  11 1j 12 2j 1q qj b21 a1j + b22 a2j + · · · + b2q aqj    = .   . . bm1 a1j + bm2 a2j + · · · + bmq aqj

  0  1   .  + · · · + aqj T  .  .  0

  0  0   .   .  . 1

This is exactly the jth column of the product matrix BA. Now we investigate the geometry of several speciﬁc linear transformations in order to build up our intuition. In all of the examples below, the matrix is be square and is therefore a map between Euclidean spaces of the same dimension. It can therefore also be thought of as a map from one Euclidean space to itself. ￿ ￿ ￿ ￿ ￿ ￿ ￿ 2 0 x 2x x Example 1: Let A = , then A = = 2 . The eﬀect of this 0 2 y 2y y matrix is to stretch every vector by a factor of 2. ￿ ￿ ￿ ￿ ￿ 2 0 x 2x Example 2: Let A = , then A = . This matrix stretches in the 0 3 y 3y x-direction by a factor of 2 and in the y-direction by a factor of 3. ￿ ￿

18. Linear Transformations

105 ￿

￿ ￿ ￿ ￿ 1 0 x x Example 3: Let A = , then A = . This matrix reﬂects the plane 0 −1 y −y R2 across the x-axis. ￿ ￿ ￿ ￿ ￿ 1 0 x x Example 4: Let A = , then A = . This matrix perpendicularly 0 0 y 0 projects the plane R2 onto the x-axis. ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 0 −1 1 0 0 −1 Example 5: Let A = , then A = and A = . Clearly A 1 0 0 1 1 0 rotates the coordinate vectors by 90◦ , but does this mean that it rotates every vector by this amount? Yes, as we will see in the next example. ￿ ￿ ￿

Example 6: Let’s consider the transformation that rotates the plane R2 by an angle θ. The ﬁrst thing we must do is to show that this transformation is linear. Since any rotation T takes the parallelogram deﬁned by x and y to the congruent parallelogram deﬁned by T (x) and T (y), it takes the vertex x + y to the vertex T (x) + T (y). Therefore it satisﬁes the property T (x) + T (y) = T (x + y), which is Property 1 for linear transformations. Property 2 can be veriﬁed in the same way.
T(x + y) T(x) y x+y T(y)

x

FIGURE 13 We conclude that a rotation is a linear transformation. We are therefore justiﬁed in asking for its matrix representation A. To ￿ ￿ A all we ￿ ﬁnd have to do ￿ to￿compute ￿ ￿ is ￿ 1 cos θ 0 − sin θ where the coordinate vectors go. Clearly A = and A = 0 sin θ 1 cos θ ￿ ￿ cos θ − sin θ and therefore A = . sin θ cos θ

106

18. Linear Transformations

T

0 1

= -sin ! cos !

0 1 T ! ! 1 0 1 0 = cos ! sin !

FIGURE 14 Example 7: Now consider reﬂection across an arbitrary line through the origin. A reﬂection clearly takes the parallelogram deﬁned by x and y to the congruent parallelogram deﬁned by T (x) and T (y) and therefore satisﬁes Property 1. Property 2 can be veriﬁed in the same way.
T(x + y)

T(y) y x+y

T(x)

x

FIGURE 15 A reﬂection is therefore a linear transformation and so has a matrix representation determined by where it takes the ￿ ￿ coordinate ￿ For ￿ vectors. ￿ ￿ example, if A reﬂects ￿ ￿ 1 0 0 1 R2 across the line y = x, then A = and A = and therefore 0 1 1 0 ￿ ￿ 0 1 A= . 1 0 Example 8: To show that a perpendicular projection of R2 onto an arbitrary line through the origin is a linear transformation is a little more diﬃcult. The parallelogram deﬁned by x and y is projected perpendicularly onto the line. By the congruence of the two shaded triangles in the ﬁgure below we see that ￿T (x)￿ + ￿T (y)￿ =

we conclude that T (x) + T (y) = T (x + y). then A = 1 and A = 1 . But if we ﬁx y = c go. For example. Property 2 can be veriﬁed in the same way. 0 1 2 2 ￿1 1￿ and therefore A = 2 1 2 2 1 2 . Linear Transformations 107 ￿T (x + y)￿. even though we know where the coordinate 0 1 vectors ￿ ￿it is￿still not￿easy to see what the transformation does.18. ￿ 1 2 Example 9: Let A = . and since these vectors all lie on the same line and point in the same direction. In this case. to the left otherwise). ￿ A FIGURE 17 . This is a horizontal shear. x x + 2c then A = shows us that the horizontal line at level c is shifted 2c units c c to the right (if c is positive. The other two cases when the line passes through the parallelogram or when x and y project to opposite sides of the origin are similar. if A is the matrix ￿ ￿ ￿1￿ ￿ ￿ ￿1￿ 1 0 2 2 of the projection of R2 onto the line y = x. x+y y x T(x) T(y) T(x + y) FIGURE 16 A projection is therefore a linear transformation and so has a matrix representation determined by where it takes the coordinate vectors.

Describe the geometrical eﬀect of each of the following transformations (where α2 + β 2 = 1 in (g) and (l)). This is a typical example of the composition of two linear transformations. It turns out that to see the geometrical eﬀect of this matrix we will need to compute its diagonal factorization. The reverse order would result in ￿ ￿￿ ￿ ￿ ￿ 0 −1 0 1 −1 0 AB = = . This is incorrect! EXERCISES 1. To apply 1 0 1 0 them in the correct order to an arbitrary vector x we must write B(A(x)) which by the associativity of matrix multiplication is the same as (BA)x. Linear Transformations ￿ 4 2 Example 10: Let A = . ￿ 1 ￿ ￿√ 1 1 ￿ ￿ ￿ ￿ ￿ − √2 −1 2 0 −1 0 0 2 2 (a) (b) (c) (d) 1 1 1 √ √ −1 0 0 1 −1 2 2 2 2   √ ￿ √ ￿ ￿ 1 ￿ 1 ￿ ￿ 3 3 0 −1 0 − 2 2 2 2 α −β √ √ (e) (f) (g) (h)  1 0 0  3 1 3 1 β α − 2 − 2 −2 2 0 0 1         0 0 −1 1 0 0 0 −1 0 α −β 0 (i)  0 1 0  (j)  0 1 0  (k)  1 0 0  (l)  β α 0  1 0 0 0 0 0 0 0 −1 0 0 −1 3. We will take up this approach in Section 22. Find the 3 × 3 matrix that ￿ .108 18. Note that it is extremely important to perform the multiplication in the correct order. Example 11: First rotate the plane R2 by 90◦ and then reﬂect across the 45◦ line. So we just compute the product ￿ ￿￿ ￿ ￿ ￿ 0 1 0 −1 1 0 BA = = . 1 0 1 0 0 −1 which is a reﬂection across the x-axis. Again the images of the coordinate vectors do not −1 1 tell us much. 1 0 1 0 0 1 which is a reﬂection across the y-axis. 2. Prove that linear transformations take subspaces to subspaces. Most matrices are in fact like this one or worse requiring even more sophisticated factorizations. The rotation is ￿ ￿ ￿ ￿ 0 −1 0 1 A= (Example 5) and the reﬂection is B = (Example 7).

reﬂects R3 across the plane x = y. then reﬂecting across the 135◦ line. ￿ ￿ 1 0 (a) 3 1 ￿ ￿ 3 1 (b) 1 3 6. (c) Transform R3 by ﬁrst rotating the xy-plane. Find the image of the unit circle x2 + y 2 = 1 under transformations induced by the two matrices below. 8. rotates R3 around the x-axis by 45◦ . y) where x2 + y 2 = 1.18. and then rotating by −60◦ . What are the image curves? (Hint: Let (¯. and ﬁnd an equation satisﬁed by (¯. projects R3 onto the xz-plane. y ) be the image x ¯ of (x. Why? 7. 4. In each case below ﬁnd the matrix that represents the resulting transformation and describe it geometrically. y ). then the xz-plane. (a) Transform R2 by ﬁrst rotating by −90◦ and then reﬂecting in the line x + y = 0. all through 90◦ .) x ¯ ￿ ￿ 2 0 (a) 0 2 ￿ ￿ 2 0 (b) 0 3 5. then the yzplane. Describe how the following two matrices transform the grid consisting of horizontal and vertical lines at each integral point of the x and y-axes. The matrix ￿ 1 0 1 0 ￿ maps R2 onto the x-axis but is not a projection. Interpret the equality ￿ cos β sin β − sin β cos β ￿￿ cos α sin α ￿ ￿ ￿ − sin α cos (α + β) − sin (α + β) = cos α sin (α + β) cos (α + β) geometrically. . (b) Transform R2 by ﬁrst rotating by 30◦ . Linear Transformations 109 (a) (b) (c) (d) reverses the direction of every vector. Obtain the trigonometric equalities cos (α + β) = cos α cos β − sin α sin β sin (α + β) = sin α cos β + cos α sin β.

) 13. T 0 = . 1 2.) ￿￿ ￿ ￿ ￿ cos θ sin θ 1 0 cos θ − sin θ 11.T = . If T rotates R2 by 30◦ and dilates it by a factor of 5. x3 . If T reﬂects R3 in the xy-plane and dilates it by a factor of matrix. then ﬁnd the matrix of T . Interpret the equality = geometrisin θ − cos θ 0 −1 sin θ cos θ cally. prove the product of any two reﬂections is a rotation. x4 + 2x3 . x2 . 17. If T = . If T  0  = . Conclude that any rotation can be written as the product of two reﬂections. x1 + x3 . 4 −2 2 1 ￿ 16. Show that the matrix that reﬂects R2 across the￿line through the origin that ￿ cos 2θ sin 2θ makes an angle θ with the x-axis is . (Hint: Compute where the sin 2θ − cos 2θ coordinate vectors go. (Use the results of Exercises 8 and 9. Prove the converse of the result of the previous exercise. then ﬁnd its matrix. then ﬁnd its . then ﬁnd its matrix. that is. x4 ) = (x2 . 12.) 10. 2x3 ). (Hint: Compute where cos θ sin θ sin2 θ the coordinate vectors go. Find the matrix that represents the linear transformation T (x1 .       ￿ ￿ ￿ ￿ ￿ ￿ 1 0 0 4 0 −3 14.110 18. T 1 = . 5 −2 1 0 0 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 5 6 3 7 15. Linear Transformations 9. Show that the matrix that projects R2 onto the line through the origin that ￿ ￿ cos2 θ cos θ sin θ makes an angle θ with the x-axis is .

) Rn null(A) A Rm row(A) col(A) FIGURE 18 Now we will show how to compute each of these subspaces for any given matrix. (This follows from Section 15 where it was shown that the set of solutions to Ax = 0 is closed under addition and scalar multiplication. we mean “ﬁnd bases for these subspaces.” To illustrate. The set of vectors x in Rn such that Ax = 0 is called the null space of A and is written null(A). By “compute these subspaces”. In fact. so running Gaussian elimination (actually Gauss-Jordan elimination) on A we obtain   1 2 0 0 −3 U = 0 0 0 2 2 . row(A): To ﬁnd a basis for row(A). Row Space. 1 2 0 6 3 1. Null Space 111 19. We view A as a map from Rn to Rm and make the following deﬁnitions. Let A be an m × n matrix. null(A) is a subspace of Rn . Column Space. NULL SPACE In the previous section we considered matrices as linear transformations. the spanning vectors are already the rows of a matrix. Now we consider rectangular matrices and try to understand the geometry of the linear transformations they induce. 0 0 0 0 0 . The subspace of Rn spanned by the rows of A (thought of as column vectors) is called the row space of A and is written row(A). we use the method of Section 16 Example 2. COLUMN SPACE. ROW SPACE. we deﬁne three fundamental subspaces associated with any matrix. Recall that to ﬁnd a basis for a subspace spanned by a set of vectors we just write them as rows of a matrix and then do Gaussian elimination. All of the examples we looked at were square matrices. we will use the example   1 2 0 4 1 A = 0 0 0 2 2. To do this. In this case.19. The subspace of Rm spanned by the columns of A is called the column space of A and is written col(A).

But we have done this before (Section 16 Example 3). null(A): We want to ﬁnd a basis for all solutions of Ax = 0. col(A): We have just seen that A and U have the same row spaces. We just solve U x = 0 and obtain      −2 0 3  1  0  0        x = a 0  + b1 + c 0 .  0  . the same is true of the pivot columns of A. That is.  0  . This implies that independence and dependence relations between the columns of U correspond to independence and dependence relations between the corresponding columns of A.112 19.  2  . since the pivot columns of U are linearly independent (because no such vector is a linear combination of the vectors that preceed it).  1  . The reason for this is as follows: The two systems Ac = 0 and U c = 0 have exactly the same solutions. We conclude that     4   1 col(A) has basis  0  . so     0   1   2   0         row(A) has basis  0  . Therefore. and any other columns are dependent on these two (Exercise 8).   0   0   −1       0 0 1 . the same is true of A. Column Space. Row Space. this is not true! What is true is that the columns of A that form a basis for col(A) are exactly those columns that correspond to the columns of U that form a basis for col(U ). linear combinations of the columns of A can be written as Ac and of U as U c. since every nonpivot column of U is a linear combination of the pivot columns.       0 0 −1 0 0 1 We conclude that       0 3   −2   1   0   0           null(A) has basis  0  .   0   2       −3 2 2. Therefore the same can be said of A. the two nonzero independent rows of U form a basis for row(A). Furthermore.   1 6  3. for the U of our example. In this example they are columns 1 and 4. And likewise. Do they also have the same column spaces? No. Null Space Since row(A) = row(U ). columns 1 and 4 are independent.

. From the example above. which when written out looks like     x1 row 1 of A 0  row 2 of A   x2   0   . determines the number of vectors in the bases of both row(A) and col(A). Therefore null(A) is the orthogonal ⊥ complement of row(A). 1. Null Space 113 -2 1 0 0 0 null(A) 0 0 1 0 0 R5 R3 1 2 0 0 -3 A col(A) 4 2 3 3 0 0 -1 1 row(A) 0 0 0 2 2 1 0 1 FIGURE 19 We make a series of observations about these three fundamental subspaces. The number of free variables in U determines the number of vectors in the basis of null(A).  . If x is any vector in null(A). we have dim(row(A)) + dim(null(A) = n. Column Space. this means that x is orthogonal to each row of A and therefore to row(A). 2. which is called the rank of A.   . (See Section 17. row m of A 0 xn  Because of the way matrix multiplication works. We therefore have dim(col(A)) = dim(row(A)) = rank(A). . .) This is the . then Ax = 0. . We write row(A) = null(A) and conclude that null(A) and row(A) are orthogonal complements of each other. Row Space. A= .  =  . Since (the number of leading variables) + (the number of free variables) = n.19.   . we immediately see that the number of leading variables in U . 3.

then A(x0 − y) = Ax0 − Ay = b − b = 0 ⇒ x0 − y = w where w is some vector in null(A). (e) The columns of A are linearly independent. . not empty. If A is a square matrix. (b) ⇒ (c): Suppose Ax = b and Ay = b. (e) ⇒ (a): The equation Ax = 0 can be interpreted as a linear combination of the columns of A equaling zero. (c) ⇒ (d): If v1 . For any matrix A the following statements are equivalent. . Column Space. A takes distinct vectors to distinct vectors). vn are linearly independent. (d) ⇒ (e): A maps the set of coordinate vectors. 6.  . Since the columns of A are independent. so we have y = x0 + w. 4. (a) null(A) = {0} (b) A is one-one (that is. Another way of saying this is that col(A) consists of all those vectors b for which there exists a vector x such that Ax = b. we get all solutions expressed in this form automatically.   . (c) If Ax = b has a solution x. the equation Ax = b can be written as       a11 a12 a1n  a21   a22   a2n  x1  . to the set of its own columns.  + · · · + xn  . then c1 Ax1 + c2 Ax2 + · · · + cn Axn = 0 ⇒ A(c1 x1 + c2 x2 + · · · + cn xn ) = 0 = A0 ⇒ c1 x1 + c2 x2 + · · · + cn xn = 0 ⇒ c1 = c2 = · · · = cn = 0. If x0 is a solution of the system Ax = b. .  . This ends the proof.) Then A has several important properties which we summarize in a theorem: Theorem. . that is. am1 am2 amn This immediately says that the system Ax = b has a solution if and only if b is in col(A). (d) A takes linearly independent sets to linearly independent sets. it must be unique. Row Space. then Ax = Ay ⇒ x = y. then any other solution can be written as x0 + w where w is any vector in null(A).  + x2  . Null Space reason that Figure 18 was drawn the way that it was.114 19. (In this case say that the null space is trivial.   . Note that when we solve Ax = b by Gaussian elimination. which are independent. For suppose y is another solution. Proof: We prove (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a) (a) ⇒ (b): x ￿= y ⇒ x − y ￿= 0 ⇒ Ax − Ay = A(x − y) ￿= 0 ⇒ Ax ￿= Ay. Suppose null(A) = {0}. with the line null(A) perpendicular to the plane row(A). the null space of A consists of only the zero vector. this can happen only if x = 0. . v2 . that is. It must always contain at least the zero vector. A null space can never be empty. .  = b. or in other words col(A) is the image of Rn under the transformation A. 5. this theorem can be combined with the theorem of Section 9 as follows. . which therefore must also be independent. As we have seen many times before.

Then (d) ⇔ (g) follows from the previous theorem. (d). Row Space.” ￿ ￿ 1 2 (a) 2 4 ￿ ￿ 1 2 (b) 2 3   2 4 2 (c)  0 4 2  2 8 4   3 2 −1 3 5   6 (d)   −3 −1 8 0 −1 7   1 2 −1 −4 1 (e)  2 4 −1 −3 5  3 6 −3 −12 3   2 8 4 0 0 7 2 1 −2   2 (f)   −2 −6 0 −1 6 0 2 4 −2 4 . (g) The columns of A are linearly independent. Null Space 115 Theorem. (h) The rows of A are linearly independent. and (f) ⇔ (g) ⇔ (h) is obvious from dim(row(A)) = dim(col(A)) = rank(A). and (e). Proof: From the theorem of Section 8 we have the equivalence of (a). (f) A has rank n. (d) null(A) = {0} (e) det(A) ￿= 0. EXERCISES 1. Column Space. (b). (c) Ax = b has a unique solution for any b. and null spaces and ﬁll in the blanks in the sentence “As a linear transformation. For an n × n matrix A the following statements are equivalent.19. (c). A maps from dimensional Euclidean space to dimensional Euclidean space and has rank equal to . For each matrix below ﬁnd bases for the row. column. (a) A is nonsingular (b) A is invertible.

Column Space.     −3 −3 (a) Is  −3  in null(A)? What does A  −3  equal? −3 −3   −3  13  in col(A)? Is it in the image of A? (b) Is 0   −5 (c) Is Ax =  −5  solvable? 2   −4 (d) Is  6  in row(A)? −2   1  2  and column space 3. ￿ ￿ −3 (d) Is in col(A)? 3 ￿ ￿ −4 (e) Is Ax = solvable? 2 4. and null spaces of the following kinds of transformations of R2 . column.116 19. The 3 × 3 matrix A has null space generated by the vector 1 equal to the xy-plane. Describe the row. (a) rotations (b) reﬂections (c) projections . Row Space. The 2 × 3 matrix A has row space generated by the vector 9 ￿ ￿ 2 generated by the vector −1   −2 (a) Is  −4  in row(A)? −8   −2 (b) Is  −1  in null(A)? 2 (c) Find a basis for null(A). Null Space   1  1  and column space 2.

6. then n ≤ m (A is tall and skinny) and A is one-one. Show directly from the deﬁnition that columns 1 and 4 of  0 0 0 0 are linearly independent.19. 1 2 8. Column Space. Null Space 117 5. 10. For each case below explain why it is not possible for a matrix to exist with the stated properties. while columns 1. In particular. Row Space. 1 1 4 3 (c) Column space = R and row space = R . then A takes subspaces into subspaces of the same dimension. (a) rank(A) ≤ n and m. Show that if null(A) = {0}. Prove the following assertions for an m × n matrix A. (c) If rank(A) = m. and any other columns dependent. then n ≥ m (A is short and fat) and Ax = b has at least one solution for any b. 7. Give examples of matrices A such that    1  (a) null(A) = span  2    3  ⊥  1  (b) null(A) = span  2     3    1  (c) col(A) = span  2    3 (d) A is 4 × 5 and dim(null(A)) = 3   0 0 −3 0 2 2  0 0 0 are linearly .   1  2 . 4. (b) If rank(A) = n. A takes all of Rn into an n-dimensional subspace of Rm . (a) Row space and null space both contain the vector 3     3 1 (b) Column space has basis  2  and null space has basis  3 . 9. Write down all possible row echelon forms for 2 × 3 matrices.

y y = c + dx (2.4) This is an example of a curve ﬁtting problem. Example 1: We want to ﬁt a straight line y = c + dx to the data (0.011 .2) (0. (1. (2. This means we must ﬁnd the c and d that satisfy the equations c+d·0=1 c+d·1=4 c+d·2=2 c+d·3=5 or the system  1 1  1 1    0 ￿ ￿ 1 1 c 4 =  .012 N2 O5 108. LEAST SQUARES AND PROJECTIONS When a scientist wants to ﬁt a mathematical model to data. Least Squares and Projection 20.010 N2 O4 92.006 N2 O3 76. 2). (3. 4).013 NO2 46. he often samples a greater number of data points than the number of unknowns in the model.5) (1.  2 d 2 3 5 (3.1) x FIGURE 20 Example 2: Suppose we have experimentally determined the molecular weights of the following six oxides of nitrogen: NO 30. We illustrate this situation with the following two examples. The result is an overdetermined inconsistent system (one with more equations than unknowns and no solution). 1). 5).006 N2 O 44.118 20.

we rewrite the system as Ax − b ≈ 0.006 2 · N + 1 · O = 44.006  =  . Our goal then is to ﬁnd an x that makes this vector as close to zero as possible.010 2 · N + 4 · O = 92. one solution. (In statistics they are called linear regression problems.010 4 92. We are therefore faced with the problem of “solving” an overdetermined inconsistent system of equations – an impossibility! Since there is no hope of ﬁnding a solution to the system in the normal sense. the only thing we can do is to ﬁnd x’s that satisfy Ax ≈ b. as small as possible. in other words.013 1 · N + 2 · O = 46. ￿ c + 2d − 2 ￿ ￿ ￿ c + 3d − 5 Each term under the square root can be interpreted as the square of the vertical distance by which the line y = c + dx misses each data point.012 2 · N + 5 · O = 108.006 2 · N + 3 · O = 76. We know that a system can have no solution. when a system with more equations than unknowns arises from experimental data. But in practice. This means that we must ﬁnd the N and O that satisfy the equations 1 · N + 1 · O = 30.006 1  ￿ ￿  44. The vector x that does this is called the least squares solution to Ax = b. Since we measure the size of a vector by its length. 3 O  76.) . or inﬁnitely many solutions. Least Squares and Projections 119 We want to use this information to compute the atomic weights of nitrogen and oxygen as accurately as possible. This is why such problems are called least squares problems. The “best” x would be the one that makes this approximate equality as close to an exact equality as possible. it is extremely unlikely that the second or third cases will occur.20.013     2 N  46. The left-hand side of this “equation” is a vector. To give meaning to this last statement.011 or the system  1 2  1  2  2 2    1 30. Our goal is to minimize the sum of the squares of these errors.011 Each of these problems requires the solution of an overdetermined system Ax = b.012     5 108. we come to a formulation of the least squares problem for Ax = b: Find the vector x that makes ￿Ax − b￿ as small as possible. If we write out ￿Ax − b￿ for Example 1 above we get ￿ ￿ ￿ c + 0d − 1 ￿ ￿ ￿ ￿ c + 1d − 4 ￿ ￿ ￿ ￿ = (c + 0d − 1)2 + (c + 1d − 4)2 + (c + 2d − 2)2 + (c + 3d − 5)2 . or.

These are called the normal equations for the least squares problem Ax = b. .  .120 n 20. . . or said another way. The matrix A takes vectors x to vectors Ax in col(A). Then b and col(A) both lie in Rm . if the dot product Ax − b with each column of A is zero. .  . Least Squares and Projection How do we ﬁnd the x that minimizes ￿Ax − b￿? First we view A as a map from R to Rm . 0 2 of A     Ax − b  =  . (For a proof see Exercise 10. col n of A 0 This is just AT (Ax − b) = 0. . which can be rewritten as AT Ax − AT b = 0 or as AT Ax = AT b. They form an n × n linear system that can be solved by Gaussian elimination. Note that b does not lie in col(A) otherwise Ax = b would be solvable exactly. . that is.) And this holds if and only if Ax − b is orthogonal to the columns of A. . Intuitively this occurs when Ax − b is orthogonal to col(A). . to ﬁnd a vector of the form Ax that is as close to b as possible. Rm b Rn x A Ax-b Ax col(A) FIGURE 21 Our problem is to ﬁnd the Ax that makes Ax − b as short as possible.  . If we write the columns of A horizontally. we can express these conditions all at once as  col  col     0 1 of A  . We can now solve the two problems at the beginning of this section. We summarize: The least squares solution to the overdetermined inconsistent linear system Ax ≈ b is deﬁned to be that vector x that minimizes the length of the vector Ax − b. It is found as the exact solution to the normal equations AT Ax = AT b. .

for large scale problems AT A is usually singular.5 = . But when we said that the least squares solution is the solution of the normal equations. because in that case we have AT Ax = 0 ⇒ xT AT Ax = 0 ⇒ (Ax)T (Ax) = 0 ⇒ ￿Ax￿2 = 0 ⇒ Ax = 0 ⇒ x = 0.013    2  46.9993 ￿ It is clear that the matrix AT A is square and symmetric (see Section 2 Exercise 6(e)). Example 2 again: The normal equations for this problem are 1 ￿2  2 1  4 2  2 2 ￿ 18 29   1 1￿ ￿ ￿  2 N 1 =  3 O 1  5 4 29 56 ￿￿ N O ￿ ￿  30. In fact. This is true if the columns of A are independent. we were implicitly assuming that the normal equations could be solved. For such problems it is necessary to use more numerically stable methods such as the QR factorization (see the next section) or the singular value decomposition. d 1 ￿￿ ￿ ￿ ￿ c 12 = d 23 So the best ﬁt line in the least squares sense is y = 1. or is so close to being singular that Gaussian elimination tends to give very inaccurate answers.161 and the solution by Gaussian elimination is ￿ ￿ ￿ ￿ N 14.5 + x.010 92.104 = 1302.0069 = .006 ￿  44.012    108. then AT A will be singular. . But if the columns of A are not independent.20. O 15. that is. that AT A is nonsingular.006    4  76. Least Squares and Projections 121 Example 1 again: The normal equations for this problem are ￿ 1 0 1 1  ￿ 1 1 1 1  2 3 1 1 ￿ 4 6  0 ￿ ￿ ￿ 1 c 1 =  2 d 0 3 6 14 1 1 1 2   ￿ 1 1 4   3 2 5 or multiplied out are and the solution by Gaussian elimination is ￿ ￿ ￿ ￿ c 1.011  ￿ 1 1 2 1 1 2 2 3 2 5 2 1 1 2 2 3 2 5 or multiplied out are 716.

122 20. the columns of A must be independent for this to work. We conclude that P = A(AT A)−1 AT is the matrix that projects Rm onto the subspace col(A).  . we have inadvertently found the solution to a seemingly unrelated problem: the computation of projection matrices. that is. Note that P in the example above is symmetric. Therefore. the vector p = Ax is the orthogonal projection of the vector b onto the subspace col(A). Solving the normal equations for x we obtain x = (AT A)−1 AT b. Least Squares and Projection In solving the least squares problem. we simply multiply b by the matrix P = A(AT A)−1 AT . Example 3: Find the matrix that projects R3 onto the plane spanned by the vectors     1 2  0  and  1 . and then compute A= 1 1 P = A(AT A)−1 AT   ￿ 1 2 0 1 1 = 2 1 1   1 2 ￿￿ 2 = 0 1 3 1 1   1 2 ￿ 2 = 0 1 −1 1 1 2 1 1  = 3 1 3 1 3 3 2 3 −1 3  −1 ￿ 1 2 ￿ 0 1  1  0 1 1 1 2 1 1 ￿￿−1 ￿ ￿ 3 1 0 1 6 2 1 1 −1 2 3 0 1 1 1 ￿ 3 −1 3 2 3  ￿￿ 1 2 0 1 1 1 ￿ Just as in the case of least squares. These observations also go in the other direction. Furthermore. the two given vectors must form a basis for the subspace to be projected onto. to ﬁnd the projection of any vector b onto col(A). and putting this expression back into p we obtain p = A(AT A)−1 AT b. It turns out that this is true of any projection matrix (Exercise 9(a)). First line up the two vectors (in any order) to form the matrix 1  1  1 2  0 1 . From our geometrical considerations. . projection matrices also satisfy the property P 2 = P (Exercise 9(a)).

If we reverse the direction of x − P x we get a new vector y = P x − (x − P x) which we deﬁne to be the reﬂection of x across the subspace S. and therefore the matrix R = 2P − I reﬂects Rm across the subspace S. this resolution is unique. Furthermore. x x . From this we can see more precisely how any matrix A behaves as a linear transformation from one Euclidean space to another. Projection matrices can be used to compute reﬂection matrices. Any vector x can be written as x = P x + (x − P x) where P x is the projection of x onto S and x − P x is the component of x orthogonal to S. This shows that A essentially projects x onto r in row(A) and then maps r to a unique vector Ar in col(A). Any matrix can therefore be visualized as a projection onto its row space followed by one-one linear transformation of its row space onto its column space.     1 0 5 0 1 4 (a) A =   and b =   1 1 6 1 2 4 S .Px Px -(x . EXERCISES 1. Let S = null(A). Solve Ax = b in the least squares sense for the two cases below. Note that y can then be written as y = P x − x + P x = 2P x − x = (2P − I)x.20. We need only verify that P x − x is orthogonal to col(P ) for any vector x. where n is in null(A) and r is in col(A). We check all the required dot products at once with the computation P T (P x − x) = P (P x − x) = P 2 x − P x = P x − P x = 0. Let S be a subspace of Rm . since orthogonal vectors are linearly independent (Section 17 Exercise 11). Then any vector x can be expressed uniquely as x = n + r. any matrix P that satisﬁes P T = P and P 2 = P is the projection matrix of Rm onto col(P ). First we have to precisely deﬁne what we mean by a reﬂection. so that S ⊥ = row(A). Least Squares and Projections 123 that is.Px) y FIGURE 22 The equation x = P x + (x − P x) above also shows that any vector x can be resolved into a component in S and a component in S ⊥ . Applying A to x we obtain Ax = An + Ar = 0 + Ar.

3).15 CuS 95. 5. 8). 0. 0). (3. (1. (2. (0. (a) y = ax: (1. (−1. 3   2 (b) R3 onto the line generated by  1 . 0. 3).  1 . 1. the 1 (a) R2 onto the line generated by . ( π . For each case below ﬁnd the line or surface of the indicated type that best ﬁts the given data in the least squares sense. (1. 2. We want to use the following molecular weights of sulﬁdes of copper and iron to compute the atomic weights of copper. 1 1     1 0 1 0 (e) R4 onto the plane spanned by   . 20). 4). 1) (b) y = a + bx: (0.124 20. 1. 3). Do not solve them! 4. 6) (d) z = a + bx2 + cy 2 : (0. 1. 15) (e) y = a + bt + ct2 : (1. (2. 5). 0. iron. (0. 1 1     1 1 (d) R3 onto the plane spanned by  0  . (π. 0 1 1 0   1  2  onto the plane in Exercise 4(c) above. 5) (c) z = a + bx + cy: (0. and sulfur. 5). (− π . (3. (4.92 Fe3 S4 295. (0.61 FeS 87. 1). 5). Least Squares and Projection 2. −6). 3).98 1 2 (b) A =  0 1     4 −1 −1 3 1   2   and b =   3 1 −1 2 −1 1 Express this problem as an overdetermined linear system.  1 . 5). (−1. Cu2 S 159. 6). Write down the normal equations. 1. 2     1 −2 (c) R3 onto the plane spanned by  1  . (0. Find the projection of the vector 3 . 10).81 Fe2 S3 207. Find the projection matrices for ￿ ￿ indicated subspaces below. 5) (f) y = a + b cos t + c sin t: (0. 1).  . (1. (−1.90 FeS2 119. −3) 2 2 3. 2). 5). (1.

10.20. then P T = P and P 2 = P . b is a vector not in S. (ii) rotation by π radians. (iii) 0 −1 reﬂection across the x-axis and reﬂection across the y-axis. b z S w FIGURE 23 .   −1 0 0 (c)  0 −1 0  (i) Reﬂection through the origin and (ii) rotation by π radians 0 0 −1 around the z-axis and reﬂection across the xy-plane. then show ￿b − w￿ ≤ ￿b − z￿ where z is any other vector in S. Find the projection matrices for the indicated subspaces below. (a) R2 onto the line y = 2x. (b) R3 onto the plane x − y − 2z = 0. 8. (Use the Pythagorean Theorem on the right triangle with sides b − w and z − w. Show that as transformations the matrices below have the following geometric interpretations. and w is a vector in S such that b − w is orthogonal to S.   −1 0 0 (b)  0 −1 0  (i) Reﬂection across the z-axis and (ii) rotation by π radians 0 0 1 around the z-axis. 7. 9. (b) if R = 2P − I. ￿ ￿ −1 0 (a) (i) Reﬂection through the origin. Use matrix algebra to prove (a) if P = A(AT A)−1 AT . If S is a subspace of Rn .) Conclude that w is the unique point in S closest to b. then RT = R and R2 = I. Find the reﬂection matrix of R3 across the plane in Exercise 4(c) above. Least Squares and Projections 125 6.

· · · . ￿ ￿ ￿ ￿ 1 0 2 Example 1: In R the coordinate vectors and form an orthonormal basis. Gram-Schmidt. These are especially nice cause they don’t involve square roots.  2   3 2 2 6  ￿3 ￿ −1 −7 3 3 3 7 7 4 5 5  2  6 2  3 2  −1  3  7 3 3  7 7  4 3 −5 5 1 2 2 2 6 −3 −7 3 3 3 7 7  1 1 −2 −1 4  10 2 2 8 1  10 5  −9 −9 9 15 15 15 1  −1 −1 2 2 7  10  2 4 2 −4  − 11 15   1 9  15  9 9  15 1 1  −2 −2 2 4 1 8 5 2 − 15 − 15 14 9 9 9 15 −1 −1 −1 2 2 2 ones be3 5 4 5 2 and 4 5 −3 5 . But this time the coeﬃcient matrix has a special form: its columns are orthonormal. that is. We will see in a moment that this fact will enable us to solve the system much more easily than by using Gaussian elimination. As we 7 −3 5 5 ￿3 ￿￿ ￿ ￿ ￿ 4 c 2 5 5 have done many times before. We say that it forms an orthogonal or orthonormal basis (whichever the case) for the subspace that it spans. qn is orthogonal if every pair of vectors in the set is orthogonal. ORTHOGONAL MATRICES. ￿q1 ￿ = ￿q2 ￿ = · · · = ￿qn ￿ = 1. that is. (It is not called an orthonormal matrix even though that might make more sense. GRAM-SCHMIDT. we want to write = c 4 +d . We know that such a set of vectors is linearly independent (Section 17 Exercise 11). and QR Factorization 21. 0 1 ￿ ￿ ￿ ￿ 3 4 while the vectors and form an orthogonal basis. q2 . for example. which can therefore be taken as the deﬁning condition for a matrix to be orthogonal. qi · qj = 0 for i ￿= j. Example 2: Here are some orthogonal matrices. AND QR FACTORIZATION A set of vectors q1 .) Clearly the columns of Q are orthonormal if and only if QT Q = I.126 21. If we divide the second 4 −3 two vectors by their lengths to make them unit ￿ ￿ vectors (this is called normalizing the ￿ ￿ vectors).  −1 2 −1  2   −1  2 1 2 . Suppose. Orthogonal Matrices. we rewrite this as 4 = and solve 3 −5 d 7 5 by Gaussian elimination. we obtain the orthonormal basis we should be able to express any vector in R as a linear combination of ￿ ￿3￿ ￿ 4 these two ￿ ￿ 2 5 5 vectors. Furthermore the set is orthonormal if all the vectors in the set are unit vectors. Since we have a basis. We say that a square matrix Q is an orthogonal matrix if its columns are orthonormal.

Therefore Q has orthonormal columns and so is an orthogonal matrix. we also have QQT = QQ−1 = I. The matrix Q = 2 3  2  3 −1 3 2 3 −1 3 2 3 matrix because it is not square. In particular Q takes the coordinate vectors of Rn into an orthonormal set. and angles. For simplicity. From the deﬁning condition for an orthogonal matrix QT Q = I we immediately have Q−1 = QT . (d) angles: The angle between Qx and Qy is given by arccos((Qx · Qy)/(￿Qx￿￿Qy￿)) which from (a) and (c) equals arccos((x · y)/(￿x￿￿y￿)) which is the angle between x and y. .) Since Q preserves lengths and angles. but this set consists of the columns of Q. 5. and then normalize it to get the . and QR Factorization 127 Now we make a series of observations about orthogonal matrices. If a matrix Q preserves length. v2 . Check it! 4. . . vn of a subspace V . (a) length: ￿Qx￿2 = (Qx)T (Qx) = xT QT Qx = xT x = ￿x￿2 ⇒ ￿Qx￿ = ￿x￿. Suppose we are given the following basis for R3 :  −2 v1 =  −2  . (b) distance: From (a) and ￿Qx − Qy￿ = ￿Q(x − y)￿ = ￿x − y￿. since angles can be expressed in terms of the dot product. We will use an example to illustrate a method for doing this. instead of a subspace. This is the converse of 4(a) above. Another way to prove this is to show that Q must preserve dot products and. 1. p3 . q2 . . Thus a linear system with an orthogonal coeﬃcient matrix can by solved by a simple matrix multiplication. 1    2 8. Orthogonal matrices. it preserves distance (as in 4(b) above). v2 = 2   7 7. must preserve angles also. . we will take all of R3 . Since QT is the inverse of Q. This suggests that to solve a system Qx = b with an orthogonal coeﬃcient matrix like that in Example 1 above. qn for V . we just multiply both sides by QT to obtain QT Qx = QT b or x = QT b. (c) dot product: Qx · Qy = (Qx)T (Qy) = xT QT Qy = xT y = x · y. Since Q preserves length. Gram-Schmidt. By the SSS congruence theorem of Euclidean geometry this implies that Q takes triangles into congruent triangles and therefore preserves angles.21. ﬁnd an orthonormal basis q1 . it must be orthogonal. This immediately says that the rows of an orthogonal matrix are orthonormal as well as the columns!   3. v3 = 1   has orthonormal columns but is not an orthogonal We will ﬁrst ﬁnd an orthogonal basis p1 . dot product. . We leave orthogonal matrices for a moment and consider a seemingly unrelated problem: Given a basis v1 . (See Exercise 18 where even more is proved. . As a transformation an orthogonal matrix Q preserves length. distance. Let’s consider each separately. p2 . 2. it takes orthonormal sets into orthonormal sets. Note that QT Q = I but QQT ￿= I. .

q3 . 1 The second step is to ﬁnd a vector p2 that is orthogonal to p1 and such that span{p1 . v2 }. We can accomplish this by deﬁning p2 to be the component of v2 orthogonal to p1 . We can accomplish this by deﬁning p3 to be the . p3 } = span{v1 . v2 . p2 . v3 }.128 21. q2 . and QR Factorization orthonormal basis q1 . p2 } = span{v1 . = 4 p3 v3 p2 v2 v 1 = p1 FIGURE 24 The third step is to ﬁnd a vector p3 that is orthogonal to p1 and p2 and such that span{p1 . Orthogonal Matrices. Gram-Schmidt. The ﬁrst step is to set p1 = v1 :   −2 p1 =  −2  . Just subtract from v2 its projection onto p1 : v2 · p1 p2 = v2 − p1 p 1 · p1     2 −2 −18  = 8 − −2  9 2 1   −2  4 .

21. . . . It should be clear how to extend it to larger numbers of vectors. . . q2 . Q to be the matrix with columns q1 . . . All we have to do is to subtract oﬀ the projection of v3 onto p1 and p2 separately. .     v1 v2 v3  =  q1 . Finally we normalize the p’s to obtain the orthonormal q’s:  2  1  2  −3 −3 3  −2   2   −1  q1 =  3  .) v3 · p2 v3 · p1 p3 = v3 − p1 − p2 p1 · p1 p2 · p2       7 −2 −2 −27  18  = 7 − −2  − 4  9 36 1 1 4   2 =  −1  . First note that v1 is in span{q1 } v2 is in span{q1 . q2 . v3 . q2 } v3 is in span{q1 . Just subtract from v3 its projection onto span{p1 . v2 . . For our example this looks like 2     −2 −1  3 3 3 −2 2 7 ∗ ∗ ∗ 2  −2 8 7  =  − 2 −1   0 ∗ ∗  . v3 }. p2 } = span{v1 . Gram-Schmidt. v2 . v2 }. . and span{p1 . and QR Factorization 129 component of v3 orthogonal to span{p1 . q2 . .  . . p2 . . q3 =  3  . q3 }. 2 At each stage the p’s and the v’s just are linear combinations of each other. .  3 3 3  1 2 2 1 2 1 0 0 ∗ 3 3 3 . .  . 1 3 2 3 2 3 The method that we have just illustrated is called the Gram-Schmidt process. p2 }. Using matrices this can be written as  . (This works because p1 and p2 are orthogonal.  q3   0 ∗ ∗  . ∗ ∗ ∗ .   . . We can also express the result of the Gram-Schmidt process in terms of matrices. . q2 =  3  . Orthogonal matrices. . . . We can interpret this as a factorization of the matrix A into an orthogonal matrix times an upper triangular matrix. so we have span{p1 } = span{v1 }. If we deﬁne A to be the matrix with columns v1 . . To ﬁnd this projection we don’t have to compute a projection matrix as might be expected. . See Exercise 12. then we have A = QR. . q3 . . . 0 0 ∗ . p3 } = span{v1 . span{p1 . . . p2 }. and R to be the appropriate upper triangular matrix.

Suppose that we had started with the matrix   −2 2 B =  −2 8  1 2 Then we would have had the factorization    −2 −1  ￿ ￿ 3 3 −2 2  −2 2  3 −6  −2 8  =  3 . Actually. but we will stop here. . .  2  2 1 −3 −3 3 −2 2  −1 2 2  = 3  −2 8 3 3 2 1 2 −1 2 3 3 3   3 −6 −9 3 . Gram-Schmidt. .  v3  . Note that Gram-Schmidt process. it is possible to obtain a QR-like factorization for any matrix whatever. Just multiply the equation A = QR by R = QT A:     . we can make an even more general statement. This is called the QR factorization and is the third great matrix factorization that we have seen (after the LU and diagonal factorizations). We conclude that any matrix A with independent columns has a factorization of the form A = QR where Q has orthonormal columns and R is upper triangular. = 0 6 0 0 3 We ﬁnally have  −2  −2 1   −2 3 2 7  2 8 7  =  −3 1 2 1 3 −1 3 2 3 2 3 2 3 −1 3 2 3 QT on the left to obtain . In fact. .130 21. is the ﬁrst truly new computational technique we have had since we ﬁrst introduced Gaussian elimination! In fact. there . . . . .  7 7 1  This shows that any square matrix A with independent columns has a factorization A = QR into an orthogonal Q and an upper triangular R. ∗ ∗ ∗ · · · q1 · · · . . on which all this is based.  . . and QR Factorization It is easy to ﬁnd R.  0 6 0 0 3 We see that B = QR where now Q has orthonormal columns but is not orthogonal! Fortunately QT Q = I is still true so the method above to ﬁnd R still works. . 0 0 ∗ · · · q3 · · · .   0 ∗ ∗  =  · · · q2 · · ·   v1 v2 . Orthogonal Matrices. 3  0 6 1 2 1 2 3 3  3 −6 −9  3 .

  =1 √1 1 2 5 2  0 20   1 3 1 3 2 √ 20 This gives the normal equations in the form ￿ ￿￿ ￿ ￿ 1 2 2 √ 3 c = − √3 0 5 d 20 ￿ 6 √ . Then plugging into the normal equations we obtain (QR)T QRx = (QR)T b or RT QT QRx = RT QT b or RT Rx = RT QT b.21. Since R is upper triangular. which 3 makes it competitive with Gaussian elimination in many situations. This equation is another matrix expression of the normal equations. In practice the QR method preferable to solving the normal equations directly since the Gram-Schmidt process for ﬁnding the QR factorization is more numerically stable than Gaussian elimination. For the ﬁrst. We ﬁnd the QR factorization of the coeﬃcient matrix: 1  − √3 2 20   1 0 1  ￿  − √1  ￿ 20  2 3 1 1  2 √ . Example 3: Recall the system  1 1  1 1    0 ￿ ￿ 1 1 c 4 =  . Suppose we have the QR factorization A = QR.  2 d 2 3 5 from the line ﬁtting problem of Section 20. it can be solved simply by back substitution. We mention two. Of course. Since RT is nonsingular (it’s triangular with nonzeros down its diagonal). ￿ 1 2 − √1 20 1 2 √1 20 1 2 √3 20 ￿   1 4   2 5 . = 5 The solution is c = 1. most of the work was done in ﬁnding the QR factorization of A in the ﬁrst place. Gram-Schmidt. recall an overdetermined inconsistent system Ax = b has a least squares solution given by the normal equations AT Ax = AT b.5 and d = 1 as before. we can multiply through by (RT )−1 to obtain Rx = QT b. The QR factorization has a wide range of applications. and QR Factorization 131 2n3 are eﬃcient algorithms that can perform Gram-Schmidt in operations. Orthogonal matrices.

0 1 =  0 6   √3 0 1 1 1 1 2 √ √ − 2 6 Then the projection matrix is √ 1  P =   = EXERCISES  1  √ ￿√ 6 1 2 2  √  0 6  √ 1 6 1 1 √ − √6 2 2 1 1  3 3 3 1 2 −1  . the projection matrix P of Rn onto col(A) becomes P = A(AT A)−1 AT = QR((QR)T (QR))−1 (QR)T = QR(RT QT QR)−1 RT QT = QR(RT R)−1 RT QT = QRR−1 (RT )−1 RT QT = QQT .132 21. Gram-Schmidt. ￿ ￿ ￿ ￿ 5 −22 (a) .) We construct the spanned by 1 1 matrix A with these two vectors as its columns and ﬁnd its QR factorization: √ 1 1  √   √  3 2 6 2 √2 1 2  2  √  √ . and QR Factorization The second application of the QR factorization is to computing projection matrices. Orthogonal Matrices. Of course. So the projection matrix assumes a very simple form: P = QQT . 12 −19 . (This is Section 20 Example 3. 3 3 3  1 1 2 −3 3 3 2 0 2 √ 6 1 √ 2 1 − √6 ￿ 1. again all the work has been done earlier in ﬁnding the QR factorization of A. Use the Gram-Schmidt process to orthonormalize the following sets of vectors. If we have the QR factorization A = QR. Example 4:   Suppose we want the projection matrix P of R3 onto the subspace   1 2  0  and  1 .

Express  9  as a linear combination of the vectors  3 . ￿ ￿ 5 −22 (a) 12 −19   −3 1 1 (b)  6 −9 5  2 4 11 1 −1  (c)  −1 −1  −10 (d)  11 2    1 0 1 −1 −2 0   1 0 0 1 −2 −1  20 −7  26  1 2 3  −1 −2 0  (e)   −1 −1 −1 −1 −1 2  2  2   1   −3 3 3 3  2   −1   2  3. and QR Factorization 133     −3 1 1  6  .21. Orthogonal matrices.  3 . 3 .   −1 −1 −1 −1 −1 2  2.  −9   5  (b) 2 4 11        1 1 0 1  −1   −1   −2   0  (c)  . Gram-Schmidt.    −1 1 0 0 −1 1 −2 −1     −10 20 (d)  11  . Find the QR factorizations of the following matrices. 2 2 3 −1 3 3 3 .  −7  2 26     1 2 3  −1   −2   0  (e)  .

Orthogonal Matrices. and QR Factorization  4. Use the QR factorization to ﬁnd the projection matrix of R4 onto the plane     1 0  −1   0  spanned by the vectors   and  . Show that if Q is an orthogonal matrix. Show that if Q1 and Q2 are orthogonal matrices. reﬂections. 8. or. . 10. or projections? α 11. Let Q = β ￿ ∗ ∗ ￿ be an orthogonal matrix. 2 α (c) Conclude that any orthogonal transformation of R2 must be a rotation or a reﬂection.134 21. Conclude that Q = or . Extend the orthonormal set   −8 9 4 9 1 9  4 9 4 9  7 . ﬁnd a third column that makes the matrix   ∗ ∗  orthogonal. −1 −2 1 −2 7. what is the same  −8 9 4 9 1 9 4 9 7 9 4 9  thing. then QT AQ has the same eigenvalues as A. −α β α β −α ￿ ￿ β (b) Show that Q must be a rotation by arctan or a reﬂection in the line that α ￿ ￿ 1 β makes an angle of arctan with the x-axis.  ∗  1 0 5.  1 y 6 2 4 6. Gram-Schmidt. Show that if Q is an orthogonal matrix then det(Q) = ±1. 9. Use the QR factorization to ﬁnd the least squares solution of  1 1    0 ￿ ￿ 5 1 x 4 =  . Which of the following transformations are orthogonal: rotations.  9  to a basis of R3 . then so is Q1 Q2 . ￿ ￿ ￿ ￿ α −β (a) Show that the only unit vectors that are orthogonal to are and β α ￿ ￿ ￿ ￿ ￿ ￿ β α −β α β .

If Q is orthogonal. · · · . then Q−1 = ? ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 1 −3 1 −2 16. (1) T preserves distance and the origin ⇒ ||T (x)|| = ||x||. p1 · p1 p2 · p2 pm · p m (Hint: Verify v − w ⊥ pi for all i. and ||T (x) − T (y)||2 = ||x − y||2 . How would you extend an orthonormal basis v1 . If T is any transformation of Rn to itself that preserves distance and such that T (0) = 0. If P is a projection. v2 .21. Conclude that any reﬂection is an orthogonal transformation. If p1 . · · · . v is a vector outside S ￿ ￿ v · p1 v · p2 v · pm S. . vp . vn of all of Rn ? 14. 13. then T is linear and can be represented as T (x) = Qx where Q is an orthogonal matrix. and w = p1 + p2 + · · · pm . pm is an orthogonal ￿ ·· basis for a￿ subspace ￿ of Rn . then is T orthogonal? 3 1 2 −1 17. Orthogonal matrices.) Conclude that w is the orthogonal projection of v onto S. vp of a subspace V of Rn to an orthonormal basis v1 . If T = and T = . (3) Expand ||T (x + y) − T (x) − T (y)||2 and use (1) to show that it equals zero. then is (a) AAT symmetric? (b) AAT invertible? (c) AAT orthogonal? (d) QT symmetric? (e) QT invertible? (f) QT orthogonal? 18. If A is n × n and Q is n × n orthogonal. then show v − w ⊥ S. · · · . Interpret this as saying that any transformation that preserves length and the origin must be linear and can be represented by an orthogonal matrix. ￿ · . show (2P − I)T (2P − I) = I. v2 . Conclude that T is linear and preserves dot products. Expand this to show that T (x) · T (y) = x · y. Gram-Schmidt. p2 . 15. ||T (y)|| = ||y||. vp+1 . and QR Factorization 135 12. (2) Expand ||cT (x) − T (cx)||2 and use (1) to show that it equals zero. This can be proved in the following way.

The numbers a and b are the coordinates of the 1 2 vector with respect ￿ the skewed coordinate￿system deﬁned by the two eigenvectors. DIAGONALIZATION OF SYMMETRIC AND ORTHOGONAL MATRICES In Sections 9 and 10 we learned how to ﬁnd eigenvalues. If we think in terms of how this matrix operates on its eigenvectors 1 2 we have ￿ 11 ￿￿ ￿ ￿ 11 ￿￿ ￿ ￿ ￿ ￿ ￿ −3 −3 3 3 1 1 5 5 5 5 =2 and =1 . any vector in 3 1 R2 can be written as a +b . 2 4 2 4 1 1 2 2 5 5 5 5 In this case the eigenspaces are the two lines generated by the two eigenvectors. All other vectors are moved in more complicated ways. We can see how they are 2 moved by observing that. Our point of view was purely algebraic. since the￿ ￿ ￿ ￿ two eigenvectors form a basis for R . (See Exercise 1) Example 1: We now illustrate the geometry of diagonalization with the matrix ￿ 11 ￿ −3 5 5 . ￿ to ￿ ￿ ￿ ￿ ￿ 3 1 3 1 Since A maps a +b to 2a +b . eigenvectors.136 22. A FIGURE 25 . which has eigenvalues λ = 2 and λ = 1 with associated eigenvectors 2 4 5 5 ￿ ￿ ￿ ￿ 3 1 and . A maps each line to itself but stretches one by a factor of 2 and the other by a factor of 1. Now we consider these concepts geometrically. Diagonalization of Symmetric and Orthogonal Matrices 22. we see that the eﬀect of A is very 1 2 1 2 simple when viewed in this new coordinate system. The ﬁrst thing to mention is that all eigenvectors v associated with a particular eigenvalue λ of a matrix A form a subspace that we call the eigenspace of A for the eigenvalue λ. and diagonal factorizations.

In either case it will arrive at the same destination. also has a geometric interpretation illustrated by the diagram below. 1 2 3 1 1 2 2 3 1 ￿ ￿ ￿ ￿ 3 1 = 1 0 and ￿ ￿ ￿ ￿ 1 0 = . and ﬁnally sent back to stretched versions of the original two eigenvectors.22. 1. In particular we can watch how the eigenvectors are mapped. then stretched by factors of 2 and 1. 2 1 A S -1 0 1 0 1 S D 1 0 1 2 0 FIGURE 26 Now we cover some points that were skipped over in Section 11. which in this case looks like ￿ 11 5 2 5 −3 5 4 5 ￿ 3 = 1 ￿ 1 2 ￿￿ 2 0 0 1 ￿￿ 3 1 1 2 ￿−1 . To construct the diagonal factorization A = SDS −1 we need n linearly independent eigenvectors to serve as the columns of S. The independence of the columns . Since ￿ ￿ ￿ ￿ 1 3 S = 0 1 we have S −1 and ￿ ￿ ￿ ￿ 0 1 S = . 1 2 S −1 Therefore we see that the two eigenvectors are ﬁrst taken to the two coordinate vectors. Diagonalization of Symmetric and Orthogonal Matrices 137 The diagonal factorization A = SDS −1 . The diagram means that a vector can be mapped horizontally by A (transcontinental railroad) or around the horn by SDS −1 (clipper ship).

all the coeﬃcients in this equation must equal zero. For any eigenvalue. ..  . . meaning that there are pendent eigenvectors v1 . . . . 2. To see this. . ￿ ￿ ￿ 2 3 2 0 For example the shear matrix and the diagonal matrix both have 0 2 0 2 eigenvalues λ = 2. a contradiction.  . Subtracting one from the other gives 0 = c1 (λ1 −λ5 )v1 c1 +c2 (λ2 −λ5 )v2 +c3 (λ3 −λ5 )v3 +c4 (λ4 −λ5 )v4 . . . Theorem.   .  0 ··· 0 . then we can state the answer to this question formally as follows. But this means that v5 = 0. Expand this set of vectors to v1 . but the shear matrix has only one independent eigenvector whereas the diagonal matrix has two. . · · · .    . . . . . . · · · . . . . Unfortunately there are many interesting matrices that have repeated ￿ eigenvalues. 2 (meaning that the eigenvalue is repeated). . · · · . .  0 · · · λ0 vp vn  =  v1 vp vn   A  v1     . Suppose this vector is v5 . · · · . . . Eigenvectors that are associated with distinct eigenvalues are linearly independent. vp for λ0 .138 22.   . geometric multiplicity ≤ algebraic multiplicity. assume it is not true and ﬁnd the ﬁrst vector vi (reading from left to right) that can be written as a linear combination of the v’s to its left. . v4 are independent.  . . . . Diagonalization of Symmetric and Orthogonal Matrices will insure that S −1 exists (see Section 19). But since all the λ’s are diﬀerent. 3. v2 . . . From this result we see that an n × n matrix is diagonalizable if there are n real and distinct eigenvalues. Then we know that v1 . . v4 are linearly independent. Multiply one copy of this equation by A to obtain λ5 c5 = c1 λ1 v1 + c2 λ2 v2 + c3 λ3 v3 + c4 λ4 v4 and another copy by λ5 to obtain λ5 v5 = c1 λ5 v1 + c2 λ5 v2 + c3 λ5 v3 + c4 λ5 v4 . · · · . Proof: Suppose λ0 has geometric multiplicity p. vn for Rn . . vn are eigenvectors for A with associated eigenvalues λ1 . and therefore we have an equation of the form v5 = c1 v1 + c2 v2 + c3 v3 + c4 v4 . vp . λn where λi ￿= λj for all i ￿= j. Then B has the form .  . λ2 . v2 . . Then we have  λ0 · · · 0      . What is the relationship in general between the number of independent eigenvectors associated with a particular eigenvalue λ0 of a matrix A and the number of times λ0 is repeated as a root of the characteristic polynomial of A? If we deﬁne the ﬁrst number to be the geometric multiplicity of λ0 and the second to be the algebraic multiplicity of λ0 . . D . . Since v1 . . v3 . if v1 . the only way this can happen is if c1 = c2 = c3 = c4 = 0. . v2 . . v2 . v3 . then all the v’s are linearly independent. . . and so 0 E the characteristic polynomial det(A − λI) = det(B − λI) = det(C − λI) det(E −     . . E 0 ··· 0 p indea basis  which can be written A = SBS −1 where S is the matrix of column vectors and ￿ ￿ C D B is the matrix on the extreme right. The problem of diagonalization therefore reduces to the question of whether there are enough independent eigenvectors. . . In other words. v2 .

We can therefore normalize them so that the factorization becomes ￿3 ￿￿ ￿−1 ￿￿ 3 4 4 25 0 5 5 5 5 A= 4 . but those two are not orthogonal to each other. so we can write the factorization as A = QDQT or as ￿3 ￿￿ ￿T ￿￿ 3 4 4 25 0 5 5 5 5 A= 4 . so . Diagonalization of Symmetric and Orthogonal Matrices 139 λI) = (λ0 − λ)p det(E − λI) (see Section 9 Exercise 3). As usual.22. 0. ￿ ￿ 41 −12 Example 2: Consider the symmetric matrix A = . we com−12 34 ￿ ￿ ￿ ￿ 3 4 pute the eigenvalues 25 and 50. or at least diagonal-like. They are however both associated with the eigenvalue 2. and 4 −3 set up the factorization 3 4 A= 4 −3 ￿ ￿￿ 25 0 0 50 ￿￿ 3 4 4 −3 ￿−1 But note that the two eigenvectors have a very special property: they are orthogonal. 2 and the corresponding eigenvectors       −1 1 1  1 . 4 −3 0 50 −3 5 5 5 5 which has the form A = QDQ−1 where Q is an orthogonal matrix. we also have Q−1 = QT . meaning that the algebraic multiplicity of λ0 is at least p. This ends the proof. 2. the corresponding eigenvectors and . 1. The diﬀerence is that this time the coordinate system is rectangular. 1 0 1 The ﬁrst vector is orthogonal to the second and third.   4 −2 −2 Example 3: Consider the symmetric matrix A =  −2 4 2 . Because Q is orthogonal. 4 −3 0 50 −3 5 5 5 5 As in Example 1 the eigenvectors set up a coordinate system with respect to which the action of A is very simple. In particular we will now investigate symmetric and orthogonal matrices and show that they always have especially nice diagonal. factorizations. We compute the −2 2 4 eigenvalues λ = 8. There are important classes of matrices that always have diagonal factorizations.

which is one of the most important results of linear algebra. √   12  0 √ 2 √   −1  . If A is a symmetric n × n matrix. λn and its corresponding eigenvectors form an orthonormal basis with respect to which A takes the form   λ1 λ2     . in this case a plane. eigenvectors that come from diﬀerent eigenvalues seemed to be automatically orthogonal. We prove this by letting Av = λv and Aw = µw where λ ￿= µ and noting that λv · w = wT λv = wT Av = (wT Av)T = v T Aw = v T µw = µv · w ⇒ (λ − µ)v · w = 0 ⇒ v · w = 0. This is in fact true for any symmetric matrix A. The Spectral Theorem. If we run the Gram-Schmidt process on these two eigenvectors.140 22. We state this formally in the following theorem.   .  6 2 √ 6 6 Proof: We have to temporarily view A as a transformation of complex n-dimensional space C n . Diagonalization of Symmetric and Orthogonal Matrices they generate the eigenspace. or said another way. then A has n real eigenvalues (counting multiplicities) λ1 . does every symmetric have a diagonal factorization through orthogonal matrices.. In the previous two examples. we obtain the factorization  √ 1 1  1 1 1 T   −√ √ √ √ − 13 √2 8 0 0 6 3 2 6  √ 1 1 1  1 1 1   √ √ √ − √6   0 2 0   3 − √6  A= 3 2 2     1 2 1 2 0 0 2 √ √ √ √ 0 0 3 6 3 6 Again it has the form QDQT where Q is an orthogonal matrix. we will stay within the eigenspace and generate the two orthonormal eigenvectors √  1  1  If we normalize the ﬁrst eigenvector and assemble all the pieces. (Justify each step. λn or. and such a factorization is called a spectral factorization.) Can every symmetric matrix be factored as in the previous two examples? That is. of the eigenvalue 2. · · · . does every symmetric matrix have an orthonormal basis of eigenvectors? The answer is yes. λ2 . in orther words. A can be expressed as A = QDQT where Q is orthogonal and D as above. Since the characteristic equation det(A − λI) = an λn + an−1 λn−1 + · · · + .

the Fundamental Theorem of Algebra tells us that there are n (possibly complex) roots. (The ﬁrst column is v2 . (The ﬁrst column is v1 . we can conclude that  0 0 0 ∗ ∗ ∗ . but the fact that λ0 is a real eigenvalue ⇒ det(A − λ0 I) = 0 ⇒ the real matrix A − λ0 I is singular ⇒ there is some real eigenvector for λ0 . If λ0 and v are complex. 0 ∗ ∗ ∗ 0 ∗ ∗ ∗ But since QT AQ1 is symmetric (see Section 1  λ1  0 AQ1 = Q1  0 0 2 Exercise 6(f)). Normalize v2 and expand it to an orthonormal basis of R3 . Since λ0 equals its own conjugate. To illustrate the proof.) Therefore every symmetric n × n matrix has n real eigenvalues (counting multiplicities). it has an eigenvalue λ2 with eigenvector v2 . Let Q1 be the orthogonal matrix with these vectors as its columns. Diagonalization of Symmetric and Orthogonal Matrices 141 a0 = 0 involves a polynomial of degree n. ∗ ∗ ∗ ∗ ∗ ∗ Let A2 be the 3 × 3 matrix in the lower right corner of the last factor on the right. We therefore have the equality λ0 v T v = v T Av = (v T Av)T = v T Av = λ0 v T v. we have  1 0  0 0 0 0 U2 0 T 1  0 T  Q1 AQ1  0 0  0 0 U2 0    . Then A2 is symmetric and. Since A2 is symmetric.) Then as above we have   λ2 0 0 A2 U2 = U2  0 ∗ ∗  .22. it must be real. The rest of the proof takes place in the real world and proceeds in steps. If λ0 is one such root. so there is a vector v such that Av = λ0 v. then taking complex conjugates we have Av = λ0 v (Section 13 Exercise 2). be repeated four times) with eigenvector v1 . A has an eigenvalue λ1 (which could. except for λ1 . (The eigenvector v may not be real.) Then we have   λ1 ∗ ∗ ∗  0 ∗ ∗ ∗ AQ1 = Q1  . 0 ∗ ∗ Putting this together with the result of step one. then it is an eigenvalue of A. has the same eigenvalues as A (see Section 9 Exercise 3). This ends step one. Canceling v T v (justiﬁed by Exercise 9) we get λ0 = λ0 . Let U2 be the orthogonal matrix with these vectors as its columns. in the worst case. so that λ0 is also an eigenvalue with eigenvector v. we let A be a 4 × 4 matrix. Normalize v1 and expand it to an orthonormal basis of R4 .

In fact. .142 22. most orthogonal ￿ matrices are not diagonalizable at all as ￿ 0 −1 in the case of the rotation matrix . Diagonalization of Symmetric and Orthogonal Matrices 1 0 = 0 0  0 0 U2 0 T  or letting Q2 equal the product of Q1 and the matrix containing U2 we have   λ1 0 0 0  0 λ2 0 0  QT AQ2 =   2 0 0 ∗ ∗ 0 0 ∗ ∗ λ1 0 0   0   0 A2 0  λ1 0 0  0 λ2 0 = 0 0 ∗ 0 0 ∗ 0 0 0  ∗ ∗ 1 0  0 0   0 0 U2 0    Q2 is the product of orthogonal matrices and is therefore orthogonal (Section 21 Exercise 8). Then A3 is symmetric and. In general. except for λ1 and λ2 . which we will not pursue here.   . Let A3 be the 2 × 2 matrix in the lower right corner of the last factor on the right. The Spectral Theorem has many applications. λn This proves the Spectral Theorem. we continue in this manner until we obtain   λ1 λ2    QT AQ =  . so the Spectral Theorem does not apply. Instead we will end with a spectral-like factorization for orthogonal matrices. has the same eigenvalues as A.. But let’s push ahead anyway with the 1 0 following example. orthogonal matrices are not necessarily symmetric. We ﬁnd its roots and use Gaussian elimination with complex arithmetic as in Section 13 to obtain the following three eigenvalue-eigenvector pairs:  . Of course. This ends step two. Example 4: We consider the orthogonal matrix  2 2 A= 3  −1  3 2 3 3 2 3 −1 3 −1 3 2 3 2 3  The characteristic equation for A is x3 − 2x2 + 2x − 1 = 0.

Recall from that section that the equation Av = λv can be written as A(x + iy) = (α + iβ)(x + iy). 1 −2i 2i   1 1. fortunately. −2i 1 2 −i √ 3 2  √  3 √ −i − 3 − i. In our case. and A rotates R3 around the axis deﬁned by the ﬁrst eigenvector by an angle of −π/3. the equations Ax = αx − βy and Ay = βx + αy remain true. The three columns of Q deﬁne an orthonormal basis. 2i The equations for the second and third eigenvalue-eigenvector pairs can be written as Av = λv and Av = λv. which when multiplied out becomes Ax + iAy = (αx − βy) + i(βx + αy). both the second and third columns. This can only be done if we divide x and y by the same number. 1 1 2 +i √ 3 2  √  3 √ +i − 3 + i. Just as in Section 13.22. . Diagonalization of Symmetric and Orthogonal Matrices 143 1 We put all this together to obtain the complex diagonal factorization √ √   1 3+i 3−i √ √ A = 1 − 3 + i − 3 − i 1 −2i 2i   1 0 0 √ 0 1 + i 3  0 2 2   √ 1 3 0 0 2 −i 2 √ √  −1 1 3 3 √ +i √ −i 1 − 3 + i − 3 − i . This gives us the real block-diagonal factorization  √ √  1 −1 0 0  √ 1 3 1 1 3 1 √ √ 1 3   A = 1 − 3 1 0 1  2 2 1 − 3 √ 1 1 0 −2 1 0 −2 0 − 23 2 Note that the columns of the ﬁrst factor on the right are orthogonal. √ have length = 6. we will have an orthogonal matrix. we can therefore rewrite the factorization in real form. Equating real and imaginary parts we obtain Ax = αx − βy and Ay = βx + αy. Therefore we are justiﬁed in writing √ 1 1 1  1 1 1 T  √ √ √ √ √ 1 0 0 3 2 6 3 2 6 √ √ 1 3  √ 1 1 1 0 1 1 1  √ √ √ √   − 2 A= 3 − 2 2 2  3 6  6    √ 1 1 2 1 2 √ √ 0 − 23 0 − √6 0 − √6 2 3 3 We have a factorization of the form A = QDQT where Q is orthogonal and D is block-diagonal. We can now see the geometrical eﬀect of A as a transformation of R3 . so that if we normalize each column. But we must be careful that when we divide by lengths. We call it a real block-diagonal factorization . The kind of factorization we have just obtained can be realized for any orthogonal matrix. which correspond to x and y.

and each step consists of two cases. Proof: First we investigate the nature of the eigenvalues. except for λ.   . Let x and y be the real and imaginary parts of the eigenvector v.   .   . From the computation v T v = v T AT Av = (Av)T (Av) = λλv T v. Diagonalization of Symmetric and Orthogonal Matrices Theorem. 0 ∗ ∗ ∗ 0 ∗ ∗ ∗ Let A2 be the matrix in the lower right corner of the last factor on the right. and let Q be the orthogonal matrix with these vectors as its columns.144 22. A = QDQT where Q is orthogonal and D is as above. The proof proceeds in steps. 0 ∗ ∗ ∗ 0 ∗ ∗ ∗ But since QT AQ is orthogonal.. suppose A has eigenvalue λ = ±1 with eigenvector v. Then A2 is orthogonal and. We will just give a sketch of the rest of the proof since it is very similar to that of the Spectral Theorem. we can conclude   ±1 0 0 0  0 ∗ ∗ ∗ AQ = Q  . The second possibility is that λ is complex. Then .. cancelling v T v we obtain λλ = 1 or |λ| = 1. If λ is a possibly complex eigenvalue of A. Assume for a moment that ￿x￿ = ￿y￿ and x · y = 0. First.     −1     1     . has the same eigenvalues as A. then there is an orthonormal basis with respect to which A takes the form   α1 β1  −β1 α1      . Normalize v and expand it to an orthonormal basis of Rn . Therefore A has n eigenvalues each of which is either ±1 or a complex number and its complex conjugate both of length 1. If A is an orthogonal matrix. Then we have   ±1 ∗ ∗ ∗  0 ∗ ∗ ∗ AQ= Q  .     αp βp     −βp αp     −1   . then Av = λv and Av = λv. 1 or.. in other words.

If v T v ￿= 0. The block-diagonal matrix D then assumes the form  α1  −β1              β1 α1  αq −βq βq αq ±1 1 . This ends the proof. 0 0 ∗ ∗ 0 0 ∗ ∗ But since QT AQ is orthogonal. ∗ ∗ Let A2 be as above. We still have to prove ￿x￿ = ￿y￿ and x · y = 0. then A2 is orthogonal and. . Expand x and y into an orthonormal basis and let Q be the matrix with these vectors as its columns.        .       .. Therefore v T v = 0. Continue in the obvious way as in the Spectral Theorem. then we could cancel it from both sides obtaining λ2 = 1. Then we have   α β ∗ ∗  −β α ∗ ∗  AQ= Q  . Note that each consecutive pair of −1’s on the diagonal can be considered as a plane rotation of π radians. since then we would have v T v = (x+iy)T (x+iy) = x·x−y ·y +i2x·y = 0 ⇒ x·y = 0 and x · x = y · y or ￿x￿ = ￿y￿. To show v T v = 0 we compute v T v = v T AT Av = (Av)T (Av) = λ2 v T v. 1 So we can say that an orthogonal transformation in Rn produces a rotation through a certain angle in each of q mutually orthogonal planes and at most one reﬂection .22. contradicting the assumption that λ is complex. . and therefore they can be placed in the sequence of αβ blocks. But the only solutions to the equation λ2 = 1 are λ = ±1 (Exercise 11). It is enough to show v T v = 0. This ends the ﬁrst step. Diagonalization of Symmetric and Orthogonal Matrices 145 we can normalize x and y and still maintain the equations Ax = αx − βy and Ay = βx + αy.. except for λ and λ. we can conclude (Exercise 10) α −β  AQ = Q  0 0  β α 0 0 0 0 ∗ ∗  0 0 . has the same eigenvalues as A.

we see that C = 1 and therefore det(A) = λ1 λ2 · · · λn and tr(A) = λ1 + λ2 + · · · + λn . The determinant of a matrix we already know something about. a11 − λ  a21 det  . so det(A) = ±1. λ + constants = (−λ)n + tr(A)(−λ)n−1 + · · · + det(A) The ﬁrst equality follows from the determinant formula. · · · . The determinant of a matrix is equal to the product of its eigenvalues. The second equality follows by simple computation and the fact that det(A − 0I) = det(A). . then the characteristic polynomial can also be written in factored form as det(A − λI) = C(λ1 − λ)(λ2 − λ) · · · (λn − λ) = C[(−λ)n + (λ1 + λ2 + · · · + λn )(−λ)n−1 + · · · + λ1 λ2 · · · λn ] Equating the two forms of the characteristic polynomial. They are the determinant and the trace. They both have simple and useful expressions in terms of the eigenvalues of A. Diagonalization of Symmetric and Orthogonal Matrices that reverses one direction orthogonal to these planes. 0 −1 = (a11 − λ)(a22 − λ) · · · (ann − λ) + expressions in λn−2 . Finally we leave symmetric and orthogonal matrices and consider two important scalar functions of arbitrary square matrices. · · · . . Suppose A is 3 × 3 orthogonal. λ2 . Theorem. From the considerations above A is a pure . which are summarized in the following. an1  a12 a22 − λ . These facts are useful in analyzing orthogonal transformations of R3 . .  . . λn−3 . both taken over the complex numbers. and the trace of a matrix is equal to the sum of its eigenvalues. Proof: Consider the characteristic polynomial det(A − λI) of A. a pure rotation. If λ1 . ann − λ     R3 the only possibilities  β 0 α 0 .146 22. The trace of a matrix A is deﬁned as the sum of its diagonal elements tr(A) = a11 + a22 + · · · + ann . . Note that the ﬁrst term contains all expressions involving λn and λn−1 . a pure reﬂection. In are      α β 0 −1 0 0 α  −β α 0   0 1 0  −β 0 0 1 0 0 1 0 that is. or a rotation and reﬂection perpendicular to the plane of rotation. an2 ··· ··· ··· a1n a2n . λn are all the eigenvalues of A.

Find the spectral factorizations of the following symmetric matrices. Describe the eigenspaces of the following matrix and how the matrix acts on each. we have cos θ = tr(A) − 1 . In this case tr(A) = 1 + 2α. 2. Since α = cos θ where θ is the angle of rotation. ￿ ￿ 1 4 (a) 1 −2 ￿ ￿ 2 −2 (b) −2 −1   2 1 0 (c)  0 3 0  0 0 3 4. . To ﬁnd the axis and direction of the rotation. Show that an eigenspace of a matrix is a subspace. so A is a pure rotation such that cos θ = (6/3 − 1)/2 = 1/2 and therefore θ = π/3.22. Diagonalization of Symmetric and Orthogonal Matrices 147 rotation if and only if det(A) = 1. EXERCISES 1. it is still necessary to compute the eigenvectors. for the matrix  2 3  −1  3 2 3 2 3 2 3 −1 3 A= −1 3 2 3 2 3    of the earlier example. Find the diagonal factorizations of the following matrices and sketch a diagram that geometrically describes the eﬀect of each. 2 This means that the angle of rotation can be computed without ﬁnding eigenvalues. In particular. we have det(A) = 1. What are the algebraic and geometric multiplicities of the eigenvalues? 2 4 A= 0  3 3 0   0 −1 = 1 0 6 0 0 0 1 −1  0 1 0 0 3 4   0 0 −1  1 6 0 0 6 0 0 0 1 1 0 3 4 −1 3.

Find the spectral factorizations of the following transformations and reconstruct their matrices. ￿ ￿ 3 2 (a) Projection of R onto the line deﬁned by . Find the real block-diagonal factorizations of the following orthogonal matrices and describe geometrically the transformations they deﬁne. Diagonalization of Symmetric and Orthogonal Matrices  3 −2 0 (b)  −2 0 0  0 0 1   4 0 −2 (c)  0 5 0  −2 0 1   0 2 2 (d)  2 0 −2  2 −2 0 5. 1 6.  1  −2 −2 3 3 3  2 1 −2  (a)  − 3 3 3  2 2 1 −3 −3 3 (b)   2 3  2  3 −1 3  2 −2 −2 −1 ￿ −1 3 2 3 2 3 2 3 −1 3 2 3    0  0 (c)  0 −1  0 0 1 0 −1 0   1 0 0 0 0 0 7. 1 ￿ ￿ 3 2 (b) Reﬂection of R across the line deﬁned by .148 (a) ￿ 22. Construct the orthogonal matrix that rotates R3 around the axis deﬁned by the   −1 vector  0  by 90◦ by writing down block-diagonal factorization of the matrix and 1 multiplying it out. .

. v2 . This is The Larry Bird Theorem. then . . . they must say the same thing. If v is a nonzero (possibly complex) vector. State one signiﬁcant fact about the eigenvalues of (a) a symmetric matrix. (b) an orthogonal matrix.) 12. . (g) a projection matrix. . (d) a defective matrix. α What does each equation say about the direction of the rotation of the plane spanned by x and y? (Of course. 14. . . . . . and around vn by an angle θn . . If Q is an orthogonal matrix such that det Q = −1.   x = y . c3 0 0 then c1 = c2 = 0. If Ax = αx − βy and Ay = βx + αy. . .  Ay . . . . then show v T v ￿= 0. . . · · · . . Diagonalization of Symmetric and Orthogonal Matrices 149 8. . Rotate the basketball around v1 by an angle θ1 . You could have achieved the same result with one rotation around a certain axis and by a certain angle. θ2 . β α ￿ or ￿ −β . . . then what can you say about Q as a transformation? 13. . · · ·. . · · · . θn . . . . . . ￿ . . even in the world of complex numbers. (c) a stable matrix. Discuss why this is true and how you could ﬁnd the one axis and angle that will do the job. . . . Show that. . . .     α Ax y = x y −β . Fix the center of a basketball and choose n axes v1 . (Hint: Let λ = α + iβ and reach a contradiction. . the only solutions to the equation λ2 = 1 are λ = ±1.  α x β . . vn and angles θ1 . Show that if a vector  c2  is orthogonal to the two vectors  −β  and  α . .22. . . around v2 by an angle θ2 . (h) a reﬂection matrix. (e) a singular matrix. ￿ . 11.) 9.       c1 α β 10. .

Show tr(A) + tr(B) = tr(A + B). For each matrix below decide if it is symmetric. or diagonalizable. Find a counterexample for the converse ￿ ￿ ￿ 1 0 1 1 (⇐). orthogonal. characteristic ￿ polynomial. a projection. 0 0 A= 0 1 Find their eigenvalues. tr(AB) = tr(BA). 16.150 22. determinant. and rank. and tr(B −1 AB) = tr(A) 17. Show that A = SBS −1 ⇒ A and B have the same trace. eigenvalues. Hint: Try A = and B = 0 1 0 1  1 0 0 0 0 1 0 0  0 0  1 0 1 1 1 B=  4 1 1  1 1 1 1 1 1 1 1  1 1  1 1 . Diagonalization of Symmetric and Orthogonal Matrices 15. invertible.

the quadratic form xT Ax is called nondegenerate. We were able to express the quadratic forms in two and three variables above by means of symmetric matrices. Quadratic Forms In general.. We now turn to the question of how to recognize the graph of a quadratic equation. . since xT Ax = xT AT x (Exercise 1).. . which from Section 22 is ￿3 ￿￿ ￿￿ 3 ￿T 4 4 25 0 5 5 5 5 A= 4 . . Can this always be done? Yes. ￿3 ￿￿ .  .152 23. Since A is symmetric. an2 . . it has a spectral factorization A = QDQT . a quadratic form in n variables is an expression of the form n n ￿￿ i=1 j=1 aij xi xj a11  a21 · · · xn ]  . We can therefore always assume A is 2 symmetric.  .. We 1 2 can write this equation in the form xT Ax = 1 or [ x1 41 −12 x2 ] −12 34 ￿ ￿￿ x1 x2 ￿ = 1.  . and a quadratic equation in n variables has the representation xT Ax + bT x + c = 0. 4 −3 0 50 −3 5 5 5 5 If we substitute this into xT Ax we obtain xT QDQT x = (QT x)T D(QT x) = y T Dy where y = QT x or ￿￿ 3 ￿T ￿ ￿ 4 25 0 x1 5 5 [ x1 x2 ] 5 4 4 0 50 −3 x2 5 ￿ 5 ￿5￿ 3 ￿ 25 0 x1 + 4 x2 3 4 4 3 5 5 = [ 5 x1 + 5 x2 5 x1 − 5 x2 ] 4 3 0 50 5 x1 − 5 x2 ￿ ￿￿ ￿ 25 0 y1 = [ y 1 y2 ] 0 50 y2 4 5 −3 5 2 2 = 25y1 + 50y2 . .  . . xn = [ x1 x2 an1 ann = xT Ax. we have xT Ax = 1 (xT AT x + xT AT x) = xT ( 1 (A + AT ))x. .  .   a1n x1 a2n   x2  . . Example 1: Suppose we have the quadratic equation 41x2 − 24x1 x2 + 34x2 = 1. .) This just amounts to replacing the 2 oﬀ-diagonal elements aij and aji by 1 (aij +aji ).  a12 a22 . If A is also nonsingular... and the 2 2 matrix 1 (A + AT ) is symmetric. (Exercise 2..

We therefore have for any quadratic form xT Ax. T     1 1 √ x1 − √ x3 x1 2 2     √ x +√ x −√ x  2 1 x2 =  16 1 .) This is called the Principal Axis Theorem. and the quadratic 5 5 5 5 2 2 equation expressed in these coordinates becomes 25y1 + 50y2 = 1. which just amounts to a simple From Section 22 Exercise 5(d) the spectral factorization A = QDQT for this matrix looks like  √ 1 0 2 2 2  2 0 −2  =  0  1 2 −2 0 √  2 1 √ 6 2 √ 6 1 − √6 1 − √3 1 √ 3 1 √ 3  Therefore setting y = QT x so that  √ 1 y1 2  y2  =  0  1 y3 √  2 1 √ 6 2 √ 6 1 − √6 1 − √3 1 √ 3 1 √ 3 2   0 0  √ 1 0 0 2  2 0  0 1 0 −4 √ 2 1 √ 6 2 √ 6 1 − √6 1 − √3 1 √ 3 1 √ 3 T   . 2 −2 0 x3 3 5 4 5 . which provides an orthogonal transformation from y-space to x-space. . there is an orthogonal change of variables y = QT x with 2 2 2 respect to which the quadratic form becomes λ1 y1 +λ2 y2 +· · ·+λn yn . Since orthogonal transformations preserve distance. (A is symmetric with eigenvalues λ1 . which is just an ellipse. It is really just the Spectral Theorem in another form. Example 2: To ﬁnd the graph of the quadratic equation 4x1 x2 + 4x1 x3 − 4x2 x3 = 1 we ﬁrst write it as    0 2 2 x1 [ x1 x2 x3 ]  2 0 −2   x2  = 1. The x-coordinates and the y-coordinates are related by Q. the original quadratic equation ￿ ￿ ￿ ￿ 1 0 must also represent an ellipse. 2 2 2 the quadratic equation in terms of the y-coordinates takes the form 2y1 + 2y2 − 4y3 = 1.  6 2 6 3  1 1 1 x3 √ x1 − √ x2 + √ x3 − 3 3 3 The method just illustrated obviously works in general. and therefore congruence. 4 5 −3 5 in x-space. angle. and therefore the quadratic equation 4x1 x2 + 4x1 x3 − 4x2 x3 = 1 describes a hyperboloid of revolution around the axis deﬁned by the third column of Q. λn and Q is orthogonal. · · · . Quadratic Forms 153 The y-coordinates are therefore y1 = 3 x1 + 4 x2 and y2 = 4 x1 − 3 x2 . This is a hyperboloid of revolution around the y3 axis. Furthermore Q takes the coordinate vectors . Therefore 41x2 − 24x1 x2 + 34x2 = 1 is a rotated ellipse with major and 1 2 minor axes along the eigenvectors of A. 0 1 ￿ ￿ ￿ ￿ in y-space to the eigenvectors rotation.23. λ2 .

Show that A + AT is symmetric for any square matrix A. 5. Quadratic Forms EXERCISES 1. −9. ﬁnd a rotation of the coordinates so that the resulting quadratic form is in standard form. 27. (b) eliminate the linear terms by completing the square in each variable and making a translation of the coordinates. (a) x2 + x1 x2 + x2 = 6 1 2 (b) 7x2 + 7x2 − 5x2 − 32x1 x2 − 16x1 x3 + 16x2 x3 = 1 (Hint: The eigenvalues are 1 2 3 −9. (a) 14x2 − 16xy + 5y 2 = 6 (b) 2x2 + 4xy + 2y 2 + x − 3y = 1 6. (a) 2x2 + 2y 2 + 3z 2 + 4yz = 3 (b) 2x2 + 2y 2 + z 2 + 4xz = 4 . For the each of the following quadratic equations. Show that xT Ax = xT AT x. 3. For the quadratic equation 6x2 −6x1 x2 +14x2 −2x1 +x2 = 0. (Hint: Since xT Ax is a 1 × 1 matrix. (a) ﬁnd a rotation of 1 2 the coordinates so that the resulting quadratic form is in standard form. Identify the following quadrics.) 2. Identify the following conics. and identify and sketch the curve or surface. it must equal its own transpose.154 23.) 4. and (c) identify and sketch the curve.

But there are cases like 2x2 + 3y 2 and x2 − 2xy + 2y 2 = (x − y)2 + y 2 that are positive for all nonzero values of (x. or like −x2 − 6y 2 and −x2 + 4xy − 4y 2 = −(x − 2y)2 that are negative for all nonzero values of (x. y) has a maximum at (0. . 0) and f (x. y). Positive Deﬁnite Matrices 155 24. We are therefore led to the following deﬁnition. Then we have f (x. and zero values for various values of (x. . 0) is a critical point. an2 . .. . 0) + fx (0.  . that is if xT Ax < 0 for every x ￿= 0. Putting this back into the Taylor series and rewriting the second order terms. . 2! Since (0. y) can be expanded in a Taylor series in a neighborhood of that point. we have f (x. . . 0) = ax2 + bxy + cy 2 + higher order terms. y). 0) = 0. y) behaves near (0. 0)y + 1 (fxx (0. . In general. A symmetric matrix A is positive deﬁnite if its associated quadratic form xT Ax > 0 for every x ￿= 0. 0)x2 + 2fxy (0. How can we tell if a symmetric matrix is positive deﬁnite? There are ﬁve ways to answer this question. 0). y). Suppose we want to determine the nature of the critical points of a real valued function z = f (x. 0)x + fy (0. 0)xy + fyy (0.. 0)y 2 ) + · · · . y) then f (x.24. we must have fx (0. a12 a22 a32  a13 a23  a33 an1 ann we deﬁne the leading principal submatrices of A to be  ￿ ￿ a11 a11 a12 A1 = [ a11 ] A2 = A3 =  a21 a21 a22 a31 ··· Now for the characterization of positive deﬁnite matrices.. y). if the quadratic form ax2 + bxy + cy 2 is positive for every nonzero choice of (x. and if ax2 + bxy + cy 2 is negative for every nonzero choice of (x. an arbitrary quadratic form ax2 + bxy + cy 2 will assume positive..  a1n a2n  . y) then f (x. 0). y) − f (0.. y) has a minimum at (0. POSITIVE DEFINITE MATRICES Now we investigate how quadratic forms arise in the problem of maximizing and minimizing functions of several variables. Its proof is long but instructive. 0) like its second order terms ax2 +bxy+cy 2 . negative. We also say A is negative deﬁnite if −A is positive deﬁnite. y) = f (0. Assume for simplicity a critical point occurs at (0. This means that f (x. .  a12 a22 . and we present them all in the following theorem. First we need a deﬁnition: For any square matrix a11  a21 A= . . 0) = fy (0.  . That is to say. ..

λn are positive.. ann 0 There are similar equalities for all the other leading principal submatrices. (c) All the leading principal submatrices A1 . . · · · . Therefore. . . (a) ⇒ (c): Since A is positive deﬁnite. An of A have positive determinants. (a) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a). λ2 . We now illustrate the implication of this for the 4 × 4 case. .. (a) ⇒ (b): If A is positive deﬁnite and Ax = λx.156 24. For any symmetric n × n matrix A the following statements are equavalent. · · · . This follows for A2 for example from the equality [ x1 a x2 ] 11 a21 ￿ a12 a22 ￿￿ x1 x2  ￿ a12 a22 .  . A2 .  . An . . since det(Ai ) equals the product of its eigenvalues (by the symmetry of Ai and Section 22 Exercise 6).   0  > 0. (c) ⇒ (d): We ﬁrst note that the Gaussian step of multiplying one row by a scalar and substracting from another row has no eﬀect on the determinant of a matrix or on the determinant of its leading principal submatrices. . . then 0 < xT Ax = xT λx = λ￿x￿2 and therefore 0 < λ. (e) There is a matrix R (not necessarily square) with independent columns such that A = RT R.  .. = [ x1 x2 a11  a21 0 ··· 0] . 2 2 2 (b) ⇒ (a): By the Principal Axis Theorem xT Ax = λ1 y1 + λ2 y2 + · · · + λn yn where y = QT x and Q is orthogonal. · · · . Proof: We show (a) ⇔ (b). an2 . (a) A is positive deﬁnite. . an1    x1 a1n x  a2n   2  .  . we have det(Ai ) > 0. Positive Deﬁnite Matrices Theorem. if all the eigenvalues λ1 . which are all positive by (b) above. then xT Ax > 0 for any x ￿= 0.. A2 . . then so are all the leading principal submatrices A1 . (b) All the eigenvalues of A are positive...  . Initially A looks like  p11  ∗  ∗ ∗  ∗ ∗ ∗ ∗ ∗ ∗  ∗ ∗ ∗ ∗ ∗ ∗ . Therefore. (d) A can be reduced to upper triangular form with all pivots positive by using only the Gaussian operation of multiplying one row by a scalar and subtracting from another row (no row exchanges or scalar multiplications of rows are necessary).

Finally we run one more Gaussian step and obtain   p11 ∗ ∗ ∗ ∗   0 p22 ∗  . 0 0 p33 ∗ 0 0 0 p44 Then p11 p22 p33 p44 = det(A4 ) > 0 ⇒ p44 > 0. It is useful in numerical applications and can be computed by a simple variant of Gaussian elimination. Now we use this result. Our goal is to show LT = M or LT M −1 = I. AT = A ⇒ M T DLT = LDM ⇒ LT M −1 = D−1 (M T )−1 LD. ∗ ∗ Then p11 p22 p33 = det(A3 ) > 0 ⇒ p33 > 0. Note that no row exchanges are necessary. We quickly indicate the proof. which has the form A = RT R. Rx = 0 ⇔ x = 0. Since A is symmetric with positive pivots. We then have A = (L D)( DLT ). then the same is true of M (LT )−1 (Exercise 1). This ends the proof. In the last equation. We conclude that M (LT )−1 = I. Both sides are therefore diagonal. √ √ The factorization A = (L D)( DLT ) is called the Cholesky factorization of the symmetric positive deﬁnite matrix A. ∗ ∗ ∗ ∗ Then p11 p22 = det(A2 ) > 0 ⇒ p22 > 0. We run one Gaussian step and obtain  p11  0  0 0 ∗ p22 ∗ ∗ ∗ p22 0 0  ∗ ∗ ∗ ∗ . Therefore x ￿= 0 ⇒ xT Ax = xT RT Rx = (Rx)T (Rx) = ￿Rx￿2 > 0. . (d) ⇒ (e): This is the hard one! We need a preliminary result: If A is symmetric and has an LU-factorization A = LU . We run another Gaussian step and obtain  p11  0  0 0 ∗ ∗ p33 ∗  ∗ ∗ . (e) ⇒ (a): Since R has independent columns. we have A = LDLT where the diagonal entries of D are all positive. since LT and M −1 are each upper triangular with ones down their diagonals. the left side is upper triangular since it is a product of upper triangular matrices. Since A is symmetric. Furthermore. Positive Deﬁnite Matrices 157 and we have p11 = det(A1 ) > 0. we immediately have A = LDM where M is upper triangular with ones down its diagonal. If we divide each row of U by its pivot and place the pivots into a diagonal matrix D. The general case is now clear. and the right side is lower triangular since it is a product of lower triangular and diagonal matrices. We can therefore √ deﬁne D to be the diagonal matrix with diagonal entries equal to the square roots √ √ of the corresponding diagonal entries of D. then it has a factorization of the form A = LDLT where D is diagonal.24.

and (e) A = −RT R for some matrix R with independent columns. we can also take the square root of the diagonal matrix in the spectral factorization of √ √ √ √ A to obtain A = (Q D)( DQT ) = ( DQ)T ( DQ) or  1 1 1  √ 1 2  √ √ √ − √3 0 2 6 2 2  2 1  √ 1 2 2 √ √ √ − √6  .158 24. · · · (Exercise 2). det(A2 ) = 3. Example 3: Let’s check each of the conditions above for the quadratic form 2x2 + 1 2x2 + 2x2 − 2x1 x2 − 2x1 x3 + 2x2 x3 . (c) det(A1 ) < 0. 0  0 √ A =  − √2   2 2 6 1 1 2 2 − √2 √6 √3 √ 0 0 3   . Positive Deﬁnite Matrices From this theorem we can also characterize negative deﬁnite matrices. so we have the factorization A = LDLT or     1 0 0 2 0 0 1 −1 −1 2 2 1  A =  −1 1 0   0 3 0   0 1 . 2 2 3 1 1 4 −2 3 1 0 0 3 0 0 1 √ √ √ √ We therefore can write A = (L D)( DLT ) = ( DLT )T ( DLT ) or  √   √2 − √ 1 1 − √2  2 ￿ 0 0 2 ￿  1 3 3 1 . det(A3 ) < 0. A= 0  6  6 3 6 2 2 2 1 1 2 √ √ √ − √6 − √3 √3 2 3 3 . (d) all the pivots of A are negative. 2 2 2 1 1 4 −2 3 1 0 0 3 The pivots are all positive. −1 1 2 The the spectral factorization of A is  1 1 1  √ √ − √3  1 0 2 6  2 1  √ √ A= 0 0 1 6 3  1 1 1 0 0 √ √ − √6 2 3  √ 1 0 2  0 0 1 4 √ 2 1 √ 6 2 √ 6 1 − √6 1 − √3 1 √ 3 1 √ 3 T All the eigenvalues are positive. det(A2 ) > 0. The equivalent statements are (a) A is negative deﬁnite. The LU factorization of A is    1 0 0 2 −1 −1 1  A =  −1 1 0   0 3 . (b) all the eigenvalues of A are negative. The leading principal submatrices have determinants det(A1 ) = 2. First we write it in the form xT Ax where 2 3   2 −1 −1 A =  −1 2 1 . and therefore A is positive deﬁnite. which has the form A = RT R. There is nothing unique about R. det(A3 ) = 4 and are therefore all positive as they should be. For example.

0 0 0 0 1 0 1 1 In fact.   . 0)fyy (0. There are many other such R’s. 0) = fx2 (0. If the Hessian evaluated at (0. Positive Deﬁnite Matrices 159 which also has the form A = RT R. 0. 0).  . 0. x2 . · · · .24.0. xn ). 0) < 0 and fxx (0. 0) − (fxy (0. 0. · · · . fxy (0. 0) is positive or negative deﬁnite. . it . 0) fxy (0. . It appears in the normal equations AT Ax = AT b. 0))2 > 0. 0. 0) = · · · = fxn (0. 0) > 0 and fxx (0.0)  · · · fxn xn  x1  x2   . · · · . 0) + 1 [ x1 2! x2 + higher order terms  fx2 x1 · · · xn ]  . . the product RT R should look familiar.···. In the n-variable case. 0) fyy (0. . not even necessarily square. 0)y 2 ) 2! or of the matrix ￿ ￿ fxx (0. 0. This is just the second derivative test from the calculus of several variables. we immediately obtain that (0. To determine if a large matrix is positive deﬁnite. xn ) has a critical point at (0. We have seen that the question comes down to the positive or negative deﬁniteness of the quadratic form 1 (fxx (0. · · · . 0) − (fxy (0. · · · . xn ) = f (0. 0). Now let’s return to the problem of maximizing or minimizing a function of two variables. xn The matrix of second derivatives is called the Hessian of f (x1 . a maximum point if fxx (0. . · · · . 0)x2 + 2fxy (0.  . for example 1 1  −1 0 A= 0 −1    1 −1 0  0 0  1 0 −1  0 1 . x2 . 0) . 0. 0. 0)fyy (0. · · · . 0) From the characterization of positive and negative deﬁnite matrices in terms of the signs of the determinants of their principal leading submatrices. 0)xy + fyy (0. · · · . We conclude that least squares problems invariably lead to positive deﬁnite matrices. 0) = 0 and locally we have f (x1 . if a function f (x1 . · · · . . 0) is a minimum point if fxx (0. fxn x1 f x1 x1 fx1 x2 fx2 x2 . . then a maximum or minimum occurs at (0. fx1 xn fx2 xn . fxn x2 ··· ··· . 0))2 > 0. then fx1 (0. · · · . x2 .     (0.

Prove that (a) A + B is positive deﬁnite. det(A2 ) > 0. EXERCISES 1. So we have come full circle. Show by an example that the product of two positive deﬁnite symmetric matrices may not deﬁne a positive deﬁnite quadratic form. 7) 6.) (b) A is nonsingular and A−1 is positive deﬁnite. (Use the deﬁnition.160 24. Positive Deﬁnite Matrices is obviously not eﬃcient to use the determinant test as we did for the 2×2 case above.) 0 2 5 √ A. (Use the deﬁnition. 0). That is the paramount and overriding principle of the subject and of these notes.) (f) The diagonal elements aii of A are all positive.) (d) E 2 is positive deﬁnite. Decide if the each of the indicated critical points is a maximum or minimum. · · ·? 3. It is much better to check the signs of the pivots. Gauss reigns supreme here as in every other domain of linear algebra. (Use the eigenvalue test and the Spectral Theorem. 5. det(A3 ) < 0. (See Exercise 5 above. make a reasonable deﬁnition of   3 2 0 it for A =  2 4 2 . (a) f (x. For positive deﬁnite matrices A. (Use the eigenvalue test and the Spectral Theorem. show A has all eigenvalues positive and all pivots positive and obtain two diﬀerent factorizations of the form A = RT R.) 4. Write the quadratic form 3x2 + 4x2 + 5x2 + 4x1 x2 + 4x2 x3 in the form xT Ax 1 2 3 and verify all the statements in the theorem on positive deﬁnite matrices. Let A and B be symmetric positive deﬁnite. 2. (Use the eigenvalue test and the Spectral Theorem. and F just symmetric. Describe the quadric surface 3x2 + 4x2 + 5x2 + 4x1 x2 + 4x2 x3 = 16 1 2 3 (Hint: λ = 1. Why does the determinant test for negative deﬁniteness look like det(A1 ) < 0.) (c) C T AC is positive deﬁnite.) (e) eF is positive deﬁnite. and compute 7. because they are easily found by Gaussian elimination. y) = −1 + 4(ex − x) − 5x sin y + 6y 2 at the point (0. (Take x to be a coordinate vector in the deﬁnition. . C be nonsingular. That is. E be nonsingular and symmetric. 4. one from A = QDQT and the other from A = LDLT . Show by example that the set of upper triangular matrices with ones down their diagonals is closed under multiplication and inverse.

24.  1 0  1 0 0 2 1 1 1 1 3 1  0 1  1 2 9. y) = (x2 − 2x) cos y at the point (1. Positive Deﬁnite Matrices 161 (b) f (x. 8. Characterize positive semideﬁnite matrices in terms of their eigenvalues. A symmetric matrix A is positive semideﬁnite if its associated quadratic form xT Ax ≥ 0 for every x ￿= 0. . Test the following matrix for positive deﬁniteness the easiest way you can. π).

 −.5 . 100 5.5  −3 4.75 1 0 9.5 1 0 1  2 (c)  −3 −11 1 2    17 (c)  4  −7  −8 10 (g)  −14 26  −2 −1   0 0 0  (k)  0 1 0 0 243 0  0 7  −5   (d) [ 2  14 −8 ] (h) 0 0 0  8 −3 12  5 0 7  −6 −3 −8  0 0 0   1 0 0 2 1 3 (b)  −1 1 0  0 6 4  0 0 −2 2 0 1   0 0 1 3 2 −1 0 0  0 −1 −1 4    1 0 0 0 −6 43 −. (a) 3 7 0 9. 1 serving of chicken. 580. 10 servings of pasta. (a)  ￿ −2 1 ￿ (b) ￿ −1 −4 ￿  1. y = 3x3 − 5x2 + x + 2 SECTION 2  ￿ ￿ 7 10 14 −2  10 1. y = x3 − 2x2 − 3x + 5 8. (a) .5 1 0 0 0 15.5 2. (a) (b) 8 −4 0 6   4 8 12  5 10 15  (e) [ 32 ] (f) 6 12 18    4 0 −1 32 0 1  (j)  0 (i) 0 2 −2 1 0   2 5 0  −1 0 −1  6. SECTION 3 ￿ ￿￿ ￿ 1 0 4 −6 1. All but the last two. 50 −2 (c)  3  −1   −. 4 servings of broccoli 7.5 (d)  5  −3     1 0 (e)   2 1 6.162 Answers to Exercises ANSWERS TO EXERCISES SECTION 1 1. 150.

5 −1.2 0 0 .5 0 0 . only (c)  10. (b) True. (a) 2 −1    . (a) False. (b) inﬁnitely many SECTION 5 ￿ ￿ −7 4 1. 350.5 .  −3  4  2 (b)  −1  3  0 0 2 0 0  0  0 0  0  1 0 0 2 1 0  1 0 0 0 3 3 0 0  0 1 1 0  0 0 2 1 0 0 0 1   2   1  0  0   (c)   (d)  −1  0   0 1 1 2. (a) none.5 −. s1 = 3.5  0 0 −. (a) 2  0 0 1 0 1 1 0 −1 0 0  3. 1628 SECTION 4   2 1. s3 = −3 .5 (b)  0 10 0  (c)  0 .Answers to Exercises 163 1 2  (d)  0  0 0 ￿ ￿ 1 2.  0 1  0 0 0 0 0 1 4 1 0 0 0 1 4 1     0 s0 3 0   s1   12      0   s2  =  0      1 s3 −12 2 s4 −3 s0 = s2 = s4 = 0.  7  2 4. all except (c) 3. SECTION 6  2 1 1 4  1.2       1 −2 1 0 10 −6 1 −1 0 1  1 −2 2 −3  (d)  −2 1 0  (e)  −5 1 3  (f)   0 1 −1 1 −7 5 −1 7 −1 −4 −2 3 −2 3 ￿ ￿ 1 d −b (g) ad − bc −c a   2 3.

milk = 4 − c. b = c = d = 1 10. −6 6 5. (a) 3 (b) −12 (c) x + 2y − 18 (d) −x3 + 6x2 − 8x 7.164 Answers to Exercises SECTION 7    2 −1 2. (a) three planes intersecting in a point (b) one plane intersecting two parallel planes (c) three nonparallel planes with no intersection (d) a line of intersection (e) a plane of intersection 8.5  + c −1  0 1 (b) no solution       3 1 −2 (c)  0  + c 0  + d 1  0 1 0     3 −1  −1   0  (d)   + c  0 1 0 0       2 −1 −2  0   0   1  (e)   + c  + d  −1. x2 + y 2 − 4x − 6y + 4 = 0 SECTION 8 1. SECTION 9 ￿ ￿ ￿ ￿ 1 1 1. orangejuice = c where 2 ≤ c ≤ 4 9. − . (a) −6 (b) −16 (c) −24 (d) −12 (e) −1 (f) −1 1 4. (a) two intersecting lines (b) two parallel lines (c) one line 4. (a) for λ = 1: . eggs = −2 + c. for λ = 2: 0 1 . True. a = 2.5 −.5 0 0 1 0 ￿ ￿ 3 (f) −5 3. (a)  .

 0  0 0 1    1 0 0 0 (b) 0 1   0 (c)  0  1 SECTION 10 ￿ ￿ ￿ ￿￿ ￿￿ ￿−1 1 1 1 1 1 0 1 1 1. (a)  0 . for λ = −4:  1  (e) for λ = 2: 1 0 1         1 1 0 0 0 1 1 0 (f) for λ = −2:   and  . for λ = 2:  −2  . for λ = 3:   and   0 0 1 1 0 0 0 1       1 0 0 5.Answers to Exercises 165       0 1 −1 (b) for λ = 1:  1  and  0  . for λ = 6:  −1  1 1 1       1 1 −1  0  and  1 . (a) = 0 2 0 1 0 2 0 1      −1 5 0 2 0 1 −1 1 0 0 0 1 −1 (b)  0 1 0  =  1 0 0  0 1 0 1 0 0  −4 0 −1 0 −2 1 0 0 3 0 −2 1      −1 2 2 2 −2 0 2 0 0 0 −2 0 2 (c)  1 2 0  =  1 −1 1   0 2 0   1 −1 1  1 0 2 1 1 1 0 0 4 1 1 1      −1 6 4 4 0 1 1 −1 0 0 0 1 1 (d)  −7 −2 −1  =  −1 −2 −1   0 2 0   −1 −2 −1  7 4 3 1 1 1 0 0 6 1 1 1 . for λ = 4:  1  1 1 1       0 1 1 (d) for λ = −1 :  −1  . for λ = 2:  −1  . for λ = 3:  0  0 −2 1       −2 0 2 (c) for λ = 0:  1  .  1 .

since symmetric.166 Answers to Exercises 3. (In fact it is. (c) Maybe.) (b) Yes. (In fact it is not. (a) Maybe. (a) exp t = 2t 0 2 0 1 0 e 0 1 2 2 ￿￿ ￿ 0 (e)  2 2  −2  0 (f)  0 0      2 2 1 1 −1 2 0 0 1 0 −2  =  0 1 1   0 2 0   0 −2 0 1 0 1 0 0 −4 1    0 0 0 1 1 0 0 −2 0 −2 5 −5   0 1 1 0   0 −2 =  0 3 0 0 0 1 1 0 0 0 0 3 0 0 0 1 0 0 −1 1 −1 1 1  0 1  0 0 1 1 0 0 0 1  3 0 0 0 0 3 0 0 0 1 1 0 −1 0 0  1 1 . (a) exp 0  ￿￿ ￿￿ ￿−1 1 1 e 0 1 1 = 0 1 0 e2 0 1     −1 5 0 2 0 1 −1 e 0 0 0 1 −1 (b) exp  0 1 0  =  1 0 0  0 e 0  1 0 0  3 −4 0 −1 0 −2 1 0 0 e 0 −2 1      −1 2 2 2 −2 0 2 1 0 0 −2 0 2 (c) exp  1 2 0  =  1 −1 1   0 e2 0   1 −1 1  1 0 2 1 1 1 0 0 e4 1 1 1   6 4 4  −7 −2 −1  (d) exp 7 4 3    −1  −1 0 1 1 e 0 0 0 1 1 =  −1 −2 −1   0 e2 0   −1 −2 −1  1 1 1 0 0 e6 1 1 1     2  −1 0 2 2 1 1 −1 e 0 0 1 1 −1 (e) exp  2 0 −2  =  0 1 1   0 e2 0  0 1 1  −4 2 −2 0 1 0 1 0 0 e 1 0 1   −2 0 0 0  0 −2 5 −5  (f) exp   0 0 3 0 0 0 0 3   −2  −1 1 1 0 0 e 0 0 0 1 1 0 0 e−2 0 0  0 1 1 0   0 1 1 0  0 =    0 0 1 1 0 0 e3 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 e3 ￿￿ ￿ ￿ ￿ ￿￿ t ￿￿ ￿−1 1 2 1 1 e 0 1 1 2.) SECTION 11 ￿￿ 1 1.

(a) c1 e + c2 e 0 1       0 1 −1 (b) c1 et  1  + c2 et  0  + c3 e3t  0  0 −2 1       −2 0 2  1  + c2 e2t  −1  + c3 e4t  1  (c) c1 1 1 1       0 1 1 (d) c1 e−t  −1  + c2 e2t  −2  + c3 e6t  −1  1 1 1  .Answers to Exercises 167   0 2 1 0  t 0 −1   t  −1 0 1 −1 e 0 0 0 1 −1 = 1 0 0   0 et 0   1 0 0  3t 0 −2 1 0 0 e 0 −2 1       −1 2 2 2 −2 0 2 1 0 0 −2 0 2 (c) exp  1 2 0  t =  1 −1 1   0 e2t 0   1 −1 1  1 0 2 1 1 1 0 0 e4t 1 1 1    6 4 4  −7 −2 −1  t (d) exp 7 4 3    −t  −1 0 1 1 e 0 0 0 1 1 =  −1 −2 −1   0 e2t 0   −1 −2 −1  1 1 1 0 0 e6t 1 1 1       2t  −1 0 2 2 1 1 −1 e 0 0 1 1 −1 (e) exp  2 0 −2  t =  0 1 1   0 e2t 0 0 1 1  −4t 2 −2 0 1 0 1 0 0 e 1 0 1    −2 0 0 0  0 −2 5 −5   (f) exp   t 0 0 3 0 0 0 0 3   −2t  −1 1 1 0 0 e 0 0 0 1 1 0 0 e−2t 0 0  0 1 1 0   0 1 1 0  0 =    3t 0 0 1 1 0 0 1 1 0 0 e 0 0 0 0 1 0 0 0 e3t 0 0 0 1 5  0 (b) exp −4 SECTION 12 ￿ ￿ ￿ ￿ t 1 2t 1 1.

(a) neutrally stable (b) unstable (c) stable SECTION 13 3. (a) et + 2e2t 0 1       0 1 −1 (b) 2et  1  + 2et  0  + e3t  0  0 −2 1       −2 0 2  1  + e2t  −1  + e4t  1  (c) 1 1 1       0 1 1 (d) e−t  −1  − e2t  −2  + e6t  −1  1 1 1       1 1 −1 (e) 3e2t  0  + 2e2t  1  + 1e−4t  1  1 0 1         1 1 0 0 0 1 1 0 (f) e−2t   + e−2t   + e3t   + 2e3t   0 0 1 1 0 0 0 1 3. (a) 2 2 0 3 − i2 2 2 ￿ ￿￿ ￿￿ ￿−1 3 1 3 2 3 1 = 2 0 −2 3 2 0    −1 −i i 0 −1 + i3 0 0 −i i 0 (b)  1 − i 1 + i 1   0 −1 − i3 0   1 − i 1 + i 1  1 1 0 0 0 1 1 1 0    −1 0 −1 0 −1 3 0 0 −1 0  1 −1 1   −3 −1 0   1 −1 1  = 0 0 1 1 0 0 1 0 0 .168 Answers to Exercises       1 1 −1 (e) c1 e2t  0  + c2 e2t  1  + c3 e−4t  1  1 0 1         1 1 0 0 0 1 1    0 (f) c1 e−2t   + c2 e−2t   + c3 e3t   + c4 e3t   0 0 1 1 0 0 0 1 ￿ ￿ ￿ ￿ 1 1 2. α ± iβ ￿ ￿￿ ￿￿ ￿−1 3+i 3−i 3 + i2 0 3+i 3−i 4.

5 . 4. (a) not closed under addition or scalar multiplication (b) not closed under addition (c) not closed under scalar multiplication (d) not closed under addition .5 1 −1 1         2 0 2 2 k  k  =  1 − (. (b) c1 = 1.5)k ￿ ￿ ￿ ￿ ￿ ￿ 5 1 6(3)k − 10(−1)k 3 k 2 k −2 (c) uk = (3) + (−1) = . ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 1 1 + (.25 0 . 3.5 1 −1 1 0 0 . (a) c1 = 2. c2 = 2.5)k (b) uk = 1(−1)k + 2(.5)k  −1  →  50  0 1 2 50     1 . no limit 2 4 2(−1)k + 8(. (a) (c1 e cos 2t + c2 e sin 2t) + (−c1 e sin 2t + c2 e cos 2t) 2 0   0 (b) (c1 e−t cos 3t + c2 e−t sin 3t)  1  + 1     −1 0 (−c1 e−t sin 3t + c2 e−t cos 3t)  −1  + c3 et  1  0 0 3t 3t 6.5 .5 0 .5 .5 2 2 0 1 0 0 2 2 0  . everyone dies! 0 .5 . .5 0 Everyone has blue eyes! k SECTION 15 3.5) −1 1 1 1 + (. blows up 1 1 4 4 4 3(3)k + 5(−1)k      −1 .5 0  =  1 −1 −1   0 0 0   1 −1 −1  .5)k = .Answers to Exercises 169 ￿ ￿ ￿ ￿ 3 1 3t 3t 5.25)k 1 64 k −1 (a) uk = 64(1) − 64(. bounded.25 . c3 = 3 2.25)k 2 128 ￿ ￿ ￿ ￿ ￿ ￿ 6 6 6(−1)k + 12(.5 .25) = 64 → 64 = 2 1 2 − (.5)k  →  1  1(1) 1 + (.5 0 1 2         −1 1 −1 50 −30(.5)k 1      −1 .5 0 0 −1 1 −1  0 .5 .5 0 0 1 2 0 0 −.5 −1 1 −1 .25 0 1  0 . .5  =  1 1 −1   0 1 0   1 1 −1  .  0 .5)k  1  + 50(1)k  1  − 10(−.25 . 5. c2 = −3 SECTION 14 1.

1 (c) 0 2 .170 Answers to Exercises (e) not closed under scalar multiplication 4. (a) .1. (a) +c 3 0 1       1 −1 −1 (b)  0  + c  0  + d  1  1 0 0 SECTION 16 1. ￿ ￿ 1 6.0 (b) 0 0 1     1 0 0. (a) independent (b) independent (c) dependent (d) independent (e) dependent ￿ ￿ ￿ ￿ 1 0 2. All span the plane of x1 − x2 = 0 in R3 . 0 1       1 0 0 0. (a) c 3     −1 −1 (b) c  0  + d  1  1 0   −1  0  (c) c 1       4 −3 2 0  0  1 (d) c   + d   + e  0 1 0 1 0 0     2 −1 1  0      (e) c  4  + d  −1      0 1 1 0 ￿ ￿ ￿1￿ 1 7.

  1 0 0 0 0 1     3 0 0 3 (e)   . Same answers as   for Section 15 Exercise 6. (a) 3  1  − 2  2  2 1 (b) no solution       3 2 −1 (c) (6 + c)  1  + (−4 − c)  2  + c  1  2 1 −1 ￿ ￿ ￿ ￿ 2 1 (d) ) +6 1 2 6. (a) ￿x￿ = 5.   .Answers to Exercises 171       1 0 0 0 1 0 (d)   .   3 0 1 1 3. (a) U and V might be. W is not. (5. V and W might.43   −2  −4  (d)   4 8     −2 −4  −4   2  (e)  +  4 −2 8 1 2.  2  −5   5√5     9 −4 √ 5 (c) 153. (c) U and W are not. SECTION 17 √ 1.   3 2 5. V might be. ￿y￿ = 5 5  √  −565  1  5  √   2   −525   5    (b)  2  . c α ◦ 5 5 many solutions . 15/2) ￿ ￿ −β 3. (b) U does not.

(b) False. (i) Rotation of R3 around y-axis by −90◦ . (a)  0 −1 0  0 0 −1   1 0 0 (b)  0 0 0  0 0 1  .172 Answers to Exercises    −1 −1 5. (d) Rotation of R2 by 45◦ . α (h) Rotation of R3 around z-axis by 90◦ . ￿ ￿ β 3 (l) Rotation of R around z-axis by arctan and reﬂection in xy-plane. (k) Rotation of R3 around z-axis by 90◦ and reﬂection in xy-plane. SECTION 18 2. (c) Projection of R2 onto 135◦ line. (b) Projection of R2 onto y-axis. (a) −x1 + x2 = 0. (f) Reﬂection of R2 in 150◦ line. (a) False. ￿ ￿ β (g) Rotation of R2 by arctan . (e) Rotation of R2 by −60◦ . α   −1 0 0 3. (a) c  1  + d  0  0 1   2 (b) c  −3  1     1 0  −3   −1  (c) c   + d  0 1 1 0   −1  1  (d) c   2 0 6. −x2 + x3 = 0 (d) −x1 + x2 + 2x3 = 0 7. (a) Reﬂection of R2 in 135◦ line. −x1 + x3 = 0 (b) 2x1 − 3x2 + x3 = 0 (c) x1 − 3x2 + x4 = 0. (j) Projection of R3 onto xy-plane.

 1 .   1 0 1 0 0 0 2 0 SECTION 19 ￿ ￿ ￿ ￿ ￿ ￿ 1 1 −2 1. rotates by 180◦ around the line deﬁned by the vector  0 . .  0 .  4 .  6 . rank = 2           1 5 3 −3 0 1 0 4 3 1 0 0  0 1 0 1 0 0 0 0 1   1 0 0 1 1  0 √ 2 − √2  1 1 √ 0 √2 2 . 0 −1 ￿ ￿ −1 0 (b) . R2 → R2 . ￿ ￿2 ￿ ￿2 x y (b) + = 1.  −3 .  0 . 2 .  −5 .Answers to Exercises 173 (c)  (d) 4.  0 . . . . R3 → R4 . (a) .  0 . R5 → R3 . 1 0 0 1   0 1 0 0 0 0 2 1 13.  3 . reﬂects in the x-axis. .  2 . rank = 1 2 2 1 ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ ￿ 1 0 1 2 0 (b) . rank = 2 0 1 2 3 0           1 0 2 4 0 (c)  0 . R3 → R3 .  1 . rank = 2 0 1 2 8 −2        3   2   −1    1 0 0 0  0 . (a) x2 + y 2 = 4. a circle of radius 2. −1 . . 0 1     0 0 1 1 (c)  0 −1 0 . 2 3 ￿ ￿ 1 0 7. rank = 3 (d)       −3 −1 8 0 0 1 0 0 −1 7           1 0    −1 −2  −4 −1  0   0   1  2 0 1              (e)  0 .  1 .  5 . reﬂects in the y-axis. R2 → R2 . (a) . an ellipse.

(a) . (d) Yes.  1 .  . (c)  0 .  2 . R5 → R4 .58 ￿ ￿ .9 4 2 4 (b)  1 0  0 2 9 4 9 9 9 1 9 2 9 9 2 9 4 9  −1  2 1 4 3    Cu 63.174 Answers to Exercises 2.  . 0     −9 −2 (a) No.13  3 21 32 S 2389.851 ! S 32. (a) 1  5  (b) 2. (e) Yes.3 4.  . (c) No. 6.1x (c) z = 2 + 2x + 3y (d) z = 10 + 8x2 − y 2 (e) y = − 3 − 3 t + 7 t2 2 2 2 (f) y = 2 + 3 cos t + sin t      5 0 3 Cu 413.065  0 1 2 1 2 0 1 2 1 2 (c)      .3 .  1 .  0 .  0 . 5. −2 −6 −1           0 0 1 −2 0 0 2 −2 −16 4 2 1 0 rank = 3  0 (a) Yes. (a) y = 4 x 3 (b) y = −. None or inﬁnitely many.91 3.  0 15 21  F e  =  1511. (c) Since dim(col(A)) = dim(row(A)).  SECTION￿20￿ 4 1. 3. (b) Yes. yes. (b) No.543 The solution is  F e  =  55.2 + 1. (d) No. (b) Since dim(row(A)) + dim(null(A)) = 3 and dim(col(A)) = dim(row(A)).  0 . 1 0 (a) Since row(A) ⊥ null(A).          1 0 0 16 6       2 8 0  0  1 0  −4   −2         2   7   1      (f)  −6 .1 .

(a) (b)  0 1 0 ￿ ￿ .  2  −1 2  1 2  1  2  1  2 −1 2      .      −1 2  −1   2  1 .8 5 1 0 0 1   6 1 6 1 3 6 5 6 −1 3 1 3 −1 3 1 3    SECTION 21 ￿ 5 ￿ ￿ 1. (a) 13 12 13 .4 .  7 .  (c) 1 2  −1  2  1  −2 −1 2  6 −2 7 7   −3   2  .  7   6 7 3 7 1 2  −1  2  1  2 1 2 1 3 2 15 14 15     .Answers to Exercises 175 (d) 1 2 1 2 (e) 1 3 1 3 1 3 0  0 1 0 1 3 1 3 5.   1 5 2 5 2    0 0 1 3 0 0 1 0 0  1 2 1 2  1 3 1 3 1 3    0  6. − 12 13 5 13 ￿ (b)     −3 7 6 7 2 7       .  1 0 0 7.2 .  (d)    −2 3 11 15 2 15     .4 .

  7 1 7  1  9  4  9 −8 9 6 7 2 7  1 2  −1  2  1  −2 −1 2  2 −3  11  15 2 15  1 2  −1  2  1  −2 −1 2    −3 7 6 7 1 2 −1 2 1 2 1 2 1  ￿ ￿ 3 2  15 −15 15  0 30 14 15 1 1  2 2  1 1  2 3 −2 2   0 1 1 −1  2 2 0 0 1 1 2 2 −2 7 −3 7 ￿￿ ￿ 13 −26 0 13  6  7 7 −7 7 2  7 7 7  0 3 0 0 7 7 1  −1  2 2 2 0 1 1  −2 2 0 2 1 1  2 2  0 0 0 0 −1 −1 2 2 1 2  1  2  1  −2 1 2      2 0 2 0  1 0  0 1  1 2 3 5. 4. ￿ ￿ 4 1  1  −1 9   4  or  − 9  8 9  4  −1  4  1  −4 1 4 −1 4 1 4 1 4 −1 4 −1 4 1 4 3 4 1 4 1 4 −1 4 1 4 3 4      .  (b) (c) (d) (e) 3.176 Answers to Exercises (e)  ￿ 1 2  −1  2  1  −2 −1 2 5 13 12 13     .  2. 6. (a) − 12 13 5 13 1 2  −1  2  1  2 1 2     .

(a) ￿   0  2 1 √ 2 1 √ 6 2 √ 6 √ − 16 1 √ 2 5 −3 √ 2 5 1 √ 2 5 −3 √ 2 5 5 0 0  2 √ 0 5 1 − √3 1 √ 3 1 √ 3  3 √ 2 5 1 √ 2 5 3 √ 2 5 1 √ 2 5 ￿￿ ￿￿  2 0  0 0 0 ￿￿  2  0 √ 5 0 0   −1 0  1 0 0 − √5 2   −√ 5 0 1  √ 0  5 1 0 1 √ 5 2 √ 5 0 0 1 √ 5 2 √ 5  0 1 T T 1 − √3 1 √ 3 1 √ 3 1 0 3 √ 2 5 1 √ 2 5 0 0  2 2 0  0  1 0 −4 √ 2 1 √ 2 5 −3 √ 2 5  √ 1 1 √ 6 2 √ 6 √ − 16 0   T    ￿T .3 = .8 .3 .1 ￿ ￿ ￿ ￿ 1 √ 3 1 √ 3 1 √ 3 (b) ￿  1 0 0 −1 1 √ 3 1 √ 3 1 √ 3 1 √ 3 1 √ 3 1 √ 3 ￿￿ 3 √ 2 5 1 √ 2 5 1 √ 2 5 −3 √ 2 5 ￿T .9 . (a) 1 1 0 −3 1 1 ￿ ￿￿ ￿￿ ￿−1 −2 1 3 0 −2 1 (b) 1 2 0 −2 1 2    −1 1 1 0 2 0 0 1 1 0 (c)  0 1 0   0 3 0   0 1 0  0 0 1 0 0 3 0 0 1 ￿ √ 1 ￿ 2 1 ￿T ￿ ￿ −√ √ − 25 √5 ￿ 5 5 3 0 4.6 −.Answers to Exercises 177 SECTION 22 ￿ ￿￿ ￿￿ ￿−1 4 −1 2 0 4 −1 3.8 1 √ 6 1 √ 6 2 − √6 6. (a) 1 √ 2  √  − 12  0 (b)  1 √ 6 1 √ 6 2 − √6 1 √ 2 1 − √2   √  − 16  2 √ 6 1 − √6  1 0  0      0 0 1 0 0 −1 √ 3 2 1 2 0 1 2 √ − 3 2 1 √ 2  √   − 12    T    T    0 0 0 0  1 0   − √6  2 √ 1 6  1 − √6 1 √ 2 1 − √2 0 1 √ 3 1 √ 3 1 √ 3 .6 = . (a) 1 2 1 2 √ √ √ √ 0 −2 5 5 5 5 (b)     2 − √5 1 √ 5 0 1 √ 5 2 √ 5 0 (c) 2 √ 0 5  −1 0  1 0 − √5 1 √ 5  0  4 0  0  0 −1 0 0 1  0 5 0 (d) √ 1 5.

1. 15. for B are 0. ±i.178 Answers to Exercises (c)   7. 0. 1 − √2  0  1 √ 2 0 −1 0 0  0 0 1 0 1 √ 2 0 1 √ 2   0 0 0 1 0 0 0 −1 0 1   −1 0 0 0   0 0   1 0 0 0 0 1 0 0 0 0 0 0 −1 0 1 0  T 1 1   −√ √ 0  0 2 2 1 0 0 1   0 0 −1   0 0 1    1 1 √ √ 0 1 0 0 0 2 2 T 0 0 0 1  1 0 0 0  1  =  2 1 √ 2 −1 2 1 − √2 0 −1 2 1 √ 2 1 2     1 − √2 . Eigenvalues for A are ±1. 0.